October 27, 2023
Tweaking Spark Kafka
Well, I’m facing a hugely interesting case. I’m working at Wallbox, where we deal with billions of rows every day. Now we need to use Spark to filter a Kafka stream and publish the results into different topics according to some rules.
I won’t dig deep into the business logic, only the performance-related stuff: let’s try to increase the processing speed.
When reading from Kafka you usually get one Spark task per Kafka partition, so if you have 6 partitions and 48 cores, 87.5 percent of your cluster sits idle. That can be adjusted with the **minPartitions** option.
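For illustration, here’s a minimal sketch of how that option is set on a Structured Streaming read. The broker address, topic name, and the value 48 are placeholders, not our actual setup:

```scala
import org.apache.spark.sql.SparkSession

object KafkaMinPartitionsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-min-partitions")
      .getOrCreate()

    // Ask Spark to split the 6 Kafka partitions into smaller offset ranges
    // so more tasks run in parallel and all 48 cores get work.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // placeholder brokers
      .option("subscribe", "input-topic")                // placeholder topic
      .option("minPartitions", "48")                     // e.g. match the core count
      .load()

    // ... filtering logic and writes to the output topics would go here
  }
}
```

Keep in mind that `minPartitions` is a hint rather than a guarantee: Spark divides Kafka partitions by offset ranges to reach at least that number of tasks, which can also mean more Kafka consumers per executor.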