October 27, 2023
Tweaking Spark Kafka
Well, I’m facing a hugely interesting case. I’m working at Wallbox, where we deal with billions of rows every day. Now we need to use Spark to filter a Kafka stream and publish the results into different topics according to some rules.
I won’t dig deep into the business logic, only the performance-related stuff: let’s try to increase the processing speed.
When reading from Kafka you usually get one Spark task per Kafka partition, so if you have 6 partitions and 48 cores, 87.5 percent of your cluster sits idle. That can be adjusted with the **minPartitions** option.
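For illustration, here’s a minimal sketch of how that option is set on a Structured Streaming read. The broker address, topic name, and the value 48 are placeholders, not our actual setup:

```scala
import org.apache.spark.sql.SparkSession

object KafkaMinPartitionsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-min-partitions")
      .getOrCreate()

    // Ask Spark to split the 6 Kafka partitions into smaller offset ranges
    // so more tasks run in parallel and all 48 cores get work.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // placeholder brokers
      .option("subscribe", "input-topic")                // placeholder topic
      .option("minPartitions", "48")                     // e.g. match the core count
      .load()

    // ... filtering logic and writes to the output topics would go here
  }
}
```

Keep in mind that `minPartitions` is a hint rather than a guarantee: Spark divides Kafka partitions by offset ranges to reach at least that number of tasks, which can also mean more Kafka consumers per executor.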