May 31, 2022

Faker with PySpark

I’m preparing a short blog post about some tweaks I made to a Delta table, but first I want to dig into the differences they produced in the Spark UI. Since the original work was done on the job, I’m reproducing the problem with generated data. I didn’t know about Faker before, and boy, it is really simple to use. In this case, I want to generate a small dataset for a product dimension table, with an id, a category, and a price for each product.

2017-2024 Adrián Abreu powered by Hugo and Kiss Theme