May 31, 2022
Faker with PySpark
I’m preparing a short blog post about some tweaks I made to a Delta table, but I want to dig into the Spark UI differences first. Because this was done as part of my work, I’m reproducing the problem with generated data.
I didn’t know about Faker before, and boy, it is really simple and easy to use.
In this case, I want to generate a small dataset for a product dimension table with an id, a category and a price.