If I write
dataFrame.write.format("parquet").mode("append").save("temp.parquet")
the temp.parquet folder ends up with as many files as there are rows.
I don't think I fully understand Parquet yet, but is this expected?
Spark writes one output file per partition, so use coalesce to reduce the number of partitions before the write operation:
dataFrame.coalesce(1).write.format("parquet").mode("append").save("temp.parquet")
EDIT-1
Upon a closer look, the docs do warn about coalesce:
"However, if you're doing a drastic coalesce, e.g. to numPartitions = 1, this may result in your computation taking place on fewer nodes than you like (e.g. one node in the case of numPartitions = 1)"
Therefore, as suggested by @Amar, it's better to use repartition, which adds a shuffle step so the upstream computation still runs in parallel.
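For reference, here is a minimal sketch of both options side by side, assuming PySpark and that df is a pyspark.sql.DataFrame. The helper name write_with_partitions and its parameters num_files and shuffle are hypothetical and only there for illustration; the calls it makes (repartition, coalesce, write.format, mode, save) are the regular DataFrame ones.

```python
def write_with_partitions(df, path, num_files=1, shuffle=True):
    """Append df to path as Parquet after resizing it to num_files partitions.

    Hypothetical helper: df is assumed to be a pyspark.sql.DataFrame.
    """
    if shuffle:
        # repartition() adds a full shuffle, so the stages before it
        # keep their original parallelism.
        resized = df.repartition(num_files)
    else:
        # coalesce() avoids the shuffle, but a drastic coalesce can pull
        # the upstream computation onto very few nodes.
        resized = df.coalesce(num_files)

    # Each partition is written as its own file, so fewer partitions
    # means fewer files in the output folder.
    resized.write.format("parquet").mode("append").save(path)
    return resized
```

Usage would look like write_with_partitions(dataFrame, "temp.parquet") for the repartition variant, or write_with_partitions(dataFrame, "temp.parquet", shuffle=False) for the coalesce variant. To see how many files a write is likely to produce beforehand, dataFrame.rdd.getNumPartitions() reports the current partition count.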