How to create a sequential number column in a PySpark DataFrame?

max04 · Jul 5, 2018 · Viewed 10.3k times

I would like to create a column of sequential numbers in a PySpark DataFrame, starting from a specified number. For instance, I want to add a column A to my DataFrame df that starts at 5 and increments by one up to the length of the DataFrame: 5, 6, 7, ..., length(df).

Is there a simple solution using PySpark methods?

Answer

niraj kumar · Aug 22, 2019

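You can generate the sequence with spark.range(start, end, step), which returns a new single-column DataFrame. The end value is exclusive, so the example below produces 5 through 99: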

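from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df_len = 100  # exclusive upper bound of the sequence
freq = 1      # step between consecutive values
# spark.range(start, end, step) yields start, start + step, ... below end
ref = spark.range(5, df_len, freq).toDF("id")
ref.show(10)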

+---+
| id|
+---+
|  5|
|  6|
|  7|
|  8|
|  9|
| 10|
| 11|
| 12|
| 13|
| 14|
+---+

only showing top 10 rows