How can I add a column with a value to a new Dataset in Spark Java?

Juan Carlos Nuño · Jul 6, 2017 · Viewed 11.6k times

So, I'm creating some Datasets from the Java Spark API. These Datasets are populated from a Hive table, using the spark.sql() method.

After performing some SQL operations (like joins), I have a final Dataset. What I want to do is add a new column to that final Dataset, with a value of 1 for all the rows. You could see it as adding a constant to the Dataset.

So, for example I have this dataset:

Dataset<Row> final = otherDataset.select(otherDataset.col("colA"), otherDataSet.col("colB"));

I want to add a new column to the "final" Dataset, something like this:

final.addNewColumn("colName", 1); //I know this doesn't work, but just to give you an idea.

Is there a feasible way to add the new column to all the rows of the Dataset with a value of 1?

Answer

koiralo · Jul 6, 2017

If you want to add a constant value, you can use the lit function:

lit(Object literal)
Creates a Column of literal value.

Also, change the variable name final to something else, since final is a reserved keyword in Java.

import static org.apache.spark.sql.functions.lit;

Dataset<Row> final12 = otherDataset.select(otherDataset.col("colA"), otherDataset.col("colB"));

Dataset<Row> result = final12.withColumn("columnName", lit(1));
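For context, here is a minimal end-to-end sketch of how this could fit together, from reading the Hive table to adding the constant column. The table and column names (my_db.my_table, colA, colB, colName) are made up for illustration:

import static org.apache.spark.sql.functions.lit;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class AddConstantColumn {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("AddConstantColumn")
                .enableHiveSupport()
                .getOrCreate();

        // Populate a Dataset from a Hive table (hypothetical table/column names)
        Dataset<Row> otherDataset = spark.sql("SELECT colA, colB FROM my_db.my_table");

        // The result of your select/join operations
        Dataset<Row> joined = otherDataset.select(otherDataset.col("colA"), otherDataset.col("colB"));

        // Add a new column with the constant value 1 on every row
        Dataset<Row> result = joined.withColumn("colName", lit(1));

        result.show();
    }
}

lit also works with other literal types, so lit("flag") or lit(1.0) would add a constant string or double column in the same way.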

Hope this helps!