Adding a constant value column to a Spark DataFrame

Gaurav Bansal · May 17, 2017

I am using Spark version 2.1 in Databricks. I have a data frame named wamp to which I want to add a column named region which should take the constant value NE. However, I get an error saying NameError: name 'lit' is not defined when I run the following command:

wamp = wamp.withColumn('region', lit('NE'))

What am I doing wrong?

Answer

muon · May 17, 2017

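You need to import lit from pyspark.sql.functions, either with a wildcard import

from pyspark.sql.functions import *

which makes lit (and everything else in that module) available, although it also shadows Python built-ins such as sum, min and max,

or by importing the module under an alias and calling lit through it: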

import pyspark.sql.functions as sf
wamp = wamp.withColumn('region', sf.lit('NE'))