Top "Spark-dataframe" questions

Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.

pyspark error: 'DataFrame' object has no attribute 'map'

I am using pyspark 2.0 to create a DataFrame object by reading a csv using: data = spark.read.csv('data.csv', …

apache-spark spark-dataframe apache-spark-2.0
Create DataFrame with null values for a few columns

I am trying to create a DataFrame using an RDD. First I am creating an RDD using the code below: val …

scala apache-spark spark-dataframe apache-spark-dataset
How to add an incremental ID column for a table in Spark SQL

I'm working on a spark mllib algorithm. The dataset I have is in this form Company":"XXXX","CurrentTitle":"XYZ","Edu_…

apache-spark apache-spark-sql spark-dataframe apache-spark-mllib
Scala: Replacing double quotes with single quotes

How do you replace single quotes with double quotes in Scala? I have a data file that has some records …

scala dataframe spark-dataframe double-quotes single-quotes
Mapping Spark DataSet row values into new hash column

Given the following DataSet values as inputData:

column0 column1 column2 column3
A       88      text    99
Z       12      test    200
T       120     foo     12

In Spark, what …

scala apache-spark spark-dataframe apache-spark-dataset
Spark + Parquet + Snappy: Overall compression ratio loses after spark shuffles data

Community! Please help me understand how to get a better compression ratio with Spark. Let me describe the case: I have a dataset, …

apache-spark apache-spark-sql spark-dataframe parquet snappy
How to rename Spark DataFrame output file in AWS in Spark Scala

I am saving my Spark DataFrame output as a CSV file in Scala with partitions. This is how I do …

scala apache-spark amazon-s3 spark-dataframe multipleoutputs
Convert date to end of month in Spark

I have a Spark DataFrame as shown below: #Create DataFrame df <- data.frame(name = c("Thomas", "William", "Bill", "…

pyspark spark-dataframe sparkr
Spark off-heap memory config and Tungsten

I thought that with the integration of Project Tungsten, Spark would automatically use off-heap memory. What for are spark.…

apache-spark apache-spark-sql spark-dataframe apache-spark-2.0 off-heap
How to convert datetime from string format into datetime format in pyspark?

I created a dataframe using sqlContext and I have a problem with the datetime format as it is identified as …

datetime apache-spark pyspark spark-dataframe python-datetime