Top "apache-spark" questions

Apache Spark is an open-source distributed data processing engine written in Scala. It provides a unified API and distributed datasets for both batch and streaming processing.

get datatype of column using pyspark

We are reading data from a MongoDB collection. The collection column has two different value types (e.g. (bson.Int64, int), (int, float)). …

apache-spark pyspark apache-spark-sql databricks
Get current number of partitions of a DataFrame

Is there any way to get the current number of partitions of a DataFrame? I checked the DataFrame Javadoc (Spark 1.6) …

apache-spark dataframe apache-spark-sql
How to write the resulting RDD to a csv file in Spark python

I have a resulting RDD labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions). This produces output in the format: [(0.0, 0.08482142857142858), (0.0, 0.11442786069651742),.....] …

python csv apache-spark pyspark file-writing
Save Spark dataframe as dynamic partitioned table in Hive

I have a sample application that reads from csv files into a dataframe. The dataframe can be stored to …

hadoop apache-spark hive apache-spark-sql spark-dataframe
How to link PyCharm with PySpark?

I'm new to Apache Spark, and apparently I installed it with Homebrew on my MacBook: Last login: Fri Jan 8 12:52:04 on …

python apache-spark pyspark pycharm homebrew
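One common recipe is to point PyCharm's run-configuration environment variables at the Spark install so the IDE can import `pyspark` and `py4j`. The paths below are hypothetical; adjust them to wherever Homebrew placed Spark on your machine.

```shell
# Hypothetical install location; check `brew info apache-spark` for yours
export SPARK_HOME=/usr/local/opt/apache-spark/libexec
# Make pyspark and the bundled py4j importable from the IDE's interpreter
export PYTHONPATH="$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-*-src.zip:$PYTHONPATH"
```

The same two variables can be entered in PyCharm under Run → Edit Configurations → Environment variables instead of a shell profile.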
(Why) do we need to call cache or persist on an RDD

When a resilient distributed dataset (RDD) is created from a text file or collection (or from another RDD), do we …

scala apache-spark rdd
aggregate function Count usage with groupBy in Spark

I'm trying to perform multiple operations in one line of code in PySpark, and I'm not sure if that's possible for …

java scala apache-spark pyspark apache-spark-sql
Apache Spark: How to use pyspark with Python 3

I built Spark 1.4 from the GitHub development master, and the build went through fine. But when I do a bin/…

python python-3.x apache-spark
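The usual fix is to tell Spark which interpreter to launch, via environment variables it reads at startup:

```shell
# Interpreter used by executors (and the driver, unless overridden)
export PYSPARK_PYTHON=python3
# Interpreter used by the driver / pyspark shell
export PYSPARK_DRIVER_PYTHON=python3
```

Both variables must point at compatible Python versions, or Spark will refuse to start workers.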
How to build a SparkSession in Spark 2.0 using pyspark?

I just got access to Spark 2.0; I had been using Spark 1.6.1 up until this point. Can someone please help me …

python sql apache-spark pyspark
pyspark dataframe filter or include based on list

I am trying to filter a dataframe in pyspark using a list. I want to either filter based on the …

apache-spark filter pyspark apache-spark-sql