Top "Apache-spark-sql" questions

Apache Spark SQL is a module for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.

How to use regexp_replace in Spark

I am pretty new to Spark and would like to perform an operation on a column of a DataFrame so …

scala apache-spark apache-spark-sql regexp-replace
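
A minimal Scala sketch of the usual answer: apply org.apache.spark.sql.functions.regexp_replace with withColumn. The column name and pattern below are illustrative, not taken from the question.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.regexp_replace

    val spark = SparkSession.builder.appName("regexp-replace-demo").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical column: strip every run of digits from "address".
    val df = Seq("12 Main St", "3rd Ave 45").toDF("address")
    df.withColumn("address_clean", regexp_replace($"address", "[0-9]+", "")).show(false)
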
How to avoid duplicate columns after join?

I have two DataFrames with the following columns: df1.columns // Array(ts, id, X1, X2) and df2.columns // Array(ts, …

scala apache-spark apache-spark-sql
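
The standard fix, sketched in Scala: join on a Seq of column names instead of a column expression, so Spark keeps a single copy of each key column. The toy frames below are illustrative.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("dedup-join").master("local[*]").getOrCreate()
    import spark.implicits._

    val df1 = Seq((1L, "a", 10)).toDF("ts", "id", "X1")
    val df2 = Seq((1L, "a", 20)).toDF("ts", "id", "Y1")

    // A Seq of column names makes Spark emit "ts" and "id" only once,
    // unlike an expression join (df1("ts") === df2("ts")), which keeps both copies.
    val joined = df1.join(df2, Seq("ts", "id"))
    joined.printSchema()
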
SparkSQL - Read Parquet file directly

I am migrating from Impala to Spark SQL, using the following code to read a table: my_data = sqlContext.read.parquet(…

scala apache-spark hive apache-spark-sql hdfs
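
The question uses the older sqlContext entry point; a minimal sketch with the modern SparkSession, reading an illustrative Parquet path:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("read-parquet").getOrCreate()

    // The path is illustrative; any local or hdfs:// path to a Parquet directory works.
    val myData = spark.read.parquet("hdfs:///warehouse/my_table")
    myData.printSchema()
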
Filtering DataFrame using the length of a column

I want to filter a DataFrame using a condition related to the length of a column; this question might be …

python apache-spark dataframe pyspark apache-spark-sql
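
The question is tagged pyspark, but the idea is the same in Scala (used for all sketches here): filter on functions.length. The column name and threshold are illustrative.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.length

    val spark = SparkSession.builder.appName("filter-length").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq("abc", "abcdef").toDF("name")

    // Keep only rows whose "name" value is longer than 3 characters.
    df.filter(length($"name") > 3).show()
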
Spark DataFrame: count distinct values of every column

The question is pretty much in the title: Is there an efficient way to count the distinct values in every …

apache-spark apache-spark-sql distinct-values
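
One common approach, sketched in Scala: build a countDistinct aggregate per column and run them all in a single agg pass; approx_count_distinct is the usual swap-in when an estimate is good enough.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, countDistinct}

    val spark = SparkSession.builder.appName("count-distinct").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "a"), (2, "b")).toDF("k", "v")

    // One countDistinct aggregate per column, computed in a single pass.
    val aggs = df.columns.map(c => countDistinct(col(c)).alias(c))
    df.agg(aggs.head, aggs.tail: _*).show()
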
Join two ordinary RDDs with/without Spark SQL

I need to join two ordinary RDDs on one or more columns. Logically this operation is equivalent to the database join …

scala join apache-spark rdd apache-spark-sql
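
Without Spark SQL, the usual route is the pair-RDD API: key each RDD by the join column(s) and call join. A minimal sketch with illustrative data:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("rdd-join").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Key each RDD by the join column(s); PairRDDFunctions then provides join.
    val left  = sc.parallelize(Seq((1, "alice"), (2, "bob")))
    val right = sc.parallelize(Seq((1, "NYC"), (3, "LA")))

    // Inner join on the key; leftOuterJoin / fullOuterJoin cover the outer variants.
    left.join(right).collect().foreach(println)  // prints (1,(alice,NYC))
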
How to flatten a struct in a Spark DataFrame?

I have a DataFrame with the following structure:

 |-- data: struct (nullable = true)
 |    |-- id: long (nullable = true)
 |    |-- keyNote: …

java apache-spark pyspark apache-spark-sql
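
A common trick, sketched in Scala: select the struct with a .* wildcard to promote its fields to top-level columns. The nested shape below is a stand-in for the question's schema.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.struct

    val spark = SparkSession.builder.appName("flatten-struct").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical nested shape roughly matching the question's schema.
    val df = Seq((1L, "note"))
      .toDF("id", "keyNote")
      .select(struct($"id", $"keyNote").alias("data"))

    // "data.*" expands every field of the struct into a top-level column.
    val flat = df.select($"data.*")
    flat.printSchema()  // id and keyNote are now top-level columns
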
How do I detect if a Spark DataFrame has a column?

When I create a DataFrame from a JSON file in Spark SQL, how can I tell if a given column …

scala apache-spark dataframe apache-spark-sql
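
A minimal Scala sketch: df.columns is a plain Array[String], so a membership test answers the question for top-level fields. The file name is illustrative.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("has-column").getOrCreate()

    // Illustrative JSON source; the check is the same for any DataFrame.
    val df = spark.read.json("people.json")

    // Covers top-level fields only; nested fields require walking df.schema instead.
    val hasAge = df.columns.contains("age")
    println(hasAge)
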
Add an empty column to Spark DataFrame

As mentioned in many other locations on the web, adding a new column to an existing DataFrame is not straightforward. …

python apache-spark dataframe pyspark apache-spark-sql
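
The usual workaround, sketched in Scala (the question itself is tagged pyspark): add a typed null literal with withColumn, since a bare lit(null) carries NullType.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.lit
    import org.apache.spark.sql.types.StringType

    val spark = SparkSession.builder.appName("empty-column").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(1, 2).toDF("id")

    // lit(null) alone has NullType, so cast it to the type the column should have.
    val withEmpty = df.withColumn("note", lit(null).cast(StringType))
    withEmpty.printSchema()
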
Spark DataFrame drop duplicates and keep first

Question: in pandas when dropping duplicates you can specify which columns to keep. Is there an equivalent in Spark DataFrames? …

dataframe apache-spark pyspark apache-spark-sql duplicates
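
dropDuplicates takes a column subset but makes no ordering guarantee about which row survives; a deterministic keep-first needs a window, as in this Scala sketch with illustrative columns:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.row_number

    val spark = SparkSession.builder.appName("keep-first").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("k", "ts")

    // dropDuplicates("k") keeps an arbitrary row per key; for a deterministic
    // "first", rank the rows within each key and keep rank 1.
    val w = Window.partitionBy($"k").orderBy($"ts")
    df.withColumn("rn", row_number().over(w))
      .filter($"rn" === 1)
      .drop("rn")
      .show()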