Top "Pyspark-sql" questions

Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.

Select all columns except a particular one in Spark SQL

I want to select all columns in a table except StudentAddress, so I wrote the following query: select `(StudentAddress)?+.+` from …

apache-spark-sql hiveql pyspark-sql spark-hive
Why doesn't a left_anti join work as expected in PySpark?

In a dataframe I'm trying to identify those rows that have a value in column C2 that does not exist …

pyspark-sql anti-join
How to convert type Row into Vector to feed to KMeans

When I try to feed df2 to KMeans I get the following error clusters = KMeans.train(df2, 10, maxIterations=30, runs=10, initializationMode="…

apache-spark pyspark k-means apache-spark-mllib pyspark-sql
How to connect Spark with Hive using PySpark?

I am trying to read Hive tables remotely using PySpark. It fails with an error stating that it is unable to connect …

python-3.x hive pyspark pyspark-sql thrift-protocol
Use SQL inside an AWS Glue PySpark script

I want to use AWS Glue to convert some CSV data to ORC. The ETL job I created generated the …

amazon-web-services pyspark pyspark-sql amazon-athena aws-glue
Saving a dataframe result value to a string variable?

I created a DataFrame in Spark, and when I find the max date I want to save it to a variable. Just …

python dataframe spark-dataframe pyspark-sql databricks