Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
Coming from R, I am used to easily doing operations on columns. Is there any easy way to take this …
Tags: scala, apache-spark, dataframe, apache-spark-sql, user-defined-functions

What is the difference between the DataFrame repartition() and DataFrameWriter partitionBy() methods? I hope both are used to "partition data based …
Tags: apache-spark-sql, data-partitioning

I have a dataframe which has one row, and several columns. Some of the columns are single values, and others …
Tags: python, apache-spark, dataframe, pyspark, apache-spark-sql

The SparkSQL CLI internally uses HiveQL, and in the case of Hive on Spark (HIVE-7292), Hive uses Spark as its backend engine. Can somebody …
Tags: apache-spark, hadoop, hive, apache-spark-sql

Given Table 1 with one column "x" of type String, I want to create Table 2 with a column "y" that is …
Tags: scala, apache-spark, apache-spark-sql, user-defined-functions, nullable

I would like to calculate group quantiles on a Spark dataframe (using PySpark). Either an approximate or exact result would …
Tags: apache-spark, pyspark, apache-spark-sql, pyspark-sql

How do you bind variables in Apache Spark SQL? For example:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
…
Tags: scala, apache-spark, apache-spark-sql, apache-spark-2.0

When a CSV is read as a dataframe in Spark, all the columns are read as string. Is there any way to …
Tags: scala, apache-spark, apache-spark-sql, spark-csv

I have this Python code that runs locally on a pandas dataframe:

df_result = pd.DataFrame(df
    .groupby('A')
    .apply(…
Tags: python, apache-spark, pyspark, apache-spark-sql, user-defined-functions

I'm currently trying to extract a database from MongoDB and use Spark to ingest it into Elasticsearch with geo_points. The …
Tags: scala, elasticsearch, apache-spark, etl, apache-spark-sql