Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
I have a Cassandra table that for simplicity looks something like:

key: text
jsonData: text
blobData: blob

I can create …
Tags: scala, apache-spark, dataframe, apache-spark-sql, spark-cassandra-connector

I looked at the docs and it says the following join types are supported: Type of join to perform. Default …
Tags: scala, apache-spark, apache-spark-sql, spark-dataframe, apache-spark-2.0

I am doing some testing for Spark using Scala. We usually read JSON files that need to be manipulated like …
Tags: scala, apache-spark, apache-spark-sql, distributed-computing

I'm trying to make sense of where you need to use a lit value, which is defined as a literal …
Tags: python, apache-spark, pyspark, apache-spark-sql

I have a Spark data frame df. Is there a way of sub-selecting a few columns using a list …
Tags: apache-spark, apache-spark-sql, spark-dataframe

I am new to Spark, and I want to use group-by & reduce to find the following from a CSV (one …
Tags: java, apache-spark, hadoop, apache-spark-sql, hdfs

I am using Spark 1.5. I have two dataframes of the form:

scala> libriFirstTable50Plus3DF
res1: org.apache.spark.…
Tags: scala, apache-spark, join, apache-spark-sql

I'm using HiveContext with SparkSQL, and I'm trying to connect to a remote Hive metastore; the only way to set …
Tags: apache-spark, hive, apache-spark-sql

I'm trying to run an insert statement with my HiveContext, like this:

hiveContext.sql('insert into my_table (id, score) …
Tags: apache-spark, apache-spark-sql, pyspark, apache-spark-1.5, hivecontext

I'm trying to use Spark DataFrames instead of RDDs, since they appear to be more high-level than RDDs and tend …
Tags: apache-spark, pyspark, apache-spark-sql