Top "Google-cloud-dataproc" questions

Google Cloud Dataproc is a managed Hadoop MapReduce, Spark, Pig and Hive service on Google Cloud Platform.

What is the difference between Google Cloud Dataflow and Google Cloud Dataproc?

I am using Google Cloud Dataflow to implement an ETL data warehouse solution. Looking into Google Cloud's offerings, it …

google-cloud-platform google-cloud-dataflow google-cloud-dataproc
spark.sql.crossJoin.enabled for Spark 2.x

I am using the 'preview' Google Dataproc image 1.1 with Spark 2.0.0. To complete one of my operations I have to complete …

apache-spark google-cloud-dataproc
PySpark print to console

When running a PySpark job on the Dataproc server like this: gcloud --project <project_name> dataproc jobs submit …

python-2.7 pyspark google-cloud-dataproc
spark "basePath" option setting

When I do: allf = spark.read.parquet("gs://bucket/folder/*") I get: java.lang.AssertionError: assertion failed: Conflicting directory structures …

apache-spark pyspark google-cloud-dataproc
Incorrect memory allocation for YARN/Spark after automatic setup of Dataproc cluster

I'm trying to run Spark jobs on a Dataproc cluster, but Spark will not start due to YARN being misconfigured. …

hadoop google-cloud-platform google-cloud-dataproc
"No Filesystem for Scheme: gs" when running spark job locally

I am running a Spark job (version 1.2.0), and the input is a folder inside a Google Cloud Storage bucket (i.…

apache-spark hadoop google-cloud-storage google-cloud-dataproc google-hadoop
Which HBase connector for Spark 2.0 should I use?

Our stack is composed of Google Dataproc (Spark 2.0) and Google Bigtable (HBase 1.2.0), and I am looking for a connector …

scala apache-spark hbase google-cloud-dataproc google-cloud-bigtable
Dataprep vs Dataflow vs Dataproc

To perform source data preparation, data transformation or data cleansing, in what scenario should we use Dataprep vs Dataflow vs …

google-cloud-platform google-cloud-dataflow google-cloud-dataproc google-cloud-dataprep
What is the default password for Jupyter created on Google's Dataproc?

I set up Dataproc using the steps at this link: https://cloud.google.com/dataproc/docs/tutorials/jupyter-notebook But my …

google-cloud-platform jupyter-notebook google-cloud-dataproc