Google Cloud Dataproc is a managed Hadoop MapReduce, Spark, Pig and Hive service on Google Cloud Platform.
I am using Google Cloud Dataflow to implement an ETL data warehouse solution. Looking into Google Cloud's offerings, it …
google-cloud-platform google-cloud-dataflow google-cloud-dataproc

I am using the 'preview' Google Dataproc image 1.1 with Spark 2.0.0. To complete one of my operations I have to complete …
apache-spark google-cloud-dataproc

When running a PySpark job on the Dataproc server like this: gcloud --project <project_name> dataproc jobs submit …
python-2.7 pyspark google-cloud-dataproc

I am using Google Cloud Dataproc to run a Spark job, and my editor is Zeppelin. I was trying to write …
scala apache-spark google-cloud-platform google-cloud-storage google-cloud-dataproc

When I do: allf = spark.read.parquet("gs://bucket/folder/*") I get: java.lang.AssertionError: assertion failed: Conflicting directory structures …
apache-spark pyspark google-cloud-dataproc

I'm trying to run Spark jobs on a Dataproc cluster, but Spark will not start due to YARN being misconfigured. …
hadoop google-cloud-platform google-cloud-dataproc

I am running a Spark job (version 1.2.0), and the input is a folder inside a Google Cloud Storage bucket (i.…
apache-spark hadoop google-cloud-storage google-cloud-dataproc google-hadoop

Our stack is composed of Google Dataproc (Spark 2.0) and Google Bigtable (HBase 1.2.0), and I am looking for a connector …
scala apache-spark hbase google-cloud-dataproc google-cloud-bigtable

To perform source data preparation, data transformation or data cleansing, in what scenario should we use Dataprep vs Dataflow vs …
google-cloud-platform google-cloud-dataflow google-cloud-dataproc google-cloud-dataprep

I set up Dataproc using the steps in the link here: https://cloud.google.com/dataproc/docs/tutorials/jupyter-notebook But my …
google-cloud-platform jupyter-notebook google-cloud-dataproc