Pros & cons of BigQuery vs. Amazon Redshift

user2339344 · Oct 13, 2014 · Viewed 12.9k times

Comparing Google BigQuery with Amazon Redshift shows that both can meet the same set of requirements and differ mostly in their pricing plans. Redshift seems more complex to configure (defining keys and doing optimization work), whereas Google BigQuery perhaps has an issue with joining tables.

Is there a pros & cons list of Google BigQuery vs. Amazon Redshift?

Answer

Felipe Hoffa · Sep 8, 2015

I posted this comparison on reddit. Quickly enough, a long-term Redshift practitioner came to comment on my statements. Please see https://www.reddit.com/r/bigdata/comments/3jnam1/whats_your_preference_for_running_jobs_in_the_aws/cur518e for the full conversation.

Sizing your cluster:

  • Redshift will ask you to choose a number of CPUs, RAM, HD, etc. and to turn them on.
  • BigQuery doesn't care. Use it whenever you want, no provisioning needed.

Hourly costs when doing nothing:

  • Redshift will ask you to pay per hour of each of these servers running, even when you are doing nothing.
  • When idle BigQuery only charges you $0.02 per month per GB stored. 2 cents per month per GB, that's it.
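To make the idle-cost point concrete, here is a minimal sketch of the storage arithmetic. It assumes the $0.02 per GB per month figure quoted above and nothing else; the function name and the example sizes are made up for illustration, and current pricing may differ.

```python
def idle_storage_cost(gb_stored: float, months: int = 1,
                      price_per_gb_month: float = 0.02) -> float:
    """Estimate the storage-only bill in dollars for data that is never queried.

    price_per_gb_month defaults to the $0.02 figure quoted in this answer
    (an assumption, not a current price list).
    """
    return round(gb_stored * price_per_gb_month * months, 2)


if __name__ == "__main__":
    # 1000 GB sitting idle for one month
    print(idle_storage_cost(1000))
    # 500 GB sitting idle for a year
    print(idle_storage_cost(500, months=12))
```

Under that assumed rate, 1000 GB left untouched for a month comes to $20, with no per-hour server charge on top.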

Speed of queries:

  • Redshift performance is limited by the number of CPUs you are paying for.
  • BigQuery transparently brings in as many resources as needed to run your query in seconds.

Indexing:

  • Redshift will ask you to index (correction: distribute) your data under certain criteria, and you'll only be able to run fast queries based on this index.
  • BigQuery has no indexes. Every operation is fast.

Vacuuming:

  • Redshift requires periodic maintenance and 'vacuum' operations that last hours. You are paying for each of these server hours.
  • BigQuery does not. Forget about 'vacuuming'.

Data partitioning and distributing:

  • Redshift requires you to think about how to distribute data within your servers to keep performance up - optimization that works only for certain queries.
  • BigQuery does not. Just run whatever query you want.
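Why a distribution choice helps only certain queries can be shown with a toy model. This is not Redshift's actual algorithm, just a sketch that assumes rows are assigned to nodes by hashing one chosen column; the names `node_for`, `distribute`, `nodes_holding`, and the sample columns are hypothetical.

```python
import zlib
from collections import defaultdict


def node_for(key: str, num_nodes: int) -> int:
    """Map a key to a node with a deterministic hash (toy model only)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, dist_key, num_nodes):
    """Place each row on the node chosen by hashing its dist_key column."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(str(row[dist_key]), num_nodes)].append(row)
    return nodes


def nodes_holding(nodes, column, value):
    """Return the nodes that hold at least one row where column == value."""
    return sorted(
        n for n, rows in nodes.items()
        if any(r[column] == value for r in rows)
    )


if __name__ == "__main__":
    rows = [
        {"user_id": "u1", "country": "US"},
        {"user_id": "u1", "country": "DE"},
        {"user_id": "u2", "country": "US"},
        {"user_id": "u3", "country": "US"},
    ]
    nodes = distribute(rows, dist_key="user_id", num_nodes=4)
    # All rows for one user_id share a node, so this lookup touches one node.
    print(nodes_holding(nodes, "user_id", "u1"))
    # Rows for one country may be spread over several nodes.
    print(nodes_holding(nodes, "country", "US"))
```

In this model a filter or join on the distribution column stays on one node, while a query on any other column may have to visit several nodes, which is the trade-off the Redshift bullet describes.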

Streaming live data:

  • Impossible(?) with Redshift.
  • BigQuery easily handles ingesting up to 100,000 rows per second per table.

Growing your cluster:

  • If you have more data or more concurrent users, scaling up will be painful with Redshift.
  • BigQuery will just work.

Multi zone:

  • You want a multi-zone Redshift for availability and data integrity? Painful.
  • BigQuery is multi-zoned by default.

To try BigQuery you don't need a credit card or any setup time. Just try it (quick instructions to try BigQuery).

When you are ready to put your own data into BigQuery, just copy your newline-delimited JSON logs to Google Cloud Storage and import them.
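If your logs are not yet in that shape, a minimal sketch of producing newline-delimited JSON with the Python standard library follows. The record fields and the `to_ndjson` helper are hypothetical, and the copy to Google Cloud Storage and the import step are not shown here.

```python
import json


def to_ndjson(records):
    """Serialize records as newline-delimited JSON: one object per line."""
    # json.dumps escapes newlines inside string values, so each record
    # is guaranteed to occupy exactly one line.
    return "".join(json.dumps(record) + "\n" for record in records)


if __name__ == "__main__":
    # Hypothetical log records, used only for illustration.
    logs = [
        {"ts": "2015-09-08T12:00:00Z", "path": "/index.html", "status": 200},
        {"ts": "2015-09-08T12:00:01Z", "path": "/missing", "status": 404},
    ]
    with open("logs.json", "w", encoding="utf-8") as f:
        f.write(to_ndjson(logs))
```

Each line of the resulting file is a complete JSON object, which is the layout the paragraph above refers to.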

See this in-depth guide to data warehouse pricing in the cloud: Understanding Cloud Pricing Part 3.2 - More Data Warehouses