Showing posts with label Google BigQuery. Show all posts

Wednesday, May 29, 2013

Google BigQuery Vs HDInsight - Comparison


Comparison: Google BigQuery vs. Windows Azure HDInsight
Pricing

Google BigQuery: BigQuery stores data in a columnar structure, so a query is charged only for the data processed in the columns it references, not for the entire table.
Note: The first 100 GB of data processed per month is free of charge.
There are only two pricing components: query processing and storage.
Windows Azure HDInsight: Pricing depends on the configuration of the Hadoop cluster and the storage configuration.
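To make the column-based charging for BigQuery concrete, here is a minimal sketch of the arithmetic. The column names and sizes are hypothetical, the use of decimal gigabytes is an assumption, and no per-GB price is applied because the post does not quote one; the function only estimates how many bytes of a query would fall outside the 100 GB monthly free allowance.

```python
GB = 10**9  # assumption: decimal gigabytes
FREE_BYTES_PER_MONTH = 100 * GB  # free allowance quoted in the post


def estimate_query_bytes(column_sizes, referenced_columns, already_processed=0):
    """Return (processed_bytes, billable_bytes) for one query.

    Only the columns the query references count as processed data.
    `already_processed` is the data processed earlier in the same month.
    """
    processed = sum(column_sizes[name] for name in referenced_columns)
    free_left = max(FREE_BYTES_PER_MONTH - already_processed, 0)
    billable = max(processed - free_left, 0)
    return processed, billable


# Hypothetical table: per-column storage sizes in bytes.
table = {"user_id": 40 * GB, "url": 300 * GB, "timestamp": 60 * GB}

# A query that reads two of the three columns, after 50 GB of earlier usage.
processed, billable = estimate_query_bytes(
    table, ["user_id", "timestamp"], already_processed=50 * GB
)
print(processed // GB, "GB processed,", billable // GB, "GB billable")
```

The 300 GB `url` column never enters the calculation because the query does not reference it, which is the practical effect of the columnar layout.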
Storage Options
Google BigQuery: Data can be loaded directly into tables in a BigQuery project.
Note: The recommended approach is to upload data files to Google Cloud Storage first and then load them into BigQuery tables.
Load limits:
-Max 4 GB per file
-Max 100 GB per load
-Max 1000 files per load
Windows Azure HDInsight: HDInsight provides two options for storing data:
•Windows Azure Blob Storage
•Hadoop Distributed File System (HDFS)
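The BigQuery load limits listed above can be checked before a load is submitted. The sketch below is a plain local check, not a call into any BigQuery API; the function name is hypothetical, the limits are the ones quoted in this post, and treating a gigabyte as 10^9 bytes is an assumption.

```python
GB = 10**9  # assumption: decimal gigabytes

# Limits as quoted in the post.
MAX_FILE_BYTES = 4 * GB
MAX_LOAD_BYTES = 100 * GB
MAX_FILES_PER_LOAD = 1000


def check_load(file_sizes):
    """Return a list of problems for a planned load; an empty list means OK.

    `file_sizes` is a list with the size in bytes of each file in the load.
    """
    problems = []
    if len(file_sizes) > MAX_FILES_PER_LOAD:
        problems.append(
            f"{len(file_sizes)} files exceeds the {MAX_FILES_PER_LOAD}-file limit"
        )
    for index, size in enumerate(file_sizes):
        if size > MAX_FILE_BYTES:
            problems.append(f"file {index} is larger than 4 GB")
    if sum(file_sizes) > MAX_LOAD_BYTES:
        problems.append("total load size is larger than 100 GB")
    return problems


# Thirty 3 GB files pass; thirty 5 GB files break two limits.
print(check_load([3 * GB] * 30))
print(len(check_load([5 * GB] * 30)), "problems found")
```

A load that fails the check can be split into several smaller loads, each of which stays within all three limits.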
Data Formats
Google BigQuery: BigQuery supports two schema types:
-A flat schema in CSV or newline-delimited JSON format.
-A nested/repeated schema in newline-delimited JSON format.
Windows Azure HDInsight: Supports unstructured data formats.
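The difference between the two BigQuery schema types is easier to see with data. This sketch uses only the Python standard library and made-up records with hypothetical field names: it writes the same data once as newline-delimited JSON, which keeps the nested, repeated `visits` field, and once as a flat CSV, which needs one row per visit.

```python
import csv
import io
import json

# Hypothetical records: each user has a repeated, nested "visits" field.
records = [
    {"name": "alice", "visits": [{"page": "/home", "ms": 120},
                                 {"page": "/cart", "ms": 340}]},
    {"name": "bob", "visits": [{"page": "/home", "ms": 95}]},
]


def to_ndjson(rows):
    """One JSON object per line; nested and repeated fields are kept as-is."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def to_flat_csv(rows):
    """Flatten to one CSV row per visit, repeating the parent's name."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "page", "ms"])
    for row in rows:
        for visit in row["visits"]:
            writer.writerow([row["name"], visit["page"], visit["ms"]])
    return buf.getvalue()


print(to_ndjson(records))
print(to_flat_csv(records))
```

The newline-delimited JSON output has one line per user, while the CSV output repeats the user name on every visit row.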
Performance
Google BigQuery: Responds very quickly to submitted queries.
Windows Azure HDInsight: Slow to respond; jobs took anywhere from several minutes to hours to complete and produce the required output.
Best Practices
Google BigQuery: BigQuery Data Strategies and Best Practices
Windows Azure HDInsight: Big Data Solutions on Windows Azure