Final project for DataTalks.Club Data Engineering bootcamp
Updated Apr 7, 2023 - HCL
Google BigQuery enables companies to handle large amounts of data without having to manage infrastructure. Google's documentation describes it as a "serverless architecture [that] lets you use SQL queries to answer your organization's biggest questions with zero infrastructure management. BigQuery's scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes." Its client libraries support widely used languages such as Python, Java, JavaScript, and Go. Federated queries are also supported, so BigQuery can read data from external sources.
📖 A highly rated book on the subject is "Google BigQuery: The Definitive Guide", a comprehensive reference.
Another worthwhile read is the inside story of BigQuery, told by its founding product manager in an article marking the product's 10th anniversary.
Yelp Data Processing Pipeline on GCP
Automatic Anomaly Detector
An IaC script to ingest and process messages containing data on trips taken by vehicles.
Terraform module for BigQuery sink connector on Aiven KafkaConnect cluster
This module lets you execute SQL queries in BigQuery simply by specifying the list of SQL files (in the source bucket) to run; the results are then stored in the result bucket.
Simple HTTP endpoint for telemetry-type data events in GCP.
Creating datasets and tables in Google BigQuery via Terraform
Terraform module for managing Google BigQuery datasets
Simple data quality checks on top of BigQuery, using just scheduled queries and Terraform
terraform-bigquery-googlesheet
Data sync via CDC from GCP Cloud SQL to BigQuery using Datastream
Use BigQuery Machine Learning (BQML) to apply advertising techniques
Terraform module of Google BigQuery S3 Data Transfer Config
Released May 19, 2010