The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
-
Updated
Jul 10, 2024 - Python
Google BigQuery enables companies to handle large amounts of data without having to manage infrastructure. Google’s documentation describes it as a « serverless architecture (that) lets you use SQL queries to answer your organization's biggest questions with zero infrastructure management. BigQuery's scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes. » Its client libraries allow the use of widely known languages such as Python, Java, JavaScript, and Go. Federated queries are also supported, making it flexible to read data from external sources.
📖 A highly rated canonical book on it is « Google BigQuery: The Definitive Guide », a comprehensive reference.
Another enriching read on the subject is the inside story told in the article by the founding product manager of BigQuery celebrating its 10th anniversary.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
📊 Cube — The Semantic Layer for Building Data Applications
⚙️ Código de manutenção do datalake (metadados e pacotes de acesso) | 📖 Docs: https://basedosdados.github.io/mais/
the portable Python dataframe library
Open Source Feature Flagging and A/B Testing Platform
Cloud Dataflow Google-provided templates for solving in-Cloud data tasks
This project demonstrates how to build and automate an ETL pipeline using DAGs in Airflow and load the transformed data to Bigquery. There are different tools that have been used in this project such as Astro, DBT, GCP, Airflow, Metabase.
The open source high performance ELT framework powered by Apache Arrow
Blazing fast, instant realtime GraphQL APIs on your DB with fine grained access control, also trigger webhooks on database events.
Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
Fast, Simple and a cost effective tool to replicate data from Postgres to Data Warehouses, Queues and Storage
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Released May 19, 2010