View kalona's full-sized avatar
Python 14 Updated Dec 19, 2022

GPU Development in Python 101 tutorial

Jupyter Notebook 245 64 Updated Jun 18, 2024

Ibis tutorial repository

Jupyter Notebook 24 12 Updated Jul 8, 2024

Comparing DataFusion with DuckDB based on ClickBench, H2O, and TPC-H

Python 4 3 Updated Mar 20, 2024

Prepping tables for machine learning

Python 1,060 91 Updated Jul 25, 2024

Generate Parquet Files

Rust 7 1 Updated Jun 23, 2024

This is the development home of the workflow management system Snakemake. For general information, see

HTML 2,195 527 Updated Jul 25, 2024

A DSL for data-driven computational pipelines

Groovy 2,637 608 Updated Jul 25, 2024

Python pathlib-style classes for cloud storage services such as Amazon S3, Azure Blob Storage, and Google Cloud Storage.

Python 425 50 Updated Jul 21, 2024

The WeightWatcher tool for predicting the accuracy of Deep Neural Networks

Python 1,417 121 Updated May 30, 2024

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

Python 8,031 2,417 Updated Jul 12, 2024

Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.

Java 141 38 Updated Jun 3, 2024

A massively parallel, high-level programming language

Rust 16,904 411 Updated Jul 25, 2024

Code for the book "Software Engineering for Data Scientists"

Jupyter Notebook 29 11 Updated Jun 28, 2024

Toolkit for installing and creating an initial database on Bare Metal Solution

Shell 56 25 Updated Jul 9, 2024

Waverunner, a database migration tool to help organize and migrate databases at scale.

Python 9 6 Updated Jul 9, 2024

daft-hudi-examples

Python 1 Updated May 4, 2024

DMT is an end-to-end automation of data warehouse migration, focused on extraction, SQL translation, data migration, data validation, etc. The main goal is to reduce migration delivery time.

Python 25 9 Updated Jul 10, 2024

Analyzing FEMA's National Flood Insurance Program (NFIP) Data With DuckDB.

Jupyter Notebook 7 1 Updated Jun 25, 2024

Create backups of BigQuery datasets/tables

Python 40 7 Updated Aug 25, 2023

[DEPRECATED] GAE Python-based app that regularly collects information about GCP resources and stores it in BigQuery

Python 46 5 Updated Aug 31, 2023

A topic-centric list of HQ open datasets.

59,464 9,773 Updated Jul 8, 2024

A list of publicly available datasets with real-time data maintained by the team at bytewax.io

484 20 Updated May 28, 2024

Community Security Analytics provides a set of community-driven audit & threat queries for Google Cloud

Python 306 70 Updated Jun 12, 2024