View rohitrastogi's full-sized avatar

Efficient data transformation and modeling framework that is backwards compatible with dbt.

Python 1,515 126 Updated Jul 25, 2024

Scalable datastore for metrics, events, and real-time analytics

Rust 28,305 3,516 Updated Jul 25, 2024

Apache Iceberg

Rust 526 113 Updated Jul 25, 2024

Rust implementation of Apache Iceberg with integration for Datafusion

Rust 78 11 Updated Jul 12, 2024

Create full-fledged APIs for slowly moving datasets without writing a single line of code.

Rust 3,162 173 Updated May 29, 2024

Python scraper based on AI

Python 13,455 1,042 Updated Jul 25, 2024

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

TypeScript 8,210 595 Updated Jul 25, 2024

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.

Python 1,111 148 Updated Jul 25, 2024

A self-hostable CDN for databases. Spice provides a unified SQL query interface and portable runtime to locally materialize, accelerate, and query datasets from any database, data warehouse, or dat…

Rust 1,746 69 Updated Jul 25, 2024

New file format for storage of large columnar datasets.

C++ 400 20 Updated Jun 28, 2024

prost-arrow derives arrow array builders for protobuf messages generated by prost

Rust 3 Updated Mar 29, 2024

Remote shuffle service for Apache Spark to store shuffle data on remote servers.

Java 319 99 Updated Sep 29, 2023

Distributed SQL Query Engine in Python using Ray

Rust 220 14 Updated Nov 20, 2023

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

Scala 1,085 393 Updated Jul 25, 2024

Reference Architecture to automate the use of S3 Express One Zone as a caching layer for S3 Regional Buckets.

Python 8 1 Updated Jul 8, 2024

Chronon is a data platform for serving AI/ML applications.

Scala 677 36 Updated Jul 25, 2024

Distributed DataFrame for Python designed for the cloud, powered by Rust

Rust 1,902 122 Updated Jul 25, 2024

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Python 12,590 1,232 Updated Jul 25, 2024

baby quokka

Python 3 1 Updated Jan 10, 2024

Open Source ElasticSearch Alternative. Parseable helps you search and get insights from your logs in the simplest way possible.

Rust 1,799 93 Updated Jul 25, 2024

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.

Java 799 128 Updated Jul 20, 2024

Multi-hop declarative data pipelines

Java 83 12 Updated Jul 15, 2024

A compute framework for turning multimodal data structures into vector embeddings, to improve quality and control when working with LLMs. Generate custom multimodal embeddings with ease and weigh t…

Jupyter Notebook 393 19 Updated Jul 25, 2024

A distributed transaction framework that supports workflow, saga, TCC, XA, 2-phase message, and outbox patterns, and works with many languages.

Go 9,938 960 Updated May 31, 2024

DSPy: The framework for programming—not prompting—foundation models

Python 14,785 1,132 Updated Jul 24, 2024

A data visualization and analytics component, especially well-suited for large and/or streaming datasets.

C++ 8,039 1,072 Updated Jul 24, 2024

chDB is an in-process OLAP SQL Engine 🚀 powered by ClickHouse

C++ 1,909 71 Updated Jul 24, 2024

Apache DataFusion Comet Spark Accelerator

Rust 688 128 Updated Jul 25, 2024

The Metadata Platform for your Data Stack

Java 9,500 2,818 Updated Jul 25, 2024

Open Control Plane for Tables in Data Lakehouse

Java 275 43 Updated Jul 25, 2024