- San Francisco, CA
Stars
Efficient data transformation and modeling framework that is backwards compatible with dbt.
Scalable datastore for metrics, events, and real-time analytics
Rust implementation of Apache Iceberg with integration for Datafusion
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
A cross platform way to express data transformation, relational algebra, standardized record expression and plans.
A self-hostable CDN for databases. Spice provides a unified SQL query interface and portable runtime to locally materialize, accelerate, and query datasets from any database, data warehouse, or dat…
New file format for storage of large columnar datasets.
prost-arrow derives arrow array builders for protobuf messages generated by prost
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
Distributed SQL Query Engine in Python using Ray
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
Reference Architecture to automate the use of S3 Express One Zone as a caching layer for S3 Regional Buckets.
Chronon is a data platform for serving AI/ML applications.
Distributed DataFrame for Python designed for the cloud, powered by Rust
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Open Source ElasticSearch Alternative. Parseable helps you search and get insights from your logs in the most simple way possible.
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
A compute framework for turning multimodal data structures into vector embeddings, to improve quality and control when working with LLMs. Generate custom multimodal embeddings with ease and weigh t…
A distributed transaction framework that supports workflow, saga, TCC, XA, 2-phase message, and outbox patterns, with support for many languages.
DSPy: The framework for programming—not prompting—foundation models
A data visualization and analytics component, especially well-suited for large and/or streaming datasets.
chDB is an in-process OLAP SQL Engine 🚀 powered by ClickHouse
Apache DataFusion Comet Spark Accelerator
The Metadata Platform for your Data Stack
Open Control Plane for Tables in Data Lakehouse