RaySQL: DataFusion on Ray

This is an experimental research project to evaluate the concept of performing distributed SQL queries from Python, using Ray and DataFusion.
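
A common way for a distributed engine to evaluate an aggregate such as avg(tip/total_bill) is two-phase aggregation. Each worker computes a partial state for its own share of the rows, and a final step merges those states. The sketch below is plain Python, not RaySQL or DataFusion code. The row layout (sex, smoker, tip, total_bill) and the sample partitions are hypothetical. It shows why an average has to travel between workers as a (sum, count) pair rather than as a per-partition average.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (sex, smoker, tip, total_bill)
Row = Tuple[str, str, float, float]
Key = Tuple[str, str]
PartialState = Dict[Key, Tuple[float, int]]  # group -> (sum of ratios, row count)


def partial_aggregate(rows: Iterable[Row]) -> PartialState:
    """Phase 1: runs independently on each worker's partition."""
    sums: Dict[Key, float] = defaultdict(float)
    counts: Dict[Key, int] = defaultdict(int)
    for sex, smoker, tip, total_bill in rows:
        key = (sex, smoker)
        sums[key] += tip / total_bill
        counts[key] += 1
    return {key: (sums[key], counts[key]) for key in sums}


def final_aggregate(partials: Iterable[PartialState]) -> Dict[Key, float]:
    """Phase 2: merge the (sum, count) states, then divide once per group."""
    sums: Dict[Key, float] = defaultdict(float)
    counts: Dict[Key, int] = defaultdict(int)
    for partial in partials:
        for key, (total, count) in partial.items():
            sums[key] += total
            counts[key] += count
    return {key: sums[key] / counts[key] for key in sums}


# Two partitions of unequal size, as two workers might see them.
partitions: List[List[Row]] = [
    [("Female", "No", 1.0, 10.0), ("Female", "No", 1.0, 10.0), ("Male", "Yes", 2.0, 10.0)],
    [("Female", "No", 4.0, 10.0)],
]
print(final_aggregate(partial_aggregate(p) for p in partitions))
```

For the ("Female", "No") group this yields (0.1 + 0.1 + 0.4) / 3, about 0.2. Averaging the two per-partition averages instead would give (0.1 + 0.4) / 2 = 0.25, which is wrong.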

Example

See examples/tips.py.

```python
import ray
from raysql.context import RaySqlContext
from raysql.worker import Worker

# Start our cluster
ray.init()

# create some remote Workers
workers = [Worker.remote() for _ in range(2)]

# create context and plan a query
ctx = RaySqlContext(workers)
ctx.register_csv('tips', 'tips.csv', True)  # True: tips.csv has a header row
ctx.sql('select sex, smoker, avg(tip/total_bill) as tip_pct from tips group by sex, smoker')
```

Status

Proof-of-concept. Not producing correct results yet.
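
Because the distributed results are not yet correct, a single-process reference answer is useful for comparison. The helper below is hypothetical and not part of RaySQL. It computes the example query with Python's standard csv module, assuming tips.csv starts with a header row containing the columns the query names (sex, smoker, tip, total_bill).

```python
import csv
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str]  # (sex, smoker)


def tip_pct_reference(path: str) -> Dict[Key, float]:
    """Single-process equivalent of:

    select sex, smoker, avg(tip/total_bill) as tip_pct
    from tips group by sex, smoker
    """
    sums: Dict[Key, float] = defaultdict(float)
    counts: Dict[Key, int] = defaultdict(int)
    with open(path, newline="") as f:
        # DictReader takes the column names from the header row (assumed present).
        for row in csv.DictReader(f):
            key = (row["sex"], row["smoker"])
            sums[key] += float(row["tip"]) / float(row["total_bill"])
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Calling tip_pct_reference('tips.csv') returns a dict keyed by (sex, smoker) whose values can be compared, within a small floating-point tolerance, against the tip_pct values from the distributed query.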

Features

Mature SQL support (CTEs, joins, subqueries, etc.) thanks to DataFusion
Support for CSV and Parquet files

Limitations

Simplistic shuffle mechanism that produces lots of files
Requires a shared file system currently
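
Both limitations can be illustrated with a generic hash-partitioned shuffle in plain Python. This is a hypothetical sketch, not RaySQL's implementation: the file naming scheme and the functions write_shuffle_files and read_partition are illustrative. Each of M map tasks writes one file per output partition, so M tasks and N partitions produce M × N files. Each reducer then reads its partition's file from every map task, which is presumably why the files must sit on a file system that all workers can reach.

```python
import csv
import os
import tempfile
import zlib
from typing import List, Sequence, Tuple

Row = Tuple[str, ...]


def partition_for(key: str, num_partitions: int) -> int:
    # crc32 is stable across processes, unlike Python's salted hash() on str.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def write_shuffle_files(task_id: int, rows: Sequence[Row], key_index: int,
                        num_partitions: int, out_dir: str) -> List[str]:
    """Map side: one task writes one file per output partition, even if empty."""
    paths = [os.path.join(out_dir, f"shuffle_{task_id}_{p}.csv")
             for p in range(num_partitions)]
    files = [open(path, "w", newline="") for path in paths]
    try:
        writers = [csv.writer(f) for f in files]
        for row in rows:
            writers[partition_for(row[key_index], num_partitions)].writerow(row)
    finally:
        for f in files:
            f.close()
    return paths


def read_partition(out_dir: str, partition: int, num_tasks: int) -> List[Row]:
    """Reduce side: gather one partition from every map task's output."""
    rows: List[Row] = []
    for task_id in range(num_tasks):
        path = os.path.join(out_dir, f"shuffle_{task_id}_{partition}.csv")
        with open(path, newline="") as f:
            rows.extend(tuple(r) for r in csv.reader(f))
    return rows


map_tasks = [[("Male", "Yes"), ("Female", "No")], [("Female", "Yes")], [("Male", "No")]]
num_partitions = 4
with tempfile.TemporaryDirectory() as out_dir:
    written = [path
               for task_id, rows in enumerate(map_tasks)
               for path in write_shuffle_files(task_id, rows, 0, num_partitions, out_dir)]
    print(len(written), "shuffle files")  # 3 map tasks x 4 partitions = 12
```

Writing one file per (task, partition) pair keeps the map side simple, but the file count grows multiplicatively with parallelism.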

Building

```shell
# prepare a development environment (used to build the wheel / install in development mode)
python3 -m venv venv
# activate the venv
source venv/bin/activate
# update pip itself if necessary
python -m pip install -U pip
# install dependencies (for Python 3.8+)
python -m pip install -r requirements.in
```

Whenever the Rust code changes (through your own edits or a git pull), rebuild and run the tests:

```shell
# make sure you activate the venv using "source venv/bin/activate" first
# build the Rust extension and install it into the active venv
maturin develop
# run the test suite
python -m pytest
```
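
maturin develop installs the built extension into the currently active virtual environment, so a quick pre-flight check can catch a forgotten activation. The snippet below is a hypothetical helper, not part of the repository, and uses only the standard library.

```python
import sys


def in_virtualenv() -> bool:
    # A venv created with `python3 -m venv` sets sys.prefix to the venv
    # directory, while sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


if in_virtualenv():
    print("virtual environment active:", sys.prefix)
else:
    print("no virtual environment active; run: source venv/bin/activate")
```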

How to update dependencies

To change the test dependencies, edit requirements.in and run:

```shell
# install pip-tools (only needed once; consider doing this inside the venv)
python -m pip install pip-tools
python -m piptools compile --generate-hashes -o requirements-310.txt
```

To upgrade the pinned dependencies to their latest allowed versions, add -U:

```shell
python -m piptools compile -U --generate-hashes -o requirements-310.txt
```
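
When the lock file has to be regenerated for more than one Python version, the command can be derived from the running interpreter. This sketch assumes the 310 suffix in requirements-310.txt stands for Python 3.10, which the README does not state; the helper names are hypothetical. It passes requirements.in explicitly and only prints the command; the returned list can be handed to subprocess.run to execute it.

```python
import shlex
import sys
from typing import List, Tuple


def lock_file_name(version: Tuple[int, int] = tuple(sys.version_info[:2])) -> str:
    # Assumption: "310" in requirements-310.txt means Python 3.10.
    return f"requirements-{version[0]}{version[1]}.txt"


def pip_compile_args(upgrade: bool = False) -> List[str]:
    """Build the pip-tools compile command for the current interpreter."""
    args = [sys.executable, "-m", "piptools", "compile"]
    if upgrade:
        args.append("-U")  # upgrade pins to the latest allowed versions
    args += ["--generate-hashes", "-o", lock_file_name(), "requirements.in"]
    return args


print(shlex.join(pip_compile_args(upgrade=True)))
```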

More details are available in the pip-tools documentation.

datafusion-contrib/ray-sql
