
Apache DataSketches GitHub Component Repositories

Our library is made up of multiple components that are partitioned into GitHub repositories by language and dependencies. The dependencies of the core components are kept to a bare minimum to enable flexible integration into many different environments. The Platform Adaptor components, by contrast, have major dependencies on their respective platform environments.

If you have a specific issue or bug report that affects only one of these components, please open an issue in that component's repository. If you are a developer and wish to submit a PR, please choose the appropriate repository.

If you like what you see, give us a Star on these repositories!

Core Sketch Libraries

The key sketches of the Apache DataSketches libraries are available in three (soon four) programming languages. By design, a sketch that is available in more than one language is “binary compatible” across those languages via serialization. For example, a sketch created by the DataSketches C++ library and serialized into its compact form can be read by the DataSketches Java library, and vice versa.
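Serialization is what makes this cross-language compatibility possible: once every field of the serialized form has a documented width and byte order, any language can read the bytes. The following minimal sketch in Python illustrates the idea under stated assumptions. It uses a toy K-Minimum-Values (KMV) distinct-count sketch and a hypothetical `KMV1` byte layout; neither `ToyKmvSketch` nor that layout is part of DataSketches, whose compact formats are defined by the libraries themselves.

```python
import hashlib
import struct

# Hypothetical toy layout (NOT the DataSketches compact format):
#   bytes 0-3  magic tag b"KMV1"
#   bytes 4-5  k as uint16, little-endian
#   bytes 6-9  number of stored hashes n as uint32, little-endian
#   then n uint64 hash values, little-endian, in ascending order
HEADER = struct.Struct("<4sHI")
MAGIC = b"KMV1"


def hash64(item: str) -> int:
    """Deterministic 64-bit hash (first 8 bytes of SHA-256), reproducible in any language."""
    return int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "little")


class ToyKmvSketch:
    """Hypothetical toy sketch: keeps the k smallest distinct 64-bit hashes seen so far."""

    def __init__(self, k: int):
        self.k = k
        self.values = set()

    def update(self, item: str) -> None:
        self.values.add(hash64(item))
        if len(self.values) > self.k:
            self.values.remove(max(self.values))  # evict the largest hash

    def estimate(self) -> float:
        if len(self.values) < self.k:
            return float(len(self.values))  # exact while the sketch is not full
        kth_smallest = max(self.values) / 2**64  # normalize the k-th hash to [0, 1)
        return (self.k - 1) / kth_smallest

    def serialize(self) -> bytes:
        vals = sorted(self.values)
        return HEADER.pack(MAGIC, self.k, len(vals)) + struct.pack(f"<{len(vals)}Q", *vals)

    @classmethod
    def deserialize(cls, data: bytes) -> "ToyKmvSketch":
        magic, k, n = HEADER.unpack_from(data, 0)
        if magic != MAGIC:
            raise ValueError("not a toy KMV sketch")
        sketch = cls(k)
        sketch.values = set(struct.unpack_from(f"<{n}Q", data, HEADER.size))
        return sketch


sketch = ToyKmvSketch(k=64)
for i in range(1000):
    sketch.update(f"item-{i}")

blob = sketch.serialize()  # 10-byte header + 64 * 8 bytes of hashes
restored = ToyKmvSketch.deserialize(blob)
print(len(blob), restored.estimate() == sketch.estimate())  # 522 True
```

A reader written in C++, Java, or Go would only need the layout described in the comment block to reconstruct the same sketch from `blob`.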

Because of inherent differences between the languages, the APIs differ somewhat, but we try to make the same basic functionality available in all of them.

Repository | Distribution | Comments
--- | --- | ---
Java Core | Downloads | This is the original and most comprehensive collection of sketch algorithms. It has a dependency on the Memory component.
Memory (supports Java Core) | Downloads | Provides high-performance access to off-heap memory.
C++ Core | Downloads | C++ was our second core language library and provides most of the major algorithms available in Java, as well as a few sketches unique to C++.
Python Core | Downloads, PyPI | Python was our third core language library and contains most of the major sketch families that are in Java and C++. All the Python sketches are backed by the C++ library via Pybind.
Go Core | Under Development | Go is our fourth core language and is still evolving.

Platform Adaptors

Adaptors integrate the core library components into the aggregation APIs of specific data processing platforms. Some of these adaptors are available as Apache DataSketches distributions; others are integrated directly into the target platform.
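Many engines expose aggregation as a lifecycle: initialize a state, accumulate rows into it, emit a partial result, merge partials from different workers, and finalize (Hive's UDAF evaluators, for instance, use `iterate`, `terminatePartial`, `merge`, and `terminate`). The minimal sketch below shows how an adaptor could map such a lifecycle onto a mergeable sketch. The `DistinctCountAggregator` interface and the toy KMV sketch inside it are hypothetical stand-ins, not the API of any DataSketches adaptor or platform; partial results are passed as bytes because engines typically ship intermediate state between workers.

```python
import hashlib
import struct


def hash64(item: str) -> int:
    """Deterministic 64-bit hash (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "little")


class ToyKmv:
    """Hypothetical toy mergeable distinct-count sketch: the k smallest distinct hashes."""

    def __init__(self, k: int, values=()):
        self.k = k
        self.values = sorted(set(values))[:k]

    def update(self, item: str) -> None:
        self.values = sorted(set(self.values) | {hash64(item)})[: self.k]

    def merge(self, other: "ToyKmv") -> None:
        # The k smallest hashes of a union are the k smallest of both sketches' hashes.
        self.values = sorted(set(self.values) | set(other.values))[: self.k]

    def estimate(self) -> float:
        if len(self.values) < self.k:
            return float(len(self.values))
        return (self.k - 1) / (self.values[-1] / 2**64)

    def to_bytes(self) -> bytes:
        return struct.pack(f"<I{len(self.values)}Q", self.k, *self.values)

    @staticmethod
    def from_bytes(data: bytes) -> "ToyKmv":
        (k,) = struct.unpack_from("<I", data, 0)
        n = (len(data) - 4) // 8
        return ToyKmv(k, struct.unpack_from(f"<{n}Q", data, 4))


class DistinctCountAggregator:
    """Hypothetical adaptor mapping an engine's aggregation lifecycle onto the sketch."""

    def __init__(self, k: int = 256):
        self.k = k

    def init(self) -> ToyKmv:
        return ToyKmv(self.k)

    def iterate(self, state: ToyKmv, value) -> ToyKmv:
        state.update(str(value))
        return state

    def terminate_partial(self, state: ToyKmv) -> bytes:
        return state.to_bytes()  # intermediate state shipped between workers

    def merge(self, state: ToyKmv, partial: bytes) -> ToyKmv:
        state.merge(ToyKmv.from_bytes(partial))
        return state

    def terminate(self, state: ToyKmv) -> float:
        return state.estimate()


# Two overlapping partitions, as if processed by two different workers.
agg = DistinctCountAggregator(k=256)
partitions = [range(0, 600), range(400, 1000)]
partials = []
for part in partitions:
    state = agg.init()
    for i in part:
        state = agg.iterate(state, f"user-{i}")
    partials.append(agg.terminate_partial(state))

final = agg.init()
for blob in partials:
    final = agg.merge(final, blob)
print(agg.terminate(final))  # an estimate near the 1000 distinct users
```

Because the merge step is exact for this sketch, merging the two partial sketches yields the same state as sketching all 1000 users in one pass, even though 200 users appear in both partitions.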

Repository | Distribution | Comments
--- | --- | ---
Google BigQuery Adaptor | Under Development | Depends on C++ Core
Apache Hive Adaptor | Downloads | Depends on Java Core, Integrations
Apache Pig Adaptor | Downloads | Depends on Java Core, Integrations
PostgreSQL Adaptor | Downloads, pgxn.org | Depends on C++ Core, Integrations
Apache Druid Adaptor | Apache Druid Release | Depends on Java Core, Integrations

Other

Repository | Distribution | Comments
--- | --- | ---
Characterization | Not Formally Released | Used for long-running studies of accuracy and speed performance over many different parameters.
Website | Not Formally Released | Public website
Vector | Not Formally Released | This component implements the Frequent Directions algorithm [GLP16]. It is still experimental because the theoretical work has not yet provided an error measure suitable for production use. It can be used as is, but it will not go through a formal Apache release until we find a way to provide better error properties. It depends on the Memory component.
Server | Not Formally Released | Under development
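For readers unfamiliar with Frequent Directions, the minimal sketch below shows its core update in dependency-free Python. The sketch keeps at most ℓ rows of a matrix B. When the buffer is full, it rotates B onto the eigenvectors of BᵀB, subtracts the ℓ-th largest eigenvalue δ from every eigenvalue, and keeps only the rows that stay positive, which frees at least one row. This is a textbook-style illustration, assuming a small cyclic Jacobi eigensolver in place of an SVD routine. The names `FrequentDirections`, `jacobi_eigh`, and `gram` are hypothetical and are not the Vector component's API.

```python
import math
import random


def jacobi_eigh(a, max_sweeps=100, tol=1e-20):
    """Eigenvalues/eigenvectors of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (vals, vecs) where column i of vecs is the eigenvector for vals[i].
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def gram(rows, d):
    """Hypothetical helper: the d x d matrix R^T R for a list of rows R."""
    return [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]


class FrequentDirections:
    """Hypothetical illustration: keeps at most ell rows B with B^T B close to A^T A."""

    def __init__(self, ell: int, d: int):
        if ell < 1:
            raise ValueError("ell must be at least 1")
        self.ell, self.d = ell, d
        self.rows = []

    def update(self, row) -> None:
        if len(self.rows) == self.ell:
            self._shrink()  # frees at least one row
        self.rows.append([float(x) for x in row])

    def _shrink(self) -> None:
        vals, vecs = jacobi_eigh(gram(self.rows, self.d))
        order = sorted(range(self.d), key=lambda i: vals[i], reverse=True)
        # delta = ell-th largest eigenvalue (0 when ell > d, since then rank(B) <= d < ell)
        delta = max(vals[order[self.ell - 1]], 0.0) if self.ell <= self.d else 0.0
        new_rows = []
        for i in order[: self.ell]:
            shrunk = vals[i] - delta
            if shrunk > 0.0:  # the ell-th direction (and any tie) drops out
                scale = math.sqrt(shrunk)
                new_rows.append([scale * vecs[k][i] for k in range(self.d)])
        self.rows = new_rows


rng = random.Random(7)
d, ell = 5, 3
A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(40)]
fd = FrequentDirections(ell, d)
for row in A:
    fd.update(row)

ata, btb = gram(A, d), gram(fd.rows, d)
diff_vals, _ = jacobi_eigh([[ata[i][j] - btb[i][j] for j in range(d)] for i in range(d)])
frob2 = sum(x * x for row in A for x in row)
print(len(fd.rows), min(diff_vals), max(diff_vals), frob2 / ell)
```

The last lines compare the eigenvalues of AᵀA − BᵀB against the covariance bound 0 ≼ AᵀA − BᵀB and ‖AᵀA − BᵀB‖₂ ≤ ‖A‖²_F/ℓ that this shrinking rule satisfies. Each shrink removes at most δ from any direction and exactly ℓδ from ‖B‖²_F, so the δ values sum to at most ‖A‖²_F/ℓ.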