
Benchmarks #8

Open
Jefffrey opened this issue Nov 3, 2023 · 1 comment
Assignees: Jefffrey
Labels: enhancement (New feature or request), medium (Medium priority), testing (Tests)

Comments

Jefffrey (Collaborator) commented Nov 3, 2023

To make performance improvements more measurable, add benchmarks that can be run regularly.

This requires benchmark programs (see https://github.com/apache/arrow-rs/tree/master/parquet/benches); a rough sketch of one is included below.
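
For reference, a minimal sketch of what such a benchmark program could look like, using Criterion (the harness the linked arrow-rs parquet benches use). The `read_orc_file` helper and the `benchmark_data/alltypes.orc` path are hypothetical placeholders, not existing APIs or files in this repo:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::fs::File;

// Hypothetical helper: open one ORC data file and fully decode it.
// In a real benchmark this would drive the crate's reader and drain
// all record batches so that decoding cost is actually measured.
fn read_orc_file(path: &str) {
    let _file = File::open(path).expect("benchmark data file missing");
    // ... build the reader and read all batches here ...
}

fn benchmark_read(c: &mut Criterion) {
    c.bench_function("read alltypes.orc", |b| {
        b.iter(|| read_orc_file("benchmark_data/alltypes.orc"))
    });
}

criterion_group!(benches, benchmark_read);
criterion_main!(benches);
```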

It also requires large data files, ideally covering all supported data types.

Note that for the data files, completely random data may not be sufficient, as some encodings take advantage of patterns in the data (e.g. integer RLE v2). Keep that in mind if considering generating data for the benchmarks; the sketch below illustrates the difference.
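
As a rough illustration of the kind of pattern that matters (the column shapes here are arbitrary examples, not a proposed generator): a run-heavy integer column compresses very well under RLE, while a fully random one is close to the worst case for the encoder.

```rust
use rand::Rng; // assumed dev-dependency for the random case

// Long runs of repeated values: integer RLE v2 encodes this very compactly.
fn run_heavy_column(rows: usize) -> Vec<i64> {
    (0..rows).map(|i| (i / 1_000) as i64).collect()
}

// No runs or small deltas to exploit: near worst case for the encoder.
fn random_column(rows: usize) -> Vec<i64> {
    let mut rng = rand::thread_rng();
    (0..rows).map(|_| rng.gen::<i64>()).collect()
}
```

Benchmarking against both shapes would show how much each encoding path benefits from realistic patterns.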

Could also use something like TPC-H, TPC-DS, or NYC taxi data for more variety.

WenyXu added a commit to WenyXu/datafusion-orc that referenced this issue Nov 9, 2023
Jefffrey self-assigned this Nov 14, 2023
Jefffrey (Collaborator, Author) commented

Will work on adding a simple benchmark to start us off, with the aim of giving visibility into whether refactors positively or negatively impact performance.

Jefffrey added the enhancement (New feature or request), medium (Medium priority), and testing (Tests) labels Apr 1, 2024