
Benchmarks #8

Open
Jefffrey opened this issue Nov 3, 2023 · 1 comment
Assignees: Jefffrey
Labels: enhancement (New feature or request), medium (Medium priority), testing (Tests)

Comments

Jefffrey (Collaborator) commented Nov 3, 2023

To make performance improvements more measurable, add benchmarks that can be run regularly.

This requires benchmark programs (see https://github.com/apache/arrow-rs/tree/master/parquet/benches); a rough sketch of one is included below.
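
For reference, a minimal sketch of what such a benchmark program could look like, using Criterion (the harness the linked arrow-rs parquet benches use). The `read_orc_file` helper and the `benchmark_data/alltypes.orc` path are hypothetical placeholders, not existing APIs or files in this repo:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::fs::File;

// Hypothetical helper: open one ORC data file and fully decode it.
// In a real benchmark this would drive the crate's reader and drain
// all record batches so that decoding cost is actually measured.
fn read_orc_file(path: &str) {
    let _file = File::open(path).expect("benchmark data file missing");
    // ... build the reader and read all batches here ...
}

fn benchmark_read(c: &mut Criterion) {
    c.bench_function("read alltypes.orc", |b| {
        b.iter(|| read_orc_file("benchmark_data/alltypes.orc"))
    });
}

criterion_group!(benches, benchmark_read);
criterion_main!(benches);
```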

It also requires large data files, ideally covering all supported data types.

Note that for the data files, completely random data may not be sufficient, as some encodings take advantage of patterns in the data (e.g. integer RLE v2). Keep that in mind if considering generating data for the benchmarks; the sketch below illustrates the difference.
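
As a rough illustration of the kind of pattern that matters (the column shapes here are arbitrary examples, not a proposed generator): a run-heavy integer column compresses very well under RLE, while a fully random one is close to the worst case for the encoder.

```rust
use rand::Rng; // assumed dev-dependency for the random case

// Long runs of repeated values: integer RLE v2 encodes this very compactly.
fn run_heavy_column(rows: usize) -> Vec<i64> {
    (0..rows).map(|i| (i / 1_000) as i64).collect()
}

// No runs or small deltas to exploit: near worst case for the encoder.
fn random_column(rows: usize) -> Vec<i64> {
    let mut rng = rand::thread_rng();
    (0..rows).map(|_| rng.gen::<i64>()).collect()
}
```

Benchmarking against both shapes would show how much each encoding path benefits from realistic patterns.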

Could also use something like TPC-H, TPC-DS, or NYC taxi data for more variety.

WenyXu added a commit to WenyXu/datafusion-orc that referenced this issue Nov 9, 2023
Jefffrey self-assigned this Nov 14, 2023
Jefffrey (Collaborator, Author) commented

Will work on adding a simple benchmark to start us off, with the aim of giving visibility into whether refactors positively or negatively impact performance.

Jefffrey added the enhancement (New feature or request), medium (Medium priority), and testing (Tests) labels Apr 1, 2024