Generate expected data for integration tests as feather files #73

Merged on Mar 23, 2024

Jefffrey
Collaborator

Relates to #66

Use PyArrow to read ORC files and write the data as Arrow feather files. This is to have more robust equality checks instead of relying on JSON (which needs to be parsed back to Arrow first).

Generating the expected files is a one-off activity; the relevant script is included.
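For readers who want a concrete picture of the conversion described above, here is a minimal sketch. It is not the script from this PR: the function names and the choice of uncompressed output are assumptions made for illustration. Only `pyarrow.orc.read_table` and `pyarrow.feather.write_feather` are actual PyArrow calls.

```python
from pathlib import Path


def feather_path_for(orc_path: Path, out_dir: Path) -> Path:
    """Map e.g. data/foo.orc to <out_dir>/foo.feather (hypothetical naming scheme)."""
    return out_dir / (orc_path.stem + ".feather")


def convert_orc_to_feather(orc_path: Path, out_dir: Path) -> Path:
    """Read one ORC file with PyArrow and write it back out as a feather file."""
    # Imported lazily so the path helper above works without pyarrow installed.
    from pyarrow import feather, orc

    table = orc.read_table(str(orc_path))  # ORC file -> pyarrow.Table
    dest = feather_path_for(orc_path, out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: write uncompressed so the reader needs no extra codec support.
    feather.write_feather(table, str(dest), compression="uncompressed")
    return dest
```

Calling `convert_orc_to_feather` once per ORC test file would produce the expected-data files that the integration tests can then load directly as Arrow data, without a JSON round trip.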

@Jefffrey Jefffrey merged commit fd23fdb into main Mar 23, 2024
9 checks passed
@Jefffrey Jefffrey deleted the feather_integration_tests branch March 23, 2024 08:13
@progval
Contributor

progval commented Mar 23, 2024

Would it make sense to make build.rs run the Python script, so .feather files don't have to be committed to Git?

@Jefffrey
Collaborator Author

> Would it make sense to make build.rs run the Python script, so .feather files don't have to be committed to Git?

Hmm that's a good point, I didn't consider that.

One caveat is that we'd need to run the script in a Python venv or a Docker container to handle the pyarrow dependency robustly.
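As a rough illustration of the venv option, the sketch below creates a virtual environment with the standard-library `venv` module, installs the requirements into it, and runs a script with that interpreter. The function names, directory, and script path are hypothetical, and the `pip install` step needs network access; a build step could invoke something like this before the tests run.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(venv_dir: Path) -> Path:
    """Return the interpreter path inside a venv (the layout differs on Windows)."""
    if sys.platform == "win32":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def run_in_venv(venv_dir: Path, script: Path, requirements: list[str]) -> None:
    """Create the venv if it is missing, install requirements, then run the script."""
    if not venv_python(venv_dir).exists():
        venv.EnvBuilder(with_pip=True).create(venv_dir)
    py = str(venv_python(venv_dir))
    # Installing packages requires network access.
    subprocess.run([py, "-m", "pip", "install", *requirements], check=True)
    subprocess.run([py, str(script)], check=True)
```

For example, `run_in_venv(Path(".venv"), Path("generate.py"), ["pyarrow"])` would run a hypothetical `generate.py` with pyarrow available, without touching the system Python.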

@Jefffrey
Collaborator Author

Created an issue for the above: #74
