Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Fail to read column of type array<float> #111

Closed
youngsofun opened this issue Jul 2, 2024 · 0 comments · Fixed by #112
Closed

Bug: Fail to read column of type array<float> #111

youngsofun opened this issue Jul 2, 2024 · 0 comments · Fixed by #112

Comments

@youngsofun
Copy link
Contributor

youngsofun commented Jul 2, 2024

What's wrong?

encounter panic when reading a column containing float values under array or map structures, such as:

  • array<float>
  • array<tuple<float, int>>
array less than expected length
thread 'basic_test_nested_array_float' panicked at src/array_decoder/mod.rs:71:30:
array less than expected length

How to reproduce?

Add the following code to tests/basic/data/write.py:

nested_array_float = {
    "value": [
        [1.0, 2.0],
        [None, 2.0],
    ],
}

_write("struct<value:array<float>>", nested_array_float, "nested_array_float.orc")

Run the test against it like nested_array.orc

Reason

FloatIter uses number of rows instead of number of leaf values.

Refer to the code here: array_decoder/mod.rs

Quick Fix

FloatIter do not have to guard against number of values. It can simply read until the end of bytes in memory, similar to how integers are handled.

pr: #112

Some questions about Column::number_of_rows

Is the Column::number_of_rows intended to represent number_of_values? If so, this code seems incorrect:

pub fn children(&self) -> Vec<Column> {
    .... 
    DataType::List { child, .. } => {
        vec![Column {
            number_of_rows: self.number_of_rows,
            footer: self.footer.clone(),
            name: "item

In my understanding, we have no way to derive number of values from the parent column.
however we can retrieve number of values from the stripe metadata, this requires a refactor.

Here’s an ugly but workable fix: Commit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
1 participant