Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change array agg result from empty list to null if no row qualifed #11299

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

jayzhan211
Copy link
Contributor

@jayzhan211 jayzhan211 commented Jul 6, 2024

Which issue does this PR close?

As @findepi pointed out in #11274 (comment) that most of the aggregate function does not return non-null result if no row qualified. I double check the result in Postgres and Duckdb and find out they does not return empty list for array_agg. I think we can follow the behaviour as they did.

I also hope this can make dealing with nullability simpler

  1. Check the nullability of each (aggregate) function
  2. Make sure nullable is helpful for logical optimizer.

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

The result of array agg is changed

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@github-actions github-actions bot added physical-expr Physical Expressions sqllogictest labels Jul 6, 2024
@jayzhan211 jayzhan211 changed the title Change array agg semantic for empty result from empty list to empty row Jul 6, 2024
@jayzhan211 jayzhan211 changed the title Change array agg result from empty list to empty row if no row qualifed Jul 6, 2024
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>

# TODO: Expect NULL, got empty list
query ?
select array_agg(distinct a) from t where a > 3;
Copy link
Contributor Author

@jayzhan211 jayzhan211 Jul 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only query that has unexpected result.

I thought this should be equivalent to select array_agg(a) from t where a > 3 group by a;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jayzhan211 for calling this out!

Given there are no rows with a > 3, the aggregation input relation from t where a > 3 is empty, so this should produce same result as array_agg from empty table, ie NULL. currently it shows [] as a result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is related to a multi-stage grouping (where the first phase outputs a NULL and then the second phase gets that NULL and interprets it as [] 🤔

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jul 6, 2024
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@github-actions github-actions bot added the core Core datafusion crate label Jul 6, 2024
@jayzhan211 jayzhan211 marked this pull request as ready for review July 6, 2024 07:49
Comment on lines -1761 to -1780
query III
WITH indices AS (
SELECT 1 AS idx UNION ALL
SELECT 2 AS idx UNION ALL
SELECT 3 AS idx UNION ALL
SELECT 4 AS idx UNION ALL
SELECT 5 AS idx
)
SELECT data.arr[indices.idx] as element, array_length(data.arr) as array_len, dummy
FROM (
SELECT array_agg(distinct c2) as arr, count(1) as dummy FROM aggregate_test_100
) data
CROSS JOIN indices
ORDER BY 1
----
1 5 100
2 5 100
3 5 100
4 5 100
5 5 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrite it to the simpler one!


# TODO: Expect NULL, got empty list
query ?
select array_agg(distinct a) from t where a > 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jayzhan211 for calling this out!

Given there are no rows with a > 3, the aggregation input relation from t where a > 3 is empty, so this should produce same result as array_agg from empty table, ie NULL. currently it shows [] as a result.

statement ok
drop table t;

# test with no values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add array_agg(distinct case on empty table

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211 jayzhan211 marked this pull request as draft July 8, 2024 23:51
@github-actions github-actions bot removed the logical-expr Logical plan and expressions label Jul 9, 2024
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jul 9, 2024
@jayzhan211 jayzhan211 marked this pull request as ready for review July 9, 2024 08:56
@jayzhan211 jayzhan211 marked this pull request as draft July 9, 2024 09:54
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211 jayzhan211 marked this pull request as ready for review July 9, 2024 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core datafusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sqllogictest
3 participants