Customizable Nullability for UDAF #11274

jayzhan211 · 2024-07-05T03:47:03Z

Is your feature request related to a problem or challenge?

Follow-on to #11093; another issue, #11256, also expects this.

Describe the solution you'd like

Some functions always return a non-null result, such as count, array_agg, min/max, and more.
We can optimize queries based on a function's nullability, so it would be helpful to be able to define nullability for each UDAF.
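To illustrate the kind of optimization meant here, below is a minimal sketch using a hypothetical, self-contained `Expr` type (not DataFusion's). If an aggregate is declared non-nullable, a predicate such as `count(x) IS NOT NULL` can be folded to `true`. The `nullable` flag on the aggregate node is an assumption standing in for whatever API ends up exposing UDAF nullability.

```rust
// Hypothetical sketch: not DataFusion's Expr or optimizer API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Call to an aggregate function, carrying its (assumed) declared nullability.
    Aggregate { name: String, nullable: bool },
    /// `<expr> IS NOT NULL`
    IsNotNull(Box<Expr>),
    /// Boolean literal.
    Literal(bool),
}

/// Fold `agg IS NOT NULL` to `true` when the aggregate is declared non-nullable.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::IsNotNull(inner) => match *inner {
            // Known non-null: the predicate is always true.
            Expr::Aggregate { nullable: false, .. } => Expr::Literal(true),
            // Otherwise keep the predicate and simplify the operand.
            other => Expr::IsNotNull(Box::new(simplify(other))),
        },
        other => other,
    }
}
```

A nullable aggregate such as `sum` leaves the predicate untouched, so the rewrite only fires when the function explicitly opts in.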

Describe alternatives you've considered

No response

Additional context

No response


findepi · 2024-07-05T06:57:33Z

To address this we could extend the pattern here

datafusion/datafusion/expr/src/aggregate_function.rs

Lines 84 to 88 in 4bc3228

    pub fn return_type(
        &self,
        input_expr_types: &[DataType],
        input_expr_nullable: &[bool],
    ) -> Result<DataType> {

i.e. today we have return_type(input_types, nullability) -> return_type;
we could instead have return_type(input_types, nullability) -> (return_type, nullability).
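A minimal sketch of that proposed end state, written as a free function rather than a method so it stays self-contained. `DataType`, `Result`, and the `name` parameter are simplified, hypothetical stand-ins, not Arrow's or DataFusion's types. The function returns the result type together with its nullability.

```rust
// Hypothetical sketch of return_type(input_types, nullability) -> (return_type, nullability).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
}

type Result<T> = std::result::Result<T, String>;

/// Derive (return type, nullable) from the argument types and their nullability.
fn return_type(
    name: &str,
    input_expr_types: &[DataType],
    input_expr_nullable: &[bool],
) -> Result<(DataType, bool)> {
    if input_expr_types.len() != input_expr_nullable.len() {
        return Err("types and nullability must have the same length".to_string());
    }
    match name {
        // COUNT yields 0 when no rows qualify, so it is never null.
        "count" => Ok((DataType::Int64, false)),
        // MAX keeps the argument type and may be null (e.g. no rows qualify).
        "max" => input_expr_types
            .first()
            .cloned()
            .map(|t| (t, true))
            .ok_or_else(|| "max expects one argument".to_string()),
        other => Err(format!("unknown aggregate: {other}")),
    }
}
```

Returning a tuple keeps type and nullability derivation in one place, at the cost of changing every implementor's signature.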

This is the API for built-in functions; we would need a similar API for UDAFs.

datafusion/datafusion/expr/src/udaf.rs

Line 330 in 5f02c8a

    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>;

Obviously, this would be a breaking change: both the method signature and the return type would change.
We can avoid the breaking change by making nullability a separate method, but it would be good to first determine the ideal end-state, and only then think how to get there.
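For comparison, a minimal sketch of the non-breaking alternative: keep `return_type` as it is and add a separate method with a conservative default. The trait `AggregateUdf`, the method `is_nullable`, and the example structs are hypothetical and only mirror the shape of a UDAF trait; they are not DataFusion's actual API.

```rust
// Hypothetical sketch: a trimmed-down trait, not DataFusion's AggregateUDFImpl.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

type Result<T> = std::result::Result<T, String>;

trait AggregateUdf {
    fn name(&self) -> &str;
    /// Existing method, unchanged, so current implementors keep compiling.
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>;
    /// New method with a conservative default: assume the result may be null.
    fn is_nullable(&self) -> bool {
        true
    }
}

struct Count;

impl AggregateUdf for Count {
    fn name(&self) -> &str {
        "count"
    }
    fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType> {
        Ok(DataType::Int64)
    }
    // Opt in to non-null optimizations by overriding the default.
    fn is_nullable(&self) -> bool {
        false
    }
}

/// An existing UDAF written before `is_nullable` existed; it gets the default.
struct FirstValue;

impl AggregateUdf for FirstValue {
    fn name(&self) -> &str {
        "first_value"
    }
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
        arg_types
            .first()
            .cloned()
            .ok_or_else(|| "first_value expects one argument".to_string())
    }
}
```

Defaulting to `true` is the safe choice: an implementor that never overrides the method only loses an optimization and never produces an incorrect plan.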

findepi · 2024-07-05T20:10:06Z

Thinking about this more, the SQL spec says:

If no row qualifies, then the result of COUNT is 0 (zero), and the result of any other aggregate function is the null value.

This applies to array_agg as well.

So count may be the only aggregate function that really returns non-null values, and maybe we don't need to complicate the UDAF API for this.
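The rule quoted from the SQL spec can be sketched with `Option<i64>` as the SQL value (`None` = NULL). These helper functions are illustrative only. NULL inputs are skipped as SQL aggregates do, and `count` models `COUNT(expr)`, not `COUNT(*)`.

```rust
// Sketch of the empty-input rule: COUNT gives 0, other aggregates give NULL.
fn count(values: &[Option<i64>]) -> i64 {
    // COUNT(expr) counts non-null inputs; with no qualifying rows this is 0, never NULL.
    values.iter().filter(|v| v.is_some()).count() as i64
}

fn sum(values: &[Option<i64>]) -> Option<i64> {
    // SUM over no qualifying (non-null) rows is NULL.
    let mut non_null = values.iter().flatten().peekable();
    if non_null.peek().is_none() {
        return None;
    }
    Some(non_null.sum())
}

fn max(values: &[Option<i64>]) -> Option<i64> {
    // Iterator::max already returns None when there are no non-null values.
    values.iter().flatten().copied().max()
}
```

Only `count` has a non-`Option` return type here, which is the asymmetry the comment points out.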

jayzhan211 · 2024-07-06T00:56:15Z

I agree that we don't need non-null for array_agg or min/max, but we may need it for count variants like approx_count_distinct, or for other user-defined functions that are expected to return non-null values, so being able to define nullability still seems like a good idea.

BTW, our array_agg always returns an empty array rather than null when there is no input. We should consider whether to change this behaviour to follow other SQL databases and return null, or keep it as it is 🤔. DuckDB and Postgres both return null for array_agg in this case.
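A small sketch contrasting the two behaviours, using `Option<Vec<i64>>` where `None` stands for SQL NULL. Both functions are hypothetical illustrations, not DataFusion code. Under null-on-empty semantics, array_agg's result has to be treated as nullable, which is why this choice feeds back into the nullability discussion (#11299 in the timeline below later made this change).

```rust
// Behaviour described above: always an array, possibly empty.
fn array_agg_empty_list(values: &[i64]) -> Vec<i64> {
    values.to_vec()
}

// Behaviour of DuckDB and Postgres: NULL (None) when there are no input rows.
fn array_agg_null_on_empty(values: &[i64]) -> Option<Vec<i64>> {
    if values.is_empty() {
        None
    } else {
        Some(values.to_vec())
    }
}
```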

jayzhan211 · 2024-07-06T01:38:20Z

The declared nullability of a function can differ from its final result, and this distinction is crucial for optimization. We can tell the optimizer that a function always returns a non-null value and let it optimize based on that assumption. However, the final result can still be null if no rows qualify, which produces an empty plan and thus a null value.
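One possible reading of this point, sketched with hypothetical helpers: a grouped COUNT never produces a null count, but over empty input it produces no groups at all. A scalar subquery over that empty result then evaluates to NULL, even though the count itself was declared non-nullable. The `scalar_subquery` helper is a simplification that takes the first value and ignores the "more than one row" error case.

```rust
use std::collections::BTreeMap;

/// `SELECT key, COUNT(*) FROM rows GROUP BY key`: every produced count is non-null (>= 1).
fn grouped_count(rows: &[&str]) -> BTreeMap<String, i64> {
    let mut groups = BTreeMap::new();
    for key in rows {
        *groups.entry(key.to_string()).or_insert(0) += 1;
    }
    groups
}

/// Simplified scalar subquery: its single value, or NULL (None) when it returns no rows.
fn scalar_subquery(result: &BTreeMap<String, i64>) -> Option<i64> {
    result.values().next().copied()
}
```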

#11093 (comment)

jayzhan211 added the enhancement (New feature or request) label Jul 5, 2024

jayzhan211 changed the title from ~~Customizable Nullability for UDFs and UDAFs~~ to Customizable Nullability for UDAF Jul 5, 2024

jayzhan211 mentioned this issue Jul 5, 2024

Infer count() aggregation is not null #11256

Merged

findepi mentioned this issue Jul 5, 2024

Convert ArrayAgg to UDAF #10999

Open

jcsherin mentioned this issue Jul 5, 2024

Convert nth_value to UDAF #11287

Merged

jayzhan211 mentioned this issue Jul 6, 2024

Change array_agg to return null on no input rather than empty list #11299

Merged
