Insights: apache/datafusion
Overview
65 Pull requests merged by 38 people
-
Use `AccumulatorArgs::is_reversed` in `NthValueAgg`
#11669 merged
Jul 27, 2024 -
[Bug] fix bug in return type inference of `utf8_to_int_type`
#11662 merged
Jul 26, 2024 -
Run CI with latest (Rust 1.80), add ticket references to commented out tests
#11661 merged
Jul 26, 2024 -
fix: dont try to coerce list for regex match
#11646 merged
Jul 26, 2024 -
Add support for USING to SQL unparser
#11636 merged
Jul 26, 2024 -
doc: why nullable of list item is set to true
#11626 merged
Jul 26, 2024 -
Extract catalog API to separate crate, change `TableProvider::scan` to take a trait rather than `SessionState`
#11516 merged
Jul 26, 2024 -
chore(deps): update sqlparser requirement from 0.48 to 0.49
#11630 merged
Jul 26, 2024 -
Add `CsvExecBuilder` for creating `CsvExec`
#11633 merged
Jul 25, 2024 -
Add reference to #comet channel in Arrow Rust Discord server
#11637 merged
Jul 25, 2024 -
Add parser option enable_options_value_normalization
#11330 merged
Jul 25, 2024 -
Fix clippy errors for Rust 1.80
#11654 merged
Jul 25, 2024 -
Temporarily pin toolchain version to avoid clippy errors
#11655 merged
Jul 25, 2024 -
Minor: use `ready!` macro to simplify `FilterExec`
#11649 merged
Jul 25, 2024 -
GC `StringViewArray` in `CoalesceBatchesStream`
#11587 merged
Jul 25, 2024 -
Refactor/simplify window frame utils
#11648 merged
Jul 25, 2024 -
Add some zero column tests covering LIMIT, GROUP BY, WHERE, JOIN, and WINDOW
#11624 merged
Jul 25, 2024 -
fix: expose the fluent API fn for approx_distinct instead of the module
#11644 merged
Jul 25, 2024 -
Parsing SQL strings to Exprs with the qualified schema
#11562 merged
Jul 25, 2024 -
Consistent API to set parameters of aggregate and window functions (`AggregateExt` --> `ExprFunctionExt`)
#11550 merged
Jul 24, 2024 -
Unify CI and pre-commit hook settings for clippy
#11640 merged
Jul 24, 2024 -
perf: Optimize IsNotNullExpr
#11586 merged
Jul 24, 2024 -
Minor: avoid copying order by exprs in planner
#11634 merged
Jul 24, 2024 -
feat: add bounds for unary math scalar functions
#11584 merged
Jul 24, 2024 -
ExprBuilder for Physical Aggregate Expr
#11617 merged
Jul 24, 2024 -
Minor: unnecessary row_count calculation in `CrossJoinExec` and `NestedLoopsJoinExec`
#11632 merged
Jul 24, 2024 -
Enforce uniqueness of `named_struct` field names
#11614 merged
Jul 24, 2024 -
Fix: `signum` function bug when `0.0` input
#11580 merged
Jul 24, 2024 -
Rename `functions-array` to `functions-nested`
#11602 merged
Jul 24, 2024 -
Minor: Use upstream `concat_batches` from arrow-rs
#11615 merged
Jul 24, 2024 -
fix: panic and incorrect results in `LogFunc::output_ordering()`
#11571 merged
Jul 24, 2024 -
refactor: simplify `DFSchema::field_with_unqualified_name`
#11619 merged
Jul 23, 2024 -
Remove ArrayAgg Builtin in favor of UDF
#11611 merged
Jul 23, 2024 -
Implement physical plan serialization for csv COPY plans, add `as_any`, `Debug` to `FileFormatFactory`
#11588 merged
Jul 23, 2024 -
Push scalar functions into cross join
#11528 merged
Jul 23, 2024 -
Change default Parquet writer settings to match arrow-rs (except for compression & statistics)
#11558 merged
Jul 23, 2024 -
Improve unparser MySQL compatibility
#11589 merged
Jul 23, 2024 -
Doc: A tiny typo in scalar function's doc
#11620 merged
Jul 23, 2024 -
test: get file size by func metadata
#11575 merged
Jul 23, 2024 -
Fix Internal Error for an INNER JOIN query
#11578 merged
Jul 23, 2024 -
feat: support Map literals in Substrait consumer and producer
#11547 merged
Jul 23, 2024 -
Fix typo in doc of Partitioning
#11612 merged
Jul 23, 2024 -
Chore/fifo tests cleanup
#11616 merged
Jul 23, 2024 -
support Decimal256 type in datafusion-proto
#11606 merged
Jul 23, 2024 -
Minor: Disable flaky SMJ antijoin filtered test until the fix
#11608 merged
Jul 23, 2024 -
Migrate `OrderSensitiveArrayAgg` to be a user defined aggregate
#11564 merged
Jul 23, 2024 -
Improve Union Equivalence Propagation
#11506 merged
Jul 22, 2024 -
Fix SortMergeJoin antijoin flaky condition
#11604 merged
Jul 22, 2024 -
Add support for Utf8View for date/temporal codepaths
#11518 merged
Jul 22, 2024 -
feat: Optimize CASE expression for usage where then and else values are literals
#11553 merged
Jul 22, 2024 -
Minor: move `Column` related tests and rename `column.rs`
#11573 merged
Jul 22, 2024 -
Move OutputRequirements to datafusion-physical-optimizer crate
#11579 merged
Jul 22, 2024 -
Provide DataFrame API for `map` and move `map` to `functions-array`
#11560 merged
Jul 22, 2024 -
fix: CASE with NULL
#11542 merged
Jul 22, 2024 -
feat: Error when a SHOW command is passed in with an accompanying nonexistent variable
#11540 merged
Jul 22, 2024 -
Move Datafusion Query Optimizer to library user guide
#11563 merged
Jul 22, 2024 -
chore: Minor cleanup `simplify_demo()` example
#11576 merged
Jul 22, 2024 -
Initial support for regex_replace on `StringViewArray`
#11556 merged
Jul 22, 2024 -
Fix unparser invalid sql for query with order
#11527 merged
Jul 22, 2024 -
Support SortMergeJoin spilling
#11218 merged
Jul 22, 2024 -
Support `newlines_in_values` CSV option
#11533 merged
Jul 21, 2024 -
Move `sql_compound_identifier_to_expr` to `ExprPlanner`
#11487 merged
Jul 21, 2024 -
refactor: rewrite mega type to an enum containing both cases
#11539 merged
Jul 21, 2024 -
fix: fixes trig function order by
#11559 merged
Jul 21, 2024 -
Minor: move `SessionStateDefaults` into its own module
#11566 merged
Jul 20, 2024
21 Pull requests opened by 14 people
-
Fix query referencing the same unnest expr in expr tree
#11577 opened
Jul 20, 2024 -
Add was_valid parameter to NullState callbacks
#11592 opened
Jul 22, 2024 -
feat: use Substrait's PrecisionTimestamp and PrecisionTimestampTz instead of deprecated Timestamp
#11597 opened
Jul 22, 2024 -
Extract `CoalesceBatchesStream` to a struct
#11610 opened
Jul 22, 2024 -
chore(deps): update substrait requirement from 0.36.0 to 0.38.0
#11613 opened
Jul 23, 2024 -
rfc: optional skipping partial aggregation
#11627 opened
Jul 23, 2024 -
Update cache key used in rust CI script
#11641 opened
Jul 24, 2024 -
Minor: improve documentation on `SessionState`
#11642 opened
Jul 24, 2024 -
Implement physical plan serialization for json Copy plans
#11645 opened
Jul 25, 2024 -
Prototype combined Repartition/Filter + Coalesce (WIP)
#11647 opened
Jul 25, 2024 -
Add LimitPushdown optimization rule and CoalesceBatchesExec fetch
#11652 opened
Jul 25, 2024 -
Ensure statistic defaults in parquet writers are in sync
#11656 opened
Jul 25, 2024 -
Change `--string-view` to only apply to parquet formats
#11663 opened
Jul 25, 2024 -
Provide actionable error messaging due to resource exhaustion.
#11665 opened
Jul 26, 2024 -
Rename `input_type` --> `input_types` on AggregateFunctionExpr / AccumulatorArgs / StateFieldsArgs
#11666 opened
Jul 26, 2024 -
Merge string-view2 branch to main
#11667 opened
Jul 26, 2024 -
Docs: adding explicit mention of test_utils to docs
#11670 opened
Jul 26, 2024 -
Custom planning behavior for selecting wildcard expression
#11673 opened
Jul 26, 2024 -
Increase ByteViewMap block size to 2MB
#11674 opened
Jul 26, 2024 -
Implement native support StringView for character length
#11676 opened
Jul 26, 2024 -
Do not push down Sorts if it violates the sort requirements
#11678 opened
Jul 26, 2024
39 Issues closed by 8 people
-
Use `AccumulatorArgs::is_reversed` in `NthValueAgg`
#11668 closed
Jul 27, 2024 -
Add nullable in `StateFieldArgs`
#11433 closed
Jul 26, 2024 -
Get Clippy clean for Rust 1.80 and run it on CI
#11657 closed
Jul 26, 2024 -
Internal error when regex operator `~` is used with `List`s (SQLancer)
#11622 closed
Jul 26, 2024 -
Support convert LogicalPlan JOIN with `Using` constraint to SQL String
#10652 closed
Jul 26, 2024 -
Document why nullable of list item does not map to schema of first argument
#11625 closed
Jul 26, 2024 -
circular dependency check CI check is failing with compile error
#11671 closed
Jul 26, 2024 -
Fix clippy lint for the number of arguments to `CsvExec::new()`
#11565 closed
Jul 25, 2024 -
Stop changing the case for COPY TO option values
#10853 closed
Jul 25, 2024 -
Clippy CI failures on main after Rust 1.80 release
#11651 closed
Jul 25, 2024 -
Make an end to end reproducer for zero column batch issues
#5713 closed
Jul 25, 2024 -
functions_aggregate::expr_fn::approx_distinct should expose the function, not the module
#11643 closed
Jul 25, 2024 -
Parsing SQL strings to Exprs with the qualified schema
#11551 closed
Jul 25, 2024 -
Make it easier to create WindowFunctions with the Expr API
#6747 closed
Jul 24, 2024 -
Implement `evaluate_bounds` for math unary functions
#11583 closed
Jul 24, 2024 -
The struct value should not have duplicate and null name
#11438 closed
Jul 24, 2024 -
signum function incompatible with Postgres and Apache Spark
#11557 closed
Jul 24, 2024 -
Rename `functions-array` to `functions-nested` to collect all nested-type functions
#11598 closed
Jul 24, 2024 -
Blog post with DataFusion Jan - June 2024
#9602 closed
Jul 24, 2024 -
Crash bug when `log()` is used in `order by` clause (SQLancer)
#11549 closed
Jul 24, 2024 -
Ability to chunk download from object store
#11609 closed
Jul 24, 2024 -
Internal Error for an INNER JOIN query (SQLancer)
#11412 closed
Jul 23, 2024 -
Typo in doc of datafusion::physical_plan::Partitioning
#11593 closed
Jul 23, 2024 -
Decimal256 type is not supported in datafusion-proto
#11607 closed
Jul 23, 2024 -
`SanityCheckPlan` Error during planning: ... does not satisfy parent order requirements: ...
#11492 closed
Jul 22, 2024 -
DataFusion weekly project plan (Andrew Lamb) - July 15, 2024
#11474 closed
Jul 22, 2024 -
[Epic] Complete pulling out special SQL planning from the Sql Parser
#11207 closed
Jul 22, 2024 -
[EPIC] Continued correct and improved extracting Parquet statistics into ArrayRefs
#10922 closed
Jul 22, 2024 -
Add a sub-project for map udf functions
#11572 closed
Jul 22, 2024 -
Easier Dataframe API for `map`
#11546 closed
Jul 22, 2024 -
`CASE` with `NULL` branch does not coerce when passed to aggregate function
#11258 closed
Jul 22, 2024 -
`SHOW NONSENSE` does not error
#11529 closed
Jul 22, 2024 -
Consolidate optimizer readme into datafusion user guide
#11497 closed
Jul 22, 2024 -
Move `sql_compound_identifier_to_expr` to `ExprPlanner`
#11473 closed
Jul 22, 2024 -
Add spilling in SortMergeJoin
#9359 closed
Jul 22, 2024 -
Add support for `newlines_in_values` to `CsvOptions`
#11472 closed
Jul 21, 2024 -
Add some structure to capture result of write orchestration function
#11443 closed
Jul 21, 2024 -
Query with `order by acos(sin(v1))` panic (SQLancer)
#11552 closed
Jul 21, 2024
32 Issues opened by 12 people
-
Improve performance of high cardinality grouping by reusing hash values
#11680 opened
Jul 26, 2024 -
[Epic] High cardinality aggregation performance wishlist
#11679 opened
Jul 26, 2024 -
Implement native `StringView` support for CharacterLength
#11677 opened
Jul 26, 2024 -
`pushdown_sorts` pushes a SortExec through a node in violation of its stated input ordering requirements
#11675 opened
Jul 26, 2024 -
Window function test fails when `force_hash_collisions` is enabled
#11660 opened
Jul 25, 2024 -
`equijoin_full_and_condition_from_both` slt test fails with `force_hash_collisions`
#11659 opened
Jul 25, 2024 -
Allow comparison of Timestamps with different Timezones
#11653 opened
Jul 25, 2024 -
Support per-option value normalization
#11650 opened
Jul 25, 2024 -
Allow custom planning behavior for selecting wildcard expression
#11639 opened
Jul 24, 2024 -
Optimize CASE expression for "expr or expr" usage
#11638 opened
Jul 24, 2024 -
A valid SQL query returned 'Schema error: ...' (SQLancer)
#11635 opened
Jul 24, 2024 -
Blog post with DataFusion July - Sep 2024
#11631 opened
Jul 24, 2024 -
Rewrite UDAF reversed expression name
#11629 opened
Jul 24, 2024 -
Reduce copying in `CoalesceBatchesExec` for StringViews
#11628 opened
Jul 23, 2024 -
Incorrect `NULL` handling for regex match `~` (SQLancer)
#11623 opened
Jul 23, 2024 -
Incorrect predicate evaluation result in a query (SQLancer-NoREC)
#11621 opened
Jul 23, 2024 -
Improve consistency and documentation on error handling in UDFs
#11618 opened
Jul 23, 2024 -
Optimize CASE expression to produce dictionary-encoded arrays in some cases
#11605 opened
Jul 22, 2024 -
Write a blog post about implementing StringView in DataFusion
#11603 opened
Jul 22, 2024 -
DataFusion weekly project plan (Andrew Lamb) - July 22, 2024
#11601 opened
Jul 22, 2024 -
Review code that `downcast_ref` from `CatalogSession` to `SessionState`
#11600 opened
Jul 22, 2024 -
Propagation of ordered `SortProperties` should consider `nulls_first`
#11596 opened
Jul 22, 2024 -
Serialization of UDF might lose aliases
#11595 opened
Jul 22, 2024 -
Expression Simplifier doesn't consider associativity (`(i + 1) + 2` is not simplified to `i + 3`)
#11594 opened
Jul 22, 2024 -
Add NullState::is_null public method
#11591 opened
Jul 22, 2024 -
Why is there no content in `Extending DataFusion’s operators: custom LogicalPlan and Execution Plan`
#11590 opened
Jul 22, 2024 -
Update Optimizer documentation
#11581 opened
Jul 21, 2024 -
Potential optimization for CASE WHEN for protecting against divide by zero
#11570 opened
Jul 20, 2024 -
Building time for `cargo bench` takes quite a long time
#11569 opened
Jul 20, 2024 -
Update ClickBench benchmarks with DataFusion 40
#11567 opened
Jul 20, 2024
37 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Move min and max to user defined aggregate function
#11013 commented on
Jul 26, 2024 • 7 new comments -
Minor: Rename `RepartitionMetrics::repartition_time` to `RepartitionMetrics::repart_time` to match metric
#11478 commented on
Jul 26, 2024 • 3 new comments -
Rename `ColumnOptions` to `ParquetColumnOptions`
#11512 commented on
Jul 26, 2024 • 1 new comment -
Implement `DynamicFileSchemaProvider` in the core
#11035 commented on
Jul 25, 2024 • 1 new comment -
Add support for SessionState in supports_filters_pushdown for a Custom Data Source
#11193 commented on
Jul 20, 2024 • 0 new comments -
Prototype implementing DataFusion functions / operators using `arrow-udf` library
#11413 commented on
Jul 24, 2024 • 0 new comments -
Improve `SingleDistinctToGroupBy` to get the same plan as the `group by` query
#11360 commented on
Jul 25, 2024 • 0 new comments -
Avoid extra copies in `CoalesceBatchesExec` to improve performance
#7957 commented on
Jul 25, 2024 • 0 new comments -
Return TableProviderFilterPushDown::Exact when Parquet Pushdown Enabled
#4028 commented on
Jul 25, 2024 • 0 new comments -
Intermittent failures in `fuzz_cases::join_fuzz::test_anti_join_1k_filtered`
#11555 commented on
Jul 25, 2024 • 0 new comments -
Support LogicalPlan --> `SQL String`
#8661 commented on
Jul 26, 2024 • 0 new comments -
Allow sorting to improve `FixedSizeBinary` filtering
#11170 commented on
Jul 26, 2024 • 0 new comments -
feat: support `grouping` aggregate function
#10208 commented on
Jul 22, 2024 • 0 new comments -
Feat: Implement hf:// / "hugging face" integration in datafusion-cli
#10792 commented on
Jul 21, 2024 • 0 new comments -
build(deps): update pyo3 requirement from 0.21.0 to 0.22.0
#11119 commented on
Jul 25, 2024 • 0 new comments -
add short circuit in BinaryExpr
#11247 commented on
Jul 24, 2024 • 0 new comments -
Update `prost`, `prost-derive`, `pbjson`, `tonic` ecosystem
#11372 commented on
Jul 23, 2024 • 0 new comments -
feat: precompile literal regex pattern
#11455 commented on
Jul 25, 2024 • 0 new comments -
Plan `LATERAL` subqueries
#11456 commented on
Jul 22, 2024 • 0 new comments -
Implement physical plan serialization for COPY plans `CsvLogicalExtensionCodec`
#11150 commented on
Jul 20, 2024 • 0 new comments -
Spatial data support
#7859 commented on
Jul 22, 2024 • 0 new comments -
[EPIC] Extract remaining physical optimizer out of core
#11502 commented on
Jul 22, 2024 • 0 new comments -
Internal error when there is a bitwise operation in `order by` clause (SQLancer)
#11561 commented on
Jul 22, 2024 • 0 new comments -
[EPIC] A collection of issues for supporting the `MAP` DataType
#11429 commented on
Jul 22, 2024 • 0 new comments -
[Epic] Implement support for `StringView` in DataFusion
#10918 commented on
Jul 22, 2024 • 0 new comments -
Resources exhausted errors are confusing; return the biggest memory consumers.
#11523 commented on
Jul 22, 2024 • 0 new comments -
Implement nested identifier access ( "Nested identifiers not yet supported" )
#11445 commented on
Jul 22, 2024 • 0 new comments -
[Epic] Remove Sort Merge Join Experimental status
#9846 commented on
Jul 22, 2024 • 0 new comments -
Optimize "per partition" top-k : `ROW_NUMBER < 5` / TopK
#6899 commented on
Jul 22, 2024 • 0 new comments -
Reduce repetition in `try_process_group_by_unnest` and `try_process_unnest`
#11498 commented on
Jul 23, 2024 • 0 new comments -
Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time
#10602 commented on
Jul 23, 2024 • 0 new comments -
Enable `split_file_groups_by_statistics` by default
#10336 commented on
Jul 23, 2024 • 0 new comments -
[Epic] Unify `WindowFunction` Interface (remove built in list of `BuiltInWindowFunction`s)
#8709 commented on
Jul 24, 2024 • 0 new comments -
Review use of logical expressions in physical AggregateFunctionExpr
#11359 commented on
Jul 24, 2024 • 0 new comments -
Improve Memory usage + performance with large numbers of groups / High Cardinality Aggregates
#6937 commented on
Jul 24, 2024 • 0 new comments -
[Proposal] Decouple logical from physical types
#11513 commented on
Jul 24, 2024 • 0 new comments -
Add has_side_effects to PhysicalExpr
#11490 commented on
Jul 24, 2024 • 0 new comments