-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT] date and timestamp parsers #2353
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work so far. Overall, to_date
looks good, and to_datetime
can be enhanced to support timeunit inference and timezones
@@ -1436,6 +1436,30 @@ def substr(self, start: int | Expression, length: int | Expression | None = None | |||
length_expr = Expression._to_expression(length) | |||
return Expression._from_pyexpr(self._expr.utf8_substr(start_expr._expr, length_expr._expr)) | |||
|
|||
def to_date(self, format: str | Expression) -> Expression: | |||
"""Converts a string to a date using the specified format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add link to the chrono docs here that specify valid format strings? I believe its here: https://docs.rs/chrono/latest/chrono/format/strftime/index.html
And also could you do the same for to_datetime as well
daft/expressions/expressions.py
Outdated
@@ -1436,6 +1436,30 @@ def substr(self, start: int | Expression, length: int | Expression | None = None | |||
length_expr = Expression._to_expression(length) | |||
return Expression._from_pyexpr(self._expr.utf8_substr(start_expr._expr, length_expr._expr)) | |||
|
|||
def to_date(self, format: str | Expression) -> Expression: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets change this to just accept str
. See the comment below for changing to_datetime
to accept a single str
as well.
daft/expressions/expressions.py
Outdated
format_expr = Expression._to_expression(format) | ||
return Expression._from_pyexpr(self._expr.utf8_to_date(format_expr._expr)) | ||
|
||
def to_datetime(self, format: str | Expression) -> Expression: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to support different timeunits, as well as datetimes with timezones. To do this, lets change the function signature to to_datetime(self, format : str, timezone: str | None = None)
By constraining format to be a single str, we can infer the correct timeunit when we call to_field
in src/daft-dsl/src/functions/utf8/to_datetime.rs by checking if the format string contains nanosecond, i.e. %9f
or %.9f
, or millisecond, i.e. %.3f
or %3f
, precision, otherwise default to microsecond.
By allowing the timezone: str | None = None
argument, we can parse the string using chrono::NaiveDateTime::parse_from_str
or chrono::DateTime::parse_from_str
at the array level.
with pytest.raises(ValueError): | ||
s.str.to_date(formats) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add tests for datetimes with timezones as well. A particular edge case to consider would be columns with different timezones, where each entry should be converted to the user provided timezone.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2353 +/- ##
=======================================
Coverage ? 63.51%
=======================================
Files ? 908
Lines ? 102078
Branches ? 0
=======================================
Hits ? 64831
Misses ? 37247
Partials ? 0
|
@murex971 Is this ready for rereview? |
Thanks for this @murex971 ! I just made some additional commits to get this to the finish line, didn't think another round of back and forth was needed :) |
Resolves #2309
Resolves #2302