Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeseries object for Astropy (APE9) #12

Merged
merged 7 commits into from
Oct 20, 2023
Merged

Timeseries object for Astropy (APE9) #12

merged 7 commits into from
Oct 20, 2023

Conversation

Cadair
Copy link
Member

@Cadair Cadair commented Mar 24, 2016

This is still in very early draft.

@abigailStev
Copy link

Pinging myself so I can follow development and discussion.

@ehsteve
Copy link

ehsteve commented Mar 30, 2016

I am very interested in this. Thanks for putting this together @Cadair.

@ehsteve
Copy link

ehsteve commented Mar 30, 2016

and thanks to everyone that participate in the discussion at PyAstro16 :P

Abstract
--------

The goal of a timeseries object in astropy is to provide a core set of
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the goal of sharing. that's a general astropy goal.

@ehsteve
Copy link

ehsteve commented Mar 30, 2016

I think it would be helpful to list a number of examples of analyses that we would want to perform on these.

@duncanmmacleod
Copy link

I am interested in this feature, and think it would be a great addition. GWpy would likely move to using/inheriting this object, rather than the current gwpy.timeseries.TimeSeries (which inherits from astropy.units.Quantity in the first place).
I'm happy to help with development and testing as desired.

@abigailStev
Copy link

I am in support of this APE! And I'm available to help with testing.

@eteq
Copy link
Member

eteq commented May 12, 2017

I like it and would use it.

@eaydin
Copy link

eaydin commented May 12, 2017

I love it!

@abigailStev
Copy link

I would like for @StingraySoftware to move towards using a TimeSeries object (or at least basing our Lightcurve class on it) for lightcurves.

APE9.rst Outdated

This APE proposes that we place the following restrictions on binned data:

#. Contiguious bins, i.e. the start of the i+1th bin is the end of the ith bin.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we decided this was not going to be necessary.

Many different areas of astrophysics have to deal with 1D timeseries data, i.e.
either sampling a continuous variable at fixed times or counting some events
binned into time windows. These types of applications require some basic
functionality:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many of these features already exist in QTable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some, but all would need to be customised for timeseries data (i.e. to preserve the index correctly.)

Mainly examples on the difference in construction between binned and sampled data.
APE9.rst Outdated
functionality:

#. Extending timeseries with extra rows
#. Concatenating multiple timeseries objects
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-word to clarify that concatenation is extra columns.


Initialize a ``SampledTimeSeries`` with a time and a data column::

ts = SampledTimeSeries(time=['2016-03-22T12:30:31', '2016-03-22T12:30:32', '2016-03-22T12:30:40'],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be able to construct from a start time and a list of "deltas"?

API. Some examples of constructing these objects is given below.


Initialize a ``SampledTimeSeries`` with a time and a data column::
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section should also demonstrate constructing these objects with Time and TimeDelta so that it's clear that's a thing.

APE9.rst Outdated
#. Slicing / selecting time ranges (indexing)
#. Re-binning and re-sampling timeseries
#. Interpolating to different time stamps.
#. Masking
Copy link
Member Author

@Cadair Cadair May 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and missing value (NaN)

@mwcraig
Copy link
Member

mwcraig commented Nov 9, 2018

I'm late to this, but have a couple things to add:

  • I like the TimeSeries/BinnedTimeSeries option more than the original Sampled... and Binned...
  • I'd propose (this may be a little extreme) that in the BinnedTimeSeries there not be a time property at all. Instead, I'd go with the more explicit start_time, end_time and mid_time (or something along those lines). It is a little more verbose but has the virtue of being very clear what time you were using without needing to track down the documentation.
@gbelanger
Copy link

gbelanger commented Nov 10, 2018

Good morning,

I just read through this thread and find the discussion very useful and important. I have implemented, over the past 12 years, a bunch of packages for time domain analysis in Java, and have had to make all the choices that are being discussed above in terms of how to represent time-related data.

Because my background is high energy astrophysics, it was clear from the very start that I needed to have EventList and TimeSeries as different objects. As it was mentioned above, EvenList can have other characteristics in addition to arrivalTime such as x, y, energy, and a quality flag, for example. In the TimeSeries this is not the case. But there can be fundamentally different kinds of time series, like what I've called CodedMaskTimeSeries that carries a lot more information than just intensity as a function of time, because the time series is actually reconstructed from images, and the errors depend on the angle at which the source was in the field of view, and on and on.

So, I have a package for event lists and a package of time series, and in the time series package, I have an interface ITimeSeries, and abstract class AbstractTimeSeries, and two concrete classes TimeSeries, and CodedMaskTimeSeries going from the general to more specific, each class inheriting from the previous and each implementing the interface. I also have an interface ITimeSeriesFileReader with specific file readers that implement it for different kinds of input files (fits, ascii, qdp), such that the TimeSeriesReader just loops through to try each implemented type when given an input file. Similarly, there are several TimeSeriesFileWriters.

The similarity between an EventList and TimeSeries in terms of its temporal information carrying capacity is used in the factory classes that can handle time data: the TimeSeriesMaker and PeriodogramMaker. These have methods to deal with event lists and with time series as inputs. For the periodograms, it is again with an AbstractPeriodogram that is used as the base, and then different types that extend it (FFTPeriodogram, RayleighPeriodogram, LombScarglePeriodogram, ZPeriodogram, ModifiedRayleighPeriodgram, LikelihoodPeriodogram, as well as AggregatePeriodogram to combine periodograms and AveragePeriodogram to express the result, which is the only periodogram that has uncertainties on the powers, because they are not used in any other one since the power a each frequency in a periodogram is a point estimate.

Lastly, I have been recently re-writing the resampling of time series in order to make it more manageable, because dealing with resampling of a binned time series in a general way is very complicated. My time series are allowed to have both irregular sized bins and irregular spaces in between them, and everything is written to handle that. So, I have been working on decomposing the time series into a collection of intensity bins made up of (composition) an Intensity and a Bin, each of which carries specific information that defines everything we need to know. In this way, we can handle each IntensityBin one at a time. This is ongoing and I'm still sorting out difficulties like how to deal with different kinds of intensities (absolute quantity or density, for example) in a general manner.

But in conclusion, I think that firstly, having EventList and TimeSeries objects makes sense, because an EventList is not really a kind of TimeSeries even though its arrival times can be used to make one. And secondly, it is very important to look down the line and consider all the kinds of things that will be done with the EventList and TimeSeries (e.g. periodograms or resampling) so that it doesn't become too difficult to do them when we get there. Please let me know if you want more details about anything addressed in this comment. I'd be happy to help further. Unfortunately, I haven't yet started coding in python. Maybe it's the time to start ;) I also need to update my repos on GitHub which I promise to do soon.

@astrofrog
Copy link
Member

@gbelanger - thank you for your insight - it's always really helpful to know that people have gone through similar design questions in the past and see how they have solved them. I think it would be really helpful if you take a look at what we have so far (once ready to try out) and see if you have any suggestions (and it would be great if you were interested in contributing!)

@astrofrog
Copy link
Member

astrofrog commented Nov 16, 2018

I've now had a chance to digest all the suggestions above and have now updated the prototype implementation. I've also now put the implementation in a standalone package to make it easy to try out. The repository and docs are here:

:octocat: https://github.com/aperiosoftware/astropy-timeseries
📖 https://astropy-timeseries.readthedocs.io/en/latest/

It would be really helpful if people here can take it for a spin to see if you like the current API. The way I've set up this package is such that once we open a PR to astropy core, all commits will be preserved. Therefore, feel free to open pull requests, or issues! If you'd like to work on some of the open issues, just leave a comment!

You can easily install the package with:

pip install git+https://github.com/aperiosoftware/astropy-timeseries.git

and note that things will work better if you have the latest developer version of Astropy (or the 3.1rc1).

A few updates based on the above discussion:

  • I've decided to stick with separate classes for the binned and sampled time series for now, but calling the sampled one TimeSeries rather than SampledTimeSeries

  • TimeSeries has a time attribute, while BinnedTimeSeries instead has time_bin_start, time_bin_center, time_bin_end, time_bin_size attributes to make sure everything is explicit.

  • I've moved some of the functionality (e.g. the downsampling) out of the base class since the implementation that's there may not be appropriate for all cases. In general I think the base time series classes should be as generic as possible, and we or affiliated packages can then implement sub-classes if needed.

  • I don't think we need to take a stance on event lists - if people want to use TimeSeries for event lists, they are free to do so, but some packages may want to implement their own specialized event list class.

Let me know if you have any feedback!

@StuartLittlefair
Copy link

StuartLittlefair commented Nov 18, 2018

I'm a little late to this party, but having looked at and tried the prototype I have a suggestion which might make the TimeSeries class more powerful. Unfortunately I think it would mean a quite different implementation.

If I imagine my use cases for a class, it is to handle collections of data, which have a matching time attribute. Typically, data has values, uncertainty, and perhaps a mask indicating data quality - just like the NDData class.

It would also be incredibly powerful if arithmetic on TimeSeries objects performed arithmetic on matching "columns", whilst handling the uncertainties and masks automatically. Showing, an example, wouldn't it be great if something like this worked...

flux_1 = NDDataArray(data=y_1, uncertainty=StdDevUncertainty(yerr_1), flags=flags_1) 
flux_2 = NDDataArray(data=y_2, uncertainty=StdDevUncertainty(yerr_2), flags=flags_2) 
ts_1 = TimeSeries(time=t, data={'flux': flux_1})
ts_2 = TimeSeries(time=t, data={'flux': flux_2})
rel_ts = ts_1/ts_2

...and the rel_ts object also had a flux entry which had the correct values, uncertainty and mask? Users could subclass NDDataRef to get have the flags combine anyway they'd like.

Such behaviour is compatible with the APE as it stands, but can't be achieved if a TimeSeries is a subclass of Table, since they don't support having NDDataArray as a column, as far as I'm aware. One would lose the ability to join, stack etc - but I'd argue that the functionality above is more powerful for many users. I've posted a highly incomplete prototype to illustrate how it might be implemented here.

Sorry if this suggestion comes after many people have done a lot of hard work; I hadn't realised this issue until I started to play with the implementation. Feel free to ignore this, or point out glaring errors

@astrofrog
Copy link
Member

astrofrog commented Nov 19, 2018

@StuartLittlefair - thanks for the suggestion! I agree that using NDData might work well for some use cases. I'd like to make two suggestions:

  • In specutils, we made a spectral mix-in class that can be used to make one of the NDData dimensions be a spectral axis. We could do the same for time, i.e. make a mix-in class that provides an easy way to set up one of the axes of an NDData cube as a time axis. Note that this would then also allow one to set up cubes with two spatial, one spectral, and one time axis. To me, this is complementary to the Table approach - in some cases people want to think of time series as being simple tables, and in others people might want to think of it as an n-dimensional cube (where n>=1) where one axis is time. (Implement time series mix-in class for NDData aperiosoftware/astropy-timeseries#16)

  • There's no reason we can't make NDDataArray be used as a mix-in column in tables - in fact I'll open an issue about that now! (Make it possible to use NDDataArray as a mix-in column in tables astropy#8159)

I think both of these can be done of course, we don't have to choose.

I'll also mention that for NDData, we changed the API a few times because we ended up making the base class try and be too smart, and some functionality is best left to sub-classes. I think we will need to do the same here - that is, functionality like rel_ts = ts_1/ts_2 makes too many assumptions to belong in a base time series class. However, we could always develop a sub-class that behaves in the way you indicate, as a convenience layer.

I'll try and work on an NDData mix-in class implementation to see how much work it might be and whether it would suit your use case. I'll also see if I can re-use some of your prototype code.

@gully
Copy link

gully commented Oct 11, 2019

Should we merge this APE since astropy.timeseries is merged?

@bsipocz
Copy link
Member

bsipocz commented Oct 11, 2019

@gully - I would rather wait until 4.0, and ask everyone (especially folks with non single band optical photometry use cases) here to give the current version a good try and report back issues or missing functionalities.

@gully
Copy link

gully commented Oct 11, 2019

Thank you @bsipocz , indeed learning from the existing timeseries implementation should feed back to the long term vision.

@bsipocz
Copy link
Member

bsipocz commented Oct 11, 2019

(well, it's not my call to anyway to merge this, but the feedback I heard so far is that the current api is very much kepler type light curve centric. but it's very very difficult to anything about it without actual issues being opened...)

@gully
Copy link

gully commented Oct 11, 2019

Interesting, I should try out the timeseries class more to see what feedback I can give. I am curious about multichannel and sparse applications too, despite being Kepler-centric much of the time. I'm especially interested in use cases that combine sparse ground-based observations with high cadence space based observations. By "combine", I don't necessarily mean in the same "flat" timeseries object, though that's conceivable. I just mean making sparse and high cadence interoperate more easily would unlock some good science. The different bandpasses comes up a lot too.

I re-read the APE governance protocol on the README and appreciate that a lot goes into formalizing the APE, so my use of "merge" wasn't the right choice. I had thought an APE was a pre-requisite to making new functionality, but I see now that making an experimental implementation further informs use cases and guides the functionality in the longer term.

@bsipocz
Copy link
Member

bsipocz commented Oct 11, 2019

@gully - make sure you're playing with the dev version as there are some substantial improvements in the underlying classes that were necessary to be able to extend the functionalities (though none of those are reflected in the narrative docs yet)

@gully gully mentioned this pull request Oct 12, 2019
@Jerry-Ma
Copy link

Jerry-Ma commented Oct 23, 2019

Hi,

I am working on the TolTEC project and I am wring python code for handling the time stream data collected from an array of bolometers that scans in time.

As already pointed out in the previous comments, since each time sample contains tightly packed data (in our case, a vector of size ~4000, matching the total number of detectors in the array), it would make most sense to have an NDData as the storage and have timeaxis mixin attached to one of the axis.

I saw that #16 is still open and I was wondering what is the status of that?

Base automatically changed from master to main March 10, 2021 17:58
@pllim
Copy link
Member

pllim commented Jun 9, 2022

So... astropy/astropy#8540 was merged and released in v3.2. Where are we going with this in 2022?

@pllim
Copy link
Member

pllim commented May 31, 2023

Pinging again in 2023. Are we just going to wrap this up since timeseries is already in core and all that?

@astrofrog
Copy link
Member

We should have a "Superseded by events" status for APEs 😅 (jk!)

@gully
Copy link

gully commented Aug 4, 2023

So is this APE complete now that timeseries exists? I guess a repeat of the question above. Would folks say that the aspirations of this APE followed through in the implementation?

@pllim
Copy link
Member

pllim commented Aug 4, 2023

Would have to ping @astrofrog and @Cadair again

APE9.rst Outdated Show resolved Hide resolved
APE9.rst Outdated Show resolved Hide resolved
APE9.rst Outdated Show resolved Hide resolved
@hamogu hamogu merged commit 0eb5cf8 into astropy:main Oct 20, 2023
@hamogu
Copy link
Member

hamogu commented Oct 20, 2023

@eteq Please follow through with the Zenodo steps when they are online again (they had trouble minting DOIs the last few days).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment