-
Applying machine learning to Galactic Archaeology: how well can we recover the origin of stars in Milky Way-like galaxies?
Authors:
Andrea Sante,
Andreea S. Font,
Sandra Ortega-Martorell,
Ivan Olier,
Ian G. McCarthy
Abstract:
We present several machine learning (ML) models developed to efficiently separate stars formed in-situ in Milky Way-type galaxies from those that were formed externally and later accreted. These models, which include examples from artificial neural networks, decision trees and dimensionality reduction techniques, are trained on a sample of disc-like, Milky Way-mass galaxies drawn from the ARTEMIS cosmological hydrodynamical zoom-in simulations. We find that the input parameters that provide optimal performance for these models consist of a combination of stellar positions, kinematics, chemical abundances ([Fe/H] and [$α$/Fe]) and photometric properties. Models from all categories perform similarly well, with area under the precision-recall curve (PR-AUC) scores of $\simeq 0.6$. Beyond a galactocentric radius of $5$ kpc, models retrieve $>90\%$ of accreted stars, with a sample purity close to $60\%$; however, the purity can be increased by adjusting the classification threshold. For one model, we also include host galaxy-specific properties in the training, to account for the variability of accretion histories of the hosts; however, this does not lead to an improvement in performance. The ML models can identify accreted stars even in regions heavily dominated by the in-situ component (e.g., in the disc), and perform well on an unseen suite of simulations (the Auriga simulations). This general applicability bodes well for applying such methods to observational data to identify accreted substructures in the Milky Way without the need to resort to selection cuts for minimising the contamination from in-situ stars.
Submitted 18 June, 2024; v1 submitted 30 April, 2024;
originally announced May 2024.
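The abstract above notes that sample purity can be raised by adjusting the classification threshold. The following is a minimal sketch of that trade-off, not the authors' code: the function name `purity_and_recall`, the toy labels and the toy scores are all hypothetical, with label 1 standing for "accreted" and 0 for "in-situ".

```python
def purity_and_recall(labels, scores, threshold):
    """Return (purity, recall) for the positive class at a given threshold.

    Purity is the precision of the selected sample; recall is the fraction
    of true positives that are retrieved. Both names are illustrative.
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if p and y == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if p and y == 0)
    fn = sum(1 for y, p in zip(labels, predicted) if not p and y == 1)
    purity = tp / (tp + fp) if tp + fp else 1.0  # empty selection: no contaminants
    recall = tp / (tp + fn) if tp + fn else 0.0
    return purity, recall


# Toy data (invented for illustration): 1 = accreted, 0 = in-situ.
labels = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.7, 0.4, 0.3, 0.2, 0.55, 0.65, 0.1]

for threshold in (0.5, 0.75):
    purity, recall = purity_and_recall(labels, scores, threshold)
    print(f"threshold={threshold:.2f}  purity={purity:.2f}  recall={recall:.2f}")
```

On this toy sample, raising the threshold from 0.5 to 0.75 removes the in-situ contaminants from the selection but also discards half of the accreted stars, which is the kind of purity versus completeness trade-off the abstract describes.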
-
Machine Learning applications for Cataclysmic Variable discovery in the ZTF alert stream
Authors:
D. Mistry,
C. M. Copperwheat,
I. Olier,
M. J. Darnley
Abstract:
Cataclysmic variables (CVs) encompass a diverse array of accreting white dwarf binary systems. Each class of CV represents a snapshot along an evolutionary journey, one with the potential to trigger a type Ia supernova event. The study of CVs offers valuable insights into binary evolution and accretion physics, with the rarest examples potentially providing the deepest insights. However, the escalating number of detected transients, coupled with our limited capacity to investigate them all, poses challenges in identifying such rarities. Machine Learning (ML) plays a pivotal role in addressing this issue by facilitating the categorisation of each detected transient into its respective transient class. Leveraging these techniques, we have developed a two-stage pipeline tailored to the ZTF transient alert stream. The first stage is an alerts filter aimed at removing non-CVs, while the second is an ML classifier produced using XGBoost, achieving a macro average AUC score of 0.92 for distinguishing between CV classes. By utilising the Generative Topographic Mapping algorithm with classifier posterior probabilities as input, we obtain representations indicating that CV evolutionary factors play a role in classifier performance, while the associated feature maps present a potent tool for identifying the features deemed most relevant for distinguishing between classes. Implementation of the pipeline in June 2023 yielded 51 intriguing candidates that are yet to be reported as CVs or classified with further granularity. Our classifier represents a significant step in the discovery and classification of different CV classes, a domain of research still in its infancy.
Submitted 9 December, 2023;
originally announced December 2023.
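The abstract above quotes a macro average AUC for a multi-class classifier. As a rough illustration of what such a score measures, the sketch below computes an unweighted mean of one-vs-rest AUCs in plain Python. The one-vs-rest scheme is an assumption here (the abstract does not say how the average was formed), and the class names `A`, `B`, `C`, the probabilities and the function names are hypothetical.

```python
def binary_auc(is_positive, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(is_positive, scores) if y]
    neg = [s for y, s in zip(is_positive, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(true_classes, probabilities, classes):
    """Unweighted mean of per-class one-vs-rest AUCs.

    probabilities[i][k] is the predicted probability that sample i
    belongs to classes[k].
    """
    aucs = []
    for k, cls in enumerate(classes):
        one_vs_rest = [c == cls for c in true_classes]
        class_scores = [row[k] for row in probabilities]
        aucs.append(binary_auc(one_vs_rest, class_scores))
    return sum(aucs) / len(aucs)


# Toy data (invented for illustration): three placeholder classes.
classes = ["A", "B", "C"]
true_classes = ["A", "A", "B", "B", "C", "C"]
probabilities = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.3, 0.5],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]

print(f"macro one-vs-rest AUC = {macro_ovr_auc(true_classes, probabilities, classes):.3f}")
```

Because every class contributes equally to the mean, a macro average of this kind does not let a numerous class mask poor performance on a rare one, which matters when the rarest subtypes are the targets of interest.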
-
Machine Learning based search for Cataclysmic Variables within Gaia Science Alerts
Authors:
D. Mistry,
C. M. Copperwheat,
M. J. Darnley,
I. Olier
Abstract:
Wide-field time domain facilities detect transient events in large numbers through difference imaging. For example, the Zwicky Transient Facility produces alerts for hundreds of thousands of transient events per night, a rate set to be dwarfed by the upcoming Vera Rubin Observatory. The automation provided by Machine Learning (ML) is, therefore, necessary to classify these events and select the most interesting sources for follow-up observations. Cataclysmic Variables (CVs) are a class of transients that are numerous, bright, and nearby, providing excellent laboratories for the study of accretion and binary evolution. Here we focus on our use of ML to identify CVs from photometric data of transient sources published by the Gaia Science Alerts program (GSA), a large, easily accessible resource not yet fully explored with ML. The use of light curve feature extraction techniques and source metadata from the Gaia survey resulted in a Random Forest model capable of distinguishing CVs from supernovae, Active Galactic Nuclei, and Young Stellar Objects with a 92\% precision score and an 85\% hit rate. Of 13,280 sources within GSA without an assigned transient classification, our model predicts the CV class for $\sim$2800. Spectroscopic observations are underway to classify a statistically significant sample of these targets to validate the performance of the model. This work puts us on a path towards the classification of rare CV subtypes from future wide-field surveys such as the Legacy Survey of Space and Time.
Submitted 4 October, 2022;
originally announced October 2022.