-
M-dwarf flares in the Zwicky Transient Facility data and what we can learn from them
Authors:
A. S. Voloshina,
A. D. Lavrukhina,
M. V. Pruzhinskaya,
K. L. Malanchev,
E. E. O. Ishida,
V. V. Krushinsky,
P. D. Aleo,
E. Gangler,
M. V. Kornilov,
V. S. Korolev,
E. Russeil,
T. A. Semenikhin,
S. Sreejith,
A. A. Volnova
Abstract:
In this paper, we explore the possibility of detecting M-dwarf flares using data from the Zwicky Transient Facility data releases (ZTF DRs). We employ two different approaches: the traditional method of parametric fit search and a machine learning algorithm originally developed for anomaly detection. We analyzed over 35 million ZTF light curves and visually scrutinized 1168 candidates suggested by the algorithms to filter out artifacts, occultations of a star by an asteroid, and known variable objects of other types. Our final sample comprises 134 flares with amplitude ranging from 0.2 to 4.6 magnitudes, including repeated flares and complex flares with multiple components. Using Pan-STARRS DR2 colors, we also assigned a corresponding spectral subclass to each object in the sample. For 13 flares with well-sampled light curves, we estimated the bolometric energy. Our results show that the ZTF's cadence strategy is suitable for identifying M-dwarf flares and other fast transients, allowing for the extraction of significant astrophysical information from their light curves.
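To give a feel for the quoted amplitude range, the short sketch below converts a brightening in magnitudes into a flux ratio using the standard Pogson relation. The function name is hypothetical and is not taken from the paper.

```python
def amplitude_to_flux_ratio(delta_mag: float) -> float:
    """Peak-to-quiescent flux ratio for a brightening of delta_mag magnitudes.

    Uses the Pogson relation: a 2.5 mag brightening is a factor of 10 in flux.
    """
    return 10.0 ** (0.4 * delta_mag)


# The two ends of the amplitude range quoted in the abstract.
for amplitude in (0.2, 4.6):
    ratio = amplitude_to_flux_ratio(amplitude)
    print(f"{amplitude} mag -> flux ratio {ratio:.1f}")
```

Under this relation the faintest flares in the sample brighten the star by roughly 20 percent, while the strongest raise its flux by a factor of about 69.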
Submitted 11 April, 2024;
originally announced April 2024.
-
Point Spread Function Deconvolution Using a Convolutional Autoencoder for Astronomical Applications
Authors:
Sreevarsha Sreejith,
Anže Slosar,
Hong Wang
Abstract:
A major issue in optical astronomical image analysis is the combined effect of the instrument's point spread function (PSF) and the atmospheric seeing that blurs images and changes their shape in a way that is band and time-of-observation dependent. In this work we present a very simple neural network based approach to non-blind image deconvolution that relies on feeding a Convolutional Autoencoder (CAE) input images that have been preprocessed by convolution with the corresponding PSF and its regularized inverse. Compared to our previous work based on Deep Wiener Deconvolution, the new approach is conceptually simpler and computationally much less intensive while achieving only marginally worse results. In this work we also present a new approach for dealing with limited input dynamic range of neural networks compared to the dynamic range present in astronomical images.
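The idea of applying a PSF and its regularized inverse can be illustrated in one dimension. The sketch below is not the paper's preprocessing or its autoencoder: it assumes a Tikhonov-style regularized inverse filter, a circular (periodic) convolution, and a naive discrete Fourier transform so that it runs with the standard library only. All function names and the `eps` parameter are hypothetical.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short signals)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            values[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
            for j in range(n)
        )
        out.append(total / n if inverse else total)
    return out


def circular_convolve(signal, psf):
    """Blur a signal with a PSF using periodic boundary conditions."""
    n = len(signal)
    return [
        sum(signal[j] * psf[(i - j) % n] for j in range(n))
        for i in range(n)
    ]


def regularized_deconvolve(blurred, psf, eps=1e-3):
    """Apply the regularized inverse conj(H) / (|H|^2 + eps) in Fourier space.

    eps (an assumed tuning parameter) keeps the division stable at
    frequencies where the PSF transfers little power.
    """
    b_hat = dft(blurred)
    h_hat = dft(psf)
    estimate = [
        b * h.conjugate() / (abs(h) ** 2 + eps)
        for b, h in zip(b_hat, h_hat)
    ]
    return [value.real for value in dft(estimate, inverse=True)]


signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 2.0, 0.0]
psf = [0.6, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]  # symmetric, sums to 1
blurred = circular_convolve(signal, psf)
restored = regularized_deconvolve(blurred, psf, eps=1e-6)
print([round(value, 3) for value in restored])
```

With noiseless data and a small `eps` the original signal is recovered closely; with noisy data a larger `eps` trades sharpness for stability, which is the kind of imperfection a learned model can then be trained to clean up.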
Submitted 30 October, 2023;
originally announced October 2023.
-
Rainbow: a colorful approach on multi-passband light curve estimation
Authors:
E. Russeil,
K. L. Malanchev,
P. D. Aleo,
E. E. O. Ishida,
M. V. Pruzhinskaya,
E. Gangler,
A. D. Lavrukhina,
A. A. Volnova,
A. Voloshina,
T. Semenikhin,
S. Sreejith,
M. V. Kornilov,
V. S. Korolev
Abstract:
We present Rainbow, a physically motivated framework which enables simultaneous multi-band light curve fitting. It allows the user to construct a 2-dimensional continuous surface across wavelength and time, even in situations where the number of observations in each filter is significantly limited. Assuming the electromagnetic radiation emission from the transient can be approximated by a black-body, we combine an expected temperature evolution with a parametric function describing its bolometric light curve. These three ingredients allow the information available in one passband to guide the reconstruction in the others, thus enabling a proper use of multi-survey data. We demonstrate the effectiveness of our method by applying it to simulated data from the Photometric LSST Astronomical Time-series Classification Challenge (PLAsTiCC) as well as real data from the Young Supernova Experiment (YSE DR1). We evaluate the quality of the estimated light curves according to three different tests: goodness of fit, time of peak prediction and ability to transfer information to machine learning (ML) based classifiers. Results confirm that Rainbow leads to equivalent (SN II) or up to 75% better (SN Ibc) goodness of fit when compared to the Monochromatic approach. Similarly, accuracy when using Rainbow best-fit values as a parameter space in multi-class ML classification improves for all classes in our sample. An efficient implementation of Rainbow has been publicly released as part of the light curve package at https://github.com/light-curve/light-curve-python. Our approach enables straightforward light curve estimation for objects with observations in multiple filters and from multiple experiments. It is particularly well suited for situations where light curve sampling is sparse.
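The three ingredients named in the abstract (a black-body spectrum, a temperature evolution, and a bolometric light curve) can be combined as in the following sketch. It is not the released implementation: the Bazin-like bolometric function, the logistic temperature curve, all parameter values, and the function names are assumptions chosen only for illustration.

```python
import math

H = 6.62607015e-34         # Planck constant, J s
K_B = 1.380649e-23         # Boltzmann constant, J / K
C = 2.99792458e8           # speed of light, m / s
SIGMA_SB = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def planck_nu(nu, temp):
    """Black-body spectral radiance B_nu(T) in SI units."""
    x = H * nu / (K_B * temp)
    return 2.0 * H * nu**3 / C**2 / math.expm1(x)


def bolometric_flux(t, amplitude=1.0, t0=0.0, rise=3.0, fall=20.0):
    """Assumed Bazin-like rise and decline of the total flux (arbitrary units)."""
    return amplitude * math.exp(-(t - t0) / fall) / (1.0 + math.exp(-(t - t0) / rise))


def temperature(t, t0=0.0, t_min=5000.0, t_max=12000.0, t_color=10.0):
    """Assumed logistic cooling from t_max to t_min, in kelvin."""
    return t_min + (t_max - t_min) / (1.0 + math.exp((t - t0) / t_color))


def spectral_flux(t, nu):
    """Flux density at time t (days) and frequency nu (Hz).

    The black-body shape is normalized so that integrating over all
    frequencies returns the bolometric flux at that time.
    """
    temp = temperature(t)
    shape = math.pi * planck_nu(nu, temp) / (SIGMA_SB * temp**4)
    return bolometric_flux(t) * shape


# Evaluate the same model surface in two bands (approximate g and r frequencies).
for label, nu in (("g", 6.3e14), ("r", 4.8e14)):
    print(label, [f"{spectral_flux(t, nu):.3e}" for t in (-5.0, 0.0, 10.0, 30.0)])
```

Because every band is drawn from one surface in time and frequency, a well-sampled band constrains the shared parameters and thereby the prediction in a sparsely sampled one.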
Submitted 5 October, 2023; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Are classification metrics good proxies for SN Ia cosmological constraining power?
Authors:
Alex I. Malz,
Mi Dai,
Kara A. Ponder,
Emille E. O. Ishida,
Santiago Gonzalez-Gaitan,
Rupesh Durgesh,
Alberto Krone-Martins,
Rafael S. de Souza,
Noble Kennamer,
Sreevarsha Sreejith,
Lluis Galbany,
The LSST Dark Energy Science Collaboration,
The Cosmostatistics Initiative
Abstract:
Context: When selecting a classifier to use for a supernova Ia (SN Ia) cosmological analysis, it is common to make decisions based on metrics of classification performance, i.e. contamination within the photometrically classified SN Ia sample, rather than a measure of cosmological constraining power. If the former is an appropriate proxy for the latter, this practice would save those designing an analysis pipeline from the computational expense of a full cosmology forecast. Aims: This study tests the assumption that classification metrics are an appropriate proxy for cosmology metrics. Methods: We emulate photometric SN Ia cosmology samples with controlled contamination rates of individual contaminant classes and evaluate each of them under a set of classification metrics. We then derive cosmological parameter constraints from all samples under two common analysis approaches and quantify the impact of contamination by each contaminant class on the resulting cosmological parameter estimates. Results: We observe that cosmology metrics are sensitive to both the contamination rate and the class of the contaminating population, whereas the classification metrics are insensitive to the latter. Conclusions: We therefore discourage exclusive reliance on classification-based metrics for cosmological analysis design decisions, e.g. classifier choice, and instead recommend optimizing using a metric of cosmological parameter constraining power.
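The distinction drawn in the Results can be made concrete with a toy calculation. In the sketch below, contamination is taken to be the fraction of non-Ia objects among those classified as SN Ia; the function name and the example labels are hypothetical and are not data from the paper.

```python
from collections import Counter


def contamination_report(true_labels, predicted_labels, target="Ia"):
    """Return (contamination rate, contaminant counts) for the selected sample.

    The selected sample is every object the classifier labeled as `target`.
    """
    selected = [t for t, p in zip(true_labels, predicted_labels) if p == target]
    if not selected:
        return 0.0, {}
    contaminants = Counter(t for t in selected if t != target)
    rate = sum(contaminants.values()) / len(selected)
    return rate, dict(contaminants)


predicted = ["Ia"] * 10
sample_a = ["Ia"] * 8 + ["Ibc"] * 2   # contaminated by SN Ibc
sample_b = ["Ia"] * 8 + ["II"] * 2    # contaminated by SN II

print(contamination_report(sample_a, predicted))
print(contamination_report(sample_b, predicted))
```

Both toy samples have the same contamination rate, so a rate-based metric cannot tell them apart, even though the contaminating class differs. This is the insensitivity the abstract attributes to classification metrics.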
Submitted 23 May, 2023;
originally announced May 2023.
-
The SNAD Viewer: Everything You Want to Know about Your Favorite ZTF Object
Authors:
Konstantin Malanchev,
Matwey V. Kornilov,
Maria V. Pruzhinskaya,
Emille E. O. Ishida,
Patrick D. Aleo,
Vladimir S. Korolev,
Anastasia Lavrukhina,
Etienne Russeil,
Sreevarsha Sreejith,
Alina A. Volnova,
Anastasiya Voloshina,
Alberto Krone-Martins
Abstract:
We describe the SNAD Viewer, a web portal for astronomers which presents a centralized view of individual objects from the Zwicky Transient Facility's (ZTF) data releases, including data gathered from multiple publicly available astronomical archives and data sources. Initially built to enable efficient expert feedback in the context of adaptive machine learning applications, it has evolved into a full-fledged community asset that centralizes public information and provides a multi-dimensional view of ZTF sources. For users, we provide detailed descriptions of the data sources and choices underlying the information displayed in the portal. For developers, we describe our architectural choices and their consequences such that our experience can help others engaged in similar endeavors or in adapting our publicly released code to their requirements. The infrastructure we describe here is scalable and flexible and can be personalized and used by other surveys and for other science goals. The Viewer has been instrumental in highlighting the crucial roles domain experts retain in the era of big data in astronomy. Given the arrival of the upcoming generation of large-scale surveys, we believe similar systems will be paramount in enabling an optimal exploitation of the scientific potential enclosed in current terabyte and future petabyte-scale data sets. The Viewer is publicly available online at https://ztf.snad.space
Submitted 3 March, 2023; v1 submitted 14 November, 2022;
originally announced November 2022.
-
Neural Network Based Point Spread Function Deconvolution For Astronomical Applications
Authors:
Hong Wang,
Sreevarsha Sreejith,
Yuewei Lin,
Nesar Ramachandra,
Anže Slosar,
Shinjae Yoo
Abstract:
Optical astronomical images are strongly affected by the point spread function (PSF) of the optical system and the atmosphere (seeing), which blur the observed image. The amount of blurring depends both on the observed band and on the atmospheric conditions during observation. A typical astronomical image will likely have a unique PSF that is non-circular and different in different bands. At the same time, observations of known stars also give us an accurate determination of this PSF. Therefore, any serious candidate for production analysis of astronomical images must take the known PSF into account during the image analysis. So far, the majority of applications of neural networks (NN) to astronomical image analysis have ignored this problem by assuming a fixed PSF in training and validation. We present a neural-network-based deconvolution algorithm built on the Deep Wiener Deconvolution Network (DWDN). This algorithm belongs to a class of non-blind deconvolution algorithms, since it assumes the PSF shape is known. We study the performance of different versions of this algorithm under realistic observational conditions in terms of the recovery of the most relevant astronomical quantities such as colors, ellipticities and orientations. We investigate custom loss functions that optimize the recovery of astronomical quantities, with mixed results.
Submitted 29 August, 2023; v1 submitted 4 October, 2022;
originally announced October 2022.
-
Supernova search with active learning in ZTF DR3
Authors:
Maria V. Pruzhinskaya,
Emille E. O. Ishida,
Alexandra K. Novinskaya,
Etienne Russeil,
Alina A. Volnova,
Konstantin L. Malanchev,
Matwey V. Kornilov,
Patrick D. Aleo,
Vladimir S. Korolev,
Vadim V. Krushinsky,
Sreevarsha Sreejith,
Emmanuel Gangler
Abstract:
We provide the first results from the complete SNAD adaptive learning pipeline in the context of a broad scope of data from large-scale astronomical surveys. The main goal of this work is to explore the potential of adaptive learning techniques in application to big data sets. Our SNAD team used Active Anomaly Discovery (AAD) as a tool to search for new supernova (SN) candidates in the photometric data from the first 9.4 months of the Zwicky Transient Facility (ZTF) survey, namely, between March 17 and December 31, 2018 (58194 < MJD < 58483). We analysed 70 ZTF fields at a high galactic latitude and visually inspected 2100 outliers. This resulted in 104 SN-like objects being found, 57 of which were reported to the Transient Name Server for the first time and 47 of which had previously been mentioned in other catalogues, either as SNe with known types or as SN candidates. We visually inspected the multi-colour light curves of the non-catalogued transients and fitted them with different supernova models to assign each to a probable photometric class: Ia, Ib/c, IIP, IIL, or IIn. Moreover, we also identified unreported slow-evolving transients that are good superluminous SN candidates, along with a few other non-catalogued objects, such as red dwarf flares and active galactic nuclei. Beyond confirming the effectiveness of human-machine integration underlying the AAD strategy, our results shed light on potential leaks in currently available pipelines. These findings can help avoid similar losses in future large-scale astronomical surveys. Furthermore, the algorithm enables direct searches of any type of data and based on any definition of an anomaly set by the expert.
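The MJD bounds quoted in the abstract can be checked against the calendar dates with a few lines of standard-library code. The helper name is hypothetical; the only fact used is that Modified Julian Date 0 corresponds to 17 November 1858.

```python
from datetime import date, timedelta

MJD_EPOCH = date(1858, 11, 17)  # calendar date of MJD 0


def mjd_to_date(mjd: float) -> date:
    """Calendar date (UTC) on which the given Modified Julian Date begins."""
    return MJD_EPOCH + timedelta(days=int(mjd))


start = mjd_to_date(58194)
end = mjd_to_date(58483)
print(start.isoformat(), end.isoformat(), (end - start).days, "days")
```

The two bounds map to 2018-03-17 and 2018-12-31, matching the dates given in the text.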
Submitted 27 March, 2023; v1 submitted 18 August, 2022;
originally announced August 2022.
-
SNAD Transient Miner: Finding Missed Transient Events in ZTF DR4 using k-D trees
Authors:
P. D. Aleo,
K. L. Malanchev,
M. V. Pruzhinskaya,
E. E. O. Ishida,
E. Russeil,
M. V. Kornilov,
V. S. Korolev,
S. Sreejith,
A. A. Volnova,
G. S. Narayan
Abstract:
We report the automatic detection of 11 transients (7 possible supernovae and 4 active galactic nuclei candidates) within the Zwicky Transient Facility fourth data release (ZTF DR4), all of them observed in 2018 and absent from public catalogs. Among these, three were not part of the ZTF alert stream. Our transient mining strategy employs 41 physically motivated features extracted from both real light curves and four simulated light curve models (SN Ia, SN II, TDE, SLSN-I). These features are input to a k-D tree algorithm, from which we calculate the 15 nearest neighbors. After pre-processing and selection cuts, our dataset contained approximately a million objects among which we visually inspected the 105 closest neighbors from seven of our brightest, most well-sampled simulations, comprising 89 unique ZTF DR4 sources. Our result illustrates the potential of coherently incorporating domain knowledge and automatic learning algorithms, which is one of the guiding principles directing the SNAD team. It also demonstrates that the ZTF DR is a suitable testing ground for data mining algorithms aiming to prepare for the next generation of astronomical data.
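The nearest-neighbor step can be sketched with a small k-D tree written from scratch. This is an illustration of the data structure only: the paper's feature set, distance metric, and software are not reproduced here, and the function names and the random toy features are hypothetical.

```python
import heapq
import random


def build_kdtree(points, depth=0):
    """Recursively split points on alternating axes at the median."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "point": ordered[mid],
        "axis": axis,
        "left": build_kdtree(ordered[:mid], depth + 1),
        "right": build_kdtree(ordered[mid + 1:], depth + 1),
    }


def _search(node, target, k, heap):
    """Depth-first search keeping the k best candidates in a max-heap."""
    if node is None:
        return
    d2 = sum((a - b) ** 2 for a, b in zip(node["point"], target))
    if len(heap) < k:
        heapq.heappush(heap, (-d2, node["point"]))
    elif d2 < -heap[0][0]:
        heapq.heapreplace(heap, (-d2, node["point"]))
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    _search(near, target, k, heap)
    # Visit the far side only if the splitting plane is closer than the
    # current k-th best neighbor (or fewer than k neighbors are known).
    if len(heap) < k or diff ** 2 < -heap[0][0]:
        _search(far, target, k, heap)


def k_nearest(tree, target, k):
    """Return the k points closest to target, nearest first."""
    heap = []
    _search(tree, target, k, heap)
    return [point for _, point in sorted(heap, reverse=True)]


rng = random.Random(0)
features = [(rng.random(), rng.random(), rng.random()) for _ in range(500)]
tree = build_kdtree(features)
simulated = (0.5, 0.5, 0.5)  # stand-in for a simulated light curve's features
print(k_nearest(tree, simulated, 15)[:3])
```

In a search of this kind, the query point would be the feature vector of a simulated transient and the returned neighbors would be the real light curves passed on for visual inspection.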
Submitted 4 May, 2022; v1 submitted 22 November, 2021;
originally announced November 2021.
-
Galaxy Deblending using Residual Dense Neural networks
Authors:
Hong Wang,
Sreevarsha Sreejith,
Anže Slosar,
Yuewei Lin,
Shinjae Yoo
Abstract:
We present a new neural network approach for deblending galaxy images in astronomical data using Residual Dense Neural network (RDN) architecture. We train the network on synthetic galaxy images similar to the typical arrangements of field galaxies with a finite point spread function (PSF) and realistic noise levels. The main novelty of our approach is the usage of two distinct neural networks: i) a deblending network which isolates a single galaxy postage stamp from the composite and, ii) a classifier network which counts the remaining number of galaxies. The deblending proceeds by iteratively peeling one galaxy at a time from the composite until the image contains no further objects as determined by the classifier, or by other stopping criteria. By looking at the consistency in the outputs of the two networks, we can assess the quality of the deblending. We characterize the flux and shape reconstructions in different quality bins and compare our deblender with the industry standard, SExtractor. We also discuss possible future extensions for the project with variable PSFs and noise levels.
Submitted 18 July, 2022; v1 submitted 20 September, 2021;
originally announced September 2021.
-
Anomaly detection in the Zwicky Transient Facility DR3
Authors:
K. L. Malanchev,
M. V. Pruzhinskaya,
V. S. Korolev,
P. D. Aleo,
M. V. Kornilov,
E. E. O. Ishida,
V. V. Krushinsky,
F. Mondon,
S. Sreejith,
A. A. Volnova,
A. A. Belinski,
A. V. Dodin,
A. M. Tatarnikov,
S. G. Zheltoukhov
Abstract:
We present results from applying the SNAD anomaly detection pipeline to the third public data release of the Zwicky Transient Facility (ZTF DR3). The pipeline is composed of 3 stages: feature extraction, search of outliers with machine learning algorithms and anomaly identification with followup by human experts. Our analysis concentrates in three ZTF fields, comprising more than 2.25 million obje…
▽ More
We present results from applying the SNAD anomaly detection pipeline to the third public data release of the Zwicky Transient Facility (ZTF DR3). The pipeline is composed of 3 stages: feature extraction, search of outliers with machine learning algorithms and anomaly identification with followup by human experts. Our analysis concentrates on three ZTF fields, comprising more than 2.25 million objects. A set of 4 automatic learning algorithms was used to identify 277 outliers, which were subsequently scrutinised by an expert. From these, 188 (68%) were found to be bogus light curves (including effects from the image subtraction pipeline as well as overlapping between a star and a known asteroid), 66 (24%) were previously reported sources, and 23 (8%) correspond to non-catalogued objects, with the two latter cases of potential scientific interest (e.g. 1 spectroscopically confirmed RS Canum Venaticorum star, 4 supernova candidates, 1 red dwarf flare). Moreover, using results from the expert analysis, we were able to identify a simple bi-dimensional relation which can be used to aid filtering potentially bogus light curves in future studies. We provide a complete list of objects with potential scientific application so they can be further scrutinised by the community. These results confirm the importance of combining automatic machine learning algorithms with domain knowledge in the construction of recommendation systems for astronomy. Our code is publicly available at https://github.com/snad-space/zwad
Submitted 2 February, 2021; v1 submitted 2 December, 2020;
originally announced December 2020.
-
Active learning with RESSPECT: Resource allocation for extragalactic astronomical transients
Authors:
Noble Kennamer,
Emille E. O. Ishida,
Santiago Gonzalez-Gaitan,
Rafael S. de Souza,
Alexander Ihler,
Kara Ponder,
Ricardo Vilalta,
Anais Moller,
David O. Jones,
Mi Dai,
Alberto Krone-Martins,
Bruno Quint,
Sreevarsha Sreejith,
Alex I. Malz,
Lluis Galbany
Abstract:
The recent increase in volume and complexity of available astronomical data has led to a wide use of supervised machine learning techniques. Active learning strategies have been proposed as an alternative to optimize the distribution of scarce labeling resources. However, due to the specific conditions in which labels can be acquired, fundamental assumptions, such as sample representativeness and labeling cost stability cannot be fulfilled. The Recommendation System for Spectroscopic follow-up (RESSPECT) project aims to enable the construction of optimized training samples for the Rubin Observatory Legacy Survey of Space and Time (LSST), taking into account a realistic description of the astronomical data environment. In this work, we test the robustness of active learning techniques in a realistic simulated astronomical data scenario. Our experiment takes into account the evolution of training and pool samples, different costs per object, and two different sources of budget. Results show that traditional active learning strategies significantly outperform random sampling. Nevertheless, more complex batch strategies are not able to significantly overcome simple uncertainty sampling techniques. Our findings illustrate three important points: 1) active learning strategies are a powerful tool to optimize the label-acquisition task in astronomy, 2) for upcoming large surveys like LSST, such techniques allow us to tailor the construction of the training sample for the first day of the survey, and 3) the peculiar data environment related to the detection of astronomical transients is a fertile ground that calls for the development of tailored machine learning algorithms.
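A minimal sketch of uncertainty sampling under a budget follows. It assumes a binary classifier that outputs a probability per object, a known cost per object, and a single remaining budget; the function name, the inputs, and the tie-breaking are hypothetical simplifications and do not reproduce the RESSPECT pipeline.

```python
def pick_next_query(probabilities, costs, budget):
    """Index of the most uncertain object whose labeling cost fits the budget.

    Uncertainty is measured as closeness of the predicted probability to 0.5.
    Returns None when no object is affordable.
    """
    affordable = [i for i, cost in enumerate(costs) if cost <= budget]
    if not affordable:
        return None
    return min(affordable, key=lambda i: abs(probabilities[i] - 0.5))


probabilities = [0.95, 0.52, 0.10, 0.45]  # P(SN Ia) for four pool objects
costs = [1.0, 10.0, 1.0, 2.0]             # e.g. required telescope time

print(pick_next_query(probabilities, costs, budget=5.0))   # 3
print(pick_next_query(probabilities, costs, budget=20.0))  # 1
```

With the smaller budget the most ambiguous object is too expensive, so the next most uncertain affordable object is queried instead. This illustrates how per-object costs change which labels are acquired.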
Submitted 26 October, 2020; v1 submitted 12 October, 2020;
originally announced October 2020.
-
Newly discovered dwarf galaxies in the MATLAS low density fields
Authors:
Rebecca Habas,
Francine R. Marleau,
Pierre-Alain Duc,
Patrick R. Durrell,
Sanjaya Paudel,
Mélina Poulain,
Rubén Sánchez-Janssen,
Sreevarsha Sreejith,
Joanna Ramasawmy,
Bryson Stemock,
Christopher Leach,
Jean-Charles Cuillandre,
Stephen Gwyn,
Adriano Agnello,
Michal Bílek,
Jérémy Fensch,
Oliver Müller,
Eric W. Peng,
Remco F. J. van der Burg
Abstract:
We present the photometric properties of 2210 newly identified dwarf galaxy candidates in the MATLAS fields. The Mass Assembly of early Type gaLAxies with their fine Structures (MATLAS) deep imaging survey mapped $\sim$142 deg$^2$ of the sky around nearby isolated early type galaxies using MegaCam on the Canada-France-Hawaii Telescope, reaching surface brightnesses of $\sim$ 28.5 - 29 in the g-ban…
▽ More
We present the photometric properties of 2210 newly identified dwarf galaxy candidates in the MATLAS fields. The Mass Assembly of early Type gaLAxies with their fine Structures (MATLAS) deep imaging survey mapped $\sim$142 deg$^2$ of the sky around nearby isolated early type galaxies using MegaCam on the Canada-France-Hawaii Telescope, reaching surface brightnesses of $\sim$ 28.5 - 29 in the g-band. The dwarf candidates were identified through a direct visual inspection of the images and by visually cleaning a sample selected using a partially automated approach, and were morphologically classified at the time of identification. Approximately 75% of our candidates are dEs, indicating that a large number of early type dwarfs also populate low density environments, and 23.2% are nucleated. Distances were determined for 13.5% of our sample using pre-existing $z_{spec}$ measurements and HI detections. We confirm the dwarf nature for 99% of this sub-sample based on a magnitude cut $M_g$ = -18. Additionally, most of these ($\sim$90%) have relative velocities suggesting that they form a satellite population around nearby massive galaxies rather than an isolated field sample. Assuming that the candidates over the whole survey are satellites of the nearby galaxies, we demonstrate that the MATLAS dwarfs follow the same scaling relations as dwarfs in the Local Group as well as the Virgo and Fornax clusters. We also find that the nucleated fraction increases with $M_g$, and find evidence of a morphology-density relation for dwarfs around isolated massive galaxies.
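The magnitude cut can be applied once a distance is known, via the distance modulus. The sketch below assumes that "dwarf" means fainter than the cut (M_g > -18) and ignores extinction and K-corrections; the function names and the example values are hypothetical.

```python
import math

DWARF_CUT_MG = -18.0  # magnitude cut quoted in the abstract


def absolute_magnitude(apparent_mag: float, distance_mpc: float) -> float:
    """Absolute magnitude from the distance modulus M = m - 5 log10(d / 10 pc)."""
    distance_pc = distance_mpc * 1.0e6
    return apparent_mag - 5.0 * math.log10(distance_pc / 10.0)


def passes_dwarf_cut(apparent_mag: float, distance_mpc: float) -> bool:
    """True if the object is fainter than the cut (assumed meaning of the cut)."""
    return absolute_magnitude(apparent_mag, distance_mpc) > DWARF_CUT_MG


# Illustrative values only: a g = 15 mag galaxy at 10 Mpc.
print(absolute_magnitude(15.0, 10.0), passes_dwarf_cut(15.0, 10.0))
```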
Submitted 17 December, 2019; v1 submitted 29 October, 2019;
originally announced October 2019.
-
Active Anomaly Detection for time-domain discoveries
Authors:
Emille E. O. Ishida,
Matwey V. Kornilov,
Konstantin L. Malanchev,
Maria V. Pruzhinskaya,
Alina A. Volnova,
Vladimir S. Korolev,
Florian Mondon,
Sreevarsha Sreejith,
Anastasia Malancheva,
Shubhomoy Das
Abstract:
We present the first evidence that adaptive learning techniques can boost the discovery of unusual objects within astronomical light curve data sets. Our method follows an active learning strategy where the learning algorithm chooses objects which can potentially improve the learner if additional information about them is provided. This new information is subsequently used to update the machine learning model, allowing its accuracy to evolve with each new piece of information. For the case of anomaly detection, the algorithm aims to maximize the number of scientifically interesting anomalies presented to the expert by slightly modifying the weights of a traditional Isolation Forest (IF) at each iteration. In order to demonstrate the potential of such techniques, we apply the Active Anomaly Discovery (AAD) algorithm to 2 data sets: simulated light curves from the PLAsTiCC challenge and real light curves from the Open Supernova Catalog. We compare the AAD results to those of a static IF. For both methods, we performed a detailed analysis for all objects with the ~2% highest anomaly scores. We show that, in the real data scenario, AAD was able to identify ~80% more true anomalies than the IF. This result is the first evidence that AAD algorithms can play a central role in the search for new physics in the era of large scale sky surveys.
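For orientation, the static Isolation Forest baseline can be sketched in a few dozen lines. This is a simplified illustration, not the AAD algorithm and not the code used in the paper: every tree is grown on the full data set rather than a subsample, there is no expert feedback or reweighting, and all names and the toy data are hypothetical.

```python
import math
import random


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split on a random axis at a random value until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    axis = rng.randrange(len(points[0]))
    lo = min(p[axis] for p in points)
    hi = max(p[axis] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "axis": axis,
        "split": split,
        "left": build_tree([p for p in points if p[axis] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[axis] >= split], depth + 1, max_depth, rng),
    }


def path_length(tree, point, depth=0):
    """Number of splits needed to reach a leaf, plus a leaf-size correction."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if point[tree["axis"]] < tree["split"] else "right"
    return path_length(tree[branch], point, depth + 1)


def anomaly_scores(points, n_trees=100, seed=0):
    """Scores in (0, 1); points that are isolated quickly score higher."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(len(points)))
    trees = [build_tree(points, 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(len(points))
    return [
        2.0 ** (-sum(path_length(t, p) for t in trees) / n_trees / norm)
        for p in points
    ]


rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)] + [(8.0, 8.0)]
scores = anomaly_scores(data)
print(max(range(len(data)), key=scores.__getitem__))  # index of top-scoring point
```

An active variant would show the top-scoring objects to an expert and adjust the forest according to the answers, which is the step the abstract describes as modifying the weights at each iteration.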
Submitted 14 July, 2020; v1 submitted 29 September, 2019;
originally announced September 2019.
-
Galaxy And Mass Assembly: Automatic Morphological Classification of Galaxies Using Statistical Learning
Authors:
Sreevarsha Sreejith,
Sergiy Pereverzyev Jr.,
Lee S. Kelvin,
Francine Marleau,
Markus Haltmeier,
Judith Ebner,
Joss Bland-Hawthorn,
Simon P. Driver,
Alister W. Graham,
Benne W. Holwerda,
A. M. Hopkins,
J. Liske,
Jon Loveday,
Amanda J. Moffett,
K. A. Pimbblet,
Edward N. Taylor,
Lingyu Wang,
Angus H. Wright
Abstract:
We apply four statistical learning methods to a sample of $7941$ galaxies ($z<0.06$) from the Galaxy and Mass Assembly (GAMA) survey to test the feasibility of using automated algorithms to classify galaxies. Using $10$ features measured for each galaxy (sizes, colours, shape parameters \& stellar mass) we apply the techniques of Support Vector Machines (SVM), Classification Trees (CT), Classification Trees with Random Forest (CTRF) and Neural Networks (NN), returning True Prediction Ratios (TPRs) of $75.8\%$, $69.0\%$, $76.2\%$ and $76.0\%$ respectively. Those occasions whereby all four algorithms agree with each other yet disagree with the visual classification (`unanimous disagreement') serve as a potential indicator of human error in classification, occurring in $\sim9\%$ of ellipticals, $\sim9\%$ of Little Blue Spheroids, $\sim14\%$ of early-type spirals, $\sim21\%$ of intermediate-type spirals and $\sim4\%$ of late-type spirals \& irregulars. We observe that the choice of parameters rather than that of algorithms is more crucial in determining classification accuracy. Due to its simplicity in formulation and implementation, we recommend the CTRF algorithm for classifying future galaxy datasets. Adopting the CTRF algorithm, the TPRs of the 5 galaxy types are: E, $70.1\%$; LBS, $75.6\%$; S0-Sa, $63.6\%$; Sab-Scd, $56.4\%$ and Sd-Irr, $88.9\%$. Further, we train a binary classifier using this CTRF algorithm that divides galaxies into spheroid-dominated (E, LBS \& S0-Sa) and disk-dominated (Sab-Scd \& Sd-Irr), achieving an overall accuracy of $89.8\%$. This translates into an accuracy of $84.9\%$ for spheroid-dominated systems and $92.5\%$ for disk-dominated systems.
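The "unanimous disagreement" criterion described in the abstract reduces to a simple check per galaxy. The sketch below is an illustration with a hypothetical function name and made-up labels; it does not use the GAMA data.

```python
def unanimous_disagreement(visual_label, algorithm_labels):
    """True when every algorithm gives the same class and that class
    differs from the visual classification."""
    if not algorithm_labels:
        return False
    all_agree = len(set(algorithm_labels)) == 1
    return all_agree and algorithm_labels[0] != visual_label


# Predictions ordered as (SVM, CT, CTRF, NN) for three toy galaxies.
examples = [
    ("E", ["S0-Sa", "S0-Sa", "S0-Sa", "S0-Sa"]),  # flagged for re-inspection
    ("E", ["E", "E", "E", "E"]),                  # all agree with the eye
    ("LBS", ["E", "LBS", "E", "E"]),              # algorithms disagree
]
for visual, predicted in examples:
    print(visual, predicted, unanimous_disagreement(visual, predicted))
```

Only the first toy galaxy is flagged, since it is the only case where all four predictions coincide and contradict the visual class.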
Submitted 17 November, 2017; v1 submitted 16 November, 2017;
originally announced November 2017.