-
Machine learning technique for morphological classification of galaxies from the SDSS. III. Image-based inference of detailed features
Authors:
V. Khramtsov,
I. B. Vavilova,
D. V. Dobrycheva,
M. Yu. Vasylenko,
O. V. Melnyk,
A. A. Elyiv,
V. S. Akhmetov,
A. M. Dmytrenko
Abstract:
This paper follows a series of our works on the applicability of various machine learning methods to morphological galaxy classification (Vavilova et al., 2021, 2022). We used the sample of 315776 SDSS DR9 galaxies with absolute stellar magnitudes $-24^{m}<M_{r}<-19.4^{m}$ at $0.003<z<0.1$ as the target dataset for a CNN classifier based on DenseNet-201. Because this sample overlaps strongly with the Galaxy Zoo 2 (GZ2) sample, we used the GZ2-annotated data as the training dataset to classify galaxies into 34 detailed features. Since galaxies in the GZ2 training dataset differ markedly in their visual parameters from galaxies without known morphological parameters, we applied novel procedures that, for the first time, allowed us to remove this difference for smaller and fainter SDSS galaxies.
We describe in detail the adversarial validation technique and how we obtained the optimal train-test split of galaxies from the training dataset. We also found the optimal galaxy image transformations to increase the generalization ability of the classifier. This can be considered another way to reduce the human bias for galaxy images that received poor vote classifications in the GZ project. Such an approach, akin to auto-immunization, in which a CNN classifier trained on very good images is able to reclassify bad images from the same homogeneous sample, can be considered complementary to other methods of combating human bias.
The accuracy of the CNN classifier ranges from 83.3 to 99.4 percent, depending on the feature (32 features). As a result, we assigned, for the first time, detailed morphological classifications to more than 140K low-redshift galaxies, especially at the fainter end. We highlight typical problem points of CNN-based galaxy image classification from the astronomical point of view. The catalogs will be available through VizieR.
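Adversarial validation, as named in this abstract, is commonly done by training a classifier to tell training-set objects from target (inference) objects: an AUC near 0.5 means the two samples are hard to distinguish, and the training objects that look most "target-like" can be held out as a validation set. The sketch below illustrates only that general idea on tabular features, using a hand-written logistic regression in pure Python. The features (magnitude and angular size), the function names, the 20% validation fraction and the choice of model are all assumptions, not the authors' CNN-based pipeline.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def standardize(X):
    """Scale every feature to zero mean and unit variance."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def train_logreg(X, y, lr=0.5, epochs=300):
    """Plain logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

def roc_auc(y, scores):
    """Probability that a random target object outscores a random training object."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def adversarial_split(train_X, target_X, val_fraction=0.2):
    """Label training objects 0 and target objects 1, fit a classifier, and hold
    out the most target-like training objects as the validation set.
    For brevity the scores are in-sample; out-of-fold scores are preferable."""
    X = standardize(train_X + target_X)
    y = [0] * len(train_X) + [1] * len(target_X)
    w, b = train_logreg(X, y)
    scores = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]
    auc = roc_auc(y, scores)
    train_scores = scores[:len(train_X)]
    order = sorted(range(len(train_X)), key=lambda i: train_scores[i], reverse=True)
    n_val = int(val_fraction * len(train_X))
    return auc, sorted(order[n_val:]), sorted(order[:n_val])

random.seed(0)
# Hypothetical features [r-band magnitude, angular size]: the target sample is
# deliberately fainter and smaller than the training sample.
train = [[random.gauss(16.0, 0.5), random.gauss(1.0, 0.2)] for _ in range(200)]
target = [[random.gauss(17.0, 0.5), random.gauss(0.8, 0.2)] for _ in range(200)]
auc, train_idx, val_idx = adversarial_split(train, target)
print(f"adversarial AUC = {auc:.2f}: {len(train_idx)} train / {len(val_idx)} validation")
```

A high AUC here flags the kind of train/target mismatch the abstract describes; the held-out validation objects are the fainter, smaller training galaxies, so validation scores better reflect performance on the target sample.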
Submitted 25 September, 2022;
originally announced September 2022.
-
Machine learning technique for morphological classification of galaxies from SDSS. II. The image-based morphological catalogs of galaxies at 0.02<z<0.1
Authors:
I. B. Vavilova,
V. Khramtsov,
D. V. Dobrycheva,
M. Yu. Vasylenko,
A. A. Elyiv,
O. V. Melnyk
Abstract:
We applied an image-based approach with a convolutional neural network (CNN) model to the sample of low-redshift galaxies with $-24^{m}<M_{r}<-19.4^{m}$ from SDSS DR9. We divided it into two subsamples, the SDSS DR9 galaxy dataset and the Galaxy Zoo 2 (GZ2) dataset, used as the inference and training datasets, respectively. As a result, we created a morphological catalog of 315782 galaxies at $0.02<z<0.1$, in which five morphological classes and 34 detailed features (bar, rings, number of spiral arms, mergers, etc.) were defined for the first time for 216148 galaxies (the inference dataset) by the image-based CNN classifier. For the remaining galaxies, the initial morphological classification was re-assigned as in the GZ2 project.
Our method shows promising performance, attaining more than 93% accuracy for five-class morphology prediction, except for the cigar-shaped (75%) and completely rounded (83%) galaxies. The main results are presented in the catalog of 19468 completely rounded, 27321 rounded in-between, 3235 cigar-shaped, 4099 edge-on, 18615 spiral, and 72738 general low-redshift galaxies of the studied SDSS sample. For the classification of galaxies by their detailed structural morphological features, our CNN model gives an accuracy of 92-99%, depending on the feature, the number of galaxies with the given feature in the inference dataset, and the galaxy image quality. We demonstrate that applying the CNN model with adversarial validation and adversarial image data augmentation improves the classification of smaller and fainter SDSS galaxies with $m_{r}<17.7$.
Submitted 12 March, 2022;
originally announced March 2022.
-
VEXAS: VISTA EXtension to Auxiliary Surveys -- Data Release 2: Machine-learning based classification of sources in the Southern Hemisphere
Authors:
V. Khramtsov,
C. Spiniello,
A. Agnello,
A. Sergeyev
Abstract:
We present the second public data release (DR) of the VISTA EXtension to Auxiliary Surveys (VEXAS), where we classify objects into stars, galaxies and quasars based on an ensemble of machine learning algorithms. The aim of VEXAS is to build the widest multi-wavelength catalogue, providing reference magnitudes, colours and morphological information for a large number of scientific uses. We apply an ensemble of 32 different machine learning models, based on three different algorithms and on different magnitude sets, training samples and classification problems, to the three VEXAS DR1 optical+infrared (IR) tables. These tables were created in DR1 by cross-matching VISTA near-IR data with WISE mid-IR data and with optical magnitudes from the Dark Energy Survey (VEXAS-DESW), the Sky Mapper Survey (VEXAS-SMW), and PanSTARRS (VEXAS-PSW). We assemble a large table of spectroscopically confirmed objects (415 628 unique objects), based on the combination of 6 different spectroscopic surveys, which we use for training. We developed feature imputation to also classify objects for which magnitudes in one or more bands are missing. In total, we classify ~90 million objects in the Southern Hemisphere. Among these, ~62.9M (~52.6M) are classified as 'high confidence' ('secure') stars, ~920k (~750k) as 'high confidence' ('secure') quasars and ~34.8M (~34.1M) as 'high confidence' ('secure') galaxies, with probabilities $p_{\rm class}\ge 0.7$ ($p_{\rm class}\ge 0.9$). The density of high-confidence extragalactic objects varies strongly with survey depth: at $p_{\rm class}\ge 0.7$, there are 111/deg$^2$ quasars in the VEXAS-DESW footprint and 103/deg$^2$ in the VEXAS-PSW footprint, but only 10.7/deg$^2$ in the VEXAS-SM footprint. Improved depth in the mid-IR and coverage in the optical and near-IR are needed for the part of the SM footprint that is not already covered by DESW and PSW.
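The 'high confidence' and 'secure' labels above are defined by cuts on the class probability, $p_{\rm class}\ge 0.7$ and $p_{\rm class}\ge 0.9$. A minimal sketch of how an ensemble's output could be turned into those labels follows. Averaging the members' probabilities is an assumed combination rule (the way the 32 VEXAS models are actually combined may differ), and the member outputs in the example are made up.

```python
from statistics import mean

CLASSES = ("star", "quasar", "galaxy")

def combine(prob_sets):
    """Average each class probability over the ensemble members (assumed rule)."""
    return {c: mean(p[c] for p in prob_sets) for c in CLASSES}

def classify(prob_sets):
    """Most probable class and its combined probability p_class."""
    p = combine(prob_sets)
    best = max(CLASSES, key=p.get)
    return best, p[best]

def tiers(p_class, high=0.7, secure=0.9):
    """Confidence flags from the probability cuts; 'secure' implies 'high confidence'."""
    return {"high_confidence": p_class >= high, "secure": p_class >= secure}

# Three hypothetical ensemble members scoring the same source.
members = [
    {"star": 0.05, "quasar": 0.85, "galaxy": 0.10},
    {"star": 0.10, "quasar": 0.75, "galaxy": 0.15},
    {"star": 0.05, "quasar": 0.80, "galaxy": 0.15},
]
cls, p_class = classify(members)
print(cls, round(p_class, 2), tiers(p_class))
```

Because the secure cut is stricter, every 'secure' object is also 'high confidence', which matches the nested counts quoted in the abstract (e.g. ~52.6M secure stars within ~62.9M high-confidence stars).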
Submitted 16 March, 2021;
originally announced March 2021.
-
KiDS-SQuaD II: Machine learning selection of bright extragalactic objects to search for new gravitationally lensed quasars
Authors:
Vladislav Khramtsov,
Alexey Sergeyev,
Chiara Spiniello,
Crescenzo Tortora,
Nicola R. Napolitano,
Adriano Agnello,
Fedor Getman,
Jelte T. A. de Jong,
Konrad Kuijken,
Mario Radovich,
HuanYuan Shan,
Valery Shulga
Abstract:
The KiDS Strongly lensed QUAsar Detection project (KiDS-SQuaD) aims to find as many previously undiscovered gravitationally lensed quasars as possible in the Kilo Degree Survey. In this second paper of the series, we present a new, automatic object classification method based on machine learning. The main goal of this paper is to build a catalogue of bright extragalactic objects (galaxies and quasars) from KiDS Data Release 4 with minimal stellar contamination, preserving completeness as much as possible, and then to apply morphological methods to select reliable gravitationally lensed (GL) quasar candidates. After testing some of the most widely used machine learning algorithms (decision-tree-based classifiers), we chose CatBoost, which we trained specifically to create a sample of extragalactic sources as free of stars as possible. We discuss the input data, define the training sample for the classifier, give quantitative estimates of its performance, and finally describe the validation results with the Gaia DR2, AllWISE, and GAMA catalogues. We have built and made available to the scientific community the KiDS Bright EXtraGalactic Objects catalogue (KiDS-BEXGO), specifically created to find gravitational lenses. It contains $\approx 6$ million sources classified as quasars ($\approx 200\,000$) and galaxies ($\approx 5.7$M), up to $r<22^m$. From this catalogue we selected 'Multiplets': close pairs of quasars or galaxies surrounded by at least one quasar, and we present the 12 most reliable gravitationally lensed quasar candidates to demonstrate the potential of the catalogue, which will be further explored in a forthcoming paper. We compared our search to the previous one, presented in the first paper of this series, showing that employing a machine learning method reduces stellar contamination among the GL candidates.
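The abstract defines 'Multiplets' only in words. One possible reading, sketched below purely as an assumption, is a friends-of-friends grouping: sources closer than some linking radius are joined into a group, and a group is kept if it has at least two members and contains at least one quasar. The 5-arcsec radius, the field names `ra`, `dec` and `cls`, and the function names are hypothetical and not taken from the paper.

```python
import math

def ang_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def find_multiplets(sources, radius_arcsec=5.0):
    """Link sources closer than radius_arcsec; keep groups of two or more members
    that contain at least one quasar. The O(N^2) pair loop is fine for a sketch;
    a full catalogue would need a spatial index."""
    parent = list(range(len(sources)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(sources)):
        for j in range(i + 1, len(sources)):
            a, b = sources[i], sources[j]
            if ang_sep_arcsec(a["ra"], a["dec"], b["ra"], b["dec"]) < radius_arcsec:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(sources)):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values()
            if len(g) >= 2 and any(sources[k]["cls"] == "quasar" for k in g)]

sources = [
    {"ra": 40.0, "dec": -32.0, "cls": "quasar"},
    {"ra": 40.0, "dec": -32.0 + 2 / 3600, "cls": "galaxy"},   # 2 arcsec away
    {"ra": 40.01, "dec": -32.0, "cls": "quasar"},             # ~30 arcsec away
    {"ra": 41.0, "dec": -30.0, "cls": "galaxy"},
    {"ra": 41.0, "dec": -30.0 + 1 / 3600, "cls": "galaxy"},   # galaxy-only pair
]
print(find_multiplets(sources))
```

In this toy field the quasar-galaxy pair is kept, the isolated quasar is dropped as a singleton, and the galaxy-only pair is dropped because it contains no quasar.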
Submitted 7 June, 2019; v1 submitted 4 June, 2019;
originally announced June 2019.
-
KiDS0239-3211: A new gravitational quadruple lens candidate
Authors:
A. Sergeyev,
C. Spiniello,
V. Khramtsov,
N. R. Napolitano,
E. Bannikova,
C. Tortora,
F. I. Getman,
A. Agnello
Abstract:
We report the discovery of a candidate quadruple gravitationally lensed system, KiDS0239-3211, based on public Data Release 3 of the KiDS survey and machine learning techniques.
Submitted 3 October, 2018;
originally announced October 2018.
-
Machine-learning identification of extragalactic objects in the optical-infrared all-sky surveys
Authors:
Vladislav Khramtsov,
Volodymyr Akhmetov
Abstract:
We present a new, fully automatic classification model to select extragalactic objects in astronomical photometric catalogs. The construction of our classification model is based on three important procedures: 1) data representation to create the feature space; 2) building a hypersurface in the feature space to limit the range of features (outlier detection); 3) building a hyperplane separating extragalactic objects from galactic ones. We trained our model with 1.7 million objects (1.4 million galaxies and quasars, 0.3 million stars). We demonstrate the application of the model with a photometric catalog of 38 million extragalactic objects identified in the WISE and Pan-STARRS catalogs cross-matched with each other.
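The three procedures can be illustrated on a toy two-colour feature space. In this sketch, which is an assumption rather than the authors' model, the limiting hypersurface is an axis-aligned ellipsoid around the training sample (a diagonal Mahalanobis-distance cut with a hypothetical k = 4), and the separating hyperplane is learned with a plain perceptron. The colours, class means and thresholds are all illustrative.

```python
import math
import random

def fit_envelope(X):
    """Per-feature mean and spread of the training sample."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return means, stds

def inside(envelope, x, k=4.0):
    """True if x lies within the ellipsoid sum(((x - mean) / std) ** 2) <= k ** 2."""
    means, stds = envelope
    return sum(((v - m) / s) ** 2 for v, m, s in zip(x, means, stds)) <= k * k

def fit_perceptron(X, y, epochs=1000):
    """Perceptron for the hyperplane w.x + b = 0; y = +1 (extragalactic) or -1 (galactic)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) <= 0:
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:  # training data fully separated: stop early
            break
    return w, b

def classify(envelope, hyperplane, x):
    """Procedure 2 (outlier cut) first, then procedure 3 (side of the hyperplane)."""
    if not inside(envelope, x):
        return "outlier"
    w, b = hyperplane
    return "extragalactic" if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else "galactic"

rng = random.Random(1)
# Procedure 1, toy feature space: two hypothetical colours, e.g. [W1-W2, g-r].
extragal = [[rng.gauss(1.0, 0.1), rng.gauss(0.5, 0.2)] for _ in range(200)]
stars = [[rng.gauss(0.0, 0.1), rng.gauss(0.5, 0.2)] for _ in range(200)]
data = [(x, 1) for x in extragal] + [(x, -1) for x in stars]
rng.shuffle(data)
X, y = [x for x, _ in data], [t for _, t in data]
envelope = fit_envelope(X)
hyperplane = fit_perceptron(X, y)
for point in ([1.0, 0.5], [0.0, 0.5], [5.0, 5.0]):
    print(point, classify(envelope, hyperplane, point))
```

Applying the outlier cut before the hyperplane keeps objects far outside the training feature range from being forced into either class, which is one plausible motivation for the ordering of the procedures listed above.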
Submitted 21 May, 2018;
originally announced May 2018.
-
New algorithm for astrometric reduction of the wide-field images
Authors:
Volodymyr Akhmetov,
Sergii Khlamov,
Vladislav Khramtsov,
Artem Dmytrenko
Abstract:
In this paper we present a modified algorithm for the astrometric reduction of wide-field images. The algorithm is based on the iterative use of the ordinary least squares (OLS) method and Student's t-criterion. It automatically selects the most probable reduction model. This approach makes it possible to eliminate almost all systematic errors caused by imperfections in the optical systems of modern large telescopes.
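One way to read "iterative use of OLS with Student's t-criterion" is backward elimination: fit a polynomial plate model by OLS, compute each coefficient's t-statistic, drop the least significant term, and refit until every remaining term passes the test. The sketch below does this for one coordinate in pure Python. The cubic term set, the fixed critical value `t_crit = 2.0` (roughly the two-sided 95% Student value for a few hundred degrees of freedom) and the toy data are assumptions, not the authors' exact procedure.

```python
import math
import random

def invert(A):
    """Gauss-Jordan inverse with partial pivoting of a square matrix (list of rows)."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def ols(X, y):
    """Ordinary least squares: coefficients and their Student t-statistics."""
    k, n = len(X[0]), len(X)
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(XtX)
    beta = [sum(inv[i][j] * Xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # unbiased residual variance
    t = [beta[i] / math.sqrt(s2 * inv[i][i]) for i in range(k)]
    return beta, t

TERMS = {  # hypothetical candidate reduction-model terms in plate coordinates (x, y)
    "1": lambda x, y: 1.0, "x": lambda x, y: x, "y": lambda x, y: y,
    "x2": lambda x, y: x * x, "xy": lambda x, y: x * y, "y2": lambda x, y: y * y,
    "x3": lambda x, y: x ** 3, "x2y": lambda x, y: x * x * y,
    "xy2": lambda x, y: x * y * y, "y3": lambda x, y: y ** 3,
}

def select_model(xs, ys, target, t_crit=2.0):
    """Refit by OLS, dropping the least significant term until every |t| >= t_crit."""
    names = list(TERMS)
    while True:
        X = [[TERMS[nm](x, y) for nm in names] for x, y in zip(xs, ys)]
        beta, t = ols(X, target)
        worst = min(range(len(names)), key=lambda i: abs(t[i]))
        if abs(t[worst]) >= t_crit or len(names) == 1:
            return dict(zip(names, beta))
        names.pop(worst)

random.seed(42)
xs = [random.uniform(-1, 1) for _ in range(300)]
ys = [random.uniform(-1, 1) for _ in range(300)]
# Hypothetical "true" plate model plus Gaussian measurement noise (sigma = 0.001).
target = [0.5 + 2.0 * x - 0.3 * y + 0.05 * x ** 3 + random.gauss(0, 0.001)
          for x, y in zip(xs, ys)]
model = select_model(xs, ys, target)
print(sorted(model))
```

The terms present in the simulated model have very large t-statistics and survive; a purely noise-driven term can occasionally pass a 2.0 cut by chance, which is why the choice of critical value matters in practice.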
Submitted 9 April, 2019; v1 submitted 21 May, 2018;
originally announced May 2018.
-
Machine learning technique for morphological classification of galaxies from the SDSS. I. Photometry-based approach
Authors:
I. B. Vavilova,
D. V. Dobrycheva,
M. Yu. Vasylenko,
A. A. Elyiv,
O. V. Melnyk,
V. Khramtsov
Abstract:
Methods. We used different galaxy classification techniques: human labeling, multi-photometry diagrams, Naive Bayes, Logistic Regression, Support Vector Machine, Random Forest, k-Nearest Neighbors, and k-fold validation. Results. We present the results of a binary automated morphological classification of galaxies conducted by human labeling, multi-photometry, and supervised Machine Learning methods. We applied them to the sample of galaxies from SDSS DR9 with 0.02 < z < 0.1 and -24m < Mr < -19.4m. As classifier inputs, we used the absolute magnitudes Mu, Mg, Mr, Mi, Mz, the color indices Mu-Mr, Mg-Mi, Mu-Mg, Mr-Mz, and the inverse concentration index R50/R90. Using the Support Vector Machine classifier and the data on color indices, absolute magnitudes, and inverse concentration index of galaxies with visual morphological types, we were able to classify 316 031 galaxies from SDSS DR9 with unknown morphological types. Conclusions. The Support Vector Machine and Random Forest methods, implemented with Scikit-learn in Python, provide the highest accuracy for binary galaxy morphological classification: 96.4% correctly classified (96.1% early E and 96.9% late L types) and 95.5% correctly classified (96.7% early E and 92.8% late L types), respectively. Applying the Support Vector Machine to the sample of 316 031 galaxies from SDSS DR9 at z < 0.1, we found 141 211 E and 174 820 L types among them.
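k-fold validation on the feature set listed in this abstract can be sketched in pure Python. The classifier here is a nearest-centroid stand-in, swapped in so the example runs without dependencies; the paper's SVM and Random Forest would correspond to scikit-learn's `SVC` and `RandomForestClassifier` (e.g. scored with `cross_val_score`). The toy photometry generator, its colour values and the function names are hypothetical.

```python
import random

def features(g):
    """Absolute magnitudes, four colour indices and the inverse concentration index."""
    u, gm, r, i, z = g["Mu"], g["Mg"], g["Mr"], g["Mi"], g["Mz"]
    return [u, gm, r, i, z, u - r, gm - i, u - gm, r - z, g["R50"] / g["R90"]]

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def fit_centroids(X, y):
    """Stand-in classifier (NOT an SVM or random forest): one mean vector per class."""
    cents = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def predict(cents, x):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))

def cross_validate(X, y, k=5):
    """Mean accuracy over k folds; each object is held out exactly once."""
    accs = []
    for fold in k_fold_indices(len(X), k):
        held = set(fold)
        train = [i for i in range(len(X)) if i not in held]
        cents = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(cents, X[i]) == y[i] for i in fold)
        accs.append(hits / len(fold))
    return sum(accs) / k

def toy_galaxy(rng, early):
    """Hypothetical toy photometry: early types redder and more concentrated."""
    mr = rng.gauss(-21.0, 0.3)
    ur = rng.gauss(2.6 if early else 1.8, 0.2)
    return {"Mu": mr + ur, "Mg": mr + 0.35 * ur, "Mr": mr, "Mi": mr - 0.3, "Mz": mr - 0.5,
            "R50": 1.0, "R90": rng.gauss(3.0 if early else 2.2, 0.2)}

rng = random.Random(1)
labels = ["E"] * 200 + ["L"] * 200
galaxies = [toy_galaxy(rng, t == "E") for t in labels]
X = [features(g) for g in galaxies]
acc = cross_validate(X, labels, k=5)
print(f"5-fold accuracy of the stand-in classifier: {acc:.3f}")
```

Holding each galaxy out exactly once gives an accuracy estimate that is not inflated by testing on training objects; swapping `fit_centroids`/`predict` for an SVM or random forest leaves the fold logic unchanged.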
Submitted 8 June, 2021; v1 submitted 24 December, 2017;
originally announced December 2017.