-
Beyond mirkwood: Enhancing SED Modeling with Conformal Predictions
Authors:
Sankalp Gilda
Abstract:
Traditional spectral energy distribution (SED) fitting techniques face uncertainties due to assumptions in star formation histories and dust attenuation curves. We propose an advanced machine learning-based approach that enhances flexibility and uncertainty quantification in SED fitting. Unlike the fixed NGBoost model used in mirkwood, our approach allows for any sklearn-compatible model, including deterministic models. We incorporate conformalized quantile regression to convert point predictions into error bars, enhancing interpretability and reliability. Using CatBoost as the base predictor, we compare results with and without conformal prediction, demonstrating improved performance using metrics such as coverage and interval width. Our method offers a more versatile and accurate tool for deriving galaxy physical properties from observational data.
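The conformalization step can be illustrated with a minimal split-conformal sketch. It assumes a base quantile regressor (CatBoost or any sklearn-compatible model) has already produced lower and upper quantile predictions for a held-out calibration set and a test set. The function name `conformalize_quantile_intervals` is hypothetical, not mirkwood's API, and NumPy stands in for the full pipeline.

```python
import numpy as np

def conformalize_quantile_intervals(lo_cal, hi_cal, y_cal, lo_test, hi_test, alpha=0.1):
    """Hypothetical helper: conformalized quantile regression (split conformal).

    lo_*/hi_* are a base model's predicted lower/upper quantiles (assumed given);
    y_cal are held-out calibration targets. Returns adjusted test intervals that
    target marginal coverage of at least 1 - alpha.
    """
    lo_cal, hi_cal, y_cal = (np.asarray(a, dtype=float) for a in (lo_cal, hi_cal, y_cal))
    # Conformity score: distance by which each target falls outside its interval
    # (negative when the target lies inside).
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = scores.size
    # Finite-sample-corrected rank of the score quantile.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else np.sort(scores)[k - 1]
    # Widen (or shrink, if q < 0) every test interval by the same margin.
    return np.asarray(lo_test, dtype=float) - q, np.asarray(hi_test, dtype=float) + q
```

The calibration set must be disjoint from the data used to train the base model; that separation is what makes the coverage guarantee hold regardless of how well-calibrated the base quantiles were.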
Submitted 10 February, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
deep-REMAP: Parameterization of Stellar Spectra Using Regularized Multi-Task Learning
Authors:
Sankalp Gilda
Abstract:
Traditional spectral analysis methods are increasingly challenged by the exploding volumes of data produced by contemporary astronomical surveys. In response, we develop deep-Regularized Ensemble-based Multi-task Learning with Asymmetric Loss for Probabilistic Inference ($\rm{deep-REMAP}$), a novel framework that utilizes the rich synthetic spectra from the PHOENIX library and observational data from the MARVELS survey to accurately predict stellar atmospheric parameters. By harnessing advanced machine learning techniques, including multi-task learning and an innovative asymmetric loss function, $\rm{deep-REMAP}$ demonstrates superior predictive capabilities in determining effective temperature, surface gravity, and metallicity from observed spectra. Our results reveal the framework's effectiveness in extending to other stellar libraries and properties, paving the way for more sophisticated and automated techniques in stellar characterization.
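The abstract does not define its asymmetric loss. As a clearly swapped-in stand-in, the sketch below uses the pinball (quantile) loss, a standard asymmetric loss, summed over tasks to show how a multi-task objective over $T_{\rm eff}$, $\log g$, and [Fe/H] can combine per-task terms. The function names and the per-task `taus` and `weights` are hypothetical.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Pinball (quantile) loss: under-predictions cost tau per unit,
    over-predictions cost (1 - tau) per unit. A stand-in, not deep-REMAP's loss."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

def multitask_loss(Y_true, Y_pred, taus, weights):
    """Weighted sum of per-task asymmetric losses; column j holds task j
    (e.g. Teff, log g, [Fe/H]). taus and weights are hypothetical choices."""
    Y_true, Y_pred = np.asarray(Y_true, dtype=float), np.asarray(Y_pred, dtype=float)
    return sum(w * pinball_loss(Y_true[:, j], Y_pred[:, j], t)
               for j, (t, w) in enumerate(zip(taus, weights)))
```

In practice the targets would be standardized first, so that $T_{\rm eff}$ (thousands of K) does not dominate the sum over $\log g$ and [Fe/H].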
Submitted 21 November, 2023; v1 submitted 7 November, 2023;
originally announced November 2023.
-
Unsupervised Domain Adaptation for Constraining Star Formation Histories
Authors:
Sankalp Gilda,
Antoine de Mathelin,
Sabine Bellstedt,
Guillaume Richard
Abstract:
The prevalent paradigm of machine learning today is to use past observations to predict future ones. What if, however, we are interested in knowing the past given the present? This situation is one that astronomers must often contend with. To understand the formation of our universe, we must derive the time evolution of the visible mass content of galaxies. However, to observe a complete star life, one would need to wait for one billion years! To overcome this difficulty, astrophysicists leverage supercomputers and evolve simulated models of galaxies until the current age of the universe, thus establishing a mapping between observed radiation and star formation histories (SFHs). Such ground-truth SFHs are lacking for actual galaxy observations, where they are usually inferred (often with poor confidence) from spectral energy distributions (SEDs) using Bayesian fitting methods. In this investigation, we discuss the ability of unsupervised domain adaptation to derive accurate SFHs for galaxies with simulated data, as a necessary first step in developing a technique that can ultimately be applied to observational data.
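The abstract does not name its domain-adaptation method. To illustrate the general idea, the sketch below uses CORAL (correlation alignment), one simple unsupervised technique, as a swapped-in example. It recolors source-domain features, here taken to be labeled simulated photometry, so that their covariance matches an unlabeled target domain. A regressor trained on the aligned source features could then be applied to the target. The function names are hypothetical.

```python
import numpy as np

def _sym_matrix_power(C, p):
    # Power of a symmetric positive-definite matrix via its eigendecomposition.
    vals, vecs = np.linalg.eigh(C)
    return (vecs * vals**p) @ vecs.T

def coral(Xs, Xt, eps=1.0):
    """CORAL (swapped-in UDA example): whiten source features with Cs^{-1/2},
    then recolor them with Ct^{1/2}. eps regularizes both covariances."""
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(d)
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(d)
    return Xs @ _sym_matrix_power(Cs, -0.5) @ _sym_matrix_power(Ct, 0.5)
```

CORAL only aligns second-order statistics. Richer adversarial or importance-weighting approaches exist, and which one suits SFH recovery is not settled by this sketch.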
Submitted 26 August, 2022; v1 submitted 28 December, 2021;
originally announced December 2021.
-
Uncertainty-Aware Learning for Improvements in Image Quality of the Canada-France-Hawaii Telescope
Authors:
Sankalp Gilda,
Stark C. Draper,
Sebastien Fabbro,
William Mahoney,
Simon Prunet,
Kanoa Withington,
Matthew Wilson,
Yuan-Sen Ting,
Andrew Sheinis
Abstract:
We leverage state-of-the-art machine learning methods and a decade's worth of archival data from CFHT to predict observatory image quality (IQ) from environmental conditions and observatory operating parameters. Specifically, we develop accurate and interpretable models of the complex dependence between data features and observed IQ for CFHT's wide-field camera, MegaCam. Our contributions are several-fold. First, we collect, collate and reprocess several disparate data sets gathered by CFHT scientists. Second, we predict probability distribution functions (PDFs) of IQ and achieve a mean absolute error of $\sim0.07''$ for the predicted medians. Third, we explore the data-driven actuation of the 12 dome "vents" installed in 2013-14 to accelerate the flushing of hot air from the dome. We leverage epistemic and aleatoric uncertainties in conjunction with probabilistic generative modeling to identify candidate vent adjustments that are in-distribution (ID); for the optimal configuration for each ID sample, we predict the reduction in required observing time to achieve a fixed SNR. On average, the reduction is $\sim12\%$. Finally, we rank input features by their Shapley values to identify the most predictive variables for each observation. Our long-term goal is to construct reliable and real-time models that can forecast optimal observatory operating parameters to optimize IQ. We can then feed such forecasts into scheduling protocols and predictive maintenance routines. We anticipate that such approaches will become standard in automating observatory operations and maintenance by the time CFHT's successor, the Maunakea Spectroscopic Explorer, is installed in the next decade.
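One common way to separate the two kinds of uncertainty mentioned above uses an ensemble in which each member predicts a Gaussian mean and variance. By the law of total variance, the mean of the members' variances estimates the aleatoric (noise) part, and the variance of their means estimates the epistemic (model) part. The sketch below assumes that ensemble setup, which the abstract does not specify. The function name is hypothetical.

```python
import numpy as np

def decompose_uncertainty(means, variances):
    """Hypothetical helper: split an ensemble's predictive variance.

    means, variances: shape (n_members, n_samples); member i predicts
    N(means[i], variances[i]) for each sample (an assumed model form).
    """
    means, variances = np.asarray(means, dtype=float), np.asarray(variances, dtype=float)
    aleatoric = variances.mean(axis=0)   # average predicted noise level
    epistemic = means.var(axis=0)        # disagreement between members
    return aleatoric, epistemic, aleatoric + epistemic
```

A large epistemic term flags inputs unlike the training data. Screening candidate vent settings on such a signal is one plausible reading of "in-distribution" here, but that is an interpretation, not the authors' stated procedure.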
Submitted 15 November, 2021; v1 submitted 30 June, 2021;
originally announced July 2021.
-
mirkwood: Fast and Accurate SED Modeling Using Machine Learning
Authors:
Sankalp Gilda,
Sidney Lower,
Desika Narayanan
Abstract:
Traditional spectral energy distribution (SED) fitting codes used to derive galaxy physical properties are often uncertain at the level of a factor of a few, owing to uncertainties in galaxy star formation histories and dust attenuation curves. Beyond this, Bayesian fitting (which is typically used in SED fitting software) is an intrinsically compute-intensive task, often requiring access to expensive hardware for long periods of time. To overcome these shortcomings, we have developed mirkwood: a user-friendly tool comprising an ensemble of supervised machine learning-based models capable of non-linearly mapping galaxy fluxes to their properties. By stacking multiple models, we marginalize against any individual model's poor performance in a given region of the parameter space. We demonstrate mirkwood's significantly improved performance over traditional techniques by training it on a combined data set of mock photometry of $z=0$ galaxies from the Simba, EAGLE and IllustrisTNG cosmological simulations, and comparing the derived results with those obtained from traditional SED fitting techniques. mirkwood is also able to account for uncertainties arising both from intrinsic noise in observations and from finite training data and incorrect modeling assumptions. To increase the added value to the observational community, we use Shapley value explanations (SHAP) to fairly evaluate the relative importance of different bands and to understand why particular predictions were reached. We envisage mirkwood as an evolving, open-source framework that will provide more accurate physical properties from observations of galaxies than traditional SED fitting.
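The stacking idea can be sketched as follows: fit several base learners on one split, then learn linear blending weights on a held-out split. A model that does poorly in some regime then receives little weight. The base learners here are hypothetical polynomial fits standing in for mirkwood's actual models, which this abstract does not list, and the least-squares blender is likewise an assumption.

```python
import numpy as np

def fit_poly(deg):
    """Hypothetical base learner factory: least-squares polynomial of a given degree."""
    def fit(x, y):
        coefs = np.polyfit(x, y, deg)
        return lambda x_new: np.polyval(coefs, x_new)
    return fit

def stack(base_fitters, x_train, y_train, x_hold, y_hold):
    """Minimal stacking: train base models on one split, then fit linear
    blending weights to their predictions on a held-out split."""
    models = [f(x_train, y_train) for f in base_fitters]
    P_hold = np.column_stack([m(x_hold) for m in models])
    weights, *_ = np.linalg.lstsq(P_hold, y_hold, rcond=None)
    return lambda x_new: np.column_stack([m(x_new) for m in models]) @ weights
```

Fitting the blending weights on data the base models never saw keeps the blender from simply rewarding the most over-fitted member.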
Submitted 12 January, 2021;
originally announced January 2021.
-
Astronomical Image Quality Prediction based on Environmental and Telescope Operating Conditions
Authors:
Sankalp Gilda,
Yuan-Sen Ting,
Kanoa Withington,
Matthew Wilson,
Simon Prunet,
William Mahoney,
Sebastien Fabbro,
Stark C. Draper,
Andrew Sheinis
Abstract:
Intelligent scheduling of the sequence of scientific exposures taken at ground-based astronomical observatories is massively challenging. Observing time is over-subscribed and atmospheric conditions are constantly changing. We propose to guide observatory scheduling using machine learning. Leveraging a 15-year archive of exposures, environmental, and operating conditions logged by the Canada-France-Hawaii Telescope, we construct a probabilistic data-driven model that accurately predicts image quality. We demonstrate that, by optimizing the opening and closing of twelve vents placed on the dome of the telescope, we can reduce dome-induced turbulence and improve telescope image quality by 0.05-0.2 arcseconds. This translates to a reduction in exposure time (and hence cost) of $\sim 10-15\%$. Our study is the first step toward data-based optimization of the multi-million dollar operations of current and next-generation telescopes.
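Why does a sub-arcsecond improvement in image quality save observing time? Assume a background-limited point source measured in an aperture that scales with the seeing FWHM. Then SNR $\propto \sqrt{t}/\mathrm{FWHM}$, so the time needed to reach a fixed SNR scales as $\mathrm{FWHM}^2$. The helper below encodes that textbook scaling; it is not the paper's exact calculation. The 0.7" baseline in the usage note is a hypothetical value.

```python
def exposure_time_reduction(iq_old, iq_new):
    """Fractional exposure-time saving at fixed SNR, ASSUMING the
    background-limited point-source scaling t ∝ FWHM^2 (iq in arcsec)."""
    return 1.0 - (iq_new / iq_old) ** 2
```

For example, improving a hypothetical 0.7" image to 0.65" gives `exposure_time_reduction(0.7, 0.65)` ≈ 0.138, a saving of about 14%.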
Submitted 5 November, 2020;
originally announced November 2020.
-
Gamma-ray Bursts as distance indicators through a machine learning approach
Authors:
Maria Dainotti,
Vahé Petrosian,
Malgorzata Bogdan,
Blazej Miasojedow,
Shigehiro Nagataki,
Trevor Hastie,
Zooey Nuyngen,
Sankalp Gilda,
Xavier Hernandez,
Dominika Krol
Abstract:
Gamma-ray bursts (GRBs) are spectacularly energetic events with the potential to inform us about the early universe and its evolution, once their redshifts are known. Unfortunately, determining redshifts is a painstaking procedure requiring detailed follow-up multi-wavelength observations, often involving various astronomical facilities that have to be rapidly pointed at these serendipitous events. Here we use machine learning algorithms to infer redshifts from a collection of observed temporal and spectral features of GRBs. We obtain a very high correlation coefficient ($0.96$) between the inferred and the observed redshifts, and a small dispersion (a mean square error of $0.003$) in the test set. The addition of plateau afterglow parameters improves the predictions by $61.4\%$ compared to previous results. The GRB luminosity function and cumulative density rate evolutions obtained from predicted and observed redshifts are in excellent agreement, indicating that GRBs are effective distance indicators and a reliable step for the cosmic distance ladder.
Submitted 11 July, 2019;
originally announced July 2019.
-
Automatic Kalman-Filter-based Wavelet Shrinkage Denoising of 1D Stellar Spectra
Authors:
Sankalp Gilda,
Zachary Slepian
Abstract:
We propose a non-parametric method to denoise 1D stellar spectra based on wavelet shrinkage followed by adaptive Kalman thresholding. Wavelet shrinkage denoising involves applying the Discrete Wavelet Transform (DWT) to the input signal, `shrinking' certain frequency components in the transform domain, and then applying the inverse DWT to the reduced components. The performance of this procedure is influenced by the choice of base wavelet, the number of decomposition levels, and the thresholding function. Typically, these parameters are chosen by trial and error, and the best choice can depend strongly on the properties of the data being denoised. We here introduce an adaptive Kalman-filter-based thresholding method that eliminates the need for choosing the number of decomposition levels. We use the `Haar' wavelet basis, which we found to be the best-suited for 1D stellar spectra. We introduce various levels of Poisson noise into synthetic PHOENIX spectra, and test the performance of several common denoising methods against our own. Our method proves superior in terms of noise suppression and peak shape preservation. We expect it may also be of use in automatically and accurately filtering low signal-to-noise galaxy and quasar spectra obtained from surveys such as SDSS, Gaia, LSST, PESSTO, VANDELS, LEGA-C, and DESI.
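The basic shrinkage pipeline (forward DWT, shrink detail coefficients, inverse DWT) can be sketched with a hand-written orthonormal Haar transform. This is not the authors' adaptive Kalman thresholding. It substitutes the common fixed "universal" soft threshold, with the noise level estimated from the finest-scale coefficients, and it fixes the number of levels, which is exactly the choice the paper's method is designed to avoid. All function names are hypothetical.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)

def haar_idwt(approx, detail):
    """Invert one level of haar_dwt."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

def soft_threshold(c, t):
    # Shrink coefficients toward zero by t; those with |c| <= t become zero.
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def haar_shrink_denoise(y, levels=4):
    """Haar wavelet shrinkage with a universal soft threshold (a swapped-in
    rule, not Kalman thresholding). len(y) must be divisible by 2**levels."""
    approx, details = np.asarray(y, dtype=float), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)                      # details[0] is the finest scale
    sigma = np.median(np.abs(details[0])) / 0.6745   # robust noise estimate
    t = sigma * np.sqrt(2 * np.log(len(y)))          # universal threshold
    for d in reversed(details):                      # rebuild coarse-to-fine
        approx = haar_idwt(approx, soft_threshold(d, t))
    return approx
```

The coarsest approximation coefficients are left untouched, so the continuum level survives while small-scale noise in the detail bands is suppressed.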
Submitted 2 July, 2020; v1 submitted 12 March, 2019;
originally announced March 2019.
-
Feature Selection for Better Spectral Characterization or: How I Learned to Start Worrying and Love Ensembles
Authors:
Sankalp Gilda
Abstract:
An ever-looming threat to astronomical applications of machine learning is the danger of over-fitting, a symptom of the `curse of dimensionality' that arises when there are fewer samples than independent variables. In this work, we focus on the problem of stellar parameterization from low-to-mid-resolution spectra with blended absorption lines. We address this problem using an iterative algorithm that sequentially prunes redundant features from synthetic PHOENIX spectra, arriving at an optimal set of wavelengths with the strongest correlation with each of the output variables: T$_{\rm eff}$, $\log g$, and [Fe/H]. We find that at any given resolution, most features (i.e., absorption lines) are not only redundant but actually act as noise and decrease the accuracy of parameter retrieval.
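Sequential pruning can be illustrated with greedy backward elimination. At each step, the sketch drops the feature (wavelength) whose removal most lowers validation error of an ordinary-least-squares fit, and stops once every removal would hurt. This is a swapped-in selection criterion, since the abstract's algorithm ranks features by correlation with each output. The linear model and function names are hypothetical.

```python
import numpy as np

def val_mse(X_tr, y_tr, X_va, y_va, cols):
    """Validation MSE of an OLS fit (with intercept) using only the given columns."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr[:, cols]])
    A_va = np.column_stack([np.ones(len(X_va)), X_va[:, cols]])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return np.mean((A_va @ coef - y_va) ** 2)

def backward_prune(X_tr, y_tr, X_va, y_va):
    """Greedy backward elimination (hypothetical stand-in for the paper's method):
    repeatedly drop the feature whose removal gives the lowest validation MSE,
    stopping when every candidate removal would increase it."""
    cols = list(range(X_tr.shape[1]))
    best = val_mse(X_tr, y_tr, X_va, y_va, cols)
    while len(cols) > 1:
        trials = [(val_mse(X_tr, y_tr, X_va, y_va, [c for c in cols if c != j]), j)
                  for j in cols]
        err, j = min(trials)
        if err > best:
            break
        best, cols = err, [c for c in cols if c != j]
    return cols, best
```

With few training spectra and many wavelengths, the full model over-fits. Removing uninformative columns then lowers validation error, which mirrors the abstract's finding that most features act as noise.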
Submitted 22 February, 2019; v1 submitted 19 February, 2019;
originally announced February 2019.
-
The first super-Earth Detection from the High Cadence and High Radial Velocity Precision Dharma Planet Survey
Authors:
Bo Ma,
Jian Ge,
Matthew Muterspaugh,
Michael A. Singer,
Gregory W. Henry,
Jonay I. Gonzalez Hernandez,
Sirinrat Sithajan,
Sarik Jeram,
Michael Williamson,
Keivan Stassun,
Benjamin Kimock,
Frank Varosi,
Sidney Schofield,
Jian Liu,
Scott Powell,
Anthony Cassette,
Hali Jakeman,
Louis Avner,
Nolan Grieves,
Rory Barnes,
Sankalp Gilda,
Jim Grantham,
Greg Stafford,
David Savage,
Steve Bland
, et al. (1 additional author not shown)
Abstract:
The Dharma Planet Survey (DPS) aims to monitor about 150 nearby, very bright FGKM dwarfs (within 50 pc) during 2016$-$2020 for low-mass planet detection and characterization using the TOU very-high-resolution optical spectrograph (R$\approx$100,000, 380-900 nm). TOU was initially mounted on the 2-m Automatic Spectroscopic Telescope at Fairborn Observatory in 2013-2015 to conduct a pilot survey, then moved to the dedicated 50-inch automatic telescope on Mt. Lemmon in 2016 to launch the survey. Here we report the first planet detection from DPS, a super-Earth candidate orbiting a bright K dwarf star, HD 26965. It is the second-brightest star ($V=4.4$ mag) in the sky known to host a super-Earth candidate. The planet candidate has a mass of $8.47\pm0.47\,M_{\rm Earth}$, a period of $42.38\pm0.01$ d, and an eccentricity of $0.04^{+0.05}_{-0.03}$. This RV signal was independently detected by Diaz et al. (2018), but they could not confirm whether the signal is from a planet or from stellar activity. The orbital period of the planet is close to the rotation period of the star (39$-$44.5 d) measured from stellar activity indicators. Our high-precision photometric campaign and line bisector analysis of this star do not find any significant variations at the orbital period. Stellar RV jitter modeled from star spots and convection inhibition is also not strong enough to explain the detected RV signal. After further comparing RV data from the star's active and quiet magnetic phases, we conclude that the RV signal is due to planetary reflex motion and not stellar activity.
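To gauge how small this signal is, the standard Keplerian radial-velocity semi-amplitude can be computed from the quoted mass, period, and eccentricity. The reported mass is treated as $m\sin i$. The host-star mass of 0.78 $M_\odot$ is an assumption made for illustration; it is not stated in the abstract. The function name is hypothetical.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # kg
M_EARTH = 5.972e24   # kg

def rv_semi_amplitude(m_sini_earth, period_days, e, m_star_sun):
    """Keplerian RV semi-amplitude K in m/s:
    K = (2 pi G / P)^(1/3) * m sin i / (M* + m)^(2/3) / sqrt(1 - e^2)."""
    P = period_days * 86400.0
    m_p = m_sini_earth * M_EARTH
    M = m_star_sun * M_SUN
    return ((2 * math.pi * G / P) ** (1 / 3) * m_p
            / (M + m_p) ** (2 / 3) / math.sqrt(1 - e**2))

# Stellar mass of 0.78 M_sun is an ASSUMED value, not taken from the abstract.
K = rv_semi_amplitude(8.47, 42.38, 0.04, m_star_sun=0.78)
```

Under that assumption, K comes out at roughly 1.8 m/s. Signals of this size are why the analysis has to rule out stellar-activity jitter carefully.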
Submitted 18 July, 2018;
originally announced July 2018.