arXiv:2310.19340 [pdf, other]

Splines 'n Lines: Rest-frame galaxy spectral energy distributions via Bayesian functional data analysis

Authors: David Kent, Tamás Budavári, Thomas J. Loredo, David Ruppert

Abstract: Survey-based measurements of the spectral energy distributions (SEDs) of galaxies have flux density estimates on badly misaligned grids in rest-frame wavelength. The shift to rest frame wavelength also causes estimated SEDs to have differing support. For many galaxies, there are sizeable wavelength regions with missing data. Finally, dim galaxies dominate typical samples and have noisy SED measurements, many near the limiting signal-to-noise level of the survey. These limitations of SED measurements shifted to the rest frame complicate downstream analysis tasks, particularly tasks requiring computation of functionals (e.g., weighted integrals) of the SEDs, such as synthetic photometry, quantifying SED similarity, and using SED measurements for photometric redshift estimation. We describe a hierarchical Bayesian framework, drawing on tools from functional data analysis, that models SEDs as a random superposition of smooth continuum basis functions (B-splines) and line features, comprising a finite-rank, nonstationary Gaussian process, measured with additive Gaussian noise. We apply this *Splines 'n Lines* (SnL) model to a collection of 678,239 galaxy SED measurements comprising the Main Galaxy Sample from the Sloan Digital Sky Survey, Data Release 17, demonstrating capability to provide continuous estimated SEDs that reliably denoise, interpolate, and extrapolate, with quantified uncertainty, including the ability to predict line features where there is missing data by leveraging correlations between line features and the entire continuum.

Submitted 30 October, 2023; originally announced October 2023.

Comments: 20 pages, 7 figures

arXiv:2302.05804 [pdf, ps, other]

A flexible Expectation-Maximization framework for fast, scalable and high-fidelity multi-frame astronomical image deconvolution

Authors: Yashil Sukurdeep, Fausto Navarro, Tamas Budavari

Abstract: We present a computationally efficient expectation-maximization framework for multi-frame image deconvolution and super-resolution. Our method is well adapted for processing large scale imaging data from modern astronomical surveys. Our Tensorflow implementation is flexible, benefits from advanced algorithmic solutions, and allows users to seamlessly leverage Graphical Processing Unit (GPU) acceleration, thus making it viable for use in modern astronomical software pipelines. The testbed for our method is a set of $4$K by $4$K Hyper Suprime-Cam exposures, which are closest in terms of quality to imaging data from the upcoming Rubin Observatory. The preliminary results are extremely promising: our method produces a high-fidelity non-parametric reconstruction of the night sky, from which we recover unprecedented details such as the shape of the spiral arms of galaxies, while also managing to deconvolve stars perfectly into essentially single pixels.

Submitted 11 February, 2023; originally announced February 2023.

arXiv:2302.02030 [pdf, ps, other]

Learning the Night Sky with Deep Generative Priors

Authors: Fausto Navarro, Daniel Hall, Tamas Budavari, Yashil Sukurdeep

Abstract: Recovering sharper images from blurred observations, referred to as deconvolution, is an ill-posed problem where classical approaches often produce unsatisfactory results. In ground-based astronomy, combining multiple exposures to achieve images with higher signal-to-noise ratios is complicated by the variation of point-spread functions across exposures due to atmospheric effects. We develop an unsupervised multi-frame method for denoising, deblurring, and coadding images inspired by deep generative priors. We use a carefully chosen convolutional neural network architecture that combines information from multiple observations, regularizes the joint likelihood over these observations, and allows us to impose desired constraints, such as non-negativity of pixel values in the sharp, restored image. With an eye towards the Rubin Observatory, we analyze 4K by 4K Hyper Suprime-Cam exposures and obtain preliminary results which yield promising restored images and extracted source lists.

Submitted 3 February, 2023; originally announced February 2023.

arXiv:2207.11125 [pdf, other]

doi 10.3847/1538-3881/ac6bf6

Globally optimal and scalable $N$-way matching of astronomy catalogs

Authors: Tu Nguyen, Amitabh Basu, Tamás Budavári

Abstract: Building on previous Bayesian approaches, we introduce a novel formulation of probabilistic cross-identification, where detections are directly associated to (hypothesized) astronomical objects in a globally optimal way. We show that this new method scales better for processing multiple catalogs than enumerating all possible candidates, especially in the limit of crowded fields, which is the most challenging observational regime for new-generation astronomy experiments such as the Rubin Observatory Legacy Survey of Space and Time (LSST). Here we study simulated catalogs where the ground-truth is known and report on the statistical and computational performance of the method. The paper is accompanied by a public software tool to perform globally optimal catalog matching based on directional data.

Submitted 22 July, 2022; originally announced July 2022.

MSC Class: 85-05; 90-10; 90C90

Journal ref: The Astronomical Journal, Volume 163, Number 6, 2022

arXiv:2105.08026 [pdf, other]

doi 10.1016/j.ascom.2018.10.004

GPU-Accelerated Hierarchical Bayesian Inference with Application to Modeling Cosmic Populations: CUDAHM

Authors: János M. Szalai-Gindl, Thomas J. Loredo, Brandon C. Kelly, István Csabai, Tamás Budavári, László Dobos

Abstract: We describe a computational framework for hierarchical Bayesian inference with simple (typically single-plate) parametric graphical models that uses graphics processing units (GPUs) to accelerate computations, enabling deployment on very large datasets. Its C++ implementation, CUDAHM (CUDA for Hierarchical Models) exploits conditional independence between instances of a plate, facilitating massively parallel exploration of the replication parameter space using the single instruction, multiple data architecture of GPUs. It provides support for constructing Metropolis-within-Gibbs samplers that iterate between GPU-accelerated robust adaptive Metropolis sampling of plate-level parameters conditional on upper-level parameters, and Metropolis-Hastings sampling of upper-level parameters on the host processor conditional on the GPU results. CUDAHM is motivated by demographic problems in astronomy, where density estimation and linear and nonlinear regression problems must be addressed for populations of thousands to millions of objects whose features are measured with possibly complex uncertainties. We describe a thinned latent point process framework for modeling such demographic data. We demonstrate accurate GPU-accelerated parametric conditional density deconvolution for simulated populations of up to 300,000 objects in ~1 hour using a single NVIDIA Tesla K40c GPU. Supplementary material provides details about the CUDAHM API and the demonstration problem.

Submitted 17 May, 2021; originally announced May 2021.

Comments: 28 pages, 7 figures, 2 appendices

Journal ref: Abridged version published as "GPU-accelerated hierarchical Bayesian estimation of luminosity functions using flux-limited observations with photometric noise" in Astronomy & Computing (2018)

arXiv:2102.10627 [pdf, other]

doi 10.3847/1538-4357/abe8d2

Probabilistic Association of Transients to their Hosts (PATH)

Authors: Kshitij Aggarwal, Tamás Budavári, Adam T. Deller, Tarraneh Eftekhari, Clancy W. James, J. Xavier Prochaska, Shriharsh P. Tendulkar

Abstract: We introduce a new method to estimate the probability that an extragalactic transient source is associated with a candidate host galaxy. This approach relies solely on simple observables: sky coordinates and their uncertainties, galaxy fluxes and angular sizes. The formalism invokes Bayes' rule to calculate the posterior probability P(O_i|x) from the galaxy prior P(O), observables x, and an assumed model for the true distribution of transients in/around their host galaxies. Using simulated transients placed in the well-studied COSMOS field, we consider several agnostic and physically motivated priors and offset distributions to explore the method sensitivity. We then apply the methodology to the set of 13 fast radio bursts (FRBs) localized with an uncertainty of several arcseconds. Our methodology finds nine of these are securely associated to a single host galaxy, P(O_i|x)>0.95. We examine the observed and intrinsic properties of these secure FRB hosts, recovering similar distributions as previous works. Furthermore, we find a strong correlation between the apparent magnitude of the securely identified host galaxies and the estimated cosmic dispersion measures of the corresponding FRBs, which results from the Macquart relation. Future work with FRBs will leverage this relation and other measures from the secure hosts as priors for future associations. The methodology is generic to transient type, localization error, and image quality. We encourage its application to other transients where host galaxy associations are critical to the science, e.g. gravitational wave events, gamma-ray bursts, and supernovae. We have encoded the technique in Python on GitHub: https://github.com/FRBs/astropath.

Submitted 21 February, 2021; originally announced February 2021.

Comments: In press, ApJ; comments still welcome; Visit https://github.com/FRBs/astropath to use and build PATH
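
Illustrative note: the Bayes'-rule step named in the PATH abstract above can be sketched in a few lines. This is only a minimal sketch under stated assumptions, not the astropath API: the function name `association_posteriors`, the example priors, and the example likelihood values are all hypothetical, and the sketch assumes the transient must belong to one of the listed candidates (no "unseen host" term).

```python
def association_posteriors(priors, likelihoods):
    """Apply Bayes' rule over a closed set of candidate hosts.

    priors[i]      -- hypothetical prior P(O_i) for candidate i
    likelihoods[i] -- hypothetical p(x | O_i) for the observed position x
    Returns the normalized posteriors P(O_i | x).
    """
    weights = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(weights)  # p(x), summed over all candidates
    if evidence <= 0:
        raise ValueError("all candidates have zero prior or likelihood")
    return [w / evidence for w in weights]


# Three made-up candidates; the first one lies closest to the transient.
posteriors = association_posteriors([0.5, 0.3, 0.2], [2.0, 0.1, 0.1])

# Indices of candidates passing the P(O_i|x) > 0.95 cut quoted in the abstract.
secure = [i for i, p in enumerate(posteriors) if p > 0.95]
```

The only value taken from the abstract is the 0.95 cut for a "secure" association; the priors and the offset model that would produce the likelihoods are left to the caller.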

arXiv:2102.10260 [pdf, other]

Wireless sensor network for in situ soil moisture monitoring

Authors: Jianing Fang, Chuheng Hu, Nour Smaoui, Doug Carlson, Jayant Gupchup, Razvan Musaloiu-E., Chieh-Jan Mike Liang, Marcus Chang, Omprakash Gnawali, Tamas Budavari, Andreas Terzis, Katalin Szlavecz, Alexander S. Szalay

Abstract: We discuss the history and lessons learned from a series of deployments of environmental sensors measuring soil parameters and CO2 fluxes over the last fifteen years, in an outdoor environment. We present the hardware and software architecture of our current Gen-3 system, and then discuss how we are simplifying the user facing part of the software, to make it easier and friendlier for the environmental scientist to be in full control of the system. Finally, we describe the current effort to build a large-scale Gen-4 sensing platform consisting of hundreds of nodes to track the environmental parameters for urban green spaces in Baltimore, Maryland.

Submitted 20 February, 2021; originally announced February 2021.

Comments: 12 pages, 16 figures, Sensornets 2021 Conference

arXiv:2008.05276 [pdf, other]

doi 10.1093/mnras/staa2447

Optimal Probabilistic Catalogue Matching for Radio Sources

Authors: Dongwei Fan, Tamás Budavári, Ray P. Norris, Amitabh Basu

Abstract: Cross-matching catalogues from radio surveys to catalogues of sources at other wavelengths is extremely hard, because radio sources are often extended, often consist of several spatially separated components, and often no radio component is coincident with the optical/infrared host galaxy. Traditionally, the cross-matching is done by eye, but this does not scale to the millions of radio sources expected from the next generation of radio surveys. We present an innovative automated procedure, using Bayesian hypothesis testing, that models trial radio-source morphologies with putative positions of the host galaxy. This new algorithm differs from an earlier version by allowing more complex radio source morphologies, and performing a simultaneous fit over a large field. We show that this technique performs well in an unsupervised mode.

Submitted 12 August, 2020; originally announced August 2020.

Comments: 9 pages, 7 figures

arXiv:2007.11598 [pdf, other]

doi 10.1093/mnras/staa2165

Computational Tools for the Spectroscopic Analysis of White Dwarfs

Authors: Vedant Chandra, Hsiang-Chih Hwang, Nadia L. Zakamska, Tamás Budavári

Abstract: The spectroscopic features of white dwarfs are formed in the thin upper layer of their stellar photosphere. These features carry information about the white dwarf's surface temperature, surface gravity, and chemical composition (hereafter 'labels'). Existing methods to determine these labels rely on complex ab-initio theoretical models which are not always publicly available. Here we present two techniques to determine atmospheric labels from white dwarf spectra: a generative fitting pipeline that interpolates theoretical spectra with artificial neural networks, and a random forest regression model using parameters derived from absorption line features. We test and compare our methods using a large catalog of white dwarfs from the Sloan Digital Sky Survey (SDSS), achieving the same accuracy and negligible bias compared to previous studies. We package our techniques into an open-source Python module 'wdtools' that provides a computationally inexpensive way to determine stellar labels from white dwarf spectra observed from any facility. We will actively develop and update our tool as more theoretical models become publicly available. We discuss applications of our tool in its present form to identify interesting outlier white dwarf systems including those with magnetic fields, helium-rich atmospheres, and double-degenerate binaries.

Submitted 8 September, 2020; v1 submitted 22 July, 2020; originally announced July 2020.

Comments: 11 pages, 7 figures. Accepted for publication in MNRAS; updated references

arXiv:1909.10757 [pdf, other]

doi 10.1051/0004-6361/201936026

The Hubble Catalog of Variables (HCV)

Authors: A. Z. Bonanos, M. Yang, K. V. Sokolovsky, P. Gavras, D. Hatzidimitriou, I. Bellas-Velidis, G. Kakaletris, D. J. Lennon, A. Nota, R. L. White, B. C. Whitmore, K. A. Anastasiou, M. Arévalo, C. Arviset, D. Baines, T. Budavari, V. Charmandaris, C. Chatzichristodoulou, E. Dimas, J. Durán, I. Georgantopoulos, A. Karampelas, N. Laskaris, S. Lianou, A. Livanis, et al. (11 additional authors not shown)

Abstract: The Hubble Space Telescope (HST) has obtained multi-epoch observations providing the opportunity for a comprehensive variability search aiming to uncover new variables. We have therefore undertaken the task of creating a catalog of variable sources based on version 3 of the Hubble Source Catalog (HSC), which relies on publicly available images obtained with the WFPC2, ACS, and WFC3 instruments onboard the HST. We adopted magnitude-dependent thresholding in median absolute deviation (a robust measure of light curve scatter) combined with sophisticated preprocessing techniques and visual quality control to identify and validate variable sources observed by Hubble with the same instrument and filter combination five or more times. The Hubble Catalog of Variables (HCV) includes 84,428 candidate variable sources (out of 3.7 million HSC sources that were searched for variability) with $V \leq 27$ mag; for 11,115 of them the variability is detected in more than one filter. The data points in the light curves of the variables in the HCV catalog range from five to 120 points (typically having less than ten points); the time baseline ranges from under a day to over 15 years; while $\sim$8% of all variables have amplitudes in excess of 1 mag. Visual inspection performed on a subset of the candidate variables suggests that at least 80% of the candidate variables that passed our automated quality control are true variable sources rather than spurious detections resulting from blending, residual cosmic rays, and calibration errors. The HCV is the first, homogeneous catalog of variable sources created from archival HST data and currently is the deepest catalog of variables available. The catalog includes variable stars in our Galaxy and nearby galaxies, as well as transients and variable active galactic nuclei. (abbreviated)

Submitted 24 September, 2019; originally announced September 2019.

Comments: 33 pages, 15 figures, 11 tables; accepted in A&A. The HCV is available from the ESA Hubble Science Archive (eHST) (http://archives.esac.esa.int/ehst) at ESAC, and can be easily explored using the HCV Explorer (http://archives.esac.esa.int/hcv-explorer). The HCV is also hosted at STScI in the form of a High Level Science Product (HLSP) in MAST (https://archive.stsci.edu/hlsp/hcv)
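
Illustrative note: the variability statistic named in the HCV abstract above, the median absolute deviation (MAD) of a light curve compared against a threshold, can be sketched as follows. This is a rough sketch under stated assumptions, not the HCV pipeline: the function names, the example magnitudes, and the threshold value of 0.1 mag are hypothetical. The abstract says the real threshold depends on magnitude, so here it is simply passed in by the caller.

```python
from statistics import median


def median_absolute_deviation(mags):
    """Median of absolute deviations from the median (robust scatter)."""
    centre = median(mags)
    return median([abs(m - centre) for m in mags])


def is_candidate_variable(mags, mad_threshold, min_points=5):
    """Flag a light curve whose MAD exceeds a caller-supplied threshold.

    min_points=5 mirrors the "five or more times" requirement in the
    abstract; mad_threshold is a hypothetical stand-in for the
    magnitude-dependent threshold.
    """
    if len(mags) < min_points:
        return False
    return median_absolute_deviation(mags) > mad_threshold


# Made-up light curves in magnitudes.
steady = [20.0, 20.01, 19.99, 20.0, 20.02]
varying = [20.0, 20.5, 19.4, 20.9, 19.6]

steady_flag = is_candidate_variable(steady, mad_threshold=0.1)
varying_flag = is_candidate_variable(varying, mad_threshold=0.1)
```

Because the MAD uses medians rather than means, a single outlying measurement (for example a residual cosmic ray) moves it far less than it would move a standard deviation.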

arXiv:1908.10971 [pdf, other]

doi 10.3847/1538-3881/ab3f38

Robust Registration of Astronomy Catalogs with Applications to the Hubble Space Telescope

Authors: Fan Tian, Tamás Budavári, Amitabh Basu, Stephen H. Lubow, Richard L. White

Abstract: Astrometric calibration of images with a small field of view is often inferior to the internal accuracy of the source detections due to the small number of accessible guide stars. One important experiment with such challenges is the Hubble Space Telescope (HST). A possible solution is to cross-calibrate overlapping fields instead of just relying on standard stars. Following the approach of \citet{2012ApJ...761..188B}, we use infinitesimal 3D rotations for fine-tuning the calibration but devise a better objective that is robust to a large number of false candidates in the initial set of associations. Using Bayesian statistics, we accommodate bad data by explicitly modeling the quality, which yields a formalism essentially identical to an $M$-estimation in robust statistics. Our results on simulated and real catalogs show great potential for improving the calibration of HST and of other experiments with similar challenges.

Submitted 28 August, 2019; originally announced August 2019.

arXiv:1903.06796 [pdf, ps, other]

Astro2020 Science White Paper: The Next Decade of Astroinformatics and Astrostatistics

Authors: A. Siemiginowska, G. Eadie, I. Czekala, E. Feigelson, E. B. Ford, V. Kashyap, M. Kuhn, T. Loredo, M. Ntampaka, A. Stevens, A. Avelino, K. Borne, T. Budavari, B. Burkhart, J. Cisewski-Kehe, F. Civano, I. Chilingarian, D. A. van Dyk, G. Fabbiano, D. P. Finkbeiner, D. Foreman-Mackey, P. Freeman, A. Fruscione, A. A. Goodman, M. Graham, et al. (27 additional authors not shown)

Abstract: Over the past century, major advances in astronomy and astrophysics have been largely driven by improvements in instrumentation and data collection. With the amassing of high quality data from new telescopes, and especially with the advent of deep and large astronomical surveys, it is becoming clear that future advances will also rely heavily on how those data are analyzed and interpreted. New methodologies derived from advances in statistics, computer science, and machine learning are beginning to be employed in sophisticated investigations that are not only bringing forth new discoveries, but are placing them on a solid footing. Progress in wide-field sky surveys, interferometric imaging, precision cosmology, exoplanet detection and characterization, and many subfields of stellar, Galactic and extragalactic astronomy, has resulted in complex data analysis challenges that must be solved to perform scientific inference. Research in astrostatistics and astroinformatics will be necessary to develop the state-of-the-art methodology needed in astronomy. Overcoming these challenges requires dedicated, interdisciplinary research. We recommend: (1) increasing funding for interdisciplinary projects in astrostatistics and astroinformatics; (2) dedicating space and time at conferences for interdisciplinary research and promotion; (3) developing sustainable funding for long-term astrostatistics appointments; and (4) funding infrastructure development for data archives and archive support, state-of-the-art algorithms, and efficient computing.

Submitted 15 March, 2019; originally announced March 2019.

Comments: Submitted to the Astro2020 Decadal Survey call for science white papers

arXiv:1902.05188 [pdf, other]

doi 10.1088/1538-3873/ab0f7b

A Comparison of Photometric Redshift Techniques for Large Radio Surveys

Authors: Ray P. Norris, M. Salvato, G. Longo, M. Brescia, T. Budavari, S. Carliles, S. Cavuoti, D. Farrah, J. Geach, K. Luken, A. Musaeva, K. Polsterer, G. Riccio, N. Seymour, V. Smolčić, M. Vaccari, P. Zinn

Abstract: Future radio surveys will generate catalogues of tens of millions of radio sources, for which redshift estimates will be essential to achieve many of the science goals. However, spectroscopic data will be available for only a small fraction of these sources, and in most cases even the optical and infrared photometry will be of limited quality. Furthermore, radio sources tend to be at higher redshift than most optical sources and so a significant fraction of radio source hosts differ from those for which most photometric redshift templates are designed. We therefore need to develop new techniques for estimating the redshifts of radio sources. As a starting point in this process, we evaluate a number of machine-learning techniques for estimating redshift, together with a conventional template-fitting technique. We pay special attention to how the performance is affected by the incompleteness of the training sample and by sparseness of the parameter space or by limited availability of ancillary multi-wavelength data. As expected, we find that the quality of the photometric redshifts degrades as the quality of the photometry decreases, but that even with the limited quality of photometry available for all-sky surveys, useful redshift information is available for the majority of sources, particularly at low redshift. We find that a template-fitting technique performs best with high-quality and almost complete multi-band photometry, especially if radio sources that are also X-ray emitting are treated separately. When we reduced the quality of photometry to match that available for the EMU all-sky radio survey, the quality of the template-fitting degraded and became comparable to some of the machine learning methods. Machine learning techniques currently perform better at low redshift than at high redshift, because of incompleteness of the currently available training data at high redshifts.

Submitted 13 February, 2019; originally announced February 2019.

Comments: Submitted to PASP

arXiv:1807.07211 [pdf, other]

doi 10.3847/1538-3881/ab139f

Subband Image Reconstruction using Differential Chromatic Refraction

Authors: Matthias Lee, Tamas Budavari, Ian Sullivan, Andrew Connolly

Abstract: Refraction by the atmosphere causes the positions of sources to depend on the airmass through which an observation was taken. This shift is dependent on the underlying spectral energy of the source and the filter or bandpass through which it is observed. Wavelength-dependent refraction within a single passband is often referred to as differential chromatic refraction (DCR). With a new generation of astronomical surveys undertaking repeated observations of the same part of the sky over a range of different airmasses and parallactic angles, DCR should be a detectable and measurable astrometric signal. In this paper we introduce a novel procedure that takes this astrometric signal and uses it to infer the underlying spectral energy distribution of a source; we solve for multiple latent images at specific wavelengths via a generalized deconvolution procedure built on robust statistics. We demonstrate the utility of such an approach for estimating a partially deconvolved image, at higher spectral resolution than the input images, for surveys such as the Large Synoptic Survey Telescope (LSST).

Submitted 24 March, 2019; v1 submitted 18 July, 2018; originally announced July 2018.

arXiv:1803.04974 [pdf, ps, other]

doi 10.1017/S1743921318002296

The Hubble Catalog of Variables (HCV)

Authors: K. V. Sokolovsky, A. Z. Bonanos, P. Gavras, M. Yang, D. Hatzidimitriou, M. I. Moretti, A. Karampelas, I. Bellas-Velidis, Z. Spetsieri, E. Pouliasis, I. Georgantopoulos, V. Charmandaris, K. Tsinganos, N. Laskaris, G. Kakaletris, A. Nota, D. Lennon, C. Arviset, B. C. Whitmore, T. Budavari, R. Downes, S. Lubow, A. Rest, L. Strolger, R. White

Abstract: The Hubble Source Catalog (HSC) combines lists of sources detected on images obtained with the WFPC2, ACS and WFC3 instruments aboard the Hubble Space Telescope (HST) available in the Hubble Legacy Archive. The catalog contains time-domain information with about two million of its sources detected with the same instrument and filter in at least five HST visits. The Hubble Catalog of Variables (HCV) project aims to identify HSC sources showing significant brightness variations. A magnitude-dependent threshold in the median absolute deviation of photometric measurements (an outlier-resistant measure of lightcurve scatter) is adopted as the variability-detection statistic. It is supplemented with a cut in $\chi_{\rm red}^2$ that removes sources with large photometric errors. A pre-processing procedure involving bad image identification, outlier rejection and computation of local magnitude zero-point corrections is applied to HSC lightcurves before computing the variability detection statistic. About 52000 HSC sources are identified as candidate variables, among which 7800 show variability in more than one filter. Visual inspection suggests that $\sim 70\%$ of the candidates detected in multiple filters are true variables while the remaining $\sim 30\%$ are sources with aperture photometry corrupted by blending, imaging artifacts or image processing anomalies. The candidate variables have AB magnitudes in the range 15-27$^{m}$ with the median 22$^{m}$. Among them are the stars in our own and nearby galaxies as well as active galactic nuclei.

Submitted 13 March, 2018; originally announced March 2018.

Comments: 4 pages, 1 figure, proceedings of the IAU Symposium 339 Southern Horizons in Time-Domain Astronomy, 13-17 November 2017, Stellenbosch, South Africa
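
The abstract above names the median absolute deviation (MAD) as an outlier-resistant measure of lightcurve scatter. The following is a minimal sketch of that statistic only; the function name and the toy magnitudes are hypothetical, and the magnitude-dependent threshold and $χ_{\rm red}^2$ cut used by the HCV pipeline are not reproduced here.

```python
from statistics import median, pstdev


def median_absolute_deviation(mags):
    """Return median(|m_i - median(m)|) for a sequence of magnitudes."""
    center = median(mags)
    return median([abs(m - center) for m in mags])


# Hypothetical lightcurve: four consistent points and one corrupted measurement.
lightcurve = [20.0, 20.1, 19.9, 20.0, 25.0]

mad = median_absolute_deviation(lightcurve)
std = pstdev(lightcurve)

# The single outlier inflates the standard deviation but barely moves the MAD.
print(f"MAD = {mad:.3f} mag, standard deviation = {std:.3f} mag")
```

Comparing the two printed values on this toy input illustrates why a median-based scatter estimate is less sensitive to a single bad measurement than the standard deviation.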

arXiv:1711.11491 [pdf, ps, other]

Hubble Catalog of Variables

Authors: M. Yang, A. Z. Bonanos, P. Gavras, K. Sokolovsky, D. Hatzidimitriou, M. I. Moretti, A. Karampelas, I. Bellas-Velidis, Z. Spetsieri, E. Pouliasis, I. Georgantopoulos, V. Charmandaris, K. Tsinganos, N. Laskaris, G. Kakaletris, A. Nota, D. Lennon, C. Arviset, B. Whitmore, T. Budavari, R. Downes, S. Lubow, A. Rest, L. Strolger, R. White

Abstract: The Hubble Catalog of Variables (HCV) project aims to identify the variable sources in the Hubble Source Catalog (HSC), which includes about 92 million objects with over 300 million measurements detected by the WFPC2, ACS and WFC3 cameras on board the Hubble Space Telescope (HST), by using an automated pipeline containing a set of detection and validation algorithms. All the HSC sources with more than a predefined number of measurements in a single filter/instrument combination are pre-processed to correct systematic effects and to remove the bad measurements. The corrected data are used to compute a number of variability indexes to determine the variability status of each source. The final variable source catalog will contain variable stars, active galactic nuclei (AGNs), supernovae (SNs) or even new types of variables, reaching an unprecedented depth (V$\leq$27 mag). At the end of the project, the first release of the HCV will be available at the Mikulski Archive for Space Telescopes (MAST) and the ESA Hubble Science Archives. The HCV pipeline will be deployed at the Space Telescope Science Institute (STScI) so that an updated HCV may be generated following future releases of HSC.

Submitted 30 November, 2017; originally announced November 2017.

Comments: 4 pages, 1 figure, to appear in the proceedings of "Stellar Populations and the Distance Scale", Joseph Jensen (eds), ASPCS

arXiv:1711.02793 [pdf, other]

doi 10.1016/j.ascom.2017.09.002

Robust Statistics for Image Deconvolution

Authors: Matthias Lee, Tamas Budavari, Richard White, Charles Gulian

Abstract: We present a blind multiframe image-deconvolution method based on robust statistics. The usual shortcomings of iterative optimization of the likelihood function are alleviated by minimizing the M-scale of the residuals, which achieves more uniform convergence across the image. We focus on the deconvolution of astronomical images, which are among the most challenging due to their huge dynamic ranges and the frequent presence of large noise-dominated regions in the images. We show that high-quality image reconstruction is possible even in super-resolution and without the use of traditional regularization terms. Using a robust $ρ$-function is straightforward to implement in a streaming setting and, hence, our method is applicable to the large volumes of astronomy images. The power of our method is demonstrated on observations from the Sloan Digital Sky Survey (Stripe 82) and we briefly discuss the feasibility of a pipeline based on Graphical Processing Units for the next generation of telescope surveys.

Submitted 7 November, 2017; originally announced November 2017.

arXiv:1711.00975 [pdf, other]

doi 10.1016/j.ascom.2018.04.003

Scalable Streaming Tools for Analyzing $N$-body Simulations: Finding Halos and Investigating Excursion Sets in One Pass

Authors: Nikita Ivkin, Zaoxing Liu, Lin F. Yang, Srinivas Suresh Kumar, Gerard Lemson, Mark Neyrinck, Alexander S. Szalay, Vladimir Braverman, Tamas Budavari

Abstract: Cosmological $N$-body simulations play a vital role in studying models for the evolution of the Universe. To compare to observations and make a scientific inference, statistical analysis on large simulation datasets, e.g., finding halos, obtaining multi-point correlation functions, is crucial. However, traditional in-memory methods for these tasks do not scale to the datasets that are forbiddingly large in modern simulations. Our prior paper proposes memory-efficient streaming algorithms that can find the largest halos in a simulation with up to $10^9$ particles on a small server or desktop. However, this approach fails when directly scaling to larger datasets. This paper presents a robust streaming tool that leverages state-of-the-art techniques on GPU boosting, sampling, and parallel I/O, to significantly improve performance and scalability. Our rigorous analysis of the sketch parameters improves the previous results from finding the centers of the $10^3$ largest halos to $\sim 10^4-10^5$, and reveals the trade-offs between memory, running time and number of halos. Our experiments show that our tool can scale to datasets with up to $\sim 10^{12}$ particles while using less than an hour of running time on a single GPU Nvidia GTX 1080.

Submitted 28 April, 2018; v1 submitted 2 November, 2017; originally announced November 2017.

Comments: preprint

arXiv:1710.10231 [pdf, other]

Probabilistic Cross-identification of Multiple Catalogs in Crowded Fields

Authors: Xiaochen Shi, Tamas Budavari, Amitabh Basu

Abstract: Matching astronomical catalogs in crowded regions of the sky is challenging both statistically and computationally due to the many possible alternative associations. Budavári and Basu (2016) modeled the two-catalog situation as an Assignment Problem and used the famous Hungarian algorithm to solve it. Here we treat cross-identification of multiple catalogs by introducing a different approach based on integer linear programming. We first test this new method on problems with two catalogs and compare with the previous results. We then test the efficacy of the new approach on problems with three catalogs. The performance and scalability of the new approach are discussed in the context of large surveys.

Submitted 12 November, 2018; v1 submitted 27 October, 2017; originally announced October 2017.

Comments: 9 pages, 8 figures; Accepted for publication in The Astrophysical Journal

arXiv:1706.09546 [pdf, other]

doi 10.1016/j.ascom.2017.06.001

Probabilistic Cross-Identification of Galaxies with Realistic Clustering

Authors: Neil Mallinar, Tamas Budavari, Gerard Lemson

Abstract: Probabilistic cross-identification has been successfully applied to a number of problems in astronomy from matching simple point sources to associating stars with unknown proper motions and even radio observations with realistic morphology. Here we study the Bayes factor for clustered objects and focus in particular on galaxies to assess the effect of typical angular correlations. Numerical calculations provide the modified relationship, which (as expected) suppresses the evidence for the associations at the shortest separations where the 2-point auto-correlation function is large. Ultimately this means that the matching probability drops at somewhat shorter scales than in previous models.

Submitted 28 June, 2017; originally announced June 2017.

Comments: Accepted for publication in Astronomy and Computing, 6 pages, 3 figures

arXiv:1705.10711 [pdf, other]

doi 10.1093/mnras/stx2651

Finding counterparts for All-sky X-ray surveys with Nway: a Bayesian algorithm for cross-matching multiple catalogues

Authors: Mara Salvato, J. Buchner, T. Budavari, T. Dwelly, A. Merloni, M. Brusa, A. Rau, S. Fotopoulou, K. Nandra

Abstract: We release the AllWISE counterparts and Gaia matches to 106,573 and 17,665 X-ray sources detected in the ROSAT 2RXS and XMMSL2 surveys with |b|>15. These are the brightest X-ray sources in the sky, but their position uncertainties and the sparse multi-wavelength coverage until now rendered the identification of their counterparts a demanding task with uncertain results. New all-sky multi-wavelength surveys of sufficient depth, like AllWISE and Gaia, and a new Bayesian statistics based algorithm, NWAY, allow us, for the first time, to provide reliable counterpart associations. NWAY extends previous distance and sky density based association methods and, using one or more priors (e.g., colors, magnitudes), weights the probability that sources from two or more catalogues are simultaneously associated on the basis of their observable characteristics. Here, counterparts have been determined using a WISE color-magnitude prior. A reference sample of 4524 XMM/Chandra and Swift X-ray sources demonstrates a reliability of ~ 94.7% (2RXS) and 97.4% (XMMSL2). Combining our results with Chandra-COSMOS data, we propose a new separation between stars and AGN in the X-ray/WISE flux-magnitude plane, valid over six orders of magnitude. We also release the NWAY code and its user manual. NWAY was extensively tested with XMM-COSMOS data. Using two different sets of priors, we find an agreement of 96% and 99% with published Likelihood Ratio methods. Our results were achieved faster and without any follow-up visual inspection.
With the advent of deep and wide area surveys in X-rays (e.g. SRG/eROSITA, Athena/WFI) and radio (ASKAP/EMU, LOFAR, APERTIF, etc.) NWAY will provide a powerful and reliable counterpart identification tool.

Submitted 20 October, 2017; v1 submitted 30 May, 2017; originally announced May 2017.

Comments: MNRAS, Paper accepted for publication. Updated catalogs are available at www.mpe.mpg.de/XraySurveys/2RXS_XMMSL2 . NWAY available at https://github.com/JohannesBuchner/nway

arXiv:1704.01796 [pdf, other]

doi 10.1093/mnras/stx864

SPIDERS: Selection of spectroscopic targets using AGN candidates detected in all-sky X-ray surveys

Authors: T. Dwelly, M. Salvato, A. Merloni, M. Brusa, J. Buchner, S. F. Anderson, Th. Boller, W. N. Brandt, T. Budavári, N. Clerc, D. Coffey, A. Del Moro, A. Georgakakis, P. J. Green, C. Jin, M. -L. Menzel, A. D. Myers, K. Nandra, R. C. Nichol, J. Ridl, A. D. Schwope, T. Simm

Abstract: SPIDERS (SPectroscopic IDentification of eROSITA Sources) is an SDSS-IV survey running in parallel to the eBOSS cosmology project. SPIDERS will obtain optical spectroscopy for large numbers of X-ray-selected AGN and galaxy cluster members detected in wide area eROSITA, XMM-Newton and ROSAT surveys. We describe the methods used to choose spectroscopic targets for two sub-programmes of SPIDERS: X-ray selected AGN candidates detected in the ROSAT All Sky and the XMM-Newton Slew surveys. We have exploited a Bayesian cross-matching algorithm, guided by priors based on mid-IR colour-magnitude information from the WISE survey, to select the most probable optical counterpart to each X-ray detection. We empirically demonstrate the high fidelity of our counterpart selection method using a reference sample of bright well-localised X-ray sources collated from XMM-Newton, Chandra and Swift-XRT serendipitous catalogues, and also by examining blank-sky locations. We describe the down-selection steps which resulted in the final set of SPIDERS-AGN targets put forward for spectroscopy within the eBOSS/TDSS/SPIDERS survey, and present catalogues of these targets. We also present catalogues of ~12000 ROSAT and ~1500 XMM-Newton Slew survey sources which have existing optical spectroscopy from SDSS-DR12, including the results of our visual inspections.
On completion of the SPIDERS program, we expect to have collected homogeneous spectroscopic redshift information over a footprint of ~7500 deg$^2$ for >85 percent of the ROSAT and XMM-Newton Slew survey sources having optical counterparts in the magnitude range 17<r<22.5, producing a large and highly complete sample of bright X-ray-selected AGN suitable for statistical studies of AGN evolution and clustering.

Submitted 6 April, 2017; originally announced April 2017.

Comments: MNRAS, accepted

arXiv:1703.02038 [pdf, other]

doi 10.1051/epjconf/201715202005

The Hubble Catalog of Variables

Authors: K. Sokolovsky, A. Bonanos, P. Gavras, M. Yang, D. Hatzidimitriou, M. I. Moretti, A. Karampelas, I. Bellas-Velidis, Z. Spetsieri, E. Pouliasis, I. Georgantopoulos, V. Charmandaris, K. Tsinganos, N. Laskaris, G. Kakaletris, A. Nota, D. Lennon, C. Arviset, B. Whitmore, T. Budavari, R. Downes, S. Lubow, A. Rest, L. Strolger, R. White

Abstract: We aim to construct an exceptionally deep (V ~< 27) catalog of variable objects in selected Galactic and extragalactic fields visited multiple times by the Hubble Space Telescope (HST). While HST observations of some of these fields were searched for specific types of variables before (most notably, the extragalactic Cepheids), we attempt a systematic study of the population of variable objects of all types at the magnitude range not easily accessible with ground-based telescopes. The variability timescales that can be probed range from hours to years depending on how often a particular field has been visited. For source extraction and cross-matching of sources between visits we rely on the Hubble Source Catalog which includes 10^7 objects detected with WFPC2, ACS, and WFC3 HST instruments. The lightcurves extracted from the HSC are corrected for systematic effects by applying local zero-point corrections and are screened for bad measurements. For each lightcurve we compute variability indices sensitive to a broad range of variability types. The indices characterize the overall lightcurve scatter and smoothness. Candidate variables are selected as having variability index values significantly higher than expected for objects of similar brightness in the given set of observations. The Hubble Catalog of Variables will be released in 2018.

Submitted 3 July, 2017; v1 submitted 6 March, 2017; originally announced March 2017.

Comments: 5 pages, 3 figures, 1 table, proceedings of the 22nd Los Alamos Stellar Pulsation Conference "Wide-field variability surveys: a 21st-century perspective" held in San Pedro de Atacama, Chile, Nov. 28-Dec. 2, 2016

arXiv:1611.03171 [pdf, other]

doi 10.3847/1538-4357/aa6335

Faint Object Detection in Multi-Epoch Observations via Catalog Data Fusion

Authors: Tamas Budavari, Alexander S. Szalay, Thomas J. Loredo

Abstract: Observational astronomy in the time-domain era faces several new challenges. One of them is the efficient use of observations obtained at multiple epochs. The work presented here addresses faint object detection with multi-epoch data, and describes an incremental strategy for separating real objects from artifacts in ongoing surveys, in situations where the single-epoch data are summaries of the full image data, such as single-epoch catalogs of flux and direction estimates for candidate sources. The basic idea is to produce low-threshold single-epoch catalogs, and use a probabilistic approach to accumulate catalog information across epochs; this is in contrast to more conventional strategies based on co-added or stacked image data across all epochs. We adopt a Bayesian approach, addressing object detection by calculating the marginal likelihoods for hypotheses asserting there is no object, or one object, in a small image patch containing at most one cataloged source at each epoch. The object-present hypothesis interprets the sources in a patch at different epochs as arising from a genuine object; the no-object (noise) hypothesis interprets candidate sources as spurious, arising from noise peaks. We study the detection probability for constant-flux objects in a simplified Gaussian noise setting, comparing results based on single exposures and stacked exposures to results based on a series of single-epoch catalog summaries.
Computing the detection probability based on catalog data amounts to generalized cross-matching: it is the product of a factor accounting for matching of the estimated fluxes of candidate sources, and a factor accounting for matching of their estimated directions. We find that probabilistic fusion of multi-epoch catalog information can detect sources with only modest sacrifice in sensitivity and selectivity compared to stacking.

Submitted 9 November, 2016; originally announced November 2016.

Comments: 11 pages, 11 figures

arXiv:1611.01560 [pdf, other]

doi 10.1016/j.ascom.2017.03.002

Photo-z-SQL: integrated, flexible photometric redshift computation in a database

Authors: Róbert Beck, László Dobos, Tamás Budavári, Alexander S. Szalay, István Csabai

Abstract: We present a flexible template-based photometric redshift estimation framework, implemented in C#, that can be seamlessly integrated into a SQL database (or DB) server and executed on-demand in SQL. The DB integration eliminates the need to move large photometric datasets outside a database for redshift estimation, and utilizes the computational capabilities of DB hardware. The code is able to perform both maximum likelihood and Bayesian estimation, and can handle inputs of variable photometric filter sets and corresponding broad-band magnitudes. It is possible to take into account the full covariance matrix between filters, and filter zero points can be empirically calibrated using measurements with given redshifts. The list of spectral templates and the prior can be specified flexibly, and the expensive synthetic magnitude computations are done via lazy evaluation, coupled with a caching of results. Parallel execution is fully supported. For large upcoming photometric surveys such as the LSST, the ability to perform in-place photo-z calculation would be a significant advantage. Also, the efficient handling of variable filter sets is a necessity for heterogeneous databases, for example the Hubble Source Catalog, and for cross-match services such as SkyQuery. We illustrate the performance of our code on two reference photo-z estimation testing datasets, and provide an analysis of execution time and scalability with respect to different configurations. The code is available for download at https://github.com/beckrob/Photo-z-SQL.

Submitted 20 March, 2017; v1 submitted 4 November, 2016; originally announced November 2016.

Comments: 14 pages, 5 figures. Minor revision accepted by Astronomy & Computing on 2017 March 11

arXiv:1610.04178 [pdf, ps, other]

doi 10.1146/annurev-statistics-010814-020231

Probabilistic record linkage in astronomy: Directional cross-identification and beyond

Authors: Tamas Budavari, Thomas J. Loredo

Abstract: Modern astronomy increasingly relies upon systematic surveys, whose dedicated telescopes continuously observe the sky across varied wavelength ranges of the electromagnetic spectrum; some surveys also observe non-electromagnetic "messengers," such as high-energy particles or gravitational waves. Stars and galaxies look different through the eyes of different instruments, and their independent measurements have to be carefully combined to provide a complete, sound picture of the multicolor and eventful universe. The association of an object's independent detections is, however, a difficult problem scientifically, computationally, and statistically, raising varied challenges across diverse astronomical applications. The fundamental problem is finding records in survey databases with directions that match to within the direction uncertainties. Such astronomical versions of the record linkage problem are known by various terms in astronomy: cross-matching, cross-identification, and directional, positional, or spatio-temporal coincidence assessment. Astronomers have developed several statistical approaches for such problems, largely independently of related developments in other disciplines. Here we review emerging approaches that compute (Bayesian) probabilities for the hypotheses of interest: possible associations, or demographic properties of a cosmic population that depend on identifying associations.
Many cross-identification tasks can be formulated within a hierarchical Bayesian partition model framework, with components that explicitly account for astrophysical effects (e.g., source brightness vs. wavelength, source motion, or source extent), selection effects, and measurement error. We survey recent developments, and highlight important open areas for future research.

Submitted 13 October, 2016; originally announced October 2016.

Comments: 21 pages, 9 figures

Journal ref: Annual Review of Statistics and Its Application 2015 Vol. 2: 113-139

arXiv:1609.03932 [pdf, other]

doi 10.3847/0004-637X/833/1/26

Mapping the Similarities of Spectra: Global and Locally-biased Approaches to SDSS Galaxy Data

Authors: David Lawlor, Tamás Budavári, Michael W. Mahoney

Abstract: We apply a novel spectral graph technique, that of locally-biased semi-supervised eigenvectors, to study the diversity of galaxies. This technique permits us to characterize empirically the natural variations in observed spectra data, and we illustrate how this approach can be used in an exploratory manner to highlight both large-scale global as well as small-scale local structure in Sloan Digital Sky Survey (SDSS) data. We use this method in a way that simultaneously takes into account the measurements of spectral lines as well as the continuum shape. Unlike Principal Component Analysis, this method does not assume that the Euclidean distance between galaxy spectra is a good global measure of similarity between all spectra, but instead it only assumes that local difference information between similar spectra is reliable. Moreover, unlike other nonlinear dimensionality methods, this method can be used to characterize very finely both small-scale local as well as large-scale global properties of realistic noisy data. The power of the method is demonstrated on the SDSS Main Galaxy Sample by illustrating that the derived embeddings of spectra carry an unprecedented amount of information. By using a straightforward global or unsupervised variant, we observe that the main features correlate strongly with star formation rate and that they clearly separate active galactic nuclei. Computed parameters of the method can be used to describe line strengths and their interdependencies.
By using a locally-biased or semi-supervised variant, we are able to focus on typical variations around specific objects of astronomical interest. We present several examples illustrating that this approach can enable new discoveries in the data as well as a detailed understanding of very fine local structure that would otherwise be overwhelmed by large-scale noise and global trends in the data.

Submitted 13 September, 2016; originally announced September 2016.

Comments: 34 pages. A modified version of this paper has been accepted to The Astrophysical Journal

arXiv:1609.03065 [pdf, other]

doi 10.3847/0004-6256/152/4/86

Probabilistic Cross-Identification in Crowded Fields as an Assignment Problem

Authors: Tamas Budavari, Amitabh Basu

Abstract: One of the outstanding challenges of cross-identification is multiplicity: detections in crowded regions of the sky are often linked to more than one candidate association of similar likelihood. We map the resulting maximum likelihood partitioning to the fundamental assignment problem of discrete mathematics and efficiently solve the two-way catalog-level matching in the realm of combinatorial optimization using the so-called Hungarian algorithm. We introduce the method, demonstrate its performance in a mock universe where the true associations are known, and discuss the applicability of the new procedure to large surveys.

Submitted 10 September, 2016; originally announced September 2016.

Comments: 6 pages, 4 figures, accepted for publication in the Astronomical Journal
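
The abstract above casts two-catalog matching as an assignment problem. The sketch below shows what that problem looks like on a toy example. It is not the paper's method: it uses exhaustive enumeration of permutations in place of the Hungarian algorithm (which reaches the same optimum in polynomial time), and it assumes a squared flat-sky separation as the pairwise cost, standing in for a negative log-likelihood under equal Gaussian position errors. The catalogs and function name are hypothetical.

```python
from itertools import permutations


def best_assignment(cost):
    """Return (matching, total) for the one-to-one pairing with minimum total cost.

    cost[i][j] is the cost of pairing source i in catalog A with source j in
    catalog B. Brute force over all permutations: only viable for tiny inputs.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Hypothetical flat-sky positions in arbitrary units.
catalog_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
catalog_b = [(0.9, 0.1), (0.1, 0.9), (0.1, -0.1)]

# Assumed cost: squared separation between each A source and each B source.
cost = [
    [(ax - bx) ** 2 + (ay - by) ** 2 for (bx, by) in catalog_b]
    for (ax, ay) in catalog_a
]

matching, total = best_assignment(cost)
# matching[i] is the index in catalog_b assigned to source i of catalog_a.
print(matching, round(total, 3))
```

For catalogs of realistic size the enumeration is infeasible, which is the reason a polynomial-time solver such as the Hungarian algorithm is used; SciPy's `scipy.optimize.linear_sum_assignment` solves the same minimum-cost problem given the cost matrix.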

arXiv:1606.03957 [pdf, other]

doi 10.1007/s10686-016-9502-5

The Footprint Database and Web Services of the Herschel Space Observatory

Authors: László Dobos, Erika Varga-Verebélyi, Eva Verdugo, David Teyssier, Katrina Exter, Ivan Valtchanov, Tamás Budavári, Csaba Kiss

Abstract: Data from the Herschel Space Observatory is freely available to the public but no uniformly processed catalogue of the observations has been published so far. To date, the Herschel Science Archive does not contain the exact sky coverage (footprint) of individual observations and supports search for measurements based on bounding circles only. Drawing on previous experience in implementing footprint databases, we built the Herschel Footprint Database and Web Services for the Herschel Space Observatory to provide efficient search capabilities for typical astronomical queries. The database was designed with the following main goals in mind: (a) provide a unified data model for meta-data of all instruments and observational modes, (b) quickly find observations covering a selected object and its neighbourhood, (c) quickly find every observation in a larger area of the sky, (d) allow for finding solar system objects crossing observation fields. As a first step, we developed a unified data model of observations of all three Herschel instruments for all pointing and instrument modes. Then, using telescope pointing information and observational meta-data, we compiled a database of footprints. As opposed to methods using pixellation of the sphere, we represent sky coverage in an exact geometric form allowing for precise area calculations. For easier handling of Herschel observation footprints with rather complex shapes, two algorithms were implemented to reduce the outline.
Furthermore, a new visualisation tool to plot footprints with various spherical projections was developed. Indexing of the footprints using Hierarchical Triangular Mesh makes it possible to quickly find observations based on sky coverage, time and meta-data. The database is accessible via a web site (http://herschel.vo.elte.hu) and also as a set of REST web service functions.

Submitted 13 June, 2016; originally announced June 2016.

Comments: Accepted for publication in Experimental Astronomy

arXiv:1604.00652 [pdf, ps, other]

doi 10.3847/0004-6256/152/6/155

Galaxy Redshifts from Discrete Optimization of Correlation Functions

Authors: Benjamin C. G. Lee, Tamás Budavári, Amitabh Basu, Mubdi Rahman

Abstract: We propose a new method of constraining the redshifts of individual extragalactic sources based on celestial coordinates and their ensemble statistics. Techniques from integer linear programming are utilized to optimize simultaneously for the angular two-point cross- and autocorrelation functions. Our novel formalism introduced here not only transforms the otherwise hopelessly expensive, brute-force combinatorial search into a linear system with integer constraints but also is readily implementable in off-the-shelf solvers. We adopt Gurobi, a commercial optimization solver, and use Python to build the cost function dynamically. The preliminary results on simulated data show potential for future applications to sky surveys by complementing and enhancing photometric redshift estimators. Our approach is the first application of integer linear programming to astronomical analysis.

Submitted 10 September, 2016; v1 submitted 3 April, 2016; originally announced April 2016.

Comments: 10 pages with 3 figures, accepted for publication in The Astronomical Journal on 08/04/16

arXiv:1603.09708 [pdf, other]

doi 10.1093/mnras/stw1009

Photometric redshifts for the SDSS Data Release 12

Authors: Róbert Beck, László Dobos, Tamás Budavári, Alexander S. Szalay, István Csabai

Abstract: We present the methodology and data behind the photometric redshift database of the Sloan Digital Sky Survey Data Release 12 (SDSS DR12). We adopt a hybrid technique, empirically estimating the redshift via local regression on a spectroscopic training set, then fitting a spectrum template to obtain K-corrections and absolute magnitudes. The SDSS spectroscopic catalog was augmented with data from other, publicly available spectroscopic surveys to mitigate target selection effects. The training set is comprised of $1,976,978$ galaxies, and extends up to redshift $z\approx 0.8$, with a useful coverage of up to $z\approx 0.6$. We provide photometric redshifts and realistic error estimates for the $208,474,076$ galaxies of the SDSS primary photometric catalog. We achieve an average bias of $\overline{\Delta z_{\mathrm{norm}}} = 5.84 \times 10^{-5}$, a standard deviation of $\sigma\left(\Delta z_{\mathrm{norm}}\right)=0.0205$, and a $3\sigma$ outlier rate of $P_o=4.11\%$ when cross-validating on our training set. The published redshift error estimates and photometric error classes enable the selection of galaxies with high quality photometric redshifts. We also provide a supplementary error map that allows additional, sophisticated filtering of the data.

Submitted 21 April, 2016; v1 submitted 31 March, 2016; originally announced March 2016.

Comments: 12 pages, 7 figures. Original submitted to MNRAS on 2016 March 03, revision submitted on 2016 April 21

Journal ref: MNRAS 2016 460 (2): 1371-1381
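
The three quality figures quoted in the abstract above (bias, standard deviation, and $3\sigma$ outlier rate of the normalized redshift error) can be illustrated with a short sketch. The abstract does not define the normalized error, so the code assumes the common convention $\Delta z_{\mathrm{norm}} = (z_{\mathrm{phot}} - z_{\mathrm{spec}})/(1 + z_{\mathrm{spec}})$. The function name `photoz_metrics` and the single-pass clipping (outliers flagged against the full-sample scatter, then bias and scatter recomputed on the remainder) are also assumptions made for illustration and may differ from the procedure used in the paper.

```python
import numpy as np

def photoz_metrics(z_phot, z_spec, n_sigma=3.0):
    """Summarize photometric redshift quality (illustrative, hypothetical helper).

    Assumes dz_norm = (z_phot - z_spec) / (1 + z_spec).
    """
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    dz = (z_phot - z_spec) / (1.0 + z_spec)

    # Flag outliers against the scatter of the full sample (one clipping pass).
    outlier = np.abs(dz) > n_sigma * dz.std()
    inliers = dz[~outlier]

    return {
        "bias": inliers.mean(),          # mean normalized error of the inliers
        "sigma": inliers.std(),          # scatter of the inliers
        "outlier_rate": outlier.mean(),  # fraction of objects flagged as outliers
    }
```

Given matched arrays of photometric and spectroscopic redshifts from a cross-validation fold, `photoz_metrics(z_phot, z_spec)` returns the three numbers as a dictionary. An iterative clip is an equally plausible choice and would be a small change to the same function.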

arXiv:1602.04861 [pdf, other]

doi 10.3847/0004-6256/151/6/134

Version 1 of the Hubble Source Catalog

Authors: Bradley C. Whitmore, Sahar S. Allam, Tamas Budavari, Stefano Casertano, Ronald A. Downes, Thomas Donaldson, S. Michael Fall, Stephen H. Lubow, Lee Quick, Louis-Gregory Strolger, Geoff Wallace, Richard L. White

Abstract: The Hubble Source Catalog is designed to help optimize science from the Hubble Space Telescope by combining the tens of thousands of visit-based source lists in the Hubble Legacy Archive into a single master catalog. Version 1 of the Hubble Source Catalog includes WFPC2, ACS/WFC, WFC3/UVIS, and WFC3/IR photometric data generated using SExtractor software to produce the individual source lists. The catalog includes roughly 80 million detections of 30 million objects involving 112 different detector/filter combinations, and about 160 thousand HST exposures. Source lists from Data Release 8 of the Hubble Legacy Archive are matched using an algorithm developed by Budavari & Lubow (2012). The mean photometric accuracy for the catalog as a whole is better than 0.10 mag, with relative accuracy as good as 0.02 mag in certain circumstances (e.g., bright isolated stars). The relative astrometric residuals are typically within 10 mas, with a value for the mode (i.e., most common value) of 2.3 mas. The absolute astrometric accuracy is better than $\sim$0.1 arcsec for most sources, but can be much larger for a fraction of fields that could not be matched to the PanSTARRS, SDSS, or 2MASS reference systems. In this paper we describe the database design with emphasis on those aspects that enable the users to fully exploit the catalog while avoiding common misunderstandings and potential pitfalls. We provide usage examples to illustrate some of the science capabilities and data quality characteristics, and briefly discuss plans for future improvements to the Hubble Source Catalog.

Submitted 15 February, 2016; originally announced February 2016.

Comments: 25 pages, 30 figures, 2 tables, 4 appendices; AJ accepted

arXiv:1602.01050 [pdf, other]

doi 10.1017/S1743921316008401

Herschel Footprint Database and Service

Authors: E. Varga-Verebélyi, L. Dobos, T. Budavári, Cs. Kiss

Abstract: We created the Herschel Footprint Database and web services for the Herschel Space Observatory imaging data. For this database we set up a unified data model for the PACS and SPIRE Herschel instruments and, from the pointing and header information of each observation, generated and stored sky coverages (footprints) of the observations in their exact geometric form. With this tool we extend the capabilities of the Herschel Science Archive by providing an effective search tool that is able to find observations for selected sky locations (objects), or even in larger areas in the sky.

Submitted 2 February, 2016; originally announced February 2016.

Comments: IAU Symposium 315

arXiv:1512.03057 [pdf, other]

doi 10.1093/mnras/stw981

Exploring the SDSS Photometric Galaxies with Clustering Redshifts

Authors: Mubdi Rahman, Alexander J. Mendez, Brice Ménard, Ryan Scranton, Samuel J. Schmidt, Christopher B. Morrison, Tamás Budavári

Abstract: We apply clustering-based redshift inference to all extended sources from the Sloan Digital Sky Survey photometric catalogue, down to magnitude r = 22. We map the relationships between colours and redshift, without assumption of the sources' spectral energy distributions (SED). We identify and locate star-forming, quiescent galaxies, and AGN, as well as colour changes due to spectral features, such as the 4000 Å break, redshifting through specific filters. Our mapping is globally in good agreement with colour-redshift tracks computed with SED templates, but reveals informative differences, such as the need for a lower fraction of M-type stars in certain templates. We compare our clustering-redshift estimates to photometric redshifts and find these two independent estimators to be in good agreement at each limiting magnitude considered. Finally, we present the global clustering-redshift distribution of all Sloan extended sources, showing objects up to z ~ 0.8. While the overall shape agrees with that inferred from photometric redshifts, the clustering redshift technique results in a smoother distribution, with no indication of structure in redshift space suggested by the photometric redshift estimates (likely artifacts imprinted by their spectroscopic training set). We also infer a higher fraction of high redshift objects. The mapping between the four observed colours and redshift can be used to estimate the redshift probability distribution function of individual galaxies. This work is an initial step towards producing a general mapping between redshift and all available observables in the photometric space, including brightness, size, concentration, and ellipticity.

Submitted 23 April, 2016; v1 submitted 9 December, 2015; originally announced December 2015.

Comments: 12 pages, 9 figures, accepted to MNRAS

arXiv:1505.00621 [pdf, ps, other]

Matching Radio Catalogs with Realistic Geometry: Application to SWIRE and ATLAS

Authors: Dongwei Fan, Tamás Budavári, Ray P. Norris, Andrew M. Hopkins

Abstract: Crossmatching catalogs at different wavelengths is a difficult problem in astronomy, especially when the objects are not point-like. At radio wavelengths an object can have several components corresponding, for example, to a core and lobes. Considering that not all radio detections correspond to visible or infrared sources, matching these catalogs can be challenging. Traditionally this is done by eye for better quality, which does not scale to the large data volumes expected from the next generation of radio telescopes. We present a novel automated procedure, using Bayesian hypothesis testing, to achieve reliable associations by explicit modelling of a particular class of radio-source morphology. The new algorithm not only assesses the likelihood of an association between data at two different wavelengths, but also tries to assess whether different radio sources are physically associated, are double-lobed radio galaxies, or just distinct nearby objects. Application to the SWIRE and ATLAS CDF-S catalogs shows that this method performs well without human intervention.

Submitted 4 May, 2015; originally announced May 2015.

Comments: 8 pages, 7 figures

arXiv:1410.0709 [pdf, other]

doi 10.1145/2618243.2618245

Efficient classification of billions of points into complex geographic regions using hierarchical triangular mesh

Authors: Dániel Kondor, László Dobos, István Csabai, András Bodor, Gábor Vattay, Tamás Budavári, Alexander S. Szalay

Abstract: We present a case study about the spatial indexing and regional classification of billions of geographic coordinates from geo-tagged social network data using Hierarchical Triangular Mesh (HTM) implemented for Microsoft SQL Server. Due to the lack of certain features of the HTM library, we use it in conjunction with the GIS functions of SQL Server to significantly increase the efficiency of pre-filtering of spatial filter and join queries. For example, we implemented a new algorithm to compute the HTM tessellation of complex geographic regions and precomputed the intersections of HTM triangles and geographic regions for faster false-positive filtering. With full control over the index structure, HTM-based pre-filtering of simple containment searches outperforms SQL Server spatial indices by a factor of ten and HTM-based spatial joins run about a hundred times faster.

Submitted 2 October, 2014; originally announced October 2014.

Comments: appears in Proceedings of the 26th International Conference on Scientific and Statistical Database Management (2014)

arXiv:1409.7119 [pdf, other]

doi 10.1088/0004-637X/796/1/60

CANDELS/GOODS-S, CDFS, ECDFS: Photometric Redshifts For Normal and for X-Ray-Detected Galaxies

Authors: Li-Ting Hsu, Mara Salvato, Kirpal Nandra, Marcella Brusa, Ralf Bender, Johannes Buchner, Jennifer L. Donley, Dale D. Kocevski, Yicheng Guo, Nimish P. Hathi, Cyprian Rangel, S. P. Willner, Murray Brightman, Antonis Georgakakis, Tamás Budavári, Alexander S. Szalay, Matthew L. N. Ashby, Guillermo Barro, Tomas Dahlen, Sandra M. Faber, Henry C. Ferguson, Audrey Galametz, Andrea Grazian, Norman A. Grogin, Kuang-Han Huang , et al. (7 additional authors not shown)

Abstract: We present photometric redshifts and associated probability distributions for all detected sources in the Extended Chandra Deep Field South (ECDFS). The work makes use of the most up-to-date data from the Cosmic Assembly Near-IR Deep Legacy Survey (CANDELS) and the Taiwan ECDFS Near-Infrared Survey (TENIS) in addition to other data. We also revisit multi-wavelength counterparts for published X-ray sources from the 4Ms-CDFS and 250ks-ECDFS surveys, finding reliable counterparts for 1207 out of 1259 sources ($\sim 96\%$). Data used for photometric redshifts include intermediate-band photometry deblended using the TFIT method, which is used for the first time in this work. Photometric redshifts for X-ray source counterparts are based on a new library of AGN/galaxy hybrid templates appropriate for the faint X-ray population in the CDFS. Photometric redshift accuracy for normal galaxies is 0.010 and for X-ray sources is 0.014, and outlier fractions are $4\%$ and $5.4\%$ respectively. The results within the CANDELS coverage area are even better as demonstrated both by spectroscopic comparison and by galaxy-pair statistics. Intermediate-band photometry, even if shallow, is valuable when combined with deep broad-band photometry. For best accuracy, templates must include emission lines.

Submitted 24 September, 2014; originally announced September 2014.

Comments: The paper has been accepted by ApJ. The materials we provide are available under [Surveys] > [CDFS] through the portal http://www.mpe.mpg.de/XraySurveys

arXiv:1403.4358 [pdf, ps, other]

doi 10.1086/669707

Efficient Catalog Matching with Dropout Detection

Authors: Dongwei Fan, Tamás Budavári, Alexander S. Szalay, Chenzhou Cui, Yongheng Zhao

Abstract: Source catalogs are not the only product extracted from astronomical observations; their sky coverage is also carefully recorded and used in statistical analyses, such as correlation and luminosity function studies. Here we present a novel method for catalog matching, which inherently builds on the coverage information for better performance and completeness. A modified version of the Zones Algorithm is introduced for matching partially overlapping observations, where irrelevant parts of the data are excluded up front for efficiency. Our design enables searches to focus on specific areas on the sky to further speed up the process. Another important advantage of the new method over traditional techniques is its ability to quickly detect dropouts, i.e., the missing components that are in the observed regions of the celestial sphere but did not reach the detection limit in some observations. These often provide invaluable insight into the spectral energy distribution of the matched sources but are rarely available in traditional associations.

Submitted 18 March, 2014; originally announced March 2014.

Comments: 14 pages, 6 figures
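
As background for the abstract above, the following is a minimal sketch of a plain zone-based crossmatch: sources are bucketed into declination strips ("zones"), each bucket is sorted by right ascension, and candidates are fetched only from neighboring zones within an RA window before an exact distance test. This illustrates the general zones idea only. It does not include the coverage handling, partial-overlap logic, or dropout detection that the paper introduces, and all function names (`zones_match`, `ra_windows`, and so on) are hypothetical. Coordinates are assumed to be in degrees with RA in [0, 360).

```python
import math
from bisect import bisect_left, bisect_right
from collections import defaultdict

def separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def zone_id(dec, height):
    """Index of the declination strip of the given height containing dec."""
    return int(math.floor((dec + 90.0) / height))

def ra_windows(ra, dec, radius):
    """RA intervals that can contain neighbors within radius, with wraparound."""
    if abs(dec) + radius >= 90.0:
        return [(0.0, 360.0)]  # search circle touches a pole: scan all RA
    s = math.sin(math.radians(radius)) / math.cos(math.radians(dec))
    alpha = math.degrees(math.asin(min(1.0, s)))  # max RA offset of the circle
    lo, hi = ra - alpha, ra + alpha
    if lo < 0.0:
        return [(0.0, hi), (lo + 360.0, 360.0)]
    if hi >= 360.0:
        return [(lo, 360.0), (0.0, hi - 360.0)]
    return [(lo, hi)]

def zones_match(cat_a, cat_b, radius, height=None):
    """Return (i, j, separation) for pairs of cat_a[i], cat_b[j] within radius."""
    height = radius if height is None else height
    zones = defaultdict(list)
    for j, (ra, dec) in enumerate(cat_b):
        zones[zone_id(dec, height)].append((ra, j))
    for bucket in zones.values():
        bucket.sort()  # sort each zone by RA so windows can be bisected

    matches = []
    for i, (ra, dec) in enumerate(cat_a):
        z_lo = zone_id(max(dec - radius, -90.0), height)
        z_hi = zone_id(min(dec + radius, 90.0), height)
        windows = ra_windows(ra, dec, radius)
        for z in range(z_lo, z_hi + 1):
            bucket = zones.get(z)
            if not bucket:
                continue
            for lo, hi in windows:
                start = bisect_left(bucket, (lo, -1))
                stop = bisect_right(bucket, (hi, len(cat_b)))
                for ra_b, j in bucket[start:stop]:
                    sep = separation_deg(ra, dec, ra_b, cat_b[j][1])
                    if sep <= radius:  # exact test after the coarse prefilter
                        matches.append((i, j, sep))
    return matches
```

Each catalog is a list of `(ra, dec)` tuples, and `radius` is the match radius in degrees. Setting the zone height equal to the search radius keeps the candidate set to at most three strips per source; in a database setting the same bucketing is typically expressed as an indexed join on the zone number.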

arXiv:1312.1340 [pdf, ps, other]

doi 10.1093/mnras/stt2339

More than just halo mass: Modelling how the red galaxy fraction depends on multiscale density in a HOD framework

Authors: Stefanie Phleps, David J. Wilman, Stefano Zibetti, Tamás Budavári

Abstract: The fraction of galaxies with red colours depends sensitively on environment, and on the way in which environment is measured. To distinguish competing theories for the quenching of star formation, a robust and complete description of environment is required, to be applied to a large sample of galaxies. The environment of galaxies can be described using the density field of neighbours on multiple scales - the multiscale density field. We are using the Millennium simulation and a simple HOD prescription which describes the multiscale density field of Sloan Digital Sky Survey DR7 galaxies to investigate the dependence of the fraction of red galaxies on the environment. Using a volume limited sample where we have sufficient galaxies in narrow density bins, we have more dynamic range in halo mass and density for satellite galaxies than for central galaxies. Therefore we model the red fraction of central galaxies as a constant while we use a functional form to describe the red fraction of satellites as a function of halo mass which allows us to distinguish a sharp from a gradual transition. While it is clear that the data can only be explained by a gradual transition, an analysis of the multiscale density field on different scales suggests that colour segregation within the haloes is needed to explain the results. We also rule out a sharp transition for central galaxies, within the halo mass range sampled.

Submitted 4 December, 2013; originally announced December 2013.

Comments: 24 pages, 21 figures, accepted for publication by MNRAS

arXiv:1312.0637 [pdf, ps, other]

doi 10.1088/0004-6256/147/5/110

Objective Identification of Informative Wavelength Regions in Galaxy Spectra

Authors: Ching-Wa Yip, Michael Mahoney, Alex Szalay, Istvan Csabai, Tamas Budavari, Rosemary Wyse, Laszlo Dobos

Abstract: Understanding the diversity in spectra is the key to determining the physical parameters of galaxies. The optical spectra of galaxies are highly convoluted with continuum and lines which are potentially sensitive to different physical parameters. Defining the wavelength regions of interest is therefore an important question. In this work, we identify informative wavelength regions in a single-burst stellar populations model by using the CUR Matrix Decomposition. Simulating the Lick/IDS spectrograph configuration, we recover the widely used Dn(4000), Hbeta, and HdeltaA to be most informative. Simulating the SDSS spectrograph configuration with a wavelength range 3450-8350 Angstrom and a model-limited spectral resolution of 3 Angstrom, the most informative regions are: first region-the 4000 Angstrom break and the Hdelta line; second region-the Fe-like indices; third region-the Hbeta line; fourth region-the G band and the Hgamma line. A Principal Component Analysis on the first region shows that the first eigenspectrum tells primarily the stellar age, the second eigenspectrum is related to the age-metallicity degeneracy, and the third eigenspectrum shows an anti-correlation between the strengths of the Balmer and the Ca K and H absorptions. The regions can be used to determine the stellar age and metallicity in early-type galaxies which have solar abundance ratios, no dust, and a single-burst star formation history. The region identification method can be applied to any set of spectra of the user's interest, so that we eliminate the need for a common, fixed-resolution index system. We discuss future directions in extending the current analysis to late-type galaxies.

Submitted 1 February, 2014; v1 submitted 2 December, 2013; originally announced December 2013.

Comments: 36 Pages, 13 Figures, 4 Tables. AJ Accepted

arXiv:1308.1440 [pdf, other]

Graywulf: A platform for federated scientific databases and services

Authors: László Dobos, Alexander S. Szalay, Tamás Budavári, István Csabai, Nolan Li

Abstract: Many fields of science rely on relational database management systems to analyze, publish and share data. Since RDBMS were originally designed for, and their development directions are primarily driven by, business use cases, they often lack features very important for scientific applications. Horizontal scalability is probably the most important missing feature, which makes it challenging to adapt traditional relational database systems to the ever growing data sizes. Due to the limited support of array data types and metadata management, successful application of RDBMS in science usually requires the development of custom extensions. While some of these extensions are specific to the field of science, the majority of them could easily be generalized and reused in other disciplines. With the Graywulf project we intend to target several goals. We are building a generic platform that offers reusable components for efficient storage, transformation, statistical analysis and presentation of scientific data stored in Microsoft SQL Server. Graywulf also addresses the distributed computational issues arising from current RDBMS technologies. The current version supports load balancing of simple queries and parallel execution of partitioned queries over a set of mirrored databases. Uniform user access to the data is provided through a web based query interface and a data surface for software clients. Queries are formulated in a slightly modified syntax of SQL that offers a transparent view of the distributed data. The software library consists of several components that can be reused to develop complex scientific data warehouses: a system registry, administration tools to manage entire database server clusters, a sophisticated workflow execution framework, and a SQL parser library.

Submitted 6 August, 2013; originally announced August 2013.

Comments: SSDBM 2013 proceedings

arXiv:1303.4722 [pdf, other]

Clustering-based redshift estimation: method and application to data

Authors: Brice Ménard, Ryan Scranton, Samuel Schmidt, Chris Morrison, Donghui Jeong, Tamas Budavari, Mubdi Rahman

Abstract: We present a data-driven method to infer the redshift distribution of an arbitrary dataset based on spatial cross-correlation with a reference population and we apply it to various datasets across the electromagnetic spectrum to show its potential and limitations. Our approach advocates the use of clustering measurements on all available scales, in contrast to previous works focusing only on linear scales. We also show how its accuracy can be enhanced by optimally sampling a dataset within its photometric space rather than applying the estimator globally. We show that the ultimate goal of this technique is to characterize the mapping between the space of photometric observables and redshift space as this characterization then allows us to infer the clustering-redshift p.d.f. of a single galaxy. We apply this technique to estimate the redshift distributions of luminous red galaxies and emission line galaxies from the SDSS, infrared sources from WISE and radio sources from FIRST. We show that consistent redshift distributions are found using both quasars and absorber systems as reference populations. This technique brings valuable information on the third dimension of astronomical datasets. It is widely applicable to a large range of extra-galactic surveys.

Submitted 30 July, 2014; v1 submitted 19 March, 2013; originally announced March 2013.

Comments: 10 pages, 5 figures. Improved description of the formalism

arXiv:1302.7005 [pdf, other]

doi 10.1088/0004-637X/773/1/32

A Lyman Break Galaxy in the Epoch of Reionization from HST Grism Spectroscopy

Authors: James E. Rhoads, Sangeeta Malhotra, Daniel Stern, Mark Dickinson, Norbert Pirzkal, Hyron Spinrad, Naveen Reddy, Nimish Hathi, Norman Grogin, Anton Koekemoer, Michael A. Peth, Seth Cohen, Zhenya Zheng, Tamas Budavari, Ignacio Ferreras, Jonathan Gardner, Caryl Gronwall, Zoltan Haiman, Gerhardt Meurer, Leonidas Moustakas, Nino Panagia, Anna Pasquali, Kailash Sahu, Sperello di Serego Alighieri, Amber Straughn , et al. (5 additional authors not shown)

Abstract: We present observations of a luminous galaxy at redshift z=6.573 --- the end of the reionization epoch --- which has been spectroscopically confirmed twice. The first spectroscopic confirmation comes from slitless HST ACS grism spectra from the PEARS survey (Probing Evolution And Reionization Spectroscopically), which show a dramatic continuum break in the spectrum at restframe 1216 Å wavelength. The second confirmation is done with Keck + DEIMOS. The continuum is not clearly detected with ground-based spectra, but high wavelength resolution enables the Lyman alpha emission line profile to be determined. We compare the line profile to composite line profiles at redshift z=4.5. The Lyman alpha line profile shows no signature of a damping wing attenuation, confirming that the intergalactic gas is ionized at redshift z=6.57. Spectra of Lyman breaks at yet higher redshifts will be possible using comparably deep observations with IR-sensitive grisms, even at redshifts where Lyman alpha is too attenuated by the neutral IGM to be detectable using traditional spectroscopy from the ground.

Submitted 14 May, 2013; v1 submitted 27 February, 2013; originally announced February 2013.

Comments: 19 pages, four figures. Resubmitted to The Astrophysical Journal after revisions to address the referee's report

arXiv:1210.8030 [pdf, other]

Astronomy and Computing: a New Journal for the Astronomical Computing Community

Authors: Alberto Accomazzi, Tamás Budavári, Christopher Fluke, Norman Gray, Robert G Mann, William O'Mullane, Andreas Wicenec, Michael Wise

Abstract: We introduce *Astronomy and Computing*, a new journal for the growing population of people working in the domain where astronomy overlaps with computer science and information technology. The journal aims to provide a new communication channel within that community, which is not well served by current journals, and to help secure recognition of its true importance within modern astronomy. In this inaugural editorial, we describe the rationale for creating the journal, outline its scope and ambitions, and seek input from the community in defining in detail how the journal should work towards its high-level goals.

Submitted 30 October, 2012; originally announced October 2012.

Comments: 5 pages, no figures; editorial for first edition of journal

arXiv:1210.7521 [pdf, other]

doi 10.1017/pas.2012.020

Radio Continuum Surveys with Square Kilometre Array Pathfinders

Authors: Ray P. Norris, J. Afonso, D. Bacon, Rainer Beck, Martin Bell, R. J. Beswick, Philip Best, Sanjay Bhatnagar, Annalisa Bonafede, Gianfranco Brunetti, Tamas Budavari, Rossella Cassano, J. J. Condo, Catherine Cress, Arwa Dabbech, I. Feain, Rob Fender, Chiara Ferrari, B. M. Gaensler, G. Giovannini, Marijke Haverkorn, George Heald, Kurt van der Heyden, A. M. Hopkins, M. Jarvis , et al. (26 additional authors not shown)

Abstract: In the lead-up to the Square Kilometre Array (SKA) project, several next-generation radio telescopes and upgrades are already being built around the world. These include APERTIF (The Netherlands), ASKAP (Australia), eMERLIN (UK), VLA (USA), e-EVN (based in Europe), LOFAR (The Netherlands), Meerkat (South Africa), and the Murchison Widefield Array (MWA). Each of these new instruments has different strengths, and coordination of surveys between them can help maximise the science from each of them. A radio continuum survey is being planned on each of them with the primary science objective of understanding the formation and evolution of galaxies over cosmic time, and the cosmological parameters and large-scale structures which drive it. In pursuit of this objective, the different teams are developing a variety of new techniques, and refining existing ones. Here we describe these projects, their science goals, and the technical challenges which are being addressed to maximise the science return.

Submitted 28 October, 2012; originally announced October 2012.

Comments: Accepted by PASA, 22 October 2012

arXiv:1209.6490 [pdf]

Spatial Indexing of Large Multidimensional Databases

Authors: István Csabai, Márton Trencséni, Géza Herczegh, László Dobos, Péter Józsa, Norbert Purger, Tamás Budavári, Alexander Szalay

Abstract: Scientific endeavors such as large astronomical surveys generate databases on the terabyte scale. These usually multidimensional databases must be visualized and mined in order to find interesting objects or to extract meaningful and qualitatively new relationships. Many statistical algorithms required for these tasks run reasonably fast when operating on small sets of in-memory data, but take noticeable performance hits when operating on large databases that do not fit into memory. We utilize new software technologies to develop and evaluate fast multidimensional indexing schemes that inherently follow the underlying, highly non-uniform distribution of the data: they are layered uniform grid indices, hierarchical binary space partitioning, and sampled flat Voronoi tessellation of the data. Our working database is the 5-dimensional magnitude space of the Sloan Digital Sky Survey with more than 270 million data points, where we show that these techniques can dramatically speed up data mining operations such as finding similar objects by example, classifying objects or comparing extensive simulation sets with observations. We are also developing tools to interact with the multidimensional database and visualize the data at multiple resolutions in an adaptive manner.

Submitted 28 September, 2012; originally announced September 2012.

Comments: 12 pages, 16 figures; CIDR 2007

arXiv:1206.5021 [pdf, ps, other]

SkyQuery: An Implementation of a Parallel Probabilistic Join Engine for Cross-Identification of Multiple Astronomical Databases

Authors: László Dobos, Tamás Budavári, Nolan Li, Alexander S. Szalay, István Csabai

Abstract: Multi-wavelength astronomical studies require cross-identification of detections of the same celestial objects in multiple catalogs based on spherical coordinates and other properties. Because of the large data volumes and spherical geometry, the symmetric N-way association of astronomical detections is a computationally intensive problem, even when sophisticated indexing schemes are used to exclude obviously false candidates. Legacy astronomical catalogs already contain detections of more than a hundred million objects while the ongoing and future surveys will produce catalogs of billions of objects with multiple detections of each at different times. The varying statistical error of position measurements, moving and extended objects, and other physical properties make it necessary to perform the cross-identification using a mathematically correct, proper Bayesian probabilistic algorithm, capable of including various priors. One time, pair-wise cross-identification of these large catalogs is not sufficient for many astronomical scenarios. Consequently, a novel system is necessary that can cross-identify multiple catalogs on-demand, efficiently and reliably. In this paper, we present our solution based on a cluster of commodity servers and ordinary relational databases. The cross-identification problems are formulated in a language based on SQL, but extended with special clauses. These special queries are partitioned spatially by coordinate ranges and compiled into a complex workflow of ordinary SQL queries. Workflows are then executed in a parallel framework using a cluster of servers hosting identical mirrors of the same data sets.

Submitted 21 June, 2012; originally announced June 2012.

arXiv:1206.0644 [pdf, ps, other]

doi 10.1088/0004-637X/761/2/188

Catalog Matching with Astrometric Correction and its Application to the Hubble Legacy Archive

Authors: Tamas Budavari, Stephen H. Lubow

Abstract: Object cross-identification in multiple observations is often complicated by the uncertainties in their astrometric calibration. Due to the lack of standard reference objects, an image with a small field of view can have significantly larger errors in its absolute positioning than the relative precision of the detected sources within. We present a new general solution for the relative astrometry that quickly refines the World Coordinate System of overlapping fields. The efficiency is obtained through the use of infinitesimal 3-D rotations on the celestial sphere, which do not involve trigonometric functions. They also enable an analytic solution to an important step in making the astrometric corrections. In cases with many overlapping images, the correct identification of detections that match together across different images is difficult to determine. We describe a new greedy Bayesian approach for selecting the best object matches across a large number of overlapping images. The methods are developed and demonstrated on the Hubble Legacy Archive, one of the most challenging data sets today. We describe a novel catalog compiled from many Hubble Space Telescope observations, where the detections are combined into a searchable collection of matches that link the individual detections. The matches provide descriptions of astronomical objects involving multiple wavelengths and epochs. High relative positional accuracy of objects is achieved across the Hubble images, often sub-pixel precision in the order of just a few milli-arcseconds. The result is a reliable set of high-quality associations that are publicly available online.

Submitted 31 October, 2012; v1 submitted 4 June, 2012; originally announced June 2012.

Comments: 9 pages, 9 figures, accepted for publication in the Astrophysical Journal

arXiv:1204.3055 [pdf, ps, other]

doi 10.5479/ADS/bib/2011ivoa.spec.1120M

IVOA Recommendation: Spectrum Data Model 1.1

Authors: Jonathan McDowell, Doug Tody, Tamas Budavari, Markus Dolensky, Inga Kamp, Kelly McCusker, Pavlos Protopapas, Arnold Rots, Randy Thompson, Frank Valdes, Petr Skoda, Bruno Rino, Sebastien Derriere, Jesus Salgado, Omar Laurino, the IVOA Data Access Layer, Data Model Working Groups

Abstract: We present a data model describing the structure of spectrophotometric datasets with spectral and temporal coordinates and associated metadata. This data model may be used to represent spectra, time series data, segments of SED (Spectral Energy Distributions) and other spectral or temporal associations.

Submitted 13 April, 2012; originally announced April 2012.

Comments: http://www.ivoa.net

Report number: REC-SpectrumDM-1.1-20111120

arXiv:1203.5725 [pdf]

doi 10.5479/ADS/bib/2012ivoa.spec.0210T

IVOA Recommendation: Simple Spectral Access Protocol Version 1.1

Authors: Doug Tody, Markus Dolensky, Jonathan McDowell, Francois Bonnarel, Tamas Budavari, Ivo Busko, Alberto Micol, Pedro Osuna, Jesus Salgado, Petr Skoda, Randy Thompson, Frank Valdes, the Data Access Layer working group

Abstract: The Simple Spectral Access (SSA) Protocol (SSAP) defines a uniform interface to remotely discover and access one dimensional spectra. SSA is a member of an integrated family of data access interfaces altogether comprising the Data Access Layer (DAL) of the IVOA. SSA is based on a more general data model capable of describing most tabular spectrophotometric data, including time series and spectral energy distributions (SEDs) as well as 1-D spectra; however the scope of the SSA interface as specified in this document is limited to simple 1-D spectra, including simple aggregations of 1-D spectra. The form of the SSA interface is simple: clients first query the global resource registry to find services of interest and then issue a data discovery query to selected services to determine what relevant data is available from each service; the candidate datasets available are described uniformly in a VOTable format document which is returned in response to the query. Finally, the client may retrieve selected datasets for analysis. Spectrum datasets returned by an SSA spectrum service may be either precomputed, archival datasets, or they may be virtual data which is computed on the fly to respond to a client request. Spectrum datasets may conform to a standard data model defined by SSA, or may be native spectra with custom project-defined content. Spectra may be returned in any of a number of standard data formats. Spectral data is generally stored externally to the VO in a format specific to each spectral data collection; currently there is no standard way to represent astronomical spectra, and virtually every project does it differently. Hence spectra may be actively mediated to the standard SSA-defined data model at access time by the service, so that client analysis programs do not have to be familiar with the idiosyncratic details of each data collection to be accessed.

Submitted 26 March, 2012; originally announced March 2012.

Report number: REC-SSA-1.1-20120210

Showing 1–50 of 125 results for author: Budavari, T