-
AstroInformatics: Recommendations for Global Cooperation
Authors:
Ashish Mahabal,
Pranav Sharma,
Rana Adhikari,
Mark Allen,
Stefano Andreon,
Varun Bhalerao,
Federica Bianco,
Anthony Brown,
S. Bradley Cenko,
Paula Coehlo,
Jeffery Cooke,
Daniel Crichton,
Chenzhou Cui,
Reinaldo de Carvalho,
Richard Doyle,
Laurent Eyer,
Bernard Fanaroff,
Christopher Fluke,
Francisco Forster,
Kevin Govender,
Matthew J. Graham,
Renée Hložek,
Puji Irawati,
Ajit Kembhavi,
Juna Kollmeier
, et al. (23 additional authors not shown)
Abstract:
Policy Brief on "AstroInformatics, Recommendations for Global Collaboration", distilled from panel discussions during S20 Policy Webinar on Astroinformatics for Sustainable Development held on 6-7 July 2023.
The deliberations encompassed a wide array of topics, including broad astroinformatics, sky surveys, large-scale international initiatives, global data repositories, space-related data, regional and international collaborative efforts, as well as workforce development within the field. These discussions comprehensively addressed the current status, notable achievements, and the manifold challenges that the field of astroinformatics currently confronts.
The G20 nations present a unique opportunity due to their abundant human and technological capabilities, coupled with their widespread geographical representation. Leveraging these strengths, significant strides can be made in various domains. These include, but are not limited to, the advancement of STEM education and workforce development, the promotion of equitable resource utilization, and contributions to fields such as Earth Science and Climate Science.
We present a concise overview, followed by specific recommendations that pertain to both ground-based and space data initiatives. Our team remains readily available to furnish further elaboration on any of these proposals as required. Furthermore, we anticipate further engagement during the upcoming G20 presidencies in Brazil (2024) and South Africa (2025) to ensure the continued discussion and realization of these objectives.
The policy webinar took place during the G20 presidency in India (2023). Notes based on the seven panels will be separately published.
Submitted 9 January, 2024;
originally announced January 2024.
-
Estimation of line-of-sight velocities of individual galaxies using neural networks I. Modelling redshift-space distortions at large scales
Authors:
Hongxiang Chen,
Jie Wang,
Tianxiang Mao,
Juntao Ma,
Yuxi Meng,
Baojiu Li,
Yan-Chuan Cai,
Mark Neyrinck,
Bridget Falck,
Alexander S. Szalay
Abstract:
We present a scheme based on artificial neural networks (ANN) to estimate the line-of-sight velocities of individual galaxies from an observed redshift-space galaxy distribution. We find an estimate of the peculiar velocity at a galaxy based on galaxy counts and barycenters in shells around it. By training the network with environmental characteristics, such as the total mass and mass center within each shell surrounding every galaxy in redshift space, our ANN model can accurately predict the line-of-sight velocity of each individual galaxy. When this velocity is used to eliminate the RSD effect, the two-point correlation function (TPCF) in real space can be recovered with an accuracy better than 1% at $s$ > 8 $h^{-1}\mathrm{Mpc}$, and 4% on all scales compared to ground truth. The real-space power spectrum can be recovered within 3% on $k$< 0.5 $\mathrm{Mpc}^{-1}h$, and less than 5% for all $k$ modes. The quadrupole moment of the TPCF or power spectrum is almost zero down to $s$ = 10 $h^{-1}\mathrm{Mpc}$ or all $k$ modes, indicating an effective correction of the spatial anisotropy caused by the RSD effect. We demonstrate that on large scales, without additional training with new data, our network is adaptable to different galaxy formation models, different cosmological models, and mock galaxy samples at high redshifts and high biases, achieving less than 10% error for scales greater than 15 $h^{-1}\mathrm{Mpc}$. As it is sensitive to large-scale densities, it does not manage to remove Fingers of God in large clusters, but works remarkably well at recovering real-space galaxy positions elsewhere. Our scheme provides a novel way to predict the peculiar velocity of individual galaxies, to eliminate the RSD effect directly in future large galaxy surveys, and to reconstruct the 3-D cosmic velocity field accurately.
Submitted 8 July, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
Comparing local energy cascade rates in isotropic turbulence using structure function and filtering formulations
Authors:
H. Yao,
M. Schnaubelt,
A. Szalay,
T. Zaki,
C. Meneveau
Abstract:
Two common definitions of the spatially local rate of kinetic energy cascade at some scale $\ell$ in turbulent flows are (i) the cubic velocity difference term appearing in the generalized Kolmogorov-Hill equation (GKHE) (structure function approach), and (ii) the subfilter-scale energy flux term in the transport equation for subgrid-scale kinetic energy (filtering approach). We perform a comparative study of both quantities based on direct numerical simulation data of isotropic turbulence at Taylor-scale Reynolds number of 1250. While observations of negative subfilter-scale energy flux (backscatter) have in the past led to debates regarding interpretation and relevance of such observations, we argue that the interpretation of the local structure function-based cascade rate definition is unambiguous since it arises from a divergence term in scale space. Conditional averaging is used to explore the relationship between the local cascade rate and the local filtered viscous dissipation rate as well as filtered velocity gradient tensor properties such as its invariants. We find statistically robust evidence of inverse cascade when both the large-scale rotation rate is strong and the large-scale strain rate is weak. Even stronger net inverse cascading is observed in the ``vortex compression'' $R>0$, $Q>0$ quadrant where $R$ and $Q$ are velocity gradient invariants. Qualitatively similar, but quantitatively much weaker trends are observed for the conditionally averaged subfilter scale energy flux. Flow visualizations show consistent trends, namely that spatially the inverse cascade events appear to be located within large-scale vortices, specifically in subregions when $R$ is large.
Submitted 20 July, 2023;
originally announced July 2023.
-
Playing catch-up in building an open research commons
Authors:
Philip E. Bourne,
Vivien Bonazzi,
Amy Brand,
Bonnie Carroll,
Ian Foster,
Ramanathan V. Guha,
Robert Hanisch,
Sallie Ann Keller,
Mary Lee Kennedy,
Christine Kirkpatrick,
Barend Mons,
Sarah M. Nusser,
Michael Stebbins,
George Strawn,
Alex Szalay
Abstract:
On August 2, 2021 a group of concerned scientists and US funding agency and federal government officials met for an informal discussion to explore the value and need for a well-coordinated US Open Research Commons (ORC); an interoperable collection of data and compute resources within both the public and private sectors which are easy to use and accessible to all.
Submitted 15 July, 2022;
originally announced August 2022.
-
Now is the time to build a national data ecosystem for materials science and chemistry research data
Authors:
E. M. Campo,
S. Shankar,
A. S. Szalay,
R. J. Hanisch
Abstract:
A call for coordinated action from government, academia, and industry.
Submitted 1 January, 2022;
originally announced January 2022.
-
A Light-weight Interpretable Compositional Model for Nuclei Detection and Weakly-Supervised Segmentation
Authors:
Yixiao Zhang,
Adam Kortylewski,
Qing Liu,
Seyoun Park,
Benjamin Green,
Elizabeth Engle,
Guillermo Almodovar,
Ryan Walk,
Sigfredo Soto-Diaz,
Janis Taube,
Alex Szalay,
Alan Yuille
Abstract:
The field of computational pathology has witnessed great advancements since deep neural networks have been widely applied. These networks usually require large amounts of annotated data to train their many parameters. However, it takes significant effort to annotate a large histopathology dataset. We introduce a light-weight and interpretable model for nuclei detection and weakly-supervised segmentation. It requires annotations only on isolated nuclei, rather than on all nuclei in the dataset. Moreover, it is a generative compositional model that first locates parts of a nucleus, then learns the spatial correlation of the parts to further locate the nucleus. This process makes its predictions interpretable. Empirical results on an in-house dataset show that in detection, the proposed method achieved comparable or better performance than its deep network counterparts, especially when the annotated data is limited. It also outperforms popular weakly-supervised segmentation methods. The proposed method could be an alternative solution for the data-hungry problem of deep learning methods.
Submitted 9 August, 2022; v1 submitted 26 October, 2021;
originally announced October 2021.
-
Wireless sensor network for in situ soil moisture monitoring
Authors:
Jianing Fang,
Chuheng Hu,
Nour Smaoui,
Doug Carlson,
Jayant Gupchup,
Razvan Musaloiu-E.,
Chieh-Jan Mike Liang,
Marcus Chang,
Omprakash Gnawali,
Tamas Budavari,
Andreas Terzis,
Katalin Szlavecz,
Alexander S. Szalay
Abstract:
We discuss the history and lessons learned from a series of deployments of environmental sensors measuring soil parameters and CO2 fluxes over the last fifteen years, in an outdoor environment. We present the hardware and software architecture of our current Gen-3 system, and then discuss how we are simplifying the user facing part of the software, to make it easier and friendlier for the environmental scientist to be in full control of the system. Finally, we describe the current effort to build a large-scale Gen-4 sensing platform consisting of hundreds of nodes to track the environmental parameters for urban green spaces in Baltimore, Maryland.
Submitted 20 February, 2021;
originally announced February 2021.
-
Indra: a Public Computationally Accessible Suite of Cosmological $N$-body Simulations
Authors:
Bridget Falck,
Jie Wang,
Adrian Jenkins,
Gerard Lemson,
Dmitry Medvedev,
Mark C. Neyrinck,
Alex S. Szalay
Abstract:
Indra is a suite of large-volume cosmological $N$-body simulations with the goal of providing excellent statistics of the large-scale features of the distribution of dark matter. Each of the 384 simulations is computed with the same cosmological parameters and different initial phases, with 1024$^3$ dark matter particles in a box of length 1 Gpc/h, 64 snapshots of particle data and halo catalogs, and 505 time steps of the Fourier modes of the density field, amounting to almost a petabyte of data. All of the Indra data are immediately available for analysis via the SciServer science platform, which provides interactive and batch computing modes, personal data storage, and other hosted data sets such as the Millennium simulations and many astronomical surveys. We present the Indra simulations, describe the data products and how to access them, and measure ensemble averages, variances, and covariances of the matter power spectrum, the matter correlation function, and the halo mass function to demonstrate the types of computations that Indra enables. We hope that Indra will be both a resource for large-scale structure research and a demonstration of how to make very large datasets public and computationally accessible.
Submitted 3 August, 2021; v1 submitted 10 January, 2021;
originally announced January 2021.
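The ensemble statistics mentioned in the abstract above (averages, variances, and covariances of a binned quantity over many runs) can be illustrated with a minimal sketch. It assumes each simulation contributes one list of binned values, for example a power spectrum in fixed $k$ bins; the function name `ensemble_stats` and the input layout are hypothetical and are not part of the Indra or SciServer tooling.

```python
def ensemble_stats(runs):
    """Return the per-bin mean and the bin-bin sample covariance over runs.

    `runs` is a list of equal-length lists, one per simulation
    (hypothetical layout chosen for this illustration).
    """
    n_runs = len(runs)
    n_bins = len(runs[0])
    # Ensemble average in each bin.
    mean = [sum(run[i] for run in runs) / n_runs for i in range(n_bins)]
    # Unbiased sample covariance between every pair of bins.
    cov = [
        [
            sum((run[i] - mean[i]) * (run[j] - mean[j]) for run in runs)
            / (n_runs - 1)
            for j in range(n_bins)
        ]
        for i in range(n_bins)
    ]
    return mean, cov


# Toy input: three "simulations" with two bins each.
mean, cov = ensemble_stats([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
```

The diagonal of `cov` holds the per-bin variances; with 384 runs, as in the suite described above, the same function would apply unchanged to 384 input lists.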
-
Sketch and Scale: Geo-distributed tSNE and UMAP
Authors:
Viska Wei,
Nikita Ivkin,
Vladimir Braverman,
Alexander Szalay
Abstract:
Running machine learning analytics over geographically distributed datasets is a rapidly arising problem in the world of data management policies ensuring privacy and data security. Visualizing high dimensional data using tools such as t-distributed Stochastic Neighbor Embedding (tSNE) and Uniform Manifold Approximation and Projection (UMAP) has become common practice for data scientists. Both tools scale poorly in time and memory. While recent optimizations showed successful handling of 10,000 data points, scaling beyond a million points is still challenging. We introduce a novel framework: Sketch and Scale (SnS). It leverages a Count Sketch data structure to compress the data on the edge nodes, aggregates the reduced-size sketches on the master node, and runs vanilla tSNE or UMAP on the summary, representing the densest areas, extracted from the aggregated sketch. We show this technique to be fully parallel and to scale linearly in time and logarithmically in memory and communication, making it possible to analyze datasets with many millions, potentially billions, of data points spread across several data centers around the globe. We demonstrate the power of our method on two mid-size datasets: cancer data with 52 million 35-band pixels from multiple images of tumor biopsies; and astrophysics data of 100 million stars with multi-color photometry from the Sloan Digital Sky Survey (SDSS).
Submitted 11 November, 2020;
originally announced November 2020.
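The compress-on-the-edge, merge-on-the-master pattern described in the abstract above can be illustrated with a toy Count Sketch. This is a minimal sketch under stated assumptions, not the SnS implementation: the class name, the use of Python's built-in `hash` with per-row salts, and the table dimensions are all choices made for this example. Keys are assumed to be tuples of integers (for example, discretized cell indices), so that hashing is deterministic.

```python
import random
from statistics import median


class CountSketch:
    """Toy Count Sketch: `depth` rows of `width` signed counters."""

    def __init__(self, depth=5, width=256, seed=0):
        # Every node must use the same seed so that sketches can be merged.
        rng = random.Random(seed)
        self.depth, self.width = depth, width
        self.salts = [(rng.getrandbits(32), rng.getrandbits(32))
                      for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_sign(self, row, key):
        a, b = self.salts[row]
        bucket = hash((a, key)) % self.width        # which counter in this row
        sign = 1 if hash((b, key)) & 1 else -1      # pseudo-random +1/-1
        return bucket, sign

    def add(self, key, count=1):
        for row in range(self.depth):
            bucket, sign = self._bucket_sign(row, key)
            self.table[row][bucket] += sign * count

    def estimate(self, key):
        # Median of the per-row signed counters estimates the key's frequency.
        votes = []
        for row in range(self.depth):
            bucket, sign = self._bucket_sign(row, key)
            votes.append(sign * self.table[row][bucket])
        return median(votes)

    def merge(self, other):
        # Sketches are linear, so merging is element-wise addition.
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]


# Two hypothetical edge nodes see parts of the same dense cell (3, 7).
edge_a, edge_b = CountSketch(), CountSketch()
edge_a.add((3, 7), 600)
edge_b.add((3, 7), 400)
for i in range(50):                 # sparse background cells on one node
    edge_a.add((i, i + 1))
master = CountSketch()
master.merge(edge_a)
master.merge(edge_b)
dense_estimate = master.estimate((3, 7))
```

Because the sketch is linear, only the fixed-size tables travel between nodes, and the dense cells can then be read off the merged table on the master.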
-
Baryon acoustic oscillations reconstruction using convolutional neural networks
Authors:
Tian-Xiang Mao,
Jie Wang,
Baojiu Li,
Yan-Chuan Cai,
Bridget Falck,
Mark Neyrinck,
Alex Szalay
Abstract:
We propose a new scheme to reconstruct the baryon acoustic oscillations (BAO) signal, which contains key cosmological information, based on deep convolutional neural networks (CNN). Trained with almost no fine-tuning, the network can recover large-scale modes accurately in the test set: the correlation coefficient between the true and reconstructed initial conditions reaches $90\%$ at $k\leq 0.2 h\mathrm{Mpc}^{-1}$, which can lead to significant improvements of the BAO signal-to-noise ratio down to $k\simeq0.4h\mathrm{Mpc}^{-1}$. Since this new scheme is based on the configuration-space density field in sub-boxes, it is local and less affected by survey boundaries than the standard reconstruction method, as our tests confirm. We find that the network trained in one cosmology is able to reconstruct BAO peaks in the others, i.e. recovering information lost to non-linearity independent of cosmology. The error in the recovered BAO peak positions is far smaller than the shift caused by the difference between the cosmological models used for training and testing, suggesting that different models can be distinguished efficiently in our scheme. Our scheme provides a promising new way to extract cosmological information from ongoing and future large galaxy surveys.
Submitted 3 December, 2020; v1 submitted 24 February, 2020;
originally announced February 2020.
-
SciServer: a Science Platform for Astronomy and Beyond
Authors:
Manuchehr Taghizadeh-Popp,
Jai Won Kim,
Gerard Lemson,
Dmitry Medvedev,
M. Jordan Raddick,
Alexander S. Szalay,
Aniruddha R. Thakar,
Joseph Booker,
Camy Chhetri,
Laszlo Dobos,
Michael Rippin
Abstract:
We present SciServer, a science platform built and supported by the Institute for Data Intensive Engineering and Science at the Johns Hopkins University. SciServer builds upon and extends the SkyServer system of server-side tools that introduced the astronomical community to SQL (Structured Query Language) and has been serving the Sloan Digital Sky Survey catalog data to the public. SciServer uses a Docker/VM based architecture to provide interactive and batch mode server-side analysis with scripting languages like Python and R in various environments including Jupyter (notebooks), RStudio and command-line in addition to traditional SQL-based data analysis. Users have access to private file storage as well as personal SQL database space. A flexible resource access control system allows users to share their resources with collaborators, a feature that has also been very useful in classroom environments. All these services, wrapped in a layer of REST APIs, constitute a scalable collaborative data-driven science platform that is attractive to science disciplines beyond astronomy.
Submitted 4 September, 2020; v1 submitted 23 January, 2020;
originally announced January 2020.
-
Six Dimensional Streaming Algorithm for Cluster Finding in N-Body Simulations
Authors:
Aidan Reilly,
Nikita Ivkin,
Gerard Lemson,
Vladimir Braverman,
Alexander Szalay
Abstract:
Cosmological N-body simulations are crucial for understanding how the Universe evolves. Studying large-scale distributions of matter in these simulations and comparing them to observations usually involves detecting dense clusters of particles called "halos", which are gravitationally bound and expected to form galaxies. However, traditional cluster finders are computationally expensive and use massive amounts of memory. Recent work by Liu et al. (2015) showed the connection between cluster detection and memory-efficient streaming algorithms and presented a halo finder based on a heavy-hitter algorithm. Later, Ivkin et al. (2018) improved the scalability of the proposed streaming halo finder with an efficient GPU implementation. Both works map particles' positions onto a discrete grid and therefore lose the rest of the information, such as their velocities. As a result, two halos travelling through each other are indistinguishable in positional space, while the velocity distribution of those halos can help to identify this process, which is worth studying further. In this project we analyze data from the Millennium Simulation Project (Springel et al. (2005)) to motivate the inclusion of velocity in the streaming method we introduce. We then demonstrate a use of the proposed method, which allows one to find the same halos as before, while also detecting those that were indistinguishable in prior methods.
Submitted 24 December, 2019;
originally announced December 2019.
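The idea in the abstract above, that adding velocity to the grid key separates clumps that overlap in position, can be illustrated with a small heavy-hitter example. This sketch swaps in the classic Misra-Gries summary purely for illustration; it is not the algorithm of the cited works, and the cell sizes, function names, and toy particle data are all assumptions.

```python
def cell_of(particle, dx, dv):
    """Map (x, y, z, vx, vy, vz) to a six-dimensional grid cell (hypothetical binning)."""
    x, y, z, vx, vy, vz = particle
    return (int(x // dx), int(y // dx), int(z // dx),
            int(vx // dv), int(vy // dv), int(vz // dv))


def misra_gries(stream, k):
    """Keep at most k - 1 counters; any item seen more than n/k times survives."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Two toy clumps at the same position but with opposite bulk velocities,
# interleaved with sparse background particles.
particles = []
for i in range(100):
    particles.append((1.2, 1.3, 1.1, 310.0, 0.0, 0.0))
    particles.append((1.2, 1.3, 1.1, -310.0, 0.0, 0.0))
    if i < 20:
        particles.append((10.0 + i, 5.0, 5.0, 0.0, 0.0, 0.0))

cells = (cell_of(p, dx=1.0, dv=100.0) for p in particles)
heavy = misra_gries(cells, k=5)
```

With position-only keys both clumps would fall into the single cell `(1, 1, 1)`; with the six-dimensional key they appear as two separate heavy cells.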
-
Petabytes to Science
Authors:
Amanda E. Bauer,
Eric C. Bellm,
Adam S. Bolton,
Surajit Chaudhuri,
A. J. Connolly,
Kelle L. Cruz,
Vandana Desai,
Alex Drlica-Wagner,
Frossie Economou,
Niall Gaffney,
J. Kavelaars,
J. Kinney,
Ting S. Li,
B. Lundgren,
R. Margutti,
G. Narayan,
B. Nord,
Dara J. Norman,
W. O'Mullane,
S. Padhi,
J. E. G. Peek,
C. Schafer,
Megan E. Schwamb,
Arfon M. Smith,
Erik J. Tollerud
, et al. (2 additional authors not shown)
Abstract:
A Kavli Foundation sponsored workshop on the theme \emph{Petabytes to Science} was held from the 12$^{th}$ to the 14$^{th}$ of February 2019 in Las Vegas. The aim of this workshop was to discuss important trends and technologies which may support astronomy. We also tackled how to better shape the workforce for the new trends and how we should approach education and public outreach. This document was coauthored during the workshop and edited in the weeks after. It comprises the discussions and highlights many recommendations which came out of the workshop.
We shall distill parts of this document and formulate potential white papers for the decadal survey.
Submitted 17 November, 2019; v1 submitted 13 May, 2019;
originally announced May 2019.
-
Halo Spin from Primordial Inner Motions
Authors:
Mark C. Neyrinck,
Miguel A. Aragon-Calvo,
Bridget Falck,
Alexander S. Szalay,
Jie Wang
Abstract:
The standard explanation for galaxy spin starts with the tidal-torque theory (TTT), in which an ellipsoidal dark-matter protohalo, which comes to host the galaxy, is torqued up by the tidal gravitational field around it. We discuss a complementary picture, using the relatively familiar velocity field, instead of the tidal field, whose intuitive connection to the surrounding, possibly faraway matter arrangement is more obscure. In this 'spin from primordial inner motions' (SPIM) concept, implicit in TTT derivations but not previously emphasized, the angular momentum from the gravity-sourced velocity field inside a protohalo largely cancels out, but has some excess from the aspherical outskirts. At first, the net spin scales according to linear theory, a sort of comoving conservation of familiar angular momentum. Then, at collapse, it is conserved in physical coordinates. Small haloes are then typically subject to secondary exchanges of angular momentum. The TTT is useful for analytic estimates. But a literal interpretation of the TTT is inaccurate in detail, without some implicit concepts about smoothing of the velocity and tidal fields. This could lead to misconceptions, for those first learning about how galaxies come to spin. Protohaloes are not perfectly ellipsoidal and do not uniformly torque up, as in a naive interpretation of the TTT; their inner velocity fields retain substantial dispersion. Furthermore, quantitatively, given initial conditions and protohalo boundaries, SPIM is more direct and accurate than the TTT to predict halo spins. We also discuss how SPIM applies to rotating filaments, and the relation between halo mass and spin, in which the total spin of a halo can be thought of as a sum of random contributions.
Submitted 12 March, 2020; v1 submitted 5 April, 2019;
originally announced April 2019.
-
The Wide Field Infrared Survey Telescope: 100 Hubbles for the 2020s
Authors:
Rachel Akeson,
Lee Armus,
Etienne Bachelet,
Vanessa Bailey,
Lisa Bartusek,
Andrea Bellini,
Dominic Benford,
David Bennett,
Aparna Bhattacharya,
Ralph Bohlin,
Martha Boyer,
Valerio Bozza,
Geoffrey Bryden,
Sebastiano Calchi Novati,
Kenneth Carpenter,
Stefano Casertano,
Ami Choi,
David Content,
Pratika Dayal,
Alan Dressler,
Olivier Doré,
S. Michael Fall,
Xiaohui Fan,
Xiao Fang,
Alexei Filippenko
, et al. (81 additional authors not shown)
Abstract:
The Wide Field Infrared Survey Telescope (WFIRST) is a 2.4m space telescope with a 0.281 deg^2 field of view for near-IR imaging and slitless spectroscopy and a coronagraph designed for > 10^8 starlight suppression. As background information for Astro2020 white papers, this article summarizes the current design and anticipated performance of WFIRST. While WFIRST does not have the UV imaging/spectroscopic capabilities of the Hubble Space Telescope, for wide field near-IR surveys WFIRST is hundreds of times more efficient. Some of the most ambitious multi-cycle HST Treasury programs could be executed as routine General Observer (GO) programs on WFIRST. The large area and time-domain surveys planned for the cosmology and exoplanet microlensing programs will produce extraordinarily rich data sets that enable an enormous range of Archival Research (AR) investigations. Requirements for the coronagraph are defined based on its status as a technology demonstration, but its expected performance will enable unprecedented observations of nearby giant exoplanets and circumstellar disks. WFIRST is currently in the Preliminary Design and Technology Completion phase (Phase B), on schedule for launch in 2025, with several of its critical components already in production.
Submitted 14 February, 2019;
originally announced February 2019.
-
The Perils of Detecting Measurement Faults in Environmental Monitoring Networks
Authors:
Jayant Gupchup,
Abhishek Sharma,
Andreas Terzis,
Randal Burns,
Alex Szalay
Abstract:
Scientists deploy environmental monitoring networks to discover previously unobservable phenomena and quantify subtle spatial and temporal differences in the physical quantities they measure. Our experience, shared by others, has shown that measurements gathered by such networks are perturbed by sensor faults. In response, multiple fault detection techniques have been proposed in the literature. However, in this paper we argue that these techniques may misclassify events (e.g. rain events for soil moisture measurements) as faults, potentially discarding the most interesting measurements. We support this argument by applying two commonly used fault detection techniques on data collected from a soil monitoring network. Our results show that in this case, up to 45% of the event measurements are misclassified as faults. Furthermore, tuning the fault detection algorithms to avoid event misclassification causes them to miss the majority of actual faults. In addition to exposing the tension between fault and event detection, our findings motivate the need to develop novel fault detection mechanisms which incorporate knowledge of the underlying events and are customized to the sensing modality they monitor.
Submitted 9 February, 2019;
originally announced February 2019.
-
Phoenix: An Epidemic Approach to Time Reconstruction
Authors:
Jayant Gupchup,
Douglas Carlson,
Răzvan Musăloiu-E.,
Alex Szalay,
Andreas Terzis
Abstract:
Harsh deployment environments and uncertain run-time conditions create numerous challenges for postmortem time reconstruction methods. For example, motes often reboot and thus lose their clock state, considering that the majority of mote platforms lack a real-time clock. While existing time reconstruction methods for long-term data gathering networks rely on a persistent basestation for assigning global timestamps to measurements, the basestation may be unavailable due to hardware and software faults. We present Phoenix, a novel offline algorithm for reconstructing global timestamps that is robust to frequent mote reboots and does not require a persistent global time source. This independence sets Phoenix apart from the majority of time reconstruction algorithms which assume that such a source is always available. Motes in Phoenix exchange their time-related state with their neighbors, establishing a chain of transitive temporal relationships to one or more motes with references to the global time. These relationships allow Phoenix to reconstruct the measurement timeline for each mote. Results from simulations and a deployment indicate that Phoenix can achieve timing accuracy up to 6 ppm for 99% of the collected measurements. Phoenix is able to maintain this performance for periods that last for months without a persistent global time source. To achieve this level of performance for the targeted environmental monitoring application, Phoenix requires an additional space overhead of 4% and an additional duty cycle of 0.2%.
Submitted 2 February, 2019;
originally announced February 2019.
-
Sundial: Using Sunlight to Reconstruct Global Timestamps
Authors:
Jayant Gupchup,
Răzvan Musăloiu-E.,
Alex Szalay,
Andreas Terzis
Abstract:
This paper investigates postmortem timestamp reconstruction in environmental monitoring networks. In the absence of a time-synchronization protocol, these networks use multiple pairs of (local, global) timestamps to retroactively estimate the motes' clock drift and offset and thus reconstruct the measurement time series. We present Sundial, a novel offline algorithm for reconstructing global timestamps that is robust to unreliable global clock sources. Sundial reconstructs timestamps by correlating annual solar patterns with measurements provided by the motes' inexpensive light sensors. The surprising ability to accurately estimate the length of day using light intensity measurements enables Sundial to be robust to arbitrary mote clock restarts. Experimental results, based on multiple environmental network deployments spanning a period of over 2.5 years, show that Sundial achieves accuracy as high as 10 parts per million (ppm), using solar radiation readings recorded at 20 minute intervals.
Submitted 28 January, 2019;
originally announced January 2019.
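The Sundial abstract mentions using multiple (local, global) timestamp pairs to estimate a mote's clock drift and offset. As a rough illustration of that regression step only (not of Sundial's solar-pattern correlation), here is a minimal sketch that fits `global = drift * local + offset` by ordinary least squares. The function names `fit_clock`, `to_global`, and `skew_ppm` are hypothetical, and the sketch assumes a single segment with no reboots.

```python
from statistics import mean


def fit_clock(pairs):
    """Least-squares fit of global = drift * local + offset.

    `pairs` is a sequence of (local, global) timestamps taken within one
    reboot-free segment (an assumption of this sketch).
    """
    if len(pairs) < 2:
        raise ValueError("need at least two (local, global) pairs")
    local_ts, global_ts = zip(*pairs)
    l_bar, g_bar = mean(local_ts), mean(global_ts)
    sxx = sum((l - l_bar) ** 2 for l in local_ts)
    if sxx == 0:
        raise ValueError("local timestamps must not all be equal")
    sxy = sum((l - l_bar) * (g - g_bar) for l, g in pairs)
    drift = sxy / sxx
    offset = g_bar - drift * l_bar
    return drift, offset


def to_global(local, drift, offset):
    """Map a local timestamp to the reconstructed global timeline."""
    return drift * local + offset


def skew_ppm(drift):
    """Express the deviation of the drift from 1 in parts per million."""
    return (drift - 1.0) * 1e6


if __name__ == "__main__":
    anchors = [(0.0, 1000.0), (1000.0, 2000.05), (2000.0, 3000.1)]
    drift, offset = fit_clock(anchors)
    print(skew_ppm(drift), to_global(4000.0, drift, offset))
```

With noisy anchor pairs, the residuals of this fit give a first estimate of the reconstruction error, which is the kind of quantity the ppm figures in the abstract refer to.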
-
StePS: A Multi-GPU Cosmological N-body Code for Compactified Simulations
Authors:
Gábor Rácz,
István Szapudi,
László Dobos,
István Csabai,
Alexander S. Szalay
Abstract:
We present the multi-GPU realization of the StePS (Stereographically Projected Cosmological Simulations) algorithm with MPI-OpenMP-CUDA hybrid parallelization and nearly ideal scale-out to multiple compute nodes. Our new zoom-in cosmological direct N-body simulation method simulates the infinite universe with unprecedented dynamic range for a given amount of memory and, in contrast to traditional periodic simulations, its fundamental geometry and topology match observations. By using a spherical geometry instead of periodic boundary conditions, and gradually decreasing the mass resolution with radius, our code is capable of running simulations with a few gigaparsecs in diameter and with a mass resolution of $\sim 10^{9}M_{\odot}$ in the center in four days on three compute nodes with four GTX 1080Ti GPUs in each. The code can also be used to run extremely fast simulations with reasonable resolution for fitting cosmological parameters. These simulations are useful for prediction needs of large surveys. The StePS code is publicly available for the research community.
Submitted 21 March, 2019; v1 submitted 14 November, 2018;
originally announced November 2018.
-
Scalable Streaming Tools for Analyzing $N$-body Simulations: Finding Halos and Investigating Excursion Sets in One Pass
Authors:
Nikita Ivkin,
Zaoxing Liu,
Lin F. Yang,
Srinivas Suresh Kumar,
Gerard Lemson,
Mark Neyrinck,
Alexander S. Szalay,
Vladimir Braverman,
Tamas Budavari
Abstract:
Cosmological $N$-body simulations play a vital role in studying models for the evolution of the Universe. To compare to observations and make a scientific inference, statistical analysis of large simulation datasets (e.g., finding halos or obtaining multi-point correlation functions) is crucial. However, traditional in-memory methods for these tasks do not scale to the forbiddingly large datasets of modern simulations. Our prior paper proposed memory-efficient streaming algorithms that can find the largest halos in a simulation with up to $10^9$ particles on a small server or desktop. However, this approach fails when directly scaling to larger datasets. This paper presents a robust streaming tool that leverages state-of-the-art techniques on GPU boosting, sampling, and parallel I/O, to significantly improve performance and scalability. Our rigorous analysis of the sketch parameters improves the previous results from finding the centers of the $10^3$ largest halos to $\sim 10^4-10^5$, and reveals the trade-offs between memory, running time and number of halos. Our experiments show that our tool can scale to datasets with up to $\sim 10^{12}$ particles while using less than an hour of running time on a single Nvidia GTX 1080 GPU.
Submitted 28 April, 2018; v1 submitted 2 November, 2017;
originally announced November 2017.
-
Geometry and Scaling Laws of Excursion and Iso-sets of Enstrophy and Dissipation in Isotropic Turbulence
Authors:
José Hugo Elsas,
Alexander S. Szalay,
Charles Meneveau
Abstract:
Motivated by interest in the geometry of high intensity events of turbulent flows, we examine spatial correlation functions of sets where turbulent events are particularly intense. These sets are defined using indicator functions on excursion and iso-value sets. Their geometric scaling properties are analyzed by examining possible power-law decay of their radial correlation function. We apply the analysis to enstrophy, dissipation, and velocity gradient invariants $Q$ and $R$ and their joint spatial distributions, using data from a direct numerical simulation of isotropic turbulence at ${\rm Re}_λ\approx 430$. While no fractal scaling is found in the inertial range using box-counting in the finite Reynolds number flow considered here, power-law scaling in the inertial range is found in the radial correlation functions. Thus a geometric characterization in terms of these sets' correlation dimension is possible. Strong dependence on the enstrophy and dissipation threshold is found, consistent with multifractal behavior. Nevertheless, the lack of scaling of the box-counting analysis precludes direct quantitative comparisons with earlier work based on the multifractal formalism. Surprising trends, such as a lower correlation dimension for strong dissipation events compared to strong enstrophy events, are observed and interpreted in terms of spatial coherence of vortices in the flow. We show that sets defined by joint conditions on strain and enstrophy, and on $Q$ and $R$, also display power-law scaling of correlation functions, providing further characterization of the complex spatial structure of these intersection sets.
Submitted 17 August, 2017;
originally announced August 2017.
-
The Pan-STARRS1 Surveys
Authors:
K. C. Chambers,
E. A. Magnier,
N. Metcalfe,
H. A. Flewelling,
M. E. Huber,
C. Z. Waters,
L. Denneau,
P. W. Draper,
D. Farrow,
D. P. Finkbeiner,
C. Holmberg,
J. Koppenhoefer,
P. A. Price,
A. Rest,
R. P. Saglia,
E. F. Schlafly,
S. J. Smartt,
W. Sweeney,
R. J. Wainscoat,
W. S. Burgett,
S. Chastel,
T. Grav,
J. N. Heasley,
K. W. Hodapp,
R. Jedicke
, et al. (101 additional authors not shown)
Abstract:
Pan-STARRS1 has carried out a set of distinct synoptic imaging sky surveys including the $3π$ Steradian Survey and the Medium Deep Survey in 5 bands ($grizy_{P1}$). The mean 5$σ$ point source limiting sensitivities in the stacked 3$π$ Steradian Survey in $grizy_{P1}$ are (23.3, 23.2, 23.1, 22.3, 21.4) respectively. The upper bound on the systematic uncertainty in the photometric calibration across the sky is 7-12 millimag depending on the bandpass. The systematic uncertainty of the astrometric calibration using the Gaia frame comes from a comparison of the results with Gaia: the standard deviations of the mean and median residuals ($Δra, Δdec$) are (2.3, 1.7) milliarcsec and (3.1, 4.8) milliarcsec, respectively. The Pan-STARRS system and the design of the PS1 surveys are described, together with an overview of the resulting image and catalog data products, their basic characteristics, and a summary of important results. The images, reduced data products, and derived data products from the Pan-STARRS1 surveys are available to the community from the Mikulski Archive for Space Telescopes (MAST) at STScI.
Submitted 28 January, 2019; v1 submitted 16 December, 2016;
originally announced December 2016.
-
The Pan-STARRS1 Database and Data Products
Authors:
H. A. Flewelling,
E. A. Magnier,
K. C. Chambers,
J. N. Heasley,
C. Holmberg,
M. E. Huber,
W. Sweeney,
C. Z. Waters,
A. Calamida,
S. Casertano,
X. Chen,
D. Farrow,
G. Hasinger,
R. Henderson,
K. S. Long,
N. Metcalfe,
G. Narayan,
M. A. Nieto-Santisteban,
P. Norberg,
A. Rest,
R. P. Saglia,
A. Szalay,
A. R. Thakar,
J. L. Tonry,
J. Valenti
, et al. (15 additional authors not shown)
Abstract:
This paper describes the organization of the database and the catalog data products from the Pan-STARRS1 $3π$ Steradian Survey. The catalog data products are available in the form of an SQL-based relational database from MAST, the Mikulski Archive for Space Telescopes at STScI. The database is described in detail, including the construction of the database, the provenance of the data, the schema, and how the database tables are related. Examples of queries for a range of science goals are included.
Submitted 29 January, 2019; v1 submitted 15 December, 2016;
originally announced December 2016.
-
Faint Object Detection in Multi-Epoch Observations via Catalog Data Fusion
Authors:
Tamas Budavari,
Alexander S. Szalay,
Thomas J. Loredo
Abstract:
Observational astronomy in the time-domain era faces several new challenges. One of them is the efficient use of observations obtained at multiple epochs. The work presented here addresses faint object detection with multi-epoch data, and describes an incremental strategy for separating real objects from artifacts in ongoing surveys, in situations where the single-epoch data are summaries of the full image data, such as single-epoch catalogs of flux and direction estimates for candidate sources. The basic idea is to produce low-threshold single-epoch catalogs, and use a probabilistic approach to accumulate catalog information across epochs; this is in contrast to more conventional strategies based on co-added or stacked image data across all epochs. We adopt a Bayesian approach, addressing object detection by calculating the marginal likelihoods for hypotheses asserting there is no object, or one object, in a small image patch containing at most one cataloged source at each epoch. The object-present hypothesis interprets the sources in a patch at different epochs as arising from a genuine object; the no-object (noise) hypothesis interprets candidate sources as spurious, arising from noise peaks. We study the detection probability for constant-flux objects in a simplified Gaussian noise setting, comparing results based on single exposures and stacked exposures to results based on a series of single-epoch catalog summaries. Computing the detection probability based on catalog data amounts to generalized cross-matching: it is the product of a factor accounting for matching of the estimated fluxes of candidate sources, and a factor accounting for matching of their estimated directions. We find that probabilistic fusion of multi-epoch catalog information can detect sources with only modest sacrifice in sensitivity and selectivity compared to stacking.
Submitted 9 November, 2016;
originally announced November 2016.
-
Photo-z-SQL: integrated, flexible photometric redshift computation in a database
Authors:
Róbert Beck,
László Dobos,
Tamás Budavári,
Alexander S. Szalay,
István Csabai
Abstract:
We present a flexible template-based photometric redshift estimation framework, implemented in C#, that can be seamlessly integrated into a SQL database (or DB) server and executed on-demand in SQL. The DB integration eliminates the need to move large photometric datasets outside a database for redshift estimation, and utilizes the computational capabilities of DB hardware. The code is able to perform both maximum likelihood and Bayesian estimation, and can handle inputs of variable photometric filter sets and corresponding broad-band magnitudes. It is possible to take into account the full covariance matrix between filters, and filter zero points can be empirically calibrated using measurements with given redshifts. The list of spectral templates and the prior can be specified flexibly, and the expensive synthetic magnitude computations are done via lazy evaluation, coupled with a caching of results. Parallel execution is fully supported. For large upcoming photometric surveys such as the LSST, the ability to perform in-place photo-z calculation would be a significant advantage. Also, the efficient handling of variable filter sets is a necessity for heterogeneous databases, for example the Hubble Source Catalog, and for cross-match services such as SkyQuery. We illustrate the performance of our code on two reference photo-z estimation testing datasets, and provide an analysis of execution time and scalability with respect to different configurations. The code is available for download at https://github.com/beckrob/Photo-z-SQL.
Submitted 20 March, 2017; v1 submitted 4 November, 2016;
originally announced November 2016.
-
Density-dependent clustering: I. Pulling back the curtains on motions of the BAO peak
Authors:
Mark C. Neyrinck,
István Szapudi,
Nuala McCullagh,
Alex Szalay,
Bridget Falck,
Jie Wang
Abstract:
The most common statistic used to analyze large-scale structure surveys is the correlation function, or power spectrum. Here, we show how `slicing' the correlation function on local density brings sensitivity to interesting non-Gaussian features in the large-scale structure, such as the expansion or contraction of baryon acoustic oscillations (BAO) according to the local density. The sliced correlation function measures the large-scale flows that smear out the BAO, instead of just correcting them as reconstruction algorithms do. Thus, we expect the sliced correlation function to be useful in constraining the growth factor, and modified gravity theories that involve the local density. Out of the studied cases, we find that the run of the BAO peak location with density is best revealed when slicing on a $\sim 40$ Mpc/$h$ filtered density. But slicing on a $\sim100$ Mpc/$h$ filtered density may be most useful in distinguishing between underdense and overdense regions, whose BAO peaks are separated by a substantial $\sim 5$ Mpc/$h$ at $z=0$. We also introduce `curtain plots' showing how local densities drive particle motions toward or away from each other over the course of an $N$-body simulation.
Submitted 14 May, 2018; v1 submitted 19 October, 2016;
originally announced October 2016.
-
The Effect of Corner Modes in the Initial Conditions of Cosmological Simulations
Authors:
B. Falck,
N. McCullagh,
M. C. Neyrinck,
J. Wang,
A. S. Szalay
Abstract:
In view of future high-precision large-scale structure surveys, it is important to quantify the percent and subpercent level effects in cosmological $N$-body simulations from which theoretical predictions are drawn. One such effect involves deciding whether to zero all modes above the one-dimensional Nyquist frequency, the so-called "corner" modes, in the initial conditions. We investigate this effect by comparing power spectra, density distribution functions, halo mass functions, and halo profiles in simulations with and without these modes. For a simulation with a mass resolution of $m_p \sim 10^{11}\,h^{-1}\,M_{\odot}$, we find that at $z>6$, the difference in the matter power spectrum is large at wavenumbers above $\sim 80$\% of $k_{\rm{Ny}}$, reducing to below 2\% at all scales by $z\sim 3$. Including corner modes results in a better match between low- and high-resolution simulations at wavenumbers around the Nyquist frequency of the low-resolution simulation, but the effect of the corner modes is smaller than the effect of particle discreteness. The differences in mass functions are 3\% for the smallest halos at $z=6$ for the $m_p \sim 10^{11}\,h^{-1}\,M_{\odot}$ simulation, but we find no significant difference in the stacked profiles of well-resolved halos at $z \leq 6$. Thus removing power at $|\mathbf{k}|>k_{\rm{Ny}}$ in the initial conditions of cosmological simulations has a small effect on small scales and high redshifts, typically below a few percent.
Submitted 26 April, 2017; v1 submitted 16 October, 2016;
originally announced October 2016.
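The corner-modes abstract defines these modes as those with $|\mathbf{k}|>k_{\rm{Ny}}$ on the initial-conditions grid. The sketch below illustrates only that selection criterion in integer index units, where the Nyquist index of an n-point axis is n/2. It is not the authors' code: the helper names are hypothetical, and the sketch assumes an even grid size with indices running from -n/2 to n/2 - 1.

```python
import math
from itertools import product

# Volume fraction of a cube lying outside its inscribed sphere; the corner
# fraction of a cubic grid tends to this value as the grid grows.
LARGE_GRID_LIMIT = 1 - math.pi / 6


def mode_indices(n):
    """Integer wavenumber indices along one axis of an n-point periodic grid."""
    return range(-(n // 2), n - n // 2)


def is_corner_mode(kx, ky, kz, n):
    """True if |k| exceeds the one-dimensional Nyquist index n/2.

    The comparison uses squared integers to avoid floating-point round-off.
    """
    return kx * kx + ky * ky + kz * kz > (n // 2) ** 2


def corner_fraction(n):
    """Fraction of the n**3 Fourier modes that are corner modes."""
    if n <= 0 or n % 2:
        raise ValueError("this sketch assumes a positive, even grid size")
    corners = sum(
        is_corner_mode(kx, ky, kz, n)
        for kx, ky, kz in product(mode_indices(n), repeat=3)
    )
    return corners / n ** 3


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, corner_fraction(n), LARGE_GRID_LIMIT)
```

Zeroing the corner modes would amount to setting the Fourier amplitudes to zero wherever `is_corner_mode` is true. Nearly half of the modes on a cubic grid fall in this category, so the choice is not negligible a priori.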
-
Photometric redshifts for the SDSS Data Release 12
Authors:
Róbert Beck,
László Dobos,
Tamás Budavári,
Alexander S. Szalay,
István Csabai
Abstract:
We present the methodology and data behind the photometric redshift database of the Sloan Digital Sky Survey Data Release 12 (SDSS DR12). We adopt a hybrid technique, empirically estimating the redshift via local regression on a spectroscopic training set, then fitting a spectrum template to obtain K-corrections and absolute magnitudes. The SDSS spectroscopic catalog was augmented with data from other, publicly available spectroscopic surveys to mitigate target selection effects. The training set is comprised of $1,976,978$ galaxies, and extends up to redshift $z\approx 0.8$, with a useful coverage of up to $z\approx 0.6$. We provide photometric redshifts and realistic error estimates for the $208,474,076$ galaxies of the SDSS primary photometric catalog. We achieve an average bias of $\overline{Δz_{\mathrm{norm}}} = 5.84 \times 10^{-5}$, a standard deviation of $σ\left(Δz_{\mathrm{norm}}\right)=0.0205$, and a $3σ$ outlier rate of $P_o=4.11\%$ when cross-validating on our training set. The published redshift error estimates and photometric error classes enable the selection of galaxies with high quality photometric redshifts. We also provide a supplementary error map that allows additional, sophisticated filtering of the data.
Submitted 21 April, 2016; v1 submitted 31 March, 2016;
originally announced March 2016.
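The DR12 abstract quotes three summary statistics: the mean of $Δz_{\mathrm{norm}}$, its standard deviation, and a $3σ$ outlier rate. The abstract does not define them, so the sketch below rests on assumptions: it takes the common convention $Δz_{\mathrm{norm}} = (z_{\mathrm{phot}} - z_{\mathrm{spec}})/(1 + z_{\mathrm{spec}})$, uses the population standard deviation over all objects, and counts an object as an outlier when $|Δz_{\mathrm{norm}}| > 3σ$. The paper's exact procedure may differ, and the function name `photoz_metrics` is hypothetical.

```python
from statistics import mean, pstdev


def photoz_metrics(z_phot, z_spec):
    """Return (bias, sigma, outlier_rate) for paired redshift estimates.

    Assumed convention: dz_norm = (z_phot - z_spec) / (1 + z_spec).
    """
    if len(z_phot) != len(z_spec) or not z_phot:
        raise ValueError("inputs must be non-empty and of equal length")
    dz = [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]
    bias = mean(dz)
    sigma = pstdev(dz)
    # Assumed outlier definition: |dz_norm| larger than three sigma.
    outlier_rate = sum(abs(d) > 3 * sigma for d in dz) / len(dz)
    return bias, sigma, outlier_rate


if __name__ == "__main__":
    z_spec = [0.10, 0.25, 0.40, 0.55]
    z_phot = [0.12, 0.24, 0.43, 0.52]
    print(photoz_metrics(z_phot, z_spec))
```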
-
Doppler term in the galaxy two-point correlation function: wide-angle, velocity, Doppler lensing and cosmic acceleration effects
Authors:
Alvise Raccanelli,
Daniele Bertacca,
Donghui Jeong,
Mark C. Neyrinck,
Alexander S. Szalay
Abstract:
We study the parity-odd part (which we call the Doppler term) of the linear galaxy two-point correlation function that arises from wide-angle, velocity, Doppler lensing and cosmic acceleration effects. As it is important at low redshift and at large angular separations, the Doppler term is usually neglected in the current generation of galaxy surveys. For future wide-angle galaxy surveys such as Euclid, SPHEREx and SKA, however, we show that the Doppler term must be included. The effect of these terms is dominated by the magnification due to relativistic aberration effects and the slope of the galaxy redshift distribution, and it generally mimics the effect of local-type primordial non-Gaussianity with an effective nonlinearity parameter $f_{\rm NL}^{\rm eff}$ of a few. We show that this would affect forecasts on measurements of $f_{\rm NL}$ at low redshift. Our results show that a survey at low redshift with large number density over a wide area of the sky could detect the Doppler term with a signal-to-noise ratio of $\sim 1-20$, depending on survey specifications.
Submitted 9 February, 2016;
originally announced February 2016.
-
An SSD-based eigensolver for spectral analysis on billion-node graphs
Authors:
Da Zheng,
Randal Burns,
Joshua Vogelstein,
Carey E. Priebe,
Alexander S. Szalay
Abstract:
Many eigensolvers such as ARPACK and Anasazi have been developed to compute eigenvalues of a large sparse matrix. These eigensolvers are limited by the capacity of RAM. They run in the memory of a single machine for smaller eigenvalue problems and require distributed memory for larger problems.
In contrast, we develop an SSD-based eigensolver framework called FlashEigen, which extends Anasazi eigensolvers to SSDs, to compute eigenvalues of a graph with hundreds of millions or even billions of vertices in a single machine. FlashEigen performs sparse matrix multiplication in a semi-external memory fashion, i.e., we keep the sparse matrix on SSDs and the dense matrix in memory. We store the entire vector subspace on SSDs and reduce I/O to improve performance through caching the most recent dense matrix. Our results show that FlashEigen is able to achieve 40%-60% of the performance of its in-memory implementation and has performance comparable to the Anasazi eigensolvers on a machine with 48 CPU cores. Furthermore, it is capable of scaling to a graph with 3.4 billion vertices and 129 billion edges. It takes about four hours to compute eight eigenvalues of the billion-node graph using 120 GB memory.
Submitted 26 February, 2016; v1 submitted 3 February, 2016;
originally announced February 2016.
-
Dispersal of tidal debris in a Milky-Way-sized dark matter halo
Authors:
Wayne Ngan,
Raymond G. Carlberg,
Brandon Bozek,
Rosemary F. G. Wyse,
Alexander S. Szalay,
Piero Madau
Abstract:
We simulate the tidal disruption of a collisionless N-body globular star cluster in a total of 300 different orbits selected to have galactocentric radii between 10 and 30 kpc in four dark matter halos: (a) a spherical halo with no subhalos, (b) a spherical halo with subhalos, (c) a realistic halo with no subhalos, and (d) a realistic halo with subhalos. This allows us to isolate and study how the halo's (lack of) dynamical symmetry and substructures affect the dispersal of tidal debris. The realistic halos are constructed from the snapshot of the Via Lactea II simulation at redshift zero. We find that the overall halo's lack of dynamical symmetry disperses tidal debris to make the streams fluffier, consistent with previous studies of tidal debris of dwarf galaxies in larger orbits than ours in this study. On the other hand, subhalos in realistic potentials can locally enhance the densities along streams, making streams denser than their counterparts in smooth potentials. We show that many long and thin streams can survive in a realistic and lumpy halo for a Hubble time. This suggests that upcoming stellar surveys will likely uncover more thin streams which may contain density gaps that have been shown to be promising probes for dark matter substructures.
Submitted 2 March, 2016; v1 submitted 18 January, 2016;
originally announced January 2016.
-
Statistical Decoupling of Lagrangian Fluid Parcel in Newtonian Cosmology
Authors:
Xin Wang,
Alex Szalay
Abstract:
The Lagrangian dynamics of a single fluid element within a self-gravitational matter field is intrinsically non-local due to the presence of the tidal force. This complicates the theoretical investigation of the non-linear evolution of various cosmic objects, e.g. dark matter halos, in the context of Lagrangian fluid dynamics, since a fluid parcel with given initial density and shape may evolve differently depending on its environment. In this paper, we provide a statistical solution that could decouple this environmental dependence. After deriving the probability distribution evolution equation of the matter field, our method produces a set of closed ordinary differential equations whose solution is uniquely determined by the initial condition of the fluid element. Mathematically, it corresponds to the projected characteristic curve of the transport equation of the density-weighted probability density function (PDF). Consequently it is guaranteed that the one-point PDF would be preserved by evolving these local, yet non-linear, curves with the same set of initial data as the real system. Physically, these trajectories describe the mean evolution averaged over all environments by substituting the tidal tensor with its conditional average. For Gaussian distributed dynamical variables, this mean tidal tensor is simply proportional to the velocity shear tensor, and the dynamical system would recover the prediction of the Zel'dovich approximation (ZA) with the further assumption of the linearized continuity equation. For a weakly non-Gaussian field, the averaged tidal tensor could be expanded perturbatively as a function of all relevant dynamical variables whose coefficients are determined by the statistics of the field.
Submitted 11 January, 2016;
originally announced January 2016.
-
Quantifying correlations between galaxy emission lines and stellar continua
Authors:
Róbert Beck,
László Dobos,
Ching-Wa Yip,
Alexander S. Szalay,
István Csabai
Abstract:
We analyse the correlations between continuum properties and emission line equivalent widths of star-forming and active galaxies from the Sloan Digital Sky Survey. Since upcoming large sky surveys will make broad-band observations only, including strong emission lines into theoretical modelling of spectra will be essential to estimate physical properties of photometric galaxies. We show that emission line equivalent widths can be fairly well reconstructed from the stellar continuum using local multiple linear regression in the continuum principal component analysis (PCA) space. Line reconstruction is good for star-forming galaxies and reasonable for galaxies with active nuclei. We propose a practical method to combine stellar population synthesis models with empirical modelling of emission lines. The technique will help generate more accurate model spectra and mock catalogues of galaxies to fit observations of the new surveys. More accurate modelling of emission lines is also expected to improve template-based photometric redshift estimation methods. We also show that, by combining PCA coefficients from the pure continuum and the emission lines, automatic distinction between hosts of weak active galactic nuclei (AGNs) and quiescent star-forming galaxies can be made. The classification method is based on a training set consisting of high-confidence starburst galaxies and AGNs, and allows for the similar separation of active and star-forming galaxies as the empirical curve found by Kauffmann et al. We demonstrate the use of three important machine learning algorithms in the paper: k-nearest neighbour finding, k-means clustering and support vector machines.
Submitted 11 January, 2016;
originally announced January 2016.
-
Toward Accurate Modeling of the Nonlinear Matter Bispectrum: Standard Perturbation Theory and Transients from Initial Conditions
Authors:
Nuala McCullagh,
Donghui Jeong,
Alexander S. Szalay
Abstract:
Accurate modeling of nonlinearities in the galaxy bispectrum, the Fourier transform of the galaxy three-point correlation function, is essential to fully exploit it as a cosmological probe. In this paper, we present numerical and theoretical challenges in modeling the nonlinear bispectrum. First, we test the robustness of the matter bispectrum measured from N-body simulations using different initial conditions generators. We run a suite of N-body simulations using the Zel'dovich approximation and second-order Lagrangian perturbation theory (2LPT) at different starting redshifts, and find that transients from initial decaying modes systematically reduce the nonlinearities in the matter bispectrum. To achieve 1% accuracy in the matter bispectrum for $z\le3$ on scales $k<1$ $h$/Mpc, 2LPT initial conditions generator with initial redshift of $z\gtrsim100$ is required. We then compare various analytical formulas and empirical fitting functions for modeling the nonlinear matter bispectrum, and discuss the regimes for which each is valid. We find that the next-to-leading order (one-loop) correction from standard perturbation theory matches with N-body results on quasi-linear scales for $z\ge1$. We find that the fitting formula in Gil-Marín et al. (2012) accurately predicts the matter bispectrum for $z\le1$ on a wide range of scales, but at higher redshifts, the fitting formula given in Scoccimarro & Couchman (2001) gives the best agreement with measurements from N-body simulations.
Submitted 4 August, 2015; v1 submitted 28 July, 2015;
originally announced July 2015.
-
Optimize Unsynchronized Garbage Collection in an SSD Array
Authors:
Da Zheng,
Randal Burns,
Alexander S. Szalay
Abstract:
Solid state disks (SSDs) have advanced to outperform traditional hard drives significantly in both random reads and writes. However, heavy random writes trigger frequent garbage collection and decrease the performance of SSDs. In an SSD array, garbage collection of individual SSDs is not synchronized, leading to underutilization of some of the SSDs.
We propose a software solution to tackle the unsynchronized garbage collection in an SSD array installed in a host bus adaptor (HBA), where individual SSDs are exposed to an operating system. We maintain a long I/O queue for each SSD and flush dirty pages intelligently to fill the long I/O queues so that we hide the performance imbalance among SSDs even when there are few parallel application writes. We further define a policy of selecting dirty pages to flush and a policy of taking out stale flush requests to reduce the amount of data written to SSDs. We evaluate our solution in a real system. Experiments show that our solution fully utilizes all SSDs in an array under random write-heavy workloads. It improves I/O throughput by up to 62% under random workloads of mixed reads and writes when SSDs are under active garbage collection. It causes little extra data writeback and increases the cache hit rate.
Submitted 24 June, 2015;
originally announced June 2015.
-
Delivering SKA Science
Authors:
Peter Quinn,
Tim Axelrod,
Ian Bird,
Richard Dodson,
Alex Szalay,
Andreas Wicenec
Abstract:
The SKA will be capable of producing a stream of science data products that are Exa-scale in terms of their storage and processing requirements. This Google-scale enterprise is attracting considerable international interest and excitement from within the industrial and academic communities. In this chapter we examine the data flow, storage and processing requirements of a number of key SKA survey science projects to be executed on the baseline SKA1 configuration. Based on a set of conservative assumptions about trends for HPC and storage costs, and the data flow process within the SKA Observatory, it is apparent that survey projects of the scale proposed will potentially drive construction and operations costs beyond the current anticipated SKA1 budget. This implies a sharing of the resources and costs to deliver SKA science between the community and what is contained within the SKA Observatory. A similar situation was apparent to the designers of the LHC more than 10 years ago. We propose that it is time for the SKA project and community to consider the effort and process needed to design and implement a distributed SKA science data system that leans on the lessons of other projects and looks to recent developments in Cloud technologies to ensure an affordable, effective and global achievement of SKA science goals.
Submitted 21 January, 2015;
originally announced January 2015.
-
Simulating Deep Hubble Images With Semi-empirical Models of Galaxy Formation
Authors:
Manuchehr Taghizadeh-Popp,
S. Michael Fall,
Richard L. White,
Alexander S. Szalay
Abstract:
We simulate deep images from the Hubble Space Telescope (HST) using semi-empirical models of galaxy formation with only a few basic assumptions and parameters. We project our simulations all the way to the observational domain, adding cosmological and instrumental effects to the images, and analyze them in the same way as real HST images ("forward modeling"). This is a powerful tool for testing and comparing galaxy evolution models, since it allows us to make unbiased comparisons between the predicted and observed distributions of galaxy properties, while automatically taking into account all relevant selection effects.
Our semi-empirical models populate each dark matter halo with a galaxy of determined stellar mass and scale radius. We compute the luminosity and spectrum of each simulated galaxy from its evolving stellar mass using stellar population synthesis models. We calculate the intrinsic scatter in the stellar mass-halo mass relation that naturally results from enforcing a monotonically increasing stellar mass along the merger history of each halo. The simulated galaxy images are drawn from cutouts of real galaxies from the Sloan Digital Sky Survey, with sizes and fluxes rescaled to match those of the model galaxies.
The distributions of galaxy luminosities, sizes, and surface brightnesses depend on the adjustable parameters in the models, and they agree well with observations for reasonable values of those parameters. Measured galaxy magnitudes and sizes have significant magnitude-dependent biases, with both being underestimated near the magnitude detection limit. The fraction of galaxies detected and fraction of light detected also depend sensitively on the details of the model.
Submitted 13 April, 2015; v1 submitted 6 January, 2015;
originally announced January 2015.
-
On the Nonlinear Evolution of Cosmic Web: Lagrangian Dynamics Revisited
Authors:
Xin Wang,
Alex Szalay
Abstract:
We investigate the nonlinear evolution of cosmic morphologies of the large-scale structure by examining the Lagrangian dynamics of various tensors of a cosmic fluid element, including the velocity gradient tensor, the Hessian matrix of the gravitational potential as well as the deformation tensor. Instead of the eigenvalue representation, the first two tensors, which are associated with the "kinematic" and "dynamical" cosmic web classification algorithms, respectively, are studied in a more convenient parameter space. These parameters are defined as the rotationally invariant coefficients of the characteristic equation of the tensor. In the nonlinear local model (NLM) where the magnetic part of the Weyl tensor vanishes, these invariants are fully capable of characterizing the dynamics. Unlike the Zeldovich approximation (ZA), where various morphologies do not change before approaching a one-dimensional singularity, the sheets in NLM are unstable for both overdense and underdense perturbations. While it has long been known that the coupling between the tidal tensor and velocity shear would cause a filamentary final configuration of a collapsing region, we show that underdense perturbations are more subtle, as the balance between the shear rate (tidal force) and the divergence (density) could lead to different morphologies. Interestingly, this instability also sets the basis for understanding some distinctions of the cosmic web identified dynamically and kinematically. We show that the sheets with negative density perturbation in the potential-based algorithm would turn to filaments faster than in the kinematic method, which could explain the distorted dynamical filamentary structure observed in the simulation.
Submitted 15 November, 2014;
originally announced November 2014.
-
Simulating tidal streams in a high resolution dark matter halo
Authors:
Wayne Ngan,
Brandon Bozek,
Raymond G. Carlberg,
Rosemary F. G. Wyse,
Alexander S. Szalay,
Piero Madau
Abstract:
We simulate tidal streams in the presence and absence of substructures inside the zero redshift snapshot of the Via Lactea II (VL-2) simulation. A halo finder is used to remove and isolate the subhalos found inside the high resolution dark matter halo of VL-2, and the potentials for both the main halo and all the subhalos are constructed individually using the self-consistent field (SCF) method. This allows us to make direct comparison of tidal streams between a smooth halo and a lumpy halo without assuming idealized profiles or triaxial fits. We simulate the kinematics of a star cluster starting with the same orbital position but two different velocities. Although these two orbits are only moderately eccentric and have similar apo- and pericentric distances, we find that the two streams have very different morphologies. We conclude that our model of the potential of VL-2 can provide insights about tidal streams that have not been explored by previous studies using idealized or axisymmetric models.
Submitted 20 April, 2015; v1 submitted 13 November, 2014;
originally announced November 2014.
-
Nonlinear Behavior of Baryon Acoustic Oscillations in Redshift Space from the Zel'dovich Approximation
Authors:
Nuala McCullagh,
Alexander S. Szalay
Abstract:
Baryon acoustic oscillations (BAO) are a powerful probe of the expansion history of the universe, which can tell us about the nature of dark energy. In order to accurately characterize the dark energy equation of state using BAO, we must understand the effects of both nonlinearities and redshift space distortions on the location and shape of the acoustic peak. In a previous paper we introduced a novel approach to 2nd order perturbation theory in configuration space using the Zel'dovich approximation, and presented a simple result for the first nonlinear term of the correlation function. In this paper, we extend this approach to redshift space. We show how to perform the computation, and present the analytic result for the first nonlinear term in the correlation function. Finally, we validate our result through comparison to numerical simulations.
Submitted 5 November, 2014;
originally announced November 2014.
-
Efficient classification of billions of points into complex geographic regions using hierarchical triangular mesh
Authors:
Dániel Kondor,
László Dobos,
István Csabai,
András Bodor,
Gábor Vattay,
Tamás Budavári,
Alexander S. Szalay
Abstract:
We present a case study about the spatial indexing and regional classification of billions of geographic coordinates from geo-tagged social network data using Hierarchical Triangular Mesh (HTM) implemented for Microsoft SQL Server. Due to the lack of certain features of the HTM library, we use it in conjunction with the GIS functions of SQL Server to significantly increase the efficiency of pre-filtering of spatial filter and join queries. For example, we implemented a new algorithm to compute the HTM tessellation of complex geographic regions and precomputed the intersections of HTM triangles and geographic regions for faster false-positive filtering. With full control over the index structure, HTM-based pre-filtering of simple containment searches outperforms SQL Server spatial indices by a factor of ten and HTM-based spatial joins run about a hundred times faster.
Submitted 2 October, 2014;
originally announced October 2014.
-
CANDELS/GOODS-S, CDFS, ECDFS: Photometric Redshifts For Normal and for X-Ray-Detected Galaxies
Authors:
Li-Ting Hsu,
Mara Salvato,
Kirpal Nandra,
Marcella Brusa,
Ralf Bender,
Johannes Buchner,
Jennifer L. Donley,
Dale D. Kocevski,
Yicheng Guo,
Nimish P. Hathi,
Cyprian Rangel,
S. P. Willner,
Murray Brightman,
Antonis Georgakakis,
Tamás Budavári,
Alexander S. Szalay,
Matthew L. N. Ashby,
Guillermo Barro,
Tomas Dahlen,
Sandra M. Faber,
Henry C. Ferguson,
Audrey Galametz,
Andrea Grazian,
Norman A. Grogin,
Kuang-Han Huang
, et al. (7 additional authors not shown)
Abstract:
We present photometric redshifts and associated probability distributions for all detected sources in the Extended Chandra Deep Field South (ECDFS). The work makes use of the most up-to-date data from the Cosmic Assembly Near-IR Deep Legacy Survey (CANDELS) and the Taiwan ECDFS Near-Infrared Survey (TENIS) in addition to other data. We also revisit multi-wavelength counterparts for published X-ray sources from the 4Ms-CDFS and 250ks-ECDFS surveys, finding reliable counterparts for 1207 out of 1259 sources ($\sim 96\%$). Data used for photometric redshifts include intermediate-band photometry deblended using the TFIT method, which is used for the first time in this work. Photometric redshifts for X-ray source counterparts are based on a new library of AGN/galaxy hybrid templates appropriate for the faint X-ray population in the CDFS. Photometric redshift accuracy for normal galaxies is 0.010 and for X-ray sources is 0.014, and outlier fractions are $4\%$ and $5.4\%$ respectively. The results within the CANDELS coverage area are even better as demonstrated both by spectroscopic comparison and by galaxy-pair statistics. Intermediate-band photometry, even if shallow, is valuable when combined with deep broad-band photometry. For best accuracy, templates must include emission lines.
Submitted 24 September, 2014;
originally announced September 2014.
-
The redshift-space galaxy two-point correlation function and baryon acoustic oscillations
Authors:
Donghui Jeong,
Liang Dai,
Marc Kamionkowski,
Alexander S. Szalay
Abstract:
Future galaxy surveys will measure baryon acoustic oscillations (BAOs) with high significance, and a complete understanding of the anisotropies of BAOs in redshift space will be important to exploit the cosmological information in BAOs. Here we describe the anisotropies that arise in the redshift-space galaxy two-point correlation function (2PCF) and elucidate the origin of features that arise in the dependence of the BAOs on the angle between the orientation of the galaxy pair and the line of sight. We do so with a derivation of the configuration-space 2PCF using the streaming model. We find that, contrary to common belief, the locations of BAO peaks in the redshift-space 2PCF are anisotropic even in the linear theory. Anisotropies in BAO depend strongly on the method of extracting the peak, showing a maximum angular variation of 3%. We also find that extracting the BAO peak of $r^2ξ(r,μ)$ significantly reduces the anisotropy to sub-percent level angular variation. When subtracting the tilt due to the broadband behavior of the 2PCF, the BAO bump is enhanced along the line of sight because of local infall velocities toward the BAO bump. Precise measurement of the angular dependence of the redshift-space 2PCF will allow new geometrical tests of dark energy beyond the BAO.
Submitted 20 August, 2014;
originally announced August 2014.
-
Hadoop in Low-Power Processors
Authors:
Da Zheng,
Alexander Szalay,
Andreas Terzis
Abstract:
In our previous work we introduced a so-called Amdahl blade microserver that combines a low-power Atom processor with a GPU and an SSD to provide a balanced and energy-efficient system. Our preliminary results suggested that the sequential I/O of Amdahl blades can be ten times higher than that of a cluster of conventional servers with comparable power consumption. In this paper we investigate the performance and energy efficiency of Amdahl blades running Hadoop. Our results show that Amdahl blades are 7.7 times and 3.4 times as energy-efficient as the Open Cloud Consortium cluster for a data-intensive and a compute-intensive application, respectively. The Hadoop Distributed Filesystem has relatively poor performance on Amdahl blades because both disk and network I/O are CPU-heavy operations on Atom processors. We demonstrate three effective techniques to reduce CPU consumption and improve performance. However, even with these improvements, the Atom processor is still the system's bottleneck. We revisit Amdahl's law, and estimate that Amdahl blades need four Atom cores to be well balanced for Hadoop tasks.
Submitted 10 August, 2014;
originally announced August 2014.
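The balance argument in the abstract above can be illustrated with Amdahl's rule of thumb that a balanced system performs roughly one bit of sequential I/O per second for every instruction per second. The following is a minimal sketch of that rule only, not the estimate made in the paper; the function names and the numbers in the usage note are hypothetical.

```python
import math


def amdahl_number(io_bytes_per_sec: float, instr_per_sec: float) -> float:
    """Bits of sequential I/O per second divided by instructions per second.

    A value near 1 indicates a balanced system under Amdahl's rule of thumb;
    a value above 1 means the CPU cannot keep up with the I/O rate.
    """
    return io_bytes_per_sec * 8 / instr_per_sec


def cores_for_balance(io_bytes_per_sec: float, instr_per_sec_per_core: float) -> int:
    """Smallest whole number of identical cores that brings the Amdahl number to <= 1.

    Assumes (hypothetically) that instruction throughput scales linearly with cores.
    """
    return math.ceil(io_bytes_per_sec * 8 / instr_per_sec_per_core)
```

For example, with an invented I/O rate of 300 MB/s and an invented per-core rate of 1.2e9 instructions per second, `amdahl_number(300e6, 1.2e9)` gives 2.0, so under this rule two such cores would be needed for balance.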
-
FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs
Authors:
Da Zheng,
Disa Mhembere,
Randal Burns,
Joshua Vogelstein,
Carey E. Priebe,
Alexander S. Szalay
Abstract:
Graph analysis performs many random reads and writes, thus, these workloads are typically performed in memory. Traditionally, analyzing large graphs requires a cluster of machines so the aggregate memory exceeds the graph size. We demonstrate that a multicore server can process graphs with billions of vertices and hundreds of billions of edges, utilizing commodity SSDs with minimal performance loss. We do so by implementing a graph-processing engine on top of a user-space SSD file system designed for high IOPS and extreme parallelism. Our semi-external memory graph engine called FlashGraph stores vertex state in memory and edge lists on SSDs. It hides latency by overlapping computation with I/O. To save I/O bandwidth, FlashGraph only accesses edge lists requested by applications from SSDs; to increase I/O throughput and reduce CPU overhead for I/O, it conservatively merges I/O requests. These designs maximize performance for applications with different I/O characteristics. FlashGraph exposes a general and flexible vertex-centric programming interface that can express a wide variety of graph algorithms and their optimizations. We demonstrate that FlashGraph in semi-external memory performs many algorithms with performance up to 80% of its in-memory implementation and significantly outperforms PowerGraph, a popular distributed in-memory graph engine.
Submitted 25 January, 2015; v1 submitted 3 August, 2014;
originally announced August 2014.
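The request-merging idea mentioned in the abstract above can be sketched as follows. This is a minimal illustration, assuming that "conservative" merging means two reads are combined only when they touch the same or adjacent pages; the page size and the function name are hypothetical and are not FlashGraph's API.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)


def merge_requests(requests, page_size=PAGE_SIZE):
    """Merge (offset, length) byte-range reads whose pages overlap or are adjacent.

    Returns a list of inclusive (first_page, last_page) ranges sorted by first page.
    """
    # Convert each byte range to the inclusive range of pages it touches.
    pages = sorted(
        (off // page_size, (off + length - 1) // page_size)
        for off, length in requests
        if length > 0
    )
    merged = []
    for first, last in pages:
        if merged and first <= merged[-1][1] + 1:
            # Same or adjacent page: extend the previous request.
            merged[-1][1] = max(merged[-1][1], last)
        else:
            # A gap of at least one page: start a new request.
            merged.append([first, last])
    return [tuple(r) for r in merged]
```

Requests separated by one or more untouched pages stay separate, so no page that was not requested is read; this is the sense in which the sketch is conservative.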
-
Efficient Catalog Matching with Dropout Detection
Authors:
Dongwei Fan,
Tamás Budavári,
Alexander S. Szalay,
Chenzhou Cui,
Yongheng Zhao
Abstract:
Source catalogs are not the only product of astronomy observations: their sky coverage is always carefully recorded and used in statistical analyses, such as correlation and luminosity function studies. Here we present a novel method for catalog matching, which inherently builds on the coverage information for better performance and completeness. A modified version of the Zones Algorithm is introduced for matching partially overlapping observations, where irrelevant parts of the data are excluded up front for efficiency. Our design enables searches to focus on specific areas on the sky to further speed up the process. Another important advantage of the new method over traditional techniques is its ability to quickly detect dropouts, i.e., the missing components that are in the observed regions of the celestial sphere but did not reach the detection limit in some observations. These often provide invaluable insight into the spectral energy distribution of the matched sources but are rarely available in traditional associations.
Submitted 18 March, 2014;
originally announced March 2014.
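The combination of zone-based matching and dropout detection described in the abstract above can be sketched as follows. This is a simplified illustration of a generic zones-style match, not the authors' modified algorithm: the coverage of the second catalog is assumed to be given as a hypothetical callable, and for brevity whole declination zones are scanned instead of restricting right ascension within each zone.

```python
import math
from collections import defaultdict


def zone_id(dec, height):
    """Index of the declination zone (strips of `height` degrees) containing dec."""
    return int(math.floor((dec + 90.0) / height))


def separation(ra1, dec1, ra2, dec2):
    """Angular separation in degrees, using the haversine formula."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def match_with_dropouts(cat_a, cat_b, coverage_b, radius, height=0.1):
    """Match cat_a against cat_b; both are lists of (ra, dec) in degrees.

    coverage_b(ra, dec) -> bool says whether a position lies in cat_b's footprint.
    Returns (matches, dropouts): index pairs (i, j), and indices of cat_a sources
    that lie inside cat_b's footprint but have no counterpart within `radius`.
    """
    # Bucket catalog B by declination zone.
    zones = defaultdict(list)
    for j, (_, dec) in enumerate(cat_b):
        zones[zone_id(dec, height)].append(j)

    matches, dropouts = [], []
    for i, (ra, dec) in enumerate(cat_a):
        if not coverage_b(ra, dec):
            continue  # outside B's footprint: excluded up front
        found = False
        lo, hi = zone_id(dec - radius, height), zone_id(dec + radius, height)
        for z in range(lo, hi + 1):
            for j in zones.get(z, ()):
                if separation(ra, dec, *cat_b[j]) <= radius:
                    matches.append((i, j))
                    found = True
        if not found:
            dropouts.append(i)  # observed by B but not detected there
    return matches, dropouts
```

A source of the first catalog that falls outside the second catalog's footprint is neither a match nor a dropout, which is why the coverage test comes before any distance computation.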
-
Planning the Future of U.S. Particle Physics (Snowmass 2013): Chapter 9: Computing
Authors:
L. A. T. Bauerdick,
S. Gottlieb,
G. Bell,
K. Bloom,
T. Blum,
D. Brown,
M. Butler,
A. Connolly,
E. Cormier,
P. Elmer,
M. Ernst,
I. Fisk,
G. Fuller,
R. Gerber,
S. Habib,
M. Hildreth,
S. Hoeche,
D. Holmgren,
C. Joshi,
A. Mezzacappa,
R. Mount,
R. Pordes,
B. Rebel,
L. Reina,
M. C. Sanchez
, et al. (6 additional authors not shown)
Abstract:
These reports present the results of the 2013 Community Summer Study of the APS Division of Particles and Fields ("Snowmass 2013") on the future program of particle physics in the U.S. Chapter 9, on Computing, discusses the computing challenges for future experiments in the Energy, Intensity, and Cosmic Frontiers, for accelerator science, and for particle theory, as well as structural issues in supporting the intense uses of computing required in all areas of particle physics.
Submitted 23 January, 2014;
originally announced January 2014.
-
Objective Identification of Informative Wavelength Regions in Galaxy Spectra
Authors:
Ching-Wa Yip,
Michael Mahoney,
Alex Szalay,
Istvan Csabai,
Tamas Budavari,
Rosemary Wyse,
Laszlo Dobos
Abstract:
Understanding the diversity in spectra is the key to determining the physical parameters of galaxies. The optical spectra of galaxies are highly convoluted with continuum and lines which are potentially sensitive to different physical parameters. Defining the wavelength regions of interest is therefore an important question. In this work, we identify informative wavelength regions in a single-burst stellar populations model by using the CUR Matrix Decomposition. Simulating the Lick/IDS spectrograph configuration, we recover the widely used Dn(4000), Hbeta, and HdeltaA to be most informative. Simulating the SDSS spectrograph configuration with a wavelength range 3450-8350 Angstrom and a model-limited spectral resolution of 3 Angstrom, the most informative regions are: first region-the 4000 Angstrom break and the Hdelta line; second region-the Fe-like indices; third region-the Hbeta line; fourth region-the G band and the Hgamma line. A Principal Component Analysis on the first region shows that the first eigenspectrum tells primarily the stellar age, the second eigenspectrum is related to the age-metallicity degeneracy, and the third eigenspectrum shows an anti-correlation between the strengths of the Balmer and the Ca K and H absorptions. The regions can be used to determine the stellar age and metallicity in early-type galaxies which have solar abundance ratios, no dust, and a single-burst star formation history. The region identification method can be applied to any set of spectra of the user's interest, so that we eliminate the need for a common, fixed-resolution index system. We discuss future directions in extending the current analysis to late-type galaxies.
Submitted 1 February, 2014; v1 submitted 2 December, 2013;
originally announced December 2013.
-
The Dark Matter Contribution to Galactic Diffuse Gamma Ray Emission
Authors:
Lin F. Yang,
Joseph Silk,
Alexander S. Szalay,
Rosemary F. G. Wyse,
Brandon Bozek,
Piero Madau
Abstract:
Observations of diffuse Galactic gamma ray emission (DGE) by the Fermi Large Area Telescope (LAT) allow a detailed study of cosmic rays and the interstellar medium. However, diffuse emission models of the inner Galaxy underpredict the Fermi-LAT data at energies above a few GeV and hint at possible non-astrophysical sources including dark matter (DM) annihilations or decays. We present a study of the possible emission components from DM using the high-resolution Via Lactea II N-body simulation of a Milky Way-sized DM halo. We generate full-sky maps of DM annihilation and decay signals that include modeling of the adiabatic contraction of the host density profile, Sommerfeld enhanced DM annihilations, $p$-wave annihilations, and decaying DM. We compare our results with the DGE models produced by the Fermi-LAT team over different sky regions, including the Galactic center, high Galactic latitudes, and the Galactic anti-center. This work provides possible templates to fit the observational data that includes the contribution of the subhalo population to DM gamma-ray emission, with the significance depending on the annihilation/decay channels and the Galactic regions being considered.
Submitted 2 April, 2014; v1 submitted 29 November, 2013;
originally announced December 2013.
-
Snowmass Computing Frontier: Computing for the Cosmic Frontier, Astrophysics, and Cosmology
Authors:
Andrew Connolly,
Salman Habib,
Alex Szalay,
Julian Borrill,
George Fuller,
Nick Gnedin,
Katrin Heitmann,
Danny Jacobs,
Don Lamb,
Tony Mezzacappa,
Bronson Messer,
Steve Myers,
Brian Nord,
Peter Nugent,
Brian O'Shea,
Paul Ricker,
Michael Schneider
Abstract:
This document presents (off-line) computing requirements and challenges for Cosmic Frontier science, covering the areas of data management, analysis, and simulations. We invite contributions to extend the range of covered topics and to enhance the current descriptions.
Submitted 12 November, 2013;
originally announced November 2013.