Spectro-Photometry and Radial Distribution of Multiple Stellar Populations in Globular Clusters from Gaia XP Spectra

Authors: V. J. Mehta, A. P. Milone, L. Casagrande, A. F. Marino, M. V. Legnardi, G. Cordoni, E. Dondoglio, S. Jang, T. Ziliotto, M. Barbieri, M. Bernizzoni, E. Bortolan, A. Bouras Moreno Sanchez, E. P. Lagioia, S. Lionetto, A. Mohandasan, F. Muratore

Abstract: Understanding the formation of multiple populations in globular clusters (GCs) represents a challenge for stellar population studies. Nevertheless, the outermost GC regions, which may retain information about the initial configuration of the multiple populations, are poorly studied. We use synthetic spectra that account for the chemical compositions of first- and second-population (1P, 2P) stars of 47 Tucanae to identify the spectral regions that are sensitive to its multiple populations. We then defined new photometric bands that efficiently disentangle 1P and 2P giant stars in Gaia XP spectra. To test these new filters, we constructed the pseudo two-color diagrams dubbed chromosome maps (ChMs) and identified, for the first time, 1P and 2P stars in the outermost regions of 47 Tucanae and outside its tidal radius. We constructed similar diagrams for NGC 3201, NGC 6121, NGC 6752, and NGC 6397, thus exploring GCs with different metallicities. The ChMs allowed us to clearly disentangle 1P and 2P stars in the external regions of all clusters, with the exception of the metal-poor NGC 6397. Our findings, together with literature results from more internal regions, show that the 2P stars of 47 Tucanae and NGC 3201 are more centrally concentrated than the 1P stars, whereas the multiple populations of NGC 6121 and NGC 6752 share the same radial distribution. These radial behaviors are consistent with GC formation scenarios in which 2P stars originate in the central regions. Notably, the results on NGC 3201 are in tension with the conclusion of recent work that its 1P is more centrally concentrated than the 2P and might have formed with a higher central concentration.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 9 pages, 7 figures, submitted

arXiv:2405.20304 [pdf, other]

Group Robust Preference Optimization in Reward-free RLHF

Authors: Shyam Sundhar Ramesh, Yifan Hu, Iason Chaimalas, Viraj Mehta, Pier Giuseppe Sessa, Haitham Bou Ammar, Ilija Bogunovic

Abstract: Adapting large language models (LLMs) for specific tasks usually involves fine-tuning through reinforcement learning with human feedback (RLHF) on preference data. While these data often come from diverse groups of labelers (e.g., different demographics, ethnicities, company teams, etc.), traditional RLHF methods adopt a "one-size-fits-all" approach, i.e., they indiscriminately assume and optimize a single preference model, and are thus not robust to the unique characteristics and needs of the various groups. To address this limitation, we propose a novel Group Robust Preference Optimization (GRPO) method to align LLMs to individual groups' preferences robustly. Our approach builds upon reward-free direct preference optimization methods, but unlike previous approaches, it seeks a robust policy that maximizes the worst-case group performance. To achieve this, GRPO adaptively and sequentially weights the importance of different groups, prioritizing groups with worse cumulative loss. We theoretically study the feasibility of GRPO and analyze its convergence for the log-linear policy class. By fine-tuning LLMs with GRPO using diverse group-based global opinion data, we significantly improved performance for the worst-performing groups, reduced loss imbalances across groups, and improved probability accuracies compared to non-robust baselines.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Preprint
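The adaptive group weighting described in the GRPO abstract can be illustrated with a multiplicative-weights (exponentiated-gradient) update of the kind used in group distributionally robust optimization (group DRO). This is a minimal sketch under that assumption, not the authors' implementation: the step size `eta`, the function names, and the loss values are hypothetical, and the paper's exact update rule may differ. Each step multiplies a group's weight by `exp(eta * loss)` and renormalizes, so starting from uniform weights the result is proportional to `exp(eta * cumulative loss)`, and the worst-off groups dominate the weighted objective.

```python
import math
from typing import List, Sequence


def update_group_weights(weights: Sequence[float],
                         group_losses: Sequence[float],
                         eta: float) -> List[float]:
    """One exponentiated-gradient step on the probability simplex.

    Groups with a larger current loss receive a larger multiplicative boost,
    so repeated calls favour groups with worse cumulative loss.
    """
    boosted = [w * math.exp(eta * loss) for w, loss in zip(weights, group_losses)]
    total = sum(boosted)
    return [b / total for b in boosted]


def weighted_loss(weights: Sequence[float], group_losses: Sequence[float]) -> float:
    """Objective for the policy step: group losses weighted by group weights."""
    return sum(w * loss for w, loss in zip(weights, group_losses))


if __name__ == "__main__":
    # Hypothetical per-group preference losses over three rounds.
    rounds = [[0.2, 0.5, 0.9], [0.25, 0.45, 0.8], [0.2, 0.4, 0.85]]
    weights = [1 / 3] * 3  # start from uniform group weights
    for losses in rounds:
        weights = update_group_weights(weights, losses, eta=1.0)
        print([round(w, 3) for w in weights], round(weighted_loss(weights, losses), 3))
```

In a full training loop the policy parameters would also be updated against `weighted_loss` between weight updates; that step is omitted here to keep the weighting logic visible.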

arXiv:2405.13491 [pdf, other]

Euclid. I. Overview of the Euclid mission

Authors: Euclid Collaboration, Y. Mellier, Abdurro'uf, J. A. Acevedo Barroso, A. Achúcarro, J. Adamek, R. Adam, G. E. Addison, N. Aghanim, M. Aguena, V. Ajani, Y. Akrami, A. Al-Bahlawan, A. Alavi, I. S. Albuquerque, G. Alestas, G. Alguero, A. Allaoui, S. W. Allen, V. Allevato, A. V. Alonso-Tetilla, B. Altieri, A. Alvarez-Candal, A. Amara, L. Amendola, et al. (1086 additional authors not shown)

Abstract: The current standard model of cosmology successfully describes a variety of measurements, but the nature of its main ingredients, dark matter and dark energy, remains unknown. Euclid is a medium-class mission in the Cosmic Vision 2015-2025 programme of the European Space Agency (ESA) that will provide high-resolution optical imaging, as well as near-infrared imaging and spectroscopy, over about 14,000 deg^2 of extragalactic sky. Its primary cosmological probes are accurate weak lensing and clustering measurements that trace structure formation over half of the age of the Universe; beyond these, the exquisite data will enable a wide range of science. This paper provides a high-level overview of the mission, summarising the survey characteristics, the various data-processing steps, and data products. We also highlight the main science objectives and expected performance.

Submitted 22 May, 2024; originally announced May 2024.

Comments: Paper submitted as part of the A&A special issue 'Euclid on Sky'

arXiv:2405.10908 [pdf, other]

UVCANDELS: The role of dust on the stellar mass-size relation of disk galaxies at 0.5 $\leq z \leq$ 3.0

Authors: Kalina V. Nedkova, Marc Rafelski, Harry I. Teplitz, Vihang Mehta, Laura DeGroot, Swara Ravindranath, Anahita Alavi, Alexander Beckett, Norman A. Grogin, Boris Häußler, Anton M. Koekemoer, Grecco A. Oyarzún, Laura Prichard, Mitchell Revalski, Gregory F. Snyder, Ben Sunnquist, Xin Wang, Rogier A. Windhorst, Nima Chartab, Christopher J. Conselice, Yicheng Guo, Nimish Hathi, Matthew J. Hayes, Zhiyuan Ji, Keunho J. Kim, et al. (8 additional authors not shown)

Abstract: We use the Ultraviolet Imaging of the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey fields (UVCANDELS) to measure half-light radii in the rest-frame far-UV for $\sim$16,000 disk-like galaxies over $0.5\leq z \leq 3$. We compare these results to rest-frame optical sizes that we measure in a self-consistent way and find that the stellar mass-size relation of disk galaxies is steeper in the rest-frame UV than in the optical across our entire redshift range. We show that this is mainly driven by massive galaxies ($\gtrsim10^{10}$M$_\odot$), which we also find to be among the dustiest. Our results are consistent with the literature, where such trends have commonly been interpreted as evidence of inside-out growth, wherein galaxies form their central structures first. However, they could also suggest that the centers of massive galaxies are more heavily attenuated than their outskirts. We distinguish between these scenarios by modeling and selecting galaxies at $z=2$ from the VELA simulation suite in a way that is consistent with UVCANDELS. We show that the effects of dust alone can account for the size differences we measure at $z=2$. This indicates that size differences between wavelengths, and the different slopes of the stellar mass-size relation, do not constitute evidence for inside-out growth.

Submitted 28 June, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: Accepted for publication in ApJ. 22 pages, 12 figures, and 4 tables

arXiv:2404.12241 [pdf, other]

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Authors: Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, et al. (75 additional authors not shown)

Abstract: This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English) and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items (i.e., prompts), with 43,090 test items in total, created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform and downloadable tool, called ModelBench, that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report that benchmarks the performance of over a dozen openly available chat-tuned language models; and (7) a test specification for the benchmark.

Submitted 13 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.12033 [pdf, other]

Quantum Optical Approach to the $K$ Nearest Neighbour Algorithm

Authors: Vivek Mehta, Francesco Petruccione, Utpal Roy

Abstract: We construct a hybrid quantum-classical approach for the $K$-Nearest Neighbour algorithm, where the information is embedded in a phase-distributed multimode coherent state with the assistance of a single photon. The task of finding the closeness between the data points is carried out by the quantum optical computer, while the sorting and class assignment are performed by a classical computer. We provide the quantum optical architecture corresponding to our algorithm. The subordinate optical network is validated by numerical simulation. We also optimize the computational resources of the algorithm in terms of space, energy requirements, and gate complexity. Applications are presented for diverse, well-known public benchmarks and synthesized data sets.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 18 pages, 6 figures
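The division of labour in the hybrid $K$-nearest-neighbour scheme (closeness from the optical hardware, sorting and voting on a classical computer) can be sketched entirely classically. This sketch assumes, for illustration only, that each data point is encoded as a vector of real coherent-state amplitudes, so the closeness measure is the multimode coherent-state overlap $|\langle α|β\rangle|^2 = \exp(-\lVert α - β\rVert^2)$; the paper's phase-distributed, single-photon-assisted encoding is not reproduced, and the function names `closeness` and `knn_classify` are hypothetical. `closeness` stands in for the quantity the optical computer would return.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def closeness(x: Sequence[float], y: Sequence[float]) -> float:
    """Stand-in for the optical step: overlap of two multimode coherent states
    with real amplitudes x and y, |<x|y>|^2 = exp(-||x - y||^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_classify(query: Sequence[float],
                 data: List[Sequence[float]],
                 labels: List[Hashable],
                 k: int) -> Hashable:
    """Classical step: rank training points by closeness, then majority vote."""
    scores = [closeness(query, point) for point in data]
    nearest = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
    labels = ["a", "a", "b", "b"]
    print(knn_classify((0.05, 0.0), data, labels, k=3))  # a
```

Because the overlap decreases monotonically with Euclidean distance, ranking by overlap gives the same neighbours as ranking by distance; only the source of the scores changes between the classical and hybrid versions.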

arXiv:2404.09015 [pdf, other]

On the universal validity of Case B recombination theory

Authors: C. Scarlata, M. Hayes, N. Panagia, V. Mehta, F. Haardt, M. Bagley

Abstract: In an ongoing search for low-mass extreme emission line galaxies, we identified a galaxy with an H$α$/H$β$ Balmer line ratio of 2.620 ± 0.078. H$α$/H$β$ Balmer ratios lower than the dust-free Case B value appear relatively frequently in extreme emission line galaxies. These low values suggest that the Case B assumption may not be valid in these objects. After ruling out the possibility that the low H$α$/H$β$ ratio is due to systematic errors introduced by observational effects, we use constraints from the total H$β$ luminosity, the [OIII]/[OII] line ratio, and the Balmer line equivalent widths to suggest that the gas is optically thick to both H$α$ and Ly$α$ photons, and that the geometry and orientation of the scattering gas cause H$α$ photons to be preferentially removed from the line of sight relative to higher-order Balmer series photons. Finally, we use data from the SDSS survey to show that Balmer self-absorption may be more important than previously assumed in high-excitation emission line galaxies, where Ly$α$ pumping of the hydrogen excited state can be effective. If not recognized, Balmer self-absorption could lead to inaccurate estimates of galaxy physical properties. As an example, the effect of dust extinction could be overestimated for a spherically symmetric scattering medium, or underestimated for a non-spherically-symmetric distribution.

Submitted 29 April, 2024; v1 submitted 13 April, 2024; originally announced April 2024.
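Why a low H$α$/H$β$ ratio is problematic can be seen from the standard Balmer-decrement estimate of the nebular color excess, $E(B-V) = \frac{2.5}{k(\mathrm{H}β) - k(\mathrm{H}α)} \log_{10}\left[\frac{(\mathrm{H}α/\mathrm{H}β)_\mathrm{obs}}{(\mathrm{H}α/\mathrm{H}β)_\mathrm{int}}\right]$. The sketch below assumes an intrinsic Case B ratio of 2.86 (appropriate for $T \approx 10^4$ K and $n_e \approx 100$ cm$^{-3}$) and the Calzetti et al. (2000) attenuation-curve values $k(\mathrm{H}β) = 4.60$ and $k(\mathrm{H}α) = 3.33$; both are illustrative assumptions, not values taken from the paper. Any observed ratio below the assumed Case B value yields a negative, unphysical color excess.

```python
import math

# Assumed inputs (not from the paper): Case B intrinsic ratio at T ~ 1e4 K,
# n_e ~ 100 cm^-3, and Calzetti et al. (2000) attenuation-curve values.
CASE_B_RATIO = 2.86
K_HBETA = 4.60
K_HALPHA = 3.33


def balmer_color_excess(observed_ratio: float,
                        intrinsic_ratio: float = CASE_B_RATIO,
                        k_hbeta: float = K_HBETA,
                        k_halpha: float = K_HALPHA) -> float:
    """Nebular E(B-V) from the Balmer decrement; negative whenever the observed
    Halpha/Hbeta ratio falls below the assumed intrinsic value."""
    return 2.5 / (k_hbeta - k_halpha) * math.log10(observed_ratio / intrinsic_ratio)


if __name__ == "__main__":
    print(round(balmer_color_excess(2.620), 3))  # -0.075
```

With these assumptions the quoted ratio of 2.620 gives $E(B-V) \approx -0.075$ mag, and the result stays negative at the upper end of its 1σ range (2.698), which is consistent with the abstract's point that such values are hard to reconcile with dust-free Case B.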

arXiv:2404.04762 [pdf, other]

WFC3 Infrared Spectroscopic Parallel (WISP) Survey: Photometric and Emission Line Data Release

Authors: A. J. Battisti, M. B. Bagley, M. Rafelski, I. Baronchelli, Y. S. Dai, A. L. Henry, H. Atek, J. Colbert, M. A. Malkan, P. J. McCarthy, C. Scarlata, B. Siana, H. I. Teplitz, A. Alavi, K. Boyett, A. J. Bunker, J. P. Gardner, N. P. Hathi, D. Masters, V. Mehta, M. Rutkowski, K. Shahinyan, B. Sunnquist, X. Wang

Abstract: We present reduced images and catalogues of photometric and emission line data ($\sim$230,000 and $\sim$8,000 sources, respectively) for the WFC3 Infrared Spectroscopic Parallel (WISP) Survey. These data are made publicly available on the Mikulski Archive for Space Telescopes (MAST) and include reduced images from various facilities: ground-based $ugri$, HST WFC3, and Spitzer IRAC (Infrared Array Camera). Coverage in at least one additional filter beyond the WFC3/IR data is available for roughly half of the fields (227 out of 483), with $\sim$20% (86) having coverage in six or more filters from $u$-band to IRAC 3.6$μ$m (0.35-3.6$μ$m). For the lower spatial resolution (and shallower) ground-based and IRAC data, we perform PSF-matched, prior-based, deconfusion photometry (i.e., forced photometry) using the TPHOT software to optimally extract measurements or upper limits. We present the methodology and software used for the WISP emission line detection and visual inspection. The detection step adopts a continuous wavelet transformation that significantly reduces the number of spurious sources passed as candidates to the visual inspection stage. We combine both WISP catalogues and perform SED fitting on galaxies with reliable spectroscopic redshifts and multi-band photometry to measure their stellar masses. We stack WISP spectra as functions of stellar mass and redshift and measure average emission line fluxes and ratios. We find that WISP emission line sources are typically 'normal' star-forming galaxies based on the Mass-Excitation diagram ([OIII]/H$β$ vs. $M_\star$; $0.74<z_\mathrm{grism}<2.31$), the galaxy main sequence (SFR vs. $M_\star$; $0.30<z_\mathrm{grism}<1.45$), $S_{32}$ ratio vs. $M_\star$ ($0.30<z_\mathrm{grism}<0.73$), and $O_{32}$ and $R_{23}$ ratios vs. $M_\star$ ($1.27<z_\mathrm{grism}<1.45$).

Submitted 6 April, 2024; originally announced April 2024.

Comments: 36 pages, 21 figures, 17 tables. Accepted for publication in MNRAS. The WISP Photometric and Emission Line catalogues and reduced images are in the process of being added as HLSPs to the WISP MAST website (https://archive.stsci.edu/prepds/wisp/). Please email the first author (address provided in the paper) to request access to files prior to the MAST release

arXiv:2404.04330 [pdf, other]

Hydrodynamical simulations favor a pure deflagration origin of the near-Chandrasekhar mass supernova remnant 3C 397

Authors: Vrutant Mehta, Jack Sullivan, Robert Fisher, Yuken Ohshiro, Hiroya Yamaguchi, Khanak Bhargava, Sudarshan Neopane

Abstract: Suzaku X-ray observations of the Type Ia supernova remnant (SNR) 3C 397 discovered exceptionally high mass ratios of Mn/Fe, Ni/Fe, and Cr/Fe, consistent with a near-$M_{\rm Ch}$ progenitor white dwarf (WD). The Suzaku observations have established 3C 397 as our best candidate for a near-$M_{\rm Ch}$ SNR Ia, and opened the way to address additional outstanding questions about the origin and explosion mechanism of these transients. In particular, subsequent XMM-Newton observations revealed an unusually clumpy distribution of iron-group element (IGE) abundances within the ejecta of 3C 397. In this paper, we run a suite of two-dimensional hydrodynamical models, varying the explosion mechanism (either deflagration-to-detonation transition (DDT) or pure deflagration), the WD progenitor, and the WD progenitor metallicity, and analyze their detailed nucleosynthetic abundances and associated clumping. We find that pure deflagrations naturally give rise to clumpy distributions of neutronized species concentrated towards the outer limb of the remnant, and confirm that DDTs have smoothly structured ejecta with a central concentration of neutronization. Our findings indicate that 3C 397 was most likely a pure deflagration of a WD with a high central density. We discuss a range of implications of these findings for the broader SN Ia progenitor problem.

Submitted 5 April, 2024; originally announced April 2024.

Comments: 13 pages, 6 figures. Comments are welcome. Datasets of the stable mean nucleosynthetic yields are available at https://doi.org/10.5281/zenodo.10927265

arXiv:2404.00399 [pdf, other]

Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order

Authors: Taishi Nakamura, Mayank Mishra, Simone Tedeschi, Yekun Chai, Jason T Stillerman, Felix Friedrich, Prateek Yadav, Tanmay Laud, Vu Minh Chien, Terry Yue Zhuo, Diganta Misra, Ben Bogin, Xuan-Son Vu, Marzena Karpinska, Arnav Varma Dantuluri, Wojciech Kusa, Tommaso Furlanello, Rio Yokota, Niklas Muennighoff, Suhas Pai, Tosin Adewumi, Veronika Laippala, Xiaozhe Yao, Adalberto Junior, Alpay Ariyak, et al. (20 additional authors not shown)

Abstract: Pretrained language models underpin several AI applications, but their high computational cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to democratize access to pretrained models for collaborative community development. However, such existing models face challenges: limited multilingual capabilities, catastrophic forgetting during continual pretraining (while pretraining from scratch is computationally expensive), and compliance with AI safety and development laws. This paper presents Aurora-M, a 15B parameter multilingual open-source model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code. Continually pretrained from StarCoderPlus on 435 billion additional tokens, Aurora-M surpasses 2 trillion tokens in total training token count. It is the first open-source multilingual model fine-tuned on human-reviewed safety instructions, thus aligning its development not only with conventional red-teaming considerations, but also with the specific concerns articulated in the Biden-Harris Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Aurora-M is rigorously evaluated across various tasks and languages, demonstrating robustness against catastrophic forgetting and outperforming alternatives in multilingual settings, particularly in safety evaluations. To promote responsible open-source LLM development, Aurora-M and its variants are released at https://huggingface.co/collections/aurora-m/aurora-m-models-65fdfdff62471e09812f5407

Submitted 23 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: Preprint

arXiv:2403.11560 [pdf, other]

Variable Hyperparameterized Gaussian Kernel using Displaced Squeezed Vacuum State

Authors: Vivek Mehta, Utpal Roy

Abstract: There are schemes for realizing different types of kernels with quantum states of light. Realizing the Gaussian kernel is particularly interesting because of its wide applicability. A multimode coherent state can generate the Gaussian kernel with a constant value of the hyperparameter. This constant hyperparameter has limited the application of the Gaussian kernel to complex learning problems. We realize the variable hyperparameterized Gaussian kernel with a multimode displaced squeezed vacuum state. The learning capacity of this kernel is tested with support vector machines on some synthesized data sets as well as public benchmark data sets. We establish that the proposed variable hyperparameterized Gaussian kernel offers better accuracy than the constant Gaussian kernel.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 6 pages and 4 figures
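The distinction between a constant and a variable hyperparameter can be made concrete with the classical Gaussian kernel $k(\mathbf{x}, \mathbf{y}) = \exp(-γ\lVert\mathbf{x} - \mathbf{y}\rVert^2)$. If data are encoded directly as coherent-state amplitudes (an assumption for illustration), the overlap $|\langle α|β\rangle|^2 = \exp(-\lVert α - β\rVert^2)$ is a Gaussian kernel with $γ$ fixed at 1, which is the kind of limitation the abstract describes. The sketch below only computes Gram matrices for several $γ$ values classically; it does not model the displaced squeezed vacuum realization, and no mapping from squeezing to $γ$ is taken from the paper. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def gram_matrix(points: List[Sequence[float]], gamma: float) -> List[List[float]]:
    """Symmetric kernel matrix K[i][j] = k(points[i], points[j])."""
    return [[gaussian_kernel(p, q, gamma) for q in points] for p in points]


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.5, 0.0), (2.0, 0.0)]
    # gamma = 1.0 mimics the fixed coherent-state kernel under the assumed
    # encoding; the other values show what a tunable hyperparameter allows.
    for gamma in (0.1, 1.0, 10.0):
        K = gram_matrix(points, gamma)
        print(gamma, [round(v, 3) for v in K[0]])
```

A small $γ$ makes distant points look similar (a smoother decision boundary in a support vector machine), while a large $γ$ pushes the Gram matrix towards the identity; being able to choose $γ$ per problem is what a variable kernel adds.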

arXiv:2401.03094 [pdf, other]

Lyman Continuum Emission from AGN at 2.3$\lesssim$z$\lesssim$3.7 in the UVCANDELS Fields

Authors: Brent M. Smith, Rogier A. Windhorst, Harry Teplitz, Matthew Hayes, Marc Rafelski, Mark Dickinson, Vihang Mehta, Nimish P. Hathi, John MacKenty, L. Y. Aaron Yung, Anton M. Koekemoer, Emmaris Soto, Christopher J. Conselice, Ray A. Lucas, Xin Wang, Keunho J. Kim, Anahita Alavi, Norman A. Grogin, Ben Sunnquist, Laura Prichard, Rolf A. Jansen, the UVCANDELS team

Abstract: We present the results of our search for Lyman continuum (LyC) emitting AGN at redshifts 2.3$\lesssim$z$\lesssim$4.9 from HST WFC3 F275W observations in the UVCANDELS fields. We also include LyC emission from AGN found in HST WFC3 F225W, F275W, and F336W imaging from the ERS and HDUV data. We performed exhaustive queries of the Vizier database to locate AGN with high-quality spectroscopic redshifts. In total, we found 51 AGN that met our criteria within the UVCANDELS and ERS footprints. Of these 51, 12 AGN have LyC flux detected at $\geq$4$σ$ in the WFC3/UVIS images. Using space- and ground-based data from X-ray to radio, we fit a CIGALE SED model to the multi-wavelength photometry of each AGN and correlate various SED parameters with the LyC flux. KS tests of the SED parameter distributions for the LyC-detected and non-detected AGN showed that they are likely not distinct samples. However, we find that X-ray luminosity, star-formation onset age, and disk luminosity correlate strongly with the emitted LyC flux. We also find strong correlations of the LyC flux with several dust parameters (polar and toroidal dust emission, 6 $μ$m luminosity) and anti-correlations with metallicity and $A_{FUV}$. We simulate the LyC escape fraction ($f_{esc}$) using the CIGALE and IGM transmission models for the LyC-detected AGN and find an uncertainty-weighted average $f_{esc}$$\simeq$18%. We stack the LyC flux of subsamples of AGN according to the wavelength continuum region in which they are detected and find no significant differences in their LyC emission, although our sub-mm-detected F336W sample shows the brightest stacked LyC flux. These findings indicate that LyC production and escape in AGN are more complicated than the simple assumption of thermal emission and a 100% escape fraction. Further testing of AGN models with larger samples than presented here is needed.

Submitted 5 January, 2024; originally announced January 2024.

Comments: 21 pages, 6 figures, 3 tables. Accepted for publication in The Astrophysical Journal
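An average "weighted by uncertainties", as quoted for the escape fraction above, is commonly computed as an inverse-variance weighted mean, $\bar{f} = \sum_i w_i f_i / \sum_i w_i$ with $w_i = 1/σ_i^2$ and uncertainty $(\sum_i w_i)^{-1/2}$. The abstract does not state its weighting scheme, so this is an assumption; the sketch uses placeholder inputs and does not reproduce the paper's measurements. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_mean(values: Sequence[float],
                          sigmas: Sequence[float]) -> Tuple[float, float]:
    """Return (weighted mean, its 1-sigma uncertainty) using weights 1/sigma^2."""
    if len(values) != len(sigmas) or not values:
        raise ValueError("values and sigmas must be non-empty and equal in length")
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)


if __name__ == "__main__":
    # Placeholder escape fractions and errors, for illustration only.
    mean, err = inverse_variance_mean([0.10, 0.30], [0.05, 0.10])
    print(round(mean, 3), round(err, 3))  # 0.14 0.045
```

Measurements with small uncertainties dominate the result: here the first value carries four times the weight of the second, pulling the mean to 0.14 rather than the unweighted 0.20.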

arXiv:2312.03503 [pdf, other]

Transfer learning for galaxy feature detection: Finding Giant Star-forming Clumps in low redshift galaxies using Faster R-CNN

Authors: Jürgen Popp, Hugh Dickinson, Stephen Serjeant, Mike Walmsley, Dominic Adams, Lucy Fortson, Kameswara Mantha, Vihang Mehta, James M. Dawson, Sandor Kruk, Brooke Simmons

Abstract: Giant Star-forming Clumps (GSFCs) are areas of intensive star formation that are commonly observed in high-redshift (z>1) galaxies, but their formation and role in galaxy evolution remain unclear. High-resolution observations of low-redshift clumpy galaxy analogues are rare and restricted to a limited set of galaxies, but the increasing availability of wide-field galaxy survey data makes the detection of large clumpy galaxy samples increasingly feasible. Deep learning (DL), and in particular convolutional neural networks (CNNs), have been successfully applied to image classification tasks in astrophysical data analysis. However, one application of DL that remains relatively unexplored is the automatic identification and localisation of specific objects or features in astrophysical imaging data. In this paper we demonstrate the feasibility of using deep-learning-based object detection models to localise GSFCs in astrophysical imaging data. We apply the Faster R-CNN object detection framework (FRCNN) to identify GSFCs in low-redshift (z<0.3) galaxies. Unlike other studies, we train different FRCNN models not on simulated images with known labels but on real observational data that were collected by the Sloan Digital Sky Survey Legacy Survey and labelled by volunteers from the citizen science project 'Galaxy Zoo: Clump Scout'. The FRCNN model relies on a CNN component as a 'backbone' feature extractor. We show that CNNs that have been pre-trained for image classification using astrophysical images outperform those that have been pre-trained on terrestrial images. In particular, we compare a domain-specific CNN, 'Zoobot', with a generic classification backbone and find that Zoobot achieves higher detection performance and also requires smaller training data sets to do so. Our final model is capable of producing GSFC detections with a completeness and purity of $\geq$0.8 while only being trained on $\sim$5,000 galaxy images.

Submitted 1 April, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Accepted for publication in RASTI, 22 pages
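Completeness and purity for an object detector like the one described above are usually computed by matching predicted boxes to labelled boxes. The sketch below assumes axis-aligned boxes `(x0, y0, x1, y1)`, greedy one-to-one matching in order of decreasing overlap, and an intersection-over-union (IoU) threshold of 0.5; the paper's actual matching rule and threshold are not given in the abstract, and the function names are hypothetical. Completeness is matched labels over all labels (recall), and purity is matched detections over all detections (precision).

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) with x0 < x1, y0 < y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def completeness_purity(predicted: Sequence[Box], labelled: Sequence[Box],
                        threshold: float = 0.5) -> Tuple[float, float]:
    """Greedy one-to-one matching by decreasing IoU; returns (completeness, purity)."""
    pairs = sorted(((iou(p, t), i, j) for i, p in enumerate(predicted)
                    for j, t in enumerate(labelled)), reverse=True)
    used_pred, used_true = set(), set()
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs overlap even less
        if i not in used_pred and j not in used_true:
            used_pred.add(i)
            used_true.add(j)
    matches = len(used_pred)
    completeness = matches / len(labelled) if labelled else 0.0
    purity = matches / len(predicted) if predicted else 0.0
    return completeness, purity


if __name__ == "__main__":
    labelled = [(0, 0, 10, 10), (20, 20, 30, 30)]
    predicted = [(1, 1, 10, 10), (50, 50, 60, 60)]
    print(completeness_purity(predicted, labelled))  # (0.5, 0.5)
```

Greedy matching is a simple choice; Hungarian (optimal) assignment can give slightly different counts when many boxes overlap, and the IoU threshold strongly affects both metrics.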

arXiv:2312.00267 [pdf, other]

Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration

Authors: Viraj Mehta, Vikramjeet Das, Ojash Neopane, Yijia Dai, Ilija Bogunovic, Jeff Schneider, Willie Neiswanger

Abstract: Preference-based feedback is important for many applications in reinforcement learning where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback (RLHF) on large language models. For many applications of RLHF, the cost of acquiring the human feedback can be substantial. In this work, we take advantage of the fact that one can often choose contexts at which to obtain human feedback in order to most efficiently identify a good policy, and formalize this as an offline contextual dueling bandit problem. We give an upper-confidence-bound style algorithm for this problem and prove a polynomial worst-case regret bound. We then provide empirical confirmation in a synthetic setting that our approach outperforms existing methods. Finally, we extend the setting and methodology for practical use in RLHF training of large language models, where our method reaches better performance with fewer samples of human preferences than multiple baselines on three real-world datasets.

Submitted 30 November, 2023; originally announced December 2023.

arXiv:2311.15664

The UV luminosity function at 0.6 < z < 1 from UVCANDELS

Authors: Lei Sun, Xin Wang, Harry I. Teplitz, Vihang Mehta, Anahita Alavi, Marc Rafelski, Rogier A. Windhorst, Claudia Scarlata, Jonathan P. Gardner, Brent M. Smith, Ben Sunnquist, Laura Prichard, Yingjie Cheng, Norman Grogin, Nimish P. Hathi, Matthew Hayes, Anton M. Koekemoer, Bahram Mobasher, Kalina V. Nedkova, Robert O'Connell, Brant Robertson, Sina Taamoli, L. Y. Aaron Yung, Gabriel Brammer, James Colbert , et al. (53 additional authors not shown)

Abstract: UVCANDELS is an HST Cycle-26 Treasury Program awarded 164 orbits of primary ultraviolet (UV) F275W imaging and coordinated parallel optical F435W imaging in four CANDELS fields: GOODS-N, GOODS-S, EGS, and COSMOS, covering a total area of $\sim426$ arcmin$^2$. This is $\sim2.7$ times larger than the area covered by previous deep-field space UV data combined, reaching a depth of about 27 and 28 ABmag ($5σ$ in $0.2"$ apertures) for F275W and F435W, respectively. Along with the new photometric catalogs, we present an analysis of the rest-frame UV luminosity function (LF), relying on our UV-optimized aperture photometry method, which increases the signal-to-noise ratios of galaxies in our F275W imaging by a factor of $1.5$ relative to H-isophot aperture photometry. Using well-tested photometric redshift measurements, we identify 5810 galaxies at redshifts $0.6<z<1$, down to an absolute magnitude of $M_\text{UV} = -14.2$. To minimize the effect of uncertainties in estimating the completeness function, especially at the faint end, we restrict our analysis to sources above $30\%$ completeness, which provides a final sample of 4726 galaxies at $-21.5<M_\text{UV}<-15.5$. We performed a maximum likelihood estimate to derive the best-fit parameters of the UV LF. We report a best-fit faint-end slope of $α= -1.359^{+0.041}_{-0.041}$ at $z \sim 0.8$. Creating sub-samples at $z\sim0.7$ and $z\sim0.9$, we observe a possible evolution of $α$ with redshift.
The unobscured UV luminosity density at $M_\text{UV}<-10$ is derived as $ρ_\text{UV}=1.339^{+0.027}_{-0.030}\times10^{26}\ \text{erg\,s}^{-1}\,\text{Hz}^{-1}\,\text{Mpc}^{-3}$ using our best-fit LF parameters. The new F275W and F435W photometric catalogs from UVCANDELS have been made publicly available on the Barbara A. Mikulski Archive for Space Telescopes (MAST).
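To make the step from LF parameters to $ρ_\text{UV}$ concrete, here is a minimal sketch assuming a standard Schechter form in absolute magnitude and the AB conversion $L_ν = 4π(10\,\text{pc})^2\,10^{-0.4(M+48.6)}$. It is not the authors' code. The abstract quotes only $α$, so the $φ^*$ and $M^*$ used in the check are hypothetical placeholders, not the paper's fit. For a Schechter LF the integral has the closed form $φ^* L^* Γ(α+2, L_\text{min}/L^*)$; the numerical version is shown because it works for any faint limit.

```python
import math

# 4*pi*(10 pc)^2 in cm^2, with 1 pc = 3.0857e18 cm
_FOUR_PI_10PC_SQ = 4.0 * math.pi * (10.0 * 3.0857e18) ** 2


def abs_mag_to_lnu(m_abs):
    """AB absolute magnitude -> specific luminosity L_nu in erg/s/Hz."""
    return _FOUR_PI_10PC_SQ * 10.0 ** (-0.4 * (m_abs + 48.6))


def schechter_mag(m_abs, phi_star, m_star, alpha):
    """Schechter LF per unit magnitude, in the same units as phi_star."""
    x = 10.0 ** (0.4 * (m_star - m_abs))  # L / L*
    return 0.4 * math.log(10.0) * phi_star * x ** (alpha + 1) * math.exp(-x)


def luminosity_density(phi_star, m_star, alpha, m_lim, n=20000):
    """rho_UV = integral of L_nu(M) * phi(M) dM over all M brighter than m_lim.

    Uses composite Simpson's rule. The bright limit m_star - 10 is where
    exp(-L/L*) has underflowed to zero, so nothing is lost there.
    """
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    m_bright = m_star - 10.0
    h = (m_lim - m_bright) / n
    total = 0.0
    for i in range(n + 1):
        m = m_bright + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * abs_mag_to_lnu(m) * schechter_mag(m, phi_star, m_star, alpha)
    return total * h / 3.0
```

With $φ^*$ in Mpc$^{-3}$ mag$^{-1}$, a call such as `luminosity_density(phi_star, m_star, -1.359, -10.0)` returns $ρ_\text{UV}$ in erg s$^{-1}$ Hz$^{-1}$ Mpc$^{-3}$ for the $M_\text{UV}<-10$ limit used above. The actual $φ^*$ and $M^*$ must come from the paper's fit.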

Submitted 2 May, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: 17 pages, 8 figures, Accepted for publication in ApJ

arXiv:2310.05674

Making Scalable Meta Learning Practical

Authors: Sang Keun Choe, Sanket Vaibhav Mehta, Hwijeen Ahn, Willie Neiswanger, Pengtao Xie, Emma Strubell, Eric Xing

Abstract: Despite its flexibility in learning diverse inductive biases in machine learning programs, meta learning (i.e., learning to learn) has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support. In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems. Specifically, SAMA is designed to flexibly support a broad range of adaptive optimizers in the base level of meta learning programs, while reducing computational burden by avoiding explicit computation of second-order gradient information, and exploiting efficient distributed training techniques implemented for first-order gradients. Evaluated on multiple large-scale meta learning benchmarks, SAMA achieves up to 1.7x/4.8x higher throughput and 2.0x/3.8x lower memory consumption on single-/multi-GPU setups, respectively, compared to other baseline meta learning algorithms. Furthermore, we show that SAMA-based data optimization leads to consistent improvements in text classification accuracy with BERT and RoBERTa large language models, and achieves state-of-the-art results in both small- and large-scale data pruning on image classification tasks, demonstrating the practical applicability of scalable meta learning across language and vision domains.

Submitted 23 October, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

arXiv:2309.01831

Axion minima in string theory

Authors: Naomi Gendler, Oliver Janssen, Matthew Kleban, Joan La Madrid, Viraf M. Mehta

Abstract: We study the landscape of axion theories in compactifications of type IIB string theory on orientifolds of Calabi-Yau threefolds. In a sample of approximately 400,000 geometries we find that in the regime of perturbative control there are only a handful of distinct axion minima per geometry, despite there being infinitely many instanton contributions to the potential with unbounded charges. The ensemble we consider has numbers of axion fields ranging from 1 to 491, but the median number of distinct minima is 1, the mean number is 1.9 and the largest is 54. These small numbers of minima occur because the leading axion charge matrix is quite sparse, while the subleading corrections are increasingly exponentially suppressed as the charges increase. On their own, such potentials are nowhere near rich enough to be of interest anthropically. This is in stark contrast to potentials for which the charge matrix is less sparse or the hierarchies between the instanton contributions are less steep, where one can find $\mathcal{O}(10^{500})$ minima for $\mathcal{O}(500)$ axions. To generate a sufficiently large landscape from string compactifications our results indicate that one would need to rely on varying flux or topology, or to develop tools that allow one to go beyond the regime we can control with current techniques.

Submitted 4 September, 2023; originally announced September 2023.

Comments: 14+6 pages, 1 figure

arXiv:2308.15141

doi 10.1016/j.media.2023.102861

Uncertainty Aware Training to Improve Deep Learning Model Calibration for Classification of Cardiac MR Images

Authors: Tareen Dawood, Chen Chen, Baldeep S. Sidhua, Bram Ruijsink, Justin Goulda, Bradley Porter, Mark K. Elliott, Vishal Mehta, Christopher A. Rinaldi, Esther Puyol-Anton, Reza Razavi, Andrew P. King

Abstract: Quantifying uncertainty of predictions has been identified as one way to develop more trustworthy artificial intelligence (AI) models beyond conventional reporting of performance metrics. When considering their role in a clinical decision support setting, AI classification models should ideally avoid confident wrong predictions and maximise the confidence of correct predictions. Models that do this are said to be well-calibrated with regard to confidence. However, relatively little attention has been paid to how to improve calibration when training these models, i.e., to make the training strategy uncertainty-aware. In this work, we evaluate three novel uncertainty-aware training strategies against two state-of-the-art approaches. We analyse performance on two different clinical applications: cardiac resynchronisation therapy (CRT) response prediction and coronary artery disease (CAD) diagnosis from cardiac magnetic resonance (CMR) images. The best-performing model in terms of both classification accuracy and the most common calibration measure, expected calibration error (ECE), was the Confidence Weight method, a novel approach that weights the loss of samples to explicitly penalise confident incorrect predictions. The method reduced the ECE by 17% for CRT response prediction and by 22% for CAD diagnosis when compared to a baseline classifier in which no uncertainty-aware strategy was included.
In both applications, as well as reducing the ECE, there was a slight increase in accuracy, from 69% to 70% and from 70% to 72% for CRT response prediction and CAD diagnosis, respectively. However, our analysis showed a lack of consistency in terms of optimal models when using different calibration measures. This indicates the need for careful consideration of performance metrics when training and selecting models for complex high-risk applications in healthcare.
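For readers unfamiliar with ECE, the sketch below implements the common equal-width-binned estimator, $\text{ECE} = \sum_b (|B_b|/n)\,|\text{acc}(B_b) - \text{conf}(B_b)|$. The 10-bin default is an assumption, since the abstract does not state the binning used, and this is not the authors' code; it does not implement their Confidence Weight loss.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned expected calibration error.

    confidences: predicted probability of the predicted class, each in [0, 1]
    correct:     booleans, True where the prediction matched the label
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        # Floor into equal-width bins; confidence 1.0 goes into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += len(b) / n * abs(acc - avg_conf)
    return ece
```

A model that predicts 0.9 confidence on two samples and gets both wrong scores `expected_calibration_error([0.9, 0.9], [False, False])` of 0.9, the kind of confident error the Confidence Weight method is described as penalising. As the abstract notes, other calibration measures can rank models differently, so ECE is one choice among several.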

Submitted 29 August, 2023; originally announced August 2023.

arXiv:2308.10927

HST UV Spectroscopy of the Dwarf Starburst Galaxy Pox 186

Authors: Noah S. J. Rogers, Claudia M. Scarlata, Evan D. Skillman, Nathan R. Eggen, Anne E. Jaskot, Vihang Mehta, John M. Cannon

Abstract: The galaxies responsible for reionization are often studied through local reionization-era analogs; however, many of these local analogs are too massive to be representative of the low-mass star-forming galaxies that are thought to play a dominant role in reionization. The local, low-mass dwarf starburst galaxy Pox 186 is one such system with physical conditions representative of a reionization-era starburst galaxy. We present deep ultraviolet (UV) spectroscopy of Pox 186 to study its stellar population and ionization conditions and to compare these conditions to other local starburst galaxies. The new Cosmic Origins Spectrograph data are combined with archival observations to cover $\sim$1150-2000 Å and allow for an assessment of Pox 186's stellar population, the relative enrichment of C and O, and the escape of ionizing photons. We detect significant Ly$α$ and low-ionization state absorption features, indicative of previously undetected neutral gas in Pox 186. The C/O relative abundance, log(C/O) = -0.62$\pm$0.02, is consistent with other low-metallicity dwarf galaxies and suggests a comparable star formation history in these systems. We compare UV line ratios in Pox 186 to those of dwarf galaxies and photoionization models, and we find excellent agreement for the ratios utilizing the intense C III], O III], and double-peaked C IV lines. However, the UV and optical He II emission is faint and distinguishes Pox 186 from other local starburst dwarf galaxies.
We explore mechanisms that could produce faint He II, which have implications for low-mass reionization-era galaxies that may have similar ionization conditions.

Submitted 21 August, 2023; originally announced August 2023.

Comments: 22 pages, 9 figures, accepted for publication in The Astrophysical Journal

arXiv:2308.09064

The Lyman Continuum Escape Fraction of Star-forming Galaxies at $2.4\lesssim z\lesssim3.7$ from UVCANDELS

Authors: Xin Wang, Harry I. Teplitz, Brent M. Smith, Rogier A. Windhorst, Marc Rafelski, Vihang Mehta, Anahita Alavi, Gabriel Brammer, James Colbert, Norman Grogin, Nimish P. Hathi, Anton M. Koekemoer, Laura Prichard, Claudia Scarlata, Ben Sunnquist, Pablo Arrabal Haro, Christopher Conselice, Eric Gawiser, Yicheng Guo, Matthew Hayes, Rolf A. Jansen, Zhiyuan Ji, Ray A. Lucas, Robert O'Connell, Brant Robertson , et al. (52 additional authors not shown)

Abstract: The UltraViolet Imaging of the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey Fields (UVCANDELS) survey is a Hubble Space Telescope (HST) Cycle-26 Treasury Program, allocated a total of 164 orbits of primary Wide-Field Camera 3 Ultraviolet and Visible light F275W imaging with coordinated parallel Advanced Camera for Surveys F435W imaging, on four of the five premier extragalactic survey fields: GOODS-N, GOODS-S, EGS, and COSMOS. We introduce this survey by presenting a thorough search for galaxies at $z\gtrsim2.4$ that leak significant Lyman continuum (LyC) radiation, as well as a stringent constraint on the LyC escape fraction ($f_{\rm esc}$) from stacking the UV images of a population of star-forming galaxies with secure redshifts. Our extensive search for LyC emission and stacking analysis benefit from the catalogs of high-quality spectroscopic redshifts compiled from archival ground-based data and HST slitless spectroscopy, carefully vetted by dedicated visual inspection efforts. We report a sample of five galaxies as individual LyC leaker candidates, showing $f_{\rm esc}^{\rm rel}\gtrsim60\%$ estimated using detailed Monte Carlo analysis of intergalactic medium attenuation. We develop a robust stacking method and apply it to five samples totaling 85 non-detected galaxies in the redshift range $z\in[2.4,3.7]$. Most stacks give tight 2-$σ$ upper limits of $f_{\rm esc}^{\rm rel}<6\%$.
A stack for a subset of 32 emission-line galaxies shows tentative LyC leakage detected at 2.9-$σ$, indicating $f_{\rm esc}^{\rm rel}=5.7\%$ at $z\sim2.65$, supporting the key role of such galaxies in contributing to cosmic reionization and maintaining the UV ionization background. These new F275W and F435W imaging mosaics from UVCANDELS have been made publicly available on the Barbara A. Mikulski Archive for Space Telescopes.
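For orientation, a relative escape fraction like the one quoted above is commonly defined as the observed LyC-to-UV flux ratio divided by the intrinsic (stellar-population-model) ratio, corrected for the IGM transmission $T_{\rm IGM}$ along the sightline. The abstract does not give the authors' exact expression, so this standard convention is an assumption here:

$$
f_{\rm esc}^{\rm rel} = \frac{(F_{\rm LyC}/F_{\rm UV})_{\rm obs}}{(L_{\rm LyC}/L_{\rm UV})_{\rm int}}\,\frac{1}{T_{\rm IGM}}
$$

$T_{\rm IGM}$ is presumably the quantity constrained by the Monte Carlo analysis of intergalactic medium attenuation mentioned in the abstract. Because $T_{\rm IGM}\le 1$ sits in the denominator, a lower assumed transmission raises the inferred $f_{\rm esc}^{\rm rel}$ for the same observed flux ratio.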

Submitted 17 August, 2023; originally announced August 2023.

Comments: 33 pages, 21 figures, and 5 tables. Resubmitted after addressing the referee report

arXiv:2308.00041

doi 10.3847/1538-4357/aced3e

UV-Bright Star-Forming Clumps and Their Host Galaxies in UVCANDELS at 0.5 $\leq$ z $\leq$ 1

Authors: Alec Martin, Yicheng Guo, Xin Wang, Anton M. Koekemoer, Marc Rafelski, Harry I. Teplitz, Rogier A. Windhorst, Anahita Alavi, Norman A. Grogin, Laura Prichard, Ben Sunnquist, Daniel Ceverino, Nima Chartab, Christopher J. Conselice, Y. Sophia Dai, Avishai Dekel, Johnathan P. Gardner, Eric Gawiser, Nimish P. Hathi, Matthew J. Hayes, Rolf A. Jansen, Zhiyuan Ji, David C. Koo, Ray A. Lucas, Nir Mandelker , et al. (10 additional authors not shown)

Abstract: Giant star-forming clumps are a prominent feature of star-forming galaxies (SFGs) and contain important clues about galaxy formation and evolution. However, basic demographics of clumps and their host galaxies remain uncertain. Using the HST/WFC3 F275W images from the Ultraviolet Imaging of the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (UVCANDELS), we detect and analyze giant star-forming clumps in galaxies at 0.5 $\leq$ z $\leq$ 1, connecting two epochs when clumps are common (at cosmic high-noon, z $\sim$ 2) and rare (in the local universe). We construct a clump sample whose rest-frame 1600 Å luminosity is 3 times higher than the most luminous local HII regions (M$_{UV} \leq -$16 AB). In our sample, 35 $\pm$ 3$\%$ of low-mass galaxies (log[M$_{*}$/M$_{\odot}$] $<$ 10) are clumpy (i.e., containing at least one off-center clump). This fraction changes to 22 $\pm$ 3$\%$ and 22 $\pm$ 4$\%$ for intermediate-mass (10 $\leq$ log[M$_{*}$/M$_{\odot}$] $\leq$ 10.5) and high-mass (log[M$_{*}$/M$_{\odot}$] $>$ 10.5) galaxies, respectively, in agreement with previous studies. When compared to similar-mass non-clumpy SFGs, low- and intermediate-mass clumpy SFGs tend to have higher SFRs and bluer rest-frame U-V colors, while high-mass clumpy SFGs tend to be larger than non-clumpy SFGs. However, clumpy and non-clumpy SFGs have similar Sérsic indices, indicating a similar underlying density profile. Furthermore, we investigate how UV luminosity of star-forming regions correlates with the physical properties of host galaxies.
On average, more luminous star-forming regions reside in more luminous, smaller, and/or higher-specific SFR galaxies and are found closer to their hosts' galactic center.

Submitted 2 October, 2023; v1 submitted 31 July, 2023; originally announced August 2023.

Comments: 21 pages, 13 figures, accepted for publication in ApJ

Journal ref: ApJ 955 106 (2023)

arXiv:2307.11288

Kernelized Offline Contextual Dueling Bandits

Authors: Viraj Mehta, Ojash Neopane, Vikramjeet Das, Sen Lin, Jeff Schneider, Willie Neiswanger

Abstract: Preference-based feedback is important for many applications where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback on large language models. For many of these applications, the cost of acquiring the human feedback can be substantial or even prohibitive. In this work, we take advantage of the fact that often the agent can choose contexts at which to obtain human feedback in order to most efficiently identify a good policy, and introduce the offline contextual dueling bandit setting. We give an upper-confidence-bound style algorithm for this setting and prove a regret bound. We also give empirical confirmation that this method outperforms a similar strategy that uses uniformly sampled contexts.

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2307.09789

Quantum Optics based Algorithm for Measuring the Similarity between Images

Authors: Vivek Mehta, Sonali Jana, Utpal Roy

Abstract: We report an algorithm, based on a quantum optics formulation, in which a coherent state is used as the elementary quantum resource for the image representation. We provide an architecture whose number of constituent optical elements scales linearly with the image resolution. The obtained phase-distributed multimode coherent state is fed into an image retrieval scheme, and we identify the appropriate laser intensity parameter for similarity measurement. The use of the principle of quantum superposition in the similarity measurement protocol enables us to encode multiple input images. We demonstrate the viability of the protocol through an objective quality assessment of images to which successive layers of noise are added. The results are in good agreement with the expected outcome. The image distortion-sensitivity analysis of the metric further establishes the merit of the model. Our quantum algorithm also has wider applicability in supervised machine learning tasks.
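The abstract does not spell out the encoding, so the following is an illustration only. It assumes a hypothetical scheme that maps each pixel value to the phase of one mode of a multimode coherent state with a common amplitude $|α|$ (standing in for the laser intensity parameter), and scores similarity by the state overlap. The overlap formula itself is a standard identity: for single-mode coherent states $|\langle α|β\rangle|^2 = e^{-|α-β|^2}$, and for product states the overlaps multiply. This is not the authors' protocol.

```python
import cmath
import math


def encode_phases(pixels, max_value=255):
    """Hypothetical encoding: map each pixel value linearly to a phase in [0, pi]."""
    return [math.pi * p / max_value for p in pixels]


def coherent_overlap(phases_a, phases_b, amplitude):
    """|<alpha|beta>|^2 for two multimode coherent states.

    Mode k of each state has complex amplitude amplitude * exp(i * phase_k).
    Single-mode overlaps are exp(-|alpha_k - beta_k|^2); for a product of
    modes they multiply, so the exponents add.
    """
    if len(phases_a) != len(phases_b):
        raise ValueError("images must have the same number of pixels")
    dist2 = sum(
        abs(amplitude * cmath.exp(1j * pa) - amplitude * cmath.exp(1j * pb)) ** 2
        for pa, pb in zip(phases_a, phases_b)
    )
    return math.exp(-dist2)
```

Because the exponent scales with $|α|^2$, raising the amplitude makes the overlap fall off faster with pixel differences. This is one way to see why a particular laser intensity has to be chosen for a useful similarity measure: too low and all images look alike, too high and any noise drives the overlap towards zero.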

Submitted 19 July, 2023; originally announced July 2023.

Comments: 13 pages, 7 figures

arXiv:2306.05376

Anomaly Detection in Satellite Videos using Diffusion Models

Authors: Akash Awasthi, Son Ly, Jaer Nizam, Samira Zare, Videet Mehta, Safwan Ahmed, Keshav Shah, Ramakrishna Nemani, Saurabh Prasad, Hien Van Nguyen

Abstract: Anomaly detection is the identification of unexpected events. Real-time detection of extreme events such as wildfires, cyclones, or floods using satellite data has become crucial for disaster management. Although several earth-observing satellites provide information about disasters, satellites in geostationary orbit provide data at intervals as frequent as every minute, effectively creating a video from space. Many techniques have been proposed to identify anomalies in surveillance videos; however, the available datasets do not have dynamic behavior, so we discuss an anomaly framework that can work on very high-frequency datasets to find very fast-moving anomalies. In this work, we present a diffusion model that does not need any motion component to capture the fast-moving anomalies and outperforms the other baseline methods.

Submitted 25 May, 2023; originally announced June 2023.

arXiv:2306.02837

Environmental sustainability in basic research: a perspective from HECAP+

Authors: Sustainable HECAP+ Initiative: Shankha Banerjee, Thomas Y. Chen, Claire David, Michael Düren, Harold Erbin, Jacopo Ghiglieri, Mandeep S. S. Gill, L Glaser, Christian Gütschow, Jack Joseph Hall, Johannes Hampp, Patrick Koppenburg, Matthias Koschnitzke, Kristin Lohwasser, Rakhi Mahbubani, Viraf Mehta, Peter Millington, Ayan Paul, Frauke Poblotzki, Karolos Potamianos, Nikolina Šarčević, Rajeev Singh, Hannah Wakeling, et al. (3 additional authors not shown)

Abstract: The climate crisis and the degradation of the world's ecosystems require humanity to take immediate action. The international scientific community has a responsibility to limit the negative environmental impacts of basic research. The HECAP+ communities (High Energy Physics, Cosmology, Astroparticle Physics, and Hadron and Nuclear Physics) make use of common and similar experimental infrastructure, such as accelerators and observatories, and rely similarly on the processing of big data. Our communities therefore face similar challenges to improving the sustainability of our research. This document aims to reflect on the environmental impacts of our work practices and research infrastructure, to highlight best practice, to make recommendations for positive changes, and to identify the opportunities and challenges that such changes present for wider aspects of social responsibility.

Submitted 18 August, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: 158 pages, 21 figures; comments welcome. Revisions included in Version 2.0 are detailed on page 3 of the pdf. If you would like to endorse this document please visit: https://sustainable-hecap-plus.github.io/. An HTML version of this document is available at: https://sustainable-hecap-plus.github.io/

arXiv:2305.09021

doi 10.3847/1538-4357/acd5d6

Fraction of Clumpy Star-Forming Galaxies at $0.5\leq z\leq 3$ in UVCANDELS: Dependence on Stellar Mass and Environment

Authors: Zahra Sattari, Bahram Mobasher, Nima Chartab, Daniel D. Kelson, Harry I. Teplitz, Marc Rafelski, Norman A. Grogin, Anton M. Koekemoer, Xin Wang, Rogier A. Windhorst, Anahita Alavi, Laura Prichard, Ben Sunnquist, Jonathan P. Gardner, Eric Gawiser, Nimish P. Hathi, Matthew J. Hayes, Zhiyuan Ji, Vihang Mehta, Brant E. Robertson, Claudia Scarlata, L. Y. Aaron Yung, Christopher J. Conselice, Y. Sophia Dai, Yicheng Guo , et al. (3 additional authors not shown)

Abstract: High-resolution imaging of galaxies in rest-frame UV has revealed the existence of giant star-forming clumps prevalent in high-redshift galaxies. Studying these sub-structures provides important information about their formation and evolution and informs theoretical galaxy evolution models. We present a new method to identify clumps in galaxies' high-resolution rest-frame UV images. Using imaging data from CANDELS and UVCANDELS, we identify star-forming clumps in an HST/F160W$\leq 25$ AB mag sample of 6767 galaxies at $0.5\leq z\leq 3$ in four fields, GOODS-N, GOODS-S, EGS, and COSMOS. We use a low-pass filter in Fourier space to reconstruct the background image of a galaxy and detect small-scale features (clumps) on the background-subtracted image. Clumpy galaxies are defined as those having at least one off-center clump that contributes a minimum of 10$\%$ of the galaxy's total rest-frame UV flux. We measure the fraction of clumpy galaxies ($\rm f_{clumpy}$) as a function of stellar mass, redshift, and galaxy environment. Our results indicate that $\rm f_{clumpy}$ increases with redshift, reaching $\sim 65\%$ at $z\sim 1.5$. We also find that $\rm f_{clumpy}$ in low-mass galaxies ($\rm 9.5\leq log(M_*/M_\odot)\leq 10$) is 10$\%$ higher than that of their high-mass counterparts ($\rm log(M_*/M_\odot)>10.5$). Moreover, we find no evidence of significant environmental dependence of $\rm f_{clumpy}$ for galaxies in the redshift range of this study.
Our results suggest that the fragmentation of gas clouds under violent disk instability remains the primary driving mechanism for clump formation, and that processes common in dense environments, such as mergers, are not dominant.
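The clumpy-galaxy definition above can be sketched as follows. The clump detection itself (the Fourier-space background subtraction) is not reproduced; each galaxy is assumed to arrive with a hypothetical list of `(flux, off_center)` clump entries and its total rest-frame UV flux. The binomial uncertainty $\sqrt{f(1-f)/N}$ is a common choice assumed here, not necessarily the paper's error estimate.

```python
import math


def is_clumpy(clumps, total_flux, min_frac=0.10):
    """True if any off-center clump carries >= min_frac of the galaxy's total UV flux.

    clumps: list of (flux, off_center) tuples for one galaxy (hypothetical format).
    """
    return any(off and flux / total_flux >= min_frac for flux, off in clumps)


def clumpy_fraction(galaxies, lo, hi):
    """f_clumpy for galaxies with lo <= log(M*/Msun) < hi, with a binomial 1-sigma error.

    galaxies: list of dicts with hypothetical keys "logmass", "flux", "clumps".
    Returns (nan, nan) for an empty bin.
    """
    sel = [g for g in galaxies if lo <= g["logmass"] < hi]
    if not sel:
        return float("nan"), float("nan")
    n = len(sel)
    f = sum(is_clumpy(g["clumps"], g["flux"]) for g in sel) / n
    return f, math.sqrt(f * (1.0 - f) / n)
```

Comparing `clumpy_fraction(galaxies, 9.5, 10.0)` with `clumpy_fraction(galaxies, 10.5, 99.0)` mirrors the low-mass versus high-mass comparison described above. Note that an on-center clump never qualifies, however bright it is.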

Submitted 15 May, 2023; originally announced May 2023.

Comments: 16 pages, 11 figures, 2 tables, accepted for publication in ApJ

arXiv:2305.00131

Regularizing Self-training for Unsupervised Domain Adaptation via Structural Constraints

Authors: Rajshekhar Das, Jonathan Francis, Sanket Vaibhav Mehta, Jean Oh, Emma Strubell, Jose Moura

Abstract: Self-training based on pseudo-labels has emerged as a dominant approach for addressing conditional distribution shifts in unsupervised domain adaptation (UDA) for semantic segmentation problems. A notable drawback, however, is that this family of approaches is susceptible to erroneous pseudo-labels that arise from confirmation biases in the source domain and that manifest as nuisance factors in the target domain. A possible source for this mismatch is the reliance on only photometric cues provided by RGB image inputs, which may ultimately lead to sub-optimal adaptation. To mitigate the effect of mismatched pseudo-labels, we propose to incorporate structural cues from auxiliary modalities, such as depth, to regularise conventional self-training objectives. Specifically, we introduce a contrastive pixel-level objectness constraint that pulls the pixel representations within a region of an object instance closer, while pushing those from different object categories apart. To obtain object regions consistent with the true underlying object, we extract information from both depth maps and RGB images in the form of multimodal clustering. Crucially, the objectness constraint is agnostic to the ground-truth semantic labels and, hence, appropriate for unsupervised domain adaptation. In this work, we show that our regularizer significantly improves top-performing self-training methods (by up to $2$ points) in various UDA benchmarks for semantic segmentation. We include all code in the supplementary material.

Submitted 28 April, 2023; originally announced May 2023.

arXiv:2304.08486

BenchMD: A Benchmark for Unified Learning on Medical Images and Sensors

Authors: Kathryn Wantlin, Chenwei Wu, Shih-Cheng Huang, Oishi Banerjee, Farah Dadabhoy, Veeral Vipin Mehta, Ryan Wonhee Han, Fang Cao, Raja R. Narayan, Errol Colak, Adewole Adamson, Laura Heacock, Geoffrey H. Tison, Alex Tamkin, Pranav Rajpurkar

Abstract: Medical data poses a daunting challenge for AI algorithms: it exists in many different modalities, experiences frequent distribution shifts, and suffers from a scarcity of examples and labels. Recent advances, including transformers and self-supervised learning, promise a more universal approach that can be applied flexibly across these diverse conditions. To measure and drive progress in this direction, we present BenchMD: a benchmark that tests how well unified, modality-agnostic methods, including architectures and training techniques (e.g. self-supervised learning, ImageNet pretraining), perform on a diverse array of clinically relevant medical tasks. BenchMD combines 19 publicly available datasets for 7 medical modalities, including 1D sensor data, 2D images, and 3D volumetric scans. Our benchmark reflects real-world data constraints by evaluating methods across a range of dataset sizes, including challenging few-shot settings that incentivize the use of pretraining. Finally, we evaluate performance on out-of-distribution data collected at hospitals other than those that supplied the training data, representing naturally occurring distribution shifts that frequently degrade the performance of medical AI models. Our baseline results demonstrate that no unified learning technique achieves strong performance across all modalities, leaving ample room for improvement on the benchmark. Code is released at https://github.com/rajpurkarlab/BenchMD.

Submitted 26 June, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

arXiv:2304.03962

doi 10.1016/j.aop.2023.169314

Einstein-Podolsky-Rosen-Bohm experiments: a discrete data driven approach

Authors: Hans De Raedt, Mikhail I. Katsnelson, Manpreet S. Jattana, Vrinda Mehta, Madita Willsch, Dennis Willsch, Kristel Michielsen, Fengping Jin

Abstract: We take the point of view that building a one-way bridge from experimental data to mathematical models instead of the other way around avoids running into controversies resulting from attaching meaning to the symbols used in the latter. In particular, we show that adopting this view offers new perspectives for constructing mathematical models for, and interpreting the results of, Einstein-Podolsky-Rosen-Bohm experiments. We first prove new Bell-type inequalities constraining the values of the four correlations obtained by performing Einstein-Podolsky-Rosen-Bohm experiments under four different conditions. The proof is ``model-free'' in the sense that it does not refer to any mathematical model that one imagines to have produced the data. The constraints depend only on the number of quadruples obtained by reshuffling the data in the four data sets without changing the values of the correlations. These new inequalities reduce to model-free versions of the well-known Bell-type inequalities if the maximum fraction of quadruples is equal to one. Because the latter are model-free, their violation by experimental data implies that not all the data in the four data sets can be reshuffled to form quadruples. Furthermore, because they are model-free inequalities, their violation by experimental data only implies that any mathematical model assumed to produce these data does not apply. Starting from the data obtained by performing Einstein-Podolsky-Rosen-Bohm experiments, we construct, instead of postulate, mathematical models that describe the main features of these data.
The mathematical framework of plausible reasoning is applied to reproducible and robust data, yielding, without using any concept of quantum theory, the expression of the correlation for a system of two spin-1/2 objects in the singlet state. (truncated here)

Submitted 14 June, 2023; v1 submitted 8 April, 2023; originally announced April 2023.

Comments: Corrected typos; minor corrections in appendix B.3

Journal ref: Annals of Physics, Volume 453, 169314, 2023

arXiv:2302.01805

Model-free inequality for data of Einstein-Podolsky-Rosen-Bohm experiments

Authors: Hans De Raedt, Mikhail I. Katsnelson, Manpreet S. Jattana, Vrinda Mehta, Madita Willsch, Dennis Willsch, Kristel Michielsen, Fengping Jin

Abstract: We present a new inequality constraining correlations obtained when performing Einstein-Podolsky-Rosen-Bohm experiments. The proof does not rely on mathematical models that are imagined to have produced the data and is therefore ``model-free''. The new inequality contains the model-free version of the well-known Bell-CHSH inequality as a special case. A violation of the latter implies that not all the data pairs in four data sets can be reshuffled to create quadruples. This conclusion provides a new perspective on the implications of the violation of Bell-type inequalities by experimental data.

Submitted 3 February, 2023; originally announced February 2023.

Comments: Extended version of Annals of Physics, Volume 453, 169314, 2023 (https://doi.org/10.1016/j.aop.2023.169314)
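The Bell-CHSH special case mentioned in this entry rests on a counting argument that is easy to check numerically. The sketch below illustrates only that textbook case, where every data item belongs to a quadruple (a1, a2, b1, b2) of ±1 values; it is not the paper's generalized inequality, and the function name is hypothetical.

```python
import itertools
import random


def chsh_from_quadruples(quads):
    """Return S = E(a1 b1) + E(a1 b2) + E(a2 b1) - E(a2 b2).

    Each quadruple (a1, a2, b1, b2) holds +/-1 values. All four correlations are
    averaged over the SAME list of quadruples, which is what the bound relies on.
    """
    n = len(quads)
    e11 = sum(a1 * b1 for a1, a2, b1, b2 in quads) / n
    e12 = sum(a1 * b2 for a1, a2, b1, b2 in quads) / n
    e21 = sum(a2 * b1 for a1, a2, b1, b2 in quads) / n
    e22 = sum(a2 * b2 for a1, a2, b1, b2 in quads) / n
    return e11 + e12 + e21 - e22


# For a single quadruple, a1*(b1 + b2) + a2*(b1 - b2) is exactly +2 or -2 ...
for a1, a2, b1, b2 in itertools.product((-1, 1), repeat=4):
    assert abs(a1 * (b1 + b2) + a2 * (b1 - b2)) == 2

# ... so averaging over any list of quadruples gives |S| <= 2.
random.seed(0)
quads = [tuple(random.choice((-1, 1)) for _ in range(4)) for _ in range(1000)]
S = chsh_from_quadruples(quads)
print(f"S = {S:.3f}, |S| <= 2: {abs(S) <= 2}")
```

Read the other way, a value |S| > 2 computed from four separately measured data sets means those data cannot all be reshuffled into such quadruples, which is the interpretation the abstract gives.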

arXiv:2212.09744

DSI++: Updating Transformer Memory with New Documents

Authors: Sanket Vaibhav Mehta, Jai Gupta, Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Jinfeng Rao, Marc Najork, Emma Strubell, Donald Metzler

Abstract: Differentiable Search Indices (DSIs) encode a corpus of documents in model parameters and use the same model to answer user queries directly. Despite the strong performance of DSI models, deploying them in situations where the corpus changes over time is computationally expensive because reindexing the corpus requires re-training the model. In this work, we introduce DSI++, a continual learning challenge for DSI to incrementally index new documents while being able to answer queries related to both previously and newly indexed documents. Across different model scales and document identifier representations, we show that continual indexing of new documents leads to considerable forgetting of previously indexed documents. We also hypothesize and verify that the model experiences forgetting events during training, leading to unstable learning. To mitigate these issues, we investigate two approaches. The first focuses on modifying the training dynamics. Flatter minima implicitly alleviate forgetting, so we optimize for flatter loss basins and show that the model stably memorizes more documents ($+12\%$). Next, we introduce a generative memory to sample pseudo-queries for documents and supplement them during continual indexing to prevent forgetting for the retrieval task. Extensive experiments on novel continual indexing benchmarks based on Natural Questions (NQ) and MS MARCO demonstrate that our proposed solution mitigates forgetting significantly.
Concretely, it improves the average Hits@10 by $+21.1\%$ over competitive baselines for NQ and requires $6$ times fewer model updates compared to re-training the DSI model for incrementally indexing five corpora in a sequence.

Submitted 8 December, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: Accepted at EMNLP 2023 main conference
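The DSI++ results are reported in Hits@10. Assuming the usual definition (the fraction of queries whose correct document identifier appears among the model's top 10 predicted identifiers), a minimal sketch looks like this; the function name and toy data are hypothetical, and the paper's averaging across indexed corpora is not reproduced.

```python
def hits_at_k(ranked_predictions, gold_ids, k=10):
    """Fraction of queries whose gold document id is among the top-k predictions.

    ranked_predictions: one list of predicted doc ids per query, best first.
    gold_ids: the correct doc id for each query.
    """
    if len(ranked_predictions) != len(gold_ids):
        raise ValueError("need exactly one ranked list per query")
    hits = sum(1 for preds, gold in zip(ranked_predictions, gold_ids) if gold in preds[:k])
    return hits / len(gold_ids)


preds = [["d3", "d7", "d1"], ["d2", "d9", "d4"], ["d5", "d6", "d8"]]
gold = ["d1", "d4", "d0"]
print(hits_at_k(preds, gold, k=10))  # 2 of the 3 gold ids appear in the top 10
print(hits_at_k(preds, gold, k=1))   # none is ranked first
```

Measuring this separately on previously indexed and newly indexed documents is one way to make the forgetting described in the abstract visible.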

arXiv:2212.09510

Near-optimal Policy Identification in Active Reinforcement Learning

Authors: Xiang Li, Viraj Mehta, Johannes Kirschner, Ian Char, Willie Neiswanger, Jeff Schneider, Andreas Krause, Ilija Bogunovic

Abstract: Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces. In cases where the transition dynamics can be readily evaluated at specified states (e.g., via a simulator), agents can operate in what is often referred to as planning with a \emph{generative model}. We propose the AE-LSVI algorithm for best-policy identification, a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration (AE). AE-LSVI provably identifies a near-optimal policy \emph{uniformly} over an entire state space and achieves polynomial sample complexity guarantees that are independent of the number of states. When specialized to the recently introduced offline contextual Bayesian optimization setting, our algorithm achieves improved sample complexity bounds. Experimentally, we demonstrate that AE-LSVI outperforms other RL algorithms in a variety of environments when robustness to the initial state is required.

Submitted 19 December, 2022; originally announced December 2022.

arXiv:2212.02626

doi 10.1145/3576915.3623105

A Generic Methodology for the Modular Verification of Security Protocol Implementations (extended version)

Authors: Linard Arquint, Malte Schwerhoff, Vaibhav Mehta, Peter Müller

Abstract: Security protocols are essential building blocks of modern IT systems. Subtle flaws in their design or implementation may compromise the security of entire systems. It is, thus, important to prove the absence of such flaws through formal verification. Much existing work focuses on the verification of protocol *models*, which is not sufficient to show that their *implementations* are actually secure. Verification techniques for protocol implementations (e.g., via code generation or model extraction) typically impose severe restrictions on the programming language used and on the code design, which may lead to sub-optimal implementations. In this paper, we present a methodology for the modular verification of strong security properties directly on the level of the protocol implementations. Our methodology leverages state-of-the-art verification logics and tools to support a wide range of implementations and programming languages. We demonstrate its effectiveness by verifying memory safety and security of Go implementations of the Needham-Schroeder-Lowe, Diffie-Hellman key exchange, and WireGuard protocols, including forward secrecy and injective agreement for WireGuard. We also show, with a prototype implementation for C, that our methodology is not tied to a particular language or program verifier.

Submitted 10 September, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

arXiv:2211.02670

Flashlights: More than A Dozen High-Significance Microlensing Events of Extremely Magnified Stars in Galaxies at Redshifts z=0.7-1.5

Authors: Patrick L. Kelly, Wenlei Chen, Amruth Alfred, Thomas J. Broadhurst, Jose M. Diego, Najmeh Emami, Alexei V. Filippenko, Allison Keen, Sung Kei Li, Jeremy Lim, Ashish K. Meena, Masamune Oguri, Claudia Scarlata, Tommaso Treu, Hayley Williams, Liliya L. R. Williams, Rui Zhou, Adi Zitrin, Ryan J. Foley, Saurabh W. Jha, Nick Kaiser, Vihang Mehta, Steven Rieck, Laura Salo, Nathan Smith , et al. (1 additional authors not shown)

Abstract: Individual stars, once accessible only in nearby galaxies, can now be studied across much of the observable universe with the aid of galaxy-cluster gravitational lenses. When a star, compact object, or multiple such objects in the foreground galaxy-cluster lens become aligned, they can magnify a background individual star, and the timescale of a magnification peak can limit its size to tens of AU. The number and frequency of microlensing events therefore open a window into the population of stars and compact objects, as well as high-redshift stars. To assemble the first statistical sample of stars in order to constrain the initial mass function (IMF) of massive stars at redshift z=0.7-1.5, the abundance of primordial black holes in galaxy-cluster dark matter, and the IMF of the stars making up the intracluster light, we are carrying out a 192-orbit program with the Hubble Space Telescope called "Flashlights," which is now two-thirds complete owing to scheduling challenges. We use the ultrawide F200LP and F350LP long-pass WFC3 UVIS filters and conduct two 16-orbit visits separated by one year. Having an identical roll angle during both visits, while difficult to schedule, yields extremely clean subtraction. Here we report the discovery of more than a dozen bright microlensing events, including multiple examples in the famous "Dragon Arc" discovered in the 1980s, as well as the "Spocks" and "Warhol" arcs that have hosted already known supergiants.
The ultradeep observer-frame ultraviolet-through-optical imaging is sensitive to hot stars and will complement deep James Webb Space Telescope infrared imaging. We are also acquiring Large Binocular Telescope LUCI and Keck-I MOSFIRE near-infrared spectra of the highly magnified arcs to constrain their recent star-formation histories.

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2211.02094

doi 10.3847/1538-4357/acd181

Low Metallicity Galaxies from the Dark Energy Survey

Authors: Yu-Heng Lin, Claudia Scarlata, Vihang Mehta, Evan Skillman, Matthew Hayes, Kristen B. W. McQuinn, Lucy Fortson, Katherine Chworowsky, Leonardo Clarke

Abstract: We present a new selection of 358 blue compact dwarf galaxies (BCDs) from 5,000 square degrees in the Dark Energy Survey (DES), and the spectroscopic follow-up of a subsample of 68 objects. For the subsample of 34 objects with deep spectra, we measure the metallicity via the direct T$_e$ method using the auroral [O III]$λ$4363 emission line. These BCDs have an average oxygen abundance of 12+log(O/H) = 7.8, with stellar masses between 10$^7$ and 10$^8$ M$_\odot$ and specific star formation rates between $\sim$10$^{-9}$ and 10$^{-7}$ yr$^{-1}$. We compare the positions of our BCDs with the mass-metallicity (M-Z) and luminosity-metallicity (L-Z) relations derived from the Local Volume Legacy sample. We find that the scatter about the M-Z relation is smaller than the scatter about the L-Z relation. We identify a correlation between the offsets from the M-Z and L-Z relations that we suggest is due to the contribution of metal-poor inflows. Finally, we explore the validity of the mass-metallicity-SFR fundamental plane in the mass range probed by our galaxies. We find that BCDs with stellar masses smaller than $10^{8}$M$_{\odot}$ do not follow the extrapolation of the fundamental plane. This result suggests that mechanisms other than the balance between inflows and outflows may be at play in regulating the position of low-mass galaxies in the M-Z-SFR space.

Submitted 14 November, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

Comments: 19 pages, 10 figures, accepted by ApJ

arXiv:2211.02056

doi 10.3847/1538-4357/acd9cf

A spatially resolved analysis of star-formation burstiness by comparing UV and H$α$ in galaxies at z$\sim$1 with UVCANDELS

Authors: Vihang Mehta, Harry I. Teplitz, Claudia Scarlata, Xin Wang, Anahita Alavi, James Colbert, Marc Rafelski, Norman Grogin, Anton Koekemoer, Laura Prichard, Rogier Windhorst, Justin M. Barber, Christopher J. Conselice, Y. Sophia Dai, Jonathan P. Gardner, Eric Gawiser, Yicheng Guo, Nimish Hathi, Pablo Arrabal Haro, Matthew Hayes, Kartheik G. Iyer, Rolf A. Jansen, Zhiyuan Ji, Peter Kurczynski, Maxwell Kuschel , et al. (8 additional authors not shown)

Abstract: The UltraViolet imaging of the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey Fields (UVCANDELS) program provides HST/UVIS F275W imaging for four CANDELS fields. We combine this UV imaging with existing HST/near-IR grism spectroscopy from 3D-HST$+$AGHAST to directly compare the resolved rest-frame UV and H$α$ emission for a sample of 979 galaxies at $0.7<z<1.5$ spanning a range in stellar mass of $10^{8-11.5}$ M$_\odot$. Using a stacking analysis, we perform a resolved comparison between homogenized maps of rest-UV and H$α$ to compute the average UV-to-H$α$ luminosity ratio (an indicator of burstiness in star-formation) as a function of galactocentric radius. We find that galaxies below a stellar mass of $\sim$10$^{9.5}$ M$_\odot$, at all radii, have a UV-to-H$α$ ratio higher than the equilibrium value expected from constant star-formation, indicating a significant contribution from bursty star-formation. Even for galaxies with stellar mass $\gtrsim$10$^{9.5}$ M$_\odot$, the UV-to-H$α$ ratio is elevated in their outskirts ($R/R_{eff}>1.5$), suggesting that bursty star-formation is likely prevalent in the outskirts of even the most massive galaxies but is likely overshadowed by their brighter cores.
Furthermore, we present the UV-to-H$α$ ratio as a function of galaxy surface brightness, a proxy for stellar mass surface density, and find that regions below $\sim$10$^{7.5}$ M$_\odot$ kpc$^{-2}$ are consistent with bursty star-formation, regardless of their galaxy stellar mass, potentially suggesting that local star-formation is independent of global galaxy properties at the smallest scales. Lastly, we find galaxies at $z>1.1$ to have bursty star-formation regardless of radius or surface brightness.

Submitted 15 June, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

Comments: 22 pages, 10 figures; accepted to ApJ

arXiv:2210.04642

Exploration via Planning for Information about the Optimal Trajectory

Authors: Viraj Mehta, Ian Char, Joseph Abbate, Rory Conlin, Mark D. Boyer, Stefano Ermon, Jeff Schneider, Willie Neiswanger

Abstract: Many potential applications of reinforcement learning (RL) are stymied by the large numbers of samples required to learn an effective policy. This is especially true when applying RL to real-world control tasks, e.g. in the sciences or robotics, where executing a policy in the environment is costly. In popular RL algorithms, agents typically explore either by adding stochasticity to a reward-maximizing policy or by attempting to gather maximal information about environment dynamics without taking the given task into account. In this work, we develop a method that allows us to plan for exploration while taking both the task and the current knowledge about the dynamics into account. The key insight to our approach is to plan an action sequence that maximizes the expected information gain about the optimal trajectory for the task at hand. We demonstrate that our method learns strong policies with 2x fewer samples than strong exploration baselines and 200x fewer samples than model-free methods on a diverse set of low-to-medium dimensional control tasks in both the open-loop and closed-loop control settings.

Submitted 6 October, 2022; originally announced October 2022.

Comments: Conference paper at NeurIPS 2022. Code available at https://github.com/fusion-ml/trajectory-information-rl. arXiv admin note: text overlap with arXiv:2112.05244

arXiv:2210.03684

doi 10.1093/mnras/stac2919

Galaxy Zoo: Clump Scout -- Design and first application of a two-dimensional aggregation tool for citizen science

Authors: Hugh Dickinson, Dominic Adams, Vihang Mehta, Claudia Scarlata, Lucy Fortson, Stephen Serjeant, Coleman Krawczyk, Sandor Kruk, Chris Lintott, Kameswara Mantha, Brooke D. Simmons, Mike Walmsley

Abstract: Galaxy Zoo: Clump Scout is a web-based citizen science project designed to identify and spatially locate giant star forming clumps in galaxies that were imaged by the Sloan Digital Sky Survey Legacy Survey. We present a statistically driven software framework that is designed to aggregate two-dimensional annotations of clump locations provided by multiple independent Galaxy Zoo: Clump Scout volunteers and generate a consensus label that identifies the locations of probable clumps within each galaxy. The statistical model our framework is based on allows us to assign false-positive probabilities to each of the clumps we identify, to estimate the skill levels of each of the volunteers who contribute to Galaxy Zoo: Clump Scout and also to quantitatively assess the reliability of the consensus labels that are derived for each subject. We apply our framework to a dataset containing 3,561,454 two-dimensional points, which constitute 1,739,259 annotations of 85,286 distinct subjects provided by 20,999 volunteers. Using this dataset, we identify 128,100 potential clumps distributed among 44,126 galaxies. This dataset can be used to study the prevalence and demographics of giant star forming clumps in low-redshift galaxies. The code for our aggregation software framework is publicly available at: https://github.com/ou-astrophysics/BoxAggregator

Submitted 7 October, 2022; originally announced October 2022.

Comments: 31 pages, 22 figures. Accepted for publication in Monthly Notices of the Royal Astronomical Society

arXiv:2207.04354

An Introduction to Lifelong Supervised Learning

Authors: Shagun Sodhani, Mojtaba Faramarzi, Sanket Vaibhav Mehta, Pranshu Malviya, Mohamed Abdelsalam, Janarthanan Janarthanan, Sarath Chandar

Abstract: This primer is an attempt to provide a detailed summary of the different facets of lifelong learning. We start with Chapter 2, which provides a high-level overview of lifelong learning systems. In this chapter, we discuss prominent scenarios in lifelong learning (Section 2.4), provide a high-level organization of different lifelong learning approaches (Section 2.5), enumerate the desiderata for an ideal lifelong learning system (Section 2.6), discuss how lifelong learning is related to other learning paradigms (Section 2.7), and describe common metrics used to evaluate lifelong learning systems (Section 2.8). This chapter is more useful for readers who are new to lifelong learning and want to get introduced to the field without focusing on specific approaches or benchmarks. The remaining chapters focus on specific aspects (either learning algorithms or benchmarks) and are more useful for readers who are looking for specific approaches or benchmarks. Chapter 3 focuses on regularization-based approaches that do not assume access to any data from previous tasks. Chapter 4 discusses memory-based approaches that typically use a replay buffer or an episodic memory to save a subset of data across different tasks. Chapter 5 focuses on different architecture families (and their instantiations) that have been proposed for training lifelong learning systems.
Following these different classes of learning algorithms, we discuss the commonly used evaluation benchmarks and metrics for lifelong learning (Chapter 6) and wrap up with a discussion of future challenges and important research directions in Chapter 7.

Submitted 12 July, 2022; v1 submitted 9 July, 2022; originally announced July 2022.

Comments: Lifelong Learning Primer
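The memory-based approaches in this primer keep a subset of past data in a replay buffer or episodic memory. One common way to fill a fixed-size memory from a stream of tasks is reservoir sampling; the sketch below assumes that strategy for illustration (it is not taken from the primer), and the class and method names are hypothetical.

```python
import random


class ReservoirBuffer:
    """Fixed-capacity episodic memory filled by reservoir sampling.

    After n items have been offered, each of them is retained with probability
    capacity / n, so the memory remains a uniform sample of the whole stream,
    across all tasks seen so far.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.num_seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.num_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # j is uniform on [0, num_seen); overwrite a slot with prob. capacity / num_seen.
            j = self.rng.randrange(self.num_seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, batch_size):
        """Draw a replay mini-batch (without replacement) from the memory."""
        return self.rng.sample(self.items, min(batch_size, len(self.items)))


buffer = ReservoirBuffer(capacity=100, seed=0)
for task_id in range(3):                 # three tasks arriving one after another
    for example_id in range(1000):
        buffer.add((task_id, example_id))
replay = buffer.sample(32)               # mixed into each new-task batch during training
print(len(buffer.items), len(replay))    # 100 32
```

Because retention does not depend on which task an item came from, older tasks keep roughly proportional representation in memory without any task labels being needed.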

arXiv:2206.11689

On the hardness of quadratic unconstrained binary optimization problems

Authors: Vrinda Mehta, Fengping Jin, Kristel Michielsen, Hans De Raedt

Abstract: We use exact enumeration to characterize the solutions of quadratic unconstrained binary optimization (QUBO) problems with fewer than 21 variables in terms of their distributions of Hamming distances to close-by solutions. We also perform experiments with the D-Wave Advantage 5.1 quantum annealer, solving many instances of quadratic unconstrained binary optimization problems with up to 170 variables. Our results demonstrate that the exponents characterizing the success probability of a D-Wave annealer to solve a QUBO correlate very well with the predictions based on the Hamming distance distributions computed for small problem instances.

Submitted 23 June, 2022; originally announced June 2022.

Comments: 6 pages, 6 figures
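Exact enumeration of a small QUBO instance is straightforward: evaluate E(x) = x^T Q x for all 2^n binary vectors, take the lowest-energy state, and measure how far the next-lowest states lie from it in Hamming distance. The sketch below assumes that reading of "close-by solutions" (the next few states in energy order); the paper's precise definition may differ, and the function names and random instance are hypothetical.

```python
import itertools
import random


def qubo_energy(Q, x):
    """E(x) = sum_{i,j} Q[i][j] * x_i * x_j for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def hamming_to_ground_state(Q, num_excited=10):
    """Enumerate all 2**n assignments; return the ground state and the Hamming
    distances from it to the next `num_excited` lowest-energy assignments."""
    n = len(Q)
    states = sorted(itertools.product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))
    ground = states[0]
    distances = [sum(a != b for a, b in zip(ground, s)) for s in states[1:num_excited + 1]]
    return ground, distances


rng = random.Random(1)
n = 10  # 2**10 = 1024 assignments; pure Python gets slow well before n = 20
Q = [[rng.uniform(-1, 1) if j >= i else 0.0 for j in range(n)] for i in range(n)]
ground, distances = hamming_to_ground_state(Q)
print("ground state:", ground)
print("Hamming distances of the next 10 states:", distances)
```

Repeating this over many random instances and histogramming the distances gives the kind of distribution the abstract relates to annealer success probabilities; large distances to near-degenerate states are what make an instance hard to land on exactly.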

arXiv:2205.12694

doi 10.18653/v1/2022.findings-emnlp.361

Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models

Authors: Clara Na, Sanket Vaibhav Mehta, Emma Strubell

Abstract: Model compression by way of parameter pruning, quantization, or distillation has recently gained popularity as an approach for reducing the computational requirements of modern deep neural network models for NLP. Inspired by prior works suggesting a connection between simpler, more generalizable models and those that lie within wider loss basins, we hypothesize that optimizing for flat minima should lead to simpler parameterizations and thus more compressible models. We propose to combine sharpness-aware minimization (SAM) with various task-specific model compression methods, including iterative magnitude pruning (IMP), structured pruning with a distillation objective, and post-training dynamic quantization. Empirically, we show that optimizing for flatter minima consistently leads to greater compressibility of parameters compared to vanilla Adam when fine-tuning BERT models, with little to no loss in accuracy on the GLUE text classification and SQuAD question answering benchmarks. Moreover, SAM finds superior winning tickets during IMP that 1) are amenable to vanilla Adam optimization, and 2) transfer more effectively across tasks.

Submitted 24 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: EMNLP 2022 Findings, 28 pages
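Sharpness-aware minimization, as usually formulated, takes two gradient evaluations per step: it first moves the weights to an approximate worst-case point within an L2 ball of radius rho, then updates the original weights with the gradient taken there. The sketch below uses plain gradient descent as the base optimizer (the paper compares against Adam-based fine-tuning) and a toy quadratic loss; the function names and hyperparameters are illustrative assumptions.

```python
import numpy as np


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM step with plain gradient descent as the base optimizer.

    1. Ascend to the first-order worst-case point w + eps, with ||eps|| = rho.
    2. Descend from w using the gradient evaluated at that perturbed point.
    """
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # small constant avoids division by zero
    g_sam = grad_fn(w + eps)                     # gradient at the perturbed weights
    return w - lr * g_sam


# Toy loss L(w) = 0.5 * w^T A w with one sharp and one flat direction.
A = np.diag([10.0, 0.5])


def loss(w):
    return 0.5 * w @ A @ w


def grad(w):
    return A @ w


w = np.array([1.0, 1.0])
for _ in range(50):
    w = sam_step(w, grad, lr=0.05, rho=0.05)
print(w, loss(w))
```

The perturbation is largest along high-curvature directions, so the update penalizes sharp regions of the loss surface; the paper's hypothesis is that parameters found this way prune and quantize with less damage.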

arXiv:2205.12169

doi 10.3847/1538-4357/acacfd

Investigating the Dominant Environmental Quenching Process in UVCANDELS/COSMOS Groups

Authors: Maxwell Kuschel, Claudia Scarlata, Vihang Mehta, Harry I. Teplitz, Marc Rafelski, Xin Wang, Ben Sunnquist, Laura Prichard, Norman Grogin, Rogier Windhorst, Michael Rutkowski, Anahita Alavi, Nima Chartab, Christopher J. Conselice, Y. Sophia Dai, Eric Gawiser, Mauro Giavalisco, Pablo Arrabal Haro, Nimish Hathi, Rolf Jansen, Zhiyuan Ji, Anton Koekemoer, Ray A. Lucas, Kameswara Mantha, Bahram Mobasher , et al. (14 additional authors not shown)

Abstract: We explore how the fraction of quenched galaxies changes in groups of galaxies with respect to the distance to the center of the group, redshift, and stellar mass to determine the dominant process of environmental quenching in $0.2 < z < 0.8$ groups. We use new UV data from the UVCANDELS project in addition to existing multiband photometry to derive new physical properties of the group galaxies from the zCOSMOS 20k Group Catalog. Limiting our analysis to a complete sample of log$(M_*/M_{\odot})>10.56$ group galaxies, we find that the probability of being quenched increases slowly with decreasing redshift, diverging from the stagnant field galaxy population. A corresponding analysis of how the probability of being quenched increases with time within groups suggests that the dominant environmental quenching process is characterized by slow ($\sim$Gyr) timescales. We find a quenching time of approximately $4.91^{+0.91}_{-1.47}$ Gyr, consistent with the slow processes of strangulation (Larson et al. 1980) and delayed-then-rapid quenching (Wetzel et al. 2013 arXiv:1206.3571v2 [astro-ph.CO]), although more data are needed to confirm this result.

Submitted 20 June, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

Journal ref: ApJ 947 17 (2023)

arXiv:2204.12026

BATS: Best Action Trajectory Stitching

Authors: Ian Char, Viraj Mehta, Adam Villaflor, John M. Dolan, Jeff Schneider

Abstract: The problem of offline reinforcement learning focuses on learning a good policy from a log of environment interactions. Past efforts for developing algorithms in this area have revolved around introducing constraints to online reinforcement learning algorithms to ensure the actions of the learned policy are constrained to the logged data. In this work, we explore an alternative approach by planning on the fixed dataset directly. Specifically, we introduce an algorithm which forms a tabular Markov Decision Process (MDP) over the logged data by adding new transitions to the dataset. We do this by using learned dynamics models to plan short trajectories between states. Since exact value iteration can be performed on this constructed MDP, it becomes easy to identify which trajectories are advantageous to add to the MDP. Crucially, since most transitions in this MDP come from the logged data, trajectories from the MDP can be rolled out for long periods with confidence. We prove that this property allows one to derive upper and lower bounds on the value function up to appropriate distance metrics. Finally, we demonstrate empirically how algorithms that uniformly constrain the learned policy to the entire dataset can result in unwanted behavior, and we show an example in which simply behavior cloning the optimal policy of the MDP created by our algorithm avoids this problem.

Submitted 25 April, 2022; originally announced April 2022.

Comments: Accepted to NeurIPS Offline RL Workshop 2021
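The planning step described in the BATS abstract can be illustrated with a minimal sketch, assuming the constructed MDP is deterministic and stored as a graph of (next state, reward) edges over logged states. The stitched edge below is hand-picked for illustration; in BATS such edges would be proposed by planning with learned dynamics models, which this sketch does not attempt. The names `value_iteration`, `logged` and `stitched` are hypothetical.

```python
from typing import Dict, List, Tuple

Edges = Dict[int, List[Tuple[int, float]]]  # state -> [(next_state, reward), ...]


def value_iteration(edges: Edges, n_states: int, gamma: float = 0.9,
                    tol: float = 1e-10, max_iters: int = 10_000):
    """Synchronous value iteration on a deterministic graph-shaped MDP.

    States without outgoing edges are treated as terminal (value 0).
    Returns the value list and a greedy policy mapping state -> next state.
    """
    V = [0.0] * n_states
    for _ in range(max_iters):
        new_V = list(V)
        for s, outs in edges.items():
            if outs:
                # Bellman optimality backup over the available edges
                new_V[s] = max(r + gamma * V[nxt] for nxt, r in outs)
        delta = max(abs(a - b) for a, b in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    policy = {s: max(outs, key=lambda e: e[1] + gamma * V[e[0]])[0]
              for s, outs in edges.items() if outs}
    return V, policy


# Two logged trajectories: 0 -> 1 -> 2 (no reward) and 3 -> 4 (reward 1).
logged: Edges = {0: [(1, 0.0)], 1: [(2, 0.0)], 3: [(4, 1.0)]}
V_logged, _ = value_iteration(logged, n_states=5)

# Hypothetical stitched edge 1 -> 3 (in BATS it would come from a learned
# dynamics model; here it is simply assumed to be feasible).
stitched: Edges = {s: list(outs) for s, outs in logged.items()}
stitched[1].append((3, 0.0))
V_stitched, policy = value_iteration(stitched, n_states=5)
# V_logged[0] == 0.0, V_stitched[0] is about 0.81, policy[1] == 3
```

Because the MDP is finite, value iteration converges exactly, so the benefit of a candidate edge shows up directly in the values of the states that can reach it.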

arXiv:2204.05553 [pdf, ps, other]

doi 10.1093/mnras/stac1052

The average dust attenuation curve at z~1.3 based on HST grism surveys

Authors: A. J. Battisti, M. B. Bagley, I. Baronchelli, Y. -S. Dai, A. L. Henry, M. A. Malkan, A. Alavi, D. Calzetti, J. Colbert, P. J. McCarthy, V. Mehta, M. Rafelski, C. Scarlata, I. Shivaei, E. Wisnioski

Abstract: We present the first characterisation of the average dust attenuation curve at $z\sim1.3$ by combining rest-frame ultraviolet through near-IR photometry with Balmer decrement ($\mathrm{H}α$/$\mathrm{H}β$) constraints for $\sim$900 galaxies with $8\lesssim\log (M_\star /M_\odot)<10.2$ at $0.75<z<1.5$ in the HST WFC3 IR Spectroscopic Parallel (WISP) and 3D-HST grism surveys. Using galaxies in SDSS, we establish that the ($\mathrm{H}α$+[NII])/[OIII] line ratio and stellar mass are good proxies for the Balmer decrement in low-spectral resolution grism data when only upper-limits on $\mathrm{H}β$ are available and/or $\mathrm{H}α$ is blended with [NII]. The slope of the $z\sim1.3$ attenuation curve ($A(0.15μm)/A(V)=3.15$) and its normalization ($R_V=3.26$) lie between the values found for $z=0$ and $z\sim2$ dust attenuation curves derived with similar methods. These provide supporting evidence that the average dust attenuation curve of star-forming galaxies evolves continuously with redshift. The $z\sim1.3$ curve has a mild 2175Å feature (bump amplitude, $E_b=0.83$; $\sim$25% that of the MW extinction curve), which is comparable to several other studies at $0<z\lesssim3$, and suggests that the average strength of this feature may not evolve significantly with redshift. The methods we develop to constrain dust attenuation from HST grism data can be applied to future grism surveys with JWST, Euclid, and RST.
These new facilities will detect millions of emission line galaxies and offer the opportunity to significantly improve our understanding of how and why dust attenuation curves evolve.

Submitted 12 April, 2022; originally announced April 2022.

Comments: 21 pages, 13 figures, 4 tables. Accepted for publication in MNRAS
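Two relations implicit in the dust attenuation abstract can be sketched in Python: the standard conversion from an observed Hα/Hβ ratio to a gas colour excess, and the identity $A(λ)/A(V) = k(λ)/R_V$, which follows from $A(λ) = k(λ)E(B-V)$ and $A(V) = R_V E(B-V)$. The intrinsic ratio of 2.86 (Case B, roughly $10^4$ K) and the Calzetti (2000) coefficients $k(\mathrm{H}β)=4.60$, $k(\mathrm{H}α)=3.33$ are common literature assumptions used only for illustration, not the $z\sim1.3$ curve derived in this paper; the function names are hypothetical.

```python
import math

CASE_B_HA_HB = 2.86  # assumed intrinsic Halpha/Hbeta (Case B, T ~ 1e4 K)


def gas_color_excess(ha_hb_obs: float, k_hb: float, k_ha: float) -> float:
    """E(B-V)_gas from the Balmer decrement: 2.5/(k_Hb - k_Ha) * log10(obs/intrinsic)."""
    return 2.5 / (k_hb - k_ha) * math.log10(ha_hb_obs / CASE_B_HA_HB)


def k_from_slope(a_lambda_over_a_v: float, r_v: float) -> float:
    """k(lambda) = R_V * A(lambda)/A(V), since both A's scale with E(B-V)."""
    return a_lambda_over_a_v * r_v


# Illustrative only: Calzetti (2000) k values, not the curve from the paper.
ebv = gas_color_excess(4.0, k_hb=4.60, k_ha=3.33)

# Values quoted in the abstract: A(0.15um)/A(V) = 3.15 and R_V = 3.26.
k_150nm = k_from_slope(3.15, 3.26)  # about 10.27
```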

arXiv:2203.11726 [pdf, other]

AI-enabled Assessment of Cardiac Systolic and Diastolic Function from Echocardiography

Authors: Esther Puyol-Antón, Bram Ruijsink, Baldeep S. Sidhu, Justin Gould, Bradley Porter, Mark K. Elliott, Vishal Mehta, Haotian Gu, Miguel Xochicale, Alberto Gomez, Christopher A. Rinaldi, Martin Cowie, Phil Chowienczyk, Reza Razavi, Andrew P. King

Abstract: Left ventricular (LV) function is an important factor in terms of patient management, outcome, and long-term survival of patients with heart disease. The most recently published clinical guidelines for heart failure recognise that over-reliance on only one measure of cardiac function (LV ejection fraction) as a diagnostic and treatment stratification biomarker is suboptimal. Recent advances in AI-based echocardiography analysis have shown excellent results on automated estimation of LV volumes and LV ejection fraction. However, from time-varying 2-D echocardiography acquisition, a richer description of cardiac function can be obtained by estimating functional biomarkers from the complete cardiac cycle. In this work we propose for the first time an AI approach for deriving advanced biomarkers of systolic and diastolic LV function from 2-D echocardiography based on segmentations of the full cardiac cycle. These biomarkers will allow clinicians to obtain a much richer picture of the heart in health and disease. The AI model is based on the 'nn-Unet' framework and was trained and tested using four different databases. Results show excellent agreement between manual and automated analysis and showcase the potential of the advanced systolic and diastolic biomarkers for patient stratification. Finally, for a subset of 50 cases, we perform a correlation analysis between clinical biomarkers derived from echocardiography and CMR and we show excellent agreement between the two modalities.

Submitted 21 July, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

Journal ref: MICCAI ASMUS 2020
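To make "functional biomarkers from the complete cardiac cycle" concrete, here is a minimal sketch that assumes a per-frame LV volume curve has already been obtained from the segmentations. Ejection fraction, peak ejection rate and peak filling rate are standard volume-curve measures chosen for illustration; the abstract does not list the paper's exact biomarkers, and the function name and sample curve are hypothetical.

```python
from typing import Dict, Sequence


def lv_volume_biomarkers(volumes_ml: Sequence[float], frame_rate_hz: float) -> Dict[str, float]:
    """Simple systolic/diastolic biomarkers from one cardiac cycle of LV volumes."""
    edv = max(volumes_ml)  # end-diastolic volume (largest volume in the cycle)
    esv = min(volumes_ml)  # end-systolic volume (smallest volume in the cycle)
    # Frame-to-frame volume change converted to mL/s
    rates = [(b - a) * frame_rate_hz for a, b in zip(volumes_ml, volumes_ml[1:])]
    return {
        "EDV_ml": edv,
        "ESV_ml": esv,
        "EF_percent": 100.0 * (edv - esv) / edv,
        "peak_ejection_rate_ml_s": -min(rates),  # fastest emptying (systolic)
        "peak_filling_rate_ml_s": max(rates),    # fastest filling (diastolic)
    }


# Hypothetical cycle sampled at 10 frames per second.
cycle = [120, 95, 65, 50, 55, 75, 100, 120]
markers = lv_volume_biomarkers(cycle, frame_rate_hz=10)
```

Raw frame-to-frame differences are noisy on real segmentations, so smoothing the volume curve before taking rates would be a sensible design choice.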

arXiv:2202.00118 [pdf]

doi 10.1103/PhysRevA.105.062406

Quantum annealing for hard 2-SAT problems: Distribution and scaling of minimum energy gap and success probability

Authors: Vrinda Mehta, Fengping Jin, Hans De Raedt, Kristel Michielsen

Abstract: In recent years, quantum annealing has gained the status of being a promising candidate for solving various optimization problems. Using a set of hard 2-satisfiability (2-SAT) problems, consisting of up to 18-variable problems, we analyze the scaling complexity of the quantum annealing algorithm and study the distributions of the minimum energy gap and the success probability. We extend the analysis of the standard quantum annealing Hamiltonian by introducing an additional term, the trigger Hamiltonian, which can be of two types: ferromagnetic and antiferromagnetic. We use these trigger Hamiltonians to study their influence on the success probability for solving the selected 2-SAT problems. We found that although the scaling of the run-time is exponential for the standard and modified quantum annealing Hamiltonians, the scaling constant in case of adding the trigger Hamiltonians can be significantly smaller. Furthermore, certain choices for the trigger Hamiltonian and annealing times can result in a better scaling than that for simulated annealing. Lastly, we also use the quantum annealers of D-Wave Systems Inc. to study their performance in solving the 2-SAT problems and compare it with the simulation results.

Submitted 31 January, 2022; originally announced February 2022.

Comments: 17 pages, 14 figures

Journal ref: Phys. Rev. A 105, 062406 (2022)
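The minimum energy gap studied in the quantum annealing abstract can be computed by exact diagonalization for very small instances. The sketch below assumes the interpolation $H(s) = (1-s)H_D + s(1-s)H_T + sH_P$ with driver $H_D = -\sum_i \sigma^x_i$, a diagonal $H_P$ counting violated clauses, and a trigger term built from $\sigma^x_a\sigma^x_b$ couplings on the variable pairs of each clause (negative coefficient ferromagnetic, positive antiferromagnetic). This schedule and coupling pattern are assumptions for illustration and may differ from the paper's; the 3-variable instance and helper names are hypothetical, and it is far too small to say anything about scaling.

```python
from functools import reduce

import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)


def sigma_x_on(qubits, n):
    """Kronecker product with sigma_x on `qubits`, identity elsewhere (qubit 0 = most significant bit)."""
    return reduce(np.kron, [X if q in qubits else I2 for q in range(n)])


def problem_hamiltonian(clauses, n):
    """Diagonal H_P: energy of a basis state = number of violated 2-SAT clauses."""
    energies = np.zeros(2 ** n)
    for z in range(2 ** n):
        bits = [(z >> (n - 1 - q)) & 1 for q in range(n)]
        for (a, pos_a), (b, pos_b) in clauses:
            # A literal (var, positive) is false when its bit differs from `positive`
            if bits[a] != pos_a and bits[b] != pos_b:
                energies[z] += 1
    return np.diag(energies)


def min_gap(clauses, n, trigger=0.0, steps=201):
    """Smallest E1 - E0 of H(s) over a uniform grid of s in [0, 1] (assumed schedule)."""
    h_d = -sum(sigma_x_on({q}, n) for q in range(n))
    h_p = problem_hamiltonian(clauses, n)
    h_t = trigger * sum(sigma_x_on({a, b}, n) for (a, _), (b, _) in clauses if a != b)
    gaps = []
    for s in np.linspace(0.0, 1.0, steps):
        h = (1 - s) * h_d + s * (1 - s) * h_t + s * h_p
        e = np.linalg.eigvalsh(h)  # eigenvalues in ascending order
        gaps.append(e[1] - e[0])
    return min(gaps)


# Hypothetical instance whose unique satisfying assignment is x0 = x1 = x2 = 1.
clauses = [((0, True), (1, True)), ((0, True), (1, False)),
           ((1, True), (2, True)), ((1, True), (2, False)),
           ((2, True), (0, False))]
gap_standard = min_gap(clauses, 3)
gap_ferro = min_gap(clauses, 3, trigger=-1.0)
```

Dense diagonalization costs grow as $8^n$, so this approach only suits a handful of variables; studies at 18 variables would need sparse or more specialized solvers.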

arXiv:2201.06581 [pdf, other]

doi 10.3847/1538-4357/ac6512

Galaxy Zoo: Clump Scout: Surveying the Local Universe for Giant Star-forming Clumps

Authors: Dominic Adams, Vihang Mehta, Hugh Dickinson, Claudia Scarlata, Lucy Fortson, Sandor Kruk, Brooke Simmons, Chris Lintott

Abstract: Massive, star-forming clumps are a common feature of high-redshift star-forming galaxies. How they formed, and why they are so rare at low redshift, remains unclear. In this paper we identify the largest sample yet of clumpy galaxies (7,052) at low redshift using data from the citizen science project Galaxy Zoo: Clump Scout, in which volunteers classified over 58,000 Sloan Digital Sky Survey (SDSS) galaxies spanning redshift $0.02 < z < 0.15$. We apply a robust completeness correction by comparing with simulated clumps identified by the same method. Requiring that the ratio of clump-to-galaxy flux in the SDSS $u$ band be greater than 8% (similar to clump definitions used by other works), we estimate the fraction of local galaxies hosting at least one clump ($f_{clumpy}$) to be $2.68_{-0.30}^{+0.33}\%$. We also compute the same fraction with a less stringent cut of 3% ($11.33_{-1.16}^{+0.89}\%$), as the higher number count and lower statistical noise of this fraction permits sharper comparison with future low-redshift clumpy galaxy studies. Our results reveal a sharp decline in $f_{clumpy}$ over $0 < z < 0.5$. The minor merger rate remains roughly constant over the same span, so we suggest that minor mergers are unlikely to be the primary driver of clump formation. Instead, the rate of galaxy turbulence is a better tracer for $f_{clumpy}$ over $0 < z < 1.5$ for galaxies of all masses, which supports the idea that clump formation is primarily driven by violent disk instability for all galaxy populations during this period.

Submitted 17 January, 2022; originally announced January 2022.

Comments: 23 pages, 13 figures, 4 tables, submitted to ApJ

arXiv:2112.09153 [pdf, other]

An Empirical Investigation of the Role of Pre-training in Lifelong Learning

Authors: Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, Emma Strubell

Abstract: The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning but also due to its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel data set of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for current task loss and loss basin sharpness to explicitly encourage wider basins during sequential fine-tuning.
We show that this optimization approach outperforms several state-of-the-art task-sequential continual learning algorithms across multiple settings, occasionally even without retaining a memory that scales in size with the number of tasks.

Submitted 29 August, 2023; v1 submitted 16 December, 2021; originally announced December 2021.

Journal ref: Journal of Machine Learning Research 24 (2023) 1-50
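Jointly optimizing the task loss and the sharpness of the loss basin is commonly done with sharpness-aware minimization (SAM, Foret et al.), which minimizes $\max_{\|\epsilon\|\le\rho} L(w+\epsilon)$ via a first-order approximation. The sketch below shows one SAM update in NumPy on a toy quadratic; treating SAM as the exact procedure of this paper is an assumption, and `sam_step`, the toy loss and the hyperparameters are hypothetical. In a lifelong setting the same step would simply be applied while fine-tuning on each task in sequence.

```python
import numpy as np


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One sharpness-aware minimization (SAM) update.

    1. Move to the approximate worst-case point within an L2 ball of radius rho.
    2. Descend from the original w using the gradient taken at that point.
    """
    g = grad_fn(w)
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return w  # stationary point: no ascent direction to follow
    eps = rho * g / norm
    return w - lr * grad_fn(w + eps)


# Toy "current task" loss: a quadratic that is sharp along the second axis.
A = np.diag([1.0, 10.0])


def loss(w):
    return 0.5 * w @ A @ w


def grad(w):
    return A @ w


w = np.array([1.0, 1.0])
initial_loss = loss(w)
for _ in range(200):
    w = sam_step(w, grad)
final_loss = loss(w)
```

Because the perturbation has fixed norm `rho`, SAM settles into a small neighbourhood of the minimum rather than converging exactly, which is the expected behaviour of this objective.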

arXiv:2112.06868 [pdf, other]

Variational autoencoders in the presence of low-dimensional data: landscape and implicit bias

Authors: Frederic Koehler, Viraj Mehta, Chenghui Zhou, Andrej Risteski

Abstract: Variational Autoencoders are one of the most commonly used generative models, particularly for image data. A prominent difficulty in training VAEs is data that is supported on a lower-dimensional manifold. Recent work by Dai and Wipf (2020) proposes a two-stage training algorithm for VAEs, based on a conjecture that in standard VAE training the generator will converge to a solution with 0 variance which is correctly supported on the ground truth manifold. They gave partial support for that conjecture by showing that some optima of the VAE loss do satisfy this property, but did not analyze the training dynamics. In this paper, we show that for linear encoders/decoders, the conjecture is true (that is, VAE training does recover a generator with support equal to the ground truth manifold) and that it does so due to an implicit bias of gradient descent rather than merely the VAE loss itself. In the nonlinear case, we show that VAE training frequently learns a higher-dimensional manifold which is a superset of the ground truth manifold.

Submitted 17 May, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

Comments: Accepted as a conference paper at ICLR 2022

arXiv:2112.05244 [pdf, other]

An Experimental Design Perspective on Model-Based Reinforcement Learning

Authors: Viraj Mehta, Biswajit Paria, Jeff Schneider, Stefano Ermon, Willie Neiswanger

Abstract: In many practical applications of RL, it is expensive to observe state transitions from the environment. For example, in the problem of plasma control for nuclear fusion, computing the next state for a given state-action pair requires querying an expensive transition function which can lead to many hours of computer simulation or dollars of scientific research. Such expensive data collection prohibits application of standard RL algorithms which usually require a large number of observations to learn. In this work, we address the problem of efficiently learning a policy while making a minimal number of state-action queries to the transition function. In particular, we leverage ideas from Bayesian optimal experimental design to guide the selection of state-action queries for efficient learning. We propose an acquisition function that quantifies how much information a state-action pair would provide about the optimal solution to a Markov decision process. At each iteration, our algorithm maximizes this acquisition function to choose the most informative state-action pair to be queried, thus yielding a data-efficient RL approach. We experiment with a variety of simulated continuous control problems and show that our approach learns an optimal policy with up to $5$ -- $1,000\times$ less data than model-based RL baselines and $10^3$ -- $10^5\times$ less data than model-free RL baselines. We also provide several ablated comparisons which point to substantial improvements arising from the principled method of obtaining data.

Submitted 15 March, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

Comments: Conference paper at ICLR 2022
