doi 10.1109/MCSE.2024.3375572

Training Next Generation AI Users and Developers at NCSA

Authors: Daniel S. Katz, Volodymyr Kindratenko, Olena Kindratenko, Priyam Mazumdar

Abstract: This article focuses on training work carried out in artificial intelligence (AI) at the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign via a research experience for undergraduates (REU) program named FoDOMMaT. It also describes why we are interested in AI, and concludes by discussing what we've learned from running this program and its predecessor over six years.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2404.10486 [pdf, other]

doi 10.1051/0004-6361/202449763

Discovery of a dormant 33 solar-mass black hole in pre-release Gaia astrometry

Authors: Gaia Collaboration, P. Panuzzo, T. Mazeh, F. Arenou, B. Holl, E. Caffau, A. Jorissen, C. Babusiaux, P. Gavras, J. Sahlmann, U. Bastian, Ł. Wyrzykowski, L. Eyer, N. Leclerc, N. Bauchet, A. Bombrun, N. Mowlavi, G. M. Seabroke, D. Teyssier, E. Balbinot, A. Helmi, A. G. A. Brown, A. Vallenari, T. Prusti, J. H. J. de Bruijne , et al. (390 additional authors not shown)

Abstract: Gravitational waves from black-hole merging events have revealed a population of extra-galactic black holes (BHs) residing in short-period binaries with masses that are higher than expected based on most stellar evolution models, and also higher than known stellar-origin black holes in our Galaxy. It has been proposed that those high-mass BHs are the remnants of massive metal-poor stars. Gaia astrometry is expected to uncover many Galactic wide-binary systems containing dormant BHs, which may not have been detected before. The study of this population will provide new information on the BH-mass distribution in binaries and shed light on their formation mechanisms and progenitors. As part of the validation efforts in preparation for the fourth Gaia data release (DR4), we analysed the preliminary astrometric binary solutions, obtained by the Gaia Non-Single Star pipeline, to verify their significance and to minimise false-detection rates in high-mass-function orbital solutions. The astrometric binary solution of one source, Gaia BH3, implies the presence of a $32.70 \pm 0.82\,M_\odot$ BH in a binary system with a period of 11.6 yr. Gaia radial velocities independently validate the astrometric orbit. Broad-band photometric and spectroscopic data show that the visible component is an old, very metal-poor giant of the Galactic halo, at a distance of 590 pc. The BH in the Gaia BH3 system is more massive than any other Galactic stellar-origin BH known thus far.
The low metallicity of the companion star supports the scenario that metal-poor massive stars are progenitors of the high-mass BHs detected by gravitational-wave telescopes. The Galactic orbit of the system and its metallicity indicate that it might belong to the Sequoia halo substructure. Alternatively, and more plausibly, it could belong to the ED-2 stream, which likely originated from a globular cluster that had been disrupted by the Milky Way.
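To give a feel for the orbit scale implied by the quoted mass, period, and distance, the sketch below applies Kepler's third law in solar-system units. It is an illustration only, not the Gaia pipeline: the companion mass of 0.8 solar masses is a hypothetical placeholder (the abstract does not give it), and the helper function names are introduced here.

```python
# Kepler's third law in solar-system units: a^3 [au^3] = M_total [M_sun] * P^2 [yr^2].
M_BH = 32.70    # black-hole mass quoted in the abstract (M_sun)
M_STAR = 0.8    # HYPOTHETICAL companion mass (M_sun); not given in the abstract
P_YR = 11.6     # orbital period quoted in the abstract (yr)
D_PC = 590.0    # distance quoted in the abstract (pc)


def semimajor_axis_au(total_mass_msun: float, period_yr: float) -> float:
    """Relative semi-major axis of a bound two-body orbit, in au."""
    return (total_mass_msun * period_yr ** 2) ** (1.0 / 3.0)


def angular_size_arcsec(length_au: float, distance_pc: float) -> float:
    """Small-angle relation: 1 au seen from 1 pc subtends 1 arcsec."""
    return length_au / distance_pc


a = semimajor_axis_au(M_BH + M_STAR, P_YR)
# The dark BH contributes no light, so the astrometric (photocentre) orbit is the
# visible star's orbit about the barycentre: a fraction M_BH / M_total of a.
a_star = a * M_BH / (M_BH + M_STAR)
theta = angular_size_arcsec(a, D_PC)
theta_star = angular_size_arcsec(a_star, D_PC)
print(f"relative orbit: {a:.1f} au ({theta * 1e3:.0f} mas)")
print(f"visible star's orbit: {a_star:.1f} au ({theta_star * 1e3:.0f} mas)")
```

Because the BH dominates the total mass, the result is insensitive to the placeholder companion mass: the relative orbit comes out at roughly 16.5 au, or a few tens of milliarcseconds at 590 pc.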

Submitted 19 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: 23 pages, accepted for publication in A&A Letters. New version with small fixes

arXiv:2403.19394 [pdf, ps, other]

Cycling on the Freeway: The Perilous State of Open Source Neuroscience Software

Authors: Britta U. Westner, Daniel R. McCloy, Eric Larson, Alexandre Gramfort, Daniel S. Katz, Arfon M. Smith, invited co-signees

Abstract: Most scientists need software to perform their research (Barker et al., 2020; Carver et al., 2022; Hettrick, 2014; Hettrick et al., 2014; Switters and Osimo, 2019), and neuroscientists are no exception. Whether we work with reaction times, electrophysiological signals, or magnetic resonance imaging data, we rely on software to acquire, analyze, and statistically evaluate the raw data we obtain, or to generate such data if we work with simulations. In recent years there has been a shift toward relying on free, open-source scientific software (FOSSS) for neuroscience data analysis (Poldrack et al., 2019), in line with the broader open science movement in academia (McKiernan et al., 2016) and wider industry trends (Eghbal, 2016). Importantly, FOSSS is typically developed by working scientists (not professional software developers), which sets up a precarious situation given the nature of the typical academic workplace (wherein academics, especially in their early careers, are on short, fixed-term contracts). In this paper, we will argue that the existing ecosystem of neuroscientific open source software is brittle, and discuss why and how the neuroscience community needs to come together to ensure a healthy growth of our software landscape to the benefit of all.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.16709 [pdf, other]

The crossover line in the $(T, μ)$-phase diagram of QCD

Authors: Jana N. Guenther, Szabolcs Borsányi, Zoltan Fodor, Ruben Kara, Sandor D. Katz, Paolo Parotto, Attila Pásztor, Claudia Ratti, Kalman K. Szabó

Abstract: An efficient way to study the QCD phase diagram at small finite density is to extrapolate thermodynamical observables from imaginary chemical potential. The phase diagram features a crossover line starting from the transition temperature already determined at zero chemical potential. In this work we focus on the Taylor expansion of this line up to $\mu^4$ contributions. We present the continuum extrapolation of the crossover temperature based on different observables at several lattice spacings.
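Written out, a Taylor expansion of the crossover line up to fourth order is often parameterized as below. This is a common convention in the lattice-QCD literature, shown as background rather than quoted from this abstract; the baryon chemical potential $\mu_B$ and the curvature coefficients $\kappa_2$, $\kappa_4$ are labels introduced here. Only even powers appear because the QCD partition function is even in $\mu_B$, which is also what allows fits made at imaginary $\mu_B$ (where $\mu_B^2 < 0$) to be continued to real $\mu_B$.

```latex
\frac{T_c(\mu_B)}{T_c(0)}
  = 1 - \kappa_2 \left(\frac{\mu_B}{T_c(0)}\right)^{2}
      - \kappa_4 \left(\frac{\mu_B}{T_c(0)}\right)^{4}
      + \mathcal{O}\!\left(\mu_B^{6}\right)
```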

Submitted 25 March, 2024; originally announced March 2024.

Comments: Proceedings of the Quark Matter Conference 2019

Journal ref: Nucl.Phys.A 982 (2019) 303-306

arXiv:2403.08963 [pdf, other]

Timing the Milky Way bar formation and the accompanying radial migration episode

Authors: Misha Haywood, Sergey Khoperskov, Valeria Cerqui, Paola Di Matteo, David Katz, Owain Snaith

Abstract: We derive the metallicity profile of the Milky Way low-$\alpha$ disc population from 2 to 20 kpc from the Galactic centre in 1 Gyr age bins using the astroNN catalogue, and show that it is highly structured, with a plateau between 4 and 7 kpc and a break at 10-12 kpc. We argue that these features result from the two main bar resonances, the corotation and the Outer Lindblad Resonance (OLR), respectively. We show that the break in the metallicity profile is most visible in stars aged 7-8 Gyr, reaching an amplitude of about 0.4 dex, and is the signpost of the position of the bar OLR. The bar formation was accompanied by an episode of radial migration, triggered by the bar's slowing down, which is responsible for spreading old metal-rich stars up to the OLR. The data show that the slowdown of the bar ended 6-7 Gyr ago. Based on numerical simulations that reproduce the observed break in the metallicity profile well, we argue that this implies that the bar formed in our Galaxy 8-10 Gyr ago. Analysis of the metallicity distribution as a function of radius shows no evidence of significant systematic outward radial migration after this first episode. We argue that the variation of the metallicity dispersion as a function of the guiding radius is dominated by the migration triggered by the bar, but also that the libration of orbits around the bar resonances induces a mixing that may have a significant impact on the observed metallicity dispersion.
In contrast, the absence of a break in the metallicity profile of populations younger than $\sim$6 Gyr and the flattening of the gradient at younger ages are interpreted as evidence that the strength of the bar has decreased, loosening its barrier effect and allowing the gas and metals on both sides of the OLR to mix, erasing the break. Beyond the OLR, stars younger than 7 Gyr show very small metallicity dispersion, suggesting no or limited mixing.
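The first step described above, a radial metallicity profile computed separately in 1 Gyr age bins, can be sketched as a binned median. This is a minimal illustration, not the authors' code: it assumes stars arrive as `(age_gyr, r_kpc, feh)` tuples (a hypothetical input format, not the astroNN catalogue schema), and the three toy stars in the usage lines are invented.

```python
from collections import defaultdict
from statistics import median


def binned_median_profile(stars, age_width=1.0, r_width=1.0):
    """Median [Fe/H] per (age bin, radius bin) cell.

    stars: iterable of (age_gyr, r_kpc, feh) tuples (hypothetical format).
    Returns {age_bin_start: [(r_bin_centre, median_feh, n_stars), ...]},
    with radius bins in increasing order within each age bin.
    """
    cells = defaultdict(list)
    for age, r, feh in stars:
        cells[(int(age // age_width), int(r // r_width))].append(feh)
    profiles = defaultdict(list)
    for (age_bin, r_bin), values in sorted(cells.items()):
        centre = (r_bin + 0.5) * r_width
        profiles[age_bin * age_width].append((centre, median(values), len(values)))
    return dict(profiles)


# Toy input (invented numbers): two stars near 9.5 kpc and one beyond 12 kpc, all aged 7-8 Gyr.
profile = binned_median_profile([(7.2, 9.5, 0.1), (7.8, 9.7, 0.0), (7.5, 12.3, -0.3)])
print(profile)
```

A break such as the one reported at 10-12 kpc would then appear as a step between adjacent radius bins within one age bin; the median is used here because it is less sensitive to outliers than the mean.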

Submitted 13 March, 2024; originally announced March 2024.

Comments: 20 pages, 23 figures. Submitted to A&A

arXiv:2403.04972 [pdf, ps, other]

On Abelian extensions in mixed characteristic and ramification in codimension one

Authors: Daniel Katz, Prashanth Sridhar

Abstract: A theorem of Paul Roberts states that the integral closure of a regular local ring in a generically abelian extension is Cohen-Macaulay, provided the characteristic of the residue field does not divide the order of the Galois group. An example of Koh shows the conclusion is false in the modular case. After a modification to the statement concerning ramification over $p$ in codimension one, we give an extension of Roberts's theorem to the modular case for unramified regular local rings in mixed characteristic when the $p$-torsion of the Galois group is annihilated by $p$.
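Roberts's theorem, as paraphrased in the abstract, can be stated compactly as follows. The notation is introduced here for illustration and is not taken from the paper; "modular case" then means that $p = \operatorname{char} k$ divides $|G|$, where Koh's example shows the implication can fail.

```latex
% Notation introduced here for illustration only.
% (R, \mathfrak{m}, k): regular local ring, fraction field K, residue field k
% L/K: finite Galois extension with abelian Galois group G
% S: integral closure of R in L
\operatorname{char} k \nmid |G|
  \quad\Longrightarrow\quad
  S \text{ is Cohen--Macaulay}
```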

Submitted 7 March, 2024; originally announced March 2024.

Comments: 21 pages

MSC Class: 13B05

arXiv:2402.02878 [pdf, other]

doi 10.1051/0004-6361/202348191

The Gaia RVS benchmark stars II. A sample of stars selected for their Gaia high radial velocity

Authors: E. Caffau, D. Katz, A. Gómez, P. Bonifacio, R. Lallement, P. Sartoretti, L. Sbordone, M. Spite, A. Mucciarelli, R. Ibata, L. Chemin, F. Thévenin, P. Panuzzo, N. Leclerc, P. François, H. -G. Ludwig, L. Monaco, M. Haywood, C. Soubiran

Abstract: The Gaia satellite has already provided the astronomical community with three data releases, and the Radial Velocity Spectrometer (RVS) on board Gaia has provided the radial velocity for 33 million stars. When the radial velocity was derived from the RVS spectra, several stars were measured to have large values. To verify the credibility of these measurements, we selected some bright stars with the modulus of radial velocity in excess of 500 km s$^{-1}$ to be observed with SOPHIE at OHP and UVES at VLT. This paper is devoted to investigating the chemical composition of the stars observed with UVES. We derived atmospheric parameters using Gaia photometry and parallaxes, and we performed a chemical analysis using the code. We find that the sample consists of metal-poor stars, although none have extremely low metallicities. The abundance patterns match what has been found in other samples of metal-poor stars selected irrespective of their radial velocities. We highlight the presence of three stars with low Cu and Zn abundances that are likely descendants of pair-instability supernovae. Two stars are apparently younger than 1 Ga, and their masses exceed twice the turn-off mass of metal-poor populations. This makes it unlikely that they are blue stragglers because it would imply they formed from triple or multiple systems. We suggest instead that they are young metal-poor stars accreted from a dwarf galaxy. Finally, we find that the star RVS721 is associated with the Gjoll stream, which itself is associated with the globular cluster NGC 3201.

Submitted 5 February, 2024; originally announced February 2024.

Comments: Astronomy and Astrophysics (A&A), in press

arXiv:2402.02824 [pdf]

FAIR-USE4OS: Guidelines for Creating Impactful Open-Source Software

Authors: Raphael Sonabend, Hugo Gruson, Leo Wolansky, Agnes Kiragga, Daniel S. Katz

Abstract: This paper extends the FAIR (Findable, Accessible, Interoperable, Reusable) guidelines to provide criteria for assessing whether software conforms to best practices in open source. By adding 'USE' (User-Centered, Sustainable, Equitable), software development can adhere to open source best practice by incorporating user input early on, ensuring front-end designs are accessible to all possible stakeholders, and planning long-term sustainability alongside software design. The FAIR-USE4OS guidelines will allow funders and researchers to more effectively evaluate and plan open source software projects. There is good evidence of funders increasingly mandating that all funded research software is open source; however, even under the FAIR guidelines, this could simply mean software released on public repositories with a Zenodo DOI. By creating FAIR-USE software, best practice can be demonstrated from the very beginning of the design process, giving the software the greatest chance of being impactful.

Submitted 3 April, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2312.07711 [pdf, other]

Leveraging Large Language Models to Build and Execute Computational Workflows

Authors: Alejandro Duque, Abdullah Syed, Kastan V. Day, Matthew J. Berry, Daniel S. Katz, Volodymyr V. Kindratenko

Abstract: The recent development of large language models (LLMs) with billions of parameters, coupled with the creation of user-friendly application programming interfaces (APIs), has paved the way for automatically generating and executing code in response to straightforward human queries. This paper explores how these emerging capabilities can be harnessed to facilitate complex scientific workflows, eliminating the need for traditional coding methods. We present initial findings from our attempt to integrate Phyloflow with OpenAI's function-calling API, and outline a strategy for developing a comprehensive workflow management system based on these concepts.
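The core mechanism behind function calling is that the model returns a function name plus JSON-encoded arguments, and the host program decides whether and how to run it. The sketch below shows only that dispatch side, under stated assumptions: the workflow steps `align_sequences` and `build_tree` are hypothetical stand-ins (not Phyloflow's API), and the model replies are scripted dictionaries, so no LLM or network call is made.

```python
import json
from typing import Any, Callable, Dict


# HYPOTHETICAL workflow steps; these names are illustrative, not Phyloflow's real API.
def align_sequences(fasta_path: str) -> str:
    return f"aligned({fasta_path})"


def build_tree(alignment: str, method: str = "neighbor-joining") -> str:
    return f"tree[{method}]({alignment})"


# Only functions registered here can be invoked by the model.
REGISTRY: Dict[str, Callable[..., Any]] = {
    "align_sequences": align_sequences,
    "build_tree": build_tree,
}


def dispatch(call: Dict[str, str]) -> Any:
    """Execute one model-proposed call shaped like {"name": ..., "arguments": "<JSON>"}."""
    name = call["name"]
    if name not in REGISTRY:
        raise ValueError(f"model requested unknown function: {name}")
    kwargs = json.loads(call["arguments"])  # arguments arrive as a JSON-encoded string
    return REGISTRY[name](**kwargs)


# Scripted stand-in for two successive model replies (no LLM is called here).
step1 = {"name": "align_sequences", "arguments": json.dumps({"fasta_path": "reads.fasta"})}
alignment = dispatch(step1)
step2 = {"name": "build_tree", "arguments": json.dumps({"alignment": alignment})}
result = dispatch(step2)
print(result)
```

The explicit registry acts as an allowlist: the model can only trigger steps the workflow author has exposed, which matters once generated calls are executed automatically.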

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.07528 [pdf, other]

Continuum extrapolated high order baryon fluctuations

Authors: Szabolcs Borsányi, Zoltán Fodor, Jana N. Guenther, Sándor D. Katz, Paolo Parotto, Attila Pásztor, Dávid Pesznyák, Kálmán K. Szabó, Chik Him Wong

Abstract: Fluctuations play a key role in the study of QCD phases. Lattice QCD is a valuable tool to calculate them, but going to high orders is challenging. Continuum results up to fourth order have been available since 2015. We present the first continuum results for sixth-order baryon fluctuations for temperatures between $T = 130$ and $200$ MeV, and for eighth order at $T = 145$ MeV, in a fixed volume. We show that for $T \leq 145$ MeV, relevant for the search for criticality, finite-volume effects are under control. Our results are in sharp contrast with well-known results in the literature obtained at finite lattice spacing.
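For orientation, the "order" of a baryon fluctuation usually refers to the number of derivatives of the pressure with respect to $\mu_B/T$ taken at zero chemical potential; these susceptibilities are the Taylor coefficients of the pressure. This is the standard definition, given as background rather than as the paper's exact conventions. Odd orders vanish at $\mu_B = 0$, so sixth and eighth order are the next terms after the fourth-order results mentioned above.

```latex
\chi_n^B(T) = \left.\frac{\partial^n \left(p/T^4\right)}{\partial \left(\mu_B/T\right)^n}\right|_{\mu_B = 0},
\qquad
\frac{p(T,\mu_B)}{T^4} = \frac{p(T,0)}{T^4}
  + \sum_{k=1}^{\infty} \frac{\chi_{2k}^B(T)}{(2k)!}\left(\frac{\mu_B}{T}\right)^{2k}
```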

Submitted 12 December, 2023; originally announced December 2023.

Comments: 5 pages, 2 figures (main text) + 5 pages, 7 figures (supplemental material)

arXiv:2310.06551 [pdf, other]

doi 10.1051/0004-6361/202347203

Gaia Focused Product Release: Sources from Service Interface Function image analysis -- Half a million new sources in omega Centauri

Authors: Gaia Collaboration, K. Weingrill, A. Mints, J. Castañeda, Z. Kostrzewa-Rutkowska, M. Davidson, F. De Angeli, J. Hernández, F. Torra, M. Ramos-Lerate, C. Babusiaux, M. Biermann, C. Crowley, D. W. Evans, L. Lindegren, J. M. Martín-Fleitas, L. Palaversa, D. Ruz Mieres, K. Tisanić, A. G. A. Brown, A. Vallenari, T. Prusti, J. H. J. de Bruijne, F. Arenou, A. Barbier , et al. (378 additional authors not shown)

Abstract: Gaia's readout window strategy is challenged by very dense fields in the sky. Therefore, in addition to standard Gaia observations, full Sky Mapper (SM) images were recorded for nine selected regions in the sky. A new software pipeline exploits these Service Interface Function (SIF) images of crowded fields (CFs), making use of the availability of the full two-dimensional (2D) information. This new pipeline produced half a million additional Gaia sources in the region of the omega Centauri ($\omega$ Cen) cluster, which are published with this Focused Product Release. We discuss the dedicated SIF CF data reduction pipeline, validate its data products, and introduce their Gaia archive table. Our aim is to improve the completeness of the Gaia source inventory in a very dense region in the sky, $\omega$ Cen. An adapted version of Gaia's Source Detection and Image Parameter Determination software located sources in the 2D SIF CF images. We validated the results by comparing them to the public Gaia DR3 catalogue and external Hubble Space Telescope data. With this Focused Product Release, 526 587 new sources have been added to the Gaia catalogue in $\omega$ Cen. Apart from positions and brightnesses, the additional catalogue contains parallaxes and proper motions, but no meaningful colour information. While SIF CF source parameters generally have a lower precision than nominal Gaia sources, in the cluster centre they increase the depth of the combined catalogue by three magnitudes and improve the source density by a factor of ten.
This first SIF CF data publication already adds great value to the Gaia catalogue. It demonstrates what to expect for the fourth Gaia catalogue, which will contain additional sources for all nine SIF CF regions.

Submitted 8 November, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

Journal ref: A&A 680, A35 (2023)

arXiv:2310.06295 [pdf, other]

doi 10.1051/0004-6361/202347273

Gaia Focused Product Release: A catalogue of sources around quasars to search for strongly lensed quasars

Authors: Gaia Collaboration, A. Krone-Martins, C. Ducourant, L. Galluccio, L. Delchambre, I. Oreshina-Slezak, R. Teixeira, J. Braine, J. -F. Le Campion, F. Mignard, W. Roux, A. Blazere, L. Pegoraro, A. G. A. Brown, A. Vallenari, T. Prusti, J. H. J. de Bruijne, F. Arenou, C. Babusiaux, A. Barbier, M. Biermann, O. L. Creevey, D. W. Evans, L. Eyer, R. Guerra , et al. (376 additional authors not shown)

Abstract: Context. Strongly lensed quasars are fundamental sources for cosmology. The Gaia space mission covers the entire sky with the unprecedented resolution of $0.18$" in the optical, making it an ideal instrument to search for gravitational lenses down to the limiting magnitude of 21. Nevertheless, the previous Gaia Data Releases are known to be incomplete for small angular separations such as those expected for most lenses. Aims. We present the Data Processing and Analysis Consortium GravLens pipeline, which was built to analyse all Gaia detections around quasars and to cluster them into sources, thus producing a catalogue of secondary sources around each quasar. We analysed the resulting catalogue to produce scores that indicate source configurations that are compatible with strongly lensed quasars. Methods. GravLens uses the DBSCAN unsupervised clustering algorithm to detect sources around quasars. The resulting catalogue of multiplets is then analysed with several methods to identify potential gravitational lenses. We developed and applied an outlier scoring method and a comparison between the average BP and RP spectra of the components, and we also used an extremely randomised tree algorithm. These methods produce scores to identify the most probable configurations and to establish a list of lens candidates. Results. We analysed the environment of 3 760 032 quasars. A total of 4 760 920 sources, including the quasars, were found within 6" of the quasar positions. This list is given in the Gaia archive.
In 87% of cases, the quasar remains a single source, and in 501 385 cases neighbouring sources were detected. We propose a list of 381 lensed candidates, of which we identified 49 as the most promising. Beyond these candidates, the associated tables in this Focused Product Release allow the entire community to explore the unique Gaia data for strong lensing studies further.
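GravLens is described as clustering individual Gaia detections around each quasar into sources with DBSCAN. The sketch below is a from-scratch, quadratic-time DBSCAN in plain Python to show the idea; it is not the pipeline's implementation. It assumes detections are given as `(x, y)` offsets in arcsec from the quasar, and the `eps` and `min_pts` values and the detection coordinates are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
NOISE = -1


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1).

    A point with at least min_pts neighbours within eps (itself included) is a
    core point; clusters grow outward from core points.
    """
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited

    def neighbours(i: int) -> List[int]:
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points) if math.hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # j is itself a core point: keep growing
                queue.extend(more)
    return labels  # type: ignore[return-value]


# Hypothetical detections (arcsec offsets): two compact groups and one stray detection.
detections = [
    (0.00, 0.00), (0.01, -0.01), (-0.01, 0.01),   # quasar image
    (1.20, 0.40), (1.21, 0.41), (1.19, 0.39),     # neighbouring source
    (4.00, -3.00),                                # isolated detection
]
labels = dbscan(detections, eps=0.05, min_pts=3)
n_sources = len({lab for lab in labels if lab != NOISE})
print(labels, n_sources)
```

DBSCAN suits this task because the number of sources around each quasar is not known in advance and isolated spurious detections are naturally left as noise instead of being forced into a cluster.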

Submitted 10 October, 2023; originally announced October 2023.

Comments: 35 pages, 60 figures, accepted for publication by Astronomy and Astrophysics

Journal ref: A&A 685, A130 (2024)

arXiv:2310.06051 [pdf, other]

Gaia Focused Product Release: Radial velocity time series of long-period variables

Authors: Gaia Collaboration, M. Trabucchi, N. Mowlavi, T. Lebzelter, I. Lecoeur-Taibi, M. Audard, L. Eyer, P. García-Lario, P. Gavras, B. Holl, G. Jevardat de Fombelle, K. Nienartowicz, L. Rimoldini, P. Sartoretti, R. Blomme, Y. Frémat, O. Marchal, Y. Damerdji, A. G. A. Brown, A. Guerrier, P. Panuzzo, D. Katz, G. M. Seabroke, K. Benson , et al. (382 additional authors not shown)

Abstract: The third Gaia Data Release (DR3) provided photometric time series of more than 2 million long-period variable (LPV) candidates. Anticipating the publication of full radial-velocity (RV) data in DR4, this Focused Product Release (FPR) provides RV time series for a selection of LPVs with high-quality observations. We describe the production and content of the Gaia catalog of LPV RV time series, and the methods used to compute variability parameters published in the Gaia FPR. Starting from the DR3 LPVs catalog, we applied filters to construct a sample of sources with high-quality RV measurements. We modeled their RV and photometric time series to derive their periods and amplitudes, and further refined the sample by requiring compatibility between the RV period and at least one of the $G$, $G_{\rm BP}$, or $G_{\rm RP}$ photometric periods. The catalog includes RV time series and variability parameters for 9 614 sources in the magnitude range $6\lesssim G/{\rm mag}\lesssim 14$, including a flagged top-quality subsample of 6 093 stars whose RV periods are fully compatible with the values derived from the $G$, $G_{\rm BP}$, and $G_{\rm RP}$ photometric time series. The RV time series contain a mean of 24 measurements per source taken unevenly over a duration of about three years. We identify the vast majority of sources (88%) as genuine LPVs, with about half of them showing a pulsation period and the other half displaying a long secondary period. The remaining 12% consist of candidate ellipsoidal binaries.
Quality checks against RVs available in the literature show excellent agreement. We provide illustrative examples and cautionary remarks. The publication of RV time series for almost 10 000 LPVs constitutes, by far, the largest such database available to date in the literature. The availability of simultaneous photometric measurements gives a unique added value to the Gaia catalog (abridged)
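The abstract does not say which period-search method was used, so the sketch below swaps in the classical Lomb-Scargle periodogram, a common choice for unevenly sampled data such as 24 RV epochs spread over about three years. Everything here is illustrative: the synthetic 300-day sinusoid, the period search range, and the 10% tolerance in `periods_compatible` are assumptions, not the catalogue's selection criteria.

```python
import math
import random
from typing import List, Sequence


def lomb_scargle(t: Sequence[float], y: Sequence[float], freqs: Sequence[float]) -> List[float]:
    """Classical (unnormalised) Lomb-Scargle power; freqs in cycles per unit of t."""
    mean = sum(y) / len(y)
    yc = [v - mean for v in y]
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The time offset tau makes the sine and cosine terms orthogonal over the samples.
        tau = math.atan2(sum(math.sin(2.0 * w * ti) for ti in t),
                         sum(math.cos(2.0 * w * ti) for ti in t)) / (2.0 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc_c = sum(a * b for a, b in zip(yc, c))
        yc_s = sum(a * b for a, b in zip(yc, s))
        power.append(0.5 * (yc_c ** 2 / sum(v * v for v in c) + yc_s ** 2 / sum(v * v for v in s)))
    return power


def best_period(t, y, p_min, p_max, n_grid=5000):
    """Period at the highest peak on a uniform frequency grid between 1/p_max and 1/p_min."""
    f_lo, f_hi = 1.0 / p_max, 1.0 / p_min
    freqs = [f_lo + k * (f_hi - f_lo) / (n_grid - 1) for k in range(n_grid)]
    power = lomb_scargle(t, y, freqs)
    return 1.0 / freqs[max(range(n_grid), key=power.__getitem__)]


def periods_compatible(p_rv, p_phot, rel_tol=0.1):
    """HYPOTHETICAL compatibility test: relative period difference below rel_tol."""
    return abs(p_rv - p_phot) / p_phot < rel_tol


# Synthetic RV curve: 24 uneven epochs over ~1000 d, true period 300 d (invented data).
rng = random.Random(1)
t_days = sorted(rng.uniform(0.0, 1000.0) for _ in range(24))
rv = [4.0 * math.sin(2.0 * math.pi * ti / 300.0 + 0.7) for ti in t_days]
p_rv = best_period(t_days, rv, p_min=50.0, p_max=1000.0)
print(f"recovered RV period ~ {p_rv:.1f} d")
print("compatible with a 305 d photometric period:", periods_compatible(p_rv, 305.0))
```

Real catalogue work would add noise weighting, a floating mean, and false-alarm estimates; the point here is only how an RV period can be extracted from sparse, uneven epochs and then compared with a photometric period.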

Submitted 9 October, 2023; originally announced October 2023.

Comments: 36 pages, 38 figures

arXiv:2309.14571 [pdf, ps, other]

Software Citation in HEP: Current State and Recommendations for the Future

Authors: Matthew Feickert, Daniel S. Katz, Mark S. Neubauer, Elizabeth Sexton-Kennedy, Graeme A. Stewart

Abstract: In November 2022, the HEP Software Foundation and the Institute for Research and Innovation for Software in High-Energy Physics organized a workshop on the topic of Software Citation and Recognition in HEP. The goal of the workshop was to bring together different types of stakeholders whose roles relate to software citation, and the associated credit it provides, in order to engage the community in a discussion on: the ways HEP experiments handle citation of software, recognition for software efforts that enable physics results disseminated to the public, and how the scholarly publishing ecosystem supports these activities. Reports were given from the publication board leadership of the ATLAS, CMS, and LHCb experiments and HEP open source software community organizations (ROOT, Scikit-HEP, MCnet), and perspectives were given from publishers (Elsevier, JOSS) and related tool providers (INSPIRE, Zenodo). This paper summarizes key findings and recommendations from the workshop as presented at the 26th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2023).

Submitted 4 January, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: 7 pages, 2 listings. Contribution to the Proceedings of the 26th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2023)

arXiv:2308.14954 [pdf]

Transitioning ECP Software Technology into a Foundation for Sustainable Research Software

Authors: Gregory R. Watson, Addi Malviya-Thakur, Daniel S. Katz, Elaine M. Raybourn, Bill Hoffman, Dana Robinson, John Kellerman, Clark Roundy

Abstract: Research software plays a crucial role in advancing scientific knowledge, but ensuring its sustainability, maintainability, and long-term viability is an ongoing challenge. The Sustainable Research Software Institute (SRSI) Model has been designed to address these concerns and presents a comprehensive framework to promote sustainable practices in the research software community. However, the SRSI Model does not specifically address the transitional requirements of the Exascale Computing Project (ECP) Software Technology (ECP-ST) focus area. This white paper provides an overview and detailed description of how ECP-ST will transition into the SRSI in a compressed time frame that a) meets the needs of the ECP end-of-technical-activities deadline; and b) ensures the continuity of the sustainability efforts that are already underway.

Submitted 30 August, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: 7 pages, 1 figure

Report number: 200366

arXiv:2308.14953 [pdf]

An Open Community-Driven Model For Sustainable Research Software: Sustainable Research Software Institute

Authors: Gregory R. Watson, Addi Malviya-Thakur, Daniel S. Katz, Elaine M. Raybourn, Bill Hoffman, Dana Robinson, John Kellerman, Clark Roundy

Abstract: Research software plays a crucial role in advancing scientific knowledge, but ensuring its sustainability, maintainability, and long-term viability is an ongoing challenge. To address these concerns, the Sustainable Research Software Institute (SRSI) Model presents a comprehensive framework designed to promote sustainable practices in the research software community. This white paper provides an in-depth overview of the SRSI Model, outlining its objectives, services, funding mechanisms, collaborations, and the significant potential impact it could have on the research software community. It explores the wide range of services offered, diverse funding sources, extensive collaboration opportunities, and the transformative influence of the SRSI Model on the research software landscape.

Submitted 30 August, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: 13 pages, 1 figure

Report number: 200363

arXiv:2308.07796 [pdf, other]

doi 10.1109/e-Science58273.2023.10254813

Research Software Engineering in 2030

Authors: Daniel S. Katz, Simon Hettrick

Abstract: This position paper for an invited talk on the "Future of eScience" discusses the Research Software Engineering Movement and where it might be in 2030. Because of the authors' experiences, it is aimed globally but with examples that focus on the United States and United Kingdom.

Submitted 27 September, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

Comments: Invited paper for 2023 IEEE Conference on eScience

arXiv:2308.07467 [pdf, ps, other]

Sequences with identical autocorrelation spectra

Authors: Daniel J. Katz, Adeebur Rahman, Michael J Ward

Abstract: Aperiodic autocorrelation measures the similarity between a finite-length sequence of complex numbers and translates of itself. Autocorrelation is important in communications, remote sensing, and scientific instrumentation. The autocorrelation function reports the aperiodic autocorrelation at every possible translation. Knowing the autocorrelation function of a sequence is equivalent to knowing the magnitude of its Fourier transform. Resolving the lack of phase information is called the phase problem. We say that two sequences are isospectral to mean that they have the same aperiodic autocorrelation function. Sequences used in technological applications often have restrictions on their terms: they are not arbitrary complex numbers, but come from an alphabet that may reside in a proper subring of the complex field or may come from a finite set of values. For example, binary sequences involve terms equal to only $+1$ and $-1$. In this paper, we investigate the necessary and sufficient conditions for two sequences to be isospectral, where we take their alphabet into consideration. There are trivial forms of isospectrality arising from modifications that predictably preserve the autocorrelation, for example, negating sequences or both conjugating their terms and writing them in reverse order. By an exhaustive search of binary sequences up to length $34$, we find that nontrivial isospectrality among binary sequences does occur, but is rare.
We say that a positive integer $n$ is barren to mean that there are no nontrivially isospectral binary sequences of length $n$. For integers $n \leq 34$, we found that the barren ones are $1$--$8$, $10$, $11$, $13$, $14$, $19$, $22$, $23$, $26$, and $29$. We prove that any multiple of a non-barren number is also not barren, and pose an open question as to whether there are finitely or infinitely many barren numbers.

Submitted 14 August, 2023; originally announced August 2023.

Comments: 12 pages

MSC Class: 94A12 42A05 42A38 42A85
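The exhaustive search described above can be sketched for short binary sequences. This is a brute-force illustration, not the authors' search code; it assumes "trivial" means the two operations named in the abstract, which for real $\pm 1$ terms reduce to negation and/or reversal (conjugation is the identity on real numbers). The function names are illustrative.

```python
from itertools import product


def autocorrelation(f):
    """Aperiodic autocorrelation C(s) = sum_j f[j] * f[j + s] for s = 0..len(f)-1.

    For real sequences C(-s) = C(s), so nonnegative shifts determine the
    whole autocorrelation function.
    """
    n = len(f)
    return tuple(sum(f[j] * f[j + s] for j in range(n - s)) for s in range(n))


def trivial_class(f):
    """All sequences obtained from a real sequence f by negation and/or reversal."""
    f = tuple(f)
    rev = f[::-1]
    return {f, rev, tuple(-x for x in f), tuple(-x for x in rev)}


def nontrivial_isospectral_pairs(n):
    """Group all 2**n binary sequences by autocorrelation and report nontrivial pairs."""
    groups = {}
    for f in product((1, -1), repeat=n):
        groups.setdefault(autocorrelation(f), []).append(f)
    pairs = []
    for seqs in groups.values():
        for i, f in enumerate(seqs):
            for g in seqs[i + 1:]:
                if g not in trivial_class(f):
                    pairs.append((f, g))
    return pairs
```

Under these assumptions the sketch finds no nontrivial pairs for lengths 1 through 8 and at least one for length 9, in line with the barren list above. Enumeration costs $2^n$ autocorrelation evaluations, so it only scales to small lengths.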

arXiv:2308.06105 [pdf, other]

Can rooted staggered fermions describe nonzero baryon density at low temperatures?

Authors: Szabolcs Borsanyi, Zoltan Fodor, Matteo Giordano, Jana N. Guenther, Sandor D. Katz, Attila Pasztor, Chik Him Wong

Abstract: Research on the QCD phase diagram with lattice field theory methods is dominated by the use of rooted staggered fermions, as they are the computationally cheapest discretization available. We show that rooted staggered fermions at a nonzero baryochemical potential $μ_B$ predict a sharp rise in the baryon density at low temperatures and $μ_B \gtrsim 3 m_π/2$, where $m_π$ is the Goldstone pion mass. We elucidate the nature of the non-analyticity behind this sharp rise in the density by a comparison of reweighting results with a Taylor expansion of high order. While at first sight this non-analytic behavior becomes apparent at the same position where the pion condensation transition takes place in the phase-quenched theory, the nature of the non-analyticity in the two theories appears to be quite different: While at nonzero isospin density the data are consistent with a genuine thermodynamic (branch-point) singularity, the results at nonzero baryon density point to an essential singularity at $μ_B=0$. The effect is absent for four flavors of degenerate quarks, where rooting is not used. For the two-flavor case, we show numerical evidence that the magnitude of the effect diminishes on finer lattices. We discuss the implications of this technical complication on future studies of the QCD phase diagram.

Submitted 11 August, 2023; originally announced August 2023.

Comments: 12 pages, 6 figures

arXiv:2307.15657 [pdf, ps, other]

Almost perfect nonlinear power functions with exponents expressed as fractions

Authors: Daniel J. Katz, Kathleen R. O'Connor, Kyle Pacheco, Yakov Sapozhnikov

Abstract: Let $F$ be a finite field, let $f$ be a function from $F$ to $F$, and let $a$ be a nonzero element of $F$. The discrete derivative of $f$ in direction $a$ is $Δ_a f \colon F \to F$ with $(Δ_a f)(x)=f(x+a)-f(x)$. The differential spectrum of $f$ is the multiset of cardinalities of all the fibers of all the derivatives $Δ_a f$ as $a$ runs through $F^*$. The function $f$ is almost perfect nonlinear (APN) if the largest cardinality in the differential spectrum is $2$. Almost perfect nonlinear functions are of interest as cryptographic primitives. If $d$ is a positive integer, the power function over $F$ with exponent $d$ is the function $f \colon F \to F$ with $f(x)=x^d$ for every $x \in F$. There is a small number of known infinite families of APN power functions. In this paper, we re-express the exponents for one such family in a more convenient form. This enables us to give the differential spectrum and, even more, to determine the sizes of individual fibers of derivatives.

Submitted 28 July, 2023; originally announced July 2023.

Comments: 30 pages
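The definitions above translate directly into a brute-force computation. The sketch below restricts to prime fields $\mathrm{GF}(p)$, where arithmetic is integer arithmetic mod $p$, and uses the classical cube map $x^3$ as its example; it does not implement the fractional-exponent family studied in the paper.

```python
from collections import Counter


def differential_spectrum(d, p):
    """Differential spectrum of f(x) = x^d over the prime field GF(p).

    Returns a Counter mapping each fiber size to the number of fibers of that
    size, taken over all derivatives Delta_a f with a != 0 and all targets b.
    """
    spectrum = Counter()
    for a in range(1, p):
        # Values of (Delta_a f)(x) = f(x + a) - f(x) for every x in GF(p).
        values = Counter((pow(x + a, d, p) - pow(x, d, p)) % p for x in range(p))
        # Every b in GF(p) has a fiber; a value never hit has fiber size 0.
        for b in range(p):
            spectrum[values[b]] += 1
    return spectrum


def is_apn(d, p):
    """APN means the largest fiber size in the differential spectrum is exactly 2."""
    return max(differential_spectrum(d, p)) == 2
```

For $p > 3$, $\Delta_a x^3 = 3ax^2 + 3a^2x + a^3$ is a quadratic in $x$, so every fiber has at most two elements and the sketch reports $x^3$ as APN. By contrast $x^2$ has affine derivatives (all fibers of size 1), and over $\mathrm{GF}(3)$ the map $x^3$ is the identity, so its derivatives are constant.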

arXiv:2307.14566 [pdf, ps, other]

Limiting Moments of Autocorrelation Demerit Factors of Binary Sequences

Authors: Daniel J. Katz, Miriam E. Ramirez

Abstract: An aperiodic binary sequence of length $\ell$ is a doubly infinite sequence $f=\ldots,f_{-1},f_0,f_1,\ldots$ with $f_j \in \{-1,1\}$ when $0 \leq j < \ell$ and $f_j=0$ otherwise. Various problems in engineering and natural science demand binary sequences that do not resemble translates of themselves. The autocorrelation of $f$ at shift $s$ is the dot product of $f$ with the sequence obtained by translating $f$ by $s$ places. The demerit factor of $f$ is the sum of the squares of the autocorrelations at all nonzero shifts for the sequence obtained by normalizing $f$ to unit Euclidean norm. A low demerit factor therefore indicates low self-similarity under translation. We endow the $2^\ell$ binary sequences of length $\ell$ with the uniform probability measure and consider the distribution of their demerit factors. Earlier works used combinatorial techniques to find exact formulas for the mean, variance, skewness, and kurtosis of the distribution as a function of $\ell$. These revealed that for $\ell \geq 4$, the $p$th central moment of this distribution is strictly positive for every $p \geq 2$. This article shows that every $p$th central moment is a quasi-polynomial function of $\ell$ with rational coefficients divided by $\ell^{2 p}$. It also shows that, in the limit as $\ell$ tends to infinity, the $p$th standardized moment is the same as that of the standard normal distribution.

Submitted 11 June, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

Comments: 27 pages

MSC Class: 60C05; 94A55; 05A99; 05A18; 05E18
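The demerit factor defined above can be computed exactly with rational arithmetic. Normalizing a $\pm 1$ sequence of length $\ell$ to unit norm divides each autocorrelation by $\ell$, so the demerit factor is $\sum_{s \neq 0} C_f(s)^2 / \ell^2$. The Barker sequence of length 13 below is a standard illustrative input, not an example taken from the paper.

```python
from fractions import Fraction


def autocorrelation(f, s):
    """Aperiodic autocorrelation of a finite +/-1 sequence at shift s.

    Terms outside 0 <= j < len(f) are zero, so only overlapping terms contribute.
    """
    n = len(f)
    s = abs(s)  # real sequences satisfy C(-s) = C(s)
    return sum(f[j] * f[j + s] for j in range(n - s))


def demerit_factor(f):
    """Sum of squared autocorrelations at all nonzero shifts, after unit-norm scaling."""
    n = len(f)
    total = sum(autocorrelation(f, s) ** 2 for s in range(-(n - 1), n) if s != 0)
    # Scaling f by 1/sqrt(n) scales each C(s) by 1/n, hence each square by 1/n**2.
    return Fraction(total, n * n)


barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
print(demerit_factor(barker13), 1 / demerit_factor(barker13))  # 12/169 169/12
```

The Barker sequence has off-peak autocorrelations of magnitude at most 1 (here 1 at the six even positive shifts and 0 at the odd ones), giving demerit factor $12/169$ and merit factor $169/12 \approx 14.08$.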

arXiv:2307.14281 [pdf, ps, other]

Moments of Autocorrelation Demerit Factors of Binary Sequences

Authors: Daniel J. Katz, Miriam E. Ramirez

Abstract: Sequences with low aperiodic autocorrelation are used in communications and remote sensing for synchronization and ranging. The autocorrelation demerit factor of a sequence is the sum of the squared magnitudes of its autocorrelation values at every nonzero shift when we normalize the sequence to have unit Euclidean length. The merit factor, introduced by Golay, is the reciprocal of the demerit factor. We consider the uniform probability measure on the $2^\ell$ binary sequences of length $\ell$ and investigate the distribution of the demerit factors of these sequences. Sarwate and Jedwab have respectively calculated the mean and variance of this distribution. We develop new combinatorial techniques to calculate the $p$th central moment of the demerit factor for binary sequences of length $\ell$. These techniques prove that for $p\geq 2$ and $\ell \geq 4$, all the central moments are strictly positive. For any given $p$, one may use the technique to obtain an exact formula for the $p$th central moment of the demerit factor as a function of the length $\ell$. Jedwab's formula for variance is confirmed by our technique with a short calculation, and we go beyond previous results by also deriving an exact formula for the skewness. A computer-assisted application of our method also obtains exact formulas for the kurtosis, which we report here, as well as the fifth central moment.

Submitted 7 June, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

Comments: 41 pages

MSC Class: 60C05; 94A55; 05A99; 05A18; 05E18
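For small lengths, the moments discussed above can be checked by enumerating all $2^\ell$ sequences with exact fractions. This brute-force check is not the combinatorial technique of the paper; it only serves as a sanity check. Because each product $f_j f_{j+s}$ with $s \neq 0$ has mean zero and distinct products are uncorrelated, $E[C(s)^2] = \ell - |s|$, which gives mean demerit factor $(\ell-1)/\ell$.

```python
from fractions import Fraction
from itertools import product


def demerit_factor(f):
    """Exact demerit factor of a +/-1 sequence (squared autocorrelations over n**2)."""
    n = len(f)
    total = 0
    for s in range(1, n):
        c = sum(f[j] * f[j + s] for j in range(n - s))
        total += 2 * c * c  # shifts s and -s contribute equally for real sequences
    return Fraction(total, n * n)


def demerit_moments(n, max_p):
    """Mean and central moments 2..max_p of the demerit factor over all 2**n sequences."""
    values = [demerit_factor(f) for f in product((1, -1), repeat=n)]
    mean = sum(values, Fraction(0)) / len(values)
    central = {
        p: sum(((v - mean) ** p for v in values), Fraction(0)) / len(values)
        for p in range(2, max_p + 1)
    }
    return mean, central
```

The enumeration reproduces the mean $(\ell-1)/\ell$ and, for $4 \leq \ell \leq 9$, strictly positive second through fourth central moments, as the abstract states. It costs $O(2^\ell \ell^2)$ and is only practical for short lengths.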

arXiv:2307.11383 [pdf, ps, other]

Wanted: standards for automatic reproducibility of computational experiments

Authors: Samuel Grayson, Reed Milewicz, Joshua Teves, Daniel S. Katz, Darko Marinov

Abstract: Those seeking to reproduce a computational experiment often need to manually look at the code to see how to build necessary libraries, configure parameters, find data, and invoke the experiment; it is not automatic. Automatic reproducibility is a more stringent goal, but working towards it would benefit the community. This work discusses a machine-readable language for specifying how to execute a computational experiment. We invite interested stakeholders to discuss this language at https://github.com/charmoniumQ/execution-description .

Submitted 21 July, 2023; originally announced July 2023.

Comments: Submitted to SE4RS'23 Portland, OR
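The four manual steps listed above (build libraries, configure parameters, find data, invoke) suggest what a machine-readable description needs to capture. The sketch below is purely hypothetical: the `ExperimentSpec` fields and the generated shell script are illustrative assumptions, not the language discussed at the linked repository.

```python
import shlex
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ExperimentSpec:
    """Hypothetical machine-readable description of how to run an experiment."""
    build: List[List[str]]     # commands that build the necessary libraries
    parameters: Dict[str, str]  # configuration, passed as environment variables
    data: Dict[str, str]        # local path -> URL where the input data lives
    invoke: List[str]           # command that runs the experiment


def to_shell_script(spec: ExperimentSpec) -> str:
    """Render the spec as a POSIX shell script that reproduces the run end to end."""
    lines = ["#!/bin/sh", "set -e"]
    for path, url in spec.data.items():
        lines.append(f"curl -L -o {shlex.quote(path)} {shlex.quote(url)}")
    for cmd in spec.build:
        lines.append(shlex.join(cmd))
    env = " ".join(f"{k}={shlex.quote(v)}" for k, v in sorted(spec.parameters.items()))
    lines.append(f"{env} {shlex.join(spec.invoke)}".strip())
    return "\n".join(lines)
```

Keeping the description declarative and generating the script from it means a tool could also validate, diff, or re-target the same spec without parsing free-form README instructions.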

arXiv:2307.11060 [pdf, ps, other]

The Changing Role of RSEs over the Lifetime of Parsl

Authors: Daniel S. Katz, Ben Clifford, Yadu Babuji, Kevin Hunter Kesling, Anna Woodard, Kyle Chard

Abstract: This position paper describes the Parsl open source research software project and its various phases over seven years. It defines four types of research software engineers (RSEs) who have been important to the project in those phases; we believe this is also applicable to other research software projects.

Submitted 20 July, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: 3 pages

arXiv:2307.07630 [pdf, other]

Optical Studies of Seven Bright Southern Cataclysmic Variable Stars

Authors: John R. Thorstensen, Chase K. Alvarado-Anderson, Abigail D. Burrows, Rowan M. Goebel-Bain, David C. Katz

Abstract: We report spectroscopic observations of seven bright southern cataclysmic variable stars, collected on a single two-week observing run using the 1.9-m Radcliffe telescope at the South African Astronomical Observatory. We used radial velocity time series, in some cases in combination with other data, to determine or clarify orbital periods for five of them, namely ATO J061.1478-31.0634, BMAM-V547, MGAB-V202, NSV 4202, and V1147 Cen. For BMAM-V547, we use data from the Transiting Exoplanet Survey Satellite (TESS) to corroborate and sharpen the orbital period; the TESS data also show a photometric period near 3.93 d, likely indicating precession of the accretion disk. Also, we find a periodic modulation in the radial velocities of the SU UMa-type dwarf nova Var Ret2005, but are unable to specify a unique cycle count. Finally, we show a spectrum of ASASSN-V J061528.41-412007.3 that appears typical of a luminous novalike variable.

Submitted 14 July, 2023; originally announced July 2023.

Comments: 12 pages, 13 figures. Accepted for The Astronomical Journal
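Determining an orbital period from a radial velocity time series, as described above, can be illustrated with a grid search that fits a sinusoid $v(t) = \gamma + A\sin\omega t + B\cos\omega t$ by linear least squares at each trial period and keeps the best fit. The abstract does not say which period-finding method the authors used; this is a generic sketch in pure Python.

```python
import math


def _solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def sine_fit_residual(times, velocities, period):
    """Least-squares fit of gamma + A sin(wt) + B cos(wt); return the sum of squared residuals."""
    w = 2 * math.pi / period
    rows = [(1.0, math.sin(w * t), math.cos(w * t)) for t in times]
    # Normal equations (X^T X) beta = X^T v for the three linear parameters.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtv = [sum(r[i] * v for r, v in zip(rows, velocities)) for i in range(3)]
    beta = _solve3(xtx, xtv)
    return sum((v - sum(b * x for b, x in zip(beta, r))) ** 2 for r, v in zip(rows, velocities))


def best_period(times, velocities, trial_periods):
    """Trial period whose sinusoid fit leaves the smallest residual."""
    return min(trial_periods, key=lambda p: sine_fit_residual(times, velocities, p))
```

With real data, noise and nightly gaps produce alias peaks near the true period, which is why a unique cycle count (as for Var Ret2005 above) cannot always be established from a single observing run.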

arXiv:2306.14414 [pdf, ps, other]

Rationality of Four-Valued Families of Weil Sums of Binomials

Authors: Daniel J. Katz, Allison E. Wong

Abstract: We investigate the rationality of Weil sums of binomials of the form $W^{K,s}_u=\sum_{x \in K} ψ(x^s - u x)$, where $K$ is a finite field whose canonical additive character is $ψ$, and where $u$ is an element of $K^{\times}$ and $s$ is a positive integer relatively prime to $|K^\times|$, so that $x \mapsto x^s$ is a permutation of $K$. The Weil spectrum for $K$ and $s$, which is the family of values $W^{K,s}_u$ as $u$ runs through $K^\times$, is of interest in arithmetic geometry and in several information-theoretic applications. The Weil spectrum always contains at least three distinct values if $s$ is nondegenerate (i.e., if $s$ is not a power of $p$ modulo $|K^\times|$, where $p$ is the characteristic of $K$). It is already known that if the Weil spectrum contains precisely three distinct values, then they must all be rational integers. We show that if the Weil spectrum contains precisely four distinct values, then they must all be rational integers, with the sole exception of the case where $|K|=5$ and $s \equiv 3 \pmod{4}$.

Submitted 6 April, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: 33 pages

MSC Class: 11T24; 11L05; 11L40; 11T22; 11G25; 11T71; 94A55; 94A60; 94B15
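The Weil spectrum defined above is easy to compute numerically when $K$ is a prime field $\mathrm{GF}(p)$, whose canonical additive character is $ψ(y) = e^{2\pi i y/p}$. The sketch below handles only prime fields. Since $s$ is coprime to the even number $p-1$ (for odd $p$), $s$ is odd, so substituting $x \mapsto -x$ shows each $W_u$ is real.

```python
import cmath


def weil_spectrum(p, s):
    """W_u = sum over x in GF(p) of psi(x^s - u x), for u = 1..p-1.

    Uses psi(y) = exp(2*pi*i*y/p), the canonical additive character of GF(p).
    """
    def psi(y):
        return cmath.exp(2j * cmath.pi * (y % p) / p)

    return {u: sum(psi(pow(x, s, p) - u * x) for x in range(p)) for u in range(1, p)}


def distinct_values(spectrum, digits=9):
    """Distinct (real) Weil sum values, rounded to absorb floating-point error."""
    return sorted({round(w.real, digits) for w in spectrum.values()})
```

For $|K| = 5$ and $s = 3$ the sketch returns four distinct values, approximately $-2.236$, $1.382$, $2.236$, and $3.618$, none of which are integers. This is the exceptional case named in the abstract. As a check, the values sum to $p$, since $\sum_{u \in K} W_u = p$ and $W_0 = \sum_x ψ(x^s) = 0$ for a permutation exponent. The degenerate exponent $s = 1$ yields only the two values $0$ and $p$.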

arXiv:2306.11615 [pdf, other]

Fine-grained Policy-driven I/O Sharing for Burst Buffers

Authors: Ed Karrels, Lei Huang, Yuhong Kan, Ishank Arora, Yinzhi Wang, Daniel S. Katz, William D. Gropp, Zhao Zhang

Abstract: A burst buffer is a common method to bridge the performance gap between the I/O needs of modern supercomputing applications and the performance of the shared file system on large-scale supercomputers. However, existing I/O sharing methods require resource isolation, offline profiling, or repeated execution that significantly limit the utilization and applicability of these systems. Here we present ThemisIO, a policy-driven I/O sharing framework for a remote-shared burst buffer: a dedicated group of I/O nodes, each with a local storage device. ThemisIO preserves high utilization by implementing opportunity fairness so that it can reallocate unused I/O resources to other applications. ThemisIO accurately and efficiently allocates I/O cycles among applications, purely based on real-time I/O behavior without requiring user-supplied information or offline-profiled application characteristics. ThemisIO supports a variety of fair sharing policies, such as user-fair, size-fair, as well as composite policies, e.g., group-then-user-fair. All these features are enabled by its statistical token design. ThemisIO can alter the execution order of incoming I/O requests based on assigned tokens to precisely balance I/O cycles between applications via time slicing, thereby enforcing processing isolation. Experiments using I/O benchmarks show that ThemisIO sustains 13.5-13.7% higher I/O throughput and 19.5-40.4% lower performance variation than existing algorithms. For real applications, ThemisIO reduces the slowdown caused by I/O interference by 59.1-99.8%.

Submitted 20 June, 2023; originally announced June 2023.
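The abstract does not detail ThemisIO's statistical token design, so the following is only a lottery-scheduling style sketch of the ideas it names: a user-fair policy, token-based choice of the next request, and opportunity fairness (jobs with no pending I/O hold no tokens, so their share flows to active jobs). `TokenScheduler` and its methods are hypothetical names, and time slicing is not modeled.

```python
import random
from collections import deque


class TokenScheduler:
    """Hypothetical sketch: choose the next I/O request by a weighted random token draw.

    Under the user-fair policy each user with pending I/O gets an equal share,
    split evenly among that user's active jobs. Idle jobs get no share, so
    unused capacity is redistributed to jobs that are actually issuing I/O.
    """

    def __init__(self, seed=None):
        self.queues = {}  # job id -> deque of pending requests
        self.owner = {}   # job id -> user
        self.rng = random.Random(seed)

    def submit(self, job, user, request):
        self.owner[job] = user
        self.queues.setdefault(job, deque()).append(request)

    def shares(self):
        """Token share per active job under a user-fair policy."""
        active = [j for j, q in self.queues.items() if q]
        users = {self.owner[j] for j in active}
        share = {}
        for j in active:
            jobs_of_user = sum(1 for k in active if self.owner[k] == self.owner[j])
            share[j] = 1.0 / (len(users) * jobs_of_user)
        return share

    def next_request(self):
        """Draw a job in proportion to its share and pop its oldest request."""
        share = self.shares()
        if not share:
            return None
        jobs = list(share)
        job = self.rng.choices(jobs, weights=[share[j] for j in jobs])[0]
        return job, self.queues[job].popleft()
```

Recomputing shares from the set of currently active jobs at every draw is what lets the sketch avoid offline profiling: fairness depends only on which jobs have I/O pending right now.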

arXiv:2306.03126 [pdf, other]

doi 10.1051/0004-6361/202346334

Stragglers of the thick disc

Authors: Valeria Cerqui, Misha Haywood, Paola Di Matteo, David Katz, Frédéric Royer

Abstract: Young alpha-rich (YAR) stars have been detected in the past as outliers to the local age-[$α$/Fe] relation. These objects are enhanced in $α$-elements but apparently younger than typical thick disc stars. We study the global kinematics and chemical properties of YAR giant stars in the APOGEE DR17 survey and show that they have properties similar to those of the standard thick disc stellar population. This leads us to conclude that YAR are rejuvenated thick disc objects, most probably evolved blue stragglers. This is confirmed by their position in the Hertzsprung-Russell diagram (HRD). Extending our selection to dwarfs allows us to obtain the first general straggler distribution in an HRD of field stars. We also compare the elemental abundances of our sample with those of standard thick disc stars, and find that our YAR stars are shifted in oxygen, magnesium, sodium, and the slow neutron-capture element cerium. Although we detect no sign of binarity for most objects, the enhancement in cerium may be the signature of a mass transfer from an asymptotic giant branch companion. The most massive YAR stars suggest that mass transfer from an evolved star may not be the only formation pathway, and that other scenarios, such as collisions or coalescence, should be considered.

Submitted 17 July, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: 18 Pages, 20 Figures, 1 Table; accepted for publication in Astronomy & Astrophysics

Journal ref: A&A 676, A108 (2023)

arXiv:2305.07507 [pdf, other]

LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development

Authors: Ilias Chalkidis, Nicolas Garneau, Catalina Goanta, Daniel Martin Katz, Anders Søgaard

Abstract: In this work, we conduct a detailed analysis of the performance of legal-oriented pre-trained language models (PLMs). We examine the interplay between their original objective, acquired knowledge, and legal language understanding capacities, which we define as the upstream, probing, and downstream performance, respectively. We consider not only the models' size but also the pre-training corpora used as important dimensions in our study. To this end, we release a multinational English legal corpus (LeXFiles) and a legal knowledge probing benchmark (LegalLAMA) to facilitate training and detailed analysis of legal-oriented PLMs. We release two new legal PLMs trained on LeXFiles and evaluate them alongside others on LegalLAMA and LexGLUE. We find that probing performance strongly correlates with upstream performance in related legal topics. On the other hand, downstream performance is mainly driven by the model's size and prior legal knowledge, which can be estimated by upstream and probing performance. Based on these findings, we can conclude that both dimensions are important for those seeking the development of domain-specific PLMs.

Submitted 22 May, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

Comments: 9 pages, long paper at ACL 2023 proceedings

arXiv:2304.00019 [pdf, other]

doi 10.5281/zenodo.7750670

Workflows Community Summit 2022: A Roadmap Revolution

Authors: Rafael Ferreira da Silva, Rosa M. Badia, Venkat Bala, Debbie Bard, Peer-Timo Bremer, Ian Buckley, Silvina Caino-Lores, Kyle Chard, Carole Goble, Shantenu Jha, Daniel S. Katz, Daniel Laney, Manish Parashar, Frederic Suter, Nick Tyler, Thomas Uram, Ilkay Altintas, Stefan Andersson, William Arndt, Juan Aznar, Jonathan Bader, Bartosz Balis, Chris Blanton, Kelly Rosa Braghetto, Aharon Brodutch , et al. (80 additional authors not shown)

Abstract: Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and the evolving needs of emerging scientific applications, it is paramount that the development of novel scientific workflows and system functionalities seek to increase the efficiency, resilience, and pervasiveness of existing systems and applications. Specifically, the proliferation of machine learning/artificial intelligence (ML/AI) workflows, need for processing large scale datasets produced by instruments at the edge, intensification of near real-time data processing, support for long-term experiment campaigns, and emergence of quantum computing as an adjunct to HPC, have significantly changed the functional and operational requirements of workflow systems. Workflow systems now need to, for example, support data streams from the edge-to-cloud-to-HPC, enable the management of many small-sized files, allow data reduction while ensuring high accuracy, and orchestrate distributed services (workflows, instruments, data movement, provenance, publication, etc.) across computing and user facilities, among others.
Further, to accelerate science, it is also necessary that these systems implement specifications/standards and APIs for seamless (horizontal and vertical) integration between systems and applications, as well as enabling the publication of workflows and their associated products according to the FAIR principles. This document reports on discussions and findings from the 2022 international edition of the Workflows Community Summit that took place on November 29 and 30, 2022.

Submitted 31 March, 2023; originally announced April 2023.

Report number: ORNL/TM-2023/2885
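At its core, the orchestration that workflow systems perform, as discussed above, runs tasks only after their dependencies finish. The sketch below uses Kahn's topological-sort algorithm on a hypothetical four-task pipeline. Real workflow systems add distributed execution, data movement, provenance, and fault tolerance, none of which is modeled here.

```python
from collections import deque


def run_workflow(tasks, deps):
    """Run tasks in dependency order using Kahn's algorithm.

    tasks: task name -> zero-argument callable
    deps:  task name -> list of task names that must finish first
    Returns the execution order; raises ValueError if the dependencies form a cycle.
    """
    indegree = {t: 0 for t in tasks}
    children = {t: [] for t in tasks}
    for t, parents in deps.items():
        for parent in parents:
            indegree[t] += 1
            children[parent].append(t)
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        tasks[t]()  # execute the task once all its inputs are available
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(tasks):
        raise ValueError("workflow has a dependency cycle")
    return order
```

Tasks that become ready at the same time (here `train` and `analyze` in the test pipeline) are independent, which is exactly the parallelism a real workflow engine would dispatch to separate resources.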

arXiv:2303.17034 [pdf, other]

doi 10.1109/MCSE.2023.3263458

Overcoming Challenges to Continuous Integration in HPC

Authors: Todd Gamblin, Daniel S. Katz

Abstract: Continuous integration (CI) has become a ubiquitous practice in modern software development, with major code hosting services offering free automation on popular platforms. CI offers major benefits, as it enables detecting bugs in code prior to committing changes. While high-performance computing (HPC) research relies heavily on software, HPC machines are not considered "common" platforms. This presents several challenges that hinder the adoption of CI in HPC environments, making it difficult to maintain bug-free HPC projects, and resulting in adverse effects on the research community. In this article, we explore the challenges that impede HPC CI, such as hardware diversity, security, isolation, administrative policies, and non-standard authentication, environments, and job submission mechanisms. We propose several solutions that could enhance the quality of HPC software and the experience of developers. Implementing these solutions would require significant changes at HPC centers, but if these changes are made, it would ultimately enable faster and better science.

Submitted 29 March, 2023; originally announced March 2023.

arXiv:2302.12039 [pdf, other]

Natural Language Processing in the Legal Domain

Authors: Daniel Martin Katz, Dirk Hartung, Lauritz Gerlach, Abhik Jana, Michael J. Bommarito II

Abstract: In this paper, we summarize the current state of the field of NLP & Law with a specific focus on recent technical and substantive developments. To support our analysis, we construct and analyze a nearly complete corpus of more than six hundred NLP & Law related papers published over the past decade. Our analysis highlights several major trends. Namely, we document an increasing number of papers written, tasks undertaken, and languages covered over the course of the past decade. We observe an increase in the sophistication of the methods which researchers deployed in this applied context. Slowly but surely, Legal NLP is beginning to match not only the methodological sophistication of general NLP but also the professional standards of data availability and code reproducibility observed within the broader scientific community. We believe all of these trends bode well for the future of the field, but many questions in both the academic and commercial sphere still remain open.

Submitted 23 February, 2023; originally announced February 2023.

Comments: 13 pages, 7 figures, 2 tables, online source and data

arXiv:2302.11838 [pdf, other]

Minimum-Entropy Coupling Approximation Guarantees Beyond the Majorization Barrier

Authors: Spencer Compton, Dmitriy Katz, Benjamin Qi, Kristjan Greenewald, Murat Kocaoglu

Abstract: Given a set of discrete probability distributions, the minimum entropy coupling is the minimum entropy joint distribution that has the input distributions as its marginals. This has immediate relevance to tasks such as entropic causal inference for causal graph discovery and bounding mutual information between variables that we observe separately. Since finding the minimum entropy coupling is NP-Hard, various works have studied approximation algorithms. The work of [Compton, ISIT 2022] shows that the greedy coupling algorithm of [Kocaoglu et al., AAAI 2017] is always within $\log_2(e) \approx 1.44$ bits of the optimal coupling. Moreover, they show that it is impossible to obtain a better approximation guarantee using the majorization lower-bound that all prior works have used, thus establishing a majorization barrier. In this work, we break the majorization barrier by designing a stronger lower-bound that we call the profile method. Using this profile method, we are able to show that the greedy algorithm is always within $\log_2(e)/e \approx 0.53$ bits of optimal for coupling two distributions (previous best-known bound is within 1 bit), and within $(1 + \log_2(e))/2 \approx 1.22$ bits for coupling any number of distributions (previous best-known bound is within 1.44 bits). We also examine a generalization of the minimum entropy coupling problem: Concave Minimum-Cost Couplings. We are able to obtain similar guarantees for this generalization in terms of the concave cost function. Additionally, we make progress on the open problem of [Kovačević et al., Inf. Comput. 2015] regarding NP membership of the minimum entropy coupling problem by showing that any hardness of minimum entropy coupling beyond NP comes from the difficulty of computing arithmetic in the complexity class NP. Finally, we present exponential-time algorithms for computing the exactly optimal solution.
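To make the object of study concrete, here is a minimal Python sketch of a greedy coupling in the spirit of [Kocaoglu et al., AAAI 2017]: at every step, take the largest remaining mass in each marginal, assign the smallest of those maxima to the joint outcome formed by their indices, and subtract it everywhere. This is our reading of the greedy idea, not code from either paper; the function names and the `tol` threshold are illustrative assumptions.

```python
import math
from typing import Dict, Sequence, Tuple


def greedy_coupling(
    marginals: Sequence[Sequence[float]], tol: float = 1e-12
) -> Dict[Tuple[int, ...], float]:
    """Greedy coupling sketch: repeatedly match the largest remaining masses."""
    remaining = [list(p) for p in marginals]
    joint: Dict[Tuple[int, ...], float] = {}
    while True:
        # Index of the largest remaining mass in each marginal.
        idx = tuple(max(range(len(p)), key=p.__getitem__) for p in remaining)
        # The smallest of those maxima is the mass we can place jointly.
        mass = min(p[i] for p, i in zip(remaining, idx))
        if mass <= tol:  # some marginal is exhausted (up to rounding)
            break
        joint[idx] = joint.get(idx, 0.0) + mass
        # Remove the placed mass from every marginal; at least one entry hits 0.
        for p, i in zip(remaining, idx):
            p[i] -= mass
    return joint


def entropy_bits(masses) -> float:
    """Shannon entropy, in bits, of a collection of probabilities."""
    return -sum(m * math.log2(m) for m in masses if m > 0)


# Two illustrative marginals; the coupling uses cells (0, 0), (1, 1) and (1, 0).
joint = greedy_coupling([[0.6, 0.4], [0.7, 0.3]])
print(joint, entropy_bits(joint.values()))
```

Each step zeroes at least one marginal entry, so the loop runs at most as many times as there are outcomes in total. On two identical fair coins the sketch returns the diagonal coupling with entropy 1 bit; it does not compute the profile lower bound.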

Submitted 23 February, 2023; originally announced February 2023.

Comments: AISTATS 2023

arXiv:2301.04408

GPT as Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities

Authors: Jillian Bommarito, Michael Bommarito, Daniel Martin Katz, Jessica Katz

Abstract: The global economy is increasingly dependent on knowledge workers to meet the needs of public and private organizations. While there is no single definition of knowledge work, organizations and industry groups still attempt to measure individuals' capability to engage in it. The most comprehensive assessment of capability readiness for professional knowledge workers is the Uniform CPA Examination developed by the American Institute of Certified Public Accountants (AICPA). In this paper, we experimentally evaluate OpenAI's `text-davinci-003` and prior versions of GPT on both a sample Regulation (REG) exam and an assessment of over 200 multiple-choice questions based on the AICPA Blueprints for legal, financial, accounting, technology, and ethical tasks. First, we find that `text-davinci-003` achieves a correct rate of 14.4% on a sample REG exam section, significantly underperforming human capabilities on quantitative reasoning in zero-shot prompts. Second, `text-davinci-003` appears to be approaching human-level performance on the Remembering & Understanding and Application skill levels in the Exam absent calculation. For the best prompt and parameters, the model answers 57.6% of questions correctly, significantly better than the 25% guessing rate, and its top two answers are correct 82.1% of the time, indicating strong non-entailment. Finally, we find that recent generations of GPT-3 demonstrate material improvements on this assessment, rising from 30% for `text-davinci-001` to 57% for `text-davinci-003`. These findings strongly suggest that large language models have the potential to transform the quality and efficiency of future knowledge work.

Submitted 11 January, 2023; originally announced January 2023.

Comments: Source code and data available in online SI at https://github.com/mjbommar/gpt-as-knowledge-worker

arXiv:2212.14402

GPT Takes the Bar Exam

Authors: Michael Bommarito II, Daniel Martin Katz

Abstract: Nearly all jurisdictions in the United States require a professional license exam, commonly referred to as "the Bar Exam," as a precondition for law practice. To even sit for the exam, most jurisdictions require that an applicant complete at least seven years of post-secondary education, including three years at an accredited law school. In addition, most test-takers also undergo weeks to months of further, exam-specific preparation. Despite this significant investment of time and capital, approximately one in five test-takers still score below the level required to pass the exam on their first try. In the face of a complex task that requires such depth of knowledge, what, then, should we expect of the state of the art in "AI?" In this research, we document our experimental evaluation of the performance of OpenAI's `text-davinci-003` model, often referred to as GPT-3.5, on the multistate multiple choice (MBE) section of the exam. While we find no benefit in fine-tuning over GPT-3.5's zero-shot performance at the scale of our training data, we do find that hyperparameter optimization and prompt engineering positively impacted GPT-3.5's zero-shot performance. For the best prompt and parameters, GPT-3.5 achieves a headline correct rate of 50.3% on a complete NCBE MBE practice exam, significantly in excess of the 25% baseline guessing rate, and performs at a passing rate for both Evidence and Torts. GPT-3.5's ranking of responses is also highly correlated with correctness; its top two and top three choices are correct 71% and 88% of the time, respectively, indicating very strong non-entailment performance. While our ability to interpret these results is limited by nascent scientific understanding of LLMs and the proprietary nature of GPT, we believe that these results strongly suggest that an LLM will pass the MBE component of the Bar Exam in the near future.
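The "top two" and "top three" figures in the two GPT abstracts are top-k accuracies over ranked answer choices. A minimal Python sketch of that metric follows; the function name and the three toy questions are hypothetical and are not data from either paper.

```python
from typing import Sequence


def top_k_accuracy(
    rankings: Sequence[Sequence[str]], answers: Sequence[str], k: int
) -> float:
    """Fraction of questions whose correct answer is among the first k ranked choices."""
    if len(rankings) != len(answers) or not answers:
        raise ValueError("need one ranking per answer and at least one question")
    hits = sum(1 for ranked, correct in zip(rankings, answers) if correct in ranked[:k])
    return hits / len(answers)


# Made-up model rankings for three four-option questions.
rankings = [["A", "B", "C", "D"], ["B", "A", "D", "C"], ["C", "D", "A", "B"]]
answers = ["A", "A", "B"]
for k in (1, 2, 3):
    print(k, top_k_accuracy(rankings, answers, k))
```

For four-option questions, a uniformly random ranking places the correct answer in the top k with probability k/4, so the guessing baselines are 25%, 50%, and 75% for k = 1, 2, 3; the reported 71% (top two) and 88% (top three) should be read against 50% and 75%, not against 25%.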

Submitted 29 December, 2022; originally announced December 2022.

Comments: Additional material available online at https://github.com/mjbommar/gpt-takes-the-bar-exam

arXiv:2212.11987

doi 10.1051/0004-6361/202245518

The phase spiral in Gaia DR3

Authors: T. Antoja, P. Ramos, B. García-Conde, M. Bernet, C. F. P. Laporte, D. Katz

Abstract: We aim to study the phase spiral in the Milky Way (MW) with Gaia DR3. We used an edge detection algorithm to find the border of the phase spiral, allowing us to robustly quantify its shape at different positions and for different selections. We calculated the time of onset of the phase-mixing by determining the different turns of the phase spiral and using the vertical frequencies from commonly used MW potential models. We find that the phase spiral extends down to $-1.2$ kpc in height below the plane (about 3 to 5 scale heights of the thin disc) and beyond $\pm 50$ km/s in $V_Z$. We see a secondary branch mostly at positive vertical velocities when coloured by azimuthal velocity and in the counts projection. We also find complex variations of the phase spirals with angular momentum and azimuth. All these possibly provide evidence of multiple perturbations (from different times or from different perturbers) and/or of the complexity of the phase mixing process. We detect the phase spiral from 6 to 11 kpc from the Galactic centre and find signatures of vertical asymmetries 1-2 kpc beyond this range. We measure small but clear variations with azimuth. When we determine the phase mixing times from the phase spiral at different angular momenta and using the different spiral turns (at different $Z$), we obtain inconsistent times with systematic differences (times increasing with $|L_Z|$ and with $|Z|$). Our determinations are mostly in the range 0.3-0.9 Gyr, with an average of 0.5 Gyr. The inconsistencies do not change when using different commonly used potential models, different stellar distances, or frequencies for different kinetic temperatures. They could stem from the inconsistency of potential models with the true MW, and from too simple modelling, in particular neglecting self-gravity, not considering the multiple perturbations, and the interference with other processes.
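As a rough illustration of how an onset time follows from spiral turns and vertical frequencies, the sketch below assumes the simplest kinematic phase-mixing picture (one impulsive perturbation, no self-gravity), which the abstract itself flags as possibly too simple. Stars that started at the same vertical phase have since advanced by $\nu t$, so two consecutive turns crossed along one direction satisfy $|\nu_1 - \nu_2|\, t = 2\pi$. The function name and the input frequencies are hypothetical; this is not the authors' pipeline.

```python
import math

# 1 km/s sustained for 1 Gyr covers about 1.0227 kpc, so 1 km/s/kpc is about 1.0227 / Gyr.
KMS_PER_KPC_IN_PER_GYR = 1.0227


def phase_mixing_time_gyr(nu_inner: float, nu_outer: float) -> float:
    """Time since onset (Gyr) from the vertical angular frequencies (km/s/kpc)
    of stars on two consecutive turns of the spiral: |nu_1 - nu_2| * t = 2*pi."""
    delta = abs(nu_inner - nu_outer) * KMS_PER_KPC_IN_PER_GYR  # rad per Gyr
    if delta == 0:
        raise ValueError("consecutive turns must have different vertical frequencies")
    return 2 * math.pi / delta


# Illustrative frequencies only (not values from the paper).
print(phase_mixing_time_gyr(70.0, 60.0))
```

With these made-up frequencies the estimate is about 0.61 Gyr. Because $\nu$ depends on the adopted potential and on the vertical amplitude of each turn, different turns or potentials can yield different times, which is the kind of inconsistency the abstract reports.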

Submitted 25 May, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

Comments: version after proof corrections

Journal ref: A&A 673, A115 (2023)

arXiv:2212.05081

doi 10.1088/2632-2153/ad12e3

FAIR AI Models in High Energy Physics

Authors: Javier Duarte, Haoyang Li, Avik Roy, Ruike Zhu, E. A. Huerta, Daniel Diaz, Philip Harris, Raghav Kansal, Daniel S. Katz, Ishaan H. Kavoori, Volodymyr V. Kindratenko, Farouk Mokhtar, Mark S. Neubauer, Sang Eon Park, Melissa Quinnan, Roger Rusack, Zhizhen Zhao

Abstract: The findable, accessible, interoperable, and reusable (FAIR) data principles provide a framework for examining, evaluating, and improving how data is shared to facilitate scientific discovery. Generalizing these principles to research software and other digital products is an active area of research. Machine learning (ML) models (algorithms that have been trained on data without being explicitly programmed) and, more generally, artificial intelligence (AI) models are an important target for this because of the ever-increasing pace with which AI is transforming scientific domains, such as experimental high energy physics (HEP). In this paper, we propose a practical definition of FAIR principles for AI models in HEP and describe a template for the application of these principles. We demonstrate the template's use with an example AI model applied to HEP, in which a graph neural network is used to identify Higgs bosons decaying to two bottom quarks. We report on the robustness of this FAIR AI model, its portability across hardware architectures and software frameworks, and its interpretability.

Submitted 29 December, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

Comments: 34 pages, 9 figures, 10 tables

Journal ref: Mach. Learn.: Sci. Technol. 4 (2023) 045062

arXiv:2211.07436

doi 10.1109/MCSE.2023.3253847

Giving RSEs a Larger Stage through the Better Scientific Software Fellowship

Authors: William F. Godoy, Ritu Arora, Keith Beattie, David E. Bernholdt, Sarah E. Bratt, Daniel S. Katz, Ignacio Laguna, Amiya K. Maji, Addi Malviya Thakur, Rafael M. Mudafort, Nitin Sukhija, Damian Rouson, Cindy Rubio-González, Karan Vahi

Abstract: The Better Scientific Software Fellowship (BSSwF) was launched in 2018 to foster and promote practices, processes, and tools to improve developer productivity and software sustainability of scientific codes. BSSwF's vision is to grow the community with practitioners, leaders, mentors, and consultants to increase the visibility of scientific software production and sustainability. Over the last five years, many fellowship recipients and honorable mentions have identified as research software engineers (RSEs). This paper provides case studies from several of the program's participants to illustrate some of the diverse ways BSSwF has benefited both the RSE and scientific communities. In an environment where the contributions of RSEs are too often undervalued, we believe that programs such as BSSwF can be a valuable means to recognize and encourage community members to step outside of their regular commitments and expand on their work, collaborations and ideas for a larger audience.

Submitted 14 November, 2022; v1 submitted 14 November, 2022; originally announced November 2022.

Comments: submitted to Computing in Science & Engineering (CiSE), Special Issue on the Future of Research Software Engineers in the US

arXiv:2210.08973

doi 10.1038/s41597-023-02298-6

FAIR for AI: An interdisciplinary and international community building perspective

Authors: E. A. Huerta, Ben Blaiszik, L. Catherine Brinson, Kristofer E. Bouchard, Daniel Diaz, Caterina Doglioni, Javier M. Duarte, Murali Emani, Ian Foster, Geoffrey Fox, Philip Harris, Lukas Heinrich, Shantenu Jha, Daniel S. Katz, Volodymyr Kindratenko, Christine R. Kirkpatrick, Kati Lassila-Perini, Ravi K. Madduri, Mark S. Neubauer, Fotis E. Psomopoulos, Avik Roy, Oliver Rübel, Zhizhen Zhao, Ruike Zhu

Abstract: A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles was proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding principles have been re-interpreted or extended to include the software, tools, algorithms, and workflows that produce data. FAIR principles are now being adapted in the context of AI models and datasets. Here, we present the perspectives, vision, and experiences of researchers from different countries, disciplines, and backgrounds who are leading the definition and adoption of FAIR principles in their communities of practice, and discuss outcomes that may result from pursuing and incentivizing FAIR AI research. The material for this report builds on the FAIR for AI Workshop held at Argonne National Laboratory on June 7, 2022.

Submitted 1 August, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

Comments: 10 pages, comments welcome!; v2: 12 pages, accepted to Scientific Data

ACM Class: I.2.0; E.0

Journal ref: Scientific Data 10, 487 (2023)

arXiv:2210.04275

doi 10.1109/MCSE.2023.3258630

Research Software Engineers: Career Entry Points and Training Gaps

Authors: Ian A. Cosden, Kenton McHenry, Daniel S. Katz

Abstract: As software has become more essential to research across disciplines, and as the recognition of this fact has grown, the importance of professionalizing the development and maintenance of this software has also increased. Over the last decade, the community of software professionals who work on this software has come together under the title Research Software Engineer (RSE). This has led to the formalization of RSE roles and organized RSE groups in universities, national labs, and industry. This, in turn, has created the need to understand how RSEs come into this profession and into these groups and how to further promote this career path to potential members, as well as the need to understand what training gaps need to be filled for RSEs coming from different entry points. We have identified three main categories of entry paths into the RSE profession, along with key elements, both advantages and disadvantages, that should be acknowledged and addressed by the broader research community in order to attract and retain a talented and diverse pool of future RSEs.

Submitted 15 March, 2023; v1 submitted 9 October, 2022; originally announced October 2022.

Comments: Accepted by IEEE Computing in Science & Engineering (CiSE): Special Issue on the Future of Research Software Engineers in the US

arXiv:2209.11631

doi 10.1109/TPDS.2022.3208767

funcX: Federated Function as a Service for Science

Authors: Zhuozhao Li, Ryan Chard, Yadu Babuji, Ben Galewsky, Tyler Skluzacek, Kirill Nagaitsev, Anna Woodard, Ben Blaiszik, Josh Bryan, Daniel S. Katz, Ian Foster, Kyle Chard

Abstract: funcX is a distributed function as a service (FaaS) platform that enables flexible, scalable, and high-performance remote function execution. Unlike centralized FaaS systems, funcX decouples the cloud-hosted management functionality from the edge-hosted execution functionality. funcX's endpoint software can be deployed, by users or administrators, on arbitrary laptops, clouds, clusters, and supercomputers, in effect turning them into function serving systems. funcX's cloud-hosted service provides a single location for registering, sharing, and managing both functions and endpoints. It allows for transparent, secure, and reliable function execution across the federated ecosystem of endpoints, enabling users to route functions to endpoints based on specific needs. funcX uses containers (e.g., Docker, Singularity, and Shifter) to provide common execution environments across endpoints. funcX implements various container management strategies to execute functions with high performance and efficiency on diverse funcX endpoints. funcX also integrates with an in-memory data store and Globus for managing data that may span endpoints. We motivate the need for funcX, present our prototype design and implementation, and demonstrate, via experiments on two supercomputers, that funcX can scale to more than 130 000 concurrent workers. We show that funcX's container warming-aware routing algorithm can reduce the completion time for 3000 functions by up to 61% compared to a randomized algorithm, and that the in-memory data store can speed up data transfers by up to 3x compared to a shared file system.

Submitted 23 September, 2022; originally announced September 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2005.04215

arXiv:2208.05398

doi 10.1103/PhysRevD.107.L091503

Equation of state of a hot-and-dense quark gluon plasma: lattice simulations at real $\mu_B$ vs. extrapolations

Authors: Szabolcs Borsanyi, Zoltan Fodor, Matteo Giordano, Jana N. Guenther, Sandor D. Katz, Attila Pasztor, Chik Him Wong

Abstract: The equation of state of the quark gluon plasma is a key ingredient of heavy ion phenomenology. In addition to the traditional Taylor method, several novel approximation schemes have been proposed with the aim of calculating it at finite baryon density. In order to gain a pragmatic understanding of the limits of these schemes, we compare them to direct results at $\mu_B>0$, using reweighting techniques free from an overlap problem. We use 2stout improved staggered fermions with 8 time-slices and cover the entire RHIC BES range in the baryochemical potential, up to $\mu_B/T=3$.

Submitted 12 August, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

Comments: 7 pages, 3 figures

arXiv:2208.00211

doi 10.1051/0004-6361/202243940

Gaia Data Release 3: Summary of the content and survey properties

Authors: Gaia Collaboration, A. Vallenari, A. G. A. Brown, T. Prusti, J. H. J. de Bruijne, F. Arenou, C. Babusiaux, M. Biermann, O. L. Creevey, C. Ducourant, D. W. Evans, L. Eyer, R. Guerra, A. Hutton, C. Jordi, S. A. Klioner, U. L. Lammers, L. Lindegren, X. Luri, F. Mignard, C. Panem, D. Pourbaix, S. Randich, P. Sartoretti, C. Soubiran , et al. (431 additional authors not shown)

Abstract: We present the third data release of the European Space Agency's Gaia mission, GDR3. The GDR3 catalogue is the outcome of the processing of raw data collected with the Gaia instruments during the first 34 months of the mission by the Gaia Data Processing and Analysis Consortium. The GDR3 catalogue contains the same source list, celestial positions, proper motions, parallaxes, and broad band photometry in the $G$, $G_{\rm BP}$, and $G_{\rm RP}$ pass-bands already present in the Early Third Data Release. GDR3 introduces an impressive wealth of new data products. More than 33 million objects in the ranges $G_{\rm RVS} < 14$ and $3100 < T_{\rm eff} < 14500$ K have new determinations of their mean radial velocities based on data collected by Gaia. We provide $G_{\rm RVS}$ magnitudes for most sources with radial velocities, and a line broadening parameter is listed for a subset of these. Mean Gaia spectra are made available to the community. The GDR3 catalogue includes about 1 million mean spectra from the radial velocity spectrometer, and about 220 million low-resolution blue and red prism photometer BPRP mean spectra. The results of the analysis of epoch photometry are provided for some 10 million sources across 24 variability types. GDR3 includes astrophysical parameters and source class probabilities for about 470 million and 1500 million sources, respectively, including stars, galaxies, and quasars. Orbital elements and trend parameters are provided for some $800\,000$ astrometric, spectroscopic and eclipsing binaries. More than $150\,000$ Solar System objects, including new discoveries, with preliminary orbital solutions and individual epoch observations are part of this release. Reflectance spectra derived from the epoch BPRP spectral data are published for about $60\,000$ asteroids. Finally, an additional data set is provided, namely the Gaia Andromeda Photometric Survey. (abridged)

Submitted 30 July, 2022; originally announced August 2022.

Comments: 23 pages, 2 figures

arXiv:2206.12174

doi 10.1051/0004-6361/202243791

Gaia Data Release 3: Reflectance spectra of Solar System small bodies

Authors: Gaia Collaboration, L. Galluccio, M. Delbo, F. De Angeli, T. Pauwels, P. Tanga, F. Mignard, A. Cellino, A. G. A. Brown, K. Muinonen, A. Penttila, S. Jordan, A. Vallenari, T. Prusti, J. H. J. de Bruijne, F. Arenou, C. Babusiaux, M. Biermann, O. L. Creevey, C. Ducourant, D. W. Evans, L. Eyer, R. Guerra, A. Hutton, C. Jordi , et al. (422 additional authors not shown)

Abstract: The Gaia mission of the European Space Agency (ESA) has been routinely observing Solar System objects (SSOs) since the beginning of its operations in August 2014. The third Gaia data release (DR3) includes, for the first time, the mean reflectance spectra of a selected sample of 60 518 SSOs, primarily asteroids, observed between August 5, 2014, and May 28, 2017. Each reflectance spectrum was derived from measurements obtained by means of the Blue and Red photometers (BP/RP), which were binned in 16 discrete wavelength bands. We describe the processing of the Gaia spectral data of SSOs, explaining both the criteria used to select the subset of asteroid spectra published in Gaia DR3 and the different steps of our internal validation procedures. In order to further assess the quality of Gaia SSO reflectance spectra, we carried out external validation against SSO reflectance spectra obtained from ground-based and space-borne telescopes and available in the literature. For each selected SSO, an epoch reflectance was computed by dividing the calibrated spectrum observed by the BP/RP at each transit on the focal plane by the mean spectrum of a solar analogue. The latter was obtained by averaging the Gaia spectral measurements of a selected sample of stars known to have spectra very similar to that of the Sun. Finally, a mean of the epoch reflectance spectra was calculated in 16 spectral bands for each SSO. The agreement between Gaia mean reflectance spectra and those available in the literature is good for bright SSOs, regardless of their taxonomic spectral class. We identify an increase in the spectral slope of S-type SSOs with increasing phase angle. Moreover, we show that the spectral slope increases and the depth of the 1 μm absorption band decreases for increasing ages of S-type asteroid families.
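The three processing steps named in this abstract (average the solar-analogue spectra, divide each transit's spectrum by that mean, then average the epoch reflectances band by band) can be sketched in a few lines of Python. This is a simplified sketch under stated assumptions: it omits calibration, selection criteria, validation, and any normalisation the real pipeline may apply, and the function names and flat toy spectra are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence

N_BANDS = 16  # the abstract states the BP/RP data are binned in 16 wavelength bands


def band_mean(spectra: Sequence[Sequence[float]]) -> List[float]:
    """Average several spectra band by band."""
    if not spectra or any(len(s) != N_BANDS for s in spectra):
        raise ValueError(f"expected at least one spectrum with {N_BANDS} bands")
    return [fmean(band) for band in zip(*spectra)]


def mean_reflectance(
    epoch_spectra: Sequence[Sequence[float]],
    solar_analogue_spectra: Sequence[Sequence[float]],
) -> List[float]:
    # Mean solar-analogue spectrum from a sample of Sun-like stars.
    sun = band_mean(solar_analogue_spectra)
    # Epoch reflectance: each transit's spectrum divided by the mean solar spectrum.
    epoch_refl = [[s / ref for s, ref in zip(spec, sun)] for spec in epoch_spectra]
    # Final product: mean of the epoch reflectances in each band.
    return band_mean(epoch_refl)


# Toy flat spectra: two transits of one object, two solar analogues.
print(mean_reflectance([[1.0] * 16, [3.0] * 16], [[2.0] * 16, [2.0] * 16]))
```

Dividing per epoch before averaging, rather than averaging raw spectra first, keeps one reflectance per transit, which is what makes per-epoch quantities (such as the phase-angle dependence of the slope) available for study.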

Submitted 24 June, 2022; originally announced June 2022.

Comments: 30 pages, 26 figures

arXiv:2206.10986

doi 10.1051/0004-6361/202243809

Gaia Data Release 3: Properties of the line broadening parameter derived with the Radial Velocity Spectrometer (RVS)

Authors: Y. Frémat, F. Royer, O. Marchal, R. Blomme, P. Sartoretti, A. Guerrier, P. Panuzzo, D. Katz, G. M. Seabroke, F. Thévenin, M. Cropper, K. Benson, Y. Damerdji, R. Haigron, A. Lobel, M. Smith, S. G. Baker, L. Chemin, M. David, C. Dolding, E. Gosset, K. Janßen, G. Jasniewicz, G. Plum, N. Samaras , et al. (16 additional authors not shown)

Abstract: The third release of the Gaia catalogue contains the radial velocities for 33,812,183 stars having effective temperatures ranging from 3100 K to 14,500 K. The measurements are based on the comparison of the observed RVS spectrum (wavelength coverage: 846-870 nm, median resolving power: 11,500) to synthetic data broadened with the appropriate Along-Scan Line Spread Function. The additional line broadening, fitted as if it were due solely to axial rotation, is also produced by the pipeline and is available in the catalogue (field name gaia_source:vbroad). We aim to describe the properties of the line-broadening information extracted from the RVS and published in the catalogue, and to analyse the limitations imposed by the adopted method, wavelength range, and instrument. We use simulations to express the link between the line broadening measurement provided in Gaia Data Release 3 and $v\sin i$. We then compare the observed values to the measurements published by various catalogues and surveys (GALAH, APOGEE, LAMOST, ...). While we recommend caution in interpreting the vbroad measurement, we also find a reasonable global agreement between the Gaia Data Release 3 line broadening values and those found in the other catalogues. We discuss and establish the validity domain of the published vbroad values. The vbroad values tend to be overestimated at the low $v\sin i$ end, and at $T_\mathrm{eff}>7500\,\mathrm{K}$ their quality and significance degrade rapidly when $G_\mathrm{RVS}>10$. Despite all the known and reported limitations, the Gaia Data Release 3 line broadening catalogue contains the measurements obtained for 3,524,677 stars with $T_\mathrm{eff}$ ranging from 3500 to 14,500 K, and $G_\mathrm{RVS}<12$. It gathers the largest stellar sample ever considered for this purpose, and allows a first mapping of the Gaia line broadening parameter across the HR diagram.

Submitted 27 June, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

Comments: 19 pages, 17 figures; paper accepted for publication in Astronomy and Astrophysics on 23 June 2022; see https://www.cosmos.esa.int/web/gaia/dr3-papers

Journal ref: A&A 674, A8 (2023)

arXiv:2206.09044

Universal Complexity Bounds Based on Value Iteration and Application to Entropy Games

Authors: Xavier Allamigeon, Stéphane Gaubert, Ricardo D. Katz, Mateusz Skomra

Abstract: We develop value iteration-based algorithms to solve in a unified manner different classes of combinatorial zero-sum games with mean-payoff type rewards. These algorithms rely on an oracle, evaluating the dynamic programming operator up to a given precision. We show that the number of calls to the oracle needed to determine exact optimal (positional) strategies is, up to a factor polynomial in the dimension, of order R/sep, where the "separation" sep is defined as the minimal difference between distinct values arising from strategies, and R is a metric estimate, involving the norm of approximate sub- and super-eigenvectors of the dynamic programming operator. We illustrate this method by two applications. The first one is a new proof, leading to improved complexity estimates, of a theorem of Boros, Elbassioni, Gurvich and Makino, showing that turn-based mean-payoff games with a fixed number of random positions can be solved in pseudo-polynomial time. The second one concerns entropy games, a model introduced by Asarin, Cervelle, Degorre, Dima, Horn and Kozyakin. The rank of an entropy game is defined as the maximal rank among all the ambiguity matrices determined by strategies of the two players. We show that entropy games with a fixed rank, in their original formulation, can be solved in polynomial time, and that an extension of entropy games incorporating weights can be solved in pseudo-polynomial time under the same fixed rank condition.
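For readers unfamiliar with the dynamic programming operator in this setting, here is a minimal Python sketch of plain value iteration on a deterministic turn-based mean-payoff game: apply $T(v)_i = \max_j (w_{ij} + v_j)$ at Max nodes and $\min_j (w_{ij} + v_j)$ at Min nodes, and read off $T^k(0)/k$ as an estimate of the value. It is not the paper's oracle-based algorithm; the graph encoding, function names, and example game are hypothetical. The plain estimate converges only at rate $O(1/k)$, which is why turning it into exact optimal strategies needs an argument such as the R/sep bound above.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: graph[i] lists (successor, weight) pairs; owner[i] is "max" or "min".
Graph = Sequence[Sequence[Tuple[int, float]]]


def shapley_operator(v: List[float], graph: Graph, owner: Sequence[str]) -> List[float]:
    """One dynamic programming step: T(v)_i = max or min over edges of (w_ij + v_j)."""
    out = []
    for i, edges in enumerate(graph):
        candidates = [w + v[j] for j, w in edges]
        out.append(max(candidates) if owner[i] == "max" else min(candidates))
    return out


def mean_payoff_estimate(graph: Graph, owner: Sequence[str], iterations: int) -> List[float]:
    """Approximate the mean-payoff value of each node by T^k(0) / k."""
    v = [0.0] * len(graph)
    for _ in range(iterations):
        v = shapley_operator(v, graph, owner)
    return [x / iterations for x in v]


# Node 0 (Max) may loop with reward 1 or move to node 1 with reward 5;
# node 1 (Min) has a single edge back to node 0 with reward -1.
graph = [[(0, 1.0), (1, 5.0)], [(0, -1.0)]]
owner = ["max", "min"]
print(mean_payoff_estimate(graph, owner, 1000))
```

Both estimates approach 2, the average reward (5 - 1)/2 of the cycle 0 → 1 → 0, which beats Max's self-loop reward of 1.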

Submitted 17 June, 2022; originally announced June 2022.

Comments: 41 pages, 7 figures

arXiv:2206.06207

doi 10.1051/0004-6361/202243797

Gaia Data Release 3: Mapping the asymmetric disc of the Milky Way

Authors: Gaia Collaboration, R. Drimmel, M. Romero-Gomez, L. Chemin, P. Ramos, E. Poggio, V. Ripepi, R. Andrae, R. Blomme, T. Cantat-Gaudin, A. Castro-Ginard, G. Clementini, F. Figueras, M. Fouesneau, Y. Fremat, K. Jardine, S. Khanna, A. Lobel, D. J. Marshall, T. Muraveva, A. G. A. Brown, A. Vallenari, T. Prusti, J. H. J. de Bruijne, F. Arenou , et al. (431 additional authors not shown)

Abstract: With the most recent Gaia data release, the number of sources with complete 6D phase space information (position and velocity) has increased to well over 33 million stars, while stellar astrophysical parameters are provided for more than 470 million sources, in addition to the identification of over 11 million variable stars. Using the astrophysical parameters and variability classifications provided in Gaia DR3, we select various stellar populations to explore and identify non-axisymmetric features in the disc of the Milky Way in both configuration and velocity space. Using about 580 thousand sources identified as hot OB stars, together with 988 known open clusters younger than 100 million years, we map the spiral structure associated with star formation 4-5 kpc from the Sun. We select over 2800 Classical Cepheids younger than 200 million years, which show spiral features extending as far as 10 kpc from the Sun in the outer disc. We also identify more than 8.7 million sources on the red giant branch (RGB), of which 5.7 million have line-of-sight velocities, allowing the velocity field of the Milky Way to be mapped as far as 8 kpc from the Sun, including the inner disc. The spiral structure revealed by the young populations is consistent with recent results using Gaia EDR3 astrometry and source lists based on near-infrared photometry, showing the Local (Orion) arm to be at least 8 kpc long, and an outer arm consistent with what is seen in HI surveys, which seems to be a continuation of the Perseus arm into the third quadrant. Meanwhile, the subset of RGB stars with velocities clearly reveals the large-scale kinematic signature of the bar in the inner disc, as well as evidence of streaming motions in the outer disc that might be associated with spiral arms or bar resonances. (abridged)

Submitted 5 August, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: 35 pages, 27 figures, accepted for publication in A&A special Gaia DR3 issue. V2: abstract completed. V3: complete author list and link to data: https://drive.google.com/drive/u/1/folders/1yOJPjYmM7QK5XVsqaiSOTuwDQNti2LlZ

Journal ref: A&A 674, A37 (2023)

arXiv:2206.06075 [pdf, other]

doi 10.1051/0004-6361/202243767

Gaia Data Release 3: Pulsations in main sequence OBAF-type stars

Authors: Gaia Collaboration, J. De Ridder, V. Ripepi, C. Aerts, L. Palaversa, L. Eyer, B. Holl, M. Audard, L. Rimoldini, A. G. A. Brown, A. Vallenari, T. Prusti, J. H. J. de Bruijne, F. Arenou, C. Babusiaux, M. Biermann, O. L. Creevey, C. Ducourant, D. W. Evans, R. Guerra, A. Hutton, C. Jordi, S. A. Klioner, U. L. Lammers, L. Lindegren, et al. (423 additional authors not shown)

Abstract: The third Gaia data release provides photometric time series covering 34 months for about 10 million stars. For many of those stars, a characterisation in Fourier space and their variability classification are also provided. This paper focuses on intermediate- to high-mass (IHM) main sequence pulsators (M >= 1.3 Msun) of spectral types O, B, A, or F, known as beta Cep, slowly pulsating B (SPB), delta Sct, and gamma Dor stars. These stars are often multi-periodic and display low amplitudes, making them challenging targets to analyse with sparse time series. All datasets used in this analysis are part of the Gaia DR3 data release. The photometric time series were used to perform a Fourier analysis, while the global astrophysical parameters necessary for the empirical instability strips were taken from the Gaia DR3 gspphot tables, and the vsini data were taken from the Gaia DR3 esphs tables. We show that for nearby OBAF-type pulsators, the Gaia DR3 data are precise and accurate enough to pinpoint them in the Hertzsprung-Russell diagram. We find empirical instability strips covering broader regions than theoretically predicted. In particular, our study reveals the presence of fast-rotating gravity-mode pulsators outside the strips, as well as the co-existence of rotationally modulated variables inside the strips, as reported before in the literature. We derive an extensive period-luminosity relation for delta Sct stars and provide evidence that the relation features different regimes depending on the oscillation period. Finally, we demonstrate how stellar rotation attenuates the amplitude of the dominant oscillation mode of delta Sct stars.

Submitted 16 August, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

Journal ref: A&A 674, A36 (2023)

arXiv:2206.05902 [pdf, other]

doi 10.1051/0004-6361/202244220

Gaia Data Release 3: Properties and validation of the radial velocities

Authors: D. Katz, P. Sartoretti, A. Guerrier, P. Panuzzo, G. M. Seabroke, F. Thévenin, M. Cropper, K. Benson, R. Blomme, R. Haigron, O. Marchal, M. Smith, S. Baker, L. Chemin, Y. Damerdji, M. David, C. Dolding, Y. Frémat, E. Gosset, K. Janßen, G. Jasniewicz, A. Lobel, G. Plum, N. Samaras, O. Snaith, et al. (25 additional authors not shown)

Abstract: Gaia Data Release 3 (Gaia DR3) contains the second release of the combined radial velocities. It is based on the spectra collected during the first 34 months of the nominal mission. The longer time baseline and the improvements of the pipeline made it possible to push the processing limit from Grvs = 12 mag in Gaia DR2 to Grvs = 14 mag. In this article, we describe the new functionalities implemented for Gaia DR3, the quality filters applied during processing and post-processing, and the properties and performance of the published velocities. For Gaia DR3, several functionalities were upgraded or added. (Abridged) Gaia DR3 contains the combined radial velocities of 33 812 183 stars. With respect to Gaia DR2, the temperature interval has been expanded from Teff ∈ [3600, 6750] K to Teff ∈ [3100, 14500] K for the bright stars (Grvs ≤ 12 mag) and to [3100, 6750] K for the fainter stars. The radial velocities sample a significant part of the Milky Way: they reach a few kiloparsecs beyond the Galactic centre in the disc and up to about 10-15 kpc vertically into the inner halo. The median formal precision of the velocities is 1.3 km/s at Grvs = 12 mag and 6.4 km/s at Grvs = 14 mag. The velocity zero point exhibits a small systematic trend with magnitude, starting around Grvs = 11 mag and reaching about 400 m/s at Grvs = 14 mag. A correction formula is provided, which can be applied to the published data. The Gaia DR3 velocity scale is in satisfactory agreement with APOGEE, GALAH, GES and RAVE, with systematic differences that mostly do not exceed a few hundred m/s. The properties of the radial velocities are also illustrated with specific objects: open clusters, globular clusters, and the Large Magellanic Cloud (LMC). For example, the precision of the data makes it possible to map the line-of-sight rotational velocities of the globular cluster 47 Tuc and of the LMC.

Submitted 13 June, 2022; originally announced June 2022.

Comments: Submitted to A&A

Journal ref: A&A 674, A5 (2023)

arXiv:2206.05870 [pdf, other]

doi 10.1051/0004-6361/202243800

Gaia Data Release 3: A Golden Sample of Astrophysical Parameters

Authors: Gaia Collaboration, O. L. Creevey, L. M. Sarro, A. Lobel, E. Pancino, R. Andrae, R. L. Smart, G. Clementini, U. Heiter, A. J. Korn, M. Fouesneau, Y. Frémat, F. De Angeli, A. Vallenari, D. L. Harrison, F. Thévenin, C. Reylé, R. Sordo, A. Garofalo, A. G. A. Brown, L. Eyer, T. Prusti, J. H. J. de Bruijne, F. Arenou, C. Babusiaux, et al. (423 additional authors not shown)

Abstract: Gaia Data Release 3 (DR3) provides a wealth of new data products for the astronomical community to exploit, including astrophysical parameters for half a billion stars. In this work we demonstrate the high quality of these data products and illustrate their use in different astrophysical contexts. We query the astrophysical parameter tables along with other tables in Gaia DR3 to derive the samples of stars of interest. We validate our results by using the Gaia catalogue itself and by comparison with external data. We have produced six homogeneous samples of stars with high-quality astrophysical parameters across the HR diagram for the community to exploit. We first focus on three samples that span a large parameter space: young massive disk stars (~3M), FGKM spectral type stars (~3M), and UCDs (~20K). We provide these sources along with additional information (either a flag or complementary parameters) as tables that are made available in the Gaia archive. We furthermore identify 15740 bona fide carbon stars and 5863 solar analogues, and provide the first homogeneous set of stellar parameters of the Spectro-Photometric Standard Stars. We use a subset of the OBA sample to illustrate its usefulness in analysing the Milky Way rotation curve. We then use the properties of the FGKM stars to analyse known exoplanet systems. We also analyse the ages of some unseen UCD companions to the FGKM stars. We additionally predict the colours of the Sun in various passbands (Gaia, 2MASS, WISE) using the solar-analogue sample.

Submitted 12 June, 2022; originally announced June 2022.

Comments: 35 pages (including 6 pages of references, acknowledgements, and affiliations), 37 figures, A&A accepted

Journal ref: A&A 674, A39 (2023)

Showing 1–50 of 434 results for author: Katz, D