
Generation and De-Identification of Indian Clinical Discharge Summaries using LLMs

Authors: Sanjeet Singh, Shreya Gupta, Niralee Gupta, Naimish Sharma, Lokesh Srivastava, Vibhu Agarwal, Ashutosh Modi

Abstract: The consequences of a healthcare data breach can be devastating for patients, providers, and payers. The average financial impact of a data breach in recent months has been estimated to be close to USD 10 million. This is especially significant for healthcare organizations in India that are managing rapid digitization while still establishing data governance procedures that align with the letter and spirit of the law. Computer-based systems for de-identification of personal information are vulnerable to data drift, often rendering them ineffective in cross-institution settings. Therefore, a rigorous assessment of existing de-identification systems against local health datasets is imperative to support the safe adoption of digital health initiatives in India. In this paper, using a small set of de-identified patient discharge summaries provided by an Indian healthcare institution, we report the nominal performance of de-identification algorithms (based on language models) trained on publicly available non-Indian datasets, pointing towards a lack of cross-institutional generalization. Similarly, experimentation with off-the-shelf de-identification systems reveals potential risks associated with the approach. To overcome data scarcity, we explore generating synthetic clinical reports (using publicly available and Indian summaries) by performing in-context learning over Large Language Models (LLMs). Our experiments demonstrate the use of generated reports as an effective strategy for creating high-performing de-identification systems with good generalization capabilities.

Submitted 8 July, 2024; originally announced July 2024.

Comments: Accepted at BioNLP Workshop at ACL 2024; 21 pages (9 pages main content)
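
As a minimal illustration of what a de-identification system outputs, the sketch below replaces already-detected personal-information spans in a discharge-summary sentence with type placeholders. The `mask_spans` helper, the tag names, the character offsets, and the fictional sentence are all hypothetical; in the setting described in this entry, the spans would come from a trained language-model tagger, which this sketch does not implement.

```python
from typing import List, Tuple

# A detected span: (start offset, end offset exclusive, entity type).
Span = Tuple[int, int, str]


def mask_spans(text: str, spans: List[Span]) -> str:
    """Replace each detected span with a [TYPE] placeholder (hypothetical helper)."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not supported in this sketch")
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(f"[{label}]")        # placeholder instead of the identifier
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


note = "Mr. Ravi Kumar was admitted on 12/03/2023 at City Hospital."
print(mask_spans(note, [(4, 14, "NAME"), (31, 41, "DATE"), (45, 58, "HOSPITAL")]))
# Mr. [NAME] was admitted on [DATE] at [HOSPITAL].
```

Replacing identifiers with typed tags (rather than deleting them) keeps the clinical text readable; surrogate generation, where fake but realistic values are substituted, is a common alternative.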

arXiv:2407.04727

Dynamical Embedding of Single Channel Electroencephalogram for Artifact Subspace Reconstruction

Authors: Doli Hazarika, Vishnu KN, Ramdas Ransing, Cota Navin Gupta

Abstract: This study introduces a novel framework to apply the Artifact Subspace Reconstruction (ASR) algorithm to single-channel electroencephalogram (EEG) data. ASR is renowned for its automated capability to effectively eliminate various artifacts, such as eye blinks and eye movements, from EEG signals. Importantly, it has been implemented on Android smartphones, but it relied on multiple channels for principal component subspace calculations. To overcome this limitation, we incorporate the established dynamical embedding approach into the algorithm, naming it Embedded-ASR (E-ASR). In our proposed method, an embedded matrix is first constructed from single-channel EEG data using a series of delay vectors. ASR is then applied to this embedded matrix, and the resulting cleaned single-channel EEG is reconstructed by removing the time lag and concatenating the rows of the embedded matrix. Data were collected from four subjects in a resting state with eyes open from prefrontal (Fp1 and Fp2) electrodes using the CameraEEG app. To assess the effectiveness of the E-ASR algorithm on an EEG dataset with artifacts, we employed performance metrics such as relative root mean square error (RRMSE), correlation coefficient (CC), and average power ratio, and we also estimated the number of eye blinks with and without the E-ASR approach. E-ASR was able to reduce artifacts from the semi-simulated EEG data, with an RRMSE of 45.45% and a CC of 0.91. For real EEG data, the counted eye blinks were manually cross-checked against ground truth obtained from CameraEEG video data across all subjects for the individual Fp1 and Fp2 electrodes. In conclusion, our study suggests that the E-ASR framework can remove artifacts from single-channel EEG data. This promising algorithm may have potential for smartphone-based EEG applications in natural environments, where a minimal number of electrodes is a critical factor.

Submitted 28 June, 2024; originally announced July 2024.
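
A sketch of the delay-embedding round trip behind E-ASR, assuming a lag of one sample and an embedding dimension `m` chosen for illustration. The ASR cleaning step itself is not implemented, and the reconstruction uses diagonal averaging (a common way to invert a delay embedding, swapped in here) rather than the paper's exact procedure. RRMSE and CC use their usual definitions, which we assume match those in the study; all function names are hypothetical.

```python
import numpy as np


def delay_embed(x: np.ndarray, m: int) -> np.ndarray:
    """Stack m copies of x, each delayed by one more sample, into an m x (N - m + 1) matrix."""
    n_cols = len(x) - m + 1
    return np.stack([x[k:k + n_cols] for k in range(m)])


def unembed(emb: np.ndarray) -> np.ndarray:
    """Invert the embedding by averaging all entries that refer to the same time sample."""
    m, n_cols = emb.shape
    total = np.zeros(n_cols + m - 1)
    count = np.zeros(n_cols + m - 1)
    for k in range(m):
        total[k:k + n_cols] += emb[k]  # row k holds samples k .. k + n_cols - 1
        count[k:k + n_cols] += 1
    return total / count


def rrmse(estimate: np.ndarray, reference: np.ndarray) -> float:
    """Relative RMSE: RMS of the error divided by RMS of the reference signal."""
    return float(np.sqrt(np.mean((estimate - reference) ** 2)) / np.sqrt(np.mean(reference ** 2)))


def cc(estimate: np.ndarray, reference: np.ndarray) -> float:
    """Pearson correlation coefficient between two signals."""
    return float(np.corrcoef(estimate, reference)[0, 1])


t = np.linspace(0.0, 2.0, 500)
x = np.sin(2 * np.pi * 10 * t)   # toy 10 Hz trace standing in for one EEG channel
emb = delay_embed(x, m=8)        # 8 x 493 embedded matrix
x_rec = unembed(emb)             # an ASR-like step would modify emb before this line
print(emb.shape, rrmse(x_rec, x), cc(x_rec, x))
```

With an unmodified embedded matrix the round trip returns the original samples, so any change in RRMSE or CC comes from whatever cleaning is applied to the matrix in between.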

arXiv:2406.14670

Exploring Design Choices for Building Language-Specific LLMs

Authors: Atula Tejaswi, Nilesh Gupta, Eunsol Choi

Abstract: Despite rapid progress in large language models (LLMs), their performance on the vast majority of languages remains unsatisfactory. In this paper, we study building language-specific LLMs by adapting monolingual and multilingual LLMs. We conduct systematic experiments on how design choices (base model selection, vocabulary extension, and continued fine-tuning) impact the adapted LLM, both in terms of efficiency (how many tokens are needed to encode the same amount of information) and end task performance. We find that (1) the initial performance before adaptation is not always indicative of the final performance; (2) efficiency can easily be improved with simple vocabulary extension and continued fine-tuning in most LLMs we study; and (3) the optimal adaptation method is highly language-dependent, and the simplest approach works well across various experimental settings. Adapting English-centric models can yield better results than adapting multilingual models despite their worse initial performance on low-resource languages. Together, our work lays the foundations for efficiently building language-specific LLMs by adapting existing LLMs.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 15 pages, 6 figures, 11 tables
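
The entry above measures efficiency as how many tokens are needed to encode the same information. A toy way to see why vocabulary extension helps is to count tokens per word (often called fertility) under a greedy longest-match tokenizer before and after adding entries. The tokenizer, the vocabularies, and the transliterated example are hypothetical; real LLM tokenizers (BPE, SentencePiece) segment text differently.

```python
def tokenize(word: str, vocab: set) -> list:
    """Greedy longest-match segmentation; unknown characters become single-character tokens."""
    max_len = max((len(v) for v in vocab), default=1)
    tokens, i = [], 0
    while i < len(word):
        for size in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + size]
            if piece in vocab or size == 1:  # single characters act as a fallback
                tokens.append(piece)
                i += size
                break
    return tokens


def fertility(text: str, vocab: set) -> float:
    """Average number of tokens per whitespace-separated word (lower is more efficient)."""
    words = text.split()
    return sum(len(tokenize(w, vocab)) for w in words) / len(words)


base_vocab = set()                          # character-level only
extended_vocab = {"nam", "aste", "duniya"}  # hypothetical added entries
print(fertility("namaste duniya", base_vocab))      # 6.5
print(fertility("namaste duniya", extended_vocab))  # 1.5
```

Fewer tokens per word means shorter sequences for the same text, which is the efficiency gain the entry attributes to vocabulary extension; the new embeddings still have to be learned, hence the continued fine-tuning.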

arXiv:2406.10005

Optimal Rates for Functional Linear Regression with General Regularization

Authors: Naveen Gupta, S. Sivananthan, Bharath K. Sriperumbudur

Abstract: Functional linear regression is one of the fundamental and well-studied methods in functional data analysis. In this work, we investigate the functional linear regression model within the context of a reproducing kernel Hilbert space by employing general spectral regularization to approximate the slope function under certain smoothness assumptions. We establish optimal convergence rates for estimation and prediction errors associated with the proposed method under a Hölder-type source condition, which generalizes and sharpens all the known results in the literature.

Submitted 14 June, 2024; originally announced June 2024.
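
General spectral regularization replaces the unstable inverse of a covariance operator with a filter function g_λ applied to its spectrum. The finite-dimensional sketch below, assuming the slope is estimated from a symmetric positive semidefinite matrix `S` and a vector `h` (discretized stand-ins for the covariance and cross-covariance), shows two standard filters: Tikhonov and spectral cut-off. It is an illustration of the filter idea, not the paper's RKHS estimator, and all names are hypothetical.

```python
import numpy as np


def tikhonov(sig: np.ndarray, lam: float) -> np.ndarray:
    """Tikhonov (ridge) filter g_lam(s) = 1 / (s + lam)."""
    return 1.0 / (sig + lam)


def spectral_cutoff(sig: np.ndarray, lam: float) -> np.ndarray:
    """Spectral cut-off filter: invert eigenvalues >= lam, discard the rest."""
    out = np.zeros_like(sig)
    keep = sig >= lam
    out[keep] = 1.0 / sig[keep]
    return out


def spectral_estimate(S: np.ndarray, h: np.ndarray, lam: float, g) -> np.ndarray:
    """Return g_lam(S) @ h for a symmetric positive semidefinite matrix S."""
    sig, V = np.linalg.eigh(S)
    sig = np.clip(sig, 0.0, None)  # guard against tiny negative round-off
    return V @ (g(sig, lam) * (V.T @ h))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                    # toy discretized covariate curves
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)
S, h = X.T @ X / 200, X.T @ y / 200               # empirical covariance and cross-covariance
beta_ridge = spectral_estimate(S, h, 0.1, tikhonov)
beta_cut = spectral_estimate(S, h, 0.5, spectral_cutoff)
```

With the Tikhonov filter the estimate coincides with the ridge solution (S + λI)⁻¹h; swapping in another filter changes only the function `g`, which is what makes a single analysis cover a whole family of regularizers.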

arXiv:2406.05834

Stochastic ordering of series and parallel systems lifetime in Archimedean copula under random shock

Authors: Sarikul Islam, Nitin Gupta

Abstract: In this manuscript, we study the stochastic ordering behavior of the lifetimes of series and parallel systems comprising dependent and heterogeneous components that experience random shocks and exhibit distinct dependency structures. We establish conditions on the lifetimes of individual components, on the dependency among components defined by Archimedean copulas, and on the impact of random shocks on the overall system lifetime. We consider components whose survival functions are either increasing log-concave or decreasing log-convex functions of the parameters involved, and systems that exhibit different dependency structures. These conditions make it possible to compare the lifetimes of two systems using the usual stochastic order framework. Additionally, we provide examples and graphical representations to elucidate our theoretical findings.

Submitted 9 June, 2024; originally announced June 2024.

Comments: 18 pages, 4 figures

Report number: AP-2024-19419

MSC Class: Primary: 60E15; Secondary: 90B25; 62G30
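
For orientation on the series/parallel entry above: if the component survival functions are coupled by an Archimedean survival copula with generator ψ, the series-system survival is ψ(Σ ψ⁻¹(F̄ᵢ(t))), and analogously the parallel-system CDF is ψ(Σ ψ⁻¹(Fᵢ(t))) under an Archimedean copula. The sketch assumes exponential marginals, a Clayton generator, and the same generator for both copulas purely for illustration; random shocks are omitted, and the helper names are hypothetical.

```python
import numpy as np


def clayton(theta: float):
    """Clayton generator psi(s) = (1 + theta*s)^(-1/theta) and its inverse, theta > 0."""
    def psi(s):
        return (1.0 + theta * s) ** (-1.0 / theta)

    def psi_inv(u):
        return (u ** (-theta) - 1.0) / theta

    return psi, psi_inv


def independence():
    """Generator psi(s) = exp(-s), which yields the independence copula."""
    return (lambda s: np.exp(-s)), (lambda u: -np.log(u))


def series_survival(t, rates, gen):
    """P(min_i X_i > t) with exponential marginals and an Archimedean survival copula."""
    psi, psi_inv = gen
    return psi(sum(psi_inv(np.exp(-r * t)) for r in rates))


def parallel_cdf(t, rates, gen):
    """P(max_i X_i <= t) with exponential marginals and an Archimedean copula."""
    psi, psi_inv = gen
    return psi(sum(psi_inv(1.0 - np.exp(-r * t)) for r in rates))


t = np.linspace(0.01, 5.0, 200)  # avoid t = 0, where the component CDFs vanish
s1 = series_survival(t, [1.0, 2.0], clayton(2.0))
s2 = series_survival(t, [2.0, 2.0], clayton(2.0))
print(np.all(s1 >= s2))  # True: system 1 dominates in the usual stochastic order on this grid
```

The usual stochastic order compares survival functions pointwise, so checking s1 ≥ s2 on a grid is a numerical sanity check only; results like those in the paper give conditions under which the inequality holds for all t.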

arXiv:2405.20244

Chiral $Λ$-$\mathfrak{bms}_4$ symmetry of 3d conformal gravity

Authors: Nishant Gupta, Nemani V. Suryanarayana

Abstract: We propose mixed boundary conditions for 3d conformal gravity, consistent with the variational principle in its second-order formalism, that admit the chiral $Λ$-$\mathfrak{bms}_4$ algebra as their asymptotic symmetry algebra. This algebra is one of the four chiral $\mathcal W$-algebra extensions of $\mathfrak{so}(2,3)$ and is a generalisation of the chiral $\mathfrak{bms}_4$ algebra (responsible for soft theorems of graviton MHV amplitudes in ${\mathbb R}^{1,3}$ gravity) to the case of a non-zero negative cosmological constant. The corresponding charges calculated using the modified covariant phase space formalism are shown to be finite and integrable, and realise this non-linear ${\cal W}$-algebra.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 22 pages

arXiv:2405.19261

Faster Cascades via Speculative Decoding

Authors: Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Seungyeon Kim, Neha Gupta, Aditya Krishna Menon, Sanjiv Kumar

Abstract: Cascades and speculative decoding are two common approaches to improving language models' inference efficiency. Both approaches involve interleaving models of different sizes, but via fundamentally distinct mechanisms: cascades employ a deferral rule that invokes the larger model only for "hard" inputs, while speculative decoding uses speculative execution to primarily invoke the larger model in parallel verification mode. These mechanisms offer different benefits: empirically, cascades are often capable of yielding better quality than even the larger model, while theoretically, speculative decoding offers a guarantee of quality-neutrality. In this paper, we leverage the best of both these approaches by designing new speculative cascading techniques that implement their deferral rule through speculative execution. We characterize the optimal deferral rule for our speculative cascades, and employ a plug-in approximation to the optimal rule. Through experiments with T5 models on benchmark language tasks, we show that the proposed approach yields better cost-quality trade-offs than cascading and speculative decoding baselines.

Submitted 29 May, 2024; originally announced May 2024.
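
The two mechanisms contrasted in the entry above can be sketched for a single next-token distribution over a shared vocabulary: a confidence-threshold cascade rule (a common Chow-style baseline) and the standard speculative-sampling accept/resample step, whose output distribution equals the large model's, which is the quality-neutrality guarantee. The distributions, threshold, and function names are hypothetical, and the paper's speculative-cascade deferral rule is not reproduced.

```python
import numpy as np


def defer_to_large(p_small: np.ndarray, threshold: float) -> bool:
    """Confidence-based cascade rule: defer when the small model's top probability is low."""
    return float(np.max(p_small)) < threshold


def speculative_step(rng: np.random.Generator, p_small: np.ndarray, p_large: np.ndarray) -> int:
    """One token of standard speculative sampling over a shared vocabulary."""
    x = rng.choice(len(p_small), p=p_small)               # draft from the small model
    if rng.random() < min(1.0, p_large[x] / p_small[x]):  # verify with the large model
        return int(x)
    residual = np.maximum(p_large - p_small, 0.0)        # on rejection, resample from here
    return int(rng.choice(len(p_large), p=residual / residual.sum()))


def emitted_distribution(p_small: np.ndarray, p_large: np.ndarray) -> np.ndarray:
    """Exact output distribution of speculative_step: min(p_s, p_l) + rejection mass * residual."""
    accepted = np.minimum(p_small, p_large)
    residual = np.maximum(p_large - p_small, 0.0)
    if residual.sum() == 0.0:
        return accepted
    return accepted + (1.0 - accepted.sum()) * residual / residual.sum()


p_small = np.array([0.5, 0.3, 0.2])
p_large = np.array([0.2, 0.3, 0.5])
print(defer_to_large(p_small, threshold=0.6))                         # True
print(np.allclose(emitted_distribution(p_small, p_large), p_large))   # True

rng = np.random.default_rng(0)
draws = [speculative_step(rng, p_small, p_large) for _ in range(20000)]
print(np.bincount(draws, minlength=3) / len(draws))  # close to p_large
```

The cascade rule decides once per input whether to pay for the large model, while the speculative step always consults the large model but in a verification pass; combining a deferral decision with that verification pass is the design space the entry explores.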

arXiv:2405.15657

Multiple Emission Regions in Jets of Low Luminosity Active Galactic Nucleus in NGC 4278

Authors: Samik Dutta, Nayantara Gupta

Abstract: The Large High Altitude Air Shower Observatory (LHAASO) has detected very high energy gamma rays from the LINER galaxy NGC 4278, which has a low-luminosity active galactic nucleus and symmetric, mildly relativistic, S-shaped twin jets detected by radio observations. Few low-luminosity active galactic nuclei are detected in gamma rays due to their faintness. Earlier, several radio-emitting components were detected in the jets of NGC 4278. We model their radio emission with synchrotron emission of ultra-relativistic electrons to estimate the strength of the magnetic field inside these components within a time-dependent framework after including the ages of the different components. We show that the synchrotron and synchrotron self-Compton emission by these components cannot explain the Swift X-ray data detected from NGC 4278 and the LHAASO gamma-ray data associated with NGC 4278. We suggest that a separate component in one of the jets is responsible for the high-energy emission; its age, size, magnetic field, and the spectrum of the ultra-relativistic electrons inside it have been estimated by fitting the multi-wavelength data of NGC 4278 with the sum of the spectral energy distributions from the radio components and the high-energy component.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14432

Boosting Robustness by Clipping Gradients in Distributed Learning

Authors: Youssef Allouah, Rachid Guerraoui, Nirupam Gupta, Ahmed Jellouli, Geovani Rizk, John Stephan

Abstract: Robust distributed learning consists in achieving good learning performance despite the presence of misbehaving workers. State-of-the-art (SOTA) robust distributed gradient descent (Robust-DGD) methods, relying on robust aggregation, have been proven to be optimal: Their learning error matches the lower bound established under the standard heterogeneity model of $(G, B)$-gradient dissimilarity. The learning guarantee of SOTA Robust-DGD cannot be further improved when model initialization is done arbitrarily. However, we show that it is possible to circumvent the lower bound, and improve the learning performance, when the workers' gradients at model initialization are assumed to be bounded. We prove this by proposing pre-aggregation clipping of workers' gradients, using a novel scheme called adaptive robust clipping (ARC). Incorporating ARC in Robust-DGD provably improves the learning, under the aforementioned assumption on model initialization. The factor of improvement is prominent when the tolerable fraction of misbehaving workers approaches the breakdown point. ARC induces this improvement by constricting the search space, while preserving the robustness property of the original aggregation scheme at the same time. We validate this theoretical finding through exhaustive experiments on benchmark image classification tasks.

Submitted 27 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.
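
To make "pre-aggregation clipping" concrete, the sketch clips the k largest worker gradients to the norm of the next-largest one and then aggregates with a coordinate-wise median. This is an illustration of adaptive, data-dependent clipping in general, not the exact ARC rule from the paper: the number of clipped gradients `k` is a free, hypothetical parameter here, and the coordinate-wise median stands in for whatever robust aggregator is used.

```python
import numpy as np


def adaptive_clip(grads: np.ndarray, k: int) -> np.ndarray:
    """Clip the k largest-norm worker gradients (rows) to the norm of the (k+1)-th largest.

    k is a free parameter here and must satisfy 0 <= k < number of workers.
    """
    if k <= 0:
        return grads.copy()
    norms = np.linalg.norm(grads, axis=1)
    threshold = np.sort(norms)[::-1][k]  # (k+1)-th largest norm
    scale = np.minimum(1.0, threshold / np.maximum(norms, 1e-12))
    return grads * scale[:, None]        # rows at or below the threshold are unchanged


def coordinate_median(grads: np.ndarray) -> np.ndarray:
    """A standard robust aggregator, used here as a stand-in."""
    return np.median(grads, axis=0)


grads = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2],  # honest workers
                  [100.0, -100.0]])                                 # misbehaving worker
clipped = adaptive_clip(grads, k=1)
update = coordinate_median(clipped)
```

Because the threshold is taken from the submitted gradients themselves, no fixed clipping constant has to be tuned, and a single outlier can shrink but never enlarge the set of possible aggregates.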

arXiv:2405.11201

On General Weighted Extropy of Percentile Ranked Set Sampling

Authors: Pradeep Kumar Sahu, Nitin Gupta

Abstract: The extropy measure, first proposed by Lad, Sanfilippo, and Agro in their 2015 paper in Statistical Science, has attracted considerable attention in recent years. Our study introduces a fresh approach to representing weighted extropy in the framework of percentile ranked set sampling. Furthermore, we provide additional insights such as stochastic orders, characterizations, and bounds. Our findings illuminate the comparison between the weighted extropy of percentile ranked set sampling and its equivalent in simple random sampling.

Submitted 18 May, 2024; originally announced May 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2403.02673, arXiv:2207.02003
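
For reference, the extropy of a random variable with density f is J(X) = -(1/2) ∫ f(x)² dx, and a general weighted version inserts a weight w(x) into the integrand. The sketch evaluates these numerically for an exponential distribution with rate λ, where the closed forms are -λ/4 (w(x) = 1) and -1/8 (w(x) = x). The helper name and integration settings are assumptions, and the percentile ranked set sampling constructions of the paper are not reproduced.

```python
import math


def weighted_extropy(pdf, weight, upper: float, n: int = 100_000) -> float:
    """Approximate -(1/2) * integral_0^upper weight(x) * pdf(x)**2 dx with the midpoint rule."""
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += weight(x) * pdf(x) ** 2
    return -0.5 * total * h


rate = 2.0


def exp_pdf(x: float) -> float:
    """Exponential density with the given rate on [0, inf)."""
    return rate * math.exp(-rate * x)


print(weighted_extropy(exp_pdf, lambda x: 1.0, upper=20.0))  # approx -rate/4 = -0.5
print(weighted_extropy(exp_pdf, lambda x: x, upper=20.0))    # approx -1/8 = -0.125
```

The weight x makes the measure sensitive to where the probability mass lies, which is why the weighted value for the exponential no longer depends on the rate.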

arXiv:2405.09966

Tempered Fractional Hawkes Process and Its Generalization

Authors: Neha Gupta, Aditya Maheshwari

Abstract: The Hawkes process (HP) is a point process with a conditionally dependent intensity function. This paper defines the tempered fractional Hawkes process (TFHP) by time-changing the HP with an inverse tempered stable subordinator. We obtain results that generalize the fractional Hawkes process defined in Hainaut (2020) to a tempered version with semi-heavy-tailed decay. We derive the mean, variance, covariance, and the governing fractional difference-differential equations of the TFHP. Additionally, we introduce the generalized fractional Hawkes process (GFHP) by time-changing the HP with the inverse Lévy subordinator. This definition encompasses all potential (inverse Lévy) time changes as specific instances. We also explore the distributional characteristics and the governing difference-differential equation of the one-dimensional distribution for the GFHP.

Submitted 16 May, 2024; originally announced May 2024.

Comments: 15 pages

MSC Class: 60G22; 60G51; 60G55
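
The "conditionally dependent intensity" in the entry above can be made concrete with the common exponential-kernel Hawkes process, λ(t) = μ + Σ_{tᵢ < t} α e^{-β(t - tᵢ)}, simulated here with Ogata's thinning algorithm. The kernel choice, parameter values, and function names are assumptions for illustration; only the base HP is shown, not the tempered or generalized time-changed versions.

```python
import math
import random


def intensity(t, events, mu, alpha, beta):
    """Exponential-kernel Hawkes intensity at time t given past event times."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Ogata thinning: propose from an upper bound on the intensity, accept with the ratio."""
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # Between events the intensity only decays, so its current value
        # (including the jump from an event at time t) bounds it until the next event.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - s)) for s in events)
        t += rng.expovariate(lam_bar)
        if t > horizon:
            return events
        if rng.random() * lam_bar <= intensity(t, events, mu, alpha, beta):
            events.append(t)


# alpha / beta < 1 keeps the process stationary (each event triggers < 1 child on average).
events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=50.0, seed=1)
print(len(events))
```

A time-changed process such as the TFHP would evaluate this counting process at a random clock (an inverse subordinator) instead of at calendar time t.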

arXiv:2405.07205

On Epimorphism and related problems for linear hypersurfaces

Authors: Parnashree Ghosh, Neena Gupta, Ananya Pal

Abstract: Linear hypersurfaces over a field $k$ have been playing a central role in the study of some of the challenging problems on affine spaces. Breakthroughs on such problems have occurred by examining two questions on linear polynomials of the form $H:=α(X_1,\dots,X_m)Y - F(X_1,\dots, X_m,Z,T)\in D:=k[X_1,\ldots,X_m, Y,Z,T]$: (i) whether the affine variety $\mathbb{V}\subset \mathbb{A}^{m+3}_k$ defined by $H$ is isomorphic to $\mathbb{A}^{m+2}_k$; (ii) if $\mathbb{V}$ is isomorphic to an affine space, whether $H$ is a coordinate in $D$. In \cite{adv2}, the first two authors addressed these questions when $α$ is a monomial of the form $α(X_1,\ldots,X_m) = X_1^{r_1}\dots X_m^{r_m}$, with $r_i>1$ for $1 \leqslant i \leqslant m$, and $F$ is of a certain type. In this paper, using $K$-theory and $\mathbb{G}_a$-actions, we address these questions for a wider family of linear varieties. In particular, we show that when the characteristic of $k$ is zero, $F \in k[Z,T]$ and $H$ defines a hyperplane (i.e., the affine variety $\mathbb{V}$ defined by $H$ is an affine space), then $H$ is a coordinate in $D$ along with $X_1, X_2, \dots, X_m$. As a consequence, we obtain certain families of higher-dimensional linear hyperplanes satisfying the Abhyankar-Sathaye conjecture on the Epimorphism Problem. Our results in arbitrary characteristic yield counterexamples to the Zariski Cancellation Problem in positive characteristic.

Submitted 12 May, 2024; originally announced May 2024.

MSC Class: Primary: 14R10; Secondary: 13B25; 13A50; 13A02

arXiv:2405.04374

ASKAP reveals the radio tail structure of the Corkscrew Galaxy shaped by its passage through the Abell 3627 cluster

Authors: Bärbel S. Koribalski, Stefan W. Duchesne, Emil Lenc, Tiziana Venturi, Andrea Botteon, Stanislav S. Shabala, Tessa Vernstrom, Ettore Carretti, Ray P. Norris, Craig Anderson, Andrew M. Hopkins, C. J. Riseley, Nikhel Gupta, Velibor Velović, -

Abstract: Among the bent-tail radio galaxies common in galaxy clusters are some with long, collimated tails (so-called head-tail galaxies) shaped by their interactions with the intracluster medium (ICM). Here we report the discovery of intricate filamentary structure in and beyond the ~28' (570 kpc) long, helical radio tail of the Corkscrew Galaxy (1610-60.5, ESO137-G007), which resides in the X-ray bright cluster Abell 3627 (D = 70 Mpc). Deep radio continuum data were obtained with wide-field Phased Array Feeds on the Australian Square Kilometer Array Pathfinder (ASKAP) at 944 MHz and 1.4 GHz. The Corkscrew Galaxy is located 15' north of the prominent wide-angle tail (WAT) radio galaxy 1610-60.8 (ESO137-G006) near the cluster centre. While the bright (young) part of its radio tail is highly collimated, the faint (old) part shows increasing oscillation amplitudes, break-ups, and filaments. We find a stunning set of arc-shaped radio filaments beyond and mostly orthogonal to the collimated Corkscrew tail end, forming a partial bubble. This may be the first detection of a "proto-lobe" seen in 3D MHD simulations by Nolting et al. (2019), formed by the face-on impact of the Corkscrew Galaxy with a shock front in the cluster outskirts. Interactions of the radio galaxy tail with the ICM are likely responsible for the tail collimation, and shear forces within the ICM for its increasingly filamentary structure. We also report the discovery of small (~20-30 kpc) ram-pressure stripped radio tails in four Abell 3627 cluster galaxies.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 11 pages, 7 figures, MNRAS, submitted

arXiv:2405.00491

On the Relevance of Byzantine Robust Optimization Against Data Poisoning

Authors: Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot

Abstract: The success of machine learning (ML) has been intimately linked with the availability of large amounts of data, typically collected from heterogeneous sources and processed on vast networks of computing devices (also called workers). Beyond accuracy, the use of ML in critical domains such as healthcare and autonomous driving calls for robustness against data poisoning and some faulty workers. The problem of Byzantine ML formalizes these robustness issues by considering a distributed ML environment in which workers (storing a portion of the global dataset) can deviate arbitrarily from the prescribed algorithm. Although the problem has attracted a lot of attention from a theoretical point of view, its practical importance for addressing realistic faults (where the behavior of any worker is locally constrained) remains unclear. It has been argued that the seemingly weaker threat model, where only workers' local datasets get poisoned, is more reasonable. We prove that, while tolerating a wider range of faulty behaviors, Byzantine ML yields solutions that are, in a precise sense, optimal even under the weaker data poisoning threat model. Then, we study a generic data poisoning model wherein some workers have fully-poisonous local data, i.e., their datasets are entirely corruptible, and the remaining workers have partially-poisonous local data, i.e., only a fraction of their local datasets is corruptible. We prove that Byzantine-robust schemes yield optimal solutions against both these forms of data poisoning, and that the former is more harmful when workers have heterogeneous local data.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 38 pages

arXiv:2404.18462

Self-supervised contrastive learning of radio data for source detection, classification and peculiar object discovery

Authors: S. Riggi, T. Cecconello, S. Palazzo, A. M. Hopkins, N. Gupta, C. Bordiu, A. Ingallinera, C. Buemi, F. Bufano, F. Cavallaro, M. D. Filipović, P. Leto, S. Loru, A. C. Ruggeri, C. Trigilio, G. Umana, F. Vitello

Abstract: New advancements in radio data post-processing are underway within the SKA precursor community, aiming to facilitate the extraction of scientific results from survey images through a semi-automated approach. Several of these developments leverage deep learning (DL) methodologies for diverse tasks, including source detection, object or morphology classification, and anomaly detection. Despite substantial progress, the full potential of these methods often remains untapped due to challenges associated with training large supervised models, particularly in the presence of small and class-unbalanced labelled datasets. Self-supervised learning has recently established itself as a powerful methodology to deal with some of the aforementioned challenges, by directly learning a lower-dimensional representation from large samples of unlabelled data. The resulting model and data representation can then be used for data inspection and various downstream tasks if a small subset of labelled data is available. In this work, we explored contrastive learning methods to learn suitable radio data representations from unlabelled images taken from the ASKAP EMU and SARAO MeerKAT GPS surveys. We evaluated trained models and the obtained data representation over smaller labelled datasets, also taken from different radio surveys, in selected analysis tasks: source detection and classification, and search for objects with peculiar morphology. For all explored downstream tasks, we reported and discussed the benefits brought by self-supervised foundational models built on radio data.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 21 pages, 16 figures
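
Contrastive methods like those in the entry above typically train an encoder so that two augmented views of the same image map to nearby embeddings while other images in the batch are pushed apart. The sketch computes a simplified one-directional InfoNCE loss (the family used by SimCLR-style methods) on precomputed embeddings; the temperature, toy embeddings, and function name are assumptions, and no encoder or augmentation pipeline is included.

```python
import numpy as np


def info_nce(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE loss where row i of z1 and row i of z2 are two views of the same image
    (the positive pair) and every other row of z2 acts as a negative."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature               # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))     # positives sit on the diagonal


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))                                  # toy embeddings of 8 images
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))   # views agree
mismatched = info_nce(z, np.roll(z, 1, axis=0))              # views paired with wrong images
print(aligned, mismatched)
```

Minimizing this loss needs no labels, which is why such representations can be learned from large unlabelled survey images and then reused on the small labelled sets.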

arXiv:2404.16816

IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages

Authors: Harman Singh, Nitish Gupta, Shikhar Bharadwaj, Dinesh Tewari, Partha Talukdar

Abstract: As large language models (LLMs) see increasing adoption across the globe, it is imperative for LLMs to be representative of the linguistic diversity of the world. India is a linguistically diverse country of 1.4 billion people. To facilitate research on multilingual LLM evaluation, we release IndicGenBench, the largest benchmark for evaluating LLMs on user-facing generation tasks across a diverse set of 29 Indic languages covering 13 scripts and 4 language families. IndicGenBench is composed of diverse generation tasks like cross-lingual summarization, machine translation, and cross-lingual question answering. IndicGenBench extends existing benchmarks to many Indic languages through human curation, providing multi-way parallel evaluation data for many under-represented Indic languages for the first time. We evaluate a wide range of proprietary and open-source LLMs including GPT-3.5, GPT-4, PaLM-2, mT5, Gemma, BLOOM and LLaMA on IndicGenBench in a variety of settings. The largest PaLM-2 models perform the best on most tasks; however, there is a significant performance gap in all languages compared to English, showing that further research is needed for the development of more inclusive multilingual language models. IndicGenBench is released at www.github.com/google-research-datasets/indic-gen-bench

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.10136

Language Model Cascades: Token-level uncertainty and beyond

Authors: Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

Abstract: Recent advances in language models (LMs) have led to significant improvements in quality on complex NLP tasks, but at the expense of increased inference costs. Cascading offers a simple strategy to achieve more favorable cost-quality tradeoffs: here, a small model is invoked for most "easy" instances, while a few "hard" instances are deferred to the large model. While the principles underpinning cascading are well-studied for classification tasks - with deferral based on predicted class uncertainty favored theoretically and practically - a similar understanding is lacking for generative LM tasks. In this work, we initiate a systematic study of deferral rules for LM cascades. We begin by examining the natural extension of predicted class uncertainty to generative LM tasks, namely, the predicted sequence uncertainty. We show that this measure suffers from the length bias problem, either over- or under-emphasizing outputs based on their lengths. This is because LMs produce a sequence of uncertainty values, one for each output token; and moreover, the number of output tokens is variable across examples. To mitigate this issue, we propose to exploit the richer token-level uncertainty information implicit in generative LMs. We argue that naive predicted sequence uncertainty corresponds to a simple aggregation of these uncertainties. By contrast, we show that incorporating token-level uncertainty through learned post-hoc deferral rules can significantly outperform such simple aggregation strategies, via experiments on a range of natural language benchmarks with FLAN-T5 models. We further show that incorporating embeddings from the smaller model and intermediate layers of the larger model can give an additional boost in the overall cost-quality tradeoff.

Submitted 15 April, 2024; originally announced April 2024.
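To make the length-bias point concrete, here is a minimal sketch, assuming the small model exposes per-token log-probabilities. The helpers `seq_logprob`, `mean_logprob`, `min_logprob` and `should_defer`, and the threshold value, are hypothetical illustrations of fixed aggregation rules, not the learned post-hoc deferral rules proposed in the paper.

```python
from typing import Callable, Sequence

Aggregator = Callable[[Sequence[float]], float]


def seq_logprob(token_logprobs: Sequence[float]) -> float:
    """Log-probability of the whole output: the sum over tokens.

    Longer outputs accumulate more negative terms, so this score tends
    to look less confident for long answers (length bias).
    """
    return sum(token_logprobs)


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalised score: average log-probability per token."""
    return sum(token_logprobs) / len(token_logprobs)


def min_logprob(token_logprobs: Sequence[float]) -> float:
    """Score of the single least confident token."""
    return min(token_logprobs)


def should_defer(
    token_logprobs: Sequence[float], aggregate: Aggregator, threshold: float
) -> bool:
    """Send the input to the large model when the small model's
    aggregated confidence falls below the threshold."""
    return aggregate(token_logprobs) < threshold


# Two outputs with identical per-token confidence but different lengths.
short_answer = [-0.1] * 2
long_answer = [-0.1] * 40
for name, agg in [("sum", seq_logprob), ("mean", mean_logprob), ("min", min_logprob)]:
    print(name, should_defer(short_answer, agg, -1.0), should_defer(long_answer, agg, -1.0))
```

With the sum, only the long answer is deferred even though every token is equally confident, while the mean and minimum ignore length entirely. A learned rule of the kind the abstract describes would instead take token-level values (for example, a few quantiles) as input features rather than fixing the aggregation in advance.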

arXiv:2404.09522 [pdf, other]

The Physalis system: Discovery of ORC-like radio shells around a massive pair of interacting early-type galaxies with offset X-ray emission

Authors: Bärbel S. Koribalski, Ildar Khabibullin, Klaus Dolag, Eugene Churazov, Ray P. Norris, Ettore Carretti, Andrew M. Hopkins, Tessa Vernstrom, Stanislav S. Shabala, Nikhel Gupta

Abstract: We present the discovery of large radio shells around a massive pair of interacting galaxies and extended diffuse X-ray emission within the shells. The radio data were obtained with the Australian Square Kilometre Array Pathfinder (ASKAP) in two frequency bands centred at 944 MHz and 1.4 GHz, respectively, while the X-ray data are from the XMM-Newton observatory. The host galaxy pair, which consists of the early-type galaxies ESO 184-G042 and LEDA 418116, is part of a loose group at a distance of only 75 Mpc (redshift z = 0.017). The observed outer radio shells (diameter ~ 145 kpc) and ridge-like central emission of the system, ASKAP J1914-5433 (Physalis), are likely associated with merger shocks during the formation of the central galaxy (ESO 184-G042) and resemble the new class of odd radio circles (ORCs). This is supported by the brightest X-ray emission being offset from the centre of the Physalis system and instead centred on the less massive galaxy, LEDA 418116. The host galaxy pair is embedded in an irregular envelope of diffuse light, highlighting ongoing interactions. We complement our combined radio and X-ray study with high-resolution simulations of the circumgalactic medium (CGM) around galaxy mergers from the Magneticum project to analyse the evolutionary state of the Physalis system. We argue that ORCs/radio shells could be produced by a combination of energy release from the central AGN and subsequent lighting up in radio emission by merger shocks traveling through the CGM of these systems.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 12 pages, 8 figures, submitted to MNRAS

arXiv:2404.05872 [pdf, other]

doi 10.1145/3649153.3649212

TabConv: Low-Computation CNN Inference via Table Lookups

Authors: Neelesh Gupta, Narayanan Kannan, Pengmiao Zhang, Viktor Prasanna

Abstract: Convolutional Neural Networks (CNNs) have demonstrated remarkable ability throughout the field of computer vision. However, CNN inference requires a large number of arithmetic operations, making them expensive to deploy in hardware. Current approaches alleviate this issue by developing hardware-supported, algorithmic processes to simplify spatial convolution functions. However, these methods still heavily rely on matrix multiplication, leading to significant computational overhead. To bridge the gap between hardware, algorithmic acceleration, and approximate matrix multiplication, we propose TabConv, a novel, table-based approximation for convolution to significantly reduce arithmetic operations during inference. Additionally, we introduce a priority masking technique based on cosine similarity to select layers for table-based approximation, thereby maintaining the model performance. We evaluate our approach on popular CNNs: ResNet-18, ResNet-34, and NetworkInNetwork (NIN). TabConv preserves over 93% of the original model's performance while reducing arithmetic operations by 36.5%, 25.8%, and 99.4% for ResNet-18 on CIFAR-10, CIFAR-100, and MNIST, respectively, 35.6% and 99.3% for ResNet-34 on CIFAR-10 and MNIST, and 98.9% for NIN on MNIST, achieving low-computation inference.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 8 pages, Accepted at CF '24

ACM Class: I.5.1
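Table-based approximation of this kind generally replaces a matrix product with precomputed dot products between prototypes and the weights. The sketch below shows a generic product-quantization-style lookup for a dense layer (a convolution can be lowered to such a product via im2col); it is an illustration under that assumption, not TabConv's exact algorithm, the prototypes are hand-picked rather than learned, and all function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: weights[i][j] maps input i to output j


def split(x: Vector, num_sub: int) -> List[Vector]:
    """Cut a vector into num_sub contiguous, equal-length sub-vectors."""
    d = len(x) // num_sub
    return [x[c * d:(c + 1) * d] for c in range(num_sub)]


def build_tables(prototypes: List[List[Vector]], weights: Matrix) -> List[List[Vector]]:
    """Offline step: tables[c][k][j] = prototypes[c][k] . (rows of subspace c)[:, j]."""
    num_sub = len(prototypes)
    d_sub = len(weights) // num_sub
    n_out = len(weights[0])
    tables = []
    for c, protos in enumerate(prototypes):
        rows = weights[c * d_sub:(c + 1) * d_sub]
        tables.append([
            [sum(p[i] * rows[i][j] for i in range(d_sub)) for j in range(n_out)]
            for p in protos
        ])
    return tables


def encode(x: Vector, prototypes: List[List[Vector]]) -> List[int]:
    """Pick the nearest prototype (squared Euclidean distance) in each subspace."""
    codes = []
    for sub, protos in zip(split(x, len(prototypes)), prototypes):
        dists = [sum((a - b) ** 2 for a, b in zip(sub, p)) for p in protos]
        codes.append(min(range(len(protos)), key=dists.__getitem__))
    return codes


def approx_matvec(x: Vector, prototypes: List[List[Vector]],
                  tables: List[List[Vector]]) -> Vector:
    """Online step: approximate x @ W by summing one table row per subspace.

    The accumulation uses only additions; this simple encoder still computes
    distances, which hashing-based encoders aim to avoid.
    """
    out = [0.0] * len(tables[0][0])
    for c, k in enumerate(encode(x, prototypes)):
        for j, value in enumerate(tables[c][k]):
            out[j] += value
    return out


W = [[1, 2], [3, 4], [5, 6], [7, 8]]
protos = [[[0, 0], [1, 1]], [[0, 1], [2, 0]]]  # 2 subspaces, 2 prototypes each
tables = build_tables(protos, W)
print(approx_matvec([0.9, 1.1, 2.1, 0.1], protos, tables))  # -> [14.0, 18.0]; exact is [15.4, 19.6]
```

The approximation error shrinks as the prototypes cover the input distribution better, which is why such schemes usually learn them from data; selecting which layers to approximate (the abstract's cosine-similarity priority masking) is not modelled here.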

arXiv:2404.00665 [pdf, ps, other]

On cumulative and relative cumulative past information generating function

Authors: Santosh Kumar Chaudhary, Nitin Gupta, Achintya Roy

Abstract: In this paper, we introduce the cumulative past information generating function (CPIG) and the relative cumulative past information generating function (RCPIG). We study their properties and establish their relation with the generalized cumulative past entropy (GCPE). We define a CPIG stochastic order and study its relation with the dispersive order. We provide results for the CPIG measure of convoluted random variables in terms of the measures of their components. We obtain some inequalities relating Shannon entropy, CPIG, and GCPE. Some characterization and estimation results for CPIG are also discussed. We also define divergence measures between two random variables, the Jensen-cumulative past information generating function (JCPIG), the Jensen fractional cumulative past entropy measure, the cumulative past Taneja entropy, and the Jensen cumulative past Taneja entropy information measure.

Submitted 22 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.
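As a numerical illustration only, the sketch below assumes that the CPIG of a random variable with cumulative distribution function $F$ on a bounded support $[a, b]$ takes the form $G_\alpha(X) = \int_a^b F^\alpha(x)\,dx$, by analogy with Golomb's information generating function; the paper's exact definition and admissible range of $\alpha$ may differ. Under that assumption, differentiating at $\alpha = 1$ gives $\int F \log F\,dx$, the negative of the cumulative past entropy, which the code checks for the uniform distribution. Function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

CDF = Callable[[float], float]


def _midpoints(a: float, b: float, n: int) -> Tuple[List[float], float]:
    """Midpoints of n equal sub-intervals of [a, b] and their common width."""
    h = (b - a) / n
    return [a + (i + 0.5) * h for i in range(n)], h


def cpig(cdf: CDF, alpha: float, a: float, b: float, n: int = 100_000) -> float:
    """Assumed CPIG: integral of F(x)**alpha over [a, b] (midpoint rule)."""
    xs, h = _midpoints(a, b, n)
    return h * math.fsum(cdf(x) ** alpha for x in xs)


def cumulative_past_entropy(cdf: CDF, a: float, b: float, n: int = 100_000) -> float:
    """CPE = -integral of F log F over [a, b]; points with F = 0 contribute 0."""
    xs, h = _midpoints(a, b, n)
    return -h * math.fsum(F * math.log(F) for F in map(cdf, xs) if F > 0)


def cpig_derivative_at_one(cdf: CDF, a: float, b: float, eps: float = 1e-4) -> float:
    """Central finite difference of alpha -> G_alpha at alpha = 1."""
    return (cpig(cdf, 1 + eps, a, b) - cpig(cdf, 1 - eps, a, b)) / (2 * eps)


def uniform_cdf(x: float) -> float:
    """CDF of U(0, 1), evaluated on its support."""
    return x


print(cpig(uniform_cdf, 2.0, 0.0, 1.0))                # close to 1/3
print(cumulative_past_entropy(uniform_cdf, 0.0, 1.0))  # close to 1/4
print(cpig_derivative_at_one(uniform_cdf, 0.0, 1.0))   # close to -1/4
```

For $U(0,1)$ the assumed form gives $G_\alpha = 1/(\alpha+1)$, so the three printed values can be compared with $1/3$, $1/4$ and $-1/4$; the bounded support matters because $\int F^\alpha$ diverges over an unbounded upper tail.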

arXiv:2403.20327 [pdf, other]

Gecko: Versatile Text Embeddings Distilled from Large Language Models

Authors: Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, Daniel Cer, Jeremy R. Cole, Kai Hui, Michael Boratko, Rajvi Kapadia, Wen Ding, Yi Luan, Sai Meher Karthik Duddu, Gustavo Hernandez Abrego, Weiqiang Shi, Nithi Gupta, Aditya Kusupati, Prateek Jain, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, Iftekhar Naim

Abstract: We present Gecko, a compact and versatile text embedding model. Gecko achieves strong retrieval performance by leveraging a key idea: distilling knowledge from large language models (LLMs) into a retriever. Our two-step distillation process begins with generating diverse, synthetic paired data using an LLM. Next, we further refine the data quality by retrieving a set of candidate passages for each query, and relabeling the positive and hard negative passages using the same LLM. The effectiveness of our approach is demonstrated by the compactness of Gecko. On the Massive Text Embedding Benchmark (MTEB), Gecko with 256 embedding dimensions outperforms all existing entries with 768 embedding dimensions. Gecko with 768 embedding dimensions achieves an average score of 66.31, competing with 7x larger models and 5x higher dimensional embeddings.

Submitted 29 March, 2024; originally announced March 2024.

Comments: 18 pages

arXiv:2403.17558 [pdf, ps, other]

Neural category

Authors: Neha Gupta, Suhith K N

Abstract: A neural code on $ n $ neurons is a collection of subsets of the set $ [n]=\{1,2,\dots,n\} $. Curto et al. \cite{curto2013neural} associated a ring $\mathcal{R}_{\mathcal{C}}$ (neural ring) to a neural code $\mathcal{C}$. A special class of ring homomorphisms between two neural rings, called neural ring homomorphisms, was introduced by Curto and Youngs \cite{curto2020neural}. The main work in this paper consists of constructing two categories. The first is the category $\mathfrak{C}$, a subcategory of SETS consisting of neural codes and code maps. The second is the neural category $\mathfrak{N}$, a subcategory of Rngs consisting of neural rings and neural ring homomorphisms. The rest of the paper characterizes properties of these two categories such as initial and final objects, products, coproducts, and limits. We also show that these two categories are dually equivalent.

Submitted 26 March, 2024; originally announced March 2024.

MSC Class: 52A37; 92B99; 18A99

arXiv:2403.17548 [pdf, other]

Properties of graphs of neural codes

Authors: Suhith K N, Neha Gupta

Abstract: A neural code on $ n $ neurons is a collection of subsets of the set $ [n]=\{1,2,\dots,n\} $. In this paper, we study some properties of graphs of neural codes. In particular, we study the codeword containment graph (CCG) introduced by Chan et al. (SIAM J. on Dis. Math., 37(1):114-145, 2017) and the general relationship graph (GRG) introduced by Gross et al. (Adv. in App. Math., 95:65-95, 2018). We provide a sufficient condition for the CCG to be connected. We also show that the connectedness and completeness of the CCG are preserved under surjective morphisms between neural codes defined by A. Jeffs (SIAM J. on App. Alg. and Geo., 4(1):99-122, 2020). Further, we show that if the CCG of a neural code $\mathcal{C}$ with $|\mathcal{C}|=m$ is complete, then $\mathcal{C} \cong \{\emptyset,1,12,\dots,123\cdots m\}$ as neural codes. We also prove that a code whose CCG is complete is open convex. Later, we show that if a code $\mathcal{C}$ with $|\mathcal{C}|>3$ has a connected 2-regular CCG, then $|\mathcal{C}|$ is even. The GRG was defined only for degree-two neural codes using the canonical form of their neural ideals. We first define the GRG for any neural code. Then, we show the behaviour of GRGs under the various elementary code maps. Finally, we compare these two graphs for certain classes of codes and examine their properties.

Submitted 26 March, 2024; originally announced March 2024.

MSC Class: 52A37; 92B99; 05C40; 05C99
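For experimenting with small examples, the sketch below builds a containment graph of a code and tests completeness and connectedness. It assumes the CCG has the codewords as vertices, with an edge whenever one codeword is a proper subset of the other; check this against Chan et al.'s definition before relying on it. Under this assumption a CCG is complete exactly when the codewords form a chain under inclusion, which matches the shape of the chain code in the abstract. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set

Codeword = FrozenSet[int]
Graph = Dict[Codeword, Set[Codeword]]


def containment_graph(code: Iterable[Iterable[int]]) -> Graph:
    """Assumed CCG: join two distinct codewords when one contains the other."""
    words = {frozenset(c) for c in code}
    adj: Graph = {w: set() for w in words}
    for u, v in combinations(words, 2):
        if u <= v or v <= u:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def is_complete(adj: Graph) -> bool:
    """Every vertex is adjacent to all the others."""
    return all(len(nbrs) == len(adj) - 1 for nbrs in adj.values())


def is_connected(adj: Graph) -> bool:
    """Depth-first search from an arbitrary vertex reaches every vertex."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        for nbr in adj[stack.pop()] - seen:
            seen.add(nbr)
            stack.append(nbr)
    return len(seen) == len(adj)


chain = containment_graph([[1], [1, 2], [1, 2, 3]])
vee = containment_graph([[1], [2], [1, 2]])
print(is_complete(chain), is_complete(vee), is_connected(vee))  # True False True
```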

arXiv:2403.17397 [pdf, ps, other]

On Abhyankar-Sathaye Conjecture for a family of linear hypersurfaces in $\mathbb{A}_{k}^4$

Authors: Parnashree Ghosh, Neena Gupta, Ananya Pal

Abstract: Let $k$ be a field. In this paper, we study the Abhyankar-Sathaye Epimorphism Conjecture for certain hyperplanes in $\mathbb{A}_k^4$ defined by a polynomial of the form $a(X)Y-F(X,Z,T)$. When $k=\mathbb{C}$, Kaliman, Vénéreau and Zaidenberg have proved that such hyperplanes are rectifiable in $\mathbb{A}^4_{\mathbb{C}}$. We extend their result to any field of characteristic zero, and when $k$ is a field of arbitrary characteristic, we prove the Abhyankar-Sathaye Conjecture for such hyperplanes under some conditions on the roots of $a(X)$.

Submitted 26 March, 2024; originally announced March 2024.

Comments: Comments are welcome!

MSC Class: Primary: 14R10; Secondary: 13B25; 13A50; 13A02

arXiv:2403.14235 [pdf, other]

RG-CAT: Detection Pipeline and Catalogue of Radio Galaxies in the EMU Pilot Survey

Authors: Nikhel Gupta, Ray P. Norris, Zeeshan Hayder, Minh Huynh, Lars Petersson, X. Rosalind Wang, Andrew M. Hopkins, Heinz Andernach, Yjan Gordon, Simone Riggi, Miranda Yew, Evan J. Crawford, Bärbel Koribalski, Miroslav D. Filipović, Anna D. Kapińska, Stanislav Shabala, Tessa Vernstrom, Joshua R. Marvil

Abstract: We present source detection and catalogue construction pipelines to build the first catalogue of radio galaxies from the 270 $\rm deg^2$ pilot survey of the Evolutionary Map of the Universe (EMU-PS) conducted with the Australian Square Kilometre Array Pathfinder (ASKAP) telescope. The detection pipeline uses Gal-DINO computer-vision networks (Gupta et al., 2024) to predict the categories of radio morphology and bounding boxes for radio sources, as well as their potential infrared host positions. The Gal-DINO network is trained and evaluated on approximately 5,000 visually inspected radio galaxies and their infrared hosts, encompassing both compact and extended radio morphologies. We find that the Intersection over Union (IoU) for the predicted and ground truth bounding boxes is larger than 0.5 for 99% of the radio sources, and 98% of predicted host positions are within $3^{\prime \prime}$ of the ground truth infrared host in the evaluation set. The catalogue construction pipeline uses the predictions of the trained network on the radio and infrared image cutouts based on the catalogue of radio components identified using the Selavy source finder algorithm. Confidence scores of the predictions are then used to prioritize Selavy components with higher scores and incorporate them first into the catalogue. This results in identifications for a total of 211,625 radio sources, with 201,211 classified as compact and unresolved.
The remaining 10,414 are categorized as extended radio morphologies, including 582 FR-I, 5,602 FR-II, 1,494 FR-x (uncertain whether FR-I or FR-II), 2,375 R (single-peak resolved) radio galaxies, and 361 with peculiar and other rare morphologies. We cross-match the radio sources in the catalogue with the infrared and optical catalogues, finding infrared cross-matches for 73% and photometric redshifts for 36% of the radio galaxies.

Submitted 21 March, 2024; originally announced March 2024.

Comments: Accepted for publication in PASA. The paper has 22 pages, 12 figures and 5 tables
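The bounding-box criterion quoted above (IoU larger than 0.5 between predicted and ground-truth boxes) is a standard metric. A minimal sketch of it follows, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max) in a common image frame; the names are hypothetical and the code is not taken from the RG-CAT pipeline.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    """Axis-aligned bounding box; coordinates in a shared image frame."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    """Box area, clamped to zero for empty or inverted boxes."""
    return max(0.0, box.x_max - box.x_min) * max(0.0, box.y_max - box.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two boxes, in [0, 1]."""
    overlap = Box(max(a.x_min, b.x_min), max(a.y_min, b.y_min),
                  min(a.x_max, b.x_max), min(a.y_max, b.y_max))
    inter = area(overlap)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def fraction_matched(preds: Sequence[Box], truths: Sequence[Box],
                     threshold: float = 0.5) -> float:
    """Share of (prediction, ground truth) pairs whose IoU exceeds the threshold."""
    pairs = list(zip(preds, truths))
    return sum(iou(p, t) > threshold for p, t in pairs) / len(pairs)


print(iou(Box(0, 0, 2, 2), Box(1, 1, 3, 3)))  # 1/7, about 0.143
```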

arXiv:2403.07534 [pdf, ps, other]

Frobenius numbers associated with Diophantine triples of $x^2+y^2=z^r$ (extended version)

Authors: Takao Komatsu, Neha Gupta, Manoj Upreti

Abstract: We give an explicit formula for the $p$-Frobenius number of triples associated with Diophantine equations $x^2+y^2=z^r$, that is, the largest positive integer that can be represented in at most $p$ ways by combining the three integers of the solutions of Diophantine equations $x^2+y^2=z^r$. When $r=2$, the Frobenius number has already been given.

Submitted 12 March, 2024; originally announced March 2024.
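For checking closed-form results on small cases, the $p$-Frobenius number can be brute-forced: count the representations $n = a_1x_1 + a_2x_2 + a_3x_3$ with nonnegative $x_i$ by a coin-change recurrence and take the largest $n$ with at most $p$ of them. The sketch below is a hypothetical helper, not the paper's formula; the triples used come from $3^2+4^2=5^2$ ($r=2$) and $2^2+11^2=5^3$ ($r=3$). Because appending one copy of $a_i$ maps representations of $n - a_i$ injectively to representations of $n$, the count never decreases in steps of $\min a_i$, so once a full window of $\min a_i$ consecutive values exceeds $p$, every larger $n$ does too; the code uses this to certify that `limit` was large enough.

```python
from typing import List, Sequence


def representation_counts(gens: Sequence[int], limit: int) -> List[int]:
    """counts[n] = number of ways to write n as a nonnegative combination of gens."""
    counts = [1] + [0] * limit
    for g in gens:
        for n in range(g, limit + 1):
            counts[n] += counts[n - g]
    return counts


def p_frobenius(gens: Sequence[int], p: int, limit: int = 10_000) -> int:
    """Largest n with at most p representations (-1 if there is none)."""
    counts = representation_counts(gens, limit)
    window = counts[limit - min(gens) + 1:]
    if any(c <= p for c in window):
        # Either limit is too small or gcd(gens) > 1 (infinitely many gaps).
        raise ValueError("cannot certify the answer below limit")
    return max((n for n, c in enumerate(counts) if c <= p), default=-1)


print(p_frobenius((3, 4, 5), 0), p_frobenius((3, 4, 5), 1))  # 2 7
print(p_frobenius((2, 11, 5), 0))  # 3
```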

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k).
Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks, achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.03004 [pdf, other]

Ultralight vector dark matter search using data from the KAGRA O3GK run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, I. Abouelfettouh, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi , et al. (1778 additional authors not shown)

Abstract: Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 20 pages, 5 figures

Report number: LIGO-P2300250

arXiv:2403.02673 [pdf, ps, other]

On General Weighted Extropy of Extreme Ranked Set Sampling

Authors: Pradeep Kumar Sahu, Nitin Gupta

Abstract: The extropy measure, introduced by Lad, Sanfilippo, and Agro in their 2015 paper in Statistical Science, has garnered significant interest in recent years. In this study, we present a novel representation for the weighted extropy within the context of extreme ranked set sampling. Additionally, we offer related findings such as stochastic orders, characterizations, and precise bounds. Our results shed light on the comparison between the weighted extropy of extreme ranked set sampling and its counterpart in simple random sampling.

Submitted 5 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2207.02003
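Extropy of a continuous random variable with density $f$ is $J(X) = -\tfrac{1}{2}\int f^2(x)\,dx$ (Lad, Sanfilippo and Agro). The sketch below assumes a general weighted form $J^w(X) = -\tfrac{1}{2}\int w(x) f^2(x)\,dx$, which reduces to $J(X)$ for $w \equiv 1$; the paper's definition, and its extension to extreme ranked set samples, may carry further structure not shown here. It evaluates the integral by the midpoint rule on a truncated range, and the function names are hypothetical.

```python
import math
from typing import Callable

Density = Callable[[float], float]


def weighted_extropy(pdf: Density, weight: Callable[[float], float],
                     lower: float, upper: float, n: int = 200_000) -> float:
    """Assumed J^w(X) = -1/2 * integral of w(x) f(x)^2 over [lower, upper]."""
    h = (upper - lower) / n
    xs = (lower + (i + 0.5) * h for i in range(n))
    return -0.5 * h * math.fsum(weight(x) * pdf(x) ** 2 for x in xs)


def exponential_pdf(rate: float) -> Density:
    """Density of the exponential distribution with the given rate."""
    return lambda x: rate * math.exp(-rate * x)


def one(x: float) -> float:
    """Weight w(x) = 1: ordinary (unweighted) extropy."""
    return 1.0


def identity(x: float) -> float:
    """Weight w(x) = x."""
    return x


# Closed forms for Exp(rate): J = -rate/4, and with w(x) = x, J^w = -1/8.
print(weighted_extropy(exponential_pdf(2.0), one, 0.0, 50.0))       # about -0.5
print(weighted_extropy(exponential_pdf(2.0), identity, 0.0, 50.0))  # about -0.125
```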

arXiv:2403.02337 [pdf, other]

First Constraints on the Epoch of Reionization Using the non-Gaussianity of the Kinematic Sunyaev-Zel'dovich Effect from the South Pole Telescope and Herschel-SPIRE Observations

Authors: S. Raghunathan, P. A. R. Ade, A. J. Anderson, B. Ansarinejad, M. Archipley, J. E. Austermann, L. Balkenhol, J. A. Beall, K. Benabed, A. N. Bender, B. A. Benson, F. Bianchini, L. E. Bleem, J. Bock, F. R. Bouchet, L. Bryant, E. Camphuis, J. E. Carlstrom, T. W. Cecil, C. L. Chang, P. Chaubal, H. C. Chiang, P. M. Chichura, T. -L. Chou, R. Citron , et al. (97 additional authors not shown)

Abstract: We report results from an analysis aimed at detecting the trispectrum of the kinematic Sunyaev-Zel'dovich (kSZ) effect by combining data from the South Pole Telescope (SPT) and Herschel-SPIRE experiments over a 100 ${\rm deg}^{2}$ field. The SPT observations combine data from the previous and current surveys, namely SPTpol and SPT-3G, to achieve depths of 4.5, 3, and 16 $μ{\rm K-arcmin}$ in bands centered at 95, 150, and 220 GHz. For SPIRE, we include data from the 600 and 857 GHz bands. We reconstruct the velocity-induced large-scale correlation of the small-scale kSZ signal with a quadratic estimator that uses two cosmic microwave background (CMB) temperature maps, constructed by optimally combining data from all the frequency bands. We reject the null hypothesis of a zero trispectrum at the $10.3σ$ level. However, the measured trispectrum contains contributions from both the kSZ and other undesired components, such as CMB lensing and astrophysical foregrounds, with kSZ being sub-dominant. We use the Agora simulations to estimate the expected signal from CMB lensing and astrophysical foregrounds. After accounting for the contributions from CMB lensing and foreground signals, we do not detect an excess kSZ-only trispectrum and use this non-detection to set constraints on reionization. By applying a prior based on observations of the Gunn-Peterson trough, we obtain an upper limit on the duration of reionization of $Δz_{\rm re, 50} < 4.5$ (95% C.L.). We find these constraints are fairly robust to foreground assumptions.
This trispectrum measurement is independent of, but consistent with, Planck's optical depth measurement. This result is the first constraint on the epoch of reionization using the non-Gaussian nature of the kSZ signal.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 15 pages, 5 figures (3 in main text and 2 in Appendix); To be submitted to PRL; Comments welcome; Data products and plotting scripts can be downloaded from https://github.com/sriniraghunathan/kSZ_4pt_SPT_SPIRE

arXiv:2402.15329 [pdf, ps, other]

Iterations of the functor of naive $\mathbb A^1$-connected components of varieties

Authors: Nidhi Gupta

Abstract: For any sheaf of sets $\mathcal F$ on $Sm/k$, it is well known that the universal $\mathbb A^1$-invariant quotient of $\mathcal F$ is given as the colimit of sheaves $\mathcal S^n(\mathcal F)$, where $\mathcal S(\mathcal F)$ is the sheaf of naive $\mathbb A^1$-connected components of $\mathcal F$. We show that these infinite iterations of naive $\mathbb A^1$-connected components in the construction of the universal $\mathbb A^1$-invariant quotient of a scheme are indeed required. For every $n$, we construct an $\mathbb A^1$-connected variety $X_n$ such that $\mathcal S^n(X_n)\neq \mathcal S^{n+1}(X_n)$ and $\mathcal S^{n+2}(X_n)=*$.

Submitted 14 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: 10 pages, comments are welcome, v2: added subsection 4.3

MSC Class: 14F42

arXiv:2402.13441 [pdf, other]

doi 10.1109/HPEC58863.2023.10363610

PaCKD: Pattern-Clustered Knowledge Distillation for Compressing Memory Access Prediction Models

Authors: Neelesh Gupta, Pengmiao Zhang, Rajgopal Kannan, Viktor Prasanna

Abstract: Deep neural networks (DNNs) have proven to be effective models for accurate Memory Access Prediction (MAP), a critical task in mitigating memory latency through data prefetching. However, existing DNN-based MAP models suffer from challenges such as large physical storage requirements and poor inference latency, primarily due to their large number of parameters. These limitations render them impractical for deployment in real-world scenarios. In this paper, we propose PaCKD, a Pattern-Clustered Knowledge Distillation approach to compress MAP models while maintaining prediction performance. The PaCKD approach encompasses three steps: clustering memory access sequences into distinct partitions involving similar patterns, training large pattern-specific teacher models for memory access prediction for each partition, and training a single lightweight student model by distilling the knowledge from the trained pattern-specific teachers. We evaluate our approach on LSTM, MLP-Mixer, and ResNet models, chosen because they exhibit diverse structures and are widely used for image classification tasks, and test their effectiveness on four widely used graph applications. Compared to the teacher models with 5.406M parameters and an F1-score of 0.4626, our student models achieve a 552$\times$ model size compression while maintaining an F1-score of 0.4538 (a 1.92% performance drop).
Our approach yields an 8.70% higher result compared to student models trained with standard knowledge distillation and an 8.88% higher result compared to student models trained without any form of knowledge distillation.

Submitted 20 February, 2024; originally announced February 2024.

Comments: 6 pages, 2 figures, HPEC '23

Journal ref: 2023 IEEE High Performance Extreme Computing Conference (HPEC), 2023, pp. 1-7
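The third step, distilling several pattern-specific teachers into one student, can be sketched with the standard soft-target distillation loss of Hinton et al., routing each training sample to the teacher of its own cluster. This is a minimal illustration assuming single-label softmax outputs, a temperature and a mixing weight `alpha`; PaCKD's actual loss, output format (memory access prediction scored by F1 may be multi-label), and clustering are not reproduced here, and all names are hypothetical.

```python
import math
from typing import List, Mapping, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 vanish."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      label: int, temperature: float = 2.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * cross-entropy(label)."""
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature)) * temperature ** 2
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard


def pattern_clustered_loss(student_logits: Sequence[float],
                           teacher_logits_by_cluster: Mapping[int, Sequence[float]],
                           cluster_id: int, label: int, **kwargs: float) -> float:
    """Distil each sample from the teacher trained on that sample's pattern cluster."""
    return distillation_loss(student_logits, teacher_logits_by_cluster[cluster_id],
                             label, **kwargs)


teachers = {0: [4.0, 0.5, -1.0], 1: [-1.0, 0.2, 3.5]}
student = [2.0, 0.0, -0.5]
print(pattern_clustered_loss(student, teachers, cluster_id=0, label=0))
print(pattern_clustered_loss(student, teachers, cluster_id=1, label=2))
```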

arXiv:2402.12780 [pdf, other]

Byzantine-Robust Federated Learning: Impact of Client Subsampling and Local Updates

Authors: Youssef Allouah, Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot, Geovani Rizk, Sasha Voitovych

Abstract: The possibility of adversarial (a.k.a. Byzantine) clients makes federated learning (FL) prone to arbitrary manipulation. The natural approach to robustify FL against adversarial clients is to replace the simple averaging operation at the server in the standard $\mathsf{FedAvg}$ algorithm with a robust averaging rule. While a significant amount of work has been devoted to studying the convergence of federated robust averaging (which we denote by $\mathsf{FedRo}$), prior work has largely ignored the impact of client subsampling and local steps, two fundamental FL characteristics. While client subsampling increases the effective fraction of Byzantine clients, local steps increase the drift between the local updates computed by honest (i.e., non-Byzantine) clients. Consequently, a careless deployment of $\mathsf{FedRo}$ could yield poor performance. We validate this observation by presenting an in-depth analysis of $\mathsf{FedRo}$ that tightly characterizes the impact of client subsampling and local steps. Specifically, we present a sufficient condition on client subsampling for nearly-optimal convergence of $\mathsf{FedRo}$ (for smooth non-convex loss). We also show that the rate of improvement in learning accuracy diminishes with respect to the number of clients subsampled once the sample size exceeds a threshold value. Interestingly, we also observe that under a careful choice of step sizes, the learning error due to Byzantine clients decreases with the number of local steps.
We validate our theory by experiments on the FEMNIST and CIFAR-$10$ image classification tasks.
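To make the contrast between plain averaging and a robust averaging rule concrete, here is a minimal sketch. It is not the $\mathsf{FedRo}$ algorithm or its analysis: it assumes coordinate-wise median as the robust rule (one common choice), uses hypothetical honest and Byzantine update vectors, and also shows how subsampling clients can raise the Byzantine share within a single round above the population share.

```python
import random
import statistics


def fed_avg(updates):
    # Standard FedAvg server step: coordinate-wise mean of client updates.
    return [statistics.fmean(coords) for coords in zip(*updates)]


def coordinate_wise_median(updates):
    # One example of a robust averaging rule (an assumption, not the paper's choice).
    return [statistics.median(coords) for coords in zip(*updates)]


def make_updates(n_honest, n_byzantine, rng):
    # Hypothetical setup: honest clients report a noisy version of [1.0, -2.0];
    # Byzantine clients report an arbitrary large vector.
    honest = [[1.0 + rng.gauss(0, 0.1), -2.0 + rng.gauss(0, 0.1)] for _ in range(n_honest)]
    byzantine = [[1e3, 1e3] for _ in range(n_byzantine)]
    return honest + byzantine


def subsampled_byzantine_fraction(n_honest, n_byzantine, sample_size, rng):
    # Fraction of Byzantine clients among those sampled for one round.
    is_byzantine = [False] * n_honest + [True] * n_byzantine
    return sum(rng.sample(is_byzantine, sample_size)) / sample_size


rng = random.Random(0)
updates = make_updates(n_honest=18, n_byzantine=2, rng=rng)
print("FedAvg:", fed_avg(updates))                 # dragged far from [1, -2]
print("Median:", coordinate_wise_median(updates))  # stays close to [1, -2]

# Population share is 2/20 = 10%, but a round that samples 5 clients can see 40%.
worst = max(subsampled_byzantine_fraction(18, 2, 5, rng) for _ in range(1000))
print("worst per-round Byzantine share:", worst)
```

The median ignores the two outliers because they never reach the middle of the sorted coordinates; the mean has no such protection.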

Submitted 10 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.07411 [pdf, other]

Potential-Based Reward Shaping For Intrinsic Motivation

Authors: Grant C. Forbes, Nitish Gupta, Leonardo Villalobos-Arias, Colin M. Potts, Arnav Jhala, David L. Roberts

Abstract: Recently there has been a proliferation of intrinsic motivation (IM) reward-shaping methods to learn in complex and sparse-reward environments. These methods can often inadvertently change the set of optimal policies in an environment, leading to suboptimal behavior. Previous work on mitigating the risks of reward shaping, particularly through potential-based reward shaping (PBRS), has not been applicable to many IM methods, as these are often complex, trainable functions themselves and therefore depend on a wider set of variables than the traditional reward functions for which PBRS was developed. We present an extension to PBRS that we prove preserves the set of optimal policies under a more general set of functions than previously proven. We also present Potential-Based Intrinsic Motivation (PBIM), a method for converting IM rewards into a potential-based form that is usable without altering the set of optimal policies. Testing in the MiniGrid DoorKey and Cliff Walking environments, we demonstrate that PBIM successfully prevents the agent from converging to a suboptimal policy and can speed up training.
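The reason classic PBRS leaves optimal policies unchanged is a telescoping sum. With the shaping term $F(s, s') = \gamma\Phi(s') - \Phi(s)$ and a terminal potential of zero, the shaped discounted return of any trajectory equals the original return minus $\Phi(s_0)$, a quantity no action can influence. The sketch below checks this numerically for a hypothetical potential and random trajectory; it illustrates classic PBRS only, not the paper's extension or PBIM.

```python
import random


def shaping_bonus(phi, s, s_next, gamma, next_is_terminal):
    # Classic potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s),
    # with the potential of a terminal state fixed to 0.
    next_potential = 0.0 if next_is_terminal else phi(s_next)
    return gamma * next_potential - phi(s)


def discounted_return(rewards, gamma):
    return sum(r * gamma ** t for t, r in enumerate(rewards))


def shaped_discounted_return(states, rewards, phi, gamma):
    # states holds len(rewards) + 1 entries; the last one is terminal.
    last = len(rewards) - 1
    shaped = [
        r + shaping_bonus(phi, states[t], states[t + 1], gamma, next_is_terminal=(t == last))
        for t, r in enumerate(rewards)
    ]
    return discounted_return(shaped, gamma)


def phi(state):
    # Hypothetical potential function over integer states.
    return 0.5 * state


rng = random.Random(1)
gamma = 0.9
states = [rng.randint(0, 10) for _ in range(8)]
rewards = [rng.random() for _ in range(7)]

gap = shaped_discounted_return(states, rewards, phi, gamma) - discounted_return(rewards, gamma)
# The gap telescopes to -Phi(s_0), regardless of which actions produced the trajectory.
print(gap, -phi(states[0]))
```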

Submitted 12 February, 2024; originally announced February 2024.

Comments: Extended version of paper appearing in AAMAS 2024

ACM Class: I.2.6

arXiv:2402.02945 [pdf, other]

Stochastic ordering of extreme order statistics in Archimax copula

Authors: Sarikul Islam, Nitin Gupta

Abstract: An extension of the Archimax copula class to more than two random variables (multivariate) was introduced in (Jágr 2011) to describe dependency structures among random variables in higher dimensions, and some properties of Archimax copulas were explored in (Charpentier et al. 2014). In this article, some results on stochastic ordering of extreme order statistics from (Li and Fang 2015) are generalized and proved for Archimax copulas. Stochastic ordering of sample extremes for PHR models is likewise generalized and proved for Archimax copulas. Examples with graphical illustrations are also presented.
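The object being ordered can be written down directly. Assuming the Archimax form $C(u_1,\dots,u_d) = \psi\big(\ell(\psi^{-1}(u_1),\dots,\psi^{-1}(u_d))\big)$ with an Archimedean generator $\psi$ and a stable tail dependence function $\ell$, the distribution of the largest order statistic is $P(\max_i X_i \le x) = C(F_1(x),\dots,F_d(x))$. The sketch below uses a Clayton generator, a logistic $\ell$, and exponential marginals; all of these are illustrative choices, and none of the paper's ordering results is reproduced.

```python
import math


def clayton_generator(theta):
    # Clayton generator psi(t) = (1 + theta t)^(-1/theta), theta > 0, and its inverse.
    def psi(t):
        return (1.0 + theta * t) ** (-1.0 / theta)

    def psi_inv(u):
        return (u ** (-theta) - 1.0) / theta

    return psi, psi_inv


def logistic_stdf(r):
    # Logistic stable tail dependence function l(x) = (sum_i x_i^r)^(1/r), r >= 1.
    def stdf(xs):
        return sum(x ** r for x in xs) ** (1.0 / r)

    return stdf


def archimax_copula(us, psi, psi_inv, stdf):
    # C(u_1, ..., u_d) = psi(l(psi^{-1}(u_1), ..., psi^{-1}(u_d))), all u_i in (0, 1].
    return psi(stdf([psi_inv(u) for u in us]))


def cdf_of_maximum(x, marginal_cdfs, psi, psi_inv, stdf):
    # P(max_i X_i <= x) = C(F_1(x), ..., F_d(x)).
    return archimax_copula([F(x) for F in marginal_cdfs], psi, psi_inv, stdf)


def exponential_cdf(rate):
    return lambda x: 1.0 - math.exp(-rate * x)


marginals = [exponential_cdf(rate) for rate in (1.0, 2.0, 0.5)]
psi, psi_inv = clayton_generator(theta=2.0)
for r in (1.0, 2.0, 5.0):
    # r = 1 gives the Clayton copula; larger r moves towards comonotonicity.
    print(r, cdf_of_maximum(1.0, marginals, psi, psi_inv, logistic_stdf(r)))
```

Two limiting cases serve as sanity checks: the generator $e^{-t}$ with $\ell(x)=\sum_i x_i$ gives independence, and $\ell(x) = \max_i x_i$ gives the upper Fréchet bound $\min_i u_i$.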

Submitted 5 February, 2024; originally announced February 2024.

Comments: Multivariate Statistics, Stochastic ordering, 18 pages, 8 figures

ACM Class: G.3

arXiv:2402.00045 [pdf, other]

Detecting Multimedia Generated by Large AI Models: A Survey

Authors: Li Lin, Neeraj Gupta, Yue Zhang, Hainan Ren, Chun-Hao Liu, Feng Ding, Xin Wang, Xin Li, Luisa Verdoliva, Shu Hu

Abstract: The rapid advancement of Large AI Models (LAIMs), particularly diffusion models and large language models, has marked a new era where AI-generated multimedia is increasingly integrated into various aspects of daily life. Although beneficial in numerous fields, this content presents significant risks, including potential misuse, societal disruptions, and ethical concerns. Consequently, detecting multimedia generated by LAIMs has become crucial, with a marked rise in related research. Despite this, there remains a notable gap in systematic surveys that focus specifically on detecting LAIM-generated multimedia. Addressing this, we provide the first survey to comprehensively cover existing research on detecting multimedia (such as text, images, videos, audio, and multimodal content) created by LAIMs. Specifically, we introduce a novel taxonomy for detection methods, categorized by media modality and aligned with two perspectives: pure detection (aiming to enhance detection performance) and beyond detection (adding attributes like generalizability, robustness, and interpretability to detectors). Additionally, we present a brief overview of generation mechanisms, public datasets, and online detection tools to provide a valuable resource for researchers and practitioners in this field. Furthermore, we identify current challenges in detection and propose directions for future research that address unexplored, ongoing, and emerging issues in detecting multimedia generated by LAIMs.
Our aim for this survey is to fill an academic gap and contribute to global AI security efforts, helping to ensure the integrity of information in the digital realm. The project link is https://github.com/Purdue-M2/Detect-LAIM-generated-Multimedia-Survey.

Submitted 7 February, 2024; v1 submitted 22 January, 2024; originally announced February 2024.

arXiv:2401.13065 [pdf, ps, other]

Extropy and Varextropy estimators with applications

Authors: Santosh Kumar Chaudhary, Nitin Gupta

Abstract: In many statistical studies, measures of uncertainty such as the entropy, extropy, varentropy and varextropy of a distribution function are of prime interest. This paper proposes estimators of extropy and varextropy, and the proposed estimators are consistent. Based on the extropy estimator, a test of symmetry is given. The proposed test has the advantage that the centre of symmetry does not need to be estimated. The critical value and power of the proposed test statistic have been obtained. The test procedure has been applied to six real-life data sets to verify its performance in identifying symmetry.
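Extropy is $J(X) = -\tfrac12\int f^2(x)\,dx = -\tfrac12\,\mathbb{E}[f(X)]$, and varextropy is $VJ(X) = \mathrm{Var}\big(-\tfrac12 f(X)\big)$. A simple way to estimate both, sketched below, is to plug a leave-one-out Gaussian kernel density estimate into these expectations. This is a generic plug-in estimator chosen for illustration; it is not the estimator proposed in the paper, and the Silverman bandwidth is an assumption.

```python
import math
import random
import statistics


def loo_kde_at_samples(xs, bandwidth=None):
    # Leave-one-out Gaussian kernel density estimate f_hat(x_i) at each sample point.
    n = len(xs)
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(xs) * n ** (-0.2)  # Silverman's rule of thumb
    norm = 1.0 / ((n - 1) * bandwidth * math.sqrt(2 * math.pi))
    dens = []
    for i, xi in enumerate(xs):
        total = sum(math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2)
                    for j, xj in enumerate(xs) if j != i)
        dens.append(norm * total)
    return dens


def extropy_and_varextropy(xs):
    # J = -1/2 E[f(X)] and VJ = Var(-1/2 f(X)) = 1/4 Var(f(X)), with f replaced by f_hat.
    dens = loo_kde_at_samples(xs)
    j_hat = -0.5 * statistics.fmean(dens)
    vj_hat = 0.25 * statistics.pvariance(dens)
    return j_hat, vj_hat


rng = random.Random(42)
sample = [rng.gauss(0.0, 1.0) for _ in range(800)]
j_hat, vj_hat = extropy_and_varextropy(sample)
# For N(0, 1): J = -1/(4 sqrt(pi)), about -0.141, and VJ is about 0.0031.
print(j_hat, vj_hat)
```

Kernel smoothing flattens the density slightly, so this plug-in estimate is biased towards zero for small samples.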

Submitted 23 January, 2024; originally announced January 2024.

Comments: arXiv admin note: text overlap with arXiv:2209.06703

arXiv:2401.12381 [pdf, other]

doi 10.1021/jacsau.4c00068

Computational Reverse Engineering Analysis of Scattering Experiments Method for Interpretation of 2D Small-Angle Scattering Profiles (CREASE-2D)

Authors: Sri Vishnuvardhan Reddy Akepati, Nitant Gupta, Arthi Jayaraman

Abstract: Characterization of structural diversity within soft materials is key for engineering new materials for various applications. Small-angle scattering (SAS) is a widely used characterization technique that provides structural information in soft materials at varying length scales and typically outputs scattered intensity I(q) as a function of the scattered wavevector represented by its magnitude q and azimuthal angle θ. While isotropic structures can be interpreted from the azimuthally averaged 1D SAS profile, understanding anisotropic spatial arrangements requires interpreting the 2D SAS profile, I(q,θ). In this paper, we present a new method called CREASE-2D that interprets I(q,θ) as is and outputs the relevant structural features. CREASE-2D is an extension of the 'computational reverse engineering analysis for scattering experiments' (CREASE) method that has been used successfully to analyze 1D SAS profiles for a variety of soft materials. CREASE uses a genetic algorithm for optimization and a surrogate machine learning (ML) model for fast calculation of 1D 'computed' scattering profiles that are then compared to the experimental 1D scattering profiles during optimization. In CREASE-2D, which goes beyond CREASE in interpreting 2D scattering profiles, we use XGBoost as the surrogate ML model to relate structural features to the I(q,θ) profile. The CREASE-2D workflow identifies the structural features whose computed I(q,θ) profiles match the input experimental I(q,θ).
We test the performance of CREASE-2D by using as input a variety of in silico 2D SAS profiles with known structural features and demonstrate that CREASE-2D converges towards their correct structural features. We expect this method will be valuable for materials researchers who need direct interpretation of 2D scattering profiles to explore structural anisotropy.
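The CREASE loop (a genetic algorithm proposes structural features, a fast surrogate computes their profile, and the mismatch with the target profile is the fitness) can be sketched in a few lines. Everything specific here is an assumption for illustration: a closed-form Guinier-like 1D profile $I(q) = A\,e^{-(qR)^2/3}$ stands in for the trained XGBoost surrogate, the features are just $(A, R)$, and the q grid and GA settings are hypothetical.

```python
import math
import random

Q_VALUES = [0.01 * k for k in range(1, 41)]  # hypothetical q grid


def surrogate_profile(features):
    # Stand-in for the ML surrogate: Guinier-like profile for features (A, R).
    a, r = features
    return [a * math.exp(-((q * r) ** 2) / 3.0) for q in Q_VALUES]


def loss(features, target):
    # Compare log intensities, since scattering profiles span orders of magnitude.
    computed = surrogate_profile(features)
    return sum((math.log(c) - math.log(t)) ** 2 for c, t in zip(computed, target))


def genetic_search(target, bounds, pop_size=40, generations=80, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda f: loss(f, target))
        next_pop = pop[:2]  # elitism: the two best candidates survive unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random candidates, twice.
            p1 = min(rng.sample(pop, 3), key=lambda f: loss(f, target))
            p2 = min(rng.sample(pop, 3), key=lambda f: loss(f, target))
            child = []
            for (lo, hi), g1, g2 in zip(bounds, p1, p2):
                w = rng.random()
                gene = w * g1 + (1 - w) * g2              # blend crossover
                gene += rng.gauss(0.0, 0.03 * (hi - lo))  # Gaussian mutation
                child.append(min(max(gene, lo), hi))      # clip to the search bounds
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=lambda f: loss(f, target))


target = surrogate_profile([2.0, 20.0])  # synthetic "experimental" profile, known answer
best = genetic_search(target, bounds=[(0.5, 5.0), (5.0, 50.0)])
print(best)
```

The surrogate's only job is to be cheap to evaluate, since the GA calls it thousands of times; that cost argument is why CREASE uses an ML model rather than computing profiles from particle configurations.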

Submitted 22 January, 2024; originally announced January 2024.

Comments: 14 pages, 5 figures, supporting information included

arXiv:2401.06362 [pdf, other]

Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching

Authors: Pengmiao Zhang, Neelesh Gupta, Rajgopal Kannan, Viktor K. Prasanna

Abstract: Attention-based Neural Networks (NN) have demonstrated their effectiveness in accurate memory access prediction, an essential step in data prefetching. However, the substantial computational overheads associated with these models result in high inference latency, limiting their feasibility as practical prefetchers. To close the gap, we propose a new approach based on tabularization that significantly reduces model complexity and inference latency without sacrificing prediction accuracy. Our novel tabularization methodology takes as input a distilled, yet highly accurate attention-based model for memory access prediction and efficiently converts its expensive matrix multiplications into a hierarchy of fast table lookups. As an exemplar of the above approach, we develop DART, a prefetcher composed of a simple hierarchy of tables. With a modest 0.09 drop in F1-score, DART eliminates 99.99% of arithmetic operations from the large attention-based model and 91.83% from the distilled model. DART accelerates the large model inference by 170x and the distilled model by 9.4x. DART has latency and storage costs comparable to those of the state-of-the-art rule-based prefetcher BO but surpasses it by 6.1% in IPC improvement. DART outperforms the state-of-the-art NN-based prefetchers TransFetch by 33.1% and Voyager by 37.2% in terms of IPC improvement, primarily due to its low prefetching latency.
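The core idea of replacing a matrix multiplication with table lookups can be illustrated with a product-quantization-style sketch: split the input into subvectors, map each subvector to its nearest prototype, and sum partial outputs that were precomputed offline for every prototype. The weights and codebooks below are hypothetical, prototypes would normally be learned (e.g. by k-means), and this does not reproduce DART's table hierarchy or how it tabularizes attention.

```python
def matvec(w, x):
    # Reference dense matrix-vector product.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def build_tables(w, codebooks, sub_dim):
    # Offline: for subspace c and prototype k, store W[:, c-block] @ prototype_k.
    tables = []
    for c, codebook in enumerate(codebooks):
        cols = range(c * sub_dim, (c + 1) * sub_dim)
        tables.append([[sum(row[j] * p for j, p in zip(cols, proto)) for row in w]
                       for proto in codebook])
    return tables


def encode(x, codebooks, sub_dim):
    # Map each input subvector to the index of its nearest prototype.
    codes = []
    for c, codebook in enumerate(codebooks):
        sub = x[c * sub_dim:(c + 1) * sub_dim]
        dists = [sum((a - b) ** 2 for a, b in zip(sub, proto)) for proto in codebook]
        codes.append(dists.index(min(dists)))
    return codes


def table_matvec(codes, tables):
    # Online: add up looked-up partial outputs; no multiplication by W.
    out = [0.0] * len(tables[0][0])
    for table, k in zip(tables, codes):
        out = [o + t for o, t in zip(out, table[k])]
    return out


w = [[0.5, -1.0, 2.0, 0.0], [1.5, 0.25, -0.5, 1.0], [0.0, 2.0, 1.0, -1.0]]
codebooks = [[[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]],   # prototypes for x[0:2]
             [[0.5, 0.5], [2.0, -1.0], [0.0, 1.0]]]   # prototypes for x[2:4]
tables = build_tables(w, codebooks, sub_dim=2)

x = [1.0, 1.0, 2.0, -1.0]   # exactly representable by prototypes -> exact result
print(matvec(w, x), table_matvec(encode(x, codebooks, 2), tables))
x_near = [0.9, 1.1, 1.8, -0.9]  # approximated by the same prototypes
print(matvec(w, x_near), table_matvec(encode(x_near, codebooks, 2), tables))
```

In this toy the encoder still computes distances; the accuracy-versus-cost trade-off of a real tabularized model comes from how cheaply the encoding step can be done and how many prototypes are kept.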

Submitted 21 February, 2024; v1 submitted 23 December, 2023; originally announced January 2024.

arXiv:2401.02412 [pdf, other]

LLM Augmented LLMs: Expanding Capabilities through Composition

Authors: Rachit Bansal, Bidisha Samanta, Siddharth Dalmia, Nitish Gupta, Shikhar Vashishth, Sriram Ganapathy, Abhishek Bapna, Prateek Jain, Partha Talukdar

Abstract: Foundational models with billions of parameters which have been trained on large corpora of data have demonstrated non-trivial skills in a variety of domains. However, due to their monolithic structure, it is challenging and expensive to augment them or impart new skills. On the other hand, due to their adaptation abilities, several new instances of these models are being trained towards new domains and tasks. In this work, we study the problem of efficient and practical composition of existing foundation models with more specific models to enable newer capabilities. To this end, we propose CALM (Composition to Augment Language Models), which introduces cross-attention between models to compose their representations and enable new capabilities. Salient features of CALM are: (i) it scales up LLMs on new tasks by 're-using' existing LLMs along with a few additional parameters and data, (ii) existing model weights are kept intact, thereby preserving existing capabilities, and (iii) it applies to diverse domains and settings. We show that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English and arithmetic reasoning for low-resource languages. Similarly, when PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks, on par with fully fine-tuned counterparts.
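A single composition step can be sketched as cross-attention in which the anchor model's hidden states provide queries and a projected layer of the augmenting model provides keys and values, with the result added back to the anchor's states. This is a simplified, single-head illustration under stated assumptions (the shapes, the weight names `w_proj`, `w_q`, `w_k`, `w_v`, and the toy values are hypothetical); both models' own weights stay frozen and only these new matrices would be trained.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def compose_layer(h_anchor, h_aug, w_proj, w_q, w_k, w_v):
    """Single-head cross-attention from anchor states to augmenting-model states.

    h_anchor: n_a x d_a hidden states from a frozen anchor-model layer.
    h_aug:    n_b x d_b hidden states from a frozen augmenting-model layer.
    w_proj:   d_b x d_a projection into the anchor's width (new parameters).
    w_q, w_k, w_v: d_a x d_a attention weights (new parameters).
    """
    projected = matmul(h_aug, w_proj)
    q = matmul(h_anchor, w_q)
    k = matmul(projected, w_k)
    v = matmul(projected, w_v)
    scale = 1.0 / math.sqrt(len(w_q[0]))
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    attended = matmul([softmax(row) for row in scores], v)
    # Residual connection: the anchor representation is kept and augmented.
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(h_anchor, attended)]


h_anchor = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # 3 tokens, d_a = 2
h_aug = [[0.2, 0.4, 0.6], [1.0, -1.0, 0.0]]        # 2 tokens, d_b = 3
w_proj = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]

out = compose_layer(h_anchor, h_aug, w_proj, eye, eye, eye)
unchanged = compose_layer(h_anchor, h_aug, w_proj, eye, eye, zero)  # w_v = 0: no-op
print(out)
```

Because of the residual connection, zero value weights leave the anchor untouched, which is one way to see why composition need not disturb the anchor's existing behaviour at initialization.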

Submitted 4 January, 2024; originally announced January 2024.

Comments: 17 pages, 2 figures, 8 tables

arXiv:2401.02075 [pdf, other]

SPT Clusters with DES and HST Weak Lensing. II. Cosmological Constraints from the Abundance of Massive Halos

Authors: S. Bocquet, S. Grandis, L. E. Bleem, M. Klein, J. J. Mohr, T. Schrabback, T. M. C. Abbott, P. A. R. Ade, M. Aguena, A. Alarcon, S. Allam, S. W. Allen, O. Alves, A. Amon, A. J. Anderson, J. Annis, B. Ansarinejad, J. E. Austermann, S. Avila, D. Bacon, M. Bayliss, J. A. Beall, K. Bechtol, M. R. Becker, A. N. Bender, et al. (171 additional authors not shown)

Abstract: We present cosmological constraints from the abundance of galaxy clusters selected via the thermal Sunyaev-Zel'dovich (SZ) effect in South Pole Telescope (SPT) data with a simultaneous mass calibration using weak gravitational lensing data from the Dark Energy Survey (DES) and the Hubble Space Telescope (HST). The cluster sample is constructed from the combined SPT-SZ, SPTpol ECS, and SPTpol 500d surveys, and comprises 1,005 confirmed clusters in the redshift range $0.25-1.78$ over a total sky area of 5,200 deg$^2$. We use DES Year 3 weak-lensing data for 688 clusters with redshifts $z<0.95$ and HST weak-lensing data for 39 clusters with $0.6<z<1.7$. The weak-lensing measurements enable robust mass measurements of sample clusters and allow us to empirically constrain the SZ observable--mass relation. For a flat $Λ$CDM cosmology, and marginalizing over the sum of massive neutrinos, we measure $Ω_\mathrm{m}=0.286\pm0.032$, $σ_8=0.817\pm0.026$, and the parameter combination $σ_8\,(Ω_\mathrm{m}/0.3)^{0.25}=0.805\pm0.016$. Our measurement of $S_8\equiv σ_8\,\sqrt{Ω_\mathrm{m}/0.3}=0.795\pm0.029$ and the constraint from Planck CMB anisotropies (2018 TT,TE,EE+lowE) differ by $1.1σ$. In combination with that Planck dataset, we place a 95% upper limit on the sum of neutrino masses $\sum m_ν<0.18$ eV. When additionally allowing the dark energy equation of state parameter $w$ to vary, we obtain $w=-1.45\pm0.31$ from our cluster-based analysis. In combination with Planck data, we measure $w=-1.34^{+0.22}_{-0.15}$, or a $2.2σ$ difference with a cosmological constant.
We use the cluster abundance to measure $σ_8$ in five redshift bins between 0.25 and 1.8, and we find the results to be consistent with structure growth as predicted by the $Λ$CDM model fit to Planck primary CMB data.
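The two derived quantities differ only in the exponent on $Ω_\mathrm{m}/0.3$. Plugging the quoted central values $Ω_\mathrm{m}=0.286$ and $σ_8=0.817$ into each definition gives about 0.798 and 0.807, close to the reported 0.795 and 0.805. Exact agreement should not be expected, since each reported number summarizes its own marginal posterior rather than being computed from the other central values.

```python
import math

omega_m = 0.286  # quoted central value of Omega_m
sigma_8 = 0.817  # quoted central value of sigma_8

s8 = sigma_8 * math.sqrt(omega_m / 0.3)       # S_8 = sigma_8 (Omega_m / 0.3)^0.5
s8_quarter = sigma_8 * (omega_m / 0.3) ** 0.25  # sigma_8 (Omega_m / 0.3)^0.25
print(round(s8, 3), round(s8_quarter, 3))
```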

Submitted 21 June, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

Comments: Accepted for publication in Phys. Rev. D. arXiv v2 corresponds to published article

arXiv:2312.15790 [pdf, other]

Complexity and Operator Growth for Quantum Systems in Dynamic Equilibrium

Authors: Cameron Beetar, Nitin Gupta, S. Shajidul Haque, Jeff Murugan, Hendrik J R Van Zyl

Abstract: Krylov complexity is a measure of operator growth in quantum systems, based on the number of orthogonal basis vectors needed to approximate the time evolution of an operator. In this paper, we study the Krylov complexity of a $\mathsf{PT}$-symmetric system of oscillators, which exhibits two phase transitions that separate a dissipative state, a Rabi-oscillation state, and an ultra-strongly coupled regime. We use a generalization of the $su(1,1)$ algebra associated to the Bateman oscillator to describe the Hamiltonian of the coupled system, and construct a set of coherent states associated with this algebra. We compute the Krylov (spread) complexity using these coherent states, and find that it can distinguish between the $\mathsf{PT}$-symmetric and $\mathsf{PT}$ symmetry-broken phases. We also show that the Krylov complexity reveals the ill-defined nature of the vacuum of the Bateman oscillator, which is a special case of our system. Our results demonstrate the utility of Krylov complexity as a tool to probe the properties and transitions of $\mathsf{PT}$-symmetric systems.
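For a finite Hermitian Hamiltonian, spread complexity is computed in two steps: the Lanczos recursion builds an orthonormal Krylov basis from the initial state and yields coefficients $a_n, b_n$; the state's amplitudes $\phi_n(t)$ in that basis then obey $i\,\dot\phi_n = b_n\phi_{n-1} + a_n\phi_n + b_{n+1}\phi_{n+1}$, and $C(t) = \sum_n n\,|\phi_n(t)|^2$. The sketch below applies this to a small real symmetric matrix chosen for illustration; it does not treat the paper's non-Hermitian $\mathsf{PT}$-symmetric system or its $su(1,1)$ coherent states.

```python
import math


def lanczos(h, psi0, tol=1e-10):
    # Lanczos recursion for a real symmetric matrix h and real start vector psi0.
    # Returns a_n (diagonal) and b_n (b[n-1] couples Krylov vectors n-1 and n).
    def apply(v):
        return [sum(hij * vj for hij, vj in zip(row, v)) for row in h]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    norm = math.sqrt(dot(psi0, psi0))
    basis = [[x / norm for x in psi0]]
    a, b = [], []
    for n in range(len(h)):
        w = apply(basis[n])
        a.append(dot(basis[n], w))
        w = [wi - a[n] * vi for wi, vi in zip(w, basis[n])]
        if n > 0:
            w = [wi - b[n - 1] * pi for wi, pi in zip(w, basis[n - 1])]
        for v in basis:  # full re-orthogonalisation for numerical stability
            c = dot(v, w)
            w = [wi - c * vi for wi, vi in zip(w, v)]
        beta = math.sqrt(dot(w, w))
        if beta < tol or n == len(h) - 1:
            break
        b.append(beta)
        basis.append([wi / beta for wi in w])
    return a, b


def spread_complexity(a, b, t_max, steps):
    # RK4 integration of the Krylov-chain Schroedinger equation from phi = (1, 0, ...).
    dim = len(a)

    def rhs(phi):
        out = []
        for n in range(dim):
            acc = a[n] * phi[n]
            if n > 0:
                acc += b[n - 1] * phi[n - 1]
            if n < dim - 1:
                acc += b[n] * phi[n + 1]
            out.append(-1j * acc)
        return out

    phi = [1.0 + 0j] + [0j] * (dim - 1)
    dt = t_max / steps
    for _ in range(steps):
        k1 = rhs(phi)
        k2 = rhs([p + 0.5 * dt * k for p, k in zip(phi, k1)])
        k3 = rhs([p + 0.5 * dt * k for p, k in zip(phi, k2)])
        k4 = rhs([p + dt * k for p, k in zip(phi, k3)])
        phi = [p + dt / 6 * (x1 + 2 * x2 + 2 * x3 + x4)
               for p, x1, x2, x3, x4 in zip(phi, k1, k2, k3, k4)]
    return sum(n * abs(p) ** 2 for n, p in enumerate(phi)), phi


h = [[1.0, 0.5, 0.0, 0.0],
     [0.5, 2.0, 0.7, 0.0],
     [0.0, 0.7, 0.3, 0.2],
     [0.0, 0.0, 0.2, -1.0]]
a, b = lanczos(h, [1.0, 0.0, 0.0, 0.0])
complexity, phi = spread_complexity(a, b, t_max=2.0, steps=2000)
print(a, b, complexity)
```

Starting from the first basis vector of an already tridiagonal matrix, Lanczos simply recovers its diagonal and off-diagonal entries, which makes this example easy to check by hand.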

Submitted 25 December, 2023; originally announced December 2023.

Comments: 24 + 4 pages and appendices

arXiv:2312.07343 [pdf, ps, other]

Can ChatGPT Play the Role of a Teaching Assistant in an Introductory Programming Course?

Authors: Anishka, Atharva Mehta, Nipun Gupta, Aarav Balachandran, Dhruv Kumar, Pankaj Jalote

Abstract: The emergence of large language models (LLMs) is expected to have a major impact on education. This paper explores the potential of using ChatGPT, an LLM, as a virtual Teaching Assistant (TA) in an Introductory Programming Course. We evaluate ChatGPT's capabilities by comparing its performance with that of human TAs on some of the important TA functions. The TA functions we focus on are (1) grading student code submissions, and (2) providing feedback to undergraduate students in an introductory programming course. First, we assess ChatGPT's proficiency in grading student code submissions using a given grading rubric and compare its performance with the grades assigned by human TAs. Second, we analyze the quality and relevance of the feedback provided by ChatGPT. This evaluation considers how well ChatGPT addresses mistakes and offers suggestions for improvement in student solutions from both code correctness and code quality perspectives. We conclude with a discussion of the implications of integrating ChatGPT into computing education for automated grading, personalized learning experiences, and instructional support.

Submitted 22 January, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: Under review

arXiv:2312.06728 [pdf, other]

doi 10.1017/pasa.2023.64

A Multimodal Dataset and Benchmark for Radio Galaxy and Infrared Host Detection

Authors: Nikhel Gupta, Zeeshan Hayder, Ray P. Norris, Minh Huynh, Lars Petersson

Abstract: We present a novel multimodal dataset developed by expert astronomers to automate the detection and localisation of multi-component extended radio galaxies and their corresponding infrared hosts. The dataset comprises 4,155 instances of galaxies in 2,800 images with both radio and infrared modalities. Each instance contains information on the extended radio galaxy class, its corresponding bounding box that encompasses all of its components, pixel-level segmentation mask, and the position of its corresponding infrared host galaxy. Our dataset is the first publicly accessible dataset that includes images from a highly sensitive radio telescope, infrared satellite, and instance-level annotations for their identification. We benchmark several object detection algorithms on the dataset and propose a novel multimodal approach to identify radio galaxies and the positions of infrared hosts simultaneously.

Submitted 11 December, 2023; originally announced December 2023.

Comments: Accepted in NeurIPS 2023 conference ML4PS workshop (https://nips.cc/). The full version accepted in PASA, is available at https://doi.org/10.1017/pasa.2023.64

arXiv:2312.05456 [pdf, other]

On the calibration of compartmental epidemiological models

Authors: Nikunj Gupta, Anh Mai, Azza Abouzied, Dennis Shasha

Abstract: Epidemiological compartmental models are useful for understanding infectious disease propagation and directing public health policy decisions. Calibration of these models is an important step in offering accurate forecasts of disease dynamics and the effectiveness of interventions. In this study, we present an overview of calibration strategies that can be employed, including several optimization methods and reinforcement learning (RL). We discuss the benefits and drawbacks of these methods and highlight relevant practical conclusions from our experiments. Optimization methods iteratively adjust the parameters of the model until the model output matches the available data, whereas RL uses trial and error to learn the optimal set of parameters by maximizing a reward signal. Finally, we discuss how the calibration of parameters of epidemiological compartmental models is an emerging field that has the potential to improve the accuracy of disease modeling and public health decision-making. Further research is needed to validate the effectiveness and scalability of these approaches in different epidemiological contexts. All code and resources are available at https://github.com/Nikunj-Gupta/On-the-Calibration-of-Compartmental-Epidemiological-Models. We hope this work can facilitate related research.
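The optimization route ("iteratively adjust the parameters until the model output matches the available data") can be sketched on the simplest compartmental model. The sketch assumes an SIR model integrated with forward Euler, synthetic noisy data generated from a known transmission rate $\beta$, a fixed recovery rate $\gamma$, and a sum-of-squared-errors loss; the grid search followed by golden-section refinement is one illustrative optimizer, not necessarily one of those compared in the paper.

```python
import math
import random


def simulate_sir(beta, gamma, s0, i0, days, steps_per_day=10):
    # Forward-Euler SIR on population fractions; returns the infected fraction per day.
    s, i = s0, i0
    dt = 1.0 / steps_per_day
    infected = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - recoveries
        infected.append(i)
    return infected


def sse(beta, observed, gamma, s0, i0):
    # Sum of squared errors between model output and observations.
    model = simulate_sir(beta, gamma, s0, i0, days=len(observed) - 1)
    return sum((m - o) ** 2 for m, o in zip(model, observed))


def calibrate_beta(observed, gamma, s0, i0, lo=0.05, hi=1.0, grid_step=0.01, iters=40):
    # 1) Coarse grid search to bracket the best beta.
    n = round((hi - lo) / grid_step)
    grid = [lo + k * grid_step for k in range(n + 1)]
    best = min(grid, key=lambda beta: sse(beta, observed, gamma, s0, i0))
    left, right = max(lo, best - grid_step), min(hi, best + grid_step)
    # 2) Golden-section refinement, assuming the loss is unimodal inside the bracket.
    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c = right - inv_phi * (right - left)
        d = left + inv_phi * (right - left)
        if sse(c, observed, gamma, s0, i0) < sse(d, observed, gamma, s0, i0):
            right = d
        else:
            left = c
    return (left + right) / 2


rng = random.Random(0)
true_beta, gamma = 0.3, 0.1
clean = simulate_sir(true_beta, gamma, 0.99, 0.01, days=60)
observed = [x + rng.gauss(0.0, 0.002) for x in clean]  # synthetic noisy "data"
beta_hat = calibrate_beta(observed, gamma, 0.99, 0.01)
print(beta_hat)
```

The grid step guards against the local search starting in a flat or misleading region of the loss; with real data, multiple parameters, and noise, this is where the trade-offs between optimizers (and RL) start to matter.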

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.00306 [pdf, other]

RadioGalaxyNET: Dataset and Novel Computer Vision Algorithms for the Detection of Extended Radio Galaxies and Infrared Hosts

Authors: Nikhel Gupta, Zeeshan Hayder, Ray P. Norris, Minh Huynh, Lars Petersson

Abstract: Creating radio galaxy catalogues from next-generation deep surveys requires automated identification of associated components of extended sources and their corresponding infrared hosts. In this paper, we introduce RadioGalaxyNET, a multimodal dataset, and a suite of novel computer vision algorithms designed to automate the detection and localization of multi-component extended radio galaxies and their corresponding infrared hosts. The dataset comprises 4,155 instances of galaxies in 2,800 images with both radio and infrared channels. Each instance provides information about the extended radio galaxy class, its corresponding bounding box encompassing all components, the pixel-level segmentation mask, and the keypoint position of its corresponding infrared host galaxy. RadioGalaxyNET is the first dataset to include images from the highly sensitive Australian Square Kilometre Array Pathfinder (ASKAP) radio telescope, corresponding infrared images, and instance-level annotations for galaxy detection. We benchmark several object detection algorithms on the dataset and propose a novel multimodal approach to simultaneously detect radio galaxies and the positions of infrared hosts.

Submitted 30 November, 2023; originally announced December 2023.

Comments: Accepted for publication in PASA. The paper has 17 pages, 6 figures, 5 tables

arXiv:2311.07512 [pdf, other]

doi 10.21105/astro.2311.07512

Galaxy Clusters Discovered via the Thermal Sunyaev-Zel'dovich Effect in the 500-square-degree SPTpol Survey

Authors: L. E. Bleem, M. Klein, T. M. C. Abbott, P. A. R. Ade, M. Aguena, O. Alves, A. J. Anderson, F. Andrade-Oliveira, B. Ansarinejad, M. Archipley, M. L. N. Ashby, J. E. Austermann, D. Bacon, J. A. Beall, A. N. Bender, B. A. Benson, F. Bianchini, S. Bocquet, D. Brooks, D. L. Burke, M. Calzadilla, J. E. Carlstrom, A. Carnero Rosell, J. Carretero, C. L. Chang, et al. (103 additional authors not shown)

Abstract: We present a catalog of 689 galaxy cluster candidates detected at significance $ξ>4$ via their thermal Sunyaev-Zel'dovich (SZ) effect signature in 95 and 150 GHz data from the 500-square-degree SPTpol survey. We use optical and infrared data from the Dark Energy Camera and the Wide-field Infrared Survey Explorer (WISE) and Spitzer satellites to confirm 544 of these candidates as clusters with $\sim94\%$ purity. The sample has an approximately redshift-independent mass threshold at redshift $z>0.25$ and spans $1.5 \times 10^{14} < M_{500c} < 9.1 \times 10^{14}$ $M_\odot/h_{70}$ and $0.03<z\lesssim1.6$ in mass and redshift, respectively; 21% of the confirmed clusters are at $z>1$. We use external radio data from the Sydney University Molonglo Sky Survey (SUMSS) to estimate contamination to the SZ signal from synchrotron sources. The contamination reduces the recovered $ξ$ by a median value of 0.032, or $\sim0.8\%$ of the $ξ=4$ threshold value, and $\sim7\%$ of candidates have a predicted contamination greater than $Δξ= 1$. With the exception of a small number of systems $(<1\%)$, an analysis of clusters detected in single-frequency 95 and 150 GHz data shows no significant contamination of the SZ signal by emission from dusty or synchrotron sources. This cluster sample will be a key component in upcoming astrophysical and cosmological analyses of clusters.
The SPTpol millimeter-wave maps and associated data products used to produce this sample are available at https://pole.uchicago.edu/public/data/sptpol_500d_clusters/index.html, and the NASA LAMBDA website. An interactive sky server with the SPTpol maps and Dark Energy Survey data release 2 images is also available at NCSA https://skyviewer.ncsa.illinois.edu.

Submitted 8 February, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: Matches version accepted by OJA. 19 pages + references, 14 figures, cluster candidate table provided in Appendix. Data products available at https://pole.uchicago.edu/public/data/sptpol_500d_clusters/index.html and an interactive sky server at https://skyviewer.ncsa.illinois.edu

Journal ref: Open Journal of Astrophysics, Volume 7, 2024

arXiv:2311.07481 [pdf, other]

HESS J1809-193: Gamma-Ray Emission by Cosmic Rays from Past Explosion

Authors: Sovan Boxi, Nayantara Gupta

Abstract: The very high energy gamma-ray source HESS J1809-193 has been detected by the LHAASO and HAWC observatories beyond 100 TeV. It is an interesting candidate for exploring the underlying mechanisms of gamma-ray production due to the presence of supernova remnants, a pulsar and molecular clouds close to it. We consider the injection of energetic cosmic rays from a past explosion, whose remnant may be SNR G011.0-00.0, which is located within the extended gamma-ray source HESS J1809-193. We explain the multi-wavelength data from the region of HESS J1809-193 with synchrotron, inverse Compton and bremsstrahlung emission of cosmic ray electrons, and with secondary gamma-ray production in interactions of cosmic ray protons with the cold protons in the local molecular clouds, within a time-dependent framework that includes the diffusion loss of cosmic rays. The observational data have been modelled with the secondary photons produced by the time-evolved cosmic ray spectrum, assuming the age of the explosion is 4500 years.

Submitted 13 November, 2023; originally announced November 2023.

Comments: 8 pages, 6 figures, Accepted in ApJ

arXiv:2311.00336 [pdf, other]

MALS discovery of a rare HI 21-cm absorber at $z\sim1.35$: origin of the absorbing gas in powerful AGN

Authors: P. P. Deka, N. Gupta, H. W. Chen, S. D. Johnson, P. Noterdaeme, F. Combes, E. Boettcher, S. A. Balashev, K. L. Emig, G. I. G. Józsa, H. -R. Klöckner, J-. K. Krogager, E. Momjian, P. Petitjean, G. C. Rudie, J. Wagenveld, F. S. Zahedy

Abstract: We report a new, rare detection of HI 21-cm absorption associated with a quasar (only six are known at $1<z<2$), here towards J2339-5523 at $z_{em}$ = 1.3531, discovered through the MeerKAT Absorption Line Survey (MALS). The absorption profile is broad ($\sim 400$ km/s), and the peak is redshifted by $\sim 200$ km/s from $z_{em}$. Interestingly, optical/FUV spectra of the quasar from the Magellan-MIKE/HST-COS spectrographs do not show any absorption features associated with the 21-cm absorption. This is despite the coincident presence of the optical quasar and the radio 'core' inferred from a flat-spectrum component with a flux density of $\sim 65$ mJy at high frequencies ($>5$ GHz). The simplest explanation is that no large HI column (N(HI)$>10^{17}$ cm$^{-2}$) is present towards the radio 'core' and the optical AGN. Based on a joint optical and radio analysis of a heterogeneous sample of 16 quasars ($z_{median}$ = 0.7) and 15 radio galaxies ($z_{median}$ = 0.3) with HI 21-cm absorption detections, matched in 1.4 GHz luminosity ($L_{\rm 1.4\,GHz}$), a consistent picture emerges in which quasars primarily trace the gas in the inner circumnuclear disk and in the cocoon created by the jet-ISM interaction. The quasars exhibit an $L_{\rm 1.4\,GHz}$-$\Delta V_{\rm null}$ correlation and a frequent mismatch between the radio and optical spectral lines. The radio galaxies show no such correlation and likely trace the gas from the cocoon and the galaxy-wide ISM outside the photoionization cone. The analysis presented here demonstrates the potential of radio spectroscopic observations to reveal the origin of absorbing gas associated with AGN that may be missed in optical observations.

Submitted 19 February, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: 10 pages, 8 figures, accepted for publication in A&A

arXiv:2310.17204

Cold molecules in HI 21cm absorbers across redshifts 0.1-4

Authors: Francoise Combes, Neeraj Gupta

Abstract: Absorption lines at high redshift in front of quasars are rare in the mm domain: only five associated and five intervening systems have been reported in the literature. These provide information complementary to emission lines, for instance to distinguish between inflows and outflows. They are also good candidates for studying variations of the fundamental constants. We report here a search for molecules in emission and absorption towards a sample of 30 targets, comprising 16 associated and 14 intervening HI 21-cm absorbers. The observations were carried out with the IRAM-30m telescope, simultaneously at 3 mm and 2 mm, covering the CO ladder and HCO+ lines. Eight targets have been detected in emission, of which five are new. Their molecular gas masses range from $10^9$ to $7\times10^{11}$ M$_\odot$. We also report four new detections in absorption. Two of the associated CO absorption-line detections at high redshift (z = 1.211 and 1.275) resulted from high-spatial-resolution follow-up with NOEMA. The disparity between the mm molecular and HI 21-cm absorption lines for these, and for another intervening system detected in HNC at z = 1.275, is attributable to the radio and mm sight lines tracing different media. A comparison of HI and H2 in the 14 known high-redshift molecular absorbers shows that associated HI absorption lines are broad, with multiple components, and that the molecular absorption corresponds to the broader and weaker 21-cm absorption component. This indicates two distinct phases: one near galaxy centers with a larger CO-to-HI abundance ratio, and another with lower molecular abundance in the outer regions of the galaxy. The comparison of interferometric and single-dish observations shows that detecting absorption requires sufficient spatial resolution to overcome dilution by emission. This will be an important criterion for mm follow-up of 21-cm absorbers from ongoing large-scale surveys.

Submitted 16 December, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: 14 pages, 10 figures, accepted in A&A

Showing 1–50 of 556 results for author: Gupta, N