Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison

Authors: Qian Yang, Weixiang Yan, Aishwarya Agrawal

Abstract: Despite tremendous advancements, current state-of-the-art Vision-Language Models (VLMs) are still far from perfect. They tend to hallucinate and may generate biased responses. In such circumstances, having a way to assess the reliability of a given response generated by a VLM is quite useful. Existing methods, such as estimating uncertainty using answer likelihoods or prompt-based confidence generation, often suffer from overconfidence. Other methods use self-consistency comparison but are affected by confirmation biases. To alleviate these, we propose Decompose and Compare Consistency (DeCC) for reliability measurement. By comparing the consistency between the direct answer generated using the VLM's internal reasoning process, and the indirect answers obtained by decomposing the question into sub-questions and reasoning over the sub-answers produced by the VLM, DeCC measures the reliability of the VLM's direct answer. Experiments across six vision-language tasks with three VLMs show DeCC's reliability estimation achieves better correlation with task accuracy compared to the existing methods.

Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: Preprint

arXiv:2407.07000 [pdf, other]

Metron: Holistic Performance Evaluation Framework for LLM Inference Systems

Authors: Amey Agrawal, Anmol Agarwal, Nitin Kedia, Jayashree Mohan, Souvik Kundu, Nipun Kwatra, Ramachandran Ramjee, Alexey Tumanov

Abstract: Serving large language models (LLMs) in production can incur substantial costs, which has prompted recent advances in inference system optimizations. Today, these systems are evaluated against conventional latency and throughput metrics (e.g., TTFT, TBT, Normalised Latency and TPOT). However, these metrics fail to fully capture the nuances of LLM inference, leading to an incomplete assessment of user-facing performance crucial for real-time applications such as chat and translation. In this paper, we first identify the pitfalls of current performance metrics in evaluating LLM inference systems. We then propose Metron, a comprehensive performance evaluation framework that includes fluidity-index, a novel metric designed to reflect the intricacies of the LLM inference process and its impact on real-time user experience. Finally, we evaluate various existing open-source platforms and model-as-a-service offerings using Metron, discussing their strengths and weaknesses. Metron is available at https://github.com/project-metron/metron.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06167 [pdf, other]

DεpS: Delayed ε-Shrinking for Faster Once-For-All Training

Authors: Aditya Annavajjala, Alind Khare, Animesh Agrawal, Igor Fedorov, Hugo Latapie, Myungjin Lee, Alexey Tumanov

Abstract: CNNs are increasingly deployed across different hardware, dynamic environments, and low-power embedded devices. This has led to the design and training of CNN architectures with the goal of maximizing accuracy subject to such variable deployment constraints. As the number of deployment scenarios grows, there is a need to find scalable solutions to design and train specialized CNNs. Once-for-all training has emerged as a scalable approach that jointly co-trains many models (subnets) at once with a constant training cost and finds specialized CNNs later. The scalability is achieved by training the full model and simultaneously reducing it to smaller subnets that share model weights (weight-shared shrinking). However, existing once-for-all training approaches incur huge training costs reaching 1200 GPU hours. We argue this is because they either start the process of shrinking the full model too early or too late. Hence, we propose Delayed ε-Shrinking (DεpS) that starts the process of shrinking the full model when it is partially trained (~50%), which leads to training cost improvement and better in-place knowledge distillation to smaller models. The proposed approach also consists of novel heuristics that dynamically adjust subnet learning rates incrementally (ε), leading to improved weight-shared knowledge distillation from larger to smaller subnets as well. As a result, DεpS outperforms state-of-the-art once-for-all training techniques across different datasets including CIFAR10/100, ImageNet-100, and ImageNet-1k on accuracy and cost. It achieves 1.83% higher ImageNet-1k top-1 accuracy or the same accuracy with 1.3x reduction in FLOPs and 2.5x drop in training cost (GPU*hrs).

Submitted 8 July, 2024; originally announced July 2024.

Comments: Accepted to the 18th European Conference on Computer Vision (ECCV 2024)

arXiv:2407.05618 [pdf, other]

Improved limit on neutrinoless double beta decay of $^{100}$Mo from AMoRE-I

Authors: A. Agrawal, V. V. Alenkov, P. Aryal, J. Beyer, B. Bhandari, R. S. Boiko, K. Boonin, O. Buzanov, C. R. Byeon, N. Chanthima, M. K. Cheoun, J. S. Choe, Seonho Choi, S. Choudhury, J. S. Chung, F. A. Danevich, M. Djamal, D. Drung, C. Enss, A. Fleischmann, A. M. Gangapshev, L. Gastaldo, Y. M. Gavrilyuk, A. M. Gezhaev, O. Gileva, et al. (83 additional authors not shown)

Abstract: AMoRE searches for the signature of neutrinoless double beta decay of $^{100}$Mo with a 100 kg sample of enriched $^{100}$Mo. Scintillating molybdate crystals coupled with a metallic magnetic calorimeter operate at milli-Kelvin temperatures to measure the energy of electrons emitted in the decay. As a demonstration of the full-scale AMoRE, we conducted AMoRE-I, a pre-experiment with 18 molybdate crystals, at the Yangyang Underground Laboratory for over two years. The exposure was 8.02 kg$\cdot$year (or 3.89 kg$_{\mathrm{^{100}Mo}}\cdot$year) and the total background rate near the Q-value was 0.025 $\pm$ 0.002 counts/keV/kg/year. We observed no indication of $0νββ$ decay and report a new lower limit of the half-life of $^{100}$Mo $0νββ$ decay as $T^{0ν}_{1/2}>3.0\times10^{24}~\mathrm{years}$ at 90% confidence level. The effective Majorana mass limit range is $m_{ββ}<$(210--610) meV using nuclear matrix elements estimated in the framework of different models, including the recent shell model calculations.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 7 pages, 4 figures

arXiv:2407.00835 [pdf, other]

Distributed Quantum Computing across an Optical Network Link

Authors: D. Main, P. Drmota, D. P. Nadlinger, E. M. Ainley, A. Agrawal, B. C. Nichol, R. Srinivas, G. Araneda, D. M. Lucas

Abstract: Distributed quantum computing (DQC) combines the computing power of multiple networked quantum processing modules, enabling the execution of large quantum circuits without compromising on performance and connectivity. Photonic networks are well-suited as a versatile and reconfigurable interconnect layer for DQC; remote entanglement shared between matter qubits across the network enables all-to-all logical connectivity via quantum gate teleportation (QGT). For a scalable DQC architecture, the QGT implementation must be deterministic and repeatable; until now, there has been no demonstration satisfying these requirements. We experimentally demonstrate the distribution of quantum computations between two photonically interconnected trapped-ion modules. The modules are separated by $\sim$2 m, and each contains dedicated network and circuit qubits. By using heralded remote entanglement between the network qubits, we deterministically teleport a controlled-Z gate between two circuit qubits in separate modules, achieving 86% fidelity. We then execute Grover's search algorithm (the first implementation of a distributed quantum algorithm comprising multiple non-local two-qubit gates) and measure a 71% success rate. Furthermore, we implement distributed iSWAP and SWAP circuits, compiled with 2 and 3 instances of QGT, respectively, demonstrating the ability to distribute arbitrary two-qubit operations. As photons can be interfaced with a variety of systems, this technique has applications extending beyond trapped-ion quantum computers, providing a viable pathway towards large-scale quantum computing for a range of physical platforms.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 16 pages, 7 figures, 2 tables

arXiv:2407.00548 [pdf, other]

KOROL: Learning Visualizable Object Feature with Koopman Operator Rollout for Manipulation

Authors: Hongyi Chen, Abulikemu Abuduweili, Aviral Agrawal, Yunhai Han, Harish Ravichandar, Changliu Liu, Jeffrey Ichnowski

Abstract: Learning dexterous manipulation skills presents significant challenges due to complex nonlinear dynamics that underlie the interactions between objects and multi-fingered hands. Koopman operators have emerged as a robust method for modeling such nonlinear dynamics within a linear framework. However, current methods rely on runtime access to ground-truth (GT) object states, making them unsuitable for vision-based practical applications. Unlike image-to-action policies that implicitly learn visual features for control, we use a dynamics model, specifically the Koopman operator, to learn visually interpretable object features critical for robotic manipulation within a scene. We construct a Koopman operator using object features predicted by a feature extractor and utilize it to auto-regressively advance system states. We train the feature extractor to embed scene information into object features, thereby enabling the accurate propagation of robot trajectories. We evaluate our approach on simulated and real-world robot tasks, with results showing that it outperformed the model-based imitation learning NDP by 1.08$\times$ and the image-to-action Diffusion Policy by 1.16$\times$. The results suggest that our method maintains task success rates with learned features and extends applicability to real-world manipulation without GT object states.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.17037 [pdf, other]

Cheaper and more noise-resilient quantum state preparation using eigenvector continuation

Authors: Anjali A. Agrawal, Akhil Francis, A. F. Kemper

Abstract: Subspace methods are powerful, noise-resilient methods that can effectively prepare ground states on quantum computers. The challenge is to get a subspace with a small condition number that spans the states of interest using minimal quantum resources. In this work, we will use eigenvector continuation (EC) to build a subspace from the low-lying states of a set of Hamiltonians. The basis vectors are prepared using truncated versions of standard state preparation methods such as imaginary time evolution (ITE) and adiabatic state preparation (ASP). By using these truncated methods combined with eigenvector continuation, we can directly improve upon them, obtaining more accurate ground state energies at a reduced cost. We use several spin systems to demonstrate convergence even when methods like ITE and ASP fail, such as ASP in the presence of level crossings and ITE with vanishing energy gaps. We also showcase the noise resilience of this approach beyond the gains already made by having a shallower quantum circuit. Our findings suggest that eigenvector continuation can be used to improve existing state preparation methods in the near term.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 17 pages, 18 figures

arXiv:2406.10266 [pdf]

COVID-19 Twitter Sentiment Classification Using Hybrid Deep Learning Model Based on Grid Search Methodology

Authors: Jitendra Tembhurne, Anant Agrawal, Kirtan Lakhotia

Abstract: In the contemporary era, social media platforms amass an extensive volume of social data contributed by their users. In order to promptly grasp the opinions and emotional inclinations of individuals regarding a product or event, it becomes imperative to perform sentiment analysis on the user-generated content. Microblog comments often encompass both lengthy and concise text entries, presenting a complex scenario. This complexity is particularly pronounced in extensive textual content due to its rich content and intricate word interrelations compared to shorter text entries. Sentiment analysis of public opinion shared on social networking websites such as Facebook or Twitter has evolved and found diverse applications. However, several challenges remain to be tackled in this field. Hybrid methodologies have emerged as promising models for mitigating sentiment analysis errors, particularly when dealing with progressively intricate training data. In this article, to investigate COVID-19 vaccine hesitancy, we propose eight different hybrid deep learning models for sentiment classification with the aim of improving the overall accuracy of the model. The sentiment prediction is achieved using an embedding, a deep learning model and a grid search (GS) algorithm on a Twitter COVID-19 dataset. According to the study, public sentiment towards COVID-19 immunization appears to be improving with time, as evidenced by the gradual decline in vaccine reluctance. Through extensive evaluation, the proposed model reported an increased accuracy of 98.86%, outperforming other models. Specifically, the combination of BERT, CNN and GS yields the highest accuracy, while the combination of GloVe, BiLSTM, CNN and GS follows closely behind with an accuracy of 98.17%. In addition, an increase in accuracy in the range of 2.11% to 14.46% is reported by the proposed model in comparison with existing works.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 14 pages, 6 figures, 11 tables

arXiv:2406.09998 [pdf, other]

Understanding Pedestrian Movement Using Urban Sensing Technologies: The Promise of Audio-based Sensors

Authors: Chaeyeon Han, Pavan Seshadri, Yiwei Ding, Noah Posner, Bon Woo Koo, Animesh Agrawal, Alexander Lerch, Subhrajit Guhathakurta

Abstract: While various sensors have been deployed to monitor vehicular flows, sensing pedestrian movement is still nascent. Yet walking is a significant mode of travel in many cities, especially those in Europe, Africa, and Asia. Understanding pedestrian volumes and flows is essential for designing safer and more attractive pedestrian infrastructure and for controlling periodic overcrowding. This study discusses a new approach to scale up urban sensing of people with the help of novel audio-based technology. It assesses the benefits and limitations of microphone-based sensors as compared to other forms of pedestrian sensing. A large-scale dataset called ASPED is presented, which includes high-quality audio recordings along with video recordings used for labeling the pedestrian count data. The baseline analyses highlight the promise of using audio sensors for pedestrian tracking, although algorithmic and technological improvements to make the sensors practically usable continue. This study also demonstrates how the data can be leveraged to predict pedestrian trajectories. Finally, it discusses the use cases and scenarios where audio-based pedestrian sensing can support better urban and transportation planning.

Submitted 14 June, 2024; originally announced June 2024.

Comments: submitted to Urban Informatics

arXiv:2406.09698 [pdf, other]

Projected background and sensitivity of AMoRE-II

Authors: A. Agrawal, V. V. Alenkov, P. Aryal, J. Beyer, B. Bhandari, R. S. Boiko, K. Boonin, O. Buzanov, C. R. Byeon, N. Chanthima, M. K. Cheoun, J. S. Choe, Seonho Choi, S. Choudhury, J. S. Chung, F. A. Danevich, M. Djamal, D. Drung, C. Enss, A. Fleischmann, A. M. Gangapshev, L. Gastaldo, Y. M. Gavrilyuk, A. M. Gezhaev, O. Gileva, et al. (81 additional authors not shown)

Abstract: AMoRE-II aims to search for neutrinoless double beta decay with an array of 423 Li$_2$$^{100}$MoO$_4$ crystals operating in the cryogenic system as the main phase of the Advanced Molybdenum-based Rare process Experiment (AMoRE). AMoRE has been planned to operate in three phases: AMoRE-pilot, AMoRE-I, and AMoRE-II. AMoRE-II is currently being installed at the Yemi Underground Laboratory, located approximately 1000 meters deep in Jeongseon, Korea. The goal of AMoRE-II is to reach up to $T^{0νββ}_{1/2}$ $\sim$ 6 $\times$ 10$^{26}$ years, corresponding to an effective Majorana mass of 15-29 meV, covering all the inverted mass hierarchy regions. To achieve this, the background level of the experimental configurations and possible background sources of gamma and beta events should be well understood. We have intensively performed Monte Carlo simulations using the GEANT4 toolkit in all the experimental configurations with potential sources. We report the estimated background level that meets the 10$^{-4}$ counts/(keV$\cdot$kg$\cdot$yr) requirement for AMoRE-II in the region of interest (ROI) and show the projected half-life sensitivity based on the simulation study.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08412 [pdf, other]

Experimental Quantum Advantage in the Odd-Cycle Game

Authors: P. Drmota, D. Main, E. M. Ainley, A. Agrawal, G. Araneda, D. P. Nadlinger, B. C. Nichol, R. Srinivas, A. Cabello, D. M. Lucas

Abstract: We report the first experimental demonstration of the odd-cycle game. We entangle two ions separated by ~2 m and the players use them to win the odd-cycle game significantly more often than the best classical strategy allows. The experiment implements the optimal quantum strategy, is free of the detection loophole (the detection efficiency is >~99.999 %), and achieves 97.8(3) % of the theoretical limit to the quantum winning probability. It provides a nonlocal content of 0.54(2), the largest value for physically separate devices, free of the detection loophole, ever observed.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.06511 [pdf, other]

Quantifying fault tolerant simulation of strongly correlated systems using the Fermi-Hubbard model

Authors: Anjali A. Agrawal, Joshua Job, Tyler L. Wilson, S. N. Saadatmand, Mark J. Hodson, Josh Y. Mutus, Athena Caesura, Peter D. Johnson, Justin E. Elenewski, Kaitlyn J. Morrell, Alexander F. Kemper

Abstract: Understanding the physics of strongly correlated materials is one of the grand challenge problems for physics today. A large class of scientifically interesting materials, from high-$T_c$ superconductors to spin liquids, involve medium to strong correlations, and building a holistic understanding of these materials is critical. Doing so is hindered by the competition between the kinetic energy and Coulomb repulsion, which renders both analytic and numerical methods unsatisfactory for describing interacting materials. Fault-tolerant quantum computers have been proposed as a path forward to overcome these difficulties, but this potential capability has not yet been fully assessed. Here, using the multi-orbital Fermi-Hubbard model as a representative model and a source of scalable problem specifications, we estimate the resource costs needed to use fault-tolerant quantum computers for obtaining experimentally relevant quantities such as correlation function estimation. We find that advances in quantum algorithms and hardware will be needed in order to reduce quantum resources and feasibly address utility-scale problem instances.

Submitted 13 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06079 [pdf, other]

Latent Representation Matters: Human-like Sketches in One-shot Drawing Tasks

Authors: Victor Boutin, Rishav Mukherji, Aditya Agrawal, Sabine Muzellec, Thomas Fel, Thomas Serre, Rufin VanRullen

Abstract: Humans can effortlessly draw new categories from a single exemplar, a feat that has long posed a challenge for generative models. However, this gap has started to close with recent advances in diffusion models. This one-shot drawing task requires powerful inductive biases that have not been systematically investigated. Here, we study how different inductive biases shape the latent space of Latent Diffusion Models (LDMs). Along with standard LDM regularizers (KL and vector quantization), we explore supervised regularizations (including classification and prototype-based representation) and contrastive inductive biases (using SimCLR and redundancy reduction objectives). We demonstrate that LDMs with redundancy reduction and prototype-based regularizations produce near-human-like drawings (regarding both samples' recognizability and originality), better mimicking human perception (as evaluated psychophysically). Overall, our results suggest that the gap between humans and machines in one-shot drawings is almost closed.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.01241 [pdf, ps, other]

Black Holes and Wormholes Beyond Classical General Relativity

Authors: A. S. Agrawal, Sergio Zerbini, B. Mishra

Abstract: In the paper, only Static Spherically Symmetric space-times in four dimensions are considered within modified gravity models. The non-singular static metrics, including black holes not admitting a de Sitter core in the center and traversable wormholes, are reconsidered within a class of higher-order $F(R)$, satisfying the constraints $F(0)=\frac{dF}{dR}(0)=0$. Furthermore, by making use of the so-called effective field theory formulation of gravity, the quantum corrections to Einstein-Hilbert's action due to higher-derivative terms related to curvature invariants are investigated. In particular, in the case of Einstein-Hilbert action plus cubic curvature Goroff-Sagnotti contribution, the second-order correction in the Goroff-Sagnotti coupling constant is computed. In general, it is shown that the effective metrics, namely Schwarzschild expression plus small quantum corrections, are related to black holes and not to traversable wormholes. In this framework, within the approximation considered, the resolution of singularity for $r=0$ is not accomplished. The related properties of these solutions are investigated.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.19747 [pdf, other]

Understanding and mitigating difficulties in posterior predictive evaluation

Authors: Abhinav Agrawal, Justin Domke

Abstract: Predictive posterior densities (PPDs) are of interest in approximate Bayesian inference. Typically, these are estimated by simple Monte Carlo (MC) averages using samples from the approximate posterior. We observe that the signal-to-noise ratio (SNR) of such estimators can be extremely low. An analysis for exact inference reveals SNR decays exponentially as there is an increase in (a) the mismatch between training and test data, (b) the dimensionality of the latent space, or (c) the size of the test data relative to the training data. Further analysis extends these results to approximate inference. To remedy the low SNR problem, we propose replacing simple MC sampling with importance sampling using a proposal distribution optimized at test time on a variational proxy for the SNR and demonstrate that this yields greatly improved estimates.
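As an illustration of the simple MC estimator this abstract refers to, the sketch below averages the likelihood of a test point over posterior samples. It is not the authors' code: the toy Gaussian model, the names `normal_pdf` and `mc_ppd`, and the definition of SNR as the estimate divided by its standard error are all assumptions made here for illustration.

```python
import numpy as np

def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def mc_ppd(y, theta_samples):
    """Simple MC estimate of p(y | data) = E_theta[p(y | theta)].

    Assumes a toy likelihood y ~ N(theta, 1). Returns the estimate and an
    empirical SNR (estimate divided by its standard error, an assumed definition).
    """
    vals = normal_pdf(y, theta_samples, 1.0)
    estimate = vals.mean()
    std_err = vals.std(ddof=1) / np.sqrt(len(vals))
    return estimate, estimate / std_err

rng = np.random.default_rng(0)
post_mean, post_var = 0.0, 0.25            # hypothetical Gaussian posterior over theta
theta = rng.normal(post_mean, np.sqrt(post_var), size=200_000)

y_test = 0.5
estimate, snr = mc_ppd(y_test, theta)
exact = normal_pdf(y_test, post_mean, 1.0 + post_var)  # closed form for this toy model
print(estimate, exact, snr)
```

In this toy setting the closed-form predictive density is available, so the MC estimate can be checked directly; moving `y_test` far from `post_mean` is one way to see the SNR drop.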

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.17247 [pdf, other]

An Introduction to Vision-Language Modeling

Authors: Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie, et al. (16 additional authors not shown)

Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technology. However, there are many challenges that need to be addressed to improve the reliability of those models. While language is discrete, vision evolves in a much higher dimensional space in which concepts cannot always be easily discretized. To better understand the mechanics behind mapping vision to language, we present this introduction to VLMs which we hope will help anyone who would like to enter the field. First, we introduce what VLMs are, how they work, and how to train them. Then, we present and discuss approaches to evaluate VLMs. Although this work primarily focuses on mapping images to language, we also discuss extending VLMs to videos.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.13938 [pdf, other]

eXmY: A Data Type and Technique for Arbitrary Bit Precision Quantization

Authors: Aditya Agrawal, Matthew Hedlund, Blake Hechtman

Abstract: eXmY is a novel data type for quantization of ML models. It supports both arbitrary bit widths and arbitrary integer and floating point formats. For example, it seamlessly supports 3-, 5-, 6-, 7-, and 9-bit formats. For a specific bit width, say 7, it defines all possible formats, e.g., e0m6, e1m5, e2m4, e3m3, e4m2, e5m1 and e6m0. For non-power-of-two bit widths, e.g., 5, 6, 7, we created a novel encoding and decoding scheme which achieves perfect compression, byte addressability and is amenable to sharding and vector processing. We implemented libraries for emulation, encoding and decoding tensors and checkpoints in C++, TensorFlow, JAX and PAX. For optimal performance, the codecs use SIMD instructions on CPUs and vector instructions on TPUs and GPUs. eXmY is also a technique and exploits the statistical distribution of exponents in tensors. It can be used to quantize weights, static and dynamic activations, gradients, master weights and optimizer state. It can reduce memory (CPU DRAM and accelerator HBM), network and disk storage and transfers. It can increase multi-tenancy and accelerate compute. eXmY has been deployed in production for almost 2 years.
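The format naming in this abstract can be reproduced with a short sketch. It assumes (this is inferred from the 7-bit example, not stated in the abstract) that each format spends one bit on the sign and splits the remaining bits between X exponent bits and Y mantissa bits; the function name `exmy_formats` is hypothetical.

```python
def exmy_formats(bit_width: int) -> list[str]:
    """List eXmY format names for a given total bit width.

    Assumption: 1 sign bit + X exponent bits + Y mantissa bits = bit_width,
    which matches the 7-bit example (e0m6 ... e6m0) in the abstract.
    """
    if bit_width < 1:
        raise ValueError("bit_width must be at least 1")
    payload = bit_width - 1  # bits left after the assumed sign bit
    return [f"e{x}m{payload - x}" for x in range(payload + 1)]

print(exmy_formats(7))
```

Under this assumption, a width of n bits yields n formats, running from the integer-like e0m(n-1) to the exponent-only e(n-1)m0.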

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.12993 [pdf]

A novel portfolio construction strategy based on the core-periphery profile of stocks

Authors: Imran Ansari, Charu Sharma, Akshay Agrawal, Niteesh Sahni

Abstract: This paper highlights the significance of mesoscale structures, particularly the core-periphery structure, in financial networks for portfolio optimization. We build portfolios of stocks belonging to the periphery part of the Planar maximally filtered subgraphs of the underlying network of stocks created from Pearson correlations between pairs of stocks and compare their performance with some well-known strategies of Pozzi et al. hinging around the local indices of centrality in terms of the Sharpe ratio, returns and standard deviation. Our findings reveal that these portfolios consistently outperform traditional strategies and, further, that the core-periphery profile obtained is statistically significant across time periods. These empirical findings substantiate the efficacy of using the core-periphery profile of the stock market network for both inter-day and intraday trading and provide valuable insights for investors seeking better returns.

Submitted 27 April, 2024; originally announced May 2024.

arXiv:2405.12859 [pdf, other]

Defect-assisted reversible phase transition in mono- and few-layer ReS$_2$

Authors: George Zograf, Andrew B. Yankovich, Betül Küçüköz, Abhay V. Agrawal, Alexander Yu. Polyakov, Joachim Ciers, Fredrik Eriksson, Åsa Haglund, Paul Erhart, Tomasz J. Antosiewicz, Eva Olsson, Timur O. Shegai

Abstract: Transition metal dichalcogenide (TMD) materials have attracted substantial interest due to their remarkable excitonic, optical, electrical, and mechanical properties, which are highly dependent on their crystal structure. Controlling the crystal structure of these materials is essential for fine-tuning their performance, $\textit{e.g.}$, linear and nonlinear optical, as well as charge transport properties. While various phase-switching TMD materials, like molybdenum telluride (MoTe$_2$), are available, their transitions are often irreversible. Here, we investigate the mechanism of a light-induced reversible phase transition in mono- and bilayer flakes of rhenium disulfide (ReS$_2$). Our observations, based on scanning transmission electron microscopy, nonlinear spectroscopy, and density functional theory calculations, reveal a transition from the ground T$''$ (double distorted T) to the metastable H$'$ (distorted H) phase under femtosecond laser irradiation or influence of highly-energetic electrons. We show that the formation of sulfur vacancies facilitates this phenomenon. Our findings pave the way towards actively manipulating the crystal structure of ReS$_2$ and possibly its heterostructures.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.05465 [pdf, other]

Vidur: A Large-Scale Simulation Framework For LLM Inference

Authors: Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav Gulavani, Ramachandran Ramjee, Alexey Tumanov

Abstract: Optimizing the deployment of large language models (LLMs) is expensive today since it requires experimentally running an application workload against an LLM implementation while exploring a large configuration space formed by system knobs such as parallelization strategies, batching techniques, and scheduling policies. To address this challenge, we present Vidur, a large-scale, high-fidelity, easily-extensible simulation framework for LLM inference performance. Vidur models the performance of LLM operators using a combination of experimental profiling and predictive modeling, and evaluates the end-to-end inference performance for different workloads by estimating several metrics of interest such as latency and throughput. We validate the fidelity of Vidur on several LLMs and show that it estimates inference latency with less than 9% error across the range. Further, we present Vidur-Search, a configuration search tool that helps optimize LLM deployment. Vidur-Search uses Vidur to automatically identify the most cost-effective deployment configuration that meets application performance constraints. For example, Vidur-Search finds the best deployment configuration for LLaMA2-70B in one hour on a CPU machine, in contrast to a deployment-based exploration which would require 42K GPU hours, costing ~218K dollars. Source code for Vidur is available at https://github.com/microsoft/vidur.

Submitted 21 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.01790 [pdf, other]

Understanding Position Bias Effects on Fairness in Social Multi-Document Summarization

Authors: Olubusayo Olabisi, Ameeta Agrawal

Abstract: Text summarization models have typically focused on optimizing aspects of quality such as fluency, relevance, and coherence, particularly in the context of news articles. However, summarization models are increasingly being used to summarize diverse sources of text, such as social media data, that encompass a wide demographic user base. It is thus crucial to assess not only the quality of the generated summaries, but also the extent to which they can fairly represent the opinions of diverse social groups. Position bias, a long-known issue in news summarization, has received limited attention in the context of social multi-document summarization. We deeply investigate this phenomenon by analyzing the effect of group ordering in input documents when summarizing tweets from three distinct linguistic communities: African-American English, Hispanic-aligned Language, and White-aligned Language. Our empirical analysis shows that although the textual quality of the summaries remains consistent regardless of the input document order, in terms of fairness, the results vary significantly depending on how the dialect groups are presented in the input data. Our results suggest that position bias manifests differently in social multi-document summarization, severely impacting the fairness of summarization models.

Submitted 2 May, 2024; originally announced May 2024.

Comments: Accepted at VarDial 2024

arXiv:2404.19159 [pdf, other]

What Drives Performance in Multilingual Language Models?

Authors: Sina Bagheri Nezhad, Ameeta Agrawal

Abstract: This study investigates the factors influencing the performance of multilingual large language models (MLLMs) across diverse languages. We study 6 MLLMs, including masked language models, autoregressive models, and instruction-tuned LLMs, on the SIB-200 dataset, a topic classification dataset encompassing 204 languages. Our analysis considers three scenarios: ALL languages, SEEN languages (present in the model's pretraining data), and UNSEEN languages (not present or documented in the model's pretraining data in any meaningful way). We examine the impact of factors such as pretraining data size, general resource availability, language family, and script type on model performance. Decision tree analysis reveals that pretraining data size is the most influential factor for SEEN languages. However, interestingly, script type and language family are crucial for UNSEEN languages, highlighting the importance of cross-lingual transfer learning. Notably, model size and architecture do not significantly alter the most important features identified. Our findings provide valuable insights into the strengths and limitations of current MLLMs, and we hope they will guide the development of more effective and equitable multilingual NLP systems.

Submitted 29 April, 2024; originally announced April 2024.

Comments: Accepted at VarDial @ NAACL 2024

ACM Class: I.2.7

arXiv:2404.18090 [pdf, other]

A Novel Classification of Attacks on Blockchain Layers: Vulnerabilities, Attacks, Mitigations, and Research Directions

Authors: Kaustubh Dwivedi, Ankit Agrawal, Ashutosh Bhatia, Kamlesh Tiwari

Abstract: The widespread adoption of blockchain technology has amplified the spectrum of potential threats to its integrity and security. The ongoing quest to exploit vulnerabilities emphasizes how critical it is to expand on current research initiatives. Thus, using a methodology based on discrete blockchain layers, our survey study aims to broaden the existing body of knowledge by thoroughly discussing both new and known attack vectors inside the blockchain ecosystem. This survey proposes a novel classification of blockchain attacks and an in-depth investigation of blockchain data security. In particular, the paper provides a thorough discussion of the attack techniques and vulnerabilities that are specific to each tier, along with a detailed look at mitigating techniques. We reveal the deep dynamics of these security concerns by closely investigating the fundamental causes of attacks at various blockchain tiers. We clarify mitigation methods for known vulnerabilities and offer new information on recently developed attack vectors. We also discuss the implications of quantum computing in blockchain and the weaknesses in the current technology that can be exploited in the future. Our study advances the field of blockchain security and privacy research while also contributing to our understanding of blockchain vulnerabilities and attacks. This survey paper is a useful tool for readers who want to learn more about the intricacies of blockchain security. It also invites researchers to help strengthen blockchain privacy and security, paving the way for further developments in this dynamic and ever-evolving field.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.13530 [pdf, other]

Listen Then See: Video Alignment with Speaker Attention

Authors: Aviral Agrawal, Carlos Mateo Samudio Lezcano, Iqui Balam Heredia-Marin, Prabhdeep Singh Sethi

Abstract: Video-based Question Answering (Video QA) is a challenging task and becomes even more intricate when addressing Socially Intelligent Question Answering (SIQA). SIQA requires context understanding, temporal reasoning, and the integration of multimodal information, but in addition, it requires processing nuanced human behavior. Furthermore, the complexities involved are exacerbated by the dominance of the primary modality (text) over the others. Thus, there is a need to help the task's secondary modalities to work in tandem with the primary modality. In this work, we introduce a cross-modal alignment and subsequent representation fusion approach that achieves state-of-the-art results (82.06% accuracy) on the Social IQ 2.0 dataset for SIQA. Our approach exhibits an improved ability to leverage the video modality by using the audio modality as a bridge with the language modality. This leads to enhanced performance by reducing the prevalent issue of language overfitting and resultant video modality bypassing encountered by existing techniques. Our code and models are publicly available at https://github.com/sts-vlcc/sts-vlcc

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13210 [pdf, other]

Laser cooling $^{88}$Sr to microkelvin temperature with an integrated-photonics system

Authors: Andrew R. Ferdinand, Zheng Luo, Sindhu Jammi, Zachary Newman, Grisha Spektor, Okan Koksal, Parth B. Patel, Daniel Sheredy, William Lunden, Akash Rakholia, Travis C. Briles, Wenqi Zhu, Martin M. Boyd, Amit Agrawal, Scott B. Papp

Abstract: We report on experiments generating a magneto-optical trap (MOT) of 88-strontium ($^{88}$Sr) atoms at microkelvin temperature, using integrated-photonics devices. With metasurface optics integrated on a fused-silica substrate, we generate six-beam, circularly polarized, counter-propagating MOTs on the blue broad-line, 461 nm, and red narrow-line, 689 nm, Sr cooling transitions without bulk optics. By use of a diverging beam configuration, we create up to 10 mm diameter MOT beams at the trapping location. To frequency stabilize and linewidth narrow the cooling lasers, we use fiber-packaged, integrated nonlinear waveguides to spectrally broaden a frequency comb. The ultra-coherent supercontinuum of the waveguides covers 650 nm to 2500 nm, enabling phase locks of the cooling lasers to hertz-level linewidth. Our work highlights the possibility to simplify the preparation of an ultracold $^{88}$Sr gas for an optical-lattice clock with photonic devices. By implementing a timing sequence for control of the MOT lasers and the quadrupole magnetic-field gradient, we collect atoms directly from a thermal beam into the blue MOT and continuously cool into a red MOT with dynamic detuning and intensity control. There, the red MOT temperature is as low as $2~\mu$K and the overall transfer efficiency up to 16%. We characterize this sequence, including an intermediate red MOT with modulated detuning. Our experiments demonstrate an integrated photonics system capable of cooling alkaline-earth gases to microkelvin temperature with sufficient transfer efficiencies for adoption in scalable optical clocks and quantum sensors.

Submitted 19 April, 2024; originally announced April 2024.

Comments: 5 pages, 3 figures

arXiv:2404.12241 [pdf, other]

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Authors: Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, et al. (75 additional authors not shown)

Abstract: This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.

Submitted 13 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.09771 [pdf, other]

Eliminating Crossings in Ordered Graphs

Authors: Akanksha Agrawal, Sergio Cabello, Michael Kaufmann, Saket Saurabh, Roohani Sharma, Yushi Uno, Alexander Wolff

Abstract: Drawing a graph in the plane with as few crossings as possible is one of the central problems in graph drawing and computational geometry. Another option is to remove the smallest number of vertices or edges such that the remaining graph can be drawn without crossings. We study both problems in a book-embedding setting for ordered graphs, that is, graphs with a fixed vertex order. In this setting, the vertices lie on a straight line, called the spine, in the given order, and each edge must be drawn on one of several pages of a book such that every edge has at most a fixed number of crossings. In book embeddings, there is another way to reduce or avoid crossings; namely by using more pages. The minimum number of pages needed to draw an ordered graph without any crossings is its (fixed-vertex-order) page number. We show that the page number of an ordered graph with $n$ vertices and $m$ edges can be computed in $2^m \cdot n^{O(1)}$ time. An $O(\log n)$-approximation of this number can be computed efficiently. We can decide in $2^{O(d \sqrt{k} \log (d+k))} \cdot n^{O(1)}$ time whether it suffices to delete $k$ edges of an ordered graph to obtain a $d$-planar layout (where every edge crosses at most $d$ other edges) on one page. As an additional parameter, we consider the size $h$ of a hitting set, that is, a set of points on the spine such that every edge, seen as an open interval, contains at least one of the points. For $h=1$, we can efficiently compute the minimum number of edges whose deletion yields fixed-vertex-order page number $p$. For $h>1$, we give an XP algorithm with respect to $h+p$. Finally, we consider spine+$t$-track drawings, where some but not all vertices lie on the spine. The vertex order on the spine is given; we must map every vertex that does not lie on the spine to one of $t$ tracks, each of which is a straight line on a separate page, parallel to the spine.
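
To make the one-page setting in the abstract above concrete, here is a minimal sketch of counting crossings among edges drawn on a single page of an ordered graph. It relies on the standard observation that two edges on the same page cross exactly when their endpoints interleave along the spine. The function names are hypothetical, vertices are assumed to be given as integer spine positions, and this is not the paper's algorithm.

```python
from itertools import combinations


def edges_cross(e: tuple[int, int], f: tuple[int, int]) -> bool:
    """Return True if two same-page edges cross.

    Vertices are integer positions on the spine. The edges cross exactly
    when their endpoints strictly interleave; edges that share an
    endpoint or are nested do not cross.
    """
    a, b = sorted(e)
    c, d = sorted(f)
    return a < c < b < d or c < a < d < b


def count_crossings(edges: list[tuple[int, int]]) -> int:
    """Count crossing pairs among edges all drawn on one page."""
    return sum(edges_cross(e, f) for e, f in combinations(edges, 2))


# Edges (0, 2) and (1, 3) interleave, so they cross once.
print(count_crossings([(0, 2), (1, 3)]))
```

A layout on one page is $d$-planar in the abstract's sense when no single edge takes part in more than $d$ such crossing pairs; the pairwise check above is quadratic in the number of edges and is meant only as an illustration.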

Submitted 15 April, 2024; originally announced April 2024.

Comments: Appears in Proc. 19th Scandinavian Symposium on Algorithm Theory (SWAT 2024)

arXiv:2404.08744 [pdf, other]

Routing and Spectrum Allocation in Broadband Quantum Entanglement Distribution

Authors: Rohan Bali, Ashley N. Tittelbaugh, Shelbi L. Jenkins, Anuj Agrawal, Jerry Horgan, Marco Ruffini, Daniel C. Kilper, Boulat A. Bash

Abstract: We investigate resource allocation for quantum entanglement distribution over an optical network. We characterize and model a network architecture that employs a single quasi-deterministic time-frequency heralded Einstein-Podolsky-Rosen (EPR) pair source, and develop a routing scheme for distributing entangled photon pairs over such a network. We focus on max-min fairness in entanglement distribution and compare the performance of various spectrum allocation schemes by examining the max-min and median number of EPR-pairs assigned by them, and the Jain index associated with this assignment. Since this presents an NP-hard problem, we identify two approximation algorithms that outperform others in minimum and mean EPR-pair rate distribution and are comparable to others in the Jain index. We also analyze how the network size and connectivity affect these metrics using Watts-Strogatz random graphs. We find that a spectrum allocation approach that achieves high minimum EPR-pair rate can perform significantly worse when the median EPR-pair rate, Jain index, and runtimes are considered.
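
For readers unfamiliar with the fairness metric named in the abstract above, the sketch below computes the Jain index using its standard definition, $J = (\sum_i x_i)^2 / (n \sum_i x_i^2)$. The function name and the sample rates are illustrative assumptions and are not taken from the paper.

```python
def jain_index(rates: list[float]) -> float:
    """Jain fairness index of a list of non-negative allocations.

    Returns a value in [1/n, 1]: 1 when every user receives the same
    rate, 1/n when a single user receives everything.
    """
    if not rates:
        raise ValueError("rates must be non-empty")
    sum_of_squares = sum(r * r for r in rates)
    if sum_of_squares == 0:
        raise ValueError("at least one rate must be non-zero")
    total = sum(rates)
    return (total * total) / (len(rates) * sum_of_squares)


# Hypothetical EPR-pair rates for four user pairs.
print(jain_index([2.0, 2.0, 2.0, 2.0]))  # perfectly even allocation
print(jain_index([1.0, 0.0, 0.0, 0.0]))  # one pair receives everything
```

The index summarizes evenness only, which is why the abstract reports it alongside the minimum and median EPR-pair rates rather than in place of them.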

Submitted 12 April, 2024; originally announced April 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2311.14613

arXiv:2404.02139 [pdf, other]

Lensed Type Ia Supernova "Encore" at z=2: The First Instance of Two Multiply-Imaged Supernovae in the Same Host Galaxy

Authors: J. D. R. Pierel, A. B. Newman, S. Dhawan, M. Gu, B. A. Joshi, T. Li, S. Schuldt, L. G. Strolger, S. H. Suyu, G. B. Caminha, S. H. Cohen, J. M. Diego, J. C. J. Dsilva, S. Ertl, B. L. Frye, G. Granata, C. Grillo, A. M. Koekemoer, J. Li, A. Robotham, J. Summers, T. Treu, R. A. Windhorst, A. Zitrin, S. Agarwal, et al. (38 additional authors not shown)

Abstract: A bright ($m_{\rm F150W,AB}$=24 mag), $z=1.95$ supernova (SN) candidate was discovered in JWST/NIRCam imaging acquired on 2023 November 17. The SN is quintuply-imaged as a result of strong gravitational lensing by a foreground galaxy cluster, detected in three locations, and remarkably is the second lensed SN found in the same host galaxy. The previous lensed SN was called "Requiem", and therefore the new SN is named "Encore". This makes the MACS J0138.0$-$2155 cluster the first known system to produce more than one multiply-imaged SN. Moreover, both SN Requiem and SN Encore are Type Ia SNe (SNe Ia), making this the most distant case of a galaxy hosting two SNe Ia. Using parametric host fitting, we determine the probability of detecting two SNe Ia in this host galaxy over a $\sim10$ year window to be $\approx3\%$. These observations have the potential to yield a Hubble Constant ($H_0$) measurement with $\sim10\%$ precision, only the third lensed SN capable of such a result, using the three visible images of the SN. Both SN Requiem and SN Encore have a fourth image that is expected to appear within a few years of $\sim2030$, providing an unprecedented baseline for time-delay cosmography.

Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: Submitted to ApJL

arXiv:2403.18183 [pdf, other]

Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence

Authors: Hsiu-Wei Yang, Abhinav Agrawal, Pavlos Fragkogiannis, Shubham Nitin Mulay

Abstract: A well-designed document communicates not only through its words but also through its visual eloquence. Authors utilize aesthetic elements such as colors, fonts, graphics, and layouts to shape the perception of information. Thoughtful document design, informed by psychological insights, enhances both the visual appeal and the comprehension of the content. While state-of-the-art document AI models demonstrate the benefits of incorporating layout and image data, it remains unclear whether the nuances of document aesthetics are effectively captured. To bridge the gap between human cognition and AI interpretation of aesthetic elements, we formulated hypotheses concerning AI behavior in document understanding tasks, specifically anchored in document design principles. With a focus on legibility and layout quality, we tested four aspects of aesthetic effects: noise, font-size contrast, alignment, and complexity, on model confidence using correlational analysis. The results and observations highlight the value of model analysis rooted in document design theories. Our work serves as a trailhead for further studies and we advocate for continued research in this topic to deepen our understanding of how AI interprets document aesthetics.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.18121 [pdf, other]

ChatGPT Role-play Dataset: Analysis of User Motives and Model Naturalness

Authors: Yufei Tao, Ameeta Agrawal, Judit Dombi, Tetyana Sydorenko, Jung In Lee

Abstract: Recent advances in interactive large language models like ChatGPT have revolutionized various domains; however, their behavior in natural and role-play conversation settings remains underexplored. In our study, we address this gap by deeply investigating how ChatGPT behaves during conversations in different settings, analyzing its interactions in both a normal setting and a role-play setting. We introduce a novel dataset of a broad range of human-AI conversations annotated with user motives and model naturalness to examine (i) how humans engage with the conversational AI model, and (ii) how natural the AI model's responses are. Our study highlights the diversity of user motives when interacting with ChatGPT and variable AI naturalness, showing not only the nuanced dynamics of natural conversations between humans and AI, but also providing new avenues for improving the effectiveness of human-AI communication.

Submitted 26 March, 2024; originally announced March 2024.

Comments: Accepted by LREC-COLING 2024

arXiv:2403.17804 [pdf, other]

Improving Text-to-Image Consistency via Automatic Prompt Optimization

Authors: Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal

Abstract: Impressive advances in text-to-image (T2I) generative models have yielded a plethora of high performing models which are able to generate aesthetically appealing, photorealistic images. Despite the progress, these models still struggle to produce images that are consistent with the input prompt, oftentimes failing to capture object quantities, relations and attributes properly. Existing solutions to improve prompt-image consistency suffer from the following challenges: (1) they oftentimes require model fine-tuning, (2) they only focus on nearby prompt samples, and (3) they are affected by unfavorable trade-offs among image quality, representation diversity, and prompt-image consistency. In this paper, we address these challenges and introduce a T2I optimization-by-prompting framework, OPT2I, which leverages a large language model (LLM) to improve prompt-image consistency in T2I models. Our framework starts from a user prompt and iteratively generates revised prompts with the goal of maximizing a consistency score. Our extensive validation on two datasets, MSCOCO and PartiPrompts, shows that OPT2I can boost the initial consistency score by up to 24.9% in terms of DSG score while preserving the FID and increasing the recall between generated and real data. Our work paves the way toward building more reliable and robust T2I systems by harnessing the power of LLMs.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.16287 [pdf, other]

Coupled Requirements-driven Testing of CPS: From Simulation To Reality

Authors: Ankit Agrawal, Philipp Zech, Michael Vierhauser

Abstract: Failures in safety-critical Cyber-Physical Systems (CPS), both software and hardware-related, can lead to severe incidents impacting physical infrastructure or even harming humans. As a result, extensive simulations and field tests need to be conducted, as part of the verification and validation of system requirements, to ensure system safety. However, current simulation and field testing practices, particularly in the domain of small Unmanned Aerial Systems (sUAS), are ad-hoc and lack a thorough, structured testing process. Furthermore, there is a dearth of standard processes and methodologies to inform the design of comprehensive simulation and field tests. This gap in the testing process leads to the deployment of sUAS applications that are: (a) tested in simulation environments which do not adequately capture the real-world complexity, such as environmental factors, due to a lack of tool support; (b) not subjected to a comprehensive range of scenarios during simulation testing to validate the system requirements, due to the absence of a process defining the relationship between requirements and simulation tests; and (c) not analyzed through standard safety analysis processes, because of missing traceability between simulation testing artifacts and safety analysis artifacts. To address these issues, we have developed an initial framework for validating CPS, specifically focusing on sUAS and robotic applications. We demonstrate the suitability of our framework by applying it to an example from the sUAS domain. Our preliminary results confirm the applicability of our framework. We conclude with a research roadmap to outline our next research goals along with our current proposal.

Submitted 21 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.14938 [pdf, ps, other]

On Zero-Shot Counterspeech Generation by LLMs

Authors: Punyajoy Saha, Aalok Agrawal, Abhik Jana, Chris Biemann, Animesh Mukherjee

Abstract: With the emergence of numerous Large Language Models (LLMs), the usage of such models in various Natural Language Processing (NLP) applications is increasing extensively. Counterspeech generation is one such key task where efforts are made to develop generative models by fine-tuning LLMs with hatespeech-counterspeech pairs, but none of these attempts explores the intrinsic properties of large language models in zero-shot settings. In this work, we present a comprehensive analysis of the performance of four LLMs, namely GPT-2, DialoGPT, ChatGPT and FlanT5, in zero-shot settings for counterspeech generation, which is the first of its kind. For GPT-2 and DialoGPT, we further investigate the deviation in performance with respect to the sizes (small, medium, large) of the models. In addition, we propose three different prompting strategies for generating different types of counterspeech and analyse the impact of such strategies on the performance of the models. Our analysis shows that there is an improvement in generation quality for two datasets (17%); however, the toxicity increases (25%) with an increase in model size. Considering the type of model, GPT-2 and FlanT5 are significantly better in terms of counterspeech quality but also have high toxicity as compared to DialoGPT. ChatGPT is much better at generating counterspeech than the other models across all metrics. In terms of prompting, we find that our proposed strategies help in improving counterspeech generation across all the models.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 12 pages, 7 tables, accepted at LREC-COLING 2024

arXiv:2403.14208 [pdf, other]

Automatic Annotation of Grammaticality in Child-Caregiver Conversations

Authors: Mitja Nikolaus, Abhishek Agrawal, Petros Kaklamanis, Alex Warstadt, Abdellah Fourtassi

Abstract: The acquisition of grammar has been a central question to adjudicate between theories of language acquisition. In order to conduct faster, more reproducible, and larger-scale corpus studies on grammaticality in child-caregiver conversations, tools for automatic annotation can offer an effective alternative to tedious manual annotation. We propose a coding scheme for context-dependent grammaticality in child-caregiver conversations and annotate more than 4,000 utterances from a large corpus of transcribed conversations. Based on these annotations, we train and evaluate a range of NLP models. Our results show that fine-tuned Transformer-based models perform best, achieving human inter-annotation agreement levels. As a first application and sanity check of this tool, we use the trained models to annotate a corpus almost two orders of magnitude larger than the manually annotated data and verify that children's grammaticality shows a steady increase with age. This work contributes to the growing literature on applying state-of-the-art NLP methods to help study child language acquisition at scale.

Submitted 21 March, 2024; originally announced March 2024.

Journal ref: LREC-Coling 2024, May 2024, Turin, Italy

arXiv:2403.07118 [pdf, other]

Narrating Causal Graphs with Large Language Models

Authors: Atharva Phatak, Vijay K. Mago, Ameeta Agrawal, Aravind Inbasekaran, Philippe J. Giabbanelli

Abstract: The use of generative AI to create text descriptions from graphs has mostly focused on knowledge graphs, which connect concepts using facts. In this work we explore the capability of large pretrained language models to generate text from causal graphs, where salient concepts are represented as nodes and causality is represented via directed, typed edges. The causal reasoning encoded in these graphs can support applications as diverse as healthcare or marketing. Using two publicly available causal graph datasets, we empirically investigate the performance of four GPT-3 models under various settings. Our results indicate that while causal text descriptions improve with training data, compared to fact-based graphs, they are harder to generate under zero-shot settings. Results further suggest that users of generative AI can deploy future applications faster since similar performances are obtained when training a model with only a few examples as compared to fine-tuning via a large curated dataset.

Submitted 11 March, 2024; originally announced March 2024.

Comments: HICSS '24

Report number: https://hdl.handle.net/10125/107290

Journal ref: Proceedings of the 57th Hawaii International Conference on System Sciences 2024

arXiv:2403.07089 [pdf, other]

Graph learning methods to extract empathy supporting regions in a naturalistic stimuli fMRI

Authors: Sasanka GRS, Ayushi Agrawal, Santosh Nannuru, Kavita Vemuri

Abstract: Functional MRI (fMRI) research, employing naturalistic stimuli like movies, explores brain network interactions in complex cognitive processes such as empathy. The empathy network encompasses multiple brain areas, including the Insula, PFC, ACC, and parietal regions. Our novel processing pipeline applies graph learning methods to whole-brain timeseries signals, incorporating high-pass filtering, voxel-level clustering, and windowed graph learning with a sparsity-based approach. The study involves two short movies shown to 14 healthy volunteers, considering 54 regions extracted from the AAL Atlas. The sparsity-based graph learning consistently outperforms, achieving over 88% accuracy in capturing emotion contagion variations. Temporal analysis reveals a gradual induction of empathy, supported by the method's effectiveness in capturing dynamic connectomes through graph clustering. Edge-weight dynamics analysis underscores sparsity-based learning's superiority, while connectome-network analysis highlights the pivotal role of the Insula, Amygdala, and Thalamus in empathy. Spectral filtering analysis emphasizes the band-pass filter's significance in isolating regions linked to emotional and empathetic processing during empathy HIGH states. Key regions like Amygdala, Insula, and Angular Gyrus consistently activate, supporting their critical role in immediate emotional responses. Strong similarities across movies in graph cluster labels, connectome-network analysis, and spectral filtering-based analyses reveal robust neural correlates of empathy. These findings advance our understanding of empathy-related neural dynamics and identify specific regions in empathetic responses, offering insights for targeted interventions and treatments associated with empathetic processing.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 9 figures, 2 tables

arXiv:2403.06159 [pdf]

Cracking the neural code for word recognition in convolutional neural networks

Authors: Aakash Agrawal, Stanislas Dehaene

Abstract: Learning to read places a strong challenge on the visual system. Years of expertise lead to a remarkable capacity to separate highly similar letters and encode their relative positions, thus distinguishing words such as FORM and FROM, invariantly over a large range of sizes and absolute positions. How neural circuits achieve invariant word recognition remains unknown. Here, we address this issue by training deep neural network models to recognize written words and then analyzing how reading-specialized units emerge and operate across different layers of the network. With literacy, a small subset of units becomes specialized for word recognition in the learned script, similar to the "visual word form area" of the human brain. We show that these units are sensitive to specific letter identities and their distance from the blank space at the left or right of a word, thus acting as "space bigrams". These units specifically encode ordinal positions and operate by pooling across low and high-frequency detector units from early layers of the network. The proposed neural code provides a mechanistic insight into how information on letter identity and position is extracted and allows for invariant word recognition, and leads to predictions for reading behavior, error patterns, and the neurophysiology of reading.

Submitted 10 March, 2024; originally announced March 2024.

Comments: 33 pages, 6 main figures, 4 supplementary figures

arXiv:2403.04298 [pdf, other]

doi 10.1109/WI-IAT55865.2022.00096

Understanding how social discussion platforms like Reddit are influencing financial behavior

Authors: Sachin Thukral, Suyash Sangwan, Arnab Chatterjee, Lipika Dey, Aaditya Agrawal, Pramit Kumar Chandra, Animesh Mukherjee

Abstract: This study proposes content and interaction analysis techniques for a large repository created from social media content. Though we have presented our study for a large platform dedicated to discussions around financial topics, the proposed methods are generic and applicable to all platforms. Along with an extension of topic extraction method using Latent Dirichlet Allocation, we propose a few measures to assess user participation, influence and topic affinities specifically. Our study also maps user-generated content to components of behavioral finance. While these types of information are usually gathered through surveys, it is obvious that large scale data analysis from social media can reveal many potentially unknown or rare insights. Characterising users based on their platform behavior to provide critical insights about how communities are formed and trust is established in these platforms using graphical analysis is also studied.

Submitted 12 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

Comments: 8 pages, 8 figures, 3 tables, and 1 algorithm; Published in WI-IAT 2022 (The 21st IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology)

Journal ref: IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT) 2022 (pp. 612-619)

arXiv:2403.02310 [pdf, other]

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

Authors: Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee

Abstract: Each LLM serving request goes through two phases. The first is prefill, which processes the entire input prompt and produces the first output token, and the second is decode, which generates the rest of the output tokens, one at a time. Prefill iterations have high latency but saturate GPU compute due to parallel processing of the input prompt. In contrast, decode iterations have low latency but also low compute utilization because a decode iteration processes only a single token per request. This makes batching highly effective for decodes and consequently for overall throughput. However, batching multiple requests leads to an interleaving of prefill and decode iterations, which makes it challenging to achieve both high throughput and low latency. We introduce an efficient LLM inference scheduler, Sarathi-Serve, to address this throughput-latency tradeoff. Sarathi-Serve introduces chunked-prefills, which splits a prefill request into near-equal-sized chunks, and creates stall-free schedules that add new requests to a batch without pausing ongoing decodes. Stall-free scheduling unlocks the opportunity to improve throughput with large batch sizes while minimizing the effect of batching on latency. Furthermore, uniform batches in Sarathi-Serve ameliorate the imbalance between iterations, resulting in minimal pipeline bubbles. Our techniques yield significant improvements in inference performance across models and hardware under tail latency constraints. For Mistral-7B on single A100 GPUs, we achieve 2.6x higher serving capacity and up to 3.7x higher serving capacity for the Yi-34B model on two A100 GPUs as compared to vLLM. When used with pipeline parallelism on Falcon-180B, Sarathi-Serve provides up to 5.6x gain in the end-to-end serving capacity. The source code for Sarathi-Serve is available at https://github.com/microsoft/sarathi-serve.
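The chunking and stall-free batching idea described in this abstract can be illustrated with a minimal sketch. This is not Sarathi-Serve's actual code or API: the function names `split_prefill` and `build_batch`, the `max_chunk` parameter, and the per-iteration `token_budget` are hypothetical names chosen for illustration, and the real scheduler handles many details omitted here.

```python
from math import ceil


def split_prefill(prompt_len: int, max_chunk: int) -> list[int]:
    """Split a prompt into near-equal chunks, each at most max_chunk tokens.

    Hypothetical helper; chunk sizes differ from each other by at most one token.
    """
    if prompt_len <= 0 or max_chunk <= 0:
        raise ValueError("prompt_len and max_chunk must be positive")
    n_chunks = ceil(prompt_len / max_chunk)
    base, extra = divmod(prompt_len, n_chunks)
    # The first `extra` chunks take one additional token.
    return [base + 1 if i < extra else base for i in range(n_chunks)]


def build_batch(num_decodes: int, prefill_remaining: int, token_budget: int) -> tuple[int, int]:
    """Return (decode_tokens, prefill_tokens) for one scheduling iteration.

    Assumed policy: every ongoing decode gets its single token first, so decodes
    are never paused; only the leftover budget is spent on a prefill chunk.
    """
    decode_tokens = min(num_decodes, token_budget)
    prefill_tokens = min(prefill_remaining, token_budget - decode_tokens)
    return decode_tokens, prefill_tokens


if __name__ == "__main__":
    print(split_prefill(10, 4))
    print(build_batch(num_decodes=8, prefill_remaining=1000, token_budget=256))
```

Serving decodes before spending the remaining budget on prefill is what keeps the schedule stall-free in this sketch: a new request's prompt is admitted piece by piece instead of blocking an entire iteration.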

Submitted 17 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.00310 [pdf, other]

doi 10.1016/j.physletb.2024.138818

Wormholes in the f(R,L,T) theory of gravity

Authors: P. H. R. S. Moraes, A. S. Agrawal, B. Mishra

Abstract: Morris and Thorne developed wormhole solutions in the late 1980s when they discovered a recipe that wormholes must follow for travelers to cross them safely. They describe exotic matter as satisfying $-p_{r} > ρ$, where $p_{r}$ is the radial pressure and $ρ$ is the energy density of the wormhole. This is a notable characteristic of the General Relativity Theory. The current article discusses traversable wormhole solutions in $f(R, L, T)=R+αL+βT$, where $α$ and $β$ are model parameters. The wormhole solutions presented here satisfy the metric constraints of traversability while remarkably avoiding the exotic matter condition, indicating that $f(R, L, T)$ gravity wormholes can be filled with ordinary matter. The derived solutions for the shape function of the wormhole meet the required metric conditions. They exhibit behavior that is comparable to that of wormholes reported in earlier references, which is also the case for our solutions for the energy density of such objects.

Submitted 22 June, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

Comments: 5 Pages, 4 figures

Journal ref: Physics Letters B, Volume 855, August 2024

arXiv:2403.00286 [pdf, other]

Niobium coaxial cavities with internal quality factors exceeding 1.5 billion for circuit quantum electrodynamics

Authors: Andrew E. Oriani, Fang Zhao, Tanay Roy, Alexander Anferov, Kevin He, Ankur Agrawal, Riju Banerjee, Srivatsan Chakram, David I. Schuster

Abstract: Group-V materials such as niobium and tantalum have become popular choices for extending the performance of circuit quantum electrodynamics (cQED) platforms allowing for quantum processors and memories with reduced error rates and more modes. The complex surface chemistry of niobium however makes identifying the main modes of decoherence difficult at millikelvin temperatures and single-photon powers. We use niobium coaxial quarter-wave cavities to study the impact of etch chemistry, prolonged atmospheric exposure, and the significance of cavity conditions prior to and during cooldown, in particular niobium hydride evolution, on single-photon coherence. We demonstrate cavities with quality factors of $Q_{\rm int}\gtrsim 1.4\times10^{9}$ in the single-photon regime, a $15$ fold improvement over aluminum cavities of the same geometry. We rigorously quantify the sensitivity of our fabrication process to various loss mechanisms and demonstrate a $2-4\times$ reduction in the two-level system (TLS) loss tangent and a $3-5\times$ improvement in the residual resistivity over traditional BCP etching techniques. Finally, we demonstrate transmon integration and coherent cavity control while maintaining a cavity coherence of 11.3 ms. The accessibility of our method, which can easily be replicated in academic-lab settings, and the demonstration of its performance mark an advancement in 3D cQED.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 14 pages, 10 figures

arXiv:2402.16714 [pdf, other]

Quantum linear algebra is all you need for Transformer architectures

Authors: Naixu Guo, Zhan Yu, Matthew Choi, Aman Agrawal, Kouhei Nakaji, Alán Aspuru-Guzik, Patrick Rebentrost

Abstract: Generative machine learning methods such as large-language models are revolutionizing the creation of text and images. While these models are powerful they also harness a large amount of computational resources. The transformer is a key component in large language models that aims to generate a suitable completion of a given partial sequence. In this work, we investigate transformer architectures under the lens of fault-tolerant quantum computing. The input model is one where trained weight matrices are given as block encodings and we construct the query, key, and value matrices for the transformer. We show how to prepare a block encoding of the self-attention matrix, with a new subroutine for the row-wise application of the softmax function. In addition, we combine quantum subroutines to construct important building blocks in the transformer, the residual connection and layer normalization, and the feed-forward neural network. Our subroutines prepare an amplitude encoding of the transformer output, which can be measured to obtain a prediction. Based on common open-source large-language models, we provide insights into the behavior of important parameters determining the run time of the quantum algorithm. We discuss the potential and challenges for obtaining a quantum advantage.

Submitted 30 May, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: 31 pages, 4 figures, 2 tables, comments are welcome

arXiv:2402.16159 [pdf, other]

DistALANER: Distantly Supervised Active Learning Augmented Named Entity Recognition in the Open Source Software Ecosystem

Authors: Somnath Banerjee, Avik Dutta, Aaditya Agrawal, Rima Hazra, Animesh Mukherjee

Abstract: With the AI revolution in place, the trend for building automated systems to support professionals in different domains such as open source software systems, healthcare systems, banking systems, transportation systems and many others has become increasingly prominent. A crucial requirement in the automation of support tools for such systems is the early identification of named entities, which serves as a foundation for developing specialized functionalities. However, due to the specific nature of each domain, different technical terminologies and specialized languages, expert annotation of available data becomes expensive and challenging. In light of these challenges, this paper proposes a novel named entity recognition (NER) technique specifically tailored for open-source software systems. Our approach aims to address the scarcity of annotated software data by employing a comprehensive two-step distantly supervised annotation process. This process strategically leverages language heuristics, unique lookup tables, external knowledge sources, and an active learning approach. By harnessing these powerful techniques, we not only enhance model performance but also effectively mitigate the limitations associated with cost and the scarcity of expert annotators. It is noteworthy that our model outperforms the state-of-the-art LLMs by a substantial margin. We also show the effectiveness of NER in the downstream task of relation extraction.

Submitted 20 June, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

Comments: Accepted at ECML-PKDD 2024 (Long Paper)

arXiv:2402.11465 [pdf, other]

Odd Cycle Transversal on $P_5$-free Graphs in Polynomial Time

Authors: Akanksha Agrawal, Paloma T. Lima, Daniel Lokshtanov, Pawel Rzążewski, Saket Saurabh, Roohani Sharma

Abstract: An independent set in a graph $G$ is a set of pairwise non-adjacent vertices. A graph $G$ is bipartite if its vertex set can be partitioned into two independent sets. In the Odd Cycle Transversal problem, the input is a graph $G$ along with a weight function $w$ associating a rational weight with each vertex, and the task is to find a smallest weight vertex subset $S$ in $G$ such that $G - S$ is bipartite; the weight of $S$ is $w(S) = \sum_{v\in S} w(v)$. We show that Odd Cycle Transversal is polynomial-time solvable on graphs excluding $P_5$ (a path on five vertices) as an induced subgraph. The problem was previously known to be polynomial-time solvable on $P_4$-free graphs and NP-hard on $P_6$-free graphs [Dabrowski, Feghali, Johnson, Paesani, Paulusma and Rzążewski, Algorithmica 2020]. Bonamy, Dabrowski, Feghali, Johnson and Paulusma [Algorithmica 2019] posed the existence of a polynomial-time algorithm on $P_5$-free graphs as an open problem; this was later re-stated by Rzążewski [Dagstuhl Reports, 9(6): 2019] and by Chudnovsky, King, Pilipczuk, Rzążewski, and Spirkl [SIDMA 2021], who gave an algorithm with running time $n^{O(\sqrt{n})}$.
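To make the problem statement in this abstract concrete, here is a small sketch that solves weighted Odd Cycle Transversal by exhaustive search. It is deliberately a brute-force baseline with exponential running time, not the polynomial-time algorithm the paper announces; the function names `is_bipartite` and `min_weight_oct` and the example graph are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def is_bipartite(vertices, edges):
    """Check by BFS 2-coloring whether the subgraph induced on `vertices` is bipartite."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        if u in adj and v in adj:  # ignore edges touching removed vertices
            adj[u].append(v)
            adj[v].append(u)
    color = {}
    for start in vertices:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # an odd cycle exists
    return True


def min_weight_oct(vertices, edges, w):
    """Return (w(S), S) for a minimum-weight S such that G - S is bipartite.

    Exhaustive search over all vertex subsets; only suitable for tiny graphs.
    """
    vs = list(vertices)
    best = None
    for k in range(len(vs) + 1):
        for subset in combinations(vs, k):
            rest = [v for v in vs if v not in subset]
            if is_bipartite(rest, edges):
                cost = sum(w[v] for v in subset)
                if best is None or cost < best[0]:
                    best = (cost, set(subset))
    return best


if __name__ == "__main__":
    # Hypothetical example: a triangle a-b-c with a pendant vertex d attached to c.
    edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
    weights = {"a": 3, "b": 1, "c": 2, "d": 1}
    print(min_weight_oct("abcd", edges, weights))
```

In the example, the only odd cycle is the triangle, so deleting its cheapest vertex is enough. A path on five vertices is already bipartite, so the search returns the empty set with weight 0 for it.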

Submitted 18 February, 2024; originally announced February 2024.

MSC Class: 68Q25; 05C85 ACM Class: F.2

arXiv:2402.08885 [pdf, other]

Three-dimensional, multi-wavelength beam formation with integrated metasurface optics for Sr laser cooling

Authors: Sindhu Jammi, Andrew R. Ferdinand, Zheng Luo, Zachary L. Newman, Gregory Spektor, Junyeob Song, Okan Koksal, Akash V. Rakholia, William Lunden, Daniel Sheredy, Parth B. Patel, Martin M. Boyd, Wenqi Zhu, Amit Agrawal, Travis C. Briles, Scott B. Papp

Abstract: We demonstrate the formation of a complex, multi-wavelength, three-dimensional laser beam configuration with integrated metasurface optics. Our experiments support the development of a compact Sr optical-lattice clock, which leverages magneto-optical trapping on atomic transitions at 461 nm and 689 nm without bulk free-space optics. We integrate six, mm-scale metasurface optics on a fused-silica substrate and illuminate them with light from optical fibers. The metasurface optics provide full control of beam pointing, divergence, and polarization to create the laser configuration for a magneto-optical trap. We report the efficiency and integration of the three-dimensional visible laser beam configuration, demonstrating the suitability of metasurface optics for atomic laser cooling.

Submitted 13 February, 2024; originally announced February 2024.

Comments: 5 pages, 3 figures

arXiv:2402.06895 [pdf, other]

Bouncing Scenario and Cosmic Dynamics in Modified Theories of Gravity

Authors: A. S. Agrawal

Abstract: The main objective of this study is to investigate the phenomenon of the bouncing scenario of the universe. The most widely recognized cosmological framework is the standard cosmological model, sometimes referred to as the Big Bang model. This is mainly because of its inherent properties and its consistent alignment with recent observational studies. However, the standard cosmological model faces some challenges concerning the physical conditions at the initial epochs. Some of these issues include the initial singularity problem, flatness problem, horizon problem, etc. Some of these challenges could potentially be addressed by incorporating the inflationary scenario into the cosmological framework of the universe. However, the inflationary mechanism is not able to tackle the occurrence of the initial singularity. The bouncing cosmology offers a probable solution to this initial singularity issue. In addition, it is capable of addressing some other issues that may arise during the early stages. Hence, in the modified gravity theory, bounce cosmology has been discussed.

Submitted 10 February, 2024; originally announced February 2024.

Comments: Ph.D. Thesis. The thesis contains the publications: Gravitation and Cosmology 29 (3), 294-304, Fortschritte der Physik 70 (1), 2100065, The European Physical Journal C 83 (113), Physics of the Dark Universe 33, 100863, The European Physical Journal C 84 (1), 56

arXiv:2402.05983 [pdf, other]

Capability enhancement of the X-ray micro-tomography system via ML-assisted approaches

Authors: Dhruvi Shah, Shruti Mehta, Ashish Agrawal, Shishir Purohit, Bhaskar Chaudhury

Abstract: Ring artifacts in X-ray micro-CT images are one of the primary causes of concern in their accurate visual interpretation and quantitative analysis. The geometry of X-ray micro-CT scanners is similar to that of medical CT machines, except that the sample is rotated with a stationary source and detector. The ring artifacts are caused by defects or non-linear responses in detector pixels during micro-CT data acquisition. Artifacts in micro-CT images can often be so severe that the images are no longer useful for further analysis. Therefore, it is essential to comprehend the causes of artifacts and potential solutions to maximize image quality. This article presents a convolutional neural network (CNN)-based Deep Learning (DL) model inspired by UNet with a series of encoder and decoder units with skip connections for removal of ring artifacts. The proposed architecture has been evaluated using the Structural Similarity Index Measure (SSIM) and Mean Squared Error (MSE). Additionally, the results are compared with conventional filter-based non-ML techniques and are found to be better than the latter.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.05127 [pdf]

Illuminate: A novel approach for depression detection with explainable analysis and proactive therapy using prompt engineering

Authors: Aryan Agrawal

Abstract: This paper introduces a novel paradigm for depression detection and treatment using advanced Large Language Models (LLMs): Generative Pre-trained Transformer 4 (GPT-4), Llama 2 chat, and Gemini. These LLMs are fine-tuned with specialized prompts to diagnose, explain, and suggest therapeutic interventions for depression. A unique few-shot prompting method enhances the models' ability to analyze and explain depressive symptoms based on the DSM-5 criteria. In the interaction phase, the models engage in empathetic dialogue management, drawing from resources like PsychDB and a Cognitive Behavioral Therapy (CBT) Guide, fostering supportive interactions with individuals experiencing major depressive disorders. Additionally, the research introduces the Illuminate Database, enriched with various CBT modules, aiding in personalized therapy recommendations. The study evaluates LLM performance using metrics such as F1 scores, Precision, Recall, Cosine similarity, and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) across different test sets, demonstrating their effectiveness. This comprehensive approach blends cutting-edge AI with established psychological methods, offering new possibilities in mental health care and showcasing the potential of LLMs in revolutionizing depression diagnosis and treatment strategies.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 10 pages, 9 figures, 9 tables

arXiv:2402.02080 [pdf, other]

Translation Errors Significantly Impact Low-Resource Languages in Cross-Lingual Learning

Authors: Ashish Sunil Agrawal, Barah Fazili, Preethi Jyothi

Abstract: Popular benchmarks (e.g., XNLI) used to evaluate cross-lingual language understanding consist of parallel versions of English evaluation sets in multiple target languages created with the help of professional translators. When creating such parallel data, it is critical to ensure high-quality translations for all target languages for an accurate characterization of cross-lingual transfer. In this work, we find that translation inconsistencies do exist and interestingly they disproportionally impact low-resource languages in XNLI. To identify such inconsistencies, we propose measuring the gap in performance between zero-shot evaluations on the human-translated and machine-translated target text across multiple target languages; relatively large gaps are indicative of translation errors. We also corroborate that translation errors exist for two target languages, namely Hindi and Urdu, by doing a manual reannotation of human-translated test instances in these two languages and finding poor agreement with the original English labels these instances were supposed to inherit.

Submitted 3 February, 2024; originally announced February 2024.

Comments: Accepted to main proceedings of "The 18th Conference of the European Chapter of the Association for Computational Linguistics"

Showing 1–50 of 379 results for author: Agrawal, A