arXiv:2407.07666 [pdf]

A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models: Safety, Consensus, Objectivity, Reproducibility and Explainability

Authors: Ting Fang Tan, Kabilan Elangovan, Jasmine Ong, Nigam Shah, Joseph Sung, Tien Yin Wong, Lan Xue, Nan Liu, Haibo Wang, Chang Fu Kuo, Simon Chesterman, Zee Kin Yeong, Daniel SW Ting

Abstract: A comprehensive qualitative evaluation framework for large language models (LLMs) in healthcare that expands beyond traditional accuracy and quantitative metrics is needed. We propose 5 key aspects for evaluation of LLMs: Safety, Consensus, Objectivity, Reproducibility and Explainability (S.C.O.R.E.). We suggest that S.C.O.R.E. may form the basis for an evaluation framework for future LLM-based models that are safe, reliable, trustworthy, and ethical for healthcare and clinical applications.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07562 [pdf, other]

Transforming qubits via quasi-geometric approaches

Authors: Nyirahafashimana Valentine, Nurisya Mohd Shah, Umair Abdul Halim, Sharifah Kartini Said Husain, Ahmed Jellal

Abstract: We develop a theory based on a quasi-geometric (QG) approach to transform a small number of qubits into a larger number of error-correcting qubits by considering four different cases. More precisely, we use the 2-dimensional quasi-orthogonal complete complementary codes (2D-QOCCCs) and quasi-cyclic asymmetric quantum error-correcting codes (AQECCs) via quasigroup and group theory properties. We integrate the Pauli $X$-gate to detect and correct errors, as well as the Hadamard $H$-gate to superpose the initial and final qubits in the quantum circuit diagram. We compare the numerical results to analyze the success, consistency, and performance of the corrected errors through bar graphs for 2D-QOCCCs and AQECCs according to their characteristics. The difficulty in generating additional sets of results and counts for AQECCs arises because mapping a smaller initial number of qubits to a larger final number is necessary to correct more errors. For AQECCs, the number of errors that can be corrected must be equal to or less than the initial number of qubits. High error correction performance is observed when mapping a 1-qubit state to 29 qubits to correct 5 errors using 2D-QOCCCs. Similarly, transforming 1 qubit to 13 qubits using AQECCs also shows high performance, successfully correcting 2 errors. The results show that our theory has the advantage of providing a basis for refining and optimizing these codes in future quantum computing applications due to its high performance in error correction.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 24 pages, 20 figures, 10 tables

arXiv:2407.04935 [pdf, ps, other]

Equidistribution of polynomially bounded o-minimal curves in homogeneous spaces

Authors: Michael Bersudsky, Nimish A. Shah, Hao Xing

Abstract: We extend Ratner's theorem on equidistribution of individual orbits of unipotent flows on finite volume homogeneous spaces of Lie groups to trajectories of non-contracting curves definable in polynomially bounded o-minimal structures. To be precise, let $\varphi:[0,\infty)\to \text{SL}(n,\mathbb R)$ be a continuous map whose coordinate functions are definable in a polynomially bounded o-minimal structure; for example, rational functions. Suppose that $\varphi$ is non-contracting; that is, for any linearly independent vectors $v_1,\ldots,v_k$ in $\mathbb R^n$, $\varphi(t).(v_1\wedge\cdots\wedge v_k)\not\to0$ as $t\to\infty$. Then, there exists a unique smallest subgroup $H_\varphi$ of $\text{SL}(n,\mathbb R)$ generated by unipotent one-parameter subgroups such that $\varphi(t)H_\varphi\to g_0H_\varphi$ in $\text{SL}(n,\mathbb R)/H_\varphi$ as $t\to\infty$ for some $g_0\in \text{SL}(n,\mathbb R)$. Let $G$ be a closed subgroup of $\text{SL}(n,\mathbb R)$ and $\Gamma$ be a lattice in $G$. Suppose that $\varphi([0,\infty))\subset G$. Then $H_\varphi\subset G$, and for any $x\in G/\Gamma$, the trajectory $\{\varphi(t)x:t\in [0,T]\}$ gets equidistributed with respect to the measure $g_0\mu_{Lx}$ as $T\to\infty$, where $L$ is a closed subgroup of $G$ such that $\overline{Hx}=Lx$ and $Lx$ admits a unique $L$-invariant probability measure, denoted by $\mu_{Lx}$.
A crucial new ingredient in this work is proving that for any finite-dimensional representation $V$ of $\text{SL}(n,\mathbb R)$, there exist $T_0>0$, $C>0$, and $\alpha>0$ such that for any $v\in V$, the map $t\mapsto \|\varphi(t)v\|$ is $(C,\alpha)$-good on $[T_0,\infty)$.

Submitted 5 July, 2024; originally announced July 2024.

MSC Class: 03C64; 37A17
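
Note on terminology: the listing does not define the "good" property named at the end of the abstract above. Assuming it is the standard notion due to Kleinbock and Margulis (an assumption, not something stated in this entry), a continuous function $f$ on an interval $I$ is $(C,\alpha)$-good if, for every subinterval $J\subset I$ and every $\varepsilon>0$,

$$\bigl|\{t\in J : |f(t)|<\varepsilon\}\bigr| \;\le\; C\left(\frac{\varepsilon}{\sup_{t\in J}|f(t)|}\right)^{\alpha}|J|,$$

where $|\cdot|$ applied to a set denotes Lebesgue measure. Informally, such a function cannot stay small, relative to its supremum on $J$, on a large portion of $J$.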

arXiv:2407.00541 [pdf]

Answering real-world clinical questions using large language model based systems

Authors: Yen Sia Low, Michael L. Jackson, Rebecca J. Hyde, Robert E. Brown, Neil M. Sanghavi, Julian D. Baldwin, C. William Pike, Jananee Muralidharan, Gavin Hui, Natasha Alexander, Hadeel Hassan, Rahul V. Nene, Morgan Pike, Courtney J. Pokrzywa, Shivam Vedak, Adam Paul Yan, Dong-han Yao, Amy R. Zipursky, Christina Dinh, Philip Ballentine, Dan C. Derieg, Vladimir Polony, Rehan N. Chawdry, Jordan Davies, Brigham B. Hyde, et al. (2 additional authors not shown)

Abstract: Evidence to guide healthcare decisions is often limited by a lack of relevant and trustworthy literature as well as difficulty in contextualizing existing research for a specific patient. Large language models (LLMs) could potentially address both challenges by either summarizing published literature or generating new studies based on real-world data (RWD). We evaluated the ability of five LLM-based systems in answering 50 clinical questions and had nine independent physicians review the responses for relevance, reliability, and actionability. As it stands, general-purpose LLMs (ChatGPT-4, Claude 3 Opus, Gemini Pro 1.5) rarely produced answers that were deemed relevant and evidence-based (2% - 10%). In contrast, retrieval augmented generation (RAG)-based and agentic LLM systems produced relevant and evidence-based answers for 24% (OpenEvidence) to 58% (ChatRWD) of questions. Only the agentic ChatRWD was able to answer novel questions compared to other LLMs (65% vs. 0-9%). These results suggest that while general-purpose LLMs should not be used as-is, a purpose-built system for evidence summarization based on RAG and one for generating novel evidence working synergistically would improve availability of pertinent evidence for patient care.

Submitted 29 June, 2024; originally announced July 2024.

Comments: 28 pages (2 figures, 3 tables) inclusive of 8 pages of supplemental materials (4 supplemental figures and 4 supplemental tables)

arXiv:2406.16321 [pdf, other]

Multimodal Graph Benchmark

Authors: Jing Zhu, Yuhang Zhou, Shengyi Qian, Zhongmou He, Tong Zhao, Neil Shah, Danai Koutra

Abstract: Associating unstructured data with structured information is crucial for real-world tasks that require relevance search. However, existing graph learning benchmarks often overlook the rich semantic information associated with each node. To bridge this gap, we introduce the Multimodal Graph Benchmark (MM-GRAPH), the first comprehensive multi-modal graph benchmark that incorporates both textual and visual information. MM-GRAPH surpasses previous efforts, which have primarily focused on text-attributed graphs with various connectivity patterns. MM-GRAPH consists of five graph learning datasets of various scales that are appropriate for different learning tasks. Their multimodal node features enable a more comprehensive evaluation of graph learning algorithms in real-world scenarios. To facilitate research on multimodal graph learning, we further provide an extensive study on the performance of various graph neural networks in the presence of features from various modalities. MM-GRAPH aims to foster research on multimodal graph learning and drive the development of more advanced and robust graph learning algorithms. By providing a diverse set of datasets and benchmarks, MM-GRAPH enables researchers to evaluate and compare their models in realistic settings, ultimately leading to improved performance on real-world applications that rely on multimodal graph data.

Submitted 24 June, 2024; originally announced June 2024.

Comments: https://mm-graph-benchmark.github.io/

arXiv:2406.15405 [pdf, other]

What is Best for Students, Numerical Scores or Letter Grades?

Authors: Evi Micha, Shreyas Sekar, Nisarg Shah

Abstract: We study letter grading schemes, which are routinely employed for evaluating student performance. Typically, a numerical score obtained via one or more evaluations is converted into a letter grade (e.g., A+, B-, etc.) by associating a disjoint interval of numerical scores to each letter grade. We propose the first model for studying the (de)motivational effects of such grading on the students and, consequently, on their performance in future evaluations. We use the model to compare uniform letter grading schemes, in which the range of scores is divided into equal-length parts that are mapped to the letter grades, to numerical scoring, in which the score is not converted to any letter grade (equivalently, every score is its own letter grade). Theoretically, we identify realistic conditions under which numerical scoring is better than any uniform letter grading scheme. Our experiments confirm that this holds under even weaker conditions, but also find cases where the converse occurs.

Submitted 10 May, 2024; originally announced June 2024.

Comments: Accepted to IJCAI 2024
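
The uniform letter grading scheme described in the abstract above (equal-length score intervals, each mapped to one letter) can be sketched in a few lines. This is only an illustration of the definition, not the authors' model: the function name `uniform_letter_grade`, the 0-100 score range, the five-letter scale, and the choice to place each boundary score in the higher interval are all assumptions made here.

```python
def uniform_letter_grade(score, letters=("F", "D", "C", "B", "A"), lo=0.0, hi=100.0):
    """Map a numerical score to a letter using equal-length intervals.

    Hypothetical helper: the range [lo, hi] is split into len(letters)
    parts of equal width, ordered from the lowest letter to the highest.
    """
    if not lo <= score <= hi:
        raise ValueError(f"score {score} is outside [{lo}, {hi}]")
    k = len(letters)
    width = (hi - lo) / k
    # A score on a boundary falls into the higher interval (an assumption);
    # the top score hi is clamped into the last interval.
    index = min(int((score - lo) / width), k - 1)
    return letters[index]


def numerical_scoring(score):
    """Numerical scoring: every score is its own 'grade' (identity map)."""
    return score


if __name__ == "__main__":
    for s in (0, 59.9, 60, 79.9, 80, 100):
        print(s, uniform_letter_grade(s), numerical_scoring(s))
```

With five letters on a 0-100 scale each interval is 20 points wide, so scores of 60 and 79.9 receive the same letter while 79.9 and 80 do not; that loss of information near interval boundaries is the kind of effect the abstract contrasts with numerical scoring.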

arXiv:2406.14770 [pdf, other]

Gravitational Scattering and Beyond from Extreme Mass Ratio Effective Field Theory

Authors: Clifford Cheung, Julio Parra-Martinez, Ira Z. Rothstein, Nabha Shah, Jordan Wilson-Gerow

Abstract: We explore a recently proposed effective field theory describing electromagnetically or gravitationally interacting massive particles in an expansion about their mass ratio, also known as the self-force (SF) expansion. By integrating out the deviation of the heavy particle about its inertial trajectory, we obtain an effective action whose only degrees of freedom are the lighter particle together with the photon or graviton, all propagating in a Coulomb or Schwarzschild background. The 0SF dynamics are described by the usual background field method, which at 1SF is supplemented by a "recoil operator" that encodes the wobble of the heavy particle, and similarly computable corrections appearing at 2SF and higher. Our formalism exploits the fact that the analytic expressions for classical backgrounds and particle trajectories encode dynamical information to all orders in the couplings, and from them we extract multiloop integrands for perturbative scattering. As a check, we study the two-loop classical scattering of scalar particles in electromagnetism and gravity, verifying known results. We then present new calculations for the two-loop classical scattering of dyons, and of particles interacting with an additional scalar or vector field coupling directly to the lighter particle but only gravitationally to the heavier particle.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 77 pages, 10 figures

Report number: CALT-TH 2024-023

arXiv:2406.13264 [pdf, other]

Do Multimodal Foundation Models Understand Enterprise Workflows? A Benchmark for Business Process Management Tasks

Authors: Michael Wornow, Avanika Narayan, Ben Viggiano, Ishan S. Khare, Tathagat Verma, Tibor Thompson, Miguel Angel Fuentes Hernandez, Sudharsan Sundar, Chloe Trujillo, Krrish Chawla, Rongfei Lu, Justin Shen, Divya Nagaraj, Joshua Martinez, Vardhan Agrawal, Althea Hudson, Nigam H. Shah, Christopher Re

Abstract: Existing ML benchmarks lack the depth and diversity of annotations needed for evaluating models on business process management (BPM) tasks. BPM is the practice of documenting, measuring, improving, and automating enterprise workflows. However, research has focused almost exclusively on one task - full end-to-end automation using agents based on multimodal foundation models (FMs) like GPT-4. This focus on automation ignores the reality of how most BPM tools are applied today - simply documenting the relevant workflow takes 60% of the time of the typical process optimization project. To address this gap we present WONDERBREAD, the first benchmark for evaluating multimodal FMs on BPM tasks beyond automation. Our contributions are: (1) a dataset containing 2928 documented workflow demonstrations; (2) 6 novel BPM tasks sourced from real-world applications ranging from workflow documentation to knowledge transfer to process improvement; and (3) an automated evaluation harness. Our benchmark shows that while state-of-the-art FMs can automatically generate documentation (e.g. recalling 88% of the steps taken in a video demonstration of a workflow), they struggle to re-apply that knowledge towards finer-grained validation of workflow completion (F1 < 0.3). We hope WONDERBREAD encourages the development of more "human-centered" AI tooling for enterprise applications and furthers the exploration of multimodal FMs for the broader universe of BPM tasks. We publish our dataset and experiments here: https://github.com/HazyResearch/wonderbread

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.08802 [pdf, other]

DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing

Authors: Neha Sahipjohn, Ashishkumar Gudmalwar, Nirmesh Shah, Pankaj Wasnik, Rajiv Ratn Shah

Abstract: Audio-visual alignment after dubbing is a challenging research problem. To this end, we propose a novel method, DubWise Multi-modal Large Language Model (LLM)-based Text-to-Speech (TTS), which can control the speech duration of synthesized speech in such a way that it aligns well with the speaker's lip movements given in the reference video even when the spoken text is different or in a different language. To accomplish this, we propose to utilize cross-modal attention techniques in a pre-trained GPT-based TTS. We combine linguistic tokens from text, speaker identity tokens via a voice cloning network, and video tokens via a proposed duration controller network. We demonstrate the effectiveness of our system on the Lip2Wav-Chemistry and LRS2 datasets. Also, the proposed method achieves improved lip sync and naturalness compared to the SOTAs for the same language but different text (i.e., non-parallel) and the different language, different text (i.e., cross-lingual) scenarios.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Accepted at INTERSPEECH 2024

arXiv:2406.08076 [pdf, other]

VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech

Authors: Ashishkumar Gudmalwar, Nirmesh Shah, Sai Akarsh, Pankaj Wasnik, Rajiv Ratn Shah

Abstract: Despite the significant advancements in Text-to-Speech (TTS) systems, their full utilization in automatic dubbing remains limited. This task necessitates the extraction of voice identity and emotional style from a reference speech in a source language and subsequently transferring them to a target language using cross-lingual TTS techniques. While previous approaches have mainly concentrated on controlling voice identity within the cross-lingual TTS framework, there has been limited work on incorporating emotion and voice identity together. To this end, we introduce an end-to-end Voice Identity and Emotional Style Controllable Cross-Lingual (VECL) TTS system using multilingual speakers and an emotion embedding network. Moreover, we introduce content and style consistency losses to enhance the quality of synthesized speech further. The proposed system achieved an average relative improvement of 8.83% compared to the state-of-the-art (SOTA) methods on a database comprising English and three Indian languages (Hindi, Telugu, and Marathi).

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted at INTERSPEECH 2024

arXiv:2406.06512 [pdf, other]

Merlin: A Vision Language Foundation Model for 3D Computed Tomography

Authors: Louis Blankemeier, Joseph Paul Cohen, Ashwin Kumar, Dave Van Veen, Syed Jamal Safdar Gardezi, Magdalini Paschali, Zhihong Chen, Jean-Benoit Delbrouck, Eduardo Reis, Cesar Truyts, Christian Bluethgen, Malte Engmann Kjeldskov Jensen, Sophie Ostmeier, Maya Varma, Jeya Maria Jose Valanarasu, Zhongnan Fang, Zepeng Huo, Zaid Nabulsi, Diego Ardila, Wei-Hung Weng, Edson Amaro Junior, Neera Ahuja, Jason Fries, Nigam H. Shah, Andrew Johnston, et al. (6 additional authors not shown)

Abstract: Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current radiologist shortage, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies. Prior state-of-the-art approaches for automated medical image interpretation leverage vision language models (VLMs). However, current medical VLMs are generally limited to 2D images and short reports, and do not leverage electronic health record (EHR) data for supervision. We introduce Merlin - a 3D VLM that we train using paired CT scans (6+ million images from 15,331 CTs), EHR diagnosis codes (1.8+ million codes), and radiology reports (6+ million tokens). We evaluate Merlin on 6 task types and 752 individual tasks. The non-adapted (off-the-shelf) tasks include zero-shot findings classification (31 findings), phenotype classification (692 phenotypes), and zero-shot cross-modal retrieval (image to findings and image to impressions), while model-adapted tasks include 5-year disease prediction (6 diseases), radiology report generation, and 3D semantic segmentation (20 organs). We perform internal validation on a test set of 5,137 CTs, and external validation on 7,000 clinical CTs and on two public CT datasets (VerSe, TotalSegmentator). Beyond these clinically relevant evaluations, we assess the efficacy of various network architectures and training strategies to show that Merlin performs favorably compared with existing task-specific baselines.
We derive data scaling laws to empirically assess training data needs for requisite downstream task performance. Furthermore, unlike conventional VLMs that require hundreds of GPUs for training, we perform all training on a single GPU.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 18 pages, 7 figures

arXiv:2406.04744 [pdf, other]

CRAG -- Comprehensive RAG Benchmark

Authors: Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, et al. (2 additional authors not shown)

Abstract: Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate the knowledge deficiencies of Large Language Models (LLMs). Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamisms ranging from years to seconds. Our evaluation on this benchmark highlights the gap to fully trustworthy QA. Whereas most advanced LLMs achieve <=34% accuracy on CRAG, adding RAG in a straightforward manner improves the accuracy only to 44%. State-of-the-art industry RAG solutions answer only 63% of questions without any hallucination. CRAG also reveals much lower accuracy in answering questions regarding facts with higher dynamism, lower popularity, or higher complexity, suggesting future research directions. The CRAG benchmark laid the groundwork for a KDD Cup 2024 challenge, attracting thousands of participants and submissions within the first 50 days of the competition. We commit to maintaining CRAG to serve research communities in advancing RAG solutions and general QA solutions.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.04106 [pdf, other]

Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster

Authors: Agostina Calabrese, Leonardo Neves, Neil Shah, Maarten W. Bos, Björn Ross, Mirella Lapata, Francesco Barbieri

Abstract: Content moderators play a key role in keeping the conversation on social media healthy. While the high volume of content they need to judge represents a bottleneck to the moderation pipeline, no studies have explored how models could support them to make faster decisions. There is, by now, a vast body of research into detecting hate speech, sometimes explicitly motivated by a desire to help improve content moderation, but published research using real content moderators is scarce. In this work we investigate the effect of explanations on the speed of real-world moderators. Our experiments show that while generic explanations do not affect their speed and are often ignored, structured explanations lower moderators' decision making time by 7.4%.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 11 pages, 14 figures, to be published at ACL 2024

arXiv:2405.19385 [pdf, other]

Listening to Quantum Gravity?

Authors: Lawrence M. Krauss, Francesco Marino, Samuel L. Braunstein, Mir Faizal, Naveed A. Shah

Abstract: Recent experimental progress in controlling classical and quantum fluids has made it possible to realize acoustic analogues of gravitational black holes, where a flowing fluid provides an effective spacetime on which sound waves propagate, demonstrating Hawking-like radiation and Penrose superradiance. We propose the exciting possibility that new hydrodynamic systems might provide insights to help resolve mysteries associated with quantum gravity, including the black hole information-loss paradox and the removal of spacetime singularities.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 16 pages; Essay received an honorable mention in the Gravity Research Foundation Essay Competition 2024. arXiv admin note: text overlap with arXiv:2402.16136

arXiv:2405.14645 [pdf, other]

Lagrangian Neural Networks for Reversible Dissipative Evolution

Authors: Veera Sundararaghavan, Megna N. Shah, Jeff P. Simmons

Abstract: Growing attention is being given to utilizing Lagrangian and Hamiltonian mechanics with network training in order to incorporate physics into the network. Most commonly, conservative systems are modeled, in which there are no frictional losses, so the system may be run forward and backward in time without requiring regularization. This work addresses systems in which the reverse direction is ill-posed because of the dissipation that occurs in forward evolution. The novelty is the use of the Morse-Feshbach Lagrangian, which models dissipative dynamics by doubling the number of dimensions of the system in order to create a mirror latent representation that would counterbalance the dissipation of the observable system, making it a conservative system, albeit embedded in a larger space. We start with their formal approach by redefining a new Dissipative Lagrangian, such that the unknown matrices in the Euler-Lagrange equations arise as partial derivatives of the Lagrangian with respect to only the observables. We then train a network from simulated training data for dissipative systems such as Fickian diffusion that arise in materials sciences. It is shown by experiments that the systems can be evolved in both forward and reverse directions without regularization beyond that provided by the Morse-Feshbach Lagrangian. Experiments of dissipative systems, such as Fickian diffusion, demonstrate the degree to which dynamics can be reversed.

Submitted 26 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.06664 [pdf, other]

A categorical account of composition methods in logic (extended version)

Authors: Tomáš Jakl, Dan Marsden, Nihil Shah

Abstract: We present a categorical theory of the composition methods in finite model theory -- a key technique enabling modular reasoning about complex structures by building them out of simpler components. The crucial results required by the composition methods are Feferman--Vaught--Mostowski (FVM) type theorems, which characterize how logical equivalence behaves under composition and transformation of models. Our results are developed by extending the recently introduced game comonad semantics for model comparison games. This level of abstraction allows us to give conditions yielding FVM type results in a uniform way. Our theorems are parametric in the classes of models, logics and operations involved. Furthermore, they naturally account for the existential and positive existential fragments, and extensions with counting quantifiers of these logics. We also reveal surprising connections between FVM type theorems and classical concepts in the theory of monads. We illustrate our methods by recovering many classical theorems of practical interest, including a refinement of a previous result by Dawar, Severini, and Zapata concerning the 3-variable counting logic and cospectrality. To highlight the importance of our techniques being parametric in the logic of interest, we prove a family of FVM theorems for products of structures, uniformly in the logic in question, which cannot be done using specific game arguments. This is an extended version of the LiCS 2023 conference paper of the same name.

Submitted 30 April, 2024; originally announced May 2024.

Comments: This is an extended version of arXiv:2304.10196 which, apart from providing full proofs of all statements, takes a more categorical point of view to tell the whole story. In particular, we highlight and explain the underlying categorical constructions in detail

arXiv:2405.06563 [pdf, other]

What Can Natural Language Processing Do for Peer Review?

Authors: Ilia Kuznetsov, Osama Mohammed Afzal, Koen Dercksen, Nils Dycke, Alexander Goldberg, Tom Hope, Dirk Hovy, Jonathan K. Kummerfeld, Anne Lauscher, Kevin Leyton-Brown, Sheng Lu, Mausam, Margot Mieskes, Aurélie Névéol, Danish Pruthi, Lizhen Qu, Roy Schwartz, Noah A. Smith, Thamar Solorio, Jingyan Wang, Xiaodan Zhu, Anna Rogers, Nihar B. Shah, Iryna Gurevych

Abstract: The number of scientific articles produced every year is growing rapidly. Providing quality control over them is crucial for scientists and, ultimately, for the public good. In modern science, this process is largely delegated to peer review -- a distributed procedure in which each submission is evaluated by several independent experts in the field. Peer review is widely used, yet it is hard, time-consuming, and prone to error. Since the artifacts involved in peer review -- manuscripts, reviews, discussions -- are largely text-based, Natural Language Processing has great potential to improve reviewing. As the emergence of large language models (LLMs) has enabled NLP assistance for many new tasks, the discussion on machine-assisted peer review is picking up the pace. Yet, where exactly is help needed, where can NLP help, and where should it stand aside? The goal of our paper is to provide a foundation for the future efforts in NLP for peer-reviewing assistance. We discuss peer review as a general process, exemplified by reviewing at AI conferences. We detail each step of the process from manuscript submission to camera-ready revision, and discuss the associated challenges and opportunities for NLP assistance, illustrated by existing work. We then turn to the big challenges in NLP for peer review as a whole, including data acquisition and licensing, operationalization and experimentation, and ethical issues. To help consolidate community efforts, we create a companion repository that aggregates key datasets pertaining to peer review. Finally, we issue a detailed call for action for the scientific community, NLP and AI researchers, policymakers, and funding bodies to help bring the research in NLP for peer review forward. We hope that our work will help set the agenda for research in machine-assisted scientific quality control in the age of AI, within the NLP community and beyond.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.03710 [pdf, other]

Automating the Enterprise with Foundation Models

Authors: Michael Wornow, Avanika Narayan, Krista Opsahl-Ong, Quinn McIntyre, Nigam H. Shah, Christopher Re

Abstract: Automating enterprise workflows could unlock $4 trillion/year in productivity gains. Despite being of interest to the data management community for decades, the ultimate vision of end-to-end workflow automation has remained elusive. Current solutions rely on process mining and robotic process automation (RPA), in which a bot is hard-coded to follow a set of predefined rules for completing a workflow. Through case studies of a hospital and large B2B enterprise, we find that the adoption of RPA has been inhibited by high set-up costs (12-18 months), unreliable execution (60% initial accuracy), and burdensome maintenance (requiring multiple FTEs). Multimodal foundation models (FMs) such as GPT-4 offer a promising new approach for end-to-end workflow automation given their generalized reasoning and planning abilities. To study these capabilities we propose ECLAIR, a system to automate enterprise workflows with minimal human supervision. We conduct initial experiments showing that multimodal FMs can address the limitations of traditional RPA with (1) near-human-level understanding of workflows (93% accuracy on a workflow understanding task) and (2) instant set-up with minimal technical barrier (based solely on a natural language description of a workflow, ECLAIR achieves end-to-end completion rates of 40%). We identify human-AI collaboration, validation, and self-improvement as open challenges, and suggest ways they can be solved with data management techniques. Code is available at: https://github.com/HazyResearch/eclair-agents

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.08660 [pdf, other]

How Does Message Passing Improve Collaborative Filtering?

Authors: Mingxuan Ju, William Shiao, Zhichun Guo, Yanfang Ye, Yozen Liu, Neil Shah, Tong Zhao

Abstract: Collaborative filtering (CF) has exhibited prominent results for recommender systems and been broadly utilized for real-world applications. A branch of research enhances CF methods by message passing used in graph neural networks, due to its strong capabilities of extracting knowledge from graph-structured data, like user-item bipartite graphs that naturally exist in CF. They assume that message passing helps CF methods in a manner akin to its benefits for graph-based learning tasks in general. However, even though message passing empirically improves CF, whether or not this assumption is correct still needs verification. To address this gap, we formally investigate why message passing helps CF from multiple perspectives and show that many assumptions made by previous works are not entirely accurate. With our curated ablation studies and theoretical analyses, we discover that (1) message passing improves the CF performance primarily by additional representations passed from neighbors during the forward pass instead of additional gradient updates to neighbor representations during the model back-propagation and (2) message passing usually helps low-degree nodes more than high-degree nodes. Utilizing these novel findings, we present Test-time Aggregation for CF, namely TAG-CF, a test-time augmentation framework that only conducts message passing once at inference time. The key novelty of TAG-CF is that it effectively utilizes graph knowledge while circumventing most of the notorious computational overheads of message passing. Besides, TAG-CF is extremely versatile and can be used as a plug-and-play module to enhance representations trained by different CF supervision signals. Evaluated on six datasets, TAG-CF consistently improves the recommendation performance of CF methods without graph by up to 39.2% on cold users and 31.7% on all users, with little to no extra computational overheads.

Submitted 27 March, 2024; originally announced April 2024.

arXiv:2404.03551 [pdf, other]

Streamlining CXL Adoption for Hyperscale Efficiency

Authors: Angelos Arelakis, Nilesh Shah, Yiannis Nikolakopoulos, Dimitrios Palyvos-Giannas

Abstract: In our exploration of Composable Memory systems utilizing CXL, we focus on overcoming adoption barriers at Hyperscale, underscored by economic models demonstrating Total Cost of Ownership (TCO). While CXL addresses the pressing memory capacity needs of emerging Hyperscale applications, the escalating demands from evolving use cases such as AI outpace the capabilities of current CXL solutions. Hyperscalers resort to software-based memory (de)compression technology, alleviating memory capacity, storage, and network constraints but incurring a notable "Tax" on Compute CPU cycles. As a pivotal guide to the CXL community, Hyperscalers have formulated the groundbreaking Open Compute Project (OCP) Hyperscale CXL Tiered Memory Expander specification. If implemented, this specification lowers TCO adoption barriers, enabling diverse CXL deployments at both Hyperscaler and Enterprise levels. We present a CXL integrated solution, aligning with the aforementioned specification, introducing an energy-efficient, scalable, hardware-accelerated, Lossless Compressed Memory CXL Tier. This solution, slated for mid-2024 production and open for integration with Memory Expander controller manufacturers, offers 2-3X CXL memory compression in nanoseconds, delivering a 20-25% reduction in TCO for end customers without requiring additional physical slots. In our discussion, we pinpoint areas for collaborative innovation within the CXL Community to expedite software/hardware advancements for CXL Tiered Memory Expansion. Furthermore, we delve into unresolved challenges in Pooled deployment and explore potential solutions, collectively aiming to make CXL adoption a "No Brainer" at Hyperscale.

Submitted 4 April, 2024; originally announced April 2024.

Comments: Presented at the 3rd Workshop on Heterogeneous Composable and Disaggregated Systems (HCDS 2024)

arXiv:2404.01244 [pdf, other]

Searching for enhancement in coalescence of in-jet (anti-)deuterons in proton-proton collisions

Authors: Yoshini Bailung, Neha Shah, Ankhi Roy

Abstract: Recent measurements from ALICE report that "in-jet" nucleons carry a higher probability of forming a deuteron via coalescence than the nucleons from the underlying event (UE). This study makes use of an event shape classifier to separate the "in-jet" deuterons and the deuterons in the UE produced in high multiplicity proton-proton collisions at $\sqrt{s} = 13$ TeV. Event shape variables such as transverse spherocity allow the categorization of hard and soft components of an event, which can be divided into two respective classes: "jetty" and "isotropic". The "jetty" deuterons minus the contribution of the deuterons from the "isotropic" event are taken as "in-jet" deuterons, and the coalescence mechanism is tested. The coalescence is performed with a Wigner function formalism, augmented as an afterburner to PYTHIA8. The possible enhancement of the coalescence probability of "in-jet" deuterons is investigated by calculating the coalescence parameter ($B_{2}$) in different spherocity classes in high-multiplicity $pp$ collisions.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 11 pages, 9 figures, To appear in Physical Review C

arXiv:2404.00808 [pdf, other]

Using Explainable AI and Hierarchical Planning for Outreach with Robots

Authors: Daksh Dobhal, Jayesh Nagpal, Rushang Karia, Pulkit Verma, Rashmeet Kaur Nayyar, Naman Shah, Siddharth Srivastava

Abstract: Understanding how robots plan and execute tasks is crucial in today's world, where they are becoming more prevalent in our daily lives. However, teaching non-experts the complexities of robot planning can be challenging. This work presents an open-source platform that simplifies the process using a visual interface that completely abstracts the complex internals of hierarchical planning that robots use for performing task and motion planning. Using the principles developed in the field of explainable AI, this intuitive platform enables users to create plans for robots to complete tasks, and provides helpful hints and natural language explanations for errors. The platform also has a built-in simulator to demonstrate how robots execute submitted plans. This platform's efficacy was tested in a user study on university students with little to no computer science background. Our results show that this platform is highly effective in teaching novice users the intuitions of robot task planning.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.18280 [pdf, other]

Improving Out-of-Vocabulary Handling in Recommendation Systems

Authors: William Shiao, Mingxuan Ju, Zhichun Guo, Xin Chen, Evangelos Papalexakis, Tong Zhao, Neil Shah, Yozen Liu

Abstract: Recommendation systems (RS) are an increasingly relevant area for both academic and industry researchers, given their widespread impact on the daily online experiences of billions of users. One common issue in real RS is the cold-start problem, where users and items may not contain enough information to produce high-quality recommendations. This work focuses on a complementary problem: recommending new users and items unseen (out-of-vocabulary, or OOV) at training time. This setting is known as the inductive setting and is especially problematic for factorization-based models, which rely on encoding only those users/items seen at training time with fixed parameter vectors. Many existing solutions applied in practice are often naive, such as assigning OOV users/items to random buckets. In this work, we tackle this problem and propose approaches that better leverage available user/item features to improve OOV handling at the embedding table level. We discuss general-purpose plug-and-play approaches that are easily applicable to most RS models and improve inductive performance without negatively impacting transductive model performance. We extensively evaluate 9 OOV embedding methods on 5 models across 4 datasets (spanning different domains). One of these datasets is a proprietary production dataset from a prominent RS employed by a large social platform serving hundreds of millions of daily active users. In our experiments, we find that several proposed methods that exploit feature similarity using LSH consistently outperform alternatives on most model-dataset combinations, with the best method showing a mean improvement of 3.74% over the industry standard baseline in inductive performance. We release our code and hope our work helps practitioners make more informed decisions when handling OOV for their RS and further inspires academic research into improving OOV support in RS.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 11 pages, 6 figures

arXiv:2403.15469 [pdf, other]

Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning

Authors: Shivam Ratnakant Mhaskar, Nirmesh J. Shah, Mohammadi Zaki, Ashishkumar P. Gudmalwar, Pankaj Wasnik, Rajiv Ratn Shah

Abstract: A traditional Automatic Video Dubbing (AVD) pipeline consists of three key modules, namely, Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), and Text-to-Speech (TTS). Within AVD pipelines, isometric-NMT algorithms are employed to regulate the length of the synthesized output text. This is done to guarantee synchronization with respect to the alignment of video and audio subsequent to the dubbing process. Previous approaches have focused on aligning the number of characters and words in the source and target language texts of Machine Translation models. However, our approach aims to align the number of phonemes instead, as they are closely associated with speech duration. In this paper, we present the development of an isometric NMT system using Reinforcement Learning (RL), with a focus on optimizing the alignment of phoneme counts in the source and target language sentence pairs. To evaluate our models, we propose the Phoneme Count Compliance (PCC) score, which is a measure of length compliance. Our approach demonstrates a substantial improvement of approximately 36% in the PCC score compared to the state-of-the-art models when applied to English-Hindi language pairs. Moreover, we propose a student-teacher architecture within the framework of our RL approach to maintain a trade-off between the phoneme count and translation quality.

Submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted in NAACL2024 Findings

arXiv:2403.15046 [pdf, other]

Benchmark Lines and Planes for Higgs-to-Higgs Decays in the NMSSM

Authors: Ulrich Ellwanger, Margarete Muehlleitner, Nikolaos Rompotis, Nausheen R. Shah, Daniel Winterbottom

Abstract: A number of benchmark scenarios for NMSSM Higgs boson searches via Higgs-to-Higgs decays at the LHC have been proposed by the NMSSM Subgroup of the LHC HWG3. Some of them are already in use by the ATLAS and CMS collaborations for the interpretation of their results from Run 2. In this document we summarize the theory setup, the underlying procedures and reproduce the benchmark scenarios in table form.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 11 Pages, 1 Figure, 6 Tables

Report number: LHCHWG-2024-002

arXiv:2403.11640 [pdf, other]

Quasinormal Modes of Near-Extremal Electric and Magnetic Black Branes

Authors: Swapnil Nitin Shah

Abstract: Gauge-gravity duality provides a robust mathematical framework for studying the behavior of strongly coupled non-abelian plasmas both near and far away from thermodynamic equilibrium. In particular, their near-equilibrium transport coefficients such as viscosity, conductivity, diffusion constants, etc. can be determined from poles of the retarded Green's function, which are the dissipative eigenmodes, i.e., the quasinormal modes (QNMs) of the dual gravitational field equations. The AdS5/CFT4 correspondence admits the description of a strongly coupled $\mathcal{N} = 4$ Supersymmetric Yang Mills (SYM) plasma at non-zero temperature as a dual AdS5 black brane geometry. We demonstrate the application of pseudospectral methods to solving the dual Einstein field equations using the example of homogeneous isotropization in $\mathcal{N} = 4$ SYM plasma far from equilibrium. Using this framework, we also compute the quasinormal modes of electrically (Reissner-Nordstrom) and magnetically charged AdS5 black branes for the case of vanishing spatial momenta. The near-extremal behavior of these QNMs is analyzed for both types of black branes.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 27 pages, 7 figures

arXiv:2403.07911 [pdf]

Standing on FURM ground -- A framework for evaluating Fair, Useful, and Reliable AI Models in healthcare systems

Authors: Alison Callahan, Duncan McElfresh, Juan M. Banda, Gabrielle Bunney, Danton Char, Jonathan Chen, Conor K. Corbin, Debadutta Dash, Norman L. Downing, Sneha S. Jain, Nikesh Kotecha, Jonathan Masterson, Michelle M. Mello, Keith Morse, Srikar Nallan, Abby Pandya, Anurang Revri, Aditya Sharma, Christopher Sharp, Rahul Thapa, Michael Wornow, Alaa Youssef, Michael A. Pfeffer, Nigam H. Shah

Abstract: The impact of using artificial intelligence (AI) to guide patient care or operational processes is an interplay of the AI model's output, the decision-making protocol based on that output, and the capacity of the stakeholders involved to take the necessary subsequent action. Estimating the effects of this interplay before deployment, and studying it in real time afterwards, are essential to bridge the chasm between AI model development and achievable benefit. To accomplish this, the Data Science team at Stanford Health Care has developed a Testing and Evaluation (T&E) mechanism to identify fair, useful and reliable AI models (FURM) by conducting an ethical review to identify potential value mismatches, simulations to estimate usefulness, financial projections to assess sustainability, as well as analyses to determine IT feasibility, design a deployment strategy, and recommend a prospective monitoring and evaluation plan. We report on FURM assessments done to evaluate six AI-guided solutions for potential adoption, spanning clinical and operational settings, each with the potential to impact from several dozen to tens of thousands of patients each year. We describe the assessment process, summarize the six assessments, and share our framework to enable others to conduct similar assessments. Of the six solutions we assessed, two have moved into a planning and implementation phase. Our novel contributions - usefulness estimates by simulation, financial projections to quantify sustainability, and a process to do ethical assessments - as well as their underlying methods and open source tools, are available for other healthcare systems to conduct actionable evaluations of candidate AI solutions.

Submitted 14 March, 2024; v1 submitted 26 February, 2024; originally announced March 2024.

arXiv:2403.01837 [pdf, other]

Generalized Symmetry in Dynamical Gravity

Authors: Clifford Cheung, Maria Derda, Joon-Hwi Kim, Vinicius Nevoa, Ira Rothstein, Nabha Shah

Abstract: We explore generalized symmetry in the context of nonlinear dynamical gravity. Our basic strategy is to transcribe known results from Yang-Mills theory directly to gravity via the tetrad formalism, which recasts general relativity as a gauge theory of the local Lorentz group. By analogy, we deduce that gravity exhibits a one-form symmetry implemented by an operator $U_α$ labeled by a center element $α$ of the Lorentz group and associated with a certain area measured in Planck units. The corresponding charged line operator $W_ρ$ is the holonomy in a spin representation $ρ$, which is the gravitational analog of a Wilson loop. The topological linking of $U_α$ and $W_ρ$ has an elegant physical interpretation from classical gravitation: the former materializes an exotic chiral cosmic string defect whose quantized conical deficit angle is measured by the latter. We verify this claim explicitly in an AdS-Schwarzschild black hole background. Notably, our conclusions imply that the standard model exhibits a new symmetry of nature at scales below the lightest neutrino mass. More generally, the absence of global symmetries in quantum gravity suggests that the gravitational one-form symmetry is either gauged or explicitly broken. The latter mandates the existence of fermions. Finally, we comment on generalizations to magnetic higher-form or higher-group gravitational symmetries.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 60 pages, 13 figures

Report number: CALT-TH 2024-009

arXiv:2403.01015 [pdf, other]

A Randomized Controlled Trial on Anonymizing Reviewers to Each Other in Peer Review Discussions

Authors: Charvi Rastogi, Xiangchen Song, Zhijing Jin, Ivan Stelmakh, Hal Daumé III, Kun Zhang, Nihar B. Shah

Abstract: Peer review often involves reviewers submitting their independent reviews, followed by a discussion among reviewers of each paper. A question among policymakers is whether the reviewers of a paper should be anonymous to each other during the discussion. We shed light on this by conducting a randomized controlled trial at the UAI 2022 conference. We randomly split the reviewers and papers into two conditions--one with anonymous discussions and the other with non-anonymous discussions, and conduct an anonymous survey of all reviewers, to address the following questions: 1. Do reviewers discuss more in one of the conditions? Marginally more in anonymous (n = 2281, p = 0.051). 2. Does seniority have more influence on final decisions when non-anonymous? Yes, the decisions are closer to senior reviewers' scores in the non-anonymous condition than in anonymous (n = 484, p = 0.04). 3. Are reviewers more polite in one of the conditions? No significant difference in politeness of reviewers' text-based responses (n = 1125, p = 0.72). 4. Do reviewers' self-reported experiences differ across the two conditions? No significant difference for each of the five questions asked (n = 132 and p > 0.3). 5. Do reviewers prefer one condition over the other? Yes, there is a weak preference for anonymous discussions (n = 159 and Cohen's d = 0.25). 6. What do reviewers consider important to make policy on anonymity among reviewers? Reviewers' feeling of safety in expressing their opinions was rated most important, while polite communication among reviewers was rated least important (n = 159). 7. Have reviewers experienced dishonest behavior due to non-anonymity in discussions? Yes, roughly 7% of respondents answered affirmatively (n = 167). Overall, this experiment reveals evidence supporting an anonymous discussion setup in the peer-review process, in terms of the evaluation criteria considered.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 18 pages, 4 figures, 3 tables

arXiv:2402.16136 [pdf, other]

doi 10.1038/s42254-023-00630-y

Analogue simulations of quantum gravity with fluids

Authors: Samuel L. Braunstein, Mir Faizal, Lawrence M. Krauss, Francesco Marino, Naveed A. Shah

Abstract: The recent technological advances in controlling and manipulating fluids have enabled the experimental realization of acoustic analogues of gravitational black holes. A flowing fluid provides an effective curved spacetime on which sound waves can propagate, allowing the simulation of gravitational geometries and related phenomena. The last decade has witnessed a variety of hydrodynamic experiments testing disparate aspects of black hole physics culminating in the recent experimental evidence of Hawking radiation and Penrose superradiance. In this Perspective, we discuss the potential use of analogue hydrodynamic systems beyond classical general relativity towards the exploration of quantum gravitational effects. These include possible insights into the information-loss paradox, black hole physics with Planck-scale quantum corrections, emergent gravity scenarios and the regularization of curvature singularities. We aim at bridging the gap between the non-overlapping communities of experimentalists working with classical and quantum fluids and quantum-gravity theorists, illustrating the opportunities made possible by the latest experimental and theoretical developments in these important areas of research.

Submitted 25 February, 2024; originally announced February 2024.

Comments: Accepted version in Nature Reviews Physics. A view-only version of the manuscript with edited figures and additional references can be accessed at the link https://rdcu.be/dj6CN

Journal ref: Nature Reviews Physics 5, 612-622 (2023)

arXiv:2402.11871 [pdf, other]

From Reals to Logic and Back: Inventing Symbolic Vocabularies, Actions, and Models for Planning from Raw Data

Authors: Naman Shah, Jayesh Nagpal, Pulkit Verma, Siddharth Srivastava

Abstract: Hand-crafted, logic-based state and action representations have been widely used to overcome the intractable computational complexity of long-horizon robot planning problems, including task and motion planning problems. However, creating such representations requires experts with strong intuitions and detailed knowledge about the robot and the tasks it may need to accomplish in a given setting. Removing this dependency on human intuition is a highly active research area. This paper presents the first approach for autonomously learning generalizable, logic-based relational representations for abstract states and actions starting from unannotated high-dimensional, real-valued robot trajectories. The learned representations constitute auto-invented PDDL-like domain models. Empirical results in deterministic settings show that powerful abstract representations can be learned from just a handful of robot trajectories; the learned relational representations include but go beyond classical, intuitive notions of high-level actions; and that the learned models allow planning algorithms to scale to tasks that were previously beyond the scope of planning without hand-crafted abstractions.

Submitted 4 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.09711 [pdf, other]

Node Duplication Improves Cold-start Link Prediction

Authors: Zhichun Guo, Tong Zhao, Yozen Liu, Kaiwen Dong, William Shiao, Neil Shah, Nitesh V. Chawla

Abstract: Graph Neural Networks (GNNs) are prominent in graph machine learning and have shown state-of-the-art performance in Link Prediction (LP) tasks. Nonetheless, recent studies show that GNNs struggle to produce good results on low-degree nodes despite their overall strong performance. In practical applications of LP, like recommendation systems, improving performance on low-degree nodes is critical, as it amounts to tackling the cold-start problem of improving the experiences of users with few observed interactions. In this paper, we investigate improving GNNs' LP performance on low-degree nodes while preserving their performance on high-degree nodes and propose a simple yet surprisingly effective augmentation technique called NodeDup. Specifically, NodeDup duplicates low-degree nodes and creates links between nodes and their own duplicates before following the standard supervised LP training scheme. By leveraging a "multi-view" perspective for low-degree nodes, NodeDup shows significant LP performance improvements on low-degree nodes without compromising any performance on high-degree nodes. Additionally, as a plug-and-play augmentation module, NodeDup can be easily applied to existing GNNs with very light computational cost. Extensive experiments show that NodeDup achieves 38.49%, 13.34%, and 6.76% improvements on isolated, low-degree, and warm nodes, respectively, on average across all datasets compared to GNNs and state-of-the-art cold-start methods.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.08170 [pdf, other]

LLaGA: Large Language and Graph Assistant

Authors: Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, Zhangyang Wang

Abstract: Graph Neural Networks (GNNs) have empowered the advance in graph-structured data analysis. Recently, the rise of Large Language Models (LLMs) like GPT-4 has heralded a new era in deep learning. However, their application to graph data poses distinct challenges due to the inherent difficulty of translating graph structures to language. To this end, we introduce the Large Language and Graph Assistant (LLaGA), an innovative model that effectively integrates LLM capabilities to handle the complexities of graph-structured data. LLaGA retains the general-purpose nature of LLMs while adapting graph data into a format compatible with LLM input. LLaGA achieves this by reorganizing graph nodes to structure-aware sequences and then mapping these into the token embedding space through a versatile projector. LLaGA excels in versatility, generalizability and interpretability, allowing it to perform consistently well across different datasets and tasks, extend its ability to unseen datasets or tasks, and provide explanations for graphs. Our extensive experiments across popular graph benchmarks show that LLaGA delivers outstanding performance across four datasets and three tasks using one single model, surpassing state-of-the-art graph models in both supervised and zero-shot scenarios. Our code is available at https://github.com/VITA-Group/LLaGA.

Submitted 11 April, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.07860 [pdf, other]

On the Detection of Reviewer-Author Collusion Rings From Paper Bidding

Authors: Steven Jecmen, Nihar B. Shah, Fei Fang, Leman Akoglu

Abstract: A major threat to the peer-review systems of computer science conferences is the existence of "collusion rings" between reviewers. In such collusion rings, reviewers who have also submitted their own papers to the conference work together to manipulate the conference's paper assignment, with the aim of being assigned to review each other's papers. The most straightforward way that colluding reviewers can manipulate the paper assignment is by indicating their interest in each other's papers through strategic paper bidding. One potential approach to solve this important problem would be to detect the colluding reviewers from their manipulated bids, after which the conference can take appropriate action. While prior work has developed effective techniques to detect other kinds of fraud, no research has yet established that detecting collusion rings is even possible. In this work, we tackle the question of whether it is feasible to detect collusion rings from the paper bidding. To answer this question, we conduct empirical analysis of two realistic conference bidding datasets, including evaluations of existing algorithms for fraud detection in other applications. We find that collusion rings can achieve considerable success at manipulating the paper assignment while remaining hidden from detection: for example, in one dataset, undetected colluders are able to achieve assignment to up to 30% of the papers authored by other colluders. In addition, when 10 colluders bid on all of each other's papers, no detection algorithm outputs a group of reviewers with more than 31% overlap with the true colluders. These results suggest that collusion cannot be effectively detected from the bidding using popular existing tools, demonstrating the need to develop more complex detection algorithms as well as those that leverage additional metadata (e.g., reviewer-paper text-similarity scores).

Submitted 10 March, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.05125 [pdf, other]

Zero-Shot Clinical Trial Patient Matching with LLMs

Authors: Michael Wornow, Alejandro Lozano, Dev Dash, Jenelle Jindal, Kenneth W. Mahaffey, Nigam H. Shah

Abstract: Matching patients to clinical trials is a key unsolved challenge in bringing new drugs to market. Today, identifying patients who meet a trial's eligibility criteria is highly manual, taking up to 1 hour per patient. Automated screening is challenging, however, as it requires understanding unstructured clinical text. Large language models (LLMs) offer a promising solution. In this work, we explore their application to trial matching. First, we design an LLM-based system which, given a patient's medical history as unstructured clinical text, evaluates whether that patient meets a set of inclusion criteria (also specified as free text). Our zero-shot system achieves state-of-the-art scores on the n2c2 2018 cohort selection benchmark. Second, we improve the data and cost efficiency of our method by identifying a prompting strategy which matches patients an order of magnitude faster and more cheaply than the status quo, and develop a two-stage retrieval pipeline that reduces the number of tokens processed by up to a third while retaining high performance. Third, we evaluate the interpretability of our system by having clinicians evaluate the natural language justifications generated by the LLM for each eligibility decision, and show that it can output coherent explanations for 97% of its correct decisions and 75% of its incorrect ones. Our results establish the feasibility of using LLMs to accelerate clinical trial operations.

Submitted 10 April, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.02216 [pdf, other]

Position: Graph Foundation Models are Already Here

Authors: Haitao Mao, Zhikai Chen, Wenzhuo Tang, Jianan Zhao, Yao Ma, Tong Zhao, Neil Shah, Mikhail Galkin, Jiliang Tang

Abstract: Graph Foundation Models (GFMs) are emerging as a significant research topic in the graph domain, aiming to develop graph models trained on extensive and diverse data to enhance their applicability across various tasks and domains. Developing GFMs presents unique challenges over traditional Graph Neural Networks (GNNs), which are typically trained from scratch for specific tasks on particular datasets. The primary challenge in constructing GFMs lies in effectively leveraging vast and diverse graph data to achieve positive transfer. Drawing inspiration from existing foundation models in the CV and NLP domains, we propose a novel perspective for the GFM development by advocating for a "graph vocabulary", in which the basic transferable units underlying graphs encode the invariance on graphs. We ground the graph vocabulary construction from essential aspects including network analysis, expressiveness, and stability. Such a vocabulary perspective can potentially advance the future GFM design in line with the neural scaling laws. All relevant resources with GFM design can be found here.

Submitted 30 May, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

Comments: 23 pages, 2 figures

arXiv:2402.02054 [pdf, other]

Neural Scaling Laws on Graphs

Authors: Jingzhe Liu, Haitao Mao, Zhikai Chen, Tong Zhao, Neil Shah, Jiliang Tang

Abstract: Deep graph models (e.g., graph neural networks and graph transformers) have become important techniques for leveraging knowledge across various types of graphs. Yet, the scaling properties of deep graph models have not been systematically investigated, casting doubt on the feasibility of achieving large graph models through enlarging the model and dataset sizes. In this work, we delve into neural scaling laws on graphs from both model and data perspectives. We first verify the validity of such laws on graphs, establishing formulations to describe the scaling behaviors. For model scaling, we investigate the phenomenon of scaling law collapse and identify overfitting as the potential reason. Moreover, we reveal that the model depth of deep graph models can impact the model scaling behaviors, which differ from observations in other domains such as CV and NLP. For data scaling, we suggest that the number of graphs cannot effectively measure the graph data volume in scaling law since the sizes of different graphs are highly irregular. Instead, we reform the data scaling law with the number of edges as the metric to address the irregular graph sizes. We further demonstrate the reformed law offers a unified view of the data scaling behaviors for various fundamental graph tasks including node classification, link prediction, and graph classification. This work provides valuable insights into neural scaling laws on graphs, which can serve as an essential step toward large graph models.

Submitted 9 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

arXiv:2401.03337 [pdf, other]

MTAC: Hierarchical Reinforcement Learning-based Multi-gait Terrain-adaptive Quadruped Controller

Authors: Nishaant Shah, Kshitij Tiwari, Aniket Bera

Abstract: Urban search and rescue missions require rapid first response to minimize loss of life and damage. Often, such efforts are assisted by humanitarian robots which need to handle dynamic operational conditions such as uneven and rough terrains, especially during mass casualty incidents like an earthquake. Quadruped robots, owing to their versatile design, have the potential to assist in such scenarios. However, control of quadruped robots in dynamic and rough terrain environments is a challenging problem due to the many degrees of freedom of these robots. Current locomotion controllers for quadrupeds are limited in their ability to produce multiple adaptive gaits, solve tasks in a time and resource-efficient manner, and require tedious training and manual tuning procedures. To address these challenges, we propose MTAC: a multi-gait terrain-adaptive controller, which utilizes a Hierarchical reinforcement learning (HRL) approach while being time and memory-efficient. We show that our proposed method scales well to a diverse range of environments with similar compute times as state-of-the-art methods. Our method showed greater than 75% on most tasks, outperforming previous work on the majority of test cases.

Submitted 1 November, 2023; originally announced January 2024.

Comments: Submitted to ICRA2024

arXiv:2312.11109 [pdf, other]

Graph Transformers for Large Graphs

Authors: Vijay Prakash Dwivedi, Yozen Liu, Anh Tuan Luu, Xavier Bresson, Neil Shah, Tong Zhao

Abstract: Transformers have recently emerged as powerful neural networks for graph learning, showcasing state-of-the-art performance on several graph property prediction tasks. However, these results have been limited to small-scale graphs, where the global attention mechanism is computationally feasible. The next goal is to scale up these architectures to handle very large graphs on the scale of millions or even billions of nodes. With large-scale graphs, global attention learning is proven impractical due to its quadratic complexity w.r.t. the number of nodes. On the other hand, neighborhood sampling techniques become essential to manage large graph sizes, yet finding the optimal trade-off between speed and accuracy with sampling techniques remains challenging. This work advances representation learning on single large-scale graphs with a focus on identifying model characteristics and critical design constraints for developing scalable graph transformer (GT) architectures. We argue such GT requires layers that can adeptly learn both local and global graph representations while swiftly sampling the graph topology. As such, a key innovation of this work lies in the creation of a fast neighborhood sampling technique coupled with a local attention mechanism that encompasses a 4-hop reception field, but achieved through just 2-hop operations. This local node embedding is then integrated with a global node embedding, acquired via another self-attention layer with an approximate global codebook, before finally being sent through a downstream layer for node predictions. The proposed GT framework, named LargeGT, overcomes previous computational bottlenecks and is validated on three large-scale node classification benchmarks. We report a 3x speedup and 16.8% performance gain on ogbn-products and snap-patents, while we also scale LargeGT on ogbn-papers100M with a 5.9% performance improvement.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.10082 [pdf, other]

doi 10.1145/3636555.3636898

Finding Paths for Explainable MOOC Recommendation: A Learner Perspective

Authors: Jibril Frej, Neel Shah, Marta Knežević, Tanya Nazaretsky, Tanja Käser

Abstract: The increasing availability of Massive Open Online Courses (MOOCs) has created a necessity for personalized course recommendation systems. These systems often combine neural networks with Knowledge Graphs (KGs) to achieve richer representations of learners and courses. While these enriched representations allow more accurate and personalized recommendations, explainability remains a significant challenge which is especially problematic for certain domains with significant impact such as education and online learning. Recently, a novel class of recommender systems that uses reinforcement learning and graph reasoning over KGs has been proposed to generate explainable recommendations in the form of paths over a KG. Despite their accuracy and interpretability on e-commerce datasets, these approaches have scarcely been applied to the educational domain and their use in practice has not been studied. In this work, we propose an explainable recommendation system for MOOCs that uses graph reasoning. To validate the practical implications of our approach, we conducted a user study examining user perceptions of our new explainable recommendations. We demonstrate the generalizability of our approach by conducting experiments on two educational datasets: COCO and Xuetang.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.02137 [pdf, other]

MANUS: Markerless Grasp Capture using Articulated 3D Gaussians

Authors: Chandradeep Pokhariya, Ishaan N Shah, Angela Xing, Zekun Li, Kefan Chen, Avinash Sharma, Srinath Sridhar

Abstract: Understanding how we grasp objects with our hands has important applications in areas like robotics and mixed reality. However, this challenging problem requires accurate modeling of the contact between hands and objects. To capture grasps, existing methods use skeletons, meshes, or parametric models that do not represent hand shape accurately, resulting in inaccurate contacts. We present MANUS, a method for Markerless Hand-Object Grasp Capture using Articulated 3D Gaussians. We build a novel articulated 3D Gaussians representation that extends 3D Gaussian splatting for high-fidelity representation of articulating hands. Since our representation uses Gaussian primitives, it enables us to efficiently and accurately estimate contacts between the hand and the object. For the most accurate results, our method requires tens of camera views that current datasets do not provide. We therefore build MANUS-Grasps, a new dataset that contains hand-object grasps viewed from 50+ cameras across 30+ scenes, 3 subjects, and comprising over 7M frames. In addition to extensive qualitative results, we also show that our method outperforms others on a quantitative contact evaluation method that uses paint transfer from the object to the hand.

Submitted 28 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) 2024

arXiv:2312.00634 [pdf]

A Recent Survey of Vision Transformers for Medical Image Segmentation

Authors: Asifullah Khan, Zunaira Rauf, Abdul Rehman Khan, Saima Rathore, Saddam Hussain Khan, Najmus Saher Shah, Umair Farooq, Hifsa Asif, Aqsa Asif, Umme Zahoora, Rafi Ullah Khalil, Suleman Qamar, Umme Hani Asif, Faiza Babar Khan, Abdul Majid, Jeonghwan Gwak

Abstract: Medical image segmentation plays a crucial role in various healthcare applications, enabling accurate diagnosis, treatment planning, and disease monitoring. Traditionally, convolutional neural networks (CNNs) dominated this domain, excelling at local feature extraction. However, their limitations in capturing long-range dependencies across image regions pose challenges for segmenting complex, interconnected structures often encountered in medical data. In recent years, Vision Transformers (ViTs) have emerged as a promising technique for addressing the challenges in medical image segmentation. Their multi-scale attention mechanism enables effective modeling of long-range dependencies between distant structures, crucial for segmenting organs or lesions spanning the image. Additionally, ViTs' ability to discern subtle pattern heterogeneity allows for the precise delineation of intricate boundaries and edges, a critical aspect of accurate medical image segmentation. However, they do lack image-related inductive bias and translational invariance, potentially impacting their performance. Recently, researchers have come up with various ViT-based approaches that incorporate CNNs in their architectures, known as Hybrid Vision Transformers (HVTs) to capture local correlation in addition to the global information in the images. This survey paper provides a detailed review of the recent advancements in ViTs and HVTs for medical image segmentation. Along with the categorization of ViT and HVT-based medical image segmentation approaches, we also present a detailed overview of their real-time applications in several medical image modalities. This survey may serve as a valuable resource for researchers, healthcare practitioners, and students in understanding the state-of-the-art approaches for ViT-based medical image segmentation.

Submitted 18 December, 2023; v1 submitted 1 December, 2023; originally announced December 2023.

arXiv:2311.18065 [pdf, other]

doi 10.1016/j.nuclphysa.2023.122701

Exploring light nuclei production at RHIC and LHC energies with A Multi-Phase Transport model and a coalescence afterburner

Authors: Yoshini Bailung, Neha Shah, Ankhi Roy

Abstract: In heavy-ion collisions, understanding how light nuclei species are produced can provide insight into the nature of hadronic interactions in extreme conditions. It can also shed light on understanding the matter-antimatter asymmetry and dark matter searches in astrophysical processes. To investigate the production mechanism of light nuclei such as deuteron, triton, and helium-3, we use a naive coalescence afterburner coupled to the well-known "A Multi-Phase Transport model" (AMPT). We focus on studying the production of light nuclei in central Au+Au collisions at different center of mass energies ($\sqrt{s_{_{\rm{NN}}}}$ = 19.6, 39, and 200 GeV) and in Pb+Pb collisions at $\sqrt{s_{_{\rm{NN}}}}$ = 2.76 TeV, at mid-rapidity. We generate events with the string melting version of AMPT, and feed the information of the nucleons with spatial and momentum conditions into the coalescence afterburner. Our study reports differential and integrated yields in transverse momentum ($p_{\rm{T}}$) of the light nuclei in different center of mass energies. We also estimate the coalescence parameters ($B_A$) as a function of $p_{\rm{T}}$ and collision energy for (anti-)deuterons, tritons and helium-3s for Au+Au and Pb+Pb collisions, which are compared to other light nuclei production studies. All results are compared with measurements from the STAR and ALICE experiments.

Submitted 29 November, 2023; originally announced November 2023.

Comments: 10 pages, 7 figures

Journal ref: Nucl. Phys. A 1037 (2023), 122701

arXiv:2311.11483 [pdf]

A Multi-Center Study on the Adaptability of a Shared Foundation Model for Electronic Health Records

Authors: Lin Lawrence Guo, Jason Fries, Ethan Steinberg, Scott Lanyon Fleming, Keith Morse, Catherine Aftandilian, Jose Posada, Nigam Shah, Lillian Sung

Abstract: Foundation models hold promise for transforming AI in healthcare by providing modular components that are easily adaptable to downstream healthcare tasks, making AI development more scalable and cost-effective. Structured EHR foundation models, trained on coded medical records from millions of patients, demonstrated benefits including increased performance with fewer training labels, and improved robustness to distribution shifts. However, questions remain on the feasibility of sharing these models across different hospitals and their performance for local task adaptation. This multi-center study examined the adaptability of a recently released structured EHR foundation model ($FM_{SM}$), trained on longitudinal medical record data from 2.57M Stanford Medicine patients. Experiments were conducted using EHR data at The Hospital for Sick Children and MIMIC-IV. We assessed both adaptability via continued pretraining on local data, and task adaptability compared to baselines of training models from scratch at each site, including a local foundation model. We evaluated the performance of these models on 8 clinical prediction tasks. In both datasets, adapting the off-the-shelf $FM_{SM}$ matched the performance of GBM models locally trained on all data while providing a 13% improvement in settings with few task-specific training labels. With continued pretraining on local data, label efficiency substantially improved, such that $FM_{SM}$ required fewer than 1% of training examples to match the fully trained GBM's performance. Continued pretraining was also 60 to 90% more sample-efficient than training local foundation models from scratch. Our findings show that adapting shared EHR foundation models across hospitals provides improved prediction performance at less cost, underscoring the utility of base foundation models as modular components to streamline the development of healthcare AI.

Submitted 22 April, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

Comments: 46 pages, 5 figures, 3 tables, 14 appendices

arXiv:2311.10798 [pdf, other]

INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis

Authors: Shih-Cheng Huang, Zepeng Huo, Ethan Steinberg, Chia-Chun Chiang, Matthew P. Lungren, Curtis P. Langlotz, Serena Yeung, Nigam H. Shah, Jason A. Fries

Abstract: Synthesizing information from multiple data sources plays a crucial role in the practice of modern medicine. Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of patients at risk for pulmonary embolism (PE), along with ground truth labels for multiple outcomes. INSPECT contains data from 19,402 patients, including CT images, radiology report impression sections, and structured electronic health record (EHR) data (i.e. demographics, diagnoses, procedures, vitals, and medications). Using INSPECT, we develop and release a benchmark for evaluating several baseline modeling approaches on a variety of important PE related tasks. We evaluate image-only, EHR-only, and multimodal fusion models. Trained models and the de-identified dataset are made available for non-commercial use under a data use agreement. To the best of our knowledge, INSPECT is the largest multimodal dataset integrating 3D medical imaging and EHR for reproducible methods evaluation and research.

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2311.09497 [pdf, other]

Peer Reviews of Peer Reviews: A Randomized Controlled Trial and Other Experiments

Authors: Alexander Goldberg, Ivan Stelmakh, Kyunghyun Cho, Alice Oh, Alekh Agarwal, Danielle Belgrave, Nihar B. Shah

Abstract: Is it possible to reliably evaluate the quality of peer reviews? We study this question driven by two primary motivations -- incentivizing high-quality reviewing using assessed quality of reviews and measuring changes to review quality in experiments. We conduct a large scale study at the NeurIPS 2022 conference, a top-tier conference in machine learning, in which we invited (meta)-reviewers and authors to evaluate reviews given to submitted papers. First, we conduct an RCT to examine bias due to the length of reviews. We generate elongated versions of reviews by adding substantial amounts of non-informative content. Participants in the control group evaluate the original reviews, whereas participants in the experimental group evaluate the artificially lengthened versions. We find that lengthened reviews are scored (statistically significantly) higher quality than the original reviews. Additionally, in analysis of observational data we find that authors are positively biased towards reviews recommending acceptance of their own papers, even after controlling for confounders of review length, quality, and different numbers of papers per author. We also measure disagreement rates between multiple evaluations of the same review of 28%-32%, which is comparable to that of paper reviewers at NeurIPS. Further, we assess the amount of miscalibration of evaluators of reviews using a linear model of quality scores and find that it is similar to estimates of miscalibration of paper reviewers at NeurIPS. Finally, we estimate the amount of variability in subjective opinions around how to map individual criteria to overall scores of review quality and find that it is roughly the same as that in the review of papers. Our results suggest that the various problems that exist in reviews of papers -- inconsistency, bias towards irrelevant factors, miscalibration, subjectivity -- also arise in reviewing of reviews.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.05834 [pdf, ps, other]

An upper bound of the Hausdorff dimension of singular vectors on affine subspaces

Authors: Nimish A. Shah, Pengyu Yang

Abstract: In Diophantine approximation, the notion of singular vectors was introduced by Khintchine in the 1920's. We study the set of singular vectors on an affine subspace of $\mathbb{R}^n$. We give an upper bound of its Hausdorff dimension in terms of the Diophantine exponent of the parameter of the affine subspace.

Submitted 9 November, 2023; originally announced November 2023.

Comments: 18 pages

MSC Class: 37A17; 11J83; 22E46

arXiv:2310.16891 [pdf, other]

Detecting Detached Black Hole binaries through Photometric Variability

Authors: Chirag Chawla, Sourav Chatterjee, Neev Shah, Katelyn Breivik

Abstract: Understanding the connection between the properties of black holes (BHs) and their progenitors is interesting in many branches of astrophysics. Discovering BHs in detached orbits with luminous companions (LCs) promises to help create this map since the LC and BH progenitor are expected to have the same metallicity and formation time. We explore the possibility of detecting BH-LC binaries in detached orbits using photometric variations of the LC flux, induced by tidal ellipsoidal variation, relativistic beaming, and self-lensing. We create realistic present-day populations of detached BH-LC binaries in the Milky Way (MW) using binary population synthesis where we adopt observationally motivated initial stellar and binary properties, star formation history and present-day distribution of these sources in the MW based on detailed cosmological simulations. We test detectability of these sources via photometric variability by Gaia and TESS missions by incorporating their respective detailed detection biases as well as interstellar extinction. We find that Gaia (TESS) is expected to resolve ~700-1500 (~100-400) detached BH-LC binaries depending on the photometric precision and details of supernova physics. We find that ~369 BH-LC binaries would be common both in Gaia and TESS. Moreover, between ~80-270 (~70-290) of these BH-LC binaries can be further characterised using Gaia's radial velocity (astrometry) measurements.

Submitted 21 April, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: 22 pages, 16 figures, and 1 table; submitted to The Astrophysical Journal; Comments welcome

arXiv:2310.16146 [pdf, other]

Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature

Authors: Alejandro Lozano, Scott L Fleming, Chia-Chun Chiang, Nigam Shah

Abstract: The quickly-expanding nature of published medical literature makes it challenging for clinicians and researchers to keep up with and summarize recent, relevant findings in a timely manner. While several closed-source summarization tools based on large language models (LLMs) now exist, rigorous and systematic evaluations of their outputs are lacking. Furthermore, there is a paucity of high-quality datasets and appropriate benchmark tasks with which to evaluate these tools. We address these issues with four contributions: we release Clinfo.ai, an open-source WebApp that answers clinical questions based on dynamically retrieved scientific literature; we specify an information retrieval and abstractive summarization task to evaluate the performance of such retrieval-augmented LLM systems; we release a dataset of 200 questions and corresponding answers derived from published systematic reviews, which we name PubMed Retrieval and Synthesis (PubMedRS-200); and report benchmark results for Clinfo.ai and other publicly available OpenQA systems on PubMedRS-200.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.13077 [pdf, other]

doi 10.1109/CASE56687.2023.10260513

NeuroSMPC: A Neural Network guided Sampling Based MPC for On-Road Autonomous Driving

Authors: Kaustab Pal, Aditya Sharma, Mohd Omama, Parth N. Shah, K. Madhava Krishna

Abstract: In this paper we show an effective means of integrating data driven frameworks to sampling based optimal control to vastly reduce the compute time for easy adoption and adaptation to real time applications such as on-road autonomous driving in the presence of dynamic actors. Presented with training examples, a spatio-temporal CNN learns to predict the optimal mean control over a finite horizon that precludes further resampling, an iterative process that makes sampling based optimal control formulations difficult to adopt in real time settings. Generating control samples around the network-predicted optimal mean retains the advantage of sample diversity while enabling real time rollout of trajectories that avoids multiple dynamic obstacles in an on-road navigation setting. Further the 3D CNN architecture implicitly learns the future trajectories of the dynamic agents in the scene resulting in successful collision free navigation despite no explicit future trajectory prediction. We show performance gain over multiple baselines in a number of on-road scenes through closed loop simulations in CARLA. We also showcase the real world applicability of our system by running it on our custom Autonomous Driving Platform (AutoDP).

Submitted 19 October, 2023; originally announced October 2023.

Comments: Published in 2023 IEEE 19th International Conference on Automation Science and Engineering (CASE)

Showing 1–50 of 674 results for author: Shah, N