-
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates
Authors:
Zeyu Leo Liu,
Shrey Pandit,
Xi Ye,
Eunsol Choi,
Greg Durrett
Abstract:
Large language models (LLMs) are increasingly being used to synthesize and reason about source code. However, the static nature of these models' knowledge does not reflect the fact that libraries and API functions they invoke are continuously evolving, with functionality being added or changing. While numerous benchmarks evaluate how LLMs can generate code, no prior work has studied how an LLM's knowledge about code API functions can be updated. To fill this gap, we present CodeUpdateArena, a benchmark for knowledge editing in the code domain. An instance in our benchmark consists of a synthetic API function update paired with a program synthesis example that uses the updated functionality; our goal is to update an LLM to be able to solve this program synthesis example without providing documentation of the update at inference time. Compared to knowledge editing for facts encoded in text, success here is more challenging: a code LLM must correctly reason about the semantics of the modified function rather than just reproduce its syntax. Our dataset is constructed by first prompting GPT-4 to generate atomic and executable function updates. Then, for each update, we generate program synthesis examples whose code solutions are prone to use the update. Our benchmark covers updates of various types to 54 functions from seven diverse Python packages, with a total of 670 program synthesis examples. Our experiments show that prepending documentation of the update to open-source code LLMs (i.e., DeepSeek, CodeLlama) does not allow them to incorporate changes for problem solving, and existing knowledge editing techniques also have substantial room for improvement. We hope our benchmark will inspire new methods for knowledge updating in code LLMs.
Submitted 8 July, 2024;
originally announced July 2024.
-
Insulator-to-Metal Transition and Isotropic Gigantic Magnetoresistance in Layered Magnetic Semiconductors
Authors:
Gokul Acharya,
Bimal Neupane,
Chia-Hsiu Hsu,
Xian P. Yang,
David Graf,
Eun Sang Choi,
Krishna Pandey,
Md Rafique Un Nabi,
Santosh Karki Chhetri,
Rabindra Basnet,
Sumaya Rahman,
Jian Wang,
Zhengxin Hu,
Bo Da,
Hugh Churchill,
Guoqing Chang,
M. Zahid Hasan,
Yuanxi Wang,
Jin Hu
Abstract:
Magnetotransport, the response of electrical conduction to external magnetic field, acts as an important tool to reveal fundamental concepts behind exotic phenomena and plays a key role in enabling spintronic applications. Magnetotransport is generally sensitive to magnetic field orientations. In contrast, efficient and isotropic modulation of electronic transport, which is useful in technology applications such as omnidirectional sensing, is rarely seen, especially for pristine crystals. Here we propose a strategy to realize extremely strong modulation of electron conduction by magnetic field which is independent of field direction. GdPS, a layered antiferromagnetic semiconductor with resistivity anisotropies, supports a field-driven insulator-to-metal transition with a paradoxically isotropic gigantic negative magnetoresistance insensitive to magnetic field orientations. This isotropic magnetoresistance originates from the combined effects of a near-zero spin-orbit coupling of Gd3+-based half-filling f-electron system and the strong on-site f-d exchange coupling in Gd atoms. Our results not only provide a novel material system with extraordinary magnetotransport that offers a missing block for antiferromagnet-based ultrafast and efficient spintronic devices, but also demonstrate the key ingredients for designing magnetic materials with desired transport properties for advanced functionalities.
Submitted 3 July, 2024;
originally announced July 2024.
-
Averaging log-likelihoods in direct alignment
Authors:
Nathan Grinsztajn,
Yannis Flet-Berliac,
Mohammad Gheshlaghi Azar,
Florian Strub,
Bill Wu,
Eugene Choi,
Chris Cremer,
Arash Ahmadian,
Yash Chandak,
Olivier Pietquin,
Matthieu Geist
Abstract:
To better align Large Language Models (LLMs) with human judgment, Reinforcement Learning from Human Feedback (RLHF) learns a reward model and then optimizes it using regularized RL. Recently, direct alignment methods were introduced to learn such a fine-tuned model directly from a preference dataset without computing a proxy reward function. These methods are built upon contrastive losses involving the log-likelihood of (dis)preferred completions according to the trained model. However, completions have various lengths, and the log-likelihood is not length-invariant. On the other side, the cross-entropy loss used in supervised training is length-invariant, as batches are typically averaged token-wise. To reconcile these approaches, we introduce a principled approach for making direct alignment length-invariant. Formally, we introduce a new averaging operator, to be composed with the optimality operator giving the best policy for the underlying RL problem. It translates into averaging the log-likelihood within the loss. We empirically study the effect of such averaging, observing a trade-off between the length of generations and their scores.
Submitted 27 June, 2024;
originally announced June 2024.
-
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Authors:
Yannis Flet-Berliac,
Nathan Grinsztajn,
Florian Strub,
Eugene Choi,
Chris Cremer,
Arash Ahmadian,
Yash Chandak,
Mohammad Gheshlaghi Azar,
Olivier Pietquin,
Matthieu Geist
Abstract:
Reinforcement Learning (RL) has been used to finetune Large Language Models (LLMs) using a reward model trained from preference data, to better align with human judgment. The recently introduced direct alignment methods, which are often simpler, more stable, and computationally lighter, can more directly achieve this. However, these approaches cannot optimize arbitrary rewards, and the preference-based ones are not the only rewards of interest for LLMs (e.g., unit tests for code generation or textual entailment for summarization, among others). RL-finetuning is usually done with a variation of policy gradient, which calls for on-policy or near-on-policy samples, requiring costly generations. We introduce Contrastive Policy Gradient, or CoPG, a simple and mathematically principled new RL algorithm that can estimate the optimal policy even from off-policy data. It can be seen as an off-policy policy gradient approach that does not rely on importance sampling techniques and highlights the importance of using (the right) state baseline. We show this approach to generalize the direct alignment method IPO (identity preference optimization) and classic policy gradient. We experiment with the proposed CoPG on a toy bandit problem to illustrate its properties, as well as for finetuning LLMs on a summarization task, using a learned reward function considered as ground truth for the purpose of the experiments.
Submitted 27 June, 2024;
originally announced June 2024.
-
Spectrum and low-energy gap in triangular quantum spin liquid NaYbSe$_2$
Authors:
A. O. Scheie,
Minseong Lee,
Kevin Wang,
P. Laurell,
E. S. Choi,
D. Pajerowski,
Qingming Zhang,
Jie Ma,
H. D. Zhou,
Sangyun Lee,
S. M. Thomas,
M. O. Ajeesh,
P. F. S. Rosa,
Ao Chen,
Vivien S. Zapf,
M. Heyl,
C. D. Batista,
E. Dagotto,
J. E. Moore,
D. Alan Tennant
Abstract:
We report neutron scattering, pressure-dependent AC calorimetry, and AC magnetic susceptibility measurements of triangular lattice NaYbSe$_2$. We observe a continuum of scattering, which is reproduced by matrix product simulations, and no phase transition is detected in any bulk measurements. Comparison to heat capacity simulations suggests the material is within the Heisenberg spin liquid phase. AC susceptibility shows a significant 23 mK downturn, indicating a gap in the magnetic spectrum. The combination of a gap with no detectable magnetic order, comparison to theoretical models, and comparison to other $A$YbSe$_2$ compounds all strongly indicate NaYbSe$_2$ is within the quantum spin liquid phase. The gap also allows us to rule out a gapless Dirac spin liquid, with a gapped $\mathbb{Z}_2$ liquid the most natural explanation.
Submitted 25 June, 2024;
originally announced June 2024.
-
CaLMQA: Exploring culturally specific long-form question answering across 23 languages
Authors:
Shane Arora,
Marzena Karpinska,
Hung-Ting Chen,
Ipsita Bhattacharjee,
Mohit Iyyer,
Eunsol Choi
Abstract:
Large language models (LLMs) are used for long-form question answering (LFQA), which requires them to generate paragraph-length answers to complex questions. While LFQA has been well-studied in English, this research has not been extended to other languages. To bridge this gap, we introduce CaLMQA, a collection of 1.5K complex culturally specific questions spanning 23 languages and 51 culturally agnostic questions translated from English into 22 other languages. We define culturally specific questions as those uniquely or more likely to be asked by people from cultures associated with the question's language. We collect naturally-occurring questions from community web forums and hire native speakers to write questions to cover under-resourced, rarely-studied languages such as Fijian and Kirundi. Our dataset contains diverse, complex questions that reflect cultural topics (e.g. traditions, laws, news) and the language usage of native speakers. We automatically evaluate a suite of open- and closed-source models on CaLMQA by detecting incorrect language and token repetitions in answers, and observe that the quality of LLM-generated answers degrades significantly for some low-resource languages. Lastly, we perform human evaluation on a subset of models and languages. Manual evaluation reveals that model performance is significantly worse for culturally specific questions than for culturally agnostic questions. Our findings highlight the need for further research in non-English LFQA and provide an evaluation framework.
Submitted 3 July, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
From Distributional to Overton Pluralism: Investigating Large Language Model Alignment
Authors:
Thom Lake,
Eunsol Choi,
Greg Durrett
Abstract:
The alignment process changes several properties of a large language model's (LLM's) output distribution. We analyze two aspects of post-alignment distributional shift of LLM responses. First, we re-examine previously reported reductions in response diversity post-alignment. Our analysis suggests that an apparent drop in the diversity of responses is largely explained by quality control and information aggregation. Alignment suppresses irrelevant and unhelpful content while shifting the output distribution toward longer responses that cover information spanning several responses from the base LLM, essentially presenting diverse information in a single response. Finding little evidence that alignment suppresses useful information, it is natural to ask the opposite question: do aligned models surface information that cannot be recovered from base models? Our second investigation shows this is not the case and the behavior of aligned models is recoverable from base models without fine-tuning. A combination of in-context examples and lower-resolution semantic hints about response content can elicit responses from base LLMs that are as similar to alignment-tuned LLM responses as alignment-tuned LLM responses are to each other. Taken together, these results indicate that current alignment techniques capture but do not extend the useful subset of assistant-like base LLM behavior, providing further evidence for the Superficial Alignment Hypothesis. They also show that in-context alignment can go surprisingly far as a strategy for imitating aligned LLMs without fine-tuning. Our code and data are available at https://github.com/thomlake/investigating-alignment.
Submitted 25 June, 2024;
originally announced June 2024.
-
EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records
Authors:
Yeonsu Kwon,
Jiho Kim,
Gyubok Lee,
Seongsu Bae,
Daeun Kyung,
Wonchul Cha,
Tom Pollard,
Alistair Johnson,
Edward Choi
Abstract:
Electronic Health Records (EHRs) are integral for storing comprehensive patient medical records, combining structured data (e.g., medications) with detailed clinical notes (e.g., physician notes). These elements are essential for straightforward data retrieval and provide deep, contextual insights into patient care. However, they often suffer from discrepancies due to unintuitive EHR system designs and human errors, posing serious risks to patient safety. To address this, we developed EHRCon, a new dataset and task specifically designed to ensure data consistency between structured tables and unstructured notes in EHRs. EHRCon was crafted in collaboration with healthcare professionals using the MIMIC-III EHR dataset, and includes manual annotations of 3,943 entities across 105 clinical notes checked against database entries for consistency. EHRCon has two versions, one using the original MIMIC-III schema, and another using the OMOP CDM schema, in order to increase its applicability and generalizability. Furthermore, leveraging the capabilities of large language models, we introduce CheckEHR, a novel framework for verifying the consistency between clinical notes and database tables. CheckEHR utilizes an eight-stage process and shows promising results in both few-shot and zero-shot settings. The code is available at https://github.com/dustn1259/EHRCon.
Submitted 24 June, 2024;
originally announced June 2024.
-
Exploring Design Choices for Building Language-Specific LLMs
Authors:
Atula Tejaswi,
Nilesh Gupta,
Eunsol Choi
Abstract:
Despite rapid progress in large language models (LLMs), their performance on a vast majority of languages remains unsatisfactory. In this paper, we study building language-specific LLMs by adapting monolingual and multilingual LLMs. We conduct systematic experiments on how design choices (base model selection, vocabulary extension, and continued fine-tuning) impact the adapted LLM, both in terms of efficiency (how many tokens are needed to encode the same amount of information) and end task performance. We find that (1) the initial performance before the adaptation is not always indicative of the final performance, (2) efficiency can easily be improved with simple vocabulary extension and continued fine-tuning in most LLMs we study, and (3) the optimal adaptation method is highly language-dependent, and the simplest approach works well across various experimental settings. Adapting English-centric models can yield better results than adapting multilingual models despite their worse initial performance on low-resource languages. Together, our work lays foundations on efficiently building language-specific LLMs by adapting existing LLMs.
Submitted 20 June, 2024;
originally announced June 2024.
-
DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents
Authors:
Jiho Kim,
Woosog Chay,
Hyeonji Hwang,
Daeun Kyung,
Hyunseung Chung,
Eunbyeol Cho,
Yohan Jo,
Edward Choi
Abstract:
Recent advancements in Large Language Models (LLMs) have significantly enhanced the capabilities of conversational agents, making them applicable to various fields (e.g., education). Despite their progress, the evaluation of the agents often overlooks the complexities of real-world conversations, such as real-time interactions, multi-party dialogues, and extended contextual dependencies. To bridge this gap, we introduce DialSim, a real-time dialogue simulator. In this simulator, an agent is assigned the role of a character from popular TV shows, requiring it to respond to spontaneous questions using past dialogue information and to distinguish between known and unknown information. Key features of DialSim include evaluating the agent's ability to respond within a reasonable time limit, handling long-term multi-party dialogues, and managing adversarial settings (e.g., swap character names) to challenge the agent's reliance on pre-trained knowledge. We utilized this simulator to evaluate the latest conversational agents and analyze their limitations. Our experiments highlight both the strengths and weaknesses of these agents, providing valuable insights for future improvements in the field of conversational AI. DialSim is available at https://github.com/jiho283/Simulator.
Submitted 18 June, 2024;
originally announced June 2024.
-
Self-Improving Robust Preference Optimization
Authors:
Eugene Choi,
Arash Ahmadian,
Matthieu Geist,
Olivier Pietquin,
Mohammad Gheshlaghi Azar
Abstract:
Both online and offline RLHF methods such as PPO and DPO have been extremely successful in aligning AI with human preferences. Despite their success, the existing methods suffer from a fundamental problem that their optimal solution is highly task-dependent (i.e., not robust to out-of-distribution (OOD) tasks). Here we address this challenge by proposing Self-Improving Robust Preference Optimization (SRPO), a practical and mathematically principled offline RLHF framework that is completely robust to the changes in the task. The key idea of SRPO is to cast the problem of learning from human preferences as a self-improvement process, which can be mathematically expressed in terms of a min-max objective that aims at joint optimization of the self-improvement policy and the generative policy in an adversarial fashion. The solution for this optimization problem is independent of the training task and thus it is robust to its changes. We then show that this objective can be re-expressed in the form of a non-adversarial offline loss which can be optimized using standard supervised optimization techniques at scale without any need for a reward model or online inference. We show the effectiveness of SRPO in terms of AI Win-Rate (WR) against human (GOLD) completions. In particular, when SRPO is evaluated on the OOD XSUM dataset, it outperforms the celebrated DPO by a clear margin of 15% after 5 self-revisions, achieving WR of 90%.
Submitted 7 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
EHR-SeqSQL: A Sequential Text-to-SQL Dataset For Interactively Exploring Electronic Health Records
Authors:
Jaehee Ryu,
Seonhee Cho,
Gyubok Lee,
Edward Choi
Abstract:
In this paper, we introduce EHR-SeqSQL, a novel sequential text-to-SQL dataset for Electronic Health Record (EHR) databases. EHR-SeqSQL is designed to address critical yet underexplored aspects in text-to-SQL parsing: interactivity, compositionality, and efficiency. To the best of our knowledge, EHR-SeqSQL is not only the largest but also the first medical text-to-SQL dataset benchmark to include sequential and contextual questions. We provide a data split and the new test set designed to assess compositional generalization ability. Our experiments demonstrate the superiority of a multi-turn approach over a single-turn approach in learning compositionality. Additionally, our dataset integrates specially crafted tokens into SQL queries to improve execution efficiency. With EHR-SeqSQL, we aim to bridge the gap between practical needs and academic research in the text-to-SQL domain.
Submitted 23 May, 2024;
originally announced June 2024.
-
SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors
Authors:
Vijay Lingam,
Atula Tejaswi,
Aditya Vavre,
Aneesh Shetty,
Gautham Krishna Gudur,
Joydeep Ghosh,
Alex Dimakis,
Eunsol Choi,
Aleksandar Bojchevski,
Sujay Sanghavi
Abstract:
Popular parameter-efficient fine-tuning (PEFT) methods, such as LoRA and its variants, freeze pre-trained model weights \(W\) and inject learnable matrices \(ΔW\). These \(ΔW\) matrices are structured for efficient parameterization, often using techniques like low-rank approximations or scaling vectors. However, these methods typically show a performance gap compared to full fine-tuning. Although recent PEFT methods have narrowed this gap, they do so at the cost of additional learnable parameters. We propose SVFT, a simple approach that fundamentally differs from existing methods: the structure imposed on \(ΔW\) depends on the specific weight matrix \(W\). Specifically, SVFT updates \(W\) as a sparse combination of outer products of its singular vectors, training only the coefficients (scales) of these sparse combinations. This approach allows fine-grained control over expressivity through the number of coefficients. Extensive experiments on language and vision benchmarks show that SVFT recovers up to 96% of full fine-tuning performance while training only 0.006 to 0.25% of parameters, outperforming existing methods that only recover up to 85% performance using 0.03 to 0.8% of the trainable parameter budget.
Submitted 29 May, 2024;
originally announced May 2024.
-
Energy-efficient predictive control for connected, automated driving under localization uncertainty
Authors:
Eunhyek Joa,
Eric Yongkeun Choi,
Francesco Borrelli
Abstract:
This paper presents a data-driven Model Predictive Control (MPC) for energy-efficient urban road driving for connected, automated vehicles. The proposed MPC aims to minimize total energy consumption by controlling the vehicle's longitudinal motion on roads with traffic lights and preceding vehicles. Its terminal cost function and terminal constraints are learned from data, which consists of the closed-loop state and input trajectories. The terminal cost function represents the remaining energy-to-spend starting from a given terminal state. The terminal constraints are designed to ensure that the controlled vehicle timely crosses the upcoming traffic light, adheres to traffic laws, and accounts for the preceding vehicles. We validate the effectiveness of our method through both simulations and real-world vehicle experiments, demonstrating $\textbf{19\%}$ improvement in average energy consumption compared to conventional approaches that involve solving a long-horizon optimal control problem for speed planning and employing a separate controller for speed tracking.
Submitted 22 May, 2024;
originally announced May 2024.
-
Salience-guided Ground Factor for Robust Localization of Delivery Robots in Complex Urban Environments
Authors:
Jooyong Park,
Jungwoo Lee,
Euncheol Choi,
Younggun Cho
Abstract:
In urban environments for delivery robots, particularly in areas such as campuses and towns, many custom features defy standard road semantic categorizations. Addressing this challenge, our paper introduces a method leveraging Salient Object Detection (SOD) to extract these unique features, employing them as pivotal factors for enhanced robot loop closure and localization. Traditional geometric feature-based localization is hampered by fluctuating illumination and appearance changes. Our preference for SOD over semantic segmentation sidesteps the intricacies of classifying a myriad of non-standardized urban features. To achieve consistent ground features, the Motion Compensate IPM (MC-IPM) technique is implemented, capitalizing on motion for distortion compensation and subsequently selecting the most pertinent salient ground features through moment computations. For thorough evaluation, we validated the saliency detection and localization performance in real urban scenarios. Project page: https://sites.google.com/view/salient-ground-feature/home.
Submitted 20 May, 2024;
originally announced May 2024.
-
Magnetic properties of the quasi-XY Shastry-Sutherland magnet Er$_2$Be$_2$SiO$_7$
Authors:
A. Brassington,
Q. Ma,
G. Sala,
A. I. Kolesnikov,
K. M. Taddei,
Y. Wu,
E. S. Choi,
H. Wang,
W. Xie,
J. Ma,
H. D. Zhou,
A. A. Aczel
Abstract:
Polycrystalline and single crystal samples of the insulating Shastry-Sutherland compound Er$_2$Be$_2$SiO$_7$ were synthesized via a solid-state reaction and the floating zone method respectively. The crystal structure, Er single ion anisotropy, zero-field magnetic ground state, and magnetic phase diagrams along high-symmetry crystallographic directions were investigated by bulk measurement techniques, x-ray and neutron diffraction, and neutron spectroscopy. We establish that Er$_2$Be$_2$SiO$_7$ crystallizes in a tetragonal space group with planes of orthogonal Er dimers and a strong preference for the Er moments to lie in the local plane perpendicular to each dimer bond. We also find that this system has a non-collinear ordered ground state in zero field with a transition temperature of 0.841 K consisting of antiferromagnetic dimers and in-plane moments. Finally, we mapped out the $H-T$ phase diagrams for Er$_2$Be$_2$SiO$_7$ along the directions $H \parallel$ [001], [100], and [110]. While an increasing in-plane field simply induces a phase transition to a field-polarized phase, we identify three metamagnetic transitions before the field-polarized phase is established in the $H \parallel$ [001] case. This complex behavior establishes insulating Er$_2$Be$_2$SiO$_7$ and other isostructural family members as promising candidates for uncovering exotic magnetic properties and phenomena that can be readily compared to theoretical predictions of the exactly soluble Shastry-Sutherland model.
Submitted 13 May, 2024;
originally announced May 2024.
-
Overview of the EHRSQL 2024 Shared Task on Reliable Text-to-SQL Modeling on Electronic Health Records
Authors:
Gyubok Lee,
Sunjun Kweon,
Seongsu Bae,
Edward Choi
Abstract:
Electronic Health Records (EHRs) are relational databases that store the entire medical histories of patients within hospitals. They record numerous aspects of patients' medical care, from hospital admission and diagnosis to treatment and discharge. While EHRs are vital sources of clinical data, exploring them beyond a predefined set of queries requires skills in query languages like SQL. To make information retrieval more accessible, one strategy is to build a question-answering system, possibly leveraging text-to-SQL models that can automatically translate natural language questions into corresponding SQL queries and use these queries to retrieve the answers. The EHRSQL 2024 shared task aims to advance and promote research in developing a question-answering system for EHRs using text-to-SQL modeling, capable of reliably providing requested answers to various healthcare professionals to improve their clinical work processes and satisfy their needs. Among more than 100 participants who applied to the shared task, eight teams were formed and completed the entire shared task requirement and demonstrated a wide range of methods to effectively solve this task. In this paper, we describe the task of reliable text-to-SQL modeling, the dataset, and the methods and results of the participants. We hope this shared task will spur further research and insights into developing reliable question-answering systems for EHRs.
Submitted 23 May, 2024; v1 submitted 4 May, 2024;
originally announced May 2024.
-
Towards Unbiased Evaluation of Detecting Unanswerable Questions in EHRSQL
Authors:
Yongjin Yang,
Sihyeon Kim,
SangMook Kim,
Gyubok Lee,
Se-Young Yun,
Edward Choi
Abstract:
Incorporating unanswerable questions into EHR QA systems is crucial for testing the trustworthiness of a system, as providing non-existent responses can mislead doctors in their diagnoses. The EHRSQL dataset stands out as a promising benchmark because it is the only dataset that incorporates unanswerable questions in the EHR QA system alongside practical questions. However, in this work, we identify a data bias in these unanswerable questions; they can often be discerned simply by filtering with specific N-gram patterns. Such biases jeopardize the authenticity and reliability of QA system evaluations. To tackle this problem, we propose a simple debiasing method of adjusting the split between the validation and test sets to neutralize the undue influence of N-gram filtering. By experimenting on the MIMIC-III dataset, we demonstrate both the existing data bias in EHRSQL and the effectiveness of our data split strategy in mitigating this bias.
Submitted 28 April, 2024;
originally announced May 2024.
-
Cubiquitous Lattices and Branched Covers bounding rational balls
Authors:
Erica Choi,
Nur Saglam,
Jonathan Simone,
Katerina Stuopis,
Hugo Zhou
Abstract:
Greene and Owens explore cubiquitous lattices as an obstruction to rational homology 3-spheres bounding rational homology 4-balls. The purpose of this article is to better understand which sublattices of $\mathbb{Z}^n$ are cubiquitous with the aim of effectively using their cubiquity obstruction. We develop a geometric obstruction (called the Wu obstruction) to cubiquity and use it as a tool to completely classify which sublattices with orthogonal bases are cubiquitous. We then apply this result to the double branched covers of alternating connected sums of torus links. Finally, we explore how the Wu obstruction can be used in conjunction with contractions to obstruct the cubiquity of infinite families of lattices.
Submitted 1 May, 2024;
originally announced May 2024.
-
EHRFL: Federated Learning Framework for Heterogeneous EHRs and Precision-guided Selection of Participating Clients
Authors:
Jiyoun Kim,
Junu Kim,
Kyunghoon Hur,
Edward Choi
Abstract:
In this study, we provide solutions to two practical yet overlooked scenarios in federated learning for electronic health records (EHRs): firstly, we introduce EHRFL, a framework that facilitates federated learning across healthcare institutions with distinct medical coding systems and database schemas using text-based linearization of EHRs. Secondly, we focus on a scenario where a single healthcare institution initiates federated learning to build a model tailored for itself, in which the number of clients must be optimized in order to reduce expenses incurred by the host. For selecting participating clients, we present a novel precision-based method, leveraging data latents to identify suitable participants for the institution. Our empirical results show that EHRFL effectively enables federated learning across hospitals with different EHR systems. Furthermore, our results demonstrate the efficacy of our precision-based method in selecting a reduced number of participating clients without compromising model performance, resulting in lower operational costs when constructing institution-specific models. We believe this work lays a foundation for the broader adoption of federated learning on EHRs.
Submitted 20 April, 2024;
originally announced April 2024.
-
DinAR: Augmenting Reality for Sustainable Dining
Authors:
MJ Johns,
Eunsol Sol Choi,
Derusha Baskaran
Abstract:
Sustainable food is among the many challenges associated with climate change. The resources required to grow or gather the food and the distance it travels to reach the consumer are two key factors of an ingredient's sustainability. Food that is grown locally and is currently "in-season" will have a lower carbon footprint, but when dining out these details unfortunately may not affect one's ordering preferences. We introduce DinAR as an immersive experience to make this information more accessible and to encourage better dining choices through friendly competition with a leaderboard of sustainability scores. Our study measures the effectiveness of immersive AR experiences on impacting consumer preferences towards sustainability.
Submitted 20 April, 2024;
originally announced April 2024.
-
AmbigDocs: Reasoning across Documents on Different Entities under the Same Name
Authors:
Yoonsang Lee,
Xi Ye,
Eunsol Choi
Abstract:
Different entities with the same name can be difficult to distinguish. Handling confusing entity mentions is a crucial skill for language models (LMs). For example, given the question "Where was Michael Jordan educated?" and a set of documents discussing different people named Michael Jordan, can LMs distinguish entity mentions to generate a cohesive answer to the question? To test this ability, we introduce a new benchmark, AmbigDocs. By leveraging Wikipedia's disambiguation pages, we identify a set of documents, belonging to different entities who share an ambiguous name. From these documents, we generate questions containing an ambiguous name and their corresponding sets of answers. Our analysis reveals that current state-of-the-art models often yield ambiguous answers or incorrectly merge information belonging to different entities. We establish an ontology categorizing four types of incomplete answers and automatic evaluation metrics to identify such categories. We lay the foundation for future work on reasoning across multiple documents with ambiguous entities.
Submitted 26 May, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Multi-Granularity Guided Fusion-in-Decoder
Authors:
Eunseong Choi,
Hyeri Lee,
Jongwuk Lee
Abstract:
In Open-domain Question Answering (ODQA), it is essential to discern relevant contexts as evidence and avoid spurious ones among retrieved results. The model architecture that uses concatenated multiple contexts in the decoding phase, i.e., Fusion-in-Decoder, demonstrates promising performance but generates incorrect outputs from seemingly plausible contexts. To address this problem, we propose the Multi-Granularity guided Fusion-in-Decoder (MGFiD), discerning evidence across multiple levels of granularity. Based on multi-task learning, MGFiD harmonizes passage re-ranking with sentence classification. It aggregates evident sentences into an anchor vector that instructs the decoder. Additionally, it improves decoding efficiency by reusing the results of passage re-ranking for passage pruning. Through our experiments, MGFiD outperforms existing models on the Natural Questions (NQ) and TriviaQA (TQA) datasets, highlighting the benefits of its multi-granularity solution.
Submitted 3 April, 2024;
originally announced April 2024.
-
Contextual AI Journaling: Integrating LLM and Time Series Behavioral Sensing Technology to Promote Self-Reflection and Well-being using the MindScape App
Authors:
Subigya Nepal,
Arvind Pillai,
William Campbell,
Talie Massachi,
Eunsol Soul Choi,
Orson Xu,
Joanna Kuc,
Jeremy Huckins,
Jason Holden,
Colin Depp,
Nicholas Jacobson,
Mary Czerwinski,
Eric Granholm,
Andrew T. Campbell
Abstract:
MindScape aims to study the benefits of integrating time series behavioral patterns (e.g., conversational engagement, sleep, location) with Large Language Models (LLMs) to create a new form of contextual AI journaling, promoting self-reflection and well-being. We argue that integrating behavioral sensing in LLMs will likely lead to a new frontier in AI. In this Late-Breaking Work paper, we discuss the MindScape contextual journal App design that uses LLMs and behavioral sensing to generate contextual and personalized journaling prompts crafted to encourage self-reflection and emotional development. We also discuss the MindScape study of college students based on a preliminary user study and our upcoming study to assess the effectiveness of contextual AI journaling in promoting better well-being on college campuses. MindScape represents a new application class that embeds behavioral intelligence in AI.
Submitted 30 March, 2024;
originally announced April 2024.
-
TrustSQL: Benchmarking Text-to-SQL Reliability with Penalty-Based Scoring
Authors:
Gyubok Lee,
Woosog Chay,
Seonhee Cho,
Edward Choi
Abstract:
Text-to-SQL enables users to interact with databases using natural language, simplifying the retrieval and synthesis of information. Despite the remarkable success of large language models (LLMs) in translating natural language questions into SQL queries, widespread deployment remains limited due to two primary challenges. First, the effective use of text-to-SQL models depends on users' understanding of the model's capabilities, that is, the scope of questions the model can correctly answer. Second, the absence of abstention mechanisms can lead to incorrect SQL generation going unnoticed, thereby undermining trust in the model's output. To enable wider deployment, it is crucial to address these challenges in model design and enhance model evaluation to build trust in the model's output. To this end, we introduce TrustSQL, a novel comprehensive benchmark designed to evaluate text-to-SQL reliability, defined as a model's ability to correctly handle any type of input question by generating correct SQL queries for feasible questions and abstaining from generating infeasible ones (e.g., due to schema incompatibility or functionalities beyond SQL). We evaluate existing methods using a novel penalty-based scoring metric with two modeling approaches: (1) pipeline-based methods combining SQL generators with infeasible question detectors and SQL error detectors for abstention; and (2) unified methods using a single model for the entire task. Our experimental results reveal that achieving high scores under severe penalties requires significant effort and provide a new perspective on developing text-to-SQL models for safer deployment. TrustSQL is available at https://github.com/glee4810/TrustSQL.
Submitted 2 July, 2024; v1 submitted 23 March, 2024;
originally announced March 2024.
-
TT-BLIP: Enhancing Fake News Detection Using BLIP and Tri-Transformer
Authors:
Eunjee Choi,
Jong-Kook Kim
Abstract:
Detecting fake news has received a lot of attention. Many previous methods concatenate independently encoded unimodal data, ignoring the benefits of integrated multimodal information. Also, the absence of specialized feature extraction for text and images further limits these methods. This paper introduces an end-to-end model called TT-BLIP that applies the bootstrapping language-image pretraining for unified vision-language understanding and generation (BLIP) for three types of information: BERT and BLIP$_{\rm Txt}$ for text, ResNet and BLIP$_{\rm Img}$ for images, and bidirectional BLIP encoders for multimodal information. The Multimodal Tri-Transformer fuses tri-modal features using three types of multi-head attention mechanisms, ensuring integrated modalities for enhanced representations and improved multimodal data analysis. The experiments are performed using two fake news datasets, Weibo and Gossipcop. The results indicate TT-BLIP outperforms the state-of-the-art models.
Submitted 19 March, 2024;
originally announced March 2024.
-
On the Consideration of AI Openness: Can Good Intent Be Abused?
Authors:
Yeeun Kim,
Eunkyung Choi,
Hyunjun Kim,
Hongseok Oh,
Hyunseo Shin,
Wonseok Hwang
Abstract:
Openness is critical for the advancement of science. In particular, recent rapid progress in AI has been made possible only by various open-source models, datasets, and libraries. However, this openness also means that technologies can be freely used for socially harmful purposes. Can open-source models or datasets be used for malicious purposes? If so, how easy is it to adapt technology for such goals? Here, we conduct a case study in the legal domain, a realm where individual decisions can have profound social consequences. To this end, we build EVE, a dataset consisting of 200 examples of questions and corresponding answers about criminal activities based on 200 Korean precedents. We found that a widely accepted open-source LLM, which initially refuses to answer unethical questions, can be easily tuned with EVE to provide unethical and informative answers about criminal activities. This implies that although open-source technologies contribute to scientific progress, some care must be taken to mitigate possible malicious use cases. Warning: This paper contains contents that some may find unethical.
Submitted 11 March, 2024;
originally announced March 2024.
-
KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions
Authors:
Fangyuan Xu,
Kyle Lo,
Luca Soldaini,
Bailey Kuehl,
Eunsol Choi,
David Wadden
Abstract:
Large language models (LLMs) adapted to follow user instructions are now widely deployed as conversational agents. In this work, we examine one increasingly common instruction-following task: providing writing assistance to compose a long-form answer. To evaluate the capabilities of current LLMs on this task, we construct KIWI, a dataset of knowledge-intensive writing instructions in the scientific domain. Given a research question, an initial model-generated answer and a set of relevant papers, an expert annotator iteratively issues instructions for the model to revise and improve its answer. We collect 1,260 interaction turns from 234 interaction sessions with three state-of-the-art LLMs. Each turn includes a user instruction, a model response, and a human evaluation of the model response. Through a detailed analysis of the collected responses, we find that all models struggle to incorporate new information into an existing answer, and to perform precise and unambiguous edits. Further, we find that models struggle to judge whether their outputs successfully followed user instructions, with accuracy at least 10 points short of human agreement. Our findings indicate that KIWI will be a valuable resource to measure progress and improve LLMs' instruction-following capabilities for knowledge intensive writing tasks.
Submitted 6 March, 2024;
originally announced March 2024.
-
Recent Advances, Applications, and Open Challenges in Machine Learning for Health: Reflections from Research Roundtables at ML4H 2023 Symposium
Authors:
Hyewon Jeong,
Sarah Jabbour,
Yuzhe Yang,
Rahul Thapta,
Hussein Mozannar,
William Jongwon Han,
Nikita Mehandru,
Michael Wornow,
Vladislav Lialin,
Xin Liu,
Alejandro Lozano,
Jiacheng Zhu,
Rafal Dariusz Kocielnik,
Keith Harrigian,
Haoran Zhang,
Edward Lee,
Milos Vukadinovic,
Aparna Balagopalan,
Vincent Jeanselme,
Katherine Matton,
Ilker Demirel,
Jason Fries,
Parisa Rashidi,
Brett Beaulieu-Jones,
Xuhai Orson Xu
, et al. (18 additional authors not shown)
Abstract:
The third ML4H symposium was held in person on December 10, 2023, in New Orleans, Louisiana, USA. The symposium included research roundtable sessions to foster discussions between participants and senior researchers on timely and relevant topics for the ML4H community. Encouraged by the successful virtual roundtables in the previous year, we organized eleven in-person roundtables and four virtual roundtables at ML4H 2022. The organization of the research roundtables at the conference involved 17 Senior Chairs and 19 Junior Chairs across 11 tables. Each roundtable session included invited senior chairs (with substantial experience in the field), junior chairs (responsible for facilitating the discussion), and attendees from diverse backgrounds with interest in the session's topic. Herein we detail the organization process and compile takeaways from these roundtable discussions, including recent advances, applications, and open challenges for each topic. We conclude with a summary and lessons learned across all roundtables. This document serves as a comprehensive review paper, summarizing the recent advancements in machine learning for healthcare as contributed by foremost researchers in the field.
Submitted 5 April, 2024; v1 submitted 3 March, 2024;
originally announced March 2024.
-
KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations
Authors:
Sunjun Kweon,
Byungjin Choi,
Minkyu Kim,
Rae Woong Park,
Edward Choi
Abstract:
We introduce KorMedMCQA, the first Korean multiple-choice question answering (MCQA) benchmark derived from Korean healthcare professional licensing examinations, covering the years 2012 to 2023. This dataset consists of a selection of questions from the license examinations for doctors, nurses, and pharmacists, featuring a diverse array of subjects. We conduct baseline experiments on various large language models, including proprietary/open-source, multilingual/Korean-additional pretrained, and clinical context pretrained models, highlighting the potential for further enhancements. We make our data publicly available on HuggingFace (https://huggingface.co/datasets/sean0042/KorMedMCQA) and provide an evaluation script via LM-Harness, inviting further exploration and advancement in Korean healthcare environments.
Submitted 5 March, 2024; v1 submitted 3 March, 2024;
originally announced March 2024.
-
EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries
Authors:
Sunjun Kweon,
Jiyoun Kim,
Heeyoung Kwak,
Dongchul Cha,
Hangyul Yoon,
Kwanghyun Kim,
Jeewon Yang,
Seunghyun Won,
Edward Choi
Abstract:
Discharge summaries in Electronic Health Records (EHRs) are crucial for clinical decision-making, but their length and complexity make information extraction challenging, especially when dealing with accumulated summaries across multiple patient admissions. Large Language Models (LLMs) show promise in addressing this challenge by efficiently analyzing vast and complex data. Existing benchmarks, however, fall short in properly evaluating LLMs' capabilities in this context, as they typically focus on single-note information or limited topics, failing to reflect the real-world inquiries required by clinicians. To bridge this gap, we introduce EHRNoteQA, a novel benchmark built on the MIMIC-IV EHR, comprising 962 different QA pairs each linked to distinct patients' discharge summaries. Every QA pair is initially generated using GPT-4 and then manually reviewed and refined by three clinicians to ensure clinical relevance. EHRNoteQA includes questions that require information across multiple discharge summaries and covers eight diverse topics, mirroring the complexity and diversity of real clinical inquiries. We offer EHRNoteQA in two formats: open-ended and multi-choice question answering, and propose a reliable evaluation method for each. We evaluate 27 LLMs using EHRNoteQA and examine various factors affecting the model performance (e.g., the length and number of discharge summaries). Furthermore, to validate EHRNoteQA as a reliable proxy for expert evaluations in clinical practice, we measure the correlation between the LLM performance on EHRNoteQA and the LLM performance manually evaluated by clinicians. Results show that LLM performance on EHRNoteQA has a higher correlation with clinician-evaluated performance (Spearman: 0.78, Kendall: 0.62) compared to other benchmarks, demonstrating its practical relevance in evaluating LLMs in clinical settings.
Submitted 27 June, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
-
Phase Diagram and Spectroscopic Evidence of Supersolids in Quantum Ising Magnet K$_2$Co(SeO$_3$)$_2$
Authors:
Tong Chen,
Alireza Ghasemi,
Junyi Zhang,
Liyu Shi,
Zhenisbek Tagay,
Lei Chen,
Eun-Sang Choi,
Marcelo Jaime,
Minseong Lee,
Yiqing Hao,
Huibo Cao,
Barry Winn,
Ruidan Zhong,
Xianghan Xu,
N. P. Armitage,
Robert Cava,
Collin Broholm
Abstract:
A supersolid is a quantum entangled state of matter that combines features of both superfluids and solids. Despite predictions of its analog in quantum magnets, the experimental realization was lacking until recent claims in triangular-lattice compounds. Here, we report the magnetic phase diagram and neutron scattering for a spin-1/2 triangular-lattice antiferromagnet, K$_2$Co(SeO$_3$)$_2$. In zero field, neutron spectroscopy reveals the gradual development of a $\sqrt{3} \times \sqrt{3}$ magnetic order associated with $Z_3$ symmetry breaking for temperatures 5 K < T < 15 K. Below 5 K, the emergence of a Goldstone mode from low-energy continuum scattering suggests that the system enters a supersolid phase characterized by the breaking of both $Z_3$ and spin rotational U(1) symmetry. In c-axis-oriented magnetic fields 1.1 T < B < 21 T, a prominent 1/3 magnetization plateau phase emerges, accompanied by a distinct high-field supersolid phase (18 T < B < 21 T). From the coherent spin wave excitations in the 1/3 magnetized plateau phase, we infer the spin Hamiltonian, which features nearest neighbor interactions with $J_z$ = 2.98(2) meV and $J_{\rm perp}$ = 0.21(3) meV. Our work demonstrates that K$_2$Co(SeO$_3$)$_2$ is a spectacular example of a spin-1/2 triangular-lattice quantum Ising antiferromagnet, documents its magnetic phase diagram highlighting two supersolid phases, and provides spectroscopic evidence of zero-field supersolidity.
Submitted 24 February, 2024;
originally announced February 2024.
-
ListT5: Listwise Reranking with Fusion-in-Decoder Improves Zero-shot Retrieval
Authors:
Soyoung Yoon,
Eunbi Choi,
Jiyeon Kim,
Hyeongu Yun,
Yireun Kim,
Seung-won Hwang
Abstract:
We propose ListT5, a novel reranking approach based on Fusion-in-Decoder (FiD) that handles multiple candidate passages at both train and inference time. We also introduce an efficient inference framework for listwise ranking based on m-ary tournament sort with output caching. We evaluate and compare our model on the BEIR benchmark for zero-shot retrieval task, demonstrating that ListT5 (1) outperforms the state-of-the-art RankT5 baseline with a notable +1.3 gain in the average NDCG@10 score, (2) has an efficiency comparable to pointwise ranking models and surpasses the efficiency of previous listwise ranking models, and (3) overcomes the lost-in-the-middle problem of previous listwise rerankers. Our code, model checkpoints, and the evaluation framework are fully open-sourced at https://github.com/soyoung97/ListT5.
Submitted 6 June, 2024; v1 submitted 24 February, 2024;
originally announced February 2024.
-
Multimodal Transformer With a Low-Computational-Cost Guarantee
Authors:
Sungjin Park,
Edward Choi
Abstract:
Transformer-based models have significantly improved performance across a range of multimodal understanding tasks, such as visual question answering and action recognition. However, multimodal Transformers significantly suffer from a quadratic complexity of the multi-head attention with the input sequence length, especially as the number of modalities increases. To address this, we introduce Low-Cost Multimodal Transformer (LoCoMT), a novel multimodal attention mechanism that aims to reduce computational cost during training and inference with minimal performance loss. Specifically, by assigning different multimodal attention patterns to each attention head, LoCoMT can flexibly control multimodal signals and theoretically ensures a reduced computational cost compared to existing multimodal Transformer variants. Experimental results on two multimodal datasets, namely Audioset and MedVidCL, demonstrate that LoCoMT not only reduces GFLOPs but also matches or even outperforms established models.
Submitted 23 February, 2024;
originally announced February 2024.
-
KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge
Authors:
Jiyoung Lee,
Minwoo Kim,
Seungho Kim,
Junghwan Kim,
Seunghyun Won,
Hwaran Lee,
Edward Choi
Abstract:
For Large Language Models (LLMs) to be effectively deployed in a specific country, they must possess an understanding of the nation's culture and basic knowledge. To this end, we introduce National Alignment, which measures an alignment between an LLM and a targeted country from two aspects: social value alignment and common knowledge alignment. Social value alignment evaluates how well the model understands nation-specific social values, while common knowledge alignment examines how well the model captures basic knowledge related to the nation. We constructed KorNAT, the first benchmark that measures national alignment with South Korea. For the social value dataset, we obtained ground truth labels from a large-scale survey involving 6,174 unique Korean participants. For the common knowledge dataset, we constructed samples based on Korean textbooks and GED reference materials. KorNAT contains 4K and 6K multiple-choice questions for social value and common knowledge, respectively. Our dataset creation process is meticulously designed and based on statistical sampling theory and was refined through multiple rounds of human review. The experiment results of seven LLMs reveal that only a few models met our reference score, indicating a potential for further enhancement. KorNAT has received government approval after passing an assessment conducted by a government-affiliated organization dedicated to evaluating dataset quality. Samples and detailed evaluation protocols of our dataset can be found in https://huggingface.co/datasets/jiyounglee0523/KorNAT .
Submitted 5 June, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Magnetic field-temperature phase diagram of spin-1/2 triangular lattice antiferromagnet KYbSe$_2$
Authors:
Sangyun Lee,
Andrew J. Woods,
Minseong Lee,
Shengzhi Zhang,
Eun Sang Choi,
A. O. Scheie,
D. A. Tennant,
J. Xing,
A. S. Sefat,
R. Movshovich
Abstract:
A quantum spin liquid (QSL) is a state of matter characterized by fractionalized quasiparticle excitations, quantum entanglement, and a lack of long-range magnetic order. However, QSLs have evaded definitive experimental observation. Several Yb$^{3+}$-based triangular lattice antiferromagnets with effective $S$ = $\frac{1}{2}$ have been suggested to stabilize the QSL state as the ground state. Here, we build a comprehensive magnetic field-temperature phase diagram of a high-quality single crystalline KYbSe$_2$ via heat capacity and magnetocaloric effect down to 30 mK with magnetic field applied along the $a$-axis. At zero magnetic field, the heat capacity shows magnetic long-range order at $T_N$ = 0.29 K, where the system enters the 120-degree ordered state, consistent with neutron scattering studies. Analysis of the low-temperature ($T$) specific heat ($C$) at zero magnetic field indicates linear $T$-dependence of $C/T$ and a broad hump of $C/T$ in the proximate QSL region above $T_N$. By applying magnetic field, we observe the up-up-down phase with 1/3 magnetization plateau and oblique phases, in addition to two new phases. These observations strongly indicate that while KYbSe$_2$ closely exhibits characteristics resembling an ideal triangular lattice, deviations may exist, such as the effect of the next-nearest-neighbor exchange interaction, calling for careful consideration for spin Hamiltonian modeling. Further investigations into tuning parameters, such as chemical pressure, could potentially induce an intriguing QSL phase in the material.
Submitted 9 February, 2024;
originally announced February 2024.
-
FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs
Authors:
Eun Cheol Choi,
Emilio Ferrara
Abstract:
Our society is facing rampant misinformation harming public health and trust. To address the societal challenge, we introduce FACT-GPT, a system leveraging Large Language Models (LLMs) to automate the claim matching stage of fact-checking. FACT-GPT, trained on a synthetic dataset, identifies social media content that aligns with, contradicts, or is irrelevant to previously debunked claims. Our evaluation shows that our specialized LLMs can match the accuracy of larger models in identifying related claims, closely mirroring human judgment. This research provides an automated solution for efficient claim matching, demonstrates the potential of LLMs in supporting fact-checkers, and offers valuable resources for further research in the field.
Submitted 8 February, 2024;
originally announced February 2024.
-
Self-Supervised Contrastive Learning for Long-term Forecasting
Authors:
Junwoo Park,
Daehoon Gwak,
Jaegul Choo,
Edward Choi
Abstract:
Long-term forecasting presents unique challenges due to the time and memory complexity of handling long sequences. Existing methods, which rely on sliding windows to process long sequences, struggle to effectively capture long-term variations that are partially caught within the short window (i.e., outer-window variations). In this paper, we introduce a novel approach that overcomes this limitation by employing contrastive learning and enhanced decomposition architecture, specifically designed to focus on long-term variations. To this end, our contrastive loss incorporates global autocorrelation held in the whole time series, which facilitates the construction of positive and negative pairs in a self-supervised manner. When combined with our decomposition networks, our contrastive learning significantly improves long-term forecasting performance. Extensive experiments demonstrate that our approach outperforms 14 baseline models in multiple experiments over nine long-term benchmarks, especially in challenging scenarios that require a significantly long output for forecasting. Source code is available at https://github.com/junwoopark92/Self-Supervised-Contrastive-Forecsating.
Submitted 24 March, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
BAT: Learning to Reason about Spatial Sounds with Large Language Models
Authors:
Zhisheng Zheng,
Puyuan Peng,
Ziyang Ma,
Xie Chen,
Eunsol Choi,
David Harwath
Abstract:
Spatial sound reasoning is a fundamental human skill, enabling us to navigate and interpret our surroundings based on sound. In this paper we present BAT, which combines the spatial sound perception ability of a binaural acoustic scene analysis model with the natural language reasoning capabilities of a large language model (LLM) to replicate this innate ability. To address the lack of existing datasets of in-the-wild spatial sounds, we synthesized a binaural audio dataset using AudioSet and SoundSpaces 2.0. Next, we developed SpatialSoundQA, a spatial sound-based question-answering dataset, offering a range of QA tasks that train BAT in various aspects of spatial sound perception and reasoning. The acoustic front end encoder of BAT is a novel spatial audio encoder named Spatial Audio Spectrogram Transformer, or Spatial-AST, which by itself achieves strong performance across sound event detection, spatial localization, and distance estimation. By integrating Spatial-AST with LLaMA-2 7B model, BAT transcends standard Sound Event Localization and Detection (SELD) tasks, enabling the model to reason about the relationships between the sounds in its environment. Our experiments demonstrate BAT's superior performance on both spatial sound perception and reasoning, showcasing the immense potential of LLMs in navigating and interpreting complex spatial audio environments.
Submitted 25 May, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Eco-driving under localization uncertainty for connected vehicles on Urban roads: Data-driven approach and Experiment verification
Authors:
Eunhyek Joa,
Eric Yongkeun Choi,
Francesco Borrelli
Abstract:
This paper addresses the eco-driving problem for connected vehicles on urban roads, considering localization uncertainty. Eco-driving is defined as longitudinal speed planning and control on roads with a sequence of traffic lights. We solve the problem by using a data-driven model predictive control (MPC) strategy. This approach involves learning a cost-to-go function and constraints from state-input data. The cost-to-go function represents the remaining energy-to-spend from the given state, and the constraints ensure that the controlled vehicle passes the upcoming traffic light in time while obeying traffic laws. The resulting convex optimization problem has a short horizon and is amenable to real-time implementation. We demonstrate the effectiveness of our approach through real-world vehicle experiments. In these experiments, our method achieves a $12\%$ improvement in energy efficiency compared to traditional approaches, which plan longitudinal speed by solving a long-horizon optimal control problem and track the planned speed using another controller.
Submitted 4 April, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Learning under Label Noise through Few-Shot Human-in-the-Loop Refinement
Authors:
Aaqib Saeed,
Dimitris Spathis,
Jungwoo Oh,
Edward Choi,
Ali Etemad
Abstract:
Wearable technologies enable continuous monitoring of various health metrics, such as physical activity, heart rate, sleep, and stress levels. A key challenge with wearable data is obtaining quality labels. Unlike modalities like video where the videos themselves can be effectively used to label objects or events, wearable data do not contain obvious cues about the physical manifestation of the users and usually require rich metadata. As a result, label noise can become an increasingly thorny issue when labeling such data. In this paper, we propose a novel solution to address noisy label learning, entitled Few-Shot Human-in-the-Loop Refinement (FHLR). Our method initially learns a seed model using weak labels. Next, it fine-tunes the seed model using a handful of expert corrections. Finally, it achieves better generalizability and robustness by merging the seed and fine-tuned models via weighted parameter averaging. We evaluate our approach on four challenging tasks and datasets, and compare it against eight competitive baselines designed to deal with noisy labels. We show that FHLR achieves significantly better performance when learning from noisy labels and achieves state-of-the-art by a large margin, with up to 19% accuracy improvement under symmetric and asymmetric noise. Notably, we find that FHLR is particularly robust to increased label noise, unlike prior works that suffer from severe performance degradation. Our work not only achieves better generalization in high-stakes health sensing benchmarks but also sheds light on how noise affects commonly-used models.
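The final merging step described above, weighted parameter averaging of the seed and fine-tuned models, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `merge_parameters`, the mixing weight `alpha`, and the representation of a model as a dictionary of flat float lists are all assumptions made for illustration.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def merge_parameters(seed: Params, tuned: Params, alpha: float = 0.5) -> Params:
    """Return (1 - alpha) * seed + alpha * tuned for every named parameter.

    Hypothetical helper: `alpha` is an assumed mixing weight, not a value
    taken from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if seed.keys() != tuned.keys():
        raise ValueError("both models must share the same parameter names")

    merged: Params = {}
    for name in seed:
        if len(seed[name]) != len(tuned[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise convex combination of the two parameter vectors.
        merged[name] = [
            (1.0 - alpha) * s + alpha * t
            for s, t in zip(seed[name], tuned[name])
        ]
    return merged
```

With `alpha = 0` the merged model equals the seed model, and with `alpha = 1` it equals the fine-tuned model; intermediate values interpolate between the two.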
Submitted 25 January, 2024;
originally announced January 2024.
-
Anomalous Proximitized Transport in Metal/Quantum Magnet Heterostructure $\rm{Bi_{2}Ir_{2}O_{7}/Yb_{2}Ti_{2}O_{7}}$
Authors:
Chengkun Xing,
Shu Zhang,
Weiliang Yao,
Dapeng Cui,
Qing Huang,
Junyi Yang,
Shashi Pandey,
Dongliang Gong,
Lukas Horák,
Yan Xin,
Eun Sang Choi,
Yang Zhang,
Haidong Zhou,
Jian Liu
Abstract:
Fluctuations of quantum spins play a crucial role in the emergence of exotic magnetic phases and excitations. The lack of the charge degree of freedom in insulating quantum magnets, however, precludes such fluctuations from mediating electronic transport. Here we show that the quantum fluctuations of a localized frustrated magnet induce strong proximitized charge transport of the conduction electrons in a synthetic heterostructure comprising an epitaxial $\rm{Bi_{2}Ir_{2}O_{7}}$ ultrathin film on the single crystal of $\rm{Yb_{2}Ti_{2}O_{7}}$. The proximity effects are evidenced by the scaling behavior of the $\rm{Bi_{2}Ir_{2}O_{7}}$ resistance in correspondence with the dynamic scaling of the dynamic spin correlation function of $\rm{Yb_{2}Ti_{2}O_{7}}$, which is a result of quantum fluctuations near a multi-phase quantum critical point. The proximitized transport in $\rm{Bi_{2}Ir_{2}O_{7}}$ can be effectively tuned by magnetic field through suppressing the quantum spin fluctuations as well as inducing transitions via magnetic anisotropy in $\rm{Yb_{2}Ti_{2}O_{7}}$. Our work establishes a new pathway for harnessing quantum spin fluctuations in magnetic insulators with electric transport, offering exciting prospects for potential applications in the realm of quantum spintronics.
Submitted 16 January, 2024;
originally announced January 2024.
-
Limitations of Data-Driven Spectral Reconstruction -- Optics-Aware Analysis and Mitigation
Authors:
Qiang Fu,
Matheus Souza,
Eunsue Choi,
Suhyun Shin,
Seung-Hwan Baek,
Wolfgang Heidrich
Abstract:
Hyperspectral imaging empowers machine vision systems with the distinct capability of identifying materials through recording their spectral signatures. Recent efforts in data-driven spectral reconstruction aim at extracting spectral information from RGB images captured by cost-effective RGB cameras, instead of dedicated hardware.
In this paper we systematically analyze the performance of such methods, evaluating both the practical limitations with respect to current datasets and overfitting, as well as fundamental limitations with respect to the nature of the information encoded in the RGB images, and the dependency of this information on the optical system of the camera.
We find that current models are not robust under slight variations, e.g., in noise level or compression of the RGB file. Without modeling underrepresented spectral content, existing datasets and the models trained on them are limited in their ability to cope with challenging metameric colors. To mitigate this issue, we propose to exploit the combination of metameric data augmentation and optical lens aberrations to improve the encoding of the metameric information into the RGB image, which paves the road towards higher performing spectral imaging and reconstruction approaches.
Submitted 2 April, 2024; v1 submitted 8 January, 2024;
originally announced January 2024.
-
Generic magnetic field dependence of thermal conductivity in magnetic insulators via hybridization of acoustic phonons and spin-flip excitations
Authors:
Christopher A. Pocs,
Ian A. Leahy,
Jie Xing,
Eun Sang Choi,
Athena S. Sefat,
Michael Hermele,
Minhyea Lee
Abstract:
Magnetic insulators provide excellent playgrounds to realize a range of exciting spin models, some of which predict exotic spin ground states, and thermal transport properties have been taking center stage in probing the spin excitations. Despite the fact that acoustic phonons make the major contribution to heat conduction in a crystalline system, their interplay with magnetic excitations is often viewed as peripheral to the physics of interest, for instance as an inconvenient source of scattering or decoherence. Here, we present a comprehensive study on the longitudinal magneto-thermal transport in a paramagnetic effective spin-1/2 magnetic insulator CsYbSe$_2$. We introduce a minimal model requiring only Zeeman splitting and magnetoelastic coupling, and use it to argue that hybridized excitations -- formed from acoustic phonons and localized spin-flip-excitations across the Zeeman gap of the crystal electric field ground doublet -- are responsible for a striking non-monotonic field dependence of longitudinal thermal conductivity. Beyond highlighting a starring role for phonons, our results raise the prospect of universal magneto-thermal transport phenomena in magnetic insulators that originate from simple features shared across many systems.
Submitted 26 January, 2024; v1 submitted 2 January, 2024;
originally announced January 2024.
-
Open Domain Knowledge Extraction for Knowledge Graphs
Authors:
Kun Qian,
Anton Belyi,
Fei Wu,
Samira Khorshidi,
Azadeh Nikfarjam,
Rahul Khot,
Yisi Sang,
Katherine Luna,
Xianqi Chu,
Eric Choi,
Yash Govind,
Chloe Seivwright,
Yiwen Sun,
Ahmed Fakhry,
Theo Rekatsinas,
Ihab Ilyas,
Xiaoguang Qi,
Yunyao Li
Abstract:
The quality of a knowledge graph directly impacts the quality of downstream applications (e.g. the number of answerable questions using the graph). One ongoing challenge when building a knowledge graph is to ensure completeness and freshness of the graph's entities and facts. In this paper, we introduce ODKE, a scalable and extensible framework that sources high-quality entities and facts from open web at scale. ODKE utilizes a wide range of extraction models and supports both streaming and batch processing at different latency. We reflect on the challenges and design decisions made and share lessons learned when building and deploying ODKE to grow an industry-scale open domain knowledge graph.
Submitted 30 October, 2023;
originally announced December 2023.
-
The Origins of Gas Accreted by Supermassive Black Holes: the Importance of Recycled Gas
Authors:
Ena Choi,
Rachel S. Somerville,
Jeremiah P. Ostriker,
Michaela Hirschmann,
Thorsten Naab
Abstract:
We investigate the fueling mechanisms of supermassive black holes (SMBHs) by analyzing ten zoom-in cosmological simulations of massive galaxies, with stellar masses $10^{11-12} M_{\odot}$ and SMBH masses $10^{8.9-9.7}$ at $z=0$ and featuring various major and minor merger events. By tracing the gas history in these simulations, we categorize the gas accreted by the central SMBHs based on its origin. Gas that belonged to a different galaxy before accretion onto the BH is labeled as (1) ``external," while smoothly accreted cosmic gas is classified as (2) ``smooth." Gas produced within the primary halo through stellar evolution and subsequently accreted by the SMBH is classified as (3) ``recycled." Our analysis, which included stellar feedback, reveals that the primary fuel source for SMBHs is the recycled gas from dying stars. This recycled gas from stars in the inner region of the galaxy readily collapses toward the center, triggering starbursts, and simultaneously fueling the SMBH. Galaxy mergers also play a crucial role in fueling SMBHs in massive galaxies as SMBHs in massive halos tend to accrete a higher fraction of external gas from mergers compared to smoothly accreted gas. However, on average, it takes approximately 1.85 Gyr for external gas to enter the main galaxy and accrete onto the SMBH. Considering the presence of various other gas triggers for AGN activity alongside this time delay, the association between AGN and mergers may not always be obvious.
Submitted 19 February, 2024; v1 submitted 13 December, 2023;
originally announced December 2023.
-
Spectral and Polarization Vision: Spectro-polarimetric Real-world Dataset
Authors:
Yujin Jeon,
Eunsue Choi,
Youngchan Kim,
Yunseong Moon,
Khalid Omer,
Felix Heide,
Seung-Hwan Baek
Abstract:
Image datasets are essential not only in validating existing methods in computer vision but also in developing new methods. Most existing image datasets focus on trichromatic intensity images to mimic human vision. However, polarization and spectrum, the wave properties of light that animals in harsh environments and with limited brain capacity often rely on, remain underrepresented in existing datasets. Although spectro-polarimetric datasets exist, these datasets have insufficient object diversity, limited illumination conditions, linear-only polarization data, and inadequate image count. Here, we introduce two spectro-polarimetric datasets: trichromatic Stokes images and hyperspectral Stokes images. These novel datasets encompass both linear and circular polarization; they introduce multiple spectral channels; and they feature a broad selection of real-world scenes. With our dataset in hand, we analyze the spectro-polarimetric image statistics, develop efficient representations of such high-dimensional data, and evaluate spectral dependency of shape-from-polarization methods. As such, the proposed dataset promises a foundation for data-driven spectro-polarimetric imaging and vision research. Dataset and code will be publicly available.
Submitted 30 November, 2023; v1 submitted 29 November, 2023;
originally announced November 2023.
-
Engineering Anomalously Large Electron Transport in Topological Semimetals
Authors:
Vincent M. Plisson,
Xiaohan Yao,
Yaxian Wang,
George Varnavides,
Alexey Suslov,
David Graf,
Eun Sang Choi,
Hung-Yu Yang,
Yiping Wang,
Marisa Romanelli,
Grant McNamara,
Birender Singh,
Gregory T. McCandless,
Julia Y. Chan,
Prineha Narang,
Fazel Tafti,
Kenneth S. Burch
Abstract:
Anomalous transport of topological semimetals has generated significant interest for applications in optoelectronics, nanoscale devices, and interconnects. Understanding the origin of novel transport is crucial to engineering the desired material properties, yet their orders of magnitude higher transport than single-particle mobilities remain unexplained. This work demonstrates that the dramatic mobility enhancements result from phonons primarily returning momentum to electrons due to phonon-electron dominating over phonon-phonon scattering. Proving this idea, proposed by Peierls in 1932, requires tuning electron and phonon dispersions without changing symmetry, topology, or disorder. This is achieved by combining de Haas-van Alphen (dHvA), electron transport, Raman scattering, and first-principles calculations in the topological semimetals MX$_2$ (M=Nb, Ta and X=Ge, Si). Replacing Ge with Si brings the transport mobilities from an order of magnitude larger than single-particle ones to nearly balanced. This occurs without changing the crystal structure or topology and with small differences in disorder or Fermi surface. Simultaneously, Raman scattering and first-principles calculations establish phonon-electron dominated scattering only in the MGe$_2$ compounds. Thus, this study proves that phonon-drag is crucial to the transport properties of topological semimetals and provides insight to further engineer these materials.
Submitted 26 November, 2023;
originally announced November 2023.
-
Crafting In-context Examples according to LMs' Parametric Knowledge
Authors:
Yoonsang Lee,
Pranav Atreya,
Xi Ye,
Eunsol Choi
Abstract:
In-context learning can improve the performances of knowledge-rich tasks such as question answering. In such scenarios, in-context examples trigger a language model (LM) to surface information stored in its parametric knowledge. We study how to better construct in-context example sets, based on whether the model is aware of the in-context examples. We identify 'known' examples, where models can correctly answer from their parametric knowledge, and 'unknown' ones. Our experiments show that prompting with 'unknown' examples decreases the performance, potentially as it encourages hallucination rather than searching for its parametric knowledge. Constructing an in-context example set that presents both known and unknown information performs the best across diverse settings. We perform analysis on three multi-answer question answering datasets, which allows us to further study answer set ordering strategies based on the LM's knowledge of each answer. Together, our study sheds light on how to best construct in-context example sets for knowledge-rich tasks.
Submitted 3 April, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Clarify When Necessary: Resolving Ambiguity Through Interaction with LMs
Authors:
Michael J. Q. Zhang,
Eunsol Choi
Abstract:
Resolving ambiguities through interaction is a hallmark of natural language, and modeling this behavior is a core challenge in crafting AI assistants. In this work, we study such behavior in LMs by proposing a task-agnostic framework for resolving ambiguity by asking users clarifying questions. Our framework breaks down this objective into three subtasks: (1) determining when clarification is needed, (2) determining what clarifying question to ask, and (3) responding accurately with the new information gathered through clarification. We evaluate systems across three NLP applications: question answering, machine translation and natural language inference. For the first subtask, we present a novel uncertainty estimation approach, intent-sim, that determines the utility of querying for clarification by estimating the entropy over user intents. Our method consistently outperforms existing uncertainty estimation approaches at identifying predictions that will benefit from clarification. When only allowed to ask for clarification on 10% of examples, our system is able to double the performance gains over randomly selecting examples to clarify. Furthermore, we find that intent-sim is robust, demonstrating improvements across a wide range of NLP tasks and LMs. Together, our work lays the foundation for studying clarifying interactions with LMs.
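To make the quantity "entropy over user intents" concrete, the sketch below computes the Shannon entropy of an empirical distribution over intent labels. It is only an illustration of the underlying measure, not the intent-sim procedure itself: the function name `intent_entropy` and the assumption that intents arrive as a list of already-grouped labels are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def intent_entropy(intents: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of intents.

    Hypothetical helper: assumes each sampled interpretation of the input
    has already been mapped to a discrete intent label.
    """
    if not intents:
        raise ValueError("need at least one sampled intent")
    total = len(intents)
    counts = Counter(intents)
    # Sum p * log2(1 / p) over the observed intent labels.
    return sum(
        (c / total) * math.log2(total / c) for c in counts.values()
    )
```

Under this reading, an input whose sampled intents all agree has zero entropy and little to gain from a clarifying question, while an input split evenly between two intents has an entropy of one bit.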
Submitted 15 November, 2023;
originally announced November 2023.