
Molecule Language Model with Augmented Pairs and Expertise Transfer

Authors: Namkyeong Lee, Siddhartha Laghuvarapu, Chanyoung Park, Jimeng Sun

Abstract: Understanding molecules and their textual descriptions via molecule language models (MoLM) has recently attracted a surge of interest among researchers. However, unique challenges exist in the field of MoLM due to 1) a limited amount of molecule-text paired data and 2) missing expertise that arises from the specialized areas of focus among experts. To this end, we propose AMOLE, which 1) augments molecule-text pairs with a structural similarity preserving loss, and 2) transfers expertise between molecules. Extensive experiments on various downstream tasks demonstrate the superiority of AMOLE in comprehending molecules and their descriptions, highlighting its potential for application in real-world drug discovery.

Submitted 12 July, 2024; originally announced July 2024.

Comments: ACL 2024 Workshop on Languages and Molecule

arXiv:2406.15524

Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization

Authors: Sungbin Shin, Wonpyo Park, Jaeho Lee, Namhoon Lee

Abstract: This work suggests fundamentally rethinking the current practice of pruning large language models (LLMs). It is done by divide and conquer: split the model into submodels, sequentially prune them, and reconstruct predictions of the dense counterparts on small calibration data one at a time; the final model is obtained simply by putting the resulting sparse submodels together. While this approach enables pruning under memory constraints, it generates high reconstruction errors. In this work, we first present an array of reconstruction techniques that can reduce this error by more than $90\%$. Unwittingly, however, we discover that minimizing reconstruction error is not always ideal and can overfit the given calibration data, resulting in increased language perplexity and poor performance at downstream tasks. We find that a strategy of self-generating calibration data can mitigate this trade-off between reconstruction and generalization, suggesting new directions in the presence of both benefits and pitfalls of reconstruction for pruning LLMs.
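The prune-then-reconstruct step for one submodel can be illustrated on a single linear layer. This is a minimal NumPy sketch under assumptions of our own: magnitude pruning stands in for the pruning criterion, a per-row least-squares refit of the surviving weights stands in for the paper's reconstruction techniques, and all helper names are hypothetical.

```python
import numpy as np

def magnitude_prune(weight: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured magnitude pruning: zero the `sparsity` fraction of smallest |w|."""
    k = int(round(sparsity * weight.size))
    flat = weight.copy().ravel()
    if k > 0:
        flat[np.argsort(np.abs(flat))[:k]] = 0.0
    return flat.reshape(weight.shape)

def reconstruction_error(dense: np.ndarray, sparse: np.ndarray, calib: np.ndarray) -> float:
    """Squared error ||W X - W_s X||_F^2 on calibration inputs X (one column per sample)."""
    return float(np.linalg.norm(dense @ calib - sparse @ calib) ** 2)

def reconstruct(dense: np.ndarray, mask: np.ndarray, calib: np.ndarray) -> np.ndarray:
    """Keep the sparsity mask fixed and refit the surviving weights of each output row
    by least squares so the sparse layer reproduces the dense outputs on `calib`."""
    target = dense @ calib
    refit = np.zeros_like(dense)
    for i in range(dense.shape[0]):
        keep = mask[i]
        if keep.any():
            sol, *_ = np.linalg.lstsq(calib[keep].T, target[i], rcond=None)
            refit[i, keep] = sol
    return refit

# Toy submodel: one linear layer (16 inputs, 8 outputs) and 32 calibration samples.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
X = rng.standard_normal((16, 32))
W_mag = magnitude_prune(W, sparsity=0.5)
W_rec = reconstruct(W, W_mag != 0, X)
err_mag = reconstruction_error(W, W_mag, X)
err_rec = reconstruction_error(W, W_rec, X)
print(f"magnitude only: {err_mag:.2f}, with reconstruction: {err_rec:.2f}")
```

Because the refit only sees the calibration inputs, it can drive the calibration error to zero when a row keeps more weights than there are calibration samples, which is one concrete way the overfitting described above can arise.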

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.09948

BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages

Authors: Junho Myung, Nayeon Lee, Yi Zhou, Jiho Jin, Rifki Afina Putri, Dimosthenis Antypas, Hsuvas Borkakoty, Eunsu Kim, Carla Perez-Almendros, Abinew Ali Ayele, Víctor Gutiérrez-Basulto, Yazmín Ibáñez-García, Hwaran Lee, Shamsuddeen Hassan Muhammad, Kiwoong Park, Anar Sabuhi Rzayev, Nina White, Seid Muhie Yimam, Mohammad Taher Pilehvar, Nedjma Ousidhoum, Jose Camacho-Collados, Alice Oh

Abstract: Large language models (LLMs) often lack culture-specific knowledge of daily life, especially across diverse regions and non-English languages. Existing benchmarks for evaluating LLMs' cultural sensitivities are limited to a single language or are collected from online sources such as Wikipedia, which do not reflect the mundane everyday lifestyles of diverse regions. That is, information about the food people eat for their birthday celebrations, the spices they typically use, the musical instruments youngsters play, or the sports they practice in school is common cultural knowledge but uncommon in easily collected online sources, especially for underrepresented cultures. To address this issue, we introduce BLEnD, a hand-crafted benchmark designed to evaluate LLMs' everyday knowledge across diverse cultures and languages. BLEnD comprises 52.6k question-answer pairs from 16 countries/regions, in 13 different languages, including low-resource ones such as Amharic, Assamese, Azerbaijani, Hausa, and Sundanese. We construct the benchmark to include two question formats: short-answer and multiple-choice. We show that LLMs perform better for cultures that are highly represented online, with a difference of up to 57.34% for GPT-4, the best-performing model, in the short-answer format. For cultures represented by mid-to-high-resource languages, LLMs perform better in their local languages, but for cultures represented by low-resource languages, LLMs perform better in English than in the local languages.
We make our dataset publicly available at: https://github.com/nlee0212/BLEnD.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.06424

Margin-aware Preference Optimization for Aligning Diffusion Models without Reference

Authors: Jiwoo Hong, Sayak Paul, Noah Lee, Kashif Rasul, James Thorne, Jongheon Jeong

Abstract: Modern alignment techniques based on human preferences, such as RLHF and DPO, typically employ divergence regularization relative to the reference model to ensure training stability. However, this often limits the flexibility of models during alignment, especially when there is a clear distributional discrepancy between the preference data and the reference model. In this paper, we focus on the alignment of recent text-to-image diffusion models, such as Stable Diffusion XL (SDXL), and find that this "reference mismatch" is indeed a significant problem in aligning these models due to the unstructured nature of visual modalities: e.g., a preference for a particular stylistic aspect can easily induce such a discrepancy. Motivated by this observation, we propose a novel and memory-friendly preference alignment method for diffusion models that does not depend on any reference model, coined margin-aware preference optimization (MaPO). MaPO jointly maximizes the likelihood margin between the preferred and dispreferred image sets and the likelihood of the preferred sets, simultaneously learning general stylistic features and preferences. For evaluation, we introduce two new pairwise preference datasets, Pick-Style and Pick-Safety, which comprise self-generated image pairs from SDXL and simulate diverse scenarios of reference mismatch. Our experiments validate that MaPO can significantly improve alignment on Pick-Style and Pick-Safety and general preference alignment when used with Pick-a-Pic v2, surpassing the base SDXL and other existing methods.
Our code, models, and datasets are publicly available via https://mapo-t2i.github.io
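The idea of combining a likelihood-margin term with a likelihood term, without any reference model, can be illustrated generically. This sketch is an assumption for illustration and not MaPO's exact objective: per-sample denoising losses stand in for negative log-likelihoods, and `beta`, `lam`, and the function names are hypothetical.

```python
import numpy as np

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -np.logaddexp(0.0, -x)

def margin_aware_loss(loss_w, loss_l, beta=1.0, lam=1.0):
    """Reference-free preference loss (illustrative only).

    loss_w, loss_l: per-sample denoising losses for the preferred and dispreferred
    images; a lower denoising loss is treated as a higher likelihood.
    """
    loss_w = np.asarray(loss_w, dtype=float)
    loss_l = np.asarray(loss_l, dtype=float)
    margin = loss_l - loss_w                   # positive when the preferred image is fit better
    margin_term = -log_sigmoid(beta * margin)  # pushes the likelihood margin up
    likelihood_term = lam * loss_w             # also fits the preferred images directly
    return float(np.mean(margin_term + likelihood_term))

print(margin_aware_loss([0.1, 0.2], [0.5, 0.4]))
```

Since neither term involves a frozen copy of the model, only the model being trained has to be held in memory, which is consistent with the "memory-friendly" description above.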

Submitted 10 June, 2024; originally announced June 2024.

Comments: Preprint

arXiv:2406.05761

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang , et al. (7 additional authors not shown)

Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on specific capabilities such as instruction following, leading to coverage bias. To overcome these limitations, we introduce the BiGGen Bench, a principled generation benchmark designed to thoroughly evaluate nine distinct capabilities of LMs across 77 diverse tasks. A key feature of the BiGGen Bench is its use of instance-specific evaluation criteria, closely mirroring the nuanced discernment of human evaluation. We apply this benchmark to assess 103 frontier LMs using five evaluator LMs. Our code, data, and evaluation results are all publicly available at https://github.com/prometheus-eval/prometheus-eval/tree/main/BiGGen-Bench.

Submitted 9 June, 2024; originally announced June 2024.

Comments: Work in Progress

arXiv:2406.00925

Dimers for Type D Relativistic Toda Model

Authors: Kimyeong Lee, Norton Lee

Abstract: We construct dimer graphs for type D relativistic Toda models by introducing impurities to the $Y^{2N,0}$ square dimer graphs. By properly placing the impurities and changing the canonical variables assigned to the 1-loops on the dimer graph, we introduce the "folding" of the graphs and obtain the type D relativistic Toda lattice Hamiltonian and monodromy matrix.

Submitted 23 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

Comments: 25+6 pages, 14 figures, add citation

Report number: KIAS-P24038, CGP24008

arXiv:2405.08614

FDD Massive MIMO: How to Optimally Combine UL Pilot and Limited DL CSI Feedback?

Authors: Jungyeon Kim, Jinseok Choi, Jeonghun Park, Ahmed Alkhateeb, Namyoon Lee

Abstract: In frequency-division duplexing (FDD) multiple-input multiple-output (MIMO) systems, obtaining accurate downlink channel state information (CSI) for precoding is highly challenging due to the tremendous feedback overhead, which grows with the number of antennas. Utilizing uplink pilots for downlink CSI estimation is a promising approach that can eliminate CSI feedback. However, the downlink CSI estimation accuracy diminishes significantly as the number of channel paths increases, resulting in reduced spectral efficiency. In this paper, we demonstrate that achieving downlink spectral efficiency comparable to that with perfect CSI is feasible by combining uplink CSI with limited downlink CSI feedback information. Our proposed downlink CSI feedback strategy transmits quantized phase information of downlink channel paths, deviating from conventional limited-feedback methods. We put forth a mean square error (MSE)-optimal downlink channel reconstruction method by jointly exploiting the uplink CSI and the limited downlink CSI. Armed with the MSE-optimal estimator, we derive the MSE as a function of the number of feedback bits for phase quantization. Subsequently, we present an optimal feedback bit allocation method for minimizing the MSE in the reconstructed channel through phase quantization. Utilizing a robust downlink precoding technique, we establish that the proposed downlink channel reconstruction method is sufficient for attaining a sum-spectral efficiency comparable to that with perfect CSI.
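Phase-only feedback with a bit budget can be sketched under simplifying assumptions of our own: each path phase is quantized uniformly with $b_l$ bits, the phase error is uniform within a quantization cell, and the channel MSE is approximated by a gain-weighted sum of per-path phase errors. Under these assumptions, $\mathbb{E}|e^{j\phi}-e^{j\hat\phi}|^2 = 2 - 2\,\mathbb{E}\cos(\phi-\hat\phi) = 2\left(1-\frac{\sin a}{a}\right)$ with $a = \pi/2^{b}$. This is not the paper's MSE expression or its allocation rule, and the function names are hypothetical.

```python
import heapq
import math

def phase_quant_mse(bits: int) -> float:
    """E|exp(j*phi) - exp(j*phi_hat)|^2 for a uniform `bits`-bit phase quantizer,
    assuming the phase error is uniform on [-pi/2**bits, pi/2**bits]."""
    a = math.pi / 2**bits
    return 2.0 * (1.0 - math.sin(a) / a)

def allocate_bits(path_gains, total_bits):
    """Greedy allocation minimizing sum_l gain_l * phase_quant_mse(b_l).

    Each step gives one bit to the path whose weighted MSE drops the most; for a
    separable cost whose per-bit reductions shrink as bits are added, this greedy
    rule is optimal.
    """
    bits = [0] * len(path_gains)

    def drop(i):
        b = bits[i]
        return path_gains[i] * (phase_quant_mse(b) - phase_quant_mse(b + 1))

    heap = [(-drop(i), i) for i in range(len(path_gains))]  # max-heap via negation
    heapq.heapify(heap)
    for _ in range(total_bits):
        _, i = heapq.heappop(heap)
        bits[i] += 1
        heapq.heappush(heap, (-drop(i), i))
    return bits

# Hypothetical path powers |alpha_l|^2 for a 3-path channel and a 6-bit budget.
print(allocate_bits([1.0, 0.3, 0.05], total_bits=6))
```

Stronger paths receive more bits because the same phase error costs more on a path that carries more power.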

Submitted 14 May, 2024; originally announced May 2024.

Comments: 13 pages, 10 figures

arXiv:2404.14276

A Bayesian Approach for Prioritising Driving Behaviour Investigations in Telematic Auto Insurance Policies

Authors: Mark McLeod, Bernardo Perez-Orozco, Nika Lee, Davide Zilli

Abstract: Automotive insurers increasingly have access to telematic information via black-box recorders installed in the insured vehicle, and wish to identify undesirable behaviour which may signify increased risk or uninsured activities. However, identifying such behaviour with machine learning is non-trivial, and results are far from perfect, requiring human investigation to verify suspected cases. An appropriately formed priority score, generated by automated analysis of GPS data, allows underwriters to make more efficient use of their time, improving detection of the behaviour under investigation. An example of such behaviour is the use of a privately insured vehicle for commercial purposes, such as delivering meals and parcels. We first make use of trip GPS and accelerometer data, augmented by geospatial information, to train an imperfect classifier for delivery driving on a per-trip basis. We use a mixture of Beta-Binomial distributions to model the propensity of a policyholder to undertake trips which result in a positive classification as being drawn from either a rare high-scoring or a common low-scoring group, and learn the parameters of this model using MCMC. This model provides a posterior probability that any policyholder will be a regular generator of automated alerts given any number of trips and alerts. This posterior probability is converted to a priority score, which was used to select the most valuable candidates for manual investigation.
Testing over a 1-year period ranked policyholders by likelihood of commercial driving activity on a weekly basis. The top 0.9% have been reviewed at least once by the underwriters at the time of writing, and of those, 99.4% have been confirmed as correctly identified, showing that the approach has achieved a significant improvement in the efficiency of human resource allocation compared to manual searching.
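The posterior described above can be sketched for a two-component Beta-Binomial mixture. The sketch assumes the mixture parameters are already known (the paper learns them with MCMC); the prior weight `pi_high`, the Beta parameters, the policy records, and the function names are hypothetical.

```python
import math

def log_beta(a: float, b: float) -> float:
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_betabinom_pmf(k: int, n: int, a: float, b: float) -> float:
    """log P(k alerts in n trips) when the per-trip alert rate ~ Beta(a, b)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)

def posterior_high(k, n, pi_high, high=(5.0, 5.0), low=(0.5, 20.0)):
    """P(policyholder is in the rare high-scoring group | k alerts in n trips)."""
    lh = math.log(pi_high) + log_betabinom_pmf(k, n, *high)
    ll = math.log1p(-pi_high) + log_betabinom_pmf(k, n, *low)
    m = max(lh, ll)  # log-sum-exp trick for numerical stability
    return math.exp(lh - m) / (math.exp(lh - m) + math.exp(ll - m))

# Hypothetical (policy_id, trips, alerts) records, ranked by posterior as a priority score.
policies = [("A", 40, 1), ("B", 12, 6), ("C", 3, 1)]
ranked = sorted(policies, key=lambda p: posterior_high(p[2], p[1], pi_high=0.01), reverse=True)
print([pid for pid, _, _ in ranked])
```

With no trips observed, the posterior falls back to the prior `pi_high`; each additional trip and alert moves it toward whichever group explains the observed alert rate better.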

Submitted 22 April, 2024; originally announced April 2024.

Comments: International Congress of Actuaries (2023)

arXiv:2404.09959

NNLO QCD corrections to polarized semi-inclusive DIS

Authors: Saurav Goyal, Roman N. Lee, Sven-Olaf Moch, Vaibhav Pathak, Narayan Rana, V. Ravindran

Abstract: Polarized semi-inclusive deep-inelastic scattering (SIDIS) is a key process in the quest for a resolution of the proton spin puzzle. We present the complete results for the polarized SIDIS process at next-to-next-to-leading order (NNLO) in perturbative quantum chromodynamics. Our analytical results include all partonic channels for the scattering of polarized leptons off hadrons and a spin-averaged hadron identified in the final state. A numerical analysis of the NNLO corrections illustrates their significance and the reduced residual scale dependence in the kinematic range probed by the future Electron-Ion Collider (EIC).

Submitted 15 April, 2024; originally announced April 2024.

Comments: 6 pages, 2 figures; 1 ancillary file

arXiv:2404.03655

Magnetic fields from small-scale primordial perturbations

Authors: Nanoom Lee, Yacine Ali-Haimoud

Abstract: Weak magnetic fields must have existed in the early Universe, as they were sourced by the cross product of electron density and temperature gradients through the Biermann-battery mechanism. In this paper, we calculate the magnetic fields generated at cosmic dawn by a variety of small-scale primordial perturbations, carefully computing the evolution of electron density and temperature fluctuations, and consistently accounting for relative velocities between baryons and dark matter. We first compute the magnetic field resulting from standard, nearly scale-invariant primordial adiabatic perturbations, making significant improvements to previous calculations. This "standard" primordial field has a root mean square (rms) of $\sim10^{-15}$ nG at $20\lesssim z \lesssim 100$, with fluctuations on $\sim$ kpc comoving scales, and could serve as the seed of present-day magnetic fields observed in galaxies and galaxy clusters. In addition, we consider early-Universe magnetic fields as a possible probe of non-standard initial conditions of the Universe on small scales, $k \sim 1-10^3$ Mpc$^{-1}$. To this end, we compute the maximally allowed magnetic fields within current upper limits on small-scale adiabatic and isocurvature perturbations. Under the current Cosmic Microwave Background spectral-distortion constraints, magnetic fields could be produced with an rms of $\sim 5\times 10^{-11}$ nG at $z = 20$.
Uncorrelated small-scale isocurvature perturbations within current Big-Bang Nucleosynthesis bounds could potentially enhance the magnetic field to $\sim 10^{-14}-10^{-10}$ nG at $z = 20$, depending on the specific isocurvature mode considered. While these very weak fields remain well below current observational capabilities, our work points out that magnetic fields could potentially provide an interesting window into the poorly constrained small-scale initial conditions of the Universe.
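For reference, the Biermann-battery source term mentioned above has the following standard textbook form (Gaussian units, keeping only the battery term and neglecting induction, diffusion, and cosmological expansion); it is a general result and not necessarily the exact equation used in the paper. Balancing the electric field against the electron pressure gradient, $\mathbf{E} = -\nabla P_e/(e\,n_e)$ with $P_e = n_e k_B T_e$, and applying Faraday's law gives

$$
\frac{\partial \mathbf{B}}{\partial t} = -c\,\nabla\times\mathbf{E} = \frac{c}{e}\,\nabla\times\left(\frac{\nabla P_e}{n_e}\right) = -\frac{c\,k_B}{e\,n_e}\,\nabla n_e\times\nabla T_e ,
$$

so a field is generated only where the electron density and temperature gradients are misaligned.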

Submitted 4 April, 2024; originally announced April 2024.

Comments: 12 pages, 6 figures

arXiv:2404.01954

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.01844

doi 10.1016/j.nuclphysb.2024.116604

Generalized Calogero-Moser system and supergroup gauge origami

Authors: Taro Kimura, Norton Lee

Abstract: We study the integrability and the Bethe/Gauge correspondence of the Generalized Calogero-Moser system proposed by Berntson, Langmann and Lenells, which we call the elliptic quadruple Calogero-Moser system (eqCM). We write down the Dunkl operators which give commuting Hamiltonians of the quantum integrable system. We identify the corresponding gauge theory as a supergroup version of the gauge origami, from which we construct the transfer matrix of the eqCM system.

Submitted 30 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 28+4 pages. hyperlink fixed, add reference. arXiv admin note: text overlap with arXiv:1908.04928

Report number: CGP24006

Journal ref: Nucl.Phys.B1005(2024)116604

arXiv:2403.18932

Measuring Political Bias in Large Language Models: What Is Said and How It Is Said

Authors: Yejin Bang, Delong Chen, Nayeon Lee, Pascale Fung

Abstract: We propose to measure political bias in LLMs by analyzing both the content and style of their generations regarding political issues. Existing benchmarks and measures focus on gender and racial biases. However, political bias exists in LLMs and can lead to polarization and other harms in downstream applications. In order to provide transparency to users, we advocate that there should be fine-grained and explainable measures of political biases generated by LLMs. Our proposed measure looks at different political issues such as reproductive rights and climate change, at both the content (the substance of the generation) and the style (the lexical polarity) of such bias. We measured the political bias in eleven open-source LLMs and showed that our proposed framework is easily scalable to other topics and is explainable.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 16 pages

arXiv:2403.16372

SignSGD with Federated Voting

Authors: Chanho Park, H. Vincent Poor, Namyoon Lee

Abstract: Distributed learning is commonly used to accelerate model training by harnessing the computational capabilities of multiple edge devices. However, in practical applications, communication delay emerges as a bottleneck due to the substantial information exchange required between workers and a central parameter server. SignSGD with majority voting (signSGD-MV) is an effective distributed learning algorithm that can significantly reduce communication costs by one-bit quantization. However, due to heterogeneous computational capabilities, it fails to converge when the mini-batch sizes differ among workers. To overcome this, we propose a novel signSGD optimizer with federated voting (signSGD-FV). The idea of federated voting is to exploit learnable weights to perform weighted majority voting. The server learns the weights assigned to the edge devices in an online fashion based on their computational capabilities. Subsequently, these weights are employed to decode the signs of the aggregated local gradients in such a way as to minimize the sign decoding error probability. We provide a unified convergence rate analysis framework applicable to scenarios where the estimated weights are known to the parameter server either perfectly or imperfectly. We demonstrate that the proposed signSGD-FV algorithm has a theoretical convergence guarantee even when edge devices use heterogeneous mini-batch sizes. Experimental results show that signSGD-FV outperforms signSGD-MV, exhibiting a faster convergence rate, especially with heterogeneous mini-batch sizes.
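The difference between plain and weighted majority voting over gradient signs can be sketched in a few lines of NumPy. The weighting rule here, $w_k = \log\frac{1-p_k}{p_k}$ for a worker whose sign error probability is $p_k$, is the classical log-odds weighting for independent voters and is an assumption for illustration; the paper learns its weights online, and the helper names and toy gradients are hypothetical.

```python
import numpy as np

def majority_vote(grads: np.ndarray) -> np.ndarray:
    """signSGD-MV: each worker sends sign(g_k); the server returns the sign of the vote sum."""
    return np.sign(np.sign(grads).sum(axis=0))

def weighted_vote(grads: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted majority vote: sign(sum_k w_k * sign(g_k))."""
    return np.sign(np.tensordot(weights, np.sign(grads), axes=1))

def log_odds_weights(error_probs) -> np.ndarray:
    """Log-odds weights for independent voters with sign error probabilities p_k."""
    p = np.clip(np.asarray(error_probs, dtype=float), 1e-6, 1 - 1e-6)
    return np.log((1 - p) / p)

# Three workers, 3-dimensional gradient whose true sign is (+, +, -).
# Worker 0 (large mini-batch) is reliable; workers 1 and 2 (small mini-batches) are noisy.
grads = np.array([
    [ 1.0,  2.0, -0.5],
    [-1.0, -1.0,  0.5],
    [-0.3, -2.0,  0.1],
])
w = log_odds_weights([0.05, 0.4, 0.4])
print(majority_vote(grads), weighted_vote(grads, w))
```

In this toy case the unweighted vote is overruled by the two noisy workers, while the weighted vote recovers the true signs. A tied vote returns 0, which a real implementation would need to break.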

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.15692

Block Orthogonal Sparse Superposition Codes for $ \sf{L}^3 $ Communications: Low Error Rate, Low Latency, and Low Power Consumption

Authors: Donghwa Han, Bowhyung Lee, Min Jang, Donghun Lee, Seho Myung, Namyoon Lee

Abstract: Block orthogonal sparse superposition (BOSS) code is a class of joint coded modulation methods, which can closely achieve the finite-blocklength capacity with a low-complexity decoder at a few coding rates under Gaussian channels. However, for fading channels, the code performance degrades considerably because coded symbols experience different channel fading effects. In this paper, we put forth novel joint demodulation and decoding methods for BOSS codes under fading channels. For a fast fading channel, we present a minimum mean square error approximate maximum a posteriori (MMSE-A-MAP) algorithm for the joint demodulation and decoding when channel state information is available at the receiver (CSIR). We also propose a joint demodulation and decoding method without using CSIR for a block fading channel scenario. We refer to this as the non-coherent sphere decoding (NSD) algorithm. Simulation results demonstrate that BOSS codes with MMSE-A-MAP decoding outperform CRC-aided polar codes, while NSD decoding achieves comparable performance to quasi-maximum likelihood decoding with significantly reduced complexity. Both decoding algorithms are suitable for parallelization, satisfying low-latency constraints. Additionally, real-time simulations on a software-defined radio testbed validate the feasibility of using BOSS codes for low-power transmission.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.15042

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

Authors: Nicholas Lee, Thanakul Wattanawong, Sehoon Kim, Karttikeya Mangalam, Sheng Shen, Gopala Anumanchipali, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

Abstract: Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks. While many real-world applications still require fine-tuning to reach satisfactory levels of performance, many of them are in the low-data regime, making fine-tuning challenging. To address this, we propose LLM2LLM, a targeted and iterative data augmentation strategy that uses a teacher LLM to enhance a small seed dataset by generating additional data that can be used for fine-tuning on a specific task. LLM2LLM (1) fine-tunes a baseline student LLM on the initial seed data, (2) evaluates and extracts data points that the model gets wrong, and (3) uses a teacher LLM to generate synthetic data based on these incorrect data points, which are then added back into the training data. This approach amplifies the signal from data points that the LLM predicts incorrectly during training and reintegrates them into the dataset to focus on more challenging examples for the LLM. Our results show that LLM2LLM significantly enhances the performance of LLMs in the low-data regime, outperforming both traditional fine-tuning and other data augmentation baselines. LLM2LLM reduces the dependence on labor-intensive data curation and paves the way for more scalable and performant LLM solutions, allowing us to tackle data-constrained domains and tasks.
We achieve improvements of up to 24.2% on the GSM8K dataset, 32.6% on CaseHOLD, 32.0% on SNIPS, 52.6% on TREC and 39.8% on SST-2 over regular fine-tuning in the low-data regime using a LLaMA2-7B student model.
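The three-step loop in the abstract can be written as a generic driver that takes the training, prediction, and teacher steps as callables. This is a minimal sketch, not the released implementation: evaluating only the seed examples at step (2) is our assumption, and all names, including the toy stand-ins, are hypothetical. The toy only exercises the loop mechanics and says nothing about LLM performance.

```python
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input text, label)

def llm2llm(seed: List[Example],
            fine_tune: Callable[[List[Example]], object],
            predict: Callable[[object, str], str],
            teacher_augment: Callable[[str, str], List[Example]],
            iterations: int = 3):
    """Targeted, iterative augmentation loop following steps (1)-(3) of the abstract."""
    data = list(seed)
    model = fine_tune(data)                                  # (1) train the student
    for _ in range(iterations):
        # (2) collect seed examples the current student gets wrong
        wrong = [(x, y) for x, y in seed if predict(model, x) != y]
        if not wrong:
            break
        # (3) the teacher writes new examples modeled on each mistake
        for x, y in wrong:
            data.extend(teacher_augment(x, y))
        model = fine_tune(data)                              # retrain on the enlarged set
    return model, data

# Hypothetical toy stand-ins: the "student" only learns the majority label, and the
# "teacher" writes two new numbers with the same parity as each misclassified input.
def toy_fine_tune(data):
    return Counter(label for _, label in data).most_common(1)[0][0]

def toy_predict(model, x):
    return model

def toy_teacher(x, y):
    return [(str(int(x) + 2), y), (str(int(x) + 4), y)]

seed = [("1", "odd"), ("2", "even"), ("4", "even")]
model, data = llm2llm(seed, toy_fine_tune, toy_predict, toy_teacher, iterations=1)
print(model, len(data))
```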

Submitted 22 March, 2024; originally announced March 2024.

Comments: Our code is available at https://github.com/SqueezeAILab/LLM2LLM

arXiv:2403.11762

Full-Duplex MU-MIMO Systems with Coarse Quantization: How Many Bits Do We Need?

Authors: Seunghyeong Yoo, Seokjun Park, Mintaek Oh, Namyoon Lee, Jinseok Choi

Abstract: This paper investigates full-duplex (FD) multi-user multiple-input multiple-output (MU-MIMO) system design with coarse quantization. We first analyze the impact of self-interference (SI) on quantization in FD single-input single-output systems. The analysis elucidates that the minimum required number of analog-to-digital converter (ADC) bits is logarithmically proportional to the ratio of total received power to the received power of desired signals. Motivated by this, we design an FD MIMO beamforming method that effectively manages the SI. Dividing a spectral efficiency maximization beamforming problem into two sub-problems for alternating optimization, we address the first by optimizing the precoder: obtaining a generalized eigenvalue problem from the first-order optimality condition, where the principal eigenvector is the optimal stationary solution, and adopting a power iteration method to identify this eigenvector. Subsequently, a quantization-aware minimum mean square error combiner is computed for the derived precoder. Through numerical studies, we observe that the proposed beamformer reduces the minimum required number of ADC bits for achieving higher spectral efficiency than that of half-duplex (HD) systems, compared to FD benchmarks. The overall analysis shows that, unlike with quantized HD systems, more than 6 bits are required for the ADC to fully realize the potential of the quantized FD system.
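The precoder step above relies on finding the principal eigenvector of a generalized eigenvalue problem $\mathbf{A}\mathbf{v} = \lambda\mathbf{B}\mathbf{v}$. A generic power-iteration sketch follows, assuming fixed matrices with $\mathbf{A}$ Hermitian positive semidefinite and $\mathbf{B}$ Hermitian positive definite. If the matrices depend on the current precoder, they would have to be rebuilt at every step; this sketch keeps them fixed. The function name and test matrices are hypothetical and are not the paper's beamforming matrices.

```python
import numpy as np

def generalized_power_iteration(A, B, iters=500, tol=1e-10, seed=0):
    """Principal eigenpair of A v = lam * B v for fixed A (Hermitian PSD) and B (Hermitian PD).

    Repeatedly applies B^{-1} A (via a linear solve, never forming the inverse) and
    normalizes; the iterate converges to the eigenvector of the largest eigenvalue.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        w = np.linalg.solve(B, A @ v)
        v = w / np.linalg.norm(w)
        # Generalized Rayleigh quotient v^H A v / v^H B v (real for Hermitian A, B).
        lam = float(np.real(np.vdot(v, A @ v) / np.vdot(v, B @ v)))
        if np.linalg.norm(A @ v - lam * (B @ v)) < tol:
            break
    return lam, v

# Illustrative matrices with a clear eigengap.
rng = np.random.default_rng(1)
Z = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(Z)                                  # random unitary
A = Q @ np.diag([10.0, 3.0, 2.0, 1.0]) @ Q.conj().T     # Hermitian PSD
N = rng.standard_normal((4, 4))
B = np.eye(4) + 0.05 * (N @ N.T)                        # symmetric positive definite
lam, v = generalized_power_iteration(A, B)
print(lam)
```

Convergence speed depends on the ratio of the two largest generalized eigenvalues, so a clear eigengap is what makes a few dozen iterations sufficient here.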

Submitted 18 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.
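
The precoder step above relies on power iteration to find a principal generalized eigenvector. The sketch below shows that step in isolation, assuming fixed, real, symmetric matrices A and B with B positive definite; in the paper the matrices depend on the precoder and are complex-valued, so this is a simplified illustration rather than the authors' algorithm. The helper names (`principal_generalized_eigvec`, `solve`) and the 2-by-2 example matrices are hypothetical.

```python
import math


def mat_vec(A, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(B, y):
    """Solve B x = y by Gaussian elimination with partial pivoting."""
    n = len(B)
    M = [list(row) + [y[i]] for i, row in enumerate(B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def principal_generalized_eigvec(A, B, iters=200):
    """Power iteration on B^{-1} A for the problem A v = lam B v."""
    v = [1.0] * len(A)  # generic start; must not be orthogonal to the target
    for _ in range(iters):
        w = solve(B, mat_vec(A, v))  # w = B^{-1} A v
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]  # renormalize to avoid overflow
    # The generalized Rayleigh quotient gives the associated eigenvalue.
    lam = dot(v, mat_vec(A, v)) / dot(v, mat_vec(B, v))
    return lam, v


A = [[4.0, 1.0], [1.0, 3.0]]
B = [[2.0, 0.0], [0.0, 1.0]]
lam, v = principal_generalized_eigvec(A, B)
print(round(lam, 6))  # 3.366025, i.e. (5 + sqrt(3)) / 2
```

Each iteration multiplies by B⁻¹A and renormalizes; the convergence rate is set by the ratio of the two largest generalized eigenvalues (about 0.49 in this example).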

arXiv:2403.11096

Modeling and Coverage Analysis of K-Tier Integrated Satellite-Terrestrial Downlink Networks

Authors: Jungbin Yim, Jeonghun Park, Namyoon Lee

Abstract: Integrated satellite-terrestrial networks (ISTNs) can significantly expand network coverage while diminishing reliance on terrestrial infrastructure. Despite the enticing potential of ISTNs, there is no comprehensive mathematical performance analysis framework for these emerging networks. In this paper, we introduce a tractable approach to analyze the downlink coverage performance of multi-tier ISTNs, where each network tier operates in its own orthogonal frequency band. The proposed approach models the spatial distribution of cellular and satellite base stations using homogeneous Poisson point processes arranged on concentric spheres with varying radii. Central to our analysis is a displacement principle that transforms base station locations on different spheres into projected rings while preserving the distance distribution to the typical user. By incorporating the effects of Shadowed-Rician fading on satellite channels and employing orthogonal frequency bands, we derive analytical expressions for coverage in the integrated networks while keeping full generality. Our primary finding is that network performance is maximized by selecting the optimal density ratio of users associated with each network according to the density and channel parameters of each network. Through simulations, we validate the accuracy of our derived expressions.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 13 pages, 9 figures
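
As a rough illustration of the spatial model, the sketch below draws one satellite tier as a homogeneous Poisson point process on a sphere and measures the distance from a typical user on the Earth's surface to the nearest satellite. It is plain Monte Carlo, not the paper's displacement principle or coverage expressions, and the 550 km altitude, the average of 500 satellites, and the function names are illustrative assumptions.

```python
import math
import random


def poisson(mean, rng):
    """Sample a Poisson count by counting unit-rate arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def uniform_on_sphere(radius, rng):
    """Isotropic direction from a normalized Gaussian vector, scaled to the radius."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(c * c for c in g))
        if n > 0:
            return [radius * c / n for c in g]


def sphere_ppp(density, radius, rng):
    """Homogeneous PPP on a sphere: Poisson count, then i.i.d. uniform points."""
    n = poisson(density * 4 * math.pi * radius ** 2, rng)
    return [uniform_on_sphere(radius, rng) for _ in range(n)]


def nearest_distance(points, user):
    return min(math.dist(p, user) for p in points) if points else math.inf


rng = random.Random(0)
R_EARTH, ALTITUDE = 6371.0, 550.0  # km; illustrative values only
r_sat = R_EARTH + ALTITUDE
density = 500 / (4 * math.pi * r_sat ** 2)  # about 500 satellites on average
sats = sphere_ppp(density, r_sat, rng)
user = [0.0, 0.0, R_EARTH]  # typical user placed at the "north pole"
d_min = nearest_distance(sats, user)
print(len(sats), round(d_min, 1))
```

Repeating the draw many times and adding a fading model would give an empirical coverage estimate of the kind the authors compare their expressions against.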

arXiv:2403.11094

Nonlinear Self-Interference Cancellation With Learnable Orthonormal Polynomials for Full-Duplex Wireless Systems

Authors: Hyowon Lee, Jungyeon Kim, Geon Choi, Ian P. Roberts, Jinseok Choi, Namyoon Lee

Abstract: Nonlinear self-interference cancellation (SIC) is essential for full-duplex communication systems, which can offer twice the spectral efficiency of traditional half-duplex systems. The challenge of nonlinear SIC is similar to the classic problem of system identification in adaptive filter theory, whose crux lies in identifying the optimal nonlinear basis functions for a nonlinear system. This becomes especially difficult when the system input has a non-stationary distribution. In this paper, we propose a novel algorithm for nonlinear digital SIC that adaptively constructs orthonormal polynomial basis functions according to the non-stationary moments of the transmit signal. By combining these basis functions with the least mean squares (LMS) algorithm, we introduce a new SIC technique called the adaptive orthonormal polynomial LMS (AOP-LMS) algorithm. To reduce computational complexity for practical systems, we augment our approach with a precomputed look-up table, which maps a given modulation and coding scheme to its corresponding basis functions. Numerical simulations indicate that our proposed method surpasses existing state-of-the-art SIC algorithms in terms of convergence speed and mean squared error when the transmit signal is non-stationary, such as with adaptive modulation and coding. Experimental evaluation with a wireless testbed confirms that our proposed approach outperforms existing digital SIC algorithms.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 13 pages, total 16 figures
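
The pairing of an orthonormal polynomial basis with LMS can be sketched in a few lines. This is a simplified, hypothetical version: real-valued, memoryless, with a stationary uniform input and a made-up cubic self-interference path `si_channel`, whereas the paper handles non-stationary transmit signals, adapts the basis to their moments, and uses a look-up table. Here the basis is built by Gram-Schmidt on monomials under the empirical inner product of a calibration set.

```python
import math
import random


def poly_eval(coeffs, x):
    """Evaluate sum_j coeffs[j] * x**j."""
    return sum(c * x ** j for j, c in enumerate(coeffs))


def inner(p, q, samples):
    """Empirical inner product <p, q> = mean of p(x) * q(x) over the samples."""
    return sum(poly_eval(p, x) * poly_eval(q, x) for x in samples) / len(samples)


def orthonormal_basis(samples, degree):
    """Gram-Schmidt on 1, x, ..., x^degree under the empirical inner product."""
    basis = []
    for k in range(degree + 1):
        c = [0.0] * (degree + 1)
        c[k] = 1.0  # start from the monomial x^k
        for b in basis:
            proj = inner(c, b, samples)
            c = [ci - proj * bi for ci, bi in zip(c, b)]
        norm = math.sqrt(inner(c, c, samples))
        basis.append([ci / norm for ci in c])
    return basis


def lms_identify(basis, xs, ds, mu=0.05):
    """Standard LMS on the basis outputs: w <- w + mu * e * phi."""
    w = [0.0] * len(basis)
    for x, d in zip(xs, ds):
        phi = [poly_eval(b, x) for b in basis]
        e = d - sum(wi * pi for wi, pi in zip(w, phi))
        w = [wi + mu * e * pi for wi, pi in zip(w, phi)]
    return w


def si_channel(x):
    """Hypothetical memoryless nonlinear self-interference path."""
    return 0.8 * x + 0.15 * x ** 3


rng = random.Random(1)
calib = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
basis = orthonormal_basis(calib, degree=3)
xs = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
w = lms_identify(basis, xs, [si_channel(x) for x in xs])
test_x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
mse = sum(
    (si_channel(x) - sum(wi * poly_eval(b, x) for wi, b in zip(w, basis))) ** 2
    for x in test_x
) / len(test_x)
```

Because the basis outputs are orthonormal for the input distribution, their correlation matrix is close to the identity, so LMS converges at a similar rate in every direction; with raw monomials the eigenvalue spread, and hence the slowest mode, would be noticeably worse.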

arXiv:2403.07821

Augmenting Interpolation-Based Model Checking with Auxiliary Invariants (Extended Version)

Authors: Dirk Beyer, Po-Chun Chien, Nian-Ze Lee

Abstract: Software model checking is a challenging problem, and generating relevant invariants is a key factor in proving the safety properties of a program. Program invariants can be obtained by various approaches, including lightweight procedures based on data-flow analysis and computationally intensive techniques using Craig interpolation. Although data-flow analysis runs efficiently, it often produces invariants that are too weak to prove the properties. By contrast, interpolation-based approaches build strong invariants from interpolants, but they might not scale well due to expensive interpolation procedures. Invariants can also be injected into model-checking algorithms to assist the analysis. Invariant injection has been studied for many well-known approaches, including k-induction, predicate abstraction, and symbolic execution. We propose an augmented interpolation-based verification algorithm that injects external invariants into interpolation-based model checking (McMillan, 2003), a hardware model-checking algorithm recently adopted for software verification. The auxiliary invariants help prune unreachable states in Craig interpolants and confine the analysis to the reachable parts of a program. We implemented the proposed technique in the verification framework CPAchecker and evaluated it against mature SMT-based methods in CPAchecker as well as other state-of-the-art software verifiers. We found that injecting invariants reduces the number of interpolation queries needed to prove safety properties and improves the run-time efficiency.
Consequently, the proposed invariant-injection approach verified difficult tasks that neither its plain version (i.e., without invariants), nor the invariant generator, nor any of the compared tools could solve.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.07691

ORPO: Monolithic Preference Optimization without Reference Model

Authors: Jiwoo Hong, Noah Lee, James Thorne

Abstract: While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence. In this paper, we study the crucial role of SFT within the context of preference alignment, emphasizing that a minor penalty for the disfavored generation style is sufficient for preference-aligned SFT. Building on this foundation, we introduce a straightforward and innovative reference-model-free monolithic odds ratio preference optimization algorithm, ORPO, eliminating the necessity for an additional preference alignment phase. We demonstrate, both empirically and theoretically, that the odds ratio is a sensible choice for contrasting favored and disfavored styles during SFT across diverse model sizes from 125M to 7B. Specifically, fine-tuning Phi-2 (2.7B), Llama-2 (7B), and Mistral (7B) with ORPO on UltraFeedback alone surpasses the performance of state-of-the-art language models with more than 7B and 13B parameters: achieving up to 12.20% on $\text{AlpacaEval}_{2.0}$ (Figure 1), 66.19% on IFEval (instruction-level loose, Table 6), and 7.32 in MT-Bench (Figure 12). We release code and model checkpoints for Mistral-ORPO-$α$ (7B) and Mistral-ORPO-$β$ (7B).

Submitted 14 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: Preprint
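
The abstract does not state the loss itself. The sketch below assumes the form given in the body of the ORPO paper: the SFT negative log-likelihood of the favored response plus a weighted term -log σ(log odds ratio), where odds(y|x) = P(y|x) / (1 - P(y|x)) and P is the length-normalized (per-token average) likelihood. Inputs are average per-token log-probabilities; the weight `lam=0.1` and the function names are hypothetical placeholders.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_odds(avg_logp):
    """log(P / (1 - P)) for P = exp(avg_logp), the length-normalized likelihood."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def odds_ratio_loss(avg_logp_chosen, avg_logp_rejected):
    """-log sigmoid(log odds ratio), computed as softplus(-log odds ratio)."""
    z = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    return softplus(-z)


def orpo_objective(avg_logp_chosen, avg_logp_rejected, lam=0.1):
    """Per-token SFT NLL on the favored response plus the weighted odds-ratio term."""
    nll_chosen = -avg_logp_chosen
    return nll_chosen + lam * odds_ratio_loss(avg_logp_chosen, avg_logp_rejected)


print(orpo_objective(-0.5, -2.0))
```

The penalty equals log 2 when both responses are equally likely and decreases as the favored response gains odds over the disfavored one; no reference model appears anywhere in the objective.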

arXiv:2403.05389

Multi-reference coupled cluster theory using the normal ordered exponential ansatz

Authors: Alexander Gunasekera, Nicholas Lee, David P. Tew

Abstract: Properly spin-adapted coupled-cluster theory for general open-shell configurations remains an elusive goal in electronic structure theory. In this contribution we examine Lindgren's normal-ordered exponential ansatz using spin-free excitation operators, with the aid of automatic equation generation software. We present a size-extensive reformulation of the unlinked working equations, and analyse the performance of the method with single and double excitations for simple molecular systems in terms of accuracy and size consistency.

Submitted 8 March, 2024; originally announced March 2024.

Comments: 9 pages, 2 figures

arXiv:2402.13889

Bispectral duality and separation of variables from surface defect transition

Authors: Saebyeok Jeong, Norton Lee

Abstract: We study two types of surface observables $-$ the $\mathbf{Q}$-observables and the $\mathbf{H}$-observables $-$ of the 4d $\mathcal{N}=2$ $A_1$-quiver $U(N)$ gauge theory obtained by coupling a 2d $\mathcal{N}=(2,2)$ gauged linear sigma model. We demonstrate that the transition between the two surface defects manifests as a Fourier transformation between the surface observables. Utilizing the results from our previous works, which establish that the $\mathbf{Q}$-observables and the $\mathbf{H}$-observables give rise, respectively, to the $Q$-operators on the evaluation module over the Yangian $Y(\mathfrak{gl}(2))$ and the Hecke operators on the twisted $\widehat{\mathfrak{sl}}(N)$-coinvariants, we derive an exact duality between the spectral problems of the $\mathfrak{gl}(2)$ XXX spin chain with $N$ sites and the $\mathfrak{sl}(N)$ Gaudin model with 4 sites, both of which are defined on bi-infinite modules. Moreover, we present a dual description of the monodromy surface defect as coupling a 2d $\mathcal{N}=(2,2)$ gauged linear sigma model. Employing this dual perspective, we demonstrate how the monodromy surface defect undergoes a transition to multiple $\mathbf{Q}$-observables or $\mathbf{H}$-observables, implemented through integral transformations between their surface observables. These transformations provide, respectively, $\hbar$-deformation and a higher-rank generalization of the KZ/BPZ correspondence.
In the limit $\varepsilon_2\to 0$, they give rise to the quantum separation of variables for the $\mathfrak{gl}(2)$ XXX spin chain and the $\mathfrak{sl}(N)$ Gaudin model, respectively.

Submitted 21 February, 2024; originally announced February 2024.

Comments: 62+11 pages; 10 figures

Report number: CERN-TH-2024-024, CGP24003

arXiv:2402.13888

di-Langlands correspondence and extended observables

Authors: Saebyeok Jeong, Norton Lee, Nikita Nekrasov

Abstract: We explore the $\textit{difference Langlands correspondence}$ using the four-dimensional ${\mathcal{N}}=2$ super-QCD. Surface defects and surface observables play a crucial role. As an application, we give the first construction of the full set of quantum integrals, i.e. commuting differential operators, such that the partition function of the so-called regular monodromy surface defect is their joint eigenvector in an evaluation module over the Yangian $Y(\mathfrak{gl}(2))$, making it the wavefunction of an $N$-site $\mathfrak{gl}(2)$ spin chain with bi-infinite spin modules. We construct the $\mathbf{Q}$- and $\tilde{\mathbf{Q}}$-surface observables, which are believed to be the $Q$-operators on the bi-infinite module over the Yangian $Y(\mathfrak{gl}(2))$, and compute their eigenvalues, the $Q$-functions, as vevs of the surface observables.

Submitted 21 February, 2024; originally announced February 2024.

Comments: 50+11 pages

Report number: CERN-TH-2023-220, CGP24002

arXiv:2402.09903

Enumeration of multiplex juggling card sequences using generalized q-derivatives

Authors: Yumin Cho, Jaehyun Kim, Jang Soo Kim, Nakyung Lee

Abstract: In 2019, Butler, Choi, Kim, and Seo introduced a new type of juggling card that represents multiplex juggling patterns in a natural bijective way. They conjectured a formula for the generating function for the number of multiplex juggling cards with capacity 2. In this paper we prove their conjecture. More generally, we find an explicit formula for the generating function with any capacity. We also find an expression for the generating function for multiplex juggling card sequences by introducing a generalization of the q-derivative operator. As a consequence, we show that this generating function is a rational function.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 17 pages, 4 figures
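
For readers unfamiliar with the operator being generalized, the classical q-derivative is D_q f(x) = (f(qx) - f(x)) / ((q - 1)x), which sends x^n to [n]_q x^(n-1) with [n]_q = 1 + q + ... + q^(n-1). The sketch below applies it to polynomial coefficient lists and checks it against the definition using exact fractions; it shows only the standard operator, not the paper's generalization, and the function names are illustrative.

```python
from fractions import Fraction


def q_integer(n, q):
    """[n]_q = 1 + q + ... + q^(n-1); equals n when q = 1."""
    return sum(q ** k for k in range(n))


def q_derivative(coeffs, q):
    """Apply D_q to a polynomial given as coeffs[n] = coefficient of x^n.

    Uses D_q x^n = [n]_q x^(n-1); the constant term maps to zero and is dropped.
    """
    return [q_integer(n, q) * c for n, c in enumerate(coeffs)][1:]


def eval_poly(coeffs, x):
    return sum(c * x ** n for n, c in enumerate(coeffs))


f = [Fraction(c) for c in (2, 0, 5, 1)]  # 2 + 5x^2 + x^3
q, x = Fraction(2), Fraction(3)
direct = (eval_poly(f, q * x) - eval_poly(f, x)) / ((q - 1) * x)
via_rule = eval_poly(q_derivative(f, q), x)
print(direct, via_rule)  # 108 108
```

Setting q = 1 recovers the ordinary derivative, since [n]_1 = n.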

arXiv:2402.09155

Joint and Robust Beamforming Framework for Integrated Sensing and Communication Systems

Authors: Jinseok Choi, Jeonghun Park, Namyoon Lee, Ahmed Alkhateeb

Abstract: Integrated sensing and communication (ISAC) is widely recognized as a fundamental enabler for future wireless communications. In this paper, we present a joint communication and radar beamforming framework for maximizing the sum spectral efficiency (SE) while guaranteeing desired radar performance with imperfect channel state information (CSI) in multi-user and multi-target ISAC systems. To this end, we adopt either a radar transmit beam mean square error (MSE) or a receive signal-to-clutter-plus-noise ratio (SCNR) as the radar performance constraint of a sum SE maximization problem. To resolve inherent challenges such as non-convexity and imperfect CSI, we reformulate the problems and identify first-order optimality conditions for the joint radar and communication beamformer. Turning the condition into a nonlinear eigenvalue problem with eigenvector dependency (NEPv), we develop an alternating method that finds the joint beamformer through power iteration and a Lagrangian multiplier through binary search. The proposed framework encompasses both radar metrics and is robust to channel estimation error with low complexity. Simulations validate the proposed methods. In particular, we observe that the MSE and SCNR constraints exhibit complementary performance depending on the operating environment, which highlights the importance of the proposed comprehensive and robust optimization framework.

Submitted 14 February, 2024; originally announced February 2024.

Comments: submitted for possible IEEE publication

arXiv:2402.08858

Spin-coupled molecular orbitals: chemical intuition meets quantum chemistry

Authors: Daniel Marti-Dafcik, Nicholas Lee, Hugh G. A. Burton, David P. Tew

Abstract: Molecular orbital (MO) theory is powerful both as a conceptual tool for understanding chemical bonding and as a theoretical framework for ab initio quantum chemistry. Despite its undoubted success, MO theory has well-documented shortcomings, most notably that it fails to correctly describe diradical states and homolytic bond fission. In this contribution, we introduce a generalised MO theory that includes spin-coupled radical states. We show through archetypical examples that when bonds break, the electronic state transitions between a small number of valence configurations, characterised by occupation of both delocalised molecular orbitals and spin-coupled localised orbitals. Our theory provides a model for chemical bonding that is both chemically intuitive and qualitatively accurate when combined with ab initio theory. Although exploitation of our theory presents significant challenges for classical computing, the predictable structure of spin-coupled states is ideally suited to algorithms that exploit quantum computers. Our approach provides a systematic route to overcoming the initial state overlap problem and unlocking the potential of quantum computational chemistry.

Submitted 13 February, 2024; originally announced February 2024.

Comments: 11 pages, 5 figures

arXiv:2402.07381

RIS-Empowered LEO Satellite Networks for 6G: Promising Usage Scenarios and Future Directions

Authors: Mesut Toka, Byungju Lee, Jaehyup Seong, Aryan Kaushik, Juhwan Lee, Jungwoo Lee, Namyoon Lee, Wonjae Shin, H. Vincent Poor

Abstract: Low-Earth orbit (LEO) satellite systems have been deemed a promising key enabler for current 5G and the forthcoming 6G wireless networks. Such LEO satellite constellations can provide worldwide three-dimensional coverage, high data rates, and scalability, thus enabling truly ubiquitous connectivity. Meanwhile, another promising technology, reconfigurable intelligent surfaces (RISs), has emerged with favorable features, such as flexible deployment, cost & power efficiency, lower transmission delay, noise-free nature, and in-band full-duplex structure. LEO satellite networks have many practical imperfections and limitations; however, exploiting RISs has been shown to be a potential solution to overcome these challenges. In particular, RISs can enhance link quality, reduce the Doppler shift effect, and mitigate inter-/intra-beam interference. In this article, we delve into exploiting RISs in LEO satellite networks. First, we present a holistic overview of LEO satellite communication and RIS technology, highlighting potential benefits and challenges. Second, we describe promising usage scenarios and applications in detail. Finally, we discuss potential future directions and challenges for RIS-empowered LEO networks, offering futuristic visions of the upcoming 6G era.

Submitted 11 February, 2024; originally announced February 2024.

Comments: 18 pages, 5 figures, Paper accepted by IEEE Communications Magazine

arXiv:2402.04248

Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

Authors: Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos

Abstract: State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of multi-head attention. Although SSMs exhibit competitive performance, their in-context learning (ICL) capabilities, a remarkable emergent property of modern language models that enables task execution without parameter optimization, remain underexplored compared to Transformers. In this study, we evaluate the ICL performance of SSMs, focusing on Mamba, against Transformer models across various tasks. Our results show that SSMs perform comparably to Transformers in standard regression ICL tasks, while outperforming them in tasks like sparse parity learning. However, SSMs fall short in tasks involving non-standard retrieval functionality. To address these limitations, we introduce a hybrid model, MambaFormer, that combines Mamba with attention blocks, surpassing individual models in tasks where they struggle independently. Our findings suggest that hybrid architectures offer promising avenues for enhancing ICL in language models.

Submitted 25 April, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: Changes in v2: experiments on formal language ICL and explorations of width vs. depth on ICL; code repo available (24 pages, 10 figures)
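
Sparse parity, one of the tasks where the abstract reports SSMs outperforming Transformers, is easy to state concretely. The sketch below generates one in-context prompt under a common formulation that is assumed here, not taken from the paper: a secret set of k coordinates is drawn per prompt, and each example x in {-1, +1}^n is labeled with the product of x over that set. The sizes `n_bits=10`, `k=2`, and `n_examples=40` are hypothetical.

```python
import random


def sparse_parity_prompt(n_bits, k, n_examples, rng):
    """Build one in-context learning prompt for a k-sparse parity task.

    A secret subset S of k coordinates is drawn once per prompt; each example is
    x in {-1, +1}^n_bits with label y = prod_{i in S} x_i.
    """
    secret = rng.sample(range(n_bits), k)
    examples = []
    for _ in range(n_examples):
        x = [rng.choice((-1, 1)) for _ in range(n_bits)]
        y = 1
        for i in secret:
            y *= x[i]  # parity of the selected coordinates
        examples.append((x, y))
    return secret, examples


rng = random.Random(0)
secret, examples = sparse_parity_prompt(n_bits=10, k=2, n_examples=40, rng=rng)
```

A model solves the task in context if, after seeing the labeled examples, it predicts the label of a fresh query drawn with the same secret set.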

arXiv:2402.01340

SignSGD with Federated Defense: Harnessing Adversarial Attacks through Gradient Sign Decoding

Authors: Chanho Park, Namyoon Lee

Abstract: Distributed learning is an effective approach to accelerate model training using multiple workers. However, substantial communication delays emerge between workers and a parameter server due to the massive cost of communicating gradients. SignSGD with majority voting (signSGD-MV) is a simple yet effective optimizer that reduces communication costs through one-bit quantization, yet its convergence rate decreases considerably as the number of adversarial workers increases. In this paper, we show that the convergence rate is invariant as the number of adversarial workers increases, provided that the number of adversarial workers is smaller than that of benign workers. The key idea behind this counter-intuitive result is our novel signSGD with federated defense (signSGD-FD). Unlike traditional approaches, signSGD-FD exploits the gradient information sent by adversarial workers with proper weights, which are obtained through gradient sign decoding. Experimental results demonstrate that signSGD-FD achieves superior convergence rates over traditional algorithms in various adversarial attack scenarios.

Submitted 2 February, 2024; originally announced February 2024.
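
To make the contrast concrete, the sketch below compares signSGD-MV's coordinate-wise majority vote with a weighted vote. The weights log((1 - p) / p), where p is a worker's sign-flip probability, are borrowed from classical weighted-majority decoding as an assumption; the paper obtains its weights through gradient sign decoding, and its exact rule may differ. Flip probabilities are assumed known here, and the three-worker example is hypothetical.

```python
import math


def sign(v):
    """Return +1, -1, or 0 (for an exact tie)."""
    return (v > 0) - (v < 0)


def majority_vote(worker_signs):
    """signSGD-MV aggregation: coordinate-wise sign of the summed worker signs."""
    return [sign(sum(col)) for col in zip(*worker_signs)]


def weighted_vote(worker_signs, flip_probs):
    """Weighted vote with log-likelihood-ratio weights log((1 - p) / p).

    A worker that flips signs with probability p > 0.5 gets a negative weight,
    so its mostly wrong signs still carry information once inverted.
    """
    weights = [math.log((1 - p) / p) for p in flip_probs]
    return [sign(sum(w * s for w, s in zip(weights, col))) for col in zip(*worker_signs)]


# True gradient signs are [+1, +1]; the second benign worker errs on coordinate 2,
# and the adversary (flip probability 0.9) sends flipped signs.
signs = [[1, 1], [1, -1], [-1, -1]]
flip_probs = [0.1, 0.1, 0.9]
print(majority_vote(signs))               # [1, -1]
print(weighted_vote(signs, flip_probs))   # [1, 1]
```

Majority voting gets the second coordinate wrong because the adversary's vote sides with the benign worker that erred, while the negative weight turns the adversary's flipped sign back into useful evidence.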

arXiv:2401.05193

Experiment Planning with Function Approximation

Authors: Aldo Pacchiano, Jonathan N. Lee, Emma Brunskill

Abstract: We study the problem of experiment planning with function approximation in contextual bandit problems. In settings where there is a significant overhead to deploying adaptive algorithms -- for example, when the execution of the data collection policies is required to be distributed, or a human in the loop is needed to implement these policies -- producing in advance a set of policies for data collection is paramount. We study the setting where a large dataset of contexts, but not rewards, is available and may be used by the learner to design an effective data collection strategy. Although this problem has been well studied when rewards are linear, results are still missing for more complex reward models. In this work we propose two experiment planning strategies compatible with function approximation. The first is an eluder planning and sampling procedure that can recover optimality guarantees depending on the eluder dimension of the reward function class. For the second, we show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small. We conclude by introducing a statistical gap that fleshes out the fundamental differences between planning and adaptive learning, and we provide results for planning with model selection.

Submitted 10 January, 2024; originally announced January 2024.

Comments: 10 pages main

arXiv:2401.04724

A parametrically programmable delay line for microwave photons

Authors: Takuma Makihara, Nathan Lee, Yudan Guo, Wenyan Guan, Amir H. Safavi-Naeini

Abstract: Delay lines capable of storing quantum information are crucial for advancing quantum repeaters and hardware-efficient quantum computers. Traditionally, they are physically realized as extended systems that support wave propagation, such as waveguides. However, such delay lines typically provide limited control over the propagating fields. Here, we introduce a parametrically addressed delay line (PADL) for microwave photons that provides a high level of control over the dynamics of stored pulses, enabling us to arbitrarily delay or even swap pulses. By parametrically driving a three-wave mixing superconducting circuit element that is weakly hybridized with an ensemble of resonators, we engineer a spectral response that simulates that of a physical delay line, while providing fast control over the delay line's properties and granting access to its internal modes. We illustrate the main features of the PADL, operating on pulses with energies on the order of a single photon, through a series of experiments, which include choosing which photon echo to emit, translating pulses in time, and swapping two pulses. We also measure the noise added to the delay line by our parametric interactions and find that the added noise is much less than one photon.

Submitted 11 January, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

Comments: 13 pages, 9 figures; v2: minor update of references

arXiv:2312.13289

Stoichiometry Representation Learning with Polymorphic Crystal Structures

Authors: Namkyeong Lee, Heewoong Noh, Gyoung S. Na, Tianfan Fu, Jimeng Sun, Chanyoung Park

Abstract: Although machine learning (ML) has recently achieved success in materials science, that success heavily relies on the structural description of crystals, which is itself computationally demanding and occasionally unattainable. Stoichiometry descriptors can be an alternative approach, which reveals the ratio between elements involved to form a certain compound without any structural information. However, it is not trivial to learn the representations of stoichiometry due to polymorphism, a phenomenon in materials science whereby a single stoichiometry can exist in multiple structural forms due to the flexibility of atomic arrangements, inducing uncertainties in representation. To this end, we propose PolySRL, which learns the probabilistic representation of stoichiometry by utilizing the readily available structural information, whose uncertainty reveals the polymorphic structures of stoichiometry. Extensive experiments on sixteen datasets demonstrate the superiority of PolySRL, and the analysis of uncertainties sheds light on the applicability of PolySRL in real-world material discovery. The source code for PolySRL is available at https://github.com/Namkyeong/PolySRL_AI4Science.

Submitted 17 November, 2023; originally announced December 2023.

Comments: NeurIPS 2023 AI4Science Workshop
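
The stoichiometry descriptor mentioned above, the ratio of elements in a compound, can be computed from a chemical formula alone. The sketch below parses flat formulas such as `Fe2O3` into element mole fractions; it assumes no parentheses, hydrates, or charges, and it illustrates only this input-side descriptor, not PolySRL's probabilistic representation. The function name is hypothetical.

```python
import re
from collections import defaultdict

# One element symbol followed by an optional (possibly fractional) count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")


def element_fractions(formula):
    """Map a flat formula such as 'Fe2O3' to element mole fractions.

    Assumes no parentheses, hydrates, or charges; a missing count means 1.
    """
    counts = defaultdict(float)
    for element, amount in TOKEN.findall(formula):
        counts[element] += float(amount) if amount else 1.0
    total = sum(counts.values())
    return {el: n / total for el, n in counts.items()}


print(element_fractions("Fe2O3"))  # {'Fe': 0.4, 'O': 0.6}
```

Every polymorph of Fe2O3 maps to this same descriptor, which is exactly the ambiguity the abstract attributes to polymorphism.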

arXiv:2312.13133

New dimer integrable systems and defects in five dimensional gauge theory

Authors: Norton Lee

Abstract: We study the relation between the quantum integrable systems derived from dimer graphs and five-dimensional $\mathcal{N}=1$ supersymmetric gauge theories on $S^1 \times \mathbb{R}^4$. We construct integrable systems based on new dimer graphs obtained by modifying the hexagon dimer diagram, and we study the gauge theories corresponding to the newly proposed integrable systems. We examine three types of defects -- a line defect, a canonical co-dimension-two defect and a monodromy defect -- in five-dimensional gauge theory with $\mathcal{N}=1$ supersymmetry and $Ω_{\varepsilon_1,\varepsilon_2}$-background. In the $\varepsilon_2 \to 0$ limit, we identify the canonical co-dimension-two defect as satisfying the Baxter T-Q equation of the generalized $A$-type dimer integrable system, and the monodromy defect as a common eigenstate of the commuting Hamiltonians, with the eigenvalues given by the expectation value of the BPS Wilson loop in the anti-symmetric representation of the bulk gauge group.

Submitted 16 June, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: 45+13 pages, 12 figures, correct typos, add citation

Report number: CGP-23022

arXiv:2312.06985 [pdf, ps, other]

Ergodic Secrecy Rate Analysis for LEO Satellite Downlink Networks

Authors: Daeun Kim, Namyoon Lee

Abstract: Satellite networks are recognized as an effective solution for ensuring seamless worldwide connectivity across a diverse range of applications. However, the broad coverage and broadcast nature of satellite networks also expose them to security challenges, and there is a lack of analytical understanding of the secrecy performance of these networks. This paper presents a secrecy rate analysis for downlink low Earth orbit (LEO) satellite networks by modeling the spatial distributions of satellites, users, and potential eavesdroppers as homogeneous Poisson point processes on concentric spheres. Specifically, we provide an analytical expression for the ergodic secrecy rate of a typical downlink user in terms of the satellite network parameters, fading parameters, and path-loss exponent. Simulation results verify the exactness of the derived expressions, and we find that the optimal satellite altitude increases with eavesdropper density.
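
The ergodic secrecy rate is commonly defined as the expectation of the instantaneous secrecy rate $[\log_2(1+\mathrm{SNR}_b) - \log_2(1+\mathrm{SNR}_e)]^+$, where $b$ is the legitimate user and $e$ the eavesdropper. The paper derives this in closed form for Poisson point processes on spheres; the sketch below only estimates the same quantity by Monte Carlo under a much simpler toy model, assuming Rayleigh fading (unit-mean exponential power gains), fixed normalized distances, a single eavesdropper, and path loss $d^{-\alpha}$. The function name and all parameter values are hypothetical.

```python
import numpy as np


def ergodic_secrecy_rate(snr_tx_db, d_user, d_eve, alpha, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[(log2(1+SNR_b) - log2(1+SNR_e))^+] in bit/s/Hz.

    Toy model (assumption): Rayleigh fading, fixed user and eavesdropper
    distances, and path loss d**(-alpha). Not the paper's PPP-based analysis.
    """
    rng = np.random.default_rng(seed)
    snr_tx = 10.0 ** (snr_tx_db / 10.0)
    g_user = rng.exponential(1.0, n_samples)  # fading power gain of the user link
    g_eve = rng.exponential(1.0, n_samples)   # fading power gain of the eavesdropper link
    snr_user = snr_tx * g_user * d_user ** (-alpha)
    snr_eve = snr_tx * g_eve * d_eve ** (-alpha)
    # The instantaneous secrecy rate is clipped at zero when the eavesdropper's link is stronger.
    rate = np.maximum(np.log2(1.0 + snr_user) - np.log2(1.0 + snr_eve), 0.0)
    return float(rate.mean())


r_close = ergodic_secrecy_rate(20.0, d_user=1.0, d_eve=1.0, alpha=2.0)
r_far = ergodic_secrecy_rate(20.0, d_user=1.0, d_eve=3.0, alpha=2.0)
print(r_close, r_far)  # the farther eavesdropper yields the larger secrecy rate
```

Because both calls reuse the same seed, the fading samples are identical and only the eavesdropper distance changes, which makes the comparison between the two estimates direct.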

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.04511 [pdf, other]

An LLM Compiler for Parallel Function Calling

Authors: Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

Abstract: The reasoning capabilities of recent LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has allowed LLMs to select and coordinate multiple functions based on the context to tackle more complex problems. However, current methods for function calling often require sequential reasoning and acting for each function, which can result in high latency, high cost, and sometimes inaccurate behavior. To address this, we introduce LLMCompiler, which executes functions in parallel to efficiently orchestrate multiple function calls. Drawing inspiration from the principles of classical compilers, LLMCompiler enables parallel function calling with three components: (i) a Function Calling Planner, which formulates execution plans for function calling; (ii) a Task Fetching Unit, which dispatches function calling tasks; and (iii) an Executor, which executes these tasks in parallel. LLMCompiler automatically generates an optimized orchestration for the function calls and can be used with both open-source and closed-source models. We have benchmarked LLMCompiler on a range of tasks with different patterns of function calling and observe consistent latency speedups of up to 3.7x, cost savings of up to 6.7x, and accuracy improvements of up to ~9% compared to ReAct. Our code is available at https://github.com/SqueezeAILab/LLMCompiler.
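
The planner/fetcher/executor split can be illustrated with a small dependency-graph scheduler. This is a minimal sketch, not LLMCompiler's code: the `Task` and `run_plan` names are hypothetical, the plan is written by hand instead of by an LLM planner, and the "tools" are stand-in coroutines. Every task whose dependencies have finished is dispatched together with `asyncio.gather`, and outputs are passed to dependent tasks as arguments.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class Task:
    """One function call in a plan; `deps` lists the tasks whose outputs it consumes."""
    name: str
    fn: Callable[..., Awaitable[Any]]
    deps: list[str] = field(default_factory=list)


async def run_plan(tasks: list[Task]) -> dict[str, Any]:
    """Run a dependency graph of async calls, launching all ready tasks concurrently."""
    pending = {t.name: t for t in tasks}
    results: dict[str, Any] = {}
    while pending:
        # "Fetch" every task whose dependencies have already produced results.
        ready = [t for t in pending.values() if all(d in results for d in t.deps)]
        if not ready:
            raise ValueError("plan contains a cycle or an unknown dependency")
        # Independent calls run in parallel; their outputs feed later tasks as arguments.
        outputs = await asyncio.gather(*(t.fn(*(results[d] for d in t.deps)) for t in ready))
        for t, out in zip(ready, outputs):
            results[t.name] = out
            del pending[t.name]
    return results


async def lookup_a() -> int:  # stand-in for a slow tool call, e.g. a search API
    await asyncio.sleep(0.1)
    return 3


async def lookup_b() -> int:
    await asyncio.sleep(0.1)
    return 4


async def add(x: int, y: int) -> int:
    return x + y


# A hand-written plan: "a" and "b" are independent, so they run together.
plan = [Task("a", lookup_a), Task("b", lookup_b), Task("sum", add, ["a", "b"])]
results = asyncio.run(run_plan(plan))
print(results["sum"])  # 7
```

This scheduler works in waves, so each wave waits for its slowest call; a finer-grained fetcher could dispatch a task as soon as its own dependencies complete, and LLMCompiler's Task Fetching Unit may well schedule differently.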

Submitted 4 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: ICML 2024

arXiv:2312.03901 [pdf, other]

Redrawing the 2012 map of the Maryland congressional districts

Authors: Noah Lee, Hyunwoo Park, Sangho Shim

Abstract: Gerrymandering is the practice of drawing biased electoral maps that manipulate the voter population to gain an advantage. Gerrymandering most recently became a prominent issue in 2019, when the U.S. Supreme Court decided that the court does not have the authority to dictate how district maps are drawn and that state legislators are the ones who should come up with electoral district plans. We solve the political districting problem and redraw the 2012 map of the Maryland congressional districts, which raised the issue in 2019.

Submitted 6 December, 2023; originally announced December 2023.

Comments: 8 pages, to be submitted to IISE 2024 Annual Conference Proceedings

MSC Class: 90

arXiv:2312.03684 [pdf]

Spontaneous Chirality Flipping in an Orthogonal Spin-Charge Ordered Topological Magnet

Authors: H. Miao, J. Bouaziz, G. Fabbris, W. R. Meier, F. Z. Yang, H. X. Li, C. Nelson, E. Vescovo, S. Zhang, A. Christianson, H. N. Lee, Y. Zhang, C. D. Batista, S. Blügel

Abstract: The asymmetric distribution of chiral objects with opposite chirality is of great fundamental interest in fields ranging from molecular biology to particle physics. In quantum materials, chiral states can be built on inversion-symmetry-breaking lattice structures or emerge from spontaneous magnetic ordering induced by competing interactions. Although the handedness of a chiral state can be changed through external fields, spontaneous chirality flipping has yet to be discovered. In this letter, we present experimental evidence of chirality flipping driven by changing temperature in the topological magnet EuAl$_4$, which features orthogonal spin and charge density waves (SDW/CDW). Using circular dichroism of Bragg peaks in resonant magnetic x-ray scattering, we find that the chirality of the helical SDW flips through a first-order phase transition accompanied by a modified SDW wavelength. Intriguingly, we observe that the CDW couples strongly with the SDW and displays a rare commensurate-to-incommensurate transition at the chirality flipping temperature. Combining these measurements with first-principles calculations and angle-resolved photoemission spectroscopy, we establish the Fermi surface origin of the helical SDW with intertwined spin, charge, and lattice degrees of freedom in EuAl$_4$. Our results reveal an unprecedented spontaneous chirality flipping and lay the groundwork for new functional manipulation of chirality through momentum-dependent spin-charge-lattice interactions.

Submitted 19 February, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Supplementary materials are available from the corresponding author upon request

arXiv:2312.03055 [pdf, other]

Front-row seat of the recent R Aqr periastron passage: X-ray multi-epoch spectral and spatial analysis

Authors: A. Sacchi, M. Karovska, J. Raymond, V. Kashyap, T. J. Gaetz, W. Hack, J. Kennea, N. Lee, A. J Mioduszewski, M. J Claussen

Abstract: We report on the X-ray spectral and spatial evolution of the symbiotic star R Aqr. Through a multi-epoch observational campaign performed with Chandra between 2017 and 2022, we study the X-ray emission of this binary system, composed of an evolved red giant star and a white dwarf (WD). This analysis is particularly timely because the WD approached periastron in late 2018/early 2019, when mass transfer, jet emission, and outburst phenomena are expected. Through detailed spectral analysis, we detect a significant rise in the soft X-ray (0.5-2 keV) emission of R Aqr, likely linked to jet emission, followed by a decay towards the previous quiescent state. The hard X-ray emission (5-8 keV) is not immediately affected by the periastron passage; after maintaining the same flux level between 2017 and 2021, the hard component rapidly decays after 2022. Possible explanations for this are a change in the reflection properties of the medium surrounding the binary, obscuration of the central region by material ejected during the periastron passage, or even the partial or complete destruction of the inner regions of the accretion disc surrounding the WD. In addition to this activity in the central region, we also detect extended emission, likely linked to a hot spot in a jet emitted before the outburst, which can be observed moving away from the system's central region.

Submitted 5 December, 2023; originally announced December 2023.

Comments: 14 pages, 9 figures, 3 tables. Accepted for publication in ApJ

arXiv:2311.18172 [pdf, other]

Multi-Rate Variable-Length CSI Compression for FDD Massive MIMO

Authors: Bumsu Park, Heedong Do, Namyoon Lee

Abstract: In frequency-division-duplexing (FDD) systems, channel state information (CSI) needs to be fed back from the user terminal to the base station. This feedback overhead becomes problematic as the number of antennas grows. To alleviate this issue, we propose a flexible CSI compression method using a variational autoencoder (VAE) with an entropy bottleneck structure, which supports multi-rate and variable-length operation. A numerical study confirms that the proposed method outperforms existing CSI compression techniques in terms of normalized mean squared error.
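
The comparison metric, normalized mean squared error (NMSE), is usually defined in the CSI-feedback literature as $\mathbb{E}\left[\lVert \mathbf{H} - \hat{\mathbf{H}} \rVert_F^2 / \lVert \mathbf{H} \rVert_F^2\right]$ and reported in dB. The abstract does not state the exact averaging convention, so the per-sample-then-average form below is an assumption; the function name and the test matrices are hypothetical.

```python
import numpy as np


def nmse_db(h_true: np.ndarray, h_est: np.ndarray) -> float:
    """Average of per-sample ||H - H_hat||^2 / ||H||^2, in dB. Axis 0 indexes samples."""
    axes = tuple(range(1, h_true.ndim))
    err = np.sum(np.abs(h_true - h_est) ** 2, axis=axes)  # squared reconstruction error
    ref = np.sum(np.abs(h_true) ** 2, axis=axes)          # energy of the true channel
    return float(10.0 * np.log10(np.mean(err / ref)))


h = np.ones((2, 32, 32), dtype=complex)  # two hypothetical CSI matrices
print(nmse_db(h, 0.9 * h))  # about -20 dB: every entry has a 10% amplitude error
```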

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.17539 [pdf, other]

Critical Influence of Overparameterization on Sharpness-aware Minimization

Authors: Sungbin Shin, Dongyeop Lee, Maksym Andriushchenko, Namhoon Lee

Abstract: Training an overparameterized neural network can yield minimizers with different generalization capabilities despite the same level of training loss. Meanwhile, given evidence of a strong correlation between the sharpness of minima and their generalization errors, increasing efforts have been made to develop optimization methods that explicitly find flat minima as more generalizable solutions. Despite its contemporary relevance, however, exactly how this sharpness-aware minimization (SAM) strategy is affected by overparameterization has not yet been studied much. Hence, in this work, we analyze SAM under overparameterization of varying degrees and present both empirical and theoretical results that indicate a critical influence of overparameterization on SAM. First, we conduct extensive numerical experiments across vision, language, graph, and reinforcement learning domains and show that SAM consistently improves with overparameterization. Next, we attribute this phenomenon to the interplay between the enlarged solution space and the increased implicit bias from overparameterization. Further, we prove multiple theoretical benefits of overparameterization for SAM: it attains (i) minima with more uniform Hessian moments compared to SGD, (ii) much faster convergence at a linear rate, and (iii) lower test error for two-layer networks.
Last but not least, we discover that the effect of overparameterization is more pronounced in practical settings with label noise and sparsity, and yet sufficient regularization is necessary.
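
For readers unfamiliar with SAM itself, the basic update (as introduced by Foret et al.) first moves the weights to an approximate worst-case point $w + \rho\, g/\lVert g \rVert$ and then descends from $w$ using the gradient at that point. The sketch below applies this update to a toy two-dimensional quadratic with one sharp and one flat direction; it illustrates only the optimizer, not this paper's overparameterization experiments, and the learning rate, radius, and loss are arbitrary choices.

```python
import numpy as np


def sam_step(w, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    """One SAM update: perturb w toward higher loss within an L2 ball of radius rho,
    then take a gradient step from w using the gradient at the perturbed point."""
    g = grad_fn(w)
    e = rho * g / (np.linalg.norm(g) + eps)  # first-order worst-case perturbation
    return w - lr * grad_fn(w + e)


# Toy quadratic loss 0.5 * w^T A w with one sharp (10) and one flat (1) direction.
A = np.diag([10.0, 1.0])


def loss(w):
    return 0.5 * w @ A @ w


def grad(w):
    return A @ w


w = np.array([1.0, 1.0])
for _ in range(200):
    w = sam_step(w, grad)
print(loss(w))  # small; with a fixed rho, SAM settles in a neighborhood of the minimum
```

With a constant `rho`, the iterates hover within roughly `rho` of the minimizer instead of converging exactly, which is why the printed loss is small but not zero.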

Submitted 19 June, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.13807 [pdf, other]

doi 10.1093/mnras/stad3199

rrlfe: Software for Generating and Applying Metallicity Calibrations for RR Lyrae Variable Stars Across a Wide Range of Phases and Temperatures

Authors: Eckhart Spalding, Ronald Wilhelm, Nathan De Lee, Stacy Long, Timothy C. Beers, Vinicius M. Placco, John Kielkopf, Young Sun Lee, Joshua Pepper, Kenneth Carrell

Abstract: RR Lyrae stars play a central role in tracing phase-space structures within the Milky Way because they are easy to identify, are relatively luminous, and are found in large numbers in the Galactic bulge, disk, and halo. In this work, we present a new set of spectroscopic metallicity calibrations that use the equivalent widths of the Ca II K and Balmer H-gamma and H-delta lines to calculate metallicities from low-resolution spectra. This work builds on an earlier calibration by Layden, extending the range of equivalent widths over which the mapping between Ca II K and the Balmer lines applies. We have developed the software rrlfe to apply this calibration to spectra in a consistent, reproducible, and extensible manner. This software is open source and available to the community, and the calibration can be updated with additional datasets in the future.
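
To show how such a calibration turns two measured equivalent widths into a metallicity, the sketch below assumes a bilinear calibration surface of the kind associated with Layden's method, $K = a + bH + c\,[\mathrm{Fe/H}] + d\,H\,[\mathrm{Fe/H}]$, where $K$ is the Ca II K width and $H$ a Balmer-line width (for example, an average over H-gamma and H-delta). The abstract does not give rrlfe's functional form, so treat the form as an assumption; the coefficients are hypothetical placeholders, not rrlfe's fitted values, and the function names are not rrlfe's API.

```python
def ca_k_width(feh, h_ew, a, b, c, d):
    """Forward model (assumed form): K = a + b*H + c*[Fe/H] + d*H*[Fe/H]."""
    return a + b * h_ew + c * feh + d * h_ew * feh


def feh_from_widths(k_ew, h_ew, a, b, c, d):
    """Solve the assumed calibration for [Fe/H] given measured K and H widths (angstroms)."""
    denom = c + d * h_ew
    if abs(denom) < 1e-12:
        raise ValueError("calibration is degenerate at this Balmer width")
    return (k_ew - a - b * h_ew) / denom


coeffs = dict(a=13.0, b=-1.0, c=2.5, d=0.1)  # hypothetical coefficients, not rrlfe's fit
k = ca_k_width(-1.5, 6.0, **coeffs)          # synthetic star: [Fe/H] = -1.5, H = 6.0 angstroms
print(feh_from_widths(k, 6.0, **coeffs))     # recovers approximately -1.5
```

Because the assumed surface is linear in [Fe/H] at fixed H, the inversion is closed-form; a calibration with higher-order terms would need a numerical root finder instead.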

Submitted 22 November, 2023; originally announced November 2023.

Comments: Published

Journal ref: Monthly Notices of the Royal Astronomical Society, vol. 527, issue 1, January 2024, p. 828

arXiv:2311.12856 [pdf, other]

Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer

Authors: Namkyeong Lee, Heewoong Noh, Sungwon Kim, Dongmin Hyun, Gyoung S. Na, Chanyoung Park

Abstract: The density of states (DOS) is a spectral property of crystalline materials that provides fundamental insights into various characteristics of the materials. While previous works mainly focus on obtaining high-quality representations of crystalline materials for DOS prediction, we focus on predicting the DOS from the obtained representations by reflecting the nature of DOS: DOS determines the general distribution of states as a function of energy. That is, DOS is determined not solely by the crystalline material but also by the energy levels, a dependence that previous works have neglected. In this paper, we propose DOSTransformer, which integrates heterogeneous information obtained from the crystalline materials and the energies via a multi-modal transformer, thereby modeling the complex relationships between the atoms in the crystalline materials and various energy levels for DOS prediction. Moreover, we propose to utilize prompts that guide the model to learn crystal-structural-system-specific interactions between crystalline materials and energies. Extensive experiments on two types of DOS, i.e., phonon DOS and electron DOS, across various real-world scenarios demonstrate the superiority of DOSTransformer. The source code for DOSTransformer is available at https://github.com/HeewoongNoh/DOSTransformer.
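
One way to let "energy levels" and "atoms" interact, as the abstract describes, is cross-attention in which each energy-grid token queries the atoms of a crystal and the resulting vector is projected to a DOS value at that energy. The sketch below shows only that generic mechanism with random, untrained weights; DOSTransformer's actual layers, prompts, and dimensions are not reproduced, and every name and size here is hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def energy_atom_cross_attention(energy_emb, atom_emb, w_q, w_k, w_v):
    """Each energy-level token attends over the atoms of one crystal.

    energy_emb: (n_energies, d) embeddings of the energy grid
    atom_emb:   (n_atoms, d) atom representations from any crystal encoder
    Returns (n_energies, d): one material-conditioned vector per energy level.
    """
    q = energy_emb @ w_q
    k = atom_emb @ w_k
    v = atom_emb @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)  # (n_energies, n_atoms)
    return attn @ v


rng = np.random.default_rng(0)
d = 16
energies = rng.normal(size=(51, d))  # hypothetical 51-point energy grid
atoms = rng.normal(size=(8, d))      # hypothetical 8-atom unit cell
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
w_out = rng.normal(size=(d,)) / np.sqrt(d)
dos = energy_atom_cross_attention(energies, atoms, w_q, w_k, w_v) @ w_out  # one value per energy
```

The point of the design is that the prediction at each energy depends on both the energy token and the atoms, rather than decoding a fixed-length vector from the material representation alone.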

Submitted 22 November, 2023; v1 submitted 24 October, 2023; originally announced November 2023.

Comments: NeurIPS 2023. arXiv admin note: text overlap with arXiv:2303.07000

arXiv:2311.05860 [pdf, other]

$\mathcal{O}\left(mα^2 (Zα)^6\right)$ contribution to Lamb shift from radiative corrections to the Wichmann-Kroll potential

Authors: Petr A. Krachkov, Roman N. Lee

Abstract: We derive an analytical expression for the contribution of order $mα^2 (Zα)^6$ to the hydrogen Lamb shift that comes from the diagrams for radiative corrections to the Wichmann-Kroll potential. We use modern methods of multiloop calculations based on integration-by-parts (IBP) reduction, the dimensional recurrence and analyticity (DRA) method, and differential equations.

Submitted 9 November, 2023; originally announced November 2023.

Comments: 9 pages

arXiv:2311.04912 [pdf]

doi 10.1038/s41597-024-02959-0

ezBIDS: Guided standardization of neuroimaging data interoperable with major data archives and platforms

Authors: Daniel Levitas, Soichi Hayashi, Sophia Vinci-Booher, Anibal Heinsfeld, Dheeraj Bhatia, Nicholas Lee, Anthony Galassi, Guiomar Niso, Franco Pestilli

Abstract: Data standardization has become one of the leading methods neuroimaging researchers rely on for data sharing and reproducibility. Data standardization promotes a common framework through which researchers can utilize others' data. Yet, as of today, formatting datasets to adhere to community best practices requires technical expertise involving coding and considerable knowledge of file formats and standards. We describe ezBIDS, a tool for converting neuroimaging data and associated metadata to the Brain Imaging Data Structure (BIDS) standard. ezBIDS provides four unique features: (1) no installation or programming requirements; (2) handling of both imaging and task events data and metadata; (3) automated inference and guidance for adherence to BIDS; and (4) multiple data management options: download BIDS data to a local system, or transfer them to OpenNeuro.org or brainlife.io. In sum, ezBIDS requires neither coding proficiency nor knowledge of BIDS and is the first BIDS tool to offer guided standardization, support for task events conversion, and interoperability with OpenNeuro and brainlife.io.
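
To make concrete what "converting to BIDS" involves at the file level, the sketch below builds a BIDS-style relative path for a functional run from a few entities (subject, optional session, task, optional run) in the standard's entity order. It is not ezBIDS code and not its API; the function name is hypothetical, and it covers only a small subset of the BIDS naming rules (no acquisition, direction, or echo entities, and no sidecar JSON).

```python
from pathlib import PurePosixPath


def bids_func_path(sub, task, ses=None, run=None, suffix="bold", ext=".nii.gz"):
    """Build a BIDS-style relative path for a functional run.

    Entities appear in BIDS order (sub, ses, task, run); optional ones are omitted.
    """
    entities = [f"sub-{sub}"]
    if ses is not None:
        entities.append(f"ses-{ses}")
    entities.append(f"task-{task}")
    if run is not None:
        entities.append(f"run-{run}")
    name = "_".join(entities) + f"_{suffix}{ext}"
    folder = PurePosixPath(f"sub-{sub}")
    if ses is not None:
        folder /= f"ses-{ses}"
    return str(folder / "func" / name)


print(bids_func_path("01", "rest", ses="01", run="01"))
# sub-01/ses-01/func/sub-01_ses-01_task-rest_run-01_bold.nii.gz
```

Getting details like entity order and folder layout right for every file is exactly the kind of bookkeeping that a guided tool such as ezBIDS is meant to take off the researcher's hands.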

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2311.03285 [pdf, other]

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Authors: Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica

Abstract: The "pretrain-then-finetune" paradigm is commonly adopted in the deployment of large language models. Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks, resulting in a substantial collection of LoRA adapters derived from one base model. We observe that this paradigm presents significant opportunities for batched inference during serving. To capitalize on these opportunities, we present S-LoRA, a system designed for the scalable serving of many LoRA adapters. S-LoRA stores all adapters in main memory and fetches the adapters used by the currently running queries to GPU memory. To use GPU memory efficiently and reduce fragmentation, S-LoRA introduces Unified Paging, which uses a unified memory pool to manage dynamic adapter weights with different ranks and KV cache tensors with varying sequence lengths. Additionally, S-LoRA employs a novel tensor parallelism strategy and highly optimized custom CUDA kernels for heterogeneous batching of LoRA computation. Collectively, these features enable S-LoRA to serve thousands of LoRA adapters on a single GPU or across multiple GPUs with a small overhead. Compared to state-of-the-art libraries such as HuggingFace PEFT and vLLM (with naive support of LoRA serving), S-LoRA can improve throughput by up to 4 times and increase the number of served adapters by several orders of magnitude.
As a result, S-LoRA enables scalable serving of many task-specific fine-tuned models and offers the potential for large-scale customized fine-tuning services. The code is available at https://github.com/S-LoRA/S-LoRA
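
The idea behind Unified Paging, one pool serving both adapter weights and KV cache so freed space from either can be reused by the other, can be illustrated with a toy page allocator. This is a minimal sketch, not S-LoRA's implementation. It assumes, for illustration, that each page holds one hidden-size vector, so a rank-r adapter occupies r pages and a sequence of s tokens occupies s KV pages; it also ignores layers, the K/V split, and GPU transfers. The class name and keys are hypothetical.

```python
class UnifiedPagePool:
    """Toy unified memory pool: one free list of fixed-size pages shared by
    LoRA adapter weights and KV-cache entries (illustrative only)."""

    def __init__(self, num_pages: int):
        self.free = list(range(num_pages))
        self.owner: dict[str, list[int]] = {}

    def alloc(self, key: str, n_pages: int) -> list[int]:
        """Reserve n_pages for `key` (an adapter or a request's KV cache)."""
        if n_pages > len(self.free):
            raise MemoryError(f"need {n_pages} pages, only {len(self.free)} free")
        pages = [self.free.pop() for _ in range(n_pages)]
        self.owner.setdefault(key, []).extend(pages)
        return pages

    def release(self, key: str) -> None:
        """Return all pages held by `key` to the shared free list."""
        self.free.extend(self.owner.pop(key, []))


pool = UnifiedPagePool(num_pages=64)
pool.alloc("adapter:rank8", 8)  # rank-8 adapter -> 8 hidden-size rows
pool.alloc("kv:request1", 40)   # 40-token sequence -> 40 KV rows
pool.release("adapter:rank8")   # evict the adapter; its pages return to the shared pool
pool.alloc("kv:request2", 24)   # fits only because the adapter's pages were recycled
```

With separate pools for adapters and KV cache, the last allocation would fail even though enough memory exists in total; a shared pool of uniformly sized pages avoids that kind of fragmentation.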

Submitted 5 June, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.02236 [pdf, other]

Robust Fine-Tuning of Vision-Language Models for Domain Generalization

Authors: Kevin Vogt-Lowell, Noah Lee, Theodoros Tsiligkaridis, Marc Vaillant

Abstract: Transfer learning enables the sharing of common knowledge among models for a variety of downstream tasks, but traditional methods suffer when training data is limited and produce narrow models incapable of generalizing effectively under distribution shifts. Foundation models have recently demonstrated impressive zero-shot inference capabilities and robustness under distribution shifts. However, zero-shot evaluation of these models has been predominantly confined to benchmarks with simple distribution shifts, limiting our understanding of their effectiveness under the more realistic shifts found in practice. Moreover, common fine-tuning methods for these models have yet to be evaluated against vision models in few-shot scenarios where training data is limited. To address these gaps, we present a new recipe for few-shot fine-tuning of the popular vision-language foundation model CLIP and evaluate its performance on challenging benchmark datasets with realistic distribution shifts from the WILDS collection. Our experiments demonstrate that, while zero-shot CLIP fails to match the performance of trained vision models on more complex benchmarks, few-shot CLIP fine-tuning outperforms its vision-only counterparts in terms of in-distribution and out-of-distribution accuracy at all levels of training data availability. This provides a strong incentive for adopting foundation models in few-shot learning applications that operate on real-world data. Code is available at https://github.com/mit-ll/robust-vision-language-finetuning

Submitted 3 November, 2023; originally announced November 2023.

Comments: In proceedings of the 27th IEEE High Performance Extreme Computing Conference

arXiv:2311.01817 [pdf, other]

Mitigating Framing Bias with Polarity Minimization Loss

Authors: Yejin Bang, Nayeon Lee, Pascale Fung

Abstract: Framing bias plays a significant role in exacerbating political polarization by distorting the perception of actual events. Media outlets with divergent political stances often use polarized language in their reporting of the same event. We propose a new loss function that encourages the model to minimize the polarity difference between the polarized input articles in order to reduce framing bias. Specifically, our loss is designed to jointly optimize the model to map between the polarity ends bidirectionally. Our experimental results demonstrate that incorporating the proposed polarity minimization loss leads to a substantial reduction in framing bias compared to a BART-based multi-document summarization model. Notably, we find that this approach is most effective when the model is trained to minimize the polarity loss associated with informational framing bias (i.e., the skewed selection of information to report).

Submitted 3 November, 2023; originally announced November 2023.

Comments: 11 pages, EMNLP2023

arXiv:2310.07101 [pdf, other]

Hybrid Arrays: How Many RF Chains Are Required to Prevent Beam Squint?

Authors: Heedong Do, Namyoon Lee, Robert W. Heath Jr, Angel Lozano

Abstract: With increasing frequencies, bandwidths, and array apertures, beam squint emerges as a serious impairment to beamforming. Fully digital arrays with true time delay per antenna element are a potential solution, but they require downconversion at each element. This paper shows that hybrid arrays can perform essentially as well as digital arrays once the number of radio-frequency chains exceeds a certain threshold that is far below the number of elements. The result is robust, also holding for suboptimal but highly appealing beamspace architectures.
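
The impairment the paper addresses can be seen in a few lines. For a uniform linear array with half-wavelength spacing at the carrier $f_c$, a phase-shifter beamformer applies phases that are correct only at $f_c$, so at another frequency $f$ the residual phase grows linearly across the elements and the beam drifts away from the target angle; with a true time delay per element, the applied phase scales with frequency and the gain is preserved. This sketch illustrates only that phenomenon, not the paper's hybrid-array analysis; the array size, band edge, and angle are arbitrary choices.

```python
import numpy as np


def array_gain(n_elems, f_over_fc, theta0_deg, true_time_delay=False):
    """Normalized gain |a(f, theta0)^H w| / N of a half-wavelength ULA steered to theta0.

    Phase shifters apply the phase that is correct only at the carrier f_c; with
    true time delays the applied phase scales with frequency, so there is no squint.
    """
    n = np.arange(n_elems)
    s = np.sin(np.deg2rad(theta0_deg))
    # Spacing is lambda_c / 2, so the propagation phase at frequency f is pi * n * (f/f_c) * sin(theta).
    channel_phase = np.pi * n * f_over_fc * s
    weight_phase = channel_phase if true_time_delay else np.pi * n * s
    return float(np.abs(np.sum(np.exp(1j * (channel_phase - weight_phase)))) / n_elems)


print(array_gain(256, 1.0, 45.0))                         # 1.0 at the carrier
print(array_gain(256, 1.05, 45.0))                        # far below 1 at +5% frequency offset
print(array_gain(256, 1.05, 45.0, true_time_delay=True))  # 1.0 with true time delay
```

The loss grows with the number of elements and the fractional bandwidth, which is why squint becomes serious precisely as apertures and bandwidths increase.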

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.06271 [pdf, other]

Towards Mitigating Hallucination in Large Language Models via Self-Reflection

Authors: Ziwei Ji, Tiezheng Yu, Yan Xu, Nayeon Lee, Etsuko Ishii, Pascale Fung

Abstract: Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks, including question-answering (QA) tasks. However, their practical deployment still faces challenges, notably the issue of "hallucination", where models generate plausible-sounding but unfaithful or nonsensical information. This issue becomes particularly critical in the medical domain because of the uncommon professional concepts and potential social risks involved. This paper analyzes the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets. Our investigation centers on the identification and comprehension of common problematic answers, with a specific emphasis on hallucination. To tackle this challenge, we present an interactive self-reflection methodology that incorporates knowledge acquisition and answer generation. Through this feedback process, our approach steadily enhances the factuality, consistency, and entailment of the generated answers. In this way, we harness the interactivity and multitasking ability of LLMs to produce progressively more precise and accurate answers. Experimental results on both automatic and human evaluation demonstrate the superiority of our approach over baselines in reducing hallucination.

Submitted 9 October, 2023; originally announced October 2023.

Comments: Accepted by the findings of EMNLP 2023

Showing 1–50 of 653 results for author: Lee, N