TRAVERSE: Traffic-Responsive Autonomous Vehicle Experience & Rare-event Simulation for Enhanced safety

Authors: Sandeep Thalapanane, Sandip Sharan Senthil Kumar, Guru Nandhan Appiya Dilipkumar Peethambari, Sourang SriHari, Laura Zheng, Julio Poveda, Ming C. Lin

Abstract: Data for training learning-enabled self-driving cars in the physical world are typically collected in a safe, normal environment. Such data distribution often engenders a strong bias towards safe driving, making self-driving cars unprepared when encountering adversarial scenarios like unexpected accidents. Due to a dearth of such adverse data that is unrealistic for drivers to collect, autonomous vehicles can perform poorly when experiencing such rare events. This work addresses much-needed research by having participants drive a VR vehicle simulator going through simulated traffic with various types of accidental scenarios. It aims to understand human responses and behaviors in simulated accidents, contributing to our understanding of driving dynamics and safety. The simulation framework adopts a robust traffic simulation and is rendered using the Unity Game Engine. Furthermore, the simulation framework is built with portable, light-weight immersive driving simulator hardware, lowering the resource barrier for studies in autonomous driving research. Keywords: Rare Events, Traffic Simulation, Autonomous Driving, Virtual Reality, User Studies

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09404 [pdf, other]

CAACS: A Carbon Aware Ant Colony System

Authors: Marina Lin, Laura P. Schaposnik

Abstract: In an era where sustainability is becoming increasingly crucial, we introduce a new Carbon-Aware Ant Colony System (CAACS) Algorithm that addresses the Generalized Traveling Salesman Problem (GTSP) while minimizing carbon emissions. This novel approach leverages the natural efficiency of ant colony pheromone trails to find optimal routes, balancing both environmental and economic objectives. By integrating sustainability into transportation models, CAACS provides a powerful tool for real-world applications, including network design, delivery route planning, and commercial aircraft logistics. Our algorithm's unique bi-objective optimization advances the study of sustainable transportation solutions.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 31 figures, 23 pages

arXiv:2407.09089 [pdf]

Lomics: Generation of Pathways and Gene Sets using Large Language Models for Transcriptomic Analysis

Authors: Chun-Ka Wong, Ali Choo, Eugene C. C. Cheng, Wing-Chun San, Kelvin Chak-Kong Cheng, Yee-Man Lau, Minqing Lin, Fei Li, Wei-Hao Liang, Song-Yan Liao, Kwong-Man Ng, Ivan Fan-Ngai Hung, Hung-Fat Tse, Jason Wing-Hon Wong

Abstract: Interrogation of biological pathways is an integral part of omics data analysis. Large language models (LLMs) enable the generation of custom pathways and gene sets tailored to specific scientific questions. These targeted sets are significantly smaller than traditional pathway enrichment analysis libraries, reducing multiple hypothesis testing and potentially enhancing statistical power. Lomics (Large Language Models for Omics Studies) v1.0 is a python-based bioinformatics toolkit that streamlines the generation of pathways and gene sets for transcriptomic analysis. It operates in three steps: 1) deriving relevant pathways based on the researcher's scientific question, 2) generating valid gene sets for each pathway, and 3) outputting the results as .GMX files. Lomics also provides explanations for pathway selections. Consistency and accuracy are ensured through iterative processes, JSON format validation, and HUGO Gene Nomenclature Committee (HGNC) gene symbol verification. Lomics serves as a foundation for integrating LLMs into omics research, potentially improving the specificity and efficiency of pathway analysis.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09037 [pdf]

Photonic quasicrystal of spin angular momentum

Authors: Min Lin, Xinxin Gou, Zhenwei Xie, Aiping Yang, Luping Du, Xiaocong Yuan

Abstract: Quasicrystals, characterized by long-range order without translational symmetry, have catalyzed transformative advances in various fields, including optics in terms of field quasicrystals. Here, we present the first demonstration of photonic quasicrystals formed by spin angular momentum, unveiling novel spin-orbit coupling effects absent in traditional field quasicrystals. A de Bruijn tiling-like theoretical framework was built, elucidating the formation mechanism of spin quasicrystals for diverse symmetries. Moreover, the configurations of these spin textures can be manipulated through adjustments of the wavefronts, among which phason-like discontinuous dynamics is observed and quantitatively measured. Unlike optical quasicrystals shaped by electromagnetic fields, these spin-governed quasicrystals exhibit quasi-periodic properties of kinematic parameters, extending their potential applications to other physical systems. These findings hold promise for novel advancements in optical trapping, quasicrystal fabrication, and optical encryption systems.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.08032 [pdf, other]

Rossby Wave Instability and Substructure Formation in 3D Non-Ideal MHD Wind-Launching Disks

Authors: Chun-Yen Hsu, Zhi-Yun Li, Yisheng Tu, Xiao Hu, Min-Kai Lin

Abstract: Rings and gaps are routinely observed in the dust continuum emission of protoplanetary discs (PPDs). How they form and evolve remains debated. Previous studies have demonstrated the possibility of spontaneous gas rings and gaps formation in wind-launching disks. Here, we show that such gas substructures are unstable to the Rossby Wave Instability (RWI) through numerical simulations. Specifically, shorter wavelength azimuthal modes develop earlier, and longer wavelength ones dominate later, forming elongated (arc-like) anti-cyclonic vortices in the rings and (strongly magnetized) cyclonic vortices in the gaps that persist until the end of the simulation. Highly elongated vortices with aspect ratios of 10 or more are found to decay with time in our non-ideal MHD simulation, in contrast with the hydro case. This difference could be caused by magnetically induced motions, particularly strong meridional circulations with large values of the azimuthal component of the vorticity, which may be incompatible with the columnar structure preferred by vortices. The cyclonic and anti-cyclonic RWI vortices saturate at moderate levels, modifying but not destroying the rings and gaps in the radial gas distribution of the disk. In particular, they do not shut off the poloidal magnetic flux accumulation in low-density regions and the characteristic meridional flow patterns that are crucial to the ring and gap formation in wind-launching disks. Nevertheless, the RWI and their associated vortices open up the possibility of producing non-axisymmetric dust features observed in a small fraction of protoplanetary disks through non-ideal MHD, although detailed dust treatment is needed to explore this possibility.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07330 [pdf]

Interpretable Differential Diagnosis with Dual-Inference Large Language Models

Authors: Shuang Zhou, Sirui Ding, Jiashuo Wang, Mingquan Lin, Genevieve B. Melton, Rui Zhang

Abstract: Methodological advancements to automate the generation of differential diagnosis (DDx) to predict a list of potential diseases as differentials given patients' symptom descriptions are critical to clinical reasoning and applications such as decision support. However, providing reasoning or interpretation for these differential diagnoses is more meaningful. Fortunately, large language models (LLMs) possess powerful language processing abilities and have been proven effective in various related tasks. Motivated by this potential, we investigate the use of LLMs for interpretable DDx. First, we develop a new DDx dataset with expert-derived interpretation on 570 public clinical notes. Second, we propose a novel framework, named Dual-Inf, that enables LLMs to conduct bidirectional inference for interpretation. Both human and automated evaluation demonstrate the effectiveness of Dual-Inf in predicting differentials and diagnosis explanations. Specifically, the performance improvement of Dual-Inf over the baseline methods exceeds 32% w.r.t. BERTScore in DDx interpretation. Furthermore, experiments verify that Dual-Inf (1) makes fewer errors in interpretation, (2) has great generalizability, (3) is promising for rare disease diagnosis and explanation.

Submitted 9 July, 2024; originally announced July 2024.

Comments: 15 pages

arXiv:2407.06184 [pdf, ps, other]

Integral aspects of Fourier duality for abelian varieties

Authors: Junaid Hasan, Hazem Hassan, Milton Lin, Marcella Manivel, Lily McBeath, Ben Moonen

Abstract: We prove several results about integral versions of Fourier duality for abelian schemes, making use of Pappas's work on integral Grothendieck-Riemann-Roch. If $S$ is smooth quasi-projective of dimension $d$ over a field and $π\colon X\to S$ is a $g$-dimensional abelian scheme, we prove, under very mild assumptions on $X/S$, that all classical results about Fourier duality, including the existence of a Beauville decomposition, are valid for the Chow ring $\mathrm{CH}(X;Λ)$ with coefficients in the ring $Λ= \mathbb{Z}[1/(2g+d+1)!]$. If $X$ admits a polarization $θ$ of degree $ν(θ)^2$ we further construct an $\mathfrak{sl}_2$-action on $\mathrm{CH}(X;Λ_θ)$ with $Λ_θ= Λ[1/ν(θ)]$, and we show that $\mathrm{CH}(X;Λ_θ)$ is a sum of copies of the symmetric powers $\mathrm{Sym}^n(\mathrm{St})$ of the $2$-dimensional standard representation, for $n=0,\ldots,g$. For an abelian variety over an algebraically closed field, we use our results to produce torsion classes in $\mathrm{CH}^i(X;Λ_θ)$ for every $i\in \{1,\ldots,g\}$.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 22 pages

MSC Class: 14C15; 14K05

arXiv:2407.06027 [pdf, other]

PAS: Data-Efficient Plug-and-Play Prompt Augmentation System

Authors: Miao Zheng, Hao Liang, Fan Yang, Haoze Sun, Tianpeng Li, Lingchu Xiong, Yan Zhang, Youzhen Wu, Kun Li, Yanjun Shen, Mingan Lin, Tao Zhang, Guosheng Dong, Yujing Qiao, Kun Fang, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

Abstract: In recent years, the rise of Large Language Models (LLMs) has spurred a growing demand for plug-and-play AI systems. Among the various AI techniques, prompt engineering stands out as particularly significant. However, users often face challenges in writing prompts due to the steep learning curve and significant time investment, and existing automatic prompt engineering (APE) models can be difficult to use. To address this issue, we propose PAS, an LLM-based plug-and-play APE system. PAS utilizes LLMs trained on high-quality, automatically generated prompt complementary datasets, resulting in exceptional performance. In comprehensive benchmarks, PAS achieves state-of-the-art (SoTA) results compared to previous APE models, with an average improvement of 6.09 points. Moreover, PAS is highly efficient, achieving SoTA performance with only 9000 data points. Additionally, PAS can autonomously generate prompt augmentation data without requiring additional human labor. Its flexibility also allows it to be compatible with all existing LLMs and applicable to a wide range of tasks. PAS excels in human evaluations, underscoring its suitability as a plug-in for users. This combination of high performance, efficiency, and flexibility makes PAS a valuable system for enhancing the usability and effectiveness of LLMs through improved prompt engineering.

Submitted 12 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.04241 [pdf, other]

AnySR: Realizing Image Super-Resolution as Any-Scale, Any-Resource

Authors: Wengyi Zhan, Mingbao Lin, Chia-Wen Lin, Rongrong Ji

Abstract: In an effort to improve the efficiency and scalability of single-image super-resolution (SISR) applications, we introduce AnySR, to rebuild existing arbitrary-scale SR methods into any-scale, any-resource implementation. As a contrast to off-the-shelf methods that solve SR tasks across various scales with the same computing costs, our AnySR innovates in: 1) building arbitrary-scale tasks as any-resource implementation, reducing resource requirements for smaller scales without additional parameters; 2) enhancing any-scale performance in a feature-interweaving fashion, inserting scale pairs into features at regular intervals and ensuring correct feature/scale processing. The efficacy of our AnySR is fully demonstrated by rebuilding most existing arbitrary-scale SISR methods and validating on five popular SISR test datasets. The results show that our AnySR implements SISR tasks in a computing-more-efficient fashion, and performs on par with existing arbitrary-scale SISR methods. For the first time, we realize SISR tasks as not only any-scale in literature, but also as any-resource. Code is available at https://github.com/CrispyFeSo4/AnySR.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.02764 [pdf, other]

Data-driven Software-based Power Estimation for Embedded Devices

Authors: Haoyu Wang, Xinyi Li, Ti Zhou, Man Lin

Abstract: Energy measurement of computer devices, which are widely used in the Internet of Things (IoT), is an important yet challenging task. Most of these IoT devices lack ready-to-use hardware or software for power measurement. A cost-effective solution is to use low-end consumer-grade power meters. However, these low-end power meters cannot provide accurate instantaneous power measurements. In this paper, we propose an easy-to-use approach to derive an instantaneous software-based energy estimation model with only low-end power meters based on data-driven analysis through machine learning. Our solution is demonstrated with a Jetson Nano board and Ruideng UM25C USB power meter. Various machine learning methods combined with our smart data collection method and physical measurement are explored. Benchmarks were used to evaluate the derived software-power model for the Jetson Nano board and Raspberry Pi. The results show that 92% accuracy can be achieved compared to the long-duration measurement. A kernel module that can collect running traces of utilization and frequencies needed is developed, together with the power model derived, for power prediction for programs running in real environment.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02103 [pdf, ps, other]

Rossby wave instability in weakly ionized protoplanetary disks. I. azimuthal or vertical B-fields

Authors: Can Cui, Ashutosh Tripathi, Cong Yu, Min-Kai Lin, Andrew Youdin

Abstract: Rossby wave instability (RWI) is considered the underlying mechanism to crescent-shaped azimuthal asymmetries, discovered in (sub-)millimeter dust continuum of many protoplanetary disks. Previous works on linear theory were conducted in the hydrodynamic limit. Nevertheless, protoplanetary disks are likely magnetized and weakly ionized. We examine the influence of magnetic fields and non-ideal magnetohydrodynamic (MHD) effects - namely, Ohmic resistivity, Hall drift, and ambipolar diffusion - on the RWI unstable modes. We perform radially global linear analyses, employing constant azimuthal ($B_φ$) or vertical ($B_z$) background magnetic fields. It is found that, in the ideal MHD regime, magnetism can either enhance or diminish RWI growth. Strong non-ideal MHD effects cause RWI growth rates to recover hydrodynamic results. The sign of Hall Elsässer number subtly complicates the results, and vertical wavenumbers generically diminish growth rates.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 13 pages, 4 figures, submitted to MNRAS

arXiv:2407.01492 [pdf, other]

RegMix: Data Mixture as Regression for Language Model Pre-training

Authors: Qian Liu, Xiaosen Zheng, Niklas Muennighoff, Guangtao Zeng, Longxu Dou, Tianyu Pang, Jing Jiang, Min Lin

Abstract: The data mixture for large language model pre-training significantly impacts performance, yet how to determine an effective mixture remains unclear. We propose RegMix to automatically identify a high-performing data mixture by formulating it as a regression task. RegMix involves training a set of small models with diverse data mixtures and fitting a regression model to predict their performance given their respective mixtures. With the fitted regression model, we simulate the top-ranked mixture and use it to train a large-scale model with orders of magnitude more compute. To empirically validate RegMix, we train 512 models with 1M parameters for 1B tokens of different mixtures to fit the regression model and find the optimal mixture. Using this mixture we train a 1B parameter model for 25B tokens (i.e. 1000x larger and 25x longer) which we find performs best among 64 candidate 1B parameter models with other mixtures. Further, our method demonstrates superior performance compared to human selection and achieves results that match or surpass DoReMi, while utilizing only 10% of the compute budget. Our experiments also show that (1) Data mixtures significantly impact performance with single-task performance variations of up to 14.6%; (2) Web corpora rather than data perceived as high-quality like Wikipedia have the strongest positive correlation with downstream performance; (3) Domains interact in complex ways often contradicting common sense, thus automatic approaches like RegMix are needed; (4) Data mixture effects transcend scaling laws, and our approach captures the complexity by considering all domains together. Our code is available at https://github.com/sail-sg/regmix.

Submitted 1 July, 2024; originally announced July 2024.
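
Illustrative sketch (not from the paper): the RegMix abstract above describes a fit-then-simulate loop, in which a regression model is fitted on (mixture, performance) pairs from small models and then used to rank many simulated mixtures. The toy below shows that loop on synthetic data. Everything specific here is an assumption made for illustration: the four domains, the linear ground truth in true_w, the noise level, and the use of plain least squares, since the abstract does not say which regression model the authors use.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_DOMAINS = 4  # hypothetical number of data domains


def sample_mixtures(n, k, rng):
    """Draw n mixtures over k domains; each row is non-negative and sums to 1."""
    return rng.dirichlet(np.ones(k), size=n)


def fit_regression(mixtures, losses):
    """Least-squares fit of loss ~ mixtures @ coef.

    No separate intercept column is used: every mixture sums to 1, so a
    constant offset is already absorbed into the coefficients.
    """
    coef, *_ = np.linalg.lstsq(mixtures, losses, rcond=None)
    return coef


# Synthetic stand-in for "train 512 small models and record their loss".
# true_w is a made-up ground truth; domain 1 has the lowest loss contribution.
true_w = np.array([3.5, 2.0, 3.2, 3.8])
train_mix = sample_mixtures(512, NUM_DOMAINS, rng)
train_loss = train_mix @ true_w + rng.normal(0.0, 0.01, size=512)

coef = fit_regression(train_mix, train_loss)

# Simulate many candidate mixtures and keep the one with the lowest predicted loss.
candidates = sample_mixtures(100_000, NUM_DOMAINS, rng)
best = candidates[np.argmin(candidates @ coef)]

print("fitted coefficients:", np.round(coef, 2))
print("top-ranked mixture: ", np.round(best, 3))
```

With a purely linear toy model the top-ranked mixture simply concentrates on the cheapest domain. A regressor that captures interactions between domains would be needed to reproduce the non-trivial mixtures the abstract reports.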

arXiv:2407.00631 [pdf, other]

TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets

Authors: Jintai Chen, Yaojun Hu, Yue Wang, Yingzhou Lu, Xu Cao, Miao Lin, Hongxia Xu, Jian Wu, Cao Xiao, Jimeng Sun, Lucas Glass, Kexin Huang, Marinka Zitnik, Tianfan Fu

Abstract: Clinical trials are pivotal for developing new medical treatments, yet they typically pose some risks such as patient mortality, adverse events, and enrollment failure that waste immense efforts spanning over a decade. Applying artificial intelligence (AI) to forecast or simulate key events in clinical trials holds great potential for providing insights to guide trial designs. However, complex data collection and question definition requiring medical expertise and a deep understanding of trial designs have hindered the involvement of AI thus far. This paper tackles these challenges by presenting a comprehensive suite of meticulously curated AI-ready datasets covering multi-modal data (e.g., drug molecule, disease code, text, categorical/numerical features) and 8 crucial prediction challenges in clinical trial design, encompassing prediction of trial duration, patient dropout rate, serious adverse event, mortality rate, trial approval outcome, trial failure reason, drug dose finding, and design of eligibility criteria. Furthermore, we provide basic validation methods for each task to ensure the datasets' usability and reliability. We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design, ultimately advancing clinical trial research and accelerating medical solution development. The curated dataset, metrics, and basic models are publicly available at https://github.com/ML2Health/ML2ClinicalTrials/tree/main/AI4Trial.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00497 [pdf, other]

LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement

Authors: Jiahao Ying, Mingbao Lin, Yixin Cao, Wei Tang, Bo Wang, Qianru Sun, Xuanjing Huang, Shuicheng Yan

Abstract: This paper introduces the innovative "LLMs-as-Instructors" framework, which leverages the advanced Large Language Models (LLMs) to autonomously enhance the training of smaller target models. Inspired by the theory of "Learning from Errors", this framework employs an instructor LLM to meticulously analyze the specific errors within a target model, facilitating targeted and efficient training cycles. Within this framework, we implement two strategies: "Learning from Error," which focuses solely on incorrect responses to tailor training data, and "Learning from Error by Contrast", which uses contrastive learning to analyze both correct and incorrect responses for a deeper understanding of errors. Our empirical studies, conducted with several open-source models, demonstrate significant improvements across multiple benchmarks, including mathematical reasoning, coding abilities, and factual knowledge. Notably, the refined Llama-3-8b-Instruction has outperformed ChatGPT, illustrating the effectiveness of our approach. By leveraging the strengths of both strategies, we have attained a more balanced performance improvement on both in-domain and out-of-domain benchmarks. Our code can be found at https://yingjiahao14.github.io/LLMs-as-Instructors-pages/.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00474 [pdf, other]

MH-pFLGB: Model Heterogeneous personalized Federated Learning via Global Bypass for Medical Image Analysis

Authors: Luyuan Xie, Manqing Lin, ChenMing Xu, Tianyu Luan, Zhipeng Zeng, Wenjun Qian, Cong Li, Yuejian Fang, Qingni Shen, Zhonghai Wu

Abstract: In the evolving application of medical artificial intelligence, federated learning is notable for its ability to protect training data privacy. Federated learning facilitates collaborative model development without the need to share local data from healthcare institutions. Yet, the statistical and system heterogeneity among these institutions poses substantial challenges, which affects the effectiveness of federated learning and hampers the exchange of information between clients. To address these issues, we introduce a novel approach, MH-pFLGB, which employs a global bypass strategy to mitigate the reliance on public datasets and navigate the complexities of non-IID data distributions. Our method enhances traditional federated learning by integrating a global bypass model, which not only shares information among the clients but also serves as part of the network to enhance the performance on each client. Additionally, MH-pFLGB provides a feature fusion module to better combine the local and global features. We validate MH-pFLGB's effectiveness and adaptability through extensive testing on different medical tasks, demonstrating superior performance compared to existing state-of-the-art methods.

Submitted 29 June, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2405.06822

arXiv:2407.00462 [pdf, other]

pFLFE: Cross-silo Personalized Federated Learning via Feature Enhancement on Medical Image Segmentation

Authors: Luyuan Xie, Manqing Lin, Siyuan Liu, ChenMing Xu, Tianyu Luan, Cong Li, Yuejian Fang, Qingni Shen, Zhonghai Wu

Abstract: In medical image segmentation, personalized cross-silo federated learning (FL) is becoming popular for utilizing varied data across healthcare settings to overcome data scarcity and privacy concerns. However, existing methods often suffer from client drift, leading to inconsistent performance and delayed training. We propose a new framework, Personalized Federated Learning via Feature Enhancement (pFLFE), designed to mitigate these challenges. pFLFE consists of two main stages: feature enhancement and supervised learning. The first stage improves differentiation between foreground and background features, and the second uses these enhanced features for learning from segmentation masks. We also design an alternative training approach that requires fewer communication rounds without compromising segmentation quality, even with limited communication resources. Through experiments on three medical segmentation tasks, we demonstrate that pFLFE outperforms the state-of-the-art methods.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.19438 [pdf, other]

Shoulder of Dust Rings Formed by Planet-disk Interactions

Authors: Jiaqing Bi, Min-Kai Lin

Abstract: Recent analyses of mm-wavelength protoplanetary disk observations have revealed several emission excesses on the previously identified dust rings, referred to as dust shoulders. The prevalence of dust shoulders suggests that they trace a common but unclear mechanism. In this work, we combine 3D, multifluid hydrodynamic simulations with radiative transfer calculations to explain the formation of dust shoulders. We find that the ring-shoulder pairs can result from the 3D planet-disk interactions with massive, gap-opening planets. The key driver is the dust filtration effect at the local pressure maximum due to planet-driven outward gas flows. Our work provides a possible explanation for the outer dust shoulders in recent super-resolution analyses of ALMA observations. It also provides insights into the formation of the inner dust shoulder in the PDS 70 disk and highlights the role of 3D effects in planet-disk interaction studies.

Submitted 27 June, 2024; originally announced June 2024.

Comments: accepted to ApJ

arXiv:2406.18173 [pdf, other]

UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs

Authors: Wenhao Li, Mingbao Lin, Yunshan Zhong, Shuicheng Yan, Rongrong Ji

Abstract: Managing long texts is challenging for large language models (LLMs) due to limited context window sizes. This study introduces UIO-LLMs, an unbiased incremental optimization approach for memory-enhanced transformers under long-context settings. We initially conceptualize the process as a streamlined encoder-decoder framework where the weights-shared encoder and decoder respectively encapsulate a context segment into memories and leverage these memories to predict outputs of the subsequent segment. Subsequently, by treating our memory-enhanced transformers as fully-connected recurrent neural networks (RNNs), we refine the training process using the Truncated Backpropagation Through Time (TBPTT) algorithm, which incorporates innovative incremental optimization techniques. These techniques not only diminish time complexity but also address the bias in gradient computation through an unbiased optimization process. UIO-LLMs successfully handle long context, such as extending the context window of Llama2-7b-chat from 4K to 100K tokens with minimal 2% additional parameters, while keeping the inference cost nearly linear as context length increases.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17628 [pdf, other]

Video Inpainting Localization with Contrastive Learning

Authors: Zijie Lou, Gang Cao, Man Lin

Abstract: Deep video inpainting is typically used as malicious manipulation to remove important objects for creating fake videos. It is important to identify the inpainted regions blindly. This letter proposes a simple yet effective forensic scheme for Video Inpainting LOcalization with ContrAstive Learning (ViLocal). Specifically, a 3D Uniformer encoder is applied to the video noise residual for learning effective spatiotemporal forensic features. To enhance the discriminative power, supervised contrastive learning is adopted to capture the local inconsistency of inpainted videos through attracting/repelling the positive/negative pristine and forged pixel pairs. A pixel-wise inpainting localization map is yielded by a lightweight convolution decoder with a specialized two-stage training strategy. To prepare enough training samples, we build a video object segmentation dataset of 2500 videos with pixel-level annotations per frame. Extensive experimental results validate the superiority of ViLocal over state-of-the-art methods. Code and dataset will be available at https://github.com/multimediaFor/ViLocal.

Submitted 25 June, 2024; originally announced June 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2406.13576

arXiv:2406.17252 [pdf, other]

Resource-Optimized Grouping Shadow for Efficient Energy Estimation

Authors: Min Li, Mao Lin, Matthew J. S. Beach

Abstract: The accurate and efficient energy estimation of quantum Hamiltonians consisting of Pauli observables is an essential task in modern quantum computing. We introduce a Resource-Optimized Grouping Shadow (ROGS) algorithm, which optimally allocates measurement resources by minimizing the estimation error bound through a novel overlapped grouping strategy and convex optimization. Our numerical experiments demonstrate that ROGS requires significantly fewer unique quantum circuits for accurate estimation compared to existing methods given a fixed measurement budget, addressing a major cost factor for compiling and executing circuits on quantum computers.

Submitted 24 June, 2024; originally announced June 2024.

Comments: 22 pages, 5 figures

arXiv:2406.14162 [pdf, other]

DIRAS: Efficient LLM-Assisted Annotation of Document Relevance in Retrieval Augmented Generation

Authors: Jingwei Ni, Tobias Schimanski, Meihong Lin, Mrinmaya Sachan, Elliott Ash, Markus Leippold

Abstract: Retrieval Augmented Generation (RAG) is widely employed to ground responses to queries on domain-specific documents. But do RAG implementations leave out important information or excessively include irrelevant information? To allay these concerns, it is necessary to annotate domain-specific benchmarks to evaluate information retrieval (IR) performance, as relevance definitions vary across queries and domains. Furthermore, such benchmarks should be cost-efficiently annotated to avoid annotation selection bias. In this paper, we propose DIRAS (Domain-specific Information Retrieval Annotation with Scalability), a manual-annotation-free schema that fine-tunes open-sourced LLMs to annotate relevance labels with calibrated relevance probabilities. Extensive evaluation shows that DIRAS fine-tuned models achieve GPT-4-level performance on annotating and ranking unseen (query, document) pairs, and are helpful for real-world RAG development.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13576 [pdf, other]

Trusted Video Inpainting Localization via Deep Attentive Noise Learning

Authors: Zijie Lou, Gang Cao, Man Lin

Abstract: Digital video inpainting techniques have been substantially improved with deep learning in recent years. Although inpainting is originally designed to repair damaged areas, it can also be used as malicious manipulation to remove important objects for creating false scenes and facts. As such, it is important to identify inpainted regions blindly. In this paper, we present a Trusted Video Inpainting Localization network (TruVIL) with excellent robustness and generalization ability. Observing that high-frequency noise can effectively unveil the inpainted regions, we design deep attentive noise learning in multiple stages to capture the inpainting traces. Firstly, a multi-scale noise extraction module based on 3D High Pass (HP3D) layers is used to create the noise modality from input RGB frames. Then the correlation between these two complementary modalities is explored by a cross-modality attentive fusion module to facilitate mutual feature learning. Lastly, spatial details are selectively enhanced by an attentive noise decoding module to boost the localization performance of the network. To prepare enough training samples, we also build a frame-level video object segmentation dataset of 2500 videos with pixel-level annotation for all frames. Extensive experimental results validate the superiority of TruVIL compared with state-of-the-art methods. In particular, both quantitative and qualitative evaluations on various inpainted videos verify the remarkable robustness and generalization ability of our proposed TruVIL. Code and dataset will be available at https://github.com/multimediaFor/TruVIL.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13083 [pdf, other]

Design and Performance of a Magnetic Bottle Electron Spectrometer for High-Energy Photoelectron Spectroscopy

Authors: Kurtis Borne, Jordan T ONeal, Jun Wang, Erk Isele, Razib Obaid, Nora Berrah, Xinxin Cheng, Philip H Bucksbaum, Justin James, Andri Kamalov, Kirk A Larsen, Xiang Li, Ming-Fu Lin, Yusong Liu, Agostino Marinelli, Adam Summers, Emily Thierstein, Thomas Wolf, Daniel Rolles, Peter Walter, James P Cryan, Taran Driver

Abstract: We describe the design and performance of a magnetic bottle electron spectrometer (MBES) for high-energy electron spectroscopy. Our design features a $\sim$2 m long electron drift tube and electrostatic retardation lens, achieving sub-electronvolt (eV) electron kinetic energy resolution for high energy (several hundred eV) electrons with close to $4\pi$ collection efficiency. A segmented anode electron detector enables the simultaneous collection of photoelectron spectra in high resolution and high collection efficiency modes. This versatile instrument is installed at the TMO endstation at the LCLS x-ray free-electron laser (XFEL). In this paper, we demonstrate its high resolution, collection efficiency and spatial selectivity in measurements where it is coupled to an XFEL source. These combined characteristics are designed to enable high-resolution time-resolved measurements using x-ray photoelectron, absorption, and Auger-Meitner spectroscopy. We also describe the pervasive artifact in MBES time-of-flight spectra that arises from a periodic modulation in electron detection efficiency, and present a robust analysis procedure for its removal.

Submitted 4 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.10740 [pdf, other]

FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models

Authors: Zhikai Zhang, Yitang Li, Haofeng Huang, Mingxian Lin, Li Yi

Abstract: Human motion synthesis is a fundamental task in computer animation. Despite recent progress in this field utilizing deep learning and motion capture data, existing methods are always limited to specific motion categories, environments, and styles. This poor generalizability can be partially attributed to the difficulty and expense of collecting large-scale and high-quality motion data. At the same time, foundation models trained with internet-scale image and text data have demonstrated surprising world knowledge and reasoning ability for various downstream tasks. Utilizing these foundation models may help with human motion synthesis, which some recent works have superficially explored. However, these methods didn't fully unveil the foundation models' potential for this task and only support several simple actions and environments. In this paper, we for the first time, without any motion data, explore open-set human motion synthesis using natural language instructions as user control signals based on MLLMs across any motion task and environment. Our framework can be split into two stages: 1) sequential keyframe generation by utilizing MLLMs as a keyframe designer and animator; 2) motion filling between keyframes through interpolation and motion tracking. Our method can achieve general human motion synthesis for many downstream tasks. The promising results demonstrate the worth of mocap-free human motion synthesis aided by MLLMs and pave the way for future research.

Submitted 21 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.09836 [pdf, other]

Robustness-Inspired Defense Against Backdoor Attacks on Graph Neural Networks

Authors: Zhiwei Zhang, Minhua Lin, Junjie Xu, Zongyu Wu, Enyan Dai, Suhang Wang

Abstract: Graph Neural Networks (GNNs) have achieved promising results in tasks such as node classification and graph classification. However, recent studies reveal that GNNs are vulnerable to backdoor attacks, posing a significant threat to their real-world adoption. Despite initial efforts to defend against specific graph backdoor attacks, there is no work on defending against various types of backdoor attacks where generated triggers have different properties. Hence, we first empirically verify that prediction variance under edge dropping is a crucial indicator for identifying poisoned nodes. With this observation, we propose using random edge dropping to detect backdoors and theoretically show that it can efficiently distinguish poisoned nodes from clean ones. Furthermore, we introduce a novel robust training strategy to efficiently counteract the impact of the triggers. Extensive experiments on real-world datasets show that our framework can effectively identify poisoned nodes, significantly degrade the attack success rate, and maintain clean accuracy when defending against various types of graph backdoor attacks with different properties.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09760 [pdf, other]

Bootstrapping Language Models with DPO Implicit Rewards

Authors: Changyu Chen, Zichen Liu, Chao Du, Tianyu Pang, Qian Liu, Arunesh Sinha, Pradeep Varakantham, Min Lin

Abstract: Human alignment in large language models (LLMs) is an active area of research. A recent groundbreaking work, direct preference optimization (DPO), has greatly simplified the process from past work in reinforcement learning from human feedback (RLHF) by bypassing the reward learning stage in RLHF. DPO, after training, provides an implicit reward model. In this work, we make a novel observation that this implicit reward model can by itself be used in a bootstrapping fashion to further align the LLM. Our approach is to use the rewards from a current LLM model to construct a preference dataset, which is then used in subsequent DPO rounds. We incorporate refinements that debias the length of the responses and improve the quality of the preference dataset to further improve our approach. Our approach, named self-alignment with DPO ImpliCit rEwards (DICE), shows great improvements in alignment and achieves superior performance to Gemini Pro on AlpacaEval 2, reaching 27.55% length-controlled win rate against GPT-4 Turbo, but with only 8B parameters and no external feedback. Our code is available at https://github.com/sail-sg/dice.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09648 [pdf, other]

An Intrinsic Vector Heat Network

Authors: Alexander Gao, Maurice Chu, Mubbasir Kapadia, Ming C. Lin, Hsueh-Ti Derek Liu

Abstract: Vector fields are widely used to represent and model flows for many science and engineering applications. This paper introduces a novel neural network architecture for learning tangent vector fields that are intrinsically defined on manifold surfaces embedded in 3D. Previous approaches to learning vector fields on surfaces treat vectors as multi-dimensional scalar fields, using traditional scalar-valued architectures to process channels individually, and thus fail to preserve fundamental intrinsic properties of the vector field. The core idea of this work is to introduce a trainable vector heat diffusion module to spatially propagate vector-valued feature data across the surface, which we incorporate into our proposed architecture that consists of vector-valued neurons. Our architecture is invariant to rigid motion of the input, isometric deformation, and choice of local tangent bases, and is robust to discretizations of the surface. We evaluate our Vector Heat Network on triangle meshes, and empirically validate its invariant properties. We also demonstrate the effectiveness of our method on the useful industrial application of quadrilateral mesh generation.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.09136 [pdf, other]

Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

Authors: Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin

Abstract: The recent development of chain-of-thought (CoT) decoding has enabled large language models (LLMs) to generate explicit logical reasoning paths for complex problem-solving. However, research indicates that these paths are not always deliberate and optimal. The tree-of-thought (ToT) method employs tree-searching to extensively explore the reasoning space and find better reasoning paths that CoT decoding might overlook. This deliberation, however, comes at the cost of significantly increased inference complexity. In this work, we demonstrate that fine-tuning LLMs leveraging the search tree constructed by ToT allows CoT to achieve similar or better performance, thereby avoiding the substantial inference burden. This is achieved through Chain of Preference Optimization (CPO), where LLMs are fine-tuned to align each step of the CoT reasoning paths with those of ToT using the inherent preference information in the tree-search process. Extensive experimental results show that CPO significantly improves LLM performance in solving a variety of complex problems, including question answering, fact verification, and arithmetic reasoning, demonstrating its effectiveness. Our code is available at https://github.com/sail-sg/CPO.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.06971 [pdf, ps, other]

Polar alignment of a dusty circumbinary disc -- I. Dust ring formation

Authors: Jeremy L. Smallwood, Min-Kai Lin, Hossam Aly, Rebecca Nealon, Cristiano Longarini

Abstract: We investigate the formation of dust traffic jams in polar-aligning circumbinary discs. We use 3D smoothed particle hydrodynamical simulations of both gas and dust to model an initially highly misaligned circumbinary disc around an eccentric binary. As the circumbinary disc evolves to a polar configuration (perpendicular to the binary orbital plane), the difference in the precession between the gas and dust produces dust traffic jams, which become dense dust rings. We find that dust rings form for different Stokes numbers, binary eccentricities, and initial disc tilts. Dust rings are only produced while the circumbinary disc is misaligned to the binary orbital plane. When the disc becomes polar aligned, the dust rings are still present and long-lived. Once these dust rings are formed, they drift inward. The drift timescale depends on the Stokes number. The lower the Stokes number, the faster the dust ring drifts near the inner edge of the disc. The dust rings will have an increased midplane dust-to-gas ratio, which may be a favourable environment for the streaming instability to operate.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted for publication in MNRAS, 19 pages, 17 figures

arXiv:2406.06038 [pdf, other]

Navigation and 3D Surface Reconstruction from Passive Whisker Sensing

Authors: Michael A. Lin, Hao Li, Chengyi Xing, Mark R. Cutkosky

Abstract: Whiskers provide a way to sense surfaces in the immediate environment without disturbing it. In this paper we present a method for using highly flexible, curved, passive whiskers mounted along a robot arm to gather sensory data as they brush past objects during normal robot motion. The information is useful both for guiding the robot in cluttered spaces and for reconstructing the exposed faces of objects. Surface reconstruction depends on accurate localization of contact points along each whisker. We present an algorithm based on Bayesian filtering that rapidly converges to within 1 mm of the actual contact locations. The piecewise-continuous history of contact locations from each whisker allows for accurate reconstruction of curves on object surfaces. Employing multiple whiskers and traces, we are able to produce an occupancy map of proximal objects.

Submitted 10 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2210.12387

arXiv:2406.04313 [pdf, other]

Improving Alignment and Robustness with Circuit Breakers

Authors: Andy Zou, Long Phan, Justin Wang, Derek Duenas, Maxwell Lin, Maksym Andriushchenko, Rowan Wang, Zico Kolter, Matt Fredrikson, Dan Hendrycks

Abstract: AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We present an approach, inspired by recent advances in representation engineering, that interrupts the models as they respond with harmful outputs with "circuit breakers." Existing techniques aimed at improving alignment, such as refusal training, are often bypassed. Techniques such as adversarial training try to plug these holes by countering specific attacks. As an alternative to refusal training and adversarial training, circuit-breaking directly controls the representations that are responsible for harmful outputs in the first place. Our technique can be applied to both text-only and multimodal language models to prevent the generation of harmful outputs without sacrificing utility -- even in the presence of powerful unseen attacks. Notably, while adversarial robustness in standalone image recognition remains an open challenge, circuit breakers allow the larger multimodal system to reliably withstand image "hijacks" that aim to produce harmful content. Finally, we extend our approach to AI agents, demonstrating considerable reductions in the rate of harmful actions when they are under attack. Our approach represents a significant step forward in the development of reliable safeguards to harmful behavior and adversarial attacks.

Submitted 12 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: Code and models are available at https://github.com/GraySwanAI/circuit-breakers

arXiv:2406.02859

ConPCO: Preserving Phoneme Characteristics for Automatic Pronunciation Assessment Leveraging Contrastive Ordinal Regularization

Authors: Bi-Cheng Yan, Wei-Cheng Chao, Jiun-Ting Li, Yi-Cheng Wang, Hsin-Wei Wang, Meng-Shin Lin, Berlin Chen

Abstract: Automatic pronunciation assessment (APA) aims to evaluate the pronunciation proficiency of a second language (L2) learner in a target language. Existing efforts typically draw on regression models for proficiency score prediction, where the models are trained to estimate target values without explicitly accounting for phoneme-awareness in the feature space. In this paper, we propose a contrastive phonemic ordinal regularizer (ConPCO) tailored for regression-based APA models to generate more phoneme-discriminative features while considering the ordinal relationships among the regression targets. The proposed ConPCO first aligns the phoneme representations of an APA model and textual embeddings of phonetic transcriptions via contrastive learning. Afterward, the phoneme characteristics are retained by regulating the distances between inter- and intra-phoneme categories in the feature space while allowing for the ordinal relationships among the output targets. We further design and develop a hierarchical APA model to evaluate the effectiveness of our method. Extensive experiments conducted on the speechocean762 benchmark dataset suggest the feasibility and efficacy of our approach in relation to some cutting-edge baselines.

Submitted 8 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: This paper has been withdrawn because the authors aim to achieve better organization in writing and more detailed experimental analysis

arXiv:2406.01602 [pdf, other]

Effectiveness of denoising diffusion probabilistic models for fast and high-fidelity whole-event simulation in high-energy heavy-ion experiments

Authors: Yeonju Go, Dmitrii Torbunov, Timothy Rinn, Yi Huang, Haiwang Yu, Brett Viren, Meifeng Lin, Yihui Ren, Jin Huang

Abstract: Artificial intelligence (AI) generative models, such as generative adversarial networks (GANs), variational auto-encoders, and normalizing flows, have been widely used and studied as efficient alternatives for traditional scientific simulations. However, they have several drawbacks, including training instability and inability to cover the entire data distribution, especially for regions where data are rare. This is particularly challenging for whole-event, full-detector simulations in high-energy heavy-ion experiments, such as sPHENIX at the Relativistic Heavy Ion Collider and Large Hadron Collider experiments, where thousands of particles are produced per event and interact with the detector. This work investigates the effectiveness of Denoising Diffusion Probabilistic Models (DDPMs) as an AI-based generative surrogate model for the sPHENIX experiment that includes the heavy-ion event generation and response of the entire calorimeter stack. DDPM performance in sPHENIX simulation data is compared with a popular rival, GANs. Results show that both DDPMs and GANs can reproduce the data distribution where the examples are abundant (low-to-medium calorimeter energies). Nonetheless, DDPMs significantly outperform GANs, especially in high-energy regions where data are rare. Additionally, DDPMs exhibit superior stability compared to GANs. The results are consistent between both central and peripheral centrality heavy-ion collision events. Moreover, DDPMs offer a substantial speedup of approximately a factor of 100 compared to the traditional Geant4 simulation method.

Submitted 23 May, 2024; originally announced June 2024.

Comments: 11 pages, 7 figures

arXiv:2406.01431 [pdf, other]

Deep Stochastic Kinematic Models for Probabilistic Motion Forecasting in Traffic

Authors: Laura Zheng, Sanghyun Son, Jing Liang, Xijun Wang, Brian Clipp, Ming C. Lin

Abstract: Kinematic priors have been shown to be helpful in boosting generalization and performance in prior work on trajectory forecasting. Specifically, kinematic priors have been applied such that models predict a set of actions instead of future output trajectories. By unrolling predicted trajectories via time integration and models of kinematic dynamics, predicted trajectories are not only kinematically feasible on average but also relate uncertainty from one timestep to the next. With benchmarks supporting prediction of multiple trajectories, deterministic kinematic priors are less and less applicable to current models. We propose a method for integrating probabilistic kinematic priors into modern probabilistic trajectory forecasting architectures. The primary difference between our work and previous techniques is the analytical quantification of variance, or uncertainty, in predicted trajectories. With negligible additional computational overhead, our method can be generalized and easily implemented with any modern probabilistic method that models candidate trajectories as Gaussian distributions. In particular, our method works especially well in suboptimal settings, such as with small datasets or in the presence of noise. Our method achieves up to a 50% performance boost in small dataset settings and up to an 8% performance boost in large-scale learning compared to previous kinematic prediction methods on SOTA trajectory forecasting architectures out-of-the-box, with minimal fine-tuning. In this paper, we show four analytical formulations of probabilistic kinematic priors which can be used for any Gaussian Mixture Model (GMM)-based deep learning models, quantify the error bound on linear approximations applied during trajectory unrolling, and show results to evaluate each formulation in trajectory forecasting.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 8 pages

arXiv:2406.01425 [pdf, other]

Sensitivity-Informed Augmentation for Robust Segmentation

Authors: Laura Zheng, Wenjie Wei, Tony Wu, Jacob Clements, Shreelekha Revankar, Andre Harrison, Yu Shen, Ming C. Lin

Abstract: Segmentation is an integral module in many visual computing applications such as virtual try-on, medical imaging, autonomous driving, and agricultural automation. These applications often involve either widespread consumer use or highly variable environments, both of which can degrade the quality of visual sensor data, whether from a common mobile phone or an expensive satellite imaging camera. In addition to external noises like user difference or weather conditions, internal noises such as variations in camera quality or lens distortion can affect the performance of segmentation models during both development and deployment. In this work, we present an efficient, adaptable, and gradient-free method to enhance the robustness of learning-based segmentation models across training. First, we introduce a novel adaptive sensitivity analysis (ASA) using Kernel Inception Distance (KID) on basis perturbations to benchmark perturbation sensitivity of pre-trained segmentation models. Then, we model the sensitivity curve using the adaptive SA and sample perturbation hyperparameter values accordingly. Finally, we conduct adversarial training with the selected perturbation values and dynamically re-evaluate robustness during online training. Our method, implemented end-to-end with minimal fine-tuning required, consistently outperforms state-of-the-art data augmentation techniques for segmentation.
It shows significant improvement in both clean data evaluation and real-world adverse scenario evaluation across various segmentation datasets used in visual computing and computer graphics applications.

Submitted 16 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: 10 pages

arXiv:2406.01288 [pdf, other]

Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses

Authors: Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Jing Jiang, Min Lin

Abstract: Recently, Anil et al. (2024) show that many-shot (up to hundreds of) demonstrations can jailbreak state-of-the-art LLMs by exploiting their long-context capability. Nevertheless, is it possible to use few-shot demonstrations to efficiently jailbreak LLMs within limited context sizes? While the vanilla few-shot jailbreaking may be inefficient, we propose improved techniques such as injecting special system tokens like [/INST] and employing demo-level random search from a collected demo pool. These simple techniques result in surprisingly effective jailbreaking against aligned LLMs (even with advanced defenses). For example, our method achieves >80% (mostly >95%) ASRs on Llama-2-7B and Llama-3-8B without multiple restarts, even if the models are enhanced by strong defenses such as perplexity detection and/or SmoothLLM, which is challenging for suffix-based jailbreaking. In addition, we conduct comprehensive and elaborate (e.g., making sure to use correct system prompts) evaluations against other aligned LLMs and advanced defenses, where our method consistently achieves nearly 100% ASRs. Our code is available at https://github.com/sail-sg/I-FSJ.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01032 [pdf, other]

LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning

Authors: Junjie Xu, Zongyu Wu, Minhua Lin, Xiang Zhang, Suhang Wang

Abstract: Recent progress in Graph Neural Networks (GNNs) has greatly enhanced the ability to model complex molecular structures for predicting properties. Nevertheless, molecular data encompasses more than just graph structures, including textual and visual information that GNNs do not handle well. To bridge this gap, we present an innovative framework that utilizes multimodal molecular data to extract insights from Large Language Models (LLMs). We introduce GALLON (Graph Learning from Large Language Model Distillation), a framework that synergizes the capabilities of LLMs and GNNs by distilling multimodal knowledge into a unified Multilayer Perceptron (MLP). This method integrates the rich textual and visual data of molecules with the structural analysis power of GNNs. Extensive experiments reveal that our distilled MLP model notably improves the accuracy and efficiency of molecular property predictions.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.21018 [pdf, other]

Improved Techniques for Optimization-Based Jailbreaking on Large Language Models

Authors: Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, Min Lin

Abstract: Large language models (LLMs) are being rapidly developed, and a key component of their widespread deployment is their safety-related alignment. Many red-teaming efforts aim to jailbreak LLMs, where among these efforts, the Greedy Coordinate Gradient (GCG) attack's success has led to a growing interest in the study of optimization-based jailbreaking techniques. Although GCG is a significant milestone, its attacking efficiency remains unsatisfactory. In this paper, we present several improved (empirical) techniques for optimization-based jailbreaks like GCG. We first observe that the single target template of "Sure" largely limits the attacking performance of GCG; given this, we propose to apply diverse target templates containing harmful self-suggestion and/or guidance to mislead LLMs. Besides, from the optimization aspects, we propose an automatic multi-coordinate updating strategy in GCG (i.e., adaptively deciding how many tokens to replace in each step) to accelerate convergence, as well as tricks like easy-to-hard initialisation. Then, we combine these improved technologies to develop an efficient jailbreak method, dubbed I-GCG. In our experiments, we evaluate on a series of benchmarks (such as NeurIPS 2023 Red Teaming Track). The results demonstrate that our improved techniques can help GCG outperform state-of-the-art jailbreaking attacks and achieve nearly 100% attack success rate. The code is released at https://github.com/jiaxiaojunQAQ/I-GCG.

Submitted 5 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.19026 [pdf, other]

DiveR-CT: Diversity-enhanced Red Teaming with Relaxing Constraints

Authors: Andrew Zhao, Quentin Xu, Matthieu Lin, Shenzhi Wang, Yong-jin Liu, Zilong Zheng, Gao Huang

Abstract: Recent advances in large language models (LLMs) have made them indispensable, raising significant concerns over managing their safety. Automated red teaming offers a promising alternative to the labor-intensive and error-prone manual probing for vulnerabilities, providing more consistent and scalable safety evaluations. However, existing approaches often compromise diversity by focusing on maximizing attack success rate. Additionally, methods that decrease the cosine similarity from historical embeddings with semantic diversity rewards lead to novelty stagnation as history grows. To address these issues, we introduce DiveR-CT, which relaxes conventional constraints on the objective and semantic reward, granting greater freedom for the policy to enhance diversity. Our experiments demonstrate DiveR-CT's marked superiority over baselines by 1) generating data that perform better in various diversity metrics across different attack success rate levels, 2) better-enhancing resiliency in blue team models through safety tuning based on collected data, 3) allowing dynamic control of objective weights for reliable and controllable attack success rates, and 4) reducing susceptibility to reward overoptimization. Project details and code can be found at https://andrewzh112.github.io/#diverct.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18810 [pdf, other]

UniPTS: A Unified Framework for Proficient Post-Training Sparsity

Authors: Jingjing Xie, Yuxin Zhang, Mingbao Lin, Zhihang Lin, Liujuan Cao, Rongrong Ji

Abstract: Post-training Sparsity (PTS) is a recently emerged avenue that chases efficient network sparsity with limited data in need. Existing PTS methods, however, undergo significant performance degradation compared with traditional methods that retrain the sparse networks via the whole dataset, especially at high sparsity ratios. In this paper, we attempt to reconcile this disparity by transposing three cardinal factors that profoundly alter the performance of conventional sparsity into the context of PTS. Our endeavors particularly comprise (1) A base-decayed sparsity objective that promotes efficient knowledge transferring from dense network to the sparse counterpart. (2) A reducing-regrowing search algorithm designed to ascertain the optimal sparsity distribution while circumventing overfitting to the small calibration set in PTS. (3) The employment of dynamic sparse training predicated on the preceding aspects, aimed at comprehensively optimizing the sparsity structure while ensuring training stability. Our proposed framework, termed UniPTS, is validated to be much superior to existing PTS methods across extensive benchmarks. As an illustration, it amplifies the performance of POT, a recently proposed recipe, from 3.9% to 68.6% when pruning ResNet-50 at 90% sparsity ratio on ImageNet. We release the code of our paper at https://github.com/xjjxmu/UniPTS.

Submitted 29 May, 2024; originally announced May 2024.

Comments: Accepted by CVPR2024

arXiv:2405.18256 [pdf]

doi 10.1021/acsnano.4c00422

Electrical Control Grain Dimensionality with Multilevel Magnetic Anisotropy

Authors: Shengyao Li, Sabpreet Bhatti, Siew Lang Teo, Ming Lin, Xinyue Pan, Zherui Yang, Peng Song, Wanghao Tian, Xinyu He, Jianwei Chai, Xian Jun Loh, Qiang Zhu, S. N. Piramanayagam, Xiao Renshaw Wang

Abstract: In alignment with the increasing demand for larger storage capacity and longer data retention, electrical control of magnetic anisotropy has been a research focus in the realm of spintronics. Typically, magnetic anisotropy is determined by grain dimensionality, which is set during the fabrication of magnetic thin films. Despite the intrinsic correlation between magnetic anisotropy and grain dimensionality, there is a lack of experimental evidence for electrically controlling grain dimensionality, thereby impeding the efficiency of magnetic anisotropy modulation. Here, we demonstrate an electric field control of grain dimensionality and prove it as the active mechanism for tuning interfacial magnetism. The reduction in grain dimensionality is associated with a transition from ferromagnetic to superparamagnetic behavior. We achieve a non-volatile and reversible modulation of the coercivity in both the ferromagnetic and superparamagnetic regimes. Subsequent electrical and elemental analysis confirms the variation in grain dimensionality upon the application of gate voltages, revealing a transition from a multidomain to a single-domain state accompanied by a reduction in grain dimensionality. Furthermore, we exploit the influence of grain dimensionality on domain wall motion, extending its applicability to multilevel magnetic memory and synaptic devices. Our results provide a strategy for tuning interfacial magnetism through grain size engineering for advancements in high-performance spintronics.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.16279 [pdf, other]

AI-Assisted Detector Design for the EIC (AID(2)E)

Authors: M. Diefenthaler, C. Fanelli, L. O. Gerlach, W. Guan, T. Horn, A. Jentsch, M. Lin, K. Nagai, H. Nayak, C. Pecar, K. Suresh, A. Vossen, T. Wang, T. Wenaus

Abstract: Artificial Intelligence is poised to transform the design of complex, large-scale detectors like the ePIC at the future Electron Ion Collider. Featuring a central detector with additional detecting systems in the far forward and far backward regions, the ePIC experiment incorporates numerous design parameters and objectives, including performance, physics reach, and cost, constrained by mechanical and geometric limits. This project aims to develop a scalable, distributed AI-assisted detector design for the EIC (AID(2)E), employing state-of-the-art multiobjective optimization to tackle complex designs. Supported by the ePIC software stack and using Geant4 simulations, our approach benefits from transparent parameterization and advanced AI features. The workflow leverages the PanDA and iDDS systems, used in major experiments such as ATLAS at CERN LHC, the Rubin Observatory, and sPHENIX at RHIC, to manage the compute intensive demands of ePIC detector simulations. Tailored enhancements to the PanDA system focus on usability, scalability, automation, and monitoring. Ultimately, this project aims to establish a robust design capability, apply a distributed AI-assisted workflow to the ePIC detector, and extend its applications to the design of the second detector (Detector-2) in the EIC, as well as to calibration and alignment tasks. Additionally, we are developing advanced data science tools to efficiently navigate the complex, multidimensional trade-offs identified through this optimization process.

Submitted 28 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

Comments: 11 pages, 4 figures, AI4EIC 2023 proceeding

arXiv:2405.15362 [pdf, other]

Pipeline Parallelism with Controllable Memory

Authors: Penghui Qi, Xinyi Wan, Nyamdavaa Amar, Min Lin

Abstract: Pipeline parallelism has been widely explored, but most existing schedules lack a systematic methodology. In this paper, we propose a framework to decompose pipeline schedules as repeating a building block and we show that the lifespan of the building block decides the peak activation memory of the pipeline schedule. Guided by the observations, we find that almost all existing pipeline schedules, to the best of our knowledge, are memory inefficient. To address this, we introduce a family of memory efficient building blocks with controllable activation memory, which can reduce the peak activation memory to 1/2 of 1F1B without sacrificing efficiency, and even to 1/3 with comparable throughput. We can also achieve almost zero pipeline bubbles while maintaining the same activation memory as 1F1B. Our evaluations demonstrate that in pure pipeline parallelism settings, our methods outperform 1F1B by 7% to 55% in terms of throughput. When employing a grid search over hybrid parallelism hyperparameters in practical scenarios, our proposed methods demonstrate a 16% throughput improvement over the 1F1B baseline for large language models.

Submitted 10 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14341 [pdf, other]

How do Observable Users Decompose D3 Code? An Exploratory Study

Authors: Melissa Lin, Heer Patel, Medina Lamkin, Tukey Tu, Hannah Bako, Soham Raut, Leilani Battle

Abstract: Users often struggle to program visualizations using complex toolkits like D3. Before we can design effective code assistants to support them, we must first understand how D3 users reason about their code. In this work, we explore users' understanding of D3 using an important gauge of code comprehension in CS education: code decomposition. We qualitatively analyze 560 D3 programs published on Observable and identify three distinct strategies for decomposing D3 programs: segmenting code into layers of functionality, keeping everything all in one cell, or creating reusable visualization functions. We also observe how users inherit decomposition methods from copied examples and reorganize copied code to suit their needs. We corroborate our findings for decomposition preferences through interviews with D3 and Observable users. Based on our findings, we suggest strategies for generating more intuitive D3 code recommendations using decomposition preferences and highlight new research opportunities for visualization code assistants. All supplemental materials are available at https://osf.io/sudb8/?view_only=302fc5c8d397412aac35c6e094ae7dd6.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.13685 [pdf, other]

Prompt Mixing in Diffusion Models using the Black Scholes Algorithm

Authors: Divya Kothandaraman, Ming Lin, Dinesh Manocha

Abstract: We introduce a novel approach for prompt mixing, aiming to generate images at the intersection of multiple text prompts using pre-trained text-to-image diffusion models. At each time step during diffusion denoising, our algorithm forecasts predictions w.r.t. the generated image and makes informed text conditioning decisions. To do so, we leverage the connection between diffusion models (rooted in non-equilibrium thermodynamics) and the Black-Scholes model for pricing options in Finance, and draw analogies between the variables in both contexts to derive an appropriate algorithm for prompt mixing using the Black Scholes model. Specifically, the parallels between diffusion models and the Black-Scholes model enable us to leverage properties related to the dynamics of the Markovian model derived in the Black-Scholes algorithm. Our prompt-mixing algorithm is data-efficient, meaning it does not need additional training. Furthermore, it operates without human intervention or hyperparameter tuning. We highlight the benefits of our approach by comparing it qualitatively and quantitatively to other prompt mixing techniques, including linear interpolation, alternating prompts, step-wise prompt switching, and CLIP-guided prompt selection across various scenarios such as single object per text prompt, multiple objects per text prompt and objects against backgrounds. Code is available at https://github.com/divyakraman/BlackScholesDiffusion2024.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.10757 [pdf, other]

doi 10.1145/3637528.3671910

Rethinking Graph Backdoor Attacks: A Distribution-Preserving Perspective

Authors: Zhiwei Zhang, Minhua Lin, Enyan Dai, Suhang Wang

Abstract: Graph Neural Networks (GNNs) have shown remarkable performance in various tasks. However, recent works reveal that GNNs are vulnerable to backdoor attacks. Generally, a backdoor attack poisons the graph by attaching backdoor triggers and the target class label to a set of nodes in the training graph. A GNN trained on the poisoned graph will then be misled to predict test nodes attached with triggers to the target class. Despite their effectiveness, our empirical analysis shows that triggers generated by existing methods tend to be out-of-distribution (OOD), which significantly differ from the clean data. Hence, these injected triggers can be easily detected and pruned with widely used outlier detection methods in real-world applications. Therefore, in this paper, we study a novel problem of unnoticeable graph backdoor attacks with in-distribution (ID) triggers. To generate ID triggers, we introduce an OOD detector in conjunction with an adversarial learning strategy to generate the attributes of the triggers within distribution. To ensure a high attack success rate with ID triggers, we introduce novel modules designed to enhance trigger memorization by the victim model trained on the poisoned graph. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed method in generating in-distribution triggers that can bypass various defense strategies while maintaining a high attack success rate.

Submitted 11 July, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: Accepted by KDD 2024

arXiv:2405.08786 [pdf, other]

Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring

Authors: Tiantian Zhang, Manxi Lin, Hongda Guo, Xiaofan Zhang, Ka Fung Peter Chiu, Aasa Feragen, Qi Dou

Abstract: The Prostate Imaging Reporting and Data System (PI-RADS) is pivotal in the diagnosis of clinically significant prostate cancer through MRI imaging. Current deep learning-based PI-RADS scoring methods often lack the incorporation of common PI-RADS clinical guideline (PICG) utilized by radiologists, potentially compromising scoring accuracy. This paper introduces a novel approach that adapts a multi-modal large language model (MLLM) to incorporate PICG into PI-RADS scoring model without additional annotations and network parameters. We present a designed two-stage fine-tuning process aiming at adapting an MLLM originally trained on natural images to the MRI images while effectively integrating the PICG. Specifically, in the first stage, we develop a domain adapter layer tailored for processing 3D MRI inputs and instruct the MLLM to differentiate MRI sequences. In the second stage, we translate PICG for guiding instructions from the model to generate PICG-guided image features. Through such a feature distillation step, we align the scoring network's features with the PICG-guided image features, which enables the model to effectively incorporate the PICG information. We develop our model on a public dataset and evaluate it on an in-house dataset. Experimental results demonstrate that our approach effectively improves the performance of current scoring networks. Code is available at: https://github.com/med-air/PICG2scoring

Submitted 10 July, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.08780 [pdf]

Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modeling

Authors: Gregory Holste, Mingquan Lin, Ruiwen Zhou, Fei Wang, Lei Liu, Qi Yan, Sarah H. Van Tassel, Kyle Kovacs, Emily Y. Chew, Zhiyong Lu, Zhangyang Wang, Yifan Peng

Abstract: Deep learning has enabled breakthroughs in automated diagnosis from medical imaging, with many successful applications in ophthalmology. However, standard medical image classification approaches only assess disease presence at the time of acquisition, neglecting the common clinical setting of longitudinal imaging. For slow, progressive eye diseases like age-related macular degeneration (AMD) and primary open-angle glaucoma (POAG), patients undergo repeated imaging over time to track disease progression and forecasting the future risk of developing disease is critical to properly plan treatment. Our proposed Longitudinal Transformer for Survival Analysis (LTSA) enables dynamic disease prognosis from longitudinal medical imaging, modeling the time to disease from sequences of fundus photography images captured over long, irregular time periods. Using longitudinal imaging data from the Age-Related Eye Disease Study (AREDS) and Ocular Hypertension Treatment Study (OHTS), LTSA significantly outperformed a single-image baseline in 19/20 head-to-head comparisons on late AMD prognosis and 18/20 comparisons on POAG prognosis. A temporal attention analysis also suggested that, while the most recent image is typically the most influential, prior imaging still provides additional prognostic value.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.06944 [pdf, other]

Learning Monocular Depth from Focus with Event Focal Stack

Authors: Chenxu Jiang, Mingyuan Lin, Chi Zhang, Zhenghai Wang, Lei Yu

Abstract: Depth from Focus estimates depth by determining the moment of maximum focus from multiple shots at different focal distances, i.e. the Focal Stack. However, the limited sampling rate of conventional optical cameras makes it difficult to obtain sufficient focus cues during the focal sweep. Inspired by biological vision, the event camera records intensity changes over time in extremely low latency, which provides more temporal information for focus time acquisition. In this study, we propose the EDFF Network to estimate sparse depth from the Event Focal Stack. Specifically, we utilize the event voxel grid to encode intensity change information and project event time surface into the depth domain to preserve per-pixel focal distance information. A Focal-Distance-guided Cross-Modal Attention Module is presented to fuse the information mentioned above. Additionally, we propose a Multi-level Depth Fusion Block designed to integrate results from each level of a UNet-like architecture and produce the final output. Extensive experiments validate that our method outperforms existing state-of-the-art approaches.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.06918 [pdf, other]

Super-Resolving Blurry Images with Events

Authors: Chi Zhang, Mingyuan Lin, Xiang Zhang, Chenxu Jiang, Lei Yu

Abstract: Super-resolution from motion-blurred images poses a significant challenge due to the combined effects of motion blur and low spatial resolution. To address this challenge, this paper introduces an Event-based Blurry Super Resolution Network (EBSR-Net), which leverages the high temporal resolution of events to mitigate motion blur and improve high-resolution image prediction. Specifically, we propose a multi-scale center-surround event representation to fully capture motion and texture information inherent in events. Additionally, we design a symmetric cross-modal attention module to fully exploit the complementarity between blurry images and events. Furthermore, we introduce an intermodal residual group composed of several residual dense Swin Transformer blocks, each incorporating multiple Swin Transformer layers and a residual connection, to extract global context and facilitate inter-block feature aggregation. Extensive experiments show that our method compares favorably against state-of-the-art approaches and achieves remarkable performance.

Submitted 11 May, 2024; originally announced May 2024.

Showing 1–50 of 897 results for author: Lin, M