Coupling multi-space topologies in 2D ferromagnetic lattice

Authors: Zhonglin He, Wenhui Du, Kaiying Dou, Ying Dai, Baibiao Huang, Yandong Ma

Abstract: Topology can manifest topological magnetism (e.g., skyrmion and bimeron) in real space and quantum anomalous Hall (QAH) state in momentum space, which have changed the modern conceptions of matter phase. While the topologies in different spaces are widely studied separately, their coexistence and coupling in single phase is seldom explored. Here, we report a novel phenomenon that arises from the interaction of topological magnetism and band topology, the multi-space topology, in 2D ferromagnetic lattice. Based on continuum theory and tight-binding model, we reveal that the interconnection between skyrmion/bimeron and QAH state generates distinctive localized chiral bound states (CBSs). With moderating topological magnetism through magnetic field, the multi-space topologies accompanied with different CBSs can be reversed, facilitating the coupling of multi-space topologies. By performing first-principles and atomic spin model simulations, we further demonstrate such multi-space topologies and their coupling in monolayer Cr2NSb. These results represent an important step towards the development of multi-space topological phenomena in 2D lattice.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.07531 [pdf, other]

Beyond Benchmarking: A New Paradigm for Evaluation and Assessment of Large Language Models

Authors: Jin Liu, Qingquan Li, Wenlong Du

Abstract: In current benchmarks for evaluating large language models (LLMs), there are issues such as evaluation content restriction, untimely updates, and lack of optimization guidance. In this paper, we propose a new paradigm for the measurement of LLMs: Benchmarking-Evaluation-Assessment. Our paradigm shifts the "location" of LLM evaluation from the "examination room" to the "hospital". Through conducting a "physical examination" on LLMs, it utilizes specific task-solving as the evaluation content, performs deep attribution of existing problems within LLMs, and provides recommendation for optimization.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.04115 [pdf, other]

LiDAR-based Real-Time Object Detection and Tracking in Dynamic Environments

Authors: Wenqiang Du, Giovanni Beltrame

Abstract: In dynamic environments, the ability to detect and track moving objects in real-time is crucial for autonomous robots to navigate safely and effectively. Traditional methods for dynamic object detection rely on high accuracy odometry and maps to detect and track moving objects. However, these methods are not suitable for long-term operation in dynamic environments where the surrounding environment is constantly changing. In order to solve this problem, we propose a novel system for detecting and tracking dynamic objects in real-time using only LiDAR data. By emphasizing the extraction of low-frequency components from LiDAR data as feature points for foreground objects, our method significantly reduces the time required for object clustering and movement analysis. Additionally, we have developed a tracking approach that employs intensity-based ego-motion estimation along with a sliding window technique to assess object movements. This enables the precise identification of moving objects and enhances the system's resilience to odometry drift. Our experiments show that this system can detect and track dynamic objects in real-time with an average detection accuracy of 88.7\% and a recall rate of 89.1\%. Furthermore, our system demonstrates resilience against the prolonged drift typically associated with front-end only LiDAR odometry. All of the source code, labeled dataset, and the annotation tool are available at: https://github.com/MISTLab/lidar_dynamic_objects_detection.git

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.01005 [pdf, other]

doi 10.1145/3637528.3671533

MARLP: Time-series Forecasting Control for Agricultural Managed Aquifer Recharge

Authors: Yuning Chen, Kang Yang, Zhiyu An, Brady Holder, Luke Paloutzian, Khaled Bali, Wan Du

Abstract: The rapid decline in groundwater around the world poses a significant challenge to sustainable agriculture. To address this issue, agricultural managed aquifer recharge (Ag-MAR) is proposed to recharge the aquifer by artificially flooding agricultural lands using surface water. Ag-MAR requires a carefully selected flooding schedule to avoid affecting the oxygen absorption of crop roots. However, current Ag-MAR scheduling does not take into account complex environmental factors such as weather and soil oxygen, resulting in crop damage and insufficient recharging amounts. This paper proposes MARLP, the first end-to-end data-driven control system for Ag-MAR. We first formulate Ag-MAR as an optimization problem. To that end, we analyze four-year in-field datasets, which reveal the multi-periodicity feature of the soil oxygen level trends and the opportunity to use external weather forecasts and flooding proposals as exogenous clues for soil oxygen prediction. Then, we design a two-stage forecasting framework. In the first stage, it extracts both the cross-variate dependency and the periodic patterns from historical data to conduct preliminary forecasting. In the second stage, it uses weather-soil and flooding-soil causality to facilitate an accurate prediction of soil oxygen levels. Finally, we conduct model predictive control (MPC) for Ag-MAR flooding. To address the challenge of large action spaces, we devise a heuristic planning module to reduce the number of flooding proposals to enable the search for optimal solutions. Real-world experiments show that MARLP reduces the oxygen deficit ratio by 86.8% while improving the recharging amount in unit time by 35.8%, compared with the previous four years.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Accepted by KDD 2024

arXiv:2406.17245 [pdf, other]

Unlocking Continual Learning Abilities in Language Models

Authors: Wenyu Du, Shuang Cheng, Tongxu Luo, Zihan Qiu, Zeyu Huang, Ka Chun Cheung, Reynold Cheng, Jie Fu

Abstract: Language models (LMs) exhibit impressive performance and generalization capabilities. However, LMs struggle with the persistent challenge of catastrophic forgetting, which undermines their long-term sustainability in continual learning (CL). Existing approaches usually address the issue by incorporating old task data or task-wise inductive bias into LMs. However, old data and accurate task information are often unavailable or costly to collect, hindering the availability of current CL approaches for LMs. To address this limitation, we introduce MIGU (MagnItude-based Gradient Updating for continual learning), a rehearsal-free and task-label-free method that only updates the model parameters with large magnitudes of output in LMs' linear layers. MIGU is based on our observation that the L1-normalized magnitude distribution of the output in LMs' linear layers is different when the LM models deal with different task data. By imposing this simple constraint on the gradient update process, we can leverage the inherent behaviors of LMs, thereby unlocking their innate CL abilities. Our experiments demonstrate that MIGU is universally applicable to all three LM architectures (T5, RoBERTa, and Llama2), delivering state-of-the-art or on-par performance across continual finetuning and continual pre-training settings on four CL benchmarks. For example, MIGU brings a 15.2% average accuracy improvement over conventional parameter-efficient finetuning baselines in a 15-task CL benchmark. MIGU can also seamlessly integrate with all three existing CL types to further enhance performance. Code is available at https://github.com/wenyudu/MIGU.

Submitted 24 June, 2024; originally announced June 2024.

Comments: preprint, 19 pages

arXiv:2406.16571 [pdf, other]

Differentiable Distributionally Robust Optimization Layers

Authors: Xutao Ma, Chao Ning, Wenli Du

Abstract: In recent years, there has been a growing research interest in decision-focused learning, which embeds optimization problems as a layer in learning pipelines and demonstrates a superior performance than the prediction-focused approach. However, for distributionally robust optimization (DRO), a popular paradigm for decision-making under uncertainty, it is still unknown how to embed it as a layer, i.e., how to differentiate decisions with respect to an ambiguity set. In this paper, we develop such differentiable DRO layers for generic mixed-integer DRO problems with parameterized second-order conic ambiguity sets and discuss its extension to Wasserstein ambiguity sets. To differentiate the mixed-integer decisions, we propose a novel dual-view methodology by handling continuous and discrete parts of decisions via different principles. Specifically, we construct a differentiable energy-based surrogate to implement the dual-view methodology and use importance sampling to estimate its gradient. We further prove that such a surrogate enjoys the asymptotic convergency under regularization. As an application of the proposed differentiable DRO layers, we develop a novel decision-focused learning pipeline for contextual distributionally robust decision-making tasks and compare it with the prediction-focused approach in experiments.

Submitted 24 June, 2024; originally announced June 2024.

Comments: In Forty-first International Conference on Machine Learning (2024)

arXiv:2406.12747 [pdf, other]

TSI-Bench: Benchmarking Time Series Imputation

Authors: Wenjie Du, Jun Wang, Linglong Qian, Yiyuan Yang, Fanxing Liu, Zepu Wang, Zina Ibrahim, Haoxin Liu, Zhiyuan Zhao, Yingjie Zhou, Wenjia Wang, Kaize Ding, Yuxuan Liang, B. Aditya Prakash, Qingsong Wen

Abstract: Effective imputation is a crucial preprocessing step for time series analysis. Despite the development of numerous deep learning algorithms for time series imputation, the community lacks standardized and comprehensive benchmark platforms to effectively evaluate imputation performance across different settings. Moreover, although many deep learning forecasting algorithms have demonstrated excellent performance, whether their modeling achievements can be transferred to time series imputation tasks remains unexplored. To bridge these gaps, we develop TSI-Bench, the first (to our knowledge) comprehensive benchmark suite for time series imputation utilizing deep learning techniques. The TSI-Bench pipeline standardizes experimental settings to enable fair evaluation of imputation algorithms and identification of meaningful insights into the influence of domain-appropriate missingness ratios and patterns on model performance. Furthermore, TSI-Bench innovatively provides a systematic paradigm to tailor time series forecasting algorithms for imputation purposes. Our extensive study across 34,804 experiments, 28 algorithms, and 8 datasets with diverse missingness scenarios demonstrates TSI-Bench's effectiveness in diverse downstream tasks and potential to unlock future directions in time series imputation research and analysis. The source code and experiment logs are available at https://github.com/WenjieDu/AwesomeImputation.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12206 [pdf, other]

The Absolute Age of NGC 3201 derived from Detached Eclipsing Binaries and the Hess Diagram

Authors: Jiaqi, Ying, Brian Chaboyer, Wenxin Du

Abstract: We estimate the absolute age of the globular cluster NGC 3201 using $10,000$ sets of theoretical isochrones constructed through Monte Carlo simulation using the Dartmouth Stellar Evolution Program. These isochrones take into consideration the uncertainty introduced by the choice of stellar evolution parameters. We fit isochrones with 3 detached eclipsing binaries and obtained an age independent of distance. We also fit isochrones with differential reddening corrected HST photometry data utilizing two different Hess diagram based fitting methods. Results from 3 different methods analyzing 2 different types of data agree to within $1\sigma$, and we find the absolute age of NGC 3201 $= 11.85 \pm 0.74$ Gyr. We also perform variable importance analysis to study the uncertainty contribution from individual parameters and we find the distance is the dominant source of uncertainty in photometry based analysis while total metallicity, Helium abundance, $\alpha$-element abundance, mixing length, and treatment of helium diffusion are important sources of uncertainty for all 3 methods.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 14 pages, 10 figures, 3 Tables; Accepted for Publication ApJ

arXiv:2406.11906 [pdf, other]

NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics

Authors: Jingbo Zhou, Shaorong Chen, Jun Xia, Sizhe Liu, Tianze Ling, Wenjie Du, Yue Liu, Jianwei Yin, Stan Z. Li

Abstract: Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the high-throughput analysis of protein composition in biological tissues. Many deep learning methods have been developed for \emph{de novo} peptide sequencing task, i.e., predicting the peptide sequence for the observed mass spectrum. However, two key challenges seriously hinder the further advancement of this important task. Firstly, since there is no consensus for the evaluation datasets, the empirical results in different research papers are often not comparable, leading to unfair comparison. Secondly, the current methods are usually limited to amino acid-level or peptide-level precision and recall metrics. In this work, we present the first unified benchmark NovoBench for \emph{de novo} peptide sequencing, which comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics. Recent impressive methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo and $\pi$-HelixNovo are integrated into our framework. In addition to amino acid-level and peptide-level precision and recall, we evaluate the models' performance in terms of identifying post-translational modifications (PTMs), efficiency and robustness to peptide length, noise peaks and missing fragment ratio, which are important influencing factors yet are seldom considered. Leveraging this benchmark, we conduct a large-scale study of current methods and report many insightful findings that open up new possibilities for future development. The benchmark will be open-sourced to facilitate future research and application.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.11231 [pdf, other]

Enabling robots to follow abstract instructions and complete complex dynamic tasks

Authors: Ruaridh Mon-Williams, Gen Li, Ran Long, Wenqian Du, Chris Lucas

Abstract: Completing complex tasks in unpredictable settings like home kitchens challenges robotic systems. These challenges include interpreting high-level human commands, such as "make me a hot beverage" and performing actions like pouring a precise amount of water into a moving mug. To address these challenges, we present a novel framework that combines Large Language Models (LLMs), a curated Knowledge Base, and Integrated Force and Visual Feedback (IFVF). Our approach interprets abstract instructions, performs long-horizon tasks, and handles various uncertainties. It utilises GPT-4 to analyse the user's query and surroundings, then generates code that accesses a curated database of functions during execution. It translates abstract instructions into actionable steps. Each step involves generating custom code by employing retrieval-augmented generalisation to pull IFVF-relevant examples from the Knowledge Base. IFVF allows the robot to respond to noise and disturbances during execution. We use coffee making and plate decoration to demonstrate our approach, including components ranging from pouring to drawer opening, each benefiting from distinct feedback types and methods. This novel advancement marks significant progress toward a scalable, efficient robotic framework for completing complex tasks in uncertain environments. Our findings are illustrated in an accompanying video and supported by an open-source GitHub repository (released upon paper acceptance).

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.09209 [pdf, other]

Acceleration of the Universe without the Hubble tension with Kaniadakis holographic dark energy using the Hubble horizon as the IR cut-off

Authors: Wei Fang, Guo Chen, Chao-Jun Feng, Wei Du, Chenggang Shu

Abstract: We introduce a holographic dark energy model that incorporates the first-order approximate Kaniadakis entropy, utilizing the Hubble horizon, $1/H$, as the infrared cutoff. We investigate the cosmological evolution within this framework. The model introduces an extra parameter relative to the $\Lambda$CDM model. It posits a Universe that is initially dominated by dark matter, which then evolves to a phase where dark energy becomes the predominant component, with this transition occurring at a redshift of approximately $z \sim 0.419$. The energy density of dark energy is ultimately expected to become constant, thereby circumventing the potential issue of a "big rip". Employing the most recent Type Ia supernova and Hubble parameter data, we constrain the model's parameters and find a Hubble constant of $H_0=72.8$ km/s/Mpc, thereby resolving the Hubble tension issue. The estimated age of the Universe, based on the best-fit parameter values, is $14.2$ Gyr. Furthermore, we predict the number of strong gravitational lenses and conduct statefinder and $Om$ diagnostic analyses to validate and characterize the model.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 9 pages, 5 figures

arXiv:2406.06652 [pdf, other]

Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture

Authors: Yubin Xiao, Di Wang, Xuan Wu, Yuesong Wu, Boyang Li, Wei Du, Liupu Wang, You Zhou

Abstract: Neural models produce promising results when solving Vehicle Routing Problems (VRPs), but often fall short in generalization. Recent attempts to enhance model generalization often incur unnecessarily large training cost or cannot be directly applied to other models solving different VRP variants. To address these issues, we take a novel perspective on model architecture in this study. Specifically, we propose a plug-and-play Entropy-based Scaling Factor (ESF) and a Distribution-Specific (DS) decoder to enhance the size and distribution generalization, respectively. ESF adjusts the attention weight pattern of the model towards familiar ones discovered during training when solving VRPs of varying sizes. The DS decoder explicitly models VRPs of multiple training distribution patterns through multiple auxiliary light decoders, expanding the model representation space to encompass a broader range of distributional scenarios. We conduct extensive experiments on both synthetic and widely recognized real-world benchmarking datasets and compare the performance with seven baseline models. The results demonstrate the effectiveness of using ESF and DS decoder to obtain a more generalizable model and showcase their applicability to solve different VRP variants, i.e., travelling salesman problem and capacitated VRP. Notably, our proposed generic components require minimal computational resources, and can be effortlessly integrated into conventional generalization strategies to further elevate model generalization.

Submitted 17 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

Comments: 13 pages, 6 figures, and 6 tables

arXiv:2406.01719 [pdf, other]

Imputation of Missing Photometric Data and Photometric Redshift Estimation for CSST

Authors: Zhijian Luo, Zhirui Tang, Zhu Chen, Liping Fu, Wei Du, Shaohua Zhang, Yan Gong, Chenggang Shu, Junhao Lu, Yicheng Li, Xian-Min Meng, Xingchen Zhou, Zuhui Fan

Abstract: Accurate photometric redshift (photo-$z$) estimation requires support from multi-band observational data. However, in the actual process of astronomical observations and data processing, some sources may have missing observational data in certain bands for various reasons. This could greatly affect the accuracy and reliability of photo-$z$ estimation for these sources, and even render some estimation methods unusable. The same situation may exist for the upcoming Chinese Space Station Telescope (CSST). In this study, we employ a deep learning method called Generative Adversarial Imputation Networks (GAIN) to impute the missing photometric data in CSST, aiming to reduce the impact of data missing on photo-$z$ estimation and improve estimation accuracy. Our results demonstrate that using the GAIN technique can effectively fill in the missing photometric data in CSST. Particularly, when the data missing rate is below 30\%, the imputation of photometric data exhibits high accuracy, with higher accuracy in the $g$, $r$, $i$, $z$, and $y$ bands compared to the $NUV$ and $u$ bands. After filling in the missing values, the quality of photo-$z$ estimation obtained by the widely used Easy and Accurate Zphot from Yale (EAZY) software is notably enhanced. Evaluation metrics for assessing the quality of photo-$z$ estimation, including the catastrophic outlier fraction ($f_{out}$), the normalized median absolute deviation ($\sigma_{\rm NMAD}$), and the bias of photometric redshift ($bias$), all show some degree of improvement. Our research will help maximize the utilization of observational data and provide a new method for handling sample missing values for applications that require complete photometry data to produce results.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.17508 [pdf, other]

Unveiling the Secrets: How Masking Strategies Shape Time Series Imputation

Authors: Linglong Qian, Zina Ibrahim, Wenjie Du, Yiyuan Yang, Richard JB Dobson

Abstract: In this study, we explore the impact of different masking strategies on time series imputation models. We evaluate the effects of pre-masking versus in-mini-batch masking, normalization timing, and the choice between augmenting and overlaying artificial missingness. Using three diverse datasets, we benchmark eleven imputation models with different missing rates. Our results demonstrate that masking strategies significantly influence imputation accuracy, revealing that more sophisticated and data-driven masking designs are essential for robust model evaluation. We advocate for refined experimental designs and comprehensive disclosure to better simulate real-world patterns, enhancing the practical applicability of imputation models.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.15319 [pdf, other]

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

Authors: Wenyu Du, Tongxu Luo, Zihan Qiu, Zeyu Huang, Yikang Shen, Reynold Cheng, Yike Guo, Jie Fu

Abstract: LLMs are computationally expensive to pre-train due to their large scale. Model growth emerges as a promising approach by leveraging smaller models to accelerate the training of larger ones. However, the viability of these model growth methods in efficient LLM pre-training remains underexplored. This work identifies three critical $\underline{\textit{O}}$bstacles: ($\textit{O}$1) lack of comprehensive evaluation, ($\textit{O}$2) untested viability for scaling, and ($\textit{O}$3) lack of empirical guidelines. To tackle $\textit{O}$1, we summarize existing approaches into four atomic growth operators and systematically evaluate them in a standardized LLM pre-training setting. Our findings reveal that a depthwise stacking operator, called $G_{\text{stack}}$, exhibits remarkable acceleration in training, leading to decreased loss and improved overall performance on eight standard NLP benchmarks compared to strong baselines. Motivated by these promising results, we conduct extensive experiments to delve deeper into $G_{\text{stack}}$ to address $\textit{O}$2 and $\textit{O}$3. For $\textit{O}$2 (untested scalability), our study shows that $G_{\text{stack}}$ is scalable and consistently performs well, with experiments up to 7B LLMs after growth and pre-training LLMs with 750B tokens. For example, compared to a conventionally trained 7B model using 300B tokens, our $G_{\text{stack}}$ model converges to the same loss with 194B tokens, resulting in a 54.6\% speedup. We further address $\textit{O}$3 (lack of empirical guidelines) by formalizing guidelines to determine growth timing and growth factor for $G_{\text{stack}}$, making it practical in general LLM pre-training. We also provide in-depth discussions and comprehensive ablation studies of $G_{\text{stack}}$. Our code and pre-trained model are available at https://llm-stacking.github.io/.

Submitted 24 May, 2024; originally announced May 2024.

Comments: Preprint; The project link: https://llm-stacking.github.io/

arXiv:2405.13761 [pdf]

Monolithic Germanium Tin on Si Avalanche Photodiodes

Authors: Justin Rudie, Sylvester Amoah, Xiaoxin Wang, Rajesh Kumar, Grey Abernathy, Steven Akwabli, Perry C. Grant, Jifeng Liu, Baohua Li, Wei Du, Shui-Qing Yu

Abstract: We demonstrate monolithically grown germanium-tin (GeSn) on silicon avalanche photodiodes (APDs) for infrared light detection. A relatively thinner Ge buffer design was adopted to allow effective photo carriers to transport from the GeSn absorber to the Si multiplication layer such that clear punch-through behavior and a saturated primary responsivity of 0.3 A/W at 1550 nm were observed before avalanche breakdown in GeSn/Si APDs for the first time. The spectral response covers 1500 to 1700 nm. The measured punch-through and breakdown voltages are 15 and 17 V, respectively. Undisputed multiplication gain was obtained with the maximum value of 4.5 at 77 K, and 1.4 at 250 K, directly in reference to the saturated primary responsivity from the same device rather than a different GeSn p-i-n photodiode in previous reports. A peak responsivity was measured as 1.12 A/W at 1550 nm and 77 K.

Submitted 22 May, 2024; originally announced May 2024.

Comments: 8 pages, 5 figures, invited paper

arXiv:2405.13401 [pdf, ps, other]

TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models

Authors: Pengzhou Cheng, Yidong Ding, Tianjie Ju, Zongru Wu, Wei Du, Ping Yi, Zhuosheng Zhang, Gongshen Liu

Abstract: Large language models (LLMs) have raised concerns about potential security threats despite their strong performance in Natural Language Processing (NLP). Backdoor attacks were initially shown to cause substantial harm to LLMs at all stages, but their cost and robustness have been criticized. Attacking LLMs directly is inherently risky under security review and prohibitively expensive. Besides, the continuous iteration of LLMs will degrade the robustness of backdoors. In this paper, we propose TrojanRAG, which employs a joint backdoor attack in Retrieval-Augmented Generation, thereby manipulating LLMs in universal attack scenarios. Specifically, the adversary constructs elaborate target contexts and trigger sets. Multiple pairs of backdoor shortcuts are orthogonally optimized by contrastive learning, thus constraining the triggering conditions to a parameter subspace to improve the matching. To improve the recall of the RAG for the target contexts, we introduce a knowledge graph to construct structured data to achieve hard matching at a fine-grained level. Moreover, we normalize the backdoor scenarios in LLMs to analyze the real harm caused by backdoors from both attackers' and users' perspectives and further verify whether the context is a favorable tool for jailbreaking models. Extensive experimental results on truthfulness, language understanding, and harmfulness show that TrojanRAG poses versatile threats while maintaining retrieval capabilities on normal queries.

Submitted 7 July, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

Comments: 19 pages, 14 figures, 4 tables

arXiv:2405.10163 [pdf]

Electrically Injected mid-infrared GeSn laser on Si operating at 140 K

Authors: Sudip Acharya, Hryhorii Stanchu, Rajesh Kumar, Solomon Ojo, Mourad Benamara, Guo-En Chang, Baohua Li, Wei Du, Shui-Qing Yu

Abstract: Owing to their true direct bandgap and tunable bandgap energies, GeSn alloys are increasingly attractive as gain media for mid-IR lasers that can be monolithically integrated on Si. Demonstrations of optically pumped GeSn lasers at room temperature under pulsed conditions and at cryogenic temperature under continuous-wave excitation show the great promise of GeSn lasers as efficient electrically injected light sources on Si. Here we report electrically injected GeSn lasers using a Fabry-Perot cavity with 20, 40, and 80 micron ridge widths. A maximum operating temperature of 140 K, with a lasing threshold of 0.756 kA/cm$^{2}$ at 77 K and an emission wavelength of 2722 nm at 140 K, was obtained. The lower threshold current density compared to previous works was achieved by reducing optical loss and improving the optical confinement. The peak power was measured as 2.2 mW/facet at 77 K.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2404.18408 [pdf, other]

doi 10.1088/1674-4527/ad3954

Low surface brightness galaxies from BASS+MzLS with Machine Learning

Authors: Peng-Liang Du, Wei Du, Bing-Qing Zhang, Zhen-Ping Yi, Min He, Hong Wu

Abstract: From $\sim$ 5000 deg$^{2}$ of the combination of the Beijing-Arizona Sky Survey (BASS) and Mayall $z$-band Legacy Survey (MzLS), which is also the northern sky region of the Dark Energy Spectroscopic Instrument (DESI) Legacy Imaging Surveys, we selected a sample of 31,825 candidates of low surface brightness galaxies (LSBGs) with the mean effective surface brightness 24.2 $< \bar{\mu}_{\rm eff,g} <$ 28.8 mag arcsec$^{\rm -2}$ and the half-light radius 2.5$^{\prime\prime}$ $< r_{\rm eff} <$ 20$^{\prime\prime}$ based on the released photometric catalogue and the machine learning model. The distribution of the LSBGs is bimodal in the $g$ - $r$ color, indicating two distinct populations of blue ($g$ - $r <$ 0.60) and red ($g$ - $r >$ 0.60) LSBGs. The blue LSBGs appear spiral, disk or irregular while the red LSBGs are spheroidal or elliptical and spatially clustered. This trend shows that the color has a strong correlation with galaxy morphology for LSBGs. In the spatial distribution, the blue LSBGs are more uniformly distributed while the red ones are highly clustered, indicating that red LSBGs preferentially populate denser environments than the blue LSBGs. Besides, both populations have consistent distributions of ellipticity (median $\epsilon \sim$ 0.3), half-light radius (median $r_{\rm eff} \sim$ 4$^{\prime\prime}$), and Sersic index (median $n$ = 1), implying the dominance of the full sample by the round and disk galaxies. This sample has definitely extended the studies of LSBGs to a regime of lower surface brightness, fainter magnitude, and broader other properties than the previous SDSS-based samples.

Submitted 29 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

Comments: 20 pages, 11 figures, 1 table, accepted by Research in Astronomy and Astrophysics

arXiv:2404.10515 [pdf, other]

An Enhanced Differential Grouping Method for Large-Scale Overlapping Problems

Authors: Maojiang Tian, Mingke Chen, Wei Du, Yang Tang, Yaochu Jin

Abstract: Large-scale overlapping problems are prevalent in practical engineering applications, and the optimization challenge is significantly amplified due to the existence of shared variables. Decomposition-based cooperative coevolution (CC) algorithms have demonstrated promising performance in addressing large-scale overlapping problems. However, current CC frameworks designed for overlapping problems rely on grouping methods for the identification of overlapping problem structures and the current grouping methods for large-scale overlapping problems fail to consider both accuracy and efficiency simultaneously. In this article, we propose a two-stage enhanced grouping method for large-scale overlapping problems, called OEDG, which achieves accurate grouping while significantly reducing computational resource consumption. In the first stage, OEDG employs a grouping method based on the finite differences principle to identify all subcomponents and shared variables. In the second stage, we propose two grouping refinement methods, called subcomponent union detection (SUD) and subcomponent detection (SD), to enhance and refine the grouping results. SUD examines the information of the subcomponents and shared variables obtained in the previous stage, and SD corrects inaccurate grouping results. To better verify the performance of the proposed OEDG, we propose a series of novel benchmarks that consider various properties of large-scale overlapping problems, including the topology structure, overlapping degree, and separability. Extensive experimental results demonstrate that OEDG is capable of accurately grouping different types of large-scale overlapping problems while consuming fewer computational resources. Finally, we empirically verify that the proposed OEDG can effectively improve the optimization performance of diverse large-scale overlapping problems.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.08169 [pdf, other]

AutoGFI: Streamlined Generalized Fiducial Inference for Modern Inference Problems

Authors: Wei Du, Jan Hannig, Thomas C. M. Lee, Yi Su, Chunzhe Zhang

Abstract: The origins of fiducial inference trace back to the 1930s when R. A. Fisher first introduced the concept as a response to what he perceived as a limitation of Bayesian inference - the requirement for a subjective prior distribution on model parameters in cases where no prior information was available. However, Fisher's initial fiducial approach fell out of favor as complications arose, particularly in multi-parameter problems. In the wake of 2000, amidst a renewed interest in contemporary adaptations of fiducial inference, generalized fiducial inference (GFI) emerged to extend Fisher's fiducial argument, providing a promising avenue for addressing numerous crucial and practical inference challenges. Nevertheless, the adoption of GFI has been limited due to its often demanding mathematical derivations and the necessity for implementing complex Markov Chain Monte Carlo algorithms. This complexity has impeded its widespread utilization and practical applicability. This paper presents a significant advancement by introducing an innovative variant of GFI designed to alleviate these challenges. Specifically, this paper proposes AutoGFI, an easily implementable algorithm that streamlines the application of GFI to a broad spectrum of inference problems involving additive noise. AutoGFI can be readily implemented as long as a fitting routine is available, making it accessible to a broader audience of researchers and practitioners. To demonstrate its effectiveness, AutoGFI is applied to three contemporary and challenging problems: tensor regression, matrix completion, and regression with network cohesion. These case studies highlight the immense potential of GFI and illustrate AutoGFI's promising performance when compared to specialized solutions for these problems. Overall, this research paves the way for a more accessible and powerful application of GFI in a range of practical domains.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.00819 [pdf, other]

Ultra-relativistic quark-nucleus scattering on quantum computers

Authors: Sihao Wu, Weijie Du, Xingbo Zhao, James P. Vary

Abstract: Quantum computing provides a promising approach for solving the real-time dynamics of systems consisting of quarks and gluons from first-principle calculations that are intractable with classical computers. In this work, we start with an initial problem of the ultra-relativistic quark-nucleus scattering and present an efficient and precise approach to quantum simulate the dynamics on the light front. This approach employs the eigenbasis of the asymptotic scattering system and implements the compact scheme for basis encoding. It exploits the operator structure of the light-front Hamiltonian of the scattering system, which enables the Hamiltonian input scheme that utilizes the quantum Fourier transform for efficiency. It utilizes the truncated Taylor series for the dynamics simulations. The qubit cost of our approach scales logarithmically with the Hilbert space dimension of the scattering system. The gate cost has optimal scaling with the simulation error and near optimal scaling with the simulation time. These scalings make our approach advantageous for large-scale dynamics simulations on future fault-tolerant quantum computers. We demonstrate our approach with a simple scattering problem and benchmark the results with those from the Trotter algorithm and the classical calculations, where good agreement between the results is found.

Submitted 15 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

Comments: 27 pages, 11 figures. Comments are welcome

arXiv:2404.00555 [pdf, other]

Gas-rich Ultra-diffuse Galaxies Are Originated from High Specific Angular Momentum

Authors: Yu Rong, Huijie Hu, Min He, Wei Du, Qi Guo, Hui-Yuan Wang, Hong-Xin Zhang, Houjun Mo

Abstract: Ultra-diffuse galaxies, characterized by comparable effective radii to the Milky Way but possessing 100-1,000 times fewer stars, offer a unique opportunity to garner novel insights into the mechanisms governing galaxy formation. Nevertheless, the existing corpus of observational and simulation studies has not yet yielded a definitive constraint or comprehensive consensus on the formation mechanisms underlying ultra-diffuse galaxies. In this study, we delve into the properties of ultra-diffuse galaxies enriched with neutral hydrogen using a semi-analytic method, with the explicit aim of constraining existing ultra-diffuse galaxy formation models. We find that the gas-rich ultra-diffuse galaxies are statistically neither failed $L^{\star}$ galaxies nor dark matter deficient galaxies. In statistical terms, these ultra-diffuse galaxies exhibit comparable halo concentration, but higher baryonic mass fraction, as well as higher stellar and gas specific angular momentum, in comparison to typical dwarf galaxy counterparts. Our analysis unveils that higher gas specific angular momentum serves as the underlying factor elucidating the observed heightened baryonic mass fractions, diminished star formation efficiency, expanded stellar disk sizes, and reduced stellar densities in ultra-diffuse galaxies. Our findings make significant contributions to advancing our knowledge of ultra-diffuse galaxy formation and shed light on the intricate interplay between gas dynamics and the evolution of galaxies.

Submitted 31 March, 2024; originally announced April 2024.

Comments: comments welcome

arXiv:2403.15393 [pdf, other]

Detection of Opioid Users from Reddit Posts via an Attention-based Bidirectional Recurrent Neural Network

Authors: Yuchen Wang, Zhengyu Fang, Wei Du, Shuai Xu, Rong Xu, Jing Li

Abstract: The opioid epidemic, referring to the growing hospitalizations and deaths caused by opioid overdose and addiction, has become a severe health problem in the United States. Many strategies have been developed by the federal and local governments and health communities to combat this crisis. Among them, improving our understanding of the epidemic through better health surveillance is one of the top priorities. In addition to direct testing, machine learning approaches may also allow us to detect opioid users by analyzing data from social media because many opioid users may choose not to do the tests but may share their experiences on social media anonymously. In this paper, we take advantage of recent advances in machine learning, and collect and analyze user posts from the popular social network Reddit with the goal of identifying opioid users. Posts from more than 1,000 users who have posted on three sub-reddits over a period of one month have been collected. In addition to the ones that contain keywords such as opioid, opiate, or heroin, we have also collected posts that contain slang words for opioids such as black or chocolate. We apply an attention-based bidirectional long short-term memory model to identify opioid users. Experimental results show that the approach significantly outperforms competitive algorithms in terms of F1-score. Furthermore, the model allows us to extract the most informative words, such as opiate, opioid, and black, from posts via the attention layer, which provides more insight into how the machine learning algorithm works in distinguishing drug users from non-drug users.

Submitted 9 February, 2024; originally announced March 2024.

arXiv:2403.12130 [pdf, other]

Almost Optically Dark Galaxies in DECaLS (I): Detection, Optical Properties and Possible Origins

Authors: Lin Du, Wei Du, Cheng Cheng, Ming Zhu, Haiyang Yu, Hong Wu

Abstract: We report the discovery of eight optical counterparts of ALFALFA extragalactic objects from DECaLS, five of which are discovered for the first time. These objects were flagged as HI emission sources with no optical counterparts in SDSS before. Multi-band data reveal their unusual physical properties. They are faint and blue ($g-r=-0.35\sim0.55$), with quite low surface brightness ($\mu_{\rm g,peak}=24.88\sim26.41\,{\rm mag}/{\rm arcsec}^2$), irregular morphologies, low stellar masses ($\log_{10}(M_{*}/M_\odot)=5.27\sim7.15$), low star formation rates (${\rm SFR}=0.21\sim9.24\times10^{-3}\,{M_\odot}\,{\rm yr}^{-1}$), and remarkably high HI-to-stellar mass ratios ($\log_{10}(M_{\rm HI}/M_{*}) = 1.72\sim3.22$, except AGC 215415). They deviate from the scaling relations between HI and optical properties defined by the ALFALFA sample and the baryonic Tully-Fisher relation. They agree well with the main sequence of star-forming galaxies but exhibit low star-forming efficiency. Based on their physical properties and environments, we speculate that six of these objects may have originated from tidal processes, while the remaining two appear to have isolated origins. They may have had a relatively calm evolutionary history and only begun to form stars recently.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 32 pages, 11 figures, accepted by the Astrophysical Journal

arXiv:2403.07013 [pdf, other]

AdaNovo: Adaptive De Novo Peptide Sequencing with Conditional Mutual Information

Authors: Jun Xia, Shaorong Chen, Jingbo Zhou, Tianze Ling, Wenjie Du, Sizhe Liu, Stan Z. Li

Abstract: Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the analysis of protein composition in biological samples. Despite the development of various deep learning methods for identifying amino acid sequences (peptides) responsible for observed spectra, challenges persist in de novo peptide sequencing. Firstly, prior methods struggle to identify amino acids with post-translational modifications (PTMs) due to their lower frequency in training data compared to canonical amino acids, further resulting in decreased peptide-level identification precision. Secondly, diverse types of noise and missing peaks in mass spectra reduce the reliability of training data (peptide-spectrum matches, PSMs). To address these challenges, we propose AdaNovo, a novel framework that calculates conditional mutual information (CMI) between the spectrum and each amino acid/peptide, using CMI for adaptive model training. Extensive experiments demonstrate AdaNovo's state-of-the-art performance on a 9-species benchmark, where the peptides in the training set are almost completely disjoint from the peptides of the test sets. Moreover, AdaNovo excels in identifying amino acids with PTMs and exhibits robustness against data noise. The supplementary materials contain the official code.

Submitted 15 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.03425 [pdf, other]

Sculpting Molecules in 3D: A Flexible Substructure Aware Framework for Text-Oriented Molecular Optimization

Authors: Kaiwei Zhang, Yange Lin, Guangcheng Wu, Yuxiang Ren, Xuecang Zhang, Bo wang, Xiaoyu Zhang, Weitao Du

Abstract: The integration of deep learning, particularly AI-Generated Content, with high-quality data derived from ab initio calculations has emerged as a promising avenue for transforming the landscape of scientific research. However, the challenge of designing molecular drugs or materials that incorporate multi-modality prior knowledge remains a critical and complex undertaking. Specifically, achieving a practical molecular design necessitates not only meeting the diversity requirements but also addressing structural and textural constraints with various symmetries outlined by domain experts. In this article, we present an innovative approach to tackle this inverse design problem by formulating it as a multi-modality guidance generation/optimization task. Our proposed solution involves a textural-structure alignment symmetric diffusion framework for the implementation of molecular generation/optimization tasks, namely 3DToMolo. 3DToMolo aims to harmonize diverse modalities, aligning them seamlessly to produce molecular structures that adhere to the symmetric structural and textural constraints specified by experts in the field. Experimental trials across three guidance generation settings have shown a superior hit generation performance compared to state-of-the-art methodologies. Moreover, 3DToMolo demonstrates the capability to generate novel molecules, incorporating specified target substructures, without the need for prior knowledge. This work not only holds general significance for the advancement of deep learning methodologies but also paves the way for a transformative shift in molecular design strategies. 3DToMolo creates opportunities for a more nuanced and effective exploration of the vast chemical space, opening new frontiers in the development of molecular entities with tailored properties and functionalities.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.01192 [pdf, other]

A Composite Decomposition Method for Large-Scale Global Optimization

Authors: Maojiang Tian, Minyang Chen, Wei Du, Yang Tang, Yaochu Jin, Gary G. Yen

Abstract: Cooperative co-evolution (CC) algorithms, based on the divide-and-conquer strategy, have emerged as the predominant approach to solving large-scale global optimization (LSGO) problems. The efficiency and accuracy of the grouping stage significantly impact the performance of the optimization process. While the general separability grouping (GSG) method has overcome the limitation of previous differential grouping (DG) methods by enabling the decomposition of non-additively separable functions, it suffers from high computational complexity. To address this challenge, this article proposes a composite separability grouping (CSG) method, seamlessly integrating DG and GSG into a problem decomposition framework to utilize the strengths of both approaches. CSG introduces a step-by-step decomposition framework that accurately decomposes various problem types using fewer computational resources. By sequentially identifying additively, multiplicatively and generally separable variables, CSG progressively groups non-separable variables by recursively considering the interactions between each non-separable variable and the formed non-separable groups. Furthermore, to enhance the efficiency and accuracy of CSG, we introduce two innovative methods: a multiplicatively separable variable detection method and a non-separable variable grouping method. These two methods are designed to effectively detect multiplicatively separable variables and efficiently group non-separable variables, respectively. Extensive experimental results demonstrate that CSG achieves more accurate variable grouping with lower computational complexity compared to GSG and state-of-the-art DG series designs.

Submitted 8 March, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

arXiv:2403.00172 [pdf, other]

Go Beyond Black-box Policies: Rethinking the Design of Learning Agent for Interpretable and Verifiable HVAC Control

Authors: Zhiyu An, Xianzhong Ding, Wan Du

Abstract: Recent research has shown the potential of Model-based Reinforcement Learning (MBRL) to enhance energy efficiency of Heating, Ventilation, and Air Conditioning (HVAC) systems. However, existing methods rely on black-box thermal dynamics models and stochastic optimizers, lacking reliability guarantees and posing risks to occupant health. In this work, we overcome the reliability bottleneck by redesigning HVAC controllers using decision trees extracted from existing thermal dynamics models and historical data. Our decision tree-based policies are deterministic, verifiable, interpretable, and more energy-efficient than current MBRL methods. First, we introduce a novel verification criterion for RL agents in HVAC control based on domain knowledge. Second, we develop a policy extraction procedure that produces a verifiable decision tree policy. We found that the high dimensionality of the thermal dynamics model input hinders the efficiency of policy extraction. To tackle the dimensionality challenge, we leverage importance sampling conditioned on historical data distributions, significantly improving policy extraction efficiency. Lastly, we present an offline verification algorithm that guarantees the reliability of a control policy. Extensive experiments show that our method saves 68.4% more energy and increases human comfort gain by 14.8% compared to the state-of-the-art method, in addition to an 1127x reduction in computation overhead. Our code and data are available at https://github.com/ryeii/Veri_HVAC

Submitted 29 February, 2024; originally announced March 2024.

Comments: Accepted for the 61st Design Automation Conference (DAC)

arXiv:2402.18945 [pdf, other]

SynGhost: Imperceptible and Universal Task-agnostic Backdoor Attack in Pre-trained Language Models

Authors: Pengzhou Cheng, Wei Du, Zongru Wu, Fengwei Zhang, Libo Chen, Gongshen Liu

Abstract: Pre-training has been a necessary phase for deploying pre-trained language models (PLMs) to achieve remarkable performance in downstream tasks. However, we empirically show that backdoor attacks exploit such a phase as a vulnerable entry point for task-agnostic attacks. In this paper, we first propose $\mathtt{maxEntropy}$, an entropy-based poisoning filtering defense, to prove that existing task-agnostic backdoors are easily exposed due to the explicit triggers used. Then, we present $\mathtt{SynGhost}$, an imperceptible and universal task-agnostic backdoor attack in PLMs. Specifically, $\mathtt{SynGhost}$ hostilely manipulates clean samples through different syntactic structures and then maps the backdoor to representation space without disturbing the primitive representation. $\mathtt{SynGhost}$ further leverages contrastive learning to achieve universality, producing a uniform distribution of backdoors in the representation space. In light of the syntactic properties, we also introduce an awareness module to alleviate the interference between different syntactic structures. Experiments show that $\mathtt{SynGhost}$ poses a more serious threat: it causes severe harm not only to various downstream tasks under two tuning paradigms but also across different PLMs. Meanwhile, $\mathtt{SynGhost}$ is imperceptible against three countermeasures based on perplexity, fine-pruning, and the proposed $\mathtt{maxEntropy}$.

Submitted 24 May, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: 18 pages, 19 figures, 13 tables

arXiv:2402.16918 [pdf, other]

m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers

Authors: Ka Man Lo, Yiming Liang, Wenyu Du, Yuantao Fan, Zili Wang, Wenhao Huang, Lei Ma, Jie Fu

Abstract: Modular neural architectures are gaining attention for their powerful generalization and efficient adaptation to new domains. However, training these models poses challenges due to optimization difficulties arising from intrinsic sparse connectivity. Leveraging knowledge from monolithic models through techniques like knowledge distillation can facilitate training and enable integration of diverse knowledge. Nevertheless, conventional knowledge distillation approaches are not tailored to modular models and struggle with unique architectures and enormous parameter counts. Motivated by these challenges, we propose module-to-module knowledge distillation (m2mKD) for transferring knowledge between modules. m2mKD combines teacher modules of a pretrained monolithic model and student modules of a modular model with a shared meta model respectively to encourage the student module to mimic the behaviour of the teacher module. We evaluate m2mKD on two modular neural architectures: Neural Attentive Circuits (NACs) and Vision Mixture-of-Experts (V-MoE). Applying m2mKD to NACs yields significant improvements in IID accuracy on Tiny-ImageNet (up to 5.6%) and OOD robustness on Tiny-ImageNet-R (up to 4.2%). Additionally, the V-MoE-Base model trained with m2mKD achieves 3.5% higher accuracy than end-to-end training on ImageNet-1k. Code is available at https://github.com/kamanphoebe/m2mKD.

Submitted 7 July, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.16061 [pdf, other]

How Large Language Models Encode Context Knowledge? A Layer-Wise Probing Study

Authors: Tianjie Ju, Weiwei Sun, Wei Du, Xinwei Yuan, Zhaochun Ren, Gongshen Liu

Abstract: Previous work has showcased the intriguing capability of large language models (LLMs) in retrieving facts and processing context knowledge. However, only limited research exists on the layer-wise capability of LLMs to encode knowledge, which challenges our understanding of their internal mechanisms. In this paper, we make the first attempt to investigate the layer-wise capability of LLMs through probing tasks. We leverage the powerful generative capability of ChatGPT to construct probing datasets, providing diverse and coherent evidence corresponding to various facts. We employ $\mathcal{V}$-usable information as the validation metric to better reflect the capability in encoding context knowledge across different layers. Our experiments on conflicting and newly acquired knowledge show that LLMs: (1) prefer to encode more context knowledge in the upper layers; (2) primarily encode context knowledge within knowledge-related entity tokens at lower layers while progressively expanding more knowledge within other tokens at upper layers; and (3) gradually forget the earlier context knowledge retained within the intermediate layers when provided with irrelevant evidence. Code is publicly available at https://github.com/Jometeorie/probing_llama.

Submitted 4 March, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

Comments: Accepted at LREC-COLING 2024 (Long Paper)

arXiv:2402.14600 [pdf, other]

Diffusion Model-Based Multiobjective Optimization for Gasoline Blending Scheduling

Authors: Wenxuan Fang, Wei Du, Renchu He, Yang Tang, Yaochu Jin, Gary G. Yen

Abstract: Gasoline blending scheduling uses resource allocation and operation sequencing to meet a refinery's production requirements. The presence of nonlinearity, integer constraints, and a large number of decision variables adds complexity to this problem, posing challenges for traditional and evolutionary algorithms. This paper introduces a novel multiobjective optimization approach driven by a diffusion model (named DMO), which is designed specifically for gasoline blending scheduling. To address integer constraints and generate feasible schedules, the diffusion model creates multiple intermediate distributions between Gaussian noise and the feasible domain. Through iterative processes, the solutions transition from Gaussian noise to feasible schedules while optimizing the objectives using the gradient descent method. DMO achieves simultaneous objective optimization and constraint adherence. Comparative tests are conducted to evaluate DMO's performance across various scales. The experimental results demonstrate that DMO surpasses state-of-the-art multiobjective evolutionary algorithms in terms of efficiency when solving gasoline blending scheduling problems.

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.13419 [pdf, ps, other]

Reward Bound for Behavioral Guarantee of Model-based Planning Agents

Authors: Zhiyu An, Xianzhong Ding, Wan Du

Abstract: Recent years have seen an emerging interest in the trustworthiness of machine learning-based agents in the wild, especially in robotics, to provide safety assurance for the industry. Obtaining behavioral guarantees for these agents remains an important problem. In this work, we focus on guaranteeing a model-based planning agent reaches a goal state within a specific future time step. We show that there exists a lower bound for the reward at the goal state, such that if the said reward is below that bound, it is impossible to obtain such a guarantee. By extension, we show how to enforce preferences over multiple goals.

Submitted 20 February, 2024; originally announced February 2024.

Comments: To be published in ICLR 24 tiny paper track

arXiv:2402.12720 [pdf, other]

Revisiting the Information Capacity of Neural Network Watermarks: Upper Bound Estimation and Beyond

Authors: Fangqi Li, Haodong Zhao, Wei Du, Shilin Wang

Abstract: To trace the copyright of deep neural networks, an owner can embed its identity information into its model as a watermark. The capacity of the watermark quantifies the maximal volume of information that can be verified from the watermarked model. Current studies on capacity focus on the ownership verification accuracy under ordinary removal attacks and fail to capture the relationship between robustness and fidelity. This paper studies the capacity of deep neural network watermarks from an information-theoretical perspective. We propose a new definition of deep neural network watermark capacity analogous to channel capacity, analyze its properties, and design an algorithm that yields a tight estimation of its upper bound under adversarial overwriting. We also propose a universal non-invasive method to secure the transmission of the identity message beyond capacity by multiple rounds of ownership verification. Our observations provide evidence for neural network owners and defenders who are curious about the tradeoff between the integrity of their ownership and the performance degradation of their products.

Submitted 20 February, 2024; originally announced February 2024.

Comments: Accepted by AAAI 2024

arXiv:2402.11900 [pdf, other]

Investigating Multi-Hop Factual Shortcuts in Knowledge Editing of Large Language Models

Authors: Tianjie Ju, Yijin Chen, Xinwei Yuan, Zhuosheng Zhang, Wei Du, Yubin Zheng, Gongshen Liu

Abstract: Recent work has showcased the powerful capability of large language models (LLMs) in recalling knowledge and reasoning. However, the reliability of LLMs in combining these two capabilities into reasoning through multi-hop facts has not been widely explored. This paper systematically investigates the possibilities for LLMs to utilize shortcuts based on direct connections between the initial and terminal entities of multi-hop knowledge. We first explore the existence of factual shortcuts through Knowledge Neurons, revealing that: (i) the strength of factual shortcuts is highly correlated with the frequency of co-occurrence of initial and terminal entities in the pre-training corpora; (ii) few-shot prompting leverages more shortcuts in answering multi-hop questions compared to chain-of-thought prompting. Then, we analyze the risks posed by factual shortcuts from the perspective of multi-hop knowledge editing. Analysis shows that approximately 20% of the failures are attributed to shortcuts, and the initial and terminal entities in these failure instances usually have higher co-occurrences in the pre-training corpus. Finally, we propose erasing shortcut neurons to mitigate the associated risks and find that this approach significantly reduces failures in multi-hop knowledge editing caused by shortcuts.

Submitted 2 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Accepted at ACL 2024 (Long Paper. Main Conference)

arXiv:2402.11205 [pdf, other]

An Efficient Quantum Circuit for Block Encoding a Pairing Hamiltonian

Authors: Diyi Liu, Weijie Du, Lin Lin, James P. Vary, Chao Yang

Abstract: We present an efficient quantum circuit for block encoding a pairing Hamiltonian often studied in nuclear physics. Our block encoding scheme does not require mapping the creation and annihilation operators to the Pauli operators and representing the Hamiltonian as a linear combination of unitaries. Instead, we show how to encode the Hamiltonian directly using controlled swap operations. We analyze the gate complexity of the block encoding circuit and show that it scales polynomially with respect to the number of qubits required to represent a quantum state associated with the pairing Hamiltonian. We also show how the block encoding circuit can be combined with the quantum singular value transformation to construct an efficient quantum circuit for approximating the density of states of a pairing Hamiltonian. The techniques presented can be extended to encode more general second-quantized Hamiltonians.

Submitted 21 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

Comments: 27 pages, 18 figures

MSC Class: 68Q12; 81P68

arXiv:2402.10760 [pdf, other]

RAGIC: Risk-Aware Generative Adversarial Model for Stock Interval Construction

Authors: Jingyi Gu, Wenlu Du, Guiling Wang

Abstract: Efforts to predict stock market outcomes have yielded limited success due to the inherently stochastic nature of the market, influenced by numerous unpredictable factors. Many existing prediction approaches focus on single-point predictions, lacking the depth needed for effective decision-making and often overlooking market risk. To bridge this gap, we propose a novel model, RAGIC, which introduces sequence generation for stock interval prediction to quantify uncertainty more effectively. Our approach leverages a Generative Adversarial Network (GAN) to produce future price sequences infused with randomness inherent in financial markets. RAGIC's generator includes a risk module, capturing the risk perception of informed investors, and a temporal module, accounting for historical price trends and seasonality. This multi-faceted generator informs the creation of risk-sensitive intervals through statistical inference, incorporating horizon-wise insights. The interval's width is carefully adjusted to reflect market volatility. Importantly, our approach relies solely on publicly available data and incurs only low computational overhead. RAGIC's evaluation across globally recognized broad-based indices demonstrates its balanced performance, offering both accuracy and informativeness. Achieving a consistent 95% coverage, RAGIC maintains a narrow interval width. This promising outcome suggests that our approach effectively addresses the challenges of stock market prediction while incorporating vital risk considerations.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.08969 [pdf, other]

Hamiltonian input model and spectroscopy on quantum computers

Authors: Weijie Du, James P. Vary

Abstract: We present a novel input model for general second-quantized Hamiltonians of relativistic or non-relativistic many-fermion systems. This input model incorporates the fermionic anticommutation relations and particle number variations, and respects the symmetries of the Hamiltonian. Based on our input model, we propose a hybrid framework for spectral calculations on future quantum hardware. We provide explicit circuit designs and the associated gate cost and circuit depth. We demonstrate our framework by solving the low-lying spectra of $^{42}$Ca and $^{46}$Ca. Our input model provides new pathways to solving the spectra and time evolutions of relativistic and non-relativistic many-fermion systems.

Submitted 14 February, 2024; originally announced February 2024.

Comments: We welcome comments. Please send comments to duweigy@gmail.com

arXiv:2402.04059 [pdf, other]

Deep Learning for Multivariate Time Series Imputation: A Survey

Authors: Jun Wang, Wenjie Du, Wei Cao, Keli Zhang, Wenjia Wang, Yuxuan Liang, Qingsong Wen

Abstract: Ubiquitous missing values cause multivariate time series data to be only partially observed, destroying the integrity of the time series and hindering effective time series data analysis. Recently, deep learning imputation methods have demonstrated remarkable success in elevating the quality of corrupted time series data, subsequently enhancing performance in downstream tasks. In this paper, we conduct a comprehensive survey on the recently proposed deep learning imputation methods. First, we propose a taxonomy for the reviewed methods, and then provide a structured review of these methods by highlighting their strengths and limitations. We also conduct empirical experiments to study different methods and compare their enhancement for downstream tasks. Finally, the open issues for future research on multivariate time series imputation are pointed out. All code and configurations of this work, including a regularly maintained multivariate time series imputation paper list, can be found in the GitHub repository https://github.com/WenjieDu/Awesome_Imputation.

Submitted 6 February, 2024; originally announced February 2024.

Comments: 9 pages, 1 figure, 5 tables, 58 referred papers

arXiv:2402.03781 [pdf, other]

MolTC: Towards Molecular Relational Modeling In Language Models

Authors: Junfeng Fang, Shuai Zhang, Chang Wu, Zhengyi Yang, Zhiyuan Liu, Sihang Li, Kun Wang, Wenjie Du, Xiang Wang

Abstract: Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research. Recently, the adoption of large language models (LLMs), known for their vast knowledge repositories and advanced logical inference capabilities, has emerged as a promising way for efficient and effective MRL. Despite their potential, these methods predominantly rely on textual data, thus not fully harnessing the wealth of structural information inherent in molecular graphs. Moreover, the absence of a unified framework exacerbates the issue of information underutilization, as it hinders the sharing of interaction mechanisms learned across diverse datasets. To address these challenges, this work proposes a novel LLM-based multi-modal framework for Molecular inTeraction prediction following Chain-of-Thought (CoT) theory, termed MolTC, which effectively integrates the graphical information of the two molecules in a pair. To train MolTC efficiently, we introduce a Multi-hierarchical CoT concept to refine its training paradigm, and construct a comprehensive Molecular Interactive Instructions dataset for the development of biochemical LLMs involving MRL. Our experiments, conducted across various datasets involving over 4,000,000 molecular pairs, exhibit the superiority of our method over current GNN and LLM-based baselines. Code is available at https://github.com/MangoKiller/MolTC.

Submitted 10 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: ACL 2024

arXiv:2402.01204 [pdf, other]

A Survey on Self-Supervised Learning for Non-Sequential Tabular Data

Authors: Wei-Yao Wang, Wei-Wei Du, Derek Xu, Wei Wang, Wen-Chih Peng

Abstract: Self-supervised learning (SSL) has been incorporated into many state-of-the-art models in various domains, where SSL defines pretext tasks based on unlabeled datasets to learn contextualized and robust representations. Recently, SSL has been a new trend in exploring the representation learning capability in the realm of tabular data, which is more challenging due to not having explicit relations for learning descriptive representations. This survey aims to systematically review and summarize the recent progress and challenges of SSL for non-sequential tabular data (SSL4NS-TD). We first present a formal definition of NS-TD and clarify its correlation to related studies. Then, these approaches are categorized into three groups: predictive learning, contrastive learning, and hybrid learning, with the motivations and strengths of representative methods within each direction. On top of this, application issues of SSL4NS-TD are presented, including automatic data engineering, cross-table transferability, and domain knowledge integration. In addition, we elaborate on existing benchmarks and datasets for NS-TD applications to discuss the performance of existing tabular models. Finally, we discuss the challenges of SSL4NS-TD and provide potential directions for future research. We expect our work to be useful in terms of encouraging more research on lowering the barrier to entry for SSL in the tabular domain and improving the foundations for implicit tabular data.

Submitted 5 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: The paper list can be found at https://github.com/wwweiwei/awesome-self-supervised-learning-for-tabular-data

arXiv:2401.17138 [pdf, other]

Nuclear scattering via quantum computing

Authors: Peiyan Wang, Weijie Du, Wei Zuo, James P. Vary

Abstract: We propose a hybrid quantum-classical framework to solve the elastic scattering phase shift of two well-bound nuclei in an uncoupled channel. Within this framework, we develop a many-body formalism in which the continuum scattering states of the two colliding nuclei are regulated by a weak external harmonic oscillator potential with varying strength. Based on our formalism, we propose an approach to compute the eigenenergies of the low-lying scattering states of the relative motion of the colliding nuclei as a function of the oscillator strength of the confining potential. Utilizing the modified effective range expansion, we extrapolate the elastic scattering phase shift of the colliding nuclei from these eigenenergies to the limit when the external potential vanishes. In our hybrid approach, we leverage the advantage of quantum computing to solve for these eigenenergies from a set of many-nucleon Hamiltonian eigenvalue problems. These eigenenergies are inputs to classical computers to obtain the phase shift. We demonstrate our framework with two simple problems, where we implement the rodeo algorithm to solve the relevant eigenenergies with the IBM Qiskit quantum simulator. The results of both the spectra and the elastic scattering phase shifts agree well with other theoretical results.

Submitted 15 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: We welcome comments!

arXiv:2401.15122 [pdf, other]

A Multi-Grained Symmetric Differential Equation Model for Learning Protein-Ligand Binding Dynamics

Authors: Shengchao Liu, Weitao Du, Yanjing Li, Zhuoxinran Li, Vignesh Bhethanabotla, Nakul Rampal, Omar Yaghi, Christian Borgs, Anima Anandkumar, Hongyu Guo, Jennifer Chayes

Abstract: In drug discovery, molecular dynamics (MD) simulation for protein-ligand binding provides a powerful tool for predicting binding affinities, estimating transport properties, and exploring pocket sites. There has been a long history of improving the efficiency of MD simulations through better numerical methods and, more recently, by utilizing machine learning (ML) methods. Yet, challenges remain, such as accurate modeling of extended-timescale simulations. To address this issue, we propose NeuralMD, the first ML surrogate that can facilitate numerical MD and provide accurate simulations in protein-ligand binding. We propose a principled approach that incorporates a novel physics-informed multi-grained group symmetric framework. Specifically, we propose (1) a BindingNet model that satisfies group symmetry using vector frames and captures the multi-level protein-ligand interactions, and (2) an augmented neural differential equation solver that learns the trajectory under Newtonian mechanics. For the experiment, we design ten single-trajectory and three multi-trajectory binding simulation tasks. We show the efficiency and effectiveness of NeuralMD, with a 2000$\times$ speedup over standard numerical MD simulation and outperforming all other ML approaches by up to 80% under the stability metric. We further qualitatively show that NeuralMD reaches more stable binding predictions compared to other machine learning methods.

Submitted 1 February, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.14944 [pdf, other]

Modeling the wavelength dependence of photo-response non-uniformity of a CCD sensor

Authors: Zun Luo, Wei Du, Baocun Chen, Xianmin Meng, Hu Zhan

Abstract: Precision measurements of astrometry and photometry require stringent control of systematics such as those arising from imperfect correction of sensor effects. In this work, we develop a parametric method to model the wavelength dependence of photo-response non-uniformity (PRNU) for a laser annealed backside-illuminated charge-coupled device. The model accurately reproduces the PRNU patterns of flat-field images taken at nine wavelengths from 290 nm to 950 nm, leaving the root mean square (RMS) residuals no more than 0.2% in most cases. By removing the large-scale non-uniformity in the flat fields, the RMS residuals are further reduced. This model fitting approach gives more accurate predictions of the PRNU than cubic-spline interpolation does with fewer free parameters. It can be applied to make PRNU corrections for individual objects according to their spectral energy distribution to reduce photometry errors.

Submitted 26 January, 2024; originally announced January 2024.

Comments: 28 pages, 10 figures, comments are welcome

arXiv:2401.12975 [pdf, other]

HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments

Authors: Qinhong Zhou, Sunli Chen, Yisong Wang, Haozhe Xu, Weihua Du, Hongxin Zhang, Yilun Du, Joshua B. Tenenbaum, Chuang Gan

Abstract: Recent advances in high-fidelity virtual environments serve as one of the major driving forces for building intelligent embodied agents to perceive, reason and interact with the physical world. Typically, these environments remain unchanged unless agents interact with them. However, in real-world scenarios, agents might also face dynamically changing environments characterized by unexpected events and need to rapidly take action accordingly. To remedy this gap, we propose a new simulated embodied benchmark, called HAZARD, specifically designed to assess the decision-making abilities of embodied agents in dynamic situations. HAZARD consists of three unexpected disaster scenarios (fire, flood, and wind) and specifically supports the utilization of large language models (LLMs) to assist common sense reasoning and decision-making. This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines, including reinforcement learning (RL), rule-based, and search-based methods in dynamically changing environments. As a first step toward addressing this challenge using large language models, we further develop an LLM-based agent and perform an in-depth analysis of its promise and challenges in solving these tasks. HAZARD is available at https://vis-www.cs.umass.edu/hazard/.

Submitted 23 January, 2024; originally announced January 2024.

Comments: ICLR 2024. The first two authors contributed equally to this work

arXiv:2401.10274 [pdf, ps, other]

Knowledge-Assisted Dual-Stage Evolutionary Optimization of Large-Scale Crude Oil Scheduling

Authors: Wanting Zhang, Wei Du, Guo Yu, Renchu He, Wenli Du, Yaochu Jin

Abstract: With the scaling up of crude oil scheduling in modern refineries, large-scale crude oil scheduling problems (LSCOSPs) emerge with thousands of binary variables and non-linear constraints, which are challenging to optimize with traditional optimization methods. To solve LSCOSPs, we take the practical crude oil scheduling from a marine-access refinery as an example and start with modeling LSCOSPs from crude unloading, transportation, crude distillation unit processing, and inventory management of intermediate products. On the basis of the proposed model, a dual-stage evolutionary algorithm driven by heuristic rules (denoted by DSEA/HR) is developed, where the dual-stage search mechanism consists of global search and local refinement. In the global search stage, we devise several heuristic rules based on the empirical operating knowledge to generate a well-performing initial population and accelerate convergence in the mixed variables space. In the local refinement stage, a repair strategy is proposed to move the infeasible solutions towards feasible regions by further optimizing the local continuous variables. During the whole evolutionary process, the proposed dual-stage framework plays a crucial role in balancing exploration and exploitation. Experimental results have shown that DSEA/HR outperforms the state-of-the-art and widely-used mathematical programming methods and metaheuristic algorithms on LSCOSP instances within a reasonable time.

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.06786 [pdf, other]

CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation

Authors: Yifei Xu, Yuning Chen, Xumiao Zhang, Xianshang Lin, Pan Hu, Yunfei Ma, Songwu Lu, Wan Du, Zhuoqing Mao, Ennan Zhai, Dennis Cai

Abstract: Amid the thriving ecosystem of cloud computing and the proliferation of Large Language Model (LLM)-based code generation tools, there is a lack of benchmarking for code generation in cloud-native applications. In response to this need, we present CloudEval-YAML, a practical benchmark for cloud configuration generation. CloudEval-YAML tackles the diversity challenge by focusing on YAML, the de facto standard of numerous cloud-native tools. We develop the CloudEval-YAML benchmark with practicality in mind: the dataset consists of hand-written problems with unit tests targeting practical scenarios. We further enhanced the dataset to meet practical needs by rephrasing questions in a concise, abbreviated, and bilingual manner. The dataset consists of 1011 problems that take more than 1200 human hours to complete. To improve practicality during evaluation, we build a scalable evaluation platform for CloudEval-YAML that achieves a 20 times speedup over a single machine. To the best of our knowledge, the CloudEval-YAML dataset is the first hand-written dataset targeting cloud-native applications. We present an in-depth evaluation of 12 LLMs, leading to a deeper understanding of the problems and LLMs, as well as effective methods to improve task performance and reduce cost.

Submitted 9 November, 2023; originally announced January 2024.

arXiv:2401.03072 [pdf, other]

Optimal Nonparametric Inference on Network Effects with Dependent Edges

Authors: Wenqin Du, Yuan Zhang, Wen Zhou

Abstract: Testing network effects in weighted directed networks is a foundational problem in econometrics, sociology, and psychology. Yet, the prevalent edge dependency poses a significant methodological challenge. Most existing methods are model-based and come with stringent assumptions, limiting their applicability. In response, we introduce a novel, fully nonparametric framework that requires only minimal regularity assumptions. While inspired by recent developments in $U$-statistic literature (arXiv:1712.00771, arXiv:2004.06615), our approach notably broadens their scopes. Specifically, we identified and carefully addressed the challenge of indeterminate degeneracy in the test statistics $-$ a problem that aforementioned tools do not handle. We established Berry-Esseen type bound for the accuracy of type-I error rate control. Using original analysis, we also proved the minimax optimality of our test's power. Simulations underscore the superiority of our method in computation speed, accuracy, and numerical robustness compared to competing methods. We also applied our method to the U.S. faculty hiring network data and discovered intriguing findings.

Submitted 5 January, 2024; originally announced January 2024.

Comments: 29 pages, 3 figures

MSC Class: 62E17; 62G10; 91D30

arXiv:2401.01801 [pdf, other]

A quatum inspired neural network for geometric modeling

Authors: Weitao Du, Shengchao Liu, Xuecang Zhang

Abstract: By conceiving physical systems as 3D many-body point clouds, geometric graph neural networks (GNNs), such as SE(3)/E(3) equivalent GNNs, have showcased promising performance. In particular, their effective message-passing mechanics make them adept at modeling molecules and crystalline materials. However, current geometric GNNs only offer a mean-field approximation of the many-body system, encapsulated within two-body message passing, thus falling short in capturing intricate relationships within these geometric graphs. To address this limitation, tensor networks, widely employed by computational physics to handle many-body systems using high-order tensors, have been introduced. Nevertheless, integrating these tensorized networks into the message-passing framework of GNNs faces scalability and symmetry conservation (e.g., permutation and rotation) challenges. In response, we introduce an innovative equivariant Matrix Product State (MPS)-based message-passing strategy, through achieving an efficient implementation of the tensor contraction operation. Our method effectively models complex many-body relationships, suppressing mean-field approximations, and captures symmetries within geometric graphs. Importantly, it seamlessly replaces the standard message-passing and layer-aggregation modules intrinsic to geometric GNNs. We empirically validate the superior accuracy of our approach on benchmark tasks, including predicting classical Newton systems and quantum tensor Hamiltonian matrices. To our knowledge, our approach represents the inaugural utilization of parameterized geometric tensor networks.

Submitted 28 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

Showing 1–50 of 339 results for author: Du, W