-
Simple Multigraph Convolution Networks
Authors:
Danyang Wu,
Xinjie Shen,
Jitao Lu,
Jin Xu,
Feiping Nie
Abstract:
Existing multigraph convolution methods either ignore the cross-view interaction among multiple graphs or incur extremely high computational cost due to standard cross-view polynomial operators. To alleviate this problem, this paper proposes Simple MultiGraph Convolution Networks (SMGCN), which first extracts consistent cross-view topology from multigraphs, including edge-level and subgraph-level topology, and then performs polynomial expansion based on the raw multigraphs and the consistent topologies. In theory, SMGCN uses the consistent topologies in its polynomial expansion rather than the standard cross-view polynomial expansion; this performs credible cross-view spatial message-passing, follows the spectral convolution paradigm, and effectively reduces the complexity of standard polynomial expansion. In simulations, experimental results demonstrate that SMGCN achieves state-of-the-art performance on the ACM and DBLP multigraph benchmark datasets. Our code is available at https://github.com/frinkleko/SMGCN.
Submitted 7 March, 2024;
originally announced March 2024.
-
Extended Time-Dependent Density Functional Theory for Multi-Body Densities
Authors:
Jiong-Hang Liang,
Tian-Xing Hu,
D. Wu,
Zheng-Mao Sheng,
J. Zhang
Abstract:
Time-dependent density functional theory (TDDFT) is widely used for understanding and predicting the properties and behavior of matter. As one of the fundamental theorems in TDDFT, van Leeuwen's theorem [Phys. Rev. Lett. 82, 3863 (1999)] guarantees that a unique potential reproducing the same one-body density evolution can be constructed. Here we extend van Leeuwen's theorem by exploring truncation criteria in the BBGKY hierarchy. Our generalized theorem demonstrates the existence of a unique non-local potential that accurately reconstructs the multi-body density evolution in binary interacting systems. Under non-stringent conditions, truncation of the BBGKY-hierarchy equations aligns with the behavior of multi-body density evolution and maintains consistency in the reduced equations. As one application of the extended TDDFT supported by our theorem, multiple excitation energies can typically be obtained as the eigenvalues of a generalized Casida's equation. The extended TDDFT provides an accurate, first-principles framework capable of describing the kinetic processes of correlated systems, including strongly coupled particle transport, multiple excitation, and ionization processes.
Submitted 7 March, 2024;
originally announced March 2024.
-
Qubit-Wise Architecture Search Method for Variational Quantum Circuits
Authors:
Jialin Chen,
Zhiqiang Cai,
Ke Xu,
Di Wu,
Wei Cao
Abstract:
Considering the noise level limit, one crucial aspect of quantum machine learning is to design a high-performing variational quantum circuit architecture with a small number of quantum gates. Like classical neural architecture search (NAS), quantum architecture search (QAS) methods employ techniques such as reinforcement learning, evolutionary algorithms, and supernet optimization to improve search efficiency. In this paper, we propose a novel qubit-wise architecture search (QWAS) method, which progressively searches a one-qubit configuration per stage and combines this with the Monte Carlo Tree Search algorithm to find good quantum architectures by partitioning the search space into several good and bad subregions. Numerical experimental results indicate that our proposed method can balance the exploration and exploitation of circuit performance and size in some real-world tasks, such as MNIST, Fashion and MOSI. As far as we know, QWAS achieves state-of-the-art results on all tasks in terms of accuracy and circuit size.
Submitted 7 March, 2024;
originally announced March 2024.
-
Ultralight vector dark matter search using data from the KAGRA O3GK run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
R. Abbott,
H. Abe,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
O. D. Aguiar,
I. Aguilar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi
, et al. (1778 additional authors not shown)
Abstract:
Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM.
Submitted 5 March, 2024;
originally announced March 2024.
-
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs
Authors:
Hanlin Tang,
Yifu Sun,
Decheng Wu,
Kai Liu,
Jianchen Zhu,
Zhanhui Kang
Abstract:
Large language models (LLMs) have proven to be far superior to conventional methods in various tasks. However, their expensive computations and high memory requirements are prohibitive for deployment. Model quantization is an effective method for reducing this overhead. The problem is that in most previous works, the quantized model was calibrated using a few samples from the training data, which might affect the generalization of the quantized LLMs to unknown cases and tasks. Hence in this work, we explore an important question: Can we design a data-independent quantization method for LLMs to guarantee its generalization performance? We propose EasyQuant, a training-free and data-independent weight-only quantization algorithm for LLMs. Our observations indicate that two factors, outliers in the weights and quantization ranges, are essential for reducing the quantization error. Therefore, in EasyQuant, we leave the outliers (less than 1%) unchanged and optimize the quantization range to reduce the reconstruction error. With these methods, we surprisingly find that EasyQuant achieves comparable performance to the original model. Since EasyQuant does not depend on any training data, the generalization performance of quantized LLMs is safely guaranteed. Moreover, EasyQuant can be implemented in parallel so that the quantized model can be obtained in a few minutes even for LLMs over 100B. To the best of our knowledge, this is the first work to achieve almost lossless quantization performance for LLMs under a data-independent setting, and our algorithm runs over 10 times faster than the data-dependent methods.
Submitted 5 March, 2024;
originally announced March 2024.
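The two ingredients named in the EasyQuant abstract (leaving a small fraction of outlier weights unchanged, and tuning the quantization range to reduce reconstruction error) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `quantize_keep_outliers`, the symmetric integer grid, and the simple grid search over clipping ranges are all assumptions made for illustration; the abstract says only that the range is optimized, not how.

```python
def quantize_keep_outliers(weights, bits=4, outlier_frac=0.01, grid=50):
    """Illustrative weight-only quantization (hypothetical helper).

    Keeps the largest-magnitude `outlier_frac` of weights in full precision
    and picks, by grid search, the clipping range that minimizes the squared
    reconstruction error of the remaining weights.
    """
    n = len(weights)
    n_out = int(n * outlier_frac)
    # Indices of the largest-magnitude weights, treated as outliers.
    order = sorted(range(n), key=lambda i: abs(weights[i]), reverse=True)
    outliers = set(order[:n_out])
    inliers = [w for i, w in enumerate(weights) if i not in outliers]

    qmax = 2 ** (bits - 1) - 1  # symmetric signed grid, e.g. -7..7 for 4 bits
    max_abs = max((abs(w) for w in inliers), default=0.0)
    if max_abs == 0.0:
        return list(weights), 1.0

    def dequant(w, scale):
        # Round to the nearest grid point, clip to the range, map back.
        q = max(-qmax, min(qmax, round(w / scale)))
        return q * scale

    best_scale, best_err = None, float("inf")
    for k in range(1, grid + 1):
        # Candidate clipping range: a fraction k/grid of the inlier maximum.
        scale = (max_abs * k / grid) / qmax
        err = sum((w - dequant(w, scale)) ** 2 for w in inliers)
        if err < best_err:
            best_scale, best_err = scale, err

    out = [w if i in outliers else dequant(w, best_scale)
           for i, w in enumerate(weights)]
    return out, best_scale


# Toy weights: a smooth bulk plus one large outlier.
weights = [0.1 * i for i in range(-50, 50)] + [100.0]
kept, scale = quantize_keep_outliers(weights, bits=4, outlier_frac=0.01)
naive, _ = quantize_keep_outliers(weights, bits=4, outlier_frac=0.0)
```

On this toy input, isolating the single outlier lets the quantization grid cover only the bulk of the weights, so the reconstruction error should be much lower than when the outlier stretches the range.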
-
Pre-trained Model-based Actionable Warning Identification: A Feasibility Study
Authors:
Xiuting Ge,
Chunrong Fang,
Quanjun Zhang,
Daoyuan Wu,
Bowen Yu,
Qirui Zheng,
An Guo,
Shangwei Lin,
Zhihong Zhao,
Yang Liu,
Zhenyu Chen
Abstract:
Actionable Warning Identification (AWI) plays a pivotal role in improving the usability of static code analyzers. Currently, Machine Learning (ML)-based AWI approaches, which mainly learn an AWI classifier from labeled warnings, are notably common. However, these approaches still face the problem of restricted performance due to their direct reliance on a limited number of labeled warnings to develop a classifier. Very recently, Pre-Trained Models (PTMs), which have been trained on billions of text/code tokens and have demonstrated substantial success on various code-related tasks, could potentially circumvent the above problem. Nevertheless, the performance of PTMs on AWI has not been systematically investigated, leaving a gap in understanding their pros and cons. In this paper, we are the first to explore the feasibility of applying various PTMs for AWI. Through an extensive evaluation on 10K+ SpotBugs warnings from 10 large-scale open-source projects, we observe that all studied PTMs are consistently 9.85% to 21.12% better than the state-of-the-art ML-based AWI approaches. Besides, we investigate the impact of three primary aspects (i.e., data preprocessing, model training, and model prediction) in the typical PTM-based AWI workflow. Further, we identify the reasons for current PTMs' underperformance on AWI. Based on our findings, we provide several practical guidelines to enhance PTM-based AWI in future work.
Submitted 5 March, 2024;
originally announced March 2024.
-
Low-rank matrix estimation via nonconvex spectral regularized methods in errors-in-variables matrix regression
Authors:
Xin Li,
Dongya Wu
Abstract:
High-dimensional matrix regression has been studied from various aspects, such as statistical properties, computational efficiency, and application to specific instances including multivariate regression, system identification, and matrix compressed sensing. Current studies mainly consider the idealized case in which the covariate matrix is obtained without noise, while the more realistic scenario in which the covariates may be corrupted with noise or missing data has received little attention. We consider the general errors-in-variables matrix regression model and propose a unified framework for low-rank estimation based on nonconvex spectral regularization. In the statistical aspect, recovery bounds for any stationary point are provided to achieve statistical consistency. In the computational aspect, the proximal gradient method is applied to solve the nonconvex optimization problem and is proved to converge in polynomial time. Consequences for specific matrix compressed sensing models with additive noise and missing data are obtained by verifying the corresponding regularity conditions. Finally, the performance of the proposed nonconvex estimation method is illustrated by numerical experiments.
Submitted 5 March, 2024;
originally announced March 2024.
-
Towards Optimal Customized Architecture for Heterogeneous Federated Learning with Contrastive Cloud-Edge Model Decoupling
Authors:
Xingyan Chen,
Tian Du,
Mu Wang,
Tiancheng Gu,
Yu Zhao,
Gang Kou,
Changqiao Xu,
Dapeng Oliver Wu
Abstract:
Federated learning, as a promising distributed learning paradigm, enables collaborative training of a global model across multiple network edge clients without the need for centralized data collection. However, the heterogeneity of edge data distribution drags the model towards local minima, which can be distant from the global optimum. Such heterogeneity often leads to slow convergence and substantial communication overhead. To address these issues, we propose a novel federated learning framework called FedCMD, a model decoupling approach tailored to cloud-edge supported federated learning that separates deep neural networks into a body for capturing shared representations in the cloud and a personalized head for mitigating data heterogeneity. Our motivation is that, through an in-depth investigation of the performance of selecting different neural network layers as the personalized head, we found that rigidly assigning the last layer as the personalized head, as in current studies, is not always optimal. Instead, it is necessary to dynamically select the personalized layer that maximizes the training performance by taking the representation difference between neighboring layers into account. To find the optimal personalized layer, we utilize the low-dimensional representation of each layer to contrast feature distribution transfer and introduce a Wasserstein-based layer selection method, aimed at identifying the best-match layer for personalization. Additionally, a weighted global aggregation algorithm is proposed based on the selected personalized layer for the practical application of FedCMD. Extensive experiments on ten benchmarks demonstrate the efficiency and superior performance of our solution compared with nine state-of-the-art solutions. All code and results are available at https://github.com/elegy112138/FedCMD.
Submitted 4 March, 2024;
originally announced March 2024.
-
NoteLLM: A Retrievable Large Language Model for Note Recommendation
Authors:
Chao Zhang,
Shiwei Wu,
Haoxin Zhang,
Tong Xu,
Yan Gao,
Yao Hu,
Di Wu,
Enhong Chen
Abstract:
People enjoy sharing "notes" including their experiences within online communities. Therefore, recommending notes aligned with user interests has become a crucial task. Existing online methods only input notes into BERT-based models to generate note embeddings for assessing similarity. However, they may underutilize some important cues, e.g., hashtags or categories, which represent the key concepts of notes. Indeed, learning to generate hashtags/categories can potentially enhance note embeddings, both of which compress key note information into limited content. Besides, Large Language Models (LLMs) have significantly outperformed BERT in understanding natural languages. It is promising to introduce LLMs into note recommendation. In this paper, we propose a novel unified framework called NoteLLM, which leverages LLMs to address the item-to-item (I2I) note recommendation. Specifically, we utilize Note Compression Prompt to compress a note into a single special token, and further learn the potentially related notes' embeddings via a contrastive learning approach. Moreover, we use NoteLLM to summarize the note and generate the hashtag/category automatically through instruction tuning. Extensive validations on real scenarios demonstrate the effectiveness of our proposed method compared with the online baseline and show major improvements in the recommendation system of Xiaohongshu.
Submitted 25 March, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Hill Function-based Model of Transcriptional Response: Impact of Nonspecific Binding and RNAP Interactions
Authors:
Wenjia Shi,
Yao Ma,
Peilin Hu,
Mi Pang,
Xiaona Huang,
Yiting Dang,
Yuxin Xie,
Danni Wu
Abstract:
The Hill function is a widely used model of gene transcription regulation. Because it is a fitting function, it may lack an underlying physical picture, yet its fitting parameters can provide information about biochemical reactions, such as the number of transcription factors (TFs) and the binding energy between regulatory elements. However, it remains unclear when and how much biochemical information the Hill function can provide in addition to fitting. Here, starting from the interactions between TFs and RNA polymerase during transcription regulation, and from the association-dissociation reactions of both at specific and nonspecific sites on DNA, the regulatory effect of TFs was derived as a fold change. We found that, for a weak promoter, the fold change reduces to the regulatory factor (Freg), which is closely correlated with the Hill function. By directly comparing and fitting with the Hill function, the fitting parameters and the corresponding biochemical reaction parameters in Freg were analyzed and discussed, considering both a single TF and multiple TFs with cooperativity and basic logic effects. We concluded that the strength of the promoter and the interactions between TFs determine whether the Hill function can reflect the corresponding biochemical information. Our findings highlight the role of the Hill function in modeling and fitting transcriptional regulation, which also benefits the preparation of synthetic regulatory elements.
Submitted 3 March, 2024;
originally announced March 2024.
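For readers unfamiliar with the Hill function referred to in the abstract above, the standard textbook forms can be written down in a few lines. This sketch shows only the generic activating and repressing Hill functions; it does not reproduce the paper's fold-change or Freg expressions, which the abstract does not give. The parameter names `K` (half-saturation concentration) and `n` (Hill coefficient) follow common convention and are not taken from the listing.

```python
def hill_activation(x, K, n):
    """Fractional response to an activating TF at concentration x.

    K is the concentration giving a half-maximal response and n is the
    Hill coefficient (larger n gives a steeper, more switch-like curve).
    """
    return x**n / (K**n + x**n)


def hill_repression(x, K, n):
    """Fractional response to a repressing TF at concentration x."""
    return K**n / (K**n + x**n)


# Tabulate both curves for a cooperative TF (n = 2) with K = 1.
for x in (0.1, 1.0, 10.0):
    print(x, hill_activation(x, K=1.0, n=2), hill_repression(x, K=1.0, n=2))
```

At `x == K` both forms return exactly one half, which is why `K` is usually read off a fitted curve as the half-maximal concentration.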
-
Multi-Fidelity Residual Neural Processes for Scalable Surrogate Modeling
Authors:
Ruijia Niu,
Dongxia Wu,
Kai Kim,
Yi-An Ma,
Duncan Watson-Parris,
Rose Yu
Abstract:
Multi-fidelity surrogate modeling aims to learn an accurate surrogate at the highest fidelity level by combining data from multiple sources. Traditional methods relying on Gaussian processes can hardly scale to high-dimensional data. Deep learning approaches utilize neural network based encoders and decoders to improve scalability. These approaches share encoded representations across fidelities without including corresponding decoder parameters. This hinders inference performance, especially in out-of-distribution scenarios when the highest fidelity data has limited domain coverage. To address these limitations, we propose Multi-fidelity Residual Neural Processes (MFRNP), a novel multi-fidelity surrogate modeling framework. MFRNP explicitly models the residual between the aggregated output from lower fidelities and ground truth at the highest fidelity. The aggregation introduces decoders into the information sharing step and optimizes lower fidelity decoders to accurately capture both in-fidelity and cross-fidelity information. We show that MFRNP significantly outperforms state-of-the-art in learning partial differential equations and a real-world climate modeling task. Our code is published at: https://github.com/Rose-STL-Lab/MFRNP
Submitted 24 June, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Meta-Task Prompting Elicits Embedding from Large Language Models
Authors:
Yibin Lei,
Di Wu,
Tianyi Zhou,
Tao Shen,
Yu Cao,
Chongyang Tao,
Andrew Yates
Abstract:
In this work, we introduce a new unsupervised embedding method, Meta-Task Prompting with Explicit One-Word Limitation (MetaEOL), for generating high-quality sentence embeddings from Large Language Models (LLMs) without the need for model fine-tuning or task-specific engineering. Leveraging meta-task prompting, MetaEOL guides LLMs to produce embeddings through a series of carefully designed prompts that address multiple representational aspects. Our comprehensive experiments demonstrate that embeddings averaged from various meta-tasks yield competitive performance on Semantic Textual Similarity (STS) benchmarks and excel in downstream tasks, surpassing contrastive-trained models. Our findings suggest a new scaling law for embedding generation, offering a versatile, resource-efficient approach for embedding extraction across diverse sentence-centric scenarios.
Submitted 28 February, 2024;
originally announced February 2024.
-
G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment
Authors:
Juan Zhang,
Jiahao Chen,
Cheng Wang,
Zhiwang Yu,
Tangquan Qi,
Di Wu
Abstract:
Despite numerous completed studies, achieving high fidelity talking face generation with highly synchronized lip movements corresponding to arbitrary audio remains a significant challenge in the field. The shortcomings of published studies continue to confuse many researchers. This paper introduces G4G, a generic framework for high fidelity talking face generation with fine-grained intra-modal alignment. G4G can reenact the high fidelity of the original video while producing highly synchronized lip movements regardless of the given audio tones or volumes. The key to G4G's success is the use of a diagonal matrix to enhance the ordinary alignment of audio-image intra-modal features, which significantly increases the comparative learning between positive and negative samples. Additionally, a multi-scaled supervision module is introduced to comprehensively reenact the perceptual fidelity of the original video across the facial region while emphasizing the synchronization of lip movements and the input audio. A fusion network is then used to further fuse the facial region and the rest. Our experimental results demonstrate significant achievements in reenactment of the original video quality as well as highly synchronized talking lips. G4G is a generic framework that outperforms current state-of-the-art methods, producing talking videos closer to the ground-truth level.
Submitted 2 March, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Diffusion Models as Constrained Samplers for Optimization with Unknown Constraints
Authors:
Lingkai Kong,
Yuanqi Du,
Wenhao Mu,
Kirill Neklyudov,
Valentin De Bortoli,
Haorui Wang,
Dongxia Wu,
Aaron Ferber,
Yi-An Ma,
Carla P. Gomes,
Chao Zhang
Abstract:
Addressing real-world optimization problems becomes particularly challenging when analytic objective functions or constraints are unavailable. While numerous studies have addressed the issue of unknown objectives, limited research has focused on scenarios where feasibility constraints are not given explicitly. Overlooking these constraints can lead to spurious solutions that are unrealistic in practice. To deal with such unknown constraints, we propose to perform optimization within the data manifold using diffusion models. To constrain the optimization process to the data manifold, we reformulate the original optimization problem as a sampling problem from the product of the Boltzmann distribution defined by the objective function and the data distribution learned by the diffusion model. To enhance sampling efficiency, we propose a two-stage framework that begins with a guided diffusion process for warm-up, followed by a Langevin dynamics stage for further correction. Theoretical analysis shows that the initial stage results in a distribution focused on feasible solutions, thereby providing a better initialization for the later stage. Comprehensive experiments on a synthetic dataset, six real-world black-box optimization datasets, and a multi-objective optimization dataset show that our method achieves better or comparable performance with previous state-of-the-art baselines.
Submitted 29 April, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
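The reformulation described in the abstract above (sampling from the product of a Boltzmann distribution defined by the objective and a data distribution) can be illustrated with a one-dimensional toy using unadjusted Langevin dynamics. This is a sketch under strong assumptions, not the authors' method: the data distribution is a standard normal with an analytic score standing in for a learned diffusion model, the objective is a hypothetical quadratic, the guided-diffusion warm-up stage is omitted, and the function name `langevin_product_sampler` is made up for illustration.

```python
import math
import random


def langevin_product_sampler(grad_objective, data_score, beta=1.0,
                             step=0.01, n_steps=20000, x0=0.0, seed=0):
    """Unadjusted Langevin dynamics targeting exp(-beta * f(x)) * p_data(x).

    grad_objective(x) returns f'(x); data_score(x) returns d/dx log p_data(x).
    The gradient of the log target is -beta * f'(x) + data_score(x).
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        drift = -beta * grad_objective(x) + data_score(x)
        x += step * drift + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


# Toy problem: minimize f(x) = (x - 3)^2 while staying near data ~ N(0, 1).
samples = langevin_product_sampler(
    grad_objective=lambda x: 2.0 * (x - 3.0),  # f'(x)
    data_score=lambda x: -x,                   # score of a standard normal
)
kept = samples[2000:]  # discard burn-in
mean = sum(kept) / len(kept)
print(round(mean, 2))
```

For this toy the product density is itself Gaussian with mean 2 and variance 1/3, so the samples should settle between the unconstrained optimum at 3 and the data mode at 0, which is the compromise between objective and feasibility that the abstract describes.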
-
Sparse Autoregressive Neural Networks for Classical Spin Systems
Authors:
Indaco Biazzo,
Dian Wu,
Giuseppe Carleo
Abstract:
Efficient sampling and approximation of Boltzmann distributions involving large sets of binary variables, or spins, are pivotal in diverse scientific fields even beyond physics. Recent advances in generative neural networks have significantly impacted this domain. However, these neural networks are often treated as black boxes, with architectures primarily influenced by data-driven problems in computational science. Addressing this gap, we introduce a novel autoregressive neural network architecture named TwoBo, specifically designed for sparse two-body interacting spin systems. We directly incorporate the Boltzmann distribution into its architecture and parameters, resulting in enhanced convergence speed, superior free energy accuracy, and reduced trainable parameters. We perform numerical experiments on disordered, frustrated systems with more than 1000 spins on grids and random graphs, and demonstrate its advantages compared to previous autoregressive and recurrent architectures. Our findings validate a physically informed approach and suggest potential extensions to multivalued variables and many-body interaction systems, paving the way for broader applications in scientific research.
Submitted 21 June, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Mass production and performance study on the 20-inch PMT acrylic protection covers in JUNO
Authors:
Miao He,
Zhonghua Qin,
Diru Wu,
Meihang Xu,
Wan Xie,
Fang Chen,
Xiaoping Jing,
Genhua Yin,
Shengjiong Yin,
Linhua Gu,
Xiaofeng Xia,
Qinchang Wang
Abstract:
The Jiangmen Underground Neutrino Observatory is a neutrino experiment that incorporates 20,012 20-inch photomultiplier tubes (PMTs) and 25,600 3-inch PMTs. A dedicated system was designed to protect the PMTs from an implosion chain reaction underwater. As a crucial element of the protection system, over 20,000 acrylic covers were manufactured through injection molding, ensuring high dimensional precision, mechanical strength, and transparency. This paper presents the manufacturing technology, mass production process, and performance characteristics of the acrylic covers.
Submitted 25 February, 2024;
originally announced February 2024.
-
LLMs Can Defend Themselves Against Jailbreaking in a Practical Manner: A Vision Paper
Authors:
Daoyuan Wu,
Shuai Wang,
Yang Liu,
Ning Liu
Abstract:
Jailbreaking is an emerging adversarial attack that bypasses the safety alignment deployed in off-the-shelf large language models (LLMs). A considerable amount of research exists proposing more effective jailbreak attacks, including the recent Greedy Coordinate Gradient (GCG) attack, jailbreak template-based attacks such as using "Do-Anything-Now" (DAN), and multilingual jailbreak. In contrast, the defensive side has been relatively less explored. This paper proposes a lightweight yet practical defense called SELFDEFEND, which can defend against all existing jailbreak attacks with minimal delay for jailbreak prompts and negligible delay for normal user prompts. Our key insight is that regardless of the kind of jailbreak strategies employed, they eventually need to include a harmful prompt (e.g., "how to make a bomb") in the prompt sent to LLMs, and we found that existing LLMs can effectively recognize such harmful prompts that violate their safety policies. Based on this insight, we design a shadow stack that concurrently checks whether a harmful prompt exists in the user prompt and triggers a checkpoint in the normal stack once a token of "No" or a harmful prompt is output. The latter could also generate an explainable LLM response to adversarial prompts. We demonstrate our idea of SELFDEFEND works in various jailbreak scenarios through manual analysis in GPT-3.5/4. We also list three future directions to further enhance SELFDEFEND.
Submitted 4 March, 2024; v1 submitted 24 February, 2024;
originally announced February 2024.
-
Topological classes of thermodynamics of the rotating charged AdS black holes in gauged supergravities
Authors:
Xiao-Dan Zhu,
Di Wu,
Dan Wen
Abstract:
In this paper, we investigate the topological numbers of rotating charged AdS black holes in both four- and five-dimensional gauged supergravity theories. Our analysis is conducted within the framework of the thermodynamical topological approach to black holes, utilizing the generalized off-shell Helmholtz free energy. We demonstrate that the number of rotation parameters plays a significant role in determining the topological numbers of five-dimensional rotating AdS black holes. Moreover, our findings indicate that the topological numbers of both four- and five-dimensional rotating AdS black holes are not influenced by the number of electric charge parameters. This highlights a distinct difference in how rotation and electric charge parameters impact the thermodynamic topological properties of these black holes.
Submitted 30 June, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Ultra-short lifetime isomer studies from photonuclear reactions using laser-driven ultra-intense γ-ray
Authors:
Di Wu,
Haoyang Lan,
Jiaxing Liu,
Huangang Lu,
Jianyao Zhang,
Jianfeng Lv,
Xuezhi Wu,
Hui Zhang,
Yadong Xia,
Qiangyou He,
Jie Cai,
Qianyi Ma,
Yuhui Xia,
Zhenan Wang,
Meizhi Wang,
Zhiyan Yang,
Xinlu Xu,
Yixing Geng,
Chen Lin,
Wenjun Ma,
Yanying Zhao,
Haoran Wang,
Fulong Liu,
Chuangye He,
Jinqing Yu,
et al. (7 additional authors not shown)
Abstract:
Isomers, ubiquitous populations of relatively long-lived nuclear excited states, play a crucial role in nuclear physics. However, isomers with half-lives of several seconds or less have scarcely any experimental cross-section data, owing to the lack of a suitable measurement method. We report a method of online γ spectroscopy for ultra-short-lived isomers from photonuclear reactions using laser-driven ultra-intense γ-rays. The fastest time resolution can reach the sub-ps level with γ-ray intensities $>10^{19}$/s ($\geqslant 8$ MeV). The $^{115}$In(γ, n)$^{114m2}$In reaction ($T_{1/2} = 43.1$ ms) was measured for the first time in the high-energy region, which sheds light on nuclear structure studies of the In element. Simulations showed that the method would be an efficient way to study $^{229m}$Th ($T_{1/2} = 7$ μs), which is believed to be the basis of the next generation of nuclear clocks. This work offers a unique way of gaining insight into ultra-short lifetimes and promises an effective way to fill the gap in relevant experimental data.
Submitted 23 February, 2024;
originally announced February 2024.
-
Investigation of profile shifting and subpulse movement in PSR J0344-0901 with FAST
Authors:
H. M. Tedila,
R. Yuen,
N. Wang,
D. Li,
Z. G. Wen,
W. M. Yan,
J. P. Yuan,
X. H. Han,
P. Wang,
W. W. Zhu,
S. J. Dang,
S. Q. Wang,
J. T. Xie,
Q. D. Wu,
Sh. Khasanov,
FAST Collaboration
Abstract:
We report two phenomena detected in PSR J0344$-$0901 from two observations conducted at frequency centered at 1.25 GHz using the Five-hundred-meter Aperture Spherical radio Telescope (FAST). The first phenomenon manifests as shifting in the pulse emission to later longitudinal phases and then gradually returns to its original location. The event lasts for about 216 pulse periods, with an average shift of about $0.7^\circ$ measured at the peak of the integrated profile. Changes in the polarization position angle (PPA) are detected around the trailing edge of the profile, together with an increase in the profile width. The second phenomenon is characterized by the apparent movement of subpulses, which results in different subpulse track patterns across the profile window. For the first time in this pulsar, we identify four emission modes, each with unique subpulse movement, and determine the pattern periods for three of the emission modes. Pulse nulling was not detected. Modeling of the changes in the PPA using the rotating vector model gives an inclination angle of $75.12^\circ \pm 3.80^\circ$ and an impact parameter of $-3.17^\circ \pm 5.32^\circ$ for this pulsar. We speculate that the subpulse movement may be related to the shifting of the pulse emission.
Submitted 22 February, 2024;
originally announced February 2024.
-
On Leveraging Encoder-only Pre-trained Language Models for Effective Keyphrase Generation
Authors:
Di Wu,
Wasi Uddin Ahmad,
Kai-Wei Chang
Abstract:
This study addresses the application of encoder-only Pre-trained Language Models (PLMs) in keyphrase generation (KPG) amidst the broader availability of domain-tailored encoder-only models compared to encoder-decoder models. We investigate three core inquiries: (1) the efficacy of encoder-only PLMs in KPG, (2) optimal architectural decisions for employing encoder-only PLMs in KPG, and (3) a performance comparison between in-domain encoder-only and encoder-decoder PLMs across varied resource settings. Our findings, derived from extensive experimentation in two domains reveal that with encoder-only PLMs, although KPE with Conditional Random Fields slightly excels in identifying present keyphrases, the KPG formulation renders a broader spectrum of keyphrase predictions. Additionally, prefix-LM fine-tuning of encoder-only PLMs emerges as a strong and data-efficient strategy for KPG, outperforming general-domain seq2seq PLMs. We also identify a favorable parameter allocation towards model depth rather than width when employing encoder-decoder architectures initialized with encoder-only PLMs. The study sheds light on the potential of utilizing encoder-only PLMs for advancing KPG systems and provides a groundwork for future KPG methods. Our code and pre-trained checkpoints are released at https://github.com/uclanlp/DeepKPG.
Submitted 21 February, 2024;
originally announced February 2024.
-
Robust recovery for stochastic block models, simplified and generalized
Authors:
Sidhanth Mohanty,
Prasad Raghavendra,
David X. Wu
Abstract:
We study the problem of $\textit{robust community recovery}$: efficiently recovering communities in sparse stochastic block models in the presence of adversarial corruptions. In the absence of adversarial corruptions, there are efficient algorithms when the $\textit{signal-to-noise ratio}$ exceeds the $\textit{Kesten--Stigum (KS) threshold}$, widely believed to be the computational threshold for this problem. The question we study is: does the computational threshold for robust community recovery also lie at the KS threshold? We answer this question affirmatively, providing an algorithm for robust community recovery for arbitrary stochastic block models on any constant number of communities, generalizing the work of Ding, d'Orsi, Nasser & Steurer on an efficient algorithm above the KS threshold in the case of $2$-community block models.
There are three main ingredients to our work:
(i) The Bethe Hessian of the graph is defined as $H_G(t) \triangleq (D_G-I)t^2 - A_Gt + I$ where $D_G$ is the diagonal matrix of degrees and $A_G$ is the adjacency matrix. Empirical work suggested that the Bethe Hessian for the stochastic block model has outlier eigenvectors corresponding to the communities right above the Kesten-Stigum threshold. We formally confirm the existence of outlier eigenvalues for the Bethe Hessian, by explicitly constructing outlier eigenvectors from the community vectors.
(ii) We develop an algorithm for a variant of robust PCA on sparse matrices. Specifically, an algorithm to partially recover top eigenspaces from adversarially corrupted sparse matrices under mild delocalization constraints.
(iii) A rounding algorithm to turn vector assignments of vertices into a community assignment, inspired by the algorithm of Charikar & Wirth [CW04] for $2$XOR.
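The Bethe Hessian in ingredient (i) can be written out directly from its definition. The following is a minimal sketch in plain Python, assuming an undirected graph given as a symmetric 0/1 adjacency matrix; the function name `bethe_hessian` and the toy path graph are illustrative assumptions, not taken from the paper, and the sketch only builds the matrix rather than performing any recovery step.

```python
def bethe_hessian(adj, t):
    """Build H_G(t) = (D_G - I) t^2 - A_G t + I as a list of rows.

    `adj` is assumed to be a symmetric 0/1 adjacency matrix (list of lists);
    D_G is the diagonal matrix of vertex degrees.
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Off-diagonal contribution: -A_G * t
            H[i][j] = -adj[i][j] * t
        # Diagonal contribution: (deg(i) - 1) * t^2 + 1
        H[i][i] += (degrees[i] - 1) * t * t + 1.0
    return H


# Hypothetical toy graph: a path on three vertices, 0 - 1 - 2.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
H = bethe_hessian(path, 0.5)
```

At $t = 0$ the matrix reduces to the identity, which is a quick sanity check on the definition.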
Submitted 21 February, 2024;
originally announced February 2024.
-
Multigap superconductivity in lithium intercalated bilayer Mo$_2$C
Authors:
Can Hong,
Danhong Wu,
Xi-Bo Li,
Feipeng Zheng
Abstract:
Interlayer coupling can significantly influence the physical properties of layered transition metal compounds. The superconductivity in layered Mo$_2$C systems, belonging to the emergent family of MXene, has garnered considerable attention. However, the impact of interlayer coupling on superconductivity and the anisotropic superconducting properties of these systems are not yet clear. By performing first-principles calculations of electron-phonon coupling and anisotropic superconducting properties, we show that the interlayer coupling in bilayer 1$T$-Mo$_2$C suppresses superconductivity, resulting in a significant drop in superconducting transition temperature ($T_{\mathrm{c}}$) from 4.2 K in its monolayer form to nearly 0 K. By introducing lithium atoms into the interlayer space of the bilayer, the interlayer coupling can be effectively weakened, transforming the system into a two-gap superconductor with a $T_{\mathrm{c}}$ above 10 K. A 3% tensile strain can further transform the system into a three-gap superconductor with a significantly enhanced $T_{\mathrm{c}}$ of approximately 24.7 K, which is very high among Mo$_2$C-related systems. The enhancement of the superconductivity induced by the strain is mainly due to the downshift of an energy band with a flat dispersion to the energy near the Fermi level. The in-plane vibrations of Mo atoms and the $d$-orbital electrons of Mo atoms are most important for the formation of the superconductivity. Our method can also be applied to multilayer Mo$_2$C systems. Given the successful synthesis of layered Mo$_2$C systems and the experimental realization of alkali metal atom depositions, our work presents a practically feasible strategy for achieving high $T_{\mathrm{c}}$ and multigap superconductivity in layered Mo$_2$C.
Submitted 21 February, 2024;
originally announced February 2024.
-
Passive Aperiodic Optical Phased Array based on Uniform Random Shuffle
Authors:
Bowen Yu,
Dachuan Wu,
Yasha Yi
Abstract:
Grating lobes arise from the periodic nature of element spacing in the optical phased array (OPA). Essentially, the phased array performs the spatial Fourier transform on light; the steering capability of the main lobe is governed by phase shift variations among waveguides, and the Sidelobe Suppression Ratio (SLSR) correlates with the uniformity of emitter positions. Leveraging this understanding, we have optimized a 1×64-channel passive aperiodic OPA with a uniform random shuffle of the emitter positions. Our conceptual simulations highlight a robust steering capability (18.60°/10 nm) and SLSR (-13.46 dB at 0°, -8.27 dB at ±45°), and initial measurements from the preliminary fabrication demonstrate a steering capability of 9.8°/10 nm (with a smaller phase-shift design) and an SLSR of -6.1 dB at -33.4°.
Submitted 30 January, 2024;
originally announced February 2024.
-
MFBind: a Multi-Fidelity Approach for Evaluating Drug Compounds in Practical Generative Modeling
Authors:
Peter Eckmann,
Dongxia Wu,
Germano Heinzelmann,
Michael K Gilson,
Rose Yu
Abstract:
Current generative models for drug discovery primarily use molecular docking to evaluate the quality of generated compounds. However, such models are often not useful in practice because even compounds with high docking scores do not consistently show experimental activity. More accurate methods for activity prediction exist, such as molecular dynamics based binding free energy calculations, but they are too computationally expensive to use in a generative model. We propose a multi-fidelity approach, Multi-Fidelity Bind (MFBind), to achieve the optimal trade-off between accuracy and computational cost. MFBind integrates docking and binding free energy simulators to train a multi-fidelity deep surrogate model with active learning. Our deep surrogate model utilizes a pretraining technique and linear prediction heads to efficiently fit small amounts of high-fidelity data. We perform extensive experiments and show that MFBind (1) outperforms other state-of-the-art single and multi-fidelity baselines in surrogate modeling, and (2) boosts the performance of generative models with markedly higher quality compounds.
Submitted 15 February, 2024;
originally announced February 2024.
-
Nonlinear spiked covariance matrices and signal propagation in deep neural networks
Authors:
Zhichao Wang,
Denny Wu,
Zhou Fan
Abstract:
Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK) defined by the nonlinear feature map of a feedforward neural network. However, existing results only establish weak convergence of the empirical eigenvalue distribution, and fall short of providing precise quantitative characterizations of the ''spike'' eigenvalues and eigenvectors that often capture the low-dimensional signal structure of the learning problem. In this work, we characterize these signal eigenvalues and eigenvectors for a nonlinear version of the spiked covariance model, including the CK as a special case. Using this general result, we give a quantitative description of how spiked eigenstructure in the input data propagates through the hidden layers of a neural network with random weights. As a second application, we study a simple regime of representation learning where the weight matrix develops a rank-one signal component over training and characterize the alignment of the target function with the spike eigenvector of the CK on test data.
Submitted 15 February, 2024;
originally announced February 2024.
-
Switch EMA: A Free Lunch for Better Flatness and Sharpness
Authors:
Siyuan Li,
Zicheng Liu,
Juanxi Tian,
Ge Wang,
Zedong Wang,
Weiyang Jin,
Di Wu,
Cheng Tan,
Tao Lin,
Yang Liu,
Baigui Sun,
Stan Z. Li
Abstract:
Exponential Moving Average (EMA) is a widely used weight averaging (WA) regularization to learn flat optima for better generalizations without extra cost in deep neural network (DNN) optimization. Despite achieving better flatness, existing WA methods might fall into worse final performances or require extra test-time computations. This work unveils the full potential of EMA with a single line of modification, i.e., switching the EMA parameters to the original model after each epoch, dubbed as Switch EMA (SEMA). From both theoretical and empirical aspects, we demonstrate that SEMA can help DNNs to reach generalization optima that better trade-off between flatness and sharpness. To verify the effectiveness of SEMA, we conduct comparison experiments with discriminative, generative, and regression tasks on vision and language datasets, including image classification, self-supervised learning, object detection and segmentation, image generation, video prediction, attribute regression, and language modeling. Comprehensive results with popular optimizers and networks show that SEMA is a free lunch for DNN training by improving performances and boosting convergence speeds.
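The "single line of modification" described above can be illustrated with a toy training loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the plain gradient-descent update, the decay value, the learning rate, and the names `train_with_sema` and `quadratic_grad` are all hypothetical, and the only part taken from the abstract is the switch of the EMA parameters into the model after each epoch.

```python
def train_with_sema(weights, grad_fn, epochs, steps_per_epoch, lr=0.1, decay=0.9):
    """Toy gradient descent with an EMA copy that is switched back each epoch."""
    ema = list(weights)
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            grads = grad_fn(weights)
            # Ordinary optimizer step (plain gradient descent, assumed here).
            weights = [w - lr * g for w, g in zip(weights, grads)]
            # Standard EMA update of the averaged copy.
            ema = [decay * e + (1 - decay) * w for e, w in zip(ema, weights)]
        # The switch: copy the EMA parameters into the model after each epoch.
        weights = list(ema)
    return weights, ema


def quadratic_grad(weights):
    """Gradient of the toy objective f(w) = 0.5 * sum(w_i ** 2)."""
    return list(weights)


initial = [1.0, -2.0]
final_weights, final_ema = train_with_sema(initial, quadratic_grad, epochs=3, steps_per_epoch=5)
```

Without the marked line, the loop is ordinary training with a separate EMA copy kept for evaluation; with it, each new epoch starts from the averaged parameters.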
Submitted 14 February, 2024;
originally announced February 2024.
-
Design of a W-band High-PAE Class A&AB Power Amplifier in 150nm GaAs Technology
Authors:
Jun Yan Leea,
Duo Wu,
Xuanrui Guoc,
Mohammad Mahdi Ariannejad,
Mohammad Arif Sobhan Bhuiyan,
Mahdi H. Miraz
Abstract:
Nanometer-scale power amplifiers (PAs) at sub-THz frequencies suffer from severe parasitic effects that limit the maximum frequency and reduce power performance at the device transceiver front end. Integrated-circuit researchers have proposed different combinations of PA design architectures at scaled-down technologies to overcome these limitations. Although these designs meet the minimum requirements, the power-added efficiency (PAE) of the PA is still quite low. In this paper, a W-band single-ended common-source (CS) and cascode integrated 3-stage 2-way PA design is proposed. The design integrates several key design methodologies to mitigate the parasitics: combined Class AB and Class A stages for gain boosting and efficiency enhancement; a Wilkinson power combiner for higher output power, linearity, and bandwidth; and a transmission line (TL)-based wideband matching network for better inter-stage matching and compact size. The proposed PA design is validated using the UMS 150-nm GaAs pHEMT process in the Advanced Design System (ADS) simulator. The results show that the proposed PA achieves a gain of 20.1 dB, an output power of 17.2 dBm, a PAE of 33%, and a 21 GHz bandwidth at 90 GHz in the sub-THz band. The PA layout occupies only 5.66 × 2.51 mm² of die space including pads. The proposed PA design will support research on sub-THz integrated circuits and ease the widespread adoption of 6G in the near future.
Submitted 10 February, 2024;
originally announced February 2024.
-
First measurement of the yield of $^8$He isotopes produced in liquid scintillator by cosmic-ray muons at Daya Bay
Authors:
Daya Bay Collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
Y. C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng,
X. Y. Ding,
et al. (177 additional authors not shown)
Abstract:
Daya Bay presents the first measurement of cosmogenic $^8$He isotope production in liquid scintillator, using an innovative method for identifying cascade decays of $^8$He and its child isotope, $^8$Li. We also measure the production yield of $^9$Li isotopes using well-established methodology. The results, in units of 10$^{-8}$ μ$^{-1}$ g$^{-1}$ cm$^{2}$, are 0.307$\pm$0.042, 0.341$\pm$0.040, and 0.546$\pm$0.076 for $^8$He, and 6.73$\pm$0.73, 6.75$\pm$0.70, and 13.74$\pm$0.82 for $^9$Li at average muon energies of 63.9 GeV, 64.7 GeV, and 143.0 GeV, respectively. The measured production rate of $^8$He isotopes is more than an order of magnitude lower than any other measurement of cosmogenic isotope production. It replaces the results of previous attempts to determine the ratio of $^8$He to $^9$Li production that yielded a wide range of limits from 0 to 30%. The results provide future liquid-scintillator-based experiments with improved ability to predict cosmogenic backgrounds.
Submitted 7 February, 2024;
originally announced February 2024.
-
Learning Granger Causality from Instance-wise Self-attentive Hawkes Processes
Authors:
Dongxia Wu,
Tsuyoshi Idé,
Aurélie Lozano,
Georgios Kollias,
Jiří Navrátil,
Naoki Abe,
Yi-An Ma,
Rose Yu
Abstract:
We address the problem of learning Granger causality from asynchronous, interdependent, multi-type event sequences. In particular, we are interested in discovering instance-level causal structures in an unsupervised manner. Instance-level causality identifies causal relationships among individual events, providing more fine-grained information for decision-making. Existing work in the literature either requires strong assumptions, such as linearity in the intensity function, or heuristically defined model parameters that do not necessarily meet the requirements of Granger causality. We propose Instance-wise Self-Attentive Hawkes Processes (ISAHP), a novel deep learning framework that can directly infer the Granger causality at the event instance level. ISAHP is the first neural point process model that meets the requirements of Granger causality. It leverages the self-attention mechanism of the transformer to align with the principles of Granger causality. We empirically demonstrate that ISAHP is capable of discovering complex instance-level causal structures that cannot be handled by classical models. We also show that ISAHP achieves state-of-the-art performance in proxy tasks involving type-level causal discovery and instance-level event type prediction.
Submitted 29 February, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
The Virtues of Pessimism in Inverse Reinforcement Learning
Authors:
David Wu,
Gokul Swamy,
J. Andrew Bagnell,
Zhiwei Steven Wu,
Sanjiban Choudhury
Abstract:
Inverse Reinforcement Learning (IRL) is a powerful framework for learning complex behaviors from expert demonstrations. However, it traditionally requires repeatedly solving a computationally expensive reinforcement learning (RL) problem in its inner loop. It is desirable to reduce the exploration burden by leveraging expert demonstrations in the inner-loop RL. As an example, recent work resets the learner to expert states in order to inform the learner of high-reward expert states. However, such an approach is infeasible in the real world. In this work, we consider an alternative approach to speeding up the RL subroutine in IRL: \emph{pessimism}, i.e., staying close to the expert's data distribution, instantiated via the use of offline RL algorithms. We formalize a connection between offline RL and IRL, enabling us to use an arbitrary offline RL algorithm to improve the sample efficiency of IRL. We validate our theory experimentally by demonstrating a strong correlation between the efficacy of an offline RL algorithm and how well it works as part of an IRL procedure. By using a strong offline RL algorithm as part of an IRL procedure, we are able to find policies that match expert performance significantly more efficiently than the prior art.
Submitted 8 February, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
Accelerating Inverse Reinforcement Learning with Expert Bootstrapping
Authors:
David Wu,
Sanjiban Choudhury
Abstract:
Existing inverse reinforcement learning methods (e.g. MaxEntIRL, $f$-IRL) search over candidate reward functions and solve a reinforcement learning problem in the inner loop. This creates a rather strange inversion where a harder problem, reinforcement learning, is in the inner loop of a presumably easier problem, imitation learning. In this work, we show that better utilization of expert demonstrations can reduce the need for hard exploration in the inner RL loop, hence accelerating learning. Specifically, we propose two simple recipes: (1) placing expert transitions into the replay buffer of the inner RL algorithm (e.g. Soft Actor-Critic), which directly informs the learner about high-reward states instead of forcing the learner to discover them through extensive exploration, and (2) using expert actions in Q-value bootstrapping in order to improve the target Q-value estimates and more accurately describe high-value expert states. Our methods show significant gains over a MaxEntIRL baseline on the benchmark MuJoCo suite of tasks, speeding up recovery to 70% of deterministic expert performance by 2.13x on HalfCheetah-v2, 2.6x on Ant-v2, 18x on Hopper-v2, and 3.36x on Walker2d-v2.
Submitted 4 February, 2024;
originally announced February 2024.
-
NetLLM: Adapting Large Language Models for Networking
Authors:
Duo Wu,
Xianda Wang,
Yaqi Qiao,
Zhi Wang,
Junchen Jiang,
Shuguang Cui,
Fangxin Wang
Abstract:
Many networking tasks now employ deep learning (DL) to solve complex prediction and system optimization problems. However, current design philosophy of DL-based algorithms entails intensive engineering overhead due to the manual design of deep neural networks (DNNs) for different networking tasks. Besides, DNNs tend to achieve poor generalization performance on unseen data distributions/environments.
Motivated by the recent success of large language models (LLMs), for the first time, this work studies the LLM adaptation for networking to explore a more sustainable design philosophy. With the massive pre-trained knowledge and powerful inference ability, LLM can serve as the foundation model, and is expected to achieve "one model for all" with even better performance and stronger generalization for various tasks. In this paper, we present NetLLM, the first LLM adaptation framework that efficiently adapts LLMs to solve networking problems. NetLLM addresses many practical challenges in LLM adaptation, from how to process task-specific information with LLMs, to how to improve the efficiency of answer generation and acquiring domain knowledge for networking. Across three networking-related use cases - viewport prediction (VP), adaptive bitrate streaming (ABR) and cluster job scheduling (CJS), we demonstrate the effectiveness of NetLLM in LLM adaptation for networking, and showcase that the adapted LLM significantly outperforms state-of-the-art algorithms.
Submitted 5 May, 2024; v1 submitted 3 February, 2024;
originally announced February 2024.
-
Trust and ethical considerations in a multi-modal, explainable AI-driven chatbot tutoring system: The case of collaboratively solving Rubik's Cube
Authors:
Kausik Lakkaraju,
Vedant Khandelwal,
Biplav Srivastava,
Forest Agostinelli,
Hengtao Tang,
Prathamjeet Singh,
Dezhi Wu,
Matt Irvin,
Ashish Kundu
Abstract:
Artificial intelligence (AI) has the potential to transform education with its power of uncovering insights from massive data about student learning patterns. However, ethical and trustworthiness concerns about AI have been raised but remain unsolved. Prominent ethical issues in high school AI education include data privacy, information leakage, abusive language, and fairness. This paper describes technological components that were built to address ethical and trustworthiness concerns in a multi-modal collaborative platform (called ALLURE chatbot) for high school students to collaborate with AI to solve the Rubik's cube. In data privacy, we want to ensure that the informed consent of children, parents, and teachers is at the center of any data that is managed. Since children are involved, we want to ensure that language, whether textual, audio, or visual, is acceptable both from users and from AI, and that the system can steer interaction away from dangerous situations. In information management, we also want to ensure that the system, while learning to improve over time, does not leak information about users from one group to another.
Submitted 30 January, 2024;
originally announced February 2024.
-
A generalized essentially non-hourglass total Lagrangian SPH solid dynamics
Authors:
Dong Wu,
Xiaojing Tang,
Shuaihao Zhang,
Xiangyu Hu
Abstract:
In this paper, we tackle a persistent numerical instability within the total Lagrangian smoothed particle hydrodynamics (TLSPH) solid dynamics. Specifically, we address the hourglass modes that may grow and eventually deteriorate the reliability of simulation, particularly in the scenarios characterized by large deformations. We propose a generalized essentially non-hourglass formulation based on volumetric-deviatoric stress decomposition, offering a general solution for elasticity, plasticity, anisotropy, and other material models. Comparing the standard SPH formulation with the original non-nested Laplacian operator applied in our previous work \cite{wu2023essentially} to handle the hourglass issues in standard elasticity, we introduce a correction for the discretization of shear stress that relies on the discrepancy produced by a tracing-back prediction of the initial inter-particle direction from the current deformation gradient. The present formulation, when applied to standard elastic materials, is able to recover the original Laplacian operator. Due to the dimensionless nature of the correction, this formulation handles complex material models in a very straightforward way. Furthermore, a magnitude limiter is introduced to minimize the correction in domains where the discrepancy is less pronounced. The present formulation is validated, with a single set of modeling parameters, through a series of benchmark cases, confirming good stability and accuracy across elastic, plastic, and anisotropic materials. To showcase its potential, the formulation is employed to simulate a complex problem involving viscous plastic Oobleck material, contacts, and very large deformation.
Submitted 1 February, 2024;
originally announced February 2024.
-
Testing side-channel security of cryptographic implementations against future microarchitectures
Authors:
Gilles Barthe,
Marcel Böhme,
Sunjay Cauligi,
Chitchanok Chuengsatiansup,
Daniel Genkin,
Marco Guarnieri,
David Mateos Romero,
Peter Schwabe,
David Wu,
Yuval Yarom
Abstract:
How will future microarchitectures impact the security of existing cryptographic implementations? As we cannot keep reducing the size of transistors, chip vendors have started developing new microarchitectural optimizations to speed up computation. A recent study (Sanchez Vicarte et al., ISCA 2021) suggests that these optimizations might open the Pandora's box of microarchitectural attacks. However, there is little guidance on how to evaluate the security impact of future optimization proposals.
To help chip vendors explore the impact of microarchitectural optimizations on cryptographic implementations, we develop (i) an expressive domain-specific language, called LmSpec, that allows them to specify the leakage model for the given optimization and (ii) a testing framework, called LmTest, to automatically detect leaks under the specified leakage model within the given implementation. Using this framework, we conduct an empirical study of 18 proposed microarchitectural optimizations on 25 implementations of eight cryptographic primitives in five popular libraries. We find that every implementation would contain secret-dependent leaks, sometimes sufficient to recover a victim's secret key, if these optimizations were realized. Ironically, some leaks are possible only because of coding idioms used to prevent leaks under the standard constant-time model.
Submitted 1 February, 2024;
originally announced February 2024.
-
Robust Path Planning via Learning from Demonstrations for Robotic Catheters in Deformable Environments
Authors:
Zhen Li,
Chiara Lambranzi,
Di Wu,
Alice Segato,
Federico De Marco,
Emmanuel Vander Poorten,
Jenny Dankelman,
Elena De Momi
Abstract:
Navigation through tortuous and deformable vessels using catheters with limited steering capability underscores the need for reliable path planning. State-of-the-art path planners do not fully account for the deformable nature of the environment. This work proposes a robust path planner via a learning from demonstrations method, named Curriculum Generative Adversarial Imitation Learning (C-GAIL). This path planning framework takes into account the interaction between steerable catheters and vessel walls and the deformable property of vessels. In-silico comparative experiments show that the proposed network achieves smaller targeting errors, and a higher success rate, compared to a state-of-the-art approach based on GAIL. The in-vitro validation experiments demonstrate that the path generated by the proposed C-GAIL path planner aligns better with the actual steering capability of the pneumatic artificial muscle-driven catheter utilized in this study. Therefore, the proposed approach can provide enhanced support to the user in navigating the catheter towards the target with greater precision, in contrast to the conventional centerline-following technique. The targeting and tracking errors are 1.26$\pm$0.55mm and 5.18$\pm$3.48mm, respectively. The proposed path planning framework exhibits superior performance in managing uncertainty associated with vessel deformation, thereby resulting in lower tracking errors.
Submitted 1 February, 2024;
originally announced February 2024.
-
Topological classes of thermodynamics of the static multi-charge AdS black holes in gauged supergravities: novel temperature-dependent thermodynamic topological phase transition
Authors:
Di Wu,
Shuang-Yong Gu,
Xiao-Dan Zhu,
Qing-Quan Jiang,
Shu-Zheng Yang
Abstract:
In this paper, we investigate, in the framework of the topological approach to black hole thermodynamics, using the generalized off-shell Helmholtz free energy, the topological numbers of the static multi-charge AdS black holes in four- and five-dimensional gauged supergravities. We find that the topological number of the static-charged AdS black holes in four-dimensional Kaluza-Klein (K-K) gauged supergravity theory is $W = 0$, while that of the static-charged AdS black holes in four-dimensional gauged $-iX^0X^1$-supergravity and STU gauged supergravity theories, and five-dimensional Einstein-Maxwell-dilaton-axion (EMDA) gauged supergravity and STU gauged supergravity, and five-dimensional static-charged AdS Horowitz-Sen black hole are both $W = 1$. Furthermore, we observe a novel temperature-dependent thermodynamic topological phase transition that can happen in the four-dimensional static-charged AdS black hole in EMDA gauged supergravity theory, the four-dimensional static-charged AdS Horowitz-Sen black hole, and the five-dimensional static-charged AdS black hole in K-K gauged supergravity theory. We believe that the novel temperature-dependent thermodynamic topological phase transition could help us better understand black hole thermodynamics and, further, shed new light on the fundamental nature of gauged supergravity theories.
Submitted 28 June, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
Graph Attention-based Reinforcement Learning for Trajectory Design and Resource Assignment in Multi-UAV Assisted Communication
Authors:
Zikai Feng,
Di Wu,
Mengxing Huang,
Chau Yuen
Abstract:
In multiple unmanned aerial vehicle (UAV)-assisted downlink communication, it is challenging for UAV base stations (UAV BSs) to realize trajectory design and resource assignment in unknown environments. The cooperation and competition between UAV BSs in the communication network lead to a Markov game problem. Multi-agent reinforcement learning is an important solution for this kind of decision-making. However, there are still many common issues, such as the instability of the system and low utilization of historical data, that limit its application. In this paper, a novel graph-attention multi-agent trust region (GA-MATR) reinforcement learning framework is proposed to solve the multi-UAV assisted communication problem. A graph recurrent network is introduced to process and analyze the complex topology of the communication network, so as to extract useful information and patterns from observational information. The attention mechanism provides additional weighting for conveyed information, so that the critic network can accurately evaluate the value of behavior for UAV BSs. This provides more reliable feedback signals and helps the actor network update the strategy more effectively. Ablation simulations indicate that the proposed approach attains improved convergence over the baselines. UAV BSs learn the optimal communication strategies to achieve their maximum cumulative rewards. Additionally, the multi-agent trust region method with monotonic convergence provides an estimated Nash equilibrium for the multi-UAV assisted communication Markov game.
Submitted 31 January, 2024;
originally announced January 2024.
-
Analysis of Knowledge Tracing performance on synthesised student data
Authors:
Panagiotis Pagonis,
Kai Hartung,
Di Wu,
Munir Georges,
Sören Gröttrup
Abstract:
Knowledge Tracing (KT) aims to predict the future performance of students by tracking the development of their knowledge states. Despite all the recent progress made in this field, the application of KT models in education systems is still restricted from the data perspective: 1) limited access to real-life data due to data protection concerns, 2) lack of diversity in public datasets, 3) noise in benchmark datasets such as duplicate records. To resolve these problems, we simulated student data with three statistical strategies based on public datasets and tested their performance on two KT baselines. While we observe only minor performance improvement with additional synthetic data, our work shows that using only synthetic data for training can lead to performance similar to that of real data.
Submitted 30 January, 2024;
originally announced January 2024.
-
LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning
Authors:
Yuqiang Sun,
Daoyuan Wu,
Yue Xue,
Han Liu,
Wei Ma,
Lyuye Zhang,
Miaolei Shi,
Yang Liu
Abstract:
Large language models (LLMs) have demonstrated significant potential for many downstream tasks, including those requiring human-level intelligence, such as vulnerability detection. However, recent attempts to use LLMs for vulnerability detection are still preliminary, as they lack an in-depth understanding of a subject LLM's vulnerability reasoning capability -- whether it originates from the model itself or from external assistance, such as invoking tool support and retrieving vulnerability knowledge. In this paper, we aim to decouple LLMs' vulnerability reasoning capability from their other capabilities, including the ability to actively seek additional information (e.g., via function calling in SOTA models), adopt relevant vulnerability knowledge (e.g., via vector-based matching and retrieval), and follow instructions to output structured results. To this end, we propose a unified evaluation framework named LLM4Vuln, which separates LLMs' vulnerability reasoning from their other capabilities and evaluates how LLMs' vulnerability reasoning could be enhanced when combined with the enhancement of other capabilities. To demonstrate the effectiveness of LLM4Vuln, we have designed controlled experiments using 75 ground-truth smart contract vulnerabilities, which were extensively audited as high-risk on Code4rena from August to November 2023, and tested them in 4,950 different scenarios across three representative LLMs (GPT-4, Mixtral, and Code Llama). Our results not only reveal ten findings regarding the varying effects of knowledge enhancement, context supplementation, prompt schemes, and models but also enable us to identify 9 zero-day vulnerabilities in two pilot bug bounty programs with over 1,000 USD being awarded.
Submitted 29 January, 2024;
originally announced January 2024.
-
Stability of KdV solitons
Authors:
Derchyi Wu
Abstract:
We prove an orbital stability theorem of KdV $n$-solitons with explicit phase shifts in the soliton region with cones around the $x$-axis and lines determined by bound states of the KdV $n$-solitons removed.
Submitted 31 March, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
-
A Systematic Literature Review on Explainability for Machine/Deep Learning-based Software Engineering Research
Authors:
Sicong Cao,
Xiaobing Sun,
Ratnadira Widyasari,
David Lo,
Xiaoxue Wu,
Lili Bo,
Jiale Zhang,
Bin Li,
Wei Liu,
Di Wu,
Yixin Chen
Abstract:
The remarkable achievements of Artificial Intelligence (AI) algorithms, particularly in Machine Learning (ML) and Deep Learning (DL), have fueled their extensive deployment across multiple sectors, including Software Engineering (SE). However, due to their black-box nature, these promising AI-driven SE models are still far from being deployed in practice. This lack of explainability poses unwanted risks for their applications in critical tasks, such as vulnerability detection, where decision-making transparency is of paramount importance. This paper endeavors to elucidate this interdisciplinary domain by presenting a systematic literature review of approaches that aim to improve the explainability of AI models within the context of SE. The review canvasses work appearing in the most prominent SE & AI conferences and journals, and spans 63 papers across 21 unique SE tasks. Based on three key Research Questions (RQs), we aim to (1) summarize the SE tasks where XAI techniques have shown success to date; (2) classify and analyze different XAI techniques; and (3) investigate existing evaluation approaches. Based on our findings, we identified a set of challenges remaining to be addressed in existing studies, together with a roadmap highlighting potential opportunities we deemed appropriate and important for future work.
Submitted 25 January, 2024;
originally announced January 2024.
-
Probabilistic Mobility Load Balancing for Multi-band 5G and Beyond Networks
Authors:
Saria Al Lahham,
Di Wu,
Ekram Hossain,
Xue Liu,
Gregory Dudek
Abstract:
The ever-increasing demand for data services and the proliferation of user equipment (UE) have resulted in a significant rise in the volume of mobile traffic. Moreover, in multi-band networks, non-uniform traffic distribution among different operational bands can lead to congestion, which can adversely impact the user's quality of experience. Load balancing is a critical aspect of network optimization: it ensures that the traffic is evenly distributed among different bands, avoiding congestion and ensuring a better user experience. Traditional load balancing approaches rely only on the band channel quality as a load indicator when moving UEs between bands. This disregards the UE's demands and the band's resources, and hence leads to suboptimal balancing and utilization of resources. To address this challenge, we propose an event-based algorithm, in which we model the load balancing problem as a multi-objective stochastic optimization, and assign UEs to bands in a probabilistic manner. The goal is to evenly distribute traffic across available bands according to their resources, while keeping the number of inter-frequency handovers minimal to avoid the signaling overhead and the interruption time. Simulation results show that the proposed algorithm enhances the network's performance and outperforms traditional load balancing approaches in terms of throughput and interruption time.
Submitted 24 January, 2024;
originally announced January 2024.
-
The Effect of Human v/s Synthetic Test Data and Round-tripping on Assessment of Sentiment Analysis Systems for Bias
Authors:
Kausik Lakkaraju,
Aniket Gupta,
Biplav Srivastava,
Marco Valtorta,
Dezhi Wu
Abstract:
Sentiment Analysis Systems (SASs) are data-driven Artificial Intelligence (AI) systems that output polarity and emotional intensity when given a piece of text as input. Like other AIs, SASs are also known to have unstable behavior when subjected to changes in data which can make it problematic to trust out of concerns like bias when AI works with humans and data has protected attributes like gender, race, and age. Recently, an approach was introduced to assess SASs in a blackbox setting without training data or code, and rating them for bias using synthetic English data. We augment it by introducing two human-generated chatbot datasets and also consider a round-trip setting of translating the data from one language to the same through an intermediate language. We find that these settings show SASs performance in a more realistic light. Specifically, we find that rating SASs on the chatbot data showed more bias compared to the synthetic data, and round-tripping using Spanish and Danish as intermediate languages reduces the bias (up to 68% reduction) in human-generated data while, in synthetic data, it takes a surprising turn by increasing the bias! Our findings will help researchers and practitioners refine their SAS testing strategies and foster trust as SASs are considered part of more mission-critical applications for global use.
Submitted 15 January, 2024;
originally announced January 2024.
-
How Far Can 100 Samples Go? Unlocking Overall Zero-Shot Multilingual Translation via Tiny Multi-Parallel Data
Authors:
Di Wu,
Shaomu Tan,
Yan Meng,
David Stap,
Christof Monz
Abstract:
Zero-shot translation aims to translate between language pairs not seen during training in Multilingual Machine Translation (MMT) and is largely considered an open problem. A common, albeit resource-consuming, solution is to add as many related translation directions as possible to the training corpus. In this paper, we show that for an English-centric model, surprisingly large zero-shot improvements can be achieved by simply fine-tuning with a very small amount of multi-parallel data. For example, on the EC30 dataset, we obtain up to +21.7 ChrF non-English overall improvements (870 directions) by using only 100 multi-parallel samples while preserving English-centric translation quality. When investigating the size effect of fine-tuning data and its transfer capabilities, we found that already a small, randomly sampled set of fine-tuning directions is sufficient to achieve comparable improvements. The resulting non-English performance is close to the complete translation upper bound. Even in a minimal setting -- fine-tuning with only one single sample -- the well-known off-target issue is almost completely resolved, explaining parts -- but not all -- of the observed improvements in translation quality.
Submitted 26 February, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
Exact Normal Modes of Quantum Plasmas
Authors:
Tian-Xing Hu,
Dong Wu,
Z. M. Sheng,
J. Zhang
Abstract:
The normal modes, i.e., the eigensolutions of the dispersion relation, are the most fundamental properties of a plasma and are also of key importance to many nonlinear effects such as parametric and two-plasmon decay, and Raman scattering. The real part indicates the intrinsic oscillation frequency, while the imaginary part indicates the Landau damping rate. In most of the literature, the normal modes of quantum plasmas are obtained by means of the small damping approximation (SDA), which is invalid for high-$k$ modes. In this paper, we solve the exact dispersion relations via the analytical continuation (AC) scheme, and, due to the multi-valued nature of the Fermi-Dirac distribution, reformation of the complex Riemann surface is required. It is found that the change of the topological shape of the root locus in quantum plasmas is quite different from that in classical plasmas: both the real and imaginary frequencies of high-$k$ modes increase with $k$ more steeply than the typical linear behaviour seen in classical plasmas. As a result, the temporal evolution of a high-$k$ perturbation in quantum plasmas is dominated by the ballistic modes.
Submitted 22 January, 2024;
originally announced January 2024.
-
Validation of Classical Transport Cross Section for Ion-Ion Interactions Under Repulsive Yukawa Potential
Authors:
Tian-Xing Hu,
Dong Wu,
C. L. Lin,
Z. M. Sheng,
B. He,
J. Zhang
Abstract:
The cross section is a fundamental parameter for describing the transport of charged particles in matter. Because nuclei are orders of magnitude heavier than electrons, and for convenience of practical calculation, the cross section of elastic nucleus-nucleus collisions is usually treated via classical mechanics. The famous Bohr criterion was first proposed to judge whether the classical treatment is reliable. Later, Lindhard generalized the Coulomb results to screened potentials. Considering the increasing importance of detailed ion-ion interactions in modern simulation codes for inertial confinement fusion (ICF) research, validation of the classical transport cross section for ion-ion interactions over a wide range of parameter space is certainly required. In this work, the transport cross sections obtained via classical mechanics under a repulsive Yukawa potential are compared with those obtained via quantum mechanics. Differences in the differential cross sections are found with respect to scattering angles and velocities. Our results generally indicate that the classical picture fails at both low and high velocities, which represents a significant extension of the famous Bohr criterion and its generalized variations. Furthermore, the precise validity zones of the classical picture are also analysed in this work. This work is of significant importance for benchmarking modern ion-kinetic simulation codes in ICF research, concerning the stopping power of $α$ particles in DT fuels, ion-ion friction and viscous effects in the formation of kinetic shocks.
Submitted 22 January, 2024;
originally announced January 2024.
-
Improving automatic detection of driver fatigue and distraction using machine learning
Authors:
Dongjiang Wu
Abstract:
Changes and advances in information technology have played an important role in the development of intelligent vehicle systems in recent years. Driver fatigue and distracted driving are important factors in traffic accidents. Thus, onboard monitoring of driving behavior has become a crucial component of advanced driver assistance systems for intelligent vehicles. In this article, we present techniques for simultaneously detecting fatigue and distracted driving behaviors using vision-based and machine learning-based approaches. In driving fatigue detection, we use facial alignment networks to identify facial feature points in the images, and calculate the distance of the facial feature points to detect the opening and closing of the eyes and mouth. Furthermore, we use a convolutional neural network (CNN) based on the MobileNet architecture to identify various distracted driving behaviors. Experiments are performed on a PC based setup with a webcam and results are demonstrated using public datasets as well as custom datasets created for training and testing. Compared to previous approaches, we build our own datasets and provide better results in terms of accuracy and computation time.
Submitted 4 January, 2024;
originally announced January 2024.
-
Kernel-based multi-marker tests of association based on the accelerated failure time model
Authors:
Chenxi Li,
Di Wu,
Qing Lu
Abstract:
Kernel-based multi-marker tests for survival outcomes use primarily the Cox model to adjust for covariates. The proportional hazards assumption made by the Cox model could be unrealistic, especially in the long-term follow-up. We develop a suite of novel multi-marker survival tests for genetic association based on the accelerated failure time model, which is a popular alternative to the Cox model due to its direct physical interpretation. The tests are based on the asymptotic distributions of their test statistics and are thus computationally efficient. The association tests can account for the heterogeneity of genetic effects across sub-populations/individuals to increase the power. All the new tests can deal with competing risks and left truncation. Moreover, we develop small-sample corrections to the tests to improve their accuracy under small samples. Extensive numerical experiments show that the new tests perform very well in various scenarios. An application to a genetic dataset of Alzheimer's disease illustrates the tests' practical utility.
Submitted 17 January, 2024;
originally announced January 2024.