-
Deep Learning-Based Robust Multi-Object Tracking via Fusion of mmWave Radar and Camera Sensors
Authors:
Lei Cheng,
Arindam Sengupta,
Siyang Cao
Abstract:
Autonomous driving holds great promise in addressing traffic safety concerns by leveraging artificial intelligence and sensor technology. Multi-Object Tracking plays a critical role in ensuring safer and more efficient navigation through complex traffic scenarios. This paper presents a novel deep learning-based method that integrates radar and camera data to enhance the accuracy and robustness of Multi-Object Tracking in autonomous driving systems. The proposed method leverages a Bi-directional Long Short-Term Memory network to incorporate long-term temporal information and improve motion prediction. An appearance feature model inspired by FaceNet is used to establish associations between objects across different frames, ensuring consistent tracking. A tri-output mechanism is employed, consisting of individual outputs for radar and camera sensors and a fusion output, to provide robustness against sensor failures and produce accurate tracking results. In extensive evaluations on real-world datasets, our approach demonstrates remarkable improvements in tracking accuracy, ensuring reliable performance even in low-visibility scenarios.
Submitted 10 July, 2024;
originally announced July 2024.
-
Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification
Authors:
Zhenyu Kuang,
Hongyang Zhang,
Lidong Cheng,
Yinhao Liu,
Yue Huang,
Xinghao Ding
Abstract:
Generalizable vehicle re-identification (ReID) aims to enable a model trained on diverse source domains to adapt broadly to unknown target domains without additional fine-tuning or retraining. However, it still faces the domain shift problem and has difficulty generalizing accurately to unknown target domains. This limitation occurs because the model relies heavily on primary domain-invariant features in the training data and pays less attention to potentially valuable secondary features. To solve this complex and common problem, this paper proposes the two-stage Multi-expert Knowledge Confrontation and Collaboration (MiKeCoCo) method, which incorporates multiple experts with unique perspectives into Contrastive Language-Image Pretraining (CLIP) and fully leverages high-level semantic knowledge for comprehensive feature representation. Specifically, we propose to construct the learnable prompt set of all specific-perspective experts by adversarial learning in the latent space of visual features during the first stage of training. The learned prompt set with high-level semantics is then utilized to guide representation learning of the multi-level features for final knowledge fusion in the next stage. In this process of knowledge fusion, although multiple experts employ different ways of assessing the same vehicle, their common goal is to confirm the vehicle's true identity. Their collective decision can ensure the accuracy and consistency of the evaluation results. Furthermore, we design different image inputs for the two training stages, namely image component separation and diversity enhancement, in order to extract the ID-related prompt representation and to obtain the feature representation highlighted by all experts, respectively. Extensive experimental results demonstrate that our method achieves state-of-the-art recognition performance.
Submitted 10 July, 2024;
originally announced July 2024.
-
Window-to-Window BEV Representation Learning for Limited FoV Cross-View Geo-localization
Authors:
Lei Cheng,
Teng Wang,
Lingquan Meng,
Changyin Sun
Abstract:
Cross-view geo-localization confronts significant challenges due to large perspective changes, especially when the ground-view query image has a limited field of view (FoV) with unknown orientation. To bridge the cross-view domain gap, we explore, for the first time, learning a BEV representation directly from the ground query image. However, the unknown orientation between ground and aerial images, combined with the absence of camera parameters, leads to ambiguity between BEV queries and ground references. To tackle this challenge, we propose a novel Window-to-Window BEV representation learning method, termed W2W-BEV, which adaptively matches BEV queries to ground references at window scale. Specifically, predefined BEV embeddings and extracted ground features are segmented into a fixed number of windows, and then the most similar ground window is chosen for each BEV feature based on a context-aware window matching strategy. Subsequently, cross-attention is performed between the matched BEV and ground windows to learn a robust BEV representation. Additionally, we use ground features along with predicted depth information to initialize the BEV embeddings, which helps learn more powerful BEV representations. Extensive experimental results on benchmark datasets demonstrate the significant superiority of our W2W-BEV over previous state-of-the-art methods under challenging conditions of unknown orientation and limited FoV. Specifically, on the CVUSA dataset with a limited FoV of 90 degrees and unknown orientation, W2W-BEV achieves a significant improvement in R@1 accuracy from 47.24% to 64.73% (+17.49%).
Submitted 9 July, 2024;
originally announced July 2024.
-
LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition
Authors:
Teng Wang,
Lingquan Meng,
Lei Cheng,
Changyin Sun
Abstract:
Visual place recognition (VPR) remains challenging due to significant viewpoint changes and appearance variations. Mainstream works tackle these challenges by developing various feature aggregation methods to transform deep features into robust and compact global representations. Unfortunately, satisfactory results cannot be achieved under challenging conditions. We start from a new perspective and attempt to build discriminative global representations by fusing image data and text descriptions of the visual scene. The motivation is twofold: (1) current Large Vision-Language Models (LVLMs) demonstrate extraordinary emergent capability in visual instruction following, and thus provide an efficient and flexible way to generate text descriptions of images; (2) the text descriptions, which provide high-level scene understanding, show strong robustness against environment variations. Although promising, leveraging LVLMs to build multi-modal VPR solutions remains challenging with respect to efficient multi-modal fusion. Furthermore, LVLMs will inevitably produce some inaccurate descriptions, making fusion even harder. To tackle these challenges, we propose a novel multi-modal VPR solution. It first adapts pre-trained visual and language foundation models to VPR for extracting image and text features, which are then fed into the feature combiner to enhance each other. As the main component, the feature combiner first uses a token-wise attention block to adaptively recalibrate text tokens according to their relevance to the image data, and then an efficient cross-attention fusion module to propagate information across different modalities. The enhanced multi-modal features are compressed into the feature descriptor for performing retrieval. Experimental results show that our method outperforms state-of-the-art methods by a large margin with a significantly smaller image descriptor dimension.
Submitted 9 July, 2024;
originally announced July 2024.
-
Relativistic Exact Two-Component Coupled-Cluster Study of Molecular Sensitivity Factors for Nuclear Schiff Moments
Authors:
Tianxiang Chen,
Chaoqun Zhang,
Lan Cheng,
Kia Boon Ng,
Stephan Malbrunot-Ettenauer,
Victor V. Flambaum,
Zack Lasner,
John M. Doyle,
Phelan Yu,
Chandler J. Conn,
Chi Zhang,
Nicholas R. Hutzler,
Andrew M. Jayich,
Benjamin Augenbraun,
David Demille
Abstract:
Relativistic exact two-component coupled-cluster calculations of molecular sensitivity factors for nuclear Schiff moments (NSMs) are reported. We focus on molecules containing heavy nuclei, especially octupole-deformed nuclei. Analytic relativistic coupled-cluster gradient techniques are used and serve as useful tools for identifying candidate molecules that sensitively probe for physics beyond the Standard Model in the hadronic sector. Notably, these tools enable straightforward "black-box" calculations. Two competing chemical mechanisms that contribute to the NSM are analyzed, illuminating the physics of ligand effects on NSM sensitivity factors.
Submitted 6 July, 2024;
originally announced July 2024.
-
GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction
Authors:
Yuxuan Mu,
Xinxin Zuo,
Chuan Guo,
Yilin Wang,
Juwei Lu,
Xiaofeng Wu,
Songcen Xu,
Peng Dai,
Youliang Yan,
Li Cheng
Abstract:
We present GSD, a diffusion model approach based on Gaussian Splatting (GS) representation for 3D object reconstruction from a single view. Prior works suffer from inconsistent 3D geometry or mediocre rendering quality due to improper representations. We take a step towards resolving these shortcomings by utilizing the recent state-of-the-art 3D explicit representation, Gaussian Splatting, and an unconditional diffusion model. This model learns to generate 3D objects represented by sets of GS ellipsoids. With these strong generative 3D priors, though learning unconditionally, the diffusion model is ready for view-guided reconstruction without further model fine-tuning. This is achieved by propagating fine-grained 2D features through the efficient yet flexible splatting function and the guided denoising sampling process. In addition, a 2D diffusion model is further employed to enhance rendering fidelity, and improve reconstructed GS quality by polishing and re-using the rendered images. The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views. Experiments on the challenging real-world CO3D dataset demonstrate the superiority of our approach. Project page: https://yxmu.foo/GSD/
Submitted 10 July, 2024; v1 submitted 4 July, 2024;
originally announced July 2024.
-
Prediction-Free Coordinated Dispatch of Microgrid: A Data-Driven Online Optimization Approach
Authors:
Kaidi Huang,
Lin Cheng,
Ning Qi,
David Wenzhong Gao
Abstract:
The integration of renewable energy sources (RES) into microgrids poses challenges to reliable and economic operation due to the inherent uncertainty and volatility of RES. In contrast to previous dispatch methods that require precise predictions of RES, this paper proposes a novel prediction-free and data-driven coordinated dispatch framework for reliable microgrid operations. In the offline stage, ex-post optimal dispatch sequences are generated based on historical dispatch from a "God's-eye view". The sequences offer a global reference and are sequentially updated based on newly observed data. Subsequently, we propose an adaptive virtual-queue-based online convex optimization (OCO) method to generate the real-time control policy of the microgrid, which aims to minimize the instant operation cost while tracking the offline reference. We provide theoretical proof that the proposed method outperforms existing OCO methods and admits a sublinear dynamic regret bound and a sublinear hard cumulative constraint violation bound for OCO with time-varying constraints. A case study illustrates that the proposed method outperforms state-of-the-art methods in terms of economic optimality, computational efficiency, and security.
Submitted 4 July, 2024;
originally announced July 2024.
-
Autonomous Ground Navigation in Highly Constrained Spaces: Lessons learned from The 3rd BARN Challenge at ICRA 2024
Authors:
Xuesu Xiao,
Zifan Xu,
Aniket Datar,
Garrett Warnell,
Peter Stone,
Joshua Julian Damanik,
Jaewon Jung,
Chala Adane Deresa,
Than Duc Huy,
Chen Jinyu,
Chen Yichen,
Joshua Adrian Cahyono,
Jingda Wu,
Longfei Mo,
Mingyang Lv,
Bowen Lan,
Qingyang Meng,
Weizhi Tao,
Li Cheng
Abstract:
The 3rd BARN (Benchmark Autonomous Robot Navigation) Challenge took place at the 2024 IEEE International Conference on Robotics and Automation (ICRA 2024) in Yokohama, Japan, and continued to evaluate the performance of state-of-the-art autonomous ground navigation systems in highly constrained environments. Similar to the trend in the 1st and 2nd BARN Challenges at ICRA 2022 and 2023 in Philadelphia (North America) and London (Europe), the 3rd BARN Challenge in Yokohama (Asia) became more regional, i.e., mostly Asian teams participated. The size of the competition shrank slightly (six simulation teams, four of which were invited to the physical competition). The competition results, compared to the last two years, suggest that the field has adopted new machine learning approaches while at the same time slightly converging on a few common practices. However, the regional nature of the physical participants suggests a challenge to promote wider participation all over the world and provide more resources to travel to the venue. In this article, we discuss the challenge, the approaches used by the three winning teams, and lessons learned to direct future research and competitions.
Submitted 1 July, 2024;
originally announced July 2024.
-
LLM Uncertainty Quantification through Directional Entailment Graph and Claim Level Response Augmentation
Authors:
Longchao Da,
Tiejin Chen,
Lu Cheng,
Hua Wei
Abstract:
Large language models (LLMs) have showcased superior capabilities in sophisticated tasks across various domains. Having started from basic question answering (QA), they are nowadays used as decision assistants or explainers for unfamiliar content. However, they are not always correct, owing to data sparsity in domain-specific corpora or to the model's hallucination problems. Given this, how much should we trust the responses from LLMs? This paper presents a novel way to evaluate uncertainty that captures directional instability: we construct a directed graph from entailment probabilities and, given the asymmetric property of the constructed directed graph, apply a Random Walk Laplacian; the uncertainty is then aggregated from the eigenvalues derived in the Laplacian process. We also provide a way to incorporate the semantic uncertainty of existing work into our proposed layer. In addition, this paper identifies vagueness issues in the raw response set and proposes an augmentation approach to mitigate them. We conducted extensive empirical experiments and demonstrated the superiority of our proposed solutions.
Submitted 8 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
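The random-walk Laplacian step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a small matrix of pairwise entailment probabilities (the values in the usage note are made up) and uses the textbook definition L_rw = I - D^{-1} W with D the diagonal matrix of out-degrees. The abstract does not say how the eigenvalues are aggregated into an uncertainty score, so the sketch stops at the spectrum; the function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def random_walk_laplacian(weights):
    """Return L_rw = I - D^{-1} W for a directed, weighted graph.

    weights[i][j] is the edge weight from node i to node j
    (assumed here to be an entailment probability).
    """
    w = np.asarray(weights, dtype=float)
    out_degree = w.sum(axis=1)
    # Nodes with no outgoing edges get an all-zero transition row.
    inv_degree = np.divide(
        1.0, out_degree, out=np.zeros_like(out_degree), where=out_degree > 0
    )
    transition = inv_degree[:, None] * w
    return np.eye(w.shape[0]) - transition

def laplacian_spectrum(weights):
    """Eigenvalues of L_rw; they may be complex because W is asymmetric."""
    return np.linalg.eigvals(random_walk_laplacian(weights))
```

For example, `laplacian_spectrum([[0, 0.9, 0.1], [0.2, 0, 0.8], [0.5, 0.5, 0]])` returns three eigenvalues, one of which is numerically zero because every row of the transition matrix sums to one.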
-
ConU: Conformal Uncertainty in Large Language Models with Correctness Coverage Guarantees
Authors:
Zhiyuan Wang,
Jinhao Duan,
Lu Cheng,
Yue Zhang,
Qingni Wang,
Hengtao Shen,
Xiaofeng Zhu,
Xiaoshuang Shi,
Kaidi Xu
Abstract:
Uncertainty quantification (UQ) in natural language generation (NLG) tasks remains an open challenge, exacerbated by the intricate nature of the recent large language models (LLMs). This study investigates adapting conformal prediction (CP), which can convert any heuristic measure of uncertainty into rigorous theoretical guarantees by constructing prediction sets, for black-box LLMs in open-ended NLG tasks. We propose a sampling-based uncertainty measure leveraging self-consistency and develop a conformal uncertainty criterion by integrating the uncertainty condition aligned with correctness into the design of the CP algorithm. Experimental results indicate that our uncertainty measure generally surpasses prior state-of-the-art methods. Furthermore, we calibrate the prediction sets within the model's unfixed answer distribution and achieve strict control over the correctness coverage rate across 6 LLMs on 4 free-form NLG datasets, spanning general-purpose and medical domains, while the small average set size further highlights the efficiency of our method in providing trustworthy guarantees for practical open-ended NLG applications.
Submitted 29 June, 2024;
originally announced July 2024.
-
Formation Under Communication Constraints: Control Performance Meets Channel Capacity
Authors:
Yaru Chen,
Yirui Cong,
Xiangyun Zhou,
Long Cheng,
Xiangke Wang
Abstract:
In wireless communication-based formation control systems, the control performance is significantly impacted by the channel capacity of each communication link between agents. This relationship, however, remains under-investigated in existing studies. To address this gap, the formation control problem of classical second-order multi-agent systems with bounded process noises is considered, taking the channel capacity into account. More specifically, the model of communication links between agents is first established, based on a new concept, the guaranteed communication region, which characterizes all possible locations for successful message decoding in the presence of control-system uncertainty. Furthermore, we rigorously prove that the guaranteed communication region does not increase unboundedly with the transmission time, which indicates an important trade-off between the guaranteed communication region and the data rate. The fundamental limits of data rate for any desired accuracy are also obtained. Finally, an integrated design to achieve the desired formation accuracy is proposed, in which an estimation-based controller and a transmit power control strategy are developed.
Submitted 27 June, 2024;
originally announced June 2024.
-
Conformalized Link Prediction on Graph Neural Networks
Authors:
Tianyi Zhao,
Jian Kang,
Lu Cheng
Abstract:
Graph Neural Networks (GNNs) excel in diverse tasks, yet their applications in high-stakes domains are often hampered by unreliable predictions. Although numerous uncertainty quantification methods have been proposed to address this limitation, they often lack rigorous uncertainty estimates. This work makes the first attempt to introduce a distribution-free and model-agnostic uncertainty quantification approach to construct a predictive interval with a statistical guarantee for GNN-based link prediction. We term it conformalized link prediction. Our approach builds upon conformal prediction (CP), a framework that promises to construct statistically robust prediction sets or intervals. We first theoretically and empirically establish a permutation invariance condition for the application of CP in link prediction tasks, along with an exact test-time coverage. Leveraging the important structural information in graphs, we then identify a novel and crucial connection between a graph's adherence to the power law distribution and the efficiency of CP. This insight leads to the development of a simple yet effective sampling-based method to align the graph structure with a power law distribution prior to the standard CP procedure. Extensive experiments demonstrate that for conformalized link prediction, our approach achieves the desired marginal coverage while significantly improving the efficiency of CP compared to baseline methods.
Submitted 26 June, 2024;
originally announced June 2024.
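The "standard CP procedure" that the abstract above builds on can be sketched as generic split conformal prediction. The sketch assumes the nonconformity score is the absolute residual between a model's predicted score and the true value on a held-out calibration set; the function names are hypothetical. It does not reproduce the paper's permutation-invariance analysis or its power-law sampling step.

```python
import math

def conformal_quantile(calibration_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest score; if that rank
    exceeds n, the only valid threshold is infinity.
    """
    scores = sorted(calibration_scores)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return scores[rank - 1]

def prediction_interval(point_prediction, qhat):
    """Symmetric interval around a point prediction (absolute-residual score)."""
    return point_prediction - qhat, point_prediction + qhat
```

With exchangeable calibration and test points, an interval built this way covers the true value with probability at least 1 - alpha on average over calibration draws; narrower intervals at the same coverage correspond to what the abstract calls better efficiency.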
-
RACon: Retrieval-Augmented Simulated Character Locomotion Control
Authors:
Yuxuan Mu,
Shihao Zou,
Kangning Yin,
Zheng Tian,
Li Cheng,
Weinan Zhang,
Jun Wang
Abstract:
In computer animation, driving a simulated character with lifelike motion is challenging. Current generative models, though able to generalize to diverse motions, often pose challenges to the responsiveness of end-user control. To address these issues, we introduce RACon: Retrieval-Augmented Simulated Character Locomotion Control. Our end-to-end hierarchical reinforcement learning method utilizes a retriever and a motion controller. The retriever searches motion experts from a user-specified database in a task-oriented fashion, which boosts the responsiveness to the user's control. The selected motion experts and the manipulation signal are then transferred to the controller to drive the simulated character. In addition, a retrieval-augmented discriminator is designed to stabilize the training process. Our method surpasses existing techniques in both quality and quantity in locomotion control, as demonstrated in our empirical study. Moreover, by switching extensive databases for retrieval, it can adapt to distinctive motion types at run time.
Submitted 11 June, 2024;
originally announced June 2024.
-
LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing
Authors:
Jiangshu Du,
Yibo Wang,
Wenting Zhao,
Zhongfen Deng,
Shuaiqi Liu,
Renze Lou,
Henry Peng Zou,
Pranav Narayanan Venkit,
Nan Zhang,
Mukund Srinath,
Haoran Ranran Zhang,
Vipul Gupta,
Yinghui Li,
Tao Li,
Fei Wang,
Qin Liu,
Tianlin Liu,
Pengzhi Gao,
Congying Xia,
Chen Xing,
Jiayang Cheng,
Zhaowei Wang,
Ying Su,
Raj Sanjay Shah,
Ruohao Guo
, et al. (15 additional authors not shown)
Abstract:
This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload?
This study focuses on the topic of LLMs assisting NLP researchers, particularly examining the effectiveness of LLMs in assisting paper (meta-)reviewing and its recognizability. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready) with both human-written and LLM-generated reviews, and (ii) "deficiency" labels and corresponding explanations for individual segments of each review, annotated by experts. Using ReviewCritique, this study explores two threads of research questions: (i) "LLMs as Reviewers": how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) "LLMs as Metareviewers": how effectively can LLMs identify potential issues, such as deficient or unprofessional review segments, within individual paper reviews? To our knowledge, this is the first work to provide such a comprehensive analysis.
Submitted 25 June, 2024; v1 submitted 23 June, 2024;
originally announced June 2024.
-
Stochastic Multi-objective Multi-trip AMR Routing Problem with Time Windows
Authors:
Lulu Cheng,
Ning Zhao
Abstract:
In recent years, with the rapidly aging population, alleviating the pressure on medical staff has become a critical issue. To improve the work efficiency of medical staff and reduce the risk of infection, we consider the multi-trip autonomous mobile robot (AMR) routing problem in a stochastic environment, seeking a solution that minimizes the total expected operating cost and maximizes the total service quality for patients such that each route violates the vehicle capacity and the time window with only a very small probability. The travel time of AMRs is stochastic and affected by the surrounding environment, the demand of each ward is unknown until the AMR reaches the ward, and the service time is linearly related to the actual demand. We develop a population-based tabu search algorithm (PTS) that combines the genetic algorithm with the tabu search algorithm to solve the problem. Extensive numerical experiments were conducted on modified Solomon instances to show that the PTS algorithm is efficient and to reveal the impact of the confidence level on the optimal solution, providing insights for decision-makers to devise delivery schemes that trade off operating cost against patient satisfaction.
Submitted 18 June, 2024;
originally announced June 2024.
-
Composited-Nested-Learning with Data Augmentation for Nested Named Entity Recognition
Authors:
Xingming Liao,
Nankai Lin,
Haowen Li,
Lianglun Cheng,
Zhuowei Wang,
Chong Chen
Abstract:
Nested Named Entity Recognition (NNER) focuses on recognizing overlapping entities. Compared to Flat Named Entity Recognition (FNER), annotated resources for NNER are scarce. Data augmentation is an effective approach to addressing insufficient annotated corpora. However, data augmentation methods for NNER remain largely unexplored. Due to the presence of nested entities in NNER, existing data augmentation methods cannot be directly applied to NNER tasks. Therefore, in this work, we focus on data augmentation for NNER and resort to a more expressive structure, Composited-Nested-Label Classification (CNLC), in which constituents are combined by nested-word and nested-label, to model nested entities. The dataset is augmented using Composited-Nested-Learning (CNL). In addition, we propose the Confidence Filtering Mechanism (CFM) for a more efficient selection of generated data. Experimental results demonstrate that this approach yields improvements on ACE2004 and ACE2005 and alleviates the impact of sample imbalance.
Submitted 18 June, 2024;
originally announced June 2024.
-
PruningBench: A Comprehensive Benchmark of Structural Pruning
Authors:
Haoling Li,
Changhao Li,
Mengqi Xue,
Gongfan Fang,
Sheng Zhou,
Zunlei Feng,
Huiqiong Wang,
Yong Wang,
Lechao Cheng,
Mingli Song,
Jie Song
Abstract:
Structural pruning has emerged as a promising approach for producing more efficient models. Nevertheless, the community suffers from a lack of standardized benchmarks and metrics, leaving the progress in this area not fully comprehended. To fill this gap, we present the first comprehensive benchmark, termed \textit{PruningBench}, for structural pruning. PruningBench showcases the following three characteristics: 1) PruningBench employs a unified and consistent framework for evaluating the effectiveness of diverse structural pruning techniques; 2) PruningBench systematically evaluates 16 existing pruning methods, encompassing a wide array of models (e.g., CNNs and ViTs) and tasks (e.g., classification and detection); 3) PruningBench provides easily implementable interfaces to facilitate the implementation of future pruning methods, and enables subsequent researchers to incorporate their work into our leaderboards. We provide an online pruning platform http://pruning.vipazoo.cn for customizing pruning tasks and reproducing all results in this paper. Code will be made publicly available at https://github.com/HollyLee2000/PruningBench.
Submitted 28 June, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
Obfuscating IoT Device Scanning Activity via Adversarial Example Generation
Authors:
Haocong Li,
Yaxin Zhang,
Long Cheng,
Wenjia Niu,
Haining Wang,
Qiang Li
Abstract:
Nowadays, attackers target Internet of Things (IoT) devices for security exploitation, and search engines for devices and services compromise user privacy by exposing IP addresses, open ports, device types, vendors, and products. Typically, application banners are used to recognize IoT device profiles during network measurement and reconnaissance. In this paper, we propose a novel approach to obfuscating IoT device banners (BANADV) based on adversarial examples. The key idea is to explore the susceptibility of fingerprinting techniques to a slight perturbation of an IoT device banner. By modifying device banners, BANADV disrupts the collection of IoT device profiles. To validate the efficacy of BANADV, we conduct a set of experiments. Our evaluation results show that adversarial examples can spoof state-of-the-art fingerprinting techniques, including learning- and matching-based approaches. We further provide a detailed analysis of the weakness of learning-based/matching-based fingerprints to carefully crafted samples. Overall, the innovations of BANADV lie in three aspects: (1) it utilizes an IoT-related semantic space and a visual similarity space to locate available manipulating perturbations of IoT banners; (2) it achieves at least 80\% success rate for spoofing IoT scanning techniques; and (3) it is the first to utilize adversarial examples of IoT banners in network measurement and reconnaissance.
Submitted 17 June, 2024;
originally announced June 2024.
-
Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision
Authors:
Yafeng Chen,
Siqi Zheng,
Hui Wang,
Luyao Cheng,
Qian Chen,
Shiliang Zhang,
Wen Wang
Abstract:
Training speaker-discriminative and robust speaker verification systems without explicit speaker labels remains a persistent challenge. In this paper, we propose a new self-supervised speaker verification approach, Self-Distillation Prototypes Network (SDPN), which effectively facilitates self-supervised speaker representation learning. SDPN assigns the representation of the augmented views of an utterance to the same prototypes as the representation of the original view, thereby enabling effective knowledge transfer between the views. Originally, due to the lack of negative pairs in the SDPN training process, the network tends to align positive pairs very closely in the embedding space, a phenomenon known as model collapse. To alleviate this problem, we introduce a diversity regularization term to embeddings in SDPN. Comprehensive experiments on the VoxCeleb datasets demonstrate the superiority of SDPN in self-supervised speaker verification. SDPN sets a new state-of-the-art on the VoxCeleb1 speaker verification evaluation benchmark, achieving Equal Error Rates of 1.80%, 1.99%, and 3.62% on the VoxCeleb1-O, VoxCeleb1-E, and VoxCeleb1-H trials, respectively, without using any speaker labels in training.
Submitted 25 June, 2024; v1 submitted 16 June, 2024;
originally announced June 2024.
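The Equal Error Rate (EER) quoted in the abstract above is the operating point at which the false acceptance rate equals the false rejection rate. As a minimal sketch of how such a figure can be computed from trial scores (not the authors' evaluation code; the function name `equal_error_rate` and the toy scores are hypothetical), one can sweep every observed score as a threshold and keep the one where the two error rates are closest:

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping each observed score as a threshold.

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold), where eer is the mean of the false acceptance
    and false rejection rates at the threshold where they are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")
    best = None  # (gap, eer, threshold)
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # False acceptance: a non-target (impostor) trial is accepted.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: a target (same-speaker) trial is rejected.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (hypothetical): higher means "more likely the same speaker".
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1])
print(f"EER = {eer:.2%} at threshold {threshold}")
```

Production toolkits usually interpolate between thresholds on the ROC curve; the discrete sweep here is chosen only for readability.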
-
A Large-scale Universal Evaluation Benchmark For Face Forgery Detection
Authors:
Yijun Bei,
Hengrui Lou,
Jinsong Geng,
Erteng Liu,
Lechao Cheng,
Jie Song,
Mingli Song,
Zunlei Feng
Abstract:
With the rapid development of AI-generated content (AIGC) technology, the production of realistic fake facial images and videos that deceive human visual perception has become possible. Consequently, various face forgery detection techniques have been proposed to identify such fake facial content. However, evaluating the effectiveness and generalizability of these detection techniques remains a significant challenge. To address this, we have constructed a large-scale evaluation benchmark called DeepFaceGen, aimed at quantitatively assessing the effectiveness of face forgery detection and facilitating the iterative development of forgery detection technology. DeepFaceGen consists of 776,990 real face image/video samples and 773,812 face forgery image/video samples, generated using 34 mainstream face generation techniques. During the construction process, we carefully consider important factors such as content diversity, fairness across ethnicities, and availability of comprehensive labels, in order to ensure the versatility and convenience of DeepFaceGen. Subsequently, DeepFaceGen is employed in this study to evaluate and analyze the performance of 13 mainstream face forgery detection techniques from various perspectives. Through extensive experimental analysis, we derive significant findings and propose potential directions for future research. The code and dataset for DeepFaceGen are available at https://github.com/HengruiLou/DeepFaceGen.
Submitted 13 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Vibrational Branching Ratios for Laser-Cooling of Nonlinear Strontium-Containing Molecules
Authors:
Alexander Frenett,
Zack Lasner,
Lan Cheng,
John M. Doyle
Abstract:
The vibrational branching ratios from the lowest excited electronic state for $\textrm{SrOCH}_3$, $\textrm{SrNH}_2$, and $\textrm{SrSH}$ are measured at the $< 0.1\%$ level. Spectra are obtained by driving the $\tilde{X} - \tilde{A}$ transitions and dispersing the fluorescence on a grating spectrometer. We also perform $\textit{ab initio}$ calculations for the energies of vibrational levels relevant for laser cooling, as well as branching ratios to support the interpretations of all molecular spectra. Symmetry group analysis is applied in conjunction with our data to study rotational closure in these molecules. These analyses indicate favorable prospects for laser cooling $\textrm{SrNH}_2$ and other similar alkaline-earth(-like) amides for future beyond the Standard Model physics searches using polyatomic molecules with long-lived parity doublets.
Submitted 12 June, 2024;
originally announced June 2024.
-
Long-Term Fairness Inquiries and Pursuits in Machine Learning: A Survey of Notions, Methods, and Challenges
Authors:
Usman Gohar,
Zeyu Tang,
Jialu Wang,
Kun Zhang,
Peter L. Spirtes,
Yang Liu,
Lu Cheng
Abstract:
The widespread integration of Machine Learning systems in daily life, particularly in high-stakes domains, has raised concerns about the fairness implications. While prior works have investigated static fairness measures, recent studies reveal that automated decision-making has long-term implications and that off-the-shelf fairness approaches may not serve the purpose of achieving long-term fairness. Additionally, the existence of feedback loops and the interaction between models and the environment introduces additional complexities that may deviate from the initial fairness goals. In this survey, we review existing literature on long-term fairness from different perspectives and present a taxonomy for long-term fairness studies. We highlight key challenges and consider future research directions, analyzing both current issues and potential further explorations.
Submitted 10 June, 2024;
originally announced June 2024.
-
Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas
Authors:
Chengyuan Deng,
Yiqun Duan,
Xin Jin,
Heng Chang,
Yijun Tian,
Han Liu,
Henry Peng Zou,
Yiqiao Jin,
Yijia Xiao,
Yichen Wang,
Shenghao Wu,
Zongxing Xie,
Kuofeng Gao,
Sihong He,
Jun Zhuang,
Lu Cheng,
Haohan Wang
Abstract:
Large Language Models (LLMs) have achieved unparalleled success across diverse language modeling tasks in recent years. However, this progress has also intensified ethical concerns, impacting the deployment of LLMs in everyday contexts. This paper provides a comprehensive survey of ethical challenges associated with LLMs, from longstanding issues such as copyright infringement, systematic bias, and data privacy, to emerging problems like truthfulness and social norms. We critically analyze existing research aimed at understanding, examining, and mitigating these ethical risks. Our survey underscores the importance of integrating ethical standards and societal values into the development of LLMs, thereby guiding the development of responsible and ethically aligned language models.
Submitted 8 June, 2024;
originally announced June 2024.
-
A Quantum Neural Network-Based Approach to Power Quality Disturbances Detection and Recognition
Authors:
Guo-Dong Li,
Hai-Yan He,
Yue Li,
Xin-Hao Li,
Hao Liu,
Qing-Le Wang,
Long Cheng
Abstract:
Power quality disturbances (PQDs) significantly impact the stability and reliability of power systems, necessitating accurate and efficient detection and recognition methods. While numerous classical algorithms for PQD detection and recognition have been extensively studied and applied, related work in the quantum domain is still in its infancy. In this paper, an improved quantum neural network (QNN) model for PQD detection and recognition is proposed. Specifically, the model constructs a quantum circuit comprising data qubits and ancilla qubits. Classical data is transformed into quantum data by embedding it into data qubits via the encoding layer. Subsequently, parametric quantum gates are utilized to form the variational layer, which facilitates qubit information transformation, thereby extracting essential feature information for detection and recognition. The expected value is obtained by measuring ancilla qubits, enabling the completion of disturbance classification based on this expected value. An analysis reveals that the runtime and space complexities of the QNN are $O(\mathrm{poly}(N))$ and $O(N)$, respectively. Extensive experiments validate the feasibility and superiority of the proposed model in PQD detection and recognition. The model achieves accuracies of 99.75\%, 97.85\% and 95.5\% in experiments involving the detection of disturbances, recognition of seven single disturbances, and recognition of ten mixed disturbances, respectively. Additionally, noise simulation and comparative experiments demonstrate that the proposed model exhibits robust anti-noise capabilities, requires few training parameters, and maintains high accuracy.
Submitted 5 June, 2024;
originally announced June 2024.
-
ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency
Authors:
Yafeng Chen,
Siqi Zheng,
Hui Wang,
Luyao Cheng,
Qian Chen,
Shiliang Zhang,
Junjie Li
Abstract:
Speaker verification systems experience significant performance degradation when tasked with short-duration trial recordings. To address this challenge, a multi-scale feature fusion approach has been proposed to effectively capture speaker characteristics from short utterances. Constrained by the model's size, a robust backbone Enhanced Res2Net (ERes2Net) combining global and local feature fusion demonstrates sub-optimal performance in short-duration speaker verification. To further improve the short-duration feature extraction capability of ERes2Net, we expand the channel dimension within each stage. However, this modification also increases the number of model parameters and computational complexity. To alleviate this problem, we propose an improved ERes2NetV2 by pruning redundant structures, ultimately reducing both the model parameters and its computational cost. A range of experiments conducted on the VoxCeleb datasets exhibits the superiority of ERes2NetV2, which achieves EER of 0.61% for the full-duration trial, 0.98% for the 3s-duration trial, and 1.48% for the 2s-duration trial on VoxCeleb1-O, respectively.
Submitted 4 June, 2024;
originally announced June 2024.
-
Two new proofs of partial Godbersen's Conjecture
Authors:
Lin Cheng
Abstract:
Two new proofs are provided, offering two new perspectives on Godbersen's conjecture. One of the proofs utilizes Helly's theorem to provide a concise and elegant proof of the inequality in Godbersen's conjecture. The other proof utilizes the Brunn-Minkowski inequality to provide a completely new proof of the inclusion $-K\subset nK$ for convex bodies $K$ with centroid at the origin, thereby proving Godbersen's conjecture.
Submitted 5 June, 2024; v1 submitted 31 May, 2024;
originally announced June 2024.
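The inclusion $-K\subset nK$ mentioned in the abstract above can be checked by hand in the simplest case. The following worked example is an illustration only, not the paper's Brunn-Minkowski argument; it assumes $K$ is a simplex in $\mathbb{R}^n$ with vertices $v_0,\dots,v_n$ (notation introduced here) and uses the fact that the centroid of a simplex is the average of its vertices:

```latex
% Assumption: K = conv{v_0, ..., v_n} is a simplex in R^n with centroid at the origin.
\[
  \frac{1}{n+1}\sum_{j=0}^{n} v_j = 0
  \quad\Longrightarrow\quad
  -v_i \;=\; \sum_{j\neq i} v_j \;=\; n\cdot\underbrace{\frac{1}{n}\sum_{j\neq i} v_j}_{=:\,c_i\,\in\,K}
  \;\in\; nK \qquad (i = 0,\dots,n).
\]
% c_i is the centroid of the facet opposite v_i, hence a point of K.
\[
  -K \;=\; \operatorname{conv}\{-v_0,\dots,-v_n\} \;\subset\; nK
  \qquad\text{because } nK \text{ is convex.}
\]
```

Since each $c_i$ lies on the boundary of $K$ and the origin is an interior point, $-v_i = n c_i \notin \lambda K$ for any $\lambda < n$, so the factor $n$ cannot be improved for the simplex.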
-
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge
Authors:
Hongwei Bran Li,
Fernando Navarro,
Ivan Ezhov,
Amirhossein Bayat,
Dhritiman Das,
Florian Kofler,
Suprosanna Shit,
Diana Waldmannstetter,
Johannes C. Paetzold,
Xiaobin Hu,
Benedikt Wiestler,
Lucas Zimmer,
Tamaz Amiranashvili,
Chinmay Prabhakar,
Christoph Berger,
Jonas Weidner,
Michelle Alonso-Basant,
Arif Rashid,
Ujjwal Baid,
Wesam Adel,
Deniz Ali,
Bhakti Baheti,
Yingbin Bai,
Ishaan Bhatt,
Sabri Can Cetindag
, et al. (55 additional authors not shown)
Abstract:
Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation which considers the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions (2D vs. 3D). A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of the ensemble models, as well as the need for further research to develop efficient methods for uncertainty quantification in 3D segmentation tasks.
Submitted 24 June, 2024; v1 submitted 19 March, 2024;
originally announced May 2024.
-
Towards Clinical AI Fairness: Filling Gaps in the Puzzle
Authors:
Mingxuan Liu,
Yilin Ning,
Salinelat Teixayavong,
Xiaoxuan Liu,
Mayli Mertens,
Yuqing Shang,
Xin Li,
Di Miao,
Jie Xu,
Daniel Shu Wei Ting,
Lionel Tim-Ee Cheng,
Jasmine Chiat Ling Ong,
Zhen Ling Teo,
Ting Fang Tan,
Narrendar RaviChandran,
Fei Wang,
Leo Anthony Celi,
Marcus Eng Hock Ong,
Nan Liu
Abstract:
The ethical integration of Artificial Intelligence (AI) in healthcare necessitates addressing fairness, a concept that is highly context-specific across medical fields. Extensive studies have been conducted to expand the technical components of AI fairness, while tremendous calls for AI fairness have been raised from healthcare. Despite this, a significant disconnect persists between technical advancements and their practical clinical applications, resulting in a lack of contextualized discussion of AI fairness in clinical settings. Through a detailed evidence gap analysis, our review systematically pinpoints several deficiencies concerning both healthcare data and the provided AI fairness solutions. We highlight the scarcity of research on AI fairness in many medical domains where AI technology is increasingly utilized. Additionally, our analysis highlights a substantial reliance on group fairness, aiming to ensure equality among demographic groups from a macro healthcare system perspective; in contrast, individual fairness, focusing on equity at a more granular level, is frequently overlooked. To bridge these gaps, our review advances actionable strategies for both the healthcare and AI research communities. Beyond applying existing AI fairness methods in healthcare, we further emphasize the importance of involving healthcare professionals to refine AI fairness concepts and methods to ensure contextually relevant and ethically sound AI applications in healthcare.
Submitted 28 May, 2024;
originally announced May 2024.
-
FedHPL: Efficient Heterogeneous Federated Learning with Prompt Tuning and Logit Distillation
Authors:
Yuting Ma,
Lechao Cheng,
Yaxiong Wang,
Zhun Zhong,
Xiaohua Xu,
Meng Wang
Abstract:
Federated learning (FL) is a popular privacy-preserving paradigm that enables distributed clients to collaboratively train models with a central server while keeping raw data locally. In practice, distinct model architectures, varying data distributions, and limited resources across local clients inevitably cause model performance degradation and a slowdown in convergence speed. However, existing FL methods can only solve some of the above heterogeneous challenges and have obvious performance limitations. Notably, a unified framework has not yet been explored to overcome these challenges. Accordingly, we propose FedHPL, a parameter-efficient unified $\textbf{Fed}$erated learning framework for $\textbf{H}$eterogeneous settings based on $\textbf{P}$rompt tuning and $\textbf{L}$ogit distillation. Specifically, we employ a local prompt tuning scheme that leverages a few learnable visual prompts to efficiently fine-tune the frozen pre-trained foundation model for downstream tasks, thereby accelerating training and improving model performance under limited local resources and data heterogeneity. Moreover, we design a global logit distillation scheme to handle the model heterogeneity and guide the local training. In detail, we leverage logits to implicitly capture local knowledge and design a weighted knowledge aggregation mechanism to generate global client-specific logits. We provide a theoretical guarantee on the generalization error bound for FedHPL. The experiments on various benchmark datasets under diverse settings of models and data demonstrate that our framework outperforms state-of-the-art FL approaches, with lower computation overhead and fewer training rounds.
Submitted 27 May, 2024;
originally announced May 2024.
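The abstract above describes aggregating client logits with weights and distilling the result back into local training, but it does not give the formulas. The sketch below is therefore a generic illustration of weighted logit averaging followed by a temperature-scaled distillation loss, not FedHPL's actual mechanism: the function names, the choice of weights, the temperature value, and the fact that a single shared logit vector is produced (rather than client-specific ones) are all assumptions.

```python
import math


def aggregate_logits(client_logits, client_weights):
    """Weighted average of per-client logit vectors (weights are normalised)."""
    total = sum(client_weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    aggregated = [0.0] * len(client_logits[0])
    for logits, weight in zip(client_logits, client_weights):
        for k, value in enumerate(logits):
            aggregated[k] += (weight / total) * value
    return aggregated


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


def distillation_loss(local_logits, global_logits, temperature=2.0):
    """KL(teacher || student), with the aggregated logits acting as teacher."""
    teacher = softmax(global_logits, temperature)
    student = softmax(local_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)


# Two hypothetical clients; the weights could be, e.g., local sample counts.
global_logits = aggregate_logits([[1.0, 0.0], [3.0, 2.0]], [1, 3])
print(global_logits)
print(distillation_loss([0.5, 0.5], global_logits))
```

Exchanging logits rather than parameters is what lets clients with different model architectures participate, since only the output dimension (the number of classes) has to match.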
-
TEII: Think, Explain, Interact and Iterate with Large Language Models to Solve Cross-lingual Emotion Detection
Authors:
Long Cheng,
Qihao Shao,
Christine Zhao,
Sheng Bi,
Gina-Anne Levow
Abstract:
Cross-lingual emotion detection allows us to analyze global trends, public opinion, and social phenomena at scale. We participated in the Explainability of Cross-lingual Emotion Detection (EXALT) shared task, achieving an F1-score of 0.6046 on the evaluation set for the emotion detection sub-task. Our system outperformed the baseline by more than 0.16 F1-score absolute, and ranked second amongst competing systems. We conducted experiments using fine-tuning, zero-shot learning, and few-shot learning for Large Language Model (LLM)-based models as well as embedding-based BiLSTM and KNN for non-LLM-based techniques. Additionally, we introduced two novel methods: the Multi-Iteration Agentic Workflow and the Multi-Binary-Classifier Agentic Workflow. We found that LLM-based approaches provided good performance on multilingual emotion detection. Furthermore, ensembles combining all our experimented models yielded higher F1-scores than any single approach alone.
Submitted 2 July, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
Assessing Empathy in Large Language Models with Real-World Physician-Patient Interactions
Authors:
Man Luo,
Christopher J. Warren,
Lu Cheng,
Haidar M. Abdul-Muhsin,
Imon Banerjee
Abstract:
The integration of Large Language Models (LLMs) into the healthcare domain has the potential to significantly enhance patient care and support through the development of empathetic, patient-facing chatbots. This study investigates an intriguing question: can ChatGPT respond with a greater degree of empathy than physicians typically offer? To answer this question, we collect a de-identified dataset of patient messages and physician responses from Mayo Clinic and generate alternative replies using ChatGPT. Our analyses incorporate a novel empathy ranking evaluation (EMRank) involving both automated metrics and human assessments to gauge the empathy level of responses. Our findings indicate that LLM-powered chatbots have the potential to surpass human physicians in delivering empathetic communication, suggesting a promising avenue for enhancing patient care and reducing professional burnout. The study not only highlights the importance of empathy in patient interactions but also proposes a set of effective automatic empathy ranking metrics, paving the way for the broader adoption of LLMs in healthcare.
Submitted 25 May, 2024;
originally announced May 2024.
-
Eidos: Efficient, Imperceptible Adversarial 3D Point Clouds
Authors:
Hanwei Zhang,
Luo Cheng,
Qisong He,
Wei Huang,
Renjue Li,
Ronan Sicre,
Xiaowei Huang,
Holger Hermanns,
Lijun Zhang
Abstract:
Classification of 3D point clouds is a challenging machine learning (ML) task with important real-world applications in a spectrum from autonomous driving and robot-assisted surgery to earth observation from low orbit. As with other ML tasks, classification models are notoriously brittle in the presence of adversarial attacks. These are rooted in imperceptible changes to inputs with the effect that a seemingly well-trained model ends up misclassifying the input. This paper adds to the understanding of adversarial attacks by presenting Eidos, a framework providing Efficient Imperceptible aDversarial attacks on 3D pOint cloudS. Eidos supports a diverse set of imperceptibility metrics. It employs an iterative, two-step procedure to identify optimal adversarial examples, thereby enabling a runtime-imperceptibility trade-off. We provide empirical evidence relative to several popular 3D point cloud classification models and several established 3D attack methods, showing Eidos' superiority with respect to efficiency as well as imperceptibility.
Submitted 23 May, 2024;
originally announced May 2024.
-
Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation
Authors:
Dingwen Zhang,
Hao Li,
Diqi He,
Nian Liu,
Lechao Cheng,
Jingdong Wang,
Junwei Han
Abstract:
In recent times, following the paradigm of DETR (DEtection TRansformer), query-based end-to-end instance segmentation (QEIS) methods have exhibited superior performance compared to CNN-based models, particularly when trained on large-scale datasets. Nevertheless, the effectiveness of these QEIS methods diminishes significantly when confronted with limited training data. This limitation arises from their reliance on substantial data volumes to effectively train the pivotal queries/kernels that are essential for acquiring localization and shape priors. To address this problem, we propose a novel method for unsupervised pre-training in low-data regimes. Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts (UPLVP), which improves QEIS models' instance segmentation by bringing language-vision prompts to queries/kernels. Our method consists of three parts: (1) Masks Proposal: Utilizes language-vision models to generate pseudo masks based on unlabeled images. (2) Prompt-Kernel Matching: Converts pseudo masks into prompts and injects the best-matched localization and shape features to their corresponding kernels. (3) Kernel Supervision: Formulates supervision for pre-training at the kernel level to ensure robust learning. With the help of our pre-training method, QEIS models can converge faster and perform better than CNN-based models in low-data regimes. Experimental evaluations conducted on MS COCO, Cityscapes, and CTW1500 datasets indicate that the QEIS models' performance can be significantly improved when pre-trained with our method. Code will be available at: https://github.com/lifuguan/UPLVP.
Submitted 22 May, 2024;
originally announced May 2024.
-
Understanding the ultraspherical spectral method
Authors:
Lu Cheng,
Kuan Xu
Abstract:
The ultraspherical spectral method features high accuracy and fast solution. In this article, we determine the sources of error arising from the ultraspherical spectral method and derive its effective condition number, which explains why its backward error is consistent with a numerical method with bounded condition number. In addition, we show the cause for the Cauchy error to go below the machine epsilon and decay eventually to exact zero, revealing the fact that the Cauchy error can be misleading when used as an indicator of convergence and accuracy. The analysis in this work can be readily extended to other spectral methods, when applicable, and to the solution of PDEs.
Submitted 20 May, 2024;
originally announced May 2024.
-
Uncertainty-Aware PPG-2-ECG for Enhanced Cardiovascular Diagnosis using Diffusion Models
Authors:
Omer Belhasin,
Idan Kligvasser,
George Leifman,
Regev Cohen,
Erin Rainaldi,
Li-Fang Cheng,
Nishant Verma,
Paul Varghese,
Ehud Rivlin,
Michael Elad
Abstract:
Analyzing the cardiovascular system condition via Electrocardiography (ECG) is a common and highly effective approach, and it has been practiced and perfected over many decades. ECG sensing is non-invasive and relatively easy to acquire, and yet it is still cumbersome for Holter monitoring tests that may span hours or even days. A possible alternative in this context is Photoplethysmography (PPG): an optically based signal that measures blood volume fluctuations, as typically sensed by conventional "wearable devices". While PPG presents clear advantages in acquisition, convenience, and cost-effectiveness, ECG provides more comprehensive information, allowing for a more precise detection of heart conditions. This implies that a conversion from PPG to ECG, as recently discussed in the literature, inherently involves an unavoidable level of uncertainty. In this paper we introduce a novel methodology for addressing the PPG-2-ECG conversion, and offer an enhanced classification of cardiovascular conditions using the given PPG, all while taking into account the uncertainties arising from the conversion process. We provide a mathematical justification for our proposed computational approach, and present empirical studies demonstrating its superior performance compared to state-of-the-art baseline methods.
Submitted 19 May, 2024;
originally announced May 2024.
-
Incorporating Physical Priors into Weakly-Supervised Anomaly Detection
Authors:
Chi Lung Cheng,
Gurpreet Singh,
Benjamin Nachman
Abstract:
We propose a new machine-learning-based anomaly detection strategy for comparing data with a background-only reference (a form of weak supervision). The sensitivity of previous strategies degrades significantly when the signal is too rare or there are many unhelpful features. Our Prior-Assisted Weak Supervision (PAWS) method incorporates information from a class of signal models in order to significantly enhance the search sensitivity of weakly supervised approaches. As long as the true signal is in the pre-specified class, PAWS matches the sensitivity of a dedicated, fully supervised method without specifying the exact parameters ahead of time. On the benchmark LHC Olympics anomaly detection dataset, our mix of semi-supervised and weakly supervised learning is able to extend the sensitivity over previous methods by a factor of 10 in cross section. Furthermore, if we add irrelevant (noise) dimensions to the inputs, classical methods degrade by another factor of 10 in cross section while PAWS remains insensitive to noise. This new approach could be applied in a number of scenarios and pushes the frontier of sensitivity between completely model-agnostic approaches and fully model-specific searches.
Submitted 14 May, 2024;
originally announced May 2024.
-
Self-Distillation Improves DNA Sequence Inference
Authors:
Tong Yu,
Lei Cheng,
Ruslan Khalitov,
Erland Brandser Olsson,
Zhirong Yang
Abstract:
Self-supervised pretraining (SSP) has been recognized as a method to enhance prediction accuracy in various downstream tasks. However, its efficacy for DNA sequences remains somewhat constrained. This limitation stems primarily from the fact that most existing SSP approaches in genomics focus on masked language modeling of individual sequences, neglecting the crucial aspect of encoding statistics across multiple sequences. To overcome this challenge, we introduce an innovative deep neural network model, which incorporates collaborative learning between a 'student' and a 'teacher' subnetwork. In this model, the student subnetwork employs masked learning on nucleotides and progressively adapts its parameters to the teacher subnetwork through an exponential moving average approach. Concurrently, both subnetworks engage in contrastive learning, deriving insights from two augmented representations of the input sequences. This self-distillation process enables our model to effectively assimilate both contextual information from individual sequences and distributional data across the sequence population. We validated our approach with preliminary pretraining using the human reference genome, followed by applying it to 20 downstream inference tasks. The empirical results from these experiments demonstrate that our novel method significantly boosts inference performance across the majority of these tasks. Our code is available at https://github.com/wiedersehne/FinDNA.
Submitted 14 May, 2024;
originally announced May 2024.
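The exponential moving average coupling between the student and teacher subnetworks mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not give the update rule, its direction, or the decay value, so everything here is an assumption: the sketch follows the common self-distillation convention in which the teacher weights track an EMA of the student weights, and the function name `ema_update` and the default `decay` are hypothetical, not taken from the FinDNA code.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float], decay: float = 0.99) -> List[float]:
    """Return EMA-blended weights: decay * teacher + (1 - decay) * student.

    Assumption: the teacher follows the student (a common convention);
    the abstract itself does not specify the direction or the decay.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of weights")
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    # Element-wise convex combination of the two weight vectors.
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


if __name__ == "__main__":
    teacher_w = [1.0, 0.0]
    student_w = [0.0, 1.0]
    # One hypothetical update step after the student has been trained on a batch.
    teacher_w = ema_update(teacher_w, student_w, decay=0.9)
    print(teacher_w)
```

A decay close to 1 makes the averaged weights change slowly, which is the usual reason for using an EMA as a stable target during self-distillation.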
-
AI-Cybersecurity Education Through Designing AI-based Cyberharassment Detection Lab
Authors:
Ebuka Okpala,
Nishant Vishwamitra,
Keyan Guo,
Song Liao,
Long Cheng,
Hongxin Hu,
Yongkai Wu,
Xiaohong Yuan,
Jeannette Wade,
Sajad Khorsandroo
Abstract:
Cyberharassment is a critical, socially relevant cybersecurity problem because of the adverse effects it can have on targeted groups or individuals. While progress has been made in understanding cyberharassment, its detection, attacks on artificial intelligence (AI) based cyberharassment systems, and the social problems in cyberharassment detectors, little has been done in designing experiential learning educational materials that engage students in this emerging social cybersecurity problem in the era of AI. Experiential learning opportunities are usually provided through capstone projects and engineering design courses in STEM programs such as computer science. While capstone projects are an excellent example of experiential learning, given the interdisciplinary nature of this emerging social cybersecurity problem, it can be challenging to use them to engage non-computing students without prior knowledge of AI. Because of this, we were motivated to develop a hands-on lab platform that provides experiential learning experiences to non-computing students with little or no background knowledge in AI, and we discuss the lessons learned in developing this lab. In this lab, used by social science students at North Carolina A&T State University across two semesters (spring and fall) in 2022, students are given a detailed lab manual and complete a set of well-detailed tasks. Through this process, students learn AI concepts and the application of AI for cyberharassment detection. Using pre- and post-surveys, we asked students to rate their knowledge or skills in AI and their understanding of the concepts learned. The results revealed that the students moderately understood the concepts of AI and cyberharassment.
Submitted 16 May, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
CTRL: Continuous-Time Representation Learning on Temporal Heterogeneous Information Network
Authors:
Chenglin Li,
Yuanzhen Xie,
Chenyun Yu,
Lei Cheng,
Bo Hu,
Zang Li,
Di Niu
Abstract:
Inductive representation learning on temporal heterogeneous graphs is crucial for scalable deep learning on heterogeneous information networks (HINs) which are time-varying, such as citation networks. However, most existing approaches are not inductive and thus cannot handle new nodes or edges. Moreover, previous temporal graph embedding methods are often trained with the temporal link prediction task to simulate the link formation process of temporal graphs, while ignoring the evolution of high-order topological structures on temporal graphs. To fill these gaps, we propose a Continuous-Time Representation Learning (CTRL) model on temporal HINs. To preserve heterogeneous node features and temporal structures, CTRL integrates three parts in a single layer: 1) a heterogeneous attention unit that measures the semantic correlation between nodes, 2) an edge-based Hawkes process to capture temporal influence between heterogeneous nodes, and 3) dynamic centrality that indicates the dynamic importance of a node. We train the CTRL model with a future event (a subgraph) prediction task to capture the evolution of the high-order network structure. Extensive experiments have been conducted on three benchmark datasets. The results demonstrate that our model significantly boosts performance and outperforms various state-of-the-art approaches. Ablation studies are conducted to demonstrate the effectiveness of the model design.
Submitted 10 May, 2024;
originally announced May 2024.
-
Resource-Efficient and Self-Adaptive Quantum Search in a Quantum-Classical Hybrid System
Authors:
Zihao Jiang,
Zefan Du,
Shaolun Ruan,
Juntao Chen,
Yong Wang,
Long Cheng,
Rajkumar Buyya,
Ying Mao
Abstract:
Over the past decade, the rapid advancement of deep learning and big data applications has been driven by vast datasets and high-performance computing systems. However, as we approach the physical limits of semiconductor fabrication in the post-Moore's Law era, questions arise about the future of these applications. In parallel, quantum computing has made significant progress with the potential to break limits. Major companies like IBM, Google, and Microsoft provide access to noisy intermediate-scale quantum (NISQ) computers. Despite the theoretical promise of Shor's and Grover's algorithms, practical implementation on current quantum devices faces challenges, such as demanding additional resources and a high number of controlled operations. To tackle these challenges and optimize the utilization of limited onboard qubits, we introduce ReSaQuS, a resource-efficient index-value searching system within a quantum-classical hybrid framework. Building on Grover's algorithm, ReSaQuS employs an automatically managed iterative search approach. This method analyzes problem size, filters out less probable data points, and progressively reduces the dataset with decreasing qubit requirements. Implemented using Qiskit and evaluated through extensive experiments, ReSaQuS has demonstrated a substantial reduction of up to 86.36% in cumulative qubit consumption and 72.72% in active periods, reinforcing its potential in optimizing quantum computing application deployment.
Submitted 7 May, 2024;
originally announced May 2024.
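For context on the Grover-based search named in the abstract above, the textbook iteration count and success probability can be computed classically. This is a sketch of standard Grover arithmetic only, not of ReSaQuS or its iterative reduction scheme; the function names `grover_iterations` and `success_probability` are hypothetical, and no Qiskit calls are used.

```python
import math


def grover_iterations(n_items: int, n_marked: int = 1) -> int:
    """Near-optimal number of Grover iterations for n_marked targets among n_items."""
    if not 0 < n_marked <= n_items:
        raise ValueError("need 0 < n_marked <= n_items")
    # theta is the rotation half-angle: sin(theta) = sqrt(M / N).
    theta = math.asin(math.sqrt(n_marked / n_items))
    return math.floor(math.pi / (4.0 * theta))


def success_probability(n_items: int, n_marked: int, iterations: int) -> float:
    """Probability of measuring a marked item after the given number of iterations."""
    if not 0 < n_marked <= n_items:
        raise ValueError("need 0 < n_marked <= n_items")
    theta = math.asin(math.sqrt(n_marked / n_items))
    # Each iteration rotates the state by 2 * theta toward the marked subspace.
    return math.sin((2 * iterations + 1) * theta) ** 2


if __name__ == "__main__":
    for n in (4, 16, 1024):
        k = grover_iterations(n)
        print(n, k, success_probability(n, 1, k))
```

The iteration count grows roughly as the square root of the search-space size, which suggests why shrinking the candidate set between rounds, as the abstract describes, would lower the qubit and circuit-depth requirements of later rounds.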
-
A new computational framework for spinor-based relativistic exact two-component calculations using contracted basis functions
Authors:
Chaoqun Zhang,
Kirk A. Peterson,
Kenneth G. Dyall,
Lan Cheng
Abstract:
A new computational framework for spinor-based relativistic exact two-component (X2C) calculations is developed using contracted basis sets with a spin-orbit contraction scheme. Generally contracted j-adapted basis sets using primitive functions in the correlation-consistent basis sets are constructed for the X2C Hamiltonian with atomic mean-field spin-orbit integrals (the X2CAMF scheme). The contraction coefficients are taken from atomic X2CAMF Hartree-Fock spinors, thereby following the simple concept of linear combination of atomic orbitals (LCAOs). Benchmark calculations of spin-orbit splittings, equilibrium bond lengths, and harmonic vibrational frequencies demonstrate the accuracy and efficacy of the j-adapted spin-orbit contraction scheme.
Submitted 7 May, 2024;
originally announced May 2024.
-
Physics-informed Data-driven Cavitation Model for a Specific MG EOS
Authors:
Minsheng Huang,
Chengbao Yao,
Pan Wang,
Lidong Cheng,
Wenjun Ying
Abstract:
We present a novel one-fluid cavitation model of a specific Mie-Grüneisen equation of state (EOS), named polynomial EOS, based on an artificial neural network. Not only the physics-informed equation but also the experimental data are embedded into the proposed model by an optimization problem. The physics-informed data-driven model provides the concerned pressure within the cavitation region, where the density tends to zero when the pressure falls below the saturated pressure. The present model is then applied to challenging compressible multi-phase flow simulations, such as nuclear and underwater explosions. Numerical simulations show that our model in application agrees well with the corresponding experimental data, ranging from one dimension to three dimensions with the $h$-adaptive mesh refinement algorithm and load balance techniques on structured and unstructured grids.
Submitted 5 April, 2024;
originally announced May 2024.
-
Towards Robust Recommendation: A Review and an Adversarial Robustness Evaluation Library
Authors:
Lei Cheng,
Xiaowen Huang,
Jitao Sang,
Jian Yu
Abstract:
Recently, recommender systems have achieved significant success. However, due to the openness of recommender systems, they remain vulnerable to malicious attacks. Additionally, natural noise in training data and issues such as data sparsity can also degrade the performance of recommender systems. Therefore, enhancing the robustness of recommender systems has become an increasingly important research topic. In this survey, we provide a comprehensive overview of the robustness of recommender systems. Based on our investigation, we categorize the robustness of recommender systems into adversarial robustness and non-adversarial robustness. For adversarial robustness, we introduce the fundamental principles and classical methods of recommender system adversarial attacks and defenses. For non-adversarial robustness, we analyze it from the perspectives of data sparsity, natural noise, and data imbalance. Additionally, we summarize commonly used datasets and evaluation metrics for evaluating the robustness of recommender systems. Finally, we discuss the current challenges in the field of recommender system robustness and potential future research directions. To facilitate fair and efficient evaluation of attack and defense methods in adversarial robustness, we propose an adversarial robustness evaluation library, ShillingREC, and we conduct evaluations of basic attack models and recommendation models. The ShillingREC project is released at https://github.com/chengleileilei/ShillingREC.
Submitted 27 April, 2024;
originally announced April 2024.
-
Joint calibration to SPX and VIX Derivative Markets with Composite Change of Time Models
Authors:
Liexin Cheng,
Xue Cheng,
Xianhua Peng
Abstract:
The Chicago Board Options Exchange Volatility Index (VIX) is calculated from SPX options, and derivatives of VIX are also traded in the market, which leads to the so-called "consistent modeling" problem. This paper proposes a time-changed Lévy model for log price with a composite change of time structure to capture both features of the implied SPX volatility and the implied volatility of volatility. Consistent modeling is achieved naturally via flexible choices of jumps and leverage effects, as well as the composition of time changes. Many celebrated models are covered as special cases. From this model, we derive an explicit form of the characteristic function for the asset price (SPX) and the pricing formula for European options as well as VIX options. The empirical results indicate great competence of the proposed model in the problem of joint calibration of the SPX/VIX markets.
Submitted 24 April, 2024;
originally announced April 2024.
-
A Multi-Faceted Evaluation Framework for Assessing Synthetic Data Generated by Large Language Models
Authors:
Yefeng Yuan,
Yuhong Liu,
Liang Cheng
Abstract:
The rapid advancements in generative AI and large language models (LLMs) have opened up new avenues for producing synthetic data, particularly in the realm of structured tabular formats, such as product reviews. Despite the potential benefits, concerns regarding privacy leakage have surfaced, especially when personal information is utilized in the training datasets. In addition, there is an absence of a comprehensive evaluation framework capable of quantitatively measuring the quality of the generated synthetic data and their utility for downstream tasks. In response to this gap, we introduce SynEval, an open-source evaluation framework designed to assess the fidelity, utility, and privacy preservation of synthetically generated tabular data via a suite of diverse evaluation metrics. We validate the efficacy of our proposed framework - SynEval - by applying it to synthetic product review data generated by three state-of-the-art LLMs: ChatGPT, Claude, and Llama. Our experimental findings illuminate the trade-offs between various evaluation metrics in the context of synthetic data generation. Furthermore, SynEval stands as a critical instrument for researchers and practitioners engaged with synthetic tabular data, empowering them to judiciously determine the suitability of the generated data for their specific applications, with an emphasis on upholding user privacy.
Submitted 20 April, 2024;
originally announced April 2024.
-
Angle-Resolved Magneto-Chiral Anisotropy in a Non-Centrosymmetric Atomic Layer Superlattice
Authors:
Long Cheng,
Mingrui Bao,
Jingxian Zhang,
Xue Zhang,
Qun Yang,
Qiang Li,
Hui Cao,
Dawei Qiu,
Jia Liu,
Fei Ye,
Qing Wang,
Genhao Liang,
Hui Li,
Guanglei Cheng,
Hua Zhou,
Jian-Min Zuo,
Xiaodong Zhou,
Jian Shen,
Zhifeng Zhu,
Sai Mu,
Wenbo Wang,
Xiaofang Zhai
Abstract:
Chirality in solid-state materials has sparked significant interest due to potential applications of topologically-protected chiral states in next-generation information technology. The electrical magneto-chiral effect (eMChE), arising from relativistic spin-orbit interactions, shows great promise for developing chiral materials and devices for electronic integration. Here we demonstrate an angle-resolved eMChE in an A-B-C-C type atomic-layer superlattice lacking time and space inversion symmetry. We observe non-superimposable enantiomers of left-handed and right-handed tilted uniaxial magnetic anisotropy as the sample rotates under static fields, with the tilting angle reaching a striking 45 degrees. Magnetic force microscopy and atomistic simulations correlate the tilt to the emergence and evolution of chiral spin textures. The Dzyaloshinskii-Moriya interaction lock effect in competition with the Zeeman effect is demonstrated to be responsible for the angle-resolved eMChE. Our findings open up a new horizon for engineering angle-resolved magneto-chiral anisotropy, shedding light on the development of novel angle-resolved sensing or writing techniques in chiral spintronics.
Submitted 20 April, 2024;
originally announced April 2024.
-
InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior
Authors:
Zhiheng Liu,
Hao Ouyang,
Qiuyu Wang,
Ka Leong Cheng,
Jie Xiao,
Kai Zhu,
Nan Xue,
Yu Liu,
Yujun Shen,
Yang Cao
Abstract:
3D Gaussians have recently emerged as an efficient representation for novel view synthesis. This work studies its editability with a particular focus on the inpainting task, which aims to supplement an incomplete set of 3D Gaussians with additional points for visually harmonious rendering. Compared to 2D inpainting, the crux of inpainting 3D Gaussians is to figure out the rendering-relevant properties of the introduced points, whose optimization largely benefits from their initial 3D positions. To this end, we propose to guide the point initialization with an image-conditioned depth completion model, which learns to directly restore the depth map based on the observed image. Such a design allows our model to fill in depth values at an aligned scale with the original depth, and also to harness strong generalizability from a large-scale diffusion prior. Thanks to the more accurate depth completion, our approach, dubbed InFusion, surpasses existing alternatives with sufficiently better fidelity and efficiency under various complex scenarios. We further demonstrate the effectiveness of InFusion with several practical applications, such as inpainting with user-specific texture or with novel object insertion.
Submitted 17 April, 2024;
originally announced April 2024.
-
LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field
Authors:
Jiyang Li,
Lechao Cheng,
Zhangye Wang,
Tingting Mu,
Jingxuan He
Abstract:
Cinemagraph is a unique form of visual media that combines elements of still photography and subtle motion to create a captivating experience. However, the majority of videos generated by recent works lack depth information and are confined to the constraints of 2D image space. In this paper, inspired by significant progress in the field of novel view synthesis (NVS) achieved by 3D Gaussian Splatting (3D-GS), we propose LoopGaussian to elevate cinemagraph from 2D image space to 3D space using 3D Gaussian modeling. To achieve this, we first employ the 3D-GS method to reconstruct 3D Gaussian point clouds from multi-view images of static scenes, incorporating shape regularization terms to prevent blurring or artifacts caused by object deformation. We then adopt an autoencoder tailored for 3D Gaussian to project it into feature space. To maintain the local continuity of the scene, we devise SuperGaussian for clustering based on the acquired features. By calculating the similarity between clusters and employing a two-stage estimation method, we derive an Eulerian motion field to describe velocities across the entire scene. The 3D Gaussian points then move within the estimated Eulerian motion field. Through bidirectional animation techniques, we ultimately generate a 3D Cinemagraph that exhibits natural and seamlessly loopable dynamics. Experimental results validate the effectiveness of our approach, demonstrating high-quality and visually appealing scene generation. The project is available at https://pokerlishao.github.io/LoopGaussian/.
Submitted 16 April, 2024; v1 submitted 13 April, 2024;
originally announced April 2024.
-
Problem-Driven Scenario Reduction Framework for Power System Stochastic Operation
Authors:
Yingrui Zhuang,
Lin Cheng,
Ning Qi,
Mads R. Almassalkhi,
Feng Liu
Abstract:
Scenario reduction (SR) aims to identify a small yet representative scenario set to depict the underlying uncertainty, which is critical to scenario-based stochastic optimization (SBSO) of power systems. Existing SR techniques commonly aim to achieve statistical approximation to the original scenario set. However, SR and SBSO are commonly treated as two distinct and decoupled processes, which cannot guarantee a superior approximation of the original optimality. Instead, this paper incorporates the SBSO problem structure into the SR process and introduces a novel problem-driven scenario reduction framework. Specifically, we transform the original scenario set in distribution space into the decision applicability between scenarios in problem space. Subsequently, the SR process, embedded by a distinctive problem-driven distance metric, is rendered as a mixed-integer linear programming formulation to obtain the representative scenario set while minimizing the optimality gap. Furthermore, ex-ante and ex-post problem-driven evaluation indices are proposed to evaluate the performance of SR. A two-stage stochastic economic dispatch problem with renewable generation and energy storage validates the effectiveness of the proposed framework. Numerical experiments demonstrate that the proposed framework significantly outperforms existing SR methods by identifying salient (e.g., worst-case) scenarios, and achieving an optimality gap of less than 0.1% within acceptable computation time.
Submitted 11 April, 2024;
originally announced April 2024.
-
3D Branch Point Cloud Completion for Robotic Pruning in Apple Orchards
Authors:
Tian Qiu,
Alan Zoubi,
Nikolai Spine,
Lailiang Cheng,
Yu Jiang
Abstract:
Robotic branch pruning is a rapidly growing research area to cope with the labor shortage in agriculture. One fundamental requirement in robotic pruning is the perception of detailed geometry and topology of branches. However, the point clouds obtained in agricultural settings often exhibit incompleteness due to several constraints, thereby restricting the accuracy of downstream robotic pruning. In this work, we addressed the issue of point cloud quality through a simulation-based deep neural network, leveraging a Real-to-Simulation (Real2Sim) data generation pipeline that not only eliminates the need for manual parameterization but also guarantees the realism of simulated data. The simulation-based neural network was applied to jointly perform point cloud completion and skeletonization on real-world partial branches, without additional real-world training. The Sim2Real qualitative completion and skeletonization results showed the model's remarkable capability for geometry reconstruction and topology prediction. Additionally, we quantitatively evaluated the Sim2Real performance by comparing branch-level trait characterization errors using raw incomplete data and complete data. The Mean Absolute Error (MAE) was reduced by 75% and 8% for branch diameter and branch angle estimation, respectively, using the best complete data, which indicates the effectiveness of the Real2Sim data in a zero-shot generalization setting. The characterization improvements contributed to the precision and efficacy of robotic branch pruning.
Submitted 8 April, 2024;
originally announced April 2024.