-
Towards stable training of parallel continual learning
Authors:
Li Yuepan,
Fan Lyu,
Yuyang Li,
Wei Feng,
Guangcan Liu,
Fanhua Shang
Abstract:
Parallel Continual Learning (PCL) tasks investigate the training methods for continual learning with multi-source input, where data from different tasks are learned as they arrive. PCL offers high training efficiency and is well-suited for complex multi-source data systems, such as autonomous vehicles equipped with multiple sensors. However, at any time, multiple tasks need to be trained simultaneously, leading to severe training instability in PCL. This instability manifests during both forward and backward propagation, where features are entangled and gradients conflict. This paper introduces Stable Parallel Continual Learning (SPCL), a novel approach that enhances the training stability of PCL for both forward and backward propagation. For the forward propagation, we apply Doubly-block Toeplitz (DBT) matrix-based orthogonality constraints to network parameters to ensure stable and consistent propagation. For the backward propagation, we employ orthogonal decomposition for gradient management, which stabilizes backpropagation and mitigates gradient conflicts across tasks. By optimizing gradients through orthogonality and a minimized condition number, SPCL effectively stabilizes gradient descent in complex optimization tasks. Experimental results demonstrate that SPCL outperforms state-of-the-art methods and achieves better training stability.
Submitted 11 July, 2024;
originally announced July 2024.
-
Cross-Lingual Word Alignment for ASEAN Languages with Contrastive Learning
Authors:
Jingshen Zhang,
Xinying Qiu,
Teng Shen,
Wenyu Wang,
Kailin Zhang,
Wenhe Feng
Abstract:
Cross-lingual word alignment plays a crucial role in various natural language processing tasks, particularly for low-resource languages. A recent study proposes a BiLSTM-based encoder-decoder model that outperforms pre-trained language models in low-resource settings. However, that model only considers the similarity of word embedding spaces and does not explicitly model the differences between word embeddings. To address this limitation, we propose incorporating contrastive learning into the BiLSTM-based encoder-decoder framework. Our approach introduces a multi-view negative sampling strategy to learn the differences between word pairs in the shared cross-lingual embedding space. We evaluate our model on five bilingual aligned datasets spanning four ASEAN languages: Lao, Vietnamese, Thai, and Indonesian. Experimental results demonstrate that integrating contrastive learning consistently improves word alignment accuracy across all datasets, confirming the effectiveness of the proposed method in low-resource scenarios. We will release our dataset and code to support future research on word alignment for ASEAN and other low-resource languages.
Submitted 6 July, 2024;
originally announced July 2024.
-
Rapid Mixing via Coupling Independence for Spin Systems with Unbounded Degree
Authors:
Xiaoyu Chen,
Weiming Feng
Abstract:
We develop a new framework to prove the mixing or relaxation time for the Glauber dynamics on spin systems with unbounded degree. It works for general spin systems including both $2$-spin and multi-spin systems. As applications of this approach:
$\bullet$ We prove the optimal $O(n)$ relaxation time for the Glauber dynamics of random $q$-list-coloring on an $n$-vertex triangle-free graph with maximum degree $Δ$ such that $q/Δ > α^\star$, where $α^\star \approx 1.763$ is the unique positive solution of the equation $α = \exp(1/α)$. This improves the $n^{1+o(1)}$ relaxation time for Glauber dynamics obtained by the previous work of Jain, Pham, and Vuong (2022). Besides, our framework can also give a near-linear time sampling algorithm under the same condition.
$\bullet$ We prove the optimal $O(n)$ relaxation time and near-optimal $\widetilde{O}(n)$ mixing time for the Glauber dynamics on hardcore models with parameter $λ$ in $\textit{balanced}$ bipartite graphs such that $λ < λ_c(Δ_L)$, where $Δ_L$ is the max degree of the left part and the max degree $Δ_R$ of the right part satisfies $Δ_R = O(Δ_L)$. This improves the previous result by Chen, Liu, and Yin (2023).
At the heart of our proof is the notion of $\textit{coupling independence}$ which allows us to consider multiple vertices as a huge single vertex with exponentially large domain and do a "coarse-grained" local-to-global argument on spin systems. The technique works for general (multi) spin systems and helps us obtain some new comparison results for Glauber dynamics.
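As a quick numerical check of the threshold constant quoted in the first bullet, the sketch below solves $α = \exp(1/α)$ by bisection. The helper name `alpha_star` and the bracketing interval $[1, 3]$ are illustrative choices, not taken from the paper.

```python
import math

def alpha_star(lo: float = 1.0, hi: float = 3.0, tol: float = 1e-12) -> float:
    """Solve a = exp(1/a) for a > 0 by bisection (illustrative helper)."""
    # f(a) = a - exp(1/a) is strictly increasing for a > 0,
    # negative at a = 1 and positive at a = 3, so the root is bracketed.
    f = lambda a: a - math.exp(1.0 / a)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(alpha_star(), 3))  # 1.763
```

Because the function is monotone on the positive reals, the bracketed root is the unique positive solution, matching the value stated in the abstract.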
Submitted 5 July, 2024;
originally announced July 2024.
-
Unsupervised 4D Cardiac Motion Tracking with Spatiotemporal Optical Flow Networks
Authors:
Long Teng,
Wei Feng,
Menglong Zhu,
Xinchao Li
Abstract:
Cardiac motion tracking from echocardiography can be used to estimate and quantify myocardial motion within a cardiac cycle. It is a cost-efficient and effective approach for assessing myocardial function. However, ultrasound imaging has the inherent characteristics of spatially low resolution and temporally random noise, which leads to difficulties in obtaining reliable annotation. Thus it is difficult to perform supervised learning for motion tracking. In addition, there is no end-to-end unsupervised method currently in the literature. This paper presents a motion tracking method where unsupervised optical flow networks are designed with spatial reconstruction loss and temporal-consistency loss. Our proposed loss functions make use of the pair-wise and temporal correlation to estimate cardiac motion from a noisy background. Experiments using a synthetic 4D echocardiography dataset have shown the effectiveness of our approach, and its superiority over existing methods in both accuracy and running speed. To the best of our knowledge, this is the first work that uses an unsupervised end-to-end deep learning optical flow network for 4D cardiac motion tracking.
Submitted 5 July, 2024;
originally announced July 2024.
-
Optimal Mixing for Randomly Sampling Edge Colorings on Trees Down to the Max Degree
Authors:
Charlie Carlson,
Xiaoyu Chen,
Weiming Feng,
Eric Vigoda
Abstract:
We address the convergence rate of Markov chains for randomly generating an edge coloring of a given tree. Our focus is on the Glauber dynamics, which updates the color at a randomly chosen edge in each step. For a tree $T$ with $n$ vertices and maximum degree $Δ$, when the number of colors $q$ satisfies $q \geq Δ+2$, we prove that the Glauber dynamics has an optimal relaxation time of $O(n)$, where the relaxation time is the inverse of the spectral gap. This is optimal in the range of $q$ in terms of $Δ$, as Dyer, Goldberg, and Jerrum (2006) showed that the relaxation time is $Ω(n^3)$ when $q=Δ+1$. For the case $q=Δ+1$, we show that an alternative Markov chain which updates a pair of neighboring edges has relaxation time $O(n)$. Moreover, for the $Δ$-regular complete tree we prove $O(n\log^2{n})$ mixing time bounds for the respective Markov chain. Our proofs establish approximate tensorization of variance via a novel inductive approach, where the base case is a tree of height $\ell=O(Δ^2\log^2Δ)$, which we analyze using a canonical paths argument.
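The chain described in this abstract can be sketched in a few lines. The version below is a minimal illustration, assuming the common heat-bath update rule (resample the chosen edge's color uniformly from the colors not used by adjacent edges); the abstract itself only states that one randomly chosen edge is updated per step. All function names, the example tree, and the step count are hypothetical.

```python
import random
from collections import defaultdict

def greedy_coloring(edges, q):
    """Greedy proper edge coloring; assumes edges are listed root-first
    (each edge appears after the edge leading to its parent)."""
    used = defaultdict(set)  # vertex -> colors on its incident edges
    coloring = {}
    for u, v in edges:
        free = [c for c in range(q) if c not in used[u] and c not in used[v]]
        if not free:
            raise ValueError("no free color: need more colors or a root-first order")
        coloring[(u, v)] = free[0]
        used[u].add(free[0])
        used[v].add(free[0])
    return coloring

def is_proper(coloring):
    """True if no two edges sharing a vertex have the same color."""
    seen = set()
    for (u, v), c in coloring.items():
        if (u, c) in seen or (v, c) in seen:
            return False
        seen.add((u, c))
        seen.add((v, c))
    return True

def glauber_step(coloring, q, rng):
    """One heat-bath update: pick a uniform edge, recolor it uniformly
    among the colors not used by any adjacent edge."""
    e = rng.choice(list(coloring))
    u, v = e
    blocked = {c for f, c in coloring.items() if f != e and (u in f or v in f)}
    allowed = [c for c in range(q) if c not in blocked]  # contains the current color
    coloring[e] = rng.choice(allowed)

# Example tree with maximum degree 3, using q = Δ + 2 = 5 colors.
edges = [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5)]
q = 5
rng = random.Random(0)
coloring = greedy_coloring(edges, q)
for _ in range(1000):
    glauber_step(coloring, q, rng)
print(is_proper(coloring))  # True
```

Each update keeps the coloring proper by construction, since the new color is drawn only from colors absent on adjacent edges; the sketch says nothing about mixing speed, which is the subject of the paper's bounds.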
Submitted 5 July, 2024;
originally announced July 2024.
-
The Nash-MTL-STCN for Prestack Three-Parameter Inversion
Authors:
Yingtian Liu,
Yong Li,
Huating Li,
Junheng Peng,
Zhangquan Liao,
Wen Feng
Abstract:
Deep learning (DL) techniques have been widely used in prestack three-parameter inversion to address its ill-posed problems. Among these DL techniques, multi-task learning (MTL) methods can simultaneously train multiple tasks, thereby enhancing model generalization and predictive performance. However, existing MTL methods typically adopt heuristic or non-heuristic approaches to jointly update the gradient of each task, leading to gradient conflicts between different tasks and reducing inversion accuracy. To address this issue, we propose a semi-supervised temporal convolutional network (STCN) based on Nash equilibrium (Nash-MTL-STCN). Firstly, temporal convolutional networks (TCNs) with non-causal convolution and convolutional neural networks (CNNs) are used as multi-task layers to extract the shared features from partial angle stack seismic data, with CNNs serving as the single-task layer. Subsequently, the feature mechanism is utilized to extract shared features in the multi-task layer through hierarchical processing, and the gradient combination of these shared features is treated as a Nash game for strategy optimization and joint updates. Ultimately, the overall utility of the three parameters is maximized, and gradient conflicts are alleviated. In addition, to enhance the network's generalization and stability, we have incorporated geophysical forward modeling and low-frequency models into the network. Experimental results demonstrate that the proposed method overcomes the gradient conflict issue of the conventional MTL methods with constant weights (CW) and achieves higher precision than four widely used non-heuristic MTL methods. Further field data experiments also validate the method's effectiveness.
Submitted 30 June, 2024;
originally announced July 2024.
-
Robust Multi-Robot Global Localization with Unknown Initial Pose based on Neighbor Constraints
Authors:
Yaojie Zhang,
Haowen Luo,
Weijun Wang,
Wei Feng
Abstract:
Multi-robot global localization (MR-GL) with unknown initial positions in a large-scale environment is a challenging task. The key point is the data association between different robots' viewpoints, which also makes traditional appearance-based localization methods unusable. Recently, researchers have utilized the object's semantic invariance to generate a semantic graph to address this issue. However, previous works lack robustness and are sensitive to the overlap rate of maps, resulting in unpredictable performance in real-world environments. In this paper, we propose a data association algorithm based on neighbor constraints to improve the robustness of the system. We demonstrate the effectiveness of our method on three different datasets, indicating a significant improvement in robustness compared to previous works.
Submitted 27 June, 2024;
originally announced June 2024.
-
MFDNet: Multi-Frequency Deflare Network for Efficient Nighttime Flare Removal
Authors:
Yiguo Jiang,
Xuhang Chen,
Chi-Man Pun,
Shuqiang Wang,
Wei Feng
Abstract:
When light is scattered or reflected accidentally in the lens, flare artifacts may appear in the captured photos, affecting the photos' visual quality. The main challenge in flare removal is to eliminate various flare artifacts while preserving the original content of the image. To address this challenge, we propose a lightweight Multi-Frequency Deflare Network (MFDNet) based on the Laplacian Pyramid. Our network decomposes the flare-corrupted image into low- and high-frequency bands, effectively separating the illumination and content information in the image. The low-frequency part typically contains illumination information, while the high-frequency part contains detailed content information. Accordingly, our MFDNet consists of two main modules: the Low-Frequency Flare Perception Module (LFFPM) to remove flare in the low-frequency part and the Hierarchical Fusion Reconstruction Module (HFRM) to reconstruct the flare-free image. Specifically, to perceive flare from a global perspective while retaining detailed information for image restoration, LFFPM utilizes a Transformer to extract global information while utilizing a convolutional neural network to capture detailed local features. Then HFRM gradually fuses the outputs of LFFPM with the high-frequency component of the image through feature aggregation. Moreover, our MFDNet can reduce the computational cost by processing in multiple frequency bands instead of directly removing the flare on the input image. Experimental results demonstrate that our approach outperforms state-of-the-art methods in removing nighttime flare on real-world and synthetic images from the Flare7K dataset. Furthermore, the computational complexity of our model is remarkably low.
Submitted 26 June, 2024;
originally announced June 2024.
-
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation
Authors:
Weixi Feng,
Jiachen Li,
Michael Saxon,
Tsu-jui Fu,
Wenhu Chen,
William Yang Wang
Abstract:
Video generation has many unique challenges beyond those of image generation. The temporal dimension introduces extensive possible variations across frames, over which consistency and continuity may be violated. In this study, we move beyond evaluating simple actions and argue that generated videos should incorporate the emergence of new concepts and their relation transitions like in real-world videos as time progresses. To assess the Temporal Compositionality of video generation models, we propose TC-Bench, a benchmark of meticulously crafted text prompts, corresponding ground truth videos, and robust evaluation metrics. The prompts articulate the initial and final states of scenes, effectively reducing ambiguities for frame development and simplifying the assessment of transition completion. In addition, by collecting aligned real-world videos corresponding to the prompts, we expand TC-Bench's applicability from text-conditional models to image-conditional ones that can perform generative frame interpolation. We also develop new metrics to measure the completeness of component transitions in generated videos, which demonstrate significantly higher correlations with human judgments than existing metrics. Our comprehensive experimental results reveal that most video generators achieve less than 20% of the compositional changes, highlighting enormous space for future improvement. Our analysis indicates that current video generation models struggle to interpret descriptions of compositional changes and synthesize various components across different time steps.
Submitted 12 June, 2024;
originally announced June 2024.
-
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Authors:
Xuehai He,
Weixi Feng,
Kaizhi Zheng,
Yujie Lu,
Wanrong Zhu,
Jiachen Li,
Yue Fan,
Jianfeng Wang,
Linjie Li,
Zhengyuan Yang,
Kevin Lin,
William Yang Wang,
Lijuan Wang,
Xin Eric Wang
Abstract:
Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multimodal video understanding. MMWorld distinguishes itself from previous video understanding benchmarks with two unique advantages: (1) multi-discipline, covering various disciplines that often require domain expertise for comprehensive understanding; (2) multi-faceted reasoning, including explanation, counterfactual thinking, future prediction, etc. MMWorld consists of a human-annotated dataset to evaluate MLLMs with questions about the whole videos and a synthetic dataset to analyze MLLMs within a single modality of perception. Together, MMWorld encompasses 1,910 videos across seven broad disciplines and 69 subdisciplines, complete with 6,627 question-answer pairs and associated captions. The evaluation includes 2 proprietary and 10 open-source MLLMs, which struggle on MMWorld (e.g., GPT-4V performs the best with only 52.3% accuracy), showing large room for improvement. Further ablation studies reveal other interesting findings such as models' different skill sets from humans. We hope MMWorld can serve as an essential step towards world model evaluation in videos.
Submitted 13 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Estimation of Global Building Stocks by 2070: Unlocking Renovation Potential
Authors:
Shufan Zhang,
Minda Ma,
Nan Zhou,
Jinyue Yan,
Wei Feng,
Ran Yan,
Kairui You,
Jingjing Zhang,
Jing Ke
Abstract:
Buildings produce one-third of carbon emissions globally; however, the absence of data on global floorspace poses challenges in advancing building carbon neutrality. We compile the measured building stocks for 14 major economies and apply our global building stock model, GLOBUS, to evaluate future trends in stock turnover. Based on a scenario not considering renovation, by 2070 the building stock in developed economies will be ~1.4 times that of 2020 (100 billion m2); in developing economies it is expected to be 2.2 times that of 2020 (313 billion m2). Based on a techno-economic potential scenario, however, stocks in developed economies will decline to approximately 0.8 times the 2020 level, while stocks in developing economies will increase to nearly twice the 2020 level because they currently have fewer buildings. Overall, GLOBUS provides a way of calculating the global building stock, helping scientists, engineers, and policymakers conduct a range of investigations across various future scenarios.
Submitted 6 June, 2024;
originally announced June 2024.
-
Readability-guided Idiom-aware Sentence Simplification (RISS) for Chinese
Authors:
Jingshen Zhang,
Xinglu Chen,
Xinying Qiu,
Zhimin Wang,
Wenhe Feng
Abstract:
Chinese sentence simplification faces challenges due to the lack of large-scale labeled parallel corpora and the prevalence of idioms. To address these challenges, we propose Readability-guided Idiom-aware Sentence Simplification (RISS), a novel framework that combines data augmentation techniques with lexical simplification. RISS introduces two key components: (1) Readability-guided Paraphrase Selection (RPS), a method for mining high-quality sentence pairs, and (2) Idiom-aware Simplification (IAS), a model that enhances the comprehension and simplification of idiomatic expressions. By integrating RPS and IAS using multi-stage and multi-task learning strategies, RISS outperforms previous state-of-the-art methods on two Chinese sentence simplification datasets. Furthermore, RISS achieves additional improvements when fine-tuned on a small labeled dataset. Our approach demonstrates the potential for more effective and accessible Chinese text simplification.
Submitted 5 June, 2024;
originally announced June 2024.
-
Altermagnetism: Exploring New Frontiers in Magnetism and Spintronics
Authors:
Ling Bai,
Wanxiang Feng,
Siyuan Liu,
Libor Šmejkal,
Yuriy Mokrousov,
Yugui Yao
Abstract:
Recent developments have introduced a groundbreaking form of collinear magnetism known as "altermagnetism". This emerging magnetic phase is characterized by robust time-reversal symmetry breaking, antiparallel magnetic order, and alternating spin-splitting band structures, yet it exhibits vanishing net magnetization constrained by symmetry. Altermagnetism uniquely integrates traits previously considered mutually exclusive to conventional collinear ferromagnetism and antiferromagnetism, thereby facilitating phenomena and functionalities previously not achievable within these traditional categories of magnetism. Initially proposed theoretically, the existence of the altermagnetic phase has since been corroborated by a range of experimental studies, which have confirmed its unique properties and potential for applications. This review explores the rapidly expanding research on altermagnets, emphasizing the novel physical phenomena they manifest, methodologies for inducing altermagnetism, and promising altermagnetic materials. The goal of this review is to furnish readers with a comprehensive overview of altermagnetism and to inspire further innovative studies on altermagnetic materials which could potentially revolutionize applications in technology and materials science.
Submitted 4 June, 2024;
originally announced June 2024.
-
BACON: Bayesian Optimal Condensation Framework for Dataset Distillation
Authors:
Zheng Zhou,
Hongbo Zhao,
Guangliang Cheng,
Xiangtai Li,
Shuchang Lyu,
Wenquan Feng,
Qi Zhao
Abstract:
Dataset Distillation (DD) aims to distill knowledge from extensive datasets into more compact ones while preserving performance on the test set, thereby reducing storage costs and training expenses. However, existing methods often suffer from computational intensity, particularly exhibiting suboptimal performance with large dataset sizes due to the lack of a robust theoretical framework for analyzing the DD problem. To address these challenges, we propose the BAyesian optimal CONdensation framework (BACON), which is the first work to introduce the Bayesian theoretical framework to the literature of DD. This framework provides theoretical support for enhancing the performance of DD. Furthermore, BACON formulates the DD problem as the minimization of the expected risk function in joint probability distributions using the Bayesian framework. Additionally, by analyzing the expected risk function for optimal condensation, we derive a numerically feasible lower bound based on specific assumptions, providing an approximate solution for BACON. We validate BACON across several datasets, demonstrating its superior performance compared to existing state-of-the-art methods. For instance, under the IPC-10 setting, BACON achieves a 3.46% accuracy gain over the IDM method on the CIFAR-10 dataset and a 3.10% gain on the TinyImageNet dataset. Our extensive experiments confirm the effectiveness of BACON and its seamless integration with existing methods, thereby enhancing their performance for the DD task. Code and distilled datasets are available at BACON.
Submitted 3 June, 2024;
originally announced June 2024.
-
PGA-SciRE: Harnessing LLM on Data Augmentation for Enhancing Scientific Relation Extraction
Authors:
Yang Zhou,
Shimin Shan,
Hongkui Wei,
Zhehuan Zhao,
Wenshuo Feng
Abstract:
Relation Extraction (RE) aims at recognizing the relation between pairs of entities mentioned in a text. Advances in LLMs have had a tremendous impact on NLP. In this work, we propose a textual data augmentation framework called PGA for improving the performance of models for RE in the scientific domain. The framework introduces two ways of data augmentation: utilizing an LLM to obtain pseudo-samples with the same sentence meaning but with different representations and forms by paraphrasing the original training set samples, and instructing the LLM to generate sentences that implicitly contain information about the corresponding labels based on the relation and entity of the original training set samples. Each of these two kinds of pseudo-samples participates in the training of the RE model together with the original dataset. In our experiments, the PGA framework improves the F1 scores of three mainstream models for RE within the scientific domain. Also, using an LLM to obtain samples can effectively reduce the cost of manually labeling data.
Submitted 30 May, 2024;
originally announced May 2024.
-
Freeze-in Dark Matter Explanation of the Galactic 511 keV Signal
Authors:
Wan-Zhe Feng,
Zi-Hui Zhang
Abstract:
The galactic 511 keV photon signal can be fully explained by decaying dark matter generated through the freeze-in mechanism. The explanation of the 511 keV signal requires an extremely tiny coupling between the decaying dark matter and the $e^+e^-$ pair, and thus the dark matter cannot be generated via direct freeze-in from standard model particles. We construct models involving two $U(1)$ hidden sectors, one of which couples directly to the standard model, while the other couples directly to the first hidden sector and only indirectly to the standard model. The decaying dark photon dark matter, which explains the 511 keV signal, is generated via a two-step freeze-in process. In the models we study, the freeze-in mechanism generates the entire dark matter relic density, and thus additional dark matter components produced from other sources are unnecessary. The two-$U(1)$ model remains a strong candidate for explaining the 511 keV signal consistent with various dark matter density profiles.
Submitted 29 May, 2024;
originally announced May 2024.
-
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback
Authors:
Jiachen Li,
Weixi Feng,
Tsu-Jui Fu,
Xinyi Wang,
Sugato Basu,
Wenhu Chen,
William Yang Wang
Abstract:
Diffusion-based text-to-video (T2V) models have achieved significant success but continue to be hampered by the slow sampling speed of their iterative sampling processes. To address the challenge, consistency models have been proposed to facilitate fast inference, albeit at the cost of sample quality. In this work, we aim to break the quality bottleneck of a video consistency model (VCM) to achieve $\textbf{both fast and high-quality video generation}$. We introduce T2V-Turbo, which integrates feedback from a mixture of differentiable reward models into the consistency distillation (CD) process of a pre-trained T2V model. Notably, we directly optimize rewards associated with single-step generations that arise naturally from computing the CD loss, effectively bypassing the memory constraints imposed by backpropagating gradients through an iterative sampling process. Remarkably, the 4-step generations from our T2V-Turbo achieve the highest total score on VBench, even surpassing Gen-2 and Pika. We further conduct human evaluations to corroborate the results, validating that the 4-step generations from our T2V-Turbo are preferred over the 50-step DDIM samples from their teacher models, representing more than a tenfold acceleration while improving video generation quality.
Submitted 29 May, 2024;
originally announced May 2024.
-
Data-Free Federated Class Incremental Learning with Diffusion-Based Generative Memory
Authors:
Naibo Wang,
Yuchen Deng,
Wenjie Feng,
Jianwei Yin,
See-Kiong Ng
Abstract:
Federated Class Incremental Learning (FCIL) is a critical yet largely underexplored issue that deals with the dynamic incorporation of new classes within federated learning (FL). Existing methods often employ generative adversarial networks (GANs) to produce synthetic images to address privacy concerns in FL. However, GANs exhibit inherent instability and high sensitivity, compromising the effectiveness of these methods. In this paper, we introduce a novel data-free federated class incremental learning framework with diffusion-based generative memory (DFedDGM) to mitigate catastrophic forgetting by generating stable, high-quality images through diffusion models. We design a new balanced sampler to help train the diffusion models to alleviate the common non-IID problem in FL, and introduce an entropy-based sample filtering technique from an information theory perspective to enhance the quality of generative samples. Finally, we integrate knowledge distillation with a feature-based regularization term for better knowledge transfer. Our framework does not incur additional communication costs compared to the baseline FedAvg method. Extensive experiments across multiple datasets demonstrate that our method significantly outperforms existing baselines, e.g., over a 4% improvement in average accuracy on the Tiny-ImageNet dataset.
Submitted 22 May, 2024;
originally announced May 2024.
-
Controllable Continual Test-Time Adaptation
Authors:
Ziqi Shi,
Fan Lyu,
Ye Liu,
Fanhua Shang,
Fuyuan Hu,
Wei Feng,
Zhang Zhang,
Liang Wang
Abstract:
Continual Test-Time Adaptation (CTTA) is an emerging and challenging task where a model trained in a source domain must adapt to continuously changing conditions during testing, without access to the original source data. CTTA is prone to error accumulation due to uncontrollable domain shifts, leading to blurred decision boundaries between categories. Existing CTTA methods primarily focus on suppressing domain shifts, which proves inadequate during the unsupervised test phase. In contrast, we introduce a novel approach that guides rather than suppresses these shifts. Specifically, we propose $\textbf{C}$ontrollable $\textbf{Co}$ntinual $\textbf{T}$est-$\textbf{T}$ime $\textbf{A}$daptation (C-CoTTA), which explicitly prevents any single category from encroaching on others, thereby mitigating the mutual influence between categories caused by uncontrollable shifts. Moreover, our method reduces the sensitivity of the model to domain transformations, thereby minimizing the magnitude of category shifts. Extensive quantitative experiments demonstrate the effectiveness of our method, while qualitative analyses, such as t-SNE plots, confirm the theoretical validity of our approach.
Submitted 28 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Edge Information Hub-Empowered 6G NTN: Latency-Oriented Resource Orchestration and Configuration
Authors:
Yueshan Lin,
Wei Feng,
Yunfei Chen,
Ning Ge,
Zhiyong Feng,
Yue Gao
Abstract:
Quick response to disasters is crucial for saving lives and reducing loss. This requires low-latency uploading of situation information to the remote command center. Since terrestrial infrastructures are often damaged in disaster areas, non-terrestrial networks (NTNs) are preferable to provide network coverage, and mobile edge computing (MEC) could be integrated to improve the latency performance. Nevertheless, the communications and computing in MEC-enabled NTNs are strongly coupled, which complicates the system design. In this paper, an edge information hub (EIH) that incorporates communication, computing and storage capabilities is proposed to synergize communication and computing and enable systematic design. We first address the joint data scheduling and resource orchestration problem to minimize the latency for uploading sensing data. The problem is solved using an optimal resource orchestration algorithm. On that basis, we propose the principles for resource configuration of the EIH considering payload constraints on size, weight and energy supply. Simulation results demonstrate the superiority of our proposed scheme in reducing the overall upload latency, thus enabling quick emergency rescue.
Submitted 21 May, 2024;
originally announced May 2024.
-
AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA
Authors:
Weitao Feng,
Wenbo Zhou,
Jiyan He,
Jie Zhang,
Tianyi Wei,
Guanlin Li,
Tianwei Zhang,
Weiming Zhang,
Nenghai Yu
Abstract:
Diffusion models have achieved remarkable success in generating high-quality images. Recently, the open-source models represented by Stable Diffusion (SD) are thriving and are accessible for customization, giving rise to a vibrant community of creators and enthusiasts. However, the widespread availability of customized SD models has led to copyright concerns, like unauthorized model distribution and unconsented commercial use. To address this, recent works aim to let SD models output watermarked content for post-hoc forensics. Unfortunately, none of them can achieve the challenging white-box protection, wherein the malicious user can easily remove or replace the watermarking module to fail the subsequent verification. To this end, we propose AquaLoRA as the first implementation under this scenario. Briefly, we merge watermark information into the U-Net of Stable Diffusion Models via a watermark Low-Rank Adaptation (LoRA) module in a two-stage manner. For the watermark LoRA module, we devise a scaling matrix to achieve flexible message updates without retraining. To guarantee fidelity, we design Prior Preserving Fine-Tuning (PPFT) to ensure watermark learning with minimal impacts on model distribution, validated by proofs. Finally, we conduct extensive experiments and ablation studies to verify our design.
Submitted 17 May, 2024;
originally announced May 2024.
-
Overcoming Domain Drift in Online Continual Learning
Authors:
Fan Lyu,
Daofeng Liu,
Linglan Zhao,
Zhang Zhang,
Fanhua Shang,
Fuyuan Hu,
Wei Feng,
Liang Wang
Abstract:
Online Continual Learning (OCL) empowers machine learning models to acquire new knowledge online across a sequence of tasks. However, OCL faces a significant challenge: catastrophic forgetting, wherein the model learned in previous tasks is substantially overwritten upon encountering new tasks, leading to a biased forgetting of prior knowledge. Moreover, the continual domain drift in sequential learning tasks may entail the gradual displacement of the decision boundaries in the learned feature space, rendering the learned knowledge susceptible to forgetting. To address the above problem, in this paper, we propose a novel rehearsal strategy, termed Drift-Reducing Rehearsal (DRR), to anchor the domain of old tasks and reduce the negative transfer effects. First, we propose to select memory for more representative samples guided by constructed centroids in a data stream. Then, to keep the model from domain chaos in drifting, a two-level angular cross-task Contrastive Margin Loss (CML) is proposed, to encourage the intra-class and intra-task compactness, and increase the inter-class and inter-task discrepancy. Finally, to further suppress the continual domain drift, we present an optional Centroid Distillation Loss (CDL) on the rehearsal memory to anchor the knowledge in feature space for each previous old task. Extensive experimental results on four benchmark datasets validate that the proposed DRR can effectively mitigate the continual domain drift and achieve state-of-the-art (SOTA) performance in OCL.
Submitted 15 May, 2024;
originally announced May 2024.
-
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank
Authors:
Alexander Scarlatos,
Wanyong Feng,
Digory Smith,
Simon Woodhead,
Andrew Lan
Abstract:
Multiple-choice questions (MCQs) are commonly used across all levels of math education since they can be deployed and graded at a large scale. A critical component of MCQs is the distractors, i.e., incorrect answers crafted to reflect student errors or misconceptions. Automatically generating them in math MCQs, e.g., with large language models, has been challenging. In this work, we propose a novel method to enhance the quality of generated distractors through overgenerate-and-rank, training a ranking model to predict how likely distractors are to be selected by real students. Experimental results on a real-world dataset and human evaluation with math teachers show that our ranking model increases alignment with human-authored distractors, although human-authored ones are still preferred over generated ones.
Submitted 13 May, 2024; v1 submitted 18 April, 2024;
originally announced May 2024.
-
Discrete nonlinear Schrödinger type equations: Solutions and continuum limits
Authors:
Song-lin Zhao,
Xiao-hui Feng,
Wei Feng
Abstract:
As local and nonlocal reductions of a discrete second-order Ablowitz-Kaup-Newell-Segur equation, two discrete nonlinear Schrödinger type equations are considered. Through the bilinearization reduction method, we construct double Casoratian solutions of the reduced discrete nonlinear Schrödinger type equations, including soliton solutions and Jordan-block solutions. Dynamics of the obtained one-soliton and two-soliton solutions are analyzed and illustrated. Moreover, both the semi-continuous limit and the full continuous limit are applied to obtain solutions of the local and nonlocal semi-discrete nonlinear Schrödinger type equations, as well as the local and nonlocal continuous nonlinear Schrödinger type equations.
Submitted 22 April, 2024;
originally announced April 2024.
-
Accelerating Image Generation with Sub-path Linear Approximation Model
Authors:
Chen Xu,
Tianhui Song,
Weixin Feng,
Xubin Li,
Tiezheng Ge,
Bo Zheng,
Limin Wang
Abstract:
Diffusion models have significantly advanced the state of the art in image, audio, and video generation tasks. However, their applications in practical scenarios are hindered by slow inference speed. Drawing inspiration from the approximation strategies utilized in consistency models, we propose the Sub-path Linear Approximation Model (SLAM), which accelerates diffusion models while maintaining high-quality image generation. SLAM treats the PF-ODE trajectory as a series of PF-ODE sub-paths divided by sampled points, and harnesses sub-path linear (SL) ODEs to form a progressive and continuous error estimation along each individual PF-ODE sub-path. The optimization on such SL-ODEs allows SLAM to construct denoising mappings with smaller cumulative approximated errors. An efficient distillation method is also developed to facilitate the incorporation of more advanced diffusion models, such as latent diffusion models. Our extensive experimental results demonstrate that SLAM achieves an efficient training regimen, requiring only 6 A100 GPU days to produce a high-quality generative model capable of 2 to 4-step generation with high performance. Comprehensive evaluations on LAION, MS COCO 2014, and MS COCO 2017 datasets also illustrate that SLAM surpasses existing acceleration methods in few-step generation tasks, achieving state-of-the-art performance both on FID and the quality of the generated images.
Submitted 22 April, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
In-situ process monitoring and adaptive quality enhancement in laser additive manufacturing: a critical review
Authors:
Lequn Chen,
Guijun Bi,
Xiling Yao,
Jinlong Su,
Chaolin Tan,
Wenhe Feng,
Michalis Benakis,
Youxiang Chew,
Seung Ki Moon
Abstract:
Laser Additive Manufacturing (LAM) presents unparalleled opportunities for fabricating complex, high-performance structures and components with unique material properties. Despite these advancements, achieving consistent part quality and process repeatability remains challenging. This paper provides a comprehensive review of various state-of-the-art in-situ process monitoring techniques, including optical-based monitoring, acoustic-based sensing, laser line scanning, and operando X-ray monitoring. These techniques are evaluated for their capabilities and limitations in detecting defects within Laser Powder Bed Fusion (LPBF) and Laser Directed Energy Deposition (LDED) processes. Furthermore, the review discusses emerging multisensor monitoring and machine learning (ML)-assisted defect detection methods, benchmarking ML models tailored for in-situ defect detection. The paper also discusses in-situ adaptive defect remediation strategies that advance LAM towards zero-defect autonomous operations, focusing on real-time closed-loop feedback control and defect correction methods. Research gaps such as the need for standardization, improved reliability and sensitivity, and decision-making strategies beyond early stopping are highlighted. Future directions are proposed, with an emphasis on multimodal sensor fusion for multiscale defect prediction and fault diagnosis, ultimately enabling self-adaptation in LAM processes. This paper aims to equip researchers and industry professionals with a holistic understanding of the current capabilities, limitations, and future directions in in-situ process monitoring and adaptive quality enhancement in LAM.
Submitted 21 April, 2024;
originally announced April 2024.
-
Efflex: Efficient and Flexible Pipeline for Spatio-Temporal Trajectory Graph Modeling and Representation Learning
Authors:
Ming Cheng,
Ziyi Zhou,
Bowen Zhang,
Ziyu Wang,
Jiaqi Gan,
Ziang Ren,
Weiqi Feng,
Yi Lyu,
Hefan Zhang,
Xingjian Diao
Abstract:
In the landscape of spatio-temporal data analytics, effective trajectory representation learning is paramount. To bridge the gap of learning accurate representations with efficient and flexible mechanisms, we introduce Efflex, a comprehensive pipeline for transformative graph modeling and representation learning of large-volume spatio-temporal trajectories. Efflex pioneers the incorporation of a multi-scale k-nearest neighbors (KNN) algorithm with feature fusion for graph construction, marking a leap in dimensionality reduction techniques by preserving essential data features. Moreover, the groundbreaking graph construction mechanism and the high-performance lightweight GCN increase embedding extraction speed by up to 36 times. We further offer Efflex in two versions, Efflex-L for scenarios demanding high accuracy, and Efflex-B for environments requiring swift data processing. Comprehensive experimentation with the Porto and Geolife datasets validates our approach, positioning Efflex as the state-of-the-art in the domain. Such enhancements in speed and accuracy highlight the versatility of Efflex, underscoring its wide-ranging potential for deployment in time-sensitive and computationally constrained applications.
Submitted 15 April, 2024;
originally announced April 2024.
-
One-Shot Sequential Federated Learning for Non-IID Data by Enhancing Local Model Diversity
Authors:
Naibo Wang,
Yuchen Deng,
Wenjie Feng,
Shichen Fan,
Jianwei Yin,
See-Kiong Ng
Abstract:
Traditional federated learning mainly focuses on parallel settings (PFL), which can suffer from significant communication and computation costs. In contrast, one-shot and sequential federated learning (SFL) have emerged as innovative paradigms to alleviate these costs. However, the issue of non-IID (non-independent and identically distributed) data persists as a significant challenge in one-shot and SFL settings, exacerbated by the restricted communication between clients. In this paper, we improve one-shot sequential federated learning for non-IID data by proposing a local model diversity-enhancing strategy. Specifically, to leverage the potential of local model diversity for improving model performance, we introduce a local model pool for each client that comprises diverse models generated during local training, and propose two distance measurements to further enhance the model diversity and mitigate the effect of non-IID data. Consequently, our proposed framework can improve the global model performance while maintaining low communication costs. Extensive experiments demonstrate that our method exhibits superior performance to existing one-shot PFL methods and achieves better accuracy compared with state-of-the-art one-shot SFL methods on both label-skew and domain-shift tasks (e.g., 6%+ accuracy improvement on the CIFAR-10 dataset).
Submitted 18 April, 2024;
originally announced April 2024.
-
Accelerating Geo-distributed Machine Learning with Network-Aware Adaptive Tree and Auxiliary Route
Authors:
Zonghang Li,
Wenjiao Feng,
Weibo Cai,
Hongfang Yu,
Long Luo,
Gang Sun,
Hongyang Du,
Dusit Niyato
Abstract:
Distributed machine learning is becoming increasingly popular for geo-distributed data analytics, facilitating the collaborative analysis of data scattered across data centers in different regions. This paradigm eliminates the need for centralizing sensitive raw data in one location but faces the significant challenge of high parameter synchronization delays, which stem from the constraints of bandwidth-limited, heterogeneous, and fluctuating wide-area networks. Prior research has focused on optimizing the synchronization topology, evolving from starlike to tree-based structures. However, these solutions typically depend on regular tree structures and lack an adequate topology metric, resulting in limited improvements. This paper proposes NetStorm, an adaptive and highly efficient communication scheduler designed to speed up parameter synchronization across geo-distributed data centers. First, it establishes an effective metric for optimizing a multi-root FAPT synchronization topology. Second, a network awareness module is developed to acquire network knowledge, aiding in topology decisions. Third, a multipath auxiliary transmission mechanism is introduced to enhance network awareness and facilitate multipath transmissions. Lastly, we design policy consistency protocols to guarantee seamless updates of transmission policies. Empirical results demonstrate that NetStorm significantly outperforms distributed training systems like MXNET, MLNET, and TSEngine, with a speedup of 6.5 to 9.2 times over MXNET.
Submitted 17 April, 2024;
originally announced April 2024.
-
CorrNet+: Sign Language Recognition and Translation via Spatial-Temporal Correlation
Authors:
Lianyu Hu,
Wei Feng,
Liqing Gao,
Zekang Liu,
Liang Wan
Abstract:
In sign language, the conveyance of human body trajectories predominantly relies upon the coordinated movements of hands and facial expressions across successive frames. Despite the recent advancements of sign language understanding methods, they often solely focus on individual frames, inevitably overlooking the inter-frame correlations that are essential for effectively modeling human body trajectories. To address this limitation, this paper introduces a spatial-temporal correlation network, denoted as CorrNet+, which explicitly identifies body trajectories across multiple frames. Specifically, CorrNet+ employs a correlation module and an identification module to build human body trajectories. A temporal attention module then adaptively evaluates the contributions of different frames. The resultant features offer a holistic perspective on human body movements, facilitating a deeper understanding of sign language. As a unified model, CorrNet+ achieves new state-of-the-art performance on two extensive sign language understanding tasks, including continuous sign language recognition (CSLR) and sign language translation (SLT). Notably, CorrNet+ surpasses previous methods equipped with resource-intensive pose-estimation networks or pre-extracted heatmaps for hand and facial feature extraction. Compared with CorrNet, CorrNet+ achieves a significant performance boost across all benchmarks while halving the computational overhead. A comprehensive comparison with previous spatial-temporal reasoning methods verifies the superiority of CorrNet+. Code is available at https://github.com/hulianyuyy/CorrNet_Plus.
Submitted 17 April, 2024;
originally announced April 2024.
-
Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics
Authors:
Haosong Peng,
Wei Feng,
Hao Li,
Yufeng Zhan,
Qihua Zhou,
Yuanqing Xia
Abstract:
The advent of edge computing has made real-time intelligent video analytics feasible. Previous works, based on traditional model architecture (e.g., CNN, RNN, etc.), employ various strategies to filter out non-region-of-interest content to minimize bandwidth and computation consumption but show inferior performance in adverse environments. Recently, visual foundation models based on transformers have shown great performance in adverse environments due to their amazing generalization capability. However, they require a large amount of computation power, which limits their applications in real-time intelligent video analytics. In this paper, we find visual foundation models like Vision Transformer (ViT) also have a dedicated acceleration mechanism for video analytics. To this end, we introduce Arena, an end-to-end edge-assisted video inference acceleration system based on ViT. We leverage the capability of ViT that can be accelerated through token pruning by only offloading and feeding Patches-of-Interest (PoIs) to the downstream models. Additionally, we employ probability-based patch sampling, which provides a simple but efficient mechanism for determining PoIs where the probable locations of objects are in subsequent frames. Through extensive evaluations on public datasets, our findings reveal that Arena can boost inference speeds by up to $1.58\times$ and $1.82\times$ on average while consuming only 54% and 34% of the bandwidth, respectively, all with high inference accuracy.
Submitted 14 April, 2024;
originally announced April 2024.
-
CATP: Cross-Attention Token Pruning for Accuracy Preserved Multimodal Model Inference
Authors:
Ruqi Liao,
Chuqing Zhao,
Jin Li,
Weiqi Feng
Abstract:
In response to the rising interest in large multimodal models, we introduce Cross-Attention Token Pruning (CATP), a precision-focused token pruning method. Our approach leverages cross-attention layers in multimodal models, exemplified by BLIP-2, to extract valuable information for token importance determination. CATP employs a refined voting strategy across model heads and layers. In evaluations, CATP achieves up to 12.1X higher accuracy compared to existing token pruning methods, addressing the trade-off between computational efficiency and model precision.
Submitted 2 April, 2024;
originally announced April 2024.
-
Improving Continuous Sign Language Recognition with Adapted Image Models
Authors:
Lianyu Hu,
Tongkai Shi,
Liqing Gao,
Zekang Liu,
Wei Feng
Abstract:
The increase of web-scale weakly labelled image-text pairs has greatly facilitated the development of large-scale vision-language models (e.g., CLIP), which have shown impressive generalization performance over a series of downstream tasks. However, the massive model size and scarcity of available data limit their applications to fine-tune the whole model in downstream tasks. Besides, fully fine-tuning the model easily forgets the generic essential knowledge acquired in the pretraining stage and overfits the downstream data. To enable high efficiency when adapting these large vision-language models (e.g., CLIP) to perform continuous sign language recognition (CSLR) while preserving their generalizability, we propose a novel strategy (AdaptSign). Specifically, CLIP is adopted as the visual backbone to extract frame-wise features, with its parameters fixed, and a set of learnable modules is introduced to model spatial sign variations or capture temporal sign movements. The introduced additional modules are quite lightweight, adding only 3.2% extra computations. The generic knowledge acquired in the pretraining stage is well-preserved in the frozen CLIP backbone in this process. Extensive experiments show that despite being efficient, AdaptSign is able to demonstrate superior performance across a series of CSLR benchmarks including PHOENIX14, PHOENIX14-T, CSL-Daily and CSL compared to existing methods. Visualizations show that AdaptSign could learn to dynamically pay major attention to the informative spatial regions and cross-frame trajectories in sign videos.
Submitted 11 April, 2024;
originally announced April 2024.
-
VeTraSS: Vehicle Trajectory Similarity Search Through Graph Modeling and Representation Learning
Authors:
Ming Cheng,
Bowen Zhang,
Ziyu Wang,
Ziyi Zhou,
Weiqi Feng,
Yi Lyu,
Xingjian Diao
Abstract:
Trajectory similarity search plays an essential role in autonomous driving, as it enables vehicles to analyze the information and characteristics of different trajectories to make informed decisions and navigate safely in dynamic environments. Existing work on the trajectory similarity search task primarily utilizes sequence-processing algorithms or Recurrent Neural Networks (RNNs), which suffer from the inevitable issues of complicated architecture and heavy training costs. Considering the intricate connections between trajectories, using Graph Neural Networks (GNNs) for data modeling is feasible. However, most methods directly use existing mathematical graph structures as the input instead of constructing specific graphs from certain vehicle trajectory data. This ignores such data's unique and dynamic characteristics. To bridge such a research gap, we propose VeTraSS -- an end-to-end pipeline for Vehicle Trajectory Similarity Search. Specifically, VeTraSS models the original trajectory data into multi-scale graphs, and generates comprehensive embeddings through a novel multi-layer attention-based GNN. The learned embeddings can be used for searching similar vehicle trajectories. Extensive experiments on the Porto and Geolife datasets demonstrate the effectiveness of VeTraSS, where our model outperforms existing work and reaches the state-of-the-art. This demonstrates the potential of VeTraSS for trajectory analysis and safe navigation in self-driving vehicles in the real world.
Submitted 11 April, 2024;
originally announced April 2024.
-
LRR: Language-Driven Resamplable Continuous Representation against Adversarial Tracking Attacks
Authors:
Jianlang Chen,
Xuhong Ren,
Qing Guo,
Felix Juefei-Xu,
Di Lin,
Wei Feng,
Lei Ma,
Jianjun Zhao
Abstract:
Visual object tracking plays a critical role in visual-based autonomous systems, as it aims to estimate the position and size of the object of interest within a live video. Despite significant progress made in this field, state-of-the-art (SOTA) trackers often fail when faced with adversarial perturbations in the incoming frames. This can lead to significant robustness and security issues when these trackers are deployed in the real world. To achieve high accuracy on both clean and adversarial data, we propose building a spatial-temporal continuous representation using the semantic text guidance of the object of interest. This novel continuous representation enables us to reconstruct incoming frames to maintain semantic and appearance consistency with the object of interest and its clean counterparts. As a result, our proposed method successfully defends against different SOTA adversarial tracking attacks while maintaining high accuracy on clean data. In particular, our method significantly increases tracking accuracy under adversarial attacks with around 90% relative improvement on UAV123, which is even higher than the accuracy on clean data.
Submitted 9 April, 2024;
originally announced April 2024.
-
Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models
Authors:
Wanyong Feng,
Jaewook Lee,
Hunter McNichols,
Alexander Scarlatos,
Digory Smith,
Simon Woodhead,
Nancy Otero Ornelas,
Andrew Lan
Abstract:
Multiple-choice questions (MCQs) are ubiquitous in almost all levels of education since they are easy to administer and grade and are a reliable format in assessments and practices. One of the most important aspects of MCQs is the distractors, i.e., incorrect options that are designed to target common errors or misconceptions among real students. To date, the task of crafting high-quality distractors largely remains a labor- and time-intensive process for teachers and learning content designers, which has limited scalability. In this work, we study the task of automated distractor generation in the domain of math MCQs and explore a wide variety of large language model (LLM)-based approaches, from in-context learning to fine-tuning. We conduct extensive experiments using a real-world math MCQ dataset and find that although LLMs can generate some mathematically valid distractors, they are less adept at anticipating common errors or misconceptions among real students.
Submitted 18 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
SemRoDe: Macro Adversarial Training to Learn Representations That are Robust to Word-Level Attacks
Authors:
Brian Formento,
Wenjie Feng,
Chuan Sheng Foo,
Luu Anh Tuan,
See-Kiong Ng
Abstract:
Language models (LMs) are indispensable tools for natural language processing tasks, but their vulnerability to adversarial attacks remains a concern. While current research has explored adversarial training techniques, their improvements to defend against word-level attacks have been limited. In this work, we propose a novel approach called Semantic Robust Defence (SemRoDe), a Macro Adversarial Training strategy to enhance the robustness of LMs. Drawing inspiration from recent studies in the image domain, we investigate and later confirm that in a discrete data setting such as language, adversarial samples generated via word substitutions do indeed belong to an adversarial domain exhibiting a high Wasserstein distance from the base domain. Our method learns a robust representation that bridges these two domains. We hypothesize that if samples were not projected into an adversarial domain, but instead to a domain with minimal shift, it would improve attack robustness. We align the domains by incorporating a new distance-based objective. With this, our model is able to learn more generalized representations by aligning the model's high-level output features and therefore better handling unseen adversarial samples. This method can be generalized across word embeddings, even when they share minimal overlap at both vocabulary and word-substitution levels. To evaluate the effectiveness of our approach, we conduct experiments on BERT and RoBERTa models on three datasets. The results demonstrate promising state-of-the-art robustness.
Submitted 27 March, 2024;
originally announced March 2024.
-
Dynamic Spatial-Temporal Aggregation for Skeleton-Aware Sign Language Recognition
Authors:
Lianyu Hu,
Liqing Gao,
Zekang Liu,
Wei Feng
Abstract:
Skeleton-aware sign language recognition (SLR) has gained popularity due to its ability to remain unaffected by background information and its lower computational requirements. Current methods utilize spatial graph modules and temporal modules to capture spatial and temporal features, respectively. However, their spatial graph modules are typically built on fixed graph structures such as graph convolutional networks or a single learnable graph, which only partially explore joint relationships. Additionally, a simple temporal convolution kernel is used to capture temporal information, which may not fully capture the complex movement patterns of different signers. To overcome these limitations, we propose a new spatial architecture consisting of two concurrent branches, which build input-sensitive joint relationships and incorporate specific domain knowledge for recognition, respectively. These two branches are followed by an aggregation process to distinguish important joint connections. We then propose a new temporal module to model multi-scale temporal information to capture complex human dynamics. Our method achieves state-of-the-art accuracy compared to previous skeleton-aware methods on four large-scale SLR benchmarks. Moreover, our method demonstrates superior accuracy compared to RGB-based methods in most cases while requiring far fewer computational resources, yielding a better accuracy-computation trade-off. Code is available at https://github.com/hulianyuyy/DSTA-SLR.
Submitted 19 March, 2024;
originally announced March 2024.
-
Reward Guided Latent Consistency Distillation
Authors:
Jiachen Li,
Weixi Feng,
Wenhu Chen,
William Yang Wang
Abstract:
Latent Consistency Distillation (LCD) has emerged as a promising paradigm for efficient text-to-image synthesis. By distilling a latent consistency model (LCM) from a pre-trained teacher latent diffusion model (LDM), LCD facilitates the generation of high-fidelity images within merely 2 to 4 inference steps. However, the LCM's efficient inference is obtained at the cost of the sample quality. In this paper, we propose compensating the quality loss by aligning LCM's output with human preference during training. Specifically, we introduce Reward Guided LCD (RG-LCD), which integrates feedback from a reward model (RM) into the LCD process by augmenting the original LCD loss with the objective of maximizing the reward associated with LCM's single-step generation. As validated through human evaluation, when trained with the feedback of a good RM, the 2-step generations from our RG-LCM are favored by humans over the 50-step DDIM samples from the teacher LDM, representing a 25 times inference acceleration without quality loss.
As directly optimizing towards differentiable RMs can suffer from over-optimization, we overcome this difficulty by proposing the use of a latent proxy RM (LRM). This novel component serves as an intermediary, connecting our LCM with the RM. Empirically, we demonstrate that incorporating the LRM into our RG-LCD successfully avoids high-frequency noise in the generated images, contributing to both improved FID on MS-COCO and a higher HPSv2.1 score on HPSv2's test set, surpassing those achieved by the baseline LCM.
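As a rough sketch of the objective described in this abstract, augmenting the LCD loss with a reward-maximization term could be written as below. The notation is assumed for illustration and is not taken from the paper: $\beta$ is a hypothetical weighting coefficient, $R$ denotes the reward model, $\hat{x}_0$ the LCM's single-step generation, and $c$ the text prompt.

```latex
\mathcal{L}_{\text{RG-LCD}}
  = \mathcal{L}_{\text{LCD}}
  - \beta \, \mathbb{E}\left[ R(\hat{x}_0, c) \right]
```

The stated 25 times acceleration follows from comparing the 50 teacher DDIM steps with the 2 student steps: $50 / 2 = 25$.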
Submitted 16 March, 2024;
originally announced March 2024.
-
Spin and Orbital Magnetism by Light in Rutile Altermagnets
Authors:
T. Adamantopoulos,
M. Merte,
F. Freimuth,
D. Go,
M. Ležaić,
W. Feng,
Y. Yao,
J. Sinova,
L. Šmejkal,
S. Blügel,
Y. Mokrousov
Abstract:
While the understanding of altermagnetism is still at a very early stage, it is expected to play a role in various fields of condensed matter research, for example spintronics, caloritronics and superconductivity. In the field of optical magnetism, it is still unclear to what extent altermagnets as a class can exhibit a distinct behavior. Here we choose RuO$_2$, a prototype metallic altermagnet with a giant spin splitting, and CoF$_2$, an experimentally known insulating altermagnet, to study the light-induced magnetism in rutile altermagnets from first-principles. We demonstrate that in the non-relativistic limit the allowed sublattice-resolved orbital response exhibits symmetries, imposed by altermagnetism, which lead to a drastic canting of light-induced moments. On the other hand, we find that inclusion of spin-orbit interaction enhances the overall effect drastically, introduces a significant anisotropy with respect to the light polarization and strongly suppresses the canting of induced moments. Remarkably, we observe that the moments induced by linearly-polarized laser pulses in light altermagnets can even exceed in magnitude those predicted for heavy ferromagnets exposed to circularly polarized light. By resorting to microscopic tools we interpret our results in terms of the altermagnetic spin splittings and of their reciprocal space distribution. Based on our findings, we speculate that optical excitations may provide a unique tool to switch and probe the magnetic state of rutile altermagnets.
Submitted 15 March, 2024;
originally announced March 2024.
-
Cosmologically Consistent Analysis of Gravitational Waves from hidden sectors
Authors:
Wan-Zhe Feng,
Jinzheng Li,
Pran Nath
Abstract:
Production of gravitational waves in the early universe is discussed in a cosmologically consistent analysis within a first order phase transition involving a hidden sector feebly coupled with the visible sector. Each sector resides in its own heat bath leading to a potential dependent on two temperatures, and on two fields: one a standard model Higgs and the other a scalar arising from a hidden sector $U(1)$ gauge theory. A synchronous evolution of the hidden and visible sector temperatures is carried out from the reheat temperature down to the electroweak scale. The hydrodynamics of two-field phase transitions, one for the visible sector and the other for the hidden sector, is discussed, which leads to separate tunneling temperatures, and different sound speeds for the two sectors. Gravitational waves emerging from the two sectors are computed and their imprint on the measured gravitational wave power spectrum vs frequency is analyzed in terms of bubble nucleation signature, i.e., detonation, deflagration, and hybrid. It is shown that the two-field model predicts gravitational waves accessible at several proposed gravitational wave detectors: LISA, DECIGO, BBO, Taiji, and their discovery would probe specific regions of the hidden sector parameter space and may also shed light on the nature of bubble nucleation in the early universe. The analysis presented here indicates that the cosmologically preferred models are those where the tunneling in the visible sector precedes the tunneling in the hidden sector and the sound speed $c_s$ lies below its maximum, i.e., $c^2_s<\frac{1}{3}$. It is of interest to investigate if these features are universal and applicable to a wider class of cosmologically consistent models.
Submitted 16 June, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation
Authors:
Feilong Tang,
Zhongxing Xu,
Zhaojun Qu,
Wei Feng,
Xingjian Jiang,
Zongyuan Ge
Abstract:
Recent weakly supervised semantic segmentation (WSSS) methods strive to incorporate contextual knowledge to improve the completeness of class activation maps (CAM). In this work, we argue that the knowledge bias between instances and contexts affects the capability of the prototype to sufficiently understand instance semantics. Inspired by prototype learning theory, we propose leveraging prototype awareness to capture diverse and fine-grained feature attributes of instances. The hypothesis is that contextual prototypes might erroneously activate similar and frequently co-occurring object categories due to this knowledge bias. Therefore, we propose to enhance the prototype representation ability by mitigating the bias to better capture spatial coverage in semantic object regions. With this goal, we present a Context Prototype-Aware Learning (CPAL) strategy, which leverages semantic context to enrich instance comprehension. The core of this method is to accurately capture intra-class variations in object features through context-aware prototypes, facilitating the adaptation to the semantic attributes of various instances. We design feature distribution alignment to optimize prototype awareness, aligning instance feature distributions with dense features. In addition, a unified training framework is proposed to combine label-guided classification supervision and prototypes-guided self-supervision. Experimental results on PASCAL VOC 2012 and MS COCO 2014 show that CPAL significantly improves off-the-shelf methods and achieves state-of-the-art performance. The project is available at https://github.com/Barrett-python/CPAL.
Submitted 12 March, 2024;
originally announced March 2024.
-
Edge Information Hub: Orchestrating Satellites, UAVs, MEC, Sensing and Communications for 6G Closed-Loop Controls
Authors:
Chengleyang Lei,
Wei Feng,
Peng Wei,
Yunfei Chen,
Ning Ge,
Shiwen Mao
Abstract:
An increasing number of field robots would be used for mission-critical tasks in remote or post-disaster areas. Due to usually-limited individual abilities, these robots require an edge information hub (EIH), which is capable of not only communications but also sensing and computing. Such EIH could be deployed on a flexibly-dispatched unmanned aerial vehicle (UAV). Different from traditional aerial base stations or mobile edge computing (MEC), the EIH would direct the operations of robots via sensing-communication-computing-control ($\textbf{SC}^3$) closed-loop orchestration. This paper aims to optimize the closed-loop control performance of multiple $\textbf{SC}^3$ loops, under the constraints of satellite-backhaul rate, computing capability, and on-board energy. Specifically, the linear quadratic regulator (LQR) control cost is used to measure the closed-loop utility, and a sum LQR cost minimization problem is formulated to jointly optimize the splitting of sensor data and allocation of communication and computing resources. We first derive the optimal splitting ratio of sensor data, and then recast the problem to a more tractable form. An iterative algorithm is finally proposed to provide a sub-optimal solution. Simulation results demonstrate the superiority of the proposed algorithm. We also uncover the influence of $\textbf{SC}^3$ parameters on closed-loop controls, highlighting more systematic understanding.
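To make the "sum LQR cost" notion in this abstract concrete, the following is a minimal toy sketch, not the paper's formulation: each control loop is assumed to be a scalar linear system x[k+1] = a*x[k] + b*u[k] with stage cost q*x^2 + r*u^2, and the function names and parameter values are hypothetical. It evaluates the optimal infinite-horizon LQR cost of each loop by iterating the scalar discrete-time Riccati recursion and then sums the costs over loops.

```python
def scalar_lqr_cost(a, b, q, r, x0, iters=500):
    """Optimal infinite-horizon LQR cost p * x0^2 for a scalar loop.

    Assumed toy model: x[k+1] = a*x[k] + b*u[k], stage cost q*x^2 + r*u^2.
    p is obtained by iterating the scalar Riccati recursion from p = q.
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p * x0 * x0


def sum_lqr_cost(loops):
    """Sum of per-loop LQR costs (the quantity a sum-cost objective adds up)."""
    return sum(scalar_lqr_cost(**loop) for loop in loops)


# Two hypothetical control loops with made-up parameters.
loops = [
    {"a": 1.0, "b": 1.0, "q": 1.0, "r": 1.0, "x0": 1.0},
    {"a": 1.2, "b": 0.5, "q": 1.0, "r": 0.1, "x0": 2.0},
]
total = sum_lqr_cost(loops)
print(total)
```

In the setting described in the abstract, the loop parameters would additionally depend on how sensor data is split and how communication and computing resources are allocated; that coupling is deliberately left out of this sketch.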
Submitted 11 March, 2024;
originally announced March 2024.
-
Higher-order exceptional surface in a pseudo-Hermitian superconducting circuit
Authors:
Guo-Qiang Zhang,
Wei Feng,
Yu Wang,
Chui-Ping Yang
Abstract:
In the last few years, much attention has been paid to exceptional surfaces (ESs) owing to various important physical phenomena and potential applications. However, high-order ESs in pseudo-Hermitian systems have not been reported until now. Here, we study the high-order ES in a pseudo-Hermitian superconducting (SC) circuit system. In our proposal, the SC circuit system is composed of three circularly coupled SC cavities, where the gain and loss are balanced. According to the eigenvalue properties of the pseudo-Hermitian Hamiltonian, we derive the general pseudo-Hermitian conditions for the ternary SC system. In the special pseudo-Hermitian case with parity-time symmetry, all third-order exceptional points (EP3s) of the SC system form a third-order exceptional line in the parameter space. Under the general pseudo-Hermitian conditions, more EP3s are found, and all EP3s are located on a surface, i.e., a third-order exceptional surface is constructed. Moreover, we also investigate the eigenvalues of the pseudo-Hermitian SC circuit around EP3s. Our work opens up a door for exploring high-order ESs and related applications in pseudo-Hermitian systems.
Submitted 9 March, 2024;
originally announced March 2024.
-
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
Authors:
Wenfeng Feng,
Chuzhan Hao,
Yuewei Zhang,
Yu Han,
Hao Wang
Abstract:
Instruction Tuning has the potential to stimulate or enhance specific capabilities of large language models (LLMs). However, achieving the right balance of data is crucial to prevent catastrophic forgetting and interference between tasks. To address these limitations and enhance training flexibility, we propose the Mixture-of-LoRAs (MoA) architecture, which is a novel and parameter-efficient tuning method designed for multi-task learning with LLMs. In this paper, we start by individually training multiple domain-specific LoRA modules using corresponding supervised corpus data. These LoRA modules can be aligned with the expert design principles observed in Mixture-of-Experts (MoE). Subsequently, we combine the multiple LoRAs using an explicit routing strategy and introduce domain labels to facilitate multi-task learning, which helps prevent interference between tasks and ultimately enhances the performance of each individual task. Furthermore, each LoRA model can be iteratively adapted to a new domain, allowing for quick domain-specific adaptation. Experiments on diverse tasks demonstrate superior and robust performance, which can further promote the wide application of domain-specific LLMs.
Submitted 5 March, 2024;
originally announced March 2024.
-
Moving and fusion of Majorana zero modes in the presence of nonadiabatic transitions
Authors:
Qiongyao Wang,
Jing Bai,
Luting Xu,
Wei Feng,
Xin-Qi Li
Abstract:
We perform simulations for moving and non-Abelian fusion of Majorana zero modes in topological superconducting quantum wires. We display interesting behaviors of nonadiabatic transition associated with the moving through mini-gate-controlled multiple-segments modulations. Owing to breaking of the initial fermion parity induced by nonadiabatic transitions, deviation from the standard fusion rule is analyzed. Moreover, we develop a measurement scheme to infer the amount of fermion parity breaking and nonadiabatic transition probability to excited states, based on the characteristic spectrum of measurement current by a quantum-point-contact detector, in measuring the charge occupation dynamics in a fusion-outcome-probing quantum dot.
Submitted 16 February, 2024;
originally announced February 2024.
-
Structured Satellite-UAV-Terrestrial Networks for 6G Internet of Things
Authors:
Wei Feng,
Yanmin Wang,
Yunfei Chen,
Ning Ge,
Cheng-Xiang Wang
Abstract:
The upcoming sixth generation (6G) wireless communication network is envisioned to cover space, air, and maritime areas, in addition to urban-centered terrestrial coverage by the fifth generation (5G) network, to support intelligent Internet of Things (IoT). Towards this end, we investigate structured integration of satellites, unmanned aerial vehicles (UAVs), and terrestrial networks, aiming to serve future universal IoT possibly with a massive number of devices in the coverage holes of current 5G. The hybrid satellite-UAV-terrestrial network usually leads to high system complexity, due to the heterogeneity and dynamics of space/air/ground links. With systematic thinking, we propose to create and exploit hierarchies for the integrated network. Four basic structures are discussed by learning from the synergies in the human body. To orchestrate multiple heterogeneous basic structures, we further propose a process-oriented on-demand coverage method, which characterizes the system behavior as a series of events over time and is able to tackle the system complexity elaborately. We also outline open issues for promoting the agility and intelligence of structured satellite-UAV-terrestrial networks in the making.
Submitted 11 February, 2024;
originally announced February 2024.
-
Graph Descriptive Order Improves Reasoning with Large Language Model
Authors:
Yuyao Ge,
Shenghua Liu,
Wenjie Feng,
Lingrui Mei,
Lizhe Chen,
Xueqi Cheng
Abstract:
In recent years, large language models have achieved state-of-the-art performance across multiple domains. However, the progress in the field of graph reasoning with LLMs remains limited. Our work delves into this gap by thoroughly investigating graph reasoning with LLMs. In this work, we reveal the impact of the order of graph description on LLMs' graph reasoning performance, which significantly affects LLMs' reasoning abilities. By altering this order, we enhance the performance of LLMs from 42.22% to 70%. Furthermore, we introduce the Scaled Graph Reasoning benchmark for assessing LLMs' performance across various graph sizes and evaluate the relationship between LLMs' graph reasoning abilities and graph size. We discover that the graph reasoning performance of LLMs does not monotonically decrease with the increase in graph size. The experiments span several mainstream models, including GPT-3.5, LLaMA-2-7B, and LLaMA-2-13B, to offer a comprehensive evaluation.
Submitted 24 February, 2024; v1 submitted 11 February, 2024;
originally announced February 2024.
-
From Synthetic to Real: Unveiling the Power of Synthetic Data for Video Person Re-ID
Authors:
Xiangqun Zhang,
Ruize Han,
Wei Feng
Abstract:
In this paper, we study a new problem of cross-domain video based person re-identification (Re-ID). Specifically, we take the synthetic video dataset as the source domain for training and use the real-world videos for testing, which significantly reduces the dependence on real training data collection and annotation. To unveil the power of synthetic data for video person Re-ID, we first propose a self-supervised domain invariant feature learning strategy for both static and temporal features. Then, to further improve the person identification ability in the target domain, we develop a mean-teacher scheme with the self-supervised ID consistency loss. Experimental results on four real datasets verify the rationality of cross-synthetic-real domain adaption and the effectiveness of our method. We are also surprised to find that the synthetic data performs even better than the real data in the cross-domain setting.
Submitted 3 February, 2024;
originally announced February 2024.
-
Unveiling the Power of Self-supervision for Multi-view Multi-human Association and Tracking
Authors:
Wei Feng,
Feifan Wang,
Ruize Han,
Zekun Qian,
Song Wang
Abstract:
Multi-view multi-human association and tracking (MvMHAT) is a new but important problem for multi-person scene video surveillance, aiming to track a group of people over time in each view, as well as to identify the same person across different views at the same time, which is different from previous MOT and multi-camera MOT tasks only considering the over-time human tracking. This way, the videos for MvMHAT require more complex annotations while containing more information for self learning. In this work, we tackle this problem with a self-supervised learning aware end-to-end network. Specifically, we propose to take advantage of the spatial-temporal self-consistency rationale by considering three properties of reflexivity, symmetry and transitivity. Besides the reflexivity property that naturally holds, we design the self-supervised learning losses based on the properties of symmetry and transitivity, for both appearance feature learning and assignment matrix optimization, to associate the multiple humans over time and across views. Furthermore, to promote the research on MvMHAT, we build two new large-scale benchmarks for the network training and testing of different algorithms. Extensive experiments on the proposed benchmarks verify the effectiveness of our method. We have released the benchmark and code to the public.
Submitted 31 January, 2024;
originally announced January 2024.