-
TensorTEE: Unifying Heterogeneous TEE Granularity for Efficient Secure Collaborative Tensor Computing
Authors:
Husheng Han,
Xinyao Zheng,
Yuanbo Wen,
Yifan Hao,
Erhu Feng,
Ling Liang,
Jianan Mu,
Xiaqing Li,
Tianyun Ma,
Pengwei Jin,
Xinkai Song,
Zidong Du,
Qi Guo,
Xing Hu
Abstract:
Heterogeneous collaborative computing with NPU and CPU has received widespread attention due to its substantial performance benefits. To ensure data confidentiality and integrity during computing, Trusted Execution Environments (TEE) is considered a promising solution because of its comparatively lower overhead. However, existing heterogeneous TEE designs are inefficient for collaborative computin…
▽ More
Heterogeneous collaborative computing with NPU and CPU has received widespread attention due to its substantial performance benefits. To ensure data confidentiality and integrity during computing, Trusted Execution Environments (TEE) is considered a promising solution because of its comparatively lower overhead. However, existing heterogeneous TEE designs are inefficient for collaborative computing due to fine and different memory granularities between CPU and NPU. 1) The cacheline granularity of CPU TEE intensifies memory pressure due to its extra memory access, and 2) the cacheline granularity MAC of NPU escalates the pressure on the limited memory storage. 3) Data transfer across heterogeneous enclaves relies on the transit of non-secure regions, resulting in cumbersome re-encryption and scheduling.
To address these issues, we propose TensorTEE, a unified tensor-granularity heterogeneous TEE for efficient secure collaborative tensor computing. First, we virtually support tensor granularity in CPU TEE to eliminate the off-chip metadata access by detecting and maintaining tensor structures on-chip. Second, we propose tensor-granularity MAC management with predictive execution to avoid computational stalls while eliminating off-chip MAC storage and access. Moreover, based on the unified granularity, we enable direct data transfer without re-encryption and scheduling dilemmas. Our evaluation is built on enhanced Gem5 and a cycle-accurate NPU simulator. The results show that TensorTEE improves the performance of Large Language Model (LLM) training workloads by 4.0x compared to existing work and incurs only 2.1% overhead compared to non-secure training, offering a practical security assurance for LLM training.
△ Less
Submitted 11 July, 2024;
originally announced July 2024.
-
Near-Optimal MIMO Detection Using Gradient-Based MCMC in Discrete Spaces
Authors:
Xingyu Zhou,
Le Liang,
Jing Zhang,
Chao-Kai Wen,
Shi Jin
Abstract:
The discrete nature of transmitted symbols poses challenges for achieving optimal detection in multiple-input multiple-output (MIMO) systems associated with a large number of antennas. Recently, the combination of two powerful machine learning methods, Markov chain Monte Carlo (MCMC) sampling and gradient descent, has emerged as a highly efficient solution to address this issue. However, existing…
▽ More
The discrete nature of transmitted symbols poses challenges for achieving optimal detection in multiple-input multiple-output (MIMO) systems associated with a large number of antennas. Recently, the combination of two powerful machine learning methods, Markov chain Monte Carlo (MCMC) sampling and gradient descent, has emerged as a highly efficient solution to address this issue. However, existing gradient-based MCMC detectors are heuristically designed and thus are theoretically untenable. To bridge this gap, we introduce a novel sampling algorithm tailored for discrete spaces. This algorithm leverages gradients from the underlying continuous spaces for acceleration while maintaining the validity of probabilistic sampling. We prove the convergence of this method and also analyze its convergence rate using both MCMC theory and empirical diagnostics. On this basis, we develop a MIMO detector that precisely samples from the target discrete distribution and generates posterior Bayesian estimates using these samples, whose performance is thereby theoretically guaranteed. Furthermore, our proposed detector is highly parallelizable and scalable to large MIMO dimensions, positioning it as a compelling candidate for next-generation wireless networks. Simulation results show that our detector achieves near-optimal performance, significantly outperforms state-of-the-art baselines, and showcases resilience to various system setups.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Vertex Exchange Method for a Class of Quadratic Programming Problems
Authors:
Ling Liang,
Kim-Chuan Toh,
Haizhao Yang
Abstract:
A vertex exchange method is proposed for solving the strongly convex quadratic program subject to the generalized simplex constraint. We conduct rigorous convergence analysis for the proposed algorithm and demonstrate its essential roles in solving some important classes of constrained convex optimization. To get a feasible initial point to execute the algorithm, we also present and analyze a high…
▽ More
A vertex exchange method is proposed for solving the strongly convex quadratic program subject to the generalized simplex constraint. We conduct rigorous convergence analysis for the proposed algorithm and demonstrate its essential roles in solving some important classes of constrained convex optimization. To get a feasible initial point to execute the algorithm, we also present and analyze a highly efficient semismooth Newton method for computing the projection onto the generalized simplex. The excellent practical performance of the proposed algorithms is demonstrated by a set of extensive numerical experiments. Our theoretical and numerical results further motivate the potential applications of the considered model and the proposed algorithms.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Nesterov's Accelerated Jacobi-Type Methods for Large-scale Symmetric Positive Semidefinite Linear Systems
Authors:
Ling Liang,
Qiyuan Pang,
Kim-Chuan Toh,
Haizhao Yang
Abstract:
Solving symmetric positive semidefinite linear systems is an essential task in many scientific computing problems. While Jacobi-type methods, including the classical Jacobi method and the weighted Jacobi method, exhibit simplicity in their forms and friendliness to parallelization, they are not attractive either because of the potential convergence failure or their slow convergence rate. This pape…
▽ More
Solving symmetric positive semidefinite linear systems is an essential task in many scientific computing problems. While Jacobi-type methods, including the classical Jacobi method and the weighted Jacobi method, exhibit simplicity in their forms and friendliness to parallelization, they are not attractive either because of the potential convergence failure or their slow convergence rate. This paper aims to showcase the possibility of improving classical Jacobi-type methods by employing Nesterov's acceleration technique that results in an accelerated Jacobi-type method with improved convergence properties. Simultaneously, it preserves the appealing features for parallel implementation. In particular, we show that the proposed method has an $O\left(\frac{1}{t^2}\right)$ convergence rate in terms of objective function values of the associated convex quadratic optimization problem, where $t\geq 1$ denotes the iteration counter. To further improve the practical performance of the proposed method, we also develop and analyze a restarted variant of the method, which is shown to have an $O\left(\frac{(\log_2(t))^2}{t^2}\right)$ convergence rate when the coefficient matrix is positive definite. Furthermore, we conduct appropriate numerical experiments to evaluate the efficiency of the proposed method. Our numerical results demonstrate that the proposed method outperforms the classical Jacobi-type methods and the conjugate gradient method and shows a comparable performance as the preconditioned conjugate gradient method with a diagonal preconditioner. Finally, we develop a parallel implementation and conduct speed-up tests on some large-scale systems. Our results indicate that the proposed framework is highly scalable.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Croppable Knowledge Graph Embedding
Authors:
Yushan Zhu,
Wen Zhang,
Zhiqiang Liu,
Mingyang Chen,
Lei Liang,
Huajun Chen
Abstract:
Knowledge Graph Embedding (KGE) is a common method for Knowledge Graphs (KGs) to serve various artificial intelligence tasks. The suitable dimensions of the embeddings depend on the storage and computing conditions of the specific application scenarios. Once a new dimension is required, a new KGE model needs to be trained from scratch, which greatly increases the training cost and limits the effic…
▽ More
Knowledge Graph Embedding (KGE) is a common method for Knowledge Graphs (KGs) to serve various artificial intelligence tasks. The suitable dimensions of the embeddings depend on the storage and computing conditions of the specific application scenarios. Once a new dimension is required, a new KGE model needs to be trained from scratch, which greatly increases the training cost and limits the efficiency and flexibility of KGE in serving various scenarios. In this work, we propose a novel KGE training framework MED, through which we could train once to get a croppable KGE model applicable to multiple scenarios with different dimensional requirements, sub-models of the required dimensions can be cropped out of it and used directly without any additional training. In MED, we propose a mutual learning mechanism to improve the low-dimensional sub-models performance and make the high-dimensional sub-models retain the capacity that low-dimensional sub-models have, an evolutionary improvement mechanism to promote the high-dimensional sub-models to master the knowledge that the low-dimensional sub-models can not learn, and a dynamic loss weight to balance the multiple losses adaptively. Experiments on 3 KGE models over 4 standard KG completion datasets, 3 real application scenarios over a real-world large-scale KG, and the experiments of extending MED to the language model BERT show the effectiveness, high efficiency, and flexible extensibility of MED.
△ Less
Submitted 2 July, 2024;
originally announced July 2024.
-
Elevated UV luminosity density at Cosmic Dawn explained by non-evolving, weakly-mass dependent star formation efficiency
Authors:
Robert Feldmann,
Michael Boylan-Kolchin,
James S. Bullock,
Onur Çatmabacak,
Claude-André Faucher-Giguère,
Christopher C. Hayward,
Dušan Kereš,
Alexandres Lazar,
Lichen Liang,
Jorge Moreno,
Pascal A. Oesch,
Eliot Quataert,
Xuejian Shen,
Guochao Sun
Abstract:
Recent observations with the James Webb Space Telescope (JWST) have uncovered unexpectedly high cosmic star formation activity in the early Universe, mere hundreds of millions of years after the Big Bang. These observations are often understood to reflect an evolutionary shift in star formation efficiency (SFE) caused by changing galactic conditions during these early epochs. We present FIREbox-HR…
▽ More
Recent observations with the James Webb Space Telescope (JWST) have uncovered unexpectedly high cosmic star formation activity in the early Universe, mere hundreds of millions of years after the Big Bang. These observations are often understood to reflect an evolutionary shift in star formation efficiency (SFE) caused by changing galactic conditions during these early epochs. We present FIREbox-HR, a high-resolution, cosmological hydrodynamical simulation from the Feedback in Realistic Environments project, which offers insights into the SFE of galaxies during the first billion years of cosmic time. FIREbox-HR re-simulates the cosmic volume (L = 22.1 cMpc) of the original FIREbox run with eight times higher mass resolution (m_b ~ 7800 M_sun), but with identical physics, down to z ~ 6. FIREbox-HR predicts ultraviolet (UV) luminosity functions in good agreement with available observational data. The simulation also successfully reproduces the observed cosmic UV luminosity density at z ~ 6 - 14, demonstrating that relatively high star formation activity in the early Universe is a natural outcome of the baryonic processes encoded in the FIRE-2 model. According to FIREbox-HR, the SFE - halo mass relation for intermediate mass halos (M_halo ~ 10^9 - 10^11 M_sun) does not significantly evolve with redshift and is only weakly mass-dependent. These properties of the SFE - halo mass relation lead to a larger contribution from lower mass halos at higher z, driving the gradual evolution of the observed cosmic UV luminosity density. A theoretical model based on the SFE - halo mass relation inferred from FIREbox-HR allows us to explore implications for galaxy evolution. Future observations of UV faint galaxies at z > 12 will provide an opportunity to further test these predictions and deepen our understanding of star formation during Cosmic Dawn.
△ Less
Submitted 2 July, 2024;
originally announced July 2024.
-
FORA: Fast-Forward Caching in Diffusion Transformer Acceleration
Authors:
Pratheba Selvaraju,
Tianyu Ding,
Tianyi Chen,
Ilya Zharkov,
Luming Liang
Abstract:
Diffusion transformers (DiT) have become the de facto choice for generating high-quality images and videos, largely due to their scalability, which enables the construction of larger models for enhanced performance. However, the increased size of these models leads to higher inference costs, making them less attractive for real-time applications. We present Fast-FORward CAching (FORA), a simple ye…
▽ More
Diffusion transformers (DiT) have become the de facto choice for generating high-quality images and videos, largely due to their scalability, which enables the construction of larger models for enhanced performance. However, the increased size of these models leads to higher inference costs, making them less attractive for real-time applications. We present Fast-FORward CAching (FORA), a simple yet effective approach designed to accelerate DiT by exploiting the repetitive nature of the diffusion process. FORA implements a caching mechanism that stores and reuses intermediate outputs from the attention and MLP layers across denoising steps, thereby reducing computational overhead. This approach does not require model retraining and seamlessly integrates with existing transformer-based diffusion models. Experiments show that FORA can speed up diffusion transformers several times over while only minimally affecting performance metrics such as the IS Score and FID. By enabling faster processing with minimal trade-offs in quality, FORA represents a significant advancement in deploying diffusion transformers for real-time applications. Code will be made publicly available at: https://github.com/prathebaselva/FORA.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
TrustUQA: A Trustful Framework for Unified Structured Data Question Answering
Authors:
Wen Zhang,
Long Jin,
Yushan Zhu,
Jiaoyan Chen,
Zhiwei Huang,
Junjie Wang,
Yin Hua,
Lei Liang,
Huajun Chen
Abstract:
Natural language question answering (QA) over structured data sources such as tables and knowledge graphs (KGs) have been widely investigated, for example with Large Language Models (LLMs). The main solutions include question to formal query parsing and retrieval-based answer generation. However, current methods of the former often suffer from weak generalization, failing to dealing with multiple…
▽ More
Natural language question answering (QA) over structured data sources such as tables and knowledge graphs (KGs) have been widely investigated, for example with Large Language Models (LLMs). The main solutions include question to formal query parsing and retrieval-based answer generation. However, current methods of the former often suffer from weak generalization, failing to dealing with multiple sources simultaneously, while the later is limited in trustfulness. In this paper, we propose UnifiedTQA, a trustful QA framework that can simultaneously support multiple types of structured data in a unified way. To this end, it adopts an LLM-friendly and unified knowledge representation method called Condition Graph (CG), and uses an LLM and demonstration-based two-level method for CG querying. For enhancement, it is also equipped with dynamic demonstration retrieval. We have evaluated UnifiedTQA with 5 benchmarks covering 3 types of structured data. It outperforms 2 existing unified structured data QA methods and in comparison with the baselines that are specific to a data type, it achieves state-of-the-art on 2 of them. Further more, we demonstrates potential of our method for more general QA tasks, QA over mixed structured data and QA across structured data.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
EmT: A Novel Transformer for Generalized Cross-subject EEG Emotion Recognition
Authors:
Yi Ding,
Chengxuan Tong,
Shuailei Zhang,
Muyun Jiang,
Yong Li,
Kevin Lim Jun Liang,
Cuntai Guan
Abstract:
Integrating prior knowledge of neurophysiology into neural network architecture enhances the performance of emotion decoding. While numerous techniques emphasize learning spatial and short-term temporal patterns, there has been limited emphasis on capturing the vital long-term contextual information associated with emotional cognitive processes. In order to address this discrepancy, we introduce a…
▽ More
Integrating prior knowledge of neurophysiology into neural network architecture enhances the performance of emotion decoding. While numerous techniques emphasize learning spatial and short-term temporal patterns, there has been limited emphasis on capturing the vital long-term contextual information associated with emotional cognitive processes. In order to address this discrepancy, we introduce a novel transformer model called emotion transformer (EmT). EmT is designed to excel in both generalized cross-subject EEG emotion classification and regression tasks. In EmT, EEG signals are transformed into a temporal graph format, creating a sequence of EEG feature graphs using a temporal graph construction module (TGC). A novel residual multi-view pyramid GCN module (RMPG) is then proposed to learn dynamic graph representations for each EEG feature graph within the series, and the learned representations of each graph are fused into one token. Furthermore, we design a temporal contextual transformer module (TCT) with two types of token mixers to learn the temporal contextual information. Finally, the task-specific output module (TSO) generates the desired outputs. Experiments on four publicly available datasets show that EmT achieves higher results than the baseline methods for both EEG emotion classification and regression tasks. The code is available at https://github.com/yi-ding-cs/EmT.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
A Multi-Stage Goal-Driven Network for Pedestrian Trajectory Prediction
Authors:
Xiuen Wu,
Tao Wang,
Yuanzheng Cai,
Lingyu Liang,
George Papageorgiou
Abstract:
Pedestrian trajectory prediction plays a pivotal role in ensuring the safety and efficiency of various applications, including autonomous vehicles and traffic management systems. This paper proposes a novel method for pedestrian trajectory prediction, called multi-stage goal-driven network (MGNet). Diverging from prior approaches relying on stepwise recursive prediction and the singular forecastin…
▽ More
Pedestrian trajectory prediction plays a pivotal role in ensuring the safety and efficiency of various applications, including autonomous vehicles and traffic management systems. This paper proposes a novel method for pedestrian trajectory prediction, called multi-stage goal-driven network (MGNet). Diverging from prior approaches relying on stepwise recursive prediction and the singular forecasting of a long-term goal, MGNet directs trajectory generation by forecasting intermediate stage goals, thereby reducing prediction errors. The network comprises three main components: a conditional variational autoencoder (CVAE), an attention module, and a multi-stage goal evaluator. Trajectories are encoded using conditional variational autoencoders to acquire knowledge about the approximate distribution of pedestrians' future trajectories, and combined with an attention mechanism to capture the temporal dependency between trajectory sequences. The pivotal module is the multi-stage goal evaluator, which utilizes the encoded feature vectors to predict intermediate goals, effectively minimizing cumulative errors in the recursive inference process. The effectiveness of MGNet is demonstrated through comprehensive experiments on the JAAD and PIE datasets. Comparative evaluations against state-of-the-art algorithms reveal significant performance improvements achieved by our proposed method.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Fully Nonlinear Elliptic Equations With Periodic Data
Authors:
Dongsheng Li,
Lichun Liang
Abstract:
In this paper, we study solutions $u$ of fully nonlinear elliptic equations of the form $F(D^2u)=f$ in $\mathbb{R}^n$, where $f$ is periodic. We establish the existence and Liouville type results for entire quadratic polynomial growth solutions, that is, the solution is a quadratic polynomial plus a periodic function. As a consequence, we consider applications to $k$-Hessian equations.
In this paper, we study solutions $u$ of fully nonlinear elliptic equations of the form $F(D^2u)=f$ in $\mathbb{R}^n$, where $f$ is periodic. We establish the existence and Liouville type results for entire quadratic polynomial growth solutions, that is, the solution is a quadratic polynomial plus a periodic function. As a consequence, we consider applications to $k$-Hessian equations.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
CoSQA+: Enhancing Code Search Dataset with Matching Code
Authors:
Jing Gong,
Yanghui Wu,
Linxi Liang,
Zibin Zheng,
Yanlin Wang
Abstract:
Semantic code search, retrieving code that matches a given natural language query, is an important task to improve productivity in software engineering. Existing code search datasets are problematic: either using unrealistic queries, or with mismatched codes, and typically using one-to-one query-code pairing, which fails to reflect the reality that a query might have multiple valid code matches. T…
▽ More
Semantic code search, retrieving code that matches a given natural language query, is an important task to improve productivity in software engineering. Existing code search datasets are problematic: either using unrealistic queries, or with mismatched codes, and typically using one-to-one query-code pairing, which fails to reflect the reality that a query might have multiple valid code matches. This paper introduces CoSQA+, pairing high-quality queries (reused from CoSQA) with multiple suitable codes. We collect code candidates from diverse sources and form candidate pairs by pairing queries with these codes. Utilizing the power of large language models (LLMs), we automate pair annotation, filtering, and code generation for queries without suitable matches. Through extensive experiments, CoSQA+ has demonstrated superior quality over CoSQA. Models trained on CoSQA+ exhibit improved performance. Furthermore, we propose a new metric Mean Multi-choice Reciprocal Rank (MMRR), to assess one-to-N code search performance. We provide the code and data at https://github.com/DeepSoftwareAnalytics/CoSQA_Plus.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering
Authors:
Zeyu Liu,
Weicong Liang,
Yiming Zhao,
Bohan Chen,
Lin Liang,
Lijuan Wang,
Ji Li,
Yuhui Yuan
Abstract:
Recently, Glyph-ByT5 has achieved highly accurate visual text rendering performance in graphic design images. However, it still focuses solely on English and performs relatively poorly in terms of visual appeal. In this work, we address these two fundamental limitations by presenting Glyph-ByT5-v2 and Glyph-SDXL-v2, which not only support accurate visual text rendering for 10 different languages b…
▽ More
Recently, Glyph-ByT5 has achieved highly accurate visual text rendering performance in graphic design images. However, it still focuses solely on English and performs relatively poorly in terms of visual appeal. In this work, we address these two fundamental limitations by presenting Glyph-ByT5-v2 and Glyph-SDXL-v2, which not only support accurate visual text rendering for 10 different languages but also achieve much better aesthetic quality. To achieve this, we make the following contributions: (i) creating a high-quality multilingual glyph-text and graphic design dataset consisting of more than 1 million glyph-text pairs and 10 million graphic design image-text pairs covering nine other languages, (ii) building a multilingual visual paragraph benchmark consisting of 1,000 prompts, with 100 for each language, to assess multilingual visual spelling accuracy, and (iii) leveraging the latest step-aware preference learning approach to enhance the visual aesthetic quality. With the combination of these techniques, we deliver a powerful customized multilingual text encoder, Glyph-ByT5-v2, and a strong aesthetic graphic generation model, Glyph-SDXL-v2, that can support accurate spelling in 10 different languages. We perceive our work as a significant advancement, considering that the latest DALL-E3 and Ideogram 1.0 still struggle with the multilingual visual text rendering task.
△ Less
Submitted 12 July, 2024; v1 submitted 14 June, 2024;
originally announced June 2024.
-
Adaptive Cooperative Streaming of Holographic Video Over Wireless Networks: A Proximal Policy Optimization Solution
Authors:
Wanli Wen,
Jiping Yan,
Yulu Zhang,
Zhen Huang,
Liang Liang,
Yunjian Jia
Abstract:
Adapting holographic video streaming to fluctuating wireless channels is essential to maintain consistent and satisfactory Quality of Experience (QoE) for users, which, however, is a challenging task due to the dynamic and uncertain characteristics of wireless networks. To address this issue, we propose a holographic video cooperative streaming framework designed for a generic wireless network in…
▽ More
Adapting holographic video streaming to fluctuating wireless channels is essential to maintain consistent and satisfactory Quality of Experience (QoE) for users, which, however, is a challenging task due to the dynamic and uncertain characteristics of wireless networks. To address this issue, we propose a holographic video cooperative streaming framework designed for a generic wireless network in which multiple access points can cooperatively transmit video with different bitrates to multiple users. Additionally, we model a novel QoE metric tailored specifically for holographic video streaming, which can effectively encapsulate the nuances of holographic video quality, quality fluctuations, and rebuffering occurrences simultaneously. Furthermore, we formulate a formidable QoE maximization problem, which is a non-convex mixed integer nonlinear programming problem. Using proximal policy optimization (PPO), a new class of reinforcement learning algorithms, we devise a joint beamforming and bitrate control scheme, which can be wisely adapted to fluctuations in the wireless channel. The numerical results demonstrate the superiority of the proposed scheme over representative baselines.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Fast and Certifiable Trajectory Optimization
Authors:
Shucheng Kang,
Xiaoyang Xu,
Jay Sarva,
Ling Liang,
Heng Yang
Abstract:
We propose semidefinite trajectory optimization (STROM), a framework that computes fast and certifiably optimal solutions for nonconvex trajectory optimization problems defined by polynomial objectives and constraints. STROM employs sparse second-order Lasserre's hierarchy to generate semidefinite program (SDP) relaxations of trajectory optimization. Different from existing tools (e.g., YALMIP and…
▽ More
We propose semidefinite trajectory optimization (STROM), a framework that computes fast and certifiably optimal solutions for nonconvex trajectory optimization problems defined by polynomial objectives and constraints. STROM employs sparse second-order Lasserre's hierarchy to generate semidefinite program (SDP) relaxations of trajectory optimization. Different from existing tools (e.g., YALMIP and SOSTOOLS in Matlab), STROM generates chain-like multiple-block SDPs with only positive semidefinite (PSD) variables. Moreover, STROM does so two orders of magnitude faster. Underpinning STROM is cuADMM, the first ADMM-based SDP solver implemented in CUDA and runs in GPUs. cuADMM builds upon the symmetric Gauss-Seidel ADMM algorithm and leverages GPU parallelization to speedup solving sparse linear systems and projecting onto PSD cones. In five trajectory optimization problems (inverted pendulum, cart-pole, vehicle landing, flying robot, and car back-in), cuADMM computes optimal trajectories (with certified suboptimality below 1%) in minutes (when other solvers take hours or run out of memory) and seconds (when others take minutes). Further, when warmstarted by data-driven initialization in the inverted pendulum problem, cuADMM delivers real-time performance: providing certifiably optimal trajectories in 0.66 seconds despite the SDP has 49,500 variables and 47,351 constraints.
△ Less
Submitted 11 June, 2024; v1 submitted 9 June, 2024;
originally announced June 2024.
-
Double-RIS-Assisted Orbital Angular Momentum Near-Field Secure Communications
Authors:
Liping Liang,
Minmin Wang,
Wenchi Cheng,
Wei Zhang
Abstract:
To satisfy the various demands of growing devices and services, emerging high-frequency-based technologies promote near-field wireless communications. Therefore, near-field physical layer security has attracted much attention to facilitate the wireless information security against illegitimate eavesdropping. However, highly correlated channels between legitimate transceivers and eavesdroppers of e…
▽ More
To satisfy the various demands of growing devices and services, emerging high-frequency-based technologies promote near-field wireless communications. Therefore, near-field physical layer security has attracted much attention to facilitate the wireless information security against illegitimate eavesdropping. However, highly correlated channels between legitimate transceivers and eavesdroppers of existing multiple-input multiple-output (MIMO) based near-field secure technologies along with the low degrees of freedom significantly limit the enhancement of security results in wireless communications. To significantly increase the secrecy rates of near-field wireless communications, in this paper we propose the double-reconfigurable-intelligent-surface (RIS) assisted orbital angular momentum (OAM) secure scheme, where RISs with few reflecting elements are easily deployed to reconstruct the direct links blocked by obstacles between the legitimate transceivers, mitigate the inter-mode interference caused by the misalignment of legitimate transceivers, and adjust the OAM beams direction to interfere with eavesdroppers. Meanwhile, due to the unique orthogonality among OAM modes, the OAM-based joint index modulation and artificial noise scheme is proposed to weaken the information acquisition by eavesdroppers while increasing the achievable rate with the low cost of legitimate communications. To maximize the secrecy rate of our proposed scheme, we develop the Riemannian manifold conjugate gradient (RMCG)-based alternative optimization (AO) algorithm to jointly optimize the transmit power allocation of OAM modes and phase shifts of double RISs. Numerical results show that our proposed double-RIS-assisted OAM near-field secure scheme outperforms the existing works in terms of the secrecy rate and the eavesdropper's bit error rate.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
Integrated Sensing and Communication for Anti-Jamming with OAM
Authors:
Liping Liang,
Wenchi Cheng,
Wei Zhang,
Zhuohui Yao
Abstract:
The spectrum share and open nature of wireless channels enable integrated sensing and communication (ISAC) susceptible to hostile jamming attacks. Due to the intrinsic orthogonality and rich azimuth angle information of orbital angular momentum (OAM), vortex electromagnetic waves with helical phase fronts have shown great potential to achieve high-resolution imaging and strong anti-jamming capabil…
▽ More
The spectrum share and open nature of wireless channels enable integrated sensing and communication (ISAC) susceptible to hostile jamming attacks. Due to the intrinsic orthogonality and rich azimuth angle information of orbital angular momentum (OAM), vortex electromagnetic waves with helical phase fronts have shown great potential to achieve high-resolution imaging and strong anti-jamming capability of wireless communication. Focusing on significantly enhancing the anti-jamming results of ISAC systems with limited bandwidth under hostile jamming, in this paper we propose a novel ISAC for anti-jamming with OAM scheme, where the OAM legitimate transmitter can simultaneously sense the position of jammers with dynamic behavior and send data to multiple OAM legitimate users. Specifically, the OAM modes for sensing and communications are respectively hopped according to pre-set index modulation information to suppress jamming. To acquire the position of the jammer, we develop the enhanced multiple-signal-classification-based three-dimension position estimation scheme with continuous sensing in both frequency and angular domains, where the OAM transmitter is designed with the concentric uniform-circular-array mono-static method, to significantly increase the azimuthal resolution. Then, based on the acquired jamming channel state information, we develop the joint transmit-receive beamforming and power allocation scheme, where the transmit and receive beamforming matrices are dynamically adjusted to mitigate the mixed interference containing inter-mode interference, inter-user interference, and jamming, thus maximizing the achievable sum rates (ASRs) of all users. Numerical results demonstrate that our proposed scheme can significantly increase the ASR under broadband jamming attacks and achieve high detection accuracy of targets .
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
Sign is Not a Remedy: Multiset-to-Multiset Message Passing for Learning on Heterophilic Graphs
Authors:
Langzhang Liang,
Sunwoo Kim,
Kijung Shin,
Zenglin Xu,
Shirui Pan,
Yuan Qi
Abstract:
Graph Neural Networks (GNNs) have gained significant attention as a powerful modeling and inference method, especially for homophilic graph-structured data. To empower GNNs in heterophilic graphs, where adjacent nodes exhibit dissimilar labels or features, Signed Message Passing (SMP) has been widely adopted. However, there is a lack of theoretical and empirical analysis regarding the limitations…
▽ More
Graph Neural Networks (GNNs) have gained significant attention as a powerful modeling and inference method, especially for homophilic graph-structured data. To empower GNNs in heterophilic graphs, where adjacent nodes exhibit dissimilar labels or features, Signed Message Passing (SMP) has been widely adopted. However, there is a lack of theoretical and empirical analysis regarding the limitations of SMP. In this work, we unveil some potential pitfalls of SMP and their remedies. We first identify two limitations of SMP: undesirable representation update for multi-hop neighbors and vulnerability against oversmoothing issues. To overcome these challenges, we propose a novel message passing function called Multiset to Multiset GNN(M2M-GNN). Our theoretical analyses and extensive experiments demonstrate that M2M-GNN effectively alleviates the aforementioned limitations of SMP, yielding superior performance in comparison
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Hyper-Transformer for Amodal Completion
Authors:
Jianxiong Gao,
Xuelin Qian,
Longfei Liang,
Junwei Han,
Yanwei Fu
Abstract:
Amodal object completion is a complex task that involves predicting the invisible parts of an object based on visible segments and background information. Learning shape priors is crucial for effective amodal completion, but traditional methods often rely on two-stage processes or additional information, leading to inefficiencies and potential error accumulation. To address these shortcomings, we…
▽ More
Amodal object completion is a complex task that involves predicting the invisible parts of an object based on visible segments and background information. Learning shape priors is crucial for effective amodal completion, but traditional methods often rely on two-stage processes or additional information, leading to inefficiencies and potential error accumulation. To address these shortcomings, we introduce a novel framework named the Hyper-Transformer Amodal Network (H-TAN). This framework utilizes a hyper transformer equipped with a dynamic convolution head to directly learn shape priors and accurately predict amodal masks. Specifically, H-TAN uses a dual-branch structure to extract multi-scale features from both images and masks. The multi-scale features from the image branch guide the hyper transformer in learning shape priors and in generating the weights for dynamic convolution tailored to each instance. The dynamic convolution head then uses the features from the mask branch to predict precise amodal masks. We extensively evaluate our model on three benchmark datasets: KINS, COCOA-cls, and D2SA, where H-TAN demonstrated superior performance compared to existing methods. Additional experiments validate the effectiveness and stability of the novel hyper transformer in our framework.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts
Authors:
Chunjing Gan,
Dan Yang,
Binbin Hu,
Hanxiao Zhang,
Siyuan Li,
Ziqi Liu,
Yue Shen,
Lin Ju,
Zhiqiang Zhang,
Jinjie Gu,
Lei Liang,
Jun Zhou
Abstract:
In recent years, large language models (LLMs) have made remarkable achievements in various domains. However, the untimeliness and cost of knowledge updates coupled with hallucination issues of LLMs have curtailed their applications in knowledge intensive tasks, where retrieval augmented generation (RAG) can be of help. Nevertheless, existing retrieval augmented models typically use similarity as a…
▽ More
In recent years, large language models (LLMs) have made remarkable achievements in various domains. However, the untimeliness and cost of knowledge updates coupled with hallucination issues of LLMs have curtailed their applications in knowledge intensive tasks, where retrieval augmented generation (RAG) can be of help. Nevertheless, existing retrieval augmented models typically use similarity as a bridge between queries and documents and follow a retrieve then read procedure. In this work, we argue that similarity is not always the panacea and totally relying on similarity would sometimes degrade the performance of retrieval augmented generation. To this end, we propose MetRag, a Multi layEred Thoughts enhanced Retrieval Augmented Generation framework. To begin with, beyond existing similarity oriented thought, we embrace a small scale utility model that draws supervision from an LLM for utility oriented thought and further come up with a smarter model by comprehensively combining the similarity and utility oriented thoughts. Furthermore, given the fact that the retrieved document set tends to be huge and using them in isolation makes it difficult to capture the commonalities and characteristics among them, we propose to make an LLM as a task adaptive summarizer to endow retrieval augmented generation with compactness-oriented thought. Finally, with multi layered thoughts from the precedent stages, an LLM is called for knowledge augmented generation. Extensive experiments on knowledge-intensive tasks have demonstrated the superiority of MetRag.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Improving and Evaluating Machine Learning Methods for Forensic Shoeprint Matching
Authors:
Divij Jain,
Saatvik Kher,
Lena Liang,
Yufeng Wu,
Ashley Zheng,
Xizhen Cai,
Anna Plantinga,
Elizabeth Upton
Abstract:
We propose a machine learning pipeline for forensic shoeprint pattern matching that improves on the accuracy and generalisability of existing methods. We extract 2D coordinates from shoeprint scans using edge detection and align the two shoeprints with iterative closest point (ICP). We then extract similarity metrics to quantify how well the two prints match and use these metrics to train a random…
▽ More
We propose a machine learning pipeline for forensic shoeprint pattern matching that improves on the accuracy and generalisability of existing methods. We extract 2D coordinates from shoeprint scans using edge detection and align the two shoeprints with iterative closest point (ICP). We then extract similarity metrics to quantify how well the two prints match and use these metrics to train a random forest that generates a probabilistic measurement of how likely two prints are to have originated from the same outsole. We assess the generalisability of machine learning methods trained on lab shoeprint scans to more realistic crime scene shoeprint data by evaluating the accuracy of our methods on several shoeprint scenarios: partial prints, prints with varying levels of blurriness, prints with different amounts of wear, and prints from different shoe models. We find that models trained on one type of shoeprint yield extremely high levels of accuracy when tested on shoeprint pairs of the same scenario but fail to generalise to other scenarios. We also discover that models trained on a variety of scenarios predict almost as accurately as models trained on specific scenarios.
△ Less
Submitted 2 April, 2024;
originally announced May 2024.
-
Multi-domain Knowledge Graph Collaborative Pre-training and Prompt Tuning for Diverse Downstream Tasks
Authors:
Yichi Zhang,
Binbin Hu,
Zhuo Chen,
Lingbing Guo,
Ziqi Liu,
Zhiqiang Zhang,
Lei Liang,
Huajun Chen,
Wen Zhang
Abstract:
Knowledge graphs (KGs) provide reliable external knowledge for a wide variety of AI tasks in the form of structured triples. Knowledge graph pre-training (KGP) aims to pre-train neural networks on large-scale KGs and provide unified interfaces to enhance different downstream tasks, which is a key direction for KG management, maintenance, and applications. Existing works often focus on purely resea…
▽ More
Knowledge graphs (KGs) provide reliable external knowledge for a wide variety of AI tasks in the form of structured triples. Knowledge graph pre-training (KGP) aims to pre-train neural networks on large-scale KGs and provide unified interfaces to enhance different downstream tasks, which is a key direction for KG management, maintenance, and applications. Existing works often focus on purely research questions in open domains, or they are not open source due to data security and privacy in real scenarios. Meanwhile, existing studies have not explored the training efficiency and transferability of KGP models in depth. To address these problems, We propose a framework MuDoK to achieve multi-domain collaborative pre-training and efficient prefix prompt tuning to serve diverse downstream tasks like recommendation and text understanding. Our design is a plug-and-play prompt learning approach that can be flexibly adapted to different downstream task backbones. In response to the lack of open-source benchmarks, we constructed a new multi-domain KGP benchmark called KPI with two large-scale KGs and six different sub-domain tasks to evaluate our method and open-sourced it for subsequent research. We evaluated our approach based on constructed KPI benchmarks using diverse backbone models in heterogeneous downstream tasks. The experimental results show that our framework brings significant performance gains, along with its generality, efficiency, and transferability.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
QueryNER: Segmentation of E-commerce Queries
Authors:
Chester Palen-Michel,
Lizzie Liang,
Zhe Wu,
Constantine Lignos
Abstract:
We present QueryNER, a manually-annotated dataset and accompanying model for e-commerce query segmentation. Prior work in sequence labeling for e-commerce has largely addressed aspect-value extraction which focuses on extracting portions of a product title or query for narrowly defined aspects. Our work instead focuses on the goal of dividing a query into meaningful chunks with broadly applicable…
▽ More
We present QueryNER, a manually-annotated dataset and accompanying model for e-commerce query segmentation. Prior work in sequence labeling for e-commerce has largely addressed aspect-value extraction which focuses on extracting portions of a product title or query for narrowly defined aspects. Our work instead focuses on the goal of dividing a query into meaningful chunks with broadly applicable types. We report baseline tagging results and conduct experiments comparing token and entity dropping for null and low recall query recovery. Challenging test sets are created using automatic transformations and show how simple data augmentation techniques can make the models more robust to noise. We make the QueryNER dataset publicly available.
△ Less
Submitted 15 May, 2024;
originally announced May 2024.
-
On Ridge Estimation in High-dimensional Rotationally Sparse Linear Regression
Authors:
Libin Liang,
Zhiqiang Tan
Abstract:
Recently, deep neural networks have been found to nearly interpolate training data but still generalize well in various applications. To help understand such a phenomenon, it has been of interest to analyze the ridge estimator and its interpolation limit in high-dimensional regression models. For this motivation, we study the ridge estimator in a rotationally sparse setting of high-dimensional lin…
▽ More
Recently, deep neural networks have been found to nearly interpolate training data but still generalize well in various applications. To help understand such a phenomenon, it has been of interest to analyze the ridge estimator and its interpolation limit in high-dimensional regression models. For this motivation, we study the ridge estimator in a rotationally sparse setting of high-dimensional linear regression, where the signal of a response is aligned with a small number, $d$, of covariates with large or spiked variances, compared with the remaining covariates with small or tail variances, \textit{after} an orthogonal transformation of the covariate vector. We establish high-probability upper and lower bounds on the out-sample and in-sample prediction errors in two distinct regimes depending on the ratio of the effective rank of tail variances over the sample size $n$. The separation of the two regimes enables us to exploit relevant concentration inequalities and derive concrete error bounds without making any oracle assumption or independent components assumption on covariate vectors. Moreover, we derive sufficient and necessary conditions which indicate that the prediction errors of ridge estimation can be of the order $O(\frac{d}{n})$ if and only if the gap between the spiked and tail variances are sufficiently large. We also compare the orders of optimal out-sample and in-sample prediction errors and find that, remarkably, the optimal out-sample prediction error may be significantly smaller than the optimal in-sample one. Finally, we present numerical experiments which empirically confirm our theoretical findings.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
Compatible weak factorization systems and model structures
Authors:
Zhenxing Di,
Liping Li,
Li Liang
Abstract:
In this paper the concept of compatible weak factorization systems in general categories is introduced as a counterpart of compatible complete cotorsion pairs in abelian categories. We describe a method to construct model structures on general categories via two compatible weak factorization systems satisfying certain conditions, and hence generalize a very useful result by Gillespie for abelian m…
▽ More
In this paper the concept of compatible weak factorization systems in general categories is introduced as a counterpart of compatible complete cotorsion pairs in abelian categories. We describe a method to construct model structures on general categories via two compatible weak factorization systems satisfying certain conditions, and hence generalize a very useful result by Gillespie for abelian model structures. As particular examples, we show that weak factorizations systems associated to some classical model structures (for example, the Kan-Quillen model structure on $\mathsf{sSet}$) satisfy these conditions.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
One-way Valley-locked waveguide with large channel achieved by all-dielectric Photonic Crystals
Authors:
Li Liang,
Xiao Zhang,
Chuan Wang,
Jie Liu,
Longzhen Fan,
Chengpeng Liang,
Liang Liang,
Feifei Li,
Qi Wu,
Yin Poo
Abstract:
Nonreciprocity, which denotes the asymmetric or even unidirectional transmission of light, constitutes the cornerstone of modern photonic circuits. In the realm of photonic devices, it has been widely utilized in isolators, circulators and so on. Recent topology in artificial materials, an unprecedented degree of freedom, has been proposed to solve the effect of impurities on nonreciprocal transmi…
▽ More
Nonreciprocity, which denotes the asymmetric or even unidirectional transmission of light, constitutes the cornerstone of modern photonic circuits. In the realm of photonic devices, it has been widely utilized in isolators, circulators and so on. Recent topology in artificial materials, an unprecedented degree of freedom, has been proposed to solve the effect of impurities on nonreciprocal transmission. However, in view of the bulk-edge correspondence, the spatial width of the transmission channel with uniform field distribution is quite narrow and needs further exploration. In this paper, we proposed a one-way valley-locked waveguide with a large channel in an all-dielectric photonic crystal. Quite different from the topological edge modes, the unidirectional property of our waveguide comes from the bulk modes with valley-lock, which can fully utilize the whole dimension of the structure with an efficiency of 100%. Additionally, the electrical field is uniformly distributed across the entire channel, which opens a new avenue for low-loss nonreciprocity devices.
△ Less
Submitted 7 March, 2024;
originally announced May 2024.
-
Elevating electron energy gain and betatron X-ray emission in proton-driven wakefield acceleration
Authors:
Hossein Saberi,
Guoxing Xia,
Linbo Liang,
John Patrick Farmer,
Alexander Pukhov
Abstract:
The long proton beams present at CERN have the potential to evolve into a train of microbunches through the self-modulation instability process. The resonant wakefield generated by a periodic train of proton microbunches can establish a high acceleration field within the plasma, facilitating electron acceleration. This paper investigates the impact of plasma density on resonant wakefield excitatio…
▽ More
The long proton beams present at CERN have the potential to evolve into a train of microbunches through the self-modulation instability process. The resonant wakefield generated by a periodic train of proton microbunches can establish a high acceleration field within the plasma, facilitating electron acceleration. This paper investigates the impact of plasma density on resonant wakefield excitation, thus influencing acceleration of a witness electron bunch and its corresponding betatron radiation within the wakefield. Various scenarios involving different plasma densities are explored through particle-in-cell simulations. The peak wakefield in each scenario is calculated by considering a long pre-modulated proton driver with a fixed peak current. Subsequently, the study delves into the witness beam acceleration in the wakefield and its radiation emission. Elevated plasma density increases both the number of microbunches and the accelerating gradient of each microbunch, consequently resulting in heightened resonant wakefield. Nevertheless, the scaling is disrupted by the saturation of the resonant wakefield due to the nonlinearities. The simulation results reveal that at high plasma densities an intense and broadband radiation spectrum extending into the domain of the hard X-rays and gamma rays is generated. Furthermore, in such instances, the energy gain of the witness beam is significantly enhanced. The impact of wakefield on the witness energy gain and the corresponding radiation spectrum is clearly evident at extremely elevated densities.
△ Less
Submitted 30 April, 2024;
originally announced April 2024.
-
Representations of $\mathbb{N}^{\infty}$-type combinatorial categories
Authors:
Zhenxing Di,
Liping Li,
Li Liang
Abstract:
In this paper we consider representations of certain combinatorial categories, including the poset $\D$ of positive integers and division, the Young lattice $\mathscr{Y}$ of partitions of finite sets, the opposite category of the orbit category $\mathscr{Z}$ of $(\mathbb{Z}, +)$ with respect to nontrivial subgroups, and the category $\mathscr{CI}$ of finite cyclic groups and injective homomorphism…
▽ More
In this paper we consider representations of certain combinatorial categories, including the poset $\D$ of positive integers and division, the Young lattice $\mathscr{Y}$ of partitions of finite sets, the opposite category of the orbit category $\mathscr{Z}$ of $(\mathbb{Z}, +)$ with respect to nontrivial subgroups, and the category $\mathscr{CI}$ of finite cyclic groups and injective homomorphisms. We describe explicit upper bounds for homological degrees of their representations, and deduce that finitely presented representations (resp., representations presented in finite degrees) over a field form abelian subcategories of the representation categories. We also give an explicit description for the category of sheaves over the ringed atomic site $(\mathscr{Z}, \, J_{at}, \, \underline{\mathbb{C}})$, and show that irreducible sheaves are parameterized by primitive roots of the unit.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
ConCLVD: Controllable Chinese Landscape Video Generation via Diffusion Model
Authors:
Dingming Liu,
Shaowei Li,
Ruoyan Zhou,
Lili Liang,
Yongguan Hong,
Fei Chao,
Rongrong Ji
Abstract:
Chinese landscape painting is a gem of Chinese cultural and artistic heritage that showcases the splendor of nature through the deep observations and imaginations of its painters. Limited by traditional techniques, these artworks were confined to static imagery in ancient times, leaving the dynamism of landscapes and the subtleties of artistic sentiment to the viewer's imagination. Recently, emerg…
▽ More
Chinese landscape painting is a gem of Chinese cultural and artistic heritage that showcases the splendor of nature through the deep observations and imaginations of its painters. Limited by traditional techniques, these artworks were confined to static imagery in ancient times, leaving the dynamism of landscapes and the subtleties of artistic sentiment to the viewer's imagination. Recently, emerging text-to-video (T2V) diffusion methods have shown significant promise in video generation, providing hope for the creation of dynamic Chinese landscape paintings. However, challenges such as the lack of specific datasets, the intricacy of artistic styles, and the creation of extensive, high-quality videos pose difficulties for these models in generating Chinese landscape painting videos. In this paper, we propose CLV-HD (Chinese Landscape Video-High Definition), a novel T2V dataset for Chinese landscape painting videos, and ConCLVD (Controllable Chinese Landscape Video Diffusion), a T2V model that utilizes Stable Diffusion. Specifically, we present a motion module featuring a dual attention mechanism to capture the dynamic transformations of landscape imageries, alongside a noise adapter to leverage unsupervised contrastive learning in the latent space. Following the generation of keyframes, we employ optical flow for frame interpolation to enhance video smoothness. Our method not only retains the essence of the landscape painting imageries but also achieves dynamic transitions, significantly advancing the field of artistic video generation. The source code and dataset are available at https://anonymous.4open.science/r/ConCLVD-EFE3.
△ Less
Submitted 19 April, 2024;
originally announced April 2024.
-
AntBatchInfer: Elastic Batch Inference in the Kubernetes Cluster
Authors:
Siyuan Li,
Youshao Xiao,
Fanzhuang Meng,
Lin Ju,
Lei Liang,
Lin Wang,
Jun Zhou
Abstract:
Offline batch inference is a common task in the industry for deep learning applications, but it can be challenging to ensure stability and performance when dealing with large amounts of data and complicated inference pipelines. This paper demonstrated AntBatchInfer, an elastic batch inference framework, which is specially optimized for the non-dedicated cluster. AntBatchInfer addresses these chall…
▽ More
Offline batch inference is a common task in the industry for deep learning applications, but it can be challenging to ensure stability and performance when dealing with large amounts of data and complicated inference pipelines. This paper demonstrated AntBatchInfer, an elastic batch inference framework, which is specially optimized for the non-dedicated cluster. AntBatchInfer addresses these challenges by providing multi-level fault-tolerant capabilities, enabling the stable execution of versatile and long-running inference tasks. It also improves inference efficiency by pipelining, intra-node, and inter-node scaling. It further optimizes the performance in complicated multiple-model batch inference scenarios. Through extensive experiments and real-world statistics, we demonstrate the superiority of our framework in terms of stability and efficiency. In the experiment, it outperforms the baseline by at least $2\times$ and $6\times$ in the single-model or multiple-model batch inference. Also, it is widely used at Ant Group, with thousands of daily jobs from various scenarios, including DLRM, CV, and NLP, which proves its practicability in the industry.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
AntDT: A Self-Adaptive Distributed Training Framework for Leader and Straggler Nodes
Authors:
Youshao Xiao,
Lin Ju,
Zhenglei Zhou,
Siyuan Li,
Zhaoxin Huan,
Dalong Zhang,
Rujie Jiang,
Lin Wang,
Xiaolu Zhang,
Lei Liang,
Jun Zhou
Abstract:
Many distributed training techniques like Parameter Server and AllReduce have been proposed to take advantage of the increasingly large data and rich features. However, stragglers frequently occur in distributed training due to resource contention and hardware heterogeneity, which significantly hampers the training efficiency. Previous works only address part of the stragglers and could not adapti…
▽ More
Many distributed training techniques like Parameter Server and AllReduce have been proposed to take advantage of the increasingly large data and rich features. However, stragglers frequently occur in distributed training due to resource contention and hardware heterogeneity, which significantly hampers the training efficiency. Previous works only address part of the stragglers and could not adaptively solve various stragglers in practice. Additionally, it is challenging to use a systematic framework to address all stragglers because different stragglers require diverse data allocation and fault-tolerance mechanisms. Therefore, this paper proposes a unified distributed training framework called AntDT (Ant Distributed Training Framework) to adaptively solve the straggler problems. Firstly, the framework consists of four components, including the Stateful Dynamic Data Sharding service, Monitor, Controller, and Agent. These components work collaboratively to efficiently distribute workloads and provide a range of pre-defined straggler mitigation methods with fault tolerance, thereby hiding messy details of data allocation and fault handling. Secondly, the framework provides a high degree of flexibility, allowing for the customization of straggler mitigation solutions based on the specific circumstances of the cluster. Leveraging this flexibility, we introduce two straggler mitigation solutions, namely AntDT-ND for non-dedicated clusters and AntDT-DD for dedicated clusters, as practical examples to resolve various types of stragglers at Ant Group. Justified by our comprehensive experiments and industrial deployment statistics, AntDT outperforms other SOTA methods more than 3x in terms of training efficiency. Additionally, in Alipay's homepage recommendation scenario, using AntDT reduces the training duration of the ranking model from 27.8 hours to just 5.4 hours.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
AdaContour: Adaptive Contour Descriptor with Hierarchical Representation
Authors:
Tianyu Ding,
Jinxin Zhou,
Tianyi Chen,
Zhihui Zhu,
Ilya Zharkov,
Luming Liang
Abstract:
Existing angle-based contour descriptors suffer from lossy representation for non-starconvex shapes. By and large, this is the result of the shape being registered with a single global inner center and a set of radii corresponding to a polar coordinate parameterization. In this paper, we propose AdaContour, an adaptive contour descriptor that uses multiple local representations to desirably charac…
▽ More
Existing angle-based contour descriptors suffer from lossy representation for non-starconvex shapes. By and large, this is the result of the shape being registered with a single global inner center and a set of radii corresponding to a polar coordinate parameterization. In this paper, we propose AdaContour, an adaptive contour descriptor that uses multiple local representations to desirably characterize complex shapes. After hierarchically encoding object shapes in a training set and constructing a contour matrix of all subdivided regions, we compute a robust low-rank robust subspace and approximate each local contour by linearly combining the shared basis vectors to represent an object. Experiments show that AdaContour is able to represent shapes more accurately and robustly than other descriptors while retaining effectiveness. We validate AdaContour by integrating it into off-the-shelf detectors to enable instance segmentation which demonstrates faithful performance. The code is available at https://github.com/tding1/AdaContour.
△ Less
Submitted 12 April, 2024;
originally announced April 2024.
-
S3Editor: A Sparse Semantic-Disentangled Self-Training Framework for Face Video Editing
Authors:
Guangzhi Wang,
Tianyi Chen,
Kamran Ghasedi,
HsiangTao Wu,
Tianyu Ding,
Chris Nuesmeyer,
Ilya Zharkov,
Mohan Kankanhalli,
Luming Liang
Abstract:
Face attribute editing plays a pivotal role in various applications. However, existing methods encounter challenges in achieving high-quality results while preserving identity, editing faithfulness, and temporal consistency. These challenges are rooted in issues related to the training pipeline, including limited supervision, architecture design, and optimization strategy. In this work, we introdu…
▽ More
Face attribute editing plays a pivotal role in various applications. However, existing methods encounter challenges in achieving high-quality results while preserving identity, editing faithfulness, and temporal consistency. These challenges are rooted in issues related to the training pipeline, including limited supervision, architecture design, and optimization strategy. In this work, we introduce S3Editor, a Sparse Semantic-disentangled Self-training framework for face video editing. S3Editor is a generic solution that comprehensively addresses these challenges with three key contributions. Firstly, S3Editor adopts a self-training paradigm to enhance the training process through semi-supervision. Secondly, we propose a semantic disentangled architecture with a dynamic routing mechanism that accommodates diverse editing requirements. Thirdly, we present a structured sparse optimization schema that identifies and deactivates malicious neurons to further disentangle impacts from untarget attributes. S3Editor is model-agnostic and compatible with various editing approaches. Our extensive qualitative and quantitative results affirm that our approach significantly enhances identity preservation, editing fidelity, as well as temporal consistency.
△ Less
Submitted 11 April, 2024;
originally announced April 2024.
-
Neural-Symbolic VideoQA: Learning Compositional Spatio-Temporal Reasoning for Real-world Video Question Answering
Authors:
Lili Liang,
Guanglu Sun,
Jin Qiu,
Lizhong Zhang
Abstract:
Compositional spatio-temporal reasoning poses a significant challenge in the field of video question answering (VideoQA). Existing approaches struggle to establish effective symbolic reasoning structures, which are crucial for answering compositional spatio-temporal questions. To address this challenge, we propose a neural-symbolic framework called Neural-Symbolic VideoQA (NS-VideoQA), specificall…
▽ More
Compositional spatio-temporal reasoning poses a significant challenge in the field of video question answering (VideoQA). Existing approaches struggle to establish effective symbolic reasoning structures, which are crucial for answering compositional spatio-temporal questions. To address this challenge, we propose a neural-symbolic framework called Neural-Symbolic VideoQA (NS-VideoQA), specifically designed for real-world VideoQA tasks. The uniqueness and superiority of NS-VideoQA are two-fold: 1) It proposes a Scene Parser Network (SPN) to transform static-dynamic video scenes into Symbolic Representation (SR), structuralizing persons, objects, relations, and action chronologies. 2) A Symbolic Reasoning Machine (SRM) is designed for top-down question decompositions and bottom-up compositional reasonings. Specifically, a polymorphic program executor is constructed for internally consistent reasoning from SR to the final answer. As a result, Our NS-VideoQA not only improves the compositional spatio-temporal reasoning in real-world VideoQA task, but also enables step-by-step error analysis by tracing the intermediate results. Experimental evaluations on the AGQA Decomp benchmark demonstrate the effectiveness of the proposed NS-VideoQA framework. Empirical studies further confirm that NS-VideoQA exhibits internal consistency in answering compositional questions and significantly improves the capability of spatio-temporal and logical inference for VideoQA tasks.
△ Less
Submitted 5 April, 2024;
originally announced April 2024.
-
Attention-based Shape-Deformation Networks for Artifact-Free Geometry Reconstruction of Lumbar Spine from MR Images
Authors:
Linchen Qian,
Jiasong Chen,
Linhai Ma,
Timur Urakov,
Weiyong Gu,
Liang Liang
Abstract:
Lumbar disc degeneration, a progressive structural wear and tear of lumbar intervertebral disc, is regarded as an essential role on low back pain, a significant global health concern. Automated lumbar spine geometry reconstruction from MR images will enable fast measurement of medical parameters to evaluate the lumbar status, in order to determine a suitable treatment. Existing image segmentation-…
▽ More
Lumbar disc degeneration, a progressive structural wear and tear of lumbar intervertebral disc, is regarded as an essential role on low back pain, a significant global health concern. Automated lumbar spine geometry reconstruction from MR images will enable fast measurement of medical parameters to evaluate the lumbar status, in order to determine a suitable treatment. Existing image segmentation-based techniques often generate erroneous segments or unstructured point clouds, unsuitable for medical parameter measurement. In this work, we present $\textit{UNet-DeformSA}$ and $\textit{TransDeformer}$: novel attention-based deep neural networks that reconstruct the geometry of the lumbar spine with high spatial accuracy and mesh correspondence across patients, and we also present a variant of $\textit{TransDeformer}$ for error estimation. Specially, we devise new attention modules with a new attention formula, which integrate image features and tokenized contour features to predict the displacements of the points on a shape template without the need for image segmentation. The deformed template reveals the lumbar spine geometry in an image. Experiment results show that our networks generate artifact-free geometry outputs, and the variant of $\textit{TransDeformer}$ can predict the errors of a reconstructed geometry. Our code is available at https://github.com/linchenq/TransDeformer-Mesh.
△ Less
Submitted 30 April, 2024; v1 submitted 29 March, 2024;
originally announced April 2024.
-
Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers
Authors:
Pingcheng Dong,
Yonghao Tan,
Dong Zhang,
Tianwei Ni,
Xuejiao Liu,
Yu Liu,
Peng Luo,
Luhong Liang,
Shih-Yang Liu,
Xijie Huang,
Huaiyu Zhu,
Yun Pan,
Fengwei An,
Kwang-Ting Cheng
Abstract:
Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs. Previous state-of-the-art works optimize these operations by piece-wise linear approximation and store the parameters in look-up tables (LUT), but most of them require unfriendly high-precision arithmetics such as FP/INT 32 and lack consideration of…
▽ More
Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs. Previous state-of-the-art works optimize these operations by piece-wise linear approximation and store the parameters in look-up tables (LUT), but most of them require unfriendly high-precision arithmetics such as FP/INT 32 and lack consideration of integer-only INT quantization. This paper proposed a genetic LUT-Approximation algorithm namely GQA-LUT that can automatically determine the parameters with quantization awareness. The results demonstrate that GQA-LUT achieves negligible degradation on the challenging semantic segmentation task for both vanilla and linear Transformer models. Besides, proposed GQA-LUT enables the employment of INT8-based LUT-Approximation that achieves an area savings of 81.3~81.7% and a power reduction of 79.3~80.2% compared to the high-precision FP/INT 32 alternatives. Code is available at https:// github.com/PingchengDong/GQA-LUT.
△ Less
Submitted 29 March, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
Towards Efficient Information Fusion: Concentric Dual Fusion Attention Based Multiple Instance Learning for Whole Slide Images
Authors:
Yujian Liu,
Ruoxuan Wu,
Xinjie Shen,
Zihuang Lu,
Lingyu Liang,
Haiyu Zhou,
Shipu Xu,
Shaoai Cai,
Shidang Xu
Abstract:
In the realm of digital pathology, multi-magnification Multiple Instance Learning (multi-mag MIL) has proven effective in leveraging the hierarchical structure of Whole Slide Images (WSIs) to reduce information loss and redundant data. However, current methods fall short in bridging the domain gap between pretrained models and medical imaging, and often fail to account for spatial relationships ac…
▽ More
In the realm of digital pathology, multi-magnification Multiple Instance Learning (multi-mag MIL) has proven effective in leveraging the hierarchical structure of Whole Slide Images (WSIs) to reduce information loss and redundant data. However, current methods fall short in bridging the domain gap between pretrained models and medical imaging, and often fail to account for spatial relationships across different magnifications. Addressing these challenges, we introduce the Concentric Dual Fusion Attention-MIL (CDFA-MIL) framework,which innovatively combines point-to-area feature-colum attention and point-to-point concentric-row attention using concentric patch. This approach is designed to effectively fuse correlated information, enhancing feature representation and providing stronger correlation guidance for WSI analysis. CDFA-MIL distinguishes itself by offering a robust fusion strategy that leads to superior WSI recognition. Its application has demonstrated exceptional performance, significantly surpassing existing MIL methods in accuracy and F1 scores on prominent datasets like Camelyon16 and TCGA-NSCLC. Specifically, CDFA-MIL achieved an average accuracy and F1-score of 93.7\% and 94.1\% respectively on these datasets, marking a notable advancement over traditional MIL approaches.
△ Less
Submitted 21 March, 2024;
originally announced March 2024.
-
InBox: Recommendation with Knowledge Graph using Interest Box Embedding
Authors:
Zezhong Xu,
Yincen Qu,
Wen Zhang,
Lei Liang,
Huajun Chen
Abstract:
Knowledge graphs (KGs) have become vitally important in modern recommender systems, effectively improving performance and interpretability. Fundamentally, recommender systems aim to identify user interests based on historical interactions and recommend suitable items. However, existing works overlook two key challenges: (1) an interest corresponds to a potentially large set of related items, and (…
▽ More
Knowledge graphs (KGs) have become vitally important in modern recommender systems, effectively improving performance and interpretability. Fundamentally, recommender systems aim to identify user interests based on historical interactions and recommend suitable items. However, existing works overlook two key challenges: (1) an interest corresponds to a potentially large set of related items, and (2) the lack of explicit, fine-grained exploitation of KG information and interest connectivity. This leads to an inability to reflect distinctions between entities and interests when modeling them in a single way. Additionally, the granularity of concepts in the knowledge graphs used for recommendations tends to be coarse, failing to match the fine-grained nature of user interests. This homogenization limits the precise exploitation of knowledge graph data and interest connectivity. To address these limitations, we introduce a novel embedding-based model called InBox. Specifically, various knowledge graph entities and relations are embedded as points or boxes, while user interests are modeled as boxes encompassing interaction history. Representing interests as boxes enables containing collections of item points related to that interest. We further propose that an interest comprises diverse basic concepts, and box intersection naturally supports concept combination. Across three training steps, InBox significantly outperforms state-of-the-art methods like HAKG and KGIN on recommendation tasks. Further analysis provides meaningful insights into the variable value of different KG data for recommendations. In summary, InBox advances recommender systems through box-based interest and concept modeling for sophisticated knowledge graph exploitation.
△ Less
Submitted 19 March, 2024;
originally announced March 2024.
-
Prompt-fused framework for Inductive Logical Query Answering
Authors:
Zezhong Xu,
Peng Ye,
Lei Liang,
Huajun Chen,
Wen Zhang
Abstract:
Answering logical queries on knowledge graphs (KG) poses a significant challenge for machine reasoning. The primary obstacle in this task stems from the inherent incompleteness of KGs. Existing research has predominantly focused on addressing the issue of missing edges in KGs, thereby neglecting another aspect of incompleteness: the emergence of new entities. Furthermore, most of the existing meth…
▽ More
Answering logical queries on knowledge graphs (KG) poses a significant challenge for machine reasoning. The primary obstacle in this task stems from the inherent incompleteness of KGs. Existing research has predominantly focused on addressing the issue of missing edges in KGs, thereby neglecting another aspect of incompleteness: the emergence of new entities. Furthermore, most of the existing methods tend to reason over each logical operator separately, rather than comprehensively analyzing the query as a whole during the reasoning process. In this paper, we propose a query-aware prompt-fused framework named Pro-QE, which could incorporate existing query embedding methods and address the embedding of emerging entities through contextual information aggregation. Additionally, a query prompt, which is generated by encoding the symbolic query, is introduced to gather information relevant to the query from a holistic perspective. To evaluate the efficacy of our model in the inductive setting, we introduce two new challenging benchmarks. Experimental results demonstrate that our model successfully handles the issue of unseen entities in logical queries. Furthermore, the ablation study confirms the efficacy of the aggregator and prompt components.
△ Less
Submitted 19 March, 2024;
originally announced March 2024.
-
SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection
Authors:
Hongcheng Zhang,
Liu Liang,
Pengxin Zeng,
Xiao Song,
Zhe Wang
Abstract:
Sparse 3D detectors have received significant attention since the query-based paradigm embraces low latency without explicit dense BEV feature construction. However, these detectors achieve worse performance than their dense counterparts. In this paper, we find the key to bridging the performance gap is to enhance the awareness of rich representations in two modalities. Here, we present a high-per…
▽ More
Sparse 3D detectors have received significant attention since the query-based paradigm embraces low latency without explicit dense BEV feature construction. However, these detectors achieve worse performance than their dense counterparts. In this paper, we find the key to bridging the performance gap is to enhance the awareness of rich representations in two modalities. Here, we present a high-performance fully sparse detector for end-to-end multi-modality 3D object detection. The detector, termed SparseLIF, contains three key designs, which are (1) Perspective-Aware Query Generation (PAQG) to generate high-quality 3D queries with perspective priors, (2) RoI-Aware Sampling (RIAS) to further refine prior queries by sampling RoI features from each modality, (3) Uncertainty-Aware Fusion (UAF) to precisely quantify the uncertainty of each sensor modality and adaptively conduct final multi-modality fusion, thus achieving great robustness against sensor noises. By the time of paper submission, SparseLIF achieves state-of-the-art performance on the nuScenes dataset, ranking 1st on both validation set and test benchmark, outperforming all state-of-the-art 3D object detectors by a notable margin.
△ Less
Submitted 10 July, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Editing Conceptual Knowledge for Large Language Models
Authors:
Xiaohan Wang,
Shengyu Mao,
Ningyu Zhang,
Shumin Deng,
Yunzhi Yao,
Yue Shen,
Lei Liang,
Jinjie Gu,
Huajun Chen
Abstract:
Recently, there has been a growing interest in knowledge editing for Large Language Models (LLMs). Current approaches and evaluations merely explore the instance-level editing, while whether LLMs possess the capability to modify concepts remains unclear. This paper pioneers the investigation of editing conceptual knowledge for LLMs, by constructing a novel benchmark dataset ConceptEdit and establi…
▽ More
Recently, there has been a growing interest in knowledge editing for Large Language Models (LLMs). Current approaches and evaluations merely explore the instance-level editing, while whether LLMs possess the capability to modify concepts remains unclear. This paper pioneers the investigation of editing conceptual knowledge for LLMs, by constructing a novel benchmark dataset ConceptEdit and establishing a suite of new metrics for evaluation. The experimental results reveal that, although existing editing methods can efficiently modify concept-level definition to some extent, they also have the potential to distort the related instantial knowledge in LLMs, leading to poor performance. We anticipate this can inspire further progress in better understanding LLMs. Our project homepage is available at https://zjunlp.github.io/project/ConceptEdit.
△ Less
Submitted 10 March, 2024;
originally announced March 2024.
-
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents
Authors:
Yuqi Zhu,
Shuofei Qiao,
Yixin Ou,
Shumin Deng,
Ningyu Zhang,
Shiwei Lyu,
Yue Shen,
Lei Liang,
Jinjie Gu,
Huajun Chen
Abstract:
Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short when tackling more sophisticated challenges, especially when interacting with environments through generating executable actions. This inadequacy primarily stems from the lack of built-in action knowledge in language agents, which fails to effectively guide the planning trajectories durin…
▽ More
Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short when tackling more sophisticated challenges, especially when interacting with environments through generating executable actions. This inadequacy primarily stems from the lack of built-in action knowledge in language agents, which fails to effectively guide the planning trajectories during task solving and results in planning hallucination. To address this issue, we introduce KnowAgent, a novel approach designed to enhance the planning capabilities of LLMs by incorporating explicit action knowledge. Specifically, KnowAgent employs an action knowledge base and a knowledgeable self-learning strategy to constrain the action path during planning, enabling more reasonable trajectory synthesis, and thereby enhancing the planning performance of language agents. Experimental results on HotpotQA and ALFWorld based on various backbone models demonstrate that KnowAgent can achieve comparable or superior performance to existing baselines. Further analysis indicates the effectiveness of KnowAgent in terms of planning hallucinations mitigation. Code is available in https://github.com/zjunlp/KnowAgent.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
Optimizing Illuminant Estimation in Dual-Exposure HDR Imaging
Authors:
Mahmoud Afifi,
Zhenhua Hu,
Liang Liang
Abstract:
High dynamic range (HDR) imaging involves capturing a series of frames of the same scene, each with different exposure settings, to broaden the dynamic range of light. This can be achieved through burst capturing or using staggered HDR sensors that capture long and short exposures simultaneously in the camera image signal processor (ISP). Within camera ISP pipeline, illuminant estimation is a cruc…
▽ More
High dynamic range (HDR) imaging involves capturing a series of frames of the same scene, each with different exposure settings, to broaden the dynamic range of light. This can be achieved through burst capturing or using staggered HDR sensors that capture long and short exposures simultaneously in the camera image signal processor (ISP). Within camera ISP pipeline, illuminant estimation is a crucial step aiming to estimate the color of the global illuminant in the scene. This estimation is used in camera ISP white-balance module to remove undesirable color cast in the final image. Despite the multiple frames captured in the HDR pipeline, conventional illuminant estimation methods often rely only on a single frame of the scene. In this paper, we explore leveraging information from frames captured with different exposure times. Specifically, we introduce a simple feature extracted from dual-exposure images to guide illuminant estimators, referred to as the dual-exposure feature (DEF). To validate the efficiency of DEF, we employed two illuminant estimators using the proposed DEF: 1) a multilayer perceptron network (MLP), referred to as exposure-based MLP (EMLP), and 2) a modified version of the convolutional color constancy (CCC) to integrate our DEF, that we call ECCC. Both EMLP and ECCC achieve promising results, in some cases surpassing prior methods that require hundreds of thousands or millions of parameters, with only a few hundred parameters for EMLP and a few thousand parameters for ECCC.
△ Less
Submitted 6 April, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Few-Shot Learning for Annotation-Efficient Nucleus Instance Segmentation
Authors:
Yu Ming,
Zihao Wu,
Jie Yang,
Danyi Li,
Yuan Gao,
Changxin Gao,
Gui-Song Xia,
Yuanqing Li,
Li Liang,
Jin-Gang Yu
Abstract:
Nucleus instance segmentation from histopathology images suffers from the extremely laborious and expert-dependent annotation of nucleus instances. As a promising solution to this task, annotation-efficient deep learning paradigms have recently attracted much research interest, such as weakly-/semi-supervised learning, generative adversarial learning, etc. In this paper, we propose to formulate an…
▽ More
Nucleus instance segmentation from histopathology images suffers from the extremely laborious and expert-dependent annotation of nucleus instances. As a promising solution to this task, annotation-efficient deep learning paradigms have recently attracted much research interest, such as weakly-/semi-supervised learning, generative adversarial learning, etc. In this paper, we propose to formulate annotation-efficient nucleus instance segmentation from the perspective of few-shot learning (FSL). Our work was motivated by that, with the prosperity of computational pathology, an increasing number of fully-annotated datasets are publicly accessible, and we hope to leverage these external datasets to assist nucleus instance segmentation on the target dataset which only has very limited annotation. To achieve this goal, we adopt the meta-learning based FSL paradigm, which however has to be tailored in two substantial aspects before adapting to our task. First, since the novel classes may be inconsistent with those of the external dataset, we extend the basic definition of few-shot instance segmentation (FSIS) to generalized few-shot instance segmentation (GFSIS). Second, to cope with the intrinsic challenges of nucleus segmentation, including touching between adjacent cells, cellular heterogeneity, etc., we further introduce a structural guidance mechanism into the GFSIS network, finally leading to a unified Structurally-Guided Generalized Few-Shot Instance Segmentation (SGFSIS) framework. Extensive experiments on a couple of publicly accessible datasets demonstrate that, SGFSIS can outperform other annotation-efficient learning baselines, including semi-supervised learning, simple transfer learning, etc., with comparable performance to fully supervised learning with less than 5% annotations.
△ Less
Submitted 27 February, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
-
Unleashing the Power of Imbalanced Modality Information for Multi-modal Knowledge Graph Completion
Authors:
Yichi Zhang,
Zhuo Chen,
Lei Liang,
Huajun Chen,
Wen Zhang
Abstract:
Multi-modal knowledge graph completion (MMKGC) aims to predict the missing triples in the multi-modal knowledge graphs by incorporating structural, visual, and textual information of entities into the discriminant models. The information from different modalities will work together to measure the triple plausibility. Existing MMKGC methods overlook the imbalance problem of modality information amo…
▽ More
Multi-modal knowledge graph completion (MMKGC) aims to predict the missing triples in the multi-modal knowledge graphs by incorporating structural, visual, and textual information of entities into the discriminant models. The information from different modalities will work together to measure the triple plausibility. Existing MMKGC methods overlook the imbalance problem of modality information among entities, resulting in inadequate modal fusion and inefficient utilization of the raw modality information. To address the mentioned problems, we propose Adaptive Multi-modal Fusion and Modality Adversarial Training (AdaMF-MAT) to unleash the power of imbalanced modality information for MMKGC. AdaMF-MAT achieves multi-modal fusion with adaptive modality weights and further generates adversarial samples by modality-adversarial training to enhance the imbalanced modality information. Our approach is a co-design of the MMKGC model and training strategy which can outperform 19 recent MMKGC methods and achieve new state-of-the-art results on three public MMKGC benchmarks. Our code and data have been released at https://github.com/zjukg/AdaMF-MAT.
△ Less
Submitted 22 February, 2024;
originally announced February 2024.
-
IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus
Authors:
Honghao Gui,
Lin Yuan,
Hongbin Ye,
Ningyu Zhang,
Mengshu Sun,
Lei Liang,
Huajun Chen
Abstract:
Large Language Models (LLMs) demonstrate remarkable potential across various domains; however, they exhibit a significant performance gap in Information Extraction (IE). Note that high-quality instruction data is the vital key for enhancing the specific capabilities of LLMs, while current IE datasets tend to be small in scale, fragmented, and lack standardized schema. To this end, we introduce IEP…
▽ More
Large Language Models (LLMs) demonstrate remarkable potential across various domains; however, they exhibit a significant performance gap in Information Extraction (IE). Note that high-quality instruction data is the vital key for enhancing the specific capabilities of LLMs, while current IE datasets tend to be small in scale, fragmented, and lack standardized schema. To this end, we introduce IEPile, a comprehensive bilingual (English and Chinese) IE instruction corpus, which contains approximately 0.32B tokens. We construct IEPile by collecting and cleaning 33 existing IE datasets, and introduce schema-based instruction generation to unearth a large-scale corpus. Experimentally, IEPile enhance the performance of LLMs for IE, with notable improvements in zero-shot generalization. We open-source the resource and pre-trained models, hoping to provide valuable support to the NLP community.
△ Less
Submitted 26 May, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
An Inexact Halpern Iteration with Application to Distributionally Robust Optimization
Authors:
Ling Liang,
Kim-Chuan Toh,
Jia-Jie Zhu
Abstract:
The Halpern iteration for solving monotone inclusion problems has gained increasing interests in recent years due to its simple form and appealing convergence properties. In this paper, we investigate the inexact variants of the scheme in both deterministic and stochastic settings. We conduct extensive convergence analysis and show that by choosing the inexactness tolerances appropriately, the ine…
▽ More
The Halpern iteration for solving monotone inclusion problems has gained increasing interests in recent years due to its simple form and appealing convergence properties. In this paper, we investigate the inexact variants of the scheme in both deterministic and stochastic settings. We conduct extensive convergence analysis and show that by choosing the inexactness tolerances appropriately, the inexact schemes admit an $O(k^{-1})$ convergence rate in terms of the (expected) residue norm. Our results relax the state-of-the-art inexactness conditions employed in the literature while sharing the same competitive convergence properties. We then demonstrate how the proposed methods can be applied for solving two classes of data-driven Wasserstein distributionally robust optimization problems that admit convex-concave min-max optimization reformulations. We highlight its capability of performing inexact computations for distributionally robust learning with stochastic first-order methods.
△ Less
Submitted 12 February, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Unified Hallucination Detection for Multimodal Large Language Models
Authors:
Xiang Chen,
Chenxi Wang,
Yida Xue,
Ningyu Zhang,
Xiaoyan Yang,
Qiang Li,
Yue Shen,
Lei Liang,
Jinjie Gu,
Huajun Chen
Abstract:
Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs) are plagued by the critical issue of hallucination. The reliable detection of such hallucinations in MLLMs has, therefore, become a vital aspect of model evaluation and the safeguarding of practical application deployment. Prior research in this domain has been constrained by a narrow focus on singular tasks,…
▽ More
Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs) are plagued by the critical issue of hallucination. The reliable detection of such hallucinations in MLLMs has, therefore, become a vital aspect of model evaluation and the safeguarding of practical application deployment. Prior research in this domain has been constrained by a narrow focus on singular tasks, an inadequate range of hallucination categories addressed, and a lack of detailed granularity. In response to these challenges, our work expands the investigative horizons of hallucination detection. We present a novel meta-evaluation benchmark, MHaluBench, meticulously crafted to facilitate the evaluation of advancements in hallucination detection methods. Additionally, we unveil a novel unified multimodal hallucination detection framework, UNIHD, which leverages a suite of auxiliary tools to validate the occurrence of hallucinations robustly. We demonstrate the effectiveness of UNIHD through meticulous evaluation and comprehensive analysis. We also provide strategic insights on the application of specific tools for addressing various categories of hallucinations.
△ Less
Submitted 27 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Comparing Spectral Bias and Robustness For Two-Layer Neural Networks: SGD vs Adaptive Random Fourier Features
Authors:
Aku Kammonen,
Lisi Liang,
Anamika Pandey,
Raúl Tempone
Abstract:
We present experimental results highlighting two key differences resulting from the choice of training algorithm for two-layer neural networks. The spectral bias of neural networks is well known, while the spectral bias dependence on the choice of training algorithm is less studied. Our experiments demonstrate that an adaptive random Fourier features algorithm (ARFF) can yield a spectral bias clos…
▽ More
We present experimental results highlighting two key differences resulting from the choice of training algorithm for two-layer neural networks. The spectral bias of neural networks is well known, while the spectral bias dependence on the choice of training algorithm is less studied. Our experiments demonstrate that an adaptive random Fourier features algorithm (ARFF) can yield a spectral bias closer to zero compared to the stochastic gradient descent optimizer (SGD). Additionally, we train two identically structured classifiers, employing SGD and ARFF, to the same accuracy levels and empirically assess their robustness against adversarial noise attacks.
△ Less
Submitted 31 January, 2024;
originally announced February 2024.
-
Scavenging Hyena: Distilling Transformers into Long Convolution Models
Authors:
Tokiniaina Raharison Ralambomihanta,
Shahrad Mohammadzadeh,
Mohammad Sami Nur Islam,
Wassim Jabbour,
Laurence Liang
Abstract:
The rapid evolution of Large Language Models (LLMs), epitomized by architectures like GPT-4, has reshaped the landscape of natural language processing. This paper introduces a pioneering approach to address the efficiency concerns associated with LLM pre-training, proposing the use of knowledge distillation for cross-architecture transfer. Leveraging insights from the efficient Hyena mechanism, ou…
▽ More
The rapid evolution of Large Language Models (LLMs), epitomized by architectures like GPT-4, has reshaped the landscape of natural language processing. This paper introduces a pioneering approach to address the efficiency concerns associated with LLM pre-training, proposing the use of knowledge distillation for cross-architecture transfer. Leveraging insights from the efficient Hyena mechanism, our method replaces attention heads in transformer models by Hyena, offering a cost-effective alternative to traditional pre-training while confronting the challenge of processing long contextual information, inherent in quadratic attention mechanisms. Unlike conventional compression-focused methods, our technique not only enhances inference speed but also surpasses pre-training in terms of both accuracy and efficiency. In the era of evolving LLMs, our work contributes to the pursuit of sustainable AI solutions, striking a balance between computational power and environmental impact.
△ Less
Submitted 30 January, 2024;
originally announced January 2024.