-
Quantum Plasma Creation near a Magnetar
Authors:
Jonathan Zhang,
Christopher Thompson
Abstract:
Magnetars in quiescent states continue to emit hard X-rays with a power far exceeding the loss of rotational energy. It has recently been noted that this hard X-ray continuum may bear a direct signature of quantum electrodynamic (QED) effects in magnetic fields stronger than the Schwinger field ($B_{\rm Q} = 4.4\times 10^{13}$ G). When the current flowing into the magnetosphere is driven by narrow structures in the solid crust, the $e^\pm$ pair plasma supporting the current relaxes to a collisional and trans-relativistic state. The decay of a pair into two photons produces a broad, bremsstrahlung-like spectrum of hard X-rays, similar to that observed and extending up to $0.5-1$ MeV. The conversion of two gamma rays to a pair is further enhanced by a factor $\sim B/B_{\rm Q}$. Monte Carlo calculations of pair creation in a dipole magnetic field are presented. Non-local particle injection is found to be strong enough to suppress the high voltage that otherwise would accompany polar magnetic twist; the hard X-rays are mostly emitted away from the magnetic poles. Some of the pairs annihilate in an optically thin surface layer. The prototypical anomalous X-ray pulsar 1E 2259$+$586, which shows a hard X-ray continuum but relatively weak torque noise, slow spindown, and no radio emission, is a Rosetta Stone for understanding the magnetar circuit, consistent with the picture advanced here. For a $15-60$ keV luminosity as low as $10^{34}$ erg s$^{-1}$, the polar flux of sub-relativistic pairs produces an optical depth $3-30$ to electron cyclotron scattering in the $1-10$ keV band, reducing the net X-ray polarization.
Submitted 11 July, 2024;
originally announced July 2024.
-
DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding
Authors:
Jincen Jiang,
Qianyu Zhou,
Yuhang Li,
Xuequan Lu,
Meili Wang,
Lizhuang Ma,
Jian Chang,
Jian Jun Zhang
Abstract:
Recent point cloud understanding research suffers from performance drops on unseen data, due to the distribution shifts across different domains. While recent studies use Domain Generalization (DG) techniques to mitigate this by learning domain-invariant features, most are designed for a single task and neglect the potential of testing data. Despite In-Context Learning (ICL) showcasing multi-task learning capability, it usually relies on high-quality context-rich data and considers a single dataset, and has rarely been studied in point cloud understanding. In this paper, we introduce a novel, practical, multi-domain multi-task setting, handling multiple domains and multiple tasks within one unified model for domain generalized point cloud understanding. To this end, we propose Domain Generalized Point-In-Context Learning (DG-PIC) that boosts the generalizability across various tasks and domains at testing time. In particular, we develop dual-level source prototype estimation that considers both global-level shape contextual and local-level geometrical structures for representing source domains and a dual-level test-time feature shifting mechanism that leverages both macro-level domain semantic information and micro-level patch positional relationships to pull the target data closer to the source ones during testing. Our DG-PIC does not require any model updates during testing and can handle unseen domains and multiple tasks, i.e., point cloud reconstruction, denoising, and registration, within one unified model. We also introduce a benchmark for this new setting. Comprehensive experiments demonstrate that DG-PIC outperforms state-of-the-art techniques significantly.
Submitted 11 July, 2024;
originally announced July 2024.
-
Hydrodynamics as the effective field theory of strong-to-weak spontaneous symmetry breaking
Authors:
Xiaoyang Huang,
Marvin Qi,
Jian-Hao Zhang,
Andrew Lucas
Abstract:
Inspired by the hunt for new phases of matter in quantum mixed states, it has recently been proposed that the equivalence of microcanonical and canonical ensembles in statistical mechanics is a manifestation of strong-to-weak spontaneous symmetry breaking (SWSSB) in an underlying many-body quantum description. Here, we build an effective field theory for SWSSB of a global U(1) symmetry; the answer exactly reproduces the Schwinger-Keldysh effective field theory of diffusion for the conserved charge. We conclude that hydrodynamics can be understood as a theory of "superfluidity" for the broken strong symmetry: a non-vanishing susceptibility is a measurable order parameter for SWSSB, the diffusion mode is the Goldstone boson of the spontaneously broken continuous symmetry, and a generalization of Goldstone's Theorem implies that the diffusion mode is always long-lived. This perspective provides a transparent physical explanation for the unusual "reparameterization" symmetries which are a necessary ingredient of Schwinger-Keldysh effective field theories for "normal fluids".
Submitted 9 July, 2024;
originally announced July 2024.
-
SpiralShard: Highly Concurrent and Secure Blockchain Sharding via Linked Cross-shard Endorsement
Authors:
You Lin,
Mingzhe Li,
Jin Zhang
Abstract:
Blockchain sharding improves the scalability of blockchain systems by partitioning the whole blockchain state, nodes, and transaction workloads into different shards. However, existing blockchain sharding systems generally suffer from a small number of shards, resulting in limited concurrency. The main reason is that existing sharding systems require large shard sizes to ensure security. To enhance the concurrency of blockchain sharding securely, we propose SpiralShard. The intuition is to allow the existence of some shards with a larger fraction of malicious nodes (i.e., corrupted shards), thus reducing shard sizes. SpiralShard can configure more and smaller shards for higher concurrency at the same network size. To ensure security with the existence of corrupted shards, we propose the Linked Cross-shard Endorsement (LCE) protocol. According to our LCE protocol, the blocks of each shard are sequentially verified and endorsed by a group of shards before being finalized. As a result, a corrupted shard can eliminate forks with the help of the other shards. We implement SpiralShard based on Harmony and conduct extensive evaluations. Experimental results show that, compared with Harmony, SpiralShard achieves around 19x throughput gain under a large network size with 4,000+ nodes.
Submitted 9 July, 2024;
originally announced July 2024.
-
Incorporating Large Language Models into Production Systems for Enhanced Task Automation and Flexibility
Authors:
Yuchen Xia,
Jize Zhang,
Nasser Jazdi,
Michael Weyrich
Abstract:
This paper introduces a novel approach to integrating large language model (LLM) agents into automated production systems, aimed at enhancing task automation and flexibility. We organize production operations within a hierarchical framework based on the automation pyramid. Atomic operation functionalities are modeled as microservices, which are executed through interface invocation within a dedicated digital twin system. This allows for a scalable and flexible foundation for orchestrating production processes. In this digital twin system, low-level, hardware-specific data is semantically enriched and made interpretable for LLMs for production planning and control tasks. Large language model agents are systematically prompted to interpret these production-specific data and knowledge. Upon receiving a user request or identifying a triggering event, the LLM agents generate a process plan. This plan is then decomposed into a series of atomic operations, executed as microservices within the real-world automation system. We implement this overall approach on an automated modular production facility at our laboratory, demonstrating how the LLMs can handle production planning and control tasks through a concrete case study. This results in an intuitive production facility with higher levels of task automation and flexibility. Finally, we reveal several limitations in realizing the full potential of large language models in autonomous systems and point out promising benefits. Demos from this ongoing research series can be accessed at: https://github.com/YuchenXia/GPT4IndustrialAutomation
Submitted 11 July, 2024;
originally announced July 2024.
-
BriDe Arbitrager: Enhancing Arbitrage in Ethereum 2.0 via Bribery-enabled Delayed Block Production
Authors:
Hulin Yang,
Mingzhe Li,
Jin Zhang,
Alia Asheralieva,
Qingsong Wei,
Siow Mong Rick Goh
Abstract:
The advent of Ethereum 2.0 has introduced significant changes, particularly the shift to Proof-of-Stake consensus. This change presents new opportunities and challenges for arbitrage. Amidst these changes, we introduce BriDe Arbitrager, a novel tool designed for Ethereum 2.0 that leverages Bribery-driven attacks to Delay block production and increase arbitrage gains. The main idea is to allow malicious proposers to delay block production by bribing validators/proposers, thereby gaining more time to identify arbitrage opportunities. Through analysing the bribery process, we design an adaptive bribery strategy. Additionally, we propose a Delayed Transaction Ordering Algorithm to leverage the delayed time to amplify arbitrage profits for malicious proposers. To ensure fairness and automate the bribery process, we design and implement a bribery smart contract and a bribery client. As a result, BriDe Arbitrager enables adversaries controlling a limited (< 1/4) fraction of the voting power to delay block production via bribery and extract more arbitrage profit. Extensive experimental results based on Ethereum historical transactions demonstrate that BriDe Arbitrager yields an average of 8.66 ETH (16,442.23 USD) daily profits. Furthermore, our approach does not trigger any slashing mechanisms and remains effective even under Proposer Builder Separation and other mechanisms that may be adopted by Ethereum.
Submitted 11 July, 2024;
originally announced July 2024.
-
Data-Driven Model Predictive Control for Autonomous Vehicle Steering
Authors:
Jiarui Zhang,
Aijing Kong,
Yu Tang,
Zhichao Lv,
Lulu Guo,
Peng Hang
Abstract:
With the development of autonomous driving technology, there are increasing demands for vehicle control, and MPC has become a widely researched topic in both industry and academia. Existing MPC methods based on vehicle kinematics or dynamics face challenges such as difficult modeling, numerous parameters, strong nonlinearity, and high computational cost. To address these issues, this paper proposes a Data-Driven MPC method for autonomous vehicle steering. This method avoids the need for complex vehicle system modeling and achieves trajectory tracking with relatively low computational time and small errors. We validate the control effectiveness of our algorithm in a specific scenario through CarSim-Simulink simulation and perform a comparative analysis with PID and vehicle kinematics MPC, confirming the feasibility and superiority of the proposed algorithm.
Submitted 11 July, 2024;
originally announced July 2024.
-
One-dimensional flat bands in phosphorene nanoribbons with pentagonal nature
Authors:
Shuo Sun,
Jing-Yang You,
Zhihao Cai,
Jie Su,
Tong Yang,
Xinnan Peng,
Yihe Wang,
Daiyu Geng,
Jian Gou,
Yuli Huang,
Sisheng Duan,
Lan Chen,
Kehui Wu,
Andrew T. S. Wee,
Yuan Ping Feng,
Jia Lin Zhang,
Jiong Lu,
Baojie Feng,
Wei Chen
Abstract:
Materials with topological flat bands can serve as a promising platform to investigate strongly interacting phenomena. However, experimental realization of ideal flat bands is mostly limited to artificial lattices or moiré systems. Here we report a general way to construct one-dimensional (1D) flat bands in phosphorene nanoribbons (PNRs) with pentagonal nature: penta-hexa-PNRs and penta-dodeca-PNRs, wherein the corresponding flat bands are directly verified by using angle-resolved photoemission spectroscopy. We confirm that the observed 1D flat bands originate from the electronic 1D sawtooth and Lieb lattices, respectively, as revealed by the combination of bond-resolved scanning tunneling microscopy, scanning tunneling spectroscopy, tight-binding models, and first-principles calculations. Our study demonstrates a general way to construct 1D flat bands in 1D solid material systems, which provides a robust platform to explore strongly interacting phases of matter.
Submitted 11 July, 2024;
originally announced July 2024.
-
RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL
Authors:
Zhenhe Wu,
Zhongqiu Li,
Jie Zhang,
Mengxiang Li,
Yu Zhao,
Ruiyu Fang,
Zhongjiang He,
Xuelong Li,
Zhoujun Li,
Shuangyong Song
Abstract:
Large language models (LLMs) with in-context learning have significantly improved performance on the text-to-SQL task. Previous works generally focus on using an exclusive SQL generation prompt to improve the LLMs' reasoning ability. However, they mostly struggle to handle large databases with numerous tables and columns, and usually ignore the significance of pre-processing the database and extracting valuable information for more efficient prompt engineering. Based on the above analysis, we propose RB-SQL, a novel retrieval-based LLM framework for in-context prompt engineering, which consists of three modules that retrieve concise tables and columns as schema, and targeted examples for in-context learning. Experimental results demonstrate that our model achieves better performance than several competitive baselines on the public datasets BIRD and Spider.
Submitted 12 July, 2024; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Multimodal contrastive learning for spatial gene expression prediction using histology images
Authors:
Wenwen Min,
Zhiceng Shi,
Jun Zhang,
Jun Wan,
Changmiao Wang
Abstract:
In recent years, the advent of spatial transcriptomics (ST) technology has unlocked unprecedented opportunities for delving into the complexities of gene expression patterns within intricate biological systems. Despite its transformative potential, the prohibitive cost of ST technology remains a significant barrier to its widespread adoption in large-scale studies. An alternative, more cost-effective strategy involves employing artificial intelligence to predict gene expression levels using readily accessible whole-slide images (WSIs) stained with Hematoxylin and Eosin (H&E). However, existing methods have yet to fully capitalize on multimodal information provided by H&E images and ST data with spatial location. In this paper, we propose mclSTExp, a multimodal contrastive learning framework with Transformer and Densenet-121 encoder for Spatial Transcriptomics Expression prediction. We conceptualize each spot as a "word", integrating its intrinsic features with spatial context through the self-attention mechanism of a Transformer encoder. This integration is further enriched by incorporating image features via contrastive learning, thereby enhancing the predictive capability of our model. Our extensive evaluation of mclSTExp on two breast cancer datasets and a skin squamous cell carcinoma dataset demonstrates its superior performance in predicting spatial gene expression. Moreover, mclSTExp has shown promise in interpreting cancer-specific overexpressed genes, elucidating immune-related genes, and identifying specialized spatial domains annotated by pathologists. Our source code is available at https://github.com/shizhiceng/mclSTExp.
Submitted 11 July, 2024;
originally announced July 2024.
-
System Report for CCL24-Eval Task 7: Multi-Error Modeling and Fluency-Targeted Pre-training for Chinese Essay Evaluation
Authors:
Jingshen Zhang,
Xiangyu Yang,
Xinkai Su,
Xinglu Chen,
Tianyou Huang,
Xinying Qiu
Abstract:
This system report presents our approaches and results for the Chinese Essay Fluency Evaluation (CEFE) task at CCL-2024. For Track 1, we optimized predictions for challenging fine-grained error types using binary classification models and trained coarse-grained models on the Chinese Learner 4W corpus. In Track 2, we enhanced performance by constructing a pseudo-dataset with multiple error types per sentence. For Track 3, where we achieved first place, we generated fluency-rated pseudo-data via back-translation for pre-training and used an NSP-based strategy with Symmetric Cross Entropy loss to capture context and mitigate long dependencies. Our methods effectively address key challenges in Chinese Essay Fluency Evaluation.
Submitted 11 July, 2024;
originally announced July 2024.
-
More on Maximally Permissive Similarity Control of Discrete Event Systems
Authors:
Yu Wang,
Zhaohui Zhu,
Rob van Glabbeek,
Jinjin Zhang,
Lixing Tan
Abstract:
Takai proposed a method for constructing a maximally permissive supervisor for the similarity control problem (IEEE Transactions on Automatic Control, 66(7):3197-3204, 2021). This paper points out flaws in his results by providing a counterexample. Inspired by Takai's construction, the notion of a (saturated) (G, R)-automaton is introduced and metatheorems concerning (maximally permissive) supervisors for the similarity control problem are provided in terms of this notion. As an application of these metatheorems, the flaws in Takai's work are corrected.
Submitted 10 July, 2024;
originally announced July 2024.
-
Solving General Natural-Language-Description Optimization Problems with Large Language Models
Authors:
Jihai Zhang,
Wei Wang,
Siyan Guo,
Li Wang,
Fangquan Lin,
Cheng Yang,
Wotao Yin
Abstract:
Optimization problems seek to find the best solution to an objective under a set of constraints, and have been widely investigated in real-world applications. Modeling and solving optimization problems in a specific domain typically require a combination of domain knowledge, mathematical skills, and programming ability, making it difficult for general users and even domain professionals. In this paper, we propose a novel framework called OptLLM that augments LLMs with external solvers. Specifically, OptLLM accepts user queries in natural language, converts them into mathematical formulations and programming code, and calls the solvers to calculate the results for decision-making. In addition, OptLLM supports multi-round dialogues to gradually refine the modeling and solving of optimization problems. To illustrate the effectiveness of OptLLM, we provide tutorials on three typical optimization applications and conduct experiments on both prompt-based GPT models and a fine-tuned Qwen model using a large-scale self-developed optimization dataset. Experimental results show that OptLLM works with various LLMs, and the fine-tuned model achieves an accuracy boost compared to the prompt-based models. Some features of the OptLLM framework have been available for trial since June 2023 (https://opt.alibabacloud.com/chat or https://opt.aliyun.com/chat).
Submitted 9 July, 2024;
originally announced July 2024.
-
Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (645 additional authors not shown)
Abstract:
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15\sigma$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.
Submitted 10 July, 2024;
originally announced July 2024.
-
IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection
Authors:
Mingjin Zhang,
Yuchun Wang,
Jie Guo,
Yunsong Li,
Xinbo Gao,
Jing Zhang
Abstract:
The recent Segment Anything Model (SAM) is a significant advancement in natural image segmentation, exhibiting potent zero-shot performance suitable for various downstream image segmentation tasks. However, directly utilizing the pretrained SAM for the Infrared Small Target Detection (IRSTD) task falls short of achieving satisfactory performance due to a notable domain gap between natural and infrared images. Unlike a visible light camera, a thermal imager reveals an object's temperature distribution by capturing infrared radiation. Small targets often show a subtle temperature transition at the object's boundaries. To address this issue, we propose the IRSAM model for IRSTD, which improves SAM's encoder-decoder architecture to learn better feature representation of infrared small objects. Specifically, we design a Perona-Malik diffusion (PMD)-based block and incorporate it into multiple levels of SAM's encoder to help it capture essential structural features while suppressing noise. Additionally, we devise a Granularity-Aware Decoder (GAD) to fuse the multi-granularity feature from the encoder to capture structural information that may be lost in long-distance modeling. Extensive experiments on the public datasets, including NUAA-SIRST, NUDT-SIRST, and IRSTD-1K, validate the design choice of IRSAM and its significant superiority over representative state-of-the-art methods. The source code is available at: github.com/IPIC-Lab/IRSAM.
Submitted 10 July, 2024;
originally announced July 2024.
-
On polynomial convergence to tangent cones for singular Kähler-Einstein metrics
Authors:
Junsheng Zhang
Abstract:
Let $(Z,p)$ be a pointed Gromov-Hausdorff limit of non-collapsing Kähler-Einstein metrics with uniformly bounded Ricci curvature. We show that the singular Kähler-Einstein metric on $Z$ is conical at $p$ if and only if $\mathcal C=W$ in Donaldson-Sun's two-step degeneration theory, assuming curvature grows at most quadratically near $p$.
Let $(X,p)$ be a germ of an isolated log terminal algebraic singularity. Following Hein-Sun's approach, we show that if $\mathcal C=W$ in the two-step stable degeneration of $(X,p)$ and $\mathcal C$ has a smooth link, then every singular Kähler-Einstein metric on $X$ with non-positive Ricci curvature and bounded potential is conical at $p$.
Submitted 10 July, 2024;
originally announced July 2024.
-
Tracing Back the Malicious Clients in Poisoning Attacks to Federated Learning
Authors:
Yuqi Jia,
Minghong Fang,
Hongbin Liu,
Jinghuai Zhang,
Neil Zhenqiang Gong
Abstract:
Poisoning attacks compromise the training phase of federated learning (FL) such that the learned global model misclassifies attacker-chosen inputs called target inputs. Existing defenses mainly focus on protecting the training phase of FL such that the learned global model is poison-free. However, these defenses often achieve limited effectiveness when the clients' local training data is highly non-iid or the number of malicious clients is large, as confirmed in our experiments. In this work, we propose FLForensics, the first poison-forensics method for FL. FLForensics complements existing training-phase defenses. In particular, when training-phase defenses fail and a poisoned global model is deployed, FLForensics aims to trace back the malicious clients that performed the poisoning attack after a misclassified target input is identified. We theoretically show that FLForensics can accurately distinguish between benign and malicious clients under a formal definition of poisoning attack. Moreover, we empirically show the effectiveness of FLForensics at tracing back both existing and adaptive poisoning attacks on five benchmark datasets.
Submitted 9 July, 2024;
originally announced July 2024.
-
PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods
Authors:
Yiying Wang,
Xiaojing Li,
Binzhu Wang,
Yueyang Zhou,
Han Ji,
Hong Chen,
Jinshi Zhang,
Fei Yu,
Zewei Zhao,
Song Jin,
Renji Gong,
Wanqing Xu
Abstract:
In domain-specific applications, GPT-4, augmented with precise prompts or Retrieval-Augmented Generation (RAG), shows notable potential but faces the critical tri-lemma of performance, cost, and data privacy. High performance requires sophisticated processing techniques, yet managing multiple agents within a complex workflow often proves costly and challenging. To address this, we introduce the PEER (Plan, Execute, Express, Review) multi-agent framework. This systematizes domain-specific tasks by integrating precise question decomposition, advanced information retrieval, comprehensive summarization, and rigorous self-assessment. Given the concerns of cost and data privacy, enterprises are shifting from proprietary models like GPT-4 to custom models, striking a balance between cost, security, and performance. We developed industrial practices leveraging online data and user feedback for efficient model tuning. This study provides best practice guidelines for applying multi-agent systems in domain-specific problem-solving and implementing effective agent tuning strategies. Our empirical studies, particularly in the financial question-answering domain, demonstrate that our approach achieves 95.0% of GPT-4's performance, while effectively managing costs and ensuring data privacy.
Submitted 9 July, 2024; v1 submitted 9 July, 2024;
originally announced July 2024.
-
Optimization of noncollinear magnetic ordering temperature in Y-type hexaferrite by machine learning
Authors:
Yonghong Li,
Jing Zhang,
Linfeng Jiang,
Long Zhang,
Yugang Zhang,
Xueliang Wu,
Yisheng Chai,
Xiaoyuan Zhou,
Zizhen Zhou
Abstract:
Searching for the optimal doping compositions of the Y-type hexaferrite Ba2Mg2Fe12O22 to enhance the non-collinear magnetic transition temperature (TNC) remains a long-standing challenge. Instead of the conventional trial-and-error approach, the composition-property descriptor is established via a data-driven machine learning method named SISSO (sure independence screening and sparsifying operator). Based on the chosen efficient and physically interpretable descriptor, a series of Y-type hexaferrite compositions are predicted to hold high TNC, among which BaSrMg0.28Co1.72Fe10Al2O22 is then experimentally validated. Test results indicate that, under appropriate external magnetic field conditions, the TNC of this composition reaches up to 568 K, and its magnetic transition temperature is also elevated to 735 K. This work offers a machine learning-based route to develop room-temperature single-phase multiferroics for device applications.
Submitted 9 July, 2024;
originally announced July 2024.
-
SP-Chain: Boosting Intra-Shard and Cross-Shard Security and Performance in Blockchain Sharding
Authors:
Mingzhe Li,
You Lin,
Wei Wang,
Jin Zhang
Abstract:
A promising way to overcome the scalability limitations of the current blockchain is to use sharding, which splits the transaction processing among multiple, smaller groups of nodes. A well-performing blockchain sharding system requires both high performance and high security from both intra- and cross-shard perspectives. However, existing protocols either have issues protecting security or trade off great performance for security. In this paper, we propose SP-Chain, a blockchain sharding system with enhanced Security and Performance from both intra- and cross-shard perspectives. For the intra-shard aspect, we design a two-phase concurrent voting scheme to provide high system throughput and low transaction confirmation latency. Moreover, we propose an efficient unbiased leader rotation scheme to ensure high performance under malicious behavior. For the cross-shard aspect, a proof-assisted efficient cross-shard transaction processing mechanism is proposed to guard the cross-shard transactions with low overhead. We implement SP-Chain based on Harmony, and evaluate its performance via large-scale deployment. Extensive evaluations suggest that SP-Chain can process more than 10,000 tx/sec under malicious behaviors with a confirmation latency of 7.6s in a network of 4,000 nodes.
Submitted 9 July, 2024;
originally announced July 2024.
-
DL-Chain: Scalable and Stable Blockchain Sharding with High Concurrency via Dual-Layer Consensus
Authors:
You Lin,
Mingzhe Li,
Qingsong Wei,
Yong Liu,
Siow Mong Rick Goh,
Jin Zhang
Abstract:
Sharding enhances blockchain scalability by partitioning nodes into multiple groups for concurrent transaction processing. Configuring a large number of \emph{small shards} helps improve the transaction concurrency of a sharding system. However, it increases the fraction of malicious nodes within each shard, easily leading to shard corruption and jeopardizing system security. Some existing works have attempted to improve concurrency by reducing the shard size while maintaining security. However, they often require frequent and time-consuming recovery of corrupted shards, leading to severe system stagnation. Also, they usually require network-wide consensus to guarantee security, which limits scalability.
To address these issues, we propose DL-Chain, a blockchain sharding system that can securely provide \emph{high concurrency with stable and scalable performance.} Our core idea is a \underline{D}ual-\underline{L}ayer architecture and consensus, which consists of numerous smaller proposer shards (PSs) for transaction processing and multiple larger finalizer committees (FCs) for transaction finalization. To avoid system stagnation and thus guarantee stable performance, we ensure PSs' liveness even if they are corrupted through the cooperation of PSs and FCs, thus eliminating the recovery process of corrupted PSs. To better trade-off security and scalability, we fine-tune the FCs to enable multiple FCs to coexist securely. As a result, DL-Chain allows a larger fraction of malicious nodes in each PS ($<1/2$) and thus can securely configure smaller shards for boosted stable and scalable concurrency. Evaluation results show that DL-Chain achieves up to 10 times improvement in throughput compared to existing solutions and provides stable concurrency with up to 2,550 nodes.
Submitted 9 July, 2024;
originally announced July 2024.
-
Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task
Authors:
Yiran Yang,
Jinchao Zhang,
Ying Deng,
Jie Zhou
Abstract:
Inspired by the success of the text-to-image (T2I) generation task, many researchers are devoting themselves to the text-to-video (T2V) generation task. Most of the T2V frameworks usually inherit from the T2I model and add extra-temporal layers of training to generate dynamic videos, which can be viewed as a fine-tuning task. However, the traditional 3D-Unet is a serial mode and the temporal layers follow the spatial layers, which will result in high GPU memory and training time consumption according to its serial feature flow. We believe that this serial mode will bring more training costs with the large diffusion model and massive datasets, which are not environmentally friendly and not suitable for the development of the T2V. Therefore, we propose a highly efficient spatial-temporal parallel training paradigm for T2V tasks, named Mobius. In our 3D-Unet, the temporal layers and spatial layers are parallel, which optimizes the feature flow and backpropagation. The Mobius will save 24% GPU memory and 12% training time, which can greatly improve the T2V fine-tuning task and provide a novel insight for the AIGC community. We will release our codes in the future.
Submitted 11 July, 2024; v1 submitted 9 July, 2024;
originally announced July 2024.
-
Robust and Explainable Framework to Address Data Scarcity in Diagnostic Imaging
Authors:
Zehui Zhao,
Laith Alzubaidi,
Jinglan Zhang,
Ye Duan,
Usman Naseem,
Yuantong Gu
Abstract:
Deep learning has significantly advanced automatic medical diagnostics and released the occupation of human resources to reduce clinical pressure, yet the persistent challenge of data scarcity in this area hampers its further improvements and applications. To address this gap, we introduce a novel ensemble framework called `Efficient Transfer and Self-supervised Learning based Ensemble Framework' (ETSEF). ETSEF leverages features from multiple pre-trained deep learning models to efficiently learn powerful representations from a limited number of data samples. To the best of our knowledge, ETSEF is the first strategy that combines two pre-training methodologies (Transfer Learning and Self-supervised Learning) with ensemble learning approaches. Various data enhancement techniques, including data augmentation, feature fusion, feature selection, and decision fusion, have also been deployed to maximise the efficiency and robustness of the ETSEF model. Five independent medical imaging tasks, including endoscopy, breast cancer, monkeypox, brain tumour, and glaucoma detection, were tested to demonstrate ETSEF's effectiveness and robustness. Facing limited sample numbers and challenging medical tasks, ETSEF has proved its effectiveness by improving diagnostics accuracies from 10\% to 13.3\% when compared to strong ensemble baseline models and up to 14.4\% improvements compared with published state-of-the-art methods. Moreover, we emphasise the robustness and trustworthiness of the ETSEF method through various vision-explainable artificial intelligence techniques, including Grad-CAM, SHAP, and t-SNE. Compared to those large-scale deep learning models, ETSEF can be deployed flexibly and maintain superior performance for challenging medical imaging tasks, showing the potential to be applied to more areas that lack training data.
Submitted 9 July, 2024;
originally announced July 2024.
-
The k-Facility Location Problem Via Optimal Transport: A Bayesian Study of the Percentile Mechanisms
Authors:
Gennaro Auricchio,
Jie Zhang
Abstract:
In this paper, we investigate the $k$-Facility Location Problem ($k$-FLP) within the Bayesian Mechanism Design framework, in which agents' preferences are samples of a probability distribution on a line. Our primary contribution is characterising the asymptotic behavior of percentile mechanisms, which varies according to the distribution governing the agents' types. To achieve this, we connect the $k$-FLP and projection problems in the Wasserstein space. Owing to this relation, we show that the ratio between the expected cost of a percentile mechanism and the expected optimal cost is asymptotically bounded. Furthermore, we characterize the limit of this ratio and analyze its convergence speed. Our asymptotic study is complemented by deriving an upper bound on the Bayesian approximation ratio, applicable when the number of agents $n$ exceeds the number of facilities $k$. We also characterize the optimal percentile mechanism for a given agent's distribution through a system of $k$ equations. Finally, we estimate the optimality loss incurred when the optimal percentile mechanism is derived using an approximation of the agents' distribution rather than the actual distribution.
Submitted 8 July, 2024;
originally announced July 2024.
-
PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models
Authors:
Jinhua Zhang,
Hualian Sheng,
Sijia Cai,
Bing Deng,
Qiao Liang,
Wen Li,
Ying Fu,
Jieping Ye,
Shuhang Gu
Abstract:
Controllable generation is considered a potentially vital approach to address the challenge of annotating 3D data, and the precision of such controllable generation becomes particularly imperative in the context of data production for autonomous driving. Existing methods focus on the integration of diverse generative information into controlling inputs, utilizing frameworks such as GLIGEN or ControlNet, to produce commendable outcomes in controllable generation. However, such approaches intrinsically restrict generation performance to the learning capacities of predefined network architectures. In this paper, we explore the integration of controlling information and introduce PerlDiff (Perspective-Layout Diffusion Models), a method for effective street view image generation that fully leverages perspective 3D geometric information. Our PerlDiff employs 3D geometric priors to guide the generation of street view images with precise object-level control within the network learning process, resulting in a more robust and controllable output. Moreover, it demonstrates superior controllability compared to alternative layout control methods. Empirical results justify that our PerlDiff markedly enhances the precision of generation on the NuScenes and KITTI datasets. Our codes and models are publicly available at https://github.com/LabShuHangGU/PerlDiff.
Submitted 8 July, 2024;
originally announced July 2024.
-
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
Authors:
Jinliang Lu,
Ziliang Pang,
Min Xiao,
Yaochen Zhu,
Rui Xia,
Jiajun Zhang
Abstract:
The remarkable success of Large Language Models (LLMs) has ushered natural language processing (NLP) research into a new era. Despite their diverse capabilities, LLMs trained on different corpora exhibit varying strengths and weaknesses, leading to challenges in maximizing their overall efficiency and versatility. To address these challenges, recent studies have explored collaborative strategies for LLMs. This paper provides a comprehensive overview of this emerging research area, highlighting the motivation behind such collaborations. Specifically, we categorize collaborative strategies into three primary approaches: Merging, Ensemble, and Cooperation. Merging involves integrating multiple LLMs in the parameter space. Ensemble combines the outputs of various LLMs. Cooperation leverages different LLMs to allow full play to their diverse capabilities for specific tasks. We provide in-depth introductions to these methods from different perspectives and discuss their potential applications. Additionally, we outline future research directions, hoping this work will catalyze further studies on LLM collaborations and pave the way for advanced NLP applications.
Submitted 8 July, 2024;
originally announced July 2024.
-
Near-Optimal MIMO Detection Using Gradient-Based MCMC in Discrete Spaces
Authors:
Xingyu Zhou,
Le Liang,
Jing Zhang,
Chao-Kai Wen,
Shi Jin
Abstract:
The discrete nature of transmitted symbols poses challenges for achieving optimal detection in multiple-input multiple-output (MIMO) systems associated with a large number of antennas. Recently, the combination of two powerful machine learning methods, Markov chain Monte Carlo (MCMC) sampling and gradient descent, has emerged as a highly efficient solution to address this issue. However, existing gradient-based MCMC detectors are heuristically designed and thus are theoretically untenable. To bridge this gap, we introduce a novel sampling algorithm tailored for discrete spaces. This algorithm leverages gradients from the underlying continuous spaces for acceleration while maintaining the validity of probabilistic sampling. We prove the convergence of this method and also analyze its convergence rate using both MCMC theory and empirical diagnostics. On this basis, we develop a MIMO detector that precisely samples from the target discrete distribution and generates posterior Bayesian estimates using these samples, whose performance is thereby theoretically guaranteed. Furthermore, our proposed detector is highly parallelizable and scalable to large MIMO dimensions, positioning it as a compelling candidate for next-generation wireless networks. Simulation results show that our detector achieves near-optimal performance, significantly outperforms state-of-the-art baselines, and showcases resilience to various system setups.
Submitted 8 July, 2024;
originally announced July 2024.
-
DFedSat: Communication-Efficient and Robust Decentralized Federated Learning for LEO Satellite Constellations
Authors:
Minghao Yang,
Jingjing Zhang,
Shengyun Liu
Abstract:
Low Earth Orbit (LEO) satellites play a crucial role in the development of 6G mobile networks and space-air-ground integrated systems. Recent advancements in space technology have empowered LEO satellites with the capability to run AI applications. However, centralized approaches, where ground stations (GSs) act as servers and satellites as clients, often encounter slow convergence and inefficiencies due to intermittent connectivity between satellites and GSs. In contrast, decentralized federated learning (DFL) offers a promising alternative by facilitating direct communication between satellites (clients) via inter-satellite links (ISLs). However, inter-plane ISLs connecting satellites from different orbital planes are dynamic due to Doppler shifts and pointing limitations. This could impact model propagation and lead to slower convergence. To mitigate these issues, we propose DFedSat, a fully decentralized federated learning framework tailored for LEO satellites. DFedSat accelerates the training process by employing two adaptive mechanisms for intra-plane and inter-plane model aggregation, respectively. Furthermore, a self-compensation mechanism is integrated to enhance the robustness of inter-plane ISLs against transmission failure. Additionally, we derive the sublinear convergence rate for the non-convex case of DFedSat. Extensive experimental results demonstrate DFedSat's superiority over other DFL baselines regarding convergence rate, communication efficiency, and resilience to unreliable links.
Submitted 8 July, 2024;
originally announced July 2024.
-
Integrating AI in College Education: Positive yet Mixed Experiences with ChatGPT
Authors:
Xinrui Song,
Jiajin Zhang,
Pingkun Yan,
Juergen Hahn,
Uwe Kruger,
Hisham Mohamed,
Ge Wang
Abstract:
The integration of artificial intelligence (AI) chatbots into higher education marks a shift towards a new generation of pedagogical tools, mirroring the arrival of milestones like the internet. With the launch of ChatGPT-4 Turbo in November 2023, we developed a ChatGPT-based teaching application (https://chat.openai.com/g/g-1imx1py4K-chatge-medical-imaging) and integrated it into our undergraduate medical imaging course in the Spring 2024 semester. This study investigates the use of ChatGPT throughout a semester-long trial, providing insights into students' engagement, perception, and the overall educational effectiveness of the technology. We systematically collected and analyzed data concerning students' interaction with ChatGPT, focusing on their attitudes, concerns, and usage patterns. The findings indicate that ChatGPT offers significant advantages such as improved information access and increased interactivity, but its adoption is accompanied by concerns about the accuracy of the information provided and the necessity for well-defined guidelines to optimize its use.
Submitted 8 July, 2024;
originally announced July 2024.
-
Boosting 3D Object Detection with Semantic-Aware Multi-Branch Framework
Authors:
Hao Jing,
Anhong Wang,
Lijun Zhao,
Yakun Yang,
Donghan Bu,
Jing Zhang,
Yifan Zhang,
Junhui Hou
Abstract:
In autonomous driving, LiDAR sensors are vital for acquiring 3D point clouds, providing reliable geometric information. However, traditional sampling methods of preprocessing often ignore semantic features, leading to detail loss and ground point interference in 3D object detection. To address this, we propose a multi-branch two-stage 3D object detection framework using a Semantic-aware Multi-branch Sampling (SMS) module and multi-view consistency constraints. The SMS module includes random sampling, Density Equalization Sampling (DES) for enhancing distant objects, and Ground Abandonment Sampling (GAS) to focus on non-ground points. The sampled multi-view points are processed through a Consistent KeyPoint Selection (CKPS) module to generate consistent keypoint masks for efficient proposal sampling. The first-stage detector uses multi-branch parallel learning with multi-view consistency loss for feature aggregation, while the second-stage detector fuses multi-view data through a Multi-View Fusion Pooling (MVFP) module to precisely predict 3D objects. The experimental results on the KITTI 3D object detection benchmark dataset show that our method achieves excellent detection performance improvement for a variety of backbones, especially for low-performance backbones with simple network structures.
Submitted 8 July, 2024;
originally announced July 2024.
-
TransMA: an explainable multi-modal deep learning model for predicting properties of ionizable lipid nanoparticles in mRNA delivery
Authors:
Kun Wu,
Zixu Wang,
Xiulong Yang,
Yangyang Chen,
Zhenqi Han,
Jialu Zhang,
Lizhuang Liu
Abstract:
As the primary mRNA delivery vehicles, ionizable lipid nanoparticles (LNPs) exhibit excellent safety, high transfection efficiency, and strong immune response induction. However, the screening process for LNPs is time-consuming and costly. To expedite the identification of high-transfection-efficiency mRNA drug delivery systems, we propose an explainable LNPs transfection efficiency prediction model, called TransMA. TransMA employs a multi-modal molecular structure fusion architecture, wherein the fine-grained atomic spatial relationship extractor named molecule 3D Transformer captures three-dimensional spatial features of the molecule, and the coarse-grained atomic sequence extractor named molecule Mamba captures one-dimensional molecular features. We design the mol-attention mechanism block, enabling it to align coarse and fine-grained atomic features and capture relationships between atomic spatial and sequential structures. TransMA achieves state-of-the-art performance in predicting transfection efficiency using the scaffold and cliff data splitting methods on the current largest LNPs dataset, including Hela and RAW cell lines. Moreover, we find that TransMA captures the relationship between subtle structural changes and significant transfection efficiency variations, providing valuable insights for LNPs design. Additionally, TransMA's predictions on external transfection efficiency data maintain a consistent order with actual transfection efficiencies, demonstrating its robust generalization capability. The code, model and data are made publicly available at https://github.com/wklix/TransMA/tree/master. We hope that high-accuracy transfection prediction models in the future can aid in LNPs design and initial screening, thereby assisting in accelerating the mRNA design process.
Submitted 8 July, 2024;
originally announced July 2024.
-
Explainable Image Recognition via Enhanced Slot-attention Based Classifier
Authors:
Bowen Wang,
Liangzhi Li,
Jiahao Zhang,
Yuta Nakashima,
Hajime Nagahara
Abstract:
The imperative to comprehend the behaviors of deep learning models is of utmost importance. In this realm, Explainable Artificial Intelligence (XAI) has emerged as a promising avenue, garnering increasing interest in recent years. Despite this, most existing methods primarily depend on gradients or input perturbation, which often fails to embed explanations directly within the model's decision-making process. Addressing this gap, we introduce ESCOUTER, a visually explainable classifier based on the modified slot attention mechanism. ESCOUTER distinguishes itself by not only delivering high classification accuracy but also offering more transparent insights into the reasoning behind its decisions. It differs from prior approaches in two significant aspects: (a) ESCOUTER incorporates explanations into the final confidence scores for each category, providing a more intuitive interpretation, and (b) it offers positive or negative explanations for all categories, elucidating "why an image belongs to a certain category" or "why it does not." A novel loss function specifically for ESCOUTER is designed to fine-tune the model's behavior, enabling it to toggle between positive and negative explanations. Moreover, an area loss is also designed to adjust the size of the explanatory regions for a more precise explanation. Our method, rigorously tested across various datasets and XAI metrics, outperformed previous state-of-the-art methods, solidifying its effectiveness as an explanatory tool.
Submitted 8 July, 2024;
originally announced July 2024.
-
An Experimental Comparison of Transfer Learning against Self-supervised Learning
Authors:
Zehui Zhao,
Laith Alzubaidi,
Jinglan Zhang,
Ye Duan,
Usman Naseem,
Yuantong Gu
Abstract:
Recently, transfer learning and self-supervised learning have gained significant attention within the medical field due to their ability to mitigate the challenges posed by limited data availability, improve model generalisation, and reduce computational expenses. Transfer learning and self-supervised learning hold immense potential for advancing medical research. However, it is crucial to recognise that transfer learning and self-supervised learning architectures exhibit distinct advantages and limitations, manifesting variations in accuracy, training speed, and robustness. This paper compares the performance and robustness of transfer learning and self-supervised learning in the medical field. Specifically, we pre-trained two models using the same source domain datasets with different pre-training methods and evaluated them on small-sized medical datasets to identify the factors influencing their final performance. We tested data with several common issues in medical domains, such as data imbalance, data scarcity, and domain mismatch, through comparison experiments to understand their impact on specific pre-trained models. Finally, we provide recommendations to help users apply transfer learning and self-supervised learning methods in medical areas, and build more convenient and efficient deployment strategies.
Submitted 8 July, 2024;
originally announced July 2024.
-
SCATTER: Algorithm-Circuit Co-Sparse Photonic Accelerator with Thermal-Tolerant, Power-Efficient In-situ Light Redistribution
Authors:
Ziang Yin,
Nicholas Gangi,
Meng Zhang,
Jeff Zhang,
Rena Huang,
Jiaqi Gu
Abstract:
Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads. However, limited reconfigurability, high electrical-optical conversion cost, and thermal sensitivity limit the deployment of current optical analog computing engines to support power-restricted, performance-sensitive AI workloads at scale. Sparsity provides a great opportunity for hardware-efficient AI accelerators. However, current dense photonic accelerators fail to fully exploit the power-saving potential of algorithmic sparsity. It requires sparsity-aware hardware specialization with a fundamental re-design of photonic tensor core topology and cross-layer device-circuit-architecture-algorithm co-optimization aware of hardware non-ideality and power bottleneck. To trim down the redundant power consumption while maximizing robustness to thermal variations, we propose SCATTER, a novel algorithm-circuit co-sparse photonic accelerator featuring dynamically reconfigurable signal path via thermal-tolerant, power-efficient in-situ light redistribution and power gating. A power-optimized, crosstalk-aware dynamic sparse training framework is introduced to explore row-column structured sparsity and ensure marginal accuracy loss and maximum power efficiency. The extensive evaluation shows that our cross-stacked optimized accelerator SCATTER achieves a 511X area reduction and 12.4X power saving with superior crosstalk tolerance that enables unprecedented circuit layout compactness and on-chip power efficiency.
Submitted 7 July, 2024;
originally announced July 2024.
-
PICA: Physics-Integrated Clothed Avatar
Authors:
Bo Peng,
Yunfan Tao,
Haoyu Zhan,
Yudong Guo,
Juyong Zhang
Abstract:
We introduce PICA, a novel representation for high-fidelity animatable clothed human avatars with physics-accurate dynamics, even for loose clothing. Previous neural rendering-based representations of animatable clothed humans typically employ a single model to represent both the clothing and the underlying body. While efficient, these approaches often fail to accurately represent complex garment dynamics, leading to incorrect deformations and noticeable rendering artifacts, especially for sliding or loose garments. Furthermore, previous works represent garment dynamics as pose-dependent deformations and facilitate novel pose animations in a data-driven manner. This often results in outcomes that do not faithfully represent the mechanics of motion and are prone to generating artifacts in out-of-distribution poses. To address these issues, we adopt two individual 3D Gaussian Splatting (3DGS) models with different deformation characteristics, modeling the human body and clothing separately. This distinction allows for better handling of their respective motion characteristics. With this representation, we integrate a graph neural network (GNN)-based clothed body physics simulation module to ensure an accurate representation of clothing dynamics. Our method, through its carefully designed features, achieves high-fidelity rendering of clothed human bodies in complex and novel driving poses, significantly outperforming previous methods under the same settings.
Submitted 7 July, 2024;
originally announced July 2024.
-
Cross-Lingual Word Alignment for ASEAN Languages with Contrastive Learning
Authors:
Jingshen Zhang,
Xinying Qiu,
Teng Shen,
Wenyu Wang,
Kailin Zhang,
Wenhe Feng
Abstract:
Cross-lingual word alignment plays a crucial role in various natural language processing tasks, particularly for low-resource languages. A recent study proposes a BiLSTM-based encoder-decoder model that outperforms pre-trained language models in low-resource settings. However, that model only considers the similarity of word embedding spaces and does not explicitly model the differences between word embeddings. To address this limitation, we propose incorporating contrastive learning into the BiLSTM-based encoder-decoder framework. Our approach introduces a multi-view negative sampling strategy to learn the differences between word pairs in the shared cross-lingual embedding space. We evaluate our model on five bilingual aligned datasets spanning four ASEAN languages: Lao, Vietnamese, Thai, and Indonesian. Experimental results demonstrate that integrating contrastive learning consistently improves word alignment accuracy across all datasets, confirming the effectiveness of the proposed method in low-resource scenarios. We will release our dataset and code to support future research on word alignment for ASEAN and other low-resource languages.
Submitted 6 July, 2024;
originally announced July 2024.
-
Unraveling Radiomics Complexity: Strategies for Optimal Simplicity in Predictive Modeling
Authors:
Mahdi Ait Lhaj Loutfi,
Teodora Boblea Podasca,
Alex Zwanenburg,
Taman Upadhaya,
Jorge Barrios,
David R. Raleigh,
William C. Chen,
Dante P. I. Capaldi,
Hong Zheng,
Olivier Gevaert,
Jing Wu,
Alvin C. Silva,
Paul J. Zhang,
Harrison X. Bai,
Jan Seuntjens,
Steffen Löck,
Patrick O. Richard,
Olivier Morin,
Caroline Reinhold,
Martin Lepage,
Martin Vallières
Abstract:
Background: The high dimensionality of radiomic feature sets, the variability in radiomic feature types and potentially high computational requirements all underscore the need for an effective method to identify the smallest set of predictive features for a given clinical problem. Purpose: Develop a methodology and tools to identify and explain the smallest set of predictive radiomic features. Materials and Methods: 89,714 radiomic features were extracted from five cancer datasets: low-grade glioma, meningioma, non-small cell lung cancer (NSCLC), and two renal cell carcinoma cohorts (n=2104). Features were categorized by computational complexity into morphological, intensity, texture, linear filters, and nonlinear filters. Models were trained and evaluated on each complexity level using the area under the curve (AUC). The most informative features were identified, and their importance was explained. The optimal complexity level and associated most informative features were identified using systematic statistical significance analyses and a false discovery avoidance procedure, respectively. Their predictive importance was explained using a novel tree-based method. Results: MEDimage, a new open-source tool, was developed to facilitate radiomic studies. Morphological features were optimal for MRI-based meningioma (AUC: 0.65) and low-grade glioma (AUC: 0.68). Intensity features were optimal for CECT-based renal cell carcinoma (AUC: 0.82) and CT-based NSCLC (AUC: 0.76). Texture features were optimal for MRI-based renal cell carcinoma (AUC: 0.72). Tuning the Hounsfield unit range improved results for CECT-based renal cell carcinoma (AUC: 0.86). Conclusion: Our proposed methodology and software can estimate the optimal radiomics complexity level for specific medical outcomes, potentially simplifying the use of radiomics in predictive modeling across various contexts.
Submitted 5 July, 2024;
originally announced July 2024.
-
Poster: Flexible Scheduling of Network and Computing Resources for Distributed AI Tasks
Authors:
Ruikun Wang,
Jiawei Zhang,
Qiaolun Zhang,
Bojun Zhang,
Zhiqun Gu,
Aryanaz Attarpour,
Yuefeng Ji,
Massimo Tornatore
Abstract:
Many emerging Artificial Intelligence (AI) applications require on-demand provisioning of large-scale computing, which can only be enabled by leveraging distributed computing services interconnected through networking. To address such increasing demand for networking to serve AI tasks, we investigate new scheduling strategies to improve communication efficiency and test them on a programmable testbed. We also show relevant challenges and research directions.
Submitted 5 July, 2024;
originally announced July 2024.
-
RPN: Reconciled Polynomial Network Towards Unifying PGMs, Kernel SVMs, MLP and KAN
Authors:
Jiawei Zhang
Abstract:
In this paper, we will introduce a novel deep model named Reconciled Polynomial Network (RPN) for deep function learning. RPN has a very general architecture and can be used to build models with various complexities, capacities, and levels of completeness, which all contribute to the correctness of these models. As indicated in the subtitle, RPN can also serve as the backbone to unify different base models into one canonical representation. This includes non-deep models, like probabilistic graphical models (PGMs) - such as Bayesian network and Markov network - and kernel support vector machines (kernel SVMs), as well as deep models like the classic multi-layer perceptron (MLP) and the recent Kolmogorov-Arnold network (KAN).
Technically, RPN proposes to disentangle the underlying function to be inferred into the inner product of a data expansion function and a parameter reconciliation function. Together with the remainder function, RPN accurately approximates the underlying functions that govern data distributions. The data expansion functions in RPN project data vectors from the input space to a high-dimensional intermediate space, specified by the definitions of the expansion functions. Meanwhile, RPN also introduces the parameter reconciliation functions to fabricate a small number of parameters into a higher-order parameter matrix to address the "curse of dimensionality" problem caused by the data expansions. Moreover, the remainder functions provide RPN with additional complementary information to reduce potential approximation errors. We conducted extensive empirical experiments on numerous benchmark datasets across multiple modalities, including continuous function datasets, discrete vision and language datasets, and classic tabular datasets, to investigate the effectiveness of RPN.
Submitted 5 July, 2024;
originally announced July 2024.
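The decomposition described in the RPN abstract above can be written compactly as follows. This is only a sketch of the stated structure: the symbol names ($\kappa$ for data expansion, $\psi$ for parameter reconciliation, $\pi$ for the remainder) and the dimensions $m$, $n$, $D$, $l$ are notation assumed here for illustration, not taken from the abstract.

```latex
% Assumed notation: x in R^m is the input, w in R^l the learnable parameters,
% D the expansion dimension, n the output dimension.
\begin{align}
  g(\mathbf{x} \mid \mathbf{w})
    &= \big\langle \kappa(\mathbf{x}),\, \psi(\mathbf{w}) \big\rangle + \pi(\mathbf{x}), \\
  \kappa &: \mathbb{R}^{m} \to \mathbb{R}^{D}
    && \text{(data expansion)}, \\
  \psi &: \mathbb{R}^{l} \to \mathbb{R}^{n \times D}
    && \text{(parameter reconciliation, with } l \ll nD\text{)}, \\
  \pi &: \mathbb{R}^{m} \to \mathbb{R}^{n}
    && \text{(remainder)}.
\end{align}
```

Read this way, the expansion raises the data dimension from $m$ to $D$, while the reconciliation function builds the $n \times D$ parameter matrix from only $l$ free parameters, which is how the abstract's "curse of dimensionality" concern would be addressed.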
-
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Authors:
Xingrun Xing,
Boyan Gao,
Zheng Zhang,
David A. Clifton,
Shitao Xiao,
Li Du,
Guoqi Li,
Jiajun Zhang
Abstract:
The recent advancements in large language models (LLMs) with billions of parameters have significantly boosted their performance across various real-world applications. However, the inference processes for these models require substantial energy and computational resources, presenting considerable deployment challenges. In contrast, human brains, which contain approximately 86 billion biological neurons, exhibit significantly greater energy efficiency compared to LLMs with a similar number of parameters. Inspired by this, we redesign 7 to 70 billion parameter LLMs using bio-plausible spiking mechanisms, emulating the efficient behavior of the human brain. We propose the first spiking large language model at the scale of recent LLMs, termed SpikeLLM. Coupled with the proposed model, a novel spike-driven quantization framework named Optimal Brain Spiking is introduced to reduce the energy cost and accelerate inference speed via two essential approaches: first (second)-order differentiation-based salient channel detection, and per-channel salient outlier expansion with Generalized Integrate-and-Fire neurons. Our proposed spike-driven quantization can plug into mainstream quantization training methods. In the OmniQuant pipeline, SpikeLLM reduces WikiText2 perplexity by 25.51% and improves average accuracy on 6 zero-shot datasets by 3.08% on a LLAMA2-7B 4A4W model. In the GPTQ pipeline, SpikeLLM realizes a sparse ternary quantization, which achieves additive in all linear layers. Compared with PB-LLM with similar operations, SpikeLLM also performs significantly better. We will release our code on GitHub.
Submitted 5 July, 2024;
originally announced July 2024.
-
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Authors:
Ye Bai,
Jingping Chen,
Jitong Chen,
Wei Chen,
Zhuo Chen,
Chuang Ding,
Linhao Dong,
Qianqian Dong,
Yujiao Du,
Kepan Gao,
Lu Gao,
Yi Guo,
Minglun Han,
Ting Han,
Wenchao Hu,
Xinying Hu,
Yuxiang Hu,
Deyu Hua,
Lu Huang,
Mingkun Huang,
Youjia Huang,
Jishuo Jin,
Fanliu Kong,
Zongwei Lan,
Tianyu Li
, et al. (30 additional authors not shown)
Abstract:
Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this work, we introduce Seed-ASR, a large language model (LLM) based speech recognition model. Seed-ASR is developed based on the framework of audio conditioned LLM (AcLLM), leveraging the capabilities of LLMs by inputting continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in the LLM, Seed-ASR demonstrates significant improvement over end-to-end models on comprehensive evaluation sets, including multiple domains, accents/dialects and languages. Additionally, Seed-ASR can be further deployed to support specific needs in various scenarios without requiring extra language models. Compared to recently released large ASR models, Seed-ASR achieves a 10%-40% reduction in word (or character, for Chinese) error rates on Chinese and English public test sets, further demonstrating its powerful performance.
Submitted 10 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement
Authors:
Yongji Wu,
Wenjie Qu,
Tianyang Tao,
Zhuang Wang,
Wei Bai,
Zhuohao Li,
Yuan Tian,
Jiaheng Zhang,
Matthew Lentz,
Danyang Zhuo
Abstract:
Sparsely-activated Mixture-of-Experts (MoE) architecture has increasingly been adopted to further scale large language models (LLMs) due to its sub-linear scaling for computation costs. However, frequent failures still pose significant challenges as training scales. The cost of even a single failure is significant, as all GPUs need to wait idle until the failure is resolved, potentially losing considerable training progress as training has to restart from checkpoints. Existing solutions for efficient fault-tolerant training either lack elasticity or rely on building resiliency into pipeline parallelism, which cannot be applied to MoE models due to the expert parallelism strategy adopted by the MoE architecture.
We present Lazarus, a system for resilient and elastic training of MoE models. Lazarus adaptively allocates expert replicas to address the inherent imbalance in expert workload and speed up training, while a provably optimal expert placement algorithm is developed to maximize the probability of recovery upon failures. Through adaptive expert placement and a flexible token dispatcher, Lazarus can also fully utilize all available nodes after failures, leaving no GPU idle. Our evaluation shows that Lazarus outperforms existing MoE training systems by up to 5.7x under frequent node failures and 3.4x on a real spot instance trace.
Submitted 5 July, 2024;
originally announced July 2024.
-
Arbitrary Waveform Generated Metasurface: A New Paradigm for Direct Modulation and Beamforming Decoupling
Authors:
Xuehui Dong,
Bokai Lai,
Rujing Xiong,
Jianan Zhang,
Miyu Feng,
Tiebin Mi,
Robert Caiming Qiu
Abstract:
Passive arbitrary waveform generation (AWG) is especially important in a variety of fields such as radar detection, wireless communications, and integrated sensing and communications. Typically, backscatter devices are used to achieve passive signal reflection modulation to facilitate information transmission or to interfere with radar echoes. Reconfigurable Intelligent Surface (RIS), or metasurface, is a promising technology that combines the advantages of backscatter devices and reflective array antennas. Previous studies demonstrate diverse approaches to achieve reflection modulation by utilizing the superposition of the quantified reflective coefficient (RC) of each unit, but suffer from limitations in the computing complexity of the codebook sequence, the safety of communication, and the flexibility of modulation. To overcome these difficulties, we propose a new paradigm of metasurface, i.e. AWG-RIS, that can independently generate arbitrary baseband waveforms and beam patterns based on a magnitude-phase decoupled unit design without altering the beam pattern. We propose an analysis framework and introduce a waveform factor and a beamforming factor into the new model, which provide theoretical support for the flow from the control signal to the outgoing electromagnetic wave. Furthermore, we introduce the world's first prototype that demonstrates passive arbitrary waveform generation without altering the beam pattern. The experiments validate the generation of arbitrary waveforms and spectrograms, both for a single input and through the superposition of multiple inputs.
Submitted 8 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models
Authors:
Zhongqi Wang,
Jie Zhang,
Shiguang Shan,
Xilin Chen
Abstract:
While text-to-image diffusion models demonstrate impressive generation capabilities, they also exhibit vulnerability to backdoor attacks, which involve the manipulation of model outputs through malicious triggers. In this paper, for the first time, we propose a comprehensive defense method named T2IShield to detect, localize, and mitigate such attacks. Specifically, we find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger. Based on this key insight, we propose two effective backdoor detection methods: Frobenius Norm Threshold Truncation and Covariance Discriminant Analysis. Besides, we introduce a binary-search approach to localize the trigger within a backdoor sample and assess the efficacy of existing concept editing methods in mitigating backdoor attacks. Empirical evaluations on two advanced backdoor attack scenarios show the effectiveness of our proposed defense method. For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9$\%$ with low computational cost. Furthermore, T2IShield achieves a localization F1 score of 86.4$\%$ and invalidates 99$\%$ poisoned samples. Codes are released at https://github.com/Robin-WZQ/T2IShield.
Submitted 4 July, 2024;
originally announced July 2024.
-
CLIP-DR: Textual Knowledge-Guided Diabetic Retinopathy Grading with Ranking-aware Prompting
Authors:
Qinkai Yu,
Jianyang Xie,
Anh Nguyen,
He Zhao,
Jiong Zhang,
Huazhu Fu,
Yitian Zhao,
Yalin Zheng,
Yanda Meng
Abstract:
Diabetic retinopathy (DR) is a complication of diabetes and usually takes decades to reach sight-threatening levels. Accurate and robust detection of DR severity is critical for the timely management and treatment of diabetes. However, most current DR grading methods suffer from insufficient robustness to data variability (e.g. colour fundus images), posing a significant difficulty for accurate and robust grading. In this work, we propose a novel DR grading framework CLIP-DR based on three observations: 1) Recent pre-trained visual language models, such as CLIP, showcase a notable capacity for generalisation across various downstream tasks, serving as effective baseline models. 2) The grading of image-text pairs for DR often adheres to a discernible natural sequence, yet most existing DR grading methods have primarily overlooked this aspect. 3) A long-tailed distribution among DR severity levels complicates the grading process. This work proposes a novel ranking-aware prompting strategy to help the CLIP model exploit the ordinal information. Specifically, we sequentially design learnable prompts between neighbouring text-image pairs in two different ranking directions. Additionally, we introduce a Similarity Matrix Smooth module into the structure of CLIP to balance the class distribution. Finally, we perform extensive comparisons with several state-of-the-art methods on the GDRBench benchmark, demonstrating our CLIP-DR's robustness and superior performance. The implementation code is available at https://github.com/Qinkaiyu/CLIP-DR.
Submitted 4 July, 2024;
originally announced July 2024.
-
MobileExperts: A Dynamic Tool-Enabled Agent Team in Mobile Devices
Authors:
Jiayi Zhang,
Chuang Zhao,
Yihan Zhao,
Zhaoyang Yu,
Ming He,
Jianping Fan
Abstract:
The attainment of autonomous operations in mobile computing devices has consistently been a goal of human pursuit. With the development of Large Language Models (LLMs) and Visual Language Models (VLMs), this aspiration is progressively turning into reality. While contemporary research has explored automation of simple tasks on mobile devices via VLMs, there remains significant room for improvement in handling complex tasks and reducing high reasoning costs. In this paper, we introduce MobileExperts, which for the first time introduces tool formulation and multi-agent collaboration to address the aforementioned challenges. More specifically, MobileExperts dynamically assembles teams based on the alignment of agent portraits with the human requirements. Following this, each agent embarks on an independent exploration phase, formulating its tools to evolve into an expert. Lastly, we develop a dual-layer planning mechanism to establish coordinated collaboration among experts. To validate our effectiveness, we design a new benchmark of hierarchical intelligence levels, offering insights into an algorithm's capability to address tasks across a spectrum of complexity. Experimental results demonstrate that MobileExperts performs better on all intelligence levels and achieves a reduction of about 22% in reasoning costs, thus verifying the superiority of our design.
Submitted 4 July, 2024;
originally announced July 2024.
-
Asymmetric Iterated Prisoner's Dilemma on BA Scale-Free Network
Authors:
Yunhao Ding,
Chunyan Zhang,
Jianlei Zhang
Abstract:
In real-world scenarios, individuals often cooperate for mutual benefit. However, differences in wealth can lead to varying outcomes for similar actions. In complex social networks, individuals' choices are also influenced by their neighbors. To explore the evolution of strategies in realistic settings, we conducted repeated asymmetric prisoner's dilemma experiments on a weighted BA scale-free network. Our analysis highlighted how the four components of memory-one strategies affect win rates, found two special strategies in the evolutionary process, and increased the cooperation levels among individuals. These findings offer practical insights for addressing real-world problems.
Submitted 4 July, 2024;
originally announced July 2024.
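The "four components of memory-one strategies" mentioned in the abstract above are the probabilities of cooperating after each of the four possible outcomes of the previous round. The sketch below illustrates that idea only. It assumes a standard symmetric payoff matrix (T=5, R=3, P=1, S=0), a cooperative first move for both players, and a single two-player match; the asymmetric payoffs and the weighted BA network used in the paper are not modelled, and the names `play`, `TIT_FOR_TAT`, and `ALWAYS_DEFECT` are hypothetical.

```python
import random

# Illustrative symmetric payoffs (own move, opponent move) -> own payoff.
# These values are an assumption, not the paper's asymmetric payoffs.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

# Position of each previous-round outcome in a memory-one strategy tuple.
INDEX = {("C", "C"): 0, ("C", "D"): 1, ("D", "C"): 2, ("D", "D"): 3}


def play(strategy_a, strategy_b, rounds=100, first_move="C", seed=0):
    """Play an iterated prisoner's dilemma between two memory-one strategies.

    Each strategy is a tuple (p_CC, p_CD, p_DC, p_DD): the probability of
    cooperating given (own last move, opponent's last move).
    Returns the total payoffs (score_a, score_b).
    """
    rng = random.Random(seed)
    move_a = move_b = first_move
    score_a = score_b = 0
    for _ in range(rounds):
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        # Each player conditions on the last round as seen from its own side.
        p_a = strategy_a[INDEX[(move_a, move_b)]]
        p_b = strategy_b[INDEX[(move_b, move_a)]]
        next_a = "C" if rng.random() < p_a else "D"
        next_b = "C" if rng.random() < p_b else "D"
        move_a, move_b = next_a, next_b
    return score_a, score_b


TIT_FOR_TAT = (1.0, 0.0, 1.0, 0.0)    # copy the opponent's last move
ALWAYS_DEFECT = (0.0, 0.0, 0.0, 0.0)  # never cooperate after the first round

if __name__ == "__main__":
    print(play(TIT_FOR_TAT, TIT_FOR_TAT))
    print(play(TIT_FOR_TAT, ALWAYS_DEFECT))
```

Varying one of the four probabilities at a time and comparing the resulting scores is one simple way to see how each component affects the outcome of a match.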
-
DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts
Authors:
Zheng-Peng Duan,
Jiawei Zhang,
Zheng Lin,
Xin Jin,
Dongqing Zou,
Chunle Guo,
Chongyi Li
Abstract:
Image retouching aims to enhance the visual quality of photos. Considering the different aesthetic preferences of users, the target of retouching is subjective. However, current retouching methods mostly adopt deterministic models, which not only neglect the style diversity in the expert-retouched results and tend to learn an average style during training, but also lack sample diversity during inference. In this paper, we propose a diffusion-based method, named DiffRetouch. Thanks to the excellent distribution modeling ability of diffusion, our method can capture the complex fine-retouched distribution covering various visually pleasing styles in the training data. Moreover, four image attributes are made adjustable to provide a user-friendly editing mechanism. By adjusting these attributes in specified ranges, users are allowed to customize preferred styles within the learned fine-retouched distribution. Additionally, the affine bilateral grid and contrastive learning scheme are introduced to handle the problems of texture distortion and control insensitivity, respectively. Extensive experiments have demonstrated the superior performance of our method in terms of visual appeal and sample diversity. The code will be made available to the community.
Submitted 4 July, 2024;
originally announced July 2024.
-
Flight Structure Optimization of Modular Reconfigurable UAVs
Authors:
Yao Su,
Ziyuan Jiao,
Zeyu Zhang,
Jingwen Zhang,
Hang Li,
Meng Wang,
Hangxin Liu
Abstract:
This paper presents a Genetic Algorithm (GA) designed to reconfigure a large group of modular Unmanned Aerial Vehicles (UAVs), each with different weights and inertia parameters, into an over-actuated flight structure with improved dynamic properties. Previous research efforts either utilized expert knowledge to design flight structures for a specific task or relied on enumeration-based algorithms that required extensive computation to find an optimal one. However, both approaches encounter challenges in accommodating the heterogeneity among modules. Our GA addresses these challenges by incorporating the complexities of over-actuation and dynamic properties into its formulation. Additionally, we employ a tree representation and a vector representation to describe flight structures, facilitating efficient crossover operations and fitness evaluations within the GA framework, respectively. Using cubic modular quadcopters capable of functioning as omni-directional thrust generators, we validate that the proposed approach can (i) adeptly identify suboptimal configurations ensuring over-actuation while ensuring trajectory tracking accuracy and (ii) significantly reduce computational costs compared to traditional enumeration-based methods.
Submitted 4 July, 2024;
originally announced July 2024.
-
Augmenting LLMs to Repair Obsolete Test Cases with Static Collector and Neural Reranker
Authors:
Jun Liu,
Jiwei Yan,
Yuanyuan Xie,
Jun Yan,
Jian Zhang
Abstract:
During software evolution, it is advocated that test code should co-evolve with production code. In real development scenarios, test updates may lag behind production code changes, which may cause the project to fail to compile or lead to other problems. Existing techniques based on pre-trained language models can be adopted to repair obsolete tests caused by such unsynchronized code changes, especially syntax-related ones. However, the lack of target-oriented contextual information affects repair accuracy on large-scale projects. Starting from an obsolete test, the key challenge is precisely identifying and constructing Test-Repair-Oriented Contexts (TROCtx) from the whole repository within a limited token size.
In this paper, we propose SynBCIATR (Syntactic-Breaking-Change-Induced Automated Test Repair), a novel approach to automatically repair obsolete test cases via precise and concise TROCtx construction. Inspired by how developers approach this task in practice, we design three types of TROCtx: class contexts, usage contexts, and environment contexts. For every type of TROCtx, SynBCIATR automatically collects the changed-token-related code information through static analysis techniques. It then generates reranking queries to identify the most relevant TROCtxs, which are taken as the key context required for repair and passed as input to the Large Language Model for the final test repair.
To evaluate the effectiveness of SynBCIATR, we construct a benchmark dataset that contains diverse syntactic breaking changes. The experimental results show that SynBCIATR outperforms baseline approaches on both textual- and intent-matching metrics. With the augmentation of TROCtx constructed by SynBCIATR, hallucinations are reduced by 57.1%.
Submitted 4 July, 2024;
originally announced July 2024.