-
Structure preserving schemes for a class of Wasserstein gradient flows
Authors:
Shiheng Zhang,
Jie Shen
Abstract:
We introduce in this paper two time discretization schemes tailored for a range of Wasserstein gradient flows. These schemes are designed to preserve mass, positivity and to be uniquely solvable. In addition, they also ensure energy dissipation in many typical scenarios. Through extensive numerical experiments, we demonstrate the schemes' robustness, accuracy and efficiency.
Submitted 12 July, 2024;
originally announced July 2024.
-
Grain boundaries control lithiation of solid solution substrates in lithium metal batteries
Authors:
Leonardo Shoji Aota,
Chanwon Jung,
Siyuan Zhang,
Ömer K. Büyükuslu,
Poonam Yadav,
Mahander Pratap Singh,
Xinren Chen,
Eric Woods,
Christina Scheu,
Se-Ho Kim,
Dierk Raabe,
Baptiste Gault
Abstract:
The development of sustainable transportation and communication systems requires an increase in both energy density and capacity retention of Li-batteries. Using substrates forming a solid solution with body centered cubic Li enhances the cycle stability of anode-less batteries. However, it remains unclear how the substrate microstructure affects the lithiation behavior. Here, we deploy a correlative, near-atomic scale probing approach through combined ion- and electron-microscopy to examine the distribution of Li in Li-Ag diffusion couples as a model system. We reveal that Li regions with over 93.8 at.% nucleate within Ag at random high angle grain boundaries, whereas grain interiors are not lithiated. We evidence the role of kinetics and mechanical constraint from the microstructure over equilibrium thermodynamics in dictating the lithiation process. The findings suggest that grain size and grain boundary character are critical to enhance the electrochemical performance of interlayers/electrodes, particularly for improving lithiation kinetics and hence reducing dendrite formation.
Submitted 12 July, 2024;
originally announced July 2024.
-
FedVAE: Trajectory privacy preserving based on Federated Variational AutoEncoder
Authors:
Yuchen Jiang,
Ying Wu,
Shiyao Zhang,
James J. Q. Yu
Abstract:
The use of trajectory data with abundant spatial-temporal information is pivotal in Intelligent Transport Systems (ITS) and various traffic system tasks. Location-Based Services (LBS) capitalize on this trajectory data to offer users personalized services tailored to their location information. However, this trajectory data contains sensitive information about users' movement patterns and habits, necessitating confidentiality and protection from unknown collectors. To address this challenge, privacy-preserving methods like K-anonymity and Differential Privacy have been proposed to safeguard private information in the dataset. Despite their effectiveness, these methods can impact the original features by introducing perturbations or generating unrealistic trajectory data, leading to suboptimal performance in downstream tasks. To overcome these limitations, we propose a Federated Variational AutoEncoder (FedVAE) approach, which effectively generates a new trajectory dataset while preserving the confidentiality of private information and retaining the structure of the original features. In addition, FedVAE leverages Variational AutoEncoder (VAE) to maintain the original feature space and generate new trajectory data, and incorporates Federated Learning (FL) during the training stage, ensuring that users' data remains locally stored to protect their personal information. The results demonstrate its superior performance compared to other existing methods, affirming FedVAE as a promising solution for enhancing data privacy and utility in location-based applications.
Submitted 12 July, 2024;
originally announced July 2024.
-
Zeroth-Order Katyusha: An Accelerated Derivative-Free Method for Composite Convex Optimization
Authors:
Silan Zhang,
Yujie Tang
Abstract:
We investigate accelerated zeroth-order algorithms for smooth composite convex optimization problems. While for unconstrained optimization, existing methods that merge 2-point zeroth-order gradient estimators with first-order frameworks usually lead to satisfactory performance, for constrained/composite problems, there is still a gap in the complexity bound that is related to the non-vanishing variance of the 2-point gradient estimator near an optimal point. To bridge this gap, we propose the Zeroth-Order Loopless Katyusha (ZO-L-Katyusha) algorithm, leveraging the variance reduction as well as acceleration techniques from the first-order loopless Katyusha algorithm. We show that ZO-L-Katyusha is able to achieve accelerated linear convergence for composite smooth and strongly convex problems, and has the same oracle complexity as the unconstrained case. Moreover, the number of function queries to construct a zeroth-order gradient estimator in ZO-L-Katyusha can be made to be O(1) on average. These results suggest that ZO-L-Katyusha provides a promising approach towards bridging the gap in the complexity bound for zeroth-order composite optimization.
Submitted 12 July, 2024;
originally announced July 2024.
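For readers unfamiliar with the 2-point zeroth-order gradient estimator mentioned in the abstract above, the following is a minimal sketch of one common form of it, not the ZO-L-Katyusha algorithm itself. The function names, the smoothing radius `mu`, and the choice of sampling directions uniformly on the unit sphere are all assumptions made for illustration.

```python
import math
import random


def two_point_grad_estimate(f, x, mu=1e-4, rng=random):
    """One 2-point zeroth-order gradient estimate of f at x.

    Assumed form: draw u uniformly on the unit sphere and return
    d * (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u, where d = len(x).
    Only two function values are queried; no derivatives are used.
    """
    d = len(x)
    # A normalized Gaussian vector is uniformly distributed on the sphere.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    u = [v / norm for v in g]
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
    scale = d * (f(x_plus) - f(x_minus)) / (2.0 * mu)
    return [scale * ui for ui in u]


def averaged_estimate(f, x, num_samples=20000, mu=1e-4, seed=0):
    """Average many independent estimates to approximate the gradient."""
    rng = random.Random(seed)
    acc = [0.0] * len(x)
    for _ in range(num_samples):
        g = two_point_grad_estimate(f, x, mu, rng)
        acc = [a + gi for a, gi in zip(acc, g)]
    return [a / num_samples for a in acc]


def quadratic(x):
    """Toy objective with gradient 2*x, used only for demonstration."""
    return sum(v * v for v in x)
```

A single estimate is noisy, and the noise scales with the size of the true gradient. In a constrained or composite problem the gradient of the smooth part is generally non-zero at the optimum, which is one way to read the "non-vanishing variance" the abstract refers to.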
-
Accurate Prior-centric Monocular Positioning with Offline LiDAR Fusion
Authors:
Jinhao He,
Huaiyang Huang,
Shuyang Zhang,
Jianhao Jiao,
Chengju Liu,
Ming Liu
Abstract:
Unmanned vehicles usually rely on Global Positioning System (GPS) and Light Detection and Ranging (LiDAR) sensors to achieve high-precision localization results for navigation purposes. However, this combination, with its associated costs and infrastructure demands, poses challenges for widespread adoption in mass-market applications. In this paper, we aim to use only a monocular camera to achieve comparable onboard localization performance by tracking deep-learning visual features on a LiDAR-enhanced visual prior map. Experiments show that the proposed algorithm can provide centimeter-level global positioning results with scale, which is effortlessly integrated and favorable for low-cost robot system deployment in real-world applications.
Submitted 12 July, 2024;
originally announced July 2024.
-
Understanding chiral charge-density wave by frozen chiral phonon
Authors:
Shuai Zhang,
Kaifa Luo,
Tiantian Zhang
Abstract:
Charge density wave (CDW) is discovered within a wide interval in solids, however, its microscopic nature is still not transparent in most realistic materials, and the recently studied chiral ones with chiral structural distortion remain unclear. In this paper, we try to understand the driving forces of chiral CDW transition by chiral phonons from the electron-phonon coupling scenario. We use the prototypal monolayer 1T-TiSe$_2$ as a case study to unveil the absence of chirality in the CDW transition and propose a general approach, i.e., symmetry-breaking stimuli, to engineer the chirality of CDW in experiments. Inelastic scattering patterns are also studied as a benchmark of chiral CDW (CCDW, which breaks the mirror/inversion symmetry in 2D/3D systems). We notice that the anisotropy changing of Bragg peak profiles, which is contributed by the soft chiral phonons, can show a remarkable signature for CCDW. Our findings pave a path to understanding the CCDW from the chiral phonon perspective, especially in van der Waals materials, and provide a powerful way to manipulate the chirality of CDW.
Submitted 12 July, 2024;
originally announced July 2024.
-
MAVIS: Mathematical Visual Instruction Tuning
Authors:
Renrui Zhang,
Xinyu Wei,
Dongzhi Jiang,
Yichi Zhang,
Ziyu Guo,
Chengzhuo Tong,
Jiaming Liu,
Aojun Zhou,
Bin Wei,
Shanghang Zhang,
Peng Gao,
Hongsheng Li
Abstract:
Multi-modal Large Language Models (MLLMs) have recently emerged as a significant focus in academia and industry. Despite their proficiency in general multi-modal scenarios, the mathematical problem-solving capabilities in visual contexts remain insufficiently explored. We identify three key areas within MLLMs that need to be improved: visual encoding of math diagrams, diagram-language alignment, and mathematical reasoning skills. This draws forth an urgent demand for large-scale, high-quality data and training pipelines in visual mathematics. In this paper, we propose MAVIS, the first MAthematical VISual instruction tuning paradigm for MLLMs, involving a series of mathematical visual datasets and specialized MLLMs. Targeting the three issues, MAVIS contains three progressive training stages from scratch. First, we curate MAVIS-Caption, consisting of 558K diagram-caption pairs, to fine-tune a math-specific vision encoder (CLIP-Math) through contrastive learning, tailored for improved diagram visual encoding. Second, we utilize MAVIS-Caption to align the CLIP-Math with a large language model (LLM) by a projection layer, enhancing vision-language alignment in mathematical domains. Third, we introduce MAVIS-Instruct, including 900K meticulously collected and annotated visual math problems, which is adopted to finally instruct-tune the MLLM for robust mathematical reasoning skills. In MAVIS-Instruct, we incorporate complete chain-of-thought (CoT) rationales for each problem, and minimize textual redundancy, thereby concentrating the model towards the visual elements. Data and Models are released at https://github.com/ZrrSkywalker/MAVIS
Submitted 11 July, 2024;
originally announced July 2024.
-
GTA: A Benchmark for General Tool Agents
Authors:
Jize Wang,
Zerun Ma,
Yining Li,
Songyang Zhang,
Cailian Chen,
Kai Chen,
Xinyi Le
Abstract:
Significant focus has been placed on integrating large language models (LLMs) with various tools in developing general-purpose agents. This poses a challenge to LLMs' tool-use capabilities. However, there are evident gaps between existing tool-use evaluations and real-world scenarios. Current evaluations often use AI-generated queries, single-step tasks, dummy tools, and text-only interactions, failing to reveal the agents' real-world problem-solving abilities effectively. To address this, we propose GTA, a benchmark for General Tool Agents, featuring three main aspects: (i) Real user queries: human-written queries with simple real-world objectives but implicit tool-use, requiring the LLM to reason the suitable tools and plan the solution steps. (ii) Real deployed tools: an evaluation platform equipped with tools across perception, operation, logic, and creativity categories to evaluate the agents' actual task execution performance. (iii) Real multimodal inputs: authentic image files, such as spatial scenes, web page screenshots, tables, code snippets, and printed/handwritten materials, used as the query contexts to align with real-world scenarios closely. We design 229 real-world tasks and executable tool chains to evaluate mainstream LLMs. Our findings show that real-world user queries are challenging for existing LLMs, with GPT-4 completing less than 50% of the tasks and most LLMs achieving below 25%. This evaluation reveals the bottlenecks in the tool-use capabilities of current LLMs in real-world scenarios, which provides future direction for advancing general-purpose tool agents. The code and dataset are available at https://github.com/open-compass/GTA.
Submitted 11 July, 2024;
originally announced July 2024.
-
Revisiting the Formulation of Charged Defect in Solids
Authors:
Hanzhi Shang,
Zeyu Jiang,
Yiyang Sun,
Damien West,
Shengbai Zhang
Abstract:
Defect physics is at the heart of microelectronics. By keeping track of the reference energy in total energy calculations, we explicitly show that the "potential alignment" correction vanishes, and the classic Makov-Payne correction yields accurate results. From linear response theory, we further formulate an accurate expression for the quadrupole correction. Application to numerous defects, including in anisotropic materials, yields accurate formation energies in small supercells, and the historically slow convergence of the 2+ diamond vacancy is shown to be a result of slowly varying gap levels of the defect leading to a size-dependent dielectric constant.
Submitted 11 July, 2024;
originally announced July 2024.
-
Skin Effect of Nonlinear Optical Responses in Antiferromagnets
Authors:
Hang Zhou,
Rui-Chun Xiao,
Shu-Hui Zhang,
Wei Gan,
Hui Han,
Hong-Miao Zhao,
Wenjian Lu,
Changjin Zhang,
Yuping Sun,
Hui Li,
Ding-Fu Shao
Abstract:
Nonlinear optics plays important roles in the research of fundamental physics and the applications of high-performance optoelectronic devices. The bulk nonlinear optical responses arise from the uniform light absorption in noncentrosymmetric crystals, and hence are usually considered to be the collective phenomena of all atoms. Here we show, in contrast to this common expectation, the nonlinear optical responses in antiferromagnets can be selectively accumulated near the surfaces, representing a skin effect. This is because the inversion symmetry, despite being broken globally, is barely violated locally deeply inside these antiferromagnets. Using A-type layered antiferromagnets as the representatives, we predict that the spatial-dependent nonlinear optical responses, such as bulk photovoltaic effect (BPVE) and second harmonic generation (SHG), are notable in the top- and bottom-most layers and decay rapidly when moving away from the surfaces. Such a phenomenon exists in a broad range of antiferromagnets composed of centrosymmetric sublattices, offering promising device applications using these antiferromagnets. Our work uncovers a previously overlooked property of nonlinear optical responses and opens new opportunities for high-performance antiferromagnetic optospintronics.
Submitted 11 July, 2024;
originally announced July 2024.
-
Revisiting the dead time effects of Insight-HXMT/ME on timing analysis
Authors:
Youli Tuo,
Xiaobo Li,
Ying Tan,
Baiyang Wu,
Weichun Jiang,
Liming Song,
Jinlu Qu,
Sudeep Gogate,
Shuang-Nan Zhang,
Andrea Santangelo
Abstract:
Dead time is a common instrumental effect of X-ray detectors which would alter the behavior of timing properties of astronomical signals, such as distorting the shape of power density spectra (PDS), affecting the root-mean-square of potential quasi-periodic oscillation signals, etc. We revisit the effects of the dead time of Medium Energy X-ray telescope (ME) onboard Insight-HXMT, based on the simulation of electronic read-out mechanism that causes the dead time, and the real data. We investigate dead time effects on the pulse profile as well as the Quasi-Periodic Oscillation (QPO) signals. The dead time coefficient suggests a linear correlation with the observed count rate in each phase bin of the pulse profile according to the simulation of periodic signal as well as the real data observed on Swift J0243.6+6124. The Fourier-amplitude-difference (FAD) method could well recover the intrinsic shape of the observed PDS in the case that the PDS is from two identical detectors. We apply this technique on ME, by splitting the 9 FPGA modules into 2 groups. The results indicate that the FAD technique suits the case when two groups of detectors are not largely different; and the recovered PDS of Sco X-1 observed by ME slightly enhances the significance of the previously known QPO signal, meanwhile the root-mean-square of QPO is significantly improved. We provide the FAD correction tool implemented in HXMTDAS for users in the future to better analyze QPO signals.
Submitted 10 July, 2024;
originally announced July 2024.
-
Revealing spontaneous symmetry breaking in continuous time crystals
Authors:
Yuanjiang Tang,
Chenyang Wang,
Bei Liu,
Jin Peng,
Chao Liang,
Yaohua Li,
Xian Zhao,
Cuicui Lu,
Shuang Zhang,
Yong-Chun Liu
Abstract:
Spontaneous symmetry breaking plays a pivotal role in physics ranging from the emergence of elementary particles to the phase transitions of matter. The spontaneous breaking of continuous time translation symmetry leads to a novel state of matter named continuous time crystal (CTC). It exhibits periodic oscillation without the need for periodic driving, and the relative phases for repetitively realized oscillations are random. However, the mechanism behind the spontaneous symmetry breaking in CTCs, particularly the random phases, remains elusive. Here we propose and experimentally realize two types of CTCs based on distinct mechanisms: manifold topology and near-chaotic motion. We observe both types of CTCs in thermal atomic ensembles by artificially synthesizing spin-spin nonlinear interactions through a measurement-feedback scheme. Our work provides general recipes for the realization of CTCs, and paves the way for exploring CTCs in various systems.
Submitted 10 July, 2024;
originally announced July 2024.
-
Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (645 additional authors not shown)
Abstract:
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15\sigma$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.
Submitted 10 July, 2024;
originally announced July 2024.
-
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
Authors:
Yatai Ji,
Shilong Zhang,
Jie Wu,
Peize Sun,
Weifeng Chen,
Xuefeng Xiao,
Sidi Yang,
Yujiu Yang,
Ping Luo
Abstract:
The rapid advancement of Large Vision-Language models (LVLMs) has demonstrated a spectrum of emergent capabilities. Nevertheless, current models only focus on the visual content of a single scenario, while their ability to associate instances across different scenes has not yet been explored, which is essential for understanding complex visual content, such as movies with multiple characters and intricate plots. Towards movie understanding, a critical initial step for LVLMs is to unleash the potential of character identities memory and recognition across multiple visual scenarios. To achieve the goal, we propose visual instruction tuning with ID reference and develop an ID-Aware Large Vision-Language Model, IDA-VLM. Furthermore, our research introduces a novel benchmark MM-ID, to examine LVLMs on instance IDs memory and recognition across four dimensions: matching, location, question-answering, and captioning. Our findings highlight the limitations of existing LVLMs in recognizing and associating instance identities with ID reference. This paper paves the way for future artificial intelligence systems to possess multi-identity visual inputs, thereby facilitating the comprehension of complex visual narratives like movies.
Submitted 10 July, 2024;
originally announced July 2024.
-
Electrical Impedance Tomography Based Closed-loop Tumor Treating Fields in Dynamic Lung Tumors
Authors:
Minmin Wang,
Xu Xie,
Yuxi Guo,
Liying Zhu,
Yue Lan,
Haitang Yang,
Yun Pan,
Guangdi Chen,
Shaomin Zhang,
Maomao Zhang
Abstract:
Tumor Treating Fields (TTFields) is a non-invasive anticancer modality that utilizes alternating electric fields to disrupt cancer cell division and growth. While generally well-tolerated with minimal side effects, traditional TTFields therapy for lung tumors faces challenges due to the influence of respiratory motion. We design a novel closed-loop TTFields strategy for lung tumors by incorporating electrical impedance tomography (EIT) for real-time respiratory phase monitoring and dynamic parameter adjustments. Furthermore, we conduct theoretical analysis to evaluate the performance of the proposed method using the lung motion model. Compared to conventional TTFields settings, we observed that variations in the electrical conductivity of the lung during different respiratory phases led to a decrease in the average electric field intensity within lung tumors, transitioning from end-expiratory (1.08 V/cm) to end-inspiratory (0.87 V/cm) phases. Utilizing our proposed closed-loop TTFields approach at the same dose setting (2400 mA, consistent with the traditional TTFields setting), we can achieve a higher and consistent average electric field strength at the tumor site (1.30 V/cm) across different respiratory stages. Our proposed closed-loop TTFields method has the potential to improve lung tumor therapy by mitigating the impact of respiratory motion.
Submitted 9 July, 2024;
originally announced July 2024.
-
Revealing the evanescent components in Kronecker-product based codebooks: insights and implications
Authors:
Jun Yang,
Yijian Chen,
Yunqi Sun,
Yuan Si,
Hongkang Yu,
Shujuan Zhang,
Zhaohua Lu
Abstract:
The orthogonal bases of the discrete Fourier transform (DFT) have been recognized as the standard spatial-domain bases for Type I, Type II and enhanced Type II codewords by the 3rd Generation Partnership Project (3GPP). For uniform planar arrays, these spatial-domain bases are derived as the Kronecker product of one-dimensional DFT bases. Theoretically, each spatial basis corresponds to a beam directed towards a specific angle of departure and the set of bases represents the orthogonal beams that cover the front hemisphere of an array. While the Kronecker-product based precoding scheme facilitates the concise indexing of a codeword in the codebooks through precoding matrix indicators (PMIs) in channel state information feedback, it introduces redundant spatial beams characterized by high spatial-frequency components. This paper investigates the presence of codewords representing high spatial-frequency components within the Kronecker-product based codebooks. Through theoretical analysis and simulations, we confirm the redundancy of these codewords in MIMO communications, advocating for their removal from the codebooks to enhance system performance. Several topics relevant to the high spatial-frequency components are also discussed. Practical suggestions regarding future standard design are provided based on our theoretical analysis and simulation results.
Submitted 9 July, 2024;
originally announced July 2024.
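The Kronecker-product construction described in the abstract above can be sketched in a few lines. This is an illustrative toy only: it assumes unitary normalization, no oversampling, and a hypothetical index order `(k1, k2)`, and it does not reproduce the exact 3GPP codebook definitions or the paper's analysis.

```python
import cmath
import math


def dft_beam(n, k):
    """k-th column of the unitary n-point DFT matrix (a 1D spatial basis)."""
    return [cmath.exp(2j * math.pi * k * m / n) / math.sqrt(n) for m in range(n)]


def kron(a, b):
    """Kronecker product of two vectors given as Python lists."""
    return [ai * bj for ai in a for bj in b]


def upa_beam(n1, n2, k1, k2):
    """2D spatial basis for an n1 x n2 uniform planar array.

    Built as the Kronecker product of two 1D DFT beams; the argument
    order is an assumption made for this sketch.
    """
    return kron(dft_beam(n1, k1), dft_beam(n2, k2))


def inner(a, b):
    """Hermitian inner product <a, b>."""
    return sum(x.conjugate() * y for x, y in zip(a, b))
```

Under these assumptions, two beams with different index pairs are orthogonal and each beam has unit norm, so the n1*n2 beams form an orthonormal basis for the array's spatial domain.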
-
Realization of Conditional Operations through Transition Pathway Engineering
Authors:
Sheng Zhang,
Peng Duan,
Yun-Jie Wang,
Tian-Le Wang,
Peng Wang,
Ren-Ze Zhao,
Xiao-Yan Yang,
Ze-An Zhao,
Liang-Liang Guo,
Yong Chen,
Hai-Feng Zhang,
Lei Du,
Hao-Ran Tao,
Zhi-Fei Li,
Yuan Wu,
Zhi-Long Jia,
Wei-Cheng Kong,
Zhao-Yun Chen,
Yu-Chun Wu,
Guo-Ping Guo
Abstract:
In the NISQ era, achieving large-scale quantum computing demands compact circuits to mitigate decoherence and gate error accumulation. Quantum operations with diverse degrees of freedom hold promise for circuit compression, but conventional approaches encounter challenges in simultaneously adjusting multiple parameters. Here, we propose a transition composite gate (TCG) scheme grounded on state-selective transition path engineering, enabling more expressive conditional operations. We experimentally validate a controlled unitary (CU) gate as an example, with independent and continuous parameters. By adjusting the parameters of $\rm X^{12}$ gate, we obtain the CU family with a fidelity range of 95.2% to 99.0% leveraging quantum process tomography (QPT). To demonstrate the capability of circuit compression, we use TCG scheme to prepare 3-qubit Greenberger-Horne-Zeilinger (GHZ) and W states, with the fidelity of 96.77% and 95.72%. TCG can achieve the reduction in circuit depth of about 40% and 44% compared with the use of CZ gates only. Moreover, we show that short-path TCG (SPTCG) can further reduce the state-preparation circuit time cost. The TCG scheme exhibits advantages in certain quantum circuits and shows significant potential for large-scale quantum algorithms.
Submitted 10 July, 2024; v1 submitted 9 July, 2024;
originally announced July 2024.
-
Revolutionizing Battery Disassembly: The Design and Implementation of a Battery Disassembly Autonomous Mobile Manipulator Robot (BEAM-1)
Authors:
Yanlong Peng,
Zhigang Wang,
Yisheng Zhang,
Shengmin Zhang,
Nan Cai,
Fan Wu,
Ming Chen
Abstract:
The efficient disassembly of end-of-life electric vehicle batteries (EOL-EVBs) is crucial for green manufacturing and sustainable development. The current pre-programmed disassembly conducted by the Autonomous Mobile Manipulator Robot (AMMR) struggles to meet the disassembly requirements in dynamic environments, complex scenarios, and unstructured processes. In this paper, we propose a Battery Disassembly AMMR (BEAM-1) system based on NeuroSymbolic AI. It detects the environmental state by leveraging a combination of multi-sensors and neural predicates and then translates this information into a quasi-symbolic space. In real time, it identifies the optimal sequence of action primitives through LLM-heuristic tree search, ensuring high-precision execution of these primitives. Additionally, it employs positional speculative sampling using intuitive networks and achieves the disassembly of various bolt types with a meticulously designed end-effector. Importantly, BEAM-1 is a continuously learning embodied intelligence system capable of subjective reasoning like a human, and possessing intuition. Extensive real-scene experiments show that it can autonomously perceive, decide, and execute to complete the continuous disassembly of bolts in multiple, multi-category, and complex situations, with a success rate of 98.78%. This research attempts to use NeuroSymbolic AI to give robots real autonomous reasoning, planning, and learning capabilities. BEAM-1 realizes the revolution of battery disassembly. Its framework can be easily ported to any robotic system to realize different application scenarios, which provides a ground-breaking idea for the design and implementation of future embodied intelligent robotic systems.
Submitted 9 July, 2024;
originally announced July 2024.
-
LuSNAR:A Lunar Segmentation, Navigation and Reconstruction Dataset based on Muti-sensor for Autonomous Exploration
Authors:
Jiayi Liu,
Qianyu Zhang,
Xue Wan,
Shengyang Zhang,
Yaolin Tian,
Haodong Han,
Yutao Zhao,
Baichuan Liu,
Zeyuan Zhao,
Xubo Luo
Abstract:
As lunar exploration missions grow more complex, lunar rovers need a higher level of autonomy. Environmental perception and navigation algorithms are the foundation for lunar rovers to achieve autonomous exploration. The development and verification of algorithms require highly reliable data support. Most of the existing lunar datasets are targeted at a single task, lacking diverse scenes and high-precision ground truth labels. To address this issue, we propose a multi-task, multi-scene, and multi-label lunar benchmark dataset, LuSNAR. This dataset can be used for comprehensive evaluation of autonomous perception and navigation systems, and it includes high-resolution stereo image pairs, panoramic semantic labels, dense depth maps, LiDAR point clouds, and the position of the rover. In order to provide richer scene data, we built 9 lunar simulation scenes based on Unreal Engine. Each scene is divided according to topographic relief and the density of objects. To verify the usability of the dataset, we evaluated and analyzed the algorithms of semantic segmentation, 3D reconstruction, and autonomous navigation. The experimental results show that the dataset proposed in this paper can be used for ground verification of tasks such as autonomous environment perception and navigation, and provides a lunar benchmark dataset for testing the accessibility of algorithm metrics. We make LuSNAR publicly available at: https://github.com/autumn999999/LuSNAR-dataset.
Submitted 8 July, 2024;
originally announced July 2024.
-
A comparative study of ultraluminous infrared galaxies in the IRAS and SDSS Surveys
Authors:
Shaohua Zhang,
Zhijian Luo,
Xiheng Shi,
Chenggan Shu,
Hubing Xiao,
Hongyan Zhou
Abstract:
We present a comprehensive study of Ultraluminous Infrared Galaxies (ULIRGs), leveraging data from the IRAS Faint Source Catalogue (FSC) and the spectroscopic catalog in the Sloan Digital Sky Survey (SDSS) DR16. Our meticulous cross-matching technique significantly enhances the reliability of ULIRG identification, resulting in the identification of 283 reliable ULIRGs, including 102 new detections, while discarding 120 previously reported false sources. Covering a redshift range of $z = 0.018 - 0.996$, with a median redshift of $\bar{z} = 0.259$, our uniform sample reveals apparent interaction features in approximately 40% of ULIRGs, increasing to 92% for those with $z < 0.1$. Through optical spectra analysis, it is indicated that over 58% of ULIRGs host an AGN, which is twice as high as the detections based solely on infrared colors. Moreover, a pronounced excess of radio emissions associated with AGN activity results in a steeper radio-far-infrared correlation. Notably, Type I ULIRGs exhibit properties similar to those of narrow-line Seyfert 1 galaxies (NLS1s), with an elevated incidence rate of Mg II BALs (16.7%), surpassing that of typical optically selected quasars by over tenfold, consistent with current evolutionary models. We anticipate that forthcoming telescopes such as the China Space Station Telescope (CSST) and Leighton Chajnantor Telescope (LCT) will provide deeper insights into ULIRG morphology, dust distribution, molecular gas, and AGN activity.
Submitted 8 July, 2024;
originally announced July 2024.
-
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Authors:
Ruibo Fu,
Xin Qi,
Zhengqi Wen,
Jianhua Tao,
Tao Wang,
Chunyu Qiang,
Zhiyong Wang,
Yi Lu,
Xiaopeng Wang,
Shuchen Shi,
Yukun Liu,
Xuefei Liu,
Shuai Zhang
Abstract:
Speaker adaptation, which involves cloning voices from unseen speakers in the Text-to-Speech task, has garnered significant interest due to its numerous applications in multi-media fields. Despite recent advancements, existing methods often struggle with inadequate speaker representation accuracy and overfitting, particularly in limited reference speeches scenarios. To address these challenges, we propose an Agile Speaker Representation Reinforcement Learning strategy to enhance speaker similarity in speaker adaptation tasks. ASRRL is the first work to apply reinforcement learning to improve the modeling accuracy of speaker embeddings in speaker adaptation, addressing the challenge of decoupling voice content and timbre. Our approach introduces two action strategies tailored to different reference speeches scenarios. In the single-sentence scenario, a knowledge-oriented optimal routine searching RL method is employed to expedite the exploration and retrieval of refinement information on the fringe of speaker representations. In the few-sentence scenario, we utilize a dynamic RL method to adaptively fuse reference speeches, enhancing the robustness and accuracy of speaker modeling. To achieve optimal results in the target domain, a multi-scale fusion scoring mechanism based reward model that evaluates speaker similarity, speech quality, and intelligibility across three dimensions is proposed, ensuring that improvements in speaker similarity do not compromise speech quality or intelligibility. The experimental results on the LibriTTS and VCTK datasets within mainstream TTS frameworks demonstrate the extensibility and generalization capabilities of the proposed ASRRL method. The results indicate that the ASRRL method significantly outperforms traditional fine-tuning approaches, achieving higher speaker similarity and better overall speech quality with limited reference speeches.
Submitted 7 July, 2024;
originally announced July 2024.
-
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens
Authors:
Zhihao Du,
Qian Chen,
Shiliang Zhang,
Kai Hu,
Heng Lu,
Yexin Yang,
Hangrui Hu,
Siqi Zheng,
Yue Gu,
Ziyang Ma,
Zhifu Gao,
Zhijie Yan
Abstract:
Recent years have witnessed a trend that large language model (LLM) based text-to-speech (TTS) emerges into the mainstream due to their high naturalness and zero-shot capacity. In this paradigm, speech signals are discretized into token sequences, which are modeled by an LLM with text as prompts and reconstructed by a token-based vocoder to waveforms. Obviously, speech tokens play a critical role in LLM-based TTS models. Current speech tokens are learned in an unsupervised manner, which lacks explicit semantic information and alignment to the text. In this paper, we propose to represent speech with supervised semantic tokens, which are derived from a multilingual speech recognition model by inserting vector quantization into the encoder. Based on the tokens, we further propose a scalable zero-shot TTS synthesizer, CosyVoice, which consists of an LLM for text-to-token generation and a conditional flow matching model for token-to-speech synthesis. Experimental results show that supervised semantic tokens significantly outperform existing unsupervised tokens in terms of content consistency and speaker similarity for zero-shot voice cloning. Moreover, we find that utilizing large-scale data further improves the synthesis performance, indicating the scalable capacity of CosyVoice. To the best of our knowledge, this is the first attempt to involve supervised speech tokens into TTS models.
Submitted 9 July, 2024; v1 submitted 7 July, 2024;
originally announced July 2024.
-
A timing view of the additional high-energy spectral component discovered in the black hole candidate Swift J1727.8-1613
Authors:
Zi-Xu Yang,
Liang Zhang,
Shuang-Nan Zhang,
L. Tao,
Shu Zhang,
Ruican Ma,
Qingcui Bu,
Yue Huang,
He-Xin Liu,
Wei Yu,
Guang C. Xiao,
Peng-Ju Wang,
Hua Feng,
Li-Ming Song,
Xiang Ma,
Mingyu Ge,
QingChang Zhao,
J. L. Qu
Abstract:
We present an energy-dependent analysis for the type-C quasi-periodic oscillations (QPOs) observed in the black hole X-ray binary Swift J1727.8-1613 using Insight-HXMT observations. We find that the QPO fractional rms at energies above 40 keV is significantly higher than that below 20 keV. This is the first report of a high energy (HE)-rms excess in the rms spectrum of a black hole X-ray binary. In the high energy band, an extra hard component is observed in addition to the standard thermal Comptonization component in a similar energy band. The value of the QPO HE-rms excess is not only correlated with the disk parameters and the photon index of the standard Comptonization component, but also exhibits a moderate positive correlation with the flux of the additional hard spectral component. No features in the QPO phase-lag spectra are seen corresponding to the additional hard component. We propose that the additional hard component in the spectrum may originate from jet emission and the associated QPO HE-rms excess can be explained by the precession of the jet base.
Submitted 6 July, 2024;
originally announced July 2024.
-
Three-Body Recombination of Ultracold Microwave-Shielded Polar Molecules
Authors:
Ian Stevenson,
Shayamal Singh,
Ahmed Elkamshishy,
Niccoló Bigagli,
Weijun Yuan,
Siwei Zhang,
Chris H. Greene,
Sebastian Will
Abstract:
A combined experimental and theoretical study is carried out on the three-body recombination process in a gas of microwave-shielded polar molecules. For ground-state polar molecules dressed with a strong microwave field, field-linked bound states can appear in the intermolecular potential. We model three-body recombination into such bound states using classical trajectory calculations. Our results show that recombination can explain the enhanced loss rates observed at small microwave detunings in trapped samples of bosonic NaCs [Bigagli, $\textit{et al.}$, Nat. Phys. $\textbf{19}$ 1579-1584 (2023)]. Specifically, our calculations reproduce the experimentally measured three-body loss rates across a wide range of microwave Rabi couplings, detunings, and temperatures. This work suggests that for bosonic shielded molecular systems in which the two-body loss is sufficiently suppressed and a field-linked bound state is present, the dominant loss process will be three-body recombination.
Submitted 5 July, 2024;
originally announced July 2024.
-
Longitudinal optical phonons in photonic time crystals containing a stationary charge
Authors:
Sihao Zhang,
Junhua Dong,
Huanan Li,
Jingjun Xu,
Boris Shapiro
Abstract:
Lorentzian-type media support optical phonons that oscillate with longitudinal polarization parallel to the wave direction, at a wave vector-independent frequency at which the permittivity becomes zero. Here, we study the interactions between the longitudinal optical phonons and Lorentzian medium-based dispersive photonic time crystals (PTCs). We demonstrate that a stationary charge embedded in the PTCs can excite these longitudinal modes through the conversion of the static polarization field induced by the charge. Furthermore, the PTCs can develop a momentum bandgap across the entire wave vector space to amplify the longitudinal modes. Remarkably, this infinite momentum bandgap can be established with minimal temporal modulation of the refractive index when creating the PTCs. Our approach expands the range of waves that can be manipulated in PTCs and shows potential for observing momentum bandgap phenomenon in realistic optical experiments, where the modulation depth of the refractive index is severely constrained.
Submitted 5 July, 2024;
originally announced July 2024.
-
Enabling On-Device LLMs Personalization with Smartphone Sensing
Authors:
Shiquan Zhang,
Ying Ma,
Le Fang,
Hong Jia,
Simon D'Alfonso,
Vassilis Kostakos
Abstract:
This demo presents a novel end-to-end framework that combines on-device large language models (LLMs) with smartphone sensing technologies to achieve context-aware and personalized services. The framework addresses critical limitations of current personalization solutions via cloud-based LLMs, such as privacy concerns, latency and cost, and limited personal sensor data. To achieve this, we innovatively proposed deploying LLMs on smartphones with multimodal sensor data and customized prompt engineering, ensuring privacy and enhancing personalization performance through context-aware sensing. A case study involving a university student demonstrated the proposed framework's capability to provide tailored recommendations. In addition, we show that the proposed framework achieves the best trade-off in privacy, performance, latency, cost, battery and energy consumption between on-device and cloud LLMs. Future work aims to integrate more diverse sensor data and conduct large-scale user studies to further refine the personalization. We envision the proposed framework could significantly improve user experiences in various domains such as healthcare, productivity, and entertainment by providing secure, context-aware, and efficient interactions directly on users' devices.
Submitted 5 July, 2024;
originally announced July 2024.
-
Novel Optimization Techniques for Parameter Estimation
Authors:
Chenyu Wu,
Nuozhou Wang,
Casey Garner,
Kevin Leder,
Shuzhong Zhang
Abstract:
In this paper, we introduce a new optimization algorithm that is well suited for solving parameter estimation problems. We call our new method cubic regularized Newton with affine scaling (CRNAS). In contrast to so-called first-order methods, which rely solely on the gradient of the objective function, our method utilizes the Hessian of the objective. As a result, it is able to focus on points satisfying the second-order optimality conditions, as opposed to first-order methods that simply converge to critical points. This is an important feature in parameter estimation problems, where the objective function is often non-convex and, as a result, there can be many critical points, making it nearly impossible to identify the global minimum. An important feature of parameter estimation in mathematical models of biological systems is that the parameters are constrained by either physical constraints or prior knowledge. We use an affine scaling approach to handle a wide class of constraints. We establish that CRNAS identifies a point satisfying $ε$-approximate second-order optimality conditions within $O(ε^{-3/2})$ iterations. Finally, we compare CRNAS with MATLAB's optimization solver fmincon on three different test problems. These test problems all feature mixtures of heterogeneous populations, a problem setting for which CRNAS is particularly well suited. Our numerical simulations show that CRNAS has favorable performance, performing comparably to, if not better than, fmincon in accuracy and computational cost for most of our examples.
Submitted 4 July, 2024;
originally announced July 2024.
-
A Unified Intracellular pH Landscape with SITE-pHorin: a Quantum-Entanglement-Enhanced pH Probe
Authors:
Shu-Ang Li,
Xiao-Yan Meng,
Su Zhang,
Ying-Jie Zhang,
Run-Zhou Yang,
Dian-Dian Wang,
Yang Yang,
Pei-Pei Liu,
Jian-Sheng Kang
Abstract:
An accurate map of intracellular organelle pH is crucial for comprehending cellular metabolism and organellar functions. However, a unified intracellular pH spectrum using a single probe is still lacking. Here, we developed a novel quantum entanglement-enhanced pH-sensitive probe called SITE-pHorin, which featured a wide pH-sensitive range and ratiometric quantitative measurement capabilities. Subsequently, we measured the pH of various organelles and their sub-compartments, including mitochondrial sub-spaces, Golgi stacks, endoplasmic reticulum, lysosomes, peroxisomes, and endosomes in COS-7 cells. For the long-standing debate on the pH of mitochondrial compartments, we measured the pH of mitochondrial cristae as 6.60 ± 0.40, the pH of mitochondrial intermembrane space as 6.95 ± 0.30, and two populations of mitochondrial matrix pH at approximately 7.20 ± 0.27 and 7.50 ± 0.16, respectively. Notably, the lysosome pH exhibited a single, narrow Gaussian distribution centered at 4.79 ± 0.17. Furthermore, quantum chemistry computations revealed that the deprotonation of the residue Y182 and the discrete curvature of the deformed benzene ring in the chromophore are both necessary for the quantum entanglement mechanism of SITE-pHorin. Intriguingly, our findings reveal an accurate pH gradient (0.6-0.9 pH unit) between mitochondrial cristae and matrix, suggesting that prior values of ΔpH (0.4-0.6) and the mitochondrial proton motive force (pmf) are underestimates.
Submitted 4 July, 2024;
originally announced July 2024.
-
Sparsest Models Elude Pruning: An Exposé of Pruning's Current Capabilities
Authors:
Stephen Zhang,
Vardan Papyan
Abstract:
Pruning has emerged as a promising approach for compressing large-scale models, yet its effectiveness in recovering the sparsest of models has not yet been explored. We conducted an extensive series of 485,838 experiments, applying a range of state-of-the-art pruning algorithms to a synthetic dataset we created, named the Cubist Spiral. Our findings reveal a significant gap in performance compared to ideal sparse networks, which we identified through a novel combinatorial search algorithm. We attribute this performance gap to current pruning algorithms' poor behaviour under overparameterization, their tendency to induce disconnected paths throughout the network, and their propensity to get stuck at suboptimal solutions, even when given the optimal width and initialization. This gap is concerning, given the simplicity of the network architectures and datasets used in our study. We hope that our research encourages further investigation into new pruning techniques that strive for true network sparsity.
Submitted 4 July, 2024;
originally announced July 2024.
-
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Authors:
Keyu An,
Qian Chen,
Chong Deng,
Zhihao Du,
Changfeng Gao,
Zhifu Gao,
Yue Gu,
Ting He,
Hangrui Hu,
Kai Hu,
Shengpeng Ji,
Yabin Li,
Zerui Li,
Heng Lu,
Haoneng Luo,
Xiang Lv,
Bin Ma,
Ziyang Ma,
Chongjia Ni,
Changhe Song,
Jiaqi Shi,
Xian Shi,
Hao Wang,
Wen Wang,
Yuxuan Wang,
et al. (8 additional authors not shown)
Abstract:
This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, speaking style, and speaker identity. SenseVoice-Small delivers exceptionally low-latency ASR for 5 languages, and SenseVoice-Large supports high-precision ASR for over 50 languages, while CosyVoice excels in multi-lingual voice generation, zero-shot in-context learning, cross-lingual voice cloning, and instruction-following capabilities. The models related to SenseVoice and CosyVoice have been open-sourced on Modelscope and Huggingface, along with the corresponding training, inference, and fine-tuning codes released on GitHub. By integrating these models with LLMs, FunAudioLLM enables applications such as speech-to-speech translation, emotional voice chat, interactive podcasts, and expressive audiobook narration, thereby pushing the boundaries of voice interaction technology. Demos are available at https://fun-audio-llm.github.io, and the code can be accessed at https://github.com/FunAudioLLM.
Submitted 10 July, 2024; v1 submitted 4 July, 2024;
originally announced July 2024.
-
Performance of Medical Image Fusion in High-level Analysis Tasks: A Mutual Enhancement Framework for Unaligned PAT and MRI Image Fusion
Authors:
Yutian Zhong,
Jinchuan He,
Zhichao Liang,
Shuangyang Zhang,
Qianjin Feng,
Wufan Chen,
Li Qi
Abstract:
Photoacoustic tomography (PAT) offers optical contrast, whereas magnetic resonance imaging (MRI) excels in imaging soft tissue and organ anatomy. The fusion of PAT with MRI holds promising application prospects due to their complementary advantages. Existing image fusion methods have made considerable progress on pre-registered images, yet spatial deformations are difficult to avoid in medical imaging scenarios. More importantly, current algorithms focus on visual quality and statistical metrics, thus overlooking the requirements of high-level tasks. To address these challenges, we propose an unsupervised fusion model, termed PAMRFuse+, which integrates image generation and registration. Specifically, a cross-modal style transfer network is introduced to simplify cross-modal registration to single-modal registration. Subsequently, a multi-level registration network is employed to predict displacement vector fields. Furthermore, a dual-branch feature decomposition fusion network is proposed to address the challenges of cross-modal feature modeling and decomposition by integrating modality-specific and modality-shared features. PAMRFuse+ achieves satisfactory results in registering and fusing unaligned PAT-MRI datasets. Moreover, for the first time, we evaluate the performance of medical image fusion with contour segmentation and multi-organ instance segmentation. Extensive experiments reveal the advantages of PAMRFuse+ in improving the performance of medical image analysis tasks.
Submitted 4 July, 2024;
originally announced July 2024.
-
Probing Perfection: The Relentless Art of Meddling for Pulmonary Airway Segmentation from HRCT via a Human-AI Collaboration Based Active Learning Method
Authors:
Shiyi Wang,
Yang Nan,
Sheng Zhang,
Federico Felder,
Xiaodan Xing,
Yingying Fang,
Javier Del Ser,
Simon L F Walsh,
Guang Yang
Abstract:
In pulmonary tracheal segmentation, the scarcity of annotated data is a prevalent issue in medical segmentation. Additionally, Deep Learning (DL) methods face challenges: the opacity of 'black box' models and the need for performance enhancement. Our Human-Computer Interaction (HCI) based models (RS_UNet, LC_UNet, UUNet, and WD_UNet) address these challenges by combining diverse query strategies with various DL models. We train four HCI models and repeat these steps: (1) Query Strategy: The HCI models select samples that provide the most additional representative information when labeled in each iteration and identify unlabeled samples with the greatest predictive disparity using Wasserstein Distance, Least Confidence, Entropy Sampling, and Random Sampling. (2) Central line correction: Selected samples are used for expert correction of system-generated tracheal central lines in each training round. (3) Update training dataset: Experts update the training dataset after each DL model's training epoch, enhancing the trustworthiness and performance of the models. (4) Model training: The HCI model is trained using the updated dataset and an enhanced UNet version. Experimental results confirm the effectiveness of these HCI-based approaches, showing that WD-UNet, LC-UNet, UUNet, and RS-UNet achieve comparable or superior performance to state-of-the-art DL models. Notably, WD-UNet achieves this with only 15%-35% of the training data, reducing physician annotation time by 65%-85%.
Submitted 3 July, 2024;
originally announced July 2024.
-
Fisher-aware Quantization for DETR Detectors with Critical-category Objectives
Authors:
Huanrui Yang,
Yafeng Huang,
Zhen Dong,
Denis A Gudovskiy,
Tomoyuki Okuno,
Yohei Nakata,
Yuan Du,
Kurt Keutzer,
Shanghang Zhang
Abstract:
The impact of quantization on the overall performance of deep learning models is a well-studied problem. However, understanding and mitigating its effects on a more fine-grained level is still lacking, especially for harder tasks such as object detection with both classification and regression objectives. This work defines the performance for a subset of task-critical categories, i.e. the critical-category performance, as a crucial yet largely overlooked fine-grained objective for detection tasks. We analyze the impact of quantization at the category-level granularity, and propose methods to improve performance for the critical categories. Specifically, we find that certain critical categories have a higher sensitivity to quantization, and are prone to overfitting after quantization-aware training (QAT). To explain this, we provide theoretical and empirical links between their performance gaps and the corresponding loss landscapes with the Fisher information framework. Using this evidence, we apply a Fisher-aware mixed-precision quantization scheme, and a Fisher-trace regularization for the QAT on the critical-category loss landscape. The proposed methods improve critical-category metrics of the quantized transformer-based DETR detectors. The improvements are even more significant for larger models and a higher number of classes, where overfitting becomes more severe. For example, our methods lead to 10.4% and 14.5% mAP gains for 4-bit DETR-R50 and Deformable DETR, respectively, on the most impacted critical classes in the COCO Panoptic dataset.
Submitted 3 July, 2024;
originally announced July 2024.
-
Observation of Co-propagating Chiral Zero Modes in Magnetic Photonic Crystals
Authors:
Zhongfu Li,
Shaojie Ma,
Shuwei Li,
Oubo You,
Yachao Liu,
Qingdong Yang,
Yuanjiang Xiang,
Peiheng Zhou,
Shuang Zhang
Abstract:
Topological singularities, such as Weyl points and Dirac points, can give rise to unidirectional propagation channels known as chiral zero modes (CZMs) when subject to a magnetic field. These CZMs are responsible for intriguing phenomena like the chiral anomaly in quantum systems. The propagation direction of each CZM is determined by both the applied magnetic field and the topological charge of the singularity point. While counter-propagating CZMs have been observed in 2D and 3D systems, the realization of co-propagating CZMs has remained elusive. Here we present the first experimental observation of co-propagating CZMs in magnetic photonic crystals hosting a single pair of ideal Weyl points (WPs). By manipulating the crystal's structural configuration, we spatially alter the locations of the WPs, creating pseudo-magnetic fields in opposite directions between them. This arrangement results in a pair of CZMs that possess the same group velocity and co-propagate. Our work opens up new possibilities for topological manipulation of wave propagation and may lead to advancements in optical waveguides, switches, and various other applications.
Submitted 3 July, 2024;
originally announced July 2024.
-
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Authors:
Pan Zhang,
Xiaoyi Dong,
Yuhang Zang,
Yuhang Cao,
Rui Qian,
Lin Chen,
Qipeng Guo,
Haodong Duan,
Bin Wang,
Linke Ouyang,
Songyang Zhang,
Wenwei Zhang,
Yining Li,
Yang Gao,
Peng Sun,
Xinyue Zhang,
Wei Li,
Jingwen Li,
Wenhai Wang,
Hang Yan,
Conghui He,
Xingcheng Zhang,
Kai Chen,
Jifeng Dai,
Yu Qiao
, et al. (2 additional authors not shown)
Abstract:
We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely a 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. This long-context capability allows IXC-2.5 to excel in tasks requiring extensive input and output contexts. Compared to its previous 2.0 version, InternLM-XComposer-2.5 features three major upgrades in vision-language comprehension: (1) Ultra-High Resolution Understanding, (2) Fine-Grained Video Understanding, and (3) Multi-Turn Multi-Image Dialogue. In addition to comprehension, IXC-2.5 extends to two compelling applications using extra LoRA parameters for text-image composition: (1) Crafting Webpages and (2) Composing High-Quality Text-Image Articles. IXC-2.5 has been evaluated on 28 benchmarks, outperforming existing open-source state-of-the-art models on 16 benchmarks. It also surpasses or competes closely with GPT-4V and Gemini Pro on 16 key tasks. InternLM-XComposer-2.5 is publicly available at https://github.com/InternLM/InternLM-XComposer.
Submitted 3 July, 2024;
originally announced July 2024.
-
Thorium doped strontium fluoride crystal: a unique candidate for solid nuclear optical clock material
Authors:
Qiaorui Gong,
Shanming Li,
Shulong Zhang,
Siliang Tao,
Guoliang Deng,
Peixiong Zhang,
Chengchun Zhao,
Yin Hang,
Shining Zhu,
Longsheng Ma
Abstract:
We report a candidate with unique advantages in the growth of solid-state nuclear clock material: Th:SrF2 crystal. It not only has a segregation coefficient close to 1, which allows highly efficient and uniform doping of Th, but also maintains a high transmittance (~69% at 150 nm) while achieving an extremely high doping concentration (232Th > 6×10^20 cm^-3). In addition, SrF2 crystal will not become irradiation-colored under strong α radiation like CaF2 crystal. Th:SrF2 crystal is therefore expected to make full use of its high doping concentration while its transmission in the nuclear transition band is not severely affected by 229Th radiation damage.
Submitted 3 July, 2024;
originally announced July 2024.
-
Design of a UE5-based digital twin platform
Authors:
Shaoqiu Lyu,
Muzhi Wang,
Sunrui Zhang,
Shengzhi Wang
Abstract:
Because the learning and construction costs of current mainstream 3D scene engines are too high, this thesis proposes a design scheme for a digital twin platform based on Unreal Engine 5 (UE5). It aims to provide a universal design process for platform construction that effectively reduces the learning cost of large-scale scene construction. Taking an actual project of a unit as an example, the full work cycle of building the platform is explained, and the digital twin and data visualization technologies and applications based on UE5 are analyzed. By summarizing the project implementation into a process approach, the standardization and operability of the process pathway are improved.
Submitted 3 July, 2024;
originally announced July 2024.
-
ScreenTK: Seamless Detection of Time-Killing Moments Using Continuous Mobile Screen Text and on-device LLM
Authors:
Le Fang,
Shiquan Zhang,
Hong Jia,
Jorge Goncalves,
Vassilis Kostakos
Abstract:
Smartphones have become essential to people's digital lives, providing a continuous stream of information and connectivity. However, this constant flow can lead to moments where users are simply passing time rather than engaging meaningfully. This underscores the importance of developing methods to identify these "time-killing" moments, enabling the delivery of important notifications in a way that minimizes interruptions and enhances user engagement. Recent work has utilized screenshots taken every 5 seconds to detect time-killing activities on smartphones. However, this method often fails to capture phone usage between intervals. We demonstrate that up to 50% of time-killing instances go undetected using screenshots, leading to substantial gaps in understanding user behavior. To address this limitation, we propose a method called ScreenTK that detects time-killing moments by leveraging continuous screen text monitoring and on-device large language models (LLMs). Screen text contains more comprehensive information than screenshots and allows LLMs to summarize detailed phone usage. To verify our framework, we conducted experiments with six participants, capturing 1,034 records of different time-killing moments. Initial results show that our framework outperforms state-of-the-art solutions by 38% in our case study.
Submitted 7 July, 2024; v1 submitted 3 July, 2024;
originally announced July 2024.
-
Measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (639 additional authors not shown)
Abstract:
A high precision measurement of the branching fraction of the decay $J/ψ\to p \bar{p} η$ is performed using $(10 087 \pm 44) \times 10^6$ $J/ψ$ events recorded by the {BESIII} detector at the {BEPCII} storage ring. The branching fractions of the two decays $J/ψ\to p \bar{p} η(η\to γγ)$ and $J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)$ are measured individually to be $\mathcal{B}(J/ψ\to p \bar{p} η(η\to γγ)) = (1.480 \pm 0.001 \pm 0.024)\times\,10^{-3}$ and $\mathcal{B}(J/ψ\to p \bar{p} η(η\to π^+ π^- π^0)) = (1.557 \pm 0.003 \pm 0.038)\times\,10^{-3}$, where the first uncertainties are statistical and the second systematic. Both results are compatible within their uncorrelated systematic uncertainties. The combined result is $\mathcal{B}(J/ψ\to p \bar{p} η)=(1.495 \pm 0.001 \pm 0.023)\times\,10^{-3}$ where the first uncertainty is the combined statistical uncertainty and the second one the combined systematic uncertainty of both analyses, incorporating correlations between them. In addition, the $p \bar{p}$ threshold region is investigated for a potential threshold enhancement, and no evidence for one is observed.
Submitted 3 July, 2024;
originally announced July 2024.
-
A versatile quantum microwave photonic signal processing platform based on coincidence window selection technique
Authors:
Xinghua Li,
Yifan Guo,
Xiao Xiang,
Runai Quan,
Mingtao Cao,
Ruifang Dong,
Tao Liu,
Ming Li,
Shougang Zhang
Abstract:
Quantum microwave photonics (QMWP) is an innovative approach that combines energy-time entangled biphoton sources as the optical carrier with time-correlated single-photon detection for high-speed RF signal recovery. This groundbreaking method offers unique advantages such as nonlocal RF signal encoding and robust resistance to dispersion-induced frequency fading. This paper explores the versatility of processing the quantum microwave photonic signal by utilizing coincidence window selection on the biphoton coincidence distribution. The demonstration includes finely-tunable RF phase shifting, flexible multi-tap transversal filtering (with up to 15 taps), and photonically implemented RF mixing, leveraging the nonlocal RF mapping characteristic of QMWP. These accomplishments significantly enhance the capability of microwave photonic systems in processing ultra-weak signals, opening up new possibilities for various applications.
Submitted 2 July, 2024;
originally announced July 2024.
-
Quantum microwave photonic mixer with a large spurious-free dynamic range
Authors:
Xinghua Li,
Yifan Guo,
Xiao Xiang,
Runai Quan,
Mingtao Cao,
Ruifang Dong,
Tao Liu,
Ming Li,
Shougang Zhang
Abstract:
As one of the most fundamental functionalities of microwave photonics, microwave frequency mixing plays an essential role in modern radars and wireless communication systems. However, the commonly utilized intensity modulation in these systems often leads to inadequate spurious-free dynamic range (SFDR) for many sought-after applications. The quantum microwave photonics technique offers a promising solution for improving SFDR in terms of higher-order harmonic distortion. In this paper, we demonstrate two types of quantum microwave photonic mixers based on the configuration of the intensity modulators: cascade-type and parallel-type. Leveraging the nonlocal RF signal encoding capability, both types of quantum microwave photonic mixers not only exhibit the advantage of dual-channel output but also present significant improvement in SFDR. Specifically, the parallel-type quantum microwave photonic mixer achieves a remarkable SFDR value of 113.6 dB·Hz$^{1/2}$, which is 30 dB better than that of the cascade-type quantum microwave photonic mixer. When compared to the classical microwave photonic mixer, this enhancement reaches a notable 53.6 dB at the expense of 8 dB conversion loss. These results highlight the superiority of quantum microwave photonic mixers in the fields of microwave and millimeter-wave systems. By further applying multi-photon frequency entangled sources as optical carriers, the dual-channel microwave frequency conversion capability endowed by the quantum microwave photonic mixer can be extended to enhance the performance of multi-path microwave mixing, which is essential for radar net systems.
Submitted 2 July, 2024;
originally announced July 2024.
-
Comparison of Short-Range Order in GeSn Grown by Molecular Beam Epitaxy and Chemical Vapor Deposition
Authors:
Shang Liu,
Yunfan Liang,
Haochen Zhao,
Nirosh M. Eldose,
Jin-Hee Bae,
Omar Concepcion,
Xiaochen Jin,
Shunda Chen,
Ilias Bikmukhametov,
Austin Akey,
Cory T. Cline,
Alejandra Cuervo Covian,
Xiaoxin Wang,
Tianshu Li,
Yuping Zeng,
Dan Buca,
Shui-Qing Yu,
Gregory J. Salamo,
Shengbai Zhang,
Jifeng Liu
Abstract:
Atomic short-range order (SRO) in direct-bandgap GeSn for infrared photonics has recently attracted attention due to its notable impact on band structures. However, the SRO in GeSn thin films grown by different methods has hardly been compared. This paper compares SRO in GeSn thin films of similar compositions grown by molecular beam epitaxy (MBE) and chemical vapor deposition (CVD) using atom probe tomography. A $\sim$15% stronger preference for Sn-Sn 1$^{st}$ nearest neighbor (1NN) is observed in MBE GeSn than CVD GeSn for both thin film and quantum well samples with Sn composition ranging from 7 to 20%. Interestingly, samples grown by different deposition tools under the same method (either MBE or CVD) showed remarkable consistency in Sn-Sn 1NN SRO, while MBE vs. CVD showed clear differences. Supported by theoretical modeling, we consider that this difference in SRO originates from the impact of surface termination, where MBE surfaces are exposed to ultrahigh vacuum while CVD surfaces are terminated by H to a good extent. This finding not only suggests engineering surface termination or surfactants during the growth as a potential approach to control SRO in GeSn, but also provides insight into the underlying reasons for the very different growth temperatures between MBE and CVD that directly impact the strain relaxation behavior.
Submitted 2 July, 2024;
originally announced July 2024.
-
Analytical Solution of a Three-layer Network with a Matrix Exponential Activation Function
Authors:
Kuo Gai,
Shihua Zhang
Abstract:
In practice, deeper networks tend to be more powerful than shallow ones, but this has not been understood theoretically. In this paper, we find the analytical solution of a three-layer network with a matrix exponential activation function, i.e., we show that $$ f(X)=W_3\exp(W_2\exp(W_1X)), X\in \mathbb{C}^{d\times d} $$ has analytical solutions for the equations $$
Y_1=f(X_1),Y_2=f(X_2) $$ for $X_1,X_2,Y_1,Y_2$ under only invertibility assumptions. Our proof shows the power of depth and the use of a non-linear activation function, since a one-layer network can only solve one equation, i.e., $Y=WX$.
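The closing remark about the one-layer case can be checked directly. The short derivation below is a sketch under the abstract's invertibility assumption on $X_1$; it is not taken from the paper's proof.

```latex
% One-layer network f(X) = W X, with X_1 assumed invertible.
\begin{align*}
  Y_1 = W X_1 \;&\Longrightarrow\; W = Y_1 X_1^{-1}, \\
  Y_2 = W X_2 \;&\Longrightarrow\; Y_2 = Y_1 X_1^{-1} X_2 .
\end{align*}
% The first equation already fixes W uniquely, so the second equation holds
% only for the particular Y_2 above, not for an arbitrary prescribed Y_2.
```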
Submitted 1 July, 2024;
originally announced July 2024.
-
On the uniqueness of the strictly convex quadrilateral central configuration with a fixed angle
Authors:
Yangshanshan Liu,
Shiqing Zhang
Abstract:
The conjecture of the existence and the uniqueness of the strictly convex quadrilateral central configuration for the Newtonian 4-body problem is one of the most discussed open problems in the study of the classical n-body problems in celestial mechanics. MacMillan and Bartky first gave its general existence in the 1930s and a particular case for its uniqueness. Still, the general case has yet to be solved completely since it was considered by Simó and Yoccoz in the 1980s and was first mentioned by Albouy and Fu in 2008 in a formal publication. Using coordinates of mutual distances and Morse's critical point theory, we give the (at most) uniqueness of the planar strictly convex 4-body central configuration when the angle of one pair of the opposite sides is given.
Submitted 8 July, 2024; v1 submitted 2 July, 2024;
originally announced July 2024.
-
HRSAM: Efficiently Segment Anything in High-Resolution Images
Authors:
You Huang,
Wenbin Lai,
Jiayi Ji,
Liujuan Cao,
Shengchuan Zhang,
Rongrong Ji
Abstract:
The Segment Anything Model (SAM) has significantly advanced interactive segmentation but struggles with high-resolution images crucial for high-precision segmentation. This is primarily due to the quadratic space complexity of SAM-implemented attention and the length extrapolation issue in common global attention. This study proposes HRSAM that integrates Flash Attention and incorporates Plain, Shifted and newly proposed Cycle-scan Window (PSCWin) attention to address these issues. The shifted window attention is redesigned with padding to maintain consistent window sizes, enabling effective length extrapolation. The cycle-scan window attention adopts the recently developed State Space Models (SSMs) to ensure global information exchange with minimal computational overhead. Such window-based attention allows HRSAM to perform effective attention computations on scaled input images while maintaining low latency. Moreover, we further propose HRSAM++ that additionally employs a multi-scale strategy to enhance HRSAM's performance. The experiments on the high-precision segmentation datasets HQSeg44K and DAVIS show that high-resolution inputs enable the SAM-distilled HRSAM models to outperform the teacher model while maintaining lower latency. Compared to the SOTAs, HRSAM achieves a 1.56 improvement in interactive segmentation's NoC95 metric with only 31% of the latency. HRSAM++ further enhances the performance, achieving a 1.63 improvement in NoC95 with just 38% of the latency.
Submitted 2 July, 2024;
originally announced July 2024.
-
Fake News Detection and Manipulation Reasoning via Large Vision-Language Models
Authors:
Ruihan Jin,
Ruibo Fu,
Zhengqi Wen,
Shuai Zhang,
Yukun Liu,
Jianhua Tao
Abstract:
Fake news has become a growing threat to information security and public opinion with the rapid sprawl of media manipulation. Therefore, fake news detection attracts widespread attention from the academic community. Traditional fake news detection models demonstrate remarkable performance on binary authenticity classification, but their ability to reason about detailed traces of fakery based on the news content remains under-explored. Furthermore, due to the lack of external knowledge, the performance of existing methods on fact-related news is questionable, leaving their practical implementation unclear. In this paper, we propose a new multi-media research topic, namely manipulation reasoning. Manipulation reasoning aims to reason about manipulations based on news content. To support the research, we introduce a benchmark for fake news detection and manipulation reasoning, referred to as Human-centric and Fact-related Fake News (HFFN). The benchmark highlights the centrality of humans and high factual relevance, with detailed manual annotations. HFFN encompasses four realistic domains with fake news samples generated through three manipulation approaches. Moreover, a Multi-modal news Detection and Reasoning langUage Model (M-DRUM) is presented not only to judge the authenticity of multi-modal news, but also to provide analytical reasoning about potential manipulations. On the feature extraction level, a cross-attention mechanism is employed to extract fine-grained fusion features from multi-modal inputs. On the reasoning level, a large vision-language model (LVLM) serves as the backbone to facilitate fact-related reasoning. A two-stage training framework is deployed to better activate the capacity of identification and reasoning. Comprehensive experiments demonstrate that our model outperforms state-of-the-art (SOTA) fake news detection models and powerful LVLMs like GPT-4 and LLaVA.
Submitted 2 July, 2024;
originally announced July 2024.
-
LogEval: A Comprehensive Benchmark Suite for Large Language Models In Log Analysis
Authors:
Tianyu Cui,
Shiyu Ma,
Ziang Chen,
Tong Xiao,
Shimin Tao,
Yilun Liu,
Shenglin Zhang,
Duoming Lin,
Changchang Liu,
Yuzhe Cai,
Weibin Meng,
Yongqian Sun,
Dan Pei
Abstract:
Log analysis is crucial for ensuring the orderly and stable operation of information systems, particularly in the field of Artificial Intelligence for IT Operations (AIOps). Large Language Models (LLMs) have demonstrated significant potential in natural language processing tasks. In the AIOps domain, they excel in tasks such as anomaly detection, root cause analysis of faults, operations and maintenance script generation, and alert information summarization. However, the performance of current LLMs in log analysis tasks remains inadequately validated. To address this gap, we introduce LogEval, a comprehensive benchmark suite designed to evaluate the capabilities of LLMs in various log analysis tasks for the first time. This benchmark covers tasks such as log parsing, log anomaly detection, log fault diagnosis, and log summarization. LogEval evaluates each task using 4,000 publicly available log data entries and employs 15 different prompts for each task to ensure a thorough and fair assessment. By rigorously evaluating leading LLMs, we demonstrate the impact of various LLM technologies on log analysis performance, focusing on aspects such as self-consistency and few-shot contextual learning. We also discuss findings related to model quantification, Chinese-English question-answering evaluation, and prompt engineering. These findings provide insights into the strengths and weaknesses of LLMs in multilingual environments and the effectiveness of different prompt strategies. Various evaluation methods are employed for different tasks to accurately measure the performance of LLMs in log analysis, ensuring a comprehensive assessment. The insights gained from LogEval's evaluation reveal the strengths and limitations of LLMs in log analysis tasks, providing valuable guidance for researchers and practitioners.
Submitted 1 July, 2024;
originally announced July 2024.
-
Honor Among Bandits: No-Regret Learning for Online Fair Division
Authors:
Ariel D. Procaccia,
Benjamin Schiffer,
Shirley Zhang
Abstract:
We consider the problem of online fair division of indivisible goods to players when there are a finite number of types of goods and player values are drawn from distributions with unknown means. Our goal is to maximize social welfare subject to allocating the goods fairly in expectation. When a player's value for an item is unknown at the time of allocation, we show that this problem reduces to a variant of (stochastic) multi-armed bandits, where there exists an arm for each player's value for each type of good. At each time step, we choose a distribution over arms which determines how the next item is allocated. We consider two sets of fairness constraints for this problem: envy-freeness in expectation and proportionality in expectation. Our main result is the design of an explore-then-commit algorithm that achieves $\tilde{O}(T^{2/3})$ regret while maintaining either fairness constraint. This result relies on unique properties fundamental to fair-division constraints that allow faster rates of learning, despite the restricted action space.
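For readers unfamiliar with the explore-then-commit pattern named in this abstract, the sketch below shows the classical version for a plain stochastic multi-armed bandit. It is not the authors' algorithm: it picks a single arm rather than a distribution over arms and models no envy-freeness or proportionality constraints. The function name, its parameters, and the deterministic toy rewards are all hypothetical.

```python
def explore_then_commit(pull, n_arms, horizon, explore_rounds):
    """Classical explore-then-commit for a stochastic multi-armed bandit.

    pull(arm) returns one observed reward for that arm. Every arm is pulled
    explore_rounds times, then the arm with the best empirical mean is
    pulled for the rest of the horizon.
    Returns (committed_arm, total_reward).
    """
    if explore_rounds < 1 or explore_rounds * n_arms > horizon:
        raise ValueError("need 1 <= explore_rounds and explore_rounds * n_arms <= horizon")

    totals = [0.0] * n_arms
    total_reward = 0.0

    # Exploration phase: sample every arm the same number of times.
    for _ in range(explore_rounds):
        for arm in range(n_arms):
            reward = pull(arm)
            totals[arm] += reward
            total_reward += reward

    # Commit phase: play the empirically best arm until the horizon ends.
    best = max(range(n_arms), key=lambda a: totals[a] / explore_rounds)
    for _ in range(horizon - explore_rounds * n_arms):
        total_reward += pull(best)

    return best, total_reward


# Deterministic toy rewards so the behaviour is easy to follow.
means = [0.1, 0.9, 0.5]
best_arm, total = explore_then_commit(lambda a: means[a], n_arms=3, horizon=10, explore_rounds=2)
```

In the classical setting, choosing the exploration budget on the order of $T^{2/3}$ pulls balances exploration cost against the risk of committing to a suboptimal arm, which is where a $T^{2/3}$ regret rate typically comes from.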
Submitted 1 July, 2024;
originally announced July 2024.
-
Failure Diagnosis in Microservice Systems: A Comprehensive Survey and Analysis
Authors:
Shenglin Zhang,
Sibo Xia,
Wenzhao Fan,
Binpeng Shi,
Xiao Xiong,
Zhenyu Zhong,
Minghua Ma,
Yongqian Sun,
Dan Pei
Abstract:
Modern microservice systems have gained widespread adoption due to their high scalability, flexibility, and extensibility. However, the characteristics of independent deployment, decentralization, and frequent dynamic interactions also introduce the risk of cascading failures, making it challenging to achieve accurate failure diagnosis and rapid system recovery. These issues severely impact operation efficiency and user experience. Recognizing the crucial role of failure diagnosis in enhancing the stability and reliability of microservice systems, researchers have conducted extensive studies and achieved a series of significant outcomes. This survey provides a comprehensive review and primary analysis of 94 papers from 2003 to the present, including an overview of the fundamental concepts, a research framework, and problem statements. These insights aim to help researchers understand the latest research progress in failure diagnosis. Publicly available datasets, toolkits, and evaluation metrics are also compiled to assist practitioners in selecting and validating various techniques, providing a foundation to advance the domain beyond current practices.
Submitted 27 June, 2024;
originally announced July 2024.
-
Heights and periods of algebraic cycles in families
Authors:
Ziyang Gao,
Shou-Wu Zhang
Abstract:
We consider the Beilinson--Bloch heights and Abel--Jacobi periods of homologically trivial Chow cycles in families. For the Beilinson--Bloch heights, we show that for any $g\ge 2$, there is a Zariski open dense subset $U$ of $\mathcal{M}_g$, the coarse moduli of curves of genus $g$ over the rationals, such that the heights of Ceresa cycles and Gross--Schoen cycles over $U$ satisfy the Northcott property. For the Abel--Jacobi periods, we provide an algebraic criterion for the existence of a Zariski open dense subset of any family such that all cycles not defined over $\overline{\mathbb{Q}}$ are non-torsion and verify that this criterion holds for Ceresa cycles and Gross--Schoen cycles.
Submitted 1 July, 2024;
originally announced July 2024.