-
A Perspective on Foundation Models for the Electric Power Grid
Authors:
Hendrik F. Hamann,
Thomas Brunschwiler,
Blazhe Gjorgiev,
Leonardo S. A. Martins,
Alban Puech,
Anna Varbella,
Jonas Weiss,
Juan Bernabe-Moreno,
Alexandre Blondin Massé,
Seong Choi,
Ian Foster,
Bri-Mathias Hodge,
Rishabh Jain,
Kibaek Kim,
Vincent Mai,
François Mirallès,
Martin De Montigny,
Octavio Ramos-Leaños,
Hussein Suprême,
Le Xie,
El-Nasser S. Youssef,
Arnaud Zinflou,
Alexander J. Belvi,
Ricardo J. Bessa,
Bishnu Prasad Bhattari,
et al. (2 additional authors not shown)
Abstract:
Foundation models (FMs) currently dominate news headlines. They employ advanced deep learning architectures to extract structural information autonomously from vast datasets through self-supervision. The resulting rich representations of complex systems and dynamics can be applied to many downstream applications. FMs could therefore find uses in electric power grids, which are challenged by the energy transition and climate change. In this paper, we call for the development of FMs for electric grids and explain why we believe in their potential. We highlight their strengths and weaknesses amid the challenges of a changing grid. We argue that an FM learning from diverse grid data and topologies could unlock transformative capabilities, pioneering a new approach in leveraging AI to redefine how we manage complexity and uncertainty in the electric grid. Finally, we discuss a power grid FM concept, namely GridFM, based on graph neural networks, and show how different downstream tasks benefit from it.
Submitted 12 July, 2024;
originally announced July 2024.
-
The Transmission Value of Energy Storage and Fundamental Limitations
Authors:
Qian Zhang,
P. R. Kumar,
Le Xie
Abstract:
This study addresses the transmission value of energy storage in electric grids. The inherent connection between storage and transmission infrastructure is captured from a "cumulative energy" perspective, which enables the conventional optimization problem to be reformulated with line power flow as the decision variable. The study also establishes theoretical limits on the extent to which storage and transmission lines can replace each other, providing explicit closed-form expressions for the minimum capacity needed. In a key departure from conventional practice, in which transmission lines are designed according to peak power delivery needs, sufficient storage capacity allows transmission line capacity to be designed according to average power delivery needs. The models in this paper rely on only a few basic assumptions, paving the way for future market designs that treat storage as a transmission asset. Numerical experiments based on 2-bus, modified RTS 24-bus, RTS-GMLC, and Texas synthetic power systems illustrate the results.
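The peak-versus-average point can be illustrated with a toy cumulative-energy calculation. This is a textbook sketch, not the paper's closed-form expressions. It assumes a lossless storage unit at the load end, a load profile that repeats every cycle, and a line delivering constant power equal to the average load. The function name and the load numbers are hypothetical.

```python
from itertools import accumulate

def size_line_and_storage(load_mw, dt_h=1.0):
    """Toy sizing of a line plus load-side storage for one repeating load cycle.

    Assumes lossless storage, a periodic load profile, and a line that delivers
    constant power equal to the average load (all illustrative assumptions).
    Returns (peak-based line capacity in MW, average-based line capacity in MW,
    storage energy capacity in MWh).
    """
    avg_mw = sum(load_mw) / len(load_mw)
    peak_mw = max(load_mw)
    # Energy the storage absorbs (+) or supplies (-) in each interval.
    net_mwh = [(avg_mw - p) * dt_h for p in load_mw]
    # Cumulative stored energy relative to the start of the cycle.
    stored = [0.0] + list(accumulate(net_mwh))
    # The storage must span the full swing of the cumulative-energy curve.
    energy_mwh = max(stored) - min(stored)
    return peak_mw, avg_mw, energy_mwh

# Hypothetical 4-hour cycle: 50 MW off-peak, 150 MW on-peak.
print(size_line_and_storage([50, 50, 150, 150]))  # (150, 100.0, 100.0)
```

In this toy case, sizing the line for the 100 MW average instead of the 150 MW peak requires 100 MWh of storage to cover the cumulative mismatch.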
Submitted 12 July, 2024;
originally announced July 2024.
-
Unifying Sequences, Structures, and Descriptions for Any-to-Any Protein Generation with the Large Multimodal Model HelixProtX
Authors:
Zhiyuan Chen,
Tianhao Chen,
Chenggang Xie,
Yang Xue,
Xiaonan Zhang,
Jingbo Zhou,
Xiaomin Fang
Abstract:
Proteins are fundamental components of biological systems and can be represented through various modalities, including sequences, structures, and textual descriptions. Despite the advances in deep learning and scientific large language models (LLMs) for protein research, current methodologies predominantly focus on limited specialized tasks -- often predicting one protein modality from another. These approaches restrict the understanding and generation of multimodal protein data. In contrast, large multimodal models have demonstrated the ability to generate any-to-any content, such as text, images, and videos, thus enriching user interactions across various domains. Integrating these multimodal model technologies into protein research offers significant promise by potentially transforming how proteins are studied. To this end, we introduce HelixProtX, a system built upon a large multimodal model, aiming to offer a comprehensive solution to protein research by supporting any-to-any protein modality generation. Unlike existing methods, it allows for the transformation of any input protein modality into any desired protein modality. The experimental results affirm the advanced capabilities of HelixProtX, not only in generating functional descriptions from amino acid sequences but also in executing critical tasks such as designing protein sequences and structures from textual descriptions. Preliminary findings indicate that HelixProtX consistently achieves superior accuracy across a range of protein-related tasks, outperforming existing state-of-the-art models. By integrating multimodal large models into protein research, HelixProtX opens new avenues for understanding protein biology, thereby promising to accelerate scientific discovery.
Submitted 12 July, 2024;
originally announced July 2024.
-
Measurement of $CP$ asymmetries in $B^0 \to K^0_S π^0 γ$ decays at Belle II
Authors:
Belle II Collaboration,
I. Adachi,
L. Aggarwal,
H. Ahmed,
H. Aihara,
N. Akopov,
A. Aloisio,
N. Anh Ky,
D. M. Asner,
H. Atmacan,
T. Aushev,
V. Aushev,
M. Aversano,
R. Ayad,
V. Babu,
H. Bae,
S. Bahinipati,
P. Bambade,
Sw. Banerjee,
S. Bansal,
M. Barrett,
J. Baudot,
A. Baur,
A. Beaubien,
F. Becherer,
et al. (414 additional authors not shown)
Abstract:
We report measurements of time-dependent $CP$ asymmetries in $B^0 \to K^0_S π^0 γ$ decays based on a data sample of $(388\pm6)\times10^6$ $B\bar{B}$ events collected at the $Υ(4S)$ resonance with the Belle II detector. The Belle II experiment operates at the SuperKEKB asymmetric-energy $e^+e^-$ collider. We measure decay-time distributions to determine $CP$-violating parameters $S$ and $C$. We determine these parameters for two ranges of $K^0_S π^0$ invariant mass: $m(K^0_S π^0)\in (0.8, 1.0)$ $\mathrm{GeV}/c^2$, which is dominated by $B^0 \to K^{*0} (\to K^0_S π^0) γ$ decays, and a complementary region $m(K^0_S π^0)\in (0.6, 0.8)\cup(1.0, 1.8)$ $\mathrm{GeV}/c^2$. Our results have improved precision as compared to previous measurements and are consistent with theory predictions.
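The parameters $S$ and $C$ enter through the flavor-tagged decay-time distribution. The sketch below uses the parameterization common in $B$-factory analyses. Sign conventions for $C$ differ between experiments, and a real fit model also includes vertex resolution, flavor-mistag rates, and backgrounds, none of which are shown. With these caveats, the probability density for the time difference $\Delta t$ between the signal-side and tag-side $B$ decays, with tag-side flavor $q$, takes the form:

```latex
\mathcal{P}(\Delta t, q) = \frac{e^{-|\Delta t|/\tau_{B^0}}}{4\tau_{B^0}}
  \Big\{ 1 + q \big[ S \sin(\Delta m_d \, \Delta t) - C \cos(\Delta m_d \, \Delta t) \big] \Big\}
```

Here $q = +1$ ($-1$) when the tag-side meson is a $B^0$ ($\bar{B}^0$), $\tau_{B^0}$ is the $B^0$ lifetime, and $\Delta m_d$ is the $B^0$ mixing frequency. Summing over $q = \pm 1$ and integrating over $\Delta t$ gives unit normalization.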
Submitted 12 July, 2024;
originally announced July 2024.
-
KUNPENG: An Embodied Large Model for Intelligent Maritime
Authors:
Naiyao Wang,
Tongbang Jiang,
Ye Wang,
Shaoyang Qiu,
Bo Zhang,
Xinqiang Xie,
Munan Li,
Chunliu Wang,
Yiyang Wang,
Hongxiang Ren,
Ruili Wang,
Hongjun Shan,
Hongbo Liu
Abstract:
Intelligent maritime technology, an essential component of smart ocean construction, deeply integrates advanced artificial intelligence and data analysis methods. It covers multiple aspects, such as smart vessels, route optimization, and safe navigation, and aims to enhance the efficiency of ocean resource utilization and the intelligence of transportation networks. However, the complex and dynamic maritime environment, along with diverse and heterogeneous large-scale data sources, presents challenges for real-time decision-making in intelligent maritime applications. In this paper, we propose KUNPENG, the first-ever embodied large model for intelligent maritime applications in smart ocean construction, which consists of six systems. The model perceives multi-source heterogeneous data to understand its interaction with the environment and makes autonomous decisions. Intelligent vessels use these decisions to navigate under safety and emergency guarantees while continuously optimizing power, thereby achieving embodied intelligence in the maritime domain. In comprehensive maritime task evaluations, KUNPENG has demonstrated excellent performance.
Submitted 12 July, 2024;
originally announced July 2024.
-
Photonic quasicrystal of spin angular momentum
Authors:
Min Lin,
Xinxin Gou,
Zhenwei Xie,
Aiping Yang,
Luping Du,
Xiaocong Yuan
Abstract:
Quasicrystals, characterized by long-range order without translational symmetry, have catalyzed transformative advances in various fields, including optics, in the form of field quasicrystals. Here, we present the first demonstration of photonic quasicrystals formed by spin angular momentum, unveiling novel spin-orbit coupling effects absent in traditional field quasicrystals. A theoretical framework akin to de Bruijn tiling was built to elucidate the formation mechanism of spin quasicrystals for diverse symmetries. Moreover, the configurations of these spin textures can be manipulated by adjusting the wavefronts, and in the process phason-like discontinuous dynamics are observed and quantitatively measured. Unlike optical quasicrystals shaped by electromagnetic fields, these spin-governed quasicrystals exhibit quasi-periodic properties of kinematic parameters, extending their potential applications to other physical systems. These findings hold promise for novel advancements in optical trapping, quasicrystal fabrication, and optical encryption systems.
Submitted 12 July, 2024;
originally announced July 2024.
-
Exploring Generative AI Policies in Higher Education: A Comparative Perspective from China, Japan, Mongolia, and the USA
Authors:
Qin Xie,
Ming Li,
Ariunaa Enkhtur
Abstract:
This study conducts a comparative analysis of national policies on Generative AI across four countries: China, Japan, Mongolia, and the USA. Employing the Qualitative Comparative Analysis (QCA) method, it examines the responses of these nations to Generative AI in higher education settings, scrutinizing the diversity in their approaches within this group. While all four countries exhibit a positive attitude toward Generative AI in higher education, Japan and the USA prioritize a human-centered approach and provide direct guidance in teaching and learning. In contrast, China and Mongolia prioritize national security concerns, with their guidelines focusing more on the societal level rather than being specifically tailored to education. Additionally, despite all four countries emphasizing diversity, equity, and inclusion, they consistently fail to clearly discuss or implement measures to address the digital divide. By offering a comprehensive comparative analysis of attitudes and policies regarding Generative AI in higher education across these countries, this study enriches existing literature and provides policymakers with a global perspective, ensuring that policies in this domain promote inclusion rather than exclusion.
Submitted 12 July, 2024;
originally announced July 2024.
-
Chern Bands' Optimally Localized Wannier Functions and Fractional Chern Insulators
Authors:
Fang Xie,
Yuan Fang,
Lei Chen,
Jennifer Cano,
Qimiao Si
Abstract:
Recent developments in fractional Chern insulators and proximate phases call for a real-space representation of isolated Chern bands. Here we propose a new, general method for constructing optimally localized Wannier functions from such Chern bands. We do so through an optimal gauge choice of the Bloch states of a Chern band, with the singularity placed at any desired position in momentum space. We apply this method to construct the optimally localized Wannier functions for the kagome lattice, and use it to identify channels of interactions that are favorable to the development of fractional Chern insulators. Implications of the approach for the interplay between correlations and topology in broader contexts are discussed.
Submitted 11 July, 2024;
originally announced July 2024.
-
Emerging Practices for Large Multimodal Model (LMM) Assistance for People with Visual Impairments: Implications for Design
Authors:
Jingyi Xie,
Rui Yu,
He Zhang,
Sooyeon Lee,
Syed Masum Billah,
John M. Carroll
Abstract:
People with visual impairments perceive their environment non-visually and often use AI-powered assistive tools to obtain textual descriptions of visual information. Recent large vision-language model-based AI-powered tools like Be My AI are more capable of understanding users' inquiries in natural language and describing the scene in audible text; however, the extent to which these tools are useful to visually impaired users is currently understudied. This paper aims to fill this gap. Our study with 14 visually impaired users reveals that they are adapting these tools organically -- not only can these tools facilitate complex interactions in household, spatial, and social contexts, but they also act as an extension of users' cognition, as if the cognition were distributed in the visual information. We also found that although the tools are currently not goal-oriented, users accommodate this limitation and embrace the tools' capabilities for broader use. These findings enable us to envision design implications for creating more goal-oriented, real-time processing, and reliable AI-powered assistive technology.
Submitted 11 July, 2024;
originally announced July 2024.
-
Scalable microwave-to-optical transducers at single photon level with spins
Authors:
Tian Xie,
Rikuto Fukumori,
Jiahui Li,
Andrei Faraon
Abstract:
Microwave-to-optical transduction of single photons will play an essential role in interconnecting future superconducting quantum devices, with applications in distributed quantum computing and secure communications. Various transducers that couple microwave and optical modes via an optical drive have been developed, utilizing nonlinear phenomena such as the Pockels effect and a combination of electromechanical, piezoelectric, and optomechanical couplings. However, the limited strength of these nonlinearities, set by bulk material properties, requires the use of high quality factor resonators, often in conjunction with sophisticated nano-fabrication of suspended structures. Thus, an efficient and scalable transduction technology is still an outstanding goal. Rare-earth ion (REI) doped crystals provide high-quality atomic resonances that result in effective second-order nonlinearities stronger by many orders of magnitude compared to conventional materials. Here, we use ytterbium-171 ions doped in a YVO$_4$ crystal at 340 ppm with an effective resonant $χ^{(2)}$ nonlinearity of $\sim 10^{7}$ pm/V to implement an on-chip microwave-to-optical transducer. Without an engineered optical cavity, we achieve percent-level efficiencies with an added noise as low as 1.24(9) photons. To showcase scalability, we demonstrate the interference of photons originating from two simultaneously operated transducers, enabled by the inherent absolute frequencies of the atomic transitions. These results establish REI-based transducers as a highly competitive transduction platform, provide existing REI-based quantum technologies a native link to various leading quantum microwave platforms, and pave the way toward remote transducer-assisted entanglement of superconducting quantum machines.
Submitted 11 July, 2024;
originally announced July 2024.
-
Multi-scale gridded Gabor attention for cirrus segmentation
Authors:
Felix Richards,
Adeline Paiement,
Xianghua Xie,
Elisabeth Sola,
Pierre-Alain Duc
Abstract:
In this paper, we address the challenge of segmenting global contaminants in large images. The precise delineation of such structures requires ample global context alongside an understanding of textural patterns. CNNs specialise in the latter, though their ability to generate global features is limited. Attention measures long-range dependencies in images, capturing global context, though at a large computational cost. We propose a gridded attention mechanism to address this limitation, greatly increasing efficiency by splitting multi-scale features into smaller tiles. We also enhance the attention mechanism's sensitivity to texture orientation by measuring correlations across features dependent on different orientations, in addition to channel and positional attention. We present results on a new dataset of astronomical images, where the task is segmenting large contaminating dust clouds.
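Tiling saves cost because each query is scored only against keys inside its own tile. The generic count below shows the effect. It assumes non-overlapping square tiles on a single feature map and ignores the multi-scale, channel, positional, and orientation components of the authors' mechanism. The function name is hypothetical.

```python
def attention_pair_counts(height, width, tile):
    """Query-key pairs scored by global vs. tile-restricted self-attention
    over a height x width feature map split into non-overlapping square tiles."""
    if height % tile or width % tile:
        raise ValueError("tile must divide both height and width")
    tokens = height * width
    global_pairs = tokens ** 2                  # every position attends to every position
    tiles = (height // tile) * (width // tile)
    tiled_pairs = tiles * (tile * tile) ** 2    # attention only within each tile
    return global_pairs, tiled_pairs

g, t = attention_pair_counts(64, 64, 8)
print(g, t, g // t)  # 16777216 262144 64
```

Under these assumptions, the saving factor is $HW / t^2$: it grows with image size, which is why tiling matters most for large images.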
Submitted 11 July, 2024;
originally announced July 2024.
-
Toward Efficient Deep Spiking Neuron Networks: A Survey On Compression
Authors:
Hui Xie,
Ge Yang,
Wenjuan Gao
Abstract:
With the rapid development of deep learning, Deep Spiking Neural Networks (DSNNs) have emerged as a promising approach due to their unique spike-event processing and asynchronous computation. When deployed on neuromorphic chips, DSNNs offer significant power advantages over Deep Artificial Neural Networks (DANNs) and eliminate time- and energy-consuming multiplications owing to the binary nature of spikes (0 or 1). Additionally, DSNNs excel at processing temporal information, making them potentially superior to DANNs for handling temporal data. However, their deep network structures and numerous parameters result in high computational costs and energy consumption, limiting real-world deployment. To enhance DSNN efficiency, researchers have adapted methods from DANNs, such as pruning, quantization, and knowledge distillation, and have developed specific techniques such as reducing spike firing and pruning time steps. While previous surveys have covered DSNN algorithms, hardware deployment, and general overviews, focused research on DSNN compression and efficiency has been lacking. This survey addresses this gap by concentrating on efficient DSNNs and their compression methods. It begins with an exploration of DSNNs' biological background and computational units, highlighting differences from DANNs. It then delves into various compression methods, including pruning, quantization, knowledge distillation, and reducing spike firing, and concludes with suggestions for future research directions.
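Binary spikes remove multiplications because a 0/1 input simply gates its synaptic weight in or out. The sketch below computes the input current of one neuron for a single time step. Membrane leak, threshold, and reset are omitted, and the function names are illustrative.

```python
def mac_neuron(inputs, weights):
    """ANN-style weighted sum: one multiply-accumulate per input."""
    return sum(x * w for x, w in zip(inputs, weights))

def spiking_neuron_input(spikes, weights):
    """Input current of a spiking neuron for one time step.

    With binary spikes (0 or 1), s * w is either 0 or w, so the neuron only
    adds the weights of inputs that fired: additions, no multiplications.
    """
    current = 0.0
    for s, w in zip(spikes, weights):
        if s:  # a spike simply gates its synaptic weight in
            current += w
    return current

spikes = [1, 0, 1, 1]
weights = [0.5, -1.2, 0.25, 2.0]
print(spiking_neuron_input(spikes, weights))  # 2.75
print(mac_neuron(spikes, weights))            # 2.75
```

The number of additions equals the number of spikes. This is why reducing spike firing is itself a lever for efficiency.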
Submitted 3 June, 2024;
originally announced July 2024.
-
Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight
Authors:
Zhiqiang Xie,
Yujia Zheng,
Lizi Ottens,
Kun Zhang,
Christos Kozyrakis,
Jonathan Mace
Abstract:
Runtime failures and performance degradation are commonplace in modern cloud systems. For cloud providers, automatically determining the root cause of incidents is paramount to ensuring high reliability and availability, as prompt fault localization enables faster diagnosis and triage for timely resolution. A compelling solution explored in recent work is causal reasoning, using causal graphs to capture relationships between varied cloud system performance metrics. To be effective, however, this approach requires systems developers to correctly define the causal graph of their system. This is a time-consuming, brittle, and challenging task that grows harder for large and dynamic systems and requires domain expertise. Alternatively, automated data-driven approaches have limited efficacy for cloud systems because incidents are inherently rare. In this work, we present Atlas, a novel approach to automatically synthesizing causal graphs for cloud systems. Atlas leverages large language models (LLMs) to generate causal graphs using system documentation, telemetry, and deployment feedback. Atlas is complementary to data-driven causal discovery techniques, and we further enhance Atlas with a data-driven validation step. We evaluate Atlas across a range of fault localization scenarios and demonstrate that Atlas is capable of generating causal graphs in a scalable and generalizable manner, with performance that far surpasses that of data-driven algorithms and is commensurate with the ground-truth baseline.
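Once a causal graph exists, it can narrow a set of anomalous metrics down to likely root causes. The sketch below uses a deliberately simple heuristic: report anomalous metrics none of whose direct causes are anomalous. It is not Atlas's method, and the metric names are hypothetical. A real system would also need to handle cycles, noisy anomaly detection, and ranking of candidates.

```python
def candidate_root_causes(parents, anomalous):
    """Anomalous metrics with no anomalous direct cause in the causal graph.

    parents: maps each metric to the metrics that directly influence it.
    anomalous: set of metrics flagged by some anomaly detector.
    """
    return sorted(
        metric for metric in anomalous
        if not any(p in anomalous for p in parents.get(metric, []))
    )

# Hypothetical service graph: db_cpu -> db_latency -> api_latency -> error_rate,
# with cache_hit_rate also feeding api_latency.
parents = {
    "db_latency": ["db_cpu"],
    "api_latency": ["db_latency", "cache_hit_rate"],
    "error_rate": ["api_latency"],
}
flagged = {"db_latency", "api_latency", "error_rate"}
print(candidate_root_causes(parents, flagged))  # ['db_latency']
```

The quality of such an answer depends entirely on the graph's edges, which is the part Atlas aims to synthesize automatically.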
Submitted 11 July, 2024;
originally announced July 2024.
-
$β$-DPO: Direct Preference Optimization with Dynamic $β$
Authors:
Junkang Wu,
Yuexiang Xie,
Zhengyi Yang,
Jiancan Wu,
Jinyang Gao,
Bolin Ding,
Xiang Wang,
Xiangnan He
Abstract:
Direct Preference Optimization (DPO) has emerged as a compelling approach for training Large Language Models (LLMs) to adhere to human preferences. However, the performance of DPO is sensitive to the tuning of its trade-off parameter $β$, as well as to the quality of the preference data. We analyze the impact of $β$ and data quality on DPO, uncovering that optimal $β$ values vary with the informativeness of pairwise data. Addressing the limitations of static $β$ values, we introduce a novel framework that dynamically calibrates $β$ at the batch level, informed by data quality considerations. Additionally, our method incorporates $β$-guided data filtering to safeguard against the influence of outliers. Through empirical evaluation, we demonstrate that our dynamic $β$ adjustment technique significantly improves DPO's performance across a range of models and datasets, offering a more robust and adaptable training paradigm for aligning LLMs with human feedback. The code is available at https://github.com/junkangwu/beta-DPO.
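To make the role of $β$ concrete, the sketch below computes the standard DPO loss, $-\log\sigma(\beta\,[(\log\pi_\theta(y_w) - \log\pi_{\mathrm{ref}}(y_w)) - (\log\pi_\theta(y_l) - \log\pi_{\mathrm{ref}}(y_l))])$, with one $β$ shared across a batch. The batch-level update in `batch_beta` is a hypothetical form chosen for illustration. Its `alpha`, `reference_gap`, and "gap" score are assumptions, not the paper's formula, and the $β$-guided data filtering is not modeled. Consult the linked repository for the actual calibration.

```python
import math

def neg_log_sigmoid(z):
    """Numerically stable -log(sigmoid(z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta):
    """Standard DPO loss for one preference pair.

    Arguments are summed sequence log-probabilities of the chosen and rejected
    responses under the trained policy and the frozen reference model.
    """
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return neg_log_sigmoid(beta * margin)

def batch_beta(beta0, pair_gaps, reference_gap, alpha=0.5, beta_min=1e-3):
    """HYPOTHETICAL batch-level calibration: scale beta0 up when the batch's
    mean informativeness score exceeds a reference level, down otherwise."""
    mean_gap = sum(pair_gaps) / len(pair_gaps)
    return max(beta_min, beta0 * (1.0 + alpha * (mean_gap - reference_gap)))

def batch_dpo_loss(batch, beta):
    """Mean DPO loss over (pol_chosen, pol_rejected, ref_chosen, ref_rejected) tuples."""
    return sum(dpo_loss(*pair, beta) for pair in batch) / len(batch)

# Toy batch of log-probabilities and hypothetical per-pair gap scores.
batch = [(-10.0, -12.0, -11.0, -11.0), (-8.0, -8.5, -8.2, -8.4)]
beta = batch_beta(0.1, pair_gaps=[2.0, 0.3], reference_gap=1.0)
print(beta, batch_dpo_loss(batch, beta))
```

A larger $β$ sharpens the loss around the policy-versus-reference margin. Adapting it per batch is one way to let informative pairs pull harder than ambiguous ones.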
Submitted 11 July, 2024;
originally announced July 2024.
-
RTMW: Real-Time Multi-Person 2D and 3D Whole-body Pose Estimation
Authors:
Tao Jiang,
Xinchen Xie,
Yining Li
Abstract:
Whole-body pose estimation is a challenging task that requires simultaneous prediction of keypoints for the body, hands, face, and feet. The resulting fine-grained pose information for the human body plays an important role in the study of human-centric perception and generation and in various applications. In this work, we present RTMW (Real-Time Multi-person Whole-body pose estimation models), a series of high-performance models for 2D/3D whole-body pose estimation. We combine the RTMPose model architecture with an FPN and an HEM (Hierarchical Encoding Module) to better capture pose information from different body parts at various scales. The model is trained on a rich collection of open-source human keypoint datasets with manually aligned annotations and further enhanced via a two-stage distillation strategy. RTMW demonstrates strong performance on multiple whole-body pose estimation benchmarks while maintaining high inference efficiency and deployment friendliness. We release three model sizes (m, l, and x), with RTMW-l achieving 70.2 mAP on the COCO-Wholebody benchmark, making it the first open-source model to exceed 70 mAP on this benchmark. We also explore the performance of RTMW on 3D whole-body pose estimation, performing image-based monocular 3D whole-body pose estimation via coordinate classification. We hope this work can benefit both academic research and industrial applications. The code and models have been made publicly available at https://github.com/open-mmlab/mmpose/tree/main/projects/rtmpose
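"Coordinate classification" treats each coordinate axis as a classification problem over discrete bins rather than regressing a number. The sketch below decodes a 2D keypoint from per-axis bin scores. It assumes a SimCC-style scheme in which each axis is divided into bins `split_ratio` times finer than input pixels, decoded by a plain arg-max. The split ratio, the decoding rule, and the function names are assumptions for illustration, not RTMW's exact head. A 3D variant would add a depth axis with its own model-specific bin-to-depth mapping, not shown here.

```python
def decode_axis(logits, split_ratio=2.0):
    """Coordinate from one axis's classification vector: the arg-max bin
    divided by the number of bins per input pixel."""
    best_bin = max(range(len(logits)), key=logits.__getitem__)
    return best_bin / split_ratio

def decode_keypoint(x_logits, y_logits, split_ratio=2.0):
    """2D keypoint (x, y) in input-image pixels from per-axis bin scores."""
    return decode_axis(x_logits, split_ratio), decode_axis(y_logits, split_ratio)

# Toy input 4 pixels wide and 2 pixels high, split_ratio 2 -> 8 x-bins and 4 y-bins.
print(decode_keypoint([0, 0, 0, 0, 0, 9, 0, 0], [0, 0, 7, 0]))  # (2.5, 1.0)
```

Using bins finer than a pixel is what lets a classification head recover sub-pixel positions.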
Submitted 11 July, 2024;
originally announced July 2024.
-
Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models
Authors:
Wanling Gao,
Yunyou Huang,
Dandan Cui,
Zhuoming Yu,
Wenjing Liu,
Xiaoshuang Liang,
Jiahui Zhao,
Jiyue Xie,
Hao Li,
Li Ma,
Ning Ye,
Yumiao Kang,
Dingfeng Luo,
Peng Pan,
Wei Huang,
Zhongmou Liu,
Jizhong Hu,
Gangyuan Zhao,
Chongrong Jiang,
Fan Huang,
Tianyi Wei,
Suqin Tang,
Bingjie Xia,
Zhifei Zhang,
Jianfeng Zhan
Abstract:
A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with either no controls or only patient-centered controls. Moreover, the crucial role of clinicians in collaborating with AI, pivotal for determining its impact on clinical practice, is often overlooked. For the first time, we emphasize the critical necessity for rigorous and cost-effective evaluation methodologies for AI models in clinical practice, featuring patient/clinician-centered (dual-centered) AI randomized controlled trials (DC-AI RCTs) and virtual clinician-based in-silico trials (VC-MedAI) as an effective proxy for DC-AI RCTs. Leveraging 7500 diagnosis records from two-phase inaugural DC-AI RCTs across 14 medical centers with 125 clinicians, our results demonstrate the necessity of DC-AI RCTs and the effectiveness of VC-MedAI. Notably, VC-MedAI performs comparably to human clinicians, replicating insights and conclusions from prospective DC-AI RCTs. We envision DC-AI RCTs and VC-MedAI as pivotal advancements, presenting innovative and transformative evaluation methodologies for AI models in clinical practice, offering a preclinical-like setting mirroring conventional medicine, and reshaping development paradigms in a cost-effective and fast-iterative manner. Chinese Clinical Trial Registration: ChiCTR2400086816.
Submitted 11 July, 2024;
originally announced July 2024.
-
Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents
Authors:
Haoyi Xiong,
Zhiyuan Wang,
Xuhong Li,
Jiang Bian,
Zeke Xie,
Shahid Mumtaz,
Laura E. Barnes
Abstract:
This article explores the convergence of connectionist and symbolic artificial intelligence (AI), from historical debates to contemporary advancements. Traditionally considered distinct paradigms, connectionist AI focuses on neural networks, while symbolic AI emphasizes symbolic representation and logic. Recent advancements in large language models (LLMs), exemplified by ChatGPT and GPT-4, highlight the potential of connectionist architectures in handling human language as a form of symbols. The study argues that LLM-empowered Autonomous Agents (LAAs) embody this paradigm convergence. By utilizing LLMs for text-based knowledge modeling and representation, LAAs integrate neuro-symbolic AI principles, showcasing enhanced reasoning and decision-making capabilities. Comparing LAAs with Knowledge Graphs within the neuro-symbolic AI theme highlights the unique strengths of LAAs in mimicking human-like reasoning processes, scaling effectively with large datasets, and leveraging in-context samples without explicit re-training. The research underscores promising avenues in neuro-vector-symbolic integration, instructional encoding, and implicit reasoning, aimed at further enhancing LAA capabilities. By exploring the progression of neuro-symbolic AI and proposing future research trajectories, this work advances the understanding and development of AI technologies.
Submitted 11 July, 2024;
originally announced July 2024.
-
Infinite Motion: Extended Motion Generation via Long Text Instructions
Authors:
Mengtian Li,
Chengshuo Zhai,
Shengxiang Yao,
Zhifeng Xie,
Keyu Chen,
Yu-Gang Jiang
Abstract:
In the realm of motion generation, the creation of long-duration, high-quality motion sequences remains a significant challenge. This paper presents our groundbreaking work on "Infinite Motion", a novel approach that leverages long text for extended motion generation, effectively bridging the gap between short- and long-duration motion synthesis. Our core insight is the strategic extension and reassembly of existing high-quality text-motion datasets, which has led to the creation of a novel benchmark dataset to facilitate the training of models for extended motion sequences. A key innovation of our model is its ability to accept text of arbitrary length as input, enabling the generation of motion sequences tailored to specific narratives or scenarios. Furthermore, we incorporate a timestamp design for text that allows precise editing of local segments within the generated sequences, offering unparalleled control and flexibility in motion synthesis. We further demonstrate the versatility and practical utility of "Infinite Motion" through three specific applications: natural language interactive editing, motion sequence editing within long sequences, and splicing of independent motion sequences. Each application highlights the adaptability of our approach and broadens the spectrum of possibilities for research and development in motion generation. Through extensive experiments, we demonstrate the superior performance of our model in generating long motion sequences compared with existing methods. Project page: https://shuochengzhai.github.io/Infinite-motion.github.io/
Submitted 12 July, 2024; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Dynamically assisted pair production enhancement by combined multiple potentials
Authors:
Lie-Juan Li,
Li Wang,
Melike Mohamedsedik,
Li-Na Hu,
Bai-Song Xie
Abstract:
We propose a new Sauter-like field model with combined multiple potentials, consisting of one deep, slowly varying potential and several shallow, rapidly varying potentials. The dynamically assisted Sauter-Schwinger effect on pair production is studied using computational quantum field theory. Pair production is found to be significantly enhanced, by about one order of magnitude, for multiple potentials compared with a single potential. When the Schwinger mechanism dominates, a pronounced time effect causes the electrons to concentrate at the two edges of the potential, while their momentum is located near zero. In contrast, for multiphoton processes, pair generation distributes the electrons outside the potential, and the momentum spectrum shows multiple peaks far from zero that evolve evenly toward a step-like structure. An interesting finding is that the pairs produced in the alternating potential have a quasi-monoenergetic structure compared with those produced in the oscillating potential well and/or potential barrier, which is helpful for achieving a high-quality positron source.
Submitted 11 July, 2024;
originally announced July 2024.
-
Foundation Model Engineering: Engineering Foundation Models Just as Engineering Software
Authors:
Dezhi Ran,
Mengzhou Wu,
Wei Yang,
Tao Xie
Abstract:
By treating data and models as source code, Foundation Models (FMs) become a new type of software. Mirroring the concept of the software crisis, the increasing complexity of FMs makes an FM crisis a tangible concern in the coming decade, calling for new theories and methodologies from the field of software engineering. In this paper, we outline our vision of introducing Foundation Model (FM) engineering, a strategic response to the anticipated FM crisis with principled engineering methodologies. FM engineering aims to mitigate potential issues in FM development and application by introducing declarative, automated, and unified programming interfaces for both data and model management, reducing the complexity of working with FMs by giving developers a more structured and intuitive process. Through the establishment of FM engineering, we aim to provide a robust, automated, and extensible framework that addresses the imminent challenges and uncovers new research opportunities for the software engineering field.
Submitted 11 July, 2024;
originally announced July 2024.
-
DMM: Disparity-guided Multispectral Mamba for Oriented Object Detection in Remote Sensing
Authors:
Minghang Zhou,
Tianyu Li,
Chaofan Qiao,
Dongyu Xie,
Guoqing Wang,
Ningjuan Ruan,
Lin Mei,
Yang Yang
Abstract:
Multispectral oriented object detection faces challenges due to both inter-modal and intra-modal discrepancies. Recent studies often rely on transformer-based models to address these issues and achieve cross-modal fusion detection. However, the quadratic computational complexity of transformers limits their performance. Inspired by the efficiency and lower complexity of Mamba in long-sequence tasks, we propose Disparity-guided Multispectral Mamba (DMM), a multispectral oriented object detection framework consisting of a Disparity-guided Cross-modal Fusion Mamba (DCFM) module, a Multi-scale Target-aware Attention (MTA) module, and a Target-Prior Aware (TPA) auxiliary task. The DCFM module leverages disparity information between modalities to adaptively merge features from RGB and IR images, mitigating inter-modal conflicts. The MTA module aims to enhance feature representation by focusing on relevant target regions within the RGB modality, addressing intra-modal variations. The TPA auxiliary task utilizes single-modal labels to guide the optimization of the MTA module, ensuring it focuses on targets and their local context. Extensive experiments on the DroneVehicle and VEDAI datasets demonstrate the effectiveness of our method, which outperforms state-of-the-art methods while maintaining computational efficiency. Code will be available at https://github.com/Another-0/DMM.
Submitted 10 July, 2024;
originally announced July 2024.
-
Asynchronous measurement-device-independent quantum digital signatures
Authors:
Jing-Wei Bian,
Bing-Hong Li,
Yuan-Mei Xie,
Hua-Lei Yin,
Zeng-Bing Chen
Abstract:
Quantum digital signatures (QDSs), which distribute and measure quantum states by key generation protocols and then sign messages via classical data processing, are a key area of interest in quantum cryptography. However, the practical implementation of a QDS network faces many challenges, including technically demanding interference requirements, linear channel loss in quantum state transmission, and potential side-channel attacks on detectors. Here, we propose an asynchronous measurement-device-independent (MDI) QDS protocol with an asynchronous two-photon interference strategy and a one-time universal hashing method. The two-photon interference approach protects our protocol against all detector side-channel attacks and eases experimental implementation, while the asynchronous strategy effectively reduces the equivalent channel loss to its square root. Compared with previous MDI-QDS schemes, our protocol shows performance improvements of several orders of magnitude and doubles the transmission distance when processing multi-bit messages. Our findings present an efficient and practical MDI-QDS scheme, paving the way for large-scale data processing with non-repudiation in quantum networks.
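As a back-of-the-envelope illustration of the square-root claim, assume a standard fiber attenuation of $\alpha \approx 0.2$ dB/km (an assumption; the abstract gives no fiber parameters). The channel transmittance over a distance $L$ and its square root are

$$
\eta(L) = 10^{-\alpha L/10}, \qquad \sqrt{\eta(L)} = 10^{-\alpha L/20} = \eta(L/2).
$$

At $L = 200$ km this gives $\eta = 10^{-4}$ (40 dB), whereas $\sqrt{\eta} = 10^{-2}$ (20 dB), the same loss as a 100 km link. Halving the loss in dB is one intuitive way to see how square-root scaling can roughly double the reach; the actual distance also depends on detector noise, error correction, and finite-key effects.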
Submitted 10 July, 2024;
originally announced July 2024.
-
Pseudosymmetry in Tetragonal Perovskite SrIrO$_3$ Synthesized under High Pressure
Authors:
Haozhe Wang,
Alberto de la Torre,
Joseph T. Race,
Qiaochu Wang,
Jacob P. C. Ruff,
Patrick M. Woodward,
Kemp W. Plumb,
David Walker,
Weiwei Xie
Abstract:
In this study, we report a tetragonal perovskite structure of SrIrO$_3$ (P4/mmm, a = 3.9362(9) Å, c = 7.880(3) Å) synthesized at 6 GPa and 1400 °C, using the ambient-pressure monoclinic SrIrO$_3$ with a distorted 6H structure as a precursor. The crystal structure of tetragonal SrIrO$_3$ was evaluated on the basis of single-crystal and powder X-ray diffraction. A cubic indexing was observed, which is attributed to overlooked superlattice reflections. Weak fractional peaks in the H and K dimensions suggest possible structural modulation by oxygen defects. A magnetization study reveals weak paramagnetic behavior down to 2 K, indicative of the interplay between spin-orbit coupling, electron correlations, and the crystal electric field. Additionally, electrical resistivity measurements display metallic behavior with an upturn at about 54 K, ascribed to weak electron localization and possible structural defects.
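A quick check on the reported lattice parameters suggests why a cubic indexing is easy to arrive at. Comparing $c$ with twice $a$:

$$
2a = 2 \times 3.9362\ \text{Å} = 7.8724\ \text{Å}, \qquad \frac{c}{2a} = \frac{7.880\ \text{Å}}{7.8724\ \text{Å}} \approx 1.001 .
$$

The tetragonal cell is therefore, to within about 0.1%, a doubling along $c$ of a cubic perovskite subcell with $a \approx 3.94$ Å. Reflections of the doubled cell with even $l$ coincide with those of the small cubic subcell, so if the weak superlattice reflections with odd $l$ are overlooked, the pattern indexes as cubic. This is an illustrative reading of the numbers above, not the paper's own structural analysis.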
Submitted 10 July, 2024;
originally announced July 2024.
-
Identification and Estimation of the Bi-Directional MR with Some Invalid Instruments
Authors:
Feng Xie,
Zhen Yao,
Lin Xie,
Yan Zeng,
Zhi Geng
Abstract:
We consider the challenging problem of estimating causal effects from purely observational data in bi-directional Mendelian randomization (MR), where some invalid instruments, as well as unmeasured confounding, usually exist. To address this problem, most existing methods attempt to find proper valid instrumental variables (IVs) for the target causal effect by expert knowledge or by assuming that the causal model is a one-directional MR model. In this paper, we first theoretically investigate the identification of the bi-directional MR from observational data. In particular, we provide necessary and sufficient conditions under which valid IV sets are correctly identified such that the bi-directional MR model is identifiable, including the causal directions of a pair of phenotypes (i.e., the treatment and outcome). Moreover, based on the identification theory, we develop a cluster fusion-like method to discover valid IV sets and estimate the causal effects of interest. We theoretically demonstrate the correctness of the proposed algorithm. Experimental results show the effectiveness of our method for estimating causal effects in bi-directional MR.
Submitted 12 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
A Trustworthy AIoT-enabled Localization System via Federated Learning and Blockchain
Authors:
Junfei Wang,
He Huang,
Jingze Feng,
Steven Wong,
Lihua Xie,
Jianfei Yang
Abstract:
There is significant demand for indoor localization technology in smart buildings, and the most promising solution in this field uses RF sensors and fingerprinting-based methods that employ machine learning models trained on crowd-sourced user data gathered from IoT devices. However, this raises security and privacy issues in practice. Some researchers propose using federated learning to partially overcome privacy problems, but security concerns remain, e.g., single-point failure and malicious attacks. In this paper, we propose a framework named DFLoc to achieve precise 3D localization while addressing these two security concerns. Specifically, we design a specialized blockchain that decentralizes the framework by distributing tasks such as model distribution and aggregation, which most previous works assign to a central server, to all clients; this addresses the issue of single-point failure and yields a reliable and accurate indoor localization system. Moreover, we introduce a verification mechanism for updated models within the blockchain to alleviate the concern of malicious node attacks. Experimental results substantiate the framework's capacity to deliver accurate 3D location predictions and its superior resistance to the impacts of single-point failure and malicious attacks when compared with conventional centralized federated learning systems.
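Conventional federated learning, which the abstract contrasts DFLoc with, aggregates client updates on a central server, commonly with FedAvg: a parameter-wise average weighted by each client's number of training samples. The sketch below shows only that aggregation step with plain Python lists. In DFLoc this step is distributed to clients via the blockchain, and whether DFLoc uses this exact weighting rule is an assumption; the parameter names and sample counts in the example are hypothetical.

```python
from typing import Dict, List, Sequence

Weights = Dict[str, List[float]]


def fedavg(client_weights: Sequence[Weights], num_samples: Sequence[int]) -> Weights:
    """Average client parameters, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(num_samples):
        raise ValueError("need one sample count per client")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    aggregated: Weights = {}
    for name, first in client_weights[0].items():
        acc = [0.0] * len(first)
        for weights, n in zip(client_weights, num_samples):
            share = n / total  # fraction of all training samples held by this client
            for i, value in enumerate(weights[name]):
                acc[i] += share * value
        aggregated[name] = acc
    return aggregated


# Hypothetical parameters from two clients holding 100 and 300 RF fingerprints.
clients = [
    {"dense.weight": [1.0, 2.0], "dense.bias": [0.0]},
    {"dense.weight": [3.0, 4.0], "dense.bias": [1.0]},
]
print(fedavg(clients, num_samples=[100, 300]))
```

Weighting by sample count keeps clients with more fingerprints from being diluted by clients with few; a verification step such as the one DFLoc describes would run before an update is admitted to this average.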
Submitted 8 July, 2024;
originally announced July 2024.
-
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
Authors:
Junkang Wu,
Yuexiang Xie,
Zhengyi Yang,
Jiancan Wu,
Jiawei Chen,
Jinyang Gao,
Bolin Ding,
Xiang Wang,
Xiangnan He
Abstract:
This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes low-quality data points, and pairwise noise, which encompasses erroneous data pair associations that affect preference rankings. Utilizing Distributionally Robust Optimization (DRO), we enhance DPO's resilience to these types of noise. Our theoretical insights reveal that DPO inherently embeds DRO principles, conferring robustness to pointwise noise, with the regularization coefficient $β$ playing a critical role in its noise resistance. Extending this framework, we introduce Distributionally Robustifying DPO (Dr. DPO), which integrates pairwise robustness by optimizing against worst-case pairwise scenarios. The novel hyperparameter $β'$ in Dr. DPO allows for fine-tuned control over data pair reliability, providing a strategic balance between exploration and exploitation in noisy training environments. Empirical evaluations demonstrate that Dr. DPO substantially improves the quality of generated text and response accuracy in preference datasets, showcasing enhanced performance in both noisy and noise-free settings. The code is available at https://github.com/junkangwu/Dr_DPO.
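For context, the objective that Dr. DPO builds on is the standard per-pair DPO loss, $-\log\sigma\big(\beta[(\log\pi_\theta(y_w|x) - \log\pi_{\mathrm{ref}}(y_w|x)) - (\log\pi_\theta(y_l|x) - \log\pi_{\mathrm{ref}}(y_l|x))]\big)$. The sketch below computes it from summed sequence log-probabilities for one preferred/rejected pair to show how $β$ scales the reward margin. It is not the Dr. DPO objective with $β'$, which the abstract does not spell out; the function names, the default `beta=0.1`, and the example log-probabilities are illustrative assumptions.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair of responses.

    Each argument is the summed token log-probability of a full response
    under the trainable policy or the frozen reference model.
    """
    # Implicit rewards are log-ratios against the reference model, scaled by beta.
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    # The loss shrinks as the policy prefers the chosen response more strongly.
    return -log_sigmoid(margin)


# Hypothetical log-probabilities for a single preference pair.
print(dpo_loss(-12.0, -15.0, -13.0, -14.0, beta=0.1))
```

In training, this loss is averaged over a batch of pairs and differentiated with respect to the policy; a larger `beta` makes the loss more sensitive to the log-ratio margin.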
Submitted 10 July, 2024;
originally announced July 2024.
-
Bayesian Inference of Fine-Features of Nuclear Equation of State from Future Neutron Star Radius Measurements to 0.1km Accuracy
Authors:
Bao-An Li,
Xavier Grundler,
Wen-Jie Xie,
Nai-Bo Zhang
Abstract:
To more precisely constrain the Equation of State (EOS) of supradense neutron-rich nuclear matter, future high-precision X-ray and gravitational wave observatories are proposed to measure the radii of neutron stars (NSs) with an accuracy better than about 0.1 km. However, it remains unclear which particular aspects of the EOS (other than the stiffness generally discussed in the literature) will be better constrained, and to what precision. In this work, within a Bayesian framework using a meta-model EOS for NSs, we infer the posterior probability distribution functions (PDFs) of the incompressibility $K_{0}$ and skewness $J_{0}$ of symmetric nuclear matter (SNM), as well as the slope $L$, curvature $K_{\rm{sym}}$, and skewness $J_{\rm{sym}}$ characterizing the density dependence of the nuclear symmetry energy $E_{\rm{sym}}(ρ)$, from mean values of NS radii consistent with existing observations and an expected accuracy $ΔR$ ranging from about 1.0 km to 0.1 km. We found that (1) $ΔR$ has little effect on inferring the stiffness of SNM at suprasaturation densities, (2) a smaller $ΔR$ reveals more accurately not only the PDFs but also the pairwise correlations among parameters characterizing the high-density $E_{\rm{sym}}(ρ)$, (3) a double-peak feature of the PDF($K_{\rm{sym}}$), corresponding to the strong $K_{\rm{sym}}-J_{\rm{sym}}$ and $K_{\rm{sym}}-L$ anti-correlations, is revealed when $ΔR$ is less than about 0.2 km, and (4) high-precision radius measurements for canonical NSs are more useful than those for massive ones for constraining the EOS of nucleonic matter around (2-3) times the saturation density of SNM.
Submitted 10 July, 2024;
originally announced July 2024.
-
Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities
Authors:
Tianjie Ju,
Yiting Wang,
Xinbei Ma,
Pengzhou Cheng,
Haodong Zhao,
Yulong Wang,
Lifeng Liu,
Jian Xie,
Zhuosheng Zhang,
Gongshen Liu
Abstract:
The rapid adoption of large language models (LLMs) in multi-agent systems has highlighted their impressive capabilities in various applications, such as collaborative problem-solving and autonomous negotiation. However, the security implications of these LLM-based multi-agent systems have not been thoroughly investigated, particularly concerning the spread of manipulated knowledge. In this paper, we investigate this critical issue by constructing a detailed threat model and a comprehensive simulation environment that mirrors real-world multi-agent deployments in a trusted platform. Subsequently, we propose a novel two-stage attack method involving Persuasiveness Injection and Manipulated Knowledge Injection to systematically explore the potential for manipulated knowledge (i.e., counterfactual and toxic knowledge) spread without explicit prompt manipulation.
Our method leverages the inherent vulnerabilities of LLMs in handling world knowledge, which attackers can exploit to make agents unknowingly spread fabricated information. Through extensive experiments, we demonstrate that our attack method can successfully induce LLM-based agents to spread both counterfactual and toxic knowledge without degrading their foundational capabilities during agent communication. Furthermore, we show that these manipulations can persist through popular retrieval-augmented generation frameworks, where several benign agents store and retrieve manipulated chat histories for future interactions. This persistence indicates that even after the interaction has ended, the benign agents may continue to be influenced by manipulated knowledge. Our findings reveal significant security risks in LLM-based multi-agent systems, emphasizing the imperative need for robust defenses against manipulated knowledge spread, such as introducing "guardian" agents and advanced fact-checking tools.
Submitted 10 July, 2024;
originally announced July 2024.
-
Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (645 additional authors not shown)
Abstract:
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15σ$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.
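Following the common convention that the first quoted uncertainty is statistical and the second systematic, and assuming the two are independent (an assumption; correlations are not discussed in the abstract), a total uncertainty can be obtained by adding them in quadrature. For the two branching fractions this gives roughly 5.9 and 5.5 percentage points.

```python
import math


def combine_in_quadrature(stat: float, syst: float) -> float:
    """Total uncertainty, assuming independent statistical and systematic errors."""
    return math.hypot(stat, syst)


# (central value, statistical, systematic) in percent, as quoted in the abstract.
branching_fractions = {
    "D_s1(2536)- -> anti-D*0 K-": (35.9, 4.8, 3.5),
    "D_s2*(2573)- -> anti-D0 K-": (37.4, 3.1, 4.6),
}

for decay, (value, stat, syst) in branching_fractions.items():
    total = combine_in_quadrature(stat, syst)
    print(f"{decay}: ({value} +/- {total:.1f})%, relative uncertainty {total / value:.0%}")
```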
Submitted 10 July, 2024;
originally announced July 2024.
-
A 'MAP' to find high-performing soft robot designs: Traversing complex design spaces using MAP-elites and Topology Optimization
Authors:
Yue Xie,
Josh Pinskier,
Lois Liow,
David Howard,
Fumiya Iida
Abstract:
Soft robotics has emerged as the standard solution for grasping deformable objects, and has proven invaluable for mobile robotic exploration in extreme environments. However, despite this growth, there are no widely adopted computational design tools that produce quality, manufacturable designs. To advance beyond the diminishing returns of heuristic bio-inspiration, the field needs efficient tools to explore the complex, non-linear design spaces present in soft robotics and find novel high-performing designs. In this work, we investigate a hierarchical design optimization methodology that combines the strengths of topology optimization and quality diversity optimization to generate diverse, high-performance soft robots by evolving the design domain. The method embeds variably sized void regions within the design domain and evolves their size and position to facilitate a richer exploration of the design space and find a diverse set of high-performing soft robots. We demonstrate its efficacy on both benchmark topology optimization problems and soft robotic design problems, and show that the method enhances grasp performance when applied to soft grippers. Our method provides a new framework to design parts in complex design domains, both soft and rigid.
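The quality-diversity side of the method builds on MAP-Elites, which keeps the best solution found in each cell of a discretized behavior-descriptor space. The generic loop below is a sketch under stated assumptions: the genome stands in for the void-region sizes and positions, descriptors are assumed to lie in [0, 1], mutation is clipped Gaussian noise, and the hypothetical `toy_evaluate` replaces the topology-optimization and simulation step, which is not shown.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Genome = Tuple[float, ...]
Archive = Dict[Tuple[int, ...], Tuple[float, Genome]]


def map_elites(
    evaluate: Callable[[Genome], Tuple[float, Sequence[float]]],
    dim: int,
    bins: int,
    iterations: int,
    sigma: float = 0.1,
    init_samples: int = 20,
    seed: int = 0,
) -> Archive:
    """Generic MAP-Elites: keep the fittest genome per descriptor cell."""
    rng = random.Random(seed)
    archive: Archive = {}

    def cell_of(descriptor: Sequence[float]) -> Tuple[int, ...]:
        # Discretize each descriptor component (assumed in [0, 1]) into `bins` cells.
        return tuple(max(0, min(int(d * bins), bins - 1)) for d in descriptor)

    for it in range(iterations):
        if it < init_samples or not archive:
            # Bootstrap with uniformly random genomes in [0, 1]^dim.
            genome = tuple(rng.random() for _ in range(dim))
        else:
            # Mutate a randomly chosen elite with clipped Gaussian noise.
            _, parent = rng.choice(list(archive.values()))
            genome = tuple(min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in parent)
        fitness, descriptor = evaluate(genome)
        cell = cell_of(descriptor)
        # Replace the cell's elite only if the new genome is fitter.
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, genome)
    return archive


def toy_evaluate(genome: Genome) -> Tuple[float, Tuple[float, float]]:
    # Hypothetical stand-in: fitness peaks at the center, descriptor is the genome itself.
    x, y = genome
    return -((x - 0.5) ** 2 + (y - 0.5) ** 2), (x, y)


elites = map_elites(toy_evaluate, dim=2, bins=5, iterations=500)
print(f"{len(elites)} of 25 cells filled")
```

Because each cell keeps its own elite, the archive returns a spread of good designs across descriptor space rather than a single optimum, which is the "diverse set of high-performing" behavior the abstract describes.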
Submitted 10 July, 2024;
originally announced July 2024.
-
Resource Allocation for Twin Maintenance and Computing Task Processing in Digital Twin Vehicular Edge Computing Network
Authors:
Yu Xie,
Qiong Wu,
Pingyi Fan,
Nan Cheng,
Wen Chen,
Jiangzhou Wang,
Khaled B. Letaief
Abstract:
As a promising technology, vehicular edge computing (VEC) can provide computing and caching services by deploying VEC servers near vehicles. However, VEC networks still face challenges such as high vehicle mobility. Digital twin (DT), an emerging technology, can predict, estimate, and analyze real-time states by digitally modeling objects in the physical world. By integrating DT with VEC, a virtual vehicle DT can be created in the VEC server to monitor the real-time operating status of vehicles. However, maintaining the vehicle DT model requires ongoing attention from the VEC server, which also needs to offer computing services for the vehicles. Therefore, effective allocation and scheduling of VEC server resources are crucial. This study focuses on a general VEC network with a single VEC server and multiple vehicles, examining the two types of delay caused by twin maintenance and computational processing within the network. By transforming the problem using satisfaction functions, we formulate an optimization problem aimed at maximizing each vehicle's resource utility to determine the optimal resource allocation strategy. Given the non-convex nature of the problem, we employ multi-agent Markov decision processes to reformulate it. Subsequently, we propose the twin maintenance and computing task processing resource collaborative scheduling (MADRL-CSTC) algorithm, which leverages multi-agent deep reinforcement learning. Experimental comparisons with alternative algorithms demonstrate that our proposed approach is effective in terms of resource allocation.
Submitted 10 July, 2024;
originally announced July 2024.
-
Incremental Multiview Point Cloud Registration with Two-stage Candidate Retrieval
Authors:
Shiqi Li,
Jihua Zhu,
Yifan Xie,
Mingchen Zhu
Abstract:
Multiview point cloud registration serves as a cornerstone of various computer vision tasks. Previous approaches typically adhere to a global paradigm, where a pose graph is initially constructed followed by motion synchronization to determine the absolute pose. However, this separated approach may not fully leverage the characteristics of multiview registration and might struggle with low-overlap scenarios. In this paper, we propose an incremental multiview point cloud registration method that progressively registers all scans to a growing meta-shape. To determine the incremental ordering, we employ a two-stage coarse-to-fine strategy for point cloud candidate retrieval. The first stage involves the coarse selection of scans based on neighbor fusion-enhanced global aggregation features, while the second stage further reranks candidates through geometric-based matching. Additionally, we apply a transformation averaging technique to mitigate accumulated errors during the registration process. Finally, we utilize a Reservoir sampling-based technique to address density variance issues while reducing computational load. Comprehensive experimental results across various benchmarks validate the effectiveness and generalization of our approach.
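The abstract does not say how reservoir sampling is applied to the density-variance problem; one plausible use is capping the number of points kept per scan or region so that dense areas do not dominate. The sketch shows the classic Algorithm R, which draws a uniform sample of `k` items in a single pass over a stream of unknown length; its use in this pipeline and the example scan are assumptions.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: Optional[random.Random] = None) -> List[T]:
    """Uniformly sample k items from a stream of unknown length in one pass (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # uniform index in [0, i], inclusive
            if j < k:
                reservoir[j] = item  # item i is kept with probability k / (i + 1)
    return reservoir


# Hypothetical dense scan: keep at most 1024 points from 10,000.
scan = [(0.01 * i, 0.0, 0.0) for i in range(10_000)]
subset = reservoir_sample(scan, k=1024, rng=random.Random(0))
print(len(subset))
```

Memory stays at `k` items regardless of scan size, which is what makes the technique attractive for reducing computational load.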
Submitted 10 July, 2024;
originally announced July 2024.
-
High-rate quantum digital signatures network with integrated silicon photonics
Authors:
Yongqiang Du,
Bing-Hong Li,
Xin Hua,
Xiao-Yu Cao,
Zhengeng Zhao,
Feng Xie,
Zhenrong Zhang,
Hua-Lei Yin,
Xi Xiao,
Kejin Wei
Abstract:
The development of quantum networks is paramount for practical and secure communications. Quantum digital signatures (QDS) offer an information-theoretically secure solution for ensuring data integrity, authenticity, and non-repudiation, and have rapidly progressed from proof-of-concept to robust demonstrations. However, previous QDS systems relied on expensive and bulky optical equipment, limiting large-scale deployment and the construction of reconfigurable networks. Here, we introduce and verify a chip-based QDS network, placing the complicated and expensive measurement devices in the central relay while each user needs only a low-cost transmitter. We demonstrate the network with a three-node setup using an integrated encoder chip and decoder chip. By developing a 1-decoy-state one-time universal hash-QDS protocol, we achieve a maximum signature rate of 0.0414 signatures per second for a 1 Mbit file over fiber distances up to 200 km, surpassing all current state-of-the-art QDS experiments. This study validates the feasibility of chip-based QDS, paving the way for large-scale deployment and integration with existing fiber infrastructure.
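Both QDS entries rely on one-time universal hashing, but the abstracts do not specify the hash family. As a generic illustration of a universal hash family, the sketch below uses Toeplitz-matrix hashing over GF(2); this choice is an assumption, not necessarily the family used in these protocols, and a full signature also needs the key distribution and one-time-pad steps that are not shown. `random_key_bits` is a hypothetical stand-in for key bits from the quantum stage.

```python
import secrets
from typing import List, Sequence


def random_key_bits(n: int) -> List[int]:
    """Hypothetical stand-in for secret key bits from the quantum stage."""
    return [secrets.randbits(1) for _ in range(n)]


def toeplitz_hash(message: Sequence[int], key: Sequence[int], out_len: int) -> List[int]:
    """Hash a bit list with an out_len x len(message) Toeplitz matrix over GF(2).

    The matrix is fully determined by out_len + len(message) - 1 key bits,
    because entry T[i][j] depends only on the difference i - j.
    """
    n = len(message)
    if len(key) != out_len + n - 1:
        raise ValueError("key must contain out_len + len(message) - 1 bits")
    digest = []
    for i in range(out_len):
        bit = 0
        for j in range(n):
            # Matrix-vector product modulo 2: AND for multiply, XOR for add.
            bit ^= key[i - j + n - 1] & message[j]
        digest.append(bit)
    return digest


msg = [1, 0, 1, 1, 0, 0, 1, 0]
key = random_key_bits(4 + len(msg) - 1)
print(toeplitz_hash(msg, key, out_len=4))
```

The Toeplitz structure means a short key defines the whole matrix, and the hash is linear over GF(2), which is why such digests are typically encrypted with fresh key bits before use.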
Submitted 10 July, 2024;
originally announced July 2024.
-
Pan-cancer Histopathology WSI Pre-training with Position-aware Masked Autoencoder
Authors:
Kun Wu,
Zhiguo Jiang,
Kunming Tang,
Jun Shi,
Fengying Xie,
Wei Wang,
Haibo Wu,
Yushan Zheng
Abstract:
Large-scale pre-training models have promoted the development of histopathology image analysis. However, existing self-supervised methods for histopathology images focus on learning patch features, and pre-training models for WSI-level feature learning are still lacking. In this paper, we propose a novel self-supervised learning framework for pan-cancer WSI-level representation pre-training with a position-aware masked autoencoder (PAMA). We also propose a position-aware cross-attention (PACA) module with a kernel reorientation (KRO) strategy and an anchor dropout (AD) mechanism. The KRO strategy can capture the complete semantic structure and eliminate ambiguity in WSIs, and AD helps enhance the robustness and generalization of the model. We evaluated our method on 6 large-scale datasets from multiple organs for pan-cancer classification tasks. The results demonstrate the effectiveness of PAMA in generalized and discriminative WSI representation learning and pan-cancer WSI pre-training. The proposed method was also compared with 7 WSI analysis methods, and the experimental results indicate that PAMA is superior to the state-of-the-art methods. The code and checkpoints are available at https://github.com/WkEEn/PAMA.
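PAMA builds on the masked-autoencoder idea. The sketch below shows only the generic random-masking step of MAE-style pre-training. It keeps each visible patch's (row, col) grid position so that a position-aware model could use it. The function name, the 0.75 default mask ratio, and the coordinate representation are assumptions for illustration, not details taken from the paper.

```python
import random
from typing import List, Optional, Tuple

Coord = Tuple[int, int]

def random_mask(coords: List[Coord], mask_ratio: float = 0.75,
                rng: Optional[random.Random] = None) -> Tuple[List[int], List[int]]:
    """Split patch indices into (visible, masked), keeping round(n * (1 - mask_ratio)) visible."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = rng or random.Random()
    n = len(coords)
    order = list(range(n))
    rng.shuffle(order)
    n_keep = max(1, round(n * (1.0 - mask_ratio))) if n else 0
    visible = sorted(order[:n_keep])
    masked = sorted(order[n_keep:])
    return visible, masked

# A WSI tiled into a 16 x 16 grid of patches, each identified by its grid position.
grid = [(r, c) for r in range(16) for c in range(16)]
visible, masked = random_mask(grid, 0.75, random.Random(0))
visible_coords = [grid[i] for i in visible]  # positions carried along with the visible tokens
print(len(visible), len(masked))  # 64 192
```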
Submitted 10 July, 2024;
originally announced July 2024.
-
Panoptic Segmentation of Galactic Structures in LSB Images
Authors:
Felix Richards,
Adeline Paiement,
Xianghua Xie,
Elisabeth Sola,
Pierre-Alain Duc
Abstract:
We explore the use of deep learning to localise galactic structures in low surface brightness (LSB) images. LSB imaging reveals many interesting structures, though these are frequently confused with galactic dust contamination, due to a strong local visual similarity. We propose a novel unified approach to multi-class segmentation of galactic structures and of extended amorphous image contaminants. Our panoptic segmentation model combines Mask R-CNN with a contaminant specialised network and utilises an adaptive preprocessing layer to better capture the subtle features of LSB images. Further, a human-in-the-loop training scheme is employed to augment ground truth labels. These different approaches are evaluated in turn, and together greatly improve the detection of both galactic structures and contaminants in LSB images.
Submitted 10 July, 2024;
originally announced July 2024.
-
Exploring the Untouched Sweeps for Conflict-Aware 3D Segmentation Pretraining
Authors:
Tianfang Sun,
Zhizhong Zhang,
Xin Tan,
Yanyun Qu,
Yuan Xie
Abstract:
LiDAR-camera 3D representation pretraining has shown significant promise for 3D perception tasks and related applications. However, two issues are common in this framework: 1) Only keyframes are used for training. For example, in nuScenes, a substantial quantity of unpaired LiDAR and camera frames remains unused, limiting the representation capabilities of the pretrained network. 2) The contrastive loss erroneously pushes apart points and image regions that share the same semantics but come from different frames, disturbing the semantic consistency of the learned representations. In this paper, we propose a novel Vision Foundation Model (VFM)-driven sample exploring module to carefully select LiDAR-image pairs from unexplored frames, enriching the original training set. We use timestamps and the semantic priors from VFMs to identify well-synchronized training pairs and to discover samples with diverse content. Moreover, we design a cross- and intra-modal conflict-aware contrastive loss that uses the semantic mask labels of VFMs to avoid contrasting semantically similar points and image regions. Our method consistently outperforms existing state-of-the-art pretraining frameworks on 3D semantic segmentation across three major public autonomous driving datasets, nuScenes, SemanticKITTI, and Waymo, by +3.0%, +3.0%, and +3.3% mIoU, respectively. Furthermore, our approach generalizes to different 3D backbones and to typical semantic masks generated by non-VFM models.
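To make the "conflict-aware" idea concrete, the sketch below gives one assumed interpretation of it. It uses a standard InfoNCE loss over point-to-region similarities and drops from the denominator any negative that shares the anchor's semantic label, for example a label taken from a VFM mask. The function name, the temperature of 0.07, and the integer label format are illustrative assumptions. The paper's exact cross- and intra-modal formulation may differ.

```python
import math
from typing import List, Sequence

def conflict_aware_infonce(sim: List[List[float]], labels_pts: Sequence[int],
                           labels_regions: Sequence[int], tau: float = 0.07) -> float:
    """Mean InfoNCE loss where sim[i][j] is the similarity of point feature i and
    region feature j, the positive pair for anchor i is (i, i), and negatives j != i
    whose region label equals the anchor's point label are excluded."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i] / tau
        logits = [pos]
        for j in range(n):
            if j == i or labels_regions[j] == labels_pts[i]:
                # Skip the positive itself and same-semantics "false negatives".
                continue
            logits.append(sim[i][j] / tau)
        # -log(exp(pos) / sum(exp(logits))), computed with log-sum-exp for stability.
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        total += lse - pos
    return total / n

sim = [[0.9, 0.8, 0.1],
       [0.7, 0.9, 0.2],
       [0.1, 0.2, 0.9]]
plain = conflict_aware_infonce(sim, [0, 1, 2], [0, 1, 2])  # every off-diagonal pair is a negative
aware = conflict_aware_infonce(sim, [0, 0, 1], [0, 0, 1])  # samples 0 and 1 share a class
print(aware < plain)  # True
```

Excluding same-label negatives removes the gradient that would otherwise push semantically identical points and regions apart. This is the failure mode the abstract attributes to the plain contrastive loss.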
Submitted 10 July, 2024;
originally announced July 2024.
-
Electrical Impedance Tomography Based Closed-loop Tumor Treating Fields in Dynamic Lung Tumors
Authors:
Minmin Wang,
Xu Xie,
Yuxi Guo,
Liying Zhu,
Yue Lan,
Haitang Yang,
Yun Pan,
Guangdi Chen,
Shaomin Zhang,
Maomao Zhang
Abstract:
Tumor Treating Fields (TTFields) is a non-invasive anticancer modality that uses alternating electric fields to disrupt cancer cell division and growth. Although generally well tolerated with minimal side effects, traditional TTFields therapy for lung tumors is challenged by respiratory motion. We design a novel closed-loop TTFields strategy for lung tumors that incorporates electrical impedance tomography (EIT) for real-time respiratory phase monitoring and dynamic parameter adjustment. Furthermore, we conduct a theoretical analysis to evaluate the performance of the proposed method using the lung motion model. With conventional TTFields settings, we observed that variations in lung electrical conductivity across respiratory phases led to a decrease in the average electric field intensity within lung tumors from the end-expiratory (1.08 V/cm) to the end-inspiratory (0.87 V/cm) phase. Using our proposed closed-loop TTFields approach at the same dose setting (2400 mA, consistent with the traditional TTFields setting), we can achieve a higher and consistent average electric field strength at the tumor site (1.30 V/cm) across different respiratory stages. Our proposed closed-loop TTFields method has the potential to improve lung tumor therapy by mitigating the impact of respiratory motion.
Submitted 9 July, 2024;
originally announced July 2024.
-
Inference Performance Optimization for Large Language Models on CPUs
Authors:
Pujiang He,
Shan Zhou,
Wenhuan Huang,
Changqing Li,
Duyi Wang,
Bin Guo,
Chen Meng,
Sheng Gui,
Weifei Yu,
Yi Xie
Abstract:
Large language models (LLMs) have shown exceptional performance and vast potential across diverse tasks. Meanwhile, the deployment of LLMs with high performance in low-resource environments has garnered significant attention in the industry. When GPU hardware resources are limited, CPUs offer an alternative. To mitigate the financial burden and alleviate the constraints imposed by hardware resources, optimizing inference performance is necessary. In this paper, we introduce an easily deployable inference performance optimization solution aimed at accelerating LLMs on CPUs. In this solution, we implement an effective way to reduce the KV cache size while maintaining precision. We propose a distributed inference optimization approach and implement it based on the oneAPI Collective Communications Library. Furthermore, we propose optimization approaches for LLMs on CPUs and conduct tailored optimizations for the most commonly used models. The code is open-sourced at https://github.com/intel/xFasterTransformer.
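To show why the KV cache size matters, the sketch below estimates its memory footprint from a model configuration. The formula, with two tensors (K and V) per layer per token, is standard for decoder-only transformers. The example configuration is an assumed Llama-2-7B-like setup. Storing the cache in 8-bit integers is shown as one common way to halve it relative to FP16. It is not necessarily the method used in xFasterTransformer.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int) -> int:
    """Bytes needed to cache keys and values for every layer, KV head and token."""
    elems = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size  # 2 = K and V
    return elems * bytes_per_elem

GiB = 1024 ** 3
# Assumed configuration: 32 layers, 32 KV heads of dimension 128, 4096-token context.
cfg = dict(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096, batch_size=1)
fp16 = kv_cache_bytes(**cfg, bytes_per_elem=2)
int8 = kv_cache_bytes(**cfg, bytes_per_elem=1)
print(fp16 / GiB, int8 / GiB)  # 2.0 1.0
```

The estimate ignores the small overhead of the scale factors an int8 scheme needs. The footprint still grows linearly with sequence length and batch size, which is why it dominates memory traffic on CPUs for long contexts.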
Submitted 9 July, 2024;
originally announced July 2024.
-
A blazar in the epoch of reionization
Authors:
Eduardo Banados,
Emmanuel Momjian,
Thomas Connor,
Silvia Belladitta,
Roberto Decarli,
Chiara Mazzucchelli,
Bram P. Venemans,
Fabian Walter,
Feige Wang,
Zhang-Liang Xie,
Aaron J. Barth,
Anna-Christina Eilers,
Xiaohui Fan,
Yana Khusanova,
Jan-Torge Schindler,
Daniel Stern,
Jinyi Yang,
Irham Taufik Andika,
Chris Carilli,
Emanuele P. Farina,
Andrew Fabian,
Joseph F. Hennawi,
Antonio Pensabene,
Sofia Rojas-Ruiz
Abstract:
Relativistic jets are thought to play a crucial role in the formation of massive galaxies and supermassive black holes. Here we report multi-wavelength and multi-epoch observations of the quasar VLASSJ0410-0139 at redshift z=7, powered by a $7\times10^{8}$ solar-mass black hole. Its radio variability, X-ray properties, and compact radio emission on parsec scales reveal that J0410-0139 is a blazar with a relativistic jet aligned with our line of sight. This blazar's existence implies that many more similar (unaligned) jetted sources must exist at z=7. One scenario is that we observe an intrinsically low-power radio jet but see it at high luminosity because of relativistic beaming. In this case, a large fraction (>80%) of the UV-bright quasars must have a similar jet to match the number density expected from the UV quasar luminosity function. These jets can enhance the growth of supermassive black holes and substantially affect their host galaxies. However, the implications would be even more severe if the quasar belongs to the top 10% of radio-luminous quasars, which would be the case if the beaming enhancement is less than a factor of 10-15. In this scenario, there should be hundreds to thousands of radio-quiet quasars at z=7 with intrinsic properties similar to J0410-0139, in strong tension with the number density of bright quasars derived from their UV luminosity function. To reconcile these results, most black hole growth at z=7 must happen in an obscured phase, as some models predict. The existence of supermassive black holes in the epoch of reionization is facilitated by significant jet-enhanced or obscured super-Eddington accretion.
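For reference, the beaming enhancement mentioned above follows from standard special relativity. Consider a jet with bulk Lorentz factor Γ, speed βc, and viewing angle θ to the line of sight, and adopt the spectral convention S_ν ∝ ν^(-α). The Doppler factor and the boost in observed flux density are then:

```latex
\delta = \frac{1}{\Gamma\,(1 - \beta\cos\theta)}, \qquad
\Gamma = \frac{1}{\sqrt{1-\beta^{2}}}, \qquad
S_{\nu}^{\mathrm{obs}} = \delta^{\,p}\, S_{\nu}^{\mathrm{int}}, \qquad
p =
\begin{cases}
2+\alpha & \text{(continuous jet)}\\
3+\alpha & \text{(discrete blob)}
\end{cases}
```

For a perfectly aligned jet (θ = 0), the Doppler factor reduces to δ = Γ(1 + β) ≈ 2Γ when Γ ≫ 1. Even a moderately relativistic, well-aligned jet can therefore be boosted by a large factor. The specific enhancement thresholds quoted in the abstract come from the authors' analysis, not from this formula alone.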
Submitted 9 July, 2024;
originally announced July 2024.
-
Uniaxial plasmon polaritons via charge transfer at the graphene/CrSBr interface
Authors:
Daniel J. Rizzo,
Eric Seewald,
Fangzhou Zhao,
Jordan Cox,
Kaichen Xie,
Rocco A. Vitalone,
Francesco L. Ruta,
Daniel G. Chica,
Yinming Shao,
Sara Shabani,
Evan J. Telford,
Matthew C. Strasbourg,
Thomas P. Darlington,
Suheng Xu,
Siyuan Qiu,
Aravind Devarakonda,
Takashi Taniguchi,
Kenji Watanabe,
Xiaoyang Zhu,
P. James Schuck,
Cory R. Dean,
Xavier Roy,
Andrew J. Millis,
Ting Cao,
Angel Rubio
, et al. (2 additional authors not shown)
Abstract:
Graphene is a privileged 2D platform for hosting confined light-matter excitations known as surface plasmon-polaritons (SPPs), as it possesses low intrinsic losses with a high degree of optical confinement. However, the inherently isotropic optical properties of graphene limit its ability to guide and focus SPPs, making it less suitable than anisotropic elliptical and hyperbolic materials as a platform for polaritonic lensing and canalization. Here, we present the graphene/CrSBr heterostructure as an engineered 2D interface that hosts highly anisotropic SPP propagation over a wide range of frequencies in the mid-infrared and terahertz. Using a combination of scanning tunneling microscopy (STM), scattering-type scanning near-field optical microscopy (s-SNOM), and first-principles calculations, we demonstrate mutual doping in excess of 10$^{13}$ cm$^{-2}$ holes/electrons between the interfacial layers of graphene/CrSBr heterostructures. SPPs in graphene activated by charge transfer interact with charge-induced anisotropic intra- and interband transitions in the interfacial doped CrSBr, leading to preferential SPP propagation along the quasi-1D chains that compose each CrSBr layer. This multifaceted proximity effect both creates SPPs and endows them with anisotropic transport and propagation lengths that differ by an order of magnitude between the two in-plane crystallographic axes of CrSBr.
Submitted 9 July, 2024;
originally announced July 2024.
-
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
Authors:
Weize Chen,
Ziming You,
Ran Li,
Yitong Guan,
Chen Qian,
Chenyang Zhao,
Cheng Yang,
Ruobing Xie,
Zhiyuan Liu,
Maosong Sun
Abstract:
The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents. However, existing multi-agent frameworks often struggle to integrate diverse, capable third-party agents because they rely on agents defined within their own ecosystems. They also face challenges in simulating distributed environments, as most frameworks are limited to single-device setups. Furthermore, these frameworks often rely on hard-coded communication pipelines, limiting their adaptability to dynamic task requirements. Inspired by the concept of the Internet, we propose the Internet of Agents (IoA), a novel framework that addresses these limitations by providing a flexible and scalable platform for LLM-based multi-agent collaboration. IoA introduces an agent integration protocol, an instant-messaging-like architecture design, and dynamic mechanisms for agent teaming and conversation flow control. Through extensive experiments on general assistant tasks, embodied AI tasks, and retrieval-augmented generation benchmarks, we demonstrate that IoA consistently outperforms state-of-the-art baselines, showcasing its ability to facilitate effective collaboration among heterogeneous agents. IoA represents a step towards linking diverse agents in an Internet-like environment, where agents can seamlessly collaborate to achieve greater intelligence and capabilities. Our codebase has been released at https://github.com/OpenBMB/IoA.
Submitted 10 July, 2024; v1 submitted 9 July, 2024;
originally announced July 2024.
-
FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making
Authors:
Yangyang Yu,
Zhiyuan Yao,
Haohang Li,
Zhiyang Deng,
Yupeng Cao,
Zhi Chen,
Jordan W. Suchow,
Rong Liu,
Zhenyu Cui,
Denghui Zhang,
Koduvayur Subbalakshmi,
Guojun Xiong,
Yueru He,
Jimin Huang,
Dong Li,
Qianqian Xie
Abstract:
Large language models (LLMs) have demonstrated notable potential in conducting complex tasks and are increasingly used in various financial applications. However, high-quality sequential financial investment decision-making remains challenging. These tasks require multiple interactions with a volatile environment for every decision, demanding sufficient intelligence to maximize returns and manage risks. Although LLMs have been used to develop agent systems that surpass human teams and yield impressive investment returns, opportunities to enhance multi-sourced information synthesis and optimize decision-making outcomes through timely experience refinement remain unexplored. Here, we introduce FinCon, an LLM-based multi-agent framework with CONceptual verbal reinforcement tailored for diverse FINancial tasks. Inspired by effective real-world investment firm organizational structures, FinCon uses a manager-analyst communication hierarchy. This structure allows synchronized cross-functional agent collaboration towards unified goals through natural language interactions and equips each agent with greater memory capacity than humans. Additionally, a risk-control component in FinCon enhances decision quality by episodically initiating a self-critiquing mechanism to update systematic investment beliefs. The conceptualized beliefs serve as verbal reinforcement for the agents' future behavior and can be selectively propagated to the appropriate node that requires knowledge updates. This feature significantly improves performance while reducing unnecessary peer-to-peer communication costs. Moreover, FinCon demonstrates strong generalization capabilities in various financial tasks, including single stock trading and portfolio management.
Submitted 10 July, 2024; v1 submitted 9 July, 2024;
originally announced July 2024.
-
Sketch-Guided Scene Image Generation
Authors:
Tianyu Zhang,
Xiaoxuan Xie,
Xusheng Du,
Haoran Xie
Abstract:
Text-to-image models have shown an impressive ability to create high-quality and diverse images. Nevertheless, the transition from freehand sketches to complex scene images remains challenging for diffusion models. In this study, we propose a novel sketch-guided scene image generation framework that decomposes the task of generating scene images from sketch inputs into object-level cross-domain generation and scene-level image construction. We employ pre-trained diffusion models to convert each individual object drawing into an image of the object, inferring additional details while maintaining the sparse sketch structure. To maintain the conceptual fidelity of the foreground during scene generation, we invert the visual features of object images into identity embeddings for scene generation. In scene-level image construction, we generate the latent representation of the scene image using the separated background prompts, and then blend the generated foreground objects according to the layout of the sketch input. To keep the foreground objects' details unchanged while composing the scene image naturally, we infer the scene image on the blended latent representation using a global prompt that includes the trained identity tokens. Through qualitative and quantitative experiments, we demonstrate that the proposed approach surpasses state-of-the-art approaches in generating scene images from hand-drawn sketches.
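The scene-level step blends generated foreground objects into a background latent according to the sketch layout. A generic way to do this is element-wise mask blending, z = m·z_fg + (1 - m)·z_bg. The sketch below assumes binary masks derived from object bounding boxes, with nested lists standing in for one latent channel. The helper names are hypothetical, and this is not claimed to be the paper's exact blending rule.

```python
from typing import List, Tuple

Grid = List[List[float]]

def box_mask(h: int, w: int, box: Tuple[int, int, int, int]) -> Grid:
    """Binary mask with 1.0 inside box = (top, left, bottom, right), bottom/right exclusive."""
    top, left, bottom, right = box
    return [[1.0 if top <= r < bottom and left <= c < right else 0.0
             for c in range(w)] for r in range(h)]

def blend_latents(z_bg: Grid, z_fg: Grid, mask: Grid) -> Grid:
    """Element-wise mask * z_fg + (1 - mask) * z_bg for one latent channel."""
    return [[m * f + (1.0 - m) * b for b, f, m in zip(row_b, row_f, row_m)]
            for row_b, row_f, row_m in zip(z_bg, z_fg, mask)]

h, w = 4, 4
background = [[0.0] * w for _ in range(h)]
foreground = [[1.0] * w for _ in range(h)]
mask = box_mask(h, w, (1, 1, 3, 3))  # object occupies the central 2 x 2 block
blended = blend_latents(background, foreground, mask)
print(sum(map(sum, blended)))  # 4.0
```

A hard binary mask keeps the foreground latent unchanged inside the object region. Soft (fractional) masks are a common variation for smoother seams.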
Submitted 8 July, 2024;
originally announced July 2024.
-
AI-driven multi-omics integration for multi-scale predictive modeling of causal genotype-environment-phenotype relationships
Authors:
You Wu,
Lei Xie
Abstract:
Despite the wealth of single-cell multi-omics data, it remains challenging to predict the consequences of novel genetic and chemical perturbations in the human body. It requires knowledge of molecular interactions at all biological levels, encompassing disease models and humans. Current machine learning methods primarily establish statistical correlations between genotypes and phenotypes but struggle to identify physiologically significant causal factors, limiting their predictive power. Key challenges in predictive modeling include scarcity of labeled data, generalization across different domains, and disentangling causation from correlation. In light of recent advances in multi-omics data integration, we propose a new artificial intelligence (AI)-powered biology-inspired multi-scale modeling framework to tackle these issues. This framework will integrate multi-omics data across biological levels, organism hierarchies, and species to predict causal genotype-environment-phenotype relationships under various conditions. AI models inspired by biology may identify novel molecular targets, biomarkers, pharmaceutical agents, and personalized medicines for presently unmet medical needs.
Submitted 8 July, 2024;
originally announced July 2024.
-
Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation
Authors:
Mengzhe Geng,
Xurong Xie,
Jiajun Deng,
Zengrui Jin,
Guinan Li,
Tianzi Wang,
Shujie Hu,
Zhaoqing Li,
Helen Meng,
Xunying Liu
Abstract:
The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by its mismatch against healthy and non-aged voices, data scarcity, and large speaker-level variability. To this end, this paper proposes two novel data-efficient methods to learn homogeneous dysarthric and elderly speaker-level features for rapid, on-the-fly test-time adaptation of DNN/TDNN and Conformer ASR models. These include: 1) speaker-level variance-regularized spectral basis embedding (VR-SBE) features that exploit a special regularization term to enforce the homogeneity of speaker features in adaptation; and 2) feature-based learning hidden unit contributions (f-LHUC) transforms that are conditioned on VR-SBE features. Experiments are conducted on four tasks across two languages: the English UASpeech and TORGO dysarthric speech datasets, and the English DementiaBank Pitt and Cantonese JCCOCC MoCA elderly speech corpora. The proposed on-the-fly speaker adaptation techniques consistently outperform baseline iVector and xVector adaptation, with statistically significant word or character error rate reductions of up to 5.32% absolute (18.57% relative), and outperform batch-mode LHUC speaker adaptation by 2.24% absolute (9.20% relative), while achieving real-time-factor speed-ups of up to 33.6 times over xVectors during adaptation. The efficacy of the proposed adaptation techniques is demonstrated in a comparison against current ASR technologies, including SSL pre-trained systems, on UASpeech, where our best system produces a state-of-the-art WER of 23.33%. Analyses show that VR-SBE features and f-LHUC transforms are insensitive to speaker-level data quantity in test-time adaptation. t-SNE visualization reveals that they have stronger speaker-level homogeneity than baseline iVectors, xVectors and batch-mode LHUC transforms.
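For readers unfamiliar with LHUC: learning hidden unit contributions rescales each hidden unit's output by a speaker-dependent amplitude 2·σ(r), which lies in (0, 2). In a feature-based variant such as f-LHUC, r is predicted from speaker features instead of being estimated per speaker by back-propagation. The sketch below assumes a single linear layer that maps a speaker feature vector, standing in for VR-SBE features, to r. The predictor, weights, and names are illustrative assumptions, not the paper's architecture.

```python
import math
from typing import List, Sequence

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def lhuc_scale(hidden: Sequence[float], r: Sequence[float]) -> List[float]:
    """Classic LHUC: multiply hidden unit k by the amplitude 2 * sigmoid(r[k]), in (0, 2)."""
    return [h * 2.0 * sigmoid(rk) for h, rk in zip(hidden, r)]

def predict_r(speaker_feat: Sequence[float], weights: Sequence[Sequence[float]],
              bias: Sequence[float]) -> List[float]:
    """Hypothetical linear predictor r = W @ speaker_feat + b, one row per hidden unit."""
    return [sum(w * f for w, f in zip(row, speaker_feat)) + b
            for row, b in zip(weights, bias)]

hidden = [0.5, -1.0, 2.0]
speaker_feat = [0.2, -0.4]  # stand-in for a VR-SBE feature vector
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
b = [0.0, 0.0, 0.0]
r = predict_r(speaker_feat, W, b)
adapted = lhuc_scale(hidden, r)
print(adapted[2])  # 2.0 (r = 0 gives amplitude 1, i.e. no change)
```

Because the amplitude comes from a forward pass over speaker features rather than from gradient updates, adaptation can happen on the fly at test time. This matches the motivation stated in the abstract.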
Submitted 8 July, 2024;
originally announced July 2024.
-
CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation
Authors:
Xinying Guo,
Mingyuan Zhang,
Haozhe Xie,
Chenyang Gu,
Ziwei Liu
Abstract:
Crowd Motion Generation is essential in entertainment industries such as animation and games, as well as in strategic fields like urban simulation and planning. This new task requires an intricate integration of control and generation to realistically synthesize crowd dynamics under specific spatial and semantic constraints, and its challenges are yet to be fully explored. On the one hand, existing human motion generation models typically focus on individual behaviors, neglecting the complexities of collective behaviors. On the other hand, recent methods for multi-person motion generation depend heavily on pre-defined scenarios and are limited to a fixed, small number of inter-person interactions, which hampers their practicality. To overcome these challenges, we introduce CrowdMoGen, a zero-shot text-driven framework that harnesses a Large Language Model (LLM) to incorporate collective intelligence into the motion generation framework as guidance, thereby enabling generalizable planning and generation of crowd motions without paired training data. Our framework consists of two key components: 1) a Crowd Scene Planner that learns to coordinate motions and dynamics according to specific scene contexts or introduced perturbations, and 2) a Collective Motion Generator that efficiently synthesizes the required collective motions based on the holistic plans. Extensive quantitative and qualitative experiments have validated the effectiveness of our framework, which not only fills a critical gap by providing scalable and generalizable solutions for the Crowd Motion Generation task but also achieves high levels of realism and flexibility.
Submitted 8 July, 2024;
originally announced July 2024.
-
A Survey of Controllable Learning: Methods and Applications in Information Retrieval
Authors:
Chenglei Shen,
Xiao Zhang,
Teng Shi,
Changshuo Zhang,
Guofu Xie,
Jun Xu
Abstract:
Controllable learning (CL) is emerging as a critical component of trustworthy machine learning, ensuring that learners meet predefined targets and can adapt to changes in those targets without retraining. We provide a formal definition of CL and discuss its applications in information retrieval (IR), where information needs are often complex and dynamic. The survey categorizes CL according to who controls (users or platforms), what is controllable (e.g., retrieval objectives, users' historical behaviors, controllable environmental adaptation), how control is implemented (e.g., rule-based methods, Pareto optimization, hypernetworks), and where control is implemented (e.g., pre-processing, in-processing, post-processing methods). We then identify challenges faced by CL across training, evaluation, task setting, and deployment in online environments. Additionally, we outline promising directions for CL in theoretical analysis, efficient computation, empowering large language models, application scenarios and evaluation frameworks in IR.
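Pareto optimization is one of the control mechanisms listed above. The sketch below illustrates only the underlying concept, not any specific surveyed method. It filters candidate solutions down to the non-dominated set. The candidates are assumed to be ranking policies scored on two retrieval objectives where higher is better, and the objective names and scores are made up for the example.

```python
from typing import List, Sequence, Tuple

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is >= on every objective and > on at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the points not dominated by any other point (O(n^2) reference version)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]

# Hypothetical (accuracy, diversity) scores for five candidate ranking policies.
candidates = [(0.90, 0.20), (0.85, 0.35), (0.80, 0.30), (0.70, 0.50), (0.60, 0.45)]
print(pareto_front(candidates))  # [(0.9, 0.2), (0.85, 0.35), (0.7, 0.5)]
```

A controllable system can then let a user or platform pick a point on this front, for example by preference weights, without retraining for each trade-off.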
Submitted 4 July, 2024;
originally announced July 2024.
-
STMR: Spiral Transformer for Hand Mesh Reconstruction
Authors:
Huilong Xie,
Wenwei Song,
Wenxiong Kang,
Yihong Lin
Abstract:
Recent advances in both transformer-based methods and spiral neighbor sampling techniques have greatly enhanced hand mesh reconstruction. Transformers excel at capturing complex vertex relationships, and spiral neighbor sampling is vital for exploiting topological structure. This paper integrates spiral sampling into the Transformer architecture, enhancing its ability to leverage mesh topology for hand mesh reconstruction and yielding substantial accuracy gains. STMR employs a single image encoder for model efficiency. To augment its information extraction capability, we design the multi-scale pose feature extraction (MSPFE) module, which extracts rich pose features and thereby enhances the model's performance. Moreover, the proposed predefined pose-to-vertex lifting (PPVL) method improves vertex feature representation, further boosting reconstruction performance. Extensive experiments on the FreiHAND dataset demonstrate the state-of-the-art performance and unparalleled inference speed of STMR compared with methods using similar backbones, showcasing its efficiency and effectiveness. The code is available at https://github.com/SmallXieGithub/STMR.
Submitted 8 July, 2024;
originally announced July 2024.
-
PORCA: Root Cause Analysis with Partially Observed Data
Authors:
Chang Gong,
Di Yao,
Jin Wang,
Wenbin Li,
Lanting Fang,
Yongtao Xie,
Kaiyu Feng,
Peng Han,
Jingping Bi
Abstract:
Root Cause Analysis (RCA) aims at identifying the underlying causes of system faults by uncovering and analyzing the causal structure of complex systems. It has been widely used in many application domains. Reliable diagnostic conclusions are of great importance in mitigating system failures and financial losses. However, previous studies implicitly assume full observation of the system and thus neglect the effect of partial observation (i.e., missing nodes and latent malfunction). As a result, they fail to derive reliable RCA results. In this paper, we unveil the issues of unobserved confounders and heterogeneity in partial observation and formulate a new problem: root cause analysis with partially observed data. To address it, we propose PORCA, a novel RCA framework that can identify reliable root causes under both unobserved confounders and unobserved heterogeneity. PORCA leverages magnified score-based causal discovery to efficiently optimize an acyclic directed mixed graph under unobserved confounders. In addition, we develop a heterogeneity-aware scheduling strategy to provide adaptive sample weights. Extensive experimental results on one synthetic and two real-world datasets demonstrate the effectiveness and superiority of the proposed framework.
Submitted 11 July, 2024; v1 submitted 8 July, 2024;
originally announced July 2024.
-
DarkSide-20k sensitivity to light dark matter particles
Authors:
DarkSide-20k Collaboration,
F. Acerbi,
P. Adhikari,
P. Agnes,
I. Ahmad,
S. Albergo,
I. F. M. Albuquerque,
T. Alexander,
A. K. Alton,
P. Amaudruz,
M. Angiolilli,
E. Aprile,
R. Ardito,
M. Atzori Corona,
D. J. Auty,
M. Ave,
I. C. Avetisov,
O. Azzolini,
H. O. Back,
Z. Balmforth,
A. Barrado Olmedo,
P. Barrillon,
G. Batignani,
P. Bhowmick
, et al. (289 additional authors not shown)
Abstract:
The dual-phase liquid argon time projection chamber is presently one of the leading technologies to search for dark matter particles with masses below 10 GeV/c$^2$. This was demonstrated by the DarkSide-50 experiment with approximately 50 kg of low-radioactivity liquid argon as target material. The next-generation experiment DarkSide-20k, currently under construction, will use 1,000 times more argon and is expected to start operation in 2027. Based on the DarkSide-50 experience, here we assess the DarkSide-20k sensitivity to models predicting light dark matter particles, including Weakly Interacting Massive Particles (WIMPs) and sub-GeV/c$^2$ particles interacting with electrons in argon atoms. With one year of data, a sensitivity improvement to dark matter interaction cross-sections by at least one order of magnitude with respect to DarkSide-50 is expected for all these models. A sensitivity to WIMP--nucleon interaction cross-sections below $1\times10^{-42}$ cm$^2$ is achievable for WIMP masses above 800 MeV/c$^2$. With a 10-year exposure, the neutrino fog can be reached for WIMP masses around 5 GeV/c$^2$.
Submitted 8 July, 2024;
originally announced July 2024.