-
WOMD-Reasoning: A Large-Scale Language Dataset for Interaction and Driving Intentions Reasoning
Authors:
Yiheng Li,
Chongjian Ge,
Chenran Li,
Chenfeng Xu,
Masayoshi Tomizuka,
Chen Tang,
Mingyu Ding,
Wei Zhan
Abstract:
We propose Waymo Open Motion Dataset-Reasoning (WOMD-Reasoning), a language annotation dataset built on WOMD, with a focus on describing and reasoning about interactions and intentions in driving scenarios. Previous language datasets primarily captured interactions caused by close distances. However, interactions induced by traffic rules and human intentions, which can occur over long distances, are not yet sufficiently covered, despite being very common and more challenging for prediction or planning models to understand. Therefore, our WOMD-Reasoning focuses extensively on these interactions, providing a total of 409k Q&As for varying types of interactions. Additionally, WOMD-Reasoning presents by far the largest Q&A dataset on real-world driving scenarios, with around 3 million Q&As covering various topics of autonomous driving, from map descriptions and motion status descriptions to narratives and analyses of agents' interactions, behaviors, and intentions. This extensive textual information enables fine-tuning driving-related Large Language Models (LLMs) for a wide range of applications such as scene description, prediction, and planning. By incorporating interaction and intention language from WOMD-Reasoning, we see significant enhancements in the performance of the state-of-the-art trajectory prediction model, Multipath++, with improvements of 10.14% in $MR_6$ and 6.90% in $minFDE_6$, demonstrating the effectiveness of WOMD-Reasoning. We hope WOMD-Reasoning will empower LLMs in driving to offer better interaction understanding and behavioral reasoning. The dataset is available at https://waymo.com/open/download .
Submitted 5 July, 2024;
originally announced July 2024.
-
Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning
Authors:
Yixiao Wang,
Yifei Zhang,
Mingxiao Huo,
Ran Tian,
Xiang Zhang,
Yichen Xie,
Chenfeng Xu,
Pengliang Ji,
Wei Zhan,
Mingyu Ding,
Masayoshi Tomizuka
Abstract:
The increasing complexity of tasks in robotics demands efficient strategies for multitask and continual learning. Traditional models typically rely on a universal policy for all tasks, facing challenges such as high computational costs and catastrophic forgetting when learning new tasks. To address these issues, we introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP). By adopting Mixture of Experts (MoE) within a transformer-based diffusion policy, SDP selectively activates experts and skills, enabling efficient and task-specific learning without retraining the entire model. SDP not only reduces the burden of active parameters but also facilitates the seamless integration and reuse of experts across various tasks. Extensive experiments on diverse tasks in both simulation and the real world show that SDP 1) excels in multitask scenarios with negligible increases in active parameters, 2) prevents forgetting in continual learning of new tasks, and 3) enables efficient task transfer, offering a promising solution for advanced robotic applications. Demos and code can be found at https://forrest-110.github.io/sparse_diffusion_policy/.
Submitted 1 July, 2024;
originally announced July 2024.
-
Implementation of a scalable universal two-qubit quantum processor with electron and nuclear spins in a trapped ion
Authors:
Ji Bian,
Teng Liu,
Qifeng Lao,
Min Ding,
Huiyi Zhang,
Xinxin Rao,
Pengfei Lu,
Le Luo
Abstract:
Increasing the quantum information processing power with a limited number of hosts is vital for achieving quantum advantage. Here we propose a novel scheme that achieves a scalable n-ion-2n-qubit quantum processor utilizing four internal levels of each ion, and experimentally implement a 1-ion-2-qubit universal processor using the valence electron spin and nuclear spin of a single $^{171}$Yb$^+$ ion. Fidelities of single-qubit and two-qubit gates, obtained by quantum process tomography, are around 0.98. Additionally, Grover's algorithm is implemented with a success rate exceeding 0.99. We provide explicit scaling-up protocols based on standard laser-less and laser-based frameworks, and further demonstrate that the electron/nuclear-spin scheme allows less demanding two-qubit entangling gates between different ions. The replacement of some inter-atomic gates by intra-atomic gates could increase the fidelity of some quantum circuits. Our work paves the way towards achieving a $2^n$-times increase in the size of the quantum computational Hilbert space with n ions.
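For readers unfamiliar with Grover's algorithm on two qubits, the following is a minimal sketch of the ideal, noiseless case, in which a single Grover iteration finds the marked state with certainty. It is a plain state-vector simulation for illustration only; it does not model the authors' ion-trap pulse sequences or any hardware noise, and the function name `grover_two_qubits` is an assumption of this sketch.

```python
import math


def grover_two_qubits(marked: int) -> list[float]:
    """Simulate one ideal Grover iteration on 2 qubits.

    `marked` is the index (0..3) of the basis state the oracle recognizes.
    Returns the measurement probabilities of |00>, |01>, |10>, |11>.
    """
    n_states = 4
    # Start in the uniform superposition over the four basis states.
    amps = [1 / math.sqrt(n_states)] * n_states
    # Oracle: flip the sign of the marked state's amplitude.
    amps[marked] = -amps[marked]
    # Diffusion: reflect every amplitude about the mean amplitude.
    mean = sum(amps) / n_states
    amps = [2 * mean - a for a in amps]
    # Born rule: probability is the squared amplitude (amplitudes are real here).
    return [a * a for a in amps]


probs = grover_two_qubits(marked=2)
print([round(p, 6) for p in probs])  # [0.0, 0.0, 1.0, 0.0]
```

In this idealized setting the success probability is exactly 1 for any marked state, so a measured success rate below 1 on real hardware reflects gate and readout imperfections rather than the algorithm itself.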
Submitted 1 July, 2024;
originally announced July 2024.
-
Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity
Authors:
Mucong Ding,
Tahseen Rabbani,
Bang An,
Evan Z Wang,
Furong Huang
Abstract:
Graph Neural Networks (GNNs) are widely applied to graph learning problems such as node classification. When scaling up the underlying graphs of GNNs to a larger size, we are forced to either train on the complete graph and keep the full graph adjacency and node embeddings in memory (which is often infeasible) or mini-batch sample the graph (which results in exponentially growing computational complexities with respect to the number of GNN layers). Various sampling-based and historical-embedding-based methods are proposed to avoid this exponential growth of complexities. However, none of these solutions eliminates the linear dependence on graph size. This paper proposes a sketch-based algorithm whose training time and memory grow sublinearly with respect to graph size by training GNNs atop a few compact sketches of graph adjacency and node embeddings. Based on polynomial tensor-sketch (PTS) theory, our framework provides a novel protocol for sketching non-linear activations and graph convolution matrices in GNNs, as opposed to existing methods that sketch linear weights or gradients in neural networks. In addition, we develop a locality-sensitive hashing (LSH) technique that can be trained to improve the quality of sketches. Experiments on large-graph benchmarks demonstrate the scalability and competitive performance of our Sketch-GNNs versus their full-size GNN counterparts.
Submitted 21 June, 2024;
originally announced June 2024.
-
SAIL: Self-Improving Efficient Online Alignment of Large Language Models
Authors:
Mucong Ding,
Souradip Chakraborty,
Vibhu Agrawal,
Zora Che,
Alec Koppel,
Mengdi Wang,
Amrit Bedi,
Furong Huang
Abstract:
Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference datasets, which can lead to sub-optimal performance. On the other hand, recent literature has focused on designing online RLHF methods but still lacks a unified conceptual formulation and suffers from distribution shift issues. To address this, we establish that online LLM alignment is underpinned by bilevel optimization. By reducing this formulation to an efficient single-level first-order method (using the reward-policy equivalence), our approach generates new samples and iteratively refines model alignment by exploring responses and regulating preference labels. In doing so, we permit alignment methods to operate in an online and self-improving manner, as well as generalize prior online RLHF methods as special cases. Compared to state-of-the-art iterative RLHF methods, our approach significantly improves alignment performance on open-sourced datasets with minimal computational overhead.
Submitted 21 June, 2024;
originally announced June 2024.
-
KnobTree: Intelligent Database Parameter Configuration via Explainable Reinforcement Learning
Authors:
Jiahan Chen,
Shuhan Qi,
Yifan Li,
Zeyu Dong,
Mingfeng Ding,
Yulin Wu,
Xuan Wang
Abstract:
Databases are fundamental to contemporary information systems, yet traditional rule-based configuration methods struggle to manage the complexity of real-world applications with hundreds of tunable parameters. Deep reinforcement learning (DRL), which combines perception and decision-making, presents a potential solution for intelligent database configuration tuning. However, due to the black-box nature of RL-based methods, the generated database tuning strategies still lack explainability. In addition, redundant parameters in large-scale databases often make strategy learning unstable. This paper proposes KnobTree, an interpretable framework for optimizing database parameter configuration. Within this framework, we propose an interpretable database tuning algorithm based on an RL-based differentiable tree, which builds a transparent tree-based model to generate explainable database tuning strategies. To address the problem of large-scale parameters, we also introduce an explainable method for parameter importance assessment, which uses Shapley values to identify parameters that have a significant impact on database performance. Experiments conducted on MySQL and Gbase8s databases verify the transparency and interpretability of the KnobTree model. As a result, the generated strategies can offer practical guidance to algorithm designers and database administrators. Moreover, our approach also slightly outperforms existing RL-based tuning algorithms in throughput, latency, and processing time.
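To illustrate how Shapley values can rank parameters by importance, here is a minimal sketch that computes exact Shapley values for a toy performance function. The knob names and the `toy_throughput` model are hypothetical and chosen only for illustration; they are not taken from the paper, and a real system with hundreds of knobs would need an approximation rather than this exhaustive enumeration.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over all orderings in which players can join the coalition."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical knobs (illustrative names, not from the paper).
KNOBS = ["buffer_pool", "log_file_size", "thread_cache"]


def toy_throughput(tuned):
    """Toy performance gain when the knobs in `tuned` are set well."""
    score = 0.0
    if "buffer_pool" in tuned:
        score += 60.0
    if "log_file_size" in tuned:
        score += 20.0
    if "buffer_pool" in tuned and "log_file_size" in tuned:
        score += 20.0  # interaction bonus, split evenly by Shapley values
    return score  # thread_cache never changes the score


phi = shapley_values(KNOBS, toy_throughput)
print(phi)
```

In this toy model the values come out to 70 for `buffer_pool`, 30 for `log_file_size`, and 0 for `thread_cache`, which sum to the total gain of 100. A knob with a Shapley value of zero is a candidate for pruning from the tuning space.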
Submitted 21 June, 2024;
originally announced June 2024.
-
Bridging Electromagnetic and Gravitational Form Factors: Insights from LFHQCD
Authors:
Xiaobin Wang,
Zanbin Xing,
Minghui Ding,
Khépani Raya,
Lei Chang
Abstract:
We propose an efficacious approach to derive the generalized parton distributions for the pion and proton, based upon prior knowledge of their respective parton distribution functions (PDFs). Our method leverages integral representations of the electromagnetic form factors derived from the light-front holographic QCD (LFHQCD) formalism, coupled with PDFs computed from continuum Schwinger functional methods at the hadronic scale. Using these techniques, we calculate gravitational form factors and associated mass distributions for each hadron. Remarkably, our calculations yield results that closely match recent lattice QCD simulations conducted near the physical pion mass. This work not only deepens our understanding of hadronic structure but also highlights the efficacy of the LFHQCD approach in modeling fundamental properties of hadrons.
Submitted 13 June, 2024;
originally announced June 2024.
-
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
Authors:
Yuhang Wu,
Wenmeng Yu,
Yean Cheng,
Yan Wang,
Xiaohan Zhang,
Jiazheng Xu,
Ming Ding,
Yuxiao Dong
Abstract:
Evaluating the alignment capabilities of large Vision-Language Models (VLMs) is essential for determining their effectiveness as helpful assistants. However, existing benchmarks primarily focus on basic abilities using nonverbal methods, such as yes-no and multiple-choice questions. In this paper, we address this gap by introducing AlignMMBench, a comprehensive alignment benchmark specifically designed for emerging Chinese VLMs. This benchmark is meticulously curated from real-world scenarios and Chinese Internet sources, encompassing thirteen specific tasks across three categories, and includes both single-turn and multi-turn dialogue scenarios. Incorporating a prompt rewrite strategy, AlignMMBench encompasses 1,054 images and 4,978 question-answer pairs. To facilitate the evaluation pipeline, we propose CritiqueVLM, a rule-calibrated evaluator that exceeds GPT-4's evaluation ability. Finally, we report the performance of representative VLMs on AlignMMBench, offering insights into the capabilities and limitations of different VLM architectures. All evaluation codes and data are available on https://alignmmbench.github.io.
Submitted 13 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
A Plug-and-Play Untrained Neural Network for Full Waveform Inversion in Reconstructing Sound Speed Images of Ultrasound Computed Tomography
Authors:
Weicheng Yan,
Qiude Zhang,
Yun Wu,
Zhaohui Liu,
Liang Zhou,
Mingyue Ding,
Ming Yuchi,
Wu Qiu
Abstract:
Ultrasound computed tomography (USCT), as an emerging technology, can provide multiple quantitative parametric images of human tissue, such as sound speed and attenuation images, distinguishing it from conventional B-mode (reflection) ultrasound imaging. Full waveform inversion (FWI) is acknowledged as a technique with the greatest potential for reconstructing high-resolution sound speed images in USCT. However, traditional FWI for sound speed image reconstruction suffers from high sensitivity to the initial model caused by its strong non-convex nonlinearity, resulting in poor performance when ultrasound signals are at high frequencies. This limitation significantly restricts the application of FWI in the USCT imaging field. In this paper, we propose an untrained neural network (UNN) that can be integrated into the traditional iteration-based FWI framework as an implicit regularization prior. This integration allows for seamless deployment as a plug-and-play module within existing FWI algorithms or their variants. Notably, the proposed UNN method can be trained in an unsupervised fashion, a vital aspect in medical imaging where ground truth data is often unavailable. Evaluations of the numerical simulation and phantom experiment of the breast demonstrate that the proposed UNN improves the robustness of image reconstruction, reduces image artifacts, and achieves great image contrast. To the best of our knowledge, this study represents the first attempt to propose an implicit UNN for FWI in reconstructing sound speed images for USCT.
Submitted 13 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
LVBench: An Extreme Long Video Understanding Benchmark
Authors:
Weihan Wang,
Zehai He,
Wenyi Hong,
Yean Cheng,
Xiaohan Zhang,
Ji Qi,
Shiyu Huang,
Bin Xu,
Yuxiao Dong,
Ming Ding,
Jie Tang
Abstract:
Recent progress in multimodal large language models has markedly enhanced the understanding of short videos (typically under one minute), and several evaluation datasets have emerged accordingly. However, these advancements fall short of meeting the demands of real-world applications such as embodied intelligence for long-term decision-making, in-depth movie reviews and discussions, and live sports commentary, all of which require comprehension of long videos spanning several hours. To address this gap, we introduce LVBench, a benchmark specifically designed for long video understanding. Our dataset comprises publicly sourced videos and encompasses a diverse set of tasks aimed at long video comprehension and information extraction. LVBench is designed to challenge multimodal models to demonstrate long-term memory and extended comprehension capabilities. Our extensive evaluations reveal that current multimodal models still underperform on these demanding long video understanding tasks. Through LVBench, we aim to spur the development of more advanced models capable of tackling the complexities of long video comprehension. Our data and code are publicly available at: https://lvbench.github.io.
Submitted 12 June, 2024;
originally announced June 2024.
-
Quantitative analysis and its applications for Keller-Segel type systems
Authors:
Mengyao Ding,
Yuzhou Fang,
Chao Zhang
Abstract:
In this paper, we utilize the De Giorgi iteration to quantitatively analyze the upper bound of solutions for Keller-Segel type systems. The refined upper bound estimate presented here has broad applications in determining large time behaviours of weak solutions and improving the regularity for models involving the $p$-Laplace operator. To demonstrate the applicability of our findings, we investigate the asymptotic stability of a chemotaxis model with nonlinear signal production and a chemotaxis-Navier-Stokes model with a logistic source. Additionally, within the context of $p$-Laplacian diffusion, we establish Hölder continuity for a chemotaxis-haptotaxis model and a chemotaxis-Stokes model.
Submitted 12 June, 2024;
originally announced June 2024.
-
Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey
Authors:
Shang Wang,
Tianqing Zhu,
Bo Liu,
Ming Ding,
Xu Guo,
Dayong Ye,
Wanlei Zhou,
Philip S. Yu
Abstract:
With the rapid development of artificial intelligence, large language models (LLMs) have made remarkable advancements in natural language processing. These models are trained on vast datasets to exhibit powerful language understanding and generation capabilities across various applications, including machine translation, chatbots, and agents. However, LLMs have revealed a variety of privacy and security issues throughout their life cycle, drawing significant academic and industrial attention. Moreover, the risks faced by LLMs differ significantly from those encountered by traditional language models. Given that current surveys lack a clear taxonomy of unique threat models across diverse scenarios, we emphasize the unique privacy and security threats associated with five specific scenarios: pre-training, fine-tuning, retrieval-augmented generation systems, deployment, and LLM-based agents. Addressing the characteristics of each risk, this survey outlines potential threats and countermeasures. Research on attack and defense situations can offer feasible research directions, enabling more areas to benefit from LLMs.
Submitted 18 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Break the Chain: Large Language Models Can be Shortcut Reasoners
Authors:
Mengru Ding,
Hanmeng Liu,
Zhizhang Fu,
Jian Song,
Wenbo Xie,
Yue Zhang
Abstract:
Recent advancements in Chain-of-Thought (CoT) reasoning utilize complex modules but are hampered by high token consumption, limited applicability, and challenges in reproducibility. This paper conducts a critical evaluation of CoT prompting, extending beyond arithmetic to include complex logical and commonsense reasoning tasks, areas where standard CoT methods fall short. We propose the integration of human-like heuristics and shortcuts into language models (LMs) through "break the chain" strategies. These strategies disrupt traditional CoT processes using controlled variables to assess their efficacy. Additionally, we develop innovative zero-shot prompting strategies that encourage the use of shortcuts, enabling LMs to quickly exploit reasoning clues and bypass detailed procedural steps. Our comprehensive experiments across various LMs, both commercial and open-source, reveal that LMs maintain effective performance with "break the chain" strategies. We also introduce ShortcutQA, a dataset specifically designed to evaluate reasoning through shortcuts, compiled from competitive tests optimized for heuristic reasoning tasks such as forward/backward reasoning and simplification. Our analysis confirms that ShortcutQA not only poses a robust challenge to LMs but also serves as an essential benchmark for enhancing reasoning efficiency in AI.
Submitted 4 June, 2024;
originally announced June 2024.
-
Memorization in deep learning: A survey
Authors:
Jiaheng Wei,
Yanjun Zhang,
Leo Yu Zhang,
Ming Ding,
Chao Chen,
Kok-Leong Ong,
Jun Zhang,
Yang Xiang
Abstract:
Deep Learning (DL) powered by Deep Neural Networks (DNNs) has revolutionized various domains, yet understanding the intricacies of DNN decision-making and learning processes remains a significant challenge. Recent investigations have uncovered an interesting memorization phenomenon in which DNNs tend to memorize specific details from examples rather than learning general patterns, affecting model generalization, security, and privacy. This raises critical questions about the nature of generalization in DNNs and their susceptibility to security breaches. In this survey, we present a systematic framework to organize memorization definitions based on the generalization and security/privacy domains and summarize memorization evaluation methods at both the example and model levels. Through a comprehensive literature review, we explore DNN memorization behaviors and their impacts on security and privacy. We also introduce privacy vulnerabilities caused by memorization and the phenomenon of forgetting and explore its connection with memorization. Furthermore, we spotlight various applications leveraging memorization and forgetting mechanisms, including noisy label learning, privacy preservation, and model enhancement. This survey offers the first-in-kind understanding of memorization in DNNs, providing insights into its challenges and opportunities for enhancing AI development while addressing critical ethical concerns.
Submitted 6 June, 2024;
originally announced June 2024.
-
Phonon heat conduction across slippery interfaces in twisted graphite
Authors:
Fuwei Yang,
Wenjiang Zhou,
Zhibin Zhang,
Xuanyu Huang,
Jingwen Zhang,
Nianjie Liang,
Wujuan Yan,
Yuxi Wang,
Mingchao Ding,
Quanlin Guo,
Yu Han,
Te-Huan Liu,
Kaihui Liu,
Quanshui Zheng,
Bai Song
Abstract:
Interlayer rotation in van der Waals (vdW) materials offers great potential for manipulating phonon dynamics and heat flow in advanced electronics with ever higher compactness and power density. However, despite extensive theoretical efforts in recent years, experimental measurements remain scarce, especially due to the critical challenges of preparing single-crystalline twisted interfaces and probing interfacial thermal transport with sufficient resolution. Here, we exploited the intrinsic twisted interfaces in highly oriented pyrolytic graphite (HOPG). By developing novel experimental schemes based on microfabricated mesas, we managed to achieve simultaneous mechanical characterizations and thermal measurements. In particular, we pushed the HOPG mesas with a microprobe to identify and rotate single-crystalline intrinsic interfaces owing to their slippery nature, as is well known in structural superlubricity. Remarkably, we observed over 30-fold suppression of thermal conductance for the slippery interfaces by using epitaxial graphite as a control. Nonetheless, the interfacial conductance remains around 600 $\mathrm{MWm^{-2}K^{-1}}$, which surpasses the highest values for artificially stacked vdW structures by more than five times. Further, atomic simulations revealed the predominant role of the transverse acoustic phonons. Together, our findings highlight a general physical picture that directly correlates interfacial thermal transport with sliding resistance, and lay the foundation for twist-enabled thermal management, which is particularly beneficial to twistronics and slidetronics.
Submitted 6 June, 2024;
originally announced June 2024.
-
Stochastic Thermodynamics of Micromagnetics with Spin Torque
Authors:
Mingnan Ding,
Jun Wu,
Xiangjun Xing
Abstract:
In this work, we study the stochastic dynamics of micro-magnetics interacting with a spin-current torque. We extend the previously constructed stochastic Landau-Lifshitz equation to the case with spin-current torque, and verify the conditions of detailed balance. Then we construct various thermodynamic quantities such as work and heat, and prove the second law of thermodynamics. Due to the existence of spin torque and the asymmetry of the kinetic matrix, a novel effect of entropy pumping shows up. As a consequence, the system may behave as a heat engine which constantly transforms heat into magnetic work. Finally, we derive a fluctuation theorem for the joint probability density function of the pumped entropy and the total work, and verify it using numerical simulations.
Submitted 4 June, 2024;
originally announced June 2024.
-
Towards Communication-efficient Federated Learning via Sparse and Aligned Adaptive Optimization
Authors:
Xiumei Deng,
Jun Li,
Kang Wei,
Long Shi,
Zeihui Xiong,
Ming Ding,
Wen Chen,
Shi Jin,
H. Vincent Poor
Abstract:
Adaptive moment estimation (Adam), as a Stochastic Gradient Descent (SGD) variant, has gained widespread popularity in federated learning (FL) due to its fast convergence. However, federated Adam (FedAdam) algorithms suffer from a threefold increase in uplink communication overhead compared to federated SGD (FedSGD) algorithms, which arises from the necessity to transmit both local model updates and first and second moment estimates from distributed devices to the centralized server for aggregation. Driven by this issue, we propose a novel sparse FedAdam algorithm called FedAdam-SSM, wherein distributed devices sparsify the updates of local model parameters and moment estimates and subsequently upload the sparse representations to the centralized server. To further reduce the communication overhead, the updates of local model parameters and moment estimates incorporate a shared sparse mask (SSM) into the sparsification process, eliminating the need for three separate sparse masks. Theoretically, we develop an upper bound on the divergence between the local model trained by FedAdam-SSM and the desired model trained by centralized Adam, which is related to sparsification error and imbalanced data distribution. By minimizing the divergence bound between the model trained by FedAdam-SSM and centralized Adam, we optimize the SSM to mitigate the learning performance degradation caused by sparsification error. Additionally, we provide convergence bounds for FedAdam-SSM in both convex and non-convex objective function settings, and investigate the impact of local epoch, learning rate and sparsification ratio on the convergence rate of FedAdam-SSM. Experimental results show that FedAdam-SSM outperforms baselines in terms of convergence rate (over 1.1$\times$ faster than the sparse FedAdam baselines) and test accuracy (over 14.5\% ahead of the quantized FedAdam baselines).
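The communication saving from a shared sparse mask can be sketched as follows. This is an illustrative sketch only: it assumes the mask is chosen by keeping the top-k largest-magnitude entries of the local model update, which is a placeholder for the optimized SSM that the abstract describes, and all function and variable names here are hypothetical.

```python
def top_k_mask(values, k):
    """Return a 0/1 mask that keeps the k largest-magnitude entries."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    mask = [0] * len(values)
    for i in order[:k]:
        mask[i] = 1
    return mask


def sparsify_with_shared_mask(delta_w, delta_m, delta_v, k):
    """Sparsify the model update and both moment updates with ONE mask.

    Assumption of this sketch: the shared mask is the top-k mask of the
    model update delta_w. The uploaded payload is one index list plus
    three value lists, instead of three separate (indices, values) pairs.
    """
    mask = top_k_mask(delta_w, k)
    idx = [i for i, bit in enumerate(mask) if bit]

    def pick(vec):
        return [vec[i] for i in idx]

    return idx, pick(delta_w), pick(delta_m), pick(delta_v)


# Toy local updates: model parameters, first moment, second moment.
delta_w = [0.1, -0.9, 0.05, 0.4]
delta_m = [0.01, -0.2, 0.0, 0.1]
delta_v = [0.001, 0.04, 0.0, 0.01]

idx, w_sparse, m_sparse, v_sparse = sparsify_with_shared_mask(delta_w, delta_m, delta_v, k=2)
print(idx, w_sparse, m_sparse, v_sparse)
```

With three independent masks, a device would upload three index lists; sharing one mask means the indices are sent once, at the cost of a mask that may not be optimal for each vector individually.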
Submitted 28 May, 2024;
originally announced May 2024.
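The shared-sparse-mask idea in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the mask keeps the k largest-magnitude entries of the local model update; the abstract says the mask is optimized by minimizing a divergence bound, and that rule is not reproduced here. The function names (`top_k_indices`, `sparsify_with_shared_mask`, `uplink_entries`) are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def top_k_indices(values: Sequence[float], k: int) -> List[int]:
    """Return, in ascending order, the indices of the k largest-magnitude entries."""
    ranked = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    return sorted(ranked[:k])


def sparsify_with_shared_mask(
    delta_w: Sequence[float],
    delta_m: Sequence[float],
    delta_v: Sequence[float],
    k: int,
) -> Tuple[List[int], List[float], List[float], List[float]]:
    """Sparsify the model update and both moment updates with one mask."""
    # ASSUMPTION: the mask is picked from the model update alone (top-k magnitude).
    mask = top_k_indices(delta_w, k)
    return (
        mask,
        [delta_w[i] for i in mask],
        [delta_m[i] for i in mask],
        [delta_v[i] for i in mask],
    )


def uplink_entries(k: int, shared: bool) -> int:
    """Count uploaded numbers: 3k values plus one index list (shared) or three."""
    return 3 * k + (k if shared else 3 * k)
```

Under this simplified accounting, a shared mask sends one index list instead of three, which is where the extra communication saving comes from.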
-
Trustworthy DNN Partition for Blockchain-enabled Digital Twin in Wireless IIoT Networks
Authors:
Xiumei Deng,
Jun Li,
Long Shi,
Kang Wei,
Ming Ding,
Yumeng Shao,
Wen Chen,
Shi Jin
Abstract:
Digital twin (DT) has emerged as a promising solution to enhance manufacturing efficiency in industrial Internet of Things (IIoT) networks. To promote the efficiency and trustworthiness of DT for wireless IIoT networks, we propose a blockchain-enabled DT (B-DT) framework that employs deep neural network (DNN) partitioning technique and reputation-based consensus mechanism, wherein the DTs maintained at the gateway side execute DNN inference tasks using the data collected from their associated IIoT devices. First, we employ DNN partitioning technique to offload the top-layer DNN inference tasks to the access point (AP) side, which alleviates the computation burden at the gateway side and thereby improves the efficiency of DNN inference. Second, we propose a reputation-based consensus mechanism that integrates Proof of Work (PoW) and Proof of Stake (PoS). Specifically, the proposed consensus mechanism evaluates the off-chain reputation of each AP according to its computation resource contributions to the DNN inference tasks, and utilizes the off-chain reputation as a stake to adjust the block generation difficulty. Third, we formulate a stochastic optimization problem of communication resource (i.e., partition point) and computation resource allocation (i.e., computation frequency of APs for top-layer DNN inference and block generation) to minimize system latency under the time-varying channel state and long-term constraints of off-chain reputation, and solve the problem using Lyapunov optimization method. Experimental results show that the proposed dynamic DNN partitioning and resource allocation (DPRA) algorithm outperforms the baselines in terms of reducing the overall latency while guaranteeing the trustworthiness of the B-DT system.
Submitted 28 May, 2024;
originally announced May 2024.
-
Understanding Forgetting in Continual Learning with Linear Regression
Authors:
Meng Ding,
Kaiyi Ji,
Di Wang,
Jinhui Xu
Abstract:
Continual learning, focused on sequentially learning multiple tasks, has gained significant attention recently. Despite the tremendous progress made in the past, the theoretical understanding, especially factors contributing to catastrophic forgetting, remains relatively unexplored. In this paper, we provide a general theoretical analysis of forgetting in the linear regression model via Stochastic Gradient Descent (SGD) applicable to both underparameterized and overparameterized regimes. Our theoretical framework reveals some interesting insights into the intricate relationship between task sequence and algorithmic parameters, an aspect not fully captured in previous studies due to their restrictive assumptions. Specifically, we demonstrate that, given a sufficiently large data size, the arrangement of tasks in a sequence, where tasks with larger eigenvalues in their population data covariance matrices are trained later, tends to result in increased forgetting. Additionally, our findings highlight that an appropriate choice of step size will help mitigate forgetting in both underparameterized and overparameterized settings. To validate our theoretical analysis, we conducted simulation experiments on both linear regression models and Deep Neural Networks (DNNs). Results from these simulations substantiate our theoretical findings.
Submitted 27 May, 2024;
originally announced May 2024.
-
Calibrated Dataset Condensation for Faster Hyperparameter Search
Authors:
Mucong Ding,
Yuancheng Xu,
Tahseen Rabbani,
Xiaoyu Liu,
Brian Gravelle,
Teresa Ranadive,
Tai-Ching Tuan,
Furong Huang
Abstract:
Dataset condensation can be used to reduce the computational cost of training multiple models on a large dataset by condensing the training dataset into a small synthetic set. State-of-the-art approaches rely on matching the model gradients between the real and synthetic data. However, there is no theoretical guarantee of the generalizability of the condensed data: data condensation often generalizes poorly across hyperparameters/architectures in practice. This paper considers a different condensation objective specifically geared toward hyperparameter search. We aim to generate a synthetic validation dataset so that the validation-performance rankings of the models, with different hyperparameters, on the condensed and original datasets are comparable. We propose a novel hyperparameter-calibrated dataset condensation (HCDC) algorithm, which obtains the synthetic validation dataset by matching the hyperparameter gradients computed via implicit differentiation and efficient inverse Hessian approximation. Experiments demonstrate that the proposed framework effectively maintains the validation-performance rankings of models and speeds up hyperparameter/architecture search for tasks on both images and graphs.
Submitted 27 May, 2024;
originally announced May 2024.
-
Spectral Greedy Coresets for Graph Neural Networks
Authors:
Mucong Ding,
Yinhan He,
Jundong Li,
Furong Huang
Abstract:
The ubiquity of large-scale graphs in node-classification tasks significantly hinders the real-world applications of Graph Neural Networks (GNNs). Node sampling, graph coarsening, and dataset condensation are effective strategies for enhancing data efficiency. However, owing to the interdependence of graph nodes, coreset selection, which selects subsets of the data examples, has not been successfully applied to speed up GNN training on large graphs, warranting special treatment. This paper studies graph coresets for GNNs and avoids the interdependence issue by selecting ego-graphs (i.e., neighborhood subgraphs around a node) based on their spectral embeddings. We decompose the coreset selection problem for GNNs into two phases: a coarse selection of widely spread ego graphs and a refined selection to diversify their topologies. We design a greedy algorithm that approximately optimizes both objectives. Our spectral greedy graph coreset (SGGC) scales to graphs with millions of nodes, obviates the need for model pre-training, and applies to low-homophily graphs. Extensive experiments on ten datasets demonstrate that SGGC outperforms other coreset methods by a wide margin, generalizes well across GNN architectures, and is much faster than graph condensation.
Submitted 27 May, 2024;
originally announced May 2024.
-
EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection
Authors:
Yuwen Qian,
Shuchi Wu,
Kang Wei,
Ming Ding,
Di Xiao,
Tao Xiang,
Chuan Ma,
Song Guo
Abstract:
Federated self-supervised learning (FSSL) has recently emerged as a promising paradigm that enables the exploitation of clients' vast amounts of unlabeled data while preserving data privacy. While FSSL offers advantages, its susceptibility to backdoor attacks, a concern identified in traditional federated supervised learning (FSL), has not been investigated. To fill the research gap, we undertake a comprehensive investigation into a backdoor attack paradigm, where unscrupulous clients conspire to manipulate the global model, revealing the vulnerability of FSSL to such attacks. In FSL, backdoor attacks typically build a direct association between the backdoor trigger and the target label. In contrast, in FSSL, backdoor attacks aim to alter the global model's representation for images containing the attacker's specified trigger pattern in favor of the attacker's intended target class, which is less straightforward. We demonstrate that existing defenses are insufficient to mitigate the investigated backdoor attacks in FSSL, so an effective defense mechanism is urgently needed. To tackle this issue, we examine the fundamental mechanism of backdoor attacks on FSSL and propose the Embedding Inspector (EmInspector), which detects malicious clients by inspecting the embedding space of local models. In particular, EmInspector assesses the similarity of embeddings from different local models using a small set of inspection images (e.g., ten images of CIFAR100) without specific requirements on sample distribution or labels. We discover that embeddings from backdoored models tend to cluster together in the embedding space for a given inspection image. Evaluation results show that EmInspector can effectively mitigate backdoor attacks on FSSL across various adversary settings. Our code is available at https://github.com/ShuchiWu/EmInspector.
Submitted 21 May, 2024;
originally announced May 2024.
-
Decentralized Privacy Preservation for Critical Connections in Graphs
Authors:
Conggai Li,
Wei Ni,
Ming Ding,
Youyang Qu,
Jianjun Chen,
David Smith,
Wenjie Zhang,
Thierry Rakotoarivelo
Abstract:
Many real-world interconnections among entities can be characterized as graphs. Collecting local graph information with balanced privacy and data utility has garnered notable interest recently. This paper delves into the problem of identifying and protecting critical information of entity connections for individual participants in a graph based on cohesive subgraph searches. This problem has not been addressed in the literature. To address the problem, we propose to extract the critical connections of a queried vertex using a fortress-like cohesive subgraph model known as $p$-cohesion. A user's connections within a fortress are obfuscated when being released, to protect critical information about the user. Novel merit and penalty score functions are designed to measure each participant's critical connections in the minimal $p$-cohesion, facilitating effective identification of the connections. We further propose to preserve the privacy of a vertex enquired by only protecting its critical connections when responding to queries raised by data collectors. We prove that, under the decentralized differential privacy (DDP) mechanism, one's response satisfies $(\varepsilon, \delta)$-DDP when its critical connections are protected while the rest remains unperturbed. The effectiveness of our proposed method is demonstrated through extensive experiments on real-life graph datasets.
Submitted 19 May, 2024;
originally announced May 2024.
-
Industrial Metaverse: Enabling Technologies, Open Problems, and Future Trends
Authors:
Shiying Zhang,
Jun Li,
Long Shi,
Ming Ding,
Dinh C. Nguyen,
Wen Chen,
Zhu Han
Abstract:
As an emerging technology that enables seamless integration between the physical and virtual worlds, the Metaverse has great potential to be deployed in the industrial production field with the development of extended reality (XR) and next-generation communication networks. This deployment, called the Industrial Metaverse, is used for product design, production operations, industrial quality inspection, and product testing. However, there is a lack of in-depth understanding of the enabling technologies associated with the Industrial Metaverse. This encompasses both the precise industrial scenarios targeted by each technology and the potential migration of technologies developed in other domains to the industrial sector. Driven by this issue, in this article, we conduct a comprehensive survey of the state-of-the-art literature on the Industrial Metaverse. Specifically, we first analyze the advantages of the Metaverse for industrial production. Then, we review a collection of key enabling technologies of the Industrial Metaverse, including blockchain (BC), digital twin (DT), 6G, XR, and artificial intelligence (AI), and analyze how these technologies can support different aspects of industrial production. Subsequently, we present numerous formidable challenges encountered within the Industrial Metaverse, including confidentiality and security concerns, resource limitations, and interoperability constraints. Furthermore, we investigate the extant solutions devised to address them. Finally, we briefly outline several open issues and future research directions of the Industrial Metaverse.
Submitted 14 May, 2024;
originally announced May 2024.
-
A Class of Convex Optimization-Based Recursive Algorithms for Identification of Stochastic Systems
Authors:
Mingxia Ding,
Wenxiao Zhao,
Tianshi Chen
Abstract:
Focusing on identification, this paper develops a class of convex optimization-based criteria and correspondingly the recursive algorithms to estimate the parameter vector $\theta^{*}$ of a stochastic dynamic system. Not only do the criteria include the classical least-squares estimator but also the $L_l=|\cdot|^l, l\geq 1$, the Huber, the Log-cosh, and the Quantile costs as special cases. First, we prove that the minimizers of the convex optimization-based criteria converge to $\theta^{*}$ with probability one. Second, the recursive algorithms are proposed to find the estimates, which minimize the convex optimization-based criteria, and it is shown that these estimates also converge to the true parameter vector with probability one. Numerical examples are given, justifying the performance of the proposed algorithms including the strong consistency of the estimates, the robustness against outliers in the observations, and higher efficiency in online computation compared with the kernel-based regularization method due to the recursive nature.
Submitted 13 May, 2024;
originally announced May 2024.
-
Robust Model Aggregation for Heterogeneous Federated Learning: Analysis and Optimizations
Authors:
Yumeng Shao,
Jun Li,
Long Shi,
Kang Wei,
Ming Ding,
Qianmu Li,
Zengxiang Li,
Wen Chen,
Shi Jin
Abstract:
Conventional synchronous federated learning (SFL) frameworks suffer from performance degradation in heterogeneous systems due to imbalanced local data size and diverse computing power on the client side. To address this problem, asynchronous FL (AFL) and semi-asynchronous FL have been proposed to recover the performance loss by allowing asynchronous aggregation. However, asynchronous aggregation incurs a new problem of inconsistency between local updates and global updates. Motivated by the issues of conventional SFL and AFL, we first propose a time-driven SFL (T-SFL) framework for heterogeneous systems. The core idea of T-SFL is that the server aggregates the models from different clients, each with varying numbers of iterations, at regular time intervals. To evaluate the learning performance of T-SFL, we provide an upper bound on the global loss function. Further, we optimize the aggregation weights to minimize the developed upper bound. Then, we develop a discriminative model selection (DMS) algorithm that removes local models from clients whose number of iterations falls below a predetermined threshold. In particular, this algorithm ensures that each client's aggregation weight accurately reflects its true contribution to the global model update, thereby improving the efficiency and robustness of the system. To validate the effectiveness of T-SFL with the DMS algorithm, we conduct extensive experiments using several popular datasets including MNIST, CIFAR-10, Fashion-MNIST, and SVHN. The experimental results demonstrate that T-SFL with the DMS algorithm can reduce the latency of conventional SFL by 50\%, while achieving an average 3\% improvement in learning accuracy over state-of-the-art AFL algorithms.
Submitted 11 May, 2024;
originally announced May 2024.
-
Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
Authors:
Zhuoyi Yang,
Heyang Jiang,
Wenyi Hong,
Jiayan Teng,
Wendi Zheng,
Yuxiao Dong,
Ming Ding,
Jie Tang
Abstract:
Diffusion models have shown remarkable performance in image generation in recent years. However, due to a quadratic increase in memory when generating ultra-high-resolution images (e.g. 4096*4096), the resolution of generated images is often limited to 1024*1024. In this work, we propose a unidirectional block attention mechanism that can adaptively adjust the memory overhead during the inference process and handle global dependencies. Building on this module, we adopt the DiT structure for upsampling and develop an infinite super-resolution model capable of upsampling images of various shapes and resolutions. Comprehensive experiments show that our model achieves SOTA performance in generating ultra-high-resolution images in both machine and human evaluation. Compared to commonly used UNet structures, our model can save more than 5x memory when generating 4096*4096 images. The project URL is https://github.com/THUDM/Inf-DiT.
Submitted 8 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Stochastic thermodynamics of Brownian motion in a flowing fluid
Authors:
Jun Wu,
Mingnan Ding,
Xiangjun Xing
Abstract:
We study stochastic thermodynamics of over-damped Brownian motion in a flowing fluid. Unlike some previous works, we treat the effects of the flow field as a non-conservative driving force acting on the Brownian particle. This allows us to apply the theoretical formalism developed in a recent work for general non-conservative Langevin dynamics. We define heat and work both at the trajectory level and at the ensemble level, and prove the second law of thermodynamics explicitly. The entropy production (EP) is decomposed into a housekeeping part and an excess part, both of which are non-negative at the ensemble level. Fluctuation theorems are derived for the housekeeping work, the excess work, and the total work, which are further verified using numerical simulations. A comparison between our theory and an earlier theory by Speck et al. is also carried out.
Submitted 21 April, 2024;
originally announced April 2024.
-
Stochastic Thermodynamics of Micromagnetics
Authors:
Mingnan Ding,
Jun Wu,
Xiangjun Xing
Abstract:
In this work, we study the stochastic thermodynamics of micro-magnetic systems. We first formulate the stochastic dynamics of micro-magnetic systems by incorporating noise into the Landau-Lifshitz (LL) equation, which describes the irreversible and deterministic dynamics of magnetic moments. The resulting stochastic Landau-Lifshitz (sLL) equation obeys detailed balance, which guarantees that, with the external field fixed, the system converges to thermodynamic equilibrium with vanishing entropy production and with non-vanishing probability current. We then discuss various thermodynamic variables both at the trajectory level and at the ensemble level, and further establish both the first and the second laws of thermodynamics. Finally, we establish fluctuation theorems, and verify them using numerical simulations.
Submitted 21 April, 2024;
originally announced April 2024.
-
Privacy at a Price: Exploring its Dual Impact on AI Fairness
Authors:
Mengmeng Yang,
Ming Ding,
Youyang Qu,
Wei Ni,
David Smith,
Thierry Rakotoarivelo
Abstract:
The worldwide adoption of machine learning (ML) and deep learning models, particularly in critical sectors, such as healthcare and finance, presents substantial challenges in maintaining individual privacy and fairness. These two elements are vital to a trustworthy environment for learning systems. While numerous studies have concentrated on protecting individual privacy through differential privacy (DP) mechanisms, emerging research indicates that differential privacy in machine learning models can unequally impact separate demographic subgroups regarding prediction accuracy. This leads to a fairness concern, and manifests as biased performance. Although the prevailing view is that enhancing privacy intensifies fairness disparities, a smaller, yet significant, subset of research suggests the opposite view. In this article, with extensive evaluation results, we demonstrate that the impact of differential privacy on fairness is not monotonic. Instead, we observe that the accuracy disparity initially grows as more DP noise (enhanced privacy) is added to the ML process, but subsequently diminishes at higher privacy levels with even more noise. Moreover, implementing gradient clipping in the differentially private stochastic gradient descent ML method can mitigate the negative impact of DP noise on fairness. This mitigation is achieved by moderating the disparity growth through a lower clipping threshold.
Submitted 14 April, 2024;
originally announced April 2024.
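The two knobs discussed in the abstract above, the clipping threshold and the amount of DP noise, can be seen in a minimal sketch of one gradient computation in generic differentially private SGD: clip each per-sample gradient to an L2 norm bound, sum, add Gaussian noise scaled by that bound, and average. This is the textbook recipe, not the authors' experimental code; `clip_norm`, `noise_multiplier`, and the function names are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def clip_gradient(grad: Sequence[float], clip_norm: float) -> List[float]:
    """Scale a per-sample gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = 1.0 if norm <= clip_norm else clip_norm / norm
    return [g * scale for g in grad]


def dp_sgd_gradient(
    per_sample_grads: Sequence[Sequence[float]],
    clip_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Return the clipped, noised, averaged gradient for one mini-batch."""
    clipped = [clip_gradient(g, clip_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    # Noise standard deviation is tied to the clipping threshold.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(clipped) for x in noisy]
```

A lower `clip_norm` shrinks large per-sample gradients more aggressively and, in this formulation, also reduces the absolute noise added to the summed gradient.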
-
Communication-Efficient Model Aggregation with Layer Divergence Feedback in Federated Learning
Authors:
Liwei Wang,
Jun Li,
Wen Chen,
Qingqing Wu,
Ming Ding
Abstract:
Federated Learning (FL) facilitates collaborative machine learning by training models on local datasets, and subsequently aggregating these local models at a central server. However, the frequent exchange of model parameters between clients and the central server can result in significant communication overhead during the FL training process. To solve this problem, this paper proposes a novel FL framework, the Model Aggregation with Layer Divergence Feedback mechanism (FedLDF). Specifically, we calculate model divergence between the local model and the global model from the previous round. Then through model layer divergence feedback, the distinct layers of each client are uploaded and the amount of data transferred is reduced effectively. Moreover, the convergence bound reveals that the access ratio of clients has a positive correlation with model performance. Simulation results show that our algorithm uploads local models with reduced communication overhead while upholding a superior global model performance.
Submitted 12 April, 2024;
originally announced April 2024.
-
RoadBEV: Road Surface Reconstruction in Bird's Eye View
Authors:
Tong Zhao,
Lei Yang,
Yichen Xie,
Mingyu Ding,
Masayoshi Tomizuka,
Yintao Wei
Abstract:
Road surface conditions, especially geometry profiles, enormously affect the driving performance of autonomous vehicles. Vision-based online road reconstruction promisingly captures road information in advance. Existing solutions like monocular depth estimation and stereo matching suffer from modest performance. The recent technique of Bird's-Eye-View (BEV) perception offers immense potential for more reliable and accurate reconstruction. This paper proposes two simple yet effective models for road elevation reconstruction in BEV named RoadBEV-mono and RoadBEV-stereo, which estimate road elevation with monocular and stereo images, respectively. The former directly fits elevation values based on voxel features queried from the image view, while the latter efficiently recognizes road elevation patterns based on a BEV volume representing the discrepancy between left and right voxel features. Insightful analyses reveal their consistency with and differences from the perspective view. Experiments on a real-world dataset verify the models' effectiveness and superiority. RoadBEV-mono and RoadBEV-stereo achieve elevation errors of 1.83cm and 0.50cm, respectively. The estimation performance improves by 50\% in BEV based on monocular images. Our models are promising for practical applications, providing valuable references for vision-based BEV perception in autonomous driving. The code is released at https://github.com/ztsrxh/RoadBEV.
Submitted 20 April, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
On the Optimal MMSE Channel Estimation for One-Bit Quantized MIMO Systems
Authors:
Minhua Ding,
Italo Atzeni,
Antti Tölli,
A. Lee Swindlehurst
Abstract:
This paper focuses on the minimum mean squared error (MMSE) channel estimator for multiple-input multiple-output (MIMO) systems with one-bit quantization at the receiver side. Despite its optimality and significance in estimation theory, the MMSE channel estimator has not been fully investigated in this context due to its general non-linearity and computational complexity. Instead, the typically suboptimal Bussgang linear MMSE (BLMMSE) estimator has been widely adopted. In this work, we develop a new framework to compute the MMSE channel estimator that hinges on computation of the orthant probability of the multivariate normal distribution. Based on this framework, we determine a necessary and sufficient condition for the BLMMSE channel estimator to be optimal and equivalent to the MMSE estimator. Under the assumption of specific channel correlation or pilot symbols, we further utilize the framework to derive analytical expressions for the MMSE channel estimator that are particularly convenient for computation when certain system dimensions become large, thereby enabling a comparison between the BLMMSE and MMSE channel estimators in these cases.
Submitted 8 April, 2024;
originally announced April 2024.
-
A model for heating the super-hot corona in solar active regions
Authors:
Zekun Lu,
Feng Chen,
M. D. Ding,
Can Wang,
Yu Dai,
Xin Cheng
Abstract:
What physical mechanisms heat the outer solar or stellar atmosphere to million-Kelvin temperatures is a fundamental but long-standing open question. In particular, the solar corona in active region cores contains an even hotter component reaching ten million Kelvin, manifesting as persistent coronal loops in extreme ultraviolet and soft X-ray images, which imposes a more stringent energy budget. Here, we present a self-consistent coronal heating model using a state-of-the-art three-dimensional radiative magnetohydrodynamics simulation. We find that the continuous magnetic flux emergence in active regions keeps driving magnetic reconnections that release energy impulsively but, on time average, persistently. As a result, numerous sub-structures are heated to ten million Kelvin and then evolve independently, which collectively form long-lived and stable coronal loops as in observations. This provides a heating model explaining the origin of the super-hot coronal plasma and the persistence of hot coronal loops in emerging active regions.
Submitted 8 April, 2024;
originally announced April 2024.
-
Mahonian-Stirling statistics for partial permutations
Authors:
Ming-Jian Ding,
Jiang Zeng
Abstract:
Recently Cheng et al. (Adv. in Appl. Math. 143 (2023) 102451) generalized the inversion number to partial permutations, which are also known as Laguerre digraphs, and asked for a suitable analogue of MacMahon's major index. We provide such a major index, namely, the corresponding maj and inv statistics are equidistributed, and exhibit a Haglund-Remmel-Wilson type identity. We then interpret some Jacobi-Rogers polynomials in terms of Laguerre digraphs generalizing Deb and Sokal's alternating Laguerre digraph interpretation of some special Jacobi-Rogers polynomials.
Submitted 1 April, 2024;
originally announced April 2024.
-
The Frontier of Data Erasure: Machine Unlearning for Large Language Models
Authors:
Youyang Qu,
Ming Ding,
Nan Sun,
Kanchana Thilakarathna,
Tianqing Zhu,
Dusit Niyato
Abstract:
Large Language Models (LLMs) are foundational to AI advancements, facilitating applications like predictive text generation. Nonetheless, they pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information from their vast datasets. Machine unlearning emerges as a cutting-edge solution to mitigate these concerns, offering techniques for LLMs to selectively discard certain data. This paper reviews the latest in machine unlearning for LLMs, introducing methods for the targeted forgetting of information to address privacy, ethical, and legal challenges without necessitating full model retraining. It divides existing research into unlearning from unstructured/textual data and structured/classification data, showcasing the effectiveness of these approaches in removing specific data while maintaining model efficacy. Highlighting the practicality of machine unlearning, this analysis also points out the hurdles in preserving model integrity, avoiding excessive or insufficient data removal, and ensuring consistent outputs, underlining the role of machine unlearning in advancing responsible, ethical AI.
Submitted 23 March, 2024;
originally announced March 2024.
-
An efficient asymptotic DC method for sparse and low-rank matrix recovery
Authors:
Mingcai Ding,
Xiaoliang Song,
Bo Yu
Abstract:
The optimization problem of sparse and low-rank matrix recovery is considered, which involves a least squares problem with a rank constraint and a cardinality constraint. To overcome the challenges posed by these constraints, an asymptotic difference-of-convex (ADC) method that employs a Moreau smoothing approach and an exact penalty approach is proposed to transform this problem into a DC programming format gradually. To solve the resulting DC program by making full use of its DC structure, an efficient inexact DC algorithm with sieving strategy (siDCA) is introduced. The subproblem of siDCA is solved by an efficient dual-based semismooth Newton method. The convergence of the solution sequence generated by siDCA is proved. To illustrate the effectiveness of ADC-siDCA, matrix recovery experiments are conducted on nonnegative and positive semidefinite matrices. The numerical results are compared with those obtained using a successive DC approximation minimization method and a penalty proximal alternating linearized minimization approach. The outcome of the comparison indicates that ADC-siDCA surpasses the other two methods in terms of efficiency and recovery error. Additionally, numerical experiments on sparse phase retrieval demonstrate that ADC-siDCA is a valuable tool for recovering sparse and low-rank Hermitian matrices.
Submitted 15 March, 2024;
originally announced March 2024.
-
Sun-as-a-star Study of an X-class Solar Flare with Spectroscopic Observations of CHASE
Authors:
Y. L. Ma,
Q. H. Lao,
X. Cheng,
B. T. Wang,
Z. H. Zhao,
S. H. Rao,
C. Li,
M. D. Ding
Abstract:
Sun-as-a-star spectroscopic characteristics of solar flares can be used as a benchmark for the detection and analyses of stellar flares. Here, we study the Sun-as-a-star properties of an X1.0 solar flare using high-resolution spectroscopic data obtained by the Chinese $\mathrm{H} α$ Solar Explorer (CHASE). A noise reduction algorithm based on discrete Fourier transformation is first employed to enhance the signal-to-noise ratio of the space-integral $\mathrm{H} α$ spectrum with a focus on its typical characteristics. For the flare of interest, we find that the average $\mathrm{H} α$ profile displays a strong emission at the line center and an obvious line broadening. It also presents a clear red asymmetry, corresponding to a redshift velocity of around $50 \ \mathrm{km \ s^{-1}}$ that slightly decreases with time, consistent with previous results. Furthermore, we study how the size of the space-integral region affects the characteristics of the flare Sun-as-a-star $\mathrm{H} α$ profile. It is found that although the redshift velocity calculated from the $\mathrm{H} α$ profile remains unchanged, the detectability of the characteristics weakens as the space-integral region becomes large. An upper limit for the size of the target region where the red asymmetry is detectable is estimated. It is also found that the intensity in $\mathrm{H} α$ profiles, measured by the equivalent widths of the spectra, is significantly underestimated if the $\mathrm{H} α$ spectra are further averaged in the time domain.
Submitted 13 March, 2024;
originally announced March 2024.
-
Q-SLAM: Quadric Representations for Monocular SLAM
Authors:
Chensheng Peng,
Chenfeng Xu,
Yue Wang,
Mingyu Ding,
Heng Yang,
Masayoshi Tomizuka,
Kurt Keutzer,
Marco Pavone,
Wei Zhan
Abstract:
Monocular SLAM has long grappled with the challenge of accurately modeling 3D geometries. Recent advances in Neural Radiance Fields (NeRF)-based monocular SLAM have shown promise, yet these methods typically focus on novel view synthesis rather than precise 3D geometry modeling. This focus results in a significant disconnect between NeRF applications, i.e., novel-view synthesis, and the requirements of SLAM. We identify that the gap results from the volumetric representations used in NeRF, which are often dense and noisy. In this study, we propose a novel approach that reimagines volumetric representations through the lens of quadric forms. We posit that most scene components can be effectively represented as quadric planes. Leveraging this assumption, we reshape the volumetric representations with millions of cubes by several quadric planes, which leads to more accurate and efficient modeling of 3D scenes in SLAM contexts. Our method involves two key steps: First, we use the quadric assumption to enhance coarse depth estimations obtained from tracking modules, e.g., Droid-SLAM. This step alone significantly improves depth estimation accuracy. Second, in the subsequent mapping phase, we diverge from previous NeRF-based SLAM methods that distribute sampling points across the entire volume space. Instead, we concentrate sampling points around quadric planes and aggregate them using a novel quadric-decomposed Transformer. Additionally, we introduce an end-to-end joint optimization strategy that synchronizes pose estimation with 3D reconstruction.
Submitted 12 March, 2024;
originally announced March 2024.
-
DrPlanner: Diagnosis and Repair of Motion Planners Using Large Language Models
Authors:
Yuanfei Lin,
Chenran Li,
Mingyu Ding,
Masayoshi Tomizuka,
Wei Zhan,
Matthias Althoff
Abstract:
Motion planners are essential for the safe operation of automated vehicles across various scenarios. However, no motion planning algorithm has achieved perfection in the literature, and improving its performance is often time-consuming and labor-intensive. To tackle the aforementioned issues, we present DrPlanner, the first framework designed to automatically diagnose and repair motion planners using large language models. Initially, we generate a structured description of the planner and its planned trajectories from both natural and programming languages. Leveraging the profound capabilities of large language models in addressing reasoning challenges, our framework returns repaired planners with detailed diagnostic descriptions. Furthermore, the framework advances iteratively with continuous feedback from the evaluation of the repaired outcomes. Our approach is validated using search-based motion planners; experimental results highlight the need for demonstrations in the prompt and the ability of our framework to identify and rectify elusive issues effectively.
Submitted 12 March, 2024;
originally announced March 2024.
-
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
Authors:
Wendi Zheng,
Jiayan Teng,
Zhuoyi Yang,
Weihan Wang,
Jidong Chen,
Xiaotao Gu,
Yuxiao Dong,
Ming Ding,
Jie Tang
Abstract:
Recent advancements in text-to-image generative systems have been largely driven by diffusion models. However, single-stage text-to-image diffusion models still face challenges in terms of computational efficiency and the refinement of image details. To tackle the issue, we propose CogView3, an innovative cascaded framework that enhances the performance of text-to-image diffusion. CogView3 is the first model implementing relay diffusion in the realm of text-to-image generation, executing the task by first creating low-resolution images and subsequently applying relay-based super-resolution. This methodology not only results in competitive text-to-image outputs but also greatly reduces both training and inference costs. Our experimental results demonstrate that CogView3 outperforms SDXL, the current state-of-the-art open-source text-to-image diffusion model, by 77.0% in human evaluations, all while requiring only about 1/2 of the inference time. The distilled variant of CogView3 achieves comparable performance while only utilizing 1/10 of the inference time by SDXL.
Submitted 8 March, 2024;
originally announced March 2024.
-
Generalized Coronal Loop Scaling Laws and Their Implication for Turbulence in Solar Active Region Loops
Authors:
Y. Dai,
J. J. Xiang,
M. D. Ding
Abstract:
Recent coronal loop modeling has emphasized the importance of combining both Coulomb collisions and turbulent scattering to characterize field-aligned thermal conduction, which invokes a hybrid loop model. In this work we generalize the hybrid model by incorporating nonuniform heating and cross section that are both formulated by a power-law function of temperature. Based on the hybrid model solutions, we construct scaling laws that relate loop-top temperature ($T_a$) and heating rate ($H_a$) to other loop parameters. It is found that the loop-top properties for turbulent loops are additionally power-law functions of turbulent mean free path ($λ_T$), with the functional forms varying from case to case depending on the specification of the heating and/or areal parameters. More importantly, both a sufficiently footpoint-concentrated heating and a cross-sectional expansion with height can effectively weaken (strengthen) the negative (positive) power-law dependence of $T_a$ ($H_a$) on $λ_T$. The reason lies in a notable reduction of heat flux by footpoint heating and/or cross-sectional expansion in the turbulence-dominated coronal part, where turbulent scattering introduces a much weaker dependence of the conduction coefficient on temperature. In this region, therefore, the reduction of the heat flux predominantly relies on a backward flattening of the temperature gradient. Through numerical modeling that incorporates more realistic conditions, this scenario is further consolidated. Our results have important implications for solar active region (AR) loops. With the factors of nonuniform heating and cross section taken into account, AR loops can bear relatively stronger turbulence while still keeping a physically reasonable temperature for nonflaring loops.
Submitted 4 March, 2024;
originally announced March 2024.
-
Examining the critical phenomenon of pion parton distribution: Insights from the Moment Problem
Authors:
Xiaobin Wang,
Zexin Wu,
Minghui Ding,
Lei Chang
Abstract:
A recent study by Wang et al. (arXiv:2309.01417) proposed a novel connection between the nature of the parton distribution function (PDF) and the characteristics of its moments. In this study, we apply these findings to analyze the evolution of the pion valence quark PDF, garnering valuable qualitative insights. Firstly, we validate the non-negativity and continuity of the PDF across a wide range of scales, indicating the logical consistency of our chosen evolution scheme. Subsequently, we examine the unimodality of both the PDF and its transformed counterpart, the xPDF, i.e., the parton distribution function multiplied by the momentum fraction. We observe a smooth evolution of the peak position of the xPDF towards the small-$x$ region with increasing scale, while intriguingly, the PDF undergoes a phase of bimodal competition as the energy scale evolves.
Submitted 7 March, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Lithium Abundances from the LAMOST Med-Resolution Survey Data Release 9
Authors:
Ming-Yi Ding,
Jian-Rong Shi,
Hong-liang Yan,
Chun-Qian Li,
Qi Gao,
Tian-Yi Chen,
Jing-Hua Zhang,
Shuai Liu,
Xiao-Jin Xie,
Yao-Jia Tang,
Ze-Ming Zhou,
Jiang-Tao Wang
Abstract:
Lithium, a fragile but crucial chemical element in the universe, exhibits interesting and complex behaviors. Thanks to the massive spectroscopic data from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) medium-resolution survey (MRS), we can investigate the lithium abundances in a large and diverse sample of stars, which could greatly help the study of the origin and evolution of lithium. In this work, we use the Li 6,707.8 Å line to derive the lithium abundance through a template-matching method. A catalog of precise lithium abundances is presented for 795,384 spectra corresponding to 455,752 stars from the LAMOST MRS Data Release (DR) 9. Comparing our results with those of external high-resolution references, we find good consistency, with a typical deviation of σ A(Li) ~ 0.2 dex. We also analyze the internal errors using stars that have multiple LAMOST MRS observations; these errors reach as low as 0.1 dex when the signal-to-noise ratio (S/N) of the spectra is > 20. In addition, our results indicate that a small fraction of giant stars still exhibit surprisingly high lithium content, and 967 stars are identified as Li-rich giants with A(Li) > 1.5 dex, accounting for ~ 2.6% of our samples. If one takes into account the fact that nearly all stars deplete lithium during the main sequence, then the fraction of Li-rich stars may considerably exceed 2.6%. This new catalog covers a wide range of stellar evolutionary stages from pre-main sequence to giants, and will help further study of the chemical evolution of lithium.
Submitted 4 March, 2024;
originally announced March 2024.
-
Does Negative Sampling Matter? A Review with Insights into its Theory and Applications
Authors:
Zhen Yang,
Ming Ding,
Tinglin Huang,
Yukuo Cen,
Junshuai Song,
Bin Xu,
Yuxiao Dong,
Jie Tang
Abstract:
Negative sampling has swiftly risen to prominence as a focal point of research, with wide-ranging applications spanning machine learning, computer vision, natural language processing, data mining, and recommender systems. This growing interest raises several critical questions: Does negative sampling really matter? Is there a general framework that can incorporate all existing negative sampling methods? In what fields is it applied? Addressing these questions, we propose a general framework that leverages negative sampling. Delving into the history of negative sampling, we trace the development of negative sampling through five evolutionary paths. We dissect and categorize the strategies used to select negative sample candidates, detailing global, local, mini-batch, hop, and memory-based approaches. Our review categorizes current negative sampling methods into five types: static, hard, GAN-based, Auxiliary-based, and In-batch methods, providing a clear structure for understanding negative sampling. Beyond detailed categorization, we highlight the application of negative sampling in various areas, offering insights into its practical benefits. Finally, we briefly discuss open problems and future directions for negative sampling.
Submitted 27 February, 2024;
originally announced February 2024.
-
PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models
Authors:
Dingkun Guo,
Yuqi Xiang,
Shuqi Zhao,
Xinghao Zhu,
Masayoshi Tomizuka,
Mingyu Ding,
Wei Zhan
Abstract:
Robotic grasping is a fundamental aspect of robot functionality, defining how robots interact with objects. Despite substantial progress, its generalizability to counter-intuitive or long-tailed scenarios, such as objects with uncommon materials or shapes, remains a challenge. In contrast, humans can easily apply their intuitive physics to grasp skillfully and change grasps efficiently, even for objects they have never seen before.
This work delves into infusing such physical commonsense reasoning into robotic manipulation. We introduce PhyGrasp, a multimodal large model that leverages inputs from two modalities: natural language and 3D point clouds, seamlessly integrated through a bridge module. The language modality exhibits robust reasoning capabilities concerning the impacts of diverse physical properties on grasping, while the 3D modality comprehends object shapes and parts. With these two capabilities, PhyGrasp is able to accurately assess the physical properties of object parts and determine optimal grasping poses. Additionally, the model's language comprehension enables human instruction interpretation, generating grasping poses that align with human preferences. To train PhyGrasp, we construct a dataset, PhyPartNet, with 195K object instances with varying physical properties and human preferences, alongside their corresponding language descriptions. Extensive experiments conducted in simulation and on real robots demonstrate that PhyGrasp achieves state-of-the-art performance, particularly in long-tailed cases, e.g., about 10% improvement in success rate over GraspNet. Project page: https://sites.google.com/view/phygrasp
Submitted 26 February, 2024;
originally announced February 2024.
-
Unveiling the Initiation Route of Coronal Mass Ejections through their Slow Rise Phase
Authors:
Chen Xing,
Guillaume Aulanier,
Xin Cheng,
Chun Xia,
Mingde Ding
Abstract:
Understanding the early evolution of coronal mass ejections (CMEs), in particular their initiation, is the key to forecasting solar eruptions and induced disastrous space weather. Although many initiation mechanisms have been proposed, a full understanding of CME initiation, which is identified as a slow rise of CME progenitors in kinematics before the impulsive acceleration, remains elusive. Here, with a state-of-the-art thermal-magnetohydrodynamics simulation, we determine a complete CME initiation route in which multiple mainstream mechanisms occur in sequence yet are tightly coupled. The slow rise is first triggered and driven by the developing hyperbolic flux tube (HFT) reconnection. Subsequently, the slow rise continues, driven by the coupling of the HFT reconnection and the early development of torus instability. The end of the slow rise, i.e., the onset of the impulsive acceleration, is induced by the start of the fast magnetic reconnection coupled with the torus instability. These results unveil that the CME initiation is a complicated process involving multiple physical mechanisms, and thus can hardly be explained by a single initiation mechanism.
Submitted 26 February, 2024;
originally announced February 2024.
-
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
Authors:
Yao Mu,
Junting Chen,
Qinglong Zhang,
Shoufa Chen,
Qiaojun Yu,
Chongjian Ge,
Runjian Chen,
Zhixuan Liang,
Mengkang Hu,
Chaofan Tao,
Peize Sun,
Haibao Yu,
Chao Yang,
Wenqi Shao,
Wenhai Wang,
Jifeng Dai,
Yu Qiao,
Mingyu Ding,
Ping Luo
Abstract:
Robotic behavior synthesis, the problem of understanding multimodal inputs and generating precise physical control for robots, is an important part of Embodied AI. Despite successes in applying multimodal large language models for high-level understanding, it remains challenging to translate these conceptual understandings into detailed robotic actions while achieving generalization across various scenarios. In this paper, we propose a tree-structured multimodal code generation framework for generalized robotic behavior synthesis, termed RoboCodeX. RoboCodeX decomposes high-level human instructions into multiple object-centric manipulation units consisting of physical preferences such as affordance and safety constraints, and applies code generation to introduce generalization ability across various robotics platforms. To further enhance the capability to map conceptual and perceptual understanding into control commands, a specialized multimodal reasoning dataset is collected for pre-training and an iterative self-updating methodology is introduced for supervised fine-tuning. Extensive experiments demonstrate that RoboCodeX achieves state-of-the-art performance in both simulators and real robots on four different kinds of manipulation tasks and one navigation task.
Submitted 25 February, 2024;
originally announced February 2024.
-
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation
Authors:
Junting Chen,
Yao Mu,
Qiaojun Yu,
Tianming Wei,
Silang Wu,
Zhecheng Yuan,
Zhixuan Liang,
Chao Yang,
Kaipeng Zhang,
Wenqi Shao,
Yu Qiao,
Huazhe Xu,
Mingyu Ding,
Ping Luo
Abstract:
Rapid progress in high-level task planning and code generation for open-world robot manipulation has been witnessed in Embodied AI. However, previous studies put much effort into general common sense reasoning and task planning capabilities of large-scale language or multi-modal models, but relatively little effort into ensuring the deployability of generated code on real robots and into other fundamental components of autonomous robot systems, including robot perception, motion planning, and control. To bridge this "ideal-to-real" gap, this paper presents RobotScript, a platform for 1) a deployable robot manipulation pipeline powered by code generation; and 2) a code generation benchmark for robot manipulation tasks in free-form natural language. The RobotScript platform addresses this gap by emphasizing the unified interface with both simulation and real robots, based on abstraction from the Robot Operating System (ROS), ensuring syntax compliance and simulation validation with Gazebo. We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms, and multiple grippers. Additionally, our benchmark assesses reasoning abilities for physical space and constraints, highlighting the differences between GPT-3.5, GPT-4, and Gemini in handling complex physical interactions. Finally, we present a thorough evaluation of the whole system, exploring how each module in the pipeline (code generation, perception, motion planning, and even object geometric properties) impacts the overall performance of the system.
Submitted 22 February, 2024;
originally announced February 2024.
-
Developing an Automated Detection, Tracking and Analysis Method for Solar Filaments Observed by CHASE via Machine Learning
Authors:
Z. Zheng,
Q. Hao,
Y. Qiu,
J. Hong,
C. Li,
M. D. Ding
Abstract:
Studies on the dynamics of solar filaments have significant implications for understanding their formation, evolution, and eruption, which are of great importance for space weather warning and forecasting. The H$α$ Imaging Spectrograph (HIS) onboard the recently launched Chinese H$α$ Solar Explorer (CHASE) can provide full-disk solar H$α$ spectroscopic observations, which bring us an opportunity to systematically explore and analyze the plasma dynamics of filaments. The dramatically increased volume of observation data requires automated processing and analysis, which would be impossible to carry out manually. In this paper, we utilize the U-Net model to identify filaments and implement the Channel and Spatial Reliability Tracking (CSRT) algorithm for automated filament tracking. In addition, we use the cloud model to invert the line-of-sight velocity of filaments and employ the graph theory algorithm to extract the filament spine, which can advance our understanding of the dynamics of filaments. The favorable test performance confirms the validity of our method, which will be implemented in the following statistical analyses of filament features and dynamics of CHASE/HIS observations.
Submitted 21 February, 2024;
originally announced February 2024.