Showing 1–50 of 4,840 results for author: Wang, H

  1. arXiv:2407.13566  [pdf]

    cs.CY cs.SI eess.SY

    Decentralised Governance for Autonomous Cyber-Physical Systems

    Authors: Kelsie Nabben, Hongyang Wang, Michael Zargham

    Abstract: This paper examines the potential for Cyber-Physical Systems (CPS) to be governed in a decentralised manner, whereby blockchain-based infrastructure facilitates the communication between digital and physical domains through self-governing and self-organising principles. Decentralised governance paradigms that integrate computation in physical domains (such as 'Decentralised Autonomous Organisation…

    Submitted 18 July, 2024; originally announced July 2024.

    Report number: Dawo/2024/20

  2. arXiv:2407.13120  [pdf, other]

    cs.CV math.OC

    HPPP: Halpern-type Preconditioned Proximal Point Algorithms and Applications to Image Restoration

    Authors: Shuchang Zhang, Hui Zhang, Hongxia Wang

    Abstract: Preconditioned Proximal Point (PPP) algorithms provide a unified framework for splitting methods in image restoration. Recent advancements with RED (Regularization by Denoising) and PnP (Plug-and-Play) priors have achieved state-of-the-art performance in this domain, emphasizing the need for a meaningful particular solution. However, degenerate PPP algorithms typically exhibit weak convergence in…

    Submitted 17 July, 2024; originally announced July 2024.

  3. arXiv:2407.13094  [pdf, other]

    cs.CV

    Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data

    Authors: Wufei Ma, Kai Li, Zhongshi Jiang, Moustafa Meshry, Qihao Liu, Huiyu Wang, Christian Häne, Alan Yuille

    Abstract: Recent video-text foundation models have demonstrated strong performance on a wide variety of downstream video understanding tasks. Can these video-text models genuinely understand the contents of natural videos? Standard video-text evaluations could be misleading as many questions can be inferred merely from the objects and contexts in a single frame or biases inherent in the datasets. In this pa…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Project page: https://feint6k.github.io

  4. arXiv:2407.12962  [pdf, other]

    cs.RO

    NAS: N-step computation of All Solutions to the footstep planning problem

    Authors: Jiayi Wang, Saeid Samadi, Hefan Wang, Pierre Fernbach, Olivier Stasse, Sethu Vijayakumar, Steve Tonneau

    Abstract: How many ways are there to climb a staircase in a given number of steps? Infinitely many, if we focus on the continuous aspect of the problem. A finite, possibly large number if we consider the discrete aspect, i.e. on which surface which effectors are going to step and in what order. We introduce NAS, an algorithm that considers both aspects simultaneously and computes all the possible solutions…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Submitted to Humanoids 2024

  5. arXiv:2407.12883  [pdf, other]

    cs.CL cs.AI cs.IR

    BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval

    Authors: Hongjin Su, Howard Yen, Mengzhou Xia, Weijia Shi, Niklas Muennighoff, Han-yu Wang, Haisu Liu, Quan Shi, Zachary S. Siegel, Michael Tang, Ruoxi Sun, Jinsung Yoon, Sercan O. Arik, Danqi Chen, Tao Yu

    Abstract: Existing retrieval benchmarks primarily consist of information-seeking queries (e.g., aggregated questions from search engines) where keyword or semantic-based retrieval is usually sufficient. However, many complex real-world queries require in-depth reasoning to identify relevant documents that go beyond surface form matching. For example, finding documentation for a coding question requires unde…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 50 pages

  6. arXiv:2407.12871  [pdf, other]

    cs.CL cs.AI cs.LG

    MetaTool: Facilitating Large Language Models to Master Tools with Meta-task Augmentation

    Authors: Xiaohan Wang, Dian Li, Yilin Zhao, Sinbadliu, Hui Wang

    Abstract: Utilizing complex tools with Large Language Models (LLMs) is a critical component for grounding AI agents in various real-world scenarios. The core challenge of manipulating tools lies in understanding their usage and functionality. The prevailing approach involves few-shot prompting with demonstrations or fine-tuning on expert trajectories. However, for complex tools and tasks, mere in-context de…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures

  7. arXiv:2407.12821  [pdf, other]

    cs.CL cs.AI cs.LG

    AutoFlow: Automated Workflow Generation for Large Language Model Agents

    Authors: Zelong Li, Shuyuan Xu, Kai Mei, Wenyue Hua, Balaji Rama, Om Raheja, Hao Wang, He Zhu, Yongfeng Zhang

    Abstract: Recent advancements in Large Language Models (LLMs) have shown significant progress in understanding complex natural language. One important application of LLM is LLM-based AI Agent, which leverages the ability of LLM as well as external tools for complex-task solving. To make sure LLM Agents follow an effective and reliable procedure to solve the given task, manually designed workflows are usuall…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Open source code available at https://github.com/agiresearch/AutoFlow

  8. arXiv:2407.12815  [pdf, ps, other]

    cs.CL cs.LG

    SMLT-MUGC: Small, Medium, and Large Texts -- Machine versus User-Generated Content Detection and Comparison

    Authors: Anjali Rawal, Hui Wang, Youjia Zheng, Yu-Hsuan Lin, Shanu Sushmita

    Abstract: Large language models (LLMs) have gained significant attention due to their ability to mimic human language. Identifying texts generated by LLMs is crucial for understanding their capabilities and mitigating potential consequences. This paper analyzes datasets of varying text lengths: small, medium, and large. We compare the performance of machine learning algorithms on four datasets: (1) small (t…

    Submitted 28 June, 2024; originally announced July 2024.

  9. arXiv:2407.12727  [pdf, other]

    cs.CV

    NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model

    Authors: Zhongqun Zhang, Hengfei Wang, Ziwei Yu, Yihua Cheng, Angela Yao, Hyung Jin Chang

    Abstract: Modeling the physical contacts between the hand and object is standard for refining inaccurate hand poses and generating novel human grasp in 3D hand-object reconstruction. However, existing methods rely on geometric constraints that cannot be specified or controlled. This paper introduces a novel task of controllable 3D hand-object contact modeling with natural language descriptions. Challenges i…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  10. arXiv:2407.12532  [pdf, other]

    cs.CL cs.AI

    Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models

    Authors: Xihe Qiu, Haoyu Wang, Xiaoyu Tan, Chao Qu, Yujie Xiong, Yuan Cheng, Yinghui Xu, Wei Chu, Yuan Qi

    Abstract: Effective collaboration in multi-agent systems requires communicating goals and intentions between agents. Current agent frameworks often suffer from dependencies on single-agent execution and lack robust inter-module communication, frequently leading to suboptimal multi-agent reinforcement learning (MARL) policies and inadequate task coordination. To address these challenges, we present a framewo…

    Submitted 17 July, 2024; originally announced July 2024.

  11. arXiv:2407.12522  [pdf, other]

    cs.CL cs.AI

    Struct-X: Enhancing Large Language Models Reasoning with Structured Data

    Authors: Xiaoyu Tan, Haoyu Wang, Xihe Qiu, Yuan Cheng, Yinghui Xu, Wei Chu, Yuan Qi

    Abstract: Structured data, rich in logical and relational information, has the potential to enhance the reasoning abilities of large language models (LLMs). Still, its integration poses a challenge due to the risk of overwhelming LLMs with excessive tokens and irrelevant context information. To address this, we propose Struct-X, a novel framework that operates through five key phases: ``read-model-fill-refl…

    Submitted 17 July, 2024; originally announced July 2024.

  12. arXiv:2407.12443  [pdf, other]

    cs.LG cs.CV

    Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective

    Authors: Zhaoxin Wang, Handing Wang, Cong Tian, Yaochu Jin

    Abstract: Adversarial training (AT) has become an effective defense method against adversarial examples (AEs) and it is typically framed as a bi-level optimization problem. Among various AT methods, fast AT (FAT), which employs a single-step attack strategy to guide the training process, can achieve good robustness against adversarial attacks at a low cost. However, FAT methods suffer from the catastrophic…

    Submitted 17 July, 2024; originally announced July 2024.

  13. arXiv:2407.12428  [pdf, other]

    cs.SE

    Context-Aware Fuzzing for Robustness Enhancement of Deep Learning Models

    Authors: Haipeng Wang, Zhengyuan Wei, Qilin Zhou, Wing-Kwong Chan

    Abstract: In the testing-retraining pipeline for enhancing the robustness property of deep learning (DL) models, many state-of-the-art robustness-oriented fuzzing techniques are metric-oriented. The pipeline generates adversarial examples as test cases via such a DL testing technique and retrains the DL model under test with test suites that contain these test cases. On the one hand, the strategies of these…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: The official version of this paper is to appear in ACM Transactions on Software Engineering and Methodology (accepted in July 2024)

  14. arXiv:2407.12271  [pdf, other]

    cs.CV eess.IV

    RBAD: A Dataset and Benchmark for Retinal Vessels Branching Angle Detection

    Authors: Hao Wang, Wenhui Zhu, Jiayou Qin, Xin Li, Oana Dumitrascu, Xiwen Chen, Peijie Qiu, Abolfazl Razi

    Abstract: Detecting retinal image analysis, particularly the geometrical features of branching points, plays an essential role in diagnosing eye diseases. However, existing methods used for this purpose often are coarse-level and lack fine-grained analysis for efficient annotation. To mitigate these issues, this paper proposes a novel method for detecting retinal branching angles using a self-configured ima…

    Submitted 16 July, 2024; originally announced July 2024.

  15. arXiv:2407.11998  [pdf, other]

    cs.HC

    Custom Cloth Creation and Virtual Try-on for Everyone

    Authors: Pei Chen, Heng Wang, Sainan Sun, Zhiyuan Chen, Zhenkun Liu, Shuhua Cao, Li Yang, Minghui Yang

    Abstract: This demo showcases a simple tool that utilizes AIGC technology, enabling both professional designers and regular users to easily customize clothing for their digital avatars. Customization options include changing clothing colors, textures, logos, and patterns. Compared with traditional 3D modeling processes, our approach significantly enhances efficiency and interactivity and reduces production…

    Submitted 13 June, 2024; originally announced July 2024.

  16. arXiv:2407.11921  [pdf, other]

    cs.CV cs.CR

    IPA-NeRF: Illusory Poisoning Attack Against Neural Radiance Fields

    Authors: Wenxiang Jiang, Hanwei Zhang, Shuo Zhao, Zhongwen Guo, Hao Wang

    Abstract: Neural Radiance Field (NeRF) represents a significant advancement in computer vision, offering implicit neural network-based scene representation and novel view synthesis capabilities. Its applications span diverse fields including robotics, urban mapping, autonomous navigation, virtual reality/augmented reality, etc., some of which are considered high-risk AI applications. However, despite its wi…

    Submitted 18 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  17. arXiv:2407.11784  [pdf, other]

    cs.AI cs.CV cs.LG

    Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development

    Authors: Daoyuan Chen, Haibin Wang, Yilun Huang, Ce Ge, Yaliang Li, Bolin Ding, Jingren Zhou

    Abstract: The emergence of large-scale multi-modal generative models has drastically advanced artificial intelligence, introducing unprecedented levels of performance and functionality. However, optimizing these models remains challenging due to historically isolated paths of model-centric and data-centric developments, leading to suboptimal outcomes and inefficient resource utilization. In response, we pre…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 26 pages, 9 figures, 5 tables

  18. arXiv:2407.11717  [pdf, other]

    cs.CV

    Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models

    Authors: Chen Ju, Haicheng Wang, Haozhe Cheng, Xu Chen, Zhonghua Zhai, Weilin Huang, Jinsong Lan, Shuai Xiao, Bo Zheng

    Abstract: Vision-Language Large Models (VLMs) recently become primary backbone of AI, due to the impressive performance. However, their expensive computation costs, i.e., throughput and delay, impede potentials in the real-world scenarios. To achieve acceleration for VLMs, most existing methods focus on the model perspective: pruning, distillation, quantization, but completely overlook the data-perspective…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. The first two authors share the same contribution. arXiv admin note: substantial text overlap with arXiv:2312.07408

  19. arXiv:2407.11677  [pdf, other]

    cs.CV

    Video-Language Alignment Pre-training via Spatio-Temporal Graph Transformer

    Authors: Shi-Xue Zhang, Hongfa Wang, Xiaobin Zhu, Weibo Gu, Tianjin Zhang, Chun Yang, Wei Liu, Xu-Cheng Yin

    Abstract: Video-language alignment is a crucial multi-modal task that benefits various downstream applications, e.g., video-text retrieval and video question answering. Existing methods either utilize multi-modal information in video-text pairs or apply global and local alignment techniques to promote alignment precision. However, these methods often fail to fully explore the spatio-temporal relationships a…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: under review

  20. arXiv:2407.11551  [pdf]

    cs.RO

    Human-Machine Shared Control Approach for the Takeover of Cooperative Adaptive Cruise Control

    Authors: Haoran Wang, Zhenning Li, Arno Eichberger, Jia Hu

    Abstract: Cooperative Adaptive Cruise Control (CACC) often requires human takeover for tasks such as exiting a freeway. Direct human takeover can pose significant risks, especially given the close-following strategy employed by CACC, which might cause drivers to feel unsafe and execute hard braking, potentially leading to collisions. This research aims to develop a CACC takeover controller that ensures a sm…

    Submitted 16 July, 2024; originally announced July 2024.

  21. arXiv:2407.11501  [pdf, other]

    cs.LG cs.AI

    Diff-MTS: Temporal-Augmented Conditional Diffusion-based AIGC for Industrial Time Series Towards the Large Model Era

    Authors: Lei Ren, Haiteng Wang, Yuanjun Laili

    Abstract: Industrial Multivariate Time Series (MTS) is a critical view of the industrial field for people to understand the state of machines. However, due to data collection difficulty and privacy concerns, available data for building industrial intelligence and industrial large models is far from sufficient. Therefore, industrial time series data generation is of great importance. Existing research usuall…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 11 pages, 4 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  22. arXiv:2407.11480  [pdf, other]

    cs.LG cs.AI

    AIGC for Industrial Time Series: From Deep Generative Models to Large Generative Models

    Authors: Lei Ren, Haiteng Wang, Yang Tang, Chunhua Yang

    Abstract: With the remarkable success of generative models like ChatGPT, Artificial Intelligence Generated Content (AIGC) is undergoing explosive development. Not limited to text and images, generative models can generate industrial time series data, addressing challenges such as the difficulty of data collection and data annotation. Due to their outstanding generation ability, they have been widely used in…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 17 pages, 4 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  23. arXiv:2407.11325  [pdf, other]

    cs.CV

    VISA: Reasoning Video Object Segmentation via Large Language Models

    Authors: Cilin Yan, Haochen Wang, Shilin Yan, Xiaolong Jiang, Yao Hu, Guoliang Kang, Weidi Xie, Efstratios Gavves

    Abstract: Existing Video Object Segmentation (VOS) relies on explicit user instructions, such as categories, masks, or short phrases, restricting their ability to perform complex video segmentation requiring reasoning with world knowledge. In this paper, we introduce a new task, Reasoning Video Object Segmentation (ReasonVOS). This task aims to generate a sequence of segmentation masks in response to implic…

    Submitted 15 July, 2024; originally announced July 2024.

  24. arXiv:2407.11084  [pdf, other]

    eess.IV cs.CV

    A Survey of Distance-Based Vessel Trajectory Clustering: Data Pre-processing, Methodologies, Applications, and Experimental Evaluation

    Authors: Maohan Liang, Ryan Wen Liu, Ruobin Gao, Zhe Xiao, Xiaocai Zhang, Hua Wang

    Abstract: Vessel trajectory clustering, a crucial component of the maritime intelligent transportation systems, provides valuable insights for applications such as anomaly detection and trajectory prediction. This paper presents a comprehensive survey of the most prevalent distance-based vessel trajectory clustering methods, which encompass two main steps: trajectory similarity measurement and clustering. I…

    Submitted 13 July, 2024; originally announced July 2024.

  25. arXiv:2407.10990  [pdf]

    cs.CL cs.AI

    MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models

    Authors: Mianxin Liu, Jinru Ding, Jie Xu, Weiguo Hu, Xiaoyang Li, Lifeng Zhu, Zhian Bai, Xiaoming Shi, Benyou Wang, Haitao Song, Pengfei Liu, Xiaofan Zhang, Shanshan Wang, Kang Li, Haofen Wang, Tong Ruan, Xuanjing Huang, Xin Sun, Shaoting Zhang

    Abstract: Ensuring the general efficacy and goodness for human beings from medical large language models (LLM) before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLM, especially in the Chinese context, remains to be established. In this work, we introduce "MedBench", a comprehensive, standardized, and reliable benchmarking system for Chinese med…

    Submitted 23 June, 2024; originally announced July 2024.

    Comments: 25 pages, 4 figures

  26. arXiv:2407.10969  [pdf, other]

    cs.CL cs.LG

    Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

    Authors: Hongyu Wang, Shuming Ma, Ruiping Wang, Furu Wei

    Abstract: We introduce, Q-Sparse, a simple yet effective approach to training sparsely-activated large language models (LLMs). Q-Sparse enables full sparsity of activations in LLMs which can bring significant efficiency gains in inference. This is achieved by applying top-K sparsification to the activations and the straight-through-estimator to the training. The key results from this work are, (1) Q-Sparse…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Work in progress

  27. arXiv:2407.10943  [pdf, other]

    cs.RO cs.CV

    GRUtopia: Dream General Robots in a City at Scale

    Authors: Hanqing Wang, Jiahe Chen, Wensi Huang, Qingwei Ben, Tai Wang, Boyu Mi, Tao Huang, Siheng Zhao, Yilun Chen, Sizhe Yang, Peizhou Cao, Wenye Yu, Zichao Ye, Jialun Li, Junfeng Long, Zirui Wang, Huiling Wang, Ying Zhao, Zhongying Tu, Yu Qiao, Dahua Lin, Jiangmiao Pang

    Abstract: Recent works have been exploring the scaling laws in the field of Embodied AI. Given the prohibitive costs of collecting real-world data, we believe the Simulation-to-Real (Sim2Real) paradigm is a crucial step for scaling the learning of embodied models. This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots. It features several advancements:…

    Submitted 15 July, 2024; originally announced July 2024.

  28. arXiv:2407.10756  [pdf, other]

    cs.CV

    GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation

    Authors: Haonan Wang, Jie Liu, Jie Tang, Gangshan Wu, Bo Xu, Yanbing Chou, Yong Wang

    Abstract: In recent years, 2D human pose estimation has made significant progress on public benchmarks. However, many of these approaches face challenges of less applicability in the industrial community due to the large number of parametric quantities and computational overhead. Efficient human pose estimation remains a hurdle, especially for whole-body pose estimation with numerous keypoints. While most c…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 accepted

  29. arXiv:2407.10707  [pdf, other]

    cs.CV

    Interactive Rendering of Relightable and Animatable Gaussian Avatars

    Authors: Youyi Zhan, Tianjia Shao, He Wang, Yin Yang, Kun Zhou

    Abstract: Creating relightable and animatable avatars from multi-view or monocular videos is a challenging task for digital human creation and virtual reality applications. Previous methods rely on neural radiance fields or ray tracing, resulting in slow training and rendering processes. By utilizing Gaussian Splatting, we propose a simple and efficient method to decouple body materials and lighting from sp…

    Submitted 15 July, 2024; originally announced July 2024.

  30. arXiv:2407.10416  [pdf, other]

    cs.AR

    SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling

    Authors: Huizheng Wang, Jiahao Fang, Xinru Tang, Zhiheng Yue, Jinxi Li, Yubin Qin, Sihan Guan, Qize Yang, Yang Wang, Chao Li, Yang Hu, Shouyi Yin

    Abstract: Benefiting from the self-attention mechanism, Transformer models have attained impressive contextual comprehension capabilities for lengthy texts. The requirements of high-throughput inference arise as the large language models (LLMs) become increasingly prevalent, which calls for large-scale token parallel processing (LTPP). However, existing dynamic sparse accelerators struggle to effectively ha…

    Submitted 14 July, 2024; originally announced July 2024.

  31. arXiv:2407.10366  [pdf, other]

    cs.CV cs.AI cs.LG

    Accessing Vision Foundation Models at ImageNet-level Costs

    Authors: Yitian Zhang, Xu Ma, Yue Bai, Huan Wang, Yun Fu

    Abstract: Vision foundation models are renowned for their generalization ability due to massive training data. Nevertheless, they demand tremendous training resources, and the training data is often inaccessible, e.g., CLIP, DINOv2, posing great challenges to developing derivatives that could advance research in this field. In this work, we offer a very simple and general solution, named Proteus, to distill…

    Submitted 14 July, 2024; originally announced July 2024.

  32. arXiv:2407.10327  [pdf, other]

    cs.LG cs.AI cs.CV

    Learning Unlabeled Clients Divergence via Anchor Model Aggregation for Federated Semi-supervised Learning

    Authors: Marawan Elbatel, Hualiang Wang, Jixiang Chen, Hao Wang, Xiaomeng Li

    Abstract: Federated semi-supervised learning (FedSemi) refers to scenarios where there may be clients with fully labeled data, clients with partially labeled, and even fully unlabeled clients while preserving data privacy. However, challenges arise from client drift due to undefined heterogeneous class distributions and erroneous pseudo-labels. Existing FedSemi methods typically fail to aggregate models fro…

    Submitted 14 July, 2024; originally announced July 2024.

  33. arXiv:2407.10325  [pdf, other]

    eess.IV cs.CV

    Light Field Compression Based on Implicit Neural Representation

    Authors: Henan Wang, Hanxin Zhu, Zhibo Chen

    Abstract: Light field, as a new data representation format in multimedia, has the ability to capture both intensity and direction of light rays. However, the additional angular information also brings a large volume of data. Classical coding methods are not effective to describe the relationship between different views, leading to redundancy left. To address this problem, we propose a novel light field comp…

    Submitted 7 May, 2024; originally announced July 2024.

    Comments: PCS2022

  34. arXiv:2407.10205  [pdf, other]

    quant-ph cs.ET math.CO

    Parallel Ising Annealer via Gradient-based Hamiltonian Monte Carlo

    Authors: Hao Wang, Zixuan Liu, Zhixin Xie, Langyu Li, Zibo Miao, Wei Cui, Yu Pan

    Abstract: Ising annealer is a promising quantum-inspired computing architecture for combinatorial optimization problems. In this paper, we introduce an Ising annealer based on the Hamiltonian Monte Carlo, which updates the variables of all dimensions in parallel. The main innovation is the fusion of an approximate gradient-based approach into the Ising annealer which introduces significant acceleration and…

    Submitted 14 July, 2024; originally announced July 2024.

  35. arXiv:2407.10204  [pdf, other]

    cs.LG

    Improving Graph Out-of-distribution Generalization on Real-world Data

    Authors: Can Xu, Yao Cheng, Jianxiang Yu, Haosen Wang, Jingsong Lv, Xiang Li

    Abstract: Existing methods for graph out-of-distribution (OOD) generalization primarily rely on empirical studies on synthetic datasets. Such approaches tend to overemphasize the causal relationships between invariant sub-graphs and labels, thereby neglecting the non-negligible role of environment in real-world scenarios. In contrast to previous studies that impose rigid independence assumptions on environm…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 21 pages, 5 figures

  36. arXiv:2407.09958  [pdf, other]

    cs.CR cs.LG

    Partner in Crime: Boosting Targeted Poisoning Attacks against Federated Learning

    Authors: Shihua Sun, Shridatt Sugrim, Angelos Stavrou, Haining Wang

    Abstract: Federated Learning (FL) exposes vulnerabilities to targeted poisoning attacks that aim to cause misclassification specifically from the source class to the target class. However, using well-established defense frameworks, the poisoning impact of these attacks can be greatly mitigated. We introduce a generalized pre-training stage approach to Boost Targeted Poisoning Attacks against FL, called BoTP…

    Submitted 13 July, 2024; originally announced July 2024.

  37. arXiv:2407.09924  [pdf, other]

    cs.CV

    Region-aware Image-based Human Action Retrieval with Transformers

    Authors: Hongsong Wang, Jie Gui

    Abstract: Human action understanding is a fundamental and challenging task in computer vision. Although there exists tremendous research on this area, most works focus on action recognition, while action retrieval has received less attention. In this paper, we focus on the neglected but important task of image-based action retrieval which aims to find images that depict the same action as a query image. We…

    Submitted 13 July, 2024; originally announced July 2024.

  38. arXiv:2407.09698  [pdf, other]

    cs.LG

    RIO-CPD: A Riemannian Geometric Method for Correlation-aware Online Change Point Detection

    Authors: Chengyuan Deng, Zhengzhang Chen, Xujiang Zhao, Haoyu Wang, Junxiang Wang, Haifeng Chen, Jie Gao

    Abstract: The objective of change point detection is to identify abrupt changes at potentially multiple points within a data sequence. This task is particularly challenging in the online setting where various types of changes can occur, including shifts in both the marginal and joint distributions of the data. This paper tackles these challenges by sequentially tracking correlation matrices on the Riemannia…

    Submitted 12 July, 2024; originally announced July 2024.

  39. arXiv:2407.09016  [pdf, other]

    cs.RO

    OVExp: Open Vocabulary Exploration for Object-Oriented Navigation

    Authors: Meng Wei, Tai Wang, Yilun Chen, Hanqing Wang, Jiangmiao Pang, Xihui Liu

    Abstract: Object-oriented embodied navigation aims to locate specific objects, defined by category or depicted in images. Existing methods often struggle to generalize to open vocabulary goals without extensive training data. While recent advances in Vision-Language Models (VLMs) offer a promising solution by extending object recognition beyond predefined categories, efficient goal-oriented exploration beco…

    Submitted 12 July, 2024; originally announced July 2024.

  40. arXiv:2407.08937  [pdf, other]

    cs.CL cs.AI

    Self-Evolving GPT: A Lifelong Autonomous Experiential Learner

    Authors: Jinglong Gao, Xiao Ding, Yiming Cui, Jianbai Zhao, Hepeng Wang, Ting Liu, Bing Qin

    Abstract: To improve the performance of large language models (LLMs), researchers have explored providing LLMs with textual task-solving experience via prompts. However, they rely on manual efforts to acquire and apply such experience for each task, which is not feasible for the growing demand for LLMs and the variety of user questions. To address this issue, we design a lifelong autonomous experiential lea…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024 MAIN

  41. Exploring Knowledge Transfer in Evolutionary Many-task Optimization: A Complex Network Perspective

    Authors: Yudong Yang, Kai Wu, Xiangyi Teng, Handing Wang, He Yu, Jing Liu

    Abstract: The field of evolutionary many-task optimization (EMaTO) is increasingly recognized for its ability to streamline the resolution of optimization challenges with repetitive characteristics, thereby conserving computational resources. This paper tackles the challenge of crafting efficient knowledge transfer mechanisms within EMaTO, a task complicated by the computational demands of individual task e…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 9 pages, accepted by GECCO 2024 poster

  42. PAIL: Performance based Adversarial Imitation Learning Engine for Carbon Neutral Optimization

    Authors: Yuyang Ye, Lu-An Tang, Haoyu Wang, Runlong Yu, Wenchao Yu, Erhu He, Haifeng Chen, Hui Xiong

    Abstract: Achieving carbon neutrality within industrial operations has become increasingly imperative for sustainable development. It is both a significant challenge and a key opportunity for operational optimization in industry 4.0. In recent years, Deep Reinforcement Learning (DRL) based methods offer promising enhancements for sequential optimization processes and can be used for reducing carbon emission…

    Submitted 11 July, 2024; originally announced July 2024.

  43. arXiv:2407.08770  [pdf, other]

    cs.AI

    Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

    Authors: Huanqian Wang, Yang Yue, Rui Lu, Jingxin Shi, Andrew Zhao, Shenzhi Wang, Shiji Song, Gao Huang

    Abstract: Large Language Models (LLMs) have demonstrated great potential as generalist assistants, showcasing powerful task understanding and problem-solving capabilities. To deploy LLMs as AI assistants, it is crucial that these models exhibit desirable behavioral traits, such as non-toxicity and resilience against jailbreak attempts. Current methods for detoxification or preventing jailbreaking usually in…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 23 pages, 14 figures

    MSC Class: 68T50 (Primary) 68T07; 62M45 (Secondary)

    ACM Class: I.2.7

  44. arXiv:2407.08572  [pdf, other]

    cs.CV

    Boosting Adversarial Transferability for Skeleton-based Action Recognition via Exploring the Model Posterior Space

    Authors: Yunfeng Diao, Baiqi Wu, Ruixuan Zhang, Xun Yang, Meng Wang, He Wang

    Abstract: Skeletal motion plays a pivotal role in human activity recognition (HAR). Recently, attack methods have been proposed to identify the universal vulnerability of skeleton-based HAR (S-HAR). However, the research of adversarial transferability on S-HAR is largely missing. More importantly, existing attacks all struggle in transfer across unknown S-HAR models. We observed that the key reason is that t…

    Submitted 11 July, 2024; originally announced July 2024.

  45. arXiv:2407.08546  [pdf, other]

    cs.CV cs.LG q-bio.QM

    Quantitative Evaluation of the Saliency Map for Alzheimer's Disease Classifier with Anatomical Segmentation

    Authors: Yihan Zhang, Xuanshuo Zhang, Wei Wu, Haohan Wang

    Abstract: Saliency maps have been widely used to interpret deep learning classifiers for Alzheimer's disease (AD). However, since AD is heterogeneous and has multiple subtypes, the pathological mechanism of AD remains not fully understood and may vary from patient to patient. Due to the lack of such understanding, it is difficult to comprehensively and effectively assess the saliency map of AD classifier. I…

    Submitted 11 July, 2024; originally announced July 2024.

  46. arXiv:2407.08532  [pdf, other]

    cs.CR cs.SE

    Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models

    Authors: Ying Zhang, Xiaoyan Zhou, Hui Wen, Wenjia Niu, Jiqiang Liu, Haining Wang, Qiang Li

    Abstract: Nowadays, the open-source software (OSS) ecosystem suffers from security threats of software supply chain (SSC) attacks. Interpreted OSS malware plays a vital role in SSC attacks, as criminals have an arsenal of attack vectors to deceive users into installing malware and executing malicious activities. In this paper, we introduce tactics, techniques, and procedures (TTPs) proposed by MITRE ATT&CK…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 19 pages, 11 figures

  47. arXiv:2407.08507  [pdf, other]

    cs.CV

    Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement

    Authors: Zijie Yue, Miaojing Shi, Hanli Wang, Shuai Ding, Qijun Chen, Shanlin Yang

    Abstract: Facial video-based remote physiological measurement is a promising research area for detecting human vital signs (e.g., heart rate, respiration frequency) in a non-contact way. Conventional approaches are mostly supervised learning, requiring extensive collections of facial videos and synchronously recorded photoplethysmography (PPG) signals. To tackle it, self-supervised learning has recently gai…

    Submitted 11 July, 2024; originally announced July 2024.

  48. arXiv:2407.08422  [pdf, other]

    cs.CR cs.AI

    On the (In)Security of LLM App Stores

    Authors: Xinyi Hou, Yanjie Zhao, Haoyu Wang

    Abstract: LLM app stores have seen rapid growth, leading to the proliferation of numerous custom LLM apps. However, this expansion raises security concerns. In this study, we propose a three-layer concern framework to identify the potential security risks of LLM apps, i.e., LLM apps with abusive potential, LLM apps with malicious intent, and LLM apps with exploitable vulnerabilities. Over five months, we co…

    Submitted 11 July, 2024; originally announced July 2024.

  49. arXiv:2407.07844  [pdf, other]

    cs.CV

    OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

    Authors: Hao Wang, Pengzhen Ren, Zequn Jie, Xiao Dong, Chengjian Feng, Yinlong Qian, Lin Ma, Dongmei Jiang, Yaowei Wang, Xiangyuan Lan, Xiaodan Liang

    Abstract: Open-vocabulary detection is a challenging task due to the requirement of detecting objects based on class names, including those not encountered during training. Existing methods have shown strong zero-shot detection capabilities through pre-training on diverse large-scale datasets. However, these approaches still face two primary challenges: (i) how to universally integrate diverse data sources…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Technical Report

  50. arXiv:2407.07763  [pdf, other]

    cs.CV

    S&D Messenger: Exchanging Semantic and Domain Knowledge for Generic Semi-Supervised Medical Image Segmentation

    Authors: Qixiang Zhang, Haonan Wang, Xiaomeng Li

    Abstract: Semi-supervised medical image segmentation (SSMIS) has emerged as a promising solution to tackle the challenges of time-consuming manual labeling in the medical field. However, in practical scenarios, there are often domain variations within the datasets, leading to derivative scenarios like semi-supervised medical domain generalization (Semi-MDG) and unsupervised medical domain adaptation (UMDA).…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 10 pages, under review at IEEE Transactions on Medical Imaging