Showing 1–50 of 6,036 results for author: Wang, X

  1. arXiv:2407.13301 [pdf, other]

    cs.CL cs.AI cs.LG

    CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis

    Authors: Junying Chen, Chi Gui, Anningzhe Gao, Ke Ji, Xidong Wang, Xiang Wan, Benyou Wang

    Abstract: The field of medical diagnosis has undergone a significant transformation with the advent of large language models (LLMs), yet the challenges of interpretability within these models remain largely unaddressed. This study introduces Chain-of-Diagnosis (CoD) to enhance the interpretability of LLM-based medical diagnostics. CoD transforms the diagnostic process into a diagnostic chain that mirrors a…

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.13252 [pdf, other]

    cs.CV

    Unveiling Structural Memorization: Structural Membership Inference Attack for Text-to-Image Diffusion Models

    Authors: Qiao Li, Xiaomeng Fu, Xi Wang, Jin Liu, Xingyu Gao, Jiao Dai, Jizhong Han

    Abstract: With the rapid advancements of large-scale text-to-image diffusion models, various practical applications have emerged, bringing significant convenience to society. However, model developers may misuse the unauthorized data to train diffusion models. These data are at risk of being memorized by the models, thus potentially violating citizens' privacy rights. Therefore, in order to judge whether a…

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.13219 [pdf, other]

    cs.CV

    Multi-sentence Video Grounding for Long Video Generation

    Authors: Wei Feng, Xin Wang, Hong Chen, Zeyang Zhang, Wenwu Zhu

    Abstract: Video generation has witnessed great success recently, but its application in generating long videos still remains challenging due to the difficulty in maintaining the temporal consistency of generated videos and the high memory cost during generation. To tackle the problems, in this paper, we propose a brave and new idea of Multi-sentence Video Grounding for Long Video Generation, connecting th…

    Submitted 18 July, 2024; originally announced July 2024.

  4. arXiv:2407.13158 [pdf, other]

    cs.LG cs.DB

    HHGT: Hierarchical Heterogeneous Graph Transformer for Heterogeneous Graph Representation Learning

    Authors: Qiuyu Zhu, Liang Zhang, Qianxiong Xu, Kaijun Liu, Cheng Long, Xiaoyang Wang

    Abstract: Despite the success of Heterogeneous Graph Neural Networks (HGNNs) in modeling real-world Heterogeneous Information Networks (HINs), challenges such as expressiveness limitations and over-smoothing have prompted researchers to explore Graph Transformers (GTs) for enhanced HIN representation learning. However, research on GT in HINs remains limited, with two key shortcomings in existing work: (1) A…

    Submitted 18 July, 2024; originally announced July 2024.

  5. arXiv:2407.13122 [pdf, other]

    cs.LG cs.AI

    MO-EMT-NAS: Multi-Objective Continuous Transfer of Architectural Knowledge Between Tasks from Different Datasets

    Authors: Peng Liao, XiLu Wang, Yaochu Jin, WenLi Du

    Abstract: Deploying models across diverse devices demands tradeoffs among multiple objectives due to different resource constraints. Arguably, due to the small model trap problem in multi-objective neural architecture search (MO-NAS) based on a supernet, existing approaches may fail to maintain large models. Moreover, multi-tasking neural architecture search (MT-NAS) excels in handling multiple tasks simult…

    Submitted 17 July, 2024; originally announced July 2024.

  6. arXiv:2407.13088 [pdf, other]

    cs.DC

    Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing

    Authors: Yizhou Luo, Qiang Wang, Shaohuai Shi, Jiaxin Lai, Shuhan Qi, Jiajia Zhang, Xuan Wang

    Abstract: Deep learning (DL) has demonstrated significant success across diverse fields, leading to the construction of dedicated GPU accelerators within GPU clusters for high-quality training services. Efficient scheduler designs for such clusters are vital to reduce operational costs and enhance resource utilization. While recent schedulers have shown impressive performance in optimizing DL job performanc…

    Submitted 17 July, 2024; originally announced July 2024.

  7. arXiv:2407.12871 [pdf, other]

    cs.CL cs.AI cs.LG

    MetaTool: Facilitating Large Language Models to Master Tools with Meta-task Augmentation

    Authors: Xiaohan Wang, Dian Li, Yilin Zhao, Sinbadliu, Hui Wang

    Abstract: Utilizing complex tools with Large Language Models (LLMs) is a critical component for grounding AI agents in various real-world scenarios. The core challenge of manipulating tools lies in understanding their usage and functionality. The prevailing approach involves few-shot prompting with demonstrations or fine-tuning on expert trajectories. However, for complex tools and tasks, mere in-context de…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures

  8. arXiv:2407.12851 [pdf]

    cs.CL

    ISPO: An Integrated Ontology of Symptom Phenotypes for Semantic Integration of Traditional Chinese Medical Data

    Authors: Zixin Shu, Rui Hua, Dengying Yan, Chenxia Lu, Ning Xu, Jun Li, Hui Zhu, Jia Zhang, Dan Zhao, Chenyang Hui, Junqiu Ye, Chu Liao, Qi Hao, Wen Ye, Cheng Luo, Xinyan Wang, Chuang Cheng, Xiaodong Li, Baoyan Liu, Xiaji Zhou, Runshun Zhang, Min Xu, Xuezhong Zhou

    Abstract: Symptom phenotypes are one of the key types of manifestations for diagnosis and treatment of various disease conditions. However, the diversity of symptom terminologies is one of the major obstacles hindering the analysis and knowledge sharing of various types of symptom-related medical data particularly in the fields of Traditional Chinese Medicine (TCM). Objective: This study aimed to construct…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 39 pages, 6 figures, 6 tables

  9. arXiv:2407.12667 [pdf, other]

    cs.CV

    SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization

    Authors: Yiyang Chen, Siyan Dong, Xulong Wang, Lulu Cai, Youyi Zheng, Yanchao Yang

    Abstract: 3D surface reconstruction from images is essential for numerous applications. Recently, Neural Radiance Fields (NeRFs) have emerged as a promising framework for 3D modeling. However, NeRFs require accurate camera poses as input, and existing methods struggle to handle significantly noisy pose estimates (i.e., outliers), which are commonly encountered in real-world scenarios. To tackle this challen…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  10. arXiv:2407.12661 [pdf, other]

    cs.CV

    InfoNorm: Mutual Information Shaping of Normals for Sparse-View Reconstruction

    Authors: Xulong Wang, Siyan Dong, Youyi Zheng, Yanchao Yang

    Abstract: 3D surface reconstruction from multi-view images is essential for scene understanding and interaction. However, complex indoor scenes pose challenges such as ambiguity due to limited observations. Recent implicit surface representations, such as Neural Radiance Fields (NeRFs) and signed distance functions (SDFs), employ various geometric priors to resolve the lack of observed information. Neverthe…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  11. arXiv:2407.12519 [pdf, other]

    cs.CV

    Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition

    Authors: Haijun Xiong, Bin Feng, Xinggang Wang, Wenyu Liu

    Abstract: Gait recognition is a biometric technology that distinguishes individuals by their walking patterns. However, previous methods face challenges when accurately extracting identity features because they often become entangled with non-identity clues. To address this challenge, we propose CLTD, a causality-inspired discriminative feature learning module designed to effectively eliminate the influence…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  12. arXiv:2407.12487 [pdf, other]

    cs.HC

    Application of Prompt Learning Models in Identifying the Collaborative Problem Solving Skills in an Online Task

    Authors: Mengxiao Zhu, Xin Wang, Xiantao Wang, Zihang Chen, Wei Huang

    Abstract: Collaborative problem solving (CPS) competence is considered one of the essential 21st-century skills. To facilitate the assessment and learning of CPS competence, researchers have proposed a series of frameworks to conceptualize CPS and explored ways to make sense of the complex processes involved in collaborative problem solving. However, encoding explicit behaviors into subskills within the fra…

    Submitted 17 July, 2024; originally announced July 2024.

  13. arXiv:2407.12442 [pdf, other]

    cs.CV

    ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference

    Authors: Mengcheng Lan, Chaofeng Chen, Yiping Ke, Xinjiang Wang, Litong Feng, Wayne Zhang

    Abstract: Despite the success of large-scale pretrained Vision-Language Models (VLMs), especially CLIP, in various open-vocabulary tasks, their application to semantic segmentation remains challenging, producing noisy segmentation maps with mis-segmented regions. In this paper, we carefully re-investigate the architecture of CLIP, and identify residual connections as the primary source of noise that degrades…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. Code available at https://github.com/mc-lan/ClearCLIP

  14. arXiv:2407.12380 [pdf, other]

    eess.AS cs.SD

    PCQ: Emotion Recognition in Speech via Progressive Channel Querying

    Authors: Xincheng Wang, Liejun Wang, Yinfeng Yu, Xinxin Jiao

    Abstract: In human-computer interaction (HCI), Speech Emotion Recognition (SER) is a key technology for understanding human intentions and emotions. Traditional SER methods struggle to effectively capture the long-term temporal correlations and dynamic variations in complex emotional expressions. To overcome these limitations, we introduce the PCQ method, a pioneering approach for SER via Progress…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted for publication by International Conference on Intelligent Computing 2024. For data and code, see https://github.com/ICIG/PCQ-Net

  15. arXiv:2407.12366 [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

    Authors: Gengze Zhou, Yicong Hong, Zun Wang, Xin Eric Wang, Qi Wu

    Abstract: Capitalizing on the remarkable advancements in Large Language Models (LLMs), there is a burgeoning initiative to harness LLMs for instruction following robotic navigation. Such a trend underscores the potential of LLMs to generalize navigational reasoning and diverse language understanding. However, a significant discrepancy in agent performance is observed when integrating LLMs in the Vision-and-…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  16. arXiv:2407.12292 [pdf, other]

    cs.CV cs.AI

    Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection

    Authors: Youheng Sun, Shengming Yuan, Xuanhan Wang, Lianli Gao, Jingkuan Song

    Abstract: Targeted adversarial attack, which aims to mislead a model to recognize any image as a target object by imperceptible perturbations, has become a mainstream tool for vulnerability assessment of deep neural networks (DNNs). Since existing targeted attackers only learn to attack known target classes, they cannot generalize well to unknown classes. To tackle this issue, we propose Generalized…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  17. arXiv:2407.12229 [pdf, other]

    eess.AS cs.AI eess.SP

    Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech

    Authors: Haibin Wu, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Daniel Tompkins, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Sheng Zhao, Jinyu Li, Naoyuki Kanda

    Abstract: People change their tones of voice, often accompanied by nonverbal vocalizations (NVs) such as laughter and cries, to convey rich emotions. However, most text-to-speech (TTS) systems lack the capability to generate speech with rich emotions, including NVs. This paper introduces EmoCtrl-TTS, an emotion-controllable zero-shot TTS that can generate highly emotional speech with NVs for any speaker. Em…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: See https://aka.ms/emoctrl-tts for demo samples

  18. arXiv:2407.11902 [pdf, other]

    cs.CV

    Encapsulating Knowledge in One Prompt

    Authors: Qi Li, Runpeng Yu, Xinchao Wang

    Abstract: This paradigm encapsulates knowledge from various models into a solitary prompt without altering the original models or requiring access to the training data, which enables us to achieve efficient and convenient knowledge transfer in more realistic scenarios. From a practicality standpoint, this paradigm not only for the first time proves the effectiveness of Visual Prompt in data inaccessible con…

    Submitted 16 July, 2024; originally announced July 2024.

  19. arXiv:2407.11496 [pdf, other]

    eess.IV cs.CV cs.MM

    ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment

    Authors: Xinyi Wang, Angeliki Katsenou, David Bull

    Abstract: With the rapid growth of User-Generated Content (UGC) exchanged between users and sharing platforms, the need for video quality assessment in the wild has emerged. UGC is mostly acquired using consumer devices and undergoes multiple rounds of compression or transcoding before reaching the end user. Therefore, traditional quality metrics that require the original content as a reference cannot be us…

    Submitted 16 July, 2024; originally announced July 2024.

  20. arXiv:2407.11466 [pdf, other]

    cs.CY

    Navigating the Data Trading Crossroads: An Interdisciplinary Survey

    Authors: Yi Yu, Jingru Yu, Xuhong Wang, Juanjuan Li, Yilun Lin, Conghui He, Yanqing Yang, Yu Qiao, Li Li, Fei-Yue Wang

    Abstract: Data has been increasingly recognized as a critical factor in the future economy. However, constructing an efficient data trading market faces challenges such as privacy breaches, data monopolies, and misuse. Despite numerous studies proposing algorithms to protect privacy and methods for pricing data, a comprehensive understanding of these issues and systemic solutions remain elusive. This paper…

    Submitted 16 July, 2024; originally announced July 2024.

  21. arXiv:2407.11389 [pdf, ps, other]

    cs.NI eess.SP

    Spatial-spectral Cell-free Networks: A Large-scale Case Study

    Authors: Zesheng Zhu, Lifeng Wang, Xin Wang, Dongming Wang, Kai-Kit Wong

    Abstract: This paper studies the large-scale cell-free networks where dense distributed access points (APs) serve many users. As a promising next-generation network architecture, cell-free networks enable ultra-reliable connections and minimal fading/blockage, which are highly favorable to the millimeter wave and Terahertz transmissions. However, conventional beam management with large phased arrays in a cell…

    Submitted 16 July, 2024; originally announced July 2024.

  22. arXiv:2407.11321 [pdf, other]

    cs.CV

    TCFormer: Visual Recognition via Token Clustering Transformer

    Authors: Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

    Abstract: Transformers are widely used in computer vision areas and have achieved remarkable success. Most state-of-the-art approaches split images into regular grids and represent each grid region with a vision token. However, fixed token distribution disregards the semantic meaning of different image regions, resulting in sub-optimal performance. To address this issue, we propose the Token Clustering Tran…

    Submitted 15 July, 2024; originally announced July 2024.

  23. arXiv:2407.11100 [pdf, other]

    cs.CR cs.AI cs.CL

    Building Intelligence Identification System via Large Language Model Watermarking: A Survey and Beyond

    Authors: Xuhong Wang, Haoyu Jiang, Yi Yu, Jingru Yu, Yilun Lin, Ping Yi, Yingchun Wang, Qiao Yu, Li Li, Fei-Yue Wang

    Abstract: Large Language Models (LLMs) are increasingly integrated into diverse industries, posing substantial security risks due to unauthorized replication and misuse. To mitigate these concerns, robust identification mechanisms are widely acknowledged as an effective strategy. Identification systems for LLMs now rely heavily on watermarking technology to manage and protect intellectual property and ensur…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 59 pages, 7 figures

  24. arXiv:2407.11054 [pdf]

    cs.LG cs.AI

    Generative AI for Health Technology Assessment: Opportunities, Challenges, and Policy Considerations

    Authors: Rachael Fleurence, Jiang Bian, Xiaoyan Wang, Hua Xu, Dalia Dawoud, Tala Fakhouri, Mitch Higashi, Jagpreet Chhatwal

    Abstract: This review introduces the transformative potential of generative Artificial Intelligence (AI) and foundation models, including large language models (LLMs), for health technology assessment (HTA). We explore their applications in four critical areas, evidence synthesis, evidence generation, clinical trials and economic modeling: (1) Evidence synthesis: Generative AI has the potential to assist in…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 24 pages, 1 figure, 1 table, 2 boxes, 103 references

  25. arXiv:2407.10923 [pdf, other]

    cs.CV

    OPa-Ma: Text Guided Mamba for 360-degree Image Out-painting

    Authors: Penglei Gao, Kai Yao, Tiandi Ye, Steven Wang, Yuan Yao, Xiaofeng Wang

    Abstract: In this paper, we tackle the recently popular topic of generating 360-degree images given the conventional narrow field of view (NFoV) images that could be taken from a single camera or cellphone. This task aims to predict the reasonable and consistent surroundings from the NFoV images. Existing methods for feature extraction and fusion, often built with transformer-based architectures, incur subs…

    Submitted 15 July, 2024; originally announced July 2024.

  26. arXiv:2407.10890 [pdf, other]

    cs.CE

    Thinking Fast and Slow: Data-Driven Adaptive DeFi Borrow-Lending Protocol

    Authors: Mahsa Bastankhah, Viraj Nadkarni, Xuechao Wang, Chi Jin, Sanjeev Kulkarni, Pramod Viswanath

    Abstract: Decentralized finance (DeFi) borrowing and lending platforms are crucial to the decentralized economy, involving two main participants: lenders who provide assets for interest and borrowers who offer collateral exceeding their debt and pay interest. Collateral volatility necessitates over-collateralization to protect lenders and ensure competitive returns. Traditional DeFi platforms use a fixed in…

    Submitted 15 July, 2024; originally announced July 2024.

  27. arXiv:2407.10833 [pdf, other]

    eess.IV cs.CV

    MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration

    Authors: Yulin Ren, Xin Li, Bingchen Li, Xingrui Wang, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen

    Abstract: We present MoE-DiffIR, an innovative universal compressed image restoration (CIR) method with task-customized diffusion priors. This intends to handle two pivotal challenges in the existing CIR methods: (i) lacking adaptability and universality for different image codecs, e.g., JPEG and WebP; (ii) poor texture generation capability, particularly at low bitrates. Specifically, our MoE-DiffIR develo…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  28. arXiv:2407.10814 [pdf, other]

    cs.CV

    Pathology-knowledge Enhanced Multi-instance Prompt Learning for Few-shot Whole Slide Image Classification

    Authors: Linhao Qu, Dingkang Yang, Dan Huang, Qinhao Guo, Rongkui Luo, Shaoting Zhang, Xiaosong Wang

    Abstract: Current multi-instance learning algorithms for pathology image analysis often require a substantial number of Whole Slide Images for effective training but exhibit suboptimal performance in scenarios with limited learning data. In clinical settings, restricted access to pathology slides is inevitable due to patient privacy concerns and the prevalence of rare or emerging diseases. The emergence of…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  29. arXiv:2407.10782 [pdf, other]

    cs.RO

    LVCP: LiDAR-Vision Tightly Coupled Collaborative Real-time Relative Positioning

    Authors: Zhuozhu Jian, Qixuan Li, Shengtao Zheng, Xueqian Wang, Xinlei Chen

    Abstract: In air-ground collaboration scenarios without GPS and prior maps, the relative positioning of drones and unmanned ground vehicles (UGVs) has always been a challenge. For a drone equipped with a monocular camera and a UGV equipped with LiDAR as an external sensor, we propose a robust and real-time relative pose estimation method (LVCP) based on the tight coupling of vision and LiDAR point cloud info…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: See more details in https://sites.google.com/view/lvcp

  30. arXiv:2407.10768 [pdf, other]

    cs.LG cs.AI

    MSegRNN: Enhanced SegRNN Model with Mamba for Long-Term Time Series Forecasting

    Authors: GaoXiang Zhao, XiaoQiang Wang

    Abstract: The field of long-term time series forecasting demands handling extensive look-back windows and long-range prediction steps, posing significant challenges for RNN-based methodologies. Among these, SegRNN, a robust RNN-driven model, has gained considerable attention in LTSF analysis for achieving state-of-the-art results while maintaining a remarkably streamlined architecture. Concurrently, the Mam…

    Submitted 15 July, 2024; originally announced July 2024.

  31. arXiv:2407.10636 [pdf, other]

    cs.CV

    Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction

    Authors: Lin Zhu, Yunlong Zheng, Yijun Zhang, Xiao Wang, Lizhi Wang, Hua Huang

    Abstract: Event-based video reconstruction has garnered increasing attention due to its advantages, such as high dynamic range and rapid motion capture capabilities. However, current methods often prioritize the extraction of temporal information from continuous event flow, leading to an overemphasis on low-frequency texture features in the scene, resulting in over-smoothing and blurry artifacts. Addressing…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  32. arXiv:2407.10468 [pdf, other]

    cs.SD cs.AI eess.AS

    LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis

    Authors: Zhenxiong Tan, Xinyin Ma, Gongfan Fang, Xinchao Wang

    Abstract: Latent diffusion models have shown promising results in audio generation, making notable advancements over traditional methods. However, their performance, while impressive with short audio clips, faces challenges when extended to longer audio sequences. These challenges are due to the model's self-attention mechanism and its training predominantly on 10-second clips, which complicates the extension to lo…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Interspeech 2024; Code: https://github.com/Yuanshi9815/LiteFocus

  33. arXiv:2407.10407 [pdf, ps, other]

    cs.NI

    Distributed Scheduling for Throughput Maximization under Deadline Constraint in Wireless Mesh Networks

    Authors: Xin Wang, Xudong Wang

    Abstract: This paper studies the distributed scheduling of traffic flows with arbitrary deadlines that arrive at their source nodes and are transmitted to different destination nodes via multiple intermediate nodes in a wireless mesh network. When a flow is successfully delivered to its destination, a reward will be obtained, which is the embodiment of network performance and can be expressed by metrics suc…

    Submitted 14 July, 2024; originally announced July 2024.

  34. arXiv:2407.10406 [pdf, other]

    cs.CV

    Towards Scale-Aware Full Surround Monodepth with Transformers

    Authors: Yuchen Yang, Xinyi Wang, Dong Li, Lu Tian, Ashish Sirasao, Xun Yang

    Abstract: Full surround monodepth (FSM) methods can learn from multiple camera views simultaneously in a self-supervised manner to predict the scale-aware depth, which is more practical for real-world applications in contrast to scale-ambiguous depth from a standalone monocular camera. In this work, we focus on enhancing the scale-awareness of FSM methods for depth estimation. To this end, we propose to imp…

    Submitted 14 July, 2024; originally announced July 2024.

  35. arXiv:2407.10374 [pdf, other]

    cs.CV cs.AI

    An Empirical Study of Mamba-based Pedestrian Attribute Recognition

    Authors: Xiao Wang, Weizhe Kong, Jiandong Jin, Shiao Wang, Ruichong Gao, Qingchuan Ma, Chenglong Li, Jin Tang

    Abstract: Current strong pedestrian attribute recognition models are developed based on Transformer networks, which are computationally heavy. Recently proposed models with linear complexity (e.g., Mamba) have garnered significant attention and have achieved a good balance between accuracy and computational cost across a variety of visual tasks. Relevant review articles also suggest that while these models…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: In Peer Review

  36. arXiv:2407.10223 [pdf, other]

    cs.LG cs.CR

    Practical Unlearning for Large Language Models

    Authors: Chongyang Gao, Lixu Wang, Chenkai Weng, Xiao Wang, Qi Zhu

    Abstract: While LLMs have demonstrated impressive performance across various domains and tasks, their security issues have become increasingly severe. Machine unlearning (MU) has emerged as a promising solution to address these issues by removing the influence of undesired data on the target model without compromising its utility in other aspects. MU typically assumes full access to the original training da…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 17 pages, 8 figures. The first two authors contribute equally and they are ordered alphabetically

  37. arXiv:2407.10055 [pdf]

    cs.LG q-bio.QM

    MKDTI: Predicting drug-target interactions via multiple kernel fusion on graph attention network

    Authors: Yuhuan Zhou, Yulin Wu, Weiwei Yuan, Xuan Wang, Junyi Li

    Abstract: Drug-target relationships may now be predicted computationally using bioinformatics data, which is a valuable tool for understanding pharmacological effects, enhancing drug development efficiency, and advancing related research. A number of structure-based, ligand-based and network-based approaches have now emerged. Furthermore, the integration of graph attention networks with intricate drug targe…

    Submitted 13 July, 2024; originally announced July 2024.

  38. arXiv:2407.09826 [pdf, other]

    cs.CV

    3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance

    Authors: Xiaoxu Xu, Yitian Yuan, Jinlong Li, Qiudan Zhang, Zequn Jie, Lin Ma, Hao Tang, Nicu Sebe, Xu Wang

    Abstract: In this paper, we propose 3DSS-VLG, a weakly supervised approach for 3D Semantic Segmentation with 2D Vision-Language Guidance, an alternative approach in which a 3D model predicts a dense embedding for each point that is co-embedded with both the aligned image and text spaces from the 2D vision-language model. Specifically, our method exploits the superior generalization ability of the 2D vision-langu…

    Submitted 13 July, 2024; originally announced July 2024.

  39. arXiv:2407.09618 [pdf, other]

    cs.LG cs.SI

    The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges

    Authors: Sitao Luan, Chenqing Hua, Qincheng Lu, Liheng Ma, Lirong Wu, Xinyu Wang, Minkai Xu, Xiao-Wen Chang, Doina Precup, Rex Ying, Stan Z. Li, Jian Tang, Guy Wolf, Stefanie Jegelka

    Abstract: Homophily principle, i.e., nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where GNN's performance com…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Suggestions and comments are welcomed at sitao.luan@mail.mcgill.ca!

  40. arXiv:2407.09469 [pdf, other]

    cs.RO

    Learning Coordinated Maneuver in Adversarial Environments

    Authors: Zechen Hu, Manshi Limbu, Daigo Shishika, Xuesu Xiao, Xuan Wang

    Abstract: This paper aims to solve the coordination of a team of robots traversing a route in the presence of adversaries with random positions. Our goal is to minimize the overall cost of the team, which is determined by (i) the accumulated risk when robots stay in adversary-impacted zones and (ii) the mission completion time. During traversal, robots can reduce their speed and act as a 'guard' (the slower…

    Submitted 12 July, 2024; originally announced July 2024.

  41. arXiv:2407.09442 [pdf, other]

    cs.DS cs.CG math.GN

    A Distance for Geometric Graphs via the Labeled Merge Tree Interleaving Distance

    Authors: Erin Wolf Chambers, Elizabeth Munch, Sarah Percival, Xinyi Wang

    Abstract: Geometric graphs appear in many real-world data sets, such as road networks, sensor networks, and molecules. We investigate the notion of distance between embedded graphs and present a metric to measure the distance between two geometric graphs via merge trees. In order to preserve as much useful information as possible from the original data, we introduce a way of rotating the sublevel set to obt…

    Submitted 12 July, 2024; originally announced July 2024.

  42. arXiv:2407.09083 [pdf, other]

    cs.NE

    BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation

    Authors: Zekai Xu, Kang You, Qinghai Guo, Xiang Wang, Zhezhi He

    Abstract: Spiking neural networks (SNNs), which mimic the biological neural system to convey information via discrete spikes, are well known as brain-inspired models with excellent computing efficiency. By utilizing the surrogate gradient estimation for discrete spikes, learning-based SNN training methods that can achieve ultra-low inference latency (number of time-steps) have emerged recently. Nevertheless, due to th…

    Submitted 14 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: accepted by European Conference on Computer Vision (ECCV) 2024

  43. arXiv:2407.09029 [pdf, other]

    cs.MM cs.CV cs.SD eess.AS

    Enhancing Emotion Recognition in Incomplete Data: A Novel Cross-Modal Alignment, Reconstruction, and Refinement Framework

    Authors: Haoqin Sun, Shiwan Zhao, Shaokai Li, Xiangyu Kong, Xuechen Wang, Aobo Kong, Jiaming Zhou, Yong Chen, Wenjia Zeng, Yong Qin

    Abstract: Multimodal emotion recognition systems rely heavily on the full availability of modalities, suffering significant performance declines when modal data is incomplete. To tackle this issue, we present the Cross-Modal Alignment, Reconstruction, and Refinement (CM-ARR) framework, an innovative approach that sequentially engages in cross-modal alignment, reconstruction, and refinement phases to handle…

    Submitted 12 July, 2024; originally announced July 2024.

  44. arXiv:2407.09018  [pdf, other]

    cs.SE

    AUITestAgent: Automatic Requirements Oriented GUI Function Testing

    Authors: Yongxiang Hu, Xuan Wang, Yingchuan Wang, Yu Zhang, Shiyu Guo, Chaoyi Chen, Xin Wang, Yangfan Zhou

    Abstract: The Graphical User Interface (GUI) is how users interact with mobile apps. To ensure it functions properly, testing engineers have to make sure it functions as intended, based on test requirements that are typically written in natural language. While widely adopted manual testing and script-based methods are effective, they demand substantial effort due to the vast number of GUI pages and rapid it…

    Submitted 12 July, 2024; originally announced July 2024.

  45. arXiv:2407.08931  [pdf, other]

    cs.CV

    Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection

    Authors: Xingyu Peng, Yan Bai, Chen Gao, Lirong Yang, Fei Xia, Beipeng Mu, Xiaofei Wang, Si Liu

    Abstract: Open-Vocabulary Detection (OVD) is the task of detecting all objects of interest in a given scene without predefined object classes. Extensive work has been done on OVD for 2D RGB images, but the exploration of 3D OVD is still limited. Intuitively, lidar point clouds provide 3D information, at both the object level and the scene level, for generating trustworthy detection results. However, previous l…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: accepted by ECCV 2024

  46. arXiv:2407.08924  [pdf, other]

    cs.CR

    Disassembling Obfuscated Executables with LLM

    Authors: Huanyao Rong, Yue Duan, Hang Zhang, XiaoFeng Wang, Hongbo Chen, Shengchen Duan, Shen Wang

    Abstract: Disassembly is a challenging task, particularly for obfuscated executables containing junk bytes, which are designed to induce disassembly errors. Existing solutions rely on heuristics or leverage machine learning techniques, but achieve only limited success. Fundamentally, such obfuscation cannot be defeated without in-depth understanding of the binary executable's semantics, which is made possi…

    Submitted 11 July, 2024; originally announced July 2024.

  47. arXiv:2407.08639  [pdf, other]

    cs.AI cs.LG

    $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$

    Authors: Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

    Abstract: Direct Preference Optimization (DPO) has emerged as a compelling approach for training Large Language Models (LLMs) to adhere to human preferences. However, the performance of DPO is sensitive to the fine-tuning of its trade-off parameter $\beta$, as well as to the quality of the preference data. We analyze the impact of $\beta$ and data quality on DPO, uncovering that optimal $\beta$ values vary with the inf…

    Submitted 11 July, 2024; originally announced July 2024.

  48. arXiv:2407.08303  [pdf, other]

    cs.CV cs.AI

    DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

    Authors: Xiaotong Li, Fan Zhang, Haiwen Diao, Yueze Wang, Xinlong Wang, Ling-Yu Duan

    Abstract: Existing Multimodal Large Language Models (MLLMs) increasingly emphasize complex understanding of various visual elements, including multiple objects, text information, and spatial relations. Their development for comprehensive visual perception hinges on the availability of high-quality image-text datasets that offer diverse visual elements and thorough image descriptions. However, the scarcity…

    Submitted 11 July, 2024; originally announced July 2024.

  49. arXiv:2407.08265  [pdf, other]

    cs.CV

    Enhancing Thermal Infrared Tracking with Natural Language Modeling and Coordinate Sequence Generation

    Authors: Miao Yan, Ping Zhang, Haofei Zhang, Ruqian Hao, Juanxiu Liu, Xiaoyang Wang, Lin Liu

    Abstract: Thermal infrared tracking is an essential topic in computer vision tasks because of its advantage of all-weather imaging. However, most conventional methods utilize only hand-crafted features, while deep learning-based correlation filtering methods are limited by simple correlation operations. Transformer-based methods ignore temporal and coordinate information, which is critical for TIR tracking…

    Submitted 18 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  50. arXiv:2407.08222  [pdf, other]

    cs.RO

    PINN-Ray: A Physics-Informed Neural Network to Model Soft Robotic Fin Ray Fingers

    Authors: Xing Wang, Joel Janek Dabrowski, Josh Pinskier, Lois Liow, Vinoth Viswanathan, Richard Scalzo, David Howard

    Abstract: Modelling complex deformation for soft robotics provides a guideline for understanding their behaviour, leading to safe interaction with the environment. However, building a surrogate model with high accuracy and fast inference speed can be challenging for soft robotics due to the nonlinearity from complex geometry, large deformation, material nonlinearity, etc. The reality gap from surrogate models al…

    Submitted 11 July, 2024; originally announced July 2024.