Skip to main content

Showing 1–50 of 1,419 results for author: Li, G

  1. arXiv:2407.12565  [pdf, other

    cs.AR

    SigDLA: A Deep Learning Accelerator Extension for Signal Processing

    Authors: Fangfa Fu, Wenyu Zhang, Zesong Jiang, Zhiyu Zhu, Guoyu Li, Bing Yang, Cheng Liu, Liyi Xiao, Jinxiang Wang, Huawei Li, Xiaowei Li

    Abstract: Deep learning and signal processing are closely correlated in many IoT scenarios such as anomaly detection to empower intelligence of things. Many IoT processors utilize digital signal processors (DSPs) for signal processing and build deep learning frameworks on this basis. While deep learning is usually much more computing-intensive than signal processing, the computing efficiency of deep learnin… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2407.12258  [pdf, other

    cs.CV

    Facial Affect Recognition based on Multi Architecture Encoder and Feature Fusion for the ABAW7 Challenge

    Authors: Kang Shen, Xuxiong Liu, Boyan Wang, Jun Yao, Xin Liu, Yujie Guan, Yu Wang, Gengchen Li, Xiao Sun

    Abstract: In this paper, we present our approach to addressing the challenges of the 7th ABAW competition. The competition comprises three sub-challenges: Valence Arousal (VA) estimation, Expression (Expr) classification, and Action Unit (AU) detection. To tackle these challenges, we employ state-of-the-art models to extract powerful visual features. Subsequently, a Transformer Encoder is utilized to integr… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  3. arXiv:2407.12038  [pdf, ps, other

    eess.AS cs.AI

    ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024

    Authors: Ruibo Fu, Rui Liu, Chunyu Qiang, Yingming Gao, Yi Lu, Tao Wang, Ya Li, Zhengqi Wen, Chen Zhang, Hui Bu, Yukun Liu, Shuchen Shi, Xin Qi, Guanjun Li

    Abstract: The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and controlled detail content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective percept… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: ISCSLP 2024 Challenge description

  4. arXiv:2407.11486  [pdf, other

    cs.CV

    An efficient framework based on large foundation model for cervical cytopathology whole slide image screening

    Authors: Jialong Huang, Gaojie Li, Shichao Kan, Jianfeng Liu, Yixiong Liang

    Abstract: Current cervical cytopathology whole slide image (WSI) screening primarily relies on detection-based approaches, which are limited in performance due to the expense and time-consuming annotation process. Multiple Instance Learning (MIL), a weakly supervised approach that relies solely on bag-level labels, can effectively alleviate these challenges. Nonetheless, MIL commonly employs frozen pretrain… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  5. arXiv:2407.11405  [pdf, other

    cs.CR cs.CV

    Cover-separable Fixed Neural Network Steganography via Deep Generative Models

    Authors: Guobiao Li, Sheng Li, Zhenxing Qian, Xinpeng Zhang

    Abstract: Image steganography is the process of hiding secret data in a cover image by subtle perturbation. Recent studies show that it is feasible to use a fixed neural network for data embedding and extraction. Such Fixed Neural Network Steganography (FNNS) demonstrates favorable performance without the need for training networks, making it more practical for real-world applications. However, the stego-im… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepetd at ACMMM 2024

  6. arXiv:2407.10957  [pdf, other

    cs.CV cs.AI

    Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes

    Authors: Yaoting Wang, Peiwen Sun, Dongzhan Zhou, Guangyao Li, Honggang Zhang, Di Hu

    Abstract: Traditional reference segmentation tasks have predominantly focused on silent visual scenes, neglecting the integral role of multimodal perception and interaction in human experiences. In this work, we introduce a novel task called Reference Audio-Visual Segmentation (Ref-AVS), which seeks to segment objects within the visual domain based on expressions containing multimodal cues. Such expressions… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  7. arXiv:2407.10625  [pdf, other

    cs.CV

    WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models

    Authors: Zijian He, Peixin Chen, Guangrun Wang, Guanbin Li, Philip H. S. Torr, Liang Lin

    Abstract: Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos. Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions, limiting their effectiveness in video try-on applications. Moreover, video-based models require extensive, high-quality data and… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  8. arXiv:2407.08959  [pdf, other

    cs.CL

    Domain-Hierarchy Adaptation via Chain of Iterative Reasoning for Few-shot Hierarchical Text Classification

    Authors: Ke Ji, Peng Wang, Wenjun Ke, Guozheng Li, Jiajun Liu, Jingsheng Gao, Ziyu Shang

    Abstract: Recently, various pre-trained language models (PLMs) have been proposed to prove their impressive performances on a wide range of few-shot tasks. However, limited by the unstructured prior knowledge in PLMs, it is difficult to maintain consistent performance on complex structured scenarios, such as hierarchical text classification (HTC), especially when the downstream data is extremely scarce. The… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 9 pages, 2 figures, Accepted by IJCAI2024

  9. arXiv:2407.08850  [pdf, other

    cs.HC cs.AI

    UICrit: Enhancing Automated Design Evaluation with a UICritique Dataset

    Authors: Peitong Duan, Chin-yi Chen, Gang Li, Bjoern Hartmann, Yang Li

    Abstract: Automated UI evaluation can be beneficial for the design process; for example, to compare different UI designs, or conduct automated heuristic evaluation. LLM-based UI evaluation, in particular, holds the promise of generalizability to a wide variety of UI types and evaluation tasks. However, current LLM-based techniques do not yet match the performance of human evaluators. We hypothesize that aut… ▽ More

    Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  10. arXiv:2407.08200  [pdf, other

    cs.CV

    Deep Understanding of Soccer Match Videos

    Authors: Shikun Xu, Yandong Zhu, Gen Li, Changhu Wang

    Abstract: Soccer is one of the most popular sport worldwide, with live broadcasts frequently available for major matches. However, extracting detailed, frame-by-frame information on player actions from these videos remains a challenge. Utilizing state-of-the-art computer vision technologies, our system can detect key objects such as soccer balls, players and referees. It also tracks the movements of players… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  11. arXiv:2407.08093  [pdf, other

    eess.IV cs.AI cs.CV eess.SP

    MemWarp: Discontinuity-Preserving Cardiac Registration with Memorized Anatomical Filters

    Authors: Hang Zhang, Xiang Chen, Renjiu Hu, Dongdong Liu, Gaolei Li, Rongguang Wang

    Abstract: Many existing learning-based deformable image registration methods impose constraints on deformation fields to ensure they are globally smooth and continuous. However, this assumption does not hold in cardiac image registration, where different anatomical regions exhibit asymmetric motions during respiration and movements due to sliding organs within the chest. Consequently, such global constraint… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 11 pages, 2 figure, 2 tables

  12. arXiv:2407.07020  [pdf, other

    cs.AI cs.RO

    Less is More: Efficient Brain-Inspired Learning for Autonomous Driving Trajectory Prediction

    Authors: Haicheng Liao, Yongkang Li, Zhenning Li, Chengyue Wang, Chunlin Tian, Yuming Huang, Zilin Bian, Kaiqun Zhu, Guofa Li, Ziyuan Pu, Jia Hu, Zhiyong Cui, Chengzhong Xu

    Abstract: Accurately and safely predicting the trajectories of surrounding vehicles is essential for fully realizing autonomous driving (AD). This paper presents the Human-Like Trajectory Prediction model (HLTP++), which emulates human cognitive processes to improve trajectory prediction in AD. HLTP++ incorporates a novel teacher-student knowledge distillation framework. The "teacher" model equipped with an… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.19251

  13. arXiv:2407.06886  [pdf, other

    cs.CV cs.AI cs.LG cs.MA cs.RO

    Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI

    Authors: Yang Liu, Weixing Chen, Yongjie Bai, Jingzhou Luo, Xinshuai Song, Kaixuan Jiang, Zhida Li, Ganlong Zhao, Junyi Lin, Guanbin Li, Wen Gao, Liang Lin

    Abstract: Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) have attracted significant attention due to their remarkable perception, interaction, and reasoning capabilit… ▽ More

    Submitted 18 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: The first comprehensive review of Embodied AI in the era of MLMs, 37 pages. We also provide the paper list for Embodied AI: https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List

  14. arXiv:2407.06310  [pdf, other

    cs.SD cs.AI cs.HC cs.LG eess.AS

    Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

    Authors: Mengzhe Geng, Xurong Xie, Jiajun Deng, Zengrui Jin, Guinan Li, Tianzi Wang, Shujie Hu, Zhaoqing Li, Helen Meng, Xunying Liu

    Abstract: The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by their mismatch against healthy and nonaged voices, data scarcity and large speaker-level variability. To this end, this paper proposes two novel data-efficient methods to learn homogeneous dysarthric and elderly speaker-level features for rapid, on-the-fly test-… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: In submission to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  15. arXiv:2407.05814  [pdf, other

    cs.CV cs.AI cs.MM

    Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition

    Authors: Yaozong Gan, Guang Li, Ren Togo, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

    Abstract: Recent multimodal large language models (MLLM) such as GPT-4o and GPT-4v have shown great potential in autonomous driving. In this paper, we propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition (TSR). We first construct a traffic sign detection network based on Vision Transformer Adapter and an extraction module to extract traffic sign… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  16. arXiv:2407.05705  [pdf, other

    cs.AI

    Fast and Continual Knowledge Graph Embedding via Incremental LoRA

    Authors: Jiajun Liu, Wenjun Ke, Peng Wang, Jiahao Wang, Jinhua Gao, Ziyu Shang, Guozheng Li, Zijie Xu, Ke Ji, Yining Li

    Abstract: Continual Knowledge Graph Embedding (CKGE) aims to efficiently learn new knowledge and simultaneously preserve old knowledge. Dominant approaches primarily focus on alleviating catastrophic forgetting of old knowledge but neglect efficient learning for the emergence of new knowledge. However, in real-world scenarios, knowledge graphs (KGs) are continuously growing, which brings a significant chall… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCAI2024

  17. arXiv:2407.05131  [pdf, other

    cs.LG cs.AI cs.CL cs.CV cs.CY

    RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

    Authors: Peng Xia, Kangyu Zhu, Haoran Li, Hongtu Zhu, Yun Li, Gang Li, Linjun Zhang, Huaxiu Yao

    Abstract: The recent emergence of Medical Large Vision Language Models (Med-LVLMs) has enhanced medical diagnosis. However, current Med-LVLMs frequently encounter factual issues, often generating responses that do not align with established medical facts. Retrieval-Augmented Generation (RAG), which utilizes external knowledge, can improve the factual accuracy of these models but introduces two major challen… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  18. arXiv:2407.04752  [pdf, other

    cs.LG cs.CL cs.NE

    SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

    Authors: Xingrun Xing, Boyan Gao, Zheng Zhang, David A. Clifton, Shitao Xiao, Li Du, Guoqi Li, Jiajun Zhang

    Abstract: The recent advancements in large language models (LLMs) with billions of parameters have significantly boosted their performance across various real-world applications. However, the inference processes for these models require substantial energy and computational resources, presenting considerable deployment challenges. In contrast, human brains, which contain approximately 86 billion biological n… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  19. arXiv:2407.02813  [pdf, other

    cs.CV cs.AI cs.LG

    Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design

    Authors: Gen Li, Zhihao Shu, Jie Ji, Minghai Qin, Fatemeh Afghah, Wei Niu, Xiaolong Ma

    Abstract: Deep neural networks (DNNs) are frequently employed in a variety of computer vision applications. Nowadays, an emerging trend in the current video distribution system is to take advantage of DNN's overfitting properties to perform video resolution upscaling. By splitting videos into chunks and applying a super-resolution (SR) model to overfit each chunk, this scheme of SR models plus video chunks… ▽ More

    Submitted 11 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  20. Coding-Enhanced Cooperative Jamming for Secret Communication in Fluid Antenna Systems

    Authors: Hao Xu, Kai-Kit Wong, Wee Kiat New, Guyue Li, Farshad Rostami Ghadi, Yongxu Zhu, Shi Jin, Chan-Byoung Chae, Yangyang Zhang

    Abstract: This letter investigates the secret communication problem for a fluid antenna system (FAS)-assisted wiretap channel, where the legitimate transmitter transmits an information-bearing signal to the legitimate receiver, and at the same time, transmits a jamming signal to interfere with the eavesdropper (Eve). Unlike the conventional jamming scheme, which usually transmits Gaussian noise that interfe… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 6 pages, 3 figures, this paper has been accepted by IEEE Communications Letters

  21. arXiv:2407.01917  [pdf, other

    cs.NI cs.CR cs.DC

    Securing Distributed Network Digital Twin Systems Against Model Poisoning Attacks

    Authors: Zifan Zhang, Minghong Fang, Mingzhe Chen, Gaolei Li, Xi Lin, Yuchen Liu

    Abstract: In the era of 5G and beyond, the increasing complexity of wireless networks necessitates innovative frameworks for efficient management and deployment. Digital twins (DTs), embodying real-time monitoring, predictive configurations, and enhanced decision-making capabilities, stand out as a promising solution in this context. Within a time-series data-driven framework that effectively maps wireless… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by Internet of Things Journal (IoT-J). arXiv admin note: substantial text overlap with arXiv:2404.14389

  22. arXiv:2407.01640  [pdf, other

    cs.LG

    BADM: Batch ADMM for Deep Learning

    Authors: Ouya Wang, Shenglong Zhou, Geoffrey Ye Li

    Abstract: Stochastic gradient descent-based algorithms are widely used for training deep neural networks but often suffer from slow convergence. To address the challenge, we leverage the framework of the alternating direction method of multipliers (ADMM) to develop a novel data-driven algorithm, called batch ADMM (BADM). The fundamental idea of the proposed algorithm is to split the training data into batch… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  23. arXiv:2407.01511  [pdf, other

    cs.AI

    CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

    Authors: Tianqi Xu, Linyao Chen, Dai-Jie Wu, Yanjun Chen, Zecheng Zhang, Xiang Yao, Zhiqiang Xie, Yongchao Chen, Shilong Liu, Bochen Qian, Philip Torr, Bernard Ghanem, Guohao Li

    Abstract: The development of autonomous agents increasingly relies on Multimodal Language Models (MLMs) to perform tasks described in natural language with GUI environments, such as websites, desktop computers, or mobile phones. Existing benchmarks for MLM agents in interactive environments are limited by their focus on a single environment, lack of detailed and generalized evaluation methods, and the compl… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  24. arXiv:2407.01003  [pdf, other

    cs.CV cs.AI

    Embedded Prompt Tuning: Towards Enhanced Calibration of Pretrained Models for Medical Images

    Authors: Wenqiang Zu, Shenghao Xie, Qing Zhao, Guoqi Li, Lei Ma

    Abstract: Foundation models pre-trained on large-scale data have been widely witnessed to achieve success in various natural imaging downstream tasks. Parameter-efficient fine-tuning (PEFT) methods aim to adapt foundation models to new domains by updating only a small portion of parameters in order to reduce computational overhead. However, the effectiveness of these PEFT methods, especially in cross-domain… ▽ More

    Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: 16 pages, 7 figures. arXiv admin note: text overlap with arXiv:2306.09579, arXiv:2203.12119 by other authors

  25. arXiv:2407.00896  [pdf, other

    eess.SP cs.AI

    Channel Modeling Aided Dataset Generation for AI-Enabled CSI Feedback: Advances, Challenges, and Solutions

    Authors: Yupeng Li, Gang Li, Zirui Wen, Shuangfeng Han, Shijian Gao, Guangyi Liu, Jiangzhou Wang

    Abstract: The AI-enabled autoencoder has demonstrated great potential in channel state information (CSI) feedback in frequency division duplex (FDD) multiple input multiple output (MIMO) systems. However, this method completely changes the existing feedback strategies, making it impractical to deploy in recent years. To address this issue, this paper proposes a channel modeling aided data augmentation metho… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  26. arXiv:2407.00295  [pdf, other

    cs.CV

    A deep neural network framework for dynamic multi-valued mapping estimation and its applications

    Authors: Geng Li, Di Qiu, Lok Ming Lui

    Abstract: This paper addresses the problem of modeling and estimating dynamic multi-valued mappings. While most mathematical models provide a unique solution for a given input, real-world applications often lack deterministic solutions. In such scenarios, estimating dynamic multi-valued mappings is necessary to suggest different reasonable solutions for each input. This paper introduces a deep neural networ… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

  27. arXiv:2406.19811  [pdf, ps, other

    cs.CV

    EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting

    Authors: Daiwei Zhang, Gengyan Li, Jiajie Li, Mickaël Bressieux, Otmar Hilliges, Marc Pollefeys, Luc Van Gool, Xi Wang

    Abstract: Human activities are inherently complex, and even simple household tasks involve numerous object interactions. To better understand these activities and behaviors, it is crucial to model their dynamic interactions with the environment. The recent availability of affordable head-mounted cameras and egocentric data offers a more accessible and efficient means to understand dynamic human-object inter… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  28. arXiv:2406.18559  [pdf, other

    cs.HC cs.AI cs.CV cs.LG

    Revision Matters: Generative Design Guided by Revision Edits

    Authors: Tao Li, Chin-Yi Cheng, Amber Xie, Gang Li, Yang Li

    Abstract: Layout design, such as user interface or graphical layout in general, is fundamentally an iterative revision process. Through revising a design repeatedly, the designer converges on an ideal layout. In this paper, we investigate how revision edits from human designer can benefit a multimodal generative model. To do so, we curate an expert dataset that traces how human designers iteratively edit an… ▽ More

    Submitted 27 May, 2024; originally announced June 2024.

  29. arXiv:2406.18512  [pdf, other

    cs.CL

    "Is ChatGPT a Better Explainer than My Professor?": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline

    Authors: Grace Li, Milad Alshomary, Smaranda Muresan

    Abstract: Explanations form the foundation of knowledge sharing and build upon communication principles, social dynamics, and learning theories. We focus specifically on conversational approaches for explanations because the context is highly adaptive and interactive. Our research leverages previous work on explanatory acts, a framework for understanding the different strategies that explainers and explaine… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 6 figures, 5 pages

  30. arXiv:2406.18351  [pdf, other

    cs.LG cs.AI

    Reinforcement Learning with Intrinsically Motivated Feedback Graph for Lost-sales Inventory Control

    Authors: Zifan Liu, Xinran Li, Shibo Chen, Gen Li, Jiashuo Jiang, Jun Zhang

    Abstract: Reinforcement learning (RL) has proven to be well-performed and general-purpose in the inventory control (IC). However, further improvement of RL algorithms in the IC domain is impeded due to two limitations of online experience. First, online experience is expensive to acquire in real-world applications. With the low sample efficiency nature of RL algorithms, it would take extensive time to train… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  31. arXiv:2406.17172  [pdf, other

    cs.CR cs.DC cs.LG

    Robust Zero Trust Architecture: Joint Blockchain based Federated learning and Anomaly Detection based Framework

    Authors: Shiva Raj Pokhrel, Luxing Yang, Sutharshan Rajasegarar, Gang Li

    Abstract: This paper introduces a robust zero-trust architecture (ZTA) tailored for the decentralized system that empowers efficient remote work and collaboration within IoT networks. Using blockchain-based federated learning principles, our proposed framework includes a robust aggregation mechanism designed to counteract malicious updates from compromised clients, enhancing the security of the global learn… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Journal ref: ACM SIGCOMM 2024 Sydney

  32. arXiv:2406.16807  [pdf, other

    cs.LG cs.CL cs.CV

    Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

    Authors: Katherine M. Collins, Najoung Kim, Yonatan Bitton, Verena Rieser, Shayegan Omidshafiei, Yushi Hu, Sherol Chen, Senjuti Dutta, Minsuk Chang, Kimin Lee, Youwei Liang, Georgina Evans, Sahil Singla, Gang Li, Adrian Weller, Junfeng He, Deepak Ramachandran, Krishnamurthy Dj Dvijotham

    Abstract: Human feedback plays a critical role in learning and refining reward models for text-to-image generation, but the optimal form the feedback should take for learning an accurate reward function has not been conclusively established. This paper investigates the effectiveness of fine-grained feedback which captures nuanced distinctions in image quality and prompt-alignment, compared to traditional co… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  33. arXiv:2406.16150  [pdf, other

    eess.IV cs.CV

    Intensity Confusion Matters: An Intensity-Distance Guided Loss for Bronchus Segmentation

    Authors: Haifan Gong, Wenhao Huang, Huan Zhang, Yu Wang, Xiang Wan, Hong Shen, Guanbin Li, Haofeng Li

    Abstract: Automatic segmentation of the bronchial tree from CT imaging is important, as it provides structural information for disease diagnosis. Despite the merits of previous automatic bronchus segmentation methods, they have paied less attention to the issue we term as \textit{Intensity Confusion}, wherein the intensity values of certain background voxels approach those of the foreground voxels within br… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: IEEE International Conference on Multimedia & Expo (ICME) 2024

  34. arXiv:2406.16137  [pdf, other

    cs.CV

    MLPHand: Real Time Multi-View 3D Hand Mesh Reconstruction via MLP Modeling

    Authors: Jian Yang, Jiakun Li, Guoming Li, Zhen Shen, Huai-Yu Wu, Zhaoxin Fan, Heng Huang

    Abstract: Multi-view hand mesh reconstruction is a critical task for applications in virtual reality and human-computer interaction, but it remains a formidable challenge. Although existing multi-view hand reconstruction methods achieve remarkable accuracy, they typically come with an intensive computational burden that hinders real-time inference. To this end, we propose MLPHand, a novel method designed fo… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  35. arXiv:2406.15906  [pdf, other

    cs.NI cs.AI

    OpticGAI: Generative AI-aided Deep Reinforcement Learning for Optical Networks Optimization

    Authors: Siyuan Li, Xi Lin, Yaju Liu, Gaolei Li, Jianhua Li

    Abstract: Deep Reinforcement Learning (DRL) is regarded as a promising tool for optical network optimization. However, the flexibility and efficiency of current DRL-based solutions for optical network optimization require further improvement. Currently, generative models have showcased their significant performance advantages across various domains. In this paper, we introduce OpticGAI, the AI-generated pol… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted by ACM SIGCOMM 2024 Workshop on Hot Topics in Optical Technologies and Applications in Networking

  36. arXiv:2406.13275  [pdf, other

    cs.SD cs.CL eess.AS

    Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding

    Authors: Jizhong Liu, Gang Li, Junbo Zhang, Heinrich Dinkel, Yongqing Wang, Zhiyong Yan, Yujun Wang, Bin Wang

    Abstract: Automated audio captioning (AAC) is an audio-to-text task to describe audio contents in natural language. Recently, the advancements in large language models (LLMs), with improvements in training approaches for audio encoders, have opened up possibilities for improving AAC. Thus, we explore enhancing AAC from three aspects: 1) a pre-trained audio encoder via consistent ensemble distillation (CED)… ▽ More

    Submitted 25 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  37. arXiv:2406.13223  [pdf, other

    cs.RO

    Act Better by Timing: A timing-Aware Reinforcement Learning for Autonomous Driving

    Authors: Guanzhou Li, Jianping Wu, Yujing He

    Abstract: Coping with intensively interactive scenarios is one of the significant challenges in the development of autonomous driving. Reinforcement learning (RL) offers an ideal solution for such scenarios through its self-evolution mechanism via interaction with the environment. However, the lack of sufficient safety mechanisms in common RL leads to the fact that agent often find it difficult to interact… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  38. arXiv:2406.12539  [pdf, other

    cs.LG cs.AI

    The Heterophilic Snowflake Hypothesis: Training and Empowering GNNs for Heterophilic Graphs

    Authors: Kun Wang, Guibin Zhang, Xinnan Zhang, Junfeng Fang, Xun Wu, Guohao Li, Shirui Pan, Wei Huang, Yuxuan Liang

    Abstract: Graph Neural Networks (GNNs) have become pivotal tools for a range of graph-based learning tasks. Notably, most current GNN architectures operate under the assumption of homophily, whether explicitly or implicitly. While this underlying assumption is frequently adopted, it is not universally applicable, which can result in potential shortcomings in learning effectiveness. In this paper, \textbf{fo… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  39. arXiv:2406.12538  [pdf, other

    cs.LG cs.AI cs.RO

    Variational Distillation of Diffusion Policies into Mixture of Experts

    Authors: Hongyi Zhou, Denis Blessing, Ge Li, Onur Celik, Xiaogang Jia, Gerhard Neumann, Rudolf Lioutikov

    Abstract: This work introduces Variational Diffusion Distillation (VDD), a novel method that distills denoising diffusion policies into Mixtures of Experts (MoE) through variational inference. Diffusion Models are the current state-of-the-art in generative modeling due to their exceptional ability to accurately learn and represent complex, multi-modal distributions. This ability allows Diffusion Models to r… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  40. arXiv:2406.11496  [pdf, other

    cs.MA

    Decentralized Collaborative Pricing and Shunting for Multiple EV Charging Stations Based on Multi-Agent Reinforcement Learning

    Authors: Tianhao Bu, Hang Li, Guojie Li

    Abstract: The extraordinary electric vehicle (EV) popularization in the recent years has facilitated research studies in alleviating EV energy charging demand. Previous studies primarily focused on the optimizations over charging stations (CS) profit and EV users cost savings through charge/discharge scheduling events. In this work, the random behaviors of EVs are considered, with EV users preferences over… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted at PSGEC: PSGEC2024

  41. arXiv:2406.11231  [pdf, other

    cs.RO cs.AI cs.CL cs.LG

    Enabling robots to follow abstract instructions and complete complex dynamic tasks

    Authors: Ruaridh Mon-Williams, Gen Li, Ran Long, Wenqian Du, Chris Lucas

    Abstract: Completing complex tasks in unpredictable settings like home kitchens challenges robotic systems. These challenges include interpreting high-level human commands, such as "make me a hot beverage" and performing actions like pouring a precise amount of water into a moving mug. To address these challenges, we present a novel framework that combines Large Language Models (LLMs), a curated Knowledge B… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  42. arXiv:2406.11033  [pdf, other

    cs.DB cs.AI

    HAIChart: Human and AI Paired Visualization System

    Authors: Yupeng Xie, Yuyu Luo, Guoliang Li, Nan Tang

    Abstract: The growing importance of data visualization in business intelligence and data science emphasizes the need for tools that can efficiently generate meaningful visualizations from large datasets. Existing tools fall into two main categories: human-powered tools (e.g., Tableau and PowerBI), which require intensive expert involvement, and AI-powered automated tools (e.g., Draco and Table2Charts), whic… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 16 pages, 14 figures, 7 tables

  43. arXiv:2406.10744  [pdf, other

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu , et al. (75 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More

    Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

  44. arXiv:2406.10591  [pdf, other

    eess.AS cs.AI cs.CV cs.MM cs.SD

    MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

    Authors: Ruibo Fu, Shuchen Shi, Hongming Guo, Tao Wang, Chunyu Qiang, Zhengqi Wen, Jianhua Tao, Xin Qi, Yi Lu, Xiaopeng Wang, Zhiyong Wang, Yukun Liu, Xuefei Liu, Shuai Zhang, Guanjun Li

    Abstract: Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, the foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio technology, which relies on… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  45. arXiv:2406.10573  [pdf, other

    cs.LG cs.AI cs.CR

    Graph Neural Backdoor: Fundamentals, Methodologies, Applications, and Future Directions

    Authors: Xiao Yang, Gaolei Li, Jianhua Li

    Abstract: Graph Neural Networks (GNNs) have significantly advanced various downstream graph-relevant tasks, encompassing recommender systems, molecular structure prediction, social media analysis, etc. Despite the boosts of GNN, recent research has empirically demonstrated its potential vulnerability to backdoor attacks, wherein adversaries employ triggers to poison input samples, inducing GNN to adversary-… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  46. arXiv:2406.10152  [pdf, other

    cs.SD eess.AS

    Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

    Authors: Guinan Li, Jiajun Deng, Youjun Chen, Mengzhe Geng, Shujie Hu, Zhe Li, Zengrui Jin, Tianzi Wang, Xurong Xie, Helen Meng, Xunying Liu

    Abstract: This paper proposes joint speaker feature learning methods for zero-shot adaptation of audio-visual multichannel speech separation and recognition systems. xVector and ECAPA-TDNN speaker encoders are connected using purpose-built fusion blocks and tightly integrated with the complete system training. Experiments conducted on LRS3-TED data simulated multichannel overlapped speech suggest that joint… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  47. arXiv:2406.10034  [pdf, other

    cs.SD cs.AI eess.AS

    Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

    Authors: Tianzi Wang, Xurong Xie, Zhaoqing Li, Shoukang Hu, Zengrui Jing, Jiajun Deng, Mingyu Cui, Shujie Hu, Mengzhe Geng, Guinan Li, Helen Meng, Xunying Liu

    Abstract: This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output labels that are concealed using attention masks, while conducting left-to-right AR prediction and history context amalgamation between blocks. A beam s… ▽ More

    Submitted 16 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, 2 tables, Interspeech24 conference

  48. arXiv:2406.09238  [pdf, other

    cs.IT eess.SP

    Near-Field Multiuser Communications based on Sparse Arrays

    Authors: Kangjian Chen, Chenhao Qi, Geoffrey Ye Li, Octavia A. Dobre

    Abstract: This paper considers near-field multiuser communications based on sparse arrays (SAs). First, for the uniform SAs (USAs), we analyze the beam gains of channel steering vectors, which shows that increasing the antenna spacings can effectively improve the spatial resolution of the antenna arrays to enhance the sum rate of multiuser communications. Then, we investigate nonuniform SAs (NSAs) to mitiga… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  49. arXiv:2406.09198  [pdf, other

    cs.CV

    CLIP-Driven Cloth-Agnostic Feature Learning for Cloth-Changing Person Re-Identification

    Authors: Shuang Li, Jiaxu Leng, Guozhang Li, Ji Gan, Haosheng chen, Xinbo Gao

    Abstract: Contrastive Language-Image Pre-Training (CLIP) has shown impressive performance in short-term Person Re-Identification (ReID) due to its ability to extract high-level semantic features of pedestrians, yet its direct application to Cloth-Changing Person Re-Identification (CC-ReID) faces challenges due to CLIP's image encoder overly focusing on clothes clues. To address this, we propose a novel fram… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  50. arXiv:2406.08234  [pdf, other

    cs.LG cs.RO

    MaIL: Improving Imitation Learning with Mamba

    Authors: Xiaogang Jia, Qian Wang, Atalay Donat, Bowen Xing, Ge Li, Hongyi Zhou, Onur Celik, Denis Blessing, Rudolf Lioutikov, Gerhard Neumann

    Abstract: This work introduces Mamba Imitation Learning (MaIL), a novel imitation learning (IL) architecture that offers a computationally efficient alternative to state-of-the-art (SoTA) Transformer policies. Transformer-based policies have achieved remarkable results due to their ability in handling human-recorded data with inherently non-Markovian behavior. However, their high performance comes with the… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.