Skip to main content

Showing 1–50 of 2,926 results for author: Chen, H

  1. arXiv:2407.13436  [pdf, other

    cs.IT

    An Algorithm for Computing the Capacity of Symmetrized KL Information for Discrete Channels

    Authors: Haobo Chen, Gholamali Aminian, Yuheng Bu

    Abstract: Symmetrized Kullback-Leibler (KL) information (\(I_{\mathrm{SKL}}\)), which symmetrizes the traditional mutual information by integrating Lautum information, has been shown as a critical quantity in communication~\cite{aminian2015capacity} and learning theory~\cite{aminian2023information}. This paper considers the problem of computing the capacity in terms of \(I_{\mathrm{SKL}}\) for a fixed discr… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.13417  [pdf, other

    cs.CV

    GDDS: A Single Domain Generalized Defect Detection Frame of Open World Scenario using Gather and Distribute Domain-shift Suppression Network

    Authors: Haiyong Chen, Yaxiu Zhang, Yan Zhang, Xin Zhang, Xingwei Yan

    Abstract: Efficient and intelligent surface defect detection of photovoltaic modules is crucial for improving the quality of photovoltaic modules and ensuring the reliable operation of large-scale infrastructure. However, the scenario characteristics of data distribution deviation make the construction of defect detection models for open world scenarios such as photovoltaic manufacturing and power plant ins… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 13 images

    ACM Class: I.4.9; I.5.1

  3. arXiv:2407.13219  [pdf, other

    cs.CV

    Multi-sentence Video Grounding for Long Video Generation

    Authors: Wei Feng, Xin Wang, Hong Chen, Zeyang Zhang, Wenwu Zhu

    Abstract: Video generation has witnessed great success recently, but their application in generating long videos still remains challenging due to the difficulty in maintaining the temporal consistency of generated videos and the high memory cost during generation. To tackle the problems, in this paper, we propose a brave and new idea of Multi-sentence Video Grounding for Long Video Generation, connecting th… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  4. arXiv:2407.13179  [pdf, other

    eess.IV cs.CV

    Learned HDR Image Compression for Perceptually Optimal Storage and Display

    Authors: Peibei Cao, Haoyu Chen, Jingzhe Ma, Yu-Chieh Yuan, Zhiyong Xie, Xin Xie, Haiqing Bai, Kede Ma

    Abstract: High dynamic range (HDR) capture and display have seen significant growth in popularity driven by the advancements in technology and increasing consumer demand for superior image quality. As a result, HDR image compression is crucial to fully realize the benefits of HDR imaging without suffering from large file sizes and inefficient data handling. Conventionally, this is achieved by introducing a… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  5. arXiv:2407.13132  [pdf, other

    eess.IV cs.CV

    LSD3K: A Benchmark for Smoke Removal from Laparoscopic Surgery Images

    Authors: Wenhui Chang, Hongming Chen

    Abstract: Smoke generated by surgical instruments during laparoscopic surgery can obscure the visual field, impairing surgeons' ability to perform operations accurately and safely. Thus, smoke removal task for laparoscopic images is highly desirable. Despite laparoscopic image desmoking has attracted the attention of researchers in recent years and several algorithms have emerged, the lack of publicly avail… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  6. arXiv:2407.12810  [pdf

    cs.NI

    A Study on the Situation of Connected Car Patent Portfolios

    Authors: Abel C. H. Chen, Chia-Shen Chang

    Abstract: In recent years, the countries of the world have drafted the specifications of connected cars; for instance, the Security Credential Management System (SCMS) has been proposed by United States Department of Transportation (USDOT), and the Cooperative Intelligent Transportation System (C-ITS) Credential Management System (CCMS) has been proposed by European Union (EU). Therefore, several companies… ▽ More

    Submitted 26 June, 2024; originally announced July 2024.

    Comments: in Chinese language

  7. arXiv:2407.12764  [pdf, other

    cs.LG

    Jigsaw Game: Federated Clustering

    Authors: Jinxuan Xu, Hong-You Chen, Wei-Lun Chao, Yuqian Zhang

    Abstract: Federated learning has recently garnered significant attention, especially within the domain of supervised learning. However, despite the abundance of unlabeled data on end-users, unsupervised learning problems such as clustering in the federated setting remain underexplored. In this paper, we investigate the federated clustering problem, with a focus on federated k-means. We outline the challenge… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to TMLR

  8. arXiv:2407.12592  [pdf, other

    cs.CV

    VegeDiff: Latent Diffusion Model for Geospatial Vegetation Forecasting

    Authors: Sijie Zhao, Hao Chen, Xueliang Zhang, Pengfeng Xiao, Lei Bai, Wanli Ouyang

    Abstract: In the context of global climate change and frequent extreme weather events, forecasting future geospatial vegetation states under these conditions is of significant importance. The vegetation change process is influenced by the complex interplay between dynamic meteorological variables and static environmental variables, leading to high levels of uncertainty. Existing deterministic methods are in… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 15 pages, 8 figures

  9. arXiv:2407.12393  [pdf, other

    cs.CL cs.AI cs.CY

    PersLLM: A Personified Training Approach for Large Language Models

    Authors: Zheni Zeng, Jiayi Chen, Huimin Chen, Yukun Yan, Yuxuan Chen, Zhiyuan Liu, Maosong Sun

    Abstract: Large language models exhibit aspects of human-level intelligence that catalyze their application as human-like agents in domains such as social simulations, human-machine interactions, and collaborative multi-agent systems. However, the absence of distinct personalities, such as displaying ingratiating behaviors, inconsistent opinions, and uniform response patterns, diminish LLMs utility in pract… ▽ More

    Submitted 18 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: 10 pages for main text, 5 figures

  10. arXiv:2407.12226  [pdf, other

    cs.LG

    Individualized Federated Learning for Traffic Prediction with Error Driven Aggregation

    Authors: Hang Chen, Collin Meese, Mark Nejad, Chien-Chung Shen

    Abstract: Low-latency traffic prediction is vital for smart city traffic management. Federated Learning has emerged as a promising technique for Traffic Prediction (FLTP), offering several advantages such as privacy preservation, reduced communication overhead, improved prediction accuracy, and enhanced adaptability to changing traffic conditions. However, majority of the current FLTP frameworks lack a real… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 16 pages, 4 figures

  11. arXiv:2407.11585  [pdf, other

    cs.CV cs.AI

    QVD: Post-training Quantization for Video Diffusion Models

    Authors: Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie

    Abstract: Recently, video diffusion models (VDMs) have garnered significant attention due to their notable advancements in generating coherent and realistic video content. However, processing multiple frame features concurrently, coupled with the considerable model size, results in high latency and extensive memory consumption, hindering their broader application. Post-training quantization (PTQ) is an effe… ▽ More

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: accepted by ACMMM2024

  12. arXiv:2407.10704  [pdf, other

    cs.CV

    Quantized Prompt for Efficient Generalization of Vision-Language Models

    Authors: Tianxiang Hao, Xiaohan Ding, Juexiao Feng, Yuhong Yang, Hui Chen, Guiguang Ding

    Abstract: In the past few years, large-scale pre-trained vision-language models like CLIP have achieved tremendous success in various fields. Naturally, how to transfer the rich knowledge in such huge pre-trained models to downstream tasks and datasets becomes a hot topic. During downstream adaptation, the most challenging problems are overfitting and catastrophic forgetting, which can cause the model to ov… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 14 pages, 7 figures. Accepted by ECCV 2024

  13. arXiv:2407.10485  [pdf, other

    cs.CV

    Effective Motion Modeling for UAV-platform Multiple Object Tracking with Re-Margin Loss

    Authors: Mufeng Yao, Jinlong Peng, Qingdong He, Bo Peng, Hao Chen, Mingmin Chi, Chao Liu, Jon Atli Benediktsson

    Abstract: Multiple object tracking (MOT) from unmanned aerial vehicle (UAV) platforms requires efficient motion modeling. This is because UAV-MOT faces tracking difficulties caused by large and irregular motion, and insufficient training due to the motion long-tailed distribution of current UAV-MOT datasets. Previous UAV-MOT methods either extract motion and detection features redundantly or supervise motio… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2308.07207

  14. arXiv:2407.10419  [pdf, other

    cs.CV cs.LG

    Omni-Dimensional Frequency Learner for General Time Series Analysis

    Authors: Xianing Chen. Hanting Chen, Hailin Hu

    Abstract: Frequency domain representation of time series feature offers a concise representation for handling real-world time series data with inherent complexity and dynamic nature. However, current frequency-based methods with complex operations still fall short of state-of-the-art time domain methods for general time series analysis. In this work, we present Omni-Dimensional Frequency Learner (ODFL) mode… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  15. arXiv:2407.10285  [pdf, other

    cs.CV

    Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models

    Authors: Qinyu Yang, Haoxin Chen, Yong Zhang, Menghan Xia, Xiaodong Cun, Zhixun Su, Ying Shan

    Abstract: In order to improve the quality of synthesized videos, currently, one predominant method involves retraining an expert diffusion model and then implementing a noising-denoising process for refinement. Despite the significant training costs, maintaining consistency of content between the original and enhanced videos remains a major challenge. To tackle this challenge, we propose a novel formulation… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024, Project Page: https://yangqy1110.github.io/NC-SDEdit/, Code Repo: https://github.com/yangqy1110/NC-SDEdit/

    ACM Class: I.2; I.4.3

  16. arXiv:2407.10068  [pdf, other

    cs.CL

    Multi-Granularity Semantic Revision for Large Language Model Distillation

    Authors: Xiaoyu Liu, Yun Zhang, Wei Li, Simiao Li, Xudong Huang, Hanting Chen, Yehui Tang, Jie Hu, Zhiwei Xiong, Yunhe Wang

    Abstract: Knowledge distillation plays a key role in compressing the Large Language Models (LLMs), which boosts a small-size student model under large teacher models' guidance. However, existing LLM distillation methods overly rely on student-generated outputs, which may introduce generation errors and misguide the distillation process. Moreover, the distillation loss functions introduced in previous art st… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  17. arXiv:2407.09911  [pdf, other

    cs.HC cs.CV cs.LG eess.SP

    SensEmo: Enabling Affective Learning through Real-time Emotion Recognition with Smartwatches

    Authors: Kushan Choksi, Hongkai Chen, Karan Joshi, Sukrutha Jade, Shahriar Nirjon, Shan Lin

    Abstract: Recent research has demonstrated the capability of physiological signals to infer both user emotional and attention responses. This presents an opportunity for leveraging widely available physiological sensors in smartwatches, to detect real-time emotional cues in users, such as stress and excitement. In this paper, we introduce SensEmo, a smartwatch-based system designed for affective learning. S… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 7 pages, 7 figures, 2 tables. IEEE MASS 2024

    ACM Class: C.3.3; J.3.2; J.4.2

  18. arXiv:2407.09816  [pdf, other

    cs.CL

    MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts

    Authors: Zhenpeng Su, Zijia Lin, Xue Bai, Xing Wu, Yizhe Xiong, Haoran Lian, Guangyuan Ma, Hui Chen, Guiguang Ding, Wei Zhou, Songlin Hu

    Abstract: Scaling model capacity enhances its capabilities but significantly increases computation. Mixture-of-Experts models (MoEs) address this by allowing model capacity to scale without substantially increasing training or inference costs. Despite their promising results, MoE models encounter several challenges. Primarily, the dispersion of training tokens across multiple experts can lead to underfittin… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Work in progress

  19. arXiv:2407.09698  [pdf, other

    cs.LG

    RIO-CPD: A Riemannian Geometric Method for Correlation-aware Online Change Point Detection

    Authors: Chengyuan Deng, Zhengzhang Chen, Xujiang Zhao, Haoyu Wang, Junxiang Wang, Haifeng Chen, Jie Gao

    Abstract: The objective of change point detection is to identify abrupt changes at potentially multiple points within a data sequence. This task is particularly challenging in the online setting where various types of changes can occur, including shifts in both the marginal and joint distributions of the data. This paper tackles these challenges by sequentially tracking correlation matrices on the Riemannia… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  20. Performance Comparison of Various Modes of Advanced Encryption Standard

    Authors: Abel C. H. Chen

    Abstract: With the maturation of quantum computing technology, many cryptographic methods are gradually facing threats from quantum computing. Although the Grover algorithm can accelerate search speeds, current research indicates that the Advanced Encryption Standard (AES) method can still enhance security by increasing the length of the secret key. However, the AES method involves multiple modes in impleme… ▽ More

    Submitted 21 May, 2024; originally announced July 2024.

    Comments: in Chinese language

  21. arXiv:2407.09268  [pdf, other

    eess.IV cs.CV

    Region Attention Transformer for Medical Image Restoration

    Authors: Zhiwen Yang, Haowei Chen, Ziniu Qian, Yang Zhou, Hui Zhang, Dan Zhao, Bingzheng Wei, Yan Xu

    Abstract: Transformer-based methods have demonstrated impressive results in medical image restoration, attributed to the multi-head self-attention (MSA) mechanism in the spatial dimension. However, the majority of existing Transformers conduct attention within fixed and coarsely partitioned regions (\text{e.g.} the entire image or fixed patches), resulting in interference from irrelevant regions and fragmen… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by MICCAI 2024

  22. arXiv:2407.09249  [pdf

    cs.MA cs.AI

    GNN with Model-based RL for Multi-agent Systems

    Authors: Hanxiao Chen

    Abstract: Multi-agent systems (MAS) constitute a significant role in exploring machine intelligence and advanced applications. In order to deeply investigate complicated interactions within MAS scenarios, we originally propose "GNN for MBRL" model, which utilizes a state-spaced Graph Neural Networks with Model-based Reinforcement Learning to address specific MAS missions (e.g., Billiard-Avoidance, Autonomou… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  23. arXiv:2407.09024  [pdf, other

    cs.LG

    Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control

    Authors: Huayu Chen, Kaiwen Zheng, Hang Su, Jun Zhu

    Abstract: Drawing upon recent advances in language model alignment, we formulate offline Reinforcement Learning as a two-stage optimization problem: First pretraining expressive generative policies on reward-free behavior datasets, then fine-tuning these policies to align with task-specific annotations like Q-values. This strategy allows us to leverage abundant and diverse behavior data to enhance generaliz… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  24. arXiv:2407.09019  [pdf, other

    cs.SI cs.AI

    Heterogeneous Subgraph Network with Prompt Learning for Interpretable Depression Detection on Social Media

    Authors: Chen Chen, Mingwei Li, Fenghuan Li, Haopeng Chen, Yuankun Lin

    Abstract: Massive social media data can reflect people's authentic thoughts, emotions, communication, etc., and therefore can be analyzed for early detection of mental health problems such as depression. Existing works about early depression detection on social media lacked interpretability and neglected the heterogeneity of social media data. Furthermore, they overlooked the global interaction among users.… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  25. arXiv:2407.08995  [pdf, other

    cs.CL

    Self-Prompt Tuning: Enable Autonomous Role-Playing in LLMs

    Authors: Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, Xin Zhou, Jiaming Zhou, Haoqin Sun

    Abstract: Recent advancements in LLMs have showcased their remarkable role-playing capabilities, able to accurately simulate the dialogue styles and cognitive processes of various roles based on different instructions and contexts. Studies indicate that assigning LLMs the roles of experts, a strategy known as role-play prompting, can enhance their performance in the corresponding domains. However, the promp… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  26. arXiv:2407.08972  [pdf, other

    cs.CV

    Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness

    Authors: Honghao Chen, Yurong Zhang, Xiaokun Feng, Xiangxiang Chu, Kaiqi Huang

    Abstract: Robustness is a vital aspect to consider when deploying deep learning models into the wild. Numerous studies have been dedicated to the study of the robustness of vision transformers (ViTs), which have dominated as the mainstream backbone choice for vision tasks since the dawn of 2020s. Recently, some large kernel convnets make a comeback with impressive performance and efficiency. However, it sti… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  27. arXiv:2407.08924  [pdf, other

    cs.CR

    Disassembling Obfuscated Executables with LLM

    Authors: Huanyao Rong, Yue Duan, Hang Zhang, XiaoFeng Wang, Hongbo Chen, Shengchen Duan, Shen Wang

    Abstract: Disassembly is a challenging task, particularly for obfuscated executables containing junk bytes, which is designed to induce disassembly errors. Existing solutions rely on heuristics or leverage machine learning techniques, but only achieve limited successes. Fundamentally, such obfuscation cannot be defeated without in-depth understanding of the binary executable's semantics, which is made possi… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  28. arXiv:2407.08922  [pdf, other

    cs.LG

    Leveraging large language models for nano synthesis mechanism explanation: solid foundations or mere conjectures?

    Authors: Yingming Pu, Liping Huang, Tao Lin, Hongyu Chen

    Abstract: With the rapid development of artificial intelligence (AI), large language models (LLMs) such as GPT-4 have garnered significant attention in the scientific community, demonstrating great potential in advancing scientific discovery. This progress raises a critical question: are these LLMs well-aligned with real-world physicochemical principles? Current evaluation strategies largely emphasize fact-… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  29. PAIL: Performance based Adversarial Imitation Learning Engine for Carbon Neutral Optimization

    Authors: Yuyang Ye, Lu-An Tang, Haoyu Wang, Runlong Yu, Wenchao Yu, Erhu He, Haifeng Chen, Hui Xiong

    Abstract: Achieving carbon neutrality within industrial operations has become increasingly imperative for sustainable development. It is both a significant challenge and a key opportunity for operational optimization in industry 4.0. In recent years, Deep Reinforcement Learning (DRL) based methods offer promising enhancements for sequential optimization processes and can be used for reducing carbon emission… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  30. arXiv:2407.08282  [pdf, ps, other

    cs.CR cs.LG

    AoA-Based Physical Layer Authentication in Analog Arrays under Impersonation Attacks

    Authors: Muralikrishnan Srinivasan, Linda Senigagliesi, Hui Chen, Arsenia Chorti, Marco Baldi, Henk Wymeersch

    Abstract: We discuss the use of angle of arrival (AoA) as an authentication measure in analog array multiple-input multiple-output (MIMO) systems. A base station equipped with an analog array authenticates users based on the AoA estimated from certified pilot transmissions, while active attackers manipulate their transmitted signals to mount impersonation attacks. We study several attacks of increasing inte… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 25th IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC 2024)

  31. arXiv:2407.07825  [pdf, other

    cs.SD cs.CV cs.MM eess.AS

    RT-LA-VocE: Real-Time Low-SNR Audio-Visual Speech Enhancement

    Authors: Honglie Chen, Rodrigo Mira, Stavros Petridis, Maja Pantic

    Abstract: In this paper, we aim to generate clean speech frame by frame from a live video stream and a noisy audio stream without relying on future inputs. To this end, we propose RT-LA-VocE, which completely re-designs every component of LA-VocE, a state-of-the-art non-causal audio-visual speech enhancement model, to perform causal real-time inference with a 40ms input frame. We do so by devising new visua… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Interspeech 2024

  32. arXiv:2407.07307  [pdf, other

    cs.CV

    Dual-stage Hyperspectral Image Classification Model with Spectral Supertoken

    Authors: Peifu Liu, Tingfa Xu, Jie Wang, Huan Chen, Huiyan Bai, Jianan Li

    Abstract: Hyperspectral image classification, a task that assigns pre-defined classes to each pixel in a hyperspectral image of remote sensing scenes, often faces challenges due to the neglect of correlations between spectrally similar pixels. This oversight can lead to inaccurate edge definitions and difficulties in managing minor spectral variations in contiguous areas. To address these issues, we introdu… ▽ More

    Submitted 13 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  33. arXiv:2407.06985  [pdf, other

    cs.AI

    PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods

    Authors: Yiying Wang, Xiaojing Li, Binzhu Wang, Yueyang Zhou, Han Ji, Hong Chen, Jinshi Zhang, Fei Yu, Zewei Zhao, Song Jin, Renji Gong, Wanqing Xu

    Abstract: In domain-specific applications, GPT-4, augmented with precise prompts or Retrieval-Augmented Generation (RAG), shows notable potential but faces the critical tri-lemma of performance, cost, and data privacy. High performance requires sophisticated processing techniques, yet managing multiple agents within a complex workflow often proves costly and challenging. To address this, we introduce the PE… ▽ More

    Submitted 9 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  34. arXiv:2407.06754  [pdf, other

    cs.DC cs.AI

    Threats and Defenses in Federated Learning Life Cycle: A Comprehensive Survey and Challenges

    Authors: Yanli Li, Zhongliang Guo, Nan Yang, Huaming Chen, Dong Yuan, Weiping Ding

    Abstract: Federated Learning (FL) offers innovative solutions for privacy-preserving collaborative machine learning (ML). Despite its promising potential, FL is vulnerable to various attacks due to its distributed nature, affecting the entire life cycle of FL services. These threats can harm the model's utility or compromise participants' privacy, either directly or indirectly. In response, numerous defense… ▽ More

    Submitted 11 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  35. arXiv:2407.06513  [pdf, other

    cs.CV

    Computer vision tasks for intelligent aerospace missions: An overview

    Authors: Huilin Chen, Qiyu Sun, Fangfei Li, Yang Tang

    Abstract: Computer vision tasks are crucial for aerospace missions as they help spacecraft to understand and interpret the space environment, such as estimating position and orientation, reconstructing 3D models, and recognizing objects, which have been extensively studied to successfully carry out the missions. However, traditional methods like Kalman Filtering, Structure from Motion, and Multi-View Stereo… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 23 pages, 7 figures, journal

  36. arXiv:2407.05873  [pdf, other

    eess.SP cs.IT

    Receiver Selection and Transmit Beamforming for Multi-static Integrated Sensing and Communications

    Authors: Dan Wang, Yuanming Tian, Chuan Huang, Hao Chen, Xiaodong Xu, Ping Zhang

    Abstract: Next-generation wireless networks are expected to develop a novel paradigm of integrated sensing and communications (ISAC) to enable both the high-accuracy sensing and high-speed communications. However, conventional mono-static ISAC systems, which simultaneously transmit and receive at the same equipment, may suffer from severe self-interference, and thus significantly degrade the system performa… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  37. arXiv:2407.04947  [pdf, other

    cs.CV

    FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior

    Authors: Zhekai Chen, Wen Wang, Zhen Yang, Zeqing Yuan, Hao Chen, Chunhua Shen

    Abstract: We offer a novel approach to image composition, which integrates multiple input images into a single, coherent image. Rather than concentrating on specific use cases such as appearance editing (image harmonization) or semantic editing (semantic image composition), we showcase the potential of utilizing the powerful generative prior inherent in large-scale pre-trained diffusion models to accomplish… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted to Proc. Eur. Conf. Comp. Vision 2024. Project webpage: https://github.com/aim-uofa/FreeCompose

  38. arXiv:2407.04939  [pdf, ps, other

    cs.LG cs.CV

    Balance of Number of Embedding and their Dimensions in Vector Quantization

    Authors: Hang Chen, Sankepally Sainath Reddy, Ziwei Chen, Dianbo Liu

    Abstract: The dimensionality of the embedding and the number of available embeddings ( also called codebook size) are critical factors influencing the performance of Vector Quantization(VQ), a discretization process used in many models such as the Vector Quantized Variational Autoencoder (VQ-VAE) architecture. This study examines the balance between the codebook sizes and dimensions of embeddings in VQ, whi… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  39. arXiv:2407.04846  [pdf, other

    cs.LG cs.AI

    Amazing Things Come From Having Many Good Models

    Authors: Cynthia Rudin, Chudi Zhong, Lesia Semenova, Margo Seltzer, Ronald Parr, Jiachang Liu, Srikar Katta, Jon Donnelly, Harry Chen, Zachery Boner

    Abstract: The Rashomon Effect, coined by Leo Breiman, describes the phenomenon that there exist many equally good predictive models for the same dataset. This phenomenon happens for many real datasets and when it does, it sparks both magic and consternation, but mostly magic. In light of the Rashomon Effect, this perspective piece proposes reshaping the way we think about machine learning, particularly for… ▽ More

    Submitted 9 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Journal ref: ICML (spotlight), 2024

  40. arXiv:2407.03720  [pdf, other

    cs.IR cs.CL

    Query-oriented Data Augmentation for Session Search

    Authors: Haonan Chen, Zhicheng Dou, Yutao Zhu, Ji-Rong Wen

    Abstract: Modeling contextual information in a search session has drawn more and more attention when understanding complex user intents. Recent methods are all data-driven, i.e., they train different models on large-scale search log data to identify the relevance between search contexts and candidate documents. The common training paradigm is to pair the search context with different candidate documents and… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: TKDE 2024

  41. arXiv:2407.03245  [pdf, other

    cs.RO cs.AI eess.SY

    TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach

    Authors: Weikun Peng, Jun Lv, Yuwei Zeng, Haonan Chen, Siheng Zhao, Jichen Sun, Cewu Lu, Lin Shao

    Abstract: The tie-knotting task is highly challenging due to the tie's high deformation and long-horizon manipulation actions. This work presents TieBot, a Real-to-Sim-to-Real learning from visual demonstration system for the robots to learn to knot a tie. We introduce the Hierarchical Feature Matching approach to estimate a sequence of tie's meshes from the demonstration video. With these estimated meshes… ▽ More

    Submitted 3 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: fix few typos

  42. arXiv:2407.03217  [pdf, other

    cs.CV

    MHNet: Multi-view High-order Network for Diagnosing Neurodevelopmental Disorders Using Resting-state fMRI

    Authors: Yueyang Li, Weiming Zeng, Wenhao Dong, Luhui Cai, Lei Wang, Hongyu Chen, Hongjie Yan, Lingbin Bian, Nizhuan Wang

    Abstract: Background: Deep learning models have shown promise in diagnosing neurodevelopmental disorders (NDD) like ASD and ADHD. However, many models either use graph neural networks (GNN) to construct single-level brain functional networks (BFNs) or employ spatial convolution filtering for local information extraction from rs-fMRI data, often neglecting high-order features crucial for NDD classification.… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 18 pages

  43. arXiv:2407.03211  [pdf, other

    cs.CL cs.LG

    How Does Quantization Affect Multilingual LLMs?

    Authors: Kelly Marchisio, Saurabh Dash, Hongyu Chen, Dennis Aumiller, Ahmet Üstün, Sara Hooker, Sebastian Ruder

    Abstract: Quantization techniques are widely used to improve inference speed and deployment of large language models. While a wide body of work examines the impact of quantized LLMs on English tasks, none have examined the effect of quantization across languages. We conduct a thorough analysis of quantized multilingual LLMs, focusing on their performance across languages and at varying scales. We use automa… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  44. arXiv:2407.03130  [pdf, other

    cs.CV

    Towards Efficient Pixel Labeling for Industrial Anomaly Detection and Localization

    Authors: Hanxi Li, Jingqi Wu, Lin Yuanbo Wu, Hao Chen, Deyin Liu, Chunhua Shen

    Abstract: In the realm of practical Anomaly Detection (AD) tasks, manual labeling of anomalous pixels proves to be a costly endeavor. Consequently, many AD methods are crafted as one-class classifiers, tailored for training sets completely devoid of anomalies, ensuring a more cost-effective approach. While some pioneering work has demonstrated heightened AD accuracy by incorporating real anomaly samples in… ▽ More

    Submitted 4 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: 18 pages, 5 figures

  45. arXiv:2407.02779  [pdf, other

    cs.AI cs.LG

    Croppable Knowledge Graph Embedding

    Authors: Yushan Zhu, Wen Zhang, Zhiqiang Liu, Mingyang Chen, Lei Liang, Huajun Chen

    Abstract: Knowledge Graph Embedding (KGE) is a common method for Knowledge Graphs (KGs) to serve various artificial intelligence tasks. The suitable dimensions of the embeddings depend on the storage and computing conditions of the specific application scenarios. Once a new dimension is required, a new KGE model needs to be trained from scratch, which greatly increases the training cost and limits the effic… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  46. arXiv:2407.02762  [pdf, other

    cs.LG cs.AI

    SF-GNN: Self Filter for Message Lossless Propagation in Deep Graph Neural Network

    Authors: Yushan Zhu, Wen Zhang, Yajing Xu, Zhen Yao, Mingyang Chen, Huajun Chen

    Abstract: Graph Neural Network (GNN), with the main idea of encoding graph structure information of graphs by propagation and aggregation, has developed rapidly. It achieved excellent performance in representation learning of multiple types of graphs such as homogeneous graphs, heterogeneous graphs, and more complex graphs like knowledge graphs. However, merely stacking GNN layers may not improve the model'… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  47. arXiv:2407.02356  [pdf, other

    eess.IV cs.CV cs.LG

    Enable the Right to be Forgotten with Federated Client Unlearning in Medical Imaging

    Authors: Zhipeng Deng, Luyang Luo, Hao Chen

    Abstract: The right to be forgotten, as stated in most data regulations, poses an underexplored challenge in federated learning (FL), leading to the development of federated unlearning (FU). However, current FU approaches often face trade-offs between efficiency, model performance, forgetting efficacy, and privacy preservation. In this paper, we delve into the paradigm of Federated Client Unlearning (FCU) t… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  48. arXiv:2407.02158  [pdf, other

    cs.CV

    UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks

    Authors: Jingjing Ren, Wenbo Li, Haoyu Chen, Renjing Pei, Bin Shao, Yong Guo, Long Peng, Fenglong Song, Lei Zhu

    Abstract: Ultra-high-resolution image generation poses great challenges, such as increased semantic planning complexity and detail synthesis difficulties, alongside substantial training resource demands. We present UltraPixel, a novel architecture utilizing cascade diffusion models to generate high-quality images at multiple resolutions (\textit{e.g.}, 1K to 6K) within a single model, while maintaining comp… ▽ More

    Submitted 4 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Project page https://jingjingrenabc.github.io/ultrapixel

  49. arXiv:2407.02157  [pdf, other

    cs.CV cs.HC

    FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs

    Authors: Haodong Chen, Haojian Huang, Junhao Dong, Mingzhe Zheng, Dian Shao

    Abstract: Dynamic Facial Expression Recognition (DFER) is crucial for understanding human behavior. However, current methods exhibit limited performance mainly due to the scarcity of high-quality data, the insufficient utilization of facial dynamics, and the ambiguity of expression semantics, etc. To this end, we propose a novel framework, named Multi-modal Fine-grained CLIP for Dynamic Facial Expression Re… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Project Page: https://haroldchen19.github.io/FineCLIPER-Page/

  50. arXiv:2407.02098  [pdf, other

    cs.CV

    DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection

    Authors: Kaixin Xu, Qingtian Feng, Hao Chen, Zhe Wang, Xue Geng, Xulei Yang, Min Wu, Xiaoli Li, Weisi Lin

    Abstract: Applying deep neural networks to 3D point cloud processing has attracted increasing attention due to its advanced performance in many areas, such as AR/VR, autonomous driving, and robotics. However, as neural network models and 3D point clouds expand in size, it becomes a crucial challenge to reduce the computational and memory overhead to meet latency and energy constraints in real-world applicat… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.