Showing 1–50 of 5,659 results for author: Li, J

  1. arXiv:2407.13675  [pdf, other]

    cs.CV

    MeshSegmenter: Zero-Shot Mesh Semantic Segmentation via Texture Synthesis

    Authors: Ziming Zhong, Yanxu Xu, Jing Li, Jiale Xu, Zhengxin Li, Chaohui Yu, Shenghua Gao

    Abstract: We present MeshSegmenter, a simple yet effective framework designed for zero-shot 3D semantic segmentation. This model successfully extends the powerful capabilities of 2D segmentation models to 3D meshes, delivering accurate 3D segmentation across diverse meshes and segment descriptions. Specifically, our model leverages the Segment Anything Model (SAM) model to segment the target regions from im…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: The paper was accepted by ECCV2024

  2. arXiv:2407.13328  [pdf, other]

    cs.CV

    Unsupervised Domain Adaptive Lane Detection via Contextual Contrast and Aggregation

    Authors: Kunyang Zhou, Yunjian Feng, Jun Li

    Abstract: This paper focuses on two crucial issues in domain-adaptive lane detection, i.e., how to effectively learn discriminative features and transfer knowledge across domains. Existing lane detection methods usually exploit a pixel-wise cross-entropy loss to train detection models. However, the loss ignores the difference in feature representation among lanes, which leads to inefficient feature learning…

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.13147  [pdf, other]

    cs.CV

    DFMSD: Dual Feature Masking Stage-wise Knowledge Distillation for Object Detection

    Authors: Zhourui Zhang, Jun Li, Zhijian Wu, Jifeng Shen, Jianhua Xu

    Abstract: In recent years, current mainstream feature masking distillation methods mainly function by reconstructing selectively masked regions of a student network from the feature maps of a teacher network. In these methods, attention mechanisms can help to identify spatially important regions and crucial object-aware channel clues, such that the reconstructed features are encoded with sufficient discrimi…

    Submitted 18 July, 2024; originally announced July 2024.

  4. arXiv:2407.12851  [pdf]

    cs.CL

    ISPO: An Integrated Ontology of Symptom Phenotypes for Semantic Integration of Traditional Chinese Medical Data

    Authors: Zixin Shu, Rui Hua, Dengying Yan, Chenxia Lu, Ning Xu, Jun Li, Hui Zhu, Jia Zhang, Dan Zhao, Chenyang Hui, Junqiu Ye, Chu Liao, Qi Hao, Wen Ye, Cheng Luo, Xinyan Wang, Chuang Cheng, Xiaodong Li, Baoyan Liu, Xiaji Zhou, Runshun Zhang, Min Xu, Xuezhong Zhou

    Abstract: Symptom phenotypes are one of the key types of manifestations for diagnosis and treatment of various disease conditions. However, the diversity of symptom terminologies is one of the major obstacles hindering the analysis and knowledge sharing of various types of symptom-related medical data particularly in the fields of Traditional Chinese Medicine (TCM). Objective: This study aimed to construct…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 39 pages, 6 figures, 6 tables

  5. arXiv:2407.12295  [pdf, ps, other]

    cs.CV eess.IV

    Exploiting Inter-Image Similarity Prior for Low-Bitrate Remote Sensing Image Compression

    Authors: Junhui Li, Xingsong Hou

    Abstract: Deep learning-based methods have garnered significant attention in remote sensing (RS) image compression due to their superior performance. Most of these methods focus on enhancing the coding capability of the compression network and improving entropy model prediction accuracy. However, they typically compress and decompress each image independently, ignoring the significant inter-image similarity…

    Submitted 16 July, 2024; originally announced July 2024.

  6. arXiv:2407.12239  [pdf, other]

    cs.CV

    Motion and Structure from Event-based Normal Flow

    Authors: Zhongyang Ren, Bangyan Liao, Delei Kong, Jinghang Li, Peidong Liu, Laurent Kneip, Guillermo Gallego, Yi Zhou

    Abstract: Recovering the camera motion and scene geometry from visual data is a fundamental problem in the field of computer vision. Its success in standard vision is attributed to the maturity of feature extraction, data association and multi-view geometry. The recent emergence of neuromorphic event-based cameras places great demands on approaches that use raw event data as input to solve this fundamental…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by ECCV 2024

  7. arXiv:2407.12229  [pdf, other]

    eess.AS cs.AI eess.SP

    Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech

    Authors: Haibin Wu, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Daniel Tompkins, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Sheng Zhao, Jinyu Li, Naoyuki Kanda

    Abstract: People change their tones of voice, often accompanied by nonverbal vocalizations (NVs) such as laughter and cries, to convey rich emotions. However, most text-to-speech (TTS) systems lack the capability to generate speech with rich emotions, including NVs. This paper introduces EmoCtrl-TTS, an emotion-controllable zero-shot TTS that can generate highly emotional speech with NVs for any speaker. Em…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: See https://aka.ms/emoctrl-tts for demo samples

  8. arXiv:2407.12112  [pdf, other]

    cs.LG cs.CY cs.SI

    A Benchmark for Fairness-Aware Graph Learning

    Authors: Yushun Dong, Song Wang, Zhenyu Lei, Zaiyi Zheng, Jing Ma, Chen Chen, Jundong Li

    Abstract: Fairness-aware graph learning has gained increasing attention in recent years. Nevertheless, there lacks a comprehensive benchmark to evaluate and compare different fairness-aware graph learning methods, which blocks practitioners from choosing appropriate ones for broader real-world applications. In this paper, we present an extensive benchmark on ten representative fairness-aware graph learning…

    Submitted 16 July, 2024; originally announced July 2024.

  9. arXiv:2407.12031  [pdf]

    cs.CY cs.AI

    Evaluation of Bias Towards Medical Professionals in Large Language Models

    Authors: Xi Chen, Yang Xu, MingKe You, Li Wang, WeiZhi Liu, Jian Li

    Abstract: This study evaluates whether large language models (LLMs) exhibit biases towards medical professionals. Fictitious candidate resumes were created to control for identity factors while maintaining consistent qualifications. Three LLMs (GPT-4, Claude-3-haiku, and Mistral-Large) were tested using a standardized prompt to evaluate resumes for specific residency programs. Explicit bias was tested by ch…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 36 pages, 6 figures

  10. arXiv:2407.11872  [pdf, ps, other]

    cs.GT

    Proportional Dynamics in Linear Fisher Markets with Auto-bidding: Convergence, Incentives and Fairness

    Authors: Juncheng Li, Pingzhong Tang

    Abstract: Proportional dynamics, originated from peer-to-peer file sharing systems, models a decentralized price-learning process in Fisher markets. Previously, items in the dynamics operate independently of one another, and each is assumed to belong to a different seller. In this paper, we show how it can be generalized to the setting where each seller brings multiple items and buyers allocate budgets at t…

    Submitted 16 July, 2024; originally announced July 2024.

  11. arXiv:2407.11869  [pdf, ps, other]

    cs.GT

    Price Competition in Linear Fisher Markets: Stability, Equilibrium and Personalization

    Authors: Juncheng Li, Pingzhong Tang

    Abstract: Linear Fisher market is one of the most fundamental economic models. The market is traditionally examined on the basis of individual's price-taking behavior. However, this assumption breaks in markets such as online advertising and e-commerce, where several oligopolists dominate the market and are able to compete with each other via strategic actions. Motivated by this, we study the price competit…

    Submitted 16 July, 2024; originally announced July 2024.

  12. arXiv:2407.11484  [pdf, other]

    cs.AI cs.CL

    The Oscars of AI Theater: A Survey on Role-Playing with Language Models

    Authors: Nuo Chen, Yang Deng, Jia Li

    Abstract: This survey explores the burgeoning field of role-playing with language models, focusing on their development from early persona-based models to advanced character-driven simulations facilitated by Large Language Models (LLMs). Initially confined to simple persona consistency due to limited model capabilities, role-playing tasks have now expanded to embrace complex character portrayals involving c…

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 28 pages

  13. arXiv:2407.11466  [pdf, other]

    cs.CY

    Navigating the Data Trading Crossroads: An Interdisciplinary Survey

    Authors: Yi Yu, Jingru Yu, Xuhong Wang, Juanjuan Li, Yilun Lin, Conghui He, Yanqing Yang, Yu Qiao, Li Li, Fei-Yue Wang

    Abstract: Data has been increasingly recognized as a critical factor in the future economy. However, constructing an efficient data trading market faces challenges such as privacy breaches, data monopolies, and misuse. Despite numerous studies proposing algorithms to protect privacy and methods for pricing data, a comprehensive understanding of these issues and systemic solutions remain elusive. This paper…

    Submitted 16 July, 2024; originally announced July 2024.

  14. arXiv:2407.11435  [pdf, other]

    q-bio.GN cs.LG stat.ML

    Genomic Language Models: Opportunities and Challenges

    Authors: Gonzalo Benegas, Chengzhong Ye, Carlos Albors, Jianan Canal Li, Yun S. Song

    Abstract: Large language models (LLMs) are having transformative impacts across a wide range of scientific fields, particularly in the biomedical sciences. Just as the goal of Natural Language Processing is to understand sequences of words, a major objective in biology is to understand biological sequences. Genomic Language Models (gLMs), which are LLMs trained on DNA sequences, have the potential to signif…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Review article; 25 pages, 3 figures, 1 table

    MSC Class: 92-08; 92B20; 68T50; 68T07

  15. arXiv:2407.11382  [pdf, other]

    cs.CV cs.AI cs.RO

    Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts

    Authors: Jianhao Li, Tianyu Sun, Zhongdao Wang, Enze Xie, Bailan Feng, Hongbo Zhang, Ze Yuan, Ke Xu, Jiaheng Liu, Ping Luo

    Abstract: This paper proposes an algorithm for automatically labeling 3D objects from 2D point or box prompts, especially focusing on applications in autonomous driving. Unlike previous arts, our auto-labeler predicts 3D shapes instead of bounding boxes and does not require training on a specific dataset. We propose a Segment, Lift, and Fit (SLF) paradigm to achieve this goal. Firstly, we segment high-quali…

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  16. arXiv:2407.10955  [pdf, other]

    stat.ML cs.LG math.OC

    Enhancing Stochastic Optimization for Statistical Efficiency Using ROOT-SGD with Diminishing Stepsize

    Authors: Tong Zhang, Chris Junchi Li

    Abstract: In this paper, we revisit ROOT-SGD, an innovative method for stochastic optimization to bridge the gap between stochastic optimization and statistical efficiency. The proposed method enhances the performance and reliability of ROOT-SGD by integrating a carefully designed diminishing stepsize strategy. This approach addresses key challenges in optimization, providing robust…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  17. arXiv:2407.10943  [pdf, other]

    cs.RO cs.CV

    GRUtopia: Dream General Robots in a City at Scale

    Authors: Hanqing Wang, Jiahe Chen, Wensi Huang, Qingwei Ben, Tai Wang, Boyu Mi, Tao Huang, Siheng Zhao, Yilun Chen, Sizhe Yang, Peizhou Cao, Wenye Yu, Zichao Ye, Jialun Li, Junfeng Long, Zirui Wang, Huiling Wang, Ying Zhao, Zhongying Tu, Yu Qiao, Dahua Lin, Jiangmiao Pang

    Abstract: Recent works have been exploring the scaling laws in the field of Embodied AI. Given the prohibitive costs of collecting real-world data, we believe the Simulation-to-Real (Sim2Real) paradigm is a crucial step for scaling the learning of embodied models. This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots. It features several advancements:…

    Submitted 15 July, 2024; originally announced July 2024.

  18. arXiv:2407.10926  [pdf, other]

    eess.IV cs.CV

    In-Loop Filtering via Trained Look-Up Tables

    Authors: Zhuoyuan Li, Jiacheng Li, Yao Li, Li Li, Dong Liu, Feng Wu

    Abstract: In-loop filtering (ILF) is a key technology for removing the artifacts in image/video coding standards. Recently, neural network-based in-loop filtering methods achieve remarkable coding gains beyond the capability of advanced video coding standards, which becomes a powerful coding tool candidate for future video coding standards. However, the utilization of deep neural networks brings heavy time…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 figures

  19. arXiv:2407.10804  [pdf, other]

    cs.CL

    Mix-CPT: A Domain Adaptation Framework via Decoupling Knowledge Learning and Format Alignment

    Authors: Jinhao Jiang, Junyi Li, Wayne Xin Zhao, Yang Song, Tao Zhang, Ji-Rong Wen

    Abstract: Adapting general large language models (LLMs) to specialized domains presents great challenges due to varied data distributions. This adaptation typically requires continual pre-training on massive domain-specific corpora to facilitate knowledge memorization, followed by training to apply this knowledge following human instructions and preferences. However, this method may result in inefficient kn…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: LLM, CPT, knowledge learning, format alignment; work in progress

  20. arXiv:2407.10550  [pdf, other]

    cs.CV

    Learning Natural Consistency Representation for Face Forgery Video Detection

    Authors: Daichi Zhang, Zihao Xiao, Shikun Li, Fanzhao Lin, Jianmin Li, Shiming Ge

    Abstract: Face Forgery videos have elicited critical social public concerns and various detectors have been proposed. However, fully-supervised detectors may lead to easily overfitting to specific forgery methods or videos, and existing self-supervised detectors are strict on auxiliary tasks, such as requiring audio or multi-modalities, leading to limited generalization and robustness. In this paper, we exa…

    Submitted 15 July, 2024; originally announced July 2024.

  21. arXiv:2407.10416  [pdf, other]

    cs.AR

    SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling

    Authors: Huizheng Wang, Jiahao Fang, Xinru Tang, Zhiheng Yue, Jinxi Li, Yubin Qin, Sihan Guan, Qize Yang, Yang Wang, Chao Li, Yang Hu, Shouyi Yin

    Abstract: Benefiting from the self-attention mechanism, Transformer models have attained impressive contextual comprehension capabilities for lengthy texts. The requirements of high-throughput inference arise as the large language models (LLMs) become increasingly prevalent, which calls for large-scale token parallel processing (LTPP). However, existing dynamic sparse accelerators struggle to effectively ha…

    Submitted 14 July, 2024; originally announced July 2024.

  22. arXiv:2407.10334  [pdf, ps, other]

    cs.HC

    Beyond Meditation: Understanding Everyday Mindfulness Practices and Technology Use Among Experienced Practitioners

    Authors: Jingjin Li, Karen Anne Cochrane, Gilly Leshed

    Abstract: Mindfulness, a practice of bringing attention to the present non-judgmentally, has many mental and physical well-being benefits, especially when practiced consistently. Many technologies have been invented to support solo or group mindfulness practice such as mobile apps, live streams, virtual reality environments, and wearables. In this paper, we present findings from an interview study with 20 e…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Manuscripts accepted to Proc. ACM Hum.-Comput. Interact (CSCW24)

    ACM Class: H.5.2

  23. arXiv:2407.10193  [pdf, other]

    cs.CV

    GRAPE: Generalizable and Robust Multi-view Facial Capture

    Authors: Jing Li, Di Kang, Zhenyu He

    Abstract: Deep learning-based multi-view facial capture methods have shown impressive accuracy while being several orders of magnitude faster than a traditional mesh registration pipeline. However, the existing systems (e.g. TEMPEH) are strictly restricted to inference on the data captured by the same camera array used to capture their training data. In this study, we aim to improve the generalization abili…

    Submitted 14 July, 2024; originally announced July 2024.

  24. arXiv:2407.10167  [pdf, other]

    cs.CL cs.AI

    Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model

    Authors: Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang

    Abstract: Large Language Models (LLMs) have demonstrated exceptional proficiency in mathematical reasoning tasks due to their extensive parameter counts and training on vast datasets. Despite these capabilities, deploying LLMs is hindered by their computational demands. Distilling LLM mathematical reasoning into Smaller Language Models (SLMs) has emerged as a solution to this challenge, although these small…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.11864

  25. arXiv:2407.10055  [pdf]

    cs.LG q-bio.QM

    MKDTI: Predicting drug-target interactions via multiple kernel fusion on graph attention network

    Authors: Yuhuan Zhou, Yulin Wu, Weiwei Yuan, Xuan Wang, Junyi Li

    Abstract: Drug-target relationships may now be predicted computationally using bioinformatics data, which is a valuable tool for understanding pharmacological effects, enhancing drug development efficiency, and advancing related research. A number of structure-based, ligand-based and network-based approaches have now emerged. Furthermore, the integration of graph attention networks with intricate drug targe…

    Submitted 13 July, 2024; originally announced July 2024.

  26. arXiv:2407.10047  [pdf, other]

    cs.CV

    HSFusion: A high-level vision task-driven infrared and visible image fusion network via semantic and geometric domain transformation

    Authors: Chengjie Jiang, Xiaowen Liu, Bowen Zheng, Lu Bai, Jing Li

    Abstract: Infrared and visible image fusion has been developed from vision perception oriented fusion methods to strategies which both consider the vision perception and high-level vision task. However, the existing task-driven methods fail to address the domain gap between semantic and geometric representation. To overcome these issues, we propose a high-level vision task-driven infrared and visible image…

    Submitted 13 July, 2024; originally announced July 2024.

  27. arXiv:2407.09935  [pdf, other]

    cs.CV cs.MM eess.IV

    LeRF: Learning Resampling Function for Adaptive and Efficient Image Interpolation

    Authors: Jiacheng Li, Chang Chen, Fenglong Song, Youliang Yan, Zhiwei Xiong

    Abstract: Image resampling is a basic technique that is widely employed in daily applications, such as camera photo editing. Recent deep neural networks (DNNs) have made impressive progress in performance by introducing learned data priors. Still, these methods are not the perfect substitute for interpolation, due to the drawbacks in efficiency and versatility. In this work, we propose a novel method of Lea…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/ddlee-cn/LeRF-PyTorch

  28. arXiv:2407.09862  [pdf, other]

    cs.CV

    ML-SemReg: Boosting Point Cloud Registration with Multi-level Semantic Consistency

    Authors: Shaocheng Yan, Pengcheng Shi, Jiayuan Li

    Abstract: Recent advances in point cloud registration mostly leverage geometric information. Although these methods have yielded promising results, they still struggle with problems of low overlap, thus limiting their practical usage. In this paper, we propose ML-SemReg, a plug-and-play point cloud registration framework that fully exploits semantic information. Our key insight is that mismatches can be cat…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  29. arXiv:2407.09826  [pdf, other]

    cs.CV

    3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance

    Authors: Xiaoxu Xu, Yitian Yuan, Jinlong Li, Qiudan Zhang, Zequn Jie, Lin Ma, Hao Tang, Nicu Sebe, Xu Wang

    Abstract: In this paper, we propose 3DSS-VLG, a weakly supervised approach for 3D Semantic Segmentation with 2D Vision-Language Guidance, an alternative approach that a 3D model predicts dense-embedding for each point which is co-embedded with both the aligned image and text spaces from the 2D vision-language model. Specifically, our method exploits the superior generalization ability of the 2D vision-langu…

    Submitted 13 July, 2024; originally announced July 2024.

  30. arXiv:2407.09475  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Adaptive Prediction Ensemble: Improving Out-of-Distribution Generalization of Motion Forecasting

    Authors: Jinning Li, Jiachen Li, Sangjae Bae, David Isele

    Abstract: Deep learning-based trajectory prediction models for autonomous driving often struggle with generalization to out-of-distribution (OOD) scenarios, sometimes performing worse than simple rule-based models. To address this limitation, we propose a novel framework, Adaptive Prediction Ensemble (APE), which integrates deep learning and rule-based prediction experts. A learned routing function, trained…

    Submitted 12 July, 2024; originally announced July 2024.

  31. arXiv:2407.09451  [pdf, other]

    cs.RO

    Benchmarking Large Neighborhood Search for Multi-Agent Path Finding

    Authors: Jiaqi Tan, Yudong Luo, Jiaoyang Li, Hang Ma

    Abstract: Multi-Agent Path Finding (MAPF) aims to arrange collision-free goal-reaching paths for a group of agents. Anytime MAPF solvers based on large neighborhood search (LNS) have gained prominence recently due to their flexibility and scalability. Neighborhood selection strategy is crucial to the success of MAPF-LNS and a flurry of methods have been proposed. However, several pitfalls exist and hinder a…

    Submitted 12 July, 2024; originally announced July 2024.

  32. arXiv:2407.08914  [pdf, other]

    cs.NI eess.SP

    Multi-objective Aerial Collaborative Secure Communication Optimization via Generative Diffusion Model-enabled Deep Reinforcement Learning

    Authors: Chuang Zhang, Geng Sun, Jiahui Li, Qingqing Wu, Jiacheng Wang, Dusit Niyato, Yuanwei Liu

    Abstract: Due to flexibility and low-cost, unmanned aerial vehicles (UAVs) are increasingly crucial for enhancing coverage and functionality of wireless networks. However, incorporating UAVs into next-generation wireless communication systems poses significant challenges, particularly in sustaining high-rate and long-range secure communications against eavesdropping attacks. In this work, we consider a UAV…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE Transactions on Mobile Computing

  33. arXiv:2407.08551  [pdf, other]

    cs.CL cs.SD eess.AS

    Autoregressive Speech Synthesis without Vector Quantization

    Authors: Lingwei Meng, Long Zhou, Shujie Liu, Sanyuan Chen, Bing Han, Shujie Hu, Yanqing Liu, Jinyu Li, Sheng Zhao, Xixin Wu, Helen Meng, Furu Wei

    Abstract: We present MELLE, a novel continuous-valued tokens based language modeling approach for text to speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition, bypassing the need for vector quantization, which are originally designed for audio compression and sacrifice fidelity compared to mel-spectrograms. Specifically, (i) instead of cross…

    Submitted 11 July, 2024; originally announced July 2024.

  34. arXiv:2407.08374  [pdf, other]

    cs.CV

    Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization

    Authors: Jinlong Li, Zequn Jie, Elisa Ricci, Lin Ma, Nicu Sebe

    Abstract: Efficient finetuning of vision-language models (VLMs) like CLIP for specific downstream tasks is gaining significant attention. Previous works primarily focus on prompt learning to adapt the CLIP into a variety of downstream tasks, however, suffering from task overfitting when finetuned on a small data set. In this paper, we introduce an orthogonal finetuning method for efficiently updating pretra…

    Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  35. Chromosomal Structural Abnormality Diagnosis by Homologous Similarity

    Authors: Juren Li, Fanzhe Fu, Ran Wei, Yifei Sun, Zeyu Lai, Ning Song, Xin Chen, Yang Yang

    Abstract: Pathogenic chromosome abnormalities are very common among the general population. While numerical chromosome abnormalities can be quickly and precisely detected, structural chromosome abnormalities are far more complex and typically require considerable efforts by human experts for identification. This paper focuses on investigating the modeling of chromosome features and the identification of chr…

    Submitted 11 July, 2024; originally announced July 2024.

  36. arXiv:2407.08189  [pdf, other]

    cs.CL cs.AI

    fairBERTs: Erasing Sensitive Information Through Semantic and Fairness-aware Perturbations

    Authors: Jinfeng Li, Yuefeng Chen, Xiangyu Liu, Longtao Huang, Rong Zhang, Hui Xue

    Abstract: Pre-trained language models (PLMs) have revolutionized both the natural language processing research and applications. However, stereotypical biases (e.g., gender and racial discrimination) encoded in PLMs have raised negative ethical implications for PLMs, which critically limits their broader applications. To address the aforementioned unfairness issues, we present fairBERTs, a general framework…

    Submitted 11 July, 2024; originally announced July 2024.

  37. arXiv:2407.08148  [pdf, other]

    cs.CV

    SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning

    Authors: Runmin Zhang, Jun Ma, Si-Yuan Cao, Lun Luo, Beinan Yu, Shu-Jie Chen, Junwei Li, Hui-Liang Shen

    Abstract: We propose a novel unsupervised cross-modal homography estimation framework based on intra-modal Self-supervised learning, Correlation, and consistent feature map Projection, namely SCPNet. The concept of intra-modal self-supervised learning is first presented to facilitate the unsupervised cross-modal homography estimation. The correlation-based homography estimation network and the consistent fe…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  38. arXiv:2407.08081  [pdf, other]

    cs.RO cs.HC

    RoCap: A Robotic Data Collection Pipeline for the Pose Estimation of Appearance-Changing Objects

    Authors: Jiahao Nick Li, Toby Chong, Zhongyi Zhou, Hironori Yoshida, Koji Yatani, Xiang 'Anthony' Chen, Takeo Igarashi

    Abstract: Object pose estimation plays a vital role in mixed-reality interactions when users manipulate tangible objects as controllers. Traditional vision-based object pose estimation methods leverage 3D reconstruction to synthesize training data. However, these methods are designed for static objects with diffuse colors and do not work well for objects that change their appearance during manipulation, suc…

    Submitted 10 July, 2024; originally announced July 2024.

  39. arXiv:2407.08039  [pdf, other]

    cs.CL

    Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models

    Authors: Yuji Zhang, Sha Li, Jiateng Liu, Pengfei Yu, Yi R. Fung, Jing Li, Manling Li, Heng Ji

    Abstract: Hallucination is often regarded as a major impediment for using large language models (LLMs), especially for knowledge-intensive tasks. Even when the training corpus consists solely of true statements, language models still generate hallucinations in the form of amalgamations of multiple facts. We coin this phenomenon as "knowledge overshadowing": when we query knowledge from a language model wi…

    Submitted 10 July, 2024; originally announced July 2024.

  40. arXiv:2407.07457  [pdf, other]

    cs.LG cs.CL

    GLBench: A Comprehensive Benchmark for Graph with Large Language Models

    Authors: Yuhan Li, Peisong Wang, Xiao Zhu, Aochuan Chen, Haiyun Jiang, Deng Cai, Victor Wai Kin Chan, Jia Li

    Abstract: The emergence of large language models (LLMs) has revolutionized the way we interact with graphs, leading to a new paradigm called GraphLLM. Despite the rapid development of GraphLLM methods in recent years, the progress and understanding of this field remain unclear due to the lack of a benchmark with consistent experimental protocols. To bridge this gap, we introduce GLBench, the first comprehen…

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2306.10280 by other authors

  41. arXiv:2407.07365  [pdf, other]

    cs.CV

    High-Resolution Cloud Detection Network

    Authors: Jingsheng Li, Tianxiang Xue, Jiayi Zhao, Jingmin Ge, Yufang Min, Wei Su, Kun Zhan

    Abstract: The complexity of clouds, particularly in terms of texture detail at high resolutions, has not been well explored by most existing cloud detection networks. This paper introduces the High-Resolution Cloud Detection Network (HR-cloud-Net), which utilizes a hierarchical high-resolution integration approach. HR-cloud-Net integrates a high-resolution representation module, layer-wise cascaded feature…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Journal of Electronic Imaging

  42. arXiv:2407.07324  [pdf, other]

    cs.CV

    Event-Aided Time-to-Collision Estimation for Autonomous Driving

    Authors: Jinghang Li, Bangyan Liao, Xiuyuan LU, Peidong Liu, Shaojie Shen, Yi Zhou

    Abstract: Predicting a potential collision with leading vehicles is an essential functionality of any autonomous/assisted driving system. One bottleneck of existing vision-based solutions is that their updating rate is limited to the frame rate of standard cameras used. In this paper, we present a novel method that estimates the time to collision using a neuromorphic event-based camera, a biologically inspi…

    Submitted 16 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted to European Conference on Computer Vision 2024, dataset used in this paper can be found at https://nail-hnu.github.io/EventAidedTTC

  43. arXiv:2407.07307  [pdf, other]

    cs.CV

    Dual-stage Hyperspectral Image Classification Model with Spectral Supertoken

    Authors: Peifu Liu, Tingfa Xu, Jie Wang, Huan Chen, Huiyan Bai, Jianan Li

    Abstract: Hyperspectral image classification, a task that assigns pre-defined classes to each pixel in a hyperspectral image of remote sensing scenes, often faces challenges due to the neglect of correlations between spectrally similar pixels. This oversight can lead to inaccurate edge definitions and difficulties in managing minor spectral variations in contiguous areas. To address these issues, we introdu…

    Submitted 13 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  44. arXiv:2407.07295  [pdf, other]

    eess.IV cs.CE cs.CV

    Deformation-Recovery Diffusion Model (DRDM): Instance Deformation for Image Manipulation and Synthesis

    Authors: Jian-Qing Zheng, Yuanhan Mo, Yang Sun, Jiahua Li, Fuping Wu, Ziyang Wang, Tonia Vincent, Bartłomiej W. Papież

    Abstract: In medical imaging, the diffusion models have shown great potential in synthetic image generation tasks. However, these models often struggle with the interpretable connections between the generated and existing images and could create illusions. To address these challenges, our research proposes a novel diffusion-based generative model based on deformation diffusion and recovery. This model, name…

    Submitted 9 July, 2024; originally announced July 2024.

  45. arXiv:2407.07035  [pdf, other]

    cs.CL cs.CV

    Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

    Authors: Yue Zhang, Ziqiao Ma, Jialu Li, Yanyuan Qiao, Zun Wang, Joyce Chai, Qi Wu, Mohit Bansal, Parisa Kordjamshidi

    Abstract: Vision-and-Language Navigation (VLN) has gained increasing attention over recent years and many approaches have emerged to advance their development. The remarkable achievements of foundation models have shaped the challenges and proposed methods for VLN research. In this survey, we provide a top-down review that adopts a principled framework for embodied planning and reasoning, and emphasizes the…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Authors contributed equally to this work, and supervisors contributed equal advising to this work

  46. arXiv:2407.06623  [pdf, other]

    cs.NI

    SKYCASTLE: Taming LEO Mobility to Facilitate Seamless and Low-latency Satellite Internet Services

    Authors: Jihao Li, Hewu Li, Zeqi Lai, Qian Wu, Weisen Liu, Xiaomo Wang, Yuanjie Li, Jun Liu, Qi Zhang

    Abstract: Emerging integrated space and terrestrial networks (ISTN) built upon low earth orbit (LEO) satellite constellations aim at providing planet-wide Internet services, not only for residential users, but also for mobile users (e.g., in airplane and cruise scenarios). Efficiently managing global mobility and keeping connections active for mobile users is critical for ISTN operators. However, our quanti…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 10 pages, 10 figures, accepted by IEEE INFOCOM 2024

    Journal ref: IEEE International Conference on Computer Communications 2024

  47. arXiv:2407.06614  [pdf, other]

    eess.IV cs.CV

    Implicit Regression in Subspace for High-Sensitivity CEST Imaging

    Authors: Chu Chen, Yang Liu, Se Weon Park, Jizhou Li, Kannie W. Y. Chan, Raymond H. F. Chan

    Abstract: Chemical Exchange Saturation Transfer (CEST) MRI demonstrates its capability in significantly enhancing the detection of proteins and metabolites with low concentrations through exchangeable protons. The clinical application of CEST, however, is constrained by its low contrast and low signal-to-noise ratio (SNR) in the acquired data. Denoising, as one of the post-processing stages for CEST data, c…

    Submitted 9 July, 2024; originally announced July 2024.

  48. arXiv:2407.06546  [pdf, other]

    cs.CV cs.RO

    Exploring the Causality of End-to-End Autonomous Driving

    Authors: Jiankun Li, Hao Li, Jiangjiang Liu, Zhikang Zou, Xiaoqing Ye, Fan Wang, Jizhou Huang, Hua Wu, Haifeng Wang

    Abstract: Deep learning-based models are widely deployed in autonomous driving areas, especially the increasingly noticed end-to-end solutions. However, the black-box property of these models raises concerns about their trustworthiness and safety for autonomous driving, and how to debug the causality has become a pressing concern. Despite some existing research on the explainability of autonomous driving, t…

    Submitted 9 July, 2024; originally announced July 2024.

  49. arXiv:2407.06524  [pdf, other]

    cs.SD cs.MM eess.AS

    Improving Speech Enhancement by Integrating Inter-Channel and Band Features with Dual-branch Conformer

    Authors: Jizhen Li, Xinmeng Xu, Weiping Tu, Yuhong Yang, Rong Zhu

    Abstract: Recent speech enhancement methods based on convolutional neural networks (CNNs) and transformers have been shown to capture time-frequency (T-F) information on spectrograms effectively. However, the correlation among the channels of speech features has not been explored. Theoretically, each channel map of speech features obtained by different convolution kernels contains information with diffe…

    Submitted 13 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  50. arXiv:2407.06505  [pdf]

    cs.HC

    Not all explicit cues help communicate: Pedestrians' perceptions, fixations, and decisions toward automated vehicles with varied appearance

    Authors: Wei Lyu, Yaqin Cao, Yi Ding, Jingyu Li, Kai Tian, Hui Zhang

    Abstract: Given pedestrians' vulnerability in road traffic, it remains unclear how novel AV appearances will impact pedestrians' crossing behaviour. To address this gap, this study pioneers an investigation into the influence of AVs' exterior design, correlated with their kinematics, on pedestrians' road-crossing perception and decision-making. A video-based eye-tracking experimental study was conducted with…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 37 pages, 13 figures, 4 tables