Showing 1–50 of 3,965 results for author: Liu, X

  1. arXiv:2407.13757 [pdf, other]

    cs.CL cs.AI cs.CR

    Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models

    Authors: Zhuo Chen, Jiawei Liu, Haotan Liu, Qikai Cheng, Fan Zhang, Wei Lu, Xiaozhong Liu

    Abstract: Retrieval-Augmented Generation (RAG) is applied to solve hallucination problems and real-time constraints of large language models, but it also induces vulnerabilities against retrieval corruption attacks. Existing research mainly explores the unreliability of RAG in white-box and closed-domain QA tasks. In this paper, we aim to reveal the vulnerabilities of Retrieval-Enhanced Generative (RAG) mod…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 10 pages, 3 figures, under review

  2. arXiv:2407.13719 [pdf, other]

    cs.CV

    HazeCLIP: Towards Language Guided Real-World Image Dehazing

    Authors: Ruiyi Wang, Wenhao Li, Xiaohong Liu, Chunyi Li, Zicheng Zhang, Xiongkuo Min, Guangtao Zhai

    Abstract: Existing methods have achieved remarkable performance in single image dehazing, particularly on synthetic datasets. However, they often struggle with real-world hazy images due to domain shift, limiting their practical applicability. This paper introduces HazeCLIP, a language-guided adaptation framework designed to enhance the real-world performance of pre-trained dehazing networks. Inspired by th…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  3. arXiv:2407.13545 [pdf, other]

    eess.IV cs.CV

    DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays

    Authors: Xuhui Liu, Zhi Qiao, Runkun Liu, Hong Li, Juan Zhang, Xiantong Zhen, Zhen Qian, Baochang Zhang

    Abstract: Computed tomography (CT) is widely utilized in clinical settings because it delivers detailed 3D images of the human body. However, performing CT scans is not always feasible due to radiation exposure and limitations in certain surgical environments. As an alternative, reconstructing CT images from ultra-sparse X-rays offers a valuable solution and has gained significant interest in scientific res…

    Submitted 18 July, 2024; originally announced July 2024.

  4. arXiv:2407.13193 [pdf, other]

    cs.CL

    Retrieval-Augmented Generation for Natural Language Processing: A Survey

    Authors: Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue

    Abstract: Large language models (LLMs) have demonstrated great success in various fields, benefiting from the huge number of parameters that store knowledge. However, LLMs still suffer from several key issues, such as hallucination problems, knowledge update issues, and a lack of domain-specific expertise. The appearance of retrieval-augmented generation (RAG), which leverages an external knowledge database…

    Submitted 18 July, 2024; originally announced July 2024.

  5. arXiv:2407.12996 [pdf, other]

    stat.ML cs.LG

    Sharpness-diversity tradeoff: improving flat ensembles with SharpBalance

    Authors: Haiquan Lu, Xiaotian Liu, Yefan Zhou, Qunli Li, Kurt Keutzer, Michael W. Mahoney, Yujun Yan, Huanrui Yang, Yaoqing Yang

    Abstract: Recent studies on deep ensembles have identified the sharpness of the local minima of individual learners and the diversity of the ensemble members as key factors in improving test-time performance. Building on this, our study investigates the interplay between sharpness and diversity within deep ensembles, illustrating their crucial role in robust generalization to both in-distribution (ID) and o…

    Submitted 17 July, 2024; originally announced July 2024.

  6. arXiv:2407.12718 [pdf, other]

    cs.CV

    SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow

    Authors: Yuanzhi Zhu, Xingchao Liu, Qiang Liu

    Abstract: Diffusion models excel in high-quality generation but suffer from slow inference due to iterative sampling. While recent methods have successfully transformed diffusion models into one-step generators, they neglect model size reduction, limiting their applicability in compute-constrained scenarios. This paper aims to develop small, efficient one-step diffusion models based on the powerful rectifie…

    Submitted 17 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  7. arXiv:2407.12611 [pdf, other]

    cs.CV

    Deep Mutual Learning among Partially Labeled Datasets for Multi-Organ Segmentation

    Authors: Xiaoyu Liu, Linhao Qu, Ziyue Xie, Yonghong Shi, Zhijian Song

    Abstract: The task of labeling multiple organs for segmentation is a complex and time-consuming process, resulting in a scarcity of comprehensively labeled multi-organ datasets alongside the emergence of numerous partially labeled datasets. Current methods are inadequate in effectively utilizing the supervised information available from these datasets, thereby impeding the progress in improving the segmentation…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 10 pages, 9 figures

  8. arXiv:2407.12448 [pdf, other]

    cs.LG

    Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning

    Authors: Xu-Hui Liu, Tian-Shuo Liu, Shengyi Jiang, Ruifeng Chen, Zhilong Zhang, Xinwei Chen, Yang Yu

    Abstract: Combining offline and online reinforcement learning (RL) techniques is indeed crucial for achieving efficient and safe learning where data acquisition is expensive. Existing methods replay offline data directly in the online phase, resulting in a significant challenge of data distribution shift and subsequently causing inefficiency in online fine-tuning. To address this issue, we introduce an inno…

    Submitted 17 July, 2024; originally announced July 2024.

  9. arXiv:2407.12431 [pdf, other]

    cs.CV

    GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval

    Authors: Han Zhou, Wei Dong, Xiaohong Liu, Shuaicheng Liu, Xiongkuo Min, Guangtao Zhai, Jun Chen

    Abstract: Most existing Low-light Image Enhancement (LLIE) methods either directly map Low-Light (LL) to Normal-Light (NL) images or use semantic or illumination maps as guides. However, the ill-posed nature of LLIE and the difficulty of semantic retrieval from impaired inputs limit these methods, especially in extremely low-light conditions. To address this issue, we present a new LLIE network via Generati…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  10. arXiv:2407.12329 [pdf, other]

    cs.CV

    Label-Efficient 3D Brain Segmentation via Complementary 2D Diffusion Models with Orthogonal Views

    Authors: Jihoon Cho, Suhyun Ahn, Beomju Kim, Hyungjoon Bae, Xiaofeng Liu, Fangxu Xing, Kyungeun Lee, Georges Elfakhri, Van Wedeen, Jonghye Woo, Jinah Park

    Abstract: Deep learning-based segmentation techniques have shown remarkable performance in brain segmentation, yet their success hinges on the availability of extensive labeled training data. Acquiring such vast datasets, however, poses a significant challenge in many clinical applications. To address this issue, in this work, we propose a novel 3D brain segmentation approach using complementary 2D diffusio…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Extended version of "3D Segmentation of Subcortical Brain Structure with Few Labeled Data using 2D Diffusion Models" (ISMRM 2024 oral)

  11. arXiv:2407.12274 [pdf, other]

    cs.CV

    MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics

    Authors: Cong Cai, Shan Liang, Xuefei Liu, Kang Zhu, Zhengqi Wen, Jianhua Tao, Heng Xie, Jizhou Cui, Yiming Ma, Zhenhua Cheng, Hanzhe Xu, Ruibo Fu, Bin Liu, Yongwei Li

    Abstract: Deception detection has garnered increasing attention in recent years due to the significant growth of digital media and heightened ethical and security concerns. It has been extensively studied using multimodal methods, including video, audio, and text. In addition, individual differences in deception production and detection are believed to play a crucial role. Although some studies have utilized…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Code and data are available; Submitted to NeurIPS 2024 Datasets and Benchmarks Track

  12. arXiv:2407.12258 [pdf, other]

    cs.CV

    Facial Affect Recognition based on Multi Architecture Encoder and Feature Fusion for the ABAW7 Challenge

    Authors: Kang Shen, Xuxiong Liu, Boyan Wang, Jun Yao, Xin Liu, Yujie Guan, Yu Wang, Gengchen Li, Xiao Sun

    Abstract: In this paper, we present our approach to addressing the challenges of the 7th ABAW competition. The competition comprises three sub-challenges: Valence Arousal (VA) estimation, Expression (Expr) classification, and Action Unit (AU) detection. To tackle these challenges, we employ state-of-the-art models to extract powerful visual features. Subsequently, a Transformer Encoder is utilized to integr…

    Submitted 16 July, 2024; originally announced July 2024.

  13. arXiv:2407.12257 [pdf, other]

    cs.CV

    Compound Expression Recognition via Multi Model Ensemble for the ABAW7 Challenge

    Authors: Xuxiong Liu, Kang Shen, Jun Yao, Boyan Wang, Minrui Liu, Liuwei An, Zishun Cui, Weijie Feng, Xiao Sun

    Abstract: Compound Expression Recognition (CER) is vital for effective interpersonal interactions. Human emotional expressions are inherently complex due to the presence of compound expressions, requiring the consideration of both local and global facial cues for accurate judgment. In this paper, we propose an ensemble learning-based solution to address this complexity. Our approach involves training three…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.12572 by other authors

  14. arXiv:2407.12021 [pdf, other]

    cs.CL cs.AI

    Adaptive Draft-Verification for Efficient Large Language Model Decoding

    Authors: Xukun Liu, Bowen Lei, Ruqi Zhang, Dongkuan Xu

    Abstract: Large language model (LLM) decoding involves generating a sequence of tokens based on a given context, where each token is predicted one at a time using the model's learned probabilities. The typical autoregressive decoding method requires a separate forward pass through the model for each token generated, which is computationally inefficient and poses challenges for deploying LLMs in latency-sens…

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Under review at NeurIPS 2024

  15. arXiv:2407.12019 [pdf, other]

    cs.CL cs.AI

    DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model

    Authors: Shezheng Song, Shasha Li, Jie Yu, Shan Zhao, Xiaopeng Li, Jun Ma, Xiaodong Liu, Zhuo Li, Xiaoguang Mao

    Abstract: Our study delves into Multimodal Entity Linking, aligning mentions in multimodal information with entities in a knowledge base. Existing methods still face challenges such as ambiguous entity representations and limited utilization of image information. Thus, we propose dynamic entity extraction using ChatGPT, which dynamically extracts entities and enhances datasets. We also propose a method: Dy…

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Published in PRCV 2024

  16. arXiv:2407.11745 [pdf, other]

    eess.AS cs.AI cs.SD

    Universal Sound Separation with Self-Supervised Audio Masked Autoencoder

    Authors: Junqi Zhao, Xubo Liu, Jinzheng Zhao, Yi Yuan, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

    Abstract: Universal sound separation (USS) is the task of separating mixtures of arbitrary sound sources. Typically, universal separation models are trained from scratch in a supervised manner, using labeled data. Self-supervised learning (SSL) is an emerging deep learning approach that leverages unlabeled data to obtain task-agnostic representations, which can benefit many downstream tasks. In this paper, we…

    Submitted 16 July, 2024; originally announced July 2024.

  17. arXiv:2407.11712 [pdf, other]

    cs.IR

    Harnessing Large Language Models for Multimodal Product Bundling

    Authors: Xiaohao Liu, Jie Wu, Zhulin Tao, Yunshan Ma, Yinwei Wei, Tat-seng Chua

    Abstract: Product bundling provides clients with a strategic combination of individual items. It has gained significant attention in recent years as a fundamental prerequisite for online services. Recent methods utilize multimodal information through sophisticated extractors for bundling, but remain limited by inferior semantic understanding, the restricted scope of knowledge, and an inability to handle…

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: under review

  18. arXiv:2407.11585 [pdf, other]

    cs.CV cs.AI

    QVD: Post-training Quantization for Video Diffusion Models

    Authors: Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie

    Abstract: Recently, video diffusion models (VDMs) have garnered significant attention due to their notable advancements in generating coherent and realistic video content. However, processing multiple frame features concurrently, coupled with the considerable model size, results in high latency and extensive memory consumption, hindering their broader application. Post-training quantization (PTQ) is an effe…

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  19. arXiv:2407.11073 [pdf, other]

    cs.CR cs.CV cs.LG

    SemiAdv: Query-Efficient Black-Box Adversarial Attack with Unlabeled Images

    Authors: Mingyuan Fan, Yang Liu, Cen Chen, Ximeng Liu

    Abstract: Adversarial attack has garnered considerable attention due to its profound implications for the secure deployment of robots in sensitive security scenarios. To potentially push for advances in the field, this paper studies the adversarial attack in the black-box setting and proposes an unlabeled data-driven adversarial attack method, called SemiAdv. Specifically, SemiAdv achieves the following bre…

    Submitted 12 July, 2024; originally announced July 2024.

  20. arXiv:2407.10671 [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  21. arXiv:2407.10459 [pdf, other]

    cs.CV

    DiffStega: Towards Universal Training-Free Coverless Image Steganography with Diffusion Models

    Authors: Yiwei Yang, Zheyuan Liu, Jun Jia, Zhongpai Gao, Yunhao Li, Wei Sun, Xiaohong Liu, Guangtao Zhai

    Abstract: Traditional image steganography focuses on concealing one image within another, aiming to avoid steganalysis by unauthorized entities. Coverless image steganography (CIS) enhances imperceptibility by not using any cover image. Recent works have utilized text prompts as keys in CIS through diffusion models. However, this approach faces three challenges: it is invalidated when the private prompt is guessed, c…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 9 pages, 7 figures; reference added; accepted at IJCAI2024 main track

  22. arXiv:2407.10446 [pdf, other]

    cs.SD cs.AI eess.AS

    DDFAD: Dataset Distillation Framework for Audio Data

    Authors: Wenbo Jiang, Rui Zhang, Hongwei Li, Xiaoyuan Liu, Haomiao Yang, Shui Yu

    Abstract: Deep neural networks (DNNs) have achieved significant success in numerous applications. The remarkable performance of DNNs is largely attributed to the availability of massive, high-quality training datasets. However, processing such massive training data requires huge computational and storage resources. Dataset distillation is a promising solution to this problem, offering the capability to comp…

    Submitted 15 July, 2024; originally announced July 2024.

  23. arXiv:2407.10068 [pdf, other]

    cs.CL

    Multi-Granularity Semantic Revision for Large Language Model Distillation

    Authors: Xiaoyu Liu, Yun Zhang, Wei Li, Simiao Li, Xudong Huang, Hanting Chen, Yehui Tang, Jie Hu, Zhiwei Xiong, Yunhe Wang

    Abstract: Knowledge distillation plays a key role in compressing the Large Language Models (LLMs), which boosts a small-size student model under large teacher models' guidance. However, existing LLM distillation methods overly rely on student-generated outputs, which may introduce generation errors and misguide the distillation process. Moreover, the distillation loss functions introduced in previous art st…

    Submitted 13 July, 2024; originally announced July 2024.

  24. arXiv:2407.10047 [pdf, other]

    cs.CV

    HSFusion: A high-level vision task-driven infrared and visible image fusion network via semantic and geometric domain transformation

    Authors: Chengjie Jiang, Xiaowen Liu, Bowen Zheng, Lu Bai, Jing Li

    Abstract: Infrared and visible image fusion has evolved from vision-perception-oriented fusion methods to strategies that consider both vision perception and high-level vision tasks. However, the existing task-driven methods fail to address the domain gap between semantic and geometric representation. To overcome these issues, we propose a high-level vision task-driven infrared and visible image…

    Submitted 13 July, 2024; originally announced July 2024.

  25. arXiv:2407.09918 [pdf, other]

    eess.IV cs.CV

    DiffRect: Latent Diffusion Label Rectification for Semi-supervised Medical Image Segmentation

    Authors: Xinyu Liu, Wuyang Li, Yixuan Yuan

    Abstract: Semi-supervised medical image segmentation aims to leverage limited annotated data and rich unlabeled data to perform accurate segmentation. However, existing semi-supervised methods are highly dependent on the quality of self-generated pseudo labels, which are prone to incorrect supervision and confirmation bias. Meanwhile, they are insufficient in capturing the label distributions in latent spac…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  26. arXiv:2407.09872 [pdf, other]

    cs.SE

    A Systematic Literature Review on Task Recommendation Systems for Crowdsourced Software Engineering

    Authors: Shashiwadana Nirmani, Mojtaba Shahin, Hourieh Khalajzadeh, Xiao Liu

    Abstract: Context: Crowdsourced Software Engineering (CSE) offers outsourcing work to software practitioners by leveraging a global online workforce. However, these software practitioners struggle to identify suitable tasks due to the variety of options available. Hence, there has been a growing number of studies on introducing recommendation systems to recommend CSE tasks to software practitioners. Objective:…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 33 pages, 3 figures

  27. arXiv:2407.09820 [pdf]

    cs.CY

    Mining individual daily commuting patterns of dockless bike-sharing users: a two-layer framework integrating spatiotemporal flow clustering and rule-based decision trees

    Authors: Caigang Zhuang, Shaoying Li, Xiaoping Liu

    Abstract: The rise of dockless bike-sharing systems has led to increased interest in using bike-sharing data for urban transportation and travel behavior research. However, few studies have focused on the individual daily mobility patterns, hindering their alignment with the increasingly refined needs of urban active transportation planning. To bridge this gap, this study presents a two-layer framework, int…

    Submitted 13 July, 2024; originally announced July 2024.

  28. arXiv:2407.09817 [pdf, other]

    cs.SD cs.CL eess.AS

    Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System

    Authors: Lingwei Meng, Jiawen Kang, Yuejiao Wang, Zengrui Jin, Xixin Wu, Xunying Liu, Helen Meng

    Abstract: Multi-talker speech recognition and target-talker speech recognition, which both involve transcription in multi-talker contexts, remain significant challenges. However, existing methods rarely attempt to simultaneously address both tasks. In this study, we propose a pioneering approach to empower Whisper, which is a speech foundation model, to tackle joint multi-talker and target-talker speech recogniti…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted to INTERSPEECH 2024

  29. arXiv:2407.09590 [pdf, other]

    cs.CL cs.LG

    Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts

    Authors: Zeliang Zhang, Xiaodong Liu, Hao Cheng, Chenliang Xu, Jianfeng Gao

    Abstract: By increasing model parameters but activating them sparsely when performing a task, the use of Mixture-of-Experts (MoE) architecture significantly improves the performance of Large Language Models (LLMs) without increasing the inference cost. However, the memory consumption due to the growing number of experts presents a challenge to the deployment of these models in many real-world settings. Our…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 13 pages, 6 figures

  30. arXiv:2407.09546 [pdf, other]

    q-fin.TR cs.SI

    A Reflective LLM-based Agent to Guide Zero-shot Cryptocurrency Trading

    Authors: Yuan Li, Bingqiao Luo, Qian Wang, Nuo Chen, Xu Liu, Bingsheng He

    Abstract: The utilization of Large Language Models (LLMs) in financial trading has primarily been concentrated within the stock market, aiding in economic and financial decisions. Yet, the unique opportunities presented by the cryptocurrency market, noted for its on-chain data's transparency and the critical influence of off-chain signals like news, remain largely untapped by LLMs. This work aims to bridge…

    Submitted 27 June, 2024; originally announced July 2024.

  31. arXiv:2407.09120 [pdf, other]

    cs.LG cs.CL cs.CV

    URRL-IMVC: Unified and Robust Representation Learning for Incomplete Multi-View Clustering

    Authors: Ge Teng, Ting Mao, Chen Shen, Xiang Tian, Xuesong Liu, Yaowu Chen, Jieping Ye

    Abstract: Incomplete multi-view clustering (IMVC) aims to cluster multi-view data that are only partially available. This poses two main challenges: effectively leveraging multi-view information and mitigating the impact of missing views. Prevailing solutions employ cross-view contrastive learning and missing view recovery techniques. However, they either neglect valuable complementary information by focusi…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM SIGKDD 2024

  32. arXiv:2407.09016 [pdf, other]

    cs.RO

    OVExp: Open Vocabulary Exploration for Object-Oriented Navigation

    Authors: Meng Wei, Tai Wang, Yilun Chen, Hanqing Wang, Jiangmiao Pang, Xihui Liu

    Abstract: Object-oriented embodied navigation aims to locate specific objects, defined by category or depicted in images. Existing methods often struggle to generalize to open vocabulary goals without extensive training data. While recent advances in Vision-Language Models (VLMs) offer a promising solution by extending object recognition beyond predefined categories, efficient goal-oriented exploration beco…

    Submitted 12 July, 2024; originally announced July 2024.

  33. arXiv:2407.08952 [pdf, other]

    cs.CL cs.AI

    Detect, Investigate, Judge and Determine: A Novel LLM-based Framework for Few-shot Fake News Detection

    Authors: Ye Liu, Jiajun Zhu, Kai Zhang, Haoyu Tang, Yanghai Zhang, Xukai Liu, Qi Liu, Enhong Chen

    Abstract: Few-Shot Fake News Detection (FS-FND) aims to distinguish inaccurate news from real ones in extremely low-resource scenarios. This task has garnered increased attention due to the widespread dissemination and harmful impact of fake news on social media. Large Language Models (LLMs) have demonstrated competitive performance with the help of their rich prior knowledge and excellent in-context learni…

    Submitted 11 July, 2024; originally announced July 2024.

  34. arXiv:2407.08939 [pdf, other]

    cs.CV

    LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models

    Authors: Hai Jiang, Ao Luo, Xiaohong Liu, Songchen Han, Shuaicheng Liu

    Abstract: In this paper, we propose a diffusion-based unsupervised framework that incorporates physically explainable Retinex theory with diffusion models for low-light image enhancement, named LightenDiffusion. Specifically, we present a content-transfer decomposition network that performs Retinex decomposition within the latent space instead of image space as in previous approaches, enabling the encoded f…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  35. arXiv:2407.08855 [pdf, other]

    eess.IV cs.CV

    BraTS-PEDs: Results of the Multi-Consortium International Pediatric Brain Tumor Segmentation Challenge 2023

    Authors: Anahita Fathi Kazerooni, Nastaran Khalili, Xinyang Liu, Debanjan Haldar, Zhifan Jiang, Anna Zapaishchykova, Julija Pavaine, Lubdha M. Shah, Blaise V. Jones, Nakul Sheth, Sanjay P. Prabhu, Aaron S. McAllister, Wenxin Tu, Khanak K. Nandolia, Andres F. Rodriguez, Ibraheem Salman Shaikh, Mariana Sanchez Montano, Hollie Anne Lai, Maruf Adewole, Jake Albrecht, Udunna Anazodo, Hannah Anderson, Syed Muhammed Anwar, Alejandro Aristizabal, Sina Bagheri, et al. (55 additional authors not shown)

    Abstract: Pediatric central nervous system tumors are the leading cause of cancer-related deaths in children. The five-year survival rate for high-grade glioma in children is less than 20%. The development of new treatments is dependent upon multi-institutional collaborative clinical trials requiring reproducible and accurate centralized response assessment. We present the results of the BraTS-PEDs 2023 cha…

    Submitted 16 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  36. arXiv:2407.08418 [pdf, other]

    cs.LG cs.CV

    PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines

    Authors: ZiDong Wang, Zeyu Lu, Di Huang, Tong He, Xihui Liu, Wanli Ouyang, Lei Bai

    Abstract: In this paper, we introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks. Despite significant progress in this field, there remains a lack of a standardized framework for a detailed and comparative analysis of various prediction network architectures. PredBench addresses this gap by conducting large-scale experiments, upholding standardized and…

    Submitted 11 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  37. arXiv:2407.08189 [pdf, other]

    cs.CL cs.AI

    fairBERTs: Erasing Sensitive Information Through Semantic and Fairness-aware Perturbations

    Authors: Jinfeng Li, Yuefeng Chen, Xiangyu Liu, Longtao Huang, Rong Zhang, Hui Xue

    Abstract: Pre-trained language models (PLMs) have revolutionized both the natural language processing research and applications. However, stereotypical biases (e.g., gender and racial discrimination) encoded in PLMs have raised negative ethical implications for PLMs, which critically limits their broader applications. To address the aforementioned unfairness issues, we present fairBERTs, a general framework…

    Submitted 11 July, 2024; originally announced July 2024.

  38. arXiv:2407.07799 [pdf, other]

    cs.CL

    Attribute or Abstain: Large Language Models as Long Document Assistants

    Authors: Jan Buchmann, Xiao Liu, Iryna Gurevych

    Abstract: LLMs can help humans working with long documents, but are known to hallucinate. Attribution can increase trust in LLM responses: The LLM provides evidence that supports its response, which enhances verifiability. Existing approaches to attribution have only been evaluated in RAG settings, where the initial retrieval confounds LLM performance. This is crucially different from the long document sett…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Code and data: https://github.com/UKPLab/arxiv2024-attribute-or-abstain

  39. arXiv:2407.07723 [pdf, other]

    cs.IT cs.AI

    Understanding is Compression

    Authors: Ziguang Li, Chao Huang, Xuliang Wang, Haibo Hu, Cole Wyeth, Dongbo Bu, Quan Yu, Wen Gao, Xingwu Liu, Ming Li

    Abstract: We have previously shown that all understanding or learning is compression, under reasonable assumptions. In principle, better understanding of data should improve data compression. Traditional compression methodologies focus on encoding frequencies or some other computable properties of data. Large language models approximate the uncomputable Solomonoff distribution, opening up a whole new avenue to…

    Submitted 23 June, 2024; originally announced July 2024.

  40. arXiv:2407.07392 [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems

    Authors: Chashi Mahiul Islam, Shaeke Salman, Montasir Shams, Xiuwen Liu, Piyush Kumar

    Abstract: Building on the unprecedented capabilities of large language models for command understanding and zero-shot recognition of multi-modal vision-language transformers, visual language navigation (VLN) has emerged as an effective way to address multiple fundamental challenges toward a natural language interface to robot navigation. However, such vision-language models are inherently vulnerable due to…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 8 pages, 5 figures. This paper has been accepted for publication at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024

  41. arXiv:2407.07374 [pdf, other]

    cs.CV

    DuInNet: Dual-Modality Feature Interaction for Point Cloud Completion

    Authors: Xinpu Liu, Baolin Hou, Hanyun Wang, Ke Xu, Jianwei Wan, Yulan Guo

    Abstract: To further promote the development of multimodal point cloud completion, we contribute a large-scale multimodal point cloud completion benchmark ModelNet-MPC with richer shape categories and more diverse test data, which contains nearly 400,000 pairs of high-quality point clouds and rendered images of 40 categories. Besides the fully supervised point cloud completion task, two additional tasks inc…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Under Review, 13 pages, 7 figures

  42. arXiv:2407.07076 [pdf, other]

    eess.IV cs.CV

    MADE-for-ASD: A Multi-Atlas Deep Ensemble Network for Diagnosing Autism Spectrum Disorder

    Authors: Md Rakibul Hasan, Xuehan Liu, Tom Gedeon, Md Zakir Hossain

    Abstract: In response to the global need for efficient early diagnosis of Autism Spectrum Disorder (ASD), this paper bridges the gap between traditional, time-consuming diagnostic methods and potential automated solutions. We propose a multi-atlas deep ensemble network, MADE-for-ASD, that integrates multiple atlases of the brain's functional magnetic resonance imaging (fMRI) data through a weighted deep ens…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Md Rakibul Hasan and Xuehan Liu contributed equally to this work

  43. arXiv:2407.06310  [pdf, other]

    cs.SD cs.AI cs.HC cs.LG eess.AS

    Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

    Authors: Mengzhe Geng, Xurong Xie, Jiajun Deng, Zengrui Jin, Guinan Li, Tianzi Wang, Shujie Hu, Zhaoqing Li, Helen Meng, Xunying Liu

    Abstract: The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by their mismatch against healthy and nonaged voices, data scarcity and large speaker-level variability. To this end, this paper proposes two novel data-efficient methods to learn homogeneous dysarthric and elderly speaker-level features for rapid, on-the-fly test-…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: In submission to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  44. arXiv:2407.06293  [pdf, other]

    cs.CE physics.app-ph

    A Framework for Simulating the Path-level Residual Stress in the Laser Powder Bed Fusion Process

    Authors: Xin Liu, Xingchen Liu, Paul Witherell

    Abstract: Laser Powder Bed Fusion (LPBF) additive manufacturing has revolutionized industries with its capability to create intricate and customized components. The LPBF process uses moving heat sources to melt and solidify metal powders. The fast melting and cooling lead to residual stress, which critically affects the part quality. Currently, the computational intensity of accurately simulating the resid…

    Submitted 10 April, 2024; originally announced July 2024.

  45. arXiv:2407.06191  [pdf, other]

    cs.CV

    Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images

    Authors: Zhangyang Qi, Yunhan Yang, Mengchen Zhang, Long Xing, Xiaoyang Wu, Tong Wu, Dahua Lin, Xihui Liu, Jiaqi Wang, Hengshuang Zhao

    Abstract: Recent advances in 3D AIGC have shown promise in directly creating 3D objects from text and images, offering significant cost savings in animation and product design. However, detailed editing and customization of 3D assets remain a long-standing challenge. Specifically, 3D generation methods lack the ability to follow finely detailed instructions as precisely as their 2D image creation counterparts…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Project Page: https://tailor3d-2024.github.io/

  46. arXiv:2407.06152  [pdf, other]

    physics.chem-ph cs.AI

    Uni-ELF: A Multi-Level Representation Learning Framework for Electrolyte Formulation Design

    Authors: Boshen Zeng, Sian Chen, Xinxin Liu, Changhong Chen, Bin Deng, Xiaoxu Wang, Zhifeng Gao, Yuzhi Zhang, Weinan E, Linfeng Zhang

    Abstract: Advancements in lithium battery technology heavily rely on the design and engineering of electrolytes. However, current schemes for molecular design and recipe optimization of electrolytes lack an effective computational-experimental closed loop and often fall short in accurately predicting diverse electrolyte formulation properties. In this work, we introduce Uni-ELF, a novel multi-level represen…

    Submitted 8 July, 2024; originally announced July 2024.

  47. arXiv:2407.06129  [pdf, other]

    cs.AI cs.HC

    Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization

    Authors: Hannah K. Bako, Arshnoor Bhutani, Xinyi Liu, Kwesi A. Cobbina, Zhicheng Liu

    Abstract: Automatically generating data visualizations in response to human utterances on datasets necessitates a deep semantic understanding of the data utterance, including implicit and explicit references to data attributes, visualization tasks, and necessary data preparation steps. Natural Language Interfaces (NLIs) for data visualization have explored ways to infer such information, yet challenges pers…

    Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures, IEEE VIS short papers

  48. arXiv:2407.05890  [pdf, other]

    cs.RO cs.CL

    Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation

    Authors: Jiaqi Chen, Bingqian Lin, Xinmin Liu, Xiaodan Liang, Kwan-Yee K. Wong

    Abstract: LLM-based agents have demonstrated impressive zero-shot performance in the vision-language navigation (VLN) task. However, these zero-shot methods focus only on solving high-level task planning by selecting nodes in predefined navigation graphs for movements, overlooking low-level control in realistic navigation scenarios. To bridge this gap, we propose AO-Planner, a novel affordances-oriented pla…

    Submitted 8 July, 2024; originally announced July 2024.

  49. arXiv:2407.05858  [pdf, other]

    cs.AI

    Empowering 1000 tokens/second on-device LLM prefilling with mllm-NPU

    Authors: Daliang Xu, Hao Zhang, Liming Yang, Ruiqi Liu, Gang Huang, Mengwei Xu, Xuanzhe Liu

    Abstract: On-device large language models (LLMs) are catalyzing novel mobile applications such as UI task automation and personalized email auto-reply, without giving away users' private data. However, on-device LLMs still suffer from unacceptably long inference latency, especially the time to first token (prefill stage) due to the need of long context for accurate, personalized content generation, as well…

    Submitted 8 July, 2024; originally announced July 2024.

  50. arXiv:2407.05610  [pdf, other]

    cs.CV

    Described Spatial-Temporal Video Detection

    Authors: Wei Ji, Xiangyan Liu, Yingfei Sun, Jiajun Deng, You Qin, Ammar Nuwanna, Mengyao Qiu, Lina Wei, Roger Zimmermann

    Abstract: Detecting visual content on language expression has become an emerging topic in the community. However, in the video domain, the existing setting, i.e., spatial-temporal video grounding (STVG), is formulated to only detect one pre-existing object in each frame, ignoring the fact that language descriptions can involve none or multiple entities within a video. In this work, we advance the STVG to a…

    Submitted 8 July, 2024; originally announced July 2024.