Skip to main content

Showing 1–50 of 2,811 results for author: Li, C

  1. arXiv:2407.13719  [pdf, other

    cs.CV

    HazeCLIP: Towards Language Guided Real-World Image Dehazing

    Authors: Ruiyi Wang, Wenhao Li, Xiaohong Liu, Chunyi Li, Zicheng Zhang, Xiongkuo Min, Guangtao Zhai

    Abstract: Existing methods have achieved remarkable performance in single image dehazing, particularly on synthetic datasets. However, they often struggle with real-world hazy images due to domain shift, limiting their practical applicability. This paper introduces HazeCLIP, a language-guided adaptation framework designed to enhance the real-world performance of pre-trained dehazing networks. Inspired by th… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  2. arXiv:2407.13217  [pdf, other

    cs.CV

    LIDIA: Precise Liver Tumor Diagnosis on Multi-Phase Contrast-Enhanced CT via Iterative Fusion and Asymmetric Contrastive Learning

    Authors: Wei Huang, Wei Liu, Xiaoming Zhang, Xiaoli Yin, Xu Han, Chunli Li, Yuan Gao, Yu Shi, Le Lu, Ling Zhang, Lei Zhang, Ke Yan

    Abstract: The early detection and precise diagnosis of liver tumors are tasks of critical clinical value, yet they pose significant challenges due to the high heterogeneity and variability of liver tumors. In this work, a precise LIver tumor DIAgnosis network on multi-phase contrast-enhance CT, named LIDIA, is proposed for real-world scenario. To fully utilize all available phases in contrast-enhanced CT, L… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  3. arXiv:2407.13210  [pdf, other

    eess.IV cs.CV

    Improved Esophageal Varices Assessment from Non-Contrast CT Scans

    Authors: Chunli Li, Xiaoming Zhang, Yuan Gao, Xiaoli Yin, Le Lu, Ling Zhang, Ke Yan, Yu Shi

    Abstract: Esophageal varices (EV), a serious health concern resulting from portal hypertension, are traditionally diagnosed through invasive endoscopic procedures. Despite non-contrast computed tomography (NC-CT) imaging being a less expensive and non-invasive imaging modality, it has yet to gain full acceptance as a primary clinical diagnostic tool for EV evaluation. To overcome existing diagnostic challen… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Early accepted to MICCAI 2024

  4. arXiv:2407.13038  [pdf, other

    cs.CV cs.LG

    Universal Facial Encoding of Codec Avatars from VR Headsets

    Authors: Shaojie Bai, Te-Li Wang, Chenghui Li, Akshay Venkatesh, Tomas Simon, Chen Cao, Gabriel Schwartz, Ryan Wrench, Jason Saragih, Yaser Sheikh, Shih-En Wei

    Abstract: Faithful real-time facial animation is essential for avatar-mediated telepresence in Virtual Reality (VR). To emulate authentic communication, avatar animation needs to be efficient and accurate: able to capture both extreme and subtle expressions within a few milliseconds to sustain the rhythm of natural conversations. The oblique and incomplete views of the face, variability in the donning of he… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: SIGGRAPH 2024 (ACM Transactions on Graphics (TOG))

    Journal ref: ACM Trans. Graph. 43, 4, Article 93 (July 2024), 22 pages.

  5. arXiv:2407.12870  [pdf, other

    q-bio.QM cs.LG eess.IV

    Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View

    Authors: Jianan Fan, Dongnan Liu, Canran Li, Hang Chang, Heng Huang, Filip Braet, Mei Chen, Weidong Cai

    Abstract: Cellular nuclei recognition serves as a fundamental and essential step in the workflow of digital pathology. However, with disparate source organs and staining procedures among histology image clusters, the scanned tiles inherently conform to a non-uniform data distribution, which induces deteriorated promises for general cross-cohort usages. Despite the latest efforts leveraging domain adaptation… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 main conference

  6. arXiv:2407.12772  [pdf, other

    cs.CL cs.CV

    LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

    Authors: Kaichen Zhang, Bo Li, Peiyuan Zhang, Fanyi Pu, Joshua Adrian Cahyono, Kairui Hu, Shuai Liu, Yuanhan Zhang, Jingkang Yang, Chunyuan Li, Ziwei Liu

    Abstract: The advances of large foundation models necessitate wide-coverage, low-cost, and zero-contamination benchmarks. Despite continuous exploration of language model evaluations, comprehensive studies on the evaluation of Large Multi-modal Models (LMMs) remain limited. In this work, we introduce LMMS-EVAL, a unified and standardized multimodal benchmark framework with over 50 tasks and more than 10 mod… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Code ad leaderboard are available at https://github.com/EvolvingLMMs-Lab/lmms-eval and https://huggingface.co/spaces/lmms-lab/LiveBench

  7. arXiv:2407.12247  [pdf, other

    cs.CL

    Lacuna Language Learning: Leveraging RNNs for Ranked Text Completion in Digitized Coptic Manuscripts

    Authors: Lauren Levine, Cindy Tung Li, Lydia Bremer-McCollum, Nicholas Wagner, Amir Zeldes

    Abstract: Ancient manuscripts are frequently damaged, containing gaps in the text known as lacunae. In this paper, we present a bidirectional RNN model for character prediction of Coptic characters in manuscript lacunae. Our best model performs with 72% accuracy on single character reconstruction, but falls to 37% when reconstructing lacunae of various lengths. While not suitable for definitive manuscript r… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Machine Learning for Ancient Languages, ACL 2024 Workshop, 15 August 2024

    ACM Class: I.2.7

  8. arXiv:2407.12229  [pdf, other

    eess.AS cs.AI eess.SP

    Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech

    Authors: Haibin Wu, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Daniel Tompkins, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Sheng Zhao, Jinyu Li, Naoyuki Kanda

    Abstract: People change their tones of voice, often accompanied by nonverbal vocalizations (NVs) such as laughter and cries, to convey rich emotions. However, most text-to-speech (TTS) systems lack the capability to generate speech with rich emotions, including NVs. This paper introduces EmoCtrl-TTS, an emotion-controllable zero-shot TTS that can generate highly emotional speech with NVs for any speaker. Em… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: See https://aka.ms/emoctrl-tts for demo samples

  9. arXiv:2407.11781  [pdf, other

    cs.CV

    SlingBAG: Sliding ball adaptive growth algorithm with differentiable radiation enables super-efficient iterative 3D photoacoustic image reconstruction

    Authors: Shuang Li, Yibing Wang, Jian Gao, Chulhong Kim, Seongwook Choi, Yu Zhang, Qian Chen, Yao Yao, Changhui Li

    Abstract: High-quality 3D photoacoustic imaging (PAI) reconstruction under sparse view or limited view has long been challenging. Traditional 3D iterative-based reconstruction methods suffer from both slow speed and high memory consumption. Recently, in computer graphics, the differentiable rendering has made significant progress, particularly with the rise of 3D Gaussian Splatting. Inspired by these, we in… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  10. arXiv:2407.11435  [pdf, other

    q-bio.GN cs.LG stat.ML

    Genomic Language Models: Opportunities and Challenges

    Authors: Gonzalo Benegas, Chengzhong Ye, Carlos Albors, Jianan Canal Li, Yun S. Song

    Abstract: Large language models (LLMs) are having transformative impacts across a wide range of scientific fields, particularly in the biomedical sciences. Just as the goal of Natural Language Processing is to understand sequences of words, a major objective in biology is to understand biological sequences. Genomic Language Models (gLMs), which are LLMs trained on DNA sequences, have the potential to signif… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Review article; 25 pages, 3 figures, 1 table

    MSC Class: 92-08; 92B20; 68T50; 68T07

  11. arXiv:2407.11431  [pdf

    cs.CV

    MRIo3DS-Net: A Mutually Reinforcing Images to 3D Surface RNN-like framework for model-adaptation indoor 3D reconstruction

    Authors: Chang Li, Jiao Guo, Yufei Zhao, Yongjun Zhang

    Abstract: This paper is the first to propose an end-to-end framework of mutually reinforcing images to 3D surface recurrent neural network-like for model-adaptation indoor 3D reconstruction,where multi-view dense matching and point cloud surface optimization are mutually reinforced by a RNN-like structure rather than being treated as a separate issue.The characteristics are as follows:In the multi-view dens… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  12. arXiv:2407.11287  [pdf

    eess.IV cs.GR

    A Self-Correcting Strategy of the Digital Volume Correlation Displacement Field Based on Image Matching: Application to Poor Speckles Quality and Complex-Large Deformation

    Authors: Chengsheng Li, Zhijun Liu

    Abstract: Digital Volume Correlation (DVC) is widely used for the analysis of three-dimensional displacement and strain fields based on CT scans. However, the applicability of DVC methods is limited when it comes to geomaterials: CT speckles are directly correlated with the material's microstructure, and the speckle structure cannot be artificially altered, with generally poor speckle quality. Additionally,… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  13. arXiv:2407.10955  [pdf, other

    stat.ML cs.LG math.OC

    Enhancing Stochastic Optimization for Statistical Efficiency Using ROOT-SGD with Diminishing Stepsize

    Authors: Tong Zhang, Chris Junchi Li

    Abstract: In this paper, we revisit \textsf{ROOT-SGD}, an innovative method for stochastic optimization to bridge the gap between stochastic optimization and statistical efficiency. The proposed method enhances the performance and reliability of \textsf{ROOT-SGD} by integrating a carefully designed \emph{diminishing stepsize strategy}. This approach addresses key challenges in optimization, providing robust… ▽ More

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  14. arXiv:2407.10876  [pdf, other

    cs.CV

    RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception

    Authors: Chunliang Li, Wencheng Han, Junbo Yin, Sanyuan Zhao, Jianbing Shen

    Abstract: Concurrent processing of multiple autonomous driving 3D perception tasks within the same spatiotemporal scene poses a significant challenge, in particular due to the computational inefficiencies and feature competition between tasks when using traditional multi-task learning approaches. This paper addresses these issues by proposing a novel unified representation, RepVF, which harmonizes the repre… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  15. arXiv:2407.10671  [pdf, other

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin , et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a… ▽ More

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  16. arXiv:2407.10510  [pdf, other

    cs.CL cs.AI cs.CE

    TCM-FTP: Fine-Tuning Large Language Models for Herbal Prescription Prediction

    Authors: Xingzhi Zhou, Xin Dong, Chunhao Li, Yuning Bai, Yulong Xu, Ka Chun Cheung, Simon See, Xinpeng Song, Runshun Zhang, Xuezhong Zhou, Nevin L. Zhang

    Abstract: Traditional Chinese medicine (TCM) relies on specific combinations of herbs in prescriptions to treat symptoms and signs, a practice that spans thousands of years. Predicting TCM prescriptions presents a fascinating technical challenge with practical implications. However, this task faces limitations due to the scarcity of high-quality clinical datasets and the intricate relationship between sympt… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  17. arXiv:2407.10424  [pdf, other

    cs.PL cs.AI

    CodeV: Empowering LLMs for Verilog Generation through Multi-Level Summarization

    Authors: Yang Zhao, Di Huang, Chongxiao Li, Pengwei Jin, Ziyuan Nan, Tianyun Ma, Lei Qi, Yansong Pan, Zhenxing Zhang, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen

    Abstract: The increasing complexity and high costs associated with modern processor design have led to a surge in demand for processor design automation. Instruction-tuned large language models (LLMs) have demonstrated remarkable performance in automatically generating code for general-purpose programming languages like Python. However, these methods fail on hardware description languages (HDLs) like Verilo… ▽ More

    Submitted 18 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

    Comments: 16 pages, 8 figures, conference

  18. arXiv:2407.10416  [pdf, other

    cs.AR

    SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling

    Authors: Huizheng Wang, Jiahao Fang, Xinru Tang, Zhiheng Yue, Jinxi Li, Yubin Qin, Sihan Guan, Qize Yang, Yang Wang, Chao Li, Yang Hu, Shouyi Yin

    Abstract: Benefiting from the self-attention mechanism, Transformer models have attained impressive contextual comprehension capabilities for lengthy texts. The requirements of high-throughput inference arise as the large language models (LLMs) become increasingly prevalent, which calls for large-scale token parallel processing (LTPP). However, existing dynamic sparse accelerators struggle to effectively ha… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  19. arXiv:2407.10374  [pdf, other

    cs.CV cs.AI

    An Empirical Study of Mamba-based Pedestrian Attribute Recognition

    Authors: Xiao Wang, Weizhe Kong, Jiandong Jin, Shiao Wang, Ruichong Gao, Qingchuan Ma, Chenglong Li, Jin Tang

    Abstract: Current strong pedestrian attribute recognition models are developed based on Transformer networks, which are computationally heavy. Recently proposed models with linear complexity (e.g., Mamba) have garnered significant attention and have achieved a good balance between accuracy and computational cost across a variety of visual tasks. Relevant review articles also suggest that while these models… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: In Peer Review

  20. arXiv:2407.09853  [pdf, other

    cs.CV

    Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation

    Authors: Han Li, Shaohui Li, Shuangrui Ding, Wenrui Dai, Maida Cao, Chenglin Li, Junni Zou, Hongkai Xiong

    Abstract: Image compression for machine and human vision (ICMH) has gained increasing attention in recent years. Existing ICMH methods are limited by high training and storage overheads due to heavy design of task-specific networks. To address this issue, in this paper, we develop a novel lightweight adapter-based tuning framework for ICMH, named Adapt-ICMH, that better balances task performance and bitrate… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024, project: https://github.com/qingshi9974/ECCV2024-AdpatICMH

  21. arXiv:2407.09435  [pdf, other

    cs.AI

    MUSCLE: A Model Update Strategy for Compatible LLM Evolution

    Authors: Jessica Echterhoff, Fartash Faghri, Raviteja Vemulapalli, Ting-Yao Hu, Chun-Liang Li, Oncel Tuzel, Hadi Pouransari

    Abstract: Large Language Models (LLMs) are frequently updated due to data or architecture changes to improve their performance. When updating models, developers often focus on increasing overall performance metrics with less emphasis on being compatible with previous model versions. However, users often build a mental model of the functionality and capabilities of a particular machine learning model they ar… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  22. arXiv:2407.08473  [pdf, other

    cs.AR cs.AI

    Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation

    Authors: Kaiyan Chang, Zhirong Chen, Yunhao Zhou, Wenlong Zhu, kun wang, Haobo Xu, Cangyuan Li, Mengdi Wang, Shengwen Liang, Huawei Li, Yinhe Han, Ying Wang

    Abstract: Natural language interfaces have exhibited considerable potential in the automation of Verilog generation derived from high-level specifications through the utilization of large language models, garnering significant attention. Nevertheless, this paper elucidates that visual representations contribute essential contextual information critical to design intent for hardware architectures possessing… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ICCAD 2024

  23. arXiv:2407.08184  [pdf

    math.NA cs.CG

    Geometry-based Multi-beam Survey Line Layout Problem

    Authors: Chuangqi Li, Yuhang Wang, Fan Hu

    Abstract: The multi-beam measurement system plays a crucial role in ocean mapping and underwater terrain detection. By simultaneously transmitting multiple beams, the system can accurately receive sound waves reflected from the seabed, providing more precise and comprehensive water depth information while effectively revealing the complexity and characteristics of underwater terrain. Building upon the backg… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 8 pages and 5 images, which have been included in the Conference: 2024 3th International Conference on Computational Modeling, Simulation and Data Analysis (CMSDA 2023)

  24. arXiv:2407.08149  [pdf, other

    cs.CV

    Deep Polarization Cues for Single-shot Shape and Subsurface Scattering Estimation

    Authors: Chenhao Li, Trung Thanh Ngo, Hajime Nagahara

    Abstract: In this work, we propose a novel learning-based method to jointly estimate the shape and subsurface scattering (SSS) parameters of translucent objects by utilizing polarization cues. Although polarization cues have been used in various applications, such as shape from polarization (SfP), BRDF estimation, and reflection removal, their application in SSS estimation has not yet been explored. Our obs… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV24

  25. arXiv:2407.07895  [pdf, other

    cs.CV cs.CL cs.LG

    LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

    Authors: Feng Li, Renrui Zhang, Hao Zhang, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, Chunyuan Li

    Abstract: Visual instruction tuning has made considerable strides in enhancing the capabilities of Large Multimodal Models (LMMs). However, existing open LMMs largely focus on single-image tasks, their applications to multi-image scenarios remains less explored. Additionally, prior LMM research separately tackles different scenarios, leaving it impossible to generalize cross scenarios with new emerging capa… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Project Page: https://llava-vl.github.io/blog/2024-06-16-llava-next-interleave/

  26. arXiv:2407.07660  [pdf, ps, other

    cs.CV cs.AI

    Boosting Medical Image Synthesis via Registration-guided Consistency and Disentanglement Learning

    Authors: Chuanpu Li, Zeli Chen, Yiwen Zhang, Liming Zhong, Wei Yang

    Abstract: Medical image synthesis remains challenging due to misalignment noise during training. Existing methods have attempted to address this challenge by incorporating a registration-guided module. However, these methods tend to overlook the task-specific constraints on the synthetic and registration modules, which may cause the synthetic module to still generate spatially aligned images with misaligned… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  27. arXiv:2407.07464  [pdf, other

    cs.SD cs.CV cs.MM eess.AS

    Video-to-Audio Generation with Hidden Alignment

    Authors: Manjie Xu, Chenxing Li, Yong Ren, Rilin Chen, Yu Gu, Wei Liang, Dong Yu

    Abstract: Generating semantically and temporally aligned audio content in accordance with video input has become a focal point for researchers, particularly following the remarkable breakthrough in text-to-video generation. In this work, we aim to offer insights into the video-to-audio generation paradigm, focusing on three crucial aspects: vision encoders, auxiliary embeddings, and data augmentation techni… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: https://sites.google.com/view/vta-ldm

  28. arXiv:2407.07304  [pdf, other

    cs.AI

    Inference Performance Optimization for Large Language Models on CPUs

    Authors: Pujiang He, Shan Zhou, Wenhuan Huang, Changqing Li, Duyi Wang, Bin Guo, Chen Meng, Sheng Gui, Weifei Yu, Yi Xie

    Abstract: Large language models (LLMs) have shown exceptional performance and vast potential across diverse tasks. However, the deployment of LLMs with high performance in low-resource environments has garnered significant attention in the industry. When GPU hardware resources are limited, we can explore alternative options on CPUs. To mitigate the financial burden and alleviate constraints imposed by hardw… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 5 pages, 6 figure, ICML 2024 on Foundation Models in the Wild

  29. arXiv:2407.06723  [pdf, other

    cs.CV cs.AI cs.LG

    Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions

    Authors: Yu-Guan Hsieh, Cheng-Yu Hsieh, Shih-Ying Yeh, Louis Béthune, Hadi Pour Ansari, Pavan Kumar Anasosalu Vasu, Chun-Liang Li, Ranjay Krishna, Oncel Tuzel, Marco Cuturi

    Abstract: Humans describe complex scenes with compositionality, using simple text descriptions enriched with links and relationships. While vision-language research has aimed to develop models with compositional understanding capabilities, this is not reflected yet in existing datasets which, for the most part, still use plain text to describe images. In this work, we propose a new annotation strategy, grap… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 47 pages, 33 figures

  30. arXiv:2407.06688  [pdf, other

    cs.CV

    Universal Multi-view Black-box Attack against Object Detectors via Layout Optimization

    Authors: Donghua Wang, Wen Yao, Tingsong Jiang, Chao Li, Xiaoqian Chen

    Abstract: Object detectors have demonstrated vulnerability to adversarial examples crafted by small perturbations that can deceive the object detector. Existing adversarial attacks mainly focus on white-box attacks and are merely valid at a specific viewpoint, while the universal multi-view black-box attack is less explored, limiting their generalization in practice. In this paper, we propose a novel univer… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 12 pages, 13 figures, 5 tables

  31. arXiv:2407.05671  [pdf, other

    cs.CV cs.AI

    MSTF: Multiscale Transformer for Incomplete Trajectory Prediction

    Authors: Zhanwen Liu, Chao Li, Nan Yang, Yang Wang, Jiaqi Ma, Guangliang Cheng, Xiangmo Zhao

    Abstract: Motion forecasting plays a pivotal role in autonomous driving systems, enabling vehicles to execute collision warnings and rational local-path planning based on predictions of the surrounding vehicles. However, prevalent methods often assume complete observed trajectories, neglecting the potential impact of missing values induced by object occlusion, scope limitation, and sensor failures. Such ove… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  32. arXiv:2407.05552  [pdf, other

    cs.CV

    Ada-adapter:Fast Few-shot Style Personlization of Diffusion Model with Pre-trained Image Encoder

    Authors: Jia Liu, Changlin Li, Qirui Sun, Jiahui Ming, Chen Fang, Jue Wang, Bing Zeng, Shuaicheng Liu

    Abstract: Fine-tuning advanced diffusion models for high-quality image stylization usually requires large training datasets and substantial computational resources, hindering their practical applicability. We propose Ada-Adapter, a novel framework for few-shot style personalization of diffusion models. Ada-Adapter leverages off-the-shelf diffusion models and pre-trained image feature encoders to learn a com… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 16 pages, 11 figures

    MSC Class: 68T07 ACM Class: I.4.0

  33. arXiv:2407.05540  [pdf, other

    cs.CV

    GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation

    Authors: Chenxin Li, Xinyu Liu, Cheng Wang, Yifan Liu, Weihao Yu, Jing Shao, Yixuan Yuan

    Abstract: Recent advances in learning multi-modal representation have witnessed the success in biomedical domains. While established techniques enable handling multi-modal information, the challenges are posed when extended to various clinical modalities and practical modalitymissing setting due to the inherent modality gaps. To tackle these, we propose an innovative Modality-prompted Heterogeneous Graph fo… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  34. arXiv:2407.05376  [pdf, other

    cs.RO

    Rethinking Closed-loop Planning Framework for Imitation-based Model Integrating Prediction and Planning

    Authors: Jiayu Guo, Mingyue Feng, Pengfei Zhu, Chengjun Li, Jian Pu

    Abstract: In recent years, the integration of prediction and planning through neural networks has received substantial attention. Despite extensive studies on it, there is a noticeable gap in understanding the operation of such models within a closed-loop planning setting. To bridge this gap, we propose a novel closed-loop planning framework compatible with neural networks engaged in joint prediction and pl… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 7 pages,5 figures

  35. arXiv:2407.04281  [pdf, other

    cs.RO

    WOMD-Reasoning: A Large-Scale Language Dataset for Interaction and Driving Intentions Reasoning

    Authors: Yiheng Li, Chongjian Ge, Chenran Li, Chenfeng Xu, Masayoshi Tomizuka, Chen Tang, Mingyu Ding, Wei Zhan

    Abstract: We propose Waymo Open Motion Dataset-Reasoning (WOMD-Reasoning), a language annotation dataset built on WOMD, with a focus on describing and reasoning interactions and intentions in driving scenarios. Previous language datasets primarily captured interactions caused by close distances. However, interactions induced by traffic rules and human intentions, which can occur over long distances, are yet… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  36. Exploration of Class Center for Fine-Grained Visual Classification

    Authors: Hang Yao, Qiguang Miao, Peipei Zhao, Chaoneng Li, Xin Li, Guanwen Feng, Ruyi Liu

    Abstract: Different from large-scale classification tasks, fine-grained visual classification is a challenging task due to two critical problems: 1) evident intra-class variances and subtle inter-class differences, and 2) overfitting owing to fewer training samples in datasets. Most existing methods extract key features to reduce intra-class variances, but pay no attention to subtle inter-class differences… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accpeted by TCSVT. Code and trained models are here:https://github.com/hyao1/ECC

  37. arXiv:2407.04078  [pdf, other

    cs.CL cs.AI cs.LG

    DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning

    Authors: Chengpeng Li, Guanting Dong, Mingfeng Xue, Ru Peng, Xiang Wang, Dayiheng Liu

    Abstract: Large language models (LLMs) have made impressive progress in handling simple math problems, yet they still struggle with more challenging and complex mathematical tasks. In this paper, we introduce a series of LLMs that employs the Decomposition of thought with code assistance and self-correction for mathematical reasoning, dubbed as DotaMath. DotaMath models tackle complex mathematical tasks by… ▽ More

    Submitted 17 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress

  38. arXiv:2407.03995  [pdf, other

    cs.LG cs.AI cs.RO

    ROER: Regularized Optimal Experience Replay

    Authors: Changling Li, Zhang-Wei Hong, Pulkit Agrawal, Divyansh Garg, Joni Pajarinen

    Abstract: Experience replay serves as a key component in the success of online reinforcement learning (RL). Prioritized experience replay (PER) reweights experiences by the temporal difference (TD) error empirically enhancing the performance. However, few works have explored the motivation of using TD error. In this work, we provide an alternative perspective on TD-error-based reweighting. We show the conne… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Journal ref: Reinforcement Learning Journal, vol. 1, no. 1, 2024, pp. TBD

  39. arXiv:2407.03963  [pdf, other

    cs.CL cs.AI

    LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

    Authors: LLM-jp, :, Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto, Takuya Fukushima, Namgi Han, Yuto Harada, Chikara Hashimoto, Tatsuya Hiraoka, Shohei Hisada, Sosuke Hosokawa, Lu Jie, Keisuke Kamata, Teruhito Kanazawa, Hiroki Kanezashi, Hiroshi Kataoka, Satoru Katsumata, Daisuke Kawahara, Seiya Kawano , et al. (57 additional authors not shown)

    Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  40. arXiv:2407.03757  [pdf, other

    cs.CV

    DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts

    Authors: Zheng-Peng Duan, Jiawei zhang, Zheng Lin, Xin Jin, Dongqing Zou, Chunle Guo, Chongyi Li

    Abstract: Image retouching aims to enhance the visual quality of photos. Considering the different aesthetic preferences of users, the target of retouching is subjective. However, current retouching methods mostly adopt deterministic models, which not only neglects the style diversity in the expert-retouched results and tends to learn an average style during training, but also lacks sample diversity during… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  41. arXiv:2407.03037  [pdf, other

    cs.SE

    Vision-driven Automated Mobile GUI Testing via Multimodal Large Language Model

    Authors: Zhe Liu, Cheng Li, Chunyang Chen, Junjie Wang, Boyu Wu, Yawen Wang, Jun Hu, Qing Wang

    Abstract: With the advancement of software rendering techniques, GUI pages in mobile apps now encompass a wealth of visual information, where the visual semantics of each page contribute to the overall app logic, presenting new challenges to software testing. Despite the progress in automated Graphical User Interface (GUI) testing, the absence of testing oracles has constrained its efficacy to identify only… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  42. arXiv:2407.03033  [pdf

    cs.CV

    ISWSST: Index-space-wave State Superposition Transformers for Multispectral Remotely Sensed Imagery Semantic Segmentation

    Authors: Chang Li, Pengfei Zhang, Yu Wang

    Abstract: Currently the semantic segmentation task of multispectral remotely sensed imagery (MSRSI) faces the following problems: 1) Usually, only single domain feature (i.e., space domain or frequency domain) is considered; 2) downsampling operation in encoder generally leads to the accuracy loss of edge extraction; 3) multichannel features of MSRSI are not fully considered; and 4) prior knowledge of remot… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  43. arXiv:2407.02328  [pdf, other

    cs.CL

    Efficient Sparse Attention needs Adaptive Token Release

    Authors: Chaoran Zhang, Lixin Zou, Dan Luo, Min Tang, Xiangyang Luo, Zihao Li, Chenliang Li

    Abstract: In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide array of text-centric tasks. However, their `large' scale introduces significant computational and storage challenges, particularly in managing the key-value states of the transformer, which limits their wider applicability. Therefore, we propose to adaptively release resources from caches and reb… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted at ACL 2024(Findings)

  44. arXiv:2407.02265  [pdf, other

    cs.LG q-bio.BM

    DrugCLIP: Contrastive Drug-Disease Interaction For Drug Repurposing

    Authors: Yingzhou Lu, Yaojun Hu, Chenhao Li

    Abstract: Bringing a novel drug from the original idea to market typically requires more than ten years and billions of dollars. To alleviate the heavy burden, a natural idea is to reuse the approved drug to treat new diseases. The process is also known as drug repurposing or drug repositioning. Machine learning methods exhibited huge potential in automating drug repurposing. However, it still encounter som… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  45. arXiv:2407.02235  [pdf

    cs.CL

    Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation

    Authors: Cheng-Yi Li, Kao-Jung Chang, Cheng-Fu Yang, Hsin-Yu Wu, Wenting Chen, Hritik Bansal, Ling Chen, Yi-Ping Yang, Yu-Chun Chen, Shih-Pin Chen, Jiing-Feng Lirng, Kai-Wei Chang, Shih-Hwa Chiou

    Abstract: Multi-modal large language models (MLLMs) have been given free rein to explore exciting medical applications with a primary focus on radiology report generation. Nevertheless, the preliminary success in 2D radiology captioning is incompetent to reflect the real-world diagnostic challenge in the volumetric 3D anatomy. To mitigate three crucial limitation aspects in the existing literature, includin… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 6 figures, 5 supplementary figures, 8 supplementary tables

  46. arXiv:2407.01470  [pdf, other

    cs.CL

    DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging

    Authors: Tzu-Han Lin, Chen-An Li, Hung-yi Lee, Yun-Nung Chen

    Abstract: Reinforcement learning from human feedback (RLHF) is a popular strategy for aligning large language models (LLMs) with desired behaviors. Reward modeling is a crucial step in RLHF. However, collecting paired preference data for training reward models is often costly and time-consuming, especially for domain-specific preferences requiring expert annotation. To address this challenge, we propose the… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Preprint. Code will be released after the review results

  47. arXiv:2407.01306  [pdf, other

    cs.LG cs.CR

    Unveiling the Unseen: Exploring Whitebox Membership Inference through the Lens of Explainability

    Authors: Chenxi Li, Abhinav Kumar, Zhen Guo, Jie Hou, Reza Tourani

    Abstract: The increasing prominence of deep learning applications and reliance on personalized data underscore the urgent need to address privacy vulnerabilities, particularly Membership Inference Attacks (MIAs). Despite numerous MIA studies, significant knowledge gaps persist, particularly regarding the impact of hidden features (in isolation) on attack efficacy and insufficient justification for the root… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 20 pages, 10 figures, 4 tables

  48. arXiv:2407.01301  [pdf, other

    cs.CV

    GaussianStego: A Generalizable Stenography Pipeline for Generative 3D Gaussians Splatting

    Authors: Chenxin Li, Hengyu Liu, Zhiwen Fan, Wuyang Li, Yifan Liu, Panwang Pan, Yixuan Yuan

    Abstract: Recent advancements in large generative models and real-time neural rendering using point-based techniques pave the way for a future of widespread visual data distribution through sharing synthesized 3D assets. However, while standardized methods for embedding proprietary or copyright information, either overtly or subtly, exist for conventional visual content such as images and videos, this issue… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Project website: https://gaussian-stego.github.io/

  49. arXiv:2407.01284  [pdf, other

    cs.AI cs.CL cs.CV cs.LG cs.SC

    We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

    Authors: Runqi Qiao, Qiuna Tan, Guanting Dong, Minhui Wu, Chong Sun, Xiaoshuai Song, Zhuoma GongQue, Shanglin Lei, Zhe Wei, Miaoxuan Zhang, Runfeng Qiao, Yifan Zhang, Xiao Zong, Yida Xu, Muxi Diao, Zhimin Bao, Chen Li, Honggang Zhang

    Abstract: Visual mathematical reasoning, as a fundamental visual reasoning ability, has received widespread attention from the Large Multimodal Models (LMMs) community. Existing benchmarks, such as MathVista and MathVerse, focus more on the result-oriented performance but neglect the underlying principles in knowledge acquisition and generalization. Inspired by human-like mathematical reasoning, we introduc… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Work in progress

  50. arXiv:2407.01244  [pdf, other

    cs.CV

    CLHOP: Combined Audio-Video Learning for Horse 3D Pose and Shape Estimation

    Authors: Ci Li, Elin Hernlund, Hedvig Kjellström, Silvia Zuffi

    Abstract: In the monocular setting, predicting 3D pose and shape of animals typically relies solely on visual information, which is highly under-constrained. In this work, we explore using audio to enhance 3D shape and motion recovery of horses from monocular video. We test our approach on two datasets: an indoor treadmill dataset for 3D evaluation and an outdoor dataset capturing diverse horse movements, t… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: CVPR CV4Animals Workshop 2024