Skip to main content

Showing 1–50 of 3,884 results for author: Liu, Z

  1. arXiv:2407.13561  [pdf, other

    cs.CL

    Research on Tibetan Tourism Viewpoints information generation system based on LLM

    Authors: Jinhu Qi, Shuai Yan, Wentao Zhang, Yibo Zhang, Zirui Liu, Ke Wang

    Abstract: Tibet, ensconced within China's territorial expanse, is distinguished by its labyrinthine and heterogeneous topography, a testament to its profound historical heritage, and the cradle of a unique religious ethos. The very essence of these attributes, however, has impeded the advancement of Tibet's tourism service infrastructure, rendering existing smart tourism services inadequate for the region's… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Journal ref: ICWOC 2024

  2. arXiv:2407.12899  [pdf, other

    cs.CV cs.AI cs.MM

    DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion

    Authors: Huiguo He, Huan Yang, Zixi Tuo, Yuan Zhou, Qiuyue Wang, Yuhang Zhang, Zeyu Liu, Wenhao Huang, Hongyang Chao, Jian Yin

    Abstract: Story visualization aims to create visually compelling images or videos corresponding to textual narratives. Despite recent advances in diffusion models yielding promising results, existing methods still struggle to create a coherent sequence of subject-consistent frames based solely on a story. To this end, we propose DreamStory, an automatic open-domain story visualization framework by leveragin… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  3. arXiv:2407.12772  [pdf, other

    cs.CL cs.CV

    LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

    Authors: Kaichen Zhang, Bo Li, Peiyuan Zhang, Fanyi Pu, Joshua Adrian Cahyono, Kairui Hu, Shuai Liu, Yuanhan Zhang, Jingkang Yang, Chunyuan Li, Ziwei Liu

    Abstract: The advances of large foundation models necessitate wide-coverage, low-cost, and zero-contamination benchmarks. Despite continuous exploration of language model evaluations, comprehensive studies on the evaluation of Large Multi-modal Models (LMMs) remain limited. In this work, we introduce LMMS-EVAL, a unified and standardized multimodal benchmark framework with over 50 tasks and more than 10 mod… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Code ad leaderboard are available at https://github.com/EvolvingLMMs-Lab/lmms-eval and https://huggingface.co/spaces/lmms-lab/LiveBench

  4. arXiv:2407.12393  [pdf, other

    cs.CL cs.AI cs.CY

    PersLLM: A Personified Training Approach for Large Language Models

    Authors: Zheni Zeng, Jiayi Chen, Huimin Chen, Yukun Yan, Yuxuan Chen, Zhiyuan Liu, Maosong Sun

    Abstract: Large language models exhibit aspects of human-level intelligence that catalyze their application as human-like agents in domains such as social simulations, human-machine interactions, and collaborative multi-agent systems. However, the absence of distinct personalities, such as displaying ingratiating behaviors, inconsistent opinions, and uniform response patterns, diminish LLMs utility in pract… ▽ More

    Submitted 18 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: 10 pages for main text, 5 figures

  5. arXiv:2407.12128  [pdf, other

    cs.LG cs.CV

    Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams

    Authors: Ziqiang Wang, Zhixiang Chi, Yanan Wu, Li Gu, Zhi Liu, Konstantinos Plataniotis, Yang Wang

    Abstract: Given a model trained on source data, Test-Time Adaptation (TTA) enables adaptation and inference in test data streams with domain shifts from the source. Current methods predominantly optimize the model for each incoming test data batch using self-training loss. While these methods yield commendable results in ideal test data streams, where batches are independently and identically sampled from t… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  6. arXiv:2407.12068  [pdf, other

    cs.LG cs.AI

    Learning on Graphs with Large Language Models(LLMs): A Deep Dive into Model Robustness

    Authors: Kai Guo, Zewen Liu, Zhikai Chen, Hongzhi Wen, Wei Jin, Jiliang Tang, Yi Chang

    Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across various natural language processing tasks. Recently, several LLMs-based pipelines have been developed to enhance learning on graphs with text attributes, showcasing promising performance. However, graphs are well-known to be susceptible to adversarial attacks and it remains unclear whether LLMs exhibit robustness in learn… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  7. arXiv:2407.11998  [pdf, other

    cs.HC

    Custom Cloth Creation and Virtual Try-on for Everyone

    Authors: Pei Chen, Heng Wang, Sainan Sun, Zhiyuan Chen, Zhenkun Liu, Shuhua Cao, Li Yang, Minghui Yang

    Abstract: This demo showcases a simple tool that utilizes AIGC technology, enabling both professional designers and regular users to easily customize clothing for their digital avatars. Customization options include changing clothing colors, textures, logos, and patterns. Compared with traditional 3D modeling processes, our approach significantly enhances efficiency and interactivity and reduces production… ▽ More

    Submitted 13 June, 2024; originally announced July 2024.

  8. arXiv:2407.11421  [pdf, other

    cs.CL

    States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly

    Authors: Junhao Chen, Shengding Hu, Zhiyuan Liu, Maosong Sun

    Abstract: Large Language Models (LLMs) exhibit various emergent abilities. Among these abilities, some might reveal the internal working mechanisms of models. In this paper, we uncover a novel emergent capability in models: the intrinsic ability to perform extended sequences of calculations without relying on chain-of-thought step-by-step solutions. Remarkably, the most advanced models can directly output t… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  9. arXiv:2407.11419  [pdf, other

    cs.CV

    TeethDreamer: 3D Teeth Reconstruction from Five Intra-oral Photographs

    Authors: Chenfan Xu, Zhentao Liu, Yuan Liu, Yulong Dou, Jiamin Wu, Jiepeng Wang, Minjiao Wang, Dinggang Shen, Zhiming Cui

    Abstract: Orthodontic treatment usually requires regular face-to-face examinations to monitor dental conditions of the patients. When in-person diagnosis is not feasible, an alternative is to utilize five intra-oral photographs for remote dental monitoring. However, it lacks of 3D information, and how to reconstruct 3D dental models from such sparse view photographs is a challenging problem. In this study,… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: MICCAI2024

  10. arXiv:2407.11384  [pdf, other

    cs.CL

    InvAgent: A Large Language Model based Multi-Agent System for Inventory Management in Supply Chains

    Authors: Yinzhu Quan, Zefang Liu

    Abstract: Supply chain management (SCM) involves coordinating the flow of goods, information, and finances across various entities to deliver products efficiently. Effective inventory management is crucial in today's volatile, uncertain, complex, and ambiguous (VUCA) world. Previous research has demonstrated the superiority of heuristic methods and reinforcement learning applications in inventory management… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  11. arXiv:2407.11287  [pdf

    eess.IV cs.GR

    A Self-Correcting Strategy of the Digital Volume Correlation Displacement Field Based on Image Matching: Application to Poor Speckles Quality and Complex-Large Deformation

    Authors: Chengsheng Li, Zhijun Liu

    Abstract: Digital Volume Correlation (DVC) is widely used for the analysis of three-dimensional displacement and strain fields based on CT scans. However, the applicability of DVC methods is limited when it comes to geomaterials: CT speckles are directly correlated with the material's microstructure, and the speckle structure cannot be artificially altered, with generally poor speckle quality. Additionally,… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  12. arXiv:2407.11253  [pdf, other

    cs.LG cs.CE

    Separable Operator Networks

    Authors: Xinling Yu, Sean Hooten, Ziyue Liu, Yequan Zhao, Marco Fiorentino, Thomas Van Vaerenbergh, Zheng Zhang

    Abstract: Operator learning has become a powerful tool in machine learning for modeling complex physical systems. Although Deep Operator Networks (DeepONet) show promise, they require extensive data acquisition. Physics-informed DeepONets (PI-DeepONet) mitigate data scarcity but suffer from inefficient training processes. We introduce Separable Operator Networks (SepONet), a novel framework that significant… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  13. arXiv:2407.11188  [pdf, other

    cs.CV

    Efficient In-Context Medical Segmentation with Meta-driven Visual Prompt Selection

    Authors: Chenwei Wu, David Restrepo, Zitao Shuai, Zhongming Liu, Liyue Shen

    Abstract: In-context learning (ICL) with Large Vision Models (LVMs) presents a promising avenue in medical image segmentation by reducing the reliance on extensive labeling. However, the ICL performance of LVMs highly depends on the choices of visual prompts and suffers from domain shifts. While existing works leveraging LVMs for medical tasks have focused mainly on model-centric approaches like fine-tuning… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  14. arXiv:2407.10937  [pdf, other

    cs.CV

    IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation

    Authors: Yuanhao Zhai, Kevin Lin, Linjie Li, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, David Doermann, Junsong Yuan, Zicheng Liu, Lijuan Wang

    Abstract: Significant advances have been made in human-centric video generation, yet the joint video-depth generation problem remains underexplored. Most existing monocular depth estimation methods may not generalize well to synthesized images or videos, and multi-view-based methods have difficulty controlling the human appearance and motion. In this work, we present IDOL (unIfied Dual-mOdal Latent diffusio… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; project page: https://yhzhai.github.io/idol/

  15. arXiv:2407.10753  [pdf, other

    cs.CV

    OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection

    Authors: Jinghua Hou, Tong Wang, Xiaoqing Ye, Zhe Liu, Shi Gong, Xiao Tan, Errui Ding, Jingdong Wang, Xiang Bai

    Abstract: Accurate depth information is crucial for enhancing the performance of multi-view 3D object detection. Despite the success of some existing multi-view 3D detectors utilizing pixel-wise depth supervision, they overlook two significant phenomena: 1) the depth supervision obtained from LiDAR points is usually distributed on the surface of the object, which is not so friendly to existing DETR-based 3D… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  16. arXiv:2407.10749  [pdf, other

    cs.CV

    SEED: A Simple and Effective 3D DETR in Point Clouds

    Authors: Zhe Liu, Jinghua Hou, Xiaoqing Ye, Tong Wang, Jingdong Wang, Xiang Bai

    Abstract: Recently, detection transformers (DETRs) have gradually taken a dominant position in 2D detection thanks to their elegant framework. However, DETR-based detectors for 3D point clouds are still difficult to achieve satisfactory performance. We argue that the main challenges are twofold: 1) How to obtain the appropriate object queries is challenging due to the high sparsity and uneven distribution o… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  17. arXiv:2407.10714  [pdf, other

    cs.IR cs.AI

    SEMINAR: Search Enhanced Multi-modal Interest Network and Approximate Retrieval for Lifelong Sequential Recommendation

    Authors: Kaiming Shen, Xichen Ding, Zixiang Zheng, Yuqi Gong, Qianqian Li, Zhongyi Liu, Guannan Zhang

    Abstract: The modeling of users' behaviors is crucial in modern recommendation systems. A lot of research focuses on modeling users' lifelong sequences, which can be extremely long and sometimes exceed thousands of items. These models use the target item to search for the most relevant items from the historical sequence. However, training lifelong sequences in click through rate (CTR) prediction or personal… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 9 pages,code released

  18. arXiv:2407.10632  [pdf, other

    eess.IV cs.AI cs.CV

    Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model

    Authors: Zhening Liu, Xinjie Zhang, Jiawei Shao, Zehong Lin, Jun Zhang

    Abstract: With the rapid advancement of stereo vision technologies, stereo image compression has emerged as a crucial field that continues to draw significant attention. Previous approaches have primarily employed a unidirectional paradigm, where the compression of one view is dependent on the other, resulting in imbalanced compression. To address this issue, we introduce a symmetric bidirectional stereo im… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  19. arXiv:2407.10459  [pdf, other

    cs.CV

    DiffStega: Towards Universal Training-Free Coverless Image Steganography with Diffusion Models

    Authors: Yiwei Yang, Zheyuan Liu, Jun Jia, Zhongpai Gao, Yunhao Li, Wei Sun, Xiaohong Liu, Guangtao Zhai

    Abstract: Traditional image steganography focuses on concealing one image within another, aiming to avoid steganalysis by unauthorized entities. Coverless image steganography (CIS) enhances imperceptibility by not using any cover image. Recent works have utilized text prompts as keys in CIS through diffusion models. However, this approach faces three challenges: invalidated when private prompt is guessed, c… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 9 pages, 7 figures; reference added; accepted at IJCAI2024 main track

  20. arXiv:2407.10241  [pdf, other

    cs.CL

    BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs

    Authors: Zhiting Fan, Ruizhe Chen, Ruiling Xu, Zuozhu Liu

    Abstract: Evaluating the bias in Large Language Models (LLMs) becomes increasingly crucial with their rapid development. However, existing evaluation methods rely on fixed-form outputs and cannot adapt to the flexible open-text generation scenarios of LLMs (e.g., sentence completion and question answering). To address this, we introduce BiasAlert, a plug-and-play tool designed to detect social bias in open-… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  21. arXiv:2407.10205  [pdf, other

    quant-ph cs.ET math.CO

    Parallel Ising Annealer via Gradient-based Hamiltonian Monte Carlo

    Authors: Hao Wang, Zixuan Liu, Zhixin Xie, Langyu Li, Zibo Miao, Wei Cui, Yu Pan

    Abstract: Ising annealer is a promising quantum-inspired computing architecture for combinatorial optimization problems. In this paper, we introduce an Ising annealer based on the Hamiltonian Monte Carlo, which updates the variables of all dimensions in parallel. The main innovation is the fusion of an approximate gradient-based approach into the Ising annealer which introduces significant acceleration and… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  22. arXiv:2407.10058  [pdf, other

    cs.CL cs.AI

    Learning to Refuse: Towards Mitigating Privacy Risks in LLMs

    Authors: Zhenhua Liu, Tong Zhu, Chuanyuan Tan, Wenliang Chen

    Abstract: Large language models (LLMs) exhibit remarkable capabilities in understanding and generating natural language. However, these models can inadvertently memorize private information, posing significant privacy risks. This study addresses the challenge of enabling LLMs to protect specific individuals' private data without the need for complete retraining. We propose \return, a Real-world pErsonal daT… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  23. arXiv:2407.09899  [pdf, other

    cs.RO

    DexGrasp-Diffusion: Diffusion-based Unified Functional Grasp Synthesis Pipeline for Multi-Dexterous Robotic Hands

    Authors: Zhengshen Zhang, Lei Zhou, Chenchen Liu, Zhiyang Liu, Chengran Yuan, Sheng Guo, Ruiteng Zhao, Marcelo H. Ang Jr., Francis EH Tay

    Abstract: The versatility and adaptability of human grasping catalyze advancing dexterous robotic manipulation. While significant strides have been made in dexterous grasp generation, current research endeavors pivot towards optimizing object manipulation while ensuring functional integrity, emphasizing the synthesis of functional grasps following desired affordance instructions. This paper addresses the ch… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  24. arXiv:2407.09336  [pdf, other

    cs.LG cs.AI

    Guidelines for Augmentation Selection in Contrastive Learning for Time Series Classification

    Authors: Ziyu Liu, Azadeh Alavi, Minyi Li, Xiang Zhang

    Abstract: Self-supervised contrastive learning has become a key technique in deep learning, particularly in time series analysis, due to its ability to learn meaningful representations without explicit supervision. Augmentation is a critical component in contrastive learning, where different augmentations can dramatically impact performance, sometimes influencing accuracy by over 30%. However, the selection… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 20 pages, 11 figures

  25. arXiv:2407.08725  [pdf, other

    cs.CV cs.AI cs.RO

    MetaUrban: A Simulation Platform for Embodied AI in Urban Spaces

    Authors: Wayne Wu, Honglin He, Yiran Wang, Chenda Duan, Jack He, Zhizheng Liu, Quanyi Li, Bolei Zhou

    Abstract: Public urban spaces like streetscapes and plazas serve residents and accommodate social life in all its vibrant variations. Recent advances in Robotics and Embodied AI make public urban spaces no longer exclusive to humans. Food delivery bots and electric wheelchairs have started sharing sidewalks with pedestrians, while diverse robot dogs and humanoids have recently emerged in the street. Ensurin… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Technical report. Project page: https://metadriverse.github.io/metaurban/

  26. arXiv:2407.08554  [pdf, other

    cs.AI cs.HC

    Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models

    Authors: Wanling Gao, Yunyou Huang, Dandan Cui, Zhuoming Yu, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Gangyuan Zhao, Chongrong Jiang, Fan Huang, Tianyi Wei, Suqin Tang, Bingjie Xia, Zhifei Zhang, Jianfeng Zhan

    Abstract: A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of cl… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 23 pages

  27. arXiv:2407.08351  [pdf, other

    cs.CL cs.LG

    AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models

    Authors: Xiang Lisa Li, Evan Zheran Liu, Percy Liang, Tatsunori Hashimoto

    Abstract: Evaluation is critical for assessing capabilities, tracking scientific progress, and informing model selection. In this paper, we present three desiderata for a good benchmark for language models: (i) salience (e.g., knowledge about World War II is more salient than a random day in history), (ii) novelty (i.e., the benchmark reveals new trends in model rankings not shown by previous benchmarks), a… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: preprint

  28. arXiv:2407.08044  [pdf, other

    cs.CL cs.AI cs.LG

    RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization

    Authors: Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng

    Abstract: Low-Rank Adaptation (LoRA), as a representative Parameter-Efficient Fine-Tuning (PEFT)method, significantly enhances the training efficiency by updating only a small portion of the weights in Large Language Models (LLMs). Recently, weight-only quantization techniques have also been applied to LoRA methods to reduce the memory footprint of fine-tuning. However, applying weight-activation quantizati… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  29. arXiv:2407.07780  [pdf, other

    cs.CV

    Cross Domain Object Detection via Multi-Granularity Confidence Alignment based Mean Teacher

    Authors: Jiangming Chen, Li Liu, Wanxia Deng, Zhen Liu, Yu Liu, Yingmei Wei, Yongxiang Liu

    Abstract: Cross domain object detection learns an object detector for an unlabeled target domain by transferring knowledge from an annotated source domain. Promising results have been achieved via Mean Teacher, however, pseudo labeling which is the bottleneck of mutual learning remains to be further explored. In this study, we find that confidence misalignment of the predictions, including category-level ov… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  30. arXiv:2407.07702  [pdf, other

    cs.IT eess.SP

    Leveraging Self-Supervised Learning for MIMO-OFDM Channel Representation and Generation

    Authors: Zongxi Liu, Jiacheng Chen, Yunting Xu, Ting Ma, Jingbo Liu, Haibo Zhou, Dusit Niyato

    Abstract: In communications theory, the capacity of multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) systems is fundamentally determined by wireless channels, which exhibit both diversity and correlation in spatial, frequency and temporal domains. It is further envisioned to exploit the inherent nature of channels, namely representation, to achieve geolocation-based MIMO… ▽ More

    Submitted 23 May, 2024; originally announced July 2024.

  31. arXiv:2407.07672  [pdf, other

    cs.HC

    StoryDiffusion: How to Support UX Storyboarding With Generative-AI

    Authors: Zhaohui Liang, Xiaoyu Zhang, Kevin Ma, Zhao Liu, Xipei Ren, Kosa Goucher-Lambert, Can Liu

    Abstract: Storyboarding is an established method for designing user experiences. Generative AI can support this process by helping designers quickly create visual narratives. However, existing tools only focus on accurate text-to-image generation. Currently, it is not clear how to effectively support the entire creative process of storyboarding and how to develop AI-powered tools to support designers' indiv… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  32. arXiv:2407.07667  [pdf, other

    cs.CV eess.IV

    VEnhancer: Generative Space-Time Enhancement for Video Generation

    Authors: Jingwen He, Tianfan Xue, Dongyang Liu, Xinqi Lin, Peng Gao, Dahua Lin, Yu Qiao, Wanli Ouyang, Ziwei Liu

    Abstract: We present VEnhancer, a generative space-time enhancement framework that improves the existing text-to-video results by adding more details in spatial domain and synthetic detailed motion in temporal domain. Given a generated low-quality video, our approach can increase its spatial and temporal resolution simultaneously with arbitrary up-sampling space and time scales through a unified video diffu… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: technical report

  33. arXiv:2407.07325  [pdf, other

    cs.CV cs.CL cs.MM eess.IV

    HiLight: Technical Report on the Motern AI Video Language Model

    Authors: Zhiting Wang, Qiangong Zhou, Kangjie Yang, Zongyang Liu, Xin Mao

    Abstract: This technical report presents the implementation of a state-of-the-art video encoder for video-text modal alignment and a video conversation framework called HiLight, which features dual visual towers. The work is divided into two main parts: 1.alignment of video and text modalities; 2.convenient and efficient way to interact with users. Our goal is to address the task of video comprehension in t… ▽ More

    Submitted 11 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  34. arXiv:2407.07061  [pdf, other

    cs.CL

    Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

    Authors: Weize Chen, Ziming You, Ran Li, Yitong Guan, Chen Qian, Chenyang Zhao, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun

    Abstract: The rapid advancement of large language models (LLMs) has paved the way for the development of highly capable autonomous agents. However, existing multi-agent frameworks often struggle with integrating diverse capable third-party agents due to reliance on agents defined within their own ecosystems. They also face challenges in simulating distributed environments, as most frameworks are limited to… ▽ More

    Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: work in progress

  35. arXiv:2407.06767  [pdf, other

    cs.IT eess.SP

    Enhancing Robustness and Security in ISAC Network Design: Leveraging Transmissive Reconfigurable Intelligent Surface with RSMA

    Authors: Ziwei Liu, Wen Chen, Qingqing Wu, Zhendong Li, Xusheng Zhu, Qiong Wu, Nan Cheng

    Abstract: In this paper, we propose a novel transmissive reconfigurable intelligent surface transceiver-enhanced robust and secure integrated sensing and communication network. A time-division sensing communication mechanism is designed for the scenario, which enables communication and sensing to share wireless resources. To address the interference management problem and hinder eavesdropping, we implement… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  36. arXiv:2407.06249  [pdf, other

    cs.CL cs.SE

    CodeUpdateArena: Benchmarking Knowledge Editing on API Updates

    Authors: Zeyu Leo Liu, Shrey Pandit, Xi Ye, Eunsol Choi, Greg Durrett

    Abstract: Large language models (LLMs) are increasingly being used to synthesize and reason about source code. However, the static nature of these models' knowledge does not reflect the fact that libraries and API functions they invoke are continuously evolving, with functionality being added or changing. While numerous benchmarks evaluate how LLMs can generate code, no prior work has studied how an LLMs' k… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Under Review

  37. arXiv:2407.06190  [pdf, other

    cs.CV cs.LG cs.RO

    4D Contrastive Superflows are Dense 3D Representation Learners

    Authors: Xiang Xu, Lingdong Kong, Hui Shuai, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Qingshan Liu

    Abstract: In the realm of autonomous driving, accurate 3D perception is the foundation. However, developing such models relies on extensive human annotations -- a process that is both costly and labor-intensive. To address this challenge from a data representation learning perspective, we introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing spatiotempora… ▽ More

    Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; 36 pages, 11 figures, 11 tables; Code at https://github.com/Xiangxu-0103/SuperFlow

  38. arXiv:2407.06188  [pdf, other

    cs.CV

    CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation

    Authors: Xinying Guo, Mingyuan Zhang, Haozhe Xie, Chenyang Gu, Ziwei Liu

    Abstract: Crowd Motion Generation is essential in entertainment industries such as animation and games as well as in strategic fields like urban simulation and planning. This new task requires an intricate integration of control and generation to realistically synthesize crowd dynamics under specific spatial and semantic constraints, whose challenges are yet to be fully explored. On the one hand, existing h… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Project page: https://gxyes.github.io/projects/CrowdMoGen.html

  39. arXiv:2407.06129  [pdf, other

    cs.AI cs.HC

    Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization

    Authors: Hannah K. Bako, Arshnoor Bhutani, Xinyi Liu, Kwesi A. Cobbina, Zhicheng Liu

    Abstract: Automatically generating data visualizations in response to human utterances on datasets necessitates a deep semantic understanding of the data utterance, including implicit and explicit references to data attributes, visualization tasks, and necessary data preparation steps. Natural Language Interfaces (NLIs) for data visualization have explored ways to infer such information, yet challenges pers… ▽ More

    Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures, IEEE VIS short papers

  40. arXiv:2407.05758  [pdf, other

    eess.IV cs.AI cs.CV

    Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports

    Authors: Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zhengliang Liu, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang

    Abstract: Medical images and radiology reports are crucial for diagnosing medical conditions, highlighting the importance of quantitative analysis for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecti… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  41. arXiv:2407.05671  [pdf, other

    cs.CV cs.AI

    MSTF: Multiscale Transformer for Incomplete Trajectory Prediction

    Authors: Zhanwen Liu, Chao Li, Nan Yang, Yang Wang, Jiaqi Ma, Guangliang Cheng, Xiangmo Zhao

    Abstract: Motion forecasting plays a pivotal role in autonomous driving systems, enabling vehicles to execute collision warnings and rational local-path planning based on predictions of the surrounding vehicles. However, prevalent methods often assume complete observed trajectories, neglecting the potential impact of missing values induced by object occlusion, scope limitation, and sensor failures. Such ove… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  42. arXiv:2407.05467  [pdf, other

    cs.DC cs.AI

    The infrastructure powering IBM's Gen AI model development

    Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba , et al. (121 additional authors not shown)

    Abstract: AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering effi… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam,Brian Belgodere, Milton Bonilla

  43. arXiv:2407.05382  [pdf, other

    cs.CV

    Rethinking Unsupervised Outlier Detection via Multiple Thresholding

    Authors: Zhonghang Liu, Panzhong Lu, Guoyang Xie, Zhichao Lu, Wen-Yan Lin

    Abstract: In the realm of unsupervised image outlier detection, assigning outlier scores holds greater significance than its subsequent task: thresholding for predicting labels. This is because determining the optimal threshold on non-separable outlier score functions is an ill-posed problem. However, the lack of predicted labels not only hiders some real applications of current outlier detectors but also c… ▽ More

    Submitted 14 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

  44. arXiv:2407.04794  [pdf, other

    cs.CR cs.AI cs.CL cs.LG

    On Evaluating The Performance of Watermarked Machine-Generated Texts Under Adversarial Attacks

    Authors: Zesen Liu, Tianshuo Cong, Xinlei He, Qi Li

    Abstract: Large Language Models (LLMs) excel in various applications, including text generation and complex tasks. However, the misuse of LLMs raises concerns about the authenticity and ethical implications of the content they produce, such as deepfake news, academic fraud, and copyright infringement. Watermarking techniques, which embed identifiable markers in machine-generated text, offer a promising solu… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  45. arXiv:2407.04416  [pdf, other

    cs.SD cs.MM eess.AS

    Improving Audio Generation with Visual Enhanced Caption

    Authors: Yi Yuan, Dongya Jia, Xiaobin Zhuang, Yuanzhe Chen, Zhengxi Liu, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xubo Liu, Mark D. Plumbley, Wenwu Wang

    Abstract: Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from the low quality and relatively small quantity of training data. In this work, we aim to create a large-scale audio dataset with rich captions for improving audi… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 5 pages with 1 appendix

  46. arXiv:2407.03961  [pdf, other

    cs.CV cs.LG

    Leveraging Latent Diffusion Models for Training-Free In-Distribution Data Augmentation for Surface Defect Detection

    Authors: Federico Girella, Ziyue Liu, Franco Fummi, Francesco Setti, Marco Cristani, Luigi Capogrosso

    Abstract: Defect detection is the task of identifying defects in production samples. Usually, defect detection classifiers are trained on ground-truth data formed by normal samples (negative data) and samples with defects (positive data), where the latter are consistently fewer than normal samples. State-of-the-art data augmentation procedures add synthetic defect data by superimposing artifacts to normal s… ▽ More

    Submitted 11 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted at the 21st International Conference on Content-Based Multimedia Indexing (CBMI 2024)

  47. arXiv:2407.03699  [pdf, other

    cs.CV

    Generalized Robust Fundus Photography-based Vision Loss Estimation for High Myopia

    Authors: Zipei Yan, Zhile Liang, Zhengji Liu, Shuai Wang, Rachel Ka-Man Chun, Jizhou Li, Chea-su Kee, Dong Liang

    Abstract: High myopia significantly increases the risk of irreversible vision loss. Traditional perimetry-based visual field (VF) assessment provides systematic quantification of visual loss but it is subjective and time-consuming. Consequently, machine learning models utilizing fundus photographs to estimate VF have emerged as promising alternatives. However, due to the high variability and the limited ava… ▽ More

    Submitted 17 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI 2024, code: https://github.com/yanzipei/VF_RED

  48. arXiv:2407.03195  [pdf, other

    math.OC cs.LG

    Incremental Gauss--Newton Methods with Superlinear Convergence Rates

    Authors: Zhiling Zhou, Zhuanghua Liu, Chengchang Liu, Luo Luo

    Abstract: This paper addresses the challenge of solving large-scale nonlinear equations with Hölder continuous Jacobians. We introduce a novel Incremental Gauss--Newton (IGN) method within explicit superlinear convergence rate, which outperforms existing methods that only achieve linear convergence rate. In particular, we formulate our problem by the nonlinear least squares with finite-sum structure, and ou… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 37 pages, 9 figures

  49. arXiv:2407.03037  [pdf, other

    cs.SE

    Vision-driven Automated Mobile GUI Testing via Multimodal Large Language Model

    Authors: Zhe Liu, Cheng Li, Chunyang Chen, Junjie Wang, Boyu Wu, Yawen Wang, Jun Hu, Qing Wang

    Abstract: With the advancement of software rendering techniques, GUI pages in mobile apps now encompass a wealth of visual information, where the visual semantics of each page contribute to the overall app logic, presenting new challenges to software testing. Despite the progress in automated Graphical User Interface (GUI) testing, the absence of testing oracles has constrained its efficacy to identify only… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  50. arXiv:2407.02824  [pdf, other

    cs.SE

    Exploring the Capabilities of LLMs for Code Change Related Tasks

    Authors: Lishui Fan, Jiakun Liu, Zhongxin Liu, David Lo, Xin Xia, Shanping Li

    Abstract: Developers deal with code-change-related tasks daily, e.g., reviewing code. Pre-trained code and code-change-oriented models have been adapted to help developers with such tasks. Recently, large language models (LLMs) have shown their effectiveness in code-related tasks. However, existing LLMs for code focus on general code syntax and semantics rather than the differences between two code versions… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.