Skip to main content

Showing 1–50 of 3,256 results for author: Wang, W

  1. arXiv:2407.12888  [pdf

    cs.CL cs.AI

    Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models

    Authors: Alexander R. Pelletier, Joseph Ramirez, Irsyad Adam, Simha Sankar, Yu Yan, Ding Wang, Dylan Steinecke, Wei Wang, Peipei Ping

    Abstract: The vast amount of biomedical information available today presents a significant challenge for investigators seeking to digest, process, and understand these findings effectively. Large Language Models (LLMs) have emerged as powerful tools to navigate this complex and challenging data landscape. However, LLMs may lead to hallucinatory responses, making Retrieval Augmented Generation (RAG) crucial… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2407.12842  [pdf, other

    cs.CL cs.AI

    MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production

    Authors: Jian Ma, Wenguan Wang, Yi Yang, Feng Zheng

    Abstract: Sign language understanding has made significant strides; however, there is still no viable solution for generating sign sequences directly from entire spoken content, e.g., text or speech. In this paper, we propose a unified framework for continuous sign language production, easing communication between sign and non-sign language users. In particular, a sequence diffusion model, utilizing embeddi… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2024 Findings; Project Page: https://hechang25.github.io/MS2SL

  3. arXiv:2407.12797  [pdf, other

    cs.PF cs.LG

    CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines

    Authors: Wenbo Sun, Jiaqi Wang, Qiming Guo, Ziyu Li, Wenlu Wang, Rihan Hai

    Abstract: Online Large Language Model (LLM) services such as ChatGPT and Claude 3 have transformed business operations and academic research by effortlessly enabling new opportunities. However, due to data-sharing restrictions, sectors such as healthcare and finance prefer to deploy local LLM applications using costly hardware resources. This scenario requires a balance between the effectiveness advantages… ▽ More

    Submitted 20 June, 2024; originally announced July 2024.

  4. arXiv:2407.12334  [pdf, other

    cs.CR

    Cabin: Confining Untrusted Programs within Confidential VMs

    Authors: Benshan Mei, Saisai Xia, Wenhao Wang, Dongdai Lin

    Abstract: Confidential computing safeguards sensitive computations from untrusted clouds, with Confidential Virtual Machines (CVMs) providing a secure environment for guest OS. However, CVMs often come with large and vulnerable operating system kernels, making them susceptible to attacks exploiting kernel weaknesses. The imprecise control over the read/write access in the page table has allowed attackers to… ▽ More

    Submitted 17 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: ICICS 2024

  5. arXiv:2407.12180  [pdf, other

    cs.NI cs.RO

    A UAV-assisted Wireless Localization Challenge on AERPAW

    Authors: Paul Kudyba, Jaya Sravani Mandapaka, Weijie Wang, Logan McCorkendale, Zachary McCorkendale, Mathias Kidane, Haijian Sun, Eric Adams, Kamesh Namuduri, Fraida Fund, Mihail Sichitiu, Ozgur Ozdemir

    Abstract: As wireless researchers are tasked to enable wireless communication as infrastructure in more dynamic aerial settings, there is a growing need for large-scale experimental platforms that provide realistic, reproducible, and reliable experimental validation. To bridge the research-to-implementation gap, the Aerial Experimentation and Research Platform for Advanced Wireless (AERPAW) offers open-sour… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: submitted to IEEE magazine paper

  6. arXiv:2407.11745  [pdf, other

    eess.AS cs.AI cs.SD

    Universal Sound Separation with Self-Supervised Audio Masked Autoencoder

    Authors: Junqi Zhao, Xubo Liu, Jinzheng Zhao, Yi Yuan, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

    Abstract: Universal sound separation (USS) is a task of separating mixtures of arbitrary sound sources. Typically, universal separation models are trained from scratch in a supervised manner, using labeled data. Self-supervised learning (SSL) is an emerging deep learning approach that leverages unlabeled data to obtain task-agnostic representations, which can benefit many downstream tasks. In this paper, we… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  7. arXiv:2407.11610  [pdf, other

    cs.CV

    MergeNet: Explicit Mesh Reconstruction from Sparse Point Clouds via Edge Prediction

    Authors: Weimin Wang, Yingxu Deng, Zezeng Li, Yu Liu, Na Lei

    Abstract: This paper introduces a novel method for reconstructing meshes from sparse point clouds by predicting edge connection. Existing implicit methods usually produce superior smooth and watertight meshes due to the isosurface extraction algorithms~(e.g., Marching Cubes). However, these methods become memory and computationally intensive with increasing resolution. Explicit methods are more efficient by… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  8. arXiv:2407.10373  [pdf, other

    cs.SD cs.AI cs.CV eess.AS

    Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion

    Authors: Jian Ma, Wenguan Wang, Yi Yang, Feng Zheng

    Abstract: Visual acoustic matching (VAM) is pivotal for enhancing the immersive experience, and the task of dereverberation is effective in improving audio intelligibility. Existing methods treat each task independently, overlooking the inherent reciprocity between them. Moreover, these methods depend on paired training data, which is challenging to acquire, impeding the utilization of extensive unpaired da… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Project page: https://hechang25.github.io/MVSD

  9. arXiv:2407.10200  [pdf, other

    cs.CV cs.AI

    Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data

    Authors: Tuo Feng, Wenguan Wang, Ruijie Quan, Yi Yang

    Abstract: Current 3D self-supervised learning methods of 3D scenes face a data desert issue, resulting from the time-consuming and expensive collecting process of 3D scene data. Conversely, 3D shape datasets are easier to collect. Despite this, existing pre-training strategies on shape data offer limited potential for 3D scene understanding due to significant disparities in point quantities. To tackle these… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Project page: https://github.com/FengZicai/S2S

  10. arXiv:2407.10167  [pdf, other

    cs.CL cs.AI

    Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model

    Authors: Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang

    Abstract: Large Language Models (LLMs) have demonstrated exceptional proficiency in mathematical reasoning tasks due to their extensive parameter counts and training on vast datasets. Despite these capabilities, deploying LLMs is hindered by their computational demands. Distilling LLM mathematical reasoning into Smaller Language Models (SLMs) has emerged as a solution to this challenge, although these small… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.11864

  11. arXiv:2407.10132  [pdf, other

    cs.LG stat.ME

    Optimal Kernel Choice for Score Function-based Causal Discovery

    Authors: Wenjie Wang, Biwei Huang, Feng Liu, Xinge You, Tongliang Liu, Kun Zhang, Mingming Gong

    Abstract: Score-based methods have demonstrated their effectiveness in discovering causal relationships by scoring different causal structures based on their goodness of fit to the data. Recently, Huang et al. proposed a generalized score function that can handle general data distributions and causal relationships by modeling the relations in reproducing kernel Hilbert space (RKHS). The selection of an appr… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted by ICML2024

  12. arXiv:2407.09792  [pdf, other

    cs.RO

    Language-Augmented Symbolic Planner for Open-World Task Planning

    Authors: Guanqi Chen, Lei Yang, Ruixing Jia, Zhe Hu, Yizhou Chen, Wei Zhang, Wenping Wang, Jia Pan

    Abstract: Enabling robotic agents to perform complex long-horizon tasks has been a long-standing goal in robotics and artificial intelligence (AI). Despite the potential shown by large language models (LLMs), their planning capabilities remain limited to short-horizon tasks and they are unable to replace the symbolic planning approach. Symbolic planners, on the other hand, may encounter execution errors due… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by Robotics: Science and Systems (RSS) 2024

  13. arXiv:2407.09693  [pdf, other

    cs.LG cs.AI

    A Mathematical Framework, a Taxonomy of Modeling Paradigms, and a Suite of Learning Techniques for Neural-Symbolic Systems

    Authors: Charles Dickens, Connor Pryor, Changyu Gao, Alon Albalak, Eriq Augustine, William Wang, Stephen Wright, Lise Getoor

    Abstract: The field of Neural-Symbolic (NeSy) systems is growing rapidly. Proposed approaches show great promise in achieving symbiotic unions of neural and symbolic methods. However, each NeSy system differs in fundamental ways. There is a pressing need for a unifying theory to illuminate the commonalities and differences in approaches and enable further progress. In this paper, we introduce Neural-Symboli… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  14. arXiv:2407.09648  [pdf, other

    cs.CV

    3x2: 3D Object Part Segmentation by 2D Semantic Correspondences

    Authors: Anh Thai, Weiyao Wang, Hao Tang, Stefan Stojanov, Matt Feiszli, James M. Rehg

    Abstract: 3D object part segmentation is essential in computer vision applications. While substantial progress has been made in 2D object part segmentation, the 3D counterpart has received less attention, in part due to the scarcity of annotated 3D datasets, which are expensive to collect. In this work, we propose to leverage a few annotated 3D shapes or richly annotated 2D datasets to perform 3D object par… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  15. arXiv:2407.09121  [pdf, other

    cs.CL cs.AI

    Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

    Authors: Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu

    Abstract: This study addresses a critical gap in safety tuning practices for Large Language Models (LLMs) by identifying and tackling a refusal position bias within safety tuning data, which compromises the models' ability to appropriately refuse generating unsafe content. We introduce a novel approach, Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse compliance to harmful prompts at a… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  16. arXiv:2407.08133  [pdf, other

    cs.CV cs.AI

    Nonverbal Interaction Detection

    Authors: Jianan Wei, Tianfei Zhou, Yi Yang, Wenguan Wang

    Abstract: This work addresses a new challenge of understanding human nonverbal interaction in social contexts. Nonverbal signals pervade virtually every communicative act. Our gestures, facial expressions, postures, gaze, even physical appearance all convey messages, without anything being said. Despite their critical role in social life, nonverbal signals receive very limited attention as compared to the l… ▽ More

    Submitted 14 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Project page: https://github.com/weijianan1/NVI

  17. arXiv:2407.08127  [pdf, other

    cs.CV

    Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment

    Authors: Yufan Liu, Wanqian Zhang, Dayan Wu, Zheng Lin, Jingzi Gu, Weiping Wang

    Abstract: Model inversion (MI) attack reconstructs the private training data of a target model given its output, posing a significant threat to deep learning models and data privacy. On one hand, most of existing MI methods focus on searching for latent codes to represent the target identity, yet this iterative optimization-based scheme consumes a huge number of queries to the target model, making it unreal… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  18. arXiv:2407.07924  [pdf, other

    math.OC cs.AI cs.CL cs.LG

    Solving General Natural-Language-Description Optimization Problems with Large Language Models

    Authors: Jihai Zhang, Wei Wang, Siyan Guo, Li Wang, Fangquan Lin, Cheng Yang, Wotao Yin

    Abstract: Optimization problems seek to find the best solution to an objective under a set of constraints, and have been widely investigated in real-world applications. Modeling and solving optimization problems in a specific domain typically require a combination of domain knowledge, mathematical skills, and programming ability, making it difficult for general users and even domain professionals. In this p… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  19. arXiv:2407.07504  [pdf, other

    cs.CV

    Pan-cancer Histopathology WSI Pre-training with Position-aware Masked Autoencoder

    Authors: Kun Wu, Zhiguo Jiang, Kunming Tang, Jun Shi, Fengying Xie, Wei Wang, Haibo Wu, Yushan Zheng

    Abstract: Large-scale pre-training models have promoted the development of histopathology image analysis. However, existing self-supervised methods for histopathology images focus on learning patch features, while there is still a lack of available pre-training models for WSI-level feature learning. In this paper, we propose a novel self-supervised learning framework for pan-cancer WSI-level representation… ▽ More

    Submitted 15 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  20. arXiv:2407.07487  [pdf, other

    cs.CL

    Review-LLM: Harnessing Large Language Models for Personalized Review Generation

    Authors: Qiyao Peng, Hongtao Liu, Hongyan Xu, Qing Yang, Minglai Shao, Wenjun Wang

    Abstract: Product review generation is an important task in recommender systems, which could provide explanation and persuasiveness for the recommendation. Recently, Large Language Models (LLMs, e.g., ChatGPT) have shown superior text modeling and generating ability, which could be applied in review generation. However, directly applying the LLMs for generating reviews might be troubled by the ``polite'' ph… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  21. arXiv:2407.07433  [pdf, other

    cs.CV cs.AI

    Controllable Navigation Instruction Generation with Chain of Thought Prompting

    Authors: Xianghao Kong, Jinyu Chen, Wenguan Wang, Hang Su, Xiaolin Hu, Yi Yang, Si Liu

    Abstract: Instruction generation is a vital and multidisciplinary research area with broad applications. Existing instruction generation models are limited to generating instructions in a single style from a particular dataset, and the style and content of generated instructions cannot be controlled. Moreover, most existing instruction generation methods also disregard the spatial modeling of the navigation… ▽ More

    Submitted 16 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  22. arXiv:2407.07056  [pdf, other

    cs.CV eess.IV

    CAPformer: Compression-Aware Pre-trained Transformer for Low-Light Image Enhancement

    Authors: Wei Wang, Zhi Jin

    Abstract: Low-Light Image Enhancement (LLIE) has advanced with the surge in phone photography demand, yet many existing methods neglect compression, a crucial concern for resource-constrained phone photography. Most LLIE methods overlook this, hindering their effectiveness. In this study, we investigate the effects of JPEG compression on low-light images and reveal substantial information loss caused by JPE… ▽ More

    Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  23. arXiv:2407.06953  [pdf, other

    cs.DC

    SP-Chain: Boosting Intra-Shard and Cross-Shard Security and Performance in Blockchain Sharding

    Authors: Mingzhe Li, You Lin, Wei Wang, Jin Zhang

    Abstract: A promising way to overcome the scalability limitations of the current blockchain is to use sharding, which is to split the transaction processing among multiple, smaller groups of nodes. A well-performed blockchain sharding system requires both high performance and high security in both intra- and cross-shard perspectives. However, existing protocols either have issues on protecting security or t… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  24. arXiv:2407.06943  [pdf, other

    cs.RO

    A Starter's Kit for Concentric Tube Robots

    Authors: Kalina Bonofiglio, Wenpeng Wang, Ethan R. Wilke, Adri Rajaraman, Loris Fichera

    Abstract: Concentric Tube Robots (CTRs) have garnered significant interest within the surgical robotics community because of their flexibility, dexterity, and ease of miniaturization. However, mastering the unique kinematics and design principles of CTRs can be challenging for newcomers to the field. In this paper, we present an educational kit aimed at lowering the barriers to entry into concentric tube ro… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  25. arXiv:2407.06844  [pdf, other

    cs.CV

    Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration

    Authors: Tianshui Chen, Weihang Wang, Tao Pu, Jinghui Qin, Zhijing Yang, Jie Liu, Liang Lin

    Abstract: Modern visual recognition models often display overconfidence due to their reliance on complex deep neural networks and one-hot target supervision, resulting in unreliable confidence scores that necessitate calibration. While current confidence calibration techniques primarily address single-label scenarios, there is a lack of focus on more practical and generalizable multi-label contexts. This pa… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: submitted to TIP

  26. arXiv:2407.06540  [pdf, other

    cs.CV cs.AI

    General and Task-Oriented Video Segmentation

    Authors: Mu Chen, Liulei Li, Wenguan Wang, Ruijie Quan, Yi Yang

    Abstract: We present GvSeg, a general video segmentation framework for addressing four different video segmentation tasks (i.e., instance, semantic, panoptic, and exemplar-guided) while maintaining an identical architectural design. Currently, there is a trend towards developing general video segmentation solutions that can be applied across multiple tasks. This streamlines research endeavors and simplifies… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Project page: https://github.com/kagawa588/GvSeg

  27. arXiv:2407.06426  [pdf, other

    cs.CL cs.AI cs.MA

    DebUnc: Mitigating Hallucinations in Large Language Model Agent Communication with Uncertainty Estimations

    Authors: Luke Yoffe, Alfonso Amayuelas, William Yang Wang

    Abstract: To enhance Large Language Model (LLM) capabilities, multi-agent debates have been introduced, where multiple LLMs discuss solutions to a problem over several rounds of debate. However, LLMs often produce incorrect responses that appear deceptively confident, which can mislead other agents. This is partly because agents do not express their confidence levels during standard debates. To address this… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  28. arXiv:2407.05862  [pdf, other

    cs.CV

    Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

    Authors: Bin Ren, Guofeng Mei, Danda Pani Paudel, Weijie Wang, Yawei Li, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Nicu Sebe

    Abstract: Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones. However, in 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant. This raises the question: Can we take the best of both worlds? To answer this question, we first empirically validate that integrating MAE-ba… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

  29. arXiv:2407.05718  [pdf, other

    cs.CL

    A Factuality and Diversity Reconciled Decoding Method for Knowledge-Grounded Dialogue Generation

    Authors: Chenxu Yang, Zheng Lin, Chong Tian, Liang Pang, Lanrui Wang, Zhengyang Tong, Qirong Ho, Yanan Cao, Weiping Wang

    Abstract: Grounding external knowledge can enhance the factuality of responses in dialogue generation. However, excessive emphasis on it might result in the lack of engaging and diverse expressions. Through the introduction of randomness in sampling, current approaches can increase the diversity. Nevertheless, such sampling method could undermine the factuality in dialogue generation. In this study, to disc… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  30. arXiv:2407.05700  [pdf, other

    cs.CL cs.AI cs.SE

    InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct

    Authors: Yutong Wu, Di Huang, Wenxuan Shi, Wei Wang, Lingzhe Gao, Shihao Liu, Ziyuan Nan, Kaizhao Yuan, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Yewen Pu, Dawei Yin, Xing Hu, Yunji Chen

    Abstract: Recent advancements in open-source code large language models (LLMs) have demonstrated remarkable coding abilities by fine-tuning on the data generated from powerful closed-source LLMs such as GPT-3.5 and GPT-4 for instruction tuning. This paper explores how to further improve an instruction-tuned code LLM by generating data from itself rather than querying closed-source LLMs. Our key observation… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  31. arXiv:2407.05690  [pdf, other

    cs.CL cs.AI

    Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations

    Authors: Bowen Shen, Zheng Lin, Daren Zha, Wei Liu, Jian Luan, Bin Wang, Weiping Wang

    Abstract: Structured pruning fundamentally reduces computational and memory overheads of large language models (LLMs) and offers a feasible solution for end-side LLM deployment. Structurally pruned models remain dense and high-precision, highly compatible with further tuning and compression. However, as the coarse-grained structured pruning poses large damage to the highly interconnected model, achieving a… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Findings of ACL 2024

  32. arXiv:2407.05550  [pdf, other

    cs.HC cs.AI

    MEEG and AT-DGNN: Advancing EEG Emotion Recognition with Music and Graph Learning

    Authors: Minghao Xiao, Zhengxi Zhu, Wenyu Wang, Meixia Qu

    Abstract: Recent advances in neuroscience have elucidated the crucial role of coordinated brain region activities during cognitive tasks. To explore the complexity, we introduce the MEEG dataset, a comprehensive multi-modal music-induced electroencephalogram (EEG) dataset and the Attention-based Temporal Learner with Dynamic Graph Neural Network (AT-DGNN), a novel framework for EEG-based emotion recognition… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  33. arXiv:2407.05250  [pdf, other

    cs.CL

    CLIMB: A Benchmark of Clinical Bias in Large Language Models

    Authors: Yubo Zhang, Shudi Hou, Mingyu Derek Ma, Wei Wang, Muhao Chen, Jieyu Zhao

    Abstract: Large language models (LLMs) are increasingly applied to clinical decision-making. However, their potential to exhibit bias poses significant risks to clinical equity. Currently, there is a lack of benchmarks that systematically evaluate such clinical bias in LLMs. While in downstream tasks, some biases of LLMs can be avoided such as by instructing the model to answer "I'm not sure...", the intern… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  34. arXiv:2407.05054  [pdf

    cs.CL

    Cross-Lingual Word Alignment for ASEAN Languages with Contrastive Learning

    Authors: Jingshen Zhang, Xinying Qiu, Teng Shen, Wenyu Wang, Kailin Zhang, Wenhe Feng

    Abstract: Cross-lingual word alignment plays a crucial role in various natural language processing tasks, particularly for low-resource languages. Recent study proposes a BiLSTM-based encoder-decoder model that outperforms pre-trained language models in low-resource settings. However, their model only considers the similarity of word embedding spaces and does not explicitly model the differences between wor… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  35. Ubiquitous Integrated Sensing and Communications for Massive MIMO LEO Satellite Systems

    Authors: Li You, Yongxiang Zhu, Xiaoyu Qiang, Christos G. Tsinos, Wenjin Wang, Xiqi Gao, Björn Ottersten

    Abstract: The next sixth generation (6G) networks are envisioned to integrate sensing and communications in a single system, thus greatly improving spectrum utilization and reducing hardware costs. Low earth orbit (LEO) satellite communications combined with massive multiple-input multiple-output (MIMO) technology holds significant promise in offering ubiquitous and seamless connectivity with high data rate… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 6 pages,4 figures

    Journal ref: IEEE Internet of Things Magazine, vol. 7, no. 4, pp. 30-35, Jul. 2024

  36. arXiv:2407.04973  [pdf, other

    cs.AI cs.CL cs.CV cs.LG

    LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts

    Authors: Yijia Xiao, Edward Sun, Tianyu Liu, Wei Wang

    Abstract: We propose LogicVista, an evaluation benchmark that assesses the integrated logical reasoning capabilities of multimodal large language models (MLLMs) in Visual contexts. Recent advancements in MLLMs have demonstrated various fascinating abilities, from crafting poetry based on an image to performing mathematical reasoning. However, there is still a lack of systematic evaluation of MLLMs' proficie… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: LogicVista benchmarks the logical reasoning of multimodal large language models in visual tasks

  37. arXiv:2407.04947  [pdf, other

    cs.CV

    FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior

    Authors: Zhekai Chen, Wen Wang, Zhen Yang, Zeqing Yuan, Hao Chen, Chunhua Shen

    Abstract: We offer a novel approach to image composition, which integrates multiple input images into a single, coherent image. Rather than concentrating on specific use cases such as appearance editing (image harmonization) or semantic editing (semantic image composition), we showcase the potential of utilizing the powerful generative prior inherent in large-scale pre-trained diffusion models to accomplish… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted to Proc. Eur. Conf. Comp. Vision 2024. Project webpage: https://github.com/aim-uofa/FreeCompose

  38. arXiv:2407.04936  [pdf, other

    cs.SD eess.AS

    A Reference-free Metric for Language-Queried Audio Source Separation using Contrastive Language-Audio Pretraining

    Authors: Feiyang Xiao, Jian Guan, Qiaoxi Zhu, Xubo Liu, Wenbo Wang, Shuhan Qi, Kejia Zhang, Jianyuan Sun, Wenwu Wang

    Abstract: Language-queried audio source separation (LASS) aims to separate an audio source guided by a text query, with the signal-to-distortion ratio (SDR)-based metrics being commonly used to objectively measure the quality of the separated audio. However, the SDR-based metrics require a reference signal, which is often difficult to obtain in real-world scenarios. In addition, with the SDR-based metrics,… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Submitted to DCASE 2024 Workshop

  39. arXiv:2407.04903  [pdf, other

    cs.CL cs.AI cs.CV

    MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension

    Authors: Zekun Li, Xianjun Yang, Kyuri Choi, Wanrong Zhu, Ryan Hsieh, HyeonJung Kim, Jin Hyuk Lim, Sungyoung Ji, Byungju Lee, Xifeng Yan, Linda Ruth Petzold, Stephen D. Wilson, Woosang Lim, William Yang Wang

    Abstract: The rapid advancement of Large Language Models (LLMs) and Large Multimodal Models (LMMs) has heightened the demand for AI-based scientific assistants capable of understanding scientific articles and figures. Despite progress, there remains a significant gap in evaluating models' comprehension of professional, graduate-level, and even PhD-level scientific content. Current datasets and benchmarks pr… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Code and data are available at https://github.com/Leezekun/MMSci

  40. arXiv:2407.04713  [pdf

    cs.ET physics.optics

    16-channel Photonic Solver for Optimization Problems on a Silicon Chip

    Authors: Jiayi Ouyang, Shengping Liu, Ziyue Yang, Wei Wang, Xue Feng, Yongzhuo Li, Yidong Huang

    Abstract: In this article, we proposed a programmable 16-channel photonic solver for quadratic unconstrained binary optimization (QUBO) problems. The solver is based on a hybrid optoelectronic scheme including a photonic chip and the corresponding electronic driving circuit. The photonic chip is fabricated on silicon on insulator (SOI) substrate and integrates high-speed electro-optic modulators, thermo-opt… ▽ More

    Submitted 5 June, 2024; originally announced July 2024.

  41. arXiv:2407.04416  [pdf, other

    cs.SD cs.MM eess.AS

    Improving Audio Generation with Visual Enhanced Caption

    Authors: Yi Yuan, Dongya Jia, Xiaobin Zhuang, Yuanzhe Chen, Zhengxi Liu, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xubo Liu, Mark D. Plumbley, Wenwu Wang

    Abstract: Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from the low quality and relatively small quantity of training data. In this work, we aim to create a large-scale audio dataset with rich captions for improving audi… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 5 pages with 1 appendix

  42. arXiv:2407.04051  [pdf, other

    cs.SD cs.AI eess.AS

    FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

    Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang , et al. (8 additional authors not shown)

    Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, sp… ▽ More

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress. Authors are listed in alphabetical order by family name

  43. arXiv:2407.03320  [pdf, other

    cs.CV cs.CL

    InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

    Authors: Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao , et al. (2 additional authors not shown)

    Abstract: We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. Th… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Technical Report. https://github.com/InternLM/InternLM-XComposer

  44. arXiv:2407.02918  [pdf, other

    cs.CV eess.IV

    Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction

    Authors: Jiaxin Guo, Jiangliu Wang, Di Kang, Wenzhen Dong, Wenting Wang, Yun-hui Liu

    Abstract: Real-time 3D reconstruction of surgical scenes plays a vital role in computer-assisted surgery, holding a promise to enhance surgeons' visibility. Recent advancements in 3D Gaussian Splatting (3DGS) have shown great potential for real-time novel view synthesis of general scenes, which relies on accurate poses and point clouds generated by Structure-from-Motion (SfM) for initialization. However, 3D… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  45. arXiv:2407.02342  [pdf, ps, other

    cs.LG cs.DC cs.MA cs.NI

    Optimizing Age of Information in Vehicular Edge Computing with Federated Graph Neural Network Multi-Agent Reinforcement Learning

    Authors: Wenhua Wang, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Jiangzhou Wang, Khaled B. Letaief

    Abstract: With the rapid development of intelligent vehicles and Intelligent Transport Systems (ITS), the sensors such as cameras and LiDAR installed on intelligent vehicles provides higher capacity of executing computation-intensive and delay-sensitive tasks, thereby raising deployment costs. To address this issue, Vehicular Edge Computing (VEC) has been proposed to process data through Road Side Units (RS… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/Optimizing-AoI-in-VEC-with-Federated-Graph-Neural-Network-Multi-Agent-Reinforcement-Learning

  46. arXiv:2407.02301  [pdf, other

    cs.CL

    CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models

    Authors: Ying Nie, Binwei Yan, Tianyu Guo, Hao Liu, Haoyu Wang, Wei He, Binfan Zheng, Weihao Wang, Qiang Li, Weijian Sun, Yunhe Wang, Dacheng Tao

    Abstract: Large language models (LLMs) have achieved remarkable performance on various NLP tasks, yet their potential in more challenging and domain-specific task, such as finance, has not been fully explored. In this paper, we present CFinBench: a meticulously crafted, the most comprehensive evaluation benchmark to date, for assessing the financial knowledge of LLMs under Chinese context. In practice, to b… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  47. arXiv:2407.02031  [pdf, other

    cs.DC cs.AI cs.LG

    SwiftDiffusion: Efficient Diffusion Model Serving with Add-on Modules

    Authors: Suyi Li, Lingyun Yang, Xiaoxiao Jiang, Hanfeng Lu, Zhipeng Di, Weiyi Lu, Jiawei Chen, Kan Liu, Yinghao Yu, Tao Lan, Guodong Yang, Lin Qu, Liping Zhang, Wei Wang

    Abstract: This paper documents our characterization study and practices for serving text-to-image requests with stable diffusion models in production. We first comprehensively analyze inference request traces for commercial text-to-image applications. It commences with our observation that add-on modules, i.e., ControlNets and LoRAs, that augment the base stable diffusion models, are ubiquitous in generatin… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  48. arXiv:2407.01863  [pdf, other

    cs.CL

    VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs

    Authors: Qiucheng Wu, Handong Zhao, Michael Saxon, Trung Bui, William Yang Wang, Yang Zhang, Shiyu Chang

    Abstract: Vision language models (VLMs) are an exciting emerging class of language models (LMs) that have merged classic LM capabilities with those of image processing systems. However, the ways that these capabilities combine are not always intuitive and warrant direct investigation. One understudied capability in VLMs is visual spatial planning -- the ability to comprehend the spatial arrangements of obje… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  49. arXiv:2407.01231  [pdf, other

    cs.CL cs.AI

    MIRAI: Evaluating LLM Agents for Event Forecasting

    Authors: Chenchen Ye, Ziniu Hu, Yihe Deng, Zijie Huang, Mingyu Derek Ma, Yanqiao Zhu, Wei Wang

    Abstract: Recent advancements in Large Language Models (LLMs) have empowered LLM agents to autonomously collect world information, over which to conduct reasoning to solve complex problems. Given this capability, increasing interests have been put into employing LLM agents for predicting international events, which can influence decision-making and shape policy development on an international scale. Despite… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 66 pages, 8 figures, 6 tables; Website: https://mirai-llm.github.io/

  50. arXiv:2407.01178  [pdf, other

    cs.CL cs.AI cs.LG

    $\text{Memory}^3$: Language Modeling with Explicit Memory

    Authors: Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E

    Abstract: The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowled… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    MSC Class: 68T50 ACM Class: I.2.7