Skip to main content

Showing 1–50 of 945 results for author: Luo, Y

  1. arXiv:2407.13088  [pdf, other

    cs.DC

    Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing

    Authors: Yizhou Luo, Qiang Wang, Shaohuai Shi, Jiaxin Lai, Shuhan Qi, Jiajia Zhang, Xuan Wang

    Abstract: Deep learning (DL) has demonstrated significant success across diverse fields, leading to the construction of dedicated GPU accelerators within GPU clusters for high-quality training services. Efficient scheduler designs for such clusters are vital to reduce operational costs and enhance resource utilization. While recent schedulers have shown impressive performance in optimizing DL job performanc… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2407.11315  [pdf, other

    cs.AI

    COMET: "Cone of experience" enhanced large multimodal model for mathematical problem generation

    Authors: Sannyuya Liu, Jintian Feng, Zongkai Yang, Yawei Luo, Qian Wan, Xiaoxuan Shen, Jianwen Sun

    Abstract: The automatic generation of high-quality mathematical problems is practically valuable in many educational scenarios. Large multimodal model provides a novel technical approach for the mathematical problem generation because of its wide success in cross-modal data scenarios. However, the traditional method of separating problem solving from problem generation and the mainstream fine-tuning framewo… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  3. arXiv:2407.11272  [pdf, other

    cs.CV math.DG

    Differentiable Voxelization and Mesh Morphing

    Authors: Yihao Luo, Yikai Wang, Zhengrui Xiang, Yuliang Xiu, Guang Yang, ChoonHwai Yap

    Abstract: In this paper, we propose the differentiable voxelization of 3D meshes via the winding number and solid angles. The proposed approach achieves fast, flexible, and accurate voxelization of 3D meshes, admitting the computation of gradients with respect to the input mesh and GPU acceleration. We further demonstrate the application of the proposed voxelization in mesh morphing, where the voxelized mes… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  4. Geometric Understanding of Discriminability and Transferability for Visual Domain Adaptation

    Authors: You-Wei Luo, Chuan-Xian Ren, Xiao-Lin Xu, Qingshan Liu

    Abstract: To overcome the restriction of identical distribution assumption, invariant representation learning for unsupervised domain adaptation (UDA) has made significant advances in computer vision and pattern recognition communities. In UDA scenario, the training and test data belong to different domains while the task model is learned to be invariant. Recently, empirical connections between transferabil… ▽ More

    Submitted 24 June, 2024; originally announced July 2024.

    Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  5. arXiv:2407.09451  [pdf, other

    cs.RO

    Benchmarking Large Neighborhood Search for Multi-Agent Path Finding

    Authors: Jiaqi Tan, Yudong Luo, Jiaoyang Li, Hang Ma

    Abstract: Multi-Agent Path Finding (MAPF) aims to arrange collision-free goal-reaching paths for a group of agents. Anytime MAPF solvers based on large neighborhood search (LNS) have gained prominence recently due to their flexibility and scalability. Neighborhood selection strategy is crucial to the success of MAPF-LNS and a flurry of methods have been proposed. However, several pitfalls exist and hinder a… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  6. arXiv:2407.09431  [pdf, other

    cs.CV

    Rethinking temporal self-similarity for repetitive action counting

    Authors: Yanan Luo, Jinhui Yi, Yazan Abu Farha, Moritz Wolter, Juergen Gall

    Abstract: Counting repetitive actions in long untrimmed videos is a challenging task that has many applications such as rehabilitation. State-of-the-art methods predict action counts by first generating a temporal self-similarity matrix (TSM) from the sampled frames and then feeding the matrix to a predictor network. The self-similarity matrix, however, is not an optimal input to a network since it discards… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted to ICIP 2024

  7. arXiv:2407.08946  [pdf, other

    cs.LG

    Your Diffusion Model is Secretly a Noise Classifier and Benefits from Contrastive Training

    Authors: Yunshu Wu, Yingtao Luo, Xianghao Kong, Evangelos E. Papalexakis, Greg Ver Steeg

    Abstract: Diffusion models learn to denoise data and the trained denoiser is then used to generate new samples from the data distribution. In this paper, we revisit the diffusion sampling process and identify a fundamental cause of sample quality degradation: the denoiser is poorly estimated in regions that are far Outside Of the training Distribution (OOD), and the sampling process inevitably evaluates in… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  8. arXiv:2407.08813  [pdf, other

    eess.IV cs.AI cs.CV

    FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification

    Authors: Yu Tian, Congcong Wen, Min Shi, Muhammad Muneeb Afzal, Hao Huang, Muhammad Osama Khan, Yan Luo, Yi Fang, Mengyu Wang

    Abstract: Addressing fairness in artificial intelligence (AI), particularly in medical AI, is crucial for ensuring equitable healthcare outcomes. Recent efforts to enhance fairness have introduced new methodologies and datasets in medical AI. However, the fairness issue under the setting of domain transfer is almost unexplored, while it is common that clinics rely on different imaging technologies (e.g., di… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Codes available at https://github.com/Harvard-Ophthalmology-AI-Lab/FairDomain

  9. arXiv:2407.07478  [pdf, other

    cs.CV

    EA-VTR: Event-Aware Video-Text Retrieval

    Authors: Zongyang Ma, Ziqi Zhang, Yuxin Chen, Zhongang Qi, Chunfeng Yuan, Bing Li, Yingmin Luo, Xu Li, Xiaojuan Qi, Ying Shan, Weiming Hu

    Abstract: Understanding the content of events occurring in the video and their inherent temporal logic is crucial for video-text retrieval. However, web-crawled pre-training datasets often lack sufficient event information, and the widely adopted video-level cross-modal contrastive learning also struggles to capture detailed and complex video-text event alignment. To address these challenges, we make improv… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  10. arXiv:2407.06939  [pdf, other

    cs.RO cs.CV

    Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge

    Authors: Sriram Yenamandra, Arun Ramachandran, Mukul Khanna, Karmesh Yadav, Jay Vakil, Andrew Melnik, Michael Büttner, Leon Harz, Lyon Brown, Gora Chand Nandi, Arjun PS, Gaurav Kumar Yadav, Rahul Kala, Robert Haschke, Yang Luo, Jinxin Zhu, Yansen Han, Bingyi Lu, Xuan Gu, Qinyuan Liu, Yaping Zhao, Qiting Ye, Chenxiao Dou, Yansong Chua, Volodymyr Kuzma , et al. (20 additional authors not shown)

    Abstract: In order to develop robots that can effectively serve as versatile and capable home assistants, it is crucial for them to reliably perceive and interact with a wide variety of objects across diverse environments. To this end, we proposed Open Vocabulary Mobile Manipulation as a key benchmark task for robotics: finding any object in a novel environment and placing it on any receptacle surface withi… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  11. arXiv:2407.06169  [pdf, other

    cs.RO cs.CV cs.LG

    Potential Based Diffusion Motion Planning

    Authors: Yunhao Luo, Chen Sun, Joshua B. Tenenbaum, Yilun Du

    Abstract: Effective motion planning in high dimensional spaces is a long-standing open problem in robotics. One class of traditional motion planning algorithms corresponds to potential-based motion planning. An advantage of potential based motion planning is composability -- different motion constraints can be easily combined by adding corresponding potentials. However, constructing motion paths from potent… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: ICML 2024. Project page and code at https://energy-based-model.github.io/potential-motion-plan/

  12. arXiv:2407.05633  [pdf, other

    cs.LG cs.CR

    AdaPI: Facilitating DNN Model Adaptivity for Efficient Private Inference in Edge Computing

    Authors: Tong Zhou, Jiahui Zhao, Yukui Luo, Xi Xie, Wujie Wen, Caiwen Ding, Xiaolin Xu

    Abstract: Private inference (PI) has emerged as a promising solution to execute computations on encrypted data, safeguarding user privacy and model parameters in edge computing. However, existing PI methods are predominantly developed considering constant resource constraints, overlooking the varied and dynamic resource constraints in diverse edge devices, like energy budgets. Consequently, model providers… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: ICCAD 2024 accepted publication

  13. arXiv:2407.05267  [pdf, other

    cs.CV

    DTR: A Unified Deep Tensor Representation Framework for Multimedia Data Recovery

    Authors: Ting-Wei Zhou, Xi-Le Zhao, Jian-Li Wang, Yi-Si Luo, Min Wang, Xiao-Xuan Bai, Hong Yan

    Abstract: Recently, the transform-based tensor representation has attracted increasing attention in multimedia data (e.g., images and videos) recovery problems, which consists of two indispensable components, i.e., transform and characterization. Previously, the development of transform-based tensor representation mainly focuses on the transform aspect. Although several attempts consider using shallow matri… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  14. arXiv:2407.04458  [pdf, other

    cs.CV cs.AI

    Robust Multimodal Learning via Representation Decoupling

    Authors: Shicai Wei, Yang Luo, Yuji Wang, Chunbo Luo

    Abstract: Multimodal learning robust to missing modality has attracted increasing attention due to its practicality. Existing methods tend to address it by learning a common subspace representation for different modality combinations. However, we reveal that they are sub-optimal due to their implicit constraint on intra-class representation. Specifically, the sample with different modalities within the same… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: ECCV2024 17 pages

  15. arXiv:2407.02005  [pdf, other

    cs.CL cs.SD eess.AS

    An End-to-End Speech Summarization Using Large Language Model

    Authors: Hengchao Shang, Zongyao Li, Jiaxin Guo, Shaojun Li, Zhiqiang Rao, Yuanchang Luo, Daimeng Wei, Hao Yang

    Abstract: Abstractive Speech Summarization (SSum) aims to generate human-like text summaries from spoken content. It encounters difficulties in handling long speech input and capturing the intricate cross-modal mapping between long speech inputs and short text summaries. Research on large language models (LLMs) and multimodal information fusion has provided new insights for addressing these challenges. In t… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: InterSpeech 2024

  16. arXiv:2407.01860  [pdf, other

    cs.SD eess.AS eess.SP

    Constant Directivity Loudspeaker Beamforming

    Authors: Yuancheng Luo

    Abstract: Loudspeaker array beamforming is a common signal processing technique for acoustic directivity control and robust audio reproduction. Unlike their microphone counterpart, loudspeaker constraints are often heterogeneous due to arrayed transducers with varying operating ranges in frequency, acoustic-electrical sensitivity, efficiency, and directivity. This work proposes a frequency-regularization me… ▽ More

    Submitted 8 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted at EUSIPCO 2024

  17. arXiv:2407.01649  [pdf, other

    q-bio.QM cs.LG

    FAFE: Immune Complex Modeling with Geodesic Distance Loss on Noisy Group Frames

    Authors: Ruidong Wu, Ruihan Guo, Rui Wang, Shitong Luo, Yue Xu, Jiahan Li, Jianzhu Ma, Qiang Liu, Yunan Luo, Jian Peng

    Abstract: Despite the striking success of general protein folding models such as AlphaFold2(AF2, Jumper et al. (2021)), the accurate computational modeling of antibody-antigen complexes remains a challenging task. In this paper, we first analyze AF2's primary loss function, known as the Frame Aligned Point Error (FAPE), and raise a previously overlooked issue that FAPE tends to face gradient vanishing probl… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  18. arXiv:2406.18565  [pdf, other

    cs.CV cs.AI

    Pseudo-label Based Domain Adaptation for Zero-Shot Text Steganalysis

    Authors: Yufei Luo, Zhen Yang, Ru Zhang, Jianyi Liu

    Abstract: Currently, most methods for text steganalysis are based on deep neural networks (DNNs). However, in real-life scenarios, obtaining a sufficient amount of labeled stego-text for correctly training networks using a large number of parameters is often challenging and costly. Additionally, due to a phenomenon known as dataset bias or domain shift, recognition models trained on a large dataset exhibit… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: The 30th International Conference on Computational & Experimental Engineering and Sciences (ICCES2024)

  19. arXiv:2406.18453  [pdf, other

    cs.CV

    Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference

    Authors: Yuan Gao, Yajing Luo, Junhong Wang, Kui Jia, Gui-Song Xia

    Abstract: Humans can easily deduce the relative pose of an unseen object, without label/training, given only a single query-reference image pair. This is arguably achieved by incorporating (i) 3D/2.5D shape perception from a single image, (ii) render-and-compare simulation, and (iii) rich semantic cue awareness to furnish (coarse) reference-query correspondence. Existing methods implement (i) by a 3D CAD mo… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: The codes are available at https://github.com/ethanygao/training-free_generalizable_relative_pose

  20. arXiv:2406.18053  [pdf, other

    cs.LG cs.AI

    Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies

    Authors: Yu Luo, Fuchun Sun, Tianying Ji, Xianyuan Zhan

    Abstract: Hierarchical reinforcement learning (HRL) addresses complex long-horizon tasks by skillfully decomposing them into subgoals. Therefore, the effectiveness of HRL is greatly influenced by subgoal reachability. Typical HRL methods only consider subgoal reachability from the unilateral level, where a dominant level enforces compliance to the subordinate level. However, we observe that when the dominan… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  21. arXiv:2406.17261  [pdf, other

    cs.CL

    TRAWL: Tensor Reduced and Approximated Weights for Large Language Models

    Authors: Yiran Luo, Het Patel, Yu Fu, Dawon Ahn, Jia Chen, Yue Dong, Evangelos E. Papalexakis

    Abstract: Large language models (LLMs) have fundamentally transformed artificial intelligence, catalyzing recent advancements while imposing substantial environmental and computational burdens. We introduce TRAWL (Tensor Reduced and Approximated Weights for Large Language Models), a novel methodology for optimizing LLMs through tensor decomposition. TRAWL leverages diverse strategies to exploit matrices wit… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures. Submitted to EMNLP 2024 and under review

    MSC Class: 68T50 (Primary); 65F55 (Secondary) ACM Class: I.2.7

  22. When Invariant Representation Learning Meets Label Shift: Insufficiency and Theoretical Insights

    Authors: You-Wei Luo, Chuan-Xian Ren

    Abstract: As a crucial step toward real-world learning scenarios with changing environments, dataset shift theory and invariant representation learning algorithm have been extensively studied to relax the identical distribution assumption in classical learning setting. Among the different assumptions on the essential of shifting distributions, generalized label shift (GLS) is the latest developed one which… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  23. arXiv:2406.16072  [pdf, other

    cs.CV

    DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation

    Authors: Yueru Luo, Shuguang Cui, Zhen Li

    Abstract: Accurate 3D lane estimation is crucial for ensuring safety in autonomous driving. However, prevailing monocular techniques suffer from depth loss and lighting variations, hampering accurate 3D lane detection. In contrast, LiDAR points offer geometric cues and enable precise localization. In this paper, we present DV-3DLane, a novel end-to-end Dual-View multi-modal 3D Lane detection framework that… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted at ICLR2024

  24. arXiv:2406.16004  [pdf, other

    cs.CV

    RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization

    Authors: Mingshu Zhao, Yi Luo, Yong Ouyang

    Abstract: In the realm of resource-constrained mobile vision tasks, the pursuit of efficiency and performance consistently drives innovation in lightweight Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). While ViTs excel at capturing global context through self-attention mechanisms, their deployment in resource-limited environments is hindered by computational complexity and latency. Co… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Tech report

  25. arXiv:2406.15501  [pdf

    cs.CR

    Secure Combination of Untrusted Time information Based on Optimized Dempster-Shafer Theory

    Authors: Yang Li, Yujie Luo, Yichen Zhang, Ao Sun, Wei Huang, Shuai Zhang, Tao Zhang, Chuang Zhou, Li Ma, Jie Yang, Mei Wu, Heng Wang, Yan Pan, Yun Shao, Xing Chen, Ziyang Chen, Song Yu, Hong Guo, Bingjie Xu

    Abstract: Secure precision time synchronization is important for applications of Cyber-Physical Systems. However, several attacks, especially the Time Delay Attack (TDA), deteriorates the performance of time synchronization system seriously. Multiple paths scheme is thought as an effective security countermeasure to decrease the influence of TDA. However, the effective secure combination algorithm is still… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  26. arXiv:2406.14924  [pdf, other

    cs.CV

    DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection

    Authors: Jia Syuen Lim, Zhuoxiao Chen, Mahsa Baktashmotlagh, Zhi Chen, Xin Yu, Zi Huang, Yadan Luo

    Abstract: Class-agnostic object detection (OD) can be a cornerstone or a bottleneck for many downstream vision tasks. Despite considerable advancements in bottom-up and multi-object discovery methods that leverage basic visual cues to identify salient objects, consistently achieving a high recall rate remains difficult due to the diversity of object types and their contextual complexity. In this work, we in… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 19 pages

  27. arXiv:2406.14878  [pdf, other

    cs.CV cs.LG eess.IV

    MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection

    Authors: Zhuoxiao Chen, Junjie Meng, Mahsa Baktashmotlagh, Zi Huang, Yadan Luo

    Abstract: LiDAR-based 3D object detection is pivotal across many applications, yet the performance of such detection systems often degrades after deployment, especially when faced with unseen test point clouds originating from diverse locations or subjected to corruption. In this work, we introduce a new online adaptation framework for detectors named Model Synergy (MOS). Specifically, MOS dynamically assem… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  28. arXiv:2406.13891  [pdf, other

    cs.CV cs.AI

    DPO: Dual-Perturbation Optimization for Test-time Adaptation in 3D Object Detection

    Authors: Zhuoxiao Chen, Zixin Wang, Sen Wang, Zi Huang, Yadan Luo

    Abstract: LiDAR-based 3D object detection has seen impressive advances in recent times. However, deploying trained 3D detectors in the real world often yields unsatisfactory performance when the distribution of the test data significantly deviates from the training data due to different weather conditions, object sizes, \textit{etc}. A key factor in this performance degradation is the diminished generalizab… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages, 7 figures

  29. arXiv:2406.11033  [pdf, other

    cs.DB cs.AI

    HAIChart: Human and AI Paired Visualization System

    Authors: Yupeng Xie, Yuyu Luo, Guoliang Li, Nan Tang

    Abstract: The growing importance of data visualization in business intelligence and data science emphasizes the need for tools that can efficiently generate meaningful visualizations from large datasets. Existing tools fall into two main categories: human-powered tools (e.g., Tableau and PowerBI), which require intensive expert involvement, and AI-powered automated tools (e.g., Draco and Table2Charts), whic… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 16 pages, 14 figures, 7 tables

  30. arXiv:2406.09841  [pdf, other

    cs.LG q-bio.BM

    Learning Multi-view Molecular Representations with Structured and Unstructured Knowledge

    Authors: Yizhen Luo, Kai Yang, Massimo Hong, Xing Yi Liu, Zikun Nie, Hao Zhou, Zaiqing Nie

    Abstract: Capturing molecular knowledge with representation learning approaches holds significant potential in vast scientific fields such as chemistry and life science. An effective and generalizable molecular representation is expected to capture the consensus and complementary molecular expertise from diverse views and perspectives. However, existing works fall short in learning multi-view molecular repr… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures

  31. arXiv:2406.09770  [pdf, other

    cs.LG cs.AI

    Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion

    Authors: Anke Tang, Li Shen, Yong Luo, Shiwei Liu, Han Hu, Bo Du

    Abstract: Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost of training and evaluating models. Efficient Pareto front approximation of large models enables multi-objective optimization for various tasks such as multi-task learning and trade-off analysis. Existing algorithms for l… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: code is available at https://github.com/tanganke/pareto_set_learning

  32. arXiv:2406.09317  [pdf, other

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham , et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources… ▽ More

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  33. arXiv:2406.08993  [pdf, ps, other

    cs.LG

    Classic GNNs are Strong Baselines: Reassessing GNNs for Node Classification

    Authors: Yuankai Luo, Lei Shi, Xiao-Ming Wu

    Abstract: Graph Transformers (GTs) have recently emerged as popular alternatives to traditional message-passing Graph Neural Networks (GNNs), due to their theoretically superior expressiveness and impressive performance reported on standard node classification benchmarks, often significantly outperforming GNNs. In this paper, we conduct a thorough empirical analysis to reevaluate the performance of three cl… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  34. arXiv:2406.08759  [pdf, other

    cs.CV cs.MM

    Gaussian-Forest: Hierarchical-Hybrid 3D Gaussian Splatting for Compressed Scene Modeling

    Authors: Fengyi Zhang, Tianjun Zhang, Lin Zhang, Helen Huang, Yadan Luo

    Abstract: The field of novel-view synthesis has recently witnessed the emergence of 3D Gaussian Splatting, which represents scenes in a point-based manner and renders through rasterization. This methodology, in contrast to Radiance Fields that rely on ray tracing, demonstrates superior rendering quality and speed. However, the explicit and unstructured nature of 3D Gaussians poses a significant storage chal… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  35. arXiv:2406.07815  [pdf, other

    cs.CL cs.AI

    Are Large Language Models Good Statisticians?

    Authors: Yizhang Zhu, Shiyin Du, Boyan Li, Yuyu Luo, Nan Tang

    Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities across a range of scientific tasks including mathematics, physics, and chemistry. Despite their successes, the effectiveness of LLMs in handling complex statistical tasks remains systematically under-explored. To bridge this gap, we introduce StatQA, a new benchmark designed for statistical analysis tasks. StatQA comprises 11,6… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 31 pages, 10 figures,19 tables. Work in progress

  36. arXiv:2406.07471  [pdf, other

    cs.CV

    OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

    Authors: Ming Hu, Peng Xia, Lin Wang, Siyuan Yan, Feilong Tang, Zhongxing Xu, Yimin Luo, Kaimin Song, Jurgen Leitner, Xuelian Cheng, Jun Cheng, Chi Liu, Kaijing Zhou, Zongyuan Ge

    Abstract: Surgical scene perception via videos are critical for advancing robotic surgery, telesurgery, and AI-assisted surgery, particularly in ophthalmology. However, the scarcity of diverse and richly annotated video datasets has hindered the development of intelligent systems for surgical workflow analysis. Existing datasets for surgical workflow analysis, which typically face challenges such as small s… ▽ More

    Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Version 1

  37. arXiv:2406.06110  [pdf, other

    cs.CL cs.AI

    Recurrent Context Compression: Efficiently Expanding the Context Window of LLM

    Authors: Chensen Huang, Guibo Zhu, Xuepeng Wang, Yifei Luo, Guojing Ge, Haoran Chen, Dong Yi, Jinqiao Wang

    Abstract: To extend the context length of Transformer-based large language models (LLMs) and improve comprehension capabilities, we often face limitations due to computational resources and bounded memory storage capacity. This work introduces a method called Recurrent Context Compression (RCC), designed to efficiently expand the context window length of LLMs within constrained storage space. We also invest… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  38. arXiv:2406.05630  [pdf, other

    cs.CV

    Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion

    Authors: Ge Ya Luo, Zhi Hao Luo, Anthony Gosselin, Alexia Jolicoeur-Martineau, Christopher Pal

    Abstract: With recent advances in video prediction, controllable video generation has been attracting more attention. Generating high fidelity videos according to simple and flexible conditioning is of particular interest. To this end, we propose a controllable video generation model using pixel level renderings of 2D or 3D bounding boxes as conditioning. In addition, we also create a bounding box predictor… ▽ More

    Submitted 21 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  39. arXiv:2406.05532  [pdf, other

    cs.LG cs.AI

    Exploring Adversarial Robustness of Deep State Space Models

    Authors: Biqing Qi, Yang Luo, Junqi Gao, Pengfei Li, Kai Tian, Zhiyuan Ma, Bowen Zhou

    Abstract: Deep State Space Models (SSMs) have proven effective in numerous task scenarios but face significant security challenges due to Adversarial Perturbations (APs) in real-world deployments. Adversarial Training (AT) is a mainstream approach to enhancing Adversarial Robustness (AR) and has been validated on various traditional DNN architectures. However, its effectiveness in improving the AR of SSMs r… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  40. arXiv:2406.04791  [pdf, other

    cs.SD eess.AS

    Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR

    Authors: Shaojun Li, Daimeng Wei, Hengchao Shang, Jiaxin Guo, ZongYao Li, Zhanglin Wu, Zhiqiang Rao, Yuanchang Luo, Xianghui He, Hao Yang

    Abstract: Despite recent improvements in End-to-End Automatic Speech Recognition (E2E ASR) systems, the performance can degrade due to vocal characteristic mismatches between training and testing data, particularly with limited target speaker adaptation data. We propose a novel speaker adaptation approach Speaker-Smoothed kNN that leverages k-Nearest Neighbors (kNN) retrieval techniques to improve model out… ▽ More

    Submitted 1 July, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  41. arXiv:2406.03753  [pdf, other

    cs.HC

    VisLTR: Visualization-in-the-Loop Table Reasoning

    Authors: Jianing Hao, Zhuowen Liang, Chunting Li, Yuyu Luo, Wei Zeng

    Abstract: Table reasoning transforms user requirements into corresponding answers according to the provided table, which is often integrated with natural language interfaces for lay users to explore tabular data effortlessly. Recent research exploits large language models to facilitate table reasoning, by transforming vague user requirements into structured query languages (SQLs). However, these SQL-based a… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 11 pages, 9 figures

  42. arXiv:2406.03317  [pdf, other

    cs.HC cs.MM

    Save It for the "Hot" Day: An LLM-Empowered Visual Analytics System for Heat Risk Management

    Authors: Haobo Li, Wong Kam-Kwai, Yan Luo, Juntong Chen, Chengzhong Liu, Yaxuan Zhang, Alexis Kai Hon Lau, Huamin Qu, Dongyu Liu

    Abstract: The escalating frequency and intensity of heat-related climate events, particularly heatwaves, emphasize the pressing need for advanced heat risk management strategies. Current approaches, primarily relying on numerical models, face challenges in spatial-temporal resolution and in capturing the dynamic interplay of environmental, social, and behavioral factors affecting heat risks. This has led to… ▽ More

    Submitted 7 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  43. arXiv:2406.03280  [pdf, other

    cs.LG cs.AI cs.CL

    FusionBench: A Comprehensive Benchmark of Deep Model Fusion

    Authors: Anke Tang, Li Shen, Yong Luo, Han Hu, Bo Du, Dacheng Tao

    Abstract: Deep model fusion is an emerging technique that unifies the predictions or parameters of several deep neural networks into a single model in a cost-effective and data-efficient manner. This enables the unified model to take advantage of the original models' strengths, potentially exceeding their performance. Although a variety of deep model fusion techniques have been introduced, their evaluations… ▽ More

    Submitted 14 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Project homepage: https://github.com/tanganke/fusion_bench

  44. arXiv:2406.02962  [pdf, other

    cs.CL cs.AI cs.IR

    Docs2KG: Unified Knowledge Graph Construction from Heterogeneous Documents Assisted by Large Language Models

    Authors: Qiang Sun, Yuanyi Luo, Wenxiao Zhang, Sirui Li, Jichunyang Li, Kai Niu, Xiangrui Kong, Wei Liu

    Abstract: Even for a conservative estimate, 80% of enterprise data reside in unstructured files, stored in data lakes that accommodate heterogeneous formats. Classical search engines can no longer meet information seeking needs, especially when the task is to browse and explore for insight formulation. In other words, there are no obvious search keywords to use. Knowledge graphs, due to their natural visual… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  45. arXiv:2406.02902  [pdf, other

    cs.CL

    S$^2$GSL: Incorporating Segment to Syntactic Enhanced Graph Structure Learning for Aspect-based Sentiment Analysis

    Authors: Bingfeng Chen, Qihan Ouyang, Yongqi Luo, Boyan Xu, Ruichu Cai, Zhifeng Hao

    Abstract: Previous graph-based approaches in Aspect based Sentiment Analysis(ABSA) have demonstrated impressive performance by utilizing graph neural networks and attention mechanisms to learn structures of static dependency trees and dynamic latent trees. However, incorporating both semantic and syntactic information simultaneously within complex global structures can introduce irrelevant contexts and synt… ▽ More

    Submitted 7 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: ACL2024(main)

  46. arXiv:2406.02884  [pdf, other

    cs.CV

    PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM

    Authors: Tao Yang, Yingmin Luo, Zhongang Qi, Yang Wu, Ying Shan, Chang Wen Chen

    Abstract: Layout generation is the keystone in achieving automated graphic design, requiring arranging the position and size of various multi-modal design elements in a visually pleasing and constraint-following manner. Previous approaches are either inefficient for large-scale applications or lack flexibility for varying design requirements. Our research introduces a unified framework for automated graphic… ▽ More

    Submitted 1 July, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 10 pages; typos corrected, appendix added

  47. arXiv:2406.02629  [pdf, other

    cs.CR cs.LG

    SSNet: A Lightweight Multi-Party Computation Scheme for Practical Privacy-Preserving Machine Learning Service in the Cloud

    Authors: Shijin Duan, Chenghong Wang, Hongwu Peng, Yukui Luo, Wujie Wen, Caiwen Ding, Xiaolin Xu

    Abstract: As privacy-preserving becomes a pivotal aspect of deep learning (DL) development, multi-party computation (MPC) has gained prominence for its efficiency and strong security. However, the practice of current MPC frameworks is limited, especially when dealing with large neural networks, exemplified by the prolonged execution time of 25.8 seconds for secure inference on ResNet-152. The primary challe… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 16 pages, 9 figures

  48. arXiv:2406.01593  [pdf, other

    cs.CV

    Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting

    Authors: Shaojie Ma, Yawei Luo, Yi Yang

    Abstract: 3D reconstruction and simulation, while interrelated, have distinct objectives: reconstruction demands a flexible 3D representation adaptable to diverse scenes, whereas simulation requires a structured representation to model motion principles effectively. This paper introduces the Mesh-adsorbed Gaussian Splatting (MaGS) method to resolve such a dilemma. MaGS constrains 3D Gaussians to hover on th… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Project Page: see https://wcwac.github.io/MaGS-page/

  49. arXiv:2406.01265  [pdf, other

    cs.DB

    The Dawn of Natural Language to SQL: Are We Fully Ready?

    Authors: Boyan Li, Yuyu Luo, Chengliang Chai, Guoliang Li, Nan Tang

    Abstract: Translating users' natural language questions into SQL queries (i.e., NL2SQL) significantly lowers the barriers to accessing relational databases. The emergence of Large Language Models has introduced a novel paradigm in NL2SQL tasks, enhancing capabilities dramatically. However, this raises a critical question: Are we fully prepared to deploy NL2SQL models in production? To address the posed qu… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 14 pages, 15 figures, 7 tables

  50. arXiv:2406.00497  [pdf, ps, other

    cs.SD cs.AI cs.CL eess.AS

    Recent Advances in End-to-End Simultaneous Speech Translation

    Authors: Xiaoqian Liu, Guoqiang Hu, Yangfan Du, Erfeng He, YingFeng Luo, Chen Xu, Tong Xiao, Jingbo Zhu

    Abstract: Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input. This paper offers a comprehensive overview of the recent developments in SimulST research, focusing on four major challenges. Firstly, the complexities associated with processing lengthy and continuous speech streams pose significant hurdles.… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.