Showing 1–50 of 2,336 results for author: Liu, C

  1. arXiv:2407.13363  [pdf, other]

    cs.CV

    Learning from the Web: Language Drives Weakly-Supervised Incremental Learning for Semantic Segmentation

    Authors: Chang Liu, Giulia Rizzoli, Pietro Zanuttigh, Fu Li, Yi Niu

    Abstract: Current weakly-supervised incremental learning for semantic segmentation (WILSS) approaches only consider replacing pixel-level annotations with image-level labels, while the training images are still from well-designed datasets. In this work, we argue that widely available web images can also be considered for the learning of new classes. To achieve this, firstly we introduce a strategy to select…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  2. arXiv:2407.13237  [pdf, other]

    cs.AI

    LLM-Empowered State Representation for Reinforcement Learning

    Authors: Boyuan Wang, Yun Qu, Yuhang Jiang, Jianzhun Shao, Chang Liu, Wenming Yang, Xiangyang Ji

    Abstract: Conventional state representations in reinforcement learning often omit critical task-related details, presenting a significant challenge for value networks in establishing accurate mappings from states to task rewards. Traditional methods typically depend on extensive sample learning to enrich state representations with task-specific information, which leads to low sample efficiency and high time…

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.12939  [pdf, other]

    cs.CV

    GenRC: Generative 3D Room Completion from Sparse Image Collections

    Authors: Ming-Feng Li, Yueh-Feng Ku, Hong-Xuan Yen, Chi Liu, Yu-Lun Liu, Albert Y. C. Chen, Cheng-Hao Kuo, Min Sun

    Abstract: Sparse RGBD scene completion is a challenging task especially when considering consistent textures and geometries throughout the entire scene. Different from existing solutions that rely on human-designed text prompts or predefined camera trajectories, we propose GenRC, an automated training-free pipeline to complete a room-scale 3D mesh with high-fidelity textures. To achieve this, we first proje…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  4. arXiv:2407.12576  [pdf, other]

    cs.AR cs.AI

    IICPilot: An Intelligent Integrated Circuit Backend Design Framework Using Open EDA

    Authors: Zesong Jiang, Qing Zhang, Cheng Liu, Huawei Li, Xiaowei Li

    Abstract: Open-source EDA tools are rapidly advancing, fostering collaboration, innovation, and knowledge sharing within the EDA community. However, the growing complexity of these tools, characterized by numerous design parameters and heuristics, poses a significant barrier to their widespread adoption. This complexity is particularly pronounced in integrated circuit (IC) backend designs, which place subst…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: under review

  5. arXiv:2407.12575  [pdf, other]

    cs.AR

    Graphitron: A Domain Specific Language for FPGA-based Graph Processing Accelerator Generation

    Authors: Xinmiao Zhang, Zheng Feng, Shengwen Liang, Xinyu Chen, Cheng Liu, Huawei Li, Xiaowei Li

    Abstract: FPGA-based graph processing accelerators, enabling extensive customization, have demonstrated significant energy efficiency over general computing engines like CPUs and GPUs. Nonetheless, customizing accelerators to diverse graph processing algorithms with distinct computational patterns remains challenging and error-prone for high-level application users. To this end, template-based approaches ha…

    Submitted 17 July, 2024; originally announced July 2024.

  6. arXiv:2407.12565  [pdf, other]

    cs.AR

    SigDLA: A Deep Learning Accelerator Extension for Signal Processing

    Authors: Fangfa Fu, Wenyu Zhang, Zesong Jiang, Zhiyu Zhu, Guoyu Li, Bing Yang, Cheng Liu, Liyi Xiao, Jinxiang Wang, Huawei Li, Xiaowei Li

    Abstract: Deep learning and signal processing are closely correlated in many IoT scenarios such as anomaly detection to empower intelligence of things. Many IoT processors utilize digital signal processors (DSPs) for signal processing and build deep learning frameworks on this basis. While deep learning is usually much more computing-intensive than signal processing, the computing efficiency of deep learnin…

    Submitted 17 July, 2024; originally announced July 2024.

  7. EmoFace: Audio-driven Emotional 3D Face Animation

    Authors: Chang Liu, Qunfen Lin, Zijiao Zeng, Ye Pan

    Abstract: Audio-driven emotional 3D face animation aims to generate emotionally expressive talking heads with synchronized lip movements. However, previous research has often overlooked the influence of diverse emotions on facial expressions or proved unsuitable for driving MetaHuman models. In response to this deficiency, we introduce EmoFace, a novel audio-driven methodology for creating facial animations…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 2024 IEEE Conference Virtual Reality and 3D User Interfaces (VR). IEEE, 2024

  8. arXiv:2407.12023  [pdf, other]

    cs.CL cs.AI

    CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models

    Authors: Zhong-Zhi Li, Ming-Liang Zhang, Fei Yin, Zhi-Long Ji, Jin-Feng Bai, Zhen-Ru Pan, Fan-Hu Zeng, Jian Xu, Jia-Xin Zhang, Cheng-Lin Liu

    Abstract: Due to the rapid advancements in multimodal large language models, evaluating their multimodal mathematical capabilities continues to receive wide attention. Although datasets like MathVista provide benchmarks for assessing mathematical capabilities in multimodal scenarios, there is still a lack of corresponding evaluation tools and datasets for fine-grained assessment in the context of K12 ed…

    Submitted 27 June, 2024; originally announced July 2024.

  9. arXiv:2407.11812  [pdf, ps, other]

    cs.LG q-bio.QM

    DFDRNN: A dual-feature based neural network for drug repositioning

    Authors: Enqiang Zhu, Xiang Li, Chanjuan Liu, Nikhil R. Pal

    Abstract: Drug repositioning is an economically efficient strategy used to discover new indications for existing drugs beyond their original approvals, expanding their applicability and usage to address challenges in disease treatment. In recent years, deep-learning techniques for drug repositioning have gained much attention. While most deep learning-based research methods focus on encoding drugs and disea…

    Submitted 16 July, 2024; originally announced July 2024.

  10. arXiv:2407.11537  [pdf, other]

    cs.CV cs.AI

    AEMIM: Adversarial Examples Meet Masked Image Modeling

    Authors: Wenzhao Xiang, Chang Liu, Hang Su, Hongyang Yu

    Abstract: Masked image modeling (MIM) has gained significant traction for its remarkable prowess in representation learning. As an alternative to the traditional approach, the reconstruction from corrupted images has recently emerged as a promising pretext task. However, the regular corrupted images are generated using generic generators, often lacking relevance to the specific reconstruction task involved…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Under review of International Journal of Computer Vision (IJCV)

  11. arXiv:2407.11380  [pdf, other]

    cs.CV cs.LG

    NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition

    Authors: Chenyu Liu, Jia Pan, Jinshui Hu, Baocai Yin, Bing Yin, Mingjun Chen, Cong Liu, Jun Du, Qingfeng Liu

    Abstract: Recently, Handwritten Mathematical Expression Recognition (HMER) has gained considerable attention in pattern recognition for its diverse applications in document understanding. Current methods typically approach HMER as an image-to-sequence generation task within an autoregressive (AR) encoder-decoder framework. However, these approaches suffer from several drawbacks: 1) a lack of overall languag…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  12. arXiv:2407.11324  [pdf, other]

    cs.AR

    ApproxPilot: A GNN-based Accelerator Approximation Framework

    Authors: Qing Zhang, Cheng Liu, Siting Liu, Yajuan Hui, Huawei Li, Xiaowei Li

    Abstract: A typical optimization of customized accelerators for error-tolerant applications such as multimedia, recognition, and classification is to replace traditional arithmetic units like multipliers and adders with the approximate ones to enhance energy efficiency while adhering to accuracy requirements. However, the plethora of arithmetic units and diverse approximate unit options result in an exceedi…

    Submitted 15 July, 2024; originally announced July 2024.

  13. arXiv:2407.11098  [pdf, other]

    cs.LG cs.AI

    Inertial Confinement Fusion Forecasting via LLMs

    Authors: Mingkai Chen, Taowen Wang, James Chenhao Liang, Chuan Liu, Chunshu Wu, Qifan Wang, Ying Nian Wu, Michael Huang, Chuang Ren, Ang Li, Tong Geng, Dongfang Liu

    Abstract: Controlled fusion energy is deemed pivotal for the advancement of human civilization. In this study, we introduce Fusion-LLM, a novel integration of Large Language Models (LLMs) with classical reservoir computing paradigms tailored to address challenges in Inertial Confinement Fusion (ICF). Our approach offers several key contributions: Firstly, we propose the…

    Submitted 15 July, 2024; originally announced July 2024.

  14. arXiv:2407.10528  [pdf, other]

    cs.CV

    Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation

    Authors: Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Runyi Yu, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen

    Abstract: Text-to-motion generation requires not only grounding local actions in language but also seamlessly blending these individual actions to synthesize diverse and realistic global motions. However, existing motion generation methods primarily focus on the direct synthesis of global motions while neglecting the importance of generating and controlling local actions. In this paper, we propose the local…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  15. arXiv:2407.10485  [pdf, other]

    cs.CV

    Effective Motion Modeling for UAV-platform Multiple Object Tracking with Re-Margin Loss

    Authors: Mufeng Yao, Jinlong Peng, Qingdong He, Bo Peng, Hao Chen, Mingmin Chi, Chao Liu, Jon Atli Benediktsson

    Abstract: Multiple object tracking (MOT) from unmanned aerial vehicle (UAV) platforms requires efficient motion modeling. This is because UAV-MOT faces tracking difficulties caused by large and irregular motion, and insufficient training due to the motion long-tailed distribution of current UAV-MOT datasets. Previous UAV-MOT methods either extract motion and detection features redundantly or supervise motio…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2308.07207

  16. arXiv:2407.10131  [pdf, other]

    cs.CV

    WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models

    Authors: Xinjian Wu, Ruisong Zhang, Jie Qin, Shijie Ma, Cheng-Lin Liu

    Abstract: Segmenting and recognizing diverse object parts is crucial in computer vision and robotics. Despite significant progress in object segmentation, part-level segmentation remains underexplored due to complex boundaries and scarce annotated data. To address this, we propose a novel Weakly-supervised Part Segmentation (WPS) setting and an approach called WPS-SAM, built on the large-scale pre-trained v…

    Submitted 14 July, 2024; originally announced July 2024.

  17. arXiv:2407.09899  [pdf, other]

    cs.RO

    DexGrasp-Diffusion: Diffusion-based Unified Functional Grasp Synthesis Pipeline for Multi-Dexterous Robotic Hands

    Authors: Zhengshen Zhang, Lei Zhou, Chenchen Liu, Zhiyang Liu, Chengran Yuan, Sheng Guo, Ruiteng Zhao, Marcelo H. Ang Jr., Francis EH Tay

    Abstract: The versatility and adaptability of human grasping catalyze advancing dexterous robotic manipulation. While significant strides have been made in dexterous grasp generation, current research endeavors pivot towards optimizing object manipulation while ensuring functional integrity, emphasizing the synthesis of functional grasps following desired affordance instructions. This paper addresses the ch…

    Submitted 13 July, 2024; originally announced July 2024.

  18. arXiv:2407.09550  [pdf]

    cs.CV cs.AI cs.LG

    CAPM: Fast and Robust Verification on Maxpool-based CNN via Dual Network

    Authors: Jia-Hau Bai, Chi-Ting Liu, Yu Wang, Fu-Chieh Chang, Pei-Yuan Wu

    Abstract: This study uses CAPM (Convex Adversarial Polytope for Maxpool-based CNN) to improve the verified bound for general purpose maxpool-based convolutional neural networks (CNNs) under bounded norm adversarial perturbations. The maxpool function is decomposed as a series of ReLU functions to extend the convex relaxation technique to maxpool functions, by which the verified bound can be efficiently comp…

    Submitted 27 June, 2024; originally announced July 2024.

  19. arXiv:2407.09512  [pdf]

    cs.HC cs.AI

    Design and evaluation of AI copilots -- case studies of retail copilot templates

    Authors: Michal Furmakiewicz, Chang Liu, Angus Taylor, Ilya Venger

    Abstract: Building a successful AI copilot requires a systematic approach. This paper is divided into two sections, covering the design and evaluation of a copilot respectively. A case study of developing copilot templates for the retail domain by Microsoft is used to illustrate the role and importance of each aspect. The first section explores the key technical components of a copilot's architecture, inclu…

    Submitted 17 June, 2024; originally announced July 2024.

    Comments: 22 pages, 4 figures

  20. arXiv:2407.09091  [pdf, other]

    cs.RO

    Accurate Prior-centric Monocular Positioning with Offline LiDAR Fusion

    Authors: Jinhao He, Huaiyang Huang, Shuyang Zhang, Jianhao Jiao, Chengju Liu, Ming Liu

    Abstract: Unmanned vehicles usually rely on Global Positioning System (GPS) and Light Detection and Ranging (LiDAR) sensors to achieve high-precision localization results for navigation purposes. However, this combination, with its associated costs and infrastructure demands, poses challenges for widespread adoption in mass-market applications. In this paper, we aim to use only a monocular camera to achieve…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: ICRA 2024

  21. arXiv:2407.08961  [pdf]

    eess.IV cs.CV

    Tissue-Contrastive Semi-Masked Autoencoders for Segmentation Pretraining on Chest CT

    Authors: Jie Zheng, Ru Wen, Haiqin Hu, Lina Wei, Kui Su, Wei Chen, Chen Liu, Jun Wang

    Abstract: Existing Masked Image Modeling (MIM) depends on a spatial patch-based masking-reconstruction strategy to perceive objects' features from unlabeled images, which may face two limitations when applied to chest CT: 1) inefficient feature learning due to complex anatomical details presented in CT images, and 2) suboptimal knowledge transfer owing to input disparity between upstream and downstream model…

    Submitted 11 July, 2024; originally announced July 2024.

  22. arXiv:2407.08722  [pdf, other]

    cs.RO cs.CV cs.LG

    Unifying 3D Representation and Control of Diverse Robots with a Single Camera

    Authors: Sizhe Lester Li, Annan Zhang, Boyuan Chen, Hanna Matusik, Chao Liu, Daniela Rus, Vincent Sitzmann

    Abstract: Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. Modern fabrication techniques have dramatically expanded feasible hardware, yet deploying these systems requires control software to translate desired motions into actuator commands. While conventional robots can easily be modeled as rigid links connected via joints, it remains an…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Project Page: https://sizhe-li.github.io/publication/neural_jacobian_field

  23. arXiv:2407.08678  [pdf, other]

    cs.LG math.OC stat.CO stat.ML

    How to beat a Bayesian adversary

    Authors: Zihan Ding, Kexin Jin, Jonas Latz, Chenguang Liu

    Abstract: Deep neural networks and other modern machine learning models are often susceptible to adversarial attacks. Indeed, an adversary may often be able to change a model's prediction through a small, directed perturbation of the model's input - an issue in safety-critical applications. Adversarially robust machine learning is usually based on a minmax optimisation problem that minimises the machine lea…

    Submitted 11 July, 2024; originally announced July 2024.

    MSC Class: 90C15; 65C35; 68T07

  24. arXiv:2407.08659  [pdf, other]

    cs.LG cs.CV

    Controlling the Fidelity and Diversity of Deep Generative Models via Pseudo Density

    Authors: Shuangqi Li, Chen Liu, Tong Zhang, Hieu Le, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: We introduce an approach to bias deep generative models, such as GANs and diffusion models, towards generating data with either enhanced fidelity or increased diversity. Our approach involves manipulating the distribution of training and generated data through a novel metric for individual samples, named pseudo density, which is based on the nearest-neighbor information from real samples. Our appr…

    Submitted 11 July, 2024; originally announced July 2024.

  25. arXiv:2407.08501  [pdf, other]

    cs.HC

    SelfIE: Self-Initiated Explorable Instructions Towards Enhanced User Experience

    Authors: Hyeongcheol Kim, Katherine Fennedy, Georgia Zhang, Can Liu, Shengdong Zhao

    Abstract: Given the widespread use of procedural instructions with non-linear access (situational information retrieval), there has been a proposal to accommodate both linear and non-linear usage in instructional design. However, it has received inadequate scholarly attention, leading to limited exploration. This paper introduces Self-Initiated Explorable (SelfIE) instructions, a new design concept aiming a…

    Submitted 10 July, 2024; originally announced July 2024.

  26. arXiv:2407.08233  [pdf, other]

    cs.LG

    Differentially Private Neural Network Training under Hidden State Assumption

    Authors: Ding Chen, Chen Liu

    Abstract: We present a novel approach called differentially private stochastic block coordinate descent (DP-SBCD) for training neural networks with provable guarantees of differential privacy under the hidden state assumption. Our methodology incorporates Lipschitz neural networks and decomposes the training process of the neural network into sub-problems, each corresponding to the training of a specific la…

    Submitted 11 July, 2024; originally announced July 2024.

  27. arXiv:2407.07771  [pdf, other]

    cs.CL cs.CV cs.MM

    Multi-task Prompt Words Learning for Social Media Content Generation

    Authors: Haochen Xue, Chong Zhang, Chengzhi Liu, Fangyu Wu, Xiaobo Jin

    Abstract: The rapid development of the Internet has profoundly changed human life. Humans are increasingly expressing themselves and interacting with others on social media platforms. However, although artificial intelligence technology has been widely used in many aspects of life, its application in social media content creation remains largely unexplored. To solve this problem, we propose a new prompt word generation…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 8 pages, 5 figures

    Journal ref: International Joint Conference on Neural Networks 2024

  28. arXiv:2407.07672  [pdf, other]

    cs.HC

    StoryDiffusion: How to Support UX Storyboarding With Generative-AI

    Authors: Zhaohui Liang, Xiaoyu Zhang, Kevin Ma, Zhao Liu, Xipei Ren, Kosa Goucher-Lambert, Can Liu

    Abstract: Storyboarding is an established method for designing user experiences. Generative AI can support this process by helping designers quickly create visual narratives. However, existing tools only focus on accurate text-to-image generation. Currently, it is not clear how to effectively support the entire creative process of storyboarding and how to develop AI-powered tools to support designers' indiv…

    Submitted 10 July, 2024; originally announced July 2024.

  29. arXiv:2407.07327  [pdf, other]

    cs.AI

    Fuse, Reason and Verify: Geometry Problem Solving with Parsed Clauses from Diagram

    Authors: Ming-Liang Zhang, Zhong-Zhi Li, Fei Yin, Liang Lin, Cheng-Lin Liu

    Abstract: Geometry problem solving (GPS) requires capacities of multi-modal understanding, multi-hop reasoning and theorem knowledge application. In this paper, we propose a neural-symbolic model for plane geometry problem solving (PGPS), named PGPSNet-v2, with three key steps: modal fusion, reasoning process and knowledge verification. In modal fusion, we leverage textual clauses to express fine-grained st…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: under review by journal

  30. arXiv:2407.06127  [pdf, other]

    cs.CV

    Better Sampling, towards Better End-to-end Small Object Detection

    Authors: Zile Huang, Chong Zhang, Mingyu Jin, Fangyu Wu, Chengzhi Liu, Xiaobo Jin

    Abstract: While deep learning-based general object detection has made significant strides in recent years, the effectiveness and efficiency of small object detection remain unsatisfactory. This is primarily attributed not only to the limited characteristics of such small targets but also to the high density and mutual overlap among these targets. The existing transformer-based small object detectors do not…

    Submitted 17 May, 2024; originally announced July 2024.

    Comments: 14 pages, 5 figures

  31. Two-Path GMM-ResNet and GMM-SENet for ASV Spoofing Detection

    Authors: Zhenchun Lei, Hui Yan, Changhong Liu, Minglei Ma, Yingen Yang

    Abstract: The automatic speaker verification system is sometimes vulnerable to various spoofing attacks. The 2-class Gaussian Mixture Model classifier for genuine and spoofed speech is usually used as the baseline for spoofing detection. However, the GMM classifier does not separately consider the scores of feature frames on each Gaussian component. In addition, the GMM accumulates the scores on all frames…

    Submitted 8 July, 2024; originally announced July 2024.

  32. arXiv:2407.05293  [pdf, other]

    cs.CE

    Wideband Beamforming with RIS: A Unified Framework via Space-Frequency Transformation

    Authors: Xiaowei Qian, Xiaoling Hu, Chenxi Liu, Mugen Peng

    Abstract: The spectrum shift from the sub-6G band to the high-frequency band has posed an ever-increasing demand on the paradigm shift from narrowband beamforming to wideband beamforming. Despite recent research efforts, the problem of wideband beamforming design is particularly challenging in reconfigurable intelligent surface (RIS)-assisted systems, because the RIS is not capable of performing frequency-d…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 13 pages, 16 figures

  33. arXiv:2407.03926  [pdf, ps, other]

    cs.IT eess.SP

    Rethinking the fundamental performance limits of integrated sensing and communication systems

    Authors: Zhouyuan Yu, Xiaoling Hu, Chenxi Liu, Mugen Peng

    Abstract: Integrated sensing and communication (ISAC) has been recognized as a key enabler and feature of future wireless networks. In the existing works analyzing the performances of ISAC, discrete-time systems were commonly assumed, which, however, overlooked the impacts of temporal, spectral, and spatial properties. To address this issue, we establish a unified information model for the band-limited cont…

    Submitted 4 July, 2024; originally announced July 2024.

  34. arXiv:2407.03594  [pdf, other]

    cs.CV

    UniPlane: Unified Plane Detection and Reconstruction from Posed Monocular Videos

    Authors: Yuzhong Huang, Chen Liu, Ji Hou, Ke Huo, Shiyu Dong, Fred Morstatter

    Abstract: We present UniPlane, a novel method that unifies plane detection and reconstruction from posed monocular videos. Unlike existing methods that detect planes from local observations and associate them across the video for the final reconstruction, UniPlane unifies both the detection and the reconstruction tasks in a single network, which allows us to directly optimize final reconstruction quality an…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2206.07710 by other authors

  35. arXiv:2407.03195  [pdf, other]

    math.OC cs.LG

    Incremental Gauss--Newton Methods with Superlinear Convergence Rates

    Authors: Zhiling Zhou, Zhuanghua Liu, Chengchang Liu, Luo Luo

    Abstract: This paper addresses the challenge of solving large-scale nonlinear equations with Hölder continuous Jacobians. We introduce a novel Incremental Gauss--Newton (IGN) method with an explicit superlinear convergence rate, which outperforms existing methods that only achieve a linear convergence rate. In particular, we formulate our problem by the nonlinear least squares with finite-sum structure, and ou…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 37 pages, 9 figures

  36. arXiv:2407.03177  [pdf, other]

    cs.HC eess.SP

    EDPNet: An Efficient Dual Prototype Network for Motor Imagery EEG Decoding

    Authors: Can Han, Chen Liu, Crystal Cai, Jun Wang, Dahong Qian

    Abstract: Motor imagery electroencephalograph (MI-EEG) decoding plays a crucial role in developing motor imagery brain-computer interfaces (MI-BCIs). However, decoding intentions from MI remains challenging due to the inherent complexity of EEG signals relative to the small-sample size. In this paper, we propose an Efficient Dual Prototype Network (EDPNet) to enable accurate and fast MI decoding. EDPNet emp…

    Submitted 3 July, 2024; originally announced July 2024.

  37. arXiv:2407.03152  [pdf, other]

    cs.CV cs.LG

    Stereo Risk: A Continuous Modeling Approach to Stereo Matching

    Authors: Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Yao Yao, Luc Van Gool

    Abstract: We introduce Stereo Risk, a new deep-learning approach to solve the classical stereo-matching problem in computer vision. As it is well-known that stereo matching boils down to a per-pixel disparity estimation problem, the popular state-of-the-art stereo-matching approaches widely rely on regressing the scene disparity values, yet via discretization of scene disparity values. Such discretization o…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted as an Oral Paper at ICML 2024. Draft info: 18 pages, 6 Figures, 16 Tables

  38. arXiv:2407.03135  [pdf, other]

    cs.SD cs.AI cs.HC eess.AS

    GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification

    Authors: Hui Yan, Zhenchun Lei, Changhong Liu, Yong Zhou

    Abstract: With the development of deep learning, many different network architectures have been explored in speaker verification. However, most network architectures rely on a single deep learning architecture, and hybrid networks combining different architectures have been little studied in ASV tasks. In this paper, we propose the GMM-ResNext model for speaker verification. Conventional GMM does not consid…

    Submitted 3 July, 2024; originally announced July 2024.

  39. arXiv:2407.02315  [pdf, other]

    cs.CV cs.AI

    VFIMamba: Video Frame Interpolation with State Space Models

    Authors: Guozhen Zhang, Chunxu Liu, Yutao Cui, Xiaotong Zhao, Kai Ma, Limin Wang

    Abstract: Inter-frame modeling is pivotal in generating intermediate frames for video frame interpolation (VFI). Current approaches predominantly rely on convolution or attention-based models, which often either lack sufficient receptive fields or entail significant computational overheads. Recently, Selective State Space Models (S6) have emerged, tailored specifically for long sequence modeling, offering b…

    Submitted 2 July, 2024; originally announced July 2024.

  40. GMM-ResNet2: Ensemble of Group ResNet Networks for Synthetic Speech Detection

    Authors: Zhenchun Lei, Hui Yan, Changhong Liu, Yong Zhou, Minglei Ma

    Abstract: Deep learning models are widely used for speaker recognition and spoofing speech detection. We propose the GMM-ResNet2 for synthetic speech detection. Compared with the previous GMM-ResNet model, GMM-ResNet2 has four improvements. Firstly, the different order GMMs have different capabilities to form smooth approximations to the feature distribution, and multiple GMMs are used to extract multi-scal…

    Submitted 2 July, 2024; originally announced July 2024.

  41. arXiv:2407.01896  [pdf, other]

    cs.CL cs.IR

    LogEval: A Comprehensive Benchmark Suite for Large Language Models In Log Analysis

    Authors: Tianyu Cui, Shiyu Ma, Ziang Chen, Tong Xiao, Shimin Tao, Yilun Liu, Shenglin Zhang, Duoming Lin, Changchang Liu, Yuzhe Cai, Weibin Meng, Yongqian Sun, Dan Pei

    Abstract: Log analysis is crucial for ensuring the orderly and stable operation of information systems, particularly in the field of Artificial Intelligence for IT Operations (AIOps). Large Language Models (LLMs) have demonstrated significant potential in natural language processing tasks. In the AIOps domain, they excel in tasks such as anomaly detection, root cause analysis of faults, operations and maint…

    Submitted 1 July, 2024; originally announced July 2024.

  42. arXiv:2407.01639  [pdf, other]

    cs.LG cs.SE

    ModelVerification.jl: a Comprehensive Toolbox for Formally Verifying Deep Neural Networks

    Authors: Tianhao Wei, Luca Marzari, Kai S. Yun, Hanjiang Hu, Peizhi Niu, Xusheng Luo, Changliu Liu

    Abstract: Deep Neural Networks (DNN) are crucial in approximating nonlinear functions across diverse applications, ranging from image classification to control. Verifying specific input-output properties can be a highly challenging task due to the lack of a single, self-contained framework that allows a complete range of verification types. To this end, we present ModelVerification.jl (MV), the fir…

    Submitted 30 June, 2024; originally announced July 2024.

  43. arXiv:2407.01183  [pdf, other]

    cs.DB

    TCSR-SQL: Towards Table Content-aware Text-to-SQL with Self-retrieval

    Authors: Wenbo Xu, Liang Yan, Peiyi Han, Haifeng Zhu, Chuanyi Liu, Shaoming Duan, Cuiyun Gao, Yingwei Liang

    Abstract: Large Language Model-based (LLM-based) Text-to-SQL methods have achieved important progress in generating SQL queries for real-world applications. When confronted with table content-aware questions in real-world scenarios, ambiguous data content keywords and non-existent database schema column names within the question lead to poor performance of existing methods. To solve this problem, we pr…

    Submitted 12 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  44. arXiv:2407.00632  [pdf, other]

    cs.RO cs.CL cs.CV cs.MA

    CAMON: Cooperative Agents for Multi-Object Navigation with LLM-based Conversations

    Authors: Pengying Wu, Yao Mu, Kangjie Zhou, Ji Ma, Junting Chen, Chang Liu

    Abstract: Visual navigation tasks are critical for household service robots. As these tasks become increasingly complex, effective communication and collaboration among multiple robots become imperative to ensure successful completion. In recent years, large language models (LLMs) have exhibited remarkable comprehension and planning abilities in the context of embodied agents. However, their application in…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted to the RSS 2024 Workshop: GROUND

  45. arXiv:2407.00548  [pdf, other]

    cs.RO

    KOROL: Learning Visualizable Object Feature with Koopman Operator Rollout for Manipulation

    Authors: Hongyi Chen, Abulikemu Abuduweili, Aviral Agrawal, Yunhai Han, Harish Ravichandar, Changliu Liu, Jeffrey Ichnowski

    Abstract: Learning dexterous manipulation skills presents significant challenges due to complex nonlinear dynamics that underlie the interactions between objects and multi-fingered hands. Koopman operators have emerged as a robust method for modeling such nonlinear dynamics within a linear framework. However, current methods rely on runtime access to ground-truth (GT) object states, making them unsuitable f…

    Submitted 29 June, 2024; originally announced July 2024.

  46. arXiv:2406.18537  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale

    Authors: Keenon Werling, Janelle Kaneda, Alan Tan, Rishi Agarwal, Six Skov, Tom Van Wouwe, Scott Uhlrich, Nicholas Bianco, Carmichael Ong, Antoine Falisse, Shardul Sapkota, Aidan Chandra, Joshua Carter, Ezio Preatoni, Benjamin Fregly, Jennifer Hicks, Scott Delp, C. Karen Liu

    Abstract: While reconstructing human poses in 3D from inexpensive sensors has advanced significantly in recent years, quantifying the dynamics of human motion, including the muscle-generated joint torques and external forces, remains a challenge. Prior attempts to estimate physics from reconstructed human poses have been hampered by a lack of datasets with high-quality pose and force data for a variety of m…

    Submitted 16 May, 2024; originally announced June 2024.

    Comments: 15 pages, 6 figures, 4 tables

  47. arXiv:2406.18530  [pdf, other]

    cs.CV

    MatchTime: Towards Automatic Soccer Game Commentary Generation

    Authors: Jiayuan Rao, Haoning Wu, Chang Liu, Yanfeng Wang, Weidi Xie

    Abstract: Soccer is a globally popular sport with a vast audience. In this paper, we consider constructing an automatic soccer game commentary model to improve the audiences' viewing experience. In general, we make the following contributions: First, observing the prevalent video-text misalignment in existing datasets, we manually annotate timestamps for 49 matches, establishing a more robust benchmark for…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Technical Report; Project Page: https://haoningwu3639.github.io/MatchTime/

  48. arXiv:2406.18139  [pdf, other]

    cs.CL cs.CV

    LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference

    Authors: Zhongwei Wan, Ziang Wu, Che Liu, Jinfa Huang, Zhihong Zhu, Peng Jin, Longyue Wang, Li Yuan

    Abstract: Long-context Multimodal Large Language Models (MLLMs) demand substantial computational resources for inference as the growth of their multimodal Key-Value (KV) cache, in response to increasing input lengths, challenges memory and time efficiency. Unlike single-modality LLMs that manage only textual contexts, the KV cache of long-context MLLMs includes representations from multiple images with temp…

    Submitted 26 June, 2024; originally announced June 2024.

  49. arXiv:2406.17960  [pdf, other]

    cs.CV cs.AI

    MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation

    Authors: Liuyi Wang, Zongtao He, Mengjiao Shen, Jingwei Yang, Chengju Liu, Qijun Chen

    Abstract: Despite the remarkable developments of recent large models in Embodied Artificial Intelligence (E-AI), their integration into robotics is hampered by their excessive parameter sizes and computational demands. Towards the Vision-and-Language Navigation (VLN) task, a core task in E-AI, this paper reveals the great potential of using knowledge distillation for obtaining lightweight student models by…

    Submitted 25 June, 2024; originally announced June 2024.

  50. arXiv:2406.17840  [pdf, other]

    cs.AI cs.CV

    Human-Object Interaction from Human-Level Instructions

    Authors: Zhen Wu, Jiaman Li, C. Karen Liu

    Abstract: Intelligent agents need to autonomously navigate and interact within contextual environments to perform a wide range of daily tasks based on human-level instructions. These agents require a foundational understanding of the world, incorporating common sense and knowledge, to interpret such instructions. Moreover, they must possess precise low-level skills for movement and interaction to execute th…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 10 pages