Showing 1–50 of 9,662 results for author: Wang, Y

  1. arXiv:2407.13376  [pdf, other]

    cs.RO

    Dual-arm Motion Generation for Repositioning Care based on Deep Predictive Learning with Somatosensory Attention Mechanism

    Authors: Tamon Miyake, Namiko Saito, Tetsuya Ogata, Yushi Wang, Shigeki Sugano

    Abstract: A versatile robot working in a domestic environment based on a deep neural network (DNN) is currently attracting attention. One of the roles expected for domestic robots is caregiving for a human. In particular, we focus on repositioning care because repositioning plays a fundamental role in supporting the health and quality of life of individuals with limited mobility. However, generating motions…

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.13362  [pdf, other]

    cs.CV

    Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation

    Authors: Pengfei Wang, Yuxi Wang, Shuai Li, Zhaoxiang Zhang, Zhen Lei, Lei Zhang

    Abstract: The scarcity of large-scale 3D-text paired data poses a great challenge to open vocabulary 3D scene understanding, and hence it is popular to leverage internet-scale 2D data and transfer their open vocabulary capabilities to 3D models through knowledge distillation. However, the existing distillation-based 3D scene understanding approaches rely on the representation capacity of 2D models, disregar…

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.13343  [pdf, other]

    cs.CL

    Learning-From-Mistakes Prompting for Indigenous Language Translation

    Authors: You-Cheng Liao, Chen-Jui Yu, Chi-Yi Lin, He-Feng Yun, Yen-Hsiang Wang, Hsiao-Min Li, Yao-Chung Fan

    Abstract: Using large language models, this paper presents techniques to improve extremely low-resourced indigenous language translations. Our approaches are grounded in the use of (1) the presence of a datastore consisting of a limited number of parallel translation examples, (2) the inherent capabilities of LLMs like GPT-3.5, and (3) a word-level translation dictionary. We harness the potential of LLMs an…

    Submitted 18 July, 2024; originally announced July 2024.

  4. arXiv:2407.13331  [pdf, other]

    cs.LG

    Reconstruct the Pruned Model without Any Retraining

    Authors: Pingjie Wang, Ziqing Fan, Shengchao Hu, Zhe Chen, Yanfeng Wang, Yu Wang

    Abstract: Structured pruning is a promising hardware-friendly compression technique for large language models (LLMs), which is expected to be retraining-free to avoid the enormous retraining cost. This retraining-free paradigm involves (1) pruning criteria to define the architecture and (2) distortion reconstruction to restore performance. However, existing methods often emphasize pruning criteria while usi…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 18 pages

  5. arXiv:2407.13278  [pdf, other]

    cs.LG

    Deep Time Series Models: A Comprehensive Survey and Benchmark

    Authors: Yuxuan Wang, Haixu Wu, Jiaxiang Dong, Yong Liu, Mingsheng Long, Jianmin Wang

    Abstract: Time series, characterized by a sequence of data points arranged in a discrete-time order, are ubiquitous in real-world applications. Different from other modalities, time series present unique challenges due to their complex and dynamic nature, including the entanglement of nonlinear patterns and time-variant trends. Analyzing time series data is of great significance in real-world scenarios and…

    Submitted 18 July, 2024; originally announced July 2024.

  6. arXiv:2407.13271  [pdf, other]

    cs.SE

    Identifying Smart Contract Security Issues in Code Snippets from Stack Overflow

    Authors: Jiachi Chen, Chong Chen, Jiang Hu, John Grundy, Yanlin Wang, Ting Chen, Zibin Zheng

    Abstract: Smart contract developers frequently seek solutions to developmental challenges on Q&A platforms such as Stack Overflow (SO). Although community responses often provide viable solutions, the embedded code snippets can also contain hidden vulnerabilities. Integrating such code directly into smart contracts may make them susceptible to malicious attacks. We conducted an online survey and received 74…

    Submitted 18 July, 2024; originally announced July 2024.

  7. arXiv:2407.13246  [pdf, other]

    cs.CV

    STS MICCAI 2023 Challenge: Grand challenge on 2D and 3D semi-supervised tooth segmentation

    Authors: Yaqi Wang, Yifan Zhang, Xiaodiao Chen, Shuai Wang, Dahong Qian, Fan Ye, Feng Xu, Hongyuan Zhang, Qianni Zhang, Chengyu Wu, Yunxiang Li, Weiwei Cui, Shan Luo, Chengkai Wang, Tianhao Li, Yi Liu, Xiang Feng, Huiyu Zhou, Dongyun Liu, Qixuan Wang, Zhouhao Lin, Wei Song, Yuanlin Li, Bing Wang, Chunshi Wang, et al. (2 additional authors not shown)

    Abstract: Computer-aided design (CAD) tools are increasingly popular in modern dental practice, particularly for treatment planning or comprehensive prognosis evaluation. In particular, the 2D panoramic X-ray image efficiently detects invisible caries, impacted teeth and supernumerary teeth in children, while the 3D dental cone beam computed tomography (CBCT) is widely used in orthodontics and endodontics d…

    Submitted 18 July, 2024; originally announced July 2024.

  8. arXiv:2407.13181  [pdf, other]

    cs.CV

    Training-Free Large Model Priors for Multiple-in-One Image Restoration

    Authors: Xuanhua He, Lang Li, Yingying Wang, Hui Zheng, Ke Cao, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou

    Abstract: Image restoration aims to reconstruct the latent clear images from their degraded versions. Despite the notable achievement, existing methods predominantly focus on handling specific degradation types and thus require specialized models, impeding real-world applications in dynamic degradation scenarios. To address this issue, we propose Large Model Driven Image Restoration framework (LMDIR), a nov…

    Submitted 18 July, 2024; originally announced July 2024.

  9. arXiv:2407.12951  [pdf, other]

    cs.CV

    AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer

    Authors: Zhuguanyu Wu, Jiaxin Chen, Hanwen Zhong, Di Huang, Yunhong Wang

    Abstract: Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community. Despite the high accuracy, deploying it in real applications raises critical challenges including the high computational cost and inference latency. Recently, the post-training quantization (PTQ) technique has emerged as a promising way to enhance ViT's efficiency. Neverth…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  10. arXiv:2407.12880  [pdf, other]

    cs.LG cs.AI cs.CL

    Cross-Modal Augmentation for Few-Shot Multimodal Fake News Detection

    Authors: Ye Jiang, Taihang Wang, Xiaoman Xu, Yimin Wang, Xingyi Song, Diana Maynard

    Abstract: The nascent topic of fake news requires automatic detection methods to quickly learn from limited annotated samples. Therefore, the capacity to rapidly acquire proficiency in a new task with limited guidance, also known as few-shot learning, is critical for detecting fake news in its early stages. Existing approaches either involve fine-tuning pre-trained language models which come with a large nu…

    Submitted 16 July, 2024; originally announced July 2024.

  11. arXiv:2407.12879  [pdf, other]

    cs.CL cs.AI cs.LG

    Large Visual-Language Models Are Also Good Classifiers: A Study of In-Context Multimodal Fake News Detection

    Authors: Ye Jiang, Yimin Wang

    Abstract: Large visual-language models (LVLMs) exhibit exceptional performance in visual-language reasoning across diverse cross-modal benchmarks. Despite these advances, recent research indicates that Large Language Models (LLMs), like GPT-3.5-turbo, underachieve compared to well-trained smaller models, such as BERT, in Fake News Detection (FND), prompting inquiries into LVLMs' efficacy in FND tasks. Altho…

    Submitted 16 July, 2024; originally announced July 2024.

  12. arXiv:2407.12663  [pdf, other]

    cs.RO cs.AI

    Is That Rain? Understanding Effects on Visual Odometry Performance for Autonomous UAVs and Efficient DNN-based Rain Classification at the Edge

    Authors: Andrea Albanese, Yanran Wang, Davide Brunelli, David Boyle

    Abstract: The development of safe and reliable autonomous unmanned aerial vehicles relies on the ability of the system to recognise and adapt to changes in the local environment based on sensor inputs. State-of-the-art local tracking and trajectory planning are typically performed using camera sensor input to the flight control algorithm, but the extent to which environmental disturbances like rain affect t…

    Submitted 17 July, 2024; originally announced July 2024.

  13. arXiv:2407.12622  [pdf, other]

    cs.CV

    Rethinking the Architecture Design for Efficient Generic Event Boundary Detection

    Authors: Ziwei Zheng, Zechuan Zhang, Yulin Wang, Shiji Song, Gao Huang, Le Yang

    Abstract: Generic event boundary detection (GEBD), inspired by human visual cognitive behaviors of consistently segmenting videos into meaningful temporal chunks, finds utility in various applications such as video editing. In this paper, we demonstrate that SOTA GEBD models often prioritize final performance over model complexity, resulting in low inference speed and hindering efficient deployment in r…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ACM MM 2024

  14. arXiv:2407.12342  [pdf, other]

    cs.CL

    Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection

    Authors: Jintang Xue, Yun-Cheng Wang, Chengwei Wei, C. -C. Jay Kuo

    Abstract: As a fundamental task in natural language processing, word embedding converts each word into a representation in a vector space. A challenge with word embedding is that as the vocabulary grows, the vector space's dimension increases, which can lead to a vast model size. Storing and processing word vectors are resource-demanding, especially for mobile edge-device applications. This paper explores…

    Submitted 17 July, 2024; originally announced July 2024.

  15. arXiv:2407.12258  [pdf, other]

    cs.CV

    Facial Affect Recognition based on Multi Architecture Encoder and Feature Fusion for the ABAW7 Challenge

    Authors: Kang Shen, Xuxiong Liu, Boyan Wang, Jun Yao, Xin Liu, Yujie Guan, Yu Wang, Gengchen Li, Xiao Sun

    Abstract: In this paper, we present our approach to addressing the challenges of the 7th ABAW competition. The competition comprises three sub-challenges: Valence Arousal (VA) estimation, Expression (Expr) classification, and Action Unit (AU) detection. To tackle these challenges, we employ state-of-the-art models to extract powerful visual features. Subsequently, a Transformer Encoder is utilized to integr…

    Submitted 16 July, 2024; originally announced July 2024.

  16. arXiv:2407.12177  [pdf, other]

    cs.LG stat.ML

    Are Linear Regression Models White Box and Interpretable?

    Authors: Ahmed M Salih, Yuhe Wang

    Abstract: Explainable artificial intelligence (XAI) is a set of tools and algorithms that are applied to or embedded in machine learning models to understand and interpret the models. They are recommended especially for complex or advanced models, including deep neural networks, because such models are not interpretable from a human point of view. On the other hand, simple models including linear regression are easy to implem…

    Submitted 16 July, 2024; originally announced July 2024.

  17. arXiv:2407.12128  [pdf, other]

    cs.LG cs.CV

    Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams

    Authors: Ziqiang Wang, Zhixiang Chi, Yanan Wu, Li Gu, Zhi Liu, Konstantinos Plataniotis, Yang Wang

    Abstract: Given a model trained on source data, Test-Time Adaptation (TTA) enables adaptation and inference in test data streams with domain shifts from the source. Current methods predominantly optimize the model for each incoming test data batch using self-training loss. While these methods yield commendable results in ideal test data streams, where batches are independently and identically sampled from t…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  18. arXiv:2407.12053  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Improving AlphaFlow for Efficient Protein Ensembles Generation

    Authors: Shaoning Li, Mingyu Li, Yusong Wang, Xinheng He, Nanning Zheng, Jian Zhang, Pheng-Ann Heng

    Abstract: Investigating conformational landscapes of proteins is a crucial way to understand their biological functions and properties. AlphaFlow stands out as a sequence-conditioned generative model that introduces flexibility into structure prediction models by fine-tuning AlphaFold under the flow-matching framework. Despite the advantages of efficient sampling afforded by flow-matching, AlphaFlow still r…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by ICML 2024 AI4Science workshop

  19. arXiv:2407.11991  [pdf, other]

    cs.HC cs.AI

    Inspired by AI? A Novel Generative AI System To Assist Conceptual Automotive Design

    Authors: Ye Wang, Nicole B. Damen, Thomas Gale, Voho Seo, Hooman Shayani

    Abstract: Design inspiration is crucial for establishing the direction of a design as well as evoking feelings and conveying meanings during the conceptual design process. Many practice designers use text-based searches on platforms like Pinterest to gather image ideas, followed by sketching on paper or using digital tools to develop concepts. Emerging generative AI techniques, such as diffusion models, off…

    Submitted 6 June, 2024; originally announced July 2024.

    Journal ref: IDETC 2024

  20. arXiv:2407.11966  [pdf, other]

    cs.CV cs.AI cs.LG

    Efficient Training with Denoised Neural Weights

    Authors: Yifan Gong, Zheng Zhan, Yanyu Li, Yerlan Idelbayev, Andrey Zharkov, Kfir Aberman, Sergey Tulyakov, Yanzhi Wang, Jian Ren

    Abstract: Good weight initialization serves as an effective measure to reduce the training cost of a deep neural network (DNN) model. The choice of how to initialize parameters is challenging and may require manual tuning, which can be time-consuming and prone to human error. To overcome such limitations, this work takes a novel step towards building a weight generator to synthesize the neural weights for i…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Project Page: https://yifanfanfanfan.github.io/denoised-weights/

  21. arXiv:2407.11877  [pdf, ps, other]

    cs.LO cs.AI

    Bridging Weighted First Order Model Counting and Graph Polynomials

    Authors: Qipeng Kuang, Ondřej Kuželka, Yuanhong Wang, Yuyi Wang

    Abstract: The Weighted First-Order Model Counting Problem (WFOMC) asks to compute the weighted sum of models of a given first-order logic sentence over a given domain. It can be solved in time polynomial in the domain size for sentences from the two-variable fragment with counting quantifiers, known as $C^2$. This polynomial-time complexity is also retained when extending $C^2$ by one of the following axiom…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 37 pages, 2 figures

    ACM Class: F.4.0

  22. arXiv:2407.11844  [pdf, other]

    cs.LG cs.AI cs.CR stat.ML

    Variational Randomized Smoothing for Sample-Wise Adversarial Robustness

    Authors: Ryo Hase, Ye Wang, Toshiaki Koike-Akino, Jing Liu, Kieran Parsons

    Abstract: Randomized smoothing is a defensive technique to achieve enhanced robustness against adversarial examples which are small input perturbations that degrade the performance of neural network models. Conventional randomized smoothing adds random noise with a fixed noise level for every input sample to smooth out adversarial perturbations. This paper proposes a new variational framework that uses a pe…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 20 pages, under preparation

  23. arXiv:2407.11820  [pdf, other]

    cs.CV cs.AI

    Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation

    Authors: Juncheng Ma, Peiwen Sun, Yaoting Wang, Di Hu

    Abstract: Audio-Visual Segmentation (AVS) aims to achieve pixel-level localization of sound sources in videos, while Audio-Visual Semantic Segmentation (AVSS), as an extension of AVS, further pursues semantic understanding of audio-visual scenes. However, since the AVSS task requires the establishment of audio-visual correspondence and semantic understanding simultaneously, we observe that previous methods…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV2024 accepted. Project url: https://gewu-lab.github.io/stepping_stones

  24. arXiv:2407.11781  [pdf, other]

    cs.CV

    SlingBAG: Sliding ball adaptive growth algorithm with differentiable radiation enables super-efficient iterative 3D photoacoustic image reconstruction

    Authors: Shuang Li, Yibing Wang, Jian Gao, Chulhong Kim, Seongwook Choi, Yu Zhang, Qian Chen, Yao Yao, Changhui Li

    Abstract: High-quality 3D photoacoustic imaging (PAI) reconstruction under sparse view or limited view has long been challenging. Traditional 3D iterative-based reconstruction methods suffer from both slow speed and high memory consumption. Recently, in computer graphics, the differentiable rendering has made significant progress, particularly with the rise of 3D Gaussian Splatting. Inspired by these, we in…

    Submitted 16 July, 2024; originally announced July 2024.

  25. arXiv:2407.11730  [pdf, other]

    cs.CV

    Monocular Occupancy Prediction for Scalable Indoor Scenes

    Authors: Hongxiao Yu, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang

    Abstract: Camera-based 3D occupancy prediction has recently garnered increasing attention in outdoor driving scenes. However, research in indoor scenes remains relatively unexplored. The core differences in indoor scenes lie in the complexity of scene scale and the variance in object size. In this paper, we propose a novel method, named ISO, for predicting indoor scene occupancy using monocular images. ISO…

    Submitted 16 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  26. arXiv:2407.11569  [pdf, other]

    cs.CV

    SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds

    Authors: Yanbo Wang, Wentao Zhao, Chuan Cao, Tianchen Deng, Jingchuan Wang, Weidong Chen

    Abstract: Although LiDAR semantic segmentation advances rapidly, state-of-the-art methods often incorporate specifically designed inductive bias derived from benchmarks originating from mechanical spinning LiDAR. This can limit model generalizability to other kinds of LiDAR technologies and make hyperparameter tuning more complex. To tackle these issues, we propose a generalized framework to accommodate var…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  27. arXiv:2407.11564  [pdf, other]

    cs.CV

    SGIFormer: Semantic-guided and Geometric-enhanced Interleaving Transformer for 3D Instance Segmentation

    Authors: Lei Yao, Yi Wang, Moyun Liu, Lap-Pui Chau

    Abstract: In recent years, transformer-based models have exhibited considerable potential in point cloud instance segmentation. Despite the promising performance achieved by existing methods, they encounter challenges such as instance query initialization problems and excessive reliance on stacked layers, rendering them incompatible with large-scale 3D scenes. This paper introduces a novel method, named SGI…

    Submitted 16 July, 2024; originally announced July 2024.

  28. arXiv:2407.11433  [pdf, other]

    cs.CV

    CycleHOI: Improving Human-Object Interaction Detection with Cycle Consistency of Detection and Generation

    Authors: Yisen Wang, Yao Teng, Limin Wang

    Abstract: Recognition and generation are two fundamental tasks in computer vision, which are often investigated separately in the existing literature. However, these two tasks are highly correlated in essence as they both require understanding the underlying semantics of visual concepts. In this paper, we propose a new learning framework, coined as CycleHOI, to boost the performance of human-object interactio…

    Submitted 16 July, 2024; originally announced July 2024.

  29. arXiv:2407.11335  [pdf, other]

    cs.CV

    LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

    Authors: Penghui Du, Yu Wang, Yifan Sun, Luting Wang, Yue Liao, Gang Zhang, Errui Ding, Yan Wang, Jingdong Wang, Si Liu

    Abstract: Existing methods enhance open-vocabulary object detection by leveraging the robust open-vocabulary recognition capabilities of Vision-Language Models (VLMs), such as CLIP. However, two main challenges emerge: (1) A deficiency in concept representation, where the category names in CLIP's text space lack textual and visual knowledge. (2) An overfitting tendency towards base categories, with the open vo…

    Submitted 18 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  30. arXiv:2407.11272  [pdf, other]

    cs.CV math.DG

    Differentiable Voxelization and Mesh Morphing

    Authors: Yihao Luo, Yikai Wang, Zhengrui Xiang, Yuliang Xiu, Guang Yang, ChoonHwai Yap

    Abstract: In this paper, we propose the differentiable voxelization of 3D meshes via the winding number and solid angles. The proposed approach achieves fast, flexible, and accurate voxelization of 3D meshes, admitting the computation of gradients with respect to the input mesh and GPU acceleration. We further demonstrate the application of the proposed voxelization in mesh morphing, where the voxelized mes…

    Submitted 15 July, 2024; originally announced July 2024.

  31. arXiv:2407.11162  [pdf, other]

    cs.CV

    Integrating Amortized Inference with Diffusion Models for Learning Clean Distribution from Corrupted Images

    Authors: Yifei Wang, Weimin Bai, Weijian Luo, Wenzheng Chen, He Sun

    Abstract: Diffusion models (DMs) have emerged as powerful generative models for solving inverse problems, offering a good approximation of prior distributions of real-world image data. Typically, diffusion models rely on large-scale clean signals to accurately learn the score functions of ground truth clean image distributions. However, such a requirement for large amounts of clean data is often impractical…

    Submitted 15 July, 2024; originally announced July 2024.

  32. arXiv:2407.11100  [pdf, other]

    cs.CR cs.AI cs.CL

    Building Intelligence Identification System via Large Language Model Watermarking: A Survey and Beyond

    Authors: Xuhong Wang, Haoyu Jiang, Yi Yu, Jingru Yu, Yilun Lin, Ping Yi, Yingchun Wang, Qiao Yu, Li Li, Fei-Yue Wang

    Abstract: Large Language Models (LLMs) are increasingly integrated into diverse industries, posing substantial security risks due to unauthorized replication and misuse. To mitigate these concerns, robust identification mechanisms are widely acknowledged as an effective strategy. Identification systems for LLMs now rely heavily on watermarking technology to manage and protect intellectual property and ensur…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 59 pages, 7 figures

  33. arXiv:2407.10957  [pdf, other]

    cs.CV cs.AI

    Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes

    Authors: Yaoting Wang, Peiwen Sun, Dongzhan Zhou, Guangyao Li, Honggang Zhang, Di Hu

    Abstract: Traditional reference segmentation tasks have predominantly focused on silent visual scenes, neglecting the integral role of multimodal perception and interaction in human experiences. In this work, we introduce a novel task called Reference Audio-Visual Segmentation (Ref-AVS), which seeks to segment objects within the visual domain based on expressions containing multimodal cues. Such expressions…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  34. arXiv:2407.10947  [pdf, other]

    cs.CV

    Can Textual Semantics Mitigate Sounding Object Segmentation Preference?

    Authors: Yaoting Wang, Peiwen Sun, Yuanchao Li, Honggang Zhang, Di Hu

    Abstract: The Audio-Visual Segmentation (AVS) task aims to segment sounding objects in the visual space using audio cues. However, in this work, it is recognized that previous AVS methods show a heavy reliance on detrimental segmentation preferences related to audible objects, rather than precise audio guidance. We argue that the primary reason is that audio lacks robust semantics compared to vision, especi…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  35. arXiv:2407.10874  [pdf, other]

    cs.HC cs.CV cs.LG

    Random Channel Ablation for Robust Hand Gesture Classification with Multimodal Biosignals

    Authors: Keshav Bimbraw, Jing Liu, Ye Wang, Toshiaki Koike-Akino

    Abstract: Biosignal-based hand gesture classification is an important component of effective human-machine interaction. For multimodal biosignal sensing, the modalities often face data loss due to missing channels in the data which can adversely affect the gesture classification performance. To make the classifiers robust to missing channels in the data, this paper proposes using Random Channel Ablation (RC…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  36. arXiv:2407.10870  [pdf, other]

    cs.CV cs.AI cs.HC cs.LG

    GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM

    Authors: Keshav Bimbraw, Ye Wang, Jing Liu, Toshiaki Koike-Akino

    Abstract: Large vision-language models (LVLMs), such as the Generative Pre-trained Transformer 4-omni (GPT-4o), are emerging multi-modal foundation models which have great potential as powerful artificial-intelligence (AI) assistance tools for a myriad of applications, including healthcare, industrial, and academic sectors. Although such foundation models perform well in a wide range of general tasks, their…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 8 pages, 9 figures

  37. arXiv:2407.10756  [pdf, other]

    cs.CV

    GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation

    Authors: Haonan Wang, Jie Liu, Jie Tang, Gangshan Wu, Bo Xu, Yanbing Chou, Yong Wang

    Abstract: In recent years, 2D human pose estimation has made significant progress on public benchmarks. However, many of these approaches have limited applicability in industry due to their large number of parameters and computational overhead. Efficient human pose estimation remains a hurdle, especially for whole-body pose estimation with numerous keypoints. While most c…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 accepted

  38. arXiv:2407.10718  [pdf, other]

    cs.AI cs.CL

    Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning

    Authors: Yulong Wang, Tianhao Shen, Lifeng Liu, Jian Xie

    Abstract: Existing agents based on large language models (LLMs) demonstrate robust problem-solving capabilities by integrating LLMs' inherent knowledge, strong in-context learning and zero-shot capabilities, and the use of tools combined with intricately designed LLM invocation workflows by humans. However, these agents still exhibit shortcomings in long-term reasoning and under-use the potential of existin…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Our code is available at https://github.com/Ag2S1/Sibyl-System

  39. arXiv:2407.10655  [pdf, other]

    cs.CV

    OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer

    Authors: Yu Wang, Xiangbo Su, Qiang Chen, Xinyu Zhang, Teng Xi, Kun Yao, Errui Ding, Gang Zhang, Jingdong Wang

    Abstract: Open-vocabulary object detection focuses on detecting novel categories guided by natural language. In this report, we propose Open-Vocabulary Light-Weighted Detection Transformer (OVLW-DETR), a deployment-friendly open-vocabulary detector with strong performance and low latency. Building upon OVLW-DETR, we provide an end-to-end training recipe that transfers knowledge from vision-language mode…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 4 pages

  40. arXiv:2407.10416  [pdf, other]

    cs.AR

    SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling

    Authors: Huizheng Wang, Jiahao Fang, Xinru Tang, Zhiheng Yue, Jinxi Li, Yubin Qin, Sihan Guan, Qize Yang, Yang Wang, Chao Li, Yang Hu, Shouyi Yin

    Abstract: Benefiting from the self-attention mechanism, Transformer models have attained impressive contextual comprehension capabilities for lengthy texts. The requirements of high-throughput inference arise as the large language models (LLMs) become increasingly prevalent, which calls for large-scale token parallel processing (LTPP). However, existing dynamic sparse accelerators struggle to effectively ha…

    Submitted 14 July, 2024; originally announced July 2024.

  41. arXiv:2407.10414  [pdf, other]

    eess.IV cs.CV cs.LG q-bio.NC

    Teaching CORnet Human fMRI Representations for Enhanced Model-Brain Alignment

    Authors: Zitong Lu, Yile Wang

    Abstract: Deep convolutional neural networks (DCNNs) have demonstrated excellent performance in object recognition and have been found to share some similarities with brain visual processing. However, the substantial gap between DCNNs and human visual perception still exists. Functional magnetic resonance imaging (fMRI) as a widely used technique in cognitive neuroscience can record neural activation in the…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.17231

  42. arXiv:2407.10376  [pdf, other]

    q-bio.NC cs.CL

    Large Language Model-based FMRI Encoding of Language Functions for Subjects with Neurocognitive Disorder

    Authors: Yuejiao Wang, Xianmin Gong, Lingwei Meng, Xixin Wu, Helen Meng

    Abstract: Functional magnetic resonance imaging (fMRI) is essential for developing encoding models that identify functional changes in language-related brain areas of individuals with Neurocognitive Disorders (NCD). While large language model (LLM)-based fMRI encoding has shown promise, existing studies predominantly focus on healthy, young adults, overlooking older NCD populations and cognitive level corre…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 5 pages, accepted by Interspeech 2024

  43. Pattern Guided UV Recovery for Realistic Video Garment Texturing

    Authors: Youyi Zhan, Tuanfeng Y. Wang, Tianjia Shao, Kun Zhou

    Abstract: The fast growth of E-Commerce creates a global market worth USD 821 billion for online fashion shopping. What is unique about fashion presentation is that the same design can usually be offered with different cloth textures. However, only real video capturing or manual per-frame editing can be used for a virtual showcase of the same design with different textures, both of which are heavily labor inte…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted to IEEE Transactions on Visualization and Computer Graphics

  44. arXiv:2407.10135  [pdf, other]

    cs.CV

    FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection

    Authors: Zheng Jiang, Jinqing Zhang, Yanan Zhang, Qingjie Liu, Zhenghui Hu, Baohui Wang, Yunhong Wang

    Abstract: Although multi-view 3D object detection based on the Bird's-Eye-View (BEV) paradigm has garnered widespread attention as an economical and deployment-friendly perception solution for autonomous driving, there is still a performance gap compared to LiDAR-based methods. In recent years, several cross-modal distillation methods have been proposed to transfer beneficial information from teacher models…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  45. arXiv:2407.10112  [pdf, other]

    cs.IR

    Warming Up Cold-Start CTR Prediction by Learning Item-Specific Feature Interactions

    Authors: Yaqing Wang, Hongming Piao, Daxiang Dong, Quanming Yao, Jingbo Zhou

    Abstract: In recommendation systems, new items are continuously introduced, initially lacking interaction records but gradually accumulating them over time. Accurately predicting the click-through rate (CTR) for these items is crucial for enhancing both revenue and user experience. While existing methods focus on enhancing item ID embeddings for new items within general CTR models, they tend to adopt a glob…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: KDD 2024

  46. arXiv:2407.10081  [pdf, other]

    cs.IR

    All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era

    Authors: Bo Chen, Xinyi Dai, Huifeng Guo, Wei Guo, Weiwen Liu, Yong Liu, Jiarui Qin, Ruiming Tang, Yichao Wang, Chuhan Wu, Yaxiong Wu, Hao Zhang

    Abstract: Recommender systems (RS) are vital for managing information overload and delivering personalized content, responding to users' diverse information needs. The emergence of large language models (LLMs) offers a new horizon for redefining recommender systems with vast general knowledge and reasoning capabilities. Standing across this LLM era, we aim to integrate recommender systems into a broader pic…

    Submitted 14 July, 2024; originally announced July 2024.

  47. arXiv:2407.10068  [pdf, other]

    cs.CL

    Multi-Granularity Semantic Revision for Large Language Model Distillation

    Authors: Xiaoyu Liu, Yun Zhang, Wei Li, Simiao Li, Xudong Huang, Hanting Chen, Yehui Tang, Jie Hu, Zhiwei Xiong, Yunhe Wang

    Abstract: Knowledge distillation plays a key role in compressing the Large Language Models (LLMs), which boosts a small-size student model under large teacher models' guidance. However, existing LLM distillation methods overly rely on student-generated outputs, which may introduce generation errors and misguide the distillation process. Moreover, the distillation loss functions introduced in previous art st…

    Submitted 13 July, 2024; originally announced July 2024.

  48. arXiv:2407.09920  [pdf, other]

    cs.CV

    MutDet: Mutually Optimizing Pre-training for Remote Sensing Object Detection

    Authors: Ziyue Huang, Yongchao Feng, Qingjie Liu, Yunhong Wang

    Abstract: Detection pre-training methods for the DETR series detector have been extensively studied in natural scenes, e.g., DETReg. However, the detection pre-training remains unexplored in remote sensing scenes. In existing pre-training methods, alignment between object embeddings extracted from a pre-trained backbone and detector features is significant. However, due to differences in feature extraction…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 14 pages, 4 figures

  49. arXiv:2407.09887  [pdf, other]

    cs.LG math.OC

    Benchmarking LLMs for Optimization Modeling and Enhancing Reasoning via Reverse Socratic Synthesis

    Authors: Zhicheng Yang, Yinya Huang, Wei Shi, Liang Feng, Linqi Song, Yiwei Wang, Xiaodan Liang, Jing Tang

    Abstract: Large language models (LLMs) have exhibited their problem-solving ability in mathematical reasoning. Solving realistic optimization (OPT) problems in industrial application scenarios requires advanced and applied math ability. However, current OPT benchmarks that merely solve linear programming are far from complex realistic situations. In this work, we propose E-OPT, a benchmark for end-to-end op…

    Submitted 13 July, 2024; originally announced July 2024.

  50. arXiv:2407.09817  [pdf, other]

    cs.SD cs.CL eess.AS

    Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System

    Authors: Lingwei Meng, Jiawen Kang, Yuejiao Wang, Zengrui Jin, Xixin Wu, Xunying Liu, Helen Meng

    Abstract: Multi-talker speech recognition and target-talker speech recognition, both involve transcription in multi-talker contexts, remain significant challenges. However, existing methods rarely attempt to simultaneously address both tasks. In this study, we propose a pioneering approach to empower Whisper, which is a speech foundation model, to tackle joint multi-talker and target-talker speech recogniti…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted to INTERSPEECH 2024