
Showing 1–50 of 117 results for author: Peng, K

  1. arXiv:2407.02685  [pdf, other]

    cs.CV

    Open Panoramic Segmentation

    Authors: Junwei Zheng, Ruiping Liu, Yufan Chen, Kunyu Peng, Chengzhi Wu, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen

    Abstract: Panoramic images, capturing a 360° field of view (FoV), encompass omnidirectional spatial information crucial for scene understanding. However, it is not only costly to obtain training-sufficient dense-annotated panoramas but also application-restricted when training models in a close-vocabulary setting. To tackle this problem, in this work, we define a new task termed Open Panoramic Segmentation…

    Submitted 11 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. Project page: https://junweizheng93.github.io/publications/OPS/OPS.html

  2. arXiv:2407.02182  [pdf, other]

    cs.CV cs.RO eess.IV

    Occlusion-Aware Seamless Segmentation

    Authors: Yihong Cao, Jiaming Zhang, Hao Shi, Kunyu Peng, Yuhongxuan Zhang, Hui Zhang, Rainer Stiefelhagen, Kailun Yang

    Abstract: Panoramic images can broaden the Field of View (FoV), occlusion-aware prediction can deepen the understanding of the scene, and domain adaptation can transfer across viewing domains. In this work, we introduce a novel task, Occlusion-Aware Seamless Segmentation (OASS), which simultaneously tackles all these three challenges. For benchmarking OASS, we establish a new human-annotated dataset for Ble…

    Submitted 17 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. The fresh dataset and source code are available at https://github.com/yihong-97/OASS

  3. arXiv:2407.01872  [pdf, other]

    cs.CV cs.RO eess.IV

    Referring Atomic Video Action Recognition

    Authors: Kunyu Peng, Jia Fu, Kailun Yang, Di Wen, Yufan Chen, Ruiping Liu, Junwei Zheng, Jiaming Zhang, M. Saquib Sarfraz, Rainer Stiefelhagen, Alina Roitberg

    Abstract: We introduce a new task called Referring Atomic Video Action Recognition (RAVAR), aimed at identifying atomic actions of a particular person based on a textual description and the video data of this person. This task differs from traditional action recognition and localization, where predictions are delivered for all present individuals. In contrast, we focus on recognizing the correct atomic acti…

    Submitted 10 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. The dataset and code will be made publicly available at https://github.com/KPeng9510/RAVAR

  4. arXiv:2407.00033  [pdf, other]

    q-bio.NC cs.AI

    Uncovering cognitive taskonomy through transfer learning in masked autoencoder-based fMRI reconstruction

    Authors: Youzhi Qu, Junfeng Xia, Xinyao Jian, Wendu Li, Kaining Peng, Zhichao Liang, Haiyan Wu, Quanying Liu

    Abstract: Data reconstruction is a widely used pre-training task to learn the generalized features for many downstream tasks. Although reconstruction tasks have been applied to neural signal completion and denoising, neural signal reconstruction is less studied. Here, we employ the masked autoencoder (MAE) model to reconstruct functional magnetic resonance imaging (fMRI) data, and utilize a transfer learnin…

    Submitted 24 May, 2024; originally announced July 2024.

  5. arXiv:2406.15736  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads

    Authors: Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Joanna Matthiesen, Kevin Smith, Joshua B. Tenenbaum

    Abstract: Recent years have seen a significant progress in the general-purpose problem solving abilities of large vision and language models (LVLMs), such as ChatGPT, Gemini, etc.; some of these breakthroughs even seem to enable AI models to outperform human abilities in varied tasks that demand higher-order cognitive skills. Are the current large AI models indeed capable of generalized problem solving as h…

    Submitted 22 June, 2024; originally announced June 2024.

  6. arXiv:2405.13646  [pdf]

    cs.LG

    A Transformer variant for multi-step forecasting of water level and hydrometeorological sensitivity analysis based on explainable artificial intelligence technology

    Authors: Mingyu Liu, Nana Bao, Xingting Yan, Chenyang Li, Kai Peng

    Abstract: Understanding the combined influences of meteorological and hydrological factors on water level and flood events is essential, particularly in today's changing climate environments. Transformer, as one kind of the cutting-edge deep learning methods, offers an effective approach to model intricate nonlinear processes, enables the extraction of key features and water level predictions. EXplainable A…

    Submitted 22 May, 2024; originally announced May 2024.

  7. arXiv:2405.02678  [pdf, other]

    cs.LG cs.AI cs.CV

    Position: Quo Vadis, Unsupervised Time Series Anomaly Detection?

    Authors: M. Saquib Sarfraz, Mei-Yen Chen, Lukas Layer, Kunyu Peng, Marios Koulakis

    Abstract: The current state of machine learning scholarship in Timeseries Anomaly Detection (TAD) is plagued by the persistent use of flawed evaluation metrics, inconsistent benchmarking practices, and a lack of proper justification for the choices made in novel deep learning-based model designs. Our paper presents a critical analysis of the status quo in TAD, revealing the misleading track of current resea…

    Submitted 5 June, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  8. arXiv:2404.11764  [pdf, other]

    cs.CV

    Multimodal 3D Object Detection on Unseen Domains

    Authors: Deepti Hegde, Suhas Lohit, Kuan-Chuan Peng, Michael J. Jones, Vishal M. Patel

    Abstract: LiDAR datasets for autonomous driving exhibit biases in properties such as point cloud density, range, and object dimensions. As a result, object detection networks trained and evaluated in different environments often experience performance degradation. Domain adaptation approaches assume access to unannotated samples from the test distribution to address this problem. However, in the real world,…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: technical report

  9. arXiv:2404.11737  [pdf, other]

    cs.CV

    Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection

    Authors: Deepti Hegde, Suhas Lohit, Kuan-Chuan Peng, Michael J. Jones, Vishal M. Patel

    Abstract: Popular representation learning methods encourage feature invariance under transformations applied at the input. However, in 3D perception tasks like object localization and segmentation, outputs are naturally equivariant to some transformations, such as rotation. Using pre-training loss functions that encourage equivariance of features under certain transformations provides a strong self-supervis…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: technical report

  10. arXiv:2404.06674  [pdf, other]

    cs.SD cs.AI eess.AS

    VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

    Authors: Philip Anastassiou, Zhenyu Tang, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma

    Abstract: We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre. Previous works have been constrained to specialized models that can only edit these attributes individually and suffer from the following pitfalls: the magnitude of the conversion…

    Submitted 11 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  11. arXiv:2403.20236  [pdf, other]

    cs.CV

    Long-Tailed Anomaly Detection with Learnable Class Names

    Authors: Chih-Hui Ho, Kuan-Chuan Peng, Nuno Vasconcelos

    Abstract: Anomaly detection (AD) aims to identify defective images and localize their defects (if any). Ideally, AD models should be able to detect defects over many image classes; without relying on hard-coded class names that can be uninformative or inconsistent across datasets; learn without anomaly supervision; and be robust to the long-tailed distributions of real-world applications. To address these c…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: This paper is accepted to CVPR 2024. The supplementary material is included. The long-tailed dataset split is available at https://zenodo.org/records/10854201

  12. arXiv:2403.14442  [pdf, other]

    cs.CV

    RoDLA: Benchmarking the Robustness of Document Layout Analysis Models

    Authors: Yufan Chen, Jiaming Zhang, Kunyu Peng, Junwei Zheng, Ruiping Liu, Philip Torr, Rainer Stiefelhagen

    Abstract: Before developing a Document Layout Analysis (DLA) model in real-world applications, conducting comprehensive robustness testing is essential. However, the robustness of DLA models remains underexplored in the literature. To address this, we are the first to introduce a robustness benchmark for DLA models, which includes 450K document images of three datasets. To cover realistic corruptions, we pr…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024. Project page: https://yufanchen96.github.io/projects/RoDLA

  13. arXiv:2403.09975  [pdf, other]

    cs.CV cs.RO eess.IV

    Skeleton-Based Human Action Recognition with Noisy Labels

    Authors: Yi Xu, Kunyu Peng, Di Wen, Ruiping Liu, Junwei Zheng, Yufan Chen, Jiaming Zhang, Alina Roitberg, Kailun Yang, Rainer Stiefelhagen

    Abstract: Understanding human actions from body poses is critical for assistive robots sharing space with humans in order to make informed and safe decisions about the next interaction. However, precise temporal localization and annotation of activity sequences is time-consuming and the resulting labels are often noisy. If not effectively addressed, label noise negatively affects the model's training, resul…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: The source code will be made accessible at https://github.com/xuyizdby/NoiseEraSAR

  14. arXiv:2403.09963  [pdf, other]

    cs.CL cs.AI cs.IR

    Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction

    Authors: Ziyang Xu, Keqin Peng, Liang Ding, Dacheng Tao, Xiliang Lu

    Abstract: Recent research shows that pre-trained language models (PLMs) suffer from "prompt bias" in factual knowledge extraction, i.e., prompts tend to introduce biases toward specific labels. Prompt bias presents a significant challenge in assessing the factual knowledge within PLMs. Therefore, this paper aims to improve the reliability of existing benchmarks by thoroughly investigating and mitigating pro…

    Submitted 26 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted by COLING 2024

  15. arXiv:2402.18302  [pdf, other]

    cs.CV cs.RO eess.AS eess.IV

    EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving

    Authors: Jiacheng Lin, Jiajun Chen, Kunyu Peng, Xuan He, Zhiyong Li, Rainer Stiefelhagen, Kailun Yang

    Abstract: This paper introduces the task of Auditory Referring Multi-Object Tracking (AR-MOT), which dynamically tracks specific objects in a video sequence based on audio expressions and appears as a challenging problem in autonomous driving. Due to the lack of semantic modeling capacity in audio and video, existing works have mainly focused on text-based multi-object tracking, which often comes at the cos…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: The source code and datasets will be made publicly available at https://github.com/lab206/EchoTrack

  16. arXiv:2402.16771  [pdf, other]

    econ.TH cs.GT math.PR

    Wisdom and Foolishness of Noisy Matching Markets

    Authors: Kenny Peng, Nikhil Garg

    Abstract: We consider a many-to-one matching market where colleges share true preferences over students but make decisions using only independent noisy rankings. Each student has a true value $v$, but each college $c$ ranks the student according to an independently drawn estimated value $v + X_c$ for $X_c\sim \mathcal{D}.$ We ask a basic question about the resulting stable matching: How noisy is the set of…

    Submitted 26 February, 2024; originally announced February 2024.

  17. Automating psychological hypothesis generation with AI: when large language models meet causal graph

    Authors: Song Tong, Kai Mao, Zhen Huang, Yukun Zhao, Kaiping Peng

    Abstract: Leveraging the synergy between causal knowledge graphs and a large language model (LLM), our study introduces a groundbreaking approach for computational hypothesis generation in psychology. We analyzed 43,312 psychology articles using a LLM to extract causal relation pairs. This analysis produced a specialized causal graph for psychology. Applying link prediction algorithms, we generated 130 pote…

    Submitted 15 July, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  18. arXiv:2401.16923  [pdf, other]

    cs.CV cs.RO eess.IV

    Fourier Prompt Tuning for Modality-Incomplete Scene Segmentation

    Authors: Ruiping Liu, Jiaming Zhang, Kunyu Peng, Yufan Chen, Ke Cao, Junwei Zheng, M. Saquib Sarfraz, Kailun Yang, Rainer Stiefelhagen

    Abstract: Integrating information from multiple modalities enhances the robustness of scene perception systems in autonomous vehicles, providing a more comprehensive and reliable sensory framework. However, the modality incompleteness in multi-modal segmentation remains under-explored. In this work, we establish a task called Modality-Incomplete Scene Segmentation (MISS), which encompasses both system-level…

    Submitted 10 April, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted to IEEE IV 2024. The source code is publicly available at https://github.com/RuipingL/MISS

  19. arXiv:2401.16712  [pdf, other]

    cs.CV cs.RO eess.IV

    LF Tracy: A Unified Single-Pipeline Approach for Salient Object Detection in Light Field Cameras

    Authors: Fei Teng, Jiaming Zhang, Jiawei Liu, Kunyu Peng, Xina Cheng, Zhiyong Li, Kailun Yang

    Abstract: Leveraging the rich information extracted from light field (LF) cameras is instrumental for dense prediction tasks. However, adapting light field data to enhance Salient Object Detection (SOD) still follows the traditional RGB methods and remains under-explored in the community. Previous approaches predominantly employ a custom two-stream design to discover the implicit angular feature within ligh…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: The source code will be made publicly available at https://github.com/FeiBryantkit/LF-Tracy

  20. arXiv:2401.16462  [pdf, other]

    cs.LG cs.AI

    Supervised Contrastive Learning based Dual-Mixer Model for Remaining Useful Life Prediction

    Authors: En Fu, Yanyan Hu, Kaixiang Peng, Yuxin Chu

    Abstract: The problem of the Remaining Useful Life (RUL) prediction, aiming at providing an accurate estimate of the remaining time from the current predicting moment to the complete failure of the device, has gained significant attention from researchers in recent years. In this paper, to overcome the shortcomings of rigid combination for temporal and spatial features in most existing RUL prediction approa…

    Submitted 29 January, 2024; originally announced January 2024.

  21. arXiv:2401.12087  [pdf, other]

    cs.CL

    Revisiting Demonstration Selection Strategies in In-Context Learning

    Authors: Keqin Peng, Liang Ding, Yancheng Yuan, Xuebo Liu, Min Zhang, Yuanxin Ouyang, Dacheng Tao

    Abstract: Large language models (LLMs) have shown an impressive ability to perform a wide range of tasks using in-context learning (ICL), where a few examples are used to describe a task to the model. However, the performance of ICL varies significantly with the choice of demonstrations, and it is still unclear why this happens or what factors will influence its choice. In this work, we first revisit the fa…

    Submitted 23 June, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: ACL 2024

  22. arXiv:2401.02122  [pdf, other]

    cs.CL cs.SD eess.AS

    PEFT for Speech: Unveiling Optimal Placement, Merging Strategies, and Ensemble Techniques

    Authors: Tzu-Han Lin, How-Shing Wang, Hao-Yung Weng, Kuang-Chen Peng, Zih-Ching Chen, Hung-yi Lee

    Abstract: Parameter-Efficient Fine-Tuning (PEFT) is increasingly recognized as an effective method in speech processing. However, the optimal approach and the placement of PEFT methods remain inconclusive. Our study conducts extensive experiments to compare different PEFT methods and their layer-wise placement adapting Differentiable Architecture Search (DARTS). We also explore the use of ensemble learning…

    Submitted 7 February, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024 Self-supervision in Audio, Speech and Beyond (SASB) workshop

  23. arXiv:2401.01519  [pdf]

    cs.LG cs.AI

    Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive Review

    Authors: Luoma Ke, Song Tong, Peng Cheng, Kaiping Peng

    Abstract: This paper explores the frontiers of large language models (LLMs) in psychology applications. Psychology has undergone several theoretical changes, and the current use of Artificial Intelligence (AI) and Machine Learning, particularly LLMs, promises to open up new research directions. We provide a detailed exploration of how LLMs like ChatGPT are transforming psychological research. It discusses t…

    Submitted 16 March, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

  24. arXiv:2312.11152  [pdf, other]

    cs.CL cs.AI

    Prompt Based Tri-Channel Graph Convolution Neural Network for Aspect Sentiment Triplet Extraction

    Authors: Kun Peng, Lei Jiang, Hao Peng, Rui Liu, Zhengtao Yu, Jiaqian Ren, Zhifeng Hao, Philip S. Yu

    Abstract: Aspect Sentiment Triplet Extraction (ASTE) is an emerging task to extract a given sentence's triplets, which consist of aspects, opinions, and sentiments. Recent studies tend to address this task with a table-filling paradigm, wherein word relations are encoded in a two-dimensional table, and the process involves clarifying all the individual cells to extract triples. However, these studies ignore…

    Submitted 24 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted in SIAM International Conference on Data Mining (SDM24)

  25. arXiv:2312.09841  [pdf, other]

    cs.GT cs.CY econ.TH

    Monoculture in Matching Markets

    Authors: Kenny Peng, Nikhil Garg

    Abstract: Algorithmic monoculture arises when many decision-makers rely on the same algorithm to evaluate applicants. An emerging body of work investigates possible harms of this kind of homogeneity, but has been limited by the challenge of incorporating market effects in which the preferences and behavior of many applicants and decision-makers jointly interact to determine outcomes. Addressing this chall…

    Submitted 15 December, 2023; originally announced December 2023.

  26. arXiv:2312.06330  [pdf, other]

    cs.CV cs.AI cs.RO eess.IV

    Navigating Open Set Scenarios for Skeleton-based Action Recognition

    Authors: Kunyu Peng, Cheng Yin, Junwei Zheng, Ruiping Liu, David Schneider, Jiaming Zhang, Kailun Yang, M. Saquib Sarfraz, Rainer Stiefelhagen, Alina Roitberg

    Abstract: In real-world scenarios, human actions often fall outside the distribution of training data, making it crucial for models to recognize known actions and reject unknown ones. However, using pure skeleton data in such open-set conditions poses challenges due to the lack of visual background cues and the distinct sparse structure of body pose sequences. In this paper, we tackle the unexplored Open-Se…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024. The benchmark, code, and models will be released at https://github.com/KPeng9510/OS-SAR

  27. arXiv:2312.00774  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Context Retrieval via Normalized Contextual Latent Interaction for Conversational Agent

    Authors: Junfeng Liu, Zhuocheng Mei, Kewen Peng, Ranga Raju Vatsavai

    Abstract: Conversational agents leveraging AI, particularly deep learning, are emerging in both academic research and real-world applications. However, these applications still face challenges, including disrespecting knowledge and facts, not personalizing to user preferences, and enormous demand for computational resources during training and inference. Recent research efforts have been focused on addressi…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: 2023 IEEE International Conference on Data Mining Workshops (ICDMW)

  28. arXiv:2311.05970  [pdf, other]

    cs.CV cs.RO

    Quantized Distillation: Optimizing Driver Activity Recognition Models for Resource-Constrained Environments

    Authors: Calvin Tanama, Kunyu Peng, Zdravko Marinov, Rainer Stiefelhagen, Alina Roitberg

    Abstract: Deep learning-based models are at the forefront of most driver observation benchmarks due to their remarkable accuracies but are also associated with high computational costs. This is challenging, as resources are often limited in real-world driving scenarios. This paper introduces a lightweight framework for resource-efficient driver activity recognition. The framework enhances 3D MobileNet, a ne…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: Accepted at IROS 2023

  29. arXiv:2310.12035  [pdf]

    cs.HC q-bio.NC

    Tracking dynamic flow: Decoding flow fluctuations through performance in a fine motor control task

    Authors: Bohao Tian, Shijun Zhang, Sirui Chen, Yuru Zhang, Kaiping Peng, Hongxing Zhang, Dangxiao Wang

    Abstract: Flow, an optimal mental state merging action and awareness, significantly impacts our emotion, performance, and well-being. However, capturing its swift fluctuations on a fine timescale is challenging due to the sparsity of the existing flow detecting tools. Here we present a fine fingertip force control (F3C) task to induce flow, wherein the task challenge is set at a compatible level with person…

    Submitted 28 December, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

  30. arXiv:2310.09762  [pdf, other]

    cs.CL cs.AI

    Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer

    Authors: Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, Dacheng Tao

    Abstract: The Mixture of Experts (MoE) has emerged as a highly successful technique in deep learning, based on the principle of divide-and-conquer to maximize model capacity without significant additional computational cost. Even in the era of large-scale language models (LLMs), MoE continues to play a crucial role, as some researchers have indicated that GPT-4 adopts the MoE structure to ensure diverse inf…

    Submitted 15 October, 2023; originally announced October 2023.

  31. arXiv:2309.16592  [pdf, other]

    cs.CV cs.LG

    Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection

    Authors: Manish Sharma, Moitreya Chatterjee, Kuan-Chuan Peng, Suhas Lohit, Michael Jones

    Abstract: The primary bottleneck towards obtaining good recognition performance in IR images is the lack of sufficient labeled training data, owing to the cost of acquiring such data. Realizing that object detection methods for the RGB modality are quite robust (at least for some commonplace classes, like person, car, etc.), thanks to the giant training sets that exist, in this work we seek to leverage cues…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV 2023, LIMIT Workshop. The first two authors contributed equally

  32. arXiv:2309.12029  [pdf, other]

    cs.CV cs.MM cs.RO eess.IV

    Unveiling the Hidden Realm: Self-supervised Skeleton-based Action Recognition in Occluded Environments

    Authors: Yifei Chen, Kunyu Peng, Alina Roitberg, David Schneider, Jiaming Zhang, Junwei Zheng, Ruiping Liu, Yufan Chen, Kailun Yang, Rainer Stiefelhagen

    Abstract: To integrate action recognition methods into autonomous robotic systems, it is crucial to consider adverse situations involving target occlusions. Such a scenario, despite its practical relevance, is rarely addressed in existing self-supervised skeleton-based action recognition methods. To empower robots with the capacity to address occlusion, we propose a simple and effective method. We first pre…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: The source code will be made publicly available at https://github.com/cyfml/OPSTL

  33. arXiv:2309.12009  [pdf, other]

    cs.CV cs.MM cs.RO eess.IV

    Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision

    Authors: Yiping Wei, Kunyu Peng, Alina Roitberg, Jiaming Zhang, Junwei Zheng, Ruiping Liu, Yufan Chen, Kailun Yang, Rainer Stiefelhagen

    Abstract: Self-supervised representation learning for human action recognition has developed rapidly in recent years. Most of the existing works are based on skeleton data while using a multi-modality setup. These works overlooked the differences in performance among modalities, which led to the propagation of erroneous knowledge between modalities while only three fundamental modalities, i.e., joints, bone…

    Submitted 10 January, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024. The source code will be made publicly available at https://github.com/desehuileng0o0/IKEM

  34. arXiv:2309.02171  [pdf, other]

    cs.IT eess.SP

    A Wideband MIMO Channel Model for Aerial Intelligent Reflecting Surface-Assisted Wireless Communications

    Authors: Shaoyi Liu, Nan Ma, Yaning Chen, Ke Peng, Dongsheng Xue

    Abstract: Compared to traditional intelligent reflecting surfaces (IRS), aerial IRS (AIRS) has unique advantages, such as more flexible deployment and wider service coverage. However, modeling AIRS in the channel presents new challenges due to their mobility. In this paper, a three-dimensional (3D) wideband channel model for AIRS and IRS joint-assisted multiple-input multiple-output (MIMO) communication syst…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: 6 pages, 7 figures

  35. arXiv:2308.12049  [pdf, other]

    cs.CV cs.AI

    Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation

    Authors: Hejun Xiao, Kunyu Peng, Xiangsheng Huang, Alina Roitberg, Hao Li, Zhaohui Wang, Rainer Stiefelhagen

    Abstract: Fall detection is a vital task in health monitoring, as it allows the system to trigger an alert and therefore enabling faster interventions when a person experiences a fall. Although most previous approaches rely on standard RGB video data, such detailed appearance-aware monitoring poses significant privacy concerns. Depth sensors, on the other hand, are better at preserving privacy as they merel…

    Submitted 23 August, 2023; originally announced August 2023.

  36. arXiv:2308.07832  [pdf, ps, other]

    cs.LG cs.AI stat.ME

    REFORMS: Reporting Standards for Machine Learning Based Science

    Authors: Sayash Kapoor, Emily Cantrell, Kenny Peng, Thanh Hien Pham, Christopher A. Bail, Odd Erik Gundersen, Jake M. Hofman, Jessica Hullman, Michael A. Lones, Momin M. Malik, Priyanka Nanayakkara, Russell A. Poldrack, Inioluwa Deborah Raji, Michael Roberts, Matthew J. Salganik, Marta Serra-Garcia, Brandon M. Stewart, Gilles Vandewiele, Arvind Narayanan

    Abstract: Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways acros…

    Submitted 19 September, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

  37. arXiv:2307.15588  [pdf, other]

    cs.CV cs.RO eess.IV

    OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation

    Authors: Fei Teng, Jiaming Zhang, Kunyu Peng, Yaonan Wang, Rainer Stiefelhagen, Kailun Yang

    Abstract: Light field cameras, by harnessing the power of micro-lens array, are capable of capturing intricate angular and spatial details. This allows for acquiring complex light patterns and details from multiple angles, significantly enhancing the precision of image semantic segmentation, a critical aspect of scene interpretation in vision intelligence. However, the extensive angular information of light…

    Submitted 21 December, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

    Comments: The source code of OAFuser will be made publicly available at https://github.com/FeiBryantkit/OAFuser

  38. arXiv:2307.15142  [pdf, other]

    cs.IR cs.SI

    Reconciling the accuracy-diversity trade-off in recommendations

    Authors: Kenny Peng, Manish Raghavan, Emma Pierson, Jon Kleinberg, Nikhil Garg

    Abstract: In recommendation settings, there is an apparent trade-off between the goals of accuracy (to recommend items a user is most likely to want) and diversity (to recommend items representing a range of categories). As such, real-world recommender systems often explicitly incorporate diversity separately from accuracy. This approach, however, leaves a basic question unanswered: Why is there a trade-off…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: 34 pages, 5 figures

  39. arXiv:2307.10700  [pdf, other]

    cs.DL cs.CL cs.CY

    Topics, Authors, and Institutions in Large Language Model Research: Trends from 17K arXiv Papers

    Authors: Rajiv Movva, Sidhika Balachandar, Kenny Peng, Gabriel Agostini, Nikhil Garg, Emma Pierson

    Abstract: Large language models (LLMs) are dramatically influencing AI research, spurring discussions on what has changed so far and how to shape the field's future. To clarify such questions, we analyze a new dataset of 16,979 LLM-related arXiv papers, focusing on recent trends in 2023 vs. 2018-2022. First, we study disciplinary shifts: LLM research increasingly considers societal impacts, evidenced by 20x…

    Submitted 28 April, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: NAACL 2024. Data & code available at https://github.com/rmovva/LLM-publication-patterns-public

  40. Where Did the President Visit Last Week? Detecting Celebrity Trips from News Articles

    Authors: Kai Peng, Ying Zhang, Shuai Ling, Zhaoru Ke, Haipeng Zhang

    Abstract: Celebrities' whereabouts are of pervasive importance. For instance, where politicians go, how often they visit, and who they meet, come with profound geopolitical and economic implications. Although news articles contain travel information of celebrities, it is not possible to perform large-scale and network-wise analysis due to the lack of automatic itinerary detection tools. To design such tools…

    Submitted 9 October, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: Accepted to ICWSM 2024, 12 pages

  41. arXiv:2307.07763  [pdf, other]

    cs.RO cs.CV eess.IV

    Tightly-Coupled LiDAR-Visual SLAM Based on Geometric Features for Mobile Agents

    Authors: Ke Cao, Ruiping Liu, Ze Wang, Kunyu Peng, Jiaming Zhang, Junwei Zheng, Zhifeng Teng, Kailun Yang, Rainer Stiefelhagen

    Abstract: The mobile robot relies on SLAM (Simultaneous Localization and Mapping) to provide autonomous navigation and task execution in complex and unknown environments. However, it is hard to develop a dedicated algorithm for mobile robots due to dynamic and challenging situations, such as poor lighting conditions and motion blur. To tackle this issue, we propose a tightly-coupled LiDAR-visual SLAM based…

    Submitted 25 December, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

    Comments: Accepted to ROBIO 2023

  42. arXiv:2307.07757  [pdf, other]

    cs.CV cs.HC cs.RO eess.IV

    Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments

    Authors: Ruiping Liu, Jiaming Zhang, Kunyu Peng, Junwei Zheng, Ke Cao, Yufan Chen, Kailun Yang, Rainer Stiefelhagen

    Abstract: Grounded Situation Recognition (GSR) is capable of recognizing and interpreting visual scenes in a contextually intuitive way, yielding salient activities (verbs) and the involved entities (roles) depicted in images. In this work, we focus on the application of GSR in assisting people with visual impairments (PVI). However, precise localization information of detected objects is often required to…

    Submitted 15 July, 2023; originally announced July 2023.

    Comments: Code will be available at https://github.com/RuipingL/OpenSU

  43. arXiv:2307.01915  [pdf]

    physics.soc-ph cs.SI math.DS math.HO nlin.AO

    Using mathematics to study how people influence each other's opinions

    Authors: Grace J. Li, Jiajie Luo, Kaiyan Peng, Mason A. Porter

    Abstract: People sometimes change their opinions when they discuss things with other people. Researchers study mathematical models of opinions to explore how people influence each other through their social interactions. In today's digital world, these models can help us learn how to promote accurate information and reduce unwanted influence. In this article, we discuss a simple mathematical model that look…

    Submitted 22 June, 2024; v1 submitted 4 July, 2023; originally announced July 2023.

    Comments: This article is written for teenagers and preteens. A newly revised version, based on referee comments

  44. arXiv:2305.08420  [pdf, other]

    cs.CV cs.AI cs.RO eess.IV

    Exploring Few-Shot Adaptation for Activity Recognition on Diverse Domains

    Authors: Kunyu Peng, Di Wen, David Schneider, Jiaming Zhang, Kailun Yang, M. Saquib Sarfraz, Rainer Stiefelhagen, Alina Roitberg

    Abstract: Domain adaptation is essential for activity recognition to ensure accurate and robust performance across diverse environments, sensor types, and data sources. Unsupervised domain adaptation methods have been extensively studied, yet, they require large-scale unlabeled data from the target domain. In this work, we focus on Few-Shot Domain Adaptation for Activity Recognition (FSDA-AR), which leverag…

    Submitted 27 April, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: The benchmark and source code will be publicly available at https://github.com/KPeng9510/RelaMiX

  45. arXiv:2305.00104  [pdf, other]

    cs.CV eess.AS eess.IV

    MMViT: Multiscale Multiview Vision Transformers

    Authors: Yuchen Liu, Natasha Ong, Kaiyan Peng, Bo Xiong, Qifan Wang, Rui Hou, Madian Khabsa, Kaiyue Yang, David Liu, Donald S. Williamson, Hanchao Yu

    Abstract: We present Multiscale Multiview Vision Transformers (MMViT), which introduces multiscale feature maps and multiview encodings to transformer models. Our model encodes different views of the input signal and builds several channel-resolution feature stages to process the multiple views of the input at different resolutions in parallel. At each scale stage, we use a cross-attention block to fuse inf…

    Submitted 28 April, 2023; originally announced May 2023.

  46. arXiv:2303.13842  [pdf, other]

    cs.CV cs.RO eess.IV

    FishDreamer: Towards Fisheye Semantic Completion via Unified Image Outpainting and Segmentation

    Authors: Hao Shi, Yu Li, Kailun Yang, Jiaming Zhang, Kunyu Peng, Alina Roitberg, Yaozu Ye, Huajian Ni, Kaiwei Wang, Rainer Stiefelhagen

    Abstract: This paper raises the new task of Fisheye Semantic Completion (FSC), where dense texture, structure, and semantics of a fisheye image are inferred even beyond the sensor field-of-view (FoV). Fisheye cameras have larger FoV than ordinary pinhole cameras, yet its unique special imaging model naturally leads to a blind area at the edge of the image plane. This is suboptimal for safety-critical applic…

    Submitted 20 April, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR OmniCV 2023. Code and datasets will be available at https://github.com/MasterHow/FishDreamer

  47. arXiv:2303.13780  [pdf, other]

    cs.CL

    Towards Making the Most of ChatGPT for Machine Translation

    Authors: Keqin Peng, Liang Ding, Qihuang Zhong, Li Shen, Xuebo Liu, Min Zhang, Yuanxin Ouyang, Dacheng Tao

    Abstract: ChatGPT shows remarkable capabilities for machine translation (MT). Several prior studies have shown that it achieves comparable results to commercial systems for high-resource languages, but lags behind in complex tasks, e.g., low-resource and distant-language-pairs translation. However, they usually adopt simple prompts which can not fully elicit the capability of ChatGPT. In this paper, we aim…

    Submitted 20 October, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: EMNLP 2023 (findings)

  48. arXiv:2303.11910  [pdf, other]

    cs.CV

    360BEV: Panoramic Semantic Mapping for Indoor Bird's-Eye View

    Authors: Zhifeng Teng, Jiaming Zhang, Kailun Yang, Kunyu Peng, Hao Shi, Simon Reiß, Ke Cao, Rainer Stiefelhagen

    Abstract: Seeing only a tiny part of the whole is not knowing the full circumstance. Bird's-eye-view (BEV) perception, a process of obtaining allocentric maps from egocentric views, is restricted when using a narrow Field of View (FoV) alone. In this work, mapping from 360° panoramas to BEV semantics, the 360BEV task, is established for the first time to achieve holistic representations of indoor scenes in…

    Submitted 4 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: Code and datasets are available at the project page: https://jamycheung.github.io/360BEV.html. Accepted to WACV 2024

  49. arXiv:2303.11570  [pdf, other]

    cs.CV

    Boundary Unlearning

    Authors: Min Chen, Weizhuo Gao, Gaoyang Liu, Kai Peng, Chen Wang

    Abstract: The practical needs of the "right to be forgotten" and poisoned data removal call for efficient machine unlearning techniques, which enable machine learning models to unlearn, or to forget a fraction of training data and its lineage. Recent studies on machine unlearning for deep neural networks (DNNs) attempt to destroy the influence of the forgetting data by scrubbing the model paramet…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: To be published in Proc. of 34th IEEE/CVF CVPR, 2023

  50. arXiv:2303.01480  [pdf, other]

    cs.CV

    Delivering Arbitrary-Modal Semantic Segmentation

    Authors: Jiaming Zhang, Ruiping Liu, Hao Shi, Kailun Yang, Simon Reiß, Kunyu Peng, Haodong Fu, Kaiwei Wang, Rainer Stiefelhagen

    Abstract: Multimodal fusion can make semantic segmentation more robust. However, fusing an arbitrary number of modalities remains underexplored. To delve into this problem, we create the DeLiVER arbitrary-modal segmentation benchmark, covering Depth, LiDAR, multiple Views, Events, and RGB. Aside from this, we provide this dataset in four severe weather conditions as well as five sensor failure cases to expl…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023. Dataset and our code are at: https://jamycheung.github.io/DELIVER.html