Skip to main content

Showing 1–50 of 440 results for author: Jin, S

  1. arXiv:2407.13306  [pdf, ps, other

    cs.IT eess.SP

    Group Movable Antenna With Flexible Sparsity: Joint Array Position and Sparsity Optimization

    Authors: Haiquan Lu, Yong Zeng, Shi Jin, Rui Zhang

    Abstract: Movable antenna (MA) is a promising technology to exploit the spatial variation of wireless channel for performance enhancement, by dynamically varying the antenna position within a certain region. However, for multi-antenna communication systems, moving each antenna independently not only requires prohibitive complexity to find the optimal antenna positions, but also incurs sophisticated movement… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 5 pages, 5 figures

  2. arXiv:2407.12827  [pdf, other

    cs.CL cs.LG

    The Solution for The PST-KDD-2024 OAG-Challenge

    Authors: Shupeng Zhong, Xinger Li, Shushan Jin, Yang Yang

    Abstract: In this paper, we introduce the second-place solution in the KDD-2024 OAG-Challenge paper source tracing track. Our solution is mainly based on two methods, BERT and GCN, and combines the reasoning results of BERT and GCN in the final submission to achieve complementary performance. In the BERT solution, we focus on processing the fragments that appear in the references of the paper, and use a var… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  3. arXiv:2407.11454  [pdf, other

    quant-ph cs.CR cs.DC

    Cloud-based Semi-Quantum Money

    Authors: Yichi Zhang, Siyuan Jin, Yuhan Huang, Bei Zeng, Qiming Shao

    Abstract: In the 1970s, Wiesner introduced the concept of quantum money, where quantum states generated according to specific rules function as currency. These states circulate among users with quantum resources through quantum channels or face-to-face interactions. Quantum mechanics grants quantum money physical-level unforgeability but also makes minting, storing, and circulating it significantly challeng… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  4. arXiv:2407.11321  [pdf, other

    cs.CV

    TCFormer: Visual Recognition via Token Clustering Transformer

    Authors: Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

    Abstract: Transformers are widely used in computer vision areas and have achieved remarkable success. Most state-of-the-art approaches split images into regular grids and represent each grid region with a vision token. However, fixed token distribution disregards the semantic meaning of different image regions, resulting in sub-optimal performance. To address this issue, we propose the Token Clustering Tran… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  5. arXiv:2407.10125  [pdf, other

    cs.CV

    When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset

    Authors: Yi Zhang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu

    Abstract: Recent years have witnessed increasing research attention towards pedestrian detection by taking the advantages of different sensor modalities (e.g. RGB, IR, Depth, LiDAR and Event). However, designing a unified generalist model that can effectively process diverse sensor modalities remains a challenge. This paper introduces MMPedestron, a novel generalist model for multimodal perception. Unlike p… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV'2024

  6. arXiv:2407.08257  [pdf, other

    cs.CV cs.AI cs.LG

    Knowledge distillation to effectively attain both region-of-interest and global semantics from an image where multiple objects appear

    Authors: Seonwhee Jin

    Abstract: Models based on convolutional neural networks (CNN) and transformers have steadily been improved. They also have been applied in various computer vision downstream tasks. However, in object detection tasks, accurately localizing and classifying almost infinite categories of foods in images remains challenging. To address these problems, we first segmented the food as the region-of-interest (ROI) b… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  7. arXiv:2407.07357  [pdf, ps, other

    cs.LG q-bio.MN

    A deep graph model for the signed interaction prediction in biological network

    Authors: Shuyi Jin, Mengji Zhang, Meijie Wang, Lun Yu

    Abstract: In pharmaceutical research, the strategy of drug repurposing accelerates the development of new therapies while reducing R&D costs. Network pharmacology lays the theoretical groundwork for identifying new drug indications, and deep graph models have become essential for their precision in mapping complex biological networks. Our study introduces an advanced graph model that utilizes graph convolut… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  8. arXiv:2407.06985  [pdf, other

    cs.AI

    PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods

    Authors: Yiying Wang, Xiaojing Li, Binzhu Wang, Yueyang Zhou, Han Ji, Hong Chen, Jinshi Zhang, Fei Yu, Zewei Zhao, Song Jin, Renji Gong, Wanqing Xu

    Abstract: In domain-specific applications, GPT-4, augmented with precise prompts or Retrieval-Augmented Generation (RAG), shows notable potential but faces the critical tri-lemma of performance, cost, and data privacy. High performance requires sophisticated processing techniques, yet managing multiple agents within a complex workflow often proves costly and challenging. To address this, we introduce the PE… ▽ More

    Submitted 9 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  9. arXiv:2407.06691  [pdf, other

    cs.IT eess.SP

    OFDM Achieves the Lowest Ranging Sidelobe Under Random ISAC Signaling

    Authors: Fan Liu, Ying Zhang, Yifeng Xiong, Shuangyang Li, Weijie Yuan, Feifei Gao, Shi Jin, Giuseppe Caire

    Abstract: This paper aims to answer a fundamental question in the area of Integrated Sensing and Communications (ISAC): What is the optimal communication-centric ISAC waveform for ranging? Towards that end, we first established a generic framework to analyze the sensing performance of communication-centric ISAC waveforms built upon orthonormal signaling bases and random data symbols. Then, we evaluated thei… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 14 pages, 12 figures, submitted to IEEE for possible publication

  10. arXiv:2407.06042  [pdf, ps, other

    eess.SP cs.IT

    Near-Optimal MIMO Detection Using Gradient-Based MCMC in Discrete Spaces

    Authors: Xingyu Zhou, Le Liang, Jing Zhang, Chao-Kai Wen, Shi Jin

    Abstract: The discrete nature of transmitted symbols poses challenges for achieving optimal detection in multiple-input multiple-output (MIMO) systems associated with a large number of antennas. Recently, the combination of two powerful machine learning methods, Markov chain Monte Carlo (MCMC) sampling and gradient descent, has emerged as a highly efficient solution to address this issue. However, existing… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  11. arXiv:2407.04404  [pdf

    cs.AR

    Fixed and Movable Antenna Technology for 6G Integrated Sensing and Communication

    Authors: Yong Zeng, Zhenjun Dong, Huizhi Wang, Lipeng Zhu, Ziyao Hong, Qingji Jiang, Dongming Wang, Shi Jin, Rui Zhang

    Abstract: By deploying antenna arrays at the transmitter/receiver to provide additional spatial-domain degrees of freedom (DoFs), multi-antenna technology greatly improves the reliability and efficiency of wireless communication. Meanwhile, the application of multi-antenna technology in the radar field has achieved spatial angle resolution and improved sensing DoF, thus significantly enhancing wireless sens… ▽ More

    Submitted 16 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: in Chinese language

  12. Efficient IoT Devices Localization Through Wi-Fi CSI Feature Fusion and Anomaly Detection

    Authors: Yan Li, Jie Yang, Shang-Ling Shih, Wan-Ting Shih, Chao-Kai Wen, Shi Jin

    Abstract: Internet of Things (IoT) device localization is fundamental to smart home functionalities, including indoor navigation and tracking of individuals. Traditional localization relies on relative methods utilizing the positions of anchors within a home environment, yet struggles with precision due to inherent inaccuracies in these anchor positions. In response, we introduce a cutting-edge smartphone-b… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted in IEEE Internet of Things Journal, Early Access, 2024

    Journal ref: IEEE Internet of Things Journal, Early Access, 2024

  13. Coding-Enhanced Cooperative Jamming for Secret Communication in Fluid Antenna Systems

    Authors: Hao Xu, Kai-Kit Wong, Wee Kiat New, Guyue Li, Farshad Rostami Ghadi, Yongxu Zhu, Shi Jin, Chan-Byoung Chae, Yangyang Zhang

    Abstract: This letter investigates the secret communication problem for a fluid antenna system (FAS)-assisted wiretap channel, where the legitimate transmitter transmits an information-bearing signal to the legitimate receiver, and at the same time, transmits a jamming signal to interfere with the eavesdropper (Eve). Unlike the conventional jamming scheme, which usually transmits Gaussian noise that interfe… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 6 pages, 3 figures, this paper has been accepted by IEEE Communications Letters

  14. arXiv:2407.01755  [pdf, other

    cs.RO

    An Intelligent Robotic System for Perceptive Pancake Batter Stirring and Precise Pouring

    Authors: Xinyuan Luo, Shengmiao Jin, Hung-Jui Huang, Wenzhen Yuan

    Abstract: Cooking robots have long been desired by the commercial market, while the technical challenge is still significant. A major difficulty comes from the demand of perceiving and handling liquid with different properties. This paper presents a robot system that mixes batter and makes pancakes out of it, where understanding and handling the viscous liquid is an essential component. The system integrate… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 8 pages, 10 figures, Accepted to IROS 2024

  15. arXiv:2406.13972  [pdf, other

    cs.SE

    CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors

    Authors: Boyang Yang, Haoye Tian, Weiguo Pian, Haoran Yu, Haitao Wang, Jacques Klein, Tegawendé F. Bissyandé, Shunfu Jin

    Abstract: Program repair techniques offer cost-saving benefits for debugging within software development and programming education scenarios. With the proven effectiveness of Large Language Models (LLMs) in code-related tasks, researchers have explored their potential for program repair. However, it is crucial to recognize that existing repair benchmarks may have influenced LLM training data, potentially ca… ▽ More

    Submitted 8 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  16. arXiv:2406.12270  [pdf, other

    cs.IT eess.SP

    Sparse MIMO for ISAC: New Opportunities and Challenges

    Authors: Xinrui Li, Hongqi Min, Yong Zeng, Shi Jin, Linglong Dai, Yifei Yuan, Rui Zhang

    Abstract: Multiple-input multiple-output (MIMO) has been a key technology of wireless communications for decades. A typical MIMO system employs antenna arrays with the inter-antenna spacing being half of the signal wavelength, which we term as compact MIMO. Looking forward towards the future sixth-generation (6G) mobile communication networks, MIMO system will achieve even finer spatial resolution to not on… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  17. arXiv:2406.12030  [pdf, other

    cs.CV cs.AI cs.CL

    SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model

    Authors: Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui, Jing Shao

    Abstract: The emergence of Vision Language Models (VLMs) has brought unprecedented advances in understanding multimodal information. The combination of textual and visual semantics in VLMs is highly complex and diverse, making the safety alignment of these models challenging. Furthermore, due to the limited study on the safety alignment of VLMs, there is a lack of large-scale, high-quality datasets. To addr… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  18. arXiv:2406.07923  [pdf, other

    cs.SD cs.AI eess.AS

    CTC-aligned Audio-Text Embedding for Streaming Open-vocabulary Keyword Spotting

    Authors: Sichen Jin, Youngmoon Jung, Seungjin Lee, Jaeyoung Roh, Changwoo Han, Hoonyoung Cho

    Abstract: This paper introduces a novel approach for streaming openvocabulary keyword spotting (KWS) with text-based keyword enrollment. For every input frame, the proposed method finds the optimal alignment ending at the frame using connectionist temporal classification (CTC) and aggregates the frame-level acoustic embedding (AE) to obtain higher-level (i.e., character, word, or phrase) AE that aligns with… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  19. arXiv:2406.07886  [pdf, other

    cs.CL

    Label-aware Hard Negative Sampling Strategies with Momentum Contrastive Learning for Implicit Hate Speech Detection

    Authors: Jaehoon Kim, Seungwan Jin, Sohyun Park, Someen Park, Kyungsik Han

    Abstract: Detecting implicit hate speech that is not directly hateful remains a challenge. Recent research has attempted to detect implicit hate speech by applying contrastive learning to pre-trained language models such as BERT and RoBERTa, but the proposed models still do not have a significant advantage over cross-entropy loss-based learning. We found that contrastive learning based on randomly sampled b… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

  20. arXiv:2406.05913  [pdf, other

    cs.NI eess.SP

    Revisiting Multi-User Downlink in IEEE 802.11ax: A Designers Guide to MU-MIMO

    Authors: Liu Cao, Lyutianyang Zhang, Sumit Roy, Sian Jin

    Abstract: Downlink (DL) Multi-User (MU) Multiple Input Multiple Output (MU-MIMO) is a key technology that allows multiple concurrent data transmissions from an Access Point (AP) to a selected sub-set of clients for higher network efficiency in IEEE 802.11ax. However, DL MU-MIMO feature is typically turned off as the default setting in AP vendors' products, that is, turning on the DL MU-MIMO may not help inc… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication. 7 pages, 6 figures, magazine paper

  21. arXiv:2406.05821  [pdf, other

    cs.CV

    F-LMM: Grounding Frozen Large Multimodal Models

    Authors: Size Wu, Sheng Jin, Wenwei Zhang, Lumin Xu, Wentao Liu, Wei Li, Chen Change Loy

    Abstract: Endowing Large Multimodal Models (LMMs) with visual grounding capability can significantly enhance AIs' understanding of the visual world and their interaction with humans. However, existing methods typically fine-tune the parameters of LMMs to learn additional segmentation tokens and overfit grounding and segmentation datasets. Such a design would inevitably cause a catastrophic diminution in the… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Project Page: https://github.com/wusize/F-LMM

  22. arXiv:2406.02761  [pdf, other

    cs.CV cs.AI cs.LG cs.MM

    Multi-layer Learnable Attention Mask for Multimodal Tasks

    Authors: Wayner Barrios, SouYoung Jin

    Abstract: While the Self-Attention mechanism in the Transformer model has proven to be effective in many domains, we observe that it is less effective in more diverse settings (e.g. multimodality) due to the varying granularity of each token and the high computational demands of lengthy sequences. To address the challenges, we introduce the Learnable Attention Mask (LAM), strategically designed to globally… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  23. arXiv:2406.01282  [pdf, other

    cs.LG

    Continuous Geometry-Aware Graph Diffusion via Hyperbolic Neural PDE

    Authors: Jiaxu Liu, Xinping Yi, Sihao Wu, Xiangyu Yin, Tianle Zhang, Xiaowei Huang, Shi Jin

    Abstract: While Hyperbolic Graph Neural Network (HGNN) has recently emerged as a powerful tool dealing with hierarchical graph data, the limitations of scalability and efficiency hinder itself from generalizing to deep models. In this paper, by envisioning depth as a continuous-time embedding evolution, we decouple the HGNN and reframe the information propagation as a partial differential equation, letting… ▽ More

    Submitted 7 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: The short version of this work will appear in the Proceedings of the 2024 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2024)

  24. arXiv:2406.00752  [pdf, other

    cs.DC

    Blockchain-aided wireless federated learning: Resource allocation and client scheduling

    Authors: Jun Li, Weiwei Zhang, Kang Wei, Guangji Chen, Feng Shu, Wen Chen, Shi Jin

    Abstract: Federated learning (FL) based on the centralized design faces both challenges regarding the trust issue and a single point of failure. To alleviate these issues, blockchain-aided decentralized FL (BDFL) introduces the decentralized network architecture into the FL training process, which can effectively overcome the defects of centralized architecture. However, deploying BDFL in wireless networks… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures

  25. arXiv:2405.17932  [pdf, ps, other

    cs.LG cs.DC

    Towards Communication-efficient Federated Learning via Sparse and Aligned Adaptive Optimization

    Authors: Xiumei Deng, Jun Li, Kang Wei, Long Shi, Zeihui Xiong, Ming Ding, Wen Chen, Shi Jin, H. Vincent Poor

    Abstract: Adaptive moment estimation (Adam), as a Stochastic Gradient Descent (SGD) variant, has gained widespread popularity in federated learning (FL) due to its fast convergence. However, federated Adam (FedAdam) algorithms suffer from a threefold increase in uplink communication overhead compared to federated SGD (FedSGD) algorithms, which arises from the necessity to transmit both local model updates a… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  26. arXiv:2405.17914  [pdf, other

    cs.LG

    Trustworthy DNN Partition for Blockchain-enabled Digital Twin in Wireless IIoT Networks

    Authors: Xiumei Deng, Jun Li, Long Shi, Kang Wei, Ming Ding, Yumeng Shao, Wen Chen, Shi Jin

    Abstract: Digital twin (DT) has emerged as a promising solution to enhance manufacturing efficiency in industrial Internet of Things (IIoT) networks. To promote the efficiency and trustworthiness of DT for wireless IIoT networks, we propose a blockchain-enabled DT (B-DT) framework that employs deep neural network (DNN) partitioning technique and reputation-based consensus mechanism, wherein the DTs maintain… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  27. arXiv:2405.17789  [pdf, ps, other

    cs.IT

    On the Downlink Average Energy Efficiency of Non-Stationary XL-MIMO

    Authors: Jun Zhang, Jiacheng Lu, Jingjing Zhang, Yu Han, Jue Wang, Shi Jin

    Abstract: Extra large-scale multiple-input multiple-output (XL-MIMO) is a key technology for future wireless communication systems. This paper considers the effects of visibility region (VR) at the base station (BS) in a non-stationary multi-user XL-MIMO scenario, where only partial antennas can receive users' signal. In time division duplexing (TDD) mode, we first estimate the VR at the BS by detecting the… ▽ More

    Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: 13 pages, 11 figures

  28. arXiv:2405.13403  [pdf, other

    eess.IV cs.MM

    Adaptive Wireless Image Semantic Transmission and Over-The-Air Testing

    Authors: Jiarun Ding, Peiwen Jiang, Chao-Kai Wen, Shi Jin

    Abstract: Semantic communication has undergone considerable evolution due to the recent rapid development of artificial intelligence (AI), significantly enhancing both communication robustness and efficiency. Despite these advancements, most current semantic communication methods for image transmission pay little attention to the differing importance of objects and backgrounds in images. To address this iss… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  29. arXiv:2405.07696  [pdf, other

    cs.CV

    MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders

    Authors: Xueying Jiang, Sheng Jin, Xiaoqin Zhang, Ling Shao, Shijian Lu

    Abstract: Monocular 3D object detection aims for precise 3D localization and identification of objects from a single-view image. Despite its recent progress, it often struggles while handling pervasive object occlusions that tend to complicate and degrade the prediction of object dimensions, depths, and orientations. We design MonoMAE, a monocular 3D detector inspired by Masked Autoencoders that addresses t… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

  30. arXiv:2405.06993  [pdf, other

    cs.LG cs.DC

    Robust Model Aggregation for Heterogeneous Federated Learning: Analysis and Optimizations

    Authors: Yumeng Shao, Jun Li, Long Shi, Kang Wei, Ming Ding, Qianmu Li, Zengxiang Li, Wen Chen, Shi Jin

    Abstract: Conventional synchronous federated learning (SFL) frameworks suffer from performance degradation in heterogeneous systems due to imbalanced local data size and diverse computing power on the client side. To address this problem, asynchronous FL (AFL) and semi-asynchronous FL have been proposed to recover the performance loss by allowing asynchronous aggregation. However, asynchronous aggregation i… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

  31. arXiv:2405.05861  [pdf, other

    cs.RO cs.AI cs.LG

    ExACT: An End-to-End Autonomous Excavator System Using Action Chunking With Transformers

    Authors: Liangliang Chen, Shiyu Jin, Haoyu Wang, Liangjun Zhang

    Abstract: Excavators are crucial for diverse tasks such as construction and mining, while autonomous excavator systems enhance safety and efficiency, address labor shortages, and improve human working conditions. Different from the existing modularized approaches, this paper introduces ExACT, an end-to-end autonomous excavator system that processes raw LiDAR, camera data, and joint positions to control exca… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: ICRA Workshop 2024: 3rd Workshop on Future of Construction: Lifelong Learning Robots in Changing Construction Sites

  32. arXiv:2404.19401  [pdf, other

    cs.CV

    UniFS: Universal Few-shot Instance Perception with Point Representations

    Authors: Sheng Jin, Ruijie Yao, Lumin Xu, Wentao Liu, Chen Qian, Ji Wu, Ping Luo

    Abstract: Instance perception tasks (object detection, instance segmentation, pose estimation, counting) play a key role in industrial applications of visual models. As supervised learning methods suffer from high labeling cost, few-shot learning methods which effectively learn from a limited number of labeled examples are desired. Existing few-shot learning methods primarily focus on a restricted set of ta… ▽ More

    Submitted 15 July, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by ECCV 2024

  33. arXiv:2404.13470  [pdf, other

    cs.DC cs.AI

    GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data

    Authors: Wenqi Jia, Sian Jin, Jinzhen Wang, Wei Niu, Dingwen Tao, Miao Yin

    Abstract: The rapid expansion of computational capabilities and the ever-growing scale of modern HPC systems present formidable challenges in managing exascale scientific data. Faced with such vast datasets, traditional lossless compression techniques prove insufficient in reducing data size to a manageable level while preserving all information intact. In response, researchers have turned to error-bounded… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

  34. arXiv:2404.12636  [pdf, other

    cs.SE

    Multi-Objective Fine-Tuning for Enhanced Program Repair with LLMs

    Authors: Boyang Yang, Haoye Tian, Jiadong Ren, Hongyu Zhang, Jacques Klein, Tegawendé F. Bissyandé, Claire Le Goues, Shunfu Jin

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities on a broad spectrum of downstream tasks. Within the realm of software engineering, specialized tasks on code, such as program repair, present unique challenges, necessitating fine-tuning to unlock state-of-the-art performance. Fine-tuning approaches proposed in the literature for LLMs on program repair tasks are however general… ▽ More

    Submitted 22 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  35. arXiv:2404.05199  [pdf, other

    eess.SP cs.IT

    Decision Transformer for Wireless Communications: A New Paradigm of Resource Management

    Authors: Jie Zhang, Jun Li, Long Shi, Zhe Wang, Shi Jin, Wen Chen, H. Vincent Poor

    Abstract: As the next generation of mobile systems evolves, artificial intelligence (AI) is expected to deeply integrate with wireless communications for resource management in variable environments. In particular, deep reinforcement learning (DRL) is an important tool for addressing stochastic optimization issues of resource allocation. However, DRL has to start each new training process from the beginning… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  36. arXiv:2404.05063  [pdf, other

    cs.CV

    AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement

    Authors: Shiwei Jin, Zhen Wang, Lei Wang, Peng Liu, Ning Bi, Truong Nguyen

    Abstract: Facial action unit (AU) intensity plays a pivotal role in quantifying fine-grained expression behaviors, which is an effective condition for facial expression manipulation. However, publicly available datasets containing intensity annotations for multiple AUs remain severely limited, often featuring a restricted number of subjects. This limitation places challenges to the AU intensity manipulation… ▽ More

    Submitted 10 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

  37. arXiv:2404.04783  [pdf, other

    cs.IT eess.SP

    Fourier Transform-based Wavenumber Domain 3D Imaging in RIS-aided Communication Systems

    Authors: Yixuan Huang, Jie Yang, Wankai Tang, Chao-Kai Wen, Shi Jin

    Abstract: Radio imaging is rapidly gaining prominence in the design of future communication systems, with the potential to utilize reconfigurable intelligent surfaces (RISs) as imaging apertures. Although the sparsity of targets in three-dimensional (3D) space has led most research to adopt compressed sensing (CS)-based imaging algorithms, these often require substantial computational and memory burdens. Dr… ▽ More

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 16 pages, 11 figures, submitted to IEEE for possible publication

  38. arXiv:2404.02840  [pdf, ps, other

    cs.DC

    A Survey on Error-Bounded Lossy Compression for Scientific Datasets

    Authors: Sheng Di, Jinyang Liu, Kai Zhao, Xin Liang, Robert Underwood, Zhaorui Zhang, Milan Shah, Yafan Huang, Jiajun Huang, Xiaodong Yu, Congrong Ren, Hanqi Guo, Grant Wilkins, Dingwen Tao, Jiannan Tian, Sian Jin, Zizhe Jian, Daoce Wang, MD Hasanur Rahman, Boyuan Zhang, Jon C. Calhoun, Guanpeng Li, Kazutomo Yoshii, Khalid Ayed Alharthi, Franck Cappello

    Abstract: Error-bounded lossy compression has been effective in significantly reducing the data storage/transfer burden while preserving the reconstructed data fidelity very well. Many error-bounded lossy compressors have been developed for a wide range of parallel and distributed use cases for years. These lossy compressors are designed with distinct compression models and design principles, such that each… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: submitted to ACM Computing journal, requited to be 35 pages including references

  39. arXiv:2404.01954  [pdf, other

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t… ▽ More

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  40. arXiv:2404.00429  [pdf, other

    cs.CV

    Multiway Point Cloud Mosaicking with Diffusion and Global Optimization

    Authors: Shengze Jin, Iro Armeni, Marc Pollefeys, Daniel Barath

    Abstract: We introduce a novel framework for multiway point cloud mosaicking (named Wednesday), designed to co-align sets of partially overlapping point clouds -- typically obtained from 3D scanners or moving RGB-D cameras -- into a unified coordinate system. At the core of our approach is ODIN, a learned pairwise registration algorithm that iteratively identifies overlaps and refines attention scores, empl… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

  41. arXiv:2403.19185  [pdf, other

    cs.IT eess.SP

    Deep CSI Compression for Dual-Polarized Massive MIMO Channels with Disentangled Representation Learning

    Authors: Suhang Fan, Wei Xu, Renjie Xie, Shi Jin, Derrick Wing Kwan Ng, Naofal Al-Dhahir

    Abstract: Channel state information (CSI) feedback is critical for achieving the promised advantages of enhancing spectral and energy efficiencies in massive multiple-input multiple-output (MIMO) wireless communication systems. Deep learning (DL)-based methods have been proven effective in reducing the required signaling overhead for CSI feedback. In practical dual-polarized MIMO scenarios, channels in the… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  42. arXiv:2403.19144  [pdf, other

    cs.CV

    MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation

    Authors: Seyeon Kim, Siyoon Jin, Jihye Park, Kihong Kim, Jiyoung Kim, Jisu Nam, Seungryong Kim

    Abstract: Conventional GAN-based models for talking head generation often suffer from limited quality and unstable training. Recent approaches based on diffusion models aimed to address these limitations and improve fidelity. However, they still face challenges, including extensive sampling times and difficulties in maintaining temporal consistency due to the high stochasticity of diffusion models. To overc… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  43. arXiv:2403.16004  [pdf, other

    cs.LG cs.AI

    A Federated Parameter Aggregation Method for Node Classification Tasks with Different Graph Network Structures

    Authors: Hao Song, Jiacheng Yao, Zhengxi Li, Shaocong Xu, Shibo Jin, Jiajun Zhou, Chenbo Fu, Qi Xuan, Shanqing Yu

    Abstract: Over the past few years, federated learning has become widely used in various classical machine learning fields because of its collaborative ability to train data from multiple sources without compromising privacy. However, in the area of graph neural networks, the nodes and network structures of graphs held by clients are different in many practical applications, and the aggregation method that d… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

  44. arXiv:2403.11806  [pdf, other

    cs.IT eess.SP

    Fluid Antenna for Mobile Edge Computing

    Authors: Yiping Zuo, Jiajia Guo, Biyun Sheng, Chen Dai, Fu Xiao, Shi Jin

    Abstract: In the evolving environment of mobile edge computing (MEC), optimizing system performance to meet the growing demand for low-latency computing services is a top priority. Integrating fluidic antenna (FA) technology into MEC networks provides a new approach to address this challenge. This letter proposes an FA-enabled MEC scheme that aims to minimize the total system delay by leveraging the mobilit… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  45. arXiv:2403.11764  [pdf, other

    cs.IT eess.SP

    RIS-aided Single-frequency 3D Imaging by Exploiting Multi-view Image Correlations

    Authors: Yixuan Huang, Jie Yang, Chao-Kai Wen, Shi Jin

    Abstract: Retrieving range information in three-dimensional (3D) radio imaging is particularly challenging due to the limited communication bandwidth and pilot resources. To address this issue, we consider a reconfigurable intelligent surface (RIS)-aided uplink communication scenario, generating multiple measurements through RIS phase adjustment. This study successfully realizes 3D single-frequency imaging… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 16 pages, 12 figures, accepted by IEEE Transactions on Communications

  46. arXiv:2403.11461  [pdf, other

    cs.RO

    VIHE: Virtual In-Hand Eye Transformer for 3D Robotic Manipulation

    Authors: Weiyao Wang, Yutian Lei, Shiyu Jin, Gregory D. Hager, Liangjun Zhang

    Abstract: In this work, we introduce the Virtual In-Hand Eye Transformer (VIHE), a novel method designed to enhance 3D manipulation capabilities through action-aware view rendering. VIHE autoregressively refines actions in multiple stages by conditioning on rendered views posed from action predictions in the earlier stages. These virtual in-hand views provide a strong inductive bias for effectively recogniz… ▽ More

    Submitted 18 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  47. arXiv:2403.07274  [pdf, other

    cs.IT eess.SP

    Achievable Rate Analysis and Optimization of Double-RIS Assisted Spatially Correlated MIMO with Statistical CSI

    Authors: Kaizhe Xu, Jiajia Guo, Jun Zhang, Shi Jin, Shaodan Ma

    Abstract: Reconfigurable intelligent surface (RIS) is a novel meta-material which can form a smart radio environment by dynamically altering reflection directions of the impinging electromagnetic waves. In the prior literature, the inter-RIS links which also contribute to the performance of the whole system are usually neglected when multiple RISs are deployed. In this paper we investigate a general double-… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

  48. arXiv:2403.06420  [pdf, other

    cs.RO cs.AI cs.HC cs.LG

    RLingua: Improving Reinforcement Learning Sample Efficiency in Robotic Manipulations With Large Language Models

    Authors: Liangliang Chen, Yutian Lei, Shiyu Jin, Ying Zhang, Liangjun Zhang

    Abstract: Reinforcement learning (RL) has demonstrated its capability in solving various tasks but is notorious for its low sample efficiency. In this paper, we propose RLingua, a framework that can leverage the internal knowledge of large language models (LLMs) to reduce the sample complexity of RL in robotic manipulations. To this end, we first present a method for extracting the prior knowledge of LLMs b… ▽ More

    Submitted 19 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  49. arXiv:2402.19144  [pdf, other

    cs.CV

    Weakly Supervised Monocular 3D Detection with a Single-View Image

    Authors: Xueying Jiang, Sheng Jin, Lewei Lu, Xiaoqin Zhang, Shijian Lu

    Abstract: Monocular 3D detection (M3D) aims for precise 3D object localization from a single-view image which usually involves labor-intensive annotation of 3D detection boxes. Weakly supervised M3D has recently been studied to obviate the 3D annotation process by leveraging many existing 2D annotations, but it often requires extra training data such as LiDAR point clouds or multi-view images which greatly… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

  50. arXiv:2402.15351  [pdf, other

    cs.LG cs.CV

    AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks

    Authors: Zekang Yang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu

    Abstract: Automated machine learning (AutoML) is a collection of techniques designed to automate the machine learning development process. While traditional AutoML approaches have been successfully applied in several critical steps of model development (e.g. hyperparameter optimization), there lacks a AutoML system that automates the entire end-to-end model production workflow. To fill this blank, we presen… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.