Skip to main content

Showing 1–50 of 6,999 results for author: Liu, Y

  1. arXiv:2407.13709  [pdf, other

    cs.CL cs.LG

    Understanding Reference Policies in Direct Preference Optimization

    Authors: Yixin Liu, Pengfei Liu, Arman Cohan

    Abstract: Direct Preference Optimization (DPO) has become a widely used training method for the instruction fine-tuning of large language models (LLMs). In this work, we explore an under-investigated aspect of DPO - its dependency on the reference model or policy. Such reference policies, typically instantiated as the model to be further fine-tuned, are important since they can impose an upper limit on DPO'… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.13342  [pdf, other

    cs.CV

    Implicit Filtering for Learning Neural Signed Distance Functions from 3D Point Clouds

    Authors: Shengtao Li, Ge Gao, Yudong Liu, Ming Gu, Yu-Shen Liu

    Abstract: Neural signed distance functions (SDFs) have shown powerful ability in fitting the shape geometry. However, inferring continuous signed distance fields from discrete unoriented point clouds still remains a challenge. The neural network typically fits the shape with a rough surface and omits fine-grained geometric details such as shape edges and corners. In this paper, we propose a novel non-linear… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. Project page: https://list17.github.io/ImplicitFilter

  3. arXiv:2407.13292  [pdf, other

    cs.SD cs.CL eess.AS

    Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training

    Authors: Lukuan Dong, Donghong Qin, Fengbo Bai, Fanhua Song, Yan Liu, Chen Xu, Zhijian Ou

    Abstract: The mainstream automatic speech recognition (ASR) technology usually requires hundreds to thousands of hours of annotated speech data. Three approaches to low-resourced ASR are phoneme or subword based supervised pre-training, and self-supervised pre-training over multilingual data. The Iu Mien language is the main ethnic language of the Yao ethnic group in China and is low-resourced in the sense… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  4. arXiv:2407.13284  [pdf, other

    cs.IR

    Semantic-aware Representation Learning for Homography Estimation

    Authors: Yuhan Liu, Qianxin Huang, Siqi Hui, Jingwen Fu, Sanping Zhou, Kangyi Wu, Pengna Li, Jinjun Wang

    Abstract: Homography estimation is the task of determining the transformation from an image pair. Our approach focuses on employing detector-free feature matching methods to address this issue. Previous work has underscored the importance of incorporating semantic information, however there still lacks an efficient way to utilize semantic information. Previous methods suffer from treating the semantics as a… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  5. arXiv:2407.13278  [pdf, other

    cs.LG

    Deep Time Series Models: A Comprehensive Survey and Benchmark

    Authors: Yuxuan Wang, Haixu Wu, Jiaxiang Dong, Yong Liu, Mingsheng Long, Jianmin Wang

    Abstract: Time series, characterized by a sequence of data points arranged in a discrete-time order, are ubiquitous in real-world applications. Different from other modalities, time series present unique challenges due to their complex and dynamic nature, including the entanglement of nonlinear patterns and time-variant trends. Analyzing time series data is of great significance in real-world scenarios and… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: \

  6. arXiv:2407.13246  [pdf, other

    cs.CV

    STS MICCAI 2023 Challenge: Grand challenge on 2D and 3D semi-supervised tooth segmentation

    Authors: Yaqi Wang, Yifan Zhang, Xiaodiao Chen, Shuai Wang, Dahong Qian, Fan Ye, Feng Xu, Hongyuan Zhang, Qianni Zhang, Chengyu Wu, Yunxiang Li, Weiwei Cui, Shan Luo, Chengkai Wang, Tianhao Li, Yi Liu, Xiang Feng, Huiyu Zhou, Dongyun Liu, Qixuan Wang, Zhouhao Lin, Wei Song, Yuanlin Li, Bing Wang, Chunshi Wang , et al. (2 additional authors not shown)

    Abstract: Computer-aided design (CAD) tools are increasingly popular in modern dental practice, particularly for treatment planning or comprehensive prognosis evaluation. In particular, the 2D panoramic X-ray image efficiently detects invisible caries, impacted teeth and supernumerary teeth in children, while the 3D dental cone beam computed tomography (CBCT) is widely used in orthodontics and endodontics d… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  7. arXiv:2407.13229  [pdf, other

    cs.RO eess.SY

    Disturbance Observer for Estimating Coupled Disturbances

    Authors: Jindou Jia, Yuhang Liu, Kexin Guo, Xiang Yu, Lihua Xie, Lei Guo

    Abstract: High-precision control for nonlinear systems is impeded by the low-fidelity dynamical model and external disturbance. Especially, the intricate coupling between internal uncertainty and external disturbance is usually difficult to be modeled explicitly. Here we show an effective and convergent algorithm enabling accurate estimation of the coupled disturbance via combining control and learning phil… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 8 pages, 3 figures

  8. arXiv:2407.13220  [pdf, other

    eess.AS cs.SD

    MEDIC: Zero-shot Music Editing with Disentangled Inversion Control

    Authors: Huadai Liu, Jialei Wang, Rongjie Huang, Yang Liu, Jiayang Xu, Zhou Zhao

    Abstract: Text-guided diffusion models catalyze a paradigm shift in audio generation, facilitating the adaptability of source audio to conform to specific textual prompts. Recent advancements introduce inversion techniques, like DDIM inversion, to zero-shot editing, exploiting pre-trained diffusion models for audio modification. Nonetheless, our investigation exposes that DDIM inversion suffers from an accu… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  9. arXiv:2407.13097  [pdf, other

    cs.CL

    AlcLaM: Arabic Dialectal Language Model

    Authors: Murtadha Ahmed, Saghir Alfasly, Bo Wen, Jamaal Qasem, Mohammed Ahmed, Yunfeng Liu

    Abstract: Pre-trained Language Models (PLMs) are integral to many modern natural language processing (NLP) systems. Although multilingual models cover a wide range of languages, they often grapple with challenges like high inference costs and a lack of diverse non-English training data. Arabic-specific PLMs are trained predominantly on modern standard Arabic, which compromises their performance on regional… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by ArabicNLP 2024, presented in ACL 2024

  10. arXiv:2407.12939  [pdf, other

    cs.CV

    GenRC: Generative 3D Room Completion from Sparse Image Collections

    Authors: Ming-Feng Li, Yueh-Feng Ku, Hong-Xuan Yen, Chi Liu, Yu-Lun Liu, Albert Y. C. Chen, Cheng-Hao Kuo, Min Sun

    Abstract: Sparse RGBD scene completion is a challenging task especially when considering consistent textures and geometries throughout the entire scene. Different from existing solutions that rely on human-designed text prompts or predefined camera trajectories, we propose GenRC, an automated training-free pipeline to complete a room-scale 3D mesh with high-fidelity textures. To achieve this, we first proje… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  11. arXiv:2407.12550  [pdf, other

    cs.LG

    UniTE: A Survey and Unified Pipeline for Pre-training ST Trajectory Embeddings

    Authors: Yan Lin, Zeyu Zhou, Yicheng Liu, Haochen Lv, Haomin Wen, Tianyi Li, Yushuai Li, Christian S. Jensen, Shengnan Guo, Youfang Lin, Huaiyu Wan

    Abstract: Spatio-temporal (ST) trajectories are sequences of timestamped locations, which enable a variety of analyses that in turn enable important real-world applications. It is common to map trajectories to vectors, called embeddings, before subsequent analyses. Thus, the qualities of embeddings are very important. Methods for pre-training embeddings, which leverage unlabeled trajectories for training un… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  12. arXiv:2407.12511  [pdf, other

    cs.CV

    Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations

    Authors: Tomáš Chobola, Yu Liu, Hanyi Zhang, Julia A. Schnabel, Tingying Peng

    Abstract: Current deep learning-based low-light image enhancement methods often struggle with high-resolution images, and fail to meet the practical demands of visual perception across diverse and unseen scenarios. In this paper, we introduce a novel approach termed CoLIE, which redefines the enhancement process through mapping the 2D coordinates of an underexposed image to its illumination component, condi… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV24

  13. arXiv:2407.12371  [pdf, other

    cs.CV cs.AI

    HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects

    Authors: Xintao Lv, Liang Xu, Yichao Yan, Xin Jin, Congsheng Xu, Shuwen Wu, Yifan Liu, Lincheng Li, Mengxiao Bi, Wenjun Zeng, Xiaokang Yang

    Abstract: Generating human-object interactions (HOIs) is critical with the tremendous advances of digital avatars. Existing datasets are typically limited to humans interacting with a single object while neglecting the ubiquitous manipulation of multiple objects. Thus, we propose HIMO, a large-scale MoCap dataset of full-body human interacting with multiple objects, containing 3.3K 4D HOI sequences and 4.08… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Project page: https://lvxintao.github.io/himo, accepted by ECCV 2024

  14. arXiv:2407.12344  [pdf, other

    cs.CL cs.CY

    The Better Angels of Machine Personality: How Personality Relates to LLM Safety

    Authors: Jie Zhang, Dongrui Liu, Chen Qian, Ziyue Gan, Yong Liu, Yu Qiao, Jing Shao

    Abstract: Personality psychologists have analyzed the relationship between personality and safety behaviors in human society. Although Large Language Models (LLMs) demonstrate personality traits, the relationship between personality traits and safety abilities in LLMs still remains a mystery. In this paper, we discover that LLMs' personality traits are closely related to their safety abilities, i.e., toxici… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  15. arXiv:2407.12273  [pdf, other

    cs.CV

    GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity

    Authors: Shuo Cao, Yihao Liu, Wenlong Zhang, Yu Qiao, Chao Dong

    Abstract: Traditional single-task image restoration methods excel in handling specific degradation types but struggle with multiple degradations. To address this limitation, we propose Grouped Restoration with Image Degradation Similarity (GRIDS), a novel approach that harmonizes the competing objectives inherent in multiple-degradation restoration. We first introduce a quantitative method for assessing rel… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  16. arXiv:2407.12267  [pdf, other

    cs.CV cs.GR

    Generating 3D House Wireframes with Semantics

    Authors: Xueqi Ma, Yilin Liu, Wenjun Zhou, Ruowei Wang, Hui Huang

    Abstract: We present a new approach for generating 3D house wireframes with semantic enrichment using an autoregressive model. Unlike conventional generative models that independently process vertices, edges, and faces, our approach employs a unified wire-based representation for improved coherence in learning 3D wireframe structures. By re-ordering wire sequences based on semantic meanings, we facilitate s… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: European Conference on Computer Vision (Proceedings of ECCV 2024); Project page: https://vcc.tech/research/2024/3DWire; GitHub repository: https://github.com/3d-house-wireframe/3d-house-wireframe-dataset

  17. arXiv:2407.12223  [pdf, other

    cs.LG cs.AI

    Conditional Quantile Estimation for Uncertain Watch Time in Short-Video Recommendation

    Authors: Chengzhi Lin, Shuchang Liu, Chuyuan Wang, Yongqi Liu

    Abstract: Within the domain of short video recommendation, predicting users' watch time is a critical but challenging task. Prevailing deterministic solutions obtain accurate debiased statistical models, yet they neglect the intrinsic uncertainty inherent in user environments. In our observation, we found that this uncertainty could potentially limit these methods' accuracy in watch-time prediction on our o… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures, 4 tables

  18. arXiv:2407.12074  [pdf, other

    cs.LG cs.AI

    Enhancing Parameter Efficiency and Generalization in Large-Scale Models: A Regularized and Masked Low-Rank Adaptation Approach

    Authors: Yuzhu Mao, Siqi Ping, Zihao Zhao, Yang Liu, Wenbo Ding

    Abstract: Large pre-trained models, such as large language models (LLMs), present significant resource challenges for fine-tuning due to their extensive parameter sizes, especially for applications in mobile systems. To address this, Low-Rank Adaptation (LoRA) has been developed to reduce resource consumption while maintaining satisfactory fine-tuning results. Despite its effectiveness, the original LoRA me… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  19. arXiv:2407.12038  [pdf, ps, other

    eess.AS cs.AI

    ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024

    Authors: Ruibo Fu, Rui Liu, Chunyu Qiang, Yingming Gao, Yi Lu, Tao Wang, Ya Li, Zhengqi Wen, Chen Zhang, Hui Bu, Yukun Liu, Shuchen Shi, Xin Qi, Guanjun Li

    Abstract: The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and controlled detail content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective percept… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: ISCSLP 2024 Challenge description

  20. arXiv:2407.11963  [pdf, other

    cs.CL

    NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

    Authors: Mo Li, Songyang Zhang, Yunxin Liu, Kai Chen

    Abstract: In evaluating the long-context capabilities of large language models (LLMs), identifying content relevant to a user's query from original long documents is a crucial prerequisite for any LLM to answer questions based on long text. We present NeedleBench, a framework consisting of a series of progressively more challenging tasks for assessing bilingual long-context capabilities, spanning multiple l… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  21. arXiv:2407.11691  [pdf, other

    cs.CV

    VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

    Authors: Haodong Duan, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen

    Abstract: We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework for researchers and developers to evaluate existing multi-modality models and publish reproducible evaluation results. In VLMEvalKit, we implement over 70 different large multi-modality models, including both proprietary… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  22. arXiv:2407.11626  [pdf

    cs.LG cs.NE

    Dynamic Dimension Wrapping (DDW) Algorithm: A Novel Approach for Efficient Cross-Dimensional Search in Dynamic Multidimensional Spaces

    Authors: Dongnan Jin, Yali Liu, Qiuzhi Song, Xunju Ma, Yue Liu, Dehao Wu

    Abstract: In the real world, as the complexity of optimization problems continues to increase, there is an urgent need to research more efficient optimization methods. Current optimization algorithms excel in solving problems with a fixed number of dimensions. However, their efficiency in searching dynamic multi-dimensional spaces is unsatisfactory. In response to the challenge of cross-dimensional search i… ▽ More

    Submitted 18 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  23. arXiv:2407.11610  [pdf, other

    cs.CV

    MergeNet: Explicit Mesh Reconstruction from Sparse Point Clouds via Edge Prediction

    Authors: Weimin Wang, Yingxu Deng, Zezeng Li, Yu Liu, Na Lei

    Abstract: This paper introduces a novel method for reconstructing meshes from sparse point clouds by predicting edge connection. Existing implicit methods usually produce superior smooth and watertight meshes due to the isosurface extraction algorithms~(e.g., Marching Cubes). However, these methods become memory and computationally intensive with increasing resolution. Explicit methods are more efficient by… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  24. arXiv:2407.11585  [pdf, other

    cs.CV cs.AI

    QVD: Post-training Quantization for Video Diffusion Models

    Authors: Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie

    Abstract: Recently, video diffusion models (VDMs) have garnered significant attention due to their notable advancements in generating coherent and realistic video content. However, processing multiple frame features concurrently, coupled with the considerable model size, results in high latency and extensive memory consumption, hindering their broader application. Post-training quantization (PTQ) is an effe… ▽ More

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: accepted by ACMMM2024

  25. arXiv:2407.11505  [pdf, other

    cs.CV

    Haze-Aware Attention Network for Single-Image Dehazing

    Authors: Lihan Tong, Yun Liu, Weijia Li, Liyuan Chen, Erkang Chen

    Abstract: Single-image dehazing is a pivotal challenge in computer vision that seeks to remove haze from images and restore clean background details. Recognizing the limitations of traditional physical model-based methods and the inefficiencies of current attention-based solutions, we propose a new dehazing network combining an innovative Haze-Aware Attention Module (HAAM) with a Multiscale Frequency Enhanc… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 13 pages, 6 figures

    Report number: applsci-3022856 MSC Class: 68I1C; 68I8P ACM Class: I.4.3; I.4.9

  26. arXiv:2407.11497  [pdf, other

    cs.HC cs.GR

    "I Came Across a Junk": Understanding Design Flaws of Data Visualization from the Public's Perspective

    Authors: Xingyu Lan, Yu Liu

    Abstract: The visualization community has a rich history of reflecting upon flaws of visualization design, and research in this direction has remained lively until now. However, three main gaps still exist. First, most existing work characterizes design flaws from the perspective of researchers rather than the perspective of general users. Second, little work has been done to infer why these design flaws oc… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  27. arXiv:2407.11419  [pdf, other

    cs.CV

    TeethDreamer: 3D Teeth Reconstruction from Five Intra-oral Photographs

    Authors: Chenfan Xu, Zhentao Liu, Yuan Liu, Yulong Dou, Jiamin Wu, Jiepeng Wang, Minjiao Wang, Dinggang Shen, Zhiming Cui

    Abstract: Orthodontic treatment usually requires regular face-to-face examinations to monitor dental conditions of the patients. When in-person diagnosis is not feasible, an alternative is to utilize five intra-oral photographs for remote dental monitoring. However, it lacks of 3D information, and how to reconstruct 3D dental models from such sparse view photographs is a challenging problem. In this study,… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: MICCAI2024

  28. arXiv:2407.11079  [pdf, ps, other

    eess.SP cs.IT

    One-Bit MIMO Detection: From Global Maximum-Likelihood Detector to Amplitude Retrieval Approach

    Authors: Mingjie Shao, Wei-Kun Chen, Cheng-Yang Yu, Ya-Feng Liu, Wing-Kin Ma

    Abstract: As communication systems advance towards the future 6G era, the incorporation of large-scale antenna arrays in base stations (BSs) presents challenges such as increased hardware costs and energy consumption. To address these issues, the use of one-bit analog-to-digital converters (ADCs)/digital-to-analog converters (DACs) has gained significant attentions. This paper focuses on one-bit multiple-in… ▽ More

    Submitted 16 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

  29. arXiv:2407.11073  [pdf, other

    cs.CR cs.CV cs.LG

    SemiAdv: Query-Efficient Black-Box Adversarial Attack with Unlabeled Images

    Authors: Mingyuan Fan, Yang Liu, Cen Chen, Ximeng Liu

    Abstract: Adversarial attack has garnered considerable attention due to its profound implications for the secure deployment of robots in sensitive security scenarios. To potentially push for advances in the field, this paper studies the adversarial attack in the black-box setting and proposes an unlabeled data-driven adversarial attack method, called SemiAdv. Specifically, SemiAdv achieves the following bre… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  30. arXiv:2407.10918  [pdf, other

    cs.CV

    PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition

    Authors: Xiao Li, Yining Liu, Na Dong, Sitian Qin, Xiaolin Hu

    Abstract: Deep learning-based object recognition systems can be easily fooled by various adversarial perturbations. One reason for the weak robustness may be that they do not have part-based inductive bias like the human recognition process. Motivated by this, several part-based recognition models have been proposed to improve the adversarial robustness of recognition. However, due to the lack of part annot… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  31. arXiv:2407.10899  [pdf, other

    cs.CY cs.AI cs.CL

    Leveraging LLM-Respondents for Item Evaluation: a Psychometric Analysis

    Authors: Yunting Liu, Shreya Bhandari, Zachary A. Pardos

    Abstract: Effective educational measurement relies heavily on the curation of well-designed item pools (i.e., possessing the right psychometric properties). However, item calibration is time-consuming and costly, requiring a sufficient number of respondents for the response process. We explore using six different LLMs (GPT-3.5, GPT-4, Llama 2, Llama 3, Gemini-Pro, and Cohere Command R Plus) and various comb… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  32. arXiv:2407.10830  [pdf, other

    cs.DS

    Almost-Linear Time Algorithms for Decremental Graphs: Min-Cost Flow and More via Duality

    Authors: Jan van den Brand, Li Chen, Rasmus Kyng, Yang P. Liu, Simon Meierhans, Maximilian Probst Gutenberg, Sushant Sachdeva

    Abstract: We give the first almost-linear total time algorithm for deciding if a flow of cost at most $F$ still exists in a directed graph, with edge costs and capacities, undergoing decremental updates, i.e., edge deletions, capacity decreases, and cost increases. This implies almost-linear time algorithms for approximating the minimum-cost flow value and $s$-$t$ distance on such decremental graphs. Our fr… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 61 pages, Accepted to FOCS 2024

  33. arXiv:2407.10671  [pdf, other

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin , et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a… ▽ More

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  34. arXiv:2407.10482  [pdf, other

    cs.CV

    NGP-RT: Fusing Multi-Level Hash Features with Lightweight Attention for Real-Time Novel View Synthesis

    Authors: Yubin Hu, Xiaoyang Guo, Yang Xiao, Jingwei Huang, Yong-Jin Liu

    Abstract: This paper presents NGP-RT, a novel approach for enhancing the rendering speed of Instant-NGP to achieve real-time novel view synthesis. As a classic NeRF-based method, Instant-NGP stores implicit features in multi-level grids or hash tables and applies a shallow MLP to convert the implicit features into explicit colors and densities. Although it achieves fast training speed, there is still a lot… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  35. arXiv:2407.10439  [pdf, other

    cs.CV

    PolyRoom: Room-aware Transformer for Floorplan Reconstruction

    Authors: Yuzhou Liu, Lingjie Zhu, Xiaodong Ma, Hanqiao Ye, Xiang Gao, Xianwei Zheng, Shuhan Shen

    Abstract: Reconstructing geometry and topology structures from raw unstructured data has always been an important research topic in indoor mapping research. In this paper, we aim to reconstruct the floorplan with a vectorized representation from point clouds. Despite significant advancements achieved in recent years, current methods still encounter several challenges, such as missing corners or edges, inacc… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  36. arXiv:2407.10209  [pdf, other

    cs.CV

    Vector Field Attention for Deformable Image Registration

    Authors: Yihao Liu, Junyu Chen, Lianrui Zuo, Aaron Carass, Jerry L. Prince

    Abstract: Deformable image registration establishes non-linear spatial correspondences between fixed and moving images. Deep learning-based deformable registration methods have been widely studied in recent years due to their speed advantage over traditional algorithms as well as their better accuracy. Most existing deep learning-based methods require neural networks to encode location information in their… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  37. arXiv:2407.10167  [pdf, other

    cs.CL cs.AI

    Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model

    Authors: Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang

    Abstract: Large Language Models (LLMs) have demonstrated exceptional proficiency in mathematical reasoning tasks due to their extensive parameter counts and training on vast datasets. Despite these capabilities, deploying LLMs is hindered by their computational demands. Distilling LLM mathematical reasoning into Smaller Language Models (SLMs) has emerged as a solution to this challenge, although these small… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.11864

  38. arXiv:2407.10124  [pdf, other

    cs.RO

    Adaptive Model Predictive Control with Data-driven Error Model for Quadrupedal Locomotion

    Authors: Xuanqi Zeng, Hongbo Zhang, Linzhu Yue, Zhitao Song, Linwei Zhang, Yun-Hui Liu

    Abstract: Model Predictive Control (MPC) relies heavily on the robot model for its control law. However, a gap always exists between the reduced-order control model with uncertainties and the real robot, which degrades its performance. To address this issue, we propose the controller of integrating a data-driven error model into traditional MPC for quadruped robots. Our approach leverages real-world data fr… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 7 Pages, 7 figures, conference(ICRA 2024)

  39. arXiv:2407.10106  [pdf, other

    cs.SE

    DistillSeq: A Framework for Safety Alignment Testing in Large Language Models using Knowledge Distillation

    Authors: Mingke Yang, Yuqi Chen, Yi Liu, Ling Shi

    Abstract: Large Language Models (LLMs) have showcased their remarkable capabilities in diverse domains, encompassing natural language understanding, translation, and even code generation. The potential for LLMs to generate harmful content is a significant concern. This risk necessitates rigorous testing and comprehensive evaluation of LLMs to ensure safe and responsible use. However, extensive testing of LL… ▽ More

    Submitted 17 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

  40. arXiv:2407.10099  [pdf, other

    cs.CV

    STGFormer: Spatio-Temporal GraphFormer for 3D Human Pose Estimation in Video

    Authors: Yang Liu, Zhiyong Zhang

    Abstract: The current methods of video-based 3D human pose estimation have achieved significant progress; however, they continue to confront the significant challenge of depth ambiguity. To address this limitation, this paper presents the spatio-temporal GraphFormer framework for 3D human pose estimation in video, which integrates body structure graph-based representations with spatio-temporal information.… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  41. arXiv:2407.10091  [pdf, other

    cs.CL

    Enhancing Emotion Prediction in News Headlines: Insights from ChatGPT and Seq2Seq Models for Free-Text Generation

    Authors: Ge Gao, Jongin Kim, Sejin Paik, Ekaterina Novozhilova, Yi Liu, Sarah T. Bonna, Margrit Betke, Derry Tanti Wijaya

    Abstract: Predicting emotions elicited by news headlines can be challenging as the task is largely influenced by the varying nature of people's interpretations and backgrounds. Previous works have explored classifying discrete emotions directly from news headlines. We provide a different approach to tackling this problem by utilizing people's explanations of their emotion, written in free-text, on how they… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: published at LREC-COLING 2024

    ACM Class: I.2.7

    Journal ref: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) 5944-5955

  42. arXiv:2407.10081  [pdf, other

    cs.IR

    All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era

    Authors: Bo Chen, Xinyi Dai, Huifeng Guo, Wei Guo, Weiwen Liu, Yong Liu, Jiarui Qin, Ruiming Tang, Yichao Wang, Chuhan Wu, Yaxiong Wu, Hao Zhang

    Abstract: Recommender systems (RS) are vital for managing information overload and delivering personalized content, responding to users' diverse information needs. The emergence of large language models (LLMs) offers a new horizon for redefining recommender systems with vast general knowledge and reasoning capabilities. Standing across this LLM era, we aim to integrate recommender systems into a broader pic… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  43. arXiv:2407.10039  [pdf, other

    cs.SE cs.CR cs.PL

    OpenTracer: A Dynamic Transaction Trace Analyzer for Smart Contract Invariant Generation and Beyond

    Authors: Zhiyang Chen, Ye Liu, Sidi Mohamed Beillahi, Yi Li, Fan Long

    Abstract: Smart contracts, self-executing programs on the blockchain, facilitate reliable value exchanges without centralized oversight. Despite the recent focus on dynamic analysis of their transaction histories in both industry and academia, no open-source tool currently offers comprehensive tracking of complete transaction information to extract user-desired data such as invariant-related data. This pape… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  44. arXiv:2407.09417  [pdf, other

    cs.CL cs.IR

    Mitigating Entity-Level Hallucination in Large Language Models

    Authors: Weihang Su, Yichen Tang, Qingyao Ai, Changyue Wang, Zhijing Wu, Yiqun Liu

    Abstract: The emergence of Large Language Models (LLMs) has revolutionized how users access information, shifting from traditional search engines to direct question-and-answer interactions with LLMs. However, the widespread adoption of LLMs has revealed a significant challenge known as hallucination, wherein LLMs generate coherent yet factually inaccurate responses. This hallucination phenomenon has led to… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  45. arXiv:2407.09271  [pdf, other

    cs.CV cs.LG

    iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning

    Authors: Tom Fischer, Yaoyao Liu, Artur Jesslen, Noor Ahmed, Prakhar Kaushik, Angtian Wang, Alan Yuille, Adam Kortylewski, Eddy Ilg

    Abstract: Different from human nature, it is still common practice today for vision tasks to train deep learning models only initially and on fixed datasets. A variety of approaches have recently addressed handling continual data streams. However, extending these methods to manage out-of-distribution (OOD) scenarios has not effectively been investigated. On the other hand, it has recently been shown that no… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  46. arXiv:2407.09047  [pdf, other

    cs.CV

    Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation

    Authors: Wei Cong, Yang Cong, Yuyang Liu, Gan Sun

    Abstract: Incremental semantic segmentation endeavors to segment newly encountered classes while maintaining knowledge of old classes. However, existing methods either 1) lack guidance from class-specific knowledge (i.e., old class prototypes), leading to a bias towards new classes, or 2) constrain class-shared knowledge (i.e., old model weights) excessively without discrimination, resulting in a preference… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  47. arXiv:2407.08975  [pdf, other

    cs.AR cs.ET

    Hybrid Temporal Computing for Lower Power Hardware Accelerators

    Authors: Maliha Tasnim, Sachin Sachdeva, Yibo Liu, Sheldon X. -D. Tan

    Abstract: In this paper, we propose a new hybrid temporal computing (HTC) framework that leverages both pulse rate and temporal data encoding to design ultra-low energy hardware accelerators. Our approach is inspired by the recently proposed temporal computing, or race logic, which encodes data values as single delays, leading to significantly lower energy consumption due to minimized signal switching. Howe… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 7 pages, 8 figures and 3 tables

  48. arXiv:2407.08967  [pdf, other

    cs.CL cs.AI

    Empowering Few-Shot Relation Extraction with The Integration of Traditional RE Methods and Large Language Models

    Authors: Ye Liu, Kai Zhang, Aoran Gan, Linan Yue, Feng Hu, Qi Liu, Enhong Chen

    Abstract: Few-Shot Relation Extraction (FSRE), a subtask of Relation Extraction (RE) that utilizes limited training instances, appeals to more researchers in Natural Language Processing (NLP) due to its capability to extract textual information in extremely low-resource scenarios. The primary methodologies employed for FSRE have been fine-tuning or prompt tuning techniques based on Pre-trained Language Mode… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  49. arXiv:2407.08952  [pdf, other

    cs.CL cs.AI

    Detect, Investigate, Judge and Determine: A Novel LLM-based Framework for Few-shot Fake News Detection

    Authors: Ye Liu, Jiajun Zhu, Kai Zhang, Haoyu Tang, Yanghai Zhang, Xukai Liu, Qi Liu, Enhong Chen

    Abstract: Few-Shot Fake News Detection (FS-FND) aims to distinguish inaccurate news from real ones in extremely low-resource scenarios. This task has garnered increased attention due to the widespread dissemination and harmful impact of fake news on social media. Large Language Models (LLMs) have demonstrated competitive performance with the help of their rich prior knowledge and excellent in-context learni… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  50. arXiv:2407.08914  [pdf, other

    cs.NI eess.SP

    Multi-objective Aerial Collaborative Secure Communication Optimization via Generative Diffusion Model-enabled Deep Reinforcement Learning

    Authors: Chuang Zhang, Geng Sun, Jiahui Li, Qingqing Wu, Jiacheng Wang, Dusit Niyato, Yuanwei Liu

    Abstract: Due to flexibility and low-cost, unmanned aerial vehicles (UAVs) are increasingly crucial for enhancing coverage and functionality of wireless networks. However, incorporating UAVs into next-generation wireless communication systems poses significant challenges, particularly in sustaining high-rate and long-range secure communications against eavesdropping attacks. In this work, we consider a UAV… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE Transactions on Mobile Computing