Skip to main content

Showing 1–50 of 2,619 results for author: Yang, J

  1. arXiv:2407.13499  [pdf, other

    cs.CR

    Three-State Information Hiding: Provably Secure Asymmetric Steganography

    Authors: Minhao Bai, Jinshuai Yang, Kaiyi Pang, Xu Xin, Yongfeng Huang

    Abstract: The rise of language models has provided a fertile ground for the application of steganography. Due to their qualified output, steganographic texts become similar to human and have attracted most of the steganography researchers' attention. However, running a language model requires a strong computation platform. It limits the applicable scenario of steganography, since those electronic devices co… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.12863  [pdf, other

    cs.CL cs.AI

    Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models

    Authors: Jung Hyun Lee, June Yong Yang, Byeongho Heo, Dongyoon Han, Kang Min Yoo

    Abstract: Large Language Models (LLMs) have demonstrated impressive problem-solving capabilities in mathematics through step-by-step reasoning chains. However, they are susceptible to reasoning errors that impact the quality of subsequent reasoning chains and the final answer due to language models' autoregressive token-by-token generating nature. Recent works have proposed adopting external verifiers to gu… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  3. arXiv:2407.12772  [pdf, other

    cs.CL cs.CV

    LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

    Authors: Kaichen Zhang, Bo Li, Peiyuan Zhang, Fanyi Pu, Joshua Adrian Cahyono, Kairui Hu, Shuai Liu, Yuanhan Zhang, Jingkang Yang, Chunyuan Li, Ziwei Liu

    Abstract: The advances of large foundation models necessitate wide-coverage, low-cost, and zero-contamination benchmarks. Despite continuous exploration of language model evaluations, comprehensive studies on the evaluation of Large Multi-modal Models (LMMs) remain limited. In this work, we introduce LMMS-EVAL, a unified and standardized multimodal benchmark framework with over 50 tasks and more than 10 mod… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Code ad leaderboard are available at https://github.com/EvolvingLMMs-Lab/lmms-eval and https://huggingface.co/spaces/lmms-lab/LiveBench

  4. arXiv:2407.12604  [pdf, ps, other

    cs.IT cs.DS cs.SI

    Exact Graph Matching in Correlated Gaussian-Attributed Erdős-Rényi Model

    Authors: Joonhyuk Yang, Hye Won Chung

    Abstract: Graph matching problem aims to identify node correspondence between two or more correlated graphs. Previous studies have primarily focused on models where only edge information is provided. However, in many social networks, not only the relationships between users, represented by edges, but also their personal information, represented by features, are present. In this paper, we address the challen… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: IEEE International Symposium on Information Theory (ISIT) 2024

  5. arXiv:2407.12529  [pdf, other

    cs.CL

    Crafting the Path: Robust Query Rewriting for Information Retrieval

    Authors: Ingeol Baek, Jimin Lee, Joonho Yang, Hwanhee Lee

    Abstract: Query rewriting aims to generate a new query that can complement the original query to improve the information retrieval system. Recent studies on query rewriting, such as query2doc (Q2D), query2expand (Q2E) and querey2cot (Q2C), rely on the internal knowledge of Large Language Models (LLMs) to generate a relevant passage to add information to the query. Nevertheless, the efficacy of these methodo… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 1 figure, 12 tables

  6. arXiv:2407.12435  [pdf, other

    cs.CV

    F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

    Authors: Jie Yang, Xuesong Niu, Nan Jiang, Ruimao Zhang, Siyuan Huang

    Abstract: Existing 3D human object interaction (HOI) datasets and models simply align global descriptions with the long HOI sequence, while lacking a detailed understanding of intermediate states and the transitions between states. In this paper, we argue that fine-grained semantic alignment, which utilizes state-level descriptions, offers a promising paradigm for learning semantically rich HOI representati… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV24

  7. arXiv:2407.11691  [pdf, other

    cs.CV

    VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

    Authors: Haodong Duan, Junming Yang, Yuxuan Qiao, Xinyu Fang, Lin Chen, Yuan Liu, Xiaoyi Dong, Yuhang Zang, Pan Zhang, Jiaqi Wang, Dahua Lin, Kai Chen

    Abstract: We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework for researchers and developers to evaluate existing multi-modality models and publish reproducible evaluation results. In VLMEvalKit, we implement over 70 different large multi-modality models, including both proprietary… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  8. arXiv:2407.11534  [pdf, other

    cs.LG cs.AI

    LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices

    Authors: Jung Hyun Lee, Jeonghoon Kim, June Yong Yang, Se Jung Kwon, Eunho Yang, Kang Min Yoo, Dongsoo Lee

    Abstract: With the commercialization of large language models (LLMs), weight-activation quantization has emerged to compress and accelerate LLMs, achieving high throughput while reducing inference costs. However, existing post-training quantization (PTQ) techniques for quantizing weights and activations of LLMs still suffer from non-negligible accuracy drops, especially on massive multitask language underst… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Preprint

  9. arXiv:2407.11027  [pdf, other

    cs.LG cs.AI

    A robust three-way classifier with shadowed granular-balls based on justifiable granularity

    Authors: Jie Yang, Lingyun Xiaodiao, Guoyin Wang, Witold Pedrycz, Shuyin Xia, Qinghua Zhang, Di Wu

    Abstract: The granular-ball (GB)-based classifier introduced by Xia, exhibits adaptability in creating coarse-grained information granules for input, thereby enhancing its generality and flexibility. Nevertheless, the current GB-based classifiers rigidly assign a specific class label to each data instance and lacks of the necessary strategies to address uncertain instances. These far-fetched certain classif… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  10. arXiv:2407.11006  [pdf, other

    cs.CL cs.AI cs.LG cs.PF

    How Good Is It? Evaluating the Efficacy of Common versus Domain-Specific Prompts on Foundational Large Language Models

    Authors: Oluyemi Enoch Amujo, Shanchieh Jay Yang

    Abstract: Recently, large language models (LLMs) have expanded into various domains. However, there remains a need to evaluate how these models perform when prompted with commonplace queries compared to domain-specific queries, which may be useful for benchmarking prior to fine-tuning domain-specific downstream tasks. This study evaluates LLMs, specifically Gemma-2B and Gemma-7B, across diverse domains, inc… ▽ More

    Submitted 25 June, 2024; originally announced July 2024.

    Comments: 10 pages, 5 figures, 2 tables, and algorithms

  11. arXiv:2407.11003  [pdf, other

    cs.CL cs.AI cs.LG

    Using Large Language Models in Public Transit Systems, San Antonio as a case study

    Authors: Ramya Jonnala, Gongbo Liang, Jeong Yang, Izzat Alsmadi

    Abstract: The integration of large language models into public transit systems represents a significant advancement in urban transportation management and passenger experience. This study examines the impact of LLMs within San Antonio's public transit system, leveraging their capabilities in natural language processing, data analysis, and real time communication. By utilizing GTFS and other public transport… ▽ More

    Submitted 25 June, 2024; originally announced July 2024.

  12. arXiv:2407.10784  [pdf, other

    cs.LG cs.AI stat.ML

    AdapTable: Test-Time Adaptation for Tabular Data via Shift-Aware Uncertainty Calibrator and Label Distribution Handler

    Authors: Changhun Kim, Taewon Kim, Seungyeon Woo, June Yong Yang, Eunho Yang

    Abstract: In real-world applications, tabular data often suffer from distribution shifts due to their widespread and abundant nature, leading to erroneous predictions of pre-trained machine learning models. However, addressing such distribution shifts in the tabular domain has been relatively underexplored due to unique challenges such as varying attributes and dataset sizes, as well as the limited represen… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

  13. arXiv:2407.10671  [pdf, other

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin , et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a… ▽ More

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  14. arXiv:2407.09032  [pdf, other

    math.NA cs.LG

    DRM Revisited: A Complete Error Analysis

    Authors: Yuling Jiao, Ruoxuan Li, Peiying Wu, Jerry Zhijian Yang, Pingwen Zhang

    Abstract: In this work, we address a foundational question in the theoretical analysis of the Deep Ritz Method (DRM) under the over-parameteriztion regime: Given a target precision level, how can one determine the appropriate number of training samples, the key architectural parameters of the neural networks, the step size for the projected gradient descent optimization procedure, and the requisite number o… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  15. arXiv:2407.08555  [pdf, other

    eess.IV cs.CV

    SLoRD: Structural Low-Rank Descriptors for Shape Consistency in Vertebrae Segmentation

    Authors: Xin You, Yixin Lou, Minghui Zhang, Chuyan Zhang, Jie Yang, Yun Gu

    Abstract: Automatic and precise segmentation of vertebrae from CT images is crucial for various clinical applications. However, due to a lack of explicit and strict constraints, existing methods especially for single-stage methods, still suffer from the challenge of intra-vertebrae segmentation inconsistency, which refers to multiple label predictions inside a singular vertebra. For multi-stage methods, ver… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Under review

  16. arXiv:2407.08243  [pdf, other

    cs.CV

    Generalized Face Anti-spoofing via Finer Domain Partition and Disentangling Liveness-irrelevant Factors

    Authors: Jingyi Yang, Zitong Yu, Xiuming Ni, Jia He, Hui Li

    Abstract: Face anti-spoofing techniques based on domain generalization have recently been studied widely. Adversarial learning and meta-learning techniques have been adopted to learn domain-invariant representations. However, prior approaches often consider the dataset gap as the primary factor behind domain shifts. This perspective is not fine-grained enough to reflect the intrinsic gap among the data accu… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ECAI 2024

  17. arXiv:2407.07921  [pdf, other

    cs.CR cs.AI cs.LG eess.SP

    A Trustworthy AIoT-enabled Localization System via Federated Learning and Blockchain

    Authors: Junfei Wang, He Huang, Jingze Feng, Steven Wong, Lihua Xie, Jianfei Yang

    Abstract: There is a significant demand for indoor localization technology in smart buildings, and the most promising solution in this field is using RF sensors and fingerprinting-based methods that employ machine learning models trained on crowd-sourced user data gathered from IoT devices. However, this raises security and privacy issues in practice. Some researchers propose to use federated learning to pa… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  18. arXiv:2407.06938  [pdf, other

    cs.CV

    RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models

    Authors: Bowen Zhang, Yiji Cheng, Chunyu Wang, Ting Zhang, Jiaolong Yang, Yansong Tang, Feng Zhao, Dong Chen, Baining Guo

    Abstract: We present RodinHD, which can generate high-fidelity 3D avatars from a portrait image. Existing methods fail to capture intricate details such as hairstyles which we tackle in this paper. We first identify an overlooked problem of catastrophic forgetting that arises when fitting triplanes sequentially on many avatars, caused by the MLP decoder sharing scheme. To overcome this issue, we raise a nov… ▽ More

    Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; project page: https://rodinhd.github.io/

  19. arXiv:2407.06772  [pdf, other

    cs.IT eess.SP

    Revealing the evanescent components in Kronecker-product based codebooks: insights and implications

    Authors: Jun Yang, Yijian Chen, Yunqi Sun, Yuan Si, Hongkang Yu, Shujuan Zhang, Zhaohua Lu

    Abstract: The orthogonal bases of discrete Fourier transform (DFT) has been recognized as the standard spatial-domain bases for Type I, Type II and enhanced Type II codewords by the 3rd Generation Partnership Project (3GPP). For uniform planar arrays, these spatial-domain bases are derived as the Kronecker product of one-dimensional DFT bases. Theoretically, each spatial basis corresponds to a beam directed… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 11 pages, 9 figures

  20. arXiv:2407.06192  [pdf, other

    cs.CV cs.AI cs.CL

    Multi-Object Hallucination in Vision-Language Models

    Authors: Xuweiyi Chen, Ziqiao Ma, Xuejun Zhang, Sihan Xu, Shengyi Qian, Jianing Yang, David F. Fouhey, Joyce Chai

    Abstract: Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images. While current benchmarks for object hallucination primarily concentrate on the presence of a single object class rather than individual entities, this work systematically investigates multi-object hallucination, examining how models misperceive (e.g., invent nonexistent o… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to ALVR @ ACL 2024 | Project page: https://multi-object-hallucination.github.io/

  21. arXiv:2407.06100  [pdf, other

    physics.ao-ph cs.LG

    Leveraging data-driven weather models for improving numerical weather prediction skill through large-scale spectral nudging

    Authors: Syed Zahid Husain, Leo Separovic, Jean-François Caron, Rabah Aider, Mark Buehner, Stéphane Chamberland, Ervig Lapalme, Ron McTaggart-Cowan, Christopher Subich, Paul Vaillancourt, Jing Yang, Ayrton Zadra

    Abstract: Operational meteorological forecasting has long relied on physics-based numerical weather prediction (NWP) models. Recently, this landscape has been disrupted by the advent of data-driven artificial intelligence (AI)-based weather models, which offer tremendous computational performance and competitive forecasting skill. However, data-driven models for medium-range forecasting generally suffer fro… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  22. arXiv:2407.05712  [pdf, other

    cs.CV cs.AI

    MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices

    Authors: Jianwen Jiang, Gaojie Lin, Zhengkun Rong, Chao Liang, Yongming Zhu, Jiaqi Yang, Tianyun Zhong

    Abstract: Existing neural head avatars methods have achieved significant progress in the image quality and motion range of portrait animation. However, these methods neglect the computational overhead, and to the best of our knowledge, none is designed to run on mobile devices. This paper presents MobilePortrait, a lightweight one-shot neural head avatars method that reduces learning complexity by integrati… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  23. arXiv:2407.05505  [pdf, other

    eess.IV cs.CV

    Dynamic Position Transformation and Boundary Refinement Network for Left Atrial Segmentation

    Authors: Fangqiang Xu, Wenxuan Tu, Fan Feng, Malitha Gunawardhana, Jiayuan Yang, Yun Gu, Jichao Zhao

    Abstract: Left atrial (LA) segmentation is a crucial technique for irregular heartbeat (i.e., atrial fibrillation) diagnosis. Most current methods for LA segmentation strictly assume that the input data is acquired using object-oriented center cropping, while this assumption may not always hold in practice due to the high cost of manual object annotation. Random cropping is a straightforward data pre-proces… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024 conference

  24. arXiv:2407.05248  [pdf, other

    cs.CV

    Self-Paced Sample Selection for Barely-Supervised Medical Image Segmentation

    Authors: Junming Su, Zhiqiang Shen, Peng Cao, Jinzhu Yang, Osmar R. Zaiane

    Abstract: The existing barely-supervised medical image segmentation (BSS) methods, adopting a registration-segmentation paradigm, aim to learn from data with very few annotations to mitigate the extreme label scarcity problem. However, this paradigm poses a challenge: pseudo-labels generated by image registration come with significant noise. To address this issue, we propose a self-paced sample selection fr… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  25. arXiv:2407.05176  [pdf, ps, other

    cs.CY cs.AI cs.LG

    Towards Socially and Environmentally Responsible AI

    Authors: Pengfei Li, Yejia Liu, Jianyi Yang, Shaolei Ren

    Abstract: The sharply increasing sizes of artificial intelligence (AI) models come with significant energy consumption and environmental footprints, which can disproportionately impact certain (often marginalized) regions and hence create environmental inequity concerns. Moreover, concerns with social inequity have also emerged, as AI computing resources may not be equitably distributed across the globe and… ▽ More

    Submitted 22 April, 2024; originally announced July 2024.

    Comments: Presented at HotEthics 2024 (co-located with ASPLOS' 24)

  26. arXiv:2407.04381  [pdf, other

    cs.CV cs.AI

    Multi-Branch Auxiliary Fusion YOLO with Re-parameterization Heterogeneous Convolutional for accurate object detection

    Authors: Zhiqiang Yang, Qiu Guan, Keer Zhao, Jianmin Yang, Xinli Xu, Haixia Long, Ying Tang

    Abstract: Due to the effective performance of multi-scale feature fusion, Path Aggregation FPN (PAFPN) is widely employed in YOLO detectors. However, it cannot efficiently and adaptively integrate high-level semantic information with low-level spatial information simultaneously. We propose a new model named MAF-YOLO in this paper, which is a novel object detection framework with a versatile neck named Multi… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  27. arXiv:2407.03900  [pdf, other

    cs.CV

    Oracle Bone Inscriptions Multi-modal Dataset

    Authors: Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, Pian Wu, Chaoyang Wang, Yun Qi, Taisong Jin, Chengjie Wang, Xiaoming Huang, Zhan Shu, Rongrong Ji, Yongge Liu, Yunsheng Wu

    Abstract: Oracle bone inscriptions(OBI) is the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography. However, the task of deciphering OBI, in the current climate of the scholarship, can prove extremely challenging. Out of the 4,500 oracle bone characters excavated, only a third have been successfully identified. Therefore, leveraging… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  28. arXiv:2407.03571  [pdf, other

    math.OC cs.LG stat.ML

    A Fully Parameter-Free Second-Order Algorithm for Convex-Concave Minimax Problems with Optimal Iteration Complexity

    Authors: Junlin Wang, Junnan Yang, Zi Xu

    Abstract: In this paper, we study second-order algorithms for the convex-concave minimax problem, which has attracted much attention in many fields such as machine learning in recent years. We propose a Lipschitz-free cubic regularization (LF-CR) algorithm for solving the convex-concave minimax optimization problem without knowing the Lipschitz constant. It can be shown that the iteration complexity of the… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    MSC Class: 90C47; 90C26; 90C30

  29. arXiv:2407.03040  [pdf, other

    cs.CL cs.AI

    Raw Text is All you Need: Knowledge-intensive Multi-turn Instruction Tuning for Large Language Model

    Authors: Xia Hou, Qifeng Li, Jian Yang, Tongliang Li, Linzheng Chai, Xianjie Wu, Hangyuan Ji, Zhoujun Li, Jixuan Nie, Jingbo Dun, Wenfeng Song

    Abstract: Instruction tuning as an effective technique aligns the outputs of large language models (LLMs) with human preference. But how to generate the seasonal multi-turn dialogues from raw documents for instruction tuning still requires further exploration. In this paper, we present a novel framework named R2S that leverages the CoD-Chain of Dialogue logic to guide large language models (LLMs) in generat… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

    MSC Class: 68T50 ACM Class: I.2.7

  30. Efficient IoT Devices Localization Through Wi-Fi CSI Feature Fusion and Anomaly Detection

    Authors: Yan Li, Jie Yang, Shang-Ling Shih, Wan-Ting Shih, Chao-Kai Wen, Shi Jin

    Abstract: Internet of Things (IoT) device localization is fundamental to smart home functionalities, including indoor navigation and tracking of individuals. Traditional localization relies on relative methods utilizing the positions of anchors within a home environment, yet struggles with precision due to inherent inaccuracies in these anchor positions. In response, we introduce a cutting-edge smartphone-b… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted in IEEE Internet of Things Journal, Early Access, 2024

    Journal ref: IEEE Internet of Things Journal, Early Access, 2024

  31. arXiv:2407.02855  [pdf, other

    cs.CR cs.CL cs.LG

    Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks

    Authors: Zhexin Zhang, Junxiao Yang, Pei Ke, Shiyao Cui, Chujie Zheng, Hongning Wang, Minlie Huang

    Abstract: LLMs are known to be vulnerable to jailbreak attacks, even after safety alignment. An important observation is that, while different types of jailbreak attacks can generate significantly different queries, they mostly result in similar responses that are rooted in the same harmful knowledge (e.g., detailed steps to make a bomb). Therefore, we conjecture that directly unlearn the harmful knowledge… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 15 pages

  32. arXiv:2407.02689  [pdf, ps, other

    cs.LG cs.DC math.OC stat.ML

    Accelerating Distributed Optimization: A Primal-Dual Perspective on Local Steps

    Authors: Junchi Yang, Murat Yildirim, Qiu Feng

    Abstract: In distributed machine learning, efficient training across multiple agents with different data distributions poses significant challenges. Even with a centralized coordinator, current algorithms that achieve optimal communication complexity typically require either large minibatches or compromise on gradient complexity. In this work, we tackle both centralized and decentralized settings across str… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  33. arXiv:2407.02680  [pdf, other

    cs.SE

    KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution

    Authors: Alex Mathai, Chenxi Huang, Petros Maniatis, Aleksandr Nogikh, Franjo Ivancic, Junfeng Yang, Baishakhi Ray

    Abstract: Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks. In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel. Unlike application-level software, a systems codebase like Linux is multilingual (low-level C/Assembly/Bash/Rust); gigantic (>20 million lines); critical (impac… ▽ More

    Submitted 8 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  34. arXiv:2407.02371  [pdf, other

    cs.CV

    OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

    Authors: Kepan Nan, Rui Xie, Penghao Zhou, Tiehan Fan, Zhenheng Yang, Zhijie Chen, Xiang Li, Jian Yang, Ying Tai

    Abstract: Text-to-video (T2V) generation has recently garnered significant attention thanks to the large multi-modality model Sora. However, T2V generation still faces two important challenges: 1) Lacking a precise open sourced high-quality dataset. The previous popular video datasets, e.g. WebVid-10M and Panda-70M, are either with low quality or too large for most research institutions. Therefore, it is ch… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 15 pages, 9 figures

  35. arXiv:2407.02353  [pdf, other

    eess.SP cs.AR eess.SY

    Roadmap to Neuromorphic Computing with Emerging Technologies

    Authors: Adnan Mehonic, Daniele Ielmini, Kaushik Roy, Onur Mutlu, Shahar Kvatinsky, Teresa Serrano-Gotarredona, Bernabe Linares-Barranco, Sabina Spiga, Sergey Savelev, Alexander G Balanov, Nitin Chawla, Giuseppe Desoli, Gerardo Malavena, Christian Monzio Compagnoni, Zhongrui Wang, J Joshua Yang, Ghazi Sarwat Syed, Abu Sebastian, Thomas Mikolajick, Beatriz Noheda, Stefan Slesazeck, Bernard Dieny, Tuo-Hung, Hou, Akhil Varri , et al. (28 additional authors not shown)

    Abstract: The roadmap is organized into several thematic sections, outlining current computing challenges, discussing the neuromorphic computing approach, analyzing mature and currently utilized technologies, providing an overview of emerging technologies, addressing material challenges, exploring novel computing concepts, and finally examining the maturity level of emerging technologies while determining t… ▽ More

    Submitted 5 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: 90 pages, 22 figures, roadmap, neuromorphic

  36. arXiv:2407.01960  [pdf, other

    cs.CV cs.LG

    Zero-shot Video Restoration and Enhancement Using Pre-Trained Image Diffusion Model

    Authors: Cong Cao, Huanjing Yue, Xin Liu, Jingyu Yang

    Abstract: Diffusion-based zero-shot image restoration and enhancement models have achieved great success in various image restoration and enhancement tasks without training. However, directly applying them to video restoration and enhancement results in severe temporal flickering artifacts. In this paper, we propose the first framework for zero-shot video restoration and enhancement based on a pre-trained i… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 19 pages

  37. arXiv:2407.01945  [pdf, other

    cs.CV

    Indoor 3D Reconstruction with an Unknown Camera-Projector Pair

    Authors: Zhaoshuai Qi, Yifeng Hao, Rui Hu, Wenyou Chang, Jiaqi Yang, Yanning Zhang

    Abstract: Structured light-based method with a camera-projector pair (CPP) plays a vital role in indoor 3D reconstruction, especially for scenes with weak textures. Previous methods usually assume known intrinsics, which are pre-calibrated from known objects, or self-calibrated from multi-view observations. It is still challenging to reliably recover CPP intrinsics from only two views without any known obje… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  38. arXiv:2407.01479  [pdf, other

    cs.RO cs.LG

    EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning

    Authors: Jingyun Yang, Zi-ang Cao, Congyue Deng, Rika Antonova, Shuran Song, Jeannette Bohg

    Abstract: Building effective imitation learning methods that enable robots to learn from limited data and still generalize across diverse real-world environments is a long-standing problem in robot learning. We propose EquiBot, a robust, data-efficient, and generalizable approach for robot manipulation task learning. Our approach combines SIM(3)-equivariant neural network architectures with diffusion models… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally

  39. arXiv:2407.01262  [pdf, other

    cs.LG

    Complementary Fusion of Deep Network and Tree Model for ETA Prediction

    Authors: YuRui Huang, Jie Zhang, HengDa Bao, Yang Yang, Jian Yang

    Abstract: Estimated time of arrival (ETA) is a very important factor in the transportation system. It has attracted increasing attentions and has been widely used as a basic service in navigation systems and intelligent transportation systems. In this paper, we propose a novel solution to the ETA estimation problem, which is an ensemble on tree models and neural networks. We proved the accuracy and robustne… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  40. arXiv:2407.00696  [pdf, other

    cs.LG

    Graph in Graph Neural Network

    Authors: Jiongshu Wang, Jing Yang, Jiankang Deng, Hatice Gunes, Siyang Song

    Abstract: Existing Graph Neural Networks (GNNs) are limited to process graphs each of whose vertices is represented by a vector or a single value, limited their representing capability to describe complex objects. In this paper, we propose the first GNN (called Graph in Graph Neural (GIG) Network) which can process graph-style data (called GIG sample) whose vertices are further represented by graphs. Given… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    MSC Class: 68T05

  41. UWBAD: Towards Effective and Imperceptible Jamming Attacks Against UWB Ranging Systems with COTS Chips

    Authors: Yuqiao Yang, Zhongjie Wu, Yongzhao Zhang, Ting Chen, Jun Li, Jie Yang, Wenhao Liu, Xiaosong Zhang, Ruicong Shi, Jingwei Li, Yu Jiang, Zhuo Su

    Abstract: UWB ranging systems have been adopted in many critical and security sensitive applications due to its precise positioning and secure ranging capabilities. We present a practical jamming attack, namely UWBAD, against commercial UWB ranging systems, which exploits the vulnerability of the adoption of the normalized cross-correlation process in UWB ranging and can selectively and quickly block rangin… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security

  42. arXiv:2406.20081  [pdf, other

    cs.CV cs.LG

    Segment Anything without Supervision

    Authors: XuDong Wang, Jingfeng Yang, Trevor Darrell

    Abstract: The Segmentation Anything Model (SAM) requires labor-intensive data labeling. We present Unsupervised SAM (UnSAM) for promptable and automatic whole-image segmentation that does not require human annotations. UnSAM utilizes a divide-and-conquer strategy to "discover" the hierarchical structure of visual scenes. We first leverage top-down clustering methods to partition an unlabeled image into inst… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/frank-xwang/UnSAM

  43. arXiv:2406.20078  [pdf, other

    cs.CV

    GM-DF: Generalized Multi-Scenario Deepfake Detection

    Authors: Yingxin Lai, Zitong Yu, Jing Yang, Bin Li, Xiangui Kang, Linlin Shen

    Abstract: Existing face forgery detection usually follows the paradigm of training models in a single domain, which leads to limited generalization capacity when unseen scenarios and unknown attacks occur. In this paper, we elaborately investigate the generalization capacity of deepfake detection models when jointly trained on multiple face forgery detection datasets. We first find a rapid degradation of de… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  44. arXiv:2406.19398  [pdf, other

    cs.CV cs.GR

    Woven Fabric Capture with a Reflection-Transmission Photo Pair

    Authors: Yingjie Tang, Zixuan Li, Miloš Hašan, Jian Yang, Beibei Wang

    Abstract: Digitizing woven fabrics would be valuable for many applications, from digital humans to interior design. Previous work introduces a lightweight woven fabric acquisition approach by capturing a single reflection image and estimating the fabric parameters with a differentiable geometric and shading model. The renderings of the estimated fabric parameters can closely match the photo; however, the ca… ▽ More

    Submitted 7 July, 2024; v1 submitted 4 May, 2024; originally announced June 2024.

    Comments: 10 pages, 16 figures (in the main paper). Accepted by SIGGRAPH 2024 conference

  45. arXiv:2406.19263  [pdf, other

    cs.CL cs.CV

    Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

    Authors: Yue Fan, Lei Ding, Ching-Chen Kuo, Shan Jiang, Yang Zhao, Xinze Guan, Jie Yang, Yi Zhang, Xin Eric Wang

    Abstract: Graphical User Interfaces (GUIs) are central to our interaction with digital devices. Recently, growing efforts have been made to build models for various GUI understanding tasks. However, these efforts largely overlook an important GUI-referring task: screen reading based on user-indicated points, which we name the Screen Point-and-Read (SPR) task. This task is predominantly handled by rigid acce… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  46. arXiv:2406.18544  [pdf, other

    cs.CV cs.GR

    GS-ROR: 3D Gaussian Splatting for Reflective Object Relighting via SDF Priors

    Authors: Zuo-Liang Zhu, Beibei Wang, Jian Yang

    Abstract: 3D Gaussian Splatting (3DGS) has shown a powerful capability for novel view synthesis due to its detailed expressive ability and highly efficient rendering speed. Unfortunately, creating relightable 3D assets with 3DGS is still problematic, particularly for reflective objects, as its discontinuous representation raises difficulties in constraining geometries. Inspired by previous works, the signed… ▽ More

    Submitted 22 May, 2024; originally announced June 2024.

  47. arXiv:2406.18294  [pdf, other

    cs.CL

    Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs

    Authors: Lei Zhang, Yunshui Li, Jiaming Li, Xiaobo Xia, Jiaxi Yang, Run Luo, Minzheng Wang, Longze Chen, Junhao Liu, Min Yang

    Abstract: Some recently developed code large language models (Code LLMs) have been pre-trained on repository-level code data (Repo-Code LLMs), enabling these models to recognize repository structures and utilize cross-file information for code completion. However, in real-world development scenarios, simply concatenating the entire code repository often exceeds the context window limits of these Repo-Code L… ▽ More

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  48. arXiv:2406.18284  [pdf, other

    cs.CV

    RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network

    Authors: Xiaozhong Ji, Chuming Lin, Zhonggan Ding, Ying Tai, Jian Yang, Junwei Zhu, Xiaobin Hu, Jiangning Zhang, Donghao Luo, Chengjie Wang

    Abstract: Person-generic audio-driven face generation is a challenging task in computer vision. Previous methods have achieved remarkable progress in audio-visual synchronization, but there is still a significant gap between current results and practical applications. The challenges are two-fold: 1) Preserving unique individual traits for achieving high-precision lip synchronization. 2) Generating high-qual… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  49. arXiv:2406.17960  [pdf, other

    cs.CV cs.AI

    MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation

    Authors: Liuyi Wang, Zongtao He, Mengjiao Shen, Jingwei Yang, Chengju Liu, Qijun Chen

    Abstract: Despite the remarkable developments of recent large models in Embodied Artificial Intelligence (E-AI), their integration into robotics is hampered by their excessive parameter sizes and computational demands. Towards the Vision-and-Language Navigation (VLN) task, a core task in E-AI, this paper reveals the great potential of using knowledge distillation for obtaining lightweight student models by… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  50. arXiv:2406.17588  [pdf, other

    cs.CL

    LongIns: A Challenging Long-context Instruction-based Exam for LLMs

    Authors: Shawn Gavin, Tuney Zheng, Jiaheng Liu, Quehry Que, Noah Wang, Jian Yang, Chenchen Zhang, Wenhao Huang, Wenhu Chen, Ge Zhang

    Abstract: The long-context capabilities of large language models (LLMs) have been a hot topic in recent years. To evaluate the performance of LLMs in different scenarios, various assessment benchmarks have emerged. However, as most of these benchmarks focus on identifying key information to answer questions, which mainly requires the retrieval ability of LLMs, these benchmarks can partially represent the re… ▽ More

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.