Showing 1–50 of 724 results for author: Hu, S

  1. arXiv:2407.13331  [pdf, other]

    cs.LG

    Reconstruct the Pruned Model without Any Retraining

    Authors: Pingjie Wang, Ziqing Fan, Shengchao Hu, Zhe Chen, Yanfeng Wang, Yu Wang

    Abstract: Structured pruning is a promising hardware-friendly compression technique for large language models (LLMs), which is expected to be retraining-free to avoid the enormous retraining cost. This retraining-free paradigm involves (1) pruning criteria to define the architecture and (2) distortion reconstruction to restore performance. However, existing methods often emphasize pruning criteria while usi…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 18 pages

  2. arXiv:2407.11421  [pdf, other]

    cs.CL

    States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly

    Authors: Junhao Chen, Shengding Hu, Zhiyuan Liu, Maosong Sun

    Abstract: Large Language Models (LLMs) exhibit various emergent abilities. Among these abilities, some might reveal the internal working mechanisms of models. In this paper, we uncover a novel emergent capability in models: the intrinsic ability to perform extended sequences of calculations without relying on chain-of-thought step-by-step solutions. Remarkably, the most advanced models can directly output t…

    Submitted 16 July, 2024; originally announced July 2024.

  3. arXiv:2407.10474  [pdf, other]

    cs.MM

    Multi-source Knowledge Enhanced Graph Attention Networks for Multimodal Fact Verification

    Authors: Han Cao, Lingwei Wei, Wei Zhou, Songlin Hu

    Abstract: Multimodal fact verification is an under-explored and emerging field that has gained increasing attention in recent years. The goal is to assess the veracity of claims that involve multiple modalities by analyzing the retrieved evidence. The main challenge in this area is to effectively fuse features from different modalities to learn meaningful multimodal representations. To this end, we propose…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ICME 2024

  4. arXiv:2407.09894  [pdf, other]

    cs.SI cs.AI cs.CL

    Transferring Structure Knowledge: A New Task to Fake news Detection Towards Cold-Start Propagation

    Authors: Lingwei Wei, Dou Hu, Wei Zhou, Songlin Hu

    Abstract: Many fake news detection studies have achieved promising performance by extracting effective semantic and structure features from both content and propagation trees. However, it is challenging to apply them to practical situations, especially when using the trained propagation-based models to detect news with no propagation data. Towards this scenario, we study a new task named cold-start fake new…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: ICASSP 2024

  5. arXiv:2407.09816  [pdf, other]

    cs.CL

    MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts

    Authors: Zhenpeng Su, Zijia Lin, Xue Bai, Xing Wu, Yizhe Xiong, Haoran Lian, Guangyuan Ma, Hui Chen, Guiguang Ding, Wei Zhou, Songlin Hu

    Abstract: Scaling model capacity enhances its capabilities but significantly increases computation. Mixture-of-Experts models (MoEs) address this by allowing model capacity to scale without substantially increasing training or inference costs. Despite their promising results, MoE models encounter several challenges. Primarily, the dispersion of training tokens across multiple experts can lead to underfittin…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Work in progress

  6. arXiv:2407.08551  [pdf, other]

    cs.CL cs.SD eess.AS

    Autoregressive Speech Synthesis without Vector Quantization

    Authors: Lingwei Meng, Long Zhou, Shujie Liu, Sanyuan Chen, Bing Han, Shujie Hu, Yanqing Liu, Jinyu Li, Sheng Zhao, Xixin Wu, Helen Meng, Furu Wei

    Abstract: We present MELLE, a novel continuous-valued tokens based language modeling approach for text-to-speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition, bypassing the need for vector quantization, which was originally designed for audio compression and sacrifices fidelity compared to mel-spectrograms. Specifically, (i) instead of cross…

    Submitted 11 July, 2024; originally announced July 2024.

  7. arXiv:2407.06310  [pdf, other]

    cs.SD cs.AI cs.HC cs.LG eess.AS

    Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

    Authors: Mengzhe Geng, Xurong Xie, Jiajun Deng, Zengrui Jin, Guinan Li, Tianzi Wang, Shujie Hu, Zhaoqing Li, Helen Meng, Xunying Liu

    Abstract: The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by their mismatch against healthy and nonaged voices, data scarcity and large speaker-level variability. To this end, this paper proposes two novel data-efficient methods to learn homogeneous dysarthric and elderly speaker-level features for rapid, on-the-fly test-…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: In submission to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  8. arXiv:2407.05104  [pdf, other]

    cs.CY

    Crowdsourced reviews reveal substantial disparities in public perceptions of parking

    Authors: Lingyao Li, Songhua Hu, Ly Dinh, Libby Hemphill

    Abstract: Due to increased reliance on private vehicles and growing travel demand, parking remains a longstanding urban challenge globally. Quantifying parking perceptions is paramount as it enables decision-makers to identify problematic areas and make informed decisions on parking management. This study introduces a cost-effective and widely accessible data source, crowdsourced online reviews, to investig…

    Submitted 6 July, 2024; originally announced July 2024.

  9. arXiv:2407.04688  [pdf, other]

    cs.CV

    Enhancing Vehicle Re-identification and Matching for Weaving Analysis

    Authors: Mei Qiu, Wei Lin, Stanley Chien, Lauren Christopher, Yaobin Chen, Shu Hu

    Abstract: Vehicle weaving on highways contributes to traffic congestion, raises safety issues, and underscores the need for sophisticated traffic management systems. Current tools are inadequate in offering precise and comprehensive data on lane-specific weaving patterns. This paper introduces an innovative method for collecting non-overlapping video data in weaving zones, enabling the generation of quantit…

    Submitted 5 July, 2024; originally announced July 2024.

  10. arXiv:2407.02165  [pdf, other]

    cs.CV

    WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation

    Authors: Zihao Huang, Shoukang Hu, Guangcong Wang, Tianqi Liu, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu

    Abstract: Existing human datasets for avatar creation are typically limited to laboratory environments, wherein high-quality annotations (e.g., SMPL estimation from 3D scans or multi-view images) can be ideally provided. However, their annotating requirements are impractical for real-world images or videos, posing challenges toward real-world applications on current avatar creation methods. To this end, we…

    Submitted 14 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Project page: https://wildavatar.github.io/

  11. arXiv:2407.00466  [pdf, other]

    cs.CL cs.AI

    BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science

    Authors: Xinna Lin, Siqi Ma, Junjie Shan, Xiaojing Zhang, Shell Xu Hu, Tiannan Guo, Stan Z. Li, Kaicheng Yu

    Abstract: Pursuing artificial intelligence for biomedical science, a.k.a. AI Scientist, draws increasing attention, where one common approach is to build a copilot agent driven by Large Language Models (LLMs). However, to evaluate such systems, people either rely on direct Question-Answering (QA) to the LLM itself, or in a biomedical experimental manner. How to precisely benchmark biomedical agents from an…

    Submitted 29 June, 2024; originally announced July 2024.

  12. arXiv:2406.17338  [pdf, other]

    eess.IV cs.CV cs.LG

    Robustly Optimized Deep Feature Decoupling Network for Fatty Liver Diseases Detection

    Authors: Peng Huang, Shu Hu, Bo Peng, Jiashu Zhang, Xi Wu, Xin Wang

    Abstract: Current medical image classification efforts mainly aim for higher average performance, often neglecting the balance between different classes. This can lead to significant differences in recognition accuracy between classes and obvious recognition weaknesses. Without the support of massive data, deep learning faces challenges in fine-grained classification of fatty liver. In this paper, we propos…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024

  13. arXiv:2406.15718  [pdf, other]

    cs.CL

    Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

    Authors: Xinrong Zhang, Yingfa Chen, Shengding Hu, Xu Han, Zihang Xu, Yuanwei Xu, Weilin Zhao, Maosong Sun, Zhiyuan Liu

    Abstract: As large language models (LLMs) increasingly permeate daily lives, there is a growing demand for real-time interactions that mirror human conversations. Traditional turn-based chat systems driven by LLMs prevent users from verbally interacting with the system while it is generating responses. To overcome these limitations, we adapt existing LLMs to duplex models so that these LLMs can lis…

    Submitted 21 June, 2024; originally announced June 2024.

  14. arXiv:2406.15093  [pdf, other]

    cs.CR cs.CV eess.IV

    ECLIPSE: Expunging Clean-label Indiscriminate Poisons via Sparse Diffusion Purification

    Authors: Xianlong Wang, Shengshan Hu, Yechao Zhang, Ziqi Zhou, Leo Yu Zhang, Peng Xu, Wei Wan, Hai Jin

    Abstract: Clean-label indiscriminate poisoning attacks add invisible perturbations to correctly labeled training images, thus dramatically reducing the generalization capability of the victim models. Recently, some defense mechanisms have been proposed such as adversarial training, image transformation techniques, and image purification. However, these schemes are either susceptible to adaptive attacks, bui…

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by ESORICS 2024

  15. arXiv:2406.13356  [pdf, other]

    cs.LG

    Jogging the Memory of Unlearned Model Through Targeted Relearning Attack

    Authors: Shengyuan Hu, Yiwei Fu, Zhiwei Steven Wu, Virginia Smith

    Abstract: Machine unlearning is a promising approach to mitigate undesirable memorization of training data in ML models. However, in this work we show that existing approaches for unlearning in LLMs are surprisingly susceptible to a simple set of targeted relearning attacks. With access to only a small and potentially loosely related set of data, we find that we can 'jog' the memory of unlearned models to r…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 17 pages, 8 figures, 12 tables

  16. arXiv:2406.13294  [pdf, other]

    cs.MM cs.LG

    Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens

    Authors: Xikang Yang, Xuehai Tang, Fuqing Zhu, Jizhong Han, Songlin Hu

    Abstract: Vision-language models (VLMs) seamlessly integrate visual and textual data to perform tasks such as image classification, caption generation, and visual question answering. However, adversarial images often struggle to deceive all prompts effectively in the context of cross-prompt migration attacks, as the probability distribution of the tokens in these images tends to favor the semantics of the o…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages

  17. arXiv:2406.12293  [pdf, other]

    cs.CV

    Unleashing the Potential of Open-set Noisy Samples Against Label Noise for Medical Image Classification

    Authors: Zehui Liao, Shishuai Hu, Yong Xia

    Abstract: The challenge of addressing mixed closed-set and open-set label noise in medical image classification remains largely unexplored. Unlike natural image classification where there is a common practice of segregation and separate processing of closed-set and open-set noisy samples from clean ones, medical image classification faces difficulties due to high inter-class similarity which complicates the…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 1 figure

  18. arXiv:2406.11077  [pdf, other]

    cs.CV

    Learning Relighting and Intrinsic Decomposition in Neural Radiance Fields

    Authors: Yixiong Yang, Shilin Hu, Haoyu Wu, Ramon Baldrich, Dimitris Samaras, Maria Vanrell

    Abstract: The task of extracting intrinsic components, such as reflectance and shading, from neural radiance fields is of growing interest. However, current methods largely focus on synthetic scenes and isolated objects, overlooking the complexities of real scenes with backgrounds. To address this gap, our research introduces a method that combines relighting with intrinsic decomposition. By leveraging ligh…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR 2024 Workshop Neural Rendering Intelligence (NRI)

  19. arXiv:2406.10246  [pdf, other]

    cs.IR cs.AI

    Semantic-Enhanced Relational Metric Learning for Recommender Systems

    Authors: Mingming Li, Fuqing Zhu, Feng Yuan, Songlin Hu

    Abstract: Recently, relational metric learning methods have received great attention in the recommendation community, inspired by the translation mechanism in knowledge graphs. Unlike a knowledge graph, where the entity-to-entity relations are given in advance, historical interactions lack explicit relations between users and items in recommender systems. Currently, many researchers have s…

    Submitted 7 June, 2024; originally announced June 2024.

  20. arXiv:2406.10160  [pdf, other]

    cs.SD cs.AI eess.AS

    One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

    Authors: Zhaoqing Li, Haoning Xu, Tianzi Wang, Shoukang Hu, Zengrui Jin, Shujie Hu, Jiajun Deng, Mingyu Cui, Mengzhe Geng, Xunying Liu

    Abstract: We propose a novel one-pass multiple ASR systems joint compression and quantization approach using an all-in-one neural model. A single compression cycle allows multiple nested systems with varying Encoder depths, widths, and quantization precision settings to be simultaneously constructed without the need to train and store individual target systems separately. Experiments consistently demonstrat…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  21. arXiv:2406.10152  [pdf, other]

    cs.SD eess.AS

    Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

    Authors: Guinan Li, Jiajun Deng, Youjun Chen, Mengzhe Geng, Shujie Hu, Zhe Li, Zengrui Jin, Tianzi Wang, Xurong Xie, Helen Meng, Xunying Liu

    Abstract: This paper proposes joint speaker feature learning methods for zero-shot adaptation of audio-visual multichannel speech separation and recognition systems. xVector and ECAPA-TDNN speaker encoders are connected using purpose-built fusion blocks and tightly integrated with the complete system training. Experiments conducted on LRS3-TED data simulated multichannel overlapped speech suggest that joint…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  22. arXiv:2406.10034  [pdf, other]

    cs.SD cs.AI eess.AS

    Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

    Authors: Tianzi Wang, Xurong Xie, Zhaoqing Li, Shoukang Hu, Zengrui Jing, Jiajun Deng, Mingyu Cui, Shujie Hu, Mengzhe Geng, Guinan Li, Helen Meng, Xunying Liu

    Abstract: This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output labels that are concealed using attention masks, while conducting left-to-right AR prediction and history context amalgamation between blocks. A beam s…

    Submitted 16 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, 2 tables, Interspeech24 conference

  23. arXiv:2406.08751  [pdf, other]

    cs.AI

    3D Building Generation in Minecraft via Large Language Models

    Authors: Shiying Hu, Zengrong Huang, Chengpeng Hu, Jialin Liu

    Abstract: Recently, procedural content generation has exhibited considerable advancements in the domain of 2D game level generation such as Super Mario Bros. and Sokoban through large language models (LLMs). To further validate the capabilities of LLMs, this paper explores how LLMs contribute to the generation of 3D buildings in a sandbox game, Minecraft. We propose a Text to Building in Minecraft (T2BM) mo…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by IEEE Conference on Games

  24. arXiv:2406.06544  [pdf, other]

    cs.AR cs.AI

    TSB: Tiny Shared Block for Efficient DNN Deployment on NVCIM Accelerators

    Authors: Yifan Qin, Zheyu Yan, Zixuan Pan, Wujie Wen, Xiaobo Sharon Hu, Yiyu Shi

    Abstract: Compute-in-memory (CIM) accelerators using non-volatile memory (NVM) devices offer promising solutions for energy-efficient and low-latency Deep Neural Network (DNN) inference execution. However, practical deployment is often hindered by the challenge of dealing with the massive amount of model weight parameters impacted by the inherent device variations within non-volatile computing-in-memory (NV…

    Submitted 8 May, 2024; originally announced June 2024.

  25. arXiv:2406.05510  [pdf, other]

    cs.LG cs.CL

    Representation Learning with Conditional Information Flow Maximization

    Authors: Dou Hu, Lingwei Wei, Wei Zhou, Songlin Hu

    Abstract: This paper proposes an information-theoretic representation learning framework, named conditional information flow maximization, to extract noise-invariant sufficient representations for the input data and target task. It encourages the learned representations to have good feature uniformity and sufficient predictive ability, which can enhance the generalization of pre-trained language models (PLMs) fo…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 16 pages, accepted to ACL 2024 (main conference)

  26. arXiv:2406.01489  [pdf, other]

    cs.CV

    DA-HFNet: Progressive Fine-Grained Forgery Image Detection and Localization Based on Dual Attention

    Authors: Yang Liu, Xiaofei Li, Jun Zhang, Shengze Hu, Jun Lei

    Abstract: The increasing difficulty in accurately detecting forged images generated by AIGC (Artificial Intelligence Generative Content) poses many risks, necessitating the development of effective methods to identify and further locate forged areas. In this paper, to facilitate research efforts, we construct a DA-HFNet forged image dataset guided by text or image-assisted GAN and Diffusion model. Our goal i…

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  27. arXiv:2406.01151  [pdf, other]

    cs.AR

    A 0.96pJ/SOP, 30.23K-neuron/mm^2 Heterogeneous Neuromorphic Chip With Fullerene-like Interconnection Topology for Edge-AI Computing

    Authors: P. J. Zhou, Q. Yu, M. Chen, Y. C. Wang, L. W. Meng, Y. Zuo, N. Ning, Y. Liu, S. G. Hu, G. C. Qiao

    Abstract: Edge-AI computing requires high energy efficiency, low power consumption, and relatively high flexibility and compact area, challenging the AI-chip design. This work presents a 0.96 pJ/SOP heterogeneous neuromorphic system-on-chip (SoC) with fullerene-like interconnection topology for edge-AI computing. The neuromorphic core integrates different technologies to augment computing energy efficiency,…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 5 pages, 8 figures

  28. arXiv:2406.00783  [pdf, other]

    cs.CV

    AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark

    Authors: Li Lin, Santosh, Xin Wang, Shu Hu

    Abstract: AI-generated faces have enriched human life, such as entertainment, education, and art. However, they also pose misuse risks. Therefore, detecting AI-generated faces becomes crucial, yet current detectors show biased performance across different demographic groups. Mitigating biases can be done by designing algorithmic fairness methods, which usually require demographically annotated face datasets…

    Submitted 4 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  29. arXiv:2406.00337  [pdf, other]

    cs.HC

    The Odyssey Journey: Hemifacial Spasm Patients' Top-Tier Medical Resource Seeking in China from an Actor-Network Perspective

    Authors: Ka I Chan, Yuntao Wang, Siying Hu, Bo Hei, Zhicong Lu, Pei-Luen Patrick Rau, Yuanchun Shi

    Abstract: Health information-seeking behaviors are critical for individuals managing illnesses, especially in cases like hemifacial spasm (HFS), a condition familiar to specialists but not to general practitioners and the broader public. The limited awareness of HFS often leads to scarce online resources for self-diagnosis and a heightened risk of misdiagnosis. In China, the imbalance in the doctor-to-patie…

    Submitted 1 June, 2024; originally announced June 2024.

  30. arXiv:2405.21066  [pdf, other]

    cs.CV

    Mixed Diffusion for 3D Indoor Scene Synthesis

    Authors: Siyi Hu, Diego Martin Arroyo, Stephanie Debats, Fabian Manhardt, Luca Carlone, Federico Tombari

    Abstract: Realistic conditional 3D scene synthesis significantly enhances and accelerates the creation of virtual environments, which can also provide extensive training data for computer vision and robotics research among other applications. Diffusion models have shown great performance in related applications, e.g., making precise arrangements of unordered sets. However, these models have not been fully e…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 19 pages, 14 figures. Under review. Code to be released at: https://github.com/MIT-SPARK/MiDiffusion

  31. arXiv:2405.19846  [pdf, other]

    cs.CL cs.AI

    Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model

    Authors: Chaochen Gao, Xing Wu, Qi Fu, Songlin Hu

    Abstract: Large language models, initially pre-trained with a limited context length, can better handle longer texts by continuing training on a corpus with extended contexts. However, obtaining effective long-context data is challenging due to the scarcity and uneven distribution of long documents across different domains. To address this issue, we propose a Query-centric data synthesis method, abbreviated…

    Submitted 19 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  32. arXiv:2405.19842  [pdf, other]

    cs.CL cs.AI

    Improve Student's Reasoning Generalizability through Cascading Decomposed CoTs Distillation

    Authors: Chengwei Dai, Kun Li, Wei Zhou, Songlin Hu

    Abstract: Large language models (LLMs) exhibit enhanced reasoning at larger scales, driving efforts to distill these capabilities into smaller models via teacher-student learning. Previous works simply fine-tune student models on teachers' generated Chain-of-Thoughts (CoTs) data. Although these methods enhance in-domain (IND) reasoning performance, they struggle to generalize to out-of-domain (OOD) tasks. W…

    Submitted 30 May, 2024; originally announced May 2024.

  33. arXiv:2405.19737  [pdf, other]

    cs.CL cs.AI

    Beyond Imitation: Learning Key Reasoning Steps from Dual Chain-of-Thoughts in Reasoning Distillation

    Authors: Chengwei Dai, Kun Li, Wei Zhou, Songlin Hu

    Abstract: As Large Language Models (LLMs) scale up and gain powerful Chain-of-Thoughts (CoTs) reasoning abilities, practical resource constraints drive efforts to distill these capabilities into more compact Smaller Language Models (SLMs). We find that CoTs consist mainly of simple reasoning forms, with a small proportion (≈ 4.7%) of key reasoning steps that truly impact conclusions. However, previ…

    Submitted 30 May, 2024; originally announced May 2024.

  34. arXiv:2405.19677  [pdf, other]

    cs.CR cs.AI

    Large Language Model Watermark Stealing With Mixed Integer Programming

    Authors: Zhaoxi Zhang, Xiaomei Zhang, Yanjun Zhang, Leo Yu Zhang, Chao Chen, Shengshan Hu, Asif Gill, Shirui Pan

    Abstract: The Large Language Model (LLM) watermark is a newly emerging technique that shows promise in addressing concerns surrounding LLM copyright, monitoring AI-generated text, and preventing its misuse. The LLM watermark scheme commonly includes generating secret keys to partition the vocabulary into green and red lists, applying a perturbation to the logits of tokens in the green list to increase their…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 12 pages

  35. arXiv:2405.18890  [pdf, other]

    cs.LG cs.DC

    Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization

    Authors: Ziqing Fan, Shengchao Hu, Jiangchao Yao, Gang Niu, Ya Zhang, Masashi Sugiyama, Yanfeng Wang

    Abstract: In federated learning (FL), the multi-step update and data heterogeneity among clients often lead to a loss landscape with sharper minima, degenerating the performance of the resulting global model. Prevalent federated approaches incorporate sharpness-aware minimization (SAM) into local training to mitigate this problem. However, the local loss landscapes may not accurately reflect the flatness of…

    Submitted 29 May, 2024; originally announced May 2024.

  36. arXiv:2405.18641  [pdf, other]

    cs.LG

    Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning

    Authors: Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu

    Abstract: Recent studies show that Large Language Models (LLMs) with safety alignment can be jail-broken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we show that the jail-broken effect can be mitigated by separating states in the finetuning stage to optimize the alignment and user datasets. Unfortunately, our subsequent study shows that this simple Bi-State Optimizatio…

    Submitted 26 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  37. arXiv:2405.18080  [pdf, other]

    cs.LG

    HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning

    Authors: Shengchao Hu, Ziqing Fan, Li Shen, Ya Zhang, Yanfeng Wang, Dacheng Tao

    Abstract: The purpose of offline multi-task reinforcement learning (MTRL) is to develop a unified policy applicable to diverse tasks without the need for online environmental interaction. Recent advancements approach this through sequence modeling, leveraging the Transformer architecture's scalability and the benefits of parameter sharing to exploit task similarities. However, variations in task content and…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Published at ICML 2024

  38. arXiv:2405.17743  [pdf, other]

    cs.CL cs.AI cs.CE cs.LG

    ORLM: Training Large Language Models for Optimization Modeling

    Authors: Zhengyang Tang, Chenyu Huang, Xin Zheng, Shixi Hu, Zizhuo Wang, Dongdong Ge, Benyou Wang

    Abstract: Large Language Models (LLMs) have emerged as powerful tools for tackling complex Operations Research (OR) problems by providing the capacity to automate optimization modeling. However, current methodologies heavily rely on prompt engineering (e.g., multi-agent cooperation) with proprietary LLMs, raising data privacy concerns that could be prohibitive in industry applications. To tackle this issue…

    Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Work in progress

  39. Adaptive Device-Edge Collaboration on DNN Inference in AIoT: A Digital Twin-Assisted Approach

    Authors: Shisheng Hu, Mushu Li, Jie Gao, Conghao Zhou, Xuemin Shen

    Abstract: Device-edge collaboration on deep neural network (DNN) inference is a promising approach to efficiently utilizing network resources for supporting artificial intelligence of things (AIoT) applications. In this paper, we propose a novel digital twin (DT)-assisted approach to device-edge collaboration on DNN inference that determines whether and when to stop local inference at a device and upload th…

    Submitted 27 May, 2024; originally announced May 2024.

    Journal ref: IEEE Internet Things J. (Volume: 11, Issue: 7, 01 April 2024)

  40. arXiv:2405.17251  [pdf, other]

    cs.CV

    GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping

    Authors: Junyoung Seo, Kazumi Fukuda, Takashi Shibuya, Takuya Narihira, Naoki Murata, Shoukang Hu, Chieh-Hsin Lai, Seungryong Kim, Yuki Mitsufuji

    Abstract: Generating novel views from a single image remains a challenging task due to the complexity of 3D scenes and the limited diversity in the existing multi-view datasets to train a model on. Recent research combining large-scale text-to-image (T2I) models with monocular depth estimation (MDE) has shown promise in handling in-the-wild images. In these methods, an input view is geometrically warped to…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project page: https://GenWarp-NVS.github.io

  41. arXiv:2405.17098  [pdf, other]

    cs.LG

    Q-value Regularized Transformer for Offline Reinforcement Learning

    Authors: Shengchao Hu, Ziqing Fan, Chaoqin Huang, Li Shen, Ya Zhang, Yanfeng Wang, Dacheng Tao

    Abstract: Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of Conditional Sequence Modeling (CSM), a paradigm that learns the action distribution based on history trajectory and target returns for each state. However, these methods often struggle with stitching together optimal trajectories from sub-optimal ones due to the inconsistency between the sampled returns…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Published at ICML 2024

  42. arXiv:2405.15143  [pdf, other]

    cs.LG cs.AI cs.CL

    Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models

    Authors: Cong Lu, Shengran Hu, Jeff Clune

    Abstract: Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems, built on the principle of archiving discovered states, and iteratively returning to and exploring from the most promising states. This approach has led to superhuman performance across a wide variety of challenging problems including Atari games and robotic control, but requires manually designing heuristics…

    Submitted 30 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  43. arXiv:2405.15062  [pdf, other]

    cs.LG

    Model-Agnostic Utility-Preserving Biometric Information Anonymization

    Authors: Chun-Fu Chen, Bill Moriarty, Shaohan Hu, Sean Moran, Marco Pistoia, Vincenzo Piuri, Pierangela Samarati

    Abstract: The recent rapid advancements in both sensing and machine learning technologies have given rise to the universal collection and utilization of people's biometrics, such as fingerprints, voices, retina/facial scans, or gait/motion/gestures data, enabling a wide range of applications including authentication, health monitoring, or much more sophisticated analytics. While providing better user experi…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Preprint of IJIS version, https://link.springer.com/article/10.1007/s10207-024-00862-8

  44. arXiv:2405.14981  [pdf, other]

    cs.LG

    MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective

    Authors: Yizhuo Chen, Chun-Fu Chen, Hsiang Hsu, Shaohan Hu, Marco Pistoia, Tarek Abdelzaher

    Abstract: The growing richness of large-scale datasets has been crucial in driving the rapid advancement and wide adoption of machine learning technologies. The massive collection and usage of data, however, pose an increasing risk for people's private and sensitive information due to either inadvertent mishandling or malicious exploitation. Besides legislative solutions, many technical approaches have been…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  45. arXiv:2405.14853  [pdf, other]

    cs.LG cs.AI cs.RO

    Privileged Sensing Scaffolds Reinforcement Learning

    Authors: Edward S. Hu, James Springer, Oleh Rybkin, Dinesh Jayaraman

    Abstract: We need to look at our shoelaces as we first learn to tie them but having mastered this skill, can do it from touch alone. We call this phenomenon "sensory scaffolding": observation streams that are not needed by a master might yet aid a novice learner. We consider such sensory scaffolding setups for training artificial agents. For example, a robot arm may need to be deployed with just a low-cost,…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: ICLR 2024 Spotlight version

  46. arXiv:2405.14791  [pdf, other]

    cs.LG cs.CV cs.DC

    Recurrent Early Exits for Federated Learning with Heterogeneous Clients

    Authors: Royson Lee, Javier Fernandez-Marques, Shell Xu Hu, Da Li, Stefanos Laskaridis, Łukasz Dudziak, Timothy Hospedales, Ferenc Huszár, Nicholas D. Lane

    Abstract: Federated learning (FL) has enabled distributed learning of a model across multiple clients in a privacy-preserving manner. One of the main challenges of FL is to accommodate clients with varying hardware capacities; clients have differing compute and memory requirements. To tackle this challenge, recent state-of-the-art approaches leverage the use of early exits. Nonetheless, these approaches fal…

    Submitted 27 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted at the 41st International Conference on Machine Learning (ICML 2024)

  47. arXiv:2405.14291  [pdf, other]

    cs.LG cs.AI cs.DC

    Variational Bayes for Federated Continual Learning

    Authors: Dezhong Yao, Sanmu Li, Yutong Dai, Zhiqiang Xu, Shengshan Hu, Peilin Zhao, Lichao Sun

    Abstract: Federated continual learning (FCL) has received increasing attention due to its potential in handling real-world streaming data, characterized by evolving data distributions and varying client classes over time. The constraints of storage limitations and privacy concerns confine local models to exclusively access the present data within each learning cycle. Consequently, this restriction induces p…

    Submitted 23 May, 2024; originally announced May 2024.

  48. arXiv:2405.13059  [pdf, other]

    cs.CL cs.AI

    RNG: Reducing Multi-level Noise and Multi-grained Semantic Gap for Joint Multimodal Aspect-Sentiment Analysis

    Authors: Yaxin Liu, Yan Zhou, Ziming Li, Jinchuan Zhang, Yu Shang, Chenyang Zhang, Songlin Hu

    Abstract: As an important multimodal sentiment analysis task, Joint Multimodal Aspect-Sentiment Analysis (JMASA), aiming to jointly extract aspect terms and their associated sentiment polarities from the given text-image pairs, has gained increasing concerns. Existing works encounter two limitations: (1) multi-level modality noise, i.e., instance- and feature-level noise; and (2) multi-grained semantic gap,…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted by ICME 2024

  49. arXiv:2405.12218  [pdf, other]

    cs.CV

    MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo

    Authors: Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu

    Abstract: We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that can efficiently reconstruct unseen scenes. Specifically, 1) we leverage MVS to encode geometry-aware Gaussian representations and decode them into Gaussian parameters. 2) To further enhance performance, we propose a hybrid Gaussian rendering that integrates an efficient volume…

    Submitted 15 July, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: ECCV2024, Project page: https://mvsgaussian.github.io/ , Code: https://github.com/TQTQliu/MVSGaussian

  50. arXiv:2405.12139  [pdf, other]

    cs.CV

    DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM

    Authors: Xuchen Li, Xiaokun Feng, Shiyu Hu, Meiqi Wu, Dailing Zhang, Jing Zhang, Kaiqi Huang

    Abstract: Visual Language Tracking (VLT) enhances single object tracking (SOT) by integrating natural language descriptions from a video, for the precise tracking of a specified object. By leveraging high-level semantic information, VLT guides object tracking, alleviating the constraints associated with relying on a visual modality. Nevertheless, most VLT benchmarks are annotated in a single granularity and…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR Workshop 2024, Oral Presentation