Showing 1–50 of 764 results for author: Hu, H

  1. arXiv:2407.13532  [pdf, other]

    cs.CR cs.DB

    PriPL-Tree: Accurate Range Query for Arbitrary Distribution under Local Differential Privacy

    Authors: Leixia Wang, Qingqing Ye, Haibo Hu, Xiaofeng Meng

    Abstract: Answering range queries in the context of Local Differential Privacy (LDP) is a widely studied problem in Online Analytical Processing (OLAP). Existing LDP solutions all assume a uniform data distribution within each domain partition, which may not align with real-world scenarios where data distribution is varied, resulting in inaccurate estimates. To address this problem, we introduce PriPL-Tree,…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: To appear in VLDB 2024

  2. arXiv:2407.10419  [pdf, other]

    cs.CV cs.LG

    Omni-Dimensional Frequency Learner for General Time Series Analysis

    Authors: Xianing Chen, Hanting Chen, Hailin Hu

    Abstract: Frequency domain representation of time series feature offers a concise representation for handling real-world time series data with inherent complexity and dynamic nature. However, current frequency-based methods with complex operations still fall short of state-of-the-art time domain methods for general time series analysis. In this work, we present Omni-Dimensional Frequency Learner (ODFL) mode…

    Submitted 14 July, 2024; originally announced July 2024.

  3. arXiv:2407.08961  [pdf]

    eess.IV cs.CV

    Tissue-Contrastive Semi-Masked Autoencoders for Segmentation Pretraining on Chest CT

    Authors: Jie Zheng, Ru Wen, Haiqin Hu, Lina Wei, Kui Su, Wei Chen, Chen Liu, Jun Wang

    Abstract: Existing Masked Image Modeling (MIM) depends on a spatial patch-based masking-reconstruction strategy to perceive objects' features from unlabeled images, which may face two limitations when applied to chest CT: 1) inefficient feature learning due to complex anatomical details presented in CT images, and 2) suboptimal knowledge transfer owing to input disparity between upstream and downstream model…

    Submitted 11 July, 2024; originally announced July 2024.

  4. arXiv:2407.07723  [pdf, other]

    cs.IT cs.AI

    Understanding is Compression

    Authors: Ziguang Li, Chao Huang, Xuliang Wang, Haibo Hu, Cole Wyeth, Dongbo Bu, Quan Yu, Wen Gao, Xingwu Liu, Ming Li

    Abstract: We have previously shown all understanding or learning are compression, under reasonable assumptions. In principle, better understanding of data should improve data compression. Traditional compression methodologies focus on encoding frequencies or some other computable properties of data. Large language models approximate the uncomputable Solomonoff distribution, opening up a whole new avenue to…

    Submitted 23 June, 2024; originally announced July 2024.

  5. arXiv:2407.05578  [pdf, other]

    cs.CV

    FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance

    Authors: Jiedong Zhuang, Jiaqi Hu, Lianrui Mu, Rui Hu, Xiaoyu Liang, Jiangnan Ye, Haoji Hu

    Abstract: CLIP has achieved impressive zero-shot performance after pre-training on a large-scale dataset consisting of paired image-text data. Previous works have utilized CLIP by incorporating manually designed visual prompts like colored circles and blur masks into the images to guide the model's attention, showing enhanced zero-shot performance in downstream tasks. Although these methods have achieved pr…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: accepted by ECCV2024

  6. arXiv:2407.05407  [pdf, other]

    cs.SD cs.AI eess.AS

    CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

    Authors: Zhihao Du, Qian Chen, Shiliang Zhang, Kai Hu, Heng Lu, Yexin Yang, Hangrui Hu, Siqi Zheng, Yue Gu, Ziyang Ma, Zhifu Gao, Zhijie Yan

    Abstract: Recent years have witnessed a trend that large language model (LLM) based text-to-speech (TTS) emerges into the mainstream due to their high naturalness and zero-shot capacity. In this paradigm, speech signals are discretized into token sequences, which are modeled by an LLM with text as prompts and reconstructed by a token-based vocoder to waveforms. Obviously, speech tokens play a critical role…

    Submitted 9 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: work in progress. arXiv admin note: substantial text overlap with arXiv:2407.04051

  7. arXiv:2407.05112  [pdf, other]

    cs.CR cs.AI

    Releasing Malevolence from Benevolence: The Menace of Benign Data on Machine Unlearning

    Authors: Binhao Ma, Tianhang Zheng, Hongsheng Hu, Di Wang, Shuo Wang, Zhongjie Ba, Zhan Qin, Kui Ren

    Abstract: Machine learning models trained on vast amounts of real or synthetic data often achieve outstanding predictive performance across various domains. However, this utility comes with increasing concerns about privacy, as the training data may include sensitive information. To address these concerns, machine unlearning has been proposed to erase specific data samples from models. While some unlearning…

    Submitted 6 July, 2024; originally announced July 2024.

  8. arXiv:2407.04051  [pdf, other]

    cs.SD cs.AI eess.AS

    FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

    Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, et al. (8 additional authors not shown)

    Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, sp…

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress. Authors are listed in alphabetical order by family name

  9. arXiv:2407.03204  [pdf, other]

    cs.CV

    Expressive Gaussian Human Avatars from Monocular RGB Video

    Authors: Hezhen Hu, Zhiwen Fan, Tianhao Wu, Yihan Xi, Seoyoung Lee, Georgios Pavlakos, Zhangyang Wang

    Abstract: Nuanced expressiveness, particularly through fine-grained hand and facial expressions, is pivotal for enhancing the realism and vitality of digital human representations. In this work, we focus on investigating the expressiveness of human avatars when learned from monocular RGB video; a setting that introduces new challenges in capturing and animating fine-grained details. To this end, we introduc…

    Submitted 3 July, 2024; originally announced July 2024.

  10. arXiv:2407.01898  [pdf, other]

    cs.RO

    Learning Granular Media Avalanche Behavior for Indirectly Manipulating Obstacles on a Granular Slope

    Authors: Haodi Hu, Feifei Qian, Daniel Seita

    Abstract: Legged robot locomotion on sand slopes is challenging due to the complex dynamics of granular media and how the lack of solid surfaces can hinder locomotion. A promising strategy, inspired by ghost crabs and other organisms in nature, is to strategically interact with rocks, debris, and other obstacles to facilitate movement. To provide legged robots with this ability, we present a novel approach…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Submitted to CoRL 2024

  11. arXiv:2407.01639  [pdf, other]

    cs.LG cs.SE

    ModelVerification.jl: a Comprehensive Toolbox for Formally Verifying Deep Neural Networks

    Authors: Tianhao Wei, Luca Marzari, Kai S. Yun, Hanjiang Hu, Peizhi Niu, Xusheng Luo, Changliu Liu

    Abstract: Deep Neural Networks (DNN) are crucial in approximating nonlinear functions across diverse applications, ranging from image classification to control. Verifying specific input-output properties can be a highly challenging task due to the lack of a single, self-contained framework that allows a complete range of verification types. To this end, we present ModelVerification.jl (MV), the fir…

    Submitted 30 June, 2024; originally announced July 2024.

  12. arXiv:2407.00386  [pdf, other]

    cs.NE cs.AI

    Multi-task multi-constraint differential evolution with elite-guided knowledge transfer for coal mine integrated energy system dispatching

    Authors: Canyun Dai, Xiaoyan Sun, Hejuan Hu, Wei Song, Yong Zhang, Dunwei Gong

    Abstract: The dispatch optimization of coal mine integrated energy system is challenging due to high dimensionality, strong coupling constraints, and multiobjective. Existing constrained multiobjective evolutionary algorithms struggle with locating multiple small and irregular feasible regions, making them inapplicable to this problem. To address this issue, we here develop a multitask evolutionary algorithm…

    Submitted 29 June, 2024; originally announced July 2024.

  13. Detecting Frames in News Headlines and Lead Images in U.S. Gun Violence Coverage

    Authors: Isidora Chara Tourni, Lei Guo, Hengchang Hu, Edward Halim, Prakash Ishwar, Taufiq Daryanto, Mona Jalal, Boqi Chen, Margrit Betke, Fabian Zhafransyah, Sha Lai, Derry Tanti Wijaya

    Abstract: News media structure their reporting of events or issues using certain perspectives. When describing an incident involving gun violence, for example, some journalists may focus on mental health or gun regulation, while others may emphasize the discussion of gun rights. Such perspectives are called "frames" in communication research. We study, for the first time, the value of combining lead i…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: published at Findings of the Association for Computational Linguistics: EMNLP 2021

  14. arXiv:2406.16531  [pdf, other]

    cs.CV

    GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization

    Authors: Yirui Chen, Xudong Huang, Quan Zhang, Wei Li, Mingjian Zhu, Qiangyu Yan, Simiao Li, Hanting Chen, Hailin Hu, Jie Yang, Wei Liu, Jie Hu

    Abstract: The extraordinary ability of generative models emerges as a new trend in image editing and generating realistic images, posing a serious threat to the trustworthiness of multimedia data and driving the research of image manipulation detection and location (IMDL). However, the lack of a large-scale data foundation makes IMDL task unattainable. In this paper, a local manipulation pipeline is designed…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Code page: https://github.com/chenyirui/GIM

  15. arXiv:2406.15877  [pdf, other]

    cs.SE cs.AI cs.CL

    BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

    Authors: Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, et al. (8 additional authors not shown)

    Abstract: Automated software engineering has been greatly empowered by the recent advances in Large Language Models (LLMs) for programming. While current benchmarks have shown that LLMs can perform various software engineering tasks like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks. Solving challenging and practical programming tasks requires…

    Submitted 26 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: 44 pages, 14 figures, 7 tables, built with love by the BigCode community :)

  16. arXiv:2406.14171  [pdf, other]

    cs.AI cs.CL

    Ranking LLMs by compression

    Authors: Peijia Guo, Ziguang Li, Haibo Hu, Chao Huang, Ming Li, Rui Zhang

    Abstract: We conceptualize the process of understanding as information compression, and propose a method for ranking large language models (LLMs) based on lossless data compression. We demonstrate the equivalence of compression length under arithmetic coding with cumulative negative log probabilities when using a large language model as a prior, that is, the pre-training phase of the model is essentially th…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 7 pages, 4 tables

  17. arXiv:2406.13121  [pdf, other]

    cs.CL cs.AI cs.IR

    Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

    Authors: Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu

    Abstract: Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 29 pages. Dataset available at https://github.com/google-deepmind/loft

  18. arXiv:2406.12385  [pdf, other]

    cs.AR

    Accelerating Graph-based Vector Search via Delayed-Synchronization Traversal

    Authors: Wenqi Jiang, Hang Hu, Torsten Hoefler, Gustavo Alonso

    Abstract: Vector search systems are indispensable in large language model (LLM) serving, search engines, and recommender systems, where minimizing online search latency is essential. Among various algorithms, graph-based vector search (GVS) is particularly popular due to its high search performance and quality. To efficiently serve low-latency GVS, we propose a hardware-algorithm co-design solution includin…

    Submitted 18 June, 2024; originally announced June 2024.

  19. arXiv:2406.12349  [pdf, other]

    math.OC cs.LG

    Effective Generation of Feasible Solutions for Integer Programming via Guided Diffusion

    Authors: Hao Zeng, Jiaqi Wang, Avirup Das, Junying He, Kunpeng Han, Haoyuan Hu, Mingfei Sun

    Abstract: Feasible solutions are crucial for Integer Programming (IP) since they can substantially speed up the solving process. In many applications, similar IP instances often exhibit similar structures and shared solution distributions, which can be potentially modeled by deep learning methods. Unfortunately, existing deep-learning-based algorithms, such as Neural Diving and Predict-and-search framework,…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to SIGKDD 2024

  20. arXiv:2406.10501  [pdf, other]

    cs.CV

    Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition

    Authors: Weichao Zhao, Wengang Zhou, Hezhen Hu, Min Wang, Houqiang Li

    Abstract: Recently, there have been efforts to improve the performance in sign language recognition by designing self-supervised learning methods. However, these methods capture limited information from sign pose data in a frame-wise learning manner, leading to sub-optimal solutions. To this end, we propose a simple yet effective self-supervised contrastive learning framework to excavate rich context via sp…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted by TIP2023

  21. arXiv:2406.10347  [pdf, other]

    cs.NI

    A Near-Optimal Category Information Sampling in RFID Systems

    Authors: Xiujun Wang, Zhi Liu, Xiaokang Zhou, Yong Liao, Han Hu, Xiao Zheng, Jie Li

    Abstract: In many RFID-enabled applications, objects are classified into different categories, and the information associated with each object's category (called category information) is written into the attached tag, allowing the reader to access it later. The category information sampling in such RFID systems, which is to randomly choose (sample) a few tags from each category and collect their category in…

    Submitted 18 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 37 pages, 11 figures

  22. arXiv:2406.09810  [pdf, other]

    cs.RO eess.SY

    Think Deep and Fast: Learning Neural Nonlinear Opinion Dynamics from Inverse Dynamic Games for Split-Second Interactions

    Authors: Haimin Hu, Jonathan DeCastro, Deepak Gopinath, Guy Rosman, Naomi Ehrich Leonard, Jaime Fernández Fisac

    Abstract: Non-cooperative interactions commonly occur in multi-agent scenarios such as car racing, where an ego vehicle can choose to overtake the rival, or stay behind it until a safe overtaking "corridor" opens. While an expert human can do well at making such time-sensitive decisions, the development of safe and efficient game-theoretic trajectory planners capable of rapidly reasoning discrete options is…

    Submitted 14 June, 2024; originally announced June 2024.

  23. arXiv:2406.09770  [pdf, other]

    cs.LG cs.AI

    Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion

    Authors: Anke Tang, Li Shen, Yong Luo, Shiwei Liu, Han Hu, Bo Du

    Abstract: Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost of training and evaluating models. Efficient Pareto front approximation of large models enables multi-objective optimization for various tasks such as multi-task learning and trade-off analysis. Existing algorithms for l…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: code is available at https://github.com/tanganke/pareto_set_learning

  24. arXiv:2406.03280  [pdf, other]

    cs.LG cs.AI cs.CL

    FusionBench: A Comprehensive Benchmark of Deep Model Fusion

    Authors: Anke Tang, Li Shen, Yong Luo, Han Hu, Bo Du, Dacheng Tao

    Abstract: Deep model fusion is an emerging technique that unifies the predictions or parameters of several deep neural networks into a single model in a cost-effective and data-efficient manner. This enables the unified model to take advantage of the original models' strengths, potentially exceeding their performance. Although a variety of deep model fusion techniques have been introduced, their evaluations…

    Submitted 14 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Project homepage: https://github.com/tanganke/fusion_bench

  25. arXiv:2406.00779  [pdf, other]

    cs.LG

    Differentiation of Multi-objective Data-driven Decision Pipeline

    Authors: Peng Li, Lixia Wu, Chaoqun Feng, Haoyuan Hu, Lei Fu, Jieping Ye

    Abstract: Real-world scenarios frequently involve multi-objective data-driven optimization problems, characterized by unknown problem coefficients and multiple conflicting objectives. Traditional two-stage methods independently apply a machine learning model to estimate problem coefficients, followed by invoking a solver to tackle the predicted optimization problem. The independent use of optimization solve…

    Submitted 2 June, 2024; originally announced June 2024.

  26. arXiv:2405.20984  [pdf, other]

    cs.LG

    Bayesian Design Principles for Offline-to-Online Reinforcement Learning

    Authors: Hao Hu, Yiqin Yang, Jianing Ye, Chengjie Wu, Ziqing Mai, Yujing Hu, Tangjie Lv, Changjie Fan, Qianchuan Zhao, Chongjie Zhang

    Abstract: Offline reinforcement learning (RL) is crucial for real-world applications where exploration can be costly or unsafe. However, offline learned policies are often suboptimal, and further online fine-tuning is required. In this paper, we tackle the fundamental dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimis…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Forty-first International Conference on Machine Learning (ICML), 2024

  27. arXiv:2405.20666  [pdf, other]

    cs.CV

    MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition

    Authors: Weichao Zhao, Hezhen Hu, Wengang Zhou, Yunyao Mao, Min Wang, Houqiang Li

    Abstract: Sign language recognition (SLR) has long been plagued by insufficient model representation capabilities. Although current pre-training approaches have alleviated this dilemma to some extent and yielded promising performance by employing various pretext tasks on sign pose data, these methods still suffer from two primary limitations: 1) Explicit motion information is usually disregarded in previous…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted by TCSVT 2024

  28. arXiv:2405.20335  [pdf, other]

    cs.CL

    Xwin-LM: Strong and Scalable Alignment Practice for LLMs

    Authors: Bolin Ni, JingCheng Hu, Yixuan Wei, Houwen Peng, Zheng Zhang, Gaofeng Meng, Han Hu

    Abstract: In this work, we present Xwin-LM, a comprehensive suite of alignment methodologies for large language models (LLMs). This suite encompasses several key techniques, including supervised finetuning (SFT), reward modeling (RM), rejection sampling finetuning (RS), and direct preference optimization (DPO). The key components are as follows: (1) Xwin-LM-SFT, models initially finetuned with high-quality…

    Submitted 30 May, 2024; originally announced May 2024.

  29. arXiv:2405.18658  [pdf, other]

    q-bio.NC cs.AI

    D-CoRP: Differentiable Connectivity Refinement for Functional Brain Networks

    Authors: Haoyu Hu, Hongrun Zhang, Chao Li

    Abstract: Brain network is an important tool for understanding the brain, offering insights for scientific research and clinical diagnosis. Existing models for brain networks typically primarily focus on brain regions or overlook the complexity of brain connectivities. MRI-derived brain network data is commonly susceptible to connectivity noise, underscoring the necessity of incorporating connectivities int…

    Submitted 28 May, 2024; originally announced May 2024.

  30. arXiv:2405.17796  [pdf, ps, other]

    cs.LG stat.ML

    Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff

    Authors: Jian Qian, Haichen Hu, David Simchi-Levi

    Abstract: Motivated by the recent discovery of a statistical and computational reduction from contextual bandits to offline regression (Simchi-Levi and Xu, 2021), we address the general (stochastic) Contextual Markov Decision Process (CMDP) problem with horizon H (also known as CMDP with H layers). In this paper, we introduce a reduction from CMDPs to offline density estimation under the realizability assumpt…

    Submitted 27 May, 2024; originally announced May 2024.

  31. arXiv:2405.17372  [pdf, other]

    cs.AI cs.LG cs.RO

    BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction

    Authors: Zikang Zhou, Haibo Hu, Xinhong Chen, Jianping Wang, Nan Guan, Kui Wu, Yung-Hui Li, Yu-Kai Huang, Chun Jason Xue

    Abstract: Simulating realistic interactions among traffic agents is crucial for efficiently validating the safety of autonomous driving systems. Existing leading simulators primarily use an encoder-decoder structure to encode the historical trajectories for future simulation. However, such a paradigm complicates the model architecture, and the manual separation of history and future trajectories leads to lo…

    Submitted 27 May, 2024; originally announced May 2024.

  32. arXiv:2405.14444  [pdf]

    cs.CV

    DuEDL: Dual-Branch Evidential Deep Learning for Scribble-Supervised Medical Image Segmentation

    Authors: Yitong Yang, Xinli Xu, Haigen Hu, Haixia Long, Qianwei Zhou, Qiu Guan

    Abstract: Despite the recent progress in medical image segmentation with scribble-based annotations, the segmentation results of most models are still not robust and generalizable enough in open environments. Evidential deep learning (EDL) has recently been proposed as a promising solution to model predictive uncertainty and improve the reliability of medical image segmentation. However directly applying…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 14 pages, 2 figures

  33. arXiv:2405.13636  [pdf, other]

    cs.SD cs.AI eess.AS

    Audio Mamba: Pretrained Audio State Space Model For Audio Tagging

    Authors: Jiaju Lin, Haoxuan Hu

    Abstract: Audio tagging is an important task of mapping audio samples to their corresponding categories. Recently endeavours that exploit transformer models in this field have achieved great success. However, the quadratic self-attention cost limits the scaling of audio transformer models and further constrains the development of more universal audio models. In this paper, we attempt to solve this problem b…

    Submitted 22 May, 2024; originally announced May 2024.

  34. arXiv:2405.12163  [pdf, other]

    cs.CL cs.AI

    Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging

    Authors: Xiaobo Liang, Haoke Zhang, Helan Hu, Juntao Li, Jun Xu, Min Zhang

    Abstract: The rapid advancement of large language models has given rise to a plethora of applications across a myriad of real-world tasks, mainly centered on aligning with human intent. However, the complexities inherent in human intent necessitate a dependence on labor-intensive and time-consuming human evaluation. To alleviate this constraint, we delve into the paradigm of employing open-source large lang…

    Submitted 20 May, 2024; originally announced May 2024.

  35. arXiv:2405.08816  [pdf, other]

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c…

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  36. arXiv:2405.08125  [pdf, other]

    cs.CY cs.AI cs.LG

    AI-Cybersecurity Education Through Designing AI-based Cyberharassment Detection Lab

    Authors: Ebuka Okpala, Nishant Vishwamitra, Keyan Guo, Song Liao, Long Cheng, Hongxin Hu, Yongkai Wu, Xiaohong Yuan, Jeannette Wade, Sajad Khorsandroo

    Abstract: Cyberharassment is a critical, socially relevant cybersecurity problem because of the adverse effects it can have on targeted groups or individuals. While progress has been made in understanding cyber-harassment, its detection, attacks on artificial intelligence (AI) based cyberharassment systems, and the social problems in cyberharassment detectors, little has been done in designing experiential…

    Submitted 16 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: 10 pages

  37. arXiv:2405.07994  [pdf]

    eess.IV cs.AI cs.CV cs.LG

    BubbleID: A Deep Learning Framework for Bubble Interface Dynamics Analysis

    Authors: Christy Dunlap, Changgen Li, Hari Pandey, Ngan Le, Han Hu

    Abstract: This paper presents BubbleID, a sophisticated deep learning architecture designed to comprehensively identify both static and dynamic attributes of bubbles within sequences of boiling images. By amalgamating segmentation powered by Mask R-CNN with SORT-based tracking techniques, the framework is capable of analyzing each bubble's location, dimensions, interface shape, and velocity over its lifetim…

    Submitted 20 March, 2024; originally announced May 2024.

    Comments: 16 pages, 4 figures

  38. arXiv:2405.07018  [pdf, other]

    cs.CR

    Shadow-Free Membership Inference Attacks: Recommender Systems Are More Vulnerable Than You Thought

    Authors: Xiaoxiao Chi, Xuyun Zhang, Yan Wang, Lianyong Qi, Amin Beheshti, Xiaolong Xu, Kim-Kwang Raymond Choo, Shuo Wang, Hongsheng Hu

    Abstract: Recommender systems have been successfully applied in many applications. Nonetheless, recent studies demonstrate that recommender systems are vulnerable to membership inference attacks (MIAs), leading to the leakage of users' membership privacy. However, existing MIAs relying on shadow training suffer a large performance drop when the attacker lacks knowledge of the training data distribution and…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by IJCAI-24

  39. arXiv:2405.05928  [pdf]

    cs.HC

    Moderating Embodied Cyber Threats Using Generative AI

    Authors: Keyan Guo, Freeman Guo, Hongxin Hu

    Abstract: The advancement in computing and hardware, like spatial computing and VR headsets (e.g., Apple's Vision Pro) [1], has boosted the popularity of social VR platforms (VRChat, Rec Room, Meta Horizon Worlds) [2, 3, 4]. Unlike traditional digital interactions, social VR allows for more immersive experiences, with avatars that mimic users' real-time movements and enable physical-like interactions. Howeve…

    Submitted 23 April, 2024; originally announced May 2024.

    Comments: This is an accepted position statement of CHI 2024 Workshop (Novel Approaches for Understanding and Mitigating Emerging New Harms in Immersive and Embodied Virtual Spaces: A Workshop at CHI 2024)

  40. arXiv:2405.04858  [pdf, other]

    cs.CV

    Pedestrian Attribute Recognition as Label-balanced Multi-label Learning

    Authors: Yibo Zhou, Hai-Miao Hu, Yirong Xiang, Xiaokang Zhang, Haotian Wu

    Abstract: Rooting in the scarcity of most attributes, realistic pedestrian attribute datasets exhibit unduly skewed data distribution, from which two types of model failures are delivered: (1) label imbalance: model predictions lean greatly towards the side of majority labels; (2) semantics imbalance: model is easily overfitted on the under-represented attributes due to their insufficient semantic diversity…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted as ICML2024 main conference paper

  41. arXiv:2405.04115  [pdf, other]

    cs.CR

    A Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack against Split Learning

    Authors: Xiaoyang Xu, Mengda Yang, Wenzhe Yi, Ziang Li, Juan Wang, Hongxin Hu, Yong Zhuang, Yaxin Liu

    Abstract: Split Learning (SL) is a distributed learning framework renowned for its privacy-preserving features and minimal computational requirements. Previous research consistently highlights the potential privacy breaches in SL systems by server adversaries reconstructing training data. However, these studies often rely on strong assumptions or compromise system utility to enhance attack performance. This…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024

  42. arXiv:2405.03673  [pdf, other]

    cs.CV cs.AI

    MemoryMamba: Memory-Augmented State Space Model for Defect Recognition

    Authors: Qianning Wang, He Hu, Yucheng Zhou

    Abstract: As automation advances in manufacturing, the demand for precise and sophisticated defect detection technologies grows. Existing vision models for defect recognition are insufficient for handling the complexities and variations of defects in contemporary manufacturing settings. These models especially struggle in scenarios involving limited or imbalanced defect data. In this work, we introd…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 15 pages, 7 figures

  43. arXiv:2405.03110  [pdf, other]

    cs.IR

    Vector Quantization for Recommender Systems: A Review and Outlook

    Authors: Qijiong Liu, Xiaoyu Dong, Jiaren Xiao, Nuo Chen, Hengchang Hu, Jieming Zhu, Chenxu Zhu, Tetsuya Sakai, Xiao-Ming Wu

    Abstract: Vector quantization, renowned for its unparalleled feature compression capabilities, has been a prominent topic in signal processing and machine learning research for several decades and remains widely utilized today. With the emergence of large models and generative AI, vector quantization has gained popularity in recommender systems, establishing itself as a preferred solution. This paper starts…

    Submitted 5 May, 2024; originally announced May 2024.

  44. arXiv:2405.02676  [pdf, other]

    cs.CV cs.GR

    Hand-Object Interaction Controller (HOIC): Deep Reinforcement Learning for Reconstructing Interactions with Physics

    Authors: Haoyu Hu, Xinyu Yi, Zhe Cao, Jun-Hai Yong, Feng Xu

    Abstract: Hand manipulating objects is an important interaction motion in our daily activities. We faithfully reconstruct this motion with a single RGBD camera by a novel deep reinforcement learning method to leverage physics. Firstly, we propose object compensation control which establishes direct object control to make the network training more stable. Meanwhile, by leveraging the compensation force and t…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH 2024 Conference Track

    ACM Class: I.5.4

  45. arXiv:2405.01868  [pdf, other]

    cs.CL

    Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems

    Authors: Chuang Li, Yang Deng, Hengchang Hu, Min-Yen Kan, Haizhou Li

    Abstract: This paper aims to efficiently enable large language models (LLMs) to use external knowledge and goal guidance in conversational recommender system (CRS) tasks. Advanced LLMs (e.g., ChatGPT) are limited in domain-specific CRS tasks for 1) generating grounded responses with recommendation-oriented knowledge, or 2) proactively leading the conversations through different dialogue goals. In this work,…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: Main paper 8 pages; References and Appendix 9 pages; 7 figures and 14 tables

  46. arXiv:2404.19509  [pdf, other]

    cs.CL

    Do Large Language Models Understand Conversational Implicature -- A case study with a Chinese sitcom

    Authors: Shisen Yue, Siyuan Song, Xinyuan Cheng, Hai Hu

    Abstract: Understanding the non-literal meaning of an utterance is critical for large language models (LLMs) to become human-like social communicators. In this work, we introduce SwordsmanImp, the first Chinese multi-turn-dialogue-based dataset aimed at conversational implicature, sourced from dialogues in the Chinese sitcom $\textit{My Own Swordsman}$. It includes 200 carefully handcrafted questions, all a…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 14 pages, 8 tables and 5 figures

    ACM Class: J.5

  47. arXiv:2404.19307  [pdf, other]

    cs.SE cs.CR

    Enhancing GUI Exploration Coverage of Android Apps with Deep Link-Integrated Monkey

    Authors: Han Hu, Han Wang, Ruiqi Dong, Xiao Chen, Chunyang Chen

    Abstract: Mobile apps are ubiquitous in our daily lives for supporting different tasks such as reading and chatting. Despite the availability of many GUI testing tools, app testers still struggle with low testing code coverage due to tools frequently getting stuck in loops or overlooking activities with concealed entries. This results in a significant amount of testing time being spent on redundant and repe…

    Submitted 30 April, 2024; originally announced April 2024.

  48. arXiv:2404.15598  [pdf, other]

    cs.LG cs.CR

    Federated Learning with Only Positive Labels by Exploring Label Correlations

    Authors: Xuming An, Dui Wang, Li Shen, Yong Luo, Han Hu, Bo Du, Yonggang Wen, Dacheng Tao

    Abstract: Federated learning aims to collaboratively learn a model by using the data from multiple users under privacy constraints. In this paper, we study the multi-label classification problem under the federated learning setting, where trivial solution and extremely poor performance may be obtained, especially when only positive data w.r.t. a single class label are provided for each client. This issue ca…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: To be published in IEEE Transactions on Neural Networks and Learning Systems

  49. arXiv:2404.13456  [pdf, other]

    cs.LG cs.RO eess.SY

    Real-Time Safe Control of Neural Network Dynamic Models with Sound Approximation

    Authors: Hanjiang Hu, Jianglin Lan, Changliu Liu

    Abstract: Safe control of neural network dynamic models (NNDMs) is important to robotics and many applications. However, it remains challenging to compute an optimal safe control in real time for NNDM. To enable real-time computation, we propose to use a sound approximation of the NNDM in the control synthesis. In particular, we propose Bernstein over-approximated neural dynamics (BOND) based on the Bernste…

    Submitted 20 May, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

    Comments: Camera-ready version of L4DC 2024, 12 pages, 3 figures, 4 tables

  50. arXiv:2404.09172  [pdf, other]

    cs.CV cs.AI

    LoopAnimate: Loopable Salient Object Animation

    Authors: Fanyi Wang, Peng Liu, Haotian Hu, Dan Meng, Jingwen Su, Jinjin Xu, Yanhao Zhang, Xiaoming Ren, Zhiwang Zhang

    Abstract: Research on diffusion model-based video generation has advanced rapidly. However, limitations in object fidelity and generation length hinder its practical applications. Additionally, specific domains like animated wallpapers require seamless looping, where the first and last frames of the video match seamlessly. To address these challenges, this paper proposes LoopAnimate, a novel method for gene…

    Submitted 16 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.