Skip to main content

Showing 1–50 of 10,667 results for author: Wang, H

  1. arXiv:2407.09016  [pdf, other

    cs.RO

    OVExp: Open Vocabulary Exploration for Object-Oriented Navigation

    Authors: Meng Wei, Tai Wang, Yilun Chen, Hanqing Wang, Jiangmiao Pang, Xihui Liu

    Abstract: Object-oriented embodied navigation aims to locate specific objects, defined by category or depicted in images. Existing methods often struggle to generalize to open vocabulary goals without extensive training data. While recent advances in Vision-Language Models (VLMs) offer a promising solution by extending object recognition beyond predefined categories, efficient goal-oriented exploration beco… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  2. arXiv:2407.08937  [pdf, other

    cs.CL cs.AI

    Self-Evolving GPT: A Lifelong Autonomous Experiential Learner

    Authors: Jinglong Gao, Xiao Ding, Yiming Cui, Jianbai Zhao, Hepeng Wang, Ting Liu, Bing Qin

    Abstract: To improve the performance of large language models (LLMs), researchers have explored providing LLMs with textual task-solving experience via prompts. However, they rely on manual efforts to acquire and apply such experience for each task, which is not feasible for the growing demand for LLMs and the variety of user questions. To address this issue, we design a lifelong autonomous experiential lea… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024 MAIN

  3. Exploring Knowledge Transfer in Evolutionary Many-task Optimization: A Complex Network Perspective

    Authors: Yudong Yang, Kai Wu, Xiangyi Teng, Handing Wang, He Yu, Jing Liu

    Abstract: The field of evolutionary many-task optimization (EMaTO) is increasingly recognized for its ability to streamline the resolution of optimization challenges with repetitive characteristics, thereby conserving computational resources. This paper tackles the challenge of crafting efficient knowledge transfer mechanisms within EMaTO, a task complicated by the computational demands of individual task e… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 9 pages, accepted by GECCO 2024 poster

  4. PAIL: Performance based Adversarial Imitation Learning Engine for Carbon Neutral Optimization

    Authors: Yuyang Ye, Lu-An Tang, Haoyu Wang, Runlong Yu, Wenchao Yu, Erhu He, Haifeng Chen, Hui Xiong

    Abstract: Achieving carbon neutrality within industrial operations has become increasingly imperative for sustainable development. It is both a significant challenge and a key opportunity for operational optimization in industry 4.0. In recent years, Deep Reinforcement Learning (DRL) based methods offer promising enhancements for sequential optimization processes and can be used for reducing carbon emission… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  5. arXiv:2407.08770  [pdf, other

    cs.AI

    Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

    Authors: Huanqian Wang, Yang Yue, Rui Lu, Jingxin Shi, Andrew Zhao, Shenzhi Wang, Shiji Song, Gao Huang

    Abstract: Large Language Models (LLMs) have demonstrated great potential as generalist assistants, showcasing powerful task understanding and problem-solving capabilities. To deploy LLMs as AI assistants, it is crucial that these models exhibit desirable behavioral traits, such as non-toxicity and resilience against jailbreak attempts. Current methods for detoxification or preventing jailbreaking usually in… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 23 pages, 14 figures

    MSC Class: 68T50 (Primary) 68T07; 62M45 (Secondary) ACM Class: I.2.7

  6. arXiv:2407.08572  [pdf, other

    cs.CV

    Boosting Adversarial Transferability for Skeleton-based Action Recognition via Exploring the Model Posterior Space

    Authors: Yunfeng Diao, Baiqi Wu, Ruixuan Zhang, Xun Yang, Meng Wang, He Wang

    Abstract: Skeletal motion plays a pivotal role in human activity recognition (HAR). Recently, attack methods have been proposed to identify the universal vulnerability of skeleton-based HAR(S-HAR). However, the research of adversarial transferability on S-HAR is largely missing. More importantly, existing attacks all struggle in transfer across unknown S-HAR models. We observed that the key reason is that t… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  7. arXiv:2407.08546  [pdf, other

    cs.CV cs.LG q-bio.QM

    Quantitative Evaluation of the Saliency Map for Alzheimer's Disease Classifier with Anatomical Segmentation

    Authors: Yihan Zhang, Xuanshuo Zhang, Wei Wu, Haohan Wang

    Abstract: Saliency maps have been widely used to interpret deep learning classifiers for Alzheimer's disease (AD). However, since AD is heterogeneous and has multiple subtypes, the pathological mechanism of AD remains not fully understood and may vary from patient to patient. Due to the lack of such understanding, it is difficult to comprehensively and effectively assess the saliency map of AD classifier. I… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  8. arXiv:2407.08532  [pdf, other

    cs.CR cs.SE

    Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models

    Authors: Ying Zhang, Xiaoyan Zhou, Hui Wen, Wenjia Niu, Jiqiang Liu, Haining Wang, Qiang Li

    Abstract: Nowadays, the open-source software (OSS) ecosystem suffers from security threats of software supply chain (SSC) attacks. Interpreted OSS malware plays a vital role in SSC attacks, as criminals have an arsenal of attack vectors to deceive users into installing malware and executing malicious activities. In this paper, we introduce tactics, techniques, and procedures (TTPs) proposed by MITRE ATT\&CK… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 19 pages, 11 figures

  9. arXiv:2407.08507  [pdf, other

    cs.CV

    Bootstrapping Vision-language Models for Self-supervised Remote Physiological Measurement

    Authors: Zijie Yue, Miaojing Shi, Hanli Wang, Shuai Ding, Qijun Chen, Shanlin Yang

    Abstract: Facial video-based remote physiological measurement is a promising research area for detecting human vital signs (e.g., heart rate, respiration frequency) in a non-contact way. Conventional approaches are mostly supervised learning, requiring extensive collections of facial videos and synchronously recorded photoplethysmography (PPG) signals. To tackle it, self-supervised learning has recently gai… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  10. arXiv:2407.08422  [pdf, other

    cs.CR cs.AI

    On the (In)Security of LLM App Stores

    Authors: Xinyi Hou, Yanjie Zhao, Haoyu Wang

    Abstract: LLM app stores have seen rapid growth, leading to the proliferation of numerous custom LLM apps. However, this expansion raises security concerns. In this study, we propose a three-layer concern framework to identify the potential security risks of LLM apps, i.e., LLM apps with abusive potential, LLM apps with malicious intent, and LLM apps with exploitable vulnerabilities. Over five months, we co… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  11. arXiv:2407.08194  [pdf, other

    cond-mat.quant-gas cond-mat.str-el quant-ph

    Uncovering Emergent Spacetime Supersymmetry with Rydberg Atom Arrays

    Authors: Chengshu Li, Shang Liu, Hanteng Wang, Wenjun Zhang, Zi-Xiang Li, Hui Zhai, Yingfei Gu

    Abstract: In the zoo of emergent symmetries in quantum many-body physics, the previously unrealized emergent spacetime supersymmetry (SUSY) is particularly intriguing. Although it was known that spacetime SUSY could emerge at the (1+1)d tricritical Ising transition, an experimental realization is still absent. In this letter, we propose to realize the tricritical Ising transition with Rydberg atom arrays, t… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 7 pages, 3 figures

  12. arXiv:2407.07984  [pdf

    cond-mat.mtrl-sci

    Pseudosymmetry in Tetragonal Perovskite SrIrO$_3$ Synthesized under High Pressure

    Authors: Haozhe Wang, Alberto de la Torre, Joseph T. Race, Qiaochu Wang, Jacob P. C. Ruff, Patrick M. Woodward, Kemp W. Plumb, David Walker, Weiwei Xie

    Abstract: In this study, we report a tetragonal perovskite structure of SrIrO$_3$ (P4/mmm, a = 3.9362(9) Å, c = 7.880(3) Å) synthesized at 6 GPa and 1400 $°$C, employing the ambient pressure monoclinic SrIrO$_3$ with distorted 6H structure as a precursor. The crystal structure of tetragonal SrIrO3 was evaluated on the basis of single crystal and powder X-ray diffraction. A cubic indexing was observed attrib… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 31 pages, 12 figures

  13. arXiv:2407.07844  [pdf, other

    cs.CV

    OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

    Authors: Hao Wang, Pengzhen Ren, Zequn Jie, Xiao Dong, Chengjian Feng, Yinlong Qian, Lin Ma, Dongmei Jiang, Yaowei Wang, Xiangyuan Lan, Xiaodan Liang

    Abstract: Open-vocabulary detection is a challenging task due to the requirement of detecting objects based on class names, including those not encountered during training. Existing methods have shown strong zero-shot detection capabilities through pre-training on diverse large-scale datasets. However, these approaches still face two primary challenges: (i) how to universally integrate diverse data sources… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Technical Report

  14. arXiv:2407.07763  [pdf, other

    cs.CV

    S&D Messenger: Exchanging Semantic and Domain Knowledge for Generic Semi-Supervised Medical Image Segmentation

    Authors: Qixiang Zhang, Haonan Wang, Xiaomeng Li

    Abstract: Semi-supervised medical image segmentation (SSMIS) has emerged as a promising solution to tackle the challenges of time-consuming manual labeling in the medical field. However, in practical scenarios, there are often domain variations within the datasets, leading to derivative scenarios like semi-supervised medical domain generalization (Semi-MDG) and unsupervised medical domain adaptation (UMDA).… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 10 pages, under review of IEEE Transcations on Medical Imaging

  15. arXiv:2407.07666  [pdf

    cs.CL cs.AI

    A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models : Safety, Consensus, Objectivity, Reproducibility and Explainability

    Authors: Ting Fang Tan, Kabilan Elangovan, Jasmine Ong, Nigam Shah, Joseph Sung, Tien Yin Wong, Lan Xue, Nan Liu, Haibo Wang, Chang Fu Kuo, Simon Chesterman, Zee Kin Yeong, Daniel SW Ting

    Abstract: A comprehensive qualitative evaluation framework for large language models (LLM) in healthcare that expands beyond traditional accuracy and quantitative metrics needed. We propose 5 key aspects for evaluation of LLMs: Safety, Consensus, Objectivity, Reproducibility and Explainability (S.C.O.R.E.). We suggest that S.C.O.R.E. may form the basis for an evaluation framework for future LLM-based models… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  16. arXiv:2407.07651  [pdf, other

    hep-ex physics.data-an

    Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$

    Authors: M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann , et al. (645 additional authors not shown)

    Abstract: The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  17. arXiv:2407.07552  [pdf

    physics.app-ph

    Attojoule superconducting thermal logic and memories

    Authors: Hui Wang, Niels Noordzij, Stephan Steinhauer, Thomas Descamps, Eitan Oksenberg, Val Zwiller, Iman Esmaeil Zadeh

    Abstract: Due to stringent thermal budgets in cryogenic technologies such as superconducting quantum computers and sensors, minimizing the energy dissipation and power consumption of cryogenic electronic components is pivotal for large-scale devices. However, electronic building blocks that simultaneously offer low energy consumption, fast switching, low error rates, a small footprint and simple fabrication… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  18. arXiv:2407.07544  [pdf, other

    cs.CV cs.AI

    Disentangling Masked Autoencoders for Unsupervised Domain Generalization

    Authors: An Zhang, Han Wang, Xiang Wang, Tat-Seng Chua

    Abstract: Domain Generalization (DG), designed to enhance out-of-distribution (OOD) generalization, is all about learning invariance against domain shifts utilizing sufficient supervision signals. Yet, the scarcity of such labeled data has led to the rise of unsupervised domain generalization (UDG) - a more important yet challenging task in that models are trained across diverse domains in an unsupervised m… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  19. arXiv:2407.07419  [pdf, other

    eess.SP

    Timing Recovery for Non-Orthogonal Multiple Access with Asynchronous Clock

    Authors: Qingxin Lu, Haide Wang, Wenxuan Mo, Ji Zhou, Weiping Liu, Changyuan Yu

    Abstract: A passive optical network (PON) based on non-orthogonal multiple access (NOMA) meets low latency and high capacity. In the NOMA-PON, the asynchronous clock between the strong and weak optical network units (ONUs) causes the timing error and phase noise on the signal of the weak ONU. The theoretical derivation shows that the timing error and phase noise can be independently compensated. In this Let… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: The Letter has been submitted to the IEEE Photonics Technology Letters

  20. arXiv:2407.07390  [pdf, other

    cond-mat.soft cond-mat.stat-mech

    Hole Statistics of Equilibrium 2D and 3D Hard-Sphere Crystals

    Authors: Haina Wang, David A. Huse, Salvatore Torquato

    Abstract: The probability of finding a spherical "hole" of a given radius $r$ contains crucial structural information about many-body systems. Such hole statistics, including the void conditional nearest-neighbor probability functions $G_V(r)$, have been well studied for hard-sphere fluids in $d$-dimensional Euclidean space $\mathbb{R}^d$. However, little is known about these functions for hard-sphere cryst… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  21. arXiv:2407.07374  [pdf, other

    cs.CV

    DuInNet: Dual-Modality Feature Interaction for Point Cloud Completion

    Authors: Xinpu Liu, Baolin Hou, Hanyun Wang, Ke Xu, Jianwei Wan, Yulan Guo

    Abstract: To further promote the development of multimodal point cloud completion, we contribute a large-scale multimodal point cloud completion benchmark ModelNet-MPC with richer shape categories and more diverse test data, which contains nearly 400,000 pairs of high-quality point clouds and rendered images of 40 categories. Besides the fully supervised point cloud completion task, two additional tasks inc… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Under Review, 13 pages, 7 figures

  22. arXiv:2407.07289  [pdf, other

    cs.CV

    Deformable Feature Alignment and Refinement for Moving Infrared Dim-small Target Detection

    Authors: Dengyan Luo, Yanping Xiang, Hu Wang, Luping Ji, Shuai Li, Mao Ye

    Abstract: The detection of moving infrared dim-small targets has been a challenging and prevalent research topic. The current state-of-the-art methods are mainly based on ConvLSTM to aggregate information from adjacent frames to facilitate the detection of the current frame. However, these methods implicitly utilize motion information only in the training stage and fail to explicitly explore motion compensa… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  23. arXiv:2407.07171  [pdf, other

    cs.CV

    ItTakesTwo: Leveraging Peer Representations for Semi-supervised LiDAR Semantic Segmentation

    Authors: Yuyuan Liu, Yuanhong Chen, Hu Wang, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

    Abstract: The costly and time-consuming annotation process to produce large training sets for modelling semantic LiDAR segmentation methods has motivated the development of semi-supervised learning (SSL) methods. However, such SSL approaches often concentrate on employing consistency learning only for individual LiDAR representations. This narrow focus results in limited perturbations that generally fail to… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 27 pages (15 pages main paper and 12 pages supplementary with references), ECCV 2024 accepted

  24. arXiv:2407.07125  [pdf, other

    gr-qc astro-ph.IM physics.data-an

    Rapid Parameter Estimation for Merging Massive Black Hole Binaries Using ODE-Based Generative Models

    Authors: Bo Liang, Minghui Du, He Wang, Yuxiang Xu, Chang Liu, Xiaotong Wei, Peng Xu, Li-e Qiang, Ziren Luo

    Abstract: Detecting the coalescences of massive black hole binaries (MBHBs) is one of the primary targets for space-based gravitational wave observatories such as LISA, Taiji, and Tianqin. The fast and accurate parameter estimation of merging MBHBs is of great significance for both astrophysics and the global fitting of all resolvable sources. However, such analyses entail significant computational costs. T… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  25. arXiv:2407.06984  [pdf, other

    cs.CV

    Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images

    Authors: Chuanrui Zhang, Yonggen Ling, Minglei Lu, Minghan Qin, Haoqian Wang

    Abstract: We study the 3D object understanding task for manipulating everyday objects with different material properties (diffuse, specular, transparent and mixed). Existing monocular and RGB-D methods suffer from scale ambiguity due to missing or imprecise depth measurements. We present CODERS, a one-stage approach for Category-level Object Detection, pose Estimation and Reconstruction from Stereo images.… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  26. arXiv:2407.06853  [pdf, other

    cs.CR

    TimeTravel: Real-time Timing Drift Attack on System Time Using Acoustic Waves

    Authors: Jianshuo Liu, Hong Li, Haining Wang, Mengjie Sun, Hui Wen, Jinfa Wang, Limin Sun

    Abstract: Real-time Clock (RTC) has been widely used in various real-time systems to provide precise system time. In this paper, we reveal a new security vulnerability of the RTC circuit, where the internal storage time or timestamp can be arbitrarily modified forward or backward. The security threat of dynamic modifications of system time caused by this vulnerability is called TimeTravel. Based on acoustic… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted by USENIX Security 2024 winter cycle and will appear in USENIX Security 2025

  27. arXiv:2407.06850  [pdf, other

    cond-mat.mtrl-sci cond-mat.mes-hall

    Lattice-tunable substituted iron garnets for low-temperature magnonics

    Authors: William Legrand, Yana Kemna, Stefan Schären, Hanchen Wang, Davit Petrosyan, Luise Siegl, Richard Schlitz, Michaela Lammel, Pietro Gambardella

    Abstract: The magnetic resonance of iron garnets is at the heart of rich physics and provides a crucial technology used inside microwave components. A barrier still stands in the way of integrated cryogenic devices with iron garnets, because their epitaxy requires paramagnetic substrates that cause large microwave losses at low temperatures. Finding alternative combinations of substrates and magnetic garnet… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Main: 18 pages, 6 figures; Supplement: 9 pages, 6 figures

  28. arXiv:2407.06841  [pdf, other

    cs.CV

    HTD-Mamba: Efficient Hyperspectral Target Detection with Pyramid State Space Model

    Authors: Dunbin Shen, Xuanbing Zhu, Jiacheng Tian, Jianjun Liu, Zhenrong Du, Hongyu Wang, Xiaorui Ma

    Abstract: Hyperspectral target detection (HTD) identifies objects of interest from complex backgrounds at the pixel level, playing a vital role in Earth observation. However, HTD faces challenges due to limited prior knowledge and spectral variations, leading to underfitting models and unreliable performance. To address these challenges, this paper proposes an efficient self-supervised HTD method with a pyr… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 13 pages,6 figures, 5 tables

  29. arXiv:2407.06645  [pdf, other

    cs.LG cs.CL

    Entropy Law: The Story Behind Data Compression and LLM Performance

    Authors: Mingjia Yin, Chuhan Wu, Yufei Wang, Hao Wang, Wei Guo, Yasheng Wang, Yong Liu, Ruiming Tang, Defu Lian, Enhong Chen

    Abstract: Data is the cornerstone of large language models (LLMs), but not all data is useful for model learning. Carefully selected data can better elicit the capabilities of LLMs with much less computational overhead. Most methods concentrate on evaluating the quality of individual samples in data selection, while the combinatorial effects among samples are neglected. Even if each sample is of perfect qua… ▽ More

    Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  30. arXiv:2407.06573  [pdf, other

    cs.SE

    LLM for Mobile: An Initial Roadmap

    Authors: Daihang Chen, Yonghui Liu, Mingyi Zhou, Yanjie Zhao, Haoyu Wang, Shuai Wang, Xiao Chen, Tegawendé F. Bissyandé, Jacques Klein, Li Li

    Abstract: When mobile meets LLMs, mobile app users deserve to have more intelligent usage experiences. For this to happen, we argue that there is a strong need to appl LLMs for the mobile ecosystem. We therefore provide a research roadmap for guiding our fellow researchers to achieve that as a whole. In this roadmap, we sum up six directions that we believe are urgently required for research to enable nativ… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  31. arXiv:2407.06546  [pdf, other

    cs.CV cs.RO

    Exploring the Causality of End-to-End Autonomous Driving

    Authors: Jiankun Li, Hao Li, Jiangjiang Liu, Zhikang Zou, Xiaoqing Ye, Fan Wang, Jizhou Huang, Hua Wu, Haifeng Wang

    Abstract: Deep learning-based models are widely deployed in autonomous driving areas, especially the increasingly noticed end-to-end solutions. However, the black-box property of these models raises concerns about their trustworthiness and safety for autonomous driving, and how to debug the causality has become a pressing concern. Despite some existing research on the explainability of autonomous driving, t… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  32. arXiv:2407.06453  [pdf, ps, other

    math.RA

    Dual minus partial order

    Authors: Ju Gao, Hongxing Wang, Xiaoji Liu

    Abstract: In this paper, we introduce the Dual-minus partial order, get some characterizations of the partial order, and prove that both the dual star partial order and the dual sharp partial order are Dual-minus-type partial orders. Based on the Dual-minus partial order, we introduce the Dual-minus sharp partial order and the Dual-minus star partial order, which are also Dual-minus-type partial orders. In… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 23 pages

    MSC Class: 15A09; 15A24; 62G30

  33. arXiv:2407.06372  [pdf, other

    cs.LG cs.CV

    Non-Robust Features are Not Always Useful in One-Class Classification

    Authors: Matthew Lau, Haoran Wang, Alec Helbling, Matthew Hul, ShengYun Peng, Martin Andreoni, Willian T. Lunardi, Wenke Lee

    Abstract: The robustness of machine learning models has been questioned by the existence of adversarial examples. We examine the threat of adversarial examples in practical applications that require lightweight models for one-class classification. Building on Ilyas et al. (2019), we investigate the vulnerability of lightweight one-class classifiers to adversarial attacks and possible reasons for it. Our res… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: CVPR Visual and Anomaly Detection (VAND) Workshop 2024

    MSC Class: 68T45 ACM Class: I.2.10; I.4.10; I.5.4

  34. arXiv:2407.06187  [pdf, other

    cs.CV cs.GR

    JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation

    Authors: Yu Zeng, Vishal M. Patel, Haochen Wang, Xun Huang, Ting-Chun Wang, Ming-Yu Liu, Yogesh Balaji

    Abstract: Personalized text-to-image generation models enable users to create images that depict their individual possessions in diverse scenes, finding applications in various domains. To achieve the personalization capability, existing methods rely on finetuning a text-to-image foundation model on a user's custom dataset, which can be non-trivial for general users, resource-intensive, and time-consuming.… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: CVPR 24

  35. arXiv:2407.05924  [pdf, other

    cs.CV

    Graph-Boosted Attentive Network for Semantic Body Parsing

    Authors: Tinghuai Wang, Huiling Wang

    Abstract: Human body parsing remains a challenging problem in natural scenes due to multi-instance and inter-part semantic confusions as well as occlusions. This paper proposes a novel approach to decomposing multiple human bodies into semantic part regions in unconstrained environments. Specifically we propose a convolutional neural network (CNN) architecture which comprises of novel semantic and contour a… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  36. arXiv:2407.05916  [pdf, other

    cs.CV

    Non-parametric Contextual Relationship Learning for Semantic Video Object Segmentation

    Authors: Tinghuai Wang, Huiling Wang

    Abstract: We propose a novel approach for modeling semantic contextual relationships in videos. This graph-based model enables the learning and propagation of higher-level spatial-temporal contexts to facilitate the semantic labeling of local regions. We introduce an exemplar-based nonparametric view of contextual cues, where the inherent relationships implied by object hypotheses are encoded on a similarit… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  37. arXiv:2407.05813  [pdf, other

    hep-ex astro-ph.CO

    DarkSide-20k sensitivity to light dark matter particles

    Authors: DarkSide-20k Collaboration, :, F. Acerbi, P. Adhikari, P. Agnes, I. Ahmad, S. Albergo, I. F. M. Albuquerque, T. Alexander, A. K. Alton, P. Amaudruz, M. Angiolilli, E. Aprile, R. Ardito, M. Atzori Corona, D. J. Auty, M. Ave, I. C. Avetisov, O. Azzolini, H. O. Back, Z. Balmforth, A. Barrado Olmedo, P. Barrillon, G. Batignani, P. Bhowmick , et al. (289 additional authors not shown)

    Abstract: The dual-phase liquid argon time projection chamber is presently one of the leading technologies to search for dark matter particles with masses below 10 GeV/c$^2$. This was demonstrated by the DarkSide-50 experiment with approximately 50 kg of low-radioactivity liquid argon as target material. The next generation experiment DarkSide-20k, currently under construction, will use 1,000 times more arg… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: submitted to Nature Communications

  38. arXiv:2407.05679  [pdf, other

    cs.CV cs.AI

    BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space

    Authors: Yumeng Zhang, Shi Gong, Kaixin Xiong, Xiaoqing Ye, Xiao Tan, Fan Wang, Jizhou Huang, Hua Wu, Haifeng Wang

    Abstract: World models are receiving increasing attention in autonomous driving for their ability to predict potential future scenarios. In this paper, we present BEVWorld, a novel approach that tokenizes multimodal sensor inputs into a unified and compact Bird's Eye View (BEV) latent space for environment modeling. The world model consists of two parts: the multi-modal tokenizer and the latent BEV sequence… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 10 pages

  39. arXiv:2407.05358  [pdf, other

    cs.CV

    CPM: Class-conditional Prompting Machine for Audio-visual Segmentation

    Authors: Yuanhong Chen, Chong Wang, Yuyuan Liu, Hu Wang, Gustavo Carneiro

    Abstract: Audio-visual segmentation (AVS) is an emerging task that aims to accurately segment sounding objects based on audio-visual cues. The success of AVS learning systems depends on the effectiveness of cross-modal interaction. Such a requirement can be naturally fulfilled by leveraging transformer-based segmentation architecture due to its inherent ability to capture long-range dependencies and flexibi… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  40. arXiv:2407.05352  [pdf, other

    cs.CV cs.MM

    Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model

    Authors: Danni Yang, Ruohan Dong, Jiayi Ji, Yiwei Ma, Haowei Wang, Xiaoshuai Sun, Rongrong Ji

    Abstract: Recently, diffusion models have increasingly demonstrated their capabilities in vision understanding. By leveraging prompt-based learning to construct sentences, these models have shown proficiency in classification and visual grounding tasks. However, existing approaches primarily showcase their ability to perform sentence-level localization, leaving the potential for leveraging contextual inform… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  41. arXiv:2407.05119  [pdf, other

    cs.CV

    Open-Event Procedure Planning in Instructional Videos

    Authors: Yilu Wu, Hanlin Wang, Jing Wang, Limin Wang

    Abstract: Given the current visual observations, the traditional procedure planning task in instructional videos requires a model to generate goal-directed plans within a given action space. All previous methods for this task conduct training and inference under the same action space, and they can only plan for pre-defined events in the training set. We argue this setting is not applicable for human assista… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 9 pages(main text), 6 figures, 10 tables

  42. arXiv:2407.05005  [pdf, other

    cs.LG cs.DC

    Personalized Federated Domain-Incremental Learning based on Adaptive Knowledge Matching

    Authors: Yichen Li, Wenchao Xu, Haozhao Wang, Ruixuan Li, Yining Qi, Jingcai Guo

    Abstract: This paper focuses on Federated Domain-Incremental Learning (FDIL) where each client continues to learn incremental tasks where their domain shifts from each other. We propose a novel adaptive knowledge matching-based personalized FDIL approach (pFedDIL) which allows each client to alternatively utilize appropriate incremental task learning strategy on the correlation with the knowledge from previ… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  43. arXiv:2407.04689  [pdf, other

    cs.RO cs.CV

    RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation

    Authors: Yuxuan Kuang, Junjie Ye, Haoran Geng, Jiageng Mao, Congyue Deng, Leonidas Guibas, He Wang, Yue Wang

    Abstract: This work proposes a retrieve-and-transfer framework for zero-shot robotic manipulation, dubbed RAM, featuring generalizability across various objects, environments, and embodiments. Unlike existing approaches that learn manipulation from expensive in-domain demonstrations, RAM capitalizes on a retrieval-based affordance transfer paradigm to acquire versatile manipulation capabilities from abundan… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  44. arXiv:2407.04675  [pdf, other

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li , et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor… ▽ More

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  45. arXiv:2407.04404  [pdf

    cs.AR

    Multi-Antenna Technology for 6G Integrated Sensing and Communication

    Authors: Yong Zeng, Zhenjun Dong, Huizhi Wang, Lipeng Zhu, Ziyao Hong, Qingji Jiang, Dongming Wang, Shi Jin, Rui Zhang

    Abstract: By deploying antenna arrays at the transmitter/receiver to provide additional spatial-domain degrees of freedom (DoFs), multi-antenna technology greatly improves the reliability and efficiency of wireless communication. Meanwhile, the application of multi-antenna technology in the radar field has achieved spatial angle resolution and improved sensing DoF, thus significantly enhancing wireless sens… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: in Chinese language

  46. arXiv:2407.04284  [pdf, other

    cs.MM

    TSC-PCAC: Voxel Transformer and Sparse Convolution Based Point Cloud Attribute Compression for 3D Broadcasting

    Authors: Zixi Guo, Yun Zhang, Linwei Zhu, Hanli Wang, Gangyi Jiang

    Abstract: Point cloud has been the mainstream representation for advanced 3D applications, such as virtual reality and augmented reality. However, the massive data amounts of point clouds is one of the most challenging issues for transmission and storage. In this paper, we propose an end-to-end voxel Transformer and Sparse Convolution based Point Cloud Attribute Compression (TSC-PCAC) for 3D broadcasting. F… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  47. arXiv:2407.04277  [pdf, other

    cs.CV

    Research, Applications and Prospects of Event-Based Pedestrian Detection: A Survey

    Authors: Han Wang, Yuman Nie, Yun Li, Hongjie Liu, Min Liu, Wen Cheng, Yaoxiong Wang

    Abstract: Event-based cameras, inspired by the biological retina, have evolved into cutting-edge sensors distinguished by their minimal power requirements, negligible latency, superior temporal resolution, and expansive dynamic range. At present, cameras used for pedestrian detection are mainly frame-based imaging sensors, which have suffered from lethargic response times and hefty data redundancy. In contr… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  48. arXiv:2407.04213  [pdf

    cs.CR cs.NI

    Pathfinder: Exploring Path Diversity for Assessing Internet Censorship Inconsistency

    Authors: Xiaoqin Liang, Guannan Liu, Lin Jin, Shuai Hao, Haining Wang

    Abstract: Internet censorship is typically enforced by authorities to achieve information control for a certain group of Internet users. So far existing censorship studies have primarily focused on country-level characterization because (1) in many cases, censorship is enabled by governments with nationwide policies and (2) it is usually hard to control how the probing packets are routed to trigger censorsh… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  49. arXiv:2407.04066  [pdf, ps, other

    cs.CV

    EMPL: A novel Efficient Meta Prompt Learning Framework for Few-shot Unsupervised Domain Adaptation

    Authors: Wanqi Yang, Haoran Wang, Lei Wang, Ge Song, Yang Gao

    Abstract: Few-shot unsupervised domain adaptation (FS-UDA) utilizes few-shot labeled source domain data to realize effective classification in unlabeled target domain. However, current FS-UDA methods are still suffer from two issues: 1) the data from different domains can not be effectively aligned by few-shot labeled data due to the large domain gaps, 2) it is unstable and time-consuming to generalize to n… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  50. arXiv:2407.04051  [pdf, other

    cs.SD cs.AI eess.AS

    FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

    Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang , et al. (8 additional authors not shown)

    Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, sp… ▽ More

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress. Authors are listed in alphabetical order by family name