Skip to main content

Showing 1–50 of 523 results for author: Gong, Z

  1. arXiv:2407.09050  [pdf, other

    cs.CR cs.AI cs.CV cs.LG

    Refusing Safe Prompts for Multi-modal Large Language Models

    Authors: Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong

    Abstract: Multimodal large language models (MLLMs) have become the cornerstone of today's generative AI ecosystem, sparking intense competition among tech giants and startups. In particular, an MLLM generates a text response given a prompt consisting of an image and a question. While state-of-the-art MLLMs use safety filters and alignment techniques to refuse unsafe prompts, in this work, we introduce MLLM-… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  2. arXiv:2407.07221  [pdf, other

    cs.CV cs.CR

    Tracing Back the Malicious Clients in Poisoning Attacks to Federated Learning

    Authors: Yuqi Jia, Minghong Fang, Hongbin Liu, Jinghuai Zhang, Neil Zhenqiang Gong

    Abstract: Poisoning attacks compromise the training phase of federated learning (FL) such that the learned global model misclassifies attacker-chosen inputs called target inputs. Existing defenses mainly focus on protecting the training phase of FL such that the learnt global model is poison free. However, these defenses often achieve limited effectiveness when the clients' local training data is highly non… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  3. arXiv:2407.06677  [pdf, other

    cs.CL

    Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules

    Authors: Zhuocheng Gong, Ang Lv, Jian Guan, Junxi Yan, Wei Wu, Huishuai Zhang, Minlie Huang, Dongyan Zhao, Rui Yan

    Abstract: Is it always necessary to compute tokens from shallow to deep layers in Transformers? The continued success of vanilla Transformers and their variants suggests an undoubted "yes". In this work, however, we attempt to break the depth-ordered convention by proposing a novel architecture dubbed mixture-of-modules (MoM), which is motivated by an intuition that any layer, regardless of its position, ca… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  4. arXiv:2407.06502  [pdf, other

    eess.SP

    Fast Signal Interpolation Through Zero-padding and FFT/IFFT

    Authors: Zijun Gong

    Abstract: Based on the sampling theorem, interpolation should be conducted by employing the sinc functions as the kernels. Inspired by the fact that the discrete Fourier transform (DFT) is sampled from the discrete time Fourier transform, a fast signal interpolation algorithm based on zero-padding and fast Fourier transform (FFT) and inverse FFT (IFFT) is presented. This algorithm gives a good approximate o… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  5. arXiv:2407.04086  [pdf, other

    cs.CR cs.CV cs.LG

    Certifiably Robust Image Watermark

    Authors: Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Jinyuan Jia, Neil Zhenqiang Gong

    Abstract: Generative AI raises many societal concerns such as boosting disinformation and propaganda campaigns. Watermarking AI-generated content is a key technology to address these concerns and has been widely deployed in industry. However, watermarking is vulnerable to removal attacks and forgery attacks. In this work, we propose the first image watermarks with certified robustness guarantees against rem… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  6. arXiv:2407.01505  [pdf, other

    cs.CL cs.AI

    Self-Cognition in Large Language Models: An Exploratory Study

    Authors: Dongping Chen, Jiawen Shi, Yao Wan, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun

    Abstract: While Large Language Models (LLMs) have achieved remarkable success across various applications, they also raise concerns regarding self-cognition. In this paper, we perform a pioneering study to explore self-cognition in LLMs. Specifically, we first construct a pool of self-cognition instruction prompts to evaluate where an LLM exhibits self-cognition and four well-designed principles to quantify… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted at ICML 2024 Large Language Models and Cognition Workshop

  7. arXiv:2407.01026  [pdf, other

    cs.CL cs.AI

    Augmenting Document-level Relation Extraction with Efficient Multi-Supervision

    Authors: Xiangyu Lin, Weijia Jia, Zhiguo Gong

    Abstract: Despite its popularity in sentence-level relation extraction, distantly supervised data is rarely utilized by existing work in document-level relation extraction due to its noisy nature and low information density. Among its current applications, distantly supervised data is mostly used as a whole for pertaining, which is of low time efficiency. To fill in the gap of efficient and robust utilizati… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  8. arXiv:2406.15968  [pdf, other

    cs.CL cs.LG

    ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods

    Authors: Roy Xie, Junlin Wang, Ruomin Huang, Minxing Zhang, Rong Ge, Jian Pei, Neil Zhenqiang Gong, Bhuwan Dhingra

    Abstract: The rapid scaling of large language models (LLMs) has raised concerns about the transparency and fair use of the pretraining data used for training them. Detecting such content is challenging due to the scale of the data and limited exposure of each instance during training. We propose ReCaLL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA) to detect LLMs' pretraini… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  9. arXiv:2406.14056  [pdf, other

    cs.CV

    VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning

    Authors: Ziyang Meng, Yu Dai, Zezheng Gong, Shaoxiong Guo, Minglong Tang, Tongquan Wei

    Abstract: Recent advances in Large Vision-Language Models (LVLMs) have significantly improve performance in image comprehension tasks, such as formatted charts and rich-content images. Yet, Graphical User Interface (GUI) pose a greater challenge due to their structured format and detailed textual information. Existing LVLMs often overly depend on internal knowledge and neglect image content, resulting in ha… ▽ More

    Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages

    MSC Class: 68-04 68-04 ACM Class: I.2.7; I.2.10

  10. arXiv:2406.12723  [pdf, other

    cs.LG

    BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity

    Authors: Zahra Gharaee, Scott C. Lowe, ZeMing Gong, Pablo Millan Arias, Nicholas Pellegrino, Austin T. Wang, Joakim Bruslund Haurum, Iuliia Zarubiieva, Lila Kari, Dirk Steinke, Graham W. Taylor, Paul Fieguth, Angel X. Chang

    Abstract: As part of an ongoing worldwide effort to comprehend and monitor insect biodiversity, this paper presents the BIOSCAN-5M Insect dataset to the machine learning community and establish several benchmark tasks. BIOSCAN-5M is a comprehensive dataset containing multi-modal information for over 5 million insect specimens, and it significantly expands existing image-based biological datasets by includin… ▽ More

    Submitted 24 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  11. arXiv:2406.11431  [pdf, other

    cs.CL cs.AI

    Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization

    Authors: Wenkai Yang, Shiqi Shen, Guangyao Shen, Zhi Gong, Yankai Lin

    Abstract: Superalignment, where humans are weak supervisors of superhuman models, has become an important and widely discussed issue in the current era of rapid development of Large Language Models (LLMs). The recent work preliminarily studies this problem by using weak models to supervise strong models. It discovers that weakly supervised strong students can consistently outperform weak teachers towards th… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Code is available at https://github.com/keven980716/weak-to-strong-deception

  12. arXiv:2406.11409  [pdf, other

    cs.CL cs.AI

    CodeGemma: Open Code Models Based on Gemma

    Authors: CodeGemma Team, Heri Zhao, Jeffrey Hui, Joshua Howland, Nam Nguyen, Siqi Zuo, Andrea Hu, Christopher A. Choquette-Choo, Jingyue Shen, Joe Kelley, Kshitij Bansal, Luke Vilnis, Mateo Wirth, Paul Michel, Peter Choy, Pratik Joshi, Ravin Kumar, Sarmad Hashmi, Shubham Agrawal, Zhitao Gong, Jane Fine, Tris Warkentin, Ale Jakse Hartman, Bin Ni, Kathy Korevec , et al. (2 additional authors not shown)

    Abstract: This paper introduces CodeGemma, a collection of specialized open code models built on top of Gemma, capable of a variety of code and natural language generation tasks. We release three model variants. CodeGemma 7B pretrained (PT) and instruction-tuned (IT) variants have remarkably resilient natural language understanding, excel in mathematical reasoning, and match code capabilities of other open… ▽ More

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: v1: 11 pages, 4 figures, 5 tables. v2: Update metadata

  13. arXiv:2406.08187  [pdf, other

    cs.RO

    Learning-based Traversability Costmap for Autonomous Off-road Navigation

    Authors: Qiumin Zhu, Zhen Sun, Songpengcheng Xia, Guoqing Liu, Kehui Ma, Ling Pei, Zheng Gong

    Abstract: Traversability estimation in off-road terrains is an essential procedure for autonomous navigation. However, creating reliable labels for complex interactions between the robot and the surface is still a challenging problem in learning-based costmap generation. To address this, we propose a method that predicts traversability costmaps by leveraging both visual and geometric information of the envi… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  14. arXiv:2406.06979  [pdf, other

    cs.LG cs.CR cs.SD eess.AS

    AudioMarkBench: Benchmarking Robustness of Audio Watermarking

    Authors: Hongbin Liu, Moyang Guo, Zhengyuan Jiang, Lun Wang, Neil Zhenqiang Gong

    Abstract: The increasing realism of synthetic speech, driven by advancements in text-to-speech models, raises ethical concerns regarding impersonation and disinformation. Audio watermarking offers a promising solution via embedding human-imperceptible watermarks into AI-generated audios. However, the robustness of audio watermarking against common/adversarial perturbations remains understudied. We present A… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  15. arXiv:2406.05493  [pdf, other

    physics.plasm-ph

    Highly Polarized Energetic Electrons via Intense Laser-Irradiated Tailored Targets

    Authors: Xiaofei Shen, Zheng Gong, Karen Z. Hatsagortsyan, Christoph H. Keitel

    Abstract: A method for the generation of ultrarelativistic electron beams with high spin polarization is put forward, where a tightly-focused linearly-polarized ultraintense laser pulse interacts with a nonprepolarized transverse-size-tailored solid target. The radiative spin polarization and angular separation is facilitated by the standing wave formed via the incident and reflected laser pulses at the ove… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  16. arXiv:2406.02252  [pdf, other

    cs.DC

    Exploring the Efficiency of Renewable Energy-based Modular Data Centers at Scale

    Authors: Jinghan Sun, Zibo Gong, Anup Agarwal, Shadi Noghabi, Ranveer Chandra, Marc Snir, Jian Huang

    Abstract: Modular data centers (MDCs) that can be placed right at the energy farms and powered mostly by renewable energy, are proven to be a flexible and effective approach to lowering the carbon footprint of data centers. However, the main challenge of using renewable energy is the high variability of power produced, which implies large volatility in powering computing resources at MDCs, and degraded appl… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  17. arXiv:2406.01460  [pdf, other

    cs.CV cs.AI

    MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization

    Authors: Yu Zhang, Qi Zhang, Zixuan Gong, Yiwei Shi, Yepeng Liu, Duoqian Miao, Yang Liu, Ke Liu, Kun Yi, Wei Fan, Liang Hu, Changwei Wang

    Abstract: Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success, leading to rapid advancements in multimodal studies. However, CLIP faces a notable challenge in terms of inefficient data utilization. It relies on a single contrastive supervision for each image-text pair during representation learning, disregarding a substantial amount of valuable information that could offer richer s… ▽ More

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  18. arXiv:2405.19730  [pdf

    cs.AI cs.CV cs.LG

    Research on Foundation Model for Spatial Data Intelligence: China's 2024 White Paper on Strategic Development of Spatial Data Intelligence

    Authors: Shaohua Wang, Xing Xie, Yong Li, Danhuai Guo, Zhi Cai, Yu Liu, Yang Yue, Xiao Pan, Feng Lu, Huayi Wu, Zhipeng Gui, Zhiming Ding, Bolong Zheng, Fuzheng Zhang, Tao Qin, Jingyuan Wang, Chuang Tao, Zhengchao Chen, Hao Lu, Jiayi Li, Hongyang Chen, Peng Yue, Wenhao Yu, Yao Yao, Leilei Sun , et al. (9 additional authors not shown)

    Abstract: This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial dat… ▽ More

    Submitted 29 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: in Chinese language

  19. arXiv:2405.17537  [pdf, other

    cs.AI cs.CL cs.CV

    BIOSCAN-CLIP: Bridging Vision and Genomics for Biodiversity Monitoring at Scale

    Authors: ZeMing Gong, Austin T. Wang, Joakim Bruslund Haurum, Scott C. Lowe, Graham W. Taylor, Angel X. Chang

    Abstract: Measuring biodiversity is crucial for understanding ecosystem health. While prior works have developed machine learning models for the taxonomic classification of photographic images and DNA separately, in this work, we introduce a multimodal approach combining both, using CLIP-style contrastive learning to align images, DNA barcodes, and textual data in a unified embedding space. This allows for… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 16 pages with 9 figures

  20. arXiv:2405.10029  [pdf, other

    cs.MM

    AsCL: An Asymmetry-sensitive Contrastive Learning Method for Image-Text Retrieval with Cross-Modal Fusion

    Authors: Ziyu Gong, Chengcheng Mai, Yihua Huang

    Abstract: The image-text retrieval task aims to retrieve relevant information from a given image or text. The main challenge is to unify multimodal representation and distinguish fine-grained differences across modalities, thereby finding similar contents and filtering irrelevant contents. However, existing methods mainly focus on unified semantic representation and concept alignment for multi-modalities, w… ▽ More

    Submitted 17 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: This work has been strong-accepted as the oral conference paper by IEEE International Conference on Multimedia & Expo (ICME) 2024

  21. arXiv:2405.05784  [pdf, other

    cs.CR cs.LG

    Link Stealing Attacks Against Inductive Graph Neural Networks

    Authors: Yixin Wu, Xinlei He, Pascal Berrang, Mathias Humbert, Michael Backes, Neil Zhenqiang Gong, Yang Zhang

    Abstract: A graph neural network (GNN) is a type of neural network that is specifically designed to process graph-structured data. Typically, GNNs can be implemented in two settings, including the transductive setting and the inductive setting. In the transductive setting, the trained model can only predict the labels of nodes that were observed at the training time. In the inductive setting, the trained mo… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: To appear in the 24th Privacy Enhancing Technologies Symposium (PETS 2024), July 15-20, 2024

  22. arXiv:2405.02690  [pdf, other

    physics.plasm-ph physics.acc-ph physics.app-ph physics.optics

    Laser wakefield acceleration of ions with a transverse flying focus

    Authors: Zheng Gong, Sida Cao, John P. Palastro, Matthew R. Edwards

    Abstract: The extreme electric fields created in high-intensity laser-plasma interactions could generate energetic ions far more compactly than traditional accelerators. Despite this promise, laser-plasma accelerators have remained stagnant at maximum ion energies of 100 MeV/nucleon for the last twenty years. The central challenge is the low charge-to-mass ratio of ions, which has precluded one of the most… ▽ More

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 11 pages, 6 figures

  23. arXiv:2405.01283  [pdf, ps, other

    math.CA

    Fractional Bloom boundedness of commutators in spaces of homogeneous type

    Authors: Zhenbing Gong, Ji Li, Jaakko Sinko

    Abstract: We aim to characterise boundedness of commutators $[b,T]$ of singular integrals $T$. Boundedness is studied between weighted Lebesgue spaces $L^p(X)$ and $L^q(X)$, $p\leq q$, when the underlying space $X$ is a space of homogeneous type. Commutator theory in spaces of homogeneous type already exist in literature, in particular boundedness results in the setting $p=q$. The purpose here is to extend… ▽ More

    Submitted 5 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 30 pages; Main results unchanged. Fixed an oversight with the weight constants in the final theorem. This resulted from the use of the opposite non-degeneracy condition in section 4.2 when compared to the rest of the paper. Added Proposition 4.12 to move the argument between the two conditions. Added a reference and corrected typos

  24. arXiv:2404.17839  [pdf, other

    cs.CR cs.SE

    Improving Smart Contract Security with Contrastive Learning-based Vulnerability Detection

    Authors: Yizhou Chen, Zeyu Sun, Zhihao Gong, Dan Hao

    Abstract: Currently, smart contract vulnerabilities (SCVs) have emerged as a major factor threatening the transaction security of blockchain. Existing state-of-the-art methods rely on deep learning to mitigate this threat. They treat each input contract as an independent entity and feed it into a deep learning model to learn vulnerability patterns by fitting vulnerability labels. It is a pity that they disr… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

    Journal ref: 2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE '24)

  25. arXiv:2404.17698  [pdf, other

    cs.HC

    "Actually I Can Count My Blessings": User-Centered Design of an Application to Promote Gratitude Among Young Adults

    Authors: Ananya Bhattacharjee, Zichen Gong, Bingcheng Wang, Timothy James Luckcock, Emma Watson, Elena Allica Abellan, Leslie Gutman, Anne Hsu, Joseph Jay Williams

    Abstract: Regular practice of gratitude has the potential to enhance psychological wellbeing and foster stronger social connections among young adults. However, there is a lack of research investigating user needs and expectations regarding gratitude-promoting applications. To address this gap, we employed a user-centered design approach to develop a mobile application that facilitates gratitude practice. O… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  26. arXiv:2404.17031  [pdf, other

    cs.CV

    Motor Focus: Ego-Motion Prediction with All-Pixel Matching

    Authors: Hao Wang, Jiayou Qin, Xiwen Chen, Ashish Bastola, John Suchanek, Zihao Gong, Abolfazl Razi

    Abstract: Motion analysis plays a critical role in various applications, from virtual reality and augmented reality to assistive visual navigation. Traditional self-driving technologies, while advanced, typically do not translate directly to pedestrian applications due to their reliance on extensive sensor arrays and non-feasible computational frameworks. This highlights a significant gap in applying these… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  27. arXiv:2404.15611  [pdf, other

    cs.CR

    Model Poisoning Attacks to Federated Learning via Multi-Round Consistency

    Authors: Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong

    Abstract: Model poisoning attacks are critical security threats to Federated Learning (FL). Existing model poisoning attacks suffer from two key limitations: 1) they achieve suboptimal effectiveness when defenses are deployed, and/or 2) they require knowledge of the model updates or local training data on genuine clients. In this work, we make a key observation that their suboptimal effectiveness arises fro… ▽ More

    Submitted 6 June, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  28. arXiv:2404.13282  [pdf, other

    cs.CV cs.MM

    Wills Aligner: A Robust Multi-Subject Brain Representation Learner

    Authors: Guangyin Bao, Zixuan Gong, Qi Zhang, Jialei Zhou, Wei Fan, Kun Yi, Usman Naseem, Liang Hu, Duoqian Miao

    Abstract: Decoding visual information from human brain activity has seen remarkable advancements in recent research. However, due to the significant variability in cortical parcellation and cognition patterns across subjects, current approaches personalized deep models for each subject, constraining the practicality of this technology in real-world contexts. To tackle the challenges, we introduce Wills Alig… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 15 pages

  29. arXiv:2404.12630  [pdf, other

    cs.CV cs.MM

    MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction

    Authors: Zixuan Gong, Qi Zhang, Guangyin Bao, Lei Zhu, Ke Liu, Liang Hu, Duoqian Miao

    Abstract: Decoding natural visual scenes from brain activity has flourished, with extensive research in single-subject tasks and, however, less in cross-subject tasks. Reconstructing high-quality images in cross-subject tasks is a challenging problem due to profound individual differences between subjects and the scarcity of data annotation. In this work, we proposed MindTuner for cross-subject visual decod… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 14 pages

  30. arXiv:2404.12022  [pdf, other

    cs.CL

    Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration

    Authors: Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao

    Abstract: Large language models (LLMs) have recently shown remarkable performance across a wide range of tasks. However, the substantial number of parameters in LLMs contributes to significant latency during model inference. This is particularly evident when utilizing autoregressive decoding methods, which generate one token in a single forward process, thereby not fully capitalizing on the parallel computi… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

  31. arXiv:2404.10211  [pdf, other

    cs.LG cs.AI

    Anomaly Correction of Business Processes Using Transformer Autoencoder

    Authors: Ziyou Gong, Xianwen Fang, Ping Wu

    Abstract: Event log records all events that occur during the execution of business processes, so detecting and correcting anomalies in event log can provide reliable guarantee for subsequent process analysis. The previous works mainly include next event prediction based methods and autoencoder-based methods. These methods cannot accurately and efficiently detect anomalies and correct anomalies at the same t… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  32. arXiv:2404.10179  [pdf, other

    cs.RO cs.AI cs.HC cs.LG

    Scaling Instructable Agents Across Many Simulated Worlds

    Authors: SIMA Team, Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton, Bethanie Brownfield, Gavin Buttimore, Max Cant, Sarah Chakera, Stephanie C. Y. Chan, Jeff Clune, Adrian Collister, Vikki Copeman, Alex Cullum, Ishita Dasgupta, Dario de Cesare, Julia Di Trapani, Yani Donchev, Emma Dunleavy, Martin Engelcke, Ryan Faulkner, Frankie Garcia, Charles Gbadamosi , et al. (68 additional authors not shown)

    Abstract: Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructio… ▽ More

    Submitted 17 April, 2024; v1 submitted 13 March, 2024; originally announced April 2024.

  33. arXiv:2404.07431  [pdf, other

    cs.RO eess.SY

    Parameterized Fast and Safe Tracking (FaSTrack) using Deepreach

    Authors: Hyun Joe Jeong, Zheng Gong, Somil Bansal, Sylvia Herbert

    Abstract: Fast and Safe Tracking (FaSTrack) is a modular framework that provides safety guarantees while planning and executing trajectories in real time via value functions of Hamilton-Jacobi (HJ) reachability. These value functions are computed through dynamic programming, which is notorious for being computationally inefficient. Moreover, the resulting trajectory does not adapt online to the environment,… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 12 pages, 6 figures, 1 table, to be published in L4DC

  34. arXiv:2404.05403  [pdf, other

    cs.CR cs.AI

    SoK: Gradient Leakage in Federated Learning

    Authors: Jiacheng Du, Jiahui Hu, Zhibo Wang, Peng Sun, Neil Zhenqiang Gong, Kui Ren

    Abstract: Federated learning (FL) enables collaborative model training among multiple clients without raw data exposure. However, recent studies have shown that clients' private training data can be reconstructed from the gradients they share in FL, known as gradient inversion attacks (GIAs). While GIAs have demonstrated effectiveness under \emph{ideal settings and auxiliary assumptions}, their actual effic… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  35. arXiv:2404.04254  [pdf, other

    cs.CR cs.AI cs.CL cs.CV cs.LG

    Watermark-based Detection and Attribution of AI-Generated Content

    Authors: Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Neil Zhenqiang Gong

    Abstract: Several companies--such as Google, Microsoft, and OpenAI--have deployed techniques to watermark AI-generated content to enable proactive detection. However, existing literature mainly focuses on user-agnostic detection. Attribution aims to further trace back the user of a generative-AI service who generated a given content detected as AI-generated. Despite its growing importance, attribution is la… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

  36. arXiv:2404.02472  [pdf, other

    cs.RO eess.SY

    Safe Returning FaSTrack with Robust Control Lyapunov-Value Functions

    Authors: Zheng Gong, Boyang Li, Sylvia Herbert

    Abstract: Real-time navigation in a priori unknown environment remains a challenging task, especially when an unexpected (unmodeled) disturbance occurs. In this paper, we propose the framework Safe Returning Fast and Safe Tracking (SR-F) that merges concepts from 1) Robust Control Lyapunov-Value Functions (R-CLVF), and 2) the Fast and Safe Tracking (FaSTrack) framework. The SR-F computes an R-CLVF offline b… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 6 pages, 4 figures, 1 table, 2 algorithms. Submitted to LCSS on 03/06

  37. arXiv:2404.01829  [pdf, other

    math.OC eess.SY

    Synthesizing Control Lyapunov-Value Functions for High-Dimensional Systems Using System Decomposition and Admissible Control Sets

    Authors: Zheng Gong, Hyun Joe Jeong, Sylvia Herbert

    Abstract: Control Lyapunov functions (CLFs) play a vital role in modern control applications, but finding them remains a problem. Recently, the control Lyapunov-value function (CLVF) and robust CLVF have been proposed as solutions for nonlinear time-invariant systems with bounded control and disturbance. However, the CLVF suffers from the ''curse of dimensionality,'' which hinders its application to practic… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 7 pages, 4 figures, submitted to 63rd Conference on Decision and Control

  38. arXiv:2403.17710  [pdf, other

    cs.CR cs.AI

    Optimization-based Prompt Injection Attack to LLM-as-a-Judge

    Authors: Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong

    Abstract: LLM-as-a-Judge is a novel solution that can assess textual information with large language models (LLMs). Based on existing research studies, LLMs demonstrate remarkable performance in providing a compelling alternative to traditional human assessment. However, the robustness of these systems against prompt injection attacks remains an open question. In this work, we introduce JudgeDeceiver, a nov… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

  39. arXiv:2403.17369  [pdf, other

    cs.CV

    CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning

    Authors: Ziyang Gong, Fuhao Li, Yupeng Deng, Deblina Bhattacharjee, Xiangwei Zhu, Zhenming Ji

    Abstract: Unsupervised Domain Adaptation (UDA) aims to adapt models from labeled source domains to unlabeled target domains. When adapting to adverse scenes, existing UDA methods fail to perform well due to the lack of instructions, leading their models to overlook discrepancies within all adverse scenes. To tackle this, we propose CoDA which instructs models to distinguish, focus, and learn from these disc… ▽ More

    Submitted 4 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  40. arXiv:2403.12415  [pdf, other

    cs.CV cs.HC

    VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation

    Authors: Hao Wang, Jiayou Qin, Ashish Bastola, Xiwen Chen, John Suchanek, Zihao Gong, Abolfazl Razi

    Abstract: This paper explores the potential of Large Language Models(LLMs) in zero-shot anomaly detection for safe visual navigation. With the assistance of the state-of-the-art real-time open-world object detection model Yolo-World and specialized prompts, the proposed framework can identify anomalies within camera-captured frames that include any possible obstacles, then generate concise, audio-delivered… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  41. arXiv:2403.11482  [pdf, other

    cs.LG physics.geo-ph

    SeisFusion: Constrained Diffusion Model with Input Guidance for 3D Seismic Data Interpolation and Reconstruction

    Authors: Shuang Wang, Fei Deng, Peifan Jiang, Zishan Gong, Xiaolin Wei, Yuqing Wang

    Abstract: Geographical, physical, or economic constraints often result in missing traces within seismic data, making the reconstruction of complete seismic data a crucial step in seismic data processing. Traditional methods for seismic data reconstruction require the selection of multiple empirical parameters and struggle to handle large-scale continuous missing data. With the development of deep learning,… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  42. arXiv:2403.08295  [pdf, other

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari , et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge… ▽ More

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  43. arXiv:2403.06408  [pdf, other

    cs.LG cs.AI

    What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation

    Authors: Zhuocheng Gong, Jiahao Liu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

    Abstract: Quantization has emerged as a promising technique for improving the memory and computational efficiency of large language models (LLMs). Though the trade-off between performance and efficiency is well-known, there is still much to be learned about the relationship between quantization and LLM performance. To shed light on this relationship, we propose a new perspective on quantization, viewing it… ▽ More

    Submitted 10 March, 2024; originally announced March 2024.

  44. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  45. arXiv:2403.05318  [pdf, other

    cs.AI cs.LG

    Looking Ahead to Avoid Being Late: Solving Hard-Constrained Traveling Salesman Problem

    Authors: Jingxiao Chen, Ziqin Gong, Minghuan Liu, Jun Wang, Yong Yu, Weinan Zhang

    Abstract: Many real-world problems can be formulated as a constrained Traveling Salesman Problem (TSP). However, the constraints are always complex and numerous, making the TSPs challenging to solve. When the number of complicated constraints grows, it is time-consuming for traditional heuristic algorithms to avoid illegitimate outcomes. Learning-based methods provide an alternative to solve TSPs in a soft… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

  46. arXiv:2403.03455  [pdf, other

    math.OC eess.SY

    Robust Control Lyapunov-Value Functions for Nonlinear Disturbed Systems

    Authors: Zheng Gong, Sylvia Herbert

    Abstract: Control Lyapunov Functions (CLFs) have been extensively used in the control community. A well-known drawback is the absence of a systematic way to construct CLFs for general nonlinear systems, and the problem can become more complex with input or state constraints. Our preliminary work on constructing Control Lyapunov Value Functions (CLVFs) using Hamilton-Jacobi (HJ) reachability analysis provide… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 13 pages, 5 figures

  47. arXiv:2403.03149  [pdf, other

    cs.CR cs.DC cs.LG

    Robust Federated Learning Mitigates Client-side Training Data Distribution Inference Attacks

    Authors: Yichang Xu, Ming Yin, Minghong Fang, Neil Zhenqiang Gong

    Abstract: Recent studies have revealed that federated learning (FL), once considered secure due to clients not sharing their private data with the server, is vulnerable to attacks such as client-side training data distribution inference, where a malicious client can recreate the victim's data. While various countermeasures exist, they are not practical, often assuming server access to some training data or… ▽ More

    Submitted 4 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: To appear in The Web Conference 2024 (WWW '24)

  48. arXiv:2403.03038  [pdf, other

    cond-mat.mes-hall cond-mat.stat-mech nlin.CD

    Transition from topological to chaos in the nonlinear Su-Schrieffer-Heeger model

    Authors: Kazuki Sone, Motohiko Ezawa, Zongping Gong, Taro Sawada, Nobuyuki Yoshioka, Takahiro Sagawa

    Abstract: Recent studies on topological materials are expanding into the nonlinear regime, while the central principle, namely the bulk-edge correspondence, is yet to be elucidated in the strongly nonlinear regime. Here, we reveal that nonlinear topological edge modes can exhibit the transition to spatial chaos by increasing nonlinearity, which can be a universal mechanism of the breakdown of the bulk-edge… ▽ More

    Submitted 26 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 8+9 pages, 5+4 figures

  49. arXiv:2403.02607  [pdf

    cs.GT cs.AI

    MEBS: Multi-task End-to-end Bid Shading for Multi-slot Display Advertising

    Authors: Zhen Gong, Lvyin Niu, Yang Zhao, Miao Xu, Zhenzhe Zheng, Haoqi Zhang, Zhilin Zhang, Fan Wu, Rongquan Bai, Chuan Yu, Jian Xu, Bo Zheng

    Abstract: Online bidding and auction are crucial aspects of the online advertising industry. Conventionally, there is only one slot for ad display and most current studies focus on it. Nowadays, multi-slot display advertising is gradually becoming popular where many ads could be displayed in a list and shown as a whole to users. However, multi-slot display advertising leads to different cost-effectiveness.… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

  50. arXiv:2403.00350  [pdf, other

    physics.flu-dyn

    Eckart streaming with nonlinear high-order harmonics: an example at gigahertz

    Authors: Shiyu Li, Weiwei Cui, Thierry Baasch, Bin Wang, Zhixiong Gong

    Abstract: Acoustic streaming shows great potential in applications such as bubble dynamics, cell aggregation, and nano-sized particle isolation in the biomedical and drug industries. As the acoustic shock distance decreases with the increase of incident frequency, the nonlinear propagation effect will play a role in acoustic streaming, e.g., Eckart (bulk) streaming at a few gigahertz (GHz). However, the the… ▽ More

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 11 pages, 7 figures