
Showing 1–50 of 1,006 results for author: Gu, J

  1. arXiv:2407.09393  [pdf, ps, other]

    math.NA

    A Numerical Study of WENO Approximations to Sharp Propagating Fronts for Reaction-Diffusion Systems

    Authors: Jiaxi Gu, Daniel Olmos-Liceaga, Jae-Hun Jung

    Abstract: Many reaction-diffusion systems in various applications exhibit traveling wave solutions that evolve on multiple spatio-temporal scales. These traveling wave solutions are crucial for understanding the underlying dynamics of the system. In this work, we present sixth-order weighted essentially non-oscillatory (WENO) methods within the finite difference framework to solve reaction-diffusion systems…

    Submitted 12 July, 2024; originally announced July 2024.

  2. arXiv:2407.08127  [pdf, other]

    cs.CV

    Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment

    Authors: Yufan Liu, Wanqian Zhang, Dayan Wu, Zheng Lin, Jingzi Gu, Weiping Wang

    Abstract: Model inversion (MI) attack reconstructs the private training data of a target model given its output, posing a significant threat to deep learning models and data privacy. On one hand, most of existing MI methods focus on searching for latent codes to represent the target identity, yet this iterative optimization-based scheme consumes a huge number of queries to the target model, making it unreal…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  3. arXiv:2407.06498  [pdf, other]

    cs.HC

    Enhancing spatial auditory attention decoding with neuroscience-inspired prototype training

    Authors: Zelin Qiu, Jianjun Gu, Dingding Yao, Junfeng Li

    Abstract: The spatial auditory attention decoding (Sp-AAD) technology aims to determine the direction of auditory attention in multi-talker scenarios via neural recordings. Despite the success of recent Sp-AAD algorithms, their performance is hindered by trial-specific features in EEG data. This study aims to improve decoding performance against these features. Studies in neuroscience indicate that spatial…

    Submitted 8 July, 2024; originally announced July 2024.

  4. arXiv:2407.06333  [pdf, ps, other]

    cs.LG cs.NE math.NA

    A third-order finite difference weighted essentially non-oscillatory scheme with shallow neural network

    Authors: Kwanghyuk Park, Xinjuan Chen, Dongjin Lee, Jiaxi Gu, Jae-Hun Jung

    Abstract: In this paper, we introduce the finite difference weighted essentially non-oscillatory (WENO) scheme based on the neural network for hyperbolic conservation laws. We employ the supervised learning and design two loss functions, one with the mean squared error and the other with the mean squared logarithmic error, where the WENO3-JS weights are computed as the labels. Each loss function consists of…

    Submitted 10 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  5. arXiv:2407.05510  [pdf, other]

    cs.AR cs.ET cs.LG

    SCATTER: Algorithm-Circuit Co-Sparse Photonic Accelerator with Thermal-Tolerant, Power-Efficient In-situ Light Redistribution

    Authors: Ziang Yin, Nicholas Gangi, Meng Zhang, Jeff Zhang, Rena Huang, Jiaqi Gu

    Abstract: Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads. However, limited reconfigurability, high electrical-optical conversion cost, and thermal sensitivity limit the deployment of current optical analog computing engines to support power-restricted, performance-sensitive AI workloads at scale. Sparsity provides a great…

    Submitted 7 July, 2024; originally announced July 2024.

  6. arXiv:2407.04181  [pdf, other]

    cs.AI cs.CL

    Orchestrating LLMs with Different Personalizations

    Authors: Jin Peng Zhou, Katie Z Luo, Jingwen Gu, Jason Yuan, Kilian Q. Weinberger, Wen Sun

    Abstract: This paper presents a novel approach to aligning large language models (LLMs) with individual human preferences, sometimes referred to as Reinforcement Learning from Personalized Human Feedback (RLPHF). Given stated preferences along multiple dimensions, such as helpfulness, conciseness, or humor, the goal is to create an LLM without re-training that best adheres to this specification. St…

    Submitted 4 July, 2024; originally announced July 2024.

  7. arXiv:2406.19693  [pdf, other]

    cs.RO cs.CV

    MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics?

    Authors: Jinming Li, Yichen Zhu, Zhiyuan Xu, Jindong Gu, Minjie Zhu, Xin Liu, Ning Liu, Yaxin Peng, Feifei Feng, Jian Tang

    Abstract: It is fundamentally challenging for robots to serve as useful assistants in human environments because this requires addressing a spectrum of sub-problems across robotics, including perception, language understanding, reasoning, and planning. The recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated their exceptional abilities in solving complex mathematical problems, m…

    Submitted 28 June, 2024; originally announced June 2024.

  8. arXiv:2406.19680  [pdf, other]

    cs.CV cs.AI cs.MM

    MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

    Authors: Yuang Zhang, Jiaxi Gu, Li-Wen Wang, Han Wang, Junqi Cheng, Yuefeng Zhu, Fangyuan Zou

    Abstract: In recent years, generative artificial intelligence has achieved significant advancements in the field of image generation, spawning a variety of applications. However, video generation still faces considerable challenges in various aspects, such as controllability, video length, and richness of details, which hinder the application and popularization of this technology. In this work, we propose a…

    Submitted 28 June, 2024; originally announced June 2024.

  9. arXiv:2406.19633  [pdf, other]

    cs.SE

    Combating Missed Recalls in E-commerce Search: A CoT-Prompting Testing Approach

    Authors: Shengnan Wu, Yongxiang Hu, Yingchuan Wang, Jiazhen Gu, Jin Meng, Liujie Fan, Zhongshi Luan, Xin Wang, Yangfan Zhou

    Abstract: Search components in e-commerce apps, often complex AI-based systems, are prone to bugs that can lead to missed recalls - situations where items that should be listed in search results aren't. This can frustrate shop owners and harm the app's profitability. However, testing for missed recalls is challenging due to difficulties in generating user-aligned test cases and the absence of oracles. In th…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (FSE Companion '24), July 15–19, 2024, Porto de Galinhas, Brazil

  10. arXiv:2406.18098  [pdf, other]

    hep-th cond-mat.mes-hall math-ph

    Towards full instanton trans-series in Hofstadter's butterfly

    Authors: Jie Gu, Zhaojie Xu

    Abstract: The trans-series completion of perturbative series of a wide class of quantum mechanical systems can be determined by combining the resurgence program and extra input coming from exact WKB analysis. In this paper, we reexamine the Harper-Hofstadter model and its spectrum, Hofstadter's butterfly, in light of recent developments. We demonstrate the connection between the perturbative energy series o…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 43 pages, 25 figures, 7 tables

  11. arXiv:2406.17810  [pdf, other]

    physics.comp-ph cs.AI physics.optics

    PIC2O-Sim: A Physics-Inspired Causality-Aware Dynamic Convolutional Neural Operator for Ultra-Fast Photonic Device FDTD Simulation

    Authors: Pingchuan Ma, Haoyu Yang, Zhengqi Gao, Duane S. Boning, Jiaqi Gu

    Abstract: The finite-difference time-domain (FDTD) method, which is important in photonic hardware design flow, is widely adopted to solve time-domain Maxwell equations. However, FDTD is known for its prohibitive runtime cost, taking minutes to hours to simulate a single device. Recently, AI has been applied to realize orders-of-magnitude speedup in partial differential equation (PDE) solving. However, AI-b…

    Submitted 24 June, 2024; originally announced June 2024.

  12. arXiv:2406.17741  [pdf, other]

    cs.CV cs.AI

    Point-SAM: Promptable 3D Segmentation Model for Point Clouds

    Authors: Yuchen Zhou, Jiayuan Gu, Tung Yen Chiang, Fanbo Xiang, Hao Su

    Abstract: The development of 2D foundation models for image segmentation has been significantly advanced by the Segment Anything Model (SAM). However, achieving similar success in 3D models remains a challenge due to issues such as non-unified data formats, lightweight models, and the scarcity of labeled data with diverse masks. To this end, we propose a 3D promptable segmentation model (Point-SAM) focusing…

    Submitted 25 June, 2024; originally announced June 2024.

  13. arXiv:2406.17000  [pdf, ps, other]

    astro-ph.IM astro-ph.CO

    Forecast measurement of the 21 cm global spectrum from Lunar orbit with the Vari-Zeroth-Order Polynomial (VZOP) method

    Authors: Tianyang Liu, Jiajun Zhang, Yuan Shi, Junhua Gu, Quan Guo, Yidong Xu, Furen Deng, Fengquan Wu, Yanping Cong, Xuelei Chen

    Abstract: The cosmic 21 cm signal serves as a crucial probe for studying the evolutionary history of the Universe. However, detecting the 21 cm signal poses significant challenges due to its extremely faint nature. To mitigate the interference from the Earth's radio frequency interference (RFI), the ground and the ionospheric effects, the Discovering the Sky at the Longest Wavelength (DSL) project will depl…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 15 pages, 17 figures, to be submitted to SCPMA

  14. arXiv:2406.16253  [pdf, other]

    cs.CL

    LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

    Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo, et al. (15 additional authors not shown)

    Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as th…

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  15. arXiv:2406.14282  [pdf, other]

    cs.CL cs.AI

    Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

    Authors: Junjie Wang, Mingyang Chen, Binbin Hu, Dan Yang, Ziqi Liu, Yue Shen, Peng Wei, Zhiqiang Zhang, Jinjie Gu, Jun Zhou, Jeff Z. Pan, Wen Zhang, Huajun Chen

    Abstract: Improving the performance of large language models (LLMs) in complex question-answering (QA) scenarios has always been a research focal point. Recent studies have attempted to enhance LLMs' performance by combining step-wise planning with external retrieval. While effective for advanced models like GPT-3.5, smaller LLMs face challenges in decomposing complex questions, necessitating supervised fin…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Work in progress

  16. arXiv:2406.14036  [pdf, other]

    cs.LG cs.AI cs.CL

    Toward Infinite-Long Prefix in Transformer

    Authors: Jiuxiang Gu, Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang

    Abstract: Prompting and contextual-based fine-tuning methods, which we call Prefix Learning, have been proposed to enhance the performance of language models on various downstream tasks that can match full parameter fine-tuning. There remains a limited theoretical understanding of how these methods work. In this paper, we aim to relieve this limitation by studying the learning ability of Prefix Learning fro…

    Submitted 20 June, 2024; originally announced June 2024.

  17. arXiv:2406.14014  [pdf, ps, other]

    cs.LG cs.AI

    Feature Fusion Based on Mutual-Cross-Attention Mechanism for EEG Emotion Recognition

    Authors: Yimin Zhao, Jin Gu

    Abstract: An objective and accurate emotion diagnostic reference is vital to psychologists, especially when dealing with patients who are difficult to communicate with for pathological reasons. Nevertheless, current systems based on Electroencephalography (EEG) data utilized for sentiment discrimination have some problems, including excessive model complexity, mediocre accuracy, and limited interpretability…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: The work has been accepted by MICCAI 2024. The uploaded one is preprint which has not undergone peer review (when applicable) or any post-submission improvements or corrections. The official DOI link will be provided once available

  18. FoRAG: Factuality-optimized Retrieval Augmented Generation for Web-enhanced Long-form Question Answering

    Authors: Tianchi Cai, Zhiwen Tan, Xierui Song, Tao Sun, Jiyan Jiang, Yunqi Xu, Yinger Zhang, Jinjie Gu

    Abstract: Retrieval Augmented Generation (RAG) has become prevalent in question-answering (QA) tasks due to its ability of utilizing search engine to enhance the quality of long-form question-answering (LFQA). Despite the emergence of various open source methods and web-enhanced commercial systems such as Bing Chat, two critical problems remain unsolved, i.e., the lack of factuality and clear logic in the g…

    Submitted 19 June, 2024; originally announced June 2024.

    Report number: 30th

    Journal ref: KDD 2024

  19. arXiv:2406.13692  [pdf, other]

    cs.CL

    Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation

    Authors: Di Wu, Jia-Chen Gu, Fan Yin, Nanyun Peng, Kai-Wei Chang

    Abstract: Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks. However, there are significant trustworthiness concerns as RALMs are prone to generating unfaithful outputs, including baseless information or contradictions with the retrieved context. This paper proposes SynCheck, a lightweight monitor that leverages fine-grained decodin…

    Submitted 19 June, 2024; originally announced June 2024.

  20. arXiv:2406.12831  [pdf, other]

    cs.CV cs.AI cs.MM

    VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing

    Authors: Jing Gu, Yuwei Fang, Ivan Skorokhodov, Peter Wonka, Xinya Du, Sergey Tulyakov, Xin Eric Wang

    Abstract: Video editing stands as a cornerstone of digital media, from entertainment and education to professional communication. However, previous methods often overlook the necessity of comprehensively understanding both global and local contexts, leading to inaccurate and inconsistency edits in the spatiotemporal dimension, especially for long videos. In this paper, we introduce VIA, a unified spatiotemp…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 13 pages, 11 figures

  21. arXiv:2406.12640  [pdf]

    cs.LG

    Research and Implementation of Data Enhancement Techniques for Graph Neural Networks

    Authors: Jingzhao Gu, Haoyang Huang

    Abstract: Data, algorithms, and arithmetic power are the three foundational conditions for deep learning to be effective in the application domain. Data is the focus for developing deep learning algorithms. In practical engineering applications, some data are affected by the conditions under which more data cannot be obtained or the cost of obtaining data is too high, resulting in smaller data sets (general…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 7 pages, 7 figures, to be published in IEEE International Conference on Artificial Intelligence and Electromechanical Automation

  22. arXiv:2406.12457  [pdf, other]

    econ.TH

    Data Trade and Consumer Privacy

    Authors: Jiadong Gu

    Abstract: This paper studies optimal mechanisms for collecting and trading data. Consumers benefit from revealing information about their tastes to a service provider because this improves the service. However, the information is also valuable to a third party as it may extract more revenue from the consumer in another market called the product market. The paper characterizes the constrained optimal mechani…

    Submitted 6 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  23. arXiv:2406.12044  [pdf, other]

    cs.CV

    ARTIST: Improving the Generation of Text-rich Images by Disentanglement

    Authors: Jianyi Zhang, Yufan Zhou, Jiuxiang Gu, Curtis Wigington, Tong Yu, Yiran Chen, Tong Sun, Ruiyi Zhang

    Abstract: Diffusion models have demonstrated exceptional capabilities in generating a broad spectrum of visual content, yet their proficiency in rendering text is still limited: they often generate inaccurate characters or words that fail to blend well with the underlying image. To address these shortcomings, we introduce a new framework named ARTIST. This framework incorporates a dedicated textual diffusio…

    Submitted 17 June, 2024; originally announced June 2024.

  24. arXiv:2406.11753  [pdf, other]

    cs.CL cs.LG

    A Semantic-based Layer Freezing Approach to Efficient Fine-Tuning of Language Models

    Authors: Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang

    Abstract: Finetuning language models (LMs) is crucial for adapting the models to downstream data and tasks. However, full finetuning is usually costly. Existing work, such as parameter-efficient finetuning (PEFT), often focuses on how to finetune but neglects the issue of where to finetune. As a pioneering work on answering where to finetune (at the layer level), we conduct a semantic anal…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 13 pages, 5 figures, under peer-review

  25. arXiv:2406.10079  [pdf, other]

    cs.CV cs.AI

    Localizing Events in Videos with Multimodal Queries

    Authors: Gengyuan Zhang, Mang Ling Ada Fok, Yan Xia, Yansong Tang, Daniel Cremers, Philip Torr, Volker Tresp, Jindong Gu

    Abstract: Video understanding is a pivotal task in the digital era, yet the dynamic and multievent nature of videos makes them labor-intensive and computationally demanding to process. Thus, localizing a specific event given a semantic query has gained importance in both user-oriented applications like video search and academic research into video foundation models. A significant limitation in current resea…

    Submitted 22 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 9 pages; fix some typos

  26. Less Cybersickness, Please: Demystifying and Detecting Stereoscopic Visual Inconsistencies in VR Apps

    Authors: Shuqing Li, Cuiyun Gao, Jianping Zhang, Yujia Zhang, Yepang Liu, Jiazhen Gu, Yun Peng, Michael R. Lyu

    Abstract: The quality of Virtual Reality (VR) apps is vital, particularly the rendering quality of the VR Graphical User Interface (GUI). Different from traditional 2D apps, VR apps create a 3D digital scene for users, by rendering two distinct 2D images for the user's left and right eyes, respectively. Stereoscopic visual inconsistency (denoted as "SVI") issues, however, undermine the rendering process of…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: This work has been accepted at the ACM International Conference on the Foundations of Software Engineering (FSE) 2024, Porto de Galinhas, Brazil. DOI: https://doi.org/10.1145/3660803

  27. arXiv:2406.09305  [pdf, other]

    cs.CV

    Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

    Authors: Yufan Zhou, Ruiyi Zhang, Kaizhi Zheng, Nanxuan Zhao, Jiuxiang Gu, Zichao Wang, Xin Eric Wang, Tong Sun

    Abstract: In subject-driven text-to-image generation, recent works have achieved superior performance by training the model on synthetic datasets containing numerous image pairs. Trained on these datasets, generative models can produce text-aligned images for specific subject from arbitrary testing image in a zero-shot manner. They even outperform methods which require additional fine-tuning on testing imag…

    Submitted 13 June, 2024; originally announced June 2024.

  28. arXiv:2406.08354  [pdf, other]

    cs.CV cs.AI cs.LG

    DocSynthv2: A Practical Autoregressive Modeling for Document Generation

    Authors: Sanket Biswas, Rajiv Jain, Vlad I. Morariu, Jiuxiang Gu, Puneet Mathur, Curtis Wigington, Tong Sun, Josep Lladós

    Abstract: While the generation of document layouts has been extensively explored, comprehensive document generation encompassing both layout and content presents a more complex challenge. This paper delves into this advanced domain, proposing a novel approach called DocSynthv2 through the development of a simple yet effective autoregressive structured model. Our model, distinct in its integration of both la…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Spotlight (Oral) Acceptance to CVPR 2024 Workshop for Graphic Design Understanding and Generation (GDUG)

  29. arXiv:2406.08090  [pdf, other]

    cs.CV

    From Sim-to-Real: Toward General Event-based Low-light Frame Interpolation with Per-scene Optimization

    Authors: Ziran Zhang, Yongrui Ma, Yueting Chen, Feng Zhang, Jinwei Gu, Tianfan Xue, Shi Guo

    Abstract: Video Frame Interpolation (VFI) is important for video enhancement, frame rate up-conversion, and slow-motion generation. The introduction of event cameras, which capture per-pixel brightness changes asynchronously, has significantly enhanced VFI capabilities, particularly for high-speed, nonlinear motions. However, these event-based methods encounter challenges in low-light conditions, notably tr…

    Submitted 12 June, 2024; originally announced June 2024.

  30. arXiv:2406.07091  [pdf, other]

    cs.CV

    AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding

    Authors: Xing Zhang, Jiaxi Gu, Haoyu Zhao, Shicong Wang, Hang Xu, Renjing Pei, Songcen Xu, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Temporal Video Grounding (TVG) aims to localize a moment from an untrimmed video given the language description. Since the annotation of TVG is labor-intensive, TVG under limited supervision has accepted attention in recent years. The great success of vision-language pre-training guides TVG to follow the traditional "pre-training + fine-tuning" paradigm, however, the pre-training process would suf…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Technique Report

  31. arXiv:2406.06730  [pdf, other]

    cs.CV cs.AI

    TRINS: Towards Multimodal Language Models that Can Read

    Authors: Ruiyi Zhang, Yanzhe Zhang, Jian Chen, Yufan Zhou, Jiuxiang Gu, Changyou Chen, Tong Sun

    Abstract: Large multimodal language models have shown remarkable proficiency in understanding and editing images. However, a majority of these visually-tuned models struggle to comprehend the textual content embedded in images, primarily due to the limitation of training data. In this work, we introduce TRINS: a Text-Rich image INStruction dataset, with the objective of enhancing the reading ability of the…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: CVPR 2024

  32. arXiv:2406.05090  [pdf, other]

    cs.LG cs.AI cs.CV

    Provably Better Explanations with Optimized Aggregation of Feature Attributions

    Authors: Thomas Decker, Ananta R. Bhattarai, Jindong Gu, Volker Tresp, Florian Buettner

    Abstract: Using feature attributions for post-hoc explanations is a common practice to understand and verify the predictions of opaque machine learning models. Despite the numerous techniques available, individual methods often produce inconsistent and unstable results, putting their overall reliability into question. In this work, we aim to systematically improve the quality of feature attributions by comb…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: International Conference on Machine Learning (ICML) 2024

  33. arXiv:2406.04680  [pdf, other]

    eess.IV cs.CV

    MTS-Net: Dual-Enhanced Positional Multi-Head Self-Attention for 3D CT Diagnosis of May-Thurner Syndrome

    Authors: Yixin Huang, Yiqi Jin, Ke Tao, Kaijian Xia, Jianfeng Gu, Lei Yu, Lan Du, Cunjian Chen

    Abstract: May-Thurner Syndrome (MTS), also known as iliac vein compression syndrome or Cockett's syndrome, is a condition potentially impacting over 20 percent of the population, leading to an increased risk of iliofemoral deep venous thrombosis. In this paper, we present a 3D-based deep learning approach called MTS-Net for diagnosing May-Thurner Syndrome using CT scans. To effectively capture the spatial-t…

    Submitted 7 June, 2024; originally announced June 2024.

  34. arXiv:2406.04129  [pdf, other]

    cs.CV

    LenslessFace: An End-to-End Optimized Lensless System for Privacy-Preserving Face Verification

    Authors: Xin Cai, Hailong Zhang, Chenchen Wang, Wentao Liu, Jinwei Gu, Tianfan Xue

    Abstract: Lensless cameras, innovatively replacing traditional lenses for ultra-thin, flat optics, encode light directly onto sensors, producing images that are not immediately recognizable. This compact, lightweight, and cost-effective imaging solution offers inherent privacy advantages, making it attractive for privacy-sensitive applications like face verification. Typical lensless face verification adopt…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: under review

  35. arXiv:2406.03712  [pdf, other]

    cs.CL cs.LG

    A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions

    Authors: Lei Liu, Xiaoyan Yang, Junchi Lei, Xiaoyang Liu, Yue Shen, Zhiqiang Zhang, Peng Wei, Jinjie Gu, Zhixuan Chu, Zhan Qin, Kui Ren

    Abstract: Large language models (LLMs), such as GPT series models, have received substantial attention due to their impressive capabilities for generating and understanding human-level language. More recently, LLMs have emerged as an innovative and powerful adjunct in the medical field, transforming traditional practices and heralding a new era of enhanced healthcare services. This survey provides a compreh…

    Submitted 5 June, 2024; originally announced June 2024.

  36. arXiv:2406.03303  [pdf, other]

    cs.CV

    Learning Visual Prompts for Guiding the Attention of Vision Transformers

    Authors: Razieh Rezaei, Masoud Jalili Sabet, Jindong Gu, Daniel Rueckert, Philip Torr, Ashkan Khakzar

    Abstract: Visual prompting infuses visual information into the input image to adapt models toward specific predictions and tasks. Recently, manually crafted markers such as red circles are shown to guide the model to attend to a target region on the image. However, these markers only work on models trained with data containing those markers. Moreover, finding these prompts requires guesswork or prior knowle…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Short version (4-pages) accepted as a spotlight paper at T4V workshop, CVPR 2024

  37. arXiv:2406.02014  [pdf, other]

    q-bio.NC cs.LG cs.SD eess.AS

    Understanding Auditory Evoked Brain Signal via Physics-informed Embedding Network with Multi-Task Transformer

    Authors: Wanli Ma, Xuegang Tang, Jin Gu, Ying Wang, Yuling Xia

    Abstract: In the fields of brain-computer interaction and cognitive neuroscience, effective decoding of auditory signals from task-based functional magnetic resonance imaging (fMRI) is key to understanding how the brain processes complex auditory information. Although existing methods have enhanced decoding capabilities, limitations remain in information utilization and model representation. To overcome the…

    Submitted 4 June, 2024; originally announced June 2024.

  38. arXiv:2406.01003  [pdf, other]

    cs.CV

    Uni-ISP: Unifying the Learning of ISPs from Multiple Cameras

    Authors: Lingen Li, Mingde Yao, Xingyu Meng, Muquan Yu, Tianfan Xue, Jinwei Gu

    Abstract: Modern end-to-end image signal processors (ISPs) can learn complex mappings from RAW/XYZ data to sRGB (or inverse), opening new possibilities in image processing. However, as the diversity of camera models continues to expand, developing and maintaining individual ISPs is not sustainable in the long term, which inherently lacks versatility, hindering the adaptability to multiple camera models. In…

    Submitted 3 June, 2024; originally announced June 2024.

  39. arXiv:2406.00633  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Improving GFlowNets for Text-to-Image Diffusion Alignment

    Authors: Dinghuai Zhang, Yizhe Zhang, Jiatao Gu, Ruixiang Zhang, Josh Susskind, Navdeep Jaitly, Shuangfei Zhai

    Abstract: Diffusion models have become the de-facto approach for generating visual data, which are trained to match the distribution of the training dataset. In addition, we also want to control generation to fulfill desired properties such as alignment to a text description, which can be specified with a black-box reward function. Prior works fine-tune pretrained diffusion models to achieve this goal throu…

    Submitted 16 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  40. arXiv:2405.21048  [pdf, other]

    cs.CV

    Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling

    Authors: Jiatao Gu, Ying Shen, Shuangfei Zhai, Yizhe Zhang, Navdeep Jaitly, Joshua M. Susskind

    Abstract: Diffusion models have emerged as a powerful tool for generating high-quality images from textual descriptions. Despite their successes, these models often exhibit limited diversity in the sampled images, particularly when sampling with a high classifier-free guidance weight. To address this issue, we present Kaleido, a novel approach that enhances the diversity of samples by incorporating autoregr…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 22 pages, 14 figures

  41. arXiv:2405.21018  [pdf, other]

    cs.LG cs.CL cs.CR

    Improved Techniques for Optimization-Based Jailbreaking on Large Language Models

    Authors: Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, Min Lin

    Abstract: Large language models (LLMs) are being rapidly developed, and a key component of their widespread deployment is their safety-related alignment. Many red-teaming efforts aim to jailbreak LLMs, where among these efforts, the Greedy Coordinate Gradient (GCG) attack's success has led to a growing interest in the study of optimization-based jailbreaking techniques. Although GCG is a significant milesto…

    Submitted 5 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  42. arXiv:2405.20640  [pdf, other]

    cs.LG cs.SI

    Heterophilous Distribution Propagation for Graph Neural Networks

    Authors: Zhuonan Zheng, Sheng Zhou, Hongjia Xu, Ming Gu, Yilun Xu, Ao Li, Yuhong Li, Jingjun Gu, Jiajun Bu

    Abstract: Graph Neural Networks (GNNs) have achieved remarkable success in various graph mining tasks by aggregating information from neighborhoods for representation learning. The success relies on the homophily assumption that nearby nodes exhibit similar behaviors, while it may be violated in many real-world graphs. Recently, heterophilous graph neural networks (HeterGNNs) have attracted increasing atten…

    Submitted 31 May, 2024; originally announced May 2024.

  43. arXiv:2405.20584  [pdf, other]

    cs.CV cs.AI

    Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization

    Authors: Yisu Liu, Jinyang An, Wanqian Zhang, Dayan Wu, Jingzi Gu, Zheng Lin, Weiping Wang

    Abstract: With the development of diffusion-based customization methods like DreamBooth, individuals now have access to train the models that can generate their personalized images. Despite the convenience, malicious users have misused these techniques to create fake images, thereby triggering a privacy security crisis. In light of this, proactive adversarial attacks are proposed to protect users against cu…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Under review

    ACM Class: I.2.10

  44. arXiv:2405.20327  [pdf, other]

    cs.CV

    GECO: Generative Image-to-3D within a SECOnd

    Authors: Chen Wang, Jiatao Gu, Xiaoxiao Long, Yuan Liu, Lingjie Liu

    Abstract: 3D generation has seen remarkable progress in recent years. Existing techniques, such as score distillation methods, produce notable results but require extensive per-scene optimization, impacting time efficiency. Alternatively, reconstruction-based approaches prioritize efficiency but compromise quality due to their limited handling of uncertainty. We introduce GECO, a novel method for high-quali…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Project Page: https://cwchenwang.github.io/geco

  45. arXiv:2405.20090  [pdf, other]

    cs.CV

    Typography Leads Semantic Diversifying: Amplifying Adversarial Transferability across Multimodal Large Language Models

    Authors: Hao Cheng, Erjia Xiao, Jiahang Cao, Le Yang, Kaidi Xu, Jindong Gu, Renjing Xu

    Abstract: Following the advent of the Artificial Intelligence (AI) era of large models, Multimodal Large Language Models (MLLMs) with the ability to understand cross-modal interactions between vision and text have attracted wide attention. Adversarial examples with human-imperceptible perturbation are shown to possess a characteristic known as transferability, which means that a perturbation generated by on…

    Submitted 30 May, 2024; originally announced May 2024.

  46. arXiv:2405.19893  [pdf, other]

    cs.LG cs.AI cs.CL

    Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts

    Authors: Chunjing Gan, Dan Yang, Binbin Hu, Hanxiao Zhang, Siyuan Li, Ziqi Liu, Yue Shen, Lin Ju, Zhiqiang Zhang, Jinjie Gu, Lei Liang, Jun Zhou

    Abstract: In recent years, large language models (LLMs) have made remarkable achievements in various domains. However, the untimeliness and cost of knowledge updates coupled with hallucination issues of LLMs have curtailed their applications in knowledge intensive tasks, where retrieval augmented generation (RAG) can be of help. Nevertheless, existing retrieval augmented models typically use similarity as a…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 12 pages

  47. arXiv:2405.18971  [pdf, other]

    cs.IR

    Mitigate Position Bias with Coupled Ranking Bias on CTR Prediction

    Authors: Yao Zhao, Zhining Liu, Tianchi Cai, Haipeng Zhang, Chenyi Zhuang, Jinjie Gu

    Abstract: Position bias, i.e., users' preference of an item is affected by its placing position, is well studied in the recommender system literature. However, most existing methods ignore the widely coupled ranking bias, which is also related to the placing position of the item. Using both synthetic and industrial datasets, we first show how this widely coexisted ranking bias deteriorates the performance o…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 5 pages, 3 figures

  48. arXiv:2405.18842  [pdf, other]

    cs.CV

    Descriptive Image Quality Assessment in the Wild

    Authors: Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong, Tianfan Xue

    Abstract: With the rapid advancement of Vision Language Models (VLMs), VLM-based Image Quality Assessment (IQA) seeks to describe image quality linguistically to align with human expression and capture the multifaceted nature of IQA tasks. However, current methods are still far from practical usage. First, prior works focus narrowly on specific sub-tasks or settings, which do not align with diverse real-wor…

    Submitted 12 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  49. arXiv:2405.16821  [pdf, other]

    cs.CL

    Perturbation-Restrained Sequential Model Editing

    Authors: Jun-Yu Ma, Hong Wang, Hao-Xiang Xu, Zhen-Hua Ling, Jia-Chen Gu

    Abstract: Model editing is an emerging field that focuses on updating the knowledge embedded within large language models (LLMs) without extensive retraining. However, current model editing methods significantly compromise the general abilities of LLMs as the number of edits increases, and this trade-off poses a substantial challenge to the continual learning of LLMs. In this paper, we first theoretically a…

    Submitted 27 May, 2024; originally announced May 2024.

  50. arXiv:2405.16418  [pdf, other]

    cs.LG cs.AI cs.CV

    Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

    Authors: Jiuxiang Gu, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou

    Abstract: Diffusion models have made rapid progress in generating high-quality samples across various domains. However, a theoretical understanding of the Lipschitz continuity and second momentum properties of the diffusion process is still lacking. In this paper, we bridge this gap by providing a detailed examination of these smoothness properties for the case where the target data distribution is a mixtur…

    Submitted 25 May, 2024; originally announced May 2024.