Showing 1–50 of 239 results for author: Gu, S

  1. arXiv:2407.10702

    cs.LG

    Geometric Analysis of Unconstrained Feature Models with $d=K$

    Authors: Shao Gu, Yi Shen

    Abstract: Recently, interesting empirical phenomena known as Neural Collapse have been observed during the final phase of training deep neural networks for classification tasks. We examine this issue when the feature dimension d is equal to the number of classes K. We demonstrate that two popular unconstrained feature models are strict saddle functions, with every critical point being either a global minimu…

    Submitted 15 July, 2024; originally announced July 2024.

  2. arXiv:2407.06109

    cs.CV

    PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models

    Authors: Jinhua Zhang, Hualian Sheng, Sijia Cai, Bing Deng, Qiao Liang, Wen Li, Ying Fu, Jieping Ye, Shuhang Gu

    Abstract: Controllable generation is considered a potentially vital approach to address the challenge of annotating 3D data, and the precision of such controllable generation becomes particularly imperative in the context of data production for autonomous driving. Existing methods focus on the integration of diverse generative information into controlling inputs, utilizing frameworks such as GLIGEN or Contr…

    Submitted 16 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  3. arXiv:2407.03297

    cs.CV cs.AI

    Improved Noise Schedule for Diffusion Training

    Authors: Tiankai Hang, Shuyang Gu

    Abstract: Diffusion models have emerged as the de facto choice for generating visual signals. However, training a single model to predict noise across various levels poses significant challenges, necessitating numerous iterations and incurring significant computational costs. Various approaches, such as loss weighting strategy design and architectural refinements, have been introduced to expedite convergenc…

    Submitted 3 July, 2024; originally announced July 2024.

  4. arXiv:2407.03152

    cs.CV cs.LG

    Stereo Risk: A Continuous Modeling Approach to Stereo Matching

    Authors: Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Yao Yao, Luc Van Gool

    Abstract: We introduce Stereo Risk, a new deep-learning approach to solve the classical stereo-matching problem in computer vision. As it is well-known that stereo matching boils down to a per-pixel disparity estimation problem, the popular state-of-the-art stereo-matching approaches widely rely on regressing the scene disparity values, yet via discretization of scene disparity values. Such discretization o…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted as an Oral Paper at ICML 2024. Draft info: 18 pages, 6 Figures, 16 Tables

  5. arXiv:2407.01648

    q-bio.BM cs.LG q-bio.QM

    Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization

    Authors: Siyi Gu, Minkai Xu, Alexander Powers, Weili Nie, Tomas Geffner, Karsten Kreis, Jure Leskovec, Arash Vahdat, Stefano Ermon

    Abstract: Generating ligand molecules for specific protein targets, known as structure-based drug design, is a fundamental problem in therapeutics development and biological discovery. Recently, target-aware generative models, especially diffusion models, have shown great promise in modeling protein-ligand interactions and generating candidate drugs. However, existing models primarily focus on learning the…

    Submitted 1 July, 2024; originally announced July 2024.

  6. arXiv:2406.08392

    cs.CV

    FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

    Authors: Xinzhi Mu, Li Chen, Bohan Chen, Shuyang Gu, Jianmin Bao, Dong Chen, Ji Li, Yuhui Yuan

    Abstract: Recently, the application of modern diffusion-based text-to-image generation models for creating artistic fonts, traditionally the domain of professional designers, has garnered significant interest. Diverging from the majority of existing studies that concentrate on generating artistic typography, our research aims to tackle a novel and more demanding challenge: the generation of text effects for…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Project-page: https://font-studio.github.io/

  7. arXiv:2406.04314

    cs.CV

    Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

    Authors: Zhanhao Liang, Yuhui Yuan, Shuyang Gu, Bohan Chen, Tiankai Hang, Ji Li, Liang Zheng

    Abstract: Recently, Direct Preference Optimization (DPO) has extended its success from aligning large language models (LLMs) to aligning text-to-image diffusion models with human preferences. Unlike most existing DPO methods that assume all diffusion steps share a consistent preference order with the final generated images, we argue that this assumption neglects step-specific denoising performance and that…

    Submitted 6 June, 2024; originally announced June 2024.

  8. arXiv:2406.02147

    cs.CV

    UA-Track: Uncertainty-Aware End-to-End 3D Multi-Object Tracking

    Authors: Lijun Zhou, Tao Tang, Pengkun Hao, Zihang He, Kalok Ho, Shuo Gu, Wenbo Hou, Zhihui Hao, Haiyang Sun, Kun Zhan, Peng Jia, Xianpeng Lang, Xiaodan Liang

    Abstract: 3D multiple object tracking (MOT) plays a crucial role in autonomous driving perception. Recent end-to-end query-based trackers simultaneously detect and track objects, which have shown promising potential for the 3D MOT task. However, existing methods overlook the uncertainty issue, which refers to the lack of precise confidence about the state and location of tracked objects. Uncertainty arises…

    Submitted 4 June, 2024; originally announced June 2024.

  9. arXiv:2405.20860

    cs.LG

    Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation

    Authors: Shangding Gu, Laixi Shi, Yuhao Ding, Alois Knoll, Costas Spanos, Adam Wierman, Ming Jin

    Abstract: Safe reinforcement learning (RL) is crucial for deploying RL agents in real-world applications, as it aims to maximize long-term rewards while satisfying safety constraints. However, safe RL often suffers from sample inefficiency, requiring extensive interactions with the environment to learn a safe policy. We propose Efficient Safe Policy Optimization (ESPO), a novel approach that enhances the ef…

    Submitted 31 May, 2024; originally announced May 2024.

  10. arXiv:2405.18209

    cs.RO cs.LG

    Safe Multi-Agent Reinforcement Learning with Bilevel Optimization in Autonomous Driving

    Authors: Zhi Zheng, Shangding Gu

    Abstract: Ensuring safety in MARL, particularly when deploying it in real-world applications such as autonomous driving, emerges as a critical challenge. To address this challenge, traditional safe MARL methods extend MARL approaches to incorporate safety considerations, aiming to minimize safety risk values. However, these safe MARL algorithms often fail to model other agents and lack convergence guarantee…

    Submitted 28 May, 2024; originally announced May 2024.

  11. arXiv:2405.16414

    cs.CV

    PPRSteg: Printing and Photography Robust QR Code Steganography via Attention Flow-Based Model

    Authors: Huayuan Ye, Shenzhuo Zhang, Shiqi Jiang, Jing Liao, Shuhang Gu, Changbo Wang, Chenhui Li

    Abstract: Image steganography can hide information in a host image and obtain a stego image that is perceptually indistinguishable from the original one. This technique has tremendous potential in scenarios like copyright protection, information retrospection, etc. Some previous studies have proposed to enhance the robustness of the methods against image disturbances to increase their applicability. However…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 9 content pages

  12. arXiv:2405.16390

    cs.AI cs.LG

    Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning

    Authors: Shangding Gu, Bilgehan Sel, Yuhao Ding, Lu Wang, Qingwei Lin, Alois Knoll, Ming Jin

    Abstract: In numerous reinforcement learning (RL) problems involving safety-critical systems, a key challenge lies in balancing multiple objectives while simultaneously meeting all stringent safety constraints. To tackle this issue, we propose a primal-based framework that orchestrates policy optimization between multi-objective learning and constraint adherence. Our method employs a novel natural policy gr…

    Submitted 25 May, 2024; originally announced May 2024.

  13. arXiv:2405.16256

    cs.DC cs.AI

    HetHub: A Heterogeneous distributed hybrid training system for large-scale models

    Authors: Si Xu, Zixiao Huang, Yan Zeng, Shengen Yan, Xuefei Ning, Haolin Ye, Sipei Gu, Chunsheng Shui, Zhezheng Lin, Hao Zhang, Sheng Wang, Guohao Dai, Yu Wang

    Abstract: The development of large-scale models relies on a vast number of computing resources. For example, the GPT-4 model (1.8 trillion parameters) requires 25000 A100 GPUs for its training. It is a challenge to build a large-scale cluster with a type of GPU-accelerator. Using multiple types of GPU-accelerators to construct a cluster is an effective way to solve the problem of insufficient homogeneous GP…

    Submitted 25 May, 2024; originally announced May 2024.

  14. arXiv:2405.06001

    cs.LG cs.AI cs.CL

    LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models

    Authors: Ruihao Gong, Yang Yong, Shiqiao Gu, Yushi Huang, Yunchen Zhang, Xianglong Liu, Dacheng Tao

    Abstract: Recent advancements in large language models (LLMs) are propelling us toward artificial general intelligence, thanks to their remarkable emergent abilities and reasoning capabilities. However, the substantial computational and memory requirements of LLMs limit their widespread adoption. Quantization, a key compression technique, offers a viable solution to mitigate these demands by compressing a…

    Submitted 9 May, 2024; originally announced May 2024.

  15. arXiv:2405.01677

    cs.LG cs.AI

    Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation

    Authors: Shangding Gu, Bilgehan Sel, Yuhao Ding, Lu Wang, Qingwei Lin, Ming Jin, Alois Knoll

    Abstract: Ensuring the safety of Reinforcement Learning (RL) is crucial for its deployment in real-world applications. Nevertheless, managing the trade-off between reward and safety during exploration presents a significant challenge. Improving reward performance through policy adjustments may adversely affect safety performance. In this study, we aim to address this conflicting relation by leveraging the t…

    Submitted 7 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  16. arXiv:2404.14435

    cs.CV eess.IV

    FreSeg: Frenet-Frame-based Part Segmentation for 3D Curvilinear Structures

    Authors: Shixuan Gu, Jason Ken Adhinarta, Mikhail Bessmeltsev, Jiancheng Yang, Jessica Zhang, Daniel Berger, Jeff W. Lichtman, Hanspeter Pfister, Donglai Wei

    Abstract: Part segmentation is a crucial task for 3D curvilinear structures like neuron dendrites and blood vessels, enabling the analysis of dendritic spines and aneurysms with scientific and clinical significance. However, their diversely winded morphology poses a generalization challenge to existing deep learning methods, which leads to labor-intensive manual correction. In this work, we propose FreSeg,…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 10 pages, 4 figures

  17. arXiv:2404.14248

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin , et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  18. arXiv:2404.06777

    cs.NI

    Responsible Federated Learning in Smart Transportation: Outlooks and Challenges

    Authors: Xiaowen Huang, Tao Huang, Shushi Gu, Shuguang Zhao, Guanglin Zhang

    Abstract: Integrating artificial intelligence (AI) and federated learning (FL) in smart transportation has raised critical issues regarding their responsible use. Ensuring responsible AI is paramount for the stability and sustainability of intelligent transportation systems. Despite its importance, research on the responsible application of AI and FL in this domain remains nascent, with a paucity of in-dept…

    Submitted 10 April, 2024; originally announced April 2024.

  19. arXiv:2403.17421

    cs.IR cs.AI

    MA4DIV: Multi-Agent Reinforcement Learning for Search Result Diversification

    Authors: Yiqun Chen, Jiaxin Mao, Yi Zhang, Dehong Ma, Long Xia, Jun Fan, Daiting Shi, Zhicong Cheng, Simiu Gu, Dawei Yin

    Abstract: The objective of search result diversification (SRD) is to ensure that selected documents cover as many different subtopics as possible. Existing methods primarily utilize a paradigm of "greedy selection", i.e., selecting one document with the highest diversity score at a time. These approaches tend to be inefficient and are easily trapped in a suboptimal state. In addition, some other methods aim…

    Submitted 27 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  20. Self-Supervised Learning for Medical Image Data with Anatomy-Oriented Imaging Planes

    Authors: Tianwei Zhang, Dong Wei, Mengmeng Zhu, Shi Gu, Yefeng Zheng

    Abstract: Self-supervised learning has emerged as a powerful tool for pretraining deep networks on unlabeled data, prior to transfer learning of target tasks with limited annotation. The relevance between the pretraining pretext and target tasks is crucial to the success of transfer learning. Various pretext tasks have been proposed to utilize properties of medical image data (e.g., three dimensionality), w…

    Submitted 7 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Medical Image Analysis

  21. arXiv:2403.16001

    cs.SE

    Fine-Grained Assertion-Based Test Selection

    Authors: Sijia Gu, Ali Mesbah

    Abstract: For large software applications, running the whole test suite after each code change is time- and resource-intensive. Regression test selection techniques aim at reducing test execution time by selecting only the tests that are affected by code changes. However, existing techniques select test entities at coarse granularity levels such as test class, which causes imprecise test selection and execu…

    Submitted 24 March, 2024; originally announced March 2024.

  22. arXiv:2403.14623

    cs.LG cs.CV

    Simplified Diffusion Schrödinger Bridge

    Authors: Zhicong Tang, Tiankai Hang, Shuyang Gu, Dong Chen, Baining Guo

    Abstract: This paper introduces a novel theoretical simplification of the Diffusion Schrödinger Bridge (DSB) that facilitates its unification with Score-based Generative Models (SGMs), addressing the limitations of DSB in complex data generation and enabling faster convergence and enhanced performance. By employing SGMs as an initial solution for DSB, our approach capitalizes on the strengths of both framew…

    Submitted 27 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

  23. arXiv:2403.10831

    cs.CV

    DUE: Dynamic Uncertainty-Aware Explanation Supervision via 3D Imputation

    Authors: Qilong Zhao, Yifei Zhang, Mengdan Zhu, Siyi Gu, Yuyang Gao, Xiaofeng Yang, Liang Zhao

    Abstract: Explanation supervision aims to enhance deep learning models by integrating additional signals to guide the generation of model explanations, showcasing notable improvements in both the predictability and explainability of the model. However, the application of explanation supervision to higher-dimensional data, such as 3D medical images, remains an under-explored domain. Challenges associated wit…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: 9 pages, 6 figures

  24. arXiv:2403.09637

    cs.RO cs.CV

    GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping

    Authors: Yuhang Zheng, Xiangyu Chen, Yupeng Zheng, Songen Gu, Runyi Yang, Bu Jin, Pengfei Li, Chengliang Zhong, Zengmao Wang, Lina Liu, Chao Yang, Dawei Wang, Zhen Chen, Xiaoxiao Long, Meiqing Wang

    Abstract: Constructing a 3D scene capable of accommodating open-ended language queries, is a pivotal pursuit, particularly within the domain of robotics. Such technology facilitates robots in executing object manipulations based on human language directives. To tackle this challenge, some research efforts have been dedicated to the development of language-embedded implicit fields. However, implicit fields (…

    Submitted 14 March, 2024; originally announced March 2024.

  25. arXiv:2403.08694

    cs.CL

    TeaMs-RL: Teaching LLMs to Teach Themselves Better Instructions via Reinforcement Learning

    Authors: Shangding Gu, Alois Knoll, Ming Jin

    Abstract: The development of Large Language Models (LLMs) often confronts challenges stemming from the heavy reliance on human annotators in the reinforcement learning with human feedback (RLHF) framework, or the frequent and costly external queries tied to the self-instruct paradigm. In this work, we pivot to Reinforcement Learning (RL) -- but with a twist. Diverging from the typical RLHF, which refines LL…

    Submitted 3 May, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  26. arXiv:2403.05606

    cs.LG cs.AI cs.CL cs.CV

    A Concept-based Interpretable Model for the Diagnosis of Choroid Neoplasias using Multimodal Data

    Authors: Yifan Wu, Yang Liu, Yue Yang, Michael S. Yao, Wenli Yang, Xuehui Shi, Lihong Yang, Dongjun Li, Yueming Liu, James C. Gee, Xuan Yang, Wenbin Wei, Shi Gu

    Abstract: Diagnosing rare diseases presents a common challenge in clinical practice, necessitating the expertise of specialists for accurate identification. The advent of machine learning offers a promising solution, while the development of such technologies is hindered by the scarcity of data on rare conditions and the demand for models that are both interpretable and trustworthy in a clinical context. In…

    Submitted 8 March, 2024; originally announced March 2024.

  27. arXiv:2403.05530

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  28. arXiv:2402.09372

    eess.IV cs.AI cs.CV

    Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

    Authors: Jiancheng Yang, Rui Shi, Liang Jin, Xiaoyang Huang, Kaiming Kuang, Donglai Wei, Shixuan Gu, Jianying Liu, Pengfei Liu, Zhizhong Chai, Yongjie Xiao, Hao Chen, Liming Xu, Bang Du, Xiangyi Yan, Hao Tang, Adam Alessio, Gregory Holste, Jiapeng Zhang, Xiaoming Wang, Jianye He, Lixuan Che, Hanspeter Pfister, Ming Li, Bingbing Ni

    Abstract: Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts to address this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmar…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Challenge paper for MICCAI RibFrac Challenge (https://ribfrac.grand-challenge.org/)

  29. arXiv:2402.04504

    cs.CV

    Text2Street: Controllable Text-to-image Generation for Street Views

    Authors: Jinming Su, Songen Gu, Yiting Duan, Xingyue Chen, Junfeng Luo

    Abstract: Text-to-image generation has made remarkable progress with the emergence of diffusion models. However, it is still a difficult task to generate images for street views based on text, mainly because the road topology of street scenes is complex, the traffic status is diverse and the weather condition is various, which makes conventional text-to-image models difficult to deal with. To address these…

    Submitted 6 February, 2024; originally announced February 2024.

  30. arXiv:2402.02498

    eess.IV cs.AI cs.CV

    Fully Differentiable Correlation-driven 2D/3D Registration for X-ray to CT Image Fusion

    Authors: Minheng Chen, Zhirun Zhang, Shuheng Gu, Zhangyang Ge, Youyong Kong

    Abstract: Image-based rigid 2D/3D registration is a critical technique for fluoroscopic guided surgical interventions. In recent years, some learning-based fully differentiable methods have produced beneficial outcomes while the process of feature extraction and gradient flow transmission still lack controllability and interpretability. To alleviate these problems, in this work, we propose a novel fully dif…

    Submitted 15 March, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: ISBI 2024

  31. arXiv:2401.13011

    cs.CV

    CCA: Collaborative Competitive Agents for Image Editing

    Authors: Tiankai Hang, Shuyang Gu, Dong Chen, Xin Geng, Baining Guo

    Abstract: This paper presents a novel generative model, Collaborative Competitive Agents (CCA), which leverages the capabilities of multiple Large Language Models (LLMs) based agents to execute complex tasks. Drawing inspiration from Generative Adversarial Networks (GANs), the CCA system employs two equal-status generator agents and a discriminator agent. The generators independently process user instructio…

    Submitted 23 January, 2024; originally announced January 2024.

  32. arXiv:2401.08209

    cs.CV

    Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary

    Authors: Leheng Zhang, Yawei Li, Xingyu Zhou, Xiaorui Zhao, Shuhang Gu

    Abstract: Single Image Super-Resolution is a classic computer vision problem that involves estimating high-resolution (HR) images from low-resolution (LR) ones. Although deep neural networks (DNNs), especially Transformers for super-resolution, have seen significant advancements in recent years, challenges still remain, particularly in limited receptive field caused by window-based self-attention. To addres…

    Submitted 18 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: 15 pages, 9 figures

  33. arXiv:2401.07402

    cs.CV

    Improved Implicit Neural Representation with Fourier Reparameterized Training

    Authors: Kexuan Shi, Xingyu Zhou, Shuhang Gu

    Abstract: Implicit Neural Representation (INR) as a mighty representation paradigm has achieved success in various computer vision tasks recently. Due to the low-frequency bias issue of vanilla multi-layer perceptron (MLP), existing methods have investigated advanced techniques, such as positional encoding and periodic activation function, to improve the accuracy of INR. In this paper, we connect the networ…

    Submitted 4 July, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

    Comments: CVPR 2024

  34. arXiv:2401.06603

    cs.CL

    Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Case Study

    Authors: Shangding Gu

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities for reinforcement learning (RL) models, such as planning and reasoning capabilities. However, the problems of LLMs and RL model collaboration still need to be solved. In this study, we employ a teacher-student learning framework to tackle these problems, specifically by offering feedback for LLMs using RL models and providing h…

    Submitted 12 January, 2024; originally announced January 2024.

  35. arXiv:2401.06312

    cs.CV

    Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention

    Authors: Xingyu Zhou, Leheng Zhang, Xiaorui Zhao, Keze Wang, Leida Li, Shuhang Gu

    Abstract: Recently, Vision Transformer has achieved great success in recovering missing details in low-resolution sequences, i.e., the video super-resolution (VSR) task. Despite its superiority in VSR accuracy, the heavy computational burden as well as the large memory footprint hinder the deployment of Transformer-based VSR models on constrained devices. In this paper, we address the above issue by proposi…

    Submitted 29 March, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted by CVPR 2024

  36. arXiv:2312.12933

    cs.SE

    Automated Testing for Text-to-Image Software

    Authors: Siqi Gu

    Abstract: Recently, creative generative artificial intelligence software has emerged as a pivotal assistant, enabling users to generate content and seek inspiration rapidly. Text-to-image (T2I) software, being one of the most widely used among them, is used to synthesize images with simple text input by engaging in a cross-modal process. However, despite substantial advancements in several fields, T2I softw…

    Submitted 24 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  37. arXiv:2312.11805

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  38. arXiv:2312.11459

    cs.CV

    VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder

    Authors: Zhicong Tang, Shuyang Gu, Chunyu Wang, Ting Zhang, Jianmin Bao, Dong Chen, Baining Guo

    Abstract: This paper introduces a pioneering 3D volumetric encoder designed for text-to-3D generation. To scale up the training data for the diffusion model, a lightweight network is developed to efficiently acquire feature volumes from multi-view images. The 3D volumes are then trained on a diffusion model for text-to-3D generation using a 3D U-Net. This research further addresses the challenges of inaccur…

    Submitted 28 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  39. arXiv:2312.06126

    cs.LG cs.AI cs.DC

    Spreeze: High-Throughput Parallel Reinforcement Learning Framework

    Authors: Jing Hou, Guang Chen, Ruiqi Zhang, Zhijun Li, Shangding Gu, Changjun Jiang

    Abstract: The promotion of large-scale applications of reinforcement learning (RL) requires efficient training computation. While existing parallel RL frameworks encompass a variety of RL algorithms and parallelization techniques, the excessively burdensome communication frameworks hinder the attainment of the hardware's limit for final throughput and training effects on a single desktop. In this paper, we…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 11 pages, 8 figures, submitted to IEEE Transactions Journal

  40. arXiv:2311.11601

    cs.CL

    Addressing the Length Bias Problem in Document-Level Neural Machine Translation

    Authors: Zhuocheng Zhang, Shuhao Gu, Min Zhang, Yang Feng

    Abstract: Document-level neural machine translation (DNMT) has shown promising results by incorporating more context information. However, this approach also introduces a length bias problem, whereby DNMT suffers from significant translation quality degradation when decoding documents that are much shorter or longer than the maximum sequence length during training. To solve t…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: Accepted by EMNLP 2023 Findings

  41. arXiv:2311.10954

    astro-ph.EP cs.LG

    Taxonomic analysis of asteroids with artificial neural networks

    Authors: Nanping Luo, Xiaobin Wang, Shenghong Gu, Antti Penttilä, Karri Muinonen, Yisi Liu

    Abstract: We study the surface composition of asteroids with visible and/or infrared spectroscopy. For example, asteroid taxonomy is based on the spectral features or multiple color indices in visible and near-infrared wavelengths. The composition of asteroids gives key information to understand their origin and evolution. However, we lack compositional information for faint asteroids due to limits of groun…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: 10 pages, 8 figures, accepted by AJ for publication

  42. arXiv:2311.09262

    cs.SI cs.AI

    Disentangling the Potential Impacts of Papers into Diffusion, Conformity, and Contribution Values

    Authors: Zhikai Xue, Guoxiu He, Zhuoren Jiang, Sichen Gu, Yangyang Kang, Star Zhao, Wei Lu

    Abstract: The potential impact of an academic paper is determined by various factors, including its popularity and contribution. Existing models usually estimate original citation counts based on static graphs and fail to differentiate values from nuanced perspectives. In this study, we propose a novel graph neural network to Disentangle the Potential impacts of Papers into Diffusion, Conformity, and Contri…

    Submitted 21 May, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: Update and correct some references. This paper is still in progress

  43. arXiv:2311.08911  [pdf, other]

    cs.GT

    Connection Incentives in Cost Sharing Mechanisms with Budgets

    Authors: Tianyi Zhang, Dengji Zhao, Junyu Zhang, Sizhe Gu

    Abstract: In a cost sharing problem on a weighted undirected graph, all other nodes want to connect to the source node for some service. Each edge has a cost denoted by a weight and all the connected nodes should share the total cost for the connectivity. The goal of the existing solutions (e.g. folk solution and cycle-complete solution) is to design cost sharing rules with nice properties, e.g. budget bala…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2201.05976

  44. arXiv:2311.08903  [pdf, other]

    cs.GT

    Cost Sharing under Private Costs and Connection Control on Directed Acyclic Graphs

    Authors: Tianyi Zhang, Dengji Zhao, Junyu Zhang, Sizhe Gu

    Abstract: We consider a cost sharing problem on a weighted directed acyclic graph (DAG) with a source node to which all the other nodes want to connect. The cost (weight) of each edge is private information reported by multiple contractors, and among them, only one contractor is selected as the builder. All the nodes except for the source need to share the total cost of the used edges. However, they may blo…

    Submitted 15 November, 2023; originally announced November 2023.

  45. arXiv:2311.06211  [pdf, other]

    cs.CV cs.RO

    ASSIST: Interactive Scene Nodes for Scalable and Realistic Indoor Simulation

    Authors: Zhide Zhong, Jiakai Cao, Songen Gu, Sirui Xie, Weibo Gao, Liyi Luo, Zike Yan, Hao Zhao, Guyue Zhou

    Abstract: We present ASSIST, an object-wise neural radiance field as a panoptic representation for compositional and realistic simulation. Central to our approach is a novel scene node data structure that stores the information of each object in a unified fashion, allowing online interaction in both intra- and cross-scene settings. By incorporating a differentiable neural network along with the associated b…

    Submitted 10 November, 2023; originally announced November 2023.

  46. arXiv:2311.00880  [pdf, other]

    cs.LG cs.AI

    SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization

    Authors: Jaafar Mhamed, Shangding Gu

    Abstract: Incorporating safety is an essential prerequisite for broadening the practical applications of reinforcement learning in real-world scenarios. To tackle this challenge, Constrained Markov Decision Processes (CMDPs) are leveraged, which introduce a distinct cost function representing safety violations. In CMDPs' settings, Lagrangian relaxation technique has been employed in previous algorithms to c…

    Submitted 1 November, 2023; originally announced November 2023.

  47. arXiv:2310.18635  [pdf, other]

    cs.HC

    T-PickSeer: Visual Analysis of Taxi Pick-up Point Selection Behavior

    Authors: Shuxian Gu, Yemo Dai, Zezheng Feng, Yong Wang, Haipeng Zeng

    Abstract: Taxi drivers often take much time to navigate the streets to look for passengers, which leads to high vacancy rates and wasted resources. Empty taxi cruising remains a big concern for taxi companies. Analyzing the pick-up point selection behavior can solve this problem effectively, providing suggestions for taxi management and dispatch. Many studies have been devoted to analyzing and recommending…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: 10 pages, 10 figures; The 10th China Visualization and Visual Analytics Conference

  48. arXiv:2310.13912  [pdf, other]

    cs.CV

    Learning Motion Refinement for Unsupervised Face Animation

    Authors: Jiale Tao, Shuhang Gu, Wen Li, Lixin Duan

    Abstract: Unsupervised face animation aims to generate a human face video based on the appearance of a source image, mimicking the motion from a driving video. Existing methods typically adopted a prior-based motion model (e.g., the local affine motion model or the local thin-plate-spline motion model). While it is able to capture the coarse facial motion, artifacts can often be observed around the tiny mot…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  49. arXiv:2310.11360  [pdf, other]

    cs.CL

    Enhancing Neural Machine Translation with Semantic Units

    Authors: Langlin Huang, Shuhao Gu, Zhuocheng Zhang, Yang Feng

    Abstract: Conventional neural machine translation (NMT) models typically use subwords and words as the basic units for model input and comprehension. However, complete words and phrases composed of several tokens are often the fundamental units for expressing semantics, referred to as semantic units. To address this issue, we propose a method Semantic Units for Machine Translation (SU4MT) which models the i…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP Findings 2023

    ACM Class: I.2.7

  50. Co-Learning Semantic-aware Unsupervised Segmentation for Pathological Image Registration

    Authors: Yang Liu, Shi Gu

    Abstract: The registration of pathological images plays an important role in medical applications. Despite its significance, most researchers in this field primarily focus on the registration of normal tissue into normal tissue. The negative impact of focal tissue, such as the loss of spatial correspondence information and the abnormal distortion of tissue, are rarely considered. In this paper, we propose G…

    Submitted 19 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: 13 pages, 7 figures, published in Medical Image Computing and Computer Assisted Intervention (MICCAI) 2023

    Journal ref: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 537-547. Cham: Springer Nature Switzerland, 2023