Showing 1–50 of 227 results for author: Sun, B

  1. arXiv:2407.01445  [pdf, other]

    cs.LG cs.CV

    FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources

    Authors: Xiyuan Wei, Fanjiang Ye, Ori Yonay, Xingyu Chen, Baixi Sun, Dingwen Tao, Tianbao Yang

    Abstract: Existing studies of training state-of-the-art Contrastive Language-Image Pretraining (CLIP) models on large-scale data involve hundreds of or even thousands of GPUs due to the requirement of a large batch size. However, such a large amount of resources is not accessible to most people. While advanced compositional optimization techniques for optimizing global contrastive losses have been demonstra…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 23 pages

  2. arXiv:2406.18752  [pdf, other]

    cs.LG cs.GT

    Competitive Algorithms for Online Knapsack with Succinct Predictions

    Authors: Mohammadreza Daneshvaramoli, Helia Karisani, Adam Lechowicz, Bo Sun, Cameron Musco, Mohammad Hajiesmaili

    Abstract: In the online knapsack problem, the goal is to pack items arriving online with different values and weights into a capacity-limited knapsack to maximize the total value of the accepted items. We study \textit{learning-augmented} algorithms for this problem, which aim to use machine-learned predictions to move beyond pessimistic worst-case guarantees. Existing learning-augmented algorithms for onli…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 29 pages, 10 figures, Submitted to NeurIPS 2024

    MSC Class: 68Q25; 68T05 ACM Class: F.2.2; I.2.6

  3. arXiv:2406.12846  [pdf, other]

    cs.CV

    DrVideo: Document Retrieval Based Long Video Understanding

    Authors: Ziyu Ma, Chenhui Gou, Hengcan Shi, Bin Sun, Shutao Li, Hamid Rezatofighi, Jianfei Cai

    Abstract: Existing methods for long video understanding primarily focus on videos only lasting tens of seconds, with limited exploration of techniques for handling longer videos. The increased number of frames in longer videos presents two main challenges: difficulty in locating key information and performing long-range reasoning. Thus, we propose DrVideo, a document-retrieval-based system designed for long…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 11 pages

  4. arXiv:2406.12121  [pdf, other]

    cs.CV

    TutteNet: Injective 3D Deformations by Composition of 2D Mesh Deformations

    Authors: Bo Sun, Thibault Groueix, Chen Song, Qixing Huang, Noam Aigerman

    Abstract: This work proposes a novel representation of injective deformations of 3D space, which overcomes existing limitations of injective methods: inaccuracy, lack of robustness, and incompatibility with general learning and optimization frameworks. The core idea is to reduce the problem to a deep composition of multiple 2D mesh-based piecewise-linear maps. Namely, we build differentiable layers that pro…

    Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.07850  [pdf, other]

    cs.CL cs.AI

    Dynamic Stochastic Decoding Strategy for Open-Domain Dialogue Generation

    Authors: Yiwei Li, Fei Mi, Yitong Li, Yasheng Wang, Bin Sun, Shaoxiong Feng, Kan Li

    Abstract: Stochastic sampling strategies such as top-k and top-p have been widely used in dialogue generation task. However, as an open-domain chatting system, there will be two different conversation scenarios, i.e. chit-chat and knowledge-based question answering. In the former situation, responses diversity is essential due to the one-to-many nature in dialogue. The latter, on the other hand, requires le…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

  6. arXiv:2406.06253  [pdf, other]

    eess.SY cs.PL

    PretVM: Predictable, Efficient Virtual Machine for Real-Time Concurrency

    Authors: Shaokai Lin, Erling Jellum, Mirco Theile, Tassilo Tanneberger, Binqi Sun, Chadlia Jerad, Ruomu Xu, Guangyu Feng, Christian Menard, Marten Lohstroh, Jeronimo Castrillon, Sanjit Seshia, Edward Lee

    Abstract: This paper introduces the Precision-Timed Virtual Machine (PretVM), an intermediate platform facilitating the execution of quasi-static schedules compiled from a subset of programs written in the Lingua Franca (LF) coordination language. The subset consists of those programs that in principle should have statically verifiable and predictable timing behavior. The PretVM provides a schedule with wel…

    Submitted 25 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  7. arXiv:2406.03248  [pdf, other]

    cs.IR cs.CL

    Large Language Models as Evaluators for Recommendation Explanations

    Authors: Xiaoyu Zhang, Yishan Li, Jiayin Wang, Bowen Sun, Weizhi Ma, Peijie Sun, Min Zhang

    Abstract: The explainability of recommender systems has attracted significant attention in academia and industry. Many efforts have been made for explainable recommendations, yet evaluating the quality of the explanations remains a challenging and unresolved issue. In recent years, leveraging LLMs as evaluators presents a promising avenue in Natural Language Processing tasks (e.g., sentiment classification,…

    Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

  8. arXiv:2406.03243  [pdf, other]

    cs.AR cs.DC cs.LG

    Llumnix: Dynamic Scheduling for Large Language Model Serving

    Authors: Biao Sun, Ziming Huang, Hanyu Zhao, Wencong Xiao, Xinyi Zhang, Yong Li, Wei Lin

    Abstract: Inference serving for large language models (LLMs) is the key to unleashing their potential in people's daily lives. However, efficient LLM serving remains challenging today because the requests are inherently heterogeneous and unpredictable in terms of resource and latency requirements, as a result of the diverse applications and the dynamic execution nature of LLMs. Existing systems are fundamen…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: To appear at OSDI '24; open-source repo will be available in June 2024

  9. arXiv:2405.18715  [pdf, other]

    cs.CV

    NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

    Authors: Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, Songyou Peng

    Abstract: Neural Radiance Fields (NeRFs) have shown remarkable success in synthesizing photorealistic views from multi-view images of static scenes, but face challenges in dynamic, real-world environments with distractors like moving objects, shadows, and lighting changes. Existing methods manage controlled environments and low occlusion ratios but fall short in render quality, especially under high occlusi…

    Submitted 2 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: CVPR 2024, first two authors contributed equally. Project Page: https://rwn17.github.io/nerf-on-the-go/

  10. arXiv:2405.18347  [pdf, other]

    cs.LG

    Dataset Growth

    Authors: Ziheng Qin, Zhaopan Xu, Yukun Zhou, Zangwei Zheng, Zebang Cheng, Hao Tang, Lei Shang, Baigui Sun, Xiaojiang Peng, Radu Timofte, Hongxun Yao, Kai Wang, Yang You

    Abstract: Deep learning benefits from the growing abundance of available data. Meanwhile, efficiently dealing with the growing data scale has become a challenge. Data publicly available are from different sources with various qualities, and it is impractical to do manual cleaning against noise and redundancy given today's data scale. There are existing techniques for cleaning/selecting the collected data. H…

    Submitted 28 May, 2024; originally announced May 2024.

  11. arXiv:2405.15286  [pdf, other]

    cs.CV

    3D Unsupervised Learning by Distilling 2D Open-Vocabulary Segmentation Models for Autonomous Driving

    Authors: Boyi Sun, Yuhang Liu, Xingxia Wang, Bin Tian, Long Chen, Fei-Yue Wang

    Abstract: Point cloud data labeling is considered a time-consuming and expensive task in autonomous driving, whereas unsupervised learning can avoid it by learning point cloud representations from unannotated data. In this paper, we propose UOV, a novel 3D Unsupervised framework assisted by 2D Open-Vocabulary segmentation models. It consists of two stages: In the first stage, we innovatively integrate high-…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 25 pages, 6 figures, codes are available at https://github.com/sbysbysbys/UOV

  12. arXiv:2405.15274  [pdf, other]

    cs.CV cs.HC

    Talk to Parallel LiDARs: A Human-LiDAR Interaction Method Based on 3D Visual Grounding

    Authors: Yuhang Liu, Boyi Sun, Guixu Zheng, Yishuo Wang, Jing Wang, Fei-Yue Wang

    Abstract: LiDAR sensors play a crucial role in various applications, especially in autonomous driving. Current research primarily focuses on optimizing perceptual models with point cloud data as input, while the exploration of deeper cognitive intelligence remains relatively limited. To address this challenge, parallel LiDARs have emerged as a novel theoretical framework for the next-generation intelligent…

    Submitted 24 May, 2024; originally announced May 2024.

  13. arXiv:2405.09859  [pdf, other]

    cs.DS

    Risk-Sensitive Online Algorithms

    Authors: Nicolas Christianson, Bo Sun, Steven Low, Adam Wierman

    Abstract: We study the design of risk-sensitive online algorithms, in which risk measures are used in the competitive analysis of randomized online algorithms. We introduce the CVaR$_δ$-competitive ratio ($δ$-CR) using the conditional value-at-risk of an algorithm's cost, which measures the expectation of the $(1-δ)$-fraction of worst outcomes against the offline optimal cost, and use this measure to study…

    Submitted 24 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2024. Updated with an additional reference and minor edits

  14. arXiv:2405.09131  [pdf, other]

    cs.CV

    RobustMVS: Single Domain Generalized Deep Multi-view Stereo

    Authors: Hongbin Xu, Weitao Chen, Baigui Sun, Xuansong Xie, Wenxiong Kang

    Abstract: Despite the impressive performance of Multi-view Stereo (MVS) approaches given plenty of training samples, the performance degradation when generalizing to unseen domains has not been clearly explored yet. In this work, we focus on the domain generalization problem in MVS. To evaluate the generalization results, we build a novel MVS domain generalization benchmark including synthetic and real-worl…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted to TCSVT. Code will be released at: https://github.com/ToughStoneX/Robust-MVS. Benchmark will be released at: https://github.com/ToughStoneX/MVS_Evaluation_Benchmark

  15. arXiv:2405.00700  [pdf]

    cs.NE cond-mat.str-el

    Oxygen vacancies modulated VO2 for neurons and Spiking Neural Network construction

    Authors: Liang Li, Ting Zhou, Tong Liu, Zhiwei Liu, Yaping Li, Shuo Wu, Shanguang Zhao, Jinglin Zhu, Meiling Liu, Zhihan Lin, Bowen Sun, Jianjun Li, Fangwen Sun, Chongwen Zou

    Abstract: Artificial neuronal devices are the basic building blocks for neuromorphic computing systems, which have been motivated by realistic brain emulation. Aiming for these applications, various device concepts have been proposed to mimic the neuronal dynamics and functions. While till now, the artificial neuron devices with high efficiency, high stability and low power consumption are still far from pr…

    Submitted 16 April, 2024; originally announced May 2024.

    Comments: 18 pages, 4 figures

  16. arXiv:2404.17644  [pdf, other]

    stat.ML cs.AI cs.LG

    A Conditional Independence Test in the Presence of Discretization

    Authors: Boyang Sun, Yu Yao, Huangyuan Hao, Yumou Qiu, Kun Zhang

    Abstract: Testing conditional independence has many applications, such as in Bayesian network learning and causal discovery. Different test methods have been proposed. However, existing methods generally can not work when only discretized observations are available. Specifically, consider $X_1$, $\tilde{X}_2$ and $X_3$ are observed variables, where $\tilde{X}_2$ is a discretization of latent variables…

    Submitted 3 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

  17. arXiv:2404.00878  [pdf, other]

    cs.CV

    TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On

    Authors: Jiazheng Xing, Chao Xu, Yijie Qian, Yang Liu, Guang Dai, Baigui Sun, Yong Liu, Jingdong Wang

    Abstract: Virtual try-on focuses on adjusting the given clothes to fit a specific person seamlessly while avoiding any distortion of the patterns and textures of the garment. However, the clothing identity uncontrollability and training inefficiency of existing diffusion-based methods, which struggle to maintain the identity even with full parameter training, are significant limitations that hinder the wide…

    Submitted 31 March, 2024; originally announced April 2024.

  18. arXiv:2403.19467  [pdf, other]

    cs.CV

    Beyond Talking -- Generating Holistic 3D Human Dyadic Motion for Communication

    Authors: Mingze Sun, Chao Xu, Xinyu Jiang, Yang Liu, Baigui Sun, Ruqi Huang

    Abstract: In this paper, we introduce an innovative task focused on human communication, aiming to generate 3D holistic human motions for both speakers and listeners. Central to our approach is the incorporation of factorization to decouple audio features and the combination of textual semantic information, thereby facilitating the creation of more realistic and coordinated movements. We separately train VQ…

    Submitted 28 March, 2024; originally announced March 2024.

  19. arXiv:2403.10726  [pdf, other]

    cs.DC cs.AI cs.AR

    Strict Partitioning for Sporadic Rigid Gang Tasks

    Authors: Binqi Sun, Tomasz Kloda, Marco Caccamo

    Abstract: The rigid gang task model is based on the idea of executing multiple threads simultaneously on a fixed number of processors to increase efficiency and performance. Although there is extensive literature on global rigid gang scheduling, partitioned approaches have several practical advantages (e.g., task isolation and reduced scheduling overheads). In this paper, we propose a new partitioned schedu…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: to be published in IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2024)

  20. arXiv:2403.08310  [pdf, other]

    cs.CV

    StyleDyRF: Zero-shot 4D Style Transfer for Dynamic Neural Radiance Fields

    Authors: Hongbin Xu, Weitao Chen, Feng Xiao, Baigui Sun, Wenxiong Kang

    Abstract: 4D style transfer aims at transferring arbitrary visual style to the synthesized novel views of a dynamic 4D scene with varying viewpoints and times. Existing efforts on 3D style transfer can effectively combine the visual features of style images and neural radiance fields (NeRF) but fail to handle the 4D dynamic scenes limited by the static scene assumption. Consequently, we aim to handle the no…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: In submission. The code and model are released at: https://github.com/ToughStoneX/StyleDyRF

  21. arXiv:2403.06775  [pdf, other]

    cs.CV

    FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

    Authors: Pengchong Qiao, Lei Shang, Chang Liu, Baigui Sun, Xiangyang Ji, Jie Chen

    Abstract: Subject-driven generation has garnered significant interest recently due to its ability to personalize text-to-image generation. Typical works focus on learning the new subject's private attributes. However, an important fact has not been taken seriously that a subject is not an isolated new concept but should be a specialization of a certain category in the pre-trained model. This results in the…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: accepted by CVPR2024

  22. arXiv:2403.05050  [pdf, other]

    cs.CV cs.AI cs.MM

    DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception

    Authors: Xiang Huang, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Baigui Sun, Xiao Wu

    Abstract: The advancement of autonomous driving systems hinges on the ability to achieve low-latency and high-accuracy perception. To address this critical need, this paper introduces Dynamic Routering Network (DyRoNet), a low-rank enhanced dynamic routing framework designed for streaming perception in autonomous driving systems. DyRoNet integrates a suite of pre-trained branch networks, each meticulously f…

    Submitted 18 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Project: https://tastevision.github.io/DyRoNet/

  23. arXiv:2403.01901  [pdf, other]

    cs.CV

    FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio

    Authors: Chao Xu, Yang Liu, Jiazheng Xing, Weida Wang, Mingze Sun, Jun Dan, Tianxin Huang, Siyuan Li, Zhi-Qi Cheng, Ying Tai, Baigui Sun

    Abstract: In this paper, we abstract the process of people hearing speech, extracting meaningful cues, and creating various dynamically audio-consistent talking faces, termed Listening and Imagining, into the task of high-fidelity diverse talking faces generation from a single audio. Specifically, it involves two critical challenges: one is to effectively decouple identity, content, and emotion from entangl…

    Submitted 31 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  24. arXiv:2402.19085  [pdf, other]

    cs.CL cs.AI eess.SY

    Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment

    Authors: Yiju Guo, Ganqu Cui, Lifan Yuan, Ning Ding, Jiexin Wang, Huimin Chen, Bowen Sun, Ruobing Xie, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun

    Abstract: Alignment in artificial intelligence pursues the consistency between model responses and human preferences as well as values. In practice, the multifaceted nature of human preferences inadvertently introduces what is known as the "alignment tax" - a compromise where enhancements in alignment within one objective (e.g., harmlessness) can diminish performance in others (e.g., helpfulness). However, exi…

    Submitted 29 February, 2024; originally announced February 2024.

  25. arXiv:2402.18117  [pdf, other]

    cs.CV cs.LG

    PRCL: Probabilistic Representation Contrastive Learning for Semi-Supervised Semantic Segmentation

    Authors: Haoyu Xie, Changqi Wang, Jian Zhao, Yang Liu, Jun Dan, Chong Fu, Baigui Sun

    Abstract: Tremendous breakthroughs have been developed in Semi-Supervised Semantic Segmentation (S4) through contrastive learning. However, due to limited annotations, the guidance on unlabeled images is generated by the model itself, which inevitably exists noise and disturbs the unsupervised training process. To address this issue, we propose a robust contrastive-based S4 framework, termed the Probabilist…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 19 pages, 11 figures

  26. arXiv:2402.14012  [pdf, other]

    cs.DS cs.LG

    Chasing Convex Functions with Long-term Constraints

    Authors: Adam Lechowicz, Nicolas Christianson, Bo Sun, Noman Bashir, Mohammad Hajiesmaili, Adam Wierman, Prashant Shenoy

    Abstract: We introduce and study a family of online metric problems with long-term constraints. In these problems, an online player makes decisions $\mathbf{x}_t$ in a metric space $(X,d)$ to simultaneously minimize their hitting cost $f_t(\mathbf{x}_t)$ and switching cost as determined by the metric. Over the time horizon $T$, the player must satisfy a long-term demand constraint…

    Submitted 12 July, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024. 31 pages, 12 figures

  27. arXiv:2402.09240  [pdf, other]

    cs.LG cs.CV

    Switch EMA: A Free Lunch for Better Flatness and Sharpness

    Authors: Siyuan Li, Zicheng Liu, Juanxi Tian, Ge Wang, Zedong Wang, Weiyang Jin, Di Wu, Cheng Tan, Tao Lin, Yang Liu, Baigui Sun, Stan Z. Li

    Abstract: Exponential Moving Average (EMA) is a widely used weight averaging (WA) regularization to learn flat optima for better generalizations without extra cost in deep neural network (DNN) optimization. Despite achieving better flatness, existing WA methods might fall into worse final performances or require extra test-time computations. This work unveils the full potential of EMA with a single line of…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Preprint V1. Source code and models at https://github.com/Westlake-AI/SEMA

  28. arXiv:2402.08562  [pdf, other]

    cs.CL cs.AI

    Higher Layers Need More LoRA Experts

    Authors: Chongyang Gao, Kezhen Chen, Jinmeng Rao, Baochen Sun, Ruibo Liu, Daiyi Peng, Yawen Zhang, Xiaoyuan Guo, Jie Yang, VS Subrahmanian

    Abstract: Parameter-efficient tuning (PEFT) techniques like low-rank adaptation (LoRA) offer training efficiency on Large Language Models, but their impact on model performance remains limited. Recent efforts integrate LoRA and Mixture-of-Experts (MoE) to improve the performance of PEFT methods. Despite promising results, research on improving the efficiency of LoRA with MoE is still in its early stages. Re…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: The code is available at https://github.com/GCYZSL/MoLA

  29. arXiv:2402.02503  [pdf]

    cs.CV cs.CL

    GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering

    Authors: Ziyu Ma, Shutao Li, Bin Sun, Jianfei Cai, Zuxiang Long, Fuyan Ma

    Abstract: Knowledge-based visual question answering (VQA) requires world knowledge beyond the image for accurate answer. Recently, instead of extra knowledge bases, a large language model (LLM) like GPT-3 is activated as an implicit knowledge engine to jointly acquire and reason the necessary knowledge for answering by converting images into textual information (e.g., captions and answer candidates). Howeve…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 17 pages

  30. arXiv:2401.10560  [pdf, other]

    cs.CV

    360ORB-SLAM: A Visual SLAM System for Panoramic Images with Depth Completion Network

    Authors: Yichen Chen, Yiqi Pan, Ruyu Liu, Haoyu Zhang, Guodao Zhang, Bo Sun, Jianhua Zhang

    Abstract: To enhance the performance and effect of AR/VR applications and visual assistance and inspection systems, visual simultaneous localization and mapping (vSLAM) is a fundamental task in computer vision and robotics. However, traditional vSLAM systems are limited by the camera's narrow field-of-view, resulting in challenges such as sparse feature distribution and lack of dense depth information. To o…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 6 pages, 9 figures

  31. arXiv:2401.10480  [pdf, other]

    cs.CL cs.AI

    Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning

    Authors: Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Xinglin Wang, Bin Sun, Heda Wang, Kan Li

    Abstract: Self-consistency (SC) has been a widely used decoding strategy for chain-of-thought reasoning. Despite bringing significant performance improvements across a variety of multi-step reasoning tasks, it is a high-cost method that requires multiple sampling with the preset size. In this paper, we propose a simple and scalable sampling process, \textbf{E}arly-Stopping \textbf{S}elf-\textbf{C}onsistency…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: ICLR 2024

  32. arXiv:2401.00897  [pdf, other]

    cs.CV cs.AI

    Masked Modeling for Self-supervised Representation Learning on Vision and Beyond

    Authors: Siyuan Li, Luyuan Zhang, Zedong Wang, Di Wu, Lirong Wu, Zicheng Liu, Jun Xia, Cheng Tan, Yang Liu, Baigui Sun, Stan Z. Li

    Abstract: As the deep learning revolution marches on, self-supervised learning has garnered increasing attention in recent years thanks to its remarkable representation learning ability and the low dependence on labeled data. Among these varied self-supervised techniques, masked modeling has emerged as a distinctive approach that involves predicting parts of the original data that are proportionally masked…

    Submitted 9 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: Preprint v2 (fix typos and citations). GitHub project at https://github.com/Lupin1998/Awesome-MIM

  33. arXiv:2312.12832  [pdf, other]

    cs.CL cs.AI

    Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data

    Authors: Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Bin Sun, Xinglin Wang, Heda Wang, Kan Li

    Abstract: Large Language Models (LLMs) have performed well on various reasoning tasks, but their inaccessibility and numerous parameters hinder wide application in practice. One promising way is distilling the reasoning ability from LLMs to small models by the generated chain-of-thought reasoning paths. In some cases, however, LLMs may produce incorrect reasoning chains, especially when facing complex mathe…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: AAAI 2024

  34. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  35. arXiv:2312.05486  [pdf, other]

    cs.AI cs.LG math.PR

    FreeFlow: A Comprehensive Understanding on Diffusion Probabilistic Models via Optimal Transport

    Authors: Bowen Sun, Shibao Zheng

    Abstract: The blooming diffusion probabilistic models (DPMs) have garnered significant interest due to their impressive performance and the elegant inspiration they draw from physics. While earlier DPMs relied upon the Markovian assumption, recent methods based on differential equations have been rapidly applied to enhance the efficiency and capabilities of these models. However, a theoretical interpretatio…

    Submitted 9 December, 2023; originally announced December 2023.

  36. arXiv:2312.03373  [pdf, other]

    cs.SE

    EnvGuard: Guaranteeing Environment-Centric Safety and Security Properties in Web of Things

    Authors: Bingkun Sun, Liwei Shen, Jialin Ren, Zhen Dong, Siao Wang, Xin Peng

    Abstract: Web of Things (WoT) technology facilitates the standardized integration of IoT devices ubiquitously deployed in daily environments, promoting diverse WoT applications to automatically sense and regulate the environment. In WoT environment, heterogeneous applications, user activities, and environment changes collectively influence device behaviors, posing risks of unexpected violations of safety an…

    Submitted 16 December, 2023; v1 submitted 6 December, 2023; originally announced December 2023.

  37. arXiv:2312.01367  [pdf, other]

    cs.CV cs.AI

    DiFace: Cross-Modal Face Recognition through Controlled Diffusion

    Authors: Bowen Sun, Shibao Zheng

    Abstract: Diffusion probabilistic models (DPMs) have exhibited exceptional proficiency in generating visual media of outstanding quality and realism. Nonetheless, their potential in non-generative domains, such as face recognition, has yet to be thoroughly investigated. Meanwhile, despite the extensive development of multi-modal face recognition methods, their emphasis has predominantly centered on visual m…

    Submitted 3 December, 2023; originally announced December 2023.

  38. arXiv:2311.15660  [pdf, other]

    cs.CV

    Technical Report for Argoverse Challenges on 4D Occupancy Forecasting

    Authors: Pengfei Zheng, Kanokphan Lertniphonphan, Feng Chen, Siwei Chen, Bingchuan Sun, Jun Xie, Zhepeng Wang

    Abstract: This report presents our Le3DE2E_Occ solution for 4D Occupancy Forecasting in Argoverse Challenges at CVPR 2023 Workshop on Autonomous Driving (WAD). Our solution consists of a strong LiDAR-based Bird's Eye View (BEV) encoder with temporal fusion and a two-stage decoder, which combines a DETR head and a UNet decoder. The solution was tested on the Argoverse 2 sensor dataset to evaluate the occupan…

    Submitted 27 November, 2023; originally announced November 2023.

  39. arXiv:2311.12291  [pdf]

    cs.CV

    Instance-aware 3D Semantic Segmentation powered by Shape Generators and Classifiers

    Authors: Bo Sun, Qixing Huang, Xiangru Huang

    Abstract: Existing 3D semantic segmentation methods rely on point-wise or voxel-wise feature descriptors to output segmentation predictions. However, these descriptors are often supervised at point or voxel level, leading to segmentation models that can behave poorly at instance-level. In this paper, we proposed a novel instance-aware approach for 3D semantic segmentation. Our method combines several geomet…

    Submitted 20 November, 2023; originally announced November 2023.

  40. arXiv:2311.10794  [pdf, other]

    cs.CV

    Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression

    Authors: Animesh Sinha, Bo Sun, Anmol Kalia, Arantxa Casanova, Elliot Blanchard, David Yan, Winnie Zhang, Tony Nelli, Jiahui Chen, Hardik Shah, Licheng Yu, Mitesh Kumar Singh, Ankit Ramchandani, Maziar Sanjabi, Sonal Gupta, Amy Bearman, Dhruv Mahajan

    Abstract: We introduce Style Tailoring, a recipe to finetune Latent Diffusion Models (LDMs) in a distinct domain with high visual quality, prompt alignment and scene diversity. We choose sticker image generation as the target domain, as the images significantly differ from photorealistic samples typically generated by large-scale LDMs. We start with a competent text-to-image model, like Emu, and show that r…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 10 pages, 5 figures

  41. arXiv:2310.20598  [pdf, other]

    cs.DS cs.LG

    Online Conversion with Switching Costs: Robust and Learning-Augmented Algorithms

    Authors: Adam Lechowicz, Nicolas Christianson, Bo Sun, Noman Bashir, Mohammad Hajiesmaili, Adam Wierman, Prashant Shenoy

    Abstract: We introduce and study online conversion with switching costs, a family of online problems that capture emerging problems at the intersection of energy and sustainability. In this problem, an online player attempts to purchase (alternatively, sell) fractional shares of an asset during a fixed time horizon with length $T$. At each time step, a cost function (alternatively, price function) is reveal…

    Submitted 13 January, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: Accepted to SIGMETRICS / Performance '24. 47 pages, 9 figures

  42. arXiv:2310.17139  [pdf, other]

    cs.LG

    Understanding and Addressing the Pitfalls of Bisimulation-based Representations in Offline Reinforcement Learning

    Authors: Hongyu Zang, Xin Li, Leiji Zhang, Yang Liu, Baigui Sun, Riashat Islam, Remi Tachet des Combes, Romain Laroche

    Abstract: While bisimulation-based approaches hold promise for learning robust state representations for Reinforcement Learning (RL) tasks, their efficacy in offline RL tasks has not been up to par. In some instances, their performance has even significantly underperformed alternative methods. We aim to understand why bisimulation methods succeed in online settings, but falter in offline tasks. Our analysis…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  43. arXiv:2310.16466  [pdf, other]

    cs.LG stat.ME

    Learning Continuous Network Emerging Dynamics from Scarce Observations via Data-Adaptive Stochastic Processes

    Authors: Jiaxu Cui, Bingyi Sun, Jiming Liu, Bo Yang

    Abstract: Learning network dynamics from the empirical structure and spatio-temporal observation data is crucial to revealing the interaction mechanisms of complex networks in a wide range of domains. However, most existing methods only aim at learning network dynamic behaviors generated by a specific ordinary differential equation instance, resulting in ineffectiveness for new ones, and generally require d…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: preprint

  44. arXiv:2310.11558  [pdf, other]

    cs.LG cs.DS

    Online Algorithms with Uncertainty-Quantified Predictions

    Authors: Bo Sun, Jerry Huang, Nicolas Christianson, Mohammad Hajiesmaili, Adam Wierman, Raouf Boutaba

    Abstract: The burgeoning field of algorithms with predictions studies the problem of using possibly imperfect machine learning predictions to improve online algorithm performance. While nearly all existing algorithms in this framework make no assumptions on prediction quality, a number of methods providing uncertainty quantification (UQ) on machine learning models have been developed in recent years, which…

    Submitted 3 June, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

  45. arXiv:2310.10967  [pdf, other]

    cs.CL cs.AI cs.HC

    EXMODD: An EXplanatory Multimodal Open-Domain Dialogue dataset

    Authors: Hang Yin, Pinren Lu, Ziang Li, Bin Sun, Kan Li

    Abstract: The need for high-quality data has been a key issue hindering the research of dialogue tasks. Recent studies try to build datasets through manual, web crawling, and large pre-trained models. However, man-made data is expensive and data collected from the internet often includes generic responses, meaningless statements, and toxic dialogues. Automatic data generation through large models is a cost-…

    Submitted 16 October, 2023; originally announced October 2023.

  46. arXiv:2310.04723  [pdf, other]

    cs.LG stat.ML

    Subspace Identification for Multi-Source Domain Adaptation

    Authors: Zijian Li, Ruichu Cai, Guangyi Chen, Boyang Sun, Zhifeng Hao, Kun Zhang

    Abstract: Multi-source domain adaptation (MSDA) methods aim to transfer knowledge from multiple labeled source domains to an unlabeled target domain. Although current methods achieve target joint distribution identifiability by enforcing minimal changes across domains, they often necessitate stringent conditions, such as an adequate number of domains, monotonic transformation of latent variables, and invari…

    Submitted 14 December, 2023; v1 submitted 7 October, 2023; originally announced October 2023.

    Comments: NeurIPS2023 Spotlight

  47. arXiv:2310.02959  [pdf, other]

    cs.AR cs.DC cs.OS

    Co-Optimizing Cache Partitioning and Multi-Core Task Scheduling: Exploit Cache Sensitivity or Not?

    Authors: Binqi Sun, Debayan Roy, Tomasz Kloda, Andrea Bastoni, Rodolfo Pellizzoni, Marco Caccamo

    Abstract: Cache partitioning techniques have been successfully adopted to mitigate interference among concurrently executing real-time tasks on multi-core processors. Considering that the execution time of a cache-sensitive task strongly depends on the cache available for it to use, co-optimizing cache partitioning and task allocation improves the system's schedulability. In this paper, we propose a hybrid…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: to be published in IEEE Real-Time Systems Symposium (RTSS), 2023

  48. arXiv:2310.02779  [pdf, other]

    cs.LG cs.GT

    Expected flow networks in stochastic environments and two-player zero-sum games

    Authors: Marco Jiralerspong, Bilun Sun, Danilo Vucetic, Tianyu Zhang, Yoshua Bengio, Gauthier Gidel, Nikolay Malkin

    Abstract: Generative flow networks (GFlowNets) are sequential sampling models trained to match a given distribution. GFlowNets have been successfully applied to various structured object generation tasks, sampling a diverse set of high-reward objects quickly. We propose expected flow networks (EFlowNets), which extend GFlowNets to stochastic environments. We show that EFlowNets outperform other GFlowNet for…

    Submitted 13 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: ICLR 2024; code: https://github.com/GFNOrg/AdversarialFlowNetworks

  49. arXiv:2310.02650  [pdf, other]

    cs.RO cs.CV

    Active Visual Localization for Multi-Agent Collaboration: A Data-Driven Approach

    Authors: Matthew Hanlon, Boyang Sun, Marc Pollefeys, Hermann Blum

    Abstract: Rather than having each newly deployed robot create its own map of its surroundings, the growing availability of SLAM-enabled devices provides the option of simply localizing in a map of another robot or device. In cases such as multi-robot or human-robot collaboration, localizing all agents in the same map is even necessary. However, localizing e.g. a ground robot in the map of a drone or head-mo…

    Submitted 8 May, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

  50. arXiv:2310.02392  [pdf, other]

    cs.RO

    A 3D Mixed Reality Interface for Human-Robot Teaming

    Authors: Jiaqi Chen, Boyang Sun, Marc Pollefeys, Hermann Blum

    Abstract: This paper presents a mixed-reality human-robot teaming system. It allows human operators to see in real-time where robots are located, even if they are not in line of sight. The operator can also visualize the map that the robots create of their environment and can easily send robots to new goal positions. The system mainly consists of a mapping and a control module. The mapping module is a real-…

    Submitted 3 October, 2023; originally announced October 2023.