Showing 1–50 of 1,484 results for author: Zhao, H

  1. arXiv:2407.13771  [pdf, other]

    cs.CV

    Training-Free Model Merging for Multi-target Domain Adaptation

    Authors: Wenyi Li, Huan-ang Gao, Mingju Gao, Beiwen Tian, Rong Zhi, Hao Zhao

    Abstract: In this paper, we study multi-target domain adaptation of scene understanding models. While previous methods achieved commendable results through inter-domain consistency losses, they often assumed unrealistic simultaneous access to images from all target domains, overlooking constraints such as data transfer bandwidth limitations and data privacy concerns. Given these challenges, we pose the ques…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  2. arXiv:2407.13752  [pdf, other]

    cs.CV

    LogoSticker: Inserting Logos into Diffusion Models for Customized Generation

    Authors: Mingkang Zhu, Xi Chen, Zhongdao Wang, Hengshuang Zhao, Jiaya Jia

    Abstract: Recent advances in text-to-image model customization have underscored the importance of integrating new concepts with a few examples. Yet, these progresses are largely confined to widely recognized subjects, which can be learned with relative ease through models' adequate shared prior knowledge. In contrast, logos, characterized by unique patterns and textual elements, are hard to establish shared…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  3. arXiv:2407.13338  [pdf, other]

    cs.CV

    Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM

    Authors: Baicheng Li, Zike Yan, Dong Wu, Hanqing Jiang, Hongbin Zha

    Abstract: Simultaneous localization and mapping (SLAM) with implicit neural representations has received extensive attention due to the expressive representation power and the innovative paradigm of continual learning. However, deploying such a system within a dynamic environment has not been well-studied. Such challenges are intractable even for conventional algorithms since observations from different vie…

    Submitted 18 July, 2024; originally announced July 2024.

  4. arXiv:2407.11895  [pdf, other]

    cs.CV

    OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

    Authors: Zehan Wang, Ziang Zhang, Hang Zhang, Luping Liu, Rongjie Huang, Xize Cheng, Hengshuang Zhao, Zhou Zhao

    Abstract: Recently, human-computer interaction with various modalities has shown promising applications, like GPT-4o and Gemini. Given the foundational role of multimodal joint representation in understanding and generation pipelines, high-quality omni joint representations would be a step toward co-processing more diverse multimodal information. In this work, we present OmniBind, large-scale multimodal joi…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Homepage is http://omnibind.github.io

  5. arXiv:2407.11682  [pdf, other]

    cs.CV

    MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation

    Authors: Xiaoshuai Hao, Ruikai Li, Hui Zhang, Dingzhe Li, Rong Yin, Sangil Jung, Seung-In Park, ByungIn Yoo, Haimei Zhao, Jing Zhang

    Abstract: Online high-definition (HD) map construction is an important and challenging task in autonomous driving. Recently, there has been a growing interest in cost-effective multi-view camera-based methods without relying on other sensors like LiDAR. However, these methods suffer from a lack of explicit depth information, necessitating the use of large models to achieve satisfactory performance. To addre…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  6. arXiv:2407.11478  [pdf, other]

    cs.RO

    Trajectory Optimization under Contact Timing Uncertainties

    Authors: Haizhou Zhao, Majid Khadiv

    Abstract: Most interesting problems in robotics (e.g., locomotion and manipulation) are realized through intermittent contact with the environment. Due to the perception and modeling errors, assuming an exact time for establishing contact with the environment is unrealistic. On the other hand, handling uncertainties in contact timing is notoriously difficult as it gives rise to either handling uncertain com…

    Submitted 16 July, 2024; originally announced July 2024.

  7. arXiv:2407.10999  [pdf, other]

    cs.CL cs.AI

    TALEC: Teach Your LLM to Evaluate in Specific Domain with In-house Criteria by Criteria Division and Zero-shot Plus Few-shot

    Authors: Kaiqi Zhang, Shuai Yuan, Honghan Zhao

    Abstract: With the rapid development of large language models (LLM), the evaluation of LLM becomes increasingly important. Measuring text generation tasks such as summarization and article creation is very difficult. Especially in specific application domains (e.g., to-business or to-customer service), in-house evaluation criteria have to meet not only general standards (correctness, helpfulness and creativ…

    Submitted 25 June, 2024; originally announced July 2024.

  8. arXiv:2407.10701  [pdf, other]

    cs.CL

    DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems

    Authors: Anni Zou, Wenhao Yu, Hongming Zhang, Kaixin Ma, Deng Cai, Zhuosheng Zhang, Hai Zhao, Dong Yu

    Abstract: Recently, there has been a growing interest among large language model (LLM) developers in LLM-based document reading systems, which enable users to upload their own documents and pose questions related to the document contents, going beyond simple reading comprehension tasks. Consequently, these systems have been carefully designed to tackle challenges such as file parsing, metadata extraction, m…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Work in progress

  9. arXiv:2407.08706  [pdf, other]

    cs.CV

    HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

    Authors: Runhui Huang, Xinpeng Ding, Chunwei Wang, Jianhua Han, Yulong Liu, Hengshuang Zhao, Hang Xu, Lu Hou, Wei Zhang, Xiaodan Liang

    Abstract: High-resolution inputs enable Large Vision-Language Models (LVLMs) to discern finer visual details, enhancing their comprehension capabilities. To reduce the training and computation costs caused by high-resolution input, one promising direction is to use sliding windows to slice the input into uniform patches, each matching the input size of the well-trained vision encoder. Although efficient, th…

    Submitted 11 July, 2024; originally announced July 2024.

  10. arXiv:2407.08584  [pdf, other]

    cs.DC

    Data-Locality-Aware Task Assignment and Scheduling for Distributed Job Executions

    Authors: Hailiang Zhao, Xueyan Tang, Peng Chen, Jianwei Yin, Shuiguang Deng

    Abstract: This paper investigates a data-locality-aware task assignment and scheduling problem aimed at minimizing job completion times for distributed job executions. Without prior knowledge of future job arrivals, we propose an optimal balanced task assignment algorithm (OBTA) that minimizes the completion time of each arriving job. We significantly reduce OBTA's computational overhead by narrowing the se…

    Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  11. arXiv:2407.08268  [pdf, other]

    cs.CV

    Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

    Authors: Tong Shao, Zhuotao Tian, Hang Zhao, Jingyong Su

    Abstract: CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite its success, its application to OVSS faces challenges due to its initial image-level alignment training, which affects its performance in tasks requiring detailed local context. Our study delves into the impact of CLIP's [CLS] token on patch feature cor…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: ECCV24 accepted

  12. arXiv:2407.07930  [pdf]

    q-bio.BM cs.LG

    Token-Mol 1.0: Tokenized drug design with large language model

    Authors: Jike Wang, Rui Qin, Mingyang Wang, Meijing Fang, Yangyang Zhang, Yuchen Zhu, Qun Su, Qiaolin Gou, Chao Shen, Odin Zhang, Zhenxing Wu, Dejun Jiang, Xujun Zhang, Huifeng Zhao, Xiaozhe Wan, Zhourui Wu, Liwei Liu, Yu Kang, Chang-Yu Hsieh, Tingjun Hou

    Abstract: Significant interests have recently risen in leveraging sequence-based large language models (LLMs) for drug design. However, most current applications of LLMs in drug discovery lack the ability to comprehend three-dimensional (3D) structures, thereby limiting their effectiveness in tasks that explicitly involve molecular conformations. In this study, we introduced Token-Mol, a token-only 3D drug…

    Submitted 10 July, 2024; originally announced July 2024.

  13. arXiv:2407.07791  [pdf, other]

    cs.CL

    Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities

    Authors: Tianjie Ju, Yiting Wang, Xinbei Ma, Pengzhou Cheng, Haodong Zhao, Yulong Wang, Lifeng Liu, Jian Xie, Zhuosheng Zhang, Gongshen Liu

    Abstract: The rapid adoption of large language models (LLMs) in multi-agent systems has highlighted their impressive capabilities in various applications, such as collaborative problem-solving and autonomous negotiation. However, the security implications of these LLM-based multi-agent systems have not been thoroughly investigated, particularly concerning the spread of manipulated knowledge. In this paper,…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 18 Pages, working in progress

  14. arXiv:2407.07078  [pdf, other]

    cs.CV

    MoSt-DSA: Modeling Motion and Structural Interactions for Direct Multi-Frame Interpolation in DSA Images

    Authors: Ziyang Xu, Huangxuan Zhao, Ziwei Cui, Wenyu Liu, Chuansheng Zheng, Xinggang Wang

    Abstract: Artificial intelligence has become a crucial tool for medical image analysis. As an advanced cerebral angiography technique, Digital Subtraction Angiography (DSA) poses a challenge where the radiation dose to humans is proportional to the image count. By reducing images and using AI interpolation instead, the radiation can be cut significantly. However, DSA images present more complex motion and s…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted to ECAI2024

  15. arXiv:2407.06904  [pdf, other]

    cs.AI

    Hypergraph based Understanding for Document Semantic Entity Recognition

    Authors: Qiwei Li, Zuchao Li, Ping Wang, Haojun Ai, Hai Zhao

    Abstract: Semantic entity recognition is an important task in the field of visually-rich document understanding. It distinguishes the semantic types of text by analyzing the position relationship between text nodes and the relation between text content. The existing document understanding models mainly focus on entity categories while ignoring the extraction of entity boundaries. We build a novel hypergraph…

    Submitted 9 July, 2024; originally announced July 2024.

  16. arXiv:2407.06250  [pdf, other]

    cs.CV

    FairDiff: Fair Segmentation with Point-Image Diffusion

    Authors: Wenyi Li, Haoran Xu, Guiyu Zhang, Huan-ang Gao, Mingju Gao, Mengyu Wang, Hao Zhao

    Abstract: Fairness is an important topic for medical image analysis, driven by the challenge of unbalanced training data among diverse target groups and the societal demand for equitable medical quality. In response to this issue, our research adopts a data-driven strategy-enhancing data balance by integrating synthetic images. However, in terms of generating synthetic images, previous works either lack pai…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  17. arXiv:2407.06191  [pdf, other]

    cs.CV

    Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images

    Authors: Zhangyang Qi, Yunhan Yang, Mengchen Zhang, Long Xing, Xiaoyang Wu, Tong Wu, Dahua Lin, Xihui Liu, Jiaqi Wang, Hengshuang Zhao

    Abstract: Recent advances in 3D AIGC have shown promise in directly creating 3D objects from text and images, offering significant cost savings in animation and product design. However, detailed edit and customization of 3D assets remains a long-standing challenge. Specifically, 3D Generation methods lack the ability to follow finely detailed instructions as precisely as their 2D image creation counterparts…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Project Page: https://tailor3d-2024.github.io/

  18. arXiv:2407.05365  [pdf, other]

    cs.AI

    ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models

    Authors: Xiyuan Zhou, Huan Zhao, Yuheng Cheng, Yuji Cao, Gaoqi Liang, Guolong Liu, Junhua Zhao

    Abstract: In response to the urgent demand for grid stability and the complex challenges posed by renewable energy integration and electricity market dynamics, the power sector increasingly seeks innovative technological solutions. In this context, large language models (LLMs) have become a key technology to improve efficiency and promote intelligent progress in the power sector with their excellent natural…

    Submitted 7 July, 2024; originally announced July 2024.

  19. arXiv:2407.05364  [pdf, other]

    cs.LG

    PTaRL: Prototype-based Tabular Representation Learning via Space Calibration

    Authors: Hangting Ye, Wei Fan, Xiaozhuang Song, Shun Zheng, He Zhao, Dandan Guo, Yi Chang

    Abstract: Tabular data have been playing a mostly important role in diverse real-world fields, such as healthcare, engineering, finance, etc. With the recent success of deep learning, many tabular machine learning (ML) methods based on deep networks (e.g., Transformer, ResNet) have achieved competitive performance on tabular benchmarks. However, existing deep tabular ML methods suffer from the representatio…

    Submitted 15 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ICLR 2024

  20. arXiv:2407.05342  [pdf, other]

    cs.CV

    Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models

    Authors: Longxiang Tang, Zhuotao Tian, Kai Li, Chunming He, Hantao Zhou, Hengshuang Zhao, Xiu Li, Jiaya Jia

    Abstract: This study addresses the Domain-Class Incremental Learning problem, a realistic but challenging continual learning scenario where both the domain distribution and target classes vary across tasks. To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability. However, this incurs a new problem: the knowledge encoded in the pre-trained VLM…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  21. arXiv:2407.05282  [pdf, other]

    cs.CV

    UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

    Authors: Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang

    Abstract: This paper presents UltraEdit, a large-scale (approximately 4 million editing samples), automatically generated dataset for instruction-based image editing. Our key idea is to address the drawbacks in existing image editing datasets like InstructPix2Pix and MagicBrush, and provide a systematic approach to producing massive and high-quality image editing samples. UltraEdit offers several distinct a…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 32 pages, 14 figures

  22. arXiv:2407.04068  [pdf, other]

    cs.CV

    CLIP-DR: Textual Knowledge-Guided Diabetic Retinopathy Grading with Ranking-aware Prompting

    Authors: Qinkai Yu, Jianyang Xie, Anh Nguyen, He Zhao, Jiong Zhang, Huazhu Fu, Yitian Zhao, Yalin Zheng, Yanda Meng

    Abstract: Diabetic retinopathy (DR) is a complication of diabetes and usually takes decades to reach sight-threatening levels. Accurate and robust detection of DR severity is critical for the timely management and treatment of diabetes. However, most current DR grading methods suffer from insufficient robustness to data variability (e.g. colour fundus images), posing a significant difficulty for ac…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI 2024

  23. arXiv:2407.03813  [pdf, other]

    cs.CV

    PECTP: Parameter-Efficient Cross-Task Prompts for Incremental Vision Transformer

    Authors: Qian Feng, Hanbin Zhao, Chao Zhang, Jiahua Dong, Henghui Ding, Yu-Gang Jiang, Hui Qian

    Abstract: Incremental Learning (IL) aims to learn deep models on sequential tasks continually, where each new task includes a batch of new classes and deep models have no access to task-ID information at the inference time. Recent vast pre-trained models (PTMs) have achieved outstanding performance by prompt technique in practical IL without the old samples (rehearsal-free) and with a memory constraint (mem…

    Submitted 4 July, 2024; originally announced July 2024.

  24. arXiv:2407.03165  [pdf, other]

    cs.CV cs.GR

    Consistent Point Orientation for Manifold Surfaces via Boundary Integration

    Authors: Weizhou Liu, Xingce Wang, Haichuan Zhao, Xingfei Xue, Zhongke Wu, Xuequan Lu, Ying He

    Abstract: This paper introduces a new approach for generating globally consistent normals for point clouds sampled from manifold surfaces. Given that the generalized winding number (GWN) field generated by a point cloud with globally consistent normals is a solution to a PDE with jump boundary conditions and possesses harmonic properties, and the Dirichlet energy of the GWN field can be defined as an integr…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: accepted in siggraph2024

  25. arXiv:2407.02833  [pdf, other]

    cs.IR cs.CL cs.LG

    LANE: Logic Alignment of Non-tuning Large Language Models and Online Recommendation Systems for Explainable Reason Generation

    Authors: Hongke Zhao, Songming Zheng, Likang Wu, Bowen Yu, Jing Wang

    Abstract: The explainability of recommendation systems is crucial for enhancing user trust and satisfaction. Leveraging large language models (LLMs) offers new opportunities for comprehensive recommendation logic generation. However, in existing related studies, fine-tuning LLM models for recommendation tasks incurs high computational costs and alignment issues with existing systems, limiting the applicatio…

    Submitted 3 July, 2024; originally announced July 2024.

  26. arXiv:2407.01863  [pdf, other]

    cs.CL

    VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs

    Authors: Qiucheng Wu, Handong Zhao, Michael Saxon, Trung Bui, William Yang Wang, Yang Zhang, Shiyu Chang

    Abstract: Vision language models (VLMs) are an exciting emerging class of language models (LMs) that have merged classic LM capabilities with those of image processing systems. However, the ways that these capabilities combine are not always intuitive and warrant direct investigation. One understudied capability in VLMs is visual spatial planning -- the ability to comprehend the spatial arrangements of obje…

    Submitted 1 July, 2024; originally announced July 2024.

  27. arXiv:2407.01812  [pdf, other]

    cs.RO cs.LG

    Equivariant Diffusion Policy

    Authors: Dian Wang, Stephen Hart, David Surovik, Tarik Kelestemur, Haojie Huang, Haibo Zhao, Mark Yeatman, Jiuguang Wang, Robin Walters, Robert Platt

    Abstract: Recent work has shown diffusion models are an effective approach to learning the multimodal distributions arising from demonstration data in behavior cloning. However, a drawback of this approach is the need to learn a denoising function, which is significantly more complex than learning an explicit policy. In this work, we propose Equivariant Diffusion Policy, a novel diffusion policy learning me…

    Submitted 1 July, 2024; originally announced July 2024.

  28. arXiv:2407.01320  [pdf, other]

    cs.LG cs.AI cs.CL

    Increasing Model Capacity for Free: A Simple Strategy for Parameter Efficient Fine-tuning

    Authors: Haobo Song, Hao Zhao, Soumajit Majumder, Tao Lin

    Abstract: Fine-tuning large pre-trained foundation models, such as the 175B GPT-3, has attracted more attention for downstream tasks recently. While parameter-efficient fine-tuning methods have been proposed and proven effective without retraining all model parameters, their performance is limited by the capacity of incremental modules, especially under constrained parameter budgets. To overcome this cha…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted at ICLR 2024. Code at https://github.com/LINs-lab/CapaBoost

  29. arXiv:2407.01239  [pdf, other]

    cs.CV cs.AI

    SGCCNet: Single-Stage 3D Object Detector With Saliency-Guided Data Augmentation and Confidence Correction Mechanism

    Authors: Ao Liang, Wenyu Chen, Jian Fang, Huaici Zhao

    Abstract: The single-stage point-based 3D object detectors have attracted widespread research interest due to their advantages of lightweight and fast inference speed. However, they still face challenges such as inadequate learning of low-quality objects (ILQ) and misalignment between localization accuracy and classification confidence (MLC). In this paper, we propose SGCCNet to alleviate these two issues.…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 16 pages, 16 figures

  30. arXiv:2407.00808  [pdf]

    eess.SY cs.AI

    Exploring a Physics-Informed Decision Transformer for Distribution System Restoration: Methodology and Performance Analysis

    Authors: Hong Zhao, Jin Wei-Kocsis, Adel Heidari Akhijahani, Karen L Butler-Purry

    Abstract: Driven by advancements in sensing and computing, deep reinforcement learning (DRL)-based methods have demonstrated significant potential in effectively tackling distribution system restoration (DSR) challenges under uncertain operational scenarios. However, the data-intensive nature of DRL poses obstacles in achieving satisfactory DSR solutions for large-scale, complex distribution systems. Inspir…

    Submitted 30 June, 2024; originally announced July 2024.

  31. arXiv:2407.00765  [pdf, other]

    cs.LG cs.NE math.NA stat.ML

    Structured and Balanced Multi-component and Multi-layer Neural Networks

    Authors: Shijun Zhang, Hongkai Zhao, Yimin Zhong, Haomin Zhou

    Abstract: In this work, we propose a balanced multi-component and multi-layer neural network (MMNN) structure to approximate functions with complex features with both accuracy and efficiency in terms of degrees of freedom and computation cost. The main idea is motivated by a multi-component, each of which can be approximated effectively by a single-layer network, and multi-layer decomposition in a "divide-a…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Our codes and implementation details are available at https://github.com/ShijunZhangMath/MMNN

  32. arXiv:2407.00468  [pdf, other]

    cs.CV cs.AI cs.CL

    MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

    Authors: Jinsheng Huang, Liang Chen, Taian Guo, Fu Zeng, Yusheng Zhao, Bohan Wu, Ye Yuan, Haozhe Zhao, Zhihui Guo, Yichi Zhang, Jingyang Yuan, Wei Ju, Luchen Liu, Tianyu Liu, Baobao Chang, Ming Zhang

    Abstract: Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for such evaluations suffer from systematic biases. Remarkably, Large Language Models (LLMs) without any visual perception capabilities achieve non-trivial p…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 21 pages, code released at https://github.com/chenllliang/MMEvalPro, Homepage at https://mmevalpro.github.io/

  33. arXiv:2406.19705  [pdf, other]

    cs.AI

    DISCO: Efficient Diffusion Solver for Large-Scale Combinatorial Optimization Problems

    Authors: Kexiong Yu, Hang Zhao, Yuhang Huang, Renjiao Yi, Kai Xu, Chenyang Zhu

    Abstract: Combinatorial Optimization (CO) problems are fundamentally crucial in numerous practical applications across diverse industries, characterized by entailing enormous solution space and demanding time-sensitive response. Despite significant advancements made by recent neural solvers, their limited expressiveness does not conform well to the multi-modal nature of CO landscapes. While some research ha…

    Submitted 4 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

  34. arXiv:2406.18533  [pdf, other]

    cs.CV

    On Scaling Up 3D Gaussian Splatting Training

    Authors: Hexu Zhao, Haoyang Weng, Daohan Lu, Ang Li, Jinyang Li, Aurojit Panda, Saining Xie

    Abstract: 3D Gaussian Splatting (3DGS) is increasingly popular for 3D reconstruction due to its superior visual quality and rendering speed. However, 3DGS training currently occurs on a single GPU, limiting its ability to handle high-resolution and large-scale 3D reconstruction tasks due to memory constraints. We introduce Grendel, a distributed system designed to partition 3DGS parameters and parallelize c…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/nyu-systems/Grendel-GS ; Project page: https://daohanlu.github.io/scaling-up-3dgs

    ACM Class: I.4.5

  35. CAT: Interpretable Concept-based Taylor Additive Models

    Authors: Viet Duong, Qiong Wu, Zhengyi Zhou, Hongjue Zhao, Chenxiang Luo, Eric Zavesky, Huaxiu Yao, Huajie Shao

    Abstract: As an emerging interpretable technique, Generalized Additive Models (GAMs) adopt neural networks to individually learn non-linear functions for each feature, which are then combined through a linear model for final predictions. Although GAMs can explain deep neural networks (DNNs) at the feature level, they require large numbers of model parameters and are prone to overfitting, making them hard to…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  36. Performative Debias with Fair-exposure Optimization Driven by Strategic Agents in Recommender Systems

    Authors: Zhichen Xiang, Hongke Zhao, Chuang Zhao, Ming He, Jianping Fan

    Abstract: Data bias, e.g., popularity impairs the dynamics of two-sided markets within recommender systems. This overshadows the less visible but potentially intriguing long-tail items that could capture user interest. Despite the abundance of research surrounding this issue, it still poses challenges and remains a hot topic in academic circles. Along this line, in this paper, we developed a re-ranking appr…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: SIGKDD 2024 accepted paper

  37. arXiv:2406.17286  [pdf]

    cs.RO eess.SY

    Prioritized experience replay-based DDQN for Unmanned Vehicle Path Planning

    Authors: Liu Lipeng, Letian Xu, Jiabei Liu, Haopeng Zhao, Tongzhou Jiang, Tianyao Zheng

    Abstract: Path planning module is a key module for autonomous vehicle navigation, which directly affects its operating efficiency and safety. In complex environments with many obstacles, traditional planning algorithms often cannot meet the needs of intelligence, which may lead to problems such as dead zones in unmanned vehicles. This paper proposes a path planning algorithm based on DDQN and combines it wi…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 4 pages, 6 figures, 2024 5th International Conference on Information Science, Parallel and Distributed Systems

  38. arXiv:2406.16878  [pdf, ps, other]

    eess.SP cs.AI cs.IT

    Benchmarking Semantic Communications for Image Transmission Over MIMO Interference Channels

    Authors: Yanhu Wang, Shuaishuai Guo, Anming Dong, Hui Zhao

    Abstract: Semantic communications offer promising prospects for enhancing data transmission efficiency. However, existing schemes have predominantly concentrated on point-to-point transmissions. In this paper, we aim to investigate the validity of this claim in interference scenarios compared to baseline approaches. Specifically, our focus is on general multiple-input multiple-output (MIMO) interference cha…

    Submitted 10 April, 2024; originally announced June 2024.

  39. arXiv:2406.16722  [pdf, other]

    cs.CL

    Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba

    Authors: Yuchen Zou, Yineng Chen, Zuchao Li, Lefei Zhang, Hai Zhao

    Abstract: Transformer, a deep neural network architecture, has long dominated the field of natural language processing and beyond. Nevertheless, the recent introduction of Mamba challenges its supremacy, sparks considerable interest among researchers, and gives rise to a series of Mamba-based models that have exhibited notable potential. This survey paper orchestrates a comprehensive discussion, diving into…

    Submitted 24 June, 2024; originally announced June 2024.

  40. arXiv:2406.16494  [pdf, other]

    cs.IR cs.AI

    Cross-domain Transfer of Valence Preferences via a Meta-optimization Approach

    Authors: Chuang Zhao, Hongke Zhao, Ming He, Xiaomeng Li, Jianping Fan

    Abstract: Cross-domain recommendation offers a potential avenue for alleviating data sparsity and cold-start problems. Embedding and mapping, as a classic cross-domain research genre, aims to identify a common mapping function to perform representation transformation between two domains. Nevertheless, previous coarse-grained preference representations, non-personalized mapping functions, and excessive relia…

    Submitted 24 June, 2024; originally announced June 2024.

  41. arXiv:2406.15885  [pdf, other]

    cs.SD cs.AI eess.AS

    The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models

    Authors: Jiajia Li, Lu Yang, Mingni Tang, Cong Chen, Zuchao Li, Ping Wang, Hai Zhao

    Abstract: Benchmark plays a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-rel…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL-Findings 2024

  42. arXiv:2406.15781  [pdf, other]

    cs.CL

    DABL: Detecting Semantic Anomalies in Business Processes Using Large Language Models

    Authors: Wei Guan, Jian Cao, Jianqi Gao, Haiyan Zhao, Shiyou Qian

    Abstract: Detecting anomalies in business processes is crucial for ensuring operational success. While many existing methods rely on statistical frequency to detect anomalies, it's important to note that infrequent behavior doesn't necessarily imply undesirability. To address this challenge, detecting anomalies from a semantic viewpoint proves to be a more effective approach. However, current semantic anoma…

    Submitted 22 June, 2024; originally announced June 2024.

  43. arXiv:2406.15534  [pdf, other]

    cs.LG cs.AI cs.CL q-bio.QM

    Geneverse: A collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research

    Authors: Tianyu Liu, Yijia Xiao, Xiao Luo, Hua Xu, W. Jim Zheng, Hongyu Zhao

    Abstract: The applications of large language models (LLMs) are promising for biomedical and healthcare research. Despite the availability of open-source LLMs trained using a wide range of biomedical data, current research on the applications of LLMs to genomics and proteomics is still limited. To fill this gap, we propose a collection of finetuned LLMs and multimodal LLMs (MLLMs), known as Geneverse, for th…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 8 pages

  44. arXiv:2406.15504

    cs.CL cs.LG

    Dr.E Bridges Graphs with Large Language Models through Words

    Authors: Zipeng Liu, Likang Wu, Ming He, Zhong Guan, Hongke Zhao, Nan Feng

    Abstract: Significant efforts have been directed toward integrating powerful Large Language Models (LLMs) with diverse modalities, particularly focusing on the fusion of vision, language, and audio data. However, the graph-structured data, inherently rich in structural and domain-specific knowledge, have not yet been gracefully adapted to LLMs. Existing methods either describe the graph with raw text, suffe…

    Submitted 19 June, 2024; originally announced June 2024.

  45. arXiv:2406.14952

    cs.CL

    ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models

    Authors: Haiquan Zhao, Lingyu Li, Shisong Chen, Shuqi Kong, Jiaan Wang, Kexin Huang, Tianle Gu, Yixu Wang, Dandan Liang, Zhixu Li, Yan Teng, Yanghua Xiao, Yingchun Wang

    Abstract: Emotion Support Conversation (ESC) is a crucial application, which aims to reduce human stress, offer emotional guidance, and ultimately enhance human mental and physical well-being. With the advancement of Large Language Models (LLMs), many researchers have employed LLMs as the ESC models. However, the evaluation of these LLM-based ESCs remains uncertain. Inspired by the awesome development of ro…

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Pre-print

  46. arXiv:2406.13250

    cs.AI cs.CL cs.LG

    LangTopo: Aligning Language Descriptions of Graphs with Tokenized Topological Modeling

    Authors: Zhong Guan, Hongke Zhao, Likang Wu, Ming He, Jianpin Fan

    Abstract: Recently, large language models (LLMs) have been widely researched in the field of graph machine learning due to their outstanding abilities in language comprehension and learning. However, the significant gap between natural language tasks and topological structure modeling poses a nonnegligible challenge. Specifically, since natural language descriptions are not sufficient for LLMs to understand…

    Submitted 19 June, 2024; originally announced June 2024.

  47. arXiv:2406.13235

    cs.IR cs.AI

    Enhancing Collaborative Semantics of Language Model-Driven Recommendations via Graph-Aware Learning

    Authors: Zhong Guan, Likang Wu, Hongke Zhao, Ming He, Jianpin Fan

    Abstract: Large Language Models (LLMs) are increasingly prominent in the recommendation systems domain. Existing studies usually utilize in-context learning or supervised fine-tuning on task-specific data to align LLMs into recommendations. However, the substantial bias in semantic spaces between language processing tasks and recommendation tasks poses a nonnegligible challenge. Specifically, without the ad…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 10 pages

  48. arXiv:2406.12845

    cs.LG cs.CL

    Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts

    Authors: Haoxiang Wang, Wei Xiong, Tengyang Xie, Han Zhao, Tong Zhang

    Abstract: Reinforcement learning from human feedback (RLHF) has emerged as the primary method for aligning large language models (LLMs) with human preferences. The RLHF process typically starts by training a reward model (RM) using human preference data. Conventional RMs are trained on pairwise responses to the same user request, with relative ratings indicating which response humans prefer. The trained RM…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Technical report v1. Code and model are released at https://github.com/RLHFlow/RLHF-Reward-Modeling/

  49. arXiv:2406.12793

    cs.CL

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Authors: Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, et al. (32 additional authors not shown)

    Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained…

    Submitted 18 June, 2024; originally announced June 2024.

  50. arXiv:2406.12375

    cs.LG cs.AI

    GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory

    Authors: Haoze Wu, Zihan Qiu, Zili Wang, Hang Zhao, Jie Fu

    Abstract: Mixture-of-Experts (MoE) has been demonstrated as an efficient method to scale up models. By dynamically and sparsely selecting activated experts, MoE can effectively reduce computational costs. Despite the success, we observe that many tokens in the MoE models have uncertain routing results. These tokens have nearly equal scores for choosing each expert, and we demonstrate that this uncertainty c…

    Submitted 18 June, 2024; originally announced June 2024.