Showing 1–50 of 2,834 results for author: Zha, H

  1. arXiv:2407.08706  [pdf, other]

    cs.CV

    HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

    Authors: Runhui Huang, Xinpeng Ding, Chunwei Wang, Jianhua Han, Yulong Liu, Hengshuang Zhao, Hang Xu, Lu Hou, Wei Zhang, Xiaodan Liang

    Abstract: High-resolution inputs enable Large Vision-Language Models (LVLMs) to discern finer visual details, enhancing their comprehension capabilities. To reduce the training and computation costs caused by high-resolution input, one promising direction is to use sliding windows to slice the input into uniform patches, each matching the input size of the well-trained vision encoder. Although efficient, th… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  2. arXiv:2407.08584  [pdf, other]

    cs.DC

    Data-Locality-Aware Task Assignment and Scheduling for Distributed Job Executions

    Authors: Hailiang Zhao, Xueyan Tang, Peng Chen, Jianwei Yin, Shuiguang Deng

    Abstract: This paper investigates a data-locality-aware task assignment and scheduling problem aimed at minimizing job completion times for distributed job executions. Without prior knowledge of future job arrivals, we propose an optimal balanced task assignment algorithm (OBTA) that minimizes the completion time of each arriving job. We significantly reduce OBTA's computational overhead by narrowing the se… ▽ More

    Submitted 12 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  3. arXiv:2407.08420  [pdf]

    cond-mat.mtrl-sci physics.optics

    Skin Effect of Nonlinear Optical Responses in Antiferromagnets

    Authors: Hang Zhou, Rui-Chun Xiao, Shu-Hui Zhang, Wei Gan, Hui Han, Hong-Miao Zhao, Wenjian Lu, Changjin Zhang, Yuping Sun, Hui Li, Ding-Fu Shao

    Abstract: Nonlinear optics plays important roles in the research of fundamental physics and the applications of high-performance optoelectronic devices. The bulk nonlinear optical responses arise from the uniform light absorption in noncentrosymmetric crystals, and hence are usually considered to be the collective phenomena of all atoms. Here we show, in contrast to this common expectation, the nonlinear opt… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  4. arXiv:2407.08268  [pdf, other]

    cs.CV

    Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

    Authors: Tong Shao, Zhuotao Tian, Hang Zhao, Jingyong Su

    Abstract: CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite its success, its application to OVSS faces challenges due to its initial image-level alignment training, which affects its performance in tasks requiring detailed local context. Our study delves into the impact of CLIP's [CLS] token on patch feature cor… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: ECCV24 accepted

  5. arXiv:2407.07930  [pdf]

    q-bio.BM cs.LG

    Token-Mol 1.0: Tokenized drug design with large language model

    Authors: Jike Wang, Rui Qin, Mingyang Wang, Meijing Fang, Yangyang Zhang, Yuchen Zhu, Qun Su, Qiaolin Gou, Chao Shen, Odin Zhang, Zhenxing Wu, Dejun Jiang, Xujun Zhang, Huifeng Zhao, Xiaozhe Wan, Zhourui Wu, Liwei Liu, Yu Kang, Chang-Yu Hsieh, Tingjun Hou

    Abstract: Significant interests have recently risen in leveraging sequence-based large language models (LLMs) for drug design. However, most current applications of LLMs in drug discovery lack the ability to comprehend three-dimensional (3D) structures, thereby limiting their effectiveness in tasks that explicitly involve molecular conformations. In this study, we introduced Token-Mol, a token-only 3D drug… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  6. arXiv:2407.07791  [pdf, other]

    cs.CL

    Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities

    Authors: Tianjie Ju, Yiting Wang, Xinbei Ma, Pengzhou Cheng, Haodong Zhao, Yulong Wang, Lifeng Liu, Jian Xie, Zhuosheng Zhang, Gongshen Liu

    Abstract: The rapid adoption of large language models (LLMs) in multi-agent systems has highlighted their impressive capabilities in various applications, such as collaborative problem-solving and autonomous negotiation. However, the security implications of these LLM-based multi-agent systems have not been thoroughly investigated, particularly concerning the spread of manipulated knowledge. In this paper,… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 18 pages, work in progress

  7. arXiv:2407.07632  [pdf]

    econ.GN

    The role of green ammonia in meeting challenges towards a sustainable development in China

    Authors: Hanxin Zhao

    Abstract: This paper discusses the adoption of a green ammonia economy in meeting challenges in China's sustainable development. First, key challenges in China's energy transition, industry decarbonization and regional sustainable development are explored. The coal-dominated energy consumption has placed great obstacles in achieving energy transition and led to massive CO2 emission since the large-scale ind… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  8. arXiv:2407.07078  [pdf, other]

    cs.CV

    MoSt-DSA: Modeling Motion and Structural Interactions for Direct Multi-Frame Interpolation in DSA Images

    Authors: Ziyang Xu, Huangxuan Zhao, Ziwei Cui, Wenyu Liu, Chuansheng Zheng, Xinggang Wang

    Abstract: Artificial intelligence has become a crucial tool for medical image analysis. As an advanced cerebral angiography technique, Digital Subtraction Angiography (DSA) poses a challenge where the radiation dose to humans is proportional to the image count. By reducing images and using AI interpolation instead, the radiation can be cut significantly. However, DSA images present more complex motion and s… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted to ECAI2024

  9. arXiv:2407.06904  [pdf, other]

    cs.AI

    Hypergraph based Understanding for Document Semantic Entity Recognition

    Authors: Qiwei Li, Zuchao Li, Ping Wang, Haojun Ai, Hai Zhao

    Abstract: Semantic entity recognition is an important task in the field of visually-rich document understanding. It distinguishes the semantic types of text by analyzing the position relationship between text nodes and the relation between text content. The existing document understanding models mainly focus on entity categories while ignoring the extraction of entity boundaries. We build a novel hypergraph… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  10. arXiv:2407.06250  [pdf, other]

    cs.CV

    FairDiff: Fair Segmentation with Point-Image Diffusion

    Authors: Wenyi Li, Haoran Xu, Guiyu Zhang, Huan-ang Gao, Mingju Gao, Mengyu Wang, Hao Zhao

    Abstract: Fairness is an important topic for medical image analysis, driven by the challenge of unbalanced training data among diverse target groups and the societal demand for equitable medical quality. In response to this issue, our research adopts a data-driven strategy: enhancing data balance by integrating synthetic images. However, in terms of generating synthetic images, previous works either lack pai… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  11. arXiv:2407.06191  [pdf, other]

    cs.CV

    Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images

    Authors: Zhangyang Qi, Yunhan Yang, Mengchen Zhang, Long Xing, Xiaoyang Wu, Tong Wu, Dahua Lin, Xihui Liu, Jiaqi Wang, Hengshuang Zhao

    Abstract: Recent advances in 3D AIGC have shown promise in directly creating 3D objects from text and images, offering significant cost savings in animation and product design. However, detailed edit and customization of 3D assets remains a long-standing challenge. Specifically, 3D Generation methods lack the ability to follow finely detailed instructions as precisely as their 2D image creation counterparts… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Project Page: https://tailor3d-2024.github.io/

  12. arXiv:2407.05365  [pdf, other]

    cs.AI

    ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models

    Authors: Xiyuan Zhou, Huan Zhao, Yuheng Cheng, Yuji Cao, Gaoqi Liang, Guolong Liu, Junhua Zhao

    Abstract: In response to the urgent demand for grid stability and the complex challenges posed by renewable energy integration and electricity market dynamics, the power sector increasingly seeks innovative technological solutions. In this context, large language models (LLMs) have become a key technology to improve efficiency and promote intelligent progress in the power sector with their excellent natural… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  13. arXiv:2407.05364  [pdf, other]

    cs.LG

    PTaRL: Prototype-based Tabular Representation Learning via Space Calibration

    Authors: Hangting Ye, Wei Fan, Xiaozhuang Song, Shun Zheng, He Zhao, Dandan Guo, Yi Chang

    Abstract: Tabular data have been playing a mostly important role in diverse real-world fields, such as healthcare, engineering, finance, etc. With the recent success of deep learning, many tabular machine learning (ML) methods based on deep networks (e.g., Transformer, ResNet) have achieved competitive performance on tabular benchmarks. However, existing deep tabular ML methods suffer from the representatio… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  14. arXiv:2407.05342  [pdf, other]

    cs.CV

    Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models

    Authors: Longxiang Tang, Zhuotao Tian, Kai Li, Chunming He, Hantao Zhou, Hengshuang Zhao, Xiu Li, Jiaya Jia

    Abstract: This study addresses the Domain-Class Incremental Learning problem, a realistic but challenging continual learning scenario where both the domain distribution and target classes vary across tasks. To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability. However, this incurs a new problem: the knowledge encoded in the pre-trained VLM… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  15. arXiv:2407.05282  [pdf, other]

    cs.CV

    UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

    Authors: Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang

    Abstract: This paper presents UltraEdit, a large-scale (approximately 4 million editing samples), automatically generated dataset for instruction-based image editing. Our key idea is to address the drawbacks in existing image editing datasets like InstructPix2Pix and MagicBrush, and provide a systematic approach to producing massive and high-quality image editing samples. UltraEdit offers several distinct a… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 32 pages, 14 figures

  16. arXiv:2407.04068  [pdf, other]

    cs.CV

    CLIP-DR: Textual Knowledge-Guided Diabetic Retinopathy Grading with Ranking-aware Prompting

    Authors: Qinkai Yu, Jianyang Xie, Anh Nguyen, He Zhao, Jiong Zhang, Huazhu Fu, Yitian Zhao, Yalin Zheng, Yanda Meng

    Abstract: Diabetic retinopathy (DR) is a complication of diabetes and usually takes decades to reach sight-threatening levels. Accurate and robust detection of DR severity is critical for the timely management and treatment of diabetes. However, most current DR grading methods suffer from insufficient robustness to data variability (e.g., colour fundus images), posing a significant difficulty for ac… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI 2024

  17. arXiv:2407.03813  [pdf, other]

    cs.CV

    PECTP: Parameter-Efficient Cross-Task Prompts for Incremental Vision Transformer

    Authors: Qian Feng, Hanbin Zhao, Chao Zhang, Jiahua Dong, Henghui Ding, Yu-Gang Jiang, Hui Qian

    Abstract: Incremental Learning (IL) aims to learn deep models on sequential tasks continually, where each new task includes a batch of new classes and deep models have no access to task-ID information at the inference time. Recent vast pre-trained models (PTMs) have achieved outstanding performance by prompt technique in practical IL without the old samples (rehearsal-free) and with a memory constraint (mem… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  18. arXiv:2407.03165  [pdf, other]

    cs.CV cs.GR

    Consistent Point Orientation for Manifold Surfaces via Boundary Integration

    Authors: Weizhou Liu, Xingce Wang, Haichuan Zhao, Xingfei Xue, Zhongke Wu, Xuequan Lu, Ying He

    Abstract: This paper introduces a new approach for generating globally consistent normals for point clouds sampled from manifold surfaces. Given that the generalized winding number (GWN) field generated by a point cloud with globally consistent normals is a solution to a PDE with jump boundary conditions and possesses harmonic properties, and the Dirichlet energy of the GWN field can be defined as an integr… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted at SIGGRAPH 2024

  19. arXiv:2407.02833  [pdf, other]

    cs.IR cs.CL cs.LG

    LANE: Logic Alignment of Non-tuning Large Language Models and Online Recommendation Systems for Explainable Reason Generation

    Authors: Hongke Zhao, Songming Zheng, Likang Wu, Bowen Yu, Jing Wang

    Abstract: The explainability of recommendation systems is crucial for enhancing user trust and satisfaction. Leveraging large language models (LLMs) offers new opportunities for comprehensive recommendation logic generation. However, in existing related studies, fine-tuning LLM models for recommendation tasks incurs high computational costs and alignment issues with existing systems, limiting the applicatio… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  20. arXiv:2407.02767  [pdf, other]

    cond-mat.mtrl-sci cond-mat.mes-hall

    Comparison of Short-Range Order in GeSn Grown by Molecular Beam Epitaxy and Chemical Vapor Deposition

    Authors: Shang Liu, Yunfan Liang, Haochen Zhao, Nirosh M. Eldose, Jin-Hee Bae, Omar Concepcion, Xiaochen Jin, Shunda Chen, Ilias Bikmukhametov, Austin Akey, Cory T. Cline, Alejandra Cuervo Covian, Xiaoxin Wang, Tianshu Li, Yuping Zeng, Dan Buca, Shui-Qing Yu, Gregory J. Salamo, Shengbai Zhang, Jifeng Liu

    Abstract: Atomic short-range order (SRO) in direct-bandgap GeSn for infrared photonics has recently attracted attention due to its notable impact on band structures. However, the SRO in GeSn thin films grown by different methods has hardly been compared. This paper compares SRO in GeSn thin films of similar compositions grown by molecular beam epitaxy (MBE) and chemical vapor deposition (CVD) using atom pr… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  21. arXiv:2407.01863  [pdf, other]

    cs.CL

    VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs

    Authors: Qiucheng Wu, Handong Zhao, Michael Saxon, Trung Bui, William Yang Wang, Yang Zhang, Shiyu Chang

    Abstract: Vision language models (VLMs) are an exciting emerging class of language models (LMs) that have merged classic LM capabilities with those of image processing systems. However, the ways that these capabilities combine are not always intuitive and warrant direct investigation. One understudied capability in VLMs is visual spatial planning -- the ability to comprehend the spatial arrangements of obje… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  22. arXiv:2407.01812  [pdf, other]

    cs.RO cs.LG

    Equivariant Diffusion Policy

    Authors: Dian Wang, Stephen Hart, David Surovik, Tarik Kelestemur, Haojie Huang, Haibo Zhao, Mark Yeatman, Jiuguang Wang, Robin Walters, Robert Platt

    Abstract: Recent work has shown diffusion models are an effective approach to learning the multimodal distributions arising from demonstration data in behavior cloning. However, a drawback of this approach is the need to learn a denoising function, which is significantly more complex than learning an explicit policy. In this work, we propose Equivariant Diffusion Policy, a novel diffusion policy learning me… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  23. arXiv:2407.01320  [pdf, other]

    cs.LG cs.AI cs.CL

    Increasing Model Capacity for Free: A Simple Strategy for Parameter Efficient Fine-tuning

    Authors: Haobo Song, Hao Zhao, Soumajit Majumder, Tao Lin

    Abstract: Fine-tuning large pre-trained foundation models, such as the 175B GPT-3, has attracted more attention for downstream tasks recently. While parameter-efficient fine-tuning methods have been proposed and proven effective without retraining all model parameters, their performance is limited by the capacity of incremental modules, especially under constrained parameter budgets. To overcome this cha… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted at ICLR 2024. Code at https://github.com/LINs-lab/CapaBoost

  24. arXiv:2407.01239  [pdf, other]

    cs.CV cs.AI

    SGCCNet: Single-Stage 3D Object Detector With Saliency-Guided Data Augmentation and Confidence Correction Mechanism

    Authors: Ao Liang, Wenyu Chen, Jian Fang, Huaici Zhao

    Abstract: The single-stage point-based 3D object detectors have attracted widespread research interest due to their advantages of lightweight and fast inference speed. However, they still face challenges such as inadequate learning of low-quality objects (ILQ) and misalignment between localization accuracy and classification confidence (MLC). In this paper, we propose SGCCNet to alleviate these two issues.… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 16 pages, 16 figures

  25. arXiv:2407.00808  [pdf]

    eess.SY cs.AI

    Exploring a Physics-Informed Decision Transformer for Distribution System Restoration: Methodology and Performance Analysis

    Authors: Hong Zhao, Jin Wei-Kocsis, Adel Heidari Akhijahani, Karen L Butler-Purry

    Abstract: Driven by advancements in sensing and computing, deep reinforcement learning (DRL)-based methods have demonstrated significant potential in effectively tackling distribution system restoration (DSR) challenges under uncertain operational scenarios. However, the data-intensive nature of DRL poses obstacles in achieving satisfactory DSR solutions for large-scale, complex distribution systems. Inspir… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  26. arXiv:2407.00765  [pdf, other]

    cs.LG cs.NE math.NA stat.ML

    Structured and Balanced Multi-component and Multi-layer Neural Networks

    Authors: Shijun Zhang, Hongkai Zhao, Yimin Zhong, Haomin Zhou

    Abstract: In this work, we propose a balanced multi-component and multi-layer neural network (MMNN) structure to approximate functions with complex features with both accuracy and efficiency in terms of degrees of freedom and computation cost. The main idea is motivated by a multi-component, each of which can be approximated effectively by a single-layer network, and multi-layer decomposition in a "divide-a… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Our codes and implementation details are available at https://github.com/ShijunZhangMath/MMNN

  27. arXiv:2407.00468  [pdf, other]

    cs.CV cs.AI cs.CL

    MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

    Authors: Jinsheng Huang, Liang Chen, Taian Guo, Fu Zeng, Yusheng Zhao, Bohan Wu, Ye Yuan, Haozhe Zhao, Zhihui Guo, Yichi Zhang, Jingyang Yuan, Wei Ju, Luchen Liu, Tianyu Liu, Baobao Chang, Ming Zhang

    Abstract: Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for such evaluations suffer from systematic biases. Remarkably, Large Language Models (LLMs) without any visual perception capabilities achieve non-trivial p… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 21 pages, code released at https://github.com/chenllliang/MMEvalPro, Homepage at https://mmevalpro.github.io/

  28. arXiv:2407.00185  [pdf, other]

    math.OC

    Shape optimization of non-matching isogeometric shells with moving intersections

    Authors: Han Zhao, John T. Hwang, J. S. Chen

    Abstract: While shape optimization using isogeometric shells exhibits appealing features by integrating design geometries and analysis models, challenges arise when addressing computer-aided design (CAD) geometries comprised of multiple non-uniform rational B-splines (NURBS) patches, which are common in practice. The intractability stems from surface intersections within these CAD models. In this paper, we… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: 41 pages, 18 figures

  29. arXiv:2406.19705  [pdf, other]

    cs.AI

    DISCO: Efficient Diffusion Solver for Large-Scale Combinatorial Optimization Problems

    Authors: Kexiong Yu, Hang Zhao, Yuhang Huang, Renjiao Yi, Kai Xu, Chenyang Zhu

    Abstract: Combinatorial Optimization (CO) problems are fundamentally crucial in numerous practical applications across diverse industries, characterized by entailing enormous solution space and demanding time-sensitive response. Despite significant advancements made by recent neural solvers, their limited expressiveness does not conform well to the multi-modal nature of CO landscapes. While some research ha… ▽ More

    Submitted 4 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

  30. arXiv:2406.18533  [pdf, other]

    cs.CV

    On Scaling Up 3D Gaussian Splatting Training

    Authors: Hexu Zhao, Haoyang Weng, Daohan Lu, Ang Li, Jinyang Li, Aurojit Panda, Saining Xie

    Abstract: 3D Gaussian Splatting (3DGS) is increasingly popular for 3D reconstruction due to its superior visual quality and rendering speed. However, 3DGS training currently occurs on a single GPU, limiting its ability to handle high-resolution and large-scale 3D reconstruction tasks due to memory constraints. We introduce Grendel, a distributed system designed to partition 3DGS parameters and parallelize c… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/nyu-systems/Grendel-GS ; Project page: https://daohanlu.github.io/scaling-up-3dgs

    ACM Class: I.4.5

  31. arXiv:2406.18155  [pdf, other]

    quant-ph

    SuperGrad: a differentiable simulator for superconducting processors

    Authors: Ziang Wang, Feng Wu, Hui-Hai Zhao, Xin Wan, Xiaotong Ni

    Abstract: One significant advantage of superconducting processors is their extensive design flexibility, which encompasses various types of qubits and interactions. Given the large number of tunable parameters of a processor, the ability to perform gradient optimization would be highly beneficial. Efficient backpropagation for gradient computation requires a tightly integrated software library, for which no… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 23 pages, 7 figures, 3 tables, the code is available at https://github.com/iqubit-org/supergrad

  32. CAT: Interpretable Concept-based Taylor Additive Models

    Authors: Viet Duong, Qiong Wu, Zhengyi Zhou, Hongjue Zhao, Chenxiang Luo, Eric Zavesky, Huaxiu Yao, Huajie Shao

    Abstract: As an emerging interpretable technique, Generalized Additive Models (GAMs) adopt neural networks to individually learn non-linear functions for each feature, which are then combined through a linear model for final predictions. Although GAMs can explain deep neural networks (DNNs) at the feature level, they require large numbers of model parameters and are prone to overfitting, making them hard to… ▽ More

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  33. Performative Debias with Fair-exposure Optimization Driven by Strategic Agents in Recommender Systems

    Authors: Zhichen Xiang, Hongke Zhao, Chuang Zhao, Ming He, Jianping Fan

    Abstract: Data bias, e.g., popularity impairs the dynamics of two-sided markets within recommender systems. This overshadows the less visible but potentially intriguing long-tail items that could capture user interest. Despite the abundance of research surrounding this issue, it still poses challenges and remains a hot topic in academic circles. Along this line, in this paper, we developed a re-ranking appr… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: SIGKDD 2024 accepted paper

  34. arXiv:2406.17286  [pdf]

    cs.RO eess.SY

    Prioritized experience replay-based DDQN for Unmanned Vehicle Path Planning

    Authors: Liu Lipeng, Letian Xu, Jiabei Liu, Haopeng Zhao, Tongzhou Jiang, Tianyao Zheng

    Abstract: Path planning module is a key module for autonomous vehicle navigation, which directly affects its operating efficiency and safety. In complex environments with many obstacles, traditional planning algorithms often cannot meet the needs of intelligence, which may lead to problems such as dead zones in unmanned vehicles. This paper proposes a path planning algorithm based on DDQN and combines it wi… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 4 pages, 6 figures, 2024 5th International Conference on Information Science, Parallel and Distributed Systems

  35. arXiv:2406.17248  [pdf, other]

    quant-ph

    MindSpore Quantum: A User-Friendly, High-Performance, and AI-Compatible Quantum Computing Framework

    Authors: Xusheng Xu, Jiangyu Cui, Zidong Cui, Runhong He, Qingyu Li, Xiaowei Li, Yanling Lin, Jiale Liu, Wuxin Liu, Jiale Lu, Maolin Luo, Chufan Lyu, Shijie Pan, Mosharev Pavel, Runqiu Shu, Jialiang Tang, Ruoqian Xu, Shu Xu, Kang Yang, Fan Yu, Qingguo Zeng, Haiying Zhao, Qiang Zheng, Junyuan Zhou, Xu Zhou, et al. (14 additional authors not shown)

    Abstract: We introduce MindSpore Quantum, a pioneering hybrid quantum-classical framework with a primary focus on the design and implementation of noisy intermediate-scale quantum (NISQ) algorithms. Leveraging the robust support of MindSpore, an advanced open-source deep learning training/inference framework, MindSpore Quantum exhibits exceptional efficiency in the design and training of variational quantum… ▽ More

    Submitted 10 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  36. arXiv:2406.16878  [pdf, ps, other]

    eess.SP cs.AI cs.IT

    Benchmarking Semantic Communications for Image Transmission Over MIMO Interference Channels

    Authors: Yanhu Wang, Shuaishuai Guo, Anming Dong, Hui Zhao

    Abstract: Semantic communications offer promising prospects for enhancing data transmission efficiency. However, existing schemes have predominantly concentrated on point-to-point transmissions. In this paper, we aim to investigate the validity of this claim in interference scenarios compared to baseline approaches. Specifically, our focus is on general multiple-input multiple-output (MIMO) interference cha… ▽ More

    Submitted 10 April, 2024; originally announced June 2024.

  37. arXiv:2406.16722  [pdf, other]

    cs.CL

    Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba

    Authors: Yuchen Zou, Yineng Chen, Zuchao Li, Lefei Zhang, Hai Zhao

    Abstract: Transformer, a deep neural network architecture, has long dominated the field of natural language processing and beyond. Nevertheless, the recent introduction of Mamba challenges its supremacy, sparks considerable interest among researchers, and gives rise to a series of Mamba-based models that have exhibited notable potential. This survey paper orchestrates a comprehensive discussion, diving into… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  38. arXiv:2406.16494  [pdf, other]

    cs.IR cs.AI

    Cross-domain Transfer of Valence Preferences via a Meta-optimization Approach

    Authors: Chuang Zhao, Hongke Zhao, Ming He, Xiaomeng Li, Jianping Fan

    Abstract: Cross-domain recommendation offers a potential avenue for alleviating data sparsity and cold-start problems. Embedding and mapping, as a classic cross-domain research genre, aims to identify a common mapping function to perform representation transformation between two domains. Nevertheless, previous coarse-grained preference representations, non-personalized mapping functions, and excessive relia… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  39. arXiv:2406.15885  [pdf, other]

    cs.SD cs.AI eess.AS

    The Music Maestro or The Musically Challenged, A Massive Music Evaluation Benchmark for Large Language Models

    Authors: Jiajia Li, Lu Yang, Mingni Tang, Cong Chen, Zuchao Li, Ping Wang, Hai Zhao

    Abstract: Benchmark plays a pivotal role in assessing the advancements of large language models (LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities, there is a notable absence of a dedicated benchmark for assessing their musical abilities. To address this gap, we present ZIQI-Eval, a comprehensive and large-scale music benchmark specifically designed to evaluate the music-rel… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL-Findings 2024

  40. arXiv:2406.15781  [pdf, other]

    cs.CL

    DABL: Detecting Semantic Anomalies in Business Processes Using Large Language Models

    Authors: Wei Guan, Jian Cao, Jianqi Gao, Haiyan Zhao, Shiyou Qian

    Abstract: Detecting anomalies in business processes is crucial for ensuring operational success. While many existing methods rely on statistical frequency to detect anomalies, it's important to note that infrequent behavior doesn't necessarily imply undesirability. To address this challenge, detecting anomalies from a semantic viewpoint proves to be a more effective approach. However, current semantic anoma… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  41. arXiv:2406.15534  [pdf, other]

    cs.LG cs.AI cs.CL q-bio.QM

    Geneverse: A collection of Open-source Multimodal Large Language Models for Genomic and Proteomic Research

    Authors: Tianyu Liu, Yijia Xiao, Xiao Luo, Hua Xu, W. Jim Zheng, Hongyu Zhao

    Abstract: The applications of large language models (LLMs) are promising for biomedical and healthcare research. Despite the availability of open-source LLMs trained using a wide range of biomedical data, current research on the applications of LLMs to genomics and proteomics is still limited. To fill this gap, we propose a collection of finetuned LLMs and multimodal LLMs (MLLMs), known as Geneverse, for th… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 8 pages

  42. arXiv:2406.15504  [pdf, other]

    cs.CL cs.LG

    Dr.E Bridges Graphs with Large Language Models through Words

    Authors: Zipeng Liu, Likang Wu, Ming He, Zhong Guan, Hongke Zhao, Nan Feng

    Abstract: Significant efforts have been directed toward integrating powerful Large Language Models (LLMs) with diverse modalities, particularly focusing on the fusion of vision, language, and audio data. However, the graph-structured data, inherently rich in structural and domain-specific knowledge, have not yet been gracefully adapted to LLMs. Existing methods either describe the graph with raw text, suffe… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  43. arXiv:2406.14952  [pdf, other]

    cs.CL

    ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models

    Authors: Haiquan Zhao, Lingyu Li, Shisong Chen, Shuqi Kong, Jiaan Wang, Kexin Huang, Tianle Gu, Yixu Wang, Dandan Liang, Zhixu Li, Yan Teng, Yanghua Xiao, Yingchun Wang

    Abstract: Emotion Support Conversation (ESC) is a crucial application, which aims to reduce human stress, offer emotional guidance, and ultimately enhance human mental and physical well-being. With the advancement of Large Language Models (LLMs), many researchers have employed LLMs as the ESC models. However, the evaluation of these LLM-based ESCs remains uncertain. Inspired by the awesome development of ro…

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Pre-print

  44. arXiv:2406.13448  [pdf, other]

    physics.acc-ph physics.plasm-ph

    Demonstration of High-Efficiency Microwave Heating Producing Record Highly Charged Xenon Ion Beams with Superconducting ECR Ion Sources

    Authors: X. Wang, J. B. Li, V. Mironov, J. W. Guo, X. Z. Zhang, O. Tarvainen, Y. C. Feng, L. X. Li, J. D. Ma, Z. H. Zhang, W. Lu, S. Bogomolov, L. Sun, H. W. Zhao

    Abstract: Intense highly charged ion beam production is essential for high-power heavy ion accelerators. A novel movable Vlasov launcher for superconducting high charge state Electron Cyclotron Resonance (ECR) ion source has been devised that can affect the microwave power effectiveness by a factor of about 4 in terms of highly charged ion beam production. This approach based on a dedicated microwave launch…

    Submitted 25 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  45. arXiv:2406.13250  [pdf, other]

    cs.AI cs.CL cs.LG

    LangTopo: Aligning Language Descriptions of Graphs with Tokenized Topological Modeling

    Authors: Zhong Guan, Hongke Zhao, Likang Wu, Ming He, Jianpin Fan

    Abstract: Recently, large language models (LLMs) have been widely researched in the field of graph machine learning due to their outstanding abilities in language comprehension and learning. However, the significant gap between natural language tasks and topological structure modeling poses a nonnegligible challenge. Specifically, since natural language descriptions are not sufficient for LLMs to understand…

    Submitted 19 June, 2024; originally announced June 2024.

  46. arXiv:2406.13235  [pdf, other]

    cs.IR cs.AI

    Enhancing Collaborative Semantics of Language Model-Driven Recommendations via Graph-Aware Learning

    Authors: Zhong Guan, Likang Wu, Hongke Zhao, Ming He, Jianpin Fan

    Abstract: Large Language Models (LLMs) are increasingly prominent in the recommendation systems domain. Existing studies usually utilize in-context learning or supervised fine-tuning on task-specific data to align LLMs into recommendations. However, the substantial bias in semantic spaces between language processing tasks and recommendation tasks poses a nonnegligible challenge. Specifically, without the ad…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 10 pages

  47. arXiv:2406.12845  [pdf, other]

    cs.LG cs.CL

    Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts

    Authors: Haoxiang Wang, Wei Xiong, Tengyang Xie, Han Zhao, Tong Zhang

    Abstract: Reinforcement learning from human feedback (RLHF) has emerged as the primary method for aligning large language models (LLMs) with human preferences. The RLHF process typically starts by training a reward model (RM) using human preference data. Conventional RMs are trained on pairwise responses to the same user request, with relative ratings indicating which response humans prefer. The trained RM…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Technical report v1. Code and model are released at https://github.com/RLHFlow/RLHF-Reward-Modeling/

  48. arXiv:2406.12793  [pdf, other]

    cs.CL

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Authors: Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, et al. (32 additional authors not shown)

    Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained…

    Submitted 18 June, 2024; originally announced June 2024.

  49. arXiv:2406.12375  [pdf, other]

    cs.LG cs.AI

    GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory

    Authors: Haoze Wu, Zihan Qiu, Zili Wang, Hang Zhao, Jie Fu

    Abstract: Mixture-of-Experts (MoE) has been demonstrated as an efficient method to scale up models. By dynamically and sparsely selecting activated experts, MoE can effectively reduce computational costs. Despite the success, we observe that many tokens in the MoE models have uncertain routing results. These tokens have nearly equal scores for choosing each expert, and we demonstrate that this uncertainty c…

    Submitted 18 June, 2024; originally announced June 2024.

  50. arXiv:2406.12214  [pdf, other]

    cs.RO cs.CV

    Is Your HD Map Constructor Reliable under Sensor Corruptions?

    Authors: Xiaoshuai Hao, Mengchuan Wei, Yifan Yang, Haimei Zhao, Hui Zhang, Yi Zhou, Qiang Wang, Weiming Li, Lingdong Kong, Jing Zhang

    Abstract: Driving systems often rely on high-definition (HD) maps for precise environmental information, which is crucial for planning and navigation. While current HD map constructors perform well under ideal conditions, their resilience to real-world challenges, e.g., adverse weather and sensor failures, is not well understood, raising safety concerns. This work introduces MapBench, the first comprehensive…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Project URL: https://mapbench.github.io/