Showing 1–50 of 2,306 results for author: Yang, M

  1. arXiv:2407.08600  [pdf, other]

    cond-mat.mes-hall

    Influence of flat bands on RKKY interaction: perspective of Fano defects

    Authors: Yue-De Luo, Min-Fong Yang

    Abstract: In this paper, we revisit the effect of flat bands on the Ruderman-Kittel-Kasuya-Yosida (RKKY) interaction by using a coordinate transformation that detangles flat-band states from dispersive ones. Under this transformation, original flat-band systems containing magnetic impurities are mapped onto a generalized Fano-Anderson model, where flat-band states act as Fano defects. From this perspective,…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures

  2. arXiv:2407.08561  [pdf, other]

    cs.CV

    MapLocNet: Coarse-to-Fine Feature Registration for Visual Re-Localization in Navigation Maps

    Authors: Hang Wu, Zhenghao Zhang, Siyuan Lin, Xiangru Mu, Qiang Zhao, Ming Yang, Tong Qin

    Abstract: Robust localization is the cornerstone of autonomous driving, especially in challenging urban environments where GPS signals suffer from multipath errors. Traditional localization approaches rely on high-definition (HD) maps, which consist of precisely annotated landmarks. However, building HD map is expensive and challenging to scale up. Given these limitations, leveraging navigation maps has eme…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: IROS 2024 (Oral)

  3. arXiv:2407.08526  [pdf, other]

    cs.CV

    BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight

    Authors: Hang Wu, Zhenghao Zhang, Siyuan Lin, Tong Qin, Jin Pan, Qiang Zhao, Chunjing Xu, Ming Yang

    Abstract: Bird's-eye-view (BEV) representation is crucial for the perception function in autonomous driving tasks. It is difficult to balance the accuracy, efficiency and range of BEV representation. The existing works are restricted to a limited perception range within 50 meters. Extending the BEV representation range can greatly benefit downstream tasks such as topology reasoning, scene understanding, and…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: IEEE IV 2024

  4. arXiv:2407.07760  [pdf, other]

    cs.CV cs.AI

    Learning Spatial-Semantic Features for Robust Video Object Segmentation

    Authors: Xin Li, Deshui Miao, Zhenyu He, Yaowei Wang, Huchuan Lu, Ming-Hsuan Yang

    Abstract: Tracking and segmenting multiple similar objects with complex or separate parts in long-term videos is inherently challenging due to the ambiguity of target parts and identity confusion caused by occlusion, background clutter, and long-term variations. In this paper, we propose a robust video object segmentation framework equipped with spatial-semantic features and discriminative object queries to…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Winner solution of the VOTS2024 Challenge

  5. arXiv:2407.07403  [pdf, other]

    cs.CV

    A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends

    Authors: Daizong Liu, Mingyu Yang, Xiaoye Qu, Pan Zhou, Yu Cheng, Wei Hu

    Abstract: With the significant development of large models in recent years, Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks. Compared to traditional Large Language Models (LLMs), LVLMs present great potential and challenges due to its closer proximity to the multi-resource real-world applications and the compl…

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  6. arXiv:2407.06842  [pdf, other]

    cs.CV

    Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts

    Authors: Shuangkang Fang, Yufeng Wang, Yi-Hsuan Tsai, Yi Yang, Wenrui Ding, Shuchang Zhou, Ming-Hsuan Yang

    Abstract: Recent work on image content manipulation based on vision-language pre-training models has been effectively extended to text-driven 3D scene editing. However, existing schemes for 3D scene editing still exhibit certain shortcomings, hindering their further interactive design. Such schemes typically adhere to fixed input patterns, limiting users' flexibility in text input. Moreover, their editing c…

    Submitted 9 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024; Project Website: https://sk-fun.fun/CE3D

  7. arXiv:2407.06815  [pdf, other]

    hep-ph astro-ph.HE

    Searching Accretion-Enhanced Dark Matter Annihilation Signals in the Galactic Centre

    Authors: Mei-Wen Yang, Zhi-Qi Guo, Xiao-Yi Luo, Zhao-Qiang Shen, Zi-Qing Xia, Chih-Ting Lu, Yue-Lin Sming Tsai, Yi-Zhong Fan

    Abstract: This study reanalyzes the detection prospects of dark matter (DM) annihilation signals in the Galactic Center, focusing on velocity-dependent dynamics within a spike density near the supermassive black hole (Sgr~A$^{\star}$). We investigate three annihilation processes -- $p$-wave, resonance, and forbidden annihilation -- under semi-relativistic velocities, leveraging gamma-ray data from Fermi and…

    Submitted 9 July, 2024; originally announced July 2024.

  8. arXiv:2407.06316  [pdf, other]

    math-ph cond-mat.mes-hall cond-mat.str-el math.AP math.SP

    Dirac cones and magic angles in the Bistritzer--MacDonald TBG Hamiltonian

    Authors: Simon Becker, Solomon Quinn, Zhongkai Tao, Alexander Watson, Mengxuan Yang

    Abstract: We demonstrate the generic existence of Dirac cones in the full Bistritzer--MacDonald Hamiltonian for twisted bilayer graphene. Its complementary set, when Dirac cones are absent, is the set of magic angles. We show the stability of magic angles obtained in the chiral limit by demonstrating that the perfectly flat bands transform into quadratic band crossings when perturbing away from the chiral l…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 26 pages; comments are welcome

  9. arXiv:2407.05850  [pdf, other]

    cs.DC

    DFedSat: Communication-Efficient and Robust Decentralized Federated Learning for LEO Satellite Constellations

    Authors: Minghao Yang, Jingjing Zhang, Shengyun Liu

    Abstract: Low Earth Orbit (LEO) satellites play a crucial role in the development of 6G mobile networks and space-air-ground integrated systems. Recent advancements in space technology have empowered LEO satellites with the capability to run AI applications. However, centralized approaches, where ground stations (GSs) act as servers and satellites as clients, often encounter slow convergence and inefficienc…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 13 pages, 10 figures

  10. arXiv:2407.05584  [pdf, other]

    cs.HC

    Exploring Real-Time Music-to-Image Systems for Creative Inspiration in Music Creation

    Authors: Meng Yang, Maria Teresa Llano, Jon McCormack

    Abstract: This paper presents a study on the use of a real-time music-to-image system as a mechanism to support and inspire musicians during their creative process. The system takes MIDI messages from a keyboard as input which are then interpreted and analysed using state-of-the-art generative AI models. Based on the perceived emotion and music structure, the system's interpretation is converted into visual…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 8 pages, 7 figures. Accepted by 15th International Conference on Computational Creativity, ICCC 24

  11. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  12. arXiv:2407.04433  [pdf, other]

    hep-ph

    Radiative decays of $P$-wave bottom baryons from light-cone sum rules

    Authors: X. Luo, H. M. Yang, H. X. Chen

    Abstract: We carry out a comprehensive investigation on the radiative decays of $P$-wave bottom baryons using the light-cone sum rule method. We analyze their electromagnetic transitions into ground-state bottom baryons together with a photon. Together with their mass spectra and strong decays investigated in Refs. \cite{Yang:2020zrh,Tan:2023opd}, a rather complete QCD sum rule study has been done to unders…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 24 pages, 10 figures, 7 tables, suggestions and comments welcome

  13. arXiv:2407.03596  [pdf, other]

    cs.CV

    Self Adaptive Threshold Pseudo-labeling and Unreliable Sample Contrastive Loss for Semi-supervised Image Classification

    Authors: Xuerong Zhang, Li Huang, Jing Lv, Ming Yang

    Abstract: Semi-supervised learning is attracting blooming attention, due to its success in combining unlabeled data. However, pseudo-labeling-based semi-supervised approaches suffer from two problems in image classification: (1) Existing methods might fail to adopt suitable thresholds since they either use a pre-defined/fixed threshold or an ad-hoc threshold adjusting scheme, resulting in inferior performan…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: ICANN24 accepted

  14. arXiv:2407.03125  [pdf, other]

    cs.LG cs.AI

    Foundations and Frontiers of Graph Learning Theory

    Authors: Yu Huang, Min Zhou, Menglin Yang, Zhen Wang, Muhan Zhang, Jie Wang, Hong Xie, Hao Wang, Defu Lian, Enhong Chen

    Abstract: Recent advancements in graph learning have revolutionized the way to understand and analyze data with complex structures. Notably, Graph Neural Networks (GNNs), i.e. neural network architectures designed for learning graph representations, have become a popular paradigm. With these models being usually characterized by intuition-driven design or highly intricate components, placing them within the…

    Submitted 7 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: 35 pages, 273 references. GitHub link: https://github.com/minehly/awesome-paper-for-graph-learning-theory

  15. arXiv:2407.02547  [pdf, other]

    cs.AI cs.LG

    Domain Generalizable Knowledge Tracing via Concept Aggregation and Relation-Based Attention

    Authors: Yuquan Xie, Wanqi Yang, Jinyu Wei, Ming Yang, Yang Gao

    Abstract: Knowledge Tracing (KT) is a critical task in online education systems, aiming to monitor students' knowledge states throughout a learning period. Common KT approaches involve predicting the probability of a student correctly answering the next question based on their exercise history. However, these methods often suffer from performance degradation when faced with the scarcity of student interacti…

    Submitted 2 July, 2024; originally announced July 2024.

  16. arXiv:2407.02303  [pdf, ps, other]

    math.AP

    On the multicomponent reactive flows in moving domains

    Authors: Kuntal Bhandari, Stanislav Kračmar, Šárka Nečasová, Minsuk Yang

    Abstract: This paper is concerned with the existence of global-in-time weak solutions to the multicomponent reactive flows inside a moving domain whose shape in time is prescribed. The flow is governed by the 3D compressible Navier-Stokes-Fourier system coupled with the equations of species mass fractions. The fluid velocity is supposed to fulfill the complete slip boundary condition, whereas the heat flux…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2307.09348

  17. arXiv:2407.02178  [pdf]

    stat.ME

    Reverse time-to-death as time-scale in time-to-event analysis for studies of advanced illness and palliative care

    Authors: Yin Bun Cheung, Xiangmei Ma, Isha Chaudhry, Nan Liu, Qingyuan Zhuang, Grace Meijuan Yang, Chetna Malhotra, Eric Andrew Finkelstein

    Abstract: Background: Incidence of adverse outcome events rises as patients with advanced illness approach end-of-life. Exposures that tend to occur near end-of-life, e.g., use of wheelchair, oxygen therapy and palliative care, may therefore be found associated with the incidence of the adverse outcomes. We propose a strategy for time-to-event analysis to mitigate the time-varying confounding. Methods: We p…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 22 pages (including 2 tables and 2 figures)

  18. arXiv:2407.02123  [pdf, other]

    cs.CV

    Hybrid Feature Collaborative Reconstruction Network for Few-Shot Fine-Grained Image Classification

    Authors: Shulei Qiu, Wanqi Yang, Ming Yang

    Abstract: Our research focuses on few-shot fine-grained image classification, which faces two major challenges: appearance similarity of fine-grained objects and limited number of samples. To preserve the appearance details of images, traditional feature reconstruction networks usually enhance the representation ability of key features by spatial feature reconstruction and minimizing the reconstruction erro…

    Submitted 2 July, 2024; originally announced July 2024.

  19. arXiv:2407.01290  [pdf, other]

    cs.LG cs.AI

    Hypformer: Exploring Efficient Hyperbolic Transformer Fully in Hyperbolic Space

    Authors: Menglin Yang, Harshit Verma, Delvin Ce Zhang, Jiahong Liu, Irwin King, Rex Ying

    Abstract: Hyperbolic geometry has shown significant potential in modeling complex structured data, particularly those with underlying tree-like and hierarchical structures. Despite the impressive performance of various hyperbolic neural networks across numerous domains, research on adapting the Transformer to hyperbolic space remains limited. Previous attempts have mainly focused on modifying self-attentio…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: KDD 2024

  20. arXiv:2407.01267  [pdf]

    cs.CE

    Generalized Orbicular (m,n,o) T-Spherical Fuzzy Sets with Hamacher Aggregation Operators and Application to Multi-Criteria Group Decision Making

    Authors: Yasir Akhtar, Mehboob Ali, Miin-Shen Yang

    Abstract: This paper introduces a novel approach to enhance uncertainty representation, offering decision-makers a more comprehensive perspective for improved decision-making outcomes. We propose Generalized Orbicular (m,n,o) T-Spherical Fuzzy Set (GO-TSFS), a flexible extension of existing fuzzy set models including Globular T-spherical fuzzy sets (G-TSFSs), T-spherical fuzzy sets (T-SFSs), (p,q,r) Spheric…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 47 pages, 7 figures, 10 tables

  21. arXiv:2407.01247  [pdf, ps, other]

    cs.CV

    Multi-level Reliable Guidance for Unpaired Multi-view Clustering

    Authors: Like Xin, Wanqi Yang, Lei Wang, Ming Yang

    Abstract: In this paper, we address the challenging problem of unpaired multi-view clustering (UMC), aiming to perform effective joint clustering using unpaired observed samples across multiple views. Commonly, traditional incomplete multi-view clustering (IMC) methods often depend on paired samples to capture complementary information between views. However, the strategy becomes impractical in UMC due to t…

    Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  22. arXiv:2407.00979  [pdf, other]

    cs.CV

    Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval

    Authors: Hanwen Su, Ge Song, Kai Huang, Jiyan Wang, Ming Yang

    Abstract: In this paper, we study the problem of zero-shot sketch-based image retrieval (ZS-SBIR). The prior methods tackle the problem in a two-modality setting with only category labels or even no textual information involved. However, the growing prevalence of Large-scale pre-trained Language Models (LLMs), which have demonstrated great knowledge learned from web-scale data, can provide us with an opport…

    Submitted 1 July, 2024; originally announced July 2024.

  23. arXiv:2407.00088  [pdf, other]

    cs.DC cs.AI

    T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge

    Authors: Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang

    Abstract: The deployment of Large Language Models (LLMs) on edge devices is increasingly important to enhance on-device intelligence. Weight quantization is crucial for reducing the memory footprint of LLMs on devices. However, low-bit LLMs necessitate mixed precision matrix multiplication (mpGEMM) of low precision weights and high precision activations during inference. Existing systems, lacking native sup…

    Submitted 25 June, 2024; originally announced July 2024.

  24. arXiv:2406.19584  [pdf, ps, other]

    math.CO

    The Planar Turán Number of $Θ_6$-graphs

    Authors: David Guan, Ervin Győri, Diep Luong-Le, Felicia Wang, Mengyuan Yang

    Abstract: There are two particular $Θ_6$-graphs - the 6-cycle graphs with a diagonal. We find the planar Turán number of each of them, i.e. the maximum number of edges in a planar graph $G$ of $n$ vertices not containing the given $Θ_6$ as a subgraph and we find infinitely many extremal constructions showing the sharpness of these results - apart from a small additive constant error in one of the cases.

    Submitted 27 June, 2024; originally announced June 2024.

  25. arXiv:2406.19369  [pdf, other]

    cs.CV

    Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model

    Authors: Haobo Yuan, Xiangtai Li, Lu Qi, Tao Zhang, Ming-Hsuan Yang, Shuicheng Yan, Chen Change Loy

    Abstract: Transformer-based segmentation methods face the challenge of efficient inference when dealing with high-resolution images. Recently, several linear attention architectures, such as Mamba and RWKV, have attracted much attention as they can process long sequences efficiently. In this work, we focus on designing an efficient segment-anything model by exploring these different architectures. Specifica…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 16 pages; 8 figures

  26. arXiv:2406.18294  [pdf, other]

    cs.CL

    Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs

    Authors: Lei Zhang, Yunshui Li, Jiaming Li, Xiaobo Xia, Jiaxi Yang, Run Luo, Minzheng Wang, Longze Chen, Junhao Liu, Min Yang

    Abstract: Some recently developed code large language models (Code LLMs) have been pre-trained on repository-level code data (Repo-Code LLMs), enabling these models to recognize repository structures and utilize cross-file information for code completion. However, in real-world development scenarios, simply concatenating the entire code repository often exceeds the context window limits of these Repo-Code L…

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  27. arXiv:2406.17419  [pdf, other]

    cs.CL cs.AI

    Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

    Authors: Minzheng Wang, Longze Chen, Cheng Fu, Shengyi Liao, Xinghua Zhang, Bingli Wu, Haiyang Yu, Nan Xu, Lei Zhang, Run Luo, Yunshui Li, Min Yang, Fei Huang, Yongbin Li

    Abstract: Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-context windows. Meanwhile, benchmarks for evaluating long-context LLMs are gradually catching up. However, existing benchmarks employ irrelevant noise texts to artificially extend the length of test cases, diverging from the real-world scenarios of long-contex…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: We release our code and data publicly at https://github.com/MozerWang/Loong

  28. arXiv:2406.17005  [pdf, other]

    cs.CV

    PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

    Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo, et al. (12 additional authors not shown)

    Abstract: Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks, Complex Video Object Segmentation Track based on MOSE dataset and Motion Expression guided Video Segmentation track based on MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

  29. arXiv:2406.16871  [pdf, other]

    eess.SY

    Neural network based model predictive control of voltage for a polymer electrolyte fuel cell system with constraints

    Authors: Xiufei Li, Miao Yang, Yuanxin Qi, Miao Zhang

    Abstract: A fuel cell system must output a steady voltage as a power source in practical use. A neural network (NN) based model predictive control (MPC) approach is developed in this work to regulate the fuel cell output voltage with safety constraints. The developed NN MPC controller stabilizes the polymer electrolyte fuel cell system's output voltage by controlling the hydrogen and air flow rates at the s…

    Submitted 24 March, 2024; originally announced June 2024.

  30. arXiv:2406.16562  [pdf, other]

    cs.CV cs.CL

    EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned Data for Evaluating Text-to-Image Models

    Authors: Zhiyu Tan, Xiaomeng Yang, Luozheng Qin, Mengping Yang, Cheng Zhang, Hao Li

    Abstract: The recent advancements in text-to-image generative models have been remarkable. Yet, the field suffers from a lack of evaluation metrics that accurately reflect the performance of these models, particularly lacking fine-grained metrics that can guide the optimization of the models. In this paper, we propose EvalAlign, a metric characterized by its accuracy, stability, and fine granularity. Our ap…

    Submitted 26 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: Github Repository: https://github.com/SAIS-FUXI/EvalAlign

  31. Crowd-Sourced NeRF: Collecting Data from Production Vehicles for 3D Street View Reconstruction

    Authors: Tong Qin, Changze Li, Haoyang Ye, Shaowei Wan, Minzhen Li, Hongwei Liu, Ming Yang

    Abstract: Recently, Neural Radiance Fields (NeRF) achieved impressive results in novel view synthesis. Block-NeRF showed the capability of leveraging NeRF to build large city-scale models. For large-scale modeling, a mass of image data is necessary. Collecting images from specially designed data-collection vehicles can not support large-scale applications. How to acquire massive high-quality data remains an…

    Submitted 23 June, 2024; originally announced June 2024.

  32. DuMapNet: An End-to-End Vectorization System for City-Scale Lane-Level Map Generation

    Authors: Deguo Xia, Weiming Zhang, Xiyan Liu, Wei Zhang, Chenting Gong, Jizhou Huang, Mengmeng Yang, Diange Yang

    Abstract: Generating city-scale lane-level maps faces significant challenges due to the intricate urban environments, such as blurred or absent lane markings. Additionally, a standard lane-level map requires a comprehensive organization of lane groupings, encompassing lane direction, style, boundary, and topology, yet has not been thoroughly examined in prior research. These obstacles result in labor-intens…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024, camera-ready version

  33. arXiv:2406.13443  [pdf, other]

    cs.CL

    Dual-Phase Accelerated Prompt Optimization

    Authors: Muchen Yang, Moxin Li, Yongle Li, Zijun Chen, Chongming Gao, Junqi Zhang, Yangyang Li, Fuli Feng

    Abstract: Gradient-free prompt optimization methods have made significant strides in enhancing the performance of closed-source Large Language Models (LLMs) across a wide range of tasks. However, existing approaches make light of the importance of high-quality prompt initialization and the identification of effective optimization directions, thus resulting in substantial optimization steps to obtain satisfa…

    Submitted 19 June, 2024; originally announced June 2024.

  34. arXiv:2406.13381  [pdf, other]

    cs.CL

    CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration

    Authors: Xinming Hou, Mingming Yang, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Wayne Xin Zhao

    Abstract: Existing LLMs exhibit remarkable performance on various NLP tasks, but still struggle with complex real-world tasks, even equipped with advanced strategies like CoT and ReAct. In this work, we propose the CoAct framework, which transfers the hierarchical planning and collaboration patterns in human society to LLM systems. Specifically, our CoAct framework involves two agents: (1) A global planning…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 9 pages, 4 figures

  35. arXiv:2406.12596  [pdf, ps, other]

    eess.SP

    Beyond Near-Field: Far-Field Location Division Multiple Access in Downlink MIMO Systems

    Authors: Haoyan Liu, Caijian Jie, Min Yang, Chengguang Li

    Abstract: Exploring channel dimensions has been the driving force behind breakthroughs in successive generations of mobile communication systems. In 5G, space division multiple access (SDMA) leveraging massive MIMO has been crucial in enhancing system capacity through spatial differentiation of users. However, SDMA can only finely distinguish users at adjacent angles in ultra-dense networks by extremely lar…

    Submitted 18 June, 2024; originally announced June 2024.

  36. arXiv:2406.12072  [pdf, other]

    cs.AI cs.LG

    DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs

    Authors: Jiasheng Zhang, Jialin Chen, Menglin Yang, Aosong Feng, Shuang Liang, Jie Shao, Rex Ying

    Abstract: Dynamic text-attributed graphs (DyTAGs) are prevalent in various real-world scenarios, where each node and edge are associated with text descriptions, and both the graph structure and text descriptions evolve over time. Despite their broad applicability, there is a notable scarcity of benchmark datasets tailored to DyTAGs, which hinders the potential advancement in many research fields. To address…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 28 pages, 13 figures

  37. arXiv:2406.11472  [pdf, other]

    cs.CV

    Learning from Exemplars for Interactive Image Segmentation

    Authors: Kun Li, Hao Cheng, George Vosselman, Michael Ying Yang

    Abstract: Interactive image segmentation enables users to interact minimally with a machine, facilitating the gradual refinement of the segmentation mask for a target of interest. Previous studies have demonstrated impressive performance in extracting a single target mask through interactive segmentation. However, the information cues of previously interacted objects have been overlooked in the existing met…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Under review

  38. arXiv:2406.11191  [pdf, other]

    cs.CL

    A Survey on Human Preference Learning for Large Language Models

    Authors: Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang

    Abstract: The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning, enhancing LLMs with excellent applicability and effectiveness in a wide range of contexts. Despite the numerous related studies conducted, a perspective on how human preferences are introduced into LLMs remains limited, which ma…

    Submitted 18 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: IEEE copyright statement added (also applied to the former version)

  39. arXiv:2406.10137  [pdf, ps, other]

    cs.IT cs.LG eess.SP

    Compressed Sensor Caching and Collaborative Sparse Data Recovery with Anchor Alignment

    Authors: Yi-Jen Yang, Ming-Hsun Yang, Jwo-Yuh Wu, Y. -W. Peter Hong

    Abstract: This work examines the compressed sensor caching problem in wireless sensor networks and devises efficient distributed sparse data recovery algorithms to enable collaboration among multiple caches. In this problem, each cache is only allowed to access measurements from a small subset of sensors within its vicinity to reduce both cache size and data acquisition overhead. To enable reliable data rec…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: v1 was submitted to IEEE Transactions on Signal Processing on Sept. 18, 2023

  40. arXiv:2406.08790  [pdf, ps, other]

    quant-ph

    Direct generation of multi-photon hyperentanglement

    Authors: Peng Zhao, Jia-Wei Ying, Meng-Ying Yang, Wei Zhong, Ming-Ming Du, Shu-Ting Shen, Yun-Xi Li, An-Lei Zhang, Lan Zhou, Yu-Bo Sheng

    Abstract: Multi-photon hyperentanglement is of fundamental importance in optical quantum information processing. Existing theory and experiment producing multi-photon hyperentangled states have until now relied on the outcome post-selection, a procedure where only the measurement results corresponding to the desired state are considered. Such approach severely limits the usefulness of the resulting hyperenta…

    Submitted 12 June, 2024; originally announced June 2024.

  41. arXiv:2406.08698  [pdf, other]

    astro-ph.HE hep-ph

    Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations

    Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

    Abstract: In this work we try to search for signals generated by ultra-heavy dark matter at the Large High Altitude Air Shower Observatory (LHAASO) data. We look for possible gamma-ray by dark matter annihilation or decay from 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter which have low fluxes…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 17 pages, 12 figures, accepted by PRL

  42. arXiv:2406.07595  [pdf, other]

    cs.CR cs.AI cs.SE

    VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models

    Authors: Yu Liu, Lang Gao, Mingxin Yang, Yu Xie, Ping Chen, Xiaojin Zhang, Wei Chen

    Abstract: Large Language Models (LLMs) have training corpora containing large amounts of program code, greatly improving the model's code comprehension and generation capabilities. However, sound comprehensive research on detecting program vulnerabilities, a more specific task related to code, and evaluating the performance of LLMs in this more specialized scenario is still lacking. To address common challe…

    Submitted 24 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  43. arXiv:2406.07239  [pdf, other]

    cs.CL

    On the Hallucination in Simultaneous Machine Translation

    Authors: Meizhi Zhong, Kehai Chen, Zhengshan Xue, Lemao Liu, Mingming Yang, Min Zhang

    Abstract: It is widely known that hallucination is a critical issue in Simultaneous Machine Translation (SiMT) due to the absence of source-side information. While many efforts have been made to enhance performance for SiMT, few of them attempt to understand and analyze hallucination in SiMT. Therefore, we conduct a comprehensive analysis of hallucination in SiMT from two perspectives: understanding the dis…

    Submitted 11 June, 2024; originally announced June 2024.

  44. arXiv:2406.07232  [pdf, other]

    cs.CL cs.AI

    DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation through Dual Learning Feedback Mechanisms

    Authors: Andong Chen, Lianzhang Lou, Kehai Chen, Xuefeng Bai, Yang Xiang, Muyun Yang, Tiejun Zhao, Min Zhang

    Abstract: Recently, large language models (LLMs) enhanced by self-reflection have achieved promising performance on machine translation. The key idea is guiding LLMs to generate translation with human-like feedback. However, existing self-reflection methods lack effective feedback information, limiting the translation performance. To address this, we introduce a DUAL-REFLECT framework, leveraging the dual l…

    Submitted 21 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 main conference

  45. arXiv:2406.07054  [pdf, other]

    cs.CL cs.AI

    CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation

    Authors: Renhao Li, Minghuan Tan, Derek F. Wong, Min Yang

    Abstract: In recent years, instruction fine-tuning (IFT) on large language models (LLMs) has garnered considerable attention to enhance model performance on unseen tasks. Attempts have been made on automatic construction and effective selection for IFT data. However, we posit that previous methods have not fully harnessed the potential of LLMs for enhancing data quality. The responses within IFT data could…

    Submitted 11 June, 2024; originally announced June 2024.

  46. arXiv:2406.07042  [pdf, other]

    cs.CV

    EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network

    Authors: Yining Shi, Kun Jiang, Ke Wang, Kangan Qian, Yunlong Wang, Jiusi Li, Tuopu Wen, Mengmeng Yang, Yiliang Xu, Diange Yang

    Abstract: 3D occupancy prediction (Occ) is a rapidly rising, challenging perception task in the field of autonomous driving which represents the driving scene as uniformly partitioned 3D voxel grids with semantics. Compared to 3D object detection, grid perception has the great advantage of better recognizing irregularly shaped, unknown category, or partially occluded general objects. However, existing 3D occupan…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: preprint under review

  47. arXiv:2406.07037  [pdf, other]

    cs.CV

    PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving

    Authors: Yining Shi, Jiusi Li, Kun Jiang, Ke Wang, Yunlong Wang, Mengmeng Yang, Diange Yang

    Abstract: Vision-centric occupancy networks, which represent the surrounding environment with uniform voxels with semantics, have become a new trend for safe driving of camera-only autonomous driving perception systems, as they are able to detect obstacles regardless of their shape and occlusion. Modern occupancy networks mainly focus on reconstructing visible voxels from object surfaces with voxel-wise sem…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 3dv2024

  48. arXiv:2406.05862  [pdf, other]

    cs.CL cs.AI cs.CV

    II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

    Authors: Ziqiang Liu, Feiteng Fang, Xi Feng, Xinrun Du, Chenhao Zhang, Zekun Wang, Yuelin Bai, Qixuan Zhao, Liyang Fan, Chengguang Gan, Hongquan Lin, Jiaming Li, Yuansheng Ni, Haihong Wu, Yaswanth Narsupalli, Zhigang Zheng, Chengming Li, Xiping Hu, Ruifeng Xu, Xiaojun Chen, Min Yang, Jiaheng Liu, Ruibo Liu, Wenhao Huang, Ge Zhang, et al. (1 additional author not shown)

    Abstract: The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap,…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 100 pages, 82 figures, add citations

  49. arXiv:2406.04973  [pdf, other]

    quant-ph physics.optics

    Characterizing Biphoton Spatial Wave Function Dynamics with Quantum Wavefront Sensing

    Authors: Yi Zheng, Zhao-Di Liu, Rui-Heng Miao, Jin-Ming Cui, Mu Yang, Xiao-Ye Xu, Jin-Shi Xu, Chuan-Feng Li, Guang-Can Guo

    Abstract: With an extremely high dimensionality, the spatial degree of freedom of entangled photons is a key tool for quantum foundation and applied quantum techniques. To fully utilize the feature, the essential task is to experimentally characterize the multiphoton spatial wave function including the entangled amplitude and phase information at different evolutionary stages. However, there is no effective…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Main text: 6 pages, 4 figures; Supplemental Material: 13 pages, 11 figures. Accepted by Physical Review Letters: https://journals.aps.org/prl/accepted/da075Yd1Ae71fc8532571485c61018959de0ae8f7

  50. arXiv:2406.04619  [pdf, other]

    cs.LG stat.ML

    CTSyn: A Foundational Model for Cross Tabular Data Generation

    Authors: Xiaofeng Lin, Chenheng Xu, Matthew Yang, Guang Cheng

    Abstract: Generative Foundation Models (GFMs) have produced synthetic data with remarkable quality in modalities such as images and text. However, applying GFMs to tabular data poses significant challenges due to the inherent heterogeneity of table features. Existing cross-table learning frameworks are hindered by the absence of both a generative model backbone and a decoding mechanism for heterogeneous fea…

    Submitted 7 June, 2024; originally announced June 2024.