Showing 1–50 of 1,819 results for author: Yang, D

  1. arXiv:2407.09285  [pdf, other]

    cs.CV

    MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction: Methods and Results

    Authors: Jiangpeng He, Yuhao Chen, Gautham Vinod, Talha Ibn Mahmud, Fengqing Zhu, Edward Delp, Alexander Wong, Pengcheng Xi, Ahmad AlMughrabi, Umair Haroon, Ricardo Marques, Petia Radeva, Jiadong Tang, Dianyi Yang, Yu Gao, Zhaoxiang Liang, Yawei Jueluo, Chengyu Shi, Pengyu Wang

    Abstract: The increasing interest in computer vision applications for nutrition and dietary monitoring has led to the development of advanced 3D reconstruction techniques for food items. However, the scarcity of high-quality data and limited collaboration between industry and academia have constrained progress in this field. Building on recent advancements in 3D reconstruction, we host the MetaFood Workshop…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Technical report for MetaFood CVPR 2024 Challenge on Physically Informed 3D Food Reconstruction. arXiv admin note: substantial text overlap with arXiv:2407.01717

  2. arXiv:2407.08926  [pdf, other]

    cs.IR

    Toward Automatic Group Membership Annotation for Group Fairness Evaluation

    Authors: Fumian Chen, Dayu Yang, Hui Fang

    Abstract: With the increasing research attention on fairness in information retrieval systems, more and more fairness-aware algorithms have been proposed to ensure fairness for a sustainable and healthy retrieval ecosystem. However, as the most adopted measurement of fairness-aware algorithms, group fairness evaluation metrics, require group membership information that needs massive human annotations and is…

    Submitted 11 July, 2024; originally announced July 2024.

    Journal ref: NLDB2024

  3. arXiv:2407.07026  [pdf, other]

    cs.CV cs.CL cs.MM cs.SI

    Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition

    Authors: Daiqing Wu, Dongbao Yang, Huawen Shen, Can Ma, Yu Zhou

    Abstract: With the proliferation of social media posts in recent years, the need to detect sentiments in multimodal (image-text) content has grown rapidly. Since posts are user-generated, the image and text from the same post can express different or even contradictory sentiments, leading to potential sentiment discrepancy. However, existing works mainly adopt a single-branch fusion structure that…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures

  4. arXiv:2407.06084  [pdf, other]

    cs.CV

    3D Vision and Language Pretraining with Large-Scale Synthetic Data

    Authors: Dejie Yang, Zhu Xu, Wentao Mo, Qingchao Chen, Siyuan Huang, Yang Liu

    Abstract: 3D Vision-Language Pre-training (3D-VLP) aims to provide a pre-train model which can bridge 3D scenes with natural language, which is an important technique for embodied intelligence. However, current 3D-VLP datasets are hindered by limited scene-level diversity and insufficient fine-grained annotations (only 1.2K scenes and 280K textual annotations in ScanScribe), primarily due to the labor-inten…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: accepted by IJCAI2024

  5. arXiv:2407.05352  [pdf, other]

    cs.CV cs.MM

    Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model

    Authors: Danni Yang, Ruohan Dong, Jiayi Ji, Yiwei Ma, Haowei Wang, Xiaoshuai Sun, Rongrong Ji

    Abstract: Recently, diffusion models have increasingly demonstrated their capabilities in vision understanding. By leveraging prompt-based learning to construct sentences, these models have shown proficiency in classification and visual grounding tasks. However, existing approaches primarily showcase their ability to perform sentence-level localization, leaving the potential for leveraging contextual inform…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  6. arXiv:2407.04963  [pdf, other]

    cs.CV

    Towards Context-Aware Emotion Recognition Debiasing from a Causal Demystification Perspective via De-confounded Training

    Authors: Dingkang Yang, Kun Yang, Haopeng Kuang, Zhaoyu Chen, Yuzheng Wang, Lihua Zhang

    Abstract: Understanding emotions from diverse contexts has received widespread attention in computer vision communities. The core philosophy of Context-Aware Emotion Recognition (CAER) is to provide valuable semantic cues for recognizing the emotions of target persons by leveraging rich contextual information. Current approaches invariably focus on designing sophisticated structures to extract perceptually…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: TPAMI 2024

  7. arXiv:2407.04955  [pdf, other]

    cs.CV

    Asynchronous Multimodal Video Sequence Fusion via Learning Modality-Exclusive and -Agnostic Representations

    Authors: Dingkang Yang, Mingcheng Li, Linhao Qu, Kun Yang, Peng Zhai, Song Wang, Lihua Zhang

    Abstract: Understanding human intentions (e.g., emotions) from videos has received considerable attention recently. Video streams generally constitute a blend of temporal data stemming from distinct modalities, including natural language, facial expressions, and auditory clues. Despite the impressive advancements of previous works via attention-based paradigms, the inherent temporal asynchrony and modality…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: TCSVT 2024

  8. arXiv:2407.03384  [pdf, other]

    physics.flu-dyn cs.CE

    Topological Separation of Vortices

    Authors: Adeel Zafar, Zahra Poorshayegh, Di Yang, Guoning Chen

    Abstract: Vortices and their analysis play a critical role in the understanding of complex phenomena in turbulent flows. Traditional vortex extraction methods, notably region-based techniques, often overlook the entanglement phenomenon, resulting in the inclusion of multiple vortices within a single extracted region. Their separation is necessary for quantifying different types of vortices and their statist…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted for presentation at IEEE Visualization (VIS) 2024 short paper track and will appear in the conference proceedings

  9. arXiv:2407.03103  [pdf, other]

    cs.CL

    Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory

    Authors: Suyeon Lee, Sunghwan Kim, Minju Kim, Dongjin Kang, Dongil Yang, Harim Kim, Minseok Kang, Dayi Jung, Min Hee Kim, Seungbeen Lee, Kyoung-Mee Chung, Youngjae Yu, Dongha Lee, Jinyoung Yeo

    Abstract: Recently, the demand for psychological counseling has significantly increased as more individuals express concerns about their mental health. This surge has accelerated efforts to improve the accessibility of counseling by using large language models (LLMs) as counselors. To ensure client privacy, training open-source LLMs faces a key challenge: the absence of realistic counseling datasets. To add…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Under Review

  10. arXiv:2407.02996  [pdf, other]

    cs.CL cs.AI

    Are Large Language Models Consistent over Value-laden Questions?

    Authors: Jared Moore, Tanvi Deshpande, Diyi Yang

    Abstract: Large language models (LLMs) appear to bias their survey answers toward certain values. Nonetheless, some argue that LLMs are too inconsistent to simulate particular values. Are they? To answer, we first define value consistency as the similarity of answers across (1) paraphrases of one question, (2) related questions under one topic, (3) multiple-choice and open-ended use-cases of one question, a…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 8 pages, 9 figures

  11. arXiv:2407.02725  [pdf, ps, other]

    math.RT math.CT

    Derived preprojective algebras and spherical twist functors

    Authors: Yuya Mizuno, Dong Yang

    Abstract: We study silting objects over derived preprojective algebras of acyclic quivers by giving a direct relationship between silting objects, spherical twist functors and mutations. Especially, for a Dynkin quiver, we establish a bijection between the elements of the braid group and the set of isomorphism classes of basic silting objects over the derived preprojective algebra.

    Submitted 2 July, 2024; originally announced July 2024.

    MSC Class: 16E45; 18G80; 16E35

  12. arXiv:2407.01930  [pdf, other]

    cs.CV

    Self-Cooperation Knowledge Distillation for Novel Class Discovery

    Authors: Yuzheng Wang, Zhaoyu Chen, Dingkang Yang, Yunquan Sun, Lizhe Qi

    Abstract: Novel Class Discovery (NCD) aims to discover unknown and novel classes in an unlabeled set by leveraging knowledge already learned about known classes. Existing works focus on instance-level or class-level knowledge representation and build a shared representation space to achieve performance improvements. However, a long-neglected issue is the potential imbalanced number of samples from known and…

    Submitted 3 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  13. arXiv:2407.01625  [pdf, other]

    math.CO

    Balanced clique subdivisions and cycles lengths in $K_{s, t}$-free graphs

    Authors: Jianfeng Hou, Yindong Jin, Donglei Yang, Fan Yang

    Abstract: Let $ t\ge s\ge2$ be integers. Confirming a conjecture of Mader, Liu and Montgomery [J. Lond. Math. Soc., 2017] showed that every $K_{s, t}$-free graph with average degree $d$ contains a subdivision of a clique with at least $Ω(d^{\frac{s}{2(s-1)}})$ vertices. We give an improvement by showing that such a graph contains a balanced subdivision of a clique with the same order, where a balanced subdi…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2010.15802 by other authors

  14. arXiv:2407.01111  [pdf, other]

    cs.LG cs.AI stat.ML

    Proximity Matters: Local Proximity Preserved Balancing for Treatment Effect Estimation

    Authors: Hao Wang, Zhichao Chen, Yuan Shen, Jiajun Fan, Zhaoran Liu, Degui Yang, Xinggao Liu, Haoxuan Li

    Abstract: Heterogeneous treatment effect (HTE) estimation from observational data poses significant challenges due to treatment selection bias. Existing methods address this bias by minimizing distribution discrepancies between treatment groups in latent space, focusing on global alignment. However, the fruitful aspect of local proximity, where similar units exhibit similar outcomes, is often overlooked. In…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Code is available at https://anonymous.4open.science/status/ncr-B697

  15. arXiv:2407.00870  [pdf, other]

    cs.CL cs.HC

    Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles

    Authors: Ryan Louie, Ananjan Nandi, William Fang, Cheng Chang, Emma Brunskill, Diyi Yang

    Abstract: Recent works leverage LLMs to roleplay realistic social scenarios, aiding novices in practicing their social skills. However, simulating sensitive interactions, such as in mental health, is challenging. Privacy concerns restrict data access, and collecting expert feedback, although vital, is laborious. To address this, we develop Roleplay-doh, a novel human-LLM collaboration pipeline that elicits…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 34 pages, 24 figures, 11 Tables

  16. arXiv:2406.18921  [pdf, other]

    cs.CL

    Capturing Minds, Not Just Words: Enhancing Role-Playing Language Models with Personality-Indicative Data

    Authors: Yiting Ran, Xintao Wang, Rui Xu, Xinfeng Yuan, Jiaqing Liang, Yanghua Xiao, Deqing Yang

    Abstract: Role-playing agents (RPA) have been a popular application area for large language models (LLMs), attracting significant interest from both industry and academia. While existing RPAs well portray the characters' knowledge and tones, they face challenges in capturing their minds, especially for small role-playing language models (RPLMs). In this paper, we propose to enhance RPLMs via personality-indi…

    Submitted 29 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 10 pages

  17. arXiv:2406.17897  [pdf, other]

    eess.IV

    Pixel-weighted Multi-pose Fusion for Metal Artifact Reduction in X-ray Computed Tomography

    Authors: Diyu Yang, Craig A. J. Kemp, Soumendu Majee, Gregery T. Buzzard, Charles A. Bouman

    Abstract: X-ray computed tomography (CT) reconstructs the internal morphology of a three dimensional object from a collection of projection images, most commonly using a single rotation axis. However, for objects containing dense materials like metal, the use of a single rotation axis may leave some regions of the object obscured by the metal, even though projections from other rotation axes (or poses) migh…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE MMSP 2024. arXiv admin note: substantial text overlap with arXiv:2209.07561

  18. arXiv:2406.17271  [pdf, other]

    cs.CL

    DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph

    Authors: Zhehao Zhang, Jiaao Chen, Diyi Yang

    Abstract: The current paradigm of evaluating Large Language Models (LLMs) through static benchmarks comes with significant limitations, such as vulnerability to data contamination and a lack of adaptability to the evolving capabilities of LLMs. Therefore, evaluation methods that can adapt and generate evaluation data with controlled complexity are urgently needed. In this work, we introduce Dynamic Evaluati…

    Submitted 25 June, 2024; originally announced June 2024.

  19. arXiv:2406.17006  [pdf, other]

    hep-ex

    Probing the nature of the $χ_{c1}(3872)$ state using radiative decays

    Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis , et al. (1094 additional authors not shown)

    Abstract: The radiative decays $χ_{c1}(3872)\rightarrowψ(2S)γ$ and $χ_{c1}(3872)\rightarrow J/ψγ$ are used to probe the nature of the $χ_{c1}(3872)$ state using proton-proton collision data collected with the LHCb detector, corresponding to an integrated luminosity of 9 fb$^{-1}$. Using the $B^+\rightarrow χ_{c1}(3872)K^+$ decay, the $χ_{c1}(3872)\rightarrow ψ(2S)γ$ process is observed for the first time and…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 31 pages, 2 figures. All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-015.html (LHCb public pages)

    Report number: LHCb-PAPER-2024-015, CERN-EP-2024-157

  20. arXiv:2406.16992  [pdf, other]

    cs.LG cs.AI

    Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns for Traffic Speed Prediction

    Authors: Yicheng Zhou, Pengfei Wang, Hao Dong, Denghui Zhang, Dingqi Yang, Yanjie Fu, Pengyang Wang

    Abstract: Urban traffic speed prediction aims to estimate the future traffic speed for improving urban transportation services. Enormous efforts have been made to exploit Graph Neural Networks (GNNs) for modeling spatial correlations and temporal dependencies of traffic speed evolving patterns, regularized by graph topology. While achieving promising results, current traffic speed prediction methods still su…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted to IJCAI 2024

  21. arXiv:2406.15769  [pdf, other]

    cs.DC

    Humas: A Heterogeneity- and Upgrade-aware Microservice Auto-scaling Framework in Large-scale Data Centers

    Authors: Qin Hua, Dingyu Yang, Shiyou Qian, Jian Cao, Guangtao Xue, Minglu Li

    Abstract: An effective auto-scaling framework is essential for microservices to ensure performance stability and resource efficiency under dynamic workloads. As revealed by many prior studies, the key to efficient auto-scaling lies in accurately learning performance patterns, i.e., the relationship between performance metrics and workloads in data-driven schemes. However, we notice that there are two signif…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 14 pages; 27 figures

  22. arXiv:2406.15710  [pdf, other]

    quant-ph physics.atom-ph physics.optics

    A photonic quantum engine driven by superradiance

    Authors: Jinuk Kim, Seung-hoon Oh, Daeho Yang, Junki Kim, Moonjoo Lee, Kyungwon An

    Abstract: Performance of nano- and micro-scale heat engines can be improved with a help from quantum mechanical phenomena. Recently, heat reservoirs with quantum coherence have been proposed to enhance engine performance beyond the Carnot limit even with a single reservoir. However, no physical realizations have been achieved so far. Here, we report the first proof-of-principle experimental demonstration of…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures, 1 extended data figure

    Journal ref: Nat. Photon. 16, 707 (2022)

  23. arXiv:2406.14958  [pdf, other]

    cs.CV

    Skip and Skip: Segmenting Medical Images with Prompts

    Authors: Jiawei Chen, Dingkang Yang, Yuxuan Lei, Lihua Zhang

    Abstract: Most medical image lesion segmentation methods rely on hand-crafted accurate annotations of the original image for supervised learning. Recently, a series of weakly supervised or unsupervised methods have been proposed to reduce the dependence on pixel-level annotations. However, these methods are essentially based on pixel-level annotation, ignoring the image-level diagnostic results of the curre…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Work in progress

  24. arXiv:2406.14282  [pdf, other]

    cs.CL cs.AI

    Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

    Authors: Junjie Wang, Mingyang Chen, Binbin Hu, Dan Yang, Ziqi Liu, Yue Shen, Peng Wei, Zhiqiang Zhang, Jinjie Gu, Jun Zhou, Jeff Z. Pan, Wen Zhang, Huajun Chen

    Abstract: Improving the performance of large language models (LLMs) in complex question-answering (QA) scenarios has always been a research focal point. Recent studies have attempted to enhance LLMs' performance by combining step-wise planning with external retrieval. While effective for advanced models like GPT-3.5, smaller LLMs face challenges in decomposing complex questions, necessitating supervised fin…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Work in progress

  25. DuMapNet: An End-to-End Vectorization System for City-Scale Lane-Level Map Generation

    Authors: Deguo Xia, Weiming Zhang, Xiyan Liu, Wei Zhang, Chenting Gong, Jizhou Huang, Mengmeng Yang, Diange Yang

    Abstract: Generating city-scale lane-level maps faces significant challenges due to the intricate urban environments, such as blurred or absent lane markings. Additionally, a standard lane-level map requires a comprehensive organization of lane groupings, encompassing lane direction, style, boundary, and topology, yet has not been thoroughly examined in prior research. These obstacles result in labor-intens…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024, camera-ready version

  26. arXiv:2406.14228  [pdf, other]

    cs.AI

    EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms

    Authors: Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Dongsheng Li, Deqing Yang

    Abstract: The rise of powerful large language models (LLMs) has spurred a new trend in building LLM-based autonomous agents for solving complex tasks, especially multi-agent systems. Despite the remarkable progress, we notice that existing works are heavily dependent on human-designed frameworks, which greatly limits the functional scope and scalability of agent systems. How to automatically extend the spec…

    Submitted 11 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Work in process

  27. arXiv:2406.12111  [pdf, other]

    hep-ex

    Precision measurement of the $Ξ^-_b$ baryon lifetime

    Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis , et al. (1064 additional authors not shown)

    Abstract: A sample of $pp$ collision data, corresponding to an integrated luminosity of 5.5 fb$^{-1}$ and collected by the LHCb experiment during Run 2, is used to measure the ratio of the lifetime of the $Ξ^-_b$ baryon to that of the $Λ^0_b$ baryon, $r_τ\equivτ_{Ξ^-_b}/τ_{Λ^0_b}$. The value ${r_τ^{\rm Run\,2}=1.076\pm0.013\pm0.006}$ is obtained, where the first uncertainty is statistical and the second sys…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures. All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-010.html (LHCb public pages)

    Report number: LHCb-PAPER-2024-010, CERN-EP-2024-139

  28. arXiv:2406.11931  [pdf, other]

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen , et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe…

    Submitted 17 June, 2024; originally announced June 2024.

  29. arXiv:2406.11455  [pdf, other]

    cs.CL cs.AI

    Adaptive Reinforcement Learning Planning: Harnessing Large Language Models for Complex Information Extraction

    Authors: Zepeng Ding, Ruiyang Ke, Wenhao Huang, Guochao Jiang, Yanda Li, Deqing Yang, Yanghua Xiao, Jiaqing Liang

    Abstract: Existing research on large language models (LLMs) shows that they can solve information extraction tasks through multi-step planning. However, their extraction behavior on complex sentences and tasks is unstable, emerging issues such as false positives and missing elements. We observe that decomposing complex extraction tasks and extracting them step by step can effectively improve LLMs' performan…

    Submitted 17 June, 2024; originally announced June 2024.

  30. arXiv:2406.11451  [pdf, other]

    cs.CV

    MedThink: Inducing Medical Large-scale Visual Language Models to Hallucinate Less by Thinking More

    Authors: Yue Jiang, Jiawei Chen, Dingkang Yang, Mingcheng Li, Shunli Wang, Tong Wu, Ke Li, Lihua Zhang

    Abstract: When Large Vision Language Models (LVLMs) are applied to multimodal medical generative tasks, they suffer from significant model hallucination issues. This severely impairs the model's generative accuracy, making it challenging for LVLMs to be implemented in real-world medical scenarios to assist doctors in diagnosis. Enhancing the training data for downstream medical generative tasks is an effect…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  31. arXiv:2406.11375  [pdf, other]

    cs.CL cs.AI

    Boosting Scientific Concepts Understanding: Can Analogy from Teacher Models Empower Student Models?

    Authors: Siyu Yuan, Cheng Jiayang, Lin Qiu, Deqing Yang

    Abstract: Analogical reasoning plays a critical role in human cognition, enabling us to understand new concepts by associating them with familiar ones. Previous research in the AI community has mainly focused on identifying and generating analogies and then examining their quality under human evaluation, which overlooks the practical application of these analogies in real-world settings. Inspired by the hum…

    Submitted 17 June, 2024; originally announced June 2024.

  32. arXiv:2406.10753  [pdf, other]

    astro-ph.CO

    Testing the parametric model for self-interacting dark matter using matched halos in cosmological simulations

    Authors: Daneng Yang, Ethan O. Nadler, Hai-Bo Yu

    Abstract: We systemically evaluate the performance of the self-interacting dark matter (SIDM) halo model proposed in arXiv:2305.16176 with matched halos from high-resolution cosmological CDM and SIDM simulations. The model incorporates SIDM effects along mass evolution histories of CDM halos and it is applicable to both isolated halos and subhalos. We focus on the accuracy of the model in predicting halo d…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 20 pages, 19 figures

  33. arXiv:2406.10527  [pdf, other]

    cs.CV

    Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center

    Authors: Zichen Yu, Changyong Shu, Qianpu Sun, Junjie Linghu, Xiaobao Wei, Jiangyong Yu, Zongdai Liu, Dawei Yang, Hui Li, Yan Chen

    Abstract: Panoptic occupancy poses a novel challenge by aiming to integrate instance occupancy and semantic occupancy within a unified framework. However, there is still a lack of efficient solutions for panoptic occupancy. In this paper, we propose Panoptic-FlashOcc, a straightforward yet robust 2D feature framework that enables realtime panoptic occupancy. Building upon the lightweight design of FlashOcc,…

    Submitted 15 June, 2024; originally announced June 2024.

  34. arXiv:2406.10185  [pdf, other]

    cs.CV

    Detecting and Evaluating Medical Hallucinations in Large Vision Language Models

    Authors: Jiawei Chen, Dingkang Yang, Tong Wu, Yue Jiang, Xiaolu Hou, Mingcheng Li, Shunli Wang, Dongling Xiao, Ke Li, Lihua Zhang

    Abstract: Large Vision Language Models (LVLMs) are increasingly integral to healthcare applications, including medical visual question answering and imaging report generation. While these models inherit the robust capabilities of foundational Large Language Models (LLMs), they also inherit susceptibility to hallucinations-a significant concern in high-stakes medical contexts where the margin for error is mi…

    Submitted 14 June, 2024; originally announced June 2024.

  35. arXiv:2406.10056  [pdf, other]

    cs.SD eess.AS

    UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner

    Authors: Dongchao Yang, Haohan Guo, Yuanyuan Wang, Rongjie Huang, Xiang Li, Xu Tan, Xixin Wu, Helen Meng

    Abstract: The Large Language models (LLMs) have demonstrated supreme capabilities in text understanding and generation, but cannot be directly applied to cross-modal tasks without fine-tuning. This paper proposes a cross-modal in-context learning approach, empowering the frozen LLMs to achieve multiple audio tasks in a few-shot style without any parameter update. Specifically, we propose a novel and LLMs-dr…

    Submitted 14 June, 2024; originally announced June 2024.

  36. arXiv:2406.09264  [pdf, other]

    cs.HC cs.AI cs.CL

    Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

    Authors: Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

    Abstract: Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve th…

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 56 pages

  37. arXiv:2406.08336  [pdf, other]

    cs.SD cs.CV eess.AS

    CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction

    Authors: Xueyuan Chen, Dongchao Yang, Dingdong Wang, Xixin Wu, Zhiyong Wu, Helen Meng

    Abstract: Dysarthric speech reconstruction (DSR) aims to transform dysarthric speech into normal speech. It still suffers from low speaker similarity and poor prosody naturalness. In this paper, we propose a multi-modal DSR model by leveraging neural codec language modeling to improve the reconstruction results, especially for the speaker similarity and prosody naturalness. Our proposed model consists of: (…

    Submitted 24 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  38. arXiv:2406.07042  [pdf, other]

    cs.CV

    EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network

    Authors: Yining Shi, Kun Jiang, Ke Wang, Kangan Qian, Yunlong Wang, Jiusi Li, Tuopu Wen, Mengmeng Yang, Yiliang Xu, Diange Yang

    Abstract: 3D occupancy prediction (Occ) is a rapidly rising challenging perception task in the field of autonomous driving which represents the driving scene as uniformly partitioned 3D voxel grids with semantics. Compared to 3D object detection, grid perception has great advantage of better recognizing irregularly shaped, unknown category, or partially occluded general objects. However, existing 3D occupan…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: preprint under review

  39. arXiv:2406.07037  [pdf, other]

    cs.CV

    PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving

    Authors: Yining Shi, Jiusi Li, Kun Jiang, Ke Wang, Yunlong Wang, Mengmeng Yang, Diange Yang

    Abstract: Vision-centric occupancy networks, which represent the surrounding environment with uniform voxels with semantics, have become a new trend for safe driving of camera-only autonomous driving perception systems, as they are able to detect obstacles regardless of their shape and occlusion. Modern occupancy networks mainly focus on reconstructing visible voxels from object surfaces with voxel-wise sem…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 3dv2024

  40. arXiv:2406.06840  [pdf, other]

    cs.CL cs.LG

    Silent Signals, Loud Impact: LLMs for Word-Sense Disambiguation of Coded Dog Whistles

    Authors: Julia Kruk, Michela Marchini, Rijul Magu, Caleb Ziems, David Muchlinski, Diyi Yang

    Abstract: A dog whistle is a form of coded communication that carries a secondary meaning to specific audiences and is often weaponized for racial and socioeconomic discrimination. Dog whistling historically originated from United States politics, but in recent years has taken root in social media as a means of evading hate speech detection systems and maintaining plausible deniability. In this paper, we pr…

    Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: ACL 2024

    ACM Class: J.4; K.4.1; K.4.2

  41. arXiv:2406.06329  [pdf, other]

    cs.CL eess.AS

    A Parameter-efficient Language Extension Framework for Multilingual ASR

    Authors: Wei Liu, Jingyong Hou, Dong Yang, Muyong Cao, Tan Lee

    Abstract: Covering all languages with a multilingual speech recognition model (MASR) is very difficult. Performing language extension on top of an existing MASR is a desirable choice. In this study, the MASR continual learning problem is probabilistically decomposed into language identity prediction (LP) and cross-lingual adaptation (XLA) sub-problems. Based on this, we propose an architecture-based framewo…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  42. arXiv:2406.06040  [pdf, other]

    cs.CV

    Vript: A Video Is Worth Thousands of Words

    Authors: Dongjie Yang, Suyuan Huang, Chengqiang Lu, Xiaodong Han, Haoxin Zhang, Yan Gao, Yao Hu, Hai Zhao

    Abstract: Advancements in multimodal learning, particularly in video understanding and generation, require high-quality video-text datasets for improved model performance. Vript addresses this issue with a meticulously annotated corpus of 12K high-resolution videos, offering detailed, dense, and script-like captions for over 420K clips. Each clip has a caption of ~145 words, which is over 10x longer than mo…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: submitted to NeurIPS Dataset & Benchmark track

  43. arXiv:2406.05447  [pdf, other]

    astro-ph.IM astro-ph.EP astro-ph.SR

    The PLATO Mission

    Authors: Heike Rauer, Conny Aerts, Juan Cabrera, Magali Deleuil, Anders Erikson, Laurent Gizon, Mariejo Goupil, Ana Heras, Jose Lorenzo-Alvarez, Filippo Marliani, Cesar Martin-Garcia, J. Miguel Mas-Hesse, Laurence O'Rourke, Hugh Osborn, Isabella Pagano, Giampaolo Piotto, Don Pollacco, Roberto Ragazzoni, Gavin Ramsay, Stéphane Udry, Thierry Appourchaux, Willy Benz, Alexis Brandeker, Manuel Güdel, Eduardo Janot-Pacheco, et al. (801 additional authors not shown)

    Abstract: PLATO (PLAnetary Transits and Oscillations of stars) is ESA's M3 mission designed to detect and characterise extrasolar planets and perform asteroseismic monitoring of a large number of stars. PLATO will detect small planets (down to <2 R_(Earth)) around bright stars (<11 mag), including terrestrial planets in the habitable zone of solar-like stars. With the complement of radial velocity observati…

    Submitted 8 June, 2024; originally announced June 2024.

  44. arXiv:2406.05285  [pdf, other]

    cs.CV

    VISTA3D: Versatile Imaging SegmenTation and Annotation model for 3D Computed Tomography

    Authors: Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu, Wenqi Li

    Abstract: Segmentation foundation models have attracted great interest, however, none of them are adequate enough for the use cases in 3D computed tomography scans (CT) images. Existing works finetune on medical images with 2D foundation models trained on natural images, but interactive segmentation, especially in 2D, is too time-consuming for 3D scans and less useful for large cohort analysis. Models that…

    Submitted 7 June, 2024; originally announced June 2024.

  45. arXiv:2406.04784  [pdf, other]

    cs.CL cs.AI

    SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals

    Authors: Ruihan Yang, Jiangjie Chen, Yikai Zhang, Siyu Yuan, Aili Chen, Kyle Richardson, Yanghua Xiao, Deqing Yang

    Abstract: Language agents powered by large language models (LLMs) are increasingly valuable as decision-making tools in domains such as gaming and programming. However, these agents often face challenges in achieving high-level goals without detailed instructions and in adapting to environments where feedback is delayed. In this paper, we present SelfGoal, a novel automatic approach designed to enhance agen…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Preprint

  46. arXiv:2406.04542  [pdf, other]

    cs.CV cs.GR

    M&M VTO: Multi-Garment Virtual Try-On and Editing

    Authors: Luyang Zhu, Yingwei Li, Nan Liu, Hao Peng, Dawei Yang, Ira Kemelmacher-Shlizerman

    Abstract: We present M&M VTO, a mix and match virtual try-on method that takes as input multiple garment images, text description for garment layout and an image of a person. An example input includes: an image of a shirt, an image of a pair of pants, "rolled sleeves, shirt tucked in", and an image of a person. The output is a visualization of how those garments (in the desired layout) would look like on th…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Highlight. Project website: https://mmvto.github.io/

  47. arXiv:2406.04298  [pdf, other]

    cs.IR cs.CL

    Measuring and Addressing Indexical Bias in Information Retrieval

    Authors: Caleb Ziems, William Held, Jane Dwivedi-Yu, Diyi Yang

    Abstract: Information Retrieval (IR) systems are designed to deliver relevant content, but traditional systems may not optimize rankings for fairness, neutrality, or the balance of ideas. Consequently, IR can often introduce indexical biases, or biases in the positional order of documents. Although indexical bias can demonstrably affect people's opinion, voting patterns, and other behaviors, these issues re…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: ACL 2024

  48. arXiv:2406.04151  [pdf, other]

    cs.AI cs.CL

    AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

    Authors: Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Dingwen Yang, Chenyang Liao, Xin Guo, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang

    Abstract: Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community. Large language models (LLMs) are considered a promising foundation to build such agents due to their generalized capabilities. Current approaches either have LLM-based agents imitate expert-provided trajectories step-by-step, requiring human supervis…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project site: https://agentgym.github.io

  49. arXiv:2406.03387  [pdf, other]

    hep-ex

    Measurement of the branching fraction ratios $R(D^{+})$ and $R(D^{*+})$ using muonic $τ$ decays

    Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1063 additional authors not shown)

    Abstract: The branching fraction ratios of $\overline{B}^0\to D^+τ^-\overlineν_τ$ and $\overline{B}^0\to D^{*+}τ^-\overlineν_τ$ decays are measured with respect to their muonic counterparts, using a data sample corresponding to an integrated luminosity of 2.0 fb$^{-1}$ collected by the LHCb experiment in proton-proton collisions at $\sqrt{s} = 13$ TeV. The reconstructed final states are formed by combining…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://lhcbproject.web.cern.ch/Publications/LHCbProjectPublic/LHCb-PAPER-2024-007.html (LHCb public pages)

    Report number: LHCb-PAPER-2024-007, CERN-EP-2024-125

  50. arXiv:2406.03156  [pdf, other]

    hep-ex

    Observation of new charmonium(-like) states in $B^+ \to D^{*\pm} D^{\mp} K^+$ decays

    Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis, et al. (1062 additional authors not shown)

    Abstract: A study of resonant structures in $B^{+}\rightarrow{D^{\ast+}D^{-}K^{+}}$ and $B^{+}\rightarrow{D^{\ast-}D^{+}K^{+}}$ decays is performed, using proton-proton collision data at centre-of-mass energies of $\sqrt{s}=7, 8$, and $13$ TeV recorded by the LHCb experiment, corresponding to an integrated luminosity of 9 fb$^{-1}$. A simultaneous amplitude fit is performed to the two channels with contribu…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2023-047.html (LHCb public pages)

    Report number: LHCb-PAPER-2023-047, CERN-EP-2024-096