
Showing 1–50 of 357 results for author: Jin, Q

  1. arXiv:2407.00431  [pdf, other]

    cs.CV

    Location embedding based pairwise distance learning for fine-grained diagnosis of urinary stones

    Authors: Qiangguo Jin, Jiapeng Huang, Changming Sun, Hui Cui, Ping Xuan, Ran Su, Leyi Wei, Yu-Jie Wu, Chia-An Wu, Henry B. L. Duh, Yueh-Hsun Lu

    Abstract: The precise diagnosis of urinary stones is crucial for devising effective treatment strategies. The diagnostic process, however, is often complicated by the low contrast between stones and surrounding tissues, as well as the variability in stone locations across different patients. To address this issue, we propose a novel location embedding based pairwise distance learning network (LEPD-Net) that…

    Submitted 29 June, 2024; originally announced July 2024.

    Journal ref: MICCAI 2024

  2. arXiv:2406.17755  [pdf, other]

    cs.CL

    Accelerating Clinical Evidence Synthesis with Large Language Models

    Authors: Zifeng Wang, Lang Cao, Benjamin Danek, Yichi Zhang, Qiao Jin, Zhiyong Lu, Jimeng Sun

    Abstract: Automatic medical discovery by AI is a dream of many. One step toward that goal is to create an AI model to understand clinical studies and synthesize clinical evidence from the literature. Clinical evidence synthesis currently relies on systematic reviews of clinical trials and retrospective analyses from medical literature. However, the rapid expansion of publications presents challenges in effi…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2406.16814  [pdf, other]

    math.NA math.OC

    Convergence analysis of a stochastic heavy-ball method for linear ill-posed problems

    Authors: Qinian Jin, Yanjun Liu

    Abstract: In this paper we consider a stochastic heavy-ball method for solving linear ill-posed inverse problems. With suitable choices of the step-sizes and the momentum coefficients, we establish the regularization property of the method under a priori selection of the stopping index and derive the rate of convergence under a benchmark source condition on the sought solution. Numerical results are p…

    Submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2406.16578  [pdf, other]

    cs.RO cs.AI

    QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds

    Authors: Ye Wang, Yuting Mei, Sipeng Zheng, Qin Jin

    Abstract: While pets offer companionship, their limited intelligence restricts advanced reasoning and autonomous interaction with humans. Considering this, we propose QuadrupedGPT, a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet. To achieve this goal, the primary challenges include: i) effectively leveraging multimodal observations for decision-ma…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Under review

  5. arXiv:2406.16537  [pdf, other]

    cs.CV cs.AI

    Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization

    Authors: Yuhang Ma, Wenting Xu, Jiji Tang, Qinfeng Jin, Rongsheng Zhang, Zeng Zhao, Changjie Fan, Zhipeng Hu

    Abstract: Customized image generation, which seeks to synthesize images with consistent characters, holds significant relevance for applications such as storytelling, portrait generation, and character design. However, previous approaches have encountered challenges in preserving characters with high-fidelity consistency due to inadequate feature extraction and concept confusion of reference characters. The…

    Submitted 3 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  6. arXiv:2406.16301  [pdf, other]

    cs.CV cs.AI cs.MM

    UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos

    Authors: Yuting Mei, Linli Yao, Qin Jin

    Abstract: With the surge in the amount of video data, video summarization techniques, including visual-modal (VM) and textual-modal (TM) summarization, are attracting more and more attention. However, unimodal summarization inevitably loses the rich semantics of the video. In this paper, we focus on a more comprehensive video summarization task named Bimodal Semantic Summarization of Videos (BiSSV). Specifica…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted by ACM International Conference on Multimedia Retrieval (ICMR'24)

    Journal ref: Proceedings of the 2024 International Conference on Multimedia Retrieval, May 2024, Pages 1034-1042

  7. arXiv:2406.12259  [pdf]

    cs.AI

    Adversarial Attacks on Large Language Models in Medicine

    Authors: Yifan Yang, Qiao Jin, Furong Huang, Zhiyong Lu

    Abstract: The integration of Large Language Models (LLMs) into healthcare applications offers promising advancements in medical diagnostics, treatment recommendations, and patient care. However, the susceptibility of LLMs to adversarial attacks poses a significant threat, potentially leading to harmful outcomes in delicate medical contexts. This study investigates the vulnerability of LLMs to two types of a…

    Submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.12036  [pdf, other]

    cs.CL cs.AI

    MedCalc-Bench: Evaluating Large Language Models for Medical Calculations

    Authors: Nikhil Khandekar, Qiao Jin, Guangzhi Xiong, Soren Dunn, Serina S Applebaum, Zain Anwar, Maame Sarfo-Gyamfi, Conrad W Safranek, Abid A Anwar, Andrew Zhang, Aidan Gilson, Maxwell B Singer, Amisha Dave, Andrew Taylor, Aidong Zhang, Qingyu Chen, Zhiyong Lu

    Abstract: As opposed to evaluating computation and logic-based reasoning, current benchmarks for evaluating large language models (LLMs) in medicine are primarily focused on question-answering involving domain knowledge and descriptive reasoning. While such qualitative capabilities are vital to medical diagnosis, in real-world scenarios, doctors frequently use clinical calculators that follow quantitative e…

    Submitted 30 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Github link: https://github.com/ncbi-nlp/MedCalc-Bench HuggingFace link: https://huggingface.co/datasets/nsk7153/MedCalc-Bench

  9. arXiv:2406.10960  [pdf, other]

    cs.CL

    ESCoT: Towards Interpretable Emotional Support Dialogue Systems

    Authors: Tenggan Zhang, Xinjie Zhang, Jinming Zhao, Li Zhou, Qin Jin

    Abstract: Understanding the reason for emotional support response is crucial for establishing connections between users and emotional support dialogue systems. Previous works mostly focus on generating better responses but ignore interpretability, which is extremely important for constructing reliable dialogue systems. To empower the system with better interpretability, we propose an emotional support respo…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 (Long Paper)

  10. arXiv:2406.10911  [pdf, other]

    cs.SD eess.AS

    SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction

    Authors: Yuxun Tang, Jiatong Shi, Yuning Wu, Qin Jin

    Abstract: In speech generation tasks, human subjective ratings, usually referred to as the opinion score, are considered the "gold standard" for speech quality evaluation, with the mean opinion score (MOS) serving as the primary evaluation metric. Due to the high cost of human annotation, several MOS prediction systems have emerged in the speech domain, demonstrating good performance. These MOS prediction m…

    Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  11. arXiv:2406.10710  [pdf, other]

    cs.AI cs.CL

    SyntheT2C: Generating Synthetic Data for Fine-Tuning Large Language Models on the Text2Cypher Task

    Authors: Ziije Zhong, Linqing Zhong, Zhaoze Sun, Qingyun Jin, Zengchang Qin, Xiaofan Zhang

    Abstract: Integrating Large Language Models (LLMs) with existing Knowledge Graph (KG) databases presents a promising avenue for enhancing LLMs' efficacy and mitigating their "hallucinations". Given that most KGs reside in graph databases accessible solely through specialized query languages (e.g., Cypher), there exists a critical need to bridge the divide between LLMs and KG databases by automating the tran…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 19 pages, 15 figures, 8 tables

  12. arXiv:2406.08997  [pdf, ps, other]

    cs.CV

    Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition

    Authors: Fengyuan Zhang, Zhaopei Huang, Xinjie Zhang, Qin Jin

    Abstract: Micro-expressions serve as essential cues for understanding individuals' genuine emotional states. Recognizing micro-expressions attracts increasing research attention due to its various applications in fields such as business negotiation and psychotherapy. However, the intricate and transient nature of micro-expressions poses a significant challenge to their accurate recognition. Most existing wo…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by ICME 2024

  13. arXiv:2406.08905  [pdf, other]

    cs.SD eess.AS

    SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models

    Authors: Yuxun Tang, Yuning Wu, Jiatong Shi, Qin Jin

    Abstract: Discrete representation has shown advantages in speech generation tasks, wherein discrete tokens are derived by discretizing hidden features from self-supervised learning (SSL) pre-trained models. However, the direct application of speech SSL models to singing generation encounters domain gaps between speech and singing. Furthermore, singing generation necessitates a more refined representation th…

    Submitted 20 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  14. arXiv:2406.08416  [pdf, other]

    cs.SD eess.AS

    TokSing: Singing Voice Synthesis based on Discrete Tokens

    Authors: Yuning Wu, Chunlei Zhang, Jiatong Shi, Yuxun Tang, Shan Yang, Qin Jin

    Abstract: Recent advancements in speech synthesis witness significant benefits by leveraging discrete tokens extracted from self-supervised learning (SSL) models. Discrete tokens offer higher storage efficiency and greater operability in intermediate representations compared to traditional continuous Mel spectrograms. However, when it comes to singing voice synthesis (SVS), achieving higher levels of melody…

    Submitted 20 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  15. arXiv:2406.07725  [pdf, ps, other]

    cs.SD eess.AS

    The Interspeech 2024 Challenge on Speech Processing Using Discrete Units

    Authors: Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin

    Abstract: Representing speech and audio signals in discrete units has become a compelling alternative to traditional high-dimensional feature vectors. Numerous studies have highlighted the efficacy of discrete units in various applications such as speech compression and restoration, speech recognition, and speech generation. To foster exploration in this domain, we introduce the Interspeech 2024 Challenge,…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: This manuscript has been accepted by Interspeech 2024

  16. arXiv:2406.03688  [pdf, other]

    eess.IV cs.CV

    Shadow and Light: Digitally Reconstructed Radiographs for Disease Classification

    Authors: Benjamin Hou, Qingqing Zhu, Tejas Sudarshan Mathai, Qiao Jin, Zhiyong Lu, Ronald M. Summers

    Abstract: In this paper, we introduce DRR-RATE, a large-scale synthetic chest X-ray dataset derived from the recently released CT-RATE dataset. DRR-RATE comprises 50,188 frontal Digitally Reconstructed Radiographs (DRRs) from 21,304 unique patients. Each image is paired with a corresponding radiology text report and binary labels for 18 pathology classes. Given the controllable nature of DRR generation,…

    Submitted 5 June, 2024; originally announced June 2024.

  17. arXiv:2406.02016  [pdf, other]

    math.OC cs.LG stat.ML

    Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization

    Authors: Ruichen Jiang, Ali Kavis, Qiujiang Jin, Sujay Sanghavi, Aryan Mokhtari

    Abstract: We propose adaptive, line search-free second-order methods with optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving only one linear system per iteration, eliminating the need for line search or backtracking mechanisms. Specifically, we base our algorithms on the optimistic meth…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 33 pages, 2 figures

  18. arXiv:2405.21063  [pdf, other]

    cs.LG cs.AI

    Neural Network Verification with Branch-and-Bound for General Nonlinearities

    Authors: Zhouxing Shi, Qirui Jin, Zico Kolter, Suman Jana, Cho-Jui Hsieh, Huan Zhang

    Abstract: Branch-and-bound (BaB) is among the most effective methods for neural network (NN) verification. However, existing works on BaB have mostly focused on NNs with piecewise linear activations, especially ReLU networks. In this paper, we develop a general framework, named GenBaB, to conduct BaB for general nonlinearities in general computational graphs based on linear bound propagation. To decide whic…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Preprint

  19. arXiv:2405.17719  [pdf, other]

    cs.CV

    EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?

    Authors: Boshen Xu, Ziheng Wang, Yang Du, Zhinan Song, Sipeng Zheng, Qin Jin

    Abstract: Egocentric video-language pretraining is a crucial paradigm to advance the learning of egocentric hand-object interactions (EgoHOI). Despite the great success on existing testbeds, these benchmarks focus more on closed-set visual concepts or limited scenarios. Due to the occurrence of diverse EgoHOIs in the real world, we propose an open-vocabulary benchmark named EgoHOIBench to reveal the diminis…

    Submitted 3 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Code: https://github.com/xuboshen/EgoNCEpp

  20. arXiv:2405.16205  [pdf]

    cs.AI cs.CL

    GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases

    Authors: Zhizheng Wang, Qiao Jin, Chih-Hsuan Wei, Shubo Tian, Po-Ting Lai, Qingqing Zhu, Chi-Ping Day, Christina Ross, Zhiyong Lu

    Abstract: Gene set knowledge discovery is essential for advancing human functional genomics. Recent studies have shown promising performance by harnessing the power of Large Language Models (LLMs) on this task. Nonetheless, their results are subject to several limitations common in LLMs such as hallucinations. In response, we present GeneAgent, a first-of-its-kind language agent featuring self-verification…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 30 pages with 10 figures and/or tables

  21. arXiv:2405.14040  [pdf, other]

    cs.MM

    Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline

    Authors: Dingyi Yang, Chunru Zhan, Ziheng Wang, Biao Wang, Tiezheng Ge, Bo Zheng, Qin Jin

    Abstract: Video storytelling is engaging multimedia content that utilizes video and its accompanying narration to attract the audience, where a key challenge is creating narrations for recorded visual scenes. Previous studies on dense video captioning and video story generation have made some progress. However, in practical applications, we typically require synchronized narrations for ongoing visual scenes…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 15 pages, 13 figures

  22. arXiv:2405.13340  [pdf, other]

    math.NA math.OC

    Randomized block coordinate descent method for linear ill-posed problems

    Authors: Qinian Jin, Duo Liu

    Abstract: Consider the linear ill-posed problems of the form $\sum_{i=1}^{b} A_i x_i =y$, where, for each $i$, $A_i$ is a bounded linear operator between two Hilbert spaces $X_i$ and ${\mathcal Y}$. When $b$ is huge, solving the problem by an iterative method using the full gradient at each iteration step is both time-consuming and memory insufficient. Although randomized block coordinate descent (RBCD) meth…

    Submitted 22 May, 2024; originally announced May 2024.

    MSC Class: 65J20; 65J22; 65J10; 94A08

  23. arXiv:2405.10860  [pdf, other]

    cs.CL

    ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains

    Authors: Zhaopei Huang, Jinming Zhao, Qin Jin

    Abstract: Understanding the process of emotion generation is crucial for analyzing the causes behind emotions. Causal Emotion Entailment (CEE), an emotion-understanding task, aims to identify the causal utterances in a conversation that stimulate the emotions expressed in a target utterance. However, current works in CEE mainly focus on modeling semantic and emotional interactions in conversations, neglecti…

    Submitted 21 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024

  24. arXiv:2405.08269  [pdf, ps, other]

    math.NA math.OC

    On saturation of the discrepancy principle for nonlinear Tikhonov regularization in Hilbert spaces

    Authors: Qinian Jin

    Abstract: In this paper we revisit the discrepancy principle for Tikhonov regularization of nonlinear ill-posed problems in Hilbert spaces and provide some new and improved saturation results under less restrictive conditions, compared with the existing results in the literature.

    Submitted 13 May, 2024; originally announced May 2024.

  25. arXiv:2404.16731  [pdf, ps, other]

    math.OC

    Non-asymptotic Global Convergence Analysis of BFGS with the Armijo-Wolfe Line Search

    Authors: Qiujiang Jin, Ruichen Jiang, Aryan Mokhtari

    Abstract: In this paper, we establish the first explicit and non-asymptotic global convergence analysis of the BFGS method when deployed with an inexact line search scheme that satisfies the Armijo-Wolfe conditions. We show that BFGS achieves a global convergence rate of $(1-\frac{1}{\kappa})^k$ for $\mu$-strongly convex functions with $L$-Lipschitz gradients, where $\kappa=\frac{L}{\mu}$ denotes the condition number. Furthe…

    Submitted 25 April, 2024; originally announced April 2024.

  26. arXiv:2404.16635  [pdf, other]

    cs.CV

    TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

    Authors: Liang Zhang, Anwen Hu, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin, Ji Zhang, Fei Huang

    Abstract: Charts are important for presenting and explaining complex data relationships. Recently, multimodal large language models (MLLMs) have shown remarkable capabilities in various chart understanding tasks. However, the sheer size of these models in terms of parameters and computational requirements limits their use in resource-constrained environments. In this paper, we present TinyChart, an efficien…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 13 pages, 11 figures

  27. arXiv:2404.16361  [pdf, other]

    cs.LG cs.NE cs.SC

    Evolutionary Causal Discovery with Relative Impact Stratification for Interpretable Data Analysis

    Authors: Ou Deng, Shoji Nishimura, Atsushi Ogihara, Qun Jin

    Abstract: This study proposes Evolutionary Causal Discovery (ECD) for causal discovery that tailors response variables, predictor variables, and corresponding operators to research datasets. Utilizing genetic programming for variable relationship parsing, the method proceeds with the Relative Impact Stratification (RIS) algorithm to assess the relative impact of predictor variables on the response variable,…

    Submitted 25 April, 2024; originally announced April 2024.

  28. arXiv:2404.14705  [pdf, other]

    cs.CV

    Think-Program-reCtify: 3D Situated Reasoning with Large Language Models

    Authors: Qingrong He, Kejun Lin, Shizhe Chen, Anwen Hu, Qin Jin

    Abstract: This work addresses the 3D situated reasoning task which aims to answer questions given egocentric observations in a 3D environment. The task remains challenging as it requires comprehensive 3D perception and complex reasoning skills. End-to-end models trained on supervised data for 3D situated reasoning suffer from data scarcity and generalization ability. Inspired by the recent success of levera…

    Submitted 22 April, 2024; originally announced April 2024.

  29. arXiv:2404.13370  [pdf, other]

    cs.CV cs.CL cs.MM

    Movie101v2: Improved Movie Narration Benchmark

    Authors: Zihao Yue, Yepeng Zhang, Ziheng Wang, Qin Jin

    Abstract: Automatic movie narration targets at creating video-aligned plot descriptions to assist visually impaired audiences. It differs from standard video captioning in that it requires not only describing key visual details but also inferring the plots developed across multiple movie shots, thus posing unique and ongoing challenges. To advance the development of automatic movie narrating systems, we fir…

    Submitted 20 April, 2024; originally announced April 2024.

  30. arXiv:2404.05900  [pdf, ps, other]

    math.OC

    Distributionally Robust Optimization with Decision-Dependent Information Discovery

    Authors: Qing Jin, Angelos Georghiou, Phebe Vayanos, Grani A. Hanasusanto

    Abstract: We study two-stage distributionally robust optimization (DRO) problems with decision-dependent information discovery (DDID) wherein (a portion of) the uncertain parameters are revealed only if an (often costly) investment is made in the first stage. This class of problems finds many important applications in selection problems (e.g., in hiring, project portfolio optimization, or optimal sensor loc…

    Submitted 8 April, 2024; originally announced April 2024.

  31. arXiv:2404.03218  [pdf, other]

    math.NA math.OC

    An adaptive heavy ball method for ill-posed inverse problems

    Authors: Qinian Jin, Qin Huang

    Abstract: In this paper we consider ill-posed inverse problems, both linear and nonlinear, by a heavy ball method in which a strongly convex regularization function is incorporated to detect the feature of the sought solution. We develop ideas on how to adaptively choose the step-sizes and the momentum coefficients to achieve acceleration over the Landweber-type method. We then analyze the method and establ…

    Submitted 4 April, 2024; originally announced April 2024.

  32. arXiv:2404.01267  [pdf, other]

    math.OC

    Non-asymptotic Global Convergence Rates of BFGS with Exact Line Search

    Authors: Qiujiang Jin, Ruichen Jiang, Aryan Mokhtari

    Abstract: In this paper, we explore the non-asymptotic global convergence rates of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method implemented with exact line search. Notably, due to Dixon's equivalence result, our findings are also applicable to other quasi-Newton methods in the convex Broyden class employing exact line search, such as the Davidon-Fletcher-Powell (DFP) method. Specifically, we focus on…

    Submitted 1 April, 2024; originally announced April 2024.

  33. arXiv:2403.15033  [pdf, other]

    cs.CV

    Toward Tiny and High-quality Facial Makeup with Data Amplify Learning

    Authors: Qiaoqiao Jin, Xuanhong Chen, Meiguang Jin, Ying Chen, Rui Shi, Yucheng Zheng, Yupeng Zhu, Bingbing Ni

    Abstract: Contemporary makeup approaches primarily hinge on unpaired learning paradigms, yet they grapple with the challenges of inaccurate supervision (e.g., face misalignment) and sophisticated facial prompts (including face parsing, and landmark detection). These challenges prohibit low-cost deployment of facial makeup models, especially on mobile devices. To solve above problems, we propose a brand-new…

    Submitted 8 April, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  34. arXiv:2403.12895  [pdf, other]

    cs.CV

    mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

    Authors: Anwen Hu, Haiyang Xu, Jiabo Ye, Ming Yan, Liang Zhang, Bo Zhang, Chen Li, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou

    Abstract: Structure information is critical for understanding the semantics of text-rich images, such as documents, tables, and charts. Existing Multimodal Large Language Models (MLLMs) for Visual Document Understanding are equipped with text recognition ability but lack general structure understanding abilities for text-rich document images. In this work, we emphasize the importance of structure informatio…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 21 pages, 15 figures

  35. Inter- and intra-uncertainty based feature aggregation model for semi-supervised histopathology image segmentation

    Authors: Qiangguo Jin, Hui Cui, Changming Sun, Yang Song, Jiangbin Zheng, Leilei Cao, Leyi Wei, Ran Su

    Abstract: Acquiring pixel-level annotations is often limited in applications such as histology studies that require domain expertise. Various semi-supervised learning approaches have been developed to work with limited ground truth annotations, such as the popular teacher-student models. However, hierarchical prediction uncertainty within the student model (intra-uncertainty) and image prediction uncertaint…

    Submitted 19 March, 2024; originally announced March 2024.

    Journal ref: Expert Systems with Applications, 2024, 238: 122093

  36. arXiv:2403.12460  [pdf, ps, other]

    math.NA math.OC

    Stochastic variance reduced gradient method for linear ill-posed inverse problems

    Authors: Qinian Jin, Liuhong Chen

    Abstract: In this paper we apply the stochastic variance reduced gradient (SVRG) method, which is a popular variance reduction method in optimization for accelerating the stochastic gradient method, to solve large scale linear ill-posed systems in Hilbert spaces. Under a priori choices of stopping indices, we derive a convergence rate result when the sought solution satisfies a benchmark source condit…

    Submitted 19 March, 2024; originally announced March 2024.

  37. arXiv:2403.05874  [pdf, other]

    cs.CV cs.RO

    SPAFormer: Sequential 3D Part Assembly with Transformers

    Authors: Boshen Xu, Sipeng Zheng, Qin Jin

    Abstract: We introduce SPAFormer, an innovative model designed to overcome the combinatorial explosion challenge in the 3D Part Assembly (3D-PA) task. This task requires accurate prediction of each part's pose and shape in sequential steps, and as the number of parts increases, the possible assembly combinations increase exponentially, leading to a combinatorial explosion that severely hinders the efficacy…

    Submitted 3 June, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: Code: https://github.com/xuboshen/SPAFormer

  38. POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World

    Authors: Boshen Xu, Sipeng Zheng, Qin Jin

    Abstract: We humans are good at translating third-person observations of hand-object interactions (HOI) into an egocentric view. However, current methods struggle to replicate this ability of view adaptation from third-person to first-person. Although some approaches attempt to learn view-agnostic representation from large-scale video datasets, they ignore the relationships among multiple third-person views…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Accepted by ACM MM 2023. Project page: https://xuboshen.github.io/

    Journal ref: Proceedings of the 31st ACM International Conference on Multimedia (2023). Association for Computing Machinery, New York, NY, USA, 2807-2816

  39. arXiv:2403.05680  [pdf, other]

    cs.AI cs.CL cs.CV

    How Well Do Multi-modal LLMs Interpret CT Scans? An Auto-Evaluation Framework for Analyses

    Authors: Qingqing Zhu, Benjamin Hou, Tejas S. Mathai, Pritam Mukherjee, Qiao Jin, Xiuying Chen, Zhizheng Wang, Ruida Cheng, Ronald M. Summers, Zhiyong Lu

    Abstract: Automatically interpreting CT scans can ease the workload of radiologists. However, this is challenging mainly due to the scarcity of adequate datasets and reference standards for evaluation. This study aims to bridge this gap by introducing a novel evaluation framework, named "GPTRadScore". This framework assesses the capabilities of multi-modal LLMs, such as GPT-4 with Vision (GPT-4V), Gemini…

    Submitted 18 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  40. Fast, nonlocal and neural: a lightweight high quality solution to image denoising

    Authors: Yu Guo, Axel Davy, Gabriele Facciolo, Jean-Michel Morel, Qiyu Jin

    Abstract: With the widespread application of convolutional neural networks (CNNs), the traditional model based denoising algorithms are now outperformed. However, CNNs face two problems. First, they are computationally demanding, which makes their deployment especially difficult for mobile terminals. Second, experimental evidence shows that CNNs often over-smooth regular textures present in images, in contr…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 5 pages. This paper was accepted by IEEE Signal Processing Letters on July 1, 2021

    Journal ref: IEEE Signal Processing Letters, 2021, 28:1515-1519

  41. Kernel Correlation-Dissimilarity for Multiple Kernel k-Means Clustering

    Authors: Rina Su, Yu Guo, Caiying Wu, Qiyu Jin, Tieyong Zeng

    Abstract: The main objective of the Multiple Kernel k-Means (MKKM) algorithm is to extract non-linear information and achieve optimal clustering by optimizing base kernel matrices. Current methods enhance information diversity and reduce redundancy by exploiting interdependencies among multiple kernels based on correlations or dissimilarities. Nevertheless, relying solely on a single metric, such as correla…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 36 pages. This paper was accepted by Pattern Recognition on January 31, 2024

    Journal ref: Pattern Recognition, 2024, 150:110307

  42. arXiv:2402.14545  [pdf, other]

    cs.CL cs.CV

    Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective

    Authors: Zihao Yue, Liang Zhang, Qin Jin

    Abstract: Large Multimodal Models (LMMs) often suffer from multimodal hallucinations, wherein they may create content that is not present in the visual inputs. In this paper, we explore a new angle of this issue: overly detailed training data hinders the model's ability to timely terminate generation, leading to continued outputs beyond visual perception limits. By investigating how the model decides to ter…

    Submitted 29 May, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024

  43. arXiv:2402.13225  [pdf]

    cs.CL cs.AI

    AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning

    Authors: Qiao Jin, Zhizheng Wang, Yifan Yang, Qingqing Zhu, Donald Wright, Thomas Huang, W John Wilbur, Zhe He, Andrew Taylor, Qingyu Chen, Zhiyong Lu

    Abstract: Clinical calculators play a vital role in healthcare by offering accurate evidence-based predictions for various purposes such as prognosis. Nevertheless, their widespread utilization is frequently hindered by usability challenges, poor dissemination, and restricted functionality. Augmenting large language models with extensive collections of clinical calculators presents an opportunity to overcom…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Work in progress

  44. arXiv:2402.13178  [pdf, other]

    cs.CL cs.AI

    Benchmarking Retrieval-Augmented Generation for Medicine

    Authors: Guangzhi Xiong, Qiao Jin, Zhiyong Lu, Aidong Zhang

    Abstract: While large language models (LLMs) have achieved state-of-the-art performance on a wide range of medical question answering (QA) tasks, they still face challenges with hallucinations and outdated knowledge. Retrieval-augmented generation (RAG) is a promising solution and has been widely adopted. However, a RAG system can involve multiple flexible components, and there is a lack of best practices r…

    Submitted 23 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Homepage: https://teddy-xionggz.github.io/benchmark-medical-rag/

  45. arXiv:2402.04247  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science

    Authors: Xiangru Tang, Qiao Jin, Kunlun Zhu, Tongxin Yuan, Yichi Zhang, Wangchunshu Zhou, Meng Qu, Yilun Zhao, Jian Tang, Zhuosheng Zhang, Arman Cohan, Zhiyong Lu, Mark Gerstein

    Abstract: Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines. While their capabilities are promising, these agents, called scientific LLM agents, also introduce novel vulnerabilities that demand careful consideration for safety. However, there exists a notab…

    Submitted 5 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  46. arXiv:2402.03484  [pdf, other]

    cs.IR cs.CL

    Harnessing PubMed User Query Logs for Post Hoc Explanations of Recommended Similar Articles

    Authors: Ashley Shin, Qiao Jin, James Anibal, Zhiyong Lu

    Abstract: Searching for a related article based on a reference article is an integral part of scientific research. PubMed, like many academic search engines, has a "similar articles" feature that recommends articles relevant to the current article viewed by a user. Explaining recommended items can be of great utility to users, particularly in the literature search process. With more than a million biomedica…

    Submitted 5 February, 2024; originally announced February 2024.

  47. arXiv:2402.01693  [pdf]

    cs.CL cs.AI

    Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study

    Authors: Zhe He, Balu Bhasuran, Qiao Jin, Shubo Tian, Karim Hanna, Cindy Shavor, Lisbeth Garcia Arguello, Patrick Murray, Zhiyong Lu

    Abstract: Lab results are often confusing and hard to understand. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to get their questions answered. We aim to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to lab test-related questions asked by patients and to identify potential issues that can be mitigated with au…

    Submitted 23 January, 2024; originally announced February 2024.

  48. arXiv:2401.17619  [pdf, ps, other]

    cs.SD eess.AS

    Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing

    Authors: Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, Shinji Watanabe

    Abstract: In singing voice synthesis (SVS), generating singing voices from musical scores faces challenges due to limited data availability. This study proposes a unique strategy to address the data scarcity in SVS. We employ an existing singing voice synthesizer for data augmentation, complemented by detailed manual tuning, an approach not previously explored in data curation, to reduce instances of unnatu…

    Submitted 12 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted by Interspeech 2024

  49. arXiv:2401.16578  [pdf, other]

    cs.CL cs.AI

    Leveraging Professional Radiologists' Expertise to Enhance LLMs' Evaluation for Radiology Reports

    Authors: Qingqing Zhu, Xiuying Chen, Qiao Jin, Benjamin Hou, Tejas Sudharshan Mathai, Pritam Mukherjee, Xin Gao, Ronald M Summers, Zhiyong Lu

    Abstract: In radiology, Artificial Intelligence (AI) has significantly advanced report generation, but automatic evaluation of these AI-produced reports remains challenging. Current metrics, such as Conventional Natural Language Generation (NLG) and Clinical Efficacy (CE), often fall short in capturing the semantic intricacies of clinical contexts or overemphasize clinical details, undermining report clarit…

    Submitted 16 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

  50. arXiv:2401.13867  [pdf]

    cs.CL

    Unmasking and Quantifying Racial Bias of Large Language Models in Medical Report Generation

    Authors: Yifan Yang, Xiaoyu Liu, Qiao Jin, Furong Huang, Zhiyong Lu

    Abstract: Large language models like GPT-3.5-turbo and GPT-4 hold promise for healthcare professionals, but they may inadvertently inherit biases during their training, potentially affecting their utility in medical applications. Despite few attempts in the past, the precise impact and extent of these biases remain uncertain. Through both qualitative and quantitative analyses, we find that these models tend…

    Submitted 24 January, 2024; originally announced January 2024.