Showing 1–50 of 155 results for author: Miao, Y

  1. arXiv:2407.12164  [pdf, other]

    cs.CV cs.AI cs.LG

    Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning

    Authors: Yanting Miao, William Loh, Suraj Kothawade, Pascal Poupart, Abdullah Rashwan, Yeqing Li

    Abstract: Text-to-image generative models have recently attracted considerable interest, enabling the synthesis of high-quality images from textual prompts. However, these models often lack the capability to generate specific subjects from given reference images or to synthesize novel renditions under varying conditions. Methods like DreamBooth and Subject-driven Text-to-Image (SuTI) have made significant p…

    Submitted 16 July, 2024; originally announced July 2024.

  2. arXiv:2407.11011  [pdf, other]

    cs.CR cs.CV cs.LG

    Toward Availability Attacks in 3D Point Clouds

    Authors: Yifan Zhu, Yibo Miao, Yinpeng Dong, Xiao-Shan Gao

    Abstract: Despite the great progress of 3D vision, data privacy and security issues in 3D deep learning are not explored systematically. In the domain of 2D images, many availability attacks have been proposed to prevent data from being illicitly learned by unauthorized deep models. However, unlike images represented on a fixed dimensional grid, point clouds are characterized as unordered and unstructured s…

    Submitted 26 June, 2024; originally announced July 2024.

    Comments: ICML 2024, 21 pages

  3. arXiv:2407.07959  [pdf, other]

    cs.SE cs.AI

    Source Code Summarization in the Era of Large Language Models

    Authors: Weisong Sun, Yun Miao, Yuekang Li, Hongyu Zhang, Chunrong Fang, Yi Liu, Gelei Deng, Yang Liu, Zhenyu Chen

    Abstract: To support software developers in understanding and maintaining programs, various automatic (source) code summarization techniques have been proposed to generate a concise natural language summary (i.e., comment) for a given code snippet. Recently, the emergence of large language models (LLMs) has led to a great boost in the performance of code-related tasks. In this paper, we undertake a systemat…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Just accepted to the 47th International Conference on Software Engineering (ICSE 2025)

    MSC Class: 68-04 ACM Class: D.2.3; I.2.7

  4. arXiv:2407.05965  [pdf, other]

    cs.CV cs.AI cs.CL cs.CR cs.LG

    T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models

    Authors: Yibo Miao, Yifan Zhu, Yinpeng Dong, Lijia Yu, Jun Zhu, Xiao-Shan Gao

    Abstract: The recent development of Sora leads to a new era in text-to-video (T2V) generation. Along with this comes the rising concern about its security risks. The generated videos may contain illegal or unethical content, and there is a lack of comprehensive quantitative understanding of their safety, posing a challenge to their reliability and practical deployment. Previous evaluations primarily focus o…

    Submitted 8 July, 2024; originally announced July 2024.

  5. arXiv:2407.04999  [pdf, other]

    cs.LG

    Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs

    Authors: Zhengdao Li, Yong Cao, Kefan Shuai, Yiming Miao, Kai Hwang

    Abstract: Graph classification benchmarks, vital for assessing and developing graph neural networks (GNNs), have recently been scrutinized, as simple methods like MLPs have demonstrated comparable performance. This leads to an important question: Do these benchmarks effectively distinguish the advancements of GNNs over other methodologies? If so, how do we quantitatively measure this effectiveness? In respo…

    Submitted 6 July, 2024; originally announced July 2024.

  6. arXiv:2407.00072  [pdf, other]

    cs.IR cs.CL

    Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation

    Authors: Yu Bai, Yukai Miao, Li Chen, Dan Li, Yanyu Ren, Hongtao Xie, Ce Yang, Xuhui Cai

    Abstract: In Greek mythology, Pistis symbolized good faith, trust, and reliability. Drawing inspiration from these principles, Pistis-RAG is a scalable multi-stage framework designed to address the challenges of large-scale retrieval-augmented generation (RAG) systems. This framework consists of distinct stages: matching, pre-ranking, ranking, reasoning, and aggregating. Each stage contributes to narrowing…

    Submitted 11 July, 2024; v1 submitted 21 June, 2024; originally announced July 2024.

  7. arXiv:2406.13233  [pdf, other]

    cs.AI

    AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models

    Authors: Zihao Zeng, Yibo Miao, Hongcheng Gao, Hao Zhang, Zhijie Deng

    Abstract: Mixture of experts (MoE) has become the standard for constructing production-level large language models (LLMs) due to its promise to boost model capacity without causing significant overheads. Nevertheless, existing MoE methods usually enforce a constant top-k routing for all tokens, which is arguably restrictive because various tokens (e.g., "<EOS>" vs. "apple") may require various numbers of ex…

    Submitted 19 June, 2024; originally announced June 2024.

  8. arXiv:2406.11519  [pdf, other]

    cs.CV eess.IV

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Authors: Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, Jing Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA

  9. arXiv:2406.07327  [pdf, other]

    cs.AI cs.CL cs.LG

    3D-Properties: Identifying Challenges in DPO and Charting a Path Forward

    Authors: Yuzi Yan, Yibo Miao, Jialian Li, Yipin Zhang, Jian Xie, Zhijie Deng, Dong Yan

    Abstract: Aligning large language models (LLMs) with human preference has recently gained tremendous attention, with the canonical yet costly RLHF-PPO and the simple and straightforward Direct Preference Optimization (DPO) as two examples. Despite its efficiency, DPO has rarely been used in state-of-the-art production-level LLMs, implying its potential pathologies. In this work, we revisit DPO with a comp…

    Submitted 11 June, 2024; originally announced June 2024.

  10. arXiv:2406.00588  [pdf, other]

    cs.LG cs.CR math.ST

    Generalization Bound and New Algorithm for Clean-Label Backdoor Attack

    Authors: Lijia Yu, Shuang Liu, Yibo Miao, Xiao-Shan Gao, Lijun Zhang

    Abstract: The generalization bound is a crucial theoretical tool for assessing the generalizability of learning methods, and there exists a vast literature on the generalizability of normal learning, adversarial learning, and data poisoning. Unlike other data poisoning attacks, the backdoor attack has the special property that the poisoned triggers are contained in both the training set and the test set and the purpo…

    Submitted 1 June, 2024; originally announced June 2024.

  11. arXiv:2405.19098  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV stat.ML

    Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior

    Authors: Shuyu Cheng, Yibo Miao, Yinpeng Dong, Xiao Yang, Xiao-Shan Gao, Jun Zhu

    Abstract: This paper studies the challenging black-box adversarial attack that aims to generate adversarial examples against a black-box model by only using output feedback of the model to input queries. Some previous methods improve the query efficiency by incorporating the gradient of a surrogate white-box model into query-based attacks due to the adversarial transferability. However, the localized gradie…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  12. arXiv:2405.18524  [pdf, other]

    cs.CV

    Aligning in a Compact Space: Contrastive Knowledge Distillation between Heterogeneous Architectures

    Authors: Hongjun Wu, Li Xiao, Xingkuo Zhang, Yining Miao

    Abstract: Knowledge distillation is commonly employed to compress neural networks, reducing the inference costs and memory footprint. In the scenario of homogeneous architectures, feature-based methods have been widely validated for their effectiveness. However, in scenarios where the teacher and student models are of heterogeneous architectures, the inherent differences in feature representation significantl…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 12 pages, 3 figures, conference paper

  13. arXiv:2405.15130  [pdf, other]

    cs.SE cs.CL cs.LG

    OptLLM: Optimal Assignment of Queries to Large Language Models

    Authors: Yueyue Liu, Hongyu Zhang, Yuantian Miao, Van-Hoang Le, Zhiqiang Li

    Abstract: Large Language Models (LLMs) have garnered considerable attention owing to their remarkable capabilities, leading to an increasing number of companies offering LLMs as services. Different LLMs achieve different performance at different costs. A challenge for users lies in choosing the LLMs that best fit their needs, balancing cost and performance. In this paper, we propose a framework for addressi…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: This paper is accepted by ICWS 2024

  14. arXiv:2405.01333  [pdf, other]

    cs.RO cs.CV

    NeRF in Robotics: A Survey

    Authors: Guangming Wang, Lei Pan, Songyou Peng, Shaohui Liu, Chenfeng Xu, Yanzi Miao, Wei Zhan, Masayoshi Tomizuka, Marc Pollefeys, Hesheng Wang

    Abstract: Meticulous 3D environment representations have been a longstanding goal in computer vision and robotics fields. The recent emergence of neural implicit representations has introduced radical innovation to this field as implicit representations enable numerous capabilities. Among these, the Neural Radiance Field (NeRF) has sparked a trend because of the huge representational advantages, such as sim…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 21 pages, 19 figures

  15. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  16. arXiv:2404.03037  [pdf, other]

    cs.LG cs.AI

    Model-based Reinforcement Learning for Parameterized Action Spaces

    Authors: Renhao Zhang, Haotian Fu, Yilin Miao, George Konidaris

    Abstract: We propose a novel model-based reinforcement learning algorithm -- Dynamics Learning and predictive control with Parameterized Actions (DLPA) -- for Parameterized Action Markov Decision Processes (PAMDPs). The agent learns a parameterized-action-conditioned dynamics model and plans with a modified Model Predictive Path Integral control. We theoretically quantify the difference between the generate…

    Submitted 23 May, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  17. arXiv:2404.00469  [pdf, other]

    cs.CV

    SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs

    Authors: Yang Miao, Francis Engelmann, Olga Vysotska, Federico Tombari, Marc Pollefeys, Dániel Béla Baráth

    Abstract: We introduce a novel problem, i.e., the localization of an input image within a multi-modal reference map represented by a database of 3D scene graphs. These graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, offering a lightweight and efficient alternative to conventional methods that rely on extensive image databases.…

    Submitted 12 July, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

  18. arXiv:2404.00312  [pdf, other]

    cs.CV cs.AI

    Bayesian Exploration of Pre-trained Models for Low-shot Image Classification

    Authors: Yibo Miao, Yu Lei, Feng Zhou, Zhijie Deng

    Abstract: Low-shot image classification is a fundamental task in computer vision, and the emergence of large-scale vision-language models such as CLIP has greatly advanced the forefront of research in this field. However, most existing CLIP-based methods lack the flexibility to effectively incorporate other pre-trained models that encompass knowledge distinct from CLIP. To bridge the gap, this work proposes…

    Submitted 30 March, 2024; originally announced April 2024.

  19. arXiv:2403.12760  [pdf, other]

    cs.CV

    WaveFace: Authentic Face Restoration with Efficient Frequency Recovery

    Authors: Yunqi Miao, Jiankang Deng, Jungong Han

    Abstract: Although diffusion models are rising as a powerful solution for blind face restoration, they are criticized for two problems: 1) slow training and inference speed, and 2) failure in preserving identity and recovering fine-grained facial details. In this work, we propose WaveFace to solve the problems in the frequency domain, where low- and high-frequency components decomposed by wavelet transforma…

    Submitted 19 March, 2024; originally announced March 2024.

  20. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  21. arXiv:2403.04164  [pdf, ps, other]

    cs.CV cs.AI

    ProMISe: Promptable Medical Image Segmentation using SAM

    Authors: Jinfeng Wang, Sifan Song, Xinkun Wang, Yiyi Wang, Yiyi Miao, Jionglong Su, S. Kevin Zhou

    Abstract: With the proposal of the Segment Anything Model (SAM), fine-tuning SAM for medical image segmentation (MIS) has become popular. However, due to the large size of the SAM model and the significant domain gap between natural and medical images, fine-tuning-based strategies are costly with potential risk of instability, feature damage and catastrophic forgetting. Furthermore, some methods of transfer…

    Submitted 18 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  22. arXiv:2403.02558  [pdf]

    cs.CL cs.CV

    The Minimum Information about CLinical Artificial Intelligence Checklist for Generative Modeling Research (MI-CLAIM-GEN)

    Authors: Brenda Y. Miao, Irene Y. Chen, Christopher YK Williams, Jaysón Davidson, Augusto Garcia-Agundez, Shenghuan Sun, Travis Zack, Suchi Saria, Rima Arnaout, Giorgio Quer, Hossein J. Sadaei, Ali Torkamani, Brett Beaulieu-Jones, Bin Yu, Milena Gianfrancesco, Atul J. Butte, Beau Norgeot, Madhumita Sushil

    Abstract: Recent advances in generative models, including large language models (LLMs), vision language models (VLMs), and diffusion models, have accelerated the field of natural language and image processing in medicine and marked a significant paradigm shift in how biomedical models can be developed and deployed. While these models are highly adaptable to new tasks, scaling and evaluating their usage pres…

    Submitted 11 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  23. arXiv:2402.15813  [pdf, other]

    cs.CL cs.GT

    Measuring Bargaining Abilities of LLMs: A Benchmark and A Buyer-Enhancement Method

    Authors: Tian Xia, Zhiwei He, Tong Ren, Yibo Miao, Zhuosheng Zhang, Yang Yang, Rui Wang

    Abstract: Bargaining is an important and unique part of negotiation between humans. As LLM-driven agents learn to negotiate and act like real humans, how to evaluate agents' bargaining abilities remains an open problem. For the first time, we formally described the Bargaining task as an asymmetric incomplete information game, defining the gains of the Buyer and Seller in multiple bargaining processes. It al…

    Submitted 4 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 Findings. The dataset AmazonHistoryPrice and our code are available at https://github.com/TianXiaSJTU/AmazonPriceHistory

  24. arXiv:2402.09345  [pdf, other]

    cs.LG cs.AI

    InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling

    Authors: Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, Dacheng Tao

    Abstract: Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models with human values, reward hacking, also termed reward overoptimization, remains a critical challenge. This issue primarily arises from reward misgeneralization, where reward models (RMs) compute reward using spurious features that are irrelevant to human preferences. In this work, we tackle this pr…

    Submitted 23 May, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 35 pages, 28 figures

  25. arXiv:2402.05821  [pdf, other]

    cs.LG cs.NE

    Guided Evolution with Binary Discriminators for ML Program Search

    Authors: John D. Co-Reyes, Yingjie Miao, George Tucker, Aleksandra Faust, Esteban Real

    Abstract: How to automatically design better machine learning programs is an open problem within AutoML. While evolution has been a popular tool to search for better ML programs, using learning itself to guide the search has been less successful and less understood on harder problems but has the promise to dramatically increase the speed and final performance of the optimization process. We propose guiding…

    Submitted 8 February, 2024; originally announced February 2024.

  26. arXiv:2402.03597  [pdf]

    cs.CL cs.IR cs.LG

    Identifying Reasons for Contraceptive Switching from Real-World Data Using Large Language Models

    Authors: Brenda Y. Miao, Christopher YK Williams, Ebenezer Chinedu-Eneh, Travis Zack, Emily Alsentzer, Atul J. Butte, Irene Y. Chen

    Abstract: Prescription contraceptives play a critical role in supporting women's reproductive health. With nearly 50 million women in the United States using contraceptives, understanding the factors that drive contraceptive selection and switching is of significant interest. However, many factors related to medication switching are often only captured in unstructured clinical notes and can be difficult to…

    Submitted 5 February, 2024; originally announced February 2024.

  27. arXiv:2401.05568  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Phase discovery with active learning: Application to structural phase transitions in equiatomic NiTi

    Authors: Jonathan Vandermause, Anders Johansson, Yucong Miao, Joost J. Vlassak, Boris Kozinsky

    Abstract: Nickel titanium (NiTi) is a prototypical shape-memory alloy used in a range of biomedical and engineering devices, but direct molecular dynamics simulations of the martensitic B19' -> B2 phase transition driving its shape-memory behavior are rare and have relied on classical force fields with limited accuracy. Here, we train four machine-learned force fields for equiatomic NiTi based on the LDA, PBE…

    Submitted 10 January, 2024; originally announced January 2024.

  28. arXiv:2401.00434  [pdf, other]

    cs.CL

    GeoGalactica: A Scientific Large Language Model in Geoscience

    Authors: Zhouhan Lin, Cheng Deng, Le Zhou, Tianhang Zhang, Yi Xu, Yutong Xu, Zhongmou He, Yuanyuan Shi, Beiya Dai, Yunchong Song, Boyi Zeng, Qiyuan Chen, Yuxun Miao, Bo Xue, Shu Wang, Luoyi Fu, Weinan Zhang, Junxian He, Yunqiang Zhu, Xinbing Wang, Chenghu Zhou

    Abstract: Large language models (LLMs) have achieved huge success for their general knowledge and ability to solve a wide spectrum of tasks in natural language processing (NLP). Due to their impressive abilities, LLMs have shed light on potential inter-discipline applications to foster scientific discoveries of a specific domain by using artificial intelligence (AI for science, AI4S). In the meantime, utili…

    Submitted 13 April, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    ACM Class: I.2.7; F.4.1

  29. arXiv:2312.12803  [pdf, other]

    cs.IT

    Repairing Schemes for Tamo-Barg Codes

    Authors: Han Cai, Ying Miao, Moshe Schwartz, Xiaohu Tang

    Abstract: In this paper, we explore a practical system setting where a rack-aware storage system consists of racks, each containing a few parity checks, referred to as a rack-aware system with locality. To minimize cross-rack bandwidth in this system, we organize the repair sets of locally repairable codes into racks and investigate the problem of repairing erasures in locally repairable codes beyond the co…

    Submitted 20 December, 2023; originally announced December 2023.

  30. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  31. arXiv:2312.00413  [pdf, other]

    cs.SE cs.AI cs.CL cs.PL

    Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?

    Authors: Weisong Sun, Chunrong Fang, Yun Miao, Yudu You, Mengzhe Yuan, Yuchen Chen, Quanjun Zhang, An Guo, Xiang Chen, Yang Liu, Zhenyu Chen

    Abstract: Programming language understanding and representation (a.k.a. code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of the source code features while preserving its semantics. These representations can be used for facilitating subsequent code-related tasks. The abstract syntax…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: submitted to ACM Transactions on Software Engineering and Methodology. arXiv admin note: text overlap with arXiv:2103.10668 by other authors

    MSC Class: 68-04; 68T30 ACM Class: D.2.3; I.2.2; I.2.4

  32. arXiv:2311.15269  [pdf, other]

    cs.DC cs.AI

    Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search

    Authors: Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang

    Abstract: Increasingly complex and diverse deep neural network (DNN) models necessitate distributing the execution across multiple devices for training and inference tasks, and also require carefully planned schedules for performance. However, existing practices often rely on predefined schedules that may not fully exploit the benefits of emerging diverse model-aware operator placement strategies. Handcraft…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: The paper is accepted by HPCA 2024

  33. arXiv:2311.12592  [pdf, other]

    cs.HC cs.AI eess.SY

    Visual tracking brain computer interface

    Authors: Changxing Huang, Nanlin Shi, Yining Miao, Xiaogang Chen, Yijun Wang, Xiaorong Gao

    Abstract: Brain-computer interfaces (BCIs) offer a way to interact with computers without relying on physical movements. Non-invasive electroencephalography (EEG)-based visual BCIs, known for efficient speed and calibration ease, face limitations in continuous tasks due to discrete stimulus design and decoding methods. To achieve continuous control, we implemented a novel spatial encoding stimulus paradigm…

    Submitted 21 November, 2023; originally announced November 2023.

  34. arXiv:2311.11596  [pdf]

    cs.HC cs.IT eess.SP q-bio.NC

    High-performance cVEP-BCI under minimal calibration

    Authors: Yining Miao, Nanlin Shi, Changxing Huang, Yonghao Song, Xiaogang Chen, Yijun Wang, Xiaorong Gao

    Abstract: The ultimate goal of brain-computer interfaces (BCIs) based on visual modulation paradigms is to achieve high-speed performance without the burden of extensive calibration. Code-modulated visual evoked potential-based BCIs (cVEP-BCIs) modulated by broadband white noise (WN) offer various advantages, including increased communication speed, expanded encoding target capabilities, and enhanced coding…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 35 pages, 5 figures

  35. arXiv:2311.07608  [pdf, other]

    cs.LG cs.AI

    MuST: Multimodal Spatiotemporal Graph-Transformer for Hospital Readmission Prediction

    Authors: Yan Miao, Lequan Yu

    Abstract: Hospital readmission prediction is considered an essential approach to decreasing readmission rates, which is a key factor in assessing the quality and efficacy of a healthcare system. Previous studies have extensively utilized three primary modalities, namely electronic health records (EHR), medical images, and clinical notes, to predict hospital readmissions. However, the majority of these studi…

    Submitted 11 November, 2023; originally announced November 2023.

  36. arXiv:2311.06517  [pdf, other]

    cs.AI cs.DB cs.LG stat.AP

    BClean: A Bayesian Data Cleaning System

    Authors: Jianbin Qin, Sifan Huang, Yaoshu Wang, Jing Zhu, Yifan Zhang, Yukai Miao, Rui Mao, Makoto Onizuka, Chuan Xiao

    Abstract: There is a considerable body of work on data cleaning which employs various principles to rectify erroneous data and transform a dirty dataset into a cleaner one. One of the prevalent approaches is probabilistic methods, including Bayesian methods. However, existing probabilistic methods often assume a simplistic distribution (e.g., Gaussian distribution), which is frequently underfitted in practice,…

    Submitted 11 November, 2023; originally announced November 2023.

    Comments: Our source code is available at https://github.com/yyssl88/BClean

  37. arXiv:2309.14737  [pdf, other]

    cs.RO cs.CV

    Volumetric Semantically Consistent 3D Panoptic Mapping

    Authors: Yang Miao, Iro Armeni, Marc Pollefeys, Daniel Barath

    Abstract: We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating comprehensive, accurate, and efficient semantic 3D maps suitable for autonomous agents in unstructured environments. The proposed approach is based on a Voxel-TSDF representation used in recent algorithms. It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic…

    Submitted 8 July, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: 8 pages, 2 figures

  38. arXiv:2309.10895  [pdf, ps, other]

    cs.HC cs.MA

    Large Language Models as Agents in the Clinic

    Authors: Nikita Mehandru, Brenda Y. Miao, Eduardo Rodriguez Almaraz, Madhumita Sushil, Atul J. Butte, Ahmed Alaa

    Abstract: Recent developments in large language models (LLMs) have unlocked new opportunities for healthcare, from information synthesis to clinical decision support. These new LLMs are not just capable of modeling language, but can also act as intelligent "agents" that interact with stakeholders in open-ended conversations and even influence clinical decision-making. Rather than relying on benchmarks that…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: 4 pages

  39. arXiv:2309.05557  [pdf, other]

    cs.CL cs.AI cs.NI

    An Empirical Study of NetOps Capability of Pre-Trained Large Language Models

    Authors: Yukai Miao, Yu Bai, Li Chen, Dan Li, Haifeng Sun, Xizheng Wang, Ziqiu Luo, Yanyu Ren, Dapeng Sun, Xiuting Xu, Qi Zhang, Chao Xiang, Xinchi Li

    Abstract: Nowadays, the versatile capabilities of Pre-trained Large Language Models (LLMs) have attracted much attention from the industry. However, some vertical domains are more interested in the in-domain capabilities of LLMs. For the Networks domain, we present NetEval, an evaluation set for measuring the comprehensive capabilities of LLMs in Network Operations (NetOps). NetEval is designed for evaluati…

    Submitted 19 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

  40. arXiv:2309.05028  [pdf, other]

    cs.CV

    SC-NeRF: Self-Correcting Neural Radiance Field with Sparse Views

    Authors: Liang Song, Guangming Wang, Jiuming Liu, Zhenyang Fu, Yanzi Miao, Hesheng

    Abstract: In recent studies, the generalization of neural radiance fields for novel view synthesis task has been widely explored. However, existing methods are limited to objects and indoor scenes. In this work, we extend the generalization task to outdoor scenes, trained only on object-level datasets. This approach presents two challenges. Firstly, the significant distributional shift between training and…

    Submitted 10 September, 2023; originally announced September 2023.

  41. arXiv:2308.13232  [pdf, other]

    cs.HC cs.IT eess.SP q-bio.NC

    Estimating and approaching maximum information rate of noninvasive visual brain-computer interface

    Authors: Nanlin Shi, Yining Miao, Changxing Huang, Xiang Li, Yonghao Song, Xiaogang Chen, Yijun Wang, Xiaorong Gao

    Abstract: The mission of visual brain-computer interfaces (BCIs) is to enhance information transfer rate (ITR) to reach high speed towards real-life communication. Despite notable progress, noninvasive visual BCIs have encountered a plateau in ITRs, leaving it uncertain whether higher ITRs are achievable. In this study, we investigate the information rate limits of the primary visual channel to explore whet…

    Submitted 25 August, 2023; originally announced August 2023.

  42. CORAL: Expert-Curated medical Oncology Reports to Advance Language Model Inference

    Authors: Madhumita Sushil, Vanessa E. Kennedy, Divneet Mandair, Brenda Y. Miao, Travis Zack, Atul J. Butte

    Abstract: Both medical care and observational studies in oncology require a thorough understanding of a patient's disease progression and treatment history, often elaborately documented in clinical notes. Despite their vital role, no current oncology information representation and annotation schema fully encapsulates the diversity of information recorded within these notes. Although large language models (L…

    Submitted 11 January, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: Source code available at: https://github.com/MadhumitaSushil/OncLLMExtraction

  43. arXiv:2308.01857  [pdf, other]

    cs.AR

    iEDA: An Open-Source Intelligent Physical Implementation Toolkit and Library

    Authors: Xingquan Li, Simin Tao, Zengrong Huang, Shijian Chen, Zhisheng Zeng, Liwei Ni, Zhipeng Huang, Chunan Zhuang, Hongxi Wu, Weiguo Li, Xueyan Zhao, He Liu, Shuaiying Long, Wei He, Bojun Liu, Sifeng Gan, Zihao Yu, Tong Liu, Yuchi Miao, Zhiyuan Yan, Hao Wang, Jie Zhao, Yifan Li, Ruizhi Liu, Xiaoze Lin, et al. (31 additional authors not shown)

    Abstract: Open-source EDA shows promising potential in unleashing EDA innovation and lowering the cost of chip design. This paper presents an open-source EDA project, iEDA, which aims to build a basic infrastructure for EDA technology evolution and to close the industrial-academic gap in the EDA area. iEDA now covers the whole flow of physical design (including Floorplan, Placement, CTS, Routing, Timing Opti…

    Submitted 3 August, 2023; originally announced August 2023.

  44. arXiv:2307.00534  [pdf, other]

    cs.LG

    Shared Growth of Graph Neural Networks via Prompted Free-direction Knowledge Distillation

    Authors: Kaituo Feng, Yikun Miao, Changsheng Li, Ye Yuan, Guoren Wang

    Abstract: Knowledge distillation (KD) has been shown to be effective in boosting the performance of graph neural networks (GNNs), where the typical objective is to distill knowledge from a deeper teacher GNN into a shallower student GNN. However, it is often quite challenging to train a satisfactory deeper GNN due to the well-known over-parametrized and over-smoothing issues, leading to invalid knowledge transfer i…

    Submitted 16 November, 2023; v1 submitted 2 July, 2023; originally announced July 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2206.06561

  45. arXiv:2306.12113  [pdf, other]

    cs.CV cs.AI

    Lightweight wood panel defect detection method incorporating attention mechanism and feature fusion network

    Authors: Yongxin Cao, Fanghua Liu, Lai Jiang, Cheng Bao, You Miao, Yang Chen

    Abstract: In recent years, deep learning has made significant progress in wood panel defect detection. However, there are still challenges such as low detection, slow detection speed, and difficulties in deploying embedded devices on wood panel surfaces. To overcome these issues, we propose a lightweight wood panel defect detection method called YOLOv5-LW, which incorporates attention mechanisms and a feat…

    Submitted 21 June, 2023; originally announced June 2023.

  46. MuDPT: Multi-modal Deep-symphysis Prompt Tuning for Large Pre-trained Vision-Language Models

    Authors: Yongzhu Miao, Shasha Li, Jintao Tang, Ting Wang

    Abstract: Prompt tuning, like CoOp, has recently shown promising visual recognition and transfer learning ability on various downstream tasks with the emergence of large pre-trained vision-language models like CLIP. However, we identify that existing uni-modal prompt tuning approaches may result in sub-optimal performance since this uni-modal design breaks the original alignment of textual and visual repres…

    Submitted 14 July, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: The paper has been accepted by ICME 2023

  47. arXiv:2306.09792  [pdf, other]

    cs.LG cs.CE physics.comp-ph

    GPINN: Physics-informed Neural Network with Graph Embedding

    Authors: Yuyang Miao, Haolin Li

    Abstract: This work proposes a Physics-informed Neural Network framework with Graph Embedding (GPINN) to perform PINN in graph, i.e. topological space instead of traditional Euclidean space, for improved problem-solving efficiency. The method integrates topological data into the neural network's computations, which significantly boosts the performance of the Physics-Informed Neural Network (PINN). The graph…

    Submitted 16 June, 2023; originally announced June 2023.

  48. arXiv:2306.08423  [pdf, other]

    cs.DC

    DistSim: A performance model of large-scale hybrid distributed DNN training

    Authors: Guandong Lu, Runzhe Chen, Yakai Wang, Yangjie Zhou, Rui Zhang, Zheng Hu, Yanming Miao, Zhifang Cai, Li Li, Jingwen Leng, Minyi Guo

    Abstract: With the ever-increasing computational demand of DNN training workloads, distributed training has been widely adopted. A combination of data, model and pipeline parallelism strategy, called hybrid parallelism distributed training, is imported to tackle the problem of deploying large-scale models. However, how to evaluate the hybrid strategy and the utilization of each device remains a challenge si…

    Submitted 14 June, 2023; originally announced June 2023.

  49. arXiv:2306.04362  [pdf, other]

    cs.CV cs.CL

    Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

    Authors: Haiyang Xu, Qinghao Ye, Xuan Wu, Ming Yan, Yuan Miao, Jiabo Ye, Guohai Xu, Anwen Hu, Yaya Shi, Guangwei Xu, Chenliang Li, Qi Qian, Maofei Que, Ji Zhang, Xiao Zeng, Fei Huang

    Abstract: To promote the development of Vision-Language Pre-training (VLP) and multimodal Large Language Model (LLM) in the Chinese community, we first release the largest public Chinese high-quality video-language dataset named Youku-mPLUG, which is collected from Youku, a well-known Chinese video-sharing website, with strict criteria of safety, diversity, and quality. Youku-mPLUG contains 10 million Chi…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: Work in progress

  50. arXiv:2306.04240  [pdf, other]

    cs.CV math.NA

    T-ADAF: Adaptive Data Augmentation Framework for Image Classification Network based on Tensor T-product Operator

    Authors: Feiyang Han, Yun Miao, Zhaoyi Sun, Yimin Wei

    Abstract: Image classification is one of the most fundamental tasks in Computer Vision. In practical applications, the datasets are usually not as abundant as those in the laboratory and simulation, a situation that is always called Data Hungry. How to extract the information of data more completely and effectively is very important. Therefore, an Adaptive Data Augmentation Framework based on the tensor T-product O…

    Submitted 7 June, 2023; originally announced June 2023.