Showing 1–50 of 791 results for author: Shen, C

  1. arXiv:2407.12226  [pdf, other]

    cs.LG

    Individualized Federated Learning for Traffic Prediction with Error Driven Aggregation

    Authors: Hang Chen, Collin Meese, Mark Nejad, Chien-Chung Shen

    Abstract: Low-latency traffic prediction is vital for smart city traffic management. Federated Learning has emerged as a promising technique for Traffic Prediction (FLTP), offering several advantages such as privacy preservation, reduced communication overhead, improved prediction accuracy, and enhanced adaptability to changing traffic conditions. However, the majority of the current FLTP frameworks lack a real…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 16 pages, 4 figures

  2. arXiv:2407.10785  [pdf, other]

    eess.IV cs.CV

    Interpretability analysis on a pathology foundation model reveals biologically relevant embeddings across modalities

    Authors: Nhat Le, Ciyue Shen, Chintan Shah, Blake Martin, Daniel Shenker, Harshith Padigela, Jennifer Hipp, Sean Grullon, John Abel, Harsha Vardhan Pokkalla, Dinkar Juyal

    Abstract: Mechanistic interpretability has been explored in detail for large language models (LLMs). For the first time, we provide a preliminary investigation with similar interpretability methods for medical imaging. Specifically, we analyze the features from a ViT-Small encoder obtained from a pathology Foundation Model via application to two datasets: one dataset of pathology images, and one dataset of…

    Submitted 15 July, 2024; originally announced July 2024.

  3. arXiv:2407.10575  [pdf, other]

    cs.CV

    A Survey of Defenses against AI-generated Visual Media: Detection, Disruption, and Authentication

    Authors: Jingyi Deng, Chenhao Lin, Zhengyu Zhao, Shuai Liu, Qian Wang, Chao Shen

    Abstract: Deep generative models have demonstrated impressive performance in various computer vision applications, including image synthesis, video generation, and medical analysis. Despite their significant advancements, these models may be used for malicious purposes, such as misinformation, deception, and copyright violation. In this paper, we provide a systematic and timely review of research efforts on…

    Submitted 15 July, 2024; originally announced July 2024.

  4. arXiv:2407.10196  [pdf, other]

    cs.LG cs.AI

    A3S: A General Active Clustering Method with Pairwise Constraints

    Authors: Xun Deng, Junlong Liu, Han Zhong, Fuli Feng, Chen Shen, Xiangnan He, Jieping Ye, Zheng Wang

    Abstract: Active clustering aims to boost the clustering performance by integrating human-annotated pairwise constraints through strategic querying. Conventional approaches with semi-supervised clustering schemes encounter high query costs when applied to large datasets with numerous classes. To address these limitations, we propose a novel Adaptive Active Aggregation and Splitting (A3S) framework, falling…

    Submitted 14 July, 2024; originally announced July 2024.

  5. arXiv:2407.09295  [pdf, other]

    cs.CR

    Security Matrix for Multimodal Agents on Mobile Devices: A Systematic and Proof of Concept Study

    Authors: Yulong Yang, Xinshan Yang, Shuaidong Li, Chenhao Lin, Zhengyu Zhao, Chao Shen, Tianwei Zhang

    Abstract: The rapid progress in the reasoning capability of the Multi-modal Large Language Models (MLLMs) has triggered the development of autonomous agent systems on mobile devices. MLLM-based mobile agent systems consist of perception, reasoning, memory, and multi-agent collaboration modules, enabling automatic analysis of user instructions and the design of task pipelines with only natural language and d…

    Submitted 17 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: Preprint. Work in progress

  6. arXiv:2407.09247  [pdf, other]

    cs.AI

    Constrained Intrinsic Motivation for Reinforcement Learning

    Authors: Xiang Zheng, Xingjun Ma, Chao Shen, Cong Wang

    Abstract: This paper investigates two fundamental problems that arise when utilizing Intrinsic Motivation (IM) for reinforcement learning in Reward-Free Pre-Training (RFPT) tasks and Exploration with Intrinsic Motivation (EIM) tasks: 1) how to design an effective intrinsic objective in RFPT tasks, and 2) how to reduce the bias introduced by the intrinsic objective in EIM tasks. Existing IM methods suffer fr…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCAI 2024

  7. arXiv:2407.09120  [pdf, other]

    cs.LG cs.CL cs.CV

    URRL-IMVC: Unified and Robust Representation Learning for Incomplete Multi-View Clustering

    Authors: Ge Teng, Ting Mao, Chen Shen, Xiang Tian, Xuesong Liu, Yaowu Chen, Jieping Ye

    Abstract: Incomplete multi-view clustering (IMVC) aims to cluster multi-view data that are only partially available. This poses two main challenges: effectively leveraging multi-view information and mitigating the impact of missing views. Prevailing solutions employ cross-view contrastive learning and missing view recovery techniques. However, they either neglect valuable complementary information by focusi…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM SIGKDD 2024

  8. arXiv:2407.07930  [pdf]

    q-bio.BM cs.LG

    Token-Mol 1.0: Tokenized drug design with large language model

    Authors: Jike Wang, Rui Qin, Mingyang Wang, Meijing Fang, Yangyang Zhang, Yuchen Zhu, Qun Su, Qiaolin Gou, Chao Shen, Odin Zhang, Zhenxing Wu, Dejun Jiang, Xujun Zhang, Huifeng Zhao, Xiaozhe Wan, Zhourui Wu, Liwei Liu, Yu Kang, Chang-Yu Hsieh, Tingjun Hou

    Abstract: Significant interests have recently risen in leveraging sequence-based large language models (LLMs) for drug design. However, most current applications of LLMs in drug discovery lack the ability to comprehend three-dimensional (3D) structures, thereby limiting their effectiveness in tasks that explicitly involve molecular conformations. In this study, we introduced Token-Mol, a token-only 3D drug…

    Submitted 10 July, 2024; originally announced July 2024.

  9. arXiv:2407.06083  [pdf, other]

    cs.LG cs.IR

    A Survey of Controllable Learning: Methods and Applications in Information Retrieval

    Authors: Chenglei Shen, Xiao Zhang, Teng Shi, Changshuo Zhang, Guofu Xie, Jun Xu

    Abstract: Controllable learning (CL) emerges as a critical component in trustworthy machine learning, ensuring that learners meet predefined targets and can adaptively adjust without retraining according to the changes in those targets. We provide a formal definition of CL, and discuss its applications in information retrieval (IR) where information needs are often complex and dynamic. The survey categorize…

    Submitted 4 July, 2024; originally announced July 2024.

  10. arXiv:2407.04947  [pdf, other]

    cs.CV

    FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior

    Authors: Zhekai Chen, Wen Wang, Zhen Yang, Zeqing Yuan, Hao Chen, Chunhua Shen

    Abstract: We offer a novel approach to image composition, which integrates multiple input images into a single, coherent image. Rather than concentrating on specific use cases such as appearance editing (image harmonization) or semantic editing (semantic image composition), we showcase the potential of utilizing the powerful generative prior inherent in large-scale pre-trained diffusion models to accomplish…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted to Proc. Eur. Conf. Comp. Vision 2024. Project webpage: https://github.com/aim-uofa/FreeCompose

  11. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  12. arXiv:2407.03130  [pdf, other]

    cs.CV

    Towards Efficient Pixel Labeling for Industrial Anomaly Detection and Localization

    Authors: Hanxi Li, Jingqi Wu, Lin Yuanbo Wu, Hao Chen, Deyin Liu, Chunhua Shen

    Abstract: In the realm of practical Anomaly Detection (AD) tasks, manual labeling of anomalous pixels proves to be a costly endeavor. Consequently, many AD methods are crafted as one-class classifiers, tailored for training sets completely devoid of anomalies, ensuring a more cost-effective approach. While some pioneering work has demonstrated heightened AD accuracy by incorporating real anomaly samples in…

    Submitted 4 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: 18 pages, 5 figures

  13. arXiv:2407.02805  [pdf, other]

    cs.SE cs.AI

    Efficient DNN-Powered Software with Fair Sparse Models

    Authors: Xuanqi Gao, Weipeng Jiang, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, Chao Shen

    Abstract: With the emergence of the Software 3.0 era, there is a growing trend of compressing and integrating large models into software systems, with significant societal implications. Regrettably, in numerous instances, model compression techniques impact the fairness performance of these models and thus the ethical behavior of DNN-powered software. One of the most notable examples is the Lottery Ticket Hy…

    Submitted 3 July, 2024; originally announced July 2024.

  14. arXiv:2407.02014  [pdf, other]

    cs.CV

    Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning

    Authors: Chengchao Shen, Jianzhong Chen, Jianxin Wang

    Abstract: The existing contrastive learning methods mainly focus on single-grained representation learning, e.g., part-level, object-level or scene-level ones, thus inevitably neglecting the transferability of representations on other granularity levels. In this paper, we aim to learn multi-grained representations, which can effectively describe the image on various granularity levels, thus improving genera…

    Submitted 2 July, 2024; originally announced July 2024.

  15. arXiv:2406.19311  [pdf, other]

    cs.CR cs.SD eess.AS

    Zero-Query Adversarial Attack on Black-box Automatic Speech Recognition Systems

    Authors: Zheng Fang, Tao Wang, Lingchen Zhao, Shenyi Zhang, Bowen Li, Yunjie Ge, Qi Li, Chao Shen, Qian Wang

    Abstract: In recent years, extensive research has been conducted on the vulnerability of ASR systems, revealing that black-box adversarial example attacks pose significant threats to real-world ASR systems. However, most existing black-box attacks rely on queries to the target ASRs, which is impractical when queries are not permitted. In this paper, we propose ZQ-Attack, a transfer-based adversarial attack…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in the Proceedings of The ACM Conference on Computer and Communications Security (CCS), 2024

  16. arXiv:2406.14913  [pdf, other]

    physics.soc-ph cs.MA

    Cooperative bots exhibit nuanced effects on cooperation across strategic frameworks

    Authors: Zehua Si, Zhixue He, Chen Shen, Jun Tanimoto

    Abstract: The positive impact of cooperative bots on cooperation within evolutionary game theory is well documented; however, existing studies have predominantly used discrete strategic frameworks, focusing on deterministic actions with a fixed probability of one. This paper extends the investigation to continuous and mixed strategic approaches. Continuous strategies employ intermediate probabilities to con…

    Submitted 21 June, 2024; originally announced June 2024.

  17. arXiv:2406.13988  [pdf, other]

    cs.CV

    LGmap: Local-to-Global Mapping Network for Online Long-Range Vectorized HD Map Construction

    Authors: Kuang Wu, Sulei Nian, Can Shen, Chuan Yang, Zhanbin Li

    Abstract: This report introduces the first-place winning solution for the Autonomous Grand Challenge 2024 - Mapless Driving. In this report, we introduce a novel online mapping pipeline, LGmap, which is adept at long-range temporal modeling. Firstly, we propose symmetric view transformation (SVT), a hybrid view transformation module. Our approach overcomes the limitations of forward sparse feature representation an…

    Submitted 20 June, 2024; originally announced June 2024.

  18. arXiv:2406.12671  [pdf, other]

    cs.CV

    GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models

    Authors: Yongtao Ge, Guangkai Xu, Zhiyue Zhao, Libo Sun, Zheng Huang, Yanlong Sun, Hao Chen, Chunhua Shen

    Abstract: Recent advances in discriminative and generative pretraining have yielded geometry estimation models with strong generalization capabilities. While discriminative monocular geometry estimation methods rely on large-scale fine-tuning data to achieve zero-shot generalization, several generative-based paradigms show the potential of achieving impressive generalization performance on unseen scenes by…

    Submitted 20 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Code and Benchmark are available at: https://github.com/aim-uofa/GeoBench

  19. arXiv:2406.12196  [pdf, other]

    cs.SE

    CITADEL: Context Similarity Based Deep Learning Framework Bug Finding

    Authors: Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Shiwei Wang, Chao Shen

    Abstract: With deep learning (DL) technology becoming an integral part of the new intelligent software, tools of DL framework testing and bug-finding are in high demand. Existing DL framework testing tools have limited coverage on bug types. For example, they lack the capability of finding performance bugs, which are critical for DL model training and inference regarding performance, economics, and the envi…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 12 pages, 10 figures

  20. arXiv:2406.11548  [pdf, other]

    cs.RO cs.AI cs.CV

    AIC MLLM: Autonomous Interactive Correction MLLM for Robust Robotic Manipulation

    Authors: Chuyan Xiong, Chengyu Shen, Xiaoqi Li, Kaichen Zhou, Jiaming Liu, Ruiping Wang, Hao Dong

    Abstract: The ability to reflect on and correct failures is crucial for robotic systems to interact stably with real-life objects. Observing the generalization and reasoning capabilities of Multimodal Large Language Models (MLLMs), previous approaches have aimed to utilize these models to enhance robotic systems accordingly. However, these methods typically focus on high-level planning corrections using an ad…

    Submitted 23 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  21. arXiv:2406.10584  [pdf, other]

    cs.CL

    Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models

    Authors: Chengzhengxu Li, Xiaoming Liu, Zhaohan Zhang, Yichen Wang, Chen Liu, Yu Lan, Chao Shen

    Abstract: Recent advances in prompt optimization have notably enhanced the performance of pre-trained language models (PLMs) on downstream tasks. However, the potential of optimized prompts on domain generalization has been under-explored. To explore the nature of prompt generalization on unknown domains, we conduct pilot experiments and find that (i) Prompts gaining more attention weight from PLMs' deep la…

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: Submitted to NeurIPS 2024, Preprint, Under review

  22. arXiv:2406.10125  [pdf, other]

    cs.CV

    MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report

    Authors: Zhongyu Yang, Mai Liu, Jinluo Xie, Yueming Zhang, Chen Shen, Wei Shao, Jichao Jiao, Tengfei Xing, Runbo Hu, Pengfei Xu

    Abstract: Autonomous driving without high-definition (HD) maps demands a higher level of active scene understanding. In this competition, the organizers provided the multi-perspective camera images and standard-definition (SD) maps to explore the boundaries of scene reasoning capabilities. We found that most existing algorithms construct Bird's Eye View (BEV) features from these multi-perspective images and…

    Submitted 14 June, 2024; originally announced June 2024.

  23. arXiv:2406.08477  [pdf, other]

    cs.IR

    Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens

    Authors: Ting-Ji Huang, Jia-Qi Yang, Chunxu Shen, Kai-Qi Liu, De-Chuan Zhan, Han-Jia Ye

    Abstract: Characterizing users and items through vector representations is crucial for various tasks in recommender systems. Recent approaches attempt to apply Large Language Models (LLMs) in recommendation through a question and answer format, where real users and items (e.g., Item No.2024) are represented with in-vocabulary tokens (e.g., "item", "20", "24"). However, since LLMs are typically pretrained on…

    Submitted 12 June, 2024; originally announced June 2024.

  24. arXiv:2406.06579  [pdf, other]

    cs.CL cs.AI cs.CV

    From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models

    Authors: Xiaofeng Zhang, Chen Shen, Xiaosong Yuan, Shaotian Yan, Liang Xie, Wenxiao Wang, Chaochen Gu, Hao Tang, Jieping Ye

    Abstract: Recently, multimodal large language models have exploded with an endless variety; most of the popular Large Vision Language Models (LVLMs) depend on sequential visual representation, where images are converted into hundreds or thousands of tokens before being input into the Large Language Model (LLM) along with language prompts. The black-box design hinders the interpretability of visual-language…

    Submitted 13 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  25. arXiv:2406.05810  [pdf, other]

    cs.CV

    ControlLoc: Physical-World Hijacking Attack on Visual Perception in Autonomous Driving

    Authors: Chen Ma, Ningfei Wang, Zhengyu Zhao, Qian Wang, Qi Alfred Chen, Chao Shen

    Abstract: Recent research in adversarial machine learning has focused on visual perception in Autonomous Driving (AD) and has shown that printed adversarial patches can attack object detectors. However, it is important to note that AD visual perception encompasses more than just object detection; it also includes Multiple Object Tracking (MOT). MOT enhances the robustness by compensating for object detectio…

    Submitted 9 June, 2024; originally announced June 2024.

  26. arXiv:2406.05800  [pdf, other]

    cs.CV cs.CR

    SlowPerception: Physical-World Latency Attack against Visual Perception in Autonomous Driving

    Authors: Chen Ma, Ningfei Wang, Zhengyu Zhao, Qi Alfred Chen, Chao Shen

    Abstract: Autonomous Driving (AD) systems critically depend on visual perception for real-time object detection and multiple object tracking (MOT) to ensure safe driving. However, high latency in these visual perception components can lead to significant safety risks, such as vehicle collisions. While previous research has extensively explored latency attacks within the digital realm, translating these meth…

    Submitted 9 June, 2024; originally announced June 2024.

  27. arXiv:2406.04596  [pdf, other]

    cs.LG

    Federated Representation Learning in the Under-Parameterized Regime

    Authors: Renpu Liu, Cong Shen, Jing Yang

    Abstract: Federated representation learning (FRL) is a popular personalized federated learning (FL) framework where clients work together to train a common representation while retaining their personalized heads. Existing studies, however, largely focus on the over-parameterized regime. In this paper, we make the initial efforts to investigate FRL in the under-parameterized regime, where the FL model is ins…

    Submitted 17 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: This work has been accepted to ICML 2024

  28. arXiv:2406.04149  [pdf]

    eess.IV cs.AI

    Characterizing segregation in blast rock piles: a deep-learning approach leveraging aerial image analysis

    Authors: Chengeng Liu, Sihong Liu, Chaomin Shen, Yupeng Gao, Yuxuan Liu

    Abstract: Blasted rock material serves a critical role in various engineering applications, yet the phenomenon of segregation, where particle sizes vary significantly along the gradient of a quarry pile, presents challenges for optimizing quarry material storage and handling. This study introduces an advanced image analysis methodology to characterize such segregation of rock fragments. The accurate delineati…

    Submitted 6 June, 2024; originally announced June 2024.

  29. arXiv:2406.03730  [pdf, other]

    cs.LG cs.AI

    FastGAS: Fast Graph-based Annotation Selection for In-Context Learning

    Authors: Zihan Chen, Song Wang, Cong Shen, Jundong Li

    Abstract: In-context learning (ICL) empowers large language models (LLMs) to tackle new tasks by using a series of training instances as prompts. Since generating the prompts needs to sample from a vast pool of instances and annotate them (e.g., add labels in classification task), existing methods have proposed to select a subset of unlabeled examples for annotation, thus enhancing the quality of prompts an…

    Submitted 6 June, 2024; originally announced June 2024.

  30. arXiv:2406.03726  [pdf]

    cs.LG

    Efficient Graph Encoder Embedding for Large Sparse Graphs in Python

    Authors: Xihan Qin, Cencheng Shen

    Abstract: Graph is a ubiquitous representation of data in various research fields, and graph embedding is a prevalent machine learning technique for capturing key features and generating fixed-sized attributes. However, most state-of-the-art graph embedding methods are computationally and spatially expensive. Recently, the Graph Encoder Embedding (GEE) has been shown as the fastest graph embedding technique…

    Submitted 5 June, 2024; originally announced June 2024.

  31. arXiv:2406.03141  [pdf, other]

    q-bio.BM cs.LG

    Floating Anchor Diffusion Model for Multi-motif Scaffolding

    Authors: Ke Liu, Weian Mao, Shuaike Shen, Xiaoran Jiao, Zheng Sun, Hao Chen, Chunhua Shen

    Abstract: Motif scaffolding seeks to design scaffold structures for constructing proteins with functions derived from the desired motif, which is crucial for the design of vaccines and enzymes. Previous works approach the problem by inpainting or conditional generation. Both of them can only scaffold motifs with fixed positions, and the conditional generation cannot guarantee the presence of motifs. However…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  32. arXiv:2406.02435  [pdf, other]

    cs.CV

    Generative Active Learning for Long-tailed Instance Segmentation

    Authors: Muzhi Zhu, Chengxiang Fan, Hao Chen, Yang Liu, Weian Mao, Xiaogang Xu, Chunhua Shen

    Abstract: Recently, large-scale language-image generative models have gained widespread attention and many works have utilized generated data from these models to further enhance the performance of perception tasks. However, not all generated data can positively impact downstream models, and these methods do not thoroughly explore how to better select and utilize generated data. On the other hand, there is…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  33. arXiv:2406.02189  [pdf, other]

    cs.LG

    Fast and Scalable Multi-Kernel Encoder Classifier

    Authors: Cencheng Shen

    Abstract: This paper introduces a new kernel-based classifier by viewing kernel matrices as generalized graphs and leveraging recent progress in graph embedding techniques. The proposed method facilitates fast and scalable kernel matrix embedding, and seamlessly integrates multiple kernels to enhance the learning process. Our theoretical analysis offers a population-level characterization of this approach u…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 12 pages main + 3 pages appendix

  34. arXiv:2406.00602  [pdf, other]

    cs.SE cs.PL

    From Effectiveness to Efficiency: Comparative Evaluation of Code Generated by LCGMs for Bilingual Programming Questions

    Authors: Weipeng Jiang, Xuanqi Gao, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, Chao Shen

    Abstract: Large Code Generation Models (LCGMs) have garnered significant attention and achieved promising results across various programming tasks. However, concerns arise regarding performance when using non-English prompts, as these models are primarily trained on English-centric corpora, and most programming language tokens resemble English. Existing benchmarks often rely on English programming questions…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 10 and a quarter pages, 6 figures

  35. arXiv:2406.00584  [pdf, other]

    cs.DB cs.AI

    A Blueprint Architecture of Compound AI Systems for Enterprise

    Authors: Eser Kandogan, Sajjadur Rahman, Nikita Bhutani, Dan Zhang, Rafael Li Chen, Kushan Mitra, Sairam Gurajada, Pouya Pezeshkpour, Hayate Iso, Yanlin Feng, Hannah Kim, Chen Shen, Jin Wang, Estevam Hruschka

    Abstract: Large Language Models (LLMs) have showcased remarkable capabilities surpassing conventional NLP challenges, creating opportunities for use in production use cases. Towards this goal, there is a notable shift to building compound AI systems, wherein LLMs are integrated into an expansive software infrastructure with many components like models, retrievers, databases and tools. In this paper, we intr…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Compound AI Systems Workshop at the Data+AI Summit 2024

  36. arXiv:2405.17976  [pdf]

    cs.AI cs.CL

    Yuan 2.0-M32: Mixture of Experts with Attention Router

    Authors: Shaohua Wu, Jiangang Luo, Xi Chen, Lingjun Li, Xudong Zhao, Tong Yu, Chao Wang, Yue Wang, Fei Wang, Weixu Qiao, Houbo He, Zeru Zhang, Zeyu Sun, Junxiong Mao, Chong Shen

    Abstract: Yuan 2.0-M32, with a similar base architecture as Yuan-2.0 2B, uses a mixture-of-experts architecture with 32 experts of which 2 experts are active. A new router network, Attention Router, is proposed and adopted for a more efficient selection of experts, which improves the accuracy compared to the model with classical router network. Yuan 2.0-M32 is trained with 2000B tokens from scratch, and the…

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 14 pages, 3 figures, 7 tables

  37. arXiv:2405.15473  [pdf, other]

    stat.ML cs.LG cs.SI

    Encoder Embedding for General Graph and Node Classification

    Authors: Cencheng Shen

    Abstract: Graph encoder embedding, a recent technique for graph data, offers speed and scalability in producing vertex-level representations from binary graphs. In this paper, we extend the applicability of this method to a general graph model, which includes weighted graphs, distance matrices, and kernel matrices. We prove that the encoder embedding satisfies the law of large numbers and the central limit…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 26 pages

  38. arXiv:2405.14092  [pdf, other]

    cs.CL

    Large Language Models Can Self-Correct with Minimal Effort

    Authors: Zhenyu Wu, Qingkai Zeng, Zhihan Zhang, Zhaoxuan Tan, Chao Shen, Meng Jiang

    Abstract: Intrinsic self-correct was a method that instructed large language models (LLMs) to verify and correct their responses without external feedback. Unfortunately, the study concluded that the LLMs could not self-correct reasoning yet. We find that a simple yet effective verification method can unleash inherent capabilities of the LLMs. That is to mask a key condition in the question, add the current…

    Submitted 23 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: Work in Progress

  39. arXiv:2405.13870  [pdf, other]

    cs.CV

    FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

    Authors: Ganggui Ding, Canyu Zhao, Wen Wang, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen

    Abstract: Benefiting from large-scale pre-trained text-to-image (T2I) generative models, impressive progress has been achieved in customized image generation, which aims to generate user-specified concepts. Existing approaches have extensively focused on single-concept customization and still encounter challenges when it comes to complex scenarios that involve combining multiple concepts. These approaches o…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  40. arXiv:2405.12797  [pdf, other]

    cs.SI stat.ML

    Refined Graph Encoder Embedding via Self-Training and Latent Community Recovery

    Authors: Cencheng Shen, Jonathan Larson, Ha Trinh, Carey E. Priebe

    Abstract: This paper introduces a refined graph encoder embedding method, enhancing the original graph encoder embedding using linear transformation, self-training, and hidden community recovery within observed communities. We provide the theoretical rationale for the refinement procedure, demonstrating how and why our proposed method can effectively identify useful hidden communities via stochastic block m…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 12 pages main + 4 pages appendix

  41. arXiv:2405.11326  [pdf, other]

    cs.LG cs.CV

    On the Trajectory Regularity of ODE-based Diffusion Sampling

    Authors: Defang Chen, Zhenyu Zhou, Can Wang, Chunhua Shen, Siwei Lyu

    Abstract: Diffusion-based generative models use stochastic differential equations (SDEs) and their equivalent ordinary differential equations (ODEs) to establish a smooth connection between a complex data distribution and a tractable prior distribution. In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models. We characterize an implicit denoi…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: ICML 2024, 30 pages

  42. arXiv:2405.11034  [pdf, other]

    cs.LG

    Safety in Graph Machine Learning: Threats and Safeguards

    Authors: Song Wang, Yushun Dong, Binchi Zhang, Zihan Chen, Xingbo Fu, Yinhan He, Cong Shen, Chuxu Zhang, Nitesh V. Chawla, Jundong Li

    Abstract: Graph Machine Learning (Graph ML) has witnessed substantial advancements in recent years. With their remarkable ability to process graph-structured data, Graph ML techniques have been extensively utilized across diverse applications, including critical domains like finance, healthcare, and transportation. Despite their societal benefits, recent research highlights significant safety concerns assoc…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 20 pages

  43. arXiv:2405.10185  [pdf, other]

    cs.CV

    DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data

    Authors: Chengxiang Fan, Muzhi Zhu, Hao Chen, Yang Liu, Weijia Wu, Huaqi Zhang, Chunhua Shen

    Abstract: Instance segmentation is data-hungry, and as model capacity increases, data scale becomes crucial for improving accuracy. Most instance segmentation datasets today require costly manual annotation, limiting their data scale. Models trained on such data are prone to overfitting on the training set, especially for rare categories. While recent works have delved into exploiting generative m…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024; code is available at https://github.com/aim-uofa/DiverGen

  44. arXiv:2405.09138  [pdf, other]

    cs.CV

    OpenGait: A Comprehensive Benchmark Study for Gait Recognition towards Better Practicality

    Authors: Chao Fan, Saihui Hou, Junhao Liang, Chuanfu Shen, Jingzhe Ma, Dongyang Jin, Yongzhen Huang, Shiqi Yu

    Abstract: Gait recognition, a rapidly advancing vision technology for person identification from a distance, has made significant strides in indoor settings. However, evidence suggests that existing methods often yield unsatisfactory results when applied to newly released real-world gait datasets. Furthermore, conclusions drawn from indoor gait datasets may not easily generalize to outdoor ones. Therefore,…

    Submitted 15 May, 2024; originally announced May 2024.

  45. arXiv:2405.05929  [pdf]

    cs.HC

    Understanding Emotional Hijacking in Metaverse

    Authors: Syed Ali Asif, Philip Gable, Chien-Chung Shen, Yan-Ming Chiou

    Abstract: Emotions are an integral part of being human, and experiencing a range of emotions is what makes life rich and vibrant. From basic emotions like anger, fear, happiness, and sadness to more complex ones like excitement and grief, emotions help us express ourselves and connect with the world around us. In recent years, researchers have begun adopting virtual reality (VR) technology to evoke emotions…

    Submitted 23 April, 2024; originally announced May 2024.

    Comments: This is an accepted position statement of CHI 2024 Workshop (Novel Approaches for Understanding and Mitigating Emerging New Harms in Immersive and Embodied Virtual Spaces: A Workshop at CHI 2024)

  46. arXiv:2405.05919  [pdf]

    cs.HC

    Protecting Human Users Against Cognitive Attacks in Immersive Environments

    Authors: Yan-Ming Chiou, Bob Price, Chien-Chung Shen, Syed Ali Asif

    Abstract: Integrating mixed reality (MR) with artificial intelligence (AI) technologies, including vision, language, audio, reasoning, and planning, enables the AI-powered MR assistant [1] to substantially elevate human efficiency. This enhancement comes from situational awareness, quick access to essential information, and support in learning new skills in the right context throughout everyday tasks. This…

    Submitted 23 April, 2024; originally announced May 2024.

    Comments: This is an accepted position statement of CHI 2024 Workshop (Novel Approaches for Understanding and Mitigating Emerging New Harms in Immersive and Embodied Virtual Spaces: A Workshop at CHI 2024)

  47. arXiv:2405.05918  [pdf]

    cs.HC

    Safeguarding People's Financial Health in Metaverse with Emotionally Intelligent Virtual Buddy

    Authors: Syed Ali Asif, Emma Cao, Hang Chen, Chien-Chung Shen, Yan-Ming Chiou

    Abstract: The Metaverse, an immersive virtual world, has emerged as a shared space where people engage in various activities ranging from social interactions to commerce. Cryptocurrencies [3] and Non-Fungible Tokens (NFTs) [6] play pivotal roles within this virtual realm, reshaping interactions and transactions. Cryptocurrencies, utilizing cryptographic techniques for security, enable decentralized and secu…

    Submitted 23 April, 2024; originally announced May 2024.

    Comments: This is an accepted position statement of CHI 2024 Workshop (Novel Approaches for Understanding and Mitigating Emerging New Harms in Immersive and Embodied Virtual Spaces: A Workshop at CHI 2024)

  48. arXiv:2404.19652  [pdf, other]

    cs.CV cs.AI

    VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization

    Authors: Yuliang Liu, Mingxin Huang, Hao Yan, Linger Deng, Weijia Wu, Hao Lu, Chunhua Shen, Lianwen Jin, Xiang Bai

    Abstract: Text spotting, a task involving the extraction of textual information from image or video sequences, faces challenges in cross-domain adaption, such as image-to-image and image-to-video generalization. In this paper, we introduce a new method, termed VimTS, which enhances the generalization ability of the model by achieving better synergy among different tasks. Typically, we propose a Prompt Queri…

    Submitted 14 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  49. arXiv:2404.19335  [pdf, other]

    cs.CL

    StablePT: Towards Stable Prompting for Few-shot Learning via Input Separation

    Authors: Xiaoming Liu, Chen Liu, Zhaohan Zhang, Chengzhengxu Li, Longtian Wang, Yu Lan, Chao Shen

    Abstract: Large language models have shown their ability to become effective few-shot learners with prompting, revolutionizing the paradigm of learning with data scarcity. However, this approach largely depends on the quality of prompt initialization, and always exhibits large variability among different runs. This property makes prompt tuning highly unreliable and vulnerable to poorly constructed prompts, which…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Submitted to ACL 2024

  50. arXiv:2404.17871  [pdf, other]

    cs.SE cs.AI

    A Survey of Deep Learning Library Testing Methods

    Authors: Xiaoyu Zhang, Weipeng Jiang, Chao Shen, Qi Li, Qian Wang, Chenhao Lin, Xiaohong Guan

    Abstract: In recent years, software systems powered by deep learning (DL) techniques have significantly facilitated people's lives in many aspects. As the backbone of these DL systems, various DL libraries undertake the underlying optimization and computation. However, like traditional software, DL libraries are not immune to bugs, which can pose serious threats to users' personal property and safety. Study…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 34 pages, 8 figures, 4 tables