Skip to main content

Showing 1–50 of 1,339 results for author: Yu, J

  1. arXiv:2407.13515  [pdf, other

    cs.HC

    CookAR: Affordance Augmentations in Wearable AR to Support Kitchen Tool Interactions for People with Low Vision

    Authors: Jaewook Lee, Andrew D. Tjahjadi, Jiho Kim, Junpu Yu, Minji Park, Jiawen Zhang, Jon E. Froehlich, Yapeng Tian, Yuhang Zhao

    Abstract: Cooking is a central activity of daily living, supporting independence and both mental and physical health. However, prior work has highlighted key barriers for people with low vision (LV) to cook, particularly around safely interacting with cooking tools, such as sharp knives or hot pans. Drawing on recent advancements in computer vision (CV) and robotics, we present CookAR, a head-mounted AR sys… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.13335  [pdf, other

    cs.CV

    OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction

    Authors: Yini Fang, Jingling Yu, Haozheng Zhang, Ralf van der Lans, Bertram Shi

    Abstract: Visual search is important in our daily life. The efficient allocation of visual attention is critical to effectively complete visual search tasks. Prior research has predominantly modelled the spatial allocation of visual attention in images at the pixel level, e.g. using a saliency map. However, emerging evidence shows that visual attention is guided by objects rather than pixel intensities. Thi… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted in ECCV 2024

  3. arXiv:2407.12887  [pdf, other

    cs.RO

    Self-Adaptive Robust Motion Planning for High DoF Robot Manipulator using Deep MPC

    Authors: Ye Zhang, Kangtong Mo, Fangzhou Shen, Xuanzhen Xu, Xingyu Zhang, Jiayue Yu, Chang Yu

    Abstract: In contemporary control theory, self-adaptive methodologies are highly esteemed for their inherent flexibility and robustness in managing modeling uncertainties. Particularly, robust adaptive control stands out owing to its potent capability of leveraging robust optimization algorithms to approximate cost functions and relax the stringent constraints often associated with conventional self-adaptiv… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  4. arXiv:2407.12857  [pdf, other

    cs.CL cs.DL cs.IR

    Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis

    Authors: Jianxiang Yu, Zichen Ding, Jiaqi Tan, Kangyang Luo, Zhenmin Weng, Chenghua Gong, Long Zeng, Renjing Cui, Chengcheng Han, Qiushi Sun, Zhiyong Wu, Yunshi Lan, Xiang Li

    Abstract: In recent years, the rapid increase in scientific papers has overwhelmed traditional review mechanisms, resulting in varying quality of publications. Although existing methods have explored the capabilities of Large Language Models (LLMs) for automated scientific reviewing, their generated contents are often generic or partial. To address the issues above, we introduce an automated paper reviewing… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  5. arXiv:2407.12019  [pdf, other

    cs.CL cs.AI

    DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model

    Authors: Shezheng Song, Shasha Li, Jie Yu, Shan Zhao, Xiaopeng Li, Jun Ma, Xiaodong Liu, Zhuo Li, Xiaoguang Mao

    Abstract: Our study delves into Multimodal Entity Linking, aligning the mention in multimodal information with entities in knowledge base. Existing methods are still facing challenges like ambiguous entity representations and limited image information utilization. Thus, we propose dynamic entity extraction using ChatGPT, which dynamically extracts entities and enhances datasets. We also propose a method: Dy… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Published on PRCV24

  6. arXiv:2407.11736  [pdf, other

    cs.RO cs.CV

    GV-Bench: Benchmarking Local Feature Matching for Geometric Verification of Long-term Loop Closure Detection

    Authors: Jingwen Yu, Hanjing Ye, Jianhao Jiao, Ping Tan, Hong Zhang

    Abstract: Visual loop closure detection is an important module in visual simultaneous localization and mapping (SLAM), which associates current camera observation with previously visited places. Loop closures correct drifts in trajectory estimation to build a globally consistent map. However, a false loop closure can be fatal, so verification is required as an additional step to ensure robustness by rejecti… ▽ More

    Submitted 16 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 9 pages, 11 figures, Accepted by IROS(2024)

  7. arXiv:2407.11624  [pdf, other

    cs.LG cs.AI cs.CY

    Rethinking Fair Graph Neural Networks from Re-balancing

    Authors: Zhixun Li, Yushun Dong, Qiang Liu, Jeffrey Xu Yu

    Abstract: Driven by the powerful representation ability of Graph Neural Networks (GNNs), plentiful GNN models have been widely deployed in many real-world applications. Nevertheless, due to distribution disparities between different demographic groups, fairness in high-stake decision-making systems is receiving increasing attention. Although lots of recent works devoted to improving the fairness of GNNs and… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by SIGKDD 2024, research track

  8. arXiv:2407.11466  [pdf, other

    cs.CY

    Navigating the Data Trading Crossroads: An Interdisciplinary Survey

    Authors: Yi Yu, Jingru Yu, Xuhong Wang, Juanjuan Li, Yilun Lin, Conghui He, Yanqing Yang, Yu Qiao, Li Li, Fei-Yue Wang

    Abstract: Data has been increasingly recognized as a critical factor in the future economy. However, constructing an efficient data trading market faces challenges such as privacy breaches, data monopolies, and misuse. Despite numerous studies proposing algorithms to protect privacy and methods for pricing data, a comprehensive understanding of these issues and systemic solutions remain elusive. This paper… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  9. arXiv:2407.11100  [pdf, other

    cs.CR cs.AI cs.CL

    Building Intelligence Identification System via Large Language Model Watermarking: A Survey and Beyond

    Authors: Xuhong Wang, Haoyu Jiang, Yi Yu, Jingru Yu, Yilun Lin, Ping Yi, Yingchun Wang, Qiao Yu, Li Li, Fei-Yue Wang

    Abstract: Large Language Models (LLMs) are increasingly integrated into diverse industries, posing substantial security risks due to unauthorized replication and misuse. To mitigate these concerns, robust identification mechanisms are widely acknowledged as an effective strategy. Identification systems for LLMs now rely heavily on watermarking technology to manage and protect intellectual property and ensur… ▽ More

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 59 pages, 7 figures

  10. arXiv:2407.10806  [pdf, other

    cs.CV

    Enhancing Robustness to Noise Corruption for Point Cloud Model via Spatial Sorting and Set-Mixing Aggregation Module

    Authors: Dingxin Zhang, Jianhui Yu, Tengfei Xue, Chaoyi Zhang, Dongnan Liu, Weidong Cai

    Abstract: Current models for point cloud recognition demonstrate promising performance on synthetic datasets. However, real-world point cloud data inevitably contains noise, impacting model robustness. While recent efforts focus on enhancing robustness through various strategies, there still remains a gap in comprehensive analyzes from the standpoint of network architecture design. Unlike traditional method… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 22 pages, 9 figures

  11. arXiv:2407.10204  [pdf, other

    cs.LG

    Improving Graph Out-of-distribution Generalization on Real-world Data

    Authors: Can Xu, Yao Cheng, Jianxiang Yu, Haosen Wang, Jingsong Lv, Xiang Li

    Abstract: Existing methods for graph out-of-distribution (OOD) generalization primarily rely on empirical studies on synthetic datasets. Such approaches tend to overemphasize the causal relationships between invariant sub-graphs and labels, thereby neglecting the non-negligible role of environment in real-world scenarios. In contrast to previous studies that impose rigid independence assumptions on environm… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 21 pages, 5 figures

  12. FedVAE: Trajectory privacy preserving based on Federated Variational AutoEncoder

    Authors: Yuchen Jiang, Ying Wu, Shiyao Zhang, James J. Q. Yu

    Abstract: The use of trajectory data with abundant spatial-temporal information is pivotal in Intelligent Transport Systems (ITS) and various traffic system tasks. Location-Based Services (LBS) capitalize on this trajectory data to offer users personalized services tailored to their location information. However, this trajectory data contains sensitive information about users' movement patterns and habits,… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 2023 IEEE 98th Vehicular Technology Conference

  13. Detect Llama -- Finding Vulnerabilities in Smart Contracts using Large Language Models

    Authors: Peter Ince, Xiapu Luo, Jiangshan Yu, Joseph K. Liu, Xiaoning Du

    Abstract: In this paper, we test the hypothesis that although OpenAI's GPT-4 performs well generally, we can fine-tune open-source models to outperform GPT-4 in smart contract vulnerability detection. We fine-tune two models from Meta's Code Llama and a dataset of 17k prompts, Detect Llama - Foundation and Detect Llama - Instruct, and we also fine-tune OpenAI's GPT-3.5 Turbo model (GPT-3.5FT). We then evalu… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  14. arXiv:2407.06770  [pdf, other

    cs.RO

    Pretraining-finetuning Framework for Efficient Co-design: A Case Study on Quadruped Robot Parkour

    Authors: Ci Chen, Jiyu Yu, Haojian Lu, Hongbo Gao, Rong Xiong, Yue Wang

    Abstract: In nature, animals with exceptional locomotion abilities, such as cougars, often possess asymmetric fore and hind legs, with their powerful hind legs acting as reservoirs of energy for leaps. This observation inspired us: could optimize the leg length of quadruped robots endow them with similar locomotive capabilities? In this paper, we propose an approach that co-optimizes the mechanical structur… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  15. Consistency and Discrepancy-Based Contrastive Tripartite Graph Learning for Recommendations

    Authors: Linxin Guo, Yaochen Zhu, Min Gao, Yinghui Tao, Junliang Yu, Chen Chen

    Abstract: Tripartite graph-based recommender systems markedly diverge from traditional models by recommending unique combinations such as user groups and item bundles. Despite their effectiveness, these systems exacerbate the longstanding cold-start problem in traditional recommender systems, because any number of user groups or item bundles can be formed among users or items. To address this issue, we intr… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  16. arXiv:2407.05082  [pdf, other

    cs.LG cs.AI cs.CV stat.ML

    DMTG: One-Shot Differentiable Multi-Task Grouping

    Authors: Yuan Gao, Shuguo Jiang, Moran Li, Jin-Gang Yu, Gui-Song Xia

    Abstract: We aim to address Multi-Task Learning (MTL) with a large number of tasks by Multi-Task Grouping (MTG). Given N tasks, we propose to simultaneously identify the best task groups from 2^N candidates and train the model weights simultaneously in one-shot, with the high-order task-affinity fully exploited. This is distinct from the pioneering methods which sequentially identify the groups and train th… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted to ICML 2024

    Journal ref: International Conference on Machine Learning (ICML), 2024

  17. arXiv:2407.05008  [pdf, other

    cs.CV

    T-CorresNet: Template Guided 3D Point Cloud Completion with Correspondence Pooling Query Generation Strategy

    Authors: Fan Duan, Jiahao Yu, Li Chen

    Abstract: Point clouds are commonly used in various practical applications such as autonomous driving and the manufacturing industry. However, these point clouds often suffer from incompleteness due to limited perspectives, scanner resolution and occlusion. Therefore the prediction of missing parts performs a crucial task. In this paper, we propose a novel method for point cloud completion. We utilize a sph… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  18. arXiv:2407.03188  [pdf, other

    cs.SD cs.AI cs.MM eess.AS

    MuDiT & MuSiT: Alignment with Colloquial Expression in Description-to-Song Generation

    Authors: Zihao Wang, Haoxuan Liu, Jiaxing Yu, Tao Zhang, Yan Liu, Kejun Zhang

    Abstract: Amid the rising intersection of generative AI and human artistic processes, this study probes the critical yet less-explored terrain of alignment in human-centric automatic song composition. We propose a novel task of Colloquial Description-to-Song Generation, which focuses on aligning the generated content with colloquial human expressions. This task is aimed at bridging the gap between colloquia… ▽ More

    Submitted 10 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: 19 pages, 5 figures

    MSC Class: 68Txx(Primary)14F05; 91Fxx(Secondary) ACM Class: I.2.7; J.5

  19. arXiv:2407.00995  [pdf, other

    cs.CY eess.SY physics.app-ph

    Data on the Move: Traffic-Oriented Data Trading Platform Powered by AI Agent with Common Sense

    Authors: Yi Yu, Shengyue Yao, Tianchen Zhou, Yexuan Fu, Jingru Yu, Ding Wang, Xuhong Wang, Cen Chen, Yilun Lin

    Abstract: In the digital era, data has become a pivotal asset, advancing technologies such as autonomous driving. Despite this, data trading faces challenges like the absence of robust pricing methods and the lack of trustworthy trading mechanisms. To address these challenges, we introduce a traffic-oriented data trading platform named Data on The Move (DTM), integrating traffic simulation, data trading, an… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  20. arXiv:2406.19226  [pdf, other

    cs.CL cs.HC

    Simulating Classroom Education with LLM-Empowered Agents

    Authors: Zheyuan Zhang, Daniel Zhang-Li, Jifan Yu, Linlu Gong, Jinchang Zhou, Zhiyuan Liu, Lei Hou, Juanzi Li

    Abstract: Large language models (LLMs) have been employed in various intelligent educational tasks to assist teaching. While preliminary explorations have focused on independent LLM-empowered agents for specific educational tasks, the potential for LLMs within a multi-agent collaborative framework to simulate a classroom with real user participation remains unexplored. In this work, we propose SimClass, a m… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  21. arXiv:2406.17219  [pdf, other

    cs.CV

    Facial Identity Anonymization via Intrinsic and Extrinsic Attention Distraction

    Authors: Zhenzhong Kuang, Xiaochen Yang, Yingjie Shen, Chao Hu, Jun Yu

    Abstract: The unprecedented capture and application of face images raise increasing concerns on anonymization to fight against privacy disclosure. Most existing methods may suffer from the problem of excessive change of the identity-independent information or insufficient identity protection. In this paper, we present a new face anonymization approach by distracting the intrinsic and extrinsic identity atte… ▽ More

    Submitted 6 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: Zhenzhong Kuang, Xiaochen Yang, Yingjie Shen, Chao Hu, Jun Yu; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 12406-12415 Date of Conference: 17-21 June 2024 Conference Location: Seattle, USA

  22. arXiv:2406.16473  [pdf, other

    cs.CV cs.AI

    Seeking Certainty In Uncertainty: Dual-Stage Unified Framework Solving Uncertainty in Dynamic Facial Expression Recognition

    Authors: Haoran Wang, Xinji Mai, Zeng Tao, Xuan Tong, Junxiong Lin, Yan Wang, Jiawen Yu, Boyang Wang, Shaoqi Yan, Qing Zhao, Ziheng Zhou, Shuyong Gao, Wenqiang Zhang

    Abstract: The contemporary state-of-the-art of Dynamic Facial Expression Recognition (DFER) technology facilitates remarkable progress by deriving emotional mappings of facial expressions from video content, underpinned by training on voluminous datasets. Yet, the DFER datasets encompass a substantial volume of noise data. Noise arises from low-quality captures that defy logical labeling, and instances that… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  23. arXiv:2406.16459  [pdf, other

    cs.CV

    Suppressing Uncertainties in Degradation Estimation for Blind Super-Resolution

    Authors: Junxiong Lin, Zeng Tao, Xuan Tong, Xinji Mai, Haoran Wang, Boyang Wang, Yan Wang, Qing Zhao, Jiawen Yu, Yuxuan Lin, Shaoqi Yan, Shuyong Gao, Wenqiang Zhang

    Abstract: The problem of blind image super-resolution aims to recover high-resolution (HR) images from low-resolution (LR) images with unknown degradation modes. Most existing methods model the image degradation process using blur kernels. However, this explicit modeling approach struggles to cover the complex and varied degradation processes encountered in the real world, such as high-order combinations of… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  24. arXiv:2406.16434  [pdf, other

    cs.CV

    Multi-threshold Deep Metric Learning for Facial Expression Recognition

    Authors: Wenwu Yang, Jinyi Yu, Tuo Chen, Zhenguang Liu, Xun Wang, Jianbing Shen

    Abstract: Effective expression feature representations generated by a triplet-based deep metric learning are highly advantageous for facial expression recognition (FER). The performance of triplet-based deep metric learning is contingent upon identifying the best threshold for triplet loss. Threshold validation, however, is tough and challenging, as the ideal threshold changes among datasets and even across… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: accepted by Pattern Recognition

  25. arXiv:2406.15695  [pdf, other

    cs.CL

    SS-Bench: A Benchmark for Social Story Generation and Evaluation

    Authors: Yi Feng, Mingyang Song, Jiaqi Wang, Mao Zheng, Liping Jing, Jian Yu

    Abstract: Children with Autism Spectrum Disorder (ASD) often misunderstand social situations and struggle to participate in daily routines. Psychology experts write Social Stories under strict constraints of structural clarity, descriptive orientation, and situational safety to enhance their abilities in these regimes. However, Social Stories are costly in creation and often limited in diversity and timelin… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  26. arXiv:2406.15000  [pdf, other

    cs.CL cs.AI

    Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations

    Authors: Lichao Zhang, Jia Yu, Shuai Zhang, Long Li, Yangyang Zhong, Guanbao Liang, Yuming Yan, Qing Ma, Fangsheng Weng, Fayu Pan, Jing Li, Renjun Xu, Zhenzhong Lan

    Abstract: Large Language Models (LLMs) have significantly advanced user-bot interactions, enabling more complex and coherent dialogues. However, the prevalent text-only modality might not fully exploit the potential for effective user engagement. This paper explores the impact of multi-modal interactions, which incorporate images and audio alongside text, on user engagement in chatbot conversations. We cond… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  27. arXiv:2406.14991  [pdf, other

    cs.CL cs.SE

    SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation

    Authors: Zeyao Ma, Bohan Zhang, Jing Zhang, Jifan Yu, Xiaokang Zhang, Xiaohan Zhang, Sijia Luo, Xi Wang, Jie Tang

    Abstract: We introduce SpreadsheetBench, a challenging spreadsheet manipulation benchmark exclusively derived from real-world scenarios, designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users. Unlike existing benchmarks that rely on synthesized queries and simplified spreadsheet files, SpreadsheetBench is built from 912 real questions gathered from online Excel… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Homepage: https://spreadsheetbench.github.io/

  28. arXiv:2406.14274  [pdf, other

    cs.CV cs.LG

    Learning to Discover Knowledge: A Weakly-Supervised Partial Domain Adaptation Approach

    Authors: Mengcheng Lan, Min Meng, Jun Yu, Jigang Wu

    Abstract: Domain adaptation has shown appealing performance by leveraging knowledge from a source domain with rich annotations. However, for a specific target task, it is cumbersome to collect related and high-quality source domains. In real-world scenarios, large-scale datasets corrupted with noisy labels are easy to collect, stimulating a great demand for automatic recognition in a generalized setting, i.… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted to TIP 2024. Code available: https://github.com/mc-lan/SP-TCL

  29. arXiv:2406.14264  [pdf, other

    eess.IV cs.CV

    Zero-Shot Image Denoising for High-Resolution Electron Microscopy

    Authors: Xuanyu Tian, Zhuoya Dong, Xiyue Lin, Yue Gao, Hongjiang Wei, Yanhang Ma, Jingyi Yu, Yuyao Zhang

    Abstract: High-resolution electron microscopy (HREM) imaging technique is a powerful tool for directly visualizing a broad range of materials in real-space. However, it faces challenges in denoising due to ultra-low signal-to-noise ratio (SNR) and scarce data availability. In this work, we propose Noise2SR, a zero-shot self-supervised learning (ZS-SSL) denoising framework for HREM. Within our framework, we… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 12 figures

  30. arXiv:2406.13897  [pdf, other

    cs.CV

    CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

    Authors: Longwen Zhang, Ziyu Wang, Qixuan Zhang, Qiwei Qiu, Anqi Pang, Haoran Jiang, Wei Yang, Lan Xu, Jingyi Yu

    Abstract: In the realm of digital creativity, our potential to craft intricate 3D worlds from imagination is often hampered by the limitations of existing digital tools, which demand extensive expertise and efforts. To narrow this disparity, we introduce CLAY, a 3D geometry and material generator designed to effortlessly transform human imagination into intricate 3D digital structures. CLAY supports classic… ▽ More

    Submitted 30 May, 2024; originally announced June 2024.

    Comments: Project page: https://sites.google.com/view/clay-3dlm Video: https://youtu.be/YcKFp4U2Voo

  31. arXiv:2406.13665  [pdf, ps, other

    cs.LG

    Challenges in Binary Classification

    Authors: Pengbo Yang, Jian Yu

    Abstract: Binary Classification plays an important role in machine learning. For linear classification, SVM is the optimal binary classification method. For nonlinear classification, the SVM algorithm needs to complete the classification task by using the kernel function. Although the SVM algorithm with kernel function is very effective, the selection of kernel function is empirical, which means that the ke… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  32. arXiv:2406.13092  [pdf, other

    cs.CL

    Multilingual Synopses of Movie Narratives: A Dataset for Story Understanding

    Authors: Yidan Sun, Jianfei Yu, Boyang Li

    Abstract: Story video-text alignment, a core task in computational story understanding, aims to align video clips with corresponding sentences in their descriptions. However, progress on the task has been held back by the scarcity of manually annotated video-text correspondence and the heavy concentration on English narrations of Hollywood movies. To address these issues, in this paper, we construct a large… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 16 pages, 9 figures

  33. arXiv:2406.11682  [pdf, other

    cs.CL cs.AI cs.CR

    Knowledge-to-Jailbreak: One Knowledge Point Worth One Attack

    Authors: Shangqing Tu, Zhuoran Pan, Wenxuan Wang, Zhexin Zhang, Yuliang Sun, Jifan Yu, Hongning Wang, Lei Hou, Juanzi Li

    Abstract: Large language models (LLMs) have been increasingly applied to various domains, which triggers increasing concerns about LLMs' safety on specialized domains, e.g. medicine. However, testing the domain-specific safety of LLMs is challenging due to the lack of domain knowledge-driven attacks in existing benchmarks. To bridge this gap, we propose a new task, knowledge-to-jailbreak, which aims to gene… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 18 pages, 14 figures, 11 tables

  34. R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models

    Authors: Shangqing Tu, Yuanchun Wang, Jifan Yu, Yuyang Xie, Yaran Shi, Xiaozhi Wang, Jing Zhang, Lei Hou, Juanzi Li

    Abstract: Large language models have achieved remarkable success on general NLP tasks, but they may fall short for domain-specific problems. Recently, various Retrieval-Augmented Large Language Models (RALLMs) are proposed to address this shortcoming. However, existing evaluation tools only provide a few baselines and evaluate them on various domains without mining the depth of domain knowledge. In this pap… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 12 pages, 9 figures, Accepted by KDD2024

  35. arXiv:2406.11342  [pdf, other

    cs.MA

    KAOS: Large Model Multi-Agent Operating System

    Authors: Zhao Zhuo, Rongzhen Li, Kai Liu, Huhai Zou, KaiMao Li, Jie Yu, Tianhao Sun, Qingbo Wu

    Abstract: The intelligent interaction model based on large models reduces the differences in user experience across various system platforms but faces challenges in multi-agent collaboration and resource sharing. To demonstrate a uniform user experience across different foundational software platforms and address resource coordination management challenges, this paper proposes a multi-agent operating system… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  36. arXiv:2406.10527  [pdf, other

    cs.CV

    Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center

    Authors: Zichen Yu, Changyong Shu, Qianpu Sun, Junjie Linghu, Xiaobao Wei, Jiangyong Yu, Zongdai Liu, Dawei Yang, Hui Li, Yan Chen

    Abstract: Panoptic occupancy poses a novel challenge by aiming to integrate instance occupancy and semantic occupancy within a unified framework. However, there is still a lack of efficient solutions for panoptic occupancy. In this paper, we propose Panoptic-FlashOcc, a straightforward yet robust 2D feature framework that enables realtime panoptic occupancy. Building upon the lightweight design of FlashOcc,… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  37. arXiv:2406.10057  [pdf, other

    cs.CV cs.AI

    First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models

    Authors: Enming Zhang, Ruobing Yao, Huanyong Liu, Junhui Yu, Jiale Wang

    Abstract: With the development of Multimodal Large Language Models (MLLMs) technology, its general capabilities are increasingly powerful. To evaluate the various abilities of MLLMs, numerous evaluation systems have emerged. But now there is still a lack of a comprehensive method to evaluate MLLMs in the tasks related to flowcharts, which are very important in daily life and work. We propose the first compr… ▽ More

    Submitted 18 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

  38. arXiv:2406.08587  [pdf, other

    cs.CL cs.AI cs.LG

    CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

    Authors: Xiaoshuai Song, Muxi Diao, Guanting Dong, Zhengyang Wang, Yujia Fu, Runqi Qiao, Zhexu Wang, Dayuan Fu, Huangxuan Wu, Bin Liang, Weihao Zeng, Yejie Wang, Zhuoma GongQue, Jianing Yu, Qiuna Tan, Weiran Xu

    Abstract: Computer Science (CS) stands as a testament to the intricacies of human intelligence, profoundly advancing the development of artificial intelligence and modern society. However, the current community of large language models (LLMs) overly focuses on benchmarks for analyzing specific foundational skills (e.g. mathematics and code generation), neglecting an all-round evaluation of the computer scie… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Work in progress

  39. arXiv:2406.08418  [pdf, other

    cs.CV cs.AI

    OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

    Authors: Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang , et al. (15 additional authors not shown)

    Abstract: Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids multimodal in-context learning and maintains the capabilities of large language models during multimodal fine-tuning. However, the limited scale an… ▽ More

    Submitted 12 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  40. arXiv:2406.07268  [pdf, other

    cs.MM cs.CL cs.CV

    Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation

    Authors: Jinyuan Li, Ziyan Li, Han Li, Jianfei Yu, Rui Xia, Di Sun, Gang Pan

    Abstract: Grounded Multimodal Named Entity Recognition (GMNER) task aims to identify named entities, entity types and their corresponding visual regions. GMNER task exhibits two challenging attributes: 1) The tenuous correlation between images and text on social media contributes to a notable proportion of named entities being ungroundable. 2) There exists a distinction between coarse-grained noun phrases u… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Extension of our Findings of EMNLP 2023 & ACL 2024 paper

  41. arXiv:2406.06474  [pdf, other

    cs.AI cs.CL

    Towards a Personal Health Large Language Model

    Authors: Justin Cosentino, Anastasiya Belyaeva, Xin Liu, Nicholas A. Furlotte, Zhun Yang, Chace Lee, Erik Schenck, Yojan Patel, Jian Cui, Logan Douglas Schneider, Robby Bryant, Ryan G. Gomes, Allen Jiang, Roy Lee, Yun Liu, Javier Perez, Jameson K. Rogers, Cathy Speed, Shyam Tailor, Megan Walker, Jeffrey Yu, Tim Althoff, Conor Heneghan, John Hernandez, Mark Malhotra , et al. (9 additional authors not shown)

    Abstract: In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 72 pages

  42. arXiv:2406.05837  [pdf, other

    cs.CV cs.AI

    Solution for CVPR 2024 UG2+ Challenge Track on All Weather Semantic Segmentation

    Authors: Jun Yu, Yunxiang Zhang, Fengzhao Sun, Leilei Wang, Renjie Lu

    Abstract: In this report, we present our solution for the semantic segmentation in adverse weather, in UG2+ Challenge at CVPR 2024. To achieve robust and accurate segmentation results across various weather conditions, we initialize the InternImage-H backbone with pre-trained weights from the large-scale joint dataset and enhance it with the state-of-the-art Upernet segmentation method. Specifically, we uti… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Solution for CVPR 2024 UG2+ Challenge Track on All Weather Semantic Segmentation

  43. arXiv:2406.05745  [pdf, other

    stat.ML cs.AI cs.LG

    Structured Learning of Compositional Sequential Interventions

    Authors: Jialin Yu, Andreas Koukorinis, Nicolò Colombo, Yuchen Zhu, Ricardo Silva

    Abstract: We consider sequential treatment regimes where each unit is exposed to combinations of interventions over time. When interventions are described by qualitative labels, such as ``close schools for a month due to a pandemic'' or ``promote this podcast to this user during this week'', it is unclear which appropriate structural assumptions allow us to generalize behavioral predictions to previously un… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  44. arXiv:2406.05433  [pdf, other

    cs.NE

    Large Language Model Assisted Adversarial Robustness Neural Architecture Search

    Authors: Rui Zhong, Yang Cao, Jun Yu, Masaharu Munetomo

    Abstract: Motivated by the potential of large language models (LLMs) as optimizers for solving combinatorial optimization problems, this paper proposes a novel LLM-assisted optimizer (LLMO) to address adversarial robustness neural architecture search (ARNAS), a specific application of combinatorial optimization. We design the prompt using the standard CRISPE framework (i.e., Capacity and Role, Insight, Stat… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted by The 6th International Conference on Data-driven Optimization of Complex Systems (DOCS)

  45. arXiv:2406.05354  [pdf, other

    cs.AR cs.AI cs.DC

    Investigating Memory Failure Prediction Across CPU Architectures

    Authors: Qiao Yu, Wengui Zhang, Min Zhou, Jialiang Yu, Zhenli Sheng, Jasmin Bogatinovski, Jorge Cardoso, Odej Kao

    Abstract: Large-scale datacenters often experience memory failures, where Uncorrectable Errors (UEs) highlight critical malfunction in Dual Inline Memory Modules (DIMMs). Existing approaches primarily utilize Correctable Errors (CEs) to predict UEs, yet they typically neglect how these errors vary between different CPU architectures, especially in terms of Error Correction Code (ECC) applicability. In this… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted by 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Industry Track

  46. arXiv:2406.03866  [pdf, other

    cs.CV

    LLplace: The 3D Indoor Scene Layout Generation and Editing via Large Language Model

    Authors: Yixuan Yang, Junru Lu, Zixiang Zhao, Zhen Luo, James J. Q. Yu, Victor Sanchez, Feng Zheng

    Abstract: Designing 3D indoor layouts is a crucial task with significant applications in virtual reality, interior design, and automated space planning. Existing methods for 3D layout design either rely on diffusion models, which utilize spatial relationship priors, or heavily leverage the inferential capabilities of proprietary Large Language Models (LLMs), which require extensive prompt engineering and in… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  47. arXiv:2406.01514  [pdf, other

    cs.CL cs.AI cs.CR

    Decoupled Alignment for Robust Plug-and-Play Adaptation

    Authors: Haozheng Luo, Jiahao Yu, Wenxin Zhang, Jialong Li, Jerry Yao-Chieh Hu, Xinyu Xing, Han Liu

    Abstract: We introduce a low-resource safety enhancement method for aligning large language models (LLMs) without the need for supervised fine-tuning (SFT) or reinforcement learning from human feedback (RLHF). Our main idea is to exploit knowledge distillation to extract the alignment information from existing well-aligned LLMs and integrate it into unaligned LLMs in a plug-and-play fashion. Methodology, we… ▽ More

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  48. arXiv:2406.01417  [pdf, other

    cs.LG cs.CV

    Mixup Augmentation with Multiple Interpolations

    Authors: Lifeng Shen, Jincheng Yu, Hansi Yang, James T. Kwok

    Abstract: Mixup and its variants form a popular class of data augmentation techniques.Using a random sample pair, it generates a new sample by linear interpolation of the inputs and labels. However, generating only one single interpolation may limit its augmentation ability. In this paper, we propose a simple yet effective extension called multi-mix, which generates multiple interpolations from a sample pai… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  49. arXiv:2406.01022  [pdf, other

    cs.CR cs.IR

    Poisoning Attacks and Defenses in Recommender Systems: A Survey

    Authors: Zongwei Wang, Junliang Yu, Min Gao, Wei Yuan, Guanhua Ye, Shazia Sadiq, Hongzhi Yin

    Abstract: Modern recommender systems (RS) have profoundly enhanced user experience across digital platforms, yet they face significant threats from poisoning attacks. These attacks, aimed at manipulating recommendation outputs for unethical gains, exploit vulnerabilities in RS through injecting malicious data or intervening model training. This survey presents a unique perspective by examining these threats… ▽ More

    Submitted 5 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: 22 pages, 8 figures

  50. arXiv:2406.00987  [pdf, other

    cs.LG cs.CY cs.SI

    Enhancing Fairness in Unsupervised Graph Anomaly Detection through Disentanglement

    Authors: Wenjing Chang, Kay Liu, Philip S. Yu, Jianjun Yu

    Abstract: Graph anomaly detection (GAD) is increasingly crucial in various applications, ranging from financial fraud detection to fake news detection. However, current GAD methods largely overlook the fairness problem, which might result in discriminatory decisions skewed toward certain demographic groups defined on sensitive attributes (e.g., gender, religion, ethnicity, etc.). This greatly limits the app… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.