Skip to main content

Showing 1–50 of 552 results for author: Hu, W

  1. arXiv:2407.12568  [pdf, other

    cs.CV

    LTRL: Boosting Long-tail Recognition via Reflective Learning

    Authors: Qihao Zhao, Yalun Dai, Shen Lin, Wei Hu, Fan Zhang, Jun Liu

    Abstract: In real-world scenarios, where knowledge distributions exhibit long-tail. Humans manage to master knowledge uniformly across imbalanced distributions, a feat attributed to their diligent practices of reviewing, summarizing, and correcting errors. Motivated by this learning process, we propose a novel learning paradigm, called reflecting learning, in handling long-tail recognition. Our method integ… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  2. arXiv:2407.11398  [pdf, other

    cs.CV

    Animate3D: Animating Any 3D Model with Multi-view Video Diffusion

    Authors: Yanqin Jiang, Chaohui Yu, Chenjie Cao, Fan Wang, Weiming Hu, Jin Gao

    Abstract: Recent advances in 4D generation mainly focus on generating 4D content by distilling pre-trained text or single-view image-conditioned models. It is inconvenient for them to take advantage of various off-the-shelf 3D assets with multi-view attributes, and their results suffer from spatiotemporal inconsistency owing to the inherent ambiguity in the supervision signals. In this work, we present Anim… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Project Page: https://animate3d.github.io/

  3. arXiv:2407.10990  [pdf

    cs.CL cs.AI

    MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models

    Authors: Mianxin Liu, Jinru Ding, Jie Xu, Weiguo Hu, Xiaoyang Li, Lifeng Zhu, Zhian Bai, Xiaoming Shi, Benyou Wang, Haitao Song, Pengfei Liu, Xiaofan Zhang, Shanshan Wang, Kang Li, Haofen Wang, Tong Ruan, Xuanjing Huang, Xin Sun, Shaoting Zhang

    Abstract: Ensuring the general efficacy and goodness for human beings from medical large language models (LLM) before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLM, especially in the Chinese context, remains to be established. In this work, we introduce "MedBench", a comprehensive, standardized, and reliable benchmarking system for Chinese med… ▽ More

    Submitted 23 June, 2024; originally announced July 2024.

    Comments: 25 pages.4 figures

  4. arXiv:2407.10956  [pdf, other

    cs.AI cs.CL

    Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

    Authors: Ruisheng Cao, Fangyu Lei, Haoyuan Wu, Jixuan Chen, Yeqiao Fu, Hongcheng Gao, Xinzhuang Xiong, Hanchong Zhang, Yuchen Mao, Wenjing Hu, Tianbao Xie, Hongshen Xu, Danyang Zhang, Sida Wang, Ruoxi Sun, Pengcheng Yin, Caiming Xiong, Ansong Ni, Qian Liu, Victor Zhong, Lu Chen, Kai Yu, Tao Yu

    Abstract: Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI operations. This automation can improve the productivit… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 34 pages, 14 figures, 10 tables

  5. arXiv:2407.10430  [pdf, other

    cs.CL cs.AI

    Expanding the Scope: Inductive Knowledge Graph Reasoning with Multi-Starting Progressive Propagation

    Authors: Zhoutian Shao, Yuanning Cui, Wei Hu

    Abstract: Knowledge graphs (KGs) are widely acknowledged as incomplete, and new entities are constantly emerging in the real world. Inductive KG reasoning aims to predict missing facts for these new entities. Among existing models, graph neural networks (GNNs) based ones have shown promising performance for this task. However, they are still challenged by inefficient message propagation due to the distance… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted in the 23rd International Semantic Web Conference (ISWC 2024)

  6. arXiv:2407.07479  [pdf, other

    cs.CV

    How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?

    Authors: Yuxin Chen, Zongyang Ma, Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Bing Li, Junfu Pu, Ying Shan, Xiaojuan Qi, Weiming Hu

    Abstract: Dominant dual-encoder models enable efficient image-text retrieval but suffer from limited accuracy while the cross-encoder models offer higher accuracy at the expense of efficiency. Distilling cross-modality matching knowledge from cross-encoder to dual-encoder provides a natural approach to harness their strengths. Thus we investigate the following valuable question: how to make cross-encoder a… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by CVPR 2024

  7. arXiv:2407.07478  [pdf, other

    cs.CV

    EA-VTR: Event-Aware Video-Text Retrieval

    Authors: Zongyang Ma, Ziqi Zhang, Yuxin Chen, Zhongang Qi, Chunfeng Yuan, Bing Li, Yingmin Luo, Xu Li, Xiaojuan Qi, Ying Shan, Weiming Hu

    Abstract: Understanding the content of events occurring in the video and their inherent temporal logic is crucial for video-text retrieval. However, web-crawled pre-training datasets often lack sufficient event information, and the widely adopted video-level cross-modal contrastive learning also struggles to capture detailed and complex video-text event alignment. To address these challenges, we make improv… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  8. arXiv:2407.07403  [pdf, other

    cs.CV

    A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends

    Authors: Daizong Liu, Mingyu Yang, Xiaoye Qu, Pan Zhou, Yu Cheng, Wei Hu

    Abstract: With the significant development of large models in recent years, Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks. Compared to traditional Large Language Models (LLMs), LVLMs present great potential and challenges due to its closer proximity to the multi-resource real-world applications and the compl… ▽ More

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  9. arXiv:2407.04675  [pdf, other

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li , et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor… ▽ More

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  10. arXiv:2407.03884  [pdf, other

    cs.CL cs.AI

    Planning with Large Language Models for Conversational Agents

    Authors: Zhigen Li, Jianxiang Peng, Yanmeng Wang, Tianhao Shen, Minghui Zhang, Linxi Su, Shang Wu, Yihang Wu, Yuqian Wang, Ye Wang, Wei Hu, Jianfeng Li, Shaojun Wang, Jing Xiao, Deyi Xiong

    Abstract: Controllability and proactivity are crucial properties of autonomous conversational agents (CAs). Controllability requires the CAs to follow the standard operating procedures (SOPs), such as verifying identity before activating credit cards. Proactivity requires the CAs to guide the conversation towards the goal during user uncooperation, such as persuasive dialogue. Existing research cannot be un… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  11. arXiv:2406.19240  [pdf, other

    cs.SE

    Data Preparation for Deep Learning based Code Smell Detection: A Systematic Literature Review

    Authors: Fengji Zhang, Zexian Zhang, Jacky Wai Keung, Xiangru Tang, Zhen Yang, Xiao Yu, Wenhua Hu

    Abstract: Code Smell Detection (CSD) plays a crucial role in improving software quality and maintainability. And Deep Learning (DL) techniques have emerged as a promising approach for CSD due to their superior performance. However, the effectiveness of DL-based CSD methods heavily relies on the quality of the training data. Despite its importance, little attention has been paid to analyzing the data prepara… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  12. arXiv:2406.18078  [pdf, other

    cs.CL cs.AI

    Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction

    Authors: Yice Zhang, Jie Zeng, Weiming Hu, Ziyi Wang, Shiwei Chen, Ruifeng Xu

    Abstract: Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review, which is the most representative and challenging task in aspect-based sentiment analysis. A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods. To tackle this issue, we propose a self-tra… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Main Conference

  13. arXiv:2406.14473  [pdf, other

    cs.LG cs.CL

    Data-Centric AI in the Age of Large Language Models

    Authors: Xinyi Xu, Zhaoxuan Wu, Rui Qiao, Arun Verma, Yao Shu, Jingtan Wang, Xinyuan Niu, Zhenfeng He, Jiangwei Chen, Zijian Zhou, Gregory Kang Ruey Lau, Hieu Dao, Lucas Agussurja, Rachael Hwee Ling Sim, Xiaoqiang Lin, Wenyang Hu, Zhongxiang Dai, Pang Wei Koh, Bryan Kian Hsiang Low

    Abstract: This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionally low attention from the research community. We identify four specific… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint

  14. arXiv:2406.13511  [pdf, other

    cs.DC

    Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving

    Authors: Ke Cheng, Wen Hu, Zhi Wang, Hongen Peng, Jianguo Li, Sheng Zhang

    Abstract: Large language models (LLMs) iteratively generate text token by token, with memory usage increasing with the length of generated token sequences. The unpredictability of generation lengths makes it difficult to estimate the time and memory needed to process requests, posing a challenge for effective request scheduling. Conventional sequence-level scheduling (SLS) serves requests in a first-come fi… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages, 22 figures

  15. PetalView: Fine-grained Location and Orientation Extraction of Street-view Images via Cross-view Local Search with Supplementary Materials

    Authors: Wenmiao Hu, Yichen Zhang, Yuxuan Liang, Xianjing Han, Yifang Yin, Hannes Kruppa, See-Kiong Ng, Roger Zimmermann

    Abstract: Satellite-based street-view information extraction by cross-view matching refers to a task that extracts the location and orientation information of a given street-view image query by using one or multiple geo-referenced satellite images. Recent work has initiated a new research direction to find accurate information within a local area covered by one satellite image centered at a location prior (… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by ACM Multimedia 2023. This version contains additional supplementary materials

    Journal ref: Proceedings of the 31st ACM International Conference on Multimedia (2023) 56-66

  16. arXiv:2406.12881  [pdf, other

    physics.acc-ph cs.CL

    Towards Unlocking Insights from Logbooks Using AI

    Authors: Antonin Sulc, Alex Bien, Annika Eichler, Daniel Ratner, Florian Rehm, Frank Mayet, Gregor Hartmann, Hayden Hoschouer, Henrik Tuennermann, Jan Kaiser, Jason St. John, Jennefer Maldonado, Kyle Hazelwood, Raimund Kammering, Thorsten Hellert, Tim Wilksen, Verena Kain, Wan-Lin Hu

    Abstract: Electronic logbooks contain valuable information about activities and events concerning their associated particle accelerator facilities. However, the highly technical nature of logbook entries can hinder their usability and automation. As natural language processing (NLP) continues advancing, it offers opportunities to address various challenges that logbooks present. This work explores jointly t… ▽ More

    Submitted 25 May, 2024; originally announced June 2024.

    Comments: 5 pages, 1 figure, 15th International Particle Accelerator Conference

  17. arXiv:2406.10765  [pdf, other

    cs.DC

    PWDFT-SW: Extending the Limit of Plane-Wave DFT Calculations to 16K Atoms on the New Sunway Supercomputer

    Authors: Qingcai Jiang, Zhenwei Cao, Junshi Chen, Xinming Qin, Wei Hu, Hong An, Jinlong Yang

    Abstract: First-principles density functional theory (DFT) with plane wave (PW) basis set is the most widely used method in quantum mechanical material simulations due to its advantages in accuracy and universality. However, a perceived drawback of PW-based DFT calculations is their substantial computational cost and memory usage, which currently limits their ability to simulate large-scale complex systems… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  18. arXiv:2406.08835  [pdf, other

    cs.SD eess.AS

    A Single-Step Non-Autoregressive Automatic Speech Recognition Architecture with High Accuracy and Inference Speed

    Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Shuai Gong, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, Jing Xiao

    Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. To further narrow the gap between the NAR and AR models, we propose a single-step NAR ASR architecture with high accuracy and inference speed, ca… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  19. arXiv:2406.05785  [pdf, other

    cs.CV

    A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions

    Authors: Daizong Liu, Yang Liu, Wencan Huang, Wei Hu

    Abstract: Text-guided 3D visual grounding (T-3DVG), which aims to locate a specific object that semantically corresponds to a language query from a complicated 3D scene, has drawn increasing attention in the 3D research community over the past few years. Compared to 2D visual grounding, this task presents great potential and challenges due to its closer proximity to the real world and the complexity of data… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  20. arXiv:2406.04983  [pdf, other

    cs.CV

    CityCraft: A Real Crafter for 3D City Generation

    Authors: Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang

    Abstract: City scene generation has gained significant attention in autonomous driving, smart city development, and traffic simulation. It helps enhance infrastructure planning and monitoring solutions. Existing methods have employed a two-stage process involving city layout generation, typically using Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), or Transformers, followed by neur… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 20 pages, 9 figures

  21. arXiv:2406.04785  [pdf, other

    cs.DC

    Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction

    Authors: Ke Cheng, Wen Hu, Zhi Wang, Peng Du, Jianguo Li, Sheng Zhang

    Abstract: Nowadays, large language models (LLMs) are published as a service and can be accessed by various applications via APIs, also known as language-model-as-a-service (LMaaS). Without knowing the generation length of requests, existing serving systems serve requests in a first-come, first-served (FCFS) manner with a fixed batch size, which leads to two problems that affect batch serving efficiency. Fir… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 12 pages, 14 figures

  22. arXiv:2406.00403  [pdf, other

    cs.LG cs.AI

    Dual-perspective Cross Contrastive Learning in Graph Transformers

    Authors: Zelin Yao, Chuang Liu, Xueqi Ma, Mukun Chen, Jia Wu, Xiantao Cai, Bo Du, Wenbin Hu

    Abstract: Graph contrastive learning (GCL) is a popular method for leaning graph representations by maximizing the consistency of features across augmented views. Traditional GCL methods utilize single-perspective i.e. data or model-perspective) augmentation to generate positive samples, restraining the diversity of positive samples. In addition, these positive samples may be unreliable due to uncontrollabl… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures, submitted to IEEE TKDE

  23. arXiv:2405.20279  [pdf, other

    cs.CV cs.AI eess.IV

    CV-VAE: A Compatible Video VAE for Latent Generative Video Models

    Authors: Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, Wenbo Hu, Ying Shan

    Abstract: Spatio-temporal compression of videos, utilizing networks such as Variational Autoencoders (VAE), plays a crucial role in OpenAI's SORA and numerous other video generative models. For instance, many LLM-like video models learn the distribution of discrete tokens derived from 3D VAEs within the VQVAE framework, while most diffusion-based video models capture the distribution of continuous latent ex… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Project Page: https://ailab-cvc.github.io/cvvae/index.html

  24. arXiv:2405.19958  [pdf, other

    cs.CL cs.AI

    Multi-Aspect Controllable Text Generation with Disentangled Counterfactual Augmentation

    Authors: Yi Liu, Xiangyu Liu, Xiangrong Zhu, Wei Hu

    Abstract: Multi-aspect controllable text generation aims to control the generated texts in attributes from multiple aspects (e.g., "positive" from sentiment and "sport" from topic). For ease of obtaining training samples, existing works neglect attribute correlations formed by the intertwining of different attributes. Particularly, the stereotype formed by imbalanced attribute correlations significantly aff… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted in the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

  25. arXiv:2405.19782  [pdf, other

    cs.SE cs.CL

    Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion

    Authors: Wei Cheng, Yuhan Wu, Wei Hu

    Abstract: Recent years have witnessed the deployment of code language models (LMs) in various code intelligence tasks such as code completion. Yet, it is challenging for pre-trained LMs to generate correct completions in private repositories. Previous studies retrieve cross-file context based on import relations or text similarity, which is insufficiently relevant to completion targets. In this paper, we pr… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted in the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

  26. arXiv:2405.19315  [pdf, other

    cs.CV cs.CL cs.LG

    Matryoshka Query Transformer for Large Vision-Language Models

    Authors: Wenbo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang

    Abstract: Large Vision-Language Models (LVLMs) typically encode an image into a fixed number of visual tokens (e.g., 576) and process these tokens with a language model. Despite their strong performance, LVLMs face challenges in adapting to varying computational constraints. This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resource… ▽ More

    Submitted 6 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Preprint. Our code and model are publicly available at https://github.com/gordonhu608/MQT-LLaVA

  27. arXiv:2405.18435  [pdf, other

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag , et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de… ▽ More

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  28. arXiv:2405.17875  [pdf, other

    math.OC cs.LG

    BO4IO: A Bayesian optimization approach to inverse optimization with uncertainty quantification

    Authors: Yen-An Lu, Wei-Shou Hu, Joel A. Paulson, Qi Zhang

    Abstract: This work addresses data-driven inverse optimization (IO), where the goal is to estimate unknown parameters in an optimization model from observed decisions that can be assumed to be optimal or near-optimal solutions to the optimization problem. The IO problem is commonly formulated as a large-scale bilevel program that is notoriously difficult to solve. Deviating from traditional exact solution m… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  29. arXiv:2405.17811  [pdf, other

    cs.GR cs.CV

    Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh

    Authors: Xiangjun Gao, Xiaoyu Li, Yiyu Zhuang, Qi Zhang, Wenbo Hu, Chaopeng Zhang, Yao Yao, Ying Shan, Long Quan

    Abstract: Neural 3D representations such as Neural Radiance Fields (NeRF), excel at producing photo-realistic rendering results but lack the flexibility for manipulation and editing which is crucial for content creation. Previous works have attempted to address this issue by deforming a NeRF in canonical space or manipulating the radiance field based on an explicit mesh. However, manipulating NeRF is not hi… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: Project page here: https://gaoxiangjun.github.io/mani_gs/

  30. arXiv:2405.16122  [pdf, other

    cs.AI cs.CL cs.LG stat.ML

    Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars

    Authors: Zhaoxuan Wu, Xiaoqiang Lin, Zhongxiang Dai, Wenyang Hu, Yao Shu, See-Kiong Ng, Patrick Jaillet, Bryan Kian Hsiang Low

    Abstract: Large language models (LLMs) have shown impressive capabilities in real-world applications. The capability of in-context learning (ICL) allows us to adapt an LLM to downstream tasks by including input-label exemplars in the prompt without model fine-tuning. However, the quality of these exemplars in the prompt greatly impacts performance, highlighting the need for an effective automated exemplar s… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 23 pages, 1 figure, 23 tables

  31. arXiv:2405.14545  [pdf, other

    q-bio.BM cs.LG

    A Cross-Field Fusion Strategy for Drug-Target Interaction Prediction

    Authors: Hongzhi Zhang, Xiuwen Gong, Shirui Pan, Jia Wu, Bo Du, Wenbin Hu

    Abstract: Drug-target interaction (DTI) prediction is a critical component of the drug discovery process. In the drug development engineering field, predicting novel drug-target interactions is extremely crucial.However, although existing methods have achieved high accuracy levels in predicting known drugs and drug targets, they fail to utilize global protein information during DTI prediction. This leads to… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  32. arXiv:2405.14536  [pdf, other

    q-bio.MN cs.AI cs.LG

    Regressor-free Molecule Generation to Support Drug Response Prediction

    Authors: Kun Li, Xiuwen Gong, Shirui Pan, Jia Wu, Bo Du, Wenbin Hu

    Abstract: Drug response prediction (DRP) is a crucial phase in drug discovery, and the most important metric for its evaluation is the IC50 score. DRP results are heavily dependent on the quality of the generated molecules. Existing molecule generation methods typically employ classifier-based guidance, enabling sampling within the IC50 classification range. However, these methods fail to ensure the samplin… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 22 pages, 7 figures, 9 tables,

  33. arXiv:2405.13568  [pdf, other

    cs.CR cs.AI cs.CL cs.LG

    CPE-Identifier: Automated CPE identification and CVE summaries annotation with Deep Learning and NLP

    Authors: Wanyu Hu, Vrizlynn L. L. Thing

    Abstract: With the drastic increase in the number of new vulnerabilities in the National Vulnerability Database (NVD) every year, the workload for NVD analysts to associate the Common Platform Enumeration (CPE) with the Common Vulnerabilities and Exposures (CVE) summaries becomes increasingly laborious and slow. The delay causes organisations, which depend on NVD for vulnerability management and security me… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: International Conference on Information Systems Security and Privacy 2024

  34. arXiv:2405.11456  [pdf, other

    cs.CR

    Biometrics-Based Authenticated Key Exchange with Multi-Factor Fuzzy Extractor

    Authors: Hong Yen Tran, Jiankun Hu, Wen Hu

    Abstract: Existing fuzzy extractors and similar methods provide an effective way for extracting a secret key from a user's biometric data, but are susceptible to impersonation attack: once a valid biometric sample is captured, the scheme is no longer secure. We propose a novel multi-factor fuzzy extractor that integrates both a user's secret (e.g., a password) and a user's biometrics in the generation and r… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 17 pages

  35. arXiv:2405.11001  [pdf, other

    astro-ph.IM astro-ph.EP cs.RO

    Using physics-based simulation towards eliminating empiricism in extraterrestrial terramechanics applications

    Authors: Wei Hu, Pei Li, Arno Rogg, Alexander Schepelmann, Colin Creager, Samuel Chandler, Ken Kamrin, Dan Negrut

    Abstract: Recently, there has been a surge of international interest in extraterrestrial exploration targeting the Moon, Mars, the moons of Mars, and various asteroids. This contribution discusses how current state-of-the-art Earth-based testing for designing rovers and landers for these missions currently leads to overly optimistic conclusions about the behavior of these devices upon deployment on the targ… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

  36. arXiv:2405.10642  [pdf, other

    cs.LG

    Hi-GMAE: Hierarchical Graph Masked Autoencoders

    Authors: Chuang Liu, Zelin Yao, Yibing Zhan, Xueqi Ma, Dapeng Tao, Jia Wu, Wenbin Hu, Shirui Pan, Bo Du

    Abstract: Graph Masked Autoencoders (GMAEs) have emerged as a notable self-supervised learning approach for graph-structured data. Existing GMAE models primarily focus on reconstructing node-level information, categorizing them as single-scale GMAEs. This methodology, while effective in certain contexts, tends to overlook the complex hierarchical structures inherent in many real-world graphs. For instance,… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures, 3 tables

  37. arXiv:2405.10288  [pdf, other

    cs.CL cs.AI

    Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction

    Authors: Jianhao Chen, Haoyuan Ouyang, Junyang Ren, Wentao Ding, Wei Hu, Yuzhong Qu

    Abstract: Facts extraction is pivotal for constructing knowledge graphs. Recently, the increasing demand for temporal facts in downstream tasks has led to the emergence of the task of temporal fact extraction. In this paper, we specifically address the extraction of temporal facts from natural language text. Previous studies fail to handle the challenge of establishing time-to-fact correspondences in comple… ▽ More

    Submitted 18 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL2024 main conference

  38. arXiv:2405.06424  [pdf, other

    cs.CL cs.AI cs.LG

    Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation

    Authors: JoonHo Lee, Jae Oh Woo, Juree Seok, Parisa Hassanzadeh, Wooseok Jang, JuYoun Son, Sima Didari, Baruch Gutow, Heng Hao, Hankyu Moon, Wenjun Hu, Yeong-Dae Kwon, Taehee Lee, Seungjai Min

    Abstract: Assessing response quality to instructions in language models is vital but challenging due to the complexity of human language across different contexts. This complexity often results in ambiguous or inconsistent interpretations, making accurate assessment difficult. To address this issue, we propose a novel Uncertainty-aware Reward Model (URM) that introduces a robust uncertainty estimation for t… ▽ More

    Submitted 19 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024

  39. arXiv:2405.05708  [pdf

    cs.IT

    Characteristic-Mode Based Conformal Design of Ultra-Wideband Antenna Array

    Authors: Zhan Chen, Wei Hu, Yuchen Gao, Qi Luo, Xiangbo Wang, Steven Gao

    Abstract: An innovative design method of conformal array antennas is presented by utilizing characteristic mode analysis (CMA) in this work. A single-layer continuous perfect electric conductor under bending conditions is conducted by CMA to evaluate the variations in operating performance. By using this method, the design process of a conformal array is simplified. The results indicate that the operating p… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  40. Rip-NeRF: Anti-aliasing Radiance Fields with Ripmap-Encoded Platonic Solids

    Authors: Junchen Liu, Wenbo Hu, Zhuo Yang, Jianteng Chen, Guoliang Wang, Xiaoxue Chen, Yantong Cai, Huan-ang Gao, Hao Zhao

    Abstract: Despite significant advancements in Neural Radiance Fields (NeRFs), the renderings may still suffer from aliasing and blurring artifacts, since it remains a fundamental challenge to effectively and efficiently characterize anisotropic areas induced by the cone-casting procedure. This paper introduces a Ripmap-Encoded Platonic Solid representation to precisely and efficiently featurize 3D anisotrop… ▽ More

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH 2024, Project page: https://junchenliu77.github.io/Rip-NeRF , Code: https://github.com/JunchenLiu77/Rip-NeRF

  41. arXiv:2404.15806  [pdf, other

    cs.LG

    Where to Mask: Structure-Guided Masking for Graph Masked Autoencoders

    Authors: Chuang Liu, Yuyao Wang, Yibing Zhan, Xueqi Ma, Dapeng Tao, Jia Wu, Wenbin Hu

    Abstract: Graph masked autoencoders (GMAE) have emerged as a significant advancement in self-supervised pre-training for graph-structured data. Previous GMAE models primarily utilize a straightforward random masking strategy for nodes or edges during training. However, this strategy fails to consider the varying significance of different nodes within the graph structure. In this paper, we investigate the po… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 9 pages, 3 Figures. Accepted by IJCAI 2024

  42. arXiv:2404.15729  [pdf, other

    cs.LG

    Gradformer: Graph Transformer with Exponential Decay

    Authors: Chuang Liu, Zelin Yao, Yibing Zhan, Xueqi Ma, Shirui Pan, Wenbin Hu

    Abstract: Graph Transformers (GTs) have demonstrated their advantages across a wide range of tasks. However, the self-attention mechanism in GTs overlooks the graph's inductive biases, particularly biases related to structure, which are crucial for the graph tasks. Although some methods utilize positional encoding and attention bias to model inductive biases, their effectiveness is still suboptimal analytic… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 9 pages, 7 figures. Accepted by IJCAI 2024

  43. arXiv:2404.13874  [pdf, other

    cs.CL cs.CV

    VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models

    Authors: Haoyi Qiu, Wenbo Hu, Zi-Yi Dou, Nanyun Peng

    Abstract: Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs, undermining their reliability. A comprehensive quantitative evaluation is necessary to identify and understand the extent of hallucinations in these models. However, existing benchmarks are often limited in scope, focusing mainly on object hallucina… ▽ More

    Submitted 14 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: ACL 2024 Findings

  44. arXiv:2404.13518  [pdf, other

    cs.CR cs.AI

    Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion

    Authors: Hongyu Zhu, Sichu Liang, Wentao Hu, Fangqi Li, Ju Jia, Shilin Wang

    Abstract: With the rise of Machine Learning as a Service (MLaaS) platforms,safeguarding the intellectual property of deep learning models is becoming paramount. Among various protective measures, trigger set watermarking has emerged as a flexible and effective strategy for preventing unauthorized model distribution. However, this paper identifies an inherent flaw in the current paradigm of trigger set water… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

  45. arXiv:2404.12210  [pdf, other

    cs.CV

    An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training

    Authors: Jin Gao, Shubo Lin, Shaoru Wang, Yutong Kou, Zeming Li, Liang Li, Congxuan Zhang, Xiaoqin Zhang, Yizheng Wang, Weiming Hu

    Abstract: Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features. In this paper, we question if the \textit{extremely simple} lightweight ViTs' fine-tuning performance can also benefit from this pre-training paradigm, which is considerably less studied yet in contrast to the well-esta… ▽ More

    Submitted 25 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: A submission to IJCV

  46. arXiv:2404.10413  [pdf, other

    cs.DB cs.LG cs.PF

    VDTuner: Automated Performance Tuning for Vector Data Management Systems

    Authors: Tiannuo Yang, Wen Hu, Wangqi Peng, Yusen Li, Jianguo Li, Gang Wang, Xiaoguang Liu

    Abstract: Vector data management systems (VDMSs) have become an indispensable cornerstone in large-scale information retrieval and machine learning systems like large language models. To enhance the efficiency and flexibility of similarity search, VDMS exposes many tunable index parameters and system parameters for users to specify. However, due to the inherent characteristics of VDMS, automatic performance… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted by ICDE 2024

  47. LoS Sensing-based Channel Estimation in UAV-Assisted OFDM Systems

    Authors: Chaojin Qing, Zhiying Liu, Wenquan Hu, Yinjie Zhang, Xi Cai, Pengfei Du

    Abstract: In unmanned aerial vehicle (UAV)-assisted orthogonal frequency division multiplexing (OFDM) systems, the potential advantage of the line-of-sight (LoS) path, characterized by its high probability of existence, has not been fully harnessed, thereby impeding the improvement of channel estimation (CE) accuracy. Inspired by the ideas of integrated sensing and communication (ISAC), this letter develops… ▽ More

    Submitted 22 February, 2024; originally announced April 2024.

  48. arXiv:2404.01340  [pdf, other

    cs.LG cs.AI

    From Similarity to Superiority: Channel Clustering for Time Series Forecasting

    Authors: Jialin Chen, Jan Eric Lenssen, Aosong Feng, Weihua Hu, Matthias Fey, Leandros Tassiulas, Jure Leskovec, Rex Ying

    Abstract: Time series forecasting has attracted significant attention in recent decades. Previous studies have demonstrated that the Channel-Independent (CI) strategy improves forecasting performance by treating different channels individually, while it leads to poor generalization on unseen instances and ignores potentially necessary interactions between channels. Conversely, the Channel-Dependent (CD) str… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 20 pages, 6 figures

  49. arXiv:2404.00776  [pdf, other

    cs.LG cs.DB stat.ML

    PyTorch Frame: A Modular Framework for Multi-Modal Tabular Learning

    Authors: Weihua Hu, Yiwen Yuan, Zecheng Zhang, Akihiro Nitta, Kaidi Cao, Vid Kocijan, Jure Leskovec, Matthias Fey

    Abstract: We present PyTorch Frame, a PyTorch-based framework for deep learning over multi-modal tabular data. PyTorch Frame makes tabular deep learning easy by providing a PyTorch-based data structure to handle complex tabular data, introducing a model abstraction to enable modular implementation of tabular models, and allowing external foundation models to be incorporated to handle complex columns (e.g.,… ▽ More

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: https://github.com/pyg-team/pytorch-frame

  50. arXiv:2403.18142  [pdf, other

    cs.LG

    HERTA: A High-Efficiency and Rigorous Training Algorithm for Unfolded Graph Neural Networks

    Authors: Yongyi Yang, Jiaming Yang, Wei Hu, Michał Dereziński

    Abstract: As a variant of Graph Neural Networks (GNNs), Unfolded GNNs offer enhanced interpretability and flexibility over traditional designs. Nevertheless, they still suffer from scalability challenges when it comes to the training cost. Although many methods have been proposed to address the scalability issues, they mostly focus on per-iteration efficiency, without worst-case convergence guarantees. More… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.