Showing 1–50 of 326 results for author: Wan, Z

  1. arXiv:2407.07131  [pdf, other]

    math-ph math.CO math.QA

    A matrix solution to any polygon equation

    Authors: Zheyan Wan

    Abstract: In this article, we construct matrices associated to Pachner $\frac{n-1}{2}$-$\frac{n-1}{2}$ moves for odd $n$ and matrices associated to Pachner $(\frac{n}{2}-1)$-$\frac{n}{2}$ moves for even $n$. The entries of these matrices are rational functions of formal variables in a field. We prove that these matrices satisfy the $n$-gon equation for any $n$.

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 18 pages, 4 figures

  2. arXiv:2407.04998  [pdf, other]

    cs.CV cs.CL cs.LG

    The Solution for the 5th GCAIAC Zero-shot Referring Expression Comprehension Challenge

    Authors: Longfei Huang, Feng Yu, Zhihao Guan, Zhonghua Wan, Yang Yang

    Abstract: This report presents a solution for the zero-shot referring expression comprehension task. Visual-language multimodal base models (such as CLIP, SAM) have gained significant attention in recent years as a cornerstone of mainstream research. One of the key applications of multimodal base models lies in their ability to generalize to zero-shot downstream tasks. Unlike traditional referring expressio…

    Submitted 6 July, 2024; originally announced July 2024.

  3. arXiv:2407.04996  [pdf, other]

    cs.LG cs.CV

    The Solution for the sequential task continual learning track of the 2nd Greater Bay Area International Algorithm Competition

    Authors: Sishun Pan, Xixian Wu, Tingmin Li, Longfei Huang, Mingxu Feng, Zhonghua Wan, Yang Yang

    Abstract: This paper presents a data-free, parameter-isolation-based continual learning algorithm we developed for the sequential task continual learning track of the 2nd Greater Bay Area International Algorithm Competition. The method learns an independent parameter subspace for each task within the network's convolutional and linear layers and freezes the batch normalization layers after the first task. S…

    Submitted 6 July, 2024; originally announced July 2024.

  4. arXiv:2407.04994  [pdf, other]

    cs.CV cs.LG

    The Solution for Language-Enhanced Image New Category Discovery

    Authors: Haonan Xu, Dian Chao, Xiangyu Wu, Zhonghua Wan, Yang Yang

    Abstract: Treating texts as images, combining prompts with textual labels for prompt tuning, and leveraging the alignment properties of CLIP have been successfully applied in zero-shot multi-label image recognition. Nonetheless, relying solely on textual labels to store visual information is insufficient for representing the diversity of visual objects. In this paper, we propose reversing the training proce…

    Submitted 6 July, 2024; originally announced July 2024.

  5. arXiv:2407.04991  [pdf, other]

    cs.LG cs.CL

    The Solution for the AIGC Inference Performance Optimization Competition

    Authors: Sishun Pan, Haonan Xu, Zhonghua Wan, Yang Yang

    Abstract: In recent years, the rapid advancement of large-scale pre-trained language models based on transformer architectures has revolutionized natural language processing tasks. Among these, ChatGPT has gained widespread popularity, demonstrating human-level conversational abilities and attracting over 100 million monthly users by late 2022. Concurrently, Baidu's commercial deployment of the Ernie Wenxin…

    Submitted 6 July, 2024; originally announced July 2024.

  6. arXiv:2407.03963  [pdf, other]

    cs.CL cs.AI

    LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

    Authors: LLM-jp: Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto, Takuya Fukushima, Namgi Han, Yuto Harada, Chikara Hashimoto, Tatsuya Hiraoka, Shohei Hisada, Sosuke Hosokawa, Lu Jie, Keisuke Kamata, Teruhito Kanazawa, Hiroki Kanezashi, Hiroshi Kataoka, Satoru Katsumata, Daisuke Kawahara, Seiya Kawano, et al. (57 additional authors not shown)

    Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its…

    Submitted 4 July, 2024; originally announced July 2024.

  7. The Host Galaxy Fluxes of Active Galaxy Nuclei Are Generally Overestimated by the Flux Variation Gradient Method

    Authors: Minxuan Cai, Zhen Wan, Zhenyi Cai, Lulu Fan, Junxian Wang

    Abstract: In terms of the variable nature of normal active galaxy nuclei (AGN) and luminous quasars, a so-called flux variation gradient (FVG) method has been widely utilized to estimate the underlying non-variable host galaxy fluxes. The FVG method assumes an invariable AGN color, but this assumption has been questioned by the intrinsic color variation of quasars and local Seyfert galaxies. Here, using an…

    Submitted 3 July, 2024; originally announced July 2024.

    Journal ref: Universe 2024, 10, 282

  8. arXiv:2407.01081  [pdf, other]

    cs.CV cs.CL

    CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation

    Authors: Yuxuan Wang, Yijun Liu, Fei Yu, Chen Huang, Kexin Li, Zhiguo Wan, Wanxiang Che

    Abstract: Despite the rapid development of Chinese vision-language models (VLMs), most existing Chinese vision-language (VL) datasets are constructed on Western-centric images from existing English VL datasets. The cultural bias in the images makes these datasets unsuitable for evaluating VLMs in Chinese culture. To remedy this issue, we present a new Chinese Vision-Language Understanding Evaluation (CVLUE…

    Submitted 1 July, 2024; originally announced July 2024.

  9. arXiv:2406.18139  [pdf, other]

    cs.CL cs.CV

    LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference

    Authors: Zhongwei Wan, Ziang Wu, Che Liu, Jinfa Huang, Zhihong Zhu, Peng Jin, Longyue Wang, Li Yuan

    Abstract: Long-context Multimodal Large Language Models (MLLMs) demand substantial computational resources for inference as the growth of their multimodal Key-Value (KV) cache, in response to increasing input lengths, challenges memory and time efficiency. Unlike single-modality LLMs that manage only textual contexts, the KV cache of long-context MLLMs includes representations from multiple images with temp…

    Submitted 26 June, 2024; originally announced June 2024.

  10. arXiv:2406.14278  [pdf, ps, other]

    cs.DS

    Efficient Deterministic Algorithms for Maximizing Symmetric Submodular Functions

    Authors: Zongqi Wan, Jialin Zhang, Xiaoming Sun, Zhijie Zhang

    Abstract: Symmetric submodular maximization is an important class of combinatorial optimization problems, including MAX-CUT on graphs and hyper-graphs. The state-of-the-art algorithm for the problem over general constraints has an approximation ratio of $0.432$. The algorithm applies the canonical continuous greedy technique that involves a sampling process. It, therefore, suffers from high query complexity…

    Submitted 20 June, 2024; originally announced June 2024.

  11. arXiv:2406.13060  [pdf, other]

    cs.LG cs.AI stat.AP

    Scale-Translation Equivariant Network for Oceanic Internal Solitary Wave Localization

    Authors: Zhang Wan, Shuo Wang, Xudong Zhang

    Abstract: Internal solitary waves (ISWs) are gravity waves that are often observed in the interior ocean rather than the surface. They hold significant importance due to their capacity to carry substantial energy, thus influence pollutant transport, oil platform operations, submarine navigation, etc. Researchers have studied ISWs through optical images, synthetic aperture radar (SAR) images, and altimeter d…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 29 pages, 5 figures

  12. arXiv:2406.13035  [pdf, other]

    cs.CL

    D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models

    Authors: Zhongwei Wan, Xinjian Wu, Yu Zhang, Yi Xin, Chaofan Tao, Zhihong Zhu, Xin Wang, Siqi Luo, Jing Xiong, Mi Zhang

    Abstract: Efficient inference in Large Language Models (LLMs) is impeded by the growing memory demands of key-value (KV) caching, especially for longer sequences. Traditional KV cache eviction strategies, which prioritize less critical KV-pairs based on attention scores, often degrade generation quality, leading to issues such as context loss or hallucinations. To address this, we introduce Dynamic Discrimi…

    Submitted 23 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  13. arXiv:2406.08701  [pdf]

    physics.app-ph

    Impacts of Backside Insulation on the Dynamic On-Resistance of Lateral p-GaN HEMTs-on-Si

    Authors: Yu-Xuan Wang, Mao-Chou Tai, Ting-Chang Chang, Wei-Chen Huang, Zeyu Wan, Simon Li, Simon Sze, Guangrui Xia

    Abstract: We examined the effect of backside insulation on the dynamic on-resistance of lateral p-GaN HEMTs. To gain a comprehensive understanding of the dynamic on-resistance difference between substrate grounded and substrate floating p-GaN HEMTs, we conducted in-circuit double pulse testing and long-term direct current (DC) bias stress. We have realized that while backside insulation can enhance the break…

    Submitted 12 June, 2024; originally announced June 2024.

  14. arXiv:2406.07146  [pdf, other]

    cs.CV cs.AI

    Benchmarking and Boosting Radiology Report Generation for 3D High-Resolution Medical Images

    Authors: Che Liu, Zhongwei Wan, Yuqi Wang, Hui Shen, Haozhe Wang, Kangyu Zheng, Mi Zhang, Rossella Arcucci

    Abstract: Automatic radiology report generation can significantly benefit the labor-intensive process of report writing by radiologists, especially for 3D radiographs like CT scans, which are crucial for broad clinical diagnostics yet underexplored compared to 2D radiographs. Existing methods often handle 3D volumes either slice-wise or with aggressive downsampling due to current GPU memory limitations, whi…

    Submitted 12 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  15. arXiv:2406.04835  [pdf, other]

    cs.RO

    SLR: Learning Quadruped Locomotion without Privileged Information

    Authors: Shiyi Chen, Zeyu Wan, Shiyang Yan, Chun Zhang, Weiyi Zhang, Qiang Li, Debing Zhang, Fasih Ud Din Farrukh

    Abstract: Traditional reinforcement learning control for quadruped robots often relies on privileged information, demanding meticulous selection and precise estimation, thereby imposing constraints on the development process. This work proposes a Self-learning Latent Representation (SLR) method, which achieves high-performance control policy learning without the need for privileged information. To enhance t…

    Submitted 7 June, 2024; originally announced June 2024.

  16. arXiv:2406.03403  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Structure-based Drug Design Benchmark: Do 3D Methods Really Dominate?

    Authors: Kangyu Zheng, Yingzhou Lu, Zaixi Zhang, Zhongwei Wan, Yao Ma, Marinka Zitnik, Tianfan Fu

    Abstract: Currently, the field of structure-based drug design is dominated by three main types of algorithms: search-based algorithms, deep generative models, and reinforcement learning. While existing works have typically focused on comparing models within a single algorithmic category, cross-algorithm comparisons remain scarce. In this paper, to fill the gap, we establish a benchmark to evaluate the perfo…

    Submitted 4 June, 2024; originally announced June 2024.

  17. arXiv:2406.01601  [pdf, other]

    cs.DC cs.AI cs.LG

    Backpropogation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration

    Authors: Wei Ji, Li Li, Zheqi Lv, Wenqiao Zhang, Mengze Li, Zhen Wan, Wenqiang Lei, Roger Zimmermann

    Abstract: In our increasingly interconnected world, where intelligent devices continually amass copious personalized multi-modal data, a pressing need arises to deliver high-quality, personalized device-aware services. However, this endeavor presents a multifaceted challenge to prevailing artificial intelligence (AI) systems primarily rooted in the cloud. As these systems grapple with shifting data distribu…

    Submitted 21 May, 2024; originally announced June 2024.

  18. arXiv:2405.18132  [pdf, other]

    cs.CV

    EG4D: Explicit Generation of 4D Object without Score Distillation

    Authors: Qi Sun, Zhiyang Guo, Ziyu Wan, Jing Nathan Yan, Shengming Yin, Wengang Zhou, Jing Liao, Houqiang Li

    Abstract: In recent years, the increasing demand for dynamic 3D assets in design and gaming applications has given rise to powerful generative pipelines capable of synthesizing high-quality 4D objects. Previous methods generally rely on score distillation sampling (SDS) algorithm to infer the unseen views and motion of 4D objects, thus leading to unsatisfactory results with defects like over-saturation and…

    Submitted 28 May, 2024; originally announced May 2024.

  19. arXiv:2405.15821  [pdf, other]

    cs.AI cs.LG

    Reinforcing Language Agents via Policy Optimization with Action Decomposition

    Authors: Muning Wen, Ziyu Wan, Weinan Zhang, Jun Wang, Ying Wen

    Abstract: Language models as intelligent agents push the boundaries of sequential decision-making agents but struggle with limited knowledge of environmental dynamics and exponentially huge action space. Recent efforts like GLAM and TWOSOME manually constrain the action space to a restricted subset and employ reinforcement learning to align agents' knowledge with specific environments. However, they overloo…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 24 pages, of which 9 are main content

  20. arXiv:2405.12757  [pdf, other]

    cs.CV

    BIMM: Brain Inspired Masked Modeling for Video Representation Learning

    Authors: Zhifan Wan, Jie Zhang, Changzhen Li, Shiguang Shan

    Abstract: The visual pathway of human brain includes two sub-pathways, i.e., the ventral pathway and the dorsal pathway, which focus on object identification and dynamic information modeling, respectively. Both pathways comprise multi-layer structures, with each layer responsible for processing different aspects of visual information. Inspired by visual information processing mechanism of the human brain, we…

    Submitted 21 May, 2024; originally announced May 2024.

  21. arXiv:2405.11916  [pdf, ps, other]

    cs.LG cs.CR

    Information Leakage from Embedding in Large Language Models

    Authors: Zhipeng Wan, Anda Cheng, Yinggui Wang, Lei Wang

    Abstract: The widespread adoption of large language models (LLMs) has raised concerns regarding data privacy. This study aims to investigate the potential for privacy invasion through input reconstruction attacks, in which a malicious model provider could potentially recover user inputs from embeddings. We first propose two base methods to reconstruct original texts from a model's hidden states. We find tha…

    Submitted 22 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  22. arXiv:2405.10767  [pdf, other]

    cs.HC cs.AI

    Evaluating Saliency Explanations in NLP by Crowdsourcing

    Authors: Xiaotian Lu, Jiyi Li, Zhen Wan, Xiaofeng Lin, Koh Takeuchi, Hisashi Kashima

    Abstract: Deep learning models have performed well on many NLP tasks. However, their internal mechanisms are typically difficult for humans to understand. The development of methods to explain models has become a key issue in the reliability of deep learning models in many important applications. Various saliency explanation methods, which give each feature of input a score proportional to the contribution…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 13 pages, 4 figures, Accepted for LREC-Coling 2024 (Oral)

  23. arXiv:2404.15700  [pdf, other]

    cs.CV cs.RO

    MAS-SAM: Segment Any Marine Animal with Aggregated Features

    Authors: Tianyu Yan, Zifu Wan, Xinhao Deng, Pingping Zhang, Yang Liu, Huchuan Lu

    Abstract: Recently, Segment Anything Model (SAM) shows exceptional performance in generating high-quality object masks and achieving zero-shot image segmentation. However, as a versatile vision model, SAM is primarily trained with large-scale natural light images. In underwater scenes, it exhibits substantial performance degradation due to the light scattering and absorption. Meanwhile, the simplicity of th…

    Submitted 9 May, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI2024 as Poster

  24. arXiv:2404.15261  [pdf, other]

    math.OC cs.DM cs.LG math.PR

    All You Need is Resistance: On the Equivalence of Effective Resistance and Certain Optimal Transport Problems on Graphs

    Authors: Sawyer Robertson, Zhengchao Wan, Alexander Cloninger

    Abstract: The fields of effective resistance and optimal transport on graphs are filled with rich connections to combinatorics, geometry, machine learning, and beyond. In this article we put forth a bold claim: that the two fields should be understood as one and the same, up to a choice of $p$. We make this claim precise by introducing the parameterized family of $p$-Beckmann distances for probability measu…

    Submitted 26 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: 35 pages, 7 figures

    MSC Class: 65K10; 05C21; 90C25; 68R10; 05C50

  25. arXiv:2404.12083  [pdf, other]

    cs.CV

    MambaPupil: Bidirectional Selective Recurrent model for Event-based Eye tracking

    Authors: Zhong Wang, Zengyu Wan, Han Han, Bohao Liao, Yuliang Wu, Wei Zhai, Yang Cao, Zheng-jun Zha

    Abstract: Event-based eye tracking has shown great promise with the high temporal resolution and low redundancy provided by the event camera. However, the diversity and abruptness of eye movement patterns, including blinking, fixating, saccades, and smooth pursuit, pose significant challenges for eye localization. To achieve a stable event-based eye-tracking system, this paper proposes a bidirectional long-…

    Submitted 30 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024 Workshop (AIS: Vision, Graphics and AI for Streaming), top solution of challenge Event-based Eye Tracking, see https://www.kaggle.com/competitions/event-based-eye-tracking-ais2024

  26. arXiv:2404.11901  [pdf, other]

    physics.optics physics.bio-ph

    Deep and Dynamic Metabolic and Structural Imaging in Living Tissues

    Authors: Kunzan Liu, Honghao Cao, Kasey Shashaty, Li-Yu Yu, Sarah Spitz, Francesca Michela Pramotton, Zhengpeng Wan, Ellen L. Kan, Erin N. Tevonian, Manuel Levy, Eva Lendaro, Roger D. Kamm, Linda G. Griffith, Fan Wang, Tong Qiu, Sixian You

    Abstract: Label-free imaging through two-photon autofluorescence (2PAF) of NAD(P)H allows for non-destructive and high-resolution visualization of cellular activities in living systems. However, its application to thick tissues and organoids has been restricted by its limited penetration depth within 300 $μ$m, largely due to tissue scattering at the typical excitation wavelength (~750 nm) required for NAD(P…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 20 pages, 5 figures, under review in Science Advances

  27. arXiv:2404.11770  [pdf, other]

    cs.CV cs.AI

    Event-Based Eye Tracking. AIS 2024 Challenge Survey

    Authors: Zuowen Wang, Chang Gao, Zongwei Wu, Marcos V. Conde, Radu Timofte, Shih-Chii Liu, Qinyu Chen, Zheng-jun Zha, Wei Zhai, Han Han, Bohao Liao, Yuliang Wu, Zengyu Wan, Zhong Wang, Yang Cao, Ganchao Tan, Jinze Chen, Yan Ru Pei, Sasskia Brüers, Sébastien Crouzet, Douglas McLelland, Oliver Coenen, Baoheng Zhang, Yizhao Gao, Jingyuan Li, et al. (14 additional authors not shown)

    Abstract: This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge. The task of the challenge focuses on processing eye movement recorded with event cameras and predicting the pupil center of the eye. The challenge emphasizes efficient eye tracking with event cameras to achieve good task accuracy and efficiency trade-off. During the challenge period, 38 participants registered for the Kaggl…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Qinyu Chen is the corresponding author

  28. Understanding the Role of Temperature in Diverse Question Generation by GPT-4

    Authors: Arav Agarwal, Karthik Mittal, Aidan Doyle, Pragnya Sridhar, Zipiao Wan, Jacob Arthur Doughty, Jaromir Savelka, Majd Sakr

    Abstract: We conduct a preliminary study of the effect of GPT's temperature parameter on the diversity of GPT4-generated questions. We find that using higher temperature values leads to significantly higher diversity, with different temperatures exposing different types of similarity between generated sets of questions. We also demonstrate that diverse question generation is especially difficult for questio…

    Submitted 14 April, 2024; originally announced April 2024.

  29. arXiv:2404.04256  [pdf, other]

    cs.CV

    Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation

    Authors: Zifu Wan, Yuhao Wang, Silong Yong, Pingping Zhang, Simon Stepputtis, Katia Sycara, Yaqi Xie

    Abstract: Multi-modal semantic segmentation significantly enhances AI agents' perception and scene understanding, especially under adverse conditions like low-light or overexposed environments. Leveraging additional modalities (X-modality) like thermal and depth alongside traditional RGB provides complementary information, enabling more robust and reliable segmentation. In this work, we introduce Sigma, a S…

    Submitted 5 April, 2024; originally announced April 2024.

  30. arXiv:2404.04173  [pdf, other]

    cs.AR cs.LG

    H3DFact: Heterogeneous 3D Integrated CIM for Factorization with Holographic Perceptual Representations

    Authors: Zishen Wan, Che-Kai Liu, Mohamed Ibrahim, Hanchen Yang, Samuel Spetalnick, Tushar Krishna, Arijit Raychowdhury

    Abstract: Disentangling attributes of various sensory signals is central to human-like perception and reasoning and a critical task for higher-order cognitive and neuro-symbolic AI systems. An elegant approach to represent this intricate factorization is via high-dimensional holographic vectors drawing on brain-inspired vector symbolic architectures. However, holographic factorization involves iterative com…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 2024 Design Automation and Test in Europe (DATE); The first two authors have equal contributions

  31. arXiv:2404.03654  [pdf, other]

    cs.CV

    RaFE: Generative Radiance Fields Restoration

    Authors: Zhongkai Wu, Ziyu Wan, Jing Zhang, Jing Liao, Dong Xu

    Abstract: NeRF (Neural Radiance Fields) has demonstrated tremendous potential in novel view synthesis and 3D reconstruction, but its performance is sensitive to input image quality, which struggles to achieve high-fidelity rendering when provided with low-quality sparse input viewpoints. Previous methods for NeRF restoration are tailored for specific degradation type, ignoring the generality of restoration.…

    Submitted 7 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Project Page: https://zkaiwu.github.io/RaFE

  32. arXiv:2403.12268  [pdf, other]

    cs.IT eess.SP

    Near-Field Channel Modeling for Electromagnetic Information Theory

    Authors: Zhongzhichao Wan, Jieao Zhu, Linglong Dai

    Abstract: Electromagnetic information theory (EIT) is one of the emerging topics for 6G communication due to its potential to reveal the performance limit of wireless communication systems. For EIT, the research foundation is reasonable and accurate channel modeling. Existing channel modeling works for EIT in non-line-of-sight (NLoS) scenario focus on far-field modeling, which can not accurately capture the…

    Submitted 26 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: In this paper, we propose the near-field channel model for EIT based on electromagnetic scattering theory. Then, we derive the analytical expression of the correlation function of the fields and analyze the characteristics of it. Finally, we design a channel estimation scheme for near-field scenario

  33. arXiv:2403.07378  [pdf, other]

    cs.CL cs.LG

    SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression

    Authors: Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang

    Abstract: The advancements in Large Language Models (LLMs) have been hindered by their substantial sizes, which necessitate LLM compression methods for practical deployment. Singular Value Decomposition (SVD) offers a promising solution for LLM compression. However, state-of-the-art SVD-based LLM compression methods have two key limitations: truncating smaller singular values may lead to higher compression…

    Submitted 28 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Code available at: https://github.com/AIoT-MLSys-Lab/SVD-LLM

  34. arXiv:2403.07153  [pdf, other]

    cs.CV

    2023 Low-Power Computer Vision Challenge (LPCVC) Summary

    Authors: Leo Chen, Benjamin Boardley, Ping Hu, Yiru Wang, Yifan Pu, Xin Jin, Yongqiang Yao, Ruihao Gong, Bo Li, Gao Huang, Xianglong Liu, Zifu Wan, Xinwang Chen, Ning Liu, Ziyi Zhang, Dongping Liu, Ruijie Shan, Zhengping Che, Fachao Zhang, Xiaofeng Mou, Jian Tang, Maxim Chuprov, Ivan Malofeev, Alexander Goncharenko, Andrey Shcherbin, et al. (5 additional authors not shown)

    Abstract: This article describes the 2023 IEEE Low-Power Computer Vision Challenge (LPCVC). Since 2015, LPCVC has been an international competition devoted to tackling the challenge of computer vision (CV) on edge devices. Most CV researchers focus on improving accuracy, at the expense of ever-growing sizes of machine models. LPCVC balances accuracy with resource requirements. Winners must achieve high accu…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: LPCVC 2023, website: https://lpcv.ai/

  35. arXiv:2403.06659  [pdf, other]

    eess.SP cs.AI cs.LG

    Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement

    Authors: Che Liu, Zhongwei Wan, Cheng Ouyang, Anand Shah, Wenjia Bai, Rossella Arcucci

    Abstract: Electrocardiograms (ECGs) are non-invasive diagnostic tools crucial for detecting cardiac arrhythmic diseases in clinical practice. While ECG Self-supervised Learning (eSSL) methods show promise in representation learning from unannotated ECG data, they often overlook the clinical knowledge that can be found in reports. This oversight and the requirement for annotated samples for downstream tasks…

    Submitted 2 July, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by ICML2024

  36. arXiv:2403.05465  [pdf, other]

    cs.AR cs.AI cs.LG cs.NE

    Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference

    Authors: Akshat Ramachandran, Zishen Wan, Geonhwa Jeong, John Gustafson, Tushar Krishna

    Abstract: Traditional Deep Neural Network (DNN) quantization methods using integer, fixed-point, or floating-point data types struggle to capture diverse DNN parameter distributions at low precision, and often require large silicon overhead and intensive quantization-aware training. In this study, we introduce Logarithmic Posits (LP), an adaptive, hardware-friendly data type inspired by posits that dynamica…

    Submitted 26 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: 2024 61st IEEE/ACM Design Automation Conference (DAC)

  37. arXiv:2403.04945  [pdf, other]

    cs.CL cs.LG eess.SP

    MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation

    Authors: Zhongwei Wan, Che Liu, Xin Wang, Chaofan Tao, Hui Shen, Zhenwu Peng, Jie Fu, Rossella Arcucci, Huaxiu Yao, Mi Zhang

    Abstract: Electrocardiogram (ECG) is the primary non-invasive diagnostic tool for monitoring cardiac conditions and is crucial in assisting clinicians. Recent studies have concentrated on classifying cardiac conditions using ECG data but have overlooked ECG report generation, which is time-consuming and requires clinical expertise. To automate ECG report generation and ensure its versatility, we propose the…

    Submitted 18 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Under review

  38. arXiv:2403.03690  [pdf]

    cs.CL cs.AI

    Rapidly Developing High-quality Instruction Data and Evaluation Benchmark for Large Language Models with Minimal Human Effort: A Case Study on Japanese

    Authors: Yikun Sun, Zhen Wan, Nobuhiro Ueda, Sakiko Yahata, Fei Cheng, Chenhui Chu, Sadao Kurohashi

    Abstract: The creation of instruction data and evaluation benchmarks for serving Large language models often involves enormous human annotation. This issue becomes particularly pronounced when rapidly developing such resources for a non-English language like Japanese. Instead of following the popular practice of directly translating existing English resources into Japanese (e.g., Japanese-Alpaca), we propos…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: COLING 2024. Our code is available here: self-instruct data (https://github.com/hitoshizuku7/awesome-Ja-self-instruct) and evaluation benchmark (https://github.com/ku-nlp/ja-vicuna-qa-benchmark)

  39. arXiv:2403.01686  [pdf, other]

    astro-ph.HE astro-ph.GA

    AT2023lli: A Tidal Disruption Event with Prominent Optical Early Bump and Delayed Episodic X-ray Emission

    Authors: Shifeng Huang, Ning Jiang, Jiazheng Zhu, Yibo Wang, Tinggui Wang, Shan-Qin Wang, Wen-Pei Gan, En-Wei Liang, Yu-Jing Qin, Zheyu Lin, Lin-Na Xu, Min-Xuan Cai, Ji-An Jiang, Xu Kong, Jiaxun Li, Long Li, Jian-Guo Wang, Ze-Lin Xu, Yongquan Xue, Ye-Fei Yuan, Jingquan Cheng, Lulu Fan, Jie Gao, Lei Hu, Weida Hu, et al. (20 additional authors not shown)

    Abstract: High-cadence, multiwavelength observations have continuously revealed the diversity of tidal disruption events (TDEs), thus greatly advancing our knowledge and understanding of TDEs. In this work, we conducted an intensive optical-UV and X-ray follow-up campaign of TDE AT2023lli, and found a remarkable month-long bump in its UV/optical light curve nearly two months prior to maximum brightness. The…

    Submitted 26 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: 14 pages, 8 figures, accepted for publication by ApJL

  40. arXiv:2402.15398  [pdf, other]

    cs.LG cs.AI cs.CY

    TransFlower: An Explainable Transformer-Based Model with Flow-to-Flow Attention for Commuting Flow Prediction

    Authors: Yan Luo, Zhuoyue Wan, Yuzhong Chen, Gengchen Mai, Fu-lai Chung, Kent Larson

    Abstract: Understanding the link between urban planning and commuting flows is crucial for guiding urban development and policymaking. This research, bridging computer science and urban studies, addresses the challenge of integrating these fields with their distinct focuses. Traditional urban studies methods, like the gravity and radiation models, often underperform in complex scenarios due to their limited…

    Submitted 23 February, 2024; originally announced February 2024.

  41. arXiv:2402.14202  [pdf, other]

    cs.LG

    Comparing Graph Transformers via Positional Encodings

    Authors: Mitchell Black, Zhengchao Wan, Gal Mishne, Amir Nayyeri, Yusu Wang

    Abstract: The distinguishing power of graph transformers is closely tied to the choice of positional encoding: features used to augment the base transformer with information about the graph. There are two primary types of positional encoding: absolute positional encodings (APEs) and relative positional encodings (RPEs). APEs assign features to each node and are given as input to the transformer. RPEs instea…

    Submitted 4 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: accepted to ICML 2024

  42. arXiv:2402.13607  [pdf, other]

    cs.CV cs.CL

    CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models

    Authors: Fuwen Luo, Chi Chen, Zihao Wan, Zhaolu Kang, Qidong Yan, Yingjie Li, Xiaolong Wang, Siyu Wang, Ziyue Wang, Xiaoyue Mi, Peng Li, Ning Ma, Maosong Sun, Yang Liu

    Abstract: Multimodal large language models (MLLMs) have demonstrated promising results in a variety of tasks that combine vision and language. As these models become more integral to research and applications, conducting comprehensive evaluations of their capabilities has grown increasingly important. However, most existing benchmarks fail to consider that, in certain situations, images need to be interpret…

    Submitted 4 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  43. arXiv:2402.09924  [pdf, ps, other]

    astro-ph.HE astro-ph.GA

    Understanding the phenomenological and intrinsic blazar sequence using a simple scaling model

    Authors: Zhu-Jian Wan, Rui Xue, Ze-Rui Wang, Hu-Bing Xiao, Jun-Hui Fan

    Abstract: The blazar sequence, including negative correlations between radiative luminosity $L_{\rm rad}$ and synchrotron peak frequency $\nu$, and between Compton dominance $Y$ and $\nu$, is widely adopted as a phenomenological description of spectral energy distributions (SEDs) of blazars, although its underlying cause is hotly debated. In particular, these correlations turn positive after correcting Doppler…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted for publication in MNRAS (14 pages, 6 figures, 3 tables)

  44. arXiv:2402.07157  [pdf, other]

    cs.CL cs.AI cs.LG

    Natural Language Reinforcement Learning

    Authors: Xidong Feng, Ziyu Wan, Mengyue Yang, Ziyan Wang, Girish A. Koushik, Yali Du, Ying Wen, Jun Wang

    Abstract: Reinforcement Learning (RL) has shown remarkable abilities in learning policies for decision-making tasks. However, RL is often hindered by issues such as low sample efficiency, lack of interpretability, and sparse supervision signals. To tackle these limitations, we take inspiration from the human learning process and introduce Natural Language Reinforcement Learning (NLRL), which innovatively co…

    Submitted 14 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: Work in Progress

  45. arXiv:2402.06023  [pdf, other]

    cs.LG cs.AI cs.GT

    Decision Theory-Guided Deep Reinforcement Learning for Fast Learning

    Authors: Zelin Wan, Jin-Hee Cho, Mu Zhu, Ahmed H. Anwar, Charles Kamhoua, Munindar P. Singh

    Abstract: This paper introduces a novel approach, Decision Theory-guided Deep Reinforcement Learning (DT-guided DRL), to address the inherent cold start problem in DRL. By integrating decision theory principles, DT-guided DRL enhances agents' initial performance and robustness in complex environments, enabling more efficient and reliable convergence during learning. Our investigation encompasses two primary…

    Submitted 8 February, 2024; originally announced February 2024.

  46. arXiv:2402.04467  [pdf, other]

    cs.LG math.DS

    DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems

    Authors: Yair Schiff, Zhong Yi Wan, Jeffrey B. Parker, Stephan Hoyer, Volodymyr Kuleshov, Fei Sha, Leonardo Zepeda-Núñez

    Abstract: Learning dynamics from dissipative chaotic systems is notoriously difficult due to their inherent instability, as formalized by their positive Lyapunov exponents, which exponentially amplify errors in the learned dynamics. However, many of these systems exhibit ergodicity and an attractor: a compact and highly complex manifold, to which trajectories converge in finite-time, that supports an invari…

    Submitted 5 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: ICML 2024; Code to reproduce our experiments is available at https://github.com/google-research/swirl-dynamics/tree/main/swirl_dynamics/projects/ergodic

  47. arXiv:2401.16193  [pdf, other]

    cs.LG cs.DB

    Contributing Dimension Structure of Deep Feature for Coreset Selection

    Authors: Zhijing Wan, Zhixiang Wang, Yuran Wang, Zheng Wang, Hongyuan Zhu, Shin'ichi Satoh

    Abstract: Coreset selection seeks to choose a subset of crucial training samples for efficient learning. It has gained traction in deep learning, particularly with the surge in training dataset sizes. Sample selection hinges on two main aspects: a sample's representation in enhancing performance and the role of sample diversity in averting overfitting. Existing methods typically measure both the representat…

    Submitted 2 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 13 pages, 11 figures, to be published in AAAI 2024

  48. arXiv:2401.11713  [pdf, other]

    cs.CV cs.AI

    Medical Image Debiasing by Learning Adaptive Agreement from a Biased Council

    Authors: Luyang Luo, Xin Huang, Minghao Wang, Zhuoyue Wan, Hao Chen

    Abstract: Deep learning could be prone to learning shortcuts raised by dataset bias and result in inaccurate, unreliable, and unfair models, which impedes its adoption in real-world clinical applications. Despite its significance, there is a dearth of research in the medical image classification domain to address dataset bias. Furthermore, the bias labels are often agnostic, as identifying biases can be lab…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 10 pages, 5 figures, 3 tables. Code and benchmark will be released via https://github.com/LLYXC/Ada-ABC/tree/main

  49. arXiv:2401.10443  [pdf, other]

    cs.SE

    Towards Automated Driving Violation Cause Analysis in Scenario-Based Testing for Autonomous Driving Systems

    Authors: Ziwen Wan, Yuqi Huai, Yuntianyi Chen, Joshua Garcia, Qi Alfred Chen

    Abstract: The rapid advancement of Autonomous Vehicles (AVs), exemplified by companies like Waymo and Cruise offering 24/7 paid taxi services, highlights the paramount importance of ensuring AVs' compliance with various policies, such as safety regulations, traffic rules, and mission directives. Despite significant progress in the development of Autonomous Driving System (ADS) testing tools, there has been…

    Submitted 18 January, 2024; originally announced January 2024.

  50. arXiv:2401.08330  [pdf, other]

    cs.LG cs.AI math.OC

    Boosting Gradient Ascent for Continuous DR-submodular Maximization

    Authors: Qixin Zhang, Zongqi Wan, Zengde Deng, Zaiyi Chen, Xiaoming Sun, Jialin Zhang, Yu Yang

    Abstract: Projected Gradient Ascent (PGA) is the most commonly used optimization scheme in machine learning and operations research areas. Nevertheless, numerous studies and examples have shown that the PGA methods may fail to achieve the tight approximation ratio for continuous DR-submodular maximization problems. To address this challenge, we present a boosting technique in this paper, which can efficient…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 74 pages, 6 figures and 9 tables. An extended version of Stochastic Continuous Submodular Maximization: Boosting via Non-oblivious Function (ICML 2022)