Showing 1–50 of 56 results for author: Niu, W

  1. arXiv:2407.13054

    cs.AI

    Comprehensive Review and Empirical Evaluation of Causal Discovery Algorithms for Numerical Data

    Authors: Wenjin Niu, Zijun Gao, Liyan Song, Lingbo Li

    Abstract: Causal analysis has become an essential component in understanding the underlying causes of phenomena across various fields. Despite its significance, the existing literature on causal discovery algorithms is fragmented, with inconsistent methodologies and a lack of comprehensive evaluations. This study addresses these gaps by conducting an exhaustive review and empirical evaluation of causal disc…

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2407.08532

    cs.CR cs.SE

    Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models

    Authors: Ying Zhang, Xiaoyan Zhou, Hui Wen, Wenjia Niu, Jiqiang Liu, Haining Wang, Qiang Li

    Abstract: Nowadays, the open-source software (OSS) ecosystem suffers from security threats of software supply chain (SSC) attacks. Interpreted OSS malware plays a vital role in SSC attacks, as criminals have an arsenal of attack vectors to deceive users into installing malware and executing malicious activities. In this paper, we introduce tactics, techniques, and procedures (TTPs) proposed by MITRE ATT&CK…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 19 pages, 11 figures

  3. arXiv:2407.02813

    cs.CV cs.AI cs.LG

    Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design

    Authors: Gen Li, Zhihao Shu, Jie Ji, Minghai Qin, Fatemeh Afghah, Wei Niu, Xiaolong Ma

    Abstract: Deep neural networks (DNNs) are frequently employed in a variety of computer vision applications. Nowadays, an emerging trend in the current video distribution system is to take advantage of DNN's overfitting properties to perform video resolution upscaling. By splitting videos into chunks and applying a super-resolution (SR) model to overfit each chunk, this scheme of SR models plus video chunks…

    Submitted 11 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  4. arXiv:2406.11515

    cs.CR

    Obfuscating IoT Device Scanning Activity via Adversarial Example Generation

    Authors: Haocong Li, Yaxin Zhang, Long Cheng, Wenjia Niu, Haining Wang, Qiang Li

    Abstract: Nowadays, attackers target Internet of Things (IoT) devices for security exploitation, and search engines for devices and services compromise user privacy, including IP addresses, open ports, device types, vendors, and products. Typically, application banners are used to recognize IoT device profiles during network measurement and reconnaissance. In this paper, we propose a novel approach to obfusc…

    Submitted 17 June, 2024; originally announced June 2024.

  5. Nurgle: Exacerbating Resource Consumption in Blockchain State Storage via MPT Manipulation

    Authors: Zheyuan He, Zihao Li, Ao Qiao, Xiapu Luo, Xiaosong Zhang, Ting Chen, Shuwei Song, Dijun Liu, Weina Niu

    Abstract: Blockchains, with intricate architectures, encompass various components, e.g., consensus network, smart contracts, decentralized applications, and auxiliary services. While offering numerous advantages, these components expose various attack surfaces, leading to severe threats to blockchains. In this study, we unveil a novel attack surface, i.e., the state storage, in blockchains. The state storag…

    Submitted 15 June, 2024; originally announced June 2024.

  6. arXiv:2405.09054

    cs.CV

    Dim Small Target Detection and Tracking: A Novel Method Based on Temporal Energy Selective Scaling and Trajectory Association

    Authors: Weihua Gao, Wenlong Niu, Wenlong Lu, Pengcheng Wang, Zhaoyuan Qi, Xiaodong Peng, Zhen Yang

    Abstract: The detection and tracking of small targets in passive optical remote sensing (PORS) has broad applications. However, most of the previously proposed methods seldom utilize the abundant temporal features formed by target motion, resulting in poor detection and tracking performance for low signal-to-clutter ratio (SCR) targets. In this article, we analyze the difficulty based on spatial features an…

    Submitted 14 May, 2024; originally announced May 2024.

  7. arXiv:2404.13528

    cs.LG cs.AI cs.DC

    SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile

    Authors: Wei Niu, Md Musfiqur Rahman Sanim, Zhihao Shu, Jiexiong Guan, Xipeng Shen, Miao Yin, Gagan Agrawal, Bin Ren

    Abstract: This work is motivated by recent developments in Deep Neural Networks, particularly the Transformer architectures underlying applications such as ChatGPT, and the need for performing inference on mobile devices. Focusing on emerging transformers (specifically the ones with computationally efficient Swin-like architectures) and large models (e.g., Stable Diffusion and LLMs) based on transformers, w…

    Submitted 21 April, 2024; originally announced April 2024.

  8. arXiv:2404.13470

    cs.DC cs.AI

    GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data

    Authors: Wenqi Jia, Sian Jin, Jinzhen Wang, Wei Niu, Dingwen Tao, Miao Yin

    Abstract: The rapid expansion of computational capabilities and the ever-growing scale of modern HPC systems present formidable challenges in managing exascale scientific data. Faced with such vast datasets, traditional lossless compression techniques prove insufficient in reducing data size to a manageable level while preserving all information intact. In response, researchers have turned to error-bounded…

    Submitted 20 April, 2024; originally announced April 2024.

  9. arXiv:2404.11467

    cs.SE cs.CR

    A Large-scale Fine-grained Analysis of Packages in Open-Source Software Ecosystems

    Authors: Xiaoyan Zhou, Feiran Liang, Zhaojie Xie, Yang Lan, Wenjia Niu, Jiqiang Liu, Haining Wang, Qiang Li

    Abstract: Package managers such as NPM, Maven, and PyPI play a pivotal role in open-source software (OSS) ecosystems, streamlining the distribution and management of various freely available packages. The fine-grained details within software packages can unveil potential risks within existing OSS ecosystems, offering valuable insights for detecting malicious packages. In this study, we undertake a large-sca…

    Submitted 17 April, 2024; originally announced April 2024.

  10. arXiv:2404.04991

    cs.CR cs.SE

    OSS Malicious Package Analysis in the Wild

    Authors: Xiaoyan Zhou, Ying Zhang, Wenjia Niu, Jiqiang Liu, Haining Wang, Qiang Li

    Abstract: The open-source software (OSS) ecosystem suffers from various security threats and risks, and malicious packages play a central role in software supply chain (SSC) attacks. Although malware research has a history of over thirty years, less attention has been paid to OSS malware. Its existing research has three limitations: a lack of high-quality datasets, malware diversity, and attack campaign con…

    Submitted 21 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

  11. arXiv:2403.10799

    cs.CL cs.AI cs.LG

    Efficient Pruning of Large Language Model with Adaptive Estimation Fusion

    Authors: Jun Liu, Chao Wu, Changdi Yang, Hao Tang, Zhenglun Kong, Geng Yuan, Wei Niu, Dong Huang, Yanzhi Wang

    Abstract: Large language models (LLMs) have become crucial for many generative downstream tasks, leading to an inevitable trend and significant challenge to deploy them efficiently on resource-constrained devices. Structured pruning is a widely used method to address this challenge. However, when dealing with the complex structure of the multiple decoder layers, general methods often employ common estimatio…

    Submitted 14 May, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

  12. arXiv:2403.00176

    cs.LG cs.AI cs.PL

    SoD$^2$: Statically Optimizing Dynamic Deep Neural Network

    Authors: Wei Niu, Gagan Agrawal, Bin Ren

    Abstract: Though many compilation and runtime systems have been developed for DNNs in recent years, the focus has largely been on static DNNs. Dynamic DNNs, where tensor shapes and sizes and even the set of operators used are dependent upon the input and/or execution, are becoming common. This paper presents SoD$^2$, a comprehensive framework for optimizing Dynamic DNNs. The basis of our approach is a class…

    Submitted 29 February, 2024; originally announced March 2024.

  13. arXiv:2402.10787

    cs.LG cs.AI cs.CL

    EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge

    Authors: Xuan Shen, Zhenglun Kong, Changdi Yang, Zhaoyang Han, Lei Lu, Peiyan Dong, Cheng Lyu, Chih-hsiang Li, Xuehang Guo, Zhihao Shu, Wei Niu, Miriam Leeser, Pu Zhao, Yanzhi Wang

    Abstract: Despite the remarkable strides of Large Language Models (LLMs) in various fields, the wide applications of LLMs on edge devices are limited due to their massive parameters and computations. To address this, quantization is commonly adopted to generate lightweight LLMs with efficient computations and fast inference. However, Post-Training Quantization (PTQ) methods dramatically degrade in quality w…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Preprint

  14. arXiv:2312.03345

    cs.RO cs.CV

    GraNet: A Multi-Level Graph Network for 6-DoF Grasp Pose Generation in Cluttered Scenes

    Authors: Haowen Wang, Wanhao Niu, Chungang Zhuang

    Abstract: 6-DoF object-agnostic grasping in unstructured environments is a critical yet challenging task in robotics. Most current works use non-optimized approaches to sample grasp locations and learn spatial features without concerning the grasping task. This paper proposes GraNet, a graph-based grasp pose generation framework that translates a point cloud scene into multi-level graphs and propagates feat…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: IROS 2023

  15. arXiv:2309.07438

    cs.AI cs.NI

    Towards Artificial General Intelligence (AGI) in the Internet of Things (IoT): Opportunities and Challenges

    Authors: Fei Dou, Jin Ye, Geng Yuan, Qin Lu, Wei Niu, Haijian Sun, Le Guan, Guoyu Lu, Gengchen Mai, Ninghao Liu, Jin Lu, Zhengliang Liu, Zihao Wu, Chenjiao Tan, Shaochen Xu, Xianqiao Wang, Guoming Li, Lilong Chai, Sheng Li, Jin Sun, Hongyue Sun, Yunli Shao, Changying Li, Tianming Liu, Wenzhan Song

    Abstract: Artificial General Intelligence (AGI), possessing the capacity to comprehend, learn, and execute tasks with human cognitive abilities, engenders significant anticipation and intrigue across scientific, commercial, and societal arenas. This fascination extends particularly to the Internet of Things (IoT), a landscape characterized by the interconnection of countless devices, sensors, and systems, c…

    Submitted 14 September, 2023; originally announced September 2023.

  16. arXiv:2305.04081

    cs.GT

    Portfolio-Based Incentive Mechanism Design for Cross-Device Federated Learning

    Authors: Jiaxi Yang, Sheng Cao, Cuifang Zhao, Weina Niu, Li-Chuan Tsai

    Abstract: In recent years, there has been a significant increase in attention towards designing incentive mechanisms for federated learning (FL). Tremendous existing studies attempt to design the solutions using various approaches (e.g., game theory, reinforcement learning) under different settings. Yet the design of incentive mechanism could be significantly biased in that clients' performance in many appl…

    Submitted 11 July, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

  17. arXiv:2304.02136

    math.DS cs.SC econ.TH

    Stability and chaos of the duopoly model of Kopel: A study based on symbolic computations

    Authors: Xiaoliang Li, Kongyan Chen, Wei Niu, Bo Huang

    Abstract: Since Kopel's duopoly model was proposed about three decades ago, there are almost no analytical results on the equilibria and their stability in the asymmetric case. The first objective of our study is to fill this gap. This paper analyzes the asymmetric duopoly model of Kopel analytically by using several tools based on symbolic computations. We discuss the possibility of the existence of multip…

    Submitted 28 May, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2301.12628

  18. arXiv:2303.08331

    cs.CV cs.LG cs.NE eess.IV

    Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

    Authors: Gen Li, Jie Ji, Minghai Qin, Wei Niu, Bin Ren, Fatemeh Afghah, Linke Guo, Xiaolong Ma

    Abstract: As deep convolutional neural networks (DNNs) are widely used in various fields of computer vision, leveraging the overfitting ability of the DNN to achieve video resolution upscaling has become a new trend in the modern video delivery system. By dividing videos into chunks and overfitting each chunk with a super-resolution model, the server encodes videos before transmitting them to the clients, t…

    Submitted 18 June, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: CVPR 2023 Highlight Paper

  19. arXiv:2209.09476

    cs.LG cs.AI cs.CV

    SparCL: Sparse Continual Learning on the Edge

    Authors: Zifeng Wang, Zheng Zhan, Yifan Gong, Geng Yuan, Wei Niu, Tong Jian, Bin Ren, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy

    Abstract: Existing work in continual learning (CL) focuses on mitigating catastrophic forgetting, i.e., model performance deterioration on past tasks when learning a new task. However, the training efficiency of a CL system is under-investigated, which limits the real-world application of CL systems under resource-limited scenarios. In this work, we propose a novel framework called Sparse Continual Learning…

    Submitted 20 September, 2022; originally announced September 2022.

    Comments: Published at NeurIPS 2022 as a conference paper

  20. arXiv:2208.13363

    cs.LG

    Survey: Exploiting Data Redundancy for Optimization of Deep Learning

    Authors: Jou-An Chen, Wei Niu, Bin Ren, Yanzhi Wang, Xipeng Shen

    Abstract: Data redundancy is ubiquitous in the inputs and intermediate results of Deep Neural Networks (DNN). It offers many significant opportunities for improving DNN performance and efficiency and has been explored in a large body of work. These studies have scattered in many venues across several years. The targets they focus on range from images to videos and texts, and the techniques they use to detec…

    Submitted 29 August, 2022; originally announced August 2022.

  21. arXiv:2207.12577

    cs.CV cs.AR cs.LG eess.IV

    Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution

    Authors: Yushu Wu, Yifan Gong, Pu Zhao, Yanyu Li, Zheng Zhan, Wei Niu, Hao Tang, Minghai Qin, Bin Ren, Yanzhi Wang

    Abstract: Deep learning-based super-resolution (SR) has gained tremendous popularity in recent years because of its high image quality performance and wide application scenarios. However, prior methods typically suffer from large amounts of computations and huge power consumption, causing difficulties for real-time inference, especially on resource-limited platforms such as mobile devices. To mitigate this,…

    Submitted 25 July, 2022; originally announced July 2022.

  22. arXiv:2206.01244

    cs.CV eess.IV

    Real-Time Portrait Stylization on the Edge

    Authors: Yanyu Li, Xuan Shen, Geng Yuan, Jiexiong Guan, Wei Niu, Hao Tang, Bin Ren, Yanzhi Wang

    Abstract: In this work we demonstrate real-time portrait stylization, specifically, translating self-portrait into cartoon or anime style on mobile devices. We propose a latency-driven differentiable architecture search method, maintaining realistic generative quality. With our framework, we obtain $10\times$ computation reduction on the generative model and achieve real-time video stylization on off-the-sh…

    Submitted 2 June, 2022; originally announced June 2022.

  23. arXiv:2112.13890

    cs.CV cs.AI cs.AR cs.LG

    SPViT: Enabling Faster Vision Transformers via Soft Token Pruning

    Authors: Zhenglun Kong, Peiyan Dong, Xiaolong Ma, Xin Meng, Mengshu Sun, Wei Niu, Xuan Shen, Geng Yuan, Bin Ren, Minghai Qin, Hao Tang, Yanzhi Wang

    Abstract: Recently, Vision Transformer (ViT) has continuously established new milestones in the computer vision field, while the high computation and memory cost makes its propagation in industrial production difficult. Pruning, a traditional model compression paradigm for hardware efficiency, has been widely applied in various DNN structures. Nevertheless, it stays ambiguous on how to perform exclusive pru…

    Submitted 20 September, 2022; v1 submitted 27 December, 2021; originally announced December 2021.

    Comments: ECCV 2022

  24. arXiv:2111.11581

    cs.LG cs.AI cs.CV cs.DC

    Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

    Authors: Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, Zhengang Li, Pu Zhao, Yuxuan Cai, Sijia Liu, Bin Ren, Xue Lin, Xulong Tang, Yanzhi Wang

    Abstract: Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and/or restriction on certain types of DNN layers. In this paper, we propose a general, fine-gr…

    Submitted 22 November, 2021; originally announced November 2021.

  25. arXiv:2110.14032

    cs.LG cs.AI cs.CV cs.NE

    MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge

    Authors: Geng Yuan, Xiaolong Ma, Wei Niu, Zhengang Li, Zhenglun Kong, Ning Liu, Yifan Gong, Zheng Zhan, Chaoyang He, Qing Jin, Siyue Wang, Minghai Qin, Bin Ren, Yanzhi Wang, Sijia Liu, Xue Lin

    Abstract: Recently, a new trend of exploring sparsity for accelerating neural network training has emerged, embracing the paradigm of training on the edge. This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting for accurate and fast execution on edge devices. The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S) that ensure s…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021 Spotlight Paper

  26. arXiv:2110.06373

    cs.RO cs.AI cs.LG eess.SY

    Enabling Level-4 Autonomous Driving on a Single $1k Off-the-Shelf Card

    Authors: Hsin-Hsuan Sung, Yuanchao Xu, Jiexiong Guan, Wei Niu, Shaoshan Liu, Bin Ren, Yanzhi Wang, Xipeng Shen

    Abstract: Autonomous driving is of great interest in both research and industry. The high cost has been one of the major roadblocks that slow down the development and adoption of autonomous driving in practice. This paper, for the first time, shows that it is possible to run level-4 (i.e., fully autonomous driving) software on a single off-the-shelf card (Jetson AGX Xavier) for less than $1k, an order of ma…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: under conference review

  27. arXiv:2108.13342

    cs.LG cs.AI

    DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion

    Authors: Wei Niu, Jiexiong Guan, Yanzhi Wang, Gagan Agrawal, Bin Ren

    Abstract: Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep with hundreds or even thousands of operator layers, leading to high memory and computational requirements for inference. Operator fusion (or kernel/layer fusion) is key optimization in many state-of-the-art DNN execution frame…

    Submitted 30 November, 2021; v1 submitted 30 August, 2021; originally announced August 2021.

  28. arXiv:2108.11033

    cs.LG cs.AI

    GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices based on Fine-Grained Structured Weight Sparsity

    Authors: Wei Niu, Zhengang Li, Xiaolong Ma, Peiyan Dong, Gang Zhou, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren

    Abstract: It is appealing but challenging to achieve real-time deep neural network (DNN) inference on mobile devices because even the powerful modern mobile devices are considered as "resource-constrained" when executing large-scale DNNs. It necessitates the sparse model inference via weight pruning, i.e., DNN weight sparsity, and it is desirable to design a new DNN weight sparsity scheme that can facilit…

    Submitted 24 August, 2021; originally announced August 2021.

    Comments: Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021

  29. arXiv:2108.08910

    eess.IV cs.AI cs.CV cs.LG cs.NE

    Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search

    Authors: Zheng Zhan, Yifan Gong, Pu Zhao, Geng Yuan, Wei Niu, Yushu Wu, Tianyun Zhang, Malith Jayaweera, David Kaeli, Bin Ren, Xue Lin, Yanzhi Wang

    Abstract: Though recent years have witnessed remarkable progress in single image super-resolution (SISR) tasks with the prosperous development of deep neural networks (DNNs), the deep learning methods are confronted with the computation and memory consumption issues in practice, especially for resource-limited platforms such as mobile devices. To overcome the challenge and facilitate the real-time deploymen…

    Submitted 14 February, 2023; v1 submitted 18 August, 2021; originally announced August 2021.

  30. arXiv:2106.15304

    cs.CV

    Towards Fast and Accurate Multi-Person Pose Estimation on Mobile Devices

    Authors: Xuan Shen, Geng Yuan, Wei Niu, Xiaolong Ma, Jiexiong Guan, Zhengang Li, Bin Ren, Yanzhi Wang

    Abstract: The rapid development of autonomous driving, abnormal behavior detection, and behavior recognition makes an increasing demand for multi-person pose estimation-based applications, especially on mobile platforms. However, to achieve high accuracy, state-of-the-art methods tend to have a large model size and complex post-processing algorithm, which costs intense computation and long end-to-end latenc…

    Submitted 6 June, 2021; originally announced June 2021.

  31. arXiv:2106.14943

    cs.CV cs.AI

    Achieving Real-Time Object Detection on Mobile Devices with Neural Pruning Search

    Authors: Pu Zhao, Wei Niu, Geng Yuan, Yuxuan Cai, Bin Ren, Yanzhi Wang, Xue Lin

    Abstract: Object detection plays an important role in self-driving cars for security development. However, mobile systems on self-driving cars with limited computation resources lead to difficulties for object detection. To facilitate this, we propose a compiler-aware neural pruning search framework to achieve high-speed inference on autonomous vehicles for 2D and 3D object detection. The framework automati…

    Submitted 28 June, 2021; originally announced June 2021.

    Comments: Presented at the HiPEAC 2021 workshop (cogarch 2021)

  32. arXiv:2106.00526

    cs.LG cs.AI

    A Compression-Compilation Framework for On-mobile Real-time BERT Applications

    Authors: Wei Niu, Zhenglun Kong, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang

    Abstract: Transformer-based deep learning models have increasingly demonstrated high accuracy on many natural language processing (NLP) tasks. In this paper, we propose a compression-compilation co-design framework that can guarantee the identified model to meet both resource and real-time specifications of mobile devices. Our framework applies a compiler-aware neural architecture optimization method (CANAO…

    Submitted 6 June, 2021; v1 submitted 30 May, 2021; originally announced June 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2009.06823

  33. Blind Motion Deblurring Super-Resolution: When Dynamic Spatio-Temporal Learning Meets Static Image Understanding

    Authors: Wenjia Niu, Kaihao Zhang, Wenhan Luo, Yiran Zhong

    Abstract: Single-image super-resolution (SR) and multi-frame SR are two ways to super resolve low-resolution images. Single-Image SR generally handles each image independently, but ignores the temporal information implied in continuing frames. Multi-frame SR is able to model the temporal dependency via capturing motion information. However, it relies on neighbouring frames which are not always available in…

    Submitted 18 October, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

    Comments: To appear in IEEE Transactions on Image Processing (TIP)

  34. arXiv:2012.13801

    cs.CV cs.AI

    Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device

    Authors: Pu Zhao, Wei Niu, Geng Yuan, Yuxuan Cai, Hsin-Hsuan Sung, Sijia Liu, Xipeng Shen, Bin Ren, Yanzhi Wang, Xue Lin

    Abstract: 3D object detection is an important task, especially in the autonomous driving application domain. However, it is challenging to support the real-time performance with the limited computation and memory resources on edge-computing devices in self-driving cars. To achieve this, we propose a compiler-aware unified framework incorporating network enhancement and pruning search with the reinforcement…

    Submitted 6 March, 2021; v1 submitted 26 December, 2020; originally announced December 2020.

  35. arXiv:2012.00596

    cs.LG cs.AI cs.CV cs.NE

    NPAS: A Compiler-aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration

    Authors: Zhengang Li, Geng Yuan, Wei Niu, Pu Zhao, Yanyu Li, Yuxuan Cai, Xuan Shen, Zheng Zhan, Zhenglun Kong, Qing Jin, Zhiyu Chen, Sijia Liu, Kaiyuan Yang, Bin Ren, Yanzhi Wang, Xue Lin

    Abstract: With the increasing demand to efficiently deploy DNNs on mobile edge devices, it becomes much more important to reduce unnecessary computation and increase the execution speed. Prior methods towards this goal, including model compression and network architecture search (NAS), are largely performed independently and do not fully consider compiler-level optimizations which is a must-do for mobile ac…

    Submitted 16 June, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

    Comments: Accepted as an oral paper in the Conference on Computer Vision and Pattern Recognition (CVPR), 2021

  36. ClickTrain: Efficient and Accurate End-to-End Deep Learning Training via Fine-Grained Architecture-Preserving Pruning

    Authors: Chengming Zhang, Geng Yuan, Wei Niu, Jiannan Tian, Sian Jin, Donglin Zhuang, Zhe Jiang, Yanzhi Wang, Bin Ren, Shuaiwen Leon Song, Dingwen Tao

    Abstract: Convolutional neural networks (CNNs) are becoming increasingly deeper, wider, and non-linear because of the growing demand on prediction accuracy and analysis quality. The wide and deep CNNs, however, require a large amount of computing resources and processing time. Many previous works have studied model pruning to improve inference performance, but little work has been done for effectively reduc…

    Submitted 30 April, 2021; v1 submitted 19 November, 2020; originally announced November 2020.

    Comments: 12 pages, 15 figures, 2 tables, published by ICS'21

  37. arXiv:2009.06823

    cs.CL cs.LG

    Real-Time Execution of Large-scale Language Models on Mobile

    Authors: Wei Niu, Zhenglun Kong, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang

    Abstract: Pre-trained large-scale language models have increasingly demonstrated high accuracy on many natural language processing (NLP) tasks. However, the limited weight storage and computational speed on hardware platforms have impeded the popularity of pre-trained models, especially in the era of edge computing. In this paper, we seek to find the best model structure of BERT for a given computation size…

    Submitted 22 October, 2020; v1 submitted 14 September, 2020; originally announced September 2020.

  38. arXiv:2009.05697

    cs.CV cs.AI cs.LG

    YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design

    Authors: Yuxuan Cai, Hongjia Li, Geng Yuan, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, Yanzhi Wang

    Abstract: The rapid development and wide utilization of object detection techniques have aroused attention on both accuracy and speed of object detectors. However, the current state-of-the-art object detection works are either accuracy-oriented using a large model but leading to high latency or speed-oriented using a lightweight model but sacrificing accuracy. In this work, we propose YOLObile framework, a…

    Submitted 30 December, 2020; v1 submitted 11 September, 2020; originally announced September 2020.

  39. arXiv:2007.09835

    cs.LG cs.CV cs.NE eess.IV

    RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices

    Authors: Wei Niu, Mengshu Sun, Zhengang Li, Jou-An Chen, Jiexiong Guan, Xipeng Shen, Yanzhi Wang, Sijia Liu, Xue Lin, Bin Ren

    Abstract: Mobile devices are becoming an important carrier for deep learning tasks, as they are being equipped with powerful, high-end mobile CPUs and GPUs. However, it is still a challenging task to execute 3D Convolutional Neural Networks (CNNs) targeting for real-time performance, besides high inference accuracy. The reason is more complex model structure and higher model dimensionality overwhelm the ava…

    Submitted 3 January, 2021; v1 submitted 19 July, 2020; originally announced July 2020.

    Comments: To appear in Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI-21)

  40. Category-Specific CNN for Visual-aware CTR Prediction at JD.com

    Authors: Hu Liu, Jing Lu, Hao Yang, Xiwei Zhao, Sulong Xu, Hao Peng, Zehua Zhang, Wenjie Niu, Xiaokun Zhu, Yongjun Bao, Weipeng Yan

    Abstract: As one of the largest B2C e-commerce platforms in China, JD.com also powers a leading advertising system, serving millions of advertisers with fingertip connection to hundreds of millions of customers. In our system, as well as most e-commerce scenarios, ads are displayed with images. This makes visual-aware Click Through Rate (CTR) prediction of crucial importance to both business effectiveness an…

    Submitted 19 June, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

  41. arXiv:2004.11250

    cs.LG cs.CV cs.MM

    Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization

    Authors: Wei Niu, Pu Zhao, Zheng Zhan, Xue Lin, Yanzhi Wang, Bin Ren

    Abstract: High-end mobile platforms rapidly serve as primary computing devices for a wide range of Deep Neural Network (DNN) applications. However, the constrained computation and storage resources on these devices still pose significant challenges for real-time DNN inference executions. To address this problem, we propose a set of hardware-friendly structured model pruning and compiler optimization techniq…

    Submitted 21 April, 2020; originally announced April 2020.

    Comments: accepted by the IJCAI-PRICAI 2020 Demonstrations Track

  42. arXiv:2003.06513

    cs.LG cs.AI cs.CV cs.NE stat.ML

    A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework

    Authors: Yifan Gong, Zheng Zhan, Zhengang Li, Wei Niu, Xiaolong Ma, Wenhao Wang, Bin Ren, Caiwen Ding, Xue Lin, Xiaolin Xu, Yanzhi Wang

    Abstract: Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices. However, previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data. To mitigate this concern, we propose a privacy-preserving-oriented pruning and mobile acceleration framewor…

    Submitted 16 September, 2020; v1 submitted 13 March, 2020; originally announced March 2020.

  43. arXiv:2002.11474

    cs.SD cs.LG eess.AS

    RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition

    Authors: Peiyan Dong, Siyue Wang, Wei Niu, Chengming Zhang, Sheng Lin, Zhengang Li, Yifan Gong, Bin Ren, Xue Lin, Yanzhi Wang, Dingwen Tao

    Abstract: Recurrent neural networks (RNNs) based automatic speech recognition has nowadays become prevalent on mobile devices such as smart phones. However, previous RNN compression techniques either suffer from hardware performance overhead due to irregularity or significant accuracy loss due to the preserved regularity for hardware friendliness. In this work, we propose RTMobile that leverages both a nove…

    Submitted 18 February, 2020; originally announced February 2020.

  44. arXiv:2002.05150

    eess.AS cs.CL cs.LG cs.SD

    Attentional Speech Recognition Models Misbehave on Out-of-domain Utterances

    Authors: Phillip Keung, Wei Niu, Yichao Lu, Julian Salazar, Vikas Bhardwaj

    Abstract: We discuss the problem of echographic transcription in autoregressive sequence-to-sequence attentional architectures for automatic speech recognition, where a model produces very long sequences of repetitive outputs when presented with out-of-domain utterances. We decode audio from the British National Corpus with an attentional encoder-decoder model trained solely on the LibriSpeech corpus. We ob…

    Submitted 12 February, 2020; originally announced February 2020.

    Comments: Artifacts like our filtered Audio BNC dataset can be found at https://github.com/aws-samples/seq2seq-asr-misbehaves

  45. arXiv:2001.08357

    cs.LG cs.AI cs.CV cs.NE stat.ML

    BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method

    Authors: Xiaolong Ma, Zhengang Li, Yifan Gong, Tianyun Zhang, Wei Niu, Zheng Zhan, Pu Zhao, Jian Tang, Xue Lin, Bin Ren, Yanzhi Wang

    Abstract: Accelerating DNN execution on various resource-limited computing platforms has been a long-standing problem. Prior works utilize l1-based group lasso or dynamic regularization such as ADMM to perform structured pruning on DNN models to leverage the parallel computing architectures. However, both of the pruning dimensions and pruning methods lack universality, which leads to degraded performance an…

    Submitted 21 February, 2020; v1 submitted 22 January, 2020; originally announced January 2020.

  46. arXiv:2001.07710

    cs.CV cs.LG cs.NE eess.IV

    An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices

    Authors: Xiaolong Ma, Wei Niu, Tianyun Zhang, Sijia Liu, Sheng Lin, Hongjia Li, Xiang Chen, Jian Tang, Kaisheng Ma, Bin Ren, Yanzhi Wang

    Abstract: Weight pruning has been widely acknowledged as a straightforward and effective method to eliminate redundancy in Deep Neural Networks (DNN), thereby achieving acceleration on various platforms. However, most of the pruning techniques are essentially trade-offs between model accuracy and regularity which lead to impaired inference accuracy and limited on-device acceleration performance. To solve th…

    Submitted 4 July, 2020; v1 submitted 20 January, 2020; originally announced January 2020.

    Comments: Paper accepted in the 16th European Conference on Computer Vision (ECCV 2020)

  47. arXiv:2001.00138

    cs.LG cs.CV cs.DC

    PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning

    Authors: Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren

    Abstract: With the emergence of a spectrum of high-end mobile devices, many applications that formerly required desktop-level computation capability are being transferred to these devices. However, executing the inference of Deep Neural Networks (DNNs) is still challenging considering high computation and storage demands, specifically, if real-time performance with high accuracy is needed. Weight pruning of…

    Submitted 21 January, 2020; v1 submitted 31 December, 2019; originally announced January 2020.

    Comments: To be published in the Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '20)

  48. A Hierarchical Self-Attentive Model for Recommending User-Generated Item Lists

    Authors: Yun He, Jianling Wang, Wei Niu, James Caverlee

    Abstract: User-generated item lists are a popular feature of many different platforms. Examples include lists of books on Goodreads, playlists on Spotify and YouTube, collections of images on Pinterest, and lists of answers on question-answer sites like Zhihu. Recommending item lists is critical for increasing user engagement and connecting users to new items, but many approaches are designed for the item-b…

    Submitted 30 December, 2019; originally announced December 2019.

    Comments: Accepted by CIKM 2019

    ACM Class: H.3

    Journal ref: CIKM 2019

  49. arXiv:1909.05073

    cs.LG cs.CV cs.DC cs.NE stat.ML

    PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices

    Authors: Xiaolong Ma, Fu-Ming Guo, Wei Niu, Xue Lin, Jian Tang, Kaisheng Ma, Bin Ren, Yanzhi Wang

    Abstract: Model compression techniques on Deep Neural Network (DNN) have been widely acknowledged as an effective way to achieve acceleration on a variety of platforms, and DNN weight pruning is a straightforward and effective method. There are currently two mainstreams of pruning methods representing two extremes of pruning regularity: non-structured, fine-grained pruning can achieve high sparsity and accu…

    Submitted 4 March, 2020; v1 submitted 5 September, 2019; originally announced September 2019.

    Comments: To appear in Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI-20)

  50. arXiv:1905.03454

    cs.CR cs.LG stat.ML

    Bidirectional RNN-based Few-shot Training for Detecting Multi-stage Attack

    Authors: Di Zhao, Jiqiang Liu, Jialin Wang, Wenjia Niu, Endong Tong, Tong Chen, Gang Li

    Abstract: "Feint Attack", as a new type of APT attack, has become the focus of attention. It adopts a multi-stage attacks mode which can be concluded as a combination of virtual attacks and real attacks. Under the cover of virtual attacks, real attacks can achieve the real purpose of the attacker, as a result, it often caused huge losses inadvertently. However, to our knowledge, all previous works use commo…

    Submitted 9 May, 2019; originally announced May 2019.