Showing 1–45 of 45 results for author: Geng, T

  1. arXiv:2407.11098  [pdf, other]

    cs.LG cs.AI

    Inertial Confinement Fusion Forecasting via LLMs

    Authors: Mingkai Chen, Taowen Wang, James Chenhao Liang, Chuan Liu, Chunshu Wu, Qifan Wang, Ying Nian Wu, Michael Huang, Chuang Ren, Ang Li, Tong Geng, Dongfang Liu

    Abstract: Controlled fusion energy is deemed pivotal for the advancement of human civilization. In this study, we introduce $\textbf{Fusion-LLM}$, a novel integration of Large Language Models (LLMs) with classical reservoir computing paradigms tailored to address challenges in Inertial Confinement Fusion ($\texttt{ICF}$). Our approach offers several key contributions: Firstly, we propose the…

    Submitted 15 July, 2024; originally announced July 2024.

  2. arXiv:2407.04272  [pdf, other]

    cs.LG cs.DC

    Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression

    Authors: Hao Feng, Boyuan Zhang, Fanjiang Ye, Min Si, Ching-Hsiang Chu, Jiannan Tian, Chunxing Yin, Summer Deng, Yuchen Hao, Pavan Balaji, Tong Geng, Dingwen Tao

    Abstract: DLRM is a state-of-the-art recommendation system model that has gained widespread adoption across various industry applications. The large size of DLRM models, however, necessitates the use of multiple devices/GPUs for efficient training. A significant bottleneck in this process is the time-consuming all-to-all communication required to collect embedding data from all devices. To mitigate this, we…

    Submitted 11 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: accepted by SC '24

  3. arXiv:2406.01559  [pdf, other]

    cs.CV

    Prototypical Transformer as Unified Motion Learners

    Authors: Cheng Han, Yawen Lu, Guohao Sun, James C. Liang, Zhiwen Cao, Qifan Wang, Qiang Guan, Sohail A. Dianat, Raghuveer M. Rao, Tong Geng, Zhiqiang Tao, Dongfang Liu

    Abstract: In this work, we introduce the Prototypical Transformer (ProtoFormer), a general and unified framework that approaches various motion tasks from a prototype perspective. ProtoFormer seamlessly integrates prototype learning with Transformer by thoughtfully considering motion dynamics, introducing two innovative designs. First, Cross-Attention Prototyping discovers prototypes based on signature moti…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 21 pages, 10 figures

  4. PointDifformer: Robust Point Cloud Registration With Neural Diffusion and Transformer

    Authors: Rui She, Qiyu Kang, Sijie Wang, Wee Peng Tay, Kai Zhao, Yang Song, Tianyu Geng, Yi Xu, Diego Navarro Navarro, Andreas Hartmannsgruber

    Abstract: Point cloud registration is a fundamental technique in 3-D computer vision with applications in graphics, autonomous driving, and robotics. However, registration tasks under challenging conditions, under which noise or perturbations are prevalent, can be difficult. We propose a robust point cloud registration approach that leverages graph neural partial differential equations (PDEs) and heat kerne…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE Transactions on Geoscience and Remote Sensing

  5. arXiv:2404.03179  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization

    Authors: Tiantian Geng, Teng Wang, Yanfu Zhang, Jinming Duan, Weili Guan, Feng Zheng

    Abstract: Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL). Existing methods over-specialize on each task, overlooking the fact that these instances often occur in the same video to form the complete video content. In this work, we present UniAV, a Unified Audio…

    Submitted 3 April, 2024; originally announced April 2024.

  6. arXiv:2403.10042  [pdf]

    cond-mat.mtrl-sci cs.LG

    Accurate and Data-Efficient Micro-XRD Phase Identification Using Multi-Task Learning: Application to Hydrothermal Fluids

    Authors: Yanfei Li, Juejing Liu, Xiaodong Zhao, Wenjun Liu, Tong Geng, Ang Li, Xin Zhang

    Abstract: Traditional analysis of highly distorted micro-X-ray diffraction (μ-XRD) patterns from hydrothermal fluid environments is a time-consuming process, often requiring substantial data preprocessing and labeled experimental data. This study demonstrates the potential of deep learning with a multitask learning (MTL) architecture to overcome these limitations. We trained MTL models to identify phase inf…

    Submitted 15 March, 2024; originally announced March 2024.

  7. PosDiffNet: Positional Neural Diffusion for Point Cloud Registration in a Large Field of View with Perturbations

    Authors: Rui She, Sijie Wang, Qiyu Kang, Kai Zhao, Yang Song, Wee Peng Tay, Tianyu Geng, Xingchao Jian

    Abstract: Point cloud registration is a crucial technique in 3D computer vision with a wide range of applications. However, this task can be challenging, particularly in large fields of view with dynamic objects, environmental noise, or other perturbations. To address this challenge, we propose a model called PosDiffNet. Our approach performs hierarchical registration based on window-level, patch-level, and…

    Submitted 6 January, 2024; originally announced January 2024.

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2024), Vancouver, Canada, 2024

  8. arXiv:2311.04417  [pdf, other]

    cs.AR cs.DC cs.LG cs.PF

    Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs

    Authors: Hongwu Peng, Caiwen Ding, Tong Geng, Sutanay Choudhury, Kevin Barker, Ang Li

    Abstract: The relentless advancement of artificial intelligence (AI) and machine learning (ML) applications necessitates the development of specialized hardware accelerators capable of handling the increasing complexity and computational demands. Traditional computing architectures, based on the von Neumann model, are being outstripped by the requirements of contemporary AI/ML algorithms, leading to a surge…

    Submitted 19 March, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: ICPE 2024 accepted publication

    ACM Class: C.4

  9. arXiv:2309.14331  [pdf, other]

    cs.LG cs.AI cs.CR

    LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference

    Authors: Hongwu Peng, Ran Ran, Yukui Luo, Jiahui Zhao, Shaoyi Huang, Kiran Thorat, Tong Geng, Chenghong Wang, Xiaolin Xu, Wujie Wen, Caiwen Ding

    Abstract: The growth of Graph Convolution Network (GCN) model sizes has revolutionized numerous applications, surpassing human performance in areas such as personal healthcare and financial systems. The deployment of GCNs in the cloud raises privacy concerns due to potential adversarial attacks on client data. To address security concerns, Privacy-Preserving Machine Learning (PPML) using Homomorphic Encrypt…

    Submitted 4 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: NeurIPS 2023 accepted publication

    ACM Class: E.3; I.2; B.0

  10. arXiv:2309.13196  [pdf, other]

    cs.CV

    ClusterFormer: Clustering As A Universal Visual Learner

    Authors: James C. Liang, Yiming Cui, Qifan Wang, Tong Geng, Wenguan Wang, Dongfang Liu

    Abstract: This paper presents CLUSTERFORMER, a universal vision model that is based on the CLUSTERing paradigm with TransFORMER. It comprises two novel designs: 1. recurrent cross-attention clustering, which reformulates the cross-attention mechanism in Transformer and enables recursive updates of cluster centers to facilitate strong representation learning; and 2. feature dispatching, which uses the update…

    Submitted 5 October, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

  11. arXiv:2308.11825  [pdf, other]

    cs.AR cs.LG

    Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution Networks

    Authors: Xi Xie, Hongwu Peng, Amit Hasan, Shaoyi Huang, Jiahui Zhao, Haowen Fang, Wei Zhang, Tong Geng, Omer Khan, Caiwen Ding

    Abstract: Graph Convolutional Networks (GCNs) are pivotal in extracting latent information from graph data across various domains, yet their acceleration on mainstream GPUs is challenged by workload imbalance and memory access irregularity. To address these challenges, we present Accel-GCN, a GPU accelerator architecture for GCNs. The design of Accel-GCN encompasses: (i) a lightweight degree sorting stage t…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: ICCAD 2023 accepted publication

    ACM Class: I.2; B.6; C.3

  12. arXiv:2308.10134  [pdf, other]

    cs.CR cs.LG

    AutoReP: Automatic ReLU Replacement for Fast Private Network Inference

    Authors: Hongwu Peng, Shaoyi Huang, Tong Zhou, Yukui Luo, Chenghong Wang, Zigeng Wang, Jiahui Zhao, Xi Xie, Ang Li, Tony Geng, Kaleel Mahmood, Wujie Wen, Xiaolin Xu, Caiwen Ding

    Abstract: The growth of the Machine-Learning-As-A-Service (MLaaS) market has highlighted clients' data privacy and security issues. Private inference (PI) techniques using cryptographic primitives offer a solution but often have high computation and communication costs, particularly with non-linear operators like ReLU. Many attempts to reduce ReLU operations exist, but they may need heuristic threshold sele…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: ICCV 2023 accepted publication

    ACM Class: E.3; I.2; B.0

  13. arXiv:2306.15513  [pdf, other]

    cs.CR

    PASNet: Polynomial Architecture Search Framework for Two-party Computation-based Secure Neural Network Deployment

    Authors: Hongwu Peng, Shanglin Zhou, Yukui Luo, Nuo Xu, Shijin Duan, Ran Ran, Jiahui Zhao, Chenghong Wang, Tong Geng, Wujie Wen, Xiaolin Xu, Caiwen Ding

    Abstract: Two-party computation (2PC) is promising to enable privacy-preserving deep learning (DL). However, the 2PC-based privacy-preserving DL implementation comes with high comparison protocol overhead from the non-linear operators. This work presents PASNet, a novel systematic framework that enables low latency, high energy efficiency & accuracy, and security-guaranteed 2PC-DL by integrating the hardwar…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: DAC 2023 accepted publication; a short version was published at the AAAI 2023 workshop on DL-Hardware Co-Design for AI Acceleration: RRNet: Towards ReLU-Reduced Neural Network for Two-party Computation Based Private Inference

    ACM Class: E.3; I.2; B.0

    Journal ref: DAC 2023

  14. arXiv:2304.11523  [pdf, other]

    cs.CV

    TransFlow: Transformer as Flow Learner

    Authors: Yawen Lu, Qifan Wang, Siqi Ma, Tong Geng, Yingjie Victor Chen, Huaijin Chen, Dongfang Liu

    Abstract: Optical flow is an indispensable building block for various important computer vision tasks, including motion estimation, object tracking, and disparity measurement. In this work, we propose TransFlow, a pure transformer architecture for optical flow estimation. Compared to dominant CNN-based methods, TransFlow demonstrates three advantages. First, it provides more accurate correlation and trustwo…

    Submitted 22 April, 2023; originally announced April 2023.

    Comments: 11 pages. Accepted by CVPR2023

  15. arXiv:2304.02525  [pdf, other]

    cs.ET

    Supporting Energy-Based Learning With An Ising Machine Substrate: A Case Study on RBM

    Authors: Uday Kumar Reddy Vengalam, Yongchao Liu, Tong Geng, Hui Wu, Michael Huang

    Abstract: Nature apparently does a lot of computation constantly. If we can harness some of that computation at an appropriate level, we can potentially perform certain type of computation (much) faster and more efficiently than we can do with a von Neumann computer. Indeed, many powerful algorithms are inspired by nature and are thus prime candidates for nature-based computation. One particular branch of t…

    Submitted 19 February, 2024; v1 submitted 5 April, 2023; originally announced April 2023.

  16. arXiv:2303.12930  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline

    Authors: Tiantian Geng, Teng Wang, Jinming Duan, Runmin Cong, Feng Zheng

    Abstract: Existing audio-visual event localization (AVE) handles manually trimmed videos with only a single instance in each of them. However, this setting is unrealistic as natural videos often contain numerous audio-visual events with different categories. To better adapt to real-life applications, in this paper we focus on the task of dense-localizing audio-visual events, which aims to jointly localize a…

    Submitted 24 March, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR2023

  17. arXiv:2303.10881  [pdf]

    cond-mat.dis-nn cs.LG

    Machine Learning Automated Approach for Enormous Synchrotron X-Ray Diffraction Data Interpretation

    Authors: Xiaodong Zhao, YiXuan Luo, Juejing Liu, Wenjun Liu, Kevin M. Rosso, Xiaofeng Guo, Tong Geng, Ang Li, Xin Zhang

    Abstract: Manual analysis of XRD data is usually laborious and time consuming. The deep neural network (DNN) based models trained by synthetic XRD patterns are proved to be an automatic, accurate, and high throughput method to analysis common XRD data collected from solid sample in ambient environment. However, it remains unknown that whether synthetic XRD based models are capable to solve u-XRD mapping dat…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: See link below for supporting information https://docs.google.com/document/d/1m2SyaBDej4BhkWCA38GRXJe5Jd7Di7cp/edit?usp=sharing&ouid=108731997922646321851&rtpof=true&sd=true

  18. arXiv:2302.02292  [pdf, other]

    cs.CR cs.LG

    RRNet: Towards ReLU-Reduced Neural Network for Two-party Computation Based Private Inference

    Authors: Hongwu Peng, Shanglin Zhou, Yukui Luo, Nuo Xu, Shijin Duan, Ran Ran, Jiahui Zhao, Shaoyi Huang, Xi Xie, Chenghong Wang, Tong Geng, Wujie Wen, Xiaolin Xu, Caiwen Ding

    Abstract: The proliferation of deep learning (DL) has led to the emergence of privacy and security concerns. To address these issues, secure Two-party computation (2PC) has been proposed as a means of enabling privacy-preserving DL computation. However, in practice, 2PC methods often incur high computation and communication overhead, which can impede their use in large-scale systems. To address this challen…

    Submitted 22 February, 2023; v1 submitted 4 February, 2023; originally announced February 2023.

    Comments: This work is an updated version of arXiv:2209.09424; the original version has been withdrawn

    ACM Class: I.2

  19. arXiv:2210.04114  [pdf, other]

    cs.LG

    Towards Real-Time Temporal Graph Learning

    Authors: Deniz Gurevin, Mohsin Shan, Tong Geng, Weiwen Jiang, Caiwen Ding, Omer Khan

    Abstract: In recent years, graph representation learning has gained significant popularity, which aims to generate node embeddings that capture features of graphs. One of the methods to achieve this is employing a technique called random walks that captures node sequences in a graph and then learns embeddings for each node using a natural language processing technique called Word2Vec. These embeddings are t…

    Submitted 11 October, 2022; v1 submitted 8 October, 2022; originally announced October 2022.

  20. arXiv:2209.09424   

    cs.CR cs.LG

    PolyMPCNet: Towards ReLU-free Neural Architecture Search in Two-party Computation Based Private Inference

    Authors: Hongwu Peng, Shanglin Zhou, Yukui Luo, Shijin Duan, Nuo Xu, Ran Ran, Shaoyi Huang, Chenghong Wang, Tong Geng, Ang Li, Wujie Wen, Xiaolin Xu, Caiwen Ding

    Abstract: The rapid growth and deployment of deep learning (DL) has witnessed emerging privacy and security concerns. To mitigate these issues, secure multi-party computation (MPC) has been discussed, to enable the privacy-preserving DL computation. In practice, they often come at very high computation and communication overhead, and potentially prohibit their popularity in large scale systems. Two orthogon…

    Submitted 22 February, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

    Comments: Uploaded a new version of the paper in another new submission: RRNet: Towards ReLU-Reduced Neural Network for Two-party Computation Based Private Inference [arXiv:2302.02292]

    ACM Class: I.2; E.3; C.3

  21. arXiv:2209.06800  [pdf, other]

    cs.DC cs.LG

    MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms

    Authors: Yuke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Kevin Barker, Ang Li, Yufei Ding

    Abstract: The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the computation and communication individually based on the conventional practice of scaling dense DNNs. For irregularly sparse and fine-grained GNN workloads, such solutions miss the opportunity to jointly schedule/optimize the…

    Submitted 26 June, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: Paper is accepted to OSDI'23

  22. arXiv:2209.04766  [pdf, other]

    cs.LG

    Towards Sparsification of Graph Neural Networks

    Authors: Hongwu Peng, Deniz Gurevin, Shaoyi Huang, Tong Geng, Weiwen Jiang, Omer Khan, Caiwen Ding

    Abstract: As real-world graphs expand in size, larger GNN models with billions of parameters are deployed. High parameter count in such models makes training and inference on graphs expensive and challenging. To reduce the computational and memory costs of GNNs, optimization methods such as pruning the redundant nodes and edges in input graphs have been commonly adopted. However, model compression, which di…

    Submitted 24 February, 2023; v1 submitted 10 September, 2022; originally announced September 2022.

    Comments: ICCD 2022 Paper

    ACM Class: I.2; C.4

  23. arXiv:2209.02193  [pdf, other]

    cs.RO cs.AR cs.PL

    Programming Autonomous Machines

    Authors: Shaoshan Liu, Xiaoming Li, Tongsheng Geng, Stephane Zuckerman, Jean-Luc Gaudiot

    Abstract: One key technical challenge in the age of autonomous machines is the programming of autonomous machines, which demands the synergy across multiple domains, including fundamental computer science, computer architecture, and robotics, and requires expertise from both academia and industry. This paper discusses the programming theory and practices tied to producing real-life autonomous machines, and…

    Submitted 5 September, 2022; originally announced September 2022.

    Comments: EMSOFT 2022

  24. A Length Adaptive Algorithm-Hardware Co-design of Transformer on FPGA Through Sparse Attention and Dynamic Pipelining

    Authors: Hongwu Peng, Shaoyi Huang, Shiyang Chen, Bingbing Li, Tong Geng, Ang Li, Weiwen Jiang, Wujie Wen, Jinbo Bi, Hang Liu, Caiwen Ding

    Abstract: Transformers are considered one of the most important deep learning models since 2018, in part because it establishes state-of-the-art (SOTA) records and could potentially replace existing Deep Neural Networks (DNNs). Despite the remarkable triumphs, the prolonged turnaround time of Transformer models is a widely recognized roadblock. The variety of sequence lengths imposes additional computing ov…

    Submitted 20 August, 2022; v1 submitted 7 August, 2022; originally announced August 2022.

    Comments: 2022 59th ACM/IEEE Design Automation Conference (DAC)

    ACM Class: I.2; B.6; C.3

  25. arXiv:2206.13734  [pdf, other]

    cs.AR cs.LG

    H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture

    Authors: Chengming Zhang, Tong Geng, Anqi Guo, Jiannan Tian, Martin Herbordt, Ang Li, Dingwen Tao

    Abstract: Graph Neural Networks (GNNs) have drawn tremendous attention due to their unique capability to extend Machine Learning (ML) approaches to applications broadly-defined as having unstructured data, especially graphs. Compared with other Machine Learning (ML) modalities, the acceleration of Graph Neural Networks (GNNs) is more challenging due to the irregularity and heterogeneity derived from graph t…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: 8 pages, 8 figures, 4 tables, accepted by FPL'22

  26. arXiv:2206.08482  [pdf, other]

    cs.DC

    GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing

    Authors: Yuke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Ang Li, Yufei Ding

    Abstract: With the increasing popularity of robotics in industrial control and autonomous driving, deep reinforcement learning (DRL) raises the attention of various fields. However, DRL computation on the modern powerful GPU platform is still inefficient due to its heterogeneous workloads and interleaved execution paradigm. To this end, we propose GMI-DRL, a systematic design to accelerate multi-GPU DRL via…

    Submitted 16 June, 2022; originally announced June 2022.

  27. arXiv:2206.03291  [pdf, other]

    cs.NE cs.LG

    GAAF: Searching Activation Functions for Binary Neural Networks through Genetic Algorithm

    Authors: Yanfei Li, Tong Geng, Samuel Stein, Ang Li, Huimin Yu

    Abstract: Binary neural networks (BNNs) show promising utilization in cost and power-restricted domains such as edge devices and mobile systems. This is due to its significantly less computation and storage demand, but at the cost of degraded performance. To close the accuracy gap, in this paper we propose to add a complementary activation function (AF) ahead of the sign based binarization, and rely on the…

    Submitted 5 June, 2022; originally announced June 2022.

  28. Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors

    Authors: Wei Sun, Ang Li, Tong Geng, Sander Stuijk, Henk Corporaal

    Abstract: Tensor Cores have been an important unit to accelerate Fused Matrix Multiplication Accumulation (MMA) in all NVIDIA GPUs since Volta Architecture. To program Tensor Cores, users have to use either legacy wmma APIs or current mma APIs. Legacy wmma APIs are more easy-to-use but can only exploit limited features and power of Tensor Cores. Specifically, wmma APIs support fewer operand shapes and can n…

    Submitted 24 November, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

  29. arXiv:2203.03990  [pdf, other]

    cs.CV

    Skating-Mixer: Long-Term Sport Audio-Visual Modeling with MLPs

    Authors: Jingfei Xia, Mingchen Zhuge, Tiantian Geng, Shun Fan, Yuantai Wei, Zhenyu He, Feng Zheng

    Abstract: Figure skating scoring is challenging because it requires judging the technical moves of the players as well as their coordination with the background music. Most learning-based methods cannot solve it well for two reasons: 1) each move in figure skating changes quickly, hence simply applying traditional frame sampling will lose a lot of valuable information, especially in 3 to 5 minutes long vide…

    Submitted 17 December, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Our code is available at https://github.com/AndyFrancesco29/Audio-Visual-Figure-Skating

  30. arXiv:2203.03606  [pdf, other]

    cs.AR cs.LG

    I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization

    Authors: Tong Geng, Chunshu Wu, Yongan Zhang, Cheng Tan, Chenhao Xie, Haoran You, Martin C. Herbordt, Yingyan Lin, Ang Li

    Abstract: Graph Convolutional Networks (GCNs) have drawn tremendous attention in the past three years. Compared with other deep learning modalities, high-performance hardware acceleration of GCNs is as critical but even more challenging. The hurdles arise from the poor data locality and redundant computation due to the large size, high sparsity, and irregular non-zero distribution of real-world graphs. In…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: Published in MICRO 2022

  31. arXiv:2112.11594  [pdf, other]

    cs.AR cs.LG

    GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design

    Authors: Haoran You, Tong Geng, Yongan Zhang, Ang Li, Yingyan Lin

    Abstract: Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art graph learning model. However, it can be notoriously challenging to inference GCNs over large graph datasets, limiting their application to large real-world graphs and hindering the exploration of deeper and more sophisticated GCN graphs. This is because real-world graphs can be extremely large and sparse. Furthermore, the no…

    Submitted 30 March, 2022; v1 submitted 21 December, 2021; originally announced December 2021.

    Comments: Published as a conference paper at HPCA 2022

  32. arXiv:2109.08983  [pdf, other]

    cs.AR cs.AI cs.LG

    G-CoS: GNN-Accelerator Co-Search Towards Both Better Accuracy and Efficiency

    Authors: Yongan Zhang, Haoran You, Yonggan Fu, Tong Geng, Ang Li, Yingyan Lin

    Abstract: Graph Neural Networks (GNNs) have emerged as the state-of-the-art (SOTA) method for graph-based learning tasks. However, it still remains prohibitively challenging to inference GNNs over large graph datasets, limiting their application to large-scale real-world tasks. While end-to-end jointly optimizing GNNs and their accelerators is promising in boosting GNNs' inference efficiency and expediting…

    Submitted 18 September, 2021; originally announced September 2021.

    Comments: Accepted at ICCAD 2021

  33. arXiv:2109.06355  [pdf, other]

    cs.AR

    Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search

    Authors: Hongwu Peng, Shiyang Chen, Zhepeng Wang, Junhuan Yang, Scott A. Weitze, Tong Geng, Ang Li, Jinbo Bi, Minghu Song, Weiwen Jiang, Hang Liu, Caiwen Ding

    Abstract: Molecular similarity search has been widely used in drug discovery to identify structurally similar compounds from large molecular databases rapidly. With the increasing size of chemical libraries, there is growing interest in the efficient acceleration of large-scale similarity search. Existing works mainly focus on CPU and GPU to accelerate the computation of the Tanimoto coefficient in measurin…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: ICCAD 2021

    ACM Class: B.0; I.0

  34. arXiv:2108.04811  [pdf, other]

    cs.LG

    Binary Complex Neural Network Acceleration on FPGA

    Authors: Hongwu Peng, Shanglin Zhou, Scott Weitze, Jiaxin Li, Sahidul Islam, Tong Geng, Ang Li, Wei Zhang, Minghu Song, Mimi Xie, Hang Liu, Caiwen Ding

    Abstract: Being able to learn from complex data with phase information is imperative for many signal processing applications. Today's real-valued deep neural networks (DNNs) have shown efficiency in latent information analysis but fall short when applied to the complex domain. Deep complex networks (DCN), in contrast, can learn from complex data, but have high computational costs; therefore, they cannot sa…

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: ASAP 2021, 8 pages

    ACM Class: B.0; C.3; I.2

  35. CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Designed Adaptive Lossy Compression

    Authors: Chengming Zhang, Sian Jin, Tong Geng, Jiannan Tian, Ang Li, Dingwen Tao

    Abstract: As HPC systems continue to grow to exascale, the amount of data that needs to be saved or transmitted is exploding. To this end, many previous works have studied using error-bounded lossy compressors to reduce the data size and improve the I/O performance. However, little work has been done for effectively offloading lossy compression onto FPGA-based SmartNICs to reduce the compression overhead. I…

    Submitted 13 May, 2022; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: 13 pages, 15 figures, 8 tables, accepted by ACM ICS '22

  36. arXiv:2106.12169  [pdf, other]

    cs.DC cs.AI cs.AR cs.CV

    APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores

    Authors: Boyuan Feng, Yuke Wang, Tong Geng, Ang Li, Yufei Ding

    Abstract: Over the years, accelerating neural networks with quantization has been widely studied. Unfortunately, prior efforts with diverse precisions (e.g., 1-bit weights and 2-bit activations) are usually restricted by limited precision support on GPUs (e.g., int1 and int4). To break such restrictions, we introduce the first Arbitrary Precision Neural Network framework (APNN-TC) to fully exploit quantizat…

    Submitted 16 November, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: Accepted by SC'21

  37. arXiv:2104.10044  [pdf, other]

    cs.NE cs.LG

    BCNN: Binary Complex Neural Network

    Authors: Yanfei Li, Tong Geng, Ang Li, Huimin Yu

    Abstract: Binarized neural networks, or BNNs, show great promise in edge-side applications with resource limited hardware, but raise the concerns of reduced accuracy. Motivated by the complex neural networks, in this paper we introduce complex representation into the BNNs and propose Binary complex neural network -- a novel network design that processes binary complex inputs and weights through complex conv…

    Submitted 27 March, 2021; originally announced April 2021.

  38. arXiv:2012.06132  [pdf, other]

    cs.CV

    Color-related Local Binary Pattern: A Learned Local Descriptor for Color Image Recognition

    Authors: Bin Xiao, Tao Geng, Xiuli Bi, Weisheng Li

    Abstract: Local binary pattern (LBP) as a kind of local feature has shown its simplicity, easy implementation and strong discriminating power in image recognition. Although some LBP variants are specifically investigated for color image recognition, the color information of images is not adequately considered and the curse of dimensionality in classification is easily caused in these methods. In this paper,…

    Submitted 11 December, 2020; originally announced December 2020.

  39. arXiv:2011.04931  [pdf, other]

    cs.DC cs.AR

    ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing

    Authors: Cheng Tan, Chenhao Xie, Tong Geng, Andres Marquez, Antonino Tumeo, Kevin Barker, Ang Li

    Abstract: The next generation HPC and data centers are likely to be reconfigurable and data-centric due to the trend of hardware specialization and the emergence of data-driven applications. In this paper, we propose ARENA -- an asynchronous reconfigurable accelerator ring architecture as a potential scenario on how the future HPC and data centers will be like. Despite using the coarse-grained reconfigurabl…

    Submitted 19 April, 2021; v1 submitted 10 November, 2020; originally announced November 2020.

  40. arXiv:2009.07899  [pdf, other]

    cs.LG stat.ML

    Comparison Lift: Bandit-based Experimentation System for Online Advertising

    Authors: Tong Geng, Xiliang Lin, Harikesh S. Nair, Jun Hao, Bin Xiang, Shurui Fan

    Abstract: Comparison Lift is an experimentation-as-a-service (EaaS) application for testing online advertising audiences and creatives at JD.com. Unlike many other EaaS tools that focus primarily on fixed sample A/B testing, Comparison Lift deploys a custom bandit-based experimentation algorithm. The advantages of the bandit-based approach are two-fold. First, it aligns the randomization induced in the test…

    Submitted 16 September, 2020; originally announced September 2020.

  41. CSB-RNN: A Faster-than-Realtime RNN Acceleration Framework with Compressed Structured Blocks

    Authors: Runbin Shi, Peiyan Dong, Tong Geng, Yuhao Ding, Xiaolong Ma, Hayden K. -H. So, Martin Herbordt, Ang Li, Yanzhi Wang

    Abstract: Recurrent neural networks (RNNs) have been widely adopted in temporal sequence analysis, where realtime performance is often in demand. However, RNNs suffer from heavy computational workload as the model often comes with large weight matrices. Pruning schemes have been proposed for RNNs to eliminate the redundant (close-to-zero) weight values. On one hand, the non-structured pruning methods achiev…

    Submitted 11 May, 2020; originally announced May 2020.

    ACM Class: C.1.4

  42. arXiv:1908.10834  [pdf, other]

    cs.DC cs.LG

    AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing

    Authors: Tong Geng, Ang Li, Runbin Shi, Chunshu Wu, Tianqi Wang, Yanfei Li, Pouya Haghi, Antonino Tumeo, Shuai Che, Steve Reinhardt, Martin Herbordt

    Abstract: Deep learning systems have been successfully applied to Euclidean data such as images, video, and audio. In many applications, however, information and their relationships are better expressed with graphs. Graph Convolutional Networks (GCNs) appear to be a promising approach to efficiently learn from graph data structures, having shown advantages in many critical applications. As with other deep l…

    Submitted 10 September, 2020; v1 submitted 23 August, 2019; originally announced August 2019.

  43. arXiv:1907.02178  [pdf, other]

    cs.LG stat.ML

    Online Evaluation of Audiences for Targeted Advertising via Bandit Experiments

    Authors: Tong Geng, Xiliang Lin, Harikesh S. Nair

    Abstract: Firms implementing digital advertising campaigns face a complex problem in determining the right match between their advertising creatives and target audiences. Typical solutions to the problem have leveraged non-experimental methods, or used "split-testing" strategies that have not explicitly addressed the complexities induced by targeted audiences that can potentially overlap with one another. T…

    Submitted 4 September, 2019; v1 submitted 3 July, 2019; originally announced July 2019.

  44. arXiv:1905.05359  [pdf, other]

    cs.DC cs.AR

    Fully Integrated On-FPGA Molecular Dynamics Simulations

    Authors: Chen Yang, Tong Geng, Tianqi Wang, Rushi Patel, Qingqing Xiong, Ahmed Sanaullah, Jiayi Sheng, Charles Lin, Vipin Sachdeva, Woody Sherman, Martin C. Herbordt

    Abstract: The implementation of Molecular Dynamics (MD) on FPGAs has received substantial attention. Previous work, however, has consisted of either proof-of-concept implementations of components, usually the range-limited force; full systems, but with much of the work shared by the host CPU; or prototype demonstrations, e.g., using OpenCL, that neither implement a whole system nor have competitive performa…

    Submitted 13 May, 2019; originally announced May 2019.

    Comments: 13 pages, 17 figures

  45. arXiv:1901.01007  [pdf, other]

    cs.LG cs.AR cs.DC stat.ML

    FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters

    Authors: Tong Geng, Tianqi Wang, Ang Li, Xi Jin, Martin Herbordt

    Abstract: Deep Neural Networks (DNNs) have revolutionized numerous applications, but the demand for ever more performance remains unabated. Scaling DNN computations to larger clusters is generally done by distributing tasks in batch mode using methods such as distributed synchronous SGD. Among the issues with this approach is that to make the distributed cluster work with high utilization, the workload dist…

    Submitted 21 June, 2020; v1 submitted 4 January, 2019; originally announced January 2019.

    Comments: Accepted by IEEE Transactions on Computers (TC)