-
Learning Multi-view Anomaly Detection
Authors:
Haoyang He,
Jiangning Zhang,
Guanzhong Tian,
Chengjie Wang,
Lei Xie
Abstract:
This study explores the recently proposed and challenging multi-view Anomaly Detection (AD) task. Single-view methods suffer from blind spots that other perspectives could cover, resulting in inaccurate sample-level predictions. We therefore introduce the Multi-View Anomaly Detection (MVAD) framework, which learns and integrates features from multiple views. Specifically, we propose a Multi-View Adaptive Selection (MVAS) algorithm for feature learning and fusion across multiple views. The feature maps are divided into neighbourhood attention windows, and a semantic correlation matrix is computed between each single-view window and the windows of all other views; attention is then conducted between each single-view window and its top-K most correlated multi-view windows. Adjusting the window size and top-K can reduce the computational complexity to linear. Extensive experiments on the Real-IAD dataset in both multi-class and single-class settings validate the effectiveness of our approach, which achieves state-of-the-art performance with improvements of 4.1%/5.6%/6.7% at the sample/image/pixel levels across a total of ten metrics, using only 18M parameters and requiring less GPU memory and training time.
Submitted 16 July, 2024;
originally announced July 2024.
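The window-selection step described above can be illustrated with a small NumPy sketch. It is not the authors' MVAS implementation: it assumes each window is summarized by its mean token when building the correlation matrix, uses cosine similarity as the "semantic correlation", and uses plain single-head dot-product attention. The function name `topk_cross_view_attention` is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_cross_view_attention(windows, k):
    """Hypothetical top-K cross-view window attention (illustrative only).

    windows: (V, W, T, C) array = views x windows per view x tokens per window x channels.
    Each window's tokens attend to the tokens of its k most correlated windows
    taken from the other views. Returns an array with the same shape."""
    V, W, T, C = windows.shape
    if not 1 <= k <= (V - 1) * W:
        raise ValueError("k must be between 1 and the number of windows in other views")
    # Window descriptor = mean token, L2-normalized, so a dot product is cosine similarity.
    desc = windows.mean(axis=2).reshape(V * W, C)
    desc = desc / (np.linalg.norm(desc, axis=-1, keepdims=True) + 1e-8)
    flat = windows.reshape(V * W, T, C)
    view_id = np.repeat(np.arange(V), W)          # which view each window belongs to
    corr = desc @ desc.T                          # (V*W, V*W) window correlation matrix
    corr[view_id[:, None] == view_id[None, :]] = -np.inf  # only select other views
    out = np.empty_like(flat)
    for i in range(V * W):
        top = np.argsort(corr[i])[::-1][:k]       # indices of the k most correlated windows
        kv = flat[top].reshape(k * T, C)          # their tokens serve as keys and values
        attn = softmax(flat[i] @ kv.T / np.sqrt(C))  # (T, k*T) attention weights
        out[i] = attn @ kv
    return out.reshape(V, W, T, C)
```

For a fixed window size T and top-K value k, the attention step costs on the order of N·k·T·C for N tokens in total, which is linear in N; in this sketch the window-level correlation matrix still grows quadratically with the number of windows.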
-
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning
Authors:
Christopher E. Mower,
Yuhui Wan,
Hongzhan Yu,
Antoine Grosnit,
Jonas Gonzalez-Billandon,
Matthieu Zimmer,
Jinlong Wang,
Xinyu Zhang,
Yao Zhao,
Anbang Zhai,
Puze Liu,
Daniel Palenicek,
Davide Tateo,
Cesar Cadena,
Marco Hutter,
Jan Peters,
Guangjian Tian,
Yuzheng Zhuang,
Kun Shao,
Xingyue Quan,
Jianye Hao,
Jun Wang,
Haitham Bou-Ammar
Abstract:
We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connected to a plethora of open-source and commercial LLMs, automatic extraction of a behavior from the LLM output and execution of ROS actions/services, support for three behavior modes (sequence, behavior tree, state machine), imitation learning for adding new robot actions to the library of possible actions, and LLM reflection via human and environment feedback. Extensive experiments validate the framework, showcasing robustness, scalability, and versatility in diverse scenarios, including long-horizon tasks, tabletop rearrangements, and remote supervisory control. To facilitate the adoption of our framework and support the reproduction of our results, we have made our code open-source. You can access it at: https://github.com/huawei-noah/HEBO/tree/master/ROSLLM.
Submitted 12 July, 2024; v1 submitted 28 June, 2024;
originally announced June 2024.
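Extracting a behavior from LLM output in the sequence mode can be pictured as parsing a structured plan and dispatching each step to a registered action. The sketch below is hypothetical: it assumes the LLM is prompted to reply with a JSON list of `{"action", "args"}` objects, and the `ACTIONS` registry uses plain Python functions as stand-ins for ROS actions/services. None of these names come from the ROS-LLM code base.

```python
import json
import re

# Hypothetical action library: stand-ins for ROS actions/services, names are illustrative.
ACTIONS = {
    "move_to": lambda x, y: f"moved to ({x}, {y})",
    "grasp": lambda obj: f"grasped {obj}",
    "release": lambda: "released",
}

# Matches a fenced block (three backticks, optional "json" tag) in the LLM reply.
FENCED = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)

def extract_sequence(llm_text):
    """Parse a JSON list of {"action": ..., "args": {...}} steps from an LLM reply."""
    match = FENCED.search(llm_text)
    steps = json.loads(match.group(1) if match else llm_text)
    if not isinstance(steps, list):
        raise ValueError("expected a JSON list of steps")
    for step in steps:
        if step.get("action") not in ACTIONS:
            raise ValueError(f"unknown action: {step.get('action')!r}")
    return steps

def run_sequence(steps):
    """Sequence mode: execute steps in order and collect their results."""
    return [ACTIONS[step["action"]](**step.get("args", {})) for step in steps]
```

Validating every step against the registry before executing anything keeps a malformed or hallucinated plan from running partially; behavior-tree and state-machine modes would need a richer schema than a flat list.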
-
Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark
Authors:
Jiangning Zhang,
Chengjie Wang,
Xiangtai Li,
Guanzhong Tian,
Zhucun Xue,
Yong Liu,
Guansong Pang,
Dacheng Tao
Abstract:
Anomaly detection (AD) often focuses on detecting anomalous regions for industrial quality inspection and medical lesion examination. However, because of these specific target scenarios, the data scale for AD is relatively small, and its evaluation metrics remain deficient compared with classic vision tasks such as object detection and semantic segmentation. To fill these gaps, this work first constructs a large-scale, general-purpose COCO-AD dataset by extending COCO to the AD field. This enables fair evaluation and sustainable development of different methods on this challenging benchmark. Moreover, current metrics such as AU-ROC have nearly saturated on simple datasets, which prevents a comprehensive evaluation of different methods. Inspired by the metrics used in segmentation, we further propose several more practical threshold-dependent AD-specific metrics, i.e., m$F_1$$^{.2}_{.8}$, mAcc$^{.2}_{.8}$, mIoU$^{.2}_{.8}$, and mIoU-max. Motivated by the high-quality reconstruction capability of GAN inversion, we propose a simple but more powerful InvAD framework that achieves high-quality feature reconstruction. Our method improves the effectiveness of reconstruction-based methods on the popular MVTec AD and VisA datasets and on our newly proposed COCO-AD dataset under a multi-class unsupervised setting, where a single detection model is trained to detect anomalies from different classes. Extensive ablation experiments demonstrate the effectiveness of each component of InvAD. The full code and models are available at https://github.com/zhangzjn/ader.
Submitted 16 April, 2024;
originally announced April 2024.
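One plausible reading of the threshold-dependent metrics is sketched below: binarize a normalized anomaly map at every threshold from 0.2 to 0.8 and average (or take the maximum of) the per-threshold IoU. The 0.1 threshold step, the normalization of scores to [0, 1], and treating mIoU-max as the best IoU over the same sweep are assumptions for illustration, not the paper's exact definitions; m$F_1$ and mAcc would follow the same sweep pattern.

```python
import numpy as np

def iou(pred, gt):
    """IoU of the anomalous (positive) region between two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else float(inter / union)

def threshold_sweep_iou(scores, gt, lo=0.2, hi=0.8, step=0.1):
    """Return (mean IoU, max IoU) over thresholds lo, lo + step, ..., hi.

    Assumes `scores` is an anomaly map already normalized to [0, 1]."""
    # Rounding avoids thresholds like 0.30000000000000004 from floating-point arange.
    thresholds = np.round(np.arange(lo, hi + step / 2, step), 10)
    ious = np.array([iou(scores >= t, gt) for t in thresholds])
    return float(ious.mean()), float(ious.max())
```

Averaging over a threshold band rewards maps that stay correct across many operating points, which a single ranking metric such as AU-ROC does not capture.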
-
MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection
Authors:
Haoyang He,
Yuhu Bai,
Jiangning Zhang,
Qingdong He,
Hongxu Chen,
Zhenye Gan,
Chengjie Wang,
Xiangtai Li,
Guanzhong Tian,
Lei Xie
Abstract:
Recent advances in anomaly detection have demonstrated the efficacy of CNN- and transformer-based approaches. However, CNNs struggle with long-range dependencies, while transformers are burdened by quadratic computational complexity. Mamba-based models, with their superior long-range modeling and linear efficiency, have garnered substantial attention. This study pioneers the application of Mamba to multi-class unsupervised anomaly detection, presenting MambaAD, which consists of a pre-trained encoder and a Mamba decoder featuring Locality-Enhanced State Space (LSS) modules at multiple scales. The proposed LSS module, which integrates parallel cascaded Hybrid State Space (HSS) blocks and multi-kernel convolution operations, effectively captures both long-range and local information. The HSS block uses Hybrid Scanning (HS) encoders to encode feature maps with five scanning methods and eight directions, thereby strengthening global connections through the State Space Model (SSM). The use of Hilbert scanning and eight directions significantly improves feature sequence modeling. Comprehensive experiments on six diverse anomaly detection datasets with seven metrics demonstrate state-of-the-art performance, substantiating the method's effectiveness.
Submitted 14 April, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
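Hilbert scanning flattens a 2-D feature map into a 1-D sequence along a space-filling curve, so tokens that are consecutive in the sequence are also adjacent in the map. The NumPy sketch below uses the classic iterative index-to-coordinate conversion for a Hilbert curve; it assumes a square map whose side is a power of two, and the function names are illustrative rather than taken from the MambaAD code.

```python
import numpy as np

def _rotate(n, x, y, rx, ry):
    # Rotate/flip a quadrant so the sub-curve has the correct orientation.
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y

def hilbert_d2xy(n, d):
    """Map distance d along a Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_indices(n):
    """Row and column index arrays visiting an n x n grid in Hilbert order."""
    coords = [hilbert_d2xy(n, d) for d in range(n * n)]
    cols = np.array([x for x, _ in coords])
    rows = np.array([y for _, y in coords])
    return rows, cols

def hilbert_scan(feat):
    """(C, H, W) feature map -> (H*W, C) token sequence in Hilbert order."""
    _, h, w = feat.shape
    if h != w or h & (h - 1):
        raise ValueError("expects a square map with a power-of-two side")
    rows, cols = hilbert_indices(h)
    return feat[:, rows, cols].T

def hilbert_unscan(seq, h):
    """Inverse of hilbert_scan: (H*W, C) -> (C, H, W)."""
    rows, cols = hilbert_indices(h)
    out = np.empty((seq.shape[1], h, h), dtype=seq.dtype)
    out[:, rows, cols] = seq.T
    return out
```

Reversing the sequence or transposing the map before scanning yields further scan directions, which is one simple way (not necessarily MambaAD's) to derive several directions from a single curve.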
-
MarsSeg: Mars Surface Semantic Segmentation with Multi-level Extractor and Connector
Authors:
Junbo Li,
Keyan Chen,
Gengju Tian,
Lu Li,
Zhenwei Shi
Abstract:
The segmentation and interpretation of the Martian surface play a pivotal role in Mars exploration, providing essential data for the trajectory planning and obstacle avoidance of rovers. However, the complex topography, similar surface features, and the lack of extensive annotated data pose significant challenges to high-precision semantic segmentation of the Martian surface. To address these challenges, we propose a novel encoder-decoder-based Mars segmentation network, termed MarsSeg. Specifically, we employ an encoder-decoder structure with a minimal number of down-sampling layers to preserve local details. To facilitate high-level semantic understanding across the shallow multi-level feature maps, we introduce a feature enhancement connection layer between the encoder and decoder. This layer incorporates Mini Atrous Spatial Pyramid Pooling (Mini-ASPP), Polarized Self-Attention (PSA), and a Strip Pyramid Pooling Module (SPPM). Mini-ASPP and PSA are designed for shallow feature enhancement, enabling the expression of local details and small objects. Conversely, SPPM is employed for deep feature enhancement, facilitating the extraction of high-level, category-related semantic information. Experimental results on the Mars-Seg and AI4Mars datasets show that MarsSeg outperforms other state-of-the-art methods in segmentation performance and validate the efficacy of each proposed component.
Submitted 5 April, 2024;
originally announced April 2024.
-
Dual-path Frequency Discriminators for Few-shot Anomaly Detection
Authors:
Yuhu Bai,
Jiangning Zhang,
Yuhang Dong,
Guanzhong Tian,
Liang Liu,
Yunkang Cao,
Yabiao Wang,
Chengjie Wang
Abstract:
Few-shot anomaly detection (FSAD) is essential in industrial manufacturing. However, existing FSAD methods struggle to effectively leverage a limited number of normal samples, and they may fail to detect and locate inconspicuous anomalies in the spatial domain. We further observe that these subtle anomalies are more noticeable in the frequency domain. In this paper, we propose a Dual-Path Frequency Discriminators (DFD) network that tackles these issues from a frequency perspective. Specifically, we generate anomalies at both the image level and the feature level. Differential frequency components are extracted by the multi-frequency information construction module and fed into the fine-grained feature construction module to provide adapted features. We treat anomaly detection as a discriminative classification problem and therefore employ the dual-path feature discrimination module to detect and locate the image-level and feature-level anomalies in the feature space. The discriminators aim to learn a joint representation of anomalous and normal features in the latent space. Extensive experiments on the MVTec AD and VisA benchmarks demonstrate that DFD surpasses current state-of-the-art methods. Source code will be made available.
Submitted 11 March, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
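The observation that subtle anomalies stand out more in the frequency domain can be illustrated by splitting an image into low- and high-frequency parts with a 2-D FFT. This is a generic ideal low-pass/high-pass split, not DFD's multi-frequency information construction module; the circular mask and the `radius_ratio` parameter are assumptions for illustration.

```python
import numpy as np

def split_frequency(img, radius_ratio=0.1):
    """Split a 2-D array into low- and high-frequency parts with an ideal circular mask.

    radius_ratio sets the mask radius as a fraction of the smaller image side
    (an illustrative choice). The two parts sum back to the input."""
    h, w = img.shape
    spec = np.fft.fftshift(np.fft.fft2(img))        # move zero frequency to the center
    yy, xx = np.mgrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)       # distance from the zero frequency
    mask = dist <= radius_ratio * min(h, w)
    low = np.fft.ifft2(np.fft.ifftshift(spec * mask)).real
    high = img - low                                # everything outside the low band
    return low, high
```

For example, adding a small bump to a smooth sinusoidal background leaves the background almost entirely in `low`, so the bump dominates `high` even though it is faint in the raw image.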
-
IB-Net: Initial Branch Network for Variable Decision in Boolean Satisfiability
Authors:
Tsz Ho Chan,
Wenyi Xiao,
Junhua Huang,
Huiling Zhen,
Guangji Tian,
Mingxuan Yuan
Abstract:
Boolean Satisfiability (SAT) problems are vital components in Electronic Design Automation, particularly within the Logic Equivalence Checking (LEC) process. Currently, SAT solvers are employed for these problems, and neural networks have been explored to assist the solvers. However, SAT problems in the LEC context are distinctive because they are predominantly unsatisfiable and contain a substantial proportion of UNSAT-core variables, and existing neural network assistance has proven unsuccessful in this specialized domain. To tackle this challenge, we propose IB-Net, an innovative framework that uses graph neural networks and novel graph encoding techniques to model unsatisfiable problems and interact with state-of-the-art solvers. Extensive evaluations across solvers and datasets demonstrate that IB-Net accelerates solving, achieving average runtime speedups of 5.0% on industrial data and 8.3% on SAT competition data. These results advance efficient solving in LEC workflows.
Submitted 6 March, 2024;
originally announced March 2024.
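GNN-based SAT approaches commonly encode a CNF formula as a literal-clause graph: one node per literal, one node per clause, an edge whenever a literal occurs in a clause, and a link between each literal and its negation. The sketch below builds that standard encoding from DIMACS-style integer clauses. It is a common baseline encoding, not IB-Net's own; the abstract's novel encoding techniques are not specified here, and the function name is hypothetical.

```python
def cnf_to_literal_clause_graph(num_vars, clauses):
    """Build a literal-clause graph from DIMACS-style clauses (e.g. [[1, -2], [2, 3]]).

    Node layout: literal x_i -> 2*(i-1), literal not-x_i -> 2*(i-1)+1,
    clause j -> 2*num_vars + j.
    Returns (literal-clause edges, literal-negation pairs)."""
    def literal_node(lit):
        if lit == 0 or abs(lit) > num_vars:
            raise ValueError(f"invalid literal {lit}")
        return 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)

    edges = []
    for j, clause in enumerate(clauses):
        clause_node = 2 * num_vars + j
        for lit in clause:
            edges.append((literal_node(lit), clause_node))
    # Message passing typically also exchanges information between x_i and not-x_i.
    negation_pairs = [(2 * v, 2 * v + 1) for v in range(num_vars)]
    return edges, negation_pairs
```

Keeping literals (rather than variables) as nodes lets the graph distinguish the two polarities of a variable, which matters when reasoning about unsatisfiable cores.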
-
Online Efficient Safety-Critical Control for Mobile Robots in Unknown Dynamic Multi-Obstacle Environments
Authors:
Yu Zhang,
Guangyao Tian,
Long Wen,
Xiangtong Yao,
Liding Zhang,
Zhenshan Bing,
Wei He,
Alois Knoll
Abstract:
This paper proposes a LiDAR-based goal-seeking and exploration framework that addresses the efficiency of online obstacle avoidance in unstructured environments populated with static and moving obstacles. The framework addresses two significant challenges associated with traditional dynamic control barrier functions (D-CBFs): their online construction, and the diminished real-time performance caused by using multiple D-CBFs. To tackle the first challenge, the framework's perception component first clusters point clouds with the DBSCAN algorithm and then encloses each cluster in a minimum bounding ellipse (MBE) to create elliptical representations. By comparing the current MBEs with those stored from previous moments, static and dynamic obstacles are distinguished, and a Kalman filter is used to predict the motion of the latter. This analysis enables the online construction of a D-CBF for each MBE. To tackle the second challenge, we introduce buffer zones, generating Type-II D-CBFs online for each identified obstacle. Using these buffer zones as activation areas substantially reduces the number of D-CBFs that need to be activated. Upon entering a buffer zone, the system prioritizes safety and autonomously navigates along safe paths; this is referred to as the exploration mode. Exiting the buffer zones triggers the system's transition to goal-seeking mode. We demonstrate that the system's states under this framework achieve safety and asymptotic stabilization. Experimental results in simulated and real-world environments validate the framework, allowing a LiDAR-equipped mobile robot to reach the desired location efficiently and safely in dynamic environments containing multiple obstacles.
Submitted 26 February, 2024;
originally announced February 2024.
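The motion-prediction step can be sketched with a textbook constant-velocity Kalman filter that tracks the center of one obstacle ellipse across successive LiDAR frames. The constant-velocity model, the noise levels `q` and `r`, and the class name are assumptions; the abstract does not specify the motion model the authors use.

```python
import numpy as np

class ConstantVelocityKF:
    """Kalman filter with state [px, py, vx, vy] and position-only measurements."""

    def __init__(self, dt, q=1e-3, r=1e-2):
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)  # constant-velocity transition
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)  # we only observe the center
        self.Q = q * np.eye(4)                          # process noise (assumed)
        self.R = r * np.eye(2)                          # measurement noise (assumed)
        self.x = np.zeros(4)
        self.P = 10.0 * np.eye(4)                       # large initial uncertainty

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x

    def update(self, z):
        innovation = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)        # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x

    def forecast(self, steps):
        """Predicted centers for the next `steps` frames, without changing the state."""
        x, out = self.x.copy(), []
        for _ in range(steps):
            x = self.F @ x
            out.append(x[:2].copy())
        return np.array(out)
```

The forecast centers could then be used to place each moving ellipse ahead of time when constructing its barrier function; an obstacle whose estimated velocity stays near zero would be treated as static.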
-
Convergence Analysis of Split Federated Learning on Heterogeneous Data
Authors:
Pengchao Han,
Chao Huang,
Geng Tian,
Ming Tang,
Xin Liu
Abstract:
Split federated learning (SFL) is a recent distributed approach for collaborative model training among multiple clients. In SFL, a global model is typically split into two parts: clients train one part in a parallel federated manner, and a main server trains the other. Despite recent research on SFL algorithm development, a convergence analysis of SFL is missing from the literature, and this paper aims to fill this gap. Analyzing SFL can be more challenging than analyzing federated learning (FL) because of the potential dual-paced updates at the clients and the main server. We provide a convergence analysis of SFL for strongly convex and general convex objectives on heterogeneous data. The convergence rates are $O(1/T)$ and $O(1/\sqrt[3]{T})$, respectively, where $T$ denotes the total number of rounds of SFL training. We further extend the analysis to non-convex objectives and to the setting where some clients may be unavailable during training. Numerical experiments validate our theoretical results and show that SFL outperforms FL and split learning (SL) when data is highly heterogeneous across a large number of clients.
Submitted 23 February, 2024;
originally announced February 2024.
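The split structure described above (a client-side part trained federatedly, a server-side part trained by the main server) can be sketched on a toy linear model. Everything here is an illustrative assumption: a two-part linear model `y ≈ (x @ wc) @ ws`, MSE loss, one local step per client per round, a server that updates on each client's smashed data in turn, and FedAvg weighting by client data size. Practical SFL variants differ in how server-side updates are scheduled and aggregated.

```python
import numpy as np

def server_step(ws, h, y, lr):
    """Server-side linear head with MSE loss; returns (new ws, dLoss/dh, loss)."""
    err = h @ ws - y
    loss = float(np.mean(err ** 2))
    g_pred = 2.0 * err / len(y)          # gradient w.r.t. the predictions
    grad_ws = h.T @ g_pred
    grad_h = np.outer(g_pred, ws)        # uses ws before the update; sent to the client
    return ws - lr * grad_ws, grad_h, loss

def sfl_round(wc, ws, clients, lr):
    """One round: each client runs the client-side part on a copy of wc, the server
    updates ws on each client's smashed data in turn, and client-side copies are
    averaged with weights proportional to client data size (FedAvg)."""
    local_wc, sizes, losses = [], [], []
    for x, y in clients:
        h = x @ wc                                   # smashed data sent to the server
        ws, grad_h, loss = server_step(ws, h, y, lr)
        local_wc.append(wc - lr * (x.T @ grad_h))    # client-side backward pass
        sizes.append(len(y))
        losses.append(loss)
    weights = np.array(sizes, dtype=float) / sum(sizes)
    wc = sum(w * m for w, m in zip(weights, local_wc))
    return wc, ws, float(np.average(losses, weights=weights))

def run_sfl(rounds=100, lr=0.02, seed=0):
    rng = np.random.default_rng(seed)
    p, d = 5, 3
    w_true = rng.normal(size=p)
    clients = []
    for shift in (-0.5, 0.0, 0.5):       # shifted features make client data heterogeneous
        x = rng.normal(loc=shift, size=(50, p))
        clients.append((x, x @ w_true))
    wc = rng.normal(scale=0.5, size=(p, d))
    ws = rng.normal(scale=0.5, size=d)
    history = []
    for _ in range(rounds):
        wc, ws, loss = sfl_round(wc, ws, clients, lr)
        history.append(loss)
    return history
```

The server-side weights change several times per round while the client-side weights change once per round, which is a simple instance of the dual-paced updates mentioned in the abstract.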
-
Evaluating the point cloud of individual trees generated from images based on Neural Radiance fields (NeRF) method
Authors:
Hongyu Huang,
Guoji Tian,
Chongcheng Chen
Abstract:
Three-dimensional (3D) reconstruction of trees has long been a key task in precision forestry management and research. Because of the complex branching structure of trees and the occlusions from stems, branches, and foliage, it is difficult to recreate a complete 3D tree model from two-dimensional images with conventional photogrammetric methods. In this study, using tree images collected by various cameras in different ways, the Neural Radiance Fields (NeRF) method was used for individual tree reconstruction, and the exported point cloud models were compared with point clouds derived from photogrammetric reconstruction and laser scanning. The results show that NeRF performs well in individual tree 3D reconstruction: it has a higher reconstruction success rate, reconstructs the canopy area better, and requires fewer input images. Compared with photogrammetric reconstruction, NeRF has significant advantages in reconstruction efficiency and adapts to complex scenes, but the generated point clouds tend to be noisy and of low resolution. The accuracy of tree structural parameters (tree height and diameter at breast height) extracted from the photogrammetric point cloud is still higher than that of parameters derived from the NeRF point cloud. These results illustrate the great potential of NeRF for individual tree reconstruction and provide new ideas and research directions for 3D reconstruction and visualization of complex forest scenes.
Submitted 6 December, 2023;
originally announced December 2023.
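The two structural parameters compared in the study can be extracted from a single-tree point cloud roughly as follows. This sketch assumes z is the vertical axis, the lowest point lies on the ground, breast height is 1.3 m, and the stem cross-section there is roughly circular; the ±5 cm slice and the algebraic (Kasa) least-squares circle fit are illustrative choices, not the paper's stated procedure.

```python
import numpy as np

def tree_height(points):
    """Height as the vertical extent of the cloud (z up, lowest point on the ground)."""
    z = points[:, 2]
    return float(z.max() - z.min())

def fit_circle(xy):
    """Algebraic (Kasa) least-squares circle fit: returns (cx, cy, r).

    Solves x^2 + y^2 + D*x + E*y + F = 0 for D, E, F in the least-squares sense."""
    x, y = xy[:, 0], xy[:, 1]
    a = np.column_stack([x, y, np.ones_like(x)])
    b = -(x ** 2 + y ** 2)
    (d, e, f), *_ = np.linalg.lstsq(a, b, rcond=None)
    cx, cy = -d / 2.0, -e / 2.0
    return cx, cy, float(np.sqrt(cx ** 2 + cy ** 2 - f))

def dbh(points, breast_height=1.3, half_width=0.05):
    """Diameter at breast height from a thin horizontal slice of stem points."""
    z = points[:, 2] - points[:, 2].min()
    slab = points[np.abs(z - breast_height) <= half_width]
    if len(slab) < 3:
        raise ValueError("not enough points in the breast-height slice")
    return 2.0 * fit_circle(slab[:, :2])[2]
```

Noise in a NeRF-derived cloud enters both estimates directly: stray points above the crown inflate the height, and scattered points around the stem bias the circle fit, which is consistent with the abstract's finding that photogrammetric clouds still give more accurate parameters.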
-
CLIP-AD: A Language-Guided Staged Dual-Path Model for Zero-shot Anomaly Detection
Authors:
Xuhai Chen,
Jiangning Zhang,
Guanzhong Tian,
Haoyang He,
Wuhao Zhang,
Yabiao Wang,
Chengjie Wang,
Yong Liu
Abstract:
This paper considers zero-shot Anomaly Detection (AD), i.e., performing AD without reference images of the test objects. We propose a framework called CLIP-AD that leverages the zero-shot capabilities of the large vision-language model CLIP. First, we reinterpret text prompt design from a distributional perspective and propose a Representative Vector Selection (RVS) paradigm to obtain improved text features. Second, we observe opposite predictions and irrelevant highlights when anomaly maps are computed directly. To address these issues, we introduce a Staged Dual-Path model (SDP) that leverages features from various levels and applies architecture and feature surgery. Lastly, delving deeper into these two phenomena, we point out that the image and text features are not aligned in the joint embedding space. We therefore introduce a fine-tuning strategy that adds linear layers and construct an extended model, SDP+, which further enhances performance. Extensive experiments demonstrate the effectiveness of our approach; e.g., on MVTec-AD, SDP outperforms the SOTA WinCLIP by +4.2/+10.7 in the segmentation metrics F1-max/PRO, while SDP+ achieves improvements of +8.3/+20.5.
Submitted 2 March, 2024; v1 submitted 1 November, 2023;
originally announced November 2023.
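The direct computation of anomaly maps mentioned above usually means comparing each image-patch embedding with the text embeddings of a normal and an anomalous prompt in CLIP's joint space. The NumPy sketch below shows only that baseline computation, not SDP or SDP+; the embeddings are assumed to come from some CLIP image and text encoder, and the temperature of 100 is an assumed value.

```python
import numpy as np

def l2_normalize(v, axis=-1):
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

def zero_shot_anomaly_map(patch_feats, text_normal, text_anomalous, temperature=100.0):
    """patch_feats: (H, W, D) patch embeddings; text_*: (D,) prompt embeddings.

    Returns an (H, W) map holding the softmax probability of the anomalous prompt."""
    p = l2_normalize(patch_feats)
    t = l2_normalize(np.stack([text_normal, text_anomalous]))  # (2, D)
    logits = temperature * (p @ t.T)                           # (H, W, 2) scaled cosine sims
    logits -= logits.max(axis=-1, keepdims=True)               # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs[..., 1]
```

If patch and text features are not well aligned in the joint space, these per-patch probabilities can point the wrong way, which is the kind of failure the abstract attributes to direct computation.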
-
Digital Twin System for Home Service Robot Based on Motion Simulation
Authors:
Zhengsong Jiang,
Guohui Tian,
Yongcheng Cui,
Tiantian Liu,
Yu Gu,
Yifei Wang
Abstract:
To improve the task execution capability of home service robots, and to address the problem that purely physical robot platforms cannot sense the environment and make decisions online, we propose a method for building a digital twin system for home service robots based on motion simulation. A reliable mapping of the home service robot and its working environment from physical space to digital space is achieved in three dimensions: geometric, physical, and functional. In this system, a digital-space-oriented URDF file parser is designed and implemented to construct the robot geometric model automatically. Next, the physical model is constructed from the robot's kinematic equations, and an improved particle swarm optimization algorithm is proposed for the inverse kinematics solution. In addition, to adapt to the home environment, functional attributes are used to describe household objects, improving the semantic description of the real home environment in digital space. Finally, geometric model consistency verification, physical model validity verification, and virtual-reality consistency verification show that the proposed digital twin system constructs the robot geometric model accurately and completely, successfully operates on household objects, and is effective and practical.
Submitted 12 September, 2023;
originally announced September 2023.
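A minimal URDF reader that recovers the kinematic tree (links, joints, joint origins, and axes) can be written with Python's standard `xml.etree.ElementTree`. This is a sketch of the general idea, not the paper's digital-space-oriented parser: it ignores visual, collision, and inertial tags, and applies the URDF defaults of a zero origin and a `(1, 0, 0)` joint axis when those tags are missing. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def _vec(text, default):
    # URDF stores vectors as space-separated strings, e.g. xyz="0 0 0.1".
    return tuple(float(v) for v in text.split()) if text else default

def parse_urdf(urdf_text):
    """Return the robot name, links, joints, and root links of a URDF document."""
    robot = ET.fromstring(urdf_text)
    links = [link.get("name") for link in robot.findall("link")]
    joints = []
    for j in robot.findall("joint"):
        origin, axis = j.find("origin"), j.find("axis")
        joints.append({
            "name": j.get("name"),
            "type": j.get("type"),
            "parent": j.find("parent").get("link"),
            "child": j.find("child").get("link"),
            "xyz": _vec(origin.get("xyz") if origin is not None else None, (0.0, 0.0, 0.0)),
            "rpy": _vec(origin.get("rpy") if origin is not None else None, (0.0, 0.0, 0.0)),
            "axis": _vec(axis.get("xyz") if axis is not None else None, (1.0, 0.0, 0.0)),
        })
    child_links = {j["child"] for j in joints}
    return {
        "robot": robot.get("name"),
        "links": links,
        "joints": joints,
        "root_links": [name for name in links if name not in child_links],  # usually one
    }
```

Walking the joints from the root link and composing each joint's `xyz`/`rpy` transform is then enough to place every link in a digital scene.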
-
Concurrent Composition for Interactive Differential Privacy with Adaptive Privacy-Loss Parameters
Authors:
Samuel Haney,
Michael Shoemate,
Grace Tian,
Salil Vadhan,
Andrew Vyrros,
Vicki Xu,
Wanrong Zhang
Abstract:
In this paper, we study the concurrent composition of interactive mechanisms with adaptively chosen privacy-loss parameters. In this setting, the adversary can interleave queries to existing interactive mechanisms, as well as create new ones. We prove that every valid privacy filter and odometer for noninteractive mechanisms extends to the concurrent composition of interactive mechanisms if privacy loss is measured using $(ε, δ)$-DP, $f$-DP, or Rényi DP of fixed order. Our results offer strong theoretical foundations for enabling full adaptivity in composing differentially private interactive mechanisms, showing that concurrency does not affect the privacy guarantees. We also provide an implementation for users to deploy in practice.
Submitted 11 September, 2023;
originally announced September 2023.
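A privacy filter decides, before each new mechanism is started, whether its adaptively chosen privacy-loss parameters still fit within a global budget. The sketch below is the simplest valid example, a filter for $(ε, δ)$-DP under basic composition; the class and method names are hypothetical, and it is neither the paper's construction nor the implementation released with it.

```python
class BasicCompositionFilter:
    """Privacy filter for (epsilon, delta)-DP under basic composition (a sketch).

    Each request states its privacy-loss parameters, which may be chosen
    adaptively; it is admitted only if the running sums stay within budget."""

    def __init__(self, eps_budget, delta_budget):
        if eps_budget < 0 or delta_budget < 0:
            raise ValueError("budgets must be non-negative")
        self.eps_budget = eps_budget
        self.delta_budget = delta_budget
        self.eps_spent = 0.0    # running totals, the quantities an odometer would report
        self.delta_spent = 0.0

    def try_spend(self, eps, delta):
        """Charge the budget and return True if admissible, otherwise return False."""
        if eps < 0 or delta < 0:
            raise ValueError("privacy-loss parameters must be non-negative")
        if (self.eps_spent + eps > self.eps_budget
                or self.delta_spent + delta > self.delta_budget):
            return False        # refusing keeps the total loss within budget
        self.eps_spent += eps
        self.delta_spent += delta
        return True
```

According to the abstract, filters that are valid for noninteractive mechanisms, such as this one, remain valid when the adversary opens and interleaves interactive mechanisms concurrently; in that setting the check would run whenever a new interactive mechanism is created with its chosen parameters.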
-
LuViRA Dataset Validation and Discussion: Comparing Vision, Radio, and Audio Sensors for Indoor Localization
Authors:
Ilayda Yaman,
Guoda Tian,
Erik Tegler,
Jens Gulin,
Nikhil Challa,
Fredrik Tufvesson,
Ove Edfors,
Kalle Astrom,
Steffen Malkowsky,
Liang Liu
Abstract:
We present a comparative analysis and evaluation of vision-, radio-, and audio-based localization algorithms. We create the first baseline for these sensors using the recently published Lund University Vision, Radio, and Audio (LuViRA) dataset, in which all sensors are synchronized and measured in the same environment. We highlight some of the challenges of using each sensor for indoor localization. Each sensor is paired with a current state-of-the-art localization algorithm and evaluated on several aspects: localization accuracy, reliability and sensitivity to environment changes, calibration requirements, and potential system complexity. Specifically, the evaluation covers the ORB-SLAM3 algorithm for vision-based localization with an RGB-D camera, a machine-learning algorithm for radio-based localization with massive MIMO technology, and the SFS2 algorithm for audio-based localization with distributed microphones. The results can serve as a guideline and basis for further development of robust, high-precision multi-sensory localization systems, e.g., through sensor fusion and context- and environment-aware adaptation.
Submitted 25 April, 2024; v1 submitted 6 September, 2023;
originally announced September 2023.
-
VNI-Net: Vector Neurons-based Rotation-Invariant Descriptor for LiDAR Place Recognition
Authors:
Gengxuan Tian,
Junqiao Zhao,
Yingfeng Cai,
Fenglin Zhang,
Wenjie Mu,
Chen Ye
Abstract:
LiDAR-based place recognition plays a crucial role in Simultaneous Localization and Mapping (SLAM) and LiDAR localization.
Despite the emergence of various deep-learning-based and handcrafted methods, rotation-induced place recognition failure remains a critical challenge.
Existing studies address this limitation through specific training strategies or network structures.
However, the former does not produce satisfactory results, while the latter focuses mainly on the reduced problem of SO(2) rotation invariance. Methods targeting SO(3) rotation invariance suffer from limitations in discrimination capability.
In this paper, we propose a new method that employs Vector Neurons Network (VNN) to achieve SO(3) rotation invariance.
We first extract rotation-equivariant features from neighboring points and map low-dimensional features to a high-dimensional space through VNN.
Afterwards, we calculate the Euclidean and cosine distances in the rotation-equivariant feature space as rotation-invariant feature descriptors.
Finally, we aggregate the features using GeM pooling to obtain global descriptors.
To address the significant information loss when formulating rotation-invariant descriptors, we propose computing distances between features at different layers within the Euclidean space neighborhood.
This greatly improves the discriminability of the point cloud descriptors while ensuring computational efficiency.
Experimental results on public datasets show that our approach significantly outperforms other baseline methods implementing rotation invariance, while achieving comparable results with current state-of-the-art place recognition methods that do not consider rotation issues.
Submitted 24 August, 2023;
originally announced August 2023.
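The pipeline above (equivariant vector features, rotation-invariant distances, GeM pooling) can be illustrated with a NumPy toy. Assumptions: equivariant features come from a Vector-Neuron-style linear layer that mixes each point's k nearest-neighbor offset vectors (it acts on channels, never on xyz), the invariants are Euclidean distances and rescaled cosine similarities between adjacent channels, and the GeM exponent is 3. It is not VNI-Net's architecture, only a check that such a descriptor is SO(3)-invariant; all names are hypothetical.

```python
import numpy as np

def knn_offsets(points, k):
    """(N, k, 3) offsets to each point's k nearest neighbors, sorted by distance."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (N, N)
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]                        # skip the point itself
    return points[idx] - points[:, None, :]

def vn_linear(vectors, weight):
    """Vector-Neuron-style linear layer: mixes vector channels, leaves xyz untouched."""
    return np.einsum("ck,nkd->ncd", weight, vectors)                 # (N, C, 3)

def invariant_features(v):
    """Rotation-invariant distances between adjacent equivariant channels."""
    a, b = v[:, :-1, :], v[:, 1:, :]
    eucl = np.linalg.norm(a - b, axis=-1)
    cos = (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-9)
    return np.concatenate([eucl, (1.0 + cos) / 2.0], axis=1)         # non-negative for GeM

def gem(x, p=3.0, eps=1e-6):
    """Generalized-mean pooling over points: (mean(x^p))^(1/p) per feature."""
    return np.mean(np.clip(x, eps, None) ** p, axis=0) ** (1.0 / p)

def global_descriptor(points, weight, k=8):
    return gem(invariant_features(vn_linear(knn_offsets(points, k), weight)))
```

Because a rotation R maps every offset v to Rv and the layer only mixes channels, all norms and dot products between channels are unchanged; using relative offsets also makes the descriptor translation-invariant.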
-
When MiniBatch SGD Meets SplitFed Learning: Convergence Analysis and Performance Evaluation
Authors:
Chao Huang,
Geng Tian,
Ming Tang
Abstract:
Federated learning (FL) enables collaborative model training across distributed clients (e.g., edge devices) without sharing raw data. Yet FL can be computationally expensive, as clients need to train the entire model multiple times. SplitFed learning (SFL) is a recent distributed approach that alleviates the computation workload on client devices by splitting the model at a cut layer into two parts, so that clients only need to train part of the model. However, SFL still suffers from the client drift problem when clients' data are highly non-IID. To address this issue, we propose MiniBatch-SFL. This algorithm incorporates MiniBatch SGD into SFL: the clients train the client-side model in an FL fashion, while the server trains the server-side model in a manner similar to MiniBatch SGD. We analyze the convergence of MiniBatch-SFL and show that a bound on the expected loss can be obtained by analyzing the expected server-side and client-side model updates separately. The server-side updates do not depend on the non-IID degree of the clients' datasets and can potentially mitigate client drift. The client-side model, however, does depend on the non-IID degree and can be optimized by properly choosing the cut layer. Perhaps counter-intuitively, our empirical results show that placing the cut layer later leads to a smaller average gradient divergence and better algorithm performance. Moreover, numerical results show that MiniBatch-SFL achieves higher accuracy than conventional SFL and FL, with accuracy improvements of up to 24.1% and 17.1%, respectively, on highly non-IID data.
Submitted 23 August, 2023;
originally announced August 2023.
-
Multi-Agent Cooperation via Unsupervised Learning of Joint Intentions
Authors:
Shanqi Liu,
Weiwei Liu,
Wenzhou Chen,
Guanzhong Tian,
Yong Liu
Abstract:
Cooperative multi-agent reinforcement learning (MARL) has been widely used to address complex coordination tasks. While value decomposition methods are popular in MARL, they have limitations in solving tasks with non-monotonic returns, which restricts their general application. Our work highlights the significance of joint intentions in cooperation, which can overcome non-monotonic problems and increase the interpretability of the learning process. To this end, we present a novel MARL method that leverages learnable joint intentions. Our method employs a hierarchical framework consisting of a joint intention policy and a behavior policy to formulate the optimal cooperative policy. The joint intentions are learned autonomously in a latent space through unsupervised learning, which makes the method adaptable to different agent configurations. Our results demonstrate significant performance improvements on both the StarCraft micromanagement benchmark and challenging MAgent domains, showcasing the effectiveness of our method in learning meaningful joint intentions.
Submitted 5 July, 2023;
originally announced July 2023.
-
Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning
Authors:
Jun Chen,
Shipeng Bai,
Tianxin Huang,
Mengmeng Wang,
Guanzhong Tian,
Yong Liu
Abstract:
Neural network quantization is a very promising solution in the field of model compression, but its resulting accuracy depends heavily on a training/fine-tuning process and requires the original data. This not only incurs heavy computation and time costs but also hinders the protection of privacy and sensitive information. Therefore, a few recent works have started to focus on data-free quantization. However, data-free quantization does not perform well when dealing with ultra-low precision quantization. Although researchers have used generative methods to synthesize data and partially address this problem, data synthesis requires substantial computation and time. In this paper, we propose a data-free mixed-precision compensation (DF-MPC) method to recover the performance of an ultra-low precision quantized model without any data or fine-tuning process. Assuming that the quantization error caused by a low-precision quantized layer can be restored via the reconstruction of a high-precision quantized layer, we mathematically formulate the reconstruction loss between the pre-trained full-precision model and its layer-wise mixed-precision quantized model. Based on this formulation, we theoretically derive the closed-form solution by minimizing the reconstruction loss of the feature maps. Since DF-MPC does not require any original or synthetic data, it is a more efficient way to approximate the full-precision model. Experimentally, DF-MPC achieves higher accuracy for ultra-low precision quantized models than recent methods, without any data or fine-tuning process.
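As a rough illustration of data-free compensation (not the DF-MPC formulation itself), the sketch below quantizes each output channel of a layer to very low precision and computes, in closed form and without any data, a per-channel scale that minimizes the weight reconstruction error. That scale is then folded into the next, higher-precision layer. The helper names and the per-channel least-squares objective are assumptions made for this example.

```python
def quantize(w, bits):
    """Symmetric uniform quantization of a weight vector to `bits` bits (bits >= 2)."""
    levels = 2 ** (bits - 1) - 1          # e.g. bits=2 -> ternary {-s, 0, s}
    scale = max(abs(v) for v in w) / levels or 1.0
    return [round(v / scale) * scale for v in w]


def compensation_factor(w, q):
    """Closed-form c minimizing ||w - c * q||^2, i.e. c = <w, q> / <q, q>."""
    qq = sum(v * v for v in q)
    return sum(a * b for a, b in zip(w, q)) / qq if qq else 1.0


def compensate(W1, W2, low_bits=2):
    """Quantize each output channel (row) of W1 to `low_bits` bits and fold
    the per-channel factor c_j into column j of the next layer W2.

    quantize() preserves signs, so c_j > 0; since ReLU(c * z) = c * ReLU(z)
    for c > 0, W2 @ diag(c) @ ReLU(Q1 x) equals W2 @ ReLU(diag(c) @ Q1 x).
    W2 is kept in floating point here to stand in for "high precision".
    """
    Q1 = [quantize(row, low_bits) for row in W1]
    c = [compensation_factor(row, q) for row, q in zip(W1, Q1)]
    W2_comp = [[w * c[j] for j, w in enumerate(row)] for row in W2]
    return Q1, c, W2_comp


W1 = [[0.9, -0.3, 0.05], [0.2, 0.7, -0.6]]
W2 = [[1.0, -1.0], [0.5, 2.0]]
Q1, c, W2c = compensate(W1, W2, low_bits=2)
print(Q1[1], c[1])
```

Because `c = 1` is always a feasible choice, the least-squares scale can never increase the per-channel weight error.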
Submitted 2 July, 2023;
originally announced July 2023.
-
ViG-UNet: Vision Graph Neural Networks for Medical Image Segmentation
Authors:
Juntao Jiang,
Xiyu Chen,
Guanzhong Tian,
Yong Liu
Abstract:
Deep neural networks have been widely used in medical image analysis, and medical image segmentation is one of its most important tasks. U-shaped encoder-decoder networks are prevalent and have achieved great success in various segmentation tasks. While CNNs treat an image as a grid of pixels in Euclidean space and Transformers treat an image as a sequence of patches, a graph-based representation is more general and can construct connections between any parts of an image. In this paper, we propose ViG-UNet, a novel graph neural network-based U-shaped architecture with an encoder, a decoder, a bottleneck, and skip connections. The downsampling and upsampling modules are also carefully designed. Experimental results on the ISIC 2016, ISIC 2017 and Kvasir-SEG datasets demonstrate that the proposed architecture outperforms most existing classic and state-of-the-art U-shaped networks.
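Graph-based image representations are commonly built by linking each patch embedding to its nearest neighbours in feature space. The sketch below assumes a Vision GNN (ViG)-style block, with a k-nearest-neighbour graph and max-relative aggregation; the abstract does not specify ViG-UNet's graph construction, so `knn_graph` and `max_relative_aggregate` are illustrative helpers only.

```python
import math


def knn_graph(feats, k):
    """For each patch feature, return the indices of its k nearest other patches."""
    nbrs = []
    for i, fi in enumerate(feats):
        dists = sorted((math.dist(fi, fj), j) for j, fj in enumerate(feats) if j != i)
        nbrs.append([j for _, j in dists[:k]])
    return nbrs


def max_relative_aggregate(feats, nbrs):
    """Max-relative aggregation: concatenate x_i with max over neighbours j of (x_j - x_i)."""
    out = []
    for i, js in enumerate(nbrs):
        rel = [max(feats[j][c] - feats[i][c] for j in js) for c in range(len(feats[i]))]
        out.append(list(feats[i]) + rel)
    return out


# Usage: four 2-D "patch embeddings" forming two well-separated pairs.
patches = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
graph = knn_graph(patches, k=1)
print(graph)                                      # [[1], [0], [3], [2]]
print(max_relative_aggregate(patches, graph)[0])  # [0.0, 0.0, 0.0, 1.0]
```

In a real network the aggregated vector would be passed through a learned linear layer; that step is omitted.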
Submitted 7 June, 2023;
originally announced June 2023.
-
Learning Sequence Descriptor based on Spatio-Temporal Attention for Visual Place Recognition
Authors:
Junqiao Zhao,
Fenglin Zhang,
Yingfeng Cai,
Gengxuan Tian,
Wenjie Mu,
Chen Ye,
Tiantian Feng
Abstract:
Visual Place Recognition (VPR) aims to retrieve frames from a geotagged database that are located at the same place as the query frame. To improve the robustness of VPR in scenarios with perceptual aliasing, sequence-based VPR methods have been proposed. These methods either match frame sequences or extract sequence descriptors for direct retrieval. However, the former usually assume constant velocity, which is difficult to satisfy in practice, and are computationally expensive and sensitive to sequence length. Although the latter overcome these problems, existing sequence descriptors are constructed only by aggregating the features of multiple frames, without any interaction of temporal information, and thus cannot produce descriptors with spatio-temporal discrimination. In this paper, we propose a sequence descriptor that effectively incorporates spatio-temporal information. Specifically, spatial attention within the same frame is used to learn spatial feature patterns, while attention across corresponding local regions of different frames is used to learn the persistence or change of features over time. We use a sliding window to control the temporal range of attention and relative positional encoding to construct sequential relationships between different features. This allows our descriptors to capture the intrinsic dynamics in a sequence of frames. Comprehensive experiments on challenging benchmark datasets show that the proposed approach outperforms recent state-of-the-art methods. The code is available at https://github.com/tiev-tongji/Spatio-Temporal-SeqVPR.
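A sliding window that restricts temporal attention can be expressed as a limit on which frames each query may attend to. This minimal sketch uses raw per-frame features as queries, keys and values and omits learned projections and the relative positional encoding mentioned above; `windowed_temporal_attention` is a hypothetical helper, not the authors' code.

```python
import math


def windowed_temporal_attention(frames, window):
    """Self-attention across frames, restricted to a sliding window |t - s| <= window.

    `frames` is a list of per-frame feature vectors, used directly as
    queries, keys and values (no learned projections).
    """
    d = len(frames[0])
    out = []
    for t, q in enumerate(frames):
        lo, hi = max(0, t - window), min(len(frames), t + window + 1)
        scores = [sum(a * b for a, b in zip(q, frames[s])) / math.sqrt(d) for s in range(lo, hi)]
        m = max(scores)                      # subtract the max for a stable softmax
        weights = [math.exp(x - m) for x in scores]
        z = sum(weights)
        out.append([
            sum(w * frames[lo + i][c] for i, w in enumerate(weights)) / z
            for c in range(d)
        ])
    return out


seq = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
out = windowed_temporal_attention(seq, window=1)
print(out[0])  # mixes only frames 0 and 1
```

With `window=0` each frame attends only to itself, so the output equals the input; larger windows trade locality for longer temporal context.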
Submitted 27 January, 2024; v1 submitted 19 May, 2023;
originally announced May 2023.
-
MixTeacher: Mining Promising Labels with Mixed Scale Teacher for Semi-Supervised Object Detection
Authors:
Liang Liu,
Boshen Zhang,
Jiangning Zhang,
Wuhao Zhang,
Zhenye Gan,
Guanzhong Tian,
Wenbing Zhu,
Yabiao Wang,
Chengjie Wang
Abstract:
Scale variation across object instances remains a key challenge in object detection. Despite the remarkable progress made by modern detection models, this challenge is particularly evident in the semi-supervised case. While existing semi-supervised object detection methods rely on strict conditions to filter high-quality pseudo labels from network predictions, we observe that objects with extreme scales tend to have low confidence, resulting in a lack of positive supervision for these objects. In this paper, we propose a novel framework that addresses the scale variation problem by introducing a mixed scale teacher to improve pseudo label generation and scale-invariant learning. Additionally, we propose mining pseudo labels using score promotion of predictions across scales, which benefits from the better predictions of mixed scale features. Extensive experiments on the MS COCO and PASCAL VOC benchmarks under various semi-supervised settings demonstrate that our method achieves new state-of-the-art performance. The code and models are available at https://github.com/lliuz/MixTeacher.
Submitted 15 March, 2023;
originally announced March 2023.
-
High-Precision Machine-Learning Based Indoor Localization with Massive MIMO System
Authors:
Guoda Tian,
Ilayda Yaman,
Michiel Sandra,
Xuesong Cai,
Liang Liu,
Fredrik Tufvesson
Abstract:
High-precision cellular-based localization is one of the key technologies for next-generation communication systems. In this paper, we investigate the potential of applying machine learning (ML) to a massive multiple-input multiple-output (MIMO) system to enhance localization accuracy. We analyze a new ML-based localization pipeline that has two parallel fully connected neural networks (FCNNs). The first FCNN takes the instantaneous spatial covariance matrix as input to capture angular information, while the second FCNN takes the channel impulse responses to capture delay information. We fuse the coordinates estimated by these two FCNNs to further improve accuracy. To test the localization algorithm, we performed an indoor measurement campaign with a massive MIMO testbed at 3.7 GHz. In the measured scenario, the proposed pipeline can achieve centimeter-level accuracy by combining delay and angular information.
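The abstract does not state how the two coordinate estimates are fused. One common choice, assumed here purely for illustration, is inverse-variance weighting, where each network's error variance is estimated beforehand (for example on held-out measurements). `fuse_estimates` is a hypothetical helper.

```python
def fuse_estimates(p_angle, p_delay, var_angle, var_delay):
    """Inverse-variance weighted fusion of two (x, y) position estimates.

    var_angle / var_delay: assumed error variances of the angle-based and
    delay-based networks, in squared position units.
    Returns the fused position and its variance under an independence assumption.
    """
    w_a, w_d = 1.0 / var_angle, 1.0 / var_delay
    fused = tuple((w_a * a + w_d * d) / (w_a + w_d) for a, d in zip(p_angle, p_delay))
    fused_var = 1.0 / (w_a + w_d)   # never larger than min(var_angle, var_delay)
    return fused, fused_var


p, v = fuse_estimates((0.0, 0.0), (2.0, 4.0), 1.0, 3.0)
print(p, v)  # pulled towards the lower-variance (angle-based) estimate
```

If the two errors are correlated, this simple rule overstates the gain; a learned fusion layer is another option.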
Submitted 7 March, 2023;
originally announced March 2023.
-
The LuViRA Dataset: Synchronized Vision, Radio, and Audio Sensors for Indoor Localization
Authors:
Ilayda Yaman,
Guoda Tian,
Martin Larsson,
Patrik Persson,
Michiel Sandra,
Alexander Dürr,
Erik Tegler,
Nikhil Challa,
Henrik Garde,
Fredrik Tufvesson,
Kalle Åström,
Ove Edfors,
Steffen Malkowsky,
Liang Liu
Abstract:
We present a synchronized multisensory dataset for accurate and robust indoor localization: the Lund University Vision, Radio, and Audio (LuViRA) Dataset. The dataset includes color images, corresponding depth maps, inertial measurement unit (IMU) readings, the channel response between a 5G massive multiple-input and multiple-output (MIMO) testbed and user equipment, audio recorded by 12 microphones, and six-degrees-of-freedom (6DOF) pose ground truth with an accuracy of 0.5 mm. We synchronize these sensors to ensure that all data are recorded simultaneously. A camera, a speaker, and a transmit antenna are placed on top of a slowly moving service robot, and 89 trajectories are recorded. Each trajectory includes 20 to 50 seconds of recorded sensor data and ground truth labels. Data from different sensors can be used separately or jointly to perform localization tasks, and data from the motion capture (mocap) system are used to verify the results obtained by the localization algorithms. The main aim of this dataset is to enable research on sensor fusion with the sensors most commonly used for localization tasks. Moreover, the full dataset or parts of it can also be used in other research areas such as channel estimation and image classification. Our dataset is available at https://github.com/ilaydayaman/LuViRA_Dataset
Submitted 26 April, 2024; v1 submitted 10 February, 2023;
originally announced February 2023.
-
Fast Gumbel-Max Sketch and its Applications
Authors:
Yuanming Zhang,
Pinghui Wang,
Yiyan Qi,
Kuankuan Cheng,
Junzhou Zhao,
Guangjian Tian,
Xiaohong Guan
Abstract:
The well-known Gumbel-Max Trick for sampling elements from a categorical distribution (or, more generally, a non-negative vector) and its variants have been widely used in areas such as machine learning and information retrieval. To sample a random element $i$ in proportion to its positive weight $v_i$, the Gumbel-Max Trick first computes a Gumbel random variable $g_i$ for each positive weight element $i$, and then samples the element $i$ with the largest value of $g_i+\ln v_i$. Recently, applications including similarity estimation and weighted cardinality estimation have required generating $k$ independent Gumbel-Max variables from high-dimensional vectors. However, this is computationally expensive for large $k$ (e.g., hundreds or even thousands) when using the traditional Gumbel-Max Trick. To solve this problem, we propose a novel algorithm, FastGM, which reduces the time complexity from $O(kn^+)$ to $O(k \ln k + n^+)$, where $n^+$ is the number of positive elements in the vector of interest. FastGM stops the computation of Gumbel random variables early for many elements, especially those with small weights. We perform experiments on a variety of real-world datasets, and the results demonstrate that FastGM is orders of magnitude faster than state-of-the-art methods without sacrificing accuracy or incurring additional expenses.
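For reference, the traditional $O(kn^+)$ baseline that FastGM accelerates can be written directly from the description above. This sketch does not implement FastGM's early stopping. It uses the equivalent exponential form: if $E \sim \mathrm{Exp}(1)$ then $g = -\ln E$ is a standard Gumbel variable, so maximizing $g_i + \ln v_i$ is the same as minimizing $E_i / v_i$. Seeding each draw by `(seed, i, j)` is an assumption that lets sketches of different vectors share randomness, as similarity estimation requires; `gumbel_max_sketch` is a hypothetical name.

```python
import math
import random


def gumbel_max_sketch(weights, k, seed=0):
    """k Gumbel-Max samples from a non-negative vector (naive O(k * n+) baseline).

    For register j and positive element i, E_ij ~ Exp(1) is drawn from a
    generator seeded by (seed, i, j). argmin_i E_ij / v_i is equivalent to
    argmax_i g_ij + ln v_i with g_ij = -ln E_ij ~ Gumbel(0, 1).
    """
    positives = [(i, v) for i, v in enumerate(weights) if v > 0]
    sketch = []
    for j in range(k):
        best_i, best_key = None, math.inf
        for i, v in positives:
            e = random.Random(f"{seed}:{i}:{j}").expovariate(1.0)
            if e / v < best_key:
                best_i, best_key = i, e / v
        sketch.append(best_i)
    return sketch


s = gumbel_max_sketch([0.0, 1.0, 3.0], k=4000)
print(s.count(2) / len(s))  # close to 3 / (1 + 3) = 0.75
```

Each register samples element $i$ with probability $v_i / \sum_l v_l$; zero-weight elements are never chosen, and rescaling the whole vector leaves the sketch unchanged.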
Submitted 10 February, 2023;
originally announced February 2023.
-
Cross-Layer Retrospective Retrieving via Layer Attention
Authors:
Yanwen Fang,
Yuxi Cai,
Jintai Chen,
Jingyu Zhao,
Guangjian Tian,
Guodong Li
Abstract:
Growing evidence shows that strengthening layer interactions can enhance the representation power of a deep neural network, while self-attention excels at learning interdependencies by retrieving query-activated information. Motivated by this, we devise a cross-layer attention mechanism, called multi-head recurrent layer attention (MRLA), that sends a query representation of the current layer to all previous layers to retrieve query-related information from different levels of receptive fields. A lightweight version of MRLA is also proposed to reduce the quadratic computation cost. The proposed layer attention mechanism can enrich the representation power of many state-of-the-art vision networks, including CNNs and vision transformers. Its effectiveness has been extensively evaluated on image classification, object detection and instance segmentation tasks, where improvements are consistently observed. For example, MRLA improves the Top-1 accuracy of ResNet-50 by 1.6% while introducing only 0.16M parameters and 0.07B FLOPs. Surprisingly, it boosts performance by a large margin of 3-4% in box AP and mask AP on dense prediction tasks. Our code is available at https://github.com/joyfang1106/MRLA.
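The basic retrieval step, in which a query from the current layer attends over keys and values from all previous layers, can be sketched as below. Summarizing each layer by a single pooled vector and using one head are simplifying assumptions, and `layer_attention` is a hypothetical helper rather than the MRLA code. Repeating this step at every layer costs time quadratic in depth, which is the cost the lightweight MRLA variant is said to reduce.

```python
import math


def softmax(xs):
    """Numerically stable softmax of a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]


def layer_attention(query, prev_keys, prev_values):
    """Attend from the current layer's query to all previous layers.

    query:       d-dim summary of the current layer (e.g. a pooled feature)
    prev_keys:   one d-dim key per previous layer
    prev_values: one value vector per previous layer
    Returns the attention weights and the retrieved (weighted-sum) value.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in prev_keys]
    attn = softmax(scores)
    retrieved = [
        sum(a * v[c] for a, v in zip(attn, prev_values))
        for c in range(len(prev_values[0]))
    ]
    return attn, retrieved


attn, r = layer_attention([10.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0], [2.0]])
print(attn)  # almost all weight on the first (aligned) previous layer
```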
Submitted 28 February, 2023; v1 submitted 8 February, 2023;
originally announced February 2023.
-
DL-SLOT: Dynamic LiDAR SLAM and object tracking based on collaborative graph optimization
Authors:
Xuebo Tian,
Zhongyang Zhu,
Junqiao Zhao,
Gengxuan Tian,
Chen Ye
Abstract:
Ego-pose estimation and dynamic object tracking are two critical problems for autonomous driving systems. Solutions to these problems are generally based on their respective assumptions, i.e., the static world assumption for simultaneous localization and mapping (SLAM) and the accurate ego-pose assumption for object tracking. However, these assumptions are difficult to satisfy in dynamic road scenarios, where SLAM and object tracking become closely correlated. Therefore, we propose DL-SLOT, a dynamic LiDAR SLAM and object tracking method, to address these two coupled problems simultaneously. This method integrates the state estimation of the autonomous vehicle and of the stationary and dynamic objects in the environment into a unified optimization framework. First, we use object detection to identify all points belonging to potentially dynamic objects. Subsequently, LiDAR odometry is performed using the filtered point cloud. Simultaneously, we propose a sliding window-based object association method that accurately associates objects according to the historical trajectories of tracked objects. The ego-states and the states of the stationary and dynamic objects are integrated into a sliding window-based collaborative graph optimization. The stationary objects are subsequently restored from the potentially dynamic object set. Finally, a global pose graph is implemented to eliminate the accumulated error. Experiments on the KITTI datasets demonstrate that our method achieves better accuracy than SLAM and object tracking baseline methods. This confirms that solving SLAM and object tracking simultaneously is mutually advantageous, dramatically improving the robustness and accuracy of SLAM and object tracking in dynamic road scenarios.
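Associating new detections with tracked objects using their recent trajectories can be illustrated with a simple gated nearest-neighbour rule. The constant-velocity predictor, the distance gate and the greedy matching below are assumptions for this example, not DL-SLOT's association method; `predict` and `associate` are hypothetical helpers.

```python
import math


def predict(track):
    """Constant-velocity prediction of the next position from a track's history."""
    if len(track) < 2:
        return track[-1]
    (x1, y1), (x2, y2) = track[-2], track[-1]
    return (2 * x2 - x1, 2 * y2 - y1)


def associate(tracks, detections, gate=1.0):
    """Greedy nearest-neighbour association within a distance gate.

    tracks:     {track_id: [(x, y), ...]} recent positions within the sliding window
    detections: [(x, y), ...] object centroids in the current scan
    Returns {track_id: detection_index} for the matched pairs.
    """
    candidates = sorted(
        (math.dist(predict(hist), det), tid, di)
        for tid, hist in tracks.items()
        for di, det in enumerate(detections)
    )
    matches, used = {}, set()
    for dist, tid, di in candidates:
        if dist > gate:
            break                  # candidates are sorted, so all later ones fail too
        if tid in matches or di in used:
            continue
        matches[tid] = di
        used.add(di)
    return matches


tracks = {"car": [(0.0, 0.0), (1.0, 0.0)], "ped": [(5.0, 5.0), (5.0, 5.5)]}
dets = [(5.0, 6.1), (2.1, 0.0), (20.0, 20.0)]
print(associate(tracks, dets))  # {'ped': 0, 'car': 1}
```

Globally optimal assignment (e.g., the Hungarian algorithm) is a common replacement for the greedy loop when many objects are close together.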
Submitted 5 December, 2022;
originally announced December 2022.
-
Near-Term Quantum Computing Techniques: Variational Quantum Algorithms, Error Mitigation, Circuit Compilation, Benchmarking and Classical Simulation
Authors:
He-Liang Huang,
Xiao-Yue Xu,
Chu Guo,
Guojing Tian,
Shi-Jie Wei,
Xiaoming Sun,
Wan-Su Bao,
Gui-Lu Long
Abstract:
Quantum computing is a game-changing technology for global academia, research centers and industries, including computational science, mathematics, finance, pharmaceuticals, materials science, chemistry and cryptography. Although it has seen a major boost in the last decade, we are still a long way from reaching the maturity of a full-fledged quantum computer. That said, we will be in the Noisy Intermediate-Scale Quantum (NISQ) era for a long time, working on quantum computing systems with dozens or even thousands of qubits. An outstanding challenge, then, is to come up with an application that can reliably carry out a nontrivial task of interest on near-term quantum devices with non-negligible quantum noise. To address this challenge, several near-term quantum computing techniques, including variational quantum algorithms, error mitigation, quantum circuit compilation and benchmarking protocols, have been proposed to characterize and mitigate errors and to implement algorithms with a certain resistance to noise, so as to enhance the capabilities of near-term quantum devices and explore the boundaries of their ability to realize useful applications. Moreover, the development of near-term quantum devices is inseparable from efficient classical simulation, which plays a vital role in quantum algorithm design and verification, error-tolerant verification and other applications. This review provides a thorough introduction to these near-term quantum computing techniques, reports on their progress, and finally discusses the future prospects of these techniques, which we hope will motivate researchers to undertake additional studies in this field.
Submitted 27 December, 2022; v1 submitted 16 November, 2022;
originally announced November 2022.
-
From RDMA to RDCA: Toward High-Speed Last Mile of Data Center Networks Using Remote Direct Cache Access
Authors:
Qiang Li,
Qiao Xiang,
Derui Liu,
Yuxin Wang,
Haonan Qiu,
Xiaoliang Wang,
Jie Zhang,
Ridi Wen,
Haohao Song,
Gexiao Tian,
Chenyang Huang,
Lulu Chen,
Shaozong Liu,
Yaohui Wu,
Zhiwu Wu,
Zicheng Luo,
Yuchao Shao,
Chao Han,
Zhongjie Wu,
Jianbo Dong,
Zheng Cao,
Jinbo Wu,
Jiwu Shu,
Jiesheng Wu
Abstract:
In this paper, we conduct systematic measurement studies to show that the high memory bandwidth consumption of modern distributed applications can lead to a significant drop in network throughput and a large increase in tail latency in high-speed RDMA networks. We identify its root cause as the high contention for memory bandwidth between application processes and network processes. This contention leads to frequent packet drops at the NIC of receiving hosts, which triggers the congestion control mechanism of the network and eventually results in network performance degradation.
To tackle this problem, we make a key observation: for a distributed storage service, the vast majority of the data it receives from the network will eventually be written to high-speed storage media (e.g., SSD) by the CPU. As such, we propose to bypass host memory when processing received data, completely circumventing this performance bottleneck. In particular, we design Lamda, a novel receiver cache processing system that consumes a small amount of CPU cache to process data received from the network at line rate. We implement a prototype of Lamda and evaluate its performance extensively in a Clos-based testbed. Results show that for distributed storage applications, Lamda improves network throughput by 4.7% with zero memory bandwidth consumption on storage nodes, and improves network throughput by up to 17% and 45% for large and small block sizes, respectively, under memory bandwidth pressure. Lamda can also be applied to latency-sensitive HPC applications, reducing their communication latency by 35.1%.
Submitted 25 March, 2023; v1 submitted 10 November, 2022;
originally announced November 2022.
-
Temporal and Spatial Online Integrated Calibration for Camera and LiDAR
Authors:
Shouan Wang,
Xinyu Zhang,
GuiPeng Zhang,
Yijin Xiong,
Ganglin Tian,
Shichun Guo,
Jun Li
Abstract:
Although cameras and LiDAR are widely used in most assisted and autonomous driving systems, only a few works have been proposed to jointly address the temporal synchronization and extrinsic calibration of camera and LiDAR for online sensor data fusion. Temporal and spatial calibration technologies face the challenges of a lack of relevance and a lack of real-time performance. In this paper, we introduce a pose estimation model and environment-robust line feature extraction to improve the relevance of data fusion and the ability to correct online in real time. Dynamic target elimination aims to seek the optimal policy considering the correspondence of point cloud matching between adjacent moments. The search optimization process aims to provide accurate parameters with both computational accuracy and efficiency. To demonstrate the benefits of this method, we evaluate it on the KITTI benchmark with ground truth values. In online experiments, our approach improves accuracy by 38.5\% compared with the soft synchronization method in temporal calibration. In spatial calibration, our approach automatically corrects disturbance errors within 0.4 seconds and achieves an accuracy of 0.3 degrees. This work can promote the research and application of sensor fusion.
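Estimating a time offset between two sensor streams is often illustrated by correlating a motion signal that both sensors observe. The sketch below is not the method proposed in the paper: it simply picks the sample lag that maximizes the Pearson correlation over the overlapping samples. `estimate_time_offset` and the signal model in the usage example are hypothetical.

```python
import math
from statistics import fmean


def estimate_time_offset(ref, other, max_lag):
    """Return the lag (in samples) such that other[t + lag] best matches ref[t].

    Each candidate lag in [-max_lag, max_lag] is scored by the Pearson
    correlation of the overlapping samples.
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(ref[t], other[t + lag]) for t in range(len(ref)) if 0 <= t + lag < len(other)]
        if len(pairs) < 2:
            continue
        ma, mb = fmean(p for p, _ in pairs), fmean(q for _, q in pairs)
        cov = sum((p - ma) * (q - mb) for p, q in pairs)
        var = sum((p - ma) ** 2 for p, _ in pairs) * sum((q - mb) ** 2 for _, q in pairs)
        score = cov / math.sqrt(var) if var > 0 else -math.inf
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def motion(t):  # hypothetical angular-rate signal seen by both sensors
    return math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t)


camera = [motion(t) for t in range(200)]
lidar = [motion(t - 3) for t in range(200)]  # LiDAR stream lags by 3 samples
print(estimate_time_offset(camera, lidar, max_lag=10))  # 3
```

Sub-sample offsets would require interpolating the correlation peak, which this sketch does not do.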
Submitted 21 July, 2022;
originally announced July 2022.
-
Convolutional Neural Network Modelling for MODIS Land Surface Temperature Super-Resolution
Authors:
Binh Minh Nguyen,
Ganglin Tian,
Minh-Triet Vo,
Aurélie Michel,
Thomas Corpetti,
Carlos Granero-Belinchon
Abstract:
Nowadays, thermal infrared satellite remote sensors make it possible to extract very interesting information at large scale, in particular Land Surface Temperature (LST). However, such data are limited in spatial and/or temporal resolution, which prevents analysis at fine scales. For example, the MODIS satellite provides daily acquisitions at 1 km spatial resolution, which is not sufficient for highly heterogeneous environments such as agricultural parcels. Therefore, image super-resolution is a crucial task for better exploiting MODIS LSTs, and this issue is tackled in this paper. We introduce a deep learning-based algorithm, named Multi-residual U-Net, for super-resolution of single MODIS LST images. Our proposed network is a modified version of the U-Net architecture, which aims at super-resolving the input LST image from 1 km to 250 m per pixel. The results show that our Multi-residual U-Net outperforms other state-of-the-art methods.
Submitted 1 April, 2022; v1 submitted 22 February, 2022;
originally announced February 2022.
-
SelFSR: Self-Conditioned Face Super-Resolution in the Wild via Flow Field Degradation Network
Authors:
Xianfang Zeng,
Jiangning Zhang,
Liang Liu,
Guangzhong Tian,
Yong Liu
Abstract:
In spite of their success on benchmark datasets, most advanced face super-resolution models perform poorly in real scenarios because of the large domain gap between real images and the synthesized training pairs. To tackle this problem, we propose a novel domain-adaptive degradation network for face super-resolution in the wild. This degradation network predicts a flow field along with an intermediate low-resolution image. The degraded counterpart is then generated by warping the intermediate image. With its preference for capturing motion blur, such a model better preserves identity consistency between the original images and their degraded counterparts. We further present a self-conditioned block for the super-resolution network. This block takes the input image as a condition term to effectively utilize facial structure information, eliminating the reliance on explicit priors, e.g., facial landmarks or boundaries. Our model achieves state-of-the-art performance on both CelebA and a real-world face dataset. The former demonstrates the powerful generative ability of our proposed architecture, while the latter shows great identity consistency and perceptual quality on real-world images.
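Generating a degraded image by warping with a predicted flow field is typically done by backward bilinear sampling. A minimal sketch for a grayscale image stored as nested lists follows; border clamping and the `(dx, dy)` flow convention are assumptions, `warp` is a hypothetical helper, and in the paper the flow would come from the learned degradation network.

```python
import math


def warp(image, flow):
    """Backward-warp a grayscale image with a dense flow field.

    image: list of rows of pixel values; flow[y][x] = (dx, dy).
    Output pixel (x, y) is bilinearly sampled from image at (x + dx, y + dy),
    with coordinates clamped to the image border.
    """
    h, w = len(image), len(image[0])

    def px(y, x):  # border-clamped pixel lookup
        return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = x + dx, y + dy
            x0, y0 = math.floor(sx), math.floor(sy)
            fx, fy = sx - x0, sy - y0
            top = (1 - fx) * px(y0, x0) + fx * px(y0, x0 + 1)
            bottom = (1 - fx) * px(y0 + 1, x0) + fx * px(y0 + 1, x0 + 1)
            row.append((1 - fy) * top + fy * bottom)
        out.append(row)
    return out


img = [[0, 1, 2], [3, 4, 5]]
shift = [[(1, 0)] * 3 for _ in range(2)]
print(warp(img, shift))  # [[1, 2, 2], [4, 5, 5]]
```

Because bilinear sampling is differentiable in the flow, a network that predicts the flow can be trained end to end through this operation.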
Submitted 20 December, 2021;
originally announced December 2021.
-
FlowMapper.org: A web-based framework for designing origin-destination flow maps
Authors:
Caglar Koylu,
Geng Tian,
Mary Windsor
Abstract:
FlowMapper.org is a web-based framework for automated production and design of origin-destination flow maps (https://flowmapper.org). FlowMapper has four major features that advance existing flow mapping systems. First, users can upload and process their own data to design and share customized flow maps. The ability to save data, cartographic design and map elements in a project file allows users to easily share their data and cartographic design with others. Second, users can customize the flow line symbology, with options to change the flow line style, width, and coloring. FlowMapper includes algorithms for drawing curved line styles with varying thickness along a flow line, which reduce visual clutter and overlap by tapering flow lines at origin and destination points. The ability to customize flow symbology supports different flow map reading tasks, such as comparing flow magnitudes and directions and identifying flow and location clusters that are strongly connected with each other. Third, FlowMapper supports supplementary layers such as node symbols, choropleth, and base maps to contextualize flow patterns with location references and characteristics such as net flow, gross flow, net-flow ratio, or a locational attribute such as population density. FlowMapper also supports user interactions such as zooming, filtering, and details-on-demand to support visual information seeking about nodes, flows and regions. Finally, the web-based architecture of FlowMapper supports server-side computation to process, normalize and summarize large flow data to reveal natural patterns of flows.
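The node characteristics listed above can be computed from origin-destination triples. This sketch assumes the conventional definitions (net flow = inflow - outflow, gross flow = inflow + outflow, net-flow ratio = net / gross); FlowMapper's exact formulas are not given in the abstract, and `node_flow_stats` is a hypothetical helper.

```python
from collections import defaultdict


def node_flow_stats(flows):
    """Per-node inflow, outflow, net flow, gross flow and net-flow ratio.

    flows: iterable of (origin, destination, volume) triples.
    """
    inflow, outflow = defaultdict(float), defaultdict(float)
    for origin, dest, volume in flows:
        outflow[origin] += volume
        inflow[dest] += volume
    stats = {}
    for node in set(inflow) | set(outflow):
        i, o = inflow[node], outflow[node]
        gross, net = i + o, i - o
        stats[node] = {
            "in": i, "out": o, "net": net, "gross": gross,
            "net_ratio": net / gross if gross else 0.0,  # in [-1, 1]
        }
    return stats


stats = node_flow_stats([("A", "B", 100), ("B", "A", 40), ("A", "C", 10)])
print(stats["A"])  # net -70.0, gross 150.0: A is a net origin
```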
Submitted 9 September, 2021;
originally announced October 2021.
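The FlowMapper abstract lists net flow, gross flow and net-flow ratio as node characteristics without defining them. The sketch below assumes the usual definitions (net = inflow minus outflow, gross = inflow plus outflow, ratio = net / gross) and computes them from a list of origin-destination tuples; the function name `node_flow_summary` and the sample flows are hypothetical and not part of FlowMapper.

```python
from collections import defaultdict


def node_flow_summary(flows):
    """Summarize origin-destination flows per node.

    flows: iterable of (origin, destination, magnitude) tuples.
    Assumed definitions (not taken from FlowMapper):
      net = inflow - outflow, gross = inflow + outflow,
      net_ratio = net / gross (0.0 for a node with no flow).
    """
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for origin, dest, value in flows:
        outflow[origin] += value
        inflow[dest] += value

    summary = {}
    for node in set(inflow) | set(outflow):
        i, o = inflow[node], outflow[node]
        gross = i + o
        net = i - o
        summary[node] = {
            "in": i,
            "out": o,
            "net": net,
            "gross": gross,
            "net_ratio": net / gross if gross else 0.0,
        }
    return summary


# Hypothetical flows between three locations.
flows = [("A", "B", 30), ("B", "A", 10), ("A", "C", 20)]
for node, stats in sorted(node_flow_summary(flows).items()):
    print(node, stats["net"], stats["gross"], round(stats["net_ratio"], 3))
```

A ratio near -1 marks a pure source and near +1 a pure sink, which is the kind of attribute a choropleth layer could map.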
-
Sensing and Classification Using Massive MIMO: A Tensor Decomposition-Based Approach
Authors:
B. R. Manoj,
Guoda Tian,
Sara Gunnarsson,
Fredrik Tufvesson,
Erik G. Larsson
Abstract:
Wireless-based activity sensing has gained significant attention due to its wide range of applications. We investigate radio-based multi-class classification of human activities using massive multiple-input multiple-output (MIMO) channel measurements in line-of-sight and non-line-of-sight scenarios. We propose a tensor decomposition-based algorithm that extracts features by exploiting the complex correlation characteristics across time, frequency, and space of channel tensors formed from the measurements, followed by a neural network that learns the relationship between the input features and the output target labels. Evaluations on real measurement data demonstrate that classification using a massive MIMO array achieves significantly better accuracy than the state of the art, even with a smaller experimental data set.
Submitted 2 September, 2021;
originally announced September 2021.
-
BF-QC: Belief Functions on Quantum Circuits
Authors:
Qianli Zhou,
Guojing Tian,
Yong Deng
Abstract:
Dempster-Shafer Theory (DST) of belief functions is a basic theory of artificial intelligence that can represent underlying knowledge more reasonably than Probability Theory (ProbT). Because its computational complexity grows exponentially with the number of elements, the practical application scenarios of DST are limited. In this paper, we encode Basic Belief Assignments (BBAs) into quantum superposition states and propose methods for implementing and operating on BBAs on quantum circuits. We reduce the computational complexity of the matrix evolution on BBA (MEoB) on quantum circuits. Based on MEoB, we implement quantum belief functions, similarity measurements between BBAs, evidence Combination Rules (CR), and probability transformation on quantum circuits.
Submitted 12 October, 2022; v1 submitted 8 July, 2021;
originally announced July 2021.
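For orientation, here is a classical (non-quantum) sketch of two operations the BF-QC abstract mentions: a belief function and an evidence combination rule. The abstract does not say which combination rule is used; Dempster's rule is assumed here. BBAs are represented as dicts from `frozenset` focal elements to masses, and the names `dempster_combine` and `belief` are hypothetical. Because a frame of n elements has 2^n subsets, the pairwise loop below illustrates the exponential classical cost the abstract refers to.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two BBAs with Dempster's rule of combination.

    Each BBA maps frozenset focal elements to masses that sum to 1.
    Mass landing on the empty set (conflict) is discarded and the
    remaining mass is renormalized by 1 - conflict.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def belief(m, hypothesis):
    """Bel(A): total mass of the focal elements contained in A."""
    return sum(w for s, w in m.items() if s <= hypothesis)


# Hypothetical frame of discernment {a, b}.
A, B = frozenset({"a"}), frozenset({"b"})
AB = A | B
m = dempster_combine({A: 0.6, AB: 0.4}, {B: 0.5, AB: 0.5})
print({tuple(sorted(s)): round(w, 3) for s, w in m.items()})
```

In this example 0.3 of the joint mass is conflicting ({a} against {b}), so the surviving products are divided by 0.7.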
-
Poisoning MorphNet for Clean-Label Backdoor Attack to Point Clouds
Authors:
Guiyu Tian,
Wenhao Jiang,
Wei Liu,
Yadong Mu
Abstract:
This paper presents Poisoning MorphNet, the first backdoor attack method on point clouds. Conventional adversarial attacks take place in the inference stage, often fooling a model by perturbing samples. In contrast, a backdoor attack aims to implant triggers into a model during the training stage, such that the victim model acts normally on clean data unless a trigger is present in a sample. This work follows a typical clean-label backdoor attack setting, where a few poisoned samples (with their content tampered but labels unchanged) are injected into the training set. The unique contributions of MorphNet are two-fold. First, it is key to ensure that the implanted triggers are both visually imperceptible to humans and lead to a high attack success rate on point clouds. To this end, MorphNet jointly optimizes two objectives for sample-adaptive poisoning: a reconstruction loss that preserves the visual similarity between benign and poisoned point clouds, and a classification loss that pushes a modern point cloud recognition model to misclassify the poisoned sample into a pre-specified target category. This implicitly performs spectral separation over point clouds, hiding sample-adaptive triggers in fine-grained high-frequency details. Second, existing backdoor attack methods are mainly designed for image data and are easily defeated by point cloud-specific operations (such as denoising). We propose a third loss in MorphNet that suppresses isolated points, improving resistance to denoising-based defenses. Comprehensive evaluations are conducted on ModelNet40 and ShapeNetcorev2. Our proposed Poisoning MorphNet outperforms all previous methods by clear margins.
Submitted 11 May, 2021;
originally announced May 2021.
-
A hybrid ensemble method with negative correlation learning for regression
Authors:
Yun Bai,
Ganglin Tian,
Yanfei Kang,
Suling Jia
Abstract:
Hybrid ensembles, an essential branch of ensemble methods, have flourished in regression, with studies confirming the importance of diversity. However, previous ensembles consider diversity during sub-model training, yielding limited improvement over single models. In contrast, this study automatically selects and weights sub-models from a heterogeneous model pool. It solves an optimization problem using an interior-point filter line-search algorithm. The objective function innovatively incorporates negative correlation learning (NCL) as a penalty term, which allows a diverse model subset to be selected. The best sub-models from each model class are selected to build the NCL ensemble, whose performance is better than the simple average and other state-of-the-art weighting methods. The NCL ensemble can be further improved with a regularization term in the objective function. In practice, it is difficult to determine the optimal sub-model for a dataset a priori due to model uncertainty. Regardless, our method achieves accuracy comparable to that of the potentially optimal sub-models. In conclusion, the value of this study lies in its ease of use and effectiveness, allowing the hybrid ensemble to embrace both diversity and accuracy.
Submitted 15 May, 2023; v1 submitted 6 April, 2021;
originally announced April 2021.
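The abstract does not give the exact NCL-penalized objective, so the sketch below only illustrates why rewarding diversity can help a weighted ensemble. It uses the Krogh-Vedelsby ambiguity decomposition: for weights that are non-negative and sum to 1, the ensemble's squared error equals the weighted average of individual squared errors minus the weighted spread of predictions around the ensemble mean. This is not the paper's interior-point optimization; the function name and numbers are hypothetical.

```python
def ensemble_error_decomposition(preds, weights, target):
    """Ambiguity decomposition for one target value.

    preds: sub-model predictions f_i; weights: w_i >= 0 summing to 1.
    Returns (ensemble_error, avg_individual_error, ambiguity), where
    ensemble_error == avg_individual_error - ambiguity.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    f_bar = sum(w * f for w, f in zip(weights, preds))  # weighted ensemble output
    ensemble_error = (f_bar - target) ** 2
    avg_error = sum(w * (f - target) ** 2 for w, f in zip(weights, preds))
    ambiguity = sum(w * (f - f_bar) ** 2 for w, f in zip(weights, preds))  # diversity term
    return ensemble_error, avg_error, ambiguity


err, avg_err, amb = ensemble_error_decomposition(
    [1.0, 3.0, 5.0], [0.5, 0.25, 0.25], target=2.0
)
print(err, avg_err, amb)
```

With individual errors held fixed, a larger ambiguity term means a lower ensemble error, which is the intuition behind penalizing correlated sub-models.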
-
Moving Object Classification with a Sub-6 GHz Massive MIMO Array using Real Data
Authors:
B. R. Manoj,
Guoda Tian,
Sara Gunnarsson,
Fredrik Tufvesson,
Erik G. Larsson
Abstract:
Classification of different activities in an indoor environment using wireless signals is an emerging technology for various applications, including intrusion detection, patient care, and smart homes. Researchers have shown different methods for classifying activities with WiFi signals and their potential benefits. In this paper, we analyze the classification of moving objects by applying machine learning to real data from a massive multiple-input multiple-output (MIMO) system in an indoor environment. We conduct measurements of different activities in both line-of-sight and non-line-of-sight scenarios with a massive MIMO testbed operating at 3.7 GHz. We propose algorithms that exploit amplitude- and phase-based features for the classification task. For the considered setup, we benchmark the classification performance and show that up to 98% accuracy can be achieved using real massive MIMO data, even with a small number of experiments. Furthermore, we demonstrate the performance gain of a massive MIMO system compared with a limited number of antennas, such as in WiFi devices.
Submitted 9 February, 2021;
originally announced February 2021.
-
Synergetic Learning of Heterogeneous Temporal Sequences for Multi-Horizon Probabilistic Forecasting
Authors:
Longyuan Li,
Jihai Zhang,
Junchi Yan,
Yaohui Jin,
Yunhao Zhang,
Yanjie Duan,
Guangjian Tian
Abstract:
Time series are ubiquitous across applications such as transportation, finance and healthcare. Time series are often influenced by external factors, especially in the form of asynchronous events, which makes forecasting difficult. However, existing models are mainly designed for either synchronous time series or asynchronous event sequences, and can hardly provide a joint way to capture the relation between them. We propose the Variational Synergetic Multi-Horizon Network (VSMHN), a novel deep conditional generative model. To learn complex correlations across heterogeneous sequences, a tailored encoder is devised that combines advances in deep point process models and variational recurrent neural networks. In addition, an aligned time coding and an auxiliary transition scheme are devised for batched training on unaligned sequences. Our model can be trained effectively using stochastic variational inference and generates probabilistic predictions with Monte-Carlo simulation. Furthermore, our model produces accurate, sharp and more realistic probabilistic forecasts. We also show that modeling asynchronous event sequences is crucial for multi-horizon time-series forecasting.
Submitted 31 January, 2021;
originally announced February 2021.
-
Memory Group Sampling Based Online Action Recognition Using Kinetic Skeleton Features
Authors:
Guoliang Liu,
Qinghui Zhang,
Yichao Cao,
Junwei Li,
Hao Wu,
Guohui Tian
Abstract:
Online action recognition is an important task for human-centered intelligent services, but it remains difficult due to the variety and uncertainty of the spatial and temporal scales of human actions. In this paper, we propose two core ideas to handle the online action recognition problem. First, we combine spatial and temporal skeleton features to depict actions; these include not only geometrical features but also multi-scale motion features, so that both the spatial and temporal information of an action are covered. Second, we propose a memory group sampling method that combines previous and current action frames. It is based on the observation that neighbouring frames are largely redundant, and the sampling mechanism ensures that long-term contextual information is also considered. Finally, an improved 1D CNN is employed for training and testing using the features from sampled frames. Comparisons with state-of-the-art methods on public datasets show that the proposed method is fast and efficient and has competitive performance.
Submitted 3 November, 2020; v1 submitted 1 November, 2020;
originally announced November 2020.
-
Motion Planning Combines Psychological Safety and Motion Prediction for a Sense Motive Robot
Authors:
Hejing Ling,
Guoliang Liu,
Guohui Tian
Abstract:
Human safety is the most important requirement for human-robot interaction and collaboration (HRIC); it refers not only to physical safety but also to psychological safety. Although many robots with different configurations have entered our living and working environments, human safety remains an open research problem in human-robot coexistence scenarios. This paper addresses human safety by covering both physical and psychological safety. First, we introduce an adaptive robot velocity control and step size adjustment method based on human facial expressions, so that the robot can adjust its movement to maintain safety when human emotion is unusual. Second, we predict human motion by detecting sudden changes in human head pose and gaze direction, so that the robot can infer whether human attention is distracted, predict the human's next move, and rebuild a repulsive force to avoid potential collisions. Finally, we demonstrate our idea using a 7-DOF TIAGo robot in a dynamic HRIC environment, which shows that the robot becomes sense motive and responds to human action and emotion changes quickly and efficiently.
Submitted 23 October, 2020; v1 submitted 29 September, 2020;
originally announced October 2020.
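The abstract says the robot "rebuilds a repulsive force" around the predicted human position but does not give its form. As an assumption, the sketch below uses the classic artificial-potential-field repulsion: inside an influence radius `d0`, the force is the negative gradient of U = 0.5 * eta * (1/d - 1/d0)^2, pointing away from the obstacle. The gains, radius and the "predicted human position" are hypothetical.

```python
import math


def repulsive_force(robot, obstacle, eta=1.0, d0=1.0):
    """Classic potential-field repulsion in 2D.

    robot, obstacle: (x, y) positions; eta: gain; d0: influence radius.
    Returns the (fx, fy) force pushing the robot away from the obstacle,
    or (0.0, 0.0) when the obstacle is outside the influence radius.
    """
    dx, dy = robot[0] - obstacle[0], robot[1] - obstacle[1]
    d = math.hypot(dx, dy)
    if d == 0.0:
        raise ValueError("robot and obstacle coincide; force is undefined")
    if d >= d0:
        return (0.0, 0.0)
    # -grad U = eta * (1/d - 1/d0) * (1/d^2) * unit vector away from obstacle
    magnitude = eta * (1.0 / d - 1.0 / d0) / d ** 2
    return (magnitude * dx / d, magnitude * dy / d)


# Hypothetical predicted human position 0.5 m in front of the end effector.
print(repulsive_force(robot=(0.5, 0.0), obstacle=(0.0, 0.0)))
```

Moving the obstacle input from the current to the predicted human position is one plausible way to make such a field anticipatory; the paper's actual scheme may differ.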
-
New Oracle-Efficient Algorithms for Private Synthetic Data Release
Authors:
Giuseppe Vietri,
Grace Tian,
Mark Bun,
Thomas Steinke,
Zhiwei Steven Wu
Abstract:
We present three new algorithms for constructing differentially private synthetic data---a sanitized version of a sensitive dataset that approximately preserves the answers to a large collection of statistical queries. All three algorithms are \emph{oracle-efficient} in the sense that they are computationally efficient when given access to an optimization oracle. Such an oracle can be implemented using many existing (non-private) optimization tools such as sophisticated integer program solvers. While the accuracy of the synthetic data is contingent on the oracle's optimization performance, the algorithms satisfy differential privacy even in the worst case. For all three algorithms, we provide theoretical guarantees for both accuracy and privacy. Through empirical evaluation, we demonstrate that our methods scale well with both the dimensionality of the data and the number of queries. Compared to the state-of-the-art method High-Dimensional Matrix Mechanism \cite{McKennaMHM18}, our algorithms provide better accuracy in the large workload and high privacy regime (corresponding to low privacy loss $\varepsilon$).
Submitted 10 July, 2020;
originally announced July 2020.
-
Do RNN and LSTM have Long Memory?
Authors:
Jingyu Zhao,
Feiqing Huang,
Jia Lv,
Yanjie Duan,
Zhen Qin,
Guodong Li,
Guangjian Tian
Abstract:
The LSTM network was proposed to overcome the difficulty of learning long-term dependence and has made significant advances in applications. With its successes and drawbacks in mind, this paper raises the question: do RNN and LSTM have long memory? We answer it partially by proving that RNN and LSTM do not have long memory from a statistical perspective. We further introduce a new definition of long memory networks, which requires the model weights to decay at a polynomial rate. To verify our theory, we convert RNN and LSTM into long memory networks with a minimal modification, and illustrate their superiority in modeling the long-term dependence of various datasets.
Submitted 10 June, 2020; v1 submitted 6 June, 2020;
originally announced June 2020.
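The abstract's criterion for long memory is that weights decay at a polynomial rather than exponential rate. The sketch below contrasts the two using standard time-series examples, not the paper's modified RNN/LSTM: the MA(infinity) coefficients of a fractionally integrated process (1 - B)^(-d), which decay roughly like k^(d - 1), versus the geometric coefficients phi^k of an AR(1). The function names and the values d = 0.4, phi = 0.9 are illustrative assumptions.

```python
def fractional_ma_weights(d, n):
    """First n+1 MA(infinity) coefficients of (1 - B)^(-d).

    Recursion: psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k.
    For 0 < d < 0.5 these decay polynomially, roughly like k**(d - 1).
    """
    psi = [1.0]
    for k in range(1, n + 1):
        psi.append(psi[-1] * (k - 1 + d) / k)
    return psi


def ar1_ma_weights(phi, n):
    """MA(infinity) coefficients phi**k of a stationary AR(1): geometric decay."""
    return [phi ** k for k in range(n + 1)]


long_w = fractional_ma_weights(0.4, 200)
short_w = ar1_ma_weights(0.9, 200)
# Doubling the lag from 100 to 200 shrinks the polynomial weights by roughly
# 2**(0.4 - 1), about 0.66, but the geometric weights by 0.9**100.
print(long_w[200] / long_w[100], short_w[200] / short_w[100])
```

The geometric weights start larger (0.9 versus 0.4 at lag 1) yet become negligible long before the polynomial ones do, which is the distinction between short and long memory.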
-
Amplitude and Phase Estimation for Absolute Calibration of Massive MIMO Front-Ends
Authors:
Guoda Tian,
Harsh Tataria,
Fredrik Tufvesson
Abstract:
Massive multiple-input multiple-output (MIMO) promises significantly higher performance than conventional multiuser systems. However, the promised gains of massive MIMO systems rely heavily on the accuracy of absolute front-end calibration, as well as on the quality of channel estimates at the base station (BS). In this paper, we analyze a user-equipment-aided calibration mechanism to estimate the amplitude scaling and phase drift at each radio-frequency chain interfacing with the BS array. Assuming a uniform linear array at the BS and Ricean fading, we estimate these parameters with moment-based (amplitude, phase) and maximum-likelihood (phase-only) estimation techniques. In stark contrast to previous works, we mathematically establish the equivalence of the two approaches for phase estimation. Furthermore, we rigorously derive a Cramér-Rao lower bound to characterize the accuracy of the two estimators. Via numerical simulations, we evaluate the estimator performance with varying dominant line-of-sight powers, dominant angles-of-arrival, and signal-to-noise ratios.
Submitted 25 February, 2020;
originally announced February 2020.
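To make "amplitude scaling and phase drift" concrete, the sketch below estimates a single RF chain's complex gain g from known unit-modulus pilots s_n and observations y_n = g * s_n + noise, using the least-squares (correlation) estimate sum(y_n * conj(s_n)) / sum(|s_n|^2). This is a simplified stand-in, not the paper's UE-aided moment-based or maximum-likelihood estimator: there is no array geometry or Ricean channel model, and the true gain, noise level and pilot count are hypothetical.

```python
import cmath
import random


def estimate_gain(pilots, received):
    """Least-squares estimate of a complex gain g from y_n = g * s_n + noise.

    Returns (amplitude, phase) of g_hat = sum(y_n * conj(s_n)) / sum(|s_n|^2).
    """
    num = sum(y * s.conjugate() for s, y in zip(pilots, received))
    den = sum(abs(s) ** 2 for s in pilots)
    g_hat = num / den
    return abs(g_hat), cmath.phase(g_hat)


rng = random.Random(0)
true_gain = 0.8 * cmath.exp(1j * 0.6)  # hypothetical amplitude 0.8, phase drift 0.6 rad
pilots = [cmath.exp(1j * rng.uniform(-cmath.pi, cmath.pi)) for _ in range(500)]
received = [
    true_gain * s + complex(rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1))
    for s in pilots
]
amp, phase = estimate_gain(pilots, received)
print(round(amp, 3), round(phase, 3))
```

With 500 pilots and this noise level, both estimates land within a few thousandths of the true values; the paper's Cramér-Rao analysis characterizes this kind of accuracy formally.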
-
Fast Generating A Large Number of Gumbel-Max Variables
Authors:
Yiyan Qi,
Pinghui Wang,
Yuanming Zhang,
Junzhou Zhao,
Guangjian Tian,
Xiaohong Guan
Abstract:
The well-known Gumbel-Max Trick for sampling elements from a categorical distribution (or more generally a nonnegative vector) and its variants have been widely used in areas such as machine learning and information retrieval. To sample a random element $i$ (or a Gumbel-Max variable $i$) in proportion to its positive weight $v_i$, the Gumbel-Max Trick first computes a Gumbel random variable $g_i$ for each positive weight element $i$, and then samples the element $i$ with the largest value of $g_i+\ln v_i$. Recently, applications including similarity estimation and graph embedding require generating $k$ independent Gumbel-Max variables from high-dimensional vectors. However, the traditional Gumbel-Max Trick is computationally expensive for a large $k$ (e.g., hundreds or even thousands). To solve this problem, we propose a novel algorithm, \emph{FastGM}, that reduces the time complexity from $O(kn^+)$ to $O(k \ln k + n^+)$, where $n^+$ is the number of positive elements in the vector of interest. Instead of computing $k$ independent Gumbel random variables directly, we find a technique to generate these variables in descending order. Using this technique, FastGM computes the variables $g_i+\ln v_i$ for all positive elements $i$ in descending order. As a result, FastGM significantly reduces computation time because the computation of Gumbel random variables can stop early for many elements, especially those with small weights. Experiments on a variety of real-world datasets show that FastGM is orders of magnitude faster than state-of-the-art methods without sacrificing accuracy or incurring additional expenses.
Submitted 2 February, 2020;
originally announced February 2020.
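The traditional Gumbel-Max Trick described in the abstract can be sketched directly: draw a standard Gumbel variable g_i = -ln(-ln U_i) for each positive weight and return the index maximizing g_i + ln v_i. Repeating this for k independent variables gives the O(k n+) baseline that FastGM improves on; FastGM's descending-order generation is not implemented here. Function names and the sample weights are hypothetical.

```python
import math
import random


def gumbel_max_sample(weights, rng):
    """Return index i with probability weights[i] / sum(weights).

    Only positive weights participate, matching the n+ positive elements
    in the abstract. Returns None if no weight is positive.
    """
    best_i, best_val = None, -math.inf
    for i, v in enumerate(weights):
        if v <= 0:
            continue
        u = rng.random()  # uniform in [0, 1)
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        g = -math.log(-math.log(u))  # standard Gumbel random variable
        val = g + math.log(v)
        if val > best_val:
            best_i, best_val = i, val
    return best_i


def k_gumbel_max(weights, k, rng):
    """k independent Gumbel-Max variables: the naive O(k * n+) baseline."""
    return [gumbel_max_sample(weights, rng) for _ in range(k)]


rng = random.Random(42)
draws = k_gumbel_max([1.0, 2.0, 7.0], 20000, rng)
print([round(draws.count(i) / len(draws), 3) for i in range(3)])
```

The empirical frequencies approach 0.1, 0.2 and 0.7, i.e. sampling proportional to the weights; every one of the k draws still touches all positive elements, which is the cost FastGM avoids.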
-
Compact Autoregressive Network
Authors:
Di Wang,
Feiqing Huang,
Jingyu Zhao,
Guodong Li,
Guangjian Tian
Abstract:
Autoregressive networks can achieve promising performance in many sequence modeling tasks with short-range dependence. However, when handling high-dimensional inputs and outputs, the large number of parameters in the network leads to expensive computation and low learning efficiency. The problem can be alleviated slightly by adding one more narrow hidden layer to the network, but the sample size required to achieve a given training error is still large. To address this challenge, we rearrange the weight matrices of a linear autoregressive network into a tensor and use Tucker decomposition to represent low-rank structures. This leads to a novel compact autoregressive network, called the Tucker AutoRegressive (TAR) net. Interestingly, the TAR net can be applied to sequences with long-range dependence since the dimension along the sequential order is reduced. Theoretical studies show that the TAR net improves learning efficiency and requires far fewer samples for model training. Experiments on synthetic and real-world datasets demonstrate the promising performance of the proposed compact network.
Submitted 6 September, 2019;
originally announced September 2019.
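The parameter savings from a Tucker form can be checked with simple arithmetic. Assuming (for illustration) that the weight matrices for P lags of a linear autoregressive model with N-dimensional input and output are stacked into an N x N x P tensor, a Tucker representation stores an r1 x r2 x r3 core plus factor matrices of sizes N x r1, N x r2 and P x r3. The dimensions and ranks below are hypothetical, not the paper's settings.

```python
def full_params(n_out, n_in, lags):
    """Entries in an n_out x n_in x lags weight tensor (one matrix per lag)."""
    return n_out * n_in * lags


def tucker_params(n_out, n_in, lags, r1, r2, r3):
    """Entries in a Tucker form: r1 x r2 x r3 core plus three factor matrices."""
    core = r1 * r2 * r3
    factors = n_out * r1 + n_in * r2 + lags * r3
    return core + factors


# Hypothetical: 100-dimensional series, 10 lags, Tucker ranks (5, 5, 3).
print(full_params(100, 100, 10), tucker_params(100, 100, 10, 5, 5, 3))
```

Here the count drops from 100,000 to 1,105 entries. Because the lag mode is compressed to rank r3, adding more lags grows the Tucker count only by r3 per lag, which is consistent with the abstract's remark about handling longer-range dependence.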
-
Two-Stream Video Classification with Cross-Modality Attention
Authors:
Lu Chi,
Guiyu Tian,
Yadong Mu,
Qi Tian
Abstract:
Fusing multi-modality information is known to bring significant improvements in video classification. However, the most popular method to date is still simply fusing each stream's prediction scores at the last stage. A natural question is whether there exists a more effective way to fuse information across modalities. With the development of attention mechanisms in natural language processing, many successful applications of attention have emerged in computer vision. In this paper, we propose a cross-modality attention operation, which obtains information from the other modality more effectively than two-stream fusion. Correspondingly, we implement a compatible block named the CMA block, which is a wrapper of our proposed attention operation. CMA can be plugged into many existing architectures. In the experiments, we comprehensively compare our method with two-stream and non-local models widely used in video classification. All experiments clearly demonstrate the strong performance superiority of our proposed method. We also analyze the advantages of the CMA block by visualizing the attention map, which intuitively shows how the block helps the final prediction.
Submitted 1 August, 2019;
originally announced August 2019.
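The abstract does not detail the CMA block's internals. As a generic illustration of attention across modalities, the sketch below applies scaled dot-product attention where queries come from one stream (say, RGB features) and keys and values come from the other (say, optical-flow features). It omits the learned query/key/value projections and any wrapper structure, and all names and feature values are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modality_attention(queries, keys, values):
    """Scaled dot-product attention with queries from one modality and
    keys/values from another.

    queries: list of d-dim vectors; keys, values: equal-length lists of vectors.
    Returns one attended vector (a weighted mix of values) per query.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        attn = softmax(scores)
        out.append([sum(a * v[j] for a, v in zip(attn, values))
                    for j in range(len(values[0]))])
    return out


# Hypothetical 2-dim features: two RGB positions attend over two flow positions.
rgb = [[1.0, 0.0], [0.0, 1.0]]
flow_keys = [[2.0, 0.0], [0.0, 2.0]]
flow_values = [[10.0, 0.0], [0.0, 10.0]]
print(cross_modality_attention(rgb, flow_keys, flow_values))
```

Each RGB position pulls mostly from the flow position whose key it matches, in contrast to late fusion, which only combines the two streams' final scores.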
-
Audio2Face: Generating Speech/Face Animation from Single Audio with Attention-Based Bidirectional LSTM Networks
Authors:
Guanzhong Tian,
Yi Yuan,
Yong liu
Abstract:
We propose an end-to-end deep learning approach for generating real-time facial animation from audio alone. Specifically, our deep architecture employs a deep bidirectional long short-term memory network and an attention mechanism to discover latent representations of time-varying contextual information within the speech and to recognize how much different pieces of information contribute to a given facial state. Therefore, our model can drive different levels of facial movement at inference and automatically keep up with the corresponding pitch and latent speaking style of the input audio, with no assumptions or further human intervention. Evaluation results show that our method not only generates accurate lip movements from audio, but also successfully regresses the speaker's time-varying facial movements.
Submitted 27 May, 2019;
originally announced May 2019.
-
Multipath IP Routing on End Devices: Motivation, Design, and Performance
Authors:
Liyang Sun,
Guibin Tian,
Guanyu Zhu,
Yong Liu,
Hang Shi,
David Dai
Abstract:
Most end devices are now equipped with multiple network interfaces. Applications can exploit all available interfaces and benefit from multipath transmission. Recently, Multipath TCP (MPTCP) was proposed to implement multipath transmission at the transport layer and has attracted much attention from academia and industry. However, MPTCP only supports TCP-based applications and its multipath routing flexibility is limited. In this paper, we investigate the possibility of orchestrating multipath transmission from the network layer of end devices, and develop a Multipath IP (MPIP) design consisting of signaling, session and path management, multipath routing, and NAT traversal. We implement MPIP in Linux and Android kernels. Through controlled lab experiments and Internet experiments, we demonstrate that MPIP can effectively achieve multipath gains at the network layer. It not only supports the legacy TCP and UDP protocols, but also works seamlessly with MPTCP. By facilitating user-defined customized routing, MPIP can route traffic from competing applications in a coordinated fashion to maximize the aggregate user Quality-of-Experience.
Submitted 17 September, 2017;
originally announced September 2017.
-
Terahertz Security Image Quality Assessment by No-reference Model Observers
Authors:
Menghan Hu,
Xiongkuo Min,
Guangtao Zhai,
Wenhan Zhu,
Yucheng Zhu,
Zhaodi Wang,
Xiaokang Yang,
Guang Tian
Abstract:
To enable the development of objective image quality assessment (IQA) algorithms for THz security images, we constructed the THz security image database (THSID), which includes a total of 181 THz security images with a resolution of 127 × 380. The main distortion types in THz security images were first analyzed to design subjective evaluation criteria for acquiring mean opinion scores. Subsequently, existing no-reference IQA algorithms, namely 5 opinion-aware approaches (NFERM, GMLF, DIIVINE, BRISQUE and BLIINDS2) and 8 opinion-unaware approaches (QAC, SISBLIM, NIQE, FISBLIM, CPBD, S3 and Fish_bb), were executed to evaluate THz security image quality. The statistical results demonstrated the superiority of Fish_bb over the other tested IQA approaches for assessing THz image quality, with PLCC (SROCC) values of 0.8925 (-0.8706) and an RMSE of 0.3993. Linear regression analysis and a Bland-Altman plot further verified that Fish_bb could substitute for subjective IQA. Nonetheless, for classifying THz security images, we tended to use S3 as the criterion for ranking THz security image grades because of its relatively low false positive rate in classifying images of bad quality into the acceptable category (24.69%). Interestingly, due to the specific properties of THz images, the average pixel intensity performed better than the above complicated IQA algorithms, with PLCC, SROCC and RMSE of 0.9001, -0.8800 and 0.3857, respectively. This study will help users such as researchers or security staff obtain THz security images of good quality. Currently, our research group is attempting to make this research more comprehensive.
Submitted 3 October, 2017; v1 submitted 12 July, 2017;
originally announced July 2017.
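The THz abstract reports PLCC and SROCC between objective scores and mean opinion scores. The sketch below computes both with their standard definitions: PLCC is the Pearson correlation of the raw scores, and SROCC is the Pearson correlation of their ranks (ties get the average rank). IQA studies often fit a nonlinear mapping before computing PLCC; that step is omitted here, and the score lists are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation coefficient (SROCC)."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores: a metric that falls as subjective quality rises.
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
metric = [10.0, 8.0, 5.0, 3.0, 1.0]
print(round(pearson(mos, metric), 4), round(spearman(mos, metric), 4))
```

A negative SROCC simply means the metric ranks images in the opposite direction to the opinion scores; its magnitude is what measures monotonic agreement.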
-
Self-Taught Convolutional Neural Networks for Short Text Clustering
Authors:
Jiaming Xu,
Bo Xu,
Peng Wang,
Suncong Zheng,
Guanhua Tian,
Jun Zhao,
Bo Xu
Abstract:
Short text clustering is a challenging problem due to the sparseness of its text representation. Here we propose a flexible Self-Taught Convolutional neural network framework for Short Text Clustering (dubbed STC^2), which can flexibly and successfully incorporate more useful semantic features and learn non-biased deep text representations in an unsupervised manner. In our framework, the original raw text features are first embedded into compact binary codes using an existing unsupervised dimensionality reduction method. Then, word embeddings are explored and fed into convolutional neural networks to learn deep feature representations, while the output units are used to fit the pre-trained binary codes during training. Finally, we obtain the optimal clusters by employing K-means to cluster the learned representations. Extensive experimental results demonstrate that the proposed framework is effective and flexible and outperforms several popular clustering methods when tested on three public short text datasets.
Submitted 31 December, 2016;
originally announced January 2017.
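The final step of STC^2, clustering the learned representations with K-means, can be sketched with plain Lloyd's algorithm: assign each vector to its nearest center, move each center to the mean of its members, and stop when assignments no longer change. Toy 2D points stand in for the learned text representations, and centers are initialized explicitly for reproducibility (practical implementations typically use smarter seeding such as k-means++). The function name and data are hypothetical.

```python
def kmeans(points, init_centers, max_iters=100):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update.

    points: list of equal-length tuples; init_centers: starting centers.
    Returns (labels, centers).
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iters):
        # Assignment step: index of the closest center (squared Euclidean).
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(pt, centers[c]))
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centers


# Hypothetical 2D "representations" forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, init_centers=[(0, 0), (10, 10)])
print(labels, centers)
```

In STC^2 the quality of the clusters depends mostly on the learned representations; K-means itself is the standard final partitioning step.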