LVCP: LiDAR-Vision Tightly Coupled Collaborative Real-time Relative Positioning

Authors: Zhuozhu Jian, Qixuan Li, Shengtao Zheng, Xueqian Wang, Xinlei Chen

Abstract: In air-ground collaboration scenarios without GPS and prior maps, the relative positioning of drones and unmanned ground vehicles (UGVs) has always been a challenge. For a drone equipped with a monocular camera and a UGV equipped with LiDAR as an external sensor, we propose a robust and real-time relative pose estimation method (LVCP) based on the tight coupling of vision and LiDAR point cloud information, which does not require prior information such as maps or precise initial poses. Given that large-scale point clouds generated by 3D sensors have more accurate spatial geometric information than the feature point clouds generated from images, we utilize LiDAR point clouds to correct the drift in visual-inertial odometry (VIO) when the camera undergoes significant shaking or the IMU has a low signal-to-noise ratio. To achieve this, we propose a novel coarse-to-fine framework for LiDAR-vision collaborative localization. In this framework, we construct point-plane association based on spatial geometric information, and innovatively construct a point-aided Bundle Adjustment (BA) problem as the backend to simultaneously estimate the relative pose of the camera and LiDAR and correct the VIO drift. In this process, we propose a particle swarm optimization (PSO) based sampling algorithm to complete the coarse estimation of the current camera-LiDAR pose. The initial pose of the camera used for sampling is obtained based on VIO propagation, and the valid feature-plane association number (VFPN) is used to trigger the PSO-sampling process. Additionally, we propose a method that combines Structure from Motion (SFM) and multi-level sampling to initialize the algorithm, addressing the challenge of lacking initial values.

Submitted 15 July, 2024; originally announced July 2024.

Comments: See more details in https://sites.google.com/view/lvcp

arXiv:2405.14383 [pdf, other]

Perception of Knowledge Boundary for Large Language Models through Semi-open-ended Question Answering

Authors: Zhihua Wen, Zhiliang Tian, Zexin Jian, Zhen Huang, Pei Ke, Yifu Gao, Minlie Huang, Dongsheng Li

Abstract: Large Language Models (LLMs) are widely used for knowledge-seeking yet suffer from hallucinations. The knowledge boundary (KB) of an LLM limits its factual understanding, beyond which it may begin to hallucinate. Investigating the perception of LLMs' KB is crucial for detecting hallucinations and LLMs' reliable generation. Current studies perceive LLMs' KB on questions with a concrete answer (close-ended questions) while paying limited attention to semi-open-ended questions (SoeQ) that correspond to many potential answers. Some researchers achieve it by judging whether the question is answerable or not. However, this paradigm is unsuitable for SoeQ, which are usually partially answerable, containing both answerable and ambiguous (unanswerable) answers. Ambiguous answers are essential for knowledge-seeking, but they may go beyond the KB of LLMs. In this paper, we perceive the LLMs' KB with SoeQ by discovering more ambiguous answers. First, we apply an LLM-based approach to construct SoeQ and obtain answers from a target LLM. Unfortunately, the output probabilities of mainstream black-box LLMs are inaccessible to sample for low-probability ambiguous answers. Therefore, we apply an open-sourced auxiliary model to explore ambiguous answers for the target LLM. We calculate the nearest semantic representation for existing answers to estimate their probabilities, with which we reduce the generation probability of high-probability answers to achieve a more effective generation. Finally, we compare the results from the RAG-based evaluation and LLM self-evaluation to categorize four types of ambiguous answers that are beyond the KB of the target LLM. Following our method, we construct a dataset to perceive the KB for GPT-4. We find that GPT-4 performs poorly on SoeQ and is often unaware of its KB. Besides, our auxiliary model, LLaMA-2-13B, is effective in discovering more ambiguous answers.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.02520 [pdf, other]

TurboFFT: A High-Performance Fast Fourier Transform with Fault Tolerance on GPU

Authors: Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Huangliang Dai, Sheng Di, Zizhong Chen, Franck Cappello

Abstract: The Fast Fourier Transform (FFT), as a core computation in a wide range of scientific applications, is increasingly threatened by reliability issues. In this paper, we introduce TurboFFT, a high-performance FFT implementation equipped with a two-sided checksum scheme that detects and corrects silent data corruptions at computing units efficiently. The proposed two-sided checksum addresses the error propagation issue by encoding a batch of input signals with different linear combinations, which not only allows fast batched error detection but also enables error correction on-the-fly instead of recomputing. We explore two-sided checksum designs at the kernel, thread, and threadblock levels, and provide a baseline FFT implementation competitive with the state-of-the-art, closed-source cuFFT. We demonstrate a kernel fusion strategy to mitigate and overlap the computation/memory overhead introduced by fault tolerance with underlying FFT computation. We present a template-based code generation strategy to reduce development costs and support a wide range of input sizes and data types. Experimental results on an NVIDIA A100 server GPU and a Tesla Turing T4 GPU demonstrate TurboFFT offers a competitive or superior performance compared to the closed-source library cuFFT. TurboFFT only incurs a minimal overhead (7% to 15% on average) compared to cuFFT, even under hundreds of error injections per minute for both single and double precision. TurboFFT achieves a 23% improvement compared to existing fault-tolerant FFT schemes.
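The idea of encoding a batch with different linear combinations can be illustrated with a small sketch. It relies only on the linearity of the Fourier transform and is not TurboFFT's kernel design: the naive `dft`, the two weight vectors (all ones, and 1..B), the single-faulty-signal assumption, and every function name here are assumptions made for illustration.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def combine(rows, weights):
    """Weighted sum of equally long rows, element by element."""
    n = len(rows[0])
    return [sum(w * r[k] for w, r in zip(weights, rows)) for k in range(n)]


def detect_and_correct(inputs, outputs, tol=1e-6):
    """Check a batch of transforms with two checksums; fix one faulty signal.

    Assumes at most one output signal in the batch is corrupted.
    Returns the index of the corrected signal, or None if the batch is clean.
    """
    b = len(inputs)
    w1 = [1.0] * b                          # plain sum: detects an error
    w2 = [float(i + 1) for i in range(b)]   # index-weighted sum: locates it
    # By linearity, DFT(sum_i w_i x_i) must equal sum_i w_i DFT(x_i).
    d1 = [g - r for g, r in zip(combine(outputs, w1), dft(combine(inputs, w1)))]
    d2 = [g - r for g, r in zip(combine(outputs, w2), dft(combine(inputs, w2)))]
    k = max(range(len(d1)), key=lambda i: abs(d1[i]))
    if abs(d1[k]) <= tol:
        return None
    # A fault e in signal i gives d1 = e and d2 = (i + 1) * e.
    row = round((d2[k] / d1[k]).real) - 1
    outputs[row] = [y - d for y, d in zip(outputs[row], d1)]
    return row
```

The second checksum is what turns detection into correction: the ratio of the two residuals identifies the faulty signal, and the first residual is the error itself, so the output can be repaired without recomputing the transform.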

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.02840 [pdf, ps, other]

A Survey on Error-Bounded Lossy Compression for Scientific Datasets

Authors: Sheng Di, Jinyang Liu, Kai Zhao, Xin Liang, Robert Underwood, Zhaorui Zhang, Milan Shah, Yafan Huang, Jiajun Huang, Xiaodong Yu, Congrong Ren, Hanqi Guo, Grant Wilkins, Dingwen Tao, Jiannan Tian, Sian Jin, Zizhe Jian, Daoce Wang, MD Hasanur Rahman, Boyuan Zhang, Jon C. Calhoun, Guanpeng Li, Kazutomo Yoshii, Khalid Ayed Alharthi, Franck Cappello

Abstract: Error-bounded lossy compression has been effective in significantly reducing the data storage/transfer burden while preserving the reconstructed data fidelity very well. Over the years, many error-bounded lossy compressors have been developed for a wide range of parallel and distributed use cases. These lossy compressors are designed with distinct compression models and design principles, such that each of them features particular pros and cons. In this paper we provide a comprehensive survey of emerging error-bounded lossy compression techniques for different use cases, each involving big data to process. The key contribution is fourfold. (1) We summarize an insightful taxonomy of lossy compression into 6 classic compression models. (2) We provide a comprehensive survey of 10+ commonly used compression components/modules used in error-bounded lossy compressors. (3) We provide a comprehensive survey of 10+ state-of-the-art error-bounded lossy compressors as well as how they combine the various compression modules in their designs. (4) We provide a comprehensive survey of the lossy compression for 10+ modern scientific applications and use-cases. We believe this survey is useful to multiple communities including scientific applications, high-performance computing, lossy compression, and big data.
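As a minimal illustration of what "error-bounded" means, the sketch below pairs a previous-value predictor with linear-scaling quantization so that every reconstructed value stays within an absolute bound `eb` of the original. It is a generic textbook-style construction, not any specific compressor covered by the survey; the function names and the choice of predictor are assumptions.

```python
def compress(data, eb):
    """Quantize prediction residuals into integer codes with bin width 2 * eb."""
    codes = []
    prev = 0.0  # last reconstructed value, used as the prediction
    for x in data:
        q = round((x - prev) / (2 * eb))  # nearest bin for the residual
        codes.append(q)
        prev = prev + q * 2 * eb          # track what the decoder will see
    return codes


def decompress(codes, eb):
    """Rebuild values by replaying the same predictor on the integer codes."""
    out = []
    prev = 0.0
    for q in codes:
        prev = prev + q * 2 * eb
        out.append(prev)
    return out
```

Predicting from the reconstructed value rather than the original one keeps encoder and decoder in step, so the per-value error never accumulates beyond `eb` (up to floating-point rounding). A real compressor would follow this stage with lossless coding of the integer codes.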

Submitted 3 April, 2024; originally announced April 2024.

Comments: submitted to ACM Computing journal, required to be 35 pages including references

arXiv:2404.01699 [pdf, other]

Task Integration Distillation for Object Detectors

Authors: Hai Su, ZhenWen Jian, Songsen Yu

Abstract: Knowledge distillation is a widely adopted technique for model lightening. However, the performance of most knowledge distillation methods in the domain of object detection is not satisfactory. Typically, knowledge distillation approaches consider only the classification task among the two sub-tasks of an object detector, largely overlooking the regression task. This oversight leads to a partial understanding of the object detector's comprehensive task, resulting in skewed estimations and potentially adverse effects. Therefore, we propose a knowledge distillation method that addresses both the classification and regression tasks, incorporating a task significance strategy. By evaluating the importance of features based on the output of the detector's two sub-tasks, our approach ensures a balanced consideration of both classification and regression tasks in object detection. Drawing inspiration from real-world teaching processes and the definition of learning condition, we introduce a method that focuses on both key and weak areas. By assessing the value of features for knowledge distillation based on their importance differences, we accurately capture the current model's learning situation. This method effectively prevents the issue of biased predictions about the model's learning reality caused by an incomplete utilization of the detector's outputs.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.12568 [pdf, other]

Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices

Authors: Xueshuo Xie, Haoxu Wang, Zhaolong Jian, Tao Li, Wei Wang, Zhiwei Xu, Guiling Wang

Abstract: Edge intelligence enables resource-demanding Deep Neural Network (DNN) inference without transferring original data, addressing concerns about data privacy in consumer Internet of Things (IoT) devices. For privacy-sensitive applications, deploying models in hardware-isolated trusted execution environments (TEEs) becomes essential. However, the limited secure memory in TEEs poses challenges for deploying DNN inference, and alternative techniques like model partitioning and offloading introduce performance degradation and security issues. In this paper, we present a novel approach for advanced model deployment in TrustZone that ensures comprehensive privacy preservation during model inference. We design a memory-efficient management method to support memory-demanding inference in TEEs. By adjusting the memory priority, we effectively mitigate memory leakage risks and memory overlap conflicts, resulting in 32 lines of code alterations in the trusted operating system. Additionally, we leverage two tiny libraries: S-Tinylib (2,538 LoCs), a tiny deep learning library, and Tinylibm (827 LoCs), a tiny math library, to support efficient inference in TEEs. We implemented a prototype on Raspberry Pi 3B+ and evaluated it using three well-known lightweight DNN models. The experimental results demonstrate that our design significantly improves inference speed by 3.13 times and reduces power consumption by over 66.5% compared to the non-memory-optimized method in TEEs.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2311.12133 [pdf, other]

High-performance Effective Scientific Error-bounded Lossy Compression with Auto-tuned Multi-component Interpolation

Authors: Jinyang Liu, Sheng Di, Kai Zhao, Xin Liang, Sian Jin, Zizhe Jian, Jiajun Huang, Shixun Wu, Zizhong Chen, Franck Cappello

Abstract: Error-bounded lossy compression has been identified as a promising solution for significantly reducing scientific data volumes upon users' requirements on data distortion. For the existing scientific error-bounded lossy compressors, some of them (such as SPERR and FAZ) can reach fairly high compression ratios and some others (such as SZx, SZ, and ZFP) feature high compression speeds, but they rarely exhibit both high ratio and high speed at the same time. In this paper, we propose HPEZ with newly-designed interpolations and quality-metric-driven auto-tuning, which features significantly improved compression quality upon the existing high-performance compressors, while being much faster than high-ratio compressors. The key contributions lie in the following points: (1) We develop a series of advanced techniques such as interpolation re-ordering, multi-dimensional interpolation, and natural cubic splines to significantly improve compression qualities with interpolation-based data prediction. (2) The auto-tuning module in HPEZ has been carefully designed with novel strategies, including but not limited to block-wise interpolation tuning, dynamic dimension freezing, and Lorenzo tuning. (3) We thoroughly evaluate HPEZ compared with many other compressors on six real-world scientific datasets. Experiments show that HPEZ outperforms other high-performance error-bounded lossy compressors in compression ratio by up to 140% under the same error bound, and by up to 360% under the same PSNR. In parallel data transfer experiments on the distributed database, HPEZ achieves a significant performance gain with up to 40% time cost reduction over the second-best compressor.

Submitted 13 December, 2023; v1 submitted 20 November, 2023; originally announced November 2023.

arXiv:2310.02648 [pdf, other]

Long-Term Dynamic Window Approach for Kinodynamic Local Planning in Static and Crowd Environments

Authors: Zhiqiang Jian, Songyi Zhang, Lingfeng Sun, Wei Zhan, Nanning Zheng, Masayoshi Tomizuka

Abstract: Local planning for a differential wheeled robot is designed to generate kinodynamically feasible actions that guide the robot to a goal position along the navigation path while avoiding obstacles. Reactive, predictive, and learning-based methods are widely used in local planning. However, few of them can fit static and crowd environments while satisfying kinodynamic constraints simultaneously. To solve this problem, we propose a novel local planning method. The method applies a long-term dynamic window approach to generate an initial trajectory and then optimizes it with graph optimization. The method can plan actions under the robot's kinodynamic constraints in real time while making the generated actions safer and less jittery. Experimental results show that the proposed method adapts well to crowd and static environments and outperforms most SOTA approaches.

Submitted 4 October, 2023; originally announced October 2023.

Comments: 9 pages, 7 figures

Journal ref: 2023 IEEE RA-L

arXiv:2310.02625 [pdf, other]

Adaptive Spatio-Temporal Voxels Based Trajectory Planning for Autonomous Driving in Highway Traffic Flow

Authors: Zhiqiang Jian, Songyi Zhang, Lingfeng Sun, Wei Zhan, Masayoshi Tomizuka, Nanning Zheng

Abstract: Trajectory planning is crucial for the safe driving of autonomous vehicles in highway traffic flow. Currently, some advanced trajectory planning methods utilize spatio-temporal voxels to construct feasible regions and then convert trajectory planning into optimization problem solving based on the feasible regions. However, these feasible region construction methods cannot adapt to the changes in dynamic environments, making them difficult to apply in complex traffic flow. In this paper, we propose a trajectory planning method based on adaptive spatio-temporal voxels which improves the construction of feasible regions and trajectory optimization while maintaining the quadratic programming form. The method can adjust feasible regions and trajectory planning according to real-time traffic flow and environmental changes, enabling vehicles to drive safely in complex traffic flow. The proposed method has been tested in both open-loop and closed-loop environments, and the test results show that our method outperforms the current planning methods.

Submitted 4 October, 2023; originally announced October 2023.

Comments: 8 pages, 5 figures

Journal ref: IEEE ITSC 2023

arXiv:2309.03475 [pdf, other]

InteractionNet: Joint Planning and Prediction for Autonomous Driving with Transformers

Authors: Jiawei Fu, Yanqing Shen, Zhiqiang Jian, Shitao Chen, Jingmin Xin, Nanning Zheng

Abstract: Planning and prediction are two important modules of autonomous driving and have experienced tremendous advancement recently. Nevertheless, most existing methods regard planning and prediction as independent and ignore the correlation between them, leading to the lack of consideration for interaction and dynamic changes of traffic scenarios. To address this challenge, we propose InteractionNet, which leverages a transformer to share global contextual reasoning among all traffic participants to capture interaction, and interconnects planning and prediction so that they are handled jointly. Besides, InteractionNet deploys another transformer to help the model pay extra attention to the perceived region containing critical or unseen vehicles. InteractionNet outperforms other baselines in several benchmarks, especially in terms of safety, which benefits from the joint consideration of planning and forecasting. The code will be available at https://github.com/fujiawei0724/InteractionNet.

Submitted 7 September, 2023; originally announced September 2023.

Comments: Accepted to IROS 2023

arXiv:2307.00599 [pdf, other]

RH-Map: Online Map Construction Framework of Dynamic Objects Removal Based on Region-wise Hash Map Structure

Authors: Zihong Yan, Xiaoyi Wu, Zhuozhu Jian, Bin Lan, Xueqian Wang, Bin Liang

Abstract: Mobile robots navigating in outdoor environments frequently encounter the issue of undesired traces left by dynamic objects and manifested as obstacles on the map, impeding robots from achieving accurate localization and effective navigation. To tackle the problem, a novel map construction framework based on 3D region-wise hash map structure (RH-Map) is proposed, consisting of front-end scan fresher and back-end removal modules, which realizes real-time map construction and online dynamic object removal (DOR). First, a two-layer 3D region-wise hash map structure of map management is proposed for effective online DOR. Then, in scan fresher, region-wise ground plane estimation (R-GPE) is adopted for estimating and preserving ground information and Scan-to-Map Removal (S2M-R) is proposed to discriminate and remove dynamic regions. Moreover, the lightweight back-end removal module maintaining keyframes is proposed for further DOR. As experimentally verified on SemanticKITTI, our proposed framework yields promising performance on online DOR of map construction compared with the state-of-the-art methods. We also validate the proposed framework in real-world environments.

Submitted 24 July, 2023; v1 submitted 2 July, 2023; originally announced July 2023.

arXiv:2306.08977 [pdf, other]

Path Generation for Wheeled Robots Autonomous Navigation on Vegetated Terrain

Authors: Zhuozhu Jian, Zejia Liu, Haoyu Shao, Xueqian Wang, Xinlei Chen, Bin Liang

Abstract: Wheeled robot navigation has been widely used in urban environments, but little research has been conducted on its navigation in wild vegetation. External sensors (LiDAR, camera, etc.) are often used to construct a point cloud map of the surrounding environment; however, the supporting rigid ground used for travelling cannot be detected due to the occlusion of vegetation. This often leads to unsafe or non-smooth paths during the planning process. To address this drawback, we propose the PE-RRT* algorithm, which effectively combines a novel support plane estimation method and sampling algorithm to generate real-time feasible and safe paths in vegetation environments. In order to accurately estimate the support plane, we combine external perception and proprioception, and use Multivariate Gaussian Process Regression (MV-GPR) to estimate the terrain at the sampling nodes. We build a physical experimental platform and conduct experiments in different outdoor environments. Experimental results show that our method has high safety, robustness and generalization.

Submitted 29 November, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.02659 [pdf, other]

Hybrid Trajectory Optimization for Autonomous Terrain Traversal of Articulated Tracked Robots

Authors: Zhengzhe Xu, Yanbo Chen, Zhuozhu Jian, Junbo Tan, Xueqian Wang, Bin Liang

Abstract: Autonomous terrain traversal of articulated tracked robots can reduce operator cognitive load to enhance task efficiency and facilitate extensive deployment. We present a novel hybrid trajectory optimization method aimed at generating efficient, stable, and smooth traversal motions. To achieve this, we develop a planar robot-terrain contact model and divide the robot's motion into hybrid modes of driving and traversing. By using a generalized coordinate description, the configuration space dimension is reduced, which facilitates real-time planning. The hybrid trajectory optimization is transcribed into a nonlinear programming problem and divided into subproblems to be solved in a receding-horizon planning fashion. Mode switching is facilitated by associating optimized motion durations with a predefined traversal sequence. A multi-objective cost function is formulated to further improve the traversal performance. Additionally, map sampling, terrain simplification, and tracking controller modules are integrated into the autonomous terrain traversal system. Our approach is validated in simulation and real-world scenarios with the Searcher robotic platform. Comparative experiments with expert operator control and state-of-the-art methods show advantages in terms of time and energy efficiency, stability, and smoothness of motion.

Submitted 23 November, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: IEEE Robotics and Automation Letters (RA-L)

arXiv:2305.02444 [pdf, other]

doi 10.1145/3588195.3595947

FT-GEMM: A Fault Tolerant High Performance GEMM Implementation on x86 CPUs

Authors: Shixun Wu, Yujia Zhai, Jiajun Huang, Zizhe Jian, Zizhong Chen

Abstract: General matrix/matrix multiplication (GEMM) is crucial for scientific computing and machine learning. However, the increased scale of the computing platforms raises concerns about hardware and software reliability. In this poster, we present FT-GEMM, a high-performance GEMM capable of tolerating soft errors on-the-fly. We incorporate the fault tolerant functionality at algorithmic level by fusing the memory-intensive operations into the GEMM assembly kernels. We design a cache-friendly scheme for parallel FT-GEMM. Experimental results on Intel Cascade Lake demonstrate that FT-GEMM offers high reliability and performance, running faster than Intel MKL, OpenBLAS, and BLIS by 3.50% to 22.14% for both serial and parallel GEMM, even under hundreds of errors injected per minute.

Submitted 8 May, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2104.00897

arXiv:2305.01024 [pdf, other]

doi 10.1145/3577193.3593715

Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs

Authors: Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen

Abstract: General Matrix Multiplication (GEMM) is a crucial algorithm for various applications such as machine learning and scientific computing, and an efficient GEMM implementation is essential for the performance of these systems. While researchers often strive for faster performance by using large compute platforms, the increased scale of these systems can raise concerns about hardware and software reliability. In this paper, we present a design for a high-performance GEMM with algorithm-based fault tolerance for use on GPUs. We describe fault-tolerant designs for GEMM at the thread, warp, and threadblock levels, and also provide a baseline GEMM implementation that is competitive with or faster than the state-of-the-art, proprietary cuBLAS GEMM. We present a kernel fusion strategy to overlap and mitigate the memory latency due to fault tolerance with the original GEMM computation. To support a wide range of input matrix shapes and reduce development costs, we present a template-based approach for automatic code generation for both fault-tolerant and non-fault-tolerant GEMM implementations. We evaluate our work on NVIDIA Tesla T4 and A100 server GPUs. Experimental results demonstrate that our baseline GEMM presents comparable or superior performance compared to the closed-source cuBLAS. The fault-tolerant GEMM incurs only a minimal overhead (8.89% on average) compared to cuBLAS even with hundreds of errors injected per minute. For irregularly shaped inputs, the code generator-generated kernels show remarkable speedups of 160% to 183.5% and 148.55% to 165.12% for fault-tolerant and non-fault-tolerant GEMMs, outperforming cuBLAS by up to 41.40%.
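Algorithm-based fault tolerance for matrix multiplication is commonly built on row and column checksums. The sketch below shows that classic checksum idea in plain Python for a single corrupted element; it says nothing about the thread-, warp-, or threadblock-level designs in the paper, and the function names and tolerance are assumptions.

```python
def matmul(a, b):
    """Plain triple-loop matrix product of list-of-lists matrices."""
    inner = len(b)
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def abft_check(a, b, c, tol=1e-9):
    """Verify C = A @ B using checksums; repair one corrupted element in place.

    Returns None if C passes, or the (row, col) of the corrected element.
    """
    m, n, inner = len(c), len(c[0]), len(b)
    # Expected column sums of C: (1^T A) B, computed without touching C.
    col_sum_a = [sum(a[i][k] for i in range(m)) for k in range(inner)]
    exp_col = [sum(col_sum_a[k] * b[k][j] for k in range(inner)) for j in range(n)]
    # Expected row sums of C: A (B 1).
    row_sum_b = [sum(b[k][j] for j in range(n)) for k in range(inner)]
    exp_row = [sum(a[i][k] * row_sum_b[k] for k in range(inner)) for i in range(m)]
    col_diff = [sum(c[i][j] for i in range(m)) - exp_col[j] for j in range(n)]
    row_diff = [sum(c[i]) - exp_row[i] for i in range(m)]
    bad_rows = [i for i, d in enumerate(row_diff) if abs(d) > tol]
    bad_cols = [j for j, d in enumerate(col_diff) if abs(d) > tol]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        c[i][j] -= row_diff[i]  # the row residual equals the injected error
        return (i, j)
    raise ValueError("more than one corrupted element; recompute the product")
```

The checksums cost two matrix-vector products instead of a second full multiplication, which is why this style of verification can be fused into the main computation at low overhead.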

Submitted 1 May, 2023; originally announced May 2023.

Comments: 11 pages, 2023 International Conference on Supercomputing

arXiv:2209.08539 [pdf, other]

Dynamic Control Barrier Function-based Model Predictive Control to Safety-Critical Obstacle-Avoidance of Mobile Robot

Authors: Zhuozhu Jian, Zihong Yan, Xuanang Lei, Zihong Lu, Bin Lan, Xueqian Wang, Bin Liang

Abstract: This paper presents an efficient and safe method to avoid static and dynamic obstacles based on LiDAR. First, the point cloud is used to generate a real-time local grid map for obstacle detection. Then, obstacles are clustered by the DBSCAN algorithm and enclosed with minimum bounding ellipses (MBEs). In addition, data association is conducted to match each MBE with the obstacle in the current frame. Considering the MBE as an observation, a Kalman filter (KF) is used to estimate and predict the motion state of the obstacle. In this way, the trajectory of each obstacle in the forward time domain can be parameterized as a set of ellipses. Due to the uncertainty of the MBE, the semi-major and semi-minor axes of the parameterized ellipse are extended to ensure safety. We extend the traditional Control Barrier Function (CBF) and propose Dynamic Control Barrier Function (D-CBF). We combine D-CBF with Model Predictive Control (MPC) to implement safety-critical dynamic obstacle avoidance. Experiments in simulated and real scenarios are conducted to verify the effectiveness of our algorithm. The source code is released for the reference of the community.
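To make the ellipse-based barrier idea concrete, the following sketch evaluates a barrier value against a predicted sequence of obstacle ellipses and checks the commonly used discrete-time CBF condition h(x_{k+1}) >= (1 - gamma) * h(x_k). The exact D-CBF formulation in the paper may differ; the barrier function, the `gamma` value, the ellipse tuple layout, and the function names are all assumptions for illustration.

```python
import math


def ellipse_barrier(px, py, cx, cy, a, b, theta):
    """Barrier value for point (px, py) and an ellipse centred at (cx, cy).

    a and b are the semi-axes and theta the ellipse rotation in radians.
    The value is positive outside the ellipse, zero on it, negative inside.
    """
    dx, dy = px - cx, py - cy
    u = math.cos(theta) * dx + math.sin(theta) * dy   # coordinates in the
    v = -math.sin(theta) * dx + math.cos(theta) * dy  # ellipse frame
    return (u / a) ** 2 + (v / b) ** 2 - 1.0


def satisfies_cbf(traj, ellipses, gamma=0.3):
    """Check h[k+1] >= (1 - gamma) * h[k] along a candidate trajectory.

    traj[k] is the robot position (x, y) at step k; ellipses[k] is the
    predicted obstacle ellipse (cx, cy, a, b, theta) at the same step, so a
    moving obstacle is handled by letting the ellipse change with k.
    """
    h = [ellipse_barrier(px, py, *e) for (px, py), e in zip(traj, ellipses)]
    return all(h[k + 1] >= (1.0 - gamma) * h[k] for k in range(len(h) - 1))
```

In an MPC setting this inequality would appear as a constraint inside the optimizer rather than as a post hoc check; the check form is used here only because it is easy to run and inspect.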

Submitted 18 September, 2022; originally announced September 2022.

Comments: Submitted to IEEE International Conference on Robotics and Automation (ICRA) 2023

arXiv:2203.04541 [pdf, other]

PUTN: A Plane-fitting based Uneven Terrain Navigation Framework

Authors: Zhuozhu Jian, Zihong Lu, Xiao Zhou, Bin Lan, Anxing Xiao, Xueqian Wang, Bin Liang

Abstract: Autonomous navigation of ground robots has been widely used in indoor structured 2D environments, but there are still many challenges in outdoor 3D unstructured environments, especially in rough, uneven terrains. This paper proposes a plane-fitting based uneven terrain navigation framework (PUTN) to solve this problem. The implementation of PUTN is divided into three steps. First, based on Rapidly-exploring Random Trees (RRT), an improved sample-based algorithm called Plane Fitting RRT* (PF-RRT*) is proposed to obtain a sparse trajectory. Each sampling point corresponds to a custom traversability index and a fitted plane on the point cloud. These planes are connected in series to form a traversable strip. Second, Gaussian Process Regression is used to generate the traversability of the dense trajectory interpolated from the sparse trajectory, with the sampling tree used as the training set. Finally, local planning is performed using nonlinear model predictive control (NMPC). By adding the traversability index and uncertainty to the cost function, and adding obstacles generated by the real-time point cloud to the constraint function, a safe motion planning algorithm with smooth speed and strong robustness is obtained. Experiments in real scenarios are conducted to verify the effectiveness of the method. The source code is released for the reference of the community.

Submitted 27 September, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

Comments: Accepted by IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022

arXiv:2203.03927 [pdf, other]

Quadruped Guidance Robot for the Visually Impaired: A Comfort-Based Approach

Authors: Yanbo Chen, Zhengzhe Xu, Zhuozhu Jian, Gengpan Tang, Yunong Yangli, Anxing Xiao, Xueqian Wang, Bin Liang

Abstract: Guidance robots that can guide people and avoid various obstacles could potentially be owned by more visually impaired people at a fairly low cost. Most previous guidance robots for the visually impaired ignored human response behavior and comfort, treating the human as an appendage dragged by the robot, which can lead to imprecise guidance of the human and sudden changes in the traction force experienced by the human. In this paper, we propose a novel quadruped guidance robot system with a comfort-based concept. We design a controllable traction device that can adjust the length and force between human and robot to ensure comfort. To allow the human to be guided safely and comfortably to the target position in complex environments, our proposed human motion planner plans the traction force using a force-based human motion model. To track the planned force, we also propose a robot motion planner that generates the specific robot motion command, and we design the force control device. Our system has been deployed on the Unitree Laikago quadrupedal platform and validated in real-world scenarios.

Submitted 23 June, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: IEEE International Conference on Robotics and Automation (ICRA) 2023

arXiv:2103.14984 [pdf, ps, other]

Realistic face animation generation from videos

Authors: Zihao Jian, Minshan Xie

Abstract: 3D face reconstruction and face alignment are two fundamental and highly related topics in computer vision. Recently, some works have started to use deep learning models to estimate the 3DMM coefficients to reconstruct 3D face geometry. However, the performance is restricted by the limitations of the pre-defined face templates. To address this problem, some end-to-end methods, which completely bypass the calculation of 3DMM coefficients, have been proposed and have attracted much attention. In this report, we introduce and analyse three state-of-the-art methods in 3D face reconstruction and face alignment. Some potential improvements to PRN are proposed to further enhance its accuracy and speed.

Submitted 27 March, 2021; originally announced March 2021.

arXiv:1803.05263 [pdf, other]

Feature Selective Small Object Detection via Knowledge-based Recurrent Attentive Neural Network

Authors: Kai Yi, Zhiqiang Jian, Shitao Chen, Nanning Zheng

Abstract: At present, the performance of deep neural networks in general object detection is comparable to or even surpasses that of human beings. However, due to the limitations of deep learning itself, the small proportion of feature pixels, and the occurrence of blur and occlusion, the detection of small objects in complex scenes is still an open question. Nevertheless, real-time and accurate object detection is fundamental to automatic perception and to the subsequent perception-based decision-making and planning tasks of autonomous driving. Considering the characteristics of small objects in autonomous driving scenes, we propose a novel method named KB-RANN, which is based on domain knowledge, intuitive experience, and feature attentive selection. It can focus on particular parts of image features, and then it tries to stress the importance of these features and strengthens their learning parameters. Our comparative experiments on the KITTI and COCO datasets show that our proposed method achieves considerable results in both speed and accuracy, and can improve small object detection through self-selection of important features and continuous enhancement of the proposed method. We have also deployed it in our self-developed autonomous driving car.

Submitted 20 April, 2019; v1 submitted 13 March, 2018; originally announced March 2018.

arXiv:0802.3081 [pdf]

A High-Q Microwave MEMS Resonator

Authors: Z. Jian, Y. Yuanwei, Z. Yong, Chen Chen, J. Shixing

Abstract: A high-Q microwave (K band) MEMS resonator is presented, which employs substrate integrated waveguide (SIW) and via-hole arrays micromachined by an ICP process. A nonradiative dielectric waveguide (NRD) is formed by metal-filled via-hole arrays and grounded planes. The three-dimensional (3D) cavity resonator, filled with a high-resistivity silicon substrate, is fed by current probes using a CPW line. This monolithic resonator results in low cost, high performance and easy integration with planar circuits. The measured quality factor exceeds 180 and the resonance frequency is 21 GHz. These measurements show good agreement with the simulation results. The chip size is only 4.7 mm × 4.6 mm × 0.5 mm. Finally, as an example application, a filter using two SIW resonators is designed.

Submitted 21 February, 2008; originally announced February 2008.

Comments: Submitted on behalf of EDA Publishing Association (http://irevues.inist.fr/EDA-Publishing)

Journal ref: In Symposium on Design, Test, Integration and Packaging of MEMS/MOEMS - DTIP 2007, Stresa, Lago Maggiore, Italy (2007)

arXiv:0711.3318 [pdf]

A Ku-Band Novel Micromachined Bandpass Filter with Two Transmission Zeros

Authors: Zhang Yong, Zhu Jian, Yu Yuanwei, Chen Chen, Jia Shi Xing

Abstract: This paper presents a micromachined bandpass filter with miniature size that has relatively outstanding performance. A silicon-based eighth-order microstrip bandpass filter is fabricated and measured. A novel design method for the interdigital filter that can create two transmission zeros is described. The location of the transmission zeros can be shifted arbitrarily in the stopband. By adjusting the zero location properly, the filter provides much better skirt rejection and lower insertion loss than a conventional microstrip interdigital filter. To reduce the chip size, through-silicon-substrate via-holes are used. Good experimental results are obtained.

Submitted 21 November, 2007; originally announced November 2007.

Comments: Submitted on behalf of TIMA Editions (http://irevues.inist.fr/tima-editions)

Journal ref: In Symposium on Design, Test, Integration and Packaging of MEMS/MOEMS - DTIP 2006, Stresa, Lago Maggiore, Italy (2006)

Showing 1–22 of 22 results for author: Jian, Z