-
Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level
Authors:
Ali Hassani,
Wen-Mei Hwu,
Humphrey Shi
Abstract:
Neighborhood attention reduces the cost of self attention by restricting each token's attention span to its nearest neighbors. This restriction, parameterized by a window size and dilation factor, draws a spectrum of possible attention patterns between linear projection and self attention. Neighborhood attention, and more generally sliding window attention patterns, have long been bounded by infrastructure, particularly in higher-rank spaces (2-D and 3-D), calling for the development of custom kernels, which have been limited in either functionality, or performance, if not both. In this work, we first show that neighborhood attention can be represented as a batched GEMM problem, similar to standard attention, and implement it for 1-D and 2-D neighborhood attention. These kernels on average provide 895% and 272% improvement in full precision latency compared to existing naive kernels for 1-D and 2-D neighborhood attention respectively. We find certain inherent inefficiencies in all unfused neighborhood attention kernels that bound their performance and lower-precision scalability. We also developed fused neighborhood attention; an adaptation of fused dot-product attention kernels that allow fine-grained control over attention across different spatial axes. Known for reducing the quadratic time complexity of self attention to a linear complexity, neighborhood attention can now enjoy a reduced and constant memory footprint, and record-breaking half precision latency. We observe that our fused kernels successfully circumvent some of the unavoidable inefficiencies in unfused implementations. While our unfused GEMM-based kernels only improve half precision performance compared to naive kernels by an average of 496% and 113% in 1-D and 2-D problems respectively, our fused kernels improve naive kernels by an average of 1607% and 581% in 1-D and 2-D problems respectively.
Submitted 22 March, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
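The window-and-dilation restriction described in the abstract above can be illustrated with a minimal 1-D sketch in plain Python. This is not the authors' GEMM-based or fused kernel; it is a naive reference loop, and the function names (`neighbors_1d`, `neighborhood_attention_1d`) and the border-clamping convention are assumptions made for illustration only.

```python
import math

def neighbors_1d(i, n, window, dilation=1):
    """Indices that token i attends to in a sequence of length n.

    Assumed convention: neighbors are taken from tokens sharing i's residue
    modulo `dilation`, and the window is clamped at the borders so every
    token attends to exactly `window` tokens.
    """
    group, pos = i % dilation, i // dilation
    group_len = (n - group + dilation - 1) // dilation  # tokens in this dilation group
    if window > group_len:
        raise ValueError("window does not fit in the dilated sequence")
    start = min(max(pos - window // 2, 0), group_len - window)
    return [group + (start + t) * dilation for t in range(window)]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def neighborhood_attention_1d(q, k, v, window, dilation=1):
    """Naive single-head neighborhood attention over lists of vectors."""
    n, dim = len(q), len(q[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for i in range(n):
        idx = neighbors_1d(i, n, window, dilation)
        # Scaled dot-product scores against the neighborhood only.
        scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in idx]
        weights = softmax(scores)
        out.append([sum(w * v[j][c] for w, j in zip(weights, idx))
                    for c in range(len(v[0]))])
    return out
```

Under these assumptions, a window of 1 returns the value vectors unchanged (the linear-projection end of the spectrum), while a window equal to the sequence length reproduces ordinary self attention. The cost per token is proportional to `window` rather than to the sequence length.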
-
Predicting Next Useful Location With Context-Awareness: The State-Of-The-Art
Authors:
Alireza Nezhadettehad,
Arkady Zaslavsky,
Rakib Abdur,
Siraj Ahmed Shaikh,
Seng W. Loke,
Guang-Li Huang,
Alireza Hassani
Abstract:
Predicting the future location of mobile objects reinforces location-aware services with proactive intelligence and helps businesses and decision-makers with better planning and near real-time scheduling in different applications such as traffic congestion control, location-aware advertisements, and monitoring public health and well-being. The recent developments in the smartphone and location sensors technology and the prevalence of using location-based social networks alongside the improvements in artificial intelligence and machine learning techniques provide an excellent opportunity to exploit massive amounts of historical and real-time contextual information to recognise mobility patterns and achieve more accurate and intelligent predictions. This survey provides a comprehensive overview of the next useful location prediction problem with context-awareness. First, we explain the concepts of context and context-awareness and define the next location prediction problem. Then we analyse nearly thirty studies in this field concerning the prediction method, the challenges addressed, the datasets and metrics used for training and evaluating the model, and the types of context incorporated. Finally, we discuss the advantages and disadvantages of different approaches, focusing on the usefulness of the predicted location and identifying the open challenges and future work on this subject by introducing two potential use cases of next location prediction in the automotive industry.
Submitted 15 January, 2024;
originally announced January 2024.
-
FarSight: A Physics-Driven Whole-Body Biometric System at Large Distance and Altitude
Authors:
Feng Liu,
Ryan Ashbaugh,
Nicholas Chimitt,
Najmul Hassan,
Ali Hassani,
Ajay Jaiswal,
Minchul Kim,
Zhiyuan Mao,
Christopher Perry,
Zhiyuan Ren,
Yiyang Su,
Pegah Varghaei,
Kai Wang,
Xingguang Zhang,
Stanley Chan,
Arun Ross,
Humphrey Shi,
Zhangyang Wang,
Anil Jain,
Xiaoming Liu
Abstract:
Whole-body biometric recognition is an important area of research due to its vast applications in law enforcement, border security, and surveillance. This paper presents the end-to-end design, development and evaluation of FarSight, an innovative software system designed for whole-body (fusion of face, gait and body shape) biometric recognition. FarSight accepts videos from elevated platforms and drones as input and outputs a candidate list of identities from a gallery. The system is designed to address several challenges, including (i) low-quality imagery, (ii) large yaw and pitch angles, (iii) robust feature extraction to accommodate large intra-person variabilities and large inter-person similarities, and (iv) the large domain gap between training and test sets. FarSight combines the physics of imaging and deep learning models to enhance image restoration and biometric feature encoding. We test FarSight's effectiveness using the newly acquired IARPA Biometric Recognition and Identification at Altitude and Range (BRIAR) dataset. Notably, FarSight demonstrated a substantial performance increase on the BRIAR dataset, with gains of +11.82% Rank-20 identification and +11.3% TAR@1% FAR.
Submitted 6 September, 2023; v1 submitted 29 June, 2023;
originally announced June 2023.
-
Context Query Simulation for Smart Carparking Scenarios in the Melbourne CDB
Authors:
Shakthi Weerasinghe,
Arkady Zaslavsky,
Alireza Hassani,
Seng W. Loke,
Alexey Medvedev,
Amin Abken
Abstract:
The rapid growth of the Internet of Things (IoT) has paved the way for better context-awareness, enabling smarter applications. Despite the growth in the number of IoT devices, Context Management Platforms (CMPs) that integrate different domains of IoT to produce context information lack the scalability to cater to a high volume of context queries. Research in scalability and adaptation in CMPs is therefore of significant importance. However, there are limited methods to benchmark and validate research in this area due to the lack of sizable sets of context queries that could simulate real-world situations, scenarios, and scenes. Commercially collected context query logs are not publicly accessible, and deploying IoT devices and context consumers in the real world at scale is expensive and consumes significant effort and time. Therefore, there is a need to develop a method to reliably generate and simulate context query loads that resemble real-world scenarios to test CMPs for scale. In this paper, we propose a context query simulator for the context-aware smart car parking scenario in the Melbourne Central Business District in Australia. We present the process of generating context queries using multiple real-world datasets and publicly accessible reports, followed by the context query execution process. The context query generator matches the popularity of places with the different profiles of commuters, preferences, and traffic variations to produce a dataset of context query templates containing 898,050 records. The simulator is executable over a seven-day profile, which far exceeds the simulation time of any IoT system simulator. The context query generation process is also generic and context query language independent.
Submitted 13 February, 2023;
originally announced February 2023.
-
Reinforcement Learning Based Approaches to Adaptive Context Caching in Distributed Context Management Systems
Authors:
Shakthi Weerasinghe,
Arkady Zaslavsky,
Seng W. Loke,
Amin Abken,
Alireza Hassani
Abstract:
Performance metrics-driven context caching has a profound impact on throughput and response time in distributed context management systems for real-time context queries. This paper proposes a reinforcement learning based approach to adaptively cache context with the objective of minimizing the cost incurred by context management systems in responding to context queries. Our novel algorithms enable context queries and sub-queries to reuse and repurpose cached context in an efficient manner. This approach differs from traditional data caching approaches in three main ways. First, we make selective context cache admissions using no prior knowledge of the context or the context query load. Secondly, we develop and incorporate innovative heuristic models to calculate the expected performance of caching an item when making the decisions. Thirdly, our strategy defines a time-aware continuous cache action space. We present two reinforcement learning agents: a value function estimating actor-critic agent and a policy search agent using the deep deterministic policy gradient method. The paper also proposes adaptive policies such as eviction and cache memory scaling to complement our objective. Our method is evaluated using a synthetically generated load of context sub-queries and a synthetic data set inspired by real-world data and query samples. We further investigate optimal adaptive caching configurations under different settings. This paper presents, compares, and discusses our findings that the proposed selective caching methods reach short- and long-term cost- and performance-efficiency. The paper demonstrates that the proposed methods outperform other modes of context management, such as redirector mode, database mode, and the cache-all policy, by up to 60% in cost efficiency.
Submitted 9 February, 2023; v1 submitted 22 December, 2022;
originally announced December 2022.
-
From Traditional Adaptive Data Caching to Adaptive Context Caching: A Survey
Authors:
Shakthi Weerasinghe,
Arkady Zaslavsky,
Seng W. Loke,
Alireza Hassani,
Amin Abken,
Alexey Medvedev
Abstract:
Context information is in demand more than ever with the rapid increase in the number of context-aware Internet of Things applications developed worldwide. Research in context and context-awareness is being conducted to broaden its applicability in light of many practical and technical challenges. One of the challenges is improving performance when responding to a large number of context queries. Context Management Platforms that infer and deliver context to applications measure this problem using Quality of Service (QoS) parameters. Although caching is a proven way to improve QoS, transiency of context and features such as variability and heterogeneity of context queries pose an additional real-time cost management problem. This paper presents a critical survey of the state-of-the-art in adaptive data caching with the objective of developing a body of knowledge in cost- and performance-efficient adaptive caching strategies. We comprehensively survey a large number of research publications and evaluate, compare, and contrast different techniques, policies, approaches, and schemes in adaptive caching. Our critical analysis is motivated by the focus on adaptively caching context as a core research problem. A formal definition for adaptive context caching is then proposed, followed by identified features and requirements of a well-designed, objective optimal adaptive context caching strategy.
Submitted 9 February, 2023; v1 submitted 21 November, 2022;
originally announced November 2022.
-
OneFormer: One Transformer to Rule Universal Image Segmentation
Authors:
Jitesh Jain,
Jiachen Li,
MangTik Chiu,
Ali Hassani,
Nikita Orlov,
Humphrey Shi
Abstract:
Universal Image Segmentation is not a new concept. Past attempts to unify image segmentation in the last decades include scene parsing, panoptic segmentation, and, more recently, new panoptic architectures. However, such panoptic architectures do not truly unify image segmentation because they need to be trained individually on the semantic, instance, or panoptic segmentation to achieve the best performance. Ideally, a truly universal framework should be trained only once and achieve SOTA performance across all three image segmentation tasks. To that end, we propose OneFormer, a universal image segmentation framework that unifies segmentation with a multi-task train-once design. We first propose a task-conditioned joint training strategy that enables training on ground truths of each domain (semantic, instance, and panoptic segmentation) within a single multi-task training process. Secondly, we introduce a task token to condition our model on the task at hand, making our model task-dynamic to support multi-task training and inference. Thirdly, we propose using a query-text contrastive loss during training to establish better inter-task and inter-class distinctions. Notably, our single OneFormer model outperforms specialized Mask2Former models across all three segmentation tasks on ADE20k, CityScapes, and COCO, despite the latter being trained on each of the three tasks individually with three times the resources. With new ConvNeXt and DiNAT backbones, we observe even more performance improvement. We believe OneFormer is a significant step towards making image segmentation more universal and accessible. To support further research, we open-source our code and models at https://github.com/SHI-Labs/OneFormer
Submitted 26 December, 2022; v1 submitted 10 November, 2022;
originally announced November 2022.
-
StyleNAT: Giving Each Head a New Perspective
Authors:
Steven Walton,
Ali Hassani,
Xingqian Xu,
Zhangyang Wang,
Humphrey Shi
Abstract:
Image generation has been a long sought-after but challenging task, and performing the generation task in an efficient manner is similarly difficult. Often researchers attempt to create a "one size fits all" generator, where there are few differences in the parameter space for drastically different datasets. Herein, we present a new transformer-based framework, dubbed StyleNAT, targeting high-quality image generation with superior efficiency and flexibility. At the core of our model is a carefully designed framework that partitions attention heads to capture local and global information, which is achieved using Neighborhood Attention (NA). With different heads able to pay attention to varying receptive fields, the model is able to better combine this information and adapt, in a highly flexible manner, to the data at hand. StyleNAT attains a new SOTA FID score on FFHQ-256 with 2.046, beating prior art with convolutional models such as StyleGAN-XL and transformers such as HIT and StyleSwin, and a new transformer SOTA on FFHQ-1024 with an FID score of 4.174. These results show a 6.4% improvement on FFHQ-256 scores when compared to StyleGAN-XL, with a 28% reduction in the number of parameters and a 56% improvement in sampling throughput. Code and models will be open-sourced at https://github.com/SHI-Labs/StyleNAT.
Submitted 12 August, 2023; v1 submitted 10 November, 2022;
originally announced November 2022.
-
An Adaptive Neighborhood Partition Full Conditional Mutual Information Maximization Method for Feature Selection
Authors:
Gaoshuai Wang,
Fabrice Lauri,
Pu Wang,
Hongyuan Luo,
Amir Hajjam El Hassani
Abstract:
Feature selection is used to eliminate redundant features and keep relevant ones; it can enhance the performance of machine learning algorithms and accelerate computation. Among the various methods, mutual information has attracted increasing attention as an effective criterion for measuring variable correlation. However, current works mainly focus on maximizing feature relevancy with the class label and minimizing feature redundancy within the selected features. We argue that pursuing feature redundancy minimization is reasonable but not necessary, because some so-called redundant features also carry useful information that improves performance. In terms of mutual information calculation, the true relationship between two variables may be distorted without a proper neighborhood partition. Traditional methods usually split continuous variables into several intervals and even ignore this influence. We theoretically prove how variable fluctuation negatively influences mutual information calculation. To remove these obstacles, we propose a full conditional mutual information maximization method (FCMIM) for feature selection, which only considers feature relevancy in two aspects. To obtain a better partition effect and eliminate the negative influence of attribute fluctuation, we put forward an adaptive neighborhood partition algorithm (ANP) that uses feedback from the mutual information maximization algorithm; the backpropagation process helps search for a proper neighborhood partition parameter. We compare our method with several mutual information methods on 17 benchmark datasets. FCMIM yields better results than the other methods across different classifiers. The results also show that ANP improves the performance of nearly all the mutual information methods.
Submitted 27 June, 2023; v1 submitted 21 October, 2022;
originally announced October 2022.
-
A GA-like Dynamic Probability Method With Mutual Information for Feature Selection
Authors:
Gaoshuai Wang,
Fabrice Lauri,
Amir Hajjam El Hassani
Abstract:
Feature selection plays a vital role in improving classifier performance. However, current methods do not effectively distinguish the complex interactions among the selected features. To further remove these hidden negative interactions, we propose a GA-like dynamic probability (GADP) method with mutual information, which has a two-layer structure. The first layer applies the mutual information method to obtain a primary feature subset. The GA-like dynamic probability algorithm, as the second layer, mines more supportive features based on the former candidate features. Essentially, the GA-like method is a population-based algorithm, so its working mechanism is similar to that of the GA. Unlike popular works, which frequently focus on improving the GA's operators to enhance search ability and lower convergence time, we abandon the GA's operators and employ a dynamic probability that relies on the performance of each chromosome to determine feature selection in the new generation. The dynamic probability mechanism significantly reduces the number of parameters in the GA, making it easy to use. As each gene's probability is independent, chromosome variety in GADP is more notable than in the traditional GA, which ensures GADP has a wider search space and selects relevant features more effectively and accurately. To verify our method, we evaluate it under multiple conditions on 15 datasets. The results show that the proposed method outperforms the alternatives and generally has the best accuracy. Further, we also compare the proposed model to popular heuristic methods such as POS, FPA, and WOA, and our model still holds advantages over them.
Submitted 21 October, 2022;
originally announced October 2022.
-
Dilated Neighborhood Attention Transformer
Authors:
Ali Hassani,
Humphrey Shi
Abstract:
Transformers are quickly becoming one of the most heavily applied deep learning architectures across modalities, domains, and tasks. In vision, on top of ongoing efforts into plain transformers, hierarchical transformers have also gained significant attention, thanks to their performance and easy integration into existing frameworks. These models typically employ localized attention mechanisms, such as the sliding-window Neighborhood Attention (NA) or Swin Transformer's Shifted Window Self Attention. While effective at reducing self attention's quadratic complexity, local attention weakens two of the most desirable properties of self attention: long range inter-dependency modeling, and global receptive field. In this paper, we introduce Dilated Neighborhood Attention (DiNA), a natural, flexible and efficient extension to NA that can capture more global context and expand receptive fields exponentially at no additional cost. NA's local attention and DiNA's sparse global attention complement each other, and therefore we introduce Dilated Neighborhood Attention Transformer (DiNAT), a new hierarchical vision transformer built upon both. DiNAT variants enjoy significant improvements over strong baselines such as NAT, Swin, and ConvNeXt. Our large model is faster and ahead of its Swin counterpart by 1.6% box AP in COCO object detection, 1.4% mask AP in COCO instance segmentation, and 1.4% mIoU in ADE20K semantic segmentation. Paired with new frameworks, our large variant is the new state of the art panoptic segmentation model on COCO (58.5 PQ) and ADE20K (49.4 PQ), and instance segmentation model on Cityscapes (45.1 AP) and ADE20K (35.4 AP) (no extra data). It also matches the state of the art specialized semantic segmentation models on ADE20K (58.1 mIoU), and ranks second on Cityscapes (84.5 mIoU) (no extra data).
Submitted 16 January, 2023; v1 submitted 29 September, 2022;
originally announced September 2022.
-
AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition
Authors:
Yulin Wang,
Yang Yue,
Xinhong Xu,
Ali Hassani,
Victor Kulikov,
Nikita Orlov,
Shiji Song,
Humphrey Shi,
Gao Huang
Abstract:
Recent research has revealed that reducing the temporal and spatial redundancy are both effective approaches towards efficient video recognition, e.g., allocating the majority of computation to a task-relevant subset of frames or the most valuable image regions of each frame. However, in most existing works, either type of redundancy is typically modeled with another absent. This paper explores the unified formulation of spatial-temporal dynamic computation on top of the recently proposed AdaFocusV2 algorithm, contributing to an improved AdaFocusV3 framework. Our method reduces the computational cost by activating the expensive high-capacity network only on some small but informative 3D video cubes. These cubes are cropped from the space formed by frame height, width, and video duration, while their locations are adaptively determined with a light-weighted policy network on a per-sample basis. At test time, the number of the cubes corresponding to each video is dynamically configured, i.e., video cubes are processed sequentially until a sufficiently reliable prediction is produced. Notably, AdaFocusV3 can be effectively trained by approximating the non-differentiable cropping operation with the interpolation of deep features. Extensive empirical results on six benchmark datasets (i.e., ActivityNet, FCVID, Mini-Kinetics, Something-Something V1&V2 and Diving48) demonstrate that our model is considerably more efficient than competitive baselines.
Submitted 27 September, 2022;
originally announced September 2022.
-
Distilling Facial Knowledge With Teacher-Tasks: Semantic-Segmentation-Features For Pose-Invariant Face-Recognition
Authors:
Ali Hassani,
Zaid El Shair,
Rafi Ud Duala Refat,
Hafiz Malik
Abstract:
This paper demonstrates a novel approach to improve face-recognition pose-invariance using semantic-segmentation features. The proposed Seg-Distilled-ID network jointly learns identification and semantic-segmentation tasks, where the segmentation task is then "distilled" (MobileNet encoder). Performance is benchmarked against three state-of-the-art encoders on a publicly available data-set emphasizing head-pose variations. Experimental evaluations show the Seg-Distilled-ID network shows notable robustness benefits, achieving 99.9% test-accuracy in comparison to 81.6% on ResNet-101, 96.1% on VGG-19 and 96.3% on InceptionV3. This is achieved using approximately one-tenth of the top encoder's inference parameters. These results demonstrate distilling semantic-segmentation features can efficiently address face-recognition pose-invariance.
Submitted 2 September, 2022;
originally announced September 2022.
-
Transfer functions of FXLMS-based Multi-channel Multi-tone Active Noise Equalizers
Authors:
Miguel Ferrer,
María de Diego,
Gema Piñero,
Amin Hassani,
Marc Moonen,
Alberto González
Abstract:
Multi-channel Multi-tone Active Noise Equalizers can achieve different user-selected noise spectrum profiles even at different space positions. They can apply a different equalization factor at each noise frequency component and each control point. Theoretically, the value of the transfer function at the frequencies where the noise signal has energy is determined by the equalizer configuration. In this work, we show how to calculate these transfer functions with a double aim: to verify that at the frequencies of interest the values imposed by the equalizer settings are obtained, and to characterize the behavior of these transfer functions in the rest of the spectrum, as well as to get clues to predict the convergence behaviour of the algorithm. The information provided thanks to these transfer functions serves as a practical alternative to the cumbersome statistical analysis of convergence, whose results are often of no practical use.
Submitted 3 July, 2022;
originally announced July 2022.
-
Refactorisation of the Dirichlet convolution
Authors:
Ansar El Hassani
Abstract:
We present a new way to factor the Dirichlet convolution for completely multiplicative functions, which led us to construct a ring that arises from the operations involved in the factorisation. We conclude with some identities that were found during this work. An application of the results gives a generalisation of the following Hardy formula: $$ζ(x)^{2} = ζ(2x)\sum_{m=1}^{+\infty} \frac{2^{ω(m)}}{m^{x}}$$ namely: $$|ζ(z)|^{2} = ζ(2x)\sum_{m=1}^{+\infty}\frac{1}{m^{x}}2^{ω(m)}\prod_{p | m , p \in \mathbb{P}}^{ω(m)}\cos(y\ln(p^{v_{p}(m)}))$$ where $z = x+iy$ is a complex number with $\Re(z) > 1$ (and $x > 1$ in Hardy's formula), $ω(m)$ is the number of distinct primes dividing $m$, and $v_{p}(m)$ is the power of the prime $p$ in $m$.
Submitted 10 June, 2022;
originally announced June 2022.
-
DiSparse: Disentangled Sparsification for Multitask Model Compression
Authors:
Xinglong Sun,
Ali Hassani,
Zhangyang Wang,
Gao Huang,
Humphrey Shi
Abstract:
Despite the popularity of Model Compression and Multitask Learning, how to effectively compress a multitask model has been less thoroughly analyzed due to the challenging entanglement of tasks in the parameter space. In this paper, we propose DiSparse, a simple, effective, and first-of-its-kind multitask pruning and sparse training scheme. We consider each task independently by disentangling the importance measurement and take the unanimous decisions among all tasks when performing parameter pruning and selection. Our experimental results demonstrate superior performance on various configurations and settings compared to popular sparse training and pruning methods. Besides the effectiveness in compression, DiSparse also provides a powerful tool to the multitask learning community. Surprisingly, we even observed better performance than some dedicated multitask learning methods in several cases despite the high model sparsity enforced by DiSparse. We analyzed the pruning masks generated with DiSparse and observed strikingly similar sparse network architecture identified by each task even before the training starts. We also observe the existence of a "watershed" layer where the task relatedness sharply drops, implying no benefits in continued parameters sharing. Our code and models will be available at: https://github.com/SHI-Labs/DiSparse-Multitask-Model-Compression.
Submitted 9 June, 2022;
originally announced June 2022.
-
Neighborhood Attention Transformer
Authors:
Ali Hassani,
Steven Walton,
Jiachen Li,
Shen Li,
Humphrey Shi
Abstract:
We present Neighborhood Attention (NA), the first efficient and scalable sliding-window attention mechanism for vision. NA is a pixel-wise operation, localizing self attention (SA) to the nearest neighboring pixels, and therefore enjoys a linear time and space complexity compared to the quadratic complexity of SA. The sliding-window pattern allows NA's receptive field to grow without needing extra pixel shifts, and preserves translational equivariance, unlike Swin Transformer's Window Self Attention (WSA). We develop NATTEN (Neighborhood Attention Extension), a Python package with efficient C++ and CUDA kernels, which allows NA to run up to 40% faster than Swin's WSA while using up to 25% less memory. We further present Neighborhood Attention Transformer (NAT), a new hierarchical transformer design based on NA that boosts image classification and downstream vision performance. Experimental results on NAT are competitive; NAT-Tiny reaches 83.2% top-1 accuracy on ImageNet, 51.4% mAP on MS-COCO and 48.4% mIoU on ADE20K, which is 1.9% ImageNet accuracy, 1.0% COCO mAP, and 2.6% ADE20K mIoU improvement over a Swin model with similar size. To support more research based on sliding-window attention, we open source our project and release our checkpoints at: https://github.com/SHI-Labs/Neighborhood-Attention-Transformer .
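To make the idea of localizing attention to nearest neighbors concrete, here is a minimal single-head, 1-D sketch in pure Python. It is not the NATTEN implementation. The function name, the list-of-lists layout, and the boundary rule (shifting the window inward near the edges so it always holds `window` keys) are assumptions made for illustration.

```python
import math

def neighborhood_attention_1d(q, k, v, window):
    """Single-head 1-D neighborhood attention over lists of vectors.

    Query i attends only to `window` consecutive keys around position i.
    Near the edges the window is shifted inward (assumed convention) so
    that every query sees exactly `window` keys.
    """
    n, d = len(q), len(q[0])
    assert 1 <= window <= n
    out = []
    for i in range(n):
        start = min(max(i - window // 2, 0), n - window)
        idx = range(start, start + window)
        # Scaled dot-product scores against the local keys only.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in idx]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j][c] for w, j in zip(weights, idx)) / total
                    for c in range(len(v[0]))])
    return out

tokens = [[float(i), 1.0] for i in range(6)]  # hypothetical toy input
local = neighborhood_attention_1d(tokens, tokens, tokens, window=3)
```

Each query scores `window` keys, so the total work is proportional to `n * window` rather than `n * n`; setting `window = n` recovers ordinary self attention in this sketch.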
Submitted 16 May, 2023; v1 submitted 14 April, 2022;
originally announced April 2022.
-
Adversarial Attacks on Speech Recognition Systems for Mission-Critical Applications: A Survey
Authors:
Ngoc Dung Huynh,
Mohamed Reda Bouadjenek,
Imran Razzak,
Kevin Lee,
Chetan Arora,
Ali Hassani,
Arkady Zaslavsky
Abstract:
A Machine-Critical Application is a system that is fundamentally necessary to the success of specific and sensitive operations such as search and recovery, rescue, military, and emergency management actions. Recent advances in Machine Learning, Natural Language Processing, voice recognition, and speech processing technologies have naturally allowed the development and deployment of speech-based conversational interfaces to interact with various machine-critical applications. While these conversational interfaces have allowed users to give voice commands to carry out strategic and critical activities, their robustness to adversarial attacks remains unclear. Indeed, Adversarial Artificial Intelligence (AI), which refers to a set of techniques that attempt to fool machine learning models with deceptive data, is a growing threat in the AI and machine learning research community, in particular for machine-critical applications. The most common goal of adversarial attacks is to cause a malfunction in a machine learning model. An adversarial attack might entail presenting a model with inaccurate or fabricated samples as its training data, or introducing maliciously designed data to deceive an already trained model. Focusing on speech recognition for machine-critical applications, in this paper we first review existing speech recognition techniques, then investigate the effectiveness of adversarial attacks and defenses against these systems, before outlining research challenges, defense recommendations, and future work. This paper is expected to serve researchers and practitioners as a reference to help them understand the challenges, position themselves and, ultimately, improve existing models of speech recognition for mission-critical applications. Keywords: Mission-Critical Applications, Adversarial AI, Speech Recognition Systems.
Submitted 21 February, 2022;
originally announced February 2022.
-
Dynamic Models of Spherical Parallel Robots for Model-Based Control Schemes
Authors:
Ali Hassani,
Abbas Bataleblu,
S. A. Khalilpour,
Hamid D. Taghirad,
Philippe Cardou
Abstract:
In this paper, the derivation of different forms of the dynamic formulation of spherical parallel robots (SPRs) is investigated. These formulations include the explicit dynamic forms, linear regressor, and Slotine-Li (SL) regressor, which are required for the design and implementation of the vast majority of model-based controllers and dynamic parameter identification schemes. To this end, the implicit dynamics of SPRs are first formulated using the principle of virtual work in task-space, and then by using an extension, their explicit dynamic formulation is derived. The dynamic equation is then analytically reformulated into linear and SL regression form with respect to the inertial parameters, and by using the Gauss-Jordan procedure, it is reduced to a unique and closed-form structure. Finally, to illustrate the effectiveness of the proposed method, two different SPRs, namely the ARAS-Diamond and the 3-RRR, are examined as case studies. The obtained results are verified using the MSC-ADAMS software, and are shared with interested audiences for public access.
Submitted 1 October, 2021;
originally announced October 2021.
-
ConvMLP: Hierarchical Convolutional MLPs for Vision
Authors:
Jiachen Li,
Ali Hassani,
Steven Walton,
Humphrey Shi
Abstract:
MLP-based architectures, which consist of a sequence of consecutive multi-layer perceptron blocks, have recently been found to reach comparable results to convolutional and transformer-based methods. However, most adopt spatial MLPs which take fixed dimension inputs, therefore making it difficult to apply them to downstream tasks, such as object detection and semantic segmentation. Moreover, single-stage designs further limit performance in other computer vision tasks and fully connected layers bear heavy computation. To tackle these problems, we propose ConvMLP: a hierarchical Convolutional MLP for visual recognition, which is a light-weight, stage-wise, co-design of convolution layers, and MLPs. In particular, ConvMLP-S achieves 76.8% top-1 accuracy on ImageNet-1k with 9M parameters and 2.4G MACs (15% and 19% of MLP-Mixer-B/16, respectively). Experiments on object detection and semantic segmentation further show that visual representation learned by ConvMLP can be seamlessly transferred and achieve competitive results with fewer parameters. Our code and pre-trained models are publicly available at https://github.com/SHI-Labs/Convolutional-MLPs.
Submitted 18 September, 2021; v1 submitted 9 September, 2021;
originally announced September 2021.
-
Escaping the Big Data Paradigm with Compact Transformers
Authors:
Ali Hassani,
Steven Walton,
Nikhil Shah,
Abulikemu Abuduweili,
Jiachen Li,
Humphrey Shi
Abstract:
With the rise of Transformers as the standard for language processing, and their advancements in computer vision, there has been a corresponding growth in parameter size and amounts of training data. Many have come to believe that because of this, transformers are not suitable for small sets of data. This trend leads to concerns such as limited availability of data in certain scientific domains and the exclusion of those with limited resources from research in the field. In this paper, we aim to present an approach for small-scale learning by introducing Compact Transformers. We show for the first time that with the right size and convolutional tokenization, transformers can avoid overfitting and outperform state-of-the-art CNNs on small datasets. Our models are flexible in terms of model size, and can have as few as 0.28M parameters while achieving competitive results. Our best model can reach 98% accuracy when training from scratch on CIFAR-10 with only 3.7M parameters, which is a significant improvement in data-efficiency over previous Transformer-based models: it is over 10x smaller than other transformers and 15% the size of ResNet50 while achieving similar performance. CCT also outperforms many modern CNN-based approaches, and even some recent NAS-based approaches. Additionally, we obtain a new SOTA result on Flowers-102 with 99.76% top-1 accuracy, and improve upon the existing baseline on ImageNet (82.71% accuracy with 29% as many parameters as ViT), as well as NLP tasks. Our simple and compact design for transformers makes them more feasible to study for those with limited computing resources and/or dealing with small datasets, while extending existing research efforts in data-efficient transformers. Our code and pre-trained models are publicly available at https://github.com/SHI-Labs/Compact-Transformers.
Submitted 7 June, 2022; v1 submitted 12 April, 2021;
originally announced April 2021.
-
Text Mining using Nonnegative Matrix Factorization and Latent Semantic Analysis
Authors:
Ali Hassani,
Amir Iranmanesh,
Najme Mansouri
Abstract:
Text clustering is arguably one of the most important topics in modern data mining. Nevertheless, text data require tokenization which usually yields a very large and highly sparse term-document matrix, which is usually difficult to process using conventional machine learning algorithms. Methods such as Latent Semantic Analysis have helped mitigate this issue, but are nevertheless not completely stable in practice. As a result, we propose a new feature agglomeration method based on Nonnegative Matrix Factorization, which is employed to separate the terms into groups, and then each group's term vectors are agglomerated into a new feature vector. Together, these feature vectors create a new feature space much more suitable for clustering. In addition, we propose a new deterministic initialization for spherical K-Means, which proves very useful for this specific type of data. In order to evaluate the proposed method, we compare it to some of the latest research done in this field, as well as some of the most practiced methods. In our experiments, we conclude that the proposed method either significantly improves clustering performance, or maintains the performance of other methods, while improving stability in results.
Submitted 24 February, 2020; v1 submitted 12 November, 2019;
originally announced November 2019.
-
DISCERN: Diversity-based Selection of Centroids for k-Estimation and Rapid Non-stochastic Clustering
Authors:
Ali Hassani,
Amir Iranmanesh,
Mahdi Eftekhari,
Abbas Salemi
Abstract:
One of the applications of center-based clustering algorithms such as K-Means is partitioning data points into K clusters. In some examples, the feature space relates to the underlying problem we are trying to solve, and sometimes we can obtain a suitable feature space. Nevertheless, while K-Means is one of the most efficient offline clustering algorithms, it is not equipped to estimate the number of clusters, which is useful in some practical cases. Other practical methods which do are simply too complex, as they require at least one run of K-Means for each possible K. In order to address this issue, we propose a K-Means initialization similar to K-Means++, which would be able to estimate K based on the feature space while finding suitable initial centroids for K-Means in a deterministic manner. Then we compare the proposed method, DISCERN, with a few of the most practical K estimation methods, while also comparing clustering results of K-Means when initialized randomly, using K-Means++ and using DISCERN. The results show improvement in both the estimation and final clustering performance.
Submitted 22 September, 2020; v1 submitted 14 October, 2019;
originally announced October 2019.
-
Optimal Control of Storage Regeneration with Repair Codes
Authors:
Francesco De Pellegrini,
Rachid El Azouzi,
Alonso Silva,
Olfa Hassani
Abstract:
High availability of containerized applications requires robust storage of the applications' state. Since basic replication techniques are extremely costly at scale, storage space requirements can be reduced by means of erasure or repair codes. In this paper we address storage regeneration using repair codes, a robust distributed storage technique with no need to fully restore the whole state in case of failure. In fact, only the lost servers' content is replaced. To do so, new clean-slate storage units are made operational at a cost for activating new storage servers and a cost for the transfer of repair data. Our goal is to guarantee maximal availability of containers' state files by a given deadline. Upon a fault occurring at a subset of the storage servers, we aim at ensuring that they are repaired by a given deadline. We introduce a controlled fluid model and derive the optimal activation policy to replace servers under such correlated faults. The solution concept is the optimal control of regeneration via the Pontryagin minimum principle. We characterise feasibility conditions and we prove that the optimal policy is of threshold type. Numerical results describe how to apply the model for system dimensioning and show the tradeoff between activation of servers and communication cost.
Submitted 8 November, 2017;
originally announced November 2017.
-
Design, Development and Evaluation of a UAV to Study Air Quality in Qatar
Authors:
Khalid Al-Hajjaji,
Mouadh Ezzin,
Husain Khamdan,
Abdelhakim El Hassani,
Nizar Zorba
Abstract:
Measuring gases for air quality monitoring is a challenging task that demands long observation times and large numbers of sensors. The aim of this project is to develop a partially autonomous unmanned aerial vehicle (UAV) equipped with sensors, in order to monitor and collect real-time air quality data in designated areas and send it to the ground base. This project is designed and implemented by a multidisciplinary team from electrical and computer engineering departments. The electrical engineering team is responsible for implementing air quality sensors that detect real-time data and for transmitting it from the plane to the ground. The computer engineering team, on the other hand, is in charge of interfacing the sensors and providing a platform to view and visualize air quality data and live video streaming. The proposed project contains several sensors to measure temperature, humidity, dust, CO, CO2 and O3. The collected data is transmitted to a server over a wireless internet connection, and the server stores these data and supplies them in semi-real time to any party who has permission to access them through an Android phone or website. The developed UAV has undergone several field tests at Al Shamal airport in Qatar, with interesting results and proof-of-concept outcomes.
Submitted 17 September, 2017;
originally announced September 2017.
-
Factorisation of the product of Dirichlet series of completely multiplicative functions
Authors:
Ansar El Hassani
Abstract:
In the first chapter, we present a computation of the squared modulus of L-functions associated with a Dirichlet character. This computation raises the question of whether a certain ring of arithmetic multiplicative functions exists and whether it is unique. This search has led to the construction of that ring in chapter two. Finally, in the third chapter, we present some propositions associated with this ring. The result below is one of the main results of this work:
For two completely multiplicative functions F and G, and $ s $ a complex number such that the Dirichlet series $ D(F,s) $ and $ D(G,s) $ converge:
$ \forall F,G \in \mathbb{M}_{c} : D(F,s) \times D(G,s) = D(F \times G,2s) \times D(F \square G,s) $
where the operation $ \square $ is defined in chapter two as the sum of the previously mentioned ring. Here are some similar versions, with $ s = x+iy $ :
$ \forall F, G \in \mathbb{M}_{c} : ~ D(F,s) \times D(G,\overline{s}) = D(F \times G,2x) \times D(\frac{F}{\text{Id}_{e}^{iy}} \square \frac{G}{\text{Id}_{e}^{-iy}}, x) $
$ \forall F, G \in \mathbb{M}_{c} : ~ |D(F,s)|^{2} = D(|F|^{2},2x) \times D(\frac{F}{\text{Id}_{e}^{iy}} \square \overline{\frac{F}{\text{Id}_{e}^{iy}}}, x) $
Submitted 8 February, 2017;
originally announced February 2017.
-
Hadronic shift in pionic hydrogen
Authors:
M. Hennebach,
D. F. Anagnostopoulos,
A. Dax,
H. Fuhrmann,
D. Gotta,
A. Gruber,
A. Hirtl,
P. Indelicato,
Y. -W. Liu,
B. Manil,
V. E. Markushin,
A. J. Rusi el Hassani,
L. M. Simons,
M. Trassinelli,
J. Zmeskal
Abstract:
The hadronic shift in pionic hydrogen has been redetermined to be $ε_{1s}=7.086\,\pm\,0.007(stat)\,\pm\,0.006(sys)$\,eV by X-ray spectroscopy of ground state transitions applying various energy calibration schemes. The experiment was performed at the high-intensity low-energy pion beam of the Paul Scherrer Institut by using the cyclotron trap and an ultimate-resolution Bragg spectrometer with bent crystals.
Submitted 17 December, 2014; v1 submitted 25 June, 2014;
originally announced June 2014.
-
Computer Aided Tolerancing Based on Analysis and Synthetizes of Tolerances Method
Authors:
Abdessalem Hassani,
Nizar Aifaoui,
Abdelmajid Benamara,
Serge Samper
Abstract:
The tolerancing step is of great importance in the design process. It characterises the relationship between the different sectors of the product life cycle: Design, Manufacturing and Control. Several methods exist to assist the tolerancing process in design. Based on arithmetic and statistical methods, this paper presents a new approach to the analysis and verification of tolerances. The chosen approach is based on the Worst Case Method as an arithmetic method and the Monte Carlo method as a statistical method. In this paper, we compare these methods and present our main approach, which is validated using an example of 1-D tolerancing.
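The contrast between the two methods named above can be illustrated with a small 1-D stack-up sketch. This is not the paper's procedure. The tolerance values, the uniform sampling distribution, the trial count, and the function names are all hypothetical choices made for illustration.

```python
import random

def worst_case(tolerances):
    """Arithmetic (worst-case) stack-up: the +/- tolerances simply add."""
    return sum(tolerances)

def monte_carlo(tolerances, trials=10_000, seed=0):
    """Statistical stack-up: draw each deviation uniformly within its
    tolerance band and return the largest absolute total deviation seen."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    worst_seen = 0.0
    for _ in range(trials):
        total = sum(rng.uniform(-t, t) for t in tolerances)
        worst_seen = max(worst_seen, abs(total))
    return worst_seen

tolerances = [0.05, 0.10, 0.02, 0.08]  # hypothetical +/- tolerances in mm
wc = worst_case(tolerances)
mc = monte_carlo(tolerances)
```

The worst-case figure bounds every sampled outcome, while the Monte Carlo figure reflects how rarely all deviations reach their limits at once, which is why statistical analysis typically permits looser individual tolerances.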
Submitted 13 February, 2011;
originally announced February 2011.
-
On Strong $(a)$-Rings
Authors:
Najib Mahdou,
Aziza Rahmouni Hassani
Abstract:
In this paper, we introduce a strong property $(A)$ and we study the transfer of property $(A)$ and strong property $(A)$ in trivial ring extensions and amalgamated duplication of a ring along an ideal. We also exhibit a class of rings which satisfy property $(A)$ and do not satisfy strong property $(A)$.
Submitted 10 August, 2009;
originally announced August 2009.