Mehrsan Javan

Montreal, Quebec, Canada
2K followers · 500+ connections

About

I am an entrepreneur and technology executive, currently the CTO and Co-founder of…

Experience & Education

  • Sportlogiq

Volunteer Experience

  • Exhibits co-chair

    International Conference on Computer Vision (ICCV) 2021

    - 1 year 1 month

    Science and Technology

  • Secretary

    McGill IEEE Student Branch

    - 1 year

    Education

  • President

    McGill Iranian Student Association (MISA)

    - 2 years 4 months

    Education

Publications

  • Actor-Transformers for Group Activity Recognition

    2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2020)

    This paper strives to recognize individual actions and group activities from videos. While existing solutions for this challenging problem explicitly model spatial and temporal relationships based on location of individual actors, we propose an actor-transformer model able to learn and selectively extract information relevant for group activity recognition. We feed the transformer with rich actor-specific static and dynamic representations expressed by features from a 2D pose network and 3D CNN, respectively. We empirically study different ways to combine these representations and show their complementary benefits. Experiments show what is important to transform and how it should be transformed. What is more, actor-transformers achieve state-of-the-art results on two publicly available benchmarks for group activity recognition, outperforming the previous best published results by a considerable margin.

  • Optimizing Through Learned Errors for Accurate Sports Field Registration

    The IEEE Winter Conference on Applications of Computer Vision (WACV)

    We propose an optimization-based framework to register sports field templates onto broadcast videos. For accurate registration we go beyond the prevalent feed-forward paradigm. Instead, we propose to train a deep network that regresses the registration error, and then register images by finding the registration parameters that minimize the regressed error. We demonstrate the effectiveness of our method by applying it to real-world sports broadcast videos, outperforming the state of the art. We further apply our method on a synthetic toy example and demonstrate that our method brings significant gains even when the problem is simplified and unlimited training data is available.

  • Pose Guided Gated Fusion for Person Re-identification

    The IEEE Winter Conference on Applications of Computer Vision (WACV)

    Person re-identification is an important yet challenging problem in visual recognition. Despite the recent advances with deep learning (DL) models for spatio-temporal and multi-modal fusion, re-identification approaches often fail to leverage the contextual information (e.g., pose and illumination) to dynamically select the most discriminant convolutional filters (i.e., appearance features) for feature representation and inference. State-of-the-art techniques for gated fusion employ complex dedicated part- or attention-based architectures for late fusion, and do not incorporate pose and appearance information to train the backbone network. In this paper, a new DL model is proposed for pose-guided re-identification, comprised of a deep backbone, pose estimation, and gated fusion network. Given a query image of an individual, the backbone convolutional NN produces a feature embedding required for pair-wise matching with embeddings for reference images, where feature maps from the pose network and from mid-level CNN layers are combined by the gated fusion network to generate pose-guided gating. The proposed framework allows to dynamically activate the most discriminant CNN filters based on pose information in order to perform a finer grained recognition. Extensive experiments on three challenging benchmark datasets indicate that integrating the pose-guided gated fusion into the state-of-the-art re-identification backbone architecture allows to improve their recognition accuracy. Experimental results also support our intuition on the advantages of gating backbone appearance information using the pose feature maps at mid-level CNN layers.

  • Learning Agent Representations for Ice Hockey

    Neural Information Processing Systems 2020 - NeurIPS

    Team sports is a new application domain for agent modeling with high real-world impact. A fundamental challenge for modeling professional players is their large number (over 1K), which includes many bench players with sparse participation in a game season. The diversity and sparsity of player observations make it difficult to extend previous agent representation models to the sports domain. This paper develops a new approach for agent representations, based on a Markov game model, that is tailored towards applications in professional ice hockey. We introduce a novel player representation via player generation framework where a variational encoder embeds player information with latent variables. The encoder learns a context-specific shared prior to induce a shrinkage effect for the posterior player representations, allowing it to share statistical information across players with different participations. To model the play dynamics in sequential sports data, we design a Variational Recurrent Ladder Agent Encoder (VaRLAE). It learns a contextualized player representation with a hierarchy of latent variables that effectively prevents latent posterior collapse. We validate our player representations in major sports analytics tasks. Our experimental results, based on a large dataset that contains over 4.5M events, show state-of-the-art performance for our VaRLAE on facilitating 1) identifying the acting player, 2) estimating expected goals, and 3) predicting the final score difference.

  • An On-Line, Real-Time Learning Method For Detecting Anomalies In Videos Using Spatio-Temporal Compositions

    Computer Vision and Image Understanding

    This paper presents an approach for detecting suspicious events in videos by using only the video itself as the training samples for valid behaviors. These salient events are obtained in real-time by detecting anomalous spatio-temporal regions in a densely sampled video. The method codes a video as a compact set of spatio-temporal volumes, while considering the uncertainty in the codebook construction. The spatio-temporal compositions of video volumes are modeled using a probabilistic framework, which calculates their likelihood of being normal in the video. This approach can be considered as an extension of the Bag of Video words (BOV) approaches, which represent a video as an order-less distribution of video volumes. The proposed method imposes spatial and temporal constraints on the video volumes so that an inference mechanism can estimate the probability density functions of their arrangements. Anomalous events are assumed to be video arrangements with very low frequency of occurrence. The algorithm is very fast and does not employ background subtraction, motion estimation or tracking. It is also robust to spatial and temporal scale changes, as well as some deformations. Experiments were performed on four video datasets of abnormal activities in both crowded and non-crowded scenes and under difficult illumination conditions. The proposed method outperformed all other approaches based on BOV that do not account for contextual information.

  • Human activity recognition in videos using a single example

    Image and Vision Computing

    This paper presents a novel approach for action recognition, localization and video matching based on a hierarchical codebook model of local spatio-temporal video volumes. Given a single example of an activity as a query video, the proposed method finds similar videos to the query in a target video dataset. The method is based on the bag of video words (BOV) representation and does not require prior knowledge about actions, background subtraction, motion estimation or tracking. It is also robust to spatial and temporal scale changes, as well as some deformations. The hierarchical algorithm codes a video as a compact set of spatio-temporal volumes, while considering their spatio-temporal compositions in order to account for spatial and temporal contextual information. This hierarchy is achieved by first constructing a codebook of spatio-temporal video volumes. Then a large contextual volume containing many spatio-temporal volumes (ensemble of volumes) is considered. These ensembles are used to construct a probabilistic model of video volumes and their spatio-temporal compositions. The algorithm was applied to three available video datasets for action recognition with different complexities (KTH, Weizmann, and MSR II) and the results were superior to other approaches, especially in the case of a single training example and cross-dataset action recognition.

  • Online Dominant and Anomalous Behavior Detection in Videos

    2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013)

    We present a novel approach for video parsing and simultaneous online learning of dominant and anomalous behaviors in surveillance videos. Dominant behaviors are those occurring frequently in videos and hence, usually do not attract much attention. They can be characterized by different complexities in space and time, ranging from a scene background to human activities. In contrast, an anomalous behavior is defined as having a low likelihood of occurrence. We do not employ any models of the entities in the scene in order to detect these two kinds of behaviors.
    In this paper, video events are learnt at each pixel without supervision using densely constructed spatio-temporal video volumes. Furthermore, the volumes are organized into large contextual graphs. These compositions are employed to construct a hierarchical codebook model for the dominant behaviors. By decomposing spatio-temporal contextual information into unique spatial and temporal contexts, the proposed framework learns the models of the dominant spatial and temporal events. Thus, it is ultimately capable of simultaneously modeling high-level behaviors as well as low-level spatial, temporal and spatio-temporal pixel level changes.

  • A Multi-Scale Hierarchical Codebook Method for Human Action Recognition in Videos Using a Single Example

    Computer and Robot Vision (CRV), 2012 Ninth Conference on

    This paper presents a novel action matching method based on a hierarchical codebook of local spatio-temporal video volumes (STVs). Given a single example of an activity as a query video, the proposed method finds similar videos to the query in a video dataset. It is based on the bag of video words (BOV) representation and does not require prior knowledge about actions, background subtraction, motion estimation or tracking. It is also robust to spatial and temporal scale changes, as well as some deformations. The hierarchical algorithm yields a compact subset of salient code words of STVs for the query video, and then the likelihood of similarity between the query video and all STVs in the target video is measured using a probabilistic inference mechanism. This hierarchy is achieved by initially constructing a codebook of STVs, while considering the uncertainty in the codebook construction, which is always ignored in current versions of the BOV approach. At the second level of the hierarchy, a large contextual region containing many STVs (Ensemble of STVs) is considered in order to construct a probabilistic model of STVs and their spatio-temporal compositions. At the third level of the hierarchy a codebook is formed for the ensembles of STVs based on their contextual similarities. The latter are the proposed labels (code words) for the actions being exhibited in the video. Finally, at the highest level of the hierarchy, the salient labels for the actions are selected by analyzing the high level code words assigned to each image pixel as a function of time. The algorithm was applied to three available video datasets for action recognition with different complexities (KTH, Weizmann, and MSR II) and the results were superior to other approaches, especially in the cases of a single training example and cross-dataset action recognition.

  • Imitative Learning Based Emotional Controller for Unknown Systems with Unstable Equilibrium

    International Journal of Intelligent Computing and Cybernetics

    In this paper, a novel approach for controlling unstable systems, or systems with an unstable equilibrium, by model-free controllers is proposed. The approach is based on imitative learning in the preliminary phase of learning and soft switching to interactive emotional learning. Moreover, fuzzy inference systems (FISs) are used to model the linguistic knowledge of the ascendancy and situated importance of the objectives. These FISs are used to attentionally modulate the stress signals for the emotional controller. The results of the proposed strategy on two benchmarks demonstrate the efficacy of this model-free control approach.

  • Static, Dynamic and Mixed Eccentricity Fault Diagnosis in Permanent Magnet Synchronous Motors

    IEEE Transactions on Industrial Electronics

    Mixed-eccentricity (ME) fault diagnosis has not so far been documented for permanent-magnet (PM) synchronous motors (PMSMs). This paper investigates how the static eccentricity (SE), dynamic eccentricity (DE), and ME in three-phase PMSMs can be detected. A novel index for noninvasive diagnosis of these eccentricities is introduced for a faulty PMSM. The nominated index is the amplitude of sideband components with a particular frequency pattern which is extracted from the spectrum of stator current. Using this index makes it possible to determine the occurrence, as well as the type and percentage, of eccentricity precisely. Meanwhile, the current spectrum of the faulty PMSM during a large span is inspected, and the ability of the proposed index is exhibited to detect eccentricity in faulty PMSMs with different loads. A novel theoretical scrutiny based on a magnetic field analysis is presented to prove the introduced index and generalize the illustrated fault recognition method. To show the merit of this index in the eccentricity detection and estimation of its severity, first, the correlation between the index and the SE and DE degrees is determined. Then, the type of the eccentricity is determined by a k-nearest neighbor classifier. At the next step, a three-layer artificial neural network is employed to estimate the eccentricity degree and its type. Finally, white Gaussian noise is added to the simulated current, and the robustness of the proposed index is analyzed with respect to the noise variance. In this paper, the PMSM under magnetic fault (demagnetization) and electrical faults (short and open circuits) is modeled, and the current spectrum of the faulty PMSM under demagnetization, short circuit, and open circuit faults is analyzed. It is demonstrated that the proposed index, which arises from eccentricity faults, is not generated in the current spectrum by magnetic or electrical faults.

  • A Novel Algorithm for Straightening Highly Curved Images of Human Chromosome

    Pattern Recognition Letters

    An effective chromosome image processing algorithm for straightening highly curved chromosomes is presented. This extends the domain of application of most of the previously reported algorithms to curved chromosomes. The proposed algorithm is based on calculating and analyzing the vertical and horizontal projection vectors of the binary image of the chromosome obtained at various rotation angles. The binary image is obtained by thresholding the input image after histogram modification. By minimizing a rotation score S, which is defined based on the relative amplitude of the main peaks in the horizontal projections of the rotated images, the most appropriately rotated image is identified. This image is used to determine the bending axis and, from it, the bending centre of the chromosome, which is then used to artificially straighten the curved chromosome. When applied to real images of highly curved chromosomes, the proposed algorithm straightened all of the chromosome images in the dataset. To assess the effectiveness of the proposed algorithm, the automatically extracted bending centers are compared to the manually defined ones on the whole data set. Moreover, the density profiles of the chromosomes (a one-dimensional vector obtained by intensity sampling of the chromosome along its longitudinal axis), which are the most important and most commonly used features for classification purposes, are identified and compared before and after chromosome straightening. The quantitative analysis of the results in both cases showed a close correlation between the two.

    Other authors
    • S.K. Setarehdan
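
Illustrative sketches of some of the methods described above follow; each is a hedged approximation, not the authors' published code. The actor-transformer abstract describes feeding a transformer with per-actor static (2D pose) and dynamic (3D CNN) features so that actors can exchange information before individual actions and the group activity are classified. Below is a minimal PyTorch-style sketch of that fusion idea; the feature dimensions, the simple summation fusion, and the mean-pooled activity head are illustrative assumptions rather than the exact architecture.

    # Hedged sketch of actor-level transformer fusion for group activity
    # recognition. Extractors, dimensions, and the summation fusion are
    # illustrative assumptions, not the published model.
    import torch
    import torch.nn as nn

    class ActorTransformer(nn.Module):
        def __init__(self, pose_dim=256, motion_dim=512, d_model=256,
                     n_heads=8, n_layers=2, n_actions=9, n_activities=8):
            super().__init__()
            # Project static (pose) and dynamic (3D CNN) actor features
            # into a shared embedding space.
            self.pose_proj = nn.Linear(pose_dim, d_model)
            self.motion_proj = nn.Linear(motion_dim, d_model)
            layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
            self.action_head = nn.Linear(d_model, n_actions)       # per actor
            self.activity_head = nn.Linear(d_model, n_activities)  # per group

        def forward(self, pose_feats, motion_feats):
            # pose_feats:   (batch, actors, pose_dim) static representations
            # motion_feats: (batch, actors, motion_dim) dynamic representations
            tokens = self.pose_proj(pose_feats) + self.motion_proj(motion_feats)
            refined = self.encoder(tokens)        # actors attend to each other
            actions = self.action_head(refined)                  # individual actions
            activity = self.activity_head(refined.mean(dim=1))   # group activity
            return actions, activity

    # Toy usage: 4 clips, 12 actors each, with the assumed feature sizes.
    model = ActorTransformer()
    actions, activity = model(torch.randn(4, 12, 256), torch.randn(4, 12, 512))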
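
The sports-field registration abstract inverts the usual feed-forward pipeline: a network is first trained to regress the registration error, and images are then registered by optimizing the warp parameters so that the predicted error is minimized. A hedged sketch of that inference loop follows; error_net, warp, and the parameter initialization are placeholders standing in for the paper's trained regressor and differentiable warp.

    # Hedged sketch of "optimize through a learned error" registration.
    # `error_net` and `warp` are illustrative placeholders, not the
    # paper's actual components.
    import torch

    def register(image, template, error_net, warp, theta_init,
                 steps=100, lr=1e-2):
        """Refine warp parameters by minimizing the predicted registration error."""
        theta = theta_init.clone().requires_grad_(True)
        opt = torch.optim.Adam([theta], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            warped = warp(template, theta)    # differentiable template warp
            err = error_net(image, warped)    # network's estimate of misalignment
            err.backward()                    # gradients flow back to theta
            opt.step()
        return theta.detach()

    # Toy usage with stand-in components (purely illustrative).
    dummy_warp = lambda tmpl, th: tmpl + th.sum()             # placeholder "warp"
    dummy_error = lambda img, wrp: ((img - wrp) ** 2).mean()  # placeholder regressor
    image, template = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    theta = register(image, template, dummy_error, dummy_warp, torch.zeros(8))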
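
The pose-guided re-identification abstract combines pose-network feature maps with mid-level backbone feature maps through a gating mechanism. Below is a minimal sketch of one such gating block, assuming arbitrary channel counts and a single 1x1 convolution as the gate; the paper's gated fusion network is more elaborate.

    # Hedged sketch of pose-guided gating of appearance feature maps.
    # Channel sizes and the single-conv gate are simplifying assumptions.
    import torch
    import torch.nn as nn

    class PoseGatedFusion(nn.Module):
        def __init__(self, app_channels=256, pose_channels=64):
            super().__init__()
            # Map pose feature maps to one gate value per appearance channel.
            self.gate_conv = nn.Conv2d(pose_channels, app_channels, kernel_size=1)

        def forward(self, appearance, pose):
            # appearance: (B, app_channels, H, W) mid-level backbone features
            # pose:       (B, pose_channels, H, W) pose-network feature maps
            gate = torch.sigmoid(self.gate_conv(pose))  # gate values in (0, 1)
            return appearance * gate                    # emphasize pose-relevant filters

    # Toy usage with assumed shapes.
    fusion = PoseGatedFusion()
    fused = fusion(torch.randn(2, 256, 32, 16), torch.randn(2, 64, 32, 16))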
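
Several entries above (the CVIU anomaly paper, the CVPR 2013 dominant and anomalous behavior paper, and the single-example action recognition papers) share one core mechanism: code a video as spatio-temporal volumes, model how frequently their local arrangements occur, and treat low-likelihood arrangements as salient or anomalous. The sketch below is a heavy simplification that uses k-means as a stand-in codebook and smoothed co-occurrence counts in place of the papers' probabilistic model of compositions.

    # Hedged sketch of codebook-based scoring of spatio-temporal compositions.
    # K-means codewords and pairwise co-occurrence counts are stand-ins for
    # the published codebook-with-uncertainty and inference framework.
    import numpy as np
    from sklearn.cluster import KMeans

    def build_codebook(volume_descriptors, k=64, seed=0):
        """Cluster densely sampled volume descriptors into k codewords."""
        return KMeans(n_clusters=k, random_state=seed, n_init=10).fit(volume_descriptors)

    def composition_likelihood(codebook, descriptors, neighbor_pairs):
        """Score spatio-temporally adjacent volume pairs; rare arrangements score low."""
        labels = codebook.predict(descriptors)
        k = codebook.n_clusters
        counts = np.ones((k, k))                  # Laplace-smoothed co-occurrences
        for i, j in neighbor_pairs:
            counts[labels[i], labels[j]] += 1
        probs = counts / counts.sum()
        return np.array([probs[labels[i], labels[j]] for i, j in neighbor_pairs])

    # Toy usage: random descriptors and a simple chain of temporal neighbors.
    rng = np.random.default_rng(0)
    codebook = build_codebook(rng.normal(size=(5000, 32)))
    test = rng.normal(size=(200, 32))
    pairs = [(i, i + 1) for i in range(199)]
    scores = composition_likelihood(codebook, test, pairs)
    anomalous = np.where(scores < np.quantile(scores, 0.05))[0]  # rarest 5 percent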
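
The eccentricity-diagnosis abstract uses the amplitude of sideband components at a particular frequency pattern in the stator-current spectrum as its fault index. The sketch below only measures sideband amplitudes at caller-supplied frequencies and sums them into an index; the paper's specific frequency pattern, classifiers, and severity estimator are not reproduced here.

    # Hedged sketch of a spectral sideband index. The sideband frequencies
    # are supplied by the caller; the toy values below are arbitrary.
    import numpy as np

    def sideband_index(current, fs, sideband_freqs, bandwidth=0.5):
        """Sum the peak spectral amplitudes found near each sideband frequency."""
        n = len(current)
        spectrum = np.abs(np.fft.rfft(current * np.hanning(n))) / n
        freqs = np.fft.rfftfreq(n, d=1.0 / fs)
        index = 0.0
        for f in sideband_freqs:
            mask = np.abs(freqs - f) <= bandwidth
            if mask.any():
                index += spectrum[mask].max()   # peak amplitude near the sideband
        return index

    # Toy usage: a 50 Hz current with two small, arbitrarily placed sidebands.
    fs = 10_000
    t = np.arange(0, 2, 1 / fs)
    current = (np.sin(2 * np.pi * 50 * t)
               + 0.02 * np.sin(2 * np.pi * 37.5 * t)
               + 0.02 * np.sin(2 * np.pi * 62.5 * t))
    print(sideband_index(current, fs, [37.5, 62.5]))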
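
The chromosome-straightening abstract searches over rotation angles, analyzing the horizontal projection of each rotated binary image and minimizing a rotation score S to find the orientation that best exposes the bending axis. The sketch below illustrates the rotate-and-project search with a deliberately simplified stand-in score, since the paper's exact S (defined on the relative amplitude of the main projection peaks) is not reproduced here.

    # Hedged sketch of the rotate-and-project search for a bent chromosome.
    # The score is a simplified stand-in, not the paper's rotation score S.
    import numpy as np
    from scipy.ndimage import rotate

    def best_rotation(binary_img, angles=range(0, 180, 2)):
        """Return the angle whose horizontal projection is most sharply peaked."""
        best_angle, best_score = None, np.inf
        for angle in angles:
            rotated = rotate(binary_img.astype(float), angle, reshape=True, order=0)
            projection = rotated.sum(axis=1)              # horizontal projection (row sums)
            if projection.max() <= 0:
                continue
            score = projection.mean() / projection.max()  # stand-in for the score S
            if score < best_score:
                best_angle, best_score = angle, score
        return best_angle

    # Toy usage on a crude synthetic bent shape.
    img = np.zeros((100, 100), dtype=np.uint8)
    img[20:80, 48:52] = 1    # vertical arm
    img[76:80, 20:52] = 1    # horizontal arm forming the bend
    print(best_rotation(img))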

Projects

  • Human Activity Recognition in Videos From a Single Example

    We are working on a novel action matching method based on a hierarchical codebook of local spatio-temporal video volumes (STVs). Given a single example of an activity as a query video, the proposed method tries to find similar videos to the query in a video dataset. It is based on the bag of video words (BOV) representation and does not require prior knowledge about actions, background subtraction, motion estimation or tracking. It is also robust to spatial and temporal scale changes, as well as some deformations. The hierarchical algorithm yields a compact subset of salient codewords of STVs for the query video, and then the likelihood of similarity between the query video and all STVs in the target video is measured using a probabilistic inference mechanism.

  • Behavior Understanding in Unconstrained Surveillance Videos

    - Present

    Video surveillance systems are widely used in many applications such as nursing care institutions, law enforcement and building security. In most circumstances, it is necessary for humans to analyze the videos, which is inefficient in terms of accuracy and cost. In light of this, together with the tremendous number of such videos produced on a daily basis, there is a great demand for a real-time automated system that detects and locates suspicious behaviors and alerts security agents. Therefore, detecting unusual objects or suspicious behaviors in a scene is the primary objective of an automated surveillance system. We refer to this activity as anomaly detection because the sought-after situations are not observed regularly. In other words, all such systems are based on the implicit assumption that things that occur occasionally are potentially suspicious. In addition, the anomalies are defined with respect to a context, meaning that a particular activity in a particular context would be an anomaly, while in another context it might be normal.
    We have developed novel approaches for video parsing and simultaneous online learning of dominant and anomalous behaviors in surveillance videos. The method codes video as a compact set of spatio-temporal volumes while considering their spatio-temporal arrangements using a probabilistic framework to calculate the likelihood of the regions in the video. This approach can be considered as an extension of the common bag of video-words approaches, which represent a video as an order-less distribution of video volumes. The results are superior when compared to other approaches, while requiring vastly fewer computations. In addition, the algorithm is very fast and does not employ background subtraction, motion estimation or tracking.

  • Brain Emotional Learning Based Intelligent Controller (BELBIC)

    In biological systems, emotional reactions are utilized for fast decision making in complex environments or emergency situations. Hence, the emotional behavior of biological organisms plays an important role in their survival during the evolution process. On the other hand, emotions indicate how successful a course of action has been and whether another set of actions should have been taken instead. Therefore, emotions can be considered a constant feedback to the learning system that produces emotional behavior. In mammals, emotional intelligence is more complicated and is an important part of their intelligence. The main part of the mammalian brain responsible for emotional behavior is the limbic system. Several attempts have been made to model the limbic system, which resulted in the Brain Emotional Learning (BEL) model. It is a computational model of the amygdala and orbitofrontal cortex, the main parts of the limbic system in the brain. Finally, the Brain Emotional Learning Based Intelligent Controller (BELBIC), a model-free intelligent controller, was developed. In this computational model, learning takes place in two fundamental steps. First, a particular stimulus is correlated with an emotional response. Then, this emotional consequence shapes an association between the stimulus and the response.
    In several studies we have successfully employed the controller, BELBIC, to control different stable and unstable laboratory systems such as an overhead traveling crane, an inverted pendulum, an overhead crane with double pendulum loads, etc. We performed several studies on how to design emotional stresses and combine multiple objectives via fusion of different stress signals.

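The BELBIC project above describes a computational model of the amygdala and orbitofrontal cortex whose gains adapt to an emotional (stress or reward) cue. Below is a compact sketch of a commonly cited, simplified brain-emotional-learning update with two gain pathways; it omits the thalamic input and the carefully designed stress signals used in the actual controllers, so it should be read as an illustration of the learning scheme rather than the project's implementation.

    # Hedged sketch of a brain-emotional-learning (BEL) style update in the
    # spirit of BELBIC; a simplified textbook formulation, not necessarily
    # the exact rules used in the work described above.
    import numpy as np

    class BELModel:
        def __init__(self, n_inputs, lr_amygdala=0.1, lr_orbitofrontal=0.1):
            self.Ga = np.zeros(n_inputs)   # amygdala gains (excitatory)
            self.Go = np.zeros(n_inputs)   # orbitofrontal gains (inhibitory)
            self.ka, self.ko = lr_amygdala, lr_orbitofrontal

        def step(self, s, reward):
            """s: sensory input vector; reward: emotional cue (stress) signal."""
            a = self.Ga * s                     # amygdala pathway
            o = self.Go * s                     # orbitofrontal pathway
            output = a.sum() - o.sum()          # model (controller) output
            # The amygdala only strengthens associations (it cannot unlearn).
            self.Ga += self.ka * s * max(0.0, reward - a.sum())
            # The orbitofrontal cortex inhibits responses that overshoot the cue.
            self.Go += self.ko * s * (output - reward)
            return output

    # Toy usage: adapt the model toward reproducing a constant emotional cue.
    bel = BELModel(n_inputs=2)
    for _ in range(50):
        y = bel.step(np.array([1.0, 0.5]), reward=1.0)
    print(round(float(y), 3))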

Languages

  • English

    Native or bilingual proficiency

  • French

    Limited working proficiency
