research-article
Open access

MEgATrack: monochrome egocentric articulated hand-tracking for virtual reality

Published: 12 August 2020
  • Abstract

    We present a system for real-time hand-tracking to drive virtual and augmented reality (VR/AR) experiences. Using four fisheye monochrome cameras, our system generates accurate and low-jitter 3D hand motion across a large working volume for a diverse set of users. We achieve this by proposing neural network architectures for detecting hands and estimating hand keypoint locations. Our hand detection network robustly handles a variety of real-world environments. The keypoint estimation network leverages tracking history to produce spatially and temporally consistent poses. We design scalable, semi-automated mechanisms to collect a large and diverse set of ground truth data using a combination of manual annotation and automated tracking. Additionally, we introduce a detection-by-tracking method that increases smoothness while reducing the computational cost; the optimized system runs at 60 Hz on PC and 30 Hz on a mobile processor. Together, these contributions yield a practical system for capturing a user's hands, which is the default feature on the Oculus Quest VR headset, powering input and social presence.

    Supplementary Material

    MP4 File (a87-han.mp4)
    MP4 File (3386569.3392452.mp4)
    Presentation video


    Cited By

    • (2024) Estimating Power, Performance, and Area for On-Sensor Deployment of AR/VR Workloads Using an Analytical Framework. ACM Transactions on Design Automation of Electronic Systems. DOI: 10.1145/3670404. Online publication date: 7-Jun-2024
    • (2024) Influence of Gameplay Duration, Hand Tracking, and Controller Based Control Methods on UX in VR. Proceedings of the 16th International Workshop on Immersive Mixed and Virtual Environment Systems (22-28). DOI: 10.1145/3652212.3652222. Online publication date: 15-Apr-2024
    • (2024) The Effect of Degraded Eye Tracking Accuracy on Interactions in VR. Proceedings of the 2024 Symposium on Eye Tracking Research and Applications (1-7). DOI: 10.1145/3649902.3656369. Online publication date: 4-Jun-2024

      Published In

      ACM Transactions on Graphics  Volume 39, Issue 4
      August 2020
      1732 pages
      ISSN:0730-0301
      EISSN:1557-7368
      DOI:10.1145/3386569
      This work is licensed under a Creative Commons Attribution-NonCommercial International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. hand tracking
      2. motion capture
      3. virtual reality

      Qualifiers

      • Research-article

      Article Metrics

      • Downloads (Last 12 months): 1,433
      • Downloads (Last 6 weeks): 105

      Cited By

      • (2024) Exploiting Spatial-Temporal Context for Interacting Hand Reconstruction on Monocular RGB Video. ACM Transactions on Multimedia Computing, Communications, and Applications 20:6 (1-18). DOI: 10.1145/3639707. Online publication date: 8-Mar-2024
      • (2024) STMG: A Machine Learning Microgesture Recognition System for Supporting Thumb-Based VR/AR Input. Proceedings of the CHI Conference on Human Factors in Computing Systems (1-15). DOI: 10.1145/3613904.3642702. Online publication date: 11-May-2024
      • (2024) TriPad: Touch Input in AR on Ordinary Surfaces with Hand Tracking Only. Proceedings of the CHI Conference on Human Factors in Computing Systems (1-18). DOI: 10.1145/3613904.3642323. Online publication date: 11-May-2024
      • (2024) PressureVision++: Estimating Fingertip Pressure from Diverse RGB Images. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (8683-8693). DOI: 10.1109/WACV57701.2024.00850. Online publication date: 3-Jan-2024
      • (2024) Subtask-Based Virtual Hand Visualization Method for Enhanced User Accuracy in Virtual Reality Environments. 2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW) (6-11). DOI: 10.1109/VRW62533.2024.00008. Online publication date: 16-Mar-2024
      • (2024) Effect of Hand and Object Visibility in Navigational Tasks Based on Rotational and Translational Movements in Virtual Reality. 2024 IEEE Conference Virtual Reality and 3D User Interfaces (VR) (115-125). DOI: 10.1109/VR58804.2024.00035. Online publication date: 16-Mar-2024
      • (2024) SMR: Spatial-Guided Model-Based Regression for 3D Hand Pose and Mesh Reconstruction. IEEE Transactions on Circuits and Systems for Video Technology 34:1 (299-314). DOI: 10.1109/TCSVT.2023.3285153. Online publication date: 1-Jan-2024
