Investigation of Neural Network Algorithms for Human Movement Prediction Based on LSTM and Transformers

Published in Doklady Mathematics

Abstract

The problem of predicting a person's position in future frames of a video stream is solved, and in-depth experimental studies of traditional and state-of-the-art (SOTA) blocks for this task are carried out. An original architecture, KeyFNet, and its modifications based on transformer blocks are presented; the model predicts coordinates in the video stream 30, 60, 90, and 120 frames ahead with high accuracy. The novelty lies in a combined algorithm that stacks multiple FNet blocks, using the fast Fourier transform as the attention mechanism over concatenated keypoint coordinates. Experiments on Human3.6M and on our own real-world data confirmed the effectiveness of the proposed FNet-based approach compared to the traditional LSTM-based one. The proposed algorithm matches the accuracy of advanced models while outperforming them in speed and using fewer computational resources, and it can therefore be applied in collaborative robotic solutions.
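The core idea, replacing self-attention with a Fourier token-mixing step as in FNet (Lee-Thorp et al.), can be illustrated with a minimal sketch. The shapes below (a window of 30 past frames, 17 keypoints flattened to 34 coordinate features) and the single-block wiring are illustrative assumptions, not the KeyFNet architecture itself; layer normalization and the prediction head are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def fnet_block(x, w1, w2):
    """One FNet-style encoder block: FFT token mixing followed by a
    position-wise feed-forward sublayer, each with a residual connection."""
    # Fourier mixing: 2D FFT over the sequence and feature axes,
    # keeping only the real part (this replaces self-attention).
    x = x + np.real(np.fft.fft2(x, axes=(-2, -1)))
    # Position-wise feed-forward sublayer (ReLU MLP).
    x = x + np.maximum(x @ w1, 0.0) @ w2
    return x

# Toy input: 30 past frames, 17 keypoints -> 34 concatenated (x, y) features.
seq_len, d, d_ff = 30, 34, 64
x = rng.standard_normal((seq_len, d))
w1 = rng.standard_normal((d, d_ff)) * 0.1
w2 = rng.standard_normal((d_ff, d)) * 0.1

y = fnet_block(x, w1, w2)
assert y.shape == (seq_len, d)
```

Because the FFT has no learned parameters and runs in O(n log n), stacking such blocks is cheaper than quadratic self-attention, which is consistent with the speed and resource advantages reported in the abstract.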


Figures 1–5 are available in the full text of the article.

REFERENCES

  1. S. L. Pintea, J. C. van Gemert, and A. W. M. Smeulders, “Déjà Vu: Motion prediction in static images,” Computer Vision–ECCV 2014: Proceedings of the 13th European Conference, Zurich, Switzerland, September 6–12, 2014 (Springer International, 2014), Part III, pp. 172–187.

  2. J. Walker, A. Gupta, and M. Hebert, “Dense optical flow prediction from a static image,” Proceedings of the IEEE International Conference on Computer Vision (2015), pp. 2443–2451.

  3. Y. W. Chao et al., “Forecasting human dynamics from static images,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 548–556.

  4. O. Amosov et al., “Human localization in video frames using a growing neural gas algorithm and fuzzy inference,” Comput. Opt. 41 (1), 46–58 (2017). https://doi.org/10.18287/2412-6179-2017-41-1-46-58

  5. O. S. Amosov et al., “Using the deep neural networks for normal and abnormal situation recognition in the automatic access monitoring and control system of vehicles,” Neural Comput. Appl. 33 (8), 3069–3083 (2021). https://doi.org/10.1007/s00521-020-05170-5

  6. N. A. Gerasimenko, A. S. Chernyavsky, and M. A. Nikiforova, “RuSciBERT: A transformer language model for obtaining semantic embeddings of scientific texts in Russian,” Dokl. Math. 106, Suppl. 1, S95–S96 (2022). https://doi.org/10.1134/S1064562422060072

  7. O. S. Amosov et al., “Using the ensemble of deep neural networks for normal and abnormal situations detection and recognition in the continuous video stream of the security system,” Procedia Comput. Sci. 150, 532–539 (2019). https://doi.org/10.1016/j.procs.2019.02.089

  8. X. Gao et al., “Accurate grid keypoint learning for efficient video prediction,” 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE, 2021), pp. 5908–5915. https://doi.org/10.1109/IROS51168.2021.9636874

  9. Z. Liu et al., “Swin transformer V2: Scaling up capacity and resolution,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022), pp. 12009–12019. https://doi.org/10.1109/CVPR52688.2022.01170

  10. C. Ionescu et al., “Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments,” IEEE Trans. Pattern Anal. Mach. Intell. 36 (7), 1325–1339 (2014). https://doi.org/10.1109/TPAMI.2013.248

  11. Y. Ivanov et al., “Using an ensemble of deep neural networks to detect human keypoints in the workspace of a collaborative robotic system,” Eng. Proc. 33 (1), 19 (2023). https://doi.org/10.3390/engproc2023033019

  12. GitHub. https://github.com/IdentySergey/fnet. Accessed August 25, 2023.

  13. J. Lee-Thorp et al., “FNet: Mixing tokens with Fourier transforms” (2021). https://doi.org/10.48550/arXiv.2105.03824

  14. S. Kreiss, L. Bertoni, and A. Alahi, “OpenPifPaf: Composite fields for semantic keypoint detection and spatio-temporal association,” IEEE Trans. Intell. Transport. Syst. 23 (8), 13498–13511 (2021). https://doi.org/10.1109/tits.2021.3124981

  15. C. Lugaresi et al., “MediaPipe: A framework for building perception pipelines” (2019). https://doi.org/10.48550/arXiv.1906.08172

Funding

This work was supported by the Russian Science Foundation, project no. 22-71-10093, https://rscf.ru/en/project/22-71-10093/.

Author information

Correspondence to S. V. Zhiganov, Y. S. Ivanov or D. M. Grabar.

Ethics declarations

The authors of this work declare that they have no conflicts of interest.

Additional information

Publisher's Note.

Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhiganov, S.V., Ivanov, Y.S. & Grabar, D.M. Investigation of Neural Network Algorithms for Human Movement Prediction Based on LSTM and Transformers. Dokl. Math. 108 (Suppl 2), S484–S493 (2023). https://doi.org/10.1134/S1064562423701624
