Abstract
We address the problem of predicting a person's position in future frames of a video stream and present in-depth experimental studies applying both traditional and state-of-the-art (SOTA) blocks to this task. An original architecture, KeyFNet, and its modifications based on transformer blocks are presented; the model predicts coordinates in a video stream 30, 60, 90, and 120 frames ahead with high accuracy. The novelty lies in a combined algorithm built from multiple FNet blocks, in which the fast Fourier transform serves as the attention mechanism that mixes the concatenated coordinates of key points. Experiments on Human3.6M and on our own real-world data confirmed the effectiveness of the proposed FNet-based approach compared to the traditional LSTM-based one. The proposed algorithm matches the accuracy of advanced models while outperforming them in speed and consuming fewer computational resources, and can therefore be applied in collaborative robotic solutions.
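As a rough illustration of the idea described above (not the authors' implementation), the core of an FNet block replaces self-attention with a parameter-free 2D Fourier transform over the sequence and feature dimensions, keeping only the real part, followed by a standard feed-forward sublayer with residual connections and layer normalization. The sketch below, in NumPy, assumes a toy input of 30 frames, each encoded as 17 keypoints with 2 coordinates (34 features); all weights and dimensions are hypothetical.

```python
import numpy as np

def fnet_mixing(x):
    # FNet token mixing: 2D FFT over sequence and feature axes, keep the real part.
    return np.fft.fft2(x).real

def feed_forward(x, w1, b1, w2, b2):
    # Position-wise feed-forward sublayer with ReLU.
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

def layer_norm(x, eps=1e-6):
    # Normalize each frame's feature vector to zero mean, unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def fnet_block(x, w1, b1, w2, b2):
    # One FNet encoder block: Fourier mixing + FFN, each with residual + LayerNorm.
    x = layer_norm(x + fnet_mixing(x))
    x = layer_norm(x + feed_forward(x, w1, b1, w2, b2))
    return x

# Toy example: 30 observed frames, 17 keypoints x 2 coordinates = 34 features.
rng = np.random.default_rng(0)
seq = rng.standard_normal((30, 34))
w1 = rng.standard_normal((34, 64)) * 0.1; b1 = np.zeros(64)
w2 = rng.standard_normal((64, 34)) * 0.1; b2 = np.zeros(34)
out = fnet_block(seq, w1, b1, w2, b2)
print(out.shape)  # (30, 34)
```

In the full architecture described in the abstract, several such blocks would be stacked and followed by a regression head that maps the mixed sequence to keypoint coordinates 30–120 frames ahead; because the Fourier mixing has no learned attention weights, it is cheaper than standard self-attention, which is the source of the reported speed advantage.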
![Fig. 1](https://cdn.statically.io/img/media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS1064562423701624/MediaObjects/11472_2024_9691_Fig1_HTML.png)
![Fig. 2](https://cdn.statically.io/img/media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS1064562423701624/MediaObjects/11472_2024_9691_Fig2_HTML.png)
![Fig. 3](https://cdn.statically.io/img/media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS1064562423701624/MediaObjects/11472_2024_9691_Fig3_HTML.png)
![Fig. 4](https://cdn.statically.io/img/media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS1064562423701624/MediaObjects/11472_2024_9691_Fig4_HTML.png)
![Fig. 5](https://cdn.statically.io/img/media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS1064562423701624/MediaObjects/11472_2024_9691_Fig5_HTML.png)
Funding
This work was supported by the Russian Science Foundation, project no. 22-71-10093, https://rscf.ru/en/project/22-71-10093/.
Ethics declarations
The authors of this work declare that they have no conflicts of interest.
Additional information
Publisher's Note.
Pleiades Publishing remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Zhiganov, S.V., Ivanov, Y.S. & Grabar, D.M. Investigation of Neural Network Algorithms for Human Movement Prediction Based on LSTM and Transformers. Dokl. Math. 108 (Suppl 2), S484–S493 (2023). https://doi.org/10.1134/S1064562423701624