Video-based isolated hand sign language recognition using a deep cascaded model

Rastgoo, Razieh; Kiani, Kourosh; Escalera, Sergio

doi:10.1007/s11042-020-09048-5

Video-based isolated hand sign language recognition using a deep cascaded model

Published: 02 June 2020

Volume 79, pages 22965–22987, (2020)
Cite this article

Multimedia Tools and Applications Aims and scope Submit manuscript

1701 Accesses
41 Citations
Explore all metrics

Abstract

In this paper, we propose an efficient cascaded model for sign language recognition taking benefit from spatio-temporal hand-based information using deep learning approaches, especially Single Shot Detector (SSD), Convolutional Neural Network (CNN), and Long Short Term Memory (LSTM), from videos. Our simple yet efficient and accurate model includes two main parts: hand detection and sign recognition. Three types of spatial features, including hand features, Extra Spatial Hand Relation (ESHR) features, and Hand Pose (HP) features, have been fused in the model to feed to LSTM for temporal features extraction. We train SSD model for hand detection using some videos collected from five online sign dictionaries. Our model is evaluated on our proposed dataset (Rastgoo et al., Expert Syst Appl 150: 113336, 2020), including 10’000 sign videos for 100 Persian sign using 10 contributors in 10 different backgrounds, and isoGD dataset. Using the 5-fold cross-validation method, our model outperforms state-of-the-art alternatives in sign language recognition

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 11

Hand pose aware multimodal isolated sign language recognition

Article 01 September 2020

Razieh Rastgoo, Kourosh Kiani & Sergio Escalera

Real-time isolated hand sign language recognition using deep networks and SVD

Article 16 February 2021

Razieh Rastgoo, Kourosh Kiani & Sergio Escalera

Exploiting 3D Hand Pose Estimation in Deep Learning-Based Sign Language Recognition from RGB Videos

References

Acton B, Koum J (2009) Yahoo.www.whatsapp.com
Chai X, Guang L, Lin Y, Xu Z h, Tang Y, Chen X, Zhou M (2013) Sign language recognition and translation with kinect. In: IEEE International conference on automatic face and gesture recognition (FG2013). April 22–26. Shanghai
Chen Ch, Zhang B, Zhenjie H, Jiang J, Liu M, Yang Y (2017) Action recognition from depth sequences using weighted fusion of 2D and 3D auto-correlation of gradients features. Multimedia Tools and Applications
Cooper H, Ong W-J, Pugeault N, Bowden R (2012) Sign language recognition using sub-units. J Mach Learn Res 13:2205–2231
Google Scholar
Duan J, Zhou Sh, Wan J, Guo X, Li SZ (2016) Multi-modality fusion based on consensus-voting and 3D convolution for isolated gesture recognition, arXiv:https://arxiv.org/abs/1611.06689v2
El Khattabi Z, Tabii Y, Benkaddour A (2015) Video summarization: techniques and applications. Int J Comput Inform Eng 4:9
Article Google Scholar
Forster, et al. (2012) WTH-PHOENIX v1 - German sign language RWTH-PHOENIX v2
Ge L, Liang H, Yuan J, Thalmann D (2018) Robust 3D hand pose estimation in single depth images: from single-view CNN to multi-view CNNs. IEEE Transactions on Image Processing
Goodwyn S, Acredolo L, Brown C (2000) Impact of symbolic gesturing on early language development. Nonverbal Behavior, 81–103. https://www.babysignlanguage.com/dictionary/?v=04c19fa1e772
He K, Zhang X, Ren Sh, Sun J (2016) Deep residual learning for image recognition. CVPR
Jameson L, et al. (2004) American Sign Language
Kang B, Tripathi S, Nguyen TQ (2015) Real-time sign language fingerspelling recognition using convolutional neural networks from depth map. In: 3rd IAPR Asian conference on pattern recognition (ACPR)
Kapuscinski T, Oszust M, Wysocki M, Warchol D (2015) Recognition of hand gestures observed by depth cameras. International Journal of Advanced Robotic Systems
Kim S, Ban Y, Lee S (2017) Tracking and classification of in-air hand gesture based on thermal guided joint filter. Sensors
Koller O, Forster J, Hermann N (2015) Continuous sign language recognition: towards large vocabulary statistical recognition systems handling multiple signers. Comput Vis Image Underst, 108–125
Le TH, Jaw DW, Lin ICh, Liu HB, Huang ShCh (2018) An efficient hand detection method based on convolutional neural network. In: The 7th IEEE international symposium on next-generation electronics
Liu W, Anguelov D, Erhan D, Szegedy Ch, Reed S, Fu ChY, Berg AC (2016) SSD: single shot MultiBox detector. ECCV, 21–37
Miao Q, Li Y, Ouyang W, Ma Z, Xu X, Shi W, Cao X, Liu Z, Chai X, Liu Z et al (2017) Multimodal gesture recognition based on the resc3d network. In: Proceedings of the IEEE conference on computer vision and pattern recognition
Miller J, Winn B, Winn J (2019) Signing savvy. Online dictionary
Narayana P, Beveridge JR, Bruce AD (2018) Gesture recognition: focus on the hands. CVPR, 5235–5244
Neverova N, Wolf C h, Taylor GW, Nebout F (2014) Hand segmentation with structured convolutional learning. In: Asian conference on computer vision (ACCV) 2014: computer vision, pp 687–702
Ong WJ, Cooper H, Pugeault N, Bowden R (2012) Sign language recognition using sequential pattern trees. CVPR
Oszust M, Wysocki M (2013) Polish sign language words recognition with Kinect. In: 6th International conference on human system interactions (HSI)
Pagebites Inc. (2019) United States. www.imo.com
Pugeault N, Bowden R (2011) Spelling it out: real-time ASL fingerspelling recognition. In: Proceedings of the 1st IEEE workshop on consumer depth cameras for computer vision, jointly with ICCV’2011
Rastgoo R, Kiani K, Escalera S (2018) Multi-modal deep hand sign language recognition in still images using restricted Boltzmann machine. Entropy 20:809
Article Google Scholar
Rastgoo R, Kiani K, Escalera S (2020) Hand sign language recognition using multi-view hand skeleton. Expert Syst Appl 150:113336. https://doi.org/10.1016/j.eswa.2020.113336
Article Google Scholar
Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. NIPS
Ronchetti F, Quiroga F, Estrebou C, Lanzarini L (2016) Handshape recognition for Argentinian sign language using ProbSom. JCS-T
Ronchetti F, Quiroga F, Estrebou C, Lanzarini LC, Rosete A (2016) LSA64: an Argentinian sign language dataset. Congreso Argentino de Ciencias de la Computación (CACIC 2016)
Scogin J (2008) Texas math sign language dictionary. http://www.tsdvideo.org/about.php
Simon T, Joo H, Matthews I, Sheikh Y (2017) Hand keypoint detection in single images using multi-view bootstrapping. arXiv:https://arxiv.org/abs/1704.07809
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. arXiv:https://arxiv.org/abs/1409.1556v6
Sun A, Wei Y, Liang S, Tang X, Sun J (2015) Cascaded hand pose regression. CVPR, 824–832
Thangali A, Nash J, Sclaroff S, Neidle C (2011) Exploiting phonological constraints for handshape inference in ASL video. CVPR
Wang H, Wang P, Song Z, Li W (2017) Large-scale multimodal gesture recognition using heterogeneous networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition
William V (2013) American sign language. William Vicars Publisher, http://www.lifeprint.com/index.htm
Yan S h, Xia Y, Smith JS, Lu W, Zhang B (2017) Multi-scale convolutional neural networks for hand detection. Applied Computational Intelligence and Soft Computing
Zhang L, Zhu G, Shen P, Song J, Shah SA, Bennamoun M (2017) Learning spatiotemporal features using 3dcnn and convolutional lstm for gesture recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition
Zhou Y, Lu J, Lin X, Sun Y, Ma X (2018) HBE: hand branch ensemble network for real-time 3D Hand Pose Estimation. ECCV
Zimmermann Ch, Brox T (2017) Learning to estimate 3D hand pose from single RGB images. ICCV

Download references

Acknowledgements

This work is partially supported by the Spanish project TIN2016-74946-P (MINECO/FEDER,UE), CERCA Programme /Generalitat de Catalunya, ICREA under the ICREA Academia programme, and High Intelligent Solution (HIS) company of Iran. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan XP GPU used for this research. Also, we would like to thank two deaf centers of Iran (Semnan and Tehran) and the Computer Vision Center (CVC) of Spain for their collaborations.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Electrical and Computer Engineering Department, Semnan University, Semnan, Iran
Razieh Rastgoo & Kourosh Kiani
Department of Mathematics and Informatics, University of Barcelona and Computer Vision Center, UAB, Barcelona, Spain
Sergio Escalera

Authors

Razieh Rastgoo
View author publications
You can also search for this author in PubMed Google Scholar
Kourosh Kiani
View author publications
You can also search for this author in PubMed Google Scholar
Sergio Escalera
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Kourosh Kiani.

Ethics declarations

Conflict of interests

The authors certify that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Rastgoo, R., Kiani, K. & Escalera, S. Video-based isolated hand sign language recognition using a deep cascaded model. Multimed Tools Appl 79, 22965–22987 (2020). https://doi.org/10.1007/s11042-020-09048-5

Download citation

Received: 26 June 2019
Revised: 09 March 2020
Accepted: 07 May 2020
Published: 02 June 2020
Issue Date: August 2020
DOI: https://doi.org/10.1007/s11042-020-09048-5

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Video-based isolated hand sign language recognition using a deep cascaded model

Abstract

Access this article

Similar content being viewed by others

Hand pose aware multimodal isolated sign language recognition

Real-time isolated hand sign language recognition using deep networks and SVD

Exploiting 3D Hand Pose Estimation in Deep Learning-Based Sign Language Recognition from RGB Videos

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interests

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Video-based isolated hand sign language recognition using a deep cascaded model

Abstract

Access this article

Similar content being viewed by others

Hand pose aware multimodal isolated sign language recognition

Real-time isolated hand sign language recognition using deep networks and SVD

Exploiting 3D Hand Pose Estimation in Deep Learning-Based Sign Language Recognition from RGB Videos

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interests

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation