Abstract
Recent breakthroughs in Artificial Intelligence, Deep Learning, and Document Image Analysis and Recognition have significantly eased the creation of digital libraries and the transcription of historical documents. However, for documents in rare scripts with little labelled training data available, current Handwritten Text Recognition (HTR) systems are too constraining. Moreover, research on HTR often focuses on technical aspects only and rarely emphasises implementing software tools for scholars in the Humanities. In this article, we describe, compare, and analyse different transcription methods for rare scripts. We evaluate their performance in a real use case, a medieval manuscript written in the runic script (Codex Runicus), and discuss the advantages and disadvantages of each method from the user perspective. From this exhaustive analysis and a comparison with a fully manual transcription, we draw conclusions and provide recommendations to scholars interested in using automatic transcription tools.
A User Perspective on HTR Methods for the Automatic Transcription of Rare Scripts: The Case of Codex Runicus