Abstract
An effective approach to transcribing old text documents is to follow an interactive-predictive paradigm in which the system is guided by the human supervisor and the supervisor, in turn, is assisted by the system so as to complete the transcription task as efficiently as possible. In this paper, we focus on a particular system prototype called GIDOC, which can be seen as a first attempt to provide user-friendly, integrated support for interactive-predictive page layout analysis, text line detection and handwritten text transcription. More specifically, we focus on the handwriting recognition part of GIDOC, for which we propose the use of confidence measures to guide the human supervisor in locating possible system errors and deciding how to proceed. Empirical results are reported on two datasets, showing that a word error rate of no more than 10% can be achieved by checking only the 32% of words that are recognised with the lowest confidence.
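The trade-off between supervision effort and residual error described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Word` record, the `residual_error_rate` function, the toy words and their confidence scores are hypothetical and are not taken from GIDOC or from the reported experiments. The sketch also simplifies the error rate to the fraction of wrong words (ignoring insertions and deletions) and assumes that the supervisor corrects every word that is checked.

```python
from dataclasses import dataclass


@dataclass
class Word:
    hypothesis: str    # word proposed by the recogniser
    reference: str     # correct word (only known in an evaluation setting)
    confidence: float  # hypothetical confidence score in [0, 1]


def residual_error_rate(words, supervised_fraction):
    """Fraction of wrong words left after the supervisor checks (and fixes)
    the given fraction of words, starting from the least confident ones."""
    n_checked = int(round(supervised_fraction * len(words)))
    ranked = sorted(words, key=lambda w: w.confidence)
    unchecked = ranked[n_checked:]
    errors = sum(w.hypothesis != w.reference for w in unchecked)
    return errors / len(words)


# Toy data (invented): 10 recognised words, 3 of them wrong.
words = [
    Word("the", "the", 0.95),
    Word("olde", "old", 0.30),
    Word("text", "text", 0.90),
    Word("is", "is", 0.85),
    Word("hand", "hard", 0.40),
    Word("to", "to", 0.99),
    Word("read", "read", 0.45),
    Word("and", "and", 0.92),
    Word("copy", "copy", 0.88),
    Word("bye", "by", 0.80),
]

for fraction in (0.0, 0.3, 1.0):
    rate = residual_error_rate(words, fraction)
    print(f"supervised {fraction:.0%} of words -> residual error {rate:.0%}")
```

In this toy setting, checking the three least confident words removes two of the three errors, because the errors are concentrated among low-confidence words. How well this works in practice depends on how reliably the confidence measure separates correct from incorrect words.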
Work supported by the EC (FEDER/FSE) and the Spanish MCE/MICINN under the MIPRCV “Consolider Ingenio 2010” programme (CSD2007-00018), the iTransDoc project (TIN2006-15694-CO2-01), the Juan de la Cierva programme, and the FPU scholarship AP2007-02867. Also supported by the UPV grant 20080033.
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tarazón, L. et al. (2009). Confidence Measures for Error Correction in Interactive Transcription Handwritten Text. In: Foggia, P., Sansone, C., Vento, M. (eds) Image Analysis and Processing – ICIAP 2009. ICIAP 2009. Lecture Notes in Computer Science, vol 5716. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04146-4_61
DOI: https://doi.org/10.1007/978-3-642-04146-4_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04145-7
Online ISBN: 978-3-642-04146-4
eBook Packages: Computer Science, Computer Science (R0)