ABSTRACT
In this paper we present a web-based crowdsourcing application for extracting information from images of handwritten demographic documents. The proposed application integrates two points of view: the semantic information needed for demographic research, and the ground truth needed for document analysis research. Concretely, the application offers a contents view, where the information is recorded into forms, and a labeling view, which provides the word labels for evaluating document analysis techniques. The crowdsourcing architecture accelerates information extraction (many users can work simultaneously), supports validation of the extracted information, and makes it easy to provide feedback to users. Finally, we show how the proposed application can be extended to other kinds of demographic historical manuscripts.
Index Terms
- A bimodal crowdsourcing platform for demographic historical manuscripts