research-article

A bimodal crowdsourcing platform for demographic historical manuscripts

Authors:
Alicia Fornés, Computer Vision Center, Bellaterra, Spain
Josep Lladós, Computer Vision Center, Bellaterra, Spain
Joan Mas, Computer Vision Center, Bellaterra, Spain
Joana Maria Pujades, Centre for Demographic Studies, Bellaterra, Spain
Anna Cabré, Centre for Demographic Studies, Bellaterra, Spain

DATeCH '14: Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage, May 2014, Pages 103–108
https://doi.org/10.1145/2595188.2595199

Published: 19 May 2014

ABSTRACT

In this paper we present a web-based crowdsourcing application for extracting information from images of handwritten demographic documents. The application integrates two points of view: semantic information for demographic research, and ground truth for document analysis research. Concretely, it offers a contents view, where the information is recorded in forms, and a labeling view, which holds the word labels used to evaluate document analysis techniques. The crowdsourcing architecture makes it possible to accelerate information extraction (many users can work simultaneously), validate the information, and easily provide feedback to users. Finally, we show how the application can be extended to other kinds of demographic historical manuscripts.
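
As an illustration of the two views and of crowd validation, the following is a minimal sketch and not the platform's actual data model. It assumes that one volunteer submission carries both a form (contents view) and a list of word labels with image regions (labeling view), and that a field counts as validated once a minimum number of users agree on its value. The agreement rule, the class and function names, the field name `groom_surname`, and the sample values are all hypothetical; the abstract does not specify them.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class WordLabel:
    """Labeling view: one transcribed word tied to its image region (assumed layout)."""
    text: str
    bbox: Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class Submission:
    """One volunteer's work on one record, covering both views."""
    user: str
    form: Dict[str, str]                                  # contents view
    words: List[WordLabel] = field(default_factory=list)  # labeling view


def validate_field(submissions: List[Submission], name: str,
                   min_votes: int = 2) -> Optional[str]:
    """Return a field's value once at least `min_votes` users agree on it, else None."""
    votes = Counter(s.form[name] for s in submissions if name in s.form)
    if not votes:
        return None
    value, count = votes.most_common(1)[0]
    return value if count >= min_votes else None


# Hypothetical submissions from three volunteers for the same record.
subs = [
    Submission("u1", {"groom_surname": "Puig"}, [WordLabel("Puig", (120, 40, 60, 22))]),
    Submission("u2", {"groom_surname": "Puig"}),
    Submission("u3", {"groom_surname": "Puiq"}),
]

print(validate_field(subs, "groom_surname"))  # Puig
```

Keeping both views in one submission means the same volunteer effort feeds the demographic database and the ground truth for document analysis; a disagreeing value such as the third one above is also a natural trigger for feedback to that user.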

Index Terms

Information systems

Published in
DATeCH '14: Proceedings of the First International Conference on Digital Access to Textual Cultural Heritage
May 2014, 200 pages
ISBN: 9781450325882
DOI: 10.1145/2595188
Program Chairs: Apostolos Antonacopoulos (University of Salford), Klaus U. Schulz (Ludwig-Maximilians-Universität München)
Publisher: Association for Computing Machinery, New York, NY, United States
Copyright © 2014 ACM

Author Tags
crowdsourcing
document image analysis
ground-truth generation
historical documents
Acceptance Rates
DATeCH '14 paper acceptance rate: 31 of 49 submissions, 63%. Overall acceptance rate: 60 of 86 submissions, 70%.