Abstract
Date estimation of historical document images is a challenging problem, and several contributions in the literature lack the ability to generalize from one dataset to others. This paper presents a robust date estimation system based on a retrieval approach that generalizes well across heterogeneous collections. We use a ranking loss function named smooth-nDCG to train a Convolutional Neural Network that learns an ordering of documents for each problem. One of the main uses of the presented approach is as a tool for historical contextual retrieval: scholars can perform comparative analysis of historical images from large datasets in terms of the period in which they were produced. We provide an experimental evaluation on different types of documents from real datasets of manuscript and newspaper images.
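To make the idea of a smoothed ranking objective concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the common construction in which the hard rank of each retrieved document is replaced by a sum of sigmoids over pairwise score differences, so that nDCG becomes differentiable. The function names, the `temperature` parameter, and the linear year-gap relevance in `year_relevance` are illustrative assumptions that the abstract does not specify.

```python
import numpy as np


def year_relevance(query_year, years, max_gap=10):
    """Hypothetical graded relevance: closer years score higher, zero beyond max_gap."""
    years = np.asarray(years, dtype=float)
    return np.maximum(0.0, max_gap - np.abs(years - query_year))


def smooth_ndcg(scores, relevance, temperature=0.01):
    """Sigmoid-smoothed nDCG for one query (sketch under the stated assumptions).

    scores:    similarity of each retrieved document to the query (higher is better)
    relevance: non-negative graded relevance of each document
    """
    scores = np.asarray(scores, dtype=float)
    relevance = np.asarray(relevance, dtype=float)

    # diff[i, j] = s_j - s_i; a sigmoid of this softly indicates "j outranks i".
    diff = scores[None, :] - scores[:, None]
    exponent = np.clip(-diff / temperature, -50.0, 50.0)  # avoid overflow in exp
    outranks = 1.0 / (1.0 + np.exp(exponent))
    np.fill_diagonal(outranks, 0.0)  # a document does not outrank itself

    # Soft rank: 1 plus the (soft) number of documents scored above document i.
    soft_rank = 1.0 + outranks.sum(axis=1)
    dcg = np.sum(relevance / np.log2(1.0 + soft_rank))

    # Ideal DCG uses the hard ranking obtained by sorting relevance descending.
    ideal = np.sort(relevance)[::-1]
    idcg = np.sum(ideal / np.log2(2.0 + np.arange(ideal.size)))
    return float(dcg / idcg) if idcg > 0 else 0.0


def smooth_ndcg_loss(scores, relevance, temperature=0.01):
    """Loss to minimise: 1 - smooth nDCG."""
    return 1.0 - smooth_ndcg(scores, relevance, temperature)


# Example: a query from 1900 and three candidates with made-up similarity scores.
rel = year_relevance(1900, [1902, 1906, 1950])
print(smooth_ndcg_loss([0.9, 0.5, 0.1], rel))  # well-ordered scores, loss near 0
print(smooth_ndcg_loss([0.1, 0.5, 0.9], rel))  # reversed scores, larger loss
```

In an actual training setup the same expression would be written with tensors from an automatic-differentiation framework so that gradients flow back to the network producing the scores. A lower `temperature` approximates the hard ranking more closely, at the cost of smaller gradients.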
Notes
1. XAC is a governmental archival institution; see https://xac.gencat.cat/en/inici/ for further detail.
2. mAP approximated from training batches.
Acknowledgment
This work has been partially supported by the Spanish projects RTI2018-095645-B-C21 and FCT-19-15244, the Catalan project 2017-SGR-1783, the Culture Department of the Generalitat de Catalunya, and the CERCA Program/Generalitat de Catalunya.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Molina, A., Gomez, L., Ramos Terrades, O., Lladós, J. (2022). A Generic Image Retrieval Method for Date Estimation of Historical Document Collections. In: Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022. Lecture Notes in Computer Science, vol 13237. Springer, Cham. https://doi.org/10.1007/978-3-031-06555-2_39
DOI: https://doi.org/10.1007/978-3-031-06555-2_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06554-5
Online ISBN: 978-3-031-06555-2
eBook Packages: Computer Science, Computer Science (R0)