Multimodal page classification in administrative document image streams

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

In this paper, we present an application of page classification in a banking workflow. The proposed architecture represents administrative document images by merging visual and textual descriptions. The visual description is based on a hierarchical representation of the pixel intensity distribution. The textual description uses latent semantic analysis to represent document content as a mixture of topics. Several off-the-shelf classifiers and different strategies for combining visual and textual cues have been evaluated. A final step uses an \(n\)-gram model of the page stream, allowing a finer-grained classification of pages. The proposed method has been tested in a real large-scale environment, and we report results on a dataset of 70,000 pages.
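
The abstract does not say how the \(n\)-gram model is applied to the page stream. As an illustration only, the sketch below assumes a bigram model (\(n = 2\)) over page labels, combined with per-page classifier posteriors through Viterbi decoding. The function name, the labels "first" and "next", and all probabilities are hypothetical and are not taken from the paper.

```python
import math


def viterbi_bigram(posteriors, transitions, initial):
    """Return the most likely label sequence for a stream of pages.

    posteriors:  list of dicts, one per page, label -> P(label | page)
    transitions: dict, (prev_label, label) -> P(label | prev_label)
    initial:     dict, label -> P(label) for the first page of the stream
    All three inputs are assumptions made for this sketch.
    """
    labels = list(initial)
    eps = 1e-12  # floor that avoids log(0) for unseen events

    def lg(p):
        return math.log(max(p, eps))

    # best[l] = best log-score of any label path ending in label l
    best = {l: lg(initial[l]) + lg(posteriors[0].get(l, 0.0)) for l in labels}
    back = []  # one dict of back-pointers per page after the first
    for page in posteriors[1:]:
        new_best, pointers = {}, {}
        for l in labels:
            prev = max(
                labels,
                key=lambda p: best[p] + lg(transitions.get((p, l), 0.0)),
            )
            new_best[l] = (
                best[prev]
                + lg(transitions.get((prev, l), 0.0))
                + lg(page.get(l, 0.0))
            )
            pointers[l] = prev
        best = new_best
        back.append(pointers)

    # Trace the best path back from the highest-scoring final label.
    path = [max(labels, key=lambda l: best[l])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical three-page stream; the second page is ambiguous on its own.
posteriors = [
    {"first": 0.9, "next": 0.1},
    {"first": 0.55, "next": 0.45},
    {"first": 0.2, "next": 0.8},
]
# Hypothetical bigram statistics: a first page is usually followed
# by a continuation page.
transitions = {
    ("first", "first"): 0.1, ("first", "next"): 0.9,
    ("next", "first"): 0.5, ("next", "next"): 0.5,
}
initial = {"first": 0.9, "next": 0.1}

# Page-by-page decision, ignoring the stream context.
independent = [max(p, key=p.get) for p in posteriors]
# Decision that also takes the bigram model into account.
decoded = viterbi_bigram(posteriors, transitions, initial)
```

With these made-up numbers, the page-by-page decision labels the second page "first", while the decoded sequence labels it "next" because the bigram model makes two consecutive first pages unlikely. This is the kind of correction a sequence model over the page stream can provide.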

Notes

  1. ABBYY Finereader Engine 9.

Acknowledgments

This work has been partially supported by the Spanish Ministry of Education and Science under projects TIN2011-24631, TIN2012-37475-C02-02, RYC-2009-05031 and RYC-2012-11776; by the People Programme (Marie Curie Actions) of the Seventh Framework Programme of the European Union (FP7/2007-2013) under REA grant agreement no. 600388; by the Agency of Competitiveness for Companies of the Government of Catalonia, ACCIÓ; and by the CREST project from the Japan Society for the Promotion of Science (JSPS).

Author information

Corresponding author

Correspondence to Marçal Rusiñol.

About this article

Cite this article

Rusiñol, M., Frinken, V., Karatzas, D. et al. Multimodal page classification in administrative document image streams. IJDAR 17, 331–341 (2014). https://doi.org/10.1007/s10032-014-0225-8

  • DOI: https://doi.org/10.1007/s10032-014-0225-8
