The Diagonal Split: A Pre-segmentation Step for Page Layout Analysis and Classification

  • Conference paper
Pattern Recognition and Image Analysis (IbPRIA 2009)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5524)

Abstract

Document classification is an important task in all processes related to document storage and retrieval. For complex documents, structural features are needed to achieve correct classification. Unfortunately, physical layout analysis is error-prone. In this paper we present a pre-segmentation step based on a divide-and-conquer strategy that can be used to improve page segmentation results independently of the segmentation algorithm used. This pre-segmentation step is evaluated in classification and retrieval using the selective CRLA algorithm for layout segmentation together with a clustering based on the area Voronoi diagram, and is tested on two different databases, MARG and Girona Archives.


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gordo, A., Valveny, E. (2009). The Diagonal Split: A Pre-segmentation Step for Page Layout Analysis and Classification. In: Araujo, H., Mendonça, A.M., Pinho, A.J., Torres, M.I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2009. Lecture Notes in Computer Science, vol 5524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02172-5_38

  • DOI: https://doi.org/10.1007/978-3-642-02172-5_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02171-8

  • Online ISBN: 978-3-642-02172-5

  • eBook Packages: Computer Science, Computer Science (R0)
