A graph-based approach for segmenting touching lines in historical handwritten documents

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Text line segmentation is an important step in the recognition of historical handwritten documents. Handwritten document images contain text lines with multiple orientations, touching and overlapping characters between consecutive text lines, and varied document structures, all of which make line segmentation difficult. In this paper, we present a new approach to handwritten text line segmentation that addresses touching components, curvilinear text lines and horizontally overlapping components. The proposed algorithm formulates line segmentation as finding the central path in the area between two consecutive lines, which is solved as a graph traversal problem. A graph is constructed from the skeleton of the image, and a path-finding algorithm is then used to find the optimum path between text lines. The proposed algorithm has been evaluated on a comprehensive dataset consisting of five databases: ICDAR2009, ICDAR2013, UMD, George Washington and the Barcelona Marriages Database. Across the different types and difficulties of this benchmark data, the proposed method outperforms the state of the art.
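The idea of treating line separation as a path-finding problem can be illustrated with a minimal sketch. This is not the authors' method: the paper builds its graph from the image skeleton, which this preview does not describe in detail, so the sketch swaps in a plain pixel grid and a Dijkstra search. The function name `separating_path`, the `ink_cost` penalty and the set of allowed moves are all assumptions made for illustration. The path runs from the left edge to the right edge, prefers background pixels, and pays a penalty to cut through ink where two lines touch.

```python
import heapq
import math


def separating_path(ink, start_row, ink_cost=10.0):
    """Cheapest left-to-right path through a binary image (1 = ink).

    Hypothetical illustration only: a pixel-grid Dijkstra search, not the
    skeleton-based graph of the paper. Crossing an ink pixel adds
    `ink_cost` (an assumed penalty), so the path cuts through ink only
    where no background route exists.
    """
    rows, cols = len(ink), len(ink[0])
    start = (start_row, 0)
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    # Allowed moves: up, down, right, and the two rightward diagonals.
    moves = ((-1, 0), (1, 0), (0, 1), (-1, 1), (1, 1))
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale queue entry
        r, c = node
        if c == cols - 1:
            # Reached the right edge: walk the predecessor links back.
            path = [node]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path[::-1]
        for dr, dc in moves:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Step cost = Euclidean length + penalty for entering ink.
                step = math.hypot(dr, dc) + (ink_cost if ink[nr][nc] else 0.0)
                nd = d + step
                if nd < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = node
                    heapq.heappush(heap, (nd, (nr, nc)))
    return None


# Two text lines (rows 0 and 2) joined by a touching stroke in column 2.
page = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
]
print(separating_path(page, start_row=1))
# [(1, 0), (1, 1), (1, 2), (1, 3), (1, 4)]
```

In this toy page every route must cross the touching stroke, so the search cuts it at a single pixel and otherwise stays in the gap between the two lines. A heuristic search such as A* could replace Dijkstra here without changing the result, provided the heuristic never overestimates the remaining cost.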

Notes

  1. For clarity, we write \(v \in \{\gamma _i\}\) to denote a node belonging to the category of initial nodes (and likewise for the other node types).

  2. This database is available upon request to the authors of the paper.

Acknowledgments

The authors thank the CED-UAB and the Cathedral of Barcelona for providing the images. This work has been partially supported by the Spanish projects TIN2011-24631 and TIN2012-37475-C02-02 and by the EU project ERC-2010-AdG-20100407-269796.

Author information

Corresponding author

Correspondence to David Fernández-Mota.

About this article

Cite this article

Fernández-Mota, D., Lladós, J. & Fornés, A. A graph-based approach for segmenting touching lines in historical handwritten documents. IJDAR 17, 293–312 (2014). https://doi.org/10.1007/s10032-014-0220-0

  • DOI: https://doi.org/10.1007/s10032-014-0220-0
