Elsevier

Pattern Recognition

Volume 45, Issue 5, May 2012, Pages 1972-1983
Pattern Recognition

Multi-oriented touching text character segmentation in graphical documents using dynamic programming

https://doi.org/10.1016/j.patcog.2011.09.026Get rights and content

Abstract

The touching character segmentation problem becomes complex when touching strings are multi-oriented. Moreover in graphical documents sometimes characters in a single-touching string have different orientations. Segmentation of such complex touching is more challenging. In this paper, we present a scheme towards the segmentation of English multi-oriented touching strings into individual characters. When two or more characters touch, they generate a big cavity region in the background portion. Based on the convex hull information, at first, we use this background information to find some initial points for segmentation of a touching string into possible primitives (a primitive consists of a single character or part of a character). Next, the primitives are merged to get optimum segmentation. A dynamic programming algorithm is applied for this purpose using the total likelihood of characters as the objective function. A SVM classifier is used to find the likelihood of a character. To consider multi-oriented touching strings the features used in the SVM are invariant to character orientation. Experiments were performed in different databases of real and synthetic touching characters and the results show that the method is efficient in segmenting touching characters of arbitrary orientations and sizes.

Highlights

► Touching text character segmentation. ► Multi-scale and multi-oriented character recognition. ► OCR in graphical documents.

Introduction

As electronic media becomes more and more accessible, the need for transferring offline documents to the electronic domain grows. Optical Character Recognition (OCR) works by scanning source documents and performing character analysis on the resulting images giving a transcription to ASCII text which can then be stored and manipulated electronically like any standard electronic document. As part of the OCR process, character segmentation techniques are applied to word images before individual characters images are recognized. The simplest way to perform character segmentation is to use the small space between characters as segmentation regions. This strategy fails when there are touching or broken characters, which often occur in degraded text images. Some examples of such documents are photocopies, faxes, historical documents, etc. and they are often degraded due to compression, bilevel conversion, aging or poor typing [3], [28]. In these situations, two or more characters may be segmented as one character component or one character may split into multiple pieces. Due to degradation, adjacent characters in a word touch together and they share common pixels in touching regions [27].

Besides the huge amount of documents having only horizontal direction text, there are many graphical documents such as maps, engineering drawings, etc. or artistic documents, where text lines appear frequently in different orientations other than usual horizontal direction. The purpose of such orientation and curvi-linearity is to catch people's attention at some particular words/lines or to annotate the location of graphical objects. Thus, a single document may contain strings with different inter-character spacing in the strings due to the annotation, style, etc. Also, the curvi-linear nature of the text makes the orientations of characters in a string different. As a result, it is difficult to detect the skew of such strings and hence character recognition of such documents is a complex task.

Segmentation of touching components is one of the difficulties to get higher recognition rates by OCR systems. The OCR systems available commercially do not perform well when words are multi-oriented in fashion in a document. When touching occurs in multi-oriented documents (e.g. artistic or graphical documents), it is much more difficult to segment such multi-oriented touching than touching segmentation of normal horizontal touching. Touching in curvi-linear string leads to false character segmentation and hence wrong recognition result occurs.

Text-lines could appear at different directions in the same document as illustrated in Fig. 1. It can be seen from Fig. 1(a), the word “PLANET” contains a touching string “LANE” of four characters. In Fig. 1(b), we show a map where many characters in the word “Mayurakshi” are touching and they are oriented in different directions, although they belong to a same word. Orientation of two touching strings “ON” and “RE” of Fig. 1(c) are perpendicular to each other. In Fig. 1(d), it may be noted that orientations of “es” and “no” in the word “Couesnon” are not the same and such strings create difficulty for segmentation.

There are many published papers towards the recognition and segmentation of the touching characters of horizontal direction [2], [15], [19], [30] and they are briefly reviewed here.

Among the earlier pieces of work on touching character segmentation, one class of approaches uses contour features of the connected components for segmentation [7], [15], [29]. When analyzing the contour of a touching pattern, valley and crest points are derived. Next, a cutting path is decided to segment the touching pattern by joining valley and crest points. Kahan et al. [13] used projection profiles as the objective function for touching character segmentation. They used the idea of joining adjacent characters that have minimum vertical projection. The segmentation function is calculated from the ratio of the second derivative of the projection-profile curve to its height. Later, Lu [18] introduced a peak-to-valley function to improve the segmentation approach. Fujisawa et al. [8] used profile features for touching numeral segmentation. Upper and lower profiles of the connected component are computed and the distance between upper and lower profiles are analyzed to detect the segmentation points.

Afterwards, Liang et al. [17] proposed discriminating functions for machine printed touching character segmentation. Pixel projection and profile projection techniques are employed as discrimination functions for solving heavily touching printed characters. Next, they applied forward segmentation along with a backward merge procedure based on the output of a character classifier. It works on the components generated by discriminating functions. Yu and Yan [32] presented a segmentation technique using structural features for single-touching handwritten numeral strings. At first, the touching region of the character components is determined based on its structural shape. Next, a candidate touching point is pre-selected using the geometrical information of special structural shapes. Finally, morphological analysis and partial recognition results are used for the purpose of segmentation. Dimauro et al. [5] applied contour based features along with some descending algorithms for the touching character segmentation purpose.

Another class of approaches is based on thinning [3], [19]. In these approaches, thinning of foreground and/or background pixels of the connected pattern are processed. End and fork points obtained by thinning are used for cutting points extraction. These methods are time consuming and in addition they generate protrusions. These protrusions sometimes give wrong results because they bring some confusions among the actual fork and end points.

A water reservoir based technique [21] is employed to locate inter-character spaces in touching numeral strings. Water reservoir is a metaphor to illustrate the cavity region of a component. In this sense, if water is poured from a side of a component, the cavity regions of the background portion of the component where water will be stored are considered as reservoirs of the component. Based on the size of water reservoirs, the segmentation zones of the touching string are selected. Next, segmentation is done using structural information of these reservoirs.

Yong et al. [31] proposed an approach using supervised learning on the labeled examples and a Markov Random Field (MRF) approach has been applied for this purpose. Further, a propagation minimization method is employed to select the candidate patches based on the compatibility of the neighbor patches. The output of the MRF after the iterative belief propagation forms a segmentation probability map. Finally, the cut position is extracted from the map. An accuracy rate of 94.8% is reported.

Methods based on combinations of features have also been used for touching segmentation. Oliveira et al. [20] used contour, profile and skeleton features to find a set of points for touching characters segmentation. First, local minima of contours and profile features are defined as basic point (BP). Second, a point with more than two pixels in its neighborhood is defined as an intersection point (IP). Afterwards, an Euclidean distance scheme is applied to determine proximity between IP and BP for segmentation.

The state-of-the-art approaches of touching character segmentation generally consider touching of characters in horizontal text strings. These methods assume the characters of strings are aligned horizontally and thus segmentation features are devised for such characters in horizontal strings. Also, the features used in most of the approaches for text character recognition are generally not rotation invariant. The characters along a touching portion may be in different orientations with respect to the baseline of the word. In graphical documents when characters touch, it is difficult to know the angle of alignment of characters in the touching regions. Moreover in Fig. 1(b), we show examples where the characters in a single-touching have different orientations. As a result, skew correction methods cannot make such touching horizontal and hence the methods that take care of horizontal touching cannot be used. For segmentation purpose, we need technique that can take care of size and rotation invariant touching strings. Hence, we propose here a segmentation approach that can handle touching strings in multiple orientations. Recently, we proposed a touching character segmentation approach in ICDAR-2009 [25] and the present work is its extended version. This paper elaborates the different steps of character segmentation method. Also, extensive experiments including comparative study are included in this version to prove the efficiency of this method.

As mentioned earlier, many techniques are available for segmentation of horizontal touching characters but to the best of our knowledge there is no work towards multi-oriented touching character segmentation except our work. In this present paper, we propose an approach for multi-oriented n-character touching string segmentation scheme. The block-diagram of our approach is shown in Fig. 2. The different steps used in this system are discussed as follows.

Recognition process: an important step in our system is the recognition of the isolated character. As we consider multi-oriented graphical document, features used in our system must be rotation invariant. Circular and convex hull ring based zoning approach has been used along with angular information of the contour pixels of the character to make the feature rotation invariant. A SVM classifier is used to find the likelihood of a character. The C1 and C2 modules of Fig. 2 discuss about recognition.

Touching component detection: there may exist touching or non-touching characters in a word. Before passing a component into our segmentation process (steps T2–T5 of Fig. 2), we detect if the component is touching or isolated. We apply a Connected Component (CC) labeling to the word image and extract individual components. For each component, we compute the recognition confidence for all character class models using our recognition process (steps C1–C2). Based on this recognition score, the components are separated into touching and non-touching components. The touching components are processed next for segmentation. This module is noted by T1 in Fig. 2 and discussed later.

Segmentation zones: when two or more characters touch, they generate a big cavity region at the background portion. This background portion is used to detect the segmentation zones. To handle the background information of a multi-oriented string, properties of the convex hull of the touching string have been used. This process is marked by T2 in Fig. 2.

Initial segmentation points: the segmentation zones are used to find the segmentation points. A set of initial segmentation points are calculated in the contours of convex hull residua using the Douglas–Peucker polyline-approximation. This is denoted by T3 in Fig. 2.

Primitive segmentation: next, segmentation lines are calculated from the initial segmentation points of the touching character. Based on these segmentation lines, the touching string is segmented into primitives. A primitive consists of a single character or a part of a single character. This step is mentioned by T4 in Fig. 2.

Merging of primitive segments using dynamic programming: some of the primitives obtained before are merged to get optimum segmentation. To do this, dynamic programming algorithm is applied using total likelihood of characters as the objective function. Based on the recognition rates of primitive segments, multiple hypothesis of segmentation are generated. A dynamic programming (DP) algorithm is applied to get the optimal solution for the touching character segmentation. This step is marked by T5 in Fig. 2.

As discussed earlier, the main contribution of this paper is the multi-oriented n-character string segmentation for its recognition (i.e. steps T2–T5 in Fig. 2). However, it is difficult to dissociate this part from the recognition process. So, we will present recognition procedure briefly before detail discussion of touching character segmentation.

The rest of the paper is organized as follows. In Section 2, we explain the feature extraction procedure as well as recognition for handling characters in multi-scale and multi-oriented environments (steps C1 and C2 of Fig. 2). In Section 3, we present the proposed segmentation approach for n-touching strings (steps T1–T5 of Fig. 2). Next, data details and the different experimental results are discussed in Section 4. Finally, the paper is concluded in Section 5.

Section snippets

Feature description and recognition of text character

Recognition of text characters in multi-rotation and multi-scale environment is a challenging task. Recognition of individual characters in multi-oriented and multi-sized environment drives the segmentation hypothesis of n-touching characters in our system. In the literature different shape descriptors like Angular Radial Transform (ART) [23], Hu's moments [12], Zernike moments [14], Fourier–Mellin [1], Angle based features [22], etc. are proposed for recognition and they are invariant to

Touching character segmentation

A touching component segmentation approach for two touching characters in a multi-oriented environment was presented in [24]. This segmentation was performed with the knowledge of the number of total characters in the touching component. To do this, touching components of only two characters were collected and the method was based on cavity regions of the background portion. The convex hull was used to find the cavity regions. Next, several hypothesis of segmentation lines were computed from

Data collection and experimental results

In this section we detail the performance of the proposed approach. First we describe the dataset generation and then we show experimental results of our touching character segmentation approach.

Conclusions

In this paper we have proposed a scheme towards segmentation of multi-oriented and multi-sized n-character touching strings. The algorithm, at first, segments the touching characters into primitives and then finds the best sequence of characters shapes based on a dynamic programming approach using these primitive segments.

We also performed an adhoc segmentation approach of touching characters based on the knowledge of number of characters in the touching string. In such a restrictive dataset,

Acknowledgments

This work has been partially supported by the Spanish projects TIN2009-14633-C03-03, TIN2008-04998 and CONSOLIDERINGENIO 2010 (CSD2007-00018). It has also been supported in part by the AAP program 2010–2011 of François Rabelais University, Tours City, France. The authors would like to thank Dr. Sebastien Adam for providing the Fourier–Mellin shape descriptor code for our research.

References (32)

  • T.C. Chang et al.

    Character segmentation using convex-hull techniques

    International Journal of Pattern Recognition and Artificial Intelligence

    (1999)
  • Y.K. Chen et al.

    Segmentation of single- or multiple-touching handwritten numeral string using background and foreground analysis

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2000)
  • M. Delalandre et al.

    Generation of synthetic documents for performance evaluation of symbol recognition and spotting systems, International Journal on Document Analysis and Recognition

    (IJDAR)

    (2011)
  • G. Dimauro et al.

    From character to cursive script recognition: future trends in scientific research

  • D. Douglas et al.

    Algorithms for the reduction of the number of points required to represent a digitized line or its caricature

    The Canadian Cartographer

    (1973)
  • R. Fenrich

    Segmentation of automatically located handwritten words

  • Cited by (0)

    View full text