1 Introduction

Colon cancer is the fourth most common cause of cancer death worldwide, and its survival rate depends on the stage at which it is detected, ranging from rates higher than \(95\,\%\) in the earliest stages to rates lower than \(35\,\%\) in later stages [1]. Although colonoscopy is the gold-standard screening modality for early cancer detection, a polyp miss rate of around \(6\,\%\) [2], especially for the smallest polyps, limits its diagnostic yield.

The high-definition videos provided by recent equipment allow a more accurate detection of the smallest polyps [3], at the cost of increased computational time if all available information is processed. Such an increase in computational time is a clear drawback for effective deployment in clinical environments as part of an intelligent system [4]. During an exploration, navigation along the flexible colon anatomy produces many non-informative frames in videos and non-informative regions in valid images. Efficient identification of such non-informative data would substantially speed up further processing of colonoscopy explorations.

Most existing works on non-informative data identification focus on frame detection. The work of [5] addresses the identification of non-informative frames (with low quality or without any useful clinical information) by analyzing the energy of the detail coefficients of the wavelet decomposition of a given image, which is used as the input to the classification system. The work of [6] defines a measure, the isolated pixel ratio (IPR), to classify frames into informative, ambiguous and non-informative. The IPR is calculated from the edges of the image, where an isolated pixel is an edge pixel not connected to any other edge pixel. Some authors [7] analyze image content to discern between frames that correspond to either a diagnostic or a therapeutic operation. The identification of operation shots is based on the detection of diagnostic or therapeutic instruments, mapping instrument detection to the problem of detecting the cables of these instruments.

Concerning non-informative region (NIR) identification, the only work addressing the topic is, to the best of our knowledge, the content-based approach presented in [8, 9]. On the grounds that dark regions do not contain data valid for further image processing, images were split into darker and brighter regions using watershed segmentation. Although efficient, a main concern was that intensity alone does not suffice to describe the endoluminal scene [10]. Moreover, the cited method included the black borders of the image in its definition of NIR, which has a clear impact on its performance.

In this paper we propose to extend the geometric-appearance models of the lumen introduced in [10] to segment NIR regions. We generate a one-parametric family of likelihood maps whose minimal curve progressively approaches the non-informative region. For each such likelihood map we define a non-informative cost function whose minimum selects the parameter that best splits the image. NIR boundaries are extracted using three different operators, which are evaluated using non-parametric analysis of variance (ANOVA) to determine the most suitable ones in terms of area overlap scores.

The structure of the paper is as follows: we explain our non-informative region segmentation method in Sect. 2. Experimental results are presented in Sect. 3. We close the paper with conclusions and future work in Sect. 4.

2 A Strategy for Discarding Non Informative Regions in Colonoscopy

Non-informative regions are areas of such low visual quality that neither physicians nor computer vision methods would be able to discern anything inside them. They include the lumen and other dark parts of the image generated by protruding objects, which decrease the reflection towards the camera of the structures below them. Therefore, they correspond to dark regions of the image whose center is the hub of image gradients. Following [10], we characterize dark areas of the image using a Dark Region Identification (DRI) map, given by convolving the image, \(I=I(x,y)\), with an isotropic Gaussian kernel of standard deviation \(\sigma \):

$$ {\textit{DRI}}:= g_{\sigma } *I = \frac{1}{2\pi \sigma ^2 } e^{ -\frac{ x^2 + y^2 }{2 \sigma ^2} } *I(x,y) $$
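As a minimal sketch (the function name is ours, not from [10]), the DRI map reduces to a standard isotropic Gaussian smoothing of the intensity image:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dark_region_identification(image, sigma):
    """DRI: convolution of the intensity image with an isotropic
    Gaussian kernel of standard deviation sigma."""
    return gaussian_filter(image.astype(float), sigma=sigma)

# Synthetic example: a dark patch remains the darkest area after smoothing.
img = np.ones((64, 64))
img[20:30, 20:30] = 0.0
dri = dark_region_identification(img, sigma=3.0)
```

Smoothing makes the dark-region response robust to specular highlights and sensor noise before the feature space is built.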

Meanwhile, image gradient centrality is formulated using the Directed Gradient Accumulation (DGA) image given by [10]:

$$ {\textit{DGA}}(x,y):= \sum _{(x_0,y_0)} \chi _{\gamma _{\nabla I (x_0,y_0)}}(x,y) $$

where the image \(\chi _{\gamma _{\nabla I(x_0,y_0)}}(x,y)\) is defined, for each image pixel \((x_0,y_0)\), as the mask associated to its gradient line \(\gamma _{\nabla I(x_0,y_0)}(\lambda )=(x_0,y_0)+ \lambda \nabla I(x_0,y_0)\), with \(\lambda \) the free parameter of the gradient line equation.
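The accumulation can be sketched as follows (a naive per-pixel rasterization of the gradient lines; function name, step count, and sampling are our assumptions, not the authors' implementation):

```python
import numpy as np

def directed_gradient_accumulation(image, n_steps=50):
    """DGA: every pixel casts the line through it along its gradient
    direction onto an accumulator; hubs of gradients get high counts."""
    gy, gx = np.gradient(image.astype(float))   # gradients along rows, cols
    h, w = image.shape
    acc = np.zeros((h, w))
    for y0 in range(h):
        for x0 in range(w):
            g = np.hypot(gx[y0, x0], gy[y0, x0])
            if g < 1e-6:                         # flat pixel: no line to cast
                continue
            dx, dy = gx[y0, x0] / g, gy[y0, x0] / g
            for lam in range(-n_steps, n_steps + 1):
                x = int(round(x0 + lam * dx))
                y = int(round(y0 + lam * dy))
                if 0 <= x < w and 0 <= y < h:
                    acc[y, x] += 1
    return acc

# Gradients on the rim of a dark disk point radially, so their lines all
# cross the disk centre: DGA peaks there.
yy, xx = np.mgrid[0:64, 0:64]
img = ((xx - 32) ** 2 + (yy - 32) ** 2 > 100).astype(float)
dga = directed_gradient_accumulation(img)
```

A production version would rasterize only strong-gradient pixels (e.g. with Bresenham lines) to avoid the quadratic loop.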

Non-informative region pixels have low DRI and high DGA values. We build a feature space given by (DRI, DGA), which should discriminate non-informative region pixels from those belonging to informative endoluminal structures. The semi-supervised strategy described in [10] classifies the (DRI, DGA) space into informative and non-informative classes by means of a linear classifier:

$$\begin{aligned} LK_{0}=({\textit{DRI}}-{\textit{DRI}}_0)V_{{\textit{DRI}}}+({\textit{DGA}}-{\textit{DGA}}_0) V_{{\textit{DGA}}}=0 \end{aligned}$$
(1)

for \((V_{{\textit{DRI}}},V_{{\textit{DGA}}})\) the normal to the line defined by the classifier, which separates the two categories and passes through \(({\textit{DRI}}_0,{\textit{DGA}}_0)\). The linear classifier (1) defines a 1-parametric family of likelihood maps depending on the intercept, namely \(l\):

$$\begin{aligned} LK_{l} := | {\textit{DRI}} \cdot V_{{\textit{DRI}}} + {\textit{DGA}} \cdot V_{{\textit{DGA}}} - ({\textit{DRI}}_0 \cdot V_{{\textit{DRI}}} + {\textit{DGA}}_0 \cdot V_{{\textit{DGA}}}) | = | {\textit{DRI}} \cdot V_{{\textit{DRI}}} + {\textit{DGA}} \cdot V_{{\textit{DGA}}} - l| \end{aligned}$$
(2)

for \(| \cdot |\) the absolute value. The values \(LK_{l}\) can be interpreted as the distance (in the feature space) to the set of pixels that define the border (given by \(LK_{l} =0\)) between non-informative and informative regions. This set of pixels corresponds to local minima of \(LK_{l}\), which appear as an energy valley in Lumen Energy Map (LEM) images. It follows that the boundary separating non-informative and informative regions, which we denote by \(\gamma _{LK_{l}}\), can be extracted using a suitable valley detector.
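Equation (2) is a direct per-pixel computation; a minimal sketch (function name ours) is:

```python
import numpy as np

def likelihood_map(dri, dga, v_dri, v_dga, intercept):
    """LK_l of Eq. (2): per-pixel distance in the (DRI, DGA) feature
    space to the classifier line with intercept l."""
    return np.abs(dri * v_dri + dga * v_dga - intercept)

# Pixels with LK_l == 0 lie exactly on the decision boundary.
dri = np.array([[1.0, 2.0]])
dga = np.array([[3.0, 4.0]])
lk = likelihood_map(dri, dga, 0.5, 0.5, 2.0)
```

The zero-level set of this map is the valley \(\gamma_{LK_l}\) that the detectors below must trace as a closed curve.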

In order for the boundary curve to properly enclose non-informative regions, a main requirement is that the valley operator yields closed regular curves. Closed contours are required for the dynamic selection of the optimal \(l\) value, while regularity is a must for an accurate region segmentation. Under these considerations we have tested three valley-based region boundary detection methods:

  1. GSM2. This valley operator, presented in [11], produces complete valleys by combining steerable filters [12] with operators based on level-set geometry [13]. The geometric operator [13] is computed as the divergence of the maximum eigenvector of the structure tensor, \(\overrightarrow{V}=(P,Q)\), reoriented along the image gradient:

    $$\begin{aligned} NRM :=\mathrm {div}( \overrightarrow{V})=\partial _x P+\partial _y Q \end{aligned}$$
    (3)

    where NRM stands for Normalized Ridge Map and \(P,Q\) stand for the components of the structure tensor eigenvector \(\overrightarrow{V}\). The above operator assigns positive values to ridge pixels and negative values to valley ones. A main advantage is that NRM \(\in [-N, N]\), for \(N\) the dimension of the volume, so that a threshold common to any volume can be set for detecting significant valleys. A main limitation is that it may produce fragmented curves. In [11] this was overcome by further convolving NRM with a bank of steerable filters defined by second derivatives of oriented anisotropic Gaussian kernels \(g_{\sigma }^{\uptheta }\) with standard deviation \(\sigma _y\):

    $$\begin{aligned} \partial ^2_{y} g_{\sigma }^{\uptheta }=(\tilde{y}^2/\sigma _y^4-1/\sigma _y^2) g_{\sigma }^{\uptheta } \end{aligned}$$
    (4)

    for \((\tilde{x}, \tilde{y})\) the coordinates given by a rotation of angle \(\uptheta \) that transforms the y-axis into the unitary vector \((\cos (\uptheta ),\sin (\uptheta ))\), and the scales set to \(\sigma _x = 2 \sigma _y\). The maximum response over a sampling of the orientation, \(\uptheta _i = \{\frac{i \pi }{N}, i=1..N\}\), defines the Geometric Steerable Map (GSM2) as:

    $$\begin{aligned} {\textit{GSM2}} := \max _{i} \left( \partial ^2_{y} g_{\sigma }^{\theta _{i}} *NRM \right) \end{aligned}$$
    (5)
  2. Depth of Valleys Accumulation-based segmentation (DOVA). Depth of Valleys Accumulation (DOVA) energy maps are linked with the likelihood of polyp presence in colonoscopy images [14]. These maps are built from a model of polyp appearance that describes polyp boundaries in terms of valley information. To generate these maps, the authors use a ring of radial sectors to accumulate contributions of pixels with high valley information. We propose here to use the same rationale, in this case taking advantage of the fact that we already know the lumen center, \(c^{max}\), and have \({\textit{LEM}}\) maps. The application of DOVA maps here consists of calculating valley information from the LEM maps to obtain a first approximation of the NIR boundary, which is later refined using the ring of radial sectors. The steps are:

    (a) Definition of a ring of \(ns\) radial sectors centred at \(c^{max}\).

    (b) Calculation of the valley image \({\textit{VLEM}}\) from the LEM maps using the valley detection method proposed in [13].

    (c) Calculation of the position of the maximum of the \({\textit{VLEM}}\) image under each sector \(S_i\) of the ring as \(p^{max}_{i} \in S_i\) such that \(\forall q \in S_i,\; {\textit{VLEM}}(p^{max}_{i}) \ge {\textit{VLEM}}(q)\), with \(i \in [1,ns]\).

    By directly joining the positions \(p^{max}_{i}\) we obtain a first segmentation of the NIR region. As this segmentation may present spikes, we apply median filtering to \(p^{max}_{i}\) in the polar domain to correct these irregularities, so that the maxima of consecutive sectors lie at similar distances from \(c^{max}\).

  3. Watershed with markers (WSM). Watershed segmentation considers a grayscale image as a topographic surface and achieves the segmentation by a process of ‘filling’ catchment basins from local minima. Providing markers helps the algorithm define the catchment basins that must be considered in the segmentation process [15]. For our specific application we use \(c^{max}\) as the internal marker, placing the external marker in a padding mask surrounding the whole image.
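Of the three alternatives, WSM is the most direct to sketch; the snippet below uses scikit-image's marker-controlled watershed in place of the authors' implementation, with the border of the image standing in for the padding mask (function name and synthetic topography are ours):

```python
import numpy as np
from skimage.segmentation import watershed

def wsm_segmentation(image, c_max):
    """Watershed with markers: internal marker at the lumen centre c_max,
    external marker on a border 'padding mask' around the image."""
    markers = np.zeros(image.shape, dtype=int)
    markers[c_max[0], c_max[1]] = 1           # internal (NIR) marker
    markers[0, :] = markers[-1, :] = 2        # external marker on the
    markers[:, 0] = markers[:, -1] = 2        # surrounding border
    labels = watershed(image, markers)
    return labels == 1                        # NIR = basin flooded from c_max

# Synthetic topography: intensity grows with distance from the centre,
# so the central basin is claimed by the internal marker.
yy, xx = np.mgrid[0:32, 0:32]
img = np.hypot(xx - 16, yy - 16)
nir_mask = wsm_segmentation(img, (16, 16))
```

In the actual pipeline the watershed would be run on the LEM map, whose valley acts as the barrier between the two basins.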

For all the methods above, the non-informative region is identified as the one containing the center of the lumen. This point can be computed from the linear classifier (1) using the semi-supervised strategy described in [10].

We observe that the \({\textit{LK}}_{l}\) map best separating non-informative and informative regions should split image pixels into the darkest and brightest ones. The darkest pixels should all lie in the interior of the \({\textit{LK}}_{l}\) boundary, \(\gamma _{{\textit{LK}}_{l}}\), while the image region outside \(\gamma _{{\textit{LK}}_{l}}\) should present a significantly brighter intensity level. We denote pixels belonging to the interior of \(\gamma _{{\textit{LK}}_{l}}\) by \({\textit{NIR}}\) and pixels outside \(\gamma _{{\textit{LK}}_{l}}\) by \({\textit{NIR}}^c\). The difference between internal and external intensities can be measured using the following cost function:

$$\begin{aligned} E_{{\textit{NIR}}}(\gamma _{{\textit{LK}}_{l}}):=\frac{1}{|{\textit{NIR}}|} \sum _{(x,y) \in {\textit{NIR}}} I(x,y) - \frac{1}{|{\textit{NIR}}^c|} \sum _{(x,y) \in {\textit{NIR}}^c} I(x,y) \end{aligned}$$
(6)

for \(|{\textit{NIR}}|\) the number of pixels in the NIR region and \(|{\textit{NIR}}^c|\) the number of pixels outside it. The function \(E_{{\textit{NIR}}}(\gamma _{{\textit{LK}}_{l}})\) attains its minimal value for the optimal intercept, namely \(\widetilde{l}\), that best separates \({\textit{NIR}}\) and \({\textit{NIR}}^c\). This optimal value can be efficiently obtained by exhaustive search over all possible intercept values or by any gradient descent method.
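The exhaustive search can be sketched as follows. As a simplification (our assumption, not the paper's procedure), the candidate NIR for each intercept is taken as a sub-level set of the combined score \({\textit{DRI}}\cdot V_{{\textit{DRI}}}+{\textit{DGA}}\cdot V_{{\textit{DGA}}}\) rather than the region enclosed by the traced valley curve:

```python
import numpy as np

def e_nir(image, nir_mask):
    """Eq. (6): mean intensity inside NIR minus mean intensity outside."""
    inside = image[nir_mask]
    outside = image[~nir_mask]
    if inside.size == 0 or outside.size == 0:
        return np.inf                      # degenerate split: reject
    return inside.mean() - outside.mean()

def best_intercept(image, score_map, intercepts):
    """Exhaustive search for the intercept minimising E_NIR; score_map
    plays the role of DRI*V_DRI + DGA*V_DGA."""
    costs = [e_nir(image, score_map <= l) for l in intercepts]
    return intercepts[int(np.argmin(costs))]

# Toy image: dark upper half, bright lower half; the score equals the
# intensity, so the best intercept isolates the dark half.
img = np.ones((10, 10))
img[:5] = 0.0
l_opt = best_intercept(img, img.copy(), [0.0, 0.5, 1.0])
```

A dark interior and bright exterior drive \(E_{NIR}\) negative, which is why minimization selects the split.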

We show in Fig. 1 a complete example of our non-informative region identification method.

Fig. 1.

Complete example of non-informative region identification: (a) colonoscopy feature space; (b) original image with the NIR boundary superimposed; (c) LEM map with the boundary superimposed; (d) selection of the optimal threshold as the minimal value of the function \(E_{NIR}(\gamma _{LK_{l}})\).

3 Experiments

To validate the performance of our non-informative region identification method, we have used the same database presented in [10], taken from 15 different colonoscopy sequences, each containing a polyp. We have selected those colonoscopy frames having non-informative regions, including lumen and low-visibility regions. The final dataset used in our experiments contains \(100\) frames. One expert provided a mask labelling non-informative regions, excluding the black borders that natively surround colonoscopy frames. We show some examples of frames of our database along with non-informative masks in Fig. 2.

Fig. 2.

Examples of the content of the dataset used in the experiments: (a, c) Original images; (b, d) Non-informative region masks provided by manual annotation by the expert.

Assessment of the proposed methods was quantified using the Annotated Area Covered (AAC) and the Dice Similarity Coefficient (DICE), given by [16]:

$$\begin{aligned} {\textit{AAC}} = 100 \cdot \frac{T_{{\textit{NIR}}}}{{\textit{GT}}_{{\textit{NIR}}}}, \qquad {\textit{DICE}} = 100 \cdot \frac{T_{{\textit{NIR}}}}{{\textit{AUT}}_{{\textit{NIR}}}} \end{aligned}$$
(7)

where \(T_{{\textit{NIR}}}\) stands for the number of pixels correctly labelled as non-informative, \({\textit{GT}}_{{\textit{NIR}}}\) for the number of annotated pixels and \({\textit{AUT}}_{{\textit{NIR}}}\) for the number of pixels detected using the proposed methodology. Both measures are complementary: the former measures how much of the annotated \({\textit{NIR}}\) area is covered, while the latter penalizes the inclusion of informative (\({\textit{NIR}}^c\)) pixels in the detected region.
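Both scores of Eq. (7), as defined in this paper, can be computed from binary masks (function name ours):

```python
import numpy as np

def aac_dice(auto_mask, gt_mask):
    """AAC and DICE of Eq. (7): correctly detected NIR pixels divided by
    annotated NIR pixels (AAC) and by detected NIR pixels (DICE)."""
    t_nir = np.logical_and(auto_mask, gt_mask).sum()
    aac = 100.0 * t_nir / gt_mask.sum()
    dice = 100.0 * t_nir / auto_mask.sum()
    return aac, dice

# Toy masks: detection covers half of the annotation and nothing else,
# so AAC = 50 % (half the annotation found) and DICE = 100 % (no
# informative pixels included).
gt = np.zeros((10, 10), dtype=bool)
gt[:4, :4] = True
auto = np.zeros((10, 10), dtype=bool)
auto[:2, :4] = True
aac, dice = aac_dice(auto, gt)
```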

To explore differences across the three region extraction methods proposed in Sect. 2, we have used a non-parametric analysis of variance, namely the Kruskal-Wallis one-way analysis of variance by ranks [17]. The Kruskal-Wallis test was run on the DICE and AAC scores obtained by each region extraction method, which define the ANOVA groups. Quality scores have been summarized using confidence intervals for their average values [18]. The Kruskal-Wallis test and average score confidence intervals have been computed at a significance level \(\upalpha =0.05\).
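The statistical comparison can be reproduced with SciPy; the scores below are synthetic stand-ins for the per-frame DICE values (hypothetical numbers, not the paper's data, which come from the 100-frame dataset):

```python
import numpy as np
from scipy.stats import kruskal

# Synthetic per-frame DICE scores for the three methods (illustrative
# means loosely matching Table 1; the real test uses measured scores).
rng = np.random.default_rng(0)
gsm2 = rng.normal(65, 5, size=100)
dova = rng.normal(85, 5, size=100)
wsm = rng.normal(75, 5, size=100)

# Kruskal-Wallis one-way analysis of variance by ranks.
stat, p_val = kruskal(gsm2, dova, wsm)
significant = p_val < 0.05        # reject equal-median hypothesis at alpha
```

A post-hoc multiple comparison (Tukey-Kramer over the rank sums, as in Fig. 3) would then identify which pairs of methods differ.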

For both scores, the Kruskal-Wallis test shows a significant difference in the performance of the methods (\(p<10^{-4}\)). Figure 3 shows, for each score, a comparison of the average ranks (the lower, the better) of each method with Tukey-Kramer correction for the multiple comparisons. The intervals shown in the graph are computed so that, to a very close approximation, two estimates being compared are significantly different if their intervals are disjoint, and are not significantly different if their intervals overlap [19]. We have highlighted in blue the best ranked methods and in red the worst ones. For AAC, \({\textit{GSM2}}\) is significantly the best ranked method compared to \({\textit{DOVA}}\) and \({\textit{WSM}}\). Although there is no significant difference between the latter two (their rank intervals overlap), \({\textit{DOVA}}\) has a worse average rank than \({\textit{WSM}}\). This trend reverses in the case of \({\textit{DICE}}\), as the \({\textit{GSM2}}\) rank is significantly worse than those of \({\textit{DOVA}}\) and \({\textit{WSM}}\). As before, there are no significant differences between \({\textit{DOVA}}\) and \({\textit{WSM}}\), but \({\textit{DOVA}}\) has a better average rank.

Fig. 3.

Multicomparison analysis for the two quality scores. The horizontal axis represents the average ranks (the lower, the better).

Table 1. Confidence Intervals for average AAC and DICE

Table 1 reports AAC and DICE confidence intervals for the three methods. Results indicate that \({\textit{GSM2}}\) is the strictest in terms of the amount of selected valid information (with AAC around \(85\,\%\) and DICE around \(65\,\%\)) and might discard areas that could be considered valid for further inspection. On the other hand, \({\textit{DOVA}}\) is the most permissive in terms of information discarding (with DICE around \(85\,\%\) but AAC around \(65\,\%\)) and might include some non-informative areas. Finally, \({\textit{WSM}}\) achieves a compromise between AAC and DICE, with both indexes around \(75\,\%\). Figure 4 shows some qualitative examples of NIR identification using the three methods to delimit the boundary. Manual boundaries are shown in white and automatic ones in green. The first row shows an example of \({\textit{DOVA}}\) under-segmentation but accurate \({\textit{GSM2}}\) and \({\textit{WSM}}\) segmentation of the non-informative region, which includes the lumen and a shadow. The second row shows the opposite behavior, with \({\textit{DOVA}}\) and \({\textit{WSM}}\) providing a more accurate segmentation in contrast to a larger non-informative \({\textit{GSM2}}\) region.

Fig. 4.

Qualitative examples of NIR identification using the three proposed methods (GSM2, DOVA and WSM) as boundary detectors.

Finally, regarding computation time, all the results presented in this paper have been obtained on a PC with an Intel Core i7 3930K six-core (twelve-thread) processor and 8 GB of RAM. The algorithms were implemented as Matlab scripts and compiled functions incorporating the GIPL libraries of the CrossVisions package [20]. Obtaining the LEM energy map from a single image takes \(3.87\) s on a single core. The computation times of the different algorithms for the final NIR calculation from LEM maps are as follows: \(3.82\) s for GSM2, \(4.52\) s for DOVA and \(0.02\) s for WSM. The direct computation of NIR regions using the method proposed in [8, 9] takes \(80.46\) s. All the proposed algorithms are naturally parallelizable and can be integrated into GPU architectures by partitioning the image and assigning individual pixels to cores.

4 Conclusions and Future Work

This paper addresses the identification of non-informative regions (NIR) in colonoscopy frames, which should be discarded at later stages of clinical support algorithms. Automatically discarding NIR saves computational time, allowing a more accurate processing of the valid parts of the image. Aside from computational time savings, discarding frames with large non-informative regions could also be used to automatically create summaries of colonoscopy videos [4], omitting those non-informative frames.

In this work, three different alternatives for NIR segmentation, \({\textit{GSM2}}\), \({\textit{DOVA}}\) and \({\textit{WSM}}\), have been presented and evaluated according to the AAC and DICE scores. The methods can be ranked according to the amount of valid information discarded, from the strictest, \({\textit{GSM2}}\), to the most permissive, \({\textit{DOVA}}\), with \({\textit{WSM}}\) presenting the best compromise, with average scores over \(75\,\%\). This already represents a large improvement over previous results [8, 9], which achieved average DICE and AAC scores around \(50\,\%\). Although very promising, our results have room for improvement. First, visual identification of non-informative regions is a difficult task presenting significant inter-observer variability. To account for it, images will be annotated twice. Second, the proposed feature space works in the gray intensity domain, which usually discards larger areas that include information valid in the color space. This could be overcome by working in a 3D color space and is currently under research.