Generalized multi-scale stacked sequential learning for multi-class classification

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

In many classification problems, the labels of neighboring data points have inherent sequential relationships. Sequential learning algorithms exploit these relationships to improve generalization. In this paper, we revise the multi-scale stacked sequential learning approach (MSSL) to apply it to the multi-class case (MMSSL). We introduce the error-correcting output codes (ECOC) framework into the MSSL classifiers and propose a formulation for calculating confidence maps from the margins of the base classifiers. In addition, we propose an MMSSL compression approach that reduces the number of features in the extended data set without a loss in performance. The proposed methods are tested on several databases and show significant performance improvement over classical approaches.
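As a rough illustration of the pipeline the abstract describes, the sketch below turns per-class classifier margins into confidence sequences and appends multi-scale smoothed copies of them to the original features. It is a minimal sketch under stated assumptions, not the paper's formulation: the sigmoid mapping with a constant `beta`, the one-margin-per-class layout (a one-versus-all simplification of ECOC), the Gaussian truncation at three standard deviations, and all function names are hypothetical choices made for this example. It also keeps only the smoothed value at each sample's own position and does not sample a neighborhood window.

```python
import math


def sigmoid_confidence(margin, beta=1.0):
    """Map a signed classifier margin to a confidence in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-beta * margin))


def gaussian_kernel(sigma):
    """Discrete 1-D Gaussian, truncated at 3 sigma and normalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve a sequence with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), last)
            acc += w * signal[idx]
        out.append(acc)
    return out


def extend_features(features, margins, sigmas, beta=1.0):
    """Append one smoothed confidence value per (class, scale) to each sample.

    features: per-sample feature lists, in sequence order.
    margins:  margins[c][i] is the margin of class c's classifier on sample i
              (assumed layout, one classifier per class).
    sigmas:   Gaussian standard deviations, one per scale.
    """
    extended = [list(f) for f in features]  # copy, leave the input untouched
    for class_margins in margins:
        confidence = [sigmoid_confidence(m, beta) for m in class_margins]
        for sigma in sigmas:
            for row, value in zip(extended, smooth(confidence, sigma)):
                row.append(value)
    return extended


# Four samples with one feature each, two classes, two scales.
features = [[0.1], [0.2], [0.3], [0.4]]
margins = [[2.0, 1.0, -1.0, -2.0], [-2.0, -1.0, 1.0, 2.0]]
extended = extend_features(features, margins, sigmas=[1.0, 2.0])
print(len(extended[0]))  # 1 original feature + 2 classes * 2 scales
```

In a stacked setup, a second classifier would then be trained on the extended rows. The feature count grows with the number of classes times the number of scales, which is the kind of growth a compression step would aim to limit.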

Abbreviations

X :

Set of samples

Y :

Set of labels

x :

A sample

y :

A label

h(x):

A classifier

\(y^{\prime}\) :

A prediction from a classifier

\(y^{\prime\prime}\) :

A final prediction from a chain of classifiers

\(x^{ext}\) :

Extended set

J :

Neighborhood relationship function

z :

Neighborhood model features

ρ :

Neighborhood

θ :

Neighborhood parameterization

w :

Number of elements in the neighborhood window

s :

Number of scales

c :

Set of different classes in a multi-class problem

\(\hat{F}(\mathbf{x}, c)\) :

A prediction confidence map

N :

Number of classes in a multi-class problem

n :

Number of dichotomizers

σ :

Parameter of a Gaussian filter

Σ :

Set of scales defined by σ parameters

b :

A dichotomizer

M :

ECOC coding matrix

\({\mathcal{Y}}\) :

A class codeword in ECOC framework

\({\mathcal{X}}\) :

A sample prediction codeword in ECOC framework

\(m_{x}\) :

Margin for a prediction of sample x

β :

Constant that governs the transition of a sigmoid function

t :

Number of iterations in an AdaBoost classifier

δ :

A soft distance

α :

Normalization parameter for soft distance δ

\(g_{\sigma}\) :

A multidimensional isotropic Gaussian filter with zero mean and standard deviation σ

\({\mathcal{P}}\) :

A set of partitions of classes

P :

A partition of groups of classes

γ :

A symbol in a partition codeword

\(\Gamma\) :

A partition codeword

R :

The mean ranking of each system configuration

E :

The total number of experiments

k :

The total number of system configurations

\(\chi_{2}^{F}\) :

Friedman statistic value
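The last four symbols (R, E, k and the Friedman statistic) relate to a rank-based comparison of system configurations. The sketch below shows how such a statistic is commonly computed, using the standard form chi2_F = 12E / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4). It is an illustration only: the function names, the "higher score is better" convention, and the tie handling by average ranks are assumptions of this example, and the source does not confirm that the article computes the statistic in exactly this way.

```python
def average_ranks(scores):
    """Rank one experiment's scores (1 = highest score), averaging tied ranks."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0  # mean of the 1-based positions
        for j in order[pos:end + 1]:
            ranks[j] = shared
        pos = end + 1
    return ranks


def friedman_statistic(results):
    """Return (R, chi2_F) for results[e][j], the score of configuration j
    in experiment e. R[j] is the mean rank of configuration j."""
    E = len(results)       # total number of experiments
    k = len(results[0])    # total number of system configurations
    per_experiment = [average_ranks(row) for row in results]
    R = [sum(r[j] for r in per_experiment) / E for j in range(k)]
    chi2 = 12.0 * E / (k * (k + 1)) * (
        sum(r * r for r in R) - k * (k + 1) ** 2 / 4.0
    )
    return R, chi2


# Made-up accuracies: 4 experiments (rows) x 3 configurations (columns).
results = [
    [0.90, 0.80, 0.70],
    [0.85, 0.80, 0.60],
    [0.95, 0.70, 0.90],
    [0.90, 0.85, 0.80],
]
R, chi2 = friedman_statistic(results)
print(R)     # [1.0, 2.25, 2.75]
print(chi2)  # 6.5
```

The numbers in the usage example are invented for illustration and are not results from the article. A larger statistic indicates that the mean ranks differ more than would be expected if all configurations performed alike.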

Acknowledgments

This work has been supported in part by the projects TIN2009-14404-C02, IMSERSO Mediminder and RecerCaixa 2011 Remedi.

Author information

Corresponding author

Correspondence to Eloi Puertas.

About this article

Cite this article

Puertas, E., Escalera, S. & Pujol, O. Generalized multi-scale stacked sequential learning for multi-class classification. Pattern Anal Applic 18, 247–261 (2015). https://doi.org/10.1007/s10044-013-0333-y

  • DOI: https://doi.org/10.1007/s10044-013-0333-y
