Pattern Recognition Letters

Volume 27, Issue 3, February 2006, Pages 201-209

Boosting the distance estimation: Application to the K-Nearest Neighbor Classifier

https://doi.org/10.1016/j.patrec.2005.08.019

Abstract

In this work we introduce a new distance estimation technique based on boosting and apply it to the K-Nearest Neighbor classifier (K-NN). Instead of applying AdaBoost to a typical classification problem, we use it to learn a distance function, and the resulting distance is then used within K-NN. The proposed method (Boosted Distance with Nearest Neighbor) outperforms the AdaBoost classifier when the training set is small. It also outperforms the K-NN classifier used with several different distances, as well as with distances obtained by other estimation methods such as Relevant Component Analysis (RCA) (Bar-Hillel et al., 2003). Furthermore, our distance estimation performs dimension reduction and is considerably more effective, in terms of classification accuracy, than classical techniques such as PCA, LDA and NDA. The method has been thoroughly tested on 13 standard databases from the UCI repository, a standard gender recognition database and the MNIST database.

Introduction

Many applications in computer vision and pattern recognition use a distance function as a means of comparing objects. Well-known examples are the K-Nearest Neighbor classifier (K-NN), unsupervised classifiers such as K-means, kernel-based classifiers such as the Support Vector Machine (SVM), and ranking-based retrieval in large databases. Obtaining an appropriate distance function can significantly improve the performance of these methods. In this work we propose a new framework for estimating the distance and apply it to improve the performance of K-NN. The most common choice is the Euclidean distance, which implicitly assumes that the data follow an isotropic Gaussian distribution. When the feature space has a large number of dimensions, the isotropic assumption is especially inappropriate and the performance of the classifier degrades.

The typical procedure is to obtain a more accurate distance by considering the family of (anisotropic) Mahalanobis distances (x − y)^T W (x − y), where the goal is to estimate the weight matrix W. If d is the number of dimensions, the matrix W contains d^2 parameters to be estimated, which is not robust when the training set is small compared to the number of dimensions d. Furthermore, by using this distance we are still assuming that the distribution of the data is Gaussian, which is not necessarily true. In order to reduce the number d of dimensions, we can apply classical techniques such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) or Non-parametric Discriminant Analysis (NDA) (Duda et al., 2001, Fukunaga and Mantock, 1997). However, these methods again require estimating a covariance or scatter matrix with d^2 elements, which does not ameliorate the problem in the presence of a small training set.
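To make the parameter count concrete, the following sketch (our own Python illustration, not code from the paper) evaluates the Mahalanobis form (x − y)^T W (x − y) with W taken as the inverse of a regularized sample covariance; the covariance alone has d^2 entries, which is exactly the estimation burden discussed above when only a few training samples are available.

```python
import numpy as np

def mahalanobis_distance(x, y, W):
    """Anisotropic distance (x - y)^T W (x - y)."""
    diff = x - y
    return float(diff @ W @ diff)

# Toy data: N samples in d dimensions, with N small relative to d^2.
rng = np.random.default_rng(0)
N, d = 30, 10
X = rng.normal(size=(N, d))

# W estimated as the inverse sample covariance: d^2 parameters estimated from
# only N samples, so the estimate is noisy and may even be singular
# (hence the small regularization term on the diagonal).
cov = np.cov(X, rowvar=False) + 1e-3 * np.eye(d)
W = np.linalg.inv(cov)

print(mahalanobis_distance(X[0], X[1], W))
```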

Instead of estimating a parametric distance (or similarity), we propose to use a functional approximation where the similarity function is estimated by a generalization method (classifier). In particular, we propose to use AdaBoost with decision stumps (Schapire and Singer, 1999) in order to estimate the similarity function. Given a training set with feature vectors Ω = {x_i}, i = 1, …, N, the similarity estimation is done by training AdaBoost on differences between vectors in Ω, where each difference vector x_i − x_j has an associated label (desired value). Based on this training set, a function H is learnt by AdaBoost, and from it we derive a similarity S(x, y) ∈ [0, 1] that is accurate in a classification context. Finally, the estimated similarity S is incorporated into K-NN. Using K-NN along with a boosted similarity is especially suitable when the training set is small. Two factors contribute to this. First, if N is the size of the original training set, the new training set has O(N^2) relations between the vectors, which makes AdaBoost more robust against over-fitting. Second, AdaBoost complements K-NN by providing a similarity adapted to the characteristics of the given data. Increased effectiveness for small training sets is necessary in many real classification problems in general, and in particular in applications such as retrieval, where only a small training set is expected to be provided online by the user.
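A minimal sketch of this idea follows (our own illustration, assuming scikit-learn's AdaBoostClassifier, whose default base learner is a depth-1 decision tree, i.e. a decision stump; the exact pair encoding, labeling and weighting used in the paper may differ). AdaBoost is trained on pairwise difference vectors labeled by whether the pair comes from the same class, the boosted margin H is squashed into a similarity in [0, 1], and that similarity replaces the distance inside a K-NN rule.

```python
import numpy as np
from itertools import combinations
from sklearn.ensemble import AdaBoostClassifier

def train_boosted_similarity(X, y, n_stumps=50):
    """Learn a similarity by boosting decision stumps on pairwise differences.
    Each pair (i, j) yields a training vector |x_i - x_j| (absolute difference
    keeps the encoding symmetric; the paper's exact encoding may differ) with
    label 1 for same-class pairs and 0 for different-class pairs."""
    diffs, labels = [], []
    for i, j in combinations(range(len(X)), 2):          # O(N^2) relations
        diffs.append(np.abs(X[i] - X[j]))
        labels.append(1 if y[i] == y[j] else 0)
    booster = AdaBoostClassifier(n_estimators=n_stumps)  # default base learner: decision stump
    booster.fit(np.asarray(diffs), np.asarray(labels))
    return booster

def similarity(booster, x, z):
    """Map the boosted margin H(|x - z|) to a similarity S(x, z) in [0, 1]."""
    margin = booster.decision_function(np.abs(x - z).reshape(1, -1))[0]
    return 1.0 / (1.0 + np.exp(-margin))                  # logistic squashing (one possible choice)

def knn_predict(booster, X_train, y_train, x, k=3):
    """K-NN that ranks training points by the boosted similarity.
    Assumes integer class labels 0..C-1."""
    sims = np.array([similarity(booster, x, z) for z in X_train])
    nearest = np.argsort(-sims)[:k]                       # k most similar training points
    return np.bincount(y_train[nearest]).argmax()
```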

In summary, the proposed method has three main advantages: (i) it adds effectiveness to the classifier when the training set is small compared to the number of dimensions; (ii) the estimated similarity only uses a small number of dimensions, resulting in a powerful dimension-reduction technique that does not suffer from the problems highlighted above for classical techniques (e.g. PCA, LDA, NDA); (iii) in contrast to other works (Peng et al., 2001, Domeniconi et al., 2002), the proposed similarity estimation is general, i.e. it is not specifically designed for K-NN.
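Advantage (ii) comes from the fact that each decision stump thresholds a single feature, so the learned similarity only inspects the dimensions actually selected by boosting. As a small, purely illustrative follow-up to the sketch above, those dimensions can be read off the trained ensemble:

```python
def selected_dimensions(booster):
    """Indices of the features used by the boosted stumps, i.e. the only
    dimensions the learned similarity depends on."""
    dims = {est.tree_.feature[0] for est in booster.estimators_
            if est.tree_.feature[0] >= 0}   # negative value marks a degenerate (leaf-only) stump
    return sorted(dims)
```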

The rest of the paper is organized as follows: in Section 2 we review previous work on similarity estimation methods, in Section 3 we describe the proposed method, in Section 4 we provide results comparing our method with other methods, and in Section 5 we conclude and suggest future lines of work.

Section snippets

Related work

Recently, there have been several works on estimating the distance in order to improve performance on certain pattern recognition problems (Hertz et al., 2004, Domeniconi et al., 2002, Peng et al., 2001, Xing et al., 2003, Bar-Hillel et al., 2003). Domeniconi et al. (2002) and Peng et al. (2001) propose specific estimations designed for the K-NN classifier. They obtain an anisotropic distance based on local neighborhoods that are narrower along relevant dimensions and more elongated along non-relevant ones. Xing et

Distance estimation by boosting

In this section, a similarity estimation method is introduced that uses AdaBoost for learning the similarity. We seek a similarity function that is particular to each class C (for example, consider the Mahalanobis distance, which depends on the class C through the covariance Σ of that class). Let S_C be the similarity function we want to learn for class C. First, let us describe the application of a class-dependent similarity S_C in any distance-based classifier such as the Nearest Neighbor.
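The section is truncated here, so the exact decision rule is not visible. Purely as an illustration of how a class-dependent similarity S_C can be plugged into a Nearest Neighbor rule, one possible (assumed) instantiation is to score a test point, for each class C, by its highest similarity S_C to a training example of that class, and to predict the class with the largest score:

```python
import numpy as np

def nn_with_class_similarities(similarities, X_train, y_train, x):
    """Nearest-neighbor rule with one learned similarity per class.
    `similarities[c]` is a function S_c(x, z). The prediction is the class
    whose own similarity ranks some training point of that class highest.
    (One plausible reading of a class-dependent NN rule, not necessarily
    the exact rule of the paper, whose text is truncated above.)"""
    best_class, best_sim = None, -np.inf
    for c, S_c in similarities.items():
        members = X_train[y_train == c]
        sim_c = max(S_c(x, z) for z in members)   # nearest neighbor of class c under S_c
        if sim_c > best_sim:
            best_class, best_sim = c, sim_c
    return best_class
```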

Results

We tested the performance of the proposed distance estimation within the Nearest Neighbor classifier and compared it with the classical AdaBoost classifier. We also compared our method with Nearest Neighbor using several other distances and dimension-reduction methods such as PCA, LDA (Duda et al., 2001) and NDA (Fukunaga and Mantock, 1997). In order to test the performance of the different methods, we used 13 standard data sets from the UCI repository (Merz and Murphy, 1996) and two other known databases

Discussion

In this work, we provided an efficient method for estimating the distance by boosting. We showed that the Nearest Neighbor classifier used along with the proposed (Boosted) Distance outperforms the popular AdaBoost classifier. In particular, we used AdaBoost with decision stumps for learning the distance, and showed that NN with the Boosted Distance is an efficient classifier in the presence of high-dimensional spaces and small training sets. This is very important in many applications where we

References (17)

  • Bar-Hillel, A., et al., 2003. Learning distance functions using equivalence relations. In: Proc. Int'l Conf. Mach. Learn. (ICML).
  • Domeniconi, C., et al., 2002. Locally adaptive metric nearest-neighbor classification. IEEE TPAMI.
  • Duda, R.O., et al., 2001. Pattern Classification. John Wiley and Sons Inc.
  • Freund, Y., et al., 1996. Experiments with a new boosting algorithm. In: Proc. Int'l Conf. Mach. Learn.
  • Fukunaga, K., et al., 1997. Nonparametric discriminant analysis. IEEE TPAMI.
  • Hertz, T., et al., 2004. Learning distance functions for image retrieval. In: Proc. IEEE CVPR.
  • Jacobs, D.W., et al., 2000. Classification with nonmetric distances: Image retrieval and class representation. IEEE TPAMI.
  • LeCun, Y., 1998. MNIST database. Available from:...

Cited by (42)

  • Semi-supervised double sparse graphs based discriminant analysis for dimensionality reduction

    2017, Pattern Recognition
    Citation Excerpt:

    Although pseudo-labels are not real labels, the active influence of them could not be ignored because of their relationship with labeled samples. The k-nearest-neighbor method has been successfully applied in many applications such as classification [45] and graph construction [15,17]. Because of its simplicity and effectiveness, the idea of it is employed to select pseudo-labeled samples from unlabeled samples in the proposed methods.

  • A direct boosting algorithm for the k-nearest neighbor classifier via local warping of the distance metric

    2012, Pattern Recognition Letters
    Citation Excerpt:

    In the case of the Boosted Distance algorithm, we are not confident in our implementation of the algorithm, due to the fact that we found an error in the pseudo code in (Amores et al., 2006). Therefore, only results for the sonar, ionosphere and liver dataset which are obtained from Amores et al. (2006) are included in the comparison. ( Please note that there are slight differences in the experiment setup.)

  • Classification of time-frequency representations based on two-direction 2DLDA for gear fault diagnosis

    2011, Applied Soft Computing Journal
    Citation Excerpt:

    In this way, fifty features can be calculated for each sample. In order to evaluate the performances of the above four feature extraction approaches, three popular classifiers means the K nearest neighbor classifier (KNNC) [33–36], Naïve Bayes classifier (NBC) [37–39] and least-square support vector machine (LS-SVM) [40–42] were employed as induction algorithms for gear fault classification. The KNNC and NBC were implemented by utilizing the Matlab Toolbox for Pattern Recognition (PRTools 4.1) [43].

1. Partially supported by CICYT TIC2000-1635-C04-04, Spain.
