
Pattern Recognition Letters

Volume 30, Issue 3, 1 February 2009, Pages 285-297

Separability of ternary codes for sparse designs of error-correcting output codes

https://doi.org/10.1016/j.patrec.2008.10.002

Abstract

Error-correcting output codes (ECOC) are a successful framework for dealing with multi-class categorization problems by combining binary classifiers. With the extension of the binary ECOC framework to ternary codes, new ECOC designs have been proposed to better adapt to the distribution of the data. To decode ternary matrices, recent works redefined several decoding strategies that were originally formulated for just two symbols. However, the coding step is also affected by the third symbol and therefore needs to be reconsidered. In this paper, we present a new formulation of the ternary ECOC distance and of the error-correcting capabilities in the ternary ECOC framework. Based on the new measure, we show how to design coding matrices that prevent codification ambiguity, and we propose a new sparse random coding matrix that maximizes the ternary distance. Results on a wide set of UCI Machine Learning Repository data sets and on a real speed traffic-sign categorization problem show that when the coding design satisfies the new ternary measures, a significant performance improvement is obtained independently of the decoding strategy applied.

Introduction

In the literature, one can find several powerful types of binary classifiers. However, many of these learning techniques cannot directly handle multi-class classification problems. Instead, it is common to construct classifiers that distinguish between just two classes and to combine them. In this sense, error-correcting output codes (ECOC) were introduced as a general framework for combining binary problems to address the multi-class problem. The strategy was introduced by Dietterich and Bakiri (1995). Based on error-correcting principles (Dietterich and Bakiri, 1995), and because of its ability to correct the bias and variance errors of the base classifiers (Kong and Dietterich, 1995), ECOC has been successfully applied to a wide range of Computer Vision applications, such as face recognition (Windeatt and Ardeshir, 2003), face verification (Kittler et al., 2001), text recognition (Ghani, 2001), and handwritten digit classification (Zhou and Suen, 2005).

The ECOC technique can be broken down into two general stages: encoding and decoding. Given a set of classes, the coding stage designs a codeword for each class based on different binary problems. The decoding stage makes a classification decision for a given test sample based on the value of the output code.

At the coding step, given a set of N classes to be learnt, n different bi-partitions (groups of classes) are formed, and n binary problems (dichotomizers) are trained. As a result, a codeword of length n is obtained for each class, where each bit of the code corresponds to the response of a given dichotomizer (coded by +1 or −1 according to its class set membership). Arranging the codewords as rows of a matrix, we define a coding matrix M, where M ∈ {−1, +1}^(N×n) in the binary case. The most well-known binary coding strategies are the one-versus-all strategy (Nilsson, 1965), where each class is discriminated against the rest of the classes, and the dense random strategy (Allwein et al., 2002), where a random matrix M is generated by maximizing the separability of rows and columns in terms of the Hamming distance (Dietterich and Bakiri, 1995). In Fig. 1a, the one-versus-all ECOC design for a 4-class problem is shown. The white regions of the coding matrix M correspond to positions coded by +1, and the black regions to −1. Thus, the codeword for class c1 is {+1, −1, −1, −1}. Each column j of the coding matrix codifies a binary problem learnt by its corresponding dichotomizer hj. For instance, dichotomizer h1 learns c1 against classes c2, c3, and c4; dichotomizer h2 learns c2 against classes c1, c3, and c4; etc. An example of a dense random matrix for a 4-class problem is shown in Fig. 1c.
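The two binary coding strategies above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function names are invented, and for the dense random strategy only the row-separability criterion is shown (Allwein et al. also consider column separability).

```python
import numpy as np

def one_vs_all_matrix(n_classes):
    """One-versus-all coding: N dichotomizers, each separating one
    class (+1) from the remaining classes (-1)."""
    M = -np.ones((n_classes, n_classes), dtype=int)
    np.fill_diagonal(M, 1)
    return M

def dense_random_matrix(n_classes, n_cols, n_trials=1000, seed=0):
    """Dense random coding (row-separability part only): draw random
    {-1,+1} matrices and keep the one maximizing the minimum pairwise
    Hamming distance between rows."""
    rng = np.random.default_rng(seed)
    best, best_sep = None, -1
    for _ in range(n_trials):
        M = rng.choice([-1, 1], size=(n_classes, n_cols))
        sep = min(
            np.sum(M[i] != M[j])
            for i in range(n_classes) for j in range(i + 1, n_classes)
        )
        if sep > best_sep:
            best, best_sep = M, sep
    return best

print(one_vs_all_matrix(4)[0])  # codeword for class c1: [ 1 -1 -1 -1]
```

For the one-versus-all design, the minimum Hamming distance between any two rows is always 2, which bounds its error-correcting capability.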

The coding step received special attention when Allwein et al. (2002) introduced a third symbol (the zero symbol) into the coding process. By allowing some classes to be ignored, this symbol increases the number of class partitions that can be considered in a ternary ECOC framework, and the coding matrix becomes M ∈ {−1, 0, +1}^(N×n). In this case, the zero symbol means that a particular class is not considered by a certain binary classifier. Thanks to this, strategies such as one-versus-one (Hastie and Tibshirani, 1998) and sparse random coding (Allwein et al., 2002) have been formulated in the ECOC framework. Fig. 1b shows the one-versus-one ECOC configuration for a 4-class problem; here, the grey positions correspond to the zero symbol. A possible sparse random matrix for a 4-class problem is shown in Fig. 1d. Recently, new improvements in ternary ECOC coding have demonstrated the suitability of the ECOC methodology for multi-class classification problems (Pujol et al., 2006, Escalera et al., 2007). These recent designs use knowledge of the problem domain to learn relevant binary problems from ternary codes. The basic idea of these methods is to use the training data to guide the training process and thus to construct the coding matrix M focusing on the binary problems that better fit the decision boundaries of a given data set.
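The one-versus-one design is the simplest ternary coding to write down: one column per pair of classes, with all other classes set to zero. A minimal sketch (function name is illustrative):

```python
import numpy as np
from itertools import combinations

def one_vs_one_matrix(n_classes):
    """One-versus-one ternary coding: one column per class pair (i, j),
    coded +1 for class i, -1 for class j, and 0 for the ignored classes."""
    pairs = list(combinations(range(n_classes), 2))
    M = np.zeros((n_classes, len(pairs)), dtype=int)
    for col, (i, j) in enumerate(pairs):
        M[i, col] = 1
        M[j, col] = -1
    return M

M = one_vs_one_matrix(4)
print(M.shape)  # (4, 6): N(N-1)/2 = 6 binary problems for N = 4
```

Note that every column contains exactly one +1, one −1, and N − 2 zeros, so each dichotomizer is trained on only two classes.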

The decoding step was originally based on error-correcting principles under the assumption that the learning task can be modeled as a communication problem in which class information is transmitted over a channel (Dietterich and Bakiri, 1995). During the decoding process, applying the n binary classifiers yields a code for each data point in the test set. This code is compared to the base codeword of each class defined in the matrix M, and the data point is assigned to the class with the closest codeword. The most frequently applied decoding strategies are the Hamming (HD) (Nilsson, 1965) and the Euclidean (ED) (Hastie and Tibshirani, 1998) decoding distances. With the introduction of the zero symbol, Allwein et al. (2002) showed the advantage of using a loss-based function of the output margin of the base classifier. Recently, Escalera et al. (2008) proposed a loss-weighted decoding strategy, where a set of probabilities based on the performances of the base classifiers is used to weight the final classification decision. In Fig. 1, each ECOC codification is used to classify an input object X. The input X is tested with each dichotomizer hj, obtaining an output Xj, j ∈ {1, ..., n}. The final code {X1, ..., Xn} of the test input X is used by a given decoding strategy to obtain the final classification decision. Note that in both the binary and the ternary ECOC frameworks, no position Xj of the test codeword can take the value zero, since the output of each dichotomizer is hj ∈ {−1, +1}; consequently, positions of M coded by zero automatically increase the distance/error.
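The Hamming and Euclidean decoding steps above can be sketched as follows. This is an illustrative implementation under the common generalization d(y, x) = Σ_j (1 − sign(y_j · x_j)) / 2 for the Hamming case, which makes the effect noted in the text explicit: a zero position in a codeword contributes 0.5 to the distance whatever the classifier outputs.

```python
import numpy as np

def hamming_decode(M, x):
    """Generalized Hamming decoding: for each row y of M, accumulate
    (1 - sign(y_j * x_j)) / 2 over positions. A zero entry in the
    codeword yields sign(0) = 0, contributing a fixed 0.5 penalty."""
    d = np.sum((1 - np.sign(M * x)) / 2, axis=1)
    return int(np.argmin(d))

def euclidean_decode(M, x):
    """Euclidean decoding: assign to the class whose codeword is
    nearest to the test code in L2 norm."""
    return int(np.argmin(np.linalg.norm(M - x, axis=1)))

# One-versus-all matrix for 4 classes and a test code from the dichotomizers:
M = np.array([[ 1, -1, -1, -1],
              [-1,  1, -1, -1],
              [-1, -1,  1, -1],
              [-1, -1, -1,  1]])
x = np.array([1, -1, -1, -1])
print(hamming_decode(M, x), euclidean_decode(M, x))  # both assign class 0 (c1)
```

For binary matrices the two decodings rank the classes identically; they differ once margins or zeros enter the codewords.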

To deal with multi-class categorization problems in the ternary ECOC framework, recent works redefined decoding strategies that were originally formulated for just two symbols (Escalera et al., 2008, Allwein et al., 2002). However, the influence of the zero symbol on the error-correcting capabilities and on the design of coding strategies has not been taken into account. In this paper, we formulate the ternary distance and the ternary error-correcting capabilities of the ternary ECOC framework. We propose a new sparse coding design based on maximizing the new ternary distance. We evaluate the methodology on a wide set of UCI Machine Learning Repository data sets and on a real Computer Vision problem: speed traffic-sign categorization. The results show that when the new ternary distance is considered in sparse designs, a significant performance improvement is obtained.
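The paper's exact ternary distance is defined in Section 2 and is not reproduced in this excerpt. As a rough illustration of why the binary Hamming measure breaks down, one can count disagreements only over positions where both codewords are nonzero, i.e. where some dichotomizer actually sees both classes. The function below is a hypothetical illustrative measure, not the authors' formulation:

```python
import numpy as np

def shared_position_distance(r1, r2):
    """Illustrative measure (not the paper's formula): count disagreements
    only where both codewords are nonzero. A result of 0 with no shared
    active position means no dichotomizer can tell the two classes apart,
    regardless of what the plain Hamming distance suggests."""
    active = (r1 != 0) & (r2 != 0)
    return int(np.sum(r1[active] != r2[active]))

# Two rows with no shared active dichotomizer: codification ambiguity.
r1 = np.array([1, -1, 0, 0])
r2 = np.array([0, 0, 1, -1])
print(shared_position_distance(r1, r2))  # 0 -> classes are indistinguishable
```

A naive Hamming count over all four positions would report these rows as well separated, which is precisely the kind of inconsistency a sparse design must avoid.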

The paper is organized as follows: Section 2 overviews the ECOC random designs and presents a new sparse coding design based on ternary distance maximization. Section 3 presents the experimental results. Finally, Section 4 concludes the paper.


Random ECOC designs

In this section, we overview both dense and sparse random ECOC designs (Allwein et al., 2002). We show the inconsistency of the classical sparse random design and introduce a new measure for sparse coding designs.

Results

Before presenting the results, we discuss the data, the comparatives, and the measurements used in the experiments.

  • Data: The data used for the experiments consists of 16 multi-class data sets from the UCI Machine Learning Repository database (Asuncion and Newman, 2007). The details of the data sets are shown in Table 1.

    We also use the video sequences obtained from a Mobile Mapping System (Casacuberta et al., 2004) to test the methods in a real traffic sign categorization problem.

  • Comparative: For the

Conclusions

In this paper, we introduced a new formulation of the ternary distance that defines class separability in the ternary ECOC framework. We showed that row separability in terms of the Hamming distance from the binary ECOC framework cannot be applied in the ternary case. Based on the new measure, we illustrated that the design of the standard sparse random strategy is inconsistent, and we presented a new sparse random construction. The results show that the new design applied with any

Acknowledgments

This work has been supported in part by projects TIN2006-15308-C02, FIS PI061290, and CONSOLIDER-INGENIO CSD 2007-00018.

References (20)

  • Allwein, E., Schapire, R., Singer, Y., 2002. Reducing multiclass to binary: A unifying approach for margin classifiers. ...
  • Asuncion, A., Newman, D., 2007. UCI Machine Learning Repository, University of California, Irvine, School of ...
  • Casacuberta, J., Miranda, J., Pla, M., Sanchez, S., Serra, A., Talaya, J., 2004. On the accuracy and performance of the ...
  • Dietterich, T., Bakiri, G., 1995. Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res.
  • Escalera, S., Pujol, O., Radeva, P., 2006. Decoding of ternary error correcting output codes. In: CIARP, vol. 4225, pp. ...
  • Escalera, S., Pujol, O., Radeva, P., 2007. Boosted landmarks of contextual descriptors and Forest-ECOC: A novel framework to detect and classify objects in cluttered scenes. Pattern Recognition Lett.
  • Escalera, S., Pujol, O., Radeva, P., 2008. Loss-weighted decoding for error-correcting output codes. In: Internat. ...
  • Friedman, J., Hastie, T., Tibshirani, R., 1998. Additive logistic regression: A statistical view of boosting. Ann. Statist.
  • Ghani, R., 2001. Combining labeled and unlabeled data for text classification with a large number of categories. In: ...
  • Hastie, T., Tibshirani, R., 1998. Classification by pairwise coupling. In: NIPS, vol. 26, ...
