
Knowledge-Based Systems

Volume 145, 1 April 2018, Pages 219-235

An overview of incremental feature extraction methods based on linear subspaces

https://doi.org/10.1016/j.knosys.2018.01.020

Highlights

  • Categorized overview of incremental feature extraction based on linear subspace methods.

  • New information is added as it becomes available while retaining previously acquired knowledge.

  • Emphasis is placed on methods with orthogonal matrix constraints based on a global loss function.

  • Incremental approaches are differentiated according to the subspace model.

  • Computational complexity, experimental setup and performance are analyzed.

Abstract

With the massive explosion of machine learning in our day-to-day life, incremental and adaptive learning has become a major topic, crucial to keep up to date and improve classification models and their corresponding feature extraction processes. This paper presents a categorized overview of incremental feature extraction based on linear subspace methods, which aim to incorporate new information into the already acquired knowledge without accessing previous data. Specifically, this paper focuses on those linear dimensionality reduction methods with orthogonal matrix constraints based on a global loss function, due to the extensive use of their batch approaches compared with other linear alternatives. Thus, we cover the approaches derived from Principal Components Analysis, Linear Discriminant Analysis and Discriminative Common Vector methods. For each basic method, its incremental approaches are differentiated according to the subspace model and matrix decomposition involved in the updating process. Beyond this categorization, several updating strategies are distinguished according to the amount of data used per update and to whether a static or dynamic number of classes is considered. Moreover, the specific role of the size/dimension ratio in each method is considered. Finally, computational complexity, experimental setup and the accuracy rates reported in published results are compiled and analyzed, and an empirical evaluation is carried out to compare the best approach of each kind.

Introduction

Processing large amounts of data is nowadays a challenging task in the field of pattern recognition, which aims to extract meaningful information embedded in the data. As a first general step, appropriate structured descriptors or features must be selected or extracted from raw data through a learning process using any prior information available. This leads to a more discriminative data representation with lower dimensionality, facilitating the subsequent steps in machine learning and data mining pipelines. The traditional way to extract these features is usually based on batch learning. However, this requires that all the data be available from the beginning and used as a whole, which is not convenient or even feasible in most online, interactive or stream-based processing applications. Several application domains such as autonomous navigation systems [1], human-robot interaction [2], object tracking [3], image classification [4], stream processing [5], face recognition [6] or recommendation systems [7], [8], [9], [10] are examples where a complete set of training samples is usually not known in advance but is generally provided little by little. Moreover, in some cases the properties of the data may change as new data is considered. For instance, in face recognition tasks, human faces may show large variations depending on expression, lighting conditions, make-up, hairstyle, aging and so forth. When a human is registered in a person identification system, it is quite difficult to account for all this facial variability in advance [11]; instead, it is more convenient to discover it during the operation of the system.

As an effective alternative, the paradigm of incremental or adaptive learning has been considered and deeply studied as a pattern recognition and machine learning subfield in its own right. With incremental learning, feature extraction processes should be capable of incorporating newly available information while retaining the previously acquired knowledge, without accessing the previously processed training data. This is especially challenging in the era of big data, where new chunks of data continuously appear and new classification objectives arise.

Among the huge number of incremental learning schemes, this paper focuses on linear subspace-based incremental feature extraction methods with orthogonal matrix constraints based on a global loss function, due to the extensive use of their batch approaches compared with other linear alternatives with unconstrained objectives, such as probabilistic PCA [3], [12], or matrix factorization methods [7], [8], [9], [10], [13], [14], [15], [16], popular mostly for building collaborative filtering in recommender systems. Note that not all linear feature extraction methods need to produce orthogonal projections, or indeed projections at all. While subspace-based methods can be based on linear and non-linear subspaces, linear methods are the most extensively used, even in highly non-linear problems where the non-linearity is modeled in the subsequent feature extraction and classification stages instead. An example of this is the use of linear dimensionality reduction methods in modern deep learning architectures as a preprocessing step to reduce the number of parameters to be learned and the number of training samples [17], [18], [19]. Moreover, these techniques have been used in recent years in many successful applications such as object tracking [20], [21], [22], as well as in other fields, such as pharmaceutics [23], medical imaging [24], [25], agriculture [26], industrial applications [27], chemometrics [28], [29], pattern recognition [30] or bioinformatics [31], [32].

Therefore, this paper presents a categorized overview of the research done over the past decades on linear subspace-based incremental feature extraction and dimensionality reduction for matrices and general applications. Special emphasis is put on those methods with orthogonal matrix constraints based on a global loss function, such as Principal Components Analysis (PCA), Linear Discriminant Analysis (LDA) and Discriminative Common Vector (DCV) methods, over methods with unconstrained objectives, such as probabilistic PCA [3], [12] or matrix factorizations [7], [8], [9], [10], [14], [15], [16]. Similarly, we consider those incremental methods more closely related to subspace-to-subspace matching [33], [34] and tensor factorization [35], [36], [37], [38] to be out of scope. By restricting ourselves to these methods, we can both keep our survey to a manageable size and concentrate on the basic ideas behind the different incremental approaches, which are usually shared across a wider range of works. For the same reason, we have omitted incremental nonlinear extensions of the above methods [39].

In the present work we will differentiate methods according to the subspace model used. From this viewpoint, two main categories of incremental subspace-based methods are usually considered, depending on whether or not the scatter matrices involved are explicitly formed and computed (using different forms of decomposition). Some of these variants are referred to in the literature as covariance-based or covariance-free methods; a small sketch of the covariance-free idea is given below. Table 1 summarizes all the papers considered in the present work according to the subspace model used and the computation (or not) of these matrices.
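
To make the covariance-free idea concrete, the following minimal sketch (in Python with NumPy; the function name, learning rate and toy data are illustrative, not taken from any surveyed paper) applies Oja's classic learning rule, which estimates the leading principal direction from a stream of samples without ever forming a scatter matrix:

```python
import numpy as np

def oja_update(w, x, lr=0.01):
    """One covariance-free update of the current direction estimate w.

    Oja's rule: no scatter matrix is ever formed; each zero-mean sample
    x (shape (d,)) adjusts w directly.
    """
    y = w @ x                      # scalar projection of the sample onto w
    w = w + lr * y * (x - y * w)   # Hebbian growth with a self-normalizing decay
    return w / np.linalg.norm(w)   # explicit renormalization for stability

# Usage: stream zero-mean samples one at a time.
rng = np.random.default_rng(0)
d = 10
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
for _ in range(5000):
    x = rng.standard_normal(d) * np.linspace(3.0, 0.5, d)  # anisotropic toy stream
    w = oja_update(w, x)
# w now approximates the leading eigenvector of the data covariance,
# i.e. the first principal component, up to sign.
```

Covariance-based methods, in contrast, maintain the scatter matrix (or a decomposition of it) explicitly and update it as new data arrives.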

To complete this multidimensional taxonomy, which is graphically illustrated in Fig. 1, different ways of feeding incremental algorithms are considered. The first is the data size required for each update, which may range from a single sample at a time to moderate chunks of data. The second is the data labels, i.e. whether the set of labels in the corresponding classification problem is fixed beforehand or may grow arbitrarily along the incremental process. We will refer to these two aspects as chunk size and chunk label structure, respectively. Finally, we will also consider the size/dimension ratio, explicitly distinguishing the case in which the input space dimension is much greater than the expected data stream size, since this constitutes a requirement for, or strongly conditions, particular methods. Facing very small values of this ratio is usually referred to as the small sample size (SSS) case.

The paper is organized around the above taxonomy, paired with a discussion of the advantages and disadvantages inherent to each approach. The remainder of the paper is structured as follows. Section 2 describes the problem setting. Sections 3–5 contain an organized overview of incremental feature extraction based on PCA, LDA and DCV approaches, respectively. Section 6 presents a performance analysis of the incremental methods regarding their experimental setup and the accuracy rates available in published results, as well as an empirical comparison of the best approach of each kind. Finally, Section 7 summarizes the main contributions of this survey.

Section snippets

Problem setting

Throughout this paper, the $d$-dimensional vector space $\mathbb{R}^d$ will be considered as the input representation space. $U_r = [u_1 \ldots u_r]$ represents a projection mapping, where the $u_i$ are orthonormal vectors that span a linear subspace of $\mathbb{R}^d$ and usually $r \ll d$. Given a (column) vector $x \in \mathbb{R}^d$, its mapping onto $\mathbb{R}^r$ is $U_r^T x$, and the corresponding orthogonal projection onto the subspace defined by $U_r$ is given by $U_r U_r^T x$. Different methods optimize different criteria which are based on different scatter matrices.
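
As an illustration of this notation, the following minimal sketch (assuming NumPy; variable names are ours) contrasts the mapping $U_r^T x$ with the orthogonal projection $U_r U_r^T x$:

```python
import numpy as np

# U_r: r orthonormal columns spanning a linear subspace of R^d (here random).
d, r = 100, 5
rng = np.random.default_rng(0)
U_r, _ = np.linalg.qr(rng.standard_normal((d, r)))   # U_r^T U_r = I_r

x = rng.standard_normal(d)   # a sample in the input space R^d
features = U_r.T @ x         # mapping onto R^r: the extracted feature vector
x_proj = U_r @ features      # orthogonal projection U_r U_r^T x, back in R^d

# The residual x - x_proj is orthogonal to every basis vector in U_r.
assert np.allclose(U_r.T @ (x - x_proj), 0.0)
```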

Incremental Principal Components Analysis (IPCA)

PCA is the most popular feature extraction method. It aims to find a set of orthonormal basis vectors (the principal components) that maximize the variance of the data when it is projected onto the subspace spanned by these components. Formally, it corresponds to the following optimization problem: $\max_U |U^T S_t U| \quad \text{s.t.} \quad U^T U = I$

Although PCA is not optimal with regard to discrimination, since no class information is used to obtain the principal components, it is optimal in terms of minimum mean-squared reconstruction error.
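
For covariance-based IPCA variants, the core ingredient is a merge rule for the mean and total scatter. The sketch below (a generic two-set merge identity, not the update rule of any specific surveyed method; names are illustrative) shows how a new chunk can be absorbed without revisiting old samples, after which the eigendecomposition is recomputed:

```python
import numpy as np

def merge_chunk(n, mean, S, X_new):
    """Absorb a chunk into the running mean and total scatter matrix.

    Two-set merge identity for S = sum_i (x_i - mean)(x_i - mean)^T:
    the old samples are never revisited. X_new has shape (b, d).
    """
    b = X_new.shape[0]
    mean_b = X_new.mean(axis=0)
    Xc = X_new - mean_b
    S_b = Xc.T @ Xc                                    # scatter of the chunk alone
    delta = mean_b - mean
    S = S + S_b + (n * b / (n + b)) * np.outer(delta, delta)
    mean = (n * mean + b * mean_b) / (n + b)
    return n + b, mean, S

# After each merge the principal subspace is re-extracted, e.g.:
#   eigvals, U = np.linalg.eigh(S)   # ascending eigenvalues
#   U_r = U[:, -r:]                  # r leading principal components
```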

Incremental Linear Discriminant Analysis (ILDA)

LDA is a traditional statistical technique that reduces dimensionality while preserving as much of the class discriminatory information as possible. Instead of maximizing variability, LDA tries to find appropriate subspaces in which labelled data gets optimally separated. To this end, the total scatter matrix can be decomposed into two parts, $S_t = S_w + S_b$, such that one wants minimal within-class dispersion and maximal between-class dispersion at the same time. The classical way to achieve this is to maximize a ratio of between-class to within-class scatter, which leads to a generalized eigenproblem...
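
The following sketch (illustrative NumPy code, not taken from the surveyed papers) computes the scatter matrices and makes the decomposition $S_t = S_w + S_b$ explicit:

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (S_w), between-class (S_b) and total (S_t) scatter.

    X: (n, d) data matrix; y: (n,) integer class labels.
    By construction, S_t = S_w + S_b holds up to round-off.
    """
    mean = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)                   # dispersion around class mean
        S_b += len(Xc) * np.outer(mc - mean, mc - mean)  # class means around global mean
    return S_w, S_b, S_w + S_b

# Classical LDA maximizes between- over within-class scatter, which reduces to
# the generalized eigenproblem S_b v = lambda S_w v (solvable, when S_w is
# nonsingular, with scipy.linalg.eigh(S_b, S_w)).
```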

Incremental Discriminative Common Vector (IDCV)

The DCV [106] approach constitutes a different way to overcome the singularity problem of LDA. It is particularly appealing because of its good performance and flexibility of implementation, especially in the case of very large dimensionalities such as in image recognition or genomic problems. Similarly to null space based methods, the DCV method uses the null space of $S_w$, $N(S_w)$, to project the samples onto, implicitly avoiding singularities. In fact, the rationale behind DCV is basically the same as with LDA...
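
As a rough illustration of the DCV construction (a minimal sketch under the small-sample-size assumption $d \gg n$; the function and variable names are ours, not from [106]), the common vector of each class can be obtained by projecting the class mean onto $N(S_w)$:

```python
import numpy as np

def common_vectors(X, y, tol=1e-10):
    """Project each class mean onto N(S_w), the null space of S_w.

    Assumes the small-sample-size case (d much larger than n), where
    N(S_w) is nontrivial; every sample of a class then maps to the same
    'common vector', so the class mean is used here for convenience.
    """
    classes = np.unique(y)
    # Stack the within-class centered samples; their span is range(S_w).
    diffs = np.vstack([X[y == c] - X[y == c].mean(axis=0) for c in classes])
    Q, s, _ = np.linalg.svd(diffs.T, full_matrices=False)
    Q = Q[:, s > tol]                  # orthonormal basis of range(S_w)
    # Projection onto N(S_w) is (I - Q Q^T), applied here to each class mean.
    commons = []
    for c in classes:
        mc = X[y == c].mean(axis=0)
        commons.append(mc - Q @ (Q.T @ mc))
    return np.array(commons)
```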

Performance analysis

This section gives an overview of the discriminant properties, in terms of accuracy rate, according to the empirical results provided by the authors of the incremental algorithms referenced in this paper. Evaluating and comparing the performance of the incremental approaches is a difficult task, since few of the corresponding publications contain numerical results. Moreover, these results are usually not comparable across publications, largely due to different experimental setups. For instance, ICLDA [66] and...

Summary and concluding remarks

An exhaustive survey on incremental feature extraction based on linear subspace methods with orthogonal matrix constraints based on a global loss function has been presented in this paper. Incremental methods are discussed and categorized in terms of subspace model, matrix decomposition, updating strategy, requirements for application, computational complexity, experimental setup and accuracy rates. These different aspects allow us to analyze their properties and suitability when applied to...

References (108)

  • W. Zheng et al., Fast algorithm for updating the discriminant vectors of dual-space LDA, IEEE Trans. Inf. Forens. Secur. (2009)
  • K. Diaz-Chito et al., Image recognition through incremental discriminative common vectors, Proceedings of the Advanced Concepts for Intelligent Vision Systems (ACIVS) (2010)
  • K. Diaz-Chito et al., Null space based image recognition using incremental eigendecomposition, Proceedings of the Pattern Recognition and Image Analysis (2011)
  • T. Zhu et al., An online incremental orthogonal component analysis method for dimensionality reduction, Neural Netw. (2017)
  • G. Golub, Some modified matrix eigenvalue problems, SIAM Rev. (1973)
  • P. Gill et al., Methods for modifying matrix factorizations, Math. Comput. (1974)
  • T. Sim, S. Baker, M. Bsat, The CMU pose, illumination, and expression (PIE) database, in: Proceedings of the 5th...
  • H. Yu et al., A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognit. (2001)
  • J. Yang et al., Why can LDA be performed in PCA transformed space?, Pattern Recognit. (2003)
  • J. Ye et al., LDA/QR: an efficient and effective dimension reduction algorithm and its theoretical foundation, Pattern Recognit. (2004)
  • G. Chen et al., An incremental-learning-by-navigation approach to vision-based autonomous land vehicle guidance in indoor environments using vertical line information and multiweighted generalized Hough transform technique, IEEE Trans. Syst. Man Cybern. Part B (1998)
  • M. Pardowitz et al., Incremental learning of tasks from user demonstrations, past experiences, and vocal comments, IEEE Trans. Syst. Man Cybern. Part B (2007)
  • D. Ross et al., Incremental learning for robust visual tracking, IJCV (2008)
  • K. Diaz-Chito et al., Incremental generalized discriminative common vectors for image classification, IEEE Trans. Neural Netw. Learn. Syst. (2015)
  • X. Zeng et al., Covariance free incremental principal component analysis with exact mean update, J. Comput. Inf. Syst. (2013)
  • Y. Peng et al., Chunk incremental IDR/QR LDA learning, IJCNN (2013)
  • G. Takács et al., Scalable collaborative filtering approaches for large recommender systems, J. Mach. Learn. Res. (2009)
  • E. Diaz-Aviles et al., Real-time top-n recommendation in social streams, Proceedings of the Sixth ACM Conference on Recommender Systems (2012)
  • J. Vinagre et al., Fast incremental matrix factorization for recommendation with positive-only feedback, Proceedings of the 22nd International Conference on User Modeling, Adaptation, and Personalization (UMAP) (2014)
  • Q. Song et al., Incremental matrix factorization via feature space re-learning for recommender system, Proceedings of the 9th ACM Conference on Recommender Systems (2015)
  • S. Ozawa et al., On-line feature selection for adaptive evolving connectionist systems, International Journal of Innovative Computing, Information and Control (2006)
  • A. Levy et al., Sequential Karhunen–Loeve basis extraction and its application to images, IEEE Trans. Image Process. (2000)
  • G. Ling et al., Online learning for collaborative filtering, Proceedings of the International Joint Conference on Neural Networks (IJCNN) (2012)
  • C. Zhang et al., Incremental nonnegative matrix factorization based on matrix sketching and k-means clustering, Proceedings of the 17th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL) (2016)
  • X. Huang et al., Incremental matrix factorization: a linear feature transformation perspective, Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI) (2017)
  • T.H. Chan et al., PCANet: a simple deep learning baseline for image classification?, IEEE Trans. Image Process. (2015)
  • H. Zheng et al., Text-independent voice conversion using deep neural network based phonetic level features, Proceedings of the 23rd International Conference on Pattern Recognition (ICPR) (2016)
  • M. Seuret, M. Alberti, R. Ingold, M. Liwicki, PCA-initialized deep neural networks applied to document image analysis,...
  • F. Xu et al., Object tracking based on two-dimensional PCA, Opt. Rev. (2016)
  • C. Bo et al., Online object tracking based on convex hull representation, Proceedings of the IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS) (2016)
  • P. Liang, Y. Wu, H. Ling, Planar object tracking in the wild: a benchmark, arXiv preprint arXiv:1703.07938...
  • L. Sun et al., Pharmaceutical raw material identification using miniature near-infrared (MicroNIR) spectroscopy and supervised pattern recognition using support vector machine, Appl. Spectrosc. (2016)
  • D.R. Vidhyavathi, Principal component analysis (PCA) in medical image processing using digital imaging and communications in medicine (DICOM) medical images, Int. J. Pharma Bio. Sci. (2017)
  • D. Nandi et al., Principal component analysis in medical image processing: a study, Int. J. Image Mining (2015)
  • Y. Wang et al., An adaptive image processing system based on incremental learning for industrial applications, Proceedings of the IEEE Emerging Technology and Factory Automation (ETFA) (2014)
  • E. Robotti et al., Chemometric multivariate tools for candidate biomarker identification: LDA, PLS-DA, SIMCA, ranking-PCA, 2-D PAGE Map Anal. Methods Protoc. (2016)
  • H. Hamid et al., Performance analysis: an integration of principal component analysis and linear discriminant analysis for a very large number of measured variables, Res. J. Appl. Sci. (2016)
  • A.D. Bimbo, D. Fabrizio, L. Giuseppe, A real time solution for face logging (2009) 1–6....
  • A. Usman, S. Ahmed, J. Ferzund, A. Mehmood, A. Rehman, Using PCA and factor analysis for dimensionality reduction of...
  • W. Ni et al., Localized, adaptive recursive partial least squares regression for dynamic system modeling, Ind. Eng. Chem. Res. (2012)