An overview of incremental feature extraction methods based on linear subspaces
Introduction
Processing large amounts of data is nowadays a challenging task in the field of pattern recognition, which aims to extract meaningful information embedded in the data. As a first general step, appropriate structured descriptors or features must be selected or extracted from raw data through a learning process using any prior information available. This leads to a more discriminative data representation with lower dimensionality, facilitating the subsequent steps in machine learning and data mining pipelines. The traditional way to extract these features is based on batch learning. However, this requires that all the data be available from the beginning and used as a whole, which is not convenient or even feasible in most online, interactive or stream-based processing applications. Several application domains such as autonomous navigation systems [1], human-robot interaction [2], object tracking [3], image classification [4], stream processing [5], face recognition [6] or recommendation systems [7], [8], [9], [10] are examples where the complete set of training samples is usually not known in advance but is instead provided little by little. Moreover, in some cases the properties of the data may change as new data are considered. For instance, in face recognition tasks, human faces may show large variations depending on expression, lighting conditions, make-up, hairstyle, aging and so forth. When a person is registered in an identification system, it is quite difficult to account for all this facial variability in advance [11]; it is more convenient to discover it during the operation of the system.
As an effective alternative, the paradigm of incremental or adaptive learning has been considered and deeply studied as a pattern recognition and machine learning subfield in its own right. Under incremental learning, feature extraction processes should be capable of incorporating newly available information while retaining previously acquired knowledge, without accessing the previously processed training data. This is especially challenging in the era of big data, where new chunks of data continuously appear and new classification objectives arise.
Among the huge number of incremental learning schemes, this paper focuses on linear subspace-based incremental feature extraction methods with orthogonal matrix constraints based on a global loss function, due to the extensive use of their batch counterparts versus other linear alternatives with unconstrained objectives, such as probabilistic PCA [3], [12], or matrix factorization methods [7], [8], [9], [10], [13], [14], [15], [16], mostly popular for building collaborative filtering in recommender systems. Note that not all linear feature extraction methods need to produce orthogonal projections, or indeed projections at all. While subspace-based methods can rely on linear and non-linear subspaces, linear methods are the most extensively used, even in highly non-linear problems where the non-linearity is modeled in the subsequent feature extraction and classification stages instead. An example of this is the use of linear dimensionality reduction methods in modern deep learning architectures as a preprocessing step to reduce the number of parameters to be learned and the number of training samples required [17], [18], [19]. Moreover, in recent years these techniques have been used in many successful applications such as object tracking [20], [21], [22], as well as in other fields, such as pharmaceutics [23], medical imaging [24], [25], agriculture [26], industrial applications [27], chemometrics [28], [29], pattern recognition [30] or bioinformatics [31], [32].
Therefore, this paper presents a categorized overview of the research done over the past decades on linear subspace-based incremental feature extraction and dimensionality reduction for matrix-based data and general applications. Special emphasis is put on those methods with orthogonal matrix constraints based on a global loss function, such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Discriminative Common Vector (DCV) methods, over methods with unconstrained objectives, such as probabilistic PCA [3], [12] or matrix factorizations [7], [8], [9], [10], [14], [15], [16]. Similarly, we consider incremental methods that are more related to subspace-to-subspace matching [33], [34] and tensor factorization [35], [36], [37], [38] to be out of scope. By restricting ourselves to these methods, we can both keep this survey to a manageable size and concentrate on the basic ideas behind the different incremental approaches, which are usually shared across a wider range of works. For the same reason, we have omitted incremental nonlinear extensions of the above methods [39].
In the present work we differentiate methods according to the subspace model used. From this viewpoint, two main categories of incremental subspace-based methods are usually considered, depending on whether or not the above matrices are explicitly formed and computed (using different forms of decompositions). Some of these variants are referred to in the literature as covariance-based or covariance-free methods. Table 1 summarizes all the papers considered in the present work according to the subspace model used and the computation (or not) of these matrices.
To complete this multidimensional taxonomy, which is graphically illustrated in Fig. 1, different ways of feeding incremental algorithms are considered. The first is the data size required for each update, which may range from one single sample at a time to moderate chunks of data. The second concerns data labels, i.e. whether the set of labels in the corresponding classification problem is fixed beforehand or may grow arbitrarily along the incremental process. We will refer to these two aspects as chunk size and chunk label structure, respectively. Finally, we will also consider the size/dimension ratio, where we explicitly distinguish the case in which the input space dimension is much greater than the expected data stream size and this constitutes a requirement of, or strongly conditions, a particular method. Facing very small values of this ratio is usually referred to as the small sample size (SSS) case.
The paper is organized around the above taxonomy, paired with a discussion of the advantages and disadvantages inherent to each approach. The remainder of the paper is structured as follows. Section 2 describes the problem setting. Sections 3–5 contain an organized overview of incremental feature extraction based on PCA, LDA and DCV approaches, respectively. Section 6 presents a performance analysis of the incremental methods in terms of their experimental setups and the accuracy rates available in published results, as well as an empirical comparison of the best approach of each kind. Finally, Section 7 summarizes the main contributions of this survey.
Problem setting
Throughout this paper, the d-dimensional vector space $\mathbb{R}^d$ will be considered as the input representation space. $U_r = [u_1, \ldots, u_r] \in \mathbb{R}^{d \times r}$ represents a projection mapping, where the $u_i$ are orthonormal vectors that span a linear subspace in $\mathbb{R}^d$ and usually $r \ll d$. Given a (column) vector $x \in \mathbb{R}^d$, its mapping onto this subspace is $y = U_r^\top x \in \mathbb{R}^r$, and the corresponding orthogonal projection onto the subspace defined by $U_r$ is given by $\hat{x} = U_r U_r^\top x$. Different methods optimize different criteria which are based on different scatter matrices.
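For reference, these scatter matrices can be written in standard textbook form (notation ours rather than any one surveyed paper's: $n_c$ and $\mu_c$ denote the sample count and mean of class $c$, and $\mu$ the global mean):

$$S_w = \sum_c \sum_{x_i \in c} (x_i - \mu_c)(x_i - \mu_c)^\top, \qquad S_b = \sum_c n_c\, (\mu_c - \mu)(\mu_c - \mu)^\top, \qquad S_t = S_w + S_b,$$

where $S_w$, $S_b$ and $S_t$ are the within-class, between-class and total scatter matrices, respectively.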
Incremental Principal Components Analysis (IPCA)
PCA is the most popular feature extraction method. It aims to find a set of orthonormal basis vectors (the principal components) that maximize the variance of the data when projected onto the subspace spanned by these principal components. Formally, it corresponds to the following optimization problem.
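In the notation of Section 2, this problem takes the standard trace-maximization form (a textbook statement rather than a formulation specific to any surveyed paper, with $S_t$ the total scatter matrix):

$$U_r^* = \operatorname*{arg\,max}_{U_r^\top U_r = I_r} \operatorname{tr}\left(U_r^\top S_t\, U_r\right),$$

whose solution is given by the $r$ leading eigenvectors of $S_t$.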
Although PCA is not optimal with regard to discrimination, since no class information is used to obtain the principal components, it is optimal in terms of minimum mean squared reconstruction error.
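To make the covariance-based updating strategy concrete, the following Python sketch (ours, for illustration only; the function and variable names do not come from any surveyed method) merges a new chunk of samples into running mean and total-scatter estimates and then re-extracts the leading principal components:

```python
import numpy as np

def update_pca(mean, scatter, n, chunk, r):
    """Merge a new chunk into a covariance-based incremental PCA model.

    mean:    running sample mean, shape (d,)
    scatter: running total scatter matrix, shape (d, d)
    n:       number of samples seen so far
    chunk:   new samples, shape (m, d)
    r:       target subspace dimension
    """
    m = chunk.shape[0]
    chunk_mean = chunk.mean(axis=0)
    centered = chunk - chunk_mean
    # Exact two-set merge of scatter matrices:
    # S = S_old + S_chunk + n*m/(n+m) * (mu_old - mu_chunk)(mu_old - mu_chunk)^T
    diff = (mean - chunk_mean)[:, None]
    scatter = scatter + centered.T @ centered + (n * m / (n + m)) * (diff @ diff.T)
    mean = (n * mean + m * chunk_mean) / (n + m)
    # The leading r eigenvectors of the updated scatter span the PCA subspace.
    _, eigvecs = np.linalg.eigh(scatter)   # eigenvalues in ascending order
    U = eigvecs[:, ::-1][:, :r]
    return mean, scatter, n + m, U

# Usage on a stream of chunks (synthetic data for illustration).
rng = np.random.default_rng(0)
d, r = 20, 3
mean, scatter, n = np.zeros(d), np.zeros((d, d)), 0
for _ in range(5):
    mean, scatter, n, U = update_pca(mean, scatter, n, rng.normal(size=(50, d)), r)
```

Because the full $d \times d$ scatter matrix is stored and re-decomposed at every update, such a scheme is only practical for moderate $d$; covariance-free methods avoid this cost, which is precisely what makes them attractive in the SSS regime discussed in the introduction.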
Incremental Linear Discriminant Analysis (ILDA)
LDA is a traditional statistical technique that reduces dimensionality while preserving as much of the class discriminatory information as possible. Instead of maximizing variability, LDA tries to find appropriate subspaces in which labelled data gets optimally separated. To this end, the total scatter matrix can be decomposed into two parts, such that one seeks minimal within-class dispersion and maximal between-class dispersion at the same time. The classical way to achieve this is to maximize a Fisher-type criterion relating both scatter matrices.
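In its most common form (again a textbook formulation rather than one tied to a specific surveyed paper), this criterion and the associated generalized eigenvalue problem read:

$$U_r^* = \operatorname*{arg\,max}_{U_r} \operatorname{tr}\left( (U_r^\top S_w U_r)^{-1} (U_r^\top S_b U_r) \right), \qquad S_b\, u = \lambda\, S_w\, u,$$

which requires $S_w$ to be nonsingular. In the SSS case $S_w$ is singular, and this is what motivates null-space approaches and the DCV method discussed below.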
Incremental Discriminative Common Vector (IDCV)
The DCV [106] approach constitutes a different way to overcome the singularity problem of LDA. It is particularly appealing because of its good performance and flexibility of implementation, especially in the case of very large dimensionalities such as in image recognition or genomic problems. Similarly to null space based methods, the DCV method uses the null space of the within-class scatter matrix $S_w$ to project the samples onto it and implicitly avoid singularities. In fact, the rationale behind DCV is basically the same as with LDA.
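Concretely (a compact restatement of the usual DCV construction, not a verbatim excerpt from [106]): in the null space of $S_w$ all training samples of a given class collapse to a single point, the common vector of that class,

$$x_{com}^c = (I - Q Q^\top)\, x_i^c,$$

where $Q$ is an orthonormal basis of the range of $S_w$ and $x_i^c$ is any sample of class $c$. The final discriminative projection is then obtained by applying PCA to the set of common vectors.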
Performance analysis
This section gives an overview of the discriminant properties, in terms of accuracy rate, according to the empirical results provided by the authors of the incremental algorithms referenced in this paper. Evaluating and comparing the performance of the incremental approaches is a difficult task, since few of them report numerical results. Moreover, these results are usually not comparable across publications, largely due to different experimental setups. For instance, ICLDA [66] and
Summary and concluding remarks
An exhaustive survey on incremental feature extraction based on linear subspace methods with orthogonal matrix constraints based on a global loss function has been presented in this paper. Incremental methods are discussed and categorised in terms of subspace model, decomposition of matrices, updating strategy, requirements to be applied, computational complexity, experimental setup and accuracy rates. These different aspects allow us to analyze their properties and suitability when applied to particular applications and data regimes.
References (108)
- et al., Incremental collaborative filtering recommender based on regularized matrix factorization, Knowl. Based Syst. (2012)
- et al., A cognitive vision approach to early pest detection in greenhouse crops, Comput. Electron. Agric. (2008)
- et al., Incremental model learning for spectroscopy-based food analysis, Chemom. Intell. Lab. Syst. (2017)
- et al., Candid covariance-free incremental principal component analysis, IEEE Trans. Pattern Anal. Mach. Intell. (2003)
- et al., Weighted and robust incremental method for subspace learning, Proceedings of the 9th IEEE International Conference on Computer Vision (2003)
- et al., A novel incremental principal component analysis and its application for face recognition, IEEE Trans. Syst. Man Cybern. Part B (2006)
- et al., Incremental complete LDA for face recognition, Pattern Recognit. (2012)
- et al., Incremental LDA learning by combining reconstructive and discriminative approaches, Proceedings of the British Machine Vision Conference (2007)
- et al., A rank-one update method for least squares linear discriminant analysis with concept drift, Pattern Recognit. (2013)
- et al., Incremental linear discriminant analysis: a fast algorithm and comparisons, IEEE Trans. Neural Netw. Learn. Syst. (2015)
- Fast algorithm for updating the discriminant vectors of dual-space LDA, IEEE Trans. Inf. Forensics Secur.
- Image recognition through incremental discriminative common vectors, Proceedings of Advanced Concepts for Intelligent Vision Systems (ACIVS)
- Null space based image recognition using incremental eigendecomposition, Proceedings of Pattern Recognition and Image Analysis
- An online incremental orthogonal component analysis method for dimensionality reduction, Neural Netw.
- Some modified matrix eigenvalue problems, SIAM Rev.
- Methods for modifying matrix factorizations, Math. Comput.
- A direct LDA algorithm for high-dimensional data with application to face recognition, Pattern Recognit.
- Why can LDA be performed in PCA transformed space?, Pattern Recognit.
- LDA/QR: an efficient and effective dimension reduction algorithm and its theoretical foundation, Pattern Recognit.
- An incremental-learning-by-navigation approach to vision-based autonomous land vehicle guidance in indoor environments using vertical line information and multiweighted generalized Hough transform technique, IEEE Trans. Syst. Man Cybern. Part B
- Incremental learning of tasks from user demonstrations, past experiences, and vocal comments, IEEE Trans. Syst. Man Cybern. Part B
- Incremental learning for robust visual tracking, Int. J. Comput. Vis.
- Incremental generalized discriminative common vectors for image classification, IEEE Trans. Neural Netw. Learn. Syst.
- Covariance free incremental principal component analysis with exact mean update, J. Comput. Inf. Syst.
- Chunk incremental IDR/QR LDA learning, Proceedings of the International Joint Conference on Neural Networks (IJCNN)
- Scalable collaborative filtering approaches for large recommender systems, J. Mach. Learn. Res.
- Real-time top-N recommendation in social streams, Proceedings of the Sixth ACM Conference on Recommender Systems
- Fast incremental matrix factorization for recommendation with positive-only feedback, Proceedings of the 22nd International Conference on User Modeling, Adaptation, and Personalization (UMAP)
- Incremental matrix factorization via feature space re-learning for recommender system, Proceedings of the 9th ACM Conference on Recommender Systems
- On-line feature selection for adaptive evolving connectionist systems, International Journal of Innovative Computing, Information and Control
- Sequential Karhunen-Loeve basis extraction and its application to images, IEEE Trans. Image Process.
- Online learning for collaborative filtering, Proceedings of the International Joint Conference on Neural Networks (IJCNN)
- Incremental nonnegative matrix factorization based on matrix sketching and k-means clustering, Proceedings of the 17th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL)
- Incremental matrix factorization: a linear feature transformation perspective, Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI)
- PCANet: a simple deep learning baseline for image classification?, IEEE Trans. Image Process.
- Text-independent voice conversion using deep neural network based phonetic level features, Proceedings of the 23rd International Conference on Pattern Recognition (ICPR)
- Object tracking based on two-dimensional PCA, Opt. Rev.
- Online object tracking based on convex hull representation, Proceedings of the IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS)
- Pharmaceutical raw material identification using miniature near-infrared (MicroNIR) spectroscopy and supervised pattern recognition using support vector machine, Appl. Spectrosc.
- Principal component analysis (PCA) in medical image processing using Digital Imaging and Communications in Medicine (DICOM) medical images, Int. J. Pharma Bio Sci.
- Principal component analysis in medical image processing: a study, Int. J. Image Mining
- An adaptive image processing system based on incremental learning for industrial applications, Proceedings of the IEEE Emerging Technology and Factory Automation Conference (ETFA)
- Chemometric multivariate tools for candidate biomarker identification: LDA, PLS-DA, SIMCA, Ranking-PCA, 2-D PAGE Map Analysis: Methods and Protocols
- Performance analysis: an integration of principal component analysis and linear discriminant analysis for a very large number of measured variables, Res. J. Appl. Sci.
- Localized, adaptive recursive partial least squares regression for dynamic system modeling, Ind. Eng. Chem. Res.