Adaptive feature descriptor selection based on a multi-table reinforcement learning strategy
Introduction
In the computer vision domain, visual object classification (VOC) has attracted the attention of researchers over the last two decades (e.g., [1], [2], [3], [4]). Generally, VOC is based on representing the given scene in a feature space: features are first extracted and then characterized by means of feature descriptors. These feature descriptions are then used as discriminative elements to characterize the given objects. They are computed from interest points together with their neighborhoods; interest points are pixels with special characteristics (e.g., [5], [6]). Hence, given an image, the feature descriptors characterize the objects at a higher abstraction level, where classical learning techniques can be used to recognize the target object. More elaborate techniques, such as Bag of Features (BoF), have become popular for visual object recognition (e.g., [7], [3], [8], [9]). The BoF approach consists of four steps, as detailed below:
- 1.
Extract the features from the images of the training set using a given detector and a given descriptor.
- 2.
Build a dictionary of visual words using the features extracted before.
- 3.
Construct a histogram for each image in the training set, using the features from step 1 and the dictionary from step 2. Each histogram bin counts the number of times a visual word appears in the image.
- 4.
Train a classification algorithm using the histograms obtained in step 3.
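The four steps above can be sketched as follows. This is a minimal illustration (a toy k-means dictionary and random stand-in descriptors), not the paper's actual implementation, which uses a kd-tree for the dictionary step and an SVM for classification:

```python
import numpy as np

def build_dictionary(all_descriptors, k, seed=0):
    """Step 2: cluster descriptors into k visual words (toy k-means)."""
    rng = np.random.default_rng(seed)
    words = all_descriptors[rng.choice(len(all_descriptors), k, replace=False)]
    for _ in range(10):  # a few Lloyd iterations
        labels = np.argmin(
            ((all_descriptors[:, None, :] - words[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                words[j] = all_descriptors[labels == j].mean(axis=0)
    return words

def bof_histogram(descriptors, words):
    """Step 3: count how often each visual word occurs in one image."""
    labels = np.argmin(
        ((descriptors[:, None, :] - words[None, :, :]) ** 2).sum(-1), axis=1)
    hist = np.bincount(labels, minlength=len(words)).astype(float)
    return hist / max(hist.sum(), 1.0)  # normalized histogram

# Step 1 is assumed done: each image yields an (n_i, d) descriptor array.
rng = np.random.default_rng(1)
image_descriptors = [rng.normal(size=(30, 8)) for _ in range(5)]
words = build_dictionary(np.vstack(image_descriptors), k=4)
histograms = np.array([bof_histogram(d, words) for d in image_descriptors])
# Step 4: `histograms` would now be fed to a classifier (e.g., an SVM).
```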
The current work focuses on the first step of the BoF; in particular, the goal is to learn the best algorithm to describe the interest points. From our experience, the performance of the BoF is strongly influenced by the image feature descriptor, so we argue that identifying the best image descriptor for each image will improve the classification rate. A naive approach to this problem could be the concatenation of all the possible descriptors. However, this solution is not always feasible: on the one hand it could require a large amount of resources (e.g., memory, CPU time), and on the other hand it would introduce noise into the solution [10]. The difficulty of the problem and the importance of finding the right solution have been addressed recently; approaches to select the best descriptor for each image are presented in [10], [11].
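To make the resource argument concrete, consider the per-keypoint dimensionality of a naive concatenation. The figures below are typical values from the literature (128 for SIFT, 64 for SURF); the Spin, C-SIFT and PHOW sizes depend on the configuration and are assumptions used only for illustration:

```python
# Typical per-keypoint descriptor lengths; the Spin, C-SIFT and PHOW values
# are configuration-dependent assumptions, used here only for illustration.
dims = {"SIFT": 128, "SURF": 64, "Spin": 50, "C-SIFT": 384, "PHOW": 128}
concat_dim = sum(dims.values())
print(concat_dim)  # length of the naively concatenated descriptor: 754
```

Under these assumptions, every interest point would carry a 754-dimensional vector instead of, say, 128, inflating both the dictionary-building cost and the memory footprint.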
In [10] a method for selecting the best descriptor for every image in the database is proposed. In order to select the best descriptor, several attributes of the image (e.g., colorfulness, roughness, shininess) are taken into account. Although interesting results are presented, the main drawback is the use of a supervised learning scheme in which the authors select the descriptors with a subjective criterion. In contrast, [11] presents a method that learns the best descriptor for each image using a Reinforcement Learning (RL) scheme. RL is a simple learning method based on a trial-and-error strategy. This work presents two improvements over [11]:
- 1.
We propose to use several state definitions.
- 2.
A multi-table scheme is introduced in order to exploit the best state definition for each image.
In summary, this work proposes a novel method to learn the best descriptor from a given set. In order to improve the performance, multiple state definitions are used. This scheme works within a BoF approach; concretely, the implementation uses a kd-tree in the second step and a support vector machine (SVM) in the fourth step. The remainder of the paper is organized as follows. Section 2 presents the state of the art. Section 3 summarizes the RL technique. Then, Section 4 presents the proposed method in detail. Experimental results and comparisons are provided in Section 5. Section 6 gives the conclusions and future work.
State of the art
Reinforcement learning is a learning technique widely used in the robotics community; recently, some works involving RL have been proposed in the computer vision field. For instance, in image segmentation, RL is used to select the appropriate threshold (e.g., [12], [13]). In [14] the authors propose an RL-based approach to tackle the face recognition problem, presenting a method to learn the set of dominant features for each image. An approach that joins an active learning
Reinforcement learning
Reinforcement learning, as mentioned before, is a trial-and-error learning process [21] in which the agent has no prior knowledge about which action is correct to take. RL can be used as a technique to solve a Markov decision process (MDP), in which the agent learns how to take actions in a given environment in order to maximize the expected reward. These concepts are incorporated into the MDP tuple, where:
- •
S is a set of environment states. In this work the
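The tabular Q-learning update used to solve such an MDP can be sketched generically as follows; the learning rate α, discount γ and the reward convention here are illustrative choices, not the paper's exact settings:

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

# Toy example: two states, with the actions being candidate descriptors.
actions = ["SIFT", "SURF"]
Q = {s: {a: 0.0 for a in actions} for s in (0, 1)}
q_update(Q, state=0, action="SIFT", reward=1.0, next_state=1)
# Q[0]["SIFT"] is now 0.1: alpha * reward, since Q[1] is still all zeros.
```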
Proposed method
This paper proposes a method to learn the best descriptor for each image. Fig. 2 shows an illustration of the proposed scheme. In particular, Fig. 2(left) presents a classical BoF (i.e., [7], [3], [8], [9]) while Fig. 2(right) shows the proposed RL based scheme. In fact, we propose a new multi-table RL based strategy to select the best descriptor for each image from a set that contains the most widely used according to the literature (i.e., Spin, SIFT, SURF, C-SIFT and PHOW). This section is
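One plausible rendering of the multi-table selection idea can be sketched as follows: each state definition keeps its own Q-table, the image is mapped to a state under every definition, and the descriptor with the highest Q-value across all tables is chosen greedily. The function name, table shapes and tie-breaking are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

DESCRIPTORS = ["Spin", "SIFT", "SURF", "C-SIFT", "PHOW"]

def select_descriptor(q_tables, states):
    """Greedy selection over several Q-tables (one per state definition).

    q_tables: list of arrays; table i has shape (n_states_i, len(DESCRIPTORS)).
    states:   the image's state index under each state definition.
    Returns the winning descriptor and the index of the table that chose it.
    """
    q_rows = np.array([table[s] for table, s in zip(q_tables, states)])
    table_idx, action_idx = np.unravel_index(np.argmax(q_rows), q_rows.shape)
    return DESCRIPTORS[action_idx], int(table_idx)

# Example: two state definitions; the second table strongly prefers C-SIFT
# in the state this image maps to.
q_tables = [np.zeros((3, 5)), np.zeros((4, 5))]
q_tables[1][2, 3] = 1.0
descriptor, winner = select_descriptor(q_tables, states=[1, 2])
# descriptor == "C-SIFT", chosen by table 1.
```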
Experimental results
The proposed method has been evaluated using two different databases (ETH and COIL). The evaluation framework compares the results using:
- •
A unique descriptor for the whole database.
- •
All the descriptors concatenated in a single one.
- •
The RL-based approach presented in [11].
- •
The RL-based approach with different state definitions.
- •
All the states concatenated.
- •
The information provided by the Q-tables combined (Fig. 7).
Conclusions and future work
This paper presents a novel framework for visual object classification. In particular, it is focused on the selection of the best image feature descriptor. It is based on the combined use of a bag-of-features scheme together with a reinforcement learning technique, implemented through the Q-learning approach. Note that any visual classification method (based on image descriptors) can substitute the BoF in this approach.
The proposed method combines different state definitions in a multi-table
Acknowledgements
This work was partially supported by the Spanish Government under Project TIN2011-25606. Monica Piñol was supported by Universitat Autònoma de Barcelona grant PIF 471-01-8/09.
References (33)
- et al., A reinforcement agent for threshold fusion, Appl. Soft Comput. (2008)
- J. Peng, J. Peng, B. Bhanu, Local reinforcement learning for object recognition, in: Proceedings of Fourteenth...
- Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)
- L. Fei-Fei, P. Perona, A Bayesian hierarchical model for learning natural scene categories, in: Proceedings of IEEE...
- L. Bo, X. Ren, D. Fox, Depth kernel descriptors for object recognition, in: IEEE/RSJ International Conference on...
- C. Harris, M. Stephens, A combined corner and edge detector, in: Alvey Vision Conference, vol. 15, Manchester, UK,...
- et al., Local invariant feature detectors, Found. Trends Comput. Graph. Vis. (2008)
- G. Csurka, C.R. Dance, L. Fan, J. Willamowski, C. Bray, Visual categorization with bags of keypoints, in: Workshop on...
- H. Bay, T. Tuytelaars, L.V. Gool, SURF: speeded up robust features, in: Proceedings of the European Conference on...
- A. Bosch, A. Zisserman, X. Muñoz, Image classification using random forests and ferns, in: Proceedings of IEEE...
Monica Piñol Naranjo received the computer science degree from the Universitat Autònoma de Barcelona, Barcelona, Spain, in 2009. In the same year, she joined the Computer Vision Center, Barcelona; in 2010 she received the Master degree in Computer Vision and Artificial Intelligence from the same university. Currently she is pursuing her Ph.D. degree working on reinforcement learning approaches applied to computer vision domain.
Angel Domingo Sappa received the electromechanical engineering degree from the National University of La Pampa, General Pico, Argentina, in 1995 and the Ph.D. degree in industrial engineering from the Polytechnic University of Catalonia, Barcelona, Spain, in 1999. In 2003, after holding research positions in France, U.K., and Greece, he joined the Computer Vision Center, Barcelona, where he is currently a Senior Researcher. His current research focuses on stereo image processing and analysis, 3-D modeling, and dense optical flow estimation. His research interests span a broad spectrum within the 2-D and 3-D image processing. Dr. Sappa is a member of the Advanced Driver Assistance Systems Group, Computer Vision Center.
Ricardo Toledo received the degree in Electronic Engineering from the Universidad Nacional de Rosario (Argentina) in 1986, the M.Sc. degree in image processing and artificial intelligence from the Universitat Autònoma de Barcelona (UAB) in 1992 and the Ph.D. in 2001.
Since 1989 he has been giving lectures at the Computer Science Department of the UAB and participating in R+D projects. Currently he is a full time associated professor. In 1996 he participated in the foundation of the Computer Vision Center (CVC) at the UAB. Ricardo has participated in national and international R+D projects being the leader of some of them, and is coauthor of more than 40 articles, all these in the field of computer vision, robotics and medical imaging.