A reduced feature set for driver head pose estimation
Introduction
Driver fatigue/drowsiness and distraction are known to be behind a large number of traffic accidents. Accordingly, different systems have been developed to detect such situations [1], [2], [3], [4]. Distractions are especially challenging because they are often difficult to predict in advance, since they may be due to sudden events in the environment or in the cabin. Indeed, a more general challenge that includes distractions is driving performance. Evaluating driving performance is of utmost importance in order to reduce the road accident rate. Behavior analysis while driving generally reveals the abilities of the driver, which include cognitive (attention, executive functions, and memory) and (visual-spatial) perception skills, as well as fatigue level and attention capability [5]. These abilities can be analyzed from several points of view, either by measuring non-visual features such as heart rate variability [6], or by analyzing visual features such as eye blinking behavior [7], gaze direction [8], or the motion of the hands [9] or the feet [10]. In particular, head pose is a crucial indicator of the driver's field of view and attention focus [11] and, like most of the indicators named above, it deserves further consideration.
Head pose estimation is a challenging problem in itself due to the variability introduced by factors such as the driver's identity and expression, cabin and outdoor illumination, etc. [12]. In fact, during the last decade there has been increasing interest in developing methods to estimate head pose [13] for applications such as security and surveillance systems [14], meeting rooms [15], intelligent wheelchair systems [16], and driver monitoring [1], [17]. In the particular case of driving performance, when drivers are paying attention to the road ahead, their facial direction lies within approximately ±15° of straight ahead [18]. Thus, the yaw angle of the driver can help determine whether he/she perceives road elements such as traffic lights, roundabouts, and crosswalks, and how much attention he/she devotes to them. Accordingly, in this paper we focus on computing this angle from still images. Such a method can be very useful as part of a multi-cue system [19], [20] for early detection of abnormal driving performance in common situations. For example, if the driver does not look in the correct direction at a roundabout, or attends to a crosswalk but does not see a pedestrian crossing it, an advanced driver assistance system (ADAS) combining driver monitoring and exterior hazard assessment can decide to raise the warning level or even brake the car.
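As a minimal illustration of how the ±15° bound could gate a warning, a sketch follows; the function and threshold names are ours, not from the cited work, and a real ADAS would combine this cue with many others:

```python
# Hedged sketch: flag whether an estimated yaw angle falls inside the
# roughly +/-15 degree "attention to the road ahead" cone cited above.
ATTENTION_CONE_DEG = 15.0  # approximate bound from the literature

def looking_ahead(yaw_deg: float, cone: float = ATTENTION_CONE_DEG) -> bool:
    """Return True if the driver's facial direction is within the cone."""
    return abs(yaw_deg) <= cone

print(looking_ahead(10.0))    # inside the cone
print(looking_ahead(-22.5))   # outside the cone
```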
Murphy-Chutorian and Trivedi [13] divide head pose estimation methods into eight categories, three of which are regression methods, geometric methods, and manifold embedding methods.
Regression methods fit a regression model on a training set in one or more directions (angles). The regression model is usually non-linear, e.g., support vector regressors [21], sparse Bayesian regression [22], or neural networks [23]. One drawback of these methods is that they take the whole face image as input, so its high dimensionality decreases efficiency. In some cases the dimensionality can be reduced, for instance when the face is well localized. Still, this high dimensionality makes it unclear whether a specific regression tool is capable of learning the curve that properly models the arc of directions.
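To make the regression family concrete, here is a toy sketch that fits yaw as a linear function of a single synthetic scalar feature; the feature values, training pairs, and function names are invented for illustration only, whereas real systems regress from full face images or richer descriptors, typically with non-linear models:

```python
# Toy sketch of a regression-based pose estimator: ordinary least
# squares mapping one synthetic feature to yaw angle in degrees.
def fit_line(xs, ys):
    """Fit y = a*x + b by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Synthetic training pairs: feature value -> yaw in degrees.
feats = [-1.0, -0.5, 0.0, 0.5, 1.0]
yaws = [-45.0, -22.5, 0.0, 22.5, 45.0]
a, b = fit_line(feats, yaws)
print(round(a * 0.5 + b, 1))  # predicted yaw for feature 0.5 -> 22.5
```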
Geometric methods are an alternative that directly exploits the properties most influential in human head pose estimation, which are usually based on human perception. These features can be divided into two types: those based on face appearance, such as orientation information of head images [24], [25], skin color histograms [26], or facial contours [27], and those relying on a set of (usually 5–6) local facial keypoints [28]. In the first case, the computational cost is still high, since the whole face image must be analyzed. In the second case, facial feature detection needs to be highly precise and accurate. To overcome the limitations of a single-category method, manifold embedding methods are usually combined with them, gaining in accuracy [12], [29], [30].
In the same fashion, this paper combines the three kinds of methods explained above to estimate the continuous yaw angle of a driver's head pose. Roughly, given an image of the driver's head, we rely on a small set of geometric features computed from just three representative facial keypoints, namely the centers of the eyes and the nose tip. Our method is based on a combination of subspace projections, namely Fisher's linear discriminant (FLD) and principal component analysis (PCA), together with a multiple linear regression adjusted for each pose interval. Fig. 1 and Fig. 2 sketch the main steps of the method, split into training the system and testing new samples. For training (Fig. 1), from a set of samples we extract the facial keypoints to compute a geometric feature vector. Projecting the samples onto an FLD subspace allows the system to suppress samples that are not useful for training. The remaining samples are then projected onto another subspace based on PCA, and a multiple linear regression is computed to estimate the regression parameters. When a new sample arrives (Fig. 2), it is projected and classified in both the FLD subspace and the PCA-based one. A combination of both classifications gives the final coarse yaw angle estimate, while the regression parameters serve to compute the continuous yaw angle. Besides, the method integrates a mechanism to self-evaluate the likelihood of the generated hypothesis and discard unlikely poses by comparing the discrete angle obtained from the FLD with the continuous angle from the regression.
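The self-evaluation step at the end of the pipeline can be sketched as follows. This is our own minimal reading of the mechanism, assuming the coarse classes used later in the experiments (−45° to 45° in 22.5° steps); the acceptance threshold and function names are illustrative and the paper's actual criterion may differ:

```python
# Hedged sketch of the self-evaluation step: compare the discrete pose
# class (from the FLD/PCA classification) with the continuous regression
# output, and reject the hypothesis when they disagree by more than half
# the spacing between coarse classes.
POSE_CLASSES = [-45.0, -22.5, 0.0, 22.5, 45.0]  # coarse yaw bins (degrees)
HALF_SPACING = 11.25

def validate_pose(coarse_deg: float, fine_deg: float) -> bool:
    """Accept the hypothesis only if coarse and fine estimates agree."""
    return abs(fine_deg - coarse_deg) <= HALF_SPACING

print(validate_pose(22.5, 18.0))   # consistent estimates
print(validate_pose(22.5, 55.0))   # inconsistent -> discard
```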
The analysis of the results assessing the reliability of the method shows that, despite the very few facial keypoints required, the approach achieves high accuracy and precision, comparable to the methods in the literature. Moreover, the computational cost is low enough to run in real time, making it easy to integrate into mass consumer devices such as tablets or mobile phones and into a multi-cue system for driving performance evaluation. The robustness of the method against noise in facial keypoint detection is also demonstrated.
The remainder of the paper is organized as follows. Section 2 describes the mathematical tools involved in the estimation of the driver's yaw angle. Section 3 describes the detection of the facial keypoints and the geometric features derived from them, while Section 4 presents the workflow of the method. The experimental setting and the measures used to assess the reliability of the method are detailed in Section 5. Results and their analysis are shown in Section 6, and the method is compared with those in the literature in Section 7. Finally, Section 8 concludes the paper with some final remarks.
Section snippets
Mathematical tools
In this section, we explain the mathematical tools used throughout the method.
Features
The facial keypoints we use are the left eye center (LE), right eye center (RE), and nose tip (N) as shown in Fig. 3(a). These keypoints are enough to calculate 10 geometric features consisting of Euclidean distances, ratios, differences and angles.
Fig. 3(b)–(d) shows the features that can be directly extracted from RE, LE and N detection in the three different views, right, frontal and left. Let x = [d(RE, N), θE, θRE, θLE, r(dE, θE), r(θE, θRE), r(θE, θLE), r(θRE, θLE), r(dRE, dRL, θE), s(θLE, θ
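As an illustration of how features of these kinds (Euclidean distances, angles, ratios) can be computed from the three keypoints RE, LE, and N given as (x, y) pixel coordinates, consider the following sketch; the exact feature definitions in the paper may differ, and the coordinate values here are invented:

```python
import math

# Hedged sketch: a few geometric features of the kinds listed above,
# computed from the right eye (RE), left eye (LE), and nose tip (N).
def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def features(re, le, n):
    d_eyes = dist(re, le)   # inter-eye distance
    d_re_n = dist(re, n)    # right eye to nose tip
    # Angle of the eye line with respect to the horizontal axis.
    theta_e = math.degrees(math.atan2(le[1] - re[1], le[0] - re[0]))
    return {"d_eyes": d_eyes, "d_re_n": d_re_n,
            "theta_e": theta_e,
            "ratio": d_re_n / d_eyes}  # a scale-free ratio

# Invented pixel coordinates for illustration.
f = features(re=(100, 120), le=(160, 120), n=(130, 160))
print(f["d_eyes"], f["theta_e"])
```

Ratios such as `d_re_n / d_eyes` are attractive because they are invariant to the apparent scale of the face, which varies with the driver's distance to the camera.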
Head pose estimation
The method presented in this paper for estimating the head yaw angle is split into two main steps: one for training the system and another for computing the head angle in a new image.
Validation protocol
Our method has been tested on two sets of data, one based on images from a controlled scenario and the other based on images acquired while driving.1 For the first dataset, we have considered the CMU-Pose, Illumination and Expression database [32], which contains 13 images of each of 68 persons, presenting head pose changes along the horizontal axis in [−135°:22.5°:135°], for a total of 884 images of 640 × 480 pixels
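The figures quoted for CMU-PIE can be checked with a quick computation: 13 horizontal poses from −135° to +135° in 22.5° steps, for 68 subjects, yield the stated 884 images.

```python
# Sanity check of the CMU-PIE pose grid and image count quoted above.
poses = [-135.0 + 22.5 * k for k in range(13)]
print(poses[0], poses[-1])   # grid endpoints: -135.0 135.0
print(len(poses) * 68)       # 13 poses x 68 subjects -> 884
```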
Experimental results
The system has been trained and evaluated with those samples where facial keypoints were detected, which represent 98.81% of the total samples. This sample set has been randomly divided into two groups, one for training with 70% of the samples and the remaining 30% for testing. The training set consists of 47 samples for the classes at −45°, 0°, 22.5°, 45°, and 46 samples for the class at −22.5°. The test set consists of 20 samples for the classes at −45° and 0°, and 19 samples for classes at −22.5°,
Comparison to existing methods
In order to compare our method with the current state of the art, Table 5 summarizes the results obtained by the methods in the literature that use CMU-PIE. For each method, we report the accuracy rate (Acc), the number of iterations of the experiments (Iter.), its applicability to continuous environments (φcont), and whether the method is fully automatic (Aut.). In case the experiments of a method have been run more than once, we report the mean and the standard deviation as long as they
Discussion and conclusions
In this paper, we have introduced a new methodology for coarse and fine estimation of the driver's head yaw angle using a feature set generated from a reduced set of facial keypoints. The approach is based on a combination of subspace methods, namely PCA and FLD, and multiple linear regression. It also integrates a mechanism to self-evaluate the likelihood of the generated hypothesis and discard uncertain poses by comparing the pose labels from FLD and PCA.
The reliability of the method has been assessed
Acknowledgements
This work was supported by the Spanish MINECO project TRA2014-57088-C2-1-R and the Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement de la Generalitat de Catalunya (2014-SGR-1506).
References (39)
- et al., Fully automated real time fatigue detection of drivers through fuzzy expert systems, Appl. Soft Comput. (2014)
- et al., Applying neural network analysis on heart rate variability data to assess driver fatigue, Expert Syst. Appl. (2011)
- et al., Modeling and prediction of driver behavior by foot gesture analysis, Comput. Vis. Image Underst. (2012)
- et al., An integrated approach for head gesture based interface, Appl. Soft Comput. (2012)
- et al., Support vector machine based multi-view face detection and recognition, Image Vis. Comput. (2004)
- et al., Face distributions in similarity space under varying head pose, Image Vis. Comput. (2001)
- et al., EM enhancement of 3D head pose estimated by point at infinity, Image Vis. Comput. (2007)
- et al., Driver alertness monitoring using fusion of facial features and bio-signals, Sensors (2012)
- et al., Visual analysis of eye state and head pose for driver alertness monitoring, IEEE T-ITS (2013)
- et al., Head pose estimation and augmented reality tracking: an integrated system and evaluation for monitoring driver awareness, IEEE T-ITS (2010)