
Applied Soft Computing

Volume 45, August 2016, Pages 98-107

A reduced feature set for driver head pose estimation

https://doi.org/10.1016/j.asoc.2016.04.027

Highlights

  • We present a new automatic approach for estimating the driver's head yaw angle.

  • We rely on a set of geometric features computed from just three representative facial keypoints.

  • The method has a confidence mechanism to decide the reliability of a sample label.

  • The results are comparable to state-of-the-art techniques.

  • The method can be easily integrated into mass consumer devices.

Abstract

Evaluation of driving performance is of utmost importance in order to reduce the road accident rate. Since driving ability involves visual-spatial and operational attention, among other skills, head pose estimation of the driver is a crucial indicator of driving performance. This paper proposes a new automatic method for coarse and fine estimation of the driver's head yaw angle. We rely on a set of geometric features computed from just three representative facial keypoints, namely the centers of the eyes and the nose tip. With these geometric features, our method combines two manifold embedding methods and a linear regression step. In addition, the method has a confidence mechanism to decide whether the classification of a sample is reliable. The approach has been tested using the CMU-PIE dataset and our own driver dataset. Despite the very few facial keypoints required, the results are comparable to state-of-the-art techniques. The low computational cost of the method and its robustness make it feasible to integrate into mass consumer devices as a real-time application.

Introduction

Driver fatigue/drowsiness and distraction are known to be behind a large number of traffic accidents. Accordingly, different systems have been developed to detect such situations [1], [2], [3], [4]. Distractions are especially challenging because they are often difficult to predict in advance, since they may be due to sudden events in the environment or in the cabin. Indeed, a more general challenge that includes distractions is driving performance. Evaluation of driving performance is of utmost importance in order to reduce the road accident rate. Behavior analysis while driving generally reveals the abilities of the driver, which include cognitive skills (attention, executive functions, and memory) and (visual-spatial) perception skills, as well as fatigue levels and attention capability [5]. These abilities can be analyzed from several points of view, either by measuring non-visual features like heart rate variability [6], or by analyzing visual features such as eye blinking behavior [7], gaze direction estimation [8], or the motion of the hands [9] or the feet [10]. In particular, head pose is a crucial indicator of the driver's field of view and attention focus [11] and, like most of the indicators named above, it deserves further consideration.

Head pose estimation is a challenging problem in itself due to the variability introduced by factors such as the driver's identity and expression, cabin and outdoor illumination, etc. [12]. In fact, during the last decade there has been increasing interest in developing methods to estimate head pose [13] for different applications such as security and surveillance systems [14], meeting rooms [15], intelligent wheelchair systems [16], and driving monitoring [1], [17]. In the particular case of driving performance, when drivers are paying attention to the road ahead, their facial direction is within approximately ±15° of straight ahead [18]. Thus, the yaw angle of the driver could help determine whether he/she perceives road elements such as traffic lights, roundabouts and crosswalks, and how much attention he/she devotes to them. Accordingly, in this paper we focus on the computation of this angle from still images. Such a method can be very useful as part of a multi-cue system [19], [20] for the early detection of abnormal driving performance in common situations. For example, if the driver does not pay attention to the correct direction in a roundabout, or looks at a crosswalk but does not see a pedestrian crossing it, an advanced driver assistance system (ADAS) combining driving monitoring and exterior hazard assessment can decide to elevate the warning level or even brake the car.

Murphy-Chutorian and Trivedi [13] divide head pose estimation methods into eight categories. Three of them are regression methods, geometric methods and manifold embedding methods.

Regression methods apply a regression model learned on a training set in one or more directions (angles). The regression model is usually non-linear, such as support vector regressors [21], sparse Bayesian regression [22] or neural networks [23]. One drawback of these methods is that they take the whole face image into account, so its high dimensionality decreases efficiency. In some cases the dimensionality can be reduced, for example when the face is well localized. Still, this high dimensionality makes it unclear whether a specific regression tool is capable of learning the proper curve modeling the arc of directions.
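As a hedged illustration of this category, the sketch below fits a support vector regressor to recover a yaw angle from toy features. The data and the two-dimensional feature choice are synthetic stand-ins: the actual methods in this category regress from full face images, which is precisely the high-dimensionality drawback noted above.

```python
# Illustrative sketch of regression-based pose estimation (not the paper's
# method): a support vector regressor maps features to a yaw angle.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
yaw = rng.uniform(-45, 45, size=200)  # ground-truth yaw angles (degrees)

# Stand-in 2-D features: a noisy nonlinear function of yaw. Real regression
# methods would use the (much higher-dimensional) face image itself.
X = np.column_stack([np.sin(np.radians(yaw)), np.cos(np.radians(yaw))])
X += rng.normal(0, 0.01, X.shape)

model = SVR(kernel="rbf", C=10.0).fit(X, yaw)  # non-linear regressor
pred = model.predict(X)
mae = np.mean(np.abs(pred - yaw))              # mean absolute error (degrees)
```

With such low-dimensional, clean features the regressor recovers the angle to within a few degrees; the difficulty the text describes appears when the input is the raw image instead.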

Geometric methods are an alternative that directly exploits the properties most influential in human head pose estimation, which are usually based on human perception. These features can be divided into two types: those based on face appearance, such as orientation information of head images [24], [25], skin color histograms [26] or facial contours [27], and those relying on a set of (usually 5–6) local facial keypoints [28]. In the first case, the computational cost is still high, since the whole face image must be analyzed. In the second case, facial feature detection needs to be highly precise and accurate. To overcome the limitations of a single-category method, manifold embedding methods are usually combined with them, gaining accuracy [12], [29], [30].

In the same fashion, this paper combines the three kinds of methods explained above to estimate the continuous head pose angle of a driver. Roughly, given an image of the driver's head, we rely on a small set of geometric features computed from just three representative facial keypoints, namely the centers of the eyes and the nose tip. Our method is based on a combination of subspace projections, such as Fisher's linear discriminant (FLD) and principal component analysis (PCA), as well as multiple linear regression adjusted for each pose interval. Fig. 1, Fig. 2 sketch the main steps of the method, split into training the system and testing new samples. For training (Fig. 1), from a set of samples, we extract the facial keypoints to compute a geometric feature vector. A projection of the samples onto an FLD subspace allows the system to discard samples that are not useful for training. Then, the new set of samples is projected onto another subspace based on PCA, and a multiple linear regression is computed to estimate the regression parameters. When a new sample is given (Fig. 2), it is projected and then classified in the FLD subspace and in the PCA-based one. A combination of both classifications gives the final coarse yaw angle estimation, while the regression parameters serve to compute the continuous yaw angle. In addition, the method integrates a mechanism to self-evaluate the likelihood of the generated hypothesis and discard unlikely poses by comparing the discrete angle obtained from the FLD with the continuous angle from the regression.
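A pipeline of this shape can be sketched with off-the-shelf components: synthetic stand-ins for the geometric feature vectors, scikit-learn's LinearDiscriminantAnalysis (FLD) and PCA for the two subspaces, and one linear regressor per coarse pose interval for the fine angle. This is only an illustrative approximation of the structure described above, not the authors' implementation; in particular, it omits the training-sample suppression step and the confidence mechanism.

```python
# Sketch of a coarse-to-fine yaw pipeline: FLD for the coarse pose bin,
# PCA + per-bin linear regression for the continuous angle. Toy data only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
bins = np.array([-45.0, -22.5, 0.0, 22.5, 45.0])        # coarse yaw classes
angles = np.repeat(bins, 60) + rng.uniform(-5, 5, 300)  # continuous yaw labels
labels = np.repeat(np.arange(len(bins)), 60)            # coarse class labels

# Stand-in for the 10 geometric features of the paper.
X = np.column_stack([np.sin(np.radians(angles)),
                     np.cos(np.radians(angles)),
                     rng.normal(0, 0.05, 300)])

fld = LinearDiscriminantAnalysis().fit(X, labels)  # subspace for coarse class
pca = PCA(n_components=2).fit(X)                   # second subspace

coarse = fld.predict(X)  # discrete yaw bin per sample

# One multiple linear regression per coarse pose interval, on the PCA subspace.
regs = {c: LinearRegression().fit(pca.transform(X[coarse == c]),
                                  angles[coarse == c])
        for c in np.unique(coarse)}

# Fine (continuous) angle: apply the regressor of the predicted bin.
fine = np.array([regs[c].predict(pca.transform(x[None]))[0]
                 for c, x in zip(coarse, X)])
```

The coarse classification selects which local linear model refines the angle, mirroring the paper's "regression adjusted for each pose interval"; comparing `coarse` and `fine` is the kind of consistency check the confidence mechanism builds on.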

The analysis of the results assessing the reliability of the method shows that, despite the very few facial keypoints required, the approach has high accuracy and precision, making it comparable to the methods in the literature. Moreover, the computational cost is low enough for the method to run in real time, making it easy to integrate into mass consumer devices such as tablets or mobile phones and to use as part of a multi-cue system for driving performance evaluation. The robustness of the method against noise in the facial keypoint detection is also demonstrated.

The remainder of the paper is organized as follows. Section 2 describes the mathematical tools involved in the driver's yaw angle estimation. Section 3 describes the detection of the facial keypoints and the geometric features derived from them, while Section 4 presents the workflow of the method. The experimental setting and the measures used to assess the reliability of the method are detailed in Section 5. Results and their analysis are shown in Section 6, while the method is compared with those in the literature in Section 7. Finally, Section 8 concludes the paper with some final remarks.

Section snippets

Mathematical tools

In this section, we explain the mathematical tools used throughout the method.

Features

The facial keypoints we use are the left eye center (LE), right eye center (RE), and nose tip (N) as shown in Fig. 3(a). These keypoints are enough to calculate 10 geometric features consisting of Euclidean distances, ratios, differences and angles.

Fig. 3(b)–(d) shows the features that can be directly extracted from RE, LE and N detection in the three different views, right, frontal and left. Let x = [d(RE, N), θE, θRE, θLE, r(dE, θE), r(θE, θRE), r(θE, θLE), r(θRE, θLE), r(dRE, dRL, θE), s(θLE, θ
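As an illustration of this kind of feature set, the sketch below computes a few representative geometric features (Euclidean distances, an angle and a ratio) from the three keypoints RE, LE and N. The feature definitions here are examples in the spirit of the description above, not the paper's exact 10-feature vector.

```python
# Example geometric features from three facial keypoints: right eye (RE),
# left eye (LE) and nose tip (N), each given as (x, y) image coordinates.
import numpy as np

def geometric_features(RE, LE, N):
    RE, LE, N = map(np.asarray, (RE, LE, N))
    d_RL = np.linalg.norm(RE - LE)  # inter-ocular distance
    d_RN = np.linalg.norm(RE - N)   # right eye to nose tip
    d_LN = np.linalg.norm(LE - N)   # left eye to nose tip
    # Angle at the nose tip between the directions to the two eyes.
    u, v = RE - N, LE - N
    theta_N = np.degrees(np.arccos(np.dot(u, v) / (d_RN * d_LN)))
    # Left/right distance ratio: ~1 for a frontal face, drifts away
    # from 1 as the head turns (the near eye projects closer to the nose).
    r_eyes = d_RN / d_LN
    return d_RL, d_RN, d_LN, theta_N, r_eyes

# Roughly frontal face: eyes symmetric about the nose tip.
feats = geometric_features(RE=(40, 50), LE=(80, 50), N=(60, 80))
```

Because these quantities come from only three points, the feature vector stays tiny compared with appearance-based inputs, which is what keeps the method's computational cost low.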

Head pose estimation

The method presented in this paper for estimating the head yaw angle is split into two main steps, one for training the system and another for computing the head angle in a new image.

Validation protocol

Our method has been tested on two sets of data, one based on images from a controlled scenario and the other based on images acquired while driving.1 For the first dataset, we have considered the CMU-Pose, Illumination and Expression database [32], which contains 13 images of each of 68 persons, presenting head pose changes along the horizontal axis in [−135°:22.5°:135°], for a total of 884 images of 640 × 480 pixels
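A quick arithmetic check of the figures quoted above: sampling yaw from −135° to 135° in 22.5° steps gives 13 poses, and with 68 subjects that indeed yields the stated 884 images.

```python
# Verify the CMU-PIE counts: 13 yaw poses x 68 subjects = 884 images.
import numpy as np

yaws = np.arange(-135.0, 135.0 + 1e-9, 22.5)  # [-135:22.5:135]
n_images = len(yaws) * 68
```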

Experimental results

The system has been trained and evaluated with those samples where facial keypoints were detected, which represents 98.81% of the total samples. This sample set has been randomly divided into two groups, one for training with 70% of the samples and the remaining 30% for testing. The training set consists of 47 samples for the classes at −45°, 0°, 22.5°, 45°, and 46 samples for the class at −22.5°. The test set consists of 20 samples for the classes at −45° and 0°, and 19 samples for the classes at −22.5°,

Comparison to existing methods

In order to compare our method with the current state of the art, Table 5 summarizes the results obtained by the methods in the literature that use CMU-PIE. For each method, we report the accuracy rate (Acc), the number of iterations the experiments were run (Iter.), its applicability to continuous environments (φcont) and whether the method is fully automatic (Aut.). When the experiments of a method have been run more than once, we report the mean and the standard deviation as long as they

Discussion and conclusions

In this paper, we have introduced a new methodology for coarse and fine estimation of the driver's head yaw angle, using a feature set generated from a reduced set of facial keypoints. The approach is based on a combination of subspace methods, such as PCA and FLD, and multiple linear regression. It also integrates a mechanism to self-evaluate the likelihood of the generated hypothesis and discard uncertain poses by comparing the pose labels from the FLD and the PCA.

The reliability of the method has been assessed

Acknowledgements

This work was supported by the Spanish MINECO project TRA2014-57088-C2-1-R and the Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement de la Generalitat de Catalunya (2014-SGR-1506).

References (39)

  • C.H. Zhao et al.

    Recognition of driving postures by combined features and random subspace ensemble of multilayer perceptron classifiers

    Neural Comput. Appl.

    (2013)
  • H. Jo et al.

    In-attention state monitoring for a driver based on head pose and eye blinking detection using one class support vector machine

    Neural Information Processing

    (2014)
  • F. Vicente et al.

    Driver gaze tracking and eyes off the road detection system

    IEEE Trans. Intell. Transp. Syst.

    (2015)
  • S. Martin et al.

    Understanding head and hand activities and coordination in naturalistic driving videos

  • E. Murphy-Chutorian et al.

    Head pose estimation for driver assistance systems: a robust algorithm and experimental evaluation

  • C. Hu et al.

    An effective head pose estimation approach using lie algebrized Gaussians based face representation

    Multimed. Tools Appl.

    (2014)
  • E. Murphy-Chutorian et al.

    Head pose estimation in computer vision: a survey

    IEEE T-PAMI

    (2009)
  • Y. Tian et al.

    Absolute Head Pose Estimation from Overhead Wide-Angle Cameras

    (2003)
  • R. Stiefelhagen

    Tracking focus of attention in meetings
