2D–3D-based on-board pedestrian detection system
Introduction
Nowadays, traffic accidents represent one of the major causes of death worldwide. According to the World Health Organization, every day 3000 people die as a result of a road accident [1]. In the particular case of vehicle-to-pedestrian accidents, the Economic Commission for Europe reported almost 150,000 injured and 7000 killed pedestrians in the European Union alone in 2003, making them the second source of fatalities just after vehicle-to-vehicle accidents [2]. However, contrary to the socially accepted view of traffic accidents as a random and unpredictable consequence of road transportation, these fatalities can be tackled by prevention and sensible measures. As a result, in recent decades the problem has been gaining increasing attention from both governments and industry, which invest considerable effort in traffic safety research.
In the last decade, in addition to improvements in road infrastructure (e.g., visibility enhancements, roundabouts, speed controls, better signposting), a new area of research has received special focus: Advanced Driver Assistance Systems (ADAS). ADAS are intelligent on-board systems that aim at anticipating and preventing accidents or, at least, minimizing their effects when they are unavoidable. Examples of ADAS are Adaptive Cruise Control, which adjusts the vehicle's speed in order to keep a safe gap with the preceding vehicle, and Lane Departure Warning, which warns the driver if the vehicle leaves its lane inadvertently. One of the most complex ADAS applications is the Pedestrian Protection System (PPS), the focus of this paper. In this case, the aim is to detect and localize static or moving people in a defined area in front of the vehicle, both to provide information to the driver and to perform evasive or braking actions. Fig. 1 illustrates the typical risky areas to be covered by a PPS. Under regular conditions, the vehicle stopping distance is about 5 m at 30 km/h, increasing up to 12 m at 50 km/h, so a PPS must concentrate its detection effort on pedestrians within these areas.
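The quoted distances are consistent with the standard pure braking-distance formula d = v²/(2a); the deceleration value below is an illustrative assumption on our part (roughly dry asphalt with good brakes), not a figure from the paper:

```python
def braking_distance(speed_kmh, decel=7.0):
    """Pure braking distance v^2 / (2a), ignoring driver reaction time.

    decel ~ 7 m/s^2 is an assumed full-braking deceleration on dry
    asphalt; real stopping distances also add a reaction-time term.
    """
    v = speed_kmh / 3.6  # km/h -> m/s
    return v ** 2 / (2.0 * decel)

print(round(braking_distance(30), 1))  # ~5 m, matching the figure quoted in the text
print(round(braking_distance(50), 1))  # ~13.8 m, on the order of the quoted 12 m
```

Adding a driver reaction term (about one second, i.e. a further v metres of travel) roughly doubles these distances, which is why a PPS must detect pedestrians well before they enter the braking envelope.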
Computer Vision, through passive sensors such as cameras, plays a key role in most of these systems. For instance, cameras are used in PPSs to detect the traffic objects of interest (i.e., pedestrians), taking advantage of their rich set of cues and high resolution. The topics involved in ADAS are at the frontier of the state of the art, since they require real-time interpretation of outdoor scenarios (uncontrolled illumination) from a mobile platform (fast background changes and the presence of objects with unknown motion). Furthermore, in the PPS context, pedestrian detection is even more challenging due to the high variability of pedestrian appearance (i.e., different articulated poses, clothes, distances and viewpoints) and the cluttered scenarios usually found in urban environments. It is worth mentioning that the moving nature of ADAS makes some well-established techniques from other human detection areas, such as background subtraction methods in surveillance, not applicable in our case.
In this paper we present a pedestrian detection system that makes use of Computer Vision cues, in particular exploiting 3D information to enrich the classification, which is typically based on 2D alone. The system is divided into three stages. First, 3D data computed from a stereo rig are used to estimate the road pose, which is needed to place pedestrian-sized windows in 3D. These windows, regions of interest (ROIs from now on), are then projected onto the 2D image plane, where they are labeled as pedestrians or non-pedestrians by our proposed classifier: the Real AdaBoost learning algorithm with Haar wavelet (HW) and edge orientation histogram (EOH) features. The final stage of the system verifies each positively labeled ROI by checking its 3D position and size, and a final refinement step groups overlapping redundant detections in 2D.
The remainder of this paper is organized as follows. After reviewing related research in Section 2, the proposed system is introduced in Section 3, fitting it into the general PPS architecture presented in [3]; the modules of the current system, which make use of the aforementioned techniques, are then placed in this architectural context. The first module, described in Section 4, makes use of the 3D-based adaptive image sampling technique. Section 5 presents the 2D classification module. Section 6 presents the last module, consisting of the 3D verification and the final grouping of 2D detections. Finally, Section 7 presents experimental results for each of the three modules and for the whole system. Conclusions are summarized in Section 8.
Related research
A look at the literature [3] shows that most systems rely on feature selection and machine learning to perform 2D pedestrian classification. Some examples are the symmetry and binary template based approach by Broggi et al. [4], the SVM on gradient images by Grubb et al. [5], the hierarchical template matching (Chamfer System) with neural networks by Gavrila et al. [6], and the parts-based SVM and AdaBoost approach by Shashua et al. [7]. In fact, PPSs can take
2D–3D system
The literature overview leads us to two important points, which are taken as keypoints of the current proposal. First, it is difficult to conceive a perfect classifier using 2D cues alone, so we opt to combine them with 3D information. Second, a common methodology can be inferred from such proposals when tackling the development of a PPS. In fact, in a recent survey, Gerónimo et al. [3] propose a general architecture for ADAS pedestrian detection. It consists of six modules in which a
Adaptive image sampling
The main goal of this stage is to define a set of ROIs by a uniform sampling of the road surface, which results in an adaptive sampling of the image plane (Fig. 4c). It works by fitting a plane to the road surface using a RANSAC-based least-squares fit.
In order to acquire the 3D information of the region in front of the host vehicle (Fig. 4b), a commercial stereo vision system (Bumblebee from Point Grey, http://www.ptgrey.com) has been used (Fig. 4a). The baseline of the stereo head is 12
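A RANSAC-based least-squares plane fit of the kind described above can be sketched as follows. This is a generic implementation, not the paper's code; the iteration count and inlier tolerance are illustrative assumptions:

```python
import numpy as np

def fit_road_plane(points, n_iters=200, inlier_tol=0.05, seed=0):
    """Fit a plane to a 3D point cloud with RANSAC + least squares.

    Repeatedly hypothesize a plane from 3 random points, keep the
    hypothesis with the most inliers (points within `inlier_tol`
    metres), then refine by least squares over those inliers.
    Returns (normal, d) for the plane normal . x + d = 0.
    """
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:  # degenerate (collinear) sample, skip
            continue
        normal /= norm
        d = -normal @ sample[0]
        inliers = np.abs(points @ normal + d) < inlier_tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Least-squares refinement: plane through the inlier centroid, with
    # normal given by the smallest singular vector of the centred cloud.
    P = points[best_inliers]
    centroid = P.mean(axis=0)
    _, _, vt = np.linalg.svd(P - centroid)
    normal = vt[-1]
    return normal, -normal @ centroid
```

The RANSAC stage makes the fit robust to off-road structure (vehicles, walls, pedestrians themselves), while the final least-squares step sharpens the estimate using all road inliers.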
ROI classification
Once the list of ROIs lying on the ground has been generated, this stage aims at labeling them as pedestrians or non-pedestrians, now using just one of the cameras. In PPSs, object classification approaches can generally be divided into two broad categories: silhouette matching and appearance based. The methods in the former category (e.g., the head-and-shoulders binary silhouette [4] or the Chamfer System [6]) have proven not robust enough to carry out the classification
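The appeal of Haar wavelet features in this setting is that, once an integral image is built, each rectangle comparison costs a handful of lookups regardless of window size. The sketch below shows one generic two-rectangle HW feature; the actual feature set, scales and positions used by the paper's classifier are not reproduced here:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero top row and left column, so
    rectangle sums need no boundary special cases."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] in O(1) via four integral-image lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_vertical_edge(ii, y, x, h, w):
    """Two-rectangle Haar feature: left half-sum minus right half-sum.
    Responds strongly to vertical intensity edges such as a body contour."""
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)
```

EOH features are computed analogously over per-orientation gradient images, so the same integral-image trick gives constant-time orientation-histogram sums per rectangle; Real AdaBoost then selects and weights the most discriminative of these features.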
3D verification and ROI grouping
Although the classification module provides satisfactory results when classifying state-of-the-art pedestrian databases, as illustrated in Section 7, the number of false positives (FPs) is still too high to fulfill the requirements of ADAS. In addition, as a result of the road sampling technique and the desired shift tolerance of the classifier, a number of overlapping ROIs containing a pedestrian are expected to be labeled as positive. Hence, two requirements are expected to be fulfilled by this module: to
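Grouping overlapping positives is commonly done with a greedy non-maximum-suppression scheme like the one sketched below; this is a standard formulation under assumed boxes-as-(x, y, w, h) conventions, not necessarily the paper's exact grouping rule:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / (aw * ah + bw * bh - inter)

def group_detections(boxes, scores, overlap_thr=0.5):
    """Greedy grouping: visit boxes in decreasing score order, keep a box
    only if it does not overlap an already-kept box above `overlap_thr`.
    Redundant detections of one pedestrian thus collapse to a single box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < overlap_thr for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]
```

The classifier's confidence can serve as the score, so the surviving box for each cluster is the one the classifier was most certain about; the 3D position/size check then removes detections whose back-projected height is implausible for a pedestrian.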
Experimental results
As in any complex system, the final result depends on the success of every single component. In this section, the performance of each module is evaluated; then, some final detection images illustrate the results of the whole system. A 3.2 GHz Pentium IV PC running non-optimized code has been used.
Conclusions
This paper presents a system that detects pedestrians from a moving vehicle in urban scenarios. There are three main contributions, presented as independent modules. First, an adaptive image sampling method estimates the relative camera/road plane position in order to distribute pedestrian-sized ROIs along the surface. This algorithm is also useful for other ADAS tasks such as vehicle detection and road segmentation. Second, a pedestrian classifier based on fast-to-compute features, namely Haar
Acknowledgments
This work was funded by the Government of Spain under project TRA2007-62526/AUT and the research programme Consolider Ingenio 2010: MIPRCV (CSD2007-00018). D. Gerónimo is supported by grant BES-2005-8864.
References (36)
- United Nations, Economic Commission for Europe, Statistics of Road Traffic Accidents in Europe and North America, …
- D. Gerónimo, A. López, A. Sappa, Computer vision approaches to pedestrian detection: visible spectrum survey, in: …
- A. Broggi et al., Shape-based pedestrian detection
- G. Grubb, A. Zelinsky, L. Nilsson, M. Rilbe, 3D vision sensing for improved pedestrian safety, in: Proceedings of the …
- D.M. Gavrila et al., Multi-cue pedestrian detection and tracking from a moving vehicle, International Journal on Computer Vision (2007)
- A. Shashua, Y. Gdalyahu, G. Hayun, Pedestrian detection for driving assistance systems: single-frame classification and …
- N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: Proceedings of the IEEE Conference on …
- B. Leibe, E. Seemann, B. Schiele, Pedestrian detection in crowded scenes, in: Proceedings of the IEEE Conference on …
- Distinctive image features from scale-invariant keypoints, International Journal on Computer Vision (2004)
- Pedestrian detection via classification on Riemannian manifolds, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Detection and tracking of multiple, partially occluded humans by Bayesian combination of edgelet part detectors, International Journal on Computer Vision
- A trainable system for object detection, International Journal on Computer Vision