
Robotics and Autonomous Systems

Volume 85, November 2016, Pages 26-36

Monocular visual odometry: A cross-spectral image fusion based approach

https://doi.org/10.1016/j.robot.2016.08.005

Highlights

  • Monocular visual odometry based on a fused image approach.

  • DWT image fusion parameters selected according to a quantitative evaluation metric.

  • Experimental results with two public data sets illustrate its validity.

  • Comparisons with other approaches are provided.

Abstract

This manuscript evaluates the use of fused cross-spectral images in a monocular visual odometry approach. Fused images are obtained through a Discrete Wavelet Transform (DWT) scheme, where the best setup is empirically obtained by means of a mutual information based evaluation metric. The objective is to have a flexible scheme where fusion parameters are adapted according to the characteristics of the given images. Visual odometry is computed from the fused monocular images using an off-the-shelf approach. Experimental results using data sets obtained with two different platforms are presented. Additionally, comparisons with a previous approach, as well as with monocular visible and infrared spectrum approaches, are provided, showing the advantages of the proposed scheme.

Introduction

The usage of cross-spectral imaging has been increasing due to the drop in price of cameras working at different spectral bands. This increase is motivated by the possibility of developing robust solutions that cannot be obtained when a single band is used. Such robust solutions can be found in domains such as thermal inspection [1], video surveillance [2], face detection [3], driving assistance [4] and visual odometry [5], which is the focus of the current work. Before tackling any of the problems mentioned above, the information provided by the cameras working at different spectral bands needs to be fused into a single and compact representation for further processing, assuming an early fusion scheme is followed.

Visual Odometry (VO) is the process of estimating the egomotion of an agent (e.g., a vehicle, a human or a robot) using only the input of one or more cameras attached to it. The term was proposed by Nistér [6] in 2004, chosen for its similarity to wheel odometry. In wheel odometry, the motion of a vehicle is obtained by integrating the number of turns of its wheels over time. Similarly, VO operates by incrementally estimating the pose of the vehicle by analyzing the changes that its motion induces in the images of the onboard vision system.
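
A minimal sketch of this incremental estimation is given below (in Python); the function and variable names are hypothetical, and each frame-to-frame motion is assumed to be given as a 4 x 4 homogeneous transform:

```python
import numpy as np

def integrate_motion(relative_poses):
    """Chain frame-to-frame motion estimates into a global trajectory.

    Each element of `relative_poses` is assumed to be a 4x4 homogeneous
    transform mapping coordinates from frame k to frame k-1.
    """
    pose = np.eye(4)             # world pose of the first camera frame
    trajectory = [pose.copy()]
    for T in relative_poses:
        pose = pose @ T          # accumulate the incremental motion
        trajectory.append(pose.copy())
    return trajectory
```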

State-of-the-art VO approaches are based on monocular or stereo vision systems, most of them working with cameras in the visible spectrum (e.g., [7], [8]). The approaches proposed in the literature can be coarsely classified into feature based methods, image based methods and hybrid methods. Feature based methods rely on visual features extracted from the given images (e.g., corners, edges) that are matched between consecutive frames to estimate the egomotion; an illustrative sketch of one such step is given below. In contrast to feature based methods, image based approaches directly estimate the motion by minimizing the intensity error between consecutive images. Finally, hybrid methods combine the approaches mentioned before to reach a more robust solution. All VO approaches based on visible spectrum imaging, in addition to their own intrinsic limitations, have additional ones related to the nature of the images (i.e., photometry). With these limitations in mind (i.e., noise, sensitivity to lighting changes, etc.), monocular and stereo vision based VO approaches using cameras in the infrared spectrum have been proposed (e.g., [9], [10]), and more recently cross-spectral stereo based approaches have also been introduced (e.g., [11], [5]). The current work goes a step further by tackling the monocular visual odometry problem with an image resulting from the fusion of the images provided by a cross-spectral imaging device. The goal of such an approach is to take advantage of the strengths of each band according to the characteristics of the scenario (e.g., daytime, nighttime, poor lighting conditions). A difference with respect to the previous approach published in [5] is that in the current work the fusion parameters are adapted to the characteristics of the given images.
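
For illustration, one feature based frame-to-frame step can be sketched with OpenCV as follows. This is only an assumed, simplified pipeline (ORB features plus essential matrix decomposition), not the LibVISO2 algorithm employed later in this work:

```python
import cv2
import numpy as np

def frame_to_frame_motion(img_prev, img_curr, K):
    """Estimate the relative rotation and translation between two
    consecutive grayscale frames, given the 3x3 intrinsic matrix K.
    Translation is recovered only up to scale, as is inherent to
    monocular visual odometry.
    """
    # Detect and describe features in both frames
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img_prev, None)
    kp2, des2 = orb.detectAndCompute(img_curr, None)

    # Match descriptors between consecutive frames
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # Robustly estimate the essential matrix and decompose it
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t  # t has unit norm: the monocular scale ambiguity
```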

Image fusion is the process of combining information from two or more images of a given scene into a single representation. This process is intended to encode the information from the source images into a single and more informative one, suitable for further processing or visual perception. There are two different cases where image fusion takes place: firstly, the case of images obtained from different sensors (multisensory), which may also work at different spectral bands (multispectral); secondly, the case of images of the same scene but acquired at different times (multitemporal). The current work is focused on the first case, more specifically, on fusing pairs of images from the visible and infrared spectra obtained at the same time by different sensors. It is assumed that the images to be fused are correctly registered [12]; otherwise, a process of cross-spectral feature detection and description should be followed in order to find the correspondences between the images (e.g., [13], [14]).

During the last decades, the image fusion problem has been largely studied, mainly for remote sensing applications (e.g., [15], [16]). Most of these methods have been proposed to produce a high-resolution multispectral representation from a low-resolution multispectral image fused with a high-resolution panchromatic one. The difference in image resolution is generally tackled by means of multi-scale image decomposition schemes that preserve spectral characteristics while representing them at a high spatial resolution. Among the different proposals, wavelet based approaches have shown some of the best performance, producing better results than standard methods such as the intensity–hue–saturation (IHS) transform technique or principal component analysis (PCA) [17]. Wavelet based image fusion consists of two stages. Firstly, the given images are decomposed into two components (more details are given in Section 2.1.1); secondly, the components from the given images are fused in order to generate the final representation. Hence, the main challenge with wavelet based fusion schemes lies in finding the best setup for both the image decomposition approach (i.e., number of levels, wavelet family and its configuration) and the fusion strategy used to merge the information from the decomposed images into a single representation (e.g., combining the approximations and details obtained from the two images element-wise by taking, respectively, the minimum, the maximum, the mean value or a randomly selected element). The selection of the right setup for fusing the given images depends on the way performance is evaluated. Hence, special care should be paid to the quantitative metric used to evaluate the obtained result, avoiding psychophysical experiments that would result in merely qualitative values [18].
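
The following sketch illustrates such a two-stage fusion using the PyWavelets library. The wavelet family ('db4'), the number of decomposition levels and the mean/max-absolute fusion rules are placeholder assumptions, since in the proposed approach this setup is precisely what the evaluation metric selects:

```python
import numpy as np
import pywt

def dwt_fuse(img_vis, img_ir, wavelet="db4", levels=2):
    """Fuse two registered, equally sized single-channel images
    in the wavelet domain.

    Stage 1: decompose both images with a 2-D multilevel DWT.
    Stage 2: merge the coefficients element-wise (mean rule for the
    approximation, max-absolute rule for the details) and reconstruct.
    """
    c_vis = pywt.wavedec2(img_vis.astype(float), wavelet, level=levels)
    c_ir = pywt.wavedec2(img_ir.astype(float), wavelet, level=levels)

    # Mean rule on the approximation coefficients
    fused = [(c_vis[0] + c_ir[0]) / 2.0]
    for d_vis, d_ir in zip(c_vis[1:], c_ir[1:]):
        # Max-absolute rule on each detail sub-band (LH, HL, HH)
        fused.append(tuple(np.where(np.abs(a) >= np.abs(b), a, b)
                           for a, b in zip(d_vis, d_ir)))

    return pywt.waverec2(fused, wavelet)
```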

The current paper addresses the problem of cross-spectral fused image visual odometry by using the algorithm proposed by Geiger et al. in [19], which is referred to as LibVISO2. The main novelty of the current approach is to take advantage of the information obtained at different spectral bands when visual odometry is estimated. In this way, robust solutions are obtained independently of the scenario's characteristics (e.g., daytime or nighttime). Fused images are obtained by a wavelet based scheme. Different fusion setups are quantitatively evaluated in search of the best one; the evaluations are performed by means of a quality metric based on mutual information. Once the best configuration is found, the fused image based visual odometry is computed and compared with a previous cross-spectral based approach [5] and with classical visible/infrared based approaches.
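
As an illustration, a histogram based mutual information score between a fused image and its two sources can be sketched as follows; this formulation (including the 256-bin joint histogram) is an assumption for illustration only and may differ from the exact metric used in the paper:

```python
import numpy as np

def mutual_information(img_a, img_b, bins=256):
    """Mutual information (in bits) between two images, estimated
    from their joint intensity histogram."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint distribution
    px = pxy.sum(axis=1, keepdims=True)       # marginal of img_a
    py = pxy.sum(axis=0, keepdims=True)       # marginal of img_b
    nz = pxy > 0                              # avoid log(0)
    return np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz]))

def fusion_quality(fused, img_vis, img_ir):
    """Score a fused image by the information it shares with each source."""
    return mutual_information(fused, img_vis) + mutual_information(fused, img_ir)
```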

The manuscript is organized as follows. Section 2 presents the proposed approach, detailing the discrete wavelet transform based image fusion and its setups, together with the off-the-shelf monocular visual odometry algorithm used to compute the vehicle odometry. Experimental results and comparisons are presented in Section 3. Finally, conclusions are given in Section 4.

Section snippets

Proposed approach

This section presents the Discrete Wavelet Transform image fusion scheme, the evaluation metric used to find the best setup and the monocular visual odometry approach used in the current work.

Experimental results

This section presents experimental results and comparisons with classical approaches based on visible spectrum or infrared images. Additionally, comparisons with the results presented in [5] are provided, showing the improvements obtained when a better setup is considered for the fusion algorithm. In all cases, GPS information is used as ground truth to evaluate the performance of the evaluated approaches. Below, the acquisition platforms are introduced and then the experimental results are presented.

Conclusion

The manuscript evaluates the performance of a classical monocular visual odometry approach when cross-spectral fused images are used. The best fusion strategy is selected by using a novel mutual information based metric. The obtained visual odometry results are compared with a previous approach as well as with classical ones (based on the visible and infrared spectra, respectively). While at daylight the performance of the classical visible spectrum based approach is quite similar to the one obtained…

Acknowledgments

This work has been partially supported by the Spanish Government under Project TIN2014-56919-C3-2-R; the PROMETEO Project of the “Secretaría Nacional de Educación Superior, Ciencia, Tecnología e Innovación de la República del Ecuador”, Ecuador; the ESPOL project Pattern recognition: case study on agriculture and aquaculture (M1-D1-2015); and the “Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement de la Generalitat de Catalunya” (2014-SGR-1506). C. Aguilera has been…


References (30)

  • T. Bourlai et al., Cross-spectral face verification in the short wave infrared (SWIR) band

  • Y. Choi, et al., All-day visual place recognition: Benchmark dataset and baseline, in: IEEE International Conference on...

  • J. Poujol et al., Visible-thermal fusion based monocular visual odometry

  • D. Nistér, O. Naroditsky, J. Bergen, Visual odometry, in: IEEE International Conference on Computer Vision and Pattern...

  • D. Scaramuzza, F. Fraundorfer, R. Siegwart, Real-time monocular visual odometry for on-road vehicles with 1-point...

Angel Domingo Sappa received the Electromechanical Engineering degree from the National University of La Pampa, General Pico, Argentina, in 1995, and the Ph.D. degree in Industrial Engineering from the Polytechnic University of Catalonia, Barcelona, Spain, in 1999. In 2003, after holding research positions in France, the UK and Greece, he joined the Computer Vision Center, Barcelona, Spain, where he is currently a Senior Researcher and a member of the Advanced Driver Assistance Systems Group. Since 2016 he has also been with the Electrical and Computer Science Engineering school of ESPOL, Guayaquil, Ecuador, as an invited Full Professor. His research interests span a broad spectrum within 2D and 3D image processing. His current research focuses on stereo image processing and analysis, 3D modeling, dense optical flow estimation and multispectral imaging.

Cristhian Aguilera received the B.S. degree in automation engineering from the Universidad del Bío-Bío, Concepción, Chile, in 2008 and the M.Sc. degree in computer vision from the Autonomous University of Barcelona, Barcelona, Spain, in 2014. He is currently working towards the Ph.D. degree in computer science at the Autonomous University of Barcelona. Since 2015, he has been an assistant editor of the Electronic Letters on Computer Vision and Image Analysis journal. His current research focuses on cross-spectral image similarity, stereo vision and deep convolutional networks.

Juan Carvajal Ayala received the Bachelor’s degree in Electronic Communications Engineering from the University of Navarra School of Engineering, San Sebastián, Spain, in 2014. He did a research internship at Fraunhofer IIS in Erlangen, Germany, and is currently a research assistant at the Center for Research, Development and Innovation of Computer Systems (CIDIS), ESPOL. His research interests are image fusion, pattern recognition and deep learning.

Miguel Oliveira received the Mechanical Engineering and M.Sc. in Mechanical Engineering degrees from the University of Aveiro, Portugal, in 2004 and 2007, respectively. In 2013 he obtained the Ph.D. in Mechanical Engineering, with a specialization in Robotics, on the topic of autonomous driving and driver assistance systems. Currently, he is a researcher at both the Institute for Systems and Computer Engineering, Technology and Science in Porto, Portugal, and the Institute of Electronics and Telematics Engineering of Aveiro, Portugal. In addition, he is an assistant professor at the Department of Mechanical Engineering, University of Aveiro, Portugal, where he teaches computer vision courses. His research interests include visual object recognition in open-ended domains, scene reconstruction from multi-modal sensor data, image and 3D data processing, computer vision and robotics.

Dennis G. Romero received the Computer Engineering degree from the Escuela Superior Politécnica del Litoral, ESPOL, Guayaquil, Ecuador, in 2007, and the Ph.D. degree in Electrical Engineering from the Universidade Federal do Espírito Santo, UFES, Vitória, Brazil, in 2014. In 2014, he joined the Center for Research, Development and Innovation of Computer Systems (CIDIS). He is a member of the pattern recognition research group at ESPOL. His research interests center on improving data understanding through sensor fusion, mainly through the application of machine learning and statistics for pattern recognition. His current research focuses on pattern recognition from microscope images of shrimp for the identification of diseases.

Boris X. Vintimilla received his degree in mechanical engineering in 1995 from the Escuela Superior Politécnica del Litoral, ESPOL, Guayaquil, Ecuador, and his Ph.D. degree in industrial engineering in 2001 from the Polytechnic University of Catalonia, Barcelona, Spain. In May 2001, he joined the Department of Electrical and Computer Science Engineering of ESPOL as an associate professor, and in 2008 he became a full professor. Dr. Vintimilla was the director of the Center of Vision and Robotics from 2005 to 2008. He did his post-doctoral research in the Digital Imaging Research Center at Kingston University (London, UK) from 2008 to 2009. Currently, he is the director of the Center for Research, Development and Innovation of Computer Systems (CIDIS) at ESPOL. His research areas include topics related to image processing and analysis, and vision applied to mobile robotics. Dr. Vintimilla has been involved in several projects supported by international and national organizations; as a result of this research he has published more than 40 scientific articles and book chapters.

Ricardo Toledo received the degree in Electronic Engineering from the Universidad Nacional de Rosario (Argentina) in 1986, the M.Sc. degree in image processing and artificial intelligence from the Universitat Autònoma de Barcelona (UAB) in 1992 and the Ph.D. in 2001. Since 1989 he has been a lecturer in the Computer Science Department of the UAB and has been involved in R&D projects. In 1996, he participated in the foundation of the Computer Vision Center (CVC) at the UAB. Currently, he is a full-time associate professor at the Computer Science Department, Coordinator of the Master in Informatics at the Escola d’Enginyeria (UAB) and a member of the Computer Vision Center. Ricardo has participated in several national and international/EU R&D projects, leading some of them, is author/co-author of more than 40 papers in the fields of computer vision, robotics and medical imaging, and has supervised several Master and Ph.D. theses.
