Instantaneous 3D motion from image derivatives using the Least Trimmed Square regression

https://doi.org/10.1016/j.patrec.2008.12.006

Abstract

This paper presents a new technique for instantaneous 3D motion estimation. The main contributions are as follows. First, we show that the 3D camera or scene velocity can be retrieved from image derivatives alone, assuming that the scene contains a dominant plane. Second, we propose a new robust algorithm that simultaneously provides the Least Trimmed Square solution and the percentage of inliers (the non-contaminated data). Experiments on both synthetic and real image sequences demonstrate the effectiveness of the developed method and show that the new robust approach can outperform classical robust schemes.

Introduction

Computing object and camera motions from 2D image sequences has been a central problem in computer vision for many years (Hartley and Zisserman, 2000, Jasinschi et al., 2000, Weng et al., 1993, Zucchelli et al., 2002). More specifically, computing the 3D velocity of either the camera or the scene is of particular interest to a wide variety of applications in computer vision and robotics, such as calibration (Malm and Heyden, 2002), visual servoing, ego-motion estimation, and the detection of independently moving objects. One of the main tasks in computer vision is the reconstruction of the structure of a scene, in a process known as structure from motion (SFM). The classic approach to SFM, which has attracted considerable attention in the literature, is based on the extraction and matching of image features throughout the image sequence.

Many algorithms have been proposed for estimating the 3D relative camera motion (discrete case) (Lourakis and Argyros, 2004) and the 3D velocity (differential case) (Baumela et al., 2000, Brooks et al., 1997, Rother and Carlsson, 2002).

In (Jasinschi et al., 2000), the authors describe a method for extracting the camera velocity. This method is a combination of the eight-point method in structure-from-motion with a statistical technique to automatically select feature points in the image, irrespective of 3D content information. In (Kim et al., 1997), the authors estimate the camera motion parameters from image sequences using a linear motion parameter equation and the Kalman filtering method.

Very few non-correspondence motion estimation algorithms have been proposed (Dellaert et al., 2000, Boughorbel et al., 2003). Although these methods circumvent the need to establish direct correspondences, they still require feature extraction in the images. In (Boughorbel et al., 2003), the authors propose a method for estimating the relative motion of a camera from two successive frames. The method relies on a structure saliency measure and does not require any prior knowledge of point correspondences between the images. Two such metrics were presented: the first was simple and based on measuring the scattering of the structure points; the second used the tensor voting approach and was more robust. In (Baumela et al., 2000), the authors derived the continuous analogue of the discrete epipolar equation, gave a geometric interpretation of it, and provided a practical algorithm for computing the camera's motion parameters from closely spaced views. The input was the optical flow field.

While the discrete case requires feature matching and tracking across the images, the differential case requires the computation of the optical flow field (2D velocity field) (Barron et al., 1994, Irani, 1999, Fleet et al., 2000). All these problems are generally ill-conditioned.

In our work, we assume that the scene is far from the camera or that it contains a dominant planar structure. The use of image derivatives has been exploited in (Brodsky and Fermuller, 2002) to perform camera intrinsic calibration. In (Malm and Heyden, 2000), image derivatives have been used to perform hand-eye calibration with constrained camera motions obtained by controlling the motion of the robot hand. The current paper has two main contributions. First, we introduce a novel technique for unconstrained 3D velocity estimation using image derivatives alone, so that feature extraction and tracking are not required. Second, we propose a robust method that combines the Least Trimmed Square regression and the Golden Section Search algorithm when the number of inliers is not known a priori. Our robust method simultaneously estimates the inlier percentage and the robust solution, whereas existing LTS regression methods (e.g., Rousseeuw and Driessen, 2002) assume that the percentage of inliers is known in advance.
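
To make this second contribution concrete, here is a minimal Python sketch of the general scheme: an LTS fit at fixed coverage (concentration steps in the spirit of Rousseeuw and Driessen, 2002) nested inside a golden section search over the unknown inlier fraction. The function names, the scoring criterion (a consistency-corrected robust scale), and all defaults are our own illustrative choices, not the paper's exact formulation.

```python
import numpy as np
from scipy.stats import norm

def lts_fit(A, b, h, n_iter=30, rng=np.random.default_rng(0)):
    """Least Trimmed Squares at fixed coverage h, via concentration
    steps: repeatedly refit on the h samples with smallest residuals."""
    n = A.shape[0]
    idx = rng.choice(n, size=h, replace=False)        # random initial h-subset
    for _ in range(n_iter):
        x, *_ = np.linalg.lstsq(A[idx], b[idx], rcond=None)
        r2 = (A @ x - b) ** 2
        new_idx = np.argsort(r2)[:h]                  # h best-fitting samples
        if np.array_equal(np.sort(new_idx), np.sort(idx)):
            break                                     # concentration converged
        idx = new_idx
    return x, np.sort(r2)[:h].sum()                   # solution, trimmed cost

def robust_scale(trimmed_cost, h, n):
    """Gaussian-consistency-corrected scale, so that costs obtained with
    different coverages h are comparable (a standard correction; the
    paper's exact criterion may differ)."""
    alpha = h / n
    q = norm.ppf((1.0 + alpha) / 2.0)
    return np.sqrt((trimmed_cost / h) / (1.0 - 2.0 * q * norm.pdf(q) / alpha))

def lts_golden_section(A, b, lo=0.5, hi=1.0, tol=0.02):
    """Golden section search over the unknown inlier fraction; each probe
    runs an LTS fit and scores it with the corrected robust scale."""
    n, p = A.shape
    gr = (np.sqrt(5.0) - 1.0) / 2.0

    def cost(frac):
        h = max(int(round(frac * n)), p + 1)          # coverage for this fraction
        _, trimmed = lts_fit(A, b, h)
        return robust_scale(trimmed, h, n)

    a, d = lo, hi
    while d - a > tol:
        c1, c2 = d - gr * (d - a), a + gr * (d - a)   # interior probe points
        if cost(c1) < cost(c2):
            d = c2
        else:
            a = c1
    frac = (a + d) / 2.0
    x, _ = lts_fit(A, b, max(int(round(frac * n)), p + 1))
    return x, frac                                    # estimate, inlier fraction
```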

The first contribution concerns instantaneous 3D motion estimation from image data, which can be useful for many applications in vision and robotics, such as extrinsic calibration (Dornaika and Chung, 2008), visual servoing (Horaud et al., 1998), video indexing (Jasinschi et al., 2000), space robot localization (Johnson et al., 2007), and augmented reality (Lourakis and Argyros, 2004). What differentiates our work from existing approaches is the use of image derivatives alone, rather than the optical flow field, combined with a novel robust statistics solution.

The second contribution is within the field of robust linear regression (Maronna et al., 2006).

In our study, we deal with the estimation of the 3D velocity of the camera or the scene from image derivatives, where the motion is not constrained. Our proposed approach lends itself nicely to all applications in which the camera motion is not controlled (e.g., when using a hand-held camera in indoor environments). The paper is organized as follows. Section 2 describes the problem we are focusing on. Section 3 describes the proposed methods. Experimental results on both synthetic and real image sequences are given in Section 4.

Problem formulation

Throughout this paper we represent the coordinates of a point in the image plane by small letters (x, y) and the object coordinates in the camera coordinate frame by capital letters (X, Y, Z). In our work, we use the perspective camera model as our projection model. Thus, the projection is governed by the following equation, where the coordinates are expressed in homogeneous form:

$$\lambda \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} f & s & x_c & 0 \\ 0 & r f & y_c & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

Here, f denotes the focal length in pixels, r and s the aspect ratio and the skew, and (x_c, y_c) the principal point.
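
As a quick illustration of this projection model, the following sketch maps a homogeneous 3D point to pixel coordinates; the numeric intrinsic parameters are hypothetical placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical intrinsic parameters, for illustration only
f, r, s = 800.0, 1.0, 0.0        # focal length (pixels), aspect ratio, skew
xc, yc = 320.0, 240.0            # principal point

K = np.array([[f,   s,   xc,  0.0],
              [0.0, r*f, yc,  0.0],
              [0.0, 0.0, 1.0, 0.0]])

P = np.array([0.2, -0.1, 2.0, 1.0])        # homogeneous 3D point (X, Y, Z, 1)
x_h = K @ P                                # homogeneous image point
x, y = x_h[0] / x_h[2], x_h[1] / x_h[2]    # divide by lambda (= Z here)
print(x, y)                                # -> 400.0 200.0
```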

Proposed tools and methods

We assume that the image contains N pixels for which the spatio-temporal derivatives (I_x, I_y, I_t) have been computed. In practice, N is very large. In order to reduce this number, one can either drop pixels having small gradient components or adopt a low-resolution representation of the images. In the sequel, we do not distinguish between the two cases, i.e., N is either the original size or the reduced one. By inserting Eqs. (6), (7) into Eq. (3) we get:

$$I_x a_1 + I_x x\, a_2 + I_x y\, a_3 + I_y a_4 + I_y x\, a_5 + I_y y\, a_6 + (I_x x^2 + I_y x y)\, a_7 + (I_x x y + I_y y^2)\, a_8 + I_t = 0$$
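
Each selected pixel therefore contributes one equation that is linear in the eight parameters a_1, ..., a_8. The sketch below (our own arrangement, assuming the equation completes in the standard eight-parameter planar-flow form shown above) assembles the corresponding N x 8 system; a plain least-squares solve is indicated alongside the robust variant sketched in the Introduction.

```python
import numpy as np

def build_system(Ix, Iy, It, x, y):
    """Stack one row per selected pixel of the linear constraint above.
    Ix, Iy, It are the spatio-temporal derivatives and (x, y) the pixel
    coordinates, all given as length-N arrays; the temporal derivative
    moves to the right-hand side."""
    A = np.column_stack([
        Ix,                         # a1
        Ix * x,                     # a2
        Ix * y,                     # a3
        Iy,                         # a4
        Iy * x,                     # a5
        Iy * y,                     # a6
        Ix * x**2 + Iy * x * y,     # a7
        Ix * x * y + Iy * y**2,     # a8
    ])
    return A, -It

# A, b = build_system(Ix, Iy, It, x, y)
# a_ls, *_ = np.linalg.lstsq(A, b, rcond=None)   # plain least squares
# a_lts, frac = lts_golden_section(A, b)         # robust variant sketched above
```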

Experimental results

Our experiments have been carried out on synthetic and real images.

Conclusion

This paper presented an approach to 3D velocity estimation from spatio-temporal image derivatives alone. What differentiates the presented work from existing ones is the use of image derivatives alone, rather than the optical flow field, combined with a novel robust statistics solution. Although the developed approach does not rely on the computation of the optical flow, the latter is a by-product of the approach. This paper had two main contributions. First, we introduced a novel technique for estimating the 3D camera or scene velocity from image derivatives alone, assuming the scene contains a dominant plane. Second, we proposed a robust algorithm that simultaneously provides the Least Trimmed Square solution and the percentage of inliers.

Acknowledgements

This work was partially supported by the Government of Spain under MEC Project TRA2007-62526/AUT and research programme Consolider Ingenio 2010: MIPRCV (CSD2007-00018).

References

  • Barron, J.L., et al., 1994. Performance of optical flow techniques. Internat. J. Comput. Vision.
  • Baumela, L., Agapito, L., Bustos, P., Reid, I., 2000. Motion estimation using the differential epipolar equation. In: ...
  • Boughorbel, F., et al., 2003. Estimating 3D camera motion without correspondences using a search for the best structure. Pattern Recognition Lett.
  • Brodsky, T., et al., 2002. Self-calibration from image derivatives. Internat. J. Comput. Vision.
  • Brooks, M.J., et al., 1997. Determining the egomotion of an uncalibrated camera from instantaneous optical flow. J. Optical Soc. Amer. A.
  • Dellaert, F., Seitz, S., Thorpe, C., Thrun, S., 2000. Structure from motion without correspondence. In: IEEE Conf. on ...
  • Dornaika, F., et al., 2008. Stereo geometry from 3D ego-motion streams. IEEE Trans. Systems Man Cybernet., Part B.
  • Fleet, D.J., et al., 2000. Design and use of linear models for image motion analysis. Internat. J. Comput. Vision.
  • Hartley, R., Zisserman, A., 2000. Multiple View Geometry in Computer Vision.
  • Horaud, R., et al., 1998. Visually guided object grasping. IEEE Trans. Robotics Automation.
  • Irani, M., 1999. Multi-frame optical flow estimation using subspace constraints. In: IEEE Conf. on Computer ...
  • Jasinschi, R.S., et al., 2000. Apparent 3D camera velocity—Extraction and applications. IEEE Trans. Circuits Systems Video Technol.
  • Johnson, A., et al., 2007. Design through operation of an image-based velocity estimation system for Mars landing. Internat. J. Comput. Vision.