ROBOT VISION Lesson 10: Object Tracking and Visual Servoing Matthias Rüther
Contents • Object Tracking • Appearance based tracking • Kalman filtering • Condensation algorithm • Model based tracking • Model fitting and tracking • Visual Servoing • Principle • Servoing Types
Tracking
Definition of Tracking • Tracking: • Given a sequence of images, draw conclusions about the motion of the scene, the objects, or the camera. • Knowing this motion, predict where things will project in the next image, so that we need not search the whole image for them.
Tracking a Silhouette by Measuring Edge Positions • Observations are positions of edges along normals to tracked contour
Why not Wait and Process the Set of Images as a Batch? • In a car system, for example, pedestrians must be detected and tracked in real time. • Recursive methods require less computation.
Implicit Assumptions of Tracking • Physical cameras do not move instantly from one viewpoint to another. • Objects do not teleport around the scene. • The relative position between camera and scene changes incrementally. • We can model the motion.
Related Fields • Signal Detection and Estimation • Radar technology
The Problem: Signal Estimation • We have a system with parameters • Scene structure, camera motion, automatic zoom • System state is unknown (“hidden”) • We have measurements • Components of stable “feature points” in the images. • “Observations”, projections of the state. • We want to recover the state components from the observations
Recursive Least Squares Estimation • We don’t want to wait until all data have been collected to get an estimate of the depth. • We don’t want to reprocess old data when we make a new measurement. • Recursive method: data at step i are obtained from data at step i-1.
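A minimal numeric sketch of this idea (illustrative, not from the slides): recursive least-squares estimation of a single constant depth value, where each new measurement refines the running estimate without reprocessing old data.

```python
import numpy as np

def rls_update(x_hat, P, z, R=1.0):
    """One recursive least-squares step for a scalar constant state.
    x_hat: current estimate, P: estimate variance,
    z: new measurement, R: measurement noise variance."""
    K = P / (P + R)                  # gain: how much to trust the new measurement
    x_hat = x_hat + K * (z - x_hat)  # blend old estimate with the innovation
    P = (1.0 - K) * P                # uncertainty shrinks with every measurement
    return x_hat, P

# estimate a constant depth of 5.0 from noisy measurements, one at a time
rng = np.random.default_rng(0)
x_hat, P = 0.0, 100.0                # vague prior
for _ in range(200):
    z = 5.0 + rng.normal(0.0, 1.0)
    x_hat, P = rls_update(x_hat, P, z)
```

Each step costs O(1) regardless of how many measurements came before, which is exactly the point of the recursive formulation.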
Least Squares Estimation of the State Vector of a Static System
Least Squares Estimation of the State Vector of a Static System (cont.)
Recursive Least Squares Estimation for a Dynamic System (Kalman Filter)
Estimation when System Model is Nonlinear (Extended Kalman Filter)
Recursive Least Squares Estimation for a Dynamic System (Kalman Filter)
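The Kalman filter slides above can be summarized in a small sketch (assumed toy setup: a 1-D point moving at constant velocity, observed through noisy position measurements; the noise covariances are illustrative choices):

```python
import numpy as np

# Constant-velocity model: state a = [position, velocity]
F = np.array([[1.0, 1.0],        # state transition (time step = 1)
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])       # we observe position only
Q = 0.01 * np.eye(2)             # process noise covariance
R = np.array([[1.0]])            # measurement noise covariance

def kalman_step(a, P, z):
    # predict state and covariance forward through the dynamics
    a_pred = F @ a
    P_pred = F @ P @ F.T + Q
    # update with the new measurement z
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    a_new = a_pred + K @ (z - H @ a_pred)
    P_new = (np.eye(2) - K @ H) @ P_pred
    return a_new, P_new

# track a point moving at velocity 2.0 with noisy position readings
rng = np.random.default_rng(1)
a, P = np.zeros(2), 10.0 * np.eye(2)
for t in range(1, 100):
    z = np.array([2.0 * t + rng.normal(0.0, 1.0)])
    a, P = kalman_step(a, P, z)
```

Note that the filter recovers the velocity even though only position is measured; this is the prediction step exploiting the motion model.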
Tracking as a Probabilistic Inference Problem • Find distributions for the state vector a_i and for the measurement vector x_i. Then we are able to compute the expectations â_i and x̂_i. • Simplifying assumptions (same as for HMMs)
MODEL-BASED 3-D TRACKING IDEA: if the motion is caused by a known 3-D object, we can track the 3-D motion parameters, not just individual features! ADVANTAGES: - low dimensionality (3 rotations, 3 translations, independent of the number of features tracked) - mutually constrained motion instead of independently moving points LIMITATIONS: - 6 parameters only for rigid objects! Not articulated, not deformable. - assumes the 3-D model is known a priori
Example Algorithm [Wunsch, Hirzinger, IEEE Trans. RA 1997] SKETCH OF ALGORITHM: 0. Initialize the 3-D pose R_0, t_0 (rotation, translation) 1. Extract features from image I_t 2. Match image features with features of the 3-D model positioned at R_t-1, t_t-1 3. Evaluate a global error metric in 3-D space (note: not in image space) 4. Estimate R_t, t_t by aligning image and model features 5. Advance to the next frame and go to 1.
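The loop above can be sketched in code. This is a toy version under strong assumptions (point features instead of edges, noise-free measurements, and a closed-form Kabsch alignment standing in for the paper's linearized minimization); the function names are illustrative.

```python
import numpy as np

def kabsch(P, Q):
    """Closed-form rigid alignment: R, t minimizing sum ||R p_i + t - q_i||^2."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cQ - R @ cP

def track_pose(model_pts, frames, R, t):
    """Per frame: place the model at the previous pose, match by minimum
    distance, re-estimate the 6 pose parameters (steps 1-5 of the sketch)."""
    for obs in frames:
        pred = model_pts @ R.T + t           # model at pose from previous frame
        # correspondences by minimum distance (valid for small displacements)
        idx = ((obs[:, None, :] - pred[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        R, t = kabsch(model_pts[idx], obs)   # closed-form pose re-estimation
    return R, t

# synthetic check: a cube rotating 0.05 rad/frame about z, shifting along x
def Rz(a):
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

model = np.array([[x, y, z] for x in (-1.0, 1.0)
                            for y in (-1.0, 1.0)
                            for z in (-1.0, 1.0)])
frames = [model @ Rz(0.05 * k).T + np.array([0.02 * k, 0.0, 0.0])
          for k in range(1, 6)]
R_fin, t_fin = track_pose(model, frames, np.eye(3), np.zeros(3))
```

Because the pose from the previous frame seeds the matching, the per-frame displacement stays small and nearest-neighbour correspondences remain valid, which is exactly the assumption listed on the later slide.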
Some Details FEATURES: for instance image edges. For an image edge with orientation θ and offset d (and s_x, s_y the camera scale factors), n ∝ (s_x sin θ, −s_y cos θ, −d)ᵀ is the normal of the 3-D plane through the image edge and the projection centre. (Figure: the 3-D plane through the image edge, with p, q on the corresponding model edge.) ERROR METRIC: evaluated in 3-D space for efficiency (no back-projection): orthogonality of n and the model edge, i.e. the residuals nᵀ(R p + t) for points p on the corresponding model edge.
Some Details MINIMISATION: using, say, 3 types of features, the total error is a sum of one term per feature type. Trick 1: approximating R with a differential rotation, R ≈ I + [ω]×, all E terms can be linearized, a linear system is obtained from the quadratic minimization, and a solution computed in closed form: e.g., for edges, E_edge = Σ (nᵀ((I + [ω]×) p + t))².
Some Details ... where the resulting linear system A x = b, with x = (ωᵀ, tᵀ)ᵀ, is (trick 2) solved iteratively at each time instant to reduce the linearization error; a few iterations suffice for small frame-to-frame displacements. NOTICE THE ASSUMPTIONS MADE: - rigid object - model known a priori - small frame-to-frame displacements - image-model feature correspondences known (if displacements are small, by minimum distance)
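The two tricks can be demonstrated concretely. A minimal sketch, assuming matched 3-D point features (rather than the edge residuals above): the differential-rotation approximation turns the pose error into a linear system in (ω, t), and iterating the linear solve recovers the full rotation.

```python
import numpy as np

def skew(w):
    """Cross-product matrix [w]x, so that skew(w) @ p == np.cross(w, p)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rodrigues(w):
    """Rotation matrix for the axis-angle vector w (Rodrigues' formula)."""
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    K = skew(w / th)
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)

def refine_pose(p, q, iters=10):
    """Trick 1: R ~ I + [w]x linearizes the error; trick 2: iterate.
    p, q are matched 3-D point features (model and measurement)."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        pc = p @ R.T + t                    # model at the current pose estimate
        # linear system A x = b, x = (w, dt), from pc + w x pc + dt ~ q
        A = np.zeros((3 * len(p), 6))
        for i, pi in enumerate(pc):
            A[3*i:3*i+3, :3] = -skew(pi)    # d(w x p)/dw = -[p]x
            A[3*i:3*i+3, 3:] = np.eye(3)    # derivative w.r.t. dt
        b = (q - pc).reshape(-1)
        x = np.linalg.lstsq(A, b, rcond=None)[0]
        dR = rodrigues(x[:3])               # lift the differential rotation back to SO(3)
        R, t = dR @ R, dR @ t + x[3:]
    return R, t

# recover a moderate rigid motion from exact correspondences
rng = np.random.default_rng(0)
p = rng.normal(size=(15, 3))
R_true = rodrigues(np.array([0.2, -0.1, 0.3]))
t_true = np.array([0.5, -0.2, 0.1])
q = p @ R_true.T + t_true
R_est, t_est = refine_pose(p, q)
```

Mapping the solved ω back through Rodrigues' formula (instead of using I + [ω]× directly) keeps the estimate on SO(3), so a handful of iterations converges even for rotations that are not truly small.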
Problems with Tracking • Initial detection • If it is too slow, we will never catch up. • If it is fast, why not run detection on every frame? • Even if raw detection can be done in real time, tracking saves processing cycles compared to detection in every frame. • The CPU has other things to do. • Detection is needed again whenever tracking is lost. • Most vision tracking prototypes rely on initial detection done by hand.
Visual Servoing • The vision system operates in a closed control loop. • Better accuracy than "look and move" systems. Figures from S. Hutchinson: A Tutorial on Visual Servo Control
Visual Servoing • Example: maintaining a relative object position Figures from P. Wunsch and G. Hirzinger: Real-Time Visual Tracking of 3-D Objects with Dynamic Handling of Occlusion
Visual Servoing • Camera configurations: end-effector mounted vs. fixed Figures from S. Hutchinson: A Tutorial on Visual Servo Control
Visual Servoing • Servoing Architectures Figures from S.Hutchinson: A Tutorial on Visual Servo Control
Visual Servoing • Position-based and image-based control • Position-based: • Alignment in the target coordinate system • The 3-D structure of the target is reconstructed • The end-effector is tracked • Sensitive to calibration errors • Sensitive to reconstruction errors • Image-based: • Alignment in image coordinates • No explicit reconstruction necessary • Insensitive to calibration errors • Only certain problems are solvable • Depends on the initial pose • Depends on the selected features (Figure: end-effector and target, and their images.)
Visual Servoing • EOL and ECL control • EOL: endpoint open-loop; only the target is observed by the camera • ECL: endpoint closed-loop; both the target and the end-effector are observed by the camera (Figure: EOL vs. ECL configurations.)
Visual Servoing • Position-based algorithm: • Estimate the relative pose • Compute the error between the current pose and the target pose • Move the robot • Example: point alignment (Figure: points p1, p2.)
Visual Servoing • Position-based point alignment • Goal: bring e to 0 by moving p1: e = |p2m − p1m|, u = k · (p2m − p1m) • pxm is subject to the following measurement errors: sensor position, sensor calibration, sensor measurement error • pxm is independent of the following errors: end-effector position, target position (Figure: measured points p1m, p2m at distance d.)
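The proportional control law on this slide can be simulated directly. A minimal sketch (idealized actuation, no measurement noise; the function name and loop structure are illustrative):

```python
import numpy as np

def servo_position_based(p1, p2, k=0.5, tol=1e-6, max_steps=100):
    """Proportional position-based servoing: move p1 toward the measured p2."""
    p1 = np.asarray(p1, float)
    p2 = np.asarray(p2, float)
    for _ in range(max_steps):
        e = np.linalg.norm(p2 - p1)   # e = |p2m - p1m|
        if e < tol:
            break
        u = k * (p2 - p1)             # control command u = k (p2m - p1m)
        p1 = p1 + u                   # ideal actuation of the end-effector
    return p1, e

# align p1 with a target point p2
p1_final, e_final = servo_position_based([0.0, 0.0, 0.0], [1.0, 2.0, 3.0])
```

With 0 < k < 1 and perfect actuation, the error shrinks by a factor (1 − k) per cycle; measurement errors in pxm would instead leave a residual offset, which is the sensitivity the slide points out.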
Visual Servoing • Image-based point alignment • Goal: bring e to 0 by moving p1: e = |u1m − v1m| + |u2m − v2m| • uxm, vxm are subject only to sensor measurement error • uxm, vxm are independent of the following measurement errors: sensor position, end-effector position, sensor calibration, target position (Figure: points p1, p2 with images u1, v1 and u2, v2 in cameras c1, c2.)
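A toy numerical version of image-based point alignment, under assumed conditions (two normalized pinhole cameras, a numerical image Jacobian standing in for the analytic interaction matrix, Gauss-Newton updates): the error is formed purely from image coordinates, and no 3-D reconstruction of the target is performed.

```python
import numpy as np

def project(p):
    """Normalized pinhole projection of a 3-D point in camera coordinates."""
    return np.array([p[0] / p[2], p[1] / p[2]])

def servo_image_based(p1, p2, cams, iters=10, h=1e-6):
    """Drive the image-space error to zero by Gauss-Newton on the stacked
    residuals; the target p2 enters only through its image coordinates."""
    p1 = np.asarray(p1, float)
    def residual(p):
        # stack (u_i - v_i) over all cameras, measured in the image
        return np.concatenate([project(R @ p + t) - project(R @ p2 + t)
                               for R, t in cams])
    for _ in range(iters):
        r = residual(p1)
        J = np.empty((len(r), 3))
        for i in range(3):               # numerical image Jacobian
            d = np.zeros(3)
            d[i] = h
            J[:, i] = (residual(p1 + d) - residual(p1 - d)) / (2.0 * h)
        p1 = p1 - np.linalg.lstsq(J, r, rcond=None)[0]
    return p1

# two cameras viewing the scene from different directions
a = 0.5
Ry = np.array([[np.cos(a), 0.0, np.sin(a)],
               [0.0, 1.0, 0.0],
               [-np.sin(a), 0.0, np.cos(a)]])
cams = [(np.eye(3), np.array([0.0, 0.0, 5.0])),
        (Ry, np.array([0.5, 0.0, 5.0]))]
p2 = np.array([0.3, -0.2, 0.4])
p1_final = servo_image_based([0.0, 0.0, 0.0], p2, cams)
```

Two distinct views are needed here: with a single camera, zero image error only constrains p1 to a ray, which is one of the "only certain problems are solvable" caveats of image-based control.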
Visual Servoing • Example: laparoscopy Figures from A. Krupa: Autonomous 3-D Positioning of Surgical Instruments in Robotized Laparoscopic Surgery Using Visual Servoing
Tracking using CONDENSATION CONditional DENSity propagATION M. Isard and A. Blake, CONDENSATION – conditional density propagation for visual tracking, Int. J. Computer Vision 29(1), 1998, pp. 5-28.
Goal • Model-based visual tracking in dense clutter at near video frame rates
Approach • Probabilistic framework for tracking objects such as curves in clutter using an iterative sampling algorithm. • Model motion and shape of target • Top-down approach • Simulation instead of analytic solution
Probabilistic Framework • Object dynamics form a temporal Markov chain • Observations, z_t, are independent (mutually and w.r.t. the process) • Use Bayes' rule
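Combining the two assumptions via Bayes' rule yields the standard recursive propagation of the state density (writing Z_t = (z_1, …, z_t) for the observation history):

```latex
p(x_t \mid Z_t) \;\propto\; p(z_t \mid x_t)\, p(x_t \mid Z_{t-1}),
\qquad
p(x_t \mid Z_{t-1}) \;=\; \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid Z_{t-1})\, dx_{t-1}
```

The Markov assumption gives the prediction integral; the observation-independence assumption gives the product form of the update.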
Notation • X: state vector, e.g., the curve's position and orientation • Z: measurement vector, e.g., image edge locations • p(X): prior probability of the state vector; summarizes prior domain knowledge, e.g., from independent measurements • p(Z): probability of measuring Z; fixed for any given image • p(Z | X): probability of measuring Z given that the state is X; compares the image to the expectation based on the state • p(X | Z): probability of X given that measurement Z has occurred; called the state posterior
Tracking as Estimation • Compute the state posterior, p(X|Z), and select the next state to be the one that maximizes it (Maximum a Posteriori (MAP) estimate) • Measurements are complex and noisy, so the posterior cannot be evaluated in closed form • Particle filter (iterative sampling) idea: • Stochastically approximate the state posterior with a set of N weighted particles (s, π), where s is a sample state and π is its weight • Use Bayes' rule to compute p(X|Z)
Factored Sampling • Generate a set of samples that approximates the posterior p(X|Z) • Sample set s = {s(1), …, s(N)} generated from the prior p(X); each sample s(n) is assigned a weight π(n) (“probability”) proportional to the observation likelihood p(Z | X = s(n))
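A compact sketch of the resulting algorithm: a generic particle-filter step consistent with the factored-sampling description (this is an illustrative implementation, not the authors' code), demonstrated on a hypothetical 1-D state with random-walk dynamics and a Gaussian observation likelihood.

```python
import numpy as np

def condensation_step(samples, weights, dynamics, likelihood, rng):
    """One CONDENSATION iteration: resample by weight, predict each sample
    through the stochastic dynamics, then reweight by the observation
    likelihood p(z | x = s)."""
    N = len(samples)
    idx = rng.choice(N, size=N, p=weights)   # factored (re)sampling
    samples = dynamics(samples[idx], rng)    # temporal Markov-chain prediction
    weights = likelihood(samples)
    return samples, weights / weights.sum()  # normalized weights

# toy 1-D demo: track a state drifting at 0.5 per step through noisy observations
rng = np.random.default_rng(3)
N = 2000
samples = rng.normal(0.0, 2.0, N)            # sampled from the prior p(X)
weights = np.full(N, 1.0 / N)
for t in range(1, 30):
    z = 0.5 * t + rng.normal(0.0, 0.2)       # noisy observation of the true state
    samples, weights = condensation_step(
        samples, weights,
        dynamics=lambda s, rng: s + rng.normal(0.0, 0.3, size=s.shape),
        likelihood=lambda s: np.exp(-0.5 * ((s - z) / 0.5) ** 2) + 1e-300,
        rng=rng)
estimate = float(np.sum(weights * samples))  # posterior mean
```

Because the posterior is represented by samples rather than a parametric form, the same machinery handles the multi-modal densities that arise in dense clutter, where a Kalman filter's single Gaussian would fail.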