Understanding tracking: Lucas-Kanade Corner Tracking

By: Aryan Jain

Coming soon 👀Last updated: 👀

PrerequisiteUnderstanding corners: Harris Corner Detection

Hey guys! Welcome to this blog post about the Lucas-Kanade corner tracking algorithm. In the last blog we learned how to track corners and in this post we will learn how to track them! Why do we need to track features? We need to track features for various reasons:

Video Stabaization
SLAM (how a robot knows where it is, think Roombas!)
Object tracking
If you want to make a workout app that sees how good your form is you would need to track key body features and compare them to a picture perfect form

Everything in the past two blogs have been building methods so we can now finally dive into the amazing feature tracking algorithm designed by Bruce Lucas and Takeo Kanade in 1981!

These are the two frames we are going to be looking at in this blog (we have more that we will test our algorithm on).

Now zooming into these two regions we are going to try to track this feature across the frames. When we track features across frames we need to make a couple of assumptions.

Brightness constancy - we can assume that the brightness of a feature will stay the same or relatively close when that image is moved. By doing this we are able to get an equation like this:
This equation is pretty much just saying that the illumination (the I function) stays the same at an (x, y) location if we shift it by some (u,v) and increase the time in the video. This is more simply just saying that things in the physical world will stay relatively the same if you move it around a little bit (try to develop the intuition here).

Small motion - the next thing we will assume to track features using the algorithm is that the motion (u, v) is small. If we assume the motion is small we can now change that equation above using a Taylor expansion. Why can we do that? Well given a small enough change we can use the derivative of pixel intensity (I) at (x, y) to take the (u, v) values out of the intensity function. We do that since we don't actually know what the (u, v) values are so we can now solve for them with this new equation. If you need a refresher on Taylor expansion, check out the Harris corner detection blog .
Now this equation is a lot easier to work:
$I(x,y,t) = I(x,y,t) + uI_x + vI_y + I_t$
$0 = uI_x + vI_y + I_t$
$-I_t = uI_x + vI_y$
If you look closely this is an equation for a line. Use this interactive diagram to look at what possible values of (u, v) could satisfy the line (for some chosen Ix, Iy, and It).
$0 = u(1.00) + v(0.60) + (-0.90)$
$u$ = 0.90, $v$ = 0.00
For one fixed set of $I_x, I_y, I_t$ values, there are many possible $(u,v)$ pairs that satisfy this equation. That is why one pixel gives us a line, not a single motion.
$I_x$ : 1.00 $I_y$ : 0.60 $I_t$ : -0.90
As shown by this diagram, many (u, v) values can work for this equation and now the issue is we don't know which one is correct. How can we solve this? The reason we have this issue is because we have one equation and two unknowns. If you have taken linear algebra tha obvious answer is to get more equations and now we can solve a system of linear equations. The last assumption is how we solve this problem.

Spacial Coherence - This is the idea that pixels that are near each other tend to move together. If you think about it this makes a lot of sense. If you are tracking the corner of your iPhone, chances are if you move your phone the camera moves along with it.This is how the math looks like with this and now we are able to solve this overdetermined system.An overdetermined system in linear algebra occurs when a system has more equations (m) than variables (n), usually resulting in no exact solution. These systems are often inconsistent because the additional constraints ontradict each other, but they can be solved for a best fit using least squares approximation.
Now that we understand this we can transition it into a matrix (remember that linear algebra is a prereq for most things ML).
$\left[ \begin{array}{cc} I_x^{(1)} & I_y^{(1)} \\ I_x^{(2)} & I_y^{(2)} \\ \multicolumn{2}{c}{\vdots} \\ I_x^{(N)} & I_y^{(N)} \end{array} \right] \left[ \begin{array}{c} u \\ v \end{array} \right] = - \left[ \begin{array}{c} I_t^{(1)} \\ I_t^{(2)} \\ \vdots \\ I_t^{(N)} \end{array} \right]$
$Ad = b$