Understanding corners: Harris Corner Detection

By: Aryan Jain

Mar 26, 2026

Hey everyone! In this blog we are going to be delving into the Harris Corner Detection algorithm. Before we dive in, I really recommend reading the article about Canny edge detection, as some of its concepts bleed into this blog. With that said, let's dive into it!

So as you can tell from the title of this post, we are looking at an algorithm that can detect corners in an image. This is useful because when we want to track features in a video, corners are the features we track. The reason we use corners as opposed to edges will become clear very soon.

Take a look at this image. There are three separate patches of pixels displayed here, each of a different type: a flat patch, an edge, or a corner.

[Interactive view: the three patches — flat, edge, and corner]

Now, these are the same patches, each moved around a little bit. What we see is that, among these three, a shift is nearly impossible to notice in the flat patch, a little easier to notice in the edge, and very easy to spot in the corner. Based on some of the math in this post and the intuition you have just gained, we see that corners are very easy to track, and because of that they will prove pivotal for tracking features. As we discussed in the previous blog, tracking features has limitless possibilities.

[Interactive view: the same patches, shifted]

Now I want to take a look at some handwritten notes I had. Say we take the blue and green image, take an arbitrary window, and shift that window by an offset (u, v) (as we just did). If we take the difference of the two windows (subtracting the intensity values of corresponding pixels, i.e. the change in intensity between the two windows), we now have the difference in intensity when we shift by (u, v). We can take that single value and plot it on a coordinate grid marked by u and v (as shown below).

This is the basis for the auto-correlation function, also known as the SSD (sum of squared differences) error function. This function measures how much a window of pixels in an image changes when shifted by (u, v). Let's now take a look at our original three windows' auto-correlation function (notated with the character E).
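Written out, with W the window, w(x,y) a weighting function (1 inside a box window, or a Gaussian), and I the image intensity, the SSD error function is:

```latex
E(u,v) = \sum_{(x,y) \in W} w(x,y)\,\bigl[I(x+u,\, y+v) - I(x,y)\bigr]^2
```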

So we have established that we want to track corners; the question now becomes how we can use this auto-correlation function to do so. We want to see high values (meaning there is a large pixel change, so it is either an edge or a corner) AND we want that change in pixel intensity to be large in every shift direction, both horizontal and vertical. Together, these tell us that we have a corner and not a patch or an edge.

Think about why that is the case and really let this intuition sink in because we will need it later!

As is often the case in computer vision, this solution is not efficient. It turns out that calculating the auto-correlation function for every pixel, at some arbitrary window size and step size, is VERY INEFFICIENT. We have to instead find some sort of an approximation for the auto-correlation function.

We saw in the last blog post that finding the derivatives of the intensity in the x and y directions is very easy. With this tool in our bag, we can now look at approximating the term I(x+u, y+v) with a Taylor expansion. If you don't know what a Taylor expansion is, watch this video; and if you don't have a solid understanding of calculus, just know that we can approximate I(x+u, y+v) for small changes in u and v using a Taylor expansion. With that said, we can approximate I(x+u, y+v) \approx I(x,y) + uI_x(x,y) + vI_y(x,y)
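To see the approximation in action, here is a quick numerical sanity check on a smooth, made-up intensity function (the function and the shift values are purely illustrative):

```python
import numpy as np

# A smooth synthetic "intensity" function with known partial derivatives
I  = lambda x, y: np.sin(x) * np.cos(y)
Ix = lambda x, y: np.cos(x) * np.cos(y)    # dI/dx
Iy = lambda x, y: -np.sin(x) * np.sin(y)   # dI/dy

x, y = 1.0, 0.5
u, v = 0.01, 0.02  # small shifts

exact  = I(x + u, y + v)
taylor = I(x, y) + u * Ix(x, y) + v * Iy(x, y)
print(abs(exact - taylor))  # tiny: the error shrinks quadratically with the shift
```

For a real image we don't have analytic derivatives, which is exactly where the image gradients from the previous blog come in.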

Ok, now I want you to build some intuition for the auto-correlation function in a geometric view.

For starters, what does the function output? We compute the difference between the two regions and square it, so from a conceptual standpoint we are measuring dissimilarity. Two similar patches have a small auto-correlation value, and two different pixel windows have a large value. So if we have a flat patch, then moving the window anywhere will likely not produce a large value. If we have a corner, then moving the window around and computing the difference will likely produce a large one.

Now we have to start thinking about the auto-correlation function as a 3D bowl (a paraboloid). The x and y axes are the u and v values, and z is the actual function output, so any point is the difference between the original window and the window we get when we shift by those specific u and v values. If we have a corner, then a small shift results in a large change; the bowl is very steep, so a small step jumps us up a lot. A flat patch would be the exact opposite, where small shifts barely increase the auto-correlation output at all. Edges find themselves in the middle: a small shift across the edge results in a large output, while a shift along the edge barely changes anything.

If we take a top-down view of this surface, ignoring the z axis, we can see that tight small circles represent corners, whereas ovals are indicative of edges and large circles represent flat patches. Now, a final note is that the M matrix from the video is what actually determines this shape, so if we make it a diagonal matrix (values on the top-left to bottom-right diagonal, but zeros everywhere else), then the eigenvalues (the values on the diagonal) can tell us if it is a corner, edge, or patch, as shown by the λ view panel.
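To back up this intuition numerically, here is a small NumPy sketch (my own illustrative example, not from the interactive panel) that builds M for three synthetic 20×20 patches and prints its eigenvalues:

```python
import numpy as np

def structure_matrix(patch):
    """M: the 2x2 matrix of summed gradient products over the whole patch."""
    Iy, Ix = np.gradient(patch.astype(float))  # np.gradient returns (rows, cols) = (y, x)
    return np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                     [np.sum(Ix * Iy), np.sum(Iy * Iy)]])

flat   = np.zeros((20, 20))                           # constant intensity
edge   = np.zeros((20, 20)); edge[:, 10:] = 1.0       # vertical edge
corner = np.zeros((20, 20)); corner[10:, 10:] = 1.0   # corner of a square

for name, patch in [("flat", flat), ("edge", edge), ("corner", corner)]:
    lo, hi = np.linalg.eigvalsh(structure_matrix(patch))  # ascending order
    print(f"{name:6s} eigenvalues: {lo:.2f}, {hi:.2f}")
```

Both eigenvalues come out near zero for the flat patch, only one is large for the edge, and both are large for the corner — exactly the λ behavior described above.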

E(u,v) \approx \left(\sum I_x^2\right)u^2 + 2\left(\sum I_xI_y\right)uv + \left(\sum I_y^2\right)v^2 = \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}, \quad M = \begin{bmatrix} \sum I_x^2 & \sum I_xI_y \\ \sum I_xI_y & \sum I_y^2 \end{bmatrix}

[Interactive panel: top-down view of the E(u,v) surface for an adjustable M, with an eigenvalue readout. Swapping x and y changes the orientation, but not the eigenvalues. Drag the surface to rotate it.]

We now have the equation and the formula and can start implementing. The only thing I will preface is that the formula above uses a box summation, but we will instead implement a Gaussian-weighted summation of the values in the window (the reason is related to signal processing, which I may make a blog about later). Finally, after we apply the equation to the image, we need to apply non-maximum suppression (NMS) to localize the corners (look at the previous blog for why we need to do that and how it works).

We first need the I_x and I_y values, then we can calculate the M matrix.

python
import cv2
import numpy as np
from scipy import ndimage

# Recall the first step is performing a Gaussian blur (recall from last blog)
smooth_window_size = 7
blur = cv2.GaussianBlur(gray_img, (smooth_window_size, smooth_window_size), 0)

# We can now create a kernel and pass it through the image
kernel = np.array([-1, 0, 1], dtype=np.float32) / 2.0
Ix = ndimage.convolve1d(blur, kernel, axis=1, mode='reflect', output=np.float32)
Iy = ndimage.convolve1d(blur, kernel, axis=0, mode='reflect', output=np.float32)
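As a quick sanity check (my own toy example), the kernel above recovers known slopes on a synthetic ramp image. One subtlety: `convolve1d` flips the kernel, so the sign of the result is flipped relative to a plain central difference — harmless here, because Harris only ever uses products of the gradients.

```python
import numpy as np
from scipy import ndimage

# Synthetic ramp with known slopes: intensity = 3*x + 2*y
y, x = np.mgrid[0:20, 0:20].astype(np.float32)
ramp = 3 * x + 2 * y

kernel = np.array([-1, 0, 1], dtype=np.float32) / 2.0
Ix = ndimage.convolve1d(ramp, kernel, axis=1, mode='reflect', output=np.float32)
Iy = ndimage.convolve1d(ramp, kernel, axis=0, mode='reflect', output=np.float32)

# Away from the borders, the magnitudes match the true slopes (3 and 2)
print(abs(Ix[5, 5]), abs(Iy[5, 5]))
```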

Now we compute the values needed for the M matrix

python
# Now we want to compute the values needed for the M matrix
Ix2 = np.square(Ix)
Iy2 = np.square(Iy)
Ixy = Ix * Iy

Now blur, so that we can do the summation part of the M matrix (the Gaussian blur performs a weighted summation based on the Gaussian distribution).

python
# Blur the resultants with a Gaussian blur which provides us the weighted summations 
# for each window
smooth_window_size = 7
G_Ix2 = cv2.GaussianBlur(Ix2, (smooth_window_size, smooth_window_size), 0)
G_Iy2 = cv2.GaussianBlur(Iy2, (smooth_window_size, smooth_window_size), 0)
G_Ixy = cv2.GaussianBlur(Ixy, (smooth_window_size, smooth_window_size), 0)

These three images tell us, at each pixel location, how much a shift in x and y changes the patch. Now, how can we use that to tell us where the corners are? We can use those values to assign a cornerness score to each pixel. There are two ways I want to discuss for doing that.

Shi-Tomasi Corner Detection:

In this method, as we talked about earlier, we can use the eigenvalues to determine if a pixel is likely to be a corner. The trick is that we need two large eigenvalues, so in this method we take the M matrix for each pixel and compute the eigenvalues of that matrix. We then check whether the minimum of the two is above a certain threshold. If it is, then BOOM, we found a corner. This works because if the smallest eigenvalue is larger than the threshold, then both of the values are sufficiently large.
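For a 2×2 symmetric matrix there is a closed-form expression for the smaller eigenvalue, so in practice you don't need a full eigen-solver per pixel. A minimal sketch (the function name and the random test matrix are mine):

```python
import numpy as np

def min_eigenvalue(g_ix2, g_iy2, g_ixy):
    """Smaller eigenvalue of M = [[g_ix2, g_ixy], [g_ixy, g_iy2]] (works elementwise)."""
    half_trace = (g_ix2 + g_iy2) / 2.0
    radius = np.sqrt(((g_ix2 - g_iy2) / 2.0) ** 2 + g_ixy ** 2)
    return half_trace - radius  # the two eigenvalues are half_trace +/- radius

# Sanity check against numpy's eigen-solver on a random symmetric 2x2 matrix
rng = np.random.default_rng(0)
a, b, c = rng.random(3)
print(min_eigenvalue(a, b, c), np.linalg.eigvalsh([[a, c], [c, b]])[0])
```

Applied to the blurred gradient products (`G_Ix2`, `G_Iy2`, `G_Ixy`), this gives a per-pixel Shi-Tomasi score in one vectorized pass; OpenCV exposes the same idea through `cv2.goodFeaturesToTrack`.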

Harris-Stephens Corner Detection:

It turns out, however, that calculating the eigenvalues a couple thousand times is not efficient. Instead we can use the formula R = \det(M) - \alpha\,\mathrm{trace}(M)^2, which serves as a good approximation to Shi-Tomasi corner detection (note that a lot of the time we are using approximations in computer vision).

Numerical intuition

| Case | $\lambda_1$ | $\lambda_2$ | det $= \lambda_1\lambda_2$ | trace $= \lambda_1 + \lambda_2$ | $R = \det - \alpha(\mathrm{trace})^2$, $\alpha = 0.04$ | Interpretation |
| --- | --- | --- | --- | --- | --- | --- |
| Flat | 0.1 | 0.1 | 0.01 | 0.2 | 0.01 - 0.04(0.2²) = 0.0084 | Very small → flat |
| Edge | 10 | 0.1 | 1 | 10.1 | 1 - 0.04(10.1²) ≈ -3.08 | Negative → edge |
| Corner | 10 | 10 | 100 | 20 | 100 - 0.04(20²) = 84 | Large positive → corner |
| Example | 2 | 6 | 12 | 8 | 12 - 0.04(8²) = 9.44 | Positive → corner-like |
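You can verify the rows of this table with a few lines of Python (the score function name is mine):

```python
def harris_score(l1, l2, alpha=0.04):
    """R = det(M) - alpha * trace(M)^2, written in terms of the eigenvalues."""
    det = l1 * l2
    trace = l1 + l2
    return det - alpha * trace ** 2

print(harris_score(0.1, 0.1))  # flat: tiny positive
print(harris_score(10, 0.1))   # edge: negative
print(harris_score(10, 10))    # corner: large positive
```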

Let's take a look at the code!

python
alpha = 0.04
# det(M) and trace(M) computed at every pixel at once
det = (G_Ix2 * G_Iy2) - np.square(G_Ixy)
trace = G_Ix2 + G_Iy2
R = det - alpha * np.square(trace)

We now apply NMS and threshold to keep the strong values.

python
# NMS (look at the prev blog to see it done from scratch)
nms_window_size = 7 # size of 7 chosen empirically
max_filter = ndimage.maximum_filter(R, size=nms_window_size)

# This threshold is the value that we need to be larger than to get a corner
threshold = 0.01 * np.max(R)
mask = (R > threshold) & (R >= max_filter) & (R > 0)
print("Threshold:", threshold)
print(mask)
output
Threshold: 2.5500000000000003
[[False False False ... False False False]
 [False False False ... False False False]
 [False False False ... False False False]
 ...
 [False False False ... False False False]
 [ True  True False ... False False  True]
 [ True  True False ... False False  True]]
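To see what that mask is doing, here is a toy 5×5 response map (values invented) with two well-separated peaks — the combination of `maximum_filter` and the threshold keeps exactly one pixel per peak:

```python
import numpy as np
from scipy import ndimage

R = np.zeros((5, 5))
R[1, 1] = 5.0   # one corner response peak
R[3, 4] = 9.0   # another, stronger peak

max_filter = ndimage.maximum_filter(R, size=3)
# Keep only pixels that equal their local maximum AND clear the threshold
mask = (R >= max_filter) & (R > 0.01 * R.max())
print(mask.astype(int))
```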

Now simply apply the mask to pull out the keypoint coordinates:

python
y_vals, x_vals = np.where(mask)
key_pts = np.column_stack((x_vals, y_vals))

Here is the final result!
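Finally, if you want to sanity-check the whole pipeline end to end without a real photo, here is a self-contained sketch on a synthetic bright square (I use SciPy's `gaussian_filter` in place of `cv2.GaussianBlur`; all sizes and sigmas here are illustrative):

```python
import numpy as np
from scipy import ndimage

# Synthetic test image: a bright square on a dark background -> 4 true corners
img = np.zeros((60, 60), dtype=np.float32)
img[20:41, 20:41] = 1.0

# 1. Smooth, then take x/y derivatives
blur = ndimage.gaussian_filter(img, sigma=1.0)
kernel = np.array([-1, 0, 1], dtype=np.float32) / 2.0
Ix = ndimage.convolve1d(blur, kernel, axis=1, mode='reflect')
Iy = ndimage.convolve1d(blur, kernel, axis=0, mode='reflect')

# 2. Gaussian-weighted window sums of the gradient products (entries of M)
G_Ix2 = ndimage.gaussian_filter(Ix * Ix, sigma=1.5)
G_Iy2 = ndimage.gaussian_filter(Iy * Iy, sigma=1.5)
G_Ixy = ndimage.gaussian_filter(Ix * Iy, sigma=1.5)

# 3. Harris response, then threshold + NMS
R = (G_Ix2 * G_Iy2 - G_Ixy ** 2) - 0.04 * (G_Ix2 + G_Iy2) ** 2
mask = (R >= ndimage.maximum_filter(R, size=7)) & (R > 0.01 * R.max())
ys, xs = np.where(mask)
key_pts = np.column_stack((xs, ys))
print(key_pts)  # one detection near each corner of the square
```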