Computer Vision Interactive Learning

The Anatomy of Convolution

Convolution is the fundamental operation in image filtering. It slides a small matrix, called a kernel, over an image. At each position, each kernel value is multiplied by the image pixel beneath it, and the products are summed to produce a single pixel of the output image.

Input Image

I = \begin{bmatrix} 0 & 255 & 0 \\ 255 & 0 & 255 \\ 0 & 255 & 0 \end{bmatrix}

Kernel

H = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}

Feature Map (first value): 0

Fig. 1.1 — Visualization of the sliding kernel operator
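As a worked check of the figure, the single multiply-and-sum step can be written out by hand. This assumes the nine Input Image values form a $3 \times 3$ patch read row by row, and that the kernel is laid directly over it without flipping; both are assumptions about the figure's layout rather than something the page states.

(-1)(0) + (0)(255) + (1)(0) + (-2)(255) + (0)(0) + (2)(255) + (-1)(0) + (0)(255) + (1)(0) = -510 + 510 = 0

The result agrees with the 0 shown in the Feature Map panel. For this particular patch, flipping the kernel first would also give 0, because the patch is left-right symmetric.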

For a kernel $H$ of size $(2k+1) \times (2k+1)$, indexed from $-k$ to $k$ in each direction so that $H[0, 0]$ is its center, and an image $I$, the output at pixel $(x, y)$ is:

(I * H)[x, y] = \sum_{i=-k}^k \sum_{j=-k}^k I[x-i, y-j] \cdot H[i, j]

Eq. 1.01
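Eq. 1.01 can be evaluated directly with four nested loops. The sketch below is a minimal illustration in plain Python lists, not an optimized implementation. The function name `convolve2d`, the choice to treat pixels outside the image as 0, and the convention that the first index is the row are all assumptions made for this example.

```python
def convolve2d(image, kernel):
    """Evaluate Eq. 1.01 at every pixel, treating out-of-bounds pixels as 0."""
    k = len(kernel) // 2                 # kernel is (2k+1) x (2k+1)
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            total = 0
            for i in range(-k, k + 1):
                for j in range(-k, k + 1):
                    xi, yj = x - i, y - j        # I[x-i, y-j] from Eq. 1.01
                    if 0 <= xi < rows and 0 <= yj < cols:
                        # H is indexed from -k..k, so offset by k for list access
                        total += image[xi][yj] * kernel[i + k][j + k]
            out[x][y] = total
    return out


# Values taken from the Input Image and Kernel panels of Fig. 1.1,
# assumed here to be 3x3 grids read row by row.
image = [[0, 255, 0],
         [255, 0, 255],
         [0, 255, 0]]
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

feature_map = convolve2d(image, sobel_x)
print(feature_map[1][1])   # 0: the center pixel, where the kernel fully overlaps
```

The minus signs in `x - i` and `y - j` are what make this a true convolution; replacing them with plus signs would give correlation instead.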

Interactive Laboratory

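Different kernels produce different effects. A box blur replaces each pixel with the average of its neighborhood, which smooths the image, while an edge detector such as the Sobel operator responds strongly wherever intensity changes sharply. Observe the transformation below.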


Plate 1.A — Output Comparison View
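The contrast between the two presets can be reproduced on a tiny synthetic image. The sketch below is an illustration only: the helper `apply_kernel`, the 5 by 6 test image with a vertical edge, and the choice to compute only positions where the kernel fits entirely inside the image are assumptions, not details of the interactive tool.

```python
def apply_kernel(image, kernel):
    """Slide the kernel over the image (no flip), keeping only full overlaps."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(image), len(image[0])
    out = []
    for x in range(rows - kh + 1):
        row = []
        for y in range(cols - kw + 1):
            row.append(sum(image[x + i][y + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out


box_blur = [[1 / 9] * 3 for _ in range(3)]   # every neighbor weighted equally
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]                       # horizontal gradient

# Hypothetical test image: dark left half, bright right half (a vertical edge).
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]

blurred = apply_kernel(image, box_blur)
edges = apply_kernel(image, sobel_x)

print([round(v) for v in blurred[0]])  # [0, 85, 170, 255]: the edge becomes a ramp
print(edges[0])                        # [0, 1020, 1020, 0]: response only at the edge
```

The Sobel response exceeds 255, so a viewer would normally clip or rescale it before display.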

Correlation vs. Convolution

In deep learning, the operation called “convolution” is usually cross-correlation in the mathematical sense. The only difference is that true convolution flips the kernel horizontally and vertically (a $180^\circ$ rotation) before sliding it over the image. For kernels that are unchanged by this rotation, such as a box blur, the two operations give identical results.

Reference Operator

K = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}

Notice the directional shift disparity: with this kernel, correlation and convolution shift the image in opposite directions.
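The opposite shifts can be checked with the reference operator $K$ and a single bright pixel. This is a minimal sketch under stated assumptions: the function names, the 5 by 5 test image, zero padding at the borders, and the convention that the first index is the row (increasing downward) are all chosen for illustration.

```python
def correlate(image, kernel):
    """out[x][y] = sum of image[x+i][y+j] * kernel[i][j], zero outside the image."""
    k = len(kernel) // 2
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            for i in range(-k, k + 1):
                for j in range(-k, k + 1):
                    xi, yj = x + i, y + j
                    if 0 <= xi < rows and 0 <= yj < cols:
                        out[x][y] += image[xi][yj] * kernel[i + k][j + k]
    return out


def convolve(image, kernel):
    """True convolution: rotate the kernel by 180 degrees, then correlate."""
    flipped = [row[::-1] for row in kernel[::-1]]
    return correlate(image, flipped)


def bright_pixel(img):
    """Return the (row, column) of the first nonzero pixel."""
    return [(x, y) for x, row in enumerate(img) for y, v in enumerate(row) if v][0]


K = [[0, 0, 0],
     [0, 0, 0],
     [0, 0, 1]]

# Hypothetical image: one bright pixel at the center of a 5x5 grid.
image = [[0] * 5 for _ in range(5)]
image[2][2] = 255

print(bright_pixel(correlate(image, K)))  # (1, 1): moved up and to the left
print(bright_pixel(convolve(image, K)))   # (3, 3): moved down and to the right
```

Correlation reads the pixel at $(x+1, y+1)$, so content moves toward the origin, whereas convolution reads $(x-1, y-1)$ as in Eq. 1.01, so content moves away from it.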
