The Convolution Rule
Multiplying two circulant matrices multiplies their eigenvalues, and since those eigenvalues are the discrete Fourier transforms of the first columns, the columns themselves combine by convolution. Read in the other direction, this is the single most useful identity in signal processing: convolution in the original domain becomes ordinary pointwise multiplication in the Fourier domain. The same pattern, restricted to a short filter, is exactly what a convolutional layer of a neural network computes.
Cyclic convolution
Section titled “Cyclic convolution”This is precisely the rule for multiplying circulants. If and are circulants with first columns and , then is again circulant, and its first column is . The reason is that the first column of any circulant is the matrix applied to the first basis vector , and , so the first column of is , whose -th entry is .
The convolution rule
Section titled “The convolution rule”Because circulants share the Fourier eigenvectors, multiplying them multiplies their eigenvalues entry by entry. Translating that statement about eigenvalues back to the first columns gives the convolution rule.
The computational payoff is immediate. A direct cyclic convolution costs multiplications. Through the rule, transform both vectors, multiply the transforms pointwise in , and transform back. With the fast Fourier transform computing each in , convolution of length- signals drops from to . Polynomial multiplication, integer multiplication, and the filtering of long signals all run on this identity.
Linear versus cyclic convolution
Section titled “Linear versus cyclic convolution”A short filter applied to a signal usually means linear convolution, the coefficient rule for multiplying the polynomials with coefficients and :
with no wraparound, producing a longer output of length . Cyclic convolution is the same sum folded modulo . The two agree, and the FFT method applies exactly, once both vectors are zero-padded to length at least so that the wraparound never reaches real data. Without enough padding the tail of the linear convolution aliases back onto the front, the error that periodic boundary conditions silently introduce.
Visualize the rule directly: edit the signal and the filter, and compare the convolution computed by sliding the filter against the convolution recovered by multiplying transforms and inverting. They coincide, frequency by frequency.
Convolution as a matrix: the link to CNNs
Section titled “Convolution as a matrix: the link to CNNs”Fixing the filter and letting the signal vary makes convolution a linear map, represented by a matrix with constant diagonals. With periodic boundaries it is the circulant whose first column is (zero-padded); on a finite signal without wraparound it is a banded Toeplitz matrix, . Either way the same filter weights repeat down every diagonal.
The convolution matrix is banded with shared weights: a width- filter places the same numbers on every row, shifted by one. A dense layer would have an independent weight in every position.
This structure is what a convolutional neural network exploits. A convolutional layer is not a full dense matrix with free parameters; it is a Toeplitz (or, with periodic boundaries, circulant) matrix built from a small filter of weights. Three consequences follow from the structure alone.
- Parameter sharing. A width- filter has parameters regardless of the input length , against for a dense layer. A network can afford many such layers and many filters per layer.
- Translation equivariance. Convolution commutes with the shift , so shifting the input shifts the output identically. A feature detector learned at one location applies at every location, the property that makes these layers effective on images.
- Locality. Each output depends only on a small window of the input, the bandwidth of the Toeplitz matrix, matching the local correlations in natural images and signals.
Two-dimensional images use a two-dimensional filter, and the corresponding matrix is doubly Toeplitz (a block-Toeplitz matrix of Toeplitz blocks), still diagonalized by the two-dimensional DFT under periodic boundaries. Stacking many such layers, interleaved with pointwise nonlinearities, is the architecture behind the convolutional networks used for image recognition. The linear algebra of one layer is entirely the convolution rule on this page.