\( \newcommand{\norm}[1]{\left\lVert#1\right\rVert} \newcommand{\dim}{\operatorname{dim}} \newcommand{\rank}{\operatorname{rank}} \newcommand{\def}{\operatorname{def}} \newcommand{\ran}{\operatorname{ran}} \)


Singular Value Decomposition

Definition.

The singular value decomposition (SVD) of a matrix \(A \in \mathbb{K}^{m \times n}\), \(\mathbb{K} \in \{\mathbb{R}, \mathbb{C}\}\), is a factorization \[A = U \Sigma V^*,\] where \(U\) is an \(m \times m\) unitary matrix, \(V\) is an \(n \times n\) unitary matrix and \(\Sigma\) is an \(m \times n\) diagonal matrix with nonnegative diagonal entries, called the singular values \(\sigma_i\).
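As a quick numerical illustration of the definition, the following NumPy sketch factors an arbitrary real matrix and checks the three properties above (the matrix and its shape are arbitrary choices):

```python
import numpy as np

# An arbitrary real 4 x 3 matrix (K = R here).
A = np.arange(12, dtype=float).reshape(4, 3)

U, s, Vh = np.linalg.svd(A)        # s holds the singular values sigma_i >= 0
Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, s)         # embed them on the diagonal of a 4 x 3 matrix

assert np.allclose(U @ U.T, np.eye(4))            # U is unitary (orthogonal over R)
assert np.allclose(Vh @ Vh.conj().T, np.eye(3))   # V is unitary
assert np.allclose(U @ Sigma @ Vh, A)             # A = U Sigma V^*
```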

Theorem.

For any matrix \(A \in \mathbb{K}^{m \times n}\) there exists an SVD.

Finite Dimensional Spectral Theorem.

If \(M \in \mathbb{K}^{n \times n}\) is symmetric (Hermitian), there exists an orthonormal basis consisting of eigenvectors of \(M\), and every eigenvalue of \(M\) is real.

Proof.

By the fundamental theorem of algebra, factor the characteristic polynomial \[p(\lambda) = \det(M - \lambda I) = (\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n)\] with eigenvalues \(\lambda_i \in \mathbb{C}\) and corresponding normalized eigenvectors \(v_i\). As we have \[\lambda_i = \lambda_i \norm{v_i}_2^2 = v_i^* \lambda_i v_i = v_i^* M v_i = v_i^* M^* v_i = \overline{\lambda_i} \norm{v_i}_2^2 = \overline{\lambda_i},\] all eigenvalues are real.

For \(\lambda_i \neq \lambda_j\) it holds that \begin{align*} v_i^* M v_j &= \lambda_j v_i^* v_j, \\ v_i^* M^* v_j &= \lambda_i v_i^* v_j. \end{align*} Since \(M = M^*\), the two left-hand sides coincide, so \((\lambda_i - \lambda_j)\, v_i^* v_j = 0\) and hence \(v_i^* v_j = 0\). This shows that eigenvectors of different eigenvalues are orthogonal.

By Gram-Schmidt we can find an orthonormal basis for each eigenspace \(\ker (M - \lambda_i I)\). Combining all vectors gives us the desired orthonormal basis.

\(\Box\)
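As a quick numerical check of the spectral theorem, here is a small NumPy sketch with an arbitrary Hermitian matrix (the matrix itself is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
M = B + B.conj().T                       # M is Hermitian by construction

eigvals, Q = np.linalg.eigh(M)           # eigh is the Hermitian eigensolver
assert np.allclose(eigvals.imag, 0)      # all eigenvalues are real
assert np.allclose(Q.conj().T @ Q, np.eye(4))             # eigenvectors are orthonormal
assert np.allclose(Q @ np.diag(eigvals) @ Q.conj().T, M)  # M is diagonalized by Q
```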

Proof (of the existence theorem).

Let \(M = A^*A\), which is symmetric (Hermitian) and positive semidefinite, since \(x^* M x = \norm{Ax}_2^2 \ge 0\); in particular all its eigenvalues are nonnegative. By the spectral theorem let \(V \in \mathbb{K}^{n \times n}\) be the unitary matrix with the orthonormal eigenvector basis as columns.

We have \[V^* M V = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix},\] where \(D \in \mathbb{R}^{l \times l}\) is diagonal and contains the non-zero (hence positive) eigenvalues of \(M\) in decreasing order. It holds that \(l \le \min(m,n)\), since \(l = \rank(M) = \rank(A^*A) = \rank(A) \le \min(m,n)\).

Split \(V = (V_1 \enspace V_2)\) into the eigenvectors corresponding to the non-zero eigenvalues and those corresponding to the eigenvalue \(0\).

We have \begin{align*} \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} V_1^* M V_1 & V_1^* M V_2 \\ V_2^* M V_1 & V_2^* M V_2 \end{pmatrix}, \end{align*} which implies \(V_1^* A^* A V_1 = D\) and \(V_2^* A^* A V_2 = 0\); the latter means every column of \(A V_2\) has norm zero, so \(A V_2 = 0\). As \(V\) is unitary we have \begin{align*} V^*_1 V_1 &= I_l \\ V_2^* V_2 &= I_{n-l} \\ V_1 V_1^* + V_2 V_2^* & = I_n. \end{align*}

Define \(U_1 = AV_1 D^{-1/2} \in \mathbb{K}^{m \times l} \). Its columns are orthonormal, since \(U_1^* U_1 = D^{-1/2} V_1^* A^* A V_1 D^{-1/2} = D^{-1/2} D D^{-1/2} = I_l\). Then \begin{align*} U_1 D^{1/2} V_1^* &= A V_1 D^{-1/2} D^{1/2} V_1^* \\ &= A(I - V_2 V_2^*) \\ & = A - (A V_2) V_2^* = A. \end{align*} Extend \(U_1\) to an orthonormal basis (Gram-Schmidt) \(U = (U_1 \enspace U_2)\) and let \begin{align*} \Sigma = \begin{pmatrix} D^{1/2} & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{K}^{m \times n}. \end{align*} Now we have \begin{align*} U \Sigma V^* &= \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} D^{1/2} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V_1 & V_2 \end{pmatrix}^* \\ &= \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} D^{1/2} V_1^* \\ 0 \end{pmatrix} \\ &= U_1 D^{1/2} V_1^* \\& = A. \end{align*}

\(\Box\)

As the proof of the existence theorem is constructive, it gives us an algorithm to compute an SVD:

1. Find the eigenvectors \(V\) and eigenvalues \(\Lambda\) of \(A^*A\), sorted in decreasing order of the eigenvalues.

2. Initialise \(\Sigma \in \mathbb{R}^{m \times n}\) with the square roots of the \(\min(m,n)\) largest eigenvalues on the diagonal.

3. Solve \(U \Sigma = AV \) for \(U\) (e.g. \(\Sigma^* U^* = (AV)^*\) for \(U^*\)); columns of \(U\) corresponding to zero singular values are unconstrained and must be completed to an orthonormal basis.
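The following is a minimal NumPy sketch of this procedure; the function name, the tolerance used to decide which eigenvalues count as zero, and the QR-based completion of \(U\) are illustrative choices, not part of the algorithm above:

```python
import numpy as np

def svd_via_eig(A):
    """Compute an SVD A = U @ Sigma @ V^* via the eigendecomposition of A^*A."""
    m, n = A.shape
    # Step 1: eigenpairs of the Hermitian matrix A^*A, sorted by decreasing eigenvalue.
    eigvals, V = np.linalg.eigh(A.conj().T @ A)
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]
    # Step 2: the singular values are the square roots of the (nonnegative) eigenvalues.
    sigma = np.sqrt(np.clip(eigvals[: min(m, n)], 0.0, None))
    Sigma = np.zeros((m, n))
    np.fill_diagonal(Sigma, sigma)
    # Step 3: U_1 = A V_1 D^{-1/2} for the non-zero singular values ...
    r = int(np.sum(sigma > 1e-12 * sigma[0])) if sigma.size else 0
    U1 = A @ V[:, :r] / sigma[:r]
    # ... then complete U_1 to an orthonormal basis of K^m (Gram-Schmidt via QR).
    Q, _ = np.linalg.qr(np.hstack([U1, np.eye(m)]))
    U = np.hstack([U1, Q[:, r:m]])
    return U, Sigma, V

# Sanity check on a random real matrix.
A = np.random.default_rng(0).standard_normal((5, 3))
U, Sigma, V = svd_via_eig(A)
assert np.allclose(U @ Sigma @ V.conj().T, A)
```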

Eckart-Young-Mirsky theorem.

Let \[A_k = \sum_{i=1}^k \sigma_i u_i v_i^* \] be the rank-\(k\) approximation of \(A \in \mathbb{K}^{m \times n}\), where \(u_i\) and \(v_i\) are the columns of \(U\) and \(V\) and \(\sigma_1 \ge \sigma_2 \ge \dots\) are the singular values.
We have \[\norm{A - A_k}_2 = \sigma_{k+1},\] and for every \(B \in \mathbb{K}^{m \times n}\) of rank \(k\), \[\norm{A - B}_2 \ge \norm{A - A_k}_2.\]

Claim.

\[\norm{A}_2 = \sigma_\max (A).\]

Proof.

\begin{align*} \norm{A}_2 &= \max_{x \neq 0} \frac{\norm{Ax}_2}{\norm{x}_2} \\ &= \sqrt{ \max_{x \neq 0} \frac{x^*A^*Ax}{x^*x} } \\ &= \sqrt{ \max_{x \neq 0} R_{A^*A}(x) } = \sqrt{ \lambda_\max(A^*A)} = \sigma_\max (A), \end{align*} where \(R_{A^*A}\) is the Rayleigh quotient of \(A^*A\).

\(\Box\)

Therefore \begin{align*} \norm{A - A_k}_2 = \sigma_\max \left( \sum_{i = k+1}^{\min(m,n)} \sigma_i u_i v_i^* \right) = \sigma_{k+1}. \end{align*}

Further, note that \(\def(B) = n - \rank(B) = n - k\). Let \(V_{k+1}\) be the matrix with the first \(k+1\) columns \(v_1, \dots, v_{k+1}\) of \(V\) as columns. Then \begin{align*} &\dim(\ker(B) \cap \ran(V_{k+1})) = \\ &\dim(\ker(B)) + \dim(\ran(V_{k+1})) - \dim(\ker(B) + \ran(V_{k+1})) = \\ &\def(B) + \rank(V_{k+1}) - \dim(\ker(B) + \ran(V_{k+1})) = \\ &n - k + k + 1 - \dim(\ker(B) + \ran(V_{k+1})) \ge 1, \end{align*} since \(\dim(\ker(B) + \ran(V_{k+1})) \le n\). Therefore there exists an \(x \in \ker(B) \cap \ran(V_{k+1})\) with \(\norm{x}_2 = 1\), and \begin{align*} \norm{A - B}_2^2 &\ge \norm{(A - B)x}_2^2 \\ &= \norm{Ax}_2^2 \\ & = \sum_{i=1}^{k+1} \sigma_i ^2 |v_i^*x|^2 \\ &\ge \sigma_{k+1}^2 \sum_{i=1}^{k+1} |v_i^*x|^2 = \sigma_{k+1}^2 = \norm{A - A_k}_2^2. \end{align*} Here we used that \(x \in \ran(V_{k+1})\), so \(v_i^* x = 0\) for \(i > k+1\) and \(\sum_{i=1}^{k+1} |v_i^*x|^2 = \norm{x}_2^2 = 1\).

\(\Box\)
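A small numerical check of the claim and the theorem (a NumPy sketch; the example matrix, the value of \(k\) and the competing rank-\(k\) matrix \(B\) are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))
k = 2

U, s, Vh = np.linalg.svd(A)
# The spectral norm equals the largest singular value.
assert np.isclose(np.linalg.norm(A, 2), s[0])

# Rank-k truncation A_k = sum_{i<=k} sigma_i u_i v_i^*.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

# ||A - A_k||_2 equals sigma_{k+1} (s[k], since s is zero-indexed).
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])

# Any other rank-k matrix B is at least as far from A in the spectral norm.
B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))
assert np.linalg.norm(A - B, 2) >= np.linalg.norm(A - A_k, 2)
```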

Image Compression with SVD

As a normal image consists of three real matrices (one for each RGB channel), we can calculate the best low-rank approximation with the SVD for each channel separately. For a given rank \(k\) and dimensions \(m \times n\), the number of entries we need to store for the truncated SVD is \[3(m \cdot k + k + k \cdot n).\] We do not have to store \(U\) and \(V\) fully, since the remaining columns are multiplied only by zero entries of \(\Sigma\) and thus have no effect on the product.
Thus we obtain a compression rate of \[\frac{(m + n + 1) }{m \cdot n}k,\] which is linear in \(k\). How the compression looks for different \(k\) can be seen at the top of the page.
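Here is a short NumPy sketch of this per-channel compression (the image is a random placeholder array; in practice it would be loaded from a file and scaled to floats):

```python
import numpy as np

def compress_channel(channel, k):
    """Best rank-k approximation of one color channel (Eckart-Young-Mirsky)."""
    U, s, Vh = np.linalg.svd(channel, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

def compress_rgb(image, k):
    """Apply the rank-k approximation to each of the three RGB channels."""
    return np.stack([compress_channel(image[..., c], k) for c in range(3)], axis=-1)

m, n, k = 480, 640, 50
image = np.random.default_rng(0).random((m, n, 3))   # placeholder "image"
approx = compress_rgb(image, k)

# Stored entries per channel: m*k (U_1) + k (singular values) + k*n (V_1).
stored = 3 * (m * k + k + k * n)
rate = stored / (3 * m * n)          # equals k * (m + n + 1) / (m * n)
print(f"compression rate: {rate:.2%}")
```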
