Eigenthingies and Diagonalizability

Linear Algebra
Author

Quasar

Published

July 23, 2024

Each square matrix possesses a collection of one or more complex scalars, called eigenvalues, and associated vectors, called eigenvectors. A matrix is a concrete realization of a linear transformation on a vector space. The eigenvectors indicate the directions of pure stretch, and the eigenvalues the extent of stretching.
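To make the stretching picture concrete, here is a minimal numerical sketch (assuming NumPy; the matrix below is just an illustrative choice):

```python
import numpy as np

# Illustrative symmetric 2x2 matrix: it stretches the direction (1, 1)
# by a factor of 4 and the direction (1, -1) by a factor of 1.
A = np.array([[2.5, 1.5],
              [1.5, 2.5]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                      # approximately [4., 1.] (order may vary)

for lam, v in zip(eigenvalues, eigenvectors.T):
    # A acts on each eigenvector as a pure stretch by its eigenvalue.
    print(np.allclose(A @ v, lam * v))  # True
```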

Eigenvalues and Eigenvectors

Definition 1 (Eigenvalue and Eigenvector) Let \(A\) be an \(n \times n\) matrix. A scalar \(\lambda\) is called an eigenvalue of \(A\) if there exists a non-zero vector \(\mathbf{v} \neq \mathbf{0}\), called an eigenvector, such that

\[ A\mathbf{v} = \lambda \mathbf{v} \tag{1}\]

In geometric terms, the matrix \(A\) has the effect of stretching the eigenvector \(\mathbf{v}\) by an amount specified by the eigenvalue \(\lambda\).

The eigenvalue equation (Equation 1) is a system of linear equations for the entries of the eigenvector \(\mathbf{v}\), provided that the eigenvalue \(\lambda\) is specified in advance. But Gaussian elimination per se cannot solve the problem of determining the two unknowns \(\lambda\) and \(\mathbf{v}\) simultaneously. We can rewrite the equation in the form:

\[ (A- \lambda I)\mathbf{v} = \mathbf{0} \tag{2}\]

This is a homogeneous system of linear equations. It always has the trivial solution \(\mathbf{v}=\mathbf{0}\), but we are specifically seeking a non-zero solution. A homogeneous system \(R\mathbf{x}=\mathbf{0}\) has a non-trivial solution if and only if \(R\) is singular, that is, \(rank(R) < n\), or equivalently \(det(R) = 0\). Consequently, we require

\[ det(A-\lambda I) = 0 \tag{3}\]

This is called the characteristic equation and \(p(\lambda) = det(A-\lambda I)\) is called the characteristic polynomial.

In practice, one first solves the characteristic equation (Equation 3) to obtain the eigenvalues. Then, for each eigenvalue, we use standard linear algebra methods, e.g. Gaussian elimination, to solve the corresponding linear system Equation 2 for the associated eigenvector \(\mathbf{v}\).
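This two-step recipe can be mimicked numerically. The following is a minimal sketch assuming NumPy, with an illustrative matrix; in practice one would call `np.linalg.eig` directly, which uses a more robust algorithm than polynomial root-finding:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Step 1: solve the characteristic equation det(A - lambda*I) = 0.
# np.poly returns the coefficients of the characteristic polynomial of A.
char_poly = np.poly(A)
eigenvalues = np.roots(char_poly)       # here: 5 and 2

# Step 2: for each eigenvalue, solve (A - lambda*I)v = 0 for a non-zero v
# by extracting a null-space vector of A - lambda*I (via the SVD).
for lam in eigenvalues:
    M = A - lam * np.eye(2)
    _, _, Vt = np.linalg.svd(M)
    v = Vt[-1]                          # right singular vector for the ~zero singular value
    print(lam, np.allclose(A @ v, lam * v))   # True for each eigenvalue
```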

EMHE (Every Matrix Has an Eigenvalue)

Theorem 1 Every complex square matrix has at least one eigenvalue, and a corresponding eigenvector.

Proof.

This is essentially a consequence of the FTA (Fundamental Theorem of Algebra), but it is still worth stating as a theorem.

Let \(A \in \mathbb{C}^{n \times n}\) and let the scalar field be \(\mathbb{F}= \mathbb{C}\).

Let \(\mathbf{v}\) be any non-zero vector in \(\mathbb{C}^n\). Consider the list \(\{\mathbf{v},A\mathbf{v},\ldots,A^n \mathbf{v}\}\). These are \(n+1\) vectors in the \(n\)-dimensional space \(\mathbb{C}^n\), so they must form a linearly dependent set. There exist scalars \(a_0, \ldots, a_n\), not all zero, such that:

\[ a_n A^n \mathbf{v} + a_{n-1}A^{n-1}\mathbf{v} + \ldots + a_1 A \mathbf{v} + a_0 I \mathbf{v} = \mathbf{0} \]

The coefficients \(a_0,\ldots,a_n\) were chosen for this particular \(\mathbf{v}\), so all we may conclude is that the polynomial in \(A\), applied to this fixed \(\mathbf{v}\), yields the zero vector. Let \(m\) be the largest index with \(a_m \neq 0\). Note that \(m \geq 1\): if \(a_1 = \ldots = a_n = 0\), then \(a_0\mathbf{v} = \mathbf{0}\) with \(\mathbf{v} \neq \mathbf{0}\) would force \(a_0 = 0\) as well, contradicting the fact that the coefficients are not all zero.

By the FTA, the polynomial with complex coefficients of degree \(m\):

\[ p(x) = a_0 + a_1 x + a_2 x^2 + \ldots + a_{m}x^m \]

can be factorized as:

\[ p(x) = a_m(x - \lambda_1)(x - \lambda_2)\cdots(x - \lambda_m) \]

Putting it all together,

\[ \begin{align*} p(A)\mathbf{v} &= a_m(A - \lambda_1 I)(A - \lambda_2 I)\cdots (A - \lambda_m I)\mathbf{v} = \mathbf{0} \end{align*} \]

for our fixed non-zero vector \(\mathbf{v}\).

So, the composition of the factors \((A-\lambda_1 I)\cdots (A - \lambda_m I)\) has a non-trivial null space:

\[ ker((A-\lambda_1 I)(A-\lambda_2 I)\cdots (A - \lambda_m I)) \neq \{\mathbf{0}\} \]

So, at least one of the factors must fail to be injective: there exist \(\lambda_i\) and a non-zero vector \(\mathbf{w}\) such that \((A-\lambda_i I)\mathbf{w}=\mathbf{0}\). Thus, \(A\) has at least one eigenvalue and a corresponding eigenvector. \(\blacksquare\)
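A rotation of the plane is the classic example of a real matrix with no real eigenvalues, yet Theorem 1 still applies over \(\mathbb{C}\). A quick numerical check (a minimal sketch, assuming NumPy):

```python
import numpy as np

# A 90-degree rotation of the plane has no real eigenvalues:
# no direction is mapped onto a real scalar multiple of itself.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])

eigenvalues, eigenvectors = np.linalg.eig(R)
print(eigenvalues)          # [0.+1.j, 0.-1.j] -- complex, as Theorem 1 predicts

v = eigenvectors[:, 0]
print(np.allclose(R @ v, eigenvalues[0] * v))   # True
```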

Eigenvectors as a basis of a vector space

Lemma 1 If \(\lambda_1, \lambda_2, \ldots, \lambda_n\) are \(n\) distinct eigenvalues of a matrix \(A\), that is, \(\lambda_i \neq \lambda_j\) for all \(i \neq j\), then the corresponding eigenvectors \(\{\mathbf{v}_1,\ldots,\mathbf{v}_n\}\) are linearly independent.

Proof.

We use induction on the number \(k\) of eigenvalues. The base case \(k=1\) is immediate, since an eigenvector cannot be zero. Assume the result is valid for any \((k-1)\) distinct eigenvalues. Our claim is that \(\{\mathbf{v}_1,\ldots,\mathbf{v}_{k-1},\mathbf{v}_k\}\) are linearly independent.

Suppose we have a vanishing linear combination:

\[ c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \ldots + c_{k} \mathbf{v}_k = \mathbf{0} \tag{4}\]

Let us multiply this equation by the matrix \(A\):

\[ \begin{align*} c_1 A\mathbf{v}_1 + c_2 A\mathbf{v}_2 + \ldots + c_{k} A\mathbf{v}_k &= \mathbf{0}\\ \Longrightarrow c_1 \lambda_1 \mathbf{v}_1 + c_2 \lambda_2 \mathbf{v}_2 + \ldots + c_k \lambda_k \mathbf{v}_k &= \mathbf{0} \end{align*} \]

On the other hand if we multiply the original Equation 4 by \(\lambda_k\), we have:

\[ c_1 \lambda_k \mathbf{v}_1 + c_2 \lambda_k \mathbf{v}_2 + \ldots + c_{k} \lambda_k \mathbf{v}_k = \mathbf{0} \]

Upon subtracting this from the previous equation, we obtain:

\[ c_1 (\lambda_1 - \lambda_k) \mathbf{v}_1 + c_2 (\lambda_2 - \lambda_k)\mathbf{v}_2 + \ldots + c_{k-1} (\lambda_{k-1} - \lambda_k)\mathbf{v}_{k-1} = \mathbf{0} \]

This is a vanishing linear combination of the first \((k-1)\) eigenvectors, and so, by our induction hypothesis, it can only happen if all the coefficients are zero:

\[ c_1(\lambda_1 - \lambda_k) = c_2(\lambda_2 - \lambda_k) = \ldots = c_{k-1}(\lambda_{k-1} - \lambda_k) = 0 \]

The eigenvalues were assumed to be distinct, and consequently \(c_1 = c_2 = \ldots = c_{k-1} = 0\). Substituting these values back into Equation 4, we find that \(c_k \mathbf{v}_k = 0\), and so \(c_k = 0\) also, since \(\mathbf{v}_k \neq \mathbf{0}\). Thus, we have proved that, if Equation 4 holds, then \(c_1 = \ldots = c_k = 0\). Thus, \(\{\mathbf{v}_1,\ldots,\mathbf{v}_k\}\) is a linearly independent set. \(\blacksquare\)

Theorem 2 If the \(n \times n\) real matrix \(A\) has \(n\) distinct real eigenvalues \(\lambda_1,\lambda_2,\ldots,\lambda_n\), then the corresponding real eigenvectors \(\mathbf{v}_1,\ldots,\mathbf{v}_n\) form a basis of \(\mathbb{R}^n\). If \(A\) (which may be either a real or a complex matrix) has \(n\) distinct complex eigenvalues, then the corresponding eigenvectors \(\mathbf{v}_1,\ldots,\mathbf{v}_n\) form a basis of \(\mathbb{C}^n\).
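Theorem 2 is easy to check numerically: when the eigenvalues are distinct, the matrix whose columns are the eigenvectors has full rank, so those columns form a basis. A minimal sketch, assuming NumPy and an illustrative triangular matrix:

```python
import numpy as np

# Upper triangular, so the eigenvalues 2, 3, 5 sit on the diagonal and are distinct.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])

eigenvalues, V = np.linalg.eig(A)        # columns of V are the eigenvectors

# Distinct eigenvalues => linearly independent eigenvectors,
# i.e. V has full rank and its columns form a basis of R^3.
print(np.linalg.matrix_rank(V) == 3)     # True
```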

Diagonalization

Consider a square matrix \(A \in \mathbb{R}^{n \times n}\) with \(n\) distinct eigenvalues \(\lambda_1,\ldots,\lambda_n\) and corresponding eigenvectors \(\mathbf{v}_1,\ldots,\mathbf{v}_n\). Collecting the \(n\) eigenvalue equations \(A\mathbf{v}_i = \lambda_i \mathbf{v}_i\) column by column, we can write:

\[ A\begin{bmatrix}\mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n\end{bmatrix} = \begin{bmatrix}\mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n\end{bmatrix}\begin{bmatrix} \lambda_1 \\ & \lambda_2 \\ & & \ddots \\ & & & \lambda_n \end{bmatrix} \]

Define \(P\) as the matrix whose columns are the eigenvectors, \(P = (\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n)\). By Theorem 2, the eigenvectors are linearly independent, so \(P\) is invertible, and we can write:

\[ \begin{align*} AP &= P\Lambda\\ A & = P\Lambda P^{-1} \end{align*} \]

where \(\Lambda = diag(\lambda_1,\ldots,\lambda_n)\) is a diagonal matrix. Consequently, if the matrix \(A\) has \(n\) distinct eigenvalues, then \(A\) is diagonalizable.

Definition 2 A square matrix \(A\) is said to be diagonalizable if and only if there exists a non-singular matrix \(P\) such that \(A\) admits the factorization:

\[ A = P\Lambda P^{-1} \]

where \(\Lambda=diag(\lambda_1,\ldots,\lambda_n)\).
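The factorization \(A = P\Lambda P^{-1}\) can be verified numerically, and it also makes matrix powers cheap, since \(A^k = P\Lambda^k P^{-1}\). A minimal sketch, assuming NumPy (the matrix is an illustrative choice with distinct eigenvalues \(5\) and \(2\)):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, P = np.linalg.eig(A)        # columns of P are eigenvectors
Lambda = np.diag(eigenvalues)

# Distinct eigenvalues (5 and 2) => P is invertible and A = P Lambda P^{-1}.
print(np.allclose(A, P @ Lambda @ np.linalg.inv(P)))     # True

# Diagonalization makes powers cheap: A^k = P Lambda^k P^{-1}.
k = 5
print(np.allclose(np.linalg.matrix_power(A, k),
                  P @ np.diag(eigenvalues**k) @ np.linalg.inv(P)))   # True
```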

Gershgorin Circle Theorem

In practice, computing the eigenvalues of a matrix precisely is done using a numerical algorithm. In certain theoretical applications, we may not require the numerical values, but only their approximate locations. The Gershgorin circle theorem, due to the early 20th century Russian mathematician Semyon Gershgorin, serves to restrict the eigenvalues to a well-defined region in the complex plane.

Definition 3 Let \(A \in \mathbb{C}^{n \times n}\) be a square matrix. For each \(1 \leq i \leq n\), define the \(i\)-th Gershgorin disk

\[ D_i = \{z\in\mathbb{C} : |z - a_{ii}| \leq r_i\}, \quad r_i = \sum_{j \neq i} |a_{ij}| \tag{5}\]

The Gershgorin domain \(D_A = \bigcup_{i=1}^n D_i \subset \mathbb{C}\) is the union of the Gershgorin disks.

Thus, the \(i\)-th Gershgorin disk \(D_i\) is a closed disk centered at the \(i\)-th diagonal entry of \(A\), with radius \(r_i\) equal to the sum of the absolute values of the off-diagonal entries in its \(i\)-th row.

Theorem 3 (Gershgorin Circle Theorem) Every eigenvalue of \(A\) lies in its Gershgorin domain: if \(\lambda\) is an eigenvalue of \(A\), then \(\lambda \in D_A\).

Proof.

Let \(\mathbf{v}\) be an eigenvector of \(A\) with eigenvalue \(\lambda\). Let \(\mathbf{u}=\mathbf{v}/||\mathbf{v}||_{\infty}\) be the corresponding unit eigenvector with respect to the \(\infty\)-norm, so that:

\[ ||\mathbf{u}||_{\infty} = \max\{|u_1|,|u_2|,\ldots,|u_n|\} = 1 \]

Let \(u_i\) be an entry of \(\mathbf{u}\) that achieves the maximum: \(|u_i|=1\). Writing out the \(i\)-th component of the eigenvalue equation \(A\mathbf{u}=\lambda \mathbf{u}\), we obtain:

\[ \begin{align*} \sum_{j=1}^{n} a_{ij}u_j &= \lambda u_i \\ \sum_{j \neq i} a_{ij}u_j &= (\lambda - a_{ii}) u_i \end{align*} \]

Therefore, since all \(|u_j| \leq 1\), while \(|u_i|=1\), the distance between \(\lambda\) and \(a_{ii}\) can be bounded from above as:

\[ \begin{align*} |\lambda - a_{ii}| &= \Bigg|\sum_{j \neq i} a_{ij}u_j \Bigg|\\ &\leq \sum_{j \neq i} |a_{ij}||u_j| & \{\text{ Triangle Inequality }\}\\ &\leq \sum_{j \neq i} |a_{ij}| & \{ |u_j| \leq 1 \}\\ &= r_i \end{align*} \]

This immediately implies that \(\lambda \in D_i \subset D_A\); that is, \(\lambda\) belongs to the \(i\)-th Gershgorin disk, and hence to the Gershgorin domain. \(\blacksquare\)
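The theorem is straightforward to check numerically: compute the centers and radii of the Gershgorin disks from the rows of \(A\), and verify that every eigenvalue lands in at least one disk. A minimal sketch, assuming NumPy and an illustrative \(3 \times 3\) matrix:

```python
import numpy as np

A = np.array([[ 5.0,  1.0, -0.5],
              [ 0.3, -2.0,  0.4],
              [ 1.0,  0.2,  8.0]])

centers = np.diag(A)                                   # disk centers a_ii
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)    # off-diagonal row sums r_i

# Every eigenvalue must lie in at least one Gershgorin disk.
for lam in np.linalg.eigvals(A):
    in_some_disk = np.any(np.abs(lam - centers) <= radii)
    print(lam, in_some_disk)                           # True for each eigenvalue
```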