Section 6.5 Similarity and diagonalization
Triangular matrices (and, in particular, diagonal matrices) have eigenvalues that are particularly easy to compute: they are just the diagonal entries.
Theorem 6.5.1. Eigenvalues of a triangular matrix.
Let \(A=[a_{i,j}]\) be a triangular matrix of order \(n\text{.}\) Then the eigenvalues of \(A\) are the diagonal entries \(a_{1,1},a_{2,2},\ldots,a_{n,n}\text{.}\)
Proof.
The matrix \(A-\lambda I\) is also triangular and, by Theorem 4.4.2, its determinant is just the product of the diagonal elements, that is,
\begin{equation*}
\det(A-\lambda I)=(a_{1,1}-\lambda)(a_{2,2}-\lambda)\cdots(a_{n,n}-\lambda)\text{.}
\end{equation*}
The roots of this polynomial are the diagonal elements of \(A\text{.}\)
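As a quick numerical illustration of Theorem 6.5.1 (the matrix below is an arbitrary example), NumPy reports exactly the diagonal entries as the eigenvalues:

```python
import numpy as np

# An upper triangular matrix; by Theorem 6.5.1 its eigenvalues
# are the diagonal entries 2, -1 and 5.
A = np.array([[2.0, 7.0, 1.0],
              [0.0, -1.0, 3.0],
              [0.0, 0.0, 5.0]])

print(np.sort(np.linalg.eigvals(A)))  # [-1.  2.  5.]
```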
Sometimes it is possible to find the eigenvalues of a matrix by showing that it has the same eigenvalues as some diagonal matrix. The key concept used to do this is similarity.
Subsection 6.5.1 Similarity
Definition 6.5.2.
Let \(A\) and \(B\) be two square matrices of order \(n\text{.}\) They are similar if there exists an invertible matrix \(P\) of order \(n\) so that
\begin{equation*}
B=P^{-1}AP\text{.}
\end{equation*}
The first goal is to show that similar matrices have identical eigenvalues.
Theorem 6.5.3. Similar matrices have the same eigenvalues.
If \(A\) and \(B\) are similar, then they have the same eigenvalues.
Proof.
Suppose \(B=P^{-1}AP\text{,}\) and suppose \(A\vec x=\lambda\vec x\) with \(\vec x\not=\vec0\text{.}\) Let \(\vec y=P^{-1}\vec x\text{.}\) Then
\begin{equation*}
B\vec y=(P^{-1}AP)(P^{-1}\vec x)=P^{-1}A\vec x=P^{-1}(\lambda\vec x)=\lambda P^{-1}\vec x=\lambda\vec y\text{.}
\end{equation*}
Note that \(\vec y=\vec0\) implies \(\vec x = PP^{-1}\vec x =P\vec y = P\vec0 = \vec0\) and, since \(\vec x\) is an eigenvector, this cannot be the case. Hence \(\vec y\not=\vec0\text{,}\) and \(\lambda\) is an eigenvalue of \(B\text{.}\) Since \(A=(P^{-1})^{-1}B(P^{-1})\text{,}\) the same argument shows that every eigenvalue of \(B\) is also an eigenvalue of \(A\text{.}\)
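Theorem 6.5.3 is easy to check numerically. In the following sketch, \(P\) is an arbitrarily chosen invertible matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])            # any invertible matrix works here
B = np.linalg.inv(P) @ A @ P          # B is similar to A

print(np.sort(np.linalg.eigvals(A)))  # [-1.  3.]
print(np.sort(np.linalg.eigvals(B)))  # [-1.  3.]  (up to round-off)
```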
If a matrix \(A\) is similar to a diagonal matrix, then the eigenvalues are known.
Theorem 6.5.4.
If a matrix \(A\) is similar to a diagonal matrix \(D\text{,}\) then the eigenvalues of \(A\) are the diagonal entries of \(D\text{.}\)
Proof.
Theorem 6.5.3 implies that \(A\) and \(D\) have the same eigenvalues, and Theorem 6.5.1 implies that the eigenvalues of \(D\) are its diagonal elements.
Similarity to a diagonal matrix is important enough to merit its own definition.
Definition 6.5.5.
A matrix \(A\) is diagonalizable if it is similar to a diagonal matrix.
Example 6.5.6. A diagonalizable matrix.
Let
\begin{equation*}
A=\begin{bmatrix}1\amp2\\2\amp1\end{bmatrix} \quad\text{and}\quad P=\begin{bmatrix}1\amp1\\1\amp-1\end{bmatrix}\text{.}
\end{equation*}
Then
\begin{equation*}
P^{-1}=\frac12\begin{bmatrix}1\amp1\\1\amp-1\end{bmatrix}
\end{equation*}
and
\begin{equation*}
P^{-1}AP=\frac12\begin{bmatrix}1\amp1\\1\amp-1\end{bmatrix}\begin{bmatrix}1\amp2\\2\amp1\end{bmatrix}\begin{bmatrix}1\amp1\\1\amp-1\end{bmatrix}=\begin{bmatrix}3\amp0\\0\amp-1\end{bmatrix}\text{.}
\end{equation*}
Hence the eigenvalues of \(A\) are \(3\) and \(-1\text{.}\)
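A short NumPy computation confirms the example:

```python
import numpy as np

# The matrices of Example 6.5.6.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
P = np.array([[1.0, 1.0],
              [1.0, -1.0]])

D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))  # [[ 3.  0.]
                        #  [ 0. -1.]]
```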
In Example 6.5.6 the matrix \(P\) is used to diagonalize \(A\text{.}\) Where did it come from? Under the right conditions, eigenvectors may be used to construct such a matrix.
Proposition 6.5.7. Constructing \(P\) using eigenvectors.
Suppose \(A\) is a square matrix of order \(n\text{,}\) and \(\{\vec x_1, \vec x_2,\ldots,\vec x_n\}\) is a linearly independent set of eigenvectors; say \(A\vec x_i=\lambda_i \vec x_i\) for \(i=1,2,\ldots,n\text{.}\) Let
\begin{equation*}
P=\begin{bmatrix}\vec x_1\amp\vec x_2\amp\cdots\amp\vec x_n\end{bmatrix}\text{,}
\end{equation*}
that is, the eigenvectors are the columns of \(P\text{.}\) Then
\begin{equation*}
P^{-1}AP=\diag(\lambda_1,\lambda_2,\ldots,\lambda_n)\text{.}
\end{equation*}
Proof.
From Theorem 5.9.12, the matrix \(P\) is nonsingular, and so \(P^{-1}\) exists. Now let \(\{\vec y_1, \vec y_2,\ldots,\vec y_n\}\) be the rows of \(P^{-1}\) so that
\begin{equation*}
P^{-1}=\begin{bmatrix}\vec y_1\\\vec y_2\\\vdots\\\vec y_n\end{bmatrix}\text{.}
\end{equation*}
Then \(P^{-1}P=I\) is the same as
\begin{equation}
\vec y_i\vec x_j=\begin{cases}1\amp\text{if } i=j\\0\amp\text{if } i\not=j\end{cases}\tag{6.5.1}
\end{equation}
Now consider \(P^{-1}AP\text{.}\) Since \(A\vec x_j=\lambda_j\vec x_j\text{,}\)
\begin{equation*}
AP=\begin{bmatrix}A\vec x_1\amp A\vec x_2\amp\cdots\amp A\vec x_n\end{bmatrix}=\begin{bmatrix}\lambda_1\vec x_1\amp\lambda_2\vec x_2\amp\cdots\amp\lambda_n\vec x_n\end{bmatrix}\text{.}
\end{equation*}
Applying (6.5.1), the \((i,j)\) entry of \(P^{-1}AP\) is \(\vec y_i(\lambda_j\vec x_j)=\lambda_j(\vec y_i\vec x_j)\text{,}\) which equals \(\lambda_i\) when \(i=j\) and \(0\) otherwise. Hence \(P^{-1}AP=\diag(\lambda_1,\lambda_2,\ldots,\lambda_n)\text{.}\)
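Proposition 6.5.7 mirrors what numerical libraries provide: NumPy's `eig`, for instance, returns a matrix whose columns are eigenvectors, and that matrix plays the role of \(P\) whenever its columns are linearly independent. The matrix \(A\) below is an arbitrary example with distinct eigenvalues:

```python
import numpy as np

# An arbitrary matrix with distinct eigenvalues 5 and 2.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# np.linalg.eig returns the eigenvalues together with a matrix
# whose columns are corresponding eigenvectors -- the P of
# Proposition 6.5.7 whenever those columns are independent.
eigenvalues, P = np.linalg.eig(A)

print(eigenvalues)                             # e.g. [5. 2.]
print(np.round(np.linalg.inv(P) @ A @ P, 10))  # diagonal, eigenvalues in eig's order
```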
Proposition 6.5.7 motivates the search for linearly independent eigenvectors.
Lemma 6.5.8. Two eigenvectors from different eigenvalues are linearly independent.
Let \(\vec x_1\) and \(\vec x_2\) be two eigenvectors of \(A\) corresponding to different eigenvalues. Then \(\{\vec x_1,\vec x_2\}\) is linearly independent.
Proof.
Say \(A\vec x_1=\lambda_1\vec x_1\) and \(A\vec x_2=\lambda_2\vec x_2\text{,}\) and consider the equation \(r_1\vec x_1+r_2\vec x_2=\vec0\text{.}\) Multiplying both sides of the equation by the matrix \(A\text{,}\) and multiplying both sides by \(\lambda_2\text{,}\) gives two new equations:
\begin{align*}
r_1\lambda_1\vec x_1+r_2\lambda_2\vec x_2\amp=\vec0\\
r_1\lambda_2\vec x_1+r_2\lambda_2\vec x_2\amp=\vec0
\end{align*}
Subtracting the second equation from the first gives
\begin{equation*}
r_1(\lambda_1-\lambda_2)\vec x_1=\vec0
\end{equation*}
and, since \(\lambda_1\not=\lambda_2\) and \(\vec x_1\not=\vec0\text{,}\) we have \(r_1=0\text{.}\) Similarly, \(r_2=0\) and so \(\{\vec x_1,\vec x_2\}\) is linearly independent.
Proposition 6.5.9. Eigenvectors with distinct eigenvalues form a linearly independent set.
Let \(\vec x_1,\vec x_2,\ldots,\vec x_t\) be eigenvectors of \(A\) corresponding to distinct eigenvalues \(\lambda_1,\lambda_2,\ldots,\lambda_t\text{.}\) Then the set \(\{\vec x_1,\vec x_2,\ldots,\vec x_t\}\) is linearly independent.
Proof.
Consider the sequence of sets \(\{\vec x_1\}, \{\vec x_1,\vec x_2\}, \{\vec x_1,\vec x_2,\vec x_3\},\ldots, \{\vec x_1,\vec x_2,\vec x_3,\ldots,\vec x_t\}\text{.}\) Suppose that \(\{\vec x_1,\ldots,\vec x_\ell\}\) is linearly independent, and consider \(\{\vec x_1,\ldots,\vec x_{\ell+1}\}\text{.}\) Is the new set linearly independent? To see if it is, start with the equation
\begin{equation*}
r_1\vec x_1+r_2\vec x_2+\cdots+r_{\ell+1}\vec x_{\ell+1}=\vec0\text{.}
\end{equation*}
Multiply both sides of this equation by \(A\text{,}\) and both sides of the equation by \(\lambda_{\ell+1}\text{,}\) to get two new equations:
\begin{align*}
r_1\lambda_1\vec x_1+r_2\lambda_2\vec x_2+\cdots+r_{\ell+1}\lambda_{\ell+1}\vec x_{\ell+1}\amp=\vec0\\
r_1\lambda_{\ell+1}\vec x_1+r_2\lambda_{\ell+1}\vec x_2+\cdots+r_{\ell+1}\lambda_{\ell+1}\vec x_{\ell+1}\amp=\vec0
\end{align*}
Subtracting the second equation from the first gives
\begin{equation*}
r_1(\lambda_1-\lambda_{\ell+1})\vec x_1+r_2(\lambda_2-\lambda_{\ell+1})\vec x_2+\cdots+r_\ell(\lambda_\ell-\lambda_{\ell+1})\vec x_\ell=\vec0\text{.}
\end{equation*}
Since \(\lambda_i-\lambda_{\ell+1}\not=0\) for \(i=1,2,\ldots,\ell\text{,}\) the linear independence of \(\{\vec x_1,\ldots,\vec x_\ell\}\) implies that \(r_1=r_2=\cdots=r_\ell=0\text{.}\) Hence \(r_{\ell+1}\vec x_{\ell+1}=\vec0\) and so \(r_{\ell+1}=0\text{.}\) This makes \(\{\vec x_1,\ldots,\vec x_{\ell+1}\}\) linearly independent.
This means that if any one of the sets in our original sequence is linearly independent, then all of the following ones are also linearly independent. The first is linearly independent since \(\vec x_1\not=\vec0\text{.}\) The second is linearly independent by Lemma 6.5.8. Hence all the sets in the sequence are linearly independent, and, in particular, so is \(\{\vec x_1,\vec x_2,\ldots,\vec x_t\}\text{.}\)
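As a numerical sanity check of Proposition 6.5.9 (the matrix below is an arbitrary example with distinct eigenvalues), the matrix whose columns are the eigenvectors should have full rank:

```python
import numpy as np

# A lower triangular matrix with distinct eigenvalues 2, 3 and 6.
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, 5.0, 6.0]])

eigenvalues, X = np.linalg.eig(A)  # columns of X are eigenvectors

# Distinct eigenvalues, so Proposition 6.5.9 predicts the columns
# of X are linearly independent, i.e. X has full rank.
print(np.linalg.matrix_rank(X))    # 3
```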
Theorem 6.5.10. A matrix of order \(n\) with \(n\) distinct eigenvalues is diagonalizable.
Suppose \(A\) is a matrix of order \(n\) with \(n\) distinct eigenvalues. Then \(A\) is diagonalizable.
Proof.
Let \(\lambda_1,\ldots,\lambda_n\) be the \(n\) distinct eigenvalues with corresponding eigenvectors \(\vec x_1,\ldots,\vec x_n\text{.}\) From Proposition 6.5.9 the eigenvectors \(\vec x_1,\ldots,\vec x_n\) are linearly independent. Then Proposition 6.5.7 may be used to construct the matrix \(P\) that diagonalizes \(A\text{.}\)
There is an easy but useful relationship between the powers of similar matrices.
Theorem 6.5.11. Powers of similar matrices.
If \(B=P^{-1}AP \) then, for all \(r\geq1\text{,}\) \(B^r=P^{-1}A^rP \text{.}\)
Proof.
Since each interior \(PP^{-1}\) cancels,
\begin{equation*}
B^r=(P^{-1}AP)(P^{-1}AP)\cdots(P^{-1}AP)=P^{-1}A(PP^{-1})A(PP^{-1})\cdots(PP^{-1})AP=P^{-1}A^rP\text{.}
\end{equation*}
This theorem has particular application to diagonalizable matrices.
Corollary 6.5.12. Powers of diagonalizable matrices.
Suppose \(A\) is diagonalizable, that is, \(P^{-1}AP=D=\diag(\lambda_1,\lambda_2,\ldots,\lambda_n)\) for some invertible matrix \(P\text{.}\) Then \(A^r=P\diag(\lambda_1^r,\lambda_2^r,\ldots,\lambda_n^r)P^{-1}\text{,}\) where \(\lambda_1,\lambda_2,\ldots,\lambda_n\) are the eigenvalues of \(A\text{.}\)
Proof.
Using Theorem 6.5.11, \(D^r=P^{-1}A^rP\text{,}\) from which it follows that \(A^r=PD^rP^{-1}=P\diag(\lambda_1^r,\lambda_2^r,\ldots,\lambda_n^r)P^{-1}\text{.}\) From Theorem 6.5.1 and Theorem 6.5.3, \(\lambda_1,\lambda_2,\ldots,\lambda_n\) are the eigenvalues of \(D\) and hence also of \(A\text{.}\)
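Corollary 6.5.12 is easy to compare against a direct computation of the power; the matrix here is an arbitrary symmetric example:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
eigenvalues, P = np.linalg.eig(A)

r = 8
# A^r = P diag(lambda_1^r, ..., lambda_n^r) P^{-1} per Corollary 6.5.12.
Ar = P @ np.diag(eigenvalues**r) @ np.linalg.inv(P)

print(np.round(Ar))                  # [[3281. 3280.]
                                     #  [3280. 3281.]]
print(np.linalg.matrix_power(A, r))  # the same, computed directly
```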
Example 6.5.13. Using eigenvalues to compute the powers of a matrix.
Let \(A= \begin{bmatrix}1\amp2\\2\amp1\end{bmatrix} \text{.}\) Then \(\vec x_1= \begin{bmatrix}1\\1\end{bmatrix}\) and \(\vec x_2 = \begin{bmatrix}1\\-1\end{bmatrix}\) are eigenvectors with corresponding eigenvalues \(\lambda_1=3\) and \(\lambda_2=-1\text{.}\) Using Theorem 6.5.10,
\begin{equation*}
P=\begin{bmatrix}1\amp1\\1\amp-1\end{bmatrix}
\end{equation*}
diagonalizes \(A\text{.}\) Indeed,
\begin{equation*}
P^{-1}AP=\frac12\begin{bmatrix}1\amp1\\1\amp-1\end{bmatrix}\begin{bmatrix}1\amp2\\2\amp1\end{bmatrix}\begin{bmatrix}1\amp1\\1\amp-1\end{bmatrix}=\begin{bmatrix}3\amp0\\0\amp-1\end{bmatrix}=D\text{.}
\end{equation*}
By an easy computation,
\begin{equation*}
D^n=\begin{bmatrix}3^n\amp0\\0\amp(-1)^n\end{bmatrix}
\end{equation*}
and so
\begin{equation*}
A^n=PD^nP^{-1}=\frac12\begin{bmatrix}1\amp1\\1\amp-1\end{bmatrix}\begin{bmatrix}3^n\amp0\\0\amp(-1)^n\end{bmatrix}\begin{bmatrix}1\amp1\\1\amp-1\end{bmatrix}=\frac12\begin{bmatrix}3^n+(-1)^n\amp3^n+(-1)^{n+1}\\3^n+(-1)^{n+1}\amp3^n+(-1)^n\end{bmatrix}\text{.}
\end{equation*}
Observe that for small values of \(n\) we may easily compute the entries of \(A^n\text{:}\)
\(n\) | \(\frac{3^n+(-1)^n}2\) | \(\frac{3^n+(-1)^{n+1}}2\) |
\(1\) | \(1\) | \(2\) |
\(2\) | \(5\) | \(4\) |
\(3\) | \(13\) | \(14\) |
\(4\) | \(41\) | \(40\) |
\(5\) | \(121\) | \(122\) |
Compare this result with the earlier computations of the powers of \(A\text{.}\)
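The table entries can also be verified programmatically; this sketch compares \(A^n\text{,}\) computed by repeated multiplication, with the closed form above:

```python
import numpy as np

A = np.array([[1, 2],
              [2, 1]])

# Compare A^n (computed directly) with the closed form from
# Example 6.5.13 for the diagonal and off-diagonal entries.
for n in range(1, 6):
    An = np.linalg.matrix_power(A, n)
    diag = (3**n + (-1)**n) // 2
    off = (3**n + (-1)**(n + 1)) // 2
    print(n, An[0, 0] == diag, An[0, 1] == off)  # n True True
```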