Section 6.5 Similarity and diagonalization

Triangular matrices (and, in particular, diagonal matrices) have eigenvalues that are especially easy to compute: they are simply the diagonal entries.

If \(A\) is triangular, then the matrix \(A-\lambda I\) is also triangular and, by Theorem 4.4.2, its determinant is just the product of its diagonal entries, that is,

\begin{equation*} p(\lambda)=(a_{1,1}-\lambda)(a_{2,2}-\lambda)\cdots(a_{n,n}-\lambda). \end{equation*}

The roots of this polynomial are the diagonal elements of \(A\text{.}\)
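For instance, for the upper triangular matrix

\begin{equation*} A= \begin{bmatrix} 2 \amp 7 \amp 1\\ 0 \amp 5 \amp -3\\ 0 \amp 0 \amp -1 \end{bmatrix} \end{equation*}

the characteristic polynomial is

\begin{equation*} p(\lambda)=(2-\lambda)(5-\lambda)(-1-\lambda)\text{,} \end{equation*}

so the eigenvalues are \(2\text{,}\) \(5\text{,}\) and \(-1\text{,}\) exactly the diagonal entries.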

Sometimes it is possible to find the eigenvalues of a matrix by showing that it has the same eigenvalues as some diagonal matrix. The key concept used to do this is similarity.

Subsection 6.5.1 Similarity

Definition 6.5.2.

Let \(A\) and \(B\) be two square matrices of order \(n\text{.}\) They are similar if there exists an invertible matrix \(P\) of order \(n\) so that

\begin{equation*} B=P^{-1}AP\text{.} \end{equation*}
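For example, taking

\begin{equation*} A= \begin{bmatrix} 2 \amp 1\\ 0 \amp 3 \end{bmatrix} \quad\text{and}\quad P= \begin{bmatrix} 1 \amp 1\\ 0 \amp 1 \end{bmatrix} \quad\text{gives}\quad P^{-1}AP= \begin{bmatrix} 1 \amp -1\\ 0 \amp 1 \end{bmatrix} \begin{bmatrix} 2 \amp 1\\ 0 \amp 3 \end{bmatrix} \begin{bmatrix} 1 \amp 1\\ 0 \amp 1 \end{bmatrix} = \begin{bmatrix} 2 \amp 0\\ 0 \amp 3 \end{bmatrix}\text{,} \end{equation*}

so \(\begin{bmatrix} 2 \amp 1\\ 0 \amp 3 \end{bmatrix}\) is similar to the diagonal matrix \(\begin{bmatrix} 2 \amp 0\\ 0 \amp 3 \end{bmatrix}\text{.}\)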

The first goal is to show that similar matrices have identical eigenvalues.

Suppose \(B=P^{-1}AP\text{,}\) and suppose \(A\vec x=\lambda\vec x\) with \(\vec x\not=\vec0\text{.}\) Let \(\vec y=P^{-1}\vec x\text{.}\) Then

\begin{equation*} B\vec y = BP^{-1}\vec x = P^{-1}APP^{-1}\vec x = P^{-1}A\vec x = P^{-1}\lambda \vec x = \lambda P^{-1}\vec x = \lambda \vec y\text{.} \end{equation*}

Note that \(\vec y=\vec0\) would imply \(\vec x = PP^{-1}\vec x =P\vec y = P\vec0 = \vec0\) and, since \(\vec x\) is an eigenvector, this cannot be the case. Hence \(\vec y\not=\vec0\text{,}\) and \(\lambda\) is an eigenvalue of \(B\text{.}\) Since \(A=PBP^{-1}=(P^{-1})^{-1}BP^{-1}\text{,}\) the same argument with the roles of \(A\) and \(B\) reversed shows that every eigenvalue of \(B\) is also an eigenvalue of \(A\text{.}\) Hence similar matrices have identical eigenvalues.

If a matrix \(A\) is similar to a diagonal matrix \(D\text{,}\) then the eigenvalues of \(A\) are simply the diagonal entries of \(D\text{.}\)

Theorem 6.5.3 implies that \(A\) and \(D\) have the same eigenvalues, and Theorem 6.5.1 implies that the eigenvalues of \(D\) are its diagonal elements.

Being similar to a diagonal matrix is important enough to merit its own definition.

Definition 6.5.5.

A matrix \(A\) is diagonalizable if it is similar to a diagonal matrix.

Example 6.5.6. A diagonalizable matrix.

Let

\begin{align*} A \amp = \begin{bmatrix} 0 \amp 1 \amp 1 \amp 1 \\ 1 \amp 0 \amp 1 \amp 1 \\ 1 \amp 1 \amp 0 \amp 1 \\ 1 \amp 1 \amp 1 \amp 0 \end{bmatrix} \amp \text{and} \amp \amp P \amp = \begin{bmatrix} 1 \amp 1 \amp 1 \amp 1 \\ 1 \amp -1 \amp 0 \amp 0 \\ 1 \amp 0 \amp -1 \amp 0 \\ 1 \amp 0 \amp 0 \amp -1 \end{bmatrix} \end{align*}

Then

\begin{equation*} P^{-1}=\frac14 \begin{bmatrix} 1 \amp 1 \amp 1 \amp 1 \\ 1 \amp -3 \amp 1 \amp 1 \\ 1 \amp 1 \amp -3 \amp 1 \\ 1 \amp 1 \amp 1 \amp -3 \end{bmatrix} \end{equation*}

and

\begin{equation*} P^{-1}AP= \begin{bmatrix} 3 \amp 0 \amp 0\amp 0\\ 0 \amp -1 \amp 0\amp 0\\ 0 \amp 0 \amp -1\amp 0\\ 0 \amp 0 \amp 0\amp -1 \end{bmatrix}\text{.} \end{equation*}

Hence the eigenvalues of \(A\) are \(3\) and \(-1\text{.}\)

In Example 6.5.6 the matrix \(P\) is used to diagonalize \(A\text{.}\) Where did \(P\) come from? Under the right conditions, eigenvectors may be used to construct this matrix.
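Indeed, the columns of \(P\) in Example 6.5.6 are eigenvectors of \(A\text{:}\) for the first two columns,

\begin{align*} A\begin{bmatrix} 1\\ 1\\ 1\\ 1 \end{bmatrix} \amp = \begin{bmatrix} 3\\ 3\\ 3\\ 3 \end{bmatrix} = 3\begin{bmatrix} 1\\ 1\\ 1\\ 1 \end{bmatrix} \amp \text{and} \amp\amp A\begin{bmatrix} 1\\ -1\\ 0\\ 0 \end{bmatrix} \amp = \begin{bmatrix} -1\\ 1\\ 0\\ 0 \end{bmatrix} = -\begin{bmatrix} 1\\ -1\\ 0\\ 0 \end{bmatrix}\text{,} \end{align*}

and the last two columns are likewise eigenvectors corresponding to the eigenvalue \(-1\text{.}\)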

Suppose \(A\) has \(n\) linearly independent eigenvectors \(\vec x_1, \vec x_2,\ldots,\vec x_n\) with corresponding eigenvalues \(\lambda_1,\lambda_2,\ldots,\lambda_n\text{,}\) and let \(P= \begin{bmatrix} \vec x_1\amp \vec x_2\amp\ldots\amp \vec x_n \end{bmatrix}\text{.}\) From Theorem 5.9.12, the matrix \(P\) is nonsingular, and so \(P^{-1}\) exists. Now let \(\{\vec y_1, \vec y_2,\ldots,\vec y_n\}\) be the rows of \(P^{-1}\) so that

\begin{equation*} P^{-1}= \begin{bmatrix} \vec y_1\\ \vec y_2\\ \vdots\\ \vec y_n \end{bmatrix}\text{.} \end{equation*}

Then \(P^{-1}P=I\) is the same as

\begin{equation} \vec y_i \cdot \vec x_j= \begin{cases} 1 \amp \text{if } i=j\\ 0 \amp \text{otherwise.} \end{cases}\label{IAsDotProduct}\tag{6.5.1} \end{equation}

Now consider \(P^{-1}AP\text{.}\)

\begin{equation*} P^{-1}AP =\begin{bmatrix} \vec y_1\\ \vec y_2\\ \vdots\\ \vec y_n \end{bmatrix}A \begin{bmatrix} \vec x_1\amp \vec x_2\amp\ldots\amp \vec x_n \end{bmatrix} =\begin{bmatrix} \vec y_1\\ \vec y_2\\ \vdots\\ \vec y_n \end{bmatrix} \begin{bmatrix} \lambda_1\vec x_1\amp \lambda_2\vec x_2\amp\ldots\amp\lambda_n \vec x_n \end{bmatrix}\text{.} \end{equation*}

The \((i,j)\) entry of this product is \(\vec y_i \cdot (\lambda_j\vec x_j)=\lambda_j(\vec y_i \cdot \vec x_j)\text{,}\) so applying (6.5.1),

\begin{equation*} P^{-1}AP=\diag(\lambda_1,\lambda_2,\ldots,\lambda_n)\text{.} \end{equation*}

Proposition 6.5.7 motivates the search for linearly independent eigenvectors.

Say \(A\vec x_1=\lambda_1\vec x_1\) and \(A\vec x_2=\lambda_2\vec x_2\) with \(\lambda_1\not=\lambda_2\text{,}\) and consider the equation \(r_1\vec x_1+r_2\vec x_2=\vec0\text{.}\) Multiplying both sides of this equation by the matrix \(A\text{,}\) and multiplying both sides by \(\lambda_2\text{,}\) gives two new equations:

\begin{gather*} r_1\lambda_1\vec x_1+r_2\lambda_2\vec x_2=\vec0\\ r_1\lambda_2\vec x_1+r_2\lambda_2\vec x_2=\vec0 \end{gather*}

which implies

\begin{equation*} r_1(\lambda_1-\lambda_2)\vec x_1=\vec0 \text{,} \end{equation*}

and, since \(\lambda_1\not=\lambda_2\) and \(\vec x_1\not=\vec0\text{,}\) we have \(r_1=0\text{.}\) The original equation then reduces to \(r_2\vec x_2=\vec0\text{,}\) and since \(\vec x_2\not=\vec0\) it follows that \(r_2=0\text{.}\) Hence \(\{\vec x_1,\vec x_2\}\) is linearly independent.

Let \(\vec x_1,\vec x_2,\ldots,\vec x_t\) be eigenvectors of \(A\) corresponding to distinct eigenvalues \(\lambda_1,\lambda_2,\ldots,\lambda_t\text{,}\) and consider the sequence of sets \(\{\vec x_1\}, \{\vec x_1,\vec x_2\}, \{\vec x_1,\vec x_2,\vec x_3\},\ldots, \{\vec x_1,\vec x_2,\vec x_3,\ldots,\vec x_t\}\text{.}\) Suppose that \(\{\vec x_1,\ldots,\vec x_\ell\}\) is linearly independent, and consider \(\{\vec x_1,\ldots,\vec x_{\ell+1}\}\text{.}\) Is the new set linearly independent? To see if it is, start with the equation

\begin{equation*} r_1\vec x_1+r_2\vec x_2+\cdots+r_{\ell+1}\vec x_{\ell+1}=\vec0\text{.} \end{equation*}

Multiply both sides of this equation by \(A\text{,}\) and both sides of the equation by \(\lambda_{\ell+1}\) to get two new equations:

\begin{gather*} r_1\lambda_1\vec x_1 +r_2\lambda_2\vec x_2 +\cdots +r_{\ell+1}\lambda_{\ell+1}\vec x_{\ell+1} =\vec0\\ r_1\lambda_{\ell+1}\vec x_1 +r_2\lambda_{\ell+1}\vec x_2 +\cdots +r_{\ell+1}\lambda_{\ell+1}\vec x_{\ell+1} =\vec0\text{.} \end{gather*}

This gives

\begin{equation*} r_1(\lambda_1-\lambda_{\ell+1})\vec x_1 +r_2(\lambda_2-\lambda_{\ell+1})\vec x_2 +\cdots +r_\ell(\lambda_\ell-\lambda_{\ell+1})\vec x_\ell =\vec0\text{.} \end{equation*}

Since \(\lambda_i-\lambda_{\ell+1}\not=0\) for \(i=1,2,\ldots,\ell\text{,}\) the linear independence of \(\{\vec x_1,\ldots,\vec x_\ell\}\) implies that \(r_1=r_2=\cdots=r_\ell=0\text{.}\) Hence \(r_{\ell+1}\vec x_{\ell+1}=\vec0\) and so \(r_{\ell+1}=0\text{.}\) This makes \(\{\vec x_1,\ldots,\vec x_{\ell+1}\}\) linearly independent.

This means that if any one of the sets in our original sequence is linearly independent, then all of the following ones are also linearly independent. The first is linearly independent since \(\vec x_1\not=\vec0\text{.}\) The second is linearly independent by Lemma 6.5.8. Hence all the sets in the sequence are linearly independent, and, in particular, so is \(\{\vec x_1,\vec x_2,\ldots,\vec x_t\}\text{.}\)

Let \(\lambda_1,\ldots,\lambda_n\) be the \(n\) distinct eigenvalues of \(A\) with corresponding eigenvectors \(\vec x_1,\ldots,\vec x_n\text{.}\) From Proposition 6.5.9 the eigenvectors \(\vec x_1,\ldots,\vec x_n\) are linearly independent. Then Proposition 6.5.7 may be used to construct the matrix \(P\) that diagonalizes \(A\text{.}\)

There is an easy but useful relationship between the powers of similar matrices.

\begin{align*} B^r \amp = (P^{-1}AP)^r \\ \amp = (P^{-1}AP)(P^{-1}AP)\cdots (P^{-1}AP)(P^{-1}AP) \\ \amp = P^{-1}A(PP^{-1})A(PP^{-1})A(PP^{-1})\cdots (PP^{-1})A(PP^{-1})AP \\ \amp = P^{-1}A^rP \end{align*}
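As a concrete check of this relationship, take

\begin{equation*} A= \begin{bmatrix} 2 \amp 1\\ 0 \amp 3 \end{bmatrix}, \qquad P= \begin{bmatrix} 1 \amp 1\\ 0 \amp 1 \end{bmatrix}, \qquad B=P^{-1}AP= \begin{bmatrix} 2 \amp 0\\ 0 \amp 3 \end{bmatrix}\text{.} \end{equation*}

Then

\begin{equation*} B^3= \begin{bmatrix} 8 \amp 0\\ 0 \amp 27 \end{bmatrix} = \begin{bmatrix} 1 \amp -1\\ 0 \amp 1 \end{bmatrix} \begin{bmatrix} 8 \amp 19\\ 0 \amp 27 \end{bmatrix} \begin{bmatrix} 1 \amp 1\\ 0 \amp 1 \end{bmatrix} =P^{-1}A^3P\text{.} \end{equation*}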

This theorem has particular application to diagonalizable matrices.

Using Theorem 6.5.11, \(D^r=P^{-1}A^rP\text{,}\) from which follows \(A^r=PD^rP^{-1}=P\diag(\lambda_1^r,\lambda_2^r,\ldots,\lambda_n^r)P^{-1}\text{.}\) From Theorem 6.5.1 and Theorem 6.5.3, \(\lambda_1,\lambda_2,\ldots,\lambda_n\) are the eigenvalues of \(D\) and hence also of \(A\text{.}\)

Example 6.5.13. Using eigenvalues to compute the powers of a matrix.

Let \(A= \begin{bmatrix}1\amp2\\2\amp1\end{bmatrix} \text{.}\) Then \(\vec x_1= \begin{bmatrix}1\\1\end{bmatrix}\) and \(\vec x_2 = \begin{bmatrix}1\\-1\end{bmatrix}\) are eigenvectors with corresponding eigenvalues \(\lambda_1=3\) and \(\lambda_2=-1\text{.}\) Using Theorem 6.5.10,

\begin{equation*} P=\begin{bmatrix} 1\amp 1\\ 1\amp -1 \end{bmatrix} \end{equation*}

diagonalizes \(A\text{.}\) Indeed,

\begin{align*} P^{-1}= \frac12\begin{bmatrix}1\amp1\\1\amp-1\end{bmatrix} \amp\amp D=\begin{bmatrix} 3\amp 0\\ 0\amp -1 \end{bmatrix} \amp \amp D^n=\begin{bmatrix} 3^n\amp 0\\ 0\amp (-1)^n \end{bmatrix}\text{.} \end{align*}

By an easy computation,

\begin{align*} P D^n P^{-1} \amp = \frac12 \begin{bmatrix} 1\amp 1\\ 1\amp -1 \end{bmatrix} \begin{bmatrix} 3^n\amp 0\\ 0\amp (-1)^n \end{bmatrix} \begin{bmatrix} 1\amp 1\\ 1\amp -1 \end{bmatrix}\\ \amp =\frac12 \begin{bmatrix} 3^n+(-1)^n \amp 3^n+(-1)^{n+1}\\ 3^n+(-1)^{n+1}\amp 3^n+(-1)^n \end{bmatrix}\text{.} \end{align*}

Hence

\begin{equation*} A^n= \begin{bmatrix} \frac{3^n+(-1)^n}2 \amp \frac{3^n+(-1)^{n+1}}2\\ \frac{3^n+(-1)^{n+1}}2\amp \frac{3^n+(-1)^n}2 \end{bmatrix}\text{.} \end{equation*}

Observe that for small values of \(n\) we may easily compute the entries of \(A^n\text{:}\)

Table 6.5.14.
\(n\) \(\frac{3^n+(-1)^n}2\) \(\frac{3^n+(-1)^{n+1}}2\)
\(1\) \(1\) \(2\)
\(2\) \(5\) \(4\)
\(3\) \(13\) \(14\)
\(4\) \(41\) \(40\)
\(5\) \(121\) \(122\)

Compare this result with these earlier computations.