
Section 7.4 Matrix multiplication and linear transformations

If we are given an \(n\times m\) matrix \(A\text{,}\) we may define \(L(\vec x)=A\vec x\) and, as we saw in Subsection 7.3.1, \(L\colon\R^m\to\R^n\) is then a linear transformation. We denote this transformation by \(L_A\text{.}\)

Now suppose we have two linear transformations \(L_1\colon\R^m\to\R^n\) and \(L_2\colon\R^n\to\R^s\text{.}\) We then define a new linear transformation called the composition of \(L_1\) and \(L_2\) (and denoted \(L_2\circ L_1\)) in the following way: \(L_2\circ L_1\colon\R^m\to\R^s\) and

\begin{equation*} (L_2\circ L_1)(\vec x)=L_2(L_1(\vec x)) \end{equation*}

This means that we start with a vector \(\vec x\) in \(\R^m\) and compute \(L_1(\vec x)\text{.}\) This vector is in \(\R^n\text{,}\) which is exactly right if we want to evaluate \(L_2\text{.}\) Hence \(L_2(L_1(\vec x))\) not only makes sense, but upon evaluation we have a vector in \(\R^s\text{.}\)

We can visualize the composition of two functions as:

Figure 7.4.1. The composition of two linear transformations \(L_1\) and \(L_2\)

The vector \(\vec x\) is in \(\R^m\text{;}\) following the arrow labelled with \(L_1\) gets us to \(L_1(\vec x)\) in \(\R^n\text{;}\) following the next arrow labelled with \(L_2\) gets us to \(L_2(L_1(\vec x))\) in \(\R^s\text{.}\) The long arrow corresponds to \(L_2\circ L_1\text{:}\) it goes directly from \(\vec x\) to the same vector in \(\R^s\text{.}\)

Next, we wish to see that this composition is itself linear.

Let \(L_3=L_2\circ L_1\text{.}\) For any \(\vec x\) in \(\R^m\text{,}\) we have \(L_3(\vec x)=(L_2\circ L_1)(\vec x)=L_2(L_1(\vec x))\text{,}\) and so we have \(L_3\colon\R^m\to\R^s\text{.}\)

\begin{align*} L_3(\vec x + \vec y) \amp =L_2(L_1(\vec x+ \vec y))\\ \amp = L_2(L_1(\vec x) + L_1(\vec y)) \amp \gets \text{ since }L_1 \text{ is linear}\\ \amp = L_2(L_1(\vec x)) + L_2(L_1(\vec y)) \amp \gets \text{ since }L_2\text{ is linear}\\ \amp = L_3(\vec x) + L_3(\vec y)\text{.} \end{align*}
\begin{align*} L_3(r\vec x) \amp =L_2(L_1(r\vec x))\\ \amp =L_2(rL_1(\vec x)) \amp \gets\text{ since }L_1\text{ is linear}\\ \amp =rL_2(L_1(\vec x)) \amp \gets\text{ since }L_2\text{ is linear}\\ \amp =rL_3(\vec x)\text{.} \end{align*}

Hence \(L_3=L_2\circ L_1\) is linear.
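The linearity of a composition can also be spot-checked numerically. In the following sketch, the maps \(L_1\) and \(L_2\) are arbitrary linear maps of our own choosing (not from the text); the check tests additivity and homogeneity of \(L_3=L_2\circ L_1\) at sample vectors:

```python
# A numeric spot-check that a composition of linear maps is linear.
# The maps L1 and L2 below are arbitrary linear maps of our own choosing.
def L1(v):                 # L1 : R^2 -> R^2
    x, y = v
    return (x + 2*y, 3*y)

def L2(v):                 # L2 : R^2 -> R^2
    x, y = v
    return (x - y, 2*x)

def L3(v):                 # the composition L2 ∘ L1
    return L2(L1(v))

x, y, r = (1, 4), (-2, 5), 7
# additivity: L3(x + y) = L3(x) + L3(y)
assert L3((x[0] + y[0], x[1] + y[1])) == tuple(a + b for a, b in zip(L3(x), L3(y)))
# homogeneity: L3(r x) = r L3(x)
assert L3((r*x[0], r*x[1])) == tuple(r*a for a in L3(x))
```

Of course, a few sample points do not constitute a proof; the argument above is what guarantees linearity for all inputs.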

Example 7.4.3.

Consider the linear transformations \(T_1\colon \R^3\to \R^2\) and \(T_2\colon \R^2\to \R^3\) given by

\begin{equation*} T_1((x,y,z))=(x+y,y+z)\\ T_2((x,y))=(x+y,2x+y,x+2y)\text{.} \end{equation*}

First we observe that

\begin{equation*} T_1\circ T_2: \R^2\to \R^2\\ T_2\circ T_1: \R^3\to \R^3\text{.} \end{equation*}

Then the computation:

\begin{align*} (T_1\circ T_2)(x,y)\amp =T_1(T_2((x,y)))\\ \amp=T_1((x+y,2x+y,x+2y))\\ \amp=(3x+2y,3x+3y) \end{align*}
\begin{align*} (T_2\circ T_1)(x,y,z)\amp=T_2(T_1((x,y,z)))\\ \amp=T_2((x+y,y+z))\\ \amp=(x+2y+z, 2x+3y+z, x+3y+2z)\text{.} \end{align*}
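The two compositions computed above can be sanity-checked numerically. This is an illustrative sketch (the function names are ours):

```python
# Sanity check of Example 7.4.3 (illustrative; the function names are ours).
def T1(v):           # T1 : R^3 -> R^2
    x, y, z = v
    return (x + y, y + z)

def T2(v):           # T2 : R^2 -> R^3
    x, y = v
    return (x + y, 2*x + y, x + 2*y)

x, y, z = 2, 5, -1
# (T1 ∘ T2)(x, y) = (3x + 2y, 3x + 3y)
assert T1(T2((x, y))) == (3*x + 2*y, 3*x + 3*y)
# (T2 ∘ T1)(x, y, z) = (x + 2y + z, 2x + 3y + z, x + 3y + 2z)
assert T2(T1((x, y, z))) == (x + 2*y + z, 2*x + 3*y + z, x + 3*y + 2*z)
```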

Next we note that composition of linear transformations and matrix multiplication are closely related:

Theorem 7.4.4.

Let \(A\) be an \(n\times m\) matrix and let \(B\) be an \(s\times n\) matrix. Then \(L_B\circ L_A=L_{BA}\text{.}\)

First note that \(L_A\colon\R^m\to\R^n\) and \(L_B\colon\R^n\to\R^s\text{,}\) and so \(L_B\circ L_A\colon\R^m\to\R^s\) is properly defined. In addition, note that \(BA\) is an \(s\times m\) matrix, and so \(L_{BA}\colon\R^m\to\R^s\) also makes sense. Finally note that for any \(\vec x\) in \(\R^m\text{,}\) we have

\begin{equation*} (L_B\circ L_A)(\vec x)=L_B(L_A(\vec x))=L_B(A\vec x) =BA\vec x=L_{BA}(\vec x)\text{.} \end{equation*}

Hence \(L_{BA}=L_B\circ L_A\text{.}\)
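The identity \(L_{BA}=L_B\circ L_A\) can be checked on concrete matrices. In this sketch the matrices \(A\) and \(B\) and the helper functions are our own choices:

```python
# Checking L_{BA} = L_B ∘ L_A on concrete matrices (A, B are our own choices).
def matvec(M, v):
    return tuple(sum(row[j] * v[j] for j in range(len(v))) for row in M)

def matmul(B, A):
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

A = [[1, 2, 0],
     [0, 1, 3]]          # 2x3, so L_A : R^3 -> R^2
B = [[1, 1],
     [2, 0],
     [0, 5]]             # 3x2, so L_B : R^2 -> R^3

x = (1, -2, 4)
# applying B after A agrees with applying BA in one step
assert matvec(matmul(B, A), x) == matvec(B, matvec(A, x))
```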

Example 7.4.5.

We continue with Example 7.4.3:

\begin{equation*} T_1((x,y,z))=(x+y,y+z)\\ T_2((x,y))=(x+y,2x+y,x+2y)\text{.} \end{equation*}

Then

\begin{equation*} A_{T_1} = \begin{bmatrix}T_1(\vec e_1)\amp T_1(\vec e_2)\amp T_1(\vec e_3)\end{bmatrix} = \begin{bmatrix}1\amp1\amp0\\0\amp1\amp1\end{bmatrix}\\ A_{T_2} = \begin{bmatrix} T_2(\vec e_1)\amp T_2(\vec e_2)\end{bmatrix} = \begin{bmatrix}1\amp1\\ 2\amp1\\ 1\amp2\end{bmatrix}\text{.} \end{equation*}

From Theorem 7.4.4,

\begin{equation*} A_{T_1\circ T_2} = A_{T_1}A_{T_2}= \begin{bmatrix}1\amp1\amp0\\0\amp1\amp1\end{bmatrix} \begin{bmatrix}1\amp1\\ 2\amp1\\ 1\amp2\end{bmatrix} = \begin{bmatrix} 3\amp2\\ 3\amp3 \end{bmatrix} \\ A_{T_2\circ T_1} = A_{T_2}A_{T_1} = \begin{bmatrix}1\amp1\\ 2\amp1\\ 1\amp2\end{bmatrix} \begin{bmatrix}1\amp1\amp0\\0\amp1\amp1\end{bmatrix} = \begin{bmatrix} 1\amp2\amp1\\ 2\amp3\amp1\\1\amp3\amp2 \end{bmatrix}\text{.} \end{equation*}
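The two matrix products above can be verified in a few lines of code (an illustrative check; the helper name is ours):

```python
# Verifying the two matrix products of Example 7.4.5 (helper name is ours).
def matmul(B, A):
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

A_T1 = [[1, 1, 0],
        [0, 1, 1]]
A_T2 = [[1, 1],
        [2, 1],
        [1, 2]]

assert matmul(A_T1, A_T2) == [[3, 2], [3, 3]]
assert matmul(A_T2, A_T1) == [[1, 2, 1], [2, 3, 1], [1, 3, 2]]
```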

It then follows that

\begin{equation*} (T_1\circ T_2)(x,y) = A_{T_1}A_{T_2} \begin{bmatrix} x\\y \end{bmatrix} =\begin{bmatrix} 3\amp2\\ 3\amp3 \end{bmatrix} \begin{bmatrix} x\\y \end{bmatrix} = \begin{bmatrix} 3x+2y\\3x+3y \end{bmatrix}\\ (T_2\circ T_1)(x,y,z) = A_{T_2}A_{T_1} \begin{bmatrix} x\\y\\z \end{bmatrix} = \begin{bmatrix} 1\amp2\amp1\\ 2\amp3\amp1\\1\amp3\amp2 \end{bmatrix} \begin{bmatrix} x\\y\\z \end{bmatrix} = \begin{bmatrix} x+2y+z\\2x+3y+z\\x+3y+2z\end{bmatrix} \end{equation*}

and so

\begin{equation*} (T_1\circ T_2)(x,y)=(3x+2y,3x+3y)\\ (T_2\circ T_1)(x,y,z)=(x+2y+z,2x+3y+z,x+3y+2z) \text{.} \end{equation*}

Compare this result with Example 7.4.3.

This example shows the power and importance of Theorem 7.4.4. The more difficult problem of computing the composition of two linear transformations is reduced to the much easier one of multiplying their respective matrix representations.

Subsection 7.4.1 Linear Operators

A linear operator is a linear transformation of the form

\begin{equation*} L\colon \R^n\to\R^n\text{,} \end{equation*}

that is, a linear transformation with \(m=n\text{.}\) In this case the matrix \(A\) that represents \(L\) will be square. Notice that the composition of two linear operators is also a linear operator.

If \(L\) is a linear operator, then the composition \(L\circ L\) is defined, and it is denoted \(L^2\text{.}\) Similarly, \(L^3=L \circ L \circ L\text{,}\) etc.

Let \(L\) be a linear operator with matrix representation \(A\text{.}\) Then

  • The matrix representation of \(L^2\) is \(A^2\text{.}\)
  • The matrix representation of \(L^k\) is \(A^k\) for \(k=1,2,\ldots\text{.}\)
Solution

This is an easy application of Theorem 7.4.4.
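As a small numeric sketch of the claim for \(k=2\) (the matrix \(A\) and the helper functions here are our own examples):

```python
# Checking that applying L twice agrees with multiplying by A^2 (A is ours).
def matvec(M, v):
    return tuple(sum(row[j] * v[j] for j in range(len(v))) for row in M)

def matmul(B, A):
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

A = [[0, 1],
     [1, 1]]
x = (3, -2)
# L(L(x)) should equal (A^2) x
assert matvec(matmul(A, A), x) == matvec(A, matvec(A, x))
```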

All of the examples in List 7.3.1 are linear operators. The following examples consider their compositions.

Example 7.4.7. Composition of linear operators.
  • Let \(L_1\) be a rotation around the origin counterclockwise by an angle \(\phi\) and \(L_2\) be a rotation by an angle \(\theta\text{.}\) This means
    \begin{equation*} A_{L_1}= \begin{bmatrix} \cos \phi \amp -\sin\phi\\ \sin\phi \amp \cos\phi \end{bmatrix}\\ A_{L_2}= \begin{bmatrix} \cos \theta \amp -\sin\theta\\ \sin\theta \amp \cos\theta \end{bmatrix} \text{.} \end{equation*}
    Then \(L_1\circ L_2\) is a rotation by \(\theta\) followed by a rotation by an angle \(\phi\text{;}\) together it is a rotation through an angle of \(\theta+\phi\text{,}\) and so
    \begin{equation*} A_{L_1\circ L_2}= \begin{bmatrix} \cos(\theta+\phi) \amp -\sin(\theta+\phi)\\ \sin(\theta+\phi) \amp \cos(\theta+\phi) \end{bmatrix} \text{.} \end{equation*}
    It then follows from Theorem 7.4.4 that
    \begin{equation*} \begin{bmatrix} \cos(\theta+\phi) \amp -\sin(\theta+\phi)\\ \sin(\theta+\phi) \amp \cos(\theta+\phi) \end{bmatrix} = \begin{bmatrix} \cos \phi \amp -\sin\phi\\ \sin\phi \amp \cos\phi \end{bmatrix} \begin{bmatrix} \cos \theta \amp -\sin\theta\\ \sin\theta \amp \cos\theta \end{bmatrix} \text{.} \end{equation*}
    Computing the matrix product and equating the corresponding entries in the first column gives
    \begin{equation*} \cos(\theta+\phi) = \cos\theta \cos\phi-\sin\theta\sin\phi\\ \sin(\theta+\phi) = \cos\theta\sin\phi+\sin\theta\cos\phi\text{.} \end{equation*}
    This might be the world's shortest proof of the sum formulas for the sine and cosine functions.
  • Let \(L\) be a reflection by the line \(y=x\text{,}\) and consider \(L^2\text{.}\) Then
    \begin{equation*} A_{L^2} =(A_L)^2 = \left(\begin{bmatrix} 0\amp1\\1\amp0 \end{bmatrix}\right)^2 =I\text{,} \end{equation*}
    and so \(L^2\) is the identity transformation. This is not surprising: the mirror image of a mirror image is just the original image. This can also be written as \(A=A^{-1}\text{.}\) A matrix satisfying this equation is called an involution.
  • Let \(L\) be the projection onto the line \(y=x\text{,}\) and consider \(L^2\text{.}\) Then
    \begin{equation*} A_{L^2} =(A_L)^2 = \left(\frac12\begin{bmatrix} 1\amp1\\1\amp1 \end{bmatrix}\right)^2 =\frac14 \begin{bmatrix} 2\amp2\\2\amp2 \end{bmatrix} =\frac12 \begin{bmatrix} 1\amp1\\1\amp1 \end{bmatrix} =A_L\text{.} \end{equation*}
    A matrix \(A\) satisfying \(A^2=A\) is called idempotent.
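The three items above can be spot-checked numerically. In this sketch the helper functions and test angles are our own choices:

```python
import math

# Spot-checks for the three parts of Example 7.4.7 (helpers and angles ours).
def matmul(B, A):
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def rot(t):                                # counterclockwise rotation by t
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

theta, phi = 0.7, 1.9
# rotation by theta followed by rotation by phi is rotation by theta + phi
P, R = matmul(rot(phi), rot(theta)), rot(theta + phi)
assert all(math.isclose(P[i][j], R[i][j]) for i in range(2) for j in range(2))

F = [[0, 1], [1, 0]]                       # reflection in y = x: an involution
assert matmul(F, F) == [[1, 0], [0, 1]]

P2 = [[0.5, 0.5], [0.5, 0.5]]              # projection onto y = x: idempotent
assert matmul(P2, P2) == P2
```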

Prove that a matrix \(A\) is an involution if and only if \(\frac12(A+I) \) is idempotent.

Solution

Suppose \(\frac12(A+I)\) is idempotent. Then

\begin{equation*} \frac12(A+I)=\left(\frac12(A+I)\right)^2=\frac14(A^2+2A+I)\\ 2(A+I)=A^2+2A+I\\ A^2=I \end{equation*}

and so \(A\) is an involution.

Suppose \(A\) is an involution. Then

\begin{equation*} \left(\frac12(A+I)\right)^2=\frac14(A^2+2A+I) =\frac14(I+2A+I) =\frac12(A+I) \end{equation*}

and so \(\frac12(A+I)\) is idempotent.
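The equivalence can be illustrated on a concrete involution (a sketch; the matrix and helper name are our choices):

```python
# Illustrating the equivalence on a concrete involution (the matrix is ours).
def matmul(B, A):
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

A = [[0, 1], [1, 0]]                       # A^2 = I, so A is an involution
assert matmul(A, A) == [[1, 0], [0, 1]]

# M = (A + I)/2 should then be idempotent
M = [[(A[i][j] + (1 if i == j else 0)) / 2 for j in range(2)] for i in range(2)]
assert matmul(M, M) == M
```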

Example 7.4.9.

Now we take another look at Figure 5.10.12 and Example 5.10.10 in which we wish to see that the graph of \(x^2-xy+y^2=1\) is an ellipse.

Figure 7.4.10. Graph of \(x^2-xy+y^2=1\)

The new strategy is to rotate the graph clockwise through an angle of \(\frac\pi4\) and then verify that the new graph satisfies \(\frac{x^2}{a^2}+\frac{y^2}{b^2}=1\) for appropriate choice of \(a\) and \(b\text{.}\)

First we determine \(a\) and \(b\text{.}\) Since \((1,1)\) rotates into \((\sqrt2,0)\text{,}\) we must have \(a^2=2\text{.}\) Similarly, since \(\left(\frac1{\sqrt3},-\frac1{\sqrt3}\right)\) is on the graph and rotates into \(\left(0,-\sqrt{\frac23}\right)\text{,}\) we must have \(b^2=\frac23\text{.}\) This means that \(\frac{x^2}{a^2}+\frac{y^2}{b^2}=1\) may be rewritten as \(x^2+3y^2=2\text{.}\)

Now suppose that \((x,y)\) is on the original graph, that is, \(x^2-xy+y^2=1\text{,}\) and suppose that \((x,y)\) rotates to \((u,v)\text{.}\) The matrix representation of the rotation is \(\frac1{\sqrt2} \left[\begin{smallmatrix} 1\amp1\\-1\amp1 \end{smallmatrix}\right] \text{,}\) and so

\begin{equation*} \begin{bmatrix} u\\v \end{bmatrix} = \frac1{\sqrt2} \begin{bmatrix} 1\amp1\\-1\amp1 \end{bmatrix} \begin{bmatrix} x\\y \end{bmatrix} = \frac1{\sqrt2} \begin{bmatrix}x+y\\-x+y\end{bmatrix}\text{.} \end{equation*}

This implies

\begin{equation*} u^2+3v^2 =\frac12(x+y)^2+\frac32(-x+y)^2 =2(x^2-xy+y^2)=2\text{,} \end{equation*}

and so \((u,v)\) is on the ellipse with the equation \(\frac{x^2}{a^2}+\frac{y^2}{b^2}=1\text{.}\)
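A numeric spot-check of this computation (a sketch; the test point is our choice):

```python
import math

# Rotate a point of the curve and check u^2 + 3v^2 = 2 (test point is ours).
x, y = 1.0, 1.0                            # (1,1) satisfies x^2 - xy + y^2 = 1
assert math.isclose(x*x - x*y + y*y, 1.0)

u = (x + y) / math.sqrt(2)                 # clockwise rotation by pi/4
v = (-x + y) / math.sqrt(2)
assert math.isclose(u*u + 3*v*v, 2.0)
```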

Example 7.4.11.

In List 7.3.1 we saw that a reflection by the line \(y=x\) is a linear transformation. We now extend this to arbitrary lines through the origin with slope \(m\text{,}\) that is, lines with an equation of the form \(y=mx\text{.}\) Note that the point \((1,m)\) is on the line. If we let \(\theta\) be the angle between the line and the positive \(x\text{-axis}\text{,}\) then \(\cos\theta=\frac1{\sqrt{m^2+1}}\) and \(\sin\theta=\frac m{\sqrt{m^2+1}}\text{.}\)

Figure 7.4.12. Reflections \(\mathbf u\) to \(\mathbf v\) by the line \(y=mx\)

By a clever use of composition of linear transformations, not only can we see that the reflection is a linear transformation, but we can also derive an explicit formula for it.

Consider the following sequence of linear transformations:

  • \(L_1\text{:}\) Rotation clockwise by an angle \(\theta\text{,}\)
  • \(L_2\text{:}\) Reflection by the \(x\)-axis,
  • \(L_3\text{:}\) Rotation counterclockwise by an angle \(\theta\text{.}\)
(a) Initial point and line
(b) Rotation clockwise by \(\theta\)
(c) Reflection by the \(x\)-axis
(d) Rotation counterclockwise by \(\theta\)
Figure 7.4.13. Reflection by the line \(y=mx\)

Observe that the composition \(L_3\circ L_2\circ L_1\) has the following effect: first \(L_1\text{,}\) the rotation of the plane clockwise by \(\theta\text{,}\) rotates the line \(y=mx\) onto the \(x\)-axis; then \(L_2\text{,}\) the reflection by the \(x\)-axis, simply multiplies the second coordinate by \(-1\text{;}\) finally \(L_3\text{,}\) the rotation of the plane counterclockwise by \(\theta\text{,}\) rotates the \(x\)-axis back onto the line \(y=mx\text{.}\) The combined effect is the reflection by the line \(y=mx\text{.}\)

Table 7.4.14. Linear transformations and matrix representations
Transformation Description Matrix representation
\(L_1\) Rotation clockwise by \(\theta\) \(\left[\begin{smallmatrix} \cos\theta \amp \sin\theta \\ -\sin\theta \amp \cos\theta \end{smallmatrix}\right]\)
\(L_2\) Reflection by \(x\)-axis \(\left[\begin{smallmatrix} 1\amp 0\\ 0\amp -1 \end{smallmatrix}\right]\)
\(L_3\) Rotation counterclockwise by \(\theta\) \(\left[\begin{smallmatrix} \cos\theta \amp -\sin\theta \\ \sin\theta \amp \cos\theta \end{smallmatrix}\right]\)

The matrix representation for \(L_1\) is given in Checkpoint 7.3.19. Reflecting by the \(x\)-axis simply negates the second coordinate, and so the matrix representation for \(L_2\) is straightforward. Finally, Theorem 7.4.4 is used to compute the matrix representation of \(L_3\circ L_2\circ L_1\text{.}\) Recall that \(\cos\theta=\frac1{\sqrt{m^2+1}}\) and \(\sin\theta=\frac m{\sqrt{m^2+1}}\text{.}\)

\begin{alignat*}{4} \begin{bmatrix} \cos\theta \amp -\sin\theta \\ \sin\theta \amp \cos\theta \end{bmatrix} \amp \begin{bmatrix} 1\amp 0\\ 0\amp -1 \end{bmatrix} \begin{bmatrix} \cos\theta \amp \sin\theta \\ -\sin\theta \amp \cos\theta \end{bmatrix}\\ \amp =\frac1{\sqrt{m^2+1}} \begin{bmatrix} 1 \amp -m \\ m \amp 1 \end{bmatrix} \begin{bmatrix} 1 \amp 0 \\ 0 \amp -1 \end{bmatrix} \frac1{\sqrt{m^2+1}} \begin{bmatrix} 1 \amp m \\ -m \amp 1 \end{bmatrix}\\ \amp = \frac1{m^2+1} \begin{bmatrix} 1-m^2 \amp 2m \\ 2m \amp m^2-1 \end{bmatrix}\text{.} \end{alignat*}

It then follows that the reflection \(T\) by the line \(y=mx\) is given by

\begin{equation*} T((x,y))=\frac1{m^2+1} \bigl((1-m^2)x+2my, 2mx-(1-m^2)y\bigr)\text{.} \end{equation*}
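The reflection formula can be sanity-checked at a few points (a sketch; the function name and test values are ours):

```python
# Numeric check of the reflection formula (function name and values ours).
def reflect(m, x, y):
    d = m*m + 1
    return ((1 - m*m)*x + 2*m*y) / d, (2*m*x - (1 - m*m)*y) / d

# points on the line y = mx are fixed by the reflection
assert reflect(3, 1, 3) == (1.0, 3.0)
# m = 1 recovers the reflection in y = x, which swaps coordinates
assert reflect(1, 2.0, 5.0) == (5.0, 2.0)
```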
Example 7.4.16.

Consider the following sequence of linear operators in \(\R^2\text{:}\)

  1. \(L_1\text{:}\) reflect by the line \(y=x\text{,}\)
  2. \(L_2\text{:}\) rotate counterclockwise by \(\theta=\frac\pi4\text{,}\)
  3. \(L_3\text{:}\) reflect by the line \(y=-x\text{,}\)
  4. \(L_4\text{:}\) rotate clockwise by \(\theta=\frac\pi4\text{,}\)
  5. \(L_5\text{:}\) reflect by the line \(y=x\text{.}\)

The application of this sequence of operators results in an operator \(L=L_5\circ L_4\circ L_3\circ L_2\circ L_1\text{.}\) To understand the structure of \(L\text{,}\) look at the matrix representation in each case:

\(L_1\text{:}\) \(\begin{bmatrix} 0 \amp 1 \\ 1 \amp 0 \end{bmatrix}\)
\(L_2\text{:}\) \(\frac1{\sqrt2}\begin{bmatrix}1 \amp -1 \\ 1 \amp 1 \end{bmatrix}\)
\(L_3\text{:}\) \(\begin{bmatrix} 0 \amp-1 \\-1 \amp 0 \end{bmatrix}\)
\(L_4\text{:}\) \(\frac1{\sqrt2}\begin{bmatrix}1 \amp 1 \\ -1 \amp 1 \end{bmatrix}\)
\(L_5\text{:}\) \(\begin{bmatrix} 0 \amp 1 \\ 1 \amp 0 \end{bmatrix}\)

The matrix representation of the composition is then

\begin{equation*} \begin{bmatrix} 0 \amp 1 \\ 1 \amp 0 \end{bmatrix} \frac1{\sqrt2}\begin{bmatrix}1 \amp 1 \\ -1 \amp 1 \end{bmatrix} \begin{bmatrix} 0 \amp-1 \\-1 \amp 0 \end{bmatrix} \frac1{\sqrt2}\begin{bmatrix}1 \amp -1 \\ 1 \amp 1 \end{bmatrix} \begin{bmatrix} 0 \amp 1 \\ 1 \amp 0 \end{bmatrix} = \begin{bmatrix} 1 \amp 0 \\ 0 \amp -1 \end{bmatrix}\text{.} \end{equation*}

Hence \(L((x,y))=(x,-y)\) and \(L\) is simply a reflection by the \(x\)-axis.
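Multiplying the five matrices numerically confirms the result (a sketch; the helper name is ours):

```python
import math

# Multiplying the five matrices of Example 7.4.16 (helper name is ours).
def matmul(B, A):
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

s = 1 / math.sqrt(2)
L1 = [[0, 1], [1, 0]]                      # reflect in y = x
L2 = [[s, -s], [s, s]]                     # rotate counterclockwise by pi/4
L3 = [[0, -1], [-1, 0]]                    # reflect in y = -x
L4 = [[s, s], [-s, s]]                     # rotate clockwise by pi/4
L5 = [[0, 1], [1, 0]]                      # reflect in y = x

M = L1
for L in (L2, L3, L4, L5):
    M = matmul(L, M)                       # later operators act on the left

expected = [[1, 0], [0, -1]]               # reflection in the x-axis
assert all(math.isclose(M[i][j], expected[i][j], abs_tol=1e-12)
           for i in range(2) for j in range(2))
```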