Section 5.2 The dot product of vectors in \(\R^n\)
We now look at something different: a product of two vectors in \(\R^n\text{.}\)
Subsection 5.2.1 Definition of the dot product
Definition 5.2.1. The dot product.
If \(\vec x=(x_1,x_2,\ldots,x_n)\) and \(\vec y=(y_1,y_2,\ldots,y_n)\text{,}\) then the dot product (or inner product) of \(\vec x\) and \(\vec y\) is
\begin{equation*} \vec x\cdot\vec y = x_1y_1+x_2y_2+\cdots+x_ny_n = \sum_{i=1}^n x_iy_i. \end{equation*}
Notice that if \(\vec x\) and \(\vec y\) are viewed as column vectors \(x\) and \(y\text{,}\) then
\begin{equation*} x^Ty = x_1y_1+x_2y_2+\cdots+x_ny_n = \vec x\cdot\vec y, \end{equation*}
and so we see that the dot product may be viewed as a special case of matrix multiplication.
Remark 5.2.2.
Since \(\vec x\cdot\vec y\) is a real number and \(x^Ty\) is a \(1\times 1\) matrix, to be completely precise we would say \(x^Ty=[\vec x\cdot\vec y]\text{.}\) We will be this precise only when not being so might create confusion.
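To see the matrix-product description concretely, here is a minimal Python sketch (our own illustration, not part of the formal development; it assumes NumPy is available, and the variable names are ours):
```python
import numpy as np  # assumes NumPy is installed

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -5.0, 6.0])

# The dot product as a sum of componentwise products.
dot_as_sum = sum(xi * yi for xi, yi in zip(x, y))  # 4 - 10 + 18 = 12.0

# The same value as the 1x1 matrix x^T y, with x and y as column vectors.
col_x = x.reshape(-1, 1)
col_y = y.reshape(-1, 1)
dot_as_matrix = (col_x.T @ col_y)[0, 0]

print(dot_as_sum, dot_as_matrix, np.dot(x, y))  # all three print 12.0
```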
Proposition 5.2.3. First properties of the dot product.
Let \(\vec x\text{,}\) \(\vec y\) and \(\vec u\) be vectors in \(\R^n\text{,}\) and let \(r\) be a scalar. Then
\(\displaystyle \vec x \cdot \vec y = \vec y\cdot \vec x\)
\(\displaystyle \vec u\cdot(\vec x+\vec y) = \vec u\cdot \vec x+\vec u\cdot \vec y\)
\(\displaystyle (\vec x+\vec y)\cdot\vec u = \vec x\cdot \vec u + \vec y\cdot \vec u\)
\(\displaystyle r(\vec x\cdot\vec y)= (r\vec x)\cdot \vec y = \vec x\cdot (r\vec y)\)
\(\vec x\cdot \vec x\ge 0\) with equality if and only if \(\vec x=\vec 0\)
Proof.
\(\displaystyle \vec x\cdot \vec y=\sum_{i=1}^n x_iy_i=\sum_{i=1}^n y_ix_i=\vec y\cdot\vec x\)
By direct evaluation
\begin{align*} \vec u\cdot(\vec x+\vec y) \amp = \sum_{i=1}^n u_i(x_i+y_i)\\ \amp = \sum_{i=1}^n (u_ix_i+u_iy_i)\\ \amp = \sum_{i=1}^n u_ix_i+\sum_{i=1}^nu_iy_i\\ \amp =\vec u\cdot \vec x+\vec u\cdot \vec y \end{align*}
Using (1) and (2),
\begin{equation*} (\vec x+\vec y)\cdot\vec u = \vec u\cdot(\vec x+\vec y) = \vec u\cdot \vec x+\vec u\cdot \vec y = \vec x\cdot \vec u + \vec y\cdot \vec u \end{equation*}
Evaluating each term:
\begin{align*} r(\vec x\cdot\vec y) \amp = r\sum_{i=1}^n x_iy_i\\ (r\vec x)\cdot \vec y \amp = \sum_{i=1}^n (rx_i)y_i = \sum_{i=1}^n r(x_iy_i)= r\sum_{i=1}^n x_iy_i\\ \vec x\cdot (r\vec y) \amp = \sum_{i=1}^n x_i(ry_i) = \sum_{i=1}^n r(x_iy_i)= r\sum_{i=1}^n x_iy_i \end{align*}
\(\vec x\cdot \vec x=\sum_{i=1}^n x_i^2\text{,}\) and since \(x_i^2\ge0\) for any real number \(x_i\text{,}\) we have \(\vec x\cdot \vec x\ge0\text{.}\) For the value to actually be \(0\text{,}\) we must have each \(x_i^2=0\) and hence \(x_i=0\) for \(i=1,\ldots,n\text{.}\) In this case \(\vec x=\vec0\text{.}\)
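These properties can also be spot-checked numerically. The following sketch (our own helper functions, not part of the formal development; the tolerance guards against floating-point rounding) verifies them on random vectors:
```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

n = 5
x = [random.uniform(-10, 10) for _ in range(n)]
y = [random.uniform(-10, 10) for _ in range(n)]
u = [random.uniform(-10, 10) for _ in range(n)]
r = random.uniform(-10, 10)

add = lambda a, b: [ai + bi for ai, bi in zip(a, b)]
scale = lambda s, a: [s * ai for ai in a]
close = lambda s, t: abs(s - t) < 1e-9  # allow for rounding

assert close(dot(x, y), dot(y, x))                      # property 1
assert close(dot(u, add(x, y)), dot(u, x) + dot(u, y))  # property 2
assert close(r * dot(x, y), dot(scale(r, x), y))        # property 4
assert close(r * dot(x, y), dot(x, scale(r, y)))        # property 4
assert dot(x, x) >= 0                                   # property 5
print("all properties check out on this random sample")
```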
Checkpoint 5.2.4.
Show that \(\vec x\cdot \vec0=0\) for any vector \(\vec x\text{.}\)
Subsection 5.2.2 The dot product and the length of a vector
In \(\R^2\) the length of a vector is its distance to the origin and so for the vector \(\vec x=(x,y)\text{,}\) the Pythagorean theorem is used to compute the distance \(d=\sqrt{x^2+y^2}\text{.}\)
This gives us the following definition.
Definition 5.2.6. Length of a vector in \(\R^2\).
If \(\vec x=(x,y)\) is a vector in \(\R^2\text{,}\) then the length of \(\vec x\) is
\begin{equation*} \|\vec x\| = \sqrt{x^2+y^2}. \end{equation*}
The situation is similar in \(\R^3\text{.}\) As in \(\R^2\text{,}\) we get \(d'=\sqrt{x^2+y^2}\) for the distance from the origin to \((x,y,0)\text{.}\) Applying the Pythagorean theorem again, we get \(d^2={d'}^2+|z|^2=x^2+y^2+z^2\text{,}\) which gives us the length in \(\R^3\text{.}\)
Definition 5.2.8. Length of a vector in \(\R^3\).
If \(\vec x=(x,y,z)\) is a vector in \(\R^3\text{,}\) then the length of \(\vec x\) is
\begin{equation*} \|\vec x\| = \sqrt{x^2+y^2+z^2}. \end{equation*}
Proposition 5.2.9. For any vector \(\vec x\text{,}\) we have \(\|\vec x\|^2=\vec x\cdot \vec x\).
If \(\vec x\) is a vector in \(\R^2\) or \(\R^3\text{,}\) then
\begin{equation*} \|\vec x\|^2=\vec x\cdot \vec x. \end{equation*}
We generalize the idea of length to \(\R^n\) in the most straightforward way:
Definition 5.2.10. The length of a vector in \(\R^n\).
Let \(\vec x\) be a vector in \(\R^n\text{.}\) Then the length of \(\vec x\) is defined by
\begin{equation*} \|\vec x\| = \sqrt{x_1^2+x_2^2+\cdots+x_n^2} \end{equation*}
or, equivalently,
\begin{equation*} \|\vec x\| = \sqrt{\vec x\cdot\vec x}. \end{equation*}
Note that Proposition 5.2.3 ensures that there will indeed be a nonnegative real square root of \(\vec x\cdot\vec x\text{.}\)
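The two formulas compute the same number, as the following short Python sketch illustrates (the helper names are ours):
```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def length_from_coords(v):
    # sqrt(v1^2 + v2^2 + ... + vn^2)
    return math.sqrt(sum(vi * vi for vi in v))

def length_from_dot(v):
    # sqrt(v . v), the equivalent form
    return math.sqrt(dot(v, v))

v = [1.0, -2.0, 2.0]
print(length_from_coords(v), length_from_dot(v))  # 3.0 3.0
```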
Vectors with length one are often important and are given a special name.
Definition 5.2.11. Unit vectors in \(\R^n\).
A unit vector is one with length one.
It is clear that \(\vec x\) is a unit vector if and only if \(\|\vec x\|^2=\vec x\cdot\vec x=1\text{.}\)
Proposition 5.2.12. Unit vector construction.
Let \(\vec x\) be any nonzero vector in \(\R^n\text{.}\) Then \(\frac{\vec x}{\|\vec x\|}\) is a unit vector.
Proof.
Let \(\vec u=\frac{\vec x}{\|\vec x\|}\text{.}\) Using Proposition 5.2.3, \(\vec u\cdot\vec u = \frac{1}{\|\vec x\|^2}(\vec x\cdot\vec x) = \frac{\|\vec x\|^2}{\|\vec x\|^2} = 1\text{,}\) and so \(\|\vec u\|=1\text{.}\)
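In code, the construction is a one-line normalization; a minimal sketch (our own function names) follows:
```python
import math

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def normalize(v):
    # Proposition 5.2.12 requires v to be nonzero.
    length = norm(v)
    return [vi / length for vi in v]

u = normalize([3.0, 4.0])   # (3,4) has length 5, so u = (0.6, 0.8)
print(norm(u))              # 1.0 (up to floating-point rounding)
```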
Subsection 5.2.3 The dot product and the angle between vectors in \(\R^n\)
We start the discussion of angles by considering \(\R^2\text{.}\) We take two nonzero vectors \(\vec x\) and \(\vec y\) and join each of them to \(\vec0\) by a line segment. The angle between the vectors is the angle \(\theta\) between these two segments.
We use the law of cosines (Theorem 8.3.2) with \(a=\|\vec x\|\text{,}\) \(b=\|\vec y\|\) and \(c=\|\vec x-\vec y\|\) equal to the distance from \(\vec x\) to \(\vec y\text{:}\)
\begin{equation*} \|\vec x-\vec y\|^2 = \|\vec x\|^2 + \|\vec y\|^2 - 2\|\vec x\| \|\vec y\| \cos\theta. \end{equation*}
Expanding the left side as \((\vec x-\vec y)\cdot(\vec x-\vec y) = \|\vec x\|^2 - 2(\vec x\cdot\vec y) + \|\vec y\|^2\) and cancelling gives us the following:
Proposition 5.2.14.
If \(\vec x\) and \(\vec y\) are nonzero vectors in \(\R^2\text{,}\) then
\begin{equation*} \vec x\cdot\vec y = \|\vec x\| \|\vec y\| \cos\theta. \end{equation*}
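For a concrete check in \(\R^2\text{,}\) the following sketch (our own illustration) compares the angle obtained from this formula with the angle measured directly from polar angles; the direct comparison is valid here because the difference of the two polar angles already lies in \([0,\pi]\text{:}\)
```python
import math

x = (1.0, 0.0)
y = (1.0, 1.0)

# Angle from the dot-product formula of Proposition 5.2.14.
cos_theta = (x[0] * y[0] + x[1] * y[1]) / (math.hypot(*x) * math.hypot(*y))
theta = math.acos(cos_theta)

# The same angle measured directly from the polar angles of x and y.
direct = abs(math.atan2(y[1], y[0]) - math.atan2(x[1], x[0]))

print(theta, direct)  # both are pi/4 = 0.7853981...
```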
Checkpoint 5.2.15.
Show that if \(\vec x\) and \(\vec y\) are nonzero vectors in \(\R^3\text{,}\) then
\begin{equation*} \vec x\cdot\vec y = \|\vec x\| \|\vec y\| \cos\theta. \end{equation*}
Use the triangle in \(\R^3\) with vertices \(\vec x\text{,}\) \(\vec y\) and \(\vec 0\) and apply the law of cosines.
Observation 5.2.16.
We have assumed the vectors \(\vec x\) and \(\vec y\) to be nonzero. If \(\vec x=\vec0\text{,}\) then \(\vec x\cdot\vec y=0\) and \(\|\vec x\|=0\text{,}\) and so \(\vec x\cdot\vec y = \|\vec x\| \|\vec y\| \cos\theta\) is still the valid (but perhaps uninteresting) equation \(0=0\text{.}\)
If \(\vec x\) and \(\vec y\) are both nonzero, then
\begin{equation*} \frac{\vec x\cdot\vec y}{\|\vec x\| \|\vec y\|}=\cos\theta \end{equation*}and so
\begin{equation*} -1 \leq \frac{\vec x\cdot\vec y}{\|\vec x\| \|\vec y\|} \leq 1. \end{equation*}Taking absolute values,
\begin{equation*} | \vec x\cdot\vec y| \leq \|\vec x\| \|\vec y\|. \end{equation*}This inequality is called the Cauchy-Schwarz inequality.
Subsection 5.2.4 The Cauchy-Schwarz inequality
The following theorem is the key that allows many geometric concepts from \(\R^2\) and \(\R^3\) to be extended to \(\R^n\text{.}\)
Theorem 5.2.17. Cauchy-Schwarz.
If \(\vec x\) and \(\vec y\) are vectors in \(\R^n\text{,}\) then
\begin{equation*} |\vec x\cdot\vec y| \le \|\vec x\| \|\vec y\|. \end{equation*}
Proof using elementary algebra.
This proof makes clever use of the quadratic formula for the solution of quadratic equations. To avoid confusion with the variable \(x\text{,}\) call the two vectors \(\vec u\) and \(\vec v\text{.}\) If \(\vec u=\vec0\) the inequality is immediate, so assume \(\vec u\ne\vec0\text{.}\) We take a real number \(x\) and let it act as a variable: for every real \(x\text{,}\)
\begin{equation*} 0 \le \|x\vec u+\vec v\|^2 = (x\vec u+\vec v)\cdot(x\vec u+\vec v) = \|\vec u\|^2x^2 + 2(\vec u\cdot\vec v)x + \|\vec v\|^2 = ax^2+bx+c, \end{equation*}
where \(a=\|\vec u\|^2\text{,}\) \(b=2(\vec u\cdot\vec v)\) and \(c=\|\vec v\|^2\text{.}\) This means that the graph of \(ax^2+bx+c\) is a parabola that is never below the \(x\)-axis. In particular, this means that there cannot be two distinct real roots. Now the two roots of a quadratic equation are
\begin{equation*} x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}, \end{equation*}
and so the polynomial will have two distinct real roots if \(b^2-4ac>0\text{.}\) This is exactly what cannot happen, and so \(b^2-4ac\le0\text{,}\) or \(b^2\le4ac\text{.}\) Using our known values of \(a\text{,}\) \(b\) and \(c\text{,}\) we get
\begin{equation*} 4(\vec u\cdot\vec v)^2 \le 4\|\vec u\|^2\|\vec v\|^2, \end{equation*}
and taking square roots gives \(|\vec u\cdot\vec v| \le \|\vec u\| \|\vec v\|\text{.}\)
Proof using Proposition 5.2.3.
First case: \(\vec x=\vec0\) or \(\vec y=\vec0\).
In this case \(\vec x\cdot\vec y=0\) and \(\|\vec x\| \|\vec y\|=0\text{,}\) and so the inequality reduces to \(0\leq 0\text{.}\)
Second case: \(\|\vec x\|=\|\vec y\|=1\).
Using Proposition 5.2.3,
\begin{equation*} 0 \le \|\vec x-\vec y\|^2 = (\vec x-\vec y)\cdot(\vec x-\vec y) = \vec x\cdot\vec x - 2(\vec x\cdot\vec y) + \vec y\cdot\vec y = 2 - 2(\vec x\cdot\vec y), \end{equation*}
and so
\begin{equation*} \vec x\cdot\vec y \le 1. \end{equation*}
Similarly
\begin{equation*} 0 \le \|\vec x+\vec y\|^2 = 2 + 2(\vec x\cdot\vec y), \end{equation*}
and
\begin{equation*} -1 \le \vec x\cdot\vec y. \end{equation*}
These inequalities imply
\begin{equation*} |\vec x\cdot\vec y| \le 1 = \|\vec x\| \|\vec y\|. \end{equation*}
Third case: \(\|\vec x\|\neq 0\) and \(\|\vec y\|\neq0\).
As seen in Proposition 5.2.12, if \(\vec u=\frac{\vec x}{\|\vec x\|}\) and \(\vec v=\frac{\vec y}{\|\vec y\|}\text{,}\) then both \(\vec u\) and \(\vec v\) are unit vectors, and, as such, the second case is applicable. This means
\begin{equation*} |\vec u\cdot\vec v| \le 1, \end{equation*}
and so
\begin{equation*} \left|\frac{\vec x}{\|\vec x\|}\cdot\frac{\vec y}{\|\vec y\|}\right| = \frac{|\vec x\cdot\vec y|}{\|\vec x\| \|\vec y\|} \le 1, \end{equation*}
which implies
\begin{equation*} |\vec x\cdot\vec y| \le \|\vec x\| \|\vec y\|. \end{equation*}
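The theorem can also be spot-checked numerically. The following sketch (our own, not part of the formal development) tests the inequality on random vectors, with a tiny slack factor to absorb floating-point rounding when the vectors are nearly parallel and the two sides nearly coincide:
```python
import math, random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

for _ in range(10000):
    n = random.randint(1, 10)
    x = [random.uniform(-100, 100) for _ in range(n)]
    y = [random.uniform(-100, 100) for _ in range(n)]
    assert abs(dot(x, y)) <= norm(x) * norm(y) * (1 + 1e-12)
print("Cauchy-Schwarz held on every sample")
```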
The Cauchy-Schwarz theorem gives us further results about the length of vectors in \(\R^n\text{.}\)
Theorem 5.2.18. Properties of lengths of vectors in \(\R^n\).
Suppose that \(\vec x=(x_1,x_2,\ldots,x_n)\) and \(\vec y=(y_1,y_2,\ldots,y_n)\) are vectors in \(\R^n\text{,}\) and that \(r\) is any real number. Then
\(\|\vec x\| \ge 0\) with equality if and only if \(\vec x=\vec0.\)
\(\displaystyle \|r\vec x\|=|r| \|\vec x\|\)
\(|\vec x\cdot\vec y|\le\|\vec x\|\,\|\vec y\|\) (Cauchy-Schwarz inequality)
\(\|\vec x+\vec y\|^2 + \|\vec x-\vec y\|^2 = 2(\|\vec x\|^2 +\|\vec y\|^2)\) (Parallelogram equality)
\(\|\vec x+\vec y\| \le \|\vec x\|+\|\vec y\|\) (Triangle inequality)
Proof.
Since \(\|\vec x\|^2= \vec x\cdot\vec x\text{,}\) this result is contained in Proposition 5.2.3.
From direct evaluation:
\begin{equation*} \begin{array}{rl} \|r\vec x\|^2 \amp=\|r(x_1,x_2,\ldots,x_n)\|^2\\ \amp=\|(rx_1,rx_2,\ldots,rx_n)\|^2\\ \amp=r^2x_1^2+r^2x_2^2+\cdots+r^2x_n^2\\ \amp=r^2(x_1^2+x_2^2+\cdots+x_n^2) \end{array} \end{equation*}and so by taking square roots, \(\|r\vec x\|=|r| \|\vec x\|\text{.}\)
This is just Theorem 5.2.17.
Again, a direct evaluation:
\begin{equation*} \begin{array}{rl} \|\vec x+\vec y\|^2 + \|\vec x-\vec y\|^2 \amp= (\vec x+\vec y) \cdot (\vec x+\vec y) +(\vec x-\vec y) \cdot (\vec x-\vec y)\\ \amp= (\vec x\cdot\vec x + \vec x\cdot\vec y +\vec y\cdot\vec x +\vec y\cdot\vec y)\\ \amp \phantom{=} + (\vec x\cdot\vec x - \vec x\cdot\vec y -\vec y\cdot\vec x +\vec y\cdot\vec y)\\ \amp=2(\|\vec x\|^2 +\|\vec y\|^2) \end{array} \end{equation*}
The triangle inequality is a consequence of the Cauchy-Schwarz inequality:
\begin{equation*} \begin{array}{rll} \|\vec x+\vec y\|^2 \amp= (\vec x+\vec y)\cdot(\vec x+\vec y)\\ \amp= \vec x\cdot\vec x+2(\vec x\cdot\vec y)+\vec y\cdot\vec y\\ \amp= \|\vec x\|^2 +2(\vec x\cdot\vec y)+\|\vec y\|^2\\ \amp\le \|\vec x\|^2 +2|\vec x\cdot\vec y|+\|\vec y\|^2 \amp\amp\gets r\le|r|\text{ used here.}\\ \amp\le \|\vec x\|^2 +2\|\vec x\|\|\vec y\|+\|\vec y\|^2 \amp\amp\gets \text{ Cauchy-Schwarz used here.}\\ \amp= (\|\vec x\|+\|\vec y\|)^2 \end{array} \end{equation*}Taking square roots of both sides gives \(\|\vec x+\vec y\| \le \|\vec x\|+\|\vec y\|\text{.}\)
The parallelogram equality has a nice geometric interpretation in \(\R^2\text{.}\) Consider the four sides and two diagonals of a parallelogram.
The parallelogram equality says \(\|\vec x+\vec y\|^2 + \|\vec x-\vec y\|^2 = 2(\|\vec x\|^2 +\|\vec y\|^2)\text{.}\) This implies that the sum of the squares of the lengths of the diagonals is equal to the sum of the squares of the lengths of the four sides.
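A quick numerical instance of the parallelogram equality (the example vectors and helper names are our own):
```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x = (2.0, 1.0)
y = (0.5, 3.0)

add = lambda a, b: tuple(ai + bi for ai, bi in zip(a, b))
sub = lambda a, b: tuple(ai - bi for ai, bi in zip(a, b))

lhs = dot(add(x, y), add(x, y)) + dot(sub(x, y), sub(x, y))  # diagonals
rhs = 2 * (dot(x, x) + dot(y, y))                            # sides
print(lhs, rhs)  # 28.5 28.5
```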
Subsection 5.2.5 Computing angles between vectors in \(\R^n\)
The angle between vectors is easy to visualize in \(\R^2\) or \(\R^3\text{,}\) but we lose this geometric picture in \(\R^n\text{.}\) Nonetheless, there is a definition that still makes sense.
If either \(\vec x=\vec0\) or \(\vec y=\vec0\text{,}\) then we simply define the angle \(\theta\) between them to be \(\theta=0\text{.}\)
On the other hand, if \(\vec x\not=\vec0\) and \(\vec y\not=\vec0\text{,}\) then the Cauchy-Schwarz theorem says
\begin{equation*} \frac{| \vec x\cdot\vec y |}{\|\vec x\| \|\vec y\|} \leq 1 \end{equation*}which implies
\begin{equation*} -1\leq \frac{\vec x\cdot\vec y}{\|\vec x\| \|\vec y\|} \leq 1. \end{equation*}The graph of \(\cos(x)\) descends from \(1\) to \(-1\) between \(x=0\) and \(x=\pi\text{,}\) and so there is exactly one value \(\theta\) in that range such that
\begin{equation*} \frac{\vec x\cdot\vec y}{\|\vec x\| \|\vec y\|} =\cos \theta. \end{equation*}That is the angle between \(\vec x\) and \(\vec y\text{.}\) (If you have studied calculus, the existence and uniqueness of \(\theta\) follows from the intermediate value theorem applied to the continuous strictly decreasing function \(\cos x\text{.}\))
Defining \(\cos\theta\) in this way extends the result from \(\R^2\) to \(\R^n\text{.}\)
Theorem 5.2.20.
For any vectors \(\vec x\) and \(\vec y\) in \(\R^n\text{,}\)
\begin{equation*} \vec x\cdot\vec y = \|\vec x\| \|\vec y\| \cos\theta, \end{equation*}
where \(\theta\) is the angle between \(\vec x\) and \(\vec y\text{.}\)
Example 5.2.21.
Let \(\vec x=(1,2,3,2,1)\) and \(\vec y= (1,-1,1,-1,1)\) be vectors in \(\R^5\text{.}\) Then
\begin{equation*} \cos\theta = \frac{\vec x\cdot\vec y}{\|\vec x\| \|\vec y\|} = \frac{1}{\sqrt{19}\,\sqrt{5}} = \frac{1}{\sqrt{95}} \approx 0.1026, \end{equation*}
and so \(\theta=1.4680\) radians (or \(84.1112^\circ\)).
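The computation in this example is easy to reproduce; a minimal Python sketch (ours) follows:
```python
import math

x = [1, 2, 3, 2, 1]
y = [1, -1, 1, -1, 1]

dot_xy = sum(xi * yi for xi, yi in zip(x, y))   # 1
norm_x = math.sqrt(sum(xi * xi for xi in x))    # sqrt(19)
norm_y = math.sqrt(sum(yi * yi for yi in y))    # sqrt(5)

theta = math.acos(dot_xy / (norm_x * norm_y))
print(round(theta, 4))                # 1.468   radians
print(round(math.degrees(theta), 4))  # 84.1112 degrees
```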
Theorem 5.2.22.
If \(\theta\) is the angle between nonzero vectors \(\vec x\) and \(\vec y\text{,}\) then
\(0\lt\theta\lt\frac\pi2\) if and only if \(\vec x\cdot\vec y \gt 0\) (so \(\theta\) is an acute angle)
\(\theta=\frac\pi2\) if and only if \(\vec x\cdot\vec y = 0\)
\(\frac\pi2\lt\theta\lt\pi\) if and only if \(\vec x\cdot\vec y \lt 0\) (so \(\theta\) is an obtuse angle)
Proof.
Since \(\vec x\cdot\vec y = \|\vec x\| \|\vec y\| \cos\theta\) and \(\|\vec x\| \|\vec y\| \gt 0\text{,}\) the dot product \(\vec x\cdot\vec y\) has the same sign as \(\cos\theta\text{,}\) so the result is clear from the graph of \(\cos(x)\) on \([0,\pi]\text{.}\)
Definition 5.2.23. Orthogonal vectors in \(\R^n\).
Two vectors \(\vec x\) and \(\vec y\) are orthogonal (or perpendicular) if and only if \(\vec x\cdot\vec y=0\text{.}\)
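Theorem 5.2.22 gives a sign test that is easy to code: the sketch below (our own function; the tolerance parameter is a hypothetical guard against rounding noise) classifies the angle between nonzero vectors without ever computing it:
```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def classify_angle(x, y, tol=1e-12):
    # Reads the angle off the sign of the dot product (Theorem 5.2.22).
    d = dot(x, y)
    if abs(d) <= tol:
        return "orthogonal (theta = pi/2)"
    return "acute" if d > 0 else "obtuse"

print(classify_angle([1, 0], [0, 1]))   # orthogonal (theta = pi/2)
print(classify_angle([1, 1], [1, 0]))   # acute
print(classify_angle([1, 1], [-2, 1]))  # obtuse
```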
Subsection 5.2.6 Distance between vectors in \(\R^n\)
The Cauchy-Schwarz theorem allowed us to define the angle between vectors in \(\R^n\text{.}\) This leads to the definition of the distance between vectors in a natural way. Consider the following figure, drawn in \(\R^2\text{,}\) as the pattern for our discussion.
The law of cosines then implies
\begin{equation*} c^2 = \|\vec x\|^2 + \|\vec y\|^2 - 2\|\vec x\| \|\vec y\| \cos\theta = \|\vec x\|^2 - 2(\vec x\cdot\vec y) + \|\vec y\|^2 = \|\vec x-\vec y\|^2, \end{equation*}
and so, if we denote the distance from \(\vec x\) to \(\vec y\) by \(d(\vec x,\vec y)\text{,}\) we have
\begin{equation*} d(\vec x,\vec y) = \|\vec x-\vec y\|. \end{equation*}
The figure is certainly valid for \(\R^2\) (and \(\R^3\)), and it motivates our general definition in \(\R^n\text{:}\)
Definition 5.2.25. Distance in \(\R^n\).
If \(\vec x\) and \(\vec y\) are in \(\R^n\text{,}\) then the distance between them is
\begin{equation*} d(\vec x,\vec y) = \|\vec x-\vec y\|. \end{equation*}
Theorem 5.2.26. Properties of distance in \(\R^n\).
Let \(\vec x\text{,}\) \(\vec y\) and \(\vec z\) be vectors in \(\R^n\text{.}\) Then
\(d(\vec x,\vec y) \ge 0\) with equality if and only if \(\vec x=\vec y\)
\(\displaystyle d(\vec x,\vec y) =d(\vec y,\vec x)\)
\(d(\vec x,\vec z) \leq d(\vec x,\vec y)+d(\vec y,\vec z)\) (the triangle inequality)
Proof.
These results are easy adaptations of Theorem 5.2.18 using \(d(\vec x,\vec y)=\|\vec x-\vec y\|\text{.}\) For the triangle inequality, apply Theorem 5.2.18 to \(\vec x-\vec y\) and \(\vec y-\vec z\text{,}\) noting that \(\vec x-\vec z = (\vec x-\vec y)+(\vec y-\vec z)\text{.}\)
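A numerical spot-check of the triangle inequality for distances (our own sketch; the small slack absorbs floating-point rounding in degenerate cases):
```python
import math, random

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

for _ in range(10000):
    x, y, z = ([random.uniform(-5, 5) for _ in range(4)]
               for _ in range(3))
    assert dist(x, z) <= dist(x, y) + dist(y, z) + 1e-12
print("triangle inequality held on every sample")
```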
The triangle inequality is so named because the length of one side of a triangle is less than or equal to the sum of the lengths of the other two sides:
Corollary 5.2.28. Equality in triangle inequality.
Suppose vectors \(\vec x\text{,}\) \(\vec y\) and \(\vec z\) satisfy \(d(\vec x,\vec z) = d(\vec x,\vec y)+d(\vec y,\vec z) \) and \(\theta \) is the angle between \(\vec x\) and \(\vec y \text{.}\) Then \(\theta=0\) or \(\theta=\pi.\)
Proof.
Since \(\vec x-\vec z=(\vec x-\vec y)+(\vec y-\vec z)\text{,}\) the hypothesis says that equality holds in the triangle inequality for the vectors \(\vec x-\vec y\) and \(\vec y-\vec z\text{.}\) To get equality in the triangle inequality, we must have equality in the Cauchy-Schwarz inequality (see the proof of Theorem 5.2.18), that is, \((\vec x-\vec y)\cdot(\vec y-\vec z) = \|\vec x-\vec y\| \|\vec y-\vec z\|\text{.}\) If either vector is \(\vec 0\text{,}\) then \(\theta=0\text{.}\) Otherwise the angle between \(\vec x-\vec y\) and \(\vec y-\vec z\) has cosine \(1\text{,}\) and so the angle \(\theta\) between \(\vec x-\vec y\) and \(\vec z-\vec y=-(\vec y-\vec z)\) has cosine \(-1\text{,}\) giving \(\theta=\pi\text{.}\)