
Section 4 Linear algebra in multivariable calculus

Subsection 4.1 Differentiability

A function \(f\colon \R^n\to \R^m\) is differentiable at the point \(\mathbf{x}_0\) if there exists a linear transformation \(L\colon \R^n\to \R^m\) such that
\begin{equation} \lim_{\mathbf{h}\to \mathbf{0}} \frac{f(\mathbf{x}_0+ \mathbf{h}) - f(\mathbf{x}_0)- L\mathbf{h}}{\|\mathbf{h}\|} = \mathbf{0}.\tag{4.1} \end{equation}
If such an \(L\) exists, it is called the derivative of \(f\) at \(\mathbf{x}_0\) and is denoted \(Df(\mathbf{x}_0)\text{.}\)
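As a rough numerical illustration of (4.1), the sketch below evaluates the quotient \(\|f(\mathbf{x}_0+\mathbf{h}) - f(\mathbf{x}_0) - L\mathbf{h}\| / \|\mathbf{h}\|\) for shrinking \(\mathbf{h}\text{.}\) The map `f`, the point `x0`, the direction, and the helper names are all hypothetical choices made for this example, and the matrix for \(L\) is computed by hand from the partial derivatives of that particular `f`. Taking the norm of the numerator is an assumption of convenience: a vector tends to \(\mathbf{0}\) exactly when its norm tends to \(0\text{.}\)

```python
import math

# Hypothetical example map f: R^2 -> R^2 (an assumption, not from the text).
def f(x):
    x1, x2 = x
    return [x1**2 * x2, x1 + math.sin(x2)]

# Candidate linear map L at x, as a matrix of hand-computed partial derivatives.
def L_matrix(x):
    x1, x2 = x
    return [[2 * x1 * x2, x1**2],
            [1.0, math.cos(x2)]]

def apply(matrix, h):
    """Matrix-vector product: the value of the linear map on h."""
    return [sum(a * b for a, b in zip(row, h)) for row in matrix]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(x0, h):
    """||f(x0 + h) - f(x0) - L h|| / ||h||, the quotient appearing in (4.1)."""
    shifted = [a + b for a, b in zip(x0, h)]
    fx, fxh = f(x0), f(shifted)
    Lh = apply(L_matrix(x0), h)
    remainder = [p - q - s for p, q, s in zip(fxh, fx, Lh)]
    return norm(remainder) / norm(h)

x0 = [1.5, 0.5]
direction = [0.6, 0.8]          # unit vector, so ||h|| equals the scale
scales = [1e-1, 1e-2, 1e-3]
ratios = [remainder_ratio(x0, [s * d for d in direction]) for s in scales]

for s, r in zip(scales, ratios):
    print(f"||h|| = {s:g}   ratio = {r:.3e}")
```

The printed ratios should shrink roughly in proportion to \(\|\mathbf{h}\|\text{.}\) Checking one direction is only a sanity test, not a proof, since the limit in (4.1) must hold for every way of letting \(\mathbf{h}\to\mathbf{0}\text{.}\)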
To understand the definition of the derivative, start with the case \(n=m=1\text{.}\) The derivative of \(f\) at \(x_0\) is a number \(f'(x_0)\) such that
\begin{equation*} f(x_0 + h) -f(x_0) \approx f'(x_0)h \end{equation*}
for \(h\) near \(0\text{.}\) The meaning of "approximately equals \(\ldots\) for \(h\) near \(0\)" is made precise by using a limit. To generalize to higher dimensions, interpret \(f'(x_0)h\) as the value of a linear transformation that sends \(h\) to \(f'(x_0)h\text{.}\) The derivative \(Df(\mathbf{x}_0)\) of \(f\) at \(\mathbf{x}_0\) is a linear transformation such that
\begin{equation*} f(\mathbf{x}_0 + \mathbf{h}) -f(\mathbf{x}_0) \approx Df(\mathbf{x}_0)\mathbf{h} \end{equation*}
for \(\mathbf{h}\) near \(\mathbf{0}\text{.}\) Putting \(\mathbf{h} = t\mathbf{e}_j\text{,}\) this reads
\begin{equation*} f(\mathbf{x}_0 + t\mathbf{e}_j) -f(\mathbf{x}_0) \approx Df(\mathbf{x}_0)t\mathbf{e}_j \end{equation*}
for \(t\) near \(0\text{.}\) Dividing both sides by \(t\) and taking a limit, we get an expression for \(Df(\mathbf{x}_0)\mathbf{e}_j\text{.}\)
\begin{equation} Df(\mathbf{x}_0)\mathbf{e}_j = \lim_{t\to 0} \frac{f(\mathbf{x}_0 + t\mathbf{e}_j) -f(\mathbf{x}_0)}{t} = \left(\frac{\partial y_1}{\partial x_j}, \frac{\partial y_2}{\partial x_j},\ldots, \frac{\partial y_m}{\partial x_j}\right)\tag{4.2} \end{equation}
where \(\mathbf{y}=(y_1,y_2,\ldots,y_m) = f(x_1,x_2,\ldots,x_n) = f(\mathbf{x})\) and the partial derivatives are evaluated at \(\mathbf{x}_0\text{.}\) From this it follows that \(Df(\mathbf{x}_0)\text{,}\) if it exists, is represented by the \(m\times n\) matrix \(\left[\frac{\partial y_i}{\partial x_j}\right]\text{.}\)
\begin{equation} [Df(\mathbf{x}_0)] = \left[\frac{\partial y_i}{\partial x_j}\right]\tag{4.3} \end{equation}
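The following sketch shows one way to check (4.2) and (4.3) numerically: each column of the matrix is approximated by the difference quotient in (4.2) with a small fixed \(t\) in place of the limit, and the result is compared with the matrix of partial derivatives computed by hand. The map `f`, the point `x0`, the step size, and the function names are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical example map f: R^2 -> R^2 (an assumption, not from the text).
def f(x):
    x1, x2 = x
    return [x1**2 * x2, x1 + math.sin(x2)]

# Matrix [dy_i/dx_j] for this f, computed by hand.
def jacobian_exact(x):
    x1, x2 = x
    return [[2 * x1 * x2, x1**2],
            [1.0, math.cos(x2)]]

def jacobian_column(func, x0, j, t=1e-6):
    """Approximate Df(x0) e_j by the quotient (f(x0 + t e_j) - f(x0)) / t."""
    shifted = list(x0)
    shifted[j] += t
    f0, ft = func(x0), func(shifted)
    return [(a - b) / t for a, b in zip(ft, f0)]

def jacobian_fd(func, x0, t=1e-6):
    """Assemble the approximate columns into an m-by-n matrix (list of rows)."""
    columns = [jacobian_column(func, x0, j, t) for j in range(len(x0))]
    return [list(row) for row in zip(*columns)]  # transpose columns into rows

x0 = [1.5, 0.5]
J_fd = jacobian_fd(f, x0)
J_exact = jacobian_exact(x0)
max_error = max(abs(a - b)
                for row_fd, row_ex in zip(J_fd, J_exact)
                for a, b in zip(row_fd, row_ex))
print("max entrywise error:", max_error)
```

The entrywise error should be small but not zero, because a fixed \(t\) only approximates the limit. The column-by-column construction mirrors the argument in the text: the \(j\)th column of the matrix is \(Df(\mathbf{x}_0)\mathbf{e}_j\text{.}\)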

Subsection 4.2 The Chain Rule

Consider the composition of functions
\begin{equation*} \R^p \stackrel{g}{\longrightarrow} \R^n \stackrel{f}{\longrightarrow} \R^m \end{equation*}
and suppose \(g\) is differentiable at \(\mathbf{t}_0\) and \(f\) is differentiable at \(\mathbf{x}_0=g(\mathbf{t}_0)\text{.}\) The chain rule says that \(f\circ g\) is differentiable at \(\mathbf{t}_0\text{,}\) and that the derivative of the composition is the composition of the derivatives.
\begin{equation} D(f\circ g)(\mathbf{t}_0) = Df(\mathbf{x}_0) Dg(\mathbf{t}_0)\tag{4.4} \end{equation}
In matrix form, this explains the "tree diagram rule" given in most multivariable calculus texts. Writing \(\mathbf{x}=g(\mathbf{t})\) and \(\mathbf{y}=f(\mathbf{x})\text{,}\) the partial derivative \(\frac{\partial y_i}{\partial t_j}\) is the \((i,j)\) entry of the product of the derivative matrices for \(f\) and \(g\text{.}\)
\begin{equation} \frac{\partial y_i}{\partial t_j} = \sum_{k=1}^n \frac{\partial y_i}{\partial x_k}\frac{\partial x_k}{\partial t_j}\tag{4.5} \end{equation}
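A minimal sketch of (4.4) and (4.5) in action, under assumed example maps: `g` from \(\R^2\) to \(\R^3\) and `f` from \(\R^3\) to \(\R^2\) are hypothetical, and their derivative matrices `Dg` and `Df` are computed by hand. The product of the two matrices is compared with a difference-quotient approximation of the derivative of \(f\circ g\text{.}\)

```python
import math

# Hypothetical maps (assumptions for illustration): g: R^2 -> R^3, f: R^3 -> R^2.
def g(t):
    t1, t2 = t
    return [t1 * t2, t1 + t2, t1**2]

def Dg(t):
    t1, t2 = t
    return [[t2, t1],
            [1.0, 1.0],
            [2 * t1, 0.0]]

def f(x):
    x1, x2, x3 = x
    return [x1 * x2 + x3, math.sin(x1) * x3]

def Df(x):
    x1, x2, x3 = x
    return [[x2, x1, 1.0],
            [math.cos(x1) * x3, 0.0, math.sin(x1)]]

def matmul(A, B):
    """Entry (i, j) is the sum over k of A[i][k] * B[k][j], as in (4.5)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def jacobian_fd(func, t0, step=1e-6):
    """Approximate the derivative matrix column by column with difference quotients."""
    base = func(t0)
    columns = []
    for j in range(len(t0)):
        shifted = list(t0)
        shifted[j] += step
        columns.append([(a - b) / step for a, b in zip(func(shifted), base)])
    return [list(row) for row in zip(*columns)]  # transpose columns into rows

t0 = [0.7, -0.3]
x0 = g(t0)                                   # the point where Df is evaluated
product = matmul(Df(x0), Dg(t0))             # right side of (4.4)
numeric = jacobian_fd(lambda t: f(g(t)), t0) # approximation to D(f o g)(t0)
max_error = max(abs(a - b)
                for row_p, row_n in zip(product, numeric)
                for a, b in zip(row_p, row_n))
print("max entrywise error:", max_error)
```

Note that `Df` is evaluated at `x0 = g(t0)`, not at `t0`, matching the statement of the chain rule. The shapes also line up as the composition requires: a \(2\times 3\) matrix times a \(3\times 2\) matrix gives the \(2\times 2\) matrix for \(f\circ g\text{.}\)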

Exercises 4.3 Exercises

1.

Verify that the definition of a differentiable function given in (4.1) is equivalent to the usual definition for \(n=m=1\) from Calculus 1.

2.

Explain equation (4.2). Why does the limit on the left equal the vector on the right?
\begin{equation*} \lim_{t\to 0} \frac{f(\mathbf{x}_0 + t\mathbf{e}_j) -f(\mathbf{x}_0)}{t}= \left(\frac{\partial y_1}{\partial x_j}, \frac{\partial y_2}{\partial x_j},\ldots, \frac{\partial y_m}{\partial x_j}\right) \end{equation*}

3.

Explain equation (4.3). How does this equation follow from the previous?

4.

Explain equation (4.5). How is it the same as the chain rule (4.4)?