
Section 4 Linear algebra in multivariable calculus

Subsection 4.1 Differentiability

A function \(f\colon \R^n\to \R^m\) is differentiable at the point \(\mathbf{x}_0\) if there exists a linear transformation \(L\colon \R^n\to \R^m\) such that

\begin{equation} \lim_{\mathbf{h}\to \mathbf{0}} \frac{f(\mathbf{x}_0+ \mathbf{h}) - f(\mathbf{x}_0)- L\mathbf{h}}{\|\mathbf{h}\|} = \mathbf{0}.\label{diffabilitydef}\tag{4.1} \end{equation}

If such an \(L\) exists, it is called the derivative of \(f\) at \(\mathbf{x}_0\text{,}\) denoted \(Df(\mathbf{x}_0)\text{.}\)
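As a rough numerical illustration of the limit in (4.1), the sketch below evaluates the ratio \(\|f(\mathbf{x}_0+\mathbf{h}) - f(\mathbf{x}_0) - L\mathbf{h}\| / \|\mathbf{h}\|\) for shrinking \(\mathbf{h}\text{.}\) The map `f`, the point `(1, 2)`, the direction of `h`, and the helper names are all assumptions chosen for this example, and the candidate `L` is built from partial derivatives computed by hand. A shrinking ratio is consistent with differentiability but does not prove it.

```python
import math

def f(x, y):
    # Hypothetical example map f: R^2 -> R^2, chosen only for illustration.
    return (x * x * y, x + math.sin(y))

def L(x0, y0, h1, h2):
    # Candidate linear map at (x0, y0), built from hand-computed partials of f.
    return (2 * x0 * y0 * h1 + x0 * x0 * h2,
            h1 + math.cos(y0) * h2)

def norm(v):
    # Euclidean norm of a tuple of numbers.
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(x0, y0, h):
    # ||f(x0 + h) - f(x0) - L h|| / ||h||, the norm of the quotient in (4.1).
    fx = f(x0, y0)
    fxh = f(x0 + h[0], y0 + h[1])
    Lh = L(x0, y0, h[0], h[1])
    r = tuple(a - b - c for a, b, c in zip(fxh, fx, Lh))
    return norm(r) / norm(h)

x0, y0 = 1.0, 2.0
scales = (1e-1, 1e-2, 1e-3)
# h = s * (0.6, -0.8) has length s, so h -> 0 as s -> 0 along one fixed direction.
ratios = [remainder_ratio(x0, y0, (s * 0.6, s * -0.8)) for s in scales]
for s, r in zip(scales, ratios):
    print(f"||h|| = {s:g}: ratio = {r:.3e}")
```

The printed ratios should shrink roughly in proportion to \(\|\mathbf{h}\|\text{.}\) The sketch samples only one direction of approach, whereas the definition requires the limit over all \(\mathbf{h}\to\mathbf{0}\text{.}\)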

To understand the definition of the derivative, start with the case \(n=m=1\text{.}\) The derivative of \(f\) at \(x_0\) is a number \(f'(x_0)\) such that

\begin{equation*} f(x_0 + h) -f(x_0) \approx f'(x_0)h \end{equation*}

for \(h\) near \(0\text{.}\) The meaning of "approximately equals \(\ldots\) for \(h\) near \(0\)" is made precise by using a limit. To generalize to higher dimensions, interpret \(f'(x_0)h\) as the value of a linear transformation that sends \(h\) to \(f'(x_0)h\text{.}\) The derivative \(Df(\mathbf{x}_0)\) of \(f\) at \(\mathbf{x}_0\) is a linear transformation such that

\begin{equation*} f(\mathbf{x}_0 + \mathbf{h}) -f(\mathbf{x}_0) \approx Df(\mathbf{x}_0)\mathbf{h} \end{equation*}

for \(\mathbf{h}\) near \(\mathbf{0}\text{.}\) Putting \(\mathbf{h} = t\mathbf{e}_j\text{,}\) this reads

\begin{equation*} f(\mathbf{x}_0 + t\mathbf{e}_j) -f(\mathbf{x}_0) \approx Df(\mathbf{x}_0)t\mathbf{e}_j \end{equation*}

for \(t\) near \(0\text{.}\) Dividing both sides by \(t\) and taking a limit, we get an expression for \(Df(\mathbf{x}_0)\mathbf{e}_j\text{.}\)

\begin{equation} Df(\mathbf{x}_0)\mathbf{e}_j = \lim_{t\to 0} \frac{f(\mathbf{x}_0 + t\mathbf{e}_j) -f(\mathbf{x}_0)}{t} = \left(\frac{\partial y_1}{\partial x_j}, \frac{\partial y_2}{\partial x_j},\ldots, \frac{\partial y_m}{\partial x_j}\right)\label{dfofej}\tag{4.2} \end{equation}

where \(\mathbf{y}=(y_1,y_2,\ldots,y_m) = f(x_1,x_2,\ldots,x_n) = f(\mathbf{x})\text{.}\) From this it follows that \(Df(\mathbf{x}_0)\text{,}\) if it exists, is represented by the matrix \(\left[\frac{\partial y_i}{\partial x_j}\right]\text{.}\)

\begin{equation} [Df(\mathbf{x}_0)] = \left[\frac{\partial y_i}{\partial x_j}\right]\label{dfmat}\tag{4.3} \end{equation}
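To see (4.2) and (4.3) at work, the following sketch builds an approximate derivative matrix one column at a time from the difference quotient in (4.2), then compares it with the matrix of partial derivatives. The map `f`, the point `x0`, the step `t`, and the function names are assumptions made for this illustration, and a small fixed `t` stands in for the limit.

```python
import math

def f(x):
    # Hypothetical example map f: R^2 -> R^3 (illustration only).
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x1 + x2 * x2]

def jacobian_fd(f, x0, t=1e-6):
    # Column j approximates Df(x0) e_j by the difference quotient in (4.2),
    # using a small fixed t in place of the limit t -> 0.
    y0 = f(x0)
    m, n = len(y0), len(x0)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xt = list(x0)
        xt[j] += t                      # x0 + t * e_j
        yt = f(xt)
        for i in range(m):
            J[i][j] = (yt[i] - y0[i]) / t
    return J

def jacobian_exact(x0):
    # Matrix of partials [dy_i/dx_j] as in (4.3), computed by hand for this f.
    x1, x2 = x0
    return [[x2, x1],
            [math.cos(x1), 0.0],
            [1.0, 2 * x2]]

x0 = [1.0, 2.0]
J_fd = jacobian_fd(f, x0)
J_ex = jacobian_exact(x0)
max_err = max(abs(J_fd[i][j] - J_ex[i][j]) for i in range(3) for j in range(2))
print("largest entrywise difference:", max_err)
```

Row \(i\) corresponds to the output \(y_i\) and column \(j\) to the input \(x_j\text{,}\) matching the index convention in (4.3).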

Subsection 4.2 The Chain Rule

Consider the composition of functions

\begin{equation*} \R^p \stackrel{g}{\longrightarrow} \R^n \stackrel{f}{\longrightarrow} \R^m \end{equation*}

and suppose \(g\) is differentiable at \(\mathbf{t}_0\) and \(f\) is differentiable at \(\mathbf{x}_0=g(\mathbf{t}_0)\text{.}\) The chain rule says that \(f\circ g\) is differentiable at \(\mathbf{t}_0\text{,}\) and that the derivative of the composition is the composition of the derivatives.

\begin{equation} D(f\circ g)(\mathbf{t}_0) = Df(\mathbf{x}_0) Dg(\mathbf{t}_0)\label{chainrule}\tag{4.4} \end{equation}

This explains the "tree diagram rule" given in most multivariate calculus texts. The partial derivative \(\frac{\partial y_i}{\partial t_j}\) is just the \(i,j\) entry of the product of the derivative matrices for \(f\) and \(g\text{.}\)

\begin{equation} \frac{\partial y_i}{\partial t_j} = \sum_{k=1}^n \frac{\partial y_i}{\partial x_k}\frac{\partial x_k}{\partial t_j}\label{chainrulematrixmult}\tag{4.5} \end{equation}
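The chain rule (4.4) can be checked numerically on a small example. The sketch below multiplies hand-computed derivative matrices for two maps and compares the product with a central-difference approximation to the derivative of the composition. The maps `f` and `g`, the point `t0`, and all helper names are assumptions for this illustration only.

```python
import math

def g(t):
    # Hypothetical g: R^2 -> R^2 (illustration only).
    t1, t2 = t
    return [t1 * t2, t1 + t2]

def f(x):
    # Hypothetical f: R^2 -> R^2 (illustration only).
    x1, x2 = x
    return [x1 * x1 + x2, math.exp(x1) * x2]

def Dg(t):
    # Hand-computed matrix of partials [dx_k/dt_j] for g.
    t1, t2 = t
    return [[t2, t1],
            [1.0, 1.0]]

def Df(x):
    # Hand-computed matrix of partials [dy_i/dx_k] for f.
    x1, x2 = x
    return [[2 * x1, 1.0],
            [math.exp(x1) * x2, math.exp(x1)]]

def matmul(A, B):
    # Entry (i, j) is sum over k of A[i][k] * B[k][j], the sum in (4.5).
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def jacobian_central(F, t0, eps=1e-6):
    # Central-difference approximation to the derivative matrix of F at t0.
    m, n = len(F(t0)), len(t0)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        tp, tm = list(t0), list(t0)
        tp[j] += eps
        tm[j] -= eps
        yp, ym = F(tp), F(tm)
        for i in range(m):
            J[i][j] = (yp[i] - ym[i]) / (2 * eps)
    return J

t0 = [0.5, -1.0]
x0 = g(t0)                                   # x0 = g(t0), as in the text
product = matmul(Df(x0), Dg(t0))             # right-hand side of (4.4)
numeric = jacobian_central(lambda t: f(g(t)), t0)
max_err = max(abs(product[i][j] - numeric[i][j])
              for i in range(2) for j in range(2))
print("largest entrywise difference:", max_err)
```

Note that `Df` is evaluated at `x0 = g(t0)`, not at `t0`, and that the order of the factors in `matmul(Df(x0), Dg(t0))` matches the order in (4.4).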

Exercises 4.3 Exercises


Verify that the definition of a differentiable function given in (4.1) is equivalent to the usual definition for \(n=m=1\) from Calculus 1.


Explain equation (4.2). Why does the limit on the left equal the vector on the right?

\begin{equation*} \lim_{t\to 0} \frac{f(\mathbf{x}_0 + t\mathbf{e}_j) -f(\mathbf{x}_0)}{t}= \left(\frac{\partial y_1}{\partial x_j}, \frac{\partial y_2}{\partial x_j},\ldots, \frac{\partial y_m}{\partial x_j}\right) \end{equation*}

Explain equation (4.3). How does this equation follow from (4.2)?


Explain equation (4.5). How is it the same as the chain rule (4.4)?