Calculus/The chain rule and Clairaut's theorem


Theorem

Let $g \colon \mathbb{R}^m \to \mathbb{R}^n$ be a function and let $f \colon \mathbb{R}^n \to \mathbb{R}^l$ be another function. Assume that $g$ is differentiable at $x_0$ and that $f$ is differentiable at $g(x_0)$.

Then $f \circ g$ is differentiable at $x_0$ and

\[ (f \circ g)'(x_0) = f'(g(x_0)) \, g'(x_0) \]
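
For concreteness, here is a small worked instance of the theorem (an added illustration, with $m = n = 2$ and $l = 1$): let $g(x,y) = (x^2, xy)$ and $f(u,v) = u + v^2$, so that $(f \circ g)(x,y) = x^2 + x^2 y^2$. Computing the Jacobians and multiplying,

\[ f'(g(x,y)) \, g'(x,y) = \begin{pmatrix} 1 & 2xy \end{pmatrix} \begin{pmatrix} 2x & 0 \\ y & x \end{pmatrix} = \begin{pmatrix} 2x + 2xy^2 & 2x^2 y \end{pmatrix}, \]

which agrees with differentiating $x^2 + x^2 y^2$ directly.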
Proof

We prove that $f'(g(x_0)) \, g'(x_0)$ is a valid differential of $f \circ g$ at $x_0$, thereby proving differentiability.

We begin by noting from the second triangle inequality that

\[ \left| \frac{\| g(x_0 + \mathbf{h}) - g(x_0) \|}{\| \mathbf{h} \|} - \frac{\| g'(x_0) \mathbf{h} \|}{\| \mathbf{h} \|} \right| \le \frac{\| g(x_0 + \mathbf{h}) - [g(x_0) + g'(x_0) \mathbf{h}] \|}{\| \mathbf{h} \|} \longrightarrow 0 \quad (\mathbf{h} \to 0) \]

and hence the boundedness of

\[ \frac{\| g'(x_0) \mathbf{h} \|}{\| \mathbf{h} \|} \le \frac{mn \max_{1 \le i \le n, \, 1 \le j \le m} |a_{i,j}| \, \| \mathbf{h} \|}{\| \mathbf{h} \|} = mn \max_{1 \le i \le n, \, 1 \le j \le m} |a_{i,j}| \]

implies that of

\[ \frac{\| g(x_0 + \mathbf{h}) - g(x_0) \|}{\| \mathbf{h} \|} \]

where $A = (a_{i,j})_{1 \le i \le n, \, 1 \le j \le m}$ is the matrix of $g'(x_0)$.
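
The bound on $\| g'(x_0) \mathbf{h} \|$ used here can be spelled out; the following short derivation (added for completeness) uses only the Cauchy–Schwarz inequality. For an $n \times m$ matrix $A = (a_{i,j})$ and $\mathbf{h} \in \mathbb{R}^m$,

\[ \| A \mathbf{h} \|^2 = \sum_{i=1}^n \Big( \sum_{j=1}^m a_{i,j} h_j \Big)^2 \le \sum_{i=1}^n \Big( \max_{i,j} |a_{i,j}| \sum_{j=1}^m |h_j| \Big)^2 \le n \Big( \max_{i,j} |a_{i,j}| \Big)^2 m \, \| \mathbf{h} \|^2, \]

so that $\| A \mathbf{h} \| \le \sqrt{mn} \, \max_{i,j} |a_{i,j}| \, \| \mathbf{h} \| \le mn \, \max_{i,j} |a_{i,j}| \, \| \mathbf{h} \|$.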

Now we note by the triangle inequality that

\begin{align*}
\frac{\| (f \circ g)(x_0 + \mathbf{h}) - [(f \circ g)(x_0) + f'(g(x_0)) \, g'(x_0) \mathbf{h}] \|}{\| \mathbf{h} \|}
&\le \frac{\| f(g(x_0 + \mathbf{h})) - [f(g(x_0)) + f'(g(x_0))[g(x_0 + \mathbf{h}) - g(x_0)]] \|}{\| \mathbf{h} \|} \\
&\quad + \frac{\| f(g(x_0)) + f'(g(x_0))[g(x_0 + \mathbf{h}) - g(x_0)] - [(f \circ g)(x_0) + f'(g(x_0)) \, g'(x_0) \mathbf{h}] \|}{\| \mathbf{h} \|}
\end{align*}

We shall first treat the first summand, which is the more difficult one, though not by much. We rewrite it as

\[ \frac{\| f(g(x_0 + \mathbf{h})) - [f(g(x_0)) + f'(g(x_0))[g(x_0 + \mathbf{h}) - g(x_0)]] \|}{\| g(x_0 + \mathbf{h}) - g(x_0) \|} \cdot \frac{\| g(x_0 + \mathbf{h}) - g(x_0) \|}{\| \mathbf{h} \|} \]

The latter factor is bounded due to the above considerations, and the first one converges to $0$ as $\mathbf{h} \to 0$, since then $g(x_0 + \mathbf{h}) \to g(x_0)$ by the same boundedness (multiply by $\| \mathbf{h} \|$; in fact, differentiability thus implies continuity) and since $f$ is differentiable at $g(x_0)$.
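
The parenthetical step can be made explicit (a one-line computation added here): since $\| g(x_0 + \mathbf{h}) - g(x_0) \| / \| \mathbf{h} \|$ is bounded, say by a constant $C > 0$ for all sufficiently small $\mathbf{h}$ (the constant $C$ is introduced only for this remark), we get

\[ \| g(x_0 + \mathbf{h}) - g(x_0) \| = \frac{\| g(x_0 + \mathbf{h}) - g(x_0) \|}{\| \mathbf{h} \|} \cdot \| \mathbf{h} \| \le C \, \| \mathbf{h} \| \longrightarrow 0 \quad (\mathbf{h} \to 0), \]

which is the claimed continuity of $g$ at $x_0$.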

Now for the second summand, which, by elementary cancellation and linearity of differentials, equals

\[ \frac{\| f'(g(x_0)) \big[ [g(x_0 + \mathbf{h}) - g(x_0)] - g'(x_0) \mathbf{h} \big] \|}{\| \mathbf{h} \|} \le \frac{nl \max_{1 \le i \le l, \, 1 \le j \le n} |b_{i,j}| \, \big\| [g(x_0 + \mathbf{h}) - g(x_0)] - g'(x_0) \mathbf{h} \big\|}{\| \mathbf{h} \|} \]

where $B = (b_{i,j})_{1 \le i \le l, \, 1 \le j \le n}$ is the matrix of the differential of $f$ at $g(x_0)$. This goes to $0$ as $\mathbf{h} \to 0$ due to the definition of the differential of $g$. Hence both summands converge to $0$, which proves that $f'(g(x_0)) \, g'(x_0)$ is a valid differential of $f \circ g$ at $x_0$.

The gradient

The first application of the chain rule that we shall present has to do with the gradient, which is defined for functions $f \colon \mathbb{R}^n \to \mathbb{R}$, that is, functions whose image is one-dimensional (in the special case $n = 2$, the graph of such a function looks like a mountain landscape over the plane $\mathbb{R}^2$).

Definition

Let $f \colon \mathbb{R}^n \to \mathbb{R}$ be differentiable. Then the column vector

\[ \nabla f(x) := \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{pmatrix} \]

is called the gradient.
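
For instance (a small example added for illustration), for $f \colon \mathbb{R}^2 \to \mathbb{R}$, $f(x,y) = x^2 y$, we get

\[ \nabla f(x,y) = \begin{pmatrix} 2xy \\ x^2 \end{pmatrix}. \]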

Theorem:

Let $f, g \colon \mathbb{R}^n \to \mathbb{R}$ be two functions totally differentiable at $x_0$. Since they both map to $\mathbb{R}$, their product is defined, and we have

\[ \nabla (fg) = f \nabla g + g \nabla f \]
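
Before the proof, a quick check on an example (added for illustration): with $f(x,y) = x^2$ and $g(x,y) = y$,

\[ \nabla (fg) = \nabla (x^2 y) = \begin{pmatrix} 2xy \\ x^2 \end{pmatrix} = x^2 \begin{pmatrix} 0 \\ 1 \end{pmatrix} + y \begin{pmatrix} 2x \\ 0 \end{pmatrix} = f \nabla g + g \nabla f. \]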

Proof:

Now one could compute this directly from the definition of the gradient and the usual one-dimensional product rule (which actually has the merit of not requiring total differentiability), but there is a clever trick using the chain rule, which I found in Terence Tao's lecture notes, on which my treatment of this material is based.

We simply define $h \colon \mathbb{R}^n \to \mathbb{R}^2, \; h(x) := (f(x), g(x))$ and $i \colon \mathbb{R}^2 \to \mathbb{R}, \; i(x,y) := xy$. Then the product $fg$ equals $i \circ h$. Now the differential of $i$ is given by the Jacobian matrix

\[ J_i(x,y) = \begin{pmatrix} y & x \end{pmatrix} \]

and the differential of h is given by the Jacobian matrix

\[ J_h(x) = \begin{pmatrix} \frac{\partial f}{\partial x_1}(x) & \cdots & \frac{\partial f}{\partial x_n}(x) \\ \frac{\partial g}{\partial x_1}(x) & \cdots & \frac{\partial g}{\partial x_n}(x) \end{pmatrix} \]

Hence, the chain rule implies that the differential of $i \circ h$ at $x$ is given by

\[ \begin{pmatrix} g(x) & f(x) \end{pmatrix} \begin{pmatrix} \frac{\partial f}{\partial x_1}(x) & \cdots & \frac{\partial f}{\partial x_n}(x) \\ \frac{\partial g}{\partial x_1}(x) & \cdots & \frac{\partial g}{\partial x_n}(x) \end{pmatrix} \]
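
Carrying out this matrix multiplication (a step written out here for completeness) yields the row vector

\[ \begin{pmatrix} g(x) \frac{\partial f}{\partial x_1}(x) + f(x) \frac{\partial g}{\partial x_1}(x) & \cdots & g(x) \frac{\partial f}{\partial x_n}(x) + f(x) \frac{\partial g}{\partial x_n}(x) \end{pmatrix} = \big( f(x) \nabla g(x) + g(x) \nabla f(x) \big)^T. \]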

From the definition of the gradient we see that the differential is nothing but the transpose of the gradient (and vice versa, since taking the transpose is an involution), which proves the claim.

Now we shall use the chain rule to generalize a well-known theorem from one dimension, the mean value theorem, to several dimensions.

Theorem:

Let $f \colon \mathbb{R}^n \to \mathbb{R}$ be totally differentiable, and let $a, b \in \mathbb{R}^n$. Then there exists $t \in [0, 1]$ such that

\[ f(b) - f(a) = \langle b - a, \nabla f(t b + (1 - t) a) \rangle \]

where $\langle \cdot, \cdot \rangle$ is the standard scalar product on $\mathbb{R}^n$.
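
As an illustration (an added example, not part of the original text): take $f(x,y) = x^2 + y^2$, $a = (0,0)$ and $b = (1,1)$. Then $f(b) - f(a) = 2$, while

\[ \langle b - a, \nabla f(t b + (1 - t) a) \rangle = \langle (1,1), (2t, 2t) \rangle = 4t, \]

so the conclusion holds precisely for $t = \tfrac{1}{2}$.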

Proof:

This is actually a straightforward application of the chain rule.

We set

\[ g(\lambda) := f((1 - \lambda) a + \lambda b) \]

thus $g(0) = f(a)$ and $g(1) = f(b)$. By the one-dimensional mean value theorem,

\[ g(1) - g(0) = g'(t) \]

for a suitable $t \in [0, 1]$. Now by the chain rule,

\[ g'(t) = \langle b - a, \nabla f(t b + (1 - t) a) \rangle. \]
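
In detail (the chain rule computation, expanded here): writing $g = f \circ \gamma$ with $\gamma(\lambda) := (1 - \lambda) a + \lambda b$, we have $\gamma'(\lambda) = b - a$, and hence

\[ g'(t) = J_f(\gamma(t)) \, \gamma'(t) = \nabla f(\gamma(t))^T (b - a) = \langle b - a, \nabla f(t b + (1 - t) a) \rangle. \]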

Clairaut's theorem

The next theorem shows that the order of differentiation does not matter, provided that the function in question is sufficiently differentiable. We will not need the general chain rule or any of its consequences in the course of the proof, but we will use the one-dimensional mean value theorem.

Theorem (Clairaut):

Let $f \colon \mathbb{R}^n \to \mathbb{R}$ be a function whose second-order partial derivatives exist and are continuous. Then for all $1 \le i, j \le n$,

\[ \partial_j \partial_i f = \partial_i \partial_j f \]
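
A quick concrete check of the statement (an added example): for $f(x,y) = x y^2 + e^{xy}$ one computes

\[ \partial_y \partial_x f = \partial_y \big( y^2 + y e^{xy} \big) = 2y + e^{xy} + xy \, e^{xy} = \partial_x \big( 2xy + x e^{xy} \big) = \partial_x \partial_y f. \]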

Proof:

We begin with the following lemma:

Lemma:

\[ \partial_j \partial_i f(x) = \lim_{\delta \to 0} \frac{f(x) - f(x + \delta e_i) - f(x + \delta e_j) + f(x + \delta (e_i + e_j))}{\delta^2} \]
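
To see what the lemma asserts (a sanity check added here), take $f(x,y) = xy$, $i = 1$, $j = 2$ and a point $(a,b)$: the numerator is

\[ ab - (a + \delta) b - a (b + \delta) + (a + \delta)(b + \delta) = \delta^2, \]

so the difference quotient is identically $1 = \partial_2 \partial_1 f(a,b)$.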

Proof: We first apply the fundamental theorem of calculus, in the form $f(y + \delta e_i) - f(y) = \int_0^\delta \partial_i f(y + s e_i) \, ds$, to the two differences $f(x + \delta (e_i + e_j)) - f(x + \delta e_j)$ and $f(x + \delta e_i) - f(x)$, and obtain that the above limit equals

\[ \lim_{\delta \to 0} \frac{1}{\delta^2} \left( \int_0^\delta \partial_i f(x + \delta e_j + s e_i) \, ds - \int_0^\delta \partial_i f(x + s e_i) \, ds \right) \]

Using integration by substitution ($s = \delta u$, then renaming $u$ back to $s$) and the linearity of the integral, we may rewrite this as

\[ \lim_{\delta \to 0} \frac{1}{\delta} \int_0^1 \left( \partial_i f(x + \delta e_j + \delta s e_i) - \partial_i f(x + \delta s e_i) \right) ds \]

Now we apply the mean value theorem in one variable (in the $e_j$ direction) to obtain

\[ \partial_i f(x + \delta e_j + \delta s e_i) - \partial_i f(x + \delta s e_i) = \delta \, \partial_j \partial_i f(x + \delta s e_i + t_\delta e_j) \]

for a suitable $t_\delta \in [0, \delta]$. Hence, the above limit equals

\[ \lim_{\delta \to 0} \int_0^1 \partial_j \partial_i f(x + \delta s e_i + t_\delta e_j) \, ds \]

This is the average of $\partial_j \partial_i f$ over a certain subset of $B_{2\delta}(x)$ and therefore converges to $\partial_j \partial_i f(x)$ by the continuity of $\partial_j \partial_i f$ (this can be proved rigorously by using

\[ \partial_j \partial_i f(x) = \int_0^1 \partial_j \partial_i f(x) \, ds, \]

subtracting the two integrals and applying the triangle inequality for integrals).
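
Explicitly (spelling out the suggested argument), the estimate reads

\[ \left| \int_0^1 \partial_j \partial_i f(x + \delta s e_i + t_\delta e_j) \, ds - \partial_j \partial_i f(x) \right| \le \int_0^1 \left| \partial_j \partial_i f(x + \delta s e_i + t_\delta e_j) - \partial_j \partial_i f(x) \right| ds \le \sup_{y \in B_{2\delta}(x)} \left| \partial_j \partial_i f(y) - \partial_j \partial_i f(x) \right|, \]

and the right-hand side tends to $0$ as $\delta \to 0$ by the continuity of $\partial_j \partial_i f$ at $x$.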

Now the expression in the lemma is totally symmetric in $i$ and $j$: interchanging them leaves the right-hand side unchanged, so the lemma also yields $\partial_i \partial_j f(x)$ as the value of the same limit, and Clairaut's theorem follows.
