Examples and counterexamples in mathematics/Real-valued functions of one real variable

From testwiki
Jump to navigation Jump to search

Polynomials

Template:Wikipedia-inline

Polynomial with infinitely many roots

The zero polynomial P(x)=0; every number is a root of P. This is the only polynomial with infinitely many roots. A non-zero polynomial is of some degree n (n may be 0,1,2,...) and cannot have more than n roots since, by a well-known theorem of algebra, if P(a1)==P(am)=0 (for pairwise different a1,,am), then necessarily P(x)=(xa1)(xam)Q(x) for some non-zero polynomial Q of degree nm0.

Integer values versus integer coefficients

Template:Wikipedia-inline Every polynomial P with integer coefficients is integer-valued, that is, its value P(k) is an integer for every integer k; but the converse is true only for first degree polynomials (linear functions). For example, the polynomial P2(x)=12x212x=12x(x1) takes on integer values whenever x is an integer. That is because one of x and x-1 must be an even number. The values P2(k)=(k2) are the binomial coefficients. Template:Wikipedia-inline More generally, for every n=0,1,2,3,... the polynomial Pn(x)=1n!x(x1)(xn+1) is integer-valued; Pn(k)=(kn) are the binomial coefficients. In fact, every integer-valued polynomial is an integer linear combination of these Pn.

Polynomial mimics cosine: interpolation

Cosine: notable values
Cosine: notable values

The cosine function, f(x)=cosx, satisfies f(0)=1, f(π/6)=3/2, f(π/4)=2/2, f(π/3)=1/2, f(π/2)=0; also f(x)=f(x) and f(x+π)=f(x) for all x, which gives infinitely many x such that f(x) is one of the numbers ±1,±3/2,±2/2,±1/2,0; that is, infinitely many points on the graph. A polynomial cannot satisfy all these conditions, because |P(x)| as x± for every non-constant polynomial P; can it satisfy a finite portion of them?

Let us try to find P that satisfies the five conditions P(0)=1, P(π/6)=3/2, P(π/4)=2/2, P(π/3)=1/2, P(π/2)=0. For convenience we rescale x letting P(x)=Q(12πx) and rewrite the five conditions in terms of Q: Q(0)=1, Q(2)=3/2, Q(3)=2/2, Q(4)=1/2, Q(6)=0. In order to find such Q we use Lagrange polynomials. Template:Wikipedia-inline

Lagrange interpolation polynomial
Lagrange interpolation polynomial

Using the polynomial (x)=x(x2)(x3)(x4)(x6) of degree 5 with roots at the given points 0, 2, 3, 4, 6 and taking into account that (0)=(2)(3)(4)(6)=144 (check it by differentiating the product) we consider the so-called Lagrange basis polynomial (x)(0)x=1144(x2)(x3)(x4)(x6) of degree 4 with roots at 2, 3, 4, 6 (but not 0); the division in the left-hand side is interpreted algebraically as division of polynomials (that is, finding a polynomial whose product by the denominator is the numerator (x)) rather than division of functions, thus, the quotient is defined for all x, including 0. Its value at 0 is 1. Think, why; see the picture; recall that (0)=limx0(x)x.

Similarly, the second Lagrange basis polynomial (x)(2)(x2)=116x(x3)(x4)(x6) takes the values 0, 1, 0, 0, 0 (at 0, 2, 3, 4, 6 respectively). The third one (x)(3)(x3)=19x(x2)(x4)(x6) takes the values 0, 0, 1, 0, 0. And so on (calculate the fourth and fifth). It remains to combine these five Lagrange basis polynomials with the coefficients equal to the required values of Q:

Q(x)=Q(0)(x)(0)x+Q(2)(x)(2)(x2)+Q(3)(x)(3)(x3)+Q(4)(x)(4)(x4)+Q(6)(x)(6)(x6)=1144(x2)(x3)(x4)(x6)32116x(x3)(x4)(x6)++2219x(x2)(x4)(x6)12116x(x2)(x3)(x6).
Polynomial approximation to cosine
Polynomial approximation to cosine

Finally, cosxP(x)=Q(12πx). As we see on the picture, the two functions are quite close for 0<x<0.5π; in fact, the greatest |P(x)cosx| for these x is about 0.00029.

A better approximation can be obtained via derivative. The derivative of f(x)=cosx being f(x)=sinx, we have f(0)=0, f(π/6)=1/2, f(π/4)=2/2, f(π/3)=3/2, f(π/2)=1. The corresponding derivatives P(x) are close but different; for instance,

P(π2)=12πQ(6)=12π(114443232116632+221964212116643)=2π2732π+322π27π0.99561.

In order to fix the derivative without spoiling the values we replace Q(x) with Q(x)+(x)R(x) where R(x) is a polynomial of degree 4 such that the derivative of Q(12πx)+(12πx)R(12πx) is equal to f(x) for x=0,π6,π4,π3,π2; it means, 12πQ(12πx)+12π(12πx)R(12πx)=f(x), since (12πx)=0 for these x; so,

R(x)=π12f(π12x)Q(x)(x) for x=0,2,3,4,6.

We find such R as before:

R(x)=R(0)(x)(0)x+R(2)(x)(2)(x2)+R(3)(x)(3)(x3)+R(4)(x)(4)(x4)+R(6)(x)(6)(x6),
Polynomial approximation to cosine
Polynomial approximation to cosine

and get a better approximation P2(x)=P(x)+(12πx)R(12πx); in fact, max0<x<π/2|P2(x)cosx|4.21010. If you still want a smaller error, try second derivative and Q(x)+(x)R(x)+2(x)S(x).

Polynomial mimics cosine: roots

The cosine function, f(x)=cosx, satisfies f(0)=1 and has infinitely many roots: f(±0.5π)=f(±1.5π)=f(±2.5π)==0. A polynomial cannot satisfy all these conditions; can it satisfy a finite portion of them?

It is easy to find a polynomial P such that P(0)=1 and P(±0.5π)=0, namely P1(x)=4π2(x20.25π2) (check it). What about P(0)=1 and P(±0.5π)=P(±1.5π)=0?

The conditions being insensitive to the sign of x, we seek a polynomial of x2, that is, P(x)=Q(x2) where Q satisfies Q(0)=1 and Q(0.25π2)=Q(2.25π2)=0. It is easy to find such Q, namely, Q(x)=169π4(x0.25π2)(x2.25π2) (check it), which leads to

Polynomial approximation to cosine
Polynomial approximation to cosine

P2(x)=169π4(x20.25π2)(x22.25π2). As we see on the picture, the two functions are rather close for 0.5π<x<0.5π; in fact, the greatest |P2(x)cosx| for these x is about 0.028, while the greatest |P1(x)cosx| (for these x) is about 0.056.

Polynomial approximation to cosine
Polynomial approximation to cosine

The next step in this direction: P3(x)=64225π6(x20.25π2)(x22.25π2)(x26.25π2) =(14x2π2)(14x29π2)(14x225π2); max0.5π<x<0.5π|P3(x)cosx|0.019.

And so on. For every n=1,2, the polynomial

Pn(x)=(14x2π2)(14x29π2)(14x2(2n1)2π2)=k=1n(14x2(2k1)2π2)

satisfies Pn(0)=1 and Pn(±0.5π)=Pn(±1.5π)==Pn(±(n0.5)π)=0, which is easy to check. It is harder (but possible) to prove that Pn(x)cosx as n, which represents the cosine as an infinite product

cosx=(14x2π2)(14x29π2)=k=1(14x2(2k1)2π2).

Template:Wikipedia-inline

Polynomial approximation to cosine
Polynomial approximation to cosine

On the other hand, the well-known power series cosx=1x22+x424=k=0(1)kx2k(2k)! gives another sequence of polynomials Qn(x)=1x22+x424+(1)nx2n(2n)!=k=0n(1)kx2k(2k)! converging to the same cosine function. See the picture for Q3; max0.5π<x<0.5π|Q3(x)cosx|0.0009. Template:Wikipedia-inline Can we check the equality (14x2π2)(14x29π2)=1x22+x424 by opening the brackets? Let us try. The constant coefficient: just 1=1. The coefficient of x2: 4π249π2=12, that is, 1+132+152+=π28; really? Yes, 1+132+152+=(1+122+132+)(122+142+162+)=(1+122+132+)122(1+122+132+)=34π26=π28; the well-known series of reciprocal squares is instrumental. Template:Wikipedia-inline Such non-rigorous opening of brackets can be made rigorous as follows. For every polynomial P, the constant coefficient is the value of P at zero, P(0); the coefficient of x is the value at zero of the derivative, P '(0); and the coefficient of x2 is one half of the value at zero of the second derivative, ½P''(0). Clearly, Qn(0)=1=f(0), Q'n(0)=0=f(0) and Q'n(0)=1=f(0) for all n1 (as before, f(x)=cosx). The calculation above shows that P'n(0)1=f(0) as n tends to infinity. What about higher derivative, Pn(k)(0), does it converge to f(k)(0)? It is tedious (if at all possible) to generalize the above calculation to k=4,6,...; fortunately, there is a better approach. Namely, Pn(z)f(z) for all complex numbers z, and moreover, max|z|R|Pn(z)f(z)|0 for every R>0. Using Cauchy's differentiation formula one concludes that max|z|R|Pn(k)(z)f(k)(z)|0 (as n) for each k, and in particular, Pn(k)(0)f(k)(0). Template:Wikipedia-inline   See also Derivative of Uniform Limit of Analytic Functions at Proofwiki.

Limit of derivatives versus derivative of limit

Pn(x)=x(1x2)n.

For 1x1 we have Pn(x)0 (think, why) as n. Nevertheless, the derivative at zero does not converge to 0; rather, it is equal to 1 (for all n) since, denoting Qn(x)=(1x2)n, we have P'n(x)=(xQn(x))=1Qn(x)+xQ'n(x); P'n(0)=Qn(0)=1.

Derivative of limit
Derivative of limit

Thus, the limit of the sequence of functions (P1,P2,) on the interval [1,1] is the zero function, hence the derivative of the limit is the zero function as well. However, the limit of the sequence of derivatives (P'1,P'2,) fails to be zero at least for x=0. What happens for 0<|x|1? Here the limit of derivatives is zero, since P'n(x)=(1(2n+1)x2)(1x2)n10 (check it; the exponential decay of the second factor outweighs the linear growth of the first factor). Thus,

0=(limnPn(x))limnP'n(x)=f(x)={0for 1x<0,1for x=0,0for 0<x1.

It is not always possible to interchange derivative and limit.

Note the two equivalent definition of the function f; one is piecewise (something for some x, something else for other x, ...), but the other is a single expression f(x)=limn(1(2n+1)x2)(1x2)n1 for all these x. Template:Wikipedia-inline The function f is discontinuous (at 0), and nevertheless it is the limit of continuous functions P'n. This can happen for pointwise convergence (that is, convergence at every point of the considered domain), since the speed of convergence can depend on the point. Template:Wikipedia-inline Otherwise (when the speed of convergence does not depend on the point) convergence is called uniform; by the uniform convergence theorem, the uniform limit of continuous functions is a continuous function. Template:Wikipedia-inline It follows that convergence of P'n (to f) is non-uniform, but this is a proof by contradiction. Template:Wikipedia-inline A better understanding may be gained from a direct proof. The derivatives P'n fail to converge uniformly, since P'n(x) fails to be small (when n is large) for some x close to 0; for instance, try x=cn1:

max0<x1P'n(x)P'n(cn1)=(1(2n+1)cn1)12c(1cn1)n1ec(12c)ec

for all c>0; and (12c)ec is not zero (unless c=0.5).

In contrast, Pn0 uniformly on [1,1], that is, max1x1|Pn(x)|0 as n, since the maximum is reached at x=±12n+1 (check it by solving the equation P'n(x)=0) and 0Pn(12n+1)12n+10. And still, it appears to be impossible to interchange derivative and limit. Compare this case with a well-known theorem:

If (fn) is a sequence of differentiable functions on [a,b] such that limnfn(x0) exists (and is finite) for some x0[a,b] and the sequence (f'n) converges uniformly on [a,b], then fn converges uniformly to a function f on [a,b], and f(x)=limnf'n(x) for x[a,b].

Template:Wikipedia-inline Uniform convergence of derivatives f'n is required; uniform convergence of functions fn is not enough.

Complex numbers, helpful in Sect. "Polynomial mimics cosine: roots", are helpless here, since for z=±ci we have |P(z)|=c(1+c2)n for all c>0.

Monster functions

Continuous monster

Everyone knows how to visualize the behavior of a continuous function by a curve on the coordinate plane. To this end one samples enough points (x,f(x)) on the graph of the function, plots them on the (x,y) plane and, connecting these points by a curve, sketches the graph. However, this seemingly uncontroversial idea is challenged by some strange functions, sometimes called "continuous monsters". Often they are sums of so-called lacunary trigonometric series

f(x)=a0+n=1(ansin(λnx)+bncos(λnx)) (where the numbers λn are far apart).

Template:Wikipedia-inline    (Also Lacunary trigonometric series at EoM.)
The most famous of these is the Weierstrass function

f(x)=n=0ancos(bnπx) (for appropriate a,b such that 1ba<1).

Template:Wikipedia-inline According to Jarnicki and Pflug (Section 3.1), this is Ca,b(12x) (and the similar series of sine functions is Sa,b).

Weierstrass function

This image (reproduced here by Michael McLaughlin with reference to the above-mentioned Wikipedia article, and here by Richard Lipton without reference) is in fact a graph of the approximation f7(x)=n=0712ncos(3nπx) to the function f(x)=C0.5,3(12x) obtained by connecting 3100 sample points (which can be seen by inspecting the XML code in the SVG file). It looks like a curve albeit rather strange one. The last two summands are distorted, since the step size 431000.0013 exceeds the period 2370.0009 of cos(37πx) and is close to half the period 2360.0027 of cos(36πx). However, all terms with n>5 contribute at most 26+27+28+=25=132, that is, |f(x)f5(x)|132 for all x; this difference is barely visible unless one zooms the image.

So far, so good. But this monster is not the worst. In order to get a more monstrous lacunary trigonometric series one may try frequencies λn increasing faster than 3n, or coefficients decreasing slower than 1/2n, or both.

Let us try the series

g(x)=n=01(n+1)(n+2)cos(3n2πx)=12cos2πx+16cos6πx+112cos18πx+;

these coefficients 1(n+1)(n+2)=1n+11n+2 are convenient, since their sum is (finite and) evident: n=01(n+1)(n+2)=12+16+112+=(112)+(1213)+(1314)+=1. Accordingly, g(0)=1 (since cos0=1), and 1g(x)1 for all x (since 1cosx1). Template:Multiple image Similarly, n=0N1(n+1)(n+2)=11N+2 and n=N+11(n+1)(n+2)=1N+2. Accordingly, partial sums gN(x)=n=0N1(n+1)(n+2)cos(3n2πx) and tails g(x)gN(x)=n=N+11(n+1)(n+2)cos(3n2πx) satisfy gN(0)=11N+2, g(0)gN(0)=1N+2 and |gN(x)|11N+2, |g(x)gN(x)|1N+2 for all x.

Due to evident symmetries (see the graphs of g0,g1,g2 on the period [0,1]) it is enough to plot this function on [0,1/4].

Partial sum g8(x), zoom 4

Looking closely at the graph of g8 we come to some doubt. At first glance, it is drawn with a thicker pen. But no; some (almost?) vertical lines are thin. So, what do we see here, a curve, or rather, the area between two "parallel" curves? Template:Multiple image The graph of g8 becomes clearly visible after a zoom, but the doubt returns, hardened, with g10. One more zoom only worsens. We realize that the graph of g8 on [0,1/4] looks too nice. In particular, it appears that the graph of g crosses the upper side of the red box. So, how to get closer to the graph of g(x)?

Fortunately, for some special values of x the exact value of g(x) is evident. First, g(0)=1 (see above), whence g(k)=1 for all integers k due to periodicity: g(x+1)=g(x) for all x. Similarly, g(k+12)=g(12)=1, since coskπ=1 for all odd integers k (in particular, k=3n). And on the other hand, 1g(x)1 for all x.

Without the first summand we have the first tail g(x)g0(x), with the period 13. Accordingly, g(13k)g0(13k)=g(0)g0(0)=12 and g(13(k+12))g0(13(k+12))=g(16)g0(16)=12 for all integers k. And on the other hand, 12g(x)g0(x)12 for all x. Thus, Template:Multiple image

g(13k)=g0(13k)+12, g(13(k+12))=g0(13(k+12))12 for all integers k,
g0(x)12g(x)g0(x)+12 for all x.

Similarly, for all N,

g(13N+1k)=gN(13N+1k)+1N+2, g(13N+1(k+12))=gN(13N+1(k+12))1N+2 for all integers k,
gN(x)1N+2g(x)gN(x)+1N+2 for all x.
Via g10(x), zoom 400; also g12(x).

Returning to g10(x) and g12(x) on [0.11,0.1125], we add special values and bounds via g10(x) and see that g(x) is much further from g12(x) than g12(x) from g10(x). We have more than 440 points on the red curve, and equally many points on the blue curve.

Graph of g(x), zoom 400.

Thus, if the horizontal size of picture is less than 440 pixels, then inevitably the graph of g(x) crosses all the pixels between the red curve and the blue curve! Within the given resolution the graph of g(x) does not look like a curve, but as the area between two parallel curves.

This, in itself, isn't surprising. For example, a 10 megahertz radio wave modulated by a 100 hertz sound may be described by the function h(t)=cos(1022πt)cos(1072πt). The graph of h(t) on, say, [0,0.1] does not look like a curve, but as the region between two curves, ±cos(1022πt). And on [0,104] it looks like a rectangular region. However, zoom ultimately helps; the graph of h(t) on [0,106] (or any other 1 microsecond interval) does look like a curve.

In contrast, for g(t) zoom never helps, it only worsens. In fact, it ultimately leads to a graph that looks like a rectangular region. This shows the monstrous nature of g(x). On the other hand, the height of the (nearly) rectangular region converges to zero as zoom tends to infinity; this shows the continuous nature of g(x).

Random sample of points on the graph of g(x)g10(x).

Visualization by a region-like graph leaves much to be desired. Unable to draw a curve-like graph, we still can do more. We can choose at random many values of x, compute the corresponding values y of the given function, and draw the points (x,y). This picture shows a sample of 37500 random points on the graph of the tail g(x)g10(x) in the first quarter of the period of this tail (which is sufficient due to the evident symmetries, as was noted). Truth be told, the function g310(x)g10(x) was used here as a satisfactory approximation of g(x)g10(x), and each x was specified by more than 300 ternary (that is, base 3) digits in order to compute cos(33102πx).

As before, the red line and the blue line are bounds of the given function g(x)g10(x) (via g15(x)g10(x)), and the points on these curves are special values. A wonder: all the 37500 random points are far from these bounds! Usually 50 points is enough to get a satisfactory picture of the maximum value of the function. But for this monster, 37500 points are few.

Generally, for the sum of a lacunary trigonometric series, it is quite a challenge, to find its maximum and minimum (on a given interval) even approximately, say, with relative error less than 10%. Our choice of frequencies λn=3n, in concert with our use of cosine rather than sine, allow us to find unusually high values of the sum. In order to fully appreciate this good luck, try to maximize such function as n=01(n+1)(n+2)sin(3n2πx), or to minimize n=01(n+1)(n+2)cos(4n2πx), and you will realize, why Weierstrass preferred cosine functions and frequencies of the form bn where b is a positive odd integer. Usual numerical optimization methods fail because local extrema are numerous and very sharp. By continuity, a monster function is close to its maximum in some neighborhood of its maximizer, but this neighborhood is very small; a random point has very little chance of getting into such neighborhood. The vast majority of values are far from extreme.

Increasing rearrangement of sample points on the graph of g(x)g10(x).

In order to investigate the distribution of values of the tail g(x)g10(x) on its period, one may took the 37500 values y in the first quarter of the period (used above), together with the 37500 opposite numbers (y) (these are values in the second quarter), sort the list of all these 75000 numbers in ascending order, and treat the result as values of a new function at equally spaced points. Here is the result plotted, see the red curve. This is a numeric approximation of the so-called monotone (increasing) rearrangement of the given function.

Significantly, the result is quite close to the famous normal distribution 𝒩(0,σ2), and the red curve is quite close to the corresponding quantile function σΦ1(p) indicated by the black points. Template:Wikipedia-inline Template:Wikipedia-inline (Here σ2=01(g310(x)g10(x))2dx=12n=11310(1(n+1)(n+2))20.009812.) A wonder: the function is monstrous, but its monotone rearrangement is nice. Why so?

The given function is the sum of many summands (of the form 1(n+1)(n+2)cos(3n2πx)); and "The central limit theorem states that under certain (fairly common) conditions, the sum of many random variables will have an approximately normal distribution." Template:Wikipedia-inline Choosing a point x[0,1] at random according to the uniform distribution we may treat the summands as random variables. However, one of the conditions is independence; and our summands are dependent, moreover, functionally dependent, since cos(3θ)=4cos3θ3cosθ. (More generally, cosnθ=Tn(cosθ), Tn being the Chebyshev polynomial.) Nevertheless, probabilistic approach to lacunary trigonometric series exists and bears fruit; in particular, appearance of normal distribution in this context is a well-know phenomenon. Template:Wikipedia-inline

But clearly, the normal approximation must fail somewhere, since our function is bounded (by 1/12), while a normal random variable is not.

The word "central" in the name of the central limit theorem is interpreted in two ways: (1) of central importance in probability theory; (2) describes the center of the distribution as opposed to its tails, that is, large deviations.

Increasing rearrangement of g(x)g10(x), zoom 40.

Template:Wikipedia-inline Template:Wikipedia-inline In the normal distribution, deviations above mean+3σ appear with the probability 0.0013; accordingly, for 0<x<1, the inequality g(x)g10(x)>3σ holds on intervals of total length 0.0013, and a random sample of size 75000 should contain about 0.0013×75000100 such values. This is indiscernible on the previous picture, but clearly visible on the zoomed picture.

Increasing rearrangement of g(x)g10(x), log scale.

For 4σ the normal probability is only 0.000032, and 0.000032×75000 is only 2.4; in order to see what happens for 4σ and 5σ we need larger samples and logarithmic scale. This last picture presents a sample of size 108=100000000 (that took a computer several hours). We see that the distribution is close to normal at 4σ but moves away from normal at 5σ. Hopefully, the approximate normality follows from some moderate deviations theory that applies at 4σ, while the departure from normality follows from some large deviations theory that applies at 5σ and further; but for now, in spite of the recent progress in the probabilistic approach to lacunary series, such theories are not available, and the situation near the big red question mark on the picture remains unknown. It is natural to guess that is this domain the probability of tσ deviation is smaller than its normal approximation, therefore, smaller than exp(t2/2).

Assuming that this conjecture is true, and taking into account that 0.9112>7.6σ and exp(7.62/2)<31013, we conclude that the function g(x)g10(x) exceeds 90% of its maximum on intervals of total length less (probably, much less) than 31013.

But, again, this monster is not the worst. In order to get a more monstrous lacunary trigonometric series one may try frequencies λn increasing faster than 3n, or coefficients decreasing slower than 1/n2, or both. Unexpectedly, or not so unexpectedly, such functions, being more monstrous analytically, are more tractable probabilistically. (In the paper by Delbaen and Hovhannisyan, note the coefficients 1/nα for α<12 in Remark 2.3; note also the "big gaps" theorems 1.4, 2.15, 2.16 for λn+1λn.) This is a special case of a general phenomenon formulated by De Bruijn as follows:

It often happens that we want to evaluate a certain number [...] so that the direct method is almost prohibitive [...] we should be very happy to have an entirely different method [...] giving at least some useful information to it. And usually this new method gives (as remarked by Laplace) the better results in proportion to its being more necessary [...]

Template:BookCat