Statistics/Preliminaries

From testwiki
Revision as of 05:44, 9 December 2022 by imported>LeoChiukl

Template:Nav This chapter discusses some preliminary knowledge (related to statistics) for the following chapters in the advanced part.

Empirical distribution

Template:Colored definition Template:Colored remark Since all these n random variables follow the same cdf as X, we may expect their distribution to be somewhat similar to the distribution of X, and indeed this is true. Before showing how this is true, we need to define "the distribution of these n random variables" more precisely, as follows: Template:Colored definition Template:Colored remark Template:Colored example Template:Colored remark Template:Colored theorem Template:Colored remark We have mentioned how we can approximate the cdf, and now we would like to estimate the Template:Colored em/Template:Colored em. Let us first discuss how to estimate the pmf.

For the discrete random variable <math>X</math>, from the empirical cdf, we know that each of <math>X_1,\dots,X_n</math> is "assigned" the probability <math>1/n</math>. Also, considering the previous example, the empirical pmf is <math>f_n(x)=\frac{\sum_{k=1}^n \mathbf{1}\{X_k=x\}}{n}</math>. Template:Colored remark To discuss the estimation of the pdf of a continuous random variable, we need to define Template:Colored em first. Template:Colored definition For the continuous random variable <math>X</math>, construct class intervals for <math>X</math> which form a non-overlapping partition of the interval <math>[X_{\min},X_{\max}]</math>, in which <math>X_{\min}</math> and <math>X_{\max}</math> are the minimum and maximum values in the sample. Then, the pdf <math>f(x)\approx\frac{F(c_j)-F(c_{j-1})}{c_j-c_{j-1}}</math> for <math>x\in(c_{j-1},c_j]</math> and <math>j=1,2,\dots,i</math>, when <math>c_{j-1}</math> and <math>c_j</math> are close, i.e. the length of each class interval is small. (Although the union of the above class intervals is <math>(c_0,c_i]</math> and thus the value <math>c_0</math> is not included in the interval, it does not matter since the value of the pdf at <math>c_0</math> does not affect the calculation of probability.) Here, <math>c_0</math> is <math>X_{\min}</math> and <math>c_i</math> is <math>X_{\max}</math>.
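As a quick illustration, the empirical cdf and empirical pmf can be computed directly from a sample. The following is a minimal sketch; the Binomial sample, the seed, and the function names are hypothetical choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete sample: 200 draws from a Binomial(5, 0.4) distribution.
sample = rng.binomial(5, 0.4, size=200)
n = len(sample)

def empirical_cdf(sample, x):
    """F_n(x) = (1/n) * sum over k of 1{X_k <= x}."""
    return np.mean(sample <= x)

def empirical_pmf(sample, x):
    """f_n(x) = (1/n) * sum over k of 1{X_k = x}."""
    return np.mean(sample == x)

# The empirical cdf is a step function increasing from 0 to 1.
print(empirical_cdf(sample, -1))  # 0.0: no observation is <= -1
print(empirical_cdf(sample, 5))   # 1.0: every observation is <= 5
# The empirical pmf values over the support sum to 1.
print(sum(empirical_pmf(sample, x) for x in range(6)))
```

Each observed value is "assigned" probability <math>1/n</math>, so summing the empirical pmf over the support always gives exactly 1.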

Since <math>F(c_j)-F(c_{j-1})=\mathbb{P}(X\in(c_{j-1},c_j])\approx\frac{\sum_{k=1}^n \mathbf{1}\{X_k\in(c_{j-1},c_j]\}}{n}</math> is the relative frequency of occurrences of the event <math>\{X_k\in(c_{j-1},c_j]\}</math>, we can rewrite the above expression as <math>f(x)\approx h_n(x)=\frac{\sum_{k=1}^n \mathbf{1}\{X_k\in(c_{j-1},c_j]\}}{n(c_j-c_{j-1})},\quad x\in(c_{j-1},c_j]\text{ and }j=1,2,\dots,i,</math> in which <math>h_n(x)</math> is called the Template:Colored em.

Since there are many possible ways to construct the class intervals, the value of <math>h_n(x)</math> can differ even with the same <math>n</math> and <math>x</math>. When <math>n</math> is Template:Colored em and the length of each class interval is Template:Colored em, we expect <math>h_n(x)</math> to be a good estimate of <math>f(x)</math> (the theoretical pdf).

There are some properties related to the relative frequency histogram, as follows: Template:Colored proposition

Proof.

(i) Since the indicator function is nonnegative (its value is either 0 or 1), <math>n</math> is positive, and <math>c_j>c_{j-1}</math> so <math>c_j-c_{j-1}</math> is positive, we have <math>h_n(x)\ge 0</math> by definition.

(ii) <math>\int_{c_0}^{c_i}h_n(x)\,dx=\int_{c_0}^{c_1}h_n(x)\,dx+\int_{c_1}^{c_2}h_n(x)\,dx+\dots+\int_{c_{i-1}}^{c_i}h_n(x)\,dx
=\frac{1}{n}\left(\int_{c_0}^{c_1}\frac{\sum_{k=1}^n\mathbf{1}\{X_k\in(c_0,c_1]\}}{c_1-c_0}\,dx+\int_{c_1}^{c_2}\frac{\sum_{k=1}^n\mathbf{1}\{X_k\in(c_1,c_2]\}}{c_2-c_1}\,dx+\dots+\int_{c_{i-1}}^{c_i}\frac{\sum_{k=1}^n\mathbf{1}\{X_k\in(c_{i-1},c_i]\}}{c_i-c_{i-1}}\,dx\right)
=\frac{1}{n}\left(\frac{\sum_{k=1}^n\mathbf{1}\{X_k\in(c_0,c_1]\}}{c_1-c_0}(c_1-c_0)+\frac{\sum_{k=1}^n\mathbf{1}\{X_k\in(c_1,c_2]\}}{c_2-c_1}(c_2-c_1)+\dots+\frac{\sum_{k=1}^n\mathbf{1}\{X_k\in(c_{i-1},c_i]\}}{c_i-c_{i-1}}(c_i-c_{i-1})\right)
=\frac{1}{n}\left(\sum_{k=1}^n\mathbf{1}\{X_k\in(c_0,c_1]\}+\sum_{k=1}^n\mathbf{1}\{X_k\in(c_1,c_2]\}+\dots+\sum_{k=1}^n\mathbf{1}\{X_k\in(c_{i-1},c_i]\}\right)
=\frac{1}{n}\sum_{k=1}^n\mathbf{1}\{X_k\in(c_0,c_1]\cup(c_1,c_2]\cup\dots\cup(c_{i-1},c_i]\}
=\frac{1}{n}\sum_{k=1}^n\mathbf{1}\{X_k\in\underbrace{(c_0,c_i]}_{\text{sample space of }X}\}
=\frac{1}{n}\sum_{k=1}^n 1=\frac{1}{n}\cdot n=1.</math> Here, <math>c_0</math> is <math>X_{\min}</math> and <math>c_i</math> is <math>X_{\max}</math>.

(iii) We can "split" the integral in a similar way as in (ii), and then eventually the integral equals <math>\frac{1}{n}\sum_{k=1}^n\mathbf{1}\{X_k\in A\}</math>, and it can approximate <math>\mathbb{P}(A)</math> since it is the relative frequency of occurrences of the event <math>\{X_k\in A\}</math>.
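The three properties can be checked numerically on a concrete sample. A minimal sketch, assuming a hypothetical Exp(1) sample and 40 equal-length class intervals (all choices illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.exponential(scale=1.0, size=5000)  # hypothetical continuous sample
n = len(sample)

# Class intervals (c_0, c_1], ..., (c_{i-1}, c_i] partitioning [min, max].
edges = np.linspace(sample.min(), sample.max(), 41)  # 40 equal-length intervals
counts, _ = np.histogram(sample, bins=edges)
widths = np.diff(edges)
h = counts / (n * widths)  # the relative frequency histogram h_n on each interval

# (i) nonnegativity and (ii) total integral equal to one
assert np.all(h >= 0)
print(np.sum(h * widths))  # 1.0 up to floating point

# (iii) the integral over A = (edges[0], edges[5]] approximates P(X in A)
approx = np.sum(h[:5] * widths[:5])
exact = np.exp(-edges[0]) - np.exp(-edges[5])  # P(A) for the Exp(1) distribution
print(abs(approx - exact))  # small
```

Property (ii) holds up to rounding for any sample, since every observation falls in exactly one class interval; property (iii) improves as <math>n</math> grows.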

Expectation

In this section, we will discuss some results about expectation, which involve various inequalities. Let <math>a</math> and <math>b</math> be constants. Also, let <math>\Omega</math> be the sample space of <math>X</math>.

Template:Colored proposition

Proof. Assume <math>\mathbb{P}(a<X\le b)=1</math>.

Case 1: X is discrete.

By definition of expectation, <math>\mathbb{E}[X]=\sum_{x\in\Omega}x f(x)</math>. Then, we have <math>\sum_{x\in\Omega}a f(x)<\sum_{x\in\Omega}x f(x)\le\sum_{x\in\Omega}b f(x)\implies a\sum_{x\in\Omega}f(x)<\mathbb{E}[X]\le b\sum_{x\in\Omega}f(x)\implies a<\mathbb{E}[X]\le b,</math> since <math>\sum_{x\in\Omega}f(x)=1</math> because of the condition <math>\mathbb{P}(a<X\le b)=1</math>.

Case 2: X is continuous.

Similarly, we have <math>\int_{\Omega}a f(x)\,dx<\int_{\Omega}x f(x)\,dx\le\int_{\Omega}b f(x)\,dx\implies a<\mathbb{E}[X]\le b,</math> since <math>\int_{\Omega}f(x)\,dx=1</math> because of the condition <math>\mathbb{P}(a<X\le b)=1</math>.

Template:Colored remark Template:Colored proposition

Proof. <math>\frac{\mathbb{E}[X]}{a}=\frac{1}{a}\int_{0}^{\infty}x f(x)\,dx\ge\frac{1}{a}\int_{a}^{\infty}x f(x)\,dx\ge\frac{1}{a}\int_{a}^{\infty}a f(x)\,dx=\int_{a}^{\infty}f(x)\,dx=\mathbb{P}(X\ge a),</math> as desired.

Template:Colored corollary

Proof. First, observe that <math>X^2</math> is a nonnegative random variable. Then, by Markov's inequality, for the (positive) value <math>a'=a^2</math>, we have <math>\mathbb{P}(X^2\ge a')\le\frac{\mathbb{E}[X^2]}{a'}\implies\mathbb{P}(X^2\ge a^2)\le\frac{\mathbb{E}[X^2]}{a^2}\implies\mathbb{P}(|X|\ge a)\le\frac{\mathbb{E}[X^2]}{a^2},</math> since <math>a</math> is positive (so <math>X^2\ge a^2\iff|X|\ge a</math>).
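Both bounds can be verified on a sample, since the same argument applies to the empirical distribution. A minimal sketch; the Exp(1) sample and the threshold <math>a=2</math> are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical nonnegative sample: Exp(1), so E[X] = 1 and E[X^2] = 2.
x = rng.exponential(1.0, size=100_000)
a = 2.0

markov_bound = x.mean() / a             # sample analogue of E[X]/a
tail = np.mean(x >= a)                  # sample analogue of P(X >= a)
print(tail <= markov_bound)             # Markov's inequality: True

chebyshev_bound = np.mean(x**2) / a**2  # sample analogue of E[X^2]/a^2
tail_abs = np.mean(np.abs(x) >= a)      # sample analogue of P(|X| >= a)
print(tail_abs <= chebyshev_bound)      # the corollary: True
```

Note that both comparisons hold deterministically for any sample, not just approximately: they are Markov's inequality applied to the empirical distribution.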

Template:Colored proposition

Proof. Let <math>L(x)=a+bx</math> be the tangent line to the function <math>g(x)</math> at <math>x=\mathbb{E}[X]</math>. Then, since <math>g</math> is convex, we have <math>g(x)\ge L(x)</math> for each <math>x</math> (informally, we can observe this graphically). As a result, we have <math>\int_{\Omega}g(x)f(x)\,dx\ge\int_{\Omega}L(x)f(x)\,dx\implies\mathbb{E}[g(X)]\ge\mathbb{E}[L(X)]=\mathbb{E}[a+bX]=a+b\,\mathbb{E}[X]=L(\mathbb{E}[X])=g(\mathbb{E}[X]),</math> since <math>L(x)</math> is tangent to <math>g(x)</math> at <math>x=\mathbb{E}[X]</math>, as desired.
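Jensen's inequality also holds for the empirical distribution of any sample, so it can be illustrated directly. A minimal sketch, using the convex function <math>g(x)=e^x</math> and a hypothetical N(0, 1) sample:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=100_000)  # hypothetical sample

# g(x) = exp(x) is convex, so E[g(X)] >= g(E[X]) by Jensen's inequality.
lhs = np.exp(x).mean()   # estimates E[e^X] = e^{1/2} ≈ 1.65 for N(0, 1)
rhs = np.exp(x.mean())   # g applied to the sample mean, ≈ g(E[X]) = 1
print(lhs >= rhs)        # True for any sample
```

The gap between the two sides reflects how "curved" <math>g</math> is over the range where <math>X</math> puts its probability.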

Template:Colored theorem

Proof. <math>0\le\mathbb{E}\big[(X\,\mathbb{E}[Y^2]-Y\,\mathbb{E}[XY])^2\big]=\mathbb{E}\big[X^2\underbrace{(\mathbb{E}[Y^2])^2}_{\text{constant}}-2XY\underbrace{\mathbb{E}[Y^2]\mathbb{E}[XY]}_{\text{constant}}+Y^2\underbrace{(\mathbb{E}[XY])^2}_{\text{constant}}\big]=(\mathbb{E}[Y^2])^2\mathbb{E}[X^2]-2\mathbb{E}[Y^2]\mathbb{E}[XY]\mathbb{E}[XY]+(\mathbb{E}[XY])^2\mathbb{E}[Y^2]=\mathbb{E}[Y^2]\big(\mathbb{E}[X^2]\mathbb{E}[Y^2]-2(\mathbb{E}[XY])^2+(\mathbb{E}[XY])^2\big)=\mathbb{E}[Y^2]\big(\mathbb{E}[X^2]\mathbb{E}[Y^2]-(\mathbb{E}[XY])^2\big).</math> Since <math>\mathbb{E}[Y^2]\ge 0</math>, we must have <math>\mathbb{E}[X^2]\mathbb{E}[Y^2]-(\mathbb{E}[XY])^2\ge 0\implies(\mathbb{E}[XY])^2\le\mathbb{E}[X^2]\mathbb{E}[Y^2]</math> (when <math>\mathbb{E}[Y^2]=0</math>, we have <math>Y=0</math> almost surely and the inequality holds trivially).
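The Cauchy-Schwarz inequality holds exactly for the sample moments of any pair of samples. A minimal sketch with a hypothetical correlated pair (the coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical correlated pair (X, Y): Y = 0.5 X + noise.
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)

# Sample analogue of (E[XY])^2 <= E[X^2] E[Y^2].
lhs = np.mean(x * y) ** 2
rhs = np.mean(x**2) * np.mean(y**2)
print(lhs <= rhs)  # True for any pair of samples
```

Equality would occur only if one sample were an exact scalar multiple of the other, mirroring the equality condition <math>X=cY</math> almost surely.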

Template:Colored example

Convergence

Before discussing convergence, we will define some terms that will be used later. Template:Colored definition Template:Colored remark In a Template:Colored em, say <math>x_1,\dots,x_n</math>, we observe Template:Colored em values of their sample mean, <math>\bar{x}=\frac{\sum_{i=1}^n x_i}{n}</math>, and sample variance, <math>s^2=\frac{\sum_{i=1}^n(x_i-\bar{x})^2}{n}</math>. Template:Colored em, each of the values is only Template:Colored em realization of the respective random variables <math>\bar{X}</math> and <math>S^2</math>. We should notice the difference between these definite values (not random variables) and the statistics (random variables).

To explain the definitions of the sample mean <math>\bar{X}</math> and sample variance <math>S^2</math> more intuitively, consider the following.

Recall that the empirical cdf <math>F_n(x)</math> assigns probability <math>\frac{1}{n}</math> to each of the random sample <math>X_1,\dots,X_n</math>. Thus, by the definition of mean and variance, the Template:Colored em of a random variable, say <math>Y</math>, with this cdf <math>F_n(x)</math> (and hence with the corresponding pmf <math>f_n(x)</math>) is <math>\sum_{i=1}^n\left(X_i\cdot\frac{1}{n}\right)=\bar{X}</math>. Similarly, the Template:Colored em of <math>Y</math> is <math>\sum_{i=1}^n\left((X_i-\bar{X})^2\cdot\frac{1}{n}\right)=S^2</math>. In other words, the Template:Colored em and Template:Colored em of the empirical distribution, which corresponds to the Template:Colored em, are the Template:Colored em <math>\bar{X}</math> and the Template:Colored em <math>S^2</math> respectively, which is quite natural, right? Template:Colored remark Also, recall that the empirical cdf <math>F_n(x)</math> can well approximate the cdf of <math>X</math>, <math>F(x)</math>, when <math>n</math> is large. Since <math>\bar{X}</math> and <math>S^2</math> are the mean and variance of a random variable with cdf <math>F_n(x)</math>, it is natural to expect that <math>\bar{X}</math> and <math>S^2</math> can well approximate the mean and variance of <math>X</math>.
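The identification of the sample mean and sample variance with the mean and variance of the empirical distribution can be checked on a small realization. A minimal sketch (the five values are a hypothetical realization):

```python
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0, 11.0])  # a hypothetical realization
n = len(x)

# Mean of the empirical distribution: each observed value carries probability 1/n.
mean_emp = np.sum(x * (1 / n))
# Variance of the empirical distribution (note the divisor n, not n - 1).
var_emp = np.sum((x - mean_emp) ** 2 * (1 / n))

print(np.isclose(mean_emp, x.mean()))      # agrees with the sample mean
print(np.isclose(var_emp, x.var(ddof=0)))  # agrees with the sample variance (divisor n)
```

Note that the sample variance defined here uses divisor <math>n</math>, matching the variance of the empirical distribution; the common divisor-<math>n-1</math> version would correspond to `x.var(ddof=1)` instead.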

Convergence in probability

Template:Colored definition Template:Colored remark The following theorem, namely Template:Colored em, is an important theorem which is related to convergence in probability. Template:Colored theorem

Proof. We use <math>S_n</math> to denote <math>\sum_{i=1}^n X_i</math>.

By definition, <math>\bar{X}\xrightarrow{p}\mu</math> as <math>n\to\infty</math> is equivalent to <math>\mathbb{P}\left(\left|\frac{S_n}{n}-\mu\right|>\varepsilon\right)\to 0</math> as <math>n\to\infty</math> for every <math>\varepsilon>0</math>.

By Chebyshev's inequality, we have <math>\mathbb{P}\left(\left|\frac{S_n}{n}-\mu\right|>\varepsilon\right)\le\frac{1}{\varepsilon^2}\mathbb{E}\left[\left(\frac{S_n}{n}-\mu\right)^2\right]=\frac{1}{\varepsilon^2}\mathbb{E}\left[\left(\frac{S_n-n\mu}{n}\right)^2\right]=\frac{1}{n^2\varepsilon^2}\mathbb{E}\left[(S_n-n\mu)^2\right]=\frac{1}{n^2\varepsilon^2}\mathbb{E}\left[\left(\sum_{i=1}^n(X_i-\mu)\right)^2\right]=\frac{1}{n^2\varepsilon^2}\mathbb{E}\left[\sum_{i=1}^n\sum_{j=1}^n(X_i-\mu)(X_j-\mu)\right]=\frac{1}{n^2\varepsilon^2}\left(\mathbb{E}\left[\sum_{i=1}^n(X_i-\mu)^2\right]+\mathbb{E}\left[\sum_{i=1}^n\sum_{j\ne i}(X_i-\mu)(X_j-\mu)\right]\right).</math>

Since <math>X_1,X_2,\dots</math> are Template:Colored em (and hence functions of them are also independent) and the expectation is multiplicative under independence, <math>\frac{1}{n^2\varepsilon^2}\left(\mathbb{E}\left[\sum_{i=1}^n(X_i-\mu)^2\right]+\mathbb{E}\left[\sum_{i=1}^n\sum_{j\ne i}(X_i-\mu)(X_j-\mu)\right]\right)=\frac{1}{n^2\varepsilon^2}\left(\mathbb{E}\left[\sum_{i=1}^n(X_i-\mu)^2\right]+\sum_{i=1}^n\sum_{j\ne i}\underbrace{\mathbb{E}[X_i-\mu]}_{=\,\mu-\mu\,=\,0}\underbrace{\mathbb{E}[X_j-\mu]}_{=\,\mu-\mu\,=\,0}\right)=\frac{1}{n^2\varepsilon^2}\sum_{i=1}^n\underbrace{\mathbb{E}[(X_i-\mu)^2]}_{=\,\sigma^2}=\frac{n\sigma^2}{n^2\varepsilon^2}=\frac{\sigma^2}{n\varepsilon^2}\to 0\quad\text{as }n\to\infty.</math> So, the probability <math>\mathbb{P}\left(\left|\frac{S_n}{n}-\mu\right|>\varepsilon\right)</math> is Template:Colored em an expression that tends to 0 as <math>n\to\infty</math>. Since the probability is nonnegative (<math>\ge 0</math>), it follows that the probability also tends to 0 as <math>n\to\infty</math>.
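The weak law can be visualized by estimating <math>\mathbb{P}(|\bar{X}_n-\mu|>\varepsilon)</math> by Monte Carlo for growing <math>n</math>. A minimal sketch; the Uniform(0, 1) distribution, the tolerance, and the repetition count are all hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(5)
mu, eps = 0.5, 0.1  # Uniform(0, 1) has mean 0.5; eps plays the role of epsilon

# Monte Carlo estimate of P(|X_bar_n - mu| > eps), 2000 repetitions per sample size.
probs = {}
for n in (10, 100, 1000):
    means = rng.uniform(0, 1, size=(2000, n)).mean(axis=1)
    probs[n] = np.mean(np.abs(means - mu) > eps)
    print(n, probs[n])
```

The estimated probabilities shrink toward 0 as <math>n</math> grows, in line with the bound <math>\sigma^2/(n\varepsilon^2)</math> derived above.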

Template:Colored remark There are also some properties of convergence in probability that help us determine what a complex expression converges to. Template:Colored proposition

Proof. Template:Colored em: Assume <math>X_n\xrightarrow{p}X</math> and <math>Y_n\xrightarrow{p}Y</math>. The continuous mapping theorem is proven first so that we can use it in the proof of the other properties (the proof is omitted here). Also, it can be shown that <math>(X_n,Y_n)\xrightarrow{p}(X,Y)</math> (joint convergence in probability; the definition is similar, except that the random variables become ordered pairs, so the interpretation of "<math>|Z_n-Z|</math>" becomes the Template:Colored em between the two points in the Cartesian coordinate system which are represented by the ordered pairs).

After that, we define <math>g(z_1,z_2)=az_1+bz_2</math>, <math>g(z_1,z_2)=z_1z_2</math>, and <math>g(z_1,z_2)=z_1/z_2</math> respectively, where each of these functions is continuous and <math>a,b</math> are constants. Then, applying the continuous mapping theorem using each of these functions gives us the first three results.

Convergence in distribution

Template:Colored definition Template:Colored remark A very important theorem in statistics which is related to convergence in distribution is Template:Colored em. Template:Colored theorem

Proof. A (lengthy) proof can be found in Probability/Transformation of Random Variables#Central limit theorem.
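The central limit theorem can be illustrated by simulation: standardized means of even a markedly skewed distribution look approximately N(0, 1) when <math>n</math> is large. A minimal sketch; the Exp(1) distribution and the sizes are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(6)

# Standardized means of Exp(1) samples (mu = sigma = 1) for n = 200.
n, reps = 200, 50_000
samples = rng.exponential(1.0, size=(reps, n))
z = (samples.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))

print(z.mean())            # near 0
print(z.std())             # near 1
print(np.mean(z <= 1.96))  # near Phi(1.96) ≈ 0.975, the N(0, 1) cdf at 1.96
```

Despite the strong skewness of Exp(1), the distribution of the standardized mean is already close to standard normal at <math>n=200</math>.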

There are some properties of convergence in distribution, but they are a bit different from the properties of convergence in probability. These properties are given by Template:Colored em, and also by the continuous mapping theorem. Template:Colored theorem

Proof. Omitted.

Template:Colored theorem

Proof. Template:Colored em: Assume <math>X_n\xrightarrow{d}X</math> and <math>Y_n\xrightarrow{p}c</math>. Then, it can be shown that <math>(X_n,Y_n)\xrightarrow{d}(X,c)</math> (joint convergence in distribution; the definition is similar, except that the cdf's become joint cdf's of ordered pairs). After that, we define <math>g(z_1,z_2)=z_1+z_2</math>, <math>g(z_1,z_2)=z_1z_2</math>, and <math>g(z_1,z_2)=z_1/z_2</math> respectively, where each of these functions is continuous, and then applying the continuous mapping theorem using each of these functions gives us the three desired results.
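A simulation sketch of the product case: if <math>X_n\xrightarrow{d}N(0,1)</math> (here via the CLT for Uniform(0, 1) means) and <math>Y_n\xrightarrow{p}2</math>, then <math>X_nY_n\xrightarrow{d}N(0,4)</math>. All distributions and sizes below are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(7)
reps, n = 50_000, 500

# X_n: standardized Uniform(0, 1) sample means, approximately N(0, 1) by the CLT.
x_n = np.sqrt(n) * (rng.uniform(0, 1, size=(reps, n)).mean(axis=1) - 0.5) / np.sqrt(1 / 12)
# Y_n: a sequence collapsing onto the constant 2 (converges in probability to 2).
y_n = 2 + rng.normal(0, 1 / np.sqrt(n), size=reps)

prod = x_n * y_n
print(prod.std())  # near 2, the standard deviation of the limit N(0, 4)
```

The empirical standard deviation of the product is close to 2, consistent with the limit distribution <math>N(0,4)</math>.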

Template:Colored remark

Resampling

By Template:Colored em, we mean creating new samples based on an existing sample. Now, let us consider the following for a general overview of the procedure of resampling.

Suppose <math>X_1,\dots,X_n</math> is a Template:Colored em from a distribution of a random variable <math>X</math> with cdf <math>F(x)</math>. Let <math>x_1,\dots,x_n</math> be a corresponding Template:Colored em of the random sample <math>X_1,\dots,X_n</math>. Based on this realization, we also have a Template:Colored em of the empirical cdf: <math>\frac{1}{n}\sum_{k=1}^n\mathbf{1}\{x_k\le x\}</math> [1]. Since this is a realization of the empirical cdf, by the Glivenko-Cantelli theorem, it is a good estimate of the cdf <math>F(x)</math> when <math>n</math> is large [2]. In other words, if we denote by <math>X^*</math> the random variable with the same cdf as that Template:Colored em of the empirical cdf, <math>X^*</math> and <math>X</math> have similar distributions when <math>n</math> is large.

Notice that a realization of the empirical cdf is a Template:Colored em cdf (since the support <math>x_1,\dots,x_n</math> is countable). We now draw a Template:Colored em (called the bootstrap (or resampling) random sample) with sample size <math>B</math> (called the Template:Colored em), <math>X_1^*,\dots,X_B^*</math>, from the distribution of the random variable <math>X^*</math> (<math>X^*</math> comes from Template:Colored em from <math>X</math>, so the behaviour of sampling from <math>X^*</math> is called Template:Colored em).

Then, the relative frequency histogram of <math>X_1^*,\dots,X_B^*</math> should be close to the corresponding Template:Colored em of the empirical pmf of <math>X^*</math> (found from the realization of the empirical cdf of <math>X^*</math>), which is close to the pdf <math>f(x)</math> of <math>X</math>. This means the relative frequency histogram of <math>X_1^*,\dots,X_B^*</math> is close to the pdf <math>f(x)</math> of <math>X</math>.

In particular, since the cdf of <math>X^*</math>, <math>F_n(x)</math>, assigns probability <math>1/n</math> to each of <math>x_1,\dots,x_n</math> [3], the pmf of <math>X^*</math> is <math>\mathbb{P}(X^*=x_i)=\frac{1}{n},\quad i=1,2,\dots,n.</math> Notice that this pmf is quite simple, and therefore it makes the related calculations simpler. For example, in the following, we want to know the distribution of <math>T^*=g(X_1^*,\dots,X_n^*)</math>, and this simple pmf makes the resulting distribution also quite simple.

Template:Colored remark In the following, we will discuss an application of the bootstrap method (or Template:Colored em) mentioned above, namely using the bootstrap method to Template:Colored em the distribution of a statistic <math>T=g(X_1,X_2,\dots,X_n)</math> (the inputs of the function are random variables and <math>g</math> is a function). The reason for approximating, rather than finding the distribution exactly, is that the latter is usually infeasible (or may be too complicated).

To do this, consider the "bootstrapped statistic" <math>T^*=g(X_1^*,X_2^*,\dots,X_n^*)</math> and the statistic <math>T=g(X_1,X_2,\dots,X_n)</math>, where <math>X_1^*,X_2^*,\dots,X_n^*</math> is the bootstrap random sample (with bootstrap sample size <math>n</math>) from the distribution of <math>X^*</math>, and <math>X_1,X_2,\dots,X_n</math> is the random sample from the distribution of <math>X</math>. When <math>n</math> is large, since the distribution of <math>X^*</math> is similar to that of <math>X</math>, the bootstrap random sample <math>X_1^*,X_2^*,\dots,X_n^*</math> and the random sample <math>X_1,X_2,\dots,X_n</math> are also similar. It follows that <math>T^*</math> and <math>T</math> are similar as well, or to be more precise, the Template:Colored em of <math>T^*</math> and <math>T</math> are close. As a result, we can utilize the distribution of <math>T^*</math> (which is easier to find and simpler, since the pmf of <math>X^*</math> is simple as above) to approximate the distribution of <math>T</math>. A procedure to do this is as follows:

  1. Generate a Template:Colored em <math>x_1^*,x_2^*,\dots,x_n^*</math> from the Template:Colored em <math>X_1^*,X_2^*,\dots,X_n^*</math>, which is from the distribution of <math>X^*</math>.
  2. Calculate a realization of the bootstrapped statistic <math>T^*</math>: <math>t^*=g(x_1^*,x_2^*,\dots,x_n^*)</math>.
  3. Repeat steps 1. to 2. <math>j</math> times to get a sequence of <math>j</math> realizations of <math>T^*</math>: <math>t_1^*,t_2^*,\dots,t_j^*</math>.
  4. Plot the relative frequency histogram of the <math>j</math> realizations <math>t_1^*,t_2^*,\dots,t_j^*</math>.
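The steps above can be sketched in code. Drawing from the realized empirical cdf is the same as sampling the observed values with replacement, probability <math>1/n</math> each; the Exp(2) data, the choice <math>g=\text{mean}</math>, and the repetition count are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical observed realization x_1, ..., x_n of the random sample.
x = rng.exponential(2.0, size=50)
n = len(x)

# Bootstrap the distribution of T = the sample mean (g = mean, as an example).
j = 5000  # number of repetitions of steps 1-2
t_star = np.empty(j)
for b in range(j):
    # Step 1: draw x_1*, ..., x_n* from the realized empirical cdf,
    # i.e. sample the observed values with replacement, probability 1/n each.
    resample = rng.choice(x, size=n, replace=True)
    # Step 2: compute a realization of the bootstrapped statistic T*.
    t_star[b] = resample.mean()
# Steps 3-4: t_star now holds j realizations of T*; its relative frequency
# histogram approximates the distribution of T* and hence of T.

print(t_star.mean())  # close to the observed sample mean
print(t_star.std())   # a bootstrap estimate of the standard error of T
```

In practice the spread of the <math>t^*</math> values (their standard deviation) is the quantity of interest, since it estimates the sampling variability of <math>T</math> without knowing <math>F(x)</math>.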

This histogram of the <math>j</math> realizations (which are a realization of a random sample from <math>T^*</math> with sample size <math>j</math>) is close to the pmf of <math>T^*</math> [4], and thus close to the pmf of <math>T</math>. Template:Nav Template:BookCat

  1. ↑ This is different from the empirical cdf <math>\frac{1}{n}\sum_{k=1}^n\mathbf{1}\{X_k\le x\}</math>.
  2. ↑ By the Glivenko-Cantelli theorem, the empirical cdf is a good estimate of the cdf <math>F(x)</math> regardless of what the actual values (realization) of the random sample are, i.e. each realization of the empirical cdf is a good estimate of the cdf <math>F(x)</math> when <math>n</math> is large.
  3. ↑ That is, for a realization of the random sample <math>X_1,X_2,\dots,X_n</math>, say <math>x_1,x_2,\dots,x_n</math>, the probability for <math>X^*</math> to equal each of <math>x_1,x_2,\dots,x_n</math> (which correspond to the realizations of <math>X_1,X_2,\dots,X_n</math> respectively) is <math>1/n</math>.
  4. ↑ The reason is similar to that mentioned above: the histogram should be close to the pmf of <math>T^*</math> since the cdf corresponding to the histogram (i.e. the realization of the empirical cdf of the random sample <math>T_1^*,T_2^*,\dots,T_j^*</math>) is close to the cdf of <math>T^*</math>.