How to calculate expectation and variance of kernel density estimator?

In summary, the exercise asks for the bias, variance, and mean square error of a kernel density estimator for a random variable with unknown distribution function, using series expansions, and for the resulting convergence rate. The solution uses linearity of expectation, a change of variables, and the law of the unconscious statistician. The bias is obtained as a function of the bandwidth, while the variance and mean square error depend on both the bandwidth and the underlying probability density function. Finally, the exercise asks for the bandwidth that minimizes the mean square error, and for the convergence rate at that choice.
  • #1
schniefen
TL;DR Summary
What is the expectation and variance of ##f_n(t)=\frac{1}{n}\sum_{i=1}^n \frac{1}{h}k\left(\frac{t-x_i}{h}\right)##?
This is a question from a mathematical statistics textbook used in a first undergraduate course in mathematical statistics. The exercise follows the chapter on nonparametric inference. An attempt at a solution is given below. Any help is appreciated.

Exercise:

Suppose ##x_1, ..., x_n## are independent and identically distributed (i.i.d.) observations of a random variable ##X## with unknown distribution function ##F## and probability density function ##f\in C^m##, for some ##m>1## fixed. Let
$$f_n(t)=\frac{1}{n}\sum_{i=1}^n \frac{1}{h}k\left(\frac{t-x_i}{h}\right)$$
be a kernel estimator of ##f##, with ##k\in C^{m+1}## a given fixed function such that ##k\geq 0##, ##\int_{\mathbb{R}} k(u)\mathrm{d}u=1##, ##\mathrm{supp} (k)=[-1,1]## and bandwidth ##h=h(u)## (for the time being unspecified).
1. Show that ##\mathbb{E}[f_n(t)]=\int_{\mathbb{R}} k(u) f(t-hu)\mathrm{d}u##.
2. Make a series expansion of ##f## around ##t## in terms of ##hu## in the expression for ##\mathbb{E}[f_n(t)]##. Suppose that ##k## satisfies ##\int_{\mathbb{R}} k(u)\mathrm{d}u=1##, ##\int_{\mathbb{R}} k(u)u^l\mathrm{d}u=0## for all ##1<l<m## and ##\int_{\mathbb{R}} k(u)u^m\mathrm{d}u<\infty##. Determine the bias ##\mathbb{E}[f_n(t)]-f(t)## as a function of ##h##.
3. Suppose that ##\mathrm{Var}[k(X_1)]<\infty## and determine ##\mathrm{Var}[f_n(t)]## as a function of ##h##.
4. Determine the mean square error ##\mathrm{mse}[f_n(t)]## from 2 and 3 as a function of ##h##.
5. For what value of ##h##, as a function of ##n##, is ##\mathrm{mse}[f_n(t)]## smallest?
6. For the value of ##h=h(n)## obtained from 5, how fast does ##\mathrm{mse}[f_n(t)]## converge to 0, when ##n## converges to ##\infty##?

Attempt:

1. By linearity of expectation, the identical distribution of ##x_1, ..., x_n##, the law of the unconscious statistician and the change of variables ##u=(t−x)/h##,
\begin{align*}
\mathbb{E}[f_n(t)]&=\frac{1}{n}\sum_{i=1}^n \mathbb{E}\left[\frac{1}{h}k\left(\frac{t-x_i}{h}\right)\right]\\
&=\mathbb{E}\left[\frac{1}{h}k\left(\frac{t-x}{h}\right)\right]\\
&=\int_{\mathbb{R}}\frac{1}{h}k\left(\frac{t-x}{h}\right)f(x)\mathrm{d}x\\
&=\int_{\mathbb{R}}\frac{1}{h}k(u)f(t-hu)h\mathrm{d}u\\
&=\int_{\mathbb{R}}k(u)f(t-hu)\mathrm{d}u.
\end{align*}
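As a quick numerical sanity check of 1 (not part of the exercise), the identity can be verified by simulation. Below is a minimal sketch, assuming a standard normal ##f## and the Epanechnikov kernel ##k(u)=\frac{3}{4}(1-u^2)## on ##[-1,1]##, which satisfies the stated conditions on ##k##:

```python
import numpy as np

rng = np.random.default_rng(0)

def k(u):
    # Epanechnikov kernel: nonnegative, supported on [-1, 1], integrates to 1
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def f(x):
    # assumed true density: standard normal
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

t, h, n, reps = 0.5, 0.3, 200, 2000

# Left-hand side: E[f_n(t)], estimated by averaging f_n(t) over many samples
fn_vals = [np.mean(k((t - rng.standard_normal(n)) / h)) / h for _ in range(reps)]
lhs = np.mean(fn_vals)

# Right-hand side: integral of k(u) f(t - h u) over the support [-1, 1]
u = np.linspace(-1, 1, 20001)
rhs = np.sum(k(u) * f(t - h * u)) * (u[1] - u[0])

print(f"Monte Carlo E[f_n(t)] ~ {lhs:.4f},  integral ~ {rhs:.4f}")
```

The two printed numbers should agree to within Monte Carlo error.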

2. From ##f\in C^m##, it follows that $$f(t-hu)=\sum_{l=0}^m \frac{f^{(l)}(t)}{l!} (-hu)^l+o((hu)^m).$$
Then from 1 and linearity of integration,
\begin{align*}
\mathbb{E}[f_n(t)]&=\int_{\mathbb{R}}k(u)\left(\sum_{l=0}^m \frac{f^{(l)}(t)}{l!} (-hu)^l+o((hu)^m)\right)\mathrm{d}u\\
&=\sum_{l=0}^m\int_{\mathbb{R}}k(u)\frac{f^{(l)}(t)}{l!}(-hu)^l\mathrm{d}u+\int_{\mathbb{R}}k(u)o((hu)^m)\mathrm{d}u.
\end{align*}
From the given conditions on ##k##, the ##l=0## term reads
$$\int_{\mathbb{R}} k(u)f(t)\mathrm{d}u=f(t)\int_{\mathbb{R}} k(u) \mathrm{d}u=f(t)\cdot 1=f(t).$$
The ##1\leq l<m## terms are
$$\int_{\mathbb{R}} k(u)\frac{f^{(l)}(t)}{l!} (-hu)^l\mathrm{d}u=\frac{f^{(l)}(t)(-h)^l}{l!}\int_{\mathbb{R}} k(u)u^l\mathrm{d}u=0.$$
Finally, the ##l=m## term is $$ \frac{f^{(m)}(t)(-h)^m}{m!}\int_{\mathbb{R}} k(u)u^m\mathrm{d}u,$$ which is finite by the assumption ##\int_{\mathbb{R}} k(u)u^m\mathrm{d}u<\infty##.
For the remainder term, note that ##\mathrm{supp}(k)=[-1,1]##, so ##|u|\leq 1## on the domain of integration and ##o((hu)^m)## is ##o(h^m)## uniformly in ##u##; hence
$$\int_\mathbb{R} k(u) o((uh)^m)\mathrm{d}u = o(h^m)\int_\mathbb{R} k(u) \mathrm{d}u = o(h^m).$$
Putting it all together:
$$\mathbb{E}[f_n(t)] = f(t) + \frac{f^{(m)}(t)(-h)^m}{m!} \int_{\mathbb{R}}k(u)u^m \mathrm{d}u + o(h^m),$$ and thus $$\mathbb{E}[f_n(t)]-f(t)=\frac{f^{(m)}(t)(-h)^m}{m!} \int_{\mathbb{R}}k(u)u^m \mathrm{d}u + o(h^m)=A(t)h^m+o(h^m),$$ where ##A(t)=\frac{f^{(m)}(t)(-1)^m}{m!} \int_{\mathbb{R}}k(u)u^m \mathrm{d}u<\infty.##
Note that this solution assumes that ##h\neq h(u)## and that ##\int_{\mathbb{R}} k(u)u^l\mathrm{d}u=0## for all ##1\leq l < m##. Is this reasonable?
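For a concrete illustration of the bias formula (an example, not part of the exercise): with the Epanechnikov kernel ##k(u)=\frac{3}{4}(1-u^2)## on ##[-1,1]## one has ##m=2## and
$$\int_{\mathbb{R}}k(u)u^2\mathrm{d}u=\frac{3}{4}\int_{-1}^{1}u^2(1-u^2)\mathrm{d}u=\frac{3}{2}\left(\frac{1}{3}-\frac{1}{5}\right)=\frac{1}{5},$$
so the bias reduces to
$$\mathbb{E}[f_n(t)]-f(t)=\frac{f''(t)}{2}\cdot\frac{h^2}{5}+o(h^2)=\frac{f''(t)}{10}h^2+o(h^2).$$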

3. This solution was partly suggested by someone else. By independence and the change of variables in 2,
\begin{align*}
\mathrm{Var}[f_n(t)]&=\mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^n \frac{1}{h}k\left(\frac{t-x_i}{h}\right)\right]\\
&=\frac{1}{n^2h^2}\sum_{i=1}^n\mathrm{Var}\left[k\left(\frac{t-x_i}{h}\right)\right]\\
&=\frac{1}{nh^2}\mathrm{Var}\left[k\left(\frac{t-x}{h}\right)\right]\\
&=\frac{1}{nh^2}\left(\mathbb{E}\left[k^2\left(\frac{t-x}{h}\right)\right]-\left(\mathbb{E}\left[k\left(\frac{t-x}{h}\right)\right]\right)^2\right)\\
&=\frac{1}{nh^2}\left(h\int_{\mathbb{R}}k^2(u)f(t-hu)\mathrm{d}u-\left(h\int_{\mathbb{R}}k(u)f(t-hu)\mathrm{d}u\right)^2\right)\\
&=\frac{1}{nh^2}\left(h\int_{\mathbb{R}}k^2(u)\left(f(t)-(hu)f^{(1)}(t)+o(hu)\right)\mathrm{d}u +O(h^2) \right)\\
&=\frac{1}{nh^2}\left(h\cdot f(t)\int_{\mathbb{R}}k^2(u)\mathrm{d}u - h^2 f^{(1)}(t)\int_{\mathbb{R}}k^2(u)u\mathrm{d}u + o(h^2) +O(h^2) \right)\\
&=\frac{1}{nh^2}\left(h\cdot f(t)\int_{\mathbb{R}}k^2(u)\mathrm{d}u +O(h^2) \right)\\
&=\frac{f(t)}{nh}\int_{\mathbb{R}}k^2(u)\mathrm{d}u +o\left(\frac{1}{nh}\right),
\end{align*}
since ##o(h^2) + O(h^2) = O(h^2)## (second-to-last equality) and ##O(h^2) \cdot \frac{1}{nh^2} = O(h) \cdot\frac{1}{nh} = o(1)\cdot \frac{1}{nh} = o\left(\frac{1}{nh}\right)## (last equality). What happens to ##- h^2 f^{(1)}(t)\int_{\mathbb{R}}k^2(u)u\mathrm{d}u## in going from the third-to-last to the second-to-last equality?
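The leading variance term can also be checked by simulation. A minimal sketch, under the same assumptions as the check in 1 (standard normal ##f##, Epanechnikov kernel, for which ##\int_{\mathbb{R}}k^2(u)\mathrm{d}u=\frac{3}{5}##):

```python
import numpy as np

rng = np.random.default_rng(1)

def k(u):
    # Epanechnikov kernel (satisfies the exercise's conditions with m = 2)
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def f(x):
    # assumed true density: standard normal
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

t, h, n, reps = 0.5, 0.3, 500, 5000

# Sample f_n(t) repeatedly and compare its empirical variance
# with the leading term f(t)/(n h) * int k(u)^2 du
fn_vals = np.array([np.mean(k((t - rng.standard_normal(n)) / h)) / h
                    for _ in range(reps)])

int_k2 = 3 / 5  # int k(u)^2 du for the Epanechnikov kernel
theory = f(t) * int_k2 / (n * h)
print(f"empirical Var ~ {fn_vals.var():.3e},  f(t) int k^2 / (n h) ~ {theory:.3e}")
```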
 
  • #2
My assertion about ##h\neq h(u)## follows from the following: if ##h=h(u)## is in fact true, then ##o((hu)^m)=u^mo(h^m)## and thus ##
\int_\mathbb{R} k(u) o((uh)^m)\mathrm{d}u = \int_\mathbb{R} o(h^m)k(u) u^m\mathrm{d}u.## However, is it then still possible to pull the ##o(h^m)## out of the integral and use the condition that ##\int_\mathbb{R} k(u) u^m\mathrm{d}u < \infty##?
 
  • #3
I'm still lost on the first part. Is the ##u## in part 1 supposed to be the same ##u## that ##h## is a function of? You assumed it was constant when doing the change of variables.
 
  • #4
Thanks for the reply.
Office_Shredder said:
I'm still lost on the first part. Is the ##u## in part 1 supposed to be the same ##u## that ##h## is a function of?
As I look at the exercise again, I'm unsure if it isn't supposed to be ##h=h(n)## as specified in part 6. Would this make sense?
Office_Shredder said:
You assumed it was constant when doing the change of variables.
How come? The change of variables is ##u=(t−x)/h## and it implies ##h\mathrm{d}u=\mathrm{d}x##.
 
  • #5
schniefen said:
How come? The change of variables is ##u=(t−x)/h## and it implies ##h\mathrm{d}u=\mathrm{d}x##.

If h is a function of u (and hence of x also?) then this isn't true. This doesn't make a ton of sense as a setup, so I think your conjecture that the u is a typo seems right to me.
 
  • #6
Office_Shredder said:
If h is a function of u (and hence of x also?) then this isn't true. This doesn't make a ton of sense as a set up, so I think your conjecture that the u is a typo seems right to me.
Now, does ##o((hu)^m)=u^mo(h^m)## still hold if ##h\neq h(u)##?
 
  • #7
schniefen said:
Now, does ##o((hu)^m)=u^mo(h^m)## still hold if ##h\neq h(u)##?
If one treats ##u=u(n)## (since ##u=(t-x)/h(n)##), then one could pull out factors of ##u## only if the variable inside the ##o## changes from ##hu## to ##n##. From the Taylor expansion, it is assumed to be ##hu##, i.e. there is a function of ##hu## that goes to ##0## faster than ##(hu)^m## as ##hu\to 0##. I guess one would have to find out what ##hu\to 0## is equivalent to in terms of ##n##. It isn't stated in the exercise, but to obtain reasonable estimates, ##h(n)\to 0## as ##n\to \infty##. Does this sound legitimate, @Office_Shredder? I have skimmed through different lecture notes on this topic, and in many of them a similar calculation is carried out.
 
  • #8
schniefen said:
Now, does ##o((hu)^m)=u^mo(h^m)## still hold if ##h\neq h(u)##?

Yes. In fact, you are going to run into more problems if h is a function of u, since now if h is small you have to consider what is happening to u. For example if ##h=1/u## the thing on the left is not a very impressive statement, and is not equal to the thing on the right.
 
  • #9
Office_Shredder said:
Yes. In fact, you are going to run into more problems if h is a function of u, since now if h is small you have to consider what is happening to u. For example if ##h=1/u## the thing on the left is not a very impressive statement, and is not equal to the thing on the right.

Why is ##o((hu)^m)=o\left(\left(\frac{1}{u}u\right)^m\right)=o(1)## not equal to ##u^mo(h^m)=u^mo\left(\frac{1}{u^m}\right)=o(1)##?
 
  • #10
schniefen said:
[Quote of the exercise from post #1, with the bandwidth corrected to read ##h=h(n)## instead of the typo ##h=h(u)##.]
Find attached a solution I have written.
 

Attachments

  • exercise22_6.pdf
    166.6 KB

1. What is a kernel density estimator?

A kernel density estimator is a non-parametric method for estimating the probability density function of a random variable from a set of observations. It works by placing a scaled kernel function, such as a Gaussian or Epanechnikov kernel, at each observed data point and averaging, which yields a smooth, continuous estimate of the underlying probability distribution.
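A minimal implementation sketch (using a Gaussian kernel for simplicity; note that it does not satisfy the compact-support condition of the exercise above):

```python
import numpy as np

def kde(t, x, h):
    # f_n(t) = (1/n) sum_i (1/h) k((t - x_i)/h), with a Gaussian kernel k
    u = (t[:, None] - x[None, :]) / h
    kvals = np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)
    return kvals.mean(axis=1) / h

# Example: estimate a standard normal density from 1000 draws
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
t = np.linspace(-3, 3, 7)
print(np.round(kde(t, x, h=0.4), 3))
```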

2. How do you calculate the expectation of a kernel density estimator?

The expectation of a kernel density estimator at a point ##t## is the convolution of the true density with the rescaled kernel. In the notation above, ##\mathbb{E}[f_n(t)]=\int_{\mathbb{R}}\frac{1}{h}k\left(\frac{t-x}{h}\right)f(x)\mathrm{d}x=\int_{\mathbb{R}}k(u)f(t-hu)\mathrm{d}u##, so the estimator is in general biased, with bias of order ##h^m## under the moment conditions on the kernel.

3. What is the variance of a kernel density estimator?

The variance of a kernel density estimator at a point ##t## measures the sampling variability of the estimate. Since the estimator is an average of ##n## i.i.d. terms, ##\mathrm{Var}[f_n(t)]=\frac{1}{nh^2}\mathrm{Var}\left[k\left(\frac{t-X}{h}\right)\right]##, which for small ##h## is approximately ##\frac{f(t)}{nh}\int_{\mathbb{R}}k^2(u)\mathrm{d}u##, as derived in the thread above.

4. How do you choose the appropriate kernel function for a kernel density estimator?

The choice of kernel function for a kernel density estimator depends on the underlying data and the desired smoothness of the estimated probability density function. Commonly used kernel functions include the Gaussian, Epanechnikov, and uniform kernels. The optimal choice can be determined through cross-validation or other methods of model selection.
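As a sketch of the cross-validation idea (applied here to the bandwidth, which in practice affects the estimate far more than the choice of kernel; the leave-one-out log-likelihood is one common criterion, and the Gaussian kernel and the data below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200)  # assumed data for illustration
n = x.size

def loo_log_likelihood(h):
    # Leave-one-out log-likelihood of a Gaussian-kernel KDE at bandwidth h:
    # sum_i log f_{n,-i}(x_i), where f_{n,-i} omits observation i
    u = (x[:, None] - x[None, :]) / h
    kvals = np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(kvals, 0.0)  # drop the self term
    fi = kvals.sum(axis=1) / ((n - 1) * h)
    return np.sum(np.log(fi))

hs = np.linspace(0.05, 1.5, 60)
best = hs[np.argmax([loo_log_likelihood(h) for h in hs])]
print(f"cross-validated bandwidth ~ {best:.2f}")
```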

5. Can a kernel density estimator be used for any type of data?

Yes, a kernel density estimator can be used for any type of data, as long as the underlying probability distribution is continuous. It is particularly useful for data that do not follow a known parametric distribution, since it makes only smoothness assumptions about the underlying density rather than assumptions about its shape. However, it may not perform well for data with a large number of outliers or a highly skewed distribution.
