How to calculate expectation and variance of kernel density estimator?


Discussion Overview

The discussion revolves around the calculation of expectation and variance of a kernel density estimator, as presented in a mathematical statistics exercise. Participants explore theoretical aspects of nonparametric inference, focusing on the properties of kernel estimators, including bias, variance, and mean square error, while addressing specific mathematical conditions and assumptions.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • Some participants assert that the expectation of the kernel estimator can be expressed as an integral involving the true density function, based on the law of the unconscious statistician and change of variables.
  • Others propose a series expansion of the density function around a point, leading to a bias term that depends on the bandwidth and the derivatives of the density function.
  • There is a discussion about the conditions on the kernel function, particularly regarding its moments and support, and how these affect the bias and variance calculations.
  • Some participants question the stated assumption that the bandwidth is a function of the integration variable rather than of the sample size, leading to confusion about the implications for the change of variables used in the calculations.
  • Concerns are raised about the validity of pulling terms out of integrals when the bandwidth is treated as a function of the variable in the integral.
  • Participants explore the convergence of mean square error as the sample size increases and the implications of choosing an optimal bandwidth.
  • There is ongoing debate about whether certain mathematical expressions hold under the assumption that the bandwidth is not a function of the variable being integrated.

Areas of Agreement / Disagreement

Participants express differing views on the assumptions regarding the bandwidth function and its implications for the calculations. There is no consensus on whether the bandwidth should be treated as constant or variable, leading to unresolved questions about the validity of certain mathematical steps.

Contextual Notes

Limitations include assumptions about the behavior of the bandwidth function, the treatment of the kernel function's moments, and the implications of variable dependencies in the change of variables used in the calculations. These aspects remain unresolved and are subject to further clarification.

Who May Find This Useful

Readers interested in mathematical statistics, particularly in nonparametric methods and kernel density estimation, may find this discussion relevant for understanding the complexities involved in expectation and variance calculations.

schniefen
TL;DR
What is the expectation and variance of ##f_n(t)=\frac{1}{n}\sum_{i=1}^n \frac{1}{h}k\left(\frac{t-x_i}{h}\right)##?
This is a question from a mathematical statistics textbook used in a first, introductory course in mathematical statistics for undergraduate students. The exercise follows the chapter on nonparametric inference. An attempt at a solution is given below. Any help is appreciated.

Exercise:

Suppose ##x_1, ..., x_n## are independent and identically distributed (i.i.d.) observations of a random variable ##X## with unknown distribution function ##F## and probability density function ##f\in C^m##, for some ##m>1## fixed. Let
$$f_n(t)=\frac{1}{n}\sum_{i=1}^n \frac{1}{h}k\left(\frac{t-x_i}{h}\right)$$
be a kernel estimator of ##f##, with ##k\in C^{m+1}## a given fixed function such that ##k\geq 0##, ##\int_{\mathbb{R}} k(u)\mathrm{d}u=1##, ##\mathrm{supp} (k)=[-1,1]## and bandwidth ##h=h(u)## (for the time being unspecified).
1. Show that ##\mathbb{E}[f_n(t)]=\int_{\mathbb{R}} k(u) f(t-hu)\mathrm{d}u##.
2. Make a series expansion of ##f## around ##t## in terms of ##hu## in the expression for ##\mathbb{E}[f_n(t)]##. Suppose that ##k## satisfies ##\int_{\mathbb{R}} k(u)\mathrm{d}u=1##, ##\int_{\mathbb{R}} k(u)u^l\mathrm{d}u=0## for all ##1<l<m## and ##\int_{\mathbb{R}} k(u)u^m\mathrm{d}u<\infty##. Determine the bias ##\mathbb{E}[f_n(t)]-f(t)## as a function of ##h##.
3. Suppose that ##\mathrm{Var}[k(X_1)]<\infty## and determine ##\mathrm{Var}[f_n(t)]## as a function of ##h##.
4. Determine the mean square error ##\mathrm{mse}[f_n(t)]## from 2 and 3 as a function of ##h##.
5. For what value of ##h##, as a function of ##n##, is ##\mathrm{mse}[f_n(t)]## smallest?
6. For the value of ##h=h(n)## obtained from 5, how fast does ##\mathrm{mse}[f_n(t)]## converge to 0, when ##n## converges to ##\infty##?
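
For concreteness, here is a minimal numerical sketch of the estimator itself (my own illustration, not part of the exercise). It assumes the Epanechnikov kernel, which satisfies ##k\geq 0##, ##\int_{\mathbb{R}} k(u)\mathrm{d}u=1## and ##\mathrm{supp}(k)=[-1,1]##, although it is not ##C^{m+1}## at ##\pm 1##; the function names are ad hoc.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: nonnegative, integrates to 1, support [-1, 1]."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def kde(t, x, h, k=epanechnikov):
    """Kernel density estimate f_n(t) = (1/(n h)) sum_i k((t - x_i)/h)."""
    t = np.atleast_1d(t)
    u = (t[:, None] - x[None, :]) / h  # one row per t, one column per x_i
    return k(u).mean(axis=1) / h

# Example: estimate a standard normal density from 1000 observations.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
print(kde(np.array([0.0, 1.0]), x, h=0.3))  # roughly [0.40, 0.24]
```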

Attempt:

1. By linearity of the expectation, identical distribution of ##x_1, ..., x_n##, the law of the unconscious statistician and the change of variables ##u=(t−x)/h##,
\begin{align*}
\mathbb{E}[f_n(t)]&=\frac{1}{n}\sum_{i=1}^n \mathbb{E}\left[\frac{1}{h}k\left(\frac{t-x_i}{h}\right)\right]\\
&=\mathbb{E}\left[\frac{1}{h}k\left(\frac{t-x}{h}\right)\right]\\
&=\int_{\mathbb{R}}\frac{1}{h}k\left(\frac{t-x}{h}\right)f(x)\mathrm{d}x\\
&=\int_{\mathbb{R}}\frac{1}{h}k(u)f(t-hu)h\mathrm{d}u\\
&=\int_{\mathbb{R}}k(u)f(t-hu)\mathrm{d}u.
\end{align*}
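
As a quick sanity check on 1 (my own addition, not part of the attempt), one can compare a Monte Carlo estimate of ##\mathbb{E}[f_n(t)]## with the integral ##\int_{\mathbb{R}} k(u) f(t-hu)\mathrm{d}u##, assuming a standard normal ##f## and the Epanechnikov kernel sketched above:

```python
import numpy as np
from scipy.integrate import quad

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def f(x):
    # True density: standard normal.
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

t, h, n, reps = 0.5, 0.2, 200, 5000
rng = np.random.default_rng(1)

# Monte Carlo: average f_n(t) over many independent samples of size n.
samples = rng.standard_normal((reps, n))
fn = (epanechnikov((t - samples) / h) / h).mean(axis=1)

# Right-hand side of 1: the integral over the kernel's support [-1, 1].
rhs, _ = quad(lambda u: float(epanechnikov(u)) * f(t - h * u), -1, 1)

print(fn.mean(), rhs)  # the two should agree up to Monte Carlo error
```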

2. From ##f\in C^m##, it follows that $$f(t-hu)=\sum_{l=0}^m \frac{f^{(l)}(t)}{l!} (-hu)^l+o((hu)^m).$$
Then from 1 and linearity of integration,
\begin{align*}
\mathbb{E}[f_n(t)]&=\int_{\mathbb{R}}k(u)\left(\sum_{l=0}^m \frac{f^{(l)}(t)}{l!} (-hu)^l+o((hu)^m)\right)\mathrm{d}u\\
&=\sum_{l=0}^m\int_{\mathbb{R}}k(u)\frac{f^{(l)}(t)}{l!}(-hu)^l\mathrm{d}u+\int_{\mathbb{R}}k(u)\,o((hu)^m)\mathrm{d}u.
\end{align*}
From the given conditions on ##k##, the ##l=0## term reads
$$\int_{\mathbb{R}} k(u)f(t)\mathrm{d}u=f(t)\int_{\mathbb{R}} k(u) \mathrm{d}u=f(t)1=f(t).$$
The ##1\leq l<m## terms are
$$\int_{\mathbb{R}} k(u)\frac{f^{(l)}(t)}{l!} (-hu)^l\mathrm{d}u=\frac{f^{(l)}(t)(-h)^l}{l!}\int_{\mathbb{R}} k(u)u^l\mathrm{d}u=0.$$
Finally, the ##l=m## term is $$ \frac{f^{(m)}(t)(-h)^m}{m!}\int_{\mathbb{R}} k(u)u^m\mathrm{d}u,$$ which is finite by the assumption ##\int_{\mathbb{R}} k(u)u^m\mathrm{d}u<\infty##.
The remainder term is simply,
$$\int_\mathbb{R} k(u) o((uh)^m)\mathrm{d}u = o(h^m)\int_\mathbb{R} k(u) u^m\mathrm{d}u = o(h^m).$$
Putting it all together:
$$\mathbb{E}[f_n(t)] = f(t) + \frac{f^{(m)}(t)(-h)^m}{m!} \int_{\mathbb{R}}k(u)u^m \mathrm{d}u + o(h^m),$$ and thus $$\mathbb{E}[f_n(t)]-f(t)=\frac{f^{(m)}(t)(-h)^m}{m!} \int_{\mathbb{R}}k(u)u^m \mathrm{d}u + o(h^m)=A(t)h^m+o(h^m),$$ where ##A(t)=\frac{f^{(m)}(t)(-1)^m}{m!} \int_{\mathbb{R}}k(u)u^m \mathrm{d}u<\infty.##
Note that this solution assumes that ##h\neq h(u)## and that ##\int_{\mathbb{R}} k(u)u^l\mathrm{d}u=0## for all ##1\leq l < m##. Is this reasonable?
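
To illustrate 2 numerically (my addition, under assumptions not in the exercise): the symmetric Epanechnikov kernel has ##\int_{\mathbb{R}} u\,k(u)\mathrm{d}u=0## and ##\int_{\mathbb{R}} u^2 k(u)\mathrm{d}u=1/5##, so it satisfies the moment conditions with ##m=2##, and the bias should then behave like ##A(t)h^2## with ##A(t)=\tfrac{1}{2}f^{(2)}(t)\cdot\tfrac{1}{5}##. Computing ##\mathbb{E}[f_n(t)]## via the integral from 1:

```python
import numpy as np
from scipy.integrate import quad

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def f(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

t = 0.0  # f''(0) = -f(0), so A(0) = -f(0)/10, roughly -0.0399
for h in [0.4, 0.2, 0.1, 0.05]:
    # E[f_n(t)] from part 1, computed over the kernel's support [-1, 1].
    mean_fn, _ = quad(lambda u: float(epanechnikov(u)) * f(t - h * u), -1, 1)
    print(h, (mean_fn - f(t)) / h**2)  # ratio should approach A(0)
```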

3. This solution was partly suggested by someone else. By independence and the change of variables in 2,
\begin{align*}
\mathrm{Var}[f_n(t)]&=\mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^n \frac{1}{h}k\left(\frac{t-x_i}{h}\right)\right]\\
&=\frac{1}{n^2h^2}\sum_{i=1}^n\mathrm{Var}\left[k\left(\frac{t-x_i}{h}\right)\right]\\
&=\frac{1}{nh^2}\mathrm{Var}\left[k\left(\frac{t-x}{h}\right)\right]\\
&=\frac{1}{nh^2}\left(\mathbb{E}\left[k^2\left(\frac{t-x}{h}\right)\right]-\left(\mathbb{E}\left[k\left(\frac{t-x}{h}\right)\right]\right)^2\right)\\
&=\frac{1}{nh^2}\left(h\int_{\mathbb{R}}k^2(u)f(t-hu)\mathrm{d}u-\left(h\int_{\mathbb{R}}k(u)f(t-hu)\mathrm{d}u\right)^2\right)\\
&=\frac{1}{nh^2}\left(h\int_{\mathbb{R}}k^2(u)\left(f(t)-(hu)f^{(1)}(t)+o(hu)\right)\mathrm{d}u +O(h^2) \right)\\
&=\frac{1}{nh^2}\left(h\cdot f(t)\int_{\mathbb{R}}k^2(u)\mathrm{d}u - h^2 f^{(1)}(t)\int_{\mathbb{R}}k^2(u)u\mathrm{d}u + o(h^2) +O(h^2) \right)\\
&=\frac{1}{nh^2}\left(h\cdot f(t)\int_{\mathbb{R}}k^2(u)\mathrm{d}u +O(h^2) \right)\\
&=\frac{f(t)}{nh}\int_{\mathbb{R}}k^2(u)\mathrm{d}u +o\left(\frac{1}{nh}\right),
\end{align*}
since ##o(h^2) + O(h^2) = O(h^2)## (second-to-last equality) and ##O(h^2) \cdot \frac{1}{nh^2} = O(h) \cdot\frac{1}{nh} = o(1)\cdot \frac{1}{nh} = o\left(\frac{1}{nh}\right)## (last equality). What happens to ##- h^2 f^{(1)}(t)\int_{\mathbb{R}}k^2(u)u\,\mathrm{d}u## between the third-to-last and the second-to-last equalities?
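
A numerical check of the leading variance term in 3 (again my addition, same assumptions as before): for the Epanechnikov kernel, ##\int_{\mathbb{R}} k^2(u)\mathrm{d}u = 3/5##, so the claim is ##\mathrm{Var}[f_n(t)] \approx \tfrac{3}{5}\,f(t)/(nh)##.

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def f(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

t, h, n, reps = 0.0, 0.2, 500, 10000
rng = np.random.default_rng(2)

# Simulate many samples of size n and compute f_n(t) for each.
samples = rng.standard_normal((reps, n))
fn = (epanechnikov((t - samples) / h) / h).mean(axis=1)

# Leading term from the derivation above: (3/5) * f(t) / (n h).
print(fn.var(), 0.6 * f(t) / (n * h))  # should agree up to o(1/(nh))
```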
 
My assertion about ##h\neq h(u)## comes from the following: if ##h=h(u)## is in fact true, then ##o((hu)^m)=u^m o(h^m)## and thus ##
\int_\mathbb{R} k(u)\, o((uh)^m)\,\mathrm{d}u = \int_\mathbb{R} o(h^m)\,k(u)\, u^m\,\mathrm{d}u.## However, is it then possible to just pull the ##o(h^m)## out of the integral and use the condition that ##\int_\mathbb{R} k(u)\, u^m\,\mathrm{d}u < \infty##?
 
I'm still lost on the first part. Is the ##u## in part 1 supposed to be the same ##u## that ##h## is a function of? You assumed it was constant when doing the change of variables.
 
Thanks for the reply.
Office_Shredder said:
I'm still lost on the first part. Is the ##u## in part 1 supposed to be the same ##u## that ##h## is a function of?
As I look at the exercise again, I'm unsure if it isn't supposed to be ##h=h(n)## as specified in part 6. Would this make sense?
Office_Shredder said:
You assumed it was constant when doing the change of variables.
How come? The change of variables is ##u=(t−x)/h## and it implies ##h\mathrm{d}u=\mathrm{d}x##.
 
schniefen said:
How come? The change of variables is ##u=(t−x)/h## and it implies ##h\mathrm{d}u=\mathrm{d}x##.

If ##h## is a function of ##u## (and hence of ##x## also?), then this isn't true. That doesn't make much sense as a setup, so your conjecture that the ##u## is a typo seems right to me.
 
Office_Shredder said:
If h is a function of u (and hence of x also?) then this isn't true. This doesn't make a ton of sense as a set up, so I think your conjecture that the u is a typo seems right to me.
Now, does ##o((hu)^m)=u^mo(h^m)## still hold if ##h\neq h(u)##?
 
schniefen said:
Now, does ##o((hu)^m)=u^mo(h^m)## still hold if ##h\neq h(u)##?
If one treats ##u=u(n)## (since ##u=(t-x)/h(n)##), then one could pull out factors of ##u## if the variable inside the ##o## suddenly changed from ##hu## to ##n##. From the Taylor expansion, the variable is assumed to be ##hu##, i.e. there is a function of ##hu## that goes to ##0## faster than ##(hu)^m## as ##hu\to 0##. I guess one would have to find out what ##hu\to 0## is equivalent to in terms of ##n##. It isn't stated in the exercise, but to obtain reasonable estimates one needs ##h(n)\to 0## as ##n\to \infty##. Does this sound legitimate, @Office_Shredder? I have skimmed through various lecture notes on this topic, and in many of them a calculation similar to the one above is carried out.
 
schniefen said:
Now, does ##o((hu)^m)=u^mo(h^m)## still hold if ##h\neq h(u)##?

Yes. In fact, you are going to run into more problems if h is a function of u, since now if h is small you have to consider what is happening to u. For example if ##h=1/u## the thing on the left is not a very impressive statement, and is not equal to the thing on the right.
 
Office_Shredder said:
Yes. In fact, you are going to run into more problems if h is a function of u, since now if h is small you have to consider what is happening to u. For example if ##h=1/u## the thing on the left is not a very impressive statement, and is not equal to the thing on the right.

Why is ##o((hu)^m)=o\left(\left(\frac{1}{u}u\right)^m\right)=o(1)## not equal to ##u^mo(h^m)=u^mo\left(\frac{1}{u^m}\right)=o(1)##?
 
schniefen said:
TL;DR: What is the expectation and variance of ##f_n(t)=\frac{1}{n}\sum_{i=1}^n \frac{1}{h}k\left(\frac{t-x_i}{h}\right)##?

[The full exercise from the first post is quoted here, with one correction: the bandwidth is ##h=h(n)## rather than ##h=h(u)##.]
Find attached a solution I have written.
 

