# How to calculate expectation and variance of kernel density estimator?

In summary, the exercise involves finding the bias, variance, and mean square error of a kernel estimator for a random variable with unknown distribution function. The exercise also requires making series expansions and determining the convergence rate. The solution involves using linearity of expectation, change of variables, and the law of the unconscious statistician. The bias is determined as a function of the bandwidth, while the variance and mean square error are determined as functions of the bandwidth and the random variable's probability density function. The exercise also requires showing that the mean square error is smallest for a certain value of the bandwidth, and determining the convergence rate for this value.
schniefen
TL;DR Summary
What is the expectation and variance of ##f_n(t)=\frac{1}{n}\sum_{i=1}^n \frac{1}{h}k\left(\frac{t-x_i}{h}\right)##?
This is a question from a mathematical statistics textbook used in the first and most basic mathematical statistics course for undergraduate students. The exercise follows the chapter on nonparametric inference. An attempt at a solution is given. Any help is appreciated.

Exercise:

Suppose ##x_1, ..., x_n## are independent and identically distributed (i.i.d.) observations of a random variable ##X## with unknown distribution function ##F## and probability density function ##f\in C^m##, for some ##m>1## fixed. Let
$$f_n(t)=\frac{1}{n}\sum_{i=1}^n \frac{1}{h}k\left(\frac{t-x_i}{h}\right)$$
be a kernel estimator of ##f##, with ##k\in C^{m+1}## a given fixed function such that ##k\geq 0##, ##\int_{\mathbb{R}} k(u)\mathrm{d}u=1##, ##\mathrm{supp} (k)=[-1,1]## and bandwidth ##h=h(u)## (for the time being unspecified).
1. Show that ##\mathbb{E}[f_n(t)]=\int_{\mathbb{R}} k(u) f(t-hu)\mathrm{d}u##.
2. Make a series expansion of ##f## around ##t## in terms of ##hu## in the expression for ##\mathbb{E}[f_n(t)]##. Suppose that ##k## satisfies ##\int_{\mathbb{R}} k(u)\mathrm{d}u=1##, ##\int_{\mathbb{R}} k(u)u^l\mathrm{d}u=0## for all ##1<l<m## and ##\int_{\mathbb{R}} k(u)u^m\mathrm{d}u<\infty##. Determine the bias ##\mathbb{E}[f_n(t)]-f(t)## as a function of ##h##.
3. Suppose that ##\mathrm{Var}[k(X_1)]<\infty## and determine ##\mathrm{Var}[f_n(t)]## as a function of ##h##.
4. Determine the mean square error ##\mathrm{mse}[f_n(t)]## from 2 and 3 as a function of ##h##.
5. For what value of ##h##, as a function of ##n##, is ##\mathrm{mse}[f_n(t)]## smallest?
6. For the value of ##h=h(n)## obtained from 5, how fast does ##\mathrm{mse}[f_n(t)]## converge to 0, when ##n## converges to ##\infty##?

Attempt:

1. By linearity of the expectation, identical distribution of ##x_1, ..., x_n##, the law of the unconscious statistician and the change of variables ##u=(t−x)/h##,
\begin{align*}
\mathbb{E}[f_n(t)]&=\frac{1}{n}\sum_{i=1}^n \mathbb{E}\left[\frac{1}{h}k\left(\frac{t-x_i}{h}\right)\right]\\
&=\mathbb{E}\left[\frac{1}{h}k\left(\frac{t-x}{h}\right)\right]\\
&=\int_{\mathbb{R}}\frac{1}{h}k\left(\frac{t-x}{h}\right)f(x)\mathrm{d}x\\
&=\int_{\mathbb{R}}\frac{1}{h}k(u)f(t-hu)h\mathrm{d}u\\
&=\int_{\mathbb{R}}k(u)f(t-hu)\mathrm{d}u.
\end{align*}

2. From ##f\in C^m##, it follows that $$f(t-hu)=\sum_{l=0}^m \frac{f^{(l)}(t)}{l!} (-hu)^l+o((hu)^m).$$
Then from 1 and linearity of integration,
\begin{align*}
\mathbb{E}[f_n(t)]&=\int_{\mathbb{R}}k(u)\left(\sum_{l=0}^m \frac{f^{(l)}(t)}{l!} (-hu)^l+o((hu)^m)\right)\mathrm{d}u\\
&=\sum_{l=0}^m\int_{\mathbb{R}}k(u)\frac{f^{(l)}(t)}{l!}(-hu)^l\mathrm{d}u+\int_{\mathbb{R}}k(u)o((hu)^m)\mathrm{d}u.
\end{align*}
From the given conditions on ##k##, the ##l=0## term reads
$$\int_{\mathbb{R}} k(u)f(t)\mathrm{d}u=f(t)\int_{\mathbb{R}} k(u) \mathrm{d}u=f(t)1=f(t).$$
The ##1\leq l<m## terms are
$$\int_{\mathbb{R}} k(u)\frac{f^{(l)}(t)}{l!} (-hu)^l\mathrm{d}u=\frac{f^{(l)}(t)(-h)^l}{l!}\int_{\mathbb{R}} k(u)u^l\mathrm{d}u=0.$$
Finally, the ##l=m## term is $$\frac{f^{(m)}(t)(-h)^m}{m!}\int_{\mathbb{R}} k(u)u^m\mathrm{d}u<\infty.$$
The remainder term is simply,
$$\int_\mathbb{R} k(u) o((uh)^m)\mathrm{d}u = o(h^m)\int_\mathbb{R} k(u) u^m\mathrm{d}u = o(h^m).$$
Putting it all together:
$$\mathbb{E}[f_n(t)] = f(t) + \frac{f^{(m)}(t)(-h)^m}{m!} \int_{\mathbb{R}}k(u)u^m \mathrm{d}u + o(h^m),$$ and thus $$\mathbb{E}[f_n(t)]-f(t)=\frac{f^{(m)}(t)(-h)^m}{m!} \int_{\mathbb{R}}k(u)u^m \mathrm{d}u + o(h^m)=A(t)h^m+o(h^m),$$ where ##A(t)=\frac{f^{(m)}(t)(-1)^m}{m!} \int_{\mathbb{R}}k(u)u^m \mathrm{d}u<\infty.##
Note that this solution assumes that ##h\neq h(u)## and that ##\int_{\mathbb{R}} k(u)u^l\mathrm{d}u=0## for all ##1\leq l < m##. Is this reasonable?
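A quick numerical sanity check of the bias formula in part 2 can be sketched as follows. Everything concrete here is an assumption chosen for illustration, not part of the exercise: the Epanechnikov kernel (which has ##m=2## and ##\int k(u)u^2\mathrm{d}u=1/5##, and is not ##C^{m+1}## at ##\pm 1##, although that smoothness is not used in the calculation above), a standard normal ##f##, and the helper names `expected_kde` and `leading_bias`.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel on [-1, 1]; only evaluated inside its support below.
    Its first moment is 0 and its second moment is 1/5, so m = 2."""
    return 0.75 * (1.0 - u**2)

def normal_pdf(x):
    """Standard normal density, standing in for the unknown f (an assumption)."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def expected_kde(t, h, order=64):
    """E[f_n(t)] = int_{-1}^{1} k(u) f(t - h u) du, by Gauss-Legendre quadrature."""
    u, w = np.polynomial.legendre.leggauss(order)
    return np.sum(w * epanechnikov(u) * normal_pdf(t - h * u))

def leading_bias(t, h):
    """A(t) h^m for m = 2: f''(t) (-h)^2 / 2! * int k(u) u^2 du."""
    second_derivative = (t**2 - 1.0) * normal_pdf(t)  # f'' of the standard normal
    return second_derivative * h**2 / 2.0 * (1.0 / 5.0)

t = 0.5
for h in (0.4, 0.2, 0.1, 0.05):
    exact = expected_kde(t, h) - normal_pdf(t)
    print(f"h={h:<5} exact bias={exact: .3e}  A(t)h^2={leading_bias(t, h): .3e}")
```

If the expansion is right, the two columns should agree more and more closely as ##h## shrinks, since the neglected remainder is ##o(h^m)##.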

3. This solution was partly suggested by someone else. By independence and the change of variables from 1,
\begin{align*}
\mathrm{Var}[f_n(t)]&=\mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^n \frac{1}{h}k\left(\frac{t-x_i}{h}\right)\right]\\
&=\frac{1}{n^2h^2}\sum_{i=1}^n\mathrm{Var}\left[k\left(\frac{t-x_i}{h}\right)\right]\\
&=\frac{1}{nh^2}\mathrm{Var}\left[k\left(\frac{t-x}{h}\right)\right]\\
&=\frac{1}{nh^2}\left(\mathbb{E}\left[k^2\left(\frac{t-x}{h}\right)\right]-\left(\mathbb{E}\left[k\left(\frac{t-x}{h}\right)\right]\right)^2\right)\\
&=\frac{1}{nh^2}\left(h\int_{\mathbb{R}}k^2(u)f(t-hu)\mathrm{d}u-\left(h\int_{\mathbb{R}}k(u)f(t-hu)\mathrm{d}u\right)^2\right)\\
&=\frac{1}{nh^2}\left(h\int_{\mathbb{R}}k^2(u)\left(f(t)-(hu)f^{(1)}(t)+o(hu)\right)\mathrm{d}u +O(h^2) \right)\\
&=\frac{1}{nh^2}\left(h\cdot f(t)\int_{\mathbb{R}}k^2(u)\mathrm{d}u - h^2 f^{(1)}(t)\int_{\mathbb{R}}k^2(u)u\mathrm{d}u + o(h^2) +O(h^2) \right)\\
&=\frac{1}{nh^2}\left(h\cdot f(t)\int_{\mathbb{R}}k^2(u)\mathrm{d}u +O(h^2) \right)\\
&=\frac{f(t)}{nh}\int_{\mathbb{R}}k^2(u)\mathrm{d}u +o\left(\frac{1}{nh}\right),
\end{align*}
since ##o(h^2) + O(h^2) = O(h^2)## (second-to-last equality) and ##O(h^2) \cdot \frac{1}{nh^2} = O(h) \cdot\frac{1}{nh} = o(1)\cdot \frac{1}{nh} = o\left(\frac{1}{nh}\right)## (last equality). What happens to ##- h^2 f^{(1)}(t)\int_{\mathbb{R}}k^2(u)u\mathrm{d}u## between the third-to-last and the second-to-last equality?
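The variance in part 3 can also be checked by simulation. The sketch below compares three numbers: a Monte Carlo variance of ##f_n(t)##, the exact expression from the fifth line of the derivation (evaluated by quadrature), and the leading term ##\frac{f(t)}{nh}\int k^2(u)\mathrm{d}u##. The Epanechnikov kernel (for which ##\int k^2(u)\mathrm{d}u=3/5##), the standard normal ##f##, the parameter values and all function names are assumptions made for illustration only.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, zero outside its support [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def normal_pdf(x):
    """Standard normal density, standing in for the unknown f (an assumption)."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def variance_exact(t, h, n, order=64):
    """(1/(n h^2)) * (h int k^2 f(t-hu) du - (h int k f(t-hu) du)^2), by quadrature."""
    u, w = np.polynomial.legendre.leggauss(order)
    first = np.sum(w * epanechnikov(u) * normal_pdf(t - h * u))
    second = np.sum(w * epanechnikov(u) ** 2 * normal_pdf(t - h * u))
    return (h * second - (h * first) ** 2) / (n * h**2)

def variance_leading(t, h, n):
    """f(t)/(n h) * int k^2(u) du, with int k^2 = 3/5 for the Epanechnikov kernel."""
    return normal_pdf(t) * 0.6 / (n * h)

def variance_monte_carlo(t, h, n, reps, seed=0):
    """Sample variance of f_n(t) over `reps` independent samples of size n."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((reps, n))
    estimates = epanechnikov((t - samples) / h).mean(axis=1) / h
    return estimates.var(ddof=1)

t, h, n = 0.0, 0.05, 1000
print("Monte Carlo :", variance_monte_carlo(t, h, n, reps=2000))
print("exact       :", variance_exact(t, h, n))
print("leading term:", variance_leading(t, h, n))
```

The gap between the exact expression and the leading term comes mainly from the ##-\left(\mathbb{E}[f_n(t)]\right)^2/n## contribution, which is ##O(1/n)## and hence ##o\left(\frac{1}{nh}\right)## when ##h\to 0##.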

My assertion about ##h\neq h(u)## follows from this: if ##h=h(u)## is in fact true, then ##o((hu)^m)=u^mo(h^m)## and thus ##\int_\mathbb{R} k(u) o((uh)^m)\mathrm{d}u = \int_\mathbb{R} o(h^m)k(u) u^m\mathrm{d}u.## However, is it possible to just pull the ##o(h^m)## out of the integral and use the condition that ##\int_\mathbb{R} k(u) u^m\mathrm{d}u < \infty##?

I'm still lost on the first part. Is the ##u## in part 1 supposed to be the same ##u## that ##h## is a function of? You assumed it was constant when doing the change of variables.

Office_Shredder said:
I'm still lost on the first part. Is the ##u## in part 1 supposed to be the same ##u## that ##h## is a function of?
As I look at the exercise again, I'm unsure if it isn't supposed to be ##h=h(n)## as specified in part 6. Would this make sense?
Office_Shredder said:
You assumed it was constant when doing the change of variables.
How come? The change of variables is ##u=(t−x)/h## and it implies ##h\mathrm{d}u=\mathrm{d}x##.

schniefen said:
How come? The change of variables is ##u=(t−x)/h## and it implies ##h\mathrm{d}u=\mathrm{d}x##.

If h is a function of u (and hence of x also?) then this isn't true. This doesn't make a ton of sense as a set up, so I think your conjecture that the u is a typo seems right to me.

schniefen
Office_Shredder said:
If h is a function of u (and hence of x also?) then this isn't true. This doesn't make a ton of sense as a set up, so I think your conjecture that the u is a typo seems right to me.
Now, does ##o((hu)^m)=u^mo(h^m)## still hold if ##h\neq h(u)##?

schniefen said:
Now, does ##o((hu)^m)=u^mo(h^m)## still hold if ##h\neq h(u)##?
If one treats ##u=u(n)## (since ##u=(t-x)/h(n)##), then one could pull out factors of ##u## if the variable inside ##o## changes from ##hu## to ##n##. In the Taylor expansion the variable is ##hu##, i.e. there is a function of ##hu## that goes to ##0## faster than ##(hu)^m## as ##hu\to 0##. I guess one would have to find out what ##hu\to 0## is equivalent to in terms of ##n##. It isn't stated in the exercise, but to obtain reasonable estimates, ##h(n)\to 0## as ##n\to \infty##. Does this sound legitimate, @Office_Shredder? I have skimmed through different lecture notes about this, and in many of them a calculation similar to the one above is carried out.

schniefen said:
Now, does ##o((hu)^m)=u^mo(h^m)## still hold if ##h\neq h(u)##?

Yes. In fact, you are going to run into more problems if h is a function of u, since now if h is small you have to consider what is happening to u. For example if ##h=1/u## the thing on the left is not a very impressive statement, and is not equal to the thing on the right.

Office_Shredder said:
Yes. In fact, you are going to run into more problems if h is a function of u, since now if h is small you have to consider what is happening to u. For example if ##h=1/u## the thing on the left is not a very impressive statement, and is not equal to the thing on the right.

Why is ##o((hu)^m)=o\left(\left(\frac{1}{u}u\right)^m\right)=o(1)## not equal to ##u^mo(h^m)=u^mo\left(\frac{1}{u^m}\right)=o(1)##?

schniefen said:
Summary:: What is the expectation and variance of ##f_n(t)=\frac{1}{n}\sum_{i=1}^n \frac{1}{h}k\left(\frac{t-x_i}{h}\right)##?

[...] be a kernel estimator of ##f##, with ##k\in C^{m+1}## a given fixed function such that ##k\geq 0##, ##\int_{\mathbb{R}} k(u)\mathrm{d}u=1##, ##\mathrm{supp} (k)=[-1,1]## and bandwidth ##h=h(n)## (for the time being unspecified; the original ##h(u)## is struck out in this quote). [...]
Find attached a solution I have written.

#### Attachments

• exercise22_6.pdf

## 1. What is a kernel density estimator?

A kernel density estimator is a non-parametric method used to estimate the probability density function of a random variable based on a set of observations. It works by smoothing the observed data points using a kernel function, such as a Gaussian or Epanechnikov function, to create a continuous estimate of the underlying probability distribution.
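As a minimal sketch of the definition ##f_n(t)=\frac{1}{n}\sum_{i=1}^n \frac{1}{h}k\left(\frac{t-x_i}{h}\right)##, the following evaluates the estimator on a grid. The Epanechnikov kernel, the simulated standard normal data, the bandwidth ##h=0.4## and the function names are all illustrative assumptions.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: nonnegative, integrates to 1, support [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kde(t, sample, h, kernel=epanechnikov):
    """Evaluate f_n(t) = (1/n) * sum_i (1/h) * k((t - x_i) / h) at each point of t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    sample = np.asarray(sample, dtype=float)
    # Rows index evaluation points, columns index observations.
    u = (t[:, None] - sample[None, :]) / h
    return kernel(u).mean(axis=1) / h

rng = np.random.default_rng(0)        # fixed seed, for reproducibility only
sample = rng.standard_normal(500)     # illustrative data: standard normal
grid = np.linspace(-6.0, 6.0, 1201)   # evaluation grid with step 0.01
density = kde(grid, sample, h=0.4)

# The estimate is itself a density: nonnegative and integrating to about 1.
area = density.sum() * (grid[1] - grid[0])
print(f"area under the estimate: {area:.4f}")
```

Because the kernel is nonnegative and integrates to 1, the estimate inherits both properties, which the last two lines check numerically with a simple Riemann sum.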

## 2. How do you calculate the expectation of a kernel density estimator?

The expectation of a kernel density estimator at a point ##t## is a kernel-weighted average of the true density around ##t##. For i.i.d. observations with density ##f##, linearity of expectation and the substitution ##u=(t-x)/h## give ##\mathbb{E}[f_n(t)]=\int_{\mathbb{R}}\frac{1}{h}k\left(\frac{t-x}{h}\right)f(x)\mathrm{d}x=\int_{\mathbb{R}}k(u)f(t-hu)\mathrm{d}u##, where ##k## is the chosen kernel function and ##h## is the bandwidth.

## 3. What is the variance of a kernel density estimator?

The variance of a kernel density estimator measures how much the estimate ##f_n(t)## at a fixed point ##t## fluctuates from sample to sample. For i.i.d. observations it is ##\mathrm{Var}[f_n(t)]=\frac{1}{nh^2}\left(\mathbb{E}\left[k^2\left(\frac{t-X}{h}\right)\right]-\left(\mathbb{E}\left[k\left(\frac{t-X}{h}\right)\right]\right)^2\right)##, which for small ##h## is approximately ##\frac{f(t)}{nh}\int_{\mathbb{R}}k^2(u)\mathrm{d}u##.
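Combining the variance ##\frac{f(t)}{nh}\int_{\mathbb{R}}k^2(u)\mathrm{d}u+o\left(\frac{1}{nh}\right)## from part 3 of the thread with the bias ##A(t)h^m+o(h^m)## from part 2 gives a rough picture of the trade-off asked for in parts 4 to 6 of the exercise. What follows is a heuristic sketch, not the textbook's solution: it drops the remainder terms, assumes ##A(t)\neq 0## and ##f(t)>0##, and uses the shorthand ##B(t)=f(t)\int_{\mathbb{R}}k^2(u)\mathrm{d}u## and a constant ##C(t)##, both introduced here only for brevity.
\begin{align*}
\mathrm{mse}[f_n(t)]&=\left(\mathbb{E}[f_n(t)]-f(t)\right)^2+\mathrm{Var}[f_n(t)]\approx A(t)^2h^{2m}+\frac{B(t)}{nh},\\
0&=\frac{\mathrm{d}}{\mathrm{d}h}\left(A(t)^2h^{2m}+\frac{B(t)}{nh}\right)=2mA(t)^2h^{2m-1}-\frac{B(t)}{nh^2}
\quad\Longrightarrow\quad h(n)=\left(\frac{B(t)}{2mA(t)^2n}\right)^{\frac{1}{2m+1}},\\
\mathrm{mse}[f_n(t)]\Big|_{h=h(n)}&\approx C(t)\,n^{-\frac{2m}{2m+1}}.
\end{align*}
Under these assumptions the squared bias and the variance are of the same order at the minimizing bandwidth. For ##m=2## this reads ##h(n)\propto n^{-1/5}## and ##\mathrm{mse}\propto n^{-4/5}##.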

## 4. How do you choose the appropriate kernel function for a kernel density estimator?

The choice of kernel function for a kernel density estimator depends on the underlying data and the desired smoothness of the estimated probability density function. Commonly used kernel functions include the Gaussian, Epanechnikov, and uniform kernels. The optimal choice can be determined through cross-validation or other methods of model selection.

## 5. Can a kernel density estimator be used for any type of data?

A kernel density estimator can be used for any data whose underlying distribution is continuous, i.e. has a probability density function. It is particularly useful for data that do not follow a known parametric distribution, since it assumes only that the density is smooth rather than that it has a specific shape. However, it may not perform well for data with a large number of outliers or a highly skewed distribution.
