How to calculate the bandwidth for a Kernel density estimation

Jack Miller · Mar 12, 2011

Hello everybody!

Since some time I am trying the estimate the density of a set of numbers (in my case the numbers are distances to some object from a laser scanner).

As I read, that the kernel density estimation technique is a basic approach for that kind of problem. Different Kernels can be applied, e.g. a Gaussian kernel.
However the Variance/Bandwidth must be estimated for a correct project of the density from the given data(-->here the set of distances)

There are some measures like the "mean integrated squared error". In that case a set of test bandwidth are calculated and the one with the lowest MISE is a good choice for the bandwidth.
However there are other methods like the "plug in" method which results to an optimal bandwidth. Does anybody knows how to use that "plug in" method or knows some other method? I just stucked and that point of estimated the optimal bandwidth.

Any help or comment, I would be really glad!

SW VandeCarr · Mar 14, 2011

This paper describes the "Solve the Equation" plug in method which differs from the direct plug in approach in that the pilot bandwidth is written as a function of the kernel bandwidth h. (see section 5.2).

http://www.umiacs.umd.edu/~vikas/publications/CS-TR-4774.pdf

Jack Miller · Mar 14, 2011

Thanks a lot for your reply. I have read about the plug - in method. To be honest I couldn't completely figure out how to use it. For instance in the section you mentioned -- 5.2. -- on page 11. I am not sure how to compute step (1), how to estimate σ^. or what is H(.) in step 3?

By the way... Currently I am using a Gaussian kernel. As input I have a vector of about 100-150 double numbers.

Is there maybe some easier approximation available?
Or maybe some example?

I apologize, for those questions. I m trying since a week to figure out to calculate the bandwidth, even I tried with histogram smoothing which was not really successful.

SW VandeCarr · Mar 15, 2011

From your original question, I got the impression that you were more familiar with the method. It's computationally intensive. What software are you running? My experience a while back, working in a multidisciplinary group, was with Minitab. I'm sure there are other good packages for this (SAS), but stay away from Excel. It has a bad reputation according to some experienced participants in this forum.

I assume you have some reasons for using KDE instead of parametric methods. However, if all you want to do is find the most likely distribution for a data set, MLE works pretty well unless your data has a lot of noise.

One motivation for KDE is for smoothing noisy data. A forerunner equation for determining the distribution of some random variable is:

[tex] dX_t=\mu (X_t) dt + \sigma(X_t)dW_t[/tex] where [tex]W_t[/tex] is Standard Brownian Noise.

You should have seen this equation if you are attempting KDE:

[tex]f_h(x)=\frac{1}{nh}\sum_{i=1}^{n} K \frac{(x-x_i)}{h}[/tex]

where K is your kernel density function which integrates to 1. In your case it is Gaussian.

h is your bandwidth which is chosen to optimize between smoothness (larger) and reduced bias (smaller). Sigma is your standard deviation parameter. If I remember correctly, your software should let you vary the bandwidth and display the resulting plot with the MSE.

My google searches have revealed a lot material on this. We can't give tutorials on a subject like KDE but we might be able to help with specific questions. However, you don't need to understand the details of the computations to use the software.

Jack Miller · Mar 16, 2011

Thanks a lot for the answer.

There are many toolboxes in Matlab or in Python. Even I tried to use some applets.
It has shown that the bandwidth is a major factor that influences the estimated density of a distribution. Moreover the bandwidth estimation must run automatically.

I am suppose to use C++. I would like to give a short description about the project:
It is about object recognition in 3d. For each object I generate a 3d mesh which consists a set of nodes.
The nodes are connected by edges. Afterwards I compute for each node the shortest distances to all other nodes (all-pair-shortest-distance).
Then I compute for each node the mean distance (a kind of centrality measure for that node). This set of mean distances from all nodes
characterizes the object.
Then it is possible to compare sets of distances from different object to differentiate them.
I am follwing the approach which is described here: http://graphics.stanford.edu/courses/cs468-08-fall/pdf/hamza-krim.pdf .

In order to compare the set from distances from different objects, the density of distances from each object is estimated and then compared.
I am wanted to use KDE as also proposed in the paper(link).
However the optimal bandwidth in order to compute the a density distribution of the distances is a crucial parameter.

But you mentioned MLE. Does it mean, that I just compute the expected value and variance from the given distances of an object?
For smoothing you mentioned to use forerunner equation (Brownian noise).

I read that there is K-nearest neighbor approach (http://nedwww.ipac.caltech.edu/level5/March02/Silverman/Silver2_5.html). As I understood for each distance, I should compute the K-nearest neighbors and then compute the variance of these selected distances.
This variance is the bandwidth for this distance. So in your KDE equation above: there will be not h, but a h_i which corresponds to the variance for K-nearest neighbors of x_i.
Do you agree?I am glad for the feedback you already gave me!

SW VandeCarr · Mar 17, 2011

Jack Miller said:

Thanks a lot for the answer.

There are many toolboxes in Matlab or in Python. Even I tried to use some applets.
It has shown that the bandwidth is a major factor that influences the estimated density of a distribution. Moreover the bandwidth estimation must run automatically.

Good. This would seem to solve your problem in terms of optimizing h.

.
Then it is possible to compare sets of distances from different object to differentiate them.
I am follwing the approach which is described here: http://graphics.stanford.edu/courses/cs468-08-fall/pdf/hamza-krim.pdf .

This is a fascinating application. Various 3D objects will have distinctive geodesic shape distributions.

In order to compare the set from distances from different objects, the density of distances from each object is estimated and then compared.
I am wanted to use KDE as also proposed in the paper(link).
However the optimal bandwidth in order to compute the a density distribution of the distances is a crucial parameter.

Of course.

But you mentioned MLE. Does it mean, that I just compute the expected value and variance from the given distances of an object?

I don't think so. After reading your link, it's clear you should use KDE.

For smoothing you mentioned to use forerunner equation (Brownian noise).

No. That equation is not very useful for direct application of KDE. With a Gaussian kernel the KDE equation (also given in both the Raykar paper and in your link) becomes:

[tex]\hat p(x)=\frac{1}{N\sqrt{2\pi h^2}} \sum_{i=1}^{N}e^{-(X-x_{i})^{2}/2h^2}[/tex]

I read that there is K-nearest neighbor approach (http://nedwww.ipac.caltech.edu/level5/March02/Silverman/Silver2_5.html). As I understood for each distance, I should compute the K-nearest neighbors and then compute the variance of these selected distances.
This variance is the bandwidth for this distance. So in your KDE equation above: there will be not h, but a h_i which corresponds to the variance for K-nearest neighbors of x_i.
Do you agree?

I don't follow. You're summing over all i in order to obtain an estimation. You are going to have only one selected bandwidth for a given estimation [tex] \hat p[/tex](x). Your aim is to minimize the integrated MSE for selections of h optimized for smoothness. From what you said above, your program should do this for you, or at least present you with choices.

[tex]intMSE(\hat p,p)=E[\int_R (\hat p(x)-p(x))^2 dx][/tex].

The Raykar link I posted is very good IMO. You should take the time to try to read it all even if you may not have the background to follow all of it. For example, do you know what a characteristic function (or Fourier transform of a PDF) is? If you have the right software for doing KDE you really don't need to worry about the computational details. The programs I remember displayed plots in evolution and calculated MSEs for you.

EDIT: Part of the problem I think you're having is evident in in your question. How do you "calculate" bandwidth? There is no necessarily "correct" bandwidth; only more or less optimal bandwidths given the objective.

Jack Miller · Mar 20, 2011

Hi,

I read the paper and tried to understand, especially section 5.2. . But all details of the bandwidth estimation are not completely clear for me. However this is a requirement if I need to estimate the bandwidth by myself.

As mentioned in the previous post, I found some sources which are computing the bandwidth. Unfortunately, they are applets or libraries in python, Matlab or R. I didn't find anything in C/C++ which is necessary for me, since the system which I have to integrate my part(object recogniton) is in C++.

I found on the Raykar website (http://www.umiacs.umd.edu/~vikas/Software/optimal_bw/optimal_bw_code.htm) some code in C++(http://www.umiacs.umd.edu/~vikas/Software/optimal_bw/source_code.zip).

It seems very promising! : "FAST OPTIMAL BANDWIDTH ESTIMATION FOR UNIVARIATE KDE - CODE" :-)

This is the header which provides a single function (which I think implements the approach of the paper you posted!):"class UnivariateDensityDerivative{

public:

//constructor

UnivariateDensityDerivative(int NSources,

int MTargets,

double *pSources,

double *pTargets,

double Bandwidth,

int Order,

double *pDensityDerivative);
//destructor

~UnivariateDensityDerivative();
//function to evaluate the Density Derivative

void Evaluate();
//function to evaluate the Hermite polynomial.

double hermite(double x, int r);
private:

int N; //number of sources.

int M; //number of targets.

double *px; //pointer to sources, (N).

double *py; //pointer to the targets, (M).

double h; //the source bandwidth.

int r; //the rth density derivative.

double *pD; //pointer the the evaluated Density Derivative, (M).

}; "

I am not 100% sure how to use that function.
The function requires the sources points and targets points. I suppose that these points are used to evaluate the accuracy of the density estimation.
Moreover I think, these points represent the "distances" in my context(see previouse posts).
Also sources and targets points are from the same distribution; maybe I can even use the same points for source and target in the function.

There is a bandwidth parameter. It seems that I need to provide a bandwidth, so this function will not provide an optimal bandwidth, rather I think it will return a kind of fitness of the provided bandwidth.
But how is the fitness represented? Maybe the last parameter "pDensityDerivative" does this. It is are kind of "return" parameter as I understood when I had a look at the code. But how can I interprete it? I couldn't find the answer for it.

I am so thankful and appreciate it, that you are trying to help me and I mean it!

SW VandeCarr · Mar 21, 2011

Jack Miller said:

Hi,

I read the paper and tried to understand, especially section 5.2. . But all details of the bandwidth estimation are not completely clear for me. However this is a requirement if I need to estimate the bandwidth by myself.

It seems very promising! : "FAST OPTIMAL BANDWIDTH ESTIMATION FOR UNIVARIATE KDE - CODE" :-)

There is a bandwidth parameter. It seems that I need to provide a bandwidth, so this function will not provide an optimal bandwidth, rather I think it will return a kind of fitness of the provided bandwidth.
But how is the fitness represented? Maybe the last parameter "pDensityDerivative" does this. It is are kind of "return" parameter as I understood when I had a look at the code. But how can I interprete it? I couldn't find the answer for it.

I am so thankful and appreciate it, that you are trying to help me and I mean it!

You need not have copied the program because I really can't help you with it. I'm not programmer. It's obvious that you cannot do this kind of asymptotic estimation by hand, so you will need to find a way to compute optimal h. All I can tell you is that Rykar defines a series of estimators beginning with equation 32 that lead to a solution with zero bias. However you may not want zero bias at the cost of smoothing depending on your data set and requirements for applications of your solution for related problems. I'm sorry I couldn't help you more, but this is a computing problem for you, not so much a math problem.

I might suggest you send a private message to one of the moderators to move your question to the Computer Forum. DO NOT REPOST this question there yourself however. That's a violation of PF rules and you may be cited with an infraction. The moderators' names appear at the bottom of the page in each forum (click on the name and then on "Contact Info'). I any case, I can't promise that they can do any better for you than you have done for yourself by finding the program you have but someone might be able to answer your questions regarding it.

How to calculate the bandwidth for a Kernel density estimation

Related to How to calculate the bandwidth for a Kernel density estimation

1. What is Kernel density estimation?

2. How is bandwidth calculated for Kernel density estimation?

3. What factors should be considered when choosing a bandwidth for Kernel density estimation?

4. Is there a standard or optimal bandwidth for Kernel density estimation?

5. How does bandwidth affect the accuracy of Kernel density estimation?

Similar threads

Hot Threads

Recent Insights