Kernel Estimation: Questions about Bandwidths & R Functions

In summary, kde2d and bkde2D both use a Gaussian kernel with a separate bandwidth per axis; they differ mainly in how the bandwidth is scaled and in how the estimate is computed.
  • #1
eoghan
Hi there!
I'm new to kernel estimation, so the following questions may be really elementary. There is something I don't understand about the bandwidths. In R I have two functions to compute the estimate:
kde2d from MASS
bkde2D from KernSmooth
Here are my questions:
1) I see from the source code of kde2d that it divides the bandwidth provided by the user by 4, and I've seen this practice elsewhere too. Why is the bandwidth divided by 4?
2) kde2d uses an axis-aligned bivariate normal distribution, while bkde2D uses a standard bivariate normal distribution. Are they the same?

Thank you
 
  • #2
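1) The division by 4 is a scaling convention; it has nothing to do with the number of dimensions. In MASS the bandwidth h is a kernel "width" of about four standard deviations, so kde2d computes h/4 and uses that as the standard deviation of the Gaussian kernel on each axis. The default selector, bandwidth.nrd, already includes the matching factor of 4 (it returns 4 * 1.06 * min(sd, IQR/1.34) * n^(-1/5)), so the two cancel. The practical consequence is that a bandwidth chosen for kde2d is about 4 times the equivalent bandwidth for bkde2D, which takes the kernel standard deviation directly.

2) In practice, yes. "Axis-aligned" means the kernel is a product of two univariate normals, i.e. a bivariate normal with a diagonal covariance matrix. The standard bivariate normal has identity covariance, and bkde2D rescales it by the bandwidth in each coordinate direction, which again gives a diagonal covariance. Neither function uses a kernel with a full (rotated) covariance matrix. The main differences are the bandwidth scale described in 1) and the computation: kde2d evaluates the kernel sum exactly on the grid, while bkde2D uses a binned approximation.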
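To make the division by 4 concrete, here is a minimal Python sketch of the arithmetic, assuming the width h is converted to a per-axis Gaussian standard deviation via h/4. It is not the MASS source: the names `bandwidth_nrd`, `kde2d_at` and `bivariate_normal_diag` are illustrative, and the estimate is evaluated at a single point rather than on a grid.

```python
import math
import statistics


def norm_pdf(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def bandwidth_nrd(x):
    """Normal-reference width on the 'four standard deviations' scale."""
    # 'inclusive' quartiles interpolate the same way as R's default quantile().
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    spread = min(statistics.stdev(x), (q3 - q1) / 1.34)
    return 4 * 1.06 * spread * len(x) ** (-1 / 5)


def kde2d_at(px, py, xs, ys, h):
    """Density at (px, py) from a product of two 1-D Gaussian kernels."""
    sx, sy = h[0] / 4.0, h[1] / 4.0  # width -> kernel standard deviation
    total = sum(
        norm_pdf((px - x) / sx) * norm_pdf((py - y) / sy)
        for x, y in zip(xs, ys)
    )
    return total / (len(xs) * sx * sy)


def bivariate_normal_diag(dx, dy, sx, sy):
    """Bivariate normal density with covariance diag(sx^2, sy^2)."""
    expo = -0.5 * ((dx / sx) ** 2 + (dy / sy) ** 2)
    return math.exp(expo) / (2.0 * math.pi * sx * sy)
```

With a single data point at the origin and h = (4, 8), `kde2d_at(1, 1, [0], [0], (4, 8))` agrees with `bivariate_normal_diag(1, 1, 1, 2)`: the product kernel is just a bivariate normal with diagonal covariance whose standard deviations are h/4.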
 

FAQ: Kernel Estimation: Questions about Bandwidths & R Functions

What is kernel estimation and how does it work?

Kernel estimation is a non-parametric method for estimating a probability density function from a set of data points. It works by placing a kernel (usually a bell-shaped curve that integrates to 1) at each data point and then averaging the contributions from all kernels to estimate the underlying density.
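As a rough illustration, a one-dimensional estimate with a Gaussian kernel can be sketched in a few lines of Python. The data values and the bandwidth of 0.5 are made up for the example, and `kde` is an illustrative name, not a library function.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, h):
    """Average of kernels centred on each data point, with bandwidth h."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)


data = [1.2, 1.9, 2.1, 3.4, 5.0]  # made-up sample
grid = [i / 100 for i in range(-500, 1201)]  # -5.00 .. 12.00 in steps of 0.01

# Riemann-sum check that the estimate integrates to roughly 1.
area = sum(kde(x, data, 0.5) for x in grid) * 0.01
```

Because each kernel integrates to 1 and the kernels are averaged, the estimate is itself a density, which is what the `area` check is meant to show.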

What is bandwidth and why is it important in kernel estimation?

Bandwidth in kernel estimation refers to the width of the kernel function. It determines the smoothness of the estimated density and controls the trade-off between bias and variance. A larger bandwidth gives a smoother estimate with lower variance but more bias, and can blur real features such as separate modes (oversmoothing). A smaller bandwidth gives a more jagged estimate with less bias but more variance, and can show spurious bumps that are only noise (undersmoothing).
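The effect is easy to see by counting the local maxima of the estimate for a few bandwidths. The sketch below uses a made-up sample with two tight clusters; the function names and the grid are illustrative choices.

```python
import math


def kde(x, data, h):
    """Gaussian kernel density estimate at x; h is the kernel standard deviation."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


def count_modes(data, h, lo=-400, hi=400, scale=100):
    """Count strict local maxima of the estimate on the grid lo/scale .. hi/scale."""
    ys = [kde(i / scale, data, h) for i in range(lo, hi + 1)]
    return sum(1 for a, b, c in zip(ys, ys[1:], ys[2:]) if b > a and b > c)


data = [-2.2, -2.0, -1.8, 1.8, 2.0, 2.2]  # two made-up clusters
modes_by_bandwidth = {h: count_modes(data, h) for h in (0.05, 0.5, 5.0)}
```

A very small bandwidth puts a separate bump on every observation, a moderate one recovers the two clusters, and a very large one merges everything into a single mode.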

How do I choose the optimal bandwidth for kernel estimation?

There are several methods for selecting the bandwidth, including cross-validation, plug-in methods, and rule-of-thumb approaches. Cross-validation chooses the bandwidth that optimizes an out-of-sample criterion, typically by leaving out one observation at a time (for example least-squares or likelihood cross-validation). Plug-in methods estimate the unknown quantities in the formula for the asymptotically optimal bandwidth (the Sheather-Jones method is a common example). Rule-of-thumb approaches, such as Silverman's rule, assume the data are roughly normal and compute the bandwidth from the sample spread and sample size. No single method is best everywhere; the choice depends on the data and the application.
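Two of these ideas can be sketched briefly in Python: a rule of thumb in Silverman's form, and leave-one-out likelihood cross-validation over a list of candidate bandwidths. This is a sketch under simple assumptions (Gaussian kernel, one dimension); the function names and the candidate list are illustrative.

```python
import math
import statistics


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def rule_of_thumb(data):
    """Silverman-style rule: 0.9 * min(sd, IQR/1.34) * n^(-1/5)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = min(statistics.stdev(data), (q3 - q1) / 1.34)
    return 0.9 * spread * len(data) ** (-1 / 5)


def loo_log_likelihood(data, h):
    """Sum of log densities, each point scored by an estimate that excludes it."""
    n = len(data)
    total = 0.0
    for i, xi in enumerate(data):
        dens = sum(
            gaussian_kernel((xi - xj) / h) for j, xj in enumerate(data) if j != i
        ) / ((n - 1) * h)
        if dens == 0.0:  # kernel underflowed: this bandwidth is far too small
            return float("-inf")
        total += math.log(dens)
    return total


def pick_bandwidth(data, candidates):
    """Return the candidate with the highest leave-one-out log-likelihood."""
    return max(candidates, key=lambda h: loo_log_likelihood(data, h))
```

For example, `pick_bandwidth([1, 2, 3, 4, 5], [0.001, 1.0, 1000.0])` rejects both extremes: the tiny bandwidth leaves every held-out point with essentially zero density, and the huge one flattens the estimate.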

Are there specific R functions for performing kernel estimation?

Yes. Base R provides density() in the stats package for one-dimensional estimates. For two dimensions, MASS provides kde2d() and KernSmooth provides bkde2D() (with bkde() for one dimension). The ks package provides kde() for multivariate data. These functions have different defaults, different bandwidth conventions (for example, kde2d() expects a width about 4 times the kernel standard deviation), and different bandwidth selectors, so it's important to read the documentation before comparing their output.

Can kernel estimation handle non-continuous data?

Yes, with adjustments. Standard kernels such as the Gaussian assume continuous data, so discrete or categorical data need kernels defined on the set of possible values. For unordered categories, one option is a kernel that puts most of its weight on the observed category and spreads the rest evenly over the other categories (the Aitchison-Aitken kernel is an example). Kernel estimation is still most commonly applied to continuous data, and with few categories and plenty of data the raw frequencies are often good enough.
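As an illustration of a kernel adapted to categorical data, here is a minimal Python sketch of an estimate based on the Aitchison-Aitken kernel, which gives weight 1 - lam to the observed category and lam / (c - 1) to each of the other c - 1 categories. The function name, the sample and the value of `lam` are made up for the example.

```python
from collections import Counter


def aitchison_aitken_estimate(data, categories, lam):
    """Smoothed category probabilities; lam = 0 gives the raw frequencies."""
    c = len(categories)
    if not 0.0 <= lam <= (c - 1) / c:
        raise ValueError("lam must lie in [0, (c - 1) / c]")
    counts = Counter(data)
    n = len(data)
    probs = {}
    for cat in categories:
        same = counts[cat]  # observations equal to this category
        # Matching observations contribute 1 - lam, all others lam / (c - 1).
        probs[cat] = (same * (1.0 - lam) + (n - same) * lam / (c - 1)) / n
    return probs


sample = ["a", "a", "b", "a"]  # made-up observations
smoothed = aitchison_aitken_estimate(sample, ["a", "b", "c"], lam=0.3)
```

Here `lam` plays the role of the bandwidth: at 0 the estimate equals the observed frequencies, and at its maximum (c - 1) / c every category gets probability 1 / c.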
