Nonparametric bootstrap: Assumptions and number of bootstrap samples?

  • Context: Graduate
  • Thread starter: madilyn
  • Tags: Assumptions, Bootstrap

Discussion Overview

The discussion revolves around the nonparametric bootstrap method, focusing on its procedural steps, the number of bootstrap samples needed, and the assumptions required for the method to be effective. Participants explore theoretical aspects and implications of confidence intervals in the context of bootstrapping.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant outlines the nonparametric bootstrap procedure, including generating bootstrap samples and calculating statistics.
  • Another participant questions the definition of "works" in the context of bootstrapping and emphasizes the complexity of confidence intervals, suggesting that they do not imply a specific probability about population parameters.
  • Concerns are raised about the convergence of bootstrap estimates to the actual parameter value as more samples are generated, with no guarantees provided.
  • A distinction is made between confidence intervals and credible intervals, highlighting that the accuracy of estimates depends on how well the sample represents the population and the method of estimation used.
  • Participants express uncertainty about how to determine a good number of bootstrap samples and what constitutes a reliable bootstrap approximation.

Areas of Agreement / Disagreement

Participants generally agree on the procedural steps of the nonparametric bootstrap but express differing views on the implications of confidence intervals and the convergence of bootstrap estimates. The discussion remains unresolved regarding the best practices for determining the number of bootstrap samples and the assumptions necessary for effective application.

Contextual Notes

Participants note limitations in understanding the philosophical underpinnings of confidence intervals and the implications of bootstrap sampling, indicating a need for further exploration of these concepts.

madilyn
I've been trying to figure out how to use the nonparametric bootstrap, and if I understand correctly, this is the procedure:

1. Take an original sample, a vector x = (x1, ..., xn)

2. Generate k vectors, each called a 'bootstrap sample', of the same length as x by random sampling (with replacement) of the original vector, x, e.g. I have b1 = (x3, xn, x1, ..., x13), b2 = (x2, x5, ..., xn) etc.

3. Now I calculate a statistic \hat{\theta}_i = f(b_i) on each bootstrap sample; my bootstrapped statistic is the mean \bar{\hat{\theta}} of these values, and confidence intervals on that bootstrapped statistic can be found from the normal approximation, using the bootstrap standard error and the inverse cdf of a normal distribution.
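In code, my understanding of steps 1-3 looks like this (a toy numpy sketch; the normal sample, the seed, and all names are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def bootstrap_reps(x, stat, k=2000):
    """Steps 2-3: draw k resamples of x (with replacement) and
    evaluate `stat` on each, returning the k replicates."""
    n = len(x)
    idx = rng.integers(0, n, size=(k, n))  # k index vectors of length n
    return np.array([stat(x[i]) for i in idx])

x = rng.normal(loc=5.0, scale=2.0, size=100)  # step 1: a toy original sample

reps = bootstrap_reps(x, np.mean)
theta_bar = reps.mean()          # bootstrapped statistic: mean of the replicates
se = reps.std(ddof=1)            # bootstrap standard error
# normal-approximation 95% interval via the inverse normal cdf (z = 1.96)
lo, hi = theta_bar - 1.96 * se, theta_bar + 1.96 * se
print(round(theta_bar, 3), (round(lo, 3), round(hi, 3)))
```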

If everything is correct above, I have two questions:

i. How do I determine the number of bootstrap samples to take, k? Is there a principled way to determine this? Without this, I would just have to keep repeating the same procedure with increasing k until there's some kind of convergence on the mean \bar{\hat{\theta}}? But this seems computationally taxing.

ii. What assumptions must hold for this procedure to work? I'm guessing that \hat{\theta} must have finite variance? What else?
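For (i), here is a minimal sketch of the convergence check I have in mind (toy standard-normal sample; names and sizes are illustrative only). The resampling noise in \bar{\hat{\theta}} should shrink like 1/sqrt(k):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)  # toy original sample, fixed for the whole demo

def boot_mean(k):
    """Average of k bootstrap replicates of the sample mean."""
    n = len(x)
    return np.mean([rng.choice(x, size=n, replace=True).mean()
                    for _ in range(k)])

# Repeat the whole bootstrap several times at two values of k and look at
# the spread of the resulting averages: more resamples -> less Monte Carlo
# noise (expected ratio here is sqrt(800/50) = 4).
spread_small = np.std([boot_mean(50) for _ in range(30)])
spread_large = np.std([boot_mean(800) for _ in range(30)])
print(round(spread_small, 4), round(spread_large, 4))
```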
 
Stephen Tashi
I think your description of the procedure is correct.


madilyn said:
ii. What assumptions must be correct for this procedure to work? I'm guessing that \hat{\theta} must have finite variance? What else?

That's a good question. A major task is to define what it means to say it "works". Do you have a sophisticated understanding of the meaning of a confidence interval? In particular, do you understand that the usual sort of confidence interval does NOT let you make statements about a population parameter being in a specific numerical interval? (For example, you can't conclude things like "There is a 90% chance that the population mean is in the interval [2.3 - 0.62, 2.3 + 0.62].")

If you mean \bar{\hat{\theta}} to be an estimate of a parameter of the distribution from which the sample is taken, I don't think there is any guarantee that the results of the bootstrap "converge" to the value of that parameter as you generate more bootstrap samples.
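As a sketch of what I mean (toy example; the distribution, seed, and numbers are purely illustrative): generating more bootstrap samples drives \bar{\hat{\theta}} toward the statistic computed from the original sample, not toward the population parameter.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = 10.0                          # true population mean, known by construction
x = rng.normal(mu, 3.0, size=30)   # one small sample; x.mean() != mu in general

def boot_avg(k):
    """Average of k bootstrap replicates of the sample mean."""
    return np.mean([rng.choice(x, size=len(x), replace=True).mean()
                    for _ in range(k)])

# As k grows, the bootstrap average homes in on the SAMPLE mean ...
mc_gap = abs(boot_avg(20000) - x.mean())
# ... while the gap to the population parameter mu is fixed by the original
# sample and does not shrink no matter how large k gets:
sampling_gap = abs(x.mean() - mu)
print(round(mc_gap, 4), round(sampling_gap, 4))
```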
 
Stephen Tashi said:
I think your description of the procedure is correct.

That's a good question. A major task is to define what it means to say it "works". Do you have a sophisticated understanding of the meaning of a confidence interval? In particular, do you understand that the usual sort of confidence interval does NOT let you make statements about a population parameter being in a specific numerical interval? (For example, you can't conclude things like "There is a 90% chance that the population mean is in the interval [2.3 - 0.62, 2.3 + 0.62].")

If you mean \bar{\hat{\theta}} to be an estimate of a parameter of the distribution from which the sample is taken, I don't think there is any guarantee that the results of the bootstrap "converge" to the value of that parameter as you generate more bootstrap samples.

Stephen, thanks for your prompt answers as always!

1. Unfortunately no, I don't have a very sophisticated understanding of the meaning of a confidence interval (I wouldn't be able to write a philosophical debate about it). But I do have a basic grasp of the pitfalls. What's one school of thought on "what works" that I could use in practice without going too deep into the foundations?

2. Hm, that's problematic. How would I know what's a good bootstrap approximation if it doesn't converge?

I'm sorry if I sound like I'm just looking for more clues but I don't have a strong intuition on how to attack this.
 
A confidence interval isn't the same as a "credible interval" (http://en.wikipedia.org/wiki/Credible_interval).

Suppose we are trying to estimate a property of a population by bootstrapping. We have a large batch of samples, and from it we repeatedly select smaller batches. How close our estimate is to the actual value of the population parameter depends on 1) how well the large batch of samples matches the population distribution, and 2) how we estimate the parameter from the bootstrap samples.

Once you have the large batch of samples, you can usually produce smaller (frequentist) confidence intervals by doing more bootstrap sampling in 2). More bootstrap samples can't improve the mis-estimation that may be introduced in 1). The total confidence interval size depends on both 1) and 2).
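As a sketch of step 2) (using the percentile interval, a common frequentist construction, rather than the normal approximation mentioned earlier in the thread; the exponential batch and all names are purely illustrative): the interval brackets the batch mean, and more resamples only tighten the part of the error coming from 2), not any mismatch from 1).

```python
import numpy as np

rng = np.random.default_rng(3)
# 1) the "large batch": if it misrepresents the population, more resampling
#    in step 2) cannot repair that
x = rng.exponential(scale=2.0, size=200)   # toy population with mean 2.0

def percentile_ci(k, alpha=0.05):
    """2) percentile bootstrap interval for the mean, from k resamples."""
    reps = np.sort([rng.choice(x, size=len(x), replace=True).mean()
                    for _ in range(k)])
    return reps[int(alpha / 2 * k)], reps[int((1 - alpha / 2) * k) - 1]

lo, hi = percentile_ci(4000)
print(round(lo, 3), round(hi, 3))  # brackets the batch mean, not necessarily 2.0
```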

It would be easier to discuss bootstrapping if we discuss estimating a specific thing.
 
