# Bootstrap in Monte Carlo and the number of samples

#### diegzumillo

I am analyzing a lot of data from Monte Carlo simulations and trying to use the bootstrap to estimate the standard deviation. What I'm finding, however, is very, very little variation with each iteration of the bootstrap, and I don't know why.

Two reasons come to mind. Given the complexity of the analysis I cannot do more than 50 iterations, which sounds like too few according to sources out there, so maybe I just need more? Another thing is that I have 5000 data points, so resampling barely makes a dent in the histogram, and I'm not surprised it isn't changing the statistics much either.

Does anyone with experience in bootstrap analysis have any idea?

PS: I have no idea what 'level' this question is.


#### Stephen Tashi

> I am analyzing a lot of data from Monte Carlo simulations and trying to use the bootstrap to estimate the standard deviation.

The standard deviation of what?

"Standard deviation" is associated with the distribution of a random variable. What random variable are you talking about?

#### diegzumillo

Oh, I mean the standard deviation of the quantity I measure after the whole analysis. I didn't go into detail because it's not a direct quantity that's easy to describe. Schematically it's something like: data → process data into a single quantity. Then bootstrap: resampled data → process into a single quantity. Then I take all the processed quantities from each bootstrap iteration and calculate the standard deviation of the whole set.

#### Stephen Tashi

> Schematically it's something like: data → process data into a single quantity. Then bootstrap: resampled data → process into a single quantity. Then I take all the processed quantities from each bootstrap iteration and calculate the standard deviation of the whole set.

It's unclear what you mean and what you are doing.

If you have N independent samples of a random variable, there are estimators of its standard deviation (e.g. the sample standard deviation) that use all N samples, so it isn't clear why you are bootstrapping. How does what you are doing differ from having N independent samples of the random variable of interest?

#### diegzumillo

I can try explaining what I'm trying to do a little better. I have a set of data which can be used to calculate a quantity I'm interested in. For the sake of example, say we want to calculate the skewness. I take the original data and calculate the skewness. But the data I have is itself a small sample of the entire sample space, and since I can't run simulations forever, the small sample I have will have to do. So I'll use the bootstrap, resampling it again and again. Each time I resample I calculate the skewness, which will be slightly different each time. Then I calculate the standard deviation of all the skewnesses (this word might not exist) obtained across iterations, as a way of saying how confident I am that the skewness I calculated originally is representative of the larger data set (the one I don't have).
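The procedure described above can be sketched in a few lines. This is a minimal illustration, not the poster's actual analysis: the exponential data is a hypothetical stand-in for the 5000 Monte Carlo measurements, and `skewness` stands in for whatever "process data into a single quantity" step is really being done:

```python
import numpy as np

rng = np.random.default_rng(0)

def skewness(x):
    # Sample skewness: third central moment over cubed standard deviation.
    x = np.asarray(x)
    m = x.mean()
    return np.mean((x - m) ** 3) / x.std() ** 3

def bootstrap_se(data, statistic, n_boot=50):
    """Standard error of `statistic` via ordinary bootstrap resampling:
    resample n points with replacement, recompute the statistic each time,
    and report the spread of the replicates."""
    n = len(data)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(data, size=n, replace=True)
        reps[b] = statistic(resample)
    return reps.std(ddof=1)

# Hypothetical skewed data standing in for the Monte Carlo measurements.
data = rng.exponential(size=5000)
se = bootstrap_se(data, skewness, n_boot=50)
```

With 5000 points the replicates do cluster tightly, which matches the observation in the thread: small bootstrap scatter is expected when n is large, provided the points really are independent.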

Sorry for being vague earlier. The thing is I don't know much about this stuff, and whenever I don't know much about something, I tend to assume I'm the only dummy who doesn't, and therefore that everyone else will recognize the problem without much explanation. In other words, I was lazy and presumptuous.

#### Stephen Tashi

> I have a set of data which can be used to calculate a quantity I'm interested in. For the sake of example, say we want to calculate the skewness.

It's important to know whether you want to estimate a property of a random variable versus a property of N samples of that random variable. For example, the standard deviation of a random variable is a different number than the standard deviation of the mean of 20 independent samples of that random variable. It isn't clear whether you are trying to estimate a parameter of a finite set of outcomes of a random variable or a parameter associated with the distribution of a single outcome of that random variable.

It also isn't clear whether your 5000 data points are independent samples of the same random variable or whether they are generated by a process that introduces dependence in their values, such as a Markov chain or an ARIMA process.
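The dependence question matters because correlated points carry less information than their count suggests. A standard diagnostic in Monte Carlo work is the integrated autocorrelation time τ, with effective sample size roughly N / (2τ). Below is a sketch of that estimate; the AR(1) chain is just a toy stand-in for correlated simulation output, and the simple truncate-at-first-negative rule is one common heuristic, not the only one:

```python
import numpy as np

rng = np.random.default_rng(1)

def integrated_autocorr_time(x, max_lag=200):
    """Estimate the integrated autocorrelation time tau = 1/2 + sum_k rho(k).
    The effective number of independent samples is about N / (2 * tau)."""
    x = np.asarray(x) - np.mean(x)
    var = np.mean(x ** 2)
    tau = 0.5
    for lag in range(1, max_lag):
        rho = np.mean(x[:-lag] * x[lag:]) / var
        if rho < 0:  # truncate once the noisy estimate turns negative
            break
        tau += rho
    return tau

# AR(1) chain as a toy stand-in for correlated Monte Carlo output.
phi = 0.9
x = np.empty(5000)
x[0] = 0.0
for t in range(1, len(x)):
    x[t] = phi * x[t - 1] + rng.normal()

tau = integrated_autocorr_time(x)
n_eff = len(x) / (2 * tau)
```

For this chain τ is around 9–10, so the 5000 correlated points are worth only a few hundred independent ones; a naive bootstrap that treats them as independent will understate the error accordingly.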

#### FactChecker

1) Are you changing the random number seed from one run to the next?
2) Is the situation being simulated such that the random part is small compared to the whole?
3) Is the significant random part a rare event?


#### Ray Vickson

> I am analyzing a lot of data from Monte Carlo simulations and trying to use the bootstrap to estimate the standard deviation. What I'm finding, however, is very, very little variation with each iteration of the bootstrap, and I don't know why.
>
> Two reasons come to mind. Given the complexity of the analysis I cannot do more than 50 iterations, which sounds like too few according to sources out there, so maybe I just need more? Another thing is I have 5000 data points, so resampling barely makes a dent in the histogram, and I'm not surprised it isn't changing the statistics much either.
>
> Anyone with experience with bootstrap analysis have any idea?
>
> PS: I have no idea what 'level' this question is.
What is the meaning of the "5000" and of the "50" (as in 50 iterations)? Are you ultimately getting 50 samples of your quantity of interest, or are you getting 5000?

A sample of size 50 is somewhat "small", but people often need to deal with samples that small in applications. If the data are roughly normally distributed you can get confidence intervals on the variance by using the chi-squared distribution.

A sample of size 5000 is really quite good, relative to what people often need to deal with in applications. An inference based on a sample of that size ought to be more "meaningful" than one based on a sample of size 50.

#### diegzumillo

Oh shoot. I thought this conversation had died after my last comment because I didn't get any notification (or probably overlooked it).

This is still a problem for me, by the way. Bootstrapping still gives unrealistically small error bars.

The problem I'm working on is a Monte Carlo simulation on a lattice (think Ising model), where, for each temperature, I calculate observables like the magnetization for about 5000 different configurations. Then I calculate a density of states using the Ferrenberg-Swendsen algorithm.

I'm not very confident about bootstrapping this system because the data is correlated, as is usually the case in Monte Carlo methods. The Ferrenberg-Swendsen algorithm takes autocorrelation into account, so that's fine, but what about the bootstrapping? Shouldn't the data be uncorrelated?
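The correlation worry is well founded: the ordinary bootstrap assumes independent draws, and applied to an autocorrelated chain it understates the error, which would produce exactly the too-small error bars described here. A standard remedy is the moving-block bootstrap, which resamples contiguous blocks longer than the autocorrelation time so that short-range correlations survive resampling. A sketch, using an AR(1) toy chain in place of actual magnetization data and an assumed block length of 100:

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_se(data, statistic, n_boot=200, block_len=1):
    """Bootstrap standard error of `statistic`. With block_len == 1 this is
    the ordinary bootstrap; with block_len > 1 it is a moving-block
    bootstrap: draw random block starts, stitch the blocks together,
    and recompute the statistic on the stitched series."""
    data = np.asarray(data)
    n = len(data)
    n_blocks = -(-n // block_len)  # ceil(n / block_len)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        idx = (starts[:, None] + np.arange(block_len)).ravel()[:n]
        reps[b] = statistic(data[idx])
    return reps.std(ddof=1)

# AR(1) toy chain standing in for correlated magnetization measurements.
phi = 0.9
x = np.empty(5000)
x[0] = 0.0
for t in range(1, len(x)):
    x[t] = phi * x[t - 1] + rng.normal()

se_naive = bootstrap_se(x, np.mean)                 # ignores correlations
se_block = bootstrap_se(x, np.mean, block_len=100)  # preserves them
```

On this toy chain the block-bootstrap error bar comes out several times larger than the naive one, which is the direction of the discrepancy reported in the thread. The block length should be several autocorrelation times; an alternative with the same goal is to "thin" or bin the chain into effectively independent measurements before an ordinary bootstrap.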

"Bootstrap in Monte Carlo and the number of samples"
