Calculating Standard Error of Mean for 2D Histogram Data

zedya
Hey everyone,

I'm not sure if there is an effective answer to my problem, but here goes:

I am working on Ramachandran plots for short peptides (3 amino acids long). For every snapshot of the protein (each snapshot is one data point), two angles are recorded: phi and psi. The data look like this:

-144.369 -177.292
-64.5267 148.338
-114.061 141.662
-82.48 152.633
-64.7174 157.237
-85.9076 133.427
-103.411 145.982
-75.3895 150.165

Then I create several bins for the ranges I'm interested in. For example, I count how many of these data points have -160 < phi < -120 and 140 < psi < 200. Once I have that count, I divide it by the total number of data points to find what percentage of the time the angles fall in those ranges.
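A minimal sketch of the counting step, assuming the first column is phi and the second is psi (in degrees) and using strict inequalities as written above. The function name `bin_fraction` is hypothetical. With only the eight sample snapshots listed, none fall in the -160 < phi < -120, 140 < psi < 200 window (the first point has phi in range but psi = -177.292), so a second, purely illustrative window is included too.

```python
# Sample (phi, psi) pairs in degrees, copied from the post above.
data = [
    (-144.369, -177.292),
    (-64.5267, 148.338),
    (-114.061, 141.662),
    (-82.48, 152.633),
    (-64.7174, 157.237),
    (-85.9076, 133.427),
    (-103.411, 145.982),
    (-75.3895, 150.165),
]


def bin_fraction(points, phi_lo, phi_hi, psi_lo, psi_hi):
    """Return (count, fraction) of points with phi_lo < phi < phi_hi and psi_lo < psi < psi_hi."""
    count = sum(
        1 for phi, psi in points
        if phi_lo < phi < phi_hi and psi_lo < psi < psi_hi
    )
    return count, count / len(points)


# The window from the post: -160 < phi < -120 and 140 < psi < 200.
print(bin_fraction(data, -160, -120, 140, 200))  # (0, 0.0)
# A hypothetical second window, for illustration only.
print(bin_fraction(data, -120, -60, 120, 180))   # (7, 0.875)
```

Since psi only runs from -180 to 180, an upper limit of 200 behaves like 180 in this comparison; if the window is meant to wrap around past 180, the angles would need to be shifted before comparing.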

Now I need to calculate the standard error of the mean (SEM) for these percentages. I understand how to calculate the SEM for the data itself, i.e. for the distribution of the data. But I am only counting how many data points fall into a given region, and I am not sure how to get an SEM for that.
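One common way to put an error bar on a bin fraction (not taken from the reply in this thread) treats each snapshot as an independent yes/no trial. The count k in the bin is then binomially distributed, the estimated fraction is p = k/N, and its standard error is sqrt(p(1 - p)/N). This assumes the snapshots are statistically independent; consecutive frames from a simulation are usually correlated, in which case this formula underestimates the error. The function name `fraction_standard_error` is hypothetical.

```python
import math


def fraction_standard_error(count, n_total):
    """Binomial standard error of the fraction count / n_total.

    Assumes every snapshot is an independent trial that either lands in the
    bin or does not; correlated snapshots make this an underestimate.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    p = count / n_total
    return math.sqrt(p * (1.0 - p) / n_total)


# Example: 7 of 8 snapshots fall in a bin.
p = 7 / 8
se = fraction_standard_error(7, 8)
print(f"fraction = {100 * p:.1f}% +/- {100 * se:.1f}%")
```

Note that the formula returns 0 when the count is 0 or N, which overstates the certainty for small samples.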

Any help would be appreciated. And please ask if any clarification is necessary; even if you can't help, it may clarify things for the next person.

Thanks.
 
Hey zedya and welcome to the forums.

If I'm reading this correctly, you want a mean and a standard error for the distribution of the percentages across your bins.

To do this, you need to create a distribution with some bin width b, which gives 100/b bins (with b = 100 there is a single bin holding all the percentage data; with b = 10 there are ten bins, 0-10, 10-20, and so on).

For each of your angle bins, take its probability and put it in the appropriate percentage bin, then generate a histogram and normalize it to get your distribution.
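The histogram step described above might look like this in plain Python, assuming the percentages are already collected in a list and that b divides 100 evenly. `percentage_histogram` and the example values are hypothetical. Each value goes into one of 100/b bins of width b, and the counts are divided by the number of values so the bin heights sum to 1.

```python
def percentage_histogram(percentages, b):
    """Normalized histogram of percentage values (0-100) using 100/b bins of width b."""
    n_bins = int(round(100 / b))
    counts = [0] * n_bins
    for value in percentages:
        if not 0 <= value <= 100:
            raise ValueError(f"percentage out of range: {value}")
        # Bin i covers [i*b, (i+1)*b); a value of exactly 100 goes in the last bin.
        index = min(int(value // b), n_bins - 1)
        counts[index] += 1
    total = len(percentages)
    return [c / total for c in counts]


# Hypothetical percentages, used only to show the mechanics.
example = [12.5, 87.5, 40.0, 45.0, 100.0]
print(percentage_histogram(example, 10))
# [0.0, 0.2, 0.0, 0.0, 0.4, 0.0, 0.0, 0.0, 0.2, 0.2]
```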

Then compute the sample mean and the standard error of the distribution generated by your sample data.
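The mean and standard error in the last step are the usual sample mean and s/sqrt(n), where s is the sample standard deviation of the n percentage values. The sketch below computes them with Python's `statistics` module. It is only meaningful when the values are repeated estimates of the same quantity; one way to get such estimates (an assumption here, usually called block averaging, and not something described in this thread) is to split the trajectory into consecutive blocks and compute the bin percentage separately in each block. `block_percentages` and `mean_and_sem` are hypothetical names.

```python
import statistics


def block_percentages(in_bin, n_blocks):
    """Split a sequence of 0/1 flags (1 = snapshot in the bin) into n_blocks
    consecutive blocks of equal length and return the bin percentage of each.
    Leftover snapshots that do not fill a whole block are dropped."""
    block_len = len(in_bin) // n_blocks
    if block_len == 0:
        raise ValueError("more blocks than snapshots")
    return [
        100.0 * sum(in_bin[i * block_len:(i + 1) * block_len]) / block_len
        for i in range(n_blocks)
    ]


def mean_and_sem(values):
    """Sample mean and standard error of the mean, s / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    return statistics.mean(values), statistics.stdev(values) / n ** 0.5


# Hypothetical in-bin flags for 12 snapshots, split into 4 blocks of 3.
flags = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1]
percentages = block_percentages(flags, 4)
mean, sem = mean_and_sem(percentages)
print(f"{mean:.1f}% +/- {sem:.1f}%")
```

For the block percentages to behave like independent samples, each block should be longer than the correlation time of the trajectory.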

The more bins you have, and the more data and variation in the percentages you have, the better your estimate of the distribution of your percentages will be.

It would probably be wise, though, to outline exactly what you are trying to do, because this advice may be detrimental if you have a specific purpose that runs contrary to this kind of analysis.
 