
Standard error of the percentage of data pts that fall into a bin in a 2D histogram

  1. Aug 23, 2012 #1
    Hey everyone,

    I'm not sure if there is an effective answer to my problem, but here goes:

    I am working on Ramachandran plots for short peptides (3 amino acids long). For every snapshot of the protein (this would be my data point), two angles are recorded: the phi and psi angles. The data look like this:

    -144.369 -177.292
    -64.5267 148.338
    -114.061 141.662
    -82.48 152.633
    -64.7174 157.237
    -85.9076 133.427
    -103.411 145.982
    -75.3895 150.165

    Then I create several bins for the ranges I'm interested in. For example, I count how many of these data points have -160 < phi < -120 and 140 < psi < 200. Once I have that count, I divide it by the total number of data points to find what percentage of the time the angles are in those ranges.
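    The counting step can be sketched in Python as follows. It assumes whitespace-separated phi/psi pairs in degrees, one snapshot per line, and uses strict inequalities as written above. `parse_angles`, `in_bin`, and `bin_percentage` are hypothetical helper names, and the second bin is made up purely for illustration.

```python
# Minimal sketch: percentage of (phi, psi) snapshots that fall in one rectangular bin.
# Assumes "phi psi" pairs in degrees, one snapshot per line (hypothetical format).

SAMPLE = """\
-144.369 -177.292
-64.5267 148.338
-114.061 141.662
-82.48 152.633
-64.7174 157.237
-85.9076 133.427
-103.411 145.982
-75.3895 150.165
"""


def parse_angles(text):
    """Return a list of (phi, psi) float pairs, skipping blank or short lines."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((float(fields[0]), float(fields[1])))
    return pairs


def in_bin(phi, psi, phi_range, psi_range):
    """True if lo < phi < hi and lo < psi < hi (strict inequalities)."""
    return phi_range[0] < phi < phi_range[1] and psi_range[0] < psi < psi_range[1]


def bin_percentage(pairs, phi_range, psi_range):
    """Return (count in bin, total snapshots, percentage in bin)."""
    count = sum(1 for phi, psi in pairs if in_bin(phi, psi, phi_range, psi_range))
    total = len(pairs)
    return count, total, 100.0 * count / total


pairs = parse_angles(SAMPLE)
# Bin from the post: -160 < phi < -120 and 140 < psi < 200.
print(bin_percentage(pairs, (-160, -120), (140, 200)))  # (0, 8, 0.0)
# Hypothetical bin, for illustration only: -120 < phi < -60 and 120 < psi < 180.
print(bin_percentage(pairs, (-120, -60), (120, 180)))  # (7, 8, 87.5)
```

    Because the angles are periodic, a psi upper bound of 200 only matters if psi values are shifted past 180 degrees (e.g. -177.292 becomes 182.708 after adding 360). The sketch does no such wrapping, so the first snapshot is not counted in the post's bin.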

    Now I need to calculate the standard error of the mean (SEM) for these percentages. I understand how to calculate the SEM of the data itself, i.e., for the distribution of the angle values. But I am only counting how many data points fall into a given region, and I am not sure how to get an SEM for that.
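    A standard textbook way to put an error bar on a bin percentage treats each snapshot as an independent yes/no trial, so the fraction p = k/N has standard error sqrt(p(1 - p)/N). Consecutive simulation snapshots are usually correlated, which makes that formula too optimistic, so the sketch below also shows block averaging: split the time series into contiguous blocks, compute the in-bin fraction in each block, and take the standard error of those block fractions. The function names and the eight-point indicator series (1 = snapshot inside the bin, 0 = outside) are hypothetical; in practice each block should be much longer than the correlation time of the angles.

```python
import math
import statistics


def binomial_se(count, total):
    """Standard error of p = count/total, assuming independent snapshots."""
    p = count / total
    return math.sqrt(p * (1.0 - p) / total)


def block_se(indicators, n_blocks):
    """Block-averaging standard error of the in-bin fraction.

    Splits the 0/1 series into n_blocks contiguous, equal-length blocks
    (trailing snapshots that do not fill a block are dropped), computes the
    fraction inside the bin for each block, and returns
    stdev(block fractions) / sqrt(n_blocks). Needs n_blocks >= 2.
    """
    block_len = len(indicators) // n_blocks
    fractions = [
        sum(indicators[i * block_len:(i + 1) * block_len]) / block_len
        for i in range(n_blocks)
    ]
    return statistics.stdev(fractions) / math.sqrt(n_blocks)


# Hypothetical in-bin indicators for eight consecutive snapshots.
indicators = [0, 1, 1, 1, 1, 1, 1, 1]
print(100 * binomial_se(sum(indicators), len(indicators)))  # about 11.7 percentage points
print(100 * block_se(indicators, 4))  # 12.5 percentage points
```

    Multiplying by 100 converts the standard error of a fraction into percentage points, matching the percentages described above.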

    Any help would be appreciated. Please ask if any clarification is needed; even if you can't help out, it may clarify things for the next person.

  3. Aug 23, 2012 #2


    Science Advisor


    Hey zedya and welcome to the forums.

    If I'm reading this correctly, you want a mean and a standard error for the distribution of the percentages of the actual bins.

    To do this, you need to create a distribution with some bin size b, giving 100/b bins (b = 100 gives one bin that holds all of the percentage data; b = 10 gives ten bins covering 1-10, 11-20, and so on).

    Then, for each of your angle bins, take its probability (percentage), put it in the appropriate percentage bin, generate a histogram, and normalize it to get your distribution.

    Finally, compute the sample mean and the standard error of the distribution generated by your sample data.
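    The procedure in this reply can be sketched in Python as below, assuming the per-region percentages have already been computed. The values in `percentages` are hypothetical, and the bin edges are half-open ([0, b), [b, 2b), ..., with 100 assigned to the last bin), a choice the reply does not specify. The mean and standard error here are computed from the raw percentage values rather than from the binned histogram.

```python
import math
import statistics


def percentage_distribution(percentages, b):
    """Normalized histogram of percentages using ceil(100/b) bins of width b."""
    n_bins = math.ceil(100 / b)
    counts = [0] * n_bins
    for p in percentages:
        # Half-open bins [k*b, (k+1)*b); a value of exactly 100 goes in the last bin.
        counts[min(int(p // b), n_bins - 1)] += 1
    total = len(percentages)
    return [c / total for c in counts]


def mean_and_sem(percentages):
    """Sample mean and standard error of the mean (stdev / sqrt(n))."""
    mean = statistics.mean(percentages)
    sem = statistics.stdev(percentages) / math.sqrt(len(percentages))
    return mean, sem


percentages = [12.5, 37.5, 40.0, 10.0]  # hypothetical per-region percentages
dist = percentage_distribution(percentages, 10)
mean, sem = mean_and_sem(percentages)
print(dist)
print(mean, sem)
```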

    The more bins you have, and the more data and variation in the percentages you have, the better you can estimate the variation in the distribution of your percentages.

    It would probably be wise, though, to outline exactly what you are trying to do, because this advice may be detrimental if you have a specific purpose that runs contrary to this kind of analysis.