
Weighted Mean: different sample size and variance

  1. Apr 8, 2015 #1
    If I have three sets of numbers
    A is numbers between 0 and 0.09
    B is numbers between 0.091 and 0.011
    C is numbers between 0.011 and 0.1

    where the number of elements in A is, say, 37, in B is 16, and in C is 178. So the three arrays have different numbers of points and span different physical lengths. Plotted, they look like this:
    plot.jpg
    How do I find the mean of all the points?
    I was assigning each section a weight equal to its physical length multiplied by its number of points, then taking the ordinary mean of each section, multiplying it by that weight, and dividing by the sum of the weights, but I'm not sure if this is right:
    [tex]
    \frac{w_A m_A + w_B m_B + w_C m_C}{w_A + w_B + w_C}
    [/tex]
    where
    [tex]
    w_A = l_A n_A
    [/tex]
    [tex]
    l_A = 0.09-0
    [/tex]
    [tex]
    n_A = 37
    [/tex]
    My way may be correct but it seems wrong? I've just guessed really. Any help greatly appreciated.
     
  3. Apr 8, 2015 #2
    Why are the numbers grouped, and what do they represent?
     
  4. Apr 8, 2015 #3

    mathman

    Science Advisor
    Gold Member

    The physical lengths shouldn't enter into the calculation. Add the averages for each section multiplied by the number of points. Then divide by the total number of points.
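
    The count-weighted combination described above can be sketched as follows. This is only an illustration: the function name `pooled_mean` and the sample values are made up and are not the poster's data. It shows that weighting each section mean by its number of points reproduces the plain mean of all points taken together.

```python
def pooled_mean(group_means, group_sizes):
    """Combine per-group means into the mean of all points.

    Each group mean is weighted by its number of points only;
    no physical length enters the calculation.
    """
    total_points = sum(group_sizes)
    weighted_sum = sum(m * n for m, n in zip(group_means, group_sizes))
    return weighted_sum / total_points


# Hypothetical raw values for three sections (illustration only).
group_a = [1.0, 2.0, 3.0]
group_b = [10.0, 20.0]
group_c = [5.0, 5.0, 5.0, 5.0]
groups = [group_a, group_b, group_c]

means = [sum(g) / len(g) for g in groups]
sizes = [len(g) for g in groups]

combined = pooled_mean(means, sizes)

# Direct mean over every point, ignoring the grouping.
all_points = group_a + group_b + group_c
direct = sum(all_points) / len(all_points)

print(combined, direct)
```

    The two printed numbers agree up to floating-point rounding, which is why the grouping does not matter for the ordinary sample mean.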
     
  5. Apr 8, 2015 #4
    I need to know the purpose of the means you are trying to calculate.
     
  6. Apr 9, 2015 #5
    It's mesh nodes along a boundary, where each point has a unique value, u. So I thought the size (physical length) of each section would be important?
     
  7. Apr 9, 2015 #6
    So you want the average of U defined at the nodes, weighted according to the size of the mesh? What is the significance of the grouping to get the three means?
     
  8. Apr 10, 2015 #7

    FactChecker

    Science Advisor
    Gold Member

    I assume you don't have the original raw data but only know the sample mean of each group. Then you should do what @mathman suggests. It is just as though you replaced the true value of each data point with the mean of that point's group. Without the raw data, that is the best you can do. Of course, if you have the raw data, just take the sample mean of the entire data set and ignore the groups the points were in.
     
  9. Apr 12, 2015 #8
    I do have the values of U at each point in the mesh. I just thought that, because the mesh is unevenly spaced, taking a simple mean of these values would not yield the correct result.
     
  10. Apr 12, 2015 #9

    FactChecker

    Science Advisor
    Gold Member

    I don't understand. Do you mean that the measurements for the different sets are in different units? If they are all measured in the same units, I don't think you have to treat the data in the sets differently.
     
  11. Apr 12, 2015 #10
    Ignoring the red line, say the blue line is u. The points on u are unevenly spaced: they are closer together around 0.01, where there is a sudden jump. If I took a simple mean in, say, MATLAB with mean(U(Y==0)), this would not account for the fact that the points are unequally spaced. How would I account for this?
    fig4 zoom.jpg
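
    One possible reading of the question is that the quantity wanted is the spatial average of u along the boundary rather than the sample mean of the nodal values. Under that assumption, which nobody in the thread confirms, a trapezoid-rule average weights each value by the spacing around it. The sketch below uses a hypothetical function name and made-up node positions clustered near 0.01; it is not the poster's mesh.

```python
def trapezoid_mean(x, u):
    """Length-weighted mean of u over [x[0], x[-1]] using the trapezoid rule.

    Assumes x is sorted in increasing order and has at least two points.
    """
    if len(x) != len(u) or len(x) < 2:
        raise ValueError("x and u must have the same length (at least 2)")
    integral = 0.0
    for i in range(len(x) - 1):
        # Area of one trapezoid between neighbouring nodes.
        integral += 0.5 * (u[i] + u[i + 1]) * (x[i + 1] - x[i])
    return integral / (x[-1] - x[0])


# Hypothetical nodes: coarse away from 0.01, fine around the jump.
x = [0.0, 0.009, 0.0095, 0.010, 0.0105, 0.011, 0.1]
u = [0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0]

simple = sum(u) / len(u)          # every node counts equally
weighted = trapezoid_mean(x, u)   # every node counts by the length it covers

print(simple, weighted)
```

    In this made-up example the simple mean is 0.5, while the length-weighted mean is about 0.9, because the value 1.0 holds over most of the boundary even though few nodes sit there. Which of the two is "correct" depends on what the mean is meant to represent.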
     
  12. Apr 12, 2015 #11

    FactChecker

    Science Advisor
    Gold Member

    If you have the raw data, and the values of all the data are measured on the same real line in the same units, then the sample mean of the raw data is unambiguous. Just use the standard equation. Maybe you are looking for something other than the mean of the data. It sounds like your data may not be from a single random distribution, but rather from three different distributions.
     