Weighted Mean: different sample size and variance

AnneElizabeth
If I have three sets of numbers
A is numbers between 0 and 0.09
B is numbers between 0.091 and 0.011
C is numbers between 0.011 and 0.1

where A has, say, 37 elements, B has 16 and C has 178. So the three arrays have different numbers of points and span different distances. Plotted, they look like this:
plot.jpg

How do I find the mean of all the points?
I was assigning each section a weight equal to its physical length multiplied by its number of points, then taking the ordinary mean of each section, multiplying it by this weight and dividing by the sum of the weights, but I'm not sure if this is right:
$$\frac{w_A m_A + w_B m_B + w_C m_C}{w_A + w_B + w_C}$$
where
$$w_A = l_A n_A$$
$$l_A = 0.09 - 0$$
$$n_A = 37$$
My way may be correct but it seems wrong? I've just guessed really. Any help greatly appreciated.
 
Why are the numbers grouped and what do they represent?
 
The physical lengths shouldn't enter into the calculation. Multiply the average of each section by the number of points in that section and add the products. Then divide by the total number of points.
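The count-weighted combination described in this reply can be sketched in a few lines of Python. This is only an illustration: the function name `combine_group_means` and the sample values are hypothetical, not data from the thread. It also checks that the result matches the plain mean of the pooled values.

```python
def combine_group_means(means, counts):
    """Overall mean from per-group means, each weighted by its group size."""
    if len(means) != len(counts):
        raise ValueError("means and counts must have the same length")
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one group must contain points")
    return sum(m * n for m, n in zip(means, counts)) / total


# Hypothetical values for three groups (not the thread's data).
groups = {"A": [1.0, 2.0, 3.0], "B": [10.0], "C": [4.0, 6.0]}

means = [sum(v) / len(v) for v in groups.values()]
counts = [len(v) for v in groups.values()]
combined = combine_group_means(means, counts)

# Pool every value and take the ordinary mean for comparison.
all_values = [x for v in groups.values() for x in v]
direct = sum(all_values) / len(all_values)

print(combined, direct)
```

The two printed numbers agree up to rounding, which is why the grouping makes no difference to the sample mean when the raw values are available.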
 
I need to know the purpose of the means you are trying to calculate.
 
It's mesh nodes along a boundary, where each point has a unique value, u. So I thought the size (physical length) of each section would be important?
 
So you want the average of U defined at a node, weighted according to the size of the mesh? What is the significance of the grouping to get the three means?
 
I assume you don't have the original raw data but only know the sample mean of each group. Then you should do what @mathman suggests. It is just as though you replaced the true value of each data point with the mean of that point's group. Without the raw data, that is the best you can do. Of course, if you have the raw data, just take the sample mean of the entire data set and ignore the groups the points were in.
 
I do have the values of U at each point in the mesh, I just thought because the mesh is unevenly spaced, taking a simple mean of these values would not yield the correct result.
 
AnneElizabeth said:
I do have the values of U at each point in the mesh, I just thought because the mesh is unevenly spaced, taking a simple mean of these values would not yield the correct result.
I don't understand. Do you mean that the measurements for the different sets are in different units? If they are all measured in the same units, I don't think you have to treat the data in the sets differently.
 
  • #10
Ignoring the red line, say the blue line is u. But the points on u are unevenly spaced: they are closer together around 0.01, where there is a sudden jump. If I took a simple mean in, say, MATLAB with mean(U(Y==0)), this would not account for the fact that the points are unequally spaced. I'm wondering how I would account for this?
fig4 zoom.jpg
 
  • #11
If you have the raw data, and the values of all the data are measured on the same real line in the same units, then the sample mean of the raw data is unambiguous. Just use the standard equation. Maybe you are looking for something other than the mean of the data. It sounds like your data may not be from a single random distribution, but rather from three different distributions.
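If the quantity wanted is the average of u along the boundary rather than the sample mean of the nodal values, one common option is to integrate u over the coordinate and divide by the length. The thread does not propose this; the sketch below uses the trapezoidal rule as an assumed technique. It assumes u varies roughly linearly between neighbouring nodes, and the function name `length_weighted_mean` and the sample arrays are hypothetical.

```python
def length_weighted_mean(y, u):
    """Approximate (1/L) * integral of u over y using the trapezoidal rule."""
    if len(y) != len(u) or len(y) < 2:
        raise ValueError("need two or more nodes, with one u value per node")
    pairs = sorted(zip(y, u))  # order the nodes by coordinate
    integral = 0.0
    for (y0, u0), (y1, u1) in zip(pairs, pairs[1:]):
        # Each segment contributes its average value times its length.
        integral += 0.5 * (u0 + u1) * (y1 - y0)
    length = pairs[-1][0] - pairs[0][0]
    if length == 0:
        raise ValueError("nodes must span a nonzero length")
    return integral / length


# Hypothetical nodes clustered near 0.01, where u jumps from 0 to 1.
y = [0.0, 0.009, 0.0095, 0.01, 0.0105, 0.011, 0.1]
u = [0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0]

simple = sum(u) / len(u)               # every node counts equally
weighted = length_weighted_mean(y, u)  # every segment counts by its length

print(simple, weighted)
```

With these made-up values the simple mean is 0.5, while the length-weighted mean is about 0.9, because the densely meshed jump region covers only a small part of the boundary. MATLAB's `trapz` performs the same kind of integration.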
 