Weighted Mean: different sample size and variance


Discussion Overview

The discussion revolves around calculating a weighted mean from three sets of numbers with different sample sizes and variances. Participants explore the implications of using physical lengths and the significance of grouping the data, while considering the context of mesh nodes along a boundary in a physical system.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant proposes a method for calculating the mean by assigning weights based on the physical length of each section and the number of points, but expresses uncertainty about its correctness.
  • Another participant questions the purpose of grouping the numbers and what they represent.
  • Some participants argue that physical lengths should not factor into the mean calculation, suggesting instead to use the averages of each section multiplied by the number of points.
  • A participant seeks clarification on the purpose of the means being calculated, emphasizing the importance of understanding the context.
  • One participant asks whether the average should be weighted according to the size of the mesh, and questions the significance of the grouping for calculating the means.
  • Another participant assumes that the original raw data is not available and suggests using the sample means of each group instead, while noting that if raw data is available, the overall sample mean should be calculated directly.
  • One participant expresses concern that a simple mean might not yield accurate results due to uneven spacing of the mesh nodes.
  • Another participant argues that if all measurements are in the same units, the data sets do not need to be treated differently, questioning the need for a weighted approach.
  • Finally, a participant suggests that the data may originate from different distributions, which could complicate the mean calculation.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the correct method for calculating the mean. There are multiple competing views regarding the role of physical lengths, the significance of grouping, and whether to use raw data or sample means.

Contextual Notes

There are unresolved questions about the assumptions underlying the proposed methods, particularly regarding the treatment of unevenly spaced data and the implications of using different distributions.

AnneElizabeth
If I have three sets of numbers
A contains numbers between 0 and 0.09
B contains numbers between 0.091 and 0.011
C contains numbers between 0.011 and 0.1

where A has, say, 37 elements, B has 16 and C has 178. So the three arrays have different numbers of points and span different distances. Plotted, they look like this:
[Image: plot.jpg]

How do I find the mean of all the points?
I was assigning weights by multiplying the physical length of each section by the number of points in it, then taking the ordinary mean of each section, multiplying it by this weight, and dividing by the sum of the weights, but I'm not sure if this is right:
$$\frac{w_A m_A + w_B m_B + w_C m_C}{w_A + w_B + w_C}$$
where $w_A = l_A n_A$, $l_A = 0.09 - 0$ and $n_A = 37$.
My way may be correct but it seems wrong? I've just guessed really. Any help greatly appreciated.
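The proposed combination can be written out as a small function. This is only a sketch of the formula above: the function name is hypothetical, and it takes the section means $m_g$, counts $n_g$ and lengths $l_g$ as plain lists.

```python
def length_count_weighted_mean(means, counts, lengths):
    """Combine section means m_g using weights w_g = l_g * n_g, as in the proposed formula."""
    if not (len(means) == len(counts) == len(lengths)):
        raise ValueError("means, counts and lengths must have the same length")
    # w_g = l_g * n_g for each section
    weights = [l * n for l, n in zip(lengths, counts)]
    # (w_A m_A + w_B m_B + w_C m_C) / (w_A + w_B + w_C)
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)
```

With all $l_g$ equal, this reduces to weighting by $n_g$ alone. If the nodes within a section are roughly evenly spaced, $n_g$ already grows in proportion to $l_g$, so weighting by $l_g n_g$ can end up counting section length twice.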
 
Why are the numbers grouped and what do they represent?
 
The physical lengths shouldn't enter into the calculation. Add the averages for each section multiplied by the number of points. Then divide by the total number of points.
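The recipe in this reply can be checked numerically. Below is a minimal Python sketch using made-up values for three groups (not the thread's data). The function name `pooled_mean_from_groups` is hypothetical.

```python
from statistics import fmean


def pooled_mean_from_groups(means, counts):
    """Mean of all points, recovered from per-group means m_g and group sizes n_g."""
    # Sum of (group mean * number of points), divided by the total number of points
    return sum(m * n for m, n in zip(means, counts)) / sum(counts)


# Hypothetical raw values for three groups of different sizes
groups = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
means = [fmean(g) for g in groups]
counts = [len(g) for g in groups]

combined = pooled_mean_from_groups(means, counts)
direct = fmean([x for g in groups for x in g])  # plain mean of every point
print(combined, direct)
```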
 
I need to know the purpose of the means you are trying to calculate.
 
They're mesh nodes along a boundary, where each point has its own value, u. So I thought the size (physical length) of each section would be important?
 
So you want the average of U defined at a node, weighted according to the size of the mesh? What is the significance of the grouping to get the three means?
 
I assume you don't have the original raw data but only know the sample mean of each group. Then you should do what @mathman suggests. It is just as though you replaced the true value of each data point with the mean of that point's group. Without the raw data, that is the best you can do. Of course, if you have the raw data, just take the sample mean of the entire data set and ignore the groups the points were in.
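Why the two routes agree, written with the symbols from the original post ($m_g$ and $n_g$ for $g \in \{A, B, C\}$), plus $x_i$ for the raw values and $N$ for the total count (both introduced here for illustration):

$$\frac{\sum_{g} n_g m_g}{\sum_{g} n_g} = \frac{1}{N}\sum_{g} n_g \left(\frac{1}{n_g}\sum_{i \in g} x_i\right) = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad N = n_A + n_B + n_C.$$

Each product $n_g m_g$ is just the sum of the values in group $g$, so the grouping drops out and no section length enters the result.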
 
I do have the values of U at each point in the mesh, I just thought because the mesh is unevenly spaced, taking a simple mean of these values would not yield the correct result.
 
AnneElizabeth said:
I do have the values of U at each point in the mesh, I just thought because the mesh is unevenly spaced, taking a simple mean of these values would not yield the correct result.
I don't understand. Do you mean that the measurements for the different sets are in different units? If they are all measured in the same units, I don't think you have to treat the data in the sets differently.
 
Ignoring the red line, say the blue line is u. The points on u are unevenly spaced: they are closer together around 0.01, where there is a sudden jump. If I took a simple mean in, say, MATLAB with mean(U(Y==0)), this would not account for the unequal spacing of the points. How would I account for this?
[Image: fig4 zoom.jpg]
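None of the replies spells this out, but the quantity wanted may be the spatial average of u along the boundary, that is, $\frac{1}{L}\int u\,dx$ over a boundary of length $L$, rather than the sample mean of the node values. If so, a common approach is to weight each node by the length of boundary it represents, for example with the trapezoidal rule. The sketch below is a swapped-in technique, not something proposed in the thread. It assumes the nodes are ordered by position along a straight boundary and that u varies roughly linearly between neighbouring nodes. The function name `boundary_average` and the sample data are hypothetical.

```python
def boundary_average(x, u):
    """Length-weighted average of nodal values u over [x[0], x[-1]].

    Assumes x holds node positions sorted in ascending order along a straight
    boundary (possibly unevenly spaced) and that u is approximately linear
    between neighbouring nodes, i.e. the trapezoidal rule.
    """
    if len(x) != len(u) or len(x) < 2:
        raise ValueError("need matching x and u with at least two nodes")
    length = x[-1] - x[0]
    if length <= 0:
        raise ValueError("x must span a positive length")
    integral = 0.0
    # Each interval [x0, x1] contributes its width times the mean of its end values.
    for x0, x1, u0, u1 in zip(x, x[1:], u, u[1:]):
        integral += (x1 - x0) * (u0 + u1) / 2.0
    return integral / length


# Illustrative, made-up data: nodes clustered near 0.1, with u equal to x.
x = [0.0, 0.1, 0.11, 0.12, 1.0]
u = [0.0, 0.1, 0.11, 0.12, 1.0]
print(sum(u) / len(u))         # simple node mean, pulled toward the clustered nodes
print(boundary_average(x, u))  # spatial average over [0, 1]
```

Here the simple mean comes out at roughly 0.266, while the spatial average is 0.5, which is exact for a linear u. In MATLAB, which the original post uses, the same quantity could be computed as trapz(x, u) / (x(end) - x(1)), with x the sorted node positions along the boundary.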
 
If you have the raw data, and the values of all the data are measured on the same real line in the same units, then the sample mean of the raw data is unambiguous. Just use the standard equation. Maybe you are looking for something other than the mean of the data. It sounds like your data may not be from a single random distribution, but rather from three different distributions.
 
