Weighted Mean: different sample size and variance

In summary, the conversation is about finding the mean of three sets of numbers that have different numbers of elements and cover intervals of different lengths. The original poster weights each section's mean by the product of the section's physical length and its number of points, but is unsure whether this is the correct approach. The replies ask why the numbers are grouped and whether the physical length should enter the calculation at all. It is suggested that, if the raw data is available, the standard equation for the mean should be used.
  • #1
AnneElizabeth
If I have three sets of numbers
A is numbers between 0 and 0.09
B is numbers between 0.091 and 0.011
C is numbers between 0.011 and 0.1

where A has, say, 37 elements, B has 16 and C has 178. So the three arrays have different numbers of points and span different distances. Plotted, they look like this:
plot.jpg

How do I find the mean of all the points?
I was assigning each section a weight by multiplying its physical length by its number of points, then taking the ordinary mean of each section, multiplying it by this weight, adding these up and dividing by the sum of the weights, but I'm not sure if this is right:
[tex]
\frac{w_A m_A + w_B m_B + w_C m_C}{w_A + w_B + w_C}
[/tex]
where
[tex]
w_A = l_A n_A
[/tex]
[tex]
l_A = 0.09-0
[/tex]
[tex]
n_A = 37
[/tex]
My way may be correct but it seems wrong? I've just guessed really. Any help greatly appreciated.
 
  • #2
Why are the numbers grouped, and what do they represent?
 
  • #3
The physical lengths shouldn't enter into the calculation. Multiply the average for each section by its number of points and add the results. Then divide by the total number of points.
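
The calculation described in this reply can be sketched in a few lines of Python. This is only an illustration: the function name combined_mean is hypothetical, the group sizes are the ones quoted in post #1, and the group means are made-up placeholders because the thread does not give them.

```python
def combined_mean(group_means, group_counts):
    """Count-weighted mean of group means: sum(n_i * m_i) / sum(n_i)."""
    if len(group_means) != len(group_counts):
        raise ValueError("need one count per group mean")
    total_points = sum(group_counts)
    weighted_sum = sum(n * m for n, m in zip(group_counts, group_means))
    return weighted_sum / total_points


# Group sizes from post #1; the means are placeholders (assumption).
counts = [37, 16, 178]
means = [0.20, 0.55, 0.90]
print(combined_mean(means, counts))
```

Because each group mean times its count is just the sum of that group's values, the result equals the ordinary mean of all the points pooled together.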
 
  • #4
I need to know the purpose of the means you are trying to calculate.
 
  • #5
It's mesh nodes along a boundary, where each point has a unique value, u. So I thought the size (physical length) of each section would be important?
 
  • #6
So you want the average of U defined at a node, weighted according to the size of the mesh? What is the significance of the grouping to get the three means?
 
  • #7
I assume you don't have the original raw data but only know the sample mean of each group. Then you should do what @mathman suggests. It is just as though you replaced the true value of each data point with the mean of that point's group. Without the raw data, that is the best you can do. Of course, if you have the raw data, just take the sample mean of the entire data set and ignore the groups they were in.
 
  • #8
I do have the values of U at each point in the mesh, I just thought because the mesh is unevenly spaced, taking a simple mean of these values would not yield the correct result.
 
  • #9
AnneElizabeth said:
I do have the values of U at each point in the mesh, I just thought because the mesh is unevenly spaced, taking a simple mean of these values would not yield the correct result.
I don't understand. Do you mean that the measurements for the different sets are in different units? If they are all measured in the same units, I don't think you have to treat the data in the sets differently.
 
  • #10
Ignoring the red line, say the blue line is u. The points on u are unevenly spaced: they are closer together around 0.01, where there is a sudden jump. If I took a simple mean in, say, MATLAB with mean(U(Y==0)), this would not account for the fact that the points are unequally spaced. How would I account for this?
fig4 zoom.jpg
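
One possible way to account for uneven spacing is to weight by segment length using the trapezoidal rule. This assumes the quantity wanted is the spatial average of u along the boundary rather than the sample mean of the nodal values. It is not proposed in the thread, and the function names and sample data below are hypothetical.

```python
def simple_mean(u):
    """Ordinary mean of the nodal values; ignores where the nodes sit."""
    return sum(u) / len(u)


def length_weighted_mean(y, u):
    """Trapezoidal estimate of (1/L) * integral of u dy.

    y must be sorted in increasing order; L = y[-1] - y[0].
    """
    if len(y) != len(u) or len(y) < 2:
        raise ValueError("need at least two matching (y, u) pairs")
    if y[-1] <= y[0]:
        raise ValueError("y must span a positive length")
    area = 0.0
    for k in range(len(y) - 1):
        # Each segment contributes its average value times its length.
        area += 0.5 * (u[k] + u[k + 1]) * (y[k + 1] - y[k])
    return area / (y[-1] - y[0])


# Made-up nodes, clustered near 0.01 where u jumps (assumption).
y = [0.0, 0.005, 0.009, 0.0095, 0.010, 0.0105, 0.011, 0.05, 0.1]
u = [0.0, 0.0, 0.0, 0.1, 0.5, 0.9, 1.0, 1.0, 1.0]
print(simple_mean(u), length_weighted_mean(y, u))
```

The two numbers differ whenever the nodes cluster in one region, because the simple mean counts every node equally while the trapezoidal estimate counts every unit of length equally. Which one is appropriate depends on what the average is meant to represent.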
 
  • #11
If you have the raw data, and the values of all the data are measured on the same real line in the same units, then the sample mean of the raw data is unambiguous. Just use the standard equation. Maybe you are looking for something other than the mean of the data. It sounds like your data may not be from a single random distribution, but rather from three different distributions.
 

1. What is the weighted mean and how is it calculated for different sample sizes and variances?

The weighted mean is a statistical measure used to find the average of a set of numbers, where each number is multiplied by a weight corresponding to its importance. It is calculated by multiplying each number in the set by its corresponding weight, then adding all the products and dividing by the sum of the weights. When group means come from samples of different sizes, each mean is typically weighted by its sample size; when values have different variances, a common choice is to weight each value by the inverse of its variance.
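
Written as a formula, with x_i denoting the values and w_i their weights (symbols introduced here for illustration only), the calculation described above is
[tex]
\bar{x}_w = \frac{\sum_{i} w_i x_i}{\sum_{i} w_i}
[/tex]
With all weights equal this reduces to the ordinary mean. For three group means it has the same form as the expression in post #1.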

2. Why is the weighted mean used instead of a regular mean?

The weighted mean is used when the data being analyzed has varying levels of importance or significance. It gives more weight to values that are more significant, rather than treating all values equally. This is useful when combining group means from samples of different sizes, or measurements of differing precision, since the larger or more precise samples then count for more.

3. How do you determine the weights for a weighted mean?

The weights for a weighted mean can be determined in a few different ways, depending on the situation. In some cases, the weights may be predetermined based on the significance of each value. In other cases, the weights may be set to the inverse of the variance of each value, so that values with higher variance receive lower weight. The weights can also be determined through statistical methods such as regression analysis, or from a formula specific to the problem.
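
The inverse-variance option mentioned above can be sketched as follows. This is a minimal illustration that assumes the values are independent estimates of the same quantity with known variances; the function name and the sample numbers are hypothetical.

```python
def inverse_variance_mean(values, variances):
    """Weighted mean with weights w_i = 1 / variance_i.

    Returns the combined estimate and its variance, 1 / sum(w_i),
    which holds when the estimates are independent.
    """
    if len(values) != len(variances):
        raise ValueError("need one variance per value")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total_weight
    return mean, 1.0 / total_weight


# Made-up estimates: the second is noisier, so it counts for less.
estimate, variance = inverse_variance_mean([0.0, 10.0], [1.0, 4.0])
print(estimate, variance)
```

The result is pulled toward the more precise value, and the combined variance is smaller than either input variance.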

4. Can the weighted mean be affected by outliers in the data?

Yes, the weighted mean can be affected by outliers in the data. Outliers are values that are significantly different from the rest of the data, and if they have a high weight, they can greatly influence the weighted mean. It is important to carefully consider the weights assigned to outliers in order to accurately represent the data.
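
A small sketch of this effect, using made-up numbers and a hypothetical weighted_mean helper, shows how the weight given to a single outlier changes the result:

```python
def weighted_mean(values, weights):
    """sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("need one weight per value")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Made-up data; the last value is the outlier (assumption).
values = [1.0, 1.2, 0.9, 25.0]
equal = weighted_mean(values, [1, 1, 1, 1])       # outlier counted like the rest
down = weighted_mean(values, [1, 1, 1, 0.01])     # outlier nearly ignored
up = weighted_mean(values, [1, 1, 1, 10])         # outlier dominates
print(equal, down, up)
```

The same data gives three quite different answers, which is why the choice of weights needs a justification that comes from the problem rather than from the desired result.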

5. What are the limitations of using a weighted mean for data analysis?

One limitation of using a weighted mean is that it assumes the weights are accurate and representative of the data. If the weights are not carefully chosen or if there is bias in the weights, the resulting weighted mean may not accurately reflect the data. Additionally, like the ordinary mean, the weighted mean is sensitive to extreme values, so it may be a poor summary of heavily skewed data.
