# Weighted Mean: different sample size and variance

1. Apr 8, 2015

### AnneElizabeth

Suppose I have three sets of numbers:
A contains numbers between 0 and 0.09;
B contains numbers between 0.091 and 0.011;
C contains numbers between 0.011 and 0.1.

The number of elements is, say, 37 in A, 16 in B and 178 in C. So the three arrays have different numbers of points and cover different distances.

How do I find the mean of all the points?
I was assigning weights by multiplying the physical length of each section by the number of points in it. I then took the ordinary mean of each section, multiplied it by its weight, summed these, and divided by the sum of the weights, but I'm not sure this is right:
$$\frac{w_A m_A + w_B m_B + w_C m_C}{w_A + w_B + w_C}$$
where, for section A (and likewise for B and C),
$$w_A = l_A n_A$$
$$l_A = 0.09-0$$
$$n_A = 37$$
My approach may be correct, but it feels wrong; I've really just guessed. Any help is greatly appreciated.
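
A minimal Python sketch of the scheme described above, with weights $w = l \cdot n$ per section. The function name is illustrative, and the section means and the lengths of B and C are hypothetical placeholders: the post gives only the counts and $l_A$.

```python
def length_count_weighted_mean(means, counts, lengths):
    """Combine section means with weights w = l * n (the scheme in the post)."""
    if not (len(means) == len(counts) == len(lengths)):
        raise ValueError("means, counts and lengths must have the same length")
    weights = [l * n for l, n in zip(lengths, counts)]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)


# Counts and l_A are from the post; everything else is hypothetical.
counts = [37, 16, 178]        # n_A, n_B, n_C
lengths = [0.09, 0.02, 0.09]  # l_A, l_B (hypothetical), l_C (hypothetical)
means = [1.0, 2.0, 3.0]       # m_A, m_B, m_C (hypothetical)

print(length_count_weighted_mean(means, counts, lengths))
```

The result depends on the lengths; if all lengths are equal, it reduces to weighting by the counts alone, which is what mathman suggests in reply.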

2. Apr 8, 2015

### gleem

Why are the numbers grouped, and what do they represent?

3. Apr 8, 2015

### mathman

The physical lengths shouldn't enter into the calculation. Multiply each section's average by its number of points, add these products, and then divide by the total number of points.
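
Why this works: writing $x_{ij}$ for the $j$-th point of section $i \in \{A, B, C\}$ (notation introduced here), each section mean times its count recovers that section's sum, so the recipe gives exactly the ordinary mean of all the points:

$$m_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij} \quad\Longrightarrow\quad \frac{n_A m_A + n_B m_B + n_C m_C}{n_A + n_B + n_C} = \frac{1}{N}\sum_{i \in \{A,B,C\}} \sum_{j=1}^{n_i} x_{ij}, \qquad N = n_A + n_B + n_C$$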

4. Apr 8, 2015

### gleem

I need to know the purpose of the means you are trying to calculate.

5. Apr 9, 2015

### AnneElizabeth

They are mesh nodes along a boundary, and each node has its own value of u. So I thought the size (physical length) of each section would matter?

6. Apr 9, 2015

### gleem

So you want the average of U at the nodes, weighted according to the size of the mesh? What is the significance of grouping the points to get three means?
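
One common way to weight each node by the mesh size around it (an assumption about what is wanted; the thread does not settle it) is to give node $k$ half the length of its two adjacent segments. For nodes at positions $y_1 < y_2 < \dots < y_N$ (notation introduced here) with values $u_k$:

$$\bar u = \frac{\sum_{k=1}^{N} w_k u_k}{\sum_{k=1}^{N} w_k}, \qquad w_1 = \frac{y_2 - y_1}{2}, \quad w_k = \frac{y_{k+1} - y_{k-1}}{2}\ (1 < k < N), \quad w_N = \frac{y_N - y_{N-1}}{2}$$

The weights sum to $y_N - y_1$, and $\bar u$ is the trapezoidal-rule approximation of $\frac{1}{y_N - y_1}\int_{y_1}^{y_N} u\,dy$, the average of $u$ over the length of the boundary.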

7. Apr 10, 2015

### FactChecker

I assume you don't have the original raw data and only know the sample mean of each group. In that case, do what @mathman suggests. It is as though you replaced the true value of each data point with the mean of its group. Without the raw data, that is the best you can do. Of course, if you do have the raw data, just take the sample mean of the entire data set and ignore the groups the points were in.
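
A small check of both routes on hypothetical data: pooling the group means with the counts as weights gives the same number as averaging the raw data directly. The helper names are illustrative.

```python
def mean(values):
    """Ordinary sample mean of a non-empty sequence."""
    values = list(values)
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)


def mean_from_group_summaries(means, counts):
    """Pooled mean when only each group's mean and size are known."""
    return sum(n * m for n, m in zip(counts, means)) / sum(counts)


# Hypothetical raw data in three groups.
groups = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]

from_raw = mean(x for g in groups for x in g)
from_summaries = mean_from_group_summaries([mean(g) for g in groups],
                                           [len(g) for g in groups])
print(from_raw, from_summaries)
```

So for the ordinary mean, knowing each group's mean and size loses nothing compared with the raw data.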

8. Apr 12, 2015

### AnneElizabeth

I do have the values of U at each point in the mesh. I just thought that, because the mesh is unevenly spaced, a simple mean of these values would not give the correct result.

9. Apr 12, 2015

### FactChecker

I don't understand. Do you mean that the measurements for the different sets are in different units? If they are all measured in the same units, I don't think you have to treat the data in the sets differently.

10. Apr 12, 2015

### AnneElizabeth

Ignoring the red line, say the blue line is u. The points on u are unevenly spaced: they are closer together around 0.01, where there is a sudden jump. If I took a simple mean in, say, MATLAB (`mean(U(Y==0))`), this would not account for the unequal spacing. How would I account for it?
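
A minimal sketch, assuming the quantity wanted is the average of u over the length of the boundary (the thread does not settle this). Each node is weighted by half the length of its two adjacent mesh segments, which is the trapezoidal rule. The inputs `y` (sorted node coordinates along the boundary) and `u` are hypothetical, and the grid below is made up to be clustered near one end.

```python
def node_weights(y):
    """Trapezoidal-rule weights: each node gets half of each adjacent segment."""
    if len(y) < 2:
        raise ValueError("need at least two nodes")
    w = [0.0] * len(y)
    for k in range(len(y) - 1):
        h = y[k + 1] - y[k]
        if h <= 0:
            raise ValueError("node coordinates must be strictly increasing")
        w[k] += h / 2
        w[k + 1] += h / 2
    return w


def boundary_average(y, u):
    """Length-weighted average of u over [y[0], y[-1]]."""
    if len(y) != len(u):
        raise ValueError("y and u must have the same length")
    w = node_weights(y)
    return sum(wk * uk for wk, uk in zip(w, u)) / sum(w)


# Hypothetical grid, clustered near y = 1, with u = y (exact average 0.5).
y = [0.0, 0.5, 0.9, 0.95, 1.0]
u = list(y)

print(sum(u) / len(u))         # simple mean, pulled toward the dense region
print(boundary_average(y, u))  # length-weighted mean
```

Here the simple mean comes out near 0.67, while the length-weighted version gives 0.5 (up to floating-point rounding), the exact average of u = y over [0, 1]. In MATLAB, `trapz(y, u) / (y(end) - y(1))` computes the same trapezoidal average, again assuming `y` holds the sorted node coordinates along the boundary.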

11. Apr 12, 2015

### FactChecker

If you have the raw data, and the values of all the data are measured on the same real line in the same units, then the sample mean of the raw data is unambiguous. Just use the standard equation. Maybe you are looking for something other than the mean of the data. It sounds like your data may not be from a single random distribution, but rather from three different distributions.