Finding standard deviation of combination of data

songoku · Jun 10, 2024

I tried some workings but they got me nowhere. I just want to ask whether this question is solvable, i.e., whether the answer can be a numerical value. If yes, I want to try a bit more by myself before asking for a hint here.

Thanks

FactChecker · Jun 10, 2024

To be clear, by A+B, I assume you mean some set of data ##\{ c_i = a_i + b_i | a_i \in A,\ b_i \in B\}##.
In that case, the correlation between the ##a_i##s and associated ##b_i##s must be considered.
The general equation is Var(##X_A+Y_B##) = Var(##X_A##) + Var(##Y_B##) + 2 Cov(##X_A,\ Y_B##).
For uncorrelated random variables, ##X_A## and ##Y_B##, this becomes Var(##X_A+Y_B##) = Var(##X_A##) + Var(##Y_B##).
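
As a rough illustration of the general equation, here is a minimal Python sketch. The data are made up (they are not from the question), `pcov` is a hypothetical helper written for this example, and B is deliberately built to be correlated with A so that the covariance term matters.

```python
import random
import statistics

def pcov(xs, ys):
    """Population covariance of two equal-length, paired sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)  # fixed seed so the run is repeatable
a = [rng.gauss(50, 10) for _ in range(100)]   # made-up data for A
b = [2 * x + rng.gauss(0, 5) for x in a]      # made-up B, deliberately correlated with A
c = [x + y for x, y in zip(a, b)]             # paired sums c_i = a_i + b_i

var_sum = statistics.pvariance(c)
with_cov = statistics.pvariance(a) + statistics.pvariance(b) + 2 * pcov(a, b)
without_cov = statistics.pvariance(a) + statistics.pvariance(b)

print(var_sum, with_cov, without_cov)
```

The first two printed values should agree up to rounding, while the third (which drops the covariance term) should be visibly smaller for this correlated data.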

songoku · Jun 11, 2024

FactChecker said:

To be clear, by A+B, I assume you mean some set of data ##\{ c_i = a_i + b_i | a_i \in A,\ b_i \in B\}##.
In that case, the correlation between the ##a_i##s and associated ##b_i##s must be considered.
The general equation is Var(##X_A+Y_B##) = Var(##X_A##) + Var(##Y_B##) +2 Cov(##X_A,\ Y_B##).
For uncorrelated random variables, ##X_A## and ##Y_B##, this becomes Var(##X_A+Y_B##) = Var(##X_A##) + Var(##Y_B##)

Ah I see, so basically this question does not really make sense: the number of data in each group is not the same, so in A + B some data in A will have no match in B.

If the question is modified to ask for the standard deviation when the data in A is combined with the data in B (so the total is now 250 data), can we solve it? This is actually the version I tried and got stuck on (so I thought maybe the question does not give enough information).

Thanks

Hill · Jun 11, 2024

songoku said:

Ah I see, so basically this question not really making sense because the number of data in each group is not the same so A + B will result in some data in A has no match for data in B.

If the question is modified into finding the standard deviation if the data in A is combined with data in B (so now the total data is 250), can we solve it? Actually this is the one I tried and got stuck (so I thought maybe the information of the question is not enough)

Thanks

It can be solved if we assume that the groups are taken from the same population and have the same mean.

FactChecker · Jun 11, 2024

songoku said:

Ah I see, so basically this question not really making sense because the number of data in each group is not the same so A + B will result in some data in A has no match for data in B.

The first problem is that the meaning of "A+B" is undefined, or at least not clear to me. Do you mean the sum of random variables, ##X_A##, from A and ##X_B##, from B? In that case, you need to know which of the A samples match up and sum with which of the B samples.

songoku said:

If the question is modified into finding the standard deviation if the data in A is combined with data in B (so now the total data is 250), can we solve it? Actually this is the one I tried and got stuck (so I thought maybe the information of the question is not enough)

So you are talking about drawing samples of a random variable, X, from the union of A and B, ##A \cup B##. Are the samples drawn randomly uniformly from ##A \cup B##?
In that case, you should be able to use the standard equation for ##\sigma^2## that you gave above. Apply it to the entire 250 elements. Why do you say that it didn't work?

WWGD · Jun 11, 2024

Maybe to clarify: are these samples from two populations, A and B, or do they describe the whole population of interest?
You may do some tests to determine whether the data come from different populations. I believe the Wilcoxon rank-sum test is one such non-parametric test.

Gavran · Jun 12, 2024

What about the property ## \sigma_{A+B}^2 = \sigma_A^2 + \sigma_B^2 ## ?

FactChecker · Jun 12, 2024

Gavran said:

What about the property ## \sigma_{A+B}^2 = \sigma_A^2 + \sigma_B^2 ## ?

The OP defines A and B as sets. So A+B is not the sum of random variables. It is the sum of sets, whatever that means.
If you are talking about the sum of random variables, the formula is ##\sigma_{X+Y}^2 = \sigma_{X}^2 +\sigma_{Y}^2 + 2 cov(X,Y)##. Your "property" is wrong in general and only right for uncorrelated variables.
On the other hand, if you are talking about the union of sets, ##C=A\cup B##, with a random variable, ##X##, drawn with uniform distribution from ##C##, then it is still wrong. Consider the single-element sets ##A=\{0\}, B=\{100\}##. Clearly, ##\sigma_A = \sigma_B = 0## but ##\sigma_C = 50##.

songoku · Jun 12, 2024

FactChecker said:

The first problem is that the meaning of "A+B" is undefined, or at least not clear to me. Do you mean the sum of random variables, ##X_A##, from A and ##X_B##, from B? In that case, you need to know which of the A samples match up and sum with which of the B samples.

So you are talking about drawing samples of a random variable, X, from the union of A and B, ##A \cup B##. Are the samples drawn randomly uniformly from ##A \cup B##?
In that case, you should be able to use the standard equation for ##\sigma^2## that you gave above. Apply it to the entire 250 elements. Why do you say that it didn't work?

I am not really sure how to interpret the question. I posted the exact question, word for word.

In my opinion, it makes more sense if the interpretation is not the sum of random variables but maybe the sum of sets. Group A has 150 data with a standard deviation of 10, and group B has 100 data with a standard deviation of 20. Let's say I combine all the data into one set, set C, so this set contains 250 data, and I want to find the standard deviation of C.

This is what I did:
For group A:
$$\sigma_{a}^{2}=\frac{1}{n_a} \left(\Sigma a^2 - \frac{(\Sigma a)^2}{n_a}\right)$$
$$100=\frac{1}{150} \left(\Sigma a^2 - \frac{(\Sigma a)^2}{150}\right)$$
$$\Sigma a^2=15000+\frac{(\Sigma a)^2}{150} \qquad (1)$$

For group B:
$$\sigma_{b}^{2}=\frac{1}{n_b} \left(\Sigma b^2 - \frac{(\Sigma b)^2}{n_b}\right)$$
$$400=\frac{1}{100} \left(\Sigma b^2 - \frac{(\Sigma b)^2}{100}\right)$$
$$\Sigma b^2=40000+\frac{(\Sigma b)^2}{100} \qquad (2)$$

For group C:
$$\sigma_{c}^{2}=\frac{1}{n_c} \left(\Sigma c^2 - \frac{(\Sigma c)^2}{n_c}\right)$$
$$=\frac{1}{250} \left(\Sigma a^2 +\Sigma b^2 - \frac{(\Sigma a+\Sigma b)^2}{250}\right)$$
$$=\frac{1}{250}\left(15000+\frac{(\Sigma a)^2}{150} + 40000+\frac{(\Sigma b)^2}{100} - \frac{(\Sigma a+\Sigma b)^2}{250}\right)$$

Then I got stuck.

Thanks
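
One possible way to carry the last line a step further, as a sketch: write ##\Sigma a = 150\mu_a## and ##\Sigma b = 100\mu_b##, where the group means ##\mu_a## and ##\mu_b## are symbols introduced here for illustration (the question does not give them). The three terms containing the sums then collapse:

$$\frac{(\Sigma a)^2}{150}+\frac{(\Sigma b)^2}{100}-\frac{(\Sigma a+\Sigma b)^2}{250}=150\mu_a^2+100\mu_b^2-\frac{(150\mu_a+100\mu_b)^2}{250}=60(\mu_a-\mu_b)^2$$
$$\sigma_{c}^{2}=\frac{1}{250}\left(15000+40000+60(\mu_a-\mu_b)^2\right)=220+0.24(\mu_a-\mu_b)^2$$

So under this reading the combined variance depends on how far apart the two group means are, and it reduces to 220 only when ##\mu_a=\mu_b##.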

Hill · Jun 12, 2024

songoku said:

I am not really sure how to interpret the question. I posted the exact question, word by word.

In my opinion, it makes more sense if the interpretation is not the sum of random variables but maybe sum of sets. Group A has 150 data with standard deviation of 10 and group B has standard deviation of 20 with 100 data. Let say I combine all data into one set, set C, so this set contains 250 data and I want to find the standard deviation of C.

Have you tried the approach in post #4, or do you find it unreasonable?

songoku · Jun 12, 2024

Hill said:

Have you tried the approach of the post #4 or you find it unreasonable?

Oh, I did that and got ##\sqrt{220}## as the answer. I thought FactChecker was talking about something else, not using the assumption in post #4.

Thanks
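
For anyone who wants to check ##\sqrt{220}## numerically, here is a small Python sketch. It uses synthetic data: `two_point_group` is a hypothetical helper that builds a group with exactly the requested mean and population standard deviation, and the means passed in are assumptions, since the question gives none.

```python
import statistics

def two_point_group(n, mean, sd):
    """Hypothetical helper: n values (n even), half at mean - sd, half at mean + sd.
    Such a group has exactly the requested mean and population SD."""
    half = n // 2
    return [mean - sd] * half + [mean + sd] * half

def combined_variance(mean_a, mean_b):
    """Population variance of the 250 combined values for assumed group means."""
    a = two_point_group(150, mean_a, 10)  # group A: 150 data, sigma = 10
    b = two_point_group(100, mean_b, 20)  # group B: 100 data, sigma = 20
    return statistics.pvariance(a + b)

print(combined_variance(0, 0))    # equal means, the assumption in post #4
print(combined_variance(0, 10))   # means 10 apart
print(combined_variance(0, 50))   # means 50 apart
```

With equal means the combined variance comes out as 220, matching ##\sqrt{220}## for the standard deviation. Moving the means apart gives larger values, which suggests the answer is only pinned down once the means are assumed equal.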

FactChecker · Jun 13, 2024

songoku said:

I am not really sure how to interpret the question. I posted the exact question, word by word.

In my opinion, it makes more sense if the interpretation is not the sum of random variables but maybe sum of sets. Group A has 150 data with standard deviation of 10 and group B has standard deviation of 20 with 100 data. Let say I combine all data into one set, set C, so this set contains 250 data and I want to find the standard deviation of C.

For group C:
$$\sigma_{c}^{2}=\frac{1}{n_c} \left(\Sigma c^2 - \frac{(\Sigma c)^2}{n_c}\right)$$
$$=\frac{1}{250} \left(\Sigma a^2 +\Sigma b^2 - \frac{(\Sigma a+\Sigma b)^2}{250}\right)$$
$$=\frac{1}{250}\left(15000+\frac{(\Sigma a)^2}{150} + 40000+\frac{(\Sigma b)^2}{100} - \frac{(\Sigma a+\Sigma b)^2}{250}\right)$$

Then I stuck.

You are not stuck. You are done.
I have not checked your arithmetic for group C, but that is the correct approach (given certain assumptions about what your question means).
If you want to consider the combined set C as the entire population of possible values of a random variable drawn uniformly from C, then you have calculated the variance of that random variable.
If you want to consider the combined set C as the set of sample results, then you should make one change to your equation. It should be ##\sigma_{c}^{2}=\frac{1}{n_c -1} \left(\Sigma c^2 - \frac{(\Sigma c)^2}{n_c}\right)##. The divisor is reduced by 1 because the population mean is unknown and is being estimated by the sample mean.

PS. When you combine two sets into one, IMO, you should use the union symbol, ##A \cup B##, rather than a plus sign.

songoku · Jun 13, 2024

Oh OK, that means I can't get a numerical answer.

Thank you very much for the help and explanation FactChecker, Hill, WWGD, Gavran

FactChecker · Jun 13, 2024

songoku said:

Oh ok, it means I can't get the answer in numerical value.

Thank you very much for the help and explanation FactChecker, Hill, WWGD, Gavran

Oh, wait! I thought that you had the values of the summations of all the elements in ##A \cup B##. Don't you have that? How did you get the means of A and B?

songoku · Jun 13, 2024

FactChecker said:

Oh, wait! I thought that you had the values of the summations of all the elements in ##A \cup B##. Don't you have that? How did you get the means of A and B?

I posted the whole question in the OP; that's everything. I don't know the values of the summations of all the elements in ##A \cup B## and I don't have the means of A and B.

FactChecker · Jun 13, 2024

songoku said:

I posted all the questions in OP, that's everything. I don't know the values of the summations of all the elements in ##A \cup B## and I don't have the means of A and B.

Sorry, I misunderstood.

Interpreting A+B as ##A \cup B##:
There is no way to solve it from the standard deviations alone. Consider three simpler problems, each with the same standard deviation of 0 (or undefined, if you wish) for ##A## and ##B## individually, but significantly different standard deviations for ##A \cup B##:
1) A={0}, B={1}. ##\sigma_{sample A\cup B} = 0.70710678## and ##\sigma_{population A\cup B} = 0.5##
2) A={0}, B={10}. ##\sigma_{sample A\cup B} = 7.0710678## and ##\sigma_{population A\cup B} = 5##
3) A={0}, B={100}. ##\sigma_{sample A\cup B} = 70.710678## and ##\sigma_{population A\cup B} = 50##

If you don't like the 0 or undefined standard deviations for single-element sets A and B, you can easily make multiple-element examples.
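
The three examples above can be reproduced with Python's standard `statistics` module. This is only a sketch: `stdev` is the sample form (divisor n - 1) and `pstdev` is the population form (divisor n), and the dictionary name `results` is just a label chosen here.

```python
import statistics

results = {}
for b_value in (1, 10, 100):
    union = [0, b_value]                      # A = {0}, B = {b_value}
    sample_sd = statistics.stdev(union)       # divides by n - 1
    population_sd = statistics.pstdev(union)  # divides by n
    results[b_value] = (sample_sd, population_sd)
    print(b_value, sample_sd, population_sd)

# A single-element set has population SD 0; the sample SD is undefined,
# and statistics.stdev raises StatisticsError for fewer than two points.
print(statistics.pstdev([0]))
```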

Interpreting A+B as ##\{a+b| a\in A, b\in B, \text {selected independently and randomly}\}##:
Then apply ##\sigma_{A+B}^2 = \sigma_A^2 + \sigma_B^2## as @Gavran stated in post #7.
Since this is the only interpretation of A+B with a solution, it is probably the correct interpretation.
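
As a sketch of what that gives with the numbers mentioned earlier in the thread (##\sigma_A = 10##, ##\sigma_B = 20##), and assuming the two selections really are independent:

$$\sigma_{A+B}^2 = \sigma_A^2 + \sigma_B^2 = 10^2 + 20^2 = 500$$
$$\sigma_{A+B} = \sqrt{500} = 10\sqrt{5} \approx 22.36$$

Note that the group sizes, 150 and 100, play no role under this interpretation.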

songoku · Jun 14, 2024

I understand.

Thank you very much FactChecker
