# Understanding 2 equivalent formulations of both data set measures

fab13
TL;DR Summary
I would like to know whether 2 formulations for aggregating 2 measurements, in terms of expectation and standard deviation, are equivalent.
Two independent experiments have measured ##\tau_{1},\sigma_{1}## and ##\tau_{2},\sigma_{2}##, with ##\sigma_{i}## representing the errors on the measurements.

From these two measurements, assuming the errors are Gaussian, we want to get an estimate of ##\tau##
and its error (i.e. from a combination of the two measurements).

We choose the maximum likelihood method, with the pdf of each measurement given by:
$$f(\tau, \sigma)=\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{1}{2} \frac{(\tau-\hat{\tau})^{2}}{\sigma^{2}}\right)$$
One has to maximize the likelihood function:
$$\mathcal{L}=\prod_{i=1}^{2} \frac{1}{\sqrt{2 \pi} \sigma_{i}} \exp \left(-\frac{1}{2} \frac{\left(\tau_{i}-\hat{\tau}\right)^{2}}{\sigma_{i}^{2}}\right)$$
by imposing the condition:
$$\frac{\partial(-\log \mathcal{L})}{\partial \hat{\tau}}=0$$
We get:
$$\Rightarrow \hat{\tau}=\frac{\tau_{1} / \sigma_{1}^{2}+\tau_{2} / \sigma_{2}^{2}}{1 / \sigma_{1}^{2}+1 / \sigma_{2}^{2}}\quad(1)$$
##\sigma_{\hat{\tau}}## is deduced from the second derivative of ##-\log{\mathcal{L}}## with respect to ##\hat{\tau}##:
$$\frac{1}{\sigma_{\hat{\tau}}^{2}}=\frac{1}{\sigma_{1}^{2}}+\frac{1}{\sigma_{2}^{2}}\quad(2)$$
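
As a minimal sketch of eq##(1)## and eq##(2)##, the combination can be computed as below. The function name `combine_gaussian` and the input numbers are hypothetical, chosen only for illustration; they do not come from any real experiment.

```python
import math


def combine_gaussian(measurements):
    """Inverse-variance weighted mean (eq. 1) and its error (eq. 2).

    measurements: sequence of (tau_i, sigma_i) pairs with sigma_i > 0.
    Returns (tau_hat, sigma_tau_hat).
    """
    measurements = list(measurements)
    # Each measurement is weighted by 1 / sigma_i^2.
    weights = [1.0 / sigma**2 for _, sigma in measurements]
    total_weight = sum(weights)
    tau_hat = sum(w * tau for w, (tau, _) in zip(weights, measurements)) / total_weight
    # Eq. (2): 1 / sigma_hat^2 is the sum of the individual 1 / sigma_i^2.
    sigma_hat = 1.0 / math.sqrt(total_weight)
    return tau_hat, sigma_hat


# Illustrative inputs (assumed): tau_1 = 10 +/- 1 and tau_2 = 12 +/- 2.
tau_hat, sigma_hat = combine_gaussian([(10.0, 1.0), (12.0, 2.0)])
print(tau_hat, sigma_hat)
```

With these inputs the estimate is pulled towards the more precise measurement, and the combined error is smaller than either individual error.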
For both of these measurements, an equivalent number ##\tilde{N}## is defined by:
$$\frac{\sigma_{1}}{\tau_{1}}=\frac{1}{\sqrt{\tilde{N}_{1}}} \quad \frac{\sigma_{2}}{\tau_{2}}=\frac{1}{\sqrt{\tilde{N}_{2}}}\quad(3)$$

- Question 1) Why is this quantity ##\tilde{N}## in eq##(3)## called an "equivalent number"?

- Question 2) Eq##(3)## is said to express the relative error of a measurement through the statistical error due to the number of events. Where does this definition of the relative error of a measurement come from? I mean, how can it be justified?

Then we can write:
$$\hat{\tau}=\frac{\tilde{N}_{1} \tau_{1}+\tilde{N}_{2} \tau_{2}}{\tilde{N}_{1}+\tilde{N}_{2}}$$
Finally, we have:
$$\hat{\tau}=\frac{\tau_{1} /\left(\sigma_{1} / \tau_{1}\right)^{2}+\tau_{2} /\left(\sigma_{2} / \tau_{2}\right)^{2}}{1 /\left(\sigma_{1} / \tau_{1}\right)^{2}+1 /\left(\sigma_{2} / \tau_{2}\right)^{2}}\quad(4)$$

In conclusion, we can say that in one case:

- case (1): the weights are the inverse squares of the errors (eq##(1)##)

and in the other case:

- case (2): the weights are the inverse squares of the relative errors (eq##(4)##)

Question 3) Are these 2 cases, or rather formulations, equivalent? Are they 2 interpretations of the same quantity ##\hat{\tau}##? If not, what is the link between the expressions eq##(1)## and eq##(4)##?

At the same time, on [Wikipedia], it is said that:

Population-based statistics
The populations of sets, which may overlap, can be calculated simply as follows:
$$N_{X \cup Y}=N_{X}+N_{Y}-N_{X \cap Y}$$
The populations of sets, which do not overlap, can be calculated simply as follows:
$$\begin{aligned} X \cap Y=\varnothing \Rightarrow & N_{X \cap Y}=0 \\ \Rightarrow & N_{X \cup Y}=N_{X}+N_{Y} \end{aligned}$$
Standard deviations of non-overlapping ##(X \cap Y=\varnothing)## sub-populations can be aggregated as follows if the size (actual or relative to one another) and means of each are known:
$$\begin{aligned} \mu_{X \cup Y} &=\frac{N_{X} \mu_{X}+N_{Y} \mu_{Y}}{N_{X}+N_{Y}} \\ \sigma_{X \cup Y} &=\sqrt{\frac{N_{X} \sigma_{X}^{2}+N_{Y} \sigma_{Y}^{2}}{N_{X}+N_{Y}}+\frac{N_{X} N_{Y}}{\left(N_{X}+N_{Y}\right)^{2}}\left(\mu_{X}-\mu_{Y}\right)^{2}}\quad(5) \end{aligned}$$
For example, suppose it is known that the average American man has a mean height of 70 inches with a standard deviation of three inches and that the average American woman has a mean height of 65 inches with a standard deviation of two inches. Also assume that the number of men, ##N##, is equal to the number of women. Then the mean and standard deviation of heights of American adults could be calculated as
$$\begin{array}{l} \mu=\frac{N \cdot 70+N \cdot 65}{N+N}=\frac{70+65}{2}=67.5 \\ \sigma=\sqrt{\frac{3^{2}+2^{2}}{2}+\frac{(70-65)^{2}}{2^{2}}}=\sqrt{12.75} \approx 3.57 \end{array}$$

- Question 4) Given that the expectations ##\mu_X## and ##\mu_Y## are not the same, as with my 2 measurements at the beginning (corresponding to ##\tau_1## and ##\tau_2##), can we say that eq##(2)## and eq##(5)## are equivalent? I.e. in the case where I have the 2 measurements from the beginning of my post.
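
The arithmetic of the height example quoted from Wikipedia can be reproduced with a short sketch of eq##(5)##. The function name `aggregate_mean_std` is hypothetical; since only the relative sizes matter, ##N=1## is used for both groups.

```python
import math


def aggregate_mean_std(n_x, mu_x, sigma_x, n_y, mu_y, sigma_y):
    """Eq. (5): mean and standard deviation of two non-overlapping sub-populations."""
    n = n_x + n_y
    mu = (n_x * mu_x + n_y * mu_y) / n
    variance = (
        (n_x * sigma_x**2 + n_y * sigma_y**2) / n
        + n_x * n_y / n**2 * (mu_x - mu_y) ** 2
    )
    return mu, math.sqrt(variance)


# Equal group sizes: men 70 +/- 3 inches, women 65 +/- 2 inches.
mu, sigma = aggregate_mean_std(1, 70.0, 3.0, 1, 65.0, 2.0)
print(mu, round(sigma, 2))  # 67.5 3.57
```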

Any help is welcome

fab13 said:
Two independent experiments have measured ##\tau_{1},\sigma_{1}## and ##\tau_{2},\sigma_{2}##, with ##\sigma_{i}## representing the errors on the measurements.
We choose the maximum likelihood method, with the pdf of each measurement given by:
$$f(\tau, \sigma)=\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{1}{2} \frac{(\tau-\hat{\tau})^{2}}{\sigma^{2}}\right)$$
One has to maximize the likelihood function:
$$\mathcal{L}=\prod_{i=1}^{2} \frac{1}{\sqrt{2 \pi} \sigma_{i}} \exp \left(-\frac{1}{2} \frac{\left(\tau_{i}-\hat{\tau}\right)^{2}}{\sigma_{i}^{2}}\right)$$

That is a likelihood function for two samples, one sample taken from each Gaussian distribution. So it isn't clear how we would compute ##\sigma_1, \sigma_2## if we have only one sample from each Gaussian distribution.

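For both of these measurements, an equivalent number ##\tilde{N}## is defined by:
$$\frac{\sigma_{1}}{\tau_{1}}=\frac{1}{\sqrt{\tilde{N}_{1}}} \quad \frac{\sigma_{2}}{\tau_{2}}=\frac{1}{\sqrt{\tilde{N}_{2}}}\quad(3)$$
- Question 1) Why is this quantity ##\tilde{N}## in eq##(3)## called an "equivalent number"?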

Where did you read about "equivalent number"? - in what text or article?

In conclusion, we can say that in one case:
- case (1): the weights are the inverse squares of the errors (eq(1))
and in the other case:
- case (2): the weights are the inverse squares of the relative errors (eq(4))
Question 3) Are these 2 cases, or rather formulations, equivalent?

No.

Are they 2 interpretations of the same quantity ##\hat{\tau}##?

No.

If not, what is the link between the expressions eq##(1)## and eq##(4)##?

I don't know. The statistic ##\sigma/\tau## is called the "coefficient of variation" (https://en.wikipedia.org/wiki/Coefficient_of_variation). The only place I have seen relations like ##\sigma^2/\tau^2 = z/N^2## is in equations for confidence intervals for that statistic.

At the same time, on [Wikipedia], it is said that:

Population-based statistics
The populations of sets, which may overlap, can be calculated simply as follows:

That article deals with calculating the mean and standard deviation of a whole data set when we know those statistics for subsets of the data. This is a different topic than how to use the sample means and sample standard deviations of subsets of the data to estimate the mean and standard deviation of the population from which the data were drawn.

Of course it is possible that the maximum likelihood or other type of estimator for a parameter uses only the sample mean and standard deviation from an entire set of data instead of using different calculations on various subsets of the data and combining the results. The thing to understand is that the above article is about a calculation whose correct answer is established by the definitions of sample mean and sample standard deviation for an entire data set. By contrast, an article about how to formulate estimators may deal with maximum likelihood estimators, minimum variance estimators, unbiased estimators etc., which have different definitions.
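
The distinction can be checked numerically with a small sketch. Here eq##(5)## is applied to the means and standard deviations of two made-up data sets and compared with the same statistics computed directly on the merged data; eq##(2)## is evaluated on the same inputs for contrast. The data and the function names `aggregate` and `combined_error` are assumptions for illustration only.

```python
import math
import statistics


def aggregate(n_x, mu_x, sd_x, n_y, mu_y, sd_y):
    """Eq. (5): mean and population std of the union of two disjoint data sets."""
    n = n_x + n_y
    mu = (n_x * mu_x + n_y * mu_y) / n
    var = (n_x * sd_x**2 + n_y * sd_y**2) / n + n_x * n_y / n**2 * (mu_x - mu_y) ** 2
    return mu, math.sqrt(var)


def combined_error(sd_x, sd_y):
    """Eq. (2): error on the inverse-variance weighted estimate."""
    return 1.0 / math.sqrt(1.0 / sd_x**2 + 1.0 / sd_y**2)


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 12.0, 14.0]

mu_x, sd_x = statistics.fmean(x), statistics.pstdev(x)
mu_y, sd_y = statistics.fmean(y), statistics.pstdev(y)

from_parts = aggregate(len(x), mu_x, sd_x, len(y), mu_y, sd_y)
direct = (statistics.fmean(x + y), statistics.pstdev(x + y))

print(from_parts, direct)            # the two pairs agree up to rounding
print(combined_error(sd_x, sd_y))    # a different, smaller number
```

Eq##(5)## reproduces the spread of the merged data, which grows when the two means differ. Eq##(2)## instead gives the uncertainty on a single common parameter, which shrinks as measurements are added, so the two formulas answer different questions.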

Are they 2 interpretations of a same quantity ##\hat{\tau}## ?
No.

Could you justify this, please? I think the notion of confidence links the two interpretations, but maybe I am wrong.

What is the point of having these 2 different equations, eq(1) and eq(4)?

Sorry, I meant to say: "I think the notion of confidence level (C.L.) links the two interpretations", and this is also related to what is called the "equivalent number".

fab13 said:

Is it clear that the two equations can produce different results for ##\hat{\tau}##?
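
A quick numerical sketch makes the difference concrete. The function names and input values below are hypothetical, picked only to illustrate eq##(1)## and eq##(4)## side by side.

```python
def tau_hat_inverse_variance(t1, s1, t2, s2):
    """Eq. (1): weights are 1 / sigma_i^2."""
    w1, w2 = 1.0 / s1**2, 1.0 / s2**2
    return (w1 * t1 + w2 * t2) / (w1 + w2)


def tau_hat_equivalent_number(t1, s1, t2, s2):
    """Eq. (4): weights are N_i = (tau_i / sigma_i)^2, from eq. (3)."""
    n1, n2 = (t1 / s1) ** 2, (t2 / s2) ** 2
    return (n1 * t1 + n2 * t2) / (n1 + n2)


# Illustrative inputs (assumed): tau_1 = 10 +/- 1 and tau_2 = 12 +/- 2.
a = tau_hat_inverse_variance(10.0, 1.0, 12.0, 2.0)
b = tau_hat_equivalent_number(10.0, 1.0, 12.0, 2.0)
print(a, b)  # the two estimates differ

# When the two measured values coincide, both formulas return that value.
same_a = tau_hat_inverse_variance(10.0, 1.0, 10.0, 2.0)
same_b = tau_hat_equivalent_number(10.0, 1.0, 10.0, 2.0)
print(same_a, same_b)
```

The weights in eq##(4)## are those of eq##(1)## multiplied by ##\tau_i^2##, so the two sets of weights are proportional, and the estimates identical, only when ##\tau_1^2=\tau_2^2##.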
What is the point of having these 2 different equations, eq(1) and eq(4)?

I've never seen equation 4 before. Where did you see it?

fab13 said:
equivalent number ##\tilde{N}## is defined by:
$$\frac{\sigma_{1}}{\tau_{1}}=\frac{1}{\sqrt{\tilde{N}_{1}}} \quad \frac{\sigma_{2}}{\tau_{2}}=\frac{1}{\sqrt{\tilde{N}_{2}}}\quad(3)$$
This can only be done for Poisson statistics where expected value and variance are the same.

For a Gaussian with mean zero (or negative), you get nonsense.
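
One possible reading of the Poisson remark, offered here as an assumption rather than something the thread confirms: for a count of ##N## events the standard deviation is ##\sqrt{N}##, so the relative error is ##1/\sqrt{N}##, and eq##(3)## inverts that relation to assign a measurement the number of events that would give the same relative error. The helper `equivalent_number` is a hypothetical name.

```python
import math


def equivalent_number(tau, sigma):
    """Invert eq. (3): N = (tau / sigma)^2.

    Only meaningful for tau > 0 and sigma > 0; for tau <= 0 the
    relative error sigma / tau has no sensible interpretation.
    """
    if tau <= 0 or sigma <= 0:
        raise ValueError("equivalent number needs tau > 0 and sigma > 0")
    return (tau / sigma) ** 2


# For a Poisson count n, sigma = sqrt(n), so the equivalent number is n itself.
for n in (25, 100, 10000):
    sigma = math.sqrt(n)
    print(n, sigma / n, equivalent_number(n, sigma))

# A measurement of 10 +/- 1 has the same relative error as a count of 100 events.
print(equivalent_number(10.0, 1.0))
```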

BvU said:
This can only be done for Poisson statistics where expected value and variance are the same.

For a Gaussian with mean zero (or negative), you get nonsense.

I don't understand very well what you mean when you say this can only be done for Poisson statistics. Indeed, if ##\sigma_1=\tau_1## and ##\sigma_2=\tau_2##, then we would have:

$$\frac{\sigma_{1}}{\tau_{1}}=1=\frac{1}{\sqrt{\tilde{N}_{1}}} \quad \frac{\sigma_{2}}{\tau_{2}}=1=\frac{1}{\sqrt{\tilde{N}_{2}}}\quad(3)$$

So ##\tilde{N}_{1}=\tilde{N}_{2}=1##? That makes no sense. What is the meaning of this "equivalent number"? I thought there was a link with confidence interval expressions.

Variance is the square of ##\sigma##, so for Poisson statistics the condition is ##\sigma^2=\tau##, not ##\sigma=\tau##.

So what should I conclude? I would get:

##\dfrac{\sigma^2}{\tau}=\dfrac{\sigma}{\sqrt{\tilde{N}}}## ?

I have difficulty grasping the subtleties of this quantity. While I can understand the notion of relative dispersion given by the ratio ##\dfrac{\sigma}{\tau}##, I understand this notion of "equivalent number" much less.

Any clarification is welcome.

fab13 said:
I don't understand very well what you mean when you say this can only be done for Poisson statistics.
What meaning can you possibly attach to ##\sigma/\tau## when ##\tau## is zero or negative?
Except, of course, to designate ##\sigma/|\tau|## as the relative error.
fab13 said:
Two independent experiments have measured ##\tau_{1},\sigma_{1}## and ##\tau_{2},\sigma_{2}##, with ##\sigma_{i}## representing the errors on the measurements.
Usually, there is a story attached to such a statement:
How does one 'measure' ##\sigma_i## (as opposed to estimating it), and is there a valid reason for them to be different?
Are the ##\tau_i## based on independent sample sets taken from one and the same population?
I seem to remember:

If the ##\tau_i## are averages, the corresponding estimate ##\sigma_m## of the standard deviation of the average is ##\sigma_i/\sqrt{N-1}\approx \sigma_i/\sqrt N## (with ##\sigma_i## the estimate of the standard deviation of the sample), so in that case you could make a case for ##\sigma_m \propto 1/\sqrt{N_i}##.
The relative accuracy of an estimate of the standard deviation is of order ##1/\sqrt N##, where ##N## is the sample size. This puts a severe limit on the significance of differences between the ##\sigma_i##!
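
Both scalings can be explored with a small simulation, sketched here with Python's standard library only. The sample size, ##\sigma##, number of trials and seed are arbitrary assumptions. For Gaussian data the usual large-##N## approximation for the relative scatter of the estimated standard deviation is ##1/\sqrt{2(N-1)}##, which is of the same order as ##1/\sqrt{N}##.

```python
import math
import random
import statistics


def simulate(n, sigma, trials, seed=0):
    """Draw `trials` Gaussian samples of size n and summarise their statistics.

    Returns (scatter of the sample means,
             relative scatter of the sample standard deviations).
    """
    rng = random.Random(seed)
    means, stds = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        stds.append(statistics.stdev(sample))
    scatter_of_mean = statistics.pstdev(means)
    rel_scatter_of_std = statistics.pstdev(stds) / statistics.fmean(stds)
    return scatter_of_mean, rel_scatter_of_std


n, sigma = 25, 2.0
scatter_of_mean, rel_scatter_of_std = simulate(n, sigma, trials=2000)

print(scatter_of_mean, sigma / math.sqrt(n))                    # simulated vs sigma / sqrt(N)
print(rel_scatter_of_std, 1.0 / math.sqrt(2 * (n - 1)))         # simulated vs 1 / sqrt(2(N-1))
```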

## 1. What are equivalent formulations in data set measures?

Equivalent formulations in data set measures refer to two different ways of expressing the same information or data. This could include different units of measurement, different mathematical formulas, or different representations of the data.

## 2. Why is it important to understand 2 equivalent formulations in data set measures?

Understanding 2 equivalent formulations in data set measures is important because it allows for a deeper understanding of the data and how it is being measured. It also allows for comparisons and conversions between different forms of the data.

## 3. How can I identify equivalent formulations in data set measures?

To identify equivalent formulations in data set measures, you can look for patterns or relationships between the different forms of the data. This could include looking at the units of measurement, the mathematical operations used, or the overall trends in the data.

## 4. Can equivalent formulations in data set measures lead to different conclusions?

Formulations that are genuinely equivalent give the same numerical results, but they can still prompt different conclusions. This is because different forms of the data may highlight different aspects or patterns, and may be interpreted differently by different individuals.

## 5. How can I ensure accuracy when working with 2 equivalent formulations in data set measures?

To ensure accuracy when working with 2 equivalent formulations in data set measures, it is important to double check your calculations and conversions. It can also be helpful to consult with other experts or sources to verify your understanding and interpretations of the data.
