Understanding 2 equivalent formulations of both data set measures

Discussion Overview

The discussion revolves around two independent experiments measuring quantities ##\tau_{1},\sigma_{1}## and ##\tau_{2},\sigma_{2}##, focusing on the estimation of a parameter ##\hat{\tau}## and its associated error using maximum likelihood methods. Participants explore the implications of different formulations for estimating ##\hat{\tau}##, specifically comparing the use of inverse error weights and relative error weights.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • Some participants question why the quantity ##\tilde{N}## is referred to as an "equivalent number" in the context of relative errors.
  • There is a discussion about the justification for defining relative error in terms of statistical error due to the number of events.
  • Participants explore whether the two formulations for estimating ##\hat{\tau}## (eq##(1)## and eq##(4)##) are equivalent or represent different interpretations of the same quantity.
  • One participant asserts that the two cases are not equivalent and questions the relationship between the two expressions.
  • Another participant highlights that the likelihood function for two samples is unclear when only one sample per Gaussian distribution is available.
  • There is mention of how population-based statistics differ from the maximum likelihood estimators being discussed.
  • Some participants express uncertainty about the implications of using different equations for estimating ##\hat{\tau}## and their potential to yield different results.

Areas of Agreement / Disagreement

Participants do not reach a consensus on whether the two formulations for estimating ##\hat{\tau}## are equivalent. There are competing views regarding the interpretations and implications of the different equations presented.

Contextual Notes

Participants note that the maximum likelihood method and the definitions of sample mean and standard deviation may lead to different results depending on the context and assumptions made about the data.

fab13
TL;DR
I would like to know whether 2 formulations for the aggregation of 2 measurements are equivalent, in terms of expectation and standard deviation.
Two independent experiments have measured ##\tau_{1},\sigma_{1}## and ##\tau_{2},\sigma_{2}##, with ##\sigma_{i}## representing the errors on the measurements.

From these two measurements, assuming the errors are Gaussian, we want an estimate of ##\tau## and its error (i.e. from a combination of the two measurements).

We choose the maximum likelihood method, with the pdf of a single measurement:
$$
f(\tau, \sigma)=\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{1}{2} \frac{(\tau-\hat{\tau})^{2}}{\sigma^{2}}\right)
$$
One has to maximize the likelihood function:
$$
\mathcal{L}=\prod_{i=1}^{2} \frac{1}{\sqrt{2 \pi} \sigma_{i}} \exp \left(-\frac{1}{2} \frac{\left(\tau_{i}-\hat{\tau}\right)^{2}}{\sigma_{i}^{2}}\right)
$$
taking the following condition:
$$
\frac{\partial(-\log \mathcal{L})}{\partial \hat{\tau}}=0
$$
We get :
$$
\Rightarrow \hat{\tau}=\frac{\tau_{1} / \sigma_{1}^{2}+\tau_{2} / \sigma_{2}^{2}}{1 / \sigma_{1}^{2}+1 / \sigma_{2}^{2}}\quad(1)
$$
##\sigma_{\hat{\tau}}## is deduced from the second derivative of ##\log{\mathcal{L}}##:
$$
\frac{1}{\sigma_{\hat{\tau}}^{2}}=\frac{1}{\sigma_{1}^{2}}+\frac{1}{\sigma_{2}^{2}}\quad(2)
$$
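As a quick numerical illustration of eq##(1)## and eq##(2)##, here is a minimal Python sketch. The function name and the input values are hypothetical and chosen only for illustration; they are not taken from any real experiment.

```python
import math


def combine_inverse_variance(tau1, sigma1, tau2, sigma2):
    """Combine two Gaussian measurements with inverse-variance weights.

    Returns (tau_hat, sigma_hat) following eq (1) and eq (2).
    """
    w1 = 1.0 / sigma1**2  # weight of measurement 1
    w2 = 1.0 / sigma2**2  # weight of measurement 2
    tau_hat = (w1 * tau1 + w2 * tau2) / (w1 + w2)  # eq (1)
    sigma_hat = 1.0 / math.sqrt(w1 + w2)           # eq (2)
    return tau_hat, sigma_hat


# Hypothetical inputs: tau1 = 10 +/- 1 and tau2 = 12 +/- 2.
tau_hat, sigma_hat = combine_inverse_variance(10.0, 1.0, 12.0, 2.0)
print(tau_hat, sigma_hat)
```

With these inputs the combined value lies closer to the more precise measurement, and the combined error is smaller than either individual error.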
For both measurements, an equivalent number ##\tilde{N}## is defined by:
$$
\frac{\sigma_{1}}{\tau_{1}}=\frac{1}{\sqrt{\tilde{N}_{1}}} \quad \frac{\sigma_{2}}{\tau_{2}}=\frac{1}{\sqrt{\tilde{N}_{2}}}\quad(3)
$$

- Question 1) Why is this quantity ##\tilde{N}## called an "equivalent number" in eq##(3)##?

- Question 2) Eq##(3)## is described as the relative error of the measurement, expressed through the statistical error due to the number of events. Where does this definition of the relative error of a measurement come from? I mean, how can it be justified?


Then we can write:
$$
\hat{\tau}=\frac{\tilde{N}_{1} \tau_{1}+\tilde{N}_{2} \tau_{2}}{\tilde{N}_{1}+\tilde{N}_{2}}
$$
Finally, we have :
$$
\hat{\tau}=\frac{\tau_{1} /\left(\sigma_{1} / \tau_{1}\right)^{2}+\tau_{2} /\left(\sigma_{2} / \tau_{2}\right)^{2}}{1 /\left(\sigma_{1} / \tau_{1}\right)^{2}+1 /\left(\sigma_{2} / \tau_{2}\right)^{2}}\quad(4)
$$

In conclusion, we can say that the measurements are, in one case:

- case (1): weighted by the square of the inverse error (eq##(1)##)

and in the other case:

- case (2): weighted by the square of the inverse relative error (eq##(4)##)

Question 3) Are these 2 cases, or rather formulations, equivalent? Are they 2 interpretations of the same quantity ##\hat{\tau}##? If not, what is the link between the expressions eq##(1)## and eq##(4)##?

At the same time, [Wikipedia] says:

Population-based statistics
The populations of sets, which may overlap, can be calculated simply as follows:
$$
N_{X \cup Y}=N_{X}+N_{Y}-N_{X \cap Y}
$$
The populations of sets, which do not overlap, can be calculated simply as follows:
$$
\begin{aligned}
X \cap Y=\varnothing \Rightarrow & N_{X \cap Y}=0 \\
\Rightarrow & N_{X \cup Y}=N_{X}+N_{Y}
\end{aligned}
$$
Standard deviations of non-overlapping ##(X \cap Y=\varnothing)## sub-populations can be aggregated as follows if the size (actual or relative to one another) and means of each are known:
$$
\begin{aligned}
\mu_{X \cup Y} &=\frac{N_{X} \mu_{X}+N_{Y} \mu_{Y}}{N_{X}+N_{Y}} \\
\sigma_{X \cup Y} &=\sqrt{\frac{N_{X} \sigma_{X}^{2}+N_{Y} \sigma_{Y}^{2}}{N_{X}+N_{Y}}+\frac{N_{X} N_{Y}}{\left(N_{X}+N_{Y}\right)^{2}}\left(\mu_{X}-\mu_{Y}\right)^{2}}\quad(5)
\end{aligned}
$$
For example, suppose it is known that the average American man has a mean height of 70 inches with a standard deviation of three inches and that the average American woman has a mean height of 65 inches with a standard deviation of two inches. Also assume that the number of men, ##N##, is equal to the number of women. Then the mean and standard deviation of heights of American adults could be calculated as
$$
\begin{array}{l}
\mu=\frac{N \cdot 70+N \cdot 65}{N+N}=\frac{70+65}{2}=67.5 \\
\sigma=\sqrt{\frac{3^{2}+2^{2}}{2}+\frac{(70-65)^{2}}{2^{2}}}=\sqrt{12.75} \approx 3.57
\end{array}
$$

- Question 4) Given that the expectations ##\mu_X## and ##\mu_Y## are not the same, as with my 2 measurements (corresponding to ##\tau_1## and ##\tau_2##), can we say that eq##(2)## and eq##(5)## are equivalent, i.e. in the case of the 2 measurements described at the beginning of my post?

Any help is welcome
 
fab13 said:
Two independent experiments have measured ##\tau_{1},\sigma_{1}## and ##\tau_{2},\sigma_{2}##, with ##\sigma_{i}## representing the errors on the measurements.
We choose the maximum likelihood method, with the pdf of a single measurement:
$$
f(\tau, \sigma)=\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{1}{2} \frac{(\tau-\hat{\tau})^{2}}{\sigma^{2}}\right)
$$
One has to maximize the likelihood function:
$$
\mathcal{L}=\prod_{i=1}^{2} \frac{1}{\sqrt{2 \pi} \sigma_{i}} \exp \left(-\frac{1}{2} \frac{\left(\tau_{i}-\hat{\tau}\right)^{2}}{\sigma_{i}^{2}}\right)
$$

That is a likelihood function for two samples, one sample taken from each Gaussian distribution. So it isn't clear how we would compute ##\sigma_1, \sigma_2## if we have only one sample from each Gaussian distribution.

For both measurements, an equivalent number ##\tilde{N}## is defined by:
$$
\frac{\sigma_{1}}{\tau_{1}}=\frac{1}{\sqrt{\tilde{N}_{1}}} \quad \frac{\sigma_{2}}{\tau_{2}}=\frac{1}{\sqrt{\tilde{N}_{2}}}\quad(3)
$$
- Question 1) Why is this quantity ##\tilde{N}## called an "equivalent number" in eq##(3)##?

Where did you read about "equivalent number"? - in what text or article?

In conclusion, we can say that the measurements are, in one case:
- case (1): weighted by the square of the inverse error (eq(1))
and in the other case:
- case (2): weighted by the square of the inverse relative error (eq(4))
Question 3) Are these 2 cases, or rather formulations, equivalent?

No.

Are they 2 interpretations of a same quantity ##\hat{\tau}## ?

No.

If not, what is the link between the expressions eq##(1)## and eq##(4)##?

I don't know. The statistic ##\sigma/\tau## is called the "coefficient of variation". https://en.wikipedia.org/wiki/Coefficient_of_variation The only place I have seen relations like ##\sigma^2/\tau^2 = z/N^2## is in equations for confidence intervals for the statistic.

At the same time, [Wikipedia] says:

Population-based statistics
The populations of sets, which may overlap, can be calculated simply as follows:

That article deals with calculating the sample mean and sample standard deviation of a sample when we know those statistics for subsets of the numerical data. This is a different topic than how to use the sample means and sample standard deviations of subsets of the data to estimate the population mean and population standard deviation.

Of course it is possible that the maximum likelihood or other type of estimator for a parameter uses only the sample mean and standard deviation from an entire set of data, instead of using different calculations on various subsets of the data and combining the results. The thing to understand is that the above article is about a calculation whose correct answer is established by the definitions of sample mean and sample standard deviation for an entire data set. By contrast, an article about how to formulate estimators may deal with maximum likelihood estimators, minimum variance estimators, unbiased estimators etc., which have different definitions.
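To make the distinction concrete, here is a minimal Python sketch that evaluates eq(5) on the Wikipedia height example and, for contrast, eq(2) on the same two standard deviations. The function names are hypothetical, and feeding 3 and 2 into eq(2) is purely illustrative: population spreads are not measurement errors.

```python
import math


def pooled_mean_std(n_x, mu_x, sigma_x, n_y, mu_y, sigma_y):
    """Mean and standard deviation of the union of two non-overlapping
    sub-populations, following eq (5)."""
    n = n_x + n_y
    mu = (n_x * mu_x + n_y * mu_y) / n
    var = (n_x * sigma_x**2 + n_y * sigma_y**2) / n \
        + n_x * n_y / n**2 * (mu_x - mu_y) ** 2
    return mu, math.sqrt(var)


def combined_error_eq2(sigma_1, sigma_2):
    """Error on the maximum likelihood estimate, following eq (2)."""
    return 1.0 / math.sqrt(1.0 / sigma_1**2 + 1.0 / sigma_2**2)


# Height example: equal group sizes, so only the ratio N_X / N_Y matters.
mu, sigma_pooled = pooled_mean_std(1, 70.0, 3.0, 1, 65.0, 2.0)
sigma_mle = combined_error_eq2(3.0, 2.0)
print(mu, sigma_pooled, sigma_mle)
```

The pooled standard deviation grows with the difference between the two means and is larger than either input, while eq(2) does not depend on the means at all and is always smaller than either input. The two formulas answer different questions.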
 
Are they 2 interpretations of the same quantity ##\hat{\tau}##?
No.

Could you please justify this? I think there is a notion of confidence linking the two interpretations, but maybe I am wrong.

What is the point of having these 2 different equations, eq(1) and eq(4)?
 
Sorry, I meant to say: "I think there is a notion of confidence level (C.L.) linking the two interpretations", and this is also related to what is called the "equivalent number".
 
fab13 said:
Could you justify please ?

Is it clear that the two equations can produce different results for ##\hat{\tau}##?
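A short numerical check makes this visible. The following is a minimal Python sketch with hypothetical input values and function names, evaluating eq(1) and eq(4) on the same pair of measurements:

```python
def tau_hat_eq1(tau1, sigma1, tau2, sigma2):
    """Eq (1): weights are 1 / sigma_i^2."""
    w1, w2 = 1.0 / sigma1**2, 1.0 / sigma2**2
    return (w1 * tau1 + w2 * tau2) / (w1 + w2)


def tau_hat_eq4(tau1, sigma1, tau2, sigma2):
    """Eq (4): weights are N_i = (tau_i / sigma_i)^2, from eq (3)."""
    n1, n2 = (tau1 / sigma1) ** 2, (tau2 / sigma2) ** 2
    return (n1 * tau1 + n2 * tau2) / (n1 + n2)


# Hypothetical inputs: tau1 = 10 +/- 1 and tau2 = 12 +/- 2.
a = tau_hat_eq1(10.0, 1.0, 12.0, 2.0)
b = tau_hat_eq4(10.0, 1.0, 12.0, 2.0)
print(a, b)  # the two estimates differ

# When both measurements give the same central value, the weights no
# longer matter and both formulas return that value.
c = tau_hat_eq1(10.0, 1.0, 10.0, 2.0)
d = tau_hat_eq4(10.0, 1.0, 10.0, 2.0)
print(c, d)
```

Because the weights in eq(4) depend on the measured values themselves, the larger measurement gets extra weight compared with eq(1), so the two estimates agree only in special cases such as ##\tau_1 = \tau_2##.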
What is the point of having these 2 different equations, eq(1) and eq(4)?

I've never seen equation 4 before. Where did you see it?
 
fab13 said:
equivalent number ##\tilde{N}## is defined by:
$$
\frac{\sigma_{1}}{\tau_{1}}=\frac{1}{\sqrt{\tilde{N}_{1}}} \quad \frac{\sigma_{2}}{\tau_{2}}=\frac{1}{\sqrt{\tilde{N}_{2}}}\quad(3)
$$
This can only be done for Poisson statistics where expected value and variance are the same.

For a gaussian with mean zero (or negative :-p) you get nonsense.
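One way to read this remark: in a counting experiment with ##N## Poisson-distributed events, the mean and the variance are both ##N##, so ##\sigma/\tau = \sqrt{N}/N = 1/\sqrt{N}## and eq(3) simply returns the number of events. A minimal Python sketch under that assumption (the function name and the count of 400 are hypothetical):

```python
import math


def equivalent_number(tau, sigma):
    """Invert eq (3): N_tilde = (tau / sigma)^2.

    Only meaningful for a strictly positive value and error.
    """
    if tau <= 0 or sigma <= 0:
        raise ValueError("equivalent number needs tau > 0 and sigma > 0")
    return (tau / sigma) ** 2


# Poisson counting: measured value N with error sqrt(N).
n_counts = 400
n_tilde = equivalent_number(n_counts, math.sqrt(n_counts))
print(n_tilde)  # equals the number of counts

# A Gaussian measurement centred on zero has no sensible equivalent number.
try:
    equivalent_number(0.0, 1.0)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

For a general Gaussian measurement, ##\tilde{N}## is then just the number of Poisson events that would give the same relative error, which is presumably why it is called "equivalent".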
 
BvU said:
This can only be done for Poisson statistics where expected value and variance are the same.

For a gaussian with mean zero (or negative :-p) you get nonsense.

I don't quite understand what you mean when you say this can only be done for Poisson statistics. Indeed, if ##\sigma_1=\tau_1## and ##\sigma_2=\tau_2##, then we would have:

$$\frac{\sigma_{1}}{\tau_{1}}=1=\frac{1}{\sqrt{\tilde{N}_{1}}} \quad \frac{\sigma_{2}}{\tau_{2}}=1=\frac{1}{\sqrt{\tilde{N}_{2}}}\quad(3)$$

So ##\tilde{N}_{1}=\tilde{N}_{2}=1##? That makes no sense. What is the meaning of this "equivalent number"? I thought there was a link with confidence interval expressions.
 
Variance is the square of ##\sigma##
 
So what should I conclude? Would I get:

##\dfrac{\sigma^2}{\tau}=\dfrac{\sigma}{\sqrt{\tilde{N}}}## ?

I have difficulty grasping the subtleties of this quantity. While I can understand the notion of relative dispersion given by the ratio ##\dfrac{\sigma}{\tau}##, I do not understand this notion of "equivalent number".

Any clarification is welcome.
 
fab13 said:
I don't understand very well when you say this can only be done for Poisson statistic.
What meaning can you possibly attach to ##\ \sigma/\tau\ ## when ##\tau ## is zero or negative ?
Except, of course, to designate ##\ \sigma/|\tau| \ ## as the relative error.
fab13 said:
Two independent experiments have measured ##\tau_{1},\sigma_{1}## and ##\tau_{2},\sigma_{2}##, with ##\sigma_{i}## representing the errors on the measurements.
Usually, there is a story attached to such a statement:
How does one 'measure' ##\sigma_i## (as opposed to estimating it), and is there a valid reason for them to be different?
Are the ##\tau_i## based on independent sample sets taken from one and the same population?
I seem to remember:

If the ##\tau_i## are averages, the corresponding estimate ##\sigma_m## of the error on the average is ##\sigma_i/\sqrt{N-1}\approx \sigma_i/\sqrt N## (with ##\sigma_i## the estimate of the standard deviation of the sample), so in that case you could make a case for ##\sigma_m \propto 1/\sqrt{N_i}##.
The relative accuracy of an estimate of the standard deviation is approximately ##1/\sqrt N##, where ##N## is the sample size. This puts a severe limit on the significance of differences between the ##\sigma_i##!
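The ##1/\sqrt{N}## scaling of the error on an average can be sketched in a few lines of Python. This assumes a simulated Gaussian sample with made-up parameters (true mean 10, true standard deviation 2, 100 draws) and uses only the standard library:

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

n = 100
# Simulated measurements; the true mean and spread are assumptions.
sample = [random.gauss(10.0, 2.0) for _ in range(n)]

mean = statistics.mean(sample)
s = statistics.stdev(sample)   # sample standard deviation (N - 1 denominator)
sem = s / math.sqrt(n)         # estimated error on the mean

print(mean, s, sem)
```

Here `sem` plays the role of ##\sigma_m##: quadrupling the sample size would roughly halve it, which is the sense in which an error on an average carries information about a number of events.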
 
