# Calculating the variance of integrated Poisson noise on a defined quantity

• A
• fab13
In summary, the conversation discusses the estimation of the variance expression of Poisson noise on a quantity that is calculated through a sum of terms. The formula for the standard deviation of the noise includes a factor of (2l+1) which leads to discrepancies in the results. Various expressions are proposed and tested, but a definitive justification for the correct formula is not yet provided. Suggestions are made to use a symbolic algebra program to ensure reliable results.

#### fab13

TL;DR Summary
I want to estimate the variance expression of Poisson Noise of the quantity ##\sum_{\ell=1}^{N} \sum_{m=-\ell}^{\ell} a_{\ell m}^{2}##
It is in cosmology context but actually, but it is also a mathematics/statistical issue.

From spherical harmonics with Legendre deccomposition, I have the following definition of
the standard deviation of a ##C_\ell## noised with a Poisson Noise ##N_p## :

##
\begin{equation}
\end{equation}
##

Now I consider the quantity : ##\sum_{\ell=1}^{N} \sum_{m=-\ell}^{\ell} a_{\ell m}^{2}##

I want to estimate the variance expression of Poisson Noise of this qantity.

For that, I take the definition of ##a_{lm}## following a normal distribution with mean equal to zero and take also the definition of a ##C_\ell=\langle a_{lm}^2 \rangle=\dfrac{1}{2\ell+1}\sum_{m=-\ell}^{\ell}\,a_{\ell m}^2 = \text{Var}(a_{lm})##.

I use ##\stackrel{d}{=}## to denote equality in distribution :

##
\begin{align}
Z&\equiv \sum_{\ell=\ell_{min}}^{\ell_{max}} \sum_{m=-\ell}^\ell a_{\ell,m}^2 \\[6pt]
&= \sum_{\ell=\ell_{min}}^{\ell_{max}} \sum_{m=-\ell}^\ell C_\ell \cdot \bigg( \frac{a_{\ell,m}}{\sqrt{C_\ell}} \bigg)^2 \\[6pt]
&\stackrel{d}{=}\sum_{\ell=\ell_{min}}^{\ell_{max}} \sum_{m=-\ell}^\ell C_\ell \cdot \chi^{2}(1) \\[6pt]
&= \sum_{\ell=\ell_{min}}^{\ell_{max}} C_\ell \sum_{m=-\ell}^\ell \chi^{2}(1) \\[6pt]
&= \sum_{\ell=\ell_{min}}^{\ell_{max}} C_\ell \cdot \chi^{2}(2 \ell + 1). \\[6pt]
\end{align}
##

So Finally we have :

##
\begin{aligned}
\sum_{\ell=1}^{N} \sum_{m=-\ell}^{\ell} a_{\ell m}^{2} & \stackrel{d}{=} \sum_{\ell=1}^{N}\,C_\ell \chi^{2}\left(2\ell+1)\right)
\end{aligned}
##

To compute the Poisson Noise of each ##a_{\ell m}^2##, a colleague suggests me to take only the quantity :

and to do the summation to get the variance of Poisson Noise (to make the link with formula (1) and be consistent with it) :

I wrote above ##\text{Var}(N_{p,int})##, make caution, this is just to express the "integrated Poisson variance" over ##\ell## in the quantity ##\sum_{\ell=1}^{N} \sum_{m=-\ell}^{\ell} a_{\ell m}^{2}##.

Unfortunately, when I do numerical computation with this variance formula ##(3)##, I don't get the same variance than with another valid method.

Correct results are got by introducing a factor ##\sqrt{2\ell+1}## into formula (3).

But, under the condition this factor is right, I don't know how to justify it, it is just fined-tuned from my part for the moment and it is not rigorous I admit.

QUESTION :

Is foruma ##(2)## that expresses the variance of Shot Noise on the quantity ##\sum_{\ell=\ell_{min}}^{\ell_{max}} \sum_{m=-\ell}^\ell a_{\ell,m}^2## correct ?

If yes, how could integrate it correctly by summing it on multipole ##\ell## and to remain consistent with the standard deviation formula ##(1)## ? (I talk about the pre-factor ##\sqrt{\frac{2}{(2 \ell+1) f_{sky}}}## that disturbs me in the expression of integrated Poisson Noise)

Last edited:
As a general comment, there is a lot of terms in here that you haven't defined. I know you have a lot going on from reading your post, but is there a paper or thesis that summarizes the measurement and noise sources? Just send us a link or even a citation and we can take a stab at it. It's really hard to make conjectures about expressions when we don't know what they mean.

Your equation 1 doesn't make a lot of sense because it looks like the two noise terms are adding linearly, not in quadrature. Was there a typo?

I'm confused how you get ##\mathrm{Var}(N_{p,int})=\sum_l \frac{2}{f_{sky}N_p^2}## from your equation 1. I get ##\mathrm{Var}(N_{p,int})=\sum_l \frac{2 N_p^2}{f_{sky}}##. It seems very weird to me because I would think that the variance should grow, not shrink, as your poissonian noise term (##N_p##) grows. Did you perhaps normalize by the average squared (##N_p^4##)?

@Twigg, oh , really sorry, I made a mistake regarding the Poisson noise : actually, this is ##N_{p}=\dfrac{1}{n_{g}}## with ##n_{g}## the density of galaxies.

So, my initial formula (3) should be rather : ##\text{Var}(N_{p,int})=\sum_\ell \dfrac{2}{f_{sky}}N_p^2##

##=\sum_\ell \dfrac{2}{f_{sky}}\,\dfrac{1}{n_g^2}##

The lower threre are galaxies, the larger the Poisson noise is.

Ok! I think that makes more sense! The physics of it is still lost on me but now I think the statistics makes more sense.

Equation 1 still seems like a typo to me. I'm going to assume what you mean to write was $$\sigma^2(C_l)(l) = \frac{2}{(2l+1)f_{sky}} \left( C_l^2 + N_p^2 \right)$$

Given that modification, then I think your introduction of ##\sqrt{2l+1}## in your equation 3 makes sense. Here's what I derived: $$Z = \sum_l \sum_m a_{l,m}^2 = \sum_l (2l+1)C_l$$
Now if we ignore the ##C_l^2## term in equation 1, and just calculate the variance of Z due to poissonian noise (let's call that ##\sigma (Z_p)## to distinguish it from the total variance of Z), then we get:
$$\mathrm{Var} (Z_p) = \sum_l (2l+1)^2 \times \left( \frac{2}{(2l+1)f_{sky}} \right) N_p^2 = \frac{2}{f_{sky}} \sum_l (2l+1) N_p^2$$

@Twigg . Thanks for your support.

##\mathrm{Var} (Z_p) = \sum_l (2l+1)^2 \times \left( \frac{2}{(2l+1)f_{sky}} \right) N_p^2 = \frac{2}{f_{sky}} \sum_l (2l+1) N_p^2##

and it gives too much high values for ##\mathrm{Var} (Z_p)##.

However, with this formula :

##\mathrm{Var} (Z_p) = \frac{2}{f_{sky}} \sum_l \sqrt{(2l+1)} N_p^2##

I get the expected results. But this factor is just fine-tuned, I don't know, if it is the right expression, how to justify it ?

Best regards

Ah ok, I see what you mean. I thought you meant putting the square rooted quantity into the standard deviation, not the variance. Now I understand.

Sorry, I don't think we'll be able to help you further. The formulas you've posted predicts the variance with a factor of ##(2l+1)##, as you and I have both demonstrated. If that's not correct, then I think there's an error in the formulas you wrote down at the start. We'd need a lot more context in order to spot such an error.

For these kinds of things where rogue factors of ##2l+1## can make or break your calculation, I recommend using some symbolic algebra program (for example, mathematica). If you limit your inputs to formulas you are 100% sure have the correct prefactors, and maintain a good library of useful functions (like error propgation), you'll get reliable results.

@Twigg

The formula : ##\sigma^2(C_l)(l) = \frac{2}{(2l+1)f_{sky}} \left( C_l^2 + N_p^2 \right)##

comes from the equation (137) on the following paper : https://arxiv.org/pdf/1910.09273.pdf

Hoping this will help

Here additionnal informations about my first method where I am interested in the ratio :

##\dfrac{\operatorname{Var}\left(\boldsymbol{B}_{s p, i n t, i}\right)}{(\sum_\ell C_\ell)^2}##

Indeed, taking the square of (1) for the case ##N_{i j}^{G G} \equiv N_{s p}## make appear cross-terms :

##
\operatorname{Var}\left(C_{\ell}\right)=\left(\sigma_{C, i j}^{G G}(\ell)\right)^{2}=\frac{2}{(2 \ell+1) f_{\mathrm{sky}} \Delta \ell}\left[C_{\ell}^{2}+2 C_{\ell} N_{s p}+N_{s p}^{2}\right]

##

and I don't know how to make the connection with ##\operatorname{Var}\left(B_{s p}\right)##.
Maybe I could make the link with the expression :

##
\operatorname{Var}[X+Y]=\operatorname{Var}[X]+\operatorname{Var}[Y]+2 \cdot \operatorname{Cov}[X, Y]
##

and do the link with :

##
\operatorname{Var}\left(C_{\ell}+N_{s p}\right)=\operatorname{Var}\left(C_{\ell}\right)+2 \operatorname{Cov}\left(C_{\ell}, N_{s p}\right)+\operatorname{Var}\left(N_{s p}\right)
##

But how to prove that :

##
2 \operatorname{Cov}\left(C_{\ell}, N_{s p}\right)=2 C_{\ell} N_{s p}
##

?? If this is exact, could I consider ##2 \operatorname{Cov}\left(C_{\ell}, N_{s p}\right)=0## ?

But it remains the term ##\operatorname{Var}\left(C_{\ell}\right) \ldots##
If I could prove it, this way, Maybe we could infer the expression of variance for integrated Shot noise over ##\ell## (noted ##B_{s p, \text { int }}## and ##B_{p h, \text { int }}## ) and with subscript ##i## referencing the ##i##-th bin of redshift :

##
\begin{aligned}
\operatorname{Var}\left(\boldsymbol{B}_{s p, i n t, i}\right) &=\operatorname{Var}\left(\sum_{j=1}^{N} \boldsymbol{B}_{s p, i}\left(\ell_{j}\right)\right)=\sum_{j=1}^{N} \operatorname{Var}\left(\boldsymbol{B}_{s p, i}\right) \\
&=\sum_{\ell=\ell_{\min }}^{\ell_{\max }} \frac{2}{(2 \ell+1) \Delta \ell f_{\mathrm{sky}}} \frac{1}{\bar{n}_{s p, i}^{2}} \\
\operatorname{Var}\left(\boldsymbol{B}_{p h, i n t, i}\right) &=\operatorname{Var}\left(\sum_{j=1}^{N} \boldsymbol{B}_{p h, i}\left(\ell_{j}\right)\right)=\sum_{j=1}^{N} \operatorname{Var}\left(\boldsymbol{B}_{p h, i}\right) \\
&=\sum_{\ell=\ell_{\min }}^{\ell_{\max }} \frac{2}{(2 \ell+1) \Delta \ell f_{\mathrm{sky}}} \frac{1}{\bar{n}_{p h, i}^{2}}
\end{aligned}
##

Now, Given the fact that I am interested in the ratio (you can remark the different wieghting with ##(2\ell+1)## on ##C_\ell## :

##\dfrac{\operatorname{Var}\left(\boldsymbol{B}_{s p, i n t, i}\right)}{(\sum_\ell (2\ell+1)C_\ell)^2}##

I was hoping that results being better than with the first method but with the formula of @Twigg , this is not the case.

I don't know what to do, any idea is welcome.

That equation (137) says something different:

$$\sigma^2 (C_l)(l) = \frac{1}{\sqrt{f_{sky} \Delta l}} \left[ C_l + N_p \right]$$

If your bin width ##\Delta l## is something like ##(2l+1)##, then this will give you the correct factor of ##\sqrt{2l+1}##, but it will mean your denominator will have a ##\sqrt{f_{sky}}## in it instead of ##f_{sky}##. If that number is close to 1, it might not matter.

is there any chance that ##\mathrm{Var}(Z_p) = \sqrt{\frac{2}{f_{sky}}} \sum_l \sqrt{2l+1} N_p## fits the data well?

P.S. I was mistaken in the modification I suggested for your equation 1. The notation made me think it was a standard deviation, but it was supposed to be a variance.

you can look at the reference formula (there is a typo in formula (137) : it misses the factor ##\dfrac{1}{2\ell+1}).##

You can get rid of ##\Delta\ell## factor, it is just the bins width of power spectrum in code and I take it as ##\Delta\ell =1## in my code.

and my formula (1) on the first post refers to a standard deviation (make caution, ##C_\ell## is a variance).

Twigg said:
That equation (137) says something different:

$$\sigma^2 (C_l)(l) = \frac{1}{\sqrt{f_{sky} \Delta l}} \left[ C_l + N_p \right]$$

If your bin width ##\Delta l## is something like ##(2l+1)##, then this will give you the correct factor of ##\sqrt{2l+1}##, but it will mean your denominator will have a ##\sqrt{f_{sky}}## in it instead of ##f_{sky}##. If that number is close to 1, it might not matter.

is there any chance that ##\mathrm{Var}(Z_p) = \sqrt{\frac{2}{f_{sky}}} \sum_l \sqrt{2l+1} N_p## fits the data well?

P.S. I was mistaken in the modification I suggested for your equation 1. The notation made me think it was a standard deviation, but it was supposed to be a variance.
Thea equation (137) refers to a standard deviation, not a variance. Could you see what's wrong for the expression of Variance of quantity : ##Z = \sum_l \sum_m a_{l,m}^2 = \sum_l (2l+1)C_l##

Regards

fab13 said:
Thea equation (137) refers to a standard deviation, not a variance.
Doesn't the paper explicitly say that eqn 137 defines a covariance?

Sorry, at this point I don't think I can follow the notation, and I don't think I'll be much help.

fab13 said:
Summary:: I want to estimate the variance expression of Poisson Noise of the quantity ##\sum_{\ell=1}^{N} \sum_{m=-\ell}^{\ell} a_{\ell m}^{2}##

It is in cosmology context but actually, but it is also a mathematics/statistical issue.

From spherical harmonics with Legendre deccomposition, I have the following definition of
the standard deviation of a ##C_\ell## noised with a Poisson Noise ##N_p## :

##
\begin{equation}
\end{equation}
##

Now I consider the quantity : ##\sum_{\ell=1}^{N} \sum_{m=-\ell}^{\ell} a_{\ell m}^{2}##

I want to estimate the variance expression of Poisson Noise of this qantity.

For that, I take the definition of ##a_{lm}## following a normal distribution with mean equal to zero and take also the definition of a ##C_\ell=\langle a_{lm}^2 \rangle=\dfrac{1}{2\ell+1}\sum_{m=-\ell}^{\ell}\,a_{\ell m}^2 = \text{Var}(a_{lm})##.

I use ##\stackrel{d}{=}## to denote equality in distribution :

##
\begin{align}
Z&\equiv \sum_{\ell=\ell_{min}}^{\ell_{max}} \sum_{m=-\ell}^\ell a_{\ell,m}^2 \\[6pt]
&= \sum_{\ell=\ell_{min}}^{\ell_{max}} \sum_{m=-\ell}^\ell C_\ell \cdot \bigg( \frac{a_{\ell,m}}{\sqrt{C_\ell}} \bigg)^2 \\[6pt]
&\stackrel{d}{=}\sum_{\ell=\ell_{min}}^{\ell_{max}} \sum_{m=-\ell}^\ell C_\ell \cdot \chi^{2}(1) \\[6pt]
&= \sum_{\ell=\ell_{min}}^{\ell_{max}} C_\ell \sum_{m=-\ell}^\ell \chi^{2}(1) \\[6pt]
&= \sum_{\ell=\ell_{min}}^{\ell_{max}} C_\ell \cdot \chi^{2}(2 \ell + 1). \\[6pt]
\end{align}
##

So Finally we have :

##
\begin{aligned}
\sum_{\ell=1}^{N} \sum_{m=-\ell}^{\ell} a_{\ell m}^{2} & \stackrel{d}{=} \sum_{\ell=1}^{N}\,C_\ell \chi^{2}\left(2\ell+1)\right)
\end{aligned}
##

To compute the Poisson Noise of each ##a_{\ell m}^2##, a colleague suggests me to take only the quantity :

and to do the summation to get the variance of Poisson Noise (to make the link with formula (1) and be consistent with it) :

I wrote above ##\text{Var}(N_{p,int})##, make caution, this is just to express the "integrated Poisson variance" over ##\ell## in the quantity ##\sum_{\ell=1}^{N} \sum_{m=-\ell}^{\ell} a_{\ell m}^{2}##.

Unfortunately, when I do numerical computation with this variance formula ##(3)##, I don't get the same variance than with another valid method.

Correct results are got by introducing a factor ##\sqrt{2\ell+1}## into formula (3).

But, under the condition this factor is right, I don't know how to justify it, it is just fined-tuned from my part for the moment and it is not rigorous I admit.

QUESTION :

Is foruma ##(2)## that expresses the variance of Shot Noise on the quantity ##\sum_{\ell=\ell_{min}}^{\ell_{max}} \sum_{m=-\ell}^\ell a_{\ell,m}^2## correct ?

If yes, how could integrate it correctly by summing it on multipole ##\ell## and to remain consistent with the standard deviation formula ##(1)## ? (I talk about the pre-factor ##\sqrt{\frac{2}{(2 \ell+1) f_{sky}}}## that disturbs me in the expression of integrated Poisson Noise)
I think what you're asking for is the relationship between the sample variance of the result and the actual variance. Is that right?

If so, then the "samples" at play are the ##a_{\ell m}## values, of which there are ##2\ell+1## independent values for each ##\ell## for the full sky (which has to be modified by ##f_{sky}## as shown in your equation for the standard deviation of ##C_{\ell}##). You should be able to just use standard sample variance calculations to get the result from there.

Post 1 of 6: Introduction to ##a_{\ell m}## and ##C_\ell##

I see that this thread was moved to Stats from Cosmology, but the original cosmology context is important. I don't know if I can answer your question fully @fab13, but I may be able to shed some light on the formalism incl. the assumptions and limitations behind it. (@Twigg was also wondering about the physical significance of the various quantities). The treatment of things in terms of ##C_\ell## works for describing a continuous signal distributed over the entire celestial sphere that you can consider to be a statistically-isotropic Gaussian random field, such as e.g. the Cosmic Microwave Background (CMB). You've inherited this formalism from CMB data analysis, but in your OP you talk about your application being (I think) discrete galaxy counts over the sphere and their Poisson statistics. Once you finish reading what I have to say, you'll have to evaluate how applicable this math is to that context.

If you have some continuous astrophysical signal ##S(\mathbf{n})## as a function of direction ##\mathbf{n} = (\theta, \phi)## over the sphere, you can decompose it using the spherical harmonics ##Y_{\ell m}(\mathbf{n})##, which are a set of orthonormal basis functions (or "modes"):
$$\begin{equation}S(\mathbf{n}) = \sum_{\ell=0}^\infty a_{\ell m}Y_{\ell m}\end{equation}$$
and by orthonormality:
$$\begin{equation} a_{\ell m} = \int S(\mathbf{n}) Y^*_{\ell m}\,d\Omega\end{equation}$$
Where the integral is over the entire sphere (##d\Omega## being an element of solid angle), and ##^*## denotes the complex conjugate. This brings up the first key point I wanted to raise. In general, the coefficients of the expansion, ##a_{\ell m}## are complex-valued, because the spherical harmonics themselves (##Y_{\ell m}##) are complex-valued. The ##\theta## dependence of ##Y_{\ell m}## may be in terms of Legendre polynomials, but the there is also a factor of ##e^{im\phi}##. So we can't really go around writing things like "##a^2_{\ell m}##", as you have done. But we can talk about ##|a_{\ell m}|^2 = a_{\ell m}a^*_{\ell m}##. The exception is ##a_{\ell 0}##, which will be real-valued of course.

The monopole term (##\ell = 0##) describes a uniform (isotropic) signal over the entire sphere. The dipole term (##\ell = 1##) describes variations over a characteristic scale of 180 deg. (i.e. two hemispheres notably different in signal intensity), with the three ##m## modes representing 3 possible orthogonal orientations for the dipole. Likewise the quadrupole terms (##\ell = 2##) describe the "dumbells" with signal variations separated by 90 degrees. E.g. bright at the poles and dim azimuthally around the equator, or having two "hot" and two "cold" quadrants. Higher-order multipoles (larger values of ##\ell##) describe signal variations on smaller angular scales, explaining why multipole ##\ell## is regarded as somewhat akin to the spatial frequency in a Fourier decomposition. Notably, for the CMB, the anisotropy (except for the dipole) arises from primordial perturbations to the matter distribution in the early Universe, and some theories of primordial perturbations predict that they would be Gaussian-distributed in intensity. Therefore, for ##\ell \geq 2##, the coefficients ##a_{\ell m}## are assumed to be mean-zero Gaussian random variables, with mean and covariance given by:
$$\begin{equation}E[a_{\ell m}] = 0\end{equation}$$
$$\begin{equation}\mathrm{Cov}(a_{\ell m}, a^*_{\ell' m'} ) = \delta_{\ell \ell' m m'}C_\ell^\mathrm{th}\end{equation}$$
We've in fact made two key assumptions above:
1. Statistical Isotropy - The Kronecker delta in the Covariance equation expresses that only the diagonal of the coefficients' covariance matrix is non-zero. I.e. the coefficients are uncorrelated, both between different multipoles ##\ell##, and for different ##m## modes within a single multipole. This is an assertion that our 2D random field is statistically isotropic. Its variance (power) does not vary from place to place in the sky map. I believe this is akin to a 1D random noise signal (like a time series) exhibiting stationarity, and similarly, the Fourier coefficients of such a stationary 1D signal would be uncorrelated.
2. Gaussianity - By focusing only on the (co)variance of the ##a_{\ell m}## factors, and assuming that this provides complete statistical information about the signal on the sky, we are further assuming that our 2D random field is Gaussian, since all the higher-order moments of the pdf of ##a_{\ell m}## would depend just on the second moment, which is the theoretical variance ##C_\ell^\mathrm{th}##. This is akin to a 1D random noise time series whose statistics are Gaussian, that thus could be described entirely in terms of its PSD.
• In the case of the CMB, the model power spectrum ##C_\ell^\mathrm{th}## comes from the underlying theory of primordial perturbations. So our actual observed CMB sky is just one realization of a random process whose statistics are (perhaps) described by ##C_\ell^\mathrm{th}##. I'm not sure where your theoretical power spectrum for the galaxy distribution would come from. Perhaps from models of later structure formation?
Note that we do not further assume scale invariance of the perturbations, which would be the case if the power spectrum ##C_\ell^{th}## were flat, meaning constant as a function of multipole. This is known not to be true for the CMB due to processing of the primordial spectrum by early-Universe plasma physics (acoustic oscillations etc.), which gave rise to preferred scales in the anisotropy. But I thought I'd bring it up for the sake of furthering intuition. Continuing our analogy, scale invariance of the 2D random field is somewhat akin to our 1D stationary Gaussian noise timestream also being white, and thus having a flat PSD.

Last edited:
• Twigg
Post 2 of 6: The ##C_\ell## Estimator (Signal-Only Case)

Getting back for a moment to the fact that the spherical-harmonic coefficients are complex valued: this means that they can be expressed in terms of real-valued Gaussian random variables (r.v.s) as follows:
$$\begin{equation}a_{\ell m} = x_{\ell m} + iy_{\ell m}\end{equation}$$
With ##E[x_{\ell m}] = E[y_{\ell m}] = 0##. I believe it's inherent to the definition of a complex Gaussian r.v. that the real and imaginary parts are i.i.d., and thus:
$$\begin{equation}\mathrm{Var}(x_{\ell m}) = \mathrm{Var}(y_{\ell m}) = \frac{C_\ell^\mathrm{th}}{2}\end{equation}$$
while ##\mathrm{Cov}(x_{\ell m},y_{\ell m}) = 0##. Note: an exception is the ##m = 0## case, for which ##y_{\ell 0} = 0##, and ##\mathrm{Var}(x_{\ell 0}) = \mathrm{Var}(a_{\ell 0}) = C_\ell^\mathrm{th}##.

Although you won't find this notation for ##a_{\ell m}## in many papers, I've found it to be critical in determining the statistical properties of the observable ##C_\ell##, which is the power spectrum you calculate from the modes that you've measured in your sky map:
$$\begin{equation} C_\ell \equiv \frac{1}{2\ell + 1}\sum_{m = -\ell}^{+\ell}|a_{\ell m}|^2 = \frac{1}{2\ell + 1}\sum_{m = -\ell}^{+\ell}a_{\ell m}a_{\ell m}^*\end{equation}$$
You can see that we're just estimating the power in multipole ##\ell## by averaging together the power of all the measured ##m## modes within that multipole. You won't get the exact answer from just one signal realization (one sky), but the ensemble average of many realizations should return the theoretical result. I.e. in the signal-only case (no noise), ##C_\ell## is an unbiased estimator for ##C_\ell^\mathrm{th}##:
\begin{aligned} E[C_\ell] &= E\left[ \frac{1}{2\ell + 1}\sum_{m = -\ell}^{+\ell}a_{\ell m}a_{\ell m}^*\right]\\ &= \frac{1}{2\ell + 1}\sum_{m = -\ell}^{+\ell}E[a_{\ell m}a_{\ell m}^*]\\ &= \frac{1}{2\ell + 1}\sum_{m = -\ell}^{+\ell}E[(x_{\ell m} + iy_{\ell m})(x_{\ell m} - iy_{\ell m})]\\ &= \frac{1}{2\ell + 1}\sum_{m = -\ell}^{+\ell}E[x_{\ell m}^2 + y_{\ell m}^2]\\ &= \frac{1}{2\ell + 1}\sum_{m = -\ell}^{+\ell}E[x_{\ell m}^2] + E[y_{\ell m}^2]\\ &= \frac{1}{2\ell + 1}\sum_{m = -\ell}^{+\ell}\mathrm{Var}(x_{\ell m}) + \mathrm{Var}(y_{\ell m})\\ &= \frac{1}{2\ell + 1}\sum_{m = -\ell}^{+\ell}\frac{C_\ell^\mathrm{th}}{2} + \frac{C_\ell^\mathrm{th}}{2}\\ &= C_\ell^\mathrm{th} \end{aligned}

Post 3 of 6: Variance of the ##C_\ell## Estimator (Signal-Only Case)

Computing the variance of the estimator is more complicated. For this I find it useful to rely on a property of the coefficients:
$$\begin{equation}a_{\ell (-m)} = a_{\ell m}^*\end{equation}$$
which arises from the fact that 1) the ##Y_{\ell m}## functions have this same property, and 2) the signal on the sky is real valued. Using this property, we can re-write ##C_\ell##, breaking up the negative-##m##, positive-##m##, and zero-##m## components of the summation, and using a dummy variable ##n = -m##:
\begin{aligned} C_\ell &= \frac{1}{2\ell + 1}\sum_{m = -\ell}^{+\ell}a_{\ell m}a_{\ell m}^*\\ &= \frac{1}{2\ell + 1}\left(a_{\ell 0}^2 + \sum_{m=-\ell}^{-1}a_{\ell m}a_{\ell m}^* + \sum_{m=1}^{+\ell}a_{\ell m}a_{\ell m}^* \right)\\ &= \frac{1}{2\ell + 1}\left(x_{\ell 0}^2 + \sum_{n=1}^{+\ell}a_{\ell (-n)}a_{\ell (-n)}^* + \sum_{m=1}^{+\ell}a_{\ell m}a_{\ell m}^* \right)\\ &= \frac{1}{2\ell + 1}\left(x_{\ell 0}^2 + \sum_{n=1}^{+\ell}a^*_{\ell n}a_{\ell n} + \sum_{m=1}^{+\ell}a_{\ell m}a_{\ell m}^* \right)\\ \end{aligned}
In the last line, in the summation over ##n##, I've used the special complex-conjugation property twice. On the first factor, I used it as written. On the second factor, I first took the complex conjugate of both sides, and used that version of the property. Since indices ##n## and ##m## now range over the same set of values, we can write:
\begin{aligned} C_\ell&= \frac{1}{2\ell + 1}\left(x_{\ell 0}^2 + 2\sum_{m=1}^{+\ell}a_{\ell m}a_{\ell m}^* \right)\\ &= \frac{1}{2\ell + 1}\left(x_{\ell 0}^2 + 2\sum_{m=1}^{+\ell}x_{\ell m}^2 + y_{\ell m}^2 \right) \end{aligned}
It's slightly easier to compute the variance of this version of ##C_\ell##. Recall that if ##X## is an r.v. and ##a## is a constant, then ##\mathrm{Var}(aX) = a^2\mathrm{Var}(X)##. Recall also that the variances of independent r.v.s simply add together. Using these two properties:
\begin{aligned} \mathrm{Var}\left(C_\ell\right) &= \mathrm{Var}\left(\frac{1}{2\ell + 1}\left(x_{\ell 0}^2 + 2\sum_{m=1}^{+\ell}x_{\ell m}^2 + y_{\ell m}^2 \right)\right)\\ &= \frac{1}{(2\ell + 1)^2}\left(\mathrm{Var}\left(x_{\ell 0}^2\right) + 4\sum_{m=1}^{+\ell}\mathrm{Var}\left(x_{\ell m}^2\right) + 4\sum_{m=1}^{+\ell}\mathrm{Var}\left(y_{\ell m}^2\right) \right)\\ \end{aligned}
Let's break up these three terms and evaluate them separately:

Term 1
\begin{aligned}\mathrm{Var}(x_{\ell 0}^2) &\equiv E[x_{\ell 0}^4] - \left(E[x_{\ell 0}^2]\right)^2\\ &= E[x_{\ell 0}^4] - \left(\mathrm{Var}(x_{\ell 0})\right)^2\\ &= E[x_{\ell 0}^4] - \left(C_\ell^\mathrm{th}\right)^2\\ \end{aligned}
We're running up against fourth moments now, due to the "variance of a variance" nature of ##\mathrm{Var}\left(C_\ell\right)##. It turns out that if ##X## is a mean-zero Gaussian r.v., then the fourth moment is given by ##E[X^4] = 3(E[X^2])^2 = 3\sigma^4##. This result can be derived from the more general Isserlis theorem. Applying it here yields:
\begin{aligned}\mathrm{Var}(x_{\ell 0}^2) &= 3\left(E[x_{\ell 0}^2]\right)^2 - \left(C_\ell^\mathrm{th}\right)^2\\ &= 3\left(\mathrm{Var}\left(x_{\ell 0}\right)\right)^2 - \left(C_\ell^\mathrm{th}\right)^2\\ &= 3\left(C_\ell^\mathrm{th}\right)^2 - \left(C_\ell^\mathrm{th}\right)^2\\ &= 2\left(C_\ell^\mathrm{th}\right)^2 \end{aligned}
Term 2

By very similar arguments to the ones we made for Term 1, we can show that for the ##m \neq 0## case:
\begin{aligned}\mathrm{Var}(x_{\ell m}^2) &\equiv E[x_{\ell m}^4] - \left(E[x_{\ell m}^2]\right)^2\\ &= E[x_{\ell m}^4] - \left(\mathrm{Var}(x_{\ell m})\right)^2\\ &= 3\left(\frac{C_\ell^\mathrm{th}}{2}\right)^2 - \left(\frac{C_\ell^\mathrm{th}}{2}\right)^2\\ &= 2\frac{\left(C_\ell^\mathrm{th}\right)^2}{4}\\ \end{aligned}
Term 3
Identical result to that of Term 2, because ##x_{\ell m}## and ##y_{\ell m}## are i.i.d.

Plugging the three terms back into the variance equation above gives:
\begin{aligned} \mathrm{Var}\left(C_\ell\right) &= \frac{1}{(2\ell + 1)^2}\left(2\left(C_\ell^\mathrm{th}\right)^2 + 4\sum_{m=1}^{+\ell}2\frac{\left(C_\ell^\mathrm{th}\right)^2}{4} + 4\sum_{m=1}^{+\ell}2\frac{\left(C_\ell^\mathrm{th}\right)^2}{4} \right)\\ &= \frac{1}{(2\ell + 1)^2}\left(2\left(C_\ell^\mathrm{th}\right)^2 + 2\ell\left(C_\ell^\mathrm{th}\right)^2 + 2\ell\left(C_\ell^\mathrm{th}\right)^2 \right)\\ &= \frac{4\ell + 2}{(2\ell + 1)^2}\left(C_\ell^\mathrm{th}\right)^2 \\ &= \frac{2(2\ell + 1)}{(2\ell + 1)^2}\left(C_\ell^\mathrm{th}\right)^2 \\ &= \boxed{\frac{2}{2\ell + 1}\left(C_\ell^\mathrm{th}\right)^2}\\ \end{aligned}
And so we've recovered the result that you and @Twigg were discussing, and that was related to Eq. 137 of the paper that you cited. I know this is my first set of posts on PF, but I hope I have built up at least some cosmology-analysis street cred by deriving this from first principles! You can see from the fact that we had to invoke Isserlis above that this result applies strictly if your random signal is Gaussian-distributed. Again, I don't know how valid that assumption is for your case. Another point: we have the mean and variance of ##C_\ell##, but what is its distribution (pdf)? As you pointed out in your OP, since the estimator is the sum of the squares of a bunch of independent mean-zero Gaussian r.v.s, it should itself be chi-squared distributed. But again, that rests on the assumption of Gaussianity of the signal.

There are also a couple of other simplifications in my result relative to what you're interested in:
1. My result is for the signal-only case (we'll get to signal + noise in a minute)
• Those who are new to this field may be wondering why there is any uncertainty in the signal-only case. Again, it stems from the fact that we can observe only one Universe, and hence only one realization (one sky's worth) of random fluctuations described by ##C_\ell^\mathrm{th}##. So this is a form of sample variance, but in CMB analysis we usually refer to it as "cosmic variance" when we have the full-sky information, and reserve the term "sample variance" for the even larger error in ##C_\ell## resulting from only having observations over a partial sky. Which leads me to:
2. My result is for the full-sky case. The cut-sky case introduces a factor of ##f_\mathrm{sky}## (the fraction of the sky that is observed by the experiment) in the denominator to increase the uncertainty. This is based on a heuristic argument that if you don't observe the full sky, you have fewer than ##2\ell + 1## independent modes/estimates of ##C_\ell^\mathrm{th}## to work with in your sky map. Don't ask me to justify it beyond that.
I personally think a more important thing to realize about a cut-sky analysis (other than simply higher experimental uncertainty) is that having a partial sky destroys statistical isotropy, invalidating a lot of our theory above. If you truncate or apodize your signal with a sky mask, you'll introduce correlations between different multipoles. As a simple intuitive example: if the sky were just a monopole and you truncated that, you'd obviously need higher-##\ell## modes to describe the sharp edges of your sky patch. In this way, you've leaked power from ##\ell = 0## to higher-##\ell## in your observed power spectrum.

For more on methods to deal with mode-mode coupling arising from sky masking and other instrumental effects such as filtering and beams, see e.g.:

- https://arxiv.org/abs/astro-ph/0105302 (from 2002, early CMB analysis techniques)
- https://arxiv.org/abs/2111.01113 (much more recent follow up relevant to more sensitive experiments)

Post 4 of 6: Introducing Additive Noise

Suppose our sky map ##M(\mathbf{n})## consists of the signal discussed above, plus some additive noise ##N(\mathbf{n})## (e.g. from the observing instrument) that is also a mean-zero random field. By the linearity of the spherical-harmonic transform (Post 1), we can simply add the ##a_{\ell m}## coefficients of the signal and noise together:
\begin{aligned}a_{\ell m}^M &= \int \left(S(\mathbf{n})+ N(\mathbf{n})\right) Y^*_{\ell m}\,d\Omega\\ &= \int S(\mathbf{n}) Y^*_{\ell m}\,d\Omega + \int N(\mathbf{n})Y^*_{\ell m}\,d\Omega\\ &= a_{\ell m}^S + a_{\ell m}^N \end{aligned}
As before (Post 2):
$$\begin{equation}a_{\ell m}^S = x_{\ell m} + iy_{\ell m}\end{equation}$$
while we will use a different notation for the noise real and imaginary parts, to avoid confusion:
$$\begin{equation}a_{\ell m}^N = u_{\ell m} + iv_{\ell m}\end{equation}$$
The observed power spectrum of the map is an auto-spectrum i.e. it is the sample covariance of the map ##a_{\ell m}##s with themselves:
\begin{aligned} C_\ell^{MM} &= \frac{1}{2\ell + 1}\sum_{m = -\ell}^{+\ell}\left(a^M_{\ell m}\right)\left(a^M_{\ell m}\right)^*\\ &= \frac{1}{2\ell + 1}\sum_{m = -\ell}^{+\ell}\left(a^S_{\ell m}+a^N_{\ell m}\right)\left(a^S_{\ell m}+a^N_{\ell m}\right)^*\\ &= \frac{1}{2\ell + 1}\left(\sum_{m = -\ell}^{+\ell}\left(a^S_{\ell m}\right)\left(a^S_{\ell m}\right)^* + \sum_{m = -\ell}^{+\ell}\left(a^S_{\ell m}\right)\left(a^N_{\ell m}\right)^* + \sum_{m = -\ell}^{+\ell}\left(a^N_{\ell m}\right)\left(a^S_{\ell m}\right)^* + \sum_{m = -\ell}^{+\ell}\left(a^N_{\ell m}\right)\left(a^N_{\ell m}\right)^* \right)\\ \end{aligned}
Both the noise and the signal are real valued. Therefore, you can show using the special complex-conjugation property of the ##a_{\ell m}##s (Post 3) that the two middle summations are equal to each other and are real valued (this is left as an exercise to the reader). We can therefore write the result in terms of two auto-spectra and one cross-spectrum:
$$\begin{equation}C_\ell^{MM} = C_\ell^{SS} + 2C_\ell^{SN} + C_\ell^{NN}\end{equation}$$

Post 5 of 6: Expectation Value of the Map Power Spectrum (Signal + Noise Case)

We can construct these measured power spectrum quantities (end of Post 4), but are they still useful now that we have noise of unknown properties? To answer that, consider the expectation value and variance of the estimator again:
$$\begin{equation}E\left[C_\ell^{MM}\right] = E\left[C_\ell^{SS}\right] + 2E\left[C_\ell^{SN}\right] + E\left[C_\ell^{NN}\right]\end{equation}$$
Consider the three terms separately:

Term 1

From Post 2:
$$\begin{equation}E\left[C_\ell^{SS}\right] = C_\ell^\mathrm{th}\end{equation}$$
Term 2
\begin{aligned}E\left[C_\ell^{SN}\right] &\propto \sum_{m = -\ell}^{+\ell}E\left[\left(a^S_{\ell m}\right)\left(a^N_{\ell m}\right)^*\right]\\ &= \sum_{m = -\ell}^{+\ell}\mathrm{Cov}\left(\left(a^S_{\ell m}\right), \left(a^N_{\ell m}\right)^*\right) = \sum_{m = -\ell}^{+\ell}\left(C_\ell^{SN}\right)^\mathrm{th}\end{aligned}
So now we come to a question: is the noise statistically independent from the signal? I mean are they independent theoretically speaking, ignoring chance correlations between specific signal and noise realizations? If the noise is instrument noise, it would usually make sense for that to have no real correlation to on-sky signal. And if signal and noise can indeed be modeled as independent r.v.s, then the covariance above is zero, and thus the expectation value of the cross-spectrum is zero; all the measured chance correlations ##C_\ell^{SN}## between signal and noise would average down to zero over many trials of the same experiment. But if signal-to-noise covariance is not zero even in theory, then we're stuck with this cross term! We can't simplify it any further, and our estimation of the signal power is biased by it in an unknown way. I haven't thought about this carefully, but if your noise source is shot noise due to the discrete nature of the galaxies themselves, then wouldn't that noise correlate with the signal, which is the galaxy count? If so, this could be one cause of inaccuracy of the equation (Eq. 137) that you're using to estimate the uncertainty in ##C_\ell##. But maybe this cross-correlation component of the power spectrum is something that you can determine through numerical simulation in your case, or even model analytically...?

Term 3

Likewise for the noise auto-spectrum:
\begin{aligned}E\left[C_\ell^{NN}\right] &\propto \sum_{m = -\ell}^{+\ell}E\left[\left(a^N_{\ell m}\right)\left(a^N_{\ell m}\right)^*\right]\\ &= \sum_{m = -\ell}^{+\ell}\mathrm{Cov}\left(\left(a^N_{\ell m}\right), \left(a^N_{\ell m}\right)^*\right) = \sum_{m = -\ell}^{+\ell}\left(C_\ell^{NN}\right)^\mathrm{th}\end{aligned}

It is the convention in CMB data analysis to use the following symbol for the theoretical noise auto-spectrum ##\left(C_\ell^{NN}\right)^\mathrm{th} = N_\ell##. This could be determined from an instrument noise model, or estimated from differences between dataset halves that should null out the signal. Note that we're not necessarily assuming here that the map noise is statistically isotropic: it may often not be, since it is dependent on telescope scan strategy and other direction-dependent factors that go into the final map. But only diagonal elements of the covariance matrix of the ##a_{\ell m}^N## coefficients appear in the sum above, so those are the only ones we have to worry about influencing the mean of the measured power spectrum, even if the off-diagonal ones are not zero.

Note, however, that by assuming that ##N_\ell## is a complete description of the noise, we are again assuming that the noise is Gaussian, which may or may not be valid for your application.

Anyway, putting this all together, if we can safely assume that Term 2 is 0, then:
$$\begin{equation}E\left[C_\ell^{MM}\right] = C_\ell^\mathrm{th} + N_\ell\end{equation}$$
The estimator is biased (relative to the true sky signal) by the noise power.

Post 6 of 6: Variance of the Map Power Spectrum (Signal + Noise Case)

This one is going to be much thornier to compute (and so I won't actually do it, LOL). Let's return to Post 3's equation for the ##C_\ell## estimator in terms of real-valued r.v.s, but let's update it for the signal + noise case:
\begin{aligned} \mathrm{Var}\left(C_\ell^{MM}\right) &= \mathrm{Var}\left(\frac{1}{2\ell + 1}\left(\mathrm{Re}\left(a_{\ell 0}^M\right)^2 + 2\sum_{m=1}^{+\ell}\mathrm{Re}\left(a_{\ell m}^M\right)^2 + \mathrm{Im}\left(a_{\ell m}^M\right)^2 \right)\right)\\ &= \frac{1}{(2\ell + 1)^2}\mathrm{Var}\left(\underbrace{\left(x_{\ell 0} + u_{\ell 0}\right)^2 + 2\sum_{m=1}^{+\ell}\left(x_{\ell m} + u_{\ell m}\right)^2 + \left(y_{\ell m} + v_{\ell m}\right)^2}_{W}\right) \end{aligned}
Before now, when faced with something like like the above, I've just been bringing the Var() operator inside the summations willy nilly, since the variance of the sum of a set of independent r.v.s is just equal to the sum of the variances of those individual r.v.s. But that assumes that different ##m## modes within the same ##\ell## are independent, which is only true in the case of statistical isotropy! And now that we have noise, it's not necessarily a given that the map is statistically isotropic. But if it isn't, then we can go no further. We're stuck with the equation above. You could try to tackle it using the definition of variance, but that would be insanely messy. I.e. everything in the large parentheses is ##W##, now compute ##E[W^2] - (E[W])^2##. Good luck with that! Shudder. Another way to tackle this might be to write things in terms of the three measured power spectra from Post 4 (SS, SN, and NN). But I don't think it makes things any simpler; you still have to expand all those power spectrum terms out, writing them in terms of the real-valued spherical-harmonic coefficients in the end.

Even if we do assume statistical isotropy, things still get messy. We can now distribute Var() into the parentheses. But consider the variance of even the just the first (##m = 0##) term of ##W##:
$$\begin{equation}\mathrm{Var}\left(x_{\ell 0}^2 + 2x_{\ell 0}u_{\ell 0} + u_{\ell 0}^2\right)\end{equation}$$
I can't even distribute the Var operator inside these parentheses, because even if the signal and noise are independent, ##x_{\ell 0}^2 ##is presumably not independent of ##x_{\ell 0}u_{\ell 0}##!. So my recourse is again to go back to the fundamental definition of variance, which turns the above expression into:

$$\begin{equation}E\left[\left(x_{\ell 0}^2 + 2x_{\ell 0}u_{\ell 0} + u_{\ell 0}^2\right)^2\right] - \left(E\left[x_{\ell 0}^2 + 2x_{\ell 0}u_{\ell 0} + u_{\ell 0}^2\right]\right)^2\end{equation}$$
The first term of the above expression will produce a plethora of "fourth moment" type terms like ##x_{\ell 0}^4##, ##x_{\ell 0}^3u_{\ell 0}##, ##u_{\ell 0}^4##, etc. We can tackle these using Isserlis, but only if both signal and noise are Gaussian r.v.s. The second term is doable because it reduces to
\begin{aligned}&\left(E\left[x_{\ell 0}^2\right] + E\left[2x_{\ell 0}u_{\ell 0}\right] + E\left[u_{\ell 0}^2\right]\right)^2\\ &= \left(\mathrm{Var}\left(x_{\ell 0}^2\right) + 2\mathrm{Cov}\left(x_{\ell 0},u_{\ell 0}\right) + \mathrm{Var}\left(u_{\ell 0}^2\right)\right)^2\\ &= \left(C_\ell^\mathrm{th} + 2\mathrm{Cov}\left(\left(a^S_{\ell 0}\right),\left(a^N_{\ell 0}\right)\right) + N_\ell\right)^2\\\end{aligned}
and if signal and noise are statistically independent, this reduces to ##(C_\ell^\mathrm{th} + N_\ell)^2##. But that's not the full answer, even for ##m = 0##. And we still have the headache of the summation over ##m \neq 0##. So that gives you a flavour of the problem. Whether to bother with this formalism, and whether naive formulas for the error in ##C_\ell## give accurate results comes down to:
• Whether the signal and noise are both statistically isotropic
• Whether the signal and noise are statistically independent of each other
• Whether the signal and noise can be accurately modeled as Gaussian random fields.

@LastScattered1090 thanks a lot for your long and detailled answer, it is really fine from your part.

I said in the first post that "a colleague suggests me to take only the quantity :

"

Actually, this is the translation of an optimal variance for Shot Noise from the "Inverse-variance weighted"
method : this is the key point of discrepancy described aboved under image format.

Actually, in my case, I am working with 2 quantities, the first which is the sum of Cl's and the second one which is the double sum on ##\ell## and ##m## of ##a_lm^2##.

I am expected to get a Figure of Merit better with the second observable than with the first one but numerically when I run the codes, it is the contrary, and I would like to prove mathematically or physically why I have this contradicton. Sorry to put images but this is easier than translate all the Latex code in physicsforums format.

Here all the reasoning :   Problematic as you can see : I have a FoM = 1442.71 for the second observable (double sum) and better FoM (1588) for the first observable :

1) Which one of the 2 results is the right one from the optimal variance of Shot noise chosen ?

2) Do you think the optimal variance that I am using in formula(5) is right.

For me, I think this optimal variance for Shot noise is more appropriate for the first observable (just sum of ##C_\ell## without factors ##(2\ell+1)## but I can't for the moment demonstrate it.

Thanks for yout support,

Regards

Wow, thanks a ton to @LastScattered1090! That looks like it was a lot of work!

I know I'm not sitting at the cool kids table with all this cosmology lingo , but I have a few follow up questions about the last post.

@fab13
How is your figure of merit defined? I'm used to FoM's for fitting that are better when they are small. Are you defining it like an SNR squared, as in ##\frac{1}{\sigma_o^2}##?

The optimal variance in equation 5 is factored into both observable uncertainties, so that, by itself, won't cause a difference between their FoMs.

Why do you expect the FoM on the second (weighted, as in Eqn 6) observable to be lower? By weighting the shot noise as you do in the second observable, you are giving lower priority to data at lower values of ##l##. Why do you expect a better fit when you're partially excluding data? Is there some reason why the lower spherical harmonic components are noisier? The 8% increase in FoM for the unweighted sum (eqn 7) actually makes sense from the perspective that your weighting was too aggressive.

Edit: Nevermind, it makes sense now. Ignore the section I crossed out. Whoops.

My figure of merit is defined by the inverse of determinant of block 2x2 parameters in covariance matrix (with 2 sigma error).

You say that "it makes sense now" : could you tell me more please about these 2 different results at level of FoM.

Caution, in the 2 cases of computing variance ##\sigma_o^2##, the divider (or denominator) is not the same (see equation (9) and (13) on my images above.

Best regards

@Twigg

By saying that "from the perspective that your weighting was too aggressive" (I talk about the second observable, i.e the double sum of ##a_{\ell m}##), what do you mean ?

Regards

fab13 said:
You say that "it makes sense now" : could you tell me more please about these 2 different results at level of FoM.
What I had in my head at the time was nonsense. I thought you had introduced the weighting of ##(2l_1)## into observable 2 as some arbitrary weighting. Then I realized it was just the degeneracy factor (each ##m##-channel contributes noise of its own), and now it seems to me like observable 2 is perfectly natural and observable 1 is the one that's arbitrarily weighted. My intuition is that using arbitrary weights is virtually the same as excluding some of the data from the analysis (the data with low weight gets excluded), so I would expect FoM to be worse not better. That is also what I meant when I said "your weighting was too aggressive", you "excluded" too much data.

fab13 said:
My figure of merit is defined by the inverse of determinant of block 2x2 parameters in covariance matrix (with 2 sigma error).
Sorry, the FoM is still not clear to me. The covariance matrix of what? Which variable is this covariance for?

@Twigg . Thanks for your support.

now it seems to me like observable 2 is perfectly natural and observable 1 is the one that's arbitrarily weighted. My intuition is that using arbitrary weights is virtually the same as excluding some of the data from the analysis (the data with low weight gets excluded), so I would expect FoM to be worse not better. That is also what I meant when I said "your weighting was too aggressive", you "excluded" too much data.
The main idea is that both observable are equal to a constant. So I am looking for, by computing the variance of the ratio, to get constraints on this ratio.

The first approach (first observable) is to sum the ##C_\ell## at numerator and denominator to get implicitly the average value of ##C_\ell## for numerator and the average value ##C_\ell## for denominator (the number of ##C_\ell## disappears when we do the ratio).

##\begin{gathered}
O=\left(\frac{\mathcal{D}_{s p}}{\mathcal{D}_{p h}}\right) \\
\mathcal{D}_{s p}=\sum_{i=l}^{n} C_{\ell, s p}\left(\ell_{i}\right) \\
\mathcal{D}_{p h}=\sum_{i=l}^{n} C_{\ell, p h}\left(\ell_{i}\right)
\end{gathered}
\begin{gathered}
O=\left(\frac{\mathcal{D}_{s p}}{\mathcal{D}_{p h}}\right) \\
\mathcal{D}_{s p}=\sum_{i=l}^{n} C_{\ell, s p}\left(\ell_{i}\right) \\
\mathcal{D}_{p h}=\sum_{i=l}^{n} C_{\ell, p h}\left(\ell_{i}\right)
\end{gathered}##

This method is not an arbitrary weighting : this is just the computation of ratio of average of ##C_\ell##.
I get by this method a Figure of Merit of 1555.88

2) Regarding the Figure of Merit : I am working with Fisher matrix and by inverse it, I get the covariance matrix. I am working into ##w_o## and ##w_a## CDM model.

Here an illustration : 3) with the second weighting :

##O=\dfrac{\sum_{\ell=1}^{N} \sum_{m=-\ell}^{\ell} a_{\ell m}^{2}}{\sum_{\ell=1}^{N} \sum_{m=-\ell}^{\ell}\left(a_{\ell m}^{\prime}\right)^{2}}=\dfrac{\sum_{\ell=1}^{N}(2\ell+1)\,C_{\ell,sp}}{\sum_{\ell=1}^{N}(2\ell+1)\,C_{\ell,ph}}##

I get a Figure of Merit = 1442.71 : this is lower compared to the first observable (the first observable is not really a weighting, just a sum of ##C_\ell##).

I take over yout intuition : with this second weighting, I exclude the low multipoles ##\ell## and increase the importance for high multipoles ##\ell##. So, this fact has an impact on the computation of variance on the ratio, and so on the FoM.

Then, I would like to be able to explain why with the first observable (ratio of mean of ##C_\ell##) gives a better Figure of merit compared to the second observable (ratio of weighting sum of ##C_\ell## with the weights ##(2\ell+1)##, if this is possible from a mathematical or physical point of view.

Regards.

fab13 said:
@Twigg . Thanks for your support.

The main idea is that both observable are equal to a constant. So I am looking for, by computing the variance of the ratio, to get constraints on this ratio.

The first approach (first observable) is to sum the ##C_\ell## at numerator and denominator to get implicitly the average value of ##C_\ell## for numerator and the average value ##C_\ell## for denominator (the number of ##C_\ell## disappears when we do the ratio).

##\begin{gathered}
O=\left(\frac{\mathcal{D}_{s p}}{\mathcal{D}_{p h}}\right) \\
\mathcal{D}_{s p}=\sum_{i=l}^{n} C_{\ell, s p}\left(\ell_{i}\right) \\
\mathcal{D}_{p h}=\sum_{i=l}^{n} C_{\ell, p h}\left(\ell_{i}\right)
\end{gathered}
\begin{gathered}
O=\left(\frac{\mathcal{D}_{s p}}{\mathcal{D}_{p h}}\right) \\
\mathcal{D}_{s p}=\sum_{i=l}^{n} C_{\ell, s p}\left(\ell_{i}\right) \\
\mathcal{D}_{p h}=\sum_{i=l}^{n} C_{\ell, p h}\left(\ell_{i}\right)
\end{gathered}##

This method is not an arbitrary weighting : this is just the computation of ratio of average of ##C_\ell##.
I get by this method a Figure of Merit of 1555.88

2) Regarding the Figure of Merit : I am working with Fisher matrix and by inverse it, I get the covariance matrix. I am working into ##w_o## and ##w_a## CDM model.

Here an illustration :

View attachment 293381
3) with the second weighting :

##O=\dfrac{\sum_{\ell=1}^{N} \sum_{m=-\ell}^{\ell} a_{\ell m}^{2}}{\sum_{\ell=1}^{N} \sum_{m=-\ell}^{\ell}\left(a_{\ell m}^{\prime}\right)^{2}}=\dfrac{\sum_{\ell=1}^{N}(2\ell+1)\,C_{\ell,sp}}{\sum_{\ell=1}^{N}(2\ell+1)\,C_{\ell,ph}}##

I get a Figure of Merit = 1442.71 : this is lower compared to the first observable (the first observable is not really a weighting, just a sum of ##C_\ell##).

I take over yout intuition : with this second weighting, I exclude the low multipoles ##\ell## and increase the importance for high multipoles ##\ell##. So, this fact has an impact on the computation of variance on the ratio, and so on the FoM.

Then, I would like to be able to explain why with the first observable (ratio of mean of ##C_\ell##) gives a better Figure of merit compared to the second observable (ratio of weighting sum of ##C_\ell## with the weights ##(2\ell+1)##, if this is possible from a mathematical or physical point of view.

Regards.
Sorry for the duplicates, I have just the formulas :

##\begin{gathered}

O=\left(\frac{\mathcal{D}_{s p}}{\mathcal{D}_{p h}}\right) \\

\mathcal{D}_{s p}=\sum_{i=l}^{n} C_{\ell, s p}\left(\ell_{i}\right) \\

\mathcal{D}_{p h}=\sum_{i=l}^{n} C_{\ell, p h}\left(\ell_{i}\right)

\end{gathered}##

Sorry for the long delay. Stressful time over here.

Unfortunately, I don't know what ##w_o## and ##w_a## mean, but if you're calculating a Fisher matrix then you're doing some form of non-linear fit, right? If so, I wouldn't trust that one observable is better than another observable until you can verify that the observable with the better FoM isn't underscattered relative to the real data. It's happened to me before. We had an observable A and an observable B. When we propagated the noise into our calculations the math said that ##\sigma_{A,theory} < \sigma_{B,theory}##. But when we took multiple data sets and looked at the scatter of the measurements, we found that ##\sigma_{A,data} = \sigma_{B,data} = \sigma_{B,theory}##, in other words we found that the error propagation of A was underscattered relative to the data. It's a long story, but this happened because the way we defined ##A## inherently messed up the statistics. I feel like not counting all the m-modes (your observable 1) could cause a similar effect?

In any case, as a rule of thumb, one method of defining your observable shouldn't give you a better fit than another, unless you are deliberately exploiting a correlation in your data. In your case, I find the fact that there's such a large discrepancy in your FoMs to be suspicious, and would suggest that you compare your expectations to real data as soon as possible. Multiple data sets are necessary, so you can look at scatter when you fit them out. Is this possible in your case?

@Twigg . Thanks for your reply. Indeed, the context is cosmological about ##w_0## and ##w_a## which is an extension to the LCDM standard model.

That's why I would like, if possiblle, all this discussion to be moved to Astrophysics/Cosmology forum.

Regards

fab13 said:
moved to Astrophysics/Cosmology forum.
Which one? Those are two separate sub-forums. Sorry, I woud like to put it in Cosmology forum.

Regards

• berkeman
Isn't really there anyone that could help me about a good or natural weighting for my new observable ?

the primordial condition is that the ratio in this observable between numerator and denominator should be equal to the ratio of squared bias of 2 probes. I think that a natural form for this new observable is the ratio
between the sum of both angular power spectrum :

\begin{gathered}
O=\left(\frac{\mathcal{D}_{s p}}{\mathcal{D}_{p h}}\right) \\
\mathcal{D}_{s p}=\sum_{i=l}^{n} C_{\ell, s p}\left(\ell_{i}\right) \\
\mathcal{D}_{p h}=\sum_{i=l}^{n} C_{\ell, p h}\left(\ell_{i}\right)
\end{gathered}

It gives for instant the best result from a Figure of Merit point (FoM) of view.

Another form has been suggested but less efficient at FoM level :

##O=\dfrac{\sum_{\ell=1}^{N} \sum_{m=-\ell}^{\ell} a_{\ell m}^{2}}{\sum_{\ell=1}^{N} \sum_{m=-\ell}^{\ell}\left(a_{\ell m}^{\prime}\right)^{2}}=\dfrac{\sum_{\ell=1}^{N}(2\ell+1)\,C_{\ell,sp}}{\sum_{\ell=1}^{N}(2\ell+1)\,C_{\ell,ph}}##

The motivation of this second form was to say that for numerator and divider, we adda huge number of chi2 distribution, so we tend to get a gaussian PDF.

\begin{align}
Z&\equiv \sum_{\ell=\ell_{min}}^{\ell_{max}} \sum_{m=-\ell}^\ell a_{\ell,m}^2 \\[6pt]
&= \sum_{\ell=\ell_{min}}^{\ell_{max}} \sum_{m=-\ell}^\ell C_\ell \cdot \bigg( \frac{a_{\ell,m}}{\sqrt{C_\ell}} \bigg)^2 \\[6pt]
&\stackrel{d}{=}\sum_{\ell=\ell_{min}}^{\ell_{max}} \sum_{m=-\ell}^\ell C_\ell \cdot \chi^{2}(1) \\[6pt]
&= \sum_{\ell=\ell_{min}}^{\ell_{max}} C_\ell \sum_{m=-\ell}^\ell \chi^{2}(1) \\[6pt]
&= \sum_{\ell=\ell_{min}}^{\ell_{max}} C_\ell \cdot \chi^{2}(2 \ell + 1). \\[6pt]
\end{align}

But actually, I have to compute if there is a gain on the standard deviation for this "almost-gaussian" compared to the sum of ##C_\ell## in the first form.

How to proceed for this computation ? If I manage to calculate analytically the standard deviation of the first and second form, I could compare them and prove basically that the first form of observable is the more appropriate.

Best regards

Last edited:
LastScattered1090 said:
As before (Post 2):
$$\begin{equation}a_{\ell m}^S = x_{\ell m} + iy_{\ell m}\end{equation}$$
Careful. These values are not independent, as the transform of the ##a_{\ell m}## values must be real-valued. In practical terms this means that there are ##2\ell + 1## independent variables (not considering the covered sky fraction) per ##\ell##.

You can manage this explicitly by working in a real-valued superposition of ##a_{\ell m}##: ##a_{\ell m} \pm a_{\ell -m}##.

fab13 said:
Isn't really there anyone that could help me about a good or natural weighting for my new observable ?
This thread has gotten decently long, and what you're asking has changed since the first post. No offense, but most folks have probably given up trying to understand. I don't mean to be critical. I'm just saying this as advice for you to get the most from the forums, since you're clearly putting in a lot of effort. Shorter, clearer posts will get much more responses (it's better if your first post is too somewhat too short and others have to ask clarification than if its too long from the start). Repeatedly posting the same question in a long thread doesn't help because it adds to the "wall of text" that new readers will see. Again, just advice, not being trying to be critical.

Regarding my last post, so do you have multiple data sets to look for underscatter or overscatter of your ratios and their error bars? If not, can you break your main data set into statistically independent (i.e. non-overlapping) chunks of data? Let the data tell you which weighting is more natural!

If you have the time, can you also try a third observable? $$D = \sum_l (2l+1) C_l$$. If that gives a FoM equivalent to the second observable, that would at least confirm that the issue is the weighting on different spherical harmonics and that the issue is not correlations between m-channels.