- #1
fab13
- TL;DR Summary
- I am asking whether cross-correlating data sets increases the information on cosmological parameters in a Fisher forecast
In the context of a forecast, I am currently working with the Fisher formalism, which is part of the more general field of information theory. My problem concerns estimating cosmological parameters from input data with the Fisher formalism, and the recipes for building a Fisher matrix. The context is astrophysics, but the question could apply to many subjects.
Here is a summary: the input data has 4 columns. The first is the redshift of the galaxies (i.e. their distance), and the other 3 each give the bias of a given galaxy type (i.e. roughly how strongly that type traces the underlying matter distribution). There are therefore 3 types of galaxies and one value per redshift (8 redshifts in all), so I have a table of 8x3 bias values.
1) First part: I am trying to cross data to extract additional information because, for example, the first type of galaxy has non-zero biases only at the first 2 redshifts, and the third type has 6 non-zero values at the 6 redshifts above those two.
Here is the file of biases for the 3 populations (b1, b2, b3) as a function of redshift (first column):
# z b1 b2 b3
1.7500000000e-01 1.1133849956e+00 0.0000000000e+00 0.0000000000e+00
4.2500000000e-01 1.7983127401e+00 0.0000000000e+00 0.0000000000e+00
6.5000000000e-01 0.0000000000e+00 1.4469899900e+00 7.1498329000e-01
8.5000000000e-01 0.0000000000e+00 1.4194157200e+00 7.0135835000e-01
1.0500000000e+00 0.0000000000e+00 1.4006739400e+00 6.9209771000e-01
1.2500000000e+00 0.0000000000e+00 0.0000000000e+00 6.8562140000e-01
1.4500000000e+00 0.0000000000e+00 0.0000000000e+00 6.8097541000e-01
1.6500000000e+00 0.0000000000e+00 0.0000000000e+00 6.7756594000e-01
My teacher suggested that I merge the first column (the first type of galaxy) with the third one (the third type of galaxy), so as to obtain a single vector containing only non-zero bias values. This way, I simulate a "single population" treatment.
From a statistical point of view, will there be a loss or a gain of information if I merge these 2 columns? The problem seems rather complex because everything depends on the values of the data.
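To make the merge concrete, here is a minimal Python sketch. It assumes the table above is typed in directly as a NumPy array; the variable names (`table`, `b_merged`) are only illustrative. It relies on the fact that b1 and b3 are never non-zero at the same redshift in this table, so the merge is unambiguous.

```python
import numpy as np

# Bias table copied from the post: columns are z, b1, b2, b3.
table = np.array([
    [0.175, 1.1133849956, 0.0,        0.0],
    [0.425, 1.7983127401, 0.0,        0.0],
    [0.650, 0.0,          1.44698999, 0.71498329],
    [0.850, 0.0,          1.41941572, 0.70135835],
    [1.050, 0.0,          1.40067394, 0.69209771],
    [1.250, 0.0,          0.0,        0.68562140],
    [1.450, 0.0,          0.0,        0.68097541],
    [1.650, 0.0,          0.0,        0.67756594],
])
z, b1, b3 = table[:, 0], table[:, 1], table[:, 3]

# Sanity check: the two populations never overlap in redshift,
# otherwise "merging" would silently discard one of the two values.
assert not np.any((b1 != 0) & (b3 != 0))

# Take b1 where it is defined, b3 everywhere else.
b_merged = np.where(b1 != 0, b1, b3)

for zi, bi in zip(z, b_merged):
    print(f"z = {zi:.3f}  b = {bi:.8f}")
```

Note that this merged vector ignores column b2 entirely, so the 3 redshifts where populations 2 and 3 overlap are described by b3 alone.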
2) Second part: Another point of view suggested by my teacher: if I take a sample, cut it into 2 parts, and cross-correlate the data between the 2 resulting subsets, will I gain or lose information from a statistical point of view, i.e. in the accuracy of the parameters that I extract from the cross-correlation between the 2 subsets? (In my case, with galaxy biases and the Fisher formalism, I mean the constraints that I get after building my Fisher matrix and inverting it.)
He thinks that, at first sight, I cannot lose information (which seems intuitive, because cutting a sample in 2 is not a loss of information per se), but he says that everything depends on whether or not I know precisely the ratio of the biases between the 2 subsamples. I did not quite understand this notion of a ratio between the biases of the 2 subsets.
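One way to see why the bias ratio might matter is a deliberately simplified toy model, which is an assumption of this sketch and not the real galaxy-clustering likelihood. Suppose subsample 1 measures a parameter `theta` directly and subsample 2 measures `r * theta`, where `r` plays the role of the bias ratio, both with independent Gaussian noise `sigma`. The Fisher matrix for (`theta`, `r`) is then built from the derivatives of the two means. All names and numerical values below are hypothetical.

```python
import numpy as np

def fisher_two_subsamples(theta, r, sigma):
    """Fisher matrix for parameters (theta, r) in the toy model
    d1 = theta + noise, d2 = r * theta + noise, noise ~ N(0, sigma^2)."""
    # Jacobian of the mean vector (theta, r*theta):
    # rows = data points, columns = parameters (theta, r).
    J = np.array([[1.0, 0.0],
                  [r,   theta]])
    return J.T @ J / sigma**2

theta, r, sigma = 1.0, 0.5, 0.1   # hypothetical values
F = fisher_two_subsamples(theta, r, sigma)

# Ratio unknown: marginalise over r by inverting the full matrix.
sigma_theta_unknown_r = np.sqrt(np.linalg.inv(F)[0, 0])
# Ratio known exactly: r is fixed, only the theta-theta entry matters.
sigma_theta_known_r = 1.0 / np.sqrt(F[0, 0])
# Reference: subsample 1 used on its own.
sigma_theta_single = sigma

print(sigma_theta_single, sigma_theta_unknown_r, sigma_theta_known_r)
```

In this toy model the error on `theta` with an unknown ratio equals the error from subsample 1 alone, so nothing is lost but nothing is gained either, whereas a known ratio shrinks the error by a factor sqrt(1 + r^2). Whether the same holds for the actual bias and power-spectrum setup depends on the full likelihood.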
I am therefore looking for information on this problem. Maybe statisticians on this forum can help me with this technique of cross-correlation, and with the question of whether one gains or loses information by bringing together several sources of information.
I think that the gain or loss of information will be a function of the redundancy of the data (this is related to Shannon entropy, I think, isn't it?).
3) Third part: I could also cross data between 2 columns that overlap (2 values at each redshift), but I think that is a different problem from a statistical point of view. By the way, at the beginning I described "crossing data" as merging 2 vectors, but "cross-correlation" is rather defined for overlapping values, right?
In both cases, however, we are crossing data in some way.
For the moment, my algorithm treats the first 2 values of the 1st population, then the 3 values that overlap between the 2nd and 3rd populations, and finally the last 3 values of the 3rd population, which makes 8 values in total (i.e. 8 redshifts). So there are 2 "auto-spectra" and 1 overlapped spectrum.
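The grouping of redshift bins described above can be recovered from the table itself. The following is a minimal sketch, assuming the bias columns are typed in as a NumPy array; the labels "auto-spectrum" and "overlapped spectrum" follow the wording of this post, and the variable names are illustrative.

```python
import numpy as np

z = np.array([0.175, 0.425, 0.65, 0.85, 1.05, 1.25, 1.45, 1.65])
# Columns b1, b2, b3 copied from the table above, one row per redshift.
bias = np.array([
    [1.1133849956, 0.0,        0.0],
    [1.7983127401, 0.0,        0.0],
    [0.0,          1.44698999, 0.71498329],
    [0.0,          1.41941572, 0.70135835],
    [0.0,          1.40067394, 0.69209771],
    [0.0,          0.0,        0.68562140],
    [0.0,          0.0,        0.68097541],
    [0.0,          0.0,        0.67756594],
])

# Group redshift-bin indices by the set of populations with non-zero bias.
groups = {}
for i, row in enumerate(bias):
    active = tuple(int(p) + 1 for p in np.flatnonzero(row))  # 1-based labels
    groups.setdefault(active, []).append(i)

for pops, idx in groups.items():
    kind = "auto-spectrum" if len(pops) == 1 else "overlapped spectrum"
    print(f"populations {pops}: z = {z[idx]} -> {kind}")
```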
Throughout this post, I measure the information gain through the constraints obtained by inverting the Fisher matrix. The inverse gives the covariance matrix, and therefore the variances and correlations of the parameters I want to estimate: the smaller the standard deviations, the higher the information gain.
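As a minimal sketch of that last step, the following Python snippet inverts a Fisher matrix and extracts the standard deviations and the correlation matrix. The 2x2 matrix `F` is a made-up example, not a result from my forecast, and the function name `constraints_from_fisher` is only illustrative.

```python
import numpy as np

def constraints_from_fisher(F):
    """Return (covariance, 1-sigma errors, correlation matrix) from a Fisher matrix."""
    cov = np.linalg.inv(F)               # parameter covariance matrix
    sigma = np.sqrt(np.diag(cov))        # marginalised 1-sigma errors
    corr = cov / np.outer(sigma, sigma)  # correlation coefficients
    return cov, sigma, corr

# Hypothetical Fisher matrix for two parameters (illustrative numbers only).
F = np.array([[4.0, 1.0],
              [1.0, 2.0]])

cov, sigma, corr = constraints_from_fisher(F)
print("marginalised errors:", sigma)
print("correlation matrix:\n", corr)

# For comparison: errors if every other parameter were held fixed.
sigma_fixed = 1.0 / np.sqrt(np.diag(F))
print("unmarginalised errors:", sigma_fixed)
```

The marginalised errors are never smaller than the unmarginalised ones, which is why a degeneracy between parameters (for example between a bias and a cosmological parameter) degrades the constraints.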
Your advice and suggestions on this issue would be valuable and would help me better understand the logic of this "data cross-correlation" story.
Any help is welcome.
PS: If the topic seems to be in the wrong forum, feel free to move it to the Mathematics/Statistics forum, but I think this is actually an Astrophysics/Cosmology problem.
Regards