# Combine several measures: consensus mean or a sampling problem?

• Mikeldon
In summary, the author has 20 subjects, each measured by several sensors, and is interested in the mean and variance of the sensor measurements. The author calculates a mean and variance for each sensor across the subjects whose measures were reliable. Finally, the author tries to obtain a single global estimate of the mean and variance, considering either a consensus-mean approach or a stratified-sampling approach.

Dear all,
I have this problem:
the work aims to test the value (mean and error) of several sensors (each giving a different measure) in a group of subjects, and then to obtain a single global value (a single mean and error) from all the sensors used on the group. All sensors measure the same molecule with different specificity. The normality of the sensors' measures is unknown.

In more detail: I have 20 subjects (a, b, c, ..., n). For each subject I can use different sensors (A, B, C, ..., M), each producing a measure. The interest is focused first on the value of each sensor (reflecting a biological parameter) and then on the overall value of all sensor measures.

The value provided by each sensor is interesting in itself, so for the subjects I have the value of each sensor's measurement. Not all measures are reliable and some must be discarded, so each sensor covers a different subset of the 20 subjects.
I calculated the mean and variance of sensors A, B, C, ..., M using the available measures from the group for each sensor.

So this is the available data:

Sensor A -> values from subjects (a,c,d,f,h,j...,n) ===> mean(A), variance(A)
Sensor B -> values from subjects (c,d,e,g,...,l) ===> mean(B), variance(B)
Sensor C -> values from subjects (a,b,d,e,j,k,...,n) ===> mean(C), variance(C)
...
Sensor M -> values from subjects (b,c,d,g,h,...,k) ===> mean(M), variance(M)

Each mean and variance is calculated on a different (sometimes identical) set of subjects, because some bad values were discarded.

Now the problem:

I want a single measure from all my means, with its variance, and I find it difficult to decide which method to use, and on what grounds (heteroskedasticity, correlation between samples, etc.).

The global measure should account for variability between and within each sensor, giving at the end a single mean value and a single error.

In particular I am considering two approaches: the calculation of a "consensus mean", or treating my work as stratified sampling (where each sensor would be a stratum) and weighting each sensor by the number of subjects from which measures are available.

The theoretical approach of "consensus mean" is described here: http://www.fire.nist.gov/bfrlpubs/build02/PDF/b02027.pdf (page 26)
and here:
http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/consmean.htm
And I would use Dataplot with the Mandel-Paule approach.
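For concreteness, the Mandel-Paule iteration can be sketched in a few lines of Python. This is only a minimal illustration of the algorithm (not Dataplot's implementation); the bisection bounds and tolerance are my own choices:

```python
import numpy as np

def mandel_paule(means, variances, counts, tol=1e-10, max_iter=200):
    """Mandel-Paule consensus mean.

    means     : per-sensor means
    variances : per-sensor sample variances
    counts    : number of subjects behind each mean
    Returns (consensus mean, its standard error, between-sensor variance).
    """
    x = np.asarray(means, float)
    s2 = np.asarray(variances, float) / np.asarray(counts, float)  # variance of each mean
    k = len(x)

    def criterion(sb2):
        # Weights combine within-sensor and between-sensor variance.
        w = 1.0 / (sb2 + s2)
        xbar = np.sum(w * x) / np.sum(w)
        # Mandel-Paule criterion: weighted sum of squares should equal k - 1.
        F = np.sum(w * (x - xbar) ** 2) - (k - 1)
        return F, xbar, w

    F0, xbar, w = criterion(0.0)
    if F0 <= 0:
        sb2 = 0.0  # no evidence of between-sensor variance
    else:
        lo, hi = 0.0, 10.0 * np.var(x) + 1.0
        for _ in range(max_iter):
            sb2 = 0.5 * (lo + hi)
            F, xbar, w = criterion(sb2)
            if abs(F) < tol:
                break
            lo, hi = (sb2, hi) if F > 0 else (lo, sb2)
    se = np.sqrt(1.0 / np.sum(w))
    return xbar, se, sb2
```

When the sensor means are mutually consistent, the criterion is already negative at zero and the method reduces to inverse-variance weighting; otherwise it inflates every sensor's variance by a common between-sensor component before weighting.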

While for stratified sampling approach I am considering the formula described here:
http://www.spsstools.net/Tutorials/WEIGHTING.pdf (page 9)
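For reference, the stratified estimator I have in mind has the usual textbook form (writing $n_h$ for the number of subjects with a reliable measure from sensor $h$, $\bar{x}_h$ and $s_h^2$ for that sensor's mean and variance, and $n = \sum_h n_h$):

```latex
\bar{x}_{\mathrm{st}} = \sum_{h=1}^{M} \frac{n_h}{n}\,\bar{x}_h,
\qquad
\widehat{\operatorname{Var}}(\bar{x}_{\mathrm{st}})
  = \sum_{h=1}^{M} \left(\frac{n_h}{n}\right)^{2} \frac{s_h^2}{n_h}
```

These formulas assume the strata are non-overlapping subsets of one population, which is exactly the point in question here.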

Which approach do you think is more suitable for my needs, and why?

Thank you very much!
Michael.

Using stratification would imply that the goal was to compute a mean that is likely to agree with other researchers who use the same set of sensors (and had roughly the same number of non-discarded measurements from them). This is not the same as the goal of finding a mean value that is likely to agree with researchers who use a different set of sensors or, for example, make a larger fraction of their measurements with sensor 'A' than you did.

The consensus mean is simple and if you publish your data, some readers will probably compute it themselves even if you don't. So you should be prepared to answer questions that it may raise.

Usually when people ask questions about real world statistical problems on the forum, they abstract the mathematical information that they think is relevant in order to be concise. This is good, but they usually omit necessary information. (A large amount of information and a considerable number of assumptions are needed to give a mathematical answer to a real world question.)

For example, when you say subject 'a' is measured by sensors 'A' and 'C', this might mean that subject 'a' is something like a mineral specimen that was weighed non-destructively on two different scales. Or it might mean that subject 'a' is a tissue sample from an animal that was divided into two parts, with one part sent to lab 'A' and one part sent to lab 'C'.

It's also necessary to be clear about your purpose. For example, if your goal is to publish a paper in a particular journal or write a thesis that will be approved by a certain committee, then you should look at what statistical methods were used in papers that they have approved. You might simply ask one of the editors or committee members for advice.

I realized that I cannot treat my data as stratified sampling, since the same subject is measured with different sensors (so my strata would overlap).
This leaves the "consensus mean", but usually that is computed for data from different labs that worked on different (even if similar) samples.

At this point I do not know exactly how to treat my data and obtain a "global" estimate from all the measures. I also thought of directly calculating a single mean and variance from all available data, ignoring the fact that I used different sensors, since they are all designed to measure the concentration of a specific molecule.

You are absolutely right about the questions that a particular approach would raise, but I do not really care about them, as I think the most important thing is to obtain a reasonable measure, hence understanding the importance of all significant aspects (like the overlapping, which rules out treating the data as stratified sampling). Then the answers will be logical.

The aim is to publish a paper, but this technique is really new and no other works have been done previously.

If any other detail is needed I can provide it. About the sensors, I already said that the normality of their measures is unknown; homoskedasticity cannot be assumed either (and this prevents me from just taking all the raw measures and treating them as a single dataset).

The purpose is to mediate the contribution of different approaches (sensors) that share the same purpose and use the same measuring unit. The idea is to squeeze the maximum information from each sensor, assuming that they are all suited to measuring the molecule, without ignoring that they may differ (i.e., no normality, no equal variance), and I do not know how important this is.

Mikeldon said:
The aim is to publish a paper, but this technique is really new and no other works have been done previously.

The particular subject matter (measuring your molecule) may be new, but there have probably been many other papers published where some other quantity was measured by a variety of instruments. Don't neglect to scan for those in the journals that you are considering.

Do you anticipate that a large part of the content of the paper will be to expound your statistical method?

...
hence understanding the importance of all significant aspects (like the overlapping, which rules out treating the data as stratified sampling). Then the answers will be logical.

Logic and mathematics can only proceed if they have sufficient information. It's unlikely that you have sufficient objective information. You must be willing to make some assumptions in order to get a mathematical answer.

My interests are in simulation and probabilistic modelling, so that will be my bias in advising you. In complex situations it is very useful to construct a simulation, even if you don't implement it as a computer program. The exercise of making a simulation helps you break the problem into manageable parts and leads to asking pertinent questions.

As an expert in this subject (whatever it is!) you probably know something about how the measuring instruments work. You may find that someone has published a simulation of how these instruments produce errors or you may be able to create such a simulation yourself. You might also find data about a different molecule of similar properties to yours where the true value is known and the values given by various sensors are listed. That would let you form some model of the errors.

A general and widely used method of statistical estimation is the maximum likelihood method. If you assume a probability distribution of errors for each sensor (not a family of distributions, but a specific distribution, perhaps a different one for each individual sensor) and you assume the true mean is a specific value x_bar, then you can calculate the probability of the data you observed. A computer program can try various values for x_bar and determine which of them gives the data the highest probability.
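The "try various values for x_bar" idea can be sketched directly. This is a toy example with invented data: I assume Gaussian errors with a known, sensor-specific standard deviation (the names `sensor_sd` and the numbers are hypothetical), and grid-search the candidate means for the one that maximizes the log-likelihood:

```python
import numpy as np

# Hypothetical per-sensor data: each sensor has its own assumed error s.d.
rng = np.random.default_rng(0)
true_mean = 5.0
sensor_sd = {"A": 0.5, "B": 1.0, "C": 2.0}
data = {name: true_mean + rng.normal(0.0, sd, size=12)
        for name, sd in sensor_sd.items()}

def log_likelihood(xbar):
    """Log-probability of all observations if the true value is xbar,
    assuming Gaussian errors with each sensor's own standard deviation."""
    ll = 0.0
    for name, obs in data.items():
        sd = sensor_sd[name]
        ll += np.sum(-0.5 * ((obs - xbar) / sd) ** 2
                     - np.log(sd * np.sqrt(2.0 * np.pi)))
    return ll

# Try candidate values of x_bar and keep the one with the highest likelihood.
candidates = np.linspace(0.0, 10.0, 2001)
best = candidates[np.argmax([log_likelihood(x) for x in candidates])]
```

Under these Gaussian assumptions the maximizer coincides with the inverse-variance weighted mean of the observations; the grid search becomes genuinely useful once the assumed error distributions are non-Gaussian.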

The purpose is to mediate the contribution of different approaches (sensors) that have in common the same purpose and use the same measuring unit.

The term "sensor fusion" is often used when people pursue this goal. You might find relevant material by searching for it. I, myself, have never seen any work with that title that impressed me.

Dear Michael,

Thank you for sharing your research and the available data. It is clear that you have put a lot of thought into finding a suitable method for combining the measures from different sensors. Based on the information provided, it seems that the most appropriate approach would be to use the "consensus mean" method. This method takes into account the variability between and within each sensor, which is essential in your study where different sensors are measuring the same molecule with different specificity. Additionally, the Mandel-Paule approach used in Dataplot is a well-established method for calculating consensus means and is commonly used in scientific research.

On the other hand, treating each sensor as a stratum and weighting them by the number of subjects from which measures are available may not fully account for the variability between sensors. This approach may also introduce bias if the number of subjects for each sensor is not balanced.

In conclusion, I would recommend using the "consensus mean" method with the Mandel-Paule approach in Dataplot for combining the measures from different sensors in your study. However, it is always important to consider the limitations and assumptions of any statistical method and to carefully interpret the results. I wish you all the best in your research.

Best regards,

## 1. What is the consensus mean in scientific measurements?

The consensus mean is a statistical method that combines multiple measures or data points to determine a single representative value. It is often used in scientific research to account for variations and biases in individual measurements.

## 2. How is the consensus mean calculated?

In the simplest case the consensus mean is the plain average of all the individual measures: sum the values and divide by their number. More commonly, each measure (or each source's mean) is weighted, typically by the inverse of its variance, so that more precise sources count more; iterative procedures such as Mandel-Paule additionally estimate a between-source variance component before weighting.

## 3. What is a sampling problem in relation to the consensus mean?

A sampling problem occurs when the measures being combined are not representative of the entire population or when there is a biased selection of measures. This can lead to an inaccurate consensus mean and affect the validity of the results.

## 4. How can you ensure the accuracy of the consensus mean?

To ensure the accuracy of the consensus mean, it is important to use a diverse and representative sample of measures. This includes avoiding biased selection and using appropriate statistical methods to analyze the data.

## 5. Can the consensus mean be applied to any type of data?

Yes, the consensus mean can be applied to any type of data as long as the measures being combined are numerical. It is commonly used in scientific research, but can also be applied in other fields such as market research or public opinion polls.
