Estimate the source of a variable when there are several distributions

uzi kiko · May 8, 2019

Hello everyone

In my study, I inject a different amount of fluid in each experiment, such as 1 ml, 2 ml..., and test the change in the general dielectric properties of the solution.

I have now made many (over 100) measurements for each injection volume, and for each volume the measured dielectric property follows a normal distribution with its own mean and variance.

For example: for an injection of 1 ml the results are distributed with μ1 and σ1, and for an injection of 2 ml with μ2 and σ2. There is some overlap between the various distributions.

Now suppose I get an X value of dielectric property, what is the correct way to estimate the probability that the value X came from a specific injection volume?

I could apply a Z test to each distribution separately, but it seems to me that a Z test does not take the other distributions into account.

Thank you!

Dale · May 8, 2019

This sounds like an application best suited for maximum likelihood.
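
A minimal sketch of the maximum-likelihood idea, assuming each injection volume has a known normal distribution: evaluate each volume's density at the observed X and pick the volume with the largest value. The function names and the (μ, σ) numbers are placeholders for illustration, not measured data.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def most_likely_volume(x, params):
    """Return the volume label whose fitted normal gives x the highest density.

    params maps a volume label to its (mu, sigma) pair.
    """
    return max(params, key=lambda v: normal_pdf(x, *params[v]))

# Hypothetical fitted parameters for two injection volumes.
params = {"1 ml": (0.0, 1.0), "2 ml": (1.0, 1.0)}
print(most_likely_volume(0.8, params))
```

This only names the best-fitting volume. It does not by itself give a probability for that choice.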

WWGD · May 8, 2019

Maybe Bayes?

Dale · May 8, 2019

WWGD said:

Maybe Bayes?

Good idea, particularly if there is some prior information about which volumes are more probable.

marcusl · May 8, 2019

I'm always in favor of Bayesian inference if you figure out how to do it. Gaussian distributions give you priors that are fairly easy to manipulate. If not, and because I'm pretty sure that your process is linear, then a minimum variance estimator is a good fallback. Look up BLUE (Best Linear Unbiased Estimator) as used in, e.g., "spectral unmixing."

uzi kiko · May 9, 2019

Thank you very much for the super fast and great answers!
They are very helpful.

Let's say I use MLE and end up with the most likely model.
For example, suppose that for an injection of 1 ml the values are distributed as X ~ N(0,1), and for 2 ml as X ~ N(1,1).

Now, after computing the MLE for the newly observed data, I end up with estimates of μ and σ; say, for example, μ = 1.1 and σ = 0.9.

My question is, how can I calculate the probability that the observed data came from N(1,1)?

marcusl · May 9, 2019

Assuming the problem is linear, applying linear regression to the mean values gives you a first-order estimate. In the example above, the estimated volume would be 2.1 ml (not N(1,1)), and confidence (probability) can be inferred from the correlation coefficient. You need to first verify from your data that your system is linear.
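
A rough sketch of that linear-calibration estimate, assuming the mean dielectric value really is linear in the injected volume: fit a line through the (volume, mean) pairs and invert it for a new observation. The helper names `fit_line` and `estimate_volume` are made up for this illustration, and the calibration points are the two example distributions from the question.

```python
def fit_line(volumes, means):
    """Ordinary least-squares fit of mean = slope * volume + intercept."""
    n = len(volumes)
    v_bar = sum(volumes) / n
    m_bar = sum(means) / n
    sxx = sum((v - v_bar) ** 2 for v in volumes)
    sxy = sum((v - v_bar) * (m - m_bar) for v, m in zip(volumes, means))
    slope = sxy / sxx
    intercept = m_bar - slope * v_bar
    return slope, intercept

def estimate_volume(x, slope, intercept):
    """Invert the calibration line to get a volume estimate for observation x."""
    return (x - intercept) / slope

# Example from the thread: 1 ml -> mean 0, 2 ml -> mean 1.
slope, intercept = fit_line([1.0, 2.0], [0.0, 1.0])
print(estimate_volume(1.1, slope, intercept))  # approximately 2.1 ml
```

Two calibration points always lie on a line, so checking linearity needs the means from at least three injection volumes.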

To get a true probability, you could try applying Bayesian inference. There are other tests as well, but someone versed in classical statistics would know better than I.

Stephen Tashi · May 9, 2019

uzi kiko said:

In my study, I inject a different amount of fluid in each experiment, such as 1 ml, 2 ml..., and test the change in the general dielectric properties of the solution.

Now suppose I get an X value of dielectric property, what is the correct way to estimate the probability that the value X came from a specific injection volume?

This is not a "well defined" mathematical question. (It's similar to asking how one finds the remaining sides of a triangle when given the length of one side and the size of one angle.)

One way to interpret "I get an X value of a dielectric property" is that you pick the X value at random from all the X values that you measured in your experiments, giving all the measured values an equal probability of being chosen. Let's suppose you did the same number of experiments with a 2 ml concentration as with the other concentrations.

Another way to interpret "I get an X value of a dielectric property" is that another person who did experiments tells you the X value. He picks this value at random from the experiments that he did. Perhaps he did many more tests using a 2 ml concentration than the other concentrations.

The answer to "the probability the value X came from the specific injection volume" depends on knowing the probability distribution for how the injection volumes are selected. If you can specify a distribution for how the injection volumes are selected (a "prior distribution"), then we can compute things like "the probability the injection volume is 2 ml given X = 3.1".
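
As a sketch of that computation, assuming each volume's measurements are normal with known (μ, σ) and that a prior over volumes has been chosen, Bayes' rule gives P(volume | X) proportional to prior(volume) times the normal density at X. The parameters below reuse the N(0,1) and N(1,1) example from earlier in the thread. The prior values and function names are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def posterior(x, params, priors):
    """Posterior probability of each volume given observation x (Bayes' rule).

    params maps volume label -> (mu, sigma); priors maps volume label -> P(volume).
    """
    weights = {v: priors[v] * normal_pdf(x, *params[v]) for v in params}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

params = {"1 ml": (0.0, 1.0), "2 ml": (1.0, 1.0)}

# Equal priors: every volume was equally likely to be injected.
print(posterior(3.1, params, {"1 ml": 0.5, "2 ml": 0.5}))

# Unequal priors (assumed): 2 ml injections are four times as common.
print(posterior(3.1, params, {"1 ml": 0.2, "2 ml": 0.8}))
```

The two calls differ only in the prior, which is the point: the same X gives different probabilities depending on how the injection volumes were selected.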
