Hypothesis testing of a complex deterministic model

In summary: hypothesis testing is a statistical tool for deciding whether data support a claimed correlation or functional relationship between variables. You formulate a null hypothesis, compute a test statistic from a sample of data, and compare it against a decision boundary (such as a confidence interval) to decide whether the null hypothesis is rejected or fails to be rejected.
  • #1
Pythagorean
What is a good test for hypothesis testing of a complex deterministic model?

What I'm doing:

I have a complex system of nonlinear differential equations. I must solve them numerically. There are about 22 input parameters that I have rough physiological ranges for and I use a genetic algorithm to determine possible solutions.

Once I have (numerical) solutions, I can compute abstract measures on the output data generated by the model (like wave width, rise time, fall time, etc.).
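Such summary measures are straightforward to compute from a numerical trace. A minimal sketch in Python (the function name, the Gaussian test pulse, and the 10% threshold are all illustrative assumptions, not the poster's actual code):

```python
import numpy as np

def wave_features(t, v, frac=0.1):
    """Extract simple waveform features from a single isolated pulse.

    t, v : time and signal arrays for one wave.
    frac : fraction of peak amplitude defining the wave's edges
           (10% here; an arbitrary illustrative choice).
    """
    v0 = v - v.min()                        # baseline-shift so minimum is 0
    peak = v0.max()
    above = np.where(v0 >= frac * peak)[0]  # sample indices above threshold
    width = t[above[-1]] - t[above[0]]      # wave width at the threshold
    ipeak = int(np.argmax(v0))
    rise_time = t[ipeak] - t[above[0]]      # threshold crossing -> peak
    fall_time = t[above[-1]] - t[ipeak]     # peak -> threshold crossing
    return width, rise_time, fall_time

# Example: a Gaussian-shaped test pulse
t = np.linspace(0, 10, 1001)
v = np.exp(-((t - 4.0) ** 2) / 0.5)
w, r, f = wave_features(t, v)
```

For a symmetric pulse like this one, rise time and fall time come out equal; on real action-potential traces they would differ.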

What would be a good statistical test to find correlations and functional relationships between the input parameters and these abstract output quantities?
 
  • #2
Pythagorean said:
What is a good test for hypothesis testing of a complex deterministic model?


Do you know what hypothesis testing is? What hypothesis do you wish to test?
 
  • #3
I'm familiar with the general idea of hypothesis testing (null vs. alternative), but I have no experience with it. I asked one of my professors what the best method for finding correlations between the outputs and the inputs would be, and he said he wasn't sure and that it sounded like hypothesis testing.

The set of hypotheses to test would basically be, for each output M and each set of inputs Ni:

M is correlated with input set Ni.

So, as an example, the wave width is correlated with the 3rd, 5th, and 6th inputs (but it might not be correlated with the sixth alone).
 
  • #4
Pythagorean said:
I'm familiar with the general idea of hypothesis testing (null vs. alternative), but I have no experience with it.

The general idea of hypothesis testing is that the null hypothesis must be specific enough so you can compute the probability of getting the data that you have, assuming the null hypothesis is true. You say that your model is deterministic. How would you compute a probability for any event?
 
  • #5
I will basically have a set of such hypotheses; here's a concrete example of one:

when [itex]V_1[/itex] is in the range [0, x], [itex]V_2[/itex] is (qualifier) correlated with [itex]\lambda[/itex].

And I will have thousands of trials with randomly selected [itex]V_1[/itex] and [itex]V_2[/itex] (normally distributed around experimentally determined values) for which I've simulated the deterministic model and produced an output, then measured [itex]\lambda[/itex].

So then to compute the probability, I guess I'd try Bayes' theorem first, conditioning a rise in [itex]\lambda[/itex] on the two independent inputs. So:

P(B) P(A|B) = P(A) P(B|A)
P(A) = P(C) P(D)

B = rise in lambda
C = [itex]V_1[/itex] is in [0, x]
D = rise in [itex]V_2[/itex]
A = C and D (assumed independent)
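A hedged sketch of how such probabilities could be estimated empirically by counting events across simulation trials. The toy formula generating lam is purely an illustrative stand-in for the actual simulated model, and the thresholds are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = 1.0   # upper end of the V1 range; arbitrary here

# Stand-in for the simulation pipeline: V1, V2 are sampled inputs and
# lam is a toy output -- NOT the poster's actual model
V1 = rng.normal(0.5, 0.3, n)
V2 = rng.normal(0.0, 1.0, n)
lam = 2.0 * V2 * (V1 > 0) + rng.normal(0.0, 0.5, n)

A = (V1 >= 0) & (V1 <= x) & (V2 > 0)   # A = C and D: V1 in [0, x], "rise" in V2
B = lam > 1.0                          # B: lambda exceeds a chosen threshold

# Empirical probabilities by counting events across trials
P_B = B.mean()
P_A = A.mean()
P_B_given_A = B[A].mean()              # fraction of A-trials that are also B
P_A_given_B = A[B].mean()              # fraction of B-trials that are also A

# Bayes' rule as a consistency check: P(A|B) = P(B|A) P(A) / P(B)
check = P_B_given_A * P_A / P_B
```

Counting like this makes the "archaic" approach described below concrete; the empirical conditionals satisfy Bayes' rule exactly by construction.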
 
  • #6
I'm still not sure how I'd compute probabilities though, I guess. Here's a rather archaic way I came up with:

For P(B), I could go through the data and find pairs of data points where lambda rose significantly*; each such pair would represent an event out of the total possible pairs. Then for P(A|B), I'd go through those events and see how many of the inputs met condition A. That seems like a lot of work, though, and I'm not sure it's an appropriate way to define an event.

Maybe it would be easier to find the probability of static values (B = lambda is a high value) rather than a change event...

Plotting particular inputs against particular outputs, I can clearly see some trends, but there's obviously no way for me to see all 22 inputs at once. I was hoping to find some sort of correlation test (like the t-test biologists use) when I originally asked my professor the question.

*(I'll have to define that too)
 
  • #7
What are you trying to do? Get new variables in which an approximate but simpler description is possible? Or make predictions for experimentalists to test?
 
  • #8
You will need to get a sample of data, figure out parameters that correspond to a specific set of hypotheses, and use test statistics to decide, via some decision boundary (think of a confidence interval), whether each hypothesis is rejected or fails to be rejected.

If you want to control the hypothesis tests in more detail, then you will need to look at power calculations and sample-size calculations, especially if you are trying to detect effects with minimal deviations from the actual parameters (i.e., detect a difference of delta, where delta is some small number).

If you are looking at means (which include proportions and probabilities), then with enough data you should be able to use a lot of asymptotic results (like the normal or t-distribution).

If you are looking at hypothesis testing for correlations or general effects, then you may need to look at regression modelling.

A simple correlation between two variables can be tested using simple linear regression: you basically use a t-statistic (since the correlation coefficient has a t-distribution under the null hypothesis), and you can construct a confidence interval and check whether it contains your hypothesized value.
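As a concrete sketch of that last point, here is how the t-statistic for a correlation coefficient relates to simple linear regression, using SciPy; the synthetic data are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200

# Illustrative correlated data (in practice: one input parameter vs. one output)
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.5, size=n)

res = stats.linregress(x, y)   # slope, intercept, rvalue, pvalue, stderr
r = res.rvalue

# t-statistic for H0: correlation = 0, with n - 2 degrees of freedom;
# this is the same test linregress uses for its two-sided p-value
t = r * np.sqrt((n - 2) / (1 - r**2))
p = 2 * stats.t.sf(abs(t), df=n - 2)
```

A small p-value rejects the null of zero correlation; the same t-distribution gives a confidence interval for the slope via `res.stderr`.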
 
  • #9
atyy said:
What are you trying to do? Get new variables in which an approximate but simpler description is possible? Or make predictions for experimentalists to test?

Well, neither technically, but effectively the latter. I have an experimental collaborator who has provided data, but different channel kinetics were taken from different animals and there's some inherent measurement error, so they don't play nice without adjusting some parameters, and so there's lots of degeneracy a la Eve Marder's "Multiple Models..." paper on the lobster ganglion (in case you are familiar with it).

I essentially want to capture (and quantify) these degeneracies.
 
  • #10
Pythagorean said:
Well, neither technically, but effectively the latter. I have an experimental collaborator who has provided data, but different channel kinetics were taken from different animals and there's some inherent measurement error, so they don't play nice without adjusting some parameters, and so there's lots of degeneracy a la Eve Marder's "Multiple Models..." paper on the lobster ganglion (in case you are familiar with it).

I essentially want to capture (and quantify) these degeneracies.

I'm a little familiar with Marder's work that multiple parameter regimes can produce the same behaviour. If you're pretty sure that the data fall within the model class, then is the aim to locate regions of parameter space which are consistent with the data?
 
  • #11
Yes, and I have done so manually (and found multiple regions!). I now want to see how the parameters affect wave (i.e., action potential) characteristics. Not all action potentials are the same in the neuron I'm modelling, so it would be of interest to be in a parameter regime where the system's sensitivity to parameters was physiologically relevant as well.

For now, though, I'd just like to see what particular (sets of) input parameters correlate well with particular characteristics of the output wave (it's, so far, a 7+ dimensional model with 20+ parameters, so doing this manually/visually is a bit difficult).
 
  • #12
chiro said:
If you are looking at hypothesis testing for correlations, or general effects then you may need to look at regression modelling.

Simple correlations between two variables can be done using simple linear regression and you basically use a t-statistic (since you have a t-distribution for the correlation coefficient) and you can get a confidence interval in which you can see if your hypothesis contains that interval or doesn't.

This (first paragraph) seems most like what I'm after. I'm rather ignorant of statistics/probability in general (I've always done things deterministically). Dice rolls, coin flips, and card draws are about as sophisticated as I ever got in my undergrad.

As for the second paragraph: unfortunately, comparing a single input to a single output will have problems, since a single input will behave differently when the other inputs are in other parameter regimes. So I will somehow have to consider all inputs at once (or at least several at once, if the complexity is too great with all inputs) for each output.
 
  • #13
Pythagorean said:
Yes, and I have done so manually (and found multiple regions!). I now want to see how the parameters affect wave (i.e., action potential) characteristics. Not all action potentials are the same in the neuron I'm modelling, so it would be of interest to be in a parameter regime where the system's sensitivity to parameters was physiologically relevant as well.

For now, though, I'd just like to see what particular (sets of) input parameters correlate well with particular characteristics of the output wave (it's, so far, a 7+ dimensional model with 20+ parameters, so doing this manually/visually is a bit difficult).

To start, maybe you could set all parameters except 2 at fixed values in the physiological range. Then you'll have a 2D parameter space remaining, and you can classify points in that cross section as having or not having the characteristics?
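A minimal sketch of that cross-section scan. The `model_output` function is a hypothetical stand-in for the real ODE solve, and the classification threshold of 1.2 is arbitrary:

```python
import numpy as np

# Hypothetical stand-in for the full model: maps a 22-parameter vector
# to one output feature. A toy function, NOT the actual ODE system.
def model_output(params):
    return np.sin(params[2]) * params[5] + 0.1 * params.sum()

base = np.full(22, 0.5)            # fix all parameters at mid-range values
p3 = np.linspace(0.0, 1.0, 50)     # sweep parameter 3
p6 = np.linspace(0.0, 1.0, 50)     # sweep parameter 6

grid = np.empty((50, 50))
for i, a in enumerate(p3):
    for j, b in enumerate(p6):
        p = base.copy()
        p[2], p[5] = a, b
        grid[i, j] = model_output(p)

# Classify each point of the 2D cross section as having the feature or not
has_feature = grid > 1.2
```

The boolean grid can then be imaged (e.g. with `plt.imshow`) to visualize which regions of the cross section show the characteristic.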

I'm not sure this is relevant, but have you seen http://www.cs.nyu.edu/~roweis/lle/papers/lleintro.pdf ?
 
  • #14
atyy said:
To start, maybe you could set all parameters except 2 at fixed values in the physiological range. Then you'll have a 2D parameter space remaining, and you can classify points in that cross section as having or not having the characteristics?

I'm not sure this is relevant, but have you seen http://www.cs.nyu.edu/~roweis/lle/papers/lleintro.pdf ?

I suppose I was hoping to avoid such a laborious effort.

I was thinking more about correlation than existence (as we increase this parameter, the width of the action potential increases).

The article looks interesting and sounds like it's relevant. Not sure how much time and energy I want to invest into this since it's somewhat of an aside from the core research questions.
 
  • #15
Pythagorean said:
I was thinking more about correlation than existence (as we increase this parameter, the width of the action potential increases).

Could you colour code the points in the 2D plane according to the width of the action potential?

Pythagorean said:
The article looks interesting and sounds like it's relevant. Not sure how much time and energy I want to invest into this since it's somewhat of an aside from the core research questions.

I'm not sure hypothesis testing is appropriate, because what you have is supposed to be the hypothesis (actually hypotheses, since it seems to be a model class) without systematic error or measurement noise. Usually hypothesis testing is what one does if one has noisy data and wants to see if that data is consistent with the hypothesis.

So it seems to me more like trying to figure out which 2D planes you should look at, and hoping there's a simple structure in those planes. If all the parameters in the physiological regime fall on a manifold that is less than 22D, then LLE may help in discovering the variables that coordinatize the lower dimensional manifold (and you'll have fewer 2D cross sections to look at), but I agree it's not guaranteed to work, and seems rather blind, whereas a well motivated biological guess might be better.
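For what it's worth, LLE is available off the shelf, e.g. in scikit-learn. A toy sketch in which 22-D "parameter vectors" are constructed to lie on a 2-D manifold (the data-generating step is entirely made up for illustration):

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(2)

# Made-up data: 22-D vectors that secretly live on a 2-D manifold
# (two latent coordinates pushed through a fixed linear map and a nonlinearity)
latent = rng.uniform(-1, 1, size=(500, 2))
mix = rng.normal(size=(2, 22))
X = np.tanh(latent @ mix)              # 500 samples, 22 dimensions

# LLE looks for low-dimensional coordinates that preserve
# each point's local neighbourhood geometry
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
Y = lle.fit_transform(X)               # recovered 2-D coordinates
```

Whether the physiological parameter regime actually lies on such a low-dimensional manifold is, as noted above, not guaranteed.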
 
  • #16
Hrmm... I think I agree that hypothesis testing doesn't quite fit.

I think the ideal would be something like the t-test, but multivariate.
 
  • #17
Maybe I understand what you want to do. It seems reasonable.

You treat your model as generating data probabilistically, by encoding your subjective expert opinion as to which parameter ranges are plausible, i.e., you subjectively and expertly define P(B1, ..., B22), where B1 to B22 are the parameters.

Then by inspecting the "data" you think that action potential width W is approximately a linear function of some parameter A(B1, ..., B22), i.e., you hypothesize W = MA + C.

So first generate data pairs {A, W}, where A is drawn randomly according to P.

Then with your data pairs you can just do linear regression on {A, W}. If you get a high r value, then the hypothesis W = MA + C explains a lot of the variance in your "data".
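That recipe can be sketched end to end. Everything here, the prior over the B's, the choice of A, and the toy relation generating W, is an illustrative stand-in for the actual simulation pipeline:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 1000

# B1..B22 drawn from the "expert" prior P (illustrative normal ranges)
B = rng.normal(loc=1.0, scale=0.2, size=(n, 22))

# Hypothetical derived parameter A(B1, ..., B22) and model output W;
# in reality W would come from numerically solving the ODE system
A = B[:, 2] * B[:, 5]                        # e.g. a product of two parameters
W = 3.0 * A + 0.5 + rng.normal(0, 0.1, n)    # toy "model" with W = M*A + C + noise

res = stats.linregress(A, W)
# res.rvalue**2 close to 1 means W = M*A + C explains most of the
# variance in the generated "data"
```

Here the fitted slope and intercept recover the toy model's M = 3.0 and C = 0.5.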
 
  • #18
Yes, that's essentially it, though I throw out some sets A when they lead to a steady-state solution (I only want limit-cycle behavior). That could throw things off quantitatively, but I'm mostly interested in qualitative trends, so it would probably be OK.

I might also hypothesize non-linear relationships between W and A, but it might be simpler to just go with a linear assumption since that is enough to say that W is correlated with some set of B's.

So linear regression is the test I should familiarize myself with? I've heard of it a lot and have a vague notion of what it is; there seem to be plentiful resources for it.
 
  • #19
Yes, start with linear regression. If you're using MATLAB, I think it's something like corrcoef to get the r-value (and the p-value), or polyfit for the fit coefficients. There are technical assumptions behind linear regression, but if your data look linear by visual inspection, then try it first and worry about the technicalities later (OK, I'm an experimentalist, maybe you shouldn't do that; I'm sure one of the proper stats people here can tell you the correct procedure ...)
 
  • #20
Note that in the regression framework, you can make the models as complex as you want and test for all kinds of interaction terms across many variables (as opposed to just a simple correlation coefficient between two variables).

If you want to understand these, take a look at interaction effects in regression modelling for n-way effects (where n is the number of variables involved in the specific interaction).
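A minimal sketch of fitting a 2-way interaction term with ordinary least squares (the data are synthetic; in practice a statistics package would also report standard errors and p-values for each coefficient):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Toy data with a genuine 2-way interaction (the x1*x2 term)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + 0.8 * (x1 * x2) + rng.normal(0, 0.2, n)

# Design matrix: intercept, main effects, and the interaction term
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# beta recovers roughly [1.0, 0.5, -0.3, 0.8]; a near-zero interaction
# coefficient would indicate no detectable 2-way effect
```

Higher-order (n-way) interactions extend the same idea with additional product columns in the design matrix.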
 

What is a complex deterministic model?

A complex deterministic model is a mathematical representation of a system or process that follows deterministic rules, meaning that the output is completely determined by the input and there is no randomness involved. It often involves a large number of variables and equations, making it difficult to analyze and understand without the use of mathematical techniques.

Why is hypothesis testing important for complex deterministic models?

Hypothesis testing allows scientists to make informed decisions about the validity of their complex deterministic models. It helps determine whether the model accurately represents the real-world system and can be used to make predictions and draw conclusions.

What are the steps involved in hypothesis testing of a complex deterministic model?

The steps involved in hypothesis testing of a complex deterministic model include formulating a research question, setting up the null and alternative hypotheses, selecting an appropriate statistical test, collecting data, analyzing the results, and interpreting the findings.
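Those steps can be illustrated with a simple one-sample t-test (the data here are simulated, and the null value and significance level are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Steps 1-2: research question and hypotheses.
#   H0: the population mean of the measured quantity equals 5.0
#   H1: it does not equal 5.0
mu0 = 5.0

# Steps 3-4: choose a one-sample t-test and collect data
# (simulated here, with the true mean deliberately offset to 5.3)
data = rng.normal(loc=5.3, scale=0.5, size=40)

# Step 5: analyze
t_stat, p_value = stats.ttest_1samp(data, mu0)

# Step 6: interpret at a 5% significance level
reject_H0 = p_value < 0.05
```

The same pattern, state hypotheses, pick a test, compute a statistic, compare against a threshold, carries over to the more complex regression-based tests discussed above.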

What are some challenges of hypothesis testing for complex deterministic models?

Some challenges of hypothesis testing for complex deterministic models include the need for large amounts of data, the potential for overfitting the model to the data, and the difficulty in determining causality due to the complex nature of the model.

How can scientists ensure the validity of their hypothesis testing for complex deterministic models?

To ensure the validity of their hypothesis testing for complex deterministic models, scientists can use a combination of statistical techniques, perform sensitivity analyses, and seek peer review and replication of their results. It is also important to carefully consider the assumptions and limitations of the model and the statistical tests used.
