# Hypothesis testing of a complex deterministic model

1. Jul 29, 2013

### Pythagorean

What is a good test for hypothesis testing of a complex deterministic model?

What I'm doing:

I have a complex system of nonlinear differential equations. I must solve them numerically. There are about 22 input parameters that I have rough physiological ranges for and I use a genetic algorithm to determine possible solutions.

Once I have (numerical) solutions, I can make abstract analyses on the output data generated by the model (like wave width, rise-time, fall-time, etc).

What would be a good statistical test for finding correlations and functional relationships between the input parameters and these abstract output quantities?

2. Jul 29, 2013

### Stephen Tashi

Do you know what hypothesis testing is? What hypothesis do you wish to test?

3. Jul 29, 2013

### Pythagorean

I'm familiar with the general idea of hypothesis testing (null vs. alternative), but I have no experience with it. I asked one of my professors what the best method for finding correlations between the outputs and the inputs would be, and he said he wasn't sure and that it sounded like hypothesis testing.

The set of hypotheses to test would basically be, for each output M and for each set of inputs Ni:

M is correlated with input set Ni.

So, as an example, the wave width is correlated with the 3rd, 5th, and 6th inputs. (But it might not be correlated with just the sixth alone).

4. Jul 29, 2013

### Stephen Tashi

The general idea of hypothesis testing is that the null hypothesis must be specific enough so you can compute the probability of getting the data that you have, assuming the null hypothesis is true. You say that your model is deterministic. How would you compute a probability for any event?

5. Jul 29, 2013

### Pythagorean

I will basically have a set of such hypotheses, here's a concrete example of one:

when $V_1$ is in the range [0 x], $V_2$ is (qualifier) correlated with $\lambda$.

And I will have thousands of trials with randomly selected $V_1$ and $V_2$ (normally distributed around experimentally determined values) for which I've simulated the deterministic model and produced an output, then measured $\lambda$.

So then to compute the probability, I guess I'd try Bayes' theorem first, conditioning a rise in $\lambda$ on the two independent inputs. So:

P(B) P(A|B) = P(A) P(B|A)
P(A) = P(C) P(D)

where

A = C and D (both input conditions hold)
B = rise in $\lambda$
C = $V_1$ is in $[0, x]$
D = rise in $V_2$

Last edited: Jul 29, 2013
6. Jul 29, 2013

### Pythagorean

I'm still not sure how I'd compute probabilities though, I guess. Here's a rather archaic way I came up with:

For P(B), I could go through the data and find pairs of data points where lambda rose significantly*; each such pair would represent an event out of the total possible pairs. Then, for P(A|B), I could go through those events and see how many of them had inputs meeting condition A. That seems like a lot of work, though, and I'm not sure it's an appropriate way to define an event.

Maybe it would be easier to find the probability of static values (B = lambda is a high value) rather than a change event...

Plotting particular inputs against particular outputs, I can clearly see some trends, but there's obviously no way for me to see all 22 inputs at once. I was hoping to find some sort of correlation test (the way biologists use a t-test) when I originally asked my prof the question.

*(I'll have to define that too)
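The counting approach above can be sketched directly with boolean masks, so no pairwise bookkeeping is needed. Everything here is a hypothetical stand-in: the toy rule producing $\lambda$ from $V_1$ and $V_2$, the bound `x`, and the "high lambda" threshold are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 10_000

# Hypothetical stand-in for the simulated trials: V1, V2 drawn around
# "experimentally determined" values, lambda produced by a toy rule.
V1 = rng.normal(0.5, 0.2, n_trials)
V2 = rng.normal(1.0, 0.3, n_trials)
lam = 0.8 * V2 * (V1 > 0.0) + rng.normal(0.0, 0.05, n_trials)

x = 1.0          # hypothetical upper bound for the V1 condition
threshold = 1.0  # hypothetical cutoff for "lambda is high"

B = lam > threshold                       # event B: lambda is high
A = (V1 >= 0.0) & (V1 <= x) & (V2 > 1.0)  # event A = C and D

P_B = B.mean()                         # fraction of trials where B occurred
P_A_given_B = (A & B).sum() / B.sum()  # of the B trials, fraction meeting A
P_B_given_A = (A & B).sum() / A.sum()  # what Bayes' theorem recovers
```

With events defined as masks over the trials, conditional probabilities are just ratios of counts, and Bayes' theorem, P(B|A) = P(B) P(A|B) / P(A), serves as a consistency check.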

7. Jul 30, 2013

### atyy

What are you trying to do? Get new variables in which an approximate but simpler description is possible? Or make predictions for experimentalists to test?

8. Jul 30, 2013

### chiro

You will need to get a sample of data, estimate parameters that correspond to a specific set of hypotheses, compute test statistics for those hypotheses, and then decide, using some decision boundary (think of a confidence interval), whether each hypothesis is rejected or fails to be rejected.

If you want to control the hypothesis tests in more detail, then you will need to look at power calculations and sample-size calculations, especially if you are trying to detect effects with minimum deviations from the actual parameters (i.e. detect a difference of delta, where delta is some small number).
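As a concrete sketch of such a sample-size calculation, here is the standard two-sided z-test formula with known sigma; the significance level and power defaults are the conventional 0.05 and 0.8, and the numbers are purely illustrative:

```python
import math
from scipy.stats import norm

def sample_size(delta, sigma, alpha=0.05, power=0.8):
    """Trials needed to detect a mean shift of `delta` with a two-sided
    z-test at significance `alpha` and the given power (known sigma)."""
    z_alpha = norm.ppf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = norm.ppf(power)               # quantile for the desired power
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)
```

For example, detecting a shift of half a standard deviation at 80% power needs `sample_size(0.5, 1.0) == 32` trials, and halving delta roughly quadruples the required sample.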

If you are typically looking at means (which includes proportions and probabilities), then with enough data you should be able to use a lot of asymptotic results (like the normal or t-distribution).

If you are looking at hypothesis testing for correlations, or general effects then you may need to look at regression modelling.

Simple correlations between two variables can be tested using simple linear regression: you basically use a t-statistic (since the correlation coefficient has a t-distribution), and you can get a confidence interval and check whether it contains your hypothesized value.
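A minimal sketch of that correlation test in Python: scipy's `linregress` reports the correlation coefficient r and the two-sided p-value from exactly this t-statistic. The data here are synthetic stand-ins for one input parameter and one output measure.

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)

# Synthetic stand-ins: one input parameter and one output measure.
param = rng.normal(0.0, 1.0, 200)
width = 2.0 * param + rng.normal(0.0, 0.5, 200)

res = linregress(param, width)
# res.rvalue is the correlation coefficient; res.pvalue is the two-sided
# p-value for the null hypothesis that the slope (and correlation) is zero.
reject_null = res.pvalue < 0.05
```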

9. Jul 30, 2013

### Pythagorean

Well, neither, technically, but effectively the latter. I have an experimental collaborator who has provided data, but different channel kinetics were taken from different animals and there's some inherent measurement error, so they don't play nicely together without adjusting some parameters, and so there's a lot of degeneracy, a la Eve Marder's "Multiple models..." paper on the lobster ganglion (in case you are familiar with it).

I essentially want to capture (and quantify) these degeneracies.

10. Jul 30, 2013

### atyy

I'm a little familiar with Marder's work that multiple parameter regimes can produce the same behaviour. If you're pretty sure that the data fall within the model class, then is the aim to locate regions of parameter space which are consistent with the data?

Last edited: Jul 30, 2013
11. Jul 30, 2013

### Pythagorean

Yes, and I have done so manually (and found multiple regions!). I now want to see how the parameters affect the wave (i.e. action potential) characteristics. All action potentials are not the same in the neuron I'm modelling, so it would be of interest to be in a parameter regime where the system's sensitivity to parameters is physiologically relevant as well.

For now though, I'd just like to see what particular (sets of) input parameters correlate well with particular characteristics of the output wave (it's, so far, a 7+ dimensional model with 20+ parameters, so doing this manually/visually is a bit difficult).

12. Jul 30, 2013

### Pythagorean

This (first paragraph) seems most like what I'm after. I'm rather ignorant of statistics/probability in general (I've always done things deterministically). Dice rolls, coin flips, and card draws are about as sophisticated as I ever got in my undergrad.

As for the second paragraph, unfortunately, comparing a single input to a single output will have problems, since the single input will behave differently when other inputs are in other parameter regimes, so I will somehow have to consider all inputs at once (or at least several at once if the complexity is too great with all inputs) for each output.

13. Jul 30, 2013

### atyy

To start, maybe you could set all parameters except 2 at fixed values in the physiological range. Then you'll have a 2D parameter space remaining, and you can classify points in that cross section as having or not having the characteristics?
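A sketch of that cross-section idea, with a made-up classifier standing in for the actual ODE simulation (the ranges, grid resolution, and the `has_characteristic` rule are all hypothetical):

```python
import numpy as np

def has_characteristic(p1, p2):
    # Hypothetical placeholder: the real version would solve the ODE system
    # with the other 20 parameters fixed and report whether the output
    # shows the characteristic of interest.
    return p1 + p2 > 1.0

p1_vals = np.linspace(0.0, 1.0, 50)  # physiological range of parameter 1
p2_vals = np.linspace(0.0, 1.0, 50)  # physiological range of parameter 2
grid = np.array([[has_characteristic(a, b) for b in p2_vals] for a in p1_vals])
# `grid` is a boolean 50x50 map of the 2D cross section.
```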

I'm not sure this is relevant, but have you seen http://www.cs.nyu.edu/~roweis/lle/papers/lleintro.pdf ?

Last edited: Jul 30, 2013
14. Jul 30, 2013

### Pythagorean

I suppose I was hoping to avoid such a laborious effort.

I was thinking more about correlation than existence (as we increase this parameter, the width of the action potential increases).

The article looks interesting and sounds like it's relevant. Not sure how much time and energy I want to invest into this since it's somewhat of an aside from the core research questions.

15. Jul 30, 2013

### atyy

Could you colour code the points in the 2D plane according to the width of the action potential?

I'm not sure hypothesis testing is appropriate, because what you have is supposed to be the hypothesis (actually hypotheses, since it seems to be a model class) without systematic error or measurement noise. Usually hypothesis testing is what one does if one has noisy data and wants to see if that data is consistent with the hypothesis.

So it seems to me more like trying to figure out which 2D planes you should look at, and hoping there's a simple structure in those planes. If all the parameters in the physiological regime fall on a manifold of dimension less than 22, then LLE may help in discovering the variables that coordinatize the lower-dimensional manifold (and you'll have fewer 2D cross sections to look at). But I agree it's not guaranteed to work and seems rather blind; a well-motivated biological guess might be better.
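For reference, scikit-learn ships an LLE implementation. A toy sketch on synthetic 22-D "parameter sets" that actually lie near a 2-D plane (the data, dimensions, and neighbour count are all made up for illustration):

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(2)

# Synthetic "accepted parameter sets": 22-D points near a 2-D linear manifold.
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 22))
params = latent @ mixing + 0.01 * rng.normal(size=(300, 22))

lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10)
coords = lle.fit_transform(params)  # 2-D coordinates on the recovered manifold
```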

16. Jul 30, 2013

### Pythagorean

Hrmm... I think I agree that hypothesis testing doesn't quite fit.

I think the ideal would be something like the T-test but multivariate.

17. Jul 30, 2013

### atyy

Maybe I understand what you want to do. It seems reasonable.

You treat your model as generating data probabilistically, by defining, as your subjective expert opinion, which parameter ranges are plausible, i.e. you subjectively and expertly define P(B1, ..., B22), where B1 to B22 are the parameters.

Then, by inspecting the "data", you think that the action potential width W is approximately a linear function of some combination of the parameters, A(B1, ..., B22), i.e. you hypothesize W = MA + C.

So first generate data pairs {A,W} where A is drawn randomly according to P.

Then with your data pairs you can just do linear regression on {A,W}. If you get a high r value, then the hypothesis W=MA+C describes a lot of variance in your "data".
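The procedure above might be sketched like this; the distribution P, the combination A, and the rule producing W are all hypothetical stand-ins for the real expert-chosen distribution and simulator:

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(3)
n = 1000

# Draw 22 parameters per trial from the (assumed) plausible distribution P.
B = rng.normal(loc=1.0, scale=0.1, size=(n, 22))

A = B[:, 2] + 0.5 * B[:, 5]                   # hypothesized combination A(B1..B22)
W = 3.0 * A + 1.0 + rng.normal(0.0, 0.05, n)  # toy stand-in for measured width

res = linregress(A, W)
# A high res.rvalue means the hypothesis W = M*A + C describes a lot of
# the variance in the generated "data".
```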

18. Jul 30, 2013

### Pythagorean

Yes, that's essentially it. Though I throw out some parameter sets A when they lead to a steady-state solution (I only want limit-cycle behavior). That could throw off the quantitative results, but I'm mostly interested in qualitative trends, so it would probably be OK.

I might also hypothesize non-linear relationships between W and A, but it might be simpler to just go with a linear assumption since that is enough to say that W is correlated with some set of B's.

So linear regression is the test I should familiarize myself with? I've heard of it a lot and have a vague notion of what it is; there seem to be plentiful resources for it.

19. Jul 30, 2013

### atyy

Yes, start with linear regression. If you're using MATLAB, I think it's something like corrcoef to get the r value (it can also return the p value), or polyfit for the fit coefficients. There are technical assumptions behind linear regression, but if your data looks linear by visual inspection, try it first and worry about the technicalities later. (OK, I'm an experimentalist, maybe you shouldn't do that; I'm sure one of the proper stats guys here can tell you the correct procedure ...)

20. Jul 30, 2013

### chiro

Note that in the regression framework, you can make the models as complex as you want and test for all kinds of interaction terms across many variables (as opposed to just a simple correlation coefficient between two variables).

If you want to understand these, take a look at interaction effects in regression modelling for n-way effects (where n is the number of variables involved in the specific interaction).
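A minimal sketch of a two-way interaction in regression: the design matrix simply gains a product column, and its fitted coefficient estimates the interaction effect. The data and coefficients below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500

b1 = rng.normal(size=n)
b2 = rng.normal(size=n)
# Toy output with two main effects and a 2-way interaction b1*b2.
W = 1.0 + 0.5 * b1 - 0.3 * b2 + 2.0 * b1 * b2 + rng.normal(0.0, 0.1, n)

# Design matrix columns: intercept, b1, b2, interaction term.
X = np.column_stack([np.ones(n), b1, b2, b1 * b2])
coef, *_ = np.linalg.lstsq(X, W, rcond=None)
# coef[3] estimates the interaction effect (true value 2.0 in this toy setup).
```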