# Statistics of Stability?

1. Jul 24, 2010

### starrymirth

Hi,

I am curious about how to describe the stability of a graph using some form of bona fide statistical analysis.
I unfortunately have very little statistical background.

The data come from a Computer Science research project I am working on. We are attempting to get simulated cellular automata to arrange themselves into specific shapes. We have a "fitness function" that measures how well an automaton fits the target shape: the closer the function gets to 0, the better.

We evaluate the fitness at step 100 of the simulation as the measure of how "good" the automaton is.

Now, we are attempting to classify the automaton, and I am trying to determine stability. We have run the simulations for 1000 steps and evaluated the fitness at each step. We can therefore plot a graph of fitness vs time.
We don't much care if the graph varies widely in the first steps, but after step 100 we want it to remain stable.

Currently I have it calculate the standard deviation, and then count how many steps pass until the fitness moves outside a band of one standard deviation (or a fraction of one) above and below the fitness value at step 100. That count is then its stability 'score'.
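For concreteness, the current method could be sketched like this. This is a minimal sketch, not the project's actual code: the function name is mine, and I'm assuming the standard deviation is computed over the post-step-100 values (the thread doesn't say which window is used).

```python
import statistics

def stability_score(fitness, eval_step=100, k=1.0):
    """Count how many steps after eval_step the fitness stays within
    k standard deviations of its value at eval_step.

    fitness:   list of fitness values, one per simulation step.
    eval_step: the step whose fitness defines the centre of the band.
    k:         width of the band in standard deviations (may be fractional).
    """
    baseline = fitness[eval_step]
    # Assumption: the SD is taken over the tail from eval_step onward.
    sd = statistics.pstdev(fitness[eval_step:])
    lo, hi = baseline - k * sd, baseline + k * sd
    score = 0
    for value in fitness[eval_step + 1:]:
        if not (lo <= value <= hi):
            break  # fitness left the band; stop counting
        score += 1
    return score
```

A perfectly flat run after step 100 gets the maximum score (every remaining step), while a run that leaves the band immediately scores 0.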

However, I am concerned that the nature of these automata will skew the standard deviation.

For instance:
>>I have one gene that reaches a good value at step 100 and then promptly dies off completely (sending its fitness back up to maximum for the remaining 900 steps), making it rather useless and, in a way, 'unstable'.
>>Another gene "blinks" on and off at each step: also unstable, it alternates between minimum and maximum fitness values at every step.

Are there any methods in statistics that would help in evaluating their stability?
Also, as my statistics is pretty shaky, a method that is common-sense-y would score bonus points. Or is the method I am using an acceptable one?

Thanks
Laura
2nd yr BSc (Physics, Mathematics, Computer Science)

2. Jul 29, 2010

### SW VandeCarr

From what I know about GAs, statistical issues come in with the sampling phase (non-random filtering through a fitness function) and possibly in evaluating the effects of mutations and crossovers. The latter would seem to be where instabilities arise. Depending on the fitness function, two or more competing phenotypes may be selected. This can lead to prolonged instability unless you modify the fitness function to favor one phenotype. (I'm assuming groupings of similar genotypes will tend to cluster as a selected phenotype in the filtering process.)

Maximization of the fitness function via feedback loops is one way to avoid arbitrary manual interference, which sort of defeats the idea behind GAs. There's some literature on this and I'll try to find a free link if you're interested.

In terms of standard statistical techniques like using the standard deviation as some kind of cutoff, I would think that shouldn't be necessary if the fitness algorithm is well designed, but perhaps I'm misunderstanding the problem. For example, a fit phenotype (or genotype) should continue to be selected, not die off over successive generations. Perhaps there's a mismatch between your stated goals and the fitness function.

Last edited: Jul 29, 2010
3. Jul 29, 2010

### CRGreathouse

I think the real problem is the fitness function. Rather than using your current fitness function, consider checking the minimum fitness over k generations, say 6 (so check 995 to 1000). For a more extreme function, you could give a score of 0 to configurations that change between 999 and 1000...
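As I read this suggestion (the function names are mine, and since in this thread lower fitness is better, "worst" means the largest value in the window, and "maximally unfit" replaces the post's "score of 0"):

```python
def worst_of_window(fitness, k=6):
    """Score a run by its worst fitness over the final k steps.

    Lower fitness is better in this setup, so the worst case in the
    window is the largest value; a run that is only briefly good gets
    penalised, while a stable run keeps its good score.
    """
    return max(fitness[-k:])

def penalise_last_step_change(fitness, max_fitness):
    """The more extreme variant: treat any run whose fitness still
    changes between the last two steps as maximally unfit."""
    if fitness[-1] != fitness[-2]:
        return max_fitness
    return max(fitness[-6:])
```

The "blinking" gene from the first post is caught immediately by the second variant, since its fitness always differs between consecutive steps.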

4. Aug 3, 2010

### starrymirth

We do not yet know the correlation between phenotype and genotype, and so are trying to figure out what sections of the genetic code (the string of 1's and 0's) actually produce the behaviour we want.

At this stage, they are not generational - they are individuals.

What we do is take a randomly generated population, allow it to develop, irradiate it from time to time (change a few 0's to 1's), and run for a pre-defined number of steps. Then we take the best gene in that entire population (one individual) and perform this analysis of its lifetime.
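The pipeline described above could be sketched roughly like this. Everything here is an assumption for illustration: the parameters, the all-ones target, and `fitness_of` are hypothetical stand-ins, not the project's actual fitness function.

```python
import random

def run_experiment(pop_size=50, genome_len=64, steps=1000,
                   mutation_rate=0.01, seed=0):
    """Sketch: random bit-string population, occasional 'irradiation'
    (bit flips), then the best (lowest-fitness) individual is kept
    for lifetime analysis."""
    rng = random.Random(seed)

    def fitness_of(genome):
        # Hypothetical placeholder for the real fitness function:
        # distance from an all-ones target, so 0 is best.
        return genome.count(0)

    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(steps):
        for genome in population:
            for i in range(genome_len):
                if rng.random() < mutation_rate:
                    genome[i] ^= 1  # irradiate: flip a bit
    return min(population, key=fitness_of)
```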

We're hoping to find some form of convergent evolution. Once we have several best genes, we analyse them, try to classify them, and attempt to match the genes to functions.

Unfortunately I am not at liberty to change the fitness function. I'm pretty sure the way the function is written is sound (I only joined the project recently), but it sounds like the sampling is causing an artificial environment. I will speak to my Prof-in-charge again, about evaluating fitness randomly/several steps in a row, but I think re-designing the way the fitness is measured won't happen soon. I am also the youngest and second newest member on the team, and so challenging the past year of work is... ill-advised.

I think that now, having measured the fitness at each step for the best genes, the stability is the next level of natural selection. A very low stability score will indicate to us that the gene should be discarded.