Statistics of Stability?

In summary, Laura is trying to determine the stability of a graph of fitness vs time, but is concerned about the instability of the automata. She is looking for help from statistics.
  • #1
starrymirth
Hi,

I am curious about how to describe the stability of a graph using some form of bona fide statistical analysis.
I unfortunately have very little statistical background.


The data come from a Computer Science research project I am working on. We are attempting to use simulated cellular automata to arrange themselves in specific shapes. We have a "fitness function" that determines how "well" it fits the shape. The closer the function gets to 0, the better.

We evaluate the fitness at step 100 of the simulation as the measure of how "good" the automaton is.

Now, we are attempting to classify the automaton, and I am trying to determine stability. We have run the simulations for 1000 steps and evaluated the fitness at each step. We can therefore plot a graph of fitness vs time.
We don't much care if the graph varies widely in the first steps, but after step 100 we want it to remain stable.

Currently my code calculates the standard deviation, then counts how many steps pass before the fitness leaves the range of one standard deviation (or a fraction of one) above and below the fitness value at step 100. That count is then its stability 'score'.
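
A minimal sketch of the scoring rule described above, for readers who want something concrete. It assumes the trace is a plain Python list where `fitness[i]` is the fitness at step `i`, and that a single standard deviation is taken over the whole run; the post does not say which points go into the standard deviation, so that choice, along with the names `stability_score`, `ref_step` and `k`, is a guess.

```python
import statistics

def stability_score(fitness, ref_step=100, k=1.0):
    """Count the steps after ref_step that stay inside the band
    fitness[ref_step] +/- k * SD, stopping at the first exit."""
    sd = statistics.pstdev(fitness)   # assumption: one SD over the entire trace
    ref = fitness[ref_step]
    for count, value in enumerate(fitness[ref_step + 1:]):
        if abs(value - ref) > k * sd:
            return count              # number of steps that stayed in the band
    return len(fitness) - ref_step - 1   # the trace never left the band


# Hypothetical trace: good until step 110, then jumps to a poor value.
trace = [0.5] * 111 + [5.0] * 50
print(stability_score(trace))
```

Setting `k` below 1 gives the "fraction of a standard deviation" variant.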

However, I am concerned that the nature of these automata will skew the standard deviation.

For instance:
>>I have one gene that reaches a good value at step 100 and then promptly dies off completely (thus sending his fitness back up to maximum for the remaining 900 steps) and making him rather un-useful, and in a way, 'unstable'.
>>Another gene "blinks" on and off at each step - also unstable, he alternates between minimum and maximum fitness values at each step.
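
To make the concern concrete, here is a small sketch with made-up traces for the two cases above plus a well-behaved one. The fitness range of 0 to 1, the linear settling phase and all variable names are assumptions for illustration only, not data from the project.

```python
import statistics

MAX_FIT, MIN_FIT = 1.0, 0.0   # hypothetical fitness range; 0 is best

# Hypothetical traces, list index = simulation step (0..1000).
settle = [MAX_FIT * (1 - s / 100) for s in range(101)]   # improves until step 100
traces = {
    "stable": settle + [MIN_FIT] * 900,                  # stays good after step 100
    "dies": settle + [MAX_FIT] * 900,                    # dies right after step 100
    "blinker": [MAX_FIT if s % 2 else MIN_FIT for s in range(1001)],  # flips each step
}

# Whole-run standard deviation of each trace.
sds = {name: statistics.pstdev(trace) for name, trace in traces.items()}
for name, sd in sds.items():
    print(f"{name:8s} whole-run SD = {sd:.3f}")
```

With these made-up traces the blinker's SD comes out at about half the fitness range, so a band of two SDs around any reference value would cover the whole range and the blinker would never leave it. The stable gene and the dying gene end up with the same whole-run SD, which suggests the SD alone cannot tell them apart.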

Are there any methods in statistics that would help in evaluating their stability?
Also, as my statistics is pretty shaky, a method that is common-sense-y would score bonus points. Or is the method I am using an acceptable one?

Thanks
Laura
2nd yr BSc (Physics, Mathematics, Computer Science)
 
  • #2
From what I know about GAs, statistical issues come in with the sampling phase (non-random filtering through a fitness function) and possibly in evaluating the effects of mutations and crossovers. The latter would seem to be where instabilities arise. Depending on the fitness function, two or more competing phenotypes may be selected. This can lead to prolonged instability unless you modify the fitness function to favor one phenotype. (I'm assuming groupings of similar genotypes will tend to cluster as a selected phenotype in the filtering process.) Maximization of the fitness function via feedback loops is one way to avoid arbitrary manual interference, which sort of defeats the idea behind GAs. There's some literature on this and I'll try to find a free link if you're interested.

In terms of standard statistical techniques like using the standard deviation as some kind of cutoff, I would think that shouldn't be necessary if the fitness algorithm is well designed, but perhaps I'm misunderstanding the problem. For example, a fit phenotype (or genotype) should continue to be selected, not die off over successive generations. Perhaps there's a mismatch between your stated goals and the fitness function.
 
  • #3
I think the real problem is the fitness function. Rather than using your current fitness function, consider checking the minimum fitness over k generations, say 6 (so check 995 to 1000). For a more extreme function, you could give a score of 0 to configurations that change between 999 and 1000...
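
One possible reading of this suggestion, as a rough sketch. The post says "minimum fitness" and "a score of 0"; since lower fitness is better in this thread, the sketch takes the worst value in the window to be the maximum and uses a hypothetical `worst` value as the penalty. It also compares fitness values rather than the configurations themselves. All of that is my interpretation, and the function and parameter names are made up.

```python
def end_of_run_fitness(fitness, k=6, strict=False, worst=1.0):
    """Worst (largest) fitness over the last k steps of the run.

    With strict=True, a trace whose value still changes between the
    final two steps gets the penalty value `worst` outright.
    """
    if strict and fitness[-1] != fitness[-2]:
        return worst          # still changing at the very end
    return max(fitness[-k:])  # lower is better, so max is the worst case


steady = [0.9, 0.5, 0.2] + [0.2] * 10
blinker = [0.0, 1.0] * 8
print(end_of_run_fitness(steady), end_of_run_fitness(blinker))
```

A gene that only looks good at one sampled step gets no credit here, because the window reports the worst step it contains.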
 
  • #4
We do not yet know the correlation between phenotype and genotype, and so are trying to figure out which sections of the genetic code (the string of 1's and 0's) actually produce the behaviour we want.

At this stage, they are not generational - they are individuals.

What we do is take a randomly generated population, allow it to develop over generations, irradiate them from time to time (change a few 0's to 1's), and run for a pre-defined number of generations. Then we take the best gene in that entire population (one individual) and perform this analysis of its lifetime.

We're hoping to find some form of convergent evolution. Once we have several best genes, we analyse them, try to classify them, and attempt to match the genes to functions.

Unfortunately I am not at liberty to change the fitness function. I'm pretty sure the way the function is written is sound (I only joined the project recently), but it sounds like the sampling is causing an artificial environment. I will speak to my Prof-in-charge again, about evaluating fitness randomly/several steps in a row, but I think re-designing the way the fitness is measured won't happen soon. I am also the youngest and second newest member on the team, and so challenging the past year of work is... ill-advised.

I think that now, having measured the fitness at each step with the best genes, the stability is the next level of natural selection. A very low stability score will indicate to us that that gene should be discarded.

Thanks for the advice!

Laura
 
  • #5
Are you calculating a single standard deviation for the entire 1,000-step process, or are you calculating 900 standard deviations (progressively, one for each step 101 through 1,000)?
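
The two options in this question can be sketched as follows. This assumes the trace is a list with `fitness[i]` holding the fitness at step `i` for steps 0 through 1000, and that "progressive" means the standard deviation of everything seen up to and including each step; the function names are hypothetical.

```python
import statistics

def single_sd(fitness):
    """One standard deviation over the entire run."""
    return statistics.pstdev(fitness)

def progressive_sds(fitness, start=101):
    """SD of fitness[0..t] for each step t from `start` to the last step."""
    return [statistics.pstdev(fitness[: t + 1]) for t in range(start, len(fitness))]


# Hypothetical trace: steady for 500 steps, then a jump to a poor value.
trace = [0.2] * 501 + [1.0] * 500
running = progressive_sds(trace)
print(len(running), running[0], single_sd(trace))
```

The progressive version only uses information available up to each step, so a late die-off cannot widen the band that earlier steps are judged against.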
 

1. What is stability in statistics?

Stability in statistics refers to the consistency of results over repeated applications of the same statistical analysis or procedure. In other words, stable statistical methods should produce similar results when applied to the same data multiple times.

2. Why is stability important in statistics?

Stability is important because it ensures that statistical conclusions are reliable and can be generalized to a larger population. It also allows for comparison between different studies or datasets using the same statistical methods.

3. How is stability measured in statistics?

Stability can be measured by assessing the variability of results obtained from repeated applications of a statistical method, or by comparing the results from different statistical methods used on the same dataset.

4. What factors can affect the stability of statistical methods?

Factors that can affect stability include the sample size, the quality of the data, the distribution of the data, and the complexity of the statistical method being used. Additionally, human error or bias can also impact the stability of results.

5. How can stability be improved in statistical analyses?

Stability can be improved by using larger sample sizes, ensuring the quality and accuracy of the data, and using appropriate statistical methods for the data and research question. It is also important to carefully consider and control for potential sources of bias in the analysis.
