Artificial said:
Hey, can someone explain how genetic programming works? Like how do evolutionary algorithms and genetic algorithms create social simulations of the future, or how they can predict mutations through these algorithms.
I think you've been reading too much popular hype on GP. Genetic programming is an offshoot of genetic algorithms. These techniques are inspired by the workings of evolutionary biology. Nature has inspired other techniques in computer science as well, such as neural networks and ant colony optimization algorithm. Outside this inspiration, these biologically-motivated programming techniques have little to do with real biology.
GAs work by manipulating sequences of symbols according to rules that mimic sex-based biological evolution: Inheritance, mutation, and cross-over. In biology, some offspring are more fit than are others: Biology has some scoring mechanism. GAs work in a similar manner.
GA developers must confront three key challenges. The first challenge is one of data representation. The problem domain must be representable by a sequences of symbols, called a string in computer science parlance. For example, one way to encode the routing between cities in the traveling salesman problem (TSP) is to simply list the cities in the order traveled.
The second challenge faced by GA developers is to write the rules of mutation and cross-over. Mutation is easy in the TSP: transpose a pair of cities in the routing. Cross-over is a bit more complex. In sexual inheritance, a child inherits half of the genes from each of two parent. GAs do this by splicing substrings of the strings from the parents. Arbitrarily splicing substrings of two routings in the TSP will most likely generate an invalid routing. Care must be taken in both the representation and the cross-over algorithm to ensure that offspring represent valid solutions.
The final challenge faced by GA developers is scoring. While "survival of the fittest" is a poor characterization of evolutionary biology, it is an excellent characterization of GAs. The scoring algorithm for the TSP is easy: The total distance between pairs of cities in a given routing. Coming up with a good scoring algorithm for a real-world problem is a bit more challenging.
With these three challenges solved, the GA developer can simply turn the crank.
- Select the population size, how many elements of the population survive each turn of the crank, how often mutation and cross-over occur.
- Generate an initial population. For example, shuffle a deck of cards representing cities for the TSP.
- Score the population and cull the least-fit member.
- Generate as many random pairs of survivors (not exclusive pairings; GAs can be promiscuous) needed to regenerate the population.
- Apply mutation and cross-over rules to regenerate the population.
- Repeat steps 3 to 5, over and over again.
Genetic programs are simply genetic algorithms in which the strings are computer programs. The first challenge is representation. Lisp is almost exclusively the language of choice here because the language is very well-formed and is amenable to mutation and cross-over. Unlike the TSP, the strings are not a fixed size. GP strings can shrink and grow. Size/complexity can be used as part of the scoring algorithm (along with getting close to the "right" answer, of course.)