If you want to go bonkers on the sub-field mapping and related finite field stuff ...
For a finite field module a prime number p, there are primitives, calling one of them e, where every non-zero number in the field can be represented as e^i % p, where i goes from 0 to p-2. e^(p-1)%p = 1. For example, consider the finite field modulo 5, then all non-zero numbers can be powers of 2:
{2^0 % 5 = 1, 2^1 % 5 = 2, 2^2 % 5 = 4, 2^3 % 5 = 3} (this wraps around at 2^4 % 5 = 1).
or powers of 3:
{3^0 % 5 = 1, 3^1 % 5 = 3, 3^2 % 5 = 4, 3^3 % 5 = 2} (this wraps around at 3^4 % 5 = 1).
For an 8 bit field modulo a 9 bit "prime" polynomial (1 bit coefficients, add and subtract are effectively replaced by xor), there are also primitives. Normally a polynomial where 2 (binary 10) is a primitive is chosen, but AES uses a polynomial where 2 is not a primitive (more on this below).
A simple example of sub-field mapping:
http://rcgldr.net/misc/m8to44.doc
As mentioned, AES uses a 9 bit polynomial where 2 is not a primitive, but typically the 4 bit sub-field mapping will use polynomials x^2 + ax + b where 1x + 0 is a primitive. A brute force search for a combination of primitive 'e' for the 9 bit polynomial, 'a' and 'b' for a potential "compatable" 4 bit sub-field polynomial x^2 + ax + b to map first to two 4 bit sub-fields and then again to a 2 bit sub-field polynomial x^2 + x + 1 to map to four 2 bit sub-fields is done, looking for the combination that results in the fewest number of gates. Link to a pdf file about optimizing an "s-box". The point isn't to understand all the complex math in the document, but to note the effort put into this to reduce the gate count by about 20% (since there can be 10 or more s-boxes on a AES device, the gate count is important).
http://www.iacr.org/archive/ches2005/032.pdf