Should I Become a Mathematician?

In summary, to become a mathematician, you should read books by the greatest mathematicians, try to solve as many problems as possible, and understand how proofs are made and what ideas are used over and over.
  • #281
oh i remember. i was referring you to some web-based notes by other people on linear algebra. i'll look them up, or just search on google for linear algebra notes and books.
 
  • #283
http://www.etsu.edu/math/gardner/2010/notes.htm
 
  • #284
here's a more advanced one. but i like the one by Ruslan Sharipov, who posts here; his second-level linear algebra notes are truly excellent.
 
  • #285
wow thanks guys
 
  • #286
mathwonk said:
learn it as soon as possible, from any source that helps. learn it in as many ways as one can. better not to wait until reals, as then it is very hard and coupled with very hard topics too.


i started learning it in high school, from the book principles of mathematics, by allendoerfer and oakley. i also took euclidean geometry, whose absence is one of the main reasons proofs are no longer understood by today's students.

i.e. removing geometry proofs and inserting AP calculus is a prime culprit for our current demise as a math nation.

I've heard this said before...that proofs have been removed from geometry classes. That was not true for me. I took it back in 2003/2004 and we proved everything we did, all the time. Most of our work in fact involved proofs, or constructions if I recall correctly. Of course, I live in a pretty good school district, and was on the advanced track. Maybe it's different elsewhere, or for those who take it later than I did.

But yeah, understanding those proofs was pretty key to my mathematical development.
 
  • #287
glad to hear it. of course now that we are grown up and getting our own info and motivating ourselves, we can fill any gaps that were left by our schooling. so thanks for the input and the questions.

my personal goal for the next few weeks is to learn either Grothendieck's version of Galois theory (etale maps and etale cohomology), or learn about Galois theory of ring extensions.
 
  • #288
dontbesilly said:
That was not true for me. I took it back in 2003/2004 and we proved everything we did, all the time. Most of our work infact involved proofs, or constructions if I recall correctly.

I believe he was referring to how they took out the Geometry-style proofs and filled the curriculum with Calculus, not to the proofs being removed completely.

Geometry proofs and Calculus proofs are different, right?

Hmm... Last week I got my hands on a Calculus textbook in pretty good condition for $1 at a library book sale, along with ancient sheet music of 'The Messiah' copyrighted 1918. it's amazing what you can find if you look in certain places :P.

Calculus: early transcendentals (3rd edition)
James Stewart.

I can't understand any of it right now, but maybe later i'll understand it along with E=hv :]
 
  • #289
Thanks Jason and Mathwonk. My real analysis class is using a text called Principles of Mathematical Analysis, but I'm not liking it very much. The professor's class notes, on the other hand, are amazingly clear and motivated, and the proofs are a bit longer than Principles', but more instructive and less terse.

Oh and real analysis here is the transition to proofs class - is that a bad thing?
 
  • #291
who is your professor, johnnyp? and where are you studying? i like to know about good profs and good schools.
 
  • #292
uh oh. be careful what you wish for. we are stuck!
 
  • #293
a nice thing in sharipov's is the inclusion of a first chapter with basic set theoretic terminology about maps, images, inverse images, and so on. this is crucial in all advanced course work.

then his treatment of jordan forms begins at the essential point, namely explanation of structure of nilpotent operators.

i.e. jordan form tells you how to understand an operator based on its minimal polynomial. since the simplest polynomial is X^n, this is the basic case.

an operator T that satisfies X^n, i.e. such that T^n = 0, is called nilpotent, because some power of it is zero, or nil. [zero-power = nil-potent.] you get it.
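here is a quick numerical illustration (a minimal sketch using numpy; the convention of putting the 1's above the diagonal is mine, other books put them below):

```python
import numpy as np

# a single nilpotent Jordan block: zeros everywhere except 1's on the superdiagonal
J = np.diag([1.0, 1.0], k=1)          # 3x3, so J^3 should vanish
print(np.linalg.matrix_power(J, 2))   # nonzero: J has nilpotency index 3
print(np.linalg.matrix_power(J, 3))   # the zero matrix: J^3 = 0
```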
 
  • #294
by the way, if my recent plaint caused us to be stuck, i apologize to those of you who did not want that.

it makes it easier to find, as a sticky, but harder to gauge the level of current interest. so either way is ok with me.

whoever put it up here presumably did so either to prevent our feeling ignored, or to make it more useful.

either way it was thoughtful, and i thank you.
 
  • #295
to get into a phd program, you have to pass 2 hurdles: 1) admission to the university grad school and 2) admission to the departmental program.

math departments are not so bureaucratic as university grad schools are. the latter will require degrees and certifications for admissions.

moreover, those requirements are there for a reason, since people without them are almost always lacking some quality that would help ensure success.

however if you are that rare bird, a truly exceptional mathematics talent, who knows all they need to, and can do the work, then you might get in without a degree.

let me say this is highly unlikely, and not recommended. why would anyone want to avoid the college experience, which many people recall as the best time of their lives?

and why would anyone think it more likely to succeed in grad school without being instructed for four years by experts?

i recommend you give yourself every opportunity. take all the usual courses, convince people you have the ability to do a graduate degree.

besides, one thing you would be missing without this experience is the ability to convince someone to write a recommendation letter.

here is one possible scenario: show up at a grad school, on your own nickel, and sign up for a course that advanced undergrads are taking, people who are thought of as grad school material, and outperform them.

or show up at a grad school and take and pass their phd prelims.
 
  • #296
I'm currently in the 3rd year of civil engineering and am interested in math. Well, 'to be interested in math' is a slippery construction, since interest alone doesn't imply anything. Anyway, we had four math courses in the first two years, which contained a standard overview of calculus, basic linear algebra and numerical methods, as well as a probability and statistics course. But, since a faculty of engineering in general isn't a place where you'll learn math on a bit higher and more precise level, I decided to take additional courses at the faculty of mathematics, since I don't believe (in my case only, though) in the possibility of good-quality self-study. I'm currently attending a linear algebra course, which I find highly interesting and enlightening. If I have the time in the near future, I plan to attend more math courses and build a small 'additional database' in my student record. So, actually, there are some 'ways around'. But, they're still just 'ways around'. :smile:
 
  • #297
mathwonk said:
to get into a phd program, you have to pass 2 hurdles: 1) admission to the university grad school and 2) admission to the departmental program.

math departments are not as bureaucratic, but university grad schools are. they will require various degrees and certifications for admissions.

moreover, i have learned by experience that those requirements are there for a reason, i.e. people without them are almost always lacking some quality that would help ensure success. [i once went to bat for someone without paper quals, who afterwards was indeed not well qualified for our program.]

however if you are that rare bird, a truly exceptional mathematics talent, who knows all they need to, and can do the work, then you might get in without a degree.

You can get into graduate school without a degree?
 
  • #298
the research experience is quite different from the passive learning experience, and after passing through it i had new respect for my colleagues who had done it before me.

moreover, it is not entirely about talent, but persistence and stamina play an equal role. by skipping the undergrad degree one misses the chance to develop this stamina.

just not giving up is as essential as being smart, as many average intellects have obtained phds (tataa!), but no one who gives up ever does, no matter how brilliant.
 
  • #299
jasonrox: i tried to make it clear that a math dept may be interested in a very talented person, degree or not, but a grad school will not want to accept that person, and with good reason. you have seized on one phrase in my long statement and taken it out of context. read it all. i am not advising or encouraging anyone to seek entrance to grad school without a degree.

no it is unlikely you can get in and unwise to try.

i'll give you one successful example: Barry Mazur apparently has no undergrad degree. he's the guy Andrew Wiles sent his manuscript on Fermat's last theorem to, to check it. and presumably Wiles was unsure about it, since it was indeed wrong at that point.

but most of the rest of us are not like Barry. and besides, Barry had all the academic requirements and time spent in school; he just declined to attend required ROTC. the school (MIT) afterwards seems to have eliminated the requirement, possibly as a result of the subsequent embarrassment at having denied Barry Mazur a degree.
 
  • #300
How can I figure out if I am cut out to be a math major? I really love my math classes, but there is always a sense of not being good enough at it. (I seem to be very dense compared to my classmates.)
Er, I'm not doing a good job of articulating what I mean. I guess, put another way, what qualities should a person pursuing a career or degree in mathematics possess? (I do realize that mathematicians/math majors are very diverse, but are there common qualities?)
 
  • #301
you are doing fine. insecurity is an occupational hazard, as there are so many brilliant people in the field. love of the subject is the key. i think from what you say you are cut out to do this.


why give up what we love just because someone else is better at it? be happy for them, and hang in there.:smile:
 
  • #302
I love math. =]

and don't be discouraged by what others do. I'm dense compared to my classmates too, but I'm the brightest dense person (or maybe the densest bright person). be proud of your accomplishments even if you're not the best. Having a love of math is good enough if you really enjoy what you do, regardless of how well you do it.
 
  • #303
day 4.1 algebra, groups and group actions

8000 Group Actions, Simplicity of Icos, Sylow, Jordan-Hölder.
We continue to study finite groups. To study non abelian ones, we try as with abelian groups to decompose them into products of smaller subgroups. This is not always possible, and even to attempt it we need to prove the existence of smaller subgroups. A finite abelian group G has a subgroup of order n for every n that divides #G. This is not true for non abelian groups, but it is true for prime power factors p^s dividing #G. To find these subgroups we could look for non trivial homomorphisms, but the kernel of a homomorphism is a normal subgroup, and subgroups of non abelian groups may not be normal. Worse, some non abelian groups have no proper normal subgroups, i.e. they are "simple". A homomorphism from a simple group G is thus either injective or constant.
We cannot have a product decomposition of such a group, since a product KxH admits two projections, to K and to H, whose kernels are the normal subgroups {e}xH and Kx{e}, which intersect in the identity. It cannot be a "semi direct product" K x_c H, since that requires K to be normal and to intersect H only in the identity, nor even an extension of a group H by a group K, i.e. there is no exact sequence {e}-->K-->G-->H-->{e}, since that requires K to be normal.
Thus we need another tool to study general groups, "group actions", a refinement of the technique of homomorphisms.
Definition: A group G acts on a set S if there is a map GxS-->S taking (g,x) to gx, such that g(hx) = (gh)x for all g,h in G, and all x in S, and ex = x for all x in S.
This is equivalent to a homomorphism G-->Bij(S) of G into the group of bijections of S, taking g to the bijection (x-->gx) and allows us to study G by looking at how it moves points of S around.

The key concepts are orbit, stabilizer, and
the counting principle #(G) = #(orbit)·#(stabilizer).

More precisely:
Defn: If G acts on S, the orbit O(y) of a point y in S is the image of the map
G x {y}-->S, i.e. O(y) = {gy: all g in G}.

Defn: If y is in S, Stab(y) = {g in G: gy = y}.

Cosets and conjugacy classes come in as follows:
Lemma: If y is in S, and gy = z, then Stab(z) = gStab(y)g^-1 is a conjugate of Stab(y); the set {elements of G taking y to z} is the coset g.Stab(y), and its conjugate h(g.Stab(y))h^-1 = {elements of G taking h(y) to h(z)}.
proof: exercise.

Counting principle: For any y in S, #(G) = #O(y).#Stab(y).
proof: Since every element of G takes y to some element of the orbit of y, G is the disjoint union, over all z in O(y), of the sets {all g in G: gy = z}. Since each of these is a coset of Stab(y), and since multiplication by g is a bijection Stab(y)-->gStab(y), each of these cosets has the same cardinality as Stab(y). QED.
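As a sanity check of the counting principle (a minimal sketch in Python; the choice of the dihedral group of the square acting on its 4 vertices is mine, not from the notes):

```python
from itertools import product

# elements of the dihedral group of the square, as permutations of vertices 0..3;
# a permutation g is a tuple with g[i] = image of vertex i
def compose(g, h):                      # (g*h)(i) = g(h(i))
    return tuple(g[h[i]] for i in range(4))

r = (1, 2, 3, 0)                        # rotation by 90 degrees
s = (0, 3, 2, 1)                        # reflection fixing vertices 0 and 2
G = {(0, 1, 2, 3)}
while True:                             # close up under the generators
    new = {compose(g, h) for g, h in product(G, [r, s])} - G
    if not new:
        break
    G |= new

orbit = {g[0] for g in G}               # orbit of vertex 0
stab = {g for g in G if g[0] == 0}      # stabilizer of vertex 0
print(len(G), len(orbit), len(stab))    # 8 4 2, and indeed 8 = 4 * 2
```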

Lemma: Every subgroup H of G is a stabilizer for some action.
proof: Let G act on left cosets of H by left translation. I.e. x takes yH to (xy)H. Then H is the stabilizer of the coset eH = H. QED.

Thus stabilizers for actions can be used to study all subgroups.
Corollary (Lagrange): For every subgroup H of G, #(H) divides #(G).
proof: The counting principle says #(G) = #(H).#(cosets of H in G). QED.

Note: Being in the same orbit is an equivalence relation on S, so an action partitions S into disjoint orbits, each orbit having cardinality dividing #(G).

Def: A fixed point is a point y of S such that Stab(y) = G, i.e. O(y) = {y}.

Corollary: If S is finite, and #(G) = p^r, where p is prime, then #(S) is congruent modulo p to the number of fixed points.
proof: S is the disjoint union of orbits, and each orbit has cardinality divisible by p, except the singleton orbits. QED.
Example of a simple group: G = Icos = rotation group of a regular icosahedron. G acts on the points of the icosahedron, in particular on the vertices, which form one orbit of 12 points. Since each vertex is fixed by exactly 5 rotations, #(G) = (5)(12) = 60. This agrees with the orbit of 20 faces, each fixed by 3 rotations, and the orbit of the 30 edges, each fixed by two rotations.
The 20 elements of order 3 fixing the 10 pairs of opposite faces, the 24 elements of order 5 fixing the 6 pairs of opposite vertices, and the 15 elements of order 2 fixing the 15 pairs of opposite edges, give all 59 non trivial elements of G.
Since the stabilizers of all vertices are conjugate, a normal subgroup containing one element of order 5 contains all, and similarly for the other orders. Hence a normal subgroup K of G has order = 1 + some or none of the integers 15, 20, 24. But the only divisors of 60 those sums form are 1 and 60. Hence G has no proper normal subgroups, so is simple.
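The final divisibility step can be checked mechanically (a short sketch; the class sizes 15, 20, 24 are those counted above):

```python
from itertools import combinations

# a normal subgroup is a union of whole conjugacy classes containing the identity,
# so its order is 1 + a sum over some subset of the class sizes {15, 20, 24}
sizes = [15, 20, 24]
possible = {1 + sum(c) for k in range(4) for c in combinations(sizes, k)}
print(sorted(n for n in possible if 60 % n == 0))   # [1, 60]: only trivial normal subgroups
```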

Next we use actions to produce stabilizer subgroups of prime power orders.
Theorem (Sylow): Let #(G) = m·p^r where p does not divide m.
1) There exist subgroups of G of order p^r.
2) All subgroups of order p^r are conjugate to one another.
3) The number of subgroups of order p^r divides m, and is congruent to 1 modulo p.
proof: Suppose G acts on a set S such that p does not divide #(S). S is a disjoint union of orbits, so there is an orbit O(x) whose order is not divisible by p. By the counting principle p^r divides #(Stab(x)). So if we can find such an action where #(Stab(x)) <= p^r, we would be done.
Since G is an arbitrary group, the only thing G acts on is G itself, by translation and conjugation. But G has order divisible by p. We might consider subgroups of G, but we do not know how many there are! So we consider subsets of G, with G acting by translation. If a subgroup H stabilizes a non empty set T, then for any y in T, translation is an injection H-->T taking g in H to gy in T. So H is no larger than T. Thus if we let G act on subsets of size p^r, then the stabilizers will have cardinality <= p^r as desired.
So we hope the number of such subsets is not divisible by p. Of course the set S of subsets of G of size p^r has order equal to the binomial coefficient C(mp^r, p^r) = (mp^r)(mp^r - 1)···(mp^r - p^r + 1) / (p^r)(p^r - 1)···(1). In this fraction every factor in the top of the form (mp^r - k) is divisible by p^s, s <= r, if and only if k is, if and only if the factor (p^r - k) in the bottom is. Thus every factor of p occurring in the top is canceled by a factor from the bottom. Hence this binomial coefficient is not divisible by p, and thus the stabilizer of any subset in an orbit of order not divisible by p gives a subgroup of G of order p^r. QED
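The key divisibility fact about the binomial coefficient can also be confirmed numerically (a sketch; the test ranges are arbitrary):

```python
from math import comb

# C(m*p^r, p^r) should never be divisible by p when p does not divide m
cases = [(m, p, r) for m in (1, 2, 3, 5, 6, 7)
                   for p in (2, 3, 5)
                   for r in (1, 2, 3) if m % p != 0]
assert all(comb(m * p**r, p**r) % p != 0 for m, p, r in cases)
print("verified for", len(cases), "cases")
```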

Lemma: If H,K are subgroups of G and H lies in N(K), then the set of products HK is a subgroup of G, and HK/K ≅ H/(H∩K).
proof: exercise.
To count the number of subgroups P1,...,Pn of order p^r (called Sylow p-subgroups, or p^r-subgroups), let P1 act by conjugation on all of them. We claim P1 fixes only P1. To prove it, if P1 fixes Pj, then P1 lies in the "normalizer" N(Pj) = {g in G such that g^-1Pjg = Pj}. Then P1Pj is a subgroup of G, and (P1Pj)/Pj ≅ P1/(P1∩Pj). Since the latter quotient group has order dividing #(P1) = p^r, it follows that #(P1Pj) is a power of p. Since P1Pj contains P1, whose order is already the largest possible power of p for a subgroup of G, hence P1 = Pj. Thus the action of P1 on the set S of Sylow p-subgroups has exactly one fixed point. By the counting principle above for p-groups, #(S) is congruent to 1, mod p.
Now let G act on S by conjugation. The G-orbit of Pj contains the P1-orbit of Pj. Thus the G-orbits are unions of P1-orbits, and all the P1-orbits except {P1} have order divisible by p. So the G-orbit containing P1 has order congruent to 1 mod p, while the others are divisible by p. But the normalizer of any Pj in G contains Pj. The order of the G-orbit of Pj equals the index of that normalizer, hence divides m, so cannot be divisible by p. Thus there is only one G-orbit, i.e. all Pj are conjugate. Since the order of each orbit divides m, and there is only one orbit, #(S) divides m. QED.
Cor: A group G of order 24 cannot be simple.
proof: 24 = 2^3·3, so the number k of subgroups of order 8 divides 3, hence k = 1 or 3. If k = 1, the unique subgroup of order 8 is normal; if k = 3, we get a transitive action of G by conjugation on the 3 subgroups, hence a homomorphism G-->S(3), which must have a non trivial kernel, since #S(3) = 6 < 24 = #(G).

Cor: Every simple group G of order 60 is isomorphic to A(5).
proof: First we want to find a non constant homomorphism G-->S(5).
Since 60 = 2^2·3·5, by Sylow there are 1, 3, 5, or 15 sylow 4-subgroups; 1, 4, or 10 sylow 3-subgroups; and 1 or 6 sylow 5-subgroups. G is simple, hence has trivial center, so cannot have a conjugacy class consisting of one non identity element, and a transitive action on a set of 1 < n < 5 elements would give a non constant homomorphism to a group S(n) of order less than 60. So there are either 5 or 15 subgroups of order 4; 10 of order 3; and 6 of order 5. This gives 20 elements of order 3, and 24 elements of order 5. So we focus on the groups of order 4.
If there are 5 of them, since G acts transitively on them by conjugation, we have our non constant map G-->S(5). If there are 15, they cannot all intersect trivially, since there are only 15 elements left for the union of all the 4-subgroups. Hence some pair of distinct 4-subgroups contains a common element x, necessarily of order 2.
Then the normalizer N(x) of x is a subgroup which contains two distinct sylow subgroups of order 4. Thus #(N(x)) = 4n for some n > 1, and #(N(x)) divides 60. Hence #(N(x)) = 12, 20 or 60. Hence the index of N(x), i.e. the order of the class of elements conjugate to x, is <= 5. As above this class cannot have fewer than 5 elements, so it has order 5; since G acts transitively on it, again we have our non constant map π:G-->S(5).
The map π is injective since the kernel is a normal subgroup smaller than G. Moreover if S(5)-->{±1} is the "sign map" (discussed below), the composition G-->S(5)-->{±1} must have non trivial kernel in G. Since the only non trivial normal subgroup of G is G itself, the image of the map
G-->S(5) lies in A(5) = kernel(sign map). Hence G ≅ A(5). QED.

Challenge: Consider groups of order 168. Try to find a simple one, and prove it is unique. Then prove there are no other simple groups of order < 168, or even < 360, (except abelian ones of prime order).

Exercise: Extend Sylow's theorem, by showing the following:
i) If p is prime and p^s divides #G, then G has a subgroup of order p^s.
[hint: It suffices to look inside a sylow p-subgroup. Prove the center of a p-group is always non trivial by looking at conjugacy classes. I.e. elements of the center define conjugacy classes with only one element. All non trivial conjugacy classes have order divisible by p. So how many singleton classes must exist? Then you can mod out by the center and use induction.]
ii) If G has a subgroup H of order p^s where p is prime, prove H is contained in a sylow p-subgroup. [hint: the proof we gave above for the number of sylow groups showed that when a p-group acts on all sylow p-subgroups by conjugation, it must be contained in any sylow subgroup it fixes.]

Decomposing groups as "products" of subgroups.
Direct products:
Now that we have a good supply of subgroups in any group G, we ask when G decomposes as a product of some of these subgroups. We define a direct product of groups exactly as before:
Def. H x K = {all pairs (h,k) with h in H, k in K} and (x,y)(h,k) = (xh,yk).

Non abelian products only have half the mapping properties of abelian ones:
Lemma: The projections HxK-->H and HxK-->K are homomorphisms, and if f:G-->H and g:G-->K are any two homomorphisms, there is a unique homomorphism G-->HxK, whose compositions G-->HxK-->H, and G-->HxK-->K equal f and g. proof: exercise.
 
  • #304
day 4.2 algebra, classifying small groups using semi direct products

This does not help us to decompose G, because if H,K are subgroups of G, we only have inclusion maps H-->G and K-->G. In the non abelian case, these do not define a map H x K-->G. This is why it is harder to decompose G as a product. The image of such a map would be the set of products of elements of H and K, but these products usually do not even define a subgroup of G unless at least one of H or K is normal.

Exercise: If H,K are subgroups of G and H lies in the normalizer of K, then HK is a subgroup of G, and HK/K ≅ H/(H∩K).

To define a map out of a product we need some commutativity. We identify H, K with the subgroups H x {e}, and {e} x K in H x K. Then H and K intersect only in {e} = {(eH,eK)}, and every element of H commutes with every element of K, i.e. (h,e)(e,k) = (h,k) = (e,k)(h,e). Thus both H and K are normal subgroups of H x K. Conversely, if normal subgroups H, K of a group G intersect only in {e}, they commute with each other, since for x in H, y in K, the element x(yx^-1y^-1) = (xyx^-1)y^-1 belongs both to H and K. Hence xyx^-1y^-1 = e, so xy = yx.

This much commutativity is enough to define a map out of a product.
Proposition: If f:H-->G and g:K-->G are group maps then f(H) and g(K) are subgroups of G. If the elements of these image subgroups commute with each other, i.e. if f(x)g(y) = g(y)f(x) for every x in H, y in K, then the map
(f x g):H x K-->G with (f x g)(s,t) = f(s)g(t) is a homomorphism whose restrictions to H,K are f, g respectively.
proof: With this definition, (fxg)(u,v).(fxg)(s,t) = f(u)g(v)f(s)g(t) = f(u)f(s)g(v)g(t) = f(us)g(vt) = (fxg)(us,vt) = (fxg)((u,v).(s,t)). QED.

Cor: If H,K are normal subgroups of G and H∩K = {e}, there is an injective homomorphism HxK-->G sending (h,k) to hk, whose image is HK.
proof: We have just proved the image groups H,K commute, so this is a homomorphism. If hk = e, then h^-1 = k, so k belongs to both H and K, hence k = e = h, proving injectivity. The image is obviously HK. QED.

Cor: If H,K are normal in G, HK = G and H∩K = {e}, then G ≅ HxK.

Examples: A group of order 15 has sylow subgroups H,K of orders 3,5, which are unique, since 1 is the only factor of 5 congruent to 1 mod 3, and also the only factor of 3 congruent to 1 mod 5. Thus both H, K are normal and intersect only in {e}, so G ≅ Z/3 x Z/5 ≅ Z/(15). QED.

This example generalizes as follows. If #G = pq, with p,q primes, the sylow subgroups H, K have orders p, q. If p > q, the number of sylow p-subgroups divides q and has the form 1, p+1, ..., hence equals 1. So the sylow subgroup of the larger prime is always normal. The number of sylow q-subgroups has the form nq+1 and divides p, so since p is prime it equals 1 or p; if it equals p, then nq = p-1, so q divides p-1. Thus we have:

Proposition: If #G = pq where p > q are primes, and q is not a factor of
p-1, then G is cyclic, G ≅ Z/(pq).
proof: As above, both sylow subgroups are normal, so G ≅ Z/p x Z/q ≅ Z/(pq). QED.

E.g., all groups of orders 15, 35, 65, 77, 85, 91, 95, 133, ... are cyclic.
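That list is easy to regenerate (a sketch; the search bounds are arbitrary):

```python
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))

# orders pq (p > q primes, q not dividing p-1) for which every group is cyclic
orders = sorted(q * p for q in range(2, 20) if is_prime(q)
                      for p in range(q + 1, 40) if is_prime(p) and (p - 1) % q)
print(orders[:12])   # [15, 33, 35, 51, 65, 69, 77, 85, 87, 91, 95, 115]
```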

What about groups of order p^2?
Proposition: All groups of order p^2 are abelian. Hence there are exactly 2 of them, Z/p^2 and Z/p x Z/p.
proof:

Lemma: A p - group always has a non trivial center.
proof: This uses the orbit formula in the following form: If N(x) = {y: yx=xy} = the normalizer of x, then N(x) is a subgroup, and its index is the order of the conjugacy class of x. Hence #G = the sum, over one element x from each conjugacy class, of the indices of the N(x). In particular, since an element is in the center Z(G) if and only if its normalizer is G with index 1, we have:

The class equation: #G = #Z(G) + Σ Index(N(x)), summed over one x in each non trivial conjugacy class.

proof of lemma: For a p - group G, these non trivial indices are all powers of p, as is #G, hence so is #Z(G). I.e. #Z(G) is divisible by p, so the center contains more than just {e}. QED lemma.

proof of proposition:
If x is any element of any group, the normalizer of x always contains both x and the center Z(G). If x is in the center then N(x) = G. If not, then N(x) is strictly larger than Z(G). Since in a p-group #Z(G) is at least p, for every x in a group of order p^2, #N(x) is at least p^2. But that means for every x, N(x) = G. Hence every x is in Z(G). QED Prop.

We now know all groups of order 4, 9, 25, 49, 121,..., and may ask about groups of order pq where p > q and q is a factor of p-1, like #G = 6, or 21, or 2p, for p odd. As above, these are the cases where only one of the two sylow subgroups need be normal. So what happens in that case? I.e. how does the "product" group HK look then? We need another tool.

Semi - direct products
If H,K are subgroups of G and only K is normal, the products kh still form a subgroup KH, but the multiplication is more complicated. If we understand H and K, we need to know how to multiply products of the form (xs)(yt) where x,y are in K and s,t are in H. If s and y commuted, then (xs)(yt) would equal xyst; but s and y may not commute, and the extent to which they fail to commute is given by conjugation. Thus sy may not equal ys, i.e. sys^-1 may not equal y, but it does equal c_s(y), where c_s:K-->K is conjugation by s.
I.e. if we know the automorphism c_s:K-->K, then sys^-1 = c_s(y), so sy = c_s(y)s. Thus xsyt = x(sy)t = x(c_s(y)s)t = (x.c_s(y))(st). Thus if c:H-->Aut(K) is the homomorphism taking each s to c_s = conjugation by s, the product (xs)(yt) is given by (x.c_s(y))(s.t). This tells us how to define a twisted product, called the semi direct product of K and H, with twisting given by a homomorphism c:H-->Aut(K).

Defn: Let H,K be groups and let c:H-->Aut(K) be a homomorphism. Then define multiplication on the cartesian product KxH by setting (x,s).(y,t) = (x.c_s(y), st). Denote the resulting semi direct product by K x_c H.
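The definition is easy to make executable (a minimal sketch: K = Z/5 written additively, H = Z/2, and c the inversion action, which by the proposition below yields the dihedral group D5; all names here are my own):

```python
from itertools import product

p = 5                                   # K = Z/p, H = Z/2

def c(s, y):                            # c_s in Aut(K): s = 1 acts by inversion y -> -y
    return y if s == 0 else (-y) % p

def mult(a, b):                         # (x,s).(y,t) = (x + c_s(y), s + t), additively
    (x, s), (y, t) = a, b
    return ((x + c(s, y)) % p, (s + t) % 2)

G = [(x, s) for x in range(p) for s in range(2)]
# associativity check, and a witness that the product is non abelian
assert all(mult(mult(a, b), g) == mult(a, mult(b, g))
           for a, b, g in product(G, repeat=3))
print(any(mult(a, b) != mult(b, a) for a, b in product(G, repeat=2)))   # True
```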

Exercise: With definitions as above, prove:
(i) The semi direct product K x_c H is a group.
(ii) The subsets K' = {(k,e) for all k in K}, and H' = {(e,h) for all h in H} are subgroups of K x_c H isomorphic to K, H respectively, and K' is normal.
(iii) The action of H on K via c becomes the conjugation action of H' on K', i.e. if k' = (k,e), h' = (e,h), then h'k'h'^-1 = (c_h(k), e).
(iv) H' is normal in K x_c H if and only if c is the trivial homomorphism.
(v) If H, K are subgroups of a group G, K is normal, and we define
c:H-->Aut(K) to be conjugation of K by H, then setting f(k,h) = kh defines a homomorphism f:K x_c H-->G, which is surjective if G = KH, and injective if K∩H = {e}.

Proposition: If G has order 2p where p is an odd prime, there is exactly one non abelian group of order 2p, the dihedral group Dp.
proof: The subgroup K of order p is normal, so we have an isomorphism G ≅ (Z/p) x_c (Z/2), where c:Z/2-->Aut(Z/p) is a non trivial homomorphism. Since Aut(Z/p) ≅ (Z/p)* ≅ Z/(p-1), there is only one element of order 2 in Aut(Z/p), hence only one non trivial map c, hence one non abelian group. Since Dp is non abelian of order 2p, this is it. QED.
This classifies all groups of orders 6, 10, 14.

Next we show homomorphisms c:H-->Aut(K) that differ by an automorphism of H define isomorphic semi direct products.
Proposition: Let H, K be groups, c:H-->Aut(K) a homomorphism, g:H-->H an automorphism of H, and define c':H-->Aut(K) by c' = c∘g^-1. Then the map f:K x_c H-->K x_c' H defined by f(k,h) = (k,g(h)) is an isomorphism.
Proof: f is a bijective function, with inverse f^-1(k,h) = (k,g^-1(h)), so we check the homomorphism property. If (k,h), (k1,h1) are in K x_c H, their product is (k,h).(k1,h1) = (k.c(h)(k1), hh1), whose image is f(k.c(h)(k1), hh1) = (k.c(h)(k1), g(hh1)).
On the other hand the two images of (k,h) and (k1,h1) are f(k,h) = (k,g(h)) and f(k1,h1) = (k1, g(h1)), hence the product of the images is (k,g(h)).(k1, g(h1)) = (k.c'(g(h))(k1), g(h)g(h1)). Since c'∘g = c, and g is a homomorphism, indeed f((k,h).(k1,h1)) = (k.c(h)(k1), g(hh1)) = (k.c'(g(h))(k1), g(h)g(h1)) = f(k,h).f(k1,h1).
QED.

Exercise: i) If p-1 = mq, there are exactly q-1 non constant maps
c:(Z/q)-->Z/(p-1), taking [1] to some non zero multiple of [m].
ii) Aut(Z/p) ≅ Z/(p-1).
iii) If p-1 = mq, all non constant maps c:Z/q-->Aut(Z/p) define isomorphic semi direct products (Z/p) x_c (Z/q).
iv) If p-1 = mq, there is exactly one non abelian group of order pq.

Classifying groups whose order has more than 2 factors is more work.
Theorem: There are exactly 2 non abelian groups of order 8, up to isomorphism: Hamilton's unit quaternions, and D4 = Isom(square).
Proof: #G = 8 = 2^3, and G is not cyclic, so all elements have order 1, 2, or 4.

Lemma: Two elements x,y of order 2 in a group commute if and only if their product has order 2.
proof: If xy has order 2, then (xy)(xy) = e, so xy = (xy)^-1 = y^-1x^-1 = yx, since x,y have order 2. The other direction is even easier. QED.

Hence G has elements of all orders 1, 2, and 4: if every element had order 1 or 2, the lemma would force G to be abelian.
case 1) Assume there is only one element of order 2, hence 6 elements of order 4. Then let x be an element of order 4, and y another element of order 4, with y different from both x and x^-1. The subgroup <x> has index 2, hence is normal. Since G = <x>.<y>, and <x> ≅ <y> ≅ Z/4, G must be the image of a surjective map from a non trivial semidirect product Z/4 x_c Z/4, defined by a non constant homomorphism
c:Z/4-->Aut(Z/4) ≅ Z/2. There is only one such map, hence only one such non trivial s.d.p. Z/4 x_c Z/4.
Now for the map Z/4 x_c Z/4-->G. It is multiplication (or exponentiation in our notation), hence maps {0}xZ/4--><y> isomorphically ([0,n]-->y^n), and maps Z/4x{0}--><x> isomorphically ([n,0]-->x^n). Since there is only one element of order 2 in G, the elements x^2 = y^2 are the same, so the element [2,2] of Z/4 x_c Z/4 must be the unique non trivial element of the kernel. Hence G ≅ [Z/4 x_c Z/4]/<[2,2]> is also uniquely determined. So there is only one non abelian group of order 8 with a unique element of order 2. Note that Hamilton's quaternions do satisfy this description, hence this is the quaternion group.

case 2) Assume there is more than one element of order 2. There are still some elements of order 4, so let x have order 4, hence x^2 is the unique element of order 2 in the subgroup <x>. Then choose another element of order 2, say y, different from x^2. Then <x> is normal and the subgroup <x>.<y> = G, so G ≅ <x> x_c <y> ≅ (Z/4) x_c (Z/2), defined by the unique non trivial map c:Z/2-->Aut(Z/4). So there is only one non abelian group of order 8 with more than one element of order 2, which must be D4 = Isom(square).

Theorem: There are 3 non abelian groups of order 12, up to isomorphism.
proof: #G = 12 = 2^2·3, so there are sylow subgroups H,K of orders 3,4. If there are 4 subgroups of order 3, hence 8 elements of order 3, there are only 4 elements left to form one group of order 4, so the sylow 4-subgroup is unique and normal. Hence at least one of the sylow subgroups is normal. If both sylow subgroups H,K are normal, G ≅ HxK, hence G is abelian. So if G is non abelian, only one sylow subgroup is normal.
Since HK = G, we have in all cases an isomorphism K x_c H-->G where c:H-->Aut(K) is a non constant homomorphism. (The constant homomorphism defines the trivial direct product, which is abelian.) If the 4-subgroup is normal, we have c:Z/3-->Aut(K), where K is either Z/4 or Z/2 x Z/2. Since the only homomorphism Z/3-->Aut(Z/4) ≅ Z/2 is constant, K must be Z/2 x Z/2. Then Aut(Z/2 x Z/2) ≅ S(3) has 2 elements of order 3, so there are two non constant maps c:(Z/3)-->Aut(K). Since one can show that Aut(Z/3) acts on the set of the resulting semi direct products by isomorphisms, and since Aut(Z/3) ≅ Z/2, the two non constant maps Z/3-->S(3) yield isomorphic groups K x_c H.
Thus there is only one non abelian group G ≅ (Z/2 x Z/2) x_c (Z/3) of order 12 with normal sylow 4-subgroup. In fact the group Tet = Isom(tetrahedron) has order 12, and 4 distinct subgroups of order 3, so must be this group. The action on the 4 vertices also embeds this group as A(4) in S(4), since that subgroup is generated in S(4) by the 8 elements of order 3.
If K = Z/3 is the normal subgroup, and H is the sylow 4-subgroup, we have a map H-->Aut(K) ≅ Aut(Z/3) = {±Id} ≅ Z/2. If H ≅ Z/4 there is only one non trivial map, taking [1] to -Id. So there is only one non abelian group of order 12 with Z/3 as normal subgroup and having a subgroup isomorphic to Z/4, i.e. one non trivial semi direct product (Z/3) x_c (Z/4).
I have not run across this group in geometry.
If K = Z/3 is the normal subgroup, and H ≅ Z/2 x Z/2 is the sylow 4-subgroup, then c:(Z/2 x Z/2)-->Aut(Z/3) = (Z/3)* ≅ Z/2, so there are three non constant maps, each taking two of the vectors (1,0), (0,1), (1,1) to 1 and the other vector to 0. But again Aut(Z/2 x Z/2) ≅ S(3) acts transitively on these maps. Hence all three resulting semi direct products are isomorphic, so there is really only one non abelian semi direct product of the form (Z/3) x_c (Z/2 x Z/2). Since the dihedral group D6 = Isom(hexagon) has order 12, seven elements of order 2, two elements of order 6, and two elements of order 3, it must be this group.
 
  • #305
day 4.3 algebra, normal and composition series for groups

Theorem:
G is solvable iff all subgroups and quotient groups of G are solvable iff there is one normal subgroup K such that both K and G/K are solvable.
proof:
I. Assume K is normal in G, and that both K and G/K are solvable. Thus we have normal series K = K0 > K1 > ... > Kn = {e}, and G/K = H0 > H1 > ... > Hm = {[e]}, and all quotients Ki/Ki+1 and Hj/Hj+1 are abelian. Then define a normal series for G by augmenting that for K by the pull back of that for G/K. I.e. let Gj = f^-1(Hj), where f:G-->G/K is the natural projection. Since the inverse image of a normal subgroup is also normal, all Gj are normal. Hence G = G0 > G1 > ... > Gm = K = K0 > ... > Kn = {e} is a normal series for G. The Ki/Ki+1 are still abelian, Gm/K0 = {e} is abelian, and for j < m, we have Gj/Gj+1 ≅ (Gj/K)/(Gj+1/K) ≅ Hj/Hj+1, which is abelian. That proves G solvable.
II. Next assume G solvable with abelian normal series G = G0 > G1 > ... > Gm = {e}, and let H be any subgroup. Define Hi = H∩Gi. Then Hi+1 is not necessarily normal in G, but it is normal in Hi. I.e. conjugating an element y of Hi+1 = H∩Gi+1 by an element x of Hi = H∩Gi is conjugating by an element of Gi, and Gi+1 is normal in Gi, hence xyx^-1 lies in Gi+1. But x and y also lie in H, so xyx^-1 lies in H. I.e. for all x in Hi and y in Hi+1, xyx^-1 lies in H∩Gi+1 = Hi+1.
Now Hi/Hi+1 = (H∩Gi)/(H∩Gi+1), so if we map H∩Gi into Gi/Gi+1, the kernel is precisely H∩Gi+1. Hence Hi/Hi+1 is isomorphic to a subgroup of Gi/Gi+1, hence is also abelian. Thus H is solvable.
III. Assume G is solvable with abelian normal series G = G0 > G1 > ... > Gm = {e}, K is normal in G, and consider G/K. Define Hi = (KGi)/K. Since the classes of elements of K are trivial, each class in Hi can be represented as [x] for some x in Gi, and similarly each class in Hi+1 can be represented as [y] for some y in Gi+1. Thus [x][y][x]^-1 = [xyx^-1] is in Hi+1, since xyx^-1 is in Gi+1. Hence Hi+1 is normal in Hi.
Now consider Hi/Hi+1 = (KGi/K)/(KGi+1/K) ≅ (KGi)/(KGi+1). Then map Gi-->KGi-->(KGi)/(KGi+1). The composition map f is onto, since again every class [y] in the quotient can be represented by an element y of Gi. Since the subgroup Gi+1 of Gi goes to the identity under this composition, there is an induced map [f]:(Gi/Gi+1)-->(Hi/Hi+1), which is still surjective. Since Hi/Hi+1 is thus isomorphic to a quotient of the abelian group Gi/Gi+1 by the kernel of [f], the quotient Hi/Hi+1 is also abelian. QED.

Composition series
Notice that if G is a cyclic group Z/(mn) of order mn, then G has a cyclic subgroup of order n generated by [m], whose quotient is cyclic of order m. Hence a cyclic group of order n = ∏ pi^ri has a maximal, non redundant, normal series whose quotients are of prime order, equal to the prime factors of n. Thus every maximal non redundant normal series for G has the same quotients, up to isomorphism, but possibly in a different order. That this also holds for non abelian groups is called the Jordan-Hölder theorem.

Definition: A composition series for a group G is a normal series G = G0 > G1 > ... > Gm = {e}, in which every quotient group Gi/Gi+1 is simple but not trivial (thus a maximal, non redundant, normal series).

Theorem (Jordan-Hölder): If a finite group G has two composition series, then they have the same length, and after some renumbering of the quotients, the two sequences of simple quotients are the same, up to isomorphism.
proof: By induction on the order of G, prime order being trivial. Let
G > G1 > ...>Gm = {e}, and G > H1 > ...>Hn = {e}, be composition series for G.
case I. G1 = H1. Then we are done by induction, since the groups G1 = H1 have smaller order, so their composition series have the same length and isomorphic quotients, in some order.
case II. G1 and H1 are different. Then both G1, H1 are maximal proper normal subgroups of G, so their product G1H1 is normal and larger than either, hence G1H1 = G. We want to construct another composition series for G to reduce to case I. Look at G1∩H1. This is not equal to either G1 or H1 and is normal in both, so call it K2, and construct a composition series K2 > K3 > ... > Ks for it.
Then we have two new composition series for G: G > G1 > K2 > ... > Ks, and G > H1 > K2 > ... > Ks. To check this, we only have to show that both G1/K2 and H1/K2 are simple and non trivial. But G1/K2 = G1/(G1∩H1) ≅ G1H1/H1 = G/H1 is simple. Same for H1/K2.
Now case I tells us that m = s, and the composition series
(A) G > G1 > ... > Gm and (B) G > G1 > K2 > ... > Ks have isomorphic quotients. Also n = s, and the series (C) G > H1 > K2 > ... > Ks and (D) G > H1 > ... > Hn have isomorphic quotients. Since G1/K2 ≅ G/H1 and H1/K2 ≅ G/G1, series (B) and (C) also have isomorphic quotients. Hence the same holds for series (A) and (D), as desired. QED.

Corollary: A group G is solvable if and only if in every composition series for G, all the simple quotients are cyclic of prime order. [Necessarily the orders of the quotients are the primes in the prime factorization of #G.]

Corollary: Prime factorization of integers is unique, up to order of factors.
proof: A prime factorization of n gives a composition series for Z/n. QED.

Free groups and free products of groups.
We noted that given two maps g:G-->K, h:H-->K of groups, setting (gh)(x,y) = g(x)h(y) may not define a map GxH-->K, since elements of G commute with elements of H, but their images in K may not commute. Since we have no restriction on the elements of K, in order to construct a group from which the maps g,h always define a map to K, we must allow no commutativity in our "product" at all. Let G = H = Z, the simplest case. Call a the generator of the first copy of Z, and b the generator of the second copy. Since the only relations we allow are those forced by the group laws, we must allow ababab and a^2bab^-3ab, and so on, all to be different elements. So we define a "word" constructed from the letters a,b to be any finite sequence of powers of a and b, e.g. a^r1 b^s1 a^r2 b^s2 ..., where the exponents can be any integers. The sequence where all powers are zero is the identity element, and words are multiplied by juxtaposition. When we juxtapose two words, the powers of a and b may not alternate, so we combine adjacent powers of the same letter. The trivial word has only zero exponents. A non trivial word is reduced if it has no adjacent powers of the same letter and no zero exponents. We also consider the trivial word to be reduced, and write it as e.
Clearly inverses exist, and the trivial word is the identity since e = a^0 = b^0. Associativity is not so easy, but there is a nice proof in Mike Artin's book, which I copy.
Artin calls a word a finite sequence of the elements a, b, a^-1, b^-1, and a reduction of a word is obtained by a cancellation of some adjacent pair of the form xx^-1, or by a sequence of such cancellations. A reduced word is one in which no cancellations are possible. The main point is that starting from any word and performing cancellations, there is only one possible reduced result. This is true if the word is already reduced, for example if it has length zero or one, so use induction on the length of the word. If a word is not reduced it must contain a pair of the form xx^-1. If we cancel this pair first, the induction hypothesis says there is only one possible reduced result for this word. If we perform some other sequence of cancellations and eventually cancel this pair, we might as well have canceled it first, and the same result holds. If on the other hand we cancel one of these elements but not both, we must do it as follows: by canceling the first two of x^-1xx^-1, or the last two of xx^-1x. Either way, the result is the same as if we had canceled the original pair, so the argument in that case holds. QED.

Definition: The set of reduced words in the letters {a,b}, with the operation of juxtaposition followed by reduction, is the free group on those two letters. The empty word is the identity, written e = a^0 = b^0. We shorten the notation by using higher exponents for adjacent occurrences of the same letter.
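Artin's uniqueness argument is exactly what a stack computes (a minimal sketch; the encoding of letters as (symbol, exponent) pairs is my own):

```python
def reduce_word(word):
    """Cancel adjacent inverse pairs x x^-1; a stack finds the unique reduced form."""
    stack = []
    for letter, sign in word:            # each letter is e.g. ('a', 1) or ('a', -1)
        if stack and stack[-1] == (letter, -sign):
            stack.pop()                  # cancel with the previous letter
        else:
            stack.append((letter, sign))
    return stack

# a b b^-1 a^-1 a b reduces to a b
w = [('a', 1), ('b', 1), ('b', -1), ('a', -1), ('a', 1), ('b', 1)]
print(reduce_word(w))                    # [('a', 1), ('b', 1)]
```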

Exercise: Associativity follows easily from the (obvious) associativity on the set of unreduced words, by the uniqueness result above for reduction.

Definition: The free product of two copies of Z is defined as the free group Fr(a,b) on two letters. It is easy to see that any two group maps f:Z-->K and g:Z-->K define a unique group map (fxg):Fr(a,b)-->K.

There is one plausible result that is not too hard.
Theorem: If G = Fr(a,b) is the free group on two letters, and G' is the commutator subgroup, the quotient G/G' ≅ Z x Z, the free abelian group on two letters. [hint: prove they have the same mapping property.]

Remark: The same construction, with the same proof, defines a free group on any set of letters, and proves the existence of a free product of any number of copies of the group Z. It follows that every group G is the image of a homomorphism from a free group Fr(S)-->G, but it is hard to make much use of this. I.e. unlike the abelian case, the free groups, even on two letters, are very complicated and hard to understand.

Theorem: 1) Every free group on any finite or countable number of letters, is a subgroup of Fr(a,b).
2) Conversely, every subgroup of Fr(a,b) is isomorphic to a free group on a finite or countable set of letters. [look at π1(figure eight).]
"proof": If X(n) = the wedge of n circles (so X(2) is the "figure eight"), then X(n), n >= 2, is a covering space of X(2), and π1(X(n)) ≅ Fr(a1,...,an); the same holds for X(infinity). This proves 1), by the homotopy lifting property. For 2), any subgroup of π1(X(2)) defines a covering space Y whose π1 is that subgroup. But the figure eight is a graph, and every covering space of a graph is again a graph (a 1-dimensional complex), hence homotopy equivalent to a wedge of circles, so its π1 is again free. qed.
 
  • #306
mathwonk said:
jasonrox: i tried to make it clear that a math dept may be interested in a very talented person, degree or not, but a grad school will not want to accept that person, and with good reason. you have seized on one phrase in my long statement and taken it out of context. read it all. i am not advising or encouraging anyone to seek entrance to grad school without a degree.

no it is unlikely you can get in and unwise to try.

i'll give you one successful example: barry mazur has no undergrad degree. he's the guy andrew wiles sent his manuscript on fermat's last theorem to, to check it. and presumably wiles was unsure about it, since it was indeed wrong at that point.

but most of the rest of us are not at all like barry. and besides, barry had all the academic requirements and time spent in school; he just refused to attend required ROTC. the school (MIT) afterwards seems to have eliminated the requirement, probably as a result of the subsequent embarrassment at having denied barry mazur a degree.

I read the whole thing. I just wanted to check if that's what you said.

I'm aware that people like Barry are not common... at all.
 
  • #307
day 3 algebra, canonical forms of matrices

day 3 was normal forms of matrices, and the matrices will undoubtedly not load, but maybe some of the algebra still has interest.

8000 Fall 2006 Day 3.
Canonical forms of matrices, the power of Cramer's rule

Our decomposition theorem gives us a standard model in each isomorphism class of finitely generated torsion k[X] modules. This will be used next to provide a standard matrix representative for each conjugacy class, or similarity class as it is usually called, in the ring Matn(k), of n by n matrices over any field k.
Recall that a linear map T:V--->V on a k vector space V, provides a unique k algebra map k[t]--->Endk(V), sending t to T, and hence f(t) to f(T), and hence a unique k[t] module structure on V. We will denote Endk(V) simply by End(V) in this chapter for brevity, since we will not be concerned with the larger ring of group endomorphisms.
Conversely, a k[t] module structure on V singles out a unique linear map T, the image of t under the map k[t]--->Endk(V). Thus k[t] module structures on V are in natural bijection with the elements of End(V). We want to ask what equivalence relation is imposed in this way on End(V) by considering isomorphism classes of modules.

Note that if f:(V,T)--->(V,S) is a k[t] module isomorphism, then f is a k isomorphism that takes multiplication by (i.e. application of) T into multiplication by S. Thus f(Tv) = S(fv) for every v in V. Since f is an isomorphism this implies Tv = (f^-1Sf)v for every v. Hence S and T are conjugate by the isomorphism f.
Conversely, these equations show that if T = f^-1Sf, then T and S define isomorphic k[t] modules via the isomorphism f. Thus isomorphism classes of k[t] module structures on V correspond to conjugacy classes of endomorphisms via the action of Aut(V) on End(V).

Hence when V has finite k dimension, our canonical models of each k[t]-isomorphism class translate into canonical representatives of each conjugacy class in End(V). Recall each finitely generated torsion k[t] module (V,T) has a model V ≅ k[t]/f1 x ... x k[t]/fm, where each fi is a monic polynomial in k[t], and fi divides fi+1.
Under the isomorphism (V,T) ≅ k[t]/f1 x ... x k[t]/fm, the linear map T:V--->V, i.e. multiplication by T, becomes multiplication by the variable t on each factor of k[t]/f1 x ... x k[t]/fm. Hence if we choose a natural k basis for this model vector space, the resulting matrix for t will give a natural matrix representing T in some corresponding k basis for V.

A k-basis for k[t]/f1 x ... x k[t]/fm can be obtained as the union of bases for each factor space k[t]/fi, and the simplest basis for k[t]/fi is {1, t, t^2, ..., t^(ri-1)}, where fi has degree ri. If f = a0 + a1t + ... + a_(r-1)t^(r-1) + t^r,

the matrix of t in this basis is the r by r matrix

[ 0 0 ... 0 -a0      ]
[ 1 0 ... 0 -a1      ]
[ 0 1 ... 0 -a2      ]
[ . . ... . ...      ]
[ 0 0 ... 1 -a_(r-1) ]

where the jth column is the coefficient vector of t times the jth basis vector. E.g. t(1) = 0(1) + 1(t) + 0(t^2) + ... + 0(t^(r-1)) gives the first column.

This is called a cyclic basis, since the linear map carries each basis vector to the next one, except for the last one, which is carried to a linear combination of the basis by means of scalars which are precisely minus the coefficients of the polynomial f. This is called a companion matrix Cf for f. [Other versions of it in other books may have the coefficients of f along the bottom, and the 1's above the diagonal.] Note that if v1,...,vn is one cyclic basis for (V,T) then for any c != 0, cv1,...,cvn is another, so cyclic bases are never unique.
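A quick numerical check that the characteristic polynomial of a companion matrix recovers f (a minimal sketch with numpy; the particular cubic is my own choice):

```python
import numpy as np

# companion matrix of f(t) = t^3 - 2t^2 + 3t - 5, i.e. a0 = -5, a1 = 3, a2 = -2
a = [-5.0, 3.0, -2.0]
n = len(a)
C = np.zeros((n, n))
C[1:, :-1] = np.eye(n - 1)          # each basis vector maps to the next one
C[:, -1] = [-ai for ai in a]        # last column: minus the coefficients of f
print(np.poly(C))                   # [ 1. -2.  3. -5.], the coefficients of f
                                    # (up to floating point round-off)
```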

If f1,...,fm is the sequence of polynomials defining the module (V,T), the full matrix for T using the cyclic bases for each factor is the block diagonal matrix

[ Cf1           ]
[     Cf2       ]
[         ...   ]
[           Cfm ]

where there are zeroes away from the blocks Cfi.

Summarizing, we have the following.
Theorem: If V is a vector space of finite dimension n over a field k, and T is any linear endomorphism of V, there exist bases for V in which the matrix of T is composed of one or more blocks, each block being a companion matrix for a monic k polynomial fi.
The sum of the degrees of the fi equals n, and we may choose them so each fi divides fi+1. If we do this, then two maps S,T of V are conjugate if and only if they have exactly the same matrix of companion blocks. There is exactly one companion matrix block Cf for each factor k[t]/(f) in the standard decomposition of the k[t] module structure for (V,T). The block Cf has dimension deg(f) by deg(f).

Terminology: We call the unique matrix of this type associated to T, the rational canonical matrix for T.
Two natural questions remain:
1) how do we find the canonical form for a given matrix? and
(more difficult):
2) how do we find a basis that puts a given matrix into canonical form?
A third question is:
3) is there a simpler canonical matrix in cases where the polynomials fi are particularly simple, e.g. when they all factor into linear factors over k?

Before addressing these questions, we derive some useful consequences of the results we already have. For example we can already compute the important invariant ∏fi of the module (V,T), using determinants. Briefly, we claim this product is the "characteristic polynomial" of T: ∏fi = det[tI-T] = chT(t). Since fm is the annihilator of the module (V,T), this implies the Cayley-Hamilton theorem: chT(T) = 0.

Before proving this, we recall without proof the basic theory of determinants, including Lagrange's formulas for expanding them along any row or column, and the resulting "Cramer's rule".
Review of determinants.
If A = [aij] is an n by n matrix over a commutative ring, denote by Aij the (n-1) by (n-1) matrix obtained from A by deleting the ith row and jth column. Then Lagrange's formulas say: for each fixed value of i, det(A) = Σj (-1)^(i+j) aij det(Aij) (expansion by the ith row), and for each fixed value of j, det(A) = Σi (-1)^(i+j) aij det(Aij) (expansion by the jth column).
Thus if we define adj(A) = the adjoint of A, as the matrix whose (i,j) entry equals (-1)^(i+j) det(Aji), i.e. as the transpose of the matrix of signed determinants of the Aij, it follows that the matrix products adj(A).A = A.adj(A) both equal the diagonal matrix det(A).I, whose entries along the diagonal are all equal to det(A).
Thus if det(A) is a unit in the ring of coefficients, then A is an invertible matrix with inverse equal to (det(A))^-1.adj(A). Since for any two n by n matrices A, B we always have det(AB) = det(A)det(B), the converse is also true. I.e. AB = I implies det(A)det(B) = det(I) = 1, so both det(A) and det(B) are units. Thus the equation adj(A).A = A.adj(A) = det(A).I yields a formula for the inverse of an invertible A, and hence Cramer's rule for solving invertible systems AX=Y.
Cramer's formula also implies that a matrix and its transpose have the same determinant. I.e. since the transpose of the adjoint is the adjoint of the transpose, taking the transpose of the equation adj(A).A = A.adj(A) = det(A).I gives (det(A^t)).I = A^t.adj(A^t) = adj(A^t).A^t = (det(A).I)^t = det(A).I, the last because the diagonal matrix det(A).I is symmetric.
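These formulas are easy to test directly (a minimal sketch over the rationals; the helper names and the sample matrix are my own):

```python
from fractions import Fraction

def minor(M, i, j):
    # delete row i and column j
    return [row[:j] + row[j+1:] for k, row in enumerate(M) if k != i]

def det(M):
    if len(M) == 1:
        return M[0][0]
    # Lagrange expansion along the first row
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def adj(M):
    n = len(M)
    # (i,j) entry is the (j,i) signed cofactor: transpose of the cofactor matrix
    return [[(-1) ** (i + j) * det(minor(M, j, i)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[Fraction(x) for x in row] for row in [[2, 1, 0], [1, 3, 1], [0, 1, 2]]]
d = det(A)
P = matmul(adj(A), A)
# adj(A).A should be the diagonal matrix det(A).I
print(all(P[i][j] == (d if i == j else 0) for i in range(3) for j in range(3)))  # True
```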

Define: the characteristic polynomial of a linear map T on a finite dimensional space chT(t) = det([tI-A]) where A is any matrix for T.

By the previous remarks, a matrix A and its transpose A^t have the same characteristic polynomial.

Note: If A,B are two matrices for T, then A and B are conjugate, i.e. B = C^-1AC for some invertible C. Then since det(B) = det(C^-1AC) =
det(C^-1)det(A)det(C) = det(A)det(C^-1)det(C) = det(A), we see A and B have the same determinant. Similarly, [tI-A] and C^-1[tI-A]C =
[C^-1tIC - C^-1AC] = [tI-B] have the same determinant, since t.I commutes with every matrix C. Hence the characteristic polynomial of T is well defined by any matrix for T. It is easy to see the constant term of chA(t) is ±det(A), and the coefficient of t^(n-1) is minus the trace of A (minus the sum of the diagonal entries).

Exercise: If Cf is a companion matrix for the monic polynomial f, then ch(Cf) = f. [hint: use induction and expand across the first row.]
One can see immediately that the trace of Cf is -a_(n-1).

Corollary (Cayley-Hamilton): If T is any linear transformation, then chT(T) = 0. In particular a matrix satisfies its characteristic polynomial.
proof: The annihilator ideal of the cyclic module R/I, where I is any ideal of the ring R, equals I. In particular the annihilator ideal of k[t]/(f) is (f). Hence the annihilator of the module k[t]/f1 x ... x k[t]/fm, where fi divides fi+1, is (fm). I.e. the smallest degree monic polynomial f such that multiplication by f is 0 on this module is fm. If this module represents (V,T), then the minimal polynomial of T is fm, and we just showed the characteristic polynomial of T is the product ∏fi. So the minimal polynomial of T divides its characteristic polynomial, which implies the corollary. QED.

Note: Since every factor fi divides fm, this proof shows that every irreducible factor of chT(t) is an irreducible factor of the minimal polynomial mT(t) (and vice versa). Moreover, for a cyclic or companion matrix, the minimal and characteristic polynomials are equal. This is the analog of the fact that for a cyclic group Z/nZ, the order n of the group equals the annihilator of the group.
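A numerical spot check of Cayley-Hamilton (a sketch; the random integer matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 4)).astype(float)
coeffs = np.poly(A)                 # characteristic polynomial, leading coefficient 1
n = len(coeffs) - 1
# evaluate ch_A(A) using matrix powers
chA = sum(c * np.linalg.matrix_power(A, n - i) for i, c in enumerate(coeffs))
print(np.allclose(chA, 0))          # True: A satisfies its characteristic polynomial
```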

Example: A nilpotent matrix A is a square matrix such that A^m = 0 for some m. If A is nilpotent, it follows that A^n = 0, where n is the dimension of the matrix A. Since all coefficients ai of the characteristic polynomial of a nilpotent matrix are 0 except the leading one, each fi = t^(ri), so the rational canonical form of a nilpotent matrix consists of companion blocks of the form

[ 0 0 ... 0 0 ]
[ 1 0 ... 0 0 ]
[ 0 1 ... 0 0 ]
[ . . ... . . ]
[ 0 0 ... 1 0 ]

with 1's just below the diagonal and 0's elsewhere.

The reader should verify this matrix is nilpotent.


Direct proof of Cayley Hamilton:
Cramer's rule implies the Cayley Hamilton theorem directly, without using the decomposition theorem, or the rational canonical form, as follows. Let [tI-A] be the characteristic matrix for A, with coefficients in k[t], and substitute t = A into this matrix, obtaining an n by n matrix with coefficients in the subring k[A], of Matn(k).
This may be viewed as defining a linear map on the product space (k^n) x...x (k^n), a product of n copies of k^n. Note this is not the same as substituting t = A into tI-A viewed as a polynomial with matrix coefficients, as that would give A.I-A = 0. Our result instead is the following n by n matrix M:

M = [δij·A - aij·I], the n by n matrix over k[A] whose diagonal entries are A - aii·I and whose off diagonal (i,j) entries are -aij·I. Now take the transpose of this,

M^t = [δij·A - aji·I],

and apply it to the column of vectors (e1,...,en) in (k^n)^n. The ith entry of the result is A(ei) - Σj aji·ej. By definition of the entries of A we have A(ei) = Σj aji·ej, so this yields M^t(e1,...,en) = (0,...,0). Now multiply

M^t from the left by adj(M^t) = (adj(M))^t. By Cramer's rule, adj(M^t)·M^t =

ch(A^t)(A)·I = chA(A)·I, which therefore annihilates the vector (e1,...,en). Hence chA(A)(ei) = 0 for

each i, so chA(A) = 0. QED.

Note: This proves the minimal polynomial divides the characteristic polynomial, but does not show they have the same irreducible factors.
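A concrete instance of the gap between the two polynomials: for the 2 by 2 identity matrix the characteristic polynomial is (t-1)^2 while the minimal polynomial is just t-1; they differ, but share the single irreducible factor t-1. A sketch:

Code:
import sympy as sp

t = sp.symbols('t')
A = sp.eye(2)
assert sp.expand(A.charpoly(t).as_expr() - (t - 1)**2) == 0   # ch_A(t) = (t-1)^2
assert A - sp.eye(2) == sp.zeros(2, 2)                        # already t-1 kills A, so m_A(t) = t-1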
 
  • #308
day 3.2, canonical forms of matrices

The canonical presentation of (k^n, A) by the characteristic matrix of A.

Next we ask how to find the rational canonical form of a given n by n matrix A over a field k. Since it is determined by the cyclic decomposition of the k[t] module (k^n,A), it suffices to diagonalize any presentation matrix for this module. So we look for a matrix M of polynomials in k[t] whose cokernel is isomorphic to (k^n, A) as a k[t]-module. Perhaps not surprisingly, it is given by the only k[t] matrix we know, the characteristic matrix [tI-A].
It is easy to find an explicit sequence of k[t] generators for (k^n,A), since e1,...,en are k generators, hence also k[t] generators, of k^n. The map (k[t])^n--->k^n, sending Ei to ei, where E1 = (1,0,...,0) in (k[t])^n, and e1 = (1,0,...,0) in k^n, is thus a surjective k[t] module map, where Σ fi(t)Ei in (k[t])^n goes to Σ fi(A)ei in k^n.

The next theorem is our main result.
Theorem: Given an n by n matrix A over a field k, defining a k[t] module structure on k^n, the k[t] module map (k[t])^n--->k^n, sending Σ fi(t)Ei to
Σ fi(A)ei, is surjective. Its kernel is a free k[t] module of rank n generated by the columns of [tI-A], the characteristic matrix of A. I.e. the following sequence of k[t] modules is exact: 0--->(k[t])^n--->(k[t])^n--->k^n--->0, where the left map is multiplication by [t.I-A].

Remark: This will follow from a version of the wonderful "root factor" theorem.

As a corollary of the theorem above we get another proof of
Cayley Hamilton: If the k[t] module (k^n, A) is isomorphic to the product (k[t]/f1) x ...x (k[t]/fm), in standard form, i.e. where fi divides fi+1, then the minimal polynomial of A is fm and the characteristic polynomial is the product ∏fi.
proof: Since [tI-A] is a presentation matrix for this module, there exist invertible matrices P, Q over k[t] such that P[tI-A]Q is diagonal, with later diagonal entries equal to the fi, and earlier diagonal entries = 1.
Hence det(P)chA(t)det(Q) = ∏fi. Since P, Q are invertible over k[t], their determinants are units in k[t], hence non zero constants in k. Since chA(t) is monic, the coefficient of the leading term on the left equals det(P)det(Q). Since the product ∏fi on the right is also monic, det(P)det(Q) = 1, hence chA(t) = ∏fi. QED.
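To see this in a computation, one can pick the invariant factors by hand, say f1 = t-1 and f2 = (t-1)(t-2) with f1 dividing f2, build A as a block sum of their companion matrices, and check both claims (a sketch):

Code:
import sympy as sp

t = sp.symbols('t')
f1 = t - 1                                 # companion block: the 1x1 matrix [1]
f2 = sp.expand((t - 1)*(t - 2))            # t^2 - 3t + 2, companion block [[0, -2], [1, 3]]
A = sp.Matrix([[1, 0,  0],
               [0, 0, -2],
               [0, 1,  3]])
assert sp.expand(A.charpoly(t).as_expr() - f1*f2) == 0   # characteristic polynomial = product of the fi
assert A**2 - 3*A + 2*sp.eye(3) == sp.zeros(3, 3)        # f2(A) = 0, so the minimal polynomial is f2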

Note the analogy here with the structure of finite abelian groups. If G is an abelian group isomorphic to (Z/n1) x ...x (Z/nr), where ni divides ni+1, then nr is the annihilator of G, (it generates the principal annihilator ideal), and the cardinality of the group G is ∏ni. In both cases it is hard to compute the precise annihilator, but we can compute a multiple of it more easily, i.e. in one case the order of the abelian group, and in the other the characteristic polynomial of the matrix. In both cases the computable element has the same prime factors as the annihilator.

Next we recall the root - factor theorem, and apply it to prove the theorem above, that the characteristic matrix of A gives a presentation for the k[t] module (k^n, A). We also get another proof of Cayley Hamilton.

Polynomials with non commutative coefficients: If R is any ring, not necessarily commutative, define the polynomial ring R[t] as usual, but where powers of t commute with all coefficients in R, although the coefficients may not commute among themselves.
Hence f(t) = Σ ait^i = Σ t^iai, but if we set t = c, where c is in R, it makes a difference whether we set t = c in the first or the second of these expressions. We call fr(c) = Σ aic^i the right value of f at c, and fl(c) = Σ c^iai the left value of f at c.

Remainder theorem: If f(t) is a polynomial in R[t], then we can write f(t) = (t-c)q(t) + fl(c) = p(t)(t-c) + fr(c), i.e. we can divide f(t) by (t-c) from the left, with remainder the left value of f at c, and similarly from the right. The quotients and remainders are unique if we require the remainder belong to R.
proof: We do it for left evaluations and left division. This is the binomial theorem, i.e. replace t in f(t) by (t-c)+c and expand. In each term t^iai we get terms all but the last of which have a factor of (t-c), i.e.
t^iai = [(t-c)+c]^i ai = [(t-c)qi(t) + c^i] ai. Thus f(t) = Σ t^iai = (t-c)Q(t) + Σ c^iai, and we see the remainder is indeed the left evaluation of f at c.
This proves existence. For uniqueness, assume f(t) = (t-c)q(t)+r = (t-c)p(t)+s, where r, s belong to R. Then (t-c)[q(t)-p(t)] = s-r. Thus the left hand side also belongs to R. But multiplication by (t-c) raises the degree by one, so the left hand side has degree >= 1 unless [q(t)-p(t)] = 0, and then also s-r = 0. Hence both quotient and remainder are unique. QED.


Corollary: If f(t) is any polynomial in R[t], f is left divisible by (t-c) if and only if fl(c) = 0. Similarly for right divisibility.
proof: The expression we gave shows that f(t) = (t-c)q(t) + fl(c). Hence if fl(c) = 0, then f is left divisible by (t-c). Conversely, if f is left divisible by (t-c), uniqueness shows the remainder, which is zero, must equal fl(c), so fl(c) = 0. QED.
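These left divisions are easy to carry out mechanically. Below is a sketch of a left division routine for polynomials with 2 by 2 matrix coefficients (the function name and test data are made up for illustration), verifying that the remainder on dividing f(t) by (t - C) from the left is the left value fl(C) = Σ C^i ai:

Code:
import sympy as sp

def left_divide(coeffs, C):
    # divide f(t) = sum t^i a_i by (t - C) from the left, f = (t - C)q(t) + r;
    # coeffs = [a_0, ..., a_d]; synthetic division, multiplying by C on the LEFT
    n = C.shape[0]
    d = len(coeffs) - 1
    q = [sp.zeros(n, n)] * d
    r = coeffs[d]
    for i in range(d - 1, -1, -1):
        q[i] = r
        r = C * r + coeffs[i]
    return q, r

C = sp.Matrix([[1, 2], [0, 1]])
a0 = sp.Matrix([[0, 1], [1, 0]])
a1 = sp.Matrix([[1, 1], [1, 1]])
a2 = sp.eye(2)
q, r = left_divide([a0, a1, a2], C)
assert r == a0 + C*a1 + C**2*a2            # remainder = left value fl(C)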

Next, to apply these results about divisibility of polynomials to products of matrices, we prove that matrices with polynomial entries are the same thing as polynomials with matrix coefficients.

Lemma: If k is a field, the non commutative ring Matn(k[t]) of n by n matrices with entries from k[t], is isomorphic to Matn(k)[t], the ring of polynomials with coefficients in the non commutative ring Matn(k).
proof: Just as with commutative rings, a ring map R[t]-->S is obtained from a ring map R--->S plus a choice of element in S to send t to, only this time, since t commutes with R in R[t], we must choose as image of t, an element that commutes with the image of R in S. So we map Matn(k) into Matn(k[t]) by viewing scalar matrices as polynomial matrices, and then send t to the matrix t.I, which is in the center of Matn(k[t]), i.e. it commutes with everything. It is an exercise to check this ring map is injective and surjective. QED.

It follows that if we have two matrices of polynomials and we multiply them as matrices, we get the same result by viewing them as polynomials with matrix entries, and multiplying them as polynomials.

Corollary: Cayley Hamilton. A square matrix A over a commutative ring R is a root of its characteristic polynomial chA(t).
proof: By Cramer's rule, we have (tI-A).adj(tI-A) = chA(t).I, as products of matrices. Then it holds also as products of polynomials. Setting t = A gives zero on the left, hence also on the right side. I.e. if chA(t) = Σ t^ici, where the ci belong to R, then chA(t).I = (Σ t^ici).I = Σ t^i(ci.I). Thus setting t = A gives 0 = Σ A^i(ci.I) = Σ A^i(ci) = Σ ciA^i = chA(A). QED.
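Sympy can confirm the Cramer's rule identity (tI-A).adj(tI-A) = chA(t).I used in this proof, for a sample matrix (a sketch; adjugate() is sympy's name for the classical adjoint):

Code:
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[1, 2], [3, 4]])
M = t*sp.eye(2) - A                        # the characteristic matrix
lhs = (M * M.adjugate()).expand()
rhs = (A.charpoly(t).as_expr() * sp.eye(2)).expand()
assert lhs == rhs                          # (tI - A).adj(tI - A) = ch_A(t).I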

If in the lemma above, we think of the matrix on the left acting individually on each column vector of the matrix on the right, we can also consider matrices of polynomials acting on column vectors of polynomials, as multiplication from the left of polynomials with matrix coefficients, times polynomials with column vector coefficients. I.e. the lemma also holds with the same proof for polynomials with coefficients in any ring R with identity, acting from the left on polynomials with coefficients in any (unitary) left module over R.

So let k^n[t] denote polynomials whose coefficients are column vectors from k^n. This is not a ring; in particular the coefficients do not include an element 1, so this object does not contain t. But the coefficients do contain the basic vectors ei, and we can multiply these by polynomials over k and add up. In particular this object is a k[t] module, and is isomorphic as such to the free k[t] module (k[t])^n.
I.e. if Ei are the standard free k[t] basis vectors in (k[t])^n, just send Ei to ei, and Σ fiEi to Σ fiei, where the fi are polynomials in k[t]. The expression Σ fiei can be re - expanded as a polynomial in t with vector coefficients by expanding each term as f.ei = (a0 + a1t + ... + ant^n)ei = (a0ei + t(a1ei) + ... + t^n(anei)), and then combining coefficients of like powers of t from the various terms, to get coefficient vectors.

Exercise: Show this gives a k[t] module isomorphism (k[t])^n--->k^n[t].
As we have remarked above, the previous lemma shows multiplication of matrices corresponds to multiplication of polynomials, i.e. the isomorphisms above give isomorphisms of multiplication diagrams, with matrix multiplication Matn(k[t]) x (k[t])^n--->(k[t])^n corresponding to polynomial multiplication Matn(k)[t] x k^n[t] ---> k^n[t].

Now we can prove the main presentation theorem.
Theorem: Given any n by n matrix A over a field k, defining a k[t] module structure on k^n, the k[t] module map (k[t])^n--->k^n, sending
Σ fi(t)Ei to Σ fi(A)ei, is surjective, and its kernel is a free k[t] module, freely generated by the columns of [tI-A], the characteristic matrix of A. I.e. this sequence is exact: 0--->(k[t])^n--->(k[t])^n--->k^n--->0, as k[t] - modules, where the left map is multiplication by [tI-A].
proof: We know the last map is surjective.
Recall the right map takes Σ fi(t)Ei to Σ fi(A)ei, which is exactly the result of viewing Σ fi(t)Ei as a polynomial Σ fi(t)ei with coefficient vectors in k^n, and then setting t = A. So if we view these as maps of polynomials k^n[t]--->k^n[t]--->k^n--->0, the right map k^n[t]--->k^n is left evaluation of a polynomial f(t) with vector coefficients at t = A. By the root factor theorem above, this equals zero if and only if the polynomial f(t) is left divisible by (t-A), i.e. if and only if f(t) is in the image of the left hand map k^n[t]--->k^n[t].
Since multiplication by a monic polynomial never sends a non zero polynomial to zero, the left map is injective. Hence the sequence
0--->(k[t])^n--->(k[t])^n--->k^n--->0 is exact, and (tI-A) is indeed a presentation matrix for the module (k^n,A). QED.
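One can at least verify the easy half computationally, namely that each column of [tI-A] maps to zero under evaluation at t = A (a sketch with an arbitrary 3 by 3 matrix):

Code:
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[2, 1, 0], [0, 3, 1], [1, 0, 1]])
n = A.shape[0]
I = sp.eye(n)
charmat = t*I - A

# the map sends sum_i f_i(t) E_i to sum_i f_i(A) e_i; apply it to each column of tI - A
for j in range(n):
    image = sp.zeros(n, 1)
    for i in range(n):
        f = sp.Poly(charmat[i, j], t)
        fA = sum((c * A**k for (k,), c in f.terms()), sp.zeros(n, n))   # f_i evaluated at A
        image += fA * I[:, i]                                           # f_i(A) applied to e_i
    assert image == sp.zeros(n, 1)          # so the columns of tI - A lie in the kernel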

The following amazing theorem generalizes the fact that a surjective endomorphism of a finite dimensional vector space is also injective.
Theorem: If R is any commutative ring and X a finitely generated R module, any surjective R module map f:X--->X is an isomorphism.
proof: This follows from the proof of Cayley Hamilton. If x1,...,xn are generators and if we write f(xj) = Σi aij xi, then as in a previous proof the matrix A = [aij] represents f for the generators {xi}, even if they are not independent. Look at the matrix M = [δij.f - aij.Id], with entries in the commutative ring R[f]. Again the transpose

Mt = [δij.f - aji.Id] annihilates the column of vectors (x1,...,xn), since its i-th row applied to that column gives f(xi) - Σj aji xj = 0.

Again the determinant of tI-A is a polynomial P(t) over R annihilating the matrix A and hence the map f: multiplying Mt from the left by adj(Mt) shows P(f) kills each generator xi, hence all of X. As a small refinement: note if the image f(X) of the map f lies in the submodule IX, for some ideal I of R, then we can choose the entries aij to belong to I, and looking at the determinant formula for P shows the coefficient of t^i in P(t) belongs to the power I^(n-i) of the ideal I, where n = degree of P(t).
Now apply the principle just proved, not to f, but to the map
Id:X--->X, where X is viewed not as an R module, but as an R[t] module where t acts via t = f. Then the image of Id is all of X, which equals (t)X, the product of X by the ideal (t) in R[t]. Hence we have a polynomial satisfied by Id as follows: Id^n + c1f.Id^(n-1) + ... + c(n-1)f^(n-1).Id + cnf^n = 0, where each cif^i belongs to the ideal (f) in R[f]. But we can solve this for Id, getting Id = -[c1f.Id^(n-1) + ... + c(n-1)f^(n-1).Id + cnf^n] =
f.[-c1.Id^(n-1) - ... - c(n-1)f^(n-2).Id - cnf^(n-1)]. The polynomial expression on the right is a right inverse for f, and since all its terms are polynomials in f, it commutes with f, hence is also a left inverse. QED.
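For a concrete case take R = Z and X = Z^2: a surjective Z-linear map Z^2--->Z^2 must have determinant a unit ±1, and the theorem then says it is an isomorphism, i.e. its inverse is again an integer matrix. A quick sanity check (the matrix is an arbitrary example):

Code:
import sympy as sp

M = sp.Matrix([[2, 1], [1, 1]])            # det = 1, so M maps Z^2 onto Z^2
assert M.det() == 1
Minv = M.inv()
assert all(e.is_integer for e in Minv)     # the inverse has integer entries: M is an isomorphism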
 
  • #309
in my class, we have so far covered almost all topics on the syllabus in post 177, and are now in the field and galois theory section. we did not prove either the simplicity of A(n) [but we did prove it for A(5), hence the non - solvability of all A(n) n > 4] or the spectral theorem.

i did not write up the jordan form material again yet this fall, nor any galois theory, but of course a detailed treatment is in my webnotes.
 
Last edited:
  • #310
mathwonk, do you know of any places that still use Intro to Calculus and Analysis by Courant/John?
 
  • #311
well i used it in my freshman honors calc course at UGA a couple years back. we were able to get copies for everyone for about 10-15 bucks apiece.

what a bargain. and I learned a lot too using it.
 
  • #312
oh and to people interested in good notes and cheap, i.e. free, the homepage of James Milne, university of michigan, has outstandingly good course notes, in tex, well written, and mathematically very authoritative.

they range from groups to galois theory, to etale cohomology and the Weil conjectures, miles over my head, but very inspiring. it would be cool to have an e - seminar on them, but i don't know how feasible that is.

you need sheaf cohomology as background for instance, but the real obstacle is someone with time and energy to commit to keeping it going, as some guys did with Bachman's book a while back.

that would ideally be me, but i am feeling challenged just keeping up my class preparations in calc and galois theory.

anyway i am quite interested in exploring the deep connections between algebra, topology, and geometry contained in the links between the fundamental group, its monodromy representations defined by branched coverings of algebraic varieties, and etale cohomology and the absolute Galois group of the rationals.

e.g. Grothendieck's conjecture (Nakamura, et al.) that the Galois group of a hyperbolic curve determines its arithmetic and geometry.
 
Last edited:
  • #313
Are you planning on reading some more Riemann eventually mathwonk?
 
  • #314
yes i would like to understand his discussion of his singularity theorem, and other results on theta divisors.

also his study of the representation of algebraic curves via monodromy representations, which foreshadows all this geometric galois theory.
 
Last edited:
  • #315
I need some hints on where to apply to grad school. First a few facts:

3.7 GPA mathematics, PSU
3.7 GPA meteorology, Naval Postgraduate School (one-year)

No research experience. Joined the Air Force and have been working as a weather officer for two years. Main interests lie in differential geometry and algebra with some interest in logic/computation. Took GRE as a cocky undergrad without studying... didn't do real hot. Taking it again next April and currently spending most free time practicing. Now I understand "the process". Getting out of military, but not ready to take GRE this month to apply to school for next year... going to have to apply for fall 2008.

With a solid GRE score, what level of school should I be applying to? I don't want to set myself up for disappointment, nor do I want to sell myself short.
 
