ok, i'll start with the highest-level explanation i can, and if we need to, we'll work our way down.
first of all, we can define an equivalence relation on G, by x~y iff x = gxg-1 for some g in G.
an equivalence relation partitions a set into disjoint subsets, the equivalence classes. so it is natural to ask what is the equivalence class of x, [x], under ~?
well, clearly it is the set of all conjugates of x, {gxg-1: g in G}.
now, consider the subgroup N(x) = {g in G: gx = xg}, that is, every element of G that commutes with x. if n is in N(x), then nxn-1 = xnn-1 = x.
so every element of N(x) just gives x upon conjugation.
what we want to show is that there is a bijection between the set of all conjugates of x (that is, the equivalence class [x]) and the set of left cosets gN(x).
suppose two conjugates of x are the same:
so gxg-1 = hxh-1. then,
x = g-1hxh-1g = (g-1h)x(g-1h)-1, that is:
(g-1h)x = x(g-1h).
this means that g-1h is in N(x), so gN(x) = hN(x).
this means that the map gN(x) → gxg-1 is injective, provided it is well-defined.
well suppose h is in gN(x), so h = gn, for some n in N(x).
then hxh-1 = (gn)x(gn)-1 = g(nxn-1)g-1
= gxg-1 (since n is in N(x), and thus commutes with x).
so our map is indeed well-defined, it only depends on the coset gN(x), and not on the element we pick from it to represent it.
clearly this map is surjective, too, since any conjugate gxg-1 of x has the pre-image the coset of N(x) which contains g.
so gN(x) → gxg-1 is indeed a bijection between the (left) cosets of N(x) and the conjugates of x in [x].
but we can count the number of cosets of a subgroup rather easily, it is the index of the subgroup in G.
thus: |[x]| = [G:N(x)], there are exactly as many conjugates of x, as there are cosets of the normalizer of x.
now |G| is just the sum of the sizes of these equivalence classes, so we have:
|G| = \sum_{[x]} [G:N(x)], where we count each equivalence class only once.
this is almost the class equation. one more wrinkle to work out. if x is in the center of G, so that xg = gx, for every g in G, it's normalizer N(x) is all of G. so in this case, [G:N(x)] = [G:G] = 1.
rather than sum these all "one at a time", we just count them "all at once" by finding the order of the center. this gives us:
|G| = |Z(G)| + \sum_{x \not \in Z(G)} [G:N(x)]
where again, we only sum over distinct conjugacy classes.