My understanding of the factorization scale vs renormalization scale is:
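1. Renormalization scale: this is the usual scale \mu_R that dimensional regularization (DimReg) introduces to keep the coupling dimensionless. Because physical results cannot depend on it, it can be used to derive an RG evolution equation and thus to resum large logarithms.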
2. Factorization scale: this is the scale at which you no longer trust perturbation theory. You know you have terrible soft and collinear divergences, and these divergences can be absorbed into IR-unsafe quantities such as parton distribution functions (PDFs). \mu_F is the scale where you declare these divergences to take over. Above that scale you are perturbative and can rely on Feynman diagram calculations, while below that scale you have an ugly matrix element of a (nonlocal) operator, which you cannot calculate but which is "universal" - that is, the same for many different hard processes (a PDF being the standard example).
The renormalization scale is arbitrary, but you want to choose it wisely! You always choose this scale so that large logarithms vanish (more precisely, are resummed into running coupling constants, etc.). That is why you evaluate \alpha_s at the renormalization scale.
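To see what "resummed into the running coupling" means in numbers, here is a minimal sketch of the one-loop running of \alpha_s. It is only an illustration under stated assumptions: one-loop accuracy, a fixed number of flavours n_f = 5 with no threshold matching, and a reference value \alpha_s(M_Z) = 0.118 at M_Z = 91.1876 GeV. The function names are made up for this example. The second function is the same expression truncated at first order in the logarithm, i.e. what you get if you do not resum.

```python
import math


def beta0(n_f):
    """One-loop QCD beta-function coefficient b0 = (33 - 2 n_f) / (12 pi)."""
    return (33.0 - 2.0 * n_f) / (12.0 * math.pi)


def alpha_s_one_loop(mu, alpha_ref=0.118, mu_ref=91.1876, n_f=5):
    """Resummed one-loop coupling at scale mu (GeV).

    Assumptions: fixed n_f, no flavour thresholds, reference value at M_Z.
    """
    log = math.log(mu**2 / mu_ref**2)
    return alpha_ref / (1.0 + alpha_ref * beta0(n_f) * log)


def alpha_s_truncated(mu, alpha_ref=0.118, mu_ref=91.1876, n_f=5):
    """Same quantity, but keeping only the first logarithm (no resummation)."""
    log = math.log(mu**2 / mu_ref**2)
    return alpha_ref * (1.0 - alpha_ref * beta0(n_f) * log)


# Compare the two as mu moves away from the reference scale.
for mu in (5.0, 91.1876, 1000.0):
    print(mu, alpha_s_one_loop(mu), alpha_s_truncated(mu))
```

The two agree at the reference scale and drift apart as the logarithm grows, which is the practical reason for evaluating \alpha_s at a scale close to the hard scale of the process.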
The factorization scale is NOT arbitrary! That is set by the kinematics of your problem. As humanino said: it's the scale at which higher twist terms matter. In other words, it's the place where you don't know how to calculate anymore!
There's more to choosing \mu_R=\mu_F than just "simplifying the result": you are avoiding dangerous logarithms of the ratio of these scales, which might destroy perturbation theory. As long as there are no other scales, there is no problem. The tricky part is when there are SEVERAL scales in your problem (collinear divergences). Then it is by no means obvious what to choose for \mu_R. This is what "Soft Collinear Effective Theory" is all about.
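To make the "dangerous logarithm" concrete, here is a sketch that uses only the one-loop running of the coupling (that restriction is my assumption; b_0 = (33 - 2 n_f)/(12\pi) is the one-loop coefficient and n_f the number of active flavours). If some piece of the calculation naturally comes with \alpha_s(\mu_F) and you re-express it through \alpha_s(\mu_R), you get:

```latex
\alpha_s(\mu_F)
  = \frac{\alpha_s(\mu_R)}{1 + b_0\,\alpha_s(\mu_R)\,\ln\!\left(\mu_F^2/\mu_R^2\right)}
  = \alpha_s(\mu_R)\left[\,1 - b_0\,\alpha_s(\mu_R)\,\ln\frac{\mu_F^2}{\mu_R^2}
      + \mathcal{O}\!\left(\alpha_s^2 \ln^2\frac{\mu_F^2}{\mu_R^2}\right)\right]
```

Each further order brings one more power of \alpha_s \ln(\mu_F^2/\mu_R^2), so the truncated series is only reliable when that combination is small. For \mu_R = \mu_F it vanishes identically.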
Hope that helps!