There are two rules that you will get used to with time (and I'll attempt to explain their origin, further down) :
Rule #1 : The energy of an electron increases with increasing n+l, where n is the shell number and l is the sub-shell number (l = 0, 1, 2, 3 for s, p, d, f respectively). (Note : there can be conditions under which this rule gets slightly altered.)
Rule #2 : When two subshells have the same n+l value (like 2p and 3s), the one with the lower n has a lower energy.
From these above rules (which are outcomes - albeit approximate - of QM calculations), one can determine the order in which sub-shells are chosen for occupancy.
This is the order (verify it using the above rules, and correct me if I've made a mistake) :
1s < 2s < 2p < 3s < 3p < 4s < 3d < 4p < 5s < 4d < 5p < 6s < 4f < 5d < 6p < 7s < ...
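For anyone who wants to do that verification mechanically, here is a minimal Python sketch. It simply sorts the sub-shells by n+l (Rule #1) and breaks ties by lower n (Rule #2). The function name filling_order and the cut-off at n = 7 and l = 3 are my own choices for illustration, not part of the rules.

```python
# Letters for l = 0, 1, 2, 3, as listed in Rule #1.
LETTERS = "spdf"

def filling_order(max_n=7):
    """Sort sub-shells by n + l (Rule #1), breaking ties by lower n (Rule #2)."""
    # A shell n only has sub-shells with l < n; stop at f (l = 3) for simplicity.
    subshells = [(n, l) for n in range(1, max_n + 1)
                 for l in range(min(n, len(LETTERS)))]
    subshells.sort(key=lambda nl: (nl[0] + nl[1], nl[0]))
    return [f"{n}{LETTERS[l]}" for n, l in subshells]

order = filling_order()
print(" < ".join(order[:16]))
# prints: 1s < 2s < 2p < 3s < 3p < 4s < 3d < 4p < 5s < 4d < 5p < 6s < 4f < 5d < 6p < 7s
```

The first sixteen entries agree with the list above, so the order follows from the two rules alone.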
Argument for the Origin of the above Rules :
Let's look at what happens to the energy of the atom/ion when you add electrons. There are two effects that dictate the changes in energy.
(i) Initially, adding more electrons to a given shell reduces the energy, because it increases the attraction between the electrons and the nucleus (which contributes a negative term to the energy). At the same time, this addition also increases the repulsion between the electrons themselves (a positive term), but this increase is found to be smaller than the increase in attraction. Since the reduction in energy (due to increased attraction to the nucleus) dominates the increase in energy (due to increased repulsion between electrons), the net effect is a reduction of energy - or an improvement of stability. However, as more electrons are added, the shell gets more crowded, so after a point the repulsion between the electrons becomes dominant. As a result, adding electrons beyond a particular number causes the energy to go up. Typically, this starts to happen when you've just filled (or sometimes, exactly half-filled) a sub-shell. Sometimes this increase is small, but at other times it can be large.
(ii) Adding an electron into a new shell increases the energy too. Since this new (bigger) shell is farther away from the nucleus, the increase in attraction to the nucleus is smaller (remember, the electrostatic force is inversely proportional to the square of the distance between the charges) and so the repulsion is more dominant, causing the energy to go up (and usually by a significant amount).
So at certain points in the filling process (usually when a sub-shell has just been filled), you have to make a decision between filling the next sub-shell in the same shell (which increases the energy) and going to the next shell itself (this too increases the energy). The decision depends on which route causes a smaller increase in energy. Sometimes, it's better to go to the next subshell and at other times it's preferable to go to the next shell. Stated differently, it is possible for a lower subshell in a higher shell to have a lower energy than a higher subshell in the same shell. The energies of these sub-shells have been calculated using QM and the essence of the results of this calculation is conveyed by the 2 rules above.
(While this explains why the filling order follows some complex pattern, it does not explain why this pattern can be described (approximately) by the above rules. To understand this, you must know how to do the calculations.)
Let's now apply the rules to the elements to better understand how they work.
Start with no electrons, and keep adding one at a time, putting the first electron in 1s. The addition of the second electron to 1s reduces the energy as explained in (i), so He (=1s^2) is more stable than H (=1s^1). Now the next electron must go to 2s, causing the energy to increase (by quite a large amount, actually). Thus, Li (=1s^2 2s^1) is less stable than He. So, He sits at something like a stability peak (Duet/Fully Filled Shell Rule). Adding a second electron to 2s makes Be (=1s^2 2s^2) more stable than Li. But adding the first electron to 2p causes the energy to increase, though by a small enough amount that this is still preferable to 3s. So B (=1s^2 2s^2 2p^1) is marginally less stable than Be. Adding more electrons to 2p makes C (=1s^2 2s^2 2p^2), N (=1s^2 2s^2 2p^3),..., Ne (=1s^2 2s^2 2p^6) more and more stable. The only deviation from this trend is that O (=1s^2 2s^2 2p^4) is actually a little less stable than N, because N has an exactly half-filled 2p sub-shell (the reason for this increase in energy beyond half-filling has to do with the spins of the electrons). Now, as expected, adding an electron to Ne involves starting a new shell, making Na much less stable than Ne. So, Ne occupies the second major stability peak. But keep in mind that, along the way, Be and N also occupied little peaks themselves.
(Check the bar graph at this link to confirm that the above analysis is indeed correct. The ionization energy is a measure of stability.)
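The same check can be done numerically. The sketch below takes the first ionization energies of H through Na (standard tabulated values, rounded to two decimals, in eV) and picks out the elements whose value is higher than both neighbours. The name stability_peaks is just something I made up for illustration, and using the first ionization energy as the stability measure is the same assumption as in the analysis above.

```python
# First ionization energies in eV (standard tabulated values, rounded).
FIRST_IE_EV = [
    ("H", 13.60), ("He", 24.59), ("Li", 5.39), ("Be", 9.32), ("B", 8.30),
    ("C", 11.26), ("N", 14.53), ("O", 13.62), ("F", 17.42), ("Ne", 21.56),
    ("Na", 5.14),
]

def stability_peaks(data):
    """Return the symbols whose value exceeds that of both neighbours."""
    peaks = []
    # The first and last entries have only one neighbour, so skip them.
    for i in range(1, len(data) - 1):
        if data[i][1] > data[i - 1][1] and data[i][1] > data[i + 1][1]:
            peaks.append(data[i][0])
    return peaks

peaks = stability_peaks(FIRST_IE_EV)
print(peaks)  # prints: ['He', 'Be', 'N', 'Ne']
```

He and Ne come out as the major peaks, with Be and N as the smaller ones, as argued above.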
Now finally, we can show why the Octet Rule works, based upon the ordering of sub-shells as determined by the above two rules. Let's list the order again :
1s < 2s < 2p < 3s < 3p < 4s < 3d < 4p < 5s < 4d < 5p < 6s < 4f < 5d < 6p < 7s < ...
First, 1s is filled (Duet/Fully Filled Shell Rule).
Then 2s and 2p are filled, completing the second shell (Octet/Fully Filled Shell Rule).
Next, 3s and 3p are filled. But the next electron goes to 4s (a new shell), making a large drop in stability. So, Ar (=1s^2 2s^2 2p^6 3s^2 3p^6) sits at a stability peak with 8 electrons in the valence (n=3) shell (Octet Rule).
Similarly, the filling of 4s, 3d (which is more energetic than 4s, but better than 4p) and 4p gives us Kr (=1s^2 2s^2 2p^6 3s^2 3p^6 3d^10 4s^2 4p^6). But since the next electron goes to 5s (a new shell), Kr sits at another peak with 8 electrons in the valence (n=4) shell (Octet Rule).
Checking the others in the list in the same way shows that every stability peak beyond He (i.e., every noble gas after He) has 8 electrons in its valence shell, which is the Octet Rule.
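Here is a rough Python sketch of that check. It fills sub-shells strictly in the order given by the two rules and counts the electrons in the outermost shell of each noble gas. The helper names (madelung_order, configuration, valence_electrons) are hypothetical, and the strict filling is an idealization: it ignores the known exceptions among other elements, which do not affect the noble gases.

```python
# Maximum electrons per sub-shell: 2 * (2l + 1) for l = 0, 1, 2, 3.
CAPACITY = {0: 2, 1: 6, 2: 10, 3: 14}

def madelung_order(max_n=7):
    """Sub-shells (n, l) sorted by n + l, then by n (the two rules)."""
    subshells = [(n, l) for n in range(1, max_n + 1) for l in range(min(n, 4))]
    return sorted(subshells, key=lambda nl: (nl[0] + nl[1], nl[0]))

def configuration(z):
    """Place z electrons in rule order; return {(n, l): electron count}."""
    config, remaining = {}, z
    for n, l in madelung_order():
        if remaining == 0:
            break
        placed = min(remaining, CAPACITY[l])
        config[(n, l)] = placed
        remaining -= placed
    return config

def valence_electrons(z):
    """Count the electrons in the highest occupied shell."""
    config = configuration(z)
    outer = max(n for n, _ in config)
    return sum(e for (n, _), e in config.items() if n == outer)

NOBLE_GASES = {"He": 2, "Ne": 10, "Ar": 18, "Kr": 36, "Xe": 54, "Rn": 86}
counts = {sym: valence_electrons(z) for sym, z in NOBLE_GASES.items()}
print(counts)
# prints: {'He': 2, 'Ne': 8, 'Ar': 8, 'Kr': 8, 'Xe': 8, 'Rn': 8}
```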
Further Insights :
This pattern (which is a result of the two rules) suggests that the d sub-shells are energetically expensive: whenever some {N}p (n+l = N+1) sub-shell gets filled, the next electron goes to {N+1}s (n+l = N+1) rather than {N}d (n+l = N+2), since clearly N+1 < N+2. This is, in fact, true. The shape of the d sub-shells makes them relatively unfavorable.
Conclusion :
So, in short, the "lower n" rule (#2) ensures that the valence shell gets an Octet (since {N+1}s will not be filled before {N}p, as long as {N}p exists), and the "n+l" rule (#1) ensures that you won't get more than an Octet (since {N+1}s will get filled before {N}d).
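The two halves of this conclusion can be spelled out as a short Python check. Treating each sub-shell's position in the filling order as the pair (n+l, n) is my own shorthand for the two rules, and the function name key is hypothetical.

```python
def key(n, l):
    """Sort key implied by the two rules: n + l first, then n."""
    return (n + l, n)

lines = []
for N in range(3, 7):  # d sub-shells exist only for N >= 3
    np_key, next_s_key, nd_key = key(N, 1), key(N + 1, 0), key(N, 2)
    # Rule #2: same n + l, lower n first, so {N}p fills before {N+1}s.
    assert np_key < next_s_key
    # Rule #1: lower n + l first, so {N+1}s fills before {N}d.
    assert next_s_key < nd_key
    lines.append(f"{N}p {np_key} < {N + 1}s {next_s_key} < {N}d {nd_key}")

print("\n".join(lines))
```

Both assertions hold for every N, which is just the statement that the valence shell gets its s and p sub-shells filled (8 electrons) before a new shell starts, and that the d sub-shell is only filled after that.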