think about what it means for a column of rref(A) to have a "leading 1". it means that all of the entries before the leading 1, in the ROW the leading 1 occurs in, are 0. that means that no linear combination of the columns before it can possibly equal it, so it is not in their span.
say a leading 1 occurs in the 3rd column, in the second row. this means that the column 1 and column 2 vectors have 0 in their 2nd position (the 2nd row). so no matter what linear combination of column 1 and column 2 we take, it, too, will have 0 in the second position. but column 3 has 1 in the second position. so it has to be linearly independent of all the column vectors before it.
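by contrast, suppose column 2 has no leading 1. then the leading 1 of row 1 has to be in column 1, so column 1 is (1,0,0,..,0), and column 2 is 0 in every row below the first, so it looks like (c,0,0,..,0) for some number c (possibly 0). so column 2 is just c times column 1: it is linearly dependent on column 1, and including it in a spanning set is redundant.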
there's nothing special about choosing the columns of A that correspond to the pivot columns of rref(A), it's just that, after row-reducing A, and seeing what the rank of A actually is, they make an easy choice. row operations don't change which columns depend on which: if some combination of the columns of A is 0, the same combination of the columns of rref(A) is 0, and vice versa. so we could find many other possible bases for col(A), but picking the columns that correspond to the pivot columns of rref(A) ensures that we get a linearly independent set.
to pick a basis for col(A), we know that the columns of A are a spanning set. and we know that we need to pick rank(A) linearly independent ones, to get a basis. we will always have rank(A) pivot columns in rref(A), since every pivot column has a leading 1, which "tags" a non-zero row, and rank(A) is the number of non-zero rows of rref(A). so the number of pivot columns is exactly the rank of A.
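here is a rough sketch of that recipe in python, in case it helps to see it run. the function names (rref_with_pivots, column_space_basis) and the small test matrix are my own, made up for illustration, and columns are numbered from 0 in the code, not from 1. it uses exact fractions so there is no rounding to worry about.

```python
from fractions import Fraction

def rref_with_pivots(rows):
    """Return (rref, pivot_cols) for a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    pivot_cols = []
    r = 0  # row where the next leading 1 will be placed
    for c in range(n_cols):
        if r == n_rows:
            break
        # look for a row at or below r with a non-zero entry in column c
        p = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if p is None:
            continue  # no leading 1 in this column
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]  # scale so the leading entry is 1
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivot_cols.append(c)
        r += 1
    return m, pivot_cols

def column_space_basis(rows):
    """Columns of the ORIGINAL matrix at the pivot positions of its rref."""
    _, pivot_cols = rref_with_pivots(rows)
    return [[row[c] for row in rows] for c in pivot_cols]

# made-up example (not from the post): column 1 is 2 times column 0
A = [[1, 2, 0],
     [2, 4, 1],
     [3, 6, 1]]
R, pivots = rref_with_pivots(A)
basis = column_space_basis(A)
print(pivots)  # [0, 2]
print(basis)   # [[1, 2, 3], [0, 1, 1]]
```

note that the basis vectors are taken from A itself, not from rref(A): row-reducing changes the column space in general, it only tells us WHICH columns to keep.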
other choices of the columns of A might lead to a basis for col(A), or might not. the matrix A =
[1 1 0 1]
[0 0 1 2]
[0 0 0 1]
has rank 3, but columns 1,2 and 4 do not make a basis, since columns 1 and 2 are equal.
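if you want to check this example by machine, here is a minimal sketch in python. the helpers rank and columns are names i made up for this post, not from any library; rank just does forward elimination with exact fractions and counts the pivots it finds.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by forward elimination with exact fractions."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    rk = 0  # number of pivots found so far
    for c in range(n_cols):
        # look for a row at or below rk with a non-zero entry in column c
        p = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[rk], rows[p] = rows[p], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][c] / rows[rk][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def columns(matrix, picks):
    """Selected columns of matrix; picks are 1-based, as in the post."""
    return [[row[j - 1] for row in matrix] for j in picks]

A = [[1, 1, 0, 1],
     [0, 0, 1, 2],
     [0, 0, 0, 1]]

rank_A = rank(A)
rank_124 = rank(columns(A, [1, 2, 4]))  # columns 1 and 2 are equal
rank_134 = rank(columns(A, [1, 3, 4]))  # the pivot columns
print(rank_A, rank_124, rank_134)  # 3 2 3
```

so columns 1,2,4 only span a 2-dimensional subspace, while the pivot columns 1,3,4 have rank 3 and do give a basis for col(A).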