shivajikobardan said:
You have two operations that the regex is doing: (1) looking at a character and deciding whether it's a match; (2) deciding how many characters to match.
The "looking at a character and deciding if it's a match" operation, by itself, is straightforward: a single character matches if it's an a or a b.
The problem you are having comes with the "deciding how many characters to match" operation. You are thinking of that operation as follows: once you find a matching character, continue matching more characters as long as they are
the same character as the first one that matched. In other words, once a match is found with an a, continue matching as long as there are a's, or once a match is found with a b, continue matching as long as there are b's; but
stop matching if you started with a's but then encounter a b, or vice versa.
But that is
not what the regex
(a | b)*
actually does. The regex that actually does what I just described in the last paragraph is the one I gave in my last post,
(a* | b*)
.
That regex will indeed match aa, aaa, bb, bbb, etc., but will
not match baaaa.
What the regex
(a | b)*
actually does is to keep matching characters as long as each character is either an a or a b; it does not care if a's and b's are mixed. That is why baaaa is a match. The reason it doesn't care if a's and b's are mixed is that the "check for a match" operation, a | b, is
inside the parentheses, but the "keep matching" operation, *, is
outside the parentheses. That means it applies the a | b match criterion to each character independently during the "keep matching" operation, so it will keep matching as long as each character is either an a or a b, without caring about mixing the two.
Whereas in the regex
(a* | b*)
, the "keep matching" operation is
inside the parentheses, and there are two of them, one for a and one for b; that means that once it succeeds at the "check for a match" operation just
once, it has to try the "keep matching" operation with just that character that it matched, for as long as it can.