Simplifying regular expression?

  • Thread starter KataKoniK
  • Start date
168
0

Main Question or Discussion Point

Does anyone know if this regular expression can be simplified?
(a*bc*bc*|c*)*b

I tried (a|b|c)*b, but it's not correct.
 

Answers and Replies

verty
Homework Helper
2,157
198
Well in that group you have the same suffix on both words, so you can factor it out (1). Then notice that the content in that group might be empty which is a problem because it will run away on failure, so we must disallow that that empty case. We can do that with a branch (2).

#0: (a*bc*bc*|c*)*b
#1: ((a*bc*b)?c*)*b
#2: ((a*bc*b)c*|c+)*b

You can reduce the number of operators by one by factoring out c*:

#3: ((a*bc*b|c)c*)*b

Now you have a problem with the c's which is difficult to explain. The clash is between the last c* and the )* (it's because that group can simplify to c+ in certain cases). The way around that is to force that c* to group atomically so that it won't backtrack.

#4: ((a*bc*b|c)c*+)*b
OR ((a*bc*b|c)(?>c*))*b

Now that should work well enough but to be pedantic those first 2 *'s don't need to backtrack:

#5: ((a*+bc*+b|c)c*+)*b

Unfortunately that last * does need to be able to backtrack because there might be an even number of b's being matched against and it would fail in that case if that group was atomic.

Needless to say, regexes are difficult. I learned them from the PHP manual 'PCRE' section. You can find it online.
 
Last edited:
168
0
Hmm, so it cannot be simplified to something really short ie. (a|b|c)*b (which is wrong) as it seems.

Thanks.
 

Related Threads for: Simplifying regular expression?

  • Last Post
Replies
1
Views
1K
  • Last Post
Replies
4
Views
1K
  • Last Post
Replies
1
Views
3K
  • Last Post
Replies
2
Views
4K
Replies
12
Views
766
  • Last Post
Replies
3
Views
2K
  • Last Post
Replies
0
Views
3K
Top