Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

Simplifying regular expression?

  1. Jan 20, 2007 #1
    Does anyone know if this regular expression can be simplified?

    I tried (a|b|c)*b, but it's not correct.
  2. jcsd
  3. Jan 20, 2007 #2


    User Avatar
    Homework Helper

    Well in that group you have the same suffix on both words, so you can factor it out (1). Then notice that the content in that group might be empty which is a problem because it will run away on failure, so we must disallow that that empty case. We can do that with a branch (2).

    #0: (a*bc*bc*|c*)*b
    #1: ((a*bc*b)?c*)*b
    #2: ((a*bc*b)c*|c+)*b

    You can reduce the number of operators by one by factoring out c*:

    #3: ((a*bc*b|c)c*)*b

    Now you have a problem with the c's which is difficult to explain. The clash is between the last c* and the )* (it's because that group can simplify to c+ in certain cases). The way around that is to force that c* to group atomically so that it won't backtrack.

    #4: ((a*bc*b|c)c*+)*b
    OR ((a*bc*b|c)(?>c*))*b

    Now that should work well enough but to be pedantic those first 2 *'s don't need to backtrack:

    #5: ((a*+bc*+b|c)c*+)*b

    Unfortunately that last * does need to be able to backtrack because there might be an even number of b's being matched against and it would fail in that case if that group was atomic.

    Needless to say, regexes are difficult. I learned them from the PHP manual 'PCRE' section. You can find it online.
    Last edited: Jan 20, 2007
  4. Jan 20, 2007 #3
    Hmm, so it cannot be simplified to something really short ie. (a|b|c)*b (which is wrong) as it seems.

Share this great discussion with others via Reddit, Google+, Twitter, or Facebook