Overlapping Matches Python

  1. May 11, 2013 #1
    I'm writing a function to take a string like "aXYb" and return a regex in which the lowercase letters act as literal characters and the uppercase letters become free variables.
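    A minimal sketch of such a function, for concreteness. The name pattern_to_regex is hypothetical and not taken from the post; the sketch assumes every uppercase letter becomes its own ([a-z]+) group, so a variable that appears twice is not tied to a single value.

```python
import re

def pattern_to_regex(pattern):
    """Turn a template such as "aXYb" into a compiled regex.

    Lowercase letters are matched literally; each uppercase letter
    becomes a capture group ([a-z]+). Hypothetical helper for illustration.
    """
    parts = []
    for ch in pattern:
        if ch.isupper():
            # Free variable: one or more lowercase letters, captured.
            parts.append("([a-z]+)")
        else:
            # Literal character, escaped in case it is a regex metacharacter.
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

regex = pattern_to_regex("aXYb")
print(regex.pattern)  # a([a-z]+)([a-z]+)b
```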

    The regex generated from "aXYb" should match anything of the form a([a-z]+)([a-z]+)b, and it does, but not as "freely" as I would like. For example, when I call re.compile("a([a-z]+)([a-z]+)b").match("aaxab").groups(), the only tuple I get back is ("ax", "a").

    Preferably, I'd like both ("a", "xa") and ("ax", "a"). How can I change "a([a-z]+)([a-z]+)b" to get these results?

    For a little background, it's intended to be used in describing rules for algebraic manipulations (in the abstract algebra sense). The final goal is to be able to do a breadth-first search of all manipulations until I get to the desired result. The free variables are used in describing the variable part of valid manipulations, like (AB=AC -> B=C). That's why overlapping is absolutely necessary.
  2. jcsd
  3. May 11, 2013 #2

    D H

    Staff Emeritus
    Science Advisor

    That's not how regular expressions work. They are hard enough as is, for both users and implementers. You are the one who knows that ([a-z]+)([a-z]+) means something special. You can form a list of all possible matches from the one match you do get.
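    One way to get every assignment is sketched below. It does not post-process the single regex match; instead it swaps in a small hand-written backtracking matcher over the original template. The name all_bindings is hypothetical, and the sketch assumes the template must match the whole string and that repeated variables are not constrained to be equal.

```python
def all_bindings(pattern, text):
    """Yield every tuple of variable values under which `pattern`
    matches all of `text`. Uppercase letters in `pattern` are variables
    standing for one or more characters in a-z; everything else is literal.
    Hypothetical helper for illustration.
    """
    def walk(p, t, bound):
        if p == len(pattern):
            # Template exhausted: succeed only if the text is too.
            if t == len(text):
                yield tuple(bound)
            return
        ch = pattern[p]
        if ch.isupper():
            # Try every non-empty run of a-z starting at position t.
            end = t + 1
            while end <= len(text) and "a" <= text[end - 1] <= "z":
                yield from walk(p + 1, end, bound + [text[t:end]])
                end += 1
        elif t < len(text) and text[t] == ch:
            # Literal character must match exactly.
            yield from walk(p + 1, t + 1, bound)

    yield from walk(0, 0, [])

print(list(all_bindings("aXYb", "aaxab")))
# [('a', 'xa'), ('ax', 'a')]
```

    Because it is a generator, a breadth-first search can consume the bindings lazily rather than building the full list for each rule.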
  4. May 11, 2013 #3
    Yeah, I figured out that algebraic expressions are context-free rather than regular. I think lex is used for context-free languages. Would it be easier to get lex to do this than to write my own?