Overlapping Matches Python

  • Python
  • Thread starter TylerH
  • #1

Main Question or Discussion Point

I'm writing a function that takes a string like "aXYb" and returns a regex in which the lowercase letters act as literal characters and the uppercase letters become free variables.
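A minimal sketch of such a converter, assuming lowercase letters should match themselves and each uppercase letter should match one or more lowercase letters. The name `to_regex` is hypothetical, and two further choices are assumptions rather than anything stated in the post: named groups are used instead of plain numbered groups, and a variable that appears twice (as A does in AB=AC) is compiled to a backreference so it must match the same text both times.

```python
import re

def to_regex(template: str) -> str:
    """Translate a template such as "aXYb" into a regex string (hypothetical helper)."""
    parts = []
    seen = set()
    for ch in template:
        if ch.isupper():
            if ch in seen:
                # Assumption: a repeated variable must match the same text again.
                parts.append(f"(?P={ch})")
            else:
                seen.add(ch)
                # Each new variable matches one or more lowercase letters.
                parts.append(f"(?P<{ch}>[a-z]+)")
        else:
            # Everything else is matched literally.
            parts.append(re.escape(ch))
    return "".join(parts)

pattern = to_regex("aXYb")
print(pattern)                                     # a(?P<X>[a-z]+)(?P<Y>[a-z]+)b
print(re.fullmatch(pattern, "aaxab").groupdict())  # {'X': 'ax', 'Y': 'a'}
```

Python's `re` still reports only one set of bindings per match, so this covers the translation step but not the enumeration of alternative bindings.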

The regex generated from "aXYb" should match anything of the form a([a-z]+)([a-z]+)b, and it does, but not as "freely" as I would like. For example, when I run re.compile("a([a-z]+)([a-z]+)b").match("aaxab"), the only tuple I get back from .groups() is ("ax", "a").

Preferably, I'd like both ("a", "xa") and ("ax", "a"). How can I change "a([a-z]+)([a-z]+)b" to get these results?
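For reference, a quick check of what Python's `re` actually does here. Each pattern yields exactly one assignment per match; making the first group lazy (`+?`) only changes which assignment is reported, it does not add the other one.

```python
import re

s = "aaxab"

# Greedy first group: grabs as much as it can, then backtracks just enough.
greedy = re.match(r"a([a-z]+)([a-z]+)b", s)

# Lazy first group: takes as little as it can, then grows only if needed.
lazy = re.match(r"a([a-z]+?)([a-z]+)b", s)

print(greedy.groups())  # ('ax', 'a')
print(lazy.groups())    # ('a', 'xa')
```

With a longer middle section, such as "aaxyab", the greedy pattern gives ("axy", "a") and the lazy one gives ("a", "xya"); neither produces the in-between split ("ax", "ya").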

For a little background, it's intended for describing rules for algebraic manipulations (in the abstract algebra sense). The final goal is to be able to do a breadth-first search over all manipulations until I reach the desired result. The free variables describe the variable part of valid manipulations, like (AB=AC -> B=C). That's why overlapping matches are absolutely necessary.
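A rough sketch of the breadth-first search described here, simplified to variable-free rewrite rules so it stays short. The helper names `rewrites` and `bfs`, the toy rule `("aa", "b")`, and the depth cap are illustrative assumptions. What it shows is that each rule has to be tried at every occurrence, including overlapping ones; otherwise some reachable strings are never generated.

```python
from collections import deque

def rewrites(s, rules):
    """Yield every string reachable from s by applying one rule once.

    Each (lhs, rhs) rule is tried at every position where lhs occurs,
    including overlapping occurrences.
    """
    for lhs, rhs in rules:
        start = s.find(lhs)
        while start != -1:
            yield s[:start] + rhs + s[start + len(lhs):]
            # Advance by one character, not len(lhs), so overlaps are kept.
            start = s.find(lhs, start + 1)

def bfs(start, goal, rules, max_depth=10):
    """Return a shortest list of strings leading from start to goal, or None."""
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        s, path = queue.popleft()
        if s == goal:
            return path
        if len(path) - 1 >= max_depth:
            continue  # Depth cap: do not expand past max_depth rewrite steps.
        for t in rewrites(s, rules):
            if t not in seen:
                seen.add(t)
                queue.append((t, path + [t]))
    return None

print(bfs("aaa", "ab", [("aa", "b")]))  # ['aaa', 'ab']
```

For comparison, `"aaa".replace("aa", "b")` yields only "ba", so a search built on non-overlapping replacement would never reach "ab".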
 

Answers and Replies

  • #2
D H
That's not how regular expressions work. They are hard enough already as is, both for users and for implementers. You are the one who knows that ([a-z]+)([a-z]+) means something special. You can form a list of all possible matches from the one match you do get.
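A sketch of that suggestion, plus one possible generalization. `all_splits` expands the single match from the first post into every (X, Y) assignment, relying on the two groups being adjacent. `all_bindings` drops `re` and backtracks over every way to bind the uppercase variables in a template. Both function names are hypothetical, and the rule that a repeated variable must repeat its earlier binding is an assumption chosen to fit rules like AB=AC.

```python
import re

def all_splits(m):
    """Expand one match of a([a-z]+)([a-z]+)b into every (X, Y) split.

    The two groups are adjacent, so any split of their combined text into
    two nonempty pieces is an equally valid assignment.
    """
    middle = m.group(1) + m.group(2)
    return [(middle[:i], middle[i:]) for i in range(1, len(middle))]

def all_bindings(template, s, bindings=None):
    """Yield every dict of variable bindings under which template matches all of s.

    Uppercase letters are variables bound to nonempty runs of a-z; any other
    character is a literal. A variable seen earlier must match the same text.
    """
    if bindings is None:
        bindings = {}
    if not template:
        if not s:
            yield dict(bindings)
        return
    head, rest = template[0], template[1:]
    if not head.isupper():
        # Literal character: must appear next in s.
        if s.startswith(head):
            yield from all_bindings(rest, s[1:], bindings)
        return
    if head in bindings:
        # Already-bound variable: must repeat its earlier value.
        value = bindings[head]
        if s.startswith(value):
            yield from all_bindings(rest, s[len(value):], bindings)
        return
    # New variable: try every nonempty a-z prefix of s, shortest first.
    i = 1
    while i <= len(s) and "a" <= s[i - 1] <= "z":
        bindings[head] = s[:i]
        yield from all_bindings(rest, s[i:], bindings)
        i += 1
    bindings.pop(head, None)

m = re.match(r"a([a-z]+)([a-z]+)b", "aaxab")
print(all_splits(m))                         # [('a', 'xa'), ('ax', 'a')]
print(list(all_bindings("aXYb", "aaxab")))   # [{'X': 'a', 'Y': 'xa'}, {'X': 'ax', 'Y': 'a'}]
print(list(all_bindings("AB=AC", "xy=xz")))  # [{'A': 'x', 'B': 'y', 'C': 'z'}]
```

The backtracking version can take exponential time on long strings with many adjacent variables, which is the price of listing every assignment rather than the first one.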
 
  • #3
Yeah, I figured out that algebraic expressions are context-free rather than regular. I think lex is used for context-free languages. Would it be easier to get lex to do this than to write my own?
 
