How can I modify my regex to allow for overlapping matches in Python?

The discussion centers on creating a function that converts a string like "aXYb" into a regex pattern where lowercase letters are treated as literal characters and uppercase letters are treated as free variables. The goal is to generate a regex that matches patterns like "a([a-z]+)([a-z]+)b" but allows for overlapping matches, such as both ("a", "xa") and ("ax", "a"). The user notes that standard regex does not support overlapping matches, which complicates their intended use for algebraic manipulations in abstract algebra. They consider whether using a tool like lex, which is designed for context-free languages, would be more effective than attempting to modify regex for their needs. The discussion highlights the challenges of using regex for complex pattern matching and the potential need for alternative approaches in handling context-free expressions.
TylerH
I'm writing a function that takes a string like "aXYb" and returns a regex in which the lowercase letters act as literal characters and the uppercase letters become free variables.
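For readers following along, here is a minimal sketch of what such a function might look like. The name template_to_regex is hypothetical (the post does not show the actual code), and it assumes every uppercase letter gets its own capture group, even if the same letter appears twice.

```python
import re

def template_to_regex(template: str) -> str:
    """Turn a template like "aXYb" into a regex string.

    Hypothetical helper: lowercase characters are matched literally,
    each uppercase letter becomes a capture group ([a-z]+).
    """
    parts = []
    for ch in template:
        if ch.isupper():
            parts.append("([a-z]+)")      # free variable
        else:
            parts.append(re.escape(ch))   # literal character
    return "".join(parts)

pattern = template_to_regex("aXYb")
print(pattern)                                       # a([a-z]+)([a-z]+)b
print(re.compile(pattern).match("aaxab").groups())   # ('ax', 'a')
```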

The regex generated from "aXYb" should match anything of the form a([a-z]+)([a-z]+)b. It does, but not quite as "freely" as I would like. For example, when I call re.compile("a([a-z]+)([a-z]+)b").match("aaxab"), the only tuple of groups I get back is ("ax", "a").

Preferably, I'd like both ("a", "xa") and ("ax", "a"). How can I change "a([a-z]+)([a-z]+)b" to get these results?
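One way to get every binding is to skip the regex engine and do the backtracking by hand. The sketch below is only an illustration under stated assumptions: all_bindings is a hypothetical name, the template must match the whole string (unlike re.match, which anchors only the start), and each uppercase letter is treated as an independent variable over one or more characters in a-z.

```python
from typing import Iterator, Tuple

def all_bindings(template: str, text: str) -> Iterator[Tuple[str, ...]]:
    """Yield every way the free variables in template can bind so that
    the template matches all of text (hypothetical helper)."""
    if not template:
        if not text:
            yield ()          # both exhausted: one successful match
        return
    head, rest = template[0], template[1:]
    if head.isupper():
        # Try every non-empty prefix of lowercase letters for this variable.
        for end in range(1, len(text) + 1):
            if not ("a" <= text[end - 1] <= "z"):
                break
            for tail in all_bindings(rest, text[end:]):
                yield (text[:end],) + tail
    elif text.startswith(head):
        # Literal character: must match exactly.
        yield from all_bindings(rest, text[1:])

print(list(all_bindings("aXYb", "aaxab")))
# [('a', 'xa'), ('ax', 'a')]
```

Because this is a generator, a search procedure can consume the bindings one at a time instead of building the full list up front.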

For a little background, it's intended to be used in describing rules for algebraic manipulations (in the abstract algebra sense). The final goal is to be able to do a breadth-first search of all manipulations until I get to the desired result. The free variables are used to describe the variable part of valid manipulations, like (AB=AC -> B=C). That's why overlapping is absolutely necessary.
 
That's not how regular expressions work. They are hard enough as is, both for the users and for the implementers. You are the one who knows that ([a-z]+)([a-z]+) means something special. You can form a list of all possible matches from the one match you do get.
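A rough sketch of that suggestion: take the single match the regex returns, join the adjacent groups back together, and enumerate every way to cut the joined text into the same number of non-empty pieces. The helper name splits is hypothetical, and the approach assumes the free variables sit next to each other between fixed literals, as in a([a-z]+)([a-z]+)b.

```python
import re
from itertools import combinations
from typing import Iterator, Tuple

def splits(s: str, k: int) -> Iterator[Tuple[str, ...]]:
    """Yield every way to cut s into k non-empty consecutive pieces
    (hypothetical helper)."""
    for cuts in combinations(range(1, len(s)), k - 1):
        bounds = (0, *cuts, len(s))
        yield tuple(s[i:j] for i, j in zip(bounds, bounds[1:]))

m = re.match(r"a([a-z]+)([a-z]+)b", "aaxab")
joined = "".join(m.groups())            # "axa", from the single match ("ax", "a")
print(list(splits(joined, len(m.groups()))))
# [('a', 'xa'), ('ax', 'a')]
```

This does not carry over directly to templates where literals separate the variables, since moving a cut point there can change which literals line up.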
 
Yeah, I figured out that algebraic expressions are context-free rather than regular. I think lex is used for context-free languages. Would it be easier to get lex to do this than to write my own?
 