How can I modify my regex to allow for overlapping matches in Python?

  • Context: Python 
  • Thread starter: TylerH
  • Tags: Overlapping, Python
SUMMARY

The discussion focuses on getting a Python regex to return every way a match can be divided between adjacent groups, using the pattern "a([a-z]+)([a-z]+)b". The user wants multiple tuples from the string "aaxab": both ("a", "xa") and ("ax", "a"). The difficulty is that a regex match reports only one assignment of groups, whereas the user needs all of them to run breadth-first searches over algebraic manipulations. The conversation also touches on whether a tool such as lex, aimed at context-free languages, would be an easier alternative than writing a custom matcher.

PREREQUISITES
  • Understanding of Python regex syntax and functions
  • Familiarity with string manipulation in Python
  • Basic knowledge of abstract algebra concepts
  • Awareness of context-free languages and tools like lex
NEXT STEPS
  • Explore Python's built-in re module for advanced matching techniques
  • Research lexer and parser generators such as lex and yacc for handling context-free languages
  • Learn about alternative regex libraries for Python, such as the third-party regex module
  • Investigate algorithms for breadth-first search in abstract algebra contexts
USEFUL FOR

This discussion is beneficial for Python developers, mathematicians working with algebraic expressions, and anyone interested in advanced regex techniques and context-free language parsing.

TylerH
I'm writing a function that takes a string like "aXYb" and returns a regex in which the lowercase letters act as literal characters and the uppercase letters become free variables.

The regex generated from "aXYb" should match anything of the form a([a-z]+)([a-z]+)b. It does, but not quite as "freely" as I would like. For example, when I call re.compile("a([a-z]+)([a-z]+)b").match("aaxab").groups(), the only tuple I get back is ("ax", "a").
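For readers who want to reproduce this, here is a minimal sketch of such a generator. The function name `template_to_regex` is hypothetical (the post does not show the actual code), and it assumes that uppercase letters are the free variables and every other character is a literal.

```python
import re

def template_to_regex(template):
    """Turn a template such as "aXYb" into a compiled regex.

    Assumption: uppercase letters are free variables matching [a-z]+,
    and every other character is matched literally.
    """
    parts = []
    for ch in template:
        if ch.isupper():
            parts.append("([a-z]+)")     # one capture group per variable
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts))

pattern = template_to_regex("aXYb")
match = pattern.match("aaxab")
print(pattern.pattern)   # a([a-z]+)([a-z]+)b
print(match.groups())    # ('ax', 'a')
```

The first group is greedy, so the engine backs off only as far as it must for the rest of the pattern to succeed, and then stops. That is why a single tuple comes back.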

Preferably, I'd like both ("a", "xa") and ("ax", "a"). How can I change "a([a-z]+)([a-z]+)b" to get these results?

For a little background, it's intended to be used in describing rules for algebraic manipulations (in the abstract algebra sense). The final goal is to be able to do a breadth-first search of all manipulations until I get to the desired result. The free variables are used in describing the variable part of valid manipulations, like (AB=AC -> B=C). That's why overlapping is absolutely necessary.
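One way to get every assignment of the free variables, without relying on the regex engine, is a small backtracking matcher. The sketch below is not from the thread: the name `all_bindings` is hypothetical, and it assumes that the template must cover the whole string, that each variable binds to a non-empty run of a-z, and that a repeated variable (as in AB=AC) must bind to the same text each time.

```python
def all_bindings(template, text, bound=None):
    """Yield every variable assignment that makes template equal text.

    Assumptions: uppercase letters are variables bound to non-empty
    strings over a-z; a repeated variable must reuse its earlier value;
    any other character is a literal. The whole text must be consumed.
    """
    bound = bound or {}
    if not template:
        if not text:
            yield dict(bound)
        return
    head, rest = template[0], template[1:]
    if not head.isupper():
        # Literal: the next character of the text must be identical.
        if text.startswith(head):
            yield from all_bindings(rest, text[1:], bound)
        return
    if head in bound:
        # Already-bound variable: the text must repeat its value here.
        value = bound[head]
        if text.startswith(value):
            yield from all_bindings(rest, text[len(value):], bound)
        return
    # Unbound variable: try every non-empty prefix made of a-z.
    for end in range(1, len(text) + 1):
        if not ("a" <= text[end - 1] <= "z"):
            break
        yield from all_bindings(rest, text[end:], {**bound, head: text[:end]})

print(list(all_bindings("aXYb", "aaxab")))
# [{'X': 'a', 'Y': 'xa'}, {'X': 'ax', 'Y': 'a'}]
print(list(all_bindings("AB=AC", "xy=xz")))
# [{'A': 'x', 'B': 'y', 'C': 'z'}]
```

Because it is a generator, the bindings can be fed lazily into a breadth-first search. The running time can grow quickly with the number of variables, so this is a starting point rather than a tuned implementation.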
 
That's not how regular expressions work. They are hard enough as is, both for the users and the implementers. A match reports one assignment of the groups, and you are the one who knows that ([a-z]+)([a-z]+) means something special. You can form a list of all possible matches from the one match you do get.
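The suggestion in this reply can be sketched as follows for the "aXYb" case. This is an illustration under assumptions, not the replier's code: the helper name `two_way_splits` is hypothetical, the two adjacent groups are merged into one group of at least two letters, and the pattern is anchored to the whole string with `re.fullmatch`.

```python
import re

def two_way_splits(text):
    """List every (X, Y) pair for the template aXYb from a single match.

    Assumption: the whole string must match, so the run between the
    leading 'a' and the trailing 'b' is uniquely determined.
    """
    m = re.fullmatch(r"a([a-z]{2,})b", text)
    if m is None:
        return []
    run = m.group(1)
    # Cut the run at every interior position; both parts stay non-empty.
    return [(run[:i], run[i:]) for i in range(1, len(run))]

print(two_way_splits("aaxab"))  # [('a', 'xa'), ('ax', 'a')]
```

This works because the two variables are adjacent. Templates with literals between variables, or with repeated variables, need a more general enumeration.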
 
Yeah, I figured out that algebraic expressions are context-free rather than regular. I think lex is used for context-free languages. Would it be easier to get lex to do this than to write my own?
 
