Simple Regex help for matching strings inside markup tags

  • Thread starter Thread starter Greg Bernhardt
  • Start date Start date
  • Tags Tags
    Strings Tags
AI Thread Summary
A user is seeking assistance with a regular expression to match strings formatted with "Re:" enclosed in bold markup tags, specifically in a SQL context using MariaDB. Initial attempts with regex patterns like "(?<=)Re:.*?(?=])" and "\[b]Re:(.+)[\/b]" were unsuccessful. The discussion highlights the importance of escaping square brackets in regex, suggesting the pattern "\[b\]Re:(.+)\[/b\]" as a potential solution. However, the user faced issues when executing a SELECT query, which returned fewer results than expected. It was noted that MariaDB requires double backslashes for proper escaping, leading to the successful pattern "\\[b\\]Re:(.+)\\[\\/b\\]". The conversation emphasizes the differences in regex handling across programming languages and database engines, particularly between Python and SQL.
Messages
19,787
Reaction score
10,739
Say I have a dozen of strings like this, with different content between the bold markup tags ("Re:" is also a constant). I need a regular expression that will match this. Basic and I think I am close, but just need a little help.



Here is what I have, but it's not matching

(?<=[b])Re:.*?(?=[/b]]) or this \[b]Re:(.+)[\/b]
 
Technology news on Phys.org
Greg Bernhardt said:
Say I have a dozen of strings like this, with different content between the bold markup tags ("Re:" is also a constant). I need a regular expression that will match this. Basic and I think I am close, but just need a little help.



Here is what I have, but it's not matching

(?<=[b])Re:.*?(?=[/b]]) or this \[b]Re:(.+)[\/b]
Is this Python?
 
Just vanilla regular expression to use in SQL eventually
 
  • Like
Likes WWGD
Don't you just need to escape the square brackets? Otherwise the regex interprets it as a character range (i.e., [abcd] matches a, b, c or d, so a b in square brackets matches b). So \[b\]Re:.*\[\/b\] should work (although that would be case sensitive).

Edit: And whether you need to escape the slash in the close bold tag may depend on your language.
 
Try this:

Code:
\[b\]Re:(.+)\[/b\]

This assumes that what's after the "Re:" can be anything; if you want to restrict it to alphanumerics and punctuation you could do something like:

Code:
\[b\]Re:(([A-Za-z \!\?\.\,\;]+))\[/b\]
 
So PF comment from the admin side is SQL-queryable?
 
Ibix said:
whether you need to escape the slash in the close bold tag may depend on your language.

Python's regexp parser (which is what I used for my testing) doesn't seem to require it, but yes, others might.
 
PeterDonis said:
Try this:

Code:
\[b\]Re:(.+)\[/b\]

This assumes that what's after the "Re:" can be anything; if you want to restrict it to alphanumerics and punctuation you could do something like:

Code:
\[b\]Re:(([A-Za-z \!\?\.\,\;]+))\[/b\]

This works when testing in https://regexr.com/ but when I do a SELECT query using it, it's not at all working and returns results that don't match at all. hmmm

WWGD said:
So PF comment from the admin side is SQL-queryable?

Of course, it's all stored in a DB
 
Greg Bernhardt said:
when I do a SELECT query using it, it's not at all working and returns results that don't match at all

What database engine? And what exactly is the SELECT query statement?
 
  • #10
PeterDonis said:
What database engine? And what exactly is the SELECT query statement?
MariaDB which is a drop in replacement for mySQL.

SELECT * FROM `xf_post` WHERE `message` REGEXP '\[b\]Re:(.+)\[\/b\]'

This actually returns only 6 results when I know there are 127k when doing a LIKE query for "[B]Re:"

One of the results matches this

[b]Please check out the COBE and WMAP results[/b]
 
Last edited:
  • #11
Greg Bernhardt said:
This works when testing in https://regexr.com/ but when I do a SELECT query using it, it's not at all working and returns results that don't match at all. hmmm
Of course, it's all stored in a DB
In case you're interested, Sql Server dev allows for Python and has its own ML Server.
 
  • #14
PeterDonis said:
Python's regexp parser (which is what I used for my testing) doesn't seem to require it, but yes, others might.
I tested by piping a string through sed, which does require it because it uses / to delimit the regex. Some other engines follow suit - apparently not this one, though.
 
  • #15
Greg Bernhardt said:
I think that is bingo!

:smile:
 
Back
Top