Simple Regex help for matching strings inside markup tags

  • Thread starter: Greg Bernhardt
  • Tags: Strings

Discussion Overview

The discussion centers around creating a regular expression (regex) to match strings enclosed within specific markup tags, particularly focusing on the pattern "Re:" followed by content within bold tags. Participants explore various regex formulations and their applicability in different programming contexts, including SQL and Python.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • The thread starter's initial attempts, (?<=[b])Re:.*?(?=[/b]]) and \[b]Re:(.+)[\/b], do not match as expected.
  • One participant clarifies that they are using vanilla regular expressions intended for SQL.
  • Another participant suggests escaping square brackets in the regex, indicating that \[b\]Re:.*\[\/b\] should work, although it may be case-sensitive.
  • Several participants discuss how escaping rules differ between tools: Python's regex parser does not seem to require escaping the forward slash in the closing tag, whereas sed does because it uses / to delimit the regex.
  • One participant shares a regex pattern \[b\]Re:(.+)\[/b\] and discusses its effectiveness in testing environments like regexr.com, but expresses issues when using it in SQL queries.
  • Another participant mentions that the SELECT query using the regex returns fewer results than expected, prompting questions about the database engine and the specific query used.
  • One participant references the MariaDB documentation, suggesting that two backslashes may be necessary to properly escape characters in regex for that database.
  • A later reply indicates that changing to \\[b\]Re:(.+)\\[\\/b\] appears to yield better results.

Areas of Agreement / Disagreement

Participants express differing views on the correct regex formulation and its effectiveness across different programming environments. There is no consensus on a single solution, as various approaches are discussed and tested.

Contextual Notes

Limitations include potential differences in regex behavior across programming languages and database engines, as well as unresolved issues regarding the effectiveness of specific regex patterns in SQL queries.

Say I have a dozen strings like this, with different content between the bold markup tags ("Re:" is also a constant). I need a regular expression that will match this. It's basic and I think I'm close, but I just need a little help.



Here is what I have, but it's not matching

(?<=[b])Re:.*?(?=[/b]]) or this \[b]Re:(.+)[\/b]
 
Greg Bernhardt said:
Say I have a dozen strings like this, with different content between the bold markup tags ("Re:" is also a constant). I need a regular expression that will match this. It's basic and I think I'm close, but I just need a little help.



Here is what I have, but it's not matching

(?<=[b])Re:.*?(?=[/b]]) or this \[b]Re:(.+)[\/b]
Is this Python?
 
Just a vanilla regular expression, to use in SQL eventually
 
Don't you just need to escape the square brackets? Otherwise the regex interprets them as a character class (i.e., [abcd] matches a, b, c or d, so a b in square brackets matches just b). So \[b\]Re:.*\[\/b\] should work (although that would be case sensitive).

Edit: And whether you need to escape the slash in the close bold tag may depend on your language.
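
To illustrate the point about brackets, here is a minimal sketch in Python. The sample strings are made up for illustration, and Python's `re` module is only one regex flavor, so other engines may differ in the details.

```python
import re

text = "[b]Re: Hubble constant[/b]"  # made-up sample string

# Unescaped: [b] is a character class matching the single letter "b",
# and [/b] matches one character that is either "/" or "b".
unescaped = re.compile(r"[b]Re:.*[/b]")

# Escaped: \[ and \] match literal square brackets.
escaped = re.compile(r"\[b\]Re:.*\[/b\]")

unescaped_match = unescaped.search(text)      # no "bRe:" in text, so no match
escaped_match = escaped.search(text)          # matches the whole tagged string
stray_match = unescaped.search("bRe: foo/")   # the unescaped form matches this instead

# Escaping the slash is harmless in Python, and IGNORECASE handles [B] vs [b].
slash_escaped = re.search(r"\[b\]Re:.*\[\/b\]", text)
upper_match = re.search(r"\[b\]Re:.*\[/b\]", "[B]Re: Hubble constant[/B]", re.IGNORECASE)
```

The unescaped pattern is still a valid regex, which is why it fails silently rather than raising an error.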
 
Try this:

Code:
\[b\]Re:(.+)\[/b\]

This assumes that what's after the "Re:" can be anything; if you want to restrict it to alphanumerics and punctuation you could do something like:

Code:
\[b\]Re:(([A-Za-z \!\?\.\,\;]+))\[/b\]
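
A quick way to try both patterns is sketched below in Python. The sample posts are invented, and the lazy `(.+?)` variant is an addition for comparison (the original attempt in the thread used `.*?`); it matters when one message contains more than one bold tag.

```python
import re

# The two patterns from the post, plus a lazy variant for comparison.
GREEDY = re.compile(r"\[b\]Re:(.+)\[/b\]")
LAZY = re.compile(r"\[b\]Re:(.+?)\[/b\]")
RESTRICTED = re.compile(r"\[b\]Re:(([A-Za-z \!\?\.\,\;]+))\[/b\]")

# Made-up message containing two bold spans.
post = "[b]Re: dark energy[/b] see also [b]this note[/b]"

# Greedy (.+) runs to the LAST [/b]; lazy (.+?) stops at the first one.
greedy_title = GREEDY.search(post).group(1)
lazy_title = LAZY.search(post).group(1)

# The restricted class as written contains letters, space and some
# punctuation, but no digits, so a title with a number is rejected.
letters_only = RESTRICTED.search("[b]Re: WMAP results[/b]")
with_digits = RESTRICTED.search("[b]Re: WMAP 2003[/b]")
```

If titles can contain digits, adding `0-9` to the character class would be one way to allow them.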
 
So PF comments are SQL-queryable from the admin side?
 
Ibix said:
whether you need to escape the slash in the close bold tag may depend on your language.

Python's regexp parser (which is what I used for my testing) doesn't seem to require it, but yes, others might.
 
PeterDonis said:
Try this:

Code:
\[b\]Re:(.+)\[/b\]

This assumes that what's after the "Re:" can be anything; if you want to restrict it to alphanumerics and punctuation you could do something like:

Code:
\[b\]Re:(([A-Za-z \!\?\.\,\;]+))\[/b\]

This works when testing in https://regexr.com/ but when I do a SELECT query using it, it's not at all working and returns results that don't match at all. hmmm

WWGD said:
So PF comments are SQL-queryable from the admin side?

Of course, it's all stored in a DB
 
Greg Bernhardt said:
when I do a SELECT query using it, it's not at all working and returns results that don't match at all

What database engine? And what exactly is the SELECT query statement?
 
  • #10
PeterDonis said:
What database engine? And what exactly is the SELECT query statement?
MariaDB, which is a drop-in replacement for MySQL.

SELECT * FROM `xf_post` WHERE `message` REGEXP '\[b\]Re:(.+)\[\/b\]'

This actually returns only 6 results, when I know there are 127k from a LIKE query for "[B]Re:".

One of the results matches this

[b]Please check out the COBE and WMAP results[/b]
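
One likely explanation for the odd results is that the SQL string literal consumes the backslashes before the regex engine ever sees them. The sketch below models that in Python under stated assumptions: `sql_literal_value` is a hypothetical helper that only approximates MySQL-style backslash handling in quoted strings (default SQL mode), and Python's `re` stands in for the server's regex engine.

```python
import re

# Escape sequences that MySQL-style string literals treat specially.
# Any other backslash-escaped character just loses its backslash.
_SPECIAL = {"0": "\0", "b": "\b", "n": "\n", "r": "\r", "t": "\t",
            "Z": "\x1a", "\\": "\\", "'": "'", '"': '"',
            "%": "\\%", "_": "\\_"}

def sql_literal_value(body: str) -> str:
    """Approximate the value of a quoted SQL string whose inner text is `body`."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            out.append(_SPECIAL.get(nxt, nxt))  # e.g. \[ becomes [
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# Pattern as typed in the query above, and a double-backslash version.
single = sql_literal_value(r"\[b\]Re:(.+)\[\/b\]")
double = sql_literal_value(r"\\[b\\]Re:(.+)\\[\\/b\\]")

post = "[b]Re: COBE and WMAP results[/b]"   # made-up sample message
single_hit = re.search(single, post)         # brackets became character classes
double_hit = re.search(double, post)         # brackets are literal again
stray_hit = re.search(single, "subRe: notes a/b")  # unrelated text that still matches
```

Under this model the single-backslash query is really searching for `[b]Re:(.+)[/b]`, which would account for both the small result count and the unrelated matches.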
 
  • #11
Greg Bernhardt said:
This works when testing in https://regexr.com/ but when I do a SELECT query using it, it's not at all working and returns results that don't match at all. hmmm
Of course, it's all stored in a DB
In case you're interested, SQL Server dev allows for Python and has its own ML Server.
 
  • #14
PeterDonis said:
Python's regexp parser (which is what I used for my testing) doesn't seem to require it, but yes, others might.
I tested by piping a string through sed, which does require it because it uses / to delimit the regex. Some other engines follow suit - apparently not this one, though.
 
  • #15
Greg Bernhardt said:
I think that is bingo!

:smile:
 
