Simple Regex help for matching strings inside markup tags

In summary, a dozen strings have different content between bold markup tags, and a regular expression is needed to match them. The attempts so far are close but do not match.
  • #1
Say I have a dozen strings like this, with different content between the bold markup tags ("Re:" is also a constant). I need a regular expression that will match them. It's basic and I think I am close, but I just need a little help.



Here is what I have, but it's not matching

(?<=[b])Re:.*?(?=[/b]]) or this \[b]Re:(.+)[\/b]
 
  • #2
Greg Bernhardt said:
(?<=[b])Re:.*?(?=[/b]]) or this \[b]Re:(.+)[\/b]
Is this Python?
 
  • #3
Just vanilla regular expression to use in SQL eventually
 
  • #4
Don't you just need to escape the square brackets? Otherwise the regex interprets them as a character class (i.e., [abcd] matches a, b, c or d, so a b in square brackets matches just b). So \[b\]Re:.*\[\/b\] should work (although that would be case sensitive).

Edit: And whether you need to escape the slash in the close bold tag may depend on your language.
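
A minimal sketch of the point about brackets, using Python's re module purely for illustration (the thread's eventual target is SQL, and the sample string is made up):

```python
import re

sample = "[b]Re: Some thread title[/b]"  # hypothetical sample string

# Unescaped: [b] is a character class matching the single letter "b",
# so the pattern looks for "bRe:" and never sees the literal brackets.
unescaped = re.search(r"[b]Re:.*[/b]", sample)

# Escaped: \[ and \] match literal square brackets.
escaped = re.search(r"\[b\]Re:.*\[/b\]", sample)

print(unescaped)         # None
print(escaped.group(0))  # [b]Re: Some thread title[/b]
```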
 
  • #5
Try this:

Code:
\[b\]Re:(.+)\[/b\]

This assumes that what's after the "Re:" can be anything; if you want to restrict it to alphanumerics and punctuation you could do something like:

Code:
\[b\]Re:([A-Za-z0-9 !?.,;]+)\[/b\]
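
As a rough illustration of the difference between "anything" and a restricted class, here is a sketch in Python. The sample text is made up, the lazy `.+?` variant is an addition not mentioned above, and the restricted class is a variant that includes digits:

```python
import re

# Hypothetical line containing two bold "Re:" tags.
text = "[b]Re: First[/b] some text [b]Re: Second[/b]"

# Greedy .+ runs to the LAST closing tag on the line.
greedy = re.findall(r"\[b\]Re:(.+)\[/b\]", text)

# Lazy .+? stops at the first closing tag it can reach.
lazy = re.findall(r"\[b\]Re:(.+?)\[/b\]", text)

# A restricted class cannot cross "[" or "/", so it also stops at each tag.
restricted = re.findall(r"\[b\]Re:([A-Za-z0-9 !?.,;]+)\[/b\]", text)

print(greedy)      # [' First[/b] some text [b]Re: Second']
print(lazy)        # [' First', ' Second']
print(restricted)  # [' First', ' Second']
```

If each string holds only one tag, the greedy form is fine; the difference only shows up when several tags share a line.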
 
  • #6
So PF comments are SQL-queryable from the admin side?
 
  • #7
Ibix said:
whether you need to escape the slash in the close bold tag may depend on your language.

Python's regexp parser (which is what I used for my testing) doesn't seem to require it, but yes, others might.
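
A quick check of that observation, as a sketch with a made-up sample string: in Python's re module the escaped and unescaped slash behave the same.

```python
import re

sample = "[b]Re: Some thread title[/b]"  # hypothetical sample string

plain = re.search(r"\[b\]Re:(.+)\[/b\]", sample)
escaped_slash = re.search(r"\[b\]Re:(.+)\[\/b\]", sample)

# Both patterns capture the same text after "Re:".
print(plain.group(1) == escaped_slash.group(1))  # True
```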
 
  • #8
PeterDonis said:
Try this:

Code:
\[b\]Re:(.+)\[/b\]


This works when testing in https://regexr.com/ but when I do a SELECT query using it, it's not at all working and returns results that don't match at all. hmmm

WWGD said:
So PF comments are SQL-queryable from the admin side?

Of course, it's all stored in a DB
 
  • #9
Greg Bernhardt said:
when I do a SELECT query using it, it's not at all working and returns results that don't match at all

What database engine? And what exactly is the SELECT query statement?
 
  • #10
PeterDonis said:
What database engine? And what exactly is the SELECT query statement?
MariaDB, which is a drop-in replacement for MySQL.

SELECT * FROM `xf_post` WHERE `message` REGEXP '\[b\]Re:(.+)\[\/b\]'

This returns only 6 results, when I know a LIKE query for "[B]Re:" finds 127k.

One of the results contains this:

[b]Please check out the COBE and WMAP results[/b]
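
One possible explanation, not confirmed in the thread as captured here: by default, MariaDB and MySQL treat backslash as an escape character inside string literals, so the regex engine may never see the backslashes typed in the query. The Python sketch below models that with a hypothetical helper, `sql_unescape`, and uses Python's re module as a stand-in for the database's regex engine. The sample message and the case-insensitive flag are assumptions.

```python
import re

def sql_unescape(literal: str) -> str:
    """Rough, hypothetical model of default backslash handling in a
    MySQL/MariaDB string literal: a backslash before a character with no
    special meaning is dropped, and a doubled backslash becomes one.
    (Sequences such as \\n or \\t are deliberately not modelled.)"""
    return re.sub(r"\\(.)", r"\1", literal)

as_typed = r"\[b\]Re:(.+)\[\/b\]"    # pattern from the query in the post
doubled = r"\\[b\\]Re:(.+)\\[/b\\]"  # same pattern with backslashes doubled

seen_single = sql_unescape(as_typed)
seen_doubled = sql_unescape(doubled)
print(seen_single)   # [b]Re:(.+)[/b]
print(seen_doubled)  # \[b\]Re:(.+)\[/b\]

sample = "[B]Re: Some thread title[/B]"  # hypothetical stored message
flags = re.IGNORECASE                    # assumes a case-insensitive collation
print(re.search(seen_single, sample, flags))               # None
print(re.search(seen_doubled, sample, flags) is not None)  # True
```

Under that assumption, the degraded pattern looks for the letter b directly before "Re:", which would explain why so few rows come back, and doubling each backslash in the SQL literal would be the thing to try.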
 
  • #11
Greg Bernhardt said:
This works when testing in https://regexr.com/ but when I do a SELECT query using it, it's not at all working and returns results that don't match at all. hmmm
Of course, it's all stored in a DB
In case you're interested, SQL Server dev allows for Python and has its own ML Server.
 
  • #14
PeterDonis said:
Python's regexp parser (which is what I used for my testing) doesn't seem to require it, but yes, others might.
I tested by piping a string through sed, which does require it because it uses / to delimit the regex. Some other engines follow suit - apparently not this one, though.
 
  • #15
Greg Bernhardt said:
I think that is bingo!

:smile:
 

1. What is a regex?

A regex, short for regular expression, is a sequence of characters that defines a search pattern. It is commonly used in string matching and search and replace operations.

2. How do I use regex to match strings inside markup tags?

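To match strings inside markup tags using regex, include the tag delimiters in the pattern and capture what lies between them. For example, <b>(.*?)</b> captures the text inside an HTML bold tag. To match the tags themselves, prefer <[^>]+> over <.+>, because the greedy .+ can run from the first < to the last > on the line. Delimiters that are regex metacharacters, such as the square brackets in [b], must be escaped.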
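
A short sketch of how greediness affects tag matching, in Python with a made-up HTML fragment:

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # made-up sample

# Greedy .+ swallows everything from the first "<" to the last ">".
print(re.findall(r"<.+>", html))     # ['<b>bold</b> and <i>italic</i>']

# A negated class stops at the first ">" and yields each tag separately.
print(re.findall(r"<[^>]+>", html))  # ['<b>', '</b>', '<i>', '</i>']

# Capturing the text between a specific pair of tags.
print(re.findall(r"<b>(.*?)</b>", html))  # ['bold']
```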

3. Can I specify which strings inside markup tags I want to match?

Yes, you can use character classes such as [A-Z] or [0-9] to specify which characters you want to match inside the tags. You can also use quantifiers: + matches one or more instances of the preceding character or class, and * matches zero or more.

4. How can I exclude certain strings from being matched inside markup tags?

You can use ^ as the first character inside a character class to negate it (outside a class, ^ anchors the match to the start of the string). For example, to match only HTML tags whose content does not start with a digit, you can use the pattern <[^\d>][^>]*>.
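
A small sketch of negation inside a character class, in Python with made-up tag strings:

```python
import re

tags = ["<1abc>", "<abc1>", "<b>"]  # made-up samples

# [^\d>] = one character that is neither a digit nor ">";
# [^>]*  = the rest of the tag content, up to the closing ">".
no_leading_digit = re.compile(r"<[^\d>][^>]*>")

print([t for t in tags if no_leading_digit.fullmatch(t)])  # ['<abc1>', '<b>']
```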

5. Are there any resources for learning more about using regex for matching strings inside markup tags?

Yes, there are many online tutorials and guides available for learning about regex and its use in matching strings inside markup tags. Some popular resources include the MDN Web Docs, Regular-Expressions.info, and W3Schools.
