I need a regex to get stuff between <li> and </li>

  • Thread starter: SlurrerOfSpeech
AI Thread Summary
The discussion is about extracting a list of names from HTML with regular expressions in C#.NET. The user wants the names in the <ul> list that follows the <h2>Friends</h2> header. A first pattern isolates that block, and a second pattern matches each list item, but the full match includes the <li> tags, whereas the user wants only the text between them. Suggestions include reading the first capture group ($1 or \1) instead of the full match, or using lookbehind and lookahead assertions, for example (?<=<li>)[a-zA-Z0-9. ]+(?=</li>). The conversation also raises the option of looping over the matches to collect each name, and asks whether a regex-only solution is required.
SlurrerOfSpeech
What I'm ultimately trying to do is get
Code:
Some Guy, Some Other Guy, Some Guy 2, Some W. Bush
from an expression like

Code:
<h2>Friends</h2><ul><li>Some Guy</li><li>Some Other Guy</li><li>Some Guy 2</li><li>Some W. Bush</li></ul>

This expression is in a much larger piece of text but is the only time an expression of this exact form is in it. I'm using

Code:
(?s)<h2>Friends</h2>.*?<ul>.*?</ul>

to get the expression and

Code:
<li>([a-zA-Z0-9. ]+)</li>

to get

Code:
<li>Some Guy</li>, <li>Some Other Guy</li>, <li>Some Guy 2</li> and <li>Some W. Bush</li>
, but I actually want what's BETWEEN the tags.
 
$1 or \1 (depending on the interpreter) gives the content of the first capture group (the part of the pattern in parentheses) instead of the full match; here that is the text inside the tags.
Alternatively, lookahead and lookbehind are an option, but more complicated and not necessary here.
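To make the difference concrete, here is a minimal sketch of both options in C# with System.Text.RegularExpressions. It assumes a single list item held in a string; the class name and the sample string are made up for illustration. In .NET the capture group is read through `Groups[1]` rather than `$1` or `\1`.

```csharp
using System;
using System.Text.RegularExpressions;

class CaptureGroupDemo
{
    static void Main()
    {
        // Sample input, taken from one item of the list in the question.
        string item = "<li>Some W. Bush</li>";

        // Capture-group approach: the full match includes the tags,
        // while group 1 holds only the parenthesized part.
        Match m = Regex.Match(item, @"<li>([a-zA-Z0-9. ]+)</li>");
        Console.WriteLine(m.Value);           // full match, tags included
        Console.WriteLine(m.Groups[1].Value); // text between the tags

        // Lookaround approach: the tags are asserted but not consumed,
        // so the full match is already just the inner text.
        Match n = Regex.Match(item, @"(?<=<li>)[a-zA-Z0-9. ]+(?=</li>)");
        Console.WriteLine(n.Value);
    }
}
```

Both approaches yield the same inner text here; the capture group keeps the pattern from the question unchanged, which is why it is usually the simpler choice.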
 
What language and/or regular expression engine are you using?
 
FactChecker said:
What language and/or regular expression engine are you using?

C#.NET, System.Text.RegularExpressions
 
Are you ok with using a loop to extract out the groups? Or should it be regex only?
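If a loop is acceptable, one possible sketch is below. It reuses the two patterns from the question: the first isolates the Friends block, the second is applied to that block, and group 1 of each match is collected. The helper name `ExtractFriends` and the surrounding sample HTML are assumptions for illustration, not something from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class FriendsExtractor
{
    // Hypothetical helper: returns the names as one comma-separated string,
    // or an empty string if the Friends block is not found.
    static string ExtractFriends(string html)
    {
        // Step 1: isolate the <h2>Friends</h2> ... </ul> block.
        Match block = Regex.Match(html, @"(?s)<h2>Friends</h2>.*?<ul>.*?</ul>");
        if (!block.Success) return string.Empty;

        // Step 2: loop over every <li> match inside the block and
        // keep only capture group 1, the text between the tags.
        var names = new List<string>();
        foreach (Match m in Regex.Matches(block.Value, @"<li>([a-zA-Z0-9. ]+)</li>"))
        {
            names.Add(m.Groups[1].Value);
        }
        return string.Join(", ", names);
    }

    static void Main()
    {
        // Made-up surrounding text around the snippet from the question.
        string html = "<p>other text</p><h2>Friends</h2><ul><li>Some Guy</li>"
                    + "<li>Some Other Guy</li><li>Some Guy 2</li>"
                    + "<li>Some W. Bush</li></ul><p>more text</p>";

        Console.WriteLine(ExtractFriends(html));
        // prints: Some Guy, Some Other Guy, Some Guy 2, Some W. Bush
    }
}
```

Note that the character class only allows letters, digits, dots and spaces, so a name containing anything else (an apostrophe or a hyphen, say) would be skipped unless the class is widened.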
 