I need a regex to get stuff between <li> and </li>

  1. Nov 13, 2015 #1
    What I'm ultimately trying to do is get the
    Code (Text):
    Some Guy, Some Other Guy, Some Guy 2, Some W. Bush
    from an expression like

    Code (Text):
    <h2>Friends</h2><ul><li>Some Guy</li><li>Some Other Guy</li><li>Some Guy 2</li><li>Some W. Bush</li></ul>
    This expression sits inside a much larger piece of text, but it is the only expression of this exact form in it. I'm using

    Code (Text):
    (?s)<h2>Friends</h2>.*?<ul>.*?</ul>
    to get the expression and

    Code (Text):
    <li>([a-zA-Z0-9. ]+)</li>
    to get

    Code (Text):
    <li>Some Guy</li>, <li>Some Other Guy</li>, <li>Some Guy 2</li> and <li>Some W. Bush</li>
    , but I actually want what's BETWEEN the tags.
     
  3. Nov 13, 2015 #2

    mfb

    Staff: Mentor

    $1 or \1 (depending on the regex engine) gives the content of the first capture group (the part in parentheses) instead of the full match; here, that is the text inside the tags.
    Alternatively, lookahead and lookbehind are an option, but they are more complicated and not necessary here.
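
For illustration, here is a minimal sketch of the capture-group approach, using the two patterns from the first post. Python is used only so the example is runnable; the variable names (`html`, `block`, `names`) are illustrative and not from the thread, and the sample string is just the snippet quoted above rather than a full page.

```python
import re

# Sample input copied from the first post; in practice this would be
# the much larger page text (assumption: it is already loaded as a string).
html = (
    "<h2>Friends</h2><ul><li>Some Guy</li><li>Some Other Guy</li>"
    "<li>Some Guy 2</li><li>Some W. Bush</li></ul>"
)

# Step 1: isolate the Friends block with the first pattern from the post.
block = re.search(r"(?s)<h2>Friends</h2>.*?<ul>.*?</ul>", html)
block_text = block.group(0) if block else ""

# Step 2: for every <li> match, read capture group 1 (the parenthesized
# part) rather than group 0 (the full match, which includes the tags).
names = [
    m.group(1)
    for m in re.finditer(r"<li>([a-zA-Z0-9. ]+)</li>", block_text)
]

print(", ".join(names))  # Some Guy, Some Other Guy, Some Guy 2, Some W. Bush
```

In .NET the same idea should carry over by reading `Groups[1].Value` from each `Match` instead of `Value`, though that is not shown in the thread.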
     
  4. Nov 13, 2015 #3

    FactChecker

    Science Advisor
    Gold Member

    What language and/or regular expression engine are you using?
     
  5. Nov 14, 2015 #4
    C#.NET, System.Text.RegularExpressions
     
  6. Nov 14, 2015 #5

    adjacent

    Gold Member

    Are you OK with using a loop to extract the groups, or should it be regex only?
     
  7. Nov 14, 2015 #6
  8. Nov 14, 2015 #7

    mfb

    Staff: Mentor

    Hmm, I don't see how to get subpatterns there.

    You can try (?<=<li>)[a-zA-Z0-9. ]+(?=</li>)
    This uses lookahead and lookbehind, so the match itself contains only the text between the tags.
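
A small sketch of the lookaround pattern above, again in Python purely for a runnable illustration (the names `fragment` and `matches` are made up for this example). Because the `<li>` and `</li>` parts are only asserted and never consumed, the full match is already the text between the tags and no capture group is needed.

```python
import re

# The list fragment from the first post, used as sample input.
fragment = (
    "<ul><li>Some Guy</li><li>Some Other Guy</li>"
    "<li>Some Guy 2</li><li>Some W. Bush</li></ul>"
)

# (?<=<li>) asserts that "<li>" comes right before the match;
# (?=</li>) asserts that "</li>" comes right after it.
pattern = re.compile(r"(?<=<li>)[a-zA-Z0-9. ]+(?=</li>)")

# The pattern has no capture groups, so findall returns the full matches.
matches = pattern.findall(fragment)

for name in matches:
    print(name)
```

One caveat when porting between engines: Python's `re` module only accepts fixed-width lookbehind, which `(?<=<li>)` satisfies, whereas .NET is more permissive.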
     