Regular Expressions and Yahoo Pipes

SUMMARY

This discussion focuses on using Yahoo Pipes to create an RSS feed from an HTML table that lists new posts on a web forum. The user tries to extract specific fields, such as titles, posters, and topic descriptions, with regular expressions. The patterns provided, including ^.*title=\".*\">(.*)<\/a>.* for titles, yield no matches, and the user suspects a problem with how characters must be escaped in Yahoo Pipes' regex implementation. A reply suggests probing the dialect with simple test patterns; the user later resolves most of the extraction with help from a Yahoo Pipes message-board post and then asks how to format dates.

PREREQUISITES
  • Familiarity with Yahoo Pipes and its functionalities
  • Understanding of regular expressions, particularly in a Perl context
  • Basic knowledge of HTML structure and elements
  • Experience with RSS feed creation and manipulation
NEXT STEPS
  • Research Yahoo Pipes regular expression syntax and limitations
  • Learn advanced regular expression techniques for HTML parsing
  • Explore methods for formatting dates in Yahoo Pipes
  • Investigate alternative tools for creating RSS feeds from HTML content
USEFUL FOR

This discussion is beneficial for web developers, data scrapers, and anyone interested in automating RSS feed generation from HTML content using Yahoo Pipes and regular expressions.

John Creighto
I'm trying to create an RSS feed from an HTML table on the main page of a web forum, because that table displays the new posts.

I'm using Yahoo Pipes; you can see my attempt here:
http://pipes.yahoo.com/pipes/pipe.info?_id=0e72fee43090386fddbc9191f5cddc86

The pipes work up to the regular expression block.

The input to my regular expression block is:

Code:
<a rel="nofollow" target="_blank" href="http://thepeacearch.com/forum/showthread.php?t=16681" title="So, recently I've been looking for ways to make my online time more efficient. One thing I'm looking to do is find ways to combine information from...">My Enviornment rss Feed</a>



<div class="smallfont">

<span style="cursor:pointer;">s243a</span>

</div>


<div class="smallfont">Today <span class="time">02:47 AM</span></div>

 

 
<div class="smallfont" style="text-align:right;white-space:nowrap;">
Today <span class="time">02:47 AM</span><br />
by <a rel="nofollow" target="_blank" href="http://thepeacearch.com/forum/member.php?find=lastposter&amp;t=16681">s243a</a> <a rel="nofollow" target="_blank" href="http://thepeacearch.com/forum/showthread.php?p=295884#post295884"><img alt="" border="0" src="http://thepeacearch.com/images/lustrous/buttons/lastpost.gif" title="Go to last post"/></a>
</div>
 


<span class="smallfont">0</span> 


<span class="smallfont">3</span>

<span class="smallfont">0</span> 


<span class="smallfont">3</span>

I try to extract the title as follows:

Code:
^.*title=\".*\">(.*)<\/a>.*
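Yahoo Pipes' regex engine can't be run outside Pipes, so this sketch uses Python's Perl-style re module as a stand-in; the two dialects may differ in details. It applies the title pattern to a single-line excerpt of the first cell (description shortened) and compares it with a tighter variant that uses negated character classes. On a single line Python matches both, which hints that the failure may lie elsewhere, for example in how the multi-line input is handled.

```python
import re

# One-line excerpt of the thread-link cell from the input (description shortened).
row = ('<a rel="nofollow" target="_blank" '
       'href="http://thepeacearch.com/forum/showthread.php?t=16681" '
       'title="So, recently I\'ve been looking for ways...">My Enviornment rss Feed</a>')

# The pattern from the post as a raw string; \" and \/ just match " and /.
original = r'^.*title=\".*\">(.*)<\/a>.*'

# Tighter variant: [^"]* cannot run past the title's closing quote,
# and [^<]* stops at the start of </a>.
tighter = r'title="[^"]*">([^<]*)</a>'

for pattern in (original, tighter):
    m = re.search(pattern, row)
    print(m.group(1) if m else None)   # prints "My Enviornment rss Feed" both times
```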

I extract the poster as follows:

Code:
^.*<span style=\"cursor\:pointer\;\"(.*)\<\/span\>.*
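Using Python's re as a stand-in for the Pipes engine (an assumption; the dialects are not guaranteed to match), two problems with the poster pattern show up: there is no > between the closing quote and the capture group, and without a "dot matches newline" option, ^.* cannot reach a span that sits on a later line. The cell variable is a shortened, hypothetical excerpt that keeps the original line breaks.

```python
import re

# Shortened excerpt of the input, keeping the newlines before the <span>.
cell = ('<a href="http://thepeacearch.com/forum/showthread.php?t=16681">My Enviornment rss Feed</a>\n'
        '\n'
        '<div class="smallfont">\n'
        '\n'
        '<span style="cursor:pointer;">s243a</span>\n'
        '\n'
        '</div>\n')

# Pattern from the post (note: no ">" before the capture group).
original = r'^.*<span style=\"cursor\:pointer\;\"(.*)\<\/span\>.*'

# Without re.DOTALL, "^.*" cannot cross the newlines, so nothing matches.
print(re.search(original, cell))                          # None
# With re.DOTALL it matches, but the missing ">" leaks into the capture.
print(re.search(original, cell, re.DOTALL).group(1))      # >s243a

# Corrected: include the ">" and drop the "^.*" prefix so re.search scans freely.
fixed = r'<span style="cursor:pointer;">([^<]*)</span>'
print(re.search(fixed, cell).group(1))                    # s243a
```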

Finally, I extract the topic description as follows:

Code:
^.*title\=\"(.*)\".*
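For the description, a negated class such as [^"]* is safer than a greedy (.*) followed by a quote, because the greedy form runs to the last quote on the line whenever another attribute follows title. The input also contains a second title attribute (the "Go to last post" button), so in this Python sketch re.findall returns both and the first is taken as the description; the excerpt is shortened, and treating the first title as the description is an assumption about the page layout.

```python
import re

# Shortened excerpt with the two title attributes found in the input: the topic
# description on the thread link and the "Go to last post" button image.
html = ('<a href="http://thepeacearch.com/forum/showthread.php?t=16681" '
        'title="So, recently I\'ve been looking...">My Enviornment rss Feed</a>\n'
        '<img alt="" border="0" '
        'src="http://thepeacearch.com/images/lustrous/buttons/lastpost.gif" '
        'title="Go to last post"/>\n')

# [^"]* cannot step past a closing quote, so each capture stays inside one attribute.
titles = re.findall(r'title="([^"]*)"', html)
description = titles[0]   # assumes the thread link's title comes first
print(description)        # So, recently I've been looking...

# For contrast, a greedy (.*)" runs to the LAST quote on the line when another
# attribute follows title.
greedy = re.search(r'title=\"(.*)\"', '<img title="Go to last post" src="lastpost.gif"/>')
print(greedy.group(1))    # Go to last post" src="lastpost.gif
```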

For each of the above regular expressions, the match is replaced with what is inside the parentheses (the capture group). Unfortunately, none of my expressions match. I'm not sure which characters I need to escape, but the Yahoo Pipes syntax is supposed to be based on Perl's, and Wikipedia tells me that in Perl any non-alphanumeric character can be escaped with a backslash.
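The search-and-replace step can be imitated with Python's re.sub, where the replacement \1 plays the role that $1 plays in Perl. This sketch, on a hypothetical two-line stand-in for the input, shows one way an apparent "no match" can arise: without re.DOTALL only the first line is replaced and the remaining lines pass through untouched. It also checks the escaping question: in Perl-style syntax, including Python's, a backslash before punctuation simply matches that character literally. Whether Yahoo Pipes exposes an equivalent of DOTALL is not established here.

```python
import re

# Hypothetical two-line stand-in for the multi-line input.
text = ('<a title="So, recently...">My Enviornment rss Feed</a>\n'
        '<span style="cursor:pointer;">s243a</span>\n')

pattern = r'^.*title=\".*\">(.*)<\/a>.*'

# Replace the whole match with capture group 1 (Perl: $1, Python: \1).
line_only = re.sub(pattern, r'\1', text)
# With DOTALL the trailing ".*" also swallows the newline and every later line.
whole = re.sub(pattern, r'\1', text, flags=re.DOTALL)

print(repr(line_only))   # first line replaced, second line left as-is
print(repr(whole))       # only the captured title remains

# Escaped punctuation is harmless: each escape matches the literal character.
print(bool(re.fullmatch(r'\"\:\;\<\>\/', '":;<>/')))   # True
```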
 
Well, I don't know about Yahoo Pipes regexes, but in the absence of documentation (I didn't find any good documentation in a quick search), you might as well experiment.
Can you match:
horse
h(orse) (replace the orse with something else)
^.*horse
\"h(orse)\"
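The suggested experiments can be scripted against Python's Perl-style re module to see what such a dialect should report for each pattern, giving a baseline to compare with what the Pipes Regex module actually returns. The subject string is a made-up example that contains quotes so the last pattern has something to match; Python's behaviour is an assumption about, not a description of, the Pipes engine.

```python
import re

# Made-up subject string; the quotes give the last pattern something to match.
subject = 'say "horse" now'

# The four test patterns from the reply, in order of increasing complexity.
patterns = [r'horse', r'h(orse)', r'^.*horse', r'\"h(orse)\"']

results = {}
for pattern in patterns:
    m = re.search(pattern, subject)
    # Record the whole match and any captured groups, or None if nothing matched.
    results[pattern] = (m.group(0), m.groups()) if m else None
    print(f'{pattern:14} -> {results[pattern]}')
```

If a pattern that matches here fails in Pipes, that points to the specific feature (anchors, groups, escaped quotes) the Pipes dialect treats differently.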
 
I was able to figure out most of what I wanted with help from this post:

http://discuss.pipes.yahoo.com/Message_Boards_for_Pipes/threadview?m=tm&bn=pip-DeveloperHelp&tid=9741&mid=9742&tof=3&rt=2&frt=2&off=1

All I need to do now is figure out how to format the date:

http://discuss.pipes.yahoo.com/Message_Boards_for_Pipes/threadview?m=tm&bn=pip-DeveloperHelp&tid=9749&mid=9749&tof=1&frt=2
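RSS 2.0 expects pubDate values in RFC 822 format, whereas the forum table shows relative stamps such as "Today 02:47 AM". A minimal Python sketch of the conversion follows; the function name to_rfc822, the assumption that the only relative words are "Today" and "Yesterday", and the UTC reference time are all hypothetical, since the forum's time zone is unknown.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Hypothetical helper: turn the forum's relative stamps ("Today 02:47 AM",
# "Yesterday 11:05 PM") into RFC 822 dates suitable for an RSS <pubDate>.
STAMP = re.compile(r"(Today|Yesterday)\s+(\d{1,2}):(\d{2})\s+(AM|PM)")

def to_rfc822(stamp: str, now: datetime) -> str:
    m = STAMP.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"unrecognised timestamp: {stamp!r}")
    day, hh, mm, ampm = m.groups()
    # 12-hour clock to 24-hour: 12 AM -> 0, 12 PM -> 12.
    hour = int(hh) % 12 + (12 if ampm == "PM" else 0)
    date = now.date() - timedelta(days=1 if day == "Yesterday" else 0)
    dt = datetime(date.year, date.month, date.day, hour, int(mm), tzinfo=now.tzinfo)
    return format_datetime(dt)

# Assumed reference time; the forum's real time zone is unknown.
now = datetime(2010, 1, 1, 12, 0, tzinfo=timezone.utc)
print(to_rfc822("Today 02:47 AM", now))      # Fri, 01 Jan 2010 02:47:00 +0000
print(to_rfc822("Yesterday 11:05 PM", now))  # Thu, 31 Dec 2009 23:05:00 +0000
```

Passing the reference time in explicitly, rather than reading the clock inside the function, keeps the conversion testable and makes the time-zone assumption visible at the call site.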
 