February 13, 2004

On Screen Scraping to RSS Feeds

While googling the subject, I was led to the MSDN Academic Alliance's treatment of the issue: Creating a Generic Site-To-Rss Tool by Roy Osherove, from October 2003. It is an interesting read for anyone deploying desktop and web applications with Microsoft's .NET Technology.

Obviously, I found the code interesting, but the most intriguing nugget in the article was a regex tool that Roy Osherove developed on the Microsoft .NET Framework.

Aptly named The Regulator, it is one of the best-of-breed regular expression utilities that I have seen.
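Where does a regex tool fit into scraping a site into RSS? The general idea is to write a pattern that matches each item on the page, pull out the pieces (title, link), and wrap them in RSS XML. A tool like The Regulator is where you would build and test that pattern. The sketch below is in Python rather than .NET and is not Osherove's code. The `headline` class name and the `html_to_rss` function are hypothetical. A real site needs its own pattern, and that pattern breaks whenever the site's markup changes.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: assumes each headline on the scraped page looks like
# <a class="headline" href="URL">TITLE</a>. Every real site needs its own.
ITEM_PATTERN = re.compile(
    r'<a\s+class="headline"\s+href="(?P<link>[^"]+)">(?P<title>[^<]+)</a>',
    re.IGNORECASE,
)


def html_to_rss(page, channel_title, channel_link):
    """Scrape headline links out of raw HTML and wrap them in an RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Scraped from " + channel_link
    for match in ITEM_PATTERN.finditer(page):
        item = ET.SubElement(channel, "item")
        # Decode HTML entities (e.g. &amp;) so ElementTree does not escape them twice.
        ET.SubElement(item, "title").text = html.unescape(match.group("title")).strip()
        ET.SubElement(item, "link").text = html.unescape(match.group("link"))
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    sample = ('<p><a class="headline" href="http://example.com/a">First story</a></p>'
              '<p><a class="headline" href="http://example.com/b">Second &amp; last</a></p>')
    print(html_to_rss(sample, "Example", "http://example.com/"))
```

All of the site-specific knowledge lives in the regular expression. That is why a good regex workbench matters more than the surrounding plumbing.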

UPDATE: I am not alone in the quest for RSS scraping. The top link returned from searching Google for RSS Scraper is the weblog of Guy Bjerke. Mr. Bjerke has posted an article of interest to Radio UserLand subscribers: he has tried Stapler and is now using the beta web app myRSS.

Posted by akvalley at February 13, 2004 08:03 AM
Comments

I'm not sure I understand what The Regulator is for. I understand that regular expressions are like mega wildcards, but is there a way to use The Regulator to convert HTML -> RSS?

Posted by: bkam at March 15, 2004 07:11 PM

Along with a former student, I am implementing the Hilltop High website, www.hilltoplancers.org. Our most recent project is an application for teachers called AutoLink. Teachers will be able to create a hot list for their subject area or for a trip to the computer lab. The data fields are title, URL and description, which is similar to Google's search results. What I am suggesting to my programmer is to give teachers the ability to do a Google search and then add the items to their hot list without retyping or cutting and pasting. I guess the term for this is screen scraping the data into MySQL. Should we use RSS for the intermediate step, or find some other way to parse the Google search results?

Rick

Posted by: Rick Lakin at July 2, 2004 08:13 PM

Rick, why not use the Google API?

http://www.google.com/apis/

Posted by: Tomun at August 5, 2004 12:26 PM