February 13, 2004
on Screen Scraping to RSS feeds
While googling on the subject matter, I was led to the MSDN Academic Alliance’s treatment of the issue: Creating a Generic Site-To-Rss Tool by Roy Osherove, published in October 2003. It is an interesting read for those deploying desktop and web applications using Microsoft’s .NET technology.
Obviously, I found the code interesting, but the most intriguing nugget in the article was a regex tool that Roy Osherove has developed on the Microsoft .NET Framework.
Aptly named The Regulator, it is one of the best-of-breed regular expression utilities that I have seen.
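Roy Osherove's article targets .NET, and its code is not reproduced here. As a rough illustration of the general technique only, here is a minimal Python sketch of regex-based screen scraping into an RSS 2.0 feed, assuming the target page lists its entries as plain `<a href="...">` links. The pattern `LINK_RE` and the helpers `scrape_links` and `to_rss` are hypothetical names chosen for this example, not part of the article or of The Regulator.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern (an assumption, not from the article): an <a ...> tag
# with a double-quoted href, followed by its link text. Tune it per site.
LINK_RE = re.compile(
    r'<a\s+[^>]*?href="(?P<url>[^"]+)"[^>]*>(?P<title>.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)
# Crude tag stripper for markup nested inside the link text (e.g. <b>...</b>).
TAG_RE = re.compile(r"<[^>]+>")


def scrape_links(page):
    """Return a list of (title, url) pairs matched by LINK_RE."""
    items = []
    for m in LINK_RE.finditer(page):
        title = html.unescape(TAG_RE.sub("", m.group("title"))).strip()
        url = html.unescape(m.group("url"))
        if title:  # skip image-only or empty links
            items.append((title, url))
    return items


def to_rss(channel_title, channel_link, items):
    """Build a minimal RSS 2.0 document; ElementTree escapes &, < and >."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Scraped from " + channel_link
    for title, url in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample page standing in for a real site.
    page = (
        '<ul><li><a href="http://example.com/1">First &amp; best</a></li>'
        '<li><a class="x" href="http://example.com/2"><b>Second</b></a></li></ul>'
    )
    print(to_rss("Example", "http://example.com/", scrape_links(page)))
```

Patterns like this break as soon as a page's layout changes, which is where an interactive regex tester such as The Regulator helps: the pattern can be refined against a saved copy of the page before it is wired into a feed generator.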
UPDATE: I am not alone in the quest for RSS scraping. The top link returned by a Google search for "RSS Scraper" is the weblog of Guy Bjerke. Mr. Bjerke has posted an article of interest to Radio UserLand subscribers: he has tried Stapler and is now using the beta web app myRSS.
Posted by akvalley at February 13, 2004 08:03 AM
I’m not sure I understand what the regulator is for. I understand that regular expressions are like mega wildcards, but is there a way to use the regulator to convert HTML -> RSS?
Posted by: bkam at March 15, 2004 07:11 PM
Along with a former student, I am implementing the Hilltop High website, www.hilltoplancers.org. Our most recent project is an application for teachers called AutoLink. Teachers will be able to create a hotlist for their subject area or for a trip to the computer lab. The data fields are title, URL and description, similar to Google's search results. What I am suggesting to my programmer is to give teachers the ability to do a Google search and then add the items to their hotlist without retyping or cutting and pasting. I guess the term for this is screen scraping the data into MySQL. Should we use RSS for the intermediate step, or find some other way to parse the Google search results?
Rick
Posted by: Rick Lakin at July 2, 2004 08:13 PM
Rick, why not use the Google API?
http://www.google.com/apis/
Posted by: Tomun at August 5, 2004 12:26 PM