Web-scraping solutions?
There are times you want to automatically gather information from web pages, possibly from sites whose owners would not derive great joy from you doing this.
This is not a terribly hard technical problem, but it is also far from trivial given all the weird things that can happen on websites these days.
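To make the "not terribly hard" half of that concrete, here is a minimal sketch of the easy core of a scraper: pulling link targets out of a page that has already been downloaded. It uses only Python's standard library. The names `LinkExtractor` and `extract_links` are hypothetical, invented for this illustration, and have nothing to do with any vendor's toolkit. The "far from trivial" half (logins, JavaScript-generated content, rate limits, sites that actively block you) is exactly what this sketch leaves out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect absolute link targets from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are of interest; everything else is ignored.
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href (e.g. <a name="...">).
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in `html`, resolved against `base_url`."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Deliberately sloppy markup: an unquoted attribute and unclosed tags.
    page = '<p><a href="/a">A</a><a href=b.html>B<a name="x">no href</a>'
    for link in extract_links(page, "http://example.com/dir/page.html"):
        print(link)
```

The standard-library parser is fairly tolerant of sloppy markup, which is why it copes with the sample above, but tolerance for bad HTML is only the first of the weird things a real scraper has to handle.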
Well, I ran into a couple of guys at the Text Mining Summit who claim to solve just that problem. Their company -- the amusingly named Scrapegoat -- seems to consist of a few programmers and a toolkit, which is probably enhanced continually as ever more webpage weirdnesses are discovered. They prefer to do the programming for you, but will sell you the toolkit for your own use if you absolutely insist.
I haven't actually checked out their work, talked with customers, or even seriously given them my usual third-degree grilling. In other words, I haven't done proper analysis on them at all. But with those rather comprehensive disclaimers out of the way, my hunch is that you won't be sorry you gave them a call.
EDIT: At first blush, it appears that QL2 is a more industrial-strength and productized version of the same thing. I'll try to check them out and report back.




