
Wikipedia does not like spiders

2008-07-17 22:51

I am currently writing a little Python script with BeautifulSoup and Mutagen that parses Wikipedia pages and retrieves the release year of an album. Much to my surprise, I received a "Forbidden" error code.

What is happening is that Wikipedia refuses requests from crawlers. Fortunately, I could make urllib send an authorized User-Agent, but WTF? This kind of protection is so easily circumvented that all it does is annoy people.
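For reference, here is a minimal sketch of the workaround: build the request with an explicit User-Agent header instead of relying on urllib's default one. It assumes Python 3's `urllib.request` (a 2008 script would have used Python 2's urllib/urllib2, where the mechanism differs). The `USER_AGENT` string, the contact address and the helper names are hypothetical. Wikipedia's own User-Agent policy asks for a descriptive string with contact details rather than a spoofed browser.

```python
import urllib.request

# Hypothetical identifier: a descriptive name plus a way to contact the author.
USER_AGENT = "AlbumYearScript/0.1 (contact: you@example.org)"


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Return a Request that sends the given User-Agent instead of urllib's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str) -> str:
    """Download the page and decode it as UTF-8 (this performs a network request)."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    req = build_request("https://en.wikipedia.org/wiki/Abbey_Road")
    # urllib stores header names capitalized, hence "User-agent" here.
    print(req.get_header("User-agent"))
```

The HTML returned by `fetch()` can then be handed to BeautifulSoup as before. Only the request headers change.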
