Skip to content
14 June 32011 / Robin Wellner

The Path to Philosophy

Wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at “Philosophy”.
— Randall Munroe, title text of xkcd/903.

I may have written a script inspired by the alt-text of http://xkcd.com/903/ --- blog post and GitHub Gist will follow whenever.

The method described in the quote at the top of this page is actually away to make a graph out of Wikipedia, I realized. Some sort of reverse tree. Not exactly the right description, since cycles are possible (and do exist on Wikipedia).

I wondered whether it would be possible to write a script that starts from some article on Wikipedia and goes on until it finds Philosophy or ends up in a cycle.

As it turns out, it is. Once you get past the parsing of HTML, it is not that hard, even. For that, I avoided regular expressions, and went straight to html.parser.HTMLParser. Can’t say I used it before, but it was a breeze. 90% of the code in the parser is concerned with keeping track of all the conditions whether it is the right time to follow a link. Surprisingly, handling the “not in parentheses” condition was the simplest. I want to keep the code out of this post and in a Gist, but I can’t resist sharing this snippet beforehand:

	def handle_data(self, data):
		self.open_parens += data.count('(') - data.count(')')

For my walker, I ran into the problem of redirects: this could cause problems — or at least redundancy — for cycles. I solved the problem by making what can be described as a dictionary of sets. I sought help from Stack Overflow, and got a neat new toy out of it, but in the end I just went with what I had already.

Long story short: most articles seem to end up in Philosophy eventually (often in ~20 steps), but some end up in a cycle. It is possible that there is some article in there without a valid link, which means there could be dead ends (I don’t check for that, only for Philosophy and cycles), but I haven’t found one yet.

I thought of building an extensive network of Wikipedia articles, but decided against it because of the high volume of requests I would need to send to Wikipedia for it.

And finally, for those of you craving code, here it is. Requires Python 3.

Advertisements

2 Comments

Leave a Comment
  1. gamerwho / Jun 14 2011 20:18

    “I thought of building an extensive network of Wikipedia articles, but decided against it because of the high volume of requests I would need to send to Wikipedia for it.”

    You can still do that! Wikipedia shares a dump of their database: http://en.wikipedia.org/wiki/Wikipedia:Database_download

    • Robin Wellner / Jun 14 2011 21:15

      I thought of that, but the volume of downloading I would have to do anyway put me off.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: