Last night, while the Red Sox were playing (and leading 231-2), I tried picking up a book to read but was a little too distracted. I needed something mindless but faintly interactive. So I logged on to Amazon Mechanical Turk to earn $0.02 a minute clicking on links.
Actually, I was hoping for some faintly interesting writing jobs, but they were all the sleazy “write a glowing product review on this blog” kind. Instead, one of the more mundane tasks caught my eye: “help refine search results.” Hey, that’s basically what I do for a living.
The task was to rank the relevancy of various web page results for a given search query. All of the results seemed to be from Wikipedia (including the Talk: pages and other material not likely to be of interest to a general audience). The queries appeared to be genuine user data.
I learned a number of things from this experience. First of all, searching Wikipedia is often nothing more than a snapshot of the day’s vandalism. The snippet from “Roman Catholicism in Myanmar” suggested that I “keep doin it you pimp!!!” Quite a few pages had no other content besides “fag”. These articles were corrected immediately, no doubt, but their cached states were immortalized by the search engine, and users’ misspellings were often exact matches for misspellings by vandals.
Happily, the query for chicken sexing matched an entire article devoted to it and very few vandalized articles containing the words “chicken” and “sex”. I wish more people knew to quote their search terms, because a great percentage of the queries matched an article with those exact words as its title. Had the terms been quoted, Google would have returned these highly relevant pages as the #1 result. Sometimes the queries would find an exact Wikipedia match but be too broad in their scope, resulting in a disambiguation page. Peevishly, I started ranking non-U.S. results highly (“Spanish Civil War” for civil war) even though I knew from context that these searches were all by Americans. A little historical perspective never hurt.
Another thing I might have learned (had I not known it already) is that search engines are idiots. I don’t know the answer to Who helped elect Arnold Schwarzenneger but I do know it wasn’t “New Kids on the Block”. There probably isn’t an answer to Who invented pi but it is definitely not “Hat”.
Often the searches weren’t so much actual queries as cries in the wilderness. I had no idea how to respond to parenting help or debt consolidation. Wikipedia didn’t either.
I stopped, eventually, not so much because I got bored or because the game was ending, but because the sad queries were getting to me. It’s like reading that page that turns up #1 in Google for the query cancel google. At first it’s kind of funny, you know, “Ha ha people don’t understand how search engines work.” Then you realize, here’s this incredible technology that has changed everything, and most people in this scientifically-advanced first world country don’t know a thing about it. And then you come to a query like the Bible says spending too much time on-line is not good, and it’s so much worse than you could’ve imagined that you just close the computer and walk away.