In my senior year of college, I took a class that blended information theory, algorithms, and networking. It was called “Algorithms at the end of the wire”. My project for that class was an application that finds links for articles in Wikipedia.

Working from the ideas presented in class on search-result ranking and vector space models, we proposed that, given a query article (an article to add links to), we can find the k articles already in Wikipedia that are most similar to it. We can then use the links in those articles to infer which links to create in the query article. In particular, each neighboring article suggests links for text that it has in common with the query, and the set of neighbors votes on each link, with each vote weighted by how close the voting article is to the query article. To determine the k nearest neighbors, we fetched articles from the Wikipedia corpus that contained text occurring in the query article, then ranked the results and picked the top k. Ranking was done separately using PageRank and Latent Semantic Indexing, and the two rankings were then aggregated.
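To make the voting step concrete, here is a minimal sketch in Python. It is not the code from our project: it swaps in plain term-frequency cosine similarity for the PageRank and Latent Semantic Indexing rankings, and the names `suggest_links`, `cosine`, and the toy corpus layout (title mapped to a pair of text and links) are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real system would also strip punctuation.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two term-frequency vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_links(query_text, corpus, k=3):
    """Suggest links for query_text by weighted voting among its k nearest neighbors.

    corpus is a hypothetical structure: title -> (text, links), where links
    maps anchor text to the title of the article it links to.
    """
    query_vec = Counter(tokenize(query_text))

    # Rank every article by similarity to the query and keep the top k.
    # Assumption: simple cosine similarity stands in for the aggregated ranking.
    scored = sorted(
        ((cosine(query_vec, Counter(tokenize(text))), title)
         for title, (text, _links) in corpus.items()),
        reverse=True,
    )[:k]

    # Each neighbor votes for the links whose anchor text also occurs in the
    # query; the vote is weighted by the neighbor's similarity to the query.
    votes = defaultdict(lambda: defaultdict(float))
    query_lower = query_text.lower()
    for similarity, title in scored:
        _text, links = corpus[title]
        for anchor, target in links.items():
            if anchor.lower() in query_lower:
                votes[anchor][target] += similarity

    # For each anchor text, keep the target with the highest total weight.
    return {anchor: max(targets, key=targets.get)
            for anchor, targets in votes.items()}
```

When two neighbors disagree about where the same anchor text should point, the weighting means the article closer to the query wins, which is the behavior described above.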

You can download a prototype of an editor implementing our algorithm here. The editor depends on a web service, so you need to be connected to the internet to use it. It is a C# application, so you can run it on Windows, or on Linux using Mono.

[Screenshots: kappa_01_prelinking.jpg, kappa_01_postlinking.jpg]
