Better search algorithms

April 23, 2008 at 4:37 pm | Posted in Uncategorized

I’m doing a bit of research for a few side projects that I’m working on. It basically has to do with data mining the new semantic web, which is becoming oh-so-prevalent on the modern-day internet. Anyway, I read an interesting article on search algorithms. Basically, there are many different approaches to getting relevant data. Some methods run a sophisticated algorithm over a single dataset. Others use a simpler, more basic algorithm but feed it several different types of data.

Guess which strategy works best?

Correct, the simpler one. The point is that more, independent data usually beats smarter algorithms. If you have different datasets from different sources, you can usually get a more accurate answer than you would by trying to squeeze every last drop of performance out of your favorite algorithm. Google adopts a similar strategy in its search. Not only does it index web pages, but it also bases rankings on the links that users actually click on. Combined, the two datasets give a more relevant answer than either one would on its own. Well, that’s a simple example anyway. In many ways, it’s like finding the location of an object in a two-dimensional space by combining readings from three radars at different locations.
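To make the idea concrete, here’s a rough sketch in Python of blending two independent signals into one ranking. Everything in it is made up for illustration: the page names, the scores, the `combine` and `rank` helpers, and the 50/50 weighting are my own assumptions, and this is certainly not how any real search engine computes its results.

```python
# Hypothetical toy data: none of these numbers come from a real search engine.
# Signal 1: how well each page's text matches the query (0 to 1).
text_scores = {"page_a": 0.9, "page_b": 0.8, "page_c": 0.4}
# Signal 2: fraction of users who clicked each page when it was shown (0 to 1).
click_rates = {"page_a": 0.1, "page_b": 0.6, "page_c": 0.5}


def combine(text_scores, click_rates, text_weight=0.5):
    """Blend two signals with a simple weighted average (an assumed scheme)."""
    combined = {}
    for page, text_score in text_scores.items():
        clicks = click_rates.get(page, 0.0)  # pages with no click data count as 0
        combined[page] = text_weight * text_score + (1 - text_weight) * clicks
    return combined


def rank(scores):
    """Return page names ordered from highest score to lowest."""
    return sorted(scores, key=scores.get, reverse=True)


text_only = rank(text_scores)
blended = rank(combine(text_scores, click_rates))

print("text only:", text_only)
print("blended:  ", blended)
```

With this toy data, the text-only ranking puts page_a first, but once the click data is blended in, page_b moves to the top because people actually click on it. The algorithm is about as simple as it gets; the second dataset is what changes the answer.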

A book I’d recommend that discusses good algorithms in general terms is “The Wisdom of Crowds” by James Surowiecki. It explains these concepts in rather interesting ways. The book is available here if anyone wants to read it: http://www.amazon.com/Wisdom-Crowds-James-Surowiecki/dp/0385721706/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1208993527&sr=8-1
