Skip to main content

The Levenshtein distance!!!

Must have seen Google responding intelligently to your query like:
Did you mean Google?

How does it does that? Simple, PHP has function called Levenshtein that will do it for you provided you have a collection of words to match against for.

The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform str1 into str2 . The complexity of the algorithm is O(m*n), where n and m are the length of str1 and str2 (rather good when compared to similar_text(), which is O(max(n,m)**3), but still expensive).

Try this:

-guess.php



Enter:



// input misspelled word
if(!isset($_GET['q'])) exit();
$input = $_GET['q'];

// array of words to check against
$words = array('apple','pineapple','banana','orange',
'radish','carrot','pea','bean','potato');

// no shortest distance found, yet
$shortest = -1;

// loop through words to find the closest
foreach ($words as $word) {

// calculate the distance between the input word,
// and the current word
$lev = levenshtein($input, $word);

// check for an exact match
if ($lev == 0) {

// closest word is this one (exact match)
$closest = $word;
$shortest = 0;

// break out of the loop; we've found an exact match
break;
}

// if this distance is less than the next found shortest
// distance, OR if a next shortest word has not yet been found
if ($lev <= $shortest || $shortest < 0) {
// set the closest match, and shortest distance
$closest = $word;
$shortest = $lev;
}
}

//echo "Input word: $input\n";
if ($shortest == 0) {
echo "Exact match found: $closest\n";
} else {
echo "Did you mean: $closest?\n";
}

?>
Interesting Is'nt it?
thanks to Code Jockeys community@orkut.

Comments

Popular posts from this blog

If programming languages were cars...

There have been many comparisons between programming languages and different aspects of human life by many individuals .This is one of them , a comparison between programming languages and cars.This post is inspired from the article originally written by Mike Vanier.This is an update to an old series of jokes about computer languages being like cars. Just go through this list and see if you can come out with your own list of comparisons.Here goes the list... The list Ada is a tank. A butt-ugly tank that never breaks down. People laugh uncontrollably if you tell them you drive Ada, but really, do you want to be driving a sports car in a war zone? Assembly Language is a bare engine; you have to build the car yourself and manually supply it with gas while it's running, but if you're careful it can go like a bat out of hell. Assembly Language : YOU are the car. Basic is a simple car useful for short drives to the local shops. Once popular with learner drivers, it has recen...

Google Facts

The name Google is a spelling error. The founders of the site, Larry page and Sergey Brin , thought they were going for Googol .. Googol is the mathematical term for 1 followed by 100 zeros. Initially, Larry and Sergey Brin called their search engine BackRub , named for its analysis of the of the web's "back links." The reason the google page is so bare is because t he founder didn't know HTML and just wanted a quick interface. The company's first office was in a garage , in Menlo Park, California . Google's first employee was Craig Silverstein, now Google's Director of technology. The basis of Google's search technology is called PageRank that assigns a rank to determine how useful it is. However, that is not why it is called PageRank. It is actually named after Google co-founder Larry Page . It would take 5,707 years for a person to search Google's 3 billion pages . The Google software does it in 0.5 seconds. The logos that appear on ...

T Shirt Quotes related to Computers

Last week while searching for some computer related quotes for T Shirt I came across certain quotes that I thought were very good.So I thought why not share these quotes with you.So here are these quotes,pick the one you like or if you have any of your favorites then do share it with us.Here is the list.. "Programmers don't byte, they nibble a bit" "To iterate is human, to recurse divine" " first 90% of the code accounts for the first 90% of the development time. The remaining 10% of the code accounts for the other 90% of the development time" "99% of all girls are beautiful, the rest 1% are in my college "ASC!! a stupid question,get a stupid ANS!" "In cartooned form Atom1 - I have lost an electron. Atom2 - Are you sure? Atom1 - I am positive." "There's no place like 127.0.0.1 (“Home” for the non-geeks)" "YouTube(logo) myspace(logo) and I'll Google(logo) your Yahoo(logo)." " I'm a progr...