Skip to main content

The Levenshtein distance!!!

Must have seen Google responding intelligently to your query like:
Did you mean Google?

How does it does that? Simple, PHP has function called Levenshtein that will do it for you provided you have a collection of words to match against for.

The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform str1 into str2 . The complexity of the algorithm is O(m*n), where n and m are the length of str1 and str2 (rather good when compared to similar_text(), which is O(max(n,m)**3), but still expensive).

Try this:

-guess.php



Enter:



// input misspelled word
if(!isset($_GET['q'])) exit();
$input = $_GET['q'];

// array of words to check against
$words = array('apple','pineapple','banana','orange',
'radish','carrot','pea','bean','potato');

// no shortest distance found, yet
$shortest = -1;

// loop through words to find the closest
foreach ($words as $word) {

// calculate the distance between the input word,
// and the current word
$lev = levenshtein($input, $word);

// check for an exact match
if ($lev == 0) {

// closest word is this one (exact match)
$closest = $word;
$shortest = 0;

// break out of the loop; we've found an exact match
break;
}

// if this distance is less than the next found shortest
// distance, OR if a next shortest word has not yet been found
if ($lev <= $shortest || $shortest < 0) {
// set the closest match, and shortest distance
$closest = $word;
$shortest = $lev;
}
}

//echo "Input word: $input\n";
if ($shortest == 0) {
echo "Exact match found: $closest\n";
} else {
echo "Did you mean: $closest?\n";
}

?>
Interesting Is'nt it?
thanks to Code Jockeys community@orkut.

Comments

Popular posts from this blog

T Shirt Quotes related to Computers

Last week while searching for some computer related quotes for T Shirt I came across certain quotes that I thought were very good.So I thought why not share these quotes with you.So here are these quotes,pick the one you like or if you have any of your favorites then do share it with us.Here is the list.. "Programmers don't byte, they nibble a bit" "To iterate is human, to recurse divine" " first 90% of the code accounts for the first 90% of the development time. The remaining 10% of the code accounts for the other 90% of the development time" "99% of all girls are beautiful, the rest 1% are in my college "ASC!! a stupid question,get a stupid ANS!" "In cartooned form Atom1 - I have lost an electron. Atom2 - Are you sure? Atom1 - I am positive." "There's no place like 127.0.0.1 (“Home” for the non-geeks)" "YouTube(logo) myspace(logo) and I'll Google(logo) your Yahoo(logo)." " I'm a progr...

Google Facts

The name Google is a spelling error. The founders of the site, Larry page and Sergey Brin , thought they were going for Googol .. Googol is the mathematical term for 1 followed by 100 zeros. Initially, Larry and Sergey Brin called their search engine BackRub , named for its analysis of the of the web's "back links." The reason the google page is so bare is because t he founder didn't know HTML and just wanted a quick interface. The company's first office was in a garage , in Menlo Park, California . Google's first employee was Craig Silverstein, now Google's Director of technology. The basis of Google's search technology is called PageRank that assigns a rank to determine how useful it is. However, that is not why it is called PageRank. It is actually named after Google co-founder Larry Page . It would take 5,707 years for a person to search Google's 3 billion pages . The Google software does it in 0.5 seconds. The logos that appear on ...

Computer Tips Bollywood Style

After a long gap I am writing a post...back in the blogger arena after such a long interval feels good.Here are some tips... As they say "A picture is worth a thousand words". TecH GaraGe (TG) is back with a new flavour in a new attire.Keep checking out this blog.