SQL Server Word Sound Comparison

Facebooktwittergoogle_plusredditpinterestlinkedinmail

As we learned in a previous post, you can use the SOUNDEX function to understand the sound that a word makes. You can then compare it to another word to see if the sound matches. This works okay for exact sound matches. But what if you want check to see if the word sound similar to another word? And how do you find out how similar the words are? SQL Server has you covered. In SQL Server 2005 they released the DIFFERENCE function.

 
The DIFFERENCE Function
When you use the SOUNDEX function in SQL Server, it returns a 4 character code that represents the sound of the word. The DIFFERENCE function compares two of these 4 character codes and returns a value between 0 and 4. The return value indicates how many of these characters in the 4 character code were the same. Here is an example:

Word SOUNDEX Code
Frog F620
Blog B420

DIFFERENCE: 2

You can see in the example above that if you compare the SOUNDEX code of the two words, only 2 characters are identical. The ‘F’ is different from the ‘B’. The ‘6’ is different from the ‘4’. But the ‘2’ and the ‘0’ are the same between the two codes. Thus… the difference is 2.

 
Put It All Together
Now that you understand how SOUNDEX and DIFFERENCE work, you can use them to create searches that are based on “sounding like a word”. Here is an example… remember, 0 means the words sound nothing alike and 4 means that the words sound exactly alike. In my example, I am going to match on a likeness of 3… which is sounding pretty similar.

 
 
Reference: http://msdn.microsoft.com/en-us/library/ms188753.aspx

SOUNDEX – Discovering The Sounds Of Words In SQL Server

Facebooktwittergoogle_plusredditpinterestlinkedinmail

Let’s face it, not everyone is a perfect speller all the time. Exact match queries are pretty easy to do in SQL Server. But, what if you wanted to correct their spelling or search based on a misspelled word? How is this possible? In SQL Server 2008+ you have the ability to do this with the SOUNDEX function.

SOUNDEX
The SOUNDEX function will take an input word and convert it in to a 4 character code representing the sound of the word. For example: goat -> G300.

So what does this 4 character code mean? The first character represents the first character in the string, the rest of the characters represent the rest of the characters in the string. Of course SQL Server does some magic to come up with this code. If you are interested more in the makeup of this code, see the MSDN link at the bottom of this post. Essentially, this 4 character code represents the sound of the word.

 
How To Use It
You can use this just like any other function in SQL Server.

An easy way to search based on word sound would be something like this:

The above query would find words like “dog”, “dig”, “dag”

 
Reference: http://msdn.microsoft.com/en-us/library/ms187384.aspx