SQL Server Word Sound Comparison


As we learned in a previous post, you can use the SOUNDEX function to understand the sound that a word makes. You can then compare it to another word to see if the sound matches. This works okay for exact sound matches. But what if you want check to see if the word sound similar to another word? And how do you find out how similar the words are? SQL Server has you covered. In SQL Server 2005 they released the DIFFERENCE function.

When you use the SOUNDEX function in SQL Server, it returns a 4 character code that represents the sound of the word. The DIFFERENCE function compares two of these 4 character codes and returns a value between 0 and 4. The return value indicates how many of these characters in the 4 character code were the same. Here is an example:

Frog F620
Blog B420


You can see in the example above that if you compare the SOUNDEX code of the two words, only 2 characters are identical. The ‘F’ is different from the ‘B’. The ‘6’ is different from the ‘4’. But the ‘2’ and the ‘0’ are the same between the two codes. Thus… the difference is 2.

Put It All Together
Now that you understand how SOUNDEX and DIFFERENCE work, you can use them to create searches that are based on “sounding like a word”. Here is an example… remember, 0 means the words sound nothing alike and 4 means that the words sound exactly alike. In my example, I am going to match on a likeness of 3… which is sounding pretty similar.

Reference: http://msdn.microsoft.com/en-us/library/ms188753.aspx

SOUNDEX – Discovering The Sounds Of Words In SQL Server


Let’s face it, not everyone is a perfect speller all the time. Exact match queries are pretty easy to do in SQL Server. But, what if you wanted to correct their spelling or search based on a misspelled word? How is this possible? In SQL Server 2008+ you have the ability to do this with the SOUNDEX function.

The SOUNDEX function will take an input word and convert it in to a 4 character code representing the sound of the word. For example: goat -> G300.

So what does this 4 character code mean? The first character represents the first character in the string, the rest of the characters represent the rest of the characters in the string. Of course SQL Server does some magic to come up with this code. If you are interested more in the makeup of this code, see the MSDN link at the bottom of this post. Essentially, this 4 character code represents the sound of the word.

How To Use It
You can use this just like any other function in SQL Server.

An easy way to search based on word sound would be something like this:

The above query would find words like “dog”, “dig”, “dag”

Reference: http://msdn.microsoft.com/en-us/library/ms187384.aspx

How To Use The OUTPUT Clause


The OUTPUT clause is an amazing part of SQL Server that many people do not know about.  As corny as it is, I still remember the day that I found it.  I had been searching for the whole day trying to figure out how to get the identity values from a large amount of data that I just inserted in to the database.  There had to be a way… right?  After searching and searching, I gave up and just accepted that it is not possible… a few days later I found exactly what I was looking for… the OUTPUT clause.

The OUTPUT clause is a part of the query that will return data from before or after the operations is completed.  Let’s say that you inserted data in to a table and you wanted the ID column values (which are auto-numbers).  The OUTPUT clause gives you this information!  It can work on INSERT, DELETE, UPDATE, and MERGE statements.  I will take you through examples of each.

To access the data that is being altered in your sql statement, you need to use special column prefixes that SQL Server makes available to you.  The two special prefixes are “inserted” and “deleted”.  During an insert statement, the inserted prefix is available for you to use.  During a delete statement, the deleted prefix is available to you.  During the update and merge statements, both the deleted and inserted prefixes are available to you.  In these cases, the deleted represents the data before it was changed and the inserted represents the data after it was changed.

One thing to note is that the data being outputted must go in to a table or table variable.

OUTPUT Clause On An INSERT Statement

OUTPUT Clause On A DELETE Statement

OUTPUT Clause On An UPDATE Statement

OUTPUT Clause On A MERGE Statement

Reference: http://msdn.microsoft.com/en-us/library/ms177564.aspx


Parse XML With SQL Server


SQL Server has a lot of useful functionality built-in when it comes to XML.  Today I want to show you an easy way to parse XML data in to a usable format inside your query.

For this example, I will be parsing XML data into a temp table, then inserting that data in to a table.

The statement that parses the XML has 2 parts to it.

This part says to parse the @XMLToParse variable (previously filled with the XML file) using the method built in to SQL Server called .nodes.  It specifies the repeating node to be Animals/LandAnimals/Animal.  It assigns an aliases this XML parsed records as a table named xmlData.  The (A) is the column name of the rowset.  This will be referenced in the select part of the statement.


The select part of this statement references xmlData (which is the table aliases) and A (which is the column named for that table).  It calls the .value function to return the value from the table/column.  For the .value function, you pass in 2 elements.

The first element is the field.  In this case we are passing in just a period.  We do this because there is no node below the Animal node.  If there was, we would need to specify it here..

The second parameter is the datatype that you would like the value to be casted as.  Then I always aliases the field to something relevant.

Although parsing XML can be a little confusing in SQL Server, it is very powerful.  This is a great way to pass bulk data to a stored procedure from any type of client application.

Difference Between UNION and UNION All


The UNION operation and the UNION ALL operation perform almost the same operation.  They are both used to combine two result sets in to one result set.  The main difference between the two operations is that the UNION operation will return the unique records in the final result set.  The UNION ALL operation will return any duplicates in the final result set.

In the examples below, we will assume we have two tables with the following data.  Notice that the values Cat and Dog exist in both tables.

Table1 Table2
Alligator Cat
Beaver Dog
Cat Eagle
Dog Frog

The UNION Operation


Notice that the duplicate Cat and Dog values were removed from the result.

The UNION ALL Operation


Notice that the duplicate Cat and Dog values are left in the result.

Speed Considerations

One final difference between UNION and UNION ALL is the speed difference.  Because the UNION operation needs to return a distinct set of values, this will take extra processing time.  A general rule is that if you know that the combined data is going to be unique already… just use the plain UNION ALL operation.  Only use the UNION operation if there are duplicates that you would like to have remove.
NOTE:  The UNION and UNION ALL operations are actually one operation.  The ALL keyword is just an attribute.  Because they are so different, I find it is easier to think of them both as separate operations.

Reference:  http://msdn.microsoft.com/en-us/library/ms180026.aspx