Ranking Data In SQL Server

Facebooktwitterredditpinterestlinkedinmail

When it comes to ranking data in SQL Server, as of SQL Server 2005 they have built this in to the database. In 2005 the SQL Server team introduced the RANK function. The RANK function allows you to assign a number to each record in a result set based something in the data.

Let’s say that you wanted to apply a rank based on the weight of animals in your zoo (from heaviest to lightest). Elephant would get a number of 1 (because it’s the heaviest) and a tiny turtle would get the number 100 (because it’s the lightest). The number returned would start at 1 and grow for every record. This is done using this awesome RANK function.

The RANK function also has a partition feature that allows you to group the data. This allows the ranking number to reset for every group. Using our example above, we could extend it to show us the heaviest to lightest animals based on the animal type. So there would be a #1 ranking for the mammal, amphibian, bird, and reptile animal types.

 
Duplicate Ranks
Because the ranking number is incremented in the order specified by the ORDER BY clause, what happens if there are duplicates? In our example above, what if there are 2 animals with the same weight? If this occurs, the same ranking number will be applied to each of the duplicate records. So if there were 2 animals with the same weight, they would get the same exact ranking number.

 
How To Use RANK
The RANK function has 2 parameters. You must always supply the ORDER BY. The PARTITION BY is optional.

    PARTITION BY – This is what you would like SQL Server to group your rankings by. In the above example, if we want the ranking number to reset for every different animal types, then we would specify that here.
    ORDER BY – The ORDER BY is the order that you want your ranking number to be generated in your result set. This is a required parameter. In the example above, we want it from heaviest to lightest. We would specify the column name here that holds the animal weight.

Here is some sample code and the output to show each of the above scenarios.

SELECT	AnimalID,
	AnimalName,
	AnimalWeight,
	RANK() OVER (ORDER BY AnimalWeight DESC) AS AnimalWeightRank
FROM	Animal

In the above example, you can see that we are ranking each record by the weight of the animal from heaviest to lightest. This is done by passing the ORDER BY parameter to the function.

[table width=”500″ colwidth=”20|75|50|20″ colalign=”left|left|left|left”]
AnimalID,AnimalName,AnimalWeight,AnimalWeightRank
168,Camel,1200,1
162,Carabao,500,2
163,Cattle,400,3
166,Alpaca,350,4
182,Deer,300,5
187,Turkey,30,6
186,Goose,15,7
185,Duck,10,8
171,Cat,10,8
188,Quail,10,8
189,Pigeon,5,11
184,Chicken,5,11
[/table]

 
 

SELECT	AnimalID,
	AnimalName,
	AnimalType,
	AnimalWeight,
	RANK() OVER (PARTITION BY AnimalType ORDER BY AnimalWeight DESC) AS AnimalWeightRank
FROM	Animal

In the above example, you can see that we are ranking each record by the weight of the animal from heaviest to lightest. We take this a bit further by grouping these by the AnimalType. This is done by passing the PARTITION BY parameter to the function call.

[table width=”500″ colwidth=”20|75|75|50|20″ colalign=”left|left|left|left|left”]
AnimalID,AnimalName,AnimalType,AnimalWeight,AnimalWeightRank
187,Turkey,Bird[attr style=”background-color:#F2F5A9″],30,1[attr style=”background-color:#F2F5A9″]
186,Goose,Bird,15,2
185,Duck,Bird,10,3
188,Quail,Bird,10,3
189,Pigeon,Bird,5,5
184,Chicken,Bird,5,5
168,Camel,Mammal[attr style=”background-color:#F2F5A9″],1200,1[attr style=”background-color:#F2F5A9″]
162,Carabao,Mammal,500,2
163,Cattle,Mammal,400,3
166,Alpaca,Mammal,350,4
182,Deer,Mammal,300,5
171,Cat,Mammal,10,6
[/table]

 
 
Reference: http://msdn.microsoft.com/en-us/library/ms176102.aspx

ROW_NUMBER – How To Use It

Facebooktwitterredditpinterestlinkedinmail

ROW_NUMBER is a function built-in to SQL Server that will return a row number for each record in your result set. You can further change resulting row number to reset the row number based on some value in the result set. I will show you examples of both below.

ROW_NUMBER Basics
To show the row number in SQL Server, you need to use the ROW_NUMBER function. This function is broken down in to two parts.

  • PARTITION BY – If you supply this parameter, then the row number will reset based on the value changing in the columns supplied. This is kinda like using a GROUP BY.
  • ORDER BY – This is the specified order of the row number that you would like. If you wanted the order of the row number to increment by an employee name (alphabetically), you do that here.

 
With this function I think examples will help paint the full picture.

In the following query, the results will show a row number with each record. The number will start at 1 and increment for every record in order of AnimalName. You can see that that the order of the row number increment is specified as a parameter in to the ROW_NUMBER() function.

SELECT	AnimalID,
	AnimalName,
	AnimalType,
	ROW_NUMBER() OVER(ORDER BY AnimalName) AS RowNumber
FROM	Animal

[table width=”500″ colwidth=”20|100|50″ colalign=”left|left|center|left|right”]
AnimalID,AnimalName,AnimalType,RowNumber
166,Alpaca,Mammal,1
168,Camel,Mammal,2
162,Carabao,Mammal,3
171,Cat,Mammal,4
163,Cattle,Mammal,5
184,Chicken,Bird,6
182,Deer,Mammal,7
185,Duck,Bird,8
186,Goose,Bird,9
189,Pigeon,Bird,10
188,Quail,Bird,11
187,Turkey,Bird,12
[/table]

 
In the above example, the row number never resets. It started at 1 and kept going through all the records. But what if you wanted to reset the row number back to 1 based on a value changing in your result set. In the following example, every time that the AnimalType changes we will reset the row number back to 1. This way, each AnimalType would have it’s own set of row numbers. We accomplish this by listing the columns that we want to group the row numbers by in the PARTITION BY parameter.

SELECT	AnimalID,
	AnimalName,
	AnimalType,
	ROW_NUMBER() OVER(PARTITION BY AnimalType ORDER BY AnimalName) AS RowNumber
FROM	Animal A
ORDER	BY AnimalType, AnimalName

[table width=”500″ colwidth=”20|100|50″ colalign=”left|left|center|left|right”]
AnimalID,AnimalName,AnimalType,RowNumber
184,Chicken,Bird[attr style=”background-color:#F2F5A9″],1[attr style=”background-color:#F2F5A9″]
185,Duck,Bird,2
186,Goose,Bird,3
189,Pigeon,Bird,4
188,Quail,Bird,5
187,Turkey,Bird,6
166,Alpaca,Mammal[attr style=”background-color:#F2F5A9″],1[attr style=”background-color:#F2F5A9″]
168,Camel,Mammal,2
162,Carabao,Mammal,3
171,Cat,Mammal,4
163,Cattle,Mammal,5
182,Deer,Mammal,6
[/table]

 
 
Good Things To Know
You can specify multiple columns in the PARTITION BY and ORDER BY parameters by separating them with a comma.

You can specify ASC or DESC in the ORDER BY parameter if you like as well.

 
Reference: http://msdn.microsoft.com/en-us/library/ms186734.aspx

MAXDOP – SQL Server’s Way To Avoid Parallelism

Facebooktwitterredditpinterestlinkedinmail

SQL Server comes with built-in support for using parallelism for query execution. That means that SQL Server will split up the execution of a query among different CPUs if it thinks that it can get the query results faster that way. The default setting in SQL Server will use the number of CPU cores on the system as the maximum number of parallel processors to use. Okay… this is great, right?

 
The Grass Isn’t Always Greener
Although the thought of running 1 SQL query but using multiple processors sounds great, sometimes it can actually sometimes cause slower queries. A couple issues I have ran in to in our production environment are:

  • Sometimes the parallelism can take away CPU from other processes that are trying to run at the same time.
  • Sometimes the time that it takes to reassemble the results is longer than the not using parallelism. What this means is that when a query is split among different processors, it will need to merge them back together before it sends it to you. Sometimes this can be a longer task then just using 1 processor the whole time.

 
MAXDOP
MAXDOP is a query hint that can be added to the end of your SQL query to control the amount of parallelism that happens. When you use MAXDOP, you will specify the maximum number of parallelism threads that SQL Server uses for that query. In fact, the word MAXDOP stand for “Maximum Degree Of Parallelism”.

Here is an example of how to use the MAXDOP query hint to limit the number of parallel threads to 1 (essentially disabling the parallelism):

SELECT	*
FROM	Animal A
INNER	JOIN AnimalType AT ON AT.AnimalTypeID = A.AnimalTypeID
ORDER	BY A.AnimalName
OPTION	(MAXDOP 1)

To show how this affects the query, let me show you the execution plan when I supply the MAXDOP 1.
MAXDOP No Parallelism

Here is a screenshot of the execution plan of that exact same query without specifying the MAXDOP. You can see that it now uses parallelism.

MAXDOP Parallelism

 
Will Removing MAXDOP Help My Queries Run Faster
Is MAXDOP a silver bullet for making queries faster? No. In most cases you will just want SQL Server to manage the parallelism. This is usually the fastest/best way for your query to execute. But occasionally if you are having a slow query, you can try changing the MAXDOP setting. You will need to test your query before and after to see if there is significant improvement.

 
How Do You Know If Your Query Is Using Parallelism?
Here are a couple ways that I use to find out if parallelism is happening to a query.

  1. Check out the actual execution plan. After you run the query and the execution plan shows up, you will see a “Parallelism” step. You can see the screenshot above for what that looks like.
  2. Monitor the Activity Monitor in SQL Server Management Studio. This is a great way to detect if your production applications are causing parallelism. It’s really easy to see in Activity Monitor because it looks like the same line repeated many times.
    MAXDOP Activity Monitor

 
Reference: http://msdn.microsoft.com/en-us/library/ms181714.aspx

SQL Server 2014 Feature – Non-Clustered Indexes For Table Variables

Facebooktwitterredditpinterestlinkedinmail

This is truly an amazing feature to add to SQL Server 2014. I use table variables all the time and this was the only thing I didn’t like about them. Up to this point, SQL Server did not support having non-clustered indexes on table variables (the one with the name that starts with the @). If you wanted to do this, you had to create/use a temp table (the one with the name that starts with #).

With SQL Server 2014, they changed this and now allow the table variables to have non-clustered indexes as well. To do this, we simply add a little bit of extra code after the column declaration.

DECLARE	@AnimalTableVar	TABLE
( AnimalID		INT,
  AnimalName	VARCHAR(50)	INDEX IX_AnimalTempTable_AnimalName
)

In the sample above, we create a table variable called @AnimalTableVar. Then when we declare the column AnimalName, we add an index called IX_AnimalTempTable_AnimalName.

Here is how selecting from this table shows up in the execution plan!
Non-Clustered Indexes For Table Variables

Reference: http://msdn.microsoft.com/en-us/library/ms188927(v=sql.120).aspx

FORCE ORDER Query Hint

Facebooktwitterredditpinterestlinkedinmail

When querying out data, SQL Server has a ton of different query hints. One less commonly known query hint is FORCE ORDER. So here we will talk about what this query hint does and how we can use it efficiently.

FORCE ORDER
When you put this query hint on to your query, it tells SQL Server that when it executes the statement to not change the order of the joins in the query. It will join the tables in the exact order that is specified in the query. Normally the SQL Server optimizer will rearrange your joins to be in the order that it thinks will be optimal for your query to execute.

To use the FORCE ORDER query hint, you will want to add this at the end of your query.

 
Example Without FORCE ORDER

SELECT	*
FROM	Habitat H 
INNER	JOIN AnimalHabitat AH ON H.HabitatID = AH.HabitatID
INNER	JOIN Animal A ON AH.AnimalID = A.AnimalID

This is the actual execution plan without the FORCE ORDER hint. You can see that SQL Server starts at the table Animal and AnimalHabitat tables, then goes to the Habitat table. If you look at our query above, we clearly list the tables in the order of Habitat, AnimalHabitat, then Animal.

Execution Plan Without Force Order

 
Example With FORCE ORDER

SELECT	*
FROM	Habitat H 
INNER	JOIN AnimalHabitat AH ON H.HabitatID = AH.HabitatID
INNER	JOIN Animal A ON AH.AnimalID = A.AnimalID
OPTION	(FORCE ORDER)

In the screenshot below, it shows the actual execution plan with the FORCE ORDER hint. In the new query, you can see that SQL Server starts with the Habitat and AnimalHabitat tables and then moves to the Animal table. It does this because when we turn on the FORCE ORDER option, it uses the order of the tables specified in the query syntax.

Execution Plan With Force Order

 
What Is This Used For
The FORCE ORDER query hint is only used when you want to override the way that SQL Server wants to execute this query. Normally you will just let SQL Server figure out how to get the data from the database. It does such a good job at it, that you do not usually need to override this functionality.

Occasionally SQL Server will not figure out the most optimal way to pull this data out. In this case you may want to try this query hint.