Don’t Get Slimed By Bad Parameter Sniffing

ghostbusters_slimer

Listen! Do you smell something? Are you suffering from random, unexplained slowdowns? Users complaining about queries that sometimes run fast, sometimes slow, for no obvious reason? Maybe it’s gremlins, or maybe your server is haunted, but more likely it’s just bad parameter sniffing. Before you call in the Ghostbusters, let’s discuss parameter sniffing, both good and bad, and how to combat the bad variety.

What is parameter sniffing?

Parameter sniffing is a fundamental part of the SQL Server query engine. In order to become a troubleshooting and performance tuning expert, you need to have a solid understanding of what it is and how it works. I’ll give you the simple explanation of what parameter sniffing is. There are better, more detailed explanations available out there, but for the purpose of this article, a simple explanation will suffice. Assume that you have a stored procedure:

CREATE PROCEDURE uspSearchForClientAddressByClientID
  @ClientID INT
AS
BEGIN
  SELECT ClientID, ClientAddress
  FROM ClientAddresses
  WHERE ClientID = @ClientID;
END;

Assume also that you have ten (10) clients with ID’s ranging from 1 to 10. Client number 1 has 900 locations, each represented by a row in the ClientAddresses table. The other clients, 2 thru 10, only have one location each.

The ClientAddresses table is a heap (a table with no clustered index), with one non-clustered index on the ClientID column.

Let’s execute the stored procedure:

EXECUTE uspSearchForClientAddressByClientID
  @ClientID = 9;

Gears whir, buzzers buzz, spinners spin, as SQL Server figures out the fastest way to run our stored procedure. Looking at the parameter value that we provided, the indexes on the table, and the statistics describing the data contained in the table and the indexes, SQL compiles a query plan and then executes the procedure according to that plan. It also remembers the plan for later use. This, in a nutshell, is parameter sniffing.

In this case, the statistics tell SQL Server that there is only one row with a client ID of 9, and an index is available to point us directly at that single row. The resulting query plan will do an index seek against the non-clustered index, finding our one row, then do an additional lookup against the heap to get the ClientAddress column.

client9plan

This plan will be cached, and all subsequent executions of the stored procedure will do the same index seek/heap lookup operation, regardless of the value specified for the @ClientID parameter.

What is bad parameter sniffing?

sniffing

Let’s execute the procedure again, this time looking for a client ID of 1, the client with 900 addresses. There’s already a plan in the cache for the stored procedure, so that plan is retrieved and used to run our query. This time however, there are 900 rows to locate. If you recall, the query plan, thinking that there’s only one row to retrieve, does an index seek and a heap lookup to find one row. In this case, it’s going to do 900 index seeks, and 900 heap lookups (notice the “thick lines” in comparison to the first query).

client1plan

See how widely different the estimated number of rows is from the actual number of rows? That’s bad parameter sniffing at work. The plan is not optimized for the parameter value that we provided this time, it’s optimized for another parameter value with a drastically different data profile.

Conversely, if the first procedure execution uses client ID 1 as its parameter, the query optimizer knows (based on statistics) that there are 900 rows to retrieve. A table scan will be more efficient that seeking each row one by one, so the compiled plan does just that. You can see in the plan that the estimated row count was 900, and 900 rows were actually returned.

client1badplan

Unfortunately, this means that every execution of the procedure, even for the single-row client ID values, would also do a table scan. Running the procedure for client ID 9, the estimated row count is still 900, but only one row is actually returned.

client9badplan

Definitely not preferred behavior, and yet another example of bad parameter sniffing.

Stored procedures vs standalone queries

One common technique for reducing the effects of parameter sniffing in a stored procedure is to declare local variables within the procedure, assign your parameter values to those variables, and use the variables in your query instead of the parameters. Something like this:

CREATE PROCEDURE uspSearchForClientAddressByClientID
  @ClientID INT
AS
BEGIN
  DECLARE @PassedClientID INT;

  SET @PassedClientID = @ClientID;

  SELECT ClientID, ClientAddress
  FROM ClientAddresses
  WHERE ClientID = @PassedClientID;
END;

When presented with a procedure constructed this way, SQL Server is unable to “sniff” the incoming parameter value. For lack of any better direction, it will rely solely on the table and index statistics to figure out an “average” value for the incoming parameter, the value that it thinks is most likely to be provided. It will then produce a nice, generic query plan. The result is a plan that works “OK” for most executions, but it’s not optimized to run as fast as possible for any specific parameter value. If I execute this stored procedure for client ID 9, for which there exists only one row, the query plan indicates a table scan, not an index seek.

localvars

Why? Because the statistics tell the query optimizer that of the 909 rows in the table, 90.9% of them are for client ID 1. Odds are that client ID 1 is going to be searched for most frequently, so that’s what the plan is optimized for.

Another option available for combating bad parameter sniffing in a stored procedure is the WITH RECOMPILE option:

CREATE PROCEDURE uspSearchForClientAddressByClientID
  @ClientID INT
  WITH RECOMPILE
AS
BEGIN
  SELECT ClientID, ClientAddress
  FROM ClientAddresses
  WHERE ClientID = @ClientID;
END;

The WITH RECOMPILE option, if you haven’t guessed already, forces the stored procedure to compile a new query plan each time it is executed. This completely eliminates the possibility of having a “bad” plan cached, but it can add significant overhead to the execution of large or complex stored procedures. Use this with caution.

Unfortunately, bad parameter sniffing isn’t limited to stored procedures. Standalone queries can suffer from it as well.

Parameterized queries

Many of today’s applications don’t use stored procedures at all – it’s part of the timeless conflict between DBA’s and developers. The bias today is toward parameterized queries. You might see something like this used to perform the same client address search that appears above:

SqlConnection conn = new SqlConnection(_connectionString);
conn.Open();
string s = "SELECT ClientID, ClientAddress " + 
  "FROM ClientAddresses WHERE ClientID = @ClientID";
SqlCommand cmd = new SqlCommand(s);
cmd.Parameters.Add("@ClientID", ClientID);
SqlDataReader reader = cmd.ExecuteReader();

A query submitted in this fashion is sniffed, compiled, and cached in exactly the same way that a stored procedure is. It is equally vulnerable to bad parameter sniffing, and it’s even harder for a DBA to fix under fire if it’s causing a problem problem. With the stored procedure, if necessary, I can directly modify the procedure to add WITH RECOMPILE or use the local variables trick to make it behave. With this parameterized code, I can’t do a thing. I need to get a developer involved, a code change is needed, and a new app must be deployed. I do have some options available, which we’ll discuss later in this article, but you can see that I lose a lot of maneuverability when I don’t have stored procedures to work with.

Some of you at this point might be thinking parameters are a bad idea – they seem to cause a lot of instability and potential problems with the query optimizer. Why use them at all? Why not just generate dynamic SQL statements with hard-coded lookup values? If I want to find addresses for client ID 9, why not just have my application issue this query (note that this is a VERY simple example and is provided only to help explain parameterization and plan reuse):

SELECT 
  ClientID, 
  ClientAddress
FROM ClientAddresses
WHERE ClientID = 9;

There are no parameters involved, so there should be no chance of bad parameter sniffing, right? I can issue the same query with a different client ID to find addresses for another client, again with no parameters. Problem solved! Ummm, no offense, but in the words of the great Peter Venkman, this reminds me of the time you tried to drill a hole through your head. Remember that?

What we’ve just described is known as an “ad hoc” query. There’s nothing inherently “bad” about them, it’s just that they sort of defeat the purpose of caching query plans for reuse. Each time somebody looks for client ID 9, that query will reuse the plan that was cached the last time somebody looked for client ID 9. Searching for client ID 8? That’s technically a different query (different hard-coded value, different query text), so it’s going to compile and cache its own plan. If you have 10 clients in your table, you’ll potentially have 10 different query plans cached for client lookups. What if you have a million clients? Yep, that’s a million query plans. Time for a story – gather ’round kiddies…

forcedparameterization

Once upon a time there was a company. This company employed a small army of developers, a handful of DBAs, and one Real SQL Guy. The Real SQL Guy had things humming along pretty nicely, in spite of the fact that the army of developers were constantly assaulting SQL Server with an OLTP workload that was “ad hoc heavy”. One day, the Real SQL Guy decided to upgrade the production server to SQL Server 2012, so he walked down to the computer store on the corner and purchased a monstrous machine – 64 cores, 2TB of RAM, a real beast. SQL Server would surely FLY on this hardware. He plugged in the new machine, restored backups of the OLTP databases onto the new SQL instance, pointed the apps at the new server, and watched as the machine immediately hit 100% CPU utilization and died. Frantically, knobs were turned, buttons were pushed, statistics were updated, all to no avail. Fearing that he’d lost his mojo, the Real SQL Guy called in the REAL SQL Guys, direct from Microsoft. After a week on-site, even they were stumped – the consensus was that the problem had to do with cache management, but they weren’t sure why. Near the end of that week, the Real SQL Guy timidly (as is his manner) suggested that we try a rarely used option – forced parameterization. The mojo was back…

Forced parameterization

So what is forced parameterization? Refer back to the example above where we’re issuing ad hoc queries with the client ID values hard-coded into the query text. In this ridiculously simple example, SQL Server is going to recognize that the client ID value is actually a parameter, even though you haven’t declared it as such, and will process the query as if it were properly declared with parameters. With more complex queries however, it may not do that – the optimizer will only spend a finite amount of time analyzing a query before it gives up and just runs it. Forced parameterization changes this behavior – it forces the optimizer to spend more time thinking about the query before running it, with the hope that it will be able to come up with a reusable query plan.

In the previous tale of woe, where I told you about an upgrade gone bad, the problem stemmed from a change in the way the plan cache is managed in SQL Server 2012. Without going into the technical details, let’s just say that SQL Server 2012 is more aggressive about cleaning out unneeded plans from the cache. With our ad hoc heavy workload, SQL Server was getting slammed with literally THOUSANDS of ad hoc queries per second. That’s thousands of requests to compile a plan, store it in the cache (where it will never be used again), while at the same time the cleanup process is trying to remove the plans that it knows are ad hoc and not needed anymore. SQL Server essentially became so busy trying to manage the cache that it couldn’t do any real work.

By enabling forced parameterization, those thousands of ad hoc queries suddenly became queries that SQL was able to create reusable plans for. The tug-of-war between the incoming new plans and the cleanup of the old plans stopped, and the server was able to stay alive and actually do work.

Am I suggesting that YOU should enable forced parameterization on your databases? No, I’m not. If you think you’re suffering because of ad hoc queries and excessive compilations and poor plan reuse, then yes, consider forced parameterization, but understand what you’re doing and why. Forced parameterization isn’t free – it increases your exposure to bad parameter sniffing, because it forces more queries to become parameterized. As the use of application frameworks like LINQ and nHibernate increases, the problem of ad hoc queries is going to affect more and more of our SQL Server systems. Developers will argue that nHibernate is great and would NEVER cause problems like this. If you believe that, you probably also believe in UFOs, astral projections, mental telepathy, ESP, clairvoyance, spirit photography, telekinetic movement, full trance mediums, the Loch Ness monster and the theory of Atlantis.

Query hints

queryhints

Alright – we’ve established that stored procedures are no good because they can suffer from bad parameter sniffing. We’ve established that parameterized queries are no good because they too can suffer from bad parameter sniffing. We’ve established that ad hoc queries are no good because they can cause problems with the plan cache. We’ve also established that forced parameterization can reduce the problem with ad hoc queries – by making them vulnerable to bad parameter sniffing. Seems like there’s no way to win, doesn’t it?

It’s not really that bad. There are several query hints available to us to help control how a query is compiled. I could dedicate an entire blog post to just query hints, so I’m not going to cover them all here. I do want to mention one specifically however – the OPTIMIZE FOR hint.

Remember the local variables trick that I talked about earlier, where you use local variables in a stored procedure to pass the incoming parameter values into your query? That trick causes the query to be compiled using a “generic” set of parameters that are based solely on the statistics on the underlying table(s). You can achieve the same effect by using the OPTIMIZE FOR UNKNOWN query hint.

You can also use OPTIMIZE FOR to force a query to compile a plan that is optimized for a specific value. Think about the ClientAddresses table that we were querying earlier. We know that client 1 has 900 address records and that queries looking for that client are going to prefer a table scan. We also know that clients 2 thru 9 each only have one address record and will prefer an index seek. Knowing our workload, we know that the vast majority of the ClientAddresses lookups are done for clients 2 thru 9 – lookups for client 1 are rare. We can use OPTIMIZE FOR (@ClientID = 2) to force the query to compile a plan that is optimized for one of the single-row clients. You can (and should) read a lot more online about the OPTIMIZE FOR query hint – better writers than me have better explanations to offer.

Query hints are powerful tuning tools that you can use, but they require you to have access to the queries. If the queries are being generated by an application, you may not have that access. You might try working with the developers to include query hints in some of the problem queries, but that could easily lead to human sacrifice, dogs and cats living together… mass hysteria! There’s a safer option that will keep the dogs and cats separate – plan guides.

Plan guides

If you have a query that usually runs well but seemingly at random will suddenly fall on its face, you’re likely dealing with a bad query plan resulting from bad parameter sniffing. Consider this graph from Ignite, showing this very behavior as witnessed on one of my production servers.

badplan

The green bars represent a query that would, two or three times a day, receive a bad plan from the optimizer. This query is run thousands of times an hour and when it would run with a bad plan, it was a significant load on the server. After trying various query hints, updating statistics more frequently on a couple of tables, some Gozer worshiping (surprisingly uneffective), I decided to force a good plan through the use of a plan guide. Can you tell from the graph which day I did this on?

A plan guide can take a couple of different forms. In one form, a plan guide is simply a way for you to tell the optimizer “Hey, every time you see this query text, I want you to treat it as if this specific query hint were specified in the query text”. For example, you can force the use of OPTIMIZE FOR UNKNOWN on a query that you don’t have access to in order to add the query hint directly to the query text.

Another form of plan guide is that where you have a known good query plan in the plan cache, and you want to make sure that plan is always used to execute a particular query. By using the plan handle for that query plan, you can tell the optimizer “Hey, every time you see this query text, don’t think about how to run it, just always use this query plan”. That’s what I did to fix this particular problem. Let’s have a look at how I did it.

First, I had to find the plan handle for my known good plan. During a period of time that I knew the query was running correctly, I ran this query:

-- Use this SELECT statement to determine the plan_handle for the desired query plan
SELECT * 
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
CROSS APPLY sys.dm_exec_text_query_plan(qs.plan_handle, qs.statement_start_offset, qs.statement_end_offset) AS qp
WHERE qt.text LIKE N'%OrderItemEvents . Event_ID = OrderItemEventRequestor . OrderItemEventID%';

That returns various pieces of information (including the plan handle) about the query containing the text snippet that I specified. Using that plan handle, I then ran this:

-- Create a plan guide for the query using the plan_handle for the desired query plan
DECLARE @plan_handle	VARBINARY(64);
DECLARE @offset			INT;

SELECT 
    @plan_handle	= qs.plan_handle, 
    @offset			= qs.statement_start_offset
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS qt
CROSS APPLY sys.dm_exec_text_query_plan(qs.plan_handle, qs.statement_start_offset, qs.statement_end_offset) AS qp
WHERE qs.plan_handle = 0x060006001E371516A00DAFAD2101000001000000000000000000000000000000000000000000000000000000

EXECUTE sp_create_plan_guide_from_handle 
    @name =  N'OrderItemEvents_PlanGuide',
    @plan_handle = @plan_handle,
    @statement_start_offset = @offset;

Done. From now on, until I remove that plan guide, this particular query will ALWAYS use the known good plan that is currently in the plan cache. As you can see in the Ignite graph, it didn’t misbehave again. Back off man, I’m a data scientist.

Parameter sniffing is a necessary part of the query engine and most of the time, it works and does the job it’s supposed to do. When it doesn’t work, it can cause some serious problems. As the SQL Server product has matured and the query engine has improved, it’s become smarter and can do some amazing things with some complicated queries. We have, fortunately, also gained new weapons to use against bad parameter sniffing when it occurs. It pays to keep up with the latest versions of the product.

But what if you’re stuck on an old version?

When all else fails, cross the streams

It’s not pretty, but sometimes you gotta do what you gotta do. You cross the streams, nuke the site from orbit, throw the Hail Mary, you punt…

CrossTheStreams

Because we’re lucky that way, my team is stuck supporting a poorly configured, grossly overloaded SQL Server 2005 STANDARD EDITION instance that we’re not allowed to upgrade. This instance supports a complex, busy application that uses Hibernate-generated queries to communicate with the database. It’s the classic “built and supported by developers with no understanding of SQL Server” system. It’s littered with views that reference other views that reference other views. The queries tend to be the type that fetch every possible column from every table across a ten-table JOIN. I think you get the point.

This system is, as you might expect, incredibly unstable. Multiple times a day, often overnight or over the weekend, one of these big ugly queries will receive a bad plan, and the system slows to a crawl. The developers are reluctant to make code changes, and “it must be a SQL problem” is commonly heard. We’re still expected to keep the database server running, so we opted for the nuclear option. We crossed the streams – a scheduled job issues the DBCC FREEPROCCACHE command every hour. Yep, we dump the entire plan cache, good plans and bad, every hour. This guarantees that all of the bad plans get removed, with the unfortunate side-effect of removing the good plans too. Hey, I said it wasn’t pretty, but we had to show this prehistoric server how we do things downtown.

Who Ya Gonna Call?

Well, actually, they’re gonna call you. When “the database is slow”, when that pesky query is running slow again, or when “something’s wrong”, they’re gonna call you. You’re gonna need to know what to do and maybe, just maybe, something you’ve read here will help you out of a sticky situation. Afterwards, when they ask you if you’re a SQL God, you say YES! I always do…

gb-marshmallow