On June 24th, Dell Software gave me the privilege of presenting Subqueries For Superheroes as part of their Experts Perspective webcast series. This is one of my favorite presentations to give, and it’s always a treat to be invited to present for Dell.
In case you can’t tell from the title, this presentation is about subqueries. How they can help you, how they can hurt you, and some ways that you might be using one without realizing it.
If you missed the live webcast, or just want to watch it again, you can find it here: https://software.dell.com/webcast-ondemand/subqueries-for-superheroes829601/
Slides And Demo Scripts
You can download the slide deck here: https://www.dropbox.com/s/bv2mx1jyyfr6fq8/Subqueries%20for%20Superheroes%202014-06-24.pptx
Or flip through it online right here:
The demo scripts used in the session are available to download here: https://www.dropbox.com/s/8883ebjgbvt8asb/Scripts.zip
And finally, you can get the AdventureWorks2012 database (on which these scripts are based) here: http://msftdbprodsamples.codeplex.com/downloads/get/165399
Q: Are subqueries able to utilize indexes created on their original tables?
Yes, absolutely. A subquery is just part of another query, and is compiled together with that larger query. The same factors that go into compiling any query (statistics, available indexes, etc) apply to the subquery as well.
Q: What about using a CTE?
A CTE (common table expression) is just another form of subquery and is, in fact, discussed in one of my demo scripts.
Q: What are the differences in IO between the different SQL statements?
If you download the demo scripts, you can answer this for yourself by using the SET STATISTICS IO ON command and running the desired SQL statements.
Q: Hi, I notice you use IN at lot. is it better to use EXISTS instead of IN?
Great question! This loosely translates to “is a correlated subquery better than a non-correlated subquery?”. Generally, there isn’t a significant difference between the two, but it largely depends on the overall query and the data against which it’s being executed. Examples of each are included in the demo scripts, download them and experiment!
Q: Hi, in a subquery like the one inside an EXIST, does it matter if you use * on the SELECT or for instance “1”. I am under the impression that IO might be higher with *.
Another great question, one with as many answers as there are SQL blogs. Typically there is no performance difference between “SELECT *”, “SELECT 1”, “SELECT ‘x’”, or any other variation of these when used inside of an EXISTS statement. Look for a followup blog post dedicated to this topic.
Q: Can you compare correlated vs non-correlated for taking out dups?
Generally, there isn’t a significant difference between the two, but it largely depends on the overall query and the data against which it’s being executed. Examples of each are included in the demo scripts, download them and experiment!
Q: Hi ,Will be there any difference in getting count of records in table using Count(1) Vs Count(*) ?
No, both will count all rows that match the WHERE clause and/or JOIN conditions of the query.
Q: Not A Question, but additional info on Not In vs Not Exists: “Not In” = 3-Valued Logic (Including UnKnown). “Exists” = 2-Valued Logic (True or False)
Great explanation, thank you!
Q: You mentioned user-defined functions, but did not have time to finish it, could you do a summary about that? Is it in most of cases a performance hit? Thanks
I’m actually working on a separate presentation dedicated entirely to the topic of user-defined functions (UDFs), as well as a couple of blog posts. In addition, you can download the demo scripts for this Subqueries presentation and try out the UDF examples that I didn’t get to show you.
Q: You mentioned using an ‘Old’ version of SQL Server vs a ‘newer’ version. With which version are these new statements available?
The window functions (RANK, ROW_NUMBER, etc) were introduced in SQL Server 2005. Each subsequent release has brought improvements and expanded features to the window functions, including LAG and LEAD.
Q: Where and how can I use APPLY in subquery?
As long as you’re using APPLY in a way that forms a valid SQL statement, you can use that SQL statement in any of the allowed ways that subqueries can be used.
Q: Hi, do you consider creating temporary table would as good as using subquery?
It depends. Sometimes using a temp table followed by a separate query will perform better than a single query utilizing a subquery. The “problem” with using a temp table is that you’re now following a “procedural” query path instead of a set-based query path. Set-based methods are typically always the better way to go, but there are exceptions. Experiment, try both against your dataset, and determine which best suits your needs.
Q: How does data manipulation inside a subquery, such as ISNULL(column) = ISNULL(othercolumn), affect the subquery?
This expression, or one like it, will behave within a subquery the same as in any “normal” query. Any time you wrap a function around a column in a search expression, you risk introducing a “non-SARGable expression” into your query. SARGability is too broad of a topic to address here, but there is plentiful information to be found online.
Q: should we use the same precautions for stored procedures? Since sp’s have better performance (from what ive been told)
A stored procedure is nothing more than a pre-compiled collection of queries. There used to be a performance advantage to using stored procedures, due to the fact that once compiled, the resulting query plan would be cached, eliminating the need to compile the procedure for subsequent executions. That advantage has diminished somewhat, starting with SQL Server 2005, when statement-level caching was introduced. The real advantage to using stored procedures isn’t so much about performance as it is about flexibility. A query that is contained within a stored procedure is more easily tuned and tweaked by a DBA than a query that is embedded within an application. At 3:00am when the pager is going off, this can mean the difference between being the hero, or being helpless.
Q: Which method is best to get a distinct on a VARCHAR column where other joins are involved.
I’m not entirely sure I understand the question, but typically, the relationships between your tables won’t be based on a VARCHAR column, they’ll be based on an INT or some other datatype better suited to serving as a key value. If you look at the examples in my demo scripts, you’ll note that all of my IN or EXISTS clauses are looking for the existence of a key value within another dataset comprised of other key values. This is where the de-duplication occurs. The related columns (such as VARCHARs) are retrieved once the unique keys are known.
Other questions (or comments)? Post them here in the comments section, and I’ll do my best to answer them!
Thanks For Attending
To those of you who attended the session live, please accept a huge THANK YOU! from me. This was by far the largest audience that I’ve presented to, and the Dell guys tell me it was the best webinar turnout they’ve had yet. I hope you enjoyed the session and, most importantly, learned something from it.