The real reason SELECT * queries are bad: index coverage

Thursday, July 19, 2007

SQL SQL Server

Are SELECT * queries bad? Sure, everyone know that. But, why?

It's returning too much data, right?

That's the common answer, but I don't think it's the right one. If you're working with a reasonably normalized database, the actual network traffic difference is pretty small.

Let's take a look at a sample. The following two queries select 326 rows from the TransactionHistoryArchive table in the AdventureWorks database (which has a total of 89K rows). The first uses a SELECT * query, the second selects a specific column:

SELECT * FROM Production.TransactionHistoryArchive 
WHERE ReferenceOrderID < 100

SELECT ReferenceOrderLineID FROM Production.TransactionHistoryArchive 
WHERE ReferenceOrderID < 100

In this case, the difference in network traffic is only 15K, roughly a 10% difference (180K vs. 165K). It's worth fixing, but not a huge difference.

SELECT * makes the Table / Index Scan Monster come

Often, the bigger problem with SELECT * is the effect it will have on the execution plan. While SQL Server primarily uses indexes to look up your data, if the index contains all the columns you’re requesting it doesn’t even need to look in the table. That concept is known as index coverage. In the above example, the first query results in a Clustered Index Scan, whereas the second query uses a much more efficient Index Seek. In this case, the Index seek is one hundred times more efficient than the Clustered Index Scan.

Unless you've indexed every single column in a table (which is almost never a good idea), a SELECT * query can't take advantage of index coverage, and you're likely to get (extremely inefficient) scan operations.

If you just query the rows you'll actually be using, it's more likely they'll be covered by indexes. And I think that's the biggest performance advantage of ignoring SELECT * queries.

The Stability Aspect

SELECT * queries are also bad from an application maintenance point of view as well, since it introduces another outside variable to your code. If a column is added to a table, the results returned to your application will change in structure. Well programmed applications should be referring to columns by name and shouldn't be affected, but well programmed applications should also minimize the ways in which they are vulnerable to external changes.

Shameless Plug: I go into this (and a lot other important performance tips) in more detail in a soon-to-be-released book for SitePoint.

14 Comments

"If a row is added to a table..."

Should be: "If a column is added to a table..."

Goran - Thursday, July 19, 2007 1:55:29 AM

Great post Jon...

The point about stability and ordinal position is a very real one, and I strongly agree with the practice of referencing columns by name*. You'll find this out the hard way if you use most database sync applications to migrate changes from one environment to another.

Even if you don't rely on ordinal position, it's a good idea to have your change scripts drop and recreate a table when columns are added to it, if only for schema consistency reasons.

Can't wait for the book!

* ...he says hypocritically, knowing full well that SubSonic relies on the assumption that the name field is in the second ordinal position

Eric Kemp - Thursday, July 19, 2007 4:14:45 AM

Nice to read the reason why this is pushed so heavily by DBAs.

I will disagree with your statement that SELECT * makes maintenance of an application more difficult. I use a hybrid OR/M (homegrown) that maps many related tables together and returns all columns so that the developer can decide what to do with the data. If someone is to add a column now, all I have to do is update the code to fill the extra column.

If I were selecting specific columns, i would still have to update the code, but additional any views and procs that return data to the app.

Jerry - Thursday, July 19, 2007 8:08:25 AM

Another reason to avoid SELECT * is with views that access other views. You'll have to recompile each view in the correct dependency order or you'll get errors. You also run the risk of introducing problems where the new column matches the name of another column in the view and you can get ambiguous results.

Chris Miller - Thursday, July 19, 2007 8:38:26 AM

good one. A good reason to not use Select *

Vikram - Thursday, July 19, 2007 10:05:06 AM

Good one. Definitely will keep this in mind.

Question, does it also apply with

select foo
from bar
where exists
( select * from tarfu where tarfu.idee = bar.idee)

or does the query optimizer recognize that it doesn't require any complex index scanning?

mcgurk - Thursday, July 19, 2007 11:29:56 AM

@mcgurk:
SQL Server's Query Optimizer knows the difference in EXISTS -case as it tests only a boolean value, anyway.

So it doesn't matter within EXISTS whether you use SELECT * or something else.

Jemm - Thursday, July 19, 2007 12:21:02 PM

I'm curious what other variations you tried. Did you reverse the queries in your batch? Did you add more than one column to your other select? What happens when you explicitly name all of the columns in the table? The reason I ask is that databases are actually really tough to benchmark and profile. It can be nearly impossible when multiple clients are connected as well. Your first query undoubtedly placed all of the pages in RAM that the second query needed so the fact that it ran so much faster doesn't mean much.

Sam Corder - Thursday, July 19, 2007 4:34:06 PM

@Sam Corder
Great points.

1) Yes, Index Coverage will only prevent a scan if all the columns are covered. Adding all columns to a select query wouldn't help. However, if SELECT * you're pretty much guaranteed you won't take advantage of index coverage; if you SELECT only the columns you need you (or your DBA) has the opportunity to add selective indexes to frequently used queries.

You also have the opportunity of adding frequently used columns to indecies via SQL Server 2005's "index with included columns" feature: http://msdn.microsoft.com/en-us/library/ms190806.aspx

2) I did test the other order, and I cleared cache between all runs. Try this query, you should see that the SELECT * queries show 50% load and the SELECT column query takes 0% load, regardless of order:

USE AdventureWorks
GO

DBCC FREESESSIONCACHE
DBCC FREEPROCCACHE
DBCC FREESYSTEMCACHE('ALL')

CHECKPOINT
DBCC DROPCLEANBUFFERS
GO

SELECT * FROM Production.TransactionHistoryArchive
WHERE ReferenceOrderID < 100
GO

DBCC FREESESSIONCACHE
DBCC FREEPROCCACHE
DBCC FREESYSTEMCACHE('ALL')

CHECKPOINT
DBCC DROPCLEANBUFFERS
GO

SELECT ReferenceOrderLineID FROM Production.TransactionHistoryArchive
WHERE ReferenceOrderID < 100

GO
DBCC FREESESSIONCACHE
DBCC FREEPROCCACHE
DBCC FREESYSTEMCACHE('ALL')

CHECKPOINT
DBCC DROPCLEANBUFFERS
GO

SELECT * FROM Production.TransactionHistoryArchive
WHERE ReferenceOrderID < 100
GO
DBCC FREESESSIONCACHE
DBCC FREEPROCCACHE
DBCC FREESYSTEMCACHE('ALL')

CHECKPOINT
DBCC DROPCLEANBUFFERS
GO

SELECT ReferenceOrderLineID FROM Production.TransactionHistoryArchive
WHERE ReferenceOrderID < 100

GO

Jon Galloway - Thursday, July 19, 2007 5:17:48 PM

Great article Jon,

Does this still apply even when you want to get all columns from the row? Should you still write out each column name in the select query instead of *?

thanks

Justin - Friday, July 20, 2007 5:28:09 AM

Indexes, and it is also very simple - you quiery ONLY what you need.

There's a reason, if you check top 5 internet dating sites, most of them are running on tens of servers (and one on at least a 100), and one - on 3. If you don't get big, it is easy to cover up sloppiness by throwing hardware at it, but that will also cost ya.

Reason #2 is also obvious - if you need just one value, get a value, not an entire row:

Select Value1 from Table1 Where Key=123

is way better than

Select * from Table1 Where Key=123

, and use appropriate tools to get it to an application (NOT a Recordset). Writing "Select * " justifies sloppy code in this case as well.

SmiLie - Friday, July 20, 2007 2:15:05 PM

yes this statement return the lot of rows if it is not used properly. and it will slowdown your application.

stokes - Friday, October 22, 2010 6:00:36 AM

abides Somebody essentially help to make surloesiy posts I would state. This is the very first time I frequented your web page and thus far? I surprised with the research you made to create this particular publish incredible. Great job!

Joe - Monday, January 21, 2013 5:02:11 PM

Great information This is diitinctsve. Anyone stare upon every single content articles and we are baffled. We're interested in this type of whatever you may desire. An appreciate going up on your collect, and value for money doing while in this. Please keep cutting. Th

Gerfredo - Thursday, January 24, 2013 6:09:16 AM

Comments have been disabled for this content.