Archives / 2007 / July
  • The real reason SELECT * queries are bad: index coverage

    Are SELECT * queries bad? Sure, everyone know that. But, why?

    It's returning too much data, right?

    That's the common answer, but I don't think it's the right one. If you're working with a reasonably normalized database, the actual network traffic difference is pretty small.

    Let's take a look at a sample. The following two queries select 326 rows from the TransactionHistoryArchive table in the AdventureWorks database (which has a total of 89K rows). The first uses a SELECT * query, the second selects a specific column:


    SELECT * FROM Production.TransactionHistoryArchive WHERE ReferenceOrderID < 100 SELECT ReferenceOrderLineID FROM Production.TransactionHistoryArchive WHERE ReferenceOrderID < 100


    In this case, the difference in network traffic is only 15K, roughly a 10% difference (180K vs. 165K). It's worth fixing, but not a huge difference.

    SELECT * makes the Table / Index Scan Monster come

    Often, the bigger problem with SELECT * is the effect it will have on the execution plan. While SQL Server primarily uses indexes to look up your data, if the index contains all the columns you’re requesting it doesn’t even need to look in the table. That concept is known as index coverage. In the above example, the first query results in a Clustered Index Scan, whereas the second query uses a much more efficient Index Seek. In this case, the Index seek is one hundred times more efficient than the Clustered Index Scan.


    Unless you've indexed every single column in a table (which is almost never a good idea), a SELECT * query can't take advantage of index coverage, and you're likely to get (extremely inefficient) scan operations.

    If you just query the rows you'll actually be using, it's more likely they'll be covered by indexes. And I think that's the biggest performance advantage of ignoring SELECT * queries.

    The Stability Aspect

    SELECT * queries are also bad from an application maintenance point of view as well, since it introduces another outside variable to your code. If a column is added to a table, the results returned to your application will change in structure. Well programmed applications should be referring to columns by name and shouldn't be affected, but well programmed applications should also minimize the ways in which they are vulnerable to external changes.

    Shameless Plug: I go into this (and a lot other important performance tips) in more detail in a soon-to-be-released book for SitePoint.


  • [T-SQL] Getting distinct, current entries from tables with timestamp / datetime fields

    It's relatively easy to store data with a time dimension, but querying it is another matter. If you select from a temporal tables (one which includes historical information indicated by a timestamp or datetime column) based on your ID, you'll get a lot of duplicate records; only one or a few of those records will be applicable to a given time or timespan.

    For example, the following query (from the AdventureWorks sample database) returns 746 records:


    SELECT ActualCost FROM Production.TransactionHistory WHERE ProductID = 784


    A table with History in the name is expected to work that way, but I've run into plenty of tables which allow for duplicates of every column but the primary key, differentiated by a datetime column. I'm starting to use these more now, partly due to project requirements, and partly because SubSonic has built-in support for CreatedOn and ModifiedOn columns.

    In the past, I've used stacked views or nested subqueries to handle this madness. It's a pain in the neck, and often leads to frustrations with the GROUP BY clause since queries with an aggregate term can't include a column in a resultset that's not in the GROUP BY clause (but adding those columns to the GROUP BY clause isn't the right solution, as it alters the rows you're returning).

    The general idea is to write a query which gets the latest update date for a unique ID combination, then join against it. It works, but it's a pain in the neck, and it's error prone. Fortunately, SQL Server 2005's RANK / PARTITION features make this a lot easier.

    Let's talk specifics - a query against the AdventureWorks Production.TransactionHistory table which returns the latest record Product Name and Number by Transaction Date. There are 113K rows in that table, but if we only want the latest entry for each Product we're down to 490 rows.

    Unlike the traditional GROUP BY based clause, the RANK / POSITION based queries are a bit more intuitive once you've got the hang of them. The important thing is to use PARTITION in the same way you'd think of a GROUP BY - the column(s) you want unique should show up in the PARTITION clause.

    One limitation on RANK queries is that you can't use the RANK value directly in a where clause. That's easy to work around by placing the RANK value in a subquery and the WHERE clause filtering on the RANK in the outer query.

    UPDATE: Based on comments, my original example was a little too simple. I've added a few joins and return columns; the point is that we can continue to add other columns to the result set without having to worry about how the grouping is handled. Yes, you can do this with subqueries, but in many cases those will continue to grow more complicated as you add tables and columns; this syntax doesn't.

    SELECT Product.Name, Product.ProductNumber, TransactionDate, TransactionID, ProductCategory.Name, ProductSubcategory.Name, ProductSubcategory.ModifiedDate FROM ( SELECT DISTINCT Production.TransactionHistory.ProductID, Production.TransactionHistory.TransactionDate, Production.TransactionHistory.TransactionID, RANK() OVER ( PARTITION BY Production.TransactionHistory.ProductID ORDER BY Production.TransactionHistory.TransactionDate DESC) AS DateOrder FROM Production.TransactionHistory ) CurrentTransactionHistory INNER JOIN Production.Product ON Production.Product.ProductID = CurrentTransactionHistory.ProductID INNER JOIN Production.ProductSubcategory ON Production.Product.ProductSubcategoryID = Production.ProductSubcategory.ProductSubcategoryID INNER JOIN Production.ProductCategory ON Production.ProductSubcategory.ProductCategoryID = Production.ProductCategory.ProductCategoryID WHERE DateOrder = 1 ORDER BY ProductSubcategory.ModifiedDate


  • Presentation tricks - Command Windows

    Time management is a big part of a technical presentation. You want your demos to go fast enough to keep your audience's attention, but you don't want to gloss over details and lose them. Here are a few tricks I've used when showing something command line operations at a DOS prompt.


  • SubSonic - Code Camp slides posted

    Slides and sample code from my talk on SubSonic from the SoCal Code Camp a week ago are available here. The slides were pretty light since my talk was geared towards building some very simple sample pages on the fly.


  • Silverlight 1.1 (Alpha) cross domain webservice access makes mashups tricky

    Any web mashups, by definition, require cross-domain calls. Those cross-domain calls may happen on the client (in the browser) or on the server. Regardless of the client technology (AJAX, Flash, Silverlight, etc.), cross domain calls on the client are always more complex that server-side cross-domain calls, and for good reason. It's tricky in AJAX, and it's downright difficult in Silverlight. You'll know that Silverlight development has become more widespread when you hear a lot more complaints about this problem.


  • Some keyboard input tricks for Silverlight 1.1 (Alpha)

    Here are a few tricks I learned while doing my "hello world" maze game in Silverlight 1.1.

    Silverlight doesn't fire the KeyDown event for cursor (arrow) keys

    However, just about all keys will fire a KeyUp event. In my case, I was able to just handle the KeyUp event, but remember that holding a key down will trigger multiple KeyDown events and only one KeyUp event.

    Notice how a lot of the games you're seeing for Silverlight use W/A/S/D instead of arrow keys? I suspect that's the reason.

    The difference between Key and PlatformKey

    The KeyDown and KeyUp event parameters include KeyboardEventArgs, which has properties for both PlatformKeyCode and Key. PlatformKeyCode is a platform specific key code, so by definition it can (and will) change between Windows and Mac (and potentially between versions of these operating systems). It's preferable to use Key unless you need to be you need to detect keys which aren't available on all platforms, such as the Windows start key.

    Key is a platform agnostic key code and is better to use when possible. Of course, there isn't a Key value for the Windows key, so if you're using keys which will only be defined on a specific operating system you're stuck with the PlatformKeyCode value. If you do need access to PlatformKey values, see Dave Relyea's comment on this thread for info on how he determined platform key values.

    The missing Key Enumeration

    KeyboardEventArgs.Key value is supposed to be an Enum per the MSDN documentation, but at least in this release it's just an int. I grabbed the keys and values from the documentation and created a simple enum which should hopefully be forward compatible when that enum is (hopefully) added to agclr.dll.

    enum Keys { KEYNONE = 0, BACKSPACE = 1, TAB = 2, ENTER = 3, SHIFT = 4, CTRL = 5, ALT = 6, CAPSLOCK = 7, ESCAPE = 8, SPACE = 9, PAGEUP = 10, PAGEDOWN = 11, END = 12, HOME = 13, LEFT = 14, UP = 15, RIGHT = 16, DOWN = 17, INSERT = 18, DELETE = 19, DIGIT0 = 20, DIGIT1 = 21, DIGIT2 = 22, DIGIT3 = 23, DIGIT4 = 24, DIGIT5 = 25, DIGIT6 = 26, DIGIT7 = 27, DIGIT8 = 28, DIGIT9 = 29, A = 30, B = 31, C = 32, D = 33, E = 34, F = 35, G = 36, H = 37, I = 38, J = 39, K = 40, L = 41, M = 42, N = 43, O = 44, P = 45, Q = 46, R = 47, S = 48, T = 49, U = 50, V = 51, W = 52, X = 53, Y = 54, Z = 55, F1 = 56, F2 = 57, F3 = 58, F4 = 59, F5 = 60, F6 = 61, F7 = 62, F8 = 63, F9 = 64, F10 = 65, F11 = 66, F12 = 67, NUMPAD0 = 68, NUMPAD1 = 69, NUMPAD2 = 70, NUMPAD3 = 71, NUMPAD4 = 72, NUMPAD5 = 73, NUMPAD6 = 74, NUMPAD7 = 75, NUMPAD8 = 76, NUMPAD9 = 77, MULTIPLY = 78, ADD = 79, SUBTRACT = 80, DECIMAL = 81, DIVIDE = 82, KEYUNKNOWN = 255 }

    The result: a generic Keyboard Handler Event

    First, I add the event handler in the Page_Load event:

    public void Page_Loaded(object o, EventArgs e) { InitializeComponent(); this.KeyUp += new System.Windows.Input.KeyboardEventHandler(keyHandler); //Other stuff... }

    Next, my keyHandler method has a simple switch block which allows for W/A/S/D, numeric keypad, or cursor keys. I also handle the F key, which toggles to full screen mode:

    protected void keyHandler(object sender, KeyboardEventArgs args) { Keys key = (Keys)args.Key; switch (key) { case Keys.A: case Keys.NUMPAD4: case Keys.LEFT: handleLeft(); break; case Keys.W: case Keys.NUMPAD8: case Keys.UP: handleUp(); break; case Keys.D: case Keys.NUMPAD6: case Keys.RIGHT: handleRight(); break; case Keys.S: case Keys.NUMPAD2: case Keys.DOWN: handleDown(); break; case Keys.F: BrowserHost.IsFullScreen = !BrowserHost.IsFullScreen; break; default: return; } }