Dixin's Blog

Dixin is a UI designer and knows a little bit about programming.

  • Understanding LINQ to SQL (11) Performance

    [LINQ via C# series]

    LINQ to SQL has a lot of great features like

    • strong typing
    • query compilation
    • deferred execution
    • declarative paradigm

    etc., which are very productive. Of course, these cannot be free, and one price is the performance.

    O/R mapping overhead

    Because LINQ to SQL is based on O/R mapping, one obvious overhead is, data changing usually requires data retrieving:

    private static void UpdateProductUnitPrice(int id, decimal unitPrice)
    {
        using (NorthwindDataContext database = new NorthwindDataContext())
        {
            Product product = database.Products.Single(item => item.ProductID == id); // SELECT...
            product.UnitPrice = unitPrice; // UPDATE...
            database.SubmitChanges();
        }
    }

    Before updating an entity, that entity has to be retrieved by an extra SELECT query. This is slower than direct data update via ADO.NET:

    private static void UpdateProductUnitPrice(int id, decimal unitPrice)
    {
        using (SqlConnection connection = new SqlConnection(
            "Data Source=localhost;Initial Catalog=Northwind;Integrated Security=True"))
        using (SqlCommand command = new SqlCommand(
            @"UPDATE [dbo].[Products] SET [UnitPrice] = @UnitPrice WHERE [ProductID] = @ProductID",
            connection))
        {
            command.Parameters.Add("@ProductID", SqlDbType.Int).Value = id;
            command.Parameters.Add("@UnitPrice", SqlDbType.Money).Value = unitPrice;
    
            connection.Open();
            command.Transaction = connection.BeginTransaction();
            command.ExecuteNonQuery(); // UPDATE...
            command.Transaction.Commit();
        }
    }

    The above imperative code specifies the “how to do” details with better performance.

    For the same reason, some articles from Internet insist that, when updating data via LINQ to SQL, the above declarative code should be replaced by:

    private static void UpdateProductUnitPrice(int id, decimal unitPrice)
    {
        using (NorthwindDataContext database = new NorthwindDataContext())
        {
            database.ExecuteCommand(
                "UPDATE [dbo].[Products] SET [UnitPrice] = {0} WHERE [ProductID] = {1}",
                id, 
                unitPrice);
        }
    }

    Or just create a stored procedure:

    CREATE PROCEDURE [dbo].[UpdateProductUnitPrice]
    (
        @ProductID INT,
        @UnitPrice MONEY
    )
    AS
    BEGIN
        BEGIN TRANSACTION 
        UPDATE [dbo].[Products] SET [UnitPrice] = @UnitPrice WHERE [ProductID] = @ProductID
        COMMIT TRANSACTION
    END
    

    and map it as a method of NorthwindDataContext (explained in this post):

    private static void UpdateProductUnitPrice(int id, decimal unitPrice)
    {
        using (NorthwindDataContext database = new NorthwindDataContext())
        {
            database.UpdateProductUnitPrice(id, unitPrice);
        }
    }

    As a normal trade off for O/R mapping, a decision has to be made between performance overhead and programming productivity according to the case. In a developer’s perspective, if O/R mapping is chosen, I consistently choose the declarative LINQ code, unless this kind of overhead is unacceptable.

    Data retrieving overhead

    After talking about the O/R mapping specific issue. Now look into the LINQ to SQL specific issues, for example, performance in the data retrieving process. The previous post has explained that the SQL translating and executing is complex. Actually, the LINQ to SQL pipeline is similar to the compiler pipeline. It consists of about 15 steps to translate an C# expression tree to SQL statement, which can be categorized as:

    • Convert: Invoke SqlProvider.BuildQuery() to convert the tree of Expression nodes into a tree of SqlNode nodes;
    • Bind: Used visitor pattern to figure out the meanings of names according to the mapping info, like a property for a column, etc.;
    • Flatten: Figure out the hierarchy of the query;
    • Rewrite: for SQL Server 2000, if needed
    • Reduce: Remove the unnecessary information from the tree.
    • Parameterize
      • Format: Generate the SQL statement string;
      • Parameterize: Figure out the parameters, for example, a reference to a local variable should be a parameter in SQL;
      • Materialize: Executes the reader and convert the result back into typed objects.

    So for each data retrieving, even for data retrieving which looks simple:

    private static Product[] RetrieveProducts(int productId)
    {
        using (NorthwindDataContext database = new NorthwindDataContext())
        {
            return database.Products.Where(product => product.ProductID == productId)
                                    .ToArray();
        }
    }

    LINQ to SQL goes through above steps to translate and execute the query. Fortunately, there is a built-in way to cache the translated query.

    Compiled query

    When such a LINQ to SQL query is executed repeatedly, The CompiledQuery can be used to translate query for one time, and execute for multiple times:

    internal static class CompiledQueries
    {
        private static readonly Func<NorthwindDataContext, int, Product[]> _retrieveProducts = 
            CompiledQuery.Compile((NorthwindDataContext database, int productId) =>
                database.Products.Where(product => product.ProductID == productId).ToArray());
    
        internal static Product[] RetrieveProducts(
            this NorthwindDataContext database, int productId)
        {
            return _retrieveProducts(database, productId);
        }
    }

    The new version of RetrieveProducts() gets better performance, because only when _retrieveProducts is first time invoked, it internally invokes SqlProvider.Compile() to translate the query expression. And it also uses lock to make sure translating once in multi-threading scenarios.

    Static SQL / stored procedures without translating

    Another way to avoid the translating overhead is to use static SQL or stored procedures, just as the above examples. Because this is a functional programming series, this article not dive into. For the details, Scott Guthrie already has some excellent articles:

    Data changing overhead

    By looking into the data updating process, it also needs a lot of work:

    • Begins transaction
    • Processes the changes (ChangeProcessor)
      • Walks through the objects to identify the changes
      • Determines the order of the changes
      • Executes the changings
        • LINQ queries may be needed to execute the changings, like the first example in this article, an object needs to be retrieved before changed, then the above whole process of data retrieving will be went through
        • If there is user customization, it will be executed, for example, a table’s INSERT / UPDATE / DELETE can be customized in the O/R designer

    image

    It is important to keep these overhead in mind.

    Bulk deleting / updating

    Another thing to be aware is the bulk deleting:

    private static void DeleteProducts(int categoryId)
    {
        using (NorthwindDataContext database = new NorthwindDataContext())
        {
            database.Products.DeleteAllOnSubmit(
                database.Products.Where(product => product.CategoryID == categoryId));
            database.SubmitChanges();
        }
    }

    The expected SQL should be like:

    BEGIN TRANSACTION 
    exec sp_executesql N'DELETE FROM [dbo].[Products] AS [t0]
    WHERE [t0].[CategoryID] = @p0',N'@p0 int',@p0=9
    COMMIT TRANSACTION
    

    Hoverer, as fore mentioned, the actual SQL is to retrieving the entities, and then delete them one by one:

    -- Retrieves the entities to be deleted:
    exec sp_executesql N'SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued]
    FROM [dbo].[Products] AS [t0]
    WHERE [t0].[CategoryID] = @p0',N'@p0 int',@p0=9
    
    -- Deletes the retrieved entities one by one:
    BEGIN TRANSACTION 
    exec sp_executesql N'DELETE FROM [dbo].[Products] WHERE ([ProductID] = @p0) AND ([ProductName] = @p1) AND ([SupplierID] IS NULL) AND ([CategoryID] = @p2) AND ([QuantityPerUnit] IS NULL) AND ([UnitPrice] = @p3) AND ([UnitsInStock] = @p4) AND ([UnitsOnOrder] = @p5) AND ([ReorderLevel] = @p6) AND (NOT ([Discontinued] = 1))',N'@p0 int,@p1 nvarchar(4000),@p2 int,@p3 money,@p4 smallint,@p5 smallint,@p6 smallint',@p0=78,@p1=N'Optimus Prime',@p2=9,@p3=$0.0000,@p4=0,@p5=0,@p6=0
    exec sp_executesql N'DELETE FROM [dbo].[Products] WHERE ([ProductID] = @p0) AND ([ProductName] = @p1) AND ([SupplierID] IS NULL) AND ([CategoryID] = @p2) AND ([QuantityPerUnit] IS NULL) AND ([UnitPrice] = @p3) AND ([UnitsInStock] = @p4) AND ([UnitsOnOrder] = @p5) AND ([ReorderLevel] = @p6) AND (NOT ([Discontinued] = 1))',N'@p0 int,@p1 nvarchar(4000),@p2 int,@p3 money,@p4 smallint,@p5 smallint,@p6 smallint',@p0=79,@p1=N'Bumble Bee',@p2=9,@p3=$0.0000,@p4=0,@p5=0,@p6=0
    -- ...
    COMMIT TRANSACTION
    

    And the same to the bulk updating. This is really not effective and need to be aware. Here is already some solutions from the Internet, like this one. The idea is wrap the above SELECT statement into a INNER JOIN:

    exec sp_executesql N'DELETE [dbo].[Products] FROM [dbo].[Products] AS [j0] 
    INNER JOIN (   
    SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued]
    FROM [dbo].[Products] AS [t0]
    WHERE [t0].[CategoryID] = @p0) AS [j1] 
    ON ([j0].[ProductID] = [j1].[[Products])', -- The Primary Key
    N'@p0 int',@p0=9

    Query plan overhead

    The last thing is about the SQL Server query plan. Before .NET 4.0, LINQ to SQL has an issue (not sure if it is a bug). LINQ to SQL internally uses ADO.NET, but it does not set the SqlParameter.Size for a variable-length argument, like argument of NVARCHAR type, etc. So for two queries with the same SQL but different argument length:

    using (NorthwindDataContext database = new NorthwindDataContext())
    {
        database.Products.Where(product => product.ProductName == "A")
            .Select(product => product.ProductID).ToArray();
    
        // The same SQL and argument type, different argument length.
        database.Products.Where(product => product.ProductName == "AA")
            .Select(product => product.ProductID).ToArray();
    }

    Pay attention to the argument length in the translated SQL:

    exec sp_executesql N'SELECT [t0].[ProductID]
    FROM [dbo].[Products] AS [t0]
    WHERE [t0].[ProductName] = @p0',N'@p0 nvarchar(1)',@p0=N'A'
    
    exec sp_executesql N'SELECT [t0].[ProductID]
    FROM [dbo].[Products] AS [t0]
    WHERE [t0].[ProductName] = @p0',N'@p0 nvarchar(2)',@p0=N'AA'
    

    Here is the overhead: The first query’s query plan cache is not reused by the second one:

    SELECT sys.syscacheobjects.cacheobjtype, sys.dm_exec_cached_plans.usecounts, sys.syscacheobjects.[sql] FROM sys.syscacheobjects
    INNER JOIN sys.dm_exec_cached_plans
    ON sys.syscacheobjects.bucketid = sys.dm_exec_cached_plans.bucketid; 

    image

    They actually use different query plans. Again, pay attention to the argument length in the [sql] column (@p0 nvarchar(2) / @p0 nvarchar(1)).

    Fortunately, in .NET 4.0 this is fixed:

    internal static class SqlTypeSystem
    {
        private abstract class ProviderBase : TypeSystemProvider
        {
            protected int? GetLargestDeclarableSize(SqlType declaredType)
            {
                SqlDbType sqlDbType = declaredType.SqlDbType;
                if (sqlDbType <= SqlDbType.Image)
                {
                    switch (sqlDbType)
                    {
                        case SqlDbType.Binary:
                        case SqlDbType.Image:
                            return 8000;
                    }
    
                    return null;
                }
    
                if (sqlDbType == SqlDbType.NVarChar)
                {
                    return 4000; // Max length for NVARCHAR.
                }
    
                if (sqlDbType != SqlDbType.VarChar)
                {
                    return null;
                }
    
                return 8000;
            }
        }
    }

    In this above example, the translated SQL becomes:

    exec sp_executesql N'SELECT [t0].[ProductID]
    FROM [dbo].[Products] AS [t0]
    WHERE [t0].[ProductName] = @p0',N'@p0 nvarchar(4000)',@p0=N'A'
    
    exec sp_executesql N'SELECT [t0].[ProductID]
    FROM [dbo].[Products] AS [t0]
    WHERE [t0].[ProductName] = @p0',N'@p0 nvarchar(4000)',@p0=N'AA'
    

    So that they reuses the same query plan cache:

    image

    Now the [usecounts] column is 2.

    Read more...

  • A DirectoryCatalog class for Silverlight MEF (Managed Extensibility Framework)

    In the MEF (Managed Extension Framework) for .NET, there are useful ComposablePartCatalog implementations in System.ComponentModel.Composition.dll, like:

    • System.ComponentModel.Composition.Hosting.AggregateCatalog
    • System.ComponentModel.Composition.Hosting.AssemblyCatalog
    • System.ComponentModel.Composition.Hosting.DirectoryCatalog
    • System.ComponentModel.Composition.Hosting.TypeCatalog

    While in Silverlight, there is a extra System.ComponentModel.Composition.Hosting.DeploymentCatalog. As a wrapper of AssemblyCatalog, it can load all assemblies in a XAP file in the web server side. Unfortunately, in silverlight there is no DirectoryCatalog to load a folder.

    Background

    There are scenarios that Silverlight application may need to load all XAP files in a folder in the web server side, for example:

    • If the Silverlight application is extensible and supports plug-ins, there would be something like a Plugins folder in the web server, and each pluin would be an individual XAP file in the folder. In this scenario, after the application is loaded and started up, it would like to load all XAP files in Plugins folder.
    • If the aplication supports themes, there would be something like a Themes folder, and each theme would be an individual XAP file too. The application would also need to load all XAP files in Themes.

    It is useful if we have a DirectoryCatalog:

    DirectoryCatalog catalog = new DirectoryCatalog("/Plugins");
    catalog.DownloadCompleted += (sender, e) => { };
    catalog.DownloadAsync();

    Obviously, the implementation of DirectoryCatalog is easy. It is just a collection of DeploymentCatalog class.

    Retrieve file list from a directory

    Of course, to retrieve file list from a web folder, the folder’s “Directory Browsing” feature must be enabled:

    image

    So when the folder is requested, it responses a list of its files and folders:

    image

    This is nothing but a simple HTML page:

    <html>
        <head>
            <title>localhost - /Folder/</title>
        </head>
        <body>
            <h1>localhost - /Folder/</h1>
            <hr>
            <pre>
                <a href="/">[To Parent Directory]</a><br>
                <br>
                1/3/2011  7:22 PM   185 <a href="/Folder/File.txt">File.txt</a><br>
                1/3/2011  7:22 PM   &lt;dir&gt; <a href="/Folder/Folder/">Folder</a><br>
            </pre>
            <hr>
        </body>
    </html>
    

    For the ASP.NET Deployment Server of Visual Studio, directory browsing is enabled by default:

    image

    The HTML <Body> is almost the same:

    <body bgcolor="white">
        <h2><i>Directory Listing -- /ClientBin/</i></h2>
        <hr width="100%" size="1" color="silver">
        <pre>
            <a href="/">[To Parent Directory]</a>
            Thursday, January 27, 2011 11:51 PM 282,538 <a href="Test.xap">Test.xap</a>
            Tuesday, January 04, 2011 02:06 AM  &lt;dir&gt; <a href="TestFolder/">TestFolder</a>
        </pre>
        <hr width="100%" size="1" color="silver">
        <b>Version Information:</b>&nbsp;ASP.NET Development Server 10.0.0.0 
    </body>
    

    The only difference is, IIS’s links start with slash, but here the links do not.

    Here one way to get the file list is read the href attributes of the links:

    [Pure]
    private IEnumerable<Uri> GetFilesFromDirectory(string html)
    {
        Contract.Requires(html != null);
        Contract.Ensures(Contract.Result<IEnumerable<Uri>>() != null);
    
        return new Regex(
                        "<a href=\"(?<uriRelative>[^\"]*)\">[^<]*</a>",
                        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant)
                    .Matches(html)
                    .OfType<Match>()
                    .Where(match => match.Success)
                    .Select(match => match.Groups["uriRelative"].Value)
                    .Where(uriRelative => uriRelative.EndsWith(".xap", StringComparison.Ordinal))
                    .Select(uriRelative =>
                        {
                            Uri baseUri = this.Uri.IsAbsoluteUri
                                                ? this.Uri
                                                : new Uri(Application.Current.Host.Source, this.Uri);
                            uriRelative = uriRelative.StartsWith("/", StringComparison.Ordinal)
                                                ? uriRelative
                                                : (baseUri.LocalPath.EndsWith("/", StringComparison.Ordinal)
                                                        ? baseUri.LocalPath + uriRelative
                                                        : baseUri.LocalPath + "/" + uriRelative);
                            return new Uri(baseUri, uriRelative);
                        });
    }

    Please notice the folders’ links end with a slash. They are filtered by the second Where() query.

    The above method can find files’ URIs from the specified IIS folder, or ASP.NET Deployment Server folder while debugging. To support other formats of file list, a constructor is needed to pass into a customized method:

    /// <summary>
    /// Initializes a new instance of the <see cref="T:System.ComponentModel.Composition.Hosting.DirectoryCatalog" /> class with <see cref="T:System.ComponentModel.Composition.Primitives.ComposablePartDefinition" /> objects based on all the XAP files in the specified directory URI.
    /// </summary>
    /// <param name="uri">
    /// URI to the directory to scan for XAPs to add to the catalog.
    /// The URI must be absolute, or relative to <see cref="P:System.Windows.Interop.SilverlightHost.Source" />.
    /// </param>
    /// <param name="getFilesFromDirectory">
    /// The method to find files' URIs in the specified directory.
    /// </param>
    public DirectoryCatalog(Uri uri, Func<string, IEnumerable<Uri>> getFilesFromDirectory)
    {
        Contract.Requires(uri != null);
    
        this._uri = uri;
        this._getFilesFromDirectory = getFilesFromDirectory ?? this.GetFilesFromDirectory;
        this._webClient = new Lazy<WebClient>(() => new WebClient());
    
        // Initializes other members.
    }

    When the getFilesFromDirectory parameter is null, the above GetFilesFromDirectory() method will be used as default.

    Download the directory’s XAP file list

    Now a public method can be created to start the downloading:

    /// <summary>
    /// Begins downloading the XAP files in the directory.
    /// </summary>
    public void DownloadAsync()
    {
        this.ThrowIfDisposed();
    
        if (Interlocked.CompareExchange(ref this._state, State.DownloadStarted, State.Created) == 0)
        {
            this._webClient.Value.OpenReadCompleted += this.HandleOpenReadCompleted;
            this._webClient.Value.OpenReadAsync(this.Uri, this);
        }
        else
        {
            this.MutateStateOrThrow(State.DownloadCompleted, State.Initialized);
            this.OnDownloadCompleted(new AsyncCompletedEventArgs(null, false, this));
        }
    }

    Here the HandleOpenReadCompleted() method is invoked when the file list HTML is downloaded.

    Download all XAP files

    After retrieving all files’ URIs, the next thing becomes even easier. HandleOpenReadCompleted() just uses built in DeploymentCatalog to download the XAPs, and aggregate them into one AggregateCatalog:

    private void HandleOpenReadCompleted(object sender, OpenReadCompletedEventArgs e)
    {
        Exception error = e.Error;
        bool cancelled = e.Cancelled;
        if (Interlocked.CompareExchange(ref this._state, State.DownloadCompleted, State.DownloadStarted) !=
            State.DownloadStarted)
        {
            cancelled = true;
        }
    
        if (error == null && !cancelled)
        {
            try
            {
                using (StreamReader reader = new StreamReader(e.Result))
                {
                    string html = reader.ReadToEnd();
                    IEnumerable<Uri> uris = this._getFilesFromDirectory(html);
    
                    Contract.Assume(uris != null);
    
                    IEnumerable<DeploymentCatalog> deploymentCatalogs =
                        uris.Select(uri => new DeploymentCatalog(uri));
                    deploymentCatalogs.ForEach(
                        deploymentCatalog =>
                        {
                            this._aggregateCatalog.Catalogs.Add(deploymentCatalog);
                            deploymentCatalog.DownloadCompleted += this.HandleDownloadCompleted;
                        });
                    deploymentCatalogs.ForEach(deploymentCatalog => deploymentCatalog.DownloadAsync());
                }
            }
            catch (Exception exception)
            {
                error = new InvalidOperationException(Resources.InvalidOperationException_ErrorReadingDirectory, exception);
            }
        }
    
        // Exception handling.
    }

    In HandleDownloadCompleted(), if all XAPs are downloaded without exception, OnDownloadCompleted() callback method will be invoked.

    private void HandleDownloadCompleted(object sender, AsyncCompletedEventArgs e)
    {
        if (Interlocked.Increment(ref this._downloaded) == this._aggregateCatalog.Catalogs.Count)
        {
            this.OnDownloadCompleted(e);
        }
    }

    Exception handling

    Whether this DirectoryCatelog can work only if the directory browsing feature is enabled. It is important to inform caller when directory cannot be browsed for XAP downloading.

    private void HandleOpenReadCompleted(object sender, OpenReadCompletedEventArgs e)
    {
        Exception error = e.Error;
        bool cancelled = e.Cancelled;
        if (Interlocked.CompareExchange(ref this._state, State.DownloadCompleted, State.DownloadStarted) !=
            State.DownloadStarted)
        {
            cancelled = true;
        }
    
        if (error == null && !cancelled)
        {
            try
            {
                // No exception thrown when browsing directory. Downloads the listed XAPs.
            }
            catch (Exception exception)
            {
                error = new InvalidOperationException(Resources.InvalidOperationException_ErrorReadingDirectory, exception);
            }
        }
    
        WebException webException = error as WebException;
        if (webException != null)
        {
            HttpWebResponse webResponse = webException.Response as HttpWebResponse;
            if (webResponse != null)
            {
                // Internally, WebClient uses WebRequest.Create() to create the WebRequest object. Here does the same thing.
                WebRequest request = WebRequest.Create(Application.Current.Host.Source);
    
                Contract.Assume(request != null);
    
                if (request.CreatorInstance == WebRequestCreator.ClientHttp &&
                    // Silverlight is in client HTTP handling, all HTTP status codes are supported.
                    webResponse.StatusCode == HttpStatusCode.Forbidden)
                {
                    // When directory browsing is disabled, the HTTP status code is 403 (forbidden).
                    error = new InvalidOperationException(
                        Resources.InvalidOperationException_ErrorListingDirectory_ClientHttp, webException);
                }
                else if (request.CreatorInstance == WebRequestCreator.BrowserHttp &&
                    // Silverlight is in browser HTTP handling, only 200 and 404 are supported.
                    webResponse.StatusCode == HttpStatusCode.NotFound)
                {
                    // When directory browsing is disabled, the HTTP status code is 404 (not found).
                    error = new InvalidOperationException(
                        Resources.InvalidOperationException_ErrorListingDirectory_BrowserHttp, webException);
                }
            }
        }
    
        this.OnDownloadCompleted(new AsyncCompletedEventArgs(error, cancelled, this));
    }

    Please notice Silverlight 3+ application can work in either client HTTP handling, or browser HTTP handling. One difference is:

    • In browser HTTP handling, only HTTP status code 200 (OK) and 404 (not OK, including 500, 403, etc.) are supported
    • In client HTTP handling, all HTTP status code are supported

    So in above code, exceptions in 2 modes are handled differently.

    Conclusion

    Here is the whole DirectoryCatelog’s looking:

    image

    Please click here to download the source code, a simple unit test is included. This is a rough implementation. And, for convenience, some design and coding are just following the built in AggregateCatalog class and Deployment class. Please feel free to modify the code, and please kindly tell me if any issue is found.

    Read more...