Li Chen's Blog

  • Implement a simple priority queue

    .NET framework does not have a priority queue built-in. There are several open source implementations. If you do not want to reference an entire library, it is fairly easy to implement one yourself. Many priority queue implementations use heap. However, if the number of levels of priorities is small, it is actually very easy and efficient to implement priority queues using an array of queues. There is a queue implementation in the .net framework.

    My implementation is in the code below. The enum QueuePriorityEnum contains the number of levels of priorities. It is picked up automatically by the PriorityQueue class. The queue support 3 operations: Enqueue, Dequeue and Count. There behavior is modeled after the Queue class in the .net framework.

     

    using System;
    using System.Collections.Generic;
    using System.Linq;
    
    namespace MyNamespace
    {
        // Modify this enum to add number of levels. It will picked up automatically
        enum QueuePriorityEnum
        {
            Low = 0,
            High =1
        }
    
        class PriorityQueue<T>
        {
            Queue<T>[] _queues;
    
            public PriorityQueue()
            {
                int levels = Enum.GetValues(typeof(QueuePriorityEnum)).Length;
                _queues = new Queue<T>[levels];
                for (int i = 0; i < levels; i++)
                {
                    _queues[i] = new Queue<T>();
                }
            }
    
            public void Enqueue(QueuePriorityEnum priority, T item)
            {
    	    _queues[(int)priority].Enqueue(item);
            }
    
            public int Count
            {
                get
                {
                    return _queues.Sum(q => q.Count);
                }
            }
    
            public T Dequeue()
            {
    	    int levels = Enum.GetValues(typeof(QueuePriorityEnum)).Length;
    	    for (int i = levels - 1; i > -1; i--)
    	    {
    		if (_queues[i].Count > 0)
    		{
    			return _queues[i].Dequeue();
    		}
    	    }
                throw new InvalidOperationException("The Queue is empty. ");
            }
        }
    }
    
    
  • integrating external systems with TFS

    Recently, we need to integrate external systems with TFS. TFS is a feature-rich system and has a large API. The TFS sample on MSDN only scratch the top surface. Fortunately, a couple of good blog posts get me on the write direction:

    The key is to use the VersionControlService. We need to reference the following assemblies:

    Microsoft.TeamFoundation.Client.dll

    Microsoft.TeamFoundation.Common.dll

    Microsoft.TeamFoundation.VersionControl.Client.dll

    Microsoft.TeamFoundation.VersionControl.Common.dll

    The code would be something like:

    using Microsoft.TeamFoundation.Client;
    using Microsoft.TeamFoundation.Framework.Common;
    using Microsoft.TeamFoundation.Framework.Client;
    using Microsoft.TeamFoundation.VersionControl.Client;
    
    ...
    
    TfsTeamProjectCollection pc = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(tfsUri);
    VersionControlServer vcs = pc.GetService();
    

    Then to check whether a file exists, we can use:

    vcs.ServerItemExists(myPath, ItemType.File)

    Or check if a directory exists:

    vcs.ServerItemExists(myPath, ItemType.Folder)

    To get a list of directories or files, we can use the GetItems method. TFS is far more complicated than a file system. We can get a file, get the history of a file, get a changeset, etc. Therefore, the GetItems method has many overloads. To get a list files, we can use:

    var fileSet = vcs.GetItems(myPath, VersionSpec.Latest, RecursionType.OneLevel, DeletedState.NonDeleted, ItemType.File);
    foreach (Item f in fileSet.Items)
    {
        Console.WriteLine(f.ServerItem);
    }

    Or get a list of directories:

    var dirSet = vcs.GetItems(myPath, VersionSpec.Latest, RecursionType.OneLevel, DeletedState.NonDeleted, ItemType.Folder);
    foreach (Item d in dirSet.Items)
    {
        Console.WriteLine(d.ServerItem);
    }
  • Converted ASP Classic Compiler project from Mercurial to Git

    Like some other open source project developers, I picked the Mercurial as my version control system. Unfortunately, Git is winning in the Visual Studio echo systems. Fortunately, it is possible to contact Codeplex admin for manual conversion from Mercurial to Git. I have done exactly that for my open source ASP Classic Compiler project. Now I can add new examples in response to forum questions and check them in using my Visual Studio 2013. Now I am all happy.

  • Video review: RESTful Services with ASP.NET Web API from PACKT Publishing

    Disclaimer: I was provided this video for free by PACKT Publishing. However, that does not affect my opinion about this video.

    Upon request by PACKT Publishing, I agreed to watch and review the video “RESTful Service with ASP.NET Web API” by Fanie Reynders. Prior to the review, I have a EBook called “Designing Evolvable Web APIs with ASP.NET - Harnessing the power of the web” by Glenn Block, Pablo Cibraro, Pedro Felix, Howard Dierking and Darrel Miller from O’Reilly. I also have access to several videos on the Pluralsight.com on the same subject. So I would put my review in perspective with those other materials.

    The video from Packt has the length of 2 hours 4 minutes. It gave a nice overview over the ASP.NET Web API. The video is available for watch online or for downloading to watch offline. The video has 8 chapters. It covers the ASP.NET Web API in a clear and accurate way and is surprisingly complete for this short length.

    In comparison with other resources, I would recommend you get this video if you have never worked with ASP.NET Web API before and want to get a complete overview in a short time possible.

    If you love video, Pluralsight is offering the equally good and slightly longer (3h 15m)  “Introduction to the ASP.NET Web API” by Jon Flanders. You would need subscription to access the exercise materials thought. If you do subscribe, Pluralsight also has a couple of intermediate level videos by Shawn Wildermuth.

    Lastly, if you want an in depth book that you can use for a extended period of time, it is hard to beat Glenn Block, etc.’s book from O’Reilly.

  • LINQ query optimization by query rewriting using a custom IQueryable LINQ provider

    In my SkyLinq open source project so far, I tried to make LINQ better and created several extension methods (for example, see GroupBy and TopK) that are more memory efficient than the standard methods in some scenarios. However, I do not necessarily want people to bind into these extensions methods directly for the following reasons:

    1. Good ideas evolve. New methods may come and existing methods may change.
    2. There is a parallel set of methods in IEnumerable<T>, IQueryable<T> and to some degree, in IObservable<T>. Using custom LINQ extension breaks the parallelism.

    A better approach would be to write the LINQ queries normally using the standard library methods and then optimize the queries using a custom query provider. Query rewriting is not a new idea. SQL server has been doing this for years. I would like to bring this idea into LINQ.

    If we examine the IQueryable<T> interface, we will find there is a parallel set of LINQ methods that accepts Expression<Func> instead of <Func>.

    image

    There is a little C# compiler trick here. If C# see a method that accepts Expression<Func>, it would create Expressions of Lamda instead of compiled Lambda expression. At run time, these expressions are passed to an IQueryable implementation and then passed to the underlying IQueryProvider implementation. The query provider is responsible for executing the expression tree and return the results. This is how the magic of LINQ to SQL and Entity Frameworks works.

    The .net framework already has a class called EnumerableQuery<T>. It is a query provider than turns IQueryable class into LINQ to objects queries. In this work, I am going one step further by creating an optimizing query provider.

    A common perception of writing a custom LINQ provider is that it requires a lot of code. The reason is that even for the most trivial provider we need to visit the every possible expressions in the System.Linq.Expressions namespace. (Note that we only need to handle expressions as of .net framework 3.5 but do not need to handle the new expressions added as part of dynamic language runtime in .net 4.x). There are reusable framework such as IQToolkit project that makes it easier to create a custom LINQ provider.

    In contrast, creating an optimizing query provider is fairly easy. The ExpressionVistor class already has the framework that I need. I only needed to create a subclass of ExpressionVisitor called SkyLinqRewriter to override a few methods. The query rewriter basically replaces all call to the Queryable class with equivalent calls to the Enumerable class and rewrite queries for optimization when an opportunity presents.

    It is fairly easy to consume the optimizing query provider. All we need is to call AsSkyLinqQueryable() to convert IEnumerable<T> to IQueryable<T> and remaining code can stay intact:

    image

    An end-to-end example can be found at https://skylinq.codeplex.com/SourceControl/latest#SkyLinq.Example/LinqToW3SVCLogExample.cs.

    To conclude this post, I would recommend that we always code to IQueryable<T> instead of IEnumerable<T>. This way, the code has the flexibility to be optimized by an optimizing query provider, or be converted from pull based query to push based query using Reactive Extension without rewriting any code.

  • Expression tree visualizer for Visual Studio 2013

    Expression tree visualizer, as the name indicates, is a Visual Studio visualizer for visualizing expression trees. It is a must if you work with expressions frequently. Expression Tree Visualizer is a Visual Studio 2008 sample. There is a Visual Studio 2010 port available on codeplex. If you want to use it with a later version of Visual Studio, there is not one available. Fortunately, porting it to another version of Visual Studio is fairly simple:

    1. Download the original source code from http://exprtreevisualizer.codeplex.com/.
    2. Replace the existing reference to Microsoft.VisualStudio.DebuggerVisualizers assembly to the version in the Visual Studio you want to work with. For Visual Studio 2013, I found it in C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\IDE\ReferenceAssemblies\v2.0\Microsoft.VisualStudio.DebuggerVisualizers.dll on my computer.
    3. Compile the ExpressionTreeVisualizer project and copy ExpressionTreeVisualizer.dll to the visualizer directory. To make it usable by one user, just copy it to My Documents\VisualStudioVersion\Visualizers. To make it usable by all users of a machine, copy it to VisualStudioInstallPath\Common7\Packages\Debugger\Visualizers.

    If you do not want to walk through the process, I have one readily available at http://weblogs.asp.net/blogs/lichen/ExpressionTreeVisualizer.Vs2013.zip. Just do not sue me if it does not work as expected.

  • An efficient Top K algorithm with implementation in C# for LINQ

    The LINQ library currently does not have a dedicated top K implementation. Programs usually use the OrderBy function followed by the Take function. For N items, an efficient sort algorithm would scale O(n) in space and O(n log n) in time. Since we are counting the top K, I believe that we could devise an algorithm that scales O(K) in space.

    Among all the sorting algorithms that I am familiar, the heapsort algorithm came to my mind first. A Heap has a tree-like data structure that can often be implemented using an array. A heap can either be a max-heap or a min-heap. In the case of min-heap, the minimum element of at the root of the tree and all the child elements are greater than their parents.

    Binary heap is a special case of heap. A binary min-heap has the following characteristics:

    • find-min takes O(1) time.
    • delete-min takes O(log n) time.
    • insert takes O(log n) time.

    So here is how I implement my top K algorithm:

    1. Create a heap with an array of size K.
    2. Insert items into the heap until the heap reaches its capacity. This takes K O(log K) time.
    3. For each the remaining elements, if an element is greater than find-min of the heap, do a delete-min and then insert the element into the heap.
    4. Then we repeated delete-min until the heap is empty and arrange the deleted element in reverse order and we get our top 10 list.

    The time it takes to find top K items from an N item list is:

    O(N) * t1 + O((K + log N - log K) * log K) * t2

    Here t1 is the time to compare an element in step 3 to find-min and t2 is the combined time of delete-min and the subsequent insert. So this algorithm is much more efficient that a call to OrderBy followed by a call to Take.

    I have checked-in my implementation to my Sky LINQ project. The LINQ Top and Bottom is at in LinqExt.cs. The internal binary heap implementation is in BinaryHeap.cs. The LINQ example can be found in HeapSort.cs. The Sky Log example has also been updated to use the efficient Top K algorithm.

    Note that this algorithm combined OrderBy and Take by GroupBy itself still uses O(N) splace where N is number of groups. There are probabilistic approximation algorithms that can be used to further alleviate memory foot print. That is something I could try in the future.

  • My first impression on Tx (LINQ to Logs and Traces)

    Yesterday, I learnt about Tx (LINQ to Logs and Traces) from This Week On Channel 9. The name suggest that it has overlap with my end-to-end example on LINQ to W3SVC log files. It turns out that the overlap is minimal.

    Tx is a Microsoft Open Technologies project. The source code commit history suggest it was originally part Reactive Extension (Rx) developed by the Rx team and became an independent project a year ago. Tx heavily uses Rx.

    The Tx project has a page on “When to use Tx?” and I will quote a portion of it below.

    When NOT to use Tx

    Use only LINQ-to-Objects when:
    • There are no real-time feeds involved
    • The data is already in memory (e.g. List <T>), or you are dealing with single file that is easy to parse - e.g. a text file, each line representing event of the same type.
    Example how to parse text files using LINQ-to-Objects is W3CEnumerable.cs. This parses the W3C logs from IIS
    Use only
    Reactive Extensions (Rx) when:
    • There are real-time feeds (e.g. mouse moves)
    • You need the same query to work on past history (e.g. file(s)) and real-time
    • Each feed/file contains a single type of events, like T for IObservable<T>

    When to use Tx

    Tx.Core adds the following new features to Rx:
    • Support for Multiplexed sequences (single sequence containing events of different types in order of occurence). The simplest example of turning multiplexd sequence into type-specific Obseravable-s is the Demultiplexor
    • Merging multiple input files in order of occurence - e.g. two log files
    • Hereogeneous Inputs - e.g. queries across some log files (.evtx) and some traces (.etl)
    • Single-pass-read to answer multiple queries on file(s)
    • Scale in # of queries. This is side effect of the above, which applies to both real-time and single-read of past history
    • Providing virtual time as per event timestamps. See TimeSource
    • Support for both "Structured" (like database, or Rx) and "Timeline" mode (like most event-viewing tools)

    The Tx W3CEnumerable.cs does have overlap with my implementation of log parser. I wish I had noticed the project earlier. In the future, I will try to align my project with Tx to make it easier to use them together. I just had an implementation of duct-typing so interoperating different types from the same idea should be easy.

  • A c# implementation of duck typing

    Eric Lippert’s blogs (and his old blogs) have been my source to the inner working of VBScript and C# in the past decade. Recently, his blog on “What is ‘duck typing’?” has drawn lots of discussions and generated additional blogs from Phil Haack and Glenn Block. The discussion injects new ideas into my thoughts on programming by composition. I previously argued that interfaces are too big a unit for composition because they often live in different namespaces and only a few of the well-known interfaces are widely supported. I was in favor of lambdas (especially those that consume or return IEnumerable<T>) as composition unit. However, as I worked through my first end-to-end example, I found that lambdas are often too small (I will blog about this in details later).

    Duck typing may solve the interface incompatibility problem in many situations. As described in Wikipedia, The name came from the following phrase:

    “When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.”

    If I would translate it into software engineering terms, I would translate it as:

    The client expects certain behavior. If the server class has the behavior, then the client can call the server object even though the client does not explicitly know the type of the server objects.

    That leads to the following implementation in a strongly-typed language like c#, consisting of:

    1. An interface defined by me to express my expected behaviors of duck, IMyDuck.
    2. An object from others that I want to consume. Let’s call its type OtherDuck.
    3. A function that checks if Duck has all the members of IMyDuck.
    4. A function (a proxy factory) that generates a proxy that implements IMyDuck and wrap around OtherDuck.
    5. In my software, I only bind my code to IMyDuck. The proxy factory is responsible for bridging IMyDuck and OtherDuck.

    This implementation takes the expected behaviors as a whole. Except for the one time code generation for the proxy, everything else is strongly-typed at compile time. It is different to the “dynamic” in c# which implements late binding at the callsite. 

    I have checked my implementation into my Sky LINQ project under the c# project of SkyLinq.Composition. The name of the class is DuckTypingProxyFactory. The methods look like:

    image

    I have also provided an example called DuckTypingSample.cs in the SkyLinq.Sample project.

    So when would duck-typing be useful? Here are a few:

    1. When consuming libraries from companies that implement the same standard.
    2. When different teams in a large enterprise each implement the company’s entities in their own namespace.
    3. When a company refactors software (see the evolution of asp.net for example).
    4. When calling multiple web services.

    Note that:

    1. I noticed the impromptu-interface project that does the same thing after I completed by project. Also, someone mentioned TypeMock and Castle in the comment of Eric’s log. So I am not claiming to be the first person with the idea. I am still glad to have my own code generator because I am going to use it to extend the idea of strongly-typed wrapper as comparing to data-transfer-objects.
    2. Also, nearly all Aspect-Oriented-Programming (AOP) and many Object-Relational-Mapping (ORM) frameworks have the similar proxy generator. In general, those in AOP frameworks are more complete because ORM frameworks primary concern with properties. I did looked at the Unity implementation when I implemented mine (thanks to the Patterns & Practices team!). My implementation is very bare-metal and thus easy to understand.
  • SkyLog: My first end-to-end example on programming by composition

    I have completed my first end-to-end example as I am exploring the idea of programming by composition. SkyLog is an example on querying IIS Log files like what LogParser do, except:

    1. It is built on LINQ and thus can use the entire .net framework, not just the built-in functions of LogParser.
    2. It can use Parallel LINQ and thus uses multiple cores effectively.
    3. Any part of a query can be refactored into a function and thus reused.
    4. We just need to write the queries once and the queries can be used against one log file, a whole directory of log files, several servers, and in any storage, such as Azure Blog Storage.

    The running example are available at http://skylinq.azurewebsites.net/skylog. The example runs the query against some log files stored in Azure Blog Storage.

    image

    The queries in the example are from http://blogs.msdn.com/b/carlosag/archive/2010/03/25/analyze-your-iis-log-files-favorite-log-parser-queries.aspx.

    The queries look even simpler than what you normally write against LogParser or SQL Server:

    image

    That is because part of the queries were refactored into two reusable functions below. We cannot reuse part of query with LogParser. With SQL Server, we can create reusable units with View. With LINQ, we can reuse any part of the LINQ query.

    image

    Note that I used the memory efficient version of GroupBy written by me previously. The equivalent code using the GroupBy in LINQ is commented out.

    To run these reports, we just need records from the log files as IEnumerable<string>. To get them from a directory, the code would look like:

    image

    To get records from Windows Azure Blob Storage, the could would look like:

    image

    Lastly, we need to convert IEnumberable<string> to strongly-typed wrapper, IEnumerable<W3SVCLogRecord>, to be consumed by the reports:

    image

    I touched upon the idea of strongly-typed wrapper previously. The purpose of strongly-typed wrapper is to allow intelligence while minimize the garbage collection. The wrapper provides access to underlying data, but does not convert the data to a data-transfer-object.

    The front-end is a simple ASP.NET MVC page. Reports are exposed through ASP.NET Web API using the CSV formatter that I built previously. The visualization was built using D3.js. The code is so simple that we could think each line of the code as Javascript based widgets. If we desire more sophisticated UI, we could compose the UI with a different widget. We could even introduce a client-side template engine such as Mustache.js.

    image

    Even with simple example, there are enough benefits in it so that I probably would never need to use LogParser again. The example is fully built on LINQ to IEnumerable. I have not used any cool feature from IQueryable and Rx yet. As usually, the latest code is in http://skylinq.codeplex.com. Stay tuned for more fun stuff.