Contents tagged with General Software Development

  • Tracking My Internet Provider Speeds

    Of late, our broadband internet has been feeling sluggish. A call to the company took way more hold time than I wanted to spend, and it only fixed the problem for a short while. Thus, a perfect opportunity to play with some new tech to solve a problem: in this case, documenting a systemic issue with a service provider.

  • Creating high performance WCF services

    I had a WCF service that needed to support over a hundred concurrent users, and while most of the service methods had small payloads that returned quickly, the startup sequence needed to pull down 200,000 records. The out-of-the-box WCF service couldn't support this scenario, but with some effort I was able to squeeze orders-of-magnitude performance increases out of it and hit the performance goal.

    Initially, performance was abysmal and there was talk of ditching WCF entirely (and as the one pushing WCF on the project, this didn't seem like a career-enhancing change).

     

    Here's how performance was optimized. The items are listed in the order they were implemented; some are fairly obvious, others took some time to discover. Each item represents a significant improvement in latency or scalability over the prior step. Although I have internal measurement numbers, I'm not comfortable publishing them, because the size of the data and the testing approach changed along the way.

    1. Use NetTCP binding
      This helps both throughput and the time it takes to open and close connections
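
      A rough self-hosting sketch, just to illustrate the binding (IDataService, DataService, and the address below are invented for the example, not the actual service from this post):

      using System;
      using System.ServiceModel;

      [ServiceContract]
      public interface IDataService
      {
          [OperationContract]
          int Ping();
      }

      public class DataService : IDataService
      {
          public int Ping() { return 42; }
      }

      static class NetTcpHost
      {
          static void Main()
          {
              // net.tcp uses a binary encoding and persistent connections, which is
              // what helps both throughput and connection open/close time
              var host = new ServiceHost(typeof(DataService),
                  new Uri("net.tcp://localhost:9000/DataService"));
              host.AddServiceEndpoint(typeof(IDataService), new NetTcpBinding(), "");
              host.Open();
              Console.WriteLine("listening - press Enter to stop");
              Console.ReadLine();
              host.Close();
          }
      }
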
    2. Use DataContract Serializer instead of XMLSerializer
      I started out returning DataTables; switching to POCO objects (via Linq2Sql) yielded a 6x increase
      slow: [OperationContract] MyDataTable GetData(...);
      fast: [OperationContract] IEnumerable<MyData> GetData(...);
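
      For reference, the kind of POCO behind the "fast" signature might look like the following sketch (the MyData members shown are invented for illustration):

      using System.Runtime.Serialization;

      // Plain data-transfer object; the DataContractSerializer handles a list of
      // these far more efficiently than a DataTable pushed through the XmlSerializer
      [DataContract]
      public class MyData
      {
          [DataMember] public string Key { get; set; }
          [DataMember] public int Value { get; set; }
      }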

    3. Unthrottle your service
      It's quite understandable that WCF resists Denial of Service attacks out of the box, but it's too bad that it is such a manual operation to hit the "turbo button". It would be nice if the Visual Studio tooling did this for you, or at least offered some guidance (MS - hint, hint)

      The items to look at here are:
      1. <serviceBehaviors><serviceThrottling ...> set the max values high
      2. <dataContractSerializer maxItemsInObjectGraph="2147483647" />
      3. under <netTcpBinding>, set the listenBacklog, maxConnections, and maxBuffer* values high
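
      When self-hosting, the same throttling knobs can also be raised in code before opening the host; a sketch (the limit values are arbitrary, pick ones that fit your load):

      using System.ServiceModel;
      using System.ServiceModel.Description;

      static class ThrottleSetup
      {
          // Raise the serviceThrottling limits, which default to fairly low values.
          // Call this before host.Open().
          public static void Unthrottle(ServiceHost host)
          {
              var throttle = host.Description.Behaviors.Find<ServiceThrottlingBehavior>();
              if (throttle == null)
              {
                  throttle = new ServiceThrottlingBehavior();
                  host.Description.Behaviors.Add(throttle);
              }
              throttle.MaxConcurrentCalls = 500;
              throttle.MaxConcurrentSessions = 500;
              throttle.MaxConcurrentInstances = 500;
          }
      }
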
    4. Cache your data
      WCF, unlike ASP.NET, has no built-in facility to cache service responses, so you need to do it by hand. Any cache class will do.
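
      For illustration, a bare-bones cache along these lines is enough (the class name and TTL handling are a sketch, not the code from the actual service):

      using System;
      using System.Collections.Generic;

      // Minimal thread-safe cache keyed by string - enough for holding a handful
      // of large, infrequently changing service responses
      public static class SimpleCache
      {
          private class Entry { public DateTime Expires; public object Value; }

          private static readonly object Sync = new object();
          private static readonly Dictionary<string, Entry> Items = new Dictionary<string, Entry>();

          public static T GetOrAdd<T>(string key, TimeSpan ttl, Func<T> load)
          {
              lock (Sync)
              {
                  Entry entry;
                  if (Items.TryGetValue(key, out entry) && entry.Expires > DateTime.UtcNow)
                      return (T)entry.Value;

                  var value = load();
                  Items[key] = new Entry { Expires = DateTime.UtcNow + ttl, Value = value };
                  return value;
              }
          }
      }

      A service method would then wrap its data access in something like SimpleCache.GetOrAdd("startup-data", TimeSpan.FromMinutes(10), LoadStartupData), where LoadStartupData is whatever hits the database.
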
    5. Normalize/compress your data
      This doesn't necessarily have to be done in the database; the Linq GroupBy operators make it easy to do in code. To clarify, say your data is kept in a denormalized table:
      string Key1
      string Key2
      string Key3
      int val1
      int val2

      the bulk of the result set ends up being duplicate data
      LongKeyVal1 LongKeyVal2 LongKeyVal3 10 12
      LongKeyVal1 LongKeyVal2 LongKeyVal3 11 122
      LongKeyVal1 LongKeyVal2 LongKeyVal3 12 212
      so normalize this into
      LongKeyVal1 LongKeyVal2 LongKeyVal3
      10 12
      11 122
      12 212

      In code, given the following classes

      public class MyDataDenormalized
      {
          public string Key1 { get; set; }
          public string Key2 { get; set; }
          public string Key3 { get; set; }
          public int Val1 { get; set; }
          public int Val2 { get; set; }
      }
      public class MyDataGroup
      {
          public string Key1 { get; set; }
          public string Key2 { get; set; }
          public string Key3 { get; set; }
          public MyDataItem[] Values { get; set; }
      }
      public class MyDataItem
      {
          public int Val1 { get; set; }
          public int Val2 { get; set; }
      }

      you can transform an IEnumerable<MyDataDenormalized> into an IEnumerable<MyDataGroup> via the following:

      var keyed = from sourceItem in source
                 group sourceItem by new
                 {
                     sourceItem.Key1,
                     sourceItem.Key2,
                     sourceItem.Key3,
                 } into g
                 select g;
      var groupedList = from kItems in keyed
                    let newValues = (from sourceItem in kItems select new MyDataItem() { Val1 = sourceItem.Val1, Val2= sourceItem.Val2 }).ToArray()
                    select new MyDataGroup()
                    {
                        Key1 = kItems.Key.Key1,
                        Key2 = kItems.Key.Key2,
                        Key3 = kItems.Key.Key3,
                        Values = newValues,
                    };
    6. Use the BinaryFormatter, and cache your serializations
      If you're willing to forgo over-the-wire type safety, the BinaryFormatter is the way to go for scalability. Caching the data has only a limited impact if a significant amount of CPU time is still spent serializing it on every call - which is exactly what happens with the DataContract serializer.

      The operation contract changes to

      [OperationContract]
      Byte[] GetData(...);

      and the implementation to

      var bf = new BinaryFormatter();
      using (var ms = new MemoryStream())
      {
          // LINQ query results aren't serializable, so materialize them first;
          // MyDataGroup and MyDataItem also need the [Serializable] attribute
          bf.Serialize(ms, groupedList.ToArray());

          // and best to cache the serialized bytes too
          // (ToArray, not GetBuffer, so only the written bytes go over the wire)
          return ms.ToArray();
      }
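
      On the client side the proxy now returns a Byte[], which gets deserialized the same way. A small sketch, assuming the MyDataGroup types live in an assembly shared by client and service and that the service serialized a MyDataGroup[]:

      using System.IO;
      using System.Runtime.Serialization.Formatters.Binary;

      static class PayloadReader
      {
          // Reverse of the service-side Serialize call. The BinaryFormatter needs the
          // same CLR types on both ends - that's the type-safety trade-off noted above.
          public static MyDataGroup[] Read(byte[] payload)
          {
              var bf = new BinaryFormatter();
              using (var ms = new MemoryStream(payload))
              {
                  return (MyDataGroup[])bf.Deserialize(ms);
              }
          }
      }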

       

    Before items 4, 5, and 6, the service would max out at about 50 clients (response times would climb sharply and CPU usage would hit 80% on an 8-core box). After these changes were made, the service could handle 100+ clients, and CPU usage flattened out at 30%.

    Update: Shay Jacoby has reasonably suggested I show some code.

    Update2: Brett asks about relative impact. Here's a summary

    Item                          Latency    Scalability
    2) DataContract Serializer    large      large
    3) Unthrottle                 small      large
    4) Cache data                 small      -
    5) Normalize data             medium     -
    6) Cache serialization        small      large

     


  • Scheduling PowerShell tasks without a console window

    Have you ever wanted to use Windows Task Scheduler to run a PowerShell script on a frequent schedule, but hated how the console window would flash on the screen every time the script ran? Yeah, me too.

     

    Apparently the Task Scheduler API supports hiding the console window, but the command-line and visual interfaces don't expose it. My solution, rather than working with the API, is to create the world's smallest PowerShell host and compile it as a Windows-mode executable, which has no console window at all.

    // The world's smallest PowerShell host: runs the script text passed as the
    // first argument inside an in-process runspace; compiled with /target:winexe
    // (below) there is no console window to flash on screen.
    static class PoshExec
    {
        static void Main(string[] args)
        {
            (new System.Management.Automation.RunspaceInvoke()).Invoke(args[0]);
        }
    }
    > csc  /target:winexe PoshExec.cs /r:"c:\Program Files\Reference Assemblies\Microsoft\WindowsPowerShell\v1.0\System.Management.Automation.dll"

     

    Once that's done, you have what you need to schedule silent tasks:

    SCHTASKS /Create /SC MINUTE /MO 15 /TN ATaskName /TR "c:\devtools\PoshExec '& c:\devtools\myscript.ps1'"

  • F# and CEP

    At the last New York .Net Meetup, Luke Hoban presented an overview of F#. Like everyone else who's catching the F# bug, I was quite impressed with its succinctness, sequences, forward pipes, and support for asynchronous programming (called asynchronous workflows).

    Could a reasonable amount of F# code replace the big expensive CEP engines we use?

    So I asked how to compute over a stream of data, using standard deviation as a simple example. Luke has a great write-up on this topic:

    http://blogs.msdn.com/lukeh/archive/2008/10/10/standard-deviation-and-event-based-programming.aspx