ASP.NET Weblogs

Sijin Joseph's blog

My experiences with .Net
  • A case study in micro-optimization (Generating permutations)

    I've blogged about my experiences in coming up with a fast solution to Cedric's coding challenge in C# over at my personal blog. Couldn't bear the thought of the fastest solution being a Java one ;-)

     http://www.indiangeek.net/2008/08/29/a-case-study-in-micro-optimization/

  • What will be the next generation internet application platform?

    A few years ago I was a firm believer in the Rich Connected Client application model, which was based on running applications installed locally on the user's desktop. Since the Ajaxian explosion, the quality and quantity of Ajax-based web applications has continued to increase: applications like Facebook have introduced new paradigms, whereas apps like Live Maps have made existing apps much more convenient and accessible. Today you have to argue really hard to even consider a desktop-based application for anything that is not computation-intensive (and even this category is questionable now; a few years back, movie-editing web apps would have been out of the question). So what is it that makes the web such a successful application platform?
    • Uniform and simple model (Web Browser, urls, can click when hand is visible) - Once a user learns the basics of working with a web application that knowledge can be easily applied to other applications.
    • Client platform independence - The decoupling of the server and client with an agreed contract (HTML+CSS+JS) means that the traditional problem of targeting various platforms with different APIs no longer exists on the client side.
    • Machine independence - The user is no longer restricted to the machine on which the application was installed. This also results in a much simpler deployment model.
    • Data independence - The user's data is now available on the network which means that not only can the user run the application from anywhere but can also access his data from anywhere.
    Now what would the next generation internet application platform look like? I think that in addition to the above characteristics, the next generation of platforms would involve the following.
    • Full use of computing resources available locally - Having a powerful CPU and GPU seems like such a waste when all your applications have to be funnelled through the browser. So the next generation platform would allow access to the computing power available locally.
    • Better integration with the local resources - This is sort of related to the point above, but would allow internet applications to access local disks, settings, registry etc.
    • Better security model - Of course all this has already been attempted with ActiveX and XPCOM, but the security models there have been weak and non-intuitive to users; a better solution is needed.
    So it looks like Microsoft Silverlight and Adobe AIR are steps in the right direction towards building the next generation internet application platform. However, Microsoft has a great opportunity here to push the envelope with Silverlight and introduce new standards for desktop integration of internet applications. Their extensive user base means that any API created by them has a very good chance of being successful and catching on with the other players in this space.
  • Subversion as a deployment tool

    I was thinking on the way to work today that Subversion would be a great tool to overcome some of the difficulties associated with frequent deployments to the web servers. Here's how I see it working:
    1. Create a production/live build folder in your source tree and add it to the repository.
    2. Modify your build system to create the live builds in this folder and commit them to the repository.
    3. On the live server the site is deployed as a checkout of the live build folder.
    4. Once the build passes unit tests and QA, all we need to do to deploy is update the working copy on the live server. The big advantage here is that rollbacks are handled automatically, because we can always update the working copy back to a previous revision. You also get a nice history of all the updates to the live server.
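The update and rollback step can be sketched as a small helper that builds the `svn update` command line. This is only an illustration under my own assumptions: the function name `svn_update_command`, the path `/var/www/live` and the revision number are made up for the example, and the command is built but never executed.

```python
def svn_update_command(working_copy, revision=None):
    """Build the svn command that moves a working copy to a revision.

    With revision=None the working copy is brought up to the latest commit;
    passing an older revision number rolls the live site back to that build.
    """
    command = ["svn", "update"]
    if revision is not None:
        command += ["--revision", str(revision)]
    command.append(working_copy)
    return command


# Deploy the newest live build (hypothetical path).
print(" ".join(svn_update_command("/var/www/live")))
# Roll back to an earlier build (revision 1041 is an arbitrary example).
print(" ".join(svn_update_command("/var/www/live", revision=1041)))
```

Running `svn log` on the same working copy would then give the history of what was deployed and when.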
  • Programmer Competency Matrix

    Having worked with programmers with an extreme variance in skills, I sometimes get the feeling that there is a big lack of good programmers. But when I thought about it a little more I realized that it's not very clear cut: some programmers have strong areas, and if you confine their tasks to those areas they tend to deliver well. So I started thinking about all the lines along which we can evaluate a programmer. Here's what I have so far...

    Programmer Competency Matrix (the table is too big to fit in this blog post and needs a whole page of its own)

    After having spent a whole afternoon on this I realize that even this is not comprehensive. The matrix is biased towards non-visual programmers, so a big majority of web devs will not be able to relate well to it, but I am tired and will come back to this at a later time.

  • An alternative model of computation for concurrency

    I recently came across an old article that I had written for my company newsletter. It's always fun to discover old stuff that you've written and see how much your perception has changed since then. Copied verbatim below; this was a print article, which is why the links are not hyperlinked.


    Concurrency is one of the hot subjects in computer science today. This has partly to do with the fact that processors (1) are reaching their physical limits and thus we need to start looking at new avenues of achieving performance. Herb Sutter, the renowned author, has written an excellent article (2) named “The Free Lunch Is Over - A Fundamental Turn Toward Concurrency in Software” which captures very beautifully why concurrency is going to be important in the years ahead.

    Let us look at a few issues surrounding concurrency today and also look at an alternative model of computation that solves these problems.

    The most prevalent model of computation in both hardware and software today is the Von Neumann architecture (3), which is an implementation of the Turing machine (4), which in turn is based on the work of Alan Turing (5). Programs written in this model have a shared memory area, also known as the store, which can be used to store data. The store is divided into chunks called cells, and named cells are called variables.

    A function written in the Neumann model can make use of the store to aid in its computation, thus any time a function uses data other than those provided as parameters to the function, it is in fact using the store.

    The alternative model of computation we'll look at is called the functional model. It is based on the Lambda Calculus (7), the work of Alonzo Church (6), who came up with it at approximately the same time as Alan Turing developed his model. The good thing is that both models of computation are equivalent, i.e. any computation that can be expressed in one model can also be expressed in the other.

    The functional model is based on mathematics. In this model there is no store and all computation is done by evaluating mathematical functions. A mathematical function is one in which the value of the function is totally dependent on the values of the parameters and not on any external state. Also, the term variable in this model refers to a named value and not to a named cell.

    Consider a simple function
    f(x) = g(x) + h(x)

    The function f takes x as a parameter and returns the result of evaluating function g with parameter x and adding it to the result of evaluating function h with parameter x.

    In Neumann style programs it is possible that f will return a different output for the same value of x, whereas in functional style programs it is guaranteed that no matter how many times f is called with a particular value of x, the output will always be the same. This is because functional programs do their computations without the use of any external state, whereas the computations in Neumann style programs are affected by the state of the store.

    Let us understand this with a concrete example
    int y = 2;
    foo(x)
    {
        return x * y;
    }

    bar(x)
    {
        int y = 2;
        return x * y;
    }

    The function foo uses a variable y from the store to do its computation, so if some other function were to alter the value of the cell y, foo would start returning different results.

    On the other hand bar does not use any external state and for any value of x it will always return twice of x.
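To see the difference actually happen, here is a minimal runnable sketch of the same two functions in Python. The choice of Python and the input value 5 are mine, not part of the original article; the point is only that foo changes its answer when the cell y is altered, while bar does not.

```python
y = 2  # a named cell in the shared store


def foo(x):
    # Reads y from the store, so the result depends on external state.
    return x * y


def bar(x):
    # Uses only its parameter and a local value.
    y = 2
    return x * y


before = (foo(5), bar(5))
y = 3  # some other code alters the cell
after = (foo(5), bar(5))

print(before)  # (10, 10)
print(after)   # (15, 10)
```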

    It is possible to write functional programs in Neumann languages by totally avoiding the use of the store and modeling the entire computation using only functions that use local variables. On the other hand, pure functional languages do not have the concept of a shared store, so writing Neumann style programs in them is not possible.

    Now let's take a look at how these computation models interact with concurrency. There are two ways to parallelize programs: implicitly and explicitly.

    In implicit parallelization, the compiler does the hard work of looking at the program and deciding which instruction sequences can be parallelized. This is the most commonly used method of parallelization, and in fact most programmers are not even aware that the compiler does this under the hood.

    Neumann style programs are hard to optimize for parallelization since functions can be implicitly dependent on each other via the store. For example:
    baz(x)
    {
        foo(x);
        return bar(x);
    }

    int y = 0;
    foo(x)
    {
        y = 2 * x;
    }

    bar(x)
    {   
        return y * 6;
    }
    In this case foo and bar need to execute sequentially because they both make use of the cell 'y' from the store. This is an extremely simple example, but hidden dependencies like this can get incredibly complicated to figure out. Thus the compiler has to be conservative and only parallelize code that it absolutely knows is safe to parallelize, which unfortunately is not a lot.
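The hidden dependency can be made visible by running the two calls in the opposite order. The sketch below is my own Python rendering of the example; `baz_reordered` and the input value 3 are made up for the illustration and stand in for what a careless reordering (or parallel schedule) could do.

```python
y = 0  # shared cell in the store


def foo(x):
    global y
    y = 2 * x  # writes the cell


def bar(x):
    return y * 6  # reads the cell


def baz(x):
    foo(x)  # must finish before bar reads y
    return bar(x)


def baz_reordered(x):
    result = bar(x)  # reads whatever was left in y
    foo(x)
    return result


y = 0
in_order = baz(3)
y = 0
out_of_order = baz_reordered(3)

print(in_order)      # 36
print(out_of_order)  # 0
```

Nothing in the signatures of foo and bar says that one depends on the other, which is exactly what makes this hard for a compiler to prove safe.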

    On the other hand, since functions written in a functional language are not dependent on external state, they can be very easily parallelized by evaluating different functions in parallel.
    For example:
    baz(x)
    {
        foo(x);
        bar(x);
        combo(foo(x));
    }
    Since foo, bar and combo are purely mathematical functions, the evaluation of baz can be parallelized by evaluating foo and bar in parallel and then evaluating combo, since combo depends on the value from foo.
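As a rough sketch of that evaluation order, here it is done by hand with a thread pool in Python. The bodies of foo, bar and combo are invented placeholders (the article never defines them), and returning both results as a tuple is my own choice. Also note that in standard CPython builds the global interpreter lock means CPU-bound functions will not actually run faster this way; the code only illustrates which calls are free to overlap.

```python
from concurrent.futures import ThreadPoolExecutor


def foo(x):
    return x + 1  # placeholder pure function


def bar(x):
    return x * 2  # placeholder pure function


def combo(value):
    return value * value  # placeholder pure function


def baz(x):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # foo and bar do not depend on each other, so both are started at once.
        foo_future = pool.submit(foo, x)
        bar_future = pool.submit(bar, x)
        # combo needs foo's value, so it waits for foo only.
        combined = combo(foo_future.result())
        return combined, bar_future.result()


print(baz(3))  # (16, 6)
```

Because none of the functions touch shared state, the result is the same no matter how the two submitted calls are interleaved.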

    Now let’s turn our attention to explicit parallelism, where the programmer explicitly programs parallelism. This is done by creating parallel execution sequences called threads that are scheduled by the operating system to execute on the available processors.

    In the Neumann style, the problem that arises with explicit parallelism is that since all threads share the same common store, they need to be careful not to overwrite the data of other threads. The canonical way to handle this has been to put checks (locks) around code that accesses shared data, to ensure that only a single thread has access to the shared data at any point in time. This blocking behavior in turn leads to issues like deadlock, where threads wait indefinitely for each other to release access to shared locations in the store. Further, the complexity of this solution grows rapidly with the number of threads, the number of shared data areas and the number of places where the shared data is accessed.
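A minimal sketch of the lock-around-the-code approach, written in Python with the standard `threading` module. The counter, the thread count and the iteration count are arbitrary values picked for the example.

```python
import threading

counter = 0                      # shared cell in the store
counter_lock = threading.Lock()  # guards every access to counter


def add_many(times):
    global counter
    for _ in range(times):
        with counter_lock:       # only one thread may update at a time
            counter += 1


threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)  # 40000
```

Every piece of code that touches `counter` has to remember to take the same lock, which is where the complexity described above comes from.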

    A better solution to this problem is to put the lock on the memory itself rather than on the code that accesses the memory. Transactional memory systems (8) have started making their way into mainstream computing, but they're still a long way off from being an ideal solution.

    In 1978 C.A.R. Hoare (9) (best known for inventing Quicksort) wrote a paper titled Communicating Sequential Processes (10). This classic paper introduced a new way to tackle the problem of explicit parallelism, which is in a way similar to the model used by functional programming languages. In this model concurrent processes share state information by passing messages to each other rather than by sharing a common store. These messages can be passed asynchronously or synchronously. Thus the whole problem of controlling access to shared data disappears.
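Here is a small sketch of the message passing style using Python threads and `queue.Queue`. It is neither Hoare's CSP notation nor Erlang: Python threads do share memory, so the isolation is by convention only, and `queue.Queue` is a buffered (asynchronous) channel. The producer/consumer names and the sentinel value `None` are choices made for this example.

```python
import queue
import threading


def producer(outbox, numbers):
    for n in numbers:
        outbox.put(n)   # send a message
    outbox.put(None)    # sentinel: no more messages


def consumer(inbox, results):
    total = 0           # private state, never touched by another thread
    while True:
        message = inbox.get()  # blocks until a message arrives
        if message is None:
            break
        total += message
    results.put(total)  # report the answer as a message too


channel = queue.Queue()
results = queue.Queue()
workers = [
    threading.Thread(target=producer, args=(channel, range(1, 101))),
    threading.Thread(target=consumer, args=(channel, results)),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

total = results.get()
print(total)  # 5050
```

There is no lock in sight: the only thing the two threads have in common is the channel they talk over.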

    There is one language today that combines aspects of functional programming and message passing concurrency to form an ideal distributed computing platform. That language is Erlang (11) from Ericsson's laboratories. This language and platform came about as the result of research in the 1980s to find out which features of computer languages were suited for programming telecommunication systems. In addition to concurrency, this language has some other cool features like error recovery and hot updates that enable any system built using Erlang to keep on running. Joe Armstrong, the brains behind Erlang, has written a very good article (12) on all the fuss about Erlang and I would encourage anyone who is interested to take a look at it.

    I had originally intended for this article to be an introduction to Erlang but ran out of my 1000 word limit even before I could get past the introduction, so I decided to pick the one big feature of Erlang, which is concurrency, and focus on that. In terms of support for reliability and concurrency, Erlang is unrivaled and I believe that in the years ahead some aspects of Erlang or Erlang itself will become mainstream the way Ruby has.

    References
    1. www.cise.ufl.edu/research/revcomp/physlim/PhysLim-CiSE/PhysLim-CiSE-5.doc - Physical limits of computing
    2. http://www.gotw.ca/publications/concurrency-ddj.htm - The free lunch is over
    3. http://en.wikipedia.org/wiki/Von_Neumann_architecture - The Von Neumann architecture
    4. http://en.wikipedia.org/wiki/Turing_machine - Turing machine
    5. http://en.wikipedia.org/wiki/Alan_Turing - Alan Turing
    6. http://en.wikipedia.org/wiki/Alonzo_Church - Alonzo Church
    7. http://en.wikipedia.org/wiki/Lambda_calculus - Lambda calculus
    8. http://en.wikipedia.org/wiki/Transactional_memory - Software Transactional Memory
    9. http://en.wikipedia.org/wiki/C._A._R._Hoare - C.A.R. Hoare
    10. http://en.wikipedia.org/wiki/Communicating_sequential_processes - Communicating Sequential Processes
    11. http://www.erlang.org/ - Erlang
    12. http://www.pragmaticprogrammer.com/articles/erlang.html - More about Erlang
    13. http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=10142 - Concepts, Techniques, and Models of Computer Programming - This is an amazing book that talks about the various models of computation.
     

  • First alternative to google that I actually liked

    SearchMe - Still in beta and I was able to get an account easily. Check out the screenshot of their search results

    Not only are the results categorized but I loved the visual search results that don't require me to open the page in another window to further filter interesting results.

  • How to leak memory in .Net - Statics and Event Handlers

    For the past few days I’ve been investigating some memory leak issues in our desktop application. The problem started showing up when we saw that opening new documents and then closing them did not bring the memory usage back down. Initial tests using vadump and Process Explorer confirmed that there was an issue, and so we, the developers, started looking into it.

    Initially it looked like the problem was that certain event handlers were causing references to closed documents to hang around. A couple that I remember are Application.Idle and SystemEvents.PowerModeChanged. Next there were some references via event handlers that were being held by Singleton objects and some service objects, and those were easily handled as well.

    After this we could see that the references were still hanging around. By the way, all this while we were using a combination of ANTS Profiler, WinDBG (dumpheap + gcroot) and the VS.Net debugger to investigate the issue. After fixing the obvious issues we struggled to find the root cause of the memory leaks. Then I looked around for alternate profilers and came across .Net Memory Profiler from SciTech, which gave a much clearer picture of the issue. The new profiler gives you allocation stacks for all references and the ability to reflect over the instance fields, and using this I started seeing that two third-party components that we were using were causing the issue.

    Basically both third-party components, one a very well known UI toolkit and the other one that provides skinning support for controls, were storing references to controls in static hashtables. In one toolkit a bug in the code caused the reference from the hashtable to remain even after it was no longer required. In the second case I think those guys just didn’t know how to remove an entry from a hashtable: they were simply setting the value for the key to null, which left the key itself (the control) still referenced by the static hashtable.
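The second bug is easy to reproduce in any language with a garbage collector and hash tables. The sketch below uses Python rather than .Net, with a module-level dict standing in for the static hashtable; `Control`, `close_buggy` and `close_fixed` are names invented for the illustration and have nothing to do with the actual third-party code.

```python
import gc
import weakref


class Control:
    """Stand-in for a UI control."""


registry = {}  # plays the role of the static hashtable


def close_buggy(control):
    registry[control] = None     # value cleared, but the key is still stored


def close_fixed(control):
    registry.pop(control, None)  # entry removed, key and all


def is_alive(ref):
    gc.collect()
    return ref() is not None


control = Control()
control_ref = weakref.ref(control)  # lets us observe the object without keeping it alive
registry[control] = "skin data"

close_buggy(control)
del control
alive_after_null = is_alive(control_ref)    # the table's key keeps it alive

control = control_ref()                     # fetch it back through the weak reference
close_fixed(control)
del control
alive_after_remove = is_alive(control_ref)  # nothing references it any more

print(alive_after_null, alive_after_remove)  # True False
```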

    We now need to hack around these issues either by using Reflection or getting an updated build from the vendors.

    Some of the lessons I’ve learnt from this

    • Always, Always have source code for any third-party component that you are using in your application. For any non-trivial usage you’ll always end up fixing bugs in the component.
    • When putting any data in static fields, double check to make sure that it’s really required, and keep in mind the memory impact of the decision. Also provide a clean API to clean up the static data.
    • If you’re hooking onto an event then make sure to unhook when it’s no longer required.
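The event handler lesson can be sketched the same way. This is a Python model of the situation, not .Net code: `Event` is a hand-rolled class that holds strong references to its handlers, `application_idle` plays the part of a long-lived static event such as Application.Idle, and `Document` and `leaks_after_close` are hypothetical names for the demonstration.

```python
import gc
import weakref


class Event:
    """Minimal event: a list of strong references to handlers."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def unsubscribe(self, handler):
        self._handlers.remove(handler)


application_idle = Event()  # long-lived, like a static event


class Document:
    def open(self):
        application_idle.subscribe(self.on_idle)

    def close(self, unhook):
        if unhook:
            application_idle.unsubscribe(self.on_idle)

    def on_idle(self):
        pass


def leaks_after_close(unhook):
    doc = Document()
    doc_ref = weakref.ref(doc)
    doc.open()
    doc.close(unhook)
    del doc
    gc.collect()
    # Still reachable means the event's handler list is holding on to it.
    return doc_ref() is not None


print(leaks_after_close(unhook=False))  # True: the closed document is still referenced
print(leaks_after_close(unhook=True))   # False: unhooking released it
```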
  • .Net framework hotfix wreaks havoc

    Last week all of us were baffled when suddenly one part of our application that uploads files to an FTP server stopped working. The strange thing was that the same build had been working without any issues for the past week. We looked at everything that could have gone wrong (server, configuration, code) but everything was set up fine and hadn't been changed. Interestingly, it stopped working for everyone except the developer who was responsible for the feature.

    The first thing we did was to enable detailed logging to see what was happening. The logs showed two problems:
    1. We were incorrectly formatting the path of the file to upload
    2. The .Net framework code was changing folders after login to the root folder of the FTP server, where it didn't have permissions to upload the file
    Further investigation showed that the second issue did not occur on the developer's machine. Most puzzling indeed.

    Then I remembered that last week Microsoft had issued a critical hotfix for .Net 2.0. Could this be the issue? We verified that the developer didn't have the hotfix and that all the machines which were failing did. Strike 1! Next we uninstalled the hotfix from one of the machines and the FTP uploads started working. Strike 2!! Finally we fixed the incorrect formatting of the FTP url and the issue got resolved on all machines, with or without the hotfix. Strike 3! Issue resolved!

    The problem was that the hotfix changed the implementation of the FTP code inside the .Net framework so that it behaved differently when passed an incorrectly formatted url.

    This was the first time I saw a working app fail because of the way an incorrect argument was handled by a newer version of the framework. It was a good learning experience though :) Also this strengthens my belief in asserting all assumptions in code, because if we had asserted that the url was in fact of the format that we were expecting, this issue would never have happened in the first place.
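As an illustration of what such an assertion might look like, here is a sketch in Python using the standard `urllib.parse` module. The expected format (ftp scheme, a host, an absolute path with no empty segments) is purely my assumption for the example; the post does not say what the real formatting mistake was, and `checked_ftp_url` and `require` are hypothetical helpers, not part of any framework.

```python
from urllib.parse import urlparse


def require(condition, message):
    """Fail loudly as soon as an assumption does not hold."""
    if not condition:
        raise AssertionError(message)


def checked_ftp_url(url):
    """Return url unchanged if it has the form ftp://host/dir/file."""
    parts = urlparse(url)
    require(parts.scheme == "ftp", f"not an ftp url: {url!r}")
    require(bool(parts.hostname), f"missing host: {url!r}")
    require(
        parts.path.startswith("/") and "//" not in parts.path,
        f"malformed path: {url!r}",
    )
    return url


# A well-formed url passes straight through (example.com is a placeholder host).
print(checked_ftp_url("ftp://example.com/uploads/report.txt"))

# A url with a doubled slash in the path is rejected before it reaches the framework.
try:
    checked_ftp_url("ftp://example.com//uploads/report.txt")
except AssertionError as error:
    print("rejected:", error)
```

With a check like this at the call site, the bad url would have failed on every machine from day one instead of only after the hotfix.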
  • In defense of hacking

    I read a very interesting essay today - Hacknot - To Those About to Hack
    that talks about why planning upfront always pays in the long run. There is a very nice story that illustrates the value of planning upfront.

    I think that when people write essays like this they tend to provide an analogy that suits the point that they’re trying to make, e.g. in this case pro-BDUF (Big Design Up Front) and Agile bashing.

    There are a couple of reasons why the analogy is not quite relevant in this case. Firstly, software is not like chopping wood and it’s not like construction; in fact any comparison of software development with any kind of physical object creation is flawed. Physical objects have limitations with respect to the time and effort required to shape them or construct them. The values of these physical constants are irrelevant when it comes to software and in some cases the physical limitations do not exist at all.

    Secondly, the requirements in almost every software product that I’ve worked on have changed after the initial code was implemented, first because the customer/user is himself not very clear on what is required. It’s quite difficult for a CS grad to describe a large state machine, let alone a layman. Also, usability design is itself an iterative process, so for a product with a UI the requirement churn rate is absurdly high when compared to any physical engineering activity.

    So given that requirements are bound to change, doesn’t it make sense to practice the one thing that you know for sure is going to happen, i.e. change? No amount of planning is going to prepare you for change; you need to practice for change day in and day out by following the training regime of agile methods. TDD, pair programming, daily meetings, refactoring, rejection of BDUF etc.: these things prepare the programmer for the inevitable.

    I can imagine a version of the story that favors the Agile camp in which the carpenter in the middle of the day decides that he does not need a big log of wood at all! But the fact that wood chopping is a physical activity again prevents me from going ahead on that analogy.

  • What they don't teach you in CS class

    Software Engineering!!!

    A scientist builds in order to learn; an engineer learns in order to build. - Fred Brooks in the Mythical Man-Month.

    Following up on my post about the need for a CS degree for programmers, I had started writing this post on how software engineering requires a different set of skills than what is required for a computer scientist. But then I saw that most of what I wanted to say had already been very well captured by a lot of other very famous people, so instead of reiterating, I’ll be posting links to some good reads on this topic.

    But before that, here is a quick summary of what I think are the most important skills for a programmer. Some of them are shared with the skills required by a computer scientist, and some are not.

    • Given a system, have a very good understanding of its rules. The systems that a programmer typically works with are the language, the OS, the implementation platform (Java, .Net, Python etc.) and libraries. This knowledge is essential when writing code as well as when debugging issues. Most good programmers have encyclopedic knowledge of the systems that they’re working with; one of the best examples that I can think of is Raymond Chen.
    • Be able to come up with efficient ways to get a particular task done using the rules of the system. I think this is something that you’re born with, and although it can potentially be learnt, I think the best programmers have an innate talent for this aspect of programming. Some common techniques for solving problems are taught in CS class, but the ones most used in reality are mostly based on common sense. One of the most excellent books that I’ve read on abstract problem solving is “How to solve it: Modern Heuristics” by Zbigniew Michalewicz and David B. Fogel
    • Be able to express their thoughts in a manner that can be easily understood by other programmers. This aspect is something that can only be learnt by experience. This is one area that is very important yet gets very little weightage in CS class. I’ve seen some extremely unreadable code, that when deciphered showed extraordinary problem solving ability. For examples, browse some of the solutions submitted by top rankers at TopCoder.com. One of the best books on this aspect and my recommendation as a first book for any programmer is Code Complete by Steve McConnell
    • Be a good problem solver. This includes having related abilities like systematic elimination of possibilities to reach a solution, hypothesis testing to narrow down causes etc. This again is something that you’re born with and can potentially be learnt to some level. The best book on this aspect that I’ve read is Debugging Applications by John Robbins of NuMega. Although this book is Windows-specific, the chapters that deal with debugging strategies and techniques to prevent bugs are invaluable.
    • Use your experience to prevent mistakes. This is another area about which very little is written but you can easily make out professional code by the way in which bugs are fixed. Newbies tend to fix the bug at the point of its happening and that’s it, a professional on the other hand thinks about what caused this kind of error to get introduced in the first place and then puts in checks to ensure that similar kinds of issues don’t enter into code and if they do then get flushed out immediately. Also when faced with similar types of problems, good programmers are able to look at the meta problem and come up with reusable solutions for them.
    • Experience, nothing can compare to having written and maintained 1000000+ lines of code.

    And here are the links on the CS vs SE question…
