Tales from the Evil Empire

Bertrand Le Roy's blog

February 2005 - Posts

Clean Office automation on the server... at last!
One of the very common requests we see on the ASP.NET forums is how to generate Excel or Word documents on the server. There are currently three approaches to this need:
  1. Output HTML or XML and just change the MIME type so that the relevant Office application opens the stream. Office applications are quite HTML- and XML-friendly, so chances are you'll get a pretty good result without taxing server resources much. But it's hacky to say the least, and what you get is not a real Office document, just an HTML or XML document opened in Office. This means that you won't be able to use most of the features of the Office application (like formulas in Excel, which is quite a large drawback). If you're brave, you may generate a proper Office XML format (the keywords here are WordprocessingML and SpreadsheetML), but you may want to fall back on option 2:
  2. Use a third-party server library that generates well-formed Office documents. There are quite a few floating around. Google (er, MSN Search) is your friend.
  3. Install Office on the server, spawn one of the Office applications and automate it from the server. You shouldn't do this if you can avoid it, as this KB article explains. The problems (apart from licensing) are due to Office applications being built to be desktop applications, not scalable server components. Read the KB article for more details, but in a nutshell, you'll have to deal with singletons, queues, cleanup procedures, etc. and even if you do it relatively cleanly, it will perform poorly. It just does not seem worth the trouble. The Office Web Components are also client-side objects that won't give good results server-side.
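To make the SpreadsheetML route from option 1 concrete, here is a minimal sketch of a generator. It is written in JavaScript only to keep a single language for the examples on this page; the same string building works in C# or VB. The helper names (escapeXml, cellXml, buildSpreadsheetML) are hypothetical, and the element and attribute names follow the Excel 2003 XML Spreadsheet schema as I understand it, so check them against the official reference before relying on them. The interesting part is the total row, which carries a live Excel formula, something the plain HTML trick does not give you.

```javascript
// Sketch only: builds an Excel 2003 XML (SpreadsheetML) workbook as a string.
// Assumption: element/attribute names follow the SpreadsheetML 2003 schema;
// the helper names below are hypothetical, not part of any library.

// Escape the characters that are special in XML text and attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// One cell: numbers are typed as Number so that Excel can compute with them.
function cellXml(value) {
  const type = typeof value === 'number' ? 'Number' : 'String';
  return '<Cell><Data ss:Type="' + type + '">' + escapeXml(value) + '</Data></Cell>';
}

// rows: array of [label, number] pairs. If there is at least one row, a "Total"
// row is appended whose second cell is a formula in R1C1 notation that sums
// the same column over all the data rows above it.
function buildSpreadsheetML(sheetName, rows) {
  const lines = rows.map(row => '<Row>' + row.map(cellXml).join('') + '</Row>');
  if (rows.length > 0) {
    lines.push('<Row>' + cellXml('Total') +
      '<Cell ss:Formula="=SUM(R[-' + rows.length + ']C:R[-1]C)"/></Row>');
  }
  return [
    '<?xml version="1.0"?>',
    '<?mso-application progid="Excel.Sheet"?>',
    '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"',
    '  xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">',
    '<Worksheet ss:Name="' + escapeXml(sheetName) + '">',
    '<Table>',
    ...lines,
    '</Table>',
    '</Worksheet>',
    '</Workbook>'
  ].join('\n');
}
```

On the server you would then write this string to the response with an Excel-friendly content type (application/vnd.ms-excel is the usual choice), so that Excel opens it as a real workbook rather than as an imported HTML page.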
Starting with Visual Studio Tools for Office (VSTO) 2005, server-side objects are provided that solve this problem. It's like option 2, only done by Microsoft.
 
Check out the article here:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/odc_vsto2005_ta/html/OfficeVSTOServerCapabilities.asp
Brian Goetz on micro-benchmarks
Brian Goetz writes about micro-benchmarks and discourages people from writing any. The first part of the article details why one particular benchmark is flawed, which is only mildly interesting unless you care about lock performance in Java (and why wouldn't you?), but the second part gives excellent advice on performance testing in general.
I thought I would point to his article, as I've been guilty of micro-benchmarking on this blog more than once. In my defense, I've always said that these results give only a rough idea of performance in a real-life scenario, and that any performance testing should be done in the context of the real application.
The problem for us API developers is that our users ask for guidance on when to use this or that particular technique. We can still give some general answers based on reasonable micro-benchmarking and analysis of the IL code in some very simple cases, and that's enough for most users. But nothing will ever replace a good profiler and a lot of experimentation on the real application when it's being used in real-life conditions.
 
Read the article here:
http://www-128.ibm.com/developerworks/java/library/j-jtp02225.html?ca=drs-j0805
More on string concatenation performance
This one is kind of obvious when you think about it, but I've seen code like this so many times it's probably worth showing. Strangely enough, I've seen this kind of code in VB most of the time.
The reason why people write this kind of code is to construct a SQL query while keeping the source code easily readable by splitting the string across several lines of code. The code looks like this:

string a = "Some string that we want to build";
a += " from several strings for better";
a += " source code readability:";
a += " we don't want a long string on one line.";
 
Notice how the string has each new bit appended to itself using concatenation (in VB, you would use &= instead of +=). This is of course not very efficient, because each += performs a real concatenation at run time and allocates a new string. Here's the IL that this code compiles to:
 
 ldstr "Some string that we want to build"
 stloc.0
 ldloc.0
 ldstr " from several strings for better"
 call string [mscorlib]System.String::Concat(string, string)
 stloc.0
 ldloc.0
 ldstr " source code readability:"
 call string [mscorlib]System.String::Concat(string, string)
 stloc.0
 ldloc.0
 ldstr " we don't want a long string on one line."
 call string [mscorlib]System.String::Concat(string, string)
 
It's much more efficient to split the expression across several lines without ending the statement, this way:

string a = "Some string that we want to build"
    + " from several strings for better"
    + " source code readability:"
    + " we don't want a long string on one line.";
 
The IL that this code compiles to is exactly what you'd get if you had left the string on one line, because the compiler knows that only string literals are concatenated here and joins them at compile time:
 
 ldstr "Some string that we want to build from several str"
 + "ings for better source code readability: we don't want a long string on"
 + " one line."
 
The + signs here are an artifact of ILDASM, which splits long strings for readability much as we are trying to do in our source code; they are not real concatenations. Notice how the lines are split at different places than in our source. To make it perfectly clear: the + signs here are *not* real. If you look at the exe file with a hex editor, you'll see that the string is in one piece. No quotes, no concatenation, nothing.
 
So you get both readability and optimal performance.
 
In VB, of course, you can use the line continuation symbol (underscore: _ ) and the & operator. The compiler will also optimize it to a single string.
Three common mistakes in JavaScript / EcmaScript
Here are three common mistakes I've seen recently in script files.
  1. Undefined is not null, except that it is.
    If you've been writing code in a strongly-typed language recently, you're used to checking the nullity of objects before you use them, like this:

    if (SomeObject.foo !== null) {

    Well, in JavaScript, something that has never been assigned is not null, it's undefined. Undefined is different from null when using !== but not when using the weaker !=, because loose equality treats null and undefined as equal. Anyway, you can use typeof to check explicitly for undefined, or use the weaker equality operators, but the shortest way to deal with this, and also the one that best expresses your intention of checking whether an object is safe to use, is probably to rely on the type-sloppiness of JavaScript and count on it to evaluate null and undefined as false in a boolean expression (just remember that 0, "" and NaN are false in that context too), like this:

    if (SomeObject.foo) {

    It's very important to keep the undefined case in mind. Another case is when you expect a function to return a boolean value. What if the function forgets to return a value in some cases? Its return value is then undefined, which evaluates as false. So if your own default value should be true, you should really write this:

    if (SomeFunction() !== false) {

    This is different from if (SomeFunction()). Note also the strict inequality here, which protects you from oddities such as "" == 0 being true.
    But let me summarize and potentially add to the confusion before we move on to the next trick. Here is what each condition evaluates to when SomeObject.foo is undefined and when it is false:

    Condition                     foo is undefined   foo is false
    (SomeObject.foo)              false              false
    (SomeObject.foo != null)      false              true
    (SomeObject.foo !== null)     true               true
    (SomeObject.foo != false)     true               false
    (SomeObject.foo !== false)    true               false
     
  2. You can't overload a function.
    Developers who are used to languages like Java and C# overload methods all the time. Well, in JavaScript, there are no overloads, and if you try to define one, you won't even get an error. A later definition of a function with the same name simply replaces the earlier one, so every call goes to the last-defined version, whatever arguments you pass.
    You simulate overloading in two ways. First, an omitted parameter is undefined. Second, there is a special variable, arguments, an array-like object holding the values actually passed to the function. Based on the number and types of the arguments, you can do different things. But it's kind of ugly.
     
  3. Undeclared variables are global.
    Always, always declare your variables using the var keyword. If you don't, your variable is global. So anyone who makes the same mistake as you (or, more likely, you yourself in two different places) will create conflicts that give rise to very hard-to-track bugs. Even loop counters should be properly declared.
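The following small script checks the claims in points 1 and 2: it evaluates each condition from the table for an undefined and a false value, shows the "default to true" test, and simulates overloading with an omitted parameter and with the arguments object. All function names (evaluateChecks, isEnabled, demoNoOverloads, area, formatValue) are made up for illustration.

```javascript
// Point 1: evaluate each condition from the table for a given value of foo.
function evaluateChecks(foo) {
  return {
    truthy: Boolean(foo),            // if (SomeObject.foo)
    looseNotNull: foo != null,       // if (SomeObject.foo != null)
    strictNotNull: foo !== null,     // if (SomeObject.foo !== null)
    looseNotFalse: foo != false,     // if (SomeObject.foo != false)
    strictNotFalse: foo !== false    // if (SomeObject.foo !== false)
  };
}

const someObject = {};               // someObject.foo was never assigned
const whenUndefined = evaluateChecks(someObject.foo);
const whenFalse = evaluateChecks(false);

// Hypothetical function that forgets to return a value on one path.
function isEnabled(option) {
  if (option === 'off') {
    return false;
  }
  // no return here: the caller gets undefined
}

// Default-to-true test: only an explicit false counts as "no".
const enabledByDefault = isEnabled('on') !== false;   // true
const enabledLoosely = Boolean(isEnabled('on'));      // false: undefined is falsy

// Point 2: a later declaration silently replaces an earlier one, no error.
function demoNoOverloads() {
  function area(side) { return side * side; }
  function area(width, height) { return width * height; } // this one wins
  return area(3);                                        // 3 * undefined -> NaN
}

// Simulated overload, first technique: an omitted parameter is undefined.
function area(width, height) {
  if (height === undefined) {
    return width * width;            // area(side): square
  }
  return width * height;             // area(width, height): rectangle
}

// Simulated overload, second technique: branch on the arguments object.
function formatValue() {
  if (arguments.length === 0) {
    return '(nothing)';
  }
  const value = arguments[0];
  return typeof value === 'number' ? value.toFixed(2) : String(value);
}
```

For these two particular values, the != false and !== false rows of the table agree; they part ways for values such as 0 or "", which are loosely equal to false but not strictly equal to it.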
There is actually a good way to do some basic sanity checks on your script files (like multiple declarations, forgotten declarations, unassigned variables, etc.): in Firefox, go to the about:config URL and look for the javascript.options.strict entry. Set it to true. Now, point the browser to your JavaScript file. You'll get a lot of new warnings that will point to the problems in your code (if any, but I doubt that you'll get zero warnings the first time you do that).
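The Firefox setting is one way to get these warnings. Engines that implement ECMAScript 5 also support the 'use strict' directive, which turns an assignment to an undeclared variable into a ReferenceError instead of silently creating a global. A minimal sketch, with hypothetical function and variable names:

```javascript
// Forgotten declaration: under 'use strict' this throws instead of
// silently creating a global variable named itemTotal.
function countItemsBuggy(items) {
  'use strict';
  itemTotal = 0;                              // missing var -> ReferenceError
  for (var i = 0; i < items.length; i++) {
    itemTotal++;
  }
  return itemTotal;
}

// Fixed version: the total and the loop counter are both declared locally.
function countItems(items) {
  'use strict';
  var itemTotal = 0;
  for (var i = 0; i < items.length; i++) {
    itemTotal++;
  }
  return itemTotal;
}

var caught = null;
try {
  countItemsBuggy(['a', 'b']);
} catch (e) {
  caught = e;                                 // a ReferenceError, no global created
}
```

Putting the directive as the first statement of a script file applies it to the whole file rather than to a single function.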
UPDATE: removed the useless rant at the beginning (I now quite like JavaScript) and corrected some mistakes in #1.