Archives / 2004 / August
  • [.NET General] John Gough on "Vectors vs. Arrays"

    Below, I present a very interesting post on "Vectors vs. Arrays" in which John Gough describes some important details about what goes on under the covers of certain seemingly minor differences in array notation.  John's post is contained in the archives of the DOTNET-LANGUAGE-DEVS email list on DISCUSS.MICROSOFT.COM but is available there only to list members.  I think this post is of more general interest, so I am making it accessible here.  (A short C# sketch of my own, illustrating the distinction John draws, follows the quoted post.)

    A lengthier explanation is available in John's book, Compiling for the .NET Common Language Runtime.

    Among other things, John Gough is the creator of Gardens Point Component Pascal (gpcp), an Open Source compiler for an object-oriented dialect of Pascal that runs on the .NET Framework.

    NOTE: The following contains quoted material.

    Rod da Silva's Question:

    I was under the impression that there is a very real difference between
    the CLR type int[][] and int[,]. However, I am finding out that they
    both appear to be nothing more than instances of the System.Array
    class. That is, they both exhibit pass-by-reference semantics in that
    I can pass either to a method and modify one of its elements, and the
    change will persist when I return from the method. I was expecting
    int[][] to have pass-by-value semantics.

    Can someone please describe the difference between int[][] and int[,]?
    Also, is there any way to make int[][] have pass-by-value (i.e.,
    value-type) semantics?

    John Gough's Answer:

    good question.

    You are correct, both int[][] and int[,] are reference types.
    I spend some time in my "Compiling for the .NET Common
    Language Runtime" (Prentice Hall 2002) explaining what a
    compiler has to do to get value semantics for its target
    language.

    The difference between the two types can be understood as
    follows. One-dimensional arrays of any type are a primitive
    for the CLR. Thus int[] is a <<reference>> to an array of
    int. The type int[][] is a reference to an array of
    references to int[] arrays. It is thus a "ragged array", and
    if you want it to be a normal two-D array then in CIL the
    initializer must explicitly create each component int[] array
    to be the same length. Of course in some languages the
    compiler may hide this away from the programmer. Note that it
    follows that creating an array, say int[8][8], will require a
    total of nine(!) objects to be allocated.

    The type int[,] is not a built-in type of the execution
    engine, although the JIT does need to know about it.
    Instead it is one of the possible forms of System.Array. In
    brief, the memory allocated for such an array will be in one
    glob, and requires just one object creation. The only
    downside is that you cannot access the elements of such an
    array using just the raw instruction set of the CLR. It is
    necessary to call functions of System.Array and hope that
    the JIT gets to be clever enough to inline the code.

    Finally, how to get value semantics. Reading my book may
    help you write a compiler to do the trick, but if you are
    stuck with a language that does not do it for you then you
    need to write a method for each type, such as

    int[][] CopyOf(int[][] x) {
        // allocate a correctly sized collection of 1-D arrays
        int[][] copy = new int[x.Length][];
        // now copy the elements of each row, then return
        for (int i = 0; i < x.Length; i++)
            copy[i] = (int[])x[i].Clone();
        return copy;
    }

    So that instead of saying
    SomeMethod(myArray);
    you go
    SomeMethod(CopyOf(myArray));

    Hope this helps.

    John Gough
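
    To make John's distinction concrete, here is a small C# sketch of my
    own (not from John's post) showing both kinds of allocation:

    class ArrayDemo {
        static void Main() {
            // Ragged array: one outer array of references plus one object
            // per row -- nine allocations in all for an "8 x 8" array.
            int[][] ragged = new int[8][];
            for (int i = 0; i < ragged.Length; i++)
                ragged[i] = new int[8];   // each row is created explicitly
            ragged[2][3] = 42;

            // Rectangular array: one glob of memory, one allocation; element
            // access goes through helpers the JIT can hopefully inline.
            int[,] rectangular = new int[8, 8];
            rectangular[2, 3] = 42;
        }
    }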

  • [General] Exploratory Data Analysis (EDA)

    I have long been fascinated by Exploratory Data Analysis (EDA), a very creative new statistical methodology that differs substantially from what most people know as statistics. 

    Most tools in the normal statistician's kit are intended to help analysts confirm the results of statistical experiments or to validate a hypothesis via statistical manipulation of pre-existing data.  We can classify these approaches as "confirmatory statistical analysis."  The "standard" confirmatory statistical techniques are only suitable if the problem under study meets the very specific requirements and assumptions upon which parametric statistical theory is based.  Frequently, people -- including many professional statisticians who should know better -- blindly misuse the normal tools (e.g., mean and standard deviation) on data sets that do not come close to meeting the required conditions (such as having a normal distribution, etc.).  Only rarely can standard parametric statistical methods be used effectively to perform initial explorations on unknown batches of numbers.

    John W. Tukey, in his great classic text, Exploratory Data Analysis, gave us some cool tools for exploring data.  Sometimes, you end up with a bunch of data and have absolutely no idea what might be "in there."  Tukey's methods included some very interesting graphical techniques, such as "stem and leaf diagrams" and "box plots," that stand as excellent early examples of modern data visualization.  I must hasten to add that many of the EDA techniques are not only effective but fun to do.  I strongly recommend EDA to absolutely anyone who must even occasionally attempt to find that elusive "something" in a batch of numbers.
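
    Just for fun, here is a minimal C# sketch (with made-up data) of the
    idea behind a stem-and-leaf display: the tens digit of each value
    becomes a stem, and the units digits line up as leaves:

    using System;

    class StemAndLeaf {
        static void Main() {
            int[] batch = { 12, 15, 21, 24, 24, 31, 33, 38, 40, 47 };
            Array.Sort(batch);
            int stem = int.MinValue;
            foreach (int x in batch) {
                if (x / 10 != stem) {
                    // start a new stem line when the tens digit changes
                    stem = x / 10;
                    Console.Write("{0}{1} |", Environment.NewLine, stem);
                }
                Console.Write(" {0}", x % 10);   // append a leaf
            }
            Console.WriteLine();
        }
    }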

    I consider it one of the canonical examples of the unfairness of the universe that Tukey's text appears to be out of print and is now somewhat difficult to find.  You can easily locate any number of derivative works but, IMNSHO, the true classics in any field should *never* be allowed to go out of print -- and Tukey's "orange book" certainly qualifies as one of those.  Find it in some library somewhere and just take a look at it; I think you will agree.  Even the format and layout of this book is creative, special, and clear.  But the techniques themselves are things of beauty, developed by that extremely rare type of statistician, one who actually tried to do real things with real numbers.

    John W. Tukey died on July 26, 2000.  He certainly deserves to be ranked as one of the most influential statisticians of the late 20th century.  Oh, and by the way, you might be interested to know that it was John W. Tukey who coined the term "software" -- its first appearance in print was in 1958.

    The immediate motive for this post is that I just discovered two nice introductory sites about EDA that I had not previously seen:  Exploratory Data Analysis and Data Visualization, by Dr. Chong Ho (Alex) Yu, and the Exploratory Data Analysis section of the free online Engineering Statistics Handbook, provided by the Information Technology Laboratory (ITL) of NIST.  These resources provide excellent introductions and give the beginner a great starting point.

    Enjoy!

  • [Security] Major Cryptographic Algorithms Broken by Quantum Bogodynamics

    It is definitely not April Fools' Day, but the article Crypto researchers abuzz over flaws will probably make you think it is.  As if all of the nasty viruses and worms and buffer overruns of late aren't enough, now MD4, MD5, HAVAL-128, RIPEMD, SHA-1, and other basic cryptographic algorithms currently in heavy production usage are under severe mathematical attack. 

    I think the only reasonable non-Occamian (Null-O) theory is that we must have recently experienced a serious rise in bogon flux density.  It's obvious (TM) that bogons and psytons have started poking their holes not only through electronic equipment but also even through basic theories and abstractions of all types.  Quantum bogodynamics has evolved into the abstract realm!  Start boning up on your quantum compudynamics or we are surely lost. Hmmmmmm?  Perhaps we're lost, anyway.

    "Caveat everybody!  She's gonna' blow!" 

  • [.NET General] Return of the ... STL!

    Coming from a C++ background, I find it interesting to see that the Standard Template Library (STL) has been reincarnated as STL.NET for VC++ 2005.  Stanley B. Lippman provides a new STL.NET Primer that not only introduces the undead beast but also gives some rationale for its .NETification.  Lippman's article is the beginning of a series on STL.NET.  Evolving STL into STL.NET is a Good Thing (TM), but it is not enough.

    When STL first came into existence, it quickly became an essential tool for C++ engineers, myself included.  I also derive from a Smalltalk lineage and have grown used to Smalltalk's powerful collection class libraries, so the STL collections (containers) were certainly welcome, limited as they were.  Alexander Stepanov's approach to providing templated algorithms was quite revolutionary and represented a very creative contribution at the time.  I welcome the aspirations of the venerable STL to join the .NET generation because it will provide a much-needed migration path for C++ engineers who have invested time and energy in learning these powerful tools.  I can only hope Lippman is right and that STL.NET will play well with others (C# and VB.NET and the many other new CLS-compliant languages).  I eagerly await the other articles in his series to find out.

    My main thesis here, though, is that STL.NET is not enough.  Making STL.NET available for C++ is a nice gesture, but what about the rest of the CLS/CLR world?  Collections are important!  A solid set of collections can make architecture and engineering much easier and you quickly find that collection abstractions enhance your thinking ability.  I sincerely hope that Microsoft will bite the bullet and take this opportunity to rework the whole idea of collections intelligently. 

    The original System.Collections offering was, frankly, not even a respectable token gesture (the little sketch below shows the kind of thing I mean).  I'm not sure Microsoft has ever taken collections seriously.  Even the Java Collections Framework (JCF) is arguably better than .NET's collections.  Does Microsoft have any empowered, strong advocate for putting a real set of collections into the .NET Framework?  If so, I would like to know that person's name!
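
    To make the complaint concrete, here is a minimal C# sketch (a
    hypothetical example of my own) of life with the untyped
    System.Collections:

    using System;
    using System.Collections;

    class CollectionsGripe {
        static void Main() {
            // ArrayList holds object, so every int is boxed on the way in
            // and needs an explicit cast (an unbox) on the way out.
            ArrayList list = new ArrayList();
            list.Add(42);                  // boxed
            int value = (int)list[0];      // cast required
            list.Add("oops");              // any type goes in, so mistakes
                                           // surface only at runtime
            Console.WriteLine(value);
        }
    }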

    Wintellect has started a Power Collections for .NET initiative.  This effort could be a vehicle for work toward a new and effective collections framework.  At least, it is a community where some people want to discuss the matter and where some people are currently writing code.  They are asking for our help.  If you care about collections, you might want to participate.  The opportunity is here and the time is now!

    PS Thanks to Mike Taulty for bringing the STL.NET issue to my attention.

    Update:  Kevin Downs has provided a link to the very interesting C5 collection class library project.  To make sure the various collection class projects at least know about each other, I have forwarded the link to the Power Collections for .NET Class Ideas Forum and have emailed information about Power Collections for .NET to the author of C5.

  • [Security] Defensive Security Programming Resource

    Security-conscious software developers, certainly including .NET developers, should take particular note of the Metasploit Framework released into the wild by Metasploit.  I have to stretch a bit to have faith that this information and toolkit will be used more for good than harm.  Still, with all sorts of very nasty new viruses appearing, ones that can even hop via Bluetooth to your Symbian-enabled cellphone (see SymbOS.Cabir), all of us serious software professionals had better educate ourselves on the tools and techniques being used against us by the denizens of the Dark Side.  Frankly, I think the virus wars have escalated beyond the coping ability of the normal anti-virus vendors and their products.  From what I see, most organizations are absolutely clueless as to the new hazards we face today!  If you care about your users, you will need to work very hard to protect them and your applications from the kinds of tactics demonstrated publicly by Metasploit and similar exploit information sources.  May the Force be with you!

  • [Software Architecture] "Getting from use cases to code," a great article series

    Gary Evans has written Getting from use cases to code Part 1: Use-Case Analysis and Getting from use cases to code Part II: Use Case Design, absolutely the clearest exposition I have yet seen about how to proceed from use cases to an actual design.  As a .NET architect, I can use Gary's articles to cleanly lead user representatives, stakeholders, and managers through the process and can show them how we can get to shippable results by starting with a decent set of use cases.  Gary gets my personal ".NET Architect's Helper of the Month" award for his work on these.

  • [TDD][ASP.NET] Some help for unit testing server-side ASP.NET

    In the "Why didn't I think of this?" category falls a very helpful new article, "Server-Side Unit Testing in ASP.NET: How to create an HttpContext outside of IIS" by Steven Padfield, who did think of it.  I started down this path, once upon a time, but didn't follow through.  Padfield's approach is potentially helpful when you want to integrate NUnit testing with ASP.NET (not an easy task, unfortunately!).