# brianbec's WebLog

### Programming

##### Closures without Closures; Currying without Currying

Last installment, we found we could write a factorial function with a function call in the recursive branch, but not a recursive call. This ‘nearly recursive’ style allows us to write factorial without using its name. We just assume we have a free-variable delegate to some function that performs the rest of the computation. At initialization time, set the delegate to refer to the function itself. Thus, replace recursion with a call through a free variable. Just another level of indirection, as the cliché goes.

We need exactly this technique when we need recursive functions where explicit recursion isn’t allowed, as in lambda expressions.

Well, how about getting rid of the recursive definitions? Why bother with all these gymnastics just to write supposedly elegant functions, when loops and tests will do, eh? Two reasons: many algorithms are very hard to write correctly without recursion, and two, much more importantly, because we can. Part of my job is to explore all the implications of our technology. If something is possible, I have to try it. And I get to have fun doing it, too.

There is endless discussion of recursion versus iteration. It’s always possible to transform recursion into tail-recursion, and most compilers automatically convert tail-recursion into iteration. But many algorithms are very difficult, even for experts, to write straight-up correctly in iterative form. Quicksort and merge sort are famous, and binary search is infamous. Just about any graph or tree algorithm is best written recursively. Finally, when coding, it’s usually best to transcribe recursive specifications as recursive programs, notwithstanding notorious counterexamples like Fibonacci.

So, let’s take it as given that programmers need recursion as an intellectual tool for writing correct programs faster, secure in the knowledge that there are many ways to attack problems that might arise. That’s the real justification for studying toy problems like factorial: they represent much larger challenges, but in a small form that promotes full understanding of the principles at play.

Last time, we indulged in imaginary extensions to VB, allowing us to curry a function applied to itself, to create a closure containing the delegate to call in place of the recursion. The closure part is very easy to model in non-imaginary VB. Consider

Delegate Function df(ByVal n As Long) As Long

That’s the type needed for a factorial function. Next, define one as something that creates a new instance of Class cf, short for closure of f, which loads a private instance variable (closure environments are externally invisible) then calls a pseudo-lambda of type df in that closure. This is our somewhat clumsy transcription of the imaginary Dim fact = f(f), which presupposes currying. Don’t worry, it will get less clumsy as we go along.

Function fact1(ByVal n As Long) As Long

Return (New cf1).lambdaF(n)

End Function

Class cf1

REM This is a free variable in lambdaF:

REM The following is our simulation of self-application:

Sub New()

End Sub

REM Here is the non-recursive version, that calls itself

REM indirectly through the delegate stored in the closure:

Public Function lambdaF(ByVal n As Long) As Long

If n = 0 Then : Return 1

Else : Return n * adf(n - 1)

End If

End Function

End Class

This is pretty easy, and accounts for the idea of a closure, modeling it as an instance of a class that contains bindings for free variables. Of course, the closure here is statically special-cased, but a generalized technique is all-but-obvious:

Function fact2(ByVal n As Long) As Long

Return (New cf2).lambdaF(n)

End Function

Class cf2

Private env As New Dictionary(Of String, df)

Sub New()

End Sub

Public Function lambdaF(ByVal n As Long) As Long

If n = 0 Then : Return 1

Else : Return n * Me.env.Item("adf")(n - 1)

End If

End Function

End Class

or, even more generally

Class dfFreeVars

Protected env As New Dictionary(Of String, df)

End Class

Class cf2

Inherits dfFreeVars

...

Let’s say the point is made: we can easily implement environments for free variables, thus we have that part of closures licked. Henceforth, since environments will be very small, let’s not bother with the full generality of dictionary types in superclasses, remembering that we can easily do this when we need the generality.

We still have the vexing problem that we don’t really have lambda, self-application, and currying, just a rather lame simulation for them. Lambda will take us far into high orbit in another installment, but currying and self-application we can fake up right now:

Function fact3(ByVal n As Long) As Long

Dim acf3 = New cf3 REM Make a var so we can self-apply:

REM chained calls represent currying

End Function

Class cf3

REM This type represents functions that take (delegate)

REM functions as their first argument and return (delegate)

REM functions of a different type:

Delegate Function dh(ByVal adh As dh) As df

REM Here's the free variable we've closed over:

Private Function lambdaF(ByVal n As Long) As Long

If n = 0 Then : Return 1

REM Self-application and currying, call-on-call:

End If

End Function

REM The factorial MUST self apply to inflate the closure:

Public Function lambdaH(ByVal adh As dh) As df

REM Ooh, look, mom: a way to break access control!

End Function

End Class

See what’s happened? Fact3 gins up a closure by self-application of the one public method of cf3. That closure returns a (delegate to a) function of a numerical argument, so it’s curried. Fact3 then just calls the curried function. There is just a tiny bit of excess generality here, as lambdaH could be applied to any function of (delegate) type dh; it just so happens we always self-apply it, both when cf3 is instantiated in fact3, and internally to lambdaF, where external self-application is mimicked internally.

There is much, much more to do over the next few installments. First of all, we don’t really have a full lambda yet. Sure, we can make functions that don’t refer to their names internally, but we still refer to them externally. Second, this is all specialized to factorial, so far: we need to make this recursion-elimination technique work for any recursive function whatever. Third, we’ve just replaced call recursion with type recursion. The stuff above only works because type dh can refer to itself. That’s so self-application will work in a static-typing ambience. We could get rid of that through late-binding, and I might show something like that: I haven’t decided yet. But there is another technique.

Filed under: , ,

##### Lambdas, Closures, Currying, and All That

Behind the scenes, I’m working on the lambda-execution mode for the IQ97. I’m hoping for a few tweaks to VB and LINQ before I can do this satisfactorily, but I can take the opportunity now—with the current CTP—to illustrate the general technique of defining functions with lambda and what I hope it might look like in VB some time down the road. This exercise raises a number of interesting technical issues: in particular, closures—environments for free variables, and currying—values of functions when only some of their arguments are supplied. We explore these rather fully in this installment.

Suppose you had a function, call it

Function fact(ByVal n As Long) As Long

that would compute the factorial of any integer, n, at least up to the limits of  the Long data type. This would be fairly easy to define as follows:

Function fact(ByVal n As Long) As Long

If n = 0 Then

Return 1

Else

Return n * fact(n - 1)

End If

End Function

This would be fine for many purposes, but it only works because the function has a name, namely fact. Suppose fact didn’t have a name, or had another name, or your language didn’t allow recursive calls, or you simply refused to call fact inside the body of fact. The scenario where fact doesn’t have a name at all is the lambda-expression scenario, which LINQ uses internally. The other scenarios are easier to understand, so let’s start with them. Just refuse to call fact recursively, now what do we do? Let fact take another argument, which is a delegate function that (refers to a function that) computes the factorial, and write

Delegate Function dg(ByVal n As Long) As Long

Function fact(g As dg, ByVal n As Long) As Long

If n = 0 Then

Return 1

Else

Return n * g(n - 1)

End If

End Function

But fact was supposed to be our function that computes the factorial, so try to call it as follows:

Dim x = fact(AddressOf fact, 6L)

and get a type error: the first argument of fact must be a  function of one argument, and we are trying to pass fact itself in that argument slot, and fact is a function of two arguments. How can we escape this problem? Well, there is an easy answer, and that is to make the first argument of fact be a function of two arguments, exactly of the signature of fact itself, as follows (after juggling names a little bit):

Delegate Function dg(ByVal adg As dg, ByVal n As Long) As Long

Function f(ByVal g As dg, ByVal n As Long) As Long

If n = 0 Then

Return 1

Else

Return n * g(g, n - 1)

End If

End Function

Function fact(ByVal n As Long) As Long

End Function

If you like this answer, then we’re done and have a nice day. But, really, this is just the beginning and there are some lovely treats ahead.

There is something ugly about carrying around g just so it can be applied to itself over and over again in the pseudo-recursive call. Since it’s always g in that slot, except for the very first call, where it’s f, isn’t it a bit excessive to pass it around as an argument? We could lift g out and make it a top-level (module-level) variable:

Dim g As dg = AddressOf f

but I would hope that you would recoil in horror at such a solution. First of all, we would like it to be a constant—it never varies—and VB won’t allow that:

Const g As dg = AddressOf f REM XXXX Not Allowed !!!!

For the same reason, we can’t reverse the argument list and make g an optional argument, because a constant is required:

Function f(ByVal n As Long, Optional ByVal g As dg = AddressOf f)

REM XXXX Not Allowed !!!!

No function but f needs to access g, and it’s programming-style anathema to lift arguments far out of the scope where they’re needed. You should be ashamed of yourself for even thinking it.

Let’s change fonts as we go into magical wishful-thinking land. First, let’s imagine that our programming language allowed us to apply f to its first argument once and for all, but NOT apply it to its second long integer argument, getting a delegate to a NEW function that takes the integer:

Delegate Function dg(ByVal adg As dg, ByVal n As Long) As Long

Delegate Function df(ByVal n As Long) As Long

Function f(ByVal g As dg, ByVal n As Long) As Long

...

Dim fact As df = f(f)

This is called currying. To be a bit more precise about it, currying is a language feature (or just a theoretical construct) that allows you to call a function one argument at a time, left-to-right down the argument list. Each call returns a new function that takes the remainder of the arguments until you’re left with a function of one argument. Its utility is not just to pretty up scenarios like this, but also to support a simple theoretical calculus of programming, in which every function is, at heart, just a function of one argument, albeit one that may return other functions. In this calculus, a function of many variables—a multiary function—is modeled as a chain of unary functions. Of course, the calculus is the lambda calculus.

So, by currying, f(f) returns a new function of n As Long, which matches the delegate type df. We don’t say AddressOf in our fantasy code, because this new function is not represented in the source code. We just assume that our imaginary currying support will produce a delegate instance for us. From this point on, we can call the fantasy fact with a single argument, the number whose factorial we desire. Let’s try to write out what f(f) ought to be, still in our fantasy language:

Function (ByVal n As Long) As Long

If n = 0 Then

Return 1

Else

Return n * g(g, n - 1)

End If

End Function

We have two problems here: one easy and one hard. The easy one, first: the function has no name. It’s not f, because that function takes two arguments. It’s not even an overload of f, because we didn’t actually write it down: we calculated this function by calling f on f. The compiler might create a name for it internally, but we can’t access that name. The syntax above is perfectly reasonable for defining a function with no name. Alternatives might be:

Lambda (ByVal n As Long) As Long

...

(ByVal n As Long) As Long

...

Now the hard problem: g is not visibly defined anywhere—it’s not a local variable, and it’s not a function argument—but we know what it is: it’s f itself, or, more precisely, an instance of delegate type matching the signature of f. Such things—variables whose definitions are not locally obvious—are called free variables. We talked about them before, heaping shame upon anathema for thinking about putting them in some visible scope.

So, the lambda expression above actually includes an invisible environment in which the location of g is stored when the function is defined and from which the value of g can be looked up when the function is called. This pair of a lambda and an environment for free variables is called a closure. It’s important to store the location of g when the lambda function is defined, this is called static binding or lexical scoping. The alternative, in which the location of g is stored at call time, is called dynamic binding or dynamic scoping. While a completely legitimate style, it is no longer favored because it leads to potentially surprising behavior when the particular g depends on what function actually called the lambda containing the free variables. Call f from one place, get one g; call f from another place, get a potentially different g.

Amazingly enough, we can leave the realm of fantasy and actually model currying, closures, and lambdas in the current VB, but it’s non trivial. Let’s leave that for the next installment.

Filed under: , ,

##### IQ97 Deployment Error Fixed

Get new drop: current VB9 Customer Technology Preview (CTP). Copy the database Saturn5_002.mdf and Saturn5_002_log.LDF to c:\. Zipped VB9 project directory tree.

Apologies ... the previously posted versions of IQ97 were missing a source file, specifically, {lambda}Exec, where {lambda} is the UNICODE greek character. Turns out that Winzip silently refused to copy the file, so it was just absent from the source project. The workaround was simply to comment out the three references to it that appeared in Module1.vb and delete the file from the Visual Studio project, as, no doubt, some readers discovered. The lambda execution mode is work-in-progress at this time, anyway, so there is no real harm in deleting the file. However, to prevent this kind of thing in the future, I've learned my lesson: Don't Try to be Cute When Naming Files. The drop above fixes the problem.

Filed under: ,

##### IQ97: Gas and Go!

Get new drop: current VB9 Customer Technology Preview (CTP). Copy the database Saturn5_002.mdf and Saturn5_002_log.LDF to c:\. Zipped VB9 project directory tree.

The Hewlett-Packard manual exhibits the following command strings (last blog I said there were 256 possible and 236 used, but here there are 250 – the explanation is that I fold 19 command strings to uppercase, e.g., LBLa = LBLA, and there are five extension command strings, 19-5+236=250; this is admittedly a tiny bit of cruft in my program). This is taken right out of the database table “InstructionSet”:

 0 CLX ENG GSBA GTO9 LBL2 LSTX RCL1 S ST/8 ST+8 STO4 X<0? 1 COS ENT^ GSBB GTOA LBL3 - RCL2 SCI ST/9 ST+9 STO5 X<>Y? 2 COS-1 e^x GSBC GTOB LBL4 MRG RCL3 SF0 ST-0 ST*0 STO6 X=Y? 3 DEG F0? GSBD GTOC LBL5 N! RCL4 SF1 ST-1 ST*1 STO7 X>Y? 4 / F1? GSBE GTOD LBL6 ->P RCL5 SF2 ST-2 ST*2 STO8 X<=Y? 5 D->R F2? GSBa GTOE LBL7 % RCL6 SF3 ST-3 ST*3 STO9 6 DSP0 F3? GSBb GTOa LBL8 %CH RCL7 SUM+ ST-4 ST*4 STOA X^2 7 DSP1 FRC GSBc GTOb LBL9 Pi RCL8 SUM- ST-5 ST*5 STOB X<->I 8 DSP2 FIX GSBd GTOc LBLA + RCL9 SIN ST-6 ST*6 STOC X<->Y 9 DSP3 GRAD GSBe GTOd LBLB PREG RCLA SIN-1 ST-7 ST*7 STOD Y^x . DSP4 GSB0 GSBi GTOe LBLC PRST RCLB SPC ST-8 ST*8 STOE 1/X DSP5 GSB1 GTO0 GTOi LBLD PRTX RCLC SQRT ST-9 ST*9 STOI 10^x DSP6 GSB2 GTO1 ->HMS LBLE P<->S RCLD ST/0 ST+0 ST/i STOi ABS DSP7 GSB3 GTO2 HMS-> LBLa PSE RCLE ST/1 ST+1 ST-i TAN-1 CF0 DSP8 GSB4 GTO3 HMS+ LBLb ->R RCLi ST/2 ST+2 ST+i TAN CF1 DSP9 GSB5 GTO4 INT LBLc Rv RCLi ST/3 ST+3 ST*i * CF2 DSPi GSB6 GTO5 ISZI LBLd R^ RCLS ST/4 ST+4 STO0 WDTA CF3 DSZI GSB7 GTO6 ISZi LBLe RAD RND ST/5 ST+5 STO1 X<>0? CHS DSZi GSB8 GTO7 LBL0 LN R->D R/S ST/6 ST+6 STO2 X=0? CLRG EEX GSB9 GTO8 LBL1 LOG RCL0 RTN ST/7 ST+7 STO3 X>0?

The first task is to fill the dictionary ‘gas tank’ of static delegates with pairs of command strings and delegates, preparatory to going into a loop reading command strings and executing delegates. Do this in DelExec.Init. The first ten commands are DSP0, DSP1, …, DSP9 for setting up the number of digits to display on the calculator faceplate and in console-mode printouts. In the ironic fashion presented last time, these will all dispatch to a single command delegate, dspcmd:

Sub Init()

Dim digits = {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}

Dim labels = {"A", "B", "C", "D", "E"} REM folded to uppercase

Dim ops = {"+", "-", "*", "/"}

For Each i In digits

Next

Here, we use + to stitch together strings; & would work just as well (I’m not sure if there is any difference). Let’s look at dspcmd:

Function dspcmd(ByVal c As String, ByVal m As Integer) As Integer

Savex()

Dim q = c(3) REM fish out the digit from the command string

Dim r As Integer = dpyPrecision

If q = "i" Then

r = reg(25) REM DSPI takes precision from Indirect register

Else

r = Val(q)

End If

If r >= 0 And r <= 9 Then

dpyPrecision = r

End If

updateWholeUI()

Return m + 1

End Function

Sure enough, we fish out the digit pseudo-argument from the input string. The only interesting thing is the DSPI case, where the digit isn’t a digit at all, but an uncial i. The DSPi command—pronounced “Disp-Eye” or “Display Indirect”—takes the precision from the I, or Indirect, register. Cool! So we need to gas up the dictionary with one more command string for dspcmd:

for eleven so far (the command string is folded to uppercase). Now, let’s put in all the GTOx, GSBx, LBLx, STOx, and RCLx commands, where x can be any digit or any alphabetic label from A-E:

For Each i In digits.Concat(labels)

Next

That takes care of 15 x 5=75 command strings for 86 so far. You can see, here, why I fold the labels to uppercase: it’s so I can clump together GTOa (uncial a) and GTOA (capital A), for instance, with RCLA (capital A), despite the fact that there is no RCLa command. This is appropriate since the pseudo-argument gets fished out of the actual command string, and all we have to do is remember to fold to uppercase for lookup—but not for copy of the command string passed to the delegate. Another way of saying this is that ONLY the insides of command functions are case-sensitive. Another excuse for this is that it helped me avoid useless ambiguities like 10^X versus 10^x. But I already admitted it was cruft; get over it.

Did you spot the opportunity for dynamic identifiers? Here is what I wanted to write:

For Each cstr In {"GTO", "GSB", "LBL", "RCL", "STO"}

For Each i In digits.Concat(labels)

Next

Next

In other words, I wanted to compute not just the command string, but the name of the delegate, too, to prevent the copy-and-paste error of, say, writing

Now this is the kind of annoying, stupid coding mistake that I know you never make, but, alas, I am not immune. It shows up like a land mine, much later, during calculator execution as subroutines mysteriously fail to return. But it is completely avoidable if we can write the desired code. So, what would that take? For one thing, the argument of AddressOf is not known at compile time, so the delegate can’t be constructed at compile time. So, isn’t it natural to call them late-bound delegates? We don’t have them yet, but scenarios like this point out the need, and we’re prototyping them for later development in VB.

Next, we need to fill up the tank with the storage-register arithmetic commands, STO+0, STO+1, and so on. These are a lot like the += , -=, *=, and /= operators in VB: they cumulatively side-effect the indicated registers by the indicated arithmetic operation taking the other argument from the X slot of the data stack:

For Each i In digits

For Each j In ops

Next

Next

For Each j In ops

Next

These loops add 44 command strings, for 130 so far. We need four more case-folded indirect commands:

for 134. There are exactly 100 more commands, accounting for the 236 required minus the two unimplemented (MRG and WDATA). The program does a quick check of this at the end of DelExec.Init, counting the command strings in the database and the command strings in the Dictionary, and diffing them both ways. If a string is in the database and not the dictionary, it’s unimplemented. Vice-versa, it’s an extension. See the source for all that.

Ok, now, sit down. Take a deep breath, because I am about to put on my best Halloween face and try to scare the pants off you. Remember dspcmd? Have you noticed that it is never called statically? Go ahead, to the source, find dspcmd in DelExec, highlight it with the mouse, and press Alt-F12. That will find all references to the symbol. Just its definition and its insertion into staticDelegates at command strings DSP+i and DSPI. No calls. So what’s the use of it? Answer: the REPL loop or the UI faceplate sends one of those 11 strings into the system, which looks up the delegate in the dictionary and dispatches to it through the delegate.

Ok, great, but since it’s not used directly in the simulator program, why should it be stored directly in the simulator program? Why not just read it out of the database, just like the calculator program that uses it, not to mention along with all the state variables and what not? Well, the answer is, of course we should. Just a matter of finding a way to store code in the database, that is, a way to convert code to data. Now, what I want to do is just cut the VB code and paste it into the database, and read it back in at dictionary-creation time. We call that code literals, and this scenario points out the need for it, and we’re prototyping it for later development in VB. In the mean time, we have to use more grotesque means. The only way of saving code as data in the current CLR is to disassemble it into IL, and the only ways to restore code from data is Reflection.Emit and Lightweight CodeGen (the last two links are superb; thanks, Joel Pobar). So, I wrote a little disassembler, and write back all the command functions into the database table “Disassemblies,” with foreign keys to another table called “CmdImplementations,” which stores byte-code streams for safety and redundancy. Oh, and, of course, I have a little assembler that writes the code back out into dynamic delegates in module DynExec. If you’ll forgive the length, dspcmd ends up, in “Disassemblies,” as

 0 Nop 1 Call Savex 6 Nop 7 ldarg.0 8 ldc.i4.3 9 callvirt get_Chars 14 stloc.1 15 Ldsfld dpyPrecision 20 stloc.2 21 ldloc.1 22 Call ToString 27 Ldstr "i" 32 ldc.i4.0 33 Call CompareString 38 ldc.i4.0 39 Ceq 41 stloc.3 42 ldloc.3 43 brfalse.s 17 45 Ldsfld reg 50 ldc.i4.s 25 52 ldelem.r8 53 call Round 58 conv.ovf.i4 59 stloc.2 60 br.s 8 62 nop 63 ldloc.1 64 call Val 69 stloc.2 70 nop 71 ldloc.2 72 ldc.i4.0 73 clt 75 ldc.i4.0 76 ceq 78 ldloc.2 79 ldc.i4.s 9 81 cgt 83 ldc.i4.0 84 ceq 86 and 87 stloc.3 88 ldloc.3 89 brfalse.s 6 91 ldloc.2 92 stsfld dpyPrecision 97 nop 98 call updateWholeUI 103 nop 104 ldarg.1 105 ldc.i4.1 106 add.ovf 107 stloc.0 108 br.s 0 110 ldloc.0 111 ret

To execute all calculator commands in dynamic-delegate mode, enter the “R” command at the console prompt, then execute the Diagnostics or Primes programs as before. And don’t blame me if it goes into the weeds: it’s only proof-of-concept code.

##### Endoskeleton of the IQ97

Get new drop: current VB9 Customer Technology Preview (CTP). Copy the database Saturn5_002.mdf and Saturn5_002_log.LDF to c:\. Zipped VB9 project directory tree.

The nice thing about this project is that we can explore technologies quite fully, but in the context of a concrete, small, completely understandable application: a programmable calculator. Anyone who can understand a Hewlett-Packard calculator can understand VB9, LINQ, ADO.NET, comprehensions, lambda expressions, and all the rest of these beautiful new technologies. Many ‘package’ samples are too large, offering no way to understand them piece-by-piece; or too artificial or abstract, leaving too big a gap to what one wants to learn to do; or too mundane to keep you awake. IQ97 is just big enough, just small enough, concrete, and fun.

Because people reason better from the concrete to the abstract rather than the other way around, many generalizations to other kinds of projects will be quite obvious, some so obvious that they don’t need pointing out, others I’ll point out when I notice them. Here’s one right off the bat: every technique we employ to simulate the HP97 can carry over to implementing any other virtual-machine (VM) specification. Any important VMs come to mind? How about the CLR itself? IQ97 will give you lots of ideas toward creating your own CLR / CLI implementation. Later on, we will talk about a virtual machine for executing LINQ comprehensions: a VM that can be implemented over a relational database, over XML, over objects in memory, over any kind of collections.

Because IQ97 is small and constrained, it’s practical to do some crazy things with it. Now, you’re not supposed to resist, saying “why would anyone ever do THAT?” You’re supposed to say “wow, I never thought of that!” then let your imagination go, trying to think of ways to use them in your own work. For instance, I now have planned out the following FIVE execution modes

1.        Delegates to static functions

a.       Development Mode: Includes disassembly and dehydration into the database

b.      Deployment Mode: When development is completely done, the static functions can be erased from the simulator, relying entirely on 2b

2.       Delegates to dynamic functions via Lightweight Code Gen (LCG)

a.       generated in parallel with disassembly and dehydration of the static functions

b.      or, in a completely separate phase, dynamically rehydrated, re-assembled, and generated from the database

3.       Dynamically generated lambda expressions

4.      Rehydrated lambda expressions

5.       Stored procedures in VB through SQLCLR in the database

The first mode is the easiest to understand and implement, and is the basis for all the other modes. In the current drop, 1a and 2a are implemented. 2B just takes courage, but will be the last thing I ever do with this. Some scaffolding for 3 is in place, and the rest are just dreams at present. Let’s dig into mode 1.

The first task is to establish some variables to hold the durable state—that is, persisted to the database—of the machine, in a module named DelExec, short for Delegate Execution. There is a 4-position data stack, a 4-position address stack for GSB (go subroutine) instructions, 4 flags, 224 program-memory slots, 26 registers, and a special last-x register for storing whatever was last in the X slot of the stack. VB9 requires the largest-index value in the declaration of the array bounds, not the length of the array, which would be one more. So an array of length 4 is declared with a 3 in round brackets, and the index will range from 0 to 3, inclusive.

Public Module DelExec

Public dStk(3) As Double  ' Data Stack

Public aStk(3) As Integer ' Address Stack

Public flags(3) As Boolean

Public pgm(224) As String ' Program Memory

Public reg(25) As Double

Public lastX As Double

We’re using just ordinary VB Doubles for values in the calculator. This is not precisely correct, as the actual calculator uses binary-coded decimal (BCD) and it’s easy to detect the difference, for instance, in different overflow behavior. However, VB Doubles are good enough for ordinary circumstances, in particular, for the Diagnostics and Primes programs. Furthermore, our goal is to simulate the calculator software, not its hardware. It would be a different project to precisely simulate the calculator numerics, and even the exact microcode (someone else has written such a simulation of the microcode for the HP45). But that would take us away from the point of the exercise: to show off fancy language and programming techniques.

Other aspects of durable state include the current display precision, trig mode and display style, which are loaded with the defaults below when the calculator powers up:

Public dpyPrecision As Integer = 2

Public dpyStyle As String = "FIX"

Public trigMode As String = "RAD"

For executing stored programs, we need an instruction pointer (ip), and we shall add a couple more instrumentation variables to count the total number of instructions executed and the real time taken. These are not in the original calculator, of course, but are just too good not to add to our sim. Since we can execute the sim out of the database, these will need to be durable, too:

Public ip As Integer

Public ic As Integer

Public it As New Stopwatch REM from System.Diagnostics

We’ll need exactly these things in the database. Use “Open Table Definition” and “Show Table Data” in Visual Studio’s Server Explorer under the connection to Saturn5_002.mdf and look over the tables “Stack,” “Registers,” “Flags,” and “Controls” (I don’t see a way to copy-paste the database schema here, so just take a look). We interact with these tables using LINQ / Dataset as follows:

Sub initVmdb()

dInstructionSet = New DataPackage("InstructionSet")

dFlags = New DataPackage("Flags")

dControls = New DataPackage("Controls")

dStack = New DataPackage("Stack")

dRegisters = New DataPackage("Registers")

End Sub

DataPackage is my abstraction over the LINQ / Dataset stuff, capturing the patterns I use over and over:

Public Class DataPackage

Implements IDisposable

Public ds As DataSet

Public dt As DataTable

Public dq As IQueryable

Public cb As SqlCommandBuilder

Every one of my data packages will need all five of these things. I make them public for my own convenience. Yeah, I know the conventional wisdom of private fields, public properties, yadda yadda, but this is for me, only, and I don’t need to overdo that discipline. A couple of trivial constructors:

Sub New()

'Nothing interesting here, but must exist

End Sub

Sub New(ByVal nym As String)

Me.create(nym)

End Sub

The following is where the real stuff occurs: this is how we handle each table in the database. Given a name, first we make a DataSet (I frequently use the pseudonym “nym” for variables containing names, because there are too many other names of names named name in a typical system). We also must make a data adapter, then just have it query up the whole table into the DataSet, which I always want to do. We also must have a SQL command builder, even though it’s not directly used. Since each of my DataSets contains just one table, for simplicity, I can set a variable of type DataTable always to the 0-th element of ds.Tables:

Public Sub create(ByVal nym As String)

ds = New DataSet(nym)

da = New SqlDataAdapter("SELECT * FROM " + nym, conn)

cb = New SqlCommandBuilder(da) REM must exist, but not used

da.Fill(ds)

dt = ds.Tables(0)

Now, we have another pattern: In every one of my tables (with one exception, later), the first column and only the first column contains the primary key, which sometimes happens to be one of those SQL autoincrement is-identity kinds. I found the following patterns to work, partly by trial-and-error and partly by asking around. I don’t understand the TableMappings, but I need them:

Dim fudge(1) As DataColumn

fudge(0) = dt.Columns.Item(0)

dt.PrimaryKey = fudge

dq = dt.ToQueryable()

End Sub

Public Function update(ByVal nym As String) As Integer

Return da.Update(ds, nym)

End Function

The rest of the class is just standard IDisposable boilerplate support for releasing non-garbage-collected resources:

Protected Overrides Sub finalize()

Me.Dispose()

MyBase.Finalize()

End Sub

Private disposedValue As Boolean = False  ' To detect redundant calls

Protected Overridable Sub Dispose(ByVal disposing As Boolean)

If Not Me.disposedValue Then

If disposing Then

cb.Dispose()

ds.Dispose()

da.Dispose()

dt.Dispose()

Me.disposedValue = True

End If

' TODO: free shared unmanaged resources

End If

Me.disposedValue = True

End Sub

End Class

The next seven variables are NON-durable, and track modes in the simulator rather than any documented modes in the calculator. Numeric entry, for instance, requires us to track whether the decimal or EEX key has been pressed. This isn’t the time to get into those vagaries, the point here being that just a handful of extra variables take care of them.

Public eex(1) As Integer

Public currexp As Double = 0

Public liftP As Boolean = True ' Will next entry lift the stack?

Public traceP As Boolean = True

Public eexmode As Boolean = False

Public efrac As Boolean = False ' Entering a fractional part?

Public progP As Boolean = False

Notice the Public access for most everything. Since this application is standalone, I break things up into modules and classes just for design modularity, and access control is not something I need to worry much about, so I am completely whimsical about it.

I’ll use the term command to mean a calculator function, whether executed by clicking a button on the faceplate, or by typing a shortcut key with the faceplate up, or by typing a calculator command in the console window, or by running a stored calculator program. Every command has a static function in the DelExec module (all functions in a VB module happen to be static functions). Executing a command works by first presenting a command string to the simulator. IQ97 commands do not take arguments, but variable information is often encoded in the command string. For instance, the STO4 command means “Store a copy of the value in the X slot of the data stack into register 4.” Likewise, STO3 means “Store a copy of the value in the X slot into register 3.” Today, it would be considered a more mainstream design, perhaps, to give the destination register as an argument to a generic STO command. However, that was not quite the way Hewlett-Packard did it back then.

There is a bit of irony to the story: the way to key in the STO4 command on the calculator faceplate is first to press the STO key and then to press the 4 key. However, internally, this combination is stored as a single byte in program memory. In fact, this merged keystroke encoding was a big improvement over the ancestral HP65, which, in fact, stored commands and arguments sequentially. However, that led to twisted code as programmers strove to save precious memory.

We need a CLR type, cmdel, for command delegates and a dictionary, staticDelegates, to look up command delegates given command strings:

Delegate Function cmdel(ByVal cmd As String, ByVal ip As Integer) _

As Integer

Private staticDelegates As New Dictionary(Of String, cmdel)

So, in our sim, we pick up the STO key, then a numeric key, and build a combined command string, “STO4,” and the static function for all STO commands then fishes the register back out of the string, again. This explains why every command function takes the actual command string as an argument—it often needs to fish out hidden pseudo-argument information.

Every command function also takes an integer representing the current instruction pointer, and is responsible for returning the next value of the instruction pointer, which is almost always just one more than the current ip, exceptions including branch and subroutine-call instructions. This is another advantage of the merged keystroke encoding: since all instructions are the same length, all command functions can increment ip the same way.

So, as you can probably guess, there are 256 possible opcodes. The calculator defines 236 of them, and I implemented all but WDATA and MRG, which are for writing data to mag cards. You can see all the command strings and their key scan codes in the table “InstructionSet.” The numeric opcodes are not used in IQ97, so we don’t record or store them anywhere.

It might seem most straightforward to start off by showing the numeric-entry commands. However, they are a bit messy due to internal modalities concerning the decimal point and EEX key, so let’s delay presenting them for a bit. The numeric operation commands are a good place to start. Consider the following, which implements the plus command, whenever the + key is pressed:

Function plus(ByVal c As String, ByVal m As Integer) As Integer

Savex()

dStk(0) += dStk(1)

dStk(1) = dStk(2)

dStk(2) = dStk(3)

Return m + 1

End Function

So, clearly, dStk(0) represents the X slot of the data stack, dStk(1) the Y slot, and so on. Savex is a ubiquitous utility; most of the command functions call it:

Sub Savex()

lastX = dStk(0)

EndInput()

End Sub

which just saves the last value of the X slot of the data stack in the special register lastX, where it can be fetched with the LSTX command. EndInput interacts with the numeric input modes, so we’ll talk about it later.

UpdateStackUI is one of a bunch of functions for operating the blinking lights. We get to all that later.

So there you go: we have laid out the endoskeleton of the calculator implementation. Next installment, we’ll fill up the dictionary with command delegates and start executing functions!

##### State of the IQ97

I’ll call my simulator of the HP97 the IQ97 (see if you can decipher the pun). Despite my effort at self-restraint, this has bloomed into a full-blown obsessive-compulsive sample work, as I can’t stop thinking of new things to do with it. So, rather than waiting until it’s all done, since there is no way to know when that will be, I’ll blog it out as I go along. A tremendous amount of interesting technology that deserves to be shown off has already gone into it. You’ll forgive the bits that get finished or undone later in the spirit of joyful bodging, and you’ll overlook messy bits that were just hacked together to get to something more interesting. And bugs? Please post or email ‘em when you find ‘em.

You’ll need the current VB9 Customer Technology Preview (CTP). Then you will need to copy the database Saturn5_002.mdf file and Saturn5_002_log.LDF file to c:\ , the root of your C drive. (This pun is too obscure to guess: one version of the chip in the HP calculators was called the ‘Saturn’ and the rocket used to launch the Apollo-Soyuz mission was the mighty Saturn 5). Finally, you will need this zipped VB9 project directory tree. You should be able to open up the solution, hit F5, and follow along these blogs. At this point, if you have things properly installed, you will see quite a lot of console output fly by, ending with the last few lines of the calculator’s diagnostic program:

...

219 SCI

220 PRTX

221 DSP1

222 FIX

223 PRTX

224 R/S

=000:=>

Most of the console output consists in disassembled calculator-key functions, about which we will have more to say later on.

There are lots of things to do in console mode, but, for now, just type m, which stands for ‘monitor,’ and <Enter>, and you should see the faceplate of the IQ97. You have now exited console mode: all keyboard and mouse entry go into the UI until you type q at the faceplate. That will send you back to console mode. For some reason I don’t understand, you must force the UI into focus the first time you type ‘m<Enter>;’ the easiest way to do that is just click anything on the faceplate or just Alt-tab over to its window. After the first time, you can m<Enter> and q back and forth and focus will track correctly.

Hewlett-Packard’s diagnostic program is preloaded, so if you click the ‘A’ button on the extreme left center or just type a (no <Enter> on the faceplate), you should see blinking lights for about 5 to 10 seconds, halting with 0.00 in the main display at top-left; -8.889e-088 twice and 0.00 on the paper tape, right-center top; and, in the machine-state display in green, left-center top, ip = 224, cmd = R/S, ic = 1018, TIME = something reasonable around 5 to 10 seconds, and ips (instructions per second) = something reasonable around 250 or below. If you get all that, you’ve passed the basic diagnostics and you are ready to use the calculator.

The UI mode responds to mouse clicks, of course, but also to several standard computer keyboard presses, like the number keys, the arithmetic keys, <End> for the calculator ENTER key (because the computer <Enter> key automatically mouse clicks whatever button has the focus, it was difficult to use that for the calculator ENTER key), <Backspace> for CLX, A-E for the A-E keys, F for the calculator f-shift key (the glaring yellow one), and R for  Rv, which rolls the stack down one.

To get off the ground, type 1 <End> 2 +, and you should see 3.00 in the main display and the X slot of the stack display center-top. The main display is always an echo of the X slot. If you just finished the diagnostics, the rest of the stack will contain Y=0.00, Z=300.00, and T=300.00. For those not familiar with the famous Hewlett-Packard Reverse-Polish Notation (RPN), it’s just postfix immediate. Numeric keys build up inputs, so, when we type the first 1, we start input mode. When we type <End>, we end input mode. Typing 2 causes an automatic push (HP calls it stack lift) of 1 into the Y slot, and puts the machine back into buildup mode . + Terminates buildup mode and immediately executes Y+XàX, dropping the stack down, autocopying T into Z. Most of the rest of the calculator functions operate this way, and a little bit of playing around will get you familiar and at-ease with it if you aren’t already.

While RPN eventually lost out to so-called ‘algebraic mode’ with parentheses in the calculator mass market, presumably because students preferred to slavishly copy formulas from textbooks to their calculators rather than to read and understand the formulas, RPN really is technically superior and much easier to grasp, not to mention much easier to edit and correct. Yes, there is a teeny-tiny learning molehill, but the benefits massively outweigh the cost, especially for anyone who has actually studied computers for a little while. Be that as it may, RPN was introduced by HP for pure expediency: it required less precious, on-board memory than the algebraic style. RPN only requires the value stack because all operations execute immediately. An algebraic-mode calculator must store a complete expression tree as it is built up, and would thus have been intolerably limited in 1976, when 25 memory cells was considered lavishly, luxuriously, many. With algebraic mode, users had to resort to pencil-and-paper to break up big formulas, while, with a little practice, an experienced RPN user could key in the most hairy formula right the first time (hint: start from the innermost subexpressions and build OUT, just as with applicative order or call-by-value in programming languages.

If you want to play with the primes program, go to the drop-down combo box just under the main display and select “Primes” (don’t bother with “Moon Lander” – it’s not working yet). Now TAB out of the combo box to remove focus from it (another UI booger – sorry). Click or type E to enter Prime’s Auto Mode, and type 100 B 200 C D to generate all primes from 100 to 200. This should go for about 20 to 40 seconds, with 4788 instructions executed at 125-250 ips, and leave results on the scrollable paper tape. If you want to heat up your computer for a few hours, type 1000000000 B 1000001000 C D and it will spit out one prime about once every 10 to 20 minutes. I suggest you leave that for overnight, though, and try out the console mode first, without the UI, because console mode has about 1000 times greater throughput.

Type q to quit the UI (you can get it back with another m<Enter>), and enter the following at the console prompt, each line terminated with <Enter> (don’t press <Enter> too early – it single-steps the program in the calculator, but, if you do, I’ll let you guess how to get started again):

 read.prim an extended command, not really part of the HP97 command set T Turns of trace mode E Turns on auto-print mode, lest the calculator stop after each prime 1 Simulates pressing of the 1 key on the calculator 0 Yes, we have to enter each digit on its own 0 As if it were an individual calculator key press 0 Each followed by the Enter key, so, yes, console mode is clumsy 0 But it will execute much, much faster, when we run the program 0 So hang in there for four more zeros, ok? 0 Three … 0 Two … 0 One … 0 Liftoff!  Yay, we’re done (and it sounds like a rocket launch) B Tells the calculator that we have the starting prime 1 Oh, no, here we go again! 0 Ok, go with me for five zeros 0 Four … 0 Three … 0 Two … 0 One … 1 A break from the monotony 0 Just three more and we’re done for good, I promise 0 But now I’m just funning with you because you got the idea 0 Way back there C Tells the calculator that we have the ending prime D Let ‘er rip!

This should start producing primes immediately, starting with 1000000007, and ending with 1000000993, at the rate of 1 every few seconds, executing exactly 8,628,918 instructions at the rate of up to 225,000 instructions per second. If you did the same in UI mode, the number of instructions will be exactly the same, but the ips will be more like 225, taking well over one hour.

The primes program has another nifty feature: it will also factor any number in the X register if you click the A key (or type A<Enter> at the console-mode prompt, or just A at the UI). Make sure you’re in auto-print mode, that is, that you have toggled E until 1 shows in the X slot and the main display.

At the console prompt, you can also type preg to print out the registers and prst to print out the stack, or go back into trace mode with t, all followed by <Enter>, of course, which I won’t bother to mention for console mode from now on.

If something goes wrong and the program becomes unresponsive (called ‘going into the weeds’ – a metaphor from automobile racing) just kill it from Visual Studio and start it over. It’s just a blogging sample, don’t you know, and hasn’t been subjected to anything like product-level QA.

Ok, but guess what? You’ve stuck with me through all that and we are ready for our first REAL TREAT of VB9 programming! Look at the code for intercepting computer keystrokes in the UI panel. It’s in Monitor.vb and is called Monitor_KeyPress. This is the ordinary winforms event handler for KeyPress, which gets forwarded to the topmost control—the window itself—from whatever subcontrol happens to have the focus.

Private Sub Monitor_KeyPress(ByVal sender As Object, ByVal e As System.Windows.Forms.KeyPressEventArgs) Handles Me.KeyPress

Dim k = Char.ToUpperInvariant(e.KeyChar)

If k >= "0"c And k <= "9"c Then

CObj(Me).("Num" & k & "_Click")(Nothing, Nothing)

ElseIf k >= "A"c And k <= "E"c Then

CObj(Me).(k & "Key_Click")(Nothing, Nothing)

ElseIf k = "F"c Then

fShiftKey_Click(Nothing, Nothing)

ElseIf k = "+"c Then

Me.PlusKey_Click(Nothing, Nothing)

ElseIf k = "-"c Then

Me.MinusKey_Click(Nothing, Nothing)

ElseIf k = "*"c Then

Me.TimesKey_Click(Nothing, Nothing)

ElseIf k = "/"c Then

Me.DivKey_Click(Nothing, Nothing)

ElseIf k = "R"c Then

Me.RollDownKey_Click(Nothing, Nothing)

ElseIf k = "Q"c Then

fMonitor.Dispose()

fMonitor = Nothing

End If

End Sub

We’ll convert the key to uppercase and start testing its value. The first branch runs when you press any key in the 0 through 9 range, and … what’s that weirdness? Method call through dynamic Identifiers! The idea is rather than specifying the method to call for each digit key statically, like this

If k = "0"c Then

Me.Num0_Click(Nothing, Nothing)

ElseIf k = "1"c Then

Me.Num1_Click(Nothing, Nothing)

ElseIf k = "2"c Then

Me.Num2_Click(Nothing, Nothing)

...

and so on, let’s just calculate the name of the method to call from the keycode. It only works through late binding (can’t statically bind to a method name that is not known until run time), so we first must convert the method receiver to something of static type Object, that is, convert Me to an object (hopefully, this syntax noise will be eliminated in the next drop of VB9):

CObj(Me)

Then, let’s calculate the name of the button-click event-handler method to call:

"Num" & k & "_Click"

(the ampersand glues strings – and things automatically convertible to strings – together), put that method name inside parentheses to the right of the dot:

CObj(Me).("Num" & k & "_Click")

(that's the magic syntax that means “dynamic identifier!”), supply the arguments to finish our first method-call through dynamic identifier:

CObj(Me).("Num" & k & "_Click")(Nothing, Nothing)

So, pressing a digit key on the computer keyboard just forwards execution to the exact code that process a digit-key button click through the computer mouse, just as we want. There is another method call through dynamic identifiers for handling A through E. Notice we cannot make the all-too-easy typing error of getting the digit wrong in the method name, so the dynamic identifiers also give us prophylaxis against cut-and-paste coding bugs!

We could not conveniently do likewise with the arithmetic keys. While I would have liked to say

Private Sub [+_Click](...) Handles PlusKey.Click

which VB9 accepts, by the way, because of the bracket escape syntax, the CLR balks at a name such as +_Click for a method and it fails to link. But if I had been able to write that, then the four lines for processing arithmetic keyboard input would have collapsed to just one method call through dynamic identifiers.

That’s enough for today, but in future installments, you will see much more use of this feature, as well as calls through dynamic-method delegates and through lambda-expression delegates. Also in the future, I will update the IQ97 source solution as I go along, and each blog will tell you whether there is a  new drop to pick up.

##### VB9, LINQ, ADO, and a 30th Anniversary for Rocket Science

In 1976, 30 years ago, Hewlett-Packard Corporation introduced a landmark product: the HP97 portable, programmable, printing scientific calculator with magnetic-card storage. The HP97 was one of a matched pair with the HP67, the pocket version of the SAME calculator! These two could read and write the same magnetic cards containing programs and data, and they could execute the same programs. The pocket version, the HP67, did not have a printer, so it just paused the display when it executed a PRINT instruction of some kind or another. Other than that and the form factors, These two machines were clones.

These were as close to a commercial personal computer as could be bought in their day. Hewlett-Packard had already changed the world in 1972 with the HP35, the world’s first pocket slide-rule calculator. It is difficult to overstate the impact the HP35 had on the scientific and technical world, especially in areas with a lot of field and lab work such as surveying, navigation, prospecting, refining, and power generation. Overnight, HP had obsoleted slide rules and pocket mechanical calculators like the Curta, which, itself, had revolutionized the world 25 years prior. Technical people could now carry around anywhere a single instrument with unprecedented accuracy, speed, reliability, and quality. And the battery was very long-lasting, the machines were unbelievably tough (I dropped an HP45 off my motorcycle at speed, turned around, found it in the gutter, picked it up, turned it on, and it still worked. It still works TODAY!)

These machines also had a wonderful look and feel, both in the software – they used RPN – and in the hardware – the buttons prevented accidental pressing with a slight resistance and confirmed pressing with a satisfying click. Not only were they a landmark of technical capability, but of industrial design and aesthetics. They have never been equaled, frankly, even by HP itself.

By 1975, HP had developed a stored-program version of the pocket slide-rule calculator, the HP65. This was so powerful and reliable that it was routinely used in life-and-death situations, for instance, as the back-up navigation computer aboard the Apollo-Soyuz space mission (read about it at the link above).

Now, I’m an official rocket scientist. At the time, I was an official student of rocket science. I routinely used an HP45, a souped-up HP35. It had cost me \$400, which was a substantial fraction of my annual income. I had never been able to afford a Curta, but the HP45 was just a must-have and worth every penny. I could not afford the HP65. At \$1000 in 1975, it would be like spending \$4000 today. At that time, a 3-year-old Volkswagen could be bought for \$500. It was a stunning amount of money. But it was a stunning machine. Everyone in science and engineering knew about HP65s and lusted after them.

Then, one year later, HP outdid themselves and introduced the 67/97 pair. The 67 was \$450 and at least three times as capable as the HP65. Interestingly, HP continued to sell the more expensive HP65 at its original price of \$1000, a move which endeared its customers. Rather than slap them in the face by dropping the price to liquidation levels, they refused to punish their early adopters. But the 67s and 97s flew off the shelves – even students could afford them.

When I started my PhD thesis, I plunked down a bankrupting, heartbreaking \$750 for an HP97. But the need for printed results was overpowering, never mind the fact that I was under the sway of irrational lust and I just had to have the thing.

I still have it. It sits on my desk at work, crunching out prime numbers above 1,000,000,000, at the rate of 1 about every three hours. It makes a great conversation piece: if someone happens to be in my office and the printer goes off, it can make a nice break to say “Oh, we have good news:  1,000,022,897 is prime!”

In honor of the 30th anniversary year of this incredible machine that changed the world and affected my life, and in honor of the new technologies embodied in VB9, LINQ, and ADO.NET which are doing the same things, I decided to write a simulation of the HP97 using those new technologies in such a way as to show them off. This simulation runs, in fact, the exact same programs as the original HP97. The simulator can execute in two modes: with UI and as a console application. With the UI on, the simulation blinks lights and pretend-prints and runs about 20 times faster than the real calculator, spitting out a big prime about once every 10 minutes. As a console app, it runs about 20,000 times faster than the real calculator, spitting out a big prime about once a second.

This simulator does not cheat. It has a delegate function for every allowable calculator key. There is a database of instructions with their key scancodes, which I typed in from the HP97 manual. Each instruction is represented as a  string. Here’s a sampling of those instructions.

 12 1/X 52 NULL NULL 13 10^x 16 33 NULL 14 ABS 16 31 NULL 15 CF0 16 22 0 16 CF1 16 22 1 17 CF2 16 22 2 18 CF3 16 22 3 19 CHS -22 NULL NULL 20 CLRG 16 -53 NULL 21 CLX -51 NULL NULL 22 COS 42 NULL NULL 23 COS-1 16 42 NULL

The programs are stored in a different database table as sequences of strings. Again, I just keyed these in from the manual or from actual paper-tape printouts. Here are the first few instructions of the primes program:

 1 LBLA 2 STOB 3 ENT^ 4 INT 5 X<>Y? 6 GTO5 7 0 8 STOD

When the user simulates the reading of a mag card into calculator memory (with a drop-down list box or with a “read” console command), I use LINQ over Dataset in ADO.NET to read the program, as strings, into memory. To execute the program, I dispatch each instruction string through a lookup table to its delegate function, and advance the instruction pointer by the appropriate number of steps. While the program is running, I write the state of the machine back out to database tables, including the 4-position stack, all 25 registers, and a few hidden registers like the instruction pointer (ip) and LASTX. For performance reasons, I throttle back the state-saving, saving once every N instructions, typically 100,000. The simulator can also read back its state from the database and pick up calculation from a saved checkpoint.

If I set the throttle to save to the database every time a single instruction executes, then the simulator effectively runs out of the database. It’s quite slow, but, as a demonstration, it’s also quite amazing. The simulator could be made even more databasey if I implemented the calculator functions as stored procedures in the database and actually executed them there. I might make a version to do that in the future.

But for now, I’m focusing on another trick: saving the code for the calculator functions in the database as serialized LINQ Expressions. In a previous life, I would have just implemented the calculator functions in Scheme and written them out as S-expressions into the database. But with the new facilities in VB9 and LINQ, we’re going to have all that capability and more, since we can get at the .NET framework. This isn’t quite working, yet, due to some bugs in the internal development version of LINQ, but I will show off this idea once we get it off the ground. There will be a new version of the VB9, LINQ, and ADO.NET Customer Technology Preview (CTP) some time in the next few weeks, and, when that is released, I’ll blog the details of my simulator program and explain how they exercise our new technology. In particular, I will be able to show how dynamic identifiers, relaxed delegates, and late-bound delegates drastically reduce the size of the UI code and make certain bugs impossible. It’s a great time to be a programmer, even 30 years after the HP97.

##### The Pie-Fighting Philosophers Problem

Take three PhD specialists in programming languages, three PhD specialists in relational database, and three PhD specialists in network security, and seat them around a table. Now take three coconut-cream pies, three chocolate-cream pies, and three banana-cream pies and randomly place one pie in front of each philosopher. Finally, give them three hours to come up with a position paper on the concept of "Identity." Calculate how many philosophers at the end of three hours have three kinds of cream-pie on their faces, how many have two kinds, and how many have just one kind.

Ok, I thought it was funny, but then, I've been involved in a lot of design and standards committees. The point is, no single concept of "Identity" works for all three. In fact, no concept of Identity works for any pair of the three, and it's only barely possible to get a concept that works for any one of the three. Let's illustrate this just for programming languages and relational databases.

Consider the following data, stated in English:

Venice is in Italy.

Rome is in Italy.

Nogales is in Mexico.

Nogales is in the USA.

Hey, wait a minute. One city in two countries? Sure. Happens all the time. Trieste straddles Slovenia and Italy. How about Bratislava? It's in three countries: Slovakia, Hungary, and Austria. Sault Ste. Marie is in both Canada and the USA. And those are just the physically connected examples. The USA has at least one Rome – in New York – and at least one Venice – in California.

Ok, the point is that our Mondial database is going to have to deal with the fact that the Country-City relationship is really many-to-many, not one-to-many, and that means it's going to have to deal with Identity in a deeper way than we sketched out prior to this.

How would our three database specialists represent the data above? Experienced practitioners will normalize the data, that is, remove redundancies, resulting in three tables: one for the countries, one for the cities, and one, called a join table, middle table, or intersection table, for the relationship. The primary key for the Country table will be just the name of the country. The primary key for the City table will be just the name of the city. The middle table will contain pairs of foreign keys. One member of each pair will be a city name, and the other member of each pair will be a country name. The primary key of the middle table will be the pair of foreign keys. In pictures:

 Table Labels: CountryTable CityTable CountryCityTable Column Labels: Country City Country City Data Rows: Italy Venice Italy Venice Mexico Rome Italy Rome USA Nogales Mexico Nogales USA Nogales

So we see that the only way to refer to a country or a city in the middle table is by a copy of its primary key. Therefore, in this normalized database,

Two rows are identical if and only if their primary keys have the same value.

Many database practitioners use underlining on the labels of columns containing primary keys; we'll use bold. Since every column contains a component of a primary key, all column labels are bold.

There is also an obvious non-normalized view form of the same data. Non-normalized basically means that this form contains mathematically redundant information, and this can cause ambiguity problems in the identity department. Typically, such a form would be temporary, not the permanent form, for constructing reports, building queries, analyzing by-products, and what not. It would look something like this:

 Table Labels: CountryFromCityTable CityFromCountryTable Column Labels: City Country Country City Data Rows: Venice Italy Italy Venice Rome Italy Italy Rome Nogales Mexico Mexico Nogales Nogales USA USA Nogales

What's the natural way to express this scenario in Visual Basic? Consider our running "Mondial" example again, but stripped way, way down:

Public Class Country

Public Name As String

Public MyCities As List(Of City)

End Class

Public Class City

Public Name As String

Public MyCountries As List(Of Country)

End Class

...

Sub InitData()

Dim Italy = New Country {Name := "Italy"}

Dim USA = New Country {Name := "USA"}

Dim Mexico = New Country {Name := "Mexico"}

Dim Venice = New City {Name := "Venice"}

Dim Rome = New City {Name := "Rome"}

Dim Nogales = New City {Name := "Nogales"}

Italy.MyCities = New List(Of City)

Mexico.MyCities = New List(Of City)

USA.MyCities = New List(Of City)

Venice.MyCountries = New List(Of Country)

Rome.MyCountries = New List(Of Country)

Nogales.MyCountries = New List(Of Country)

End Sub

That all just represents a graph of objects in memory, which we can sketch as follows:

_.------------------------.

,-''                           `v

+-------+-/-+---+                    +---+--------+

| Italy | o | o |<-------------------= o | Venice |

+-------+---+--\+                    +---+--------+

^           '-.

`--.          `--.

`-------.     `-----------.

`-------.         `---.

`------.      `.

`-.     `v

+-------+---+                         +\--+------+

|  USA  | o-=--------.                | o | Rome |

+-------+---+         `--.            +---+------+

^.                  `-----.

`----------.             `---.

`-----.           `.

`--.         `--.

,------------.      `--.         `-.

v               `-.       `--.        v

+--------+---+              `-.  +---+-\-+---------+

| Mexico | o |                 `-= o | o | Nogales |

+--------+-\-+                   +---+---+---------+

`-.                               ^

`----.                        /

`----------------------'

The point is that with objects in memory, the natural representation of the data is through references, and we may conclude that

Two objects are identical if and only if their references are identical.

So, indeed, since the USA object has a reference to Nogales that is identical to the reference to Nogales contained in the Mexico object, these two Nogales's are (drum roll) identical. Likewise, Venice has a reference to Italy, and Rome has a reference to the same, identical, Italy object.

Now, we might duplicate exactly this situation in a relational database, but we would have to introduce additional data for references, along the following lines:

 CountryTable CityTable CoutryCityTable Country CnyPK City CtyPK CnyFK CtyFK Italy 37 Venice 26 37 26 Mexico 162 Rome 8 37 8 USA 49 Nogales 33 49 33 162 33

In fact, this is the kind of automatic solution that object-oriented databases would typically compute on your behalf to persist an object graph for you. It just substitutes unique numbers for object references, a process called swizzling. But it still MUST USE a middle table, because there is no good way to represent a variable-length list of references in a column of a table. Most solutions for nested tables, in fact, just un-nest the nested information into additional tables, just as above. So, even though this appears to be a simulation of the object-oriented graph, it doesn't take much to see that it's a totally uninteresting transformation of the normalized relational database form: it just added numerical keys, and it still uses value-equality as the definition for identity.

The point I'm trying to make, here, is that object-oriented programs and relational databases have totally incompatible notions of identity, and that the incompatibilty opens up all kinds of downstream complexities. The various schemes for resolving the incompatibilities are mapping solutions. The world of mapping is very, very complicated, because mapping solutions typically try to resolve not just the mis-match in the notions of identity, but to provide all kinds of operational services, incorporating, for instance, updatable non-normalized views, and support for badly designed legacy databases, which are far too valuable to throw away but far too expensive to normalize.

Sorry, I don't have a bolt of lightning that, with blinding obviousness and clarity, puts this entire topic into order. I think it's best just to stock up on towels and view it as a never-ending pie fight we just have to deal with. That way, we'll interpret it as funny instead of troubling.

##### More Mondial Fun with VB and DLINQ

Buckle up, boys and girls, we’re going to be little DBA’s today ! (Data Base Administrators)

Earlier, we showed how to use LINQ over objects in memory. Today, we show how to write EXACTLY the same program, this time using DLINQ, over a real SQL-Express relational database. The whole DLINQ project, including the full data set of 3,053 rows, can be downloaded here. The earlier LINQ project – same program only over objects-in-memory, can be downloaded here. You will need the VB9 CTP (Customer Technology Preview), plus some patience. That is a pre-alpha version of VB, and you may have to create new projects and copy the source files (I have had some “inside” reports that this is so).

The point of both programs is to compare the time for a fast query to the time for a slow query. To deliver the punch line, even with 14 cities in a tiny DLINQ database, the slow query is 20 times slower than the fast one .

================================================================

SlowQuery1 found 14 cities, Time =    54.04 milliseconds

FastQuery1 found 14 cities, Time =     2.12 milliseconds

================================================================

With the full database of 3,053 cities, the results are crushing:

================================================================

SlowQuery1 found 3053 cities, Time = 46109.31 milliseconds

FastQuery1 found 3053 cities, Time =    40.53 milliseconds

================================================================

The data “entities” are from the Mondial database, a fun and free collection of facts and figures about the world. We start off small with just one “relationship” between cities and countries: each country has many cities, expressed through the “join condition” Country.Code == City.Country. In the in-memory case, we firmly establish that, practically speaking, it’s ALWAYS better to build an index, search the index, and throw the index away than to scan the relationship via a nested double loop. In the DLINQ case, we’ll use DLINQ to set up foreign keys in the Database and scan them in the fast query. Here is the query code for the DLINQ case (and we explain all the details below):

Sub SlowQuery1()

xs = Select New {Cty := it.t.Name, Cny := it.n.Name} _

From t In db.Cities, n In db.Countries _

Where it.t.Country = it.n.Code

xc = CountIEnum(xs)

End Sub

Sub FastQuery1()

xs = Select New {Cty := t.Name, Cny := t.MyCountry.Name} _

From t In db.Cities

xc = CountIEnum(xs)

End Sub

The neat thing is that DLINQ lets us express our relationship as a PROPERTY, “MyCountry,” on the City class. Neat. We don’t need an index in memory to get fast query. In my sample, I do build such an index, but I use it ONLY to set up the database quickly. I don’t use it at all in the queries and it’s absolutely not necessary.

Did I say that we’re making our SQL database FROM SCRATCH? Surprisingly, this is unusual. Almost all the samples I’ve seen magically install some big messy database and then show you how to manipulate it. They never show you how to create it. What’s up with that? I want to do EVERYTHING from my VB program. I don’t want to learn a bunch of SQL-Express GUI DBA tools too. Given the choice between “Hello World!” and “Hello Northwind!” I’ll take “Hello World!” every time. Wouldn’t you? Well, “Hello World!” translates into “Hello Mondial!”

Let’s start with class definitions. Remember, the Country class definition for our objects-in-memory LINQ version was:

Public Class Country

Public Name As String

Public Code As String 'Primary key

Public Capital As String

Public Province As String

Public Area As Integer

Public Population As Integer

End Class

The DLINQ version is, mostly, very similar. Visually, it’s more junky looking, but, fear not, I walk you through all of it:

<Table()> _

Public Class Country

<Column(DbType:="VARCHAR(32)")> _

Public Name As String

<Column(DbType:="VARCHAR(4)", Id:=True)> _

Public Code As String

<Column(DbType:="VARCHAR(35)")> _

Public Capital As String

<Column(DbType:="VARCHAR(32)")> _

Public Province As String

<Column(DbType:="INT")> _

Public Area As Integer

<Column(DbType:="INT")> _

Public Population As Integer

Private _myCities As New EntitySet(Of City)

<Association(Storage:="_myCities", OtherKey:="Country")> _

Public Property MyCities() As EntitySet(Of City)

Get

Return _myCities

End Get

Set(ByVal value As EntitySet(Of City))

_myCities.Assign(value)

End Set

End Property

End Class

The custom attribute <Table()> tells DLINQ that we want the class stored as a database table. The VB compiler ignores CLR custom attributes – it just copies them into assembly metadata so that libraries can read them through Reflection if they need to. That’s just what the DLINQ library does – it checks our class definitions at run time to see if they’re Tables.

Next, list out all the same fields as before, just decorated with <Column(…)> attributes to tell DLINQ what SQL types we want stored. I think there are some intelligent defaults I could have used, saving explicit specifying of these details, but I wanted to play it really safe, not testing those waters, so I slavishly set up column attributes for every field. The Mondial original was written in SQL, not VB, so it wasn’t hard to figure out the size of VARCHARS I needed and what not.

The new DLINQ EntitySet type and Association attribute are clearly the interesting things. They represent a recurring DLINQ pattern for one-to-many relationships. Over here, in the Country class, we’re on the ONE side of the one-to-many relationship. We must have a corresponding Association attribute on the MANY side, in the City class; we get to it in a minute. Let’s go through this side line-by-line:

Private _myCities As New EntitySet(Of City)

Ok, there is a private field that DLINQ will use behind our backs to store a “list” of cities, in effect, for each Country.

<Association(Storage:="_myCities", OtherKey:="Country")> _

Ok, DLINQ now knows the name of the storage field it can party on; it’s Country._myCities. It also knows the name of the field in the OTHER side of the association that points back to Country instances, here. Remember, our “join” condition is that City.Country == Country.Code. The OtherKey portion tells DLINQ that the field named “Country” in the OTHER side of the association hooks up to the primary key on THIS side of the relationship, which, oh-by-the-way, we already declared thus:

<Column(DbType:="VARCHAR(4)", Id:=True)> _

Public Code As String

The word “Country” in the fragment OtherKey:="Country" means the field named “Country” in the other side of the association, not the class named “Country” on this side of the association. Sorry about this, but that’s how Mondial was designed and we’re going to live with it.

Public Property MyCities() As EntitySet(Of City)

Get

Return _myCities

End Get

Set(ByVal value As EntitySet(Of City))

_myCities.Assign(value)

End Set

End Property

Ok, that is more-or-less standard stuff: it defines the public property allowing other parts of my program to interact with the EntitySet. DLINQ parties on _myCities behind the scenes, but we get to party on the public Property MyCities. The only thing to remember is to use the Assign method of EntitySet so DLINQ can keep track of the publicly made changes. Now, we just look at the City class and we’re done with almost everything difficult:

<Table()> _

Public Class City

<Column(DbType:="VARCHAR(35)", Id:=True)> _

Public Name As String

<Column(DbType:="VARCHAR(4)", Id:=True)> _

Public Country As String

<Column(DbType:="VARCHAR(35)", Id:=True)> _

Public Province As String

<Column(DbType:="INT")> _

Public Population As Integer

<Column(DbType:="FLOAT")> _

Public Longitude As Double

<Column(DbType:="FLOAT")> _

Public Latitude As Double

Private _myCountry As EntityRef(Of Country)

<Association(Storage:="_myCountry", ThisKey:="Country")> _

Public Property MyCountry() As Country

Get

Return _myCountry.Entity

End Get

Set(ByVal value As Country)

_myCountry.Entity = value

End Set

End Property

Ok, line-by-line. First, the primary key in this table is triplicate, consisting of the Name of the city, its Country code, and its province. Fair enough (how many “Springfields” ARE there, actually?) Let’s go straight to the Association. EntityRef(Of Country) on this side; a ThisKey whose name matches up with the OtherKey on Country. Once again, the word “Country” on both sides of the association means the field named “Country” in the City class, not to the Class named Country. City.Country holds a value that must be equal to the Code field value in class Country for every pair of City and Country participating in the relationship. Think about it for a minute and make sure you totally “get it” before moving on.

Last detail is that the Entity property on DLINQ’s EntityRef type is the one we must use to get and set EntityRefs on the background private party storage field, _myCountry.

There’s no freedom in the pattern, here. This is the exact way to express one-to-many relationships in DLINQ. Everything has to be wired up exactly this way. If we were NOT creating our database from scratch, we would just let the SQLmetal utility generate this goo for us from a pre-existing SQL schema, which we would have created from some other SQL toolsets I didn’t want to learn about. Furthermore, I don’t like stuff that generates VB code where I have no idea what’s going on, so I insisted on writing this sample by hand. They tried to tell me not to do it, but I march to a different drummer.

Ok, we’re done with almost all the hard stuff. Let’s get through some easy stuff:

Public Class Mondial

Inherits DataContext

Public Sub New(ByVal connection As String)

MyBase.New(connection)

End Sub

Public Countries As Table(Of Country)

Public Cities As Table(Of City)

End Class

DataContext is a DLINQ base class; we must inherit from it for our own database, and we must pass in a connection string when we create one of these. The connection string is the LAST bit of hard stuff. There is a cottage industry of helpage for these nasties: see http://www.connectionstrings.com for instance. They are amazingly brittle, so I just wish you luck. Here is the one I used (and I had to get help to construct it):

Private ReadOnly conn1 = "Data Source=.\SQLEXPRESS;AttachDbFilename=" & _

"'C:\Documents and Settings\brianbec\My Documents\Mondial.mdf';" & _

"Integrated Security=True;Connect Timeout=30;User Instance=True"

My suggestion is that you change only the highlighted thing to your user name. Even though I thought I understood this connection string, I found by bitter experience that I had no flexibility whatever with it. I couldn’t change the directory, the Timeout, the order of terms, NOTHING. If I looked at it sideways, I would get an exception from deep in the bowels.

Ok, that’s it, the rest is easy:

Module Module1

Public UseTinyData As Boolean = True

REM set to False to use big, slow data set

Public CountryFromCode As Dictionary(Of String, Country)

Private ReadOnly conn1 = ... (as before

Public db As Mondial

Public Sub Init()

CountryFromCode = New Dictionary(Of String, Country)

If db.DatabaseExists Then

db.DeleteDatabase()

End If

db.CreateDatabase()

If UseTinyData Then

InMemoryTinyDataInitializer.CountryData(db)

InMemoryTinyDataInitializer.CityData(db)

Else

InMemoryDataInitializer.CountryData(db)

InMemoryDataInitializer.CityData(db)

End If

db.SubmitChanges()

End Sub

That just sets up our data tables, and yes, we do use an in-memory index to quickly set up the foreign-key attributes of type EntityRef, but we don’t have to at all. We also have two different data sets, one tiny, for debugging and development, and the full one with 3,053 city rows.

Module InMemoryTinyDataInitializer

Private Sub AddCountry(ByVal db As Mondial, ByVal c As Country)

End Sub

Public Sub CountryData(ByVal db As Mondial)

AddCountry(db, New Country {Name := "Albania", Code := "AL", Capital := "Tirane", Province := "Albania", Area := 28750, Population := 3249136})

AddCountry(db, New Country {Name := "Greece", Code := "GR", Capital := "Athens", Province := "Greece", Area := 131940, Population := 10538594})

End Sub

Private Sub AddCity(ByVal db As Mondial, ByVal c As City)

c.MyCountry = CountryFromCode(c.Country)

End Sub

Public Sub CityData(ByVal db As Mondial)

AddCity(db, New City {Name := "Tirane", Country := "AL", Province := "Albania", Population := 192000, Longitude := 10.7, Latitude := 46.2})

AddCity(db, New City {Name := "Shkoder", Country := "AL", Province := "Albania", Population := 62000, Longitude := 19.2, Latitude := 42.2})

AddCity(db, New City {Name := "Durres", Country := "AL", Province := "Albania", Population := 60000, Longitude := 19.3, Latitude := 41.2})

AddCity(db, New City {Name := "Vlore", Country := "AL", Province := "Albania", Population := 56000, Longitude := 19.3, Latitude := 40.3})

AddCity(db, New City {Name := "Elbasan", Country := "AL", Province := "Albania", Population := 53000, Longitude := 20.1, Latitude := 41.1})

AddCity(db, New City {Name := "Korce", Country := "AL", Province := "Albania", Population := 52000, Longitude := 20.5, Latitude := 40.4})

AddCity(db, New City {Name := "Athens", Country := "GR", Province := "Greece", Population := 885737, Longitude := 23.7167, Latitude := 37.9667})

AddCity(db, New City {Name := "Thessaloniki", Country := "GR", Province := "Greece", Population := 406413, Longitude := 22.95, Latitude := 40.6167})

AddCity(db, New City {Name := "Piraeus", Country := "GR", Province := "Greece", Population := 196389, Longitude := Nothing, Latitude := Nothing})

AddCity(db, New City {Name := "Patrai", Country := "GR", Province := "Greece", Population := 142163, Longitude := Nothing, Latitude := Nothing})

AddCity(db, New City {Name := "Larisa", Country := "GR", Province := "Greece", Population := 102426, Longitude := Nothing, Latitude := Nothing})

AddCity(db, New City {Name := "Iraklion", Country := "GR", Province := "Greece", Population := 102398, Longitude := Nothing, Latitude := Nothing})

AddCity(db, New City {Name := "Volos", Country := "GR", Province := "Greece", Population := 71378, Longitude := Nothing, Latitude := Nothing})

AddCity(db, New City {Name := "Kavalla", Country := "GR", Province := "Greece", Population := 56705, Longitude := Nothing, Latitude := Nothing})

End Sub

End Module

I’ll leave the main routine and the timing details to the project I uploaded. Have fun!

##### Got REPL? Or, "Please Don't Shake My Jigsaw Puzzle!"

One recurring but subliminal theme at the recent Compiler Dev Lab (via Devhawk) was how nice it is to do incremental, interactive development. I argue, here, that not only is it nice, it’s actually critically important. The argument is statistical.

If you make ten changes to a program all at once, the number of ways of getting all ten of them right is just one, but the number of ways of getting at least one of them wrong is 1,023. And, if you get even one of them wrong, then you’ve just inserted subtle bugs into your program that you – or even worse, someone else – may not find till much later.

Digest that for a minute. If you make ten lousy changes to a program before trying them out, your chances of getting the whole thing right is just one in a thousand. It never ceases to amaze me, watching programmers at work. They’ll type in ten lines of code and then spend all day debugging them. Whereas, if they had found a way to type them in one at a time, testing as they go, then they would only have had ten ways to mess up, instead of a thousand.

Multiply that kind of inefficiency by 1,000 developers, and suddenly we can see why software engineering in practice is anything but engineering.

Look at it this way: would you solve a jigsaw puzzle by fitting in one piece at a time, or by trying to get ten pieces all in at one go? The latter strategy is a non-starter. You might as well try to solve the puzzle by putting the pieces back in the box and shaking them.

So, what… How are we going to make our changes one at a time instead of in batches? For that, let’s go back to the Compiler Dev Lab.

The most popular talks were those wherein the speaker typed a line of code and then something interesting happened on the screen. F#, Dyalog APL, Iron Python, Monad, all went along this way. As more lines were typed, more and more sophisticated things happened, building exponentially on the things that happened just a line or two prior.

The host for such a development style is a REPL (pronounced “Repple”), which stands for Read-Eval-Print Loop. Such things are commonplace staples of programming languages since the 1950’s, pioneered with Lisp, Forth, Basic, and so on. The alternative is an ECDL (pronounced “Eckdull”), for Edit-Compile-Debug Loop. An ECDL is, in reality, just a semi-interactive simulation of a card punch, card reader, and line printer (those of you under the age of 45 will have no idea what I’m talking about). People who grew up with PL/1, Fortran, C, Java, are comfortable with ECDLs and may never even have seen a REPL.

But, almost any language can be hosted in a REPL or an ECDL, and some development environments, like those for PLT-scheme and Hugs, support both.

But here’s the main point: a REPL helps you make just one change at a time. Make a change, hit return, bango it’s executed. If you made a mistake (which we all do about half the time), you see it right away, NOW, not later, hidden in the weeds with 1,022 other potential mistakes.

The speakers knew that REPLs create an overwhelmingly compelling development experience, and thus, an overwhelmingly compelling presentation experience. After all, what good is a talk about your technology if your audience dozes off? Their knowledge of this is so deep that a couple of them even created specially doctored REPLs just for presentation purposes.