February 2004 - Posts

Luna's REPL

Luna’s REPL (Read-Eval-Print Loop) reveals a tremendous amount about the internals of the compiler / scripter. First, the namespaces used include “Luna.Types” and “Luna.Functions,” which we will get to in time:

using System;

using System.IO;

using Luna;

using Luna.Types;

using Luna.Functions;

The namespace provided is “Luna.REPL,” so other programs might embed Luna and have a scripter to invoke:

namespace Luna.REPL

{

      public class REPL

      {

“Start” has several overloads so that it can take input from any TextReader and send output to any TextWriter:

            public static void Start(TextReader input, TextWriter output)

            {

First off, create an instance of type “Luna,” the main purpose of which is to store an InteractionEnvironment, which is a Dictionary for installing and looking up the values, or “bindings” of variables:

                  Luna Luna = new Luna(input, output);

Next, write some banner text and get to the business of reading, evaluating, and printing in a loop that goes on forever:

                  output.WriteLine("\nWelcome to Luna " + Luna.Version + "\n");

                  while (true)

                  {

                        output.Write("luna>");

                        object result;

                        try

                        {

                              object sexp = Read.CSApply(input);

That last line of code reveals the pattern for invoking functions. “Read” is an implementation of the Scheme primitive by the same name. In Scheme, one would write “(read input)”, and the implementation of the same procedure call in C# is Read.CSApply(input). So, it’s easy to generalize and represent every Scheme function with a C# class of the same name that exposes a static “CSApply” method taking the arguments. Applying a function to its arguments: that’s the pattern to exploit via virtual functions. “Apply” is the Scheme primitive, and Luna supports it as a virtual function, too, usually by simply calling the static “CSApply.” I do not know what Luna’s author meant by the abbreviation “CS”. The return value of “Read.CSApply” is of type “object” and represents an “S-Expression,” the base data type in Scheme.

                              result = Eval.CSApply(sexp, Luna.InteractionEnvironment);

That line of code applies “Eval” to the s-expression returned by “Read” and returns a result. The next code block catches any exception thrown above and makes it the result of the current iteration of the loop. If the result happens to be the global instance “Exit.Bye,” we break out of the forever loop:

                        }

                        catch (Exception e)

                        {

                              result = e;

                        }

                        if (result == Exit.Bye)

                        {

                              Write.CSApply("Good Bye.", output);

                              break;

                        }

If everything has been OK so far, apply “Write” to the result and the output, and toss in a newline for hygiene. “Write” is pretty much the same thing as “Print”:

                        else

                        {

                              Write.CSApply (result, output);

                              output.WriteLine();

                        }

                  }

            }

 

Here are a few convenience overloads and a “Main” routine that simply calls the parameterless one, which hooks the console up to the two-argument version:

            public static void Start(TextReader input)

            {Start(input, System.Console.Out);}

            public static void Start(TextWriter output)

            {Start(System.Console.In, output);}

            public static void Start()

            {Start(System.Console.In, System.Console.Out);}

            public static void Main(String[] argv)

            {Start();}

      }

}
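For readers without the Luna sources at hand, the control flow walked through above can be imitated in a few lines of Python. This is only a sketch of the pattern, not Luna's code: the method name cs_apply, the toy one-line reader, the integer-or-variable evaluator, and the helper run_session are all assumptions made for illustration, and treating Exit.Bye as an exception instance is a guess at how the sentinel reaches the loop.

```python
import io

class Exit(Exception):
    """Hypothetical stand-in for Luna's Exit class."""

Exit.Bye = Exit("bye")  # the single global instance the loop tests for

class Read:
    @staticmethod
    def cs_apply(inp):
        # Toy reader: one line is one expression; end of input acts like (exit).
        line = inp.readline()
        return line.strip() if line else "(exit)"

class Eval:
    @staticmethod
    def cs_apply(sexp, env):
        # Toy evaluator: (exit) raises the sentinel, integers evaluate to
        # themselves, and any other text is looked up as a variable.
        if sexp == "(exit)":
            raise Exit.Bye
        try:
            return int(sexp)
        except ValueError:
            return env[sexp]  # raises KeyError for an unbound name

class Write:
    @staticmethod
    def cs_apply(value, out):
        out.write(repr(value) if isinstance(value, Exception) else str(value))

def start(inp, out, env):
    while True:
        out.write("luna>")
        try:
            result = Eval.cs_apply(Read.cs_apply(inp), env)
        except Exception as e:
            result = e  # an error simply becomes the value to print
        if result is Exit.Bye:
            Write.cs_apply("Good Bye.", out)
            break
        Write.cs_apply(result, out)
        out.write("\n")

def run_session(text, env):
    """Feed a whole session through the loop and return everything it wrote."""
    out = io.StringIO()
    start(io.StringIO(text), out, env)
    return out.getvalue()
```

Because input and output are passed in as parameters, the same loop can be driven from the console or, as in run_session, from in-memory strings, which is the point of the TextReader and TextWriter overloads.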

Next time, we’ll get into “Read” and see how it creates S-expressions.

 

 

Posted Monday, February 23, 2004 5:44 PM by brianbec | with no comments

Luna analysis Intro (details pending solution to posting code)

I've been worrying about implementing compilers and interpreters for a long time. As a platform for hands-on R&D, nothing beats a listener-based implementation, where one enters statements and expressions from the console and files directly into the system, essentially combining the compile step and the 'go' step. Such things also double as scripters if they can be extended to call arbitrary libraries. My first foray into this world entailed David Betz's XLisp and the Silicon Graphics GL library, a precursor to OpenGL. I used a “Glue” approach, writing custom wrappers and designing data-structure conversions for over 700 routines and types. I wasn't real happy with the brute-force approach, especially since TONS of exploitable, repetitive patterns emerged as I slavishly performed cut-paste-edit, but I didn't have time to learn the assembler and C-compiler internals necessary to graduate to wrapper generation. By the way, I believe Eric Raible, who was aware of my effort at the time, helped coin the term Superglue with his co-author, J. P. M. Hultquist. I frequently use “Superglue” to refer to general strategies for generating wrappers automatically as opposed to “Glue,” which is writing wrappers by hand.

I eventually solved the “Superglue” problem completely for the Windows platform using Siod, George Carrette's marvelously tiny interpreter. I copied my paper about that a few days ago to this blog.

To push Superglue into the CLR world, I started with a copy of Luna Scheme. After a few bug fixes, I got this off the ground and wrote stressful regression tests over the Y-combinator and the usual combinatorially explosive Fibonacci and so on.

Being based on CLR, Luna is an incredibly lucid implementation of a compile-and-go scripter, because vast amounts of infrastructure are already provided by CLR, and the code to implement the compiler is very simple. Every such scripter works by running a “REPL”, or Read-Eval-Print Loop. “Read” reads expressions, “Eval” evaluates them, and “Print” prints them. Reading an expression is converting it from a textual representation to an internal representation. Evaluating an expression is retrieving values of variables, expanding syntactic macros, and applying functions to arguments. Printing an expression is converting it from internal representation to external, ASCII, readable form.
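The three stages can be made concrete with a toy version in Python. This is a sketch under loud assumptions: the function names read, evaluate, and to_text are invented here, the internal representation is simply nested Python lists, only integers, symbols, and function application are handled, and macro expansion is left out entirely. It is not how Luna does it.

```python
import operator

def read(text):
    """Text -> internal form: nested lists of ints and symbol strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(pos):
        tok = tokens[pos]
        if tok == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = parse(pos)
                items.append(item)
            return items, pos + 1  # skip the closing parenthesis
        try:
            return int(tok), pos + 1
        except ValueError:
            return tok, pos + 1    # anything else is a symbol

    expr, _ = parse(0)
    return expr

def evaluate(expr, env):
    """Look up variables and apply functions to evaluated arguments."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    fn = evaluate(expr[0], env)
    args = [evaluate(arg, env) for arg in expr[1:]]
    return fn(*args)

def to_text(value):
    """Internal form -> external, readable text."""
    if isinstance(value, list):
        return "(" + " ".join(to_text(v) for v in value) + ")"
    return str(value)

# A tiny environment of bindings, assumed for the example.
env = {"+": operator.add, "*": operator.mul, "x": 10}
```

With these definitions, to_text(evaluate(read("(+ x (* 2 3))"), env)) walks one expression through all three stages.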

HICCUP -- I was going to paste some code here and give a line-by-line analysis, but that will have to wait for some tools research. My first attempt at just pasting code text here, even under 'verbatim' and 'tt' HTML tags, didn't work at all.  Back soon with real meat...

Posted Friday, February 20, 2004 12:58 PM by brianbec | with no comments


Teaser: Sending Email from a Script

I'm going to be blogging about how to write a scripting engine in the CLR. As a teaser, let me show you what I can do with the scripter I've already ginned up. There's not a whole lot of effort expended in creating this scripter, and it has some warts -- some really big ones, which I won't hide, but it already can do quite a lot of things.

(define fwdir "c:/WINDOWS/Microsoft.NET/Framework/v1.1.4322/")

That just defines a global variable (or symbol), “fwdir”, whose value is the path in the string literal.

(define syslib (load-assembly (string-append fwdir "system.dll")))

That defines a global symbol, “syslib”, whose value is the result of calling “load-assembly” on the string concatenation of (the value of) “fwdir”, which we already defined, and the string literal “system.dll.” “define” and “string-append” are part of the IEEE 1178 standard language; “load-assembly” is one of my extensions for scripting in the CLR. “load-assembly” just returns an object of type “Assembly”, which we use in the Reflection API in various ways.

(define maildll (string-append fwdir "system.web.dll"))

You should be getting the idea by now...

(define maillib (load-assembly maildll))

Ditto... let's do something new:

(define email (new maildll "System.Web.Mail.MailMessage"))

That creates a new instance of the type “System.Web.Mail.MailMessage” in the assembly “maildll” and assigns the instance to the symbol “email”.

(invoke email "set_Body" "Test Body")

That invokes the “set_Body” method on the object “email” with the argument “Test Body”. “set_Body” is actually the “set” branch of the property “Body”; Reflection exposes this as an ordinary method named “set_Body” and my invoke routine just makes a really straightforward call to Reflection's Invoke. If we had written that line of code in C# instead of the scripter, it would look like email.Body = "Test Body";
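The idea of calling a member purely by its name can be shown with a Python analogy. This is not .NET Reflection and not the scripter's actual invoke routine: the MailMessage class below is a hypothetical stand-in, and because Python does not expose property accessors as methods named set_X and get_X the way the CLR does, the sketch has to emulate that mapping by hand.

```python
class MailMessage:
    """Hypothetical stand-in for System.Web.Mail.MailMessage."""

    def __init__(self):
        self._body = ""

    @property
    def Body(self):
        return self._body

    @Body.setter
    def Body(self, value):
        self._body = value

def invoke(target, method_name, *args):
    """Call a member by name; 'set_X' and 'get_X' fall back to property X."""
    member = getattr(target, method_name, None)
    if callable(member):
        return member(*args)
    # Emulate the CLR convention of accessor methods for properties.
    prefix, _, prop_name = method_name.partition("_")
    prop = getattr(type(target), prop_name, None)
    if isinstance(prop, property) and prefix == "set":
        return prop.fset(target, *args)
    if isinstance(prop, property) and prefix == "get":
        return prop.fget(target)
    raise AttributeError(method_name)

email = MailMessage()
invoke(email, "set_Body", "Test Body")  # same effect as email.Body = "Test Body"
```

The caller supplies only strings and values, so a script can reach any member without the scripter knowing the type in advance.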

(invoke email "set_Subject" "Test Subject")

(invoke email "set_From" "few@boar.com")

(invoke email "set_To" "brianbec@exchange.nowhere.com")

Those are just some more property “set”s. Let's get an email server and send the message!

(define server-ip (invoke-static syslib "System.Net.Dns" "Resolve" "df-keekyoo"))

That assigns to the symbol “server-ip” the result of invoking the static method “System.Net.Dns.Resolve” in the assembly syslib for the exchange server “df-keekyoo”. The “invoke-static” procedure requires us to separate the fully qualified type of the target routine, namely “System.Net.Dns”, from the name of the target routine, namely “Resolve”. Sorry about that.

(define address (vector-ref (invoke server-ip "get_AddressList") 0))

Invoking the method “get_AddressList”, which, of course, is really the “get” branch of the property “AddressList”, on the “server-ip” object returned in the prior call to “System.Net.Dns.Resolve” results in an array of addresses. We access the 0-th component of the array via the IEEE-standard routine “vector-ref” to get the stuff we need for the next call. It all just works... nice.

(invoke-static maillib "System.Web.Mail.SmtpMail" "set_SmtpServer"
   (invoke address "ToString"))

In an inner call, we invoke the .NET-standard “ToString” method on the “address” object to get the argument for a call of the static routine “System.Web.Mail.SmtpMail.set_SmtpServer” in the “maillib” assembly. This call is for side-effect, so we discard its return value.

(invoke-static maillib "System.Web.Mail.SmtpMail" "Send" email)

Finally, we send the email object.  Boom!  We're done.  Slick, eh?

Posted Tuesday, February 17, 2004 4:51 PM by brianbec | with no comments


Generalized Grimm's Law versus Precision

Grimm's Law documents certain technical points of language evolution. It notes a general softening of consonants and a general blurring of distinctions over time. Latin 'pater' becomes English 'father'. The formerly different consonants 'bh' and 'b' become the same consonant 'b'. I speculate that some sort of generalized Grimm's law applies to many aspects of language and thinking, particularly technical language. I've noticed, over time, a general softening of edges and blurrings of distinctions. Here are some examples:

“Lie” and “lay.” People used to care about the difference. “I lie down on the couch” meant now, “I lay down on the couch” meant some time in the past, and “I lie the book on the table” was unsayable. I don't think anyone cares about the difference any more.

The short 'e' and the short 'a' are merging. Very young people, especially, pronounce “friends” as “frands” and “help” as “halp”, for instance. Even television talking heads are not careful about the distinction. Next time someone on the phone asks if you have a “pen” handy, you might reply with a question: “cast-iron or aluminum with teflon?”

“It's” and “Its” used to be different. My mnemonic was “It's lost its apostrophe.” However, even in very formal writing, I see “it's” used for the possessive, something that used to be a single point of failure in a school essay. If you wrote “... format it's hard drive ...” you'd get an “F” on your paper, no matter what else was in it.

Singular and plural used to be different. “Every user opens their inbox” was a train wreck, “user” being singular and “their” being plural. Hypersensitivity to sexism, however, has forced a blurring of this distinction, since no one wants to write “Every user opens his inbox“ -- that's annoying, “Every user opens his or her inbox” -- that's awkward, or “Every user opens her inbox” -- that's distracting. I don't know why no one notices that “Every user opens the inbox” would avoid the train wreck.

“Which” and “that” used to be different. One used to require a comma before “which”. The following two sentences used to have slightly different shades of meaning: “The book, which is on the table, is mine” and “The book that is on the table is mine.” The first sentence simply noted a peripheral fact about the book -- the fact that it's on the table. The second sentence noted a critical distinction between the book on the table and every other book. “The book which is on the table is mine” would get you an “F” on your school essay. No one cares about this distinction any more.

Hyphens used to be important. A “first-class object” made it clear that the object was first-class, whereas a “first class object” blurs the distinction with the very first object of a class. Used to be “first class object” would get you an “F” on your paper, but no one seems to notice the blurring of distinctions afforded by forgetting the hyphen.

I could go on, but suffice it to say that the overall, slow-but-steady blurring of distinctions over time results in a kind of global loss of precision in language. I think it's an entropic effect. Nevertheless, the loss of precision means that we tend to need more words to make our point. Papers get longer AND less precise. So, here's the value judgment: I don't think this is a great trend. I think careful thinkers -- those concerned with hygiene of thought -- will notice these trends and try to swim upstream against the driving current of entropy.

Posted Tuesday, February 17, 2004 11:59 AM by brianbec | 2 comment(s)


Why Script?

Just in case the case needs to be made…

Why bother scripting in the first place? Why not just write a program to solve your problems? A scripter is a program, of course, just one that accepts programming input from a console or a file. Scripting is a different architecture for solving a problem. The scripting architecture goes like this: let me write a program that CAN solve my problem given appropriate input in a script. The program architecture goes like this: let me write a program that DOES solve my problem; input and algorithms are all built-in to the program.

Why bother writing a bunch of scripting infrastructure on the way to solving some problem rather than just getting to the job of solving the problem? The reasons are the following:

1.      The scripter infrastructure can be reused for future problems

2.      It’s more flexible to have the input outside the program, where it can be changed independently

a.       Having input outside the program saves a compilation step. The compilation step can be anywhere from inconvenient to impossible (imagine having to recompile Windows every time you changed win.ini? Same goes for any application with .ini files. The gizmo that reads the .ini file is a miniscripter built into the application.)

3.      Distributing the program to others is easier: users don’t need source to the program, just the script

4.      Scripting supports interaction for pedagogy, exploration, discovery, tinkering, debugging; hard-baked programs don’t

5.      Scripting, as opposed to mousey GUI interaction, supports iteration and conditionals (it’s hard to write a GUI that lets you replicate some gesture N times or do some click and drag only if some logical condition is true)

6.      Scripting supports unattended automation

7.      Scripting saves programmer time

8.      With extension technology, scripters can call ANY library on the system

Let me contrive an example to illustrate some advantages of scripting. Let’s say you want to write a program to solve Rubik’s Cube. Somewhere along the way, you’re going to write routines to rotate faces of the cube and some other routines to rotate the whole cube. Let’s call the face-rotation routines u, d, l, r, f, and b; for up, down, left, right, front, and back. Calling u results in one clockwise rotation of the current up face. To get a counterclockwise rotation, just call u three times. Let’s call the cube-rotation routines U, R, F; we don’t need D, L, and B since D would just be U U U. The cube-rotation routines change the identity of the faces; for instance, U is faceR→faceF, faceF→faceL, faceL→faceB, and faceB→faceR.

Now, with a scripting architecture, your program would just read commands from a listener console or from a file and perform rotations on a display of the cube. You could have all your complexity in the script, where you can change it without recompiling your program. For instance, if you could also call random number generation from your script, you could scramble the cube by building up a string of the commands listed above. If your scripter supported abstraction, you could define new macros, like u’ ← u u u, representing a counterclockwise rotation, or c1 ← (u’ r u) representing a commutator. With enough programming muscle in the scripter, you could write your analyzer and solver entirely in script, also. A GUI is going to have a hard time giving you any facilities for defining macros: you have to have the macros in your head and execute them in ‘unrolled’ form by dragging the faces of the cube around.

Composition would let us build up whole solutions and libraries of macros, all in the script. Here’s the point: once we wrote the scripter, we wouldn’t have to write much compiled code at all to work on the solution of Rubik’s cube, and we wouldn’t have to do lots of tedious clicking and dragging to perform manual solutions. Without a scripter, we would have to build up all our macros, compositions, analyzer, and solver, all in compiled code. The extra compilation step every iteration through the code adds up to a lot of precious programmer time. Having macro, analyzer, and solution code mixed in with the code that just mechanically moves and displays faces blurs the distinction and reduces our opportunities to see and exploit the differences.
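To make the macro idea tangible, here is a small Python sketch of two pieces of such a scripter: expanding macros down to primitive commands, and the face relabeling performed by the cube rotation U. Everything in it is an assumption made for illustration (the names U_RELABEL, rotate_cube_U, expand, and the dictionary of macros); it models only the bookkeeping of face names, not the stickers of a real cube.

```python
# U relabels the side faces as described in the text: R->F, F->L, L->B, B->R.
# The up and down faces keep their names.
U_RELABEL = {"R": "F", "F": "L", "L": "B", "B": "R", "U": "U", "D": "D"}

def rotate_cube_U(labels):
    """labels maps each physical position to the face name it carries now."""
    return {pos: U_RELABEL[face] for pos, face in labels.items()}

def expand(script, macros):
    """Replace macro names by their definitions until only primitives remain.

    Assumes the macro definitions are not cyclic; a cycle would recurse forever.
    """
    out = []
    for word in script.split():
        if word in macros:
            out.extend(expand(macros[word], macros))
        else:
            out.append(word)
    return out

# The two macros mentioned in the text, written in the script's own terms.
macros = {"u'": "u u u", "c1": "u' r u"}
```

Here expand("c1", macros) unrolls the commutator into the primitive moves a GUI user would have to perform by hand, and applying rotate_cube_U four times brings every face name back to where it started, which is why the opposite rotation can be spelled U U U.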

 

Posted Sunday, February 15, 2004 12:51 PM by brianbec | with no comments


How to Script, Help me Test, Help my Math

Three quick things this morning. TOPIC 1: HOW TO SCRIPT. You may see my first post below on extending scripters with dynamically loaded libraries via wrapper generation, or superglue. Such is very much more interesting nowadays in the presence of CLR, mostly because

1.      scripters are MUCH easier to write

2.      DLLs come with full metadata, so we no longer need guesswork and empirical science (e.g., on export-syntax decorations)

3.      Reflection makes it VERY easy to load and to emit code. We need both for superglue. In the old days, we had to do this manually by reading and writing machine code.  

So I’ll be doing several things in this general, long-running topic over the future. First, I’ll show how to write a scripter on CLR by analyzing Luna-Scheme. We want to focus on implementation issues, and Scheme is a good choice for the following reasons:

1.      we don’t have to innovate in language design: we let someone else design the language, for instance, IEEE Standard 1178

2.      it’s very well documented how to compile and interpret this language, so we don’t have to do a lot of gratuitous innovating there (see SICP and EOPL for instance)

3.      it’s a small and simple language, by design, so we don’t have lots and lots of gewgaws, gargoyles, gadgets, and doodads to implement

In fact, the whole point of Extension as an architectural approach is precisely to put as much richness, depth and complexity OUT of the scripting language and INTO the libraries. Make the scripter as simple and small as we can; put all the complicated stuff in libraries. Building a big, feature-packed, hairy programming language at the scripting level is not what we’re after.

TOPIC 2: HELP ME TEST. Okay, it’s pretty obvious that I’m filling up my blog with test posts, trying to get math into the blog. Someone pointed out that it would be NICE if I didn’t do that, but, of course, I need to test. So, if someone could tell me HOW to post tests off the “Main Feed” of the blog, I’d be grateful. I am an ABSOLUTE NEWBIE to blogging, so lead me by the hand. The ONLY things I know how to do are the things you see here. I can write stuff and post it via default options of Newsgator and TonesNotes. That’s it, I don’t know any more. I don’t know how to distinguish the “Main Feed” from any other feed.

TOPIC 3: HELP MY MATH. Okay, it’s pretty obvious that I’m getting nowhere fast trying to post math into my blog. Anyone who successfully does math posting via WebTeX, MathType, MathML, Mathematica (Wolfram), you name it, if you’d be willing to lead me by the hand, I’d be grateful. Once again, I’m a blogging MORON, but I’m good at using all those other tools.

 

Posted Sunday, February 15, 2004 11:04 AM by brianbec | 1 comment(s)


The physics of racing

My old series of physics articles can be found at http://phors.locost7.info.  If I can figure out how to post Mathematical formulae and code to this forum, I may decide to write new articles in the Blog rather than trying to create PDF files and find hosting sites for them.

Posted Friday, February 13, 2004 7:49 PM by brianbec | 4 comment(s)

Email link below is bogus

The old email address, brianbec@hotmail.com, no longer exists.  Sorry about that, but the quote below is from an old essay of mine (http://www.angelfire.com/wa/brianbec/siodffi.html)

Posted Friday, February 13, 2004 7:47 PM by brianbec | 1 comment(s)

Wrapper Generation for Scripters

I'm still trying to get the hang of blogging.  Please forgive me if the format of the following QUOTE of an old, public essay of mine is terribly broken.  I don't have a good way to preview things before I post them.

 

Calling the C world from the Scheme World

A brief essay, with examples, by

Brian Beckman

Updated 12 January 2000 (thanks to John D. Corbett for probing questions)

Original 19 August 1999

brianbec@hotmail.com

What are we trying to do?

We'd like to call programs written in C and C++ from programs written in Scheme. You thought I was going to say "and vice versa", but that's not entirely true. Some C programs call other C programs through explicit function pointers, or callbacks. The only case in which we need C to call Scheme is the case of callbacks. In other words, we want to call C and C++ from Scheme and we want to write our callbacks from C and C++ in Scheme.

 

To be a bit more blunt about it, we want Scheme to be in charge, and we want C and C++ programs to be the unwitting slaves of Scheme. In other words, we'd like our Scheme to be able to call any C/++ code whether that code was ever intended to be called by some other programming system. And, we want our methods to scale to real-world C/++ programs. For example, we'd like to call kernel32.dll and oleaut32.dll from Scheme code.

 

We can expand our ambitions to C++ vtables and to COM. If we can call C, we ought to be able to call through a vtable, which is just an array of (non-callback) function pointers. So, our class of unwitting slaves includes all of C++ and COM, including OLE Automation Servers, because all of these are based on vtables. From now on, then, I'll write “C" and mean “C and C++ and COM and OLE Automation" because we can handle all these cases in the same way.

 

We require only that the target C code be packaged in a DLL, or Dynamically Linked Library, and that the DLL export its entry points symbolically. DLLs are a very general standard for symbolic linking at runtime in the C-programming world on Microsoft platforms like Windows. Instead of using the operating system's symbolic linking loader, we'll just gin up our own inside an implementation of Scheme. This is how we're going to call C functions in DLLs.

 

All that said, this is just a demonstration. We're not actually trying to make an industrial-strength, shippable Scheme product that can call C. Instead, we're trying to demonstrate solutions to the "hard" problems of callbacks and transformations between C types and Scheme types, and leave the problems of incorporating these solutions into a product-grade implementation to others. However, this is not mere theory: we show actual solutions to these problems in real, scalable code, and we call big, hairy DLLs like kernel32.dll and oleaut32.dll directly, with no glue code written in C, and we’ve stressed this by calling millions of iterations over the “hard” bits.

Why would you want to do that?

Up front, let me say that I'm not trying to advertise or proselytize or push my work in the slightest way.  I built it for my own reasons and just thought some other Scheme and Siod users might find it amusing if not useful. 

 

C is a wonderful language for writing libraries of special-purpose and high-performance components. It's a lousy language for algorithms, data structures, scripting, distributed command and control, experimentation, prototyping, learning by doing, and so on. That is, it’s so-so when you need lots of flexibility and elbow room or when you need to focus on the application domain. C forces you to do all your own pipe-fitting and resource management, as well as to invent data formats and write custom I/O code for your application data.

 

Scheme is great for flexibility and elbow room and for maximizing the impact of your precious programming hours by taking care of data formats, types, and resources for you. But Scheme is not so good for special-purpose, high-performance libraries.

 

So, let's use C for what it's good for and Scheme for what it's good for. Let's also decide that we want to use all kinds of C libraries that might only have been intended to be called from other C programs. We want to be able to use libraries tomorrow we haven’t thought about today without having to drop everything and invent new, abstract wrappers for them. So we need to make our Scheme smart enough to fool C into thinking it's being called by C. This isn't as hard as it sounds.

C isn't Safe; how can you call it?

Everyone knows that Scheme is safer than C. Most memory-corruption bugs are simply impossible in Scheme, and they're quite easy in C. In fact, C is no safer than assembly language, and it's that way on purpose. C is meant for hardcore programming close to the machine. While it has nice abstraction features like static type-checking and object-oriented programming, it always gives you a way to break the rules when you need to. If we're going to call arbitrary C code from Scheme, we need to deal with the fact that C code doesn't have anything like Scheme's notion of safety.

 

We have to open holes in Scheme's safety nets, but we can close them all up on the Scheme side (even though we haven't done that completely in this demonstration package). This means that we can do all sorts of potentially unsafe things in some isolated bits of Scheme code, and we can stress, debug, test, and quarantine all the unsafe code on the other side of a well-defined fence. We'll show lots more on this below.

What is the C World?

This is best shown by example. Consider the following C program:

 

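#include <stdio.h>
#include <string.h>

/* Takes no arguments and returns a plain machine-word integer. */
__declspec (dllexport) int __stdcall
TestVictim0 () {
   return 42 ;
   }
/* Prints the string it receives and returns the same pointer. */
__declspec (dllexport) char * __stdcall
TV4 (char * s) {
   printf("Got this: \"%s\"\n", s) ;
   return s ;
   }
/* cdecl entry point; appends t to s in place, so s must have room for t. */
__declspec (dllexport) char * __cdecl
TV5cdecl (char * s, char * t) {
   return strcat (s, t) ;
   }
/* Pointer to a function taking an int and returning an int. */
typedef int (*PFII) (int) ;

/* Calls back through f with the product of the two integers. */
__declspec (dllexport) int __stdcall
TVcallback (int i1, int i2, PFII f) {
   int i = f ( i1 * i2 ) ;
   return i ;
   }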

 

That is source code for a little DLL that has four exports, TestVictim0, TV4, TV5cdecl, and TVcallback. Trust me, the methods we develop here work for very big DLLs, like kernel32.dll and oleaut32.dll, and we show it below.

 

This DLL is completely in the C world. There is nothing in here about Scheme-like types. These exports give and take machine-word integers; strings, as machine-level addresses; and functions as machine-level addresses. All C-like stuff. Also, some of the entry points are cdecls, i.e., where caller pops arguments, and some of them are stdcalls, i.e., where callEE pops arguments. The former have the advantage of flexibility, that is, you can implement functions of variable arity, like printf, using cdecls. The latter have the advantage of brevity: call sites do not need to take up space with argument-popping instructions.

 

If we can call cdecls and stdcalls with ints, strings, and function pointers, we will have a sizeable fraction of the C world covered. Further, we argue, we've got all the hard cases covered and the rest is straightforward.

What is the Scheme World?

By example, once again. What should calls to the entry points above look like? A very simple, but surprisingly useful assumption is that C functions return machine words, which, on 32-bit architectures, are DWORDS or unsigned longs. The C language permits a function to "return a struct", that is, an aggregate much larger than 32 bits. But, this facility is not used very often in practice. Furthermore, compilers would be likely to implement the facility by putting the data somewhere and having the function return a pointer in a DWORD.

 

The upshot is that we may cover the vast majority of industrially important cases simply by assuming that C functions return DWORDs. This makes the Scheme "superglue" easier. The "superglue" is the generic C code we need to write as part of the Scheme implementation to extend it so it can call arbitrary C code. This is different from special-purpose "glue" that would extend Scheme so it could call a particular C function. If you haven't gathered it by now, we eschew any use of "glue" in favor of "superglue".

 

All that said, here is a real trace of some calls from Scheme, through Superglue, to the DLL shown above. The first one is easy to understand--the function returns 42, which superglue turns from a C number into a Scheme number. The next two return addresses, interpreted as Scheme numbers. To get at the data inside, we need some superglue for converting from addresses in the C world into values in the Scheme world. The last one converts the Scheme closure into a C function (by dynamically painting code bytes in memory), calls it, and returns the answer.

 

(TestVictim0) --> 42
(TV4 "Hello World") --> 0x7B6D80
(TV5cdecl "yadda yadda, " "blah blah") --> 0x7B6C00
(TVcallback 3 4 (lambda (x) (* 2 x))) --> 24
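The callback case has a close analogue in Python's standard ctypes module, which can also wrap a closure in a native function pointer. The sketch below is not the SIOD Superglue: ctypes builds its thunk through libffi rather than by painting code bytes by hand, and tv_callback is a Python stand-in written for this example because the DLL above is not assumed to be available.

```python
import ctypes

# C signature int (*)(int), matching the essay's PFII typedef.
PFII = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)

def tv_callback(i1, i2, f):
    """Stand-in for TVcallback: calls f through its C function pointer."""
    return f(i1 * i2)

# Wrapping the lambda makes ctypes allocate a native thunk for it, so the
# call below crosses into C calling conventions and back into Python.
double = PFII(lambda x: 2 * x)
result = tv_callback(3, 4, double)

# The thunk has a real machine address that C code could be handed.
thunk_address = ctypes.cast(double, ctypes.c_void_p).value
```

As in the trace, the product 3 * 4 is passed to the closure, which doubles it.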

 

To look inside the pointers returned by TV4 and TV5cdecl, we use the Scheme Superglue function cstring->string, which takes a Scheme number, which must have the value of a C pointer pointing to a C string, and conses up a fresh Scheme string. So, we get the following:

 

(cstring->string (TV4 "Hello World")) --> "Hello World"
(cstring->string (TV5cdecl "yadda yadda, " "blah blah")) --> "yadda yadda, blah blah"
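The same conversion can be tried in Python with the standard ctypes module, whose string_at function copies a NUL-terminated C string from a raw address. This is an analogy rather than the Superglue implementation; the name cstring_to_string and the ASCII decoding are assumptions made for the example.

```python
import ctypes

def cstring_to_string(address):
    """Copy the NUL-terminated C string at a raw address into a fresh str."""
    return ctypes.string_at(address).decode("ascii")

# C-side storage for the text, NUL-terminated by ctypes.
buf = ctypes.create_string_buffer(b"Hello World")

# A plain integer holding the pointer value, like the numbers in the trace.
address = ctypes.addressof(buf)
```

As in the essay, the number is only meaningful while the C storage behind it is alive, which is exactly the kind of unsafe edge that has to be fenced off on the scripting side.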

Multiary Callbacks: WndProcs

 Now, in Windows, a WndProc is a quaternary callback, that is, a  

(lambda (hWnd msg wParam lParam) ...)

 

 So, given multi-ary (or multiadic) callbacks, and given that we have a way to populate C structs (see structs.scm in the source below), then we should be able to write Windows applications entirely in Scheme, and we can; here’s “Hello, World!”: 
 

(require "gdi32.scm")
(require "winuser.scm")
(require 'fm.scm)

(define (WndProc hWnd msg wParam lParam)
  (cond
   ((= msg WM_DESTROY) (PostQuitMessage 0))
   ((= msg WM_PAINT)
    (let* ((ps  (new-PAINTSTRUCT))
           (hDC (BeginPaint hWnd ps)))
      (TextOutA hDC 10 10 (string->cstring "Hello, World!") 13)
      (EndPaint hWnd ps) 0))
   (#t (DefWindowProcA hWnd msg wParam lParam))) )

(define (myCreateWindow)
  (let ((wc      (new-WNDCLASS))
        (clsNym  (string->cstring "GenericAppClass"))
        (appNym  (string->cstring "Generic Application"))
        (clsStyl (| CS_OWNDC (| CS_VREDRAW CS_HREDRAW)))
        (winStyl (| WS_OVERLAPPEDWINDOW (| WS_VISIBLE
                   (| WS_HSCROLL WS_VSCROLL))))
        (hInst   #x400000)
        (msg     (new-MSG)) )

    (WNDCLASS::set-lpszClassName wc clsNym)
    (WNDCLASS::set-lpfnWndProc   wc (callback WndProc ))
    (WNDCLASS::set-style         wc clsStyl)
    (WNDCLASS::set-hInstance     wc hInst)
    (WNDCLASS::set-hbrBackground wc 6)

    (ChkHandleReturn (RegisterClassA wc))
    (ChkHandleReturn (CreateWindowExA 0 clsNym appNym winStyl
                                      0 0 CW_USEDEFAULT CW_USEDEFAULT
                                      0 0 hInst 0))

    (while (GetMessageA msg 0 0 0)
           (TranslateMessage msg)
           (DispatchMessageA msg)) ))

Hostile Foreign-Function Interface in SIOD

We wrote a Hostile FFI for George Carrette’s wonderful, little SIOD program (http://people.delphi.com/gjc/siod.html). The FFI is hostile because the C code doesn’t need to cooperate. We can run kernel32, COM, OLE Automation, etc. Please feel free to download the Visual Studio 6.0 Project. Build it, run it (you may have to adjust the project directories in the Debug Tab of the Project\Settings dialog box; make sure the program starts in the siodffi\scripts\scheme directory), and type

 

(load "regress.scm")

 

This will run a bunch of tests on Superglue, calling kernel32, calling raw COM, consing up callbacks like the ones documented above, and calling OLE Automation. Type

 

(load "windows.scm")

 

to see “Hello, World!” in action (wowee!!).

 

You can examine what’s going on and how we do it by just looking at the Scheme code in “regress.scm” and backtracking through the Superglue code. Most of that code is in “Realload.cpp”, and there isn’t very much of it. Superglue consists entirely of the following new Scheme functions implemented in Realload.cpp and slib.cpp:

 

init_subr_1 ("GetProcAlist",          GetProcAlist ) ;
init_subr_1 ("GetProcAlistCDecls",    GetProcAlistCDecls ) ;
init_subr_2 ("MainDispatch",          MainDispatcher ) ;
init_subr_2 ("MainDispatchCDecl",     MainDispatcherCDecls ) ;
init_subr_1 ("cstring->string",       StringFromCString ) ;
init_subr_1 ("string->cstring",       CStringFromString ) ;
init_subr_1 ("array->rgpv",           RgPvFromLispArray ) ;
init_subr_2 ("rgpv->array",           LispArrayFromRgPv ) ;
init_subr_1 ("array->cbytes",         CByteArrayFromArray ) ;
init_subr_1 ("array->carray",         CByteArrayFromArray ) ;
init_subr_2 ("cbytes->array",         ByteArrayFromCByteArray) ;
init_subr_3 ("carray->array",         SiodArrayFromCByteArray) ;
init_subr_0 ("ScmGetSystemDirectory", SiodGetSystemDirectory ) ;
init_subr_0 ("sgsd",                  SiodGetSystemDirectory ) ;
init_subr_2 ("newVariant",            newVariant ) ;

init_subr_1 ("num->lisp",             LispFromFlonum ) ;
init_subr_1 ("lisp->num",             FlonumFromLisp ) ;
init_subr_1 ("cptr->num",             FlonumFromCPtr ) ;
init_subr_1 ("num->cptr",             CPtrFromFlonum ) ;
init_subr_1 ("&",                     CPtrAddressOfCPtr ) ;
init_subr_0 ("cptr",                  newcptr ) ;
init_subr_1 ("val",                   indirect ) ;
init_subr_1 ("indirect",              indirect ) ;
init_subr_1 ("peek-dw",               indirect ) ;
init_subr_2 ("poke-dw",               pokeDw ) ;
init_subr_1 ("peek-byte",             peekByte ) ;
init_subr_2 ("poke-byte",             pokeByte ) ;

 

All the rest of the fun stuff is written in Scheme.
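The peek and poke primitives at the bottom of that list are what let Scheme code read and fill C structs. As a rough sketch of the idea, and not a copy of the code in Realload.cpp, they could be as simple as the following. The names `peek_dw`, `poke_dw`, `peek_byte`, and `poke_byte` and the (address, value) argument order are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a raw address. memcpy avoids alignment
   and aliasing problems that a direct pointer cast could cause. */
uint32_t peek_dw(uintptr_t address)
{
    uint32_t value;
    memcpy(&value, (const void *)address, sizeof value);
    return value;
}

/* Write a 32-bit value to a raw address. */
void poke_dw(uintptr_t address, uint32_t value)
{
    memcpy((void *)address, &value, sizeof value);
}

/* Read a single byte from a raw address. */
uint8_t peek_byte(uintptr_t address)
{
    return *(const uint8_t *)address;
}

/* Write a single byte to a raw address. */
void poke_byte(uintptr_t address, uint8_t value)
{
    *(uint8_t *)address = value;
}
```

None of these check that the address is valid, so a wrong number from the Scheme side crashes the process. That is the price of the "hostile" approach.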

Alternatives

Ok, we have a pretty comprehensive solution to this, but there’s another partial one in the world, too. The Rice University PLT group has come up with a system for calling ActiveX controls from Scheme when those controls are scripted in an HTML window. References are below.

DrScheme V100

This is very cool. It’s R4RS-compliant, and our gizmo is not. But, it only calls ActiveX controls in an HTML context, and we call arbitrary C/C++/COM DLLs. See http://www.cs.rice.edu/CS/PLT/packages/drscheme/

Why not write safe, high-level Scheme Wrappers?

Now, I had four reasons to opt for designing a "hardcore, unsafe" Hostile FFI over a set of sane wrappers: laziness (typeI), laziness (typeII), coverage, and uniqueness. To elaborate:

 

1. Laziness (typeI): the Windows API is enormous: at least 5,000 APIs if one includes Multimedia, DirectX, OLE, (D)COM etc.; just enumerate the exports in *.dll in your Windows directory.  To cover even a small fraction of it with sane wrappers is a huge amount of work.  I once did sane wrappers for GL on SGI machines, and that was weeks of spade work for a 700-item API that is vastly simpler and more repetitive than that of Windows.  I couldn't imagine actually trying to do it for Windows in the first place, let alone trying to keep up with the deluge of new DLLs hitting the fan continually.

 

The lazy programmer's way out is to find a generic way to suck up any and all the APIs, and that's the way I took.  This approach, BTW, is not different in principle from the approach taken by Java FFI, Visual Basic's FFI, and a host of others.  I just don't use those language systems because I like Scheme better (subjective rationale).

 

2. Laziness (typeII): It's hard enough to learn the minimum, required knowledge to write Windows programs. To learn this or that sane wrapper system *before* finding out whether it can actually do what you need is burdensome.  For example, I wanted to play with MIDI.  I was unable to find any way to hook into the Windows MIDI API amongst the current crop of free Scheme implementations, e.g., DrScheme, VScheme, and SCM.  Those are all fabulous systems, but I had to devote some considerable time to learning them before realizing that I could not find out how to work MIDI with them. Note that this was not time spent on Scheme itself, but on the libraries and wrappers and interfaces and other accoutrements that come with each system. Now, I'm not an expert in those systems, and they *may* have MIDI capability, but I had only finite time to look and I couldn't find it within my time horizon.  I had to trade this off against how much time it would take me to write an FFI, and that turned out to be 3 days (because I already had my own PE loader and I already knew the insides of SIOD). So this was another subjective point: it was easier *for me* to do it my way.

 

3. Coverage: I sort of touched on this, but there is another point.  I do not know, today, what corners of Windows I will need tomorrow (so this topic might be called ignorance or blindness:). I would hate to have to derail my train-of-thought to go design yet-another-sane-wrapper before I can get back to playing with Windows. So, I trade "sanity" off against cognitive throughput: if I can find a way to slay the *entire* dragon (and all future dragons, to boot) with one blow, then I don't ever have to think about it again.  I'll gladly take a little insanity and unsafety in exchange. Subjective again.

 

4. Uniqueness: There are lots of sane-wrapper solutions out there, each with its own pros and cons.  I really didn't want to add another.

Posted Friday, February 13, 2004 7:39 PM by brianbec | with no comments
