March 2003 - Posts
There's been much talk recently about NUnit rocking. It certainly does rock (although it's frighteningly over-engineered). I was interested to see an article by Eric Gunnerson espousing the use of NUnit, for two reasons: 1 - a lot of C# developers listen to Eric, so this will hopefully encourage a lot of developers to start doing test-driven development, and 2 - someone from Microsoft encouraging the use of an open source product?!?! Where's the embrace-and-extend? Where's 'M'-Unit?
Also, be sure to check out man/legend Ron Jeffries' series on TDD with C# and NUnit:
http://xprogramming.com/xpmag/acsIndex.htm
I was all over the profiling and metadata jigsaw puzzle a couple of weeks ago, foraging for clues on Google, in the header files and in the specs. I've come to the conclusion that this is a tragically under-explored, under-used and under-documented part of .NET, and Microsoft really need to get their act together. There's so much that can be done with these APIs and I'm sure people are scared off by the lack of support.
So, with a view to improving the situation, I will keep writing up my adventures in this hyar blog.
In the last entry I showed how to get a pointer to a method. Now we're going to examine the details of that method:
There are two formats for method blobs - tiny and fat, represented by the structs IMAGE_COR_ILMETHOD_TINY and IMAGE_COR_ILMETHOD_FAT. The tiny format is used for methods with less than 64 bytes of IL, no local variables, no exception handlers and a maximum stack depth of 8 or less. Most methods you will encounter will be fat, but the two formats need to be approached differently.
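To make the tiny/fat distinction concrete, here is a minimal sketch that classifies a blob from its first byte, using the format codes given in Partition II of the CLI spec. It deliberately avoids the CLR headers so it builds anywhere; MethodFormat, ClassifyMethod and TinyCodeSize are names made up for this illustration, not part of any Microsoft header.

```cpp
#include <cstddef>
#include <cstdint>

// Format codes: the low two bits of the first header byte (ECMA-335, Partition II).
constexpr std::uint8_t kFormatMask = 0x3;
constexpr std::uint8_t kTinyFormat = 0x2; // CorILMethod_TinyFormat
constexpr std::uint8_t kFatFormat  = 0x3; // CorILMethod_FatFormat

enum class MethodFormat { Tiny, Fat, Unknown };

// Hypothetical helper: classify a method blob by looking at its first byte.
inline MethodFormat ClassifyMethod(const std::uint8_t* blob) {
    switch (blob[0] & kFormatMask) {
        case kTinyFormat: return MethodFormat::Tiny;
        case kFatFormat:  return MethodFormat::Fat;
        default:          return MethodFormat::Unknown;
    }
}

// Hypothetical helper: a tiny header is a single byte whose upper six bits
// hold the IL size, which is why a tiny body can never exceed 63 bytes.
inline std::size_t TinyCodeSize(const std::uint8_t* blob) {
    return blob[0] >> 2;
}
```

Feeding it the bytes 0x0A 0x00 0x2A (a tiny header followed by nop and ret) should classify the blob as tiny with a code size of 2.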
One problem I've had is going directly from the raw blob to one of those structs. As far as I can tell, we need to first go through a union called IMAGE_COR_ILMETHOD, which can hold a method in either tiny or fat form:
LPCBYTE pMethodHeader;
ULONG methodSize;
m_pProfilerInfo->GetILFunctionBody(moduleID, functionToken, &pMethodHeader, &methodSize);
IMAGE_COR_ILMETHOD* pMethod = (IMAGE_COR_ILMETHOD*)pMethodHeader;
IMAGE_COR_ILMETHOD_FAT fatImage = pMethod->Fat;
The trouble with this is that the method could be tiny. Luckily, Microsoft provide a couple of structs in the corhlpr.h header which make things a little easier: COR_ILMETHOD_TINY and COR_ILMETHOD_FAT (they both inherit from the IMAGE_xxx types above). They have the methods IsTiny() and IsFat() respectively. Thus:
IMAGE_COR_ILMETHOD* pMethod = (IMAGE_COR_ILMETHOD*)pMethodHeader;
COR_ILMETHOD_FAT* fatImage = (COR_ILMETHOD_FAT*)&pMethod->Fat;
if(!fatImage->IsFat()) {
COR_ILMETHOD_TINY* tinyImage = (COR_ILMETHOD_TINY*)&pMethod->Tiny;
//Handle Tiny method
}
else {
//Handle Fat method
}
Now that we've tamed the method blob, we can look at the details of the header, and the code that comes after it. As an example:
printf("Flags: %X\n", fatImage->Flags);
printf("Size: %X\n", fatImage->Size);
printf("MaxStack: %X\n", fatImage->MaxStack);
printf("CodeSize: %X\n", fatImage->CodeSize);
printf("LocalVarSigTok: %X\n", fatImage->LocalVarSigTok);
// Dump the IL that follows the header, one byte per line
byte* codeBytes = fatImage->GetCode();
ULONG codeSize = fatImage->CodeSize;
for(ULONG i = 0; i < codeSize; i++) {
// %02X zero-pads single-digit values, so 0xA prints as 0x0A
printf("codeBytes[%lu] = 0x%02X;\n", i, codeBytes[i]);
}
The first field, Flags, consists of 12 bits and describes the method format (Fat or Tiny), whether local variables should be initialized before use, and whether there are more sections tagged onto the end of the main code block. Size is the size in DWORDs of the header structure, not the whole method. Currently this is always 3 (12 bytes). MaxStack is, yup, the maximum stack size required by this method (stack size is measured in abstract "items", not in physical memory). CodeSize is the size in bytes of the IL blob following the header, and LocalVarSigTok is a metadata token. It acts as an index into the metadata table of the module containing this method. I'll cover this in more depth next time, and show you how to emit bytecode.
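The field layout described above can be sketched as a decoder that works straight from the raw bytes. This is an illustration only: FatHeaderFields, Read16, Read32 and DecodeFatHeader are hypothetical names standing in for IMAGE_COR_ILMETHOD_FAT so that the code builds without the CLR headers, and it assumes the little-endian layout given in Partition II of the CLI spec.

```cpp
#include <cstdint>

// Hypothetical stand-in for IMAGE_COR_ILMETHOD_FAT; field names follow the post.
struct FatHeaderFields {
    std::uint16_t flags;          // low 12 bits of the first 16-bit word
    std::uint8_t  sizeInDwords;   // high 4 bits of the same word
    std::uint16_t maxStack;
    std::uint32_t codeSize;
    std::uint32_t localVarSigTok;
};

constexpr std::uint16_t kMoreSects  = 0x08; // CorILMethod_MoreSects
constexpr std::uint16_t kInitLocals = 0x10; // CorILMethod_InitLocals

// Read little-endian integers one byte at a time (no alignment assumptions).
inline std::uint16_t Read16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
inline std::uint32_t Read32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Split the 12-byte fat header into its fields.
inline FatHeaderFields DecodeFatHeader(const std::uint8_t* blob) {
    const std::uint16_t first = Read16(blob);
    FatHeaderFields h;
    h.flags          = static_cast<std::uint16_t>(first & 0x0FFF);
    h.sizeInDwords   = static_cast<std::uint8_t>(first >> 12);
    h.maxStack       = Read16(blob + 2);
    h.codeSize       = Read32(blob + 4);
    h.localVarSigTok = Read32(blob + 8);
    return h;
}
```

Testing the decoded flags against kInitLocals and kMoreSects then answers the two questions the Flags field exists for: whether locals are zero-initialized, and whether extra sections follow the code.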
Today I spent a (mostly) happy eight hours wrestling with the CLR's unmanaged profiling API. There's not much in the way of documentation, but with the help of an Under the Hood article I got a basic profiler up-and-running. The goal of this is to implement method interception, replacing the MSIL of a method on-the-fly. While it wasn't too hard to get a profiler working, the code-replacement stuff is much trickier, and I couldn't find any code examples for doing this (barring a few hints from John and Ingo). My C++/COM isn't up to much either.
Anyway, by hooking into the JITCompilationStarted event, I can get hold of whatever function the CLR is about to compile. The callback looks like this:
HRESULT ProfilerCallback::JITCompilationStarted(FunctionID functionID, BOOL fIsSafeToBlock)
functionID is self-explanatory, and fIsSafeToBlock tells you whether it's OK to hold up the runtime while you do whatever you're doing (I think - the docs aren't very clear on this). Now we have a FunctionID, we can use it to find a function's signature, which class and module it belongs in and, most importantly in this instance, its body. First we need to jump through a small hoop:
ClassID classID;
ModuleID moduleID;
mdToken token;
m_pProfilerInfo->GetFunctionInfo(functionID, &classID, &moduleID, &token);
GetFunctionInfo takes that FunctionID and gives you back IDs for the class and module containing that function, as well as a metadata token. According to John Lam, "a MethodDef token is simply the row number of the corresponding entry into the MethodDef table of a module. MethodDef tokens are static data; they are generated at compile time". You might have seen the ldtoken MSIL opcode before - a MethodDef token is one of the things it accepts as its operand. The reason we need a token instead of the FunctionID we had before is that tokens are valid even before the function has been loaded by the CLR.
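One detail the quote glosses over: the full 32-bit token carries the table identifier in its top byte and the row number in the remaining 24 bits. The CLR headers provide the TypeFromToken and RidFromToken macros for pulling these apart; the sketch below re-creates that arithmetic with made-up names (TableOf, RowOf, IsMethodDef) so it can be compiled without those headers.

```cpp
#include <cstdint>

constexpr std::uint32_t kMethodDefTable = 0x06000000; // mdtMethodDef in corhdr.h

// Top byte: which metadata table the token refers to.
inline std::uint32_t TableOf(std::uint32_t token) { return token & 0xFF000000; }

// Low 24 bits: the 1-based row number within that table.
inline std::uint32_t RowOf(std::uint32_t token) { return token & 0x00FFFFFF; }

// Hypothetical convenience check for the kind of token GetFunctionInfo hands back.
inline bool IsMethodDef(std::uint32_t token) {
    return TableOf(token) == kMethodDefTable;
}
```

So a token of 0x06000002 is row 2 of the MethodDef table, which is the number you will see next to the method in ildasm.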
OK, so now we pass the moduleID and the token back to the runtime like so:
LPCBYTE header;
ULONG methodSize;
m_pProfilerInfo->GetILFunctionBody(moduleID, token, &header, &methodSize);
...and we get back a pointer to the method header (written through the pointer-to-pointer argument), and the method's size.
Methods are stored as contiguous chunks of memory, consisting of a header block - which contains things like the maximum stack size, initialization flags and a token for the local variable signature - the body itself, and zero or more 'sections' of extra data (like exception-handling stuff, and possibly switch tables?). The details are in Partition II of the CLI spec.
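As a rough sketch of that layout, the offsets of each part can be worked out from the header size and the code size. It assumes, as Partition II states, that any extra sections start on the next 4-byte boundary after the code, and that the blob itself begins on such a boundary; MethodLayout, AlignUp4 and LayOut are names invented for this example.

```cpp
#include <cstddef>

// Offsets, in bytes from the start of the method blob.
struct MethodLayout {
    std::size_t codeOffset;      // where the IL body begins
    std::size_t sectionsOffset;  // where extra sections would begin, if present
};

// Round n up to the next multiple of 4.
inline std::size_t AlignUp4(std::size_t n) {
    return (n + 3) & ~static_cast<std::size_t>(3);
}

// Hypothetical helper: headerBytes is 1 for a tiny header or 12 for a fat one.
inline MethodLayout LayOut(std::size_t headerBytes, std::size_t codeBytes) {
    MethodLayout m;
    m.codeOffset = headerBytes;
    m.sectionsOffset = AlignUp4(headerBytes + codeBytes);
    return m;
}
```

For a fat method with a 12-byte header and 10 bytes of IL, the code starts at offset 12 and any extra sections would start at offset 24, leaving two bytes of padding.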
This is where it starts getting hairy, and my C++ skills start to falter. There are a few different ways to lay out a method in memory, and there are structs and unions defined in the CLR headers to give easier access to the various fields, so there's a lot to get my head round. I'll attack it again tomorrow, and write up my findings.
Cheers,
Jim