There's madness in my methods
Today I spent a (mostly) happy eight hours wrestling with the CLR's unmanaged profiling API. There's not much in the way of documentation, but with the help of an Under the Hood article I got a basic profiler up and running. The goal is to implement method interception by replacing the MSIL of a method on the fly. While it wasn't too hard to get a profiler working, the code-replacement stuff is much trickier, and I couldn't find any code examples for doing this (barring a few hints from John and Ingo). My C++/COM isn't up to much either.
Anyway, by hooking into the JITCompilationStarted event, I can get hold of whatever function the CLR is about to compile. The callback looks like this:
HRESULT ProfilerCallback::JITCompilationStarted(FunctionID functionID, BOOL fIsSafeToBlock)
functionID is self-explanatory, and fIsSafeToBlock tells you whether it's OK to hold up the runtime while you do whatever you're doing (I think - the docs aren't very clear on this). Now that we have a FunctionID, we can use it to find the function's signature, which class and module it belongs to and, most importantly in this instance, its body. First we need to jump through a small hoop:
ClassID classID;
ModuleID moduleID;
mdToken token;
m_pProfilerInfo->GetFunctionInfo(functionID, &classID, &moduleID, &token);
GetFunctionInfo takes that FunctionID and gives you back IDs for the class and module containing that function, as well as a metadata token. According to John Lam, "a MethodDef token is simply the row number of the corresponding entry into the MethodDef table of a module. MethodDef tokens are static data; they are generated at compile time". You might have seen the ldtoken MSIL opcode before - a MethodDef token is one of the things it can take as an argument. The reason we need a token instead of the FunctionID we had before is that tokens are valid even before the function has been loaded by the CLR.
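To make the "row number" idea concrete, here is a minimal sketch of how a metadata token splits into a table id and a row number, going by the layout in Partition II: the top byte names the table (0x06 for MethodDef) and the low 24 bits hold the 1-based row. The names Token, TableOf, RowOf and IsMethodDef are invented for illustration (the CLR headers have their own macros for this), and the sample token value is made up.

```cpp
#include <cstdint>

// Stand-in for the mdToken typedef in the CLR headers (a 32-bit unsigned value).
using Token = std::uint32_t;

// Table id carried in the top byte of every MethodDef token (per Partition II).
constexpr Token kMethodDefTable = 0x06000000;

// Top byte: which metadata table the token points into.
constexpr Token TableOf(Token t) { return t & 0xFF000000; }

// Low 24 bits: the 1-based row number within that table.
constexpr Token RowOf(Token t) { return t & 0x00FFFFFF; }

// True when the token refers to a row of the MethodDef table.
constexpr bool IsMethodDef(Token t) { return TableOf(t) == kMethodDefTable; }

// Hypothetical token: row 3 of the MethodDef table.
static_assert(IsMethodDef(0x06000003), "0x06 in the top byte means MethodDef");
static_assert(RowOf(0x06000003) == 3, "low 24 bits are the row number");
```

A token that starts with a different byte (0x02 for TypeDef, say) would fail the IsMethodDef check, which is a cheap sanity test before handing it to GetILFunctionBody.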
OK, so now we pass the moduleID and the token back to the runtime like so:
LPCBYTE header;
ULONG methodSize;
m_pProfilerInfo->GetILFunctionBody(moduleID, token, &header, &methodSize);
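The two calls above ignore their HRESULTs, so here is a rough sketch of the same chain with the return values checked. To keep it compilable without corprof.h, everything in the sketch namespace is a stand-in: the typedefs only approximate the real Windows/CLR ones, ProfilerInfo is a hypothetical interface that mirrors just the two methods used in this post (same names and parameter order), and FakeProfilerInfo is a test double with made-up return values, not anything the runtime provides.

```cpp
#include <cstdint>

namespace sketch {

// Stand-ins for types that really come from the Windows/CLR headers.
using HRESULT    = std::int32_t;
using FunctionID = std::uintptr_t;
using ClassID    = std::uintptr_t;
using ModuleID   = std::uintptr_t;
using mdToken    = std::uint32_t;
using ULONG      = std::uint32_t;
using LPCBYTE    = const std::uint8_t*;

constexpr HRESULT S_OK   = 0;
constexpr HRESULT E_FAIL = static_cast<HRESULT>(0x80004005u);

// Same idea as the FAILED() macro: negative HRESULTs are errors.
inline bool Failed(HRESULT hr) { return hr < 0; }

// Hypothetical interface: only the two profiler-info methods used in the post.
struct ProfilerInfo {
    virtual HRESULT GetFunctionInfo(FunctionID, ClassID*, ModuleID*, mdToken*) = 0;
    virtual HRESULT GetILFunctionBody(ModuleID, mdToken, LPCBYTE*, ULONG*) = 0;
    virtual ~ProfilerInfo() = default;
};

struct MethodBody {
    LPCBYTE header = nullptr;  // start of the method header
    ULONG   size   = 0;        // size in bytes reported by the runtime
};

// FunctionID -> (module, token) -> IL body, bailing out on the first failure.
inline HRESULT FetchMethodBody(ProfilerInfo& info, FunctionID functionID, MethodBody* out) {
    ClassID classID = 0;
    ModuleID moduleID = 0;
    mdToken token = 0;
    HRESULT hr = info.GetFunctionInfo(functionID, &classID, &moduleID, &token);
    if (Failed(hr)) return hr;
    return info.GetILFunctionBody(moduleID, token, &out->header, &out->size);
}

// Test double: pretends every non-zero FunctionID maps to one tiny method.
struct FakeProfilerInfo : ProfilerInfo {
    std::uint8_t bytes[3] = {0x0A, 0x00, 0x2A};  // tiny header, nop, ret

    HRESULT GetFunctionInfo(FunctionID id, ClassID* c, ModuleID* m, mdToken* t) override {
        if (id == 0) return E_FAIL;
        *c = 1; *m = 2; *t = 0x06000001;
        return S_OK;
    }
    HRESULT GetILFunctionBody(ModuleID, mdToken, LPCBYTE* header, ULONG* size) override {
        *header = bytes;
        *size = sizeof(bytes);
        return S_OK;
    }
};

}  // namespace sketch
```

In a real profiler the body of FetchMethodBody would sit inside JITCompilationStarted and call m_pProfilerInfo directly; the interface is only there so the sketch can be exercised without a running CLR.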
...and we get back a pointer to the method header (we pass in the address of a pointer for the runtime to fill in), and the method's size.
Methods are stored as contiguous chunks of memory, consisting of a header block - which contains things like the maximum stack size, initialization flags and a token for the signature of any local variables - the body itself, and zero or more 'sections' of extra data (currently just exception-handling tables; switch jump tables are stored inline in the IL, as the operand of the switch opcode). The details are in Partition II of the CLI spec.
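As a rough illustration of that layout, the sketch below decodes the two header formats described in Partition II: a one-byte "tiny" header (low two bits 0x2, code size in the upper six bits, maximum stack assumed to be 8) and a "fat" header (low two bits 0x3, 12 bits of flags, a 4-bit header size in 4-byte units, then MaxStack, CodeSize and LocalVarSigTok). The MethodHeaderInfo struct and ParseMethodHeader function are invented for this example and are not the structs from the CLR headers; the byte offsets are taken from the spec and are worth checking against it.

```cpp
#include <cstddef>
#include <cstdint>

struct MethodHeaderInfo {
    bool          fat;             // fat header (true) or tiny header (false)
    std::uint32_t headerSize;      // header size in bytes
    std::uint32_t codeSize;        // size of the IL body in bytes
    std::uint16_t maxStack;        // maximum evaluation stack depth
    std::uint32_t localVarSigTok;  // token for the locals signature, 0 if none
    bool          initLocals;      // locals are zero-initialised
    bool          moreSections;    // extra data sections follow the body
};

// Little-endian reads, one byte at a time, so alignment doesn't matter.
inline std::uint16_t ReadU16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
inline std::uint32_t ReadU32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Returns false if the bytes don't look like a valid tiny or fat header.
inline bool ParseMethodHeader(const std::uint8_t* p, std::size_t size, MethodHeaderInfo* out) {
    if (p == nullptr || out == nullptr || size == 0) return false;
    MethodHeaderInfo info = {};
    const std::uint8_t format = p[0] & 0x3;

    if (format == 0x2) {               // tiny: everything packed into one byte
        info.fat = false;
        info.headerSize = 1;
        info.codeSize = p[0] >> 2;
        info.maxStack = 8;             // fixed for tiny methods
    } else if (format == 0x3) {        // fat: at least 12 bytes of header
        if (size < 12) return false;
        const std::uint16_t flagsAndSize = ReadU16(p);
        const std::uint16_t flags = flagsAndSize & 0x0FFF;
        info.fat = true;
        info.headerSize = static_cast<std::uint32_t>(flagsAndSize >> 12) * 4;
        if (info.headerSize < 12) return false;
        info.maxStack = ReadU16(p + 2);
        info.codeSize = ReadU32(p + 4);
        info.localVarSigTok = ReadU32(p + 8);
        info.moreSections = (flags & 0x08) != 0;
        info.initLocals = (flags & 0x10) != 0;
    } else {
        return false;                  // not a recognised header format
    }

    // The header plus the IL body must fit inside the buffer we were given.
    if (static_cast<std::uint64_t>(info.headerSize) + info.codeSize > size) return false;
    *out = info;
    return true;
}
```

Passing the header pointer and methodSize from GetILFunctionBody into ParseMethodHeader would then give the offset of the first IL instruction (header + headerSize), which is the starting point for any rewriting.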
This is where it starts getting hairy, and my C++ skills start to falter. There are a few different ways to lay out a method in memory, and there are structs and unions defined in the CLR headers to give easier access to the various fields, so there's a lot to get my head round. I'll attack it again tomorrow, and write up my findings.
Cheers,
Jim