Neat JIT performance improvement when using BR or BR_S
If you could choose how IL gets JIT'ed down to the host processor's assembly language, how would your JIT handle unconditional branches with regard to how the IL is laid out? I would have thought the JIT'er would be a two-step process that basically lays out the assembly to mimic the layout of the IL. However, the JIT'er as written has a really cool optimization. I'm not sure it is fully implemented as an optimization, but I'm pretty sure it is. Check out the following code:
IL_0000: br IL_0005
IL_0005: br IL_000a
IL_000a: br IL_000f
IL_000f: br IL_0014
IL_0014: br IL_0019
IL_0019: br IL_001e
IL_001e: br IL_0023
IL_0023: br IL_0028
IL_0028: br IL_002d
So when the JIT'er encounters a forward-jumping unconditional branch, what does it do? Apparently, not much at all. It simply moves its IL pointer to the new location and continues processing the IL from there. I'm making some assumptions here, because the JIT'er does a few additional things. It makes sure you aren't jumping out of a try/catch block, since that kind of jump has to be handled differently. It also makes sure the target doesn't fall outside the bounds of the method, along with all of the other basic checks. And if the target has already been JIT'ed, the optimization is effectively turned off, since the JIT'er can't just resume JIT'ing at the target address.
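To make that walk concrete, here is a minimal Python sketch of a toy single-pass JIT. It is not the real JIT'er's code. `Instr`, `toy_jit`, and the emitted `jmp` strings are hypothetical names, the try/catch check is left out, and the idea that a forward branch emits nothing at all is my reading of the behavior rather than something confirmed. Because this toy only ever moves forward, a forward target can never already be compiled, so backward branches stand in for the "optimization turned off" case.

```python
from typing import Dict, List, NamedTuple, Optional


class Instr(NamedTuple):
    offset: int                    # IL offset of the instruction
    opcode: str                    # "br", "ret", "nop", ...
    target: Optional[int] = None   # branch target offset ("br" only)


def toy_jit(il: List[Instr]) -> List[str]:
    """Walk the IL once, front to back, and return the 'native' ops emitted."""
    by_offset: Dict[int, Instr] = {i.offset: i for i in il}
    offsets = sorted(by_offset)
    next_of = dict(zip(offsets, offsets[1:]))  # offset -> following offset
    emitted: List[str] = []
    pc: Optional[int] = offsets[0]
    while pc is not None:
        instr = by_offset[pc]
        if instr.opcode == "br":
            # Basic check: the target must be an instruction in this method.
            if instr.target not in by_offset:
                raise ValueError(f"branch at IL_{pc:04x} leaves the method")
            if instr.target > pc:
                # Forward branch: emit nothing, just resume at the target.
                pc = instr.target
                continue
            # Backward branch: fall back to emitting a real jump.
            emitted.append(f"jmp IL_{instr.target:04x}")
        else:
            emitted.append(instr.opcode)
            if instr.opcode == "ret":
                break  # toy simplification: stop at the first ret reached
        pc = next_of.get(pc)  # None once we run off the end of the method
    return emitted


# The chain from the listing above, nine 5-byte br instructions.
chain = [Instr(off, "br", off + 5) for off in range(0x00, 0x2D, 5)]
chain.append(Instr(0x2D, "ret"))   # assumed: the listing does not show IL_002d
print(toy_jit(chain))              # ['ret']
```

In this model the nine branches vanish and only the instruction at the end of the chain produces output. Anything sitting between a forward branch and its target is never visited, which is the behavior the next paragraph speculates about.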
This brings me to something that might be really cool. Unconditional branches over unused code might mean that the unused code is never compiled into memory. In today's world, code coverage is a big tool for finding and removing code paths that are no longer used, or for ensuring that the test cycle covers all of the code that needs to be tested. However, at least from a performance perspective, the JIT'er is already doing some of this work for you. Take a large piece of IL that is nothing more than a branch to the next instruction, repeated many thousands of times: the resulting JIT'ed code is the same size as if you had only done the jump once.
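The size claim can be illustrated with a small self-contained Python sketch. It is a toy model under the same assumption as before (a forward branch just moves the IL pointer), not a measurement of the real JIT'er. `chain_stats` and `BR_SIZE` are hypothetical names, and the trailing `ret` is added so the chain ends somewhere. The 5-byte stride matches the offsets in the listing above, since the long form of `br` is a 1-byte opcode plus a 4-byte offset.

```python
BR_SIZE = 5  # bytes: 1-byte opcode + 4-byte offset for the long form of br


def chain_stats(n):
    """Model n 'br to the next instruction' ops followed by a single ret.

    Returns (il_bytes, emitted_ops) for the toy walk.
    """
    ret_offset = n * BR_SIZE
    # IL as a map from each br's offset to its branch target.
    targets = {i * BR_SIZE: (i + 1) * BR_SIZE for i in range(n)}
    emitted = []
    pc = 0
    while pc != ret_offset:
        pc = targets[pc]        # forward br: move the IL pointer, emit nothing
    emitted.append("ret")       # the only instruction that produces output
    il_bytes = ret_offset + 1   # ret is a 1-byte opcode
    return il_bytes, emitted


for n in (1, 1000, 10000):
    il_bytes, emitted = chain_stats(n)
    print(n, il_bytes, len(emitted))
```

The IL grows linearly with the number of branches while the emitted output stays constant in this model.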
A second side effect is that your code, no matter how it is laid out as IL, in many cases ends up JIT'ed in execution order, as if the branch targets had been inlined at the point of the jump.
While you can't really take advantage of this knowledge unless you want to be extremely picky about how you write your managed code, it is nice to know that the JIT'er is doing some really cool things behind the scenes. Note that this isn't a license to leave in old code. The JIT'er may produce an optimized method that is the same as if the dead code were not even there, but the JIT process still takes the total size of the IL into account when creating the FJitState tables and the resulting unmanaged code buffers. The smaller your IL, the less memory the compilation of your method will take up. Also note that the optimizations are different when you are inside try/catch blocks. If you stay within the block, things work fine, but jumping out of one might have really bad performance ramifications. I might leave that for another entry once I've learned more about how the JIT'er works with exception handling code.
I also just learned of the FJitState tables and of some strange tuning metrics used by the JIT'er when allocating the unmanaged code buffer. I'm not sure yet, but I think these might be a possible source of performance improvement (or at least memory footprint improvement). Again, another entry, another time, but something to look forward to.