Demystifying Managed Code and Compiler Output

Since publishing my Introduction to MSIL series, I have received a number of questions related to compilers and IL. In this post I hope to address some of them.

 

Q How can I get the C++ compiler to produce IL?

A The short answer is that you can’t, but it’s not hard to do with the other tools that are available. Let’s walk through a simple example. Take the following code and save it in a text file called “example.cpp”.

int main()
{
    System::Console::WriteLine("Hello from C++/CLI");
}

This is a minimal C++/CLI program. Now compile it using the Visual C++ compiler (CL). Larger projects usually run the Visual C++ linker (LINK) as a separate step to produce the build output, but for simple examples it’s easier to let CL invoke the linker for you.

cl.exe /clr:safe example.cpp

You should now have an assembly called “example.exe” that you can run to produce the following results:

C:\qa>example.exe
Hello from C++/CLI

The next step is to disassemble the EXE. That’s a job for the MSIL Disassembler (ILDASM), which is freely available with the .NET Framework SDK.

ildasm.exe example.exe /output=example.il

The output file now contains all the metadata and IL describing the assembly.

Using the MSIL Assembler (ILASM), you can even recompile the program from IL to produce an equivalent EXE. ILASM is part of the .NET Framework redistributable rather than the SDK, which explains, irritating as it is, why it’s not in the same directory as ILDASM.

ilasm.exe example.il /output=example2.exe

This executable can then be run to produce the following results:

C:\qa>example2.exe
Hello from C++/CLI

Q Why is the executable produced by the Visual C++ compiler bigger than the executable produced by the MSIL Assembler?

A If you were following along at a friendly command prompt near you, you may have noticed that the EXE produced by CL is larger than the EXE produced by ILASM yet the results are exactly the same.

To understand what’s going on here you need to appreciate that .NET assemblies aren’t just managed code with an EXE or DLL extension. Assemblies are ultimately just Windows Portable Executable (PE) files. The PE format is derived from the Common Object File Format (COFF), and Microsoft publishes the specification for both formats. Don’t be fooled by the name. The PE file format is used for both EXE and DLL files.

PE files typically contain a variety of headers, some of which are there only for backward compatibility. Following the headers are a number of PE file sections. These sections contain all the code and data for an executable image. There are a number of common section names, but they can be misleading, since the linker can merge the data in various sections, assuming the sections have common attributes.

So what does this all have to do with the size of executables? Well, different compilers may choose to emit different sections into the PE file. The PE file generated by CL in the example above includes three sections, namely ‘.text’, ‘.rdata’ and ‘.reloc’. The ‘.text’ section is where all the code goes, among other things. The ‘.rdata’ section holds read-only data and is used for things like string literals and C++ virtual function tables. The ‘.reloc’ section contains base relocations, but these are only really interesting for DLLs, where they are used when the image cannot be loaded at its preferred base address in a process address space.
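If you would rather check the section names yourself than take my word for it, here is a minimal sketch in standard C++ that walks the PE headers and collects the names from the section table. The function names `pe_section_names` and `read_file` are my own inventions for this illustration, and the sketch does only basic bounds checking, so treat it as a starting point rather than a robust PE parser.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read an n-byte little-endian integer at `offset`. Callers check bounds first.
static std::uint32_t read_le(const std::vector<unsigned char>& bytes,
                             std::size_t offset, std::size_t n)
{
    assert(n <= 4 && offset + n <= bytes.size());
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < n; ++i)
        value |= static_cast<std::uint32_t>(bytes[offset + i]) << (8 * i);
    return value;
}

// Hypothetical helper: returns the section names found in a PE image,
// or an empty vector if the bytes do not look like a PE file.
std::vector<std::string> pe_section_names(const std::vector<unsigned char>& image)
{
    std::vector<std::string> names;

    // DOS header: "MZ" signature, with the PE header offset stored at 0x3C.
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return names;
    const std::size_t pe = read_le(image, 0x3C, 4);

    // "PE\0\0" signature (4 bytes) followed by the 20-byte COFF file header.
    if (pe > image.size() || image.size() - pe < 24)
        return names;
    if (image[pe] != 'P' || image[pe + 1] != 'E' ||
        image[pe + 2] != 0 || image[pe + 3] != 0)
        return names;

    const std::size_t count = read_le(image, pe + 6, 2);           // NumberOfSections
    const std::size_t optional_size = read_le(image, pe + 20, 2);  // SizeOfOptionalHeader

    // The section table follows the optional header; each entry is 40 bytes
    // and begins with an 8-byte, null-padded name.
    std::size_t entry = pe + 24 + optional_size;
    for (std::size_t i = 0; i < count; ++i, entry += 40) {
        if (entry > image.size() || image.size() - entry < 40)
            break;
        std::string name;
        for (std::size_t j = 0; j < 8 && image[entry + j] != 0; ++j)
            name.push_back(static_cast<char>(image[entry + j]));
        names.push_back(name);
    }
    return names;
}

// Hypothetical helper: load a whole file into memory.
std::vector<unsigned char> read_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

Calling `pe_section_names(read_file("example.exe"))` and printing each name lets you compare the section list of one EXE against another.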

If you look inside the PE file generated by ILASM you will notice that it only has two sections, ‘.text’ and ‘.reloc’. ILASM has merged the data from the ‘.rdata’ section into the ‘.text’ section. This saves space both on disk and in memory, since each section occupies at least one page of memory. This can add up quickly: memory pages are 4KB on x86 and x64 versions of Windows and 8KB on Itanium-based versions.
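To see why an extra section costs more than its contents suggest, here is a small sketch of the rounding arithmetic. The names `align_up`, `SectionFootprint` and `footprint` are hypothetical, and the default values of 512 bytes on disk and 4096 bytes in memory are assumptions based on typical linker defaults. The real values for a given image are recorded in the FileAlignment and SectionAlignment fields of its optional header.

```cpp
#include <cassert>
#include <cstdint>

// Round `size` up to the next multiple of `alignment`, which must be a power of two.
std::uint32_t align_up(std::uint32_t size, std::uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

// Space taken by one section once its raw size is rounded up.
struct SectionFootprint
{
    std::uint32_t on_disk;
    std::uint32_t in_memory;
};

// Assumed defaults: 512-byte file alignment, 4096-byte section alignment.
SectionFootprint footprint(std::uint32_t raw_size,
                           std::uint32_t file_alignment = 512,
                           std::uint32_t section_alignment = 4096)
{
    return { align_up(raw_size, file_alignment),
             align_up(raw_size, section_alignment) };
}
```

Under those assumptions, a section holding only 100 bytes of data still takes 512 bytes in the file and a full 4096-byte page once loaded, which is why merging a small ‘.rdata’ section into ‘.text’ can shrink the image.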

Fortunately we’re talking about Visual C++ here and when it comes to producing PE files there is just no competition. You may have noticed that CL, along with producing an EXE, also produced an OBJ file. These OBJ files incidentally use the same COFF format which PE files are based on. The Visual C++ linker (LINK) accepts one or more OBJ files as input and produces an EXE. One of the LINK options also allows you to combine sections in the resulting PE file.

link.exe example.obj /out:example3.exe /merge:.rdata=.text

The resulting PE file now contains only two sections: the data from the ‘.rdata’ section has been merged into the ‘.text’ section, and the combined section keeps the name ‘.text’. Looking at the size on disk, you should notice that example3.exe, produced by LINK, is the same size as example2.exe, produced by ILASM.

 

Q Why is the executable produced by the Visual C# compiler bigger than the executable produced by the MSIL Assembler?

A You might think the answer to the previous question should suffice but there’s a bit more going on here.

If you’re asking this question then it means you’ve missed something. You see the Visual C# compiler (CSC), which also ships as part of the .NET Framework redistributable, automatically emits a version resource into the resulting PE file to match the version information for the assembly. This resource data is stored in the ‘.rsrc’ section.

When you disassemble the EXE using ILDASM, it splits the results into two files. We’ve already mentioned the first file that contains the metadata and IL. The second file is a RES file which contains the compiled resource data extracted from the EXE.

So to disassemble and reassemble the EXE produced by CSC you need to tell ILASM to include the RES file when it builds the output.

ilasm.exe example.il /output:example3.exe /resource:example.res

The resulting EXE should be the same size as that produced by CSC and include the same version resource that can be viewed using Windows Explorer.


© 2005 Kenny Kerr

 
