Introduction to MSIL – Part 7 – Casts and Conversions

After a brief hiatus, I decided to post a few more parts of the Introduction to MSIL series.

Casting and type conversion are often raised as topics of concern among programmers. The concern may be over performance and safety, or simply that the implications of casts and conversions are not well understood. In this part of the series I explore these concepts. I will primarily use C++/CLI for illustrations, as it is more descriptive in its language constructs related to casting. Of course, we will also consider the CLI instructions that are ultimately generated.

Static Casts

When converting from a type with a small value space to a type with a larger value space, an implicit conversion can be used because there is no danger of losing data. This is safe as long as the target value space is a superset of the source value space. Converting from a 32-bit integer to a 64-bit integer is guaranteed to be accurate, so compilers don’t require any explicit confirmation from the programmer. On the other hand, when converting from a larger type to a smaller type, the data may be truncated; therefore compilers typically require confirmation in the form of a cast. This is referred to as a static cast, since only static type information is used in determining the conversion behavior. A static cast is often considered dangerous, as the compiler places the responsibility for ensuring the safety of the cast squarely in the hands of the programmer. Consider the following C++ code:

Int32 small = 123;
Int64 big = small;
 
small = static_cast<Int32>(big);

Conversion from the small variable to the big variable is implicit but conversion from big to small requires the static_cast operator to avoid a compiler warning regarding possible loss of data. Although static casts can and should be considered dangerous, the compiler goes to great lengths to ensure that it is at least in the realm of possibility that the cast will actually be correct at runtime. It does this by considering the static type information, and in the case of user-defined types, whether there are any conversion operators defined. All this is determined at compile-time. Consider the following representation in MSIL:

.locals init (int32 small,
              int64 big)
 
// Int32 small = 123;
ldc.i4.s 123
stloc small
 
// Int64 big = small;
ldloc small
conv.i8
stloc big
 
// small = static_cast<Int32>(big);
ldloc big
conv.i4
stloc small

The ldc.i4.s instruction pushes the value 123 onto the stack as a 4-byte (32-bit) integer; the .s form encodes the constant as a single signed byte in the instruction stream, which is enough for values from -128 to 127. This value is then stored in the small local variable using the stloc instruction. To assign the value of the small variable to the big variable, the value is first pushed onto the stack using the ldloc instruction. The conv.i8 instruction then converts the value to an 8-byte (64-bit) integer, which is popped off the stack and stored in the big local variable using the stloc instruction. Finally, to convert the value of the big variable back to the small variable, the value stored in big is pushed onto the stack and the conv.i4 instruction converts the 8-byte value on the stack to a 4-byte value. This is then stored in the small local variable using the stloc instruction.

As you can see, MSIL does not distinguish between implicit and explicit conversions; everything is explicit. Overflow protection, however, must be requested explicitly, as by default no checking is performed. The code above makes no provision for overflow. It happens to be safe because we know the value fits, but if the value stored in big were too large to be stored in a 4-byte integer, conv.i4 would simply discard the high-order bits and produce the wrong value, with no warning at run time.
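To make the unchecked behavior concrete, here is a small sketch in standard C++ (not C++/CLI) that models what conv.i8 and conv.i4 do to an integer value. The function names conv_i8 and conv_i4 are hypothetical, chosen only to echo the instruction names, and the sketch assumes two's-complement integers.

```cpp
#include <cstdint>

// Models conv.i8: sign-extends a 32-bit value to 64 bits.
// The arithmetic value never changes, so a widening static_cast suffices.
std::int64_t conv_i8(std::int32_t value) {
    return static_cast<std::int64_t>(value);
}

// Models conv.i4: keeps only the low 32 bits and silently drops the rest.
// The int64_t -> uint32_t step is well-defined modular truncation; the
// final uint32_t -> int32_t step wraps on mainstream compilers and is
// guaranteed to do so since C++20.
std::int32_t conv_i4(std::int64_t value) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(value));
}
```

For example, conv_i4(conv_i8(123)) gives back 123, just like the round trip in the MSIL above. But 4294967419 is 2^32 + 123, which cannot fit in 32 bits, and conv_i4(4294967419) quietly returns 123 with no indication that anything went wrong.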

Checked Conversions

To detect overflow errors, you can simply replace the conv.i4 instruction in the previous example with the conv.ovf.i4 instruction. If the value on the stack cannot be represented by the target type, an OverflowException object is thrown. The C++/CLI language design, introduced in Visual C++ 2005, does not yet provide a language feature to request conversion with overflow checking. A feature is being considered to add a checked keyword, similar to the one provided by C#:

checked
{
    int small = 123;
    long big = small;
 
    small = (int) big;
}

Any arithmetic operations and conversions in a checked block or expression include overflow checking. As mentioned above, the conv.<type> set of instructions is replaced by conv.ovf.<type>. Other instructions that could result in overflow also have corresponding checked versions. For example, the add instruction, described in part 2, has corresponding add.ovf and add.ovf.un instructions for checked addition of signed and unsigned values respectively.
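Standard C++ has no checked keyword either, so the following is a hand-written sketch of what conv.ovf.i4, add.ovf and add.ovf.un check for. The names conv_ovf_i4, add_ovf and add_ovf_un are hypothetical, std::overflow_error stands in for System.OverflowException, and only 32-bit operands are modeled, whereas the CLI instructions also operate on 64-bit and native-sized integers.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Like conv.ovf.i4: narrow to 32 bits only if the value fits.
std::int32_t conv_ovf_i4(std::int64_t value) {
    if (value < std::numeric_limits<std::int32_t>::min() ||
        value > std::numeric_limits<std::int32_t>::max()) {
        throw std::overflow_error("value does not fit in int32");
    }
    return static_cast<std::int32_t>(value);
}

// Like add.ovf: signed addition. Overflow is detected before adding,
// because signed overflow is undefined behavior in C++.
std::int32_t add_ovf(std::int32_t a, std::int32_t b) {
    if ((b > 0 && a > std::numeric_limits<std::int32_t>::max() - b) ||
        (b < 0 && a < std::numeric_limits<std::int32_t>::min() - b)) {
        throw std::overflow_error("signed addition overflowed");
    }
    return a + b;
}

// Like add.ovf.un: unsigned addition wraps, so a sum smaller than
// one of the operands means the true result did not fit.
std::uint32_t add_ovf_un(std::uint32_t a, std::uint32_t b) {
    std::uint32_t sum = a + b;
    if (sum < a) {
        throw std::overflow_error("unsigned addition overflowed");
    }
    return sum;
}
```

The signed and unsigned cases need different tests, which is one reason the CLI provides separate add.ovf and add.ovf.un instructions rather than a single checked add.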

Dynamic Casts

Static casts can also be used to cast down a class hierarchy, from a base class to a derived class, in effect performing a cast of a polymorphic type while assuming that the cast will always succeed at run time. This avoids the cost of a run-time type check, provided the assumption is safe. Of course, this is generally a dangerous assumption to make.
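Here is a minimal sketch of such a static downcast, from House to BeachHouse, written in standard C++ with native pointers rather than C++/CLI handles. The class names follow this part's examples; view() and beach_view_unchecked are hypothetical additions for the illustration.

```cpp
#include <string>

// Hypothetical hierarchy mirroring the House example.
struct House {
    virtual ~House() = default;
};

struct BeachHouse : House {
    std::string view() const { return "ocean"; }
};

struct Townhouse : House {};

// The compiler verifies only that BeachHouse derives from House.
// No run-time check is made: if house actually points at a Townhouse,
// using the result is undefined behavior.
std::string beach_view_unchecked(House* house) {
    return static_cast<BeachHouse*>(house)->view();
}
```

Calling beach_view_unchecked with a pointer to a real BeachHouse works and costs nothing extra at run time; calling it with a Townhouse compiles just as happily, which is exactly the danger.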

Dynamic casts defer the decision about whether a cast is valid until run time. Of course, this only makes sense with polymorphic types. As with static casts, the compiler tries to ensure that it is at least plausible that the cast may succeed: trying to dynamically cast from an integer to a House object will be caught by the compiler and flagged as an error. Consider the following C++ code, and assume that BeachHouse and Townhouse are subclasses of House.

House^ house = gcnew BeachHouse;
 
if (nullptr != dynamic_cast<BeachHouse^>(house))
{
    Console::WriteLine("It's a beach house!");
}
 
if (nullptr != dynamic_cast<Townhouse^>(house))
{
    Console::WriteLine("It's a townhouse!");
}

This code compiles perfectly well even though the compiler cannot know whether the casts will succeed. Both casts require a run-time type check to determine whether the house is in fact a BeachHouse or a Townhouse. If the cast cannot be performed, dynamic_cast returns nullptr, the C++ representation of a null pointer or handle. Let’s see how we can represent this in MSIL:

.locals init (class House house)
 
// House^ house = gcnew BeachHouse;
newobj instance void BeachHouse::.ctor()
stloc house
 
// if (nullptr != dynamic_cast<BeachHouse^>(house))
ldloc house
isinst BeachHouse
ldnull
 
beq.s _TOWN_HOUSE
 
ldstr "It's a beach house!"
call void [mscorlib]System.Console::WriteLine(string)
 
_TOWN_HOUSE:
 
// if (nullptr != dynamic_cast<Townhouse^>(house))
ldloc house
isinst Townhouse
ldnull
 
beq.s _CONTINUE
 
ldstr "It's a townhouse!"
call void [mscorlib]System.Console::WriteLine(string)
 
_CONTINUE:

If you have read the series thus far, this code should be pretty clear. A BeachHouse object is created and a reference to it is stored in a local variable. The dynamic cast is performed by pushing the reference onto the stack and using the isinst instruction to do the type check. The isinst instruction pops the reference and checks whether the object it refers to is a BeachHouse or an instance of a type derived from BeachHouse. If it is, the reference, now typed as BeachHouse, is pushed back onto the stack; otherwise null is pushed onto the stack. The if statement is then implemented by pushing a null reference onto the stack and using the beq.s instruction, which pops both values and transfers control to the target label when they are equal, that is, when isinst produced null, skipping the call to WriteLine. The same type check and conditional branch are performed for the Townhouse type, with execution ultimately continuing on.
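For comparison, the same pair of checks can be sketched in standard C++ with native pointers, where a pointer dynamic_cast plays roughly the role of isinst. LuxuryBeachHouse is a hypothetical subclass added only to show that derived types also pass the check, and describe() returns its messages as a string (instead of printing them) so the result is easy to inspect.

```cpp
#include <string>

// Hypothetical hierarchy; the virtual destructor makes House polymorphic,
// which dynamic_cast requires.
struct House {
    virtual ~House() = default;
};
struct BeachHouse : House {};
struct LuxuryBeachHouse : BeachHouse {};
struct Townhouse : House {};

// Each dynamic_cast yields a usable pointer if the object is of the target
// type or derives from it, and nullptr otherwise, much like isinst.
std::string describe(House* house) {
    std::string result;
    if (nullptr != dynamic_cast<BeachHouse*>(house)) {
        result += "It's a beach house!";
    }
    if (nullptr != dynamic_cast<Townhouse*>(house)) {
        result += "It's a townhouse!";
    }
    return result;
}
```

A LuxuryBeachHouse is reported as a beach house, while a plain House matches neither check and produces an empty string.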

If your program is only correct when the dynamic cast succeeds, but you don’t want to rely on a static cast, which would not detect the exceptional case where the cast is invalid, you can use the castclass instruction in place of the isinst instruction. The castclass instruction performs the same type check, but instead of pushing null onto the stack if the type check fails, it throws an InvalidCastException object. If this is the behavior you are looking for, you can use the safe_cast operator in C++/CLI or a simple (C-style) cast in C#.
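Standard C++ has a similar "succeed or throw" form: a dynamic_cast to a reference type has no null value to return, so a failed cast throws std::bad_cast, somewhat as castclass throws InvalidCastException. The following is a minimal sketch; the rooms member and beach_house_rooms function are hypothetical.

```cpp
#include <typeinfo>  // std::bad_cast

// Hypothetical hierarchy for this sketch.
struct House {
    virtual ~House() = default;
};
struct BeachHouse : House {
    int rooms = 3;
};
struct Townhouse : House {};

// A reference dynamic_cast either succeeds or throws std::bad_cast;
// there is no null result to check.
int beach_house_rooms(House& house) {
    BeachHouse& beach = dynamic_cast<BeachHouse&>(house);
    return beach.rooms;
}
```

Passing a BeachHouse returns its room count, while passing a Townhouse throws std::bad_cast before any member is accessed.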

Read part 8 now: The for each Statement


© 2004 Kenny Kerr
