Passing Strings by Ref

Humbled yet again…DOH! No matter how much experience you acquire, no matter how smart you may be, no matter how hard you study, it is impossible to keep fully up to date on all the nuances of the technology we are exposed to. There will always be gaps in our knowledge: little 'dead zones' of uncertainty. For me, this time, it was about passing string parameters to functions. I thought I knew this stuff cold. First, a little review...

Value Types and Ref

Integers and structs are value types (as opposed to reference types). When declared as local variables, they are typically stored on the stack rather than on the heap. When passed to a function, the function gets a copy of the data and works on the copy. If a function needs to change the caller's value, you need to use the ref keyword. Here's an example:

    // ---- declaration -----------------
 
    public struct MyStruct
    {
        public string StrTag;
    }
 
    // ---- functions -----------------------
 
    void SetMyStruct(MyStruct myStruct)     // pass by value
    {
        myStruct.StrTag = "BBB";
    }
 
    void SetMyStruct(ref MyStruct myStruct)  // pass by ref
    {
        myStruct.StrTag = "CCC";
    }
 
    // ---- Usage -----------------------
 
    protected void Button1_Click(object sender, EventArgs e)
    {
        MyStruct Data;
        Data.StrTag = "AAA";
 
        SetMyStruct(Data);
        // Data.StrTag is still "AAA"
 
        SetMyStruct(ref Data);
        // Data.StrTag is now "CCC"
    }

No surprises here. All value types (ints, floats, DateTimes, enums, structs, etc.) work the same way.

And now on to...

Class Types and Ref

    // ---- Declaration -----------------------------
 
    public class MyClass
    {
        public string StrTag;
    }
 
    // ---- Functions ----------------------------
 
    void SetMyClass(MyClass myClass)  // pass by 'value'
    {
        myClass.StrTag = "BBB";
    }
 
    void SetMyClass(ref MyClass myClass)   // pass by ref
    {
        myClass.StrTag = "CCC";
    }
 
    // ---- Usage ---------------------------------------
 
    protected void Button2_Click(object sender, EventArgs e)
    {
        MyClass Data = new MyClass();
        Data.StrTag = "AAA";
 
        SetMyClass(Data);  
        // Data.StrTag is now "BBB"
 
        SetMyClass(ref Data);
        // Data.StrTag is now "CCC"
    }
 

No surprises here either. Since classes are reference types, you do not need the ref keyword to modify an object. What may seem a little strange is that with or without the ref keyword, the results are the same: either way, the function reaches the same object through a reference, so the change to StrTag is visible to the caller.

So, why would you need to use the ref keyword when passing an object to a function?

Because then you can change the reference itself, i.e. you can make it refer to a completely different object. Inside the function you can do: myClass = new MyClass() and the caller's variable will refer to the new object when the function returns. The old object becomes eligible for garbage collection once nothing else refers to it.
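Here is a minimal sketch of that case, written as a console program rather than a button handler. The ReplaceByRef name and the RefDemo class are made up for illustration; MyClass is repeated from the example above so the block stands alone.

```csharp
using System;

public class MyClass
{
    public string StrTag;
}

public static class RefDemo
{
    // Hypothetical helper: with ref, the parameter is an alias for the
    // caller's variable, so assigning to it changes what the caller holds.
    public static void ReplaceByRef(ref MyClass myClass)
    {
        myClass = new MyClass();
        myClass.StrTag = "CCC";
    }

    public static void Main()
    {
        MyClass data = new MyClass();
        data.StrTag = "AAA";
        MyClass original = data;   // a second reference to the first object

        ReplaceByRef(ref data);

        Console.WriteLine(data.StrTag);                            // CCC
        Console.WriteLine(object.ReferenceEquals(data, original)); // False
        Console.WriteLine(original.StrTag);                        // AAA
    }
}
```

The first object is still alive here only because the `original` variable keeps a reference to it. Without that second reference, nothing would refer to it after the call.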

That ends the review. Now let's look at passing strings as parameters.

The String Type and Ref

Strings are reference types. So when you pass a String to a function, you do not need the ref keyword to change the string. Right? Wrong. Wrong, wrong, wrong.

When I saw this, I was so surprised that I fell out of my chair. Getting up, I bumped my head on my desk (which really hurt). My bumping the desk caused a large speaker to fall off of a bookshelf and land squarely on my big toe. I was screaming in pain and hopping on one foot when I lost my balance and fell. I struck my head on the side of the desk (once again) and knocked myself out cold. When I woke up, I was in the hospital where due to a database error (thanks Oracle) the doctors had put casts on both my hands. I'm typing this ever so slowly with just my ton..tong ..tongu…tongue.

But I digress. Okay, the only true part of that story is that I was a bit surprised.

Here is what happens passing a String to a function.

    // ---- Functions ----------------------------
 
    void SetMyString(String myString)   // pass by 'value'
    {
        myString = "BBB";
    }
 
    void SetMyString(ref String myString)  // pass by ref
    {
        myString = "CCC";
    }
 
    // ---- Usage ---------------------------------
 
    protected void Button3_Click(object sender, EventArgs e)
    {
        String MyString = "AAA";
 
        SetMyString(MyString);
        // MyString is still "AAA"  What!!!!
 
        SetMyString(ref MyString);
        // MyString is now "CCC"
    }

What the heck. We should not have to use the ref keyword when passing a String because Strings are reference types. Why didn't the string change? What is going on?  

I spent hours unsuccessfully researching this anomaly until finally, I had a Eureka moment:

This code:

    String MyString = "AAA";

Is conceptually like this code (note this code doesn't actually compile, since String has no parameterless constructor):

    String MyString = new String();
    MyString = "AAA";

Key Point: Assigning to the parameter does not modify the string that was passed in. In the function, the copy of the reference is pointed at a different string object ("BBB"). The original reference and the object it points to are unchanged.
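One way to check this is to compare references before and after the assignment. This is a rough sketch, not part of the original page code: the RebindsLocalOnly helper and the StringRebindDemo class are hypothetical names, and it runs as a console program.

```csharp
using System;

public static class StringRebindDemo
{
    // Hypothetical helper: returns true if the parameter refers to a
    // different object after the assignment than it did on entry.
    public static bool RebindsLocalOnly(string myString)
    {
        string before = myString;   // remember which object was passed in
        myString = "BBB";           // point the local copy at another string
        return !object.ReferenceEquals(before, myString);
    }

    public static void Main()
    {
        string myString = "AAA";

        Console.WriteLine(RebindsLocalOnly(myString)); // True
        Console.WriteLine(myString);                   // AAA
    }
}
```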

You can simulate this behavior by modifying the class example code to look like this: 

    void SetMyClass(MyClass myClass)  // pass by 'value'
    {
        //myClass.StrTag = "BBB";
        myClass = new MyClass();
        myClass.StrTag = "BBB";
    }

Now when you call the SetMyClass function without using ref, the caller's object is unchanged...just like the string example.
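For contrast, here is a sketch using StringBuilder from System.Text, which is not used elsewhere in this post. It is a reference type that can be changed in place, so it behaves like the first MyClass example. The AppendTag helpers and the BuilderDemo class are hypothetical names.

```csharp
using System;
using System.Text;

public static class BuilderDemo
{
    // Changes the object the caller's reference points to; no ref needed.
    public static void AppendTag(StringBuilder sb)
    {
        sb.Append("BBB");
    }

    // A string cannot be changed in place, so all this can do is assign,
    // which only changes the local copy of the reference.
    public static void AppendTag(string s)
    {
        s = s + "BBB";
    }

    public static void Main()
    {
        StringBuilder builder = new StringBuilder("AAA");
        AppendTag(builder);
        Console.WriteLine(builder.ToString());   // AAABBB

        string text = "AAA";
        AppendTag(text);
        Console.WriteLine(text);                 // AAA
    }
}
```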

I hope someone finds this useful.

Steve Wellens

11 Comments

  • The behaviour is hardly unexpected, Steve. Strings are reference types, true but they're special in that they are immutable. Since they're immutable, any operation generates a new instance. And that includes assignment. Passing a string by value cannot result in changes to the original string coz any change to a string results in a new string (immutable strings, remember).

    The same behaviour can be seen for anything that's immutable (or meant to be immutable). For example, structs. Structs are passed by value. The following example can show this clearly:

    public class A { public int V { get; set; } }
    void ChangeV(A a) { a.V = 10; }

    A a = new A();
    a.V = 1;
    ChangeV(a);
    Console.WriteLine(a.V); //outputs 10

    Since A is a class, simply passing a to ChangeV changed the original a instance.

    Change A to a struct now and the same program would write 1. This is coz A being a value type (struct), passing it to ChangeV creates a new instance of A for the method and assigns 10 to the new instance.

    Same happens with string parameters - since any change creates a new instance, the original string passed in remains unchanged. Keeping strings immutable was a design decision to improve performance when dealing with strings among other things (they're immutable in Java too).


  • >> The behaviour is hardly unexpected

    Well I didn't expect it. There was no need for you to explain why, that's what my post was for.

    Thanks for your comment Ashic.

  • Hmm...I guess you expect it when you know it already. Strings are the only built-in immutable reference type and as such can get confusing.

    I wanted to point out the immutability of strings but then veered off track with the example I guess ;)

  • It is amazing how far developers can get these days without truly understanding the fundamentals of their chosen language. It's not only .Net developers, though, there is a huge debate on one of the Java sites as to whether the language uses pass-by-value or pass-by-ref.

  • Java??? How dare you use the 'J' word here!

    :)

  • Hey Arun,
    I agree the topics are the same...but the content/style is quite different. Hopefully, readers will benefit from having multiple discussions of the same topic. If one discussion doesn't make the light bulb turn on, maybe another will!

  • This is one of those weird things that really makes me wonder what model people have in their heads that makes this a difficult concept.  Strings are like every other reference-type class in the system: when passed by value you can change the instance properties but not the instance variable itself.  No exceptions, no surprises, this is how all classes work.  The only tiny quirk is that string happens to have no mutable instance properties, but then neither do a lot of classes that I write myself.
    And yet programmers get surprised by it over and over.  I honestly have no idea why.  My guess is that the confusion has something to do with the initialization semantics, but I'm really not sure.  I always wonder if it's just a misunderstanding with String, or if some people have a different model in their heads of the whole class system and how it works.

  • "My guess is that the confusion has something to do with the initialization semantics, but I'm really not sure."

    I think you are on to something there. It may have something to do with the fact that a new string object is created on assignment... automatically... with no 'new' keyword.

  • "Value types ... when declared locally, their memory storage is on the stack, not on the heap"
    Don't let Eric Lippert read that!
    "The real statement should be: In the Microsoft implementation of C# on the desktop CLR, value types are stored on the stack when the value is a local variable or temporary that is not a closed-over local variable of a lambda or anonymous method, and the method body is not an iterator block, and the jitter chooses to not enregister the value."
    blogs.msdn.com/.../the-truth-about-value-types.aspx

  • I suppose if I were writing a legal document, I might include that.

    However, when communicating fundamental concepts, that type of verbose minutiae can detract from, and obfuscate, the underlying message.

  • Really good article on how to pass strings by ref, thanks for posting!!..
