Make first letter uppercase

Note: this entry has moved.

The traditional way of doing this is adding the difference between the letters "A" and "a" to the first character. However, this will not work in internationalized scenarios. Here are the reasons, along with a great article on Unicode in general.
So I use the following approach, which is I18N-ready:

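private string MakeFirstUpper( string name )
{
    // Error checking: there is nothing to capitalize in a null reference.
    if ( name == null ) return null;
    // Empty or single-character strings: String.ToUpper() is enough.
    if ( name.Length <= 1 ) return name.ToUpper();
    // Equivalent, shorter form: return Char.ToUpper( name[0] ) + name.Substring( 1 );
    Char[] letters = name.ToCharArray();
    letters[0] = Char.ToUpper( letters[0] );
    return new string( letters );
}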

We don't need to pass the current culture to String.ToUpper() or Char.ToUpper(), as both already use it internally.
Do you think there's a more efficient/cleaner way of doing this?
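To make the internationalization point concrete, here is a minimal sketch contrasting the "traditional" offset arithmetic with Char.ToUpper. The class and method names are hypothetical, not from the post. Note that the unguarded offset trick silently turns 'ā' (U+0101) into 'á' (U+00E1), because 0x0101 - 0x20 = 0x00E1, while the guarded variant simply leaves non-ASCII letters lowercase.

```csharp
using System;

public static class CaseOffsetSketch
{
    // Distance between the lowercase and uppercase ASCII letters (32).
    private const int AsciiCaseOffset = 'a' - 'A';

    // Unguarded "traditional" trick: correct for 'a'..'z' only.
    // For U+0101 (LATIN SMALL LETTER A WITH MACRON) it yields U+00E1 ('á').
    public static char NaiveToUpper(char c) => (char)(c - AsciiCaseOffset);

    // Guarded variant: leaves every character outside 'a'..'z' untouched,
    // so it never corrupts input, but it is still wrong for non-ASCII letters.
    public static char GuardedToUpper(char c) =>
        (c >= 'a' && c <= 'z') ? (char)(c - AsciiCaseOffset) : c;

    // Char.ToUpper consults the Unicode casing tables (current culture),
    // so letters outside ASCII are upper-cased as well, assuming a runtime
    // with full globalization data.
    public static char UnicodeToUpper(char c) => Char.ToUpper(c);
}
```

On a runtime with full globalization data, UnicodeToUpper('\u0101') returns 'Ā' (U+0100), whereas GuardedToUpper returns the character unchanged.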

26 Comments

  • What about error checking before you assume that name[0] is valid? Or Substring(1), come to that.

  • I would add error checking.

    private string FirstUpper(string name)
    {
        if(name.Length < 0) return "";
        if(name.Length == 1) return name.ToUpper();
        return Char.ToUpper(name[0]) + name.Substring(1);
    }

    CIAO
    Michael

  • How about using a StringBuilder?

    // needs: using System.Text;
    private string MakePascalCase(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;
        StringBuilder sb = new StringBuilder(input, input.Length);
        sb[0] = Char.ToUpper(sb[0]);
        return sb.ToString();
    }

  • Hmm...

    Tests I've done show that the non-StringBuilder version is slightly(!) faster than the StringBuilder version. These tests are far from accurate, but they give you a good idea.

    The difference isn't so big.

  • There are plenty of articles from MS and from members of the CLR performance team stating what I said. Of course, I didn't discover anything. I tend to follow their advice because they are the ones who build the features.

    And I must say that "slightly" has to be measured carefully: if an operation takes 0.0001 seconds when it could take 0.00008, that's A LOT. It looks like a very small amount of time, but it's 25% slower.

  • Well, I think it's safe to assume that the proper UNICODE characters will be used, NOT a combination of other characters. If we were to take this misuse into account, we would have to constantly look for ligatures and keep a full table of such combinations. This is FAR away from my target.

    I think it's enough to ensure our code uses unicode properly, and it's safe enough to rely on .NET implementation of it. Therefore, in your second example, I think it's .NET's Char struct responsibility to give me a ToTitle() method if it's appropriate. I won't make that work myself, that's for sure.

  • Use the ToTitleCase method.

  • "Well, I think it's safe to assume that the proper UNICODE characters will be used, NOT a combination of other characters."

    Those are all "proper" Unicode characters.

    "Therefore, in your second example, I think it's .NET's Char struct responsibility to give me a ToTitle() method if it's appropriate. I won't make that work myself, that's for sure."

    I agree. While I think it's perfectly acceptable for a language and/or library to take the low-level approach ("here's a bunch of bytes, work it out yourself"), once higher-level string handling functions are provided they should be complete; a modern library that provides ToUpper() and ToLower() should also provide a ToTitle().

    Now, I've been thinking in very low-level terms of late (I was doing the code for Appendix A of the article you linked to), so I apologise for not thinking of this earlier, but .NET does indeed have a ToTitleCase() method, as Panos Theofanopoulos pointed out. Panos wins the prize for finding the "more efficient/cleaner way" :)

  • Why would one ever want to convert an arbitrary string to CamelCase anyway? This strikes me as linguistically nonsensical since different languages have different capitalization and wordbreak rules.

    If you're doing it because the string is always an English variable name, then just admit as much and don't pretend that you're i18n-ready.

  • If you're writing a code generation facility, you'll most probably want to convert the default XML naming convention of camelCase to PascalCase, as that's the default in .NET applications. It's not an arbitrary string, and as I've said repeatedly, there are *NO* word breaks! AFAIK, XML names basically satisfy these requirements, so I don't need to worry about that.

    If a Chinese element name starts with a lowercase letter (whatever that is in Chinese), I want my program to properly make the first letter uppercase. I also explained this in previous comments.

    And I don't care if the resulting variable name is in Chinese; I just want it (the public property, actually) to follow the .NET naming style of starting with an uppercase letter.

    As for different languages having different capitalization, isn't that what Unicode was created for (in part)?

  • The error checking suggested by Michael Schwarz is definitely wrong: a System.String can't have a length < 0.

    I'd say your original is optimal, except that the method name is a bit misleading: capitalizing the first letter is not the same as making Pascal case:

    MakePascalCase("helloworld")              // gives "Helloworld", not "HelloWorld"
    MakePascalCase("helloworld helloWorld")   // gives "Helloworld helloWorld"

  • Finally!!! You got the idea!

    I definitely agree it should be called MakeFirstUpper(), or something like that...

  • You could also add a reference to the Microsoft Visual Basic .NET runtime (Microsoft.VisualBasic) and then use something like this:

    Microsoft.VisualBasic.Strings.StrConv("some string", Microsoft.VisualBasic.VbStrConv.ProperCase);

  • Mmm... as a C# developer, I think I'll pass on using Microsoft.VisualBasic...

    Anyway, I found a more efficient way:

    Char[] letters = theString.ToCharArray();
    letters[0] = Char.ToUpper( letters[0] );
    return new string( letters );

  • "ToTitleCase serves a different purpose"

    Yes, but it could be used as the basis of a MakePascalCase(). Depending on how much care you want to put into locale considerations, you could either put the portion of the string up to the first upper-case, title-case or uncased character into title case, or, for a faster version that should be fine with identifiers, merely put the first character into title case (for this type of code I think I'd go for the latter).

    "Because as far as Unicode is concerned, they are not 'equivalent' at all."

    All the more reason to accept them (if they *were* equivalent, then normalising to NFC would get rid of the complexity, although in some cases there might be security issues with such normalisation).

    "If you're writing a code generation facility, you'll most probably want to convert the default XML naming convention of camelCase to PascalCase, as that's the default in .NET"

    Personally I'd rather convert PascalCase to camelCase, but that's just a matter of taste (though I do try to force myself to use PascalCase when doing .NET or VB6, for the sake of anyone reading the code who is only used to those environments).

    I'd be wary of this in some cases. Sometimes camelCase and PascalCase are chosen by a convention in which the choice reflects something; one example would be RDF/XML, where camelCase is often used for predicates and PascalCase for classes. While it would still be poor design to have two identifiers that differ only in case, it might still happen, and someone very used to such a convention might find that the case "stands out" more than the names, and hence not notice if they had an author predicate and an Author class in the same namespace.

  • How do you expect anyone to read the code? No one's gonna bother changing their settings to see it.

  • Don't know what you mean. You don't have to change any setting to see my code :S

  • Don't need to re-invent the wheel.

    //Begin code: first letter of every word to upper case
    string sString = "cool string";

    System.Globalization.TextInfo tiInfo = new System.Globalization.TextInfo();

    sString = tiInfo.ToTitleCase(sString);
    //End code

    the sString variable will now be "Cool String".

    Look before you leap; research before you code.

  • Correct that second line to:
    //Begin Code
    System.Globalization.TextInfo tiInfo = new System.Globalization.CultureInfo("en-US", false).TextInfo;
    //End Code

  • Hi, in C# this should work for a single string and for the first letter of a paragraph.

  • myString = myString[0].ToString().ToUpper() + myString.Substring(1, myString.Length-1);

  • public class Upper {
        public static String Upper(String str) {
            int i = str.length();
            if (i == 0) return str;
            char c[] = str.toCharArray();
            c[0] = Character.toUpperCase(c[0]);
            str = Character.toString(c[0]);
            for (int count = 1; count < i; count++){
                // guard: a trailing space has no following character
                if (c[count] == ' ' && count + 1 < i){
                    c[count+1] = Character.toUpperCase(c[count+1]);
                    String replace = Character.toString(c[count]);
                }
                String replace = Character.toString(c[count]);
                str = str.concat(replace);
            }
            return str;
        }
    }

  • I wonder how that is more efficient than my original proposal... you're doing a tight loop, then you're creating two strings (I guess the double declaration of "replace" is just a typo), and then doing a string.concat which creates yet another string.

    Those are many more little objects than I'd want... also, as per the discussion above, we're not trying to split by ' '. That should be done with TextInfo.ToTitleCase()

  • Just do this:

    private string FirstToUpper(string makeUpper){
        return Char.ToUpper(makeUpper[0]) + makeUpper.Substring(1);
    }

  • Yup, that one is a shorthand for my code.
    Cool!

  • This may help:


    .allUpperCase { text-transform: uppercase }
    .allLowerCase {text-transform: lowercase}
    .firstUpperCase {text-transform: capitalize}
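Several comments compare the Substring, char-array and StringBuilder versions by timing them. A rough Stopwatch harness for that kind of comparison might look like the sketch below. The class and method names are hypothetical, and loops like this are noisy (JIT, GC, CPU frequency), so a dedicated tool such as BenchmarkDotNet gives far more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class FirstUpperTimings
{
    // Substring version: builds the result from a char and the tail.
    public static string WithSubstring(string s) =>
        string.IsNullOrEmpty(s) ? s : Char.ToUpper(s[0]) + s.Substring(1);

    // Char-array version: one copy of the characters, one new string.
    public static string WithCharArray(string s)
    {
        if (string.IsNullOrEmpty(s)) return s;
        char[] letters = s.ToCharArray();
        letters[0] = Char.ToUpper(letters[0]);
        return new string(letters);
    }

    // StringBuilder version from the comments.
    public static string WithStringBuilder(string s)
    {
        if (string.IsNullOrEmpty(s)) return s;
        var sb = new StringBuilder(s, s.Length);
        sb[0] = Char.ToUpper(sb[0]);
        return sb.ToString();
    }

    // Runs one variant many times and returns the elapsed wall-clock time.
    public static TimeSpan Time(Func<string, string> variant, string input, int iterations)
    {
        variant(input); // warm-up call so JIT compilation is not timed
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            variant(input);
        sw.Stop();
        return sw.Elapsed;
    }
}
```

For example, FirstUpperTimings.Time(FirstUpperTimings.WithStringBuilder, "myProperty", 1000000) returns the elapsed time for one variant; running each variant several times and comparing medians is more meaningful than a single run.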
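Several comments suggest TextInfo.ToTitleCase. This sketch (hypothetical class name; the invariant culture is chosen only so the result does not depend on the machine's settings) shows why it suits sentence-style text but not camelCase identifiers: ToTitleCase lower-cases the rest of each word unless the word is entirely uppercase, so "helloWorld" becomes "Helloworld".

```csharp
using System;
using System.Globalization;

public static class TitleCaseDemo
{
    // Upper-cases the first letter of every word and lower-cases the rest
    // of the word, unless the word is entirely uppercase (an acronym).
    public static string TitleCase(string text) =>
        CultureInfo.InvariantCulture.TextInfo.ToTitleCase(text);

    // Touches only the first character; everything else is kept verbatim.
    public static string FirstUpper(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;
        char[] letters = text.ToCharArray();
        letters[0] = Char.ToUpper(letters[0], CultureInfo.InvariantCulture);
        return new string(letters);
    }
}
```

TitleCase("cool string") gives "Cool String" and TitleCase("XML schema") gives "XML Schema", but for a camelCase XML name only FirstUpper("helloWorld") produces the desired "HelloWorld".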
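One comment points out that capitalizing the first letter is not the same as producing Pascal case. If the input did contain explicit separators, a conversion might look like this minimal sketch; the separator set and the ToPascalCase name are assumptions for illustration, not something the post defines.

```csharp
using System;
using System.Text;

public static class PascalCaseSketch
{
    // Hypothetical separator set; XML names may also contain '-', '_' and '.'.
    private static readonly char[] Separators = { ' ', '_', '-', '.' };

    // Splits on the separators, upper-cases the first character of every
    // piece and keeps the remaining characters exactly as they were.
    public static string ToPascalCase(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;
        var sb = new StringBuilder(text.Length);
        foreach (string part in text.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            // Invariant casing keeps generated identifiers independent of the
            // machine's culture (e.g. the Turkish dotted/dotless i).
            sb.Append(Char.ToUpperInvariant(part[0]));
            sb.Append(part, 1, part.Length - 1);
        }
        return sb.ToString();
    }
}
```

ToPascalCase("helloworld helloWorld") gives "HelloworldHelloWorld": a word boundary that is not marked by a separator (as inside "helloworld") cannot be recovered, which is exactly the limitation the comments discuss.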
