
Removing diacritics (accents) from strings

It's often useful to remove diacritical marks (often called accent marks) from characters. You know: tilde, cedilla, umlaut and friends. This means 'é' becomes 'e', 'ü' becomes 'u' and 'à' becomes 'a'. This can be used for indexing or for building simple URLs, for example.
Doing so is not easy if you don't know the trick. You can play with String.Replace or regular expressions... But did you know that .NET 2.0 has everything required to make this easier?

For example, you can use code like this:

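using System.Globalization;
using System.Text;

public static String RemoveDiacritics(String s)
{
  // FormD splits each accented character into a base character
  // followed by one or more combining marks.
  String normalizedString = s.Normalize(NormalizationForm.FormD);
  StringBuilder stringBuilder = new StringBuilder();

  for (int i = 0; i < normalizedString.Length; i++)
  {
    Char c = normalizedString[i];
    // Keep everything except the combining (non-spacing) marks.
    if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
      stringBuilder.Append(c);
  }

  return stringBuilder.ToString();
}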

This piece of code comes from Michael Kaplan's blog. I've been using it to clean my URL keys. It works well for that, and I suspect it's more efficient than the String.Replace or regular expression approaches.
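To make the URL key use case concrete, here is a minimal sketch of how the function might be combined with a slug builder. ToUrlKey is a hypothetical helper that is not part of the original code, and its rules (lower-case, ASCII letters and digits only, hyphens between words) are assumptions chosen for illustration.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsDemo
{
    // Same technique as above: decompose, then drop the combining marks.
    public static string RemoveDiacritics(string s)
    {
        string normalized = s.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(normalized.Length);
        foreach (char c in normalized)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Hypothetical helper (assumption, not from the original post):
    // lower-case the cleaned text and join runs of ASCII letters and
    // digits with single hyphens.
    public static string ToUrlKey(string title)
    {
        string cleaned = RemoveDiacritics(title).ToLowerInvariant();
        var sb = new StringBuilder(cleaned.Length);
        bool pendingHyphen = false;
        foreach (char c in cleaned)
        {
            bool keep = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
            if (keep)
            {
                // Emit a hyphen only between two kept runs, never at the start.
                if (pendingHyphen && sb.Length > 0) sb.Append('-');
                sb.Append(c);
                pendingHyphen = false;
            }
            else
            {
                pendingHyphen = true;
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDiacritics("é ü à ñ ç"));               // e u a n c
        Console.WriteLine(ToUrlKey("Crème brûlée à la façon de Zoë")); // creme-brulee-a-la-facon-de-zoe
    }
}
```

Characters that have no decomposition into a base letter plus a mark pass through RemoveDiacritics unchanged, so a slug builder like this one simply drops them.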

12 Comments

  • Well, 'ü' should become 'ue'. This is the accepted alternative spelling (just as 'ß' is equivalent to 'ss').

  • Good remark! I'll pay attention to this.

  • ü => ue is perhaps the "German way", but in France ü => u.

    ü => u is consistent with the function name RemoveDiacritics (= remove the extra symbol ¨ from the primary character).
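If the German convention is what you need, one possible approach is to expand the umlauts and 'ß' first and only then strip the remaining marks. The sketch below assumes exactly that; RemoveDiacriticsGerman and its mapping are hypothetical and not part of the original function. Note that 'ß' has no canonical decomposition, so RemoveDiacritics on its own leaves it untouched.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class GermanTransliterationDemo
{
    // The technique from the post: decompose, then drop the combining marks.
    public static string RemoveDiacritics(string s)
    {
        string normalized = s.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(normalized.Length);
        foreach (char c in normalized)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Hypothetical helper (assumption): apply German-style expansions
    // first, then fall back to plain diacritic removal for everything else.
    public static string RemoveDiacriticsGerman(string s)
    {
        var sb = new StringBuilder(s.Length + 8);
        // FormC ensures umlauts arrive as single precomposed characters.
        foreach (char c in s.Normalize(NormalizationForm.FormC))
        {
            switch (c)
            {
                case 'ä': sb.Append("ae"); break;
                case 'ö': sb.Append("oe"); break;
                case 'ü': sb.Append("ue"); break;
                case 'Ä': sb.Append("Ae"); break;
                case 'Ö': sb.Append("Oe"); break;
                case 'Ü': sb.Append("Ue"); break;
                case 'ß': sb.Append("ss"); break;
                default: sb.Append(c); break;
            }
        }
        return RemoveDiacritics(sb.ToString());
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDiacritics("Müller Straße"));       // Muller Straße
        Console.WriteLine(RemoveDiacriticsGerman("Müller Straße")); // Mueller Strasse
    }
}
```

Which of the two behaviours is right depends on the language of the text, so keeping them as separate functions lets the caller decide.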

  • Nice :-)
    I have been looking for a function to do this for a while!!
    cheers,
    joe

  • For VB.NET programmers like me:

    ' Requires: Imports System.Text
    Public Function RemoveAccentMarks(ByVal s As String) As String
        Dim normalizedString As String = s.Normalize(NormalizationForm.FormD)
        Dim stringBuilder As New StringBuilder()
        Dim c As Char

        For i As Integer = 0 To normalizedString.Length - 1
            c = normalizedString(i)
            If System.Globalization.CharUnicodeInfo.GetUnicodeCategory(c) <> System.Globalization.UnicodeCategory.NonSpacingMark Then
                stringBuilder.Append(c)
            End If
        Next

        Return stringBuilder.ToString()
    End Function

  • I am afraid this code does not work. It does nothing when I try it on lowercase letters; furthermore, it destroys some uppercase letters.

  • Not present in Compact Framework. :o(

  • Wonderful!

    Thanks.

  • Thanks, this function just saved my life.

  • I am not sure if this is good code, but it works fine for me! The others didn't! Just a simple bitmask.

    Public Function RemoveAccentMarks(ByVal s As String) As String
        Dim stringBuilder As New StringBuilder
        Dim c As Char
        For Each c In s
            Dim v As Char = Chr(Asc(c) And &H7F)
            stringBuilder.Append(v)
        Next
        Return stringBuilder.ToString
    End Function
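To see what a 7-bit mask does to a given character, a small check like the following may help. It is only a sketch: MaskTo7Bits is a hypothetical name, and it assumes the characters are in the Latin-1 range, where the Unicode code point matches the Windows-1252 value that Asc would return.

```csharp
using System;

public static class BitmaskCheck
{
    // Clears the high bit of the character code, like Asc(c) And &H7F.
    // Assumption: c is in the Latin-1 range (U+00A0 to U+00FF), where the
    // code point equals the Windows-1252 byte value.
    public static char MaskTo7Bits(char c)
    {
        return (char)(c & 0x7F);
    }

    public static void Main()
    {
        foreach (char c in "éàüñ")
        {
            Console.WriteLine("U+{0:X4} -> '{1}'", (int)c, MaskTo7Bits(c));
        }
        // U+00E9 -> 'i'
        // U+00E0 -> '`'
        // U+00FC -> '|'
        // U+00F1 -> 'q'
    }
}
```

The mask maps each code to the ASCII character 128 positions below it, which is not necessarily the base letter, so it is worth running your own data through a check like this before relying on it.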

  • Hello!

    How can I do the same with Java?

    Many thanks!

  • This will do it:

    public string RemoveAccents(string s)
    {
        byte[] b = Encoding.GetEncoding(1251).GetBytes(s);
        return Encoding.ASCII.GetString(b);
    }
