Fabrice's weblog

Removing diacritics (accents) from strings

It's often useful to remove diacritical marks (often called accent marks) from characters. You know: tilde, cedilla, umlaut and friends. This means 'é' becomes 'e', 'ü' becomes 'u', and 'à' becomes 'a'. This can be used for indexing or to build simple URLs, for example.
Doing so is not easy if you don't know the trick. You can play with String.Replace or regular expressions... But did you know that .NET 2.0 has everything required to make this easier?

For example, you can use code like this:

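using System;
using System.Globalization;
using System.Text;

public static String RemoveDiacritics(String s)
{
  // FormD splits each accented character into its base character
  // followed by one or more combining marks.
  String normalizedString = s.Normalize(NormalizationForm.FormD);
  StringBuilder stringBuilder = new StringBuilder();

  for (int i = 0; i < normalizedString.Length; i++)
  {
    Char c = normalizedString[i];
    // Keep everything except the combining (non-spacing) marks.
    if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
      stringBuilder.Append(c);
  }

  return stringBuilder.ToString();
}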

This piece of code comes from Michael Kaplan's blog. I've been using it to clean my URL keys. It works well for that, and I suspect it's more efficient than chaining String.Replace calls or regular expressions.
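
To make the URL key use case concrete, here is a minimal sketch of how the function might be called. ToUrlKey is a hypothetical helper that is not part of the original code, and its rules (lowercase, hyphens between words, ASCII letters and digits only) are just one possible convention:

```csharp
using System;
using System.Globalization;
using System.Text;

public static class UrlKeyDemo
{
    // Same technique as the function above: decompose, then drop the combining marks.
    public static string RemoveDiacritics(string s)
    {
        string normalized = s.Normalize(NormalizationForm.FormD);
        StringBuilder result = new StringBuilder(normalized.Length);
        foreach (char c in normalized)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                result.Append(c);
        }
        return result.ToString();
    }

    // Hypothetical helper (an assumption, not from the post): lowercases the text
    // and replaces each run of other characters with a single hyphen.
    public static string ToUrlKey(string title)
    {
        string stripped = RemoveDiacritics(title).ToLowerInvariant();
        StringBuilder key = new StringBuilder(stripped.Length);
        bool pendingHyphen = false;
        foreach (char c in stripped)
        {
            bool keep = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
            if (keep)
            {
                // Emit a hyphen only between two kept runs, never at the start.
                if (pendingHyphen && key.Length > 0) key.Append('-');
                key.Append(c);
                pendingHyphen = false;
            }
            else
            {
                pendingHyphen = true;
            }
        }
        return key.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDiacritics("Crème brûlée à la française")); // Creme brulee a la francaise
        Console.WriteLine(ToUrlKey("Crème brûlée à la française"));         // creme-brulee-a-la-francaise
    }
}
```

Stripping the marks before filtering matters here: filtering first would drop 'è' entirely instead of keeping the 'e'.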

Comments

Florian DITTGEN said:

Well, 'ü' should become 'ue'. This is the legal alternative. (like 'ß' is equivalent to 'ss')
# November 8, 2006 5:59 PM

Fabrice Marguerie said:

Good remark! I'll pay attention to this.

# November 8, 2006 6:44 PM

anonymous coward said:

ü => ue is perhaps the "german way" but in France ü => u.

ü => u is consistent with the function name RemoveDiacritics (= remove the extra symbol ¨ on the primary character)

# February 13, 2007 11:10 AM
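
Both conventions discussed above can live side by side. The following is only a sketch: the germanStyle parameter and the replacement list are assumptions made for illustration, not part of the original function. Note that 'ß' has no Unicode decomposition, so the mark-stripping loop alone leaves it unchanged.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class TransliterationDemo
{
    // germanStyle is a hypothetical switch, not part of the original function.
    public static string RemoveDiacritics(string s, bool germanStyle)
    {
        // Compose first so that "u" + combining diaeresis is seen as the single character "ü".
        s = s.Normalize(NormalizationForm.FormC);

        if (germanStyle)
        {
            // Apply the German spellings before the marks are stripped.
            s = s.Replace("ä", "ae").Replace("ö", "oe").Replace("ü", "ue")
                 .Replace("Ä", "Ae").Replace("Ö", "Oe").Replace("Ü", "Ue")
                 .Replace("ß", "ss");
        }

        // Then strip whatever marks remain, as in the original function.
        string decomposed = s.Normalize(NormalizationForm.FormD);
        StringBuilder result = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                result.Append(c);
        }
        return result.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDiacritics("Müller-Straße", true));  // Mueller-Strasse
        Console.WriteLine(RemoveDiacritics("Müller-Straße", false)); // Muller-Straße
    }
}
```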

Durron's Blog said:

It's often useful to remove diacritic marks (often called accent marks) from characters. You know

# June 21, 2007 7:18 AM

Joe said:

Nice :-)

I have been looking for a function to do this for a while!!

cheers,

joe

# December 5, 2007 6:37 AM

C# Code: How to transform Åäö to Aao | Fredrik Haglund's blog said:

Pingback from C# Code: How to transform Åäö to Aao | Fredrik Haglund's blog

# April 16, 2008 9:40 AM

Raphaël said:

For VB.NET programmers like me:

Imports System.Globalization
Imports System.Text

Public Function RemoveAccentMarks(ByVal s As String) As String
    ' Decompose accented characters into base character + combining marks
    Dim normalizedString As String = s.Normalize(NormalizationForm.FormD)
    Dim stringBuilder As New StringBuilder()
    Dim c As Char

    For i As Integer = 0 To normalizedString.Length - 1
        c = normalizedString(i)
        ' Skip the combining marks, keep everything else
        If CharUnicodeInfo.GetUnicodeCategory(c) <> UnicodeCategory.NonSpacingMark Then
            stringBuilder.Append(c)
        End If
    Next

    Return stringBuilder.ToString()
End Function

# May 2, 2008 1:27 PM

bonaqua said:

I am afraid this code does not work. It does nothing when I try it on lowercase letters; furthermore, it destroys some uppercase letters.

# June 20, 2008 5:32 PM

ojejej said:

See the wReplace tool, which removes diacritics:

wwidgets.com/us_wReplace.html

There is also a replacement table available inside, so you can test your solutions.

# June 30, 2008 8:19 AM

TPI said:

Not present in Compact Framework. :o(

# July 29, 2008 4:48 AM
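
Where String.Normalize is unavailable, a small lookup table is one possible fallback. This is only a sketch: the table below is hypothetical and deliberately incomplete (lowercase Western European letters only), it assumes the input uses precomposed characters, and it has not been tested on the Compact Framework itself.

```csharp
using System;
using System.Text;

public static class TableFallbackDemo
{
    // Hypothetical, incomplete table: the character at position n in Accented
    // is replaced by the character at position n in Plain.
    const string Accented = "àáâäèéêëìíîïòóôöùúûüçñ";
    const string Plain    = "aaaaeeeeiiiioooouuuucn";

    public static string RemoveDiacriticsWithTable(string s)
    {
        StringBuilder result = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            // Characters missing from the table are copied through unchanged.
            int index = Accented.IndexOf(c);
            result.Append(index >= 0 ? Plain[index] : c);
        }
        return result.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDiacriticsWithTable("déjà vu")); // deja vu
    }
}
```

Unlike the normalization approach, this only handles the characters listed, so the table has to be extended for every language you need to support.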

glezypr said:

Wonderful!

Thanks.

# August 27, 2008 1:22 PM

David said:

Thanks, this function just saved my life.

# July 30, 2009 1:13 PM