Removing diacritics (accents) from strings
It's often useful to remove diacritic marks (often called
accent marks) from characters: tilde, cedilla, umlaut and
friends. This means 'é' becomes 'e', 'ü' becomes 'u' and 'à'
becomes 'a'. This can be used for indexing or to build
simple URLs, for example.
Doing so is not easy if you don't know the trick. You can
play with String.Replace or regular expressions, but did you
know that .NET 2.0 already has everything required to make
this easier?
You can use code like this, for example:
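using System;
using System.Globalization;
using System.Text;

// Place this method inside any class of your own.
public static String RemoveDiacritics(String s)
{
    // FormD splits each accented character into its base character
    // followed by one or more combining marks ('é' becomes 'e' + U+0301).
    String normalizedString = s.Normalize(NormalizationForm.FormD);
    StringBuilder stringBuilder = new StringBuilder();
    for (int i = 0; i < normalizedString.Length; i++)
    {
        Char c = normalizedString[i];
        // Keep everything except the combining (non-spacing) marks.
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            stringBuilder.Append(c);
    }
    return stringBuilder.ToString();
}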
This piece of code comes from Michael Kaplan's blog. I've been using it to clean my URL keys. It has worked well for me, and I suspect it's more efficient than chaining String.Replace calls or regular expressions.
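To make the URL key use case concrete, here is a minimal sketch of how the method could be used to build one. The class name UrlKeyDemo and the helper ToUrlKey are hypothetical names made up for this illustration, and the rules ToUrlKey applies (lowercase, keep only a-z and 0-9, join everything else into single hyphens) are an assumption about what a "clean" key looks like, not something taken from Michael Kaplan's post.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class UrlKeyDemo
{
    // Same technique as RemoveDiacritics in the post, repeated here
    // so that this sketch compiles on its own.
    public static string RemoveDiacritics(string s)
    {
        string decomposed = s.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Hypothetical helper (an assumption for this example): strips
    // diacritics, lowercases, keeps a-z and 0-9, and collapses every
    // run of other characters into a single hyphen.
    public static string ToUrlKey(string title)
    {
        string plain = RemoveDiacritics(title).ToLowerInvariant();
        StringBuilder sb = new StringBuilder(plain.Length);
        bool pendingHyphen = false;
        foreach (char c in plain)
        {
            if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
            {
                // Emit the hyphen lazily so keys never start or end with one.
                if (pendingHyphen && sb.Length > 0)
                    sb.Append('-');
                sb.Append(c);
                pendingHyphen = false;
            }
            else
            {
                pendingHyphen = true;
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDiacritics("crème brûlée"));     // creme brulee
        Console.WriteLine(ToUrlKey("Où est l'été à Zürich ?"));  // ou-est-l-ete-a-zurich
    }
}
```

One thing to keep in mind: the trick only removes marks that Unicode treats as separate combining characters after decomposition. Letters such as 'ø', 'ł' or 'ß' have no such decomposition, so RemoveDiacritics leaves them untouched, and this ToUrlKey sketch would then treat them as separators. If your keys contain such letters, you would probably need a small extra mapping for them.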
