Fabrice's weblog

October 2006 - Posts

Removing diacritics (accents) from strings

It's often useful to remove diacritical marks (often called accents) from characters: tilde, cedilla, umlaut and friends. This means 'é' becomes 'e', 'ü' becomes 'u' and 'à' becomes 'a'. This can be used for indexing or to build simple URLs, for example.
Doing so is not easy if you don't know the trick. You could play with String.Replace or regular expressions, but did you know that .NET 2.0 has everything required to make this easier?

For example, you can use code like this:

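// Requires: using System; using System.Globalization; using System.Text;
public static String RemoveDiacritics(String s)
{
  // Decompose each accented character into its base character followed by combining marks
  String normalizedString = s.Normalize(NormalizationForm.FormD);
  StringBuilder stringBuilder = new StringBuilder();

  for (int i = 0; i < normalizedString.Length; i++)
  {
    Char c = normalizedString[i];
    // Keep every character except the combining (non-spacing) marks
    if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
      stringBuilder.Append(c);
  }

  return stringBuilder.ToString();
}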

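This piece of code comes from Michael Kaplan's blog. I've been using it to clean my URL keys. It works well for that purpose, and I suspect it's more efficient than chains of String.Replace calls or regular expressions. Keep in mind that it only strips marks from characters that Unicode decomposes into a base letter plus combining marks; letters such as 'ø' or 'ß' have no such decomposition and are left unchanged.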
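To make the URL key use case concrete, here is a minimal sketch. The DiacriticsDemo class and the ToUrlKey helper are hypothetical names introduced for this illustration only, and the rule of replacing every run of non-alphanumeric characters with a hyphen is an assumption, not necessarily how my own keys are built.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class DiacriticsDemo
{
    // Same technique as above: decompose, then drop the combining marks.
    public static string RemoveDiacritics(string s)
    {
        string normalized = s.Normalize(NormalizationForm.FormD);
        StringBuilder builder = new StringBuilder(normalized.Length);

        foreach (char c in normalized)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                builder.Append(c);
        }

        return builder.ToString();
    }

    // Hypothetical helper: strips accents, lowercases, then replaces every run
    // of characters other than a-z and 0-9 with a single hyphen.
    public static string ToUrlKey(string title)
    {
        string lower = RemoveDiacritics(title).ToLowerInvariant();
        return Regex.Replace(lower, "[^a-z0-9]+", "-").Trim('-');
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDiacritics("Crème brûlée"));     // Creme brulee
        Console.WriteLine(ToUrlKey("Crème brûlée à l'été"));     // creme-brulee-a-l-ete
    }
}
```

Stripping the accents before filtering matters here: without that step, 'è' or 'û' would be treated as non-alphanumeric and replaced by hyphens instead of being kept as 'e' and 'u'.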

Solving URL rewriting problems with themes and trailing slashes

In a comment on an old post of mine about URL rewriting, a visitor named Tim asked how to solve a problem he was facing with ASP.NET themes and rewriting. The original post addressed the main problems by using an overload of the RewritePath method introduced in .NET 2.0. Yet one simple problem remained: whenever URL rewriting is used with a URL like ~/somepath/, the theme gets broken because the paths to the CSS files and other themed resources (like images) are wrong. The problem here is the trailing slash, which confuses the theme engine. A URL like ~/somepath works fine, of course.

In fact, I recently noticed that I had the same problem with my own URLs on SharpToolbox/JavaToolbox and on a new site I'm working on. To resolve it, I perform a permanent redirection to the URL without the trailing "/". This is indeed what we wish to express: ~/somepath/ and ~/somepath designate the same resource.

Here is the regular expression I use to identify a problematic URL: ^(.*)/(\?.*)?$
I then redirect to: $1$2
Note that this expression also supports URLs with a query string: the optional second group captures it, so it is preserved in the redirect target.
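To check what the expression does before putting it in a rewriting rule, it can be tried in a small console program. This is only a sketch: the TrailingSlashDemo class and the GetRedirectTarget method are hypothetical names for this illustration, and a real rewriting module may match against a different form of the URL (absolute, application-relative, with or without the query string).

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingSlashDemo
{
    // Pattern from the post: a path ending with "/", optionally followed by a query string.
    private static readonly Regex TrailingSlash = new Regex(@"^(.*)/(\?.*)?$");

    // Returns the URL to redirect to, or null when the URL has no trailing slash.
    public static string GetRedirectTarget(string url)
    {
        Match match = TrailingSlash.Match(url);
        // $1 is the path without the final slash, $2 the optional query string.
        return match.Success ? match.Result("$1$2") : null;
    }

    public static void Main()
    {
        string[] urls = { "~/somepath/", "~/somepath/?page=2", "~/somepath", "~/somepath?page=2" };

        foreach (string url in urls)
        {
            Console.WriteLine("{0} -> {1}", url, GetRedirectTarget(url) ?? "(no redirect)");
        }
        // ~/somepath/ -> ~/somepath
        // ~/somepath/?page=2 -> ~/somepath?page=2
        // ~/somepath -> (no redirect)
        // ~/somepath?page=2 -> (no redirect)
    }
}
```

One thing this makes visible: a bare root such as "~/" also matches and would be rewritten to "~", so depending on the string your rewriting module matches against, you may want to exclude the site root from the rule.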

If you use UrlRewritingNet.UrlRewrite you can use the following rule:
<add virtualUrl="^(.*)/(\?.*)?$" destinationUrl="$1$2" rewriteUrlParameter="ExcludeFromClientQueryString" redirectMode="Permanent" redirect="Domain" />

If you use my old HTTP module, you should be able to use:
<add targetUrl="^(.*)/(\?.*)?$" destinationUrl="$1$2" permanent="true" />
