Detect Chinese Character in Unicode String
Recently, while converting some directory and file names between Chinese and English, it became necessary to detect whether a Unicode string contains Chinese characters. Unfortunately, detecting Chinese, or detecting a language in general, is not easy. There are several options:
- Use the Microsoft Language Detection API in Extended Linguistic Services
- Use the Detect API of Microsoft Translator
- Use Microsoft's sample C# package for language identification
- Take the character range of the East Asian scripts (CJK Unified Ideographs (Han), where CJK means Chinese-Japanese-Korean) from the Unicode charts, and check whether each character falls in that range.
- Use Google Chrome’s language detector, since Chrome is open source.
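As a rough illustration of the character range option, here is a minimal sketch. It assumes that only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) matters; the extension blocks, including those outside the Basic Multilingual Plane, are deliberately ignored. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class CjkRangeSketch
{
    // Assumption: only the basic CJK Unified Ideographs block is checked.
    private const char BlockStart = '\u4E00';
    private const char BlockEnd = '\u9FFF';

    // Returns true if any UTF-16 code unit of the string falls in the block.
    public static bool ContainsCjkUnifiedIdeograph(string value) =>
        !string.IsNullOrEmpty(value)
        && value.Any(c => c >= BlockStart && c <= BlockEnd);
}
```

With this sketch, CjkRangeSketch.ContainsCjkUnifiedIdeograph("文档") is true and CjkRangeSketch.ContainsCjkUnifiedIdeograph("Documents") is false.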
 
These are all practical, but it would be nice to have a simple, stupid solution. In fact, .NET has a little-known enum, System.Globalization.UnicodeCategory, which has 30 members:
- UppercaseLetter
 - LowercaseLetter
 - OpenPunctuation
 - ClosePunctuation
 - MathSymbol
 - OtherLetter
 - …
 
And there are 2 APIs accepting a char and returning the char’s UnicodeCategory:
- char.GetUnicodeCategory
 - CharUnicodeInfo.GetUnicodeCategory
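To see what these two APIs return, the following small sketch prints the category of a few sample characters. The class name, method name, and sample characters are chosen here only for illustration.

```csharp
using System;
using System.Globalization;

public static class UnicodeCategoryDemo
{
    public static void PrintCategories()
    {
        // Both APIs take a char and return its UnicodeCategory.
        Console.WriteLine(char.GetUnicodeCategory('A'));            // UppercaseLetter
        Console.WriteLine(char.GetUnicodeCategory('a'));            // LowercaseLetter
        Console.WriteLine(char.GetUnicodeCategory('('));            // OpenPunctuation
        Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory('+')); // MathSymbol
        Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory('中')); // OtherLetter
    }
}
```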
 
So, generally, the following extension method detects whether a string contains any char in the specified UnicodeCategory (it requires System.Linq and must be declared in a static class):
public static bool Any(this string value, UnicodeCategory category) => !string.IsNullOrEmpty(value) && value.Any(@char => char.GetUnicodeCategory(@char) == category);
Chinese characters are categorized as OtherLetter, so the Chinese detection problem becomes an OtherLetter detection problem.
public static bool HasOtherLetter(this string value) => value.Any(UnicodeCategory.OtherLetter);
The detection is easy:
bool hasOtherLetter = text.HasOtherLetter();
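It is not totally accurate for Chinese, because OtherLetter also covers other caseless scripts such as Japanese kana, Korean Hangul, Arabic, and Hebrew, but it works very well for distinguishing an English string from a Chinese string.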
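Putting the pieces together, the following is a self-contained sketch of the two extension methods plus a small demonstration. The class names StringExtensions and OtherLetterDemo and the sample strings are assumptions made for illustration. It uses string.IsNullOrEmpty as the guard, so a whitespace-only string can still be tested against whitespace categories.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class StringExtensions
{
    // True if any char of the string belongs to the given category.
    public static bool Any(this string value, UnicodeCategory category) =>
        !string.IsNullOrEmpty(value)
        && value.Any(@char => char.GetUnicodeCategory(@char) == category);

    // Chinese characters are OtherLetter, so this is the heuristic check.
    public static bool HasOtherLetter(this string value) =>
        value.Any(UnicodeCategory.OtherLetter);
}

public static class OtherLetterDemo
{
    public static void Run()
    {
        Console.WriteLine("Documents".HasOtherLetter()); // False
        Console.WriteLine("文档".HasOtherLetter());       // True
        // Hiragana is also OtherLetter, so Japanese text is reported too.
        Console.WriteLine("ひらがな".HasOtherLetter());   // True
    }
}
```

Note that char.GetUnicodeCategory inspects one UTF-16 code unit at a time, so ideographs outside the Basic Multilingual Plane, which are stored as surrogate pairs, are reported as Surrogate rather than OtherLetter by this sketch.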
