Detect Chinese Character in Unicode String

Sunday, January 10, 2016

Unicode C#

Recently, while converting some directory and file names between Chinese and English, I needed to detect whether a Unicode string contains Chinese characters. Unfortunately, language detection in general, and Chinese detection in particular, is not easy. There are several options:

Use the Microsoft Language Detection API in Extended Linguistic Services (ELS)
Use the Detect API of Microsoft Translator
Use Microsoft's sample C# package for language identification
Take the character ranges of East Asian scripts from the Unicode charts (for example, CJK Unified Ideographs, i.e. Han characters, where CJK stands for Chinese, Japanese and Korean), and check whether each character falls within those ranges.
Use Google Chrome's language detector (the Compact Language Detector), which is open source as part of Chromium.
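As a rough illustration of the range-based option, here is a minimal sketch that checks each code point against a few Han ranges from the Unicode charts. The class name `HanRangeDetector` and the selection of ranges are my own assumptions, and the list is deliberately partial (later CJK extensions are left out), so treat it as a starting point rather than a complete table.

```csharp
public static class HanRangeDetector
{
    // A PARTIAL list of Han ideograph ranges from the Unicode charts.
    // Later extensions (C, D, E, ...) are omitted here for brevity.
    private static readonly (int Start, int End)[] HanRanges =
    {
        (0x4E00, 0x9FFF),   // CJK Unified Ideographs
        (0x3400, 0x4DBF),   // CJK Unified Ideographs Extension A
        (0x20000, 0x2A6DF), // CJK Unified Ideographs Extension B (outside the BMP)
        (0xF900, 0xFAFF),   // CJK Compatibility Ideographs
    };

    // True if the code point falls inside one of the ranges above.
    public static bool IsHan(int codePoint)
    {
        foreach (var (start, end) in HanRanges)
        {
            if (codePoint >= start && codePoint <= end) return true;
        }
        return false;
    }

    // Walks the string by code point, combining surrogate pairs first.
    public static bool ContainsHan(string value)
    {
        if (string.IsNullOrEmpty(value)) return false;
        for (int i = 0; i < value.Length; i++)
        {
            int codePoint;
            if (char.IsSurrogatePair(value, i))
            {
                codePoint = char.ConvertToUtf32(value, i);
                i++; // skip the low surrogate, already consumed
            }
            else
            {
                codePoint = value[i]; // BMP char or lone surrogate
            }
            if (IsHan(codePoint)) return true;
        }
        return false;
    }
}
```

Iterating by code point matters because ideographs in Extension B and later sit outside the Basic Multilingual Plane and occupy two UTF-16 chars. Note that this answers "contains Han characters", not "is Chinese": Japanese text written with kanji matches as well.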

These are all practical, but it would be nice to have a simple, stupid solution. Actually, .NET already has a lesser-known enum, System.Globalization.UnicodeCategory, with 30 members:

UppercaseLetter
LowercaseLetter
OpenPunctuation
ClosePunctuation
MathSymbol
OtherLetter
…
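Rather than relying on the abbreviated list, you can enumerate the whole enum at run time. A minimal sketch (the class name `UnicodeCategoryListing` is hypothetical):

```csharp
using System;
using System.Globalization;

public static class UnicodeCategoryListing
{
    // Every UnicodeCategory member, in numeric order of the underlying values.
    public static UnicodeCategory[] All() =>
        (UnicodeCategory[])Enum.GetValues(typeof(UnicodeCategory));

    // Prints each member with its integer value, e.g. " 4: OtherLetter".
    public static void PrintAll()
    {
        foreach (UnicodeCategory category in All())
        {
            Console.WriteLine($"{(int)category,2}: {category}");
        }
    }
}
```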

There are two APIs that accept a char and return its UnicodeCategory:

char.GetUnicodeCategory
CharUnicodeInfo.GetUnicodeCategory
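The sketch below calls both APIs on a few sample characters. One caveat worth seeing: the overloads that take a single char only see one UTF-16 code unit, so a character outside the BMP (such as U+20000, a Han ideograph in CJK Extension B) shows up as Surrogate, while the CharUnicodeInfo.GetUnicodeCategory(string, int) overload looks at the whole code point. The class name `CategoryComparison` is hypothetical.

```csharp
using System.Globalization;

public static class CategoryComparison
{
    // Category of a single UTF-16 code unit, as reported by each API.
    public static (UnicodeCategory FromChar, UnicodeCategory FromCharUnicodeInfo) OfChar(char c) =>
        (char.GetUnicodeCategory(c), CharUnicodeInfo.GetUnicodeCategory(c));

    // Category of the code point starting at index; a surrogate pair
    // is combined into one code point before the lookup.
    public static UnicodeCategory OfCodePointAt(string s, int index) =>
        CharUnicodeInfo.GetUnicodeCategory(s, index);
}
```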

So the following general-purpose extension method detects whether a string contains any character in the specified UnicodeCategory:

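// Declare inside a non-nested, non-generic static class, with
// using System.Globalization; and using System.Linq; at the top of the file.
// IsNullOrEmpty (not IsNullOrWhiteSpace) so whitespace categories can still be detected.
public static bool Any(this string value, UnicodeCategory category) =>
    !string.IsNullOrEmpty(value)
    && value.Any(@char => char.GetUnicodeCategory(@char) == category);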

Chinese characters fall into the OtherLetter category, so detecting Chinese can be reduced to detecting OtherLetter:

public static bool HasOtherLetter(this string value) => value.Any(UnicodeCategory.OtherLetter);
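Because the method above checks one char at a time, a string made only of ideographs outside the BMP (for example CJK Extension B) is seen as pairs of Surrogate chars, and HasOtherLetter returns false for it. If that matters, a code-point-aware variant is a small change. This is a minimal sketch; the names `CodePointCategoryExtensions`, `AnyCodePoint` and `HasOtherLetterCodePoint` are hypothetical.

```csharp
using System.Globalization;

public static class CodePointCategoryExtensions
{
    // Like the char-based Any extension method in this post, but walks the
    // string by code point, so supplementary-plane ideographs report OtherLetter.
    public static bool AnyCodePoint(this string value, UnicodeCategory category)
    {
        if (string.IsNullOrEmpty(value)) return false;
        for (int i = 0; i < value.Length; i++)
        {
            // This overload combines a surrogate pair starting at i.
            if (CharUnicodeInfo.GetUnicodeCategory(value, i) == category) return true;
            // Skip the low surrogate of a pair; it was covered at index i.
            if (char.IsSurrogatePair(value, i)) i++;
        }
        return false;
    }

    public static bool HasOtherLetterCodePoint(this string value) =>
        value.AnyCodePoint(UnicodeCategory.OtherLetter);
}
```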

Detection is then straightforward:

bool hasOtherLetter = text.HasOtherLetter();

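It is not an accurate test for Chinese specifically, because letters in other scripts, such as Japanese kana and Korean Hangul, are OtherLetter as well. But it works very well for telling an English string from a Chinese one.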
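To see the limitation in practice, the sketch below runs the same check over greetings in several scripts. Japanese kana, Korean Hangul and Arabic letters are all OtherLetter, so each of them makes the method return true; the check separates English from text in such scripts rather than Chinese from other languages. The class name `OtherLetterDemo` is hypothetical, and its method repeats the post's logic only so the snippet compiles on its own.

```csharp
using System.Globalization;
using System.Linq;

public static class OtherLetterDemo
{
    // Equivalent to the post's HasOtherLetter, as a plain static method.
    public static bool HasOtherLetter(string value) =>
        !string.IsNullOrEmpty(value)
        && value.Any(c => char.GetUnicodeCategory(c) == UnicodeCategory.OtherLetter);
}
```

For example, `OtherLetterDemo.HasOtherLetter("Hello World")` is false, while "你好世界", "こんにちは", "안녕하세요" and "مرحبا" all return true.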

12 Comments

Thanks. This is a simple way to say whether the text is English or not.

Patrick - Wednesday, January 6, 2016 7:12:00 PM

Yeah, I've mentioned that earlier, like you.

Hydromax - Tuesday, January 12, 2016 11:20:06 AM

Dixin's Blog
