guyS's WebLog

IShare, My DotNet Fingerprint

NVARCHAR versus VARCHAR

SQL Server provides both datatypes to store character information. For the most part the two datatypes are identical in how you would work with them within SQL Server or from an application. The difference is that nvarchar is used to store unicode data, which is used to store multilingual data in your database tables. Other languages have an extended set of character codes that need to be saved and this datatype allows for this extension. If your database will not be storing multilingual data you should use the varchar datatype instead. The reason for this is that nvarchar takes twice as much space as varchar, this is because of the need to store the extended character codes for other languages from Ask The Database Expert: Questions & Answers
Posted: Jan 15 2005, 05:49 AM by guyS | with 42 comment(s) |
Filed under: ,

Comments

Tim said:

Does it really take twice as much space to store at all times?  Unicode is only 16-bits wide when using characters outside of the standard 7-bit ASCII range.  If you are primarily storing English, wouldn't the nvarchar therefore only store most characters as 8-bit wide as well?  That's how unicode works in most programming languages...  Just curious, I have no idea how SQL works in this respect.

# June 11, 2007 7:03 PM

Adam said:

Actually, most programming languages use two bytes (the equivalent of a WCHAR instead of a CHAR) to store unicode character data, regardless of what the data contains.  It would be a lot of overhead to try to determine, at runtime, whether each character could safely be chopped down to a single byte; so either you've declared up front that you're going to allocate space for unicode throughout the string, or you've declared that you won't.

# June 25, 2007 10:10 AM

Rajdeep Gupta said:

This the best example of a man of few words.

Excellent way to explain.

# July 17, 2007 8:11 AM

Graham said:

Just a note:

IIRC, Java uses unicode internally for strings throughout anyway, and has to convert to 8-bit on the fly.

# July 26, 2007 10:12 AM

Ben said:

UCS-2 is typical for in-memory representation and UTF-8 for serialisation.

NVARCHAR is UCS-2.

# August 17, 2007 5:35 AM

varun P said:

Very Useful Explanation in few words...............

thanx Adam

# October 19, 2007 7:47 AM

Hemant said:

Adam thanks ,Its Really good information.

# November 13, 2007 1:00 AM

Enlaces para el curso de MS SQL Server 2005 » Innova Desarrollos informáticos said:

Pingback from  Enlaces para el curso de MS SQL Server 2005 » Innova Desarrollos informáticos

# November 26, 2007 11:56 AM

Paul Sanders said:

If you have any plans to store accented characters and the like, nvarchar is definitely the way to go.  It avoids all the problems which different character sets and code pages can introduce.

Strangely, though, you are still expected to define a collating sequence for such a column, which is very much an 8-bit-character kind of thing.  Can anyone throw any light on this?

Regards,

Paul Sanders

AlpineSoft

http://www.alpinesoft.co.uk

# December 5, 2007 4:22 AM

Alexandre said:

"Strangely, though, you are still expected to define a collating sequence for such a column"

Well, I don't work for Microsoft so I can't say that I know why they did it.

But I may have a clue though. If you set up your column to accept unicode data, it's basically because you expect accents or other special characters.

From a language to another, the way those characters are sorted (does the e acute comes after Z, as in English, or after E, as in French?) changes. So it makes sense to give an hint to the DB enigne as to what to do with the data.

It think that it's like telling the DB : "Okay, this is unicode, but you should threat it like a Greek would do".

That's what I think it is, anyway... :)

# January 11, 2008 1:37 PM

Dmitry Lyalin said:

Thank you for putting this up, its simple and to the point.

# January 24, 2008 9:23 PM

adeptus said:

Just the information I was looking for! Thanks!

# January 30, 2008 12:02 PM

lembeke said:

Aha that's what they are! Excellent info !

# March 4, 2008 8:59 AM

Nick Steele said:

Perfect answers!  Sheesh you guys are smart!

# March 16, 2008 10:15 PM

Hamid said:

Thanks for complete information

# April 8, 2008 4:11 AM

smcmiata said:

Thanks for the simple answer to a simple question.

# April 24, 2008 1:35 PM

Jesse said:

I believe SQL server is optimized for unicode data so the question becomes is there really a benefit to using varchar over nvarchar outside of disk space? Does SQL Server convert UTF-8 to an internal unicode format before working with it? I know quite a few other areas of Microsoft software do this unless you specify not to.

# May 1, 2008 3:10 PM

Hari said:

then varchar also sufficint right..it is also used for dynamic thing only....if i ll give like varchar(max) then nothing difference right?

# May 3, 2008 11:24 AM

prakash patel said:

i got information about varchar in this area...

# June 10, 2008 4:30 AM

Shawn Steele said:

nvarchar should always be used, otherwise you end up with code page conversion issues and lots of pain.

# June 13, 2008 12:45 PM

Rock said:

Thanks. That's what i am looking for

# August 12, 2008 8:34 PM

Jesse #2 said:

How come I am able to store a unicode character in a varchar field?

# August 20, 2008 1:24 AM

Petra said:

Jesse, I have the same question. My field is varchar, but I can store there characters as ššžýá. But I am not able to save another national characters -e.g. čř. This can be done only with nvarchar.

Maybe collation setting on the database infuece it, too.

# September 24, 2008 8:33 AM

Randall said:

@Jesse, Petra

Those special characters are still ASCII characters, so you can store them in a varchar column.  

Here's a link to the entire ASCII set

www.codetoad.com/.../ascii_characters.asp

varchar is good for English, Spanish, German, French and many other European languages.

You'll need nvarchar for languages like Japanese, Korean, Hindi and Arabic that aren't descendents of Latin languages.

# November 22, 2008 2:40 PM

ForeignTrades said:

thanks, i had a problem with the database size, now i solved it...

# January 21, 2009 3:58 PM

Anthony said:

I ran into an issue of incredibly poor performance with 'linking' varchar and nvarchar fields in supposing they were practically identical... but it went away completely (going from a 10 minute query down to 6 seconds) when I made both table fields identical type - at varchar.

# February 4, 2009 3:40 PM

Cal said:

sp_executesql (please don't hurt me for mentioning prepared sql)... requires nvarchar for @stmt parameter... up to nvarchar(max) in 2005 or nvarchar(4000) in 2000. Correct me if I'm wrong. -Cal

# February 11, 2009 2:06 PM

NVarchar Oddness said:

Pingback from  NVarchar Oddness

# February 24, 2009 4:47 AM

Pablo said:

I have programmed spanish/english applications for several years and I have always used varchar for the string fields.  I have never had an issue with them.  Spanish accented characters and tildes (ñ) are stored and retrieved with no problem at all.

I have read just what Randall said above: nvarchar (unicode) fields are necessary when you have the need to store japanese, chinese or other "strange" characters.

# February 24, 2009 5:03 PM

ora said:

I saw what you guys wrote above, but I am still not sure of how nvarchar affects performance.

Lets say I have in my mssql only few tables with columns that may have Latin characters, how will the performance be affected if I change these specific columns to be nvarchar?

# March 31, 2009 11:55 AM

JLeitão said:

Ora,

Not sure about that, but in the original post you can read,

"...The reason for this is that nvarchar takes twice as much space as varchar..."

If you have lots of data, that can be a problem.

I can't see another performance problem

# April 15, 2009 1:00 PM

Tuk said:

This is the "Feeling Lucky" hit for "varchar vs nvarchar" :)

# April 20, 2009 10:40 PM

anisurrahman said:

Thank you very much for sharing this information

# May 9, 2009 6:54 PM

Jack said:

Yes, It really takes twice as much space at all times. The reason you declare how long a varchar or nvarchar is so that the Database App can reserve a specified amount of space for that variable. If you have a field set up as varchar(15) and your data is only 3 characters long, it will still take up 15 characters of storage because that is how much space the Database App reserved.

# June 8, 2009 11:35 AM

Karim said:

Short and sweet dude.  Took me longer to write this comment than to get my answer.

# June 15, 2009 2:15 PM

Xiara said:

Since data-storage is not the problem anymore, I allways use nvarchar.

# June 29, 2009 2:11 PM

Gary Howlett said:

Space may not be a problem these days, however I typically see the bigger the database the longer it may take to perform IO intensive queries.

Of course 1/2 the field size dosent mean 2x the query speed however what about when transmitting the data? espically over slower network.

Also why not use a user defined datatype, so if you do need to change the type the task is a little easier.

# July 20, 2009 9:11 AM

Badri said:

I totally agree, If you don't need it shouldn't be using NVarchar as size of data increases, transmission time increses. Good Article

# September 14, 2009 2:49 PM

» VARCHAR VS NVARCHAR Sean's Blog said:

Pingback from  » VARCHAR VS NVARCHAR Sean's Blog

# October 23, 2009 10:16 AM

Dave said:

What about conversion issues since ADO.NET uses Unicode and indexing etc. searching for varchar with an nvarchar parameter? Does that blow the indexing?

# November 2, 2009 10:12 PM

PGN said:

What Anthony said above about the query issue joining nvarchar to varchar is a known bug.  I had the same issue and resolved by making both varchar.  

# November 18, 2009 10:57 AM

Ben said:

Wouldn't nVarchars be slower in sorting, searching and joining because the processor has to look at twice as many bits to perform any action?

# November 19, 2009 1:27 PM
Leave a Comment

(required) 

(required) 

(optional)

(required)