"What's the Point of [SQL Server] User-Defined Types?"

Friday, October 7, 2005

I'm asked that question every now and then from other developers who've played around in SQL Server Enterprise Manager and noticed the "User Defined Data Types" tab under their database. UDT seem a bit strange and pointless because they do not allow one to define (as one might expect) a data structure with more than one related data element. A UDT consists simply of a name and a base type (INT, VARCHAR(6), etc).

So why then would one use a UDT? It all has to do with a fundamental concept of data known as "domains." I'm not referring to a dot-com type domain, but a domain in the mathematical sense of restricting the value of a particular value. For example, the domain of x for "f(x) = 1/x" is "!=0".

We don't get domains in C++ / C# / VB / etc; all we have are types (integer, date, string, etc). But we're used to not having this; everyone knows you need to check if "x != 0" before trying to divide by x. Imagine how much less coding (and related bugs) we'd have if trying to assign "0" to "x" threw an exception from the start, instead of in the middle. That's exactly what you can (and should) be doing with your databases.

When I start on this same explanation to others, it turns out a lot don't quite understand what check constraints are. Basically, check constraints are used to define the domain of a column to ensure that a row can only contain valid data according to the business rules. For example, your Products table should have a check constraint on the Price column, requiring it to be greater than zero (this would cause an exception to be raised if you tried to update the price to zero). Here's another example of some code:

CREATE TABLE [Transactions] (
[Transaction_Id] INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
[Transaction_Type] VARCHAR(5) NOT NULL
    CHECK ([Transaction_Type] IN ('Debit','Credit','Escrow')),
[Transaction_Amount] DECIMAL(4,2) NOT NULL
    CHECK ([Transaction_Amount] <> 0),
[Reference_Code] CHAR(5)
    CHECK ([Reference_Code] LIKE '[A-Z][ A-Z][A-Z][A-Z][A-Z]'))
)

Get the idea? Each column has a constraint to ensure only valid data is allowed in the table. This way, there is no way that [Reference_Code] could contain anything but a five character string of upper case letters. No need to write code to test it, no need to ever validate it (except maybe on the data entry form so that the user doesn't see an ugly exception message), and no need to assume that it will be anything but that.

Now, immagine that you wanted to have the same [Reference_Code] attribute throughout your database. You'd have to define that check constraint time and time again. If the rules ever changed, you'd need to change it in every place. That's where UDTs come into place. UDTs are the SQL Server imlementation of domains.

If you have a common data element that will be used throughout the system, then it should be a UDT. Account number, Username, Order Number, etc; all should be UDT. When you define these types, you can easily apply rules (which are essentially just check constraints that apply whenever the type is used) to the type, and have it automatically enforced throughout the system.

It's really easy to do. I'll use the SQL 2005 syntax, but you can do the same things in 2000 using sp_addtype and sp_addrule:

CREATE TYPE USERNAME FROM VARCHAR(20)
GO

CREATE RULE USERNAME_Domain
AS @Username = LTRIM(RTRIM(@Username))
AND LOWER(@Username) NOT IN ('admin','administrator','guest')
GO

EXEC sp_bindrule 'USERNAME_Domain', 'USERNAME'
GO

And that's it. Now you can use the type throughout the database just as you normally would, and you'll never need to check or verify to make sure that someone slipped in an invalid value ...

CREATE TABLE [User_Logons] (
[Username] USERNAME NOT NULL,
[Logon_Date] DATETIME NOT NULL,
[Success_Indicator] CHAR(1) NOT NULL
CHECK ([Success_Indicator] IN ('Y','N')),
PRIMARY KEY ([Username],[Logon_Date])
)

There's wisdom in your words and I'm surprised that so much people don't know anything about SQL servers and use them as flat tables. But let me give you something to think about: Consider the following situation - if you have n-tier application you better check the input in the user access layer - some simple validators will dramatically minimize the traffic and load of your servers. You better check the input as early as possible not rely on the SQL server to inform you whether input is wrong or not.

However I strongly encourage anyone to be using as much check constraints, foreign keys and unique indexes as possible in order to keep the database consistent. I ran so many times into wrong reports, because of doing it "the wrong way" - that means the sql wasn't not joining the tables "properly" and it was including records, that shouldn't be in the database and those records of course were not referenced at all.

nsimeonov - Sunday, October 16, 2005 8:29:00 PM

Thanks for posting this; it's clear, concise example of UDTs that makes a great response to why people should use them.

sqlfan - Friday, July 21, 2006 10:58:12 PM

But how can change it, say from char(1) to char(2)?
since you cannot drop it if it's already referenced.
The point of user-defined type is to keep inconsistency, now it canot be changed since it's defined.
Thanks for your help.

Steve Zhang - Tuesday, August 8, 2006 3:19:57 PM

Great explanation!

I'll consider and talk to our IS-team to move many of the data validations-functions to the SQL-layer instead.

Thank you very much for great examples!

Stefan Nilsson - Monday, June 25, 2007 10:19:55 AM

Alex,

Your code will leave the books in worse shape than Enron! Hint: your transactions table will contain only debits.

Syarzhuk - Wednesday, August 15, 2007 6:28:21 PM

5 Comments