sysobjects, sys.objects... what about INFORMATION_SCHEMA?

SQL Server Management Objects (SMO) is one of the coolest parts of SQL Server 2005, in my opinion. I've written about SMO before, and used it extensively in writing Data Dictionary Creator. In a nutshell, SMO is a collection of objects that are designed for programming all aspects of managing Microsoft SQL Server.

One thing about SMO puzzles me, though - it uses the system tables (sysobjects and sys.objects) instead of INFORMATION_SCHEMA when it checks if tables, views, or procedures exist. There are at least three ways that's evident:

  1. Scripting database objects from SSMS
  2. Scripting database object using SMO in CaptureSQL mode
  3. The new SQL Server Database Publishing Wizard

All the above check for object existence using system table checks rather than INFORMATION_SCHEMA checks. I'm not just guessing how they work, though - take a look at Microsoft.SqlServer.Management.Smo.Scripts..ctor() in Reflector:

Scripts.INCLUDE_EXISTS_TABLE80 = "IF {0} EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'{1}') AND OBJECTPROPERTY(id, N'IsUserTable') = 1)"
Scripts.INCLUDE_EXISTS_TABLE90 = "IF {0} EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'{1}') AND type in (N'U'))"
Scripts.INCLUDE_EXISTS_VIEW90 = "IF {0} EXISTS (SELECT * FROM sys.views WHERE object_id = OBJECT_ID(N'{1}'))"
Scripts.INCLUDE_EXISTS_VIEW80 = "IF {0} EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'{1}') AND OBJECTPROPERTY(id, N'IsView') = 1)"
Scripts.INCLUDE_EXISTS_PROCEDURE80 = "IF {0} EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'{1}') AND OBJECTPROPERTY(id,N'IsProcedure') = 1)"
Scripts.INCLUDE_EXISTS_PROCEDURE90 = "IF {0} EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'{1}') AND type in (N'P', N'PC'))"

INFORMATION_SCHEMA views good, System Tables bad

The biggest problem is that internal system tables change between versions, so scripts are generated for one specific SQL Server version. The INFORMATION_SCHEMA views, on the other hand, are part of the SQL92 standard, so they don't change between versions.[1] Here's a pretty good article on INFORMATION_SCHEMA if you'd like to read further.

For example, the following works on both SQL 2000 and SQL 2005:

use northwind
go

if exists (select * from information_schema.tables where table_name = 'Products' and table_type = 'BASE TABLE') print 'Products table exists'
if exists (select * from information_schema.tables where table_name = 'Invoices' and table_type = 'VIEW') print 'Invoices view exists'
if exists (select * from information_schema.routines where specific_name = 'SalesByCategory') print 'SalesByCategory stored procedure exists'

Disclaimer

I know there are some cases where the INFORMATION_SCHEMA won't cut it. INFO SCHEMA is a cross database standard, so it's got no concept of some advanced and proprietary features. In Data Dictionary Creator, I had to support different syntax for extended property lookup, since SQL Server 2000 uses  ::fn_listextendedproperty and SQL Server 2005 uses sys.fn_listextendedproperty.

Fine. That's no reason to drop INFORMATION_SCHEMA - use it for the 90% of the time it will do the job, and drop down to the system tables when you have to.

Workaround: Use Regular Expression replacements on the scripts

You can use regular expression replacements to "fix" SMO generated scripts to run against INFORMATION_SCHEMA.

This regular expression matches a table existence check in SQL 2000 specific syntax:

^IF\ EXISTS\ \(SELECT\ \*\ FROM\ dbo\.sysobjects\ WHERE\ id\ =\ OBJECT_ID\(N'(?<tablename>.+?)'\)\ AND\ OBJECTPROPERTY\(id,\ N'IsUserTable'\)\ =\ 1\)$

Here's the same thing, in SQL 2005 syntax:

^IF\ NOT\ EXISTS\ \(SELECT\ \*\ FROM\ sys\.objects\ WHERE\ object_id\ =\ OBJECT_ID\(N'\[dbo]\.\[(?<tablename>.+?)]'\)\ AND\ type\ in\ \(N'U'\)\)$

We're using a named capture group so we can use it in the replacement. Then we replace either with the following:

IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = N'${TABLENAME}' AND TABLE_TYPE = 'BASE TABLE')

I'm resisting the urge to write a utility to handle the script conversion since I need to be working on my book. Here's a rough outline  for a console application that would read the console input, convert table existence checks to use INFORMATION_SCHEMA, and write the result to the console output. The reason for using console input and output is to allow piping SqlPubWiz output through it. It's not tested, and it doesn't include checks for views, procedures, and indices; they'd follow the exact same pattern. This isn't the optimal way to do regular expression replacements - given time, I'd do these with MatchEvaluators.

using System;
using System.Text;
using System.Text.RegularExpressions;

class SqlScriptCleaner
{
static void Main(string[] args)
{
string script = System.Console.In.ToString();
Regex.Replace(script,
@"^IF\ EXISTS\ \(SELECT\ \*\ FROM\ dbo\.sysobjects\ WHERE\ id\ =\ OBJECT_ID\(N'(?<tablename>.+?)'\)\ AND\ OBJECTPROPERTY\(id,\ N'IsUserTable'\)\ =\ 1\)$",
@"IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = N'${TABLENAME}' AND TABLE_TYPE = 'BASE TABLE')");
Regex.Replace(script,
@"^IF\ NOT\ EXISTS\ \(SELECT\ \*\ FROM\ sys\.objects\ WHERE\ object_id\ =\ OBJECT_ID\(N'\[dbo]\.\[(?<tablename>.+?)]'\)\ AND\ type\ in\ \(N'U'\)\)$",
@"IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = N'${TABLENAME}' AND TABLE_TYPE = 'BASE TABLE')");
System.Console.Out.Write(script);
}
}

Come to think of it, this (theoretical) utility could also take care of my other gripe with SqlPubWiz, which is that it doesn't allow you to select which objects to script. It's inelegant, but it would work - let SqlPubWiz script everything, then clean the output to remove scripts for tables we don't want included.

A more practical workaround: Regex Replacements in SSMS

For now, though, I'm fine with just handling this manually in an editor that supports Regular Expression replacement. Visual Studio does it, as do most of the top Notepad replacement tools (Notepad++, Notepad2, etc.). Now that SQL Server Management Studio uses the Visual Studio IDE, you can do Regex replacements in SSMS as well.

  1. Generate your script and open it in a new query window - if you're just scripting schema, you can right click on the database, select Tasks, and pick the Generate Scripts task (selecting a new query window as the output method). If you're scripting schema and data with SqlPubWiz, output to a SQL file and open it in SSMS.
  2. Bring up the Quick Replace dialog (Edit -> Find / Replace -> Quick Replace, or just hit ctrl-h).
  3. Expand the "Find Options" section and check the "Use" checkbox, then ensure "Regular expressions" is selected.
  4. Enter the correct Regex find and replacements as shown below. Note that Visual Studio has its own bastardized Regex syntax, so you'll need to tweak the regexes.
    Revised Find expression: ^IF\ NOT\ EXISTS\ \(SELECT\ \*\ FROM\ sys\.objects\ WHERE\ object_id\ =\ OBJECT_ID\(N'\[dbo\]\.\[{.+}\]'\)\ AND\ type\ in\ \(N'U'\)\)
    Revised Replace expression: IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = N'\1' AND TABLE_TYPE = 'BASE TABLE') 
  5. Hit the Replace All button.
  6. Make sure the script still compiles by hitting CTRL-F5.

[1] Jamie Thomson reported one issue with INFORMATION_SCHEMA behavior changing between 2000 and 2005 due to an implementation bug in INFORMATION_SCHEMA.COLUMNS.NUMERIC.PRECISION for smalldatetime and datetime.

4 Comments

  • I understand your gripe, but there's a reason that SMO works the way it does.

    2005 is backwards compatible with 2000. that means any script you generate from a 2000 db will execute fine on a 2005 server. this means sysobjects and friends exists on a 2005 server.

    the converse is not true. a generic create script in 2005 will not execute in 2000, because of new datatypes, say, like varchar(max), xml, etc.

    so you can't expect a 2005 script to run on a 2000 server.

    if SMO devs used INFORMATION_SCHEMA when possible, the code would have been even more difficult to maintain, because sometimes they'd have to use sys.objects, sometimes sysobjects, and sometimes INFORMATION_SCHEMA.

    from a maintenance standpoint, better to limit to the minimum. which means INFORMATION_SCHEMA is out, unfortunately. perhaps not the best choice for customers, but certainly better from a code-complexity standpoint (which means less bugs - also good for customers :)

  • How would you query for tables in tempdb using information schema views?

  • Select * from tempdb.INFORMATION_SCHEMA.TABLES

  • One thing that surprised me about information_schema.routines is that it only stores the first 4000 characters of the procedure. This is a big problem when searching the text of longer stored procedures. I don't know if there are any other "gotchas" to use information_schema, but that might be another reason why Microsoft opted to stick with sys.

Comments have been disabled for this content.