SQL Server 2008 Data Compression Part-1

If most of you think that compression is a brand new feature introduced in SQL Server 2008 then I am afraid, you are wrong. It actually started with SQL Server 2005 when SP2 brought with itself a new storage format called Vardecimal.

Unlike Varchar, Vardecimal was a table-level option and affected the numeric data types stored in the underlying table.

SQL Server 2008 however goes beyond this and has introduced a full blown - Data Compression. Data Compression comes in two flavors:

  • Row Compression
  • Page Compression

Row Compression
Row compression changes the format of physical storage of data. Enabling row compression has the following affects on the underlying data:

  • It minimize the metadata (column information, length, offsets etc) associated with each record.
  • Numeric data types and fixed length strings are stored in variable-length storage format, just like Varchar.

It’s very simple to create a table with row compression enabled:

CREATE TABLE MyTable
(
      ID int identity Primary key,
      Name char(100),
      Email char(100)
)
WITH (DATA_COMPRESSION = Row);
GO

To enabled row compression on an existing table:

Alter TABLE MyTable REBUILD WITH (DATA_COMPRESSION=Row, MAXDOP=2);

Since compression is a CPU intensive process there is an additional setting MAXDOP which is used to specify the maximum number of processors to be used during compression operation.

Page Compression
The second and the most vital compression method is page compression. Page compression allows common data to be shared between rows for a given page. Its uses the following techniques to compress data:

  • Row compression. (Already discussed above)
  • Prefix Compression. For every column in a page duplicate prefixes are identified. These prefixes are saved in compression information headers (CI) which resides after page header. A reference number is assigned to these prefixes and that reference number is replaced where ever those prefixes are being used.
  • Dictionary Compression. Dictionary compression searches for duplicate values through out the page and stores them in CI. The main difference between prefix and dictionary compression is that prefix is only restricted to one column while dictionary is applicable to the complete page.

The syntax for enabling page compression is same as the one for row:

CREATE TABLE MyTable
(
      ID int identity Primary key,
      Name char(100),
      Email char(100)
)
WITH (DATA_COMPRESSION = Page);
GO

Or

Alter TABLE MyTable REBUILD WITH (DATA_COMPRESSION=Page, MAXDOP=2);

There is however few important points that one needs to understand when using page compression. Page compression works on page level so the amount of benefit that you can get out of this really depends on the data distribution per page. You can get drastically different results by just moving the data around, changing the clustered Index or any other change that modifies the structure of data stored.

Example
It’s time to put every thing that we studied so for to a test. Create a sample table and fill it with some hypo theoretical data.

Create Table CompressionCheck
(
      [ID] Int Identity Primary Key,
      [CharColumn] Char(100),
      [NumericColumn] Numeric(18, 4),
      [DateColumn] DateTime
)
Go
Set NoCount On
Declare @Min int, @Max int, @Rand int, @Str varchar(100), @i int
Declare @Counter Int
Set @Counter = 1
While @Counter < 100001
Begin
      Set @Min = 65
      Set @Max = 90
    Set @i = 0
    Set @Str = ''
    While @i < CONVERT(int, RAND() * 100)
    Begin
        Set @Rand = (((@Max + 1) - @Min) * RAND()) + @Min
        Set @Str = @Str + char(@Rand)
        Set @i = @i + 1
    End
      Insert Into dbo.CompressionCheck(NumericColumn, DateColumn, CharColumn)
      Select RAND() * 10000, GETDATE() + Convert(Int, RAND() * 1000), @Str
      Set @Counter = @Counter + 1
End
Set NoCount Off

The above TSQL script creates a table CompressionCheck and inserts 100000 randomly generated records. Note that page compression is not yet enabled. To find the current space utilization by the table use sp_spaceused proc.

Exec sp_spaceused 'CompressionCheck';

Results:

The above stats suggest that 13 MB of storage space is currently being used by CompressionCheck table. Now let’s apply page compression and see the results.

DBCC DROPCLEANBUFFERS --Removes all clean buffers from the buffer pool
Go
ALTER TABLE CompressionCheck REBUILD WITH (DATA_COMPRESSION=PAGE);
Go

Exec sp_spaceused 'CompressionCheck';

 

Results:

Comparing both the results we have successfully managed to reduce the space allocated to CompressionCheck table from 13 MB to just 3.5 MB.

kick it on DotNetKicks.com

1 Comment

  • Marvelous, very informative article. A very simple solution of a very complex problem programmers face frequently.
    Well done Harooon. Hope you will keep helping community.

Comments have been disabled for this content.