Using SQL Server 2005 CLR Features to Create Aggregates
I played around with SQL Server 2005 a bit this past week to explore (more in-depth) some of the different CLR features that are available. I'll be posting a few of the things I've experimented with over the next few days. One thing that can be really useful is the ability to create custom aggregate functions using C# or VB.NET, store them in the database and then use them within queries or stored procedures.
Jeff Jones (a SQL Server guru teaching at Interface Technical Training) and I were talking about the new Pivot functionality and how it was fairly painful to create the list of values to pivot on using pure T-SQL. It can definitely be done, but is a bit "hacky" IMHO. For example, the following query (which Jeff shared with me) could be used to dynamically grab the values to use as pivot columns. The query relies upon XPath capabilities of SQL Server to create a list similar to "[2004],[2005],[2006]" and then uses that list in a pivot statement.
FROM (SELECT DISTINCT CONVERT(VARCHAR, YEAR(OrderDate)) as dyear
FROM dbo.Orders) Y
ORDER BY dyear
FOR XML PATH('')), 1, 1, '')
IF @YEARLIST IS NOT NULL
EXEC ('SELECT * FROM (SELECT ProductID, YEAR(OrderDate) as theyear, Total
FROM dbo.Orders) t
PIVOT (SUM(Total) FOR theyear IN (' + @Yearlist + ')) PVT')
ELSE
SELECT * FROM (SELECT ProductID, YEAR(OrderDate) as theyear, Total
FROM dbo.Orders) t
PIVOT (SUM(Total) FOR theyear IN ([2006])) PVT
Although the XML PATH trick shown above does create the list of years to pivot on correctly, it's fairly tricky to understand what's going on. After thinking through it a little I decided that writing a CLR aggregate to handle generating the list of years would make things much easier. So, using a VS.NET 2005 database project I wrote the following class which handles generating the "[year1],[year2],[year3]" list used in the pivot query.
[SqlUserDefinedAggregate(
Format.UserDefined, //use clr serialization
IsInvariantToNulls = true, //optimizer property
IsInvariantToDuplicates = false, //optimizer property
IsInvariantToOrder = false, //optimizer property
MaxByteSize = 8000) //maximum size in bytes of persisted value
]
public class PivotConcatenate : IBinarySerialize
{
/// <summary>
/// The variable that holds the intermediate
/// result of the concatenation
/// </summary>
private ArrayList items;
/// <summary>
/// Initialize the internal data structures
/// </summary>
public void Init()
{
this.items = new ArrayList();
}
/// <summary>
/// Accumulate the next value, not if the value is null
/// </summary>
/// <param name="value"></param>
public void Accumulate(SqlString value)
{
if (value.IsNull)
{
return;
}
if (!items.Contains(value.Value))
{
this.items.Add(value.Value);
}
}
/// <summary>
/// Merge the partially computed aggregate
/// with this aggregate.
/// </summary>
/// <param name="other"></param>
public void Merge(PivotConcatenate other)
{
this.items.Add(other.items);
}
/// <summary>
/// Called at the end of aggregation, to return the
/// results of the aggregation.
/// </summary>
/// <returns></returns>
public SqlString Terminate()
{
//delete the trailing comma, if any
if (this.items != null && this.items.Count > 0)
{
//output = this.items.ToString(0,
//this.items.Length - 2);
}
return new SqlString(AddDelimeters(true));
}
public string AddDelimeters(bool addStartEndChars)
{
string output = String.Empty;
string[] newItems =
(string[])items.ToArray(typeof(string));
Array.Sort(newItems);
output = String.Join("],[", newItems);
if (addStartEndChars) output = "[" +
output + "]";
return output;
}
public void Read(BinaryReader r)
{
items = new ArrayList();
items.Add(r.ReadString());
}
public void Write(BinaryWriter w)
{
w.Write(AddDelimeters(false));
}
}
After deploying the aggregate class to my SQL Server database I was then able to change the query shown earlier to the following which is much simpler I think. Notice that the PivotConcatenate aggregate (shown above) is called directly rather than using the XML PATH trick.
SELECT @Yearlist =
(SELECT dbo.PivotConcatenate(CONVERT(VARCHAR, YEAR(OrderDate))) as dyear
FROM dbo.Orders)
PRINT @Yearlist
EXEC ('SELECT * FROM (SELECT ProductID, YEAR(OrderDate) as theyear, Total
FROM dbo.Orders) t
PIVOT (SUM(Total) FOR theyear IN (' + @Yearlist + ')) PVT')
SQL Server 2005's CLR capabilities can definitely help simplify code and make it easier to do string manipulations plus much more. I'll post some examples of using CLR functionality to grab RSS feeds from within SQL Server soon. Thanks to Jeff Jones for sharing the original pivot example with me.