Archives

Archives / 2005 / May
  • SharePoint statistics: source processing

    In post http://weblogs.asp.net/soever/archive/2005/05/21/408207.aspx I did some investigations into the information logged by SharePoint in the IIS and STS log files. In this post I describe some decisions I’m going to make on processing these log files, based on the information that became available during my investigations. I’m writing these blog posts while doing these investigations, so if you have any comments on the decisions I make, please let me know!!

    Goal of this weblog post is to massage the available log data into a format that can easily be processed for importing into the “Stage Area – IN”, a SQL Server (2005) database where we import all source data that will eventually end up in our data warehouse.

    STS logs

    First of all we need a tool to convert the STS binary log files to a format that can easily be processed. The article Usage Event Logging in Windows SharePoint Services contains the code for a C++ application to do this conversion. I also got my hands on a C# implementation through Steven Kassim, a colleague of mine. He got this code from a newsgroup, but I couldn’t find where it exactly came from, and who wrote it. I’m doing some modifications to the code to change the output format (so LogParser can handle it), and to improve the speed. I will publish the code as soon as I’m ready. [Update: tracked down the newsgroup: http://groups.yahoo.com/group/sharepointdiscussions/, and the author: Fred LaForest].

    IIS logs

    Although the IIS log files are already in a format that could be easily parsed, there are some good reasons to do a preprocessing parse to accomplish the following:

    • Handle the problem of the IIS log header appearing in the log file on each IIS-RESET
    • Filter out log entries we are not interested in:
      • Requests made by service accounts, like the full text indexer account
      • Requests to assets in team sites that resolve to /_vti_bin/
      • Requests made to assets we are not interested in, like javascript files, css stylesheets, images, etc.
    • Filter out fields we are not interested in, like in our case the client IP address, because we base a user’s location on the main location of that user in the company directory (this can also be done through IIS by only selecting the properties we are interested in for our log file!)
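
    To make the filtering concrete, here is an illustrative sketch (in Python, not the tool we actually use) of the kind of preprocessing filter described above; the field positions and the service account name are assumptions you would adjust to your own log configuration:

```python
# Sketch of the preprocessing filter: drop repeated header lines, service
# account requests, /_vti_bin/ requests, and static supporting assets.
SERVICE_ACCOUNTS = {r"domain\serviceaccount"}        # hypothetical account name
SKIP_EXTENSIONS = (".js", ".css", ".gif", ".jpg", ".png")

def keep_line(line: str) -> bool:
    """Return True when an IIS log line should survive preprocessing."""
    if line.startswith("#"):        # header lines reappear after each IISRESET
        return False
    fields = line.split()
    # Field positions depend on the W3C properties selected in IIS; here we
    # assume cs-uri-stem is field 5 and cs-username is field 8 (adjust to
    # your own selection of logged properties).
    uri, username = fields[5].lower(), fields[8].lower()
    if username in SERVICE_ACCOUNTS:
        return False
    if uri.startswith("/_vti_bin/"):   # FSE requests from team sites
        return False
    if uri.endswith(SKIP_EXTENSIONS):  # static files supporting the UI
        return False
    return True
```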

    IIS supports multiple log formats, and multiple ways to log information. It is possible to do direct ODBC logging to a database, but this approach gives a heavier load on the web servers. The best format IIS can log in is the W3C Extended Log File Format. In this log format it is possible to select the fields we are interested in:

    [Screenshot: selecting the properties to log in the W3C Extended Log File Format dialog in IIS]

    Carefully selecting the properties we are interested in can greatly reduce the amount of data that will be logged.
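
    The nice thing about the W3C Extended Log File Format is that it declares the selected properties in a “#Fields:” directive, so a parser can map values by name instead of by fixed position. A minimal sketch of that idea:

```python
# Sketch: map W3C Extended Log values to field names using the "#Fields:"
# directive, skipping the other "#" directives (#Software, #Date, ...).
def parse_w3c(lines):
    """Yield one dict per data line, keyed by the declared field names."""
    field_names = []
    for line in lines:
        if line.startswith("#Fields:"):
            field_names = line[len("#Fields:"):].split()
        elif line.startswith("#") or not line.strip():
            continue
        else:
            yield dict(zip(field_names, line.split()))
```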

    For more information on the W3C Extended Log File Format see:

    Processing the log files: the tool

    There are many good systems around to process log files. Two log file processors I would really like to mention are:

    I have selected LogParser, because of its following features:

    • It supports any log file format (handy for the STS log files)
    • It might even be possible to implement direct binary parsing of the STS log files through a custom component into LogParser (still investigating this)
    • It supports incremental input parsing through checkpoints, which simplifies incremental importing of log file data into our database
    • It has a powerful query syntax
    • It is very powerful in its supported output formats
    • There is extensive programmability support available
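
    The checkpoint feature is worth a small illustration. The idea (which LogParser implements for you) is to remember how far into each log file the previous run got, and to process only what was appended since. A rough sketch of that mechanism, with a made-up checkpoint file format:

```python
# Sketch of checkpoint-based incremental parsing: remember the byte offset
# reached in each log file and continue from there on the next run.
import json

def read_new_lines(path, checkpoint_file):
    """Return the lines appended to `path` since the previous run."""
    try:
        with open(checkpoint_file) as f:
            offsets = json.load(f)
    except FileNotFoundError:
        offsets = {}
    with open(path) as f:
        f.seek(offsets.get(path, 0))   # continue where the last run stopped
        new_lines = f.readlines()
        offsets[path] = f.tell()
    with open(checkpoint_file, "w") as f:
        json.dump(offsets, f)
    return new_lines
```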

     For more information on LogParser see:

    For information on LogParser with respect to SharePoint, where direct reporting on the log files is done see:

    Back to the IIS logs: what do we need them for?

    As stated in the previous post, the STS log records all successful requests to pages and documents within WSS sites. This includes WSS-site-based SPS features like MySite and Areas. All those requests are logged in the IIS log as well, but they are difficult to correlate due to small time differences. It is also questionable whether correlating those log entries is worthwhile, because the STS log contains all the information we need… although… I have one issue: the bandwidth consumed by the request. I can’t get the correct value out of the STS log (although it should be in there), while the IIS log contains the correct values (sc-bytes and cs-bytes). This would be the only reason to do the correlation. I’m still working on this issue (I’ll post on it later), so let’s assume that problem will be solved.
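
    If the correlation turns out to be necessary, it will have to tolerate the small clock differences between the two logs. A sketch of the matching logic I have in mind (assuming timestamps have already been normalized to the same timezone; the field names are my own):

```python
# Sketch: find the IIS entry for an STS entry by matching the user and
# allowing the timestamps to differ by a small tolerance.
def correlate(sts_entry, iis_entries, tolerance_s=2):
    """Return the IIS entry closest in time for the same user, or None."""
    def distance(e):
        return abs((e["time"] - sts_entry["time"]).total_seconds())
    candidates = [e for e in iis_entries
                  if e["username"] == sts_entry["username"]
                  and distance(e) <= tolerance_s]
    return min(candidates, key=distance, default=None)
```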

    So what do we need the IIS logs for:

    • Pages not found (404 errors)
    • Pages in the /_layouts folder, this is also the location where we store our custom web applications and our custom services
    • Unmanaged paths in the SharePoint virtual directory (paths excluded for the SharePoint render-engine “treatment”)
    • IIS logs of other web sites, not related to SharePoint, but part of our intranet

    Any requests for images, javascript files and stylesheet files in the IIS log can be skipped in our case, because those are static files supporting the SharePoint UI and our custom applications. We also filter out requests made by service accounts; we are not interested in those requests.

    In the STS log, requests for images are interesting, because these images are user-uploaded documents within the WSS sites. We filter out requests made by service accounts for the STS logs as well.

    Moving IIS log files into the database

    To move the IIS log files into the database we need a table definition for the IIS logs. I’m currently using the following table definition:

    CREATE TABLE [dbo].[IISlog] (
     [date] [datetime] NULL,
     [time] [datetime] NULL,
     [csUsername] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [sComputername] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [csMethod] [varchar](16) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [csUriStem] [varchar](2048) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [csUriQuery] [varchar](2048) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [scStatus] [smallint] NULL,
     [scSubstatus] [smallint] NULL,
     [scWin32Status] [int] NULL,
     [scBytes] [int] NULL,
     [csBytes] [int] NULL,
     [timeTaken] [int] NULL,
     [csHost] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [csUserAgent] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [csReferer] [varchar](2048) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [application] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
    ) ON [PRIMARY]

    And the following LogParser script to move the data from the log files to the database:

    "C:\Program Files\Log Parser 2.2\logparser.exe" "SELECT date, time, cs-username, s-computername, cs-method, cs-uri-stem, cs-uri-query, sc-status, sc-substatus, sc-win32-status, sc-bytes, cs-bytes, time-taken, cs-host, cs(User-Agent) as cs-User-Agent, cs(Referer) as cs-Referer, 'SharePointPortal' as application INTO IISlog FROM c:\projects\IISlog\*.log WHERE (cs-username IS NOT NULL) AND (TO_LOWERCASE(cs-username) NOT IN ('domain\serviceaccount'))" -i:IISW3C -o:SQL -server:localhost -database:SharePoint_SA_IN -clearTable:ON

    This is the first step, where I filter out all requests made by the system account used to index the SharePoint content. I have not yet filtered out the WSS site requests (we will use the STS log for those) or the unwanted files in the /_layouts/ directory; I’m moving one step at a time. So we now have all log files (collected in the directory c:\projects\IISlog) moved into the database.

    Moving STS log files into the database

    To move the STS log files into the database we need a table definition for the STS logs. I’m currently using the following table definition:

    CREATE TABLE [dbo].[STSlog](
     [application] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [date] [datetime] NULL,
     [time] [datetime] NULL,
     [username] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [computername] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [method] [varchar](16) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [siteURL] [varchar](2048) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [webURL] [varchar](2048) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [docName] [varchar](2048) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [bytes] [int] NULL,
     [queryString] [varchar](2048) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [userAgent] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [referer] [varchar](2048) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
     [bitFlags] [smallint] NULL,
     [status] [smallint] NULL,
     [siteGuid] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
    ) ON [PRIMARY]

    And the following script to move the data from the binary log files to the database:

    "C:\projects\STSLogParser\STSLogParser.exe" 2005-01-01 "c:\projects\STSlog\2005-01-01\00.log"  c:\projects\logparsertmp\stslog.csv
    "C:\Program Files\Log Parser 2.2\logparser.exe" "SELECT 'SharePointPortal' as application, TO_DATE(TO_UTCTIME(TO_TIMESTAMP(TO_TIMESTAMP(date, 'yyyy-MM-dd'), TO_TIMESTAMP(time, 'hh:mm:ss')))) AS date, TO_TIME( TO_UTCTIME( TO_TIMESTAMP(TO_TIMESTAMP(date, 'yyyy-MM-dd'), TO_TIMESTAMP(time, 'hh:mm:ss')))), UserName as username, 'SERVERNAME' as computername, 'GET' as method, SiteURL as siteURL, WebURL as webURL, DocName as docName, cBytes as bytes,  QueryString as queryString, UserAgent as userAgent, RefURL as referer, TO_INT(bitFlags) as bitFlags, TO_INT(HttpStatus) as status, TO_STRING(SiteGuid) as siteGuid INTO STSlog FROM c:\projects\logparsertmp\stslog.csv WHERE (username IS NOT NULL) AND (TO_LOWERCASE(username) NOT IN (domain\serviceaccount))" -i:CSV -headerRow:ON -o:SQL -server:localhost -database:SharePoint_SA_IN -clearTable:ON
     

    This script currently moves only one day, but you get the drift. As you can see, we also set the date, computername and application fields in the log data. We currently use fixed values; we will move this into a dynamic system later on. The date field is obvious: we want to record the date in the database for each log entry. We need the computername and application fields because we will have multiple servers, and multiple “applications” built on SharePoint, for example ‘SharePointPortal’, ‘TeamSites’ (intranet) and ‘ExternalTeamSites’ (extranet).
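
    As a first step towards that dynamic system, the date can be derived from the STS log folder structure (each day’s logs live in a yyyy-MM-dd folder), and the server-to-application mapping can come from a small configuration table. An illustrative sketch with a made-up mapping:

```python
# Sketch: derive the date from the STS log folder name and look up the
# application for a server; the mapping below is a made-up example.
from pathlib import PureWindowsPath

APPLICATION_BY_SERVER = {"SRV-P-INTRA-3": "SharePointPortal"}  # hypothetical

def log_metadata(log_path, server):
    date = PureWindowsPath(log_path).parent.name    # e.g. '2005-01-01'
    return {"date": date,
            "computername": server,
            "application": APPLICATION_BY_SERVER.get(server, "Unknown")}
```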

    The STSLogParser is an application that parses the STS log file from its binary format into a comma-separated ASCII log file. I will post the code for this converter in one of my next posts.

  • SharePoint custom site definitions... again...

    There were a lot of comments on my post: SharePoint custom site definitions... I’m lost…, where I described the problems I have with the following statement in a new knowledge base article by Microsoft:

    "Microsoft does not support modifying a custom site definition or a custom area definition after you create a new site or a new portal area by using that site definition or area definition. Additionally, Microsoft does not support modifying the .xml files or the .aspx files in the custom site definition or in the custom area definition after you deploy the custom site definition or the custom area definition."

    I also asked John Jansen from Microsoft for a comment, and through his connections within Microsoft he came back with the following reaction:

    "The statement that appears to be making the most waves on Serge's blog -- "You modify a custom site definition or a custom area definition after you deploy the custom site definition or the custom area definition" -- was already in place in the SDK (http://msdn.microsoft.com/library/en-us/spptsdk/html/tsovGuidelinesCustomTemplates_SV01018815.asp?frame=true) before this KB article was published; this KB article was, more or less, a reminder and summary of the "rules" that had already been defined in various places throughout the SDK."

    I don’t know since when this is in the documentation. I started with the documentation when SharePoint was still beta, and the documentation was really sparse; a lot has changed since then. With every new release it gets better and better, and bigger and bigger… so I must have missed this one while working through the new documentation when a new version came out. The documentation does contain the following sentence:

    “Changing a site definition after it has already been deployed can break existing sites and is not supported. If you find that you must modify a site definition after it has already been deployed, keep in mind that adding features can cause fewer problems than changing or deleting them. Changing features often results in loss of data and deleting them often results in broken views.”

    So we have a problem, and we need a solution to it… An interesting post by Cornelius van Dyk describes such a solution (http://www.dtdn.com/blog/2005/05/microsoft-support-scenarios-for-custom_24.htm). Let’s hope he is willing to provide more information on his tools….

    For more information on issues with changing site definitions after instantiating a site based on the site definition, see also: http://weblogs.asp.net/bsimser/archive/2005/05/17/407237.aspx

    I would like to thank everyone for their responses, good to know what we can do and should not do.

  • SharePoint statistics: the sources

    First Issue in SharePoint statistics is: where can we find the information to do statistics on.

    Normally for a web application I grab the IIS log files and start from there. SharePoint is a different case: besides the IIS log files there are also the STS log files. The name STS dates back to SharePoint Team Services, SharePoint’s predecessor.

    • IIS log files: used to log ALL activity on a web site
    • STS log files: used to log all activity on Windows SharePoint Services (WSS) sites, which are also the basis for SPS areas

    There is a good reason why SharePoint needs logs in two different places: although all web access is logged in the IIS logs, many accesses to SharePoint go through the FrontPage Server Extensions (FSE). Yes, most of SharePoint is still running on FSE, and still implemented in COM. For these URL accesses no detailed information is available on what exactly is requested. In the IIS logs you find entries like:

    2004-12-31 23:58:06 SRV-P-INTRA-3 10.10.4.15 POST /_vti_bin/_vti_aut/author.dll - 443 domain\username 10.10.4.102 HTTP/1.1 MSFrontPage/6.0 - - hostname 200 0 0 1061 614 0
    2005-01-01 00:08:08 SRV-P-INTRA-3 10.10.4.15 POST /_vti_bin/_vti_aut/author.dll - 443 domain\username 10.10.4.102 HTTP/1.1 MSFrontPage/6.0 - - hostname 200 0 0 1061 614 140

    (I removed the author, because no one should have to know this guy does not have a life: editing SharePoint pages when everyone in the world is celebrating the new year!!!)

    As you can see, FrontPage does all page accesses through author.dll, but no information is available on which page is edited using FrontPage. Access to documents in WSS also goes through an FSE DLL.

    In the following example we access the homepage and a document test.doc in the document library  docs  in the site test:

    IIS log (stripped down a bit to save space):

    2005-01-01 00:52:22 SRV-P-INTRA-3 GET /default.aspx - 443 domain\username
    2005-01-01 00:52:22 SRV-P-INTRA-3 10.10.4.15 GET /_layouts/1033/owsbrows.js - 443 domain\username
    2005-01-01 00:52:22 SRV-P-INTRA-3 10.10.4.15 GET /_layouts/1033/styles/ows.css - 443 domain\username
    2005-01-01 00:52:26 SRV-P-INTRA-3 10.10.4.15 GET /_layouts/images/logo_macaw.jpg - 443 domain\username

    : goes on and on and on for all stylesheets, javascript files and pictures
    2005-01-01 00:52:26 SRV-P-INTRA-3 10.10.4.15 GET /_vti_bin/owssvr.dll - 443 domain\username

    STS log (stripped down a bit to save space):

    01:52:22,1,200,2758144,1,0BAD41D9-D7D6-4892-A42F-61E4BB7AAEED,domain\username,https://servername,,default.aspx
    01:52:27,1,200,1670913,1,040D5AB9-3072-45E3-975F-40C6B28CF132,domain\username,https://servername/sites/test,,docs/test.doc

    So in the IIS log the access to the page and the access to all its embedded and linked content is logged, while in the STS log only the access to the page itself is logged.
    In the IIS log accessing a document is logged as /_vti_bin/owssvr.dll, while the STS log specifies exactly which document is loaded from which document library in which site.

    For more information on the STS log format, have a look at the MSDN article: Usage Event Logging in Windows SharePoint Services.

    Looking at the IIS and STS logs, there are some important observations to make (some directly visible, others from the literature):

    • IIS logs have their timestamps in GMT
    • STS logs have their timestamps in local server time (honouring daylight saving time)
    • IIS log files ignore daylight saving time
    • STS logs are in a binary format, and must be converted to a usable format before processing
    • IIS logs write “header lines” on each IISRESET, so special processing is needed
    • After each page access, information is written directly to the IIS log
    • STS caches writes to its log file; do an IISRESET while investigating to make sure the cached log entries are flushed
    • The timestamps written to the IIS and STS logs can differ for the same page access. See the last line in the example above for both the IIS log and the STS log: the IIS log entry is written at 00:52:26 (so at 26 seconds), while the STS log entry is written at 01:52:27 (so at 27 seconds)
    • In the STS log only successful requests are logged (information streamed back to the client)
    • In the IIS log ALL requests are logged: requests for the /_layouts “in site context” pages, but also requests for missing pages
    • The STS log only logs requests for pages and documents in sites, not for content in, for example, the /_layouts directory
    • STS log entries only have a time, no date; the date is given by the folder structure in which the STS log files are stored
    • The available fields in the STS log files differ from the available fields in the IIS log files
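
    The timezone observations above mean that before comparing the two logs, the STS timestamps must be shifted to GMT. A minimal sketch, assuming a fixed UTC offset (a real implementation must honour the server timezone’s daylight saving time):

```python
# Sketch: combine the STS folder date with the logged time and shift the
# local timestamp to GMT so it can be compared with IIS log timestamps.
from datetime import datetime, timedelta

def sts_to_gmt(date_str, time_str, utc_offset_hours=1):
    local = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
    return local - timedelta(hours=utc_offset_hours)
```

    With a +1 offset, the STS example entry above (01:52:27 local time) lands at 00:52:27 GMT, within a second of the corresponding IIS entry.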

    Where to go from here? I save that for my next post!

     

  • SharePoint statistics: diving into SqlServer 2005 datawarehousing...

    I have got a new project to dive into: statistics and click-stream analysis on a SharePoint intranet with 30,000 users for one of our large customers.

    After years of development on a custom-built, classic-ASP-based portal for this customer, our company (Macaw) did a new implementation of their intranet portal based on SharePoint Portal Server 2003. I was part of this development team and created most of the tooling around the automated build process. We are currently code complete on the new implementation.

    Statistics are important in a large intranet. One part of our company, Macaw Business Solutions (MBS), is specialised in Business Intelligence. MBS got the project to implement the statistics and click-stream analysis part of the new intranet.

    Due to my knowledge of SharePoint I am now part of the project team, and I’m diving into the new world of Business Intelligence. I already got a crash course in BI from Jack Klaassen (Director of MBS) and Ralf van Gellekom, and it sounds like fun stuff!

    In their wisdom Jack and Ralf, together with the customer, decided to go for SQL Server 2005 and all the BI functionality it offers, instead of starting on SQL Server 2000 now and facing a migration project later, once the project is up and running and SQL Server 2005, with its much more powerful capabilities and tooling, becomes available.

    In my blog I will try to report on some of the steps and issues we encounter in this adventurous project. I will keep you posted!

     

  • SharePoint custom site definitions... I'm lost...

    Microsoft released a knowledge base article on which scenarios are and are not supported with respect to SharePoint site definitions: http://www.kbalertz.com/Feedback_898631.aspx One of the things that makes me really sad is the following statement: "Microsoft does not support modifying a custom site definition or a custom area definition after you create a new site or a new portal area by using that site definition or area definition. Additionally, Microsoft does not support modifying the .xml files or the .aspx files in the custom site definition or in the custom area definition after you deploy the custom site definition or the custom area definition." Besides ghosting, I thought that exactly this point was the powerful thing about site definitions!! Back to simple site templates, then… if you may not make any modifications afterwards so that all instances of your custom site definition instantly reflect those changes, site definitions are useless!!!
  • NAnt task xmllist, way more powerful than xmlpeek (source provided)

    UPDATE: See http://weblogs.asp.net/soever/archive/2006/12/01/nant-xmllist-command-updated.aspx for an updated version of the NAnt XmlPeek command. 

    I have a love-hate relationship with the <xmlpeek> command in NAnt.

    The problems I have with it are:

    • It reports an error when the XPath expression does not resolve to a node; there is NO WAY to test if a node or attribute exists (to my knowledge)
    • Its logging level is set to Level.Info, so there is always output. This should have been Level.Verbose; I don’t want output for every xmlpeek I perform
    • It is not possible to return the contents of multiple nodes selected by the XPath expression

    Especially the problem that I can’t test for the existence of a node or attribute bothers me. I can set failonerror to false and test afterwards whether the property exists, but that means an error is still reported in my build server report, while it is expected behaviour!

    Based on an implementation by Richard Case I wrote my own version of his <xmllist> task, a bit more powerful and using the standard naming for the attributes. Using this task you can extract text from an XML file at the locations specified by an XPath expression, and return those texts separated by a delimiter string. If the XPath expression matches multiple nodes, the nodes are separated by the delimiter string; if no nodes are matched, an empty string is returned.
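
    To make the semantics concrete, here is the same idea expressed as a small Python sketch (element text only; the real task also handles attribute selection and namespaces):

```python
# Sketch of the <xmllist> semantics: select nodes with an XPath expression,
# join their texts with a delimiter, optionally keep only unique values, and
# return an empty string when nothing matches.
import xml.etree.ElementTree as ET

def xmllist(xml_text, xpath, delim=",", unique=False):
    values = [e.text for e in ET.fromstring(xml_text).findall(xpath)]
    if unique:
        values = list(dict.fromkeys(values))  # drop duplicates, keep order
    return delim.join(values)
```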

    See the comments in the code for an extensive example.

    I will try to post this code to the NAnt developers mailing list, but it’s here to get you started if you need this kind of functionality.

    // NAnt - A .NET build tool
    // Copyright (C) 2001-2003 Gerry Shaw
    //
    // This program is free software; you can redistribute it and/or modify
    // it under the terms of the GNU General Public License as published by
    // the Free Software Foundation; either version 2 of the License, or
    // (at your option) any later version.
    //
    // This program is distributed in the hope that it will be useful,
    // but WITHOUT ANY WARRANTY; without even the implied warranty of
    // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    // GNU General Public License for more details.
    //
    // You should have received a copy of the GNU General Public License
    // along with this program; if not, write to the Free Software
    // Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
    //
    // Serge van den Oever (serge@macaw.nl)
    // Based on idea from weblog entry: http://blogs.geekdojo.net/rcase/archive/2005/01/06/5971.aspx combined with the code of xmlpeek.
    
    using System;
    using System.Globalization;
    using System.IO;
    using System.Text;
    using System.Xml;
    using System.Collections.Specialized;
    
    using NAnt.Core;
    using NAnt.Core.Attributes;
    using NAnt.Core.Types;
    
    namespace Macaw.MGDE
    {
    	/// <summary>
    	/// Extracts text from an XML file at the locations specified by an XPath 
    	/// expression, and return those texts separated by a delimiter string.
    	/// </summary>
    	/// <remarks>
    	/// <para>
    	/// If the XPath expression specifies multiple nodes the nodes are separated
    	/// by the delimiter string; if no nodes are matched, an empty string is returned.
    	/// </para>
    	/// </remarks>
    	/// <example>
    	///   <para>
    	///   The example provided assumes that the following XML file (xmllisttest.xml)
    	///   exists in the current build directory.
    	///   </para>
    	///   <code>
    	///     <![CDATA[
    	///	<?xml version="1.0" encoding="utf-8" ?> 
    	/// <xmllisttest>
    	/// <firstnode attrib="attrib1">node1</firstnode>
    	/// <secondnode attrib="attrib2">
    	/// <subnode attrib="attribone">one</subnode>
    	/// <subnode attrib="attribtwo">two</subnode>
    	/// <subnode attrib="attribthree">three</subnode>
    	/// <subnode attrib="attribtwo">two</subnode>
    	/// </secondnode>
    	/// </xmllisttest>	
    	///		]]>
    	///   </code>
    	/// </example>
    	/// <example>
    	///   <para>
    	///   The example reads numerous values from this file:
    	///   </para>
    	///   <code>
    	///     <![CDATA[
    	/// <?xml version="1.0" encoding="utf-8" ?> 
    	/// <project name="tests.build" default="test" basedir=".">
    	/// 	<target name="test">
    	/// 		<!-- TEST1: node exists, is single node, get value -->
    	/// 		<xmllist file="xmllisttest.xml" property="prop1" delim="," xpath="/xmllisttest/firstnode"/>    
    	/// 		<echo message="prop1=${prop1}"/>
    	/// 		<fail message="TEST1: Expected: prop1=node1" unless="${prop1 == 'node1'}"/>
    	/// 		
    	/// 		<!-- TEST2: node does not exist -->
    	/// 		<xmllist file="xmllisttest.xml" property="prop2" delim="," xpath="/xmllisttest/nonexistantnode" />    
    	/// 		<echo message="prop2='${prop2}'"/>
    	/// 		<fail message="TEST2: Expected: prop2=<empty>" unless="${prop2 == ''}"/>
    	/// 
    	/// 		<!-- TEST3: node exists, get attribute value -->
    	/// 		<xmllist file="xmllisttest.xml" property="prop3" delim="," xpath="/xmllisttest/firstnode/@attrib" />    
    	/// 		<echo message="prop3=${prop3}"/>
    	/// 		<fail message="TEST3: Expected: prop3=attrib1" unless="${prop3 == 'attrib1'}"/>
    	/// 
    	/// 		<!-- TEST4: nodes exists, get multiple values -->
    	/// 		<xmllist file="xmllisttest.xml" property="prop5" delim="," xpath="/xmllisttest/secondnode/subnode" />    
    	/// 		<echo message="prop5=${prop5}"/>
    	/// 		<fail message="TEST4: Expected: prop5=one,two,three,two" unless="${prop5 == 'one,two,three,two'}"/>
    	/// 
    	/// 		<!-- TEST5: nodes exists, get multiple attribute values -->
    	/// 		<xmllist file="xmllisttest.xml" property="prop5" delim="," xpath="/xmllisttest/secondnode/subnode/@attrib" />    
    	/// 		<echo message="prop5=${prop5}"/>
    	/// 		<fail message="TEST5: Expected: prop5=attribone,attribtwo,attribthree,attribtwo" unless="${prop5 == 'attribone,attribtwo,attribthree,attribtwo'}"/>
    	/// 
    	/// 		<!-- TEST6: nodes exists, get multiple values, but only unique values -->
    	/// 		<xmllist file="xmllisttest.xml" property="prop6" delim="," xpath="/xmllisttest/secondnode/subnode" unique="true"/>    
    	/// 		<echo message="prop6=${prop6}"/>
    	/// 		<fail message="TEST4: Expected: prop6=one,two,three" unless="${prop6 == 'one,two,three'}"/>
    	/// 
    	/// 		<!-- TEST7: nodes exists, get multiple attribute values -->
    	/// 		<xmllist file="xmllisttest.xml" property="prop7" delim="," xpath="/xmllisttest/secondnode/subnode/@attrib" unique="true"/>    
    	/// 		<echo message="prop7=${prop7}"/>
    	/// 		<fail message="TEST7: Expected: prop7=attribone,attribtwo,attribthree" unless="${prop7 == 'attribone,attribtwo,attribthree'}"/>
    	/// 		
    	/// 		<!-- TEST8: node exists, is single node, has namespace http://thirdnodenamespace, get value -->
    	/// 		<xmllist file="xmllisttest.xml" property="prop8" delim="," xpath="/xmllisttest/x:thirdnode">    
    	/// 			<namespaces>
    	/// 				<namespace prefix="x" uri="http://thirdnodenamespace" />
    	/// 			</namespaces>
    	/// 		</xmllist>
    	/// 		<echo message="prop8=${prop8}"/>
    	/// 		<fail message="TEST8: Expected: prop8=namespacednode" unless="${prop8 == 'namespacednode'}"/>
    	/// 	</target>
    	/// </project>
    	///		]]>
    	///   </code>
    	///   Result when you run this code:
    	///   <code>
    	///		<![CDATA[
    	/// 	test:
    	/// 
    	/// 	[echo] prop1="node1"
    	/// 	[echo] prop2="''"
    	/// 	[echo] prop3="attrib1"
    	/// 	[echo] prop5="one,two,three,two"
    	/// 	[echo] prop5="attribone,attribtwo,attribthree,attribtwo"
    	/// 	[echo] prop6="one,two,three"
    	/// 	[echo] prop7="attribone,attribtwo,attribthree"
    	/// 	[echo] prop8="namespacednode"
    	/// 
    	/// 	BUILD SUCCEEDED
    	///		]]>
    	///   </code>
    	/// </example>
    	[TaskName ("xmllist")]
    	public class XmlListTask : Task
    	{
    		#region Private Instance Fields
    
    		private FileInfo _xmlFile;
    		private string _xPath;
    		private string _property;
    		private string _delimiter = ",";
    		private bool _unique = false; // assume we return all values
    		private XmlNamespaceCollection _namespaces = new XmlNamespaceCollection();
    
    		#endregion Private Instance Fields
    
    		#region Public Instance Properties
    		/// <summary>
    		/// The name of the file that contains the XML document
    		/// that is going to be interrogated.
    		/// </summary>
    		[TaskAttribute("file", Required=true)]
    		public FileInfo XmlFile 
    		{
    			get
    			{
    				return _xmlFile;
    			}
    			set
    			{
    				_xmlFile = value;
    			}
    		}
    
    		/// <summary>
    		/// The XPath expression used to select which nodes to read.
    		/// </summary>
    		[TaskAttribute ("xpath", Required = true)]
    		[StringValidator (AllowEmpty = false)]
    		public string XPath
    		{
    			get
    			{
    				return _xPath;
    			}
    			set
    			{
    				_xPath = value;
    			}
    		}
    
    		/// <summary>
    		/// The property that receives the text representation of the XML inside 
    		/// the nodes returned from the XPath expression, separated by the specified delimiter.
    		/// </summary>
    		[TaskAttribute ("property", Required = true)]
    		[StringValidator (AllowEmpty = false)]
    		public string Property
    		{
    			get
    			{
    				return _property;
    			}
    			set
    			{
    				_property = value;
    			}
    		}
    
    		/// <summary>
    		/// The delimiter string.
    		/// </summary>
      		[TaskAttribute ("delim", Required = false)]
    		[StringValidator (AllowEmpty = false)]
    		public string Delimiter
    		{
    			get
    			{
    				return _delimiter;
    			}
    			set
    			{
    				_delimiter = value;
    			}
    		}
    
    		/// <summary>
    		/// If unique, no duplicate values are returned. By default unique is false and all values are returned.
    		/// </summary>
    		[TaskAttribute ("unique", Required = false)]
    		[BooleanValidator()]
    		public bool Unique
    		{
    			get
    			{
    				return _unique;
    			}
    			set
    			{
    				_unique = value;
    			}
    		}
    
    		/// <summary>
    		/// Namespace definitions to resolve prefixes in the XPath expression.
    		/// </summary>
    		[BuildElementCollection("namespaces", "namespace")]
    		public XmlNamespaceCollection Namespaces 
    		{
    			get
    			{
    				return _namespaces;
    			}
    			set
    			{
    				_namespaces = value;
    			}
    		}
    
    		#endregion Public Instance Properties
    
    		#region Override implementation of Task
    
    		/// <summary>
    		/// Executes the XML reading task.
    		/// </summary>
    		protected override void ExecuteTask() 
    		{
			Log(Level.Verbose, "Looking at '{0}' with XPath expression '{1}'.", 
				XmlFile.FullName, XPath);
    
    			// ensure the specified xml file exists
    			if (!XmlFile.Exists) 
    			{
    				throw new BuildException(string.Format(CultureInfo.InvariantCulture, 
    					"The XML file '{0}' does not exist.", XmlFile.FullName), Location);
    			}
    			try 
    			{
    				XmlDocument document = LoadDocument(XmlFile.FullName);
    				Properties[Property] = GetNodeContents(XPath, document);
    			} 
			catch (BuildException) 
			{
				throw; // Re-throw build exceptions unchanged, preserving the stack trace.
			} 
    			catch (Exception ex) 
    			{
    				throw new BuildException(string.Format(CultureInfo.InvariantCulture,
    					"Retrieving the information from '{0}' failed.", XmlFile.FullName), 
    					Location, ex);
    			}
    		}
            
    		#endregion Override implementation of Task
            
		#region Private Instance Methods
    
    		/// <summary>
    		/// Loads an XML document from a file on disk.
    		/// </summary>
    		/// <param name="fileName">The file name of the file to load the XML document from.</param>
    		/// <returns>
    		/// A <see cref="XmlDocument">document</see> containing
    		/// the document object representing the file.
    		/// </returns>
    		private XmlDocument LoadDocument(string fileName)  
    		{
    			XmlDocument document = null;
    
    			try 
    			{
    				document = new XmlDocument();
    				document.Load(fileName);
    				return document;
    			} 
    			catch (Exception ex) 
    			{
    				throw new BuildException(string.Format(CultureInfo.InvariantCulture,
    					"Can't load XML file '{0}'.", fileName), Location, 
    					ex);
    			}
    		}
    
    		/// <summary>
    		/// Gets the contents of the list of nodes specified by the XPath expression.
    		/// </summary>
    		/// <param name="xpath">The XPath expression used to determine the nodes.</param>
    		/// <param name="document">The XML document to select the nodes from.</param>
    		/// <returns>
    		/// The contents of the nodes specified by the XPath expression, delimited by 
    		/// the delimiter string.
    		/// </returns>
    		private string GetNodeContents(string xpath, XmlDocument document) 
    		{
    			XmlNodeList nodes;
    
    			try 
    			{
    				XmlNamespaceManager nsMgr = new XmlNamespaceManager(document.NameTable);
    				foreach (XmlNamespace xmlNamespace in Namespaces) 
    				{
    					if (xmlNamespace.IfDefined && !xmlNamespace.UnlessDefined) 
    					{
    						nsMgr.AddNamespace(xmlNamespace.Prefix, xmlNamespace.Uri);
    					}
    				}
    				nodes = document.SelectNodes(xpath, nsMgr);
    			} 
    			catch (Exception ex) 
    			{
				throw new BuildException(string.Format(CultureInfo.InvariantCulture,
					"Failed to execute the XPath expression '{0}'.", xpath), 
					Location, ex);
    			}
    
    			Log(Level.Verbose, "Found '{0}' nodes with the XPath expression '{1}'.",
    				nodes.Count, xpath);
    
			// collect all strings in a string collection, skipping duplicates when Unique is true
    			StringCollection texts = new StringCollection();
    			foreach (XmlNode node in nodes)
    			{
    				string text = node.InnerText;
    				if (!Unique || !texts.Contains(text))
    				{
    					texts.Add(text);
    				}
    			}
    			
    			// Concatenate the strings in the string collection to a single string, delimited by Delimiter
    			StringBuilder builder = new StringBuilder();
    			foreach (string text in texts)
    			{
    				if (builder.Length > 0)
    				{
    					builder.Append(Delimiter);
    				}
    				builder.Append(text);
    			}
    
    			return builder.ToString();
    		}
		#endregion Private Instance Methods
    	}
    }
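
To give an idea of how the task is used from a build file, here is a minimal sketch of a NAnt fragment. Note the assumptions: the task element name `xmllist`, the file attribute name `file` (its `[TaskAttribute]` declaration falls outside the listing above), and the sample file/namespace values are all hypothetical; substitute the names the task is actually registered under in your task assembly.

```xml
<?xml version="1.0"?>
<project name="demo" default="read">
	<target name="read">
		<!-- Hypothetical element/attribute names; the xpath, property,
		     delim, unique and namespaces names match the listing above. -->
		<xmllist file="sites.xml"
		         xpath="/x:Sites/x:Site/x:Url"
		         property="site.urls"
		         delim=";"
		         unique="true">
			<namespaces>
				<namespace prefix="x" uri="urn:example-sites" />
			</namespaces>
		</xmllist>
		<!-- All matching node texts end up in one property, e.g. to echo
		     or to split again in a later step. -->
		<echo message="Urls: ${site.urls}" />
	</target>
</project>
```

Because the selected node texts are concatenated with the `delim` string, the receiving property holds a single delimited value, which is the usual way to pass a list through NAnt properties.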