Sharepoint Include and Exclude Content and the Site Directory

Internally, we use several site directories around the portal to categorize external content, both WSS sites and other resources.

We are pretty liberal in allowing people to create sites and add links to relevant sites. There's a YASQ related to this. First, let me explain how the Site Directory area template works.

The Site Directory template is really just an Area with a list. I usually compare the directory to a whitepages listing that lets you tag content with metadata, the way the phonebook tags people with addresses and phone numbers. If your address changes, you aren't tightly coupled to the phonebook; only your entry needs updating. In the same way, using Site Directories provides more flexibility than creating site hierarchies in WSS to indicate associations.
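To make the whitepages analogy concrete, here is a minimal Python sketch that models a directory as a flat list of entries tagged with metadata. The `DirectoryEntry` class, the `find` helper, the metadata keys and the `http://portal/...` URLs are all hypothetical illustrations, not the Site Directory's actual list schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """One row in a hypothetical directory list: a title, a URL and metadata tags."""
    title: str
    url: str
    metadata: dict = field(default_factory=dict)


def find(entries, **criteria):
    """Return the entries whose metadata matches every given key/value pair."""
    return [e for e in entries
            if all(e.metadata.get(key) == value for key, value in criteria.items())]


entries = [
    DirectoryEntry("Sales team", "http://portal/sites/sales", {"Division": "Sales", "Region": "Europe"}),
    DirectoryEntry("HR policies", "http://portal/sites/hr", {"Division": "HR", "Region": "Europe"}),
]

print([e.title for e in find(entries, Region="Europe")])  # ['Sales team', 'HR policies']

# The site moves: only its "address" changes, its categories stay as they were.
entries[0].url = "http://portal/sites/sales-emea"
print([e.url for e in find(entries, Division="Sales")])  # ['http://portal/sites/sales-emea']
```

The point of the flat structure is that categorization lives in the entry, not in where the site sits, so moving or re-tagging a site never requires restructuring a hierarchy.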

When you click "Create Site" from the Site Directory Area Template (no matter where it is instantiated), you're basically performing three operations:

  1. Create a new top-level WSS site (yes, always top-level)
  2. Create an entry in the "whitepages", in other words tag your new site with metadata relevant to the Area you're creating it from (we often customize the list to provide accurate categories and metadata)
  3. Register the site/external resource for search (make it appear in the "Manage crawls of Site Directory -> Approved Sites")

So, no matter where you create a Site Directory in your portal, the template performs these operations. By default, all sites are also registered for search in the same place, regardless of which Area you created them from.
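The three steps can be sketched as a toy Python model, mainly to show that step 3 writes to one shared crawl list whichever Area you start from. This is not the SharePoint Portal Server object model: the `Portal` class, its fields, `create_site` and the `http://portal/sites/...` URL pattern are hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class Portal:
    """Hypothetical in-memory stand-in for the stores touched by "Create Site"."""
    top_level_sites: list = field(default_factory=list)
    directory_entries: dict = field(default_factory=dict)  # Area name -> list of entries
    approved_sites: list = field(default_factory=list)     # one crawl list shared by all Areas

    def create_site(self, area, name, metadata):
        # 1. Always a new top-level WSS site, whichever Area the user started from.
        url = f"http://portal/sites/{name}"
        self.top_level_sites.append(url)
        # 2. The "whitepages" entry: the Area's own list tags the site with metadata.
        self.directory_entries.setdefault(area, []).append({"url": url, **metadata})
        # 3. Registration for search goes to the same place for every Area.
        self.approved_sites.append(url)
        return url


portal = Portal()
portal.create_site("Sales", "emea-deals", {"Region": "EMEA"})
portal.create_site("HR", "recruiting", {"Region": "Global"})
print(portal.approved_sites)
# ['http://portal/sites/emea-deals', 'http://portal/sites/recruiting']
```

The directory entries are kept per Area, but the approved-sites list is not, which is why crawl settings cannot be tuned per Area by default.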

The other function provided by the Site Directory template is "Add link to site". This lets users add unlisted WSS sites or external content to the directory, and also makes Sharepoint crawl these resources and include them in search. For internal WSS sites this works fine, but if you add an external resource using this function you might get a nasty surprise.

I added my weblog to our portal Site Directory with the URL http://weblogs.asp.net/mnissen. Not having thought about this YASQ, I arrived the next morning to find that our portal was still indexing content, having started at 01:00 AM the night before. Sharepoint was indexing the entire http://weblogs.asp.net site; thank God I hadn't enabled "Include linked content".
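The difference between crawling a whole host and crawling only below a path can be illustrated with a toy breadth-first crawler. This is a sketch, not the Sharepoint gatherer: it assumes, consistent with what happened here, that without "Include linked content" the crawler stays on the start address's host but otherwise follows every link it finds. The link graph and the `same_host`/`under_path` scope functions are hypothetical.

```python
from collections import deque
from urllib.parse import urlsplit


def crawl(start, links, in_scope):
    """Breadth-first crawl of an in-memory link graph, following only in-scope URLs."""
    seen, queue = {start}, deque([start])
    while queue:
        for target in links.get(queue.popleft(), []):
            if target not in seen and in_scope(start, target):
                seen.add(target)
                queue.append(target)
    return seen


def same_host(start, url):
    # Stay on the start address's host (no "Include linked content").
    return urlsplit(start).netloc == urlsplit(url).netloc


def under_path(start, url):
    # Stay below the start address's path, e.g. /mnissen and /mnissen/...
    base = urlsplit(start).path.rstrip("/")
    path = urlsplit(url).path
    return same_host(start, url) and (path == base or path.startswith(base + "/"))


start = "http://weblogs.asp.net/mnissen"
links = {  # hypothetical link graph
    start: ["http://weblogs.asp.net/mnissen/archive/1.aspx", "http://weblogs.asp.net/"],
    "http://weblogs.asp.net/": ["http://weblogs.asp.net/someoneelse", "http://www.example.com/"],
}

print(len(crawl(start, links, same_host)))      # 4: the whole host, minus other hosts
print(sorted(crawl(start, links, under_path)))  # only the weblog itself
```

A single link from the weblog back to the host's front page is enough for a host-scoped crawl to reach every other blog on the server.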

To correct this behaviour I had to add two new rules to the Non_Portal_Content index: one excluding http://weblogs.asp.net/* and one including http://weblogs.asp.net/mnissen/*. I feel that crawling only the linked address should have been the default behaviour, and I offer this as a clear warning to administrators who enable the "Add link to site" feature without approvals: for every external resource included in the site directory, you have to think carefully about your include and exclude rules.
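Here is a minimal Python sketch of how an ordered pair of wildcard rules like these can be evaluated. It assumes first-match-wins ordering and uses `fnmatch` wildcards as a stand-in for the indexer's own matching; both are assumptions, as is the `default=True` for URLs no rule matches. `RULES` and `should_crawl` are hypothetical names.

```python
from fnmatch import fnmatchcase

# Ordered (action, pattern) pairs, assuming the first matching rule wins.
RULES = [
    ("include", "http://weblogs.asp.net/mnissen/*"),
    ("exclude", "http://weblogs.asp.net/*"),
]


def should_crawl(url, rules=RULES, default=True):
    """Return True if the first rule matching url is an include rule."""
    for action, pattern in rules:
        # fnmatch's "*" also matches "/", so one rule covers a whole subtree.
        if fnmatchcase(url, pattern):
            return action == "include"
    return default


print(should_crawl("http://weblogs.asp.net/mnissen/archive/2004.aspx"))  # True
print(should_crawl("http://weblogs.asp.net/someoneelse/"))               # False
print(should_crawl("http://weblogs.asp.net/mnissen/archive/2004.aspx",
                   rules=RULES[::-1]))                                   # False: exclude matches first
```

Under the first-match assumption, the order matters: if the broad exclude rule came first, it would also swallow the /mnissen/ pages.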

3 Comments

  • Maybe you can help me with something I need to do...

    I need to create a web part that lists the same items shown in the Site Directory, but I have some limitations.

    I can't access the SQL database directly, and I can't use the administration classes (I don't have the service account to impersonate)...

    Is there another way?

    Thanks.

  • Use the Sharepoint Lists web service. Download the Sharepoint SDK for details.

  • So there's no solution using the API?
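For the question in the comments, a call to the Lists web service's GetListItems method is just a SOAP POST to `_vti_bin/Lists.asmx`, made with the caller's own credentials rather than a service account. The Python sketch below (language aside, since a web part would normally be written in .NET) only builds the request. Assumptions: the Site Directory's list is named "Sites", the `<portal>/<area>` URL is a placeholder, the optional parameters (viewName, query, viewFields, queryOptions) are left out, and sending the request with Windows authentication is not shown.

```python
from xml.sax.saxutils import escape

SOAP_NS = "http://schemas.microsoft.com/sharepoint/soap/"


def get_list_items_request(list_name, row_limit=100):
    """Build HTTP headers and a SOAP 1.1 body for the Lists web service's GetListItems."""
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": f'"{SOAP_NS}GetListItems"',
    }
    body = f"""<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetListItems xmlns="{SOAP_NS}">
      <listName>{escape(list_name)}</listName>
      <rowLimit>{int(row_limit)}</rowLimit>
    </GetListItems>
  </soap:Body>
</soap:Envelope>"""
    return headers, body


# POST this to http://<portal>/<area>/_vti_bin/Lists.asmx with your own credentials.
headers, body = get_list_items_request("Sites", row_limit=50)
print(body)
```

The list name is XML-escaped so that names containing characters such as "&" still produce a well-formed envelope.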
