Some thoughts on SPS Content Indexing [3/3]
...and just one more.
Various indexing problems I've encountered.
- "Cannot create the file because it already exists - 0x800700B7"
This error was reported in the gatherer log for my Portal_Content index. It occurred several times - once for each Portal Area derived from a certain Site Definition I had customized. No content in these areas was indexed. This had me stumped for a while, since I couldn't understand what file the message referred to.
The problem, it turned out, was in the Site Definition's XML files.
The sitedef contained a custom list definition, which included its own SCHEMA.XML file with the list's configuration. While customizing this file, I accidentally left the DefaultView="TRUE" attribute set on two different Views of the list. This apparently caused SPS to choke when trying to index sites based on the definition, leaving them unindexed.
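To illustrate, the broken part of the SCHEMA.XML looked roughly like this (a hypothetical reconstruction - view names, URLs and the rest of the View elements are placeholders, not my actual definition):

```xml
<!-- Two Views both marked as default: this is the bug. -->
<Views>
  <View BaseViewID="0" Type="HTML" DefaultView="TRUE"
        DisplayName="All Items" Url="AllItems.aspx">
    <!-- ... -->
  </View>
  <View BaseViewID="1" Type="HTML" DefaultView="TRUE"
        DisplayName="My Custom View" Url="CustomView.aspx">
    <!-- ... -->
  </View>
</Views>
<!-- Only one View should carry DefaultView="TRUE". -->
```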
A telltale sign that this has happened (and one that should have alerted me sooner): when viewing the list of Lists in an area, instances of that custom list were shown twice, with links leading to both "default" views.
After fixing the SCHEMA.XML, new areas based on the template were indexed properly. To fix existing areas (which already had data in them), I wrote a small console application that connected to the server and used the Object Model to iterate over the problematic Areas, setting the redundant View's "DefaultView" property to false:
non_default_view.DefaultView = false;
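Fleshed out, the console app looked something along these lines (a rough sketch against the WSS v2 Object Model; the portal URL, list title and view name here are placeholders - adjust them to your own sitedef):

```csharp
using System;
using Microsoft.SharePoint;

class FixDefaultViews
{
    static void Main()
    {
        // Areas in SPS 2003 are SPWebs under the portal site collection
        SPSite portal = new SPSite("http://servername");
        foreach (SPWeb web in portal.AllWebs)
        {
            foreach (SPList list in web.Lists)
            {
                // "MyCustomList" stands in for the list from the sitedef
                if (list.Title != "MyCustomList")
                    continue;

                foreach (SPView view in list.Views)
                {
                    // Clear the flag on the view that shouldn't be default
                    if (view.DefaultView && view.Title == "My Custom View")
                    {
                        view.DefaultView = false;
                        view.Update();
                    }
                }
            }
            web.Close();
        }
        portal.Close();
    }
}
```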
- Contents of WSS sites listed in a Site Directory aren't indexed - only the site's existence can be found.
A two-step problem here. The first problem I had with indexing my WSS contents was that I was using a custom Site Directory rather than the default one. This was fixed using the method described in the previous article.
The second problem I had was that the sites were added to the portal when I was connected locally to the server, and browsed the http://localhost URL. This caused the newly-created WSS site's internally-saved URL to be under the localhost hostname, rather than my server name. When the crawling engine reached the sites, it noticed that their URLs differed - it was searching under servername, and found localhost instead. Since the default crawling rules don't allow server-hops while crawling, the sites were skipped.
I could not find a way to change the URLs of existing sites through the interface (changing the link in the Sites list only changed the pointer, not the SPSite itself), so I resorted to some quick DB editing: fix up the FullUrl column in the Sites table of your content database and all is well.
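For the record, the edit amounted to something like this (a hypothetical example assuming FullUrl carries the localhost-based URL, as it did in my case - inspect the actual values first, and of course back up the content database before touching it directly):

```sql
-- Swap the localhost hostname for the real server name
UPDATE Sites
SET FullUrl = REPLACE(FullUrl, 'localhost', 'servername')
WHERE FullUrl LIKE '%localhost%'
```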
Well, that's it for my late-night verbal assault. Hope I helped someone.