Summit 7 Team Blogs

Avoiding a Full Crawl with SharePoint Search (and FS4SP)

While documenting the steps required to replace a failed primary indexer in an FS4SP farm, I read a statement in the Microsoft TechNet documentation recommending a full crawl, because items being processed (or indexed) at the time of the failure would not be in the recovered index.

Reading that, I remembered that individual items in the SharePoint crawl logs can be “marked” to be re-crawled during the next crawl. Rather than spend the resources on a full crawl of all content, one could locate just the items crawled shortly before the failure and mark only those to be re-crawled.

On the Crawl Log page for the FAST Connector SSA, select either the Host Name or the URL tab. In either view, one can filter the results by status and time period. Select Success as the status and a time frame covering the few minutes before the server failure. For each item, the context menu offers the option Recrawl this item in the next crawl.

Obviously, this could take some time if there were lots of items, but not as much time as a full crawl of all content sources.

How far back before the failure should one start? That depends on the performance of one’s FS4SP pipeline, that is, how long it takes from the moment the content dispatcher receives an item until that item is indexed. I would expect only a very few minutes.
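For anyone who would rather reason about the time window than eyeball it, the selection described above (Success status, crawled within a few minutes before the failure) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it assumes the crawl log entries have already been obtained as simple records of URL, status, and crawl time (how to get them out of SharePoint is outside the sketch), and the `CrawlLogEntry` record, its field names, the `recrawl_candidates` function, and the sample URLs and times are all hypothetical, not part of any SharePoint or FS4SP API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


# Hypothetical record for one crawl log entry; the field names are
# illustrative assumptions, not the SharePoint crawl log schema.
@dataclass
class CrawlLogEntry:
    url: str
    status: str
    crawled_at: datetime


def recrawl_candidates(entries, failure_time, lookback):
    """Return URLs of items crawled with status "Success" whose crawl time
    falls in the window [failure_time - lookback, failure_time]. These are
    the items that may still have been in the FS4SP pipeline, and therefore
    missing from the recovered index, when the indexer failed."""
    window_start = failure_time - lookback
    return [
        e.url
        for e in entries
        if e.status == "Success" and window_start <= e.crawled_at <= failure_time
    ]


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    failure = datetime(2011, 6, 1, 14, 30)
    log = [
        CrawlLogEntry("http://intranet/docs/a.docx", "Success", datetime(2011, 6, 1, 14, 10)),
        CrawlLogEntry("http://intranet/docs/b.docx", "Success", datetime(2011, 6, 1, 14, 27)),
        CrawlLogEntry("http://intranet/docs/c.docx", "Error", datetime(2011, 6, 1, 14, 28)),
        CrawlLogEntry("http://intranet/docs/d.docx", "Success", datetime(2011, 6, 1, 14, 29)),
    ]
    # A five-minute lookback, in line with "only a very few minutes".
    for url in recrawl_candidates(log, failure, timedelta(minutes=5)):
        print(url)
```

The sketch only narrows down which items to mark; each URL it prints would still have to be marked with Recrawl this item in the next crawl on the Crawl Log page. Making the lookback a parameter reflects the point above that the right window depends on how fast one’s own pipeline is.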

This often-overlooked “trick” also comes in handy when a user insists that a particular document was not indexed, or was not indexed properly.

My developer friends could probably write a few lines of code to mark items for re-crawling, but I have to work with the tools I am given.

Hope this was helpful.

About Daniel Webster