Let me start this post by saying that Lucene.net and its extensions are one of my personal favorite parts of Sitecore 6. If you're not already familiar with it and you have pages that return large item sets in Sitecore, it's definitely worth looking into. Now that I've gotten that out of the way, I don't feel so bad telling you that you should be extraordinarily careful when dealing with wildcard searches. I think that's a fair statement in any case, but it is especially true for me now, after my recent experience with Lucene.net.
Now, I'm not saying that Lucene.net wildcard queries necessarily perform worse than others; I haven't done enough testing to make that sort of claim. What I am saying is that if you are going to write custom code for Sitecore that includes Lucene.net wildcard queries, you should, before you write that amazing query, be aware of what that query will look like to the server you're working with.
In my case I had a fairly innocent query that selected every item in a set between two dates. Normally this doesn't sound like a big deal, but in this case it was. When the query was processed, the range was actually broken down into specific dates and an 'OR' clause was appended for each one. So what looked to me like a single query with a range from September 1st to September 30th of the same year looked to Lucene like a query with a clause for each day between September 1st and September 30th. Now that's not the end of the world by any means; Lucene does some fantastic things to manage those queries and their terms and make them much less harsh for us. But what happens when you really have data in there, something like 5 years' worth of constantly updated items that you're pulling from? To put it bluntly, your query stands a chance of going over its limit for clauses per query, and you get a .NET error (a TooManyClauses exception).
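To make the arithmetic concrete, here is a rough Python sketch of what that expansion amounts to. It is only a model and not Lucene.net code: it assumes one clause per calendar day, as described above, while the real count depends on how many distinct terms are in your index. The function names and the years are made up for illustration.

```python
from datetime import date, timedelta

# Default cap on clauses in a single query (the 1024 mentioned in this post).
DEFAULT_MAX_CLAUSES = 1024


def expand_date_range(start, end):
    """Model a date range rewritten as one OR clause per day (inclusive)."""
    days = (end - start).days + 1
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days)]


def exceeds_limit(start, end, max_clauses=DEFAULT_MAX_CLAUSES):
    """True if the expanded range needs more clauses than the cap allows."""
    return len(expand_date_range(start, end)) > max_clauses


# Hypothetical ranges: one month versus five full years.
september = expand_date_range(date(2011, 9, 1), date(2011, 9, 30))
five_years = expand_date_range(date(2007, 1, 1), date(2011, 12, 31))

print(len(september))   # 30
print(len(five_years))  # 1826
```

A month of days sits comfortably under the cap, while five years of days is already well past 1024, and that is before any finer-grained terms (timestamps, for instance) are taken into account.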
So what do you do when faced with this nasty little mark on what is otherwise a pretty darn fantastic product? Don't worry, there are a few pretty simple answers to this problem.
If you absolutely have to have more clauses available to your queries, you can simply increase the maximum number of clauses allowed in a single query. To do that, call the static BooleanQuery.SetMaxClauseCount method with a number higher than the default of 1024.
Here's what that looks like (4096 is only an example value): BooleanQuery.SetMaxClauseCount(4096);
The sort of ugly side to this solution, however, is that you risk putting a heavy CPU load on your server with the kind of queries that take this many clauses to execute. And I mean heavy. In my case, when the clauses were maxed out we were seeing CPU loads in the 70-80% range consistently.
Which brings me to my next possible solution: revise what you're pulling in one query. I can't stress this enough; I also can't give a specific code example for this. What I can say is that after revising the query at hand and talking to some key players, it turned out that the query of my nightmares really didn't need to pull nearly as much information. I was able to reduce the number of clauses created for that wildcard query, and other controls using the same sort of methodology didn't need to be as filtered. Yes, you read that right. You can actually solve some issues with your queries by filtering less! Look at it this way: you already know you're going to be pulling a large stack of items and you're going to have to do something with them. You might even have to do some work with each item individually. So why not take advantage of that fact, simplify those beastly queries, and use Sitecore's fantastic built-in functions to handle some of that chaff a little more efficiently?
If you'd like to get more familiar with Sitecore search and the Advanced Database Crawler, I would definitely recommend going to Alex Shyba's blog. He's done a lot of work on the product and has produced some amazing results.