Making Sense of the SharePoint World


Knowing Your Limitations

Oct 12, 2009

"2.1 Billion IDs Should be Enough for Anybody!"

One of the more infamous stories about Bill Gates is that he once said "640K of memory should be enough for anyone." The story isn't true - he never said it - but it does point up the frustration that came from one of the design limits of the original IBM PC. The memory between 640K and 1MB (which was the physical limit of the CPU) was allocated by IBM for video, I/O buffers, and lots of other "housekeeping", and therefore couldn't be used by DOS for programs. This was fine at the time, when the typical computer came with 64K of RAM, and even expanding to 512K was a luxury. But when applications (like Lotus 1-2-3, dBase III, and even Windows itself) became complicated enough to require that memory, and more powerful CPUs became available that allowed access to even more, that big "gap" before getting to the extended memory required more effort to program around than anyone could have predicted. (Yes, that's way over-simplified, but it is enough to get the point across...)

The reason I bring up this little history lesson is to point out that when you are designing products, you have to set limits somewhere. Sometimes these limits are intrinsic, like the 1MB maximum RAM of the 8088 CPU. Others are compromises, like how much of that 1MB to allocate for system housekeeping, and where to locate it in the address space. You hope you set these high enough that most users will never see them, but they are there.

SharePoint also has a number of limits. Most of them are well documented. Some of them are "soft" limits - places where you see performance degradation. Others are "hard" limits, like the maximum size of an integer value. But some limits are buried under the covers, because they are internal to a function, and users never see the processes that are impacted. If they are set high enough, the users will never even know they exist.

Crawling Forward

Unfortunately, there is a limit that wasn't set high enough. This was buried deep inside the MOSS and MSS search databases. Most database tables have a field for a unique identifier. This is automatically incremented every time a new row is added. Typically, a SQL Server Integer (int) is used for this ID, allowing up to just over 2 billion items to be added (2,147,483,647 if you must know). That's a lot. But this value just goes up - it isn't decremented if you delete a row.
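To make the "it only goes up" behavior concrete, here is a minimal sketch in Python. It is a toy model, not SharePoint or SQL Server code: the `IdentityTable` class and its methods are hypothetical names invented for illustration, and the only facts taken from above are the 32-bit signed limit and the rule that deleting a row does not give its ID back.

```python
# Toy model of an auto-incrementing ID column.
# IdentityTable is a hypothetical name for illustration only;
# it is not part of SharePoint or SQL Server.
INT_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


class IdentityTable:
    """A table whose ID counter only ever moves forward."""

    def __init__(self, max_id=INT_MAX):
        self.max_id = max_id
        self.last_id = 0   # last ID handed out
        self.rows = {}     # id -> payload

    def insert(self, payload):
        if self.last_id >= self.max_id:
            raise OverflowError("no IDs left in this column")
        self.last_id += 1
        self.rows[self.last_id] = payload
        return self.last_id

    def delete(self, row_id):
        # Removing a row frees the storage but not the ID.
        del self.rows[row_id]


table = IdentityTable()
first = table.insert("link A")
table.delete(first)
second = table.insert("link B")

print(INT_MAX)          # 2147483647
print(first, second)    # 1 2
print(len(table.rows))  # 1
```

Even though the table holds only one row at the end, two IDs have been used up. That is the whole problem in miniature.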

In the SharePoint Search DB, there is a table that keeps track of all of the links in your crawled content. Whenever you do a new crawl, rows are added to and deleted from this table. This table originally used the int referenced above for its ID field. Now, there can be a lot of links in a SharePoint site, but still, 2.1 billion should take an awfully long time to reach, and in most cases it does. But reach it you can. For very large and complicated sites, if you do a full crawl every day (which deletes and replaces all of the link references) you can reach it faster than you might (and the developers did) think.
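For a rough feel of how quickly that can happen, here is a back-of-the-envelope sketch in Python. It assumes, as a simplification, that every full crawl consumes one new ID per link; the function name and the 10 million link figure are made up for illustration and are not measurements from any real farm.

```python
# Back-of-the-envelope estimate; all inputs are hypothetical.
INT_MAX = 2**31 - 1  # 2,147,483,647


def days_until_exhaustion(links_per_full_crawl, full_crawls_per_day=1,
                          ids_already_used=0):
    """Whole days until a 32-bit ID column runs out, assuming each
    full crawl burns one new ID per link (a simplifying assumption)."""
    ids_per_day = links_per_full_crawl * full_crawls_per_day
    if ids_per_day <= 0:
        raise ValueError("ids_per_day must be positive")
    remaining = INT_MAX - ids_already_used
    return remaining // ids_per_day


# A made-up farm with 10 million links and one full crawl per day:
print(days_until_exhaustion(10_000_000))  # 214
```

Under those assumptions the table runs dry in about seven months, which is why a daily full crawl on a large corpus is the scenario to watch.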

So, what happens if SharePoint actually hits this limit, and runs out of IDs? It isn't pretty. Essentially the crawling process gets stuck. It asks the database for permission to write the next available row, and since there isn't an ID that can be given to it, the database just says "no". Unfortunately, SharePoint doesn't take no for an answer, and keeps asking. You will, occasionally, see an error in the event log talking about a SQL Identity failure, but unless you were aware of this possibility, it wouldn't make much sense.

Recovering

This also prevents you from effectively controlling search. Because SharePoint insists on finishing the last thing it was doing, you can't stop the crawl. Because there isn't much to go on in the logs, and it takes some SQL Server proficiency to accurately diagnose the problem, many times, this results in folks rebuilding their SSP, with all of the pain and agony that entails, just for the want of an ID.

Note: At this point, you need to consider the search index on this SSP corrupt. There is nothing that can recover the ability to crawl new content without resetting your index and doing a full crawl as described in the prevention section below.

Even if you can successfully diagnose it, there are very few supportable solutions that *don't* involve rebuilding the SSP one way or another. Remember, directly modifying the SharePoint databases yourself can leave you in an unsupported state. So, if you reset the identity seed of the maxed-out table to 1 just to regain control of the crawl and stop it, you should then restore the search database from a backup to get back to a supportable production state before you reset the crawled content (see below), which returns the search database to an initialized state.

You can also restore your whole SSP from a backup, but that's almost as much fun as rebuilding it, and it assumes you have a restorable backup of your SSP.

An Ounce of Prevention

Obviously, it is much better to prevent this problem from occurring in the first place than to try recovering from it. There are a couple of ways to do this. The first and best is to upgrade your SharePoint environment to Service Pack 2. Among the many enhancements in SP2, the ID fields in the search databases that were prone to maxing out are updated to "big" integers. A bigint has twice as many bits as a regular int (64 instead of 32). That doesn't just double the capacity, though. It makes it roughly 4 billion times as large. (For those who really need to know, it makes the number of possible IDs 9,223,372,036,854,775,807!) So, if it took 6 months to reach the old limit, it would take roughly 24 billion months to reach the new one.
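If you want to check the arithmetic yourself, a few lines of Python will do it. This is just the math for signed 32-bit and 64-bit integers; the 6-month figure is the same illustrative number used above, not a measurement.

```python
# Compare the ID capacity of a signed 32-bit int and a signed 64-bit bigint.
INT_MAX = 2**31 - 1      # 2,147,483,647
BIGINT_MAX = 2**63 - 1   # 9,223,372,036,854,775,807

ratio = BIGINT_MAX / INT_MAX  # how many times larger the new limit is

print(f"{BIGINT_MAX:,}")      # 9,223,372,036,854,775,807
print(round(ratio / 1e9, 2))  # 4.29 (billion times larger)

# Illustrative: if the old limit took 6 months to hit, the new one takes
# this many billion months at the same rate.
months_to_old_limit = 6
print(round(months_to_old_limit * ratio / 1e9, 1))  # 25.8
```

The 24 billion in the text comes from rounding the ratio down to an even 4 billion; with the unrounded ratio it is closer to 26 billion months. Either way, you won't be around to see it.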

If you can't upgrade to SP2, you should consider adding a periodic reset of the index to your maintenance plans - especially if you have a very large corpus, with lots of links. The option to do this is available from the Quick Launch on the Search Administration page.

Resetting the crawled content doesn't affect your settings, keywords, best bets, and so on. But it does delete your existing index and completely reset the search crawl database - including the table ID fields. After the reset, search results will not be available until a full crawl is performed, so you should schedule this to take place during a down time and/or notify your users of the search outage. If you have multiple content sources defined, you will need to crawl all of them.

When you select reset, you will get a screen asking if you want to turn off search alerts during the reset. It will default to being selected, and you should leave it that way.

The alerts can be reactivated once your crawls have been completed.

Conclusion

As Clint Eastwood once said as Dirty Harry, "A man's got to know his limitations." Everyone, and every thing, has limits.

Limits are only a problem when you don't know about them, and don't take them into account. SharePoint, as powerful as it is, has plenty of them. In addition to the hidden limit I covered in today's article, you might want to review some of the more well known limits in the SharePoint planning material: Planning for Software Boundaries.

 
Posted by Woody Windischman | 4 Comments
Tags: Administration, Design, Search Server, SharePoint, Search

Comments

Tuesday, 13 Oct 2009 08:36 by Todd Klindt
Interesting find, Woody. I love obscure stuff like this. Helps keep me in business. :) tk

Wednesday, 14 Oct 2009 04:03 by Woody
Thanks, Todd!

Thursday, 15 Oct 2009 09:57 by Wahid Saleemi
Hopefully MOSS 2010 will be out before 24 billion months! Great article, very informative.
