Haunted By Inconsistencies
Aug 9, 2005

Our client's database is hosted by a third party in a clustered SQL Server 2000 environment. Things had been running swimmingly for 18 months when, in January, some inconsistencies started to appear. The first symptoms were three records in one of the tables that looked corrupted; i.e. they contained dates in the years 2079 and 6619! We simply removed the data that had been bulk loaded that day, re-bcp'd it, and were able to continue without issue.
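In concrete terms, that delete-and-reload was something along these lines (a sketch only: the table, the load-date column, the file name, and the bcp switches are illustrative rather than our exact commands):

-- Remove the day's suspect rows (LoadDate is a hypothetical column name)
DELETE FROM Recvs WHERE LoadDate = '20050114'
-- Then, from the command line, re-import the day's extract in native
-- format over a trusted connection, e.g.:
--   bcp dbCresta..Recvs in recvs_extract.dat -S OURSERVER -T -n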
We never quite got to the bottom of the exact cause of this corruption and were happy to leave it be until, on 30 June, a DBCC CHECKDB run after an application failure gave the following (trimmed) results:
DBCC results for 'DealPeriodParmsFlexShadow'.
There are 0 rows in 1 pages for object 'DealPeriodParmsFlexShadow'.
DBCC results for 'DealPeriodParmsShadow'.
Server: Msg 8928, Level 16, State 1, Line 1
Object ID 1383012008, index ID 0: Page (1:128399) could not be processed. See other errors for details.
There are 127575 rows in 1057 pages for object 'DealPeriodParmsShadow'.
DBCC results for 'recvspre17'.
There are 97593 rows in 3367 pages for object 'recvspre17'.
DBCC results for 'DealPeriodParmsSubShadow'.
There are 0 rows in 1 pages for object 'DealPeriodParmsSubShadow'.
DBCC results for 'System'.
There are 22 rows in 1 pages for object 'System'.
DBCC results for 'Recvs'.
There are 4980591 rows in 185222 pages for object 'Recvs'.
CHECKDB found 0 allocation errors and 1 consistency errors in table 'Recvs' (object ID 1383012008).
DBCC results for 'DealPeriodsShadow'.
There are 1182 rows in 32 pages for object 'DealPeriodsShadow'.
DBCC results for 'DealsShadow'.
There are 0 rows in 1 pages for object 'DealsShadow'.
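For reference, the DBCC runs quoted throughout this post are plain CHECKDB calls of the form below; the ALL_ERRORMSGS variant (shown second, as an aside) is useful when errors pile up, since SQL Server 2000 otherwise caps reporting at 200 errors per object:

-- Full consistency check of the database
DBCC CHECKDB ('dbCresta')
-- Optional: lift the default per-object cap on reported errors
DBCC CHECKDB ('dbCresta') WITH ALL_ERRORMSGS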
Running DBCC CHECKDB a second time seemed to imply that the issue was fixed, but the application crashed again within 45 minutes. Manually running the SP that was executing at the time of the crash resulted in the following message:
Server: Msg 21, Level 21, State 1, Procedure sp_FlagGalpIneligibleOverdueObligors, Line 109
Warning: Fatal error 644 occurred at Jun 30 2005 4:23PM
Running the same SP again immediately resulted in a new message:
Server: Msg 21, Level 20, State 1, Procedure sp_FlagGalpIneligibleOverdueObligors, Line 109
Warning: Fatal error 3624 occurred at Jun 30 2005 4:31PM
Location: scanrid.cpp:321
Expression: m_len != 0
SPID: 52
Process ID: 3756
A DBCC CHECKDB then reported "CHECKDB found 0 allocation errors and 68 consistency errors in database", with messages such as:
Server: Msg 8952, Level 16, State 1, Line 1
Table error: Database 'dbCresta', index 'Recvs.XIEDealPeriodTypeStatus' (ID 1383012008) (index ID 2). Extra or invalid key for the keys:
Server: Msg 8956, Level 16, State 1, Line 1
Index row (1:245:193) with values (DealId = 2 and PeriodId = 469 and Type = 'R' and Status = 'I') points to the data row identified by (RID = (1:663407:10)).
Server: Msg 8952, Level 16, State 1, Line 1
Table error: Database 'dbCresta', index 'Recvs.XIEDealPeriodTypeStatus' (ID 1383012008) (index ID 2). Extra or invalid key for the keys:
Server: Msg 8956, Level 16, State 1, Line 1
Index row (1:245:194) with values (DealId = 2 and PeriodId = 469 and Type = 'R' and Status = 'I') points to the data row identified by (RID = (1:663407:11)).
Server: Msg 8952, Level 16, State 1, Line 1
Table error: Database 'dbCresta', index 'Recvs.XIEDealPeriodTypeStatus' (ID 1383012008) (index ID 2). Extra or invalid key for the keys:
Server: Msg 8956, Level 16, State 1, Line 1
Index row (1:245:195) with values (DealId = 2 and PeriodId = 469 and Type = 'R' and Status = 'I') points to the data row identified by (RID = (1:663407:12)).
Server: Msg 8952, Level 16, State 1, Line 1
Table error: Database 'dbCresta', index 'Recvs.XIEDealPeriodTypeStatus' (ID 1383012008) (index ID 2). Extra or invalid key for the keys:
Server: Msg 8956, Level 16, State 1, Line 1
Index row (1:245:196) with values (DealId = 2 and PeriodId = 469 and Type = 'R' and Status = 'I') points to the data row identified by (RID = (1:663407:16)).
Server: Msg 8952, Level 16, State 1, Line 1
Table error: Database 'dbCresta', index 'Recvs.XIEDealPeriodTypeStatus' (ID 1383012008) (index ID 2). Extra or invalid key for the keys:
Server: Msg 8956, Level 16, State 1, Line 1
Index row (1:245:197) with values (DealId = 2 and PeriodId = 469 and Type = 'R' and Status = 'I') points to the data row identified by (RID = (1:663407:17)).
Server: Msg 8952, Level 16, State 1, Line 1
Table error: Database 'dbCresta', index 'Recvs.XIEDealPeriodTypeStatus' (ID 1383012008) (index ID 2). Extra or invalid key for the keys:
{These two messages appeared 42 consecutive times, then came some standard DBCC result messages, and then...}
Server: Msg 8952, Level 16, State 1, Line 1
Table error: Database 'dbCresta', index 'Recvs.XIEDealPeriodTypeStatus' (ID 1383012008) (index ID 2). Extra or invalid key for the keys:
Server: Msg 8956, Level 16, State 1, Line 1
Index row (1:571866:69) with values (DealId = 2 and PeriodId = 469 and Type = 'R' and Status = 'I') points to the data row identified by (RID = (1:663445:25)).
Server: Msg 8952, Level 16, State 1, Line 1
Table error: Database 'dbCresta', index 'Recvs.XIEDealPeriodTypeStatus' (ID 1383012008) (index ID 2). Extra or invalid key for the keys:
Server: Msg 8956, Level 16, State 1, Line 1
Index row (1:571867:244) with values (DealId = 2 and PeriodId = 469 and Type = 'R' and Status = 'I') points to the data row identified by (RID = (1:663382:1)).
Server: Msg 8952, Level 16, State 1, Line 1
Table error: Database 'dbCresta', index 'Recvs.XIEDealPeriodTypeStatus' (ID 1383012008) (index ID 2). Extra or invalid key for the keys:
{a further 26 consecutive times.}
Running DBCC a second time reported 0 inconsistencies. This was becoming a pattern: DBCC would show problems on the first attempt and come back clean on immediate subsequent attempts. The third party hosting the database insists there are no hard drive issues (we pointed them to a number of forums suggesting that this might be the cause). They use a RAID 5 configuration and also insist that no configuration changes have taken place in the last year.
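Incidentally, since Msg 8956 shows index rows pointing at heap RIDs, one way to see which side of the mismatch is wrong is to run the same query forced down the suspect index and then forced down the heap, and compare the counts. A sketch, using the index name and key values from the output above (the exact key column list is an inference on my part):

-- Count via the suspect non-clustered index
SELECT COUNT(*) FROM Recvs WITH (INDEX(XIEDealPeriodTypeStatus))
WHERE DealId = 2 AND PeriodId = 469 AND Type = 'R' AND Status = 'I'
-- Count via a forced scan of the underlying heap (index id 0)
SELECT COUNT(*) FROM Recvs WITH (INDEX(0))
WHERE DealId = 2 AND PeriodId = 469 AND Type = 'R' AND Status = 'I'
-- Differing counts mean the index no longer agrees with the data pages
-- (a damaged index may instead raise error 644 or 3624, which is telling in itself)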
That night a job was run to rebuild all of the indexes in the database. The following morning, the very first thing I did was run DBCC CHECKDB, with the following (trimmed) results:
...
Table error: Table 'Recvs' (ID 1383012008). Missing or invalid key in index 'XIEDealPeriodTypeStatus' (ID 2) for the row:
Server: Msg 8955, Level 16, State 1, Line 1
Data row (1:663382:1) identified by (RID = (1:663382:1) ) has index values (DealId = 2 and PeriodId = 469 and Type = 'R' and Status = 'E').
Server: Msg 8951, Level 16, State 1, Line 1
Table error: Table 'Recvs' (ID 1383012008). Missing or invalid key in index 'XIEDealPeriodTypeStatus' (ID 2) for the row:
Server: Msg 8955, Level 16, State 1, Line 1
Data row (1:663382:2) identified by (RID = (1:663382:2) ) has index values (DealId = 2 and PeriodId = 469 and Type = 'R' and Status = 'E').
Server: Msg 8951, Level 16, State 1, Line 1
Table error: Table 'Recvs' (ID 1383012008). Missing or invalid key in index 'XIEDealPeriodTypeStatus' (ID 2) for the row:
Server: Msg 8955, Level 16, State 1, Line 1
Data row (1:663382:3) identified by (RID = (1:663382:3) ) has index values (DealId = 2 and PeriodId = 469 and Type = 'R' and Status = 'E').
...
CHECKDB found 0 allocation errors and 3 consistency errors in database
{Running DBCC again immediately resulted in:}
...
DBCC results for 'DealPeriodParmsShadow'.
Server: Msg 8951, Level 16, State 1, Line 1
Table error: Table 'Recvs' (ID 1383012008). Missing or invalid key in index 'XIEDealPeriodTypeStatus' (ID 2) for the row:
Server: Msg 8955, Level 16, State 1, Line 1
Data row (1:663382:1) identified by (RID = (1:663382:1) ) has index values (DealId = 2 and PeriodId = 469 and Type = 'R' and Status = 'E').
Server: Msg 8951, Level 16, State 1, Line 1
Table error: Table 'Recvs' (ID 1383012008). Missing or invalid key in index 'XIEDealPeriodTypeStatus' (ID 2) for the row:
Server: Msg 8955, Level 16, State 1, Line 1
Data row (1:663382:2) identified by (RID = (1:663382:2) ) has index values (DealId = 2 and PeriodId = 469 and Type = 'R' and Status = 'E').
Server: Msg 8951, Level 16, State 1, Line 1
Table error: Table 'Recvs' (ID 1383012008). Missing or invalid key in index 'XIEDealPeriodTypeStatus' (ID 2) for the row:
Server: Msg 8955, Level 16, State 1, Line 1
Data row (1:663382:3) identified by (RID = (1:663382:3) ) has index values (DealId = 2 and PeriodId = 469 and Type = 'R' and Status = 'E').
There are 127803 rows in 1059 pages for object 'DealPeriodParmsShadow'.
...
CHECKDB found 0 allocation errors and 3 consistency errors in database
It should be noted that between the overnight job running and my executing DBCC in the morning, no other external database activity took place: our application had been closed and no other databases on the server were in use.
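For reference, a rebuild-everything job on SQL Server 2000 is typically the one-liner below. This is a hypothetical sketch, since we don't know the exact commands the overnight job used:

-- Rebuild all indexes on every user table; a fillfactor of 0 keeps each index's current setting
EXEC sp_MSforeachtable 'DBCC DBREINDEX (''?'', '''', 0)'

(sp_MSforeachtable substitutes each user table's name for the ? placeholder.)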
The third-party host subsequently executed DBCC CHECKDB with REPAIR_REBUILD, after which a standard DBCC CHECKDB came back clean. We later found that some of the data we had deleted, in an attempt to roll back to a state before the errors, had actually been resurrected by this activity. We then had to run many manual checks on the affected tables to verify the data was as expected.
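On SQL Server 2000 that repair requires exclusive access, so the sequence would have been roughly the following (again a sketch, not the host's verbatim commands):

-- Repairs require single-user mode; ROLLBACK IMMEDIATE disconnects other sessions
ALTER DATABASE dbCresta SET SINGLE_USER WITH ROLLBACK IMMEDIATE
-- REPAIR_REBUILD performs the repairs that carry no risk of data loss, rebuilding indexes as needed
DBCC CHECKDB ('dbCresta', REPAIR_REBUILD)
ALTER DATABASE dbCresta SET MULTI_USER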
Over the weekend of 2 and 3 July, diagnostic tests were run against the hardware in question in order to rule out hardware issues. The system was given a clean bill of health.
On Monday 4 July the DBCC inconsistencies were back... They come and go, appearing on average once a week (sometimes two days in a row, sometimes not for two weeks). We have tried archiving data to make the tables smaller, removing the statistics (which, we discovered, were severely affecting performance anyway), and running nightly DBCC CHECKDBs, all to no avail. The Recvs table seems to be the main problem. The bizarre thing is that the indexes on this table are dropped and recreated daily, since we bcp our daily data in, so the indexes should always be 'fresh'. There are no clustered indexes on this table, simply four non-clustered ones.
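For completeness, that daily cycle on Recvs amounts to the following. The key column list is inferred from the DBCC messages above, the other three indexes are elided, and the bcp step is only indicated:

-- Drop the non-clustered indexes ahead of the daily bulk load
DROP INDEX Recvs.XIEDealPeriodTypeStatus
-- (bcp the day's extract into Recvs here)
-- Recreate afterwards, so each index is rebuilt from scratch off the heap
CREATE NONCLUSTERED INDEX XIEDealPeriodTypeStatus
    ON Recvs (DealId, PeriodId, Type, Status)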
The support work that results each time these DBCC inconsistencies crash the application is time we would much rather spend productively. We would greatly appreciate any assistance in getting to the root cause of the problem so that it can be addressed and the problems made to go away.