I recently looked into the CHECKSUM and CHECKSUM_AGG functions in T-SQL and found them really useful. However, I was skeptical, because there is a chance of these functions returning the same value for non-identical inputs. A quick look at the forums turned up more than one unhappy poster writing about their experience with these functions.
I am designing a large database (a warehouse) and found these functions tempting to use for the following:
using CHECKSUM for
- indexing long character fields
- multiple columns of the same table that would be involved in a join, using the new checksum field instead
using CHECKSUM_AGG for
- I bulk copy flat-file source data into a character field of a table, and to ensure that I am not loading the same file multiple times, I plan to use CHECKSUM_AGG( CHECKSUM( [FlatFileRecord] ) ) and verify that no two loads have the same output.
Can somebody tell me whether I can trust these methods for my purposes?
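For the first use, the usual pattern is a computed checksum column that acts as a narrow, indexable stand-in for the long field. The sketch below uses made-up table and column names, and note that the checksum only narrows the search; because CHECKSUM can collide, the real column still has to be compared.

-- Hypothetical table with a long character field that is too wide to index comfortably
CREATE TABLE dbo.Document
(
    DocumentId  INT IDENTITY(1,1) PRIMARY KEY,
    LongText    VARCHAR(4000) NOT NULL,
    LongTextCS  AS CHECKSUM(LongText)   -- computed 4-byte checksum, used only to narrow searches
);

CREATE INDEX IX_Document_LongTextCS ON dbo.Document (LongTextCS);

-- Lookup: seek on the checksum first, then confirm against the real value,
-- because different strings can produce the same checksum.
DECLARE @Search VARCHAR(4000);
SET @Search = 'some long value';

SELECT d.DocumentId
FROM dbo.Document AS d
WHERE d.LongTextCS = CHECKSUM(@Search)
  AND d.LongText   = @Search;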
Gentlemen,

I am using the following query to get a list of grouped checksum data:

SELECT CAST(Field0_datetime AS INT),
       CHECKSUM_AGG(BINARY_CHECKSUM(Field1_bigint, Field2_datetime, Field3_datetime,
                                    Field4_bigint, Field5_bigint,
                                    CAST(Field6_float AS DECIMAL(38,6)), Field7_datetime))
FROM Table1
WHERE Field0_datetime BETWEEN '2003-01-01' AND '2003-01-20'
GROUP BY CAST(Field0_datetime AS INT)

Please notice the filter used: from January 1 to January 20. That query takes about 6 minutes to return the data; the result is 18 records. However, when I execute the same query filtering BETWEEN '2003-01-01' AND '2003-01-10', it takes only 1 second to return data. When I execute the query filtering BETWEEN '2003-01-10' AND '2003-01-20', it again takes only 1 second. So why 6 minutes to process them together?

The table has an index on Field0_datetime. It contains about 1.5 million records in total, using around 1.7 GB of disk space, indexes included. Between 2003-01-01 and 2003-01-20 there are 11,401 records selected, which doesn't look like that much.

The situation is repeatable; if I execute the queries again and again, they take about the same amount of time, so I don't think this problem is related to caching or anything like that.

I would appreciate any advice about what might be wrong with my situation.

Thanks a lot and kind regards,
Orly Junior
IT Professional
Hi,

I can see that by using the object ID rather than the object name, the following SQL query works. Has anybody got any idea what is causing the error?

-- Works OK
select o.id, checksum_agg(binary_checksum(m.text))
from sysobjects o, syscomments m
where o.id = m.id
and o.xtype in ('FN','IF','P','TF','TR','V')
group by o.id

-- Error
-- Server: Msg 1540, Level 16, State 1, Line 1
-- Cannot sort a row of size 8096, which is greater than the
-- allowable maximum of 8094.
select object_name(o.id), checksum_agg(binary_checksum(m.text))
from sysobjects o, syscomments m
where o.id = m.id
and o.xtype in ('FN','IF','P','TF','TR','V')
group by object_name(o.id)

-- Error
-- Server: Msg 1540, Level 16, State 1, Line 1
-- Cannot sort a row of size 8096, which is greater than the
-- allowable maximum of 8094.
select o.name, checksum_agg(binary_checksum(m.text))
from sysobjects o, syscomments m
where o.id = m.id
and o.xtype in ('FN','IF','P','TF','TR','V')
group by o.name

-- Workaround
select getdate(), object_name(x.id), check_sum
from (select m.id, checksum_agg(binary_checksum(m.text)) as check_sum
      from syscomments m
      inner join sysobjects o on m.id = o.id
      where o.xtype in ('FN','IF','P','TF','TR','V')
      group by m.id) as x

Regards
Liam
Can anyone provide me with the syntax for comparing rows of two tables using BINARY_CHECKSUM? Tables A and B have 8 and 9 columns respectively. The PK in both cases is Col1 and Col2. I want the checksum on columns 1 to 8.
select t.*
from test t
join test2 t2 on t2.id = t.id
where CHECKSUM(t.col2, t.col3) <> CHECKSUM(t2.col2, t2.col3)
-- The purpose of the above script is to check for any updates in the two tables. It returns two rows. But as you can see, both these rows were present in the table before. So I modify the script to:

-- SCRIPT B
select t.*
from test t
join test2 t2 on t2.col2 = t.col2
where CHECKSUM(t.col3) <> CHECKSUM(t2.col3)
-- In this case no row is returned. This is exactly what I need. The problem: now execute the script below.
TRUNCATE TABLE TEST
TRUNCATE TABLE TEST2
insert test values(4,4,'d','02/06/2004')
insert test values(4,4,'d','02/01/2004')
-- Now when I execute script B, two rows are returned, which is not what I want. Since the rows are identical, no row should be returned. So depending on which column changes (col2 or col3), I have to alter the script. I seek advice on the method to calculate the checksum. Again, the PK is ID and Col1 only.
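A hedged sketch of what that might look like, with the join on the full key (ID, Col1) and the checksum limited to the non-key columns. BINARY_CHECKSUM can still collide, so a differing value reliably means "changed", while an equal value only means "probably unchanged".

-- Compare only the non-key columns, joined on the complete primary key
SELECT t.*
FROM   test  AS t
JOIN   test2 AS t2
       ON  t2.ID   = t.ID
       AND t2.Col1 = t.Col1
WHERE  BINARY_CHECKSUM(t.Col2, t.Col3) <> BINARY_CHECKSUM(t2.Col2, t2.Col3)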
I'm developing a stored procedure to run an update based on values entered into a .NET web form. I want to capture the checksum of the row when it is displayed on the form, then validate that when the update is exec'd. Simple enough logic, eh? The problem is that when I try to use the checksum(*) function, SQL Server yells at me and says that it isn't recognized. I'm using SQL Server 7, so wtf? I am not the admin of the server, and I'm skirting around SQL Server Enterprise Manager and using any free utils, MS Access, and Visual Studio to maintain this db.

Thanks
Alex Jamrozek
Hi,

I'd like advice about an idea I had to resolve a problem. Thank you in advance for your answers.

I have a database with tables that I load from flat files. The size of each table is 600 MB. The flat files are an image of an application, and there is no updated date or created date on any table, so my tables are just a copy of the data from the flat files.

Now I'd like to create a history table, so I have to determine which lines changed and which ones didn't. As I don't have any date on my rows, the only answer I had until now was to check each column of each row to see whether any data changed; if the data changed, I add a new line to my history table.

My idea is to add a checksum column over all columns to both tables. To know whether any data changed, I just have to check my PK plus my checksum column.

Do you think that is a good idea? Is checksum a quick function or not?

Thanks.
--K
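A rough sketch of the idea, with invented table and column names, and with the caveat that an equal checksum does not strictly prove a row is unchanged, because collisions are possible:

-- Checksum over the payload columns on both the fresh load and the previous snapshot
ALTER TABLE dbo.CurrentLoad  ADD RowCS AS BINARY_CHECKSUM(Col1, Col2, Col3);   -- list every payload column
ALTER TABLE dbo.PreviousLoad ADD RowCS AS BINARY_CHECKSUM(Col1, Col2, Col3);

-- New or changed rows: key missing from the previous load, or same key with a different checksum
SELECT c.*
FROM   dbo.CurrentLoad AS c
LEFT JOIN dbo.PreviousLoad AS p
       ON p.PK = c.PK
WHERE  p.PK IS NULL
   OR  p.RowCS <> c.RowCS;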
I heard that, with page checksum enabled, errors that occur are reported in the log. That's good.
Currently we have DBCC CHECKDB WITH PHYSICAL_ONLY run on its own, and DBCC CHECKTABLE run on groups of tables on different days. A suggestion from a colleague is to turn off DBCC CHECKTABLE, run it only when a checksum reports an error, and continue running CHECKDB WITH PHYSICAL_ONLY as before.
Is this suggestion a best practice? Please also write a few lines to say why it is wrong or right.
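For reference, the two checks being weighed against each other look roughly like this (database and table names are placeholders):

-- Lightweight pass: physical/page-level consistency only
DBCC CHECKDB (N'MyWarehouse') WITH PHYSICAL_ONLY;

-- Deeper logical check, typically rotated across groups of tables on different days
DBCC CHECKTABLE (N'dbo.FactSales');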
It sounds like a column can be added to each row in a table that is the CHECKSUM or BINARY_CHECKSUM of an expression. How many bytes does each of these occupy? Does the answer depend on the number and/or length of items in the expression?
I am using checksum in my ETL process. For this I have a checksum field that calculates a value over the columns in my table; the column is a computed column, and it has a property for persistence.
What decision should I take: should I make it persisted or not? What is the industry standard?
Can you please explain how this property would affect the behaviour of the column?
Will this property affect anything like indexes? Please let me know what step I should take: should I make the column persisted or not?
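As an illustration of the two variants (table and column names are made up): persisting stores the 4-byte INT with the row and recomputes it only when a referenced column changes, at the cost of a little extra storage and write work, while a non-persisted column is recomputed whenever it is read. Either form can be indexed, since CHECKSUM is deterministic.

-- Non-persisted computed checksum column
ALTER TABLE dbo.StageRow
    ADD RowCS AS CHECKSUM(Col1, Col2, Col3);

-- Persisted variant: the value is stored and maintained with the row
ALTER TABLE dbo.StageRow
    ADD RowCS_Persisted AS CHECKSUM(Col1, Col2, Col3) PERSISTED;

-- Either can be indexed for fast delta lookups
CREATE INDEX IX_StageRow_RowCS ON dbo.StageRow (RowCS_Persisted);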
I have two tables, src_monthly_terrrier and src_weekly_terrier. Both of these tables consist of 10+ columns. As the table names probably suggest, I import weekly data into one and monthly data into the other.
All the source data comes from an Excel spreadsheet via a straight Import Data procedure. The only guaranteed change on a weekly and monthly basis is that one of the columns in each table, named src_date, will obviously hold the date value for whichever month's or week's data it relates to.
I understand that through SQL Server Business Intelligence Development Studio I can create an Integration Services package that will import the spreadsheet details for me. I might be going the long way around this, but my intention was to bring in all the data and then run a couple of 'INSERT INTO' stored procedures.
My biggest issue / vulnerability is that there is no error checking of the data on the way in to ensure that it has not already been imported. What I was thinking I could do to resolve this was to create a checksum field comprising a number of different columns (including src_date), and then somehow write something that will look at the values of each row intended for import, work out whether a duplicate checksum exists in the target table, reject the import routine as 'Duplicate Data Found' (or something similar), and move on to the next stored procedure.
My problem is twofold: one, I have no idea how to create said checksum, and two, I have no idea where to begin on coding a procedure that looks to see whether the value already exists, etc.
I have looked up checksum creation on the net and there appear to be plenty of resources explaining how to create one, so I guess my main question is: where do I start when it comes to writing some code that will do the check of the checksum before the import routine begins (or at least before the INSERT INTO procedures)?
I would truly appreciate anyone's help on this. In the meanwhile I am off to learn how to create them.
I would like to add, if anyone sees this as a bad idea, then please speak up.
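A very rough sketch of the duplicate check, with partly invented names (stg_weekly_terrier is a hypothetical staging table the spreadsheet is imported into first), and with the usual caveat that a matching checksum should really be confirmed against the actual columns, since CHECKSUM can collide:

-- Inside the INSERT INTO stored procedure: bail out if any staged row's checksum
-- already exists in the target table.
IF EXISTS
(
    SELECT 1
    FROM   dbo.stg_weekly_terrier AS s
    JOIN   dbo.src_weekly_terrier AS t
           ON CHECKSUM(t.src_date, t.Col2, t.Col3) = CHECKSUM(s.src_date, s.Col2, s.Col3)
)
BEGIN
    RAISERROR('Duplicate data found - this extract appears to have been loaded already.', 16, 1);
    RETURN;
END

INSERT INTO dbo.src_weekly_terrier (src_date, Col2, Col3)
SELECT s.src_date, s.Col2, s.Col3
FROM   dbo.stg_weekly_terrier AS s;

-- In practice the checksum on the target side would usually live in a persisted,
-- indexed computed column rather than being recalculated in the join.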
I need to generate a hash of text values for my app. I can generate hash values for normal fields using the CHECKSUM and BINARY_CHECKSUM functions, but they do not support checksums of text, ntext, image, and cursor, as well as sql_variant.
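One alternative, assuming SQL Server 2005 or later, is HASHBYTES, which accepts character and binary input (on older versions the input is capped at 8,000 bytes, so very long values may need to be hashed in pieces); text/ntext columns would need a cast to varchar(max)/nvarchar(max) first:

-- Hash a long character value with two common algorithms
DECLARE @v VARCHAR(MAX);
SET @v = 'some long text value';

SELECT HASHBYTES('MD5',  @v) AS md5_hash,
       HASHBYTES('SHA1', @v) AS sha1_hash;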
Looking for some clarification on the CHECKSUM option of the BACKUP command.
If the CHECKSUM option is specified in the backup, will the backup fail if CHECKSUM finds bad values (or at least raise an error)? Or is it only reported when doing a RESTORE VERIFYONLY?
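For reference, the relevant syntax looks like this (paths and names are placeholders); as far as I know the default behaviour with CHECKSUM is STOP_ON_ERROR, so the backup fails on the first bad page checksum unless CONTINUE_AFTER_ERROR is also specified:

-- Validate page checksums while backing up; stop (fail) on the first error
BACKUP DATABASE MyDb
TO DISK = N'D:\Backups\MyDb.bak'
WITH CHECKSUM;

-- Same, but keep going and just report damaged pages
BACKUP DATABASE MyDb
TO DISK = N'D:\Backups\MyDb.bak'
WITH CHECKSUM, CONTINUE_AFTER_ERROR;

-- Re-check the checksums in an existing backup without restoring it
RESTORE VERIFYONLY
FROM DISK = N'D:\Backups\MyDb.bak'
WITH CHECKSUM;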
The discussion here http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70328 got me thinking about how Microsoft really calculates the checksum value.
This code is 100% compatible with the MS original; that is, the result is identical. You can use it as is, or you can use it to see that the MS function does not produce the unique values one might expect.
With text/varchar/image data, call with:

SELECT BINARY_CHECKSUM('abcdefghijklmnop'), dbo.fnPesoBinaryChecksum('abcdefghijklmnop')

With integer data, call with:

SELECT BINARY_CHECKSUM(123), dbo.fnPesoBinaryChecksum(CAST(123 AS VARBINARY))

I haven't figured out how to calculate the checksum for integers greater than 255 yet.

CREATE FUNCTION dbo.fnPesoBinaryChecksum ( @Data IMAGE ) RETURNS INT AS
BEGIN
    DECLARE @Index INT, @MaxIndex INT, @SUM BIGINT, @Overflow TINYINT
IF @SUM > 2147483647 SELECT @SUM = @SUM - 4294967296
    RETURN @SUM
END

Actually this is an improvement on the MS function, since it accepts TEXT and IMAGE data.

CREATE FUNCTION dbo.fnPesoTextChecksum ( @Data TEXT ) RETURNS INT AS
BEGIN
    DECLARE @Index INT, @MaxIndex INT, @SUM BIGINT, @Overflow TINYINT
We are using BINARY_CHECKSUM in some of our INSTEAD OF UPDATE triggers. The problem came to light when an update failed without raising any error. After some research we found that the checksum returns the same number for two different inputs, and that's why the update failed.
We are using the following type of logic inside the trigger.
For detecting delta records, I'm a big fan of SQLIS's checksum transform. I'm having difficulty with its install on my current machine, however. After the installation, with the new transform added to my Data Flow toolbox, I can't open the UI for the transform to define the checksum. Instead, I get the following error:
===================================
Could not load file or assembly 'Microsoft.ExceptionMessageBox, Version=9.0.242.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91' or one of its dependencies. The system cannot find the file specified. (Microsoft Visual Studio)
------------------------------ Program Location:
at Konesans.Dts.Pipeline.ChecksumTransform.ChecksumTransformUI.Edit(IWin32Window parentWindow, Variables variables, Connections connections)
at Microsoft.DataTransformationServices.Design.DtsComponentDesigner.StartComponentUI(Boolean startGenericUI)
I am using the Konesans Checksum transformation ( http://www.sqlis.com/21.aspx ) to detect changes in my big (many columns, type 2 SCD) dimensional table.
But I am running into collisions.
The checksum transformation sometimes misses a small change in the record, for instance when a certain flag is set or unset. Is there a more robust checksum generator? Or any other suggestions on how to solve this?
The current setting of this option can be determined by examining the page_verify_option column in the sys.databases catalog view or the IsTornPageDetectionEnabled property of the DATABASEPROPERTYEX function.
However, there is no column named page_verify_option in the sys.databases view, and DATABASEPROPERTYEX('IsTornPageDetectionEnabled') does not discriminate between the CHECKSUM and NONE settings (it returns 0 for both)!
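For what it's worth, on the SQL Server 2005+ builds I can check, sys.databases does expose both a numeric and a descriptive column for this setting, so the following may be worth trying:

-- Readable page-verify setting per database
SELECT name,
       page_verify_option,        -- 0 = NONE, 1 = TORN_PAGE_DETECTION, 2 = CHECKSUM
       page_verify_option_desc
FROM   sys.databases;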
From what I've seen, the CHECKSUM_AGG function appears to return 0 for an even number of repeated values. If so, then what is the practical use of this function for implementing an aggregate checksum across a set of values?
For example, the following works as expected; it returns a non-zero checksum across (1) value or across (2) unequal values.
declare @t table ( ID int );
insert into @t ( ID ) values (-7077);
select checksum_agg( ID ) from @t;
-----------
-7077

declare @t table ( ID int );
insert into @t ( ID ) values (-7077), (-8112);
select checksum_agg( ID ) from @t;
-----------
1035
However, the function appears to return 0 for an even number of repeated values.
declare @t table ( ID int );
insert into @t ( ID ) values (-7077), (-7077);
select checksum_agg( ID ) from @t;
-----------
0
It's not specific to -7077, for example:
declare @t table ( ID int );
insert into @t ( ID ) values (-997777), (-997777);
select checksum_agg( ID ) from @t;
-----------
0
What's curious is that (3) repeated equal values will return a non-zero checksum.
declare @t table ( ID int );
insert into @t ( ID ) values (-997777), (-997777), (-997777);
select checksum_agg( ID ) from @t;
-----------
-997777
But a set of (4) repeated equal values will return 0 again.
declare @t table ( ID int );
insert into @t ( ID ) values (-997777), (-997777), (-997777), (-997777);
select checksum_agg( ID ) from @t;
-----------
0
Finally, a set of (2) unequal values repeated twice will return 0 again.
declare @t table ( ID int );
insert into @t ( ID ) values (-997777), (8112), (-997777), (8112);
select checksum_agg( ID ) from @t;
-----------
0
I'm trying to load data from an old SQL Server 2000 instance into a new SQL Server 2014 instance. I need to do a checksum to check that all the source data has been loaded into the target database (SQL Server 2014). I've created the insert statement for this, which works. I now need to use a checksum to make sure all the source rows are loaded into the target table. I haven't done checksums before.
Here is my insert statement:
INSERT INTO [Test].[dbo].[Order_tab] ([rec_id] ,[date_loaded] ,[Name1] ,[Name2] ,[Address1] ,[Address2]
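One hedged way to verify the load afterwards, assuming a linked server to the old SQL Server 2000 instance (the linked-server name OLD2000 and the source database name are placeholders), is to compare row counts and an aggregate checksum on both sides:

-- Source side, via the linked server
SELECT COUNT(*)                         AS row_cnt,
       CHECKSUM_AGG(BINARY_CHECKSUM(*)) AS table_cs
FROM   OLD2000.SourceDb.dbo.Order_tab;

-- Target side
SELECT COUNT(*)                         AS row_cnt,
       CHECKSUM_AGG(BINARY_CHECKSUM(*)) AS table_cs
FROM   [Test].[dbo].[Order_tab];

-- Matching counts and checksums suggest, but (because of possible collisions) do not prove,
-- that the two tables hold the same data.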