SQL Server 2012 :: Data Compare To Identify Change
Mar 3, 2015
I am developing T-SQL code to identify changes in data.
I read about BINARY_CHECKSUM and HASHBYTES. Some people say HASHBYTES is better than BINARY_CHECKSUM because the chance of a collision is lower.
But collisions are still possible with HASHBYTES too. My question is: what is the best way to compare data to identify changes (I can't configure CDC)?
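One option that avoids hashing altogether is a column-by-column comparison with EXCEPT, which treats NULLs as equal and cannot produce collisions. The following is only a sketch: the tables @Source and @Target, their columns, and the key Id are invented for illustration.

```sql
-- Hypothetical tables; names and columns are assumptions for illustration.
DECLARE @Source TABLE (Id int PRIMARY KEY, Name varchar(50), Amount decimal(10,2));
DECLARE @Target TABLE (Id int PRIMARY KEY, Name varchar(50), Amount decimal(10,2));

INSERT @Source VALUES (1, 'Alpha', 10.00), (2, 'Beta', NULL), (3, 'Gamma', 30.00);
INSERT @Target VALUES (1, 'Alpha', 10.00), (2, 'Beta', NULL), (3, 'Gamma', 35.00);

-- Rows present in both tables whose non-key columns differ.
-- EXCEPT compares NULLs as equal, so Id 2 is not reported as changed.
SELECT s.Id
FROM @Source AS s
JOIN @Target AS t ON t.Id = s.Id
WHERE EXISTS (SELECT s.Name, s.Amount
              EXCEPT
              SELECT t.Name, t.Amount);
```

With this sample data only Id 3 comes back. The trade-off is that every compared column is read, whereas a stored hash lets you compare one value per row.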
Is there an efficient way to compare two different columns of two different rows in a data set as shown below?
For example, I would like the DATEDIFF between Date2 of RowID 1 and Date1 of RowID 2 for IDNo 123. If the difference between the two dates is <= 14, I want to set IsDateDiffLess14 of RowID 1 to 1, otherwise to 0. In the example below it is 0 because the difference between the two dates is more than 14. So I want to compare Date2 and Date1 in this sequence within the same IDNo. For RowID 6 there is only one row and no other row to compare with; in this case IsDateDiffLess14 should be set to 0.
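Since this is SQL Server 2012, LEAD can fetch Date1 of the next row within the same IDNo. The sketch below assumes a table shaped like the description (RowID, IDNo, Date1, Date2); the table name @Visits and the sample rows are made up because the original data is not shown.

```sql
-- Hypothetical sample data; the original data set is not available.
DECLARE @Visits TABLE (RowID int, IDNo int, Date1 date, Date2 date);
INSERT @Visits VALUES
    (1, 123, '2015-01-01', '2015-01-05'),
    (2, 123, '2015-02-01', '2015-02-03'),
    (3, 123, '2015-02-10', '2015-02-12'),
    (6, 456, '2015-03-01', '2015-03-02');

-- Compare Date2 of each row with Date1 of the next row for the same IDNo.
-- When there is no next row, LEAD returns NULL and the CASE falls through to 0.
SELECT RowID, IDNo, Date1, Date2,
       IsDateDiffLess14 =
           CASE WHEN DATEDIFF(DAY, Date2,
                              LEAD(Date1) OVER (PARTITION BY IDNo ORDER BY RowID)) <= 14
                THEN 1 ELSE 0 END
FROM @Visits;
```

Here RowID 2 gets 1 (7 days to the next Date1) and the other rows get 0. The same expression could feed an UPDATE once the result looks right.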
Our business gets orders throughout the week, with weekend (Fri & Sat) orders being higher than weekday orders. I want to graph this year's data against last year's, and possibly the years before, but compare days in such a way that all the weekdays line up: so comparing 2015 week 1 with 2014 week 1, but with 03/01/2015 (Sat) lining up with 04/01/2014 (Sat), etc.
I'm looking for alternatives to adding or removing days from the dates to solve this issue. I have a date dimension table for the past 5 years that I can use to compare calendar week 201401 with calendar week 201501, but I am finding it a bit inflexible.
ColA  ColB
----- -----
21    A
22    A
23    A
24    B
25    B
26    D
What I want is to be able to identify a set sequence (1,2,3) based upon ColB such that I'd get the following result:
ColA  ColB  ColC
----- ----- -----
21    A     1
22    A     1
23    A     1
24    B     2
25    B     2
26    D     3
I know that I should be able to get it using ROW_NUMBER() OVER (PARTITION BY ColB ORDER BY ColA), but instead of getting the sequence (1,1,1,2,2,3) I get (1,2,3,1,2,1). Using DENSE_RANK gave me the same results.
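The numbering restarts because PARTITION BY ColB makes each ColB value its own group. Ranking over the whole set ordered by ColB, with no PARTITION BY, should give the wanted sequence. A minimal sketch with the sample data (the table variable name @T is an assumption):

```sql
DECLARE @T TABLE (ColA int, ColB char(1));
INSERT @T VALUES (21, 'A'), (22, 'A'), (23, 'A'), (24, 'B'), (25, 'B'), (26, 'D');

-- No PARTITION BY: every distinct ColB value gets the next rank number.
SELECT ColA, ColB,
       ColC = DENSE_RANK() OVER (ORDER BY ColB)
FROM @T
ORDER BY ColA;
-- ColC comes back as 1, 1, 1, 2, 2, 3
```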
I want to identify rows that go negative, but only by 2 cents or more, as well as rows that are positive by 2 cents or more.
I have this expression that does not work how I want it to work:
CASE WHEN SUM(FavUnfavCostChange) < (2/100) THEN 'Less' WHEN SUM(FavUnfavCostChange) > (2/100) THEN 'More' ELSE NULL END AS 'Flag'
But I get:
0.00000815000000000000   More  -- this is not more than 2 cents, is just a positive number
-0.00094700000000000000  Less  -- this is not less than 2 cents, is just negative number
-0.00222000000000000000  Less  -- this is not less than 2 cents, is just negative number
-0.00012250000000000000  Less  -- this is not less than 2 cents, is just negative number
0.00000000000000000000   NULL  -- this is zero so null is fine
0.01188576000000000000   More  -- this is not more than 2 cents, is just a positive number
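The likely cause is integer division: in T-SQL, 2/100 divides two integers and yields 0, so both branches compare against zero. The sketch below uses a decimal literal and a negative threshold for the 'Less' branch. That reading of the requirement is an assumption, as are the table variable @Changes and the last two sample values.

```sql
-- Integer division truncates: the first column is 0, the second is 0.02.
SELECT 2/100 AS IntegerDivision, 2/100.0 AS DecimalDivision;

-- Hypothetical sample values, partly taken from the output above.
DECLARE @Changes TABLE (FavUnfavCostChange decimal(28, 20));
INSERT @Changes VALUES (0.00000815), (-0.000947), (0.01188576), (-0.025), (0.03);

SELECT FavUnfavCostChange,
       Flag = CASE WHEN FavUnfavCostChange <= -0.02 THEN 'Less'
                   WHEN FavUnfavCostChange >=  0.02 THEN 'More'
                   ELSE NULL END
FROM @Changes;
```

Only -0.025 and 0.03 are flagged with this sample. In the real query the same CASE would wrap SUM(FavUnfavCostChange).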
DECLARE @EffLevels TABLE (ChangePoint int, Value Int)
INSERT @EffLevels
SELECT '1000', '767' UNION ALL --Changed
SELECT '1000', '675' UNION ALL
SELECT '1001', '600' UNION ALL --Changed
SELECT '1001', '545' UNION ALL
SELECT '1001', '765' UNION ALL
SELECT '1000', '673' UNION ALL --Changed
SELECT '1002', '343' UNION ALL --Changed
SELECT '1002', '413' UNION ALL
SELECT '1002', '334' UNION ALL
SELECT '1001', '823' --Changed
-- My Result should be
-- ChangePoint  PrevChangePoint  Value
-- 1000         Null             767
-- 1001         1000             675
-- 1000         1001             765
-- 1002         1000             343
-- 1001         1002             823
I am using SQL Server 2012, and part of the data captured by CDC does not make sense to me.
I have a table called 'Schema.Table1', and I enabled CDC on it by running 'sys.sp_cdc_enable_table'. I see that a table called 'cdc.Schema_Table1_CT' was created, which now gets an entry whenever I insert, update or delete a record in the original table.
Up to this point everything works fine.
My original table has a NOT NULL INT column called 'AuditTrackerUserID' with a default value of 1996. My application does not provide a value for this column, but because the column itself has a default value, records get inserted without error.
When I execute the following query, I see multiple records with an __$operation of 3 and 1.
SELECT * from cdc.Schema_Table1_CT where AuditTrackerUserID IS NULL
My expectation is that this query should never return any records, because AuditTrackerUserID is a NOT NULL column, but it does.
I'm rewriting a huge FOR XML EXPLICIT procedure to use FOR XML PATH, and need to compare the previous output to the refactored one to make sure I didn't mess up the XML structure.
The thing is, I'm not sure that SQL Server will always generate exactly the same XML **string**, so I'd rather not compare by:
WHERE CAST(@xml_old AS NVARCHAR(MAX)) = CAST(@xml_new AS NVARCHAR(MAX))
Nor do I want to manually validate every node, since the generated XML structure is quite complex.
Table A
Id   Name
101  Dante
102  Henry
103  Harold
104  Arnold
Table B
Number  Name
102     Dante
107     Gilbert
109     Harold
110     Arnold
106     Susan
112     Marian
I want the result in a third table as shown below: if a value exists in Table A and does not exist in Table B, the record should go into the third table with the table name in a new column, and vice versa.
Table C
Col1     Col2
Henry    Table A
Gilbert  Table B
Susan    Table B
Marian   Table B
I am using the logic below to get the values from the tables:
select t1.columnA , t2.* from table1 t1 join table2 t2 on t2.columnB = t1.columnA
I am using the script below to compare two tables and get the values.
How can I get the count of 'Table A', 'Table B' and 'Table A & Table B' using the script below?
Ex:
'Table A' -- 150
'Table B' -- 300
'Table A & Table B' -- 150
SELECT Col1 = ISNULL(a.name,b.name),
       Col2 = CASE WHEN ISNULL(a.name,'') = '' THEN 'Table B'
                   WHEN ISNULL(b.name,'') = '' THEN 'Table A'
                   ELSE 'Table A & Table B' END
FROM #tableA a
FULL JOIN #tableB b ON a.name = b.name;
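One way to get the counts is to wrap the same FULL JOIN in a derived table and group by the label. This sketch assumes names are unique within each table and that #tableA and #tableB look like the Table A / Table B samples. The column definitions are guesses.

```sql
-- Assumed shapes for the two temp tables.
CREATE TABLE #tableA (id int, name varchar(20));
CREATE TABLE #tableB (number int, name varchar(20));

INSERT #tableA VALUES (101, 'Dante'), (102, 'Henry'), (103, 'Harold'), (104, 'Arnold');
INSERT #tableB VALUES (102, 'Dante'), (107, 'Gilbert'), (109, 'Harold'),
                      (110, 'Arnold'), (106, 'Susan'), (112, 'Marian');

-- Label each joined row, then count rows per label.
SELECT x.Col2, Cnt = COUNT(*)
FROM (
    SELECT Col2 = CASE WHEN a.name IS NULL THEN 'Table B'
                       WHEN b.name IS NULL THEN 'Table A'
                       ELSE 'Table A & Table B' END
    FROM #tableA AS a
    FULL JOIN #tableB AS b ON a.name = b.name
) AS x
GROUP BY x.Col2;

DROP TABLE #tableA, #tableB;
```

With the sample names this gives 1 for 'Table A', 3 for 'Table B' and 3 for 'Table A & Table B'.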
SET NOCOUNT ON;
DECLARE @items TABLE (ITEM_ID INT, ITEM_NAME VARCHAR(10))
INSERT INTO @items (ITEM_ID, ITEM_NAME) VALUES
(10,'ITEM 1'), (11,'ITEM 2'), (12,'ITEM 3'), (13,'ITEM 4'),
(14,'ITEM 5'), (15,'ITEM 6'), (16,'ITEM 7'), (17,'ITEM 8')
SELECT * FROM @items
-- table with categories
SET NOCOUNT ON;
DECLARE @categories TABLE (CAT_ID INT, CAT_NAME VARCHAR(10))
INSERT INTO @categories (CAT_ID, CAT_NAME) VALUES
(100,'WHITE'), (101,'BLACK'), (102,'BLUE'), (103,'GREEN'),
(104,'YELLOW'), (105,'CIRCLE'), (106,'SQUARE'), (107,'TRIANGLE')
SELECT * FROM @categories
--table where categories are assigned to master categories
SET NOCOUNT ON;
DECLARE @master_categories TABLE (MASTERCAT_ID INT, CAT_ID INT)
INSERT INTO @master_categories (MASTERCAT_ID, CAT_ID) VALUES
(1,100), (1,101), (1,102), (1,103), (1,104),
(2,105), (2,106), (2,107)
SELECT * FROM @master_categories
-- items-categories assignment table
SET NOCOUNT ON;
DECLARE @item_categories TABLE (CAT_ID INT, ITEM_ID INT)
INSERT INTO @item_categories (CAT_ID, ITEM_ID) SELECT 100,10
INSERT INTO @item_categories (CAT_ID, ITEM_ID) SELECT 105,10
INSERT INTO @item_categories (CAT_ID, ITEM_ID) SELECT 100,11
INSERT INTO @item_categories (CAT_ID, ITEM_ID) SELECT 105,11
[code]....
So now I need to query the table @t4 to determine the items that are assigned to category 'WHITE' in master category 1 and to 'CIRCLE' in master category 2. The important thing is to return items that are assigned solely to 'WHITE' in master cat 1 and solely to 'CIRCLE' in master cat 2. In the above example only ITEM 1 (id=10) would be returned:
1. ITEM 2 (id=11) is not returned because it is additionally assigned to category 'SQUARE' in master cat 2
2. ITEM 3 (id=12) is not returned because it is additionally assigned to category 'BLACK' in master cat 1
3. ITEM 4 (id=13) is not returned as it is not assigned to category 'CIRCLE' in master cat 2 but only to 'WHITE' in master cat 1
4. ITEM 5 (id=14) is not returned as it is not assigned to category 'WHITE' in master cat 1 but only to 'CIRCLE' in master cat 2
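This looks like an "exact match" form of relational division, which a GROUP BY with HAVING can express. The sketch assumes that @t4 is the item-category assignment table, that it holds no duplicate rows, and that the assignments for items 11 to 14 are as the list above describes. Those rows are reconstructed, not taken from the truncated script.

```sql
DECLARE @categories TABLE (CAT_ID int, CAT_NAME varchar(10));
DECLARE @master_categories TABLE (MASTERCAT_ID int, CAT_ID int);
DECLARE @item_categories TABLE (CAT_ID int, ITEM_ID int);

INSERT @categories VALUES (100, 'WHITE'), (101, 'BLACK'), (105, 'CIRCLE'), (106, 'SQUARE');
INSERT @master_categories VALUES (1, 100), (1, 101), (2, 105), (2, 106);

-- Assignments for items 11-14 are assumptions inferred from the description.
INSERT @item_categories VALUES
    (100, 10), (105, 10),
    (100, 11), (105, 11), (106, 11),
    (100, 12), (101, 12), (105, 12),
    (100, 13),
    (105, 14);

SELECT ic.ITEM_ID
FROM @item_categories AS ic
JOIN @categories AS c ON c.CAT_ID = ic.CAT_ID
JOIN @master_categories AS mc ON mc.CAT_ID = ic.CAT_ID
WHERE mc.MASTERCAT_ID IN (1, 2)
GROUP BY ic.ITEM_ID
HAVING COUNT(*) = 2  -- exactly two assignments across master categories 1 and 2
   AND SUM(CASE WHEN mc.MASTERCAT_ID = 1 AND c.CAT_NAME = 'WHITE'  THEN 1 ELSE 0 END) = 1
   AND SUM(CASE WHEN mc.MASTERCAT_ID = 2 AND c.CAT_NAME = 'CIRCLE' THEN 1 ELSE 0 END) = 1;
```

With these rows only ITEM_ID 10 qualifies: items 11 and 12 have three assignments, items 13 and 14 have one.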
I am tasked with identifying the source database name, id, and server name for each staging table that I create. I need to add this as a derived column on all staging tables created by merging the same tables from different servers together.
When doing a Merge Join, there is no way to identify the source of the data, so I would like to see whether more data came from one database than from the other servers, or whether there are duplicates across servers.
The thing that bugs me about the SSIS Data Flow task is that there is no easy way to run an Execute SQL Task after I select my ADO.NET Source to get this information, because my connection string is dynamic and there is no way of knowing which data source is being picked up at runtime.
For example, I have a Products table on Server 1 and Server 2:
Server 2 has more Products, and I would like to join the two together to create a staging table.
I was running an operation to shrink file/emptyfile a data file, and then remove it.
It blocked and caused a huge mess, I suspect on the removal part. But I want to confirm that the emptyfile completed (and that the engine isn't going to try to put more data in there for when I schedule the removal part again a week or more from now).
How does the engine know not to put any more data in there, and how long does that situation last?
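As far as the documentation for DBCC SHRINKFILE goes, EMPTYFILE migrates the data to the other files in the same filegroup and the Database Engine then no longer permits data to be placed in the emptied file. One way to check whether the move finished is to look at how many pages each file still reports as used. This is only a sketch, and it has to be run in the context of the database in question.

```sql
-- size and FILEPROPERTY(..., 'SpaceUsed') are both counted in 8 KB pages,
-- so dividing by 128.0 converts them to megabytes.
SELECT file_id,
       name,
       physical_name,
       SizeMB = size / 128.0,
       UsedMB = FILEPROPERTY(name, 'SpaceUsed') / 128.0
FROM sys.database_files;
```

If the file you emptied still shows a large UsedMB, the EMPTYFILE step probably did not complete.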
I am struggling to come up with a set-based solution for this problem (i.e. one that doesn't involve loops/cursors). A table contains items (identified by an ItemCode) and the set they belong to (identified by a SetId). Here is some sample data:
SetId  ItemCode
1      A
1      B
2      4
2      8
2      6
3      10
3      12
4      10
[code]....
You can see that there are some sets that have the same members:
- 1 and 10
- 2 and 11
- 7, 8 & 9
What I want to do is identify the sets that have the same members, by giving them the same ID in another column called UniqueSetId.
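One set-based idea is to build a sorted "signature" string of the members for each set and then rank the signatures, so identical sets share a number. On SQL Server 2012 the concatenation can be done with FOR XML PATH. The sketch below uses made-up data and a table variable @SetItems, and assumes item codes never contain a comma.

```sql
-- Hypothetical data: sets 1 and 3 have the same members.
DECLARE @SetItems TABLE (SetId int, ItemCode varchar(10));
INSERT @SetItems VALUES
    (1, 'A'), (1, 'B'),
    (2, '4'), (2, '8'), (2, '6'),
    (3, 'B'), (3, 'A'),
    (4, 'A');

WITH Signatures AS (
    SELECT s.SetId,
           -- Comma-separated member list in a fixed order, e.g. ',A,B'
           Members = (SELECT ',' + i.ItemCode
                      FROM @SetItems AS i
                      WHERE i.SetId = s.SetId
                      ORDER BY i.ItemCode
                      FOR XML PATH(''), TYPE).value('.', 'varchar(max)')
    FROM (SELECT DISTINCT SetId FROM @SetItems) AS s
)
SELECT SetId,
       UniqueSetId = DENSE_RANK() OVER (ORDER BY Members)
FROM Signatures
ORDER BY SetId;
```

Sets 1 and 3 end up with the same UniqueSetId, while sets 2 and 4 each get their own.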
I run the script below once a day to keep track of row count over time. I would like to compare the results from today and yesterday to see if anyone deleted more than 20% of data from any given table. How would I do this? I really don't need the data anymore than a day just to compare the results.
Mon - Run script to collect row count
Tues - Run script to collect current row count into temp table, compare all row counts in both tables, purge records from Monday and insert current
Wed - Run script to collect current row count into temp table, compare all row counts in both tables
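The comparison itself can be a self-join of today's counts to yesterday's. The sketch assumes the daily script writes to a table shaped like the hypothetical @RowCounts below (TableName, CollectedOn, RowCnt). The real script is not shown, so these names are guesses.

```sql
DECLARE @Today date = CAST(GETDATE() AS date);

-- Hypothetical history table and sample rows.
DECLARE @RowCounts TABLE (TableName sysname, CollectedOn date, RowCnt bigint);
INSERT @RowCounts VALUES
    ('dbo.Orders',    DATEADD(DAY, -1, @Today), 1000),
    ('dbo.Orders',    @Today,                    700),
    ('dbo.Customers', DATEADD(DAY, -1, @Today),  500),
    ('dbo.Customers', @Today,                    495);

-- Tables whose row count dropped by more than 20% since yesterday.
SELECT y.TableName,
       YesterdayRows = y.RowCnt,
       TodayRows     = t.RowCnt
FROM @RowCounts AS y
JOIN @RowCounts AS t
  ON t.TableName = y.TableName
 AND t.CollectedOn = DATEADD(DAY, 1, y.CollectedOn)
WHERE t.CollectedOn = @Today
  AND t.RowCnt < y.RowCnt * 0.8;
```

Only dbo.Orders is reported here. After the check, rows older than today can be deleted so that only one day is kept.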
What I need to be able to find is any records where the Discontinue_Date is greater than the Effective_Date on the next row for a given Customer ID and Part_ID. This is a customer pricing table so the Discontinue_Date of row 53 for example should never be greater than the Effective_Date of row 54130, these are the records I'm looking to find. So I'm looking for a SELECT query that would look for any records where this is true. Obviously the last Discontinue_Date row for a Customer_ID will not have a next row so I wouldn't want to return that.
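On SQL Server 2012 and later, LEAD can bring the next row's Effective_Date alongside each row. The sketch assumes a hypothetical table @CustomerPricing with the four columns named in the question and that "next row" means the next Effective_Date for the same customer and part.

```sql
-- Hypothetical pricing rows; the second row starts before the first one ends.
DECLARE @CustomerPricing TABLE
    (Customer_ID int, Part_ID int, Effective_Date date, Discontinue_Date date);
INSERT @CustomerPricing VALUES
    (1, 10, '2014-01-01', '2014-06-30'),
    (1, 10, '2014-06-15', '2014-12-31'),
    (1, 10, '2015-01-01', '2015-12-31'),
    (2, 10, '2014-01-01', '2014-12-31');

WITH Ordered AS (
    SELECT Customer_ID, Part_ID, Effective_Date, Discontinue_Date,
           Next_Effective_Date = LEAD(Effective_Date)
               OVER (PARTITION BY Customer_ID, Part_ID ORDER BY Effective_Date)
    FROM @CustomerPricing
)
SELECT Customer_ID, Part_ID, Effective_Date, Discontinue_Date, Next_Effective_Date
FROM Ordered
-- The last row per group has a NULL Next_Effective_Date and drops out here.
WHERE Discontinue_Date > Next_Effective_Date;
```

With this sample only the first row for customer 1 is returned.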
I want to compare the filepath column in a table with the files on the physical drive, and get the details of files that are in the table but not on the drive, and vice versa.
I am trying to write a function that compares the characters of two strings, eliminates the ones they have in common, and returns the number of differences between them.
Keep in mind that I need the larger number of differences to be returned. Also, if a character is repeated in one of the two words, it is eliminated only once when it exists only once in the other string.
I will give an example below to be more clear
--Start
declare @string1 as varchar(50)='imos'
declare @string2 as varchar(50)='nasos';
WITH n (n) AS (
SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)
[Code] ....
The differences in the first string from the second one are 2 (i, m), while the differences in the second string from the first one are 3 (n, a, s). So the function should return 3 in the previous example.
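One way to get this is to count each character per string and compare the counts. The following is a sketch under the assumptions that strings are at most 100 characters and that the default collation decides which characters count as equal. The CTE names are invented and this is not the truncated query above.

```sql
DECLARE @string1 varchar(50) = 'imos';
DECLARE @string2 varchar(50) = 'nasos';

WITH d (n) AS (
    SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v (n)
),
nums (n) AS (  -- numbers 1..100
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM d AS a CROSS JOIN d AS b
),
c1 AS (        -- character counts of the first string
    SELECT ch = SUBSTRING(@string1, n, 1), cnt = COUNT(*)
    FROM nums WHERE n <= LEN(@string1)
    GROUP BY SUBSTRING(@string1, n, 1)
),
c2 AS (        -- character counts of the second string
    SELECT ch = SUBSTRING(@string2, n, 1), cnt = COUNT(*)
    FROM nums WHERE n <= LEN(@string2)
    GROUP BY SUBSTRING(@string2, n, 1)
)
SELECT Differences = CASE WHEN t.Only1 > t.Only2 THEN t.Only1 ELSE t.Only2 END
FROM (
    SELECT Only1 = SUM(CASE WHEN x.d > 0 THEN x.d ELSE 0 END),   -- surplus in string 1
           Only2 = SUM(CASE WHEN x.d < 0 THEN -x.d ELSE 0 END)   -- surplus in string 2
    FROM (SELECT d = ISNULL(c1.cnt, 0) - ISNULL(c2.cnt, 0)
          FROM c1 FULL JOIN c2 ON c2.ch = c1.ch) AS x
) AS t;
-- For 'imos' and 'nasos' this returns 3
```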
I have an odd one. I have a SQL job that doesn't have a schedule and is being run each morning. It is a legacy system and I am trying to document the data flow process and I am having a hard time tracking down where/what is starting the job. I see which user executed the job:
SELECT message FROM sysjobhistory WHERE job_id = 'jobid' AND run_date > 'yesterday'
Which is useful, but I want to know what is starting the job.
Say you have a table that has records with numbers sort of like lottery winning numbers, say:
TableWinners
num1, num2, num3, num4, num5, num6
33    52    47    23    17    28
... more records with similar structure.
Then you have another table with chosen numbers, same structure as above, TableGuesses.
How could you do the following comparisons between TableGuesses and TableWinners:
1. Compare a single record in TableGuesses to a single record in TableWinners to get a count of the number of numbers that match (kind of a typical lottery type of thing).
2. Compare a single record in TableGuesses to ALL records in TableWinners to see which record in TableWinners is the closest match to the selected record in TableGuesses.
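Both comparisons get easier once the six columns are unpivoted into rows, for example with CROSS APPLY (VALUES ...). The sketch assumes each table has an id column (WinnerId and GuessId are invented names) and that the numbers within one record are distinct. All rows except the first winner are made up.

```sql
DECLARE @TableWinners TABLE (WinnerId int, num1 int, num2 int, num3 int, num4 int, num5 int, num6 int);
DECLARE @TableGuesses TABLE (GuessId int, num1 int, num2 int, num3 int, num4 int, num5 int, num6 int);

INSERT @TableWinners VALUES (1, 33, 52, 47, 23, 17, 28), (2, 5, 12, 23, 31, 40, 52);
INSERT @TableGuesses VALUES (1, 17, 23, 28, 40, 41, 52);

WITH w AS (  -- one row per winning number
    SELECT t.WinnerId, v.num
    FROM @TableWinners AS t
    CROSS APPLY (VALUES (t.num1), (t.num2), (t.num3), (t.num4), (t.num5), (t.num6)) AS v (num)
),
g AS (       -- one row per guessed number
    SELECT t.GuessId, v.num
    FROM @TableGuesses AS t
    CROSS APPLY (VALUES (t.num1), (t.num2), (t.num3), (t.num4), (t.num5), (t.num6)) AS v (num)
)
-- Closest winner(s) for guess 1; ties are all returned.
SELECT TOP (1) WITH TIES
       w.WinnerId,
       Matches = COUNT(g.num)
FROM w
LEFT JOIN g ON g.num = w.num AND g.GuessId = 1
GROUP BY w.WinnerId
ORDER BY COUNT(g.num) DESC;
```

Here winner 1 comes back with 4 matches. For the single-record comparison, drop the TOP clause and filter on one WinnerId.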
public Set DoSomthing(Set toBeProcessed, Set measuresToWorkWith)
The set measuresToWorkWith is passed as {[Measures].[Measure1], [Measures].[Measure2] ...}
with the measures being real or query-scoped calculated members.
To get the value of the measure for each tuple in the set toBeProcessed, I create an Expression for each tuple (measure) in the set measuresToWorkWith then for each tuple in toBeProcessed call expression.Calculate(tuple) which returns a MDXValue.
My problem is that in order to make the code generic I need to get the real (.NET) data type of the MDXValue. The class only has explicit conversion methods ToInt16() etc which implies that the data type is known at design time.
However, if one of the measures is a query-scoped calculation then it could return a .NET double, int, bool or string.
If the measure is real then I can look up its metadata. However, it appears that if it is a formula (a query-scoped member) then all bets are off?
I want to remove the rows where A and B are the same as on the previous row. So rows 2, 3, 7 and 9 should be eliminated. Note that A and B can have the same values multiple times, just not in succession in the extract. I've tried ranking but I can't figure out how to keep it from lumping all the values of A and B in the same group. The following incorrectly eliminates rows 5 and 8:
;with data as (
select 1 as A, 1 as B, '2015-01-01' as DTE union
select 1 as A, 1 as B, '2015-01-02' as DTE union
select 1 as A, 1 as B, '2015-01-03' as DTE union
select 2 as A, 1 as B, '2015-01-04' as DTE union
[Code] .....
Of course the real data has many columns and multiple data types that can have nulls. I just want to get the row when anything changes. Is there a slick way to do this in SQL?
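On SQL Server 2012, LAG can pull the previous row's values, and an EXISTS over EXCEPT gives a NULL-safe "anything changed" test that extends to many columns. In the sketch below only the first four rows come from the question. The remaining rows are invented to reproduce the described outcome.

```sql
WITH data AS (
    SELECT * FROM (VALUES
        (1, 1, '2015-01-01'), (1, 1, '2015-01-02'), (1, 1, '2015-01-03'),
        (2, 1, '2015-01-04'), (1, 1, '2015-01-05'), (2, 2, '2015-01-06'),
        (2, 2, '2015-01-07'), (2, 1, '2015-01-08'), (2, 1, '2015-01-09')
    ) AS v (A, B, DTE)
),
marked AS (
    SELECT A, B, DTE,
           PrevA = LAG(A) OVER (ORDER BY DTE),
           PrevB = LAG(B) OVER (ORDER BY DTE),
           rn    = ROW_NUMBER() OVER (ORDER BY DTE)
    FROM data
)
SELECT A, B, DTE
FROM marked
-- Keep the first row, plus every row that differs from the one before it.
-- EXCEPT treats NULLs as equal, so nullable columns need no special handling.
WHERE rn = 1
   OR EXISTS (SELECT A, B EXCEPT SELECT PrevA, PrevB);
```

With this data rows 2, 3, 7 and 9 are dropped while rows 5 and 8 stay, because the comparison is only ever against the immediately preceding row.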