Custom Property For Remove Duplicates Transform Input Row
Oct 15, 2006
I'm working through the MS example "RemoveDuplicates". I can't seem to figure out how to add a custom property for an input column.
I added the helper method:
private static void AddIsKeyCustomPropertyToInput(IDTSInput90 input, object value)
{
IDTSCustomProperty90 isKey = input.CustomPropertyCollection.New();
isKey.Name = "IsKey";
isKey.Value = value;
}
I call it from:
public override void ProvideComponentProperties()
{
//...
AddIsKeyCustomPropertyToInput(input, false);
//...
}
public override void ReinitializeMetaData()
{
IDTSInput90 input = ComponentMetaData.InputCollection[0];
if (input.CustomPropertyCollection.Count == 0)
{
AddIsKeyCustomPropertyToInput(input, false);
}
// ...
}
However, when I deployed it and added the component to an SSIS package, I can't see the custom property "IsKey" in the input column properties window.
What am I missing? Please help.
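If I remember the sample right, the "IsKey" property belongs on each input column rather than on the input itself; a property added to the input's own CustomPropertyCollection shows up on the input, not in the input column properties grid. A sketch of the per-column approach, assuming the same 2005 pipeline API as above:

// Hypothetical sketch: attach IsKey to a column when it is selected,
// so it appears in the input column properties grid.
public override IDTSInputColumn90 SetUsageType(int inputID, IDTSVirtualInput90 virtualInput, int lineageID, DTSUsageType usageType)
{
    // Let the base class select or deselect the column first.
    IDTSInputColumn90 column = base.SetUsageType(inputID, virtualInput, lineageID, usageType);
    if (column != null && column.CustomPropertyCollection.Count == 0)
    {
        IDTSCustomProperty90 isKey = column.CustomPropertyCollection.New();
        isKey.Name = "IsKey";
        isKey.Value = false;
    }
    return column;
}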
I developed a simple custom control flow component which has several read/write properties and one read-only property (let's call it ROP) whose Get method simply returns the value of a private variable (VAR, a string). In the Execute method, VAR has a value assigned. When I put the value of ROP or VAR into a MsgBox, I can see the correct value. However, when I execute the component, I cannot see the value of ROP in the property window: I see the property, but its value is an empty string. For example, when I put a breakpoint in PostExecute, or check the property in a MsgBox before clicking OK, I would expect that the property value would be updated in SSIS as well. Is there a way to display the correct values of custom task properties in the property window?
What I want to accomplish is that, at design time, the designer can enter a value for some custom property on my custom task, and that this value is accessible at execution time.
I am writing a custom task that has some custom properties. I would like to parameterize these properties, i.e. read them from a variable, so I can change the variables from a config file at runtime.
I read the documentation, and it says that if we set the ExpressionType to CPET_NOTIFY it should work, but it does not seem to. Not sure if I am missing anything. Can someone please help me?
In the editor of my custom task, under the custom properties section, I expected a button with 3 dots to click, popping up a dialog where we can specify the expression, or at least have it evaluate the variables if we give @[User::VariableName].
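For what it's worth, CPET_NOTIFY applies to a data flow component's custom properties, and it has to be set on the property when the property is created; a minimal sketch, assuming a pipeline component's ProvideComponentProperties:

// Sketch: create the property with CPET_NOTIFY so the designer exposes
// an expression for it and re-evaluates it before execution.
IDTSCustomProperty90 prop = ComponentMetaData.CustomPropertyCollection.New();
prop.Name = "QueryTimeout";   // hypothetical property name
prop.Value = 30;
prop.ExpressionType = DTSCustomPropertyExpressionType.CPET_NOTIFY;

For a genuine control flow task there is no per-property expression flag and no 3-dot button in the grid; as far as I know, the designer exposes property expressions through the task's Expressions page instead, which accepts @[User::VariableName].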
I am importing data from an xls file to a db table with a DTS package. At DTS creation time I am using the 'Transform Data Task Properties' GUI window to map incoming xls fields (source) to the table columns (destination). Question: is there any way to invoke the 'Transform Data Task Properties' GUI window at DTS runtime and use it to change the mapping dynamically? Thanks, Vadim.
I am using the lookup transform with the following settings:
Reference table: Use results of an SQL query. The query retrieves the surrogate key and four business key columns from a dimension table which contains a few thousand rows.
Columns: business keys in the incoming data are mapped to the business keys in the reference table, and the surrogate key is looked up from the reference table.
Advanced: Enable memory restriction is OFF (and the other items on the Advanced tab are greyed out).
I assumed that this means that the lookup transform would cache all of the rows in the SQL query, and then perform the lookups against this cache. This is the behaviour that I saw when I was running the package in my local environment in the BIDS debugger.
However, a colleague was doing some profiling on our production database server, and noticed that the lookup transform is instead issuing a single SQL query for each row of input. Our production ETL server has many GBs of free RAM available (way more than enough to cache the contents of the lookup table in memory), and given that memory restriction is disabled, I don't understand why the lookup transform is behaving in this fashion. Does anyone have an explanation for this? I'm probably misunderstanding a key concept here.
I was just thinking about a situation... Let's say, hypothetically, I have a textbox where I would like someone to input their email address to be added to a mailing list. I would like to first check whether that email address exists in the database. Rather than running a SQL statement to check and then running the update command, is it better to run an IF EXISTS type check in SQL and do the logic there?
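For illustration, the single-round-trip version would look something like this (a sketch; dbo.MailingList and @Email are hypothetical names):

-- Insert the address only if it is not already on the list.
IF NOT EXISTS (SELECT 1 FROM dbo.MailingList WHERE Email = @Email)
    INSERT INTO dbo.MailingList (Email) VALUES (@Email);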
Hi All, I have the table dbo.OperatingHour. It has many duplicates, and I want to remove the duplicates permanently. The statement below runs, but when I open the table there are no changes:
Insert into OperatingHour (Weekdays, Wednesdays, Fridays, Saturdays, [Sundays/Public Holidays])
(SELECT DISTINCT Weekdays, Wednesdays, Fridays, Saturdays, [Sundays/Public Holidays] FROM OperatingHour)
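Note that this INSERT only adds another copy of each distinct row; nothing is deleted, so the duplicates remain. A common pattern for removing them permanently (a sketch, assuming the listed columns make up the whole row and no foreign keys reference the table):

-- Keep one copy of each row in a temp table, then reload the original.
SELECT DISTINCT Weekdays, Wednesdays, Fridays, Saturdays, [Sundays/Public Holidays]
INTO #Dedup
FROM OperatingHour;

TRUNCATE TABLE OperatingHour;

INSERT INTO OperatingHour (Weekdays, Wednesdays, Fridays, Saturdays, [Sundays/Public Holidays])
SELECT Weekdays, Wednesdays, Fridays, Saturdays, [Sundays/Public Holidays]
FROM #Dedup;

DROP TABLE #Dedup;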
Welcome, how can I alter the following table in order to reduce neighbouring duplicates (symbol, position, quantity, price)?

Nr   Symbol    Position  Quantity  Price               Date
1.   wz9999b   1         1.0       2500.0              2007-05-09 08:09:42.653
2.   wz9999b   2         12.0      2500.0              2007-05-09 08:09:42.653
3.   wz9999b   1         100.0     2590.0              2007-05-10 15:47:04.140
4.   PZ0008VX  1         2280.884  2090.5500000000002  2007-05-16 12:43:12.403
5.   PZ0008VX  1         2280.884  2102.0500000000002  2007-05-16 12:45:27.420
6.   wz9999b   1         0.001     2500.0              2007-05-18 09:47:16.033
7.   wz9999b   1         0.001     2500.0              2007-05-18 09:47:53.270
8.   wz9999b   1         1.0       1.0                 2007-05-22 12:35:07.893
9.   PZ0008VX  1         2280.884  2102.0500000000002  2007-05-24 09:38:26.160
10.  PZ0008VX  1         2280.884  2102.0500000000002  2007-05-24 09:38:38.800
11.  wz9999b   1         0.001     2500.0              2007-05-24 12:35:07.207
12.  wz9999b   1         0.002     2500.0              2007-05-24 12:35:14.987
13.  wz9999b   1         0.001     2500.0              2007-05-24 12:38:07.207

In the result set I would like to get rows number 6 and 10. Any suggestions?
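A sketch of one way to find neighbouring duplicates (assuming the table is called dbo.Trades and the rows are ordered by Date, then Nr): number the rows and join each row to its predecessor.

WITH Ordered AS
(
    SELECT Nr, Symbol, Position, Quantity, Price,
           ROW_NUMBER() OVER (ORDER BY [Date], Nr) AS rn
    FROM dbo.Trades
)
SELECT curr.Nr   -- the later row of each neighbouring pair; use prev.Nr for the earlier one
FROM Ordered AS prev
JOIN Ordered AS curr
  ON curr.rn = prev.rn + 1
 AND curr.Symbol = prev.Symbol
 AND curr.Position = prev.Position
 AND curr.Quantity = prev.Quantity
 AND curr.Price = prev.Price;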
I have a situation where we get XML files sent daily that need uploading into SQL Server tables, but the source system producing these files sometimes generates duplicate records in the file. The tricky part is, that the record isn't entirely duplicated. What I mean, is that if I look for duplicates by grouping the key columns, having count(*) > 1, I find which ones are duplicates, but when I inspect the data on these duplicates, the other details in the remaining columns may differ. So our rule is: pick the first record, toss the rest of the duplicates.
Because we don't sort on any columns during the import, the first record kept of the duplicates is arbitrary. Again, we can't tell at this point which of the duplicated records is more correct. Someday down the road, we will do this research.
Now, I need to know the most efficient way to accomplish this in SSIS. If it makes it easier, I could just discard all the duplicates, since the number of them is so small.
If the source were a relational table, I could use a SQL statement to filter the records to remove the duplicates, but since the source is an XML file, I don't know how to filter these out in the pipeline, since the file has to be aggregated to search for dups.
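Since the rule is just "first row per key wins", one approach is to land the XML rows in a staging table and delete all but one arbitrary row per key (a sketch; dbo.StagingTable, KeyCol1, and KeyCol2 stand for the real names):

WITH Numbered AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY KeyCol1, KeyCol2
                              ORDER BY (SELECT NULL)) AS rn
    FROM dbo.StagingTable
)
DELETE FROM Numbered
WHERE rn > 1;

Alternatively, inside the pipeline itself, the Sort transform's "Remove rows with duplicate sort values" option performs the same arbitrary-winner dedup on the key columns without a staging table.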
DELETE FROM tblContacts
WHERE tblContacts.ID IN
(
    SELECT F.ID
    FROM tblContacts AS F
    WHERE EXISTS
    (
        SELECT email, Count(ID)
        FROM tblContacts
        WHERE tblContacts.email = F.email
        GROUP BY tblContacts.email
        HAVING Count(tblContacts.ID) > 1
    )
)
AND tblContacts.ID NOT IN
(
    SELECT Min(ID)
    FROM tblContacts AS F
    WHERE EXISTS
    (
        SELECT email, Count(ID)
        FROM tblContacts
        WHERE tblContacts.email = F.email
        GROUP BY tblContacts.email
        HAVING Count(tblContacts.ID) > 1
    )
    GROUP BY email
)
I readily admit that I've shamelessly copied 'n pasted this from a tutorial and then taken a stab at tweaking it for my own ends. But I really don't understand what it's doing.
Really, all I want to know is that it will remove records with duplicate email fields. But I could also do with confirming - looking at the "SELECT Min(ID)" bit - does that mean that if it finds a duplicate, it'll delete the latest-added one? And if so, that changing it to remove the earliest-added one is simply a case of changing MIN to MAX?
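Yes. The SELECT Min(ID) subquery builds the list of rows to keep, so for each duplicated email the lowest ID (the earliest-added, assuming an incrementing ID) survives and the later ones are deleted; swapping Min for Max keeps the latest instead. The whole statement boils down to roughly this shorter sketch:

DELETE FROM tblContacts
WHERE ID NOT IN
(
    SELECT Min(ID)
    FROM tblContacts
    GROUP BY email
);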
I'm building packages programmatically and all is well. I have a new custom transform that I developed. It also works fine. Now I'm trying to add my new component to my packages when I programmatically build them, and I'm unable to do that.
Has anyone added their own custom components to a programmatically built package successfully?
I get a COM error on the line that calls ProvideComponentProperties. I've attempted various modifications including not overriding ProvideComponentProperties or just having it do nothing. I always get the same result. What I don't understand is that the custom transform works and handles ProvideComponentProperties fine when it is added to a package in BIDS.
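For comparison, the usual sequence for adding a component by class ID looks like the sketch below (the class ID string is a placeholder; the real value can be looked up via Application.PipelineComponentInfos). If this exact sequence still raises the COM error, the common causes are the assembly missing from the GAC or from the DTS\PipelineComponents folder on the machine doing the building, or a version mismatch in the referenced SSIS assemblies.

using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

// Sketch: add a data flow task, then add a pipeline component to it.
Package package = new Package();
Executable exec = package.Executables.Add("DTS.Pipeline.1");
MainPipe pipeline = (MainPipe)((TaskHost)exec).InnerObject;

IDTSComponentMetaData90 component = pipeline.ComponentMetaDataCollection.New();
component.ComponentClassID = "MyCompany.MyCustomTransform";  // placeholder CreationName/CLSID
CManagedComponentWrapper designTime = component.Instantiate();
designTime.ProvideComponentProperties();  // the failing call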
I am working with a bunch of records that have duplicates on Persid and intPercentID. Where there are duplicates, I want to remove them when I insert the records into the temp table. I tried a join on the temp table with NOT EXISTS, but it still inserts them; now I am trying a MERGE, but the same thing happens. How can I keep duplicates from being inserted into the temp table? I made a cursor as well, and it does work, but it's slow as heck, so I'm trying to find better ways.
Create table #TempStr (STRId int not null Identity(1,1) primary key, Persid int, percentId int, dtCreated datetime, CreatedBy int)
INSERT #TempStr (Persid, percentId, dtCreated, CreatedBy)
SELECT intPersonnelID, intPercentID, dtSubmitted, intSubmittedBy
FROM tblSTR
WHERE intPercentID IN (61, 62)
GROUP BY intPercentID, intPersonnelID, dtSubmitted, intSubmittedBy
UNION ALL
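The NOT EXISTS check likely fails because the duplicates are inside the source SELECT itself: none of them are in the temp table yet when the check runs, so they all pass. One sketch that collapses the source to one row per Persid/percentId before inserting:

INSERT INTO #TempStr (Persid, percentId, dtCreated, CreatedBy)
SELECT intPersonnelID, intPercentID, dtSubmitted, intSubmittedBy
FROM
(
    SELECT intPersonnelID, intPercentID, dtSubmitted, intSubmittedBy,
           ROW_NUMBER() OVER (PARTITION BY intPersonnelID, intPercentID
                              ORDER BY dtSubmitted) AS rn
    FROM tblSTR
    WHERE intPercentID IN (61, 62)
) AS s
WHERE s.rn = 1;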
I have a table with columns ID, DupeID1, and DupeID2. The ID column is unique. The combination of DupeID1 and DupeID2 should only be there once, and I don't want the reverse combination (DupeID2, DupeID1) in the table. How can I delete the reverse duplicates from this table?
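A sketch of one set-based way (the table name dbo.Dupes is a placeholder): self-join each pair to its mirror image and delete the copy with the higher ID.

DELETE t1
FROM dbo.Dupes AS t1
JOIN dbo.Dupes AS t2
  ON t1.DupeID1 = t2.DupeID2
 AND t1.DupeID2 = t2.DupeID1
 AND t1.ID > t2.ID;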
Product No   Grade    Quantity
A            Good
A            Normal
A            Bad
B            Good
B            Bad
C            Good
C            Normal
C            Bad
In Table 2, each Product No is divided by Grade. I want to look up the Quantity from Table 1 into Table 2. For the same Product No, one row will have the value and the other rows get 0. The result for the Quantity column should be like this:
Table 2:
Product No   Grade    Quantity
A            Good     1
A            Normal   0
A            Bad      0
B            Good     2
B            Bad      0
C            Good     3
C            Normal   0
C            Bad      0
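Assuming a Table1 shaped like (Product No, Quantity) with one quantity per product, and assuming the Good row is the one that should carry the value, a sketch:

SELECT t2.[Product No], t2.Grade,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY t2.[Product No]
                                    ORDER BY CASE t2.Grade WHEN 'Good' THEN 1
                                                           WHEN 'Normal' THEN 2
                                                           ELSE 3 END) = 1
            THEN t1.Quantity ELSE 0 END AS Quantity
FROM Table2 AS t2
JOIN Table1 AS t1
  ON t1.[Product No] = t2.[Product No];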
Hi, I want to know how to remove the identity property from a column without recreating the whole table... When I do it in Enterprise Manager, it actually drops and recreates the table in the background. I'd just like to know if there is another way that avoids recreating the table. Thanks! Xiao Tan
Is there any way to remove the IDENTITY property on a particular table? I tried removing the IDENTITY property using Management Studio, but behind the scenes this operation uses a migration approach: it creates a tmp table, populates it with the data, drops the original, and renames the tmp table back to the original name.
Second, I want some kind of generic solution using the system views such as sys.objects and sys.identity_columns, such that I can remove the identity property from all of the tables across a database.
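As far as I know there is no ALTER TABLE option that simply switches IDENTITY off, which is why the tools fall back to rebuilding. The closest in-place workaround is column surgery on the same table (a sketch; dbo.MyTable and OldID are placeholders, and any constraints on OldID must be dropped first):

-- Replace the identity column with a plain column in place.
ALTER TABLE dbo.MyTable ADD NewID INT NULL;
UPDATE dbo.MyTable SET NewID = OldID;
ALTER TABLE dbo.MyTable ALTER COLUMN NewID INT NOT NULL;
ALTER TABLE dbo.MyTable DROP COLUMN OldID;
EXEC sp_rename 'dbo.MyTable.NewID', 'OldID', 'COLUMN';

-- For the generic version, sys.identity_columns lists every identity column in the database:
SELECT OBJECT_NAME(object_id) AS TableName, name AS ColumnName
FROM sys.identity_columns;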
When data is imported from our legacy system, the same functions need to be applied to several columns on different tables. I want to build a kind of "Function Library", so that the functions I define can be re-used for columns in several packages.
The "Derived Column" transform seems ideal, if only I could add my list of user-defined functions to it. Basically I want to inherit from it, and add my own list of functions for the users to select.
Is this possible?
What other approaches could I take to building about 30 re-usable functions?
Would anyone happen to have any pointers or know of any good code examples to either programmatically change the type of an input column when it is passed through the component, or add a new column to the output? I am extracting data from an Oracle database which is in Julian date format (represented within SSIS as a DT_NUMERIC column), and I need to either transform the input column holding it into a date column, or dynamically add a new output column holding the transformed data.
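A sketch of the add-an-output-column route (2005 pipeline API; the column name and the Julian conversion helper are assumptions):

// In ProvideComponentProperties (or when the input is attached):
IDTSOutputColumn90 dateColumn = ComponentMetaData.OutputCollection[0].OutputColumnCollection.New();
dateColumn.Name = "ConvertedDate";  // hypothetical name
dateColumn.SetDataTypeProperties(DataType.DT_DBTIMESTAMP, 0, 0, 0, 0);

// In PreExecute, resolve the buffer index once:
//   int idx = BufferManager.FindColumnByLineageID(input.Buffer, dateColumn.LineageID);
// In ProcessInput, per row, convert the DT_NUMERIC Julian value and write the date:
//   buffer.SetDateTime(idx, JulianToDateTime(buffer.GetDecimal(julianIdx)));  // JulianToDateTime is hypothetical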
Within the LinkingID table, there are duplicates in ID1 and ID2, but just in opposite columns. I have been trying to figure out a way to remove these set-based. It doesn't matter which duplicate is removed; essentially these are just endpoints, and I don't care which side they are on. The solution must recognize the duplicates and not just remove based on every 2nd row.
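A sketch that normalizes each pair so the smaller ID comes first, then keeps one arbitrary row per normalized pair (table and column names assumed):

WITH Normalized AS
(
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY CASE WHEN ID1 < ID2 THEN ID1 ELSE ID2 END,
                            CASE WHEN ID1 < ID2 THEN ID2 ELSE ID1 END
               ORDER BY (SELECT NULL)) AS rn
    FROM dbo.LinkingID
)
DELETE FROM Normalized
WHERE rn > 1;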
I have a bunch of contacts, and I have scored how well each of their names matches the other contacts in the same business. I can programmatically figure out how to parse the results, but would like to know how to do this via SQL. My problem: for Business_fk 968976 I have 7 contacts, and in the end I should have 4 contacts based on name matching. For the business key listed, Gerardo Lopez is in the ContactScore table twice, for Contact keys 7355719 and 57028145. I then have two rows like so:
Each references the other, and 2 rows is the easy case; a more difficult case would have key 1 listed 10 times showing ContactMatch_fk values 2 through 11, and then Contact_fk 2 listed 10 times with ContactMatch_fk values 1 and 3 through 11. I know 57028145 maps to 7355719 from the first row in the ContactScore table, so when Contact_fk 7355719 comes up, I should be able to skip it and not process that match. Hopefully that makes sense. Anyway, here is the test data:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[ContactScore]') AND type IN (N'U'))
    DROP TABLE [dbo].[ContactScore];
GO
CREATE TABLE [dbo].[ContactScore]
(
    [ContactScore_pk] INT NOT NULL,
    [Contact_fk] INT NOT NULL,
How can I perform this task with SSIS or Transact-SQL? I have rows with the following data, and I want to take just the valid one, but I have a lot of combinations of names like those below; they can be animals, things, or personal names:
GABRIEL OBANDO --CORRECT
GABRIEL OVANDO
Gavriel OVANDO
gAbriel OBANDO
GABRIE OBANDO
Gabri OBONDA

MANAGUA --CORRECT
NANAGUA
NAMAGUA
We all were new at one point.... any help is appreciated.
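SSIS's Fuzzy Grouping transform is built for exactly this kind of canonicalization. For a rough pure T-SQL approximation, SOUNDEX/DIFFERENCE can cluster phonetic near-matches (a sketch with a hypothetical dbo.Names table; it will not catch every misspelling):

SELECT a.Name, b.Name AS ProbableMatch
FROM dbo.Names AS a
JOIN dbo.Names AS b
  ON a.Name < b.Name                     -- avoid self-pairs and mirror pairs
 AND DIFFERENCE(a.Name, b.Name) >= 3;    -- 4 is the strongest phonetic similarity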
Objective:
Combine two 49,000-row tables and remove records where there is only one column's difference (keeping the record where the specified column has a value and removing the one where it is blank).
Reason:
I have 2 people going through a list, coding a specific column with a single letter value. They both have different progress on each sheet. Hence I am trying to UNION them and have a result of their combined efforts without duplicates.
My progress/where I'm stuck:
Here is my first query/union:
SELECT * FROM [Eds table] UNION SELECT * FROM [Vickis table];
As shown above, I have unioned these 2 tables, and my results removed the obvious whole-record duplicates; but since 1 column is different on these, a union without criteria considers them unique...
An example of the duplicates I must remove: two rows identical in every column except the coded column, where one row has the letter value and the other is blank.
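In generic terms, if only one column (call it Code) differs and the blank side is an empty string, grouping on the stable columns and taking MAX keeps the coded copy (a sketch; KeyCol1 through KeyCol3 and Code are hypothetical column names):

SELECT KeyCol1, KeyCol2, KeyCol3,
       MAX(Code) AS Code    -- '' sorts below any letter, so the coded value wins
FROM
(
    SELECT KeyCol1, KeyCol2, KeyCol3, Code FROM [Eds table]
    UNION
    SELECT KeyCol1, KeyCol2, KeyCol3, Code FROM [Vickis table]
) AS u
GROUP BY KeyCol1, KeyCol2, KeyCol3;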
I have an Excel file that I import into a DB table using a data flow in SSIS, but it has duplicates, and I don't want to use the dupe records.
So I planned like below:
Method 1: one OLEDB destination gets the good records (without duplicates) and another OLEDB destination gets the not-good records (only duplicates).
Method 2: I add a column (GOOD_RECORD) to the DB table, update it to '1' for the top 1 record of each set (the good record) and to '0' for the remaining records (the dups), and later use the GOOD_RECORD flag, i.e.,
select * from DB_TABLE where GOOD_RECORD = '1'.
I think Method 2 is advisable for performance/flexibility, but how can I do this update using SSIS (data flow)?
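One option is to skip the data flow for this step and run the update from an Execute SQL task with a ROW_NUMBER-based statement (a sketch; KeyCol1 and KeyCol2 are placeholders for whatever columns define a duplicate):

WITH Ranked AS
(
    SELECT GOOD_RECORD,
           ROW_NUMBER() OVER (PARTITION BY KeyCol1, KeyCol2
                              ORDER BY (SELECT NULL)) AS rn
    FROM DB_TABLE
)
UPDATE Ranked
SET GOOD_RECORD = CASE WHEN rn = 1 THEN '1' ELSE '0' END;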
I have some duplicate values for my query results, about 200 duplicates out of 30000 rows. Of these 200 duplicates I want to keep the ones that have a higher value for... 'UpdatedBatchID'.
SELECT IR.Id AS 'ID'
     , CAST(IR.Priority AS varchar) AS 'Priority'
     , IRSupportGroupDN.DisplayName AS 'Support Group'
     , DATEADD(MI, DATEDIFF(MI, GETUTCDATE(), GETDATE()), IR.CreatedDate) AS 'Created Date'
     , DATEADD(MI, DATEDIFF(MI, GETUTCDATE(), GETDATE()), IR.ResolvedDate) AS 'Resolved Date'
     , SLOConfig.DisplayName AS 'SLO'
     , DATEADD(MI, DATEDIFF(MI, GETUTCDATE(), GETDATE()), SLOFact.TargetEndDate) AS 'SLO Target'
     , SLOStatusDN.DisplayName AS 'SLO Status'
     , SLOMetric.DisplayName AS 'SLO Metric'
     , SLOFact.UpdatedBatchId AS 'UpdatedBatchID'
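The usual pattern for keeping the row with the highest UpdatedBatchID per ID is to wrap the query and filter on ROW_NUMBER(). The FROM clause isn't shown above, so it stays a placeholder in this sketch:

SELECT *
FROM
(
    SELECT <column list above>,
           ROW_NUMBER() OVER (PARTITION BY IR.Id
                              ORDER BY SLOFact.UpdatedBatchId DESC) AS rn
    FROM <the existing FROM/JOIN clauses>
) AS q
WHERE q.rn = 1;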
What happens when you add the Ignore Case flag into the mix?
I'm having a hell of a time. I'm dealing with an SCD situation using the TableDifference component, and I have both existing dimensions and new data coming in; each goes through an identical case-insensitive Sort with "remove duplicates", but I'm getting identical new and deleted records detected, I think because of ordering issues. I'm still trying to whittle the test case down, but data from all around the records I'm investigating seems to get sorted in between them, so I'm having trouble building a small test case.
I think the mixed case data is the root of the problem, and I think the design is bad, but before I go back to the technical lead, I need to understand enough to show that you cannot take two pipelines sorted and de-duped case-insensitively and then do a case-sensitive table difference operation.
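A tiny repro of why that combination misbehaves: de-duping under a case-insensitive collation keeps an arbitrary casing from each pipeline, and a case-sensitive comparison afterwards treats the two survivors as different rows, producing a matching "new" and "deleted" pair.

-- A case-insensitive DISTINCT keeps one arbitrary casing:
SELECT DISTINCT Val COLLATE Latin1_General_CI_AS AS Val
FROM (SELECT 'ABC' AS Val UNION ALL SELECT 'abc') AS s;

-- A case-sensitive compare then sees the survivors as different:
SELECT CASE WHEN 'ABC' = 'abc' COLLATE Latin1_General_CS_AS
            THEN 'same' ELSE 'different' END AS CaseSensitiveCompare;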
I wrote a custom task following the outline on MSDN. I signed it and installed it into the Tasks folder and in the GAC.
When I go to an SSIS project and add my task, the properties window shows "Could not get value for property 'd61935d9-430b-4c93-9f3e-a29f720d8659'. Specified cast is not valid." (where the guid is different obviously) for many of the properties.
What have I done wrong?
Update: I know this isn't my code because I tried a simple task that just returns success and doesn't do anything. I get the exact same errors, so I must be installing it incorrectly.
I'm wondering if anyone's accomplished this before - I've been unable to find a whiff of info on how to do this so far.
I'm creating a custom component that I'd like to give a "Derived Column" type of ability to. By that, I mean I'd like to populate a property of my component with an expression (including references to input columns, package variables and functions) and be able to evaluate it at runtime - per row processed by the component.
I would also appreciate any information as to how to provide the interface to allow the user to build such an expression as well - is there a UI function in SSIS I can call to pop up the "expression builder"?