Dealing With Error Records - Best Practice Question
Oct 15, 2007
There are lots of smart people here who, I am sure, can offer insight on this one.
I need to add handling of erroneous data to my package.
This is what my SSIS package does. The source table has thousands of records in it, most of which have already been processed (they have a status of "Finished"). A typical run has 400-600 records to process. The runs happen overnight.
Steps:
1) Execute SQL task that updates all records in the input table with status "New" or "Error" to "Process"
2) Data Flow task that takes as input all records with status "Process" and outputs them to the destination OLE DB table
3) Execute SQL task that updates all records with status "Process" to "Finished"
Erroneous records are identified at step 2 (Data Flow task) and need to be marked as "Error" so that they are not marked as "Finished" by step 3. They will be picked up and reprocessed by step 1 next time around.
I am having trouble seeing the best way to achieve this. I am also concerned that any steps I take might be prone to deadlocking, as the updated data is also the source data.
My initial thought is to output some data to an "error" table in step 2, and then insert another Execute SQL task before step 3 that will update the source table to mark all these records as "Error" records. Does this make sense? Is there a best-practice way to achieve this?
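A minimal sketch of the error-table idea, using Python's built-in sqlite3 module as a stand-in for SQL Server. The table names (source_rows, error_rows), the payload column and the validation rule are hypothetical; in the real package the data flow's error output would write to the error table, and each UPDATE would be its own Execute SQL task. The point is the ordering: failed rows are flagged "Error" before step 3 runs, so step 3's "Process" to "Finished" update no longer touches them.

```python
import sqlite3

# Hypothetical schema: source_rows stands in for the package's input table,
# error_rows for the "error" table the data flow would write to.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_rows (id INTEGER PRIMARY KEY, payload TEXT, status TEXT);
    CREATE TABLE error_rows  (id INTEGER PRIMARY KEY);
""")
conn.executemany(
    "INSERT INTO source_rows (id, payload, status) VALUES (?, ?, ?)",
    [(1, "ok", "New"), (2, "bad", "New"), (3, "ok", "Error"), (4, "old", "Finished")],
)

# Step 1: claim everything that is new or previously failed.
conn.execute("UPDATE source_rows SET status = 'Process' WHERE status IN ('New', 'Error')")

# Step 2 (stand-in for the data flow): rows failing validation go to error_rows.
for row_id, payload in conn.execute(
    "SELECT id, payload FROM source_rows WHERE status = 'Process'"
).fetchall():
    if payload == "bad":  # hypothetical validation rule
        conn.execute("INSERT INTO error_rows (id) VALUES (?)", (row_id,))

# Extra step before step 3: flag the failed rows so step 3 leaves them alone.
conn.execute(
    "UPDATE source_rows SET status = 'Error' "
    "WHERE status = 'Process' AND id IN (SELECT id FROM error_rows)"
)
conn.execute("DELETE FROM error_rows")  # clear the error table for the next run

# Step 3: everything still in 'Process' succeeded.
conn.execute("UPDATE source_rows SET status = 'Finished' WHERE status = 'Process'")
conn.commit()

statuses = dict(conn.execute("SELECT id, status FROM source_rows"))
```

Because the extra UPDATE only runs once the data flow has finished (assuming the tasks are chained with precedence constraints), it does not compete with the data flow's own reads of the source table.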
Hi, I have an SSIS package that runs each day from a live data source to create a data mart, which is then used for various things including SSAS and SSRS.
The problem is that certain records that eventually go on to form fact tables are deleted from the live system (not a very robust database in the first place, hence the SSIS!), but these deletions are not reflected by the SSIS transformation, which inflates the figures compared with the live system.
I currently use type 1 slowly changing dimension processes in each data flow (of which there are about 35) but I realise that this only updates records and does not delete.
The solution I have in place is to truncate the fact tables in the mart with an Execute SQL task before the run starts. This solves the problem, but it seems a little heavy-handed to me and renders the slowly changing dimension processes redundant (as the package currently runs only once a day).
My question is, is there a better method of dealing with the above scenario? If there isn't, it would be a nice feature to add to future versions (*nudge nudge*).
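Instead of truncating, one option is to delete only the fact rows whose source row has disappeared. The sketch below uses Python's sqlite3 module and hypothetical table names (live_orders for the live system, fact_orders for the mart); it assumes the live keys are reachable from wherever the DELETE runs, for example through a staging table loaded by the package.

```python
import sqlite3

# Hypothetical tables: live_orders mirrors the live system, fact_orders is the mart.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE live_orders (order_id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO live_orders VALUES (1, 10.0), (3, 30.0);
    INSERT INTO fact_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
""")

# Remove fact rows whose source row no longer exists in the live system,
# rather than truncating and reloading the whole fact table.
deleted = conn.execute("""
    DELETE FROM fact_orders
    WHERE NOT EXISTS (
        SELECT 1 FROM live_orders l WHERE l.order_id = fact_orders.order_id
    )
""").rowcount
conn.commit()

remaining = [r[0] for r in conn.execute("SELECT order_id FROM fact_orders ORDER BY order_id")]
```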
What's the best practice for adding/editing a record in a database table with lots of fields? I am not talking about the mechanics of it, as there are plenty of trivial examples using ADO.NET, stored procs, etc. Deleting is easy: you just pass in the primary key (or keys) to uniquely identify the record. But in the real world you may have, say, a table with 100 fields! Do you code the INSERT sproc by hand, with 100 parameters, and then call it from your ADO.NET code? That sounds like a lot of work to me.
What about updating? That's even worse. Sometimes you may need to update only 3 or 4 fields, but with sprocs you would have to pass all 100 parameters in again and "update" the whole record (when in fact you are only changing 3 or 4 fields). I could write different sprocs targeting only the fields I wish to update, but that sounds like duplicated work compared with having one generic update proc.
Sometimes I just feel like bypassing sprocs and using inline SQL, as it would be less work, but I know it is untidy and more likely to be buggy. So come on, guys (and gals), let's hear how you would handle the insert/update scenarios when you have lots of fields. Northwind examples are too trivial :-)
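One middle ground between a 100-parameter procedure and hand-written inline SQL is to generate a parameterized UPDATE from just the fields that changed. This is a sketch in Python with sqlite3, not a SQL Server stored procedure; the customer table and the helper name update_changed_fields are hypothetical. Values are always bound as parameters, and column names are checked against the table's real columns because they cannot be parameters.

```python
import sqlite3


def update_changed_fields(conn, table, key_column, key_value, changes):
    """Run an UPDATE that sets only the columns present in `changes`.

    Values are bound as parameters. Column names cannot be parameters, so they
    are checked against the table's actual columns before going into the SQL.
    `table` itself is assumed to come from trusted code, not from user input.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    valid = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for col in [*changes, key_column]:
        if col not in valid:
            raise ValueError(f"unknown column: {col}")
    if not changes:
        return 0  # nothing to update
    set_clause = ", ".join(f"{col} = ?" for col in changes)
    sql = f"UPDATE {table} SET {set_clause} WHERE {key_column} = ?"
    return conn.execute(sql, [*changes.values(), key_value]).rowcount


# Usage with a hypothetical customer table: only "city" is sent and updated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT, phone TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ann', 'Leeds', '555-0100')")
updated = update_changed_fields(conn, "customer", "id", 1, {"city": "York"})
row = conn.execute("SELECT name, city, phone FROM customer WHERE id = 1").fetchone()
```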
What is best practice for what this number should be?
I have seen guidance saying to set it to a number greater than 25000, but not from any source I particularly know or trust. (I checked SANS, NIST-CIS and the NSA, but I couldn't find anything.)
Hi all, I need to send out an email when an error occurs in the package. Is it good practice to put the Send Mail task in the event handler? MaximumErrorCount is set to 1, but for some reason I sometimes see more than one email sent out. Please advise. Thanks
Hello! Is there an easy way to deal with the situation below when reading data from a SQL database?
int? myNullableColumn;
myNullableColumn = Convert.ToInt32(datarow["datacolumn"]);
Ideally, myNullableColumn would be null if the value was DBNull.Value. This does not work, because Convert.ToInt32 will not convert DBNull.Value to null and throws an exception instead. Is there a built-in function that does this? Thanks!
OK, I have a table with about 47000 records in it, and I have the following query for that table:
Select ReportType = Case When ReportType = 1 Then 'Uniquery Report' When ReportType = 2 Then 'SABRE Report' When ReportType = 3 Then 'Menu Report' Else Null End,
    ReportNameTo_, Frequency.Frequency as Frequency, ReportDate, ReportDescription
From Report
Inner Join Frequency on ( Report.ReportFrequency = Frequency.FID )
Where ( Active = 1 )
And ReportDate = ( Select Max ( ReportDate ) From Report Where ( Active = 1 ) )
And ReportID = ( Select Max ( ReportID ) From Report Where ( Active = 1 ) )
The idea is that I need to get only the last report for each unique report name. I added a computed column to the table to give me ReportNameTo_, since my delimiter is the _. My issue is that only 1 record is returned (the last record added to the table), which is right for the query as written but wrong for what I want: I need to return only the last record for each unique ReportNameTo_. For example, my table has ID, ReportNameTo_ and Date fields, and the data looks something like this:
1, 123_, 1/1/2008
2, 123_, 1/1/2008
3, 124_, 1/1/2008
4, 124_, 1/1/2008
5, 125_, 1/1/2008
6, 125_, 1/1/2008
7, 126_, 1/1/2008
8, 126_, 1/1/2008
I only want to return the following:
2, 123_, 1/1/2008
4, 124_, 1/1/2008
6, 125_, 1/1/2008
8, 126_, 1/1/2008
Hope someone out there can tell me how to do this. I am almost there, just not all the way.
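A common way to get the last row per name is to find the highest ReportID for each ReportNameTo_ in a subquery and join back to the table. The sketch runs the idea against SQLite through Python's sqlite3 module, using the sample rows from the question (dates written as ISO strings) and assuming "last" means highest ReportID, since all the dates are equal; the Active filter and the Frequency join are left out for brevity. On SQL Server 2005 and later, ROW_NUMBER() OVER (PARTITION BY ReportNameTo_ ORDER BY ReportID DESC) is another way to do the same thing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Report (ReportID INTEGER PRIMARY KEY, ReportNameTo_ TEXT, ReportDate TEXT)")
conn.executemany(
    "INSERT INTO Report VALUES (?, ?, ?)",
    [(i, name, "2008-01-01") for i, name in
     [(1, "123_"), (2, "123_"), (3, "124_"), (4, "124_"),
      (5, "125_"), (6, "125_"), (7, "126_"), (8, "126_")]],
)

# Highest ReportID per name, then join back to fetch the full row.
latest = conn.execute("""
    SELECT r.ReportID, r.ReportNameTo_, r.ReportDate
    FROM Report r
    JOIN (SELECT ReportNameTo_, MAX(ReportID) AS MaxID
          FROM Report
          GROUP BY ReportNameTo_) m
      ON m.ReportNameTo_ = r.ReportNameTo_ AND m.MaxID = r.ReportID
    ORDER BY r.ReportID
""").fetchall()
```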
Hello! I'm just looking for advice on dealing with duplicates in a database. I have a contact table that has a bunch of duplicated customer records. My goal is to combine all duplicated records into one record. This involves a couple of tables: contact, contact history and calendar. All tables are related by the common column "accountno". What would be the best approach for this?
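One possible approach: decide which account survives for each group of duplicates, re-point the child tables to it, then delete the other contact rows in one transaction. The sketch uses Python's sqlite3 module with hypothetical tables (contact_history stands in for "contact history") and assumes duplicates share an email column and that the lowest accountno survives; merging differing field values into the surviving contact is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (accountno INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE contact_history (id INTEGER PRIMARY KEY, accountno INTEGER, note TEXT);
    CREATE TABLE calendar (id INTEGER PRIMARY KEY, accountno INTEGER, event TEXT);
    INSERT INTO contact VALUES (1, 'a@example.com'), (2, 'a@example.com'), (3, 'b@example.com');
    INSERT INTO contact_history (accountno, note) VALUES (1, 'call'), (2, 'email');
    INSERT INTO calendar (accountno, event) VALUES (2, 'meeting'), (3, 'lunch');
""")

# `with conn` commits the UPDATEs and DELETE together, or rolls them back on error.
with conn:
    # Map each duplicate accountno to the surviving (lowest) accountno for its email.
    conn.execute("""
        CREATE TEMP TABLE merge_map AS
        SELECT c.accountno AS old_no, s.keep_no AS new_no
        FROM contact c
        JOIN (SELECT email, MIN(accountno) AS keep_no FROM contact GROUP BY email) s
          ON s.email = c.email
        WHERE c.accountno <> s.keep_no
    """)
    # Re-point child rows to the survivor, then drop the duplicate contacts.
    for child in ("contact_history", "calendar"):
        conn.execute(f"""
            UPDATE {child}
            SET accountno = (SELECT new_no FROM merge_map WHERE old_no = {child}.accountno)
            WHERE accountno IN (SELECT old_no FROM merge_map)
        """)
    conn.execute("DELETE FROM contact WHERE accountno IN (SELECT old_no FROM merge_map)")
```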
I have the following query in a stored procedure. If there are no rows in the history file, it returns NULL. Is there some setting or function that would make it return zero if no rows are found? I use the variable to do arithmetic later on, and a NULL messes everything up.
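Assuming the query aggregates (for example with SUM), an aggregate over zero rows returns NULL, and wrapping it in COALESCE (or ISNULL in T-SQL) turns that into 0. Sketched with Python's sqlite3 module and a hypothetical history table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (agent TEXT, amount INTEGER)")  # hypothetical, left empty

# An aggregate over zero rows yields NULL (None in Python)...
raw = conn.execute("SELECT SUM(amount) FROM history WHERE agent = ?", ("x",)).fetchone()[0]

# ...COALESCE substitutes 0, so later arithmetic still works.
safe = conn.execute(
    "SELECT COALESCE(SUM(amount), 0) FROM history WHERE agent = ?", ("x",)
).fetchone()[0]
```

COUNT(*) already returns 0 for an empty set, so only SUM, MIN, MAX, AVG and similar aggregates need the wrapper.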
select agent, name, surname, address, cust1_text01, cust1_text02, phone1, case call_type_id *when NULL then '' else call_type_id end as 'call_type_id' from Record_T
* I have also tried when NULL then space(1)
Yet the query still returns NULL when this field is empty. The idea is to always return data: if the field is NULL, replace it with an empty space or spaces.
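The simple form CASE call_type_id WHEN NULL compares with =, and NULL = NULL is never true, so the ELSE branch runs and returns the NULL. The searched form CASE WHEN call_type_id IS NULL, or COALESCE(call_type_id, ''), does catch it. A small demonstration with Python's sqlite3 module and a cut-down, hypothetical Record_T:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Record_T (agent TEXT, call_type_id TEXT)")
conn.executemany("INSERT INTO Record_T VALUES (?, ?)", [("a1", "7"), ("a2", None)])

# Simple CASE compares with '=', and NULL = NULL is never true, so NULLs slip through.
broken = conn.execute(
    "SELECT CASE call_type_id WHEN NULL THEN '' ELSE call_type_id END "
    "FROM Record_T ORDER BY agent"
).fetchall()

# Searched CASE with IS NULL does match them.
fixed = conn.execute(
    "SELECT CASE WHEN call_type_id IS NULL THEN '' ELSE call_type_id END "
    "FROM Record_T ORDER BY agent"
).fetchall()
```

If call_type_id is an integer column in SQL Server, the '' will likely be converted to 0, because a CASE expression returns the highest-precedence type among its branches; casting the column to varchar first avoids that.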
Question 1: In my scenario, I've developed a system which uses 2 databases, and I've written queries like db1.dbo.table1 join db2.dbo.table2, etc. Now that db2 is getting huge, the client wants to move it to another server. I don't know how to modify my queries to cope with this. Could somebody please tell me how to write queries involving two databases on different servers?
Question 2:
I'm maintaining a second database (db2) to keep track of which records of db1 have been processed by my software, so that when more records are added to db1 I can compare the db2 table with the db1 table to identify which records are new. db1 is not my database and I don't have any control over it (it's an ERP database). Is there any other way of identifying which rows have been processed? Can the need for db2 be eliminated?
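Keeping your own list of processed keys is a reasonable approach when you cannot change the ERP database; the query that finds new rows is then an anti-join. A sketch with Python's sqlite3 module and hypothetical table names (erp_orders for db1, processed_orders for db2):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# erp_orders stands in for the db1 table, processed_orders for the db2 tracking table.
conn.executescript("""
    CREATE TABLE erp_orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE processed_orders (order_id INTEGER PRIMARY KEY);
    INSERT INTO erp_orders VALUES (1), (2), (3), (4);
    INSERT INTO processed_orders VALUES (1), (2);
""")

# Anti-join: ERP rows with no matching entry in the tracking table are new.
new_ids = [r[0] for r in conn.execute("""
    SELECT e.order_id
    FROM erp_orders e
    LEFT JOIN processed_orders p ON p.order_id = e.order_id
    WHERE p.order_id IS NULL
    ORDER BY e.order_id
""")]
```

If the ERP table has an ever-increasing key or a last-modified column, storing only the highest value processed so far could replace the full list, but that depends on how the ERP system writes its rows.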
Hi, I've got an insert sub where I grab values from text boxes etc. The values are passed to a stored procedure. However, one of these fields is a date field, and the field is not required, so if the date text box is left blank I get a "not a valid date" error on this line:
.Parameters.Add("@actiondate", SqlDbType.DateTime).Value = txtActionDate.Text
I have tried this (the actiondate field can take nulls):
if txtActionDate = "" then
    .Parameters.Add("@actiondate", SqlDbType.DateTime).Value = nothing
else
    .Parameters.Add("@actiondate", SqlDbType.DateTime).Value = txtActionDate.Text
end if
but this doesn't work. What is the best way of allowing blank values to be passed to the stored procedure? (It doesn't fall over with normal text/varchar fields.) Thanks
I am trying to write a stored procedure in SQL Server Express, and I think I am doing something wrong with datetime. Here is the stored procedure. The error I am getting is:
Msg 241, Level 16, State 1, Line 20
Syntax error converting datetime from character string.
DECLARE @websiteID int
DECLARE @dateFrom datetime
DECLARE @dateTo datetime
DECLARE @sortbystring varchar (20)
set @websiteID = 1
set @dateFrom = Convert(datetime, '2007-02-07 12:01:00')
set @dateTo = Convert(datetime, '2007-03-07 11:59:00')
set @sortbystring = 'Campaign'
IF ISNULL(@dateTo, '') = ''
begin
    SET @dateTo = @dateFrom
end
SET @dateTo = DATEADD(d, 1, @dateTo)
DECLARE @str CHAR(400)
-- LINE 20:
SET @str = 'SELECT dateEntry, c.name as Campaign, e.firstname as FirstName FROM entry e, campaign c WHERE e.campaignID = c.id '
    + 'AND c.websiteID = @websiteID'
    + 'AND (ISNULL(' + @dateFrom + ', '''') = '''' OR e.dateEntry BETWEEN '' + @dateFrom + '' AND '' + @dateTo + '') '
    + 'AND e.IP NOT IN (SELECT IP FROM IP) '
    + ' ORDER BY dateEntry DESC'
print (@str)
I have a stored procedure that takes less than 1 second to return my results in SQL Query Analyzer. When I run the same SP from ASP.NET using a calendar control, Performance Monitor shows that CPU utilization on my dev machine sometimes goes over 40%. Are there any tweaks I can make to help decrease CPU utilization?
I want the procedure to process:
1) all data if no dates are supplied
2) all data after the start date, if no end date is supplied
3) all data before the end date, if no start date is supplied
4) all data between the start and end dates, if both are supplied
Now, instead of an elaborate conditional, I added this to the WHERE clause of my SQL statement:
AND ((@start_date IS NULL OR service_date >= @start_date) AND (@end_date IS NULL OR service_date <= @end_date))
It works fine, but I want to know if anyone has a different/better way of doing it, or if there is a big bug waiting to happen here.
I typically don't like to create multipurpose routines in my code, but this seems a better approach for me in the non-object-oriented world of SQL.
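The pattern in the question can be exercised against SQLite with Python's sqlite3 module; the service table, its rows and the services_between helper are made up for the test. Each bound is written as (param IS NULL OR condition), so a missing bound switches that half of the filter off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE service (id INTEGER, service_date TEXT)")  # hypothetical table
conn.executemany("INSERT INTO service VALUES (?, ?)",
                 [(1, "2008-01-05"), (2, "2008-02-10"), (3, "2008-03-15")])


def services_between(start_date=None, end_date=None):
    # Same shape as the WHERE clause in the question: a NULL bound disables that side.
    sql = """
        SELECT id FROM service
        WHERE (:start_date IS NULL OR service_date >= :start_date)
          AND (:end_date   IS NULL OR service_date <= :end_date)
        ORDER BY id
    """
    params = {"start_date": start_date, "end_date": end_date}
    return [r[0] for r in conn.execute(sql, params)]
```

Two things to watch in SQL Server: the optimizer may cache one plan for all parameter combinations, which can be slow for some of them (OPTION (RECOMPILE) on the statement is one common workaround), and if service_date includes a time of day, <= @end_date excludes rows later on the end date itself.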
In SS 2000 it seems that there is no variable data type that can hold more than 8000 characters (varchar) or 4000 unicode characters (nvarchar). I've seen posts where multiple variables are spliced together to extend this limit. I am looking at performing string manipulations in an sproc and I need to be able to deal with the full 2GB/1GB limit of text and ntext field types. Is this possible? How do you deal with that?
Hello all. I've got a bit of a long-winded question here, so here we go, lol.
OK, I've got data in an Excel spreadsheet. I've set the spreadsheet up as a linked server, and I'm creating a set of insert statements from it using the following code:
For most records this generates a correct insert statement, for example:
INSERT INTO TRAINREC (EMPLOY_REF, COURSE_NAME) VALUES ('153', 'NMA Panel');
However, my problems start when the course name value contains a ' character. When it does, the generated insert statement is incorrect. For example:
INSERT INTO TRAINREC (EMPLOY_REF, COURSE_NAME) VALUES ('139', 'Annual Accounting in Lloyd's Market');
Can anyone suggest any ideas on how to get round this? Also, if I haven't explained it clearly enough, just let me know and I can try to expand on it.
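Inside a single-quoted SQL literal, a quote is written as two quotes, so the generator needs to double every ' in the value. In T-SQL that would be REPLACE(COURSE_NAME, '''', '''''') when building the string; the sketch below shows the same rule in Python with a hypothetical sql_string_literal helper, and uses sqlite3 to check that the resulting statement parses.

```python
import sqlite3


def sql_string_literal(value):
    # Inside a single-quoted SQL literal, a single quote is written as two quotes.
    return "'" + value.replace("'", "''") + "'"


stmt = ("INSERT INTO TRAINREC (EMPLOY_REF, COURSE_NAME) VALUES ("
        + sql_string_literal("139") + ", "
        + sql_string_literal("Annual Accounting in Lloyd's Market") + ");")

# Check that the generated statement actually parses and round-trips the value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TRAINREC (EMPLOY_REF TEXT, COURSE_NAME TEXT)")
conn.execute(stmt)
stored = conn.execute("SELECT COURSE_NAME FROM TRAINREC").fetchone()[0]
```

If the goal is just to copy the rows, inserting straight from the linked server (INSERT ... SELECT) avoids generating statements and escaping altogether.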
I have a stored procedure as a record source, reading from a contacts table. In this example, users can enter parameters to limit contacts by the first letter of the last name or company name, or by keywords. Example:
@myName nvarchar(30) = null,
@Alpha char(1) = null
SELECT Contacts.ContactID, ContactType,
CASE WHEN Contacts.ContactType = 0 THEN Contacts.CompanyName
ELSE isNull(Contacts.LastName,'?') + ', ' + isNull(Contacts.FirstName,'?')
END AS CNAME
FROM Contacts
WHERE (Keywords Like '%' + @myKeyword + '%' OR @myKeyword is Null)
So far, so good, but the problem is that I also want to give the user the option of filtering alphabetically by first letter. I can't figure out how to deal with nulls in this example (when the user doesn't enter anything for the parameter @Alpha):
AND (@Alpha = CASE ContactType WHEN 0 THEN Left(LastName,1) END
OR @Alpha = CASE ContactType WHEN 1 THEN Left(CompanyName,1) END)
Any help is appreciated,
LQ
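The same (param IS NULL OR condition) shape handles @Alpha: when no letter is supplied, the whole filter is true. Below is a sketch with Python's sqlite3 module and made-up rows; the helper find_contacts is hypothetical, and it takes the letter from the name the SELECT displays (CompanyName for ContactType 0, LastName otherwise), which is an assumption, since the filter in the question maps the types the other way round. In T-SQL, substr and || would be LEFT(..., 1) and +.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Contacts (ContactID INTEGER, ContactType INTEGER, "
             "CompanyName TEXT, LastName TEXT, Keywords TEXT)")
conn.executemany("INSERT INTO Contacts VALUES (?, ?, ?, ?, ?)", [
    (1, 0, "Acme", None, "widgets"),
    (2, 1, None, "Baker", "widgets"),
    (3, 1, None, "Adams", "gears"),
])


def find_contacts(keyword=None, alpha=None):
    # Each optional filter is "(param IS NULL OR condition)", so passing None
    # switches that filter off. Letter source: CompanyName for type 0,
    # LastName otherwise (an assumption matching the SELECT's CNAME).
    sql = """
        SELECT ContactID FROM Contacts
        WHERE (:kw IS NULL OR Keywords LIKE '%' || :kw || '%')
          AND (:alpha IS NULL OR :alpha = substr(
                CASE ContactType WHEN 0 THEN CompanyName ELSE LastName END, 1, 1))
        ORDER BY ContactID
    """
    return [r[0] for r in conn.execute(sql, {"kw": keyword, "alpha": alpha})]
```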
I just realized something. In the old DTS package I am migrating, there is an ActiveX script that checks for a certain condition in a row. If the condition is true, then it does:
DTSTransformationStat_SkipRow
I just can't believe there isn't an equivalent functionality in SSIS.
However, so far, I have tried the following:
1.) Redirect file error output (on all columns in the file)
2.) Use a conditional transform to search for a text string in a column (the "bad" row has different text in it)
And still, I keep getting errors that there is an "impartial row" in the file. Yes, I know that - why doesn't the error redirection catch this? Why doesn't the conditional expression catch it either?
Am I missing something here? Is it just buggy? I find it hard to believe I have to work around something that worked just fine in DTS.
What methods work for storing empty dates? I've read that some people pick an old date and use it to represent empty. I'm not fond of the idea, because then I'll have to strip that date whenever I display the field in my UI.
Any other ways to do this? I'm using SQL Server 2005 and C#.
I'm building a C# database application that accesses a remote SQL Server 2005 database. For the moment I am using SQL Server Express Edition. My application will be running in several REMOTE camps which only have an internet connection via satellite, and the satellite connection has very high latency. I am wondering what workarounds or solutions are available for this situation. All applications need to access the same database and should preferably be notified when changes take place in the database.
Hi everyone. I'm very new to .NET and currently dipping my toes in the water with a small application. Getting to the point: I have a form with some text fields that expect a date but are not required, so the user can leave them blank. The code-behind page stores the information using a stored procedure, to which I add parameters like this:
SqlParameter userdate = new SqlParameter();
userdate.ParameterName = "@dtdate";
userdate.SqlDbType = SqlDbType.DateTime;
userdate.Direction = ParameterDirection.Input;
userdate.Value = dtdate.Text.ToString();
cmd.Parameters.Add(userdate);
Now if I leave the text field dtdate blank, I receive an error because the above expects a date. If I remove the line userdate.SqlDbType = SqlDbType.DateTime; I don't receive an error, but my stored procedure saves the date as 01/01/1900 or similar. I believe this is because in my stored procedure the parameter is defined as @dtdate datetime. Obviously I want it so that if the user leaves the text field empty, no date is saved in the database, and I was wondering how other people tackle this scenario.
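The usual fix is to store NULL rather than a placeholder date: an empty string converted to datetime in SQL Server becomes 1900-01-01, which is likely where that date comes from. In ADO.NET that means setting the parameter's Value to DBNull.Value when the text box is blank. The sketch shows the same idea in Python with sqlite3; the entries table, the helper name parse_optional_date and the dd/mm/yyyy input format are assumptions.

```python
import sqlite3
from datetime import datetime


def parse_optional_date(text):
    """Return None for a blank field, otherwise an ISO date parsed from dd/mm/yyyy text."""
    text = (text or "").strip()
    if not text:
        return None  # bound as NULL, not as a placeholder date like 1900-01-01
    return datetime.strptime(text, "%d/%m/%Y").date().isoformat()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, dtdate TEXT)")  # hypothetical
conn.execute("INSERT INTO entries (dtdate) VALUES (?)", (parse_optional_date(""),))
conn.execute("INSERT INTO entries (dtdate) VALUES (?)", (parse_optional_date("07/03/2007"),))
rows = conn.execute("SELECT dtdate FROM entries ORDER BY id").fetchall()
```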
Hello, I have a question. I loaded 2 files into SQL, and the files have some rows with the same model number. How can I merge the rows that have the same model number into one row, taking the average of their price column (if possible) and combining their stock column? Any help would be very much appreciated. Thank you.
I tried this, but it does not work:
SELECT Model_number FROM Products Join Where Model_number='3CM3C1670800B'
I have also tried this; it SHOULD work, but I have an error somewhere:
delete from Products
from part_number a
join (select part_number, max(part_number) from part_number group by part_number having count(*) > 1) b
on a.part_number = b.part_number
and part_number < b.part_number
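Combining rows that share a model number is a GROUP BY: one output row per Model_number, with AVG over price and SUM over stock (assuming "combine" means adding the stock together). A sketch with Python's sqlite3 module and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (Model_number TEXT, price REAL, stock INTEGER)")
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)", [
    ("3CM3C1670800B", 10.0, 5),
    ("3CM3C1670800B", 14.0, 3),
    ("XYZ", 7.5, 2),
])

# One output row per model: average the price, add up the stock.
merged = conn.execute("""
    SELECT Model_number, AVG(price) AS price, SUM(stock) AS stock
    FROM Products
    GROUP BY Model_number
    ORDER BY Model_number
""").fetchall()
```

To replace the original rows, the grouped result can be written to a new table and swapped in. Note that in SQL Server, AVG over an integer column returns an integer, so cast price to a decimal type first if it is stored as int.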
I am using SQL Server to return an XML result set. I then perform an XSLT transformation on the returned result set to fill in HTML form text and select elements. The data returned includes the & character. This character transforms correctly, but I believe the & is negatively affecting my form post (one of the form elements disappears from the posted data). How can I get around this?
Hi, here is some sample code:
create table testNull (a int not Null, b varchar(5), c varchar(5))
insert into testNull (a,b,c) values (1,'Alex','test')
insert into testNull (a,b) values (2,'Alex2')
1. select * from testNull -- returns 2 rows
2. select * from testNull where a <> 3 and b <> 'C1' and c <> 'C2' -- returns ONLY 1 ROW!!!
3. select * from testNull where a <> 3 and b <> 'C1' and isNull(c,'') <> 'C2'
Query 2 returns only 1 row, because the value of column c is NULL.
Question: is there any setting that could be changed at the database or server level to prevent the missing row in the second query, or do I have to use ISNULL on every column that accepts NULL as a value?
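Comparisons with NULL yield UNKNOWN rather than TRUE, so row 2 fails c <> 'C2'. Besides ISNULL, the check can be written as (c IS NULL OR c <> 'C2'), which avoids choosing a replacement value. In SQL Server, SET ANSI_NULLS only changes comparisons against NULL itself, so it does not help here. A reproduction of the sample with Python's sqlite3 module (the count helper is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testNull (a INTEGER NOT NULL, b TEXT, c TEXT)")
conn.execute("INSERT INTO testNull (a, b, c) VALUES (1, 'Alex', 'test')")
conn.execute("INSERT INTO testNull (a, b) VALUES (2, 'Alex2')")


def count(where):
    # `where` comes from this script only, never from user input.
    return conn.execute(f"SELECT COUNT(*) FROM testNull WHERE {where}").fetchone()[0]


# NULL <> 'C2' is UNKNOWN, not TRUE, so row 2 is filtered out.
plain = count("a <> 3 AND b <> 'C1' AND c <> 'C2'")
# Spelling out the NULL case keeps row 2 without a placeholder value.
explicit = count("a <> 3 AND b <> 'C1' AND (c IS NULL OR c <> 'C2')")
```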
Hi, I have a problem with dealing with result sets returned from stored procedures.
I have a procedure like:
CREATE PROCEDURE SampleProcedure AS
BEGIN
SELECT * FROM SampleTable
END
GO
Executing this stored procedure (EXECUTE SampleProcedure) returns a result set containing the data from the SampleTable table.
The returned result set can be seen in Query Analyzer and can be handled from ADO.NET without any problem.
But I can't use this result set from another stored procedure. I tried SELECT * FROM (EXEC SampleProcedure), but there is a syntax error in the SELECT statement.
Does anybody know how to store the result set in a temporary table or select it with a SELECT statement?
If I subtract 14 days from a datetime field, will the time of day that I run this query affect the resultset? I am running the query during "normal business hours", 8 am - 5 pm, and the records are entered during this time frame as well.
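Assuming the 14 days are subtracted from the current date and time (GETDATE() in SQL Server), yes: the result keeps the time of day, so running the query at 8 am and at 4 pm gives cutoffs eight hours apart. Snapping the cutoff to midnight removes that dependence; in SQL Server 2000/2005 a common idiom is DATEADD(day, DATEDIFF(day, 0, GETDATE()) - 14, 0). The Python sketch below shows the effect with a hypothetical cutoff helper.

```python
from datetime import datetime, timedelta


def cutoff(now, days=14, whole_days=True):
    """Return `now` minus `days`, optionally snapped back to midnight."""
    start = now - timedelta(days=days)
    if whole_days:
        # Drop the time of day so the cutoff doesn't depend on when the query runs.
        start = start.replace(hour=0, minute=0, second=0, microsecond=0)
    return start


morning = datetime(2008, 3, 20, 8, 30)
afternoon = datetime(2008, 3, 20, 16, 45)
```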
I am getting a headache trying to research what to do when you have a large number of parameters to include in a query. For example, if I have a large number of checkboxes for the user to pick criteria for a report and they select several, I'm assuming it would be bad practice to say:
WHERE Field = 'a' OR Field = 'b' OR Field = 'c' OR Field = 'd' OR Field = 'e' OR ... etc.
Is there a good solution for this, given that the number of parameters may vary dramatically depending on what the user selects to include in a report?!
I'm running SQL Server 2000 with an ASP front end.
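An IN list is equivalent to the chain of ORs and stays readable; generating one placeholder per checked value keeps the values as parameters. A sketch in Python with sqlite3, where build_in_filter and report_data are hypothetical names and the column name must come from your code, not from the user. SQL Server limits a single request to 2,100 parameters, which is rarely a problem for checkbox lists.

```python
import sqlite3


def build_in_filter(column, values):
    """Return ("column IN (?, ?, ...)", values) with one placeholder per value."""
    if not values:
        raise ValueError("at least one value is required")
    placeholders = ", ".join("?" for _ in values)
    return f"{column} IN ({placeholders})", list(values)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report_data (id INTEGER, Field TEXT)")
conn.executemany("INSERT INTO report_data VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c"), (4, "z")])

# e.g. the user ticked the boxes for "a", "c" and "d"
clause, params = build_in_filter("Field", ["a", "c", "d"])
ids = [r[0] for r in conn.execute(
    f"SELECT id FROM report_data WHERE {clause} ORDER BY id", params)]
```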
Hello,
Suppose I have the following table:
name  employeeId  email
Tom   12345       [email removed]
Hary  54321
Hary  54321       [email removed]
I only want unique employeeIds returned. If I use DISTINCT, it still returns all of the above, because the email is different/missing. Is there a way to query in SQL so that only distinct employeeIds are returned, with no duplicates? I would like to say WHERE no blank fields are present, to get the right row returned.
Many thanks,
Yas
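Grouping by employeeId and taking MAX over the other columns returns one row per employee; MAX ignores NULLs, so a non-blank email wins when one exists, and NULLIF(email, '') treats empty strings as missing too. A sketch with Python's sqlite3 module; the table name staff and the example addresses are placeholders, since the addresses in the question are not shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, employeeId INTEGER, email TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", [
    ("Tom", 12345, "tom@example.com"),
    ("Hary", 54321, None),
    ("Hary", 54321, "hary@example.com"),
])

# One row per employeeId; MAX skips NULLs, and NULLIF turns '' into NULL first.
rows = conn.execute("""
    SELECT employeeId, MAX(name) AS name, MAX(NULLIF(email, '')) AS email
    FROM staff
    GROUP BY employeeId
    ORDER BY employeeId
""").fetchall()
```

If one employee has two different non-blank emails, MAX simply picks the alphabetically last one.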
We are looking to store a large amount of user data that will be changed and accessed daily by a large number of people. We expect around 6-8 million subscribers to our service, with each record being approximately 2000-2500 bytes. The system needs to be running 24/7 and therefore cannot be shut down. What is the best way to implement this? We were thinking of setting up a cluster of servers to hold the information and another cluster to back up the information. Is this practical? Also, what software is available out there that can distribute query calls across different servers and manage large numbers of query requests?
Thank you in advance.
Ben
Thought I would share this since it caused me so much grief.
In some mainframe systems, some dates are stored as the string "00000000". In my SSIS package, I was trying to anticipate for this string, as well as any other combination of zeros (e.g., "000", "0000", etc), since I had already seen lots of dirty data in the flat file (like non-printing characters, etc).
So, what I tried to do was perform an integer conversion on the string and test if it was the equivalent of the numerical value zero:
(DT_STR)[ColumnName] == 0 ? ....
Now, for some reason, that doesn't work, even though a similar operation in SQL does work:
SELECT TOP 1 ISNUMERIC('00000000') FROM tableName
In the end, I had to resort to testing for a match on the literal string "00000000" and hope that no other dates came in as "000" or other variation. Fortunately, this has been true so far.
However, the moral of the story is, converting a series of zeros into a numeric zero, and testing against that, does not seem to work. I don't have a good explanation for why that is, but I would guess it has something to do with the limitation of the conversion function.
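A check that avoids numeric conversion is to test whether the trimmed string consists only of zeros. One likely reason the expression above fails is that (DT_STR) is a string cast, not an integer one, so the comparison with 0 is not numeric; note also that ISNUMERIC('00000000') only says the string is numeric, not that it equals zero. The sketch below is Python rather than the SSIS expression language, and is_zero_date is a hypothetical name; in SSIS, an expression along the lines of LEN(TRIM([ColumnName])) > 0 && REPLACE(TRIM([ColumnName]), "0", "") == "" should express the same test, though that expression is untested.

```python
import re

ALL_ZEROS = re.compile(r"0+")


def is_zero_date(raw):
    """True for "0", "000", "00000000", etc.; surrounding whitespace is ignored."""
    return raw is not None and ALL_ZEROS.fullmatch(raw.strip()) is not None
```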
Hi, I want to use CLR integration to develop database objects such as stored procedures. I have read "Getting Started with CLR Integration" in the MSDN help and successfully created my first procedure. I created an assembly in SQL Server 2005 with this code:
CREATE ASSEMBLY helloworld from 'c:\helloworld.dll' WITH PERMISSION_SET = SAFE
My questions are: What should I do with helloworld.dll after creating the assembly? Can I delete this file? What should I do when uploading my application to the web server? Should I upload any .dll file?