SQL 2012 :: Scheduling Packages In Parallel And Retrying On Failure
Jun 1, 2015
I come from other ETL tools (Oracle Warehouse Builder, BODI, BODS & DataStage) and I'm having trouble finding the best practice for scheduling a collection of packages to be processed in parallel and retrying those that fail. I created a staging project which contains all the packages (50) that extract data from 1 source system and grouped the packages into 2 sequence containers to make sure that the 'heavy' packages are started first and together in parallel.
I soon discovered that there is no standard option to have one child package retry on failure. Currently, if 1 package fails the whole project is retried.
I explored checkpoints as a solution, but that seems to be a dead end when running packages in parallel.
There seem to be 2 solutions for my issue:
(1) create a loop around every EPT (Execute Package Task) with 3 variables (waittime, retry_counter & success_flag)
(2) create an event handler to keep a list of IDs that failed and enable/disable EPTs based on that list (there's a lot more to it).
Option 1 seems like a lot of overhead for what I expected to be standard functionality. I'm still investigating option 2.
How do others handle this kind of scheduling? Is it so different with SSIS that I'm approaching this incorrectly?
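For comparison, one pattern that avoids wrapping every EPT in its own loop is to drive execution and retries from T-SQL against the SSIS catalog (available in SQL 2012, assuming the project is deployed to SSISDB). The sketch below is only an illustration with made-up folder/project/package names; one block like this per package, launched from parallel Execute SQL Tasks in a master package, gives per-package retry without relying on checkpoints.

-- Minimal sketch: run one catalog package synchronously and retry it on failure.
-- Folder, project and package names are placeholders.
DECLARE @attempt INT = 0,
        @max_attempts INT = 3,
        @status INT = 0;              -- catalog.executions.status: 7 = succeeded

WHILE @attempt < @max_attempts AND @status <> 7
BEGIN
    DECLARE @execution_id BIGINT;

    EXEC SSISDB.catalog.create_execution
         @folder_name     = N'ETL',
         @project_name    = N'Staging',
         @package_name    = N'LoadHeavy.dtsx',
         @use32bitruntime = 0,
         @execution_id    = @execution_id OUTPUT;

    -- Make start_execution wait until the package finishes.
    EXEC SSISDB.catalog.set_execution_parameter_value
         @execution_id,
         @object_type     = 50,
         @parameter_name  = N'SYNCHRONIZED',
         @parameter_value = 1;

    EXEC SSISDB.catalog.start_execution @execution_id;

    SELECT @status = status
    FROM   SSISDB.catalog.executions
    WHERE  execution_id = @execution_id;

    IF @status <> 7
    BEGIN
        SET @attempt += 1;
        WAITFOR DELAY '00:05:00';     -- wait time before the next attempt
    END
END

The wait time and maximum attempts play the same role as the waittime and retry_counter variables in option 1, just kept outside the packages themselves.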
I have several packages within sequence containers inside one main dtsx package with a checkpoint configuration, and when I run it some succeed and some don't. The problem is that when I rerun it the checkpoint doesn't seem to work, because some of the successful packages are rerun as well (and not skipped as they should be). In other words, the process does not begin at the point of failure.
It seems that packages that finish after the failure point (and succeed) are not registered in the checkpoint file, so when I rerun the main package these successful packages are rerun too.
I have seen a number of posts regarding parallel development of SSIS packages and need some further information.
So far we have been developing SSIS packages along a single development stream and therefore have managed to avoid parallel development of our packages.
However, due to business pressures we will soon have multiple project streams running in parallel, and therefore multiple code branches, and as part of that we will definitely need to develop the same SSIS packages in parallel. Judging from your post above and some testing we have done, this is going to be a nightmare, as we cannot merge the code. We can put processes in place to try and mitigate this, but there are bound to be issues along the way.
Do you know whether this problem is going to be fixed? We are now using Team Foundation Server, but presumably the merge algorithm used is the same as or similar to that of VSS and therefore very flaky?
Not only are we having problems with merging the XML files, but we also use script tasks within the packages, which are precompiled: the DTSX files contain the binary objects associated with the script source code, so if two developers change the same script task in isolated branches the binary is not recompiled, because the merge software does not recognise this object.
Do you know whether these issues have been identified and are going to be fixed, to bring SSIS in line with the rest of Microsoft's configuration management principles for parallel development?
I am facing some problems while using the FOR loop container to execute 7-10 packages in parallel.
The main package has 7 FOR loop containers say F1-F7.
Each FOR loop container has 2 tasks:
T1 ==> execute child package C1
T2 ==> execute delay task Delay1.
The idea is to run child packages C1-C7 in parallel, delay for some time, and then run them again, since they are in the FOR loop container.
I am facing some problems.
1. The execution of tasks T1-T7 is not guaranteed. This means SSIS picks any 6 of tasks T1-T7 at random to start with. 6 is the maximum it processes, whereas I have more than that. Can I change this setting?
2. It is not guaranteed that if, say, task T1 of FOR loop F1 is executed, the subsequent delay task within that FOR loop will be executed next. Typically what happens is it starts with T1-T6 (T7 on hold), then executes the delay for T1-T5 and passes control to T7 without going into the delay for T6. This is not the intended execution.
What I want is: execute T1-T7, delay until the next execution, and start again.
I'm pretty new to SSIS but I've managed to cobble together a number of individual packages to refresh SQL tables from a 3rd-party database. Now, what I'd like to do is have a single package that I can use to invoke each of the individual ones. Since it will run on a quad, I'd like to invoke them such that they'll run in parallel.
I'm expecting to run 3 isolated versions of the package: in the first version VARA=1, VARB=0, VARC=0; in the second version VARA=0, VARB=1, VARC=0; in the third version VARA=0, VARB=0, VARC=1. But it doesn't seem to work like that (the maxconcurrent variable is set to 40 to be on the safe side).
When I run it I get:
first version VARA=1 VARB=0 VARC=0, second version VARA=1 VARB=1 VARC=0, third version VARA=0 VARB=1 VARC=1
I have a system of SSIS packages in which several packages perform the same lookup on the same table. E.g., I have PackageA, PackageB and PackageC all doing a lookup on TableA. All of these packages are spawned by the same PackageD and run frequently. In some cases, there is an issue with concurrency on these lookups. I get the following exception:
" The ProcessInput method on component "LKP Lookup SecurityID" (6658) failed with error code 0xC004702C. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.
"
The hex code of this exception corresponds to the following description: "DTS_E_BUFFERNOTLOCKED. This buffer is not locked and cannot be manipulated." That's as much as I could find on this.
My suspicion is that the SSIS engine somehow figures out that the lookup in these distinct packages is the same one and builds a shared version of the lookup table in memory. Then there is some sort of multi-threading issue in accessing this shared memory which leads to the exception above.
Has anyone experienced this? Can someone shed some light on this?
One of our DBAs runs a process every night to update the database with a daily data file received from an external source. He was testing on a new SQL Server 2012/Windows 2012 R2 cluster that has an Availability Group. While trying to process INSERTs, the process failed with an error: "Could not allocate a new page for database X because of insufficient disk space in filegroup PRIMARY."
The log also contains "Operating System Error 1237 (The operation could not be completed. A retry should be performed) encountered".
However, there is 300 GB free on the data drive (E:) where the .mdf file is located. The SQL Server service account has the "Perform Volume Maintenance Tasks" permission (instant file initialization).
All of the disks are on VMware 5.1 or 5.5 VMs, and the E: disk uses thick/eager-zeroed provisioning.
I have seen lots of messages on scheduling DTS packages, but none have solved my specific problem. I could manually run my local DTS package, but it wouldn't run when I scheduled it. The problem was with connecting using OLE DB provider. I checked everything: my NT account had all relevant permissions, as did the SQL Server agent account. I tried playing around with the job owner.
In the end, the only thing that worked was if I physically created the local DTS package on the Server rather than sitting at my workstation (still under my NT account though - I even logged on to the server with my NT account). The job then ran fine.
I think it is something to do with NT authentication (as I never got the problem before using NT auth). Can anyone shed any light on it? Is it something to do with the fact that with NT authentication the workstation ID is an advanced property of the OLE DB connection?
I have several independent DTS packages that I would like to schedule and run as 1 job stream (SQL 2000). I can schedule them individually, but I would prefer to have each one be a step of 1 big scheduled job.
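One common way to do this (sketched below with placeholder server and package names) is a single multi-step Agent job in which each step runs one package via dtsrun and, on success, moves to the next step, so the packages run as one sequential stream:

USE msdb;
GO
EXEC dbo.sp_add_job @job_name = N'Nightly DTS stream';

-- Step 1 runs the first package, then moves on to step 2 on success.
EXEC dbo.sp_add_jobstep
     @job_name  = N'Nightly DTS stream',
     @step_name = N'Load package 1',
     @subsystem = N'CMDEXEC',
     @command   = N'dtsrun /S MYSERVER /N "Package1" /E',
     @on_success_action = 3;          -- 3 = go to the next step

EXEC dbo.sp_add_jobstep
     @job_name  = N'Nightly DTS stream',
     @step_name = N'Load package 2',
     @subsystem = N'CMDEXEC',
     @command   = N'dtsrun /S MYSERVER /N "Package2" /E',
     @on_success_action = 1;          -- 1 = quit the job reporting success

EXEC dbo.sp_add_jobserver @job_name = N'Nightly DTS stream';

EXEC dbo.sp_add_jobschedule
     @job_name = N'Nightly DTS stream',
     @name = N'Nightly at 2am',
     @freq_type = 4,                  -- daily
     @freq_interval = 1,
     @active_start_time = 020000;
GO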
Hi all! I just started a new job, and I'm a junior programmer, so I have no idea about anything yet... My boss asked me to run an SSIS package weekly. I found out that I can do it through a SQL Server Agent job. Am I right? I'm not sure about it.
Well, I configured it first as a Transact-SQL step using cmdshell, and it gave me an error saying I have to start the Messenger service; I cannot do that as I'm not an administrator. Any way to sort this problem out?
Another thing I could do, I guess, is configure the job step as a SQL Server Integration Services package, but it doesn't find my package. Should I save it in a special path??
Any help would be great! I'm really lost, and I've been trying to sort it out for 3 days with no luck. My boss is expecting me to do it by tomorrow...
Oh... finally, I guess I should add an Execute SQL job task in the SSIS package? I've already done it, but it may be wrong.
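For what it's worth, a SQL Server Agent job is indeed the usual way to do this, and the package does not need an extra Execute SQL task inside it. A rough sketch (paths and names are made up) of a job with a single SSIS-type step and a weekly schedule:

USE msdb;
GO
EXEC dbo.sp_add_job @job_name = N'Weekly SSIS load';

-- One step that runs the package file with the SSIS subsystem;
-- @command holds the dtexec-style arguments.
EXEC dbo.sp_add_jobstep
     @job_name  = N'Weekly SSIS load',
     @step_name = N'Run package',
     @subsystem = N'SSIS',
     @command   = N'/FILE "C:\Packages\MyPackage.dtsx" /REPORTING E';

EXEC dbo.sp_add_jobserver @job_name = N'Weekly SSIS load';

-- Run every Monday at 06:00.
EXEC dbo.sp_add_jobschedule
     @job_name = N'Weekly SSIS load',
     @name = N'Weekly Monday 6am',
     @freq_type = 8,                  -- weekly
     @freq_interval = 2,              -- Monday
     @freq_recurrence_factor = 1,     -- every week
     @active_start_time = 060000;
GO

If the package is saved to msdb or the package store instead of the file system, the /FILE argument is swapped for the corresponding /SQL or /DTS reference.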
I have a problem when trying to schedule a Database/Object Transfer. The transfer between two similar databases located on different servers is consistently failing, even though it is successful if I choose the Start Transfer option. The error message that appears is 'Process Exit Code 1. Cannot connect to source server'. This is even with the other server registered on both servers! I have tried different levels of security (sa and trusted), re-registering the servers, and WINS settings in TCP/IP, but all to no avail. The servers are configured in the same way. If you have come across the same problem, please share the solution. Any suggestions greatly appreciated.
I have created and run SSIS packages from my workstation that are successful. When I Build and then Deploy the same package to the server and schedule it, it fails. I am using SQL authentication connecting from a SQL 2000 database to a SQL 2005 database via OLE DB connections.
The following is the exported result of the package...
05/09/2007 15:46:05,AMS_EnterpriseWarehouse,Error,0,AMSBINT1,AMS_EnterpriseWarehouse,(Job outcome),,The job failed. The Job was invoked by User JDEREP. The last step to run was step 1 (step1).,00:00:01,0,0,,,,0
05/09/2007 15:46:05,AMS_EnterpriseWarehouse,Error,1,AMSBINT1,AMS_EnterpriseWarehouse,step1,,Executed as user: AMSOIL\vector. Error: 2007-05-09 15:46:06.38 Code: 0xC0047017 Source: JDE Production to AMS BI 7332 DTS.Pipeline Description: component "F0005" (1) failed validation and returned error code 0xC020801C. End Error Error: 2007-05-09 15:46:06.38 Code: 0xC004700C Source: JDE Production to AMS BI 7332 DTS.Pipeline Description: One or more component failed validation. End Error Error: 2007-05-09 15:46:06.38 Code: 0xC0024107 Source: JDE Production to AMS BI 7332 Description: There were errors during task validation. End Error DTExec: The package execution returned DTSER_FAILURE (1). Started: 3:46:05 PM Finished: 3:46:06 PM Elapsed: 0.75 seconds. The package execution failed. The step failed.,00:00:01,0,0,,,,0
05/09/2007 15:30:00,AMS_EnterpriseWarehouse,Error,0,AMSBINT1,AMS_EnterpriseWarehouse,(Job outcome),,The job failed. The Job was invoked by Schedule 29 (AMS_EnterpriseWarehouse). The last step to run was step 1 (step1).,00:00:01,0,0,,,,0
05/09/2007 15:30:00,AMS_EnterpriseWarehouse,Error,1,AMSBINT1,AMS_EnterpriseWarehouse,step1,,Executed as user: AMSOIL\vector. Error: 2007-05-09 15:30:01.15 Code: 0xC0047017 Source: JDE Production to AMS BI 7332 DTS.Pipeline Description: component "F0005" (1) failed validation and returned error code 0xC020801C. End Error Error: 2007-05-09 15:30:01.15 Code: 0xC004700C Source: JDE Production to AMS BI 7332 DTS.Pipeline Description: One or more component failed validation. End Error Error: 2007-05-09 15:30:01.15 Code: 0xC0024107 Source: JDE Production to AMS BI 7332 Description: There were errors during task validation. End Error DTExec: The package execution returned DTSER_FAILURE (1). Started: 3:30:00 PM Finished: 3:30:01 PM Elapsed: 0.734 seconds. The package execution failed. The step failed.,00:00:01,0,0,,,,0
05/09/2007 15:20:00,AMS_EnterpriseWarehouse,Error,0,AMSBINT1,AMS_EnterpriseWarehouse,(Job outcome),,The job failed. The Job was invoked by Schedule 29 (AMS_EnterpriseWarehouse). The last step to run was step 1 (step1).,00:00:01,0,0,,,,0
05/09/2007 15:20:00,AMS_EnterpriseWarehouse,Error,1,AMSBINT1,AMS_EnterpriseWarehouse,step1,,Executed as user: AMSOIL\vector. Error: 2007-05-09 15:20:01.55 Code: 0xC0047017 Source: JDE Production to AMS BI 7332 DTS.Pipeline Description: component "F0005" (1) failed validation and returned error code 0xC020801C. End Error Error: 2007-05-09 15:20:01.55 Code: 0xC004700C Source: JDE Production to AMS BI 7332 DTS.Pipeline Description: One or more component failed validation. End Error Error: 2007-05-09 15:20:01.55 Code: 0xC0024107 Source: JDE Production to AMS BI 7332 Description: There were errors during task validation. End Error DTExec: The package execution returned DTSER_FAILURE (1). Started: 3:20:00 PM Finished: 3:20:01 PM Elapsed: 0.703 seconds. The package execution failed. The step failed.,00:00:01,0,0,,,,0
05/09/2007 15:18:00,AMS_EnterpriseWarehouse,Error,0,AMSBINT1,AMS_EnterpriseWarehouse,(Job outcome),,The job failed. The Job was invoked by Schedule 29 (AMS_EnterpriseWarehouse). The last step to run was step 1 (step1).,00:00:01,0,0,,,,0
05/09/2007 15:18:00,AMS_EnterpriseWarehouse,Error,1,AMSBINT1,AMS_EnterpriseWarehouse,step1,,Executed as user: AMSOIL\vector. Error: 2007-05-09 15:18:01.53 Code: 0xC0047017 Source: JDE Production to AMS BI 7332 DTS.Pipeline Description: component "F0005" (1) failed validation and returned error code 0xC020801C. End Error Error: 2007-05-09 15:18:01.53 Code: 0xC004700C Source: JDE Production to AMS BI 7332 DTS.Pipeline Description: One or more component failed validation. End Error Error: 2007-05-09 15:18:01.53 Code: 0xC0024107 Source: JDE Production to AMS BI 7332 Description: There were errors during task validation. End Error DTExec: The package execution returned DTSER_FAILURE (1). Started: 3:18:00 PM Finished: 3:18:01 PM Elapsed: 0.766 seconds. The package execution failed. The step failed.,00:00:01,0,0,,,,0
Any help would be greatly appreciated...
Jim
Additionally, when I run this from the command line, I get the following
I'm trying to create a SQL Agent job and schedule that executes an SSIS Package, but when I try to run the job I get the following error:
Executed as user: ad\App1. The package could not be loaded.
The App1 user is the Identity used in the Credential for the Proxy, which has the SSIS Packages subsystem. The package encryption is set to EncryptAllWithPassword, and that password is included in the command line with the /DECRYPT option. The package is stored on the server in the SSIS Package Store and was placed there via the Import Package option. Does the method matter?
One thing that I haven't been able to track down is exactly what permissions the domain account ad\App1 needs on the server or in SQL Server, if any, in order to run the package. Not sure if that has any effect anyway.
---------------------- MS Win XP Pro 2002 SP2 MS SQL Server 2005 MS Visual Studio 2005 ----------------------
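For reference, the credential/proxy setup described in the post above is usually created along these lines (all names are placeholders, and the account still needs whatever file and database permissions the package itself uses):

USE master;
GO
-- Credential holding the Windows identity the job step should run as.
CREATE CREDENTIAL SSISRunCredential
    WITH IDENTITY = N'ad\App1',          -- domain account (placeholder)
         SECRET   = N'<password here>';
GO
USE msdb;
GO
-- Proxy based on that credential, granted to the SSIS subsystem.
EXEC dbo.sp_add_proxy
     @proxy_name      = N'SSISRunProxy',
     @credential_name = N'SSISRunCredential',
     @enabled         = 1;

EXEC dbo.sp_grant_proxy_to_subsystem
     @proxy_name     = N'SSISRunProxy',
     @subsystem_name = N'SSIS';

-- Optionally allow a non-sysadmin job owner to use the proxy.
EXEC dbo.sp_grant_login_to_proxy
     @login_name = N'JobOwnerLogin',
     @proxy_name = N'SSISRunProxy';
GO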
Can anyone help me (even by pointing me to documentation) with scheduling packages (from a file system source) in SQL Server 2005?
I've configured the logging providers, but the error file still doesn't give me any explanation of why the error happens: "#Fields: event,computer,operator,source,sourceid,executionid,starttime,endtime,datacode,databytes,message OnPreValidate,PC1234,NT AUTHORITY\SYSTEM,D_AGR,{8A4FA774-F5F0-40DE-AB16-A93F27950E09},{8A918844-8E43-403D-A606-C8CB4B7D8238},31/08/2006 16:42:55,31/08/2006 16:42:55,0,0x,(null)"
I've also done the same on the Step properties under 'Logging'
In Management Studio I've added my login name; I'm the only user using the machine, and I use both Visual Studio and Management Studio.
The error coming up in the job history is as follows: "Executed as user: PC1234\SYSTEM. The package execution failed. The step failed." "The job failed. The Job was invoked by User UK1\USER123. The last step to run was step 1 (step1)."
By the way, the package (.dtsx) runs fine in BI Visual Studio.
I have stumbled on a problem with running a large number of SSIS packages in parallel, using the "dtexec" command from inside a SQL Server job.
I've described the environment, the goal and the problem below. Sorry if it's a bit too long, but I tried to be as clear as possible.
The environment: Windows Server 2003 Enterprise x64 Edition, SQL Server 2005 32-bit Enterprise Edition SP2.
The goal: We have a large number of text files that we're loading into a staging area of a data warehouse (based on SQL Server 2005, as said above).
We have one "main" SSIS package that takes a list of files to load from an XML file, loops through that list and, for each file in the list, starts an SSIS package by using the "dtexec" command. The command is started asynchronously by using the System.Diagnostics.Process.Start() method. This means that a large number of SSIS packages are started in parallel. These packages perform the actual loading (with BULK INSERT).
I have successfully run the loading process from the command prompt (using the dtexec command to start the main package) a number of times.
In order to move the loading to a production environment and schedule it, we have set up a SQL Server Agent job. We've created a proxy user with the necessary rights (the same user that runs the job from the command prompt) and created the SQL Agent job (there is one step of type "cmdexec" that runs the "main" SSIS package with the "dtexec" command).
If the input XML file for the main package contains a small number of files (for example 10), the SQL Server Agent job works fine: the SSIS packages are started in parallel and they finish their work successfully.
The problem: When the number of concurrently started SSIS packages gets too big, the packages start to fail. When a large number of SSIS package executions are already taking place, the new dtexec commands fail after 0 seconds of work with an empty error message.
Please bear in mind that the same loading still works perfectly from the command prompt on the same server with the same user. It only fails when run from the SQL Agent job.
I've tried to understand the limit at which the packages start to fail, and I believe that the threshold is 80 parallel executions (I understand that it might not be desirable to start so many SSIS packages at once, but I'd like to do it despite this).
Additional information:
The dtexec utility provides an error message where the package variables are shown and the fact that the package ran for 0 seconds, but the "Message" field is empty ("Message: "). Turning logging on in all the packages does not provide an error message either, just a lot of run-time information. The try-catch block around the Process.Start() call in the main package's script task also does not reveal any errors. I've increased the "max worker threads" number for the cmdexec subsystem in the msdb.dbo.syssubsystems table to a safely high number and restarted SQL Server, but this had no effect either.
The request:
Can anyone suggest what could be the cause of the problem? If you have any ideas about how to debug it further, they are also very welcome. Thanks in advance!
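For anyone reproducing this investigation, the syssubsystems change mentioned above amounts to something like the following (200 is just an example value; the change typically needs a SQL Server Agent restart to take effect):

-- Inspect the per-subsystem worker thread caps used by SQL Server Agent.
SELECT subsystem, max_worker_threads
FROM   msdb.dbo.syssubsystems;

-- Raise the cap for CmdExec job steps (example value only),
-- then restart the SQL Server Agent service.
UPDATE msdb.dbo.syssubsystems
SET    max_worker_threads = 200
WHERE  subsystem = N'CmdExec';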
I have migrated several (7) DTS packages to SSIS in a development environment. In this environment these packages all work as designed. I am now in the process of placing the .dtsx files on the production server and scheduling jobs around them.
I am encountering problems in 3 of these 7 packages. I get the following error:
Message: The job failed. The Job was invoked by User <domain\user>. The last step to run was step 1 (Create and Transfer Credentialling Data). This is not what I would call very informative. I have opened the .dtsx files in question in BIDS on the production server and each has the following error reported:
Error loading <name>.dtsx: Failed to decrypt protected XML node "PassWord" with error 0x8009000B "Key not valid for use in specified state." You may not be authorized to access this information. This error occurs when there is a cryptographic error. Verify that the correct key is available.
I have looked at the package setting "ProtectionLevel" and all seven packages have the same setting: EncryptSensitiveWithUserKey. Each of the connections within the packages has been defined to use Windows Authentication, and the jobs are set to run under the SQL Agent account. What I find most puzzling is that each of these packages is defined exactly the same, yet some work correctly and others do not. Thanks for taking the time to help me with this issue.
Hi, is it possible to do event-based scheduling of SSIS packages from SQL Server Agent? For example, schedule a package to run when a file becomes available in a particular folder. Does anyone have any similar experiences to share?
I've found this problem: when I change settings in my configuration file, they are not automatically applied to all child packages that use the same configuration file if run from a job in SQL Server Agent. I need to open the package and save it again from BIDS. I use one "load group" package to execute all the other packages.
Is there a way, from the job configuration, to set things up so the package will always have the newest configuration?
What are the ideal CPU count and Max Degree of Parallelism for a 3rd-party database server? The server has 12 CPUs, 32GB RAM and all database sizes add up to < 30GB, so they can all fit in memory (I tried to force this by doing a SELECT * from every table). On certain payroll days, the CPU gets maxed out at 100% for a few seconds.
MAXDOP was originally set to the default of 0. We later changed it to 8 based on several 'best practices' articles. However, the vendor suggests changing it to 1 (no parallelism), while others suggest changing it to 4, so that one runaway query doesn't hog most of the CPUs.
I'd like to find out how many CPUs are actually being used by queries. There is a Degree of Parallelism event in URL... The BinaryData column says:
0x00000000 indicates a serial plan running in serial.
0x01000000 indicates a parallel plan running in serial.
>= 0x02000000 indicates a parallel plan running in parallel.
What does "parallel plan running in serial" mean?
I see a lot of 0x01000000 and a few 0x08000000's in my trace. How can I determine whether one query is hogging CPUs and whether reducing MAXDOP to 4 will work?
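As a cross-check against the trace, one can also count how many worker tasks each running request actually has at runtime; a rough sketch (requires VIEW SERVER STATE permission):

-- Requests currently using more than one worker task, i.e. actually running in parallel.
SELECT r.session_id,
       COUNT(t.task_address) AS worker_tasks,
       r.cpu_time,
       r.total_elapsed_time
FROM   sys.dm_exec_requests AS r
JOIN   sys.dm_os_tasks      AS t
       ON  t.session_id = r.session_id
       AND t.request_id = r.request_id
WHERE  r.session_id > 50                  -- skip most system sessions
GROUP BY r.session_id, r.cpu_time, r.total_elapsed_time
HAVING COUNT(t.task_address) > 1
ORDER BY worker_tasks DESC;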
I created a main program which launches two jobs at a time; each job does some processing, and at the end I'm trying to delete those jobs after storing the job details in a custom table I created (the cleanup sub-program).
Of the two jobs, I am able to store one job's details (job_name, job_id, start_time and end_time) in the custom table and delete that job, but the job that completes last is neither captured nor deleted from the sysjobs and sysjobhistory tables.
I included this step (which calls the cleanup sub-program to store the job details and delete the job) at the end. I can see from a debug message that the cleanup procedure is getting called, but it is neither storing the details nor deleting the job.
When I execute this cleanup program separately, it does store the job details and delete the job.
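A minimal sketch of this kind of cleanup step (the audit table dbo.JobAudit and the job name are placeholders, not the poster's actual objects):

-- Copy a job's outcome row into a custom audit table, then drop the job.
DECLARE @job_name SYSNAME = N'LoadBatch_Job1';

INSERT INTO dbo.JobAudit (job_id, job_name, run_date, run_time, run_status)
SELECT j.job_id, j.name, h.run_date, h.run_time, h.run_status
FROM   msdb.dbo.sysjobs       AS j
JOIN   msdb.dbo.sysjobhistory AS h ON h.job_id = j.job_id
WHERE  j.name   = @job_name
  AND  h.step_id = 0;            -- step 0 = the overall job outcome row

EXEC msdb.dbo.sp_delete_job @job_name = @job_name;

Note that the step_id = 0 outcome row is only written once a job has finished, which may be why a cleanup that runs as the last step of the job it is auditing finds nothing to store; running it afterwards from the main program avoids that.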
I have a stored procedure that calls several views that rely on each other. In the past these views used to go parallel and use up all 100% of the CPU (12 cores), and now when the same stored procedure runs it only uses 8% of the CPU (1 core). This extends the time spent on the query from roughly 10-15 sec to 2-3min. I'm not quite sure why this is happening.
Are there some obvious things to look at when optimizing views to utilize all cores/threads? Also, it doesn't matter if I set Cost Threshold for Parallelism to 1 or 50 or 5; it is always the same. I have Max Degree of Parallelism set to 0 as well, which should mean all cores are used when available.
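For reference, the two instance-level settings mentioned can be inspected and changed like this (the values are examples only); note that things such as scalar user-defined functions referenced by the views can also force a serial plan regardless of these settings:

-- Show and change the instance-wide parallelism settings (example values).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure 'max degree of parallelism';        -- show current value
EXEC sp_configure 'cost threshold for parallelism';   -- show current value

EXEC sp_configure 'max degree of parallelism', 0;     -- 0 = use all available CPUs
EXEC sp_configure 'cost threshold for parallelism', 25;
RECONFIGURE;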
I have a table with 8 columns and I need to update data in multiple columns on this table. The table contains 1 million records. A single update was taking too long, so I broke it into multiple update statements and run them in parallel; each update statement updates a different column.
This approach is working fine, but I am getting a deadlock error.
Transaction (Process ID 65) was deadlocked on lock | communication buffer resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
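One common way to live with the occasional deadlock victim when running the column updates in parallel is to wrap each statement in a small retry loop keyed on error 1205; a sketch with a made-up table and column name:

-- Sketch: retry one of the parallel column updates if it is picked as a deadlock victim.
-- dbo.BigTable / Col3 are placeholders for the real table and column.
DECLARE @attempt INT = 1,
        @max_attempts INT = 3;

WHILE @attempt <= @max_attempts
BEGIN
    BEGIN TRY
        UPDATE dbo.BigTable
        SET    Col3 = UPPER(Col3);
        BREAK;                          -- success, leave the loop
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() = 1205 AND @attempt < @max_attempts
        BEGIN
            SET @attempt += 1;
            WAITFOR DELAY '00:00:05';   -- back off, then try again
        END
        ELSE
            THROW;                      -- any other error, or out of retries
    END CATCH
END

Splitting the work by key ranges instead of by column is another option, since it keeps the parallel updates off the same rows and pages in the first place.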
I have a scenario where I have to run an update task on multiple servers in parallel, and once all of them are completed (success or failure) another task is to be run on another server.
1. In a maintenance plan, if we add tasks which are not joined, will they run in parallel at the same time?
2. If we link the last task to all the tasks with link type 'completed', will the last task run after all tasks are completed, or when any one of them is completed? (I have a big doubt here.)
The business requirement behind this is to bring data from multiple servers into shadow copies locally and then process them together. It's OK if some server's data transfer fails, but it's not OK to start processing centrally while data transfer is going on. Further, we want to run the data transfer from multiple servers in parallel to save time.
Is the SQL Server Profiler Reads Column Incorrect For Parallel Plans?
I often use profiler as one tool to identify bad plans. The reads column gives me a good indication of excessive IO to dig into and correct if necessary. I often use it with Showplan so I can see what a query does, replicate it and fix it.
However, I have just lost some faith in it. I am looking at a poorly performing query joining five tables. A parallel plan has been generated and one table is being scanned (in parallel) due to a missing index. This table had in excess of 4 million rows in it. The rest hit indexes well. However, the entire query generates ONLY 12 READS.
Once corrected, a single-processor plan is used. This looks really efficient and uses 120 reads. That looks like the right figure to me.
Does the profiler only display one thread of a parallel plan perhaps? Or something else?
I have a series of SSIS packages which populate 12 different databases. Which data source & target they use is controlled by values passed down from the SQL Agent Scheduler Job Step using "Set Values"
These values are passed to Variables which are used in the Expressions in the connection manager for the database to change the connection string and initial catalog.
e.g. REPLACE( @[User::ConString] , "Royalty", @[User::RoyDb] )
These jobs run successfully the majority of the time, but each day I get a significant number of failures where one or more target or error-trapping tables can't be found. They are all using the same connection manager, and most of the tables in the database get updated correctly, but the job fails on trying to access one or two of the tables, with the following message in the OnError event:
-1071636248,0x,Opening a rowset for "[dbo].[RIGDUE_DAY]" failed. Check that the object exists in the database.
This happens both when I schedule the jobs in parallel with other jobs running the same packages and when I run the job by itself using the right-click "Start Job at Step" option. For example:
I had one fail last night. I ran it by itself this morning and it failed; I ran it again and it succeeded. Nothing concerning the data it was transforming had changed.
I have applied the hotfix to Service Pack 1 concerning the parallel use of variables in a package, as referred to in the following link.
I am not able to log in to SSMS. While logging in I got this error; how do I troubleshoot it?
Cannot connect to LAASYA.
A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server) (Microsoft SQL Server, Error: 2)
[URL]
The system cannot find the file specified