I have a package that is doing some file transformation (Text, XML, and Excel) job based on a variable value. This package is called by a Parent package, where I am calling this package parallel through a script Task. So there are three parallel script task and all variables are local to script task.
In Script Task I am assigning value to child package variable using following code.
I have a SQL Server 2000 instance running on a Windows Server 2003 box with 4 processors. SQL Server is configured to use all 4 processors, and use all available processors for parallelism.
I have created a simple DTS package which has 2 "execute external process" tasks with no precedence constraints between them. There are no connections required or defined for the two tasks (sequential processing is forced on tasks sharing connections). The DTS package properties have the "limit the number of tasks to execute in parallel" set to 4.
However, despite the above configuration, the two steps are never executed in parallel, but always sequentially.
Does anyone have any ideas as to why these tasks are not being executed in parallel?
Does an UPDATE statment lock the entire table or just the rows that will be affected by the UPDATE?
I ask because -
Can I run UPDATE statements in parallel on the same table on the same column. The need for doing this is because the table is a large fact table. I plan to execute the same UPDATE statements on different time sections of the data to expedite the processing.
If the UPDATE statment lock the entire table then I cannot run an UPDATE in parallel. If the UPDATE statement just locks the rows that will be affected then maybe I can because rows affected will be different for each UPDATE.
In my application code I am trying to invoke multiple threads in which each thread is loading an instance of the same SSIS package and would initialize the package variables with different values and execute the different instances in parallel. In each thread - after the package execution has completed successfully - I read that instance's SSIS package variables to get result information from that Instance run.
When I load the same package in different thread using LoadFromSqlServer() method - does the code create multiple instances of the SSIS package and load the distinct instances in each of the thread - Will the Package Execution ID be different for the different instances? - Are the package level variables instance safe?
I try to find out how many jobs where run in parallel on my server in an interval of time. For example: between 1:00 AM and 2:00 AM there were MAX 66 jobs that run in parallel and MIN 4 jobs. I am not sure if I can find this info out from a system view or I need to play with sysjobhistory view.
I'm expecting to run 3 isloated version of the package with in first version VARA=1 VARB=0 VARC=0 second version VARA=0 VARB=1 VARC=0 third version VARA=0 VARB=0 VARC=1 but it doesn't seem like doing that the maxconcurrent variable is set to 40 to be on the safe side.
when I run I get
first version VARA=1 VARB=0 VARC=0 second version VARA=1 VARB=1 VARC=0 third version VARA=0 VARB=1 VARC=1
I have a system of SSIS packages in which several packages perform the same lookup on the same table. E.g., i have PackageA, PackageB and PackageC all doing a lookup on TableA. All of these packages are spawned by the same PackageD and run frequently. In some cases, there is an issue with concurrency on these lookups. I get the following exception :
" The ProcessInput method on component "LKP Lookup SecurityID" (6658) failed with error code 0xC004702C. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.
"
The hex code of this exception corresponds to the following description : "DTS_E_BUFFERNOTLOCKED. This buffer is not locked and cannot be manipulated." That's as much as i could find on this.
My suspision is that the SSIS engine somehow figures that the lookup in these distinct packages is the same one and builds a shared version of the lookup table in memory. Then there is some sort of a multi-threading issue in accessing this shared memory which leads to the exception above.
Has anyone experienced this? Can someone shed some light on this?
Hi list, I'm a long time lurker on this list and really enjoy the discussions, although I rarely get a chance to participate.
Here is my situation: We are importing chunks of data (500 records at a time) from a C++ interface. The records have to be transformed before inserting into the target table which I am doing using a stored proc which is working fine. The records are in memory in C++ and the programmer is looping through the records building inserts into a temp table through ADO (which my proc picks up). The server business object is using the connection.execute method which is inserting one record at a time. That part of the process is taking over 15 seconds for 500 records which is the bulk of the total time.
My question is: Using ADO is there a better way to insert these records into the temp table? I see mention of a recordset interface but my programmers are new to ADO and since I am the DBA and have never used ADO, I am not sure what to tell them.
Hi,Before stepping into ado.net code to perform an insert or update, the insert / update has already taken place, just on starting the debugger. I use VS 2005 on SQL Server 2000. This did not happen with VS 2003 and SQL Server 2000.Anyone else encountered this?
I have stumbled on a problem with running a large number of SSIS packages in parallel, using the €œdtexec€? command from inside an SQL Server job.
I€™ve described the environment, the goal and the problem below. Sorry if it€™s a bit too long, but I tried to be as clear as possible.
The environment: Windows Server 2003 Enterprise x64 Edition, SQL Server 2005 32bit Enterprise Edition SP2.
The goal: We have a large number of text files that we€™re loading into a staging area of a data warehouse (based on SQL Server 2k5, as said above).
We have one €œmain€? SSIS package that takes a list of files to load from an XML file, loops through that list and for each file in the list starts an SSIS package by using €œdtexec€? command. The command is started asynchronously by using system.diagnostics.process.start() method. This means that a large number of SSIS packages are started in parallel. These packages perform the actual loading (with BULK insert).
I have successfully run the loading process from the command prompt (using the dtexec command to start the main package) a number of times.
In order to move the loading to a production environment and schedule it, we have set up an SQL Server Agent job. We€™ve created a proxy user with the necessary rights (the same user that runs the job from command prompt), created an the SQL Agent job (there is one step of type €œcmdexec€? that runs the €œmain€? SSIS package with the €œdtexec€? command).
If the input XML file for the main package contains a small number of files (for example 10), the SQL Server Agent job works fine €“ the SSIS packages are started in parallel and they finish work successfully.
The problem: When the number of the concurrently started SSIS packages gets too big, the packages start to fail. When a large number of SSIS package executions are already taking place, the new dtexec commands fail after 0 seconds of work with an empty error message.
Please bear in mind that the same loading still works perfectly from command prompt on the same server with the same user. It only fails when run from the SQL Agent Job.
I€™ve tried to understand the limit, when do the packages start to fail, and I believe that the threshold is 80 parallel executions (I understand that it might not be desirable to start so many SSIS packages at once, but I€™d like to do it despite this).
Additional information:
The dtexec utility provides an error message where the package variables are shown and the fact that the package ran 0 seconds, but the €œMessage€? is empty (€œMessage: €œ). Turning the logging on in all the packages does not provide an error message either, just a lot of run-time information. The try-catch block around the process.start() script in the main package€™s script task also does not reveal any errors. I€™ve increased the €œmax worker threads€? number for the cmdexec subsystem in the msdb.dbo.syssubsystems table to a safely high number and restarted the SQL Server, but this had no effect either.
The request:
Can anyone give ideas what could be the cause of the problem? If you have any ideas about how to further debug the problem, they are also very welcome. Thanks in advance!
I am working on SQL Server 7.0. Every weekend we go for reindexing of some tables. I want to know if it is possible to run the re-indexing of tables in parallel so that I can save time.
Our database is of size 80GB and one table is around 22GB. Rebuilding of index on this table takes a lot of time and we are unable to index the other tables.
hi, we currently use the Database Maintenance Plan to do backups for our SQL Server 2000 databases. I notice that the database are backed up one after the other.
I would like to know how to run the backups in parallel rather than sequentially. To do this, is there any dependency on the number of CPUs?
I created the package to download 4 ftp files at once. I set the MaxConcurrentExecutables for the SSIS package to 4. So in BIDS in downloads 4 files at the same time.
However, when I started the job I noticed that only 3 files were downloaded at the time (looking at temp files in download directory)
Solution: Sure enough after digging around for awhile - in Step properties for SSIS package - there is execution tab - and "Maximum Concurrent Executables" was -1 (which for some reason defaults to 3 concurrent processes even on our dual CPU server) - so after chanign that value to 4 - tada - all 4 files in parallel
Is there any way to run a stored procedure in parallel to another one? i.e. I have a stored procedure that sends an email. I then scan a table and send any unsent emails. I do not want the second part to slow the response to the user.
Assuming I have a line, is there a function I can call to create a parallel line at a given distance away.i.e - with the below I would want to draw a parallel line to the one output.
Hi ,I need to place the results of two different queries in the same resulttable parallel to each other.So if the result of the first query is1 122 343 45and the second query is1 342 443 98the results should be displayed as1 12 342 34 443 45 98If a union is done for both the queries , we get the results in rows.How can the above be done.Thanks in advance,vivekian
Hi,All. I'm writing test cases on C# for a few methods that make changes in database.To prevent making changes I used BeginTransaction-Rollback,everything was good.But this doesn't work if tested method has BeginTransaction-Rollback code itself.An error appears in NUnit: System.InvalidOperationException : SqlConnection does not support parallel transactions. Do smb know how to solve the problem?
I have several packages within secuence containers and into one main dtsx package with a checkpoint configuration and when I run it some succeed and some don´t. The problem is that when I rerun it checkpoint doesn´t seem to work ´cause some of the successful packages are rerun as well (and not skipped as it should be...) In other words, the process does not begin on the point of failure..
Seems to be that packages that finish after the failure point (and succeed) are not registered in the checkpoint file, then when I rerun the main package these succeeded packages are rerun too....
what the ideal CPU count and Max Degree of Parallelism are for a 3rd party database server.The server has 12 CPUs, 32GB RAM and all database sizes add up to < 30GB so they can all fit in memory (I tried to force this by doing a select * from every table). On certain payroll days, the CPU gets maxed out to 100% for a few seconds.
MAXDOP was originally set to the default 0. We later changed it to 8 based on several 'best-practices' articles. However the vendor suggests to change it to 1 (no parallelism), while others suggest changing it to 4, so that one run-away query doesn't hog most of the CPUs.
I'd like to find out how many CPUs are actually being used by queries. There is a Degree of Parallelism event in URL.... The BinaryData column says :
0x00000000, indicates a serial plan running in serial. 0x01000000, indicates a parallel plan running in serial. >= 0x02000000 indicates a parallel plan running in parallel.- What does "parallel plan running in serial" mean ?
I see a lot of 0x01000000, and a few 0x08000000's in my trace.How can i determine whether one query is hogging CPUs and if reducing it to 4 will work?
I have another post here regarding SQL 2005 running a query 50% slower than on 2000. It was discovered that 2005 runs the query in series whereas 2000 runs it in parallel.
Even with "Cost Threshold For Parallelism" set to a default value – 0, 2005 still executes my query in series. Does anyone know how to force a query to run in parallel in SQL 2005. I specifically want to set it at the database level.
I have a SQL 7 db with a union query (view), and I'm getting the error, "Thequery processor could not start the necessary thread resources for parallelquery execution." This union query has been in place for about two years nowwith no problems until just now, though I haven't changed anything. Also, Ihave a local copy of the database on my machine, and the query runs fine.As noted, I haven't changed anything in the query, nor in the SQL settings.There is a network administrator, so it's possible that he may have changeda setting, but I don't know what. The query is reproduced below. Any ideasas to what's going on would be appreciated.NeilMain query:SELECT Tmp.INVCUST, Tmp.SDNBR, Tmp.SDBOOK, Tmp.SDIVCLN,Tmp.SDPAID, Tmp.SDPRICE, Tmp.SDCOPIES, Tmp.Location,INVTRY.AUTHILL1, Tmp.INVDATE, INVTRY.SaleSrc,INVTRY.HoldInitFROM (SELECT INVDATE, INVCUST, SDNBR, SDBOOK, SDIVCLN,SDPAID, SDPRICE, SDCOPIES, 'P' AS LocationFROM vwInvoiceDetUNION ALLSELECT INVDATE, INVCUST, SDNBR, SDBOOK, SDIVCLN,SDPAID, SDPRICE, SDCOPIES, 'N' AS LocationFROM vwInvoiceDetNUNION ALLSELECT INVDATE, INVCUST, SDNBR, SDBOOK, SDIVCLN,SDPAID, SDPRICE, SDCOPIES, 'M' AS LocationFROM vwInvoiceDetM) Tmp INNER JOINdbo.INVTRY ON Tmp.SDBOOK = dbo.INVTRY.[Index]vwInvoiceDet:SELECT tabInvoice.INVDATE, tabInvoice.INVCUST,SALEDET.SDNBR, SALEDET.SDBOOK, SALEDET.SDINVNUM,SALEDET.SDPRICE, SALEDET.SDPAID, SALEDET.SDCOPIES,SALEDET.SDIVCLN, tabInvoice.INVNBR, SALEDET.SDIDFROM dbo.tabInvoice INNER JOINdbo.SALEDET ONdbo.tabInvoice.INVNBR = dbo.SALEDET.SDNBR(vwInvoiceDetN and vwInvoiceDetM are similar to vwInvoiceDet.)
SQL Server 2000 SP3ALast week one of our processes starting issuing or suffering deadlockdetected errors every 15 minutes or so.I have read several articles at MS on the subject. I set a couple ofstartup parameters related to producing deadlock detection informationand ran SQL Profiler. I found the SQL statements being issued by thedeadlocked statements. In every deadlock the same UPDATE statementappears however the data values being searched on are different. Thebest I can tell from trying to query the actual data each update hitsonly one or very few rows. No indexed column is updated so the indexesshould not be the source of conflict.Looking at the query I noticed that the query does not have anavailable index and Query Analyzer shows that the full table scan isbeing done in parallel.My question: Does SQL Server change or modify its locking rules whenqueries are converted to be ran using parallel processing? If so, doyou have a reference?Here is the deadlock entries posted to the error log:SPID=167ResType:LockOwner Stype:'OR' Mode: IX SPID:63 ECID:0 Ec:(0x65971510)Value:0x3c577e60 Cost:(0/0)Input Buf: Language Event: UPDATE Station_Upload setStation_Accept_Status = 'ACC',HeadStatus ='ACC',LastProcessedSta='110',HeadPartType='1' WHERE Part_Serial_No ='SCH1119323' AND Station = 'H110'SPID=63ResType:LockOwner Stype:'OR' Mode: IX SPID:167 ECID:0 Ec:(0x65801510)Value:0x3c27d060 Cost:(0/0)Input Buf: Language Event: UPDATE Station_Upload setStation_Accept_Status = 'ACC',HeadStatus ='ACC',LastProcessedSta='70',HeadPartType='1' WHERE Part_Serial_No ='SCH1119060' AND Station = 'H070'I have suggested adding an index to support the query.Any ideas?Thanks -- Mark D Powell --
I have seen a number of posts regarding parallel development of SSIS packages and need some further information.
So far we have been developing SSIS packages along a single development stream and therefore have managed to avoid parallel development of our packages.
However, due to business pressures we will soon have multiple project streams running in parallel, and therefore multiple code branches, as part of that we will definitely need to redevelop the same SSIS packages in parallel. Judging from your post above and some testing we have done this is going to be a nightmare as we cannot merge the code. We can put in place processes to try and mitigate this but there are bound to be issues along the way.
Do you know whether this problem is going to be fixed? We are now using Team Foundation Server but presumably the merge algorythm used is same/similar to that of VSS and therefore very flaky?
However, not only are we having problems with the merging of the XML files, but we also use script tasks within the packages which are precompiled, as the DTSX files contain the binary objects associated with the script source code, if two developers change the same script task in isolated branches the binary is not recompiled as the merge software does not recognise this object.
Do you know whether these issues have been identified and are going to be fixed to be in line with the rest of Microsoft Configuration Managment principles of parallel development?
I have done a search and have read some of the posts, but am left more confused than before. I am fairly new to SSIS. Here is my situation and what i am trying to accomplish.
I have a package that has a sequence container, in which there are multiple SQL tasks (about 20) running in parallel. I have checkpoints enabled, and FailPackageOnFailure enabled as well. If the package fails, when i re-run the package it will run the last task as well as all the other tasks. What I am looking to accomplish is when the package is re-run, have the SQL tasks that failed ran and not the previous successful tasks.
I think the best way would be via disabling tasks on successful completion of a task, where it writes the name of the SQL task to a temp table, but I am skeptical.
Can anyone point me in a direction to help me accomplish what I am looking for please.
Hey guys. I'll try to keep this as short as possible.
Was testing out some thresholds with a simple service broker application and ran into something interesting that I'm trying to find a workaround to. I setup a simple service on a single instance, and am transmitting messages to/from the same instance. I assigned a procedure to be activated on the queue, with a max of 20 parallel activations.
The procedure originally was reading messages off the queue by following a process with major steps outlined here:
Start loop until no more messages to process Call get conv. group with 3000 timeout into variable x If group retrieved, receive top 1 from queue where conv_group = x
If no group, exit Process message(s) received (simply insert some of the msg data into user table, then waitfor 1 second to simulate doing some work) for the given group Loop to get next group and messages
So, with this type of configuration, I sent off a few thousand messages to the queue and monitored the queue count, user table count, and activated tasks count. To my surprise, despite the fact that there were thousands of messages in the queue waiting to be processed, only a single task was processing data, pulling about 1 per second.
I tried over and over with different values in the waitfor, and no additional tasks got spun up until I had a waitfor of 6 seconds or more...then only about 3-4 would get spun up at any time in parallel.
Given the way activation occurs, I thought I'd try things without the call to get conv group, and instead simply use a receive without a where clause...so, I rewrote the activation procedure to skip the call to GET CONV GROUP, and simply perform a receive without any where clause. I also removed the waitfor delay and simply left it to process messages as fast as it could...
This time, I sent over the same number of messages and checked the same counts, this time noticing that within 20 seconds or so, all 20 parallel instances of the activation procedure were processing messages off the queue, and doing so very quickly. Same occured if I put the waitfor delay back in...
The problem is, that this type of coding doesn't allow for processing related messages, even though it seems to be much better at allowing parallel activations. I tried to rewrite again using 2 types of receive statements (first one without a where clause and next one with a where clause on the conv. group from the first receive after processing the message) to see if that would allow for better parallel activation, however that worked about the same as using the GET CONV GROUP call.
So, if I have an application that mostly does not make use of grouping and want to ensure optimal parallel activation when needed, however also handle the times when related messages arrive and need to be processed together, what can I do? Any ideas?
I have test code scripts that can reproduce this behavior if needed. Thanks up front,
I'm working on an SSIS package whose job it is to push data from x # of tables to x# of servers. Right now, I have it implemented with nested ForEach loops. ForEach Server loop inside of a ForEach table loop. It works great, but it will take too long as it is serial. I need a way for the inner operation to take place in parallel. In SQL 2000, the way I 'm doing this is by spawning a SQLAgent job per server. I could always do this here as well, but I'm interested in finding out if there is a way to do this with a single package.
I'm okay with process the table serially, but for each table I'd like to be able to parallel the push to the servers. Can't do it with multiple data flows because there are N # of servers being pushed to.
I have three SQL tasks executing in parallel in an Integration Services package.
+-B-+ A-+-C-+-E +-D-+
It starts with task A; then B, C, and D all execute in parallel; and finally task E runs after BCD are done.
B, C, and D are all Execute SQL tasks, all with the same connection manager. Here is their code:
B) SELECT CASE WHEN COUNT(*) = 0 THEN 0 ELSE 1 END AS Process FROM temp_B
C) SELECT CASE WHEN COUNT(*) = 0 THEN 0 ELSE 1 END AS Process FROM temp_C
D) SELECT CASE WHEN COUNT(*) = 0 THEN 0 ELSE 1 END AS Process FROM temp_D
Each one is setting a binary value to a package variable (using Result Set settings) based on the count of records from different tables.
This works with no problems when I run it against one server (development). But when I switch to the production server, task B and D both fail. I'v checked to make sure all of the temp tables exist in the database for that connection manager and that all three have the same connection manager - all is okay.
Here's the trickier part. When I'm still pointing to the production server and I run these tasks individually, they are all successful. It is only when they are attempting to run in parallel that they fail.
Here is the Output error: Error: 0xC002F210 at Process Med?, Execute SQL Task: Executing the query "SELECT CASE WHEN COUNT(*) = 0 THEN 0 ELSE 1 END AS Process FROM temp_B" failed with the following error: "Invalid object name 'temp_B'.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.