Integration Services :: Creating Parallelism By Executing Many Dtexecs
Sep 1, 2015
I'm currently looking at refactoring an existing, large SSIS 2012 implementation that consists of about 55 projects and 360+ packages. The ETL framework in use has a "main" control package that reads from a database table, determines which packages are ready to execute (based on some dependency logic), and then uses an Execute Process task within a loop that shells out to dtexec with the arguments: /C start dtexec /SQL "Some Package Path" /SERVER "someserver"
This design allows the loop to execute a package and then immediately iterate, because it doesn't wait for the package to respond (i.e., complete with success or failure), so it can quickly kick off as many packages as are ready to execute. A SQL Agent job calls this package every few minutes so that it can pick up any packages that have had their dependencies satisfied since the last execution and kick those off. It's a clever design but has some problems, such as decentralized exception handling (the parent package is unaware of what is happening in the "asynchronous" dtexec calls). My biggest concern is that by executing packages not with the Execute Package Task but with the Execute Process Task, and spinning up many dtexecs, the framework is not leveraging SSIS's ability to handle threading, memory consumption, etc. across all running packages and executables, because it is simply unaware of them. It's essentially like using an Execute Package Task with the ExecuteOutOfProcess property set to true.
I created an SSIS package which loads a CSV into a database. The package is called in a C# console application which is set up as a scheduled task on a server. The problem I am having is that the package hangs during the validation stage: "Truncation may occur due to inserting data from data flow column "Reading Type" with a length of 50 to database column "ReadingType" with a length of 2." There is no problem loading the same data from the development machine.
I've developed an SSIS package and am able to launch and execute the dtsx package using a web service/console application. If I run it from my local machine it has no issues. If my client runs this web service it throws the error "Unable to connect to remote server".
If my client runs the package itself directly, they get the following error: "AcquireConnection method call to the connection manager failed"...
Even with SQL Server, Integration Services and BIDS installed on my client's machine, he gets the above error. How do we set this up? SQL Server Agent is not an option for us.
I am using SQL Server 2012 SP1. I have built an SSIS package that imports flat file data from various files to SQL Server. I have got it to do everything I want when things are going well, and am now working on what I want it to do when it encounters a failure executing specific tasks and containers. For example, I have a Foreach Loop container that executes a dedicated stored procedure for each csv file in the target folder. If any of the stored procedures fail to run for any reason, I want to carry out certain actions.
For the most part I think I will be fine using the Event Handlers. What I can't seem to find is how to tell the package to stop executing on a Failure event after carrying out the actions defined by the relevant Event Handler. Or, perhaps it isn't necessary as that would be the default behaviour on a failure?
I have a SQL Agent job which runs the master package; the master package has around 35 child packages called within it. The issue with this master package is that it runs up to 30 child packages and then I get a success message on the job. It does not execute packages beyond 30; what could be the problem? I disabled all 30 of those child packages and ran it again, and it ran the other 5 child packages without any problem.
I have to execute a .bat file on a remote server (it is used to stop and start services of an application). The remote server doesn't have SSIS or SSMS installed. I want to create a package on my desktop that uses an Execute Process task to run the .bat file on the remote server, and then schedule it using SSMS.
I need to grab data from Teradata (using an ODBC connection). I have no issues if it's just a bunch of joins and WHERE conditions, but now I have a challenge. Simple scenario: I have to create a volatile table, dump data into it, and then grab data from that volatile table. (I don't want to modify the query so that it avoids the volatile table; it's a pretty big query, and I have no choice but to create a bunch of volatile tables. The scenario above is just simplified to a single volatile table.)
So I created a proc and am trying to pass this string into Teradata, though I'm not sure it works. What options do I have? (I don't have the leisure to create a proc in Teradata, execute it whenever I want, and then grab data from the table.)
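For reference, the volatile-table pattern on the Teradata side looks roughly like the sketch below. This is a hedged illustration in Teradata SQL with hypothetical table and column names, and it assumes the session (connection) that creates the volatile table is the same one that later selects from it, since volatile tables are session-scoped.
Code Snippet
CREATE VOLATILE TABLE vt_orders AS (
    SELECT order_id, cust_id, amount        -- hypothetical source columns
    FROM   sales.orders
    WHERE  order_dt >= DATE '2015-01-01'
) WITH DATA
ON COMMIT PRESERVE ROWS;

SELECT c.cust_name, v.amount
FROM   vt_orders v
JOIN   sales.customers c ON c.cust_id = v.cust_id;
In an SSIS context this means the CREATE and the final SELECT must run over the same ODBC connection, with connection pooling/retain settings arranged so the session is not recycled in between.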
We are executing an SSIS package using an xp_cmdshell command in a stored procedure, as shown below. This package takes almost 90 minutes to execute and does complete successfully. But the strange thing is that we don't get the result in the @result variable, because somehow the next SQL statement after the highlighted statement doesn't get executed at all. After checking the execution stats for the SP using the query attached below, we observed that the SP somehow vanishes from the execution stats for the server.
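A hedged sketch of the usual pattern for capturing dtexec's outcome through xp_cmdshell (the package path and server name here are hypothetical):
Code Snippet
DECLARE @result INT;

-- run the package and capture the command's outcome
EXEC @result = master..xp_cmdshell
     'dtexec /SQL "\LoadWarehouse\MainPackage" /SERVER "ETLSERVER"',
     no_output;

SELECT @result AS exit_status;   -- 0 = success; nonzero indicates the command failed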
I'm executing an Oracle procedure which has three OUTPUT parameters and returns results in a table-type variable. I should not use the ODBC or MSDAORA providers to call the procedure, so I'm planning to use the Oracle OLE DB provider. I'm able to execute the procedure successfully, but when I check (While dr.Read()) it is not returning any records. Per the stored procedure's results, I know it should return 66 records.
Dim conn As New OleDbConnection
Dim cmd As New OleDbCommand
Dim dr As OleDbDataReader
Dim QSQL As String
We run Standard 2008 R2. I need to recreate flat files from their varbinary(max) equivalents in our DB. I have a mix of Excel, PDF, Word, etc. to recreate. Will SSIS be a good tool for doing this? I'm wondering what transform(s) would be involved.
Perhaps I need to cast to varchar first and then land the data, but if I recall correctly there is a maximum record length for SSIS flat file destination rows. And I'm thinking I would have to map the varbinary (or its cast equivalent) to a row in the destination once for each file created.
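One detail worth noting: the SSIS data flow includes an Export Column transformation that writes a blob column out to a file whose path comes from a second column, which sidesteps the flat-file row-length limit entirely. A hedged sketch of the source query that would feed it (table and column names hypothetical):
Code Snippet
SELECT  f.file_id,
        f.file_data,                                    -- the varbinary(max) column to export
        'C:\Restore\' + f.file_name AS target_path      -- per-row output path for Export Column
FROM    dbo.StoredFiles AS f;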
If I have an XML file without an XSD, what is the best way to create and import the data into SQL Server? I know I can use xsd.exe to create an XSD from my XML.
But if I want my structure to be somewhat different in SQL Server, how would I go about creating a reliable and repeatable import system for my data so I can easily manage data updates?
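For what it's worth, a repeatable import can be sketched in plain T-SQL without an XSD by bulk-loading the file and shredding it with nodes(); the file path, element names, and target table below are all hypothetical.
Code Snippet
DECLARE @x XML;

-- load the raw file into an XML variable
SELECT @x = CAST(BulkColumn AS XML)
FROM   OPENROWSET(BULK 'C:\Data\input.xml', SINGLE_BLOB) AS src;

-- shred it into whatever relational shape you prefer
INSERT INTO dbo.Items (ItemId, ItemName)
SELECT  n.value('(Id)[1]',   'INT'),
        n.value('(Name)[1]', 'NVARCHAR(100)')
FROM    @x.nodes('/Root/Item') AS t(n);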
I am creating my first Integration Services project and am getting an error immediately after creating the project. The error is in the designer and displays a red X, saying "Microsoft Visual Studio is unable to load this document: Object reference not set to an instance of an object". Clicking on the error in the error window just opens up the xml file which contains the following:
I have VS2005 SP1, running Windows XP SP2. Integration Services is running on my machine, but I would think that this shouldn't matter anyway, as the error happens before I have even indicated a data source to use. If I take the xml file with the error and copy it to a new SSIS project on another machine, it works fine. Reinstalling VS2005 had no effect. I am now reinstalling SQL Server 2005 SP2 to see what effect that may have.
I'm trying to create a file using a stored procedure with SSIS. I've tried to use the Execute SQL Task, but it will not create a file or output to one. I'm using "Full result set", but I don't know how to send the result to the file. Is there another Control Flow item I need to use? The reason I'm using SSIS and not SQL Server Agent is that the file name must contain a timestamp.
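Within SSIS, the usual pattern for this is a Data Flow (an OLE DB Source running the procedure into a Flat File Destination) with an expression on the Flat File connection manager's ConnectionString to inject the timestamp. Outside SSIS, a hedged T-SQL sketch with bcp, assuming xp_cmdshell is enabled and with hypothetical names throughout:
Code Snippet
DECLARE @stamp CHAR(8);
DECLARE @cmd VARCHAR(500);

SET @stamp = CONVERT(CHAR(8), GETDATE(), 112);   -- e.g. 20150901
SET @cmd = 'bcp "EXEC MyDb.dbo.MyExportProc" queryout "C:\Out\export_'
         + @stamp + '.txt" -c -T -S MyServer';

EXEC master..xp_cmdshell @cmd;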
I am creating SSIS packages with a Conditional Split. The condition is SUBSTRING(EnglishProductName,1,1) == "A". The package executes successfully, but data does not move from the Conditional Split transformation to the OLE DB destinations, and it shows no error.
I have a transformation where the final result set gives me 25 rows of data. Now, before I put it into the destination table, I need to add another column which will show how many total records we have.
My dataset:
A 20 abc
B 24 mnp
C 44 apq
Now I need to add another column within my transformation before I store the result set to the destination, like this:
A 20 abc 3
B 24 mnp 3
C 44 apq 3
Here, the new column gives the count of total rows in our dataset, which was 3.
How can I achieve this? Can I use a Derived Column for this?
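A Derived Column works one row at a time, so it can't compute a grand total by itself. If the source is a query, one hedged option is to attach the total with a window function in T-SQL before the data flow ever sees it (table and column names hypothetical):
Code Snippet
SELECT  t.Code,
        t.Qty,
        t.Label,
        COUNT(*) OVER () AS TotalRows   -- the same grand total repeated on every row
FROM    dbo.SourceTable AS t;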
I've been running around in circles all afternoon trying to create one simple report using Reporting Services (with the latest SP2 installed) and SharePoint 2007. To the best of my knowledge, I have everything configured correctly:
When I access http://<server>/ReportServer, I see the server name of my SharePoint site. When I click on the name of my SharePoint site, it shows me the directory structure I have created within my SharePoint site. When I drill down in the directory, I can ultimately see the forms I created in my forms library (created via InfoPath 2007).
The next step is to create one simple report from the data in one of these forms libraries and a report on all the items within a form library. I'm stuck at the first step of creating a report, namely what to enter as the Data Source and the connection string. With a SQL database this isn't an issue.
How does one create a data source that will allow reporting over SharePoint content with the setup described above? And, if you have information that is found in the SQL Books Online, please be kind and post links so others know where to find this information.
I want to achieve the following in SSIS/SSDT for SQL 2012 -
I have a generic SSIS package which simply sends out email notifications using an SMTP email task (this package is within its own project and has project-level input parameters).
I need to be able to call this package in the Event Handlers section of every package (numbering a little under 60) that we have. These packages are within their own respective projects.
I thought I could use the Execute Package Task, but it turns out that, using it, I cannot call a package that is part of another project. I also cannot call a package that is stored in the CATALOG. Is there any way I can do this?
When I call the child package, I should be able to send in parameters like the error information and the package name of the parent package.
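One hedged possibility: a package deployed to the catalog can be started from plain T-SQL via SSISDB's stored procedures, which an Execute SQL Task in any package's event handler could run. The folder, project, package, and parameter names below are hypothetical.
Code Snippet
DECLARE @exec_id BIGINT;

EXEC SSISDB.catalog.create_execution
     @folder_name  = N'Shared',
     @project_name = N'Notifications',
     @package_name = N'SendEmail.dtsx',
     @use32bitruntime = 0,
     @execution_id = @exec_id OUTPUT;

-- pass the caller's details in as package parameters
EXEC SSISDB.catalog.set_execution_parameter_value @exec_id,
     @object_type = 30,                       -- 30 = package parameter
     @parameter_name  = N'ParentPackageName',
     @parameter_value = N'LoadSales.dtsx';

EXEC SSISDB.catalog.start_execution @exec_id;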
Bearing in mind that this is my target server: what is the way to create the shared folder needed to perform the operation from the title (and, of course, to continue with the installation of packages, etc.)? SQL Server 2008 R2
I have a scenario where I need to create SQL Server tables dynamically.
I have multiple XML data files in a particular location and want to load the XML data into SQL Server tables, but the metadata of each XML data file is not the same.
Hence the approach is (see the sketch after this list):
1. Pick the first file from that location.
2. Create a table according to that XML data file's metadata.
3. Load the data into the newly created table.
4. Pick up the next XML data file.
5. Loop through until no XML data files remain in that location.
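As a hedged sketch of step 2, the column list for the CREATE TABLE can be derived from the first record of each file with SQL Server's XML methods. Everything below (the path, the element shape, the blanket NVARCHAR(255) typing) is an assumption for illustration.
Code Snippet
DECLARE @x XML, @ddl NVARCHAR(MAX);

SELECT @x = CAST(BulkColumn AS XML)
FROM   OPENROWSET(BULK 'C:\Data\file1.xml', SINGLE_BLOB) AS b;

-- build one column per child element of the file's first record
SELECT @ddl = N'CREATE TABLE dbo.Imported_File1 ('
     + STUFF((SELECT N', ' + QUOTENAME(c.value('local-name(.)', 'NVARCHAR(128)'))
                     + N' NVARCHAR(255)'
              FROM @x.nodes('/*/*[1]/*') AS t(c)
              FOR XML PATH('')), 1, 2, N'')
     + N');';

EXEC (@ddl);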
I am not sure if this is the correct forum to discuss the document posted at http://www.microsoft.com/downloads/details.aspx?familyid=1c2a7dd2-3ec3-4641-9407-a5a337bea7d3&displaylang=en on SQL Server Integration Services (SSIS) Hands-on Training - Creating Custom Components.
I am assuming Microsoft Developers are constantly monitoring this forum.
In the document - SSIS Creating a Custom Transformation Component.doc, on Page 2 - Exercise 1 - Writing the no-op data flow transformation component - Task 1 - Create a new C# Class Library Project:
The textual description talks about creating a new Visual C# Class Library project in VS 2005, but the accompanying screenshot shows the creation of a new "Integration Services Project" in VS 2005.
Please change the screenshot appropriately to avoid confusion.
I've got an SSIS solution file using the project deployment model in VS 2013 and would like to deploy it to SSISDB in different environments. All this time I have followed the regular way: create a project in SSISDB and deploy to it. Now I want to find out if I can automate this process, so I have some questions (a sketch follows the list):
1. Can we automate the process of creating a project in SSISDB based on our SSIS project name? That is, when we do a deployment it should check whether the project exists in SSISDB based on our SSIS project name; if the project exists we just deploy the packages in the project, and if it does not exist it should create the project and then deploy the packages.
2. Can we also automate the process of creating environments? The traditional way is to manually create the environment variables under the Environments node of SSISDB, but can we make that part of the deployment as well? For example, when we release to the Dev server we check whether the particular Dev variable exists on that server; if it exists we just update the existing value, and if it does not exist we create it.
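Both questions can be approached at the T-SQL level: SSISDB exposes stored procedures that a release script can call idempotently. A hedged sketch, with hypothetical folder/project/environment names, assuming the deployment tooling loads the .ispac bytes into @project_stream:
Code Snippet
-- 1. create the folder only when missing; deploy_project creates or replaces the project
IF NOT EXISTS (SELECT 1 FROM SSISDB.catalog.folders WHERE name = N'Finance')
    EXEC SSISDB.catalog.create_folder @folder_name = N'Finance';

DECLARE @project_stream VARBINARY(MAX) = 0x;   -- load the .ispac file contents here
EXEC SSISDB.catalog.deploy_project
     @folder_name    = N'Finance',
     @project_name   = N'NightlyLoad',
     @project_stream = @project_stream;

-- 2. create the environment and its variables only when missing
IF NOT EXISTS (SELECT 1
               FROM SSISDB.catalog.environments e
               JOIN SSISDB.catalog.folders f ON f.folder_id = e.folder_id
               WHERE f.name = N'Finance' AND e.name = N'Dev')
BEGIN
    EXEC SSISDB.catalog.create_environment
         @folder_name = N'Finance', @environment_name = N'Dev';
    EXEC SSISDB.catalog.create_environment_variable
         @folder_name = N'Finance', @environment_name = N'Dev',
         @variable_name = N'ConnString', @data_type = N'String',
         @sensitive = 0, @value = N'Data Source=DevServer;...';
END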
I have been stuck on this problem for a few days and need help with it. I am enclosing the problem description and the possible solutions I have found.
Can anyone please help me out here?
Thanks and regards, Virat
Problem Description:
I have a requirement for which I have created a data-driven subscription in SQL Server 2005; the whole thing works like this:
I have a report on the Report Server which executes a stored procedure to get its parameters; it then calls another stored procedure to get the data for the report; then it creates the report and copies it to a file share. This is done using a data-driven subscription, and the time set for repeating this process is 5 minutes.
You can assume that following are working fine:
1. I have deployed the report on the Report Manager (uploaded the report, created a data source, linked the report to the data source) - manually, the report works fine.
2. Created a data driven subscription.
3. The data-driven subscription calls a stored procedure, say GetReportParameters, which returns all the parameters required for the report to execute.
4. The Report Manager executes the report by calling a stored procedure, say GetReportData, with the parameters provided by the GetReportParameters stored procedure; after the report has been generated, the file (PDF) is copied to a file share.
For each row that the GetReportParameters stored procedure returns, a report (PDF file) will be created and copied to the file share.
Now, my questions are:
1. How do I get a notification that the file was successfully created or that an error occurred?
2. The only message that Reporting Services shows on 'Report Manager > My Subscriptions' is something like "Done: 5 processed of 10 total; 2 errors." How do I find out which records were processed successfully and which ones resulted in an error?
Based on the above results (success or failure), I have to perform further operations.
Solutions or workarounds that I have found:
1. Create a Windows service which will monitor the file share folder and look for the file names (each record has a unique file name) of the reports that were picked up for PDF creation. If a file is not found, the service will report an error. Now, there's a glitch: if a report takes a very long time to execute, it will also be reported as an error (i.e., when the service checks for the PDF file, the report is still being generated). So I can't go with this solution.
2. I have also looked at the following tables in the ReportServer database:
a. Catalog - information regarding all the reports, folders, data source information, etc.
b. Subscriptions - all the subscription information.
c. ExecutionLog - information regarding execution of subscriptions and also manual execution of reports.
d. Notifications - information regarding the errors that occurred during subscription execution.
For this solution, I was thinking of writing a Windows service which will monitor these tables and carry out further operations as required.
This looks like the most feasible solution so far.
3. The third option is to look at DeliveryExtensions, but in that case I will have to manually call the SSRS APIs and manage report invocation and subscription information myself. What is your opinion on this?
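For option 2, a hedged sketch of the kind of monitoring query involved, using the SSRS 2005 ReportServer tables named above (the exact status strings can vary by version):
Code Snippet
SELECT TOP 50
       c.Name      AS ReportName,
       e.TimeStart,
       e.TimeEnd,
       e.Status    -- 'rsSuccess' for a clean run; an error code otherwise
FROM   ReportServer.dbo.ExecutionLog AS e
JOIN   ReportServer.dbo.[Catalog]    AS c ON c.ItemID = e.ReportID
ORDER  BY e.TimeStart DESC;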
My environment details:
Windows XP SP2
SQL Server 2005
Reporting Services 2005
Please let me know if I am missing something somewhere...
Hello friends. I managed to design an Integration Services package, but the desired level of performance has not been achieved (i.e., it is performing slowly). So I want to know the best practices for an optimized solution. In my package I'm extracting data from an XML file and storing it in a SQL Server database with some processing during the data flow.
I'm using:
1) Two Script Task controls - in these, I'm opening the connection to the XML file through VB.NET code and iterating one record at a time.
2) Two OLE DB Commands - each record fetched from the Script Task component is processed in an OLE DB Command through a stored procedure and then inserted into the database.
3) One For Loop - this loop contains the two Script Task controls and the two OLE DB Command controls mentioned above, fetching a single record and inserting it into the database.
4) One Derived Column
5) One Multicast
6) One Character Map
7) One OLE DB Source
With my current performance I'm able to insert one record every 0.5 seconds (which is well below acceptable limits). Does a control lying disabled on the SSIS designer pane also affect execution performance?
Hi, I have just installed SQL 2005 SP2 and am trying to get Windows SharePoint Services v3 integrated with SQL 2005 SP2 Reporting Services. In SharePoint Central Administration, I select the Reporting Services Integration page and have set up the Report Server Web Service URL and Authentication Mode. I then go to Grant Database Access, specify the SQL Server name, get prompted for a username and password that has access to the SQL report server, and get the following error: "The group name could not be found". Does anyone have any ideas? Thanks
Hello, I have a problem when trying to fully process an SSAS database using the Integration Services "Analysis Services Processing Task". I have 2 of these tasks, which are responsible for processing the dimensions and then the cubes. When I run the package, either via the BIDS environment or on the local server from the Integration Services engine, I get an error after about 20 minutes stating:
"Error: Memory Error: Allocation failure. Not enough storage is available to process this command""Error: Errors in the metadata manager. An error occurred when loading the <cube name> cube from the file \?D:Program FilesMicrosoft SQL ServerMSSQL.2OLAPDataMyWarehouse<cube file>.xml"
The cube name is not specific; it will fail, and any of my cubes could appear in the error log.
If I fully process the AS database using the AS engine (log on to the local AS server, right-click the AS database and click Process), I get no errors at all; it processes and completes fine. The processing options are identical whether I run in AS or via the SSIS "Analysis Services Processing Task".
I've searched quite a lot online but with no joy; the information I have gleaned from various sites does not directly link SSIS with SSAS processing problems.
Whether the AS processing starts via SSAS or SSIS, the memory usage of MSMDSRV.exe increases to around 1.4/1.5 GB but never reaches 2 GB, even when the error appears.
I've done the following, with no effect:
- Have run via AS and it works fine
- No specific cube that it fails on
- Have created a Dimension-only package; same problem
- Changed the MaxMemoryLimit
- Changed the connections to localhost
- Memory DOES NOT max out on the server
Server specs: Windows Server 2003 Standard + Service Pack 2, 4GB RAM, 2GB paging file
I made a Java application to pre-process Portuguese texts (stopwords, stemming, BOW creation, etc.).
I want to transform this application into an Integration Services component. I understand I will have to code this new component from scratch, but I have no idea how to start.
I'm reading and testing several tutorials on Integration Services that came with the SQL Server install package, but none of them has clues on developing new components. These tutorials seem more focused on demonstrating the (awesome) capabilities of Integration Services.
Are there any tutorials on how to implement new components for Integration Services?
Hi all. I have a function that I'm trying to build a report with in SQL Server Reporting Services. My problem is that every time I add a table I add the function, and I get the table with a red arrow pointing down; when I try to execute it in SQL Server Business Intelligence Development Studio it gives me an error message. Can anyone help, please?
I'm trying to execute an SSIS package from an ASP.NET web page. Using some code from http://msdn2.microsoft.com/en-us/library/ms403355.aspx I have managed to get this working when executing a package that contains a stored procedure; however, it does not work when there is an Analysis Services cube to build.
Here is my code for the web service
Code Snippet
Public Class SCCBuildDW
Inherits System.Web.Services.WebService
' LaunchPackage Method Parameters:
' 1. sourceType: file, sql, dts
' 2. sourceLocation: file system folder, (none), logical folder
' 3. packageName: for file system, ".dtsx" extension is appended
<WebMethod()> _
Public Function LaunchPackage( _
ByVal sourceType As String, _
ByVal sourceLocation As String, _
ByVal packageName As String) As Integer 'DTSExecResult
I'm using Windows authentication and impersonating an identity. The identity has access to the Analysis Services role, but the package still fails to run.
I know you can change the max degree of parallelism server-wide, but can you do it on the fly for one query? I know... trust the query processor, but when I turn it off for this one SP, my query goes from 3 seconds to 0, and I've got this ex-MS guy in here telling me there is a way, but he doesn't remember how.
I want him to simplify the SP or have his project's DBA do it, and I even offered to take a hack but... you know.
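For the record, the per-query knob being half-remembered here is almost certainly the MAXDOP query hint, which overrides the server-wide setting for a single statement (the table below is hypothetical):
Code Snippet
-- force a serial plan for just this statement
SELECT  CustomerID, SUM(TotalDue) AS Total
FROM    dbo.Orders
GROUP   BY CustomerID
OPTION (MAXDOP 1);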
Does anyone know about SQL Server's parallelism? A query without parallelism takes much less time than the same one with parallelism; in my case it's 6 times faster without parallelism. If that's true, what do we need parallelism for? Any ideas? Thanks
I have a function that returns a table of information about residential properties. The main input is a property type and a location in grid coordinates. Because I want to get only a certain number of properties, ordered by distance from the location, I get the properties from a cursor ordered by distance, and stop when the number is reached. (Not really possible to determine the distance analytically in advance.) The cursor also involves joins to a table of grid coordinates vs. postcodes (the properties are identified mainly by postcode), and to a table that maps the input property type into what types to search for. Opening the cursor typically results in the creation of six to eight parallel threads, and takes approx 1 second, which is about half of the total time for the function.
Recently the main property table grew from 4 million to 6.5 million records, and suddenly the parallelism is lost. Taking the identical code and executing it as a script gives parallelism. Turning it into an SP that inserts into a #temp table and then selects * from that table as the last statement also gives parallelism. But when it's in the form of a function, there is only one thread -- and the execution time has gone from ~2 sec to ~8 sec. I updated the statistics on the table, but still no parallelism.
I could turn it into an SP easily enough, but that would involve a change to the C++ program that calls it, which takes a while to get through the pipeline. In the meantime, is there some way to induce the optimizer to use parallelism? It used to.
Hi, I've set 'max degree of parallelism' to 1 because some SQL requests hung. Now, when I connect, how can I set the parallelism to 4 for a session? Is there a command like 'alter session set max degree of parallelism 4'? Thanks, Paul
If SQL Server is designed for multi-processor systems, how can running a query in parallel make such a dramatic difference to performance? We have a reasonably simple query which brings in data from a few non-complex views. If we run it on our 2x2.4GHz Xeon server it takes 6 minutes plus to run. If we run this on the same server with OPTION(MAXDOP 1) at the end of the same query it takes less than a second.
Examining the execution plan, the only difference I have been able to see is that parallelism is taking up 96% of the run time when using two processors. This drops when using one, so a sort takes up the vast majority of the time for the query to run.
OK, so running in parallel should mean that the work is done in various parts and then 'joined up' later for performance gains, but how can it get it so wrong (timewise)? If this is the case, will I see a significant difference changing our server to use a single processor, which seems completely the wrong approach (or should I do this on each query in each app - eek)? Do we have a problem that we don't know about that causes it to take this long? What can we do? Ideally, using both processors would seem to be preferable.