I'm just starting to learn SSIS and would like some advice on how to handle something I encounter frequently. I often have to connect to a remote FTP site which contains a large number of files. Each day a number of files are added (old files are not purged). So each day I need to download all files which have been added since last time I checked, store on a file server, then load to a SQL database.
I've written a Data Flow containing a script that generates a list of local files and dumps it to a raw file, and doing the same for the FTP directory listing wouldn't be too hard. I could then feed both into a second Data Flow to work out which files are new, then a third to download those files to a temp directory, and so on.
But is there a cleaner way of doing this (I'm not averse to scripting or even writing my own components, but for maintainability reasons I'd like to keep this as "out of the box" as possible)? How are other people approaching this task?
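For what it's worth, the listing/compare/download steps can also collapse into one Script Task instead of several Data Flows. A minimal sketch, assuming an existing FTP connection manager named "FTP"; the paths are placeholders:

Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain
    Public Sub Main()
        Dim cm As ConnectionManager = Dts.Connections("FTP")
        Dim ftp As New FtpClientConnection(cm.AcquireConnection(Nothing))
        Dim folderNames() As String = Nothing
        Dim fileNames() As String = Nothing

        ftp.Connect()
        ftp.SetWorkingDirectory("/outbound")
        ftp.GetListing(folderNames, fileNames)

        ' Keep only the files we have not already archived locally.
        Dim newFiles As New List(Of String)
        If fileNames IsNot Nothing Then
            For Each f As String In fileNames
                If Not File.Exists(Path.Combine("\\fileserver\archive", f)) Then
                    newFiles.Add(f)
                End If
            Next
        End If

        ' Download the new files to a temp folder for the load to SQL.
        If newFiles.Count > 0 Then
            ftp.ReceiveFiles(newFiles.ToArray(), "C:\temp\ftp", True, False)
        End If
        ftp.Close()

        Dts.TaskResult = Dts.Results.Success
    End Sub
End Class

A Foreach Loop over C:\temp\ftp then handles the archive copy and the database load.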
I am trying to incrementally update a cube to get near-real-time data to the end users. Currently we have a SQL Server Agent job that does a full process on the cube. The cube consists of a single measure group, which is simply one named query containing inner joins of all the dimension and fact tables in the underlying relational database. The end users have a lot to upload during the day, and they would like us to refresh the cube (near real time) to ensure their adjustments are loaded so that they can reconcile their daily PnLs. We have added a MeasureId, which is an auto-increment column in the Measures table.
I am trying to schedule the XMLA query below in a SQL Server Agent job and run it every 15 minutes or even more often (if possible). However, it does not seem to be working and keeps throwing all sorts of errors.
DECLARE @LastMeasureId AS INT, @myXMLA nvarchar(max)

SELECT @LastMeasureId = "[Measures].[Maximum Measures Id]"
FROM OPENROWSET('MSOLAP',
    'DATA SOURCE=L68F728326574; Initial Catalog=GMDR;',
    'SELECT NON EMPTY {[Measures].[Maximum Measures Id]} ON COLUMNS FROM [GMDR]');
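For what it's worth, a hedged sketch of where this usually goes next: build the ProcessAdd XMLA into @myXMLA and push it to SSAS, here via a linked server. The linked server name and object IDs are placeholders, and the out-of-line binding that restricts the source rows to MeasureId > @LastMeasureId is omitted for brevity:

SET @myXMLA = N'
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>GMDR</DatabaseID>
    <CubeID>GMDR</CubeID>
    <MeasureGroupID>Measure Group</MeasureGroupID>
    <PartitionID>Partition 1</PartitionID>
  </Object>
  <Type>ProcessAdd</Type>
</Process>';

-- Assumes a linked server SSAS_GMDR defined over the MSOLAP provider with RPC Out enabled.
EXEC (@myXMLA) AT SSAS_GMDR;

Alternatively, the same XMLA can go directly into a "SQL Server Analysis Services Command" job step, which avoids the linked server entirely.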
I have a unique problem here. I get four different types of flat files, which I need to pick up and process in parallel. I am doing this with four different Foreach Loop containers, one per load process, since I can receive multiple files (I had other issues, such as rollback, that led me to use Foreach containers).
My problem is that my input file names have a format like this: yyyymmdd_salesdataforproduct_yyyymmdd_hhmmss.txt
Here the first date is the business date and the second one is the system date. If I get multiple files, I have to process them according to the second date, but by default the Foreach Loop goes by the first date. What can I do to ensure that only the second date is used to order my files?
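For what it's worth, the stock Foreach File enumerator appears to hand files back in name order, which is why the leading business date wins. One approach is a Script Task that sorts on the second stamp and feeds a Foreach From Variable enumerator. A minimal sketch; the folder, mask, and variable name are assumptions, with SortedFileList in the task's ReadWriteVariables:

Imports System
Imports System.IO
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain
    Public Sub Main()
        Dim files() As String = Directory.GetFiles("C:\inbound", "*_salesdataforproduct_*.txt")

        ' File name layout: yyyymmdd_salesdataforproduct_yyyymmdd_hhmmss.txt
        ' Sort key = the trailing "yyyymmdd_hhmmss" (system date and time).
        Array.Sort(files, AddressOf CompareBySystemDate)

        Dts.Variables("SortedFileList").Value = files
        Dts.TaskResult = Dts.Results.Success
    End Sub

    Private Shared Function SystemStamp(ByVal fullPath As String) As String
        Dim name As String = Path.GetFileNameWithoutExtension(fullPath)
        Return name.Substring(name.Length - 15)   ' "yyyymmdd_hhmmss" = 15 chars
    End Function

    Private Shared Function CompareBySystemDate(ByVal x As String, ByVal y As String) As Integer
        Return String.Compare(SystemStamp(x), SystemStamp(y))
    End Function
End Class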
The requirement is to process two zip files in turn.
Each zip will contain either 1 or 2 csv files, but both zips will contain the same number of csv files. The structure of the csvs inside the zips is identical.
I have set up a Foreach Loop container with a File Enumerator to look for *.zip files. The zip file name is successfully retrieved into my 'Zipfile' user variable, and I use this in an Execute Process task to unzip the zip file. So far so good.
However what I now need is to get hold of the name(s) of the csv file(s) that have been unzipped, and use those in my flat file connection strings.
I considered trying to use a nested foreach loop, the outer loop enumerating the zips, and the inner enumerating the csvs. However this is no good because if 2 csvs are present, I need them both available in the control flow at the same time as they need to be merged together.
I'm sure there's a way to do this in SSIS but I'm struggling to identify it. Anyone got any ideas at all?
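For what it's worth, a middle road is a single Script Task right after the unzip that publishes both csv names at once. A minimal sketch, assuming variables UnzipFolder, CsvFile1, and CsvFile2 exist and the latter two are in the task's ReadWriteVariables:

Imports System
Imports System.IO
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain
    Public Sub Main()
        Dim csvs() As String = Directory.GetFiles( _
            Dts.Variables("UnzipFolder").Value.ToString(), "*.csv")

        Dts.Variables("CsvFile1").Value = csvs(0)
        ' Both csvs must be visible to the data flow at the same time, so the
        ' second gets its own variable (reuse the first if only one exists).
        If csvs.Length > 1 Then
            Dts.Variables("CsvFile2").Value = csvs(1)
        Else
            Dts.Variables("CsvFile2").Value = csvs(0)
        End If

        Dts.TaskResult = Dts.Results.Success
    End Sub
End Class

Each flat file connection manager then takes its ConnectionString from one of the two variables via an expression, so both files are available to the same data flow for the merge.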
Hello. I have some flat files that contain CSV records with different numbers of fields, but the first 4 fields of each record type are the same. E.g. there would be one record type that has eight fields and another that has 6 fields. Which of the items in the toolbox can I use to filter the records based on the entries in the first 4 fields, so I can process the filtered records?
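A hedged pointer: if each line is first read as one wide column, a Conditional Split can key off the first four comma-separated values. FINDSTRING([Line], ",", 4) returns the position of the fourth comma, so an expression along these lines isolates the leading fields (the column name [Line] is an assumption):

    SUBSTRING([Line], 1, FINDSTRING([Line], ",", 4) - 1) == "A,B,C,D"

Each split output then goes to its own Derived Column / parsing path for the record type it carries.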
We have a scenario where we need to process the last created/modified file from a location using an SSIS package, even though the folder contains multiple files with the same naming pattern and extension.
Kindly respond if anyone has worked on this.
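For what it's worth, a minimal Script Task sketch that picks the most recently modified file; the folder, mask, and variable name are assumptions, and FileToProcess must be in the task's ReadWriteVariables:

Imports System
Imports System.IO
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain
    Public Sub Main()
        Dim newest As FileInfo = Nothing
        ' Scan the folder and keep the file with the latest LastWriteTime.
        For Each fi As FileInfo In New DirectoryInfo("C:\inbound").GetFiles("*.txt")
            If newest Is Nothing OrElse fi.LastWriteTime > newest.LastWriteTime Then
                newest = fi
            End If
        Next

        If newest IsNot Nothing Then
            Dts.Variables("FileToProcess").Value = newest.FullName
        End If
        Dts.TaskResult = Dts.Results.Success
    End Sub
End Class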
I proposed on a new server that we separate data files, log files, tempdb, backups, etc. onto separate LUNs on a SAN with high-speed solid state drives. I was told that with the new solid-state SAN technology this would decrease performance, and that it does not work the same way as it did when you had RAID 5 arrays, etc. I thought that if things were carried out correctly by a SAN administrator, they would know how to configure for optimal performance.
In the Foreach Loop, how do I iterate from older flat files to newer flat files based on the file's timestamp? If there are older files in the folder, they should be processed first, and then the loop should continue with the newer ones.
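For what it's worth, a minimal sketch: sort the files oldest-to-newest by last-write time in a Script Task, then feed the array to a Foreach From Variable enumerator instead of the file enumerator. Folder, mask, and variable name are assumptions:

Imports System
Imports System.IO
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain
    Public Sub Main()
        Dim fis() As FileInfo = New DirectoryInfo("C:\inbound").GetFiles("*.txt")
        Array.Sort(fis, AddressOf ByLastWrite)   ' ascending: oldest first

        Dim paths(fis.Length - 1) As String
        For i As Integer = 0 To fis.Length - 1
            paths(i) = fis(i).FullName
        Next

        Dts.Variables("OrderedFiles").Value = paths
        Dts.TaskResult = Dts.Results.Success
    End Sub

    Private Shared Function ByLastWrite(ByVal x As FileInfo, ByVal y As FileInfo) As Integer
        Return DateTime.Compare(x.LastWriteTime, y.LastWriteTime)
    End Function
End Class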
In the first step of my SSIS package I need to get files from FTP and dump it/them in a local directory, but it's more than that, the logic is like this: 1. If no file(s) found, stop executing and send email saying no file(s) found; 2. If file(s) found, then compare it/them with existing files in our archive folder; if file(s) already exist in archive folder, stop executing and send email saying file(s) already existed, if file(s) not in archive folder yet, then transfer it/them to the local directory for processing.
I know I have to use a Script Task to do this. I did some research and found examples for each of the above 2 steps, but not both combined, so that's why I need some help here to get the logic put together correctly.
Thanks for the help in advance and i apologize for the long lines of code!
example for step 1: ----------------------------------------------------------------------------------------------------------
' Microsoft SQL Server Integration Services Script Task
' Write scripts using Microsoft Visual Basic
' The ScriptMain class is the entry point of the Script Task.

Imports System
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

    ' The execution engine calls this method when the task executes. To access the
    ' object model, use the Dts object. Connections, variables, events, and logging
    ' features are available as static members of the Dts class. Before returning
    ' from this method, set the value of Dts.TaskResult to indicate success or failure.
    '
    ' To open Code and Text Editor Help, press F1.
    ' To open Object Browser, press Ctrl+Alt+J.
    Public Sub Main()
        Dim cDataFileName As String
        Dim cFileType As String
        Dim cFileFlgVar As String

        ' Reset all file-type flags, then raise only the one for the current file.
        WriteVariable("SCFileFlg", False)
        WriteVariable("OOFileFlg", False)
        WriteVariable("INFileFlg", False)
        WriteVariable("IAFileFlg", False)
        WriteVariable("RCFileFlg", False)

        ' Derive the two-character file-type code from the tail of the file name
        ' (per the site's naming convention) and set the matching flag.
        cDataFileName = ReadVariable("DataFileName").ToString()
        cFileType = Left(Right(cDataFileName, 4), 2)
        cFileFlgVar = cFileType.ToUpper() + "FileFlg"
        WriteVariable(cFileFlgVar, True)

        Dts.TaskResult = Dts.Results.Success
    End Sub

    Private Sub WriteVariable(ByVal varName As String, ByVal varValue As Object)
        Dim vars As Variables = Nothing
        Dts.VariableDispenser.LockForWrite(varName)
        Dts.VariableDispenser.GetVariables(vars)
        Try
            vars(varName).Value = varValue
        Finally
            vars.Unlock()
        End Try
    End Sub

    Private Function ReadVariable(ByVal varName As String) As Object
        Dim result As Object
        Dim vars As Variables = Nothing
        Dts.VariableDispenser.LockForRead(varName)
        Dts.VariableDispenser.GetVariables(vars)
        Try
            result = vars(varName).Value
        Finally
            vars.Unlock()
        End Try
        Return result
    End Function

End Class
example for step 2: -------------------------------------------------------------------------------------------------------
' Microsoft SQL Server Integration Services Script Task
' Write scripts using Microsoft Visual Basic
' The ScriptMain class is the entry point of the Script Task.
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Public Class ScriptMain
' The execution engine calls this method when the task executes.
' To access the object model, use the Dts object. Connections, variables, events,
' and logging features are available as static members of the Dts class.
' Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.
'
' To open Code and Text Editor Help, press F1.
' To open Object Browser, press Ctrl+Alt+J.
Public Sub Main()
Try
'Create the connection to the ftp server
Dim cm As ConnectionManager = Dts.Connections.Add("FTP")
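(The listing breaks off here in the original post. Below is a hedged sketch of how the body might continue so both steps live in one Script Task; the server credentials, paths, and variable names are all assumptions, not recovered code.)

        Dim ftp As New FtpClientConnection(cm.AcquireConnection(Nothing))
        ftp.ServerName = "ftp.example.com"
        ftp.ServerUserName = "user"
        ftp.ServerPassword = "password"
        ftp.Connect()

        Dim folderNames() As String = Nothing
        Dim fileNames() As String = Nothing
        ftp.GetListing(folderNames, fileNames)

        If fileNames Is Nothing OrElse fileNames.Length = 0 Then
            ' Step 1: no files found - set a flag; a Send Mail task on the
            ' success path reads it and sends the "no files found" email.
            Dts.Variables("NoFilesFound").Value = True
        Else
            ' Step 2: keep only files that are not already in the archive.
            Dim fresh As New System.Collections.Generic.List(Of String)
            For Each f As String In fileNames
                If Not System.IO.File.Exists("C:\archive\" & f) Then fresh.Add(f)
            Next

            If fresh.Count = 0 Then
                Dts.Variables("AlreadyArchived").Value = True
            Else
                ftp.ReceiveFiles(fresh.ToArray(), "C:\local", True, False)
            End If
        End If

        ftp.Close()
        Dts.TaskResult = Dts.Results.Success
    Catch ex As Exception
        Dts.TaskResult = Dts.Results.Failure
    End Try
End Sub
End Class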
It works remotely if I run it via the command prompt. But when I add this to a T-SQL job on my remote SQL instance, it runs without deleting anything. What am I missing?
Brief overview... Running Windows Server 2003 Enterprise 64-bit, all service packs and patches current. SQL Server 2005 Enterprise Edition 64-bit, build: Microsoft SQL Server 2005 - 9.00.3054.00 (X64) Mar 23 2007 18:41:50 Copyright (c) 1988-2005 Microsoft Corporation Enterprise Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 2)
I cannot import any SSIS packages nor create any new folders under Stored Packages. I have googled the newsgroups and looked at BOL to no avail. HELP!!!!
I am thinking about replacing the INSERT data script files that I have with XML files. This way I can open the XML file using an XML editor, see the values in a grid, and make changes more easily. Do you see any problem with this approach? I managed to put together some code that exports a SQL table with its data to an XML file, and also code that reads the XML file's data and inserts it into a table. Now I am researching XSD, td:datatype, DTD... (I am new to XML) in order to figure out how I can use a single XML file that holds the SQL Server fields, their datatypes, and their values. If you have links to some sample code that has anything to do with the datatype export and import I am working on, can you please share them with me? Most importantly, what do you think about the idea of using XML files vs. SQL scripts? Thank you.
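Since SQL Server 2005 can emit an inline XSD along with the data, one file can indeed carry fields, datatypes, and values. A minimal sketch; the table name, columns, and path are placeholders:

-- Export: one document holding the inline XSD (column names and datatypes)
-- plus the data. XMLSCHEMA works with FOR XML RAW and AUTO modes.
SELECT *
FROM dbo.MyTable
FOR XML RAW('row'), ELEMENTS, XMLSCHEMA('urn:mytable');

-- Import: load the file and shred it back into rows (namespace handling
-- for the inline schema is omitted here for brevity).
DECLARE @x xml;
SELECT @x = CONVERT(xml, BulkColumn)
FROM OPENROWSET(BULK 'C:\data\mytable.xml', SINGLE_BLOB) AS f;

INSERT INTO dbo.MyTable (ID, ItemName)
SELECT r.value('(ID/text())[1]',       'int'),
       r.value('(ItemName/text())[1]', 'nvarchar(255)')
FROM @x.nodes('//row') AS T(r);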
I have a DTS package that imports data from an Oracle database into SQL Server. Doesn't the processing mostly occur on the SQL Server, not on the Oracle database from which the data is being imported? The Oracle database is vendor-provided, and they are saying our SQL Server DTS package is killing their server. Any insight is appreciated. Thanks.
I've got a process that creates records in my database based on XML input I've been given. What I am doing is giving this XML to a stored procedure to handle a specific task, then modifying the XML and sending it to the next stored procedure.
For instance, the XML could hold header records with detail records. I would first send the XML to a stored procedure that creates the header records, then update the XML so it carries the identity values of the header records I have just created, and then send the XML to the next stored procedure to create the details for those headers.
It all works fine, except that I have a problem with writing the identity values back to the XML. It seems I can only change one item in the XML at a time and thus need to loop. For many records this really takes a long time.
Here is some sample code of what I'm doing (please excuse any typos, this is a simplified version of the code) :
declare @lvSeq numeric(15)
declare @lvRowNo int
declare @lvNumRows int

insert into myHeaderTable ( recid, recdesc )
select ref.value('@recid', 'nvarchar(25)') recid,
       ref.value('@recdesc', 'nvarchar(250)') recdesc
from @pXML.nodes('//headers/header') R(ref)

select @lvRowNo = 1,
       @lvNumRows = @pXML.value('count(//headers/header)', 'int')

while (@lvRowNo <= @lvNumRows)
begin
    select @lvSeq = recseq
    from myHeaderTable
    where recid = @pXML.value('(//headers/header[position()=sql:variable("@lvRowNo")]/@recid)[1]', 'nvarchar(25)')

    set @pXML.modify('replace value of (//headers/header[position()=sql:variable("@lvRowNo")]/@recseq)[1] with sql:variable("@lvSeq")')

    select @lvRowNo = @lvRowNo + 1
end
Obviously I am looking for a better way to update the XML with the sequences. The insert takes a second, the loop takes minutes with large XML sets. I guess MSSQL is searching the whole XML to find the item to update.
It would be nice if I didn't have to loop through the XML. One solution I was thinking of is to store the XML in a temporary table with a single record per header item. Then I could do the modify in one go and recreate the XML by simply selecting the contents of the temporary table. I have no idea if this is possible.
So something like this:
select ref.value('@recid', 'nvarchar(25)') recid,
       ref.query('.') XMLData   -- ref.value('.', 'XML') gives an error; query('.') returns the fragment
into #TMP_XML
from @pXML.nodes('//headers/header') R(ref)
insert into myHeaderTable ( recid, recdesc )
select recid,
       ref.value('@recdesc', 'nvarchar(250)') recdesc
from #TMP_XML
cross apply XMLData.nodes('/header') R(ref)
update #TMP_XML
set XMLData.modify('replace ....')   -- the same replace as in the loop version
from #TMP_XML
join myheadertable on #TMP_XML.recid = myheadertable.recid
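For what it's worth, the loop can be avoided altogether if recseq is generated by the insert itself. A minimal set-based sketch using OUTPUT and a single FOR XML rebuild; it assumes recseq is the identity column on myHeaderTable and that <header> carries only the attributes shown (any other content would be dropped by the rebuild):

DECLARE @map TABLE (recid nvarchar(25) PRIMARY KEY, recseq numeric(15));

-- Capture recid -> recseq pairs while inserting; no second lookup needed.
INSERT INTO myHeaderTable (recid, recdesc)
OUTPUT inserted.recid, inserted.recseq INTO @map (recid, recseq)
SELECT ref.value('@recid',   'nvarchar(25)'),
       ref.value('@recdesc', 'nvarchar(250)')
FROM @pXML.nodes('//headers/header') R(ref);

-- Rebuild the XML once with the sequences joined in, instead of calling
-- .modify() once per header.
SET @pXML =
(SELECT m.recid  AS '@recid',
        h.ref.value('@recdesc', 'nvarchar(250)') AS '@recdesc',
        m.recseq AS '@recseq'
 FROM @pXML.nodes('//headers/header') h(ref)
 JOIN @map m ON m.recid = h.ref.value('@recid', 'nvarchar(25)')
 FOR XML PATH('header'), ROOT('headers'), TYPE);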
Hello friends, I need a suggestion. I am currently working on a reporting website that generates reports, and I need to store all the reports in the database.
I usually go with row-wise processing as it can be easily controlled, but the problem is there will be a lot of reports: an estimated 30,000 rows per month, and I'm not sure whether SQL Server can hold more than 2 billion rows per table. (At 30,000 rows a month it would take several thousand years to approach 2 billion, so the limit itself is unlikely to be the issue.)
I will just explain the whole scenario and the tricky problem I am facing.
We have XML files arriving at regular intervals from another source into SQL Server 2000, with roughly 10,000 to 70,000 records daily. We have a job scheduled to run at regular intervals, applying some filter criteria. The flow is: staging table into a secondary table, and then finally into the primary table. We designed the DTS package accordingly: take the records from the staging table, put them into the secondary table, and then into the primary table (about 8 tasks are involved). Suppose an XML file arrives at 8:30 am and our DTS package runs at 9:00 am, then 11:00 am, then 1:00 pm, and so on. What I have been observing for many days is that after the job runs successfully at 9:00 am, some good data is still pending in the secondary table, not processed into the primary table. But when the job runs again at 11:00 am, it processes those pending good records into the primary table. Sometimes, when I run the job manually from the DTS designer, the pending good data in the secondary table is processed!
My question is: why does this job not process all the good records in a single run?
Hello everyone. I need help regarding the following. Given the following table:

CREATE TABLE T1 (C1 nvarchar(10), C2 money)
INSERT INTO T1 VALUES ('A', 1)
INSERT INTO T1 VALUES ('B', 2)
INSERT INTO T1 VALUES ('C', 3)

Let's say that I have this table on a local server and I want to upload it to a remote server, and on the remote server load it into a database that contains the same table. The uploading part can be done by another application on the remote server; what I need is a way to transfer the data the fastest possible way. What steps do I need to follow? TIA, Rey Guerrero
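For raw speed, bcp in native mode is the usual answer. A minimal sketch; the server names, database name, and file path are placeholders:

rem Export on the local server (Unicode native format, trusted connection):
bcp MyDb.dbo.T1 out C:\xfer\T1.bcp -N -S LOCALSRV -T

rem Load on the remote server (or from any machine that can reach it):
bcp MyDb.dbo.T1 in C:\xfer\T1.bcp -N -S REMOTESRV -T

-N keeps the nvarchar column intact without a format file; for very large tables, -b commits in batches so the log does not balloon.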
Which component should be used to process a dataset as a whole, and not on a per-row basis? I need to process a dataset conditionally (the condition is based on the dataset itself), e.g. if a special row is present in the dataset then the dataset should be processed in a special way. Should I maybe use one Script Transformation to determine whether the dataset satisfies the condition (and store that result in a variable), and then, based on that condition (variable value), perform or skip the processing (using a Conditional Split and a Script Transformation)?
I am having trouble trying to construct the following process in SSIS/SQL 2005:
1. Grab a set of unprocessed rows (ProcessDT = null) in an 'Action' table
2. For each of these rows, execute multiple stored procedures based on the action type:
   If actiontype = 1:
       exec spAct1a @param1, @parm2
       exec spAct1b @param1, @parm2, @param3, @param4
   If actiontype = 2:
       exec spAct2a @param1, @parm2, @param3
       exec spAct2b @param1, @parm2, @param3
   etc....
3. Update ProcessDT so it's not processed again
4. Repeat until all rows are processed
Note - all sp params are contained in additional columns in the Action table. Basically the Action table is a store for post-event processing of sorts but is order dependent, hence the row by row processing. And some of my servers are 2000 so Service Broker is not an option (yet).
I first attempted to do this totally within the control flow, using an ADO recordset/Foreach Loop container, but I could not figure out how to run conditional process paths based on the ActionTypeID. I then tried to do it within the data flow using an OLE DB source, a Conditional Split, and an OLE DB Command transformation, which almost got me there; the problem is that for each row I need to execute multiple stored procedures, and it appears the OLE DB Command only gives me one.
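One hedged alternative that keeps both the order-dependence and the multiple-procs-per-type: have the package call a single wrapper procedure and do the dispatch in T-SQL. A minimal sketch with placeholder names (the real parameter columns come from the Action table):

DECLARE @id int, @actiontype int, @p1 int, @p2 int, @p3 int, @p4 int;

DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT ActionID, ActionTypeID, Param1, Param2, Param3, Param4
    FROM dbo.[Action]
    WHERE ProcessDT IS NULL
    ORDER BY ActionID;                -- preserve the required ordering

OPEN c;
FETCH NEXT FROM c INTO @id, @actiontype, @p1, @p2, @p3, @p4;
WHILE @@FETCH_STATUS = 0
BEGIN
    IF @actiontype = 1
    BEGIN
        EXEC dbo.spAct1a @p1, @p2;
        EXEC dbo.spAct1b @p1, @p2, @p3, @p4;
    END
    ELSE IF @actiontype = 2
    BEGIN
        EXEC dbo.spAct2a @p1, @p2, @p3;
        EXEC dbo.spAct2b @p1, @p2, @p3;
    END

    -- Mark the row so it is not picked up again.
    UPDATE dbo.[Action] SET ProcessDT = GETDATE() WHERE ActionID = @id;

    FETCH NEXT FROM c INTO @id, @actiontype, @p1, @p2, @p3, @p4;
END
CLOSE c;
DEALLOCATE c;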
I'm populating a new table based on information in an existing table. The new table is a list of all "items" and contains a primary key. The old table is a database of receipts where items can appear many times in any order.
I have put together the off-the-shelf components to do this, using a lookup transformation to see if the item is already in the new table. Problem is, because there's so much repetition in the old table I need to process the old table one row at a time. Batch processing is generating errors because the lookup doesn't detect duplicates within the buffer.
I tried setting the "DefaultBufferMaxRows" property of the task to 1, but that doesn't seem to have any effect.
To get the data from the old table, I'm using an OLE DB source. To get the data into the new table, I'm using the OLE DB Command transformation with parameters to execute an INSERT statement.
This is a job I have to do exactly once, so I don't care if I have to run it overnight. I'd rather have a simple, easy to understand but inefficient script so I understand what it's doing completely.
Any help on forcing SSIS to process one row at a time?
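Since this is a one-off load, it may be worth sidestepping the buffer problem entirely with one set-based statement instead of the Lookup / OLE DB Command pair. A minimal sketch with placeholder table and column names:

INSERT INTO dbo.Items (ItemName)   -- the new table; its key is assumed to be IDENTITY
SELECT DISTINCT r.ItemName
FROM dbo.Receipts AS r             -- the old receipts table
WHERE NOT EXISTS (SELECT 1 FROM dbo.Items AS i WHERE i.ItemName = r.ItemName);

DISTINCT collapses the repetition inside the source, and NOT EXISTS skips anything already loaded, which is exactly what the per-buffer Lookup cannot see.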
I am verifying my reports' processing times. I get the information from the Reporting Services DB, in the [ExecutionLog] table. I have the following information:
[TimeEnd] - time that report generation ends.
[TimeStart] - time that report generation starts.
[TimeDataRetrieval] - amount of time spent running the data sources.
[TimeProcessing] - time spent processing the report.
[TimeRendering] - time spent generating the output format.
If this information is correct, the following statement should be true: TimeEnd - TimeStart = TimeDataRetrieval + TimeProcessing + TimeRendering (with the three components expressed in milliseconds).
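A quick way to test that against the log itself; a minimal T-SQL sketch, assuming the standard ReportServer catalog:

SELECT ReportID,
       DATEDIFF(ms, TimeStart, TimeEnd)                          AS TotalMs,
       TimeDataRetrieval + TimeProcessing + TimeRendering        AS SumOfPartsMs,
       DATEDIFF(ms, TimeStart, TimeEnd)
         - (TimeDataRetrieval + TimeProcessing + TimeRendering)  AS GapMs
FROM ReportServer.dbo.ExecutionLog;

In practice GapMs tends to be small but nonzero, since the three timings do not account for every step of execution.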
When using the AS processing task with a connection to "an Analysis Services project in this solution", only some processing options are available for processing dimensions. For instance, it is not possible to select "Process Update". Once I change the connection manager to point to the deployed cube database, I can choose from all the options. Is this by design?
I need help from SQL experts with the following problem.
DOCTYPE         DATE    QTY   PRD   LOT
Purchase        1 jan   20+   AA    2007FW
Purchase        4 jan   50+   AA    2007SS
Purchase        9 jan   10+   AA    2007FW
Sale            3 jan   10-   AA
Sale            4 jan   20-   AA
Returned Good   4 feb   10    AA
As you can see I don't have the LOT code in sales records, so I must update these records with the following logic:
I have to assign LOT codes to sales in FIFO order. From the table above: on the 3rd of January I find the first sale, of 10; I take the first LOT (in ascending date order), check its quantity on hand (20), consume 10 from the LOT (10 remaining), and update the sale record with the 2007FW LOT code.
Then I find the next sale, of 20. As before, I take the first LOT with quantity on hand to consume; again it is the first record, with only 10 remaining, so I set that LOT's quantity on hand to zero, but I still have 10 more to allocate. So I find the next available LOT, of 50, and consume 10 from it, leaving 40 remaining.
It's important to remember that I can sell part of a lot, and I have to track the remaining goods per LOT.
Is it possible to do this in SQL in batch mode? How? If I have to split a sale record across 2 or more LOTs, how can I do that? Can you show me SQL or good hints?
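Yes, this can be done in one set-based pass on SQL Server 2005 or later. A minimal sketch; Purchases and Sales are placeholders for the purchase and sale rows of the table above, one product assumed. The idea: lay lots and sales end to end on a running-quantity axis; wherever a sale's interval overlaps a lot's interval, that overlap is the quantity drawn from that lot, and a sale spanning two lots naturally yields two rows, which also answers the split-sale question:

WITH P AS (
    SELECT lot, qty,
           cum_before = (SELECT COALESCE(SUM(p2.qty), 0)
                         FROM Purchases p2
                         WHERE p2.dt < p.dt
                            OR (p2.dt = p.dt AND p2.lot < p.lot))  -- FIFO order of lots
    FROM Purchases p
),
S AS (
    SELECT saleid, qty,
           cum_before = (SELECT COALESCE(SUM(s2.qty), 0)
                         FROM Sales s2
                         WHERE s2.saleid < s.saleid)               -- FIFO order of sales
    FROM Sales s
)
SELECT s.saleid,
       p.lot,
       qty_from_lot =
             CASE WHEN p.cum_before + p.qty < s.cum_before + s.qty
                  THEN p.cum_before + p.qty ELSE s.cum_before + s.qty END
           - CASE WHEN p.cum_before > s.cum_before
                  THEN p.cum_before ELSE s.cum_before END
FROM S s
JOIN P p
  ON p.cum_before < s.cum_before + s.qty   -- the two intervals overlap
 AND s.cum_before < p.cum_before + p.qty
ORDER BY s.saleid, p.cum_before;

This result set can then drive the UPDATE of single-lot sales and the INSERT of the extra rows for sales that were split across lots.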
I have 4 cubes in a single SSAS database, and these cubes should be processed on the following schedule:
Cube 1 - Every Day
Cube 2 - Every Week
Cube 3 - Every Month
Cube 4 - Every Day
The issue I face is that these cubes share dimensions, so I can't do a FULL process of these SHARED dimensions, as it would affect the other cubes.
I can expect additions and deletions to my dimension data, but the structure remains the same. It would be great if someone could suggest how to go about processing the dimensions. I am confused by the number of options (Process Incremental, Process Update, etc.) available for processing the dimensions.
I will be creating an SSIS package to automate the processing. One more question: say Cube 2 fails during a day while Cube 1 has already processed successfully earlier that day; how do I revert to the old state of Cube 2? Does this mean I need to back up the SSAS database before processing each cube?
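In case it helps frame the options: ProcessUpdate on a shared dimension picks up member additions and deletions without invalidating the cubes built on it, so one workable daily batch is ProcessUpdate on each shared dimension followed by processing that day's cubes. A hedged XMLA sketch, with all object IDs as placeholders:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>MyOlapDb</DatabaseID>
      <DimensionID>Product</DimensionID>
    </Object>
    <Type>ProcessUpdate</Type>
  </Process>
  <!-- ...repeat for the other shared dimensions... -->
  <Process>
    <Object>
      <DatabaseID>MyOlapDb</DatabaseID>
      <CubeID>Cube 1</CubeID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>

As for reverting: yes, the usual safety net is the XMLA Backup command against the SSAS database before the batch, restoring only if a cube's processing fails.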
I am in the midst of designing a new data warehouse system. As we get further into the design, we are realising just how complex our ETL is going to be, and that the time it will take to run could be significant, i.e. a few days! Obviously I don't want downtime in my relational system for that long, preventing my users from accessing the data for days at a time each month. So what functionality should I be looking at that would let me maintain a working copy of the data that users can query while I perform database updates, and then quickly promote the updated data to users for querying?
If you can point me in the direction of the right functionality in SQL Server 2005, and possibly some relevant white papers that cover this sort of scenario, I would be grateful.
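One hedged suggestion to look into while you search: double-buffer the warehouse tables and repoint synonyms (new in SQL Server 2005) at the freshly loaded copy, so promotion is near-instant. Object names below are placeholders:

-- Users always query dbo.FactSales, which is a synonym. The long-running
-- ETL loads dbo.FactSales_B while dbo.FactSales_A keeps serving queries...
-- ...then the finished load is promoted in one quick step:
BEGIN TRAN;
DROP SYNONYM dbo.FactSales;
CREATE SYNONYM dbo.FactSales FOR dbo.FactSales_B;
COMMIT;

Partition switching (ALTER TABLE ... SWITCH) and database snapshots are the other SQL Server 2005 features usually pointed at for this scenario; the Books Online topics on partitioned tables are a good starting point.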
I want to reduce the number of Virtual Log Files I have. In reading through the Books Online documentation, I realised that I forgot to move the transaction log files to a different drive. Now that the server is in production, I would like some input on the best way of making this change.
Can I just change the directory the log files are written to in the DB properties, without any adverse effects?
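Changing the path in the database properties does not move an existing file, so the usual SQL Server 2005 sequence is sketched below; the database name, logical file name, and target path are placeholders:

USE master;
ALTER DATABASE MyDb SET OFFLINE WITH ROLLBACK IMMEDIATE;

-- Repoint the catalog entry at the new location:
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_log, FILENAME = N'L:\SQLLogs\MyDb_log.ldf');

-- Move the physical .ldf to L:\SQLLogs\ in the file system, then:
ALTER DATABASE MyDb SET ONLINE;

Shrinking the log afterwards and letting it regrow to a sensible size in fixed increments is what actually reduces the VLF count.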
Hi! I'm using replication with two databases on SQL 2000. At the beginning, the log file size was 50 MB and the data file size was 150 MB. But now the log file is 2 GB and the data file is 4 GB. I would like to shrink the log files and the data files. How do I do this? (I tried truncate and shrink; nothing changes.) Thanks!!!
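One thing worth checking: with replication, the log cannot be truncated past transactions the log reader has not yet delivered, which is often why a shrink appears to do nothing. A minimal SQL 2000-era sketch; the database, file, and path names are placeholders:

-- Once the log reader has caught up:
BACKUP LOG MyDb TO DISK = N'D:\Backup\MyDb_log.trn';
DBCC SHRINKFILE (MyDb_log, 100);    -- target size in MB

-- The data file can be shrunk the same way (slow, and it fragments indexes):
DBCC SHRINKFILE (MyDb_data, 2000);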
I need to write code to access many image files and wave files and display them on a web page. Both kinds of files are stored on the server (as many experts in this forum advised), and their locations are kept in the SQL Server database. As for the WAV files, I think I will have to convert them to MP3 first for fast delivery over the internet. Since I am a newbie, any help would be appreciated. Thanks.
I have a problem with bcp and format files. We changed our databases from varchar to nvarchar to support Unicode. No problems so far with that; it is working fine. But now I need a format file for the customer table, and it is not working. It works fine with the old DB with varchar, but with nvarchar I'm not able to copy the data. The biggest problem is that I get no error message: bcp starts copying to the table and finishes without any error message.

This is my table:

CREATE TABLE [dbo].[Customer] (
    [ID] [int] NOT NULL,
    [CreationTime] [datetime] NULL,
    [ModificationTime] [datetime] NULL,
    [DiscoveryTime] [datetime] NULL,
    [Name_] [nvarchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AI NULL,
    [Class] [int] NULL,
    [Subclass] [int] NULL,
    [Capabilities] [int] NULL,
    [SnapshotID] [int] NOT NULL,
    [CompanyName] [nvarchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL,
    [TargetRCCountry] [nvarchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL,
    [LocationID] [int] NULL,
    [MirrorID] [binary] (16) NULL,
    [DeleteFlag] [bit] NULL,
    [AdminStatus] [bit] NULL
) ON [PRIMARY]
GO

and this is the format file:

8.0
13
1  SQLINT      1 12  "#~@~#"      1  ID               ""
2  SQLDATETIME 1 24  "#~@~#"      2  CreationTime     ""
3  SQLDATETIME 1 24  "#~@~#"      3  ModificationTime ""
4  SQLDATETIME 1 24  "#~@~#"      4  DiscoveryTime    ""
5  SQLNCHAR    2 510 "#~@~#"      5  Name_            SQL_Latin1_General_CP1_CI_AS
6  SQLINT      1 12  "#~@~#"      6  Class            ""
7  SQLINT      1 12  "#~@~#"      7  Subclass         ""
8  SQLINT      1 12  "#~@~#"      8  Capabilities     ""
9  SQLINT      1 12  "#~@~#"      9  SnapshotID       ""
10 SQLNCHAR    2 510 "#~@~#"      10 CompanyName      SQL_Latin1_General_CP1_CI_AS
11 SQLNCHAR    2 510 "#~@~#"      11 TargetRCCountry  SQL_Latin1_General_CP1_CI_AS
12 SQLINT      1 12  "#~@~#"      12 LocationID       ""
13 SQLBINARY   1 33  "#~@~#\r\n"  13 MirrorID         ""

"#~@~#" is the field terminator. We have a lot of text files with all kinds of characters in them, so we think this is a sequence that will never occur in our files. Thanks for your help!
Are there any MDS descriptor files associated with database files? Are they associated with transaction logs? As far as I know there are .mdf, .ndf, and .ldf files. Could anyone please guide me ASAP, as there is an urgent requirement from one of our clients?