Problem: Long Running Validation When Pointing OLEDB Source To A View.
Jan 24, 2007
One of our developers has written a view which executes completely (returning ~38,000 rows) in approximately 1 minute from SQL Server Management Studio (results start appearing at 20 seconds and the query completes by 1:10, consistently).
However, if he adds a Data Flow task in SSIS, adds an OLEDB Data Source, sets the Data Access Mode to "Table or view", and then selects the same view, it consistently takes over 30 minutes (at which point we've been killing it). I can see the activity in Activity Monitor: it is doing a SELECT * from that view and is runnable the whole time.
If we modify the view to SELECT TOP 10, it returns in a short time.
Has anyone run into this problem? Any suggestions? It is very problematic, because whenever the views change we have to hack around this problem.
I have a table that contains approx 200 thousand records that I need to run validations on. Here's my stored proc:
[code]
CREATE PROCEDURE [dbo].[uspValidateLoadLeads]
    @sQuotes char(1) = null,
    @sProjectId varchar(10) = null,
    @sErrorText varchar(1000) out
AS
BEGIN
    DECLARE @ProcName sysname, @Error int, @RC int, @lErrorCode bigint
    DECLARE @SQL varchar(8000)

    -- Normalize the phone number into sPhone
    IF @sQuotes = '0'
    BEGIN
        UPDATE dbo.prProjectDiallingList_staging
        SET sPhone = RTrim(LTrim(Convert(varchar(30), Convert(numeric(20, 1), phone))))
    END
    ELSE
    BEGIN
        UPDATE dbo.prProjectDiallingList_staging
        SET sPhone = phone
    END

    --4. Update failed-validation column if not 10 digits
    UPDATE dbo.prProjectDiallingList_staging
    SET sFailedValidation = 'X'
    WHERE Len(RTrim(LTrim(sPhone))) <> 10

    --5. Dedup: flag every duplicate phone number after its first occurrence
    UPDATE a
    SET a.sFailedValidation = 'X'
    FROM dbo.prProjectDiallingList_staging a
    INNER JOIN dbo.prProjectDiallingList_staging b
        ON a.sPhone = b.sPhone
    WHERE a.iList_StagingID > b.iList_StagingID

    --6. Update failed-validation column if not numeric
    UPDATE dbo.prProjectDiallingList_staging
    SET sFailedValidation = 'X'
    WHERE IsNumeric(RTrim(LTrim(sPhone))) = 0

    --7. Update time zones from the phone number's area code
    UPDATE s
    SET s.sTimeZone = z.sTimeZone
    FROM dbo.prProjectDiallingList_staging s
    LEFT OUTER JOIN dbo.prPhoneTimeZone z
        ON left(rtrim(ltrim(s.sPhone)), 3) = z.sPhoneAreaCode

    --8. Insert into dialling table only records that have not failed the validation
    INSERT dbo.prProjectDiallingList (iPrProjectId, sPhoneNumber, sTimeZone)
    SELECT @sProjectId, sPhone, sTimeZone
    FROM dbo.prProjectDiallingList_staging
    WHERE ISNULL(sFailedValidation, '1') = '1'

    UPDATE d
    SET d.bProcessReporting = 1
    FROM dbo.prProjectDialling d
    WHERE d.iPrProjectId = @sProjectId
END
[/code]
When I execute this stored proc it runs for more than 5 minutes. Is there anything I can do to speed it up? Maybe there is a faster way of writing these queries?
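One possible direction (a hedged sketch only, untested against this schema): steps 4 and 6 each scan the whole staging table, and they can be merged into a single pass. The dedup self-join in step 5 is likely the expensive part, and an index on sPhone would support it.

[code]
-- A sketch only: merge validation steps 4 and 6 into one scan of the
-- staging table instead of two.
UPDATE dbo.prProjectDiallingList_staging
SET sFailedValidation = 'X'
WHERE Len(RTrim(LTrim(sPhone))) <> 10
   OR IsNumeric(RTrim(LTrim(sPhone))) = 0

-- Hypothetical supporting index for the dedup self-join in step 5:
CREATE INDEX IX_staging_sPhone
ON dbo.prProjectDiallingList_staging (sPhone, iList_StagingID)
[/code]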
I have an Integration Services project which creates a flat-file report from Analysis Services. I'm using an OLE DB data source and running an OPENQUERY in the SQL statement.
The problem is that Integration Services runs the query twice before getting the data into the flat file. I know this because the query appears twice in Profiler, and because the same query takes half the time when run in Management Studio.
Integration Services is running the whole query when validating. How can I disable this validation, or better, make it validate properly?
I'm trying to get data from an AS400, using an OLEDB source as my connection. I'm using the IBM OLEDB provider for iSeries, and working on Standard Edition of SQL Server 2005.
While using the OLEDB source task, when I set my access mode to 'Table or view' and try to see the list of available libraryname.tablenames, I do not get any tables,
whereas when I use the 'SQL Command' data access mode I can get data from the AS400 (I can only see a preview of the data) but am not able to insert it into my destination table. At run time the task fails with the error mentioned below.
I have also configured the Data Links tab inside the OLEDB connection manager, but when I tried to set a default library it gave me an error: "Error code: CWBZZ5042" (catalog is invalid), even though the catalog does exist.
Are there some settings that need to be made on the AS400 side or the SQL Server side to view the available libraries and their tables?
Can someone help me with this?
thanks in advance
Shah
Error Message received when executed with SQL command:
Error: 0xC0202009 at Data Flow Task, OLE DB Source [1]: An OLE DB error has occurred. Error code: 0x80040E00.
Error: 0xC0047038 at Data Flow Task, DTS.Pipeline: The PrimeOutput method on component "OLE DB Source" (1) returned error code 0xC0202009. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing.
Error: 0xC0047021 at Data Flow Task, DTS.Pipeline: Thread "SourceThread0" has exited with error code 0xC0047038.
Error: 0xC0047039 at Data Flow Task, DTS.Pipeline: Thread "WorkThread0" received a shutdown signal and is terminating. The user requested a shutdown, or an error in another thread is causing the pipeline to shutdown.
Error: 0xC0047021 at Data Flow Task, DTS.Pipeline: Thread "WorkThread0" has exited with error code 0xC0047039.
I am trying to transfer some tables' data from one database server to another. I created a package in SSIS, and I use a variable to pass each table name. In the Data Flow, I use an OLEDB Source, but I cannot set the Data access mode to "Table name or view name variable". Every time, I get the following error info:
===================================
Error at Data Flow Task [OLE DB Source [31]]: A destination table name has not been provided.
(Microsoft Visual Studio)
===================================
Exception from HRESULT: 0xC0202042 (Microsoft.SqlServer.DTSPipelineWrap)
------------------------------ Program Location:
at Microsoft.SqlServer.Dts.Pipeline.Wrapper.CManagedComponentWrapperClass.ReinitializeMetaData() at Microsoft.DataTransformationServices.DataFlowUI.DataFlowComponentUI.ReinitializeMetadata() at Microsoft.DataTransformationServices.DataFlowUI.DataFlowAdapterUI.connectionPage_SaveConnectionAttributes(Object sender, ConnectionAttributesEventArgs args)".
Can someone tell me what the reason is, or give me some examples?
I get an error when I create an OLE DB Source pointing to a SQL 2000 database and execute a SQL query inside the OLE DB Source. The OLE DB source points to an OLE DB destination, which is a SQL 2005 database.
I get the error below:
Error at Data Flow Task [OLE DB Destination [245]]: the column firstname cannot be processed because more than one code page (936 and 1252) are specified for it.
Error at Data Flow Task [DTS.Pipeline]: "component "OLE DB destination" (245)" failed validation and returned validation status "VS_ISBROKEN".
Error at Data Flow Task [DTS.Pipeline]: One or more component failed validation.
Error at Data Flow Task: There were errors during task validation.
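A hedged note on the first error: the code-page conflict usually means a non-Unicode (varchar) column is flowing between databases whose collations imply different code pages (936 is Simplified Chinese, 1252 is Western European). One common workaround, sketched below with an illustrative table name, is to convert the column to nvarchar in the source query so the data flow carries Unicode and no single code page applies:

[code]
-- A sketch only: dbo.SourceTable is hypothetical; cast each offending
-- varchar column to nvarchar so the pipeline treats it as Unicode (DT_WSTR).
SELECT CAST(firstname AS nvarchar(100)) AS firstname
FROM dbo.SourceTable
[/code]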
If I start a long-running query on a background thread, is there a way to abort the query so that it does not continue running on SQL Server?
The query would be running on SQL Server 2005 from a Windows Forms application using the BackgroundWorker component. So the query would have been started from the background worker's DoWork event using ADO.NET. If the user clicks an abort button in the UI, I would want the query to die so that it does not continue to use SQL Server resources.
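A hedged note: ADO.NET's SqlCommand exposes a Cancel method, which attempts to cancel the command on the server; calling it from the UI thread while DoWork is blocked in ExecuteReader is the usual pattern. As a fallback, the session can also be killed from the server side. A sketch (the login filter and spid are illustrative):

[code]
-- A sketch only: locate the offending session, then kill it.
SELECT spid, status, loginame, hostname, cmd
FROM master.dbo.sysprocesses
WHERE loginame = 'app_login'   -- hypothetical application login

KILL 53                        -- replace 53 with the spid found above
[/code]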
I have SSIS projects that take a long time to open when their packages contain a large number of data flows. Is there a way to turn off validation of metadata when a package opens? Or to turn off validation during execution on the SSIS service (after it has previously been validated in dev)? Or to be able to control when validation takes place in general?
In one package (1 of 5) I have 43 data flows (each with a single source-to-target mapping) in 4 sequence containers, and it takes approximately 2-3 seconds per source-to-target mapping and sequence container to validate, which translates to 1 ½ to 2 ½ minutes to open. When the project with all 100+ tables for the data warehouse goes through validation, I can make coffee in the time it takes to open the project. I have to delete the *.suo file (or verify all packages are closed in the designer and save the project file), and when I open the project, I have to jump immediately to SSIS > Work Offline to set it to not validate the metadata in order to work in a timely fashion. DelayValidation=TRUE does not help much.
Running in debug mode has the effect of sending packages that were not open and validated through validation, even though I am not running those packages. Validate once during design and run forever is what I want.
Even if I re-open a package that I just closed from the designer, one that had already gone through validation, it will go through the validation process again.
It would be great if there were an on-demand option on the menu bar to let one control when validation takes place for a project, or a more granular validation option for a specific data flow or container.
I have an XML Source which seems properly configured in design mode. But when I try to execute the Data Flow I get the following error message:
"The Output "AccidentData" (outputID) contains a RowsetID with a value of AccidentData that does not reference a data table in the schema"
"VS_NEEDNEWMETADATA"
I don't understand what this error message means. I have another data flow which uses the same XML data file and XSD file and works properly; the difference is that the working data flow does not include all columns.
I am trying to execute a package so that it loads data from an Excel file into a SQL Server database. When I execute it, it prompts the following error message and one warning. The Excel file has three columns: Week, Item and Value.
Error 4 Validation error. Data Flow Task: OLE DB Source [94]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E14. An OLE DB record is available. Source: "Microsoft OLE DB Provider for Oracle" Hresult: 0x80040E37 Description: "ORA-00942: table or view does not exist ". Test - GET NW PERF 1.dtsx 0 0
Warning
Warning 1 Validation warning. Data Flow Task: OLE DB Destination [36]: The external metadata column collection is out of synchronization with the data source columns. The column "DAY" needs to be added to the external metadata column collection. The column "TCH_AVAIL" needs to be added to the external metadata column collection. The column "PDROP" needs to be added to the external metadata column collection. The column "P_HR" needs to be added to the external metadata column collection. The column "SFAIL" needs to be added to the external metadata column collection. The "external metadata column "VALUE" (90)" needs to be removed from the external metadata column collection. The "external metadata column "ITEM" (89)" needs to be removed from the external metadata column collection. Not in use - GET NW STATS.dtsx 0 0
I was trying to load data using SSIS with a Data Flow Task whose OLE DB Source was a view, into an OLE DB Destination (SQL Server). This view returns 420,591 rows from Query Analyzer in 21 seconds. Row length is 925. When I tried to execute the Data Flow Task from SSIS, I had to stop the process after 30 minutes, because only 2,000 rows had been retrieved. I modified the view to return TOP 440,000 and reran: this time all 420,591 rows were retrieved and written in 22 seconds. Next, I tried to use TOP 100 PERCENT. Again, only 2,000 rows were returned after 30 minutes. TempDB is on a separate SAN RAID group with 200 GB free; the databases are on a separate drive with 200 GB free. The server has 13 GB of memory and no other processes were executing.
The only way I could populate the table from SSIS was to use an Execute SQL Task with a hard-coded INSERT into the table, selecting data from the view (35 seconds).
Has anyone else experienced this or a similar issue? Does anyone have a solution or explanation?
I have a package that hangs in the designer after I change the SQL statement in a DataReader Source from a 'select' to a 'call stored procedure'. The stored procedure takes 2 date parameters. I use an expression to build the 'call stored proc' statement and the 2 date strings. The DataReader Source uses an ADO.NET connection manager. The ADO.NET connection manager uses the provider for MySQL (Connector/.Net 5.1), which I installed from MySQL.com (http://dev.mysql.com/downloads/connector/net/5.1.html). Before creating the stored procedure I had been using an expression to build a 'select' statement with two date variables as follows:
The SQL for the DataReader Source is set via the SqlCommand property of the data flow component.
After testing the SQL, I created a stored proc from this SQL and then changed the expression (on the SqlCommand property of the data flow component) to build the 'call stored proc' statement, like this.
Then, when I tried to switch to the Data Flow tab, the editor froze, with the status bar saying "validating datareader source". The Data Flow tab says "Loading...". I don't know how to troubleshoot this; each time I have tried, I have had to kill the application. Any ideas/suggestions?
I had a package which used a source query in an OLEDB Source component, fetched the records, passed them through a couple of lookups, and then used an OLEDB Command to insert/update the records in the table via an SP. I changed the source query (in fact the package), removed a lookup, and called a different SP similar to the old one. But my problem is that the package, which previously took only minutes to update 50,000 records, is now taking more than 2 hours. The problem is that the number of records it fetches from the source each time is very low: it is fetching hardly 500 records at a time, compared to nearly 2,500 records before. Where am I going wrong? Any suggestions greatly appreciated.
I am running an MDX query in SSIS, but I don't know the best way of doing this performance-wise. I know I can run the MDX query through an OPENQUERY in the OLEDB source, and I can also run it through a DataReader, with no OPENQUERY needed.
I know the DataReader is normally slower due to .NET, but in this case the OLEDB source is running an OPENQUERY against a linked server, which won't be as fast as running regular SQL.
If anyone knows which of these two runs faster in this scenario, I'd appreciate it if you let me know.
Firstly, I am new to SSIS; if my approach could be improved then I welcome suggestions.
Scenario: I have a large SSIS package that consolidates/summarizes work-week information from several data sources. Currently each data flow task in the control flow calculates the from and to dates that are filtered on, for example:
[code]
DECLARE @FromDT AS DATETIME
SET @FromDT = CAST(FLOOR(CAST(DATEADD(D, -7, GETDATE()) AS FLOAT)) AS DATETIME)

DECLARE @ToDT AS DATETIME
SET @ToDT = CAST(FLOOR(CAST(GETDATE() AS FLOAT)) AS DATETIME)
[/code]
I would like to remove these statements, which appear in most steps, and replace them with a global variable that is used throughout the package. The calculation would then appear only once, and it would make the package much easier to rerun after a failure, etc.
Problem: I am using the Data Reader Source with its 'SqlCommand' property specified. It looks like parameters are only supported if an OleDB connection is used?
So I switched to an OleDB connection, and no parameters are recognised in the string. A forum search reveals that parameters in subqueries are not always found properly. The suggested fix for that is to set 'Bypass Prepare' to True, but that is a property of the Execute SQL task, not of the Data Flow task's source.
Questions:
Does the Data Reader Source control from the Data Flow Sources toolbox section support parameters? Can anyone suggest a fix to the OleDB Source issue with parameters? Is there a better way to solve my problem, e.g. using an Execute SQL Task instead of Data Flow tasks?
Example SQL:
This SQL is an example of the SQL for the OleDB Data Source (within a Data Flow task):
[code]
------------------------------
-- RADIUS LOGINS
------------------------------
DECLARE @FromDT AS DATETIME
SET @FromDT = CAST(FLOOR(CAST(DATEADD(D, -7, GETDATE()) AS FLOAT)) AS DATETIME)

DECLARE @ToDT AS DATETIME
SET @ToDT = CAST(FLOOR(CAST(GETDATE() AS FLOAT)) AS DATETIME)

DECLARE @Attempts AS BIGINT
SET @Attempts = (SELECT COUNT(*)
                 FROM dbo.Radius_Login_Records
                 WHERE LoggedAt BETWEEN @FromDT AND @ToDT)

DECLARE @Failures AS BIGINT
SET @Failures = (SELECT COUNT(*)
                 FROM dbo.Radius_Login_Records
                 WHERE LoggedAt BETWEEN @FromDT AND @ToDT
                   AND Authen_Failure_Code IS NOT NULL)

DECLARE @Successes AS BIGINT
SET @Successes = @Attempts - @Failures

DECLARE @OcaV1Hits AS BIGINT
SET @OcaV1Hits = (SELECT COUNT(DISTINCT LoginName)
                  FROM dbo.Radius_Login_Records
                  WHERE LoggedAt BETWEEN @FromDT AND @ToDT
                    AND EAPTypeID = 25)

DECLARE @OcaV2Hits AS BIGINT
SET @OcaV2Hits = (SELECT COUNT(DISTINCT LoginName)
                  FROM dbo.Radius_Login_Records
                  WHERE LoggedAt BETWEEN @FromDT AND @ToDT
                    AND EAPTypeID = 13)

SELECT @Attempts AS ConnectionAttempts,
       @Failures AS ConnectionFailures,
       (CAST(@Successes AS DECIMAL(38,2)) / CAST(@Attempts AS FLOAT) * 100) AS SuccessRate,
       @OcaV1Hits AS OcaV1Hits,
       @OcaV2Hits AS OcaV2Hits
[/code]
Please remember, I'm new to SSIS - so be detailed in your response. Thanks for your help!
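A hedged sketch of one way out (it assumes two package-level datetime variables, say User::FromDT and User::ToDT, computed once and mapped to the ? markers in the OLE DB Source editor): rewrite the query as a single pass so the parameters appear only in the outer WHERE clause, never inside subqueries, which sidesteps the parameter-detection problem:

[code]
-- A sketch only: single-pass version of the RADIUS summary with OLE DB
-- parameter markers (?) in the outer WHERE clause. SuccessRate can be
-- computed downstream with a Derived Column if preferred.
SELECT COUNT(*) AS ConnectionAttempts,
       SUM(CASE WHEN Authen_Failure_Code IS NOT NULL THEN 1 ELSE 0 END) AS ConnectionFailures,
       COUNT(DISTINCT CASE WHEN EAPTypeID = 25 THEN LoginName END) AS OcaV1Hits,
       COUNT(DISTINCT CASE WHEN EAPTypeID = 13 THEN LoginName END) AS OcaV2Hits
FROM dbo.Radius_Login_Records
WHERE LoggedAt BETWEEN ? AND ?
[/code]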
I am migrating data from one database to another. I want to supply the source and target database names through globally declared user variables (@sourcename, @targetname).
How can I map the variables in the OLE DB Source? I didn't find any option for that. Can somebody help?
I have a pretty complex query that aggregates lots of data and inserts multiple rows of that data into a reporting table. When I call this SPROC from SQL Server Management Studio, it executes in under 3 seconds. When I try to execute the same SPROC using .NET's SqlCommand object, the query runs indefinitely until the CommandTimeout is reached. Why would this SPROC behave differently with the same inputs when called from .NET? Thanks for your help!
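A hedged note: a classic cause of "fast in Management Studio, slow from ADO.NET" is parameter sniffing, or differing session SET options (ARITHABORT in particular) producing a different cached plan. One common mitigation, sketched below with hypothetical object names, is to copy the parameters into local variables inside the procedure so the plan is not built around one sniffed value; OPTION (RECOMPILE) on the expensive statement is another option in SQL 2005.

[code]
-- A sketch only: all names are hypothetical.
CREATE PROCEDURE dbo.uspBuildReport
    @FromDate datetime,
    @ToDate   datetime
AS
BEGIN
    -- Copying parameters into locals defeats parameter sniffing: the plan
    -- is compiled for unknown values instead of the first sniffed ones.
    DECLARE @FromDateLocal datetime, @ToDateLocal datetime
    SET @FromDateLocal = @FromDate
    SET @ToDateLocal   = @ToDate

    INSERT INTO dbo.ReportingTable (SomeKey, SomeAggregate)
    SELECT SomeKey, SUM(SomeValue)
    FROM dbo.SourceData
    WHERE LoggedAt BETWEEN @FromDateLocal AND @ToDateLocal
    GROUP BY SomeKey
END
[/code]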
Hi everyone... I'm trying to execute this update statement, and it takes an eternity. Any ideas on how to rewrite it or speed it up?
It's a several-step process; below is everything that I run, one step at a time. The final update statement is what takes so long. It should only affect about 2,600 rows out of a potential 9,000, which is why I'm confused about the response time.
[code]
SELECT d.olddevicename, de.device, d.newdevicename
INTO #temp9
FROM dns d, devices de
WHERE de.device = d.olddevicename

UPDATE #temp9
SET device = newdevicename
WHERE olddevicename = device

UPDATE devices
SET device = #temp9.device
FROM #temp9, devices
WHERE #temp9.device IN (SELECT #temp9.device
                        FROM #temp9, devices
                        WHERE #temp9.olddevicename = devices.device)
[/code]
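A hedged observation: the final statement cross joins #temp9 against devices both in the outer FROM and inside the IN subquery, which multiplies the work. If dns.olddevicename matches devices.device one-to-one (an assumption), the whole three-step process collapses into a single joined UPDATE:

[code]
-- A sketch only: assumes each devices.device matches at most one
-- dns.olddevicename row.
UPDATE de
SET de.device = d.newdevicename
FROM devices de
INNER JOIN dns d
    ON de.device = d.olddevicename
[/code]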
I have three scheduled jobs: one runs once a day, one runs once per hour, and another runs every 17 minutes. It is a NetIQ application. Last night I scheduled a SQL Server maintenance job, which ran at 2:00 AM and 4:00 AM. This morning, I came into the office and found all my jobs were still running, all blocked by the first three jobs. I had to kill all of them. This afternoon, I kicked off one of my many DTS packages, which usually runs in about 40 minutes, but it failed. I tried several times with no luck. I suspected that a user table or a stored procedure was corrupted. After I recycled the server and dropped and recreated the table and the stored procedure, the package ran fine. The stored procedure involves many updates and inserts.
The question I have is: is it possible that this problem was caused by my killing the unfinished jobs (especially the SQL maintenance job)?
NOTE: the SQL maintenance job does not include backups of the database or transaction log.
My backups are running 5-6 hours on SQL 2000. I'm sure they used to take only an hour or so. On another server, backing up the same database (both about 50 GB), the backup takes only 45 minutes to 1 hour. What can I look at to see why it's taking so long?
Trying to come up with a way to monitor (without Profiler, hopefully with a job and a SELECT statement) a specific SQL job that may cause a problem if its duration is too long. It seems that there is an sp called sp_sqlagent_log_jobhistory that shoves a record into sysjobhistory, but only after all the job steps run. Has anyone tried this before?
Hello gurus. I am using SQL 2005, and one job shows a status of 'executing' in the job monitor. How can I check how long this job has been running? Please advise.
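A hedged sketch relevant to both of the preceding questions: in SQL 2005, msdb.dbo.sysjobactivity records a start time when the Agent starts a job and a stop time when it finishes, so currently-executing jobs and their running duration can be read with a plain SELECT (strictly, the query should also be limited to the most recent Agent session in msdb.dbo.syssessions; this sketch omits that):

[code]
-- A sketch only: list jobs that are currently executing and how long
-- they have been running.
SELECT j.name,
       a.start_execution_date,
       DATEDIFF(minute, a.start_execution_date, GETDATE()) AS minutes_running
FROM msdb.dbo.sysjobactivity a
INNER JOIN msdb.dbo.sysjobs j
    ON j.job_id = a.job_id
WHERE a.start_execution_date IS NOT NULL
  AND a.stop_execution_date IS NULL
[/code]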
I've got a server (SQL 2K, Win2K) where the backups have started running long. The database is a bit largish: 150 GB or so. Up until last month, the backups were taking on the order of 4 to 5 hours, depending on the level of activity on the server. I'm using a T-SQL script in SQL Agent to run the backups: native SQL backup to an AIT tape drive. Now, for no apparent reason, the backups are taking on the order of 24 to 26 hours. The backups complete successfully, no errors, just taking an outrageously long time to complete. DBCCs check out AOK, no problems with the database. No changes to the machine. No hardware changes. No software changes. Weird. Multiple tape media have been tried; it's not a case of a tape going bad. We've had no problems with this box for almost 4 years. Now it's gettin' jiggy with us! Any ideas on where to start with this one? Thanks in advance.
Hi. I'm fairly new to SSRS, and very new to this forum. I have a report based on a stored procedure. I've optimized the procedure so that it runs in 2-4 minutes (previously over half an hour). However, when I run the report that calls the sp, it runs forever (well over 45 minutes in some cases), and the users basically give up on it. Any ideas why this happens, and what steps can I take to improve performance?
I need to execute a long-running package (it takes about 16 hours to finish) to load a data warehouse for the first time with all historical data. This package is a master package that executes other packages; I log the package's start and finish times in a table to manage future incremental loads.
I executed the package on the SQL Server where it is saved, but after it had been running for 8 hours, a new instance of the package started automatically. Then two more started, one every two hours.
I set MaxConcurrentExecutables = 4; could this be causing this strange behavior?
I'm trying to optimize a long-running (several hours) query. This query is a cross join on two tables. Table 1 has 3 fields: ROWID, Lat and Long. Table 2 has Name, Addr1, Addr2, City, State, Zip, Lat, Long.
Both tables have LatRad (Lat in radians) and LonRad (Lon in radians). Sin and Cos values of Lat and Lon are calculated and stored, to be used in the distance formula.
What I'm trying to do here is find the nearest dealer (Table 2) for each of Table 1's rows. The SELECT statement takes a long time to execute, as there are about 19 million recs in table 1 and 1,250 rows in table 2. I ran into log issues (filling the transaction log), so I'm currently using table variables and have split up the process into 100,000 recs at a time. I cross join and calculate the distance (@DistValues), then find the minimum distance (@MinDistance) for each rowid, and then the result is inserted into another table (ResultTable).
[code]
distance = 3963.1 *
    CASE WHEN CAST(S.SinLat * T.SinLat + S.CosLat * T.CosLat * COS(T.LonRad - S.LonRad)
                   AS numeric(20,15)) NOT BETWEEN -1.0 AND 1.0
         THEN 0.0
         ELSE ACOS(CAST(S.SinLat * T.SinLat + S.CosLat * T.CosLat * COS(T.LonRad - S.LonRad)
                        AS numeric(20,15)))
    END
FROM dbo.TopNForProcess T, dbo.Table2 S
WHERE ISNULL(T.Lat, 0) <> 0 AND ISNULL(T.Lon, 0) <> 0
[/code]
[code]
INSERT INTO @MinDistance
SELECT DataSeqno, MIN(distance)
FROM @DistValues
GROUP BY DataSeqno

INSERT INTO ResultTable (DataSeqno, Lat2, Lon2, StoreNo, Lat1, Long1, distance)
SELECT D.DataSeqno, D.Lat2, D.Lon2, D.StoreNo, D.Lat1, D.Long1, M.distance
FROM @DistValues D
INNER JOIN @MinDistance M
    ON D.DataSeqno = M.DataSeqno AND D.distance = M.distance
[/code]
I created a view called TopNForProcess, which looks like this. This cut down the processing time compared to when I had the same query as a subquery.
[code]
SELECT TOP (100000) DataSeqno, Lat, Lon, LatRad, LonRad, SinLat, CosLat, SinLon, CosLon
FROM Table1
WHERE (DataSeqno NOT IN (SELECT DataSeqno FROM dbo.ResultTable))
  AND (ISNULL(Lat, 0) <> 0)
  AND (ISNULL(Lon, 0) <> 0)
[/code]
I have indexes on Table1 (one on ROWID and another on Lat and Lon) and on Table2 (Lat and Long).
Is there any way this can be optimized/improved? This is already in a stored procedure.
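A hedged idea (SQL 2005 syntax; the names follow the fragments above, including the assumption that Table2 carries a StoreNo column): CROSS APPLY with TOP 1 finds each row's nearest dealer directly, which avoids materializing the full 19M x 1,250 cross join into @DistValues and then re-scanning it for the minimum. With only 1,250 dealers, each row still compares against all dealers, but nothing is written except the final result:

[code]
-- A sketch only: for each source row, pick the single nearest dealer.
-- Assumes the precomputed SinLat/CosLat/LonRad columns described above.
SELECT T.DataSeqno, N.StoreNo, N.distance
FROM dbo.TopNForProcess T
CROSS APPLY (
    SELECT TOP 1
           S.StoreNo,
           3963.1 * ACOS(
               CASE WHEN CAST(S.SinLat * T.SinLat
                              + S.CosLat * T.CosLat * COS(T.LonRad - S.LonRad)
                              AS numeric(20,15)) NOT BETWEEN -1.0 AND 1.0
                    THEN 1.0   -- clamp rounding overshoot; ACOS(1.0) = 0
                    ELSE CAST(S.SinLat * T.SinLat
                              + S.CosLat * T.CosLat * COS(T.LonRad - S.LonRad)
                              AS numeric(20,15))
               END) AS distance
    FROM dbo.Table2 S
    ORDER BY distance
) N
[/code]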
I am new to SSIS. Does anyone know how to verify the number of records that I load from a CSV file into a SQL database table?
For example, the source file is called product.csv and the target table, in the database named DSS, is PRODUCT. I load data from the flat file to the table; then I need verification: if the counts between source and target do not match, send an e-mail to me.
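One hedged approach: in the data flow, attach a Row Count transformation to the flat-file path to capture the source count in a package variable, then compare it to the table in an Execute SQL Task and alert with Database Mail. A sketch of the comparison step (the variable, mail profile, and address are illustrative, and sp_send_dbmail requires Database Mail to be configured):

[code]
-- A sketch only: ? is mapped to the SSIS variable holding the source
-- row count (e.g. User::SourceRowCount, hypothetical) in the Execute
-- SQL Task's parameter mappings.
DECLARE @SourceCount int, @TargetCount int
SET @SourceCount = ?
SELECT @TargetCount = COUNT(*) FROM DSS.dbo.PRODUCT

IF @SourceCount <> @TargetCount
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = 'DefaultProfile',     -- hypothetical mail profile
        @recipients   = 'me@example.com',     -- hypothetical address
        @subject      = 'PRODUCT load: row count mismatch',
        @body         = 'Source and target row counts do not match.'
[/code]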
I am debugging a Data Flow task in my SSIS package. When I run the package in debug mode, one of the OLEDB Data Sources turns red. I have rerouted all error output to a flat file and put a Data Viewer on that path: no rows get sent. When I click the Preview button on this component in design mode, I see the expected data and get no error messages. The connection does a simple table access, no SQL command. I don't see anything different between this component and other OLEDB sources in the same package that don't trigger any errors. I've tried dropping and re-creating the component, with the same results.