Transform Large Volume Of Data. Should It Be Processed Chunk By Chunk?
Dec 16, 2007
Hi,
I have to transform about 60 million rows of data, and it runs so slowly that it never finishes in my testing. Should I process it chunk by chunk? Or are there any other techniques I can use (I am using a data flow task)? Thanks for the advice.
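One common chunking approach is to drive the data flow from a ranged source query (in SSIS, a For Loop container feeding a parameterized source). A minimal T-SQL sketch of the slicing logic only, assuming a hypothetical SourceTable with an integer Id key:

DECLARE @lo INT, @hi INT, @chunk INT
SET @chunk = 100000
SELECT @lo = MIN(Id), @hi = MAX(Id) FROM dbo.SourceTable

WHILE @lo <= @hi
BEGIN
    -- this SELECT would be the source of one data flow execution,
    -- parameterized on the range boundaries
    SELECT *
    FROM dbo.SourceTable
    WHERE Id BETWEEN @lo AND @lo + @chunk - 1

    SET @lo = @lo + @chunk
END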
Here is my dilemma. I have a 120 GB database in which I need to mask customer credit card numbers. The field is a varchar(16). I need to update the field so that we only store the first 4 numbers and the last 4 numbers of the credit card, and insert * to fill in the rest of the credit card number.

I was going to do this as a loop using the following code:

While Exists (Select Top 10 * From Header Where IsNumeric(CCNbr) = 1)
Begin
    Begin Transaction T1
    Update Header
    Set Header.CCNbr = Left(D1.CCNbr, 4) + '********' + Right(D1.CCNbr, 4)
    From (Select Top 10 * From Header Where IsNumeric(CCNbr) = 1) as D1
    Commit Transaction T1
    If Not Exists (Select Top 10 * From Header Where IsNumeric(CCNbr) = 1)
        Break
    Else
        Continue
End

In theory this only selects the top 10 rows, updates them, dumps the log, and moves on to the next 10 until all the rows are updated. I tried running this on my test database and it fills up the transaction log.

Can anyone tell me the best way to go about doing what I need?
Thanks
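For reference, a minimal sketch of the batched-update pattern this post is reaching for. Note that the original UPDATE never correlates the derived table D1 back to Header, so it touches every row; the sketch joins on a key (a hypothetical HeaderID column) and backs up the log between batches so it cannot grow unbounded. MyDb, HeaderID, and the batch size are assumptions to adapt:

WHILE EXISTS (SELECT * FROM Header WHERE IsNumeric(CCNbr) = 1)
BEGIN
    -- mask one batch, correlated back to the target table on its key
    UPDATE H
    SET H.CCNbr = LEFT(H.CCNbr, 4) + '********' + RIGHT(H.CCNbr, 4)
    FROM Header AS H
    JOIN (SELECT TOP 10000 HeaderID
          FROM Header
          WHERE IsNumeric(CCNbr) = 1) AS D1
        ON H.HeaderID = D1.HeaderID

    -- keep the log small between batches (or rely on frequent log backups)
    BACKUP LOG MyDb WITH TRUNCATE_ONLY
END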
Hi. I am trying to put a huge chunk of text into my database, for example information on a particular product, which has more than 2000 characters. I have seen the datatype "nvarchar(MAX)" in SQL Server 2005 and was wondering if I can use this to store my text. Thanks
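That is exactly what nvarchar(MAX) is for: it stores up to 2 GB of Unicode text while still behaving like an ordinary string column. A minimal sketch, assuming a hypothetical Products table:

CREATE TABLE dbo.Products (
    ProductId   INT IDENTITY(1, 1) PRIMARY KEY,
    Description NVARCHAR(MAX) NULL   -- far beyond the old 4000-character nvarchar limit
)

INSERT INTO dbo.Products (Description)
VALUES (N'...a product description well over 2000 characters...')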
Following an upgrade to SSRS2005, Reporting Services worked EXCEPT from within applications or from scheduled jobs. Running reports from application-generated URLs produced the following error: 'An internal error occurred on the report server. See the error log for more details'. These same reports, however, ran perfectly from within SSRS. After running them once from Reporting Services, they subsequently run without problem when called by applications or jobs.
Examples of these errors include the following (stack traces available if needed):
ReportingServicesService!runningjobs!13!5/27/2007-01:57:23:: i INFO: Adding: 1 running jobs to the database
ReportingServicesService!chunks!f!05/27/2007-01:58:34:: e ERROR: LockSnapshotForUpgrade: System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
ReportingServicesService!chunks!1a!05/27/2007-01:58:34:: e ERROR: GetChunkPointerAndLength: System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
ReportingServicesService!chunks!11!05/27/2007-01:58:34:: e ERROR: ### SnapshotConverter(00d68151-85e5-4669-a0de-28ed81bd091c, True), System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
Log entries below correspond to the following attempts to run a specific report.
ENF_Comprehensive_Report  5/31/2007 9:43:37 AM   rsInternalError  (one of many user attempts to run from app after upgrade)
ENF_Comprehensive_Report  5/31/2007 10:50:57 AM  rsSuccess        (first successful run after upgrade; run from within SSRS)
ENF_Comprehensive_Report  5/31/2007 11:09:59 AM  rsSuccess        (first successful run by user from application)
It appears that after the upgrade Reporting Services was attempting to use non-existent or invalid chunks and snapshots to satisfy application- or job-originated requests. When first called from within SSRS, the old chunk was accessed but then appears to have been ignored, with subsequent calls running without a chunk. Our workaround for the problem was to manually run each of our 200+ reports from within SSRS to 'initialize' them for applications and jobs.
Typical application request failure for the report:

w3wp!library!a!5/31/2007-09:43:22:: i INFO: Cleaned 0 batch records, 0 policies, 4 sessions, 0 cache entries, 5 snapshots, 46 chunks, 0 running jobs, 0 persisted streams
w3wp!library!a!05/31/2007-09:43:37:: i INFO: Call to RenderFirst( '/Enforcement/Historical/ENF_Comprehensive_Report' )
w3wp!chunks!a!05/31/2007-09:43:37:: i INFO: Returning old chunk for: (24788648-41c5-43d8-ba6b-409662211a37, 'CompiledDefinition', 0)
w3wp!runningjobs!6!5/31/2007-09:44:29:: i INFO: Adding: 1 running jobs to the database
w3wp!chunks!a!05/31/2007-09:45:37:: e ERROR: ### SnapshotConverter(24788648-41c5-43d8-ba6b-409662211a37, True), System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. <stack trace entries here>
w3wp!library!a!05/31/2007-09:45:37:: e ERROR: Throwing Microsoft.ReportingServices.Diagnostics.Utilities.InternalCatalogException: An internal error occurred on the report server. See the error log for more details., ; Info: Microsoft.ReportingServices.Diagnostics.Utilities.InternalCatalogException: An internal error occurred on the report server. See the error log for more details. ---> System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. <stack trace entries here> --- End of inner exception stack trace ---
w3wp!webserver!a!05/31/2007-09:45:37:: e ERROR: Reporting Services error Microsoft.ReportingServices.Diagnostics.Utilities.RSException: An internal error occurred on the report server. See the error log for more details. ---> Microsoft.ReportingServices.Diagnostics.Utilities.InternalCatalogException: An internal error occurred on the report server. See the error log for more details. ---> System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
First run of the report from within Reporting Services following the upgrade:

w3wp!library!19!05/31/2007-10:50:30:: Call to GetPermissionsAction(/Enforcement/Historical/ENF_Comprehensive_Report).
w3wp!library!19!05/31/2007-10:50:30:: Call to GetSystemPropertiesAction().
w3wp!library!19!05/31/2007-10:50:30:: Call to GetPropertiesAction(/Enforcement/Historical/ENF_Comprehensive_Report, PathBased).
w3wp!chunks!19!05/31/2007-10:50:30:: i INFO: Returning old chunk for: (24788648-41c5-43d8-ba6b-409662211a37, 'CompiledDefinition', 0)
w3wp!library!1!05/31/2007-10:50:31:: Call to GetSystemPermissionsAction().
w3wp!library!1!05/31/2007-10:50:31:: Call to GetPropertiesAction(/Enforcement/Historical/ENF_Comprehensive_Report, PathBased).
w3wp!library!e!05/31/2007-10:50:31:: Call to GetSystemPropertiesAction().
w3wp!library!1!05/31/2007-10:50:56:: Call to GetPermissionsAction(/Enforcement/Historical/ENF_Comprehensive_Report).
w3wp!library!6!05/31/2007-10:50:56:: Call to GetSystemPropertiesAction().
w3wp!library!6!05/31/2007-10:50:56:: Call to GetPropertiesAction(/Enforcement/Historical/ENF_Comprehensive_Report, PathBased).
w3wp!library!6!05/31/2007-10:50:56:: Call to GetSystemPermissionsAction().
w3wp!library!6!05/31/2007-10:50:56:: Call to GetPropertiesAction(/Enforcement/Historical/ENF_Comprehensive_Report, PathBased).
w3wp!library!6!05/31/2007-10:50:56:: Call to GetSystemPropertiesAction().
w3wp!library!1!05/31/2007-10:50:57:: i INFO: Call to RenderFirst( '/Enforcement/Historical/ENF_Comprehensive_Report' )
w3wp!webserver!1!05/31/2007-10:50:58:: i INFO: Processed report.
Subsequent application request for the same report:

w3wp!library!11!05/31/2007-11:09:59:: i INFO: Call to RenderFirst( '/Enforcement/Historical/ENF_Comprehensive_Report' )
w3wp!webserver!11!05/31/2007-11:10:00:: i INFO: Processed report. Report='/Enforcement/Historical/ENF_Comprehensive_Report', Stream=''
Environment: 32-bit Windows Server 2003 R2 Enterprise SP2 with a single instance of SQL2000 SP3 hosting the SSRS2005 SP2 metadata; IIS running in 6.0 isolation mode; separate dedicated IIS server; dedicated domain user accounts for service accounts and application pools. Over 200 reports deployed, half of which are run by users from within applications.
SSRS upgrade process: back up existing databases, settings, and keys; uninstall SSRS2000; perform a 'files only' install of SSRS2005 to the existing SQL2000 instance; manually configure SSRS2005 and IIS; upgrade the Reporting Services databases.
This is a follow-up to a previous posting to this forum (30-May 6:28 pm UTC): Problems following SSRS database upgrade.
Hi,
We need to use a free database for a project because of a tight budget. Is MSDE OK for handling a large volume of data and 70-80 users? My understanding is that MSDE is optimized for 5 concurrent users. Is MySQL better than MSDE?
Thanks,
Ben
I have a question about training on large volumes of data. In this case, the training takes a long while to complete; is there anything we can do to improve that? I know we obviously can't split the training dataset into different smaller datasets. What can we do to improve it?

I hope my question is clear.

Thank you very much in advance for your advice and help, and I am looking forward to hearing from you shortly.
How do I set an On Failure when a Transform Data Task fails? I'm pulling in a text file, and if I set the On Failure for the text file connection it says "Defining precedences between the selected items is not valid", and if I set the On Failure for the database connection it doesn't execute when the Transform Data Task fails. Any ideas on how to capture this error?
We've just patched our Dev server and, of our 3 servers (Dev, Test, Prod), we now see major changes in the output of a raw data import process that runs nightly. Each night we import tables from a Remedy helpdesk system running on Oracle and place each ticket into a row on a table, tracking the changes and history of the ticket, etc. This includes tracking when supervisor groups are changed during the course of a ticket (i.e. Helpdesk to Data Comms to Billing, etc.). Now, after SP1, the results on Dev are skewed, with partial strings showing in the From and To fields, broken in odd places (like the middle of words).
Has anyone noticed any changes in the way post-SP1 SQL Server 2005 processes strings? Does it automatically trim spaces or convert NULLs, etc.?
The data import should be identical between Dev and the other servers.
I am facing problems with concurrent access in SQL Server 2000. The scenario is that the DB contains one huge de-normalized table containing 40 million records.
The application frequently queries this table to populate other derived tables, and the SQL queries take a long time to return results.
So if one query is executing, another user's query goes into a wait state. Please suggest how I can improve this.
My day started with loading a huge volume of data, and my data flow task failed to do so.
My data flow has a flat file connected to an OLE DB target. This is a one-to-one mapping. My source file contains 50 lakh (5 million) records and is 500 MB in size.
I'm processing the data with all the default buffer settings. I have 4 CPUs in my server.
The system process DTSDebug.exe is using more than 2 GB of paged memory. My average CPU usage is 70%, with one of those CPUs hitting 100% utilization.
I'm very new to SSIS, so please give me some info on how to set my buffers, and is there any PDF on performance tuning in SSIS?
Do we have any bulk-load transformation in SSIS to load into DB2 UDB?
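On the buffer question above: each Data Flow Task exposes DefaultBufferMaxRows and DefaultBufferSize properties that can be tuned in the designer or overridden at execution time. A hedged dtexec sketch; the package name, the task name "Data Flow Task", and the values are placeholders, and the exact property path depends on how your task is named:

dtexec /F "LoadFlatFile.dtsx" ^
    /SET "\Package\Data Flow Task.Properties[DefaultBufferMaxRows];50000" ^
    /SET "\Package\Data Flow Task.Properties[DefaultBufferSize];20971520"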
I am in the process of choosing between SQL Workgroup and Standard Edition. I see the differences in features on the comparison table, but I do not see any references to differing capabilities in handling transactions.
Are there any differences between Workgroup and Standard in terms of transaction/data handling capabilities? I.e., does Standard have a superior capability, handling X times more TPMs than Workgroup?
If not, am I correct to assume that this is totally determined by hardware configuration (# of CPUs, processor speed, HD speed, RAM)?
If the data volume / transaction handling is solely determined by hardware configuration, and I know the number of transactions and the amount of reads/writes per second, where would be a good reference to find out what kind of hardware configuration I need? (Ideally, once I know the hardware configuration, I guess I would be able to determine whether I need Workgroup or Standard.)
I have been asked to design a solution for a client of mine who basically requires the daily analysis and reconciliation of the differences between 2 extremely large text files.
The files are not in an identical format but are both in some form of delimited format (one is CSV, the other is a little more complex). For the sake of this question, let's assume that I can effectively import each file into an MS SQL table.
Each file will have in excess of 100,000 rows each day (new data for each day).
Whilst I know that MS SQL easily has the capacity to store the data, is there a recommended way to tackle the potential problems? (I imagine that performance is important... they will be running the report every day.)
Or is building the solution as simple as importing the data into 2 tables, then querying the differences and outputting them as a report using Crystal?
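For the difference query itself, SQL Server 2005's EXCEPT makes a set-based comparison straightforward. A minimal sketch, assuming the two imports land in hypothetical staging tables FileA and FileB with matching column lists (the column names here are illustrative):

-- rows present in FileA but missing (or different) in FileB
SELECT 'A only' AS Source, d.* FROM (
    SELECT AccountNo, TradeDate, Amount FROM dbo.FileA
    EXCEPT
    SELECT AccountNo, TradeDate, Amount FROM dbo.FileB
) AS d
UNION ALL
-- rows present in FileB but missing (or different) in FileA
SELECT 'B only', d.* FROM (
    SELECT AccountNo, TradeDate, Amount FROM dbo.FileB
    EXCEPT
    SELECT AccountNo, TradeDate, Amount FROM dbo.FileA
) AS d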
Hi
Your help with the following query will be highly appreciated. I've wasted a lot of time on it. The data definition is at the bottom.
Thanks

Business need: a user selects answers for a few questions. Answers are presented in the form of radio buttons.

Questions Table:
QuestionID  QuestionName
1           1.Rate your organization
2           10.Opportunity in your group
3           117.Effectiveness

Answers Table:
AnswerId  AnswerName      QuestionID
1         1.Best          1
2         2.Average       1
3         3.Worse         1
4         1.Hardly Any    2
5         2.not too much  2
6         3.Average       2
7         4.Great Deal    2
8         1.Strong        3
9         2.Minor         3
10        3.Nothing       3

Let's say user ABC answered as below:
For Question 1, she chose '2.Average'
For Question 2, she left it blank
For Question 3, she chose '2.Minor'
And for user XYZ:
For Question 1, she left it blank
For Question 2, she left it blank
For Question 3, she chose '3.Nothing'

Then the Results table will look like this:
ResultsId  QuestionID  AnswerId  User
1          1           2         ABC
2          3           9         ABC
3          3           10        XYZ

Desired Output:
User  1  10  117
----  -  --  ---
ABC   2      2
XYZ          3

Explanation:
--> Show ALL distinct users on the rows
--> Show ALL distinct QuestionNames on the columns, BUT only display the number before '.'
--> Show the chosen answer (AnswerName) for each user, but only display the number before '.'

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[Questions]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[Questions]
GO
CREATE TABLE [dbo].[Questions] (
    [QuestionId] [int] IDENTITY (1, 1) NOT NULL ,
    [QuestionName] [nvarchar] (1000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[Answers]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[Answers]
GO
CREATE TABLE [dbo].[Answers] (
    [AnswerId] [int] IDENTITY (1, 1) NOT NULL ,
    [AnswerName] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [QuestionId] [int] NULL
) ON [PRIMARY]
GO

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[Results]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
    drop table [dbo].[Results]
GO
CREATE TABLE [dbo].[Results] (
    [ResultId] [int] IDENTITY (1, 1) NOT NULL ,
    [QuestionId] [int] NULL ,
    [AnswerId] [int] NULL ,
    [UserId] [nchar] (40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
GO

INSERT Answers (AnswerName, QuestionId) VALUES ('1.Best', 1)
INSERT Answers (AnswerName, QuestionId) VALUES ('2.Average', 1)
INSERT Answers (AnswerName, QuestionId) VALUES ('3.Worse', 1)
INSERT Answers (AnswerName, QuestionId) VALUES ('1.Hardly Any', 2)
INSERT Answers (AnswerName, QuestionId) VALUES ('2.not too much', 2)
INSERT Answers (AnswerName, QuestionId) VALUES ('3.Average', 2)
INSERT Answers (AnswerName, QuestionId) VALUES ('4.Great Deal', 2)
INSERT Answers (AnswerName, QuestionId) VALUES ('1.Strong', 3)
INSERT Answers (AnswerName, QuestionId) VALUES ('2.Minor', 3)
INSERT Answers (AnswerName, QuestionId) VALUES ('3.Nothing', 3)
GO

INSERT Questions (QuestionName) VALUES ('1.Rate your organization')
INSERT Questions (QuestionName) VALUES ('10.Opportunity in your group')
INSERT Questions (QuestionName) VALUES ('117.Effectiveness')
GO

INSERT Results (QuestionId, AnswerId, UserId) VALUES (1, 2, 'ABC')
INSERT Results (QuestionId, AnswerId, UserId) VALUES (3, 9, 'ABC')
INSERT Results (QuestionId, AnswerId, UserId) VALUES (3, 10, 'XYZ')
GO
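A minimal cross-tab sketch against the DDL above, hard-coded for the three sample questions (a production version would build the CASE branches dynamically from the Questions table):

SELECT r.UserId AS [User],
       MAX(CASE WHEN LEFT(q.QuestionName, CHARINDEX('.', q.QuestionName) - 1) = '1'
                THEN LEFT(a.AnswerName, CHARINDEX('.', a.AnswerName) - 1) END) AS [1],
       MAX(CASE WHEN LEFT(q.QuestionName, CHARINDEX('.', q.QuestionName) - 1) = '10'
                THEN LEFT(a.AnswerName, CHARINDEX('.', a.AnswerName) - 1) END) AS [10],
       MAX(CASE WHEN LEFT(q.QuestionName, CHARINDEX('.', q.QuestionName) - 1) = '117'
                THEN LEFT(a.AnswerName, CHARINDEX('.', a.AnswerName) - 1) END) AS [117]
FROM Results r
JOIN Questions q ON q.QuestionId = r.QuestionId
JOIN Answers  a ON a.AnswerId  = r.AnswerId
GROUP BY r.UserId

Unanswered questions come back as NULL, which renders as the blanks in the desired output.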
I'm just getting started with SSIS and need pointing in the right direction with my first attempt to create a simple transform lookup.
The objective is to copy some columns from a table in one SQL Server 2005 db into another table in a different db on the same server. There are some simple column transforms involved but one column in the target table needs populating from a lookup as a result of a SQL Query.
So far, I have one OLE DB Source object linked to one OLE DB Destination in 'Data Flow'. I've configured the column transforms and these appear OK. However, the problem is that I can't see how to set up the Lookup. I have added a lookup transform object to the Data Flow space, linked from the appropriate OLE DB Source. The simple SQL query returns the correct value when previewed. If I try to connect the output of this lookup object to the same OLE DB Destination that receives the output from the first column transform, I get a warning message:
"Cannot create connector. The destination component does not have any available inputs for use in creating a path"
There is a column available in the destination table, but I don't know how to direct the lookup output to it. I can see this available column by looking at the OLE DB Destination object properties and clicking 'Mappings'. The other columns show their links in the transformation, but the destination column intended for the lookup output is not showing any links.
How do I link up the output from the lookup object to the OLE DB Destination object? Is it OK to have two inputs into the OLE DB Destination object (i.e. 1. the output from the OLE DB Source; and 2. the output from the lookup object)?
I am studying indexes and keys. I have a table whose first column holds a fixed-width string of data; a view later parses that string based on the data types within the fixed-width specification.
Example: column A holds the fixed-width string (name, phone, cost of house, zipcode, county, state, country), which a view will later split; column B is the source filename of the data load (varchar(256)).
a. Would there be a benefit to adding a clustered or nonclustered index? (If so, which, and a pointer on why?)
b. Is there a benefit to making one of these two columns a primary key (millions of records), or to adding a third new column as a PK?
c. View: this parses the data in column A so it ends up looking more like "name, phone, cost of house, zipcode, county, state, country", each having its own column.
Any pros/cons of adding indexes (if so, which) to the view instead of the table, or to both, once the data is parsed?
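A minimal sketch of the shape being described, with an added surrogate key (one common answer to question b for million-row staging tables) and a parsing view; the column widths and SUBSTRING offsets are placeholders for the real fixed-width spec:

CREATE TABLE dbo.RawLoad (
    RawLoadId  INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,  -- surrogate PK
    RawLine    VARCHAR(400) NOT NULL,    -- the fixed-width record (column A)
    SourceFile VARCHAR(256) NOT NULL     -- the load filename (column B)
)
GO

CREATE VIEW dbo.vParsedLoad
AS
SELECT RawLoadId,
       SUBSTRING(RawLine, 1, 30)  AS Name,        -- offsets are illustrative only
       SUBSTRING(RawLine, 31, 10) AS Phone,
       SUBSTRING(RawLine, 41, 12) AS CostOfHouse,
       SUBSTRING(RawLine, 53, 5)  AS ZipCode,
       SourceFile
FROM dbo.RawLoad
GO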
I have a confusing situation that I CAN resolve with cursors, BUT cursors are far too performance-hungry to be run.
Tables: LogFileIn, LogFileOut.
I want to look in LogFileIn (LFI) for files that have a NULL LogFileOut (the response to the LFI) AND don't have any other value.
A file can have multiple LFOs, depending on how many times that file gets processed.
It is confusing, but bear with me.
Code:
select * from LogFileIn where IdentityFileChar = 'BPRTFILE' and responseLogFileOutId is null
This does pull out a list of files that haven't been processed, BUT that same file might have been processed again and succeeded. Therefore the results I get aren't accurate.
With the cursor I would have taken the filename from the LFI and used it to check whether that filename comes up again in the LFO as processed.
Did you get that? And can you help? I'm confusing myself!
This table records every time a file is run, so there might be 20 identical LFIs, all with different LFOs... I'm looking for LFIs that have NULL, and only NULL.
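A set-based sketch of that requirement: group by filename and keep only the groups where every response is NULL. The FileName column name is an assumption; substitute whatever identifies the file across its LFI rows:

SELECT FileName
FROM LogFileIn
WHERE IdentityFileChar = 'BPRTFILE'
GROUP BY FileName
-- COUNT(col) counts only non-NULL values, so 0 means "NULL, and only NULL"
HAVING COUNT(responseLogFileOutId) = 0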
What is the order in which conditions are processed for a SQL query? I.e., for select * from table1, table2 where cond1 and cond2 and cond3, which condition will be processed first? (For optimization purposes, should the condition cutting down the maximum number of rows be placed first or last?)
I'm working with SQL Server Reporting Services (SQL Server 2005). I tried to configure a data-driven subscription, but the run failed with the error: Done: 1 processed of 1 total; 1 errors.
I cannot find anything in the error log ReportServerService__<datetime>.log located at <Program Files>\Microsoft SQL Server\MSSQL.#\Reporting Services\LogFiles.
How can I find useful information about the error message?
Hello guys, here is my problem: I am developing an ASP.NET web app in .NET 2.0. I have some sensitive data in my database, which is encrypted using DES (with a key known only to the top-level authorities). Now there is an option to change the secret key. On changing the key, the sensitive data has to be decrypted using the old key and then encrypted again using the new key. Now, if the number of records increases, I am afraid that it might take a long time and the application might look as if it has hung. Guys, I have no clue how to do this. If you have any idea how to implement it, please let me know. Any help would be appreciated.
Vignesh
Hi, I'm currently trying to retrieve results from a large dataset; there are over 45,000 records and I need to use them all to perform counts etc. I have set up views, but my page is still being returned slowly. Is there anything I can do to speed this up? Thanks, Gemma
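If the counts are over stable groupings, one SQL Server option is an indexed view that precomputes them (Enterprise/Developer use it automatically; other editions need the NOEXPAND hint). A minimal sketch, assuming a hypothetical Records table with a Category column:

CREATE VIEW dbo.vCategoryCounts
WITH SCHEMABINDING
AS
SELECT Category, COUNT_BIG(*) AS RecordCount   -- COUNT_BIG is required in indexed views
FROM dbo.Records
GROUP BY Category
GO

-- the unique clustered index is what materializes the view
CREATE UNIQUE CLUSTERED INDEX IX_vCategoryCounts ON dbo.vCategoryCounts (Category)
GO

-- on non-Enterprise editions, query it with NOEXPAND:
SELECT Category, RecordCount FROM dbo.vCategoryCounts WITH (NOEXPAND)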
I am trying to import data into SQL Server 7. The table will be 700-800 columns, and the data will be about 150,000 records at a time. The data source is flat file.
First I create the table using a database schema, and secondly I would like to populate the table. The problem is that most of the data is numeric and is to be used for statistical analysis.
So far I have tried Bulk Insert, bcp, and dts. DTS is the only method that has worked in any way, shape or form, but that requires importing each column as a Varchar. Importing to my pre-created table doesn't work, because it is interpreting some of the source columns as character data and refusing to insert them into an int field. Bulk Insert and bcp both give error messages, and I am wondering if that is because of the size of the insert statement that is required to handle so many fields.
For the moment I am just trying to import the data in any way, but eventually, it will have to be run as an automated process, with the table structure probably needing to be altered as well.
Any help/suggestions would be very gratefully received.
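For what it's worth, BULK INSERT can usually be coaxed into a pre-created table via a format file, which tells it how to map and type each source field; with 700-800 columns the format file would be generated rather than hand-written. A hedged sketch (table name and paths are hypothetical):

-- the format file can be generated once from the pre-created table with:
--   bcp MyDb.dbo.StatsData format nul -f stats.fmt -c -t, -T
BULK INSERT dbo.StatsData
FROM 'C:\loads\stats.txt'
WITH (
    FORMATFILE = 'C:\loads\stats.fmt',
    FIRSTROW   = 1,
    MAXERRORS  = 0,
    TABLOCK           -- minimally logged when the recovery model allows it
)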
I have a web site that allows users to enter large strings into a database (comments, etc). What is the best way to do that? Right now I have them limited to 25 characters and the data type is varchar. Is there a better way?
When should you use a table variable and when a temp table? I told the interviewer that when the row count is small, like hundreds or a thousand, you use a table variable, and otherwise a temp table. After that he asked what I meant by "less data": even a thousand rows may involve multiple columns, so that small row count could still make a huge data set.
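For reference, the two forms under discussion; a sketch only, since the real decision involves statistics, recompiles, and tempdb behavior rather than row count alone:

-- table variable: no statistics, scoped to the batch/procedure
DECLARE @Orders TABLE (
    OrderId INT PRIMARY KEY,
    Payload VARCHAR(400)
)

-- temp table: has statistics and can be indexed after creation
CREATE TABLE #Orders (
    OrderId INT PRIMARY KEY,
    Payload VARCHAR(400)
)
CREATE INDEX IX_Payload ON #Orders (Payload)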
I want to build a system that will have about 1 million rows in a table in a SQL Server database. I am using this for a web application and accessing it via a JDBC type 4 driver, but displaying only 20 records at a time using pagination (as in Google). What will be the best way to go about this?
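Assuming SQL Server 2005, one common approach is to number the rows on the server and fetch only the requested page, so only 20 rows ever cross the wire; the table and column names below are hypothetical:

-- fetch page @page (1-based), 20 rows per page
DECLARE @page INT
SET @page = 3

SELECT Id, Title
FROM (
    SELECT Id, Title,
           ROW_NUMBER() OVER (ORDER BY Id) AS rn
    FROM dbo.Items
) AS t
WHERE t.rn BETWEEN (@page - 1) * 20 + 1 AND @page * 20
ORDER BY t.rn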
I have a database that is 70 GB big. One of the tables has over 350 million rows of data, and I need to delete about 1/3 of the data in that one table. I was going to use a simple delete command to delete the unnecessary data, something like:

Delete from tablename where columname > '100'

However, it takes SOOO LONG to run, over 8 hours, if not longer... What is the best way to delete this data?

One idea was to create a temp table, move the needed data to the temp table, truncate the original table, and then move that data back. I'm not very good with SQL query language, so can someone give me an example of how to do this? Or if you have a different idea that would be faster, please let me know.
Thanks,
Sam
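A sketch of the classic SQL 2000-era batching pattern for a big delete, which keeps each transaction (and therefore the log) small; the batch size and the log-handling strategy are assumptions to adapt:

SET ROWCOUNT 10000            -- limit each DELETE to 10,000 rows
WHILE 1 = 1
BEGIN
    DELETE FROM tablename WHERE columname > '100'
    IF @@ROWCOUNT = 0 BREAK   -- nothing left to delete

    -- keep the log from growing: back it up (or truncate on a test box)
    BACKUP LOG MyDb WITH TRUNCATE_ONLY
END
SET ROWCOUNT 0                -- always reset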
Hi,
I have a SQL2000 server with 128m rows of data. I want to delete about 65m of that. So far I have bcp'ed the relevant data out and put it into another SQL database.

We have a small amount of space for our transaction log, so I cannot delete all 65m rows in one go. So far I have been doing them in 0.5m chunks, but it is extremely slow. Would a faster way be to bcp out the data I want to keep, truncate the table, and bulk import the data in again? What happens to the log size while a bulk import is happening, and is there another way of doing this?

Thanks for any help
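A sketch of that keep/truncate/reload route; the server, database, table, and KeepFlag predicate are all placeholders. With the database switched to bulk-logged (or simple) recovery and a table lock held, the import is minimally logged, which is what keeps the log small:

REM export only the rows to keep, in native format
bcp "SELECT * FROM MyDb.dbo.BigTable WHERE KeepFlag = 1" queryout keep.dat -n -T -S MYSERVER

REM then, in Query Analyzer:  TRUNCATE TABLE dbo.BigTable

REM reload in batches of 50,000 with a table lock for minimal logging
bcp MyDb.dbo.BigTable in keep.dat -n -T -S MYSERVER -b 50000 -h "TABLOCK"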
I want to store some binary things (pictures and so on), so I created a table which contains a "varbinary" data-type column.
But: 1. I used OPENROWSET to insert the large file into this table. 2. I used master..xp_cmdshell to retrieve the data out as a file. One strange thing happened: the sizes of the input and output are really different (the output is 1 KB bigger than the input file),
and it seems that the file is broken, with a different file format...
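For reference, a sketch of the import side on SQL Server 2005 (the Blobs table and path are hypothetical). On the export side, a size difference like this is commonly the prefix/terminator bytes bcp adds in its default modes; exporting with a format file that declares a zero-length prefix and no terminator for the column usually yields a byte-exact file:

-- import a file as a single BLOB; BulkColumn is the column name
-- OPENROWSET(BULK ...) exposes
INSERT INTO dbo.Blobs (Data)
SELECT BulkColumn
FROM OPENROWSET(BULK 'C:\temp\input.jpg', SINGLE_BLOB) AS b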