I have been developing a genealogy application using a SQL Server 2000 database and ASP .NET 2.0. In this application a process, Ged.Parse, converts data from the GEDCOM standard format (a heirachical file format that looks as if it was designed for 80-column cards) into my SQL Server database.
As we started to load reasonable quantities of data into the system we found that the on-line response became abysmal. This problem was fixed by defining a number of secondary indexes (response times dropped to under a second, from previously exceeding 2 minutes and often timing out). Unfortunately however the processing time of Ged.Parse then tripled, and it may now take up to an hour to process a GEDCOM. I believe that this is a byproduct of defining several indexes that are not needed by Ged.Parse itself, but which are of course maintained as Ged.Parse inserts new records into the database.
I am wondering what my best strategy is, apart from putting Ged.Parse into a background task and just letting it trickle away. (I will probably do this anyway). What I'd like to be able to do is to have Ged.Parse load records without creating the secondary indexes, and then create the indexes for the newly-added records as a penultimate step just before it makes them available for general use. Of course there is no way that you can do this: records in a table are either indexed or they are not.
Proposed change: recode Ged.Parse to load data into temporary tables, say NewPeople, NewFacts, etc., with these tables having only the indexes required by Ged.Parse. Then, as the last process in Ged.Parse run a SQL procedure with code like: -
Insert into People Select * From NewPeople
Delete from NewPeople
etc
This is a reasonable amount of programming, so before I make this change could somebody tell me: will this be significantly faster overall, or is this likely to make little or no improvement compared to the present process in which Ged.Parse loads data directly into People, Facts, etc? Two facts that may influence the answer. First, all record relationships are through GUIDs, so records in NewPeople, NewFacts, etc would already have their final key values. Second: although Ged.Parse needs to form relationships between records, these relationships are only within the new records (created from the same GEDCOM), and Ged.Parse does not need to relate any of these new records to earlier records.
I’m looking for clearity on partition switching. The idea is to use many BULK INSERT statements into table dbo.X_n in parallel and when BULK INSERT for table dbo.X_n is completed, switch dbo.X_n into dbo.bigdaddy. I think this is the fastest way to upload a couple hundred GB of data.
In learning about partition switching (in part) from The Data Loading Performance Guide under Partition SWITCH, I hear the instructions to say copy the main table exactly to become a target. But in that same step (#1), I read that we need to change the default file group of the target (dbo.X_n) from the default file group. Then it says I need to match indexes and lists the filegroup as something we need to match with the main table.
As an overview of the partition switching strategy, I think the whole point of BULK INSERT with partitioning is to have seperate files (in same group) to enable concurrent uploading where each table has its own file. Once the upload is completed to a table (dbo.X_n) then we do the partition switch into the main table (dbo.bigdaddy). The data we just uploaded doesn’t actually move, just the metadata for it.
When I read the instructions linked above, I hear “Don’t have the same filegroup on your target as the main table. You must have the same filegroup on your target as the main table.”
I’m looking for clearity on partition switching. The idea is to use many BULK INSERT statements into table dbo.X_n in parallel and when BULK INSERT for table dbo.X_n is completed, switch dbo.X_n into dbo.bigdaddy. I think this is the fastest way to upload a couple hundred GB of data.
In learning about partition switching (in part) from The Data Loading Performance Guide under Partition SWITCH, I hear the instructions to say copy the main table exactly to become a target. But in that same step (#1), I read that we need to change the default file group of the target (dbo.X_n) from the default file group. Then it says I need to match indexes and lists the filegroup as something we need to match with the main table.
As an overview of the partition switching strategy, I think the whole point of BULK INSERT with partitioning is to have seperate files (in same group) to enable concurrent uploading where each table has its own file. Once the upload is completed to a table (dbo.X_n) then we do the partition switch into the main table (dbo.bigdaddy). The data we just uploaded doesn’t actually move, just the metadata for it.
“Don’t have the same filegroup on your target as the main table. You must have the same filegroup on your target as the main table.”
Don't know if this is the right forum to be asking this, but I'll give it a try...
I'm relativelly a beginner in SQL Server and T-SQL in general. The problem I'm trying to solve is the following:
The big picture is that I have data coming from different data sources which I need to store on a database for later reference. Each data source might have a different set of measurements. For example, data source 1 might log Pressure and Humidity while data source 2 logs Pressure and Temperature. Once the data is present on the DB, the users can go ahead and retrieve data for a given [datasource/measurement/time interval] to generate reports or charts.
My implementation so far consists of two tables: series_info and series_data. series_info holds general information for a given series of measurements for a given data source (Pressure for data source 1, Pressure for data source 2, Humidity for data source 1 and Temperature for data source 2, in our example). Each series has a bigint index as primary key.
The table series_data contains all data relative to the series from series_info. Each piece of data has a bigint as a primary key, an associate time (which is always crescent) and a foreign key to the series it represents (in series_info).
Alright, everything is cool so far. However, whenever a user wants to retrieve data for given [data source/measurement/time interval], this takes very long, since all data is interposed in series_data and for every search it's necessary to find where the desired data actually lies.
One obvious solution for this would be to dynamically create a new table to hold the data for each series, but that would just make my database disorganized, since there would be thousands and thousands of tables.
Another thing that comes to my mind is to create a table with information of where lies the data for a given [data source / measurement] for given dates. So when the user requested data for a given [data source/measurement] between, say, january and february, we would first look at this intermediate table and find out that the data lies between indexes 1000 and 2000 on the series_data table, so the next SELECT command to series_data would already contain a restriction like WHERE index>=1000 and index<=2000. This should probably improve the speed of retrieval.
What do you guys (or girls) think? Maybe there's simply a classical solution for such a case.
Regarding SQL Server data, I am looking to implement the beset Data-Archive and Purge policy. Normal, we do SQL Backups and keep the history for some period , for example, 8 weeks, so we can go back and restore any data point in time upto 8 month in past. and we also do Tape backups.
Question is Where can I get nice article or documentation on this to best design such policy where I make sure that I am covered for point in time recovery of database (which is sql backups) and point in time recovery in far past, say, 3 years ago using tape backups, and I need to make sure that I don't repeat the same efforsts.
I am writing a query where I am identifying different scenarios where data changes between one week and the next. I've set up my result set in the following manner:
PrimaryID Field Changed Previous Value New Value 10003 SKUName SKU12345 SKU56789 10003 LocationId Den123 NYC987 etc...
The key here being that in the initial resultset ID 10003 is represented by one row but indicates two changes, and in the final output those two changes are being represented by two distinct rows. Obviously, I will bring in the previous and new values from a source.
I'm using Script Component to load data into Oracle DB due to the poor performance issue. Now, I found it will missing some data during the transmission. Please see the screenshot below:
I am getting ErrorCode 8 while loading the data from stage to model. I have checked my error view it states that "Member Code is Inactive".
Initially I have loaded same set of data in Model from MDS Stage table but then deleted with ImportType = 5 which removed all the data from the MDM model.
Now i want to load it back but its giving the Error Code 8 .. Before loading the same data i have changed the stage table Importtype to 2 and Importstatusid to 0.
Hi, all experts here, Do we always have to use SCD component for the loading of data into data warehouse to handle changes of rows? I am looking forward to hearing from you and thank you very much in advance for your help. With best regards,
Hi i am trying to do a straight forward load from a Flatfile source , i have defined the columns according to the lenghts defined in the Data Dictionary Provided but when i am trying to run the Task i am encounterring this error
The column data for column "Column 20" overflowed the disk I/O buffer.
I tried to add another column 21 at the end and truncate or leave that column unmapped to destination but the same problem occurs for column 21 what should i do to over come this .
In case of Bad Data how to clean up the source.. Please help me with this
Hello!! searching information about how to migrate some date from an old data base (any tipe) from SQL Iv found this: LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt' [REPLACE | IGNORE] INTO TABLE tbl_name [FIELDS [TERMINATED BY 'string'] [[OPTIONALLY] ENCLOSED BY 'char'] [ESCAPED BY 'char' ] ] [LINES [STARTING BY 'string'] [TERMINATED BY 'string'] ] [IGNORE number LINES] [(col_name_or_user_var,...)] [SET col_name = expr,...)] Does anybody know how does it works and how to use it????Id like to know because I have to load data from a text file to a SQL Data Base and this seems to be te fastest an easiest way to do it...Thanks!!!!bye!
The project is a C/S data analysis system built with .Net 2.0 in windows environment, OS: Microsoft Windows 2003 R2 standard Edition Service Pack2, Database used in this project is: Sql server 2005. As a data analysis system, we need to load large amount of data from file to database, we do it by create a dts package and then do data loading by execute "m_Package.Execute(null, variables, m_PackageEvents, null, null)".
The problem is, we fount that DTS miss some data randomly sometimes, we can't find the rule till now. for example we've data as follows in data file, all data field splited by '|' 11234|26341|2007-09-03 00:00|0|0|0.0|0|0.011470833793282509|1|0.045497223734855652|0|0|1|0|3|2929|13130|43|0|2|0|0|40|1|0|0|0|0|0|1||0|0|3|0|0|0|0|0||0|3|0|0|43|43|0|41270|0|3|0|0|10|3|0|0|0|0|0||0|1912|0|0|3|0|0|0|0|0|0|0|3|0|0|5|0|40|0|9|0|0|0|0|0|0|0|0|29|1|1|24|24.0|16|16.0|0|0|0.0|0|0|24|23.980693817138672|0|0.0|0|0.0|0|0.0|0|0.0|11|2.0764460563659668|43|2|0|0|30|11|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|3|3|0|0|0|0|0|0|0|0|0|6|0|0|0|0|0|6|0|0|45|1|0|0|0|2|42|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|2|0|0|0|2|0|0|0|0|0|0|51|47|85|0|0|||||||||||||||||||||||||||||||||||||||||||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|||||||||||||0|0|0|0|0|97.117401123046875|0|0|83|57|||0.011738888919353485|0|1|0.065955556929111481|0|4|||0.00026658669230528176|1|0.00014440112863667309|1|68|53|12|2|1|2.0562667846679688|10|94|2|0|0|30|11|47|4|13902|7024|6878|18|85|4.9072666168212891|5|0.0|0|0.0|0|0.0|0|0.0|0|358|6448|6324|0|0|0|0||0||462|967|0|41|39|2|0|0|0|1|0|0|0|0|0|0|0|0|3|0|0|3|0|0|0|0|0|0|0|0|0|3|0|3|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0.0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|46|0|1|0|1|37|0|0|46|0|1|0|1|37|0|0|0|0|0|0|0|0|0.0|0|0|6|4|2|0|0|2|1|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0.0|0|1|0.012290795333683491|0|44|44.0|0|0.0|0|0|0|30|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|2|0|2|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|2|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|27|0|0|2112|411|411|45|437|2|0|2|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|4|0|4|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|1|0|0|0|0|0|0|0|0|0|0|0|6|6|0|3|2|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|5|5.0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|600|600|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|6|0|0|0|0|0|0|6|0|9|1|2|2|3|0|1|0|0|0|0|0|0|0|0|0|0|0|13|3|2|5|1|1|1|0|0|0|102|0|1|1|0|0|0|3|3|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0||||||0|0|0|0|0|0|0|0|0|0|0||0|0|0|0|0|0|0|0|0||||||||||0|0|0|0|0|0|0||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0.0|46.0|46|0.0|0|0.0|0|0.011469460092484951|1|0.0|0|0.0|0|3|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0.0|0|0.0|0|0|0|0|0|0|0|0|0|0|0|0|0|||0|100.0|100.0|0|1|0|1|0|0|0.02481505274772644|1|0.014951236546039581|1|0|0|0|0|0|0|0|0|0|0|0|0|0|||||||||||||||||||||||||||||||||||||||||||||||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|||0|||||||||||||||||||||||||||||||||||||||||||||||||||0|0|0|0|0|0|0|0|0|4695.5556640625|42260|7126.66650390625|62040|||||||||||||||||||||||||||||||||||||||||||||||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0||||||||||0|0||||||||||
We found that some of the data field become 'null' after the load action finished, if we load the same data again, problem disappeared, we can't 100% reproduce this issue each time, we don't know why, Anybody here can help us to solve this issue or give us some clue?
I have a package which loads data from a flat file (csv) to 4 tables in a database. Now, the load is incremental.
I want to clear the data of all 4 tables(in the database) before loading the data from flat file everytime.How can i do this? Iam using 4 Oledb Destinations, 1 multicast, 1 source component to do this. Also can it happen like a transaction? because if it deletes the existing data and couldnt load new data there will be a problem!.how to avoid this?
Hi, I am loading Data from Mainframe to SqlServer on WinNt. Normally it was taking 35 mts do the dts job. on last two days it runs for more then six hours still the job doesnt get over. I am at a loss to know what to do and how to fix this problem. THe main frame ppl said sqlServer is fetching the data very slowly. If anyone knows the solution pl post a solution
What is the best way to load large amounts of data? I am working on a project where I will need to load data into approx. 20 tables. Into several of the tables I will need to load around 400,000 records. I am familiar with the concepts involved in using BCP but was hoping I could avoid the step of going to text files. I am pulling data from Access (either 97 or 2000). Any suggestions would be welcome.
I have an xml file I want to use as the source. It's not overly complicated, but not simple either. It has one hierachy, and one optional field and looks like this
a
1 'text'
2 'text'
b
1 'text'
2 'text' 'optional field'
Ok, now I want the data to load like this:
a,1,text
a,2,text
b,1,text
b,2,text, optional field
but when I try to use the xml source it won't create the xsd...anything I can do?
I am trying to enhance an existing package that actually does the xml to table load, this packge is using a script component to parse and load the xml file into multiple tables (3 tables) using VB.net code, I want to add a component to it where if there are any xml rows that are erroring out , they get redirected to a different table, this log table is having only one column and the xml records are supposed to be loaded into this column in XML format. this is the existing design and I have to live with it and at the same time I am not a big .net coder, any help is appreciated.
Hi! I need to load text data into SQL7. The tricky part (at least for me) is that this data may be duplicated. How can I load this data discarding the duplicated rows? AFAIK, the sql job (or DTS) will fail if a primary key is violated. TIA, Fabio Aneas
Using SQL Server 7.0, I need to watch for a file to be placed in a directory and then load it automatically. What is the best way to do this? I have the Bulk Insert process set up in DTS and would like to just add another process if possible.
I am new at SQL and I have been asked to load some data into a database. I was given a file that has an extention .sql. shown below are the first few lines
This looks to me like it was built to be scripted in or use some function of SQL to create and populate the table... does that make sense? Anyway, is there an easy way to insert this data into a table?
Hi, I created a package to load a fact table which loads more than 7 million records. When loading the table it took nearly 15 minutes. Then indexes were created for the table at the DB level after which the time to load same number of records has increased. How to resolve this time delay?
I have a remote website which uses sql server database. the sql job runs in a server called goofy which imports the data from remote sql to an access sitting on another server called dumpy. I get below error when trying to import data. I check but no one is accessing this data.
Executed as user: GOOFYSYSTEM. ...e Package Utility Version 9.00.1399.06 for 32-bit Copyright (C) Microsoft Corp 1984-2005. All rights reserved. Started: 7:45:29 PM Progress: 2007-10-25 19:45:29.44 Source: Data Flow Task 1 Validating: 0% complete End Progress Progress: 2007-10-25 19:45:29.44 Source: Data Flow Task 1 Validating: 33% complete End Progress Progress: 2007-10-25 19:45:29.60 Source: Data Flow Task 1 Validating: 66% complete End Progress Error: 2007-10-25 19:45:29.68 Code: 0xC0202009 Source: importdata Connection manager "SponsorshipData 1" Description: An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available. Source: "Microsoft JET Database Engine" Hresult: 0x80004005 Description: "The Microsoft Jet database engine cannot open the file '\dumpyd$CompanyCCNASponsorshipData.mdb'. It is already opened exclusively by another user, or you need permission to view its data.". End Error Error: 20. The step failed.
my another question is. there is a column of type memo in access. but in remote sql i have that field as ntext how could i also pull ntext into memo field. i saw in forums but i found it too complex to understand is there any step by guide to pull ntext data into memo field in ssis.
I have got an xml file with size more than 2 GB. I have to load this file into tables. With 32 bit platform, I am unable to load this file using SSIS. Ram is 8 GB, but it is still bombing out. As I know it uses XML DOM Parser and tries to shred the file in memory and because of memory limition, it fails. Although I have already written code in C# using XmlTextReading object(implemetation of SAX Parser) to load data in tables, but I want to keep this loading process within the limits of DBAs.
I am stuck. Can someone guide me through the situation?
At first I've got a Flat File Source and then Script Component Task and then OleDb Destination, linked among them by arrows, of course. When I run the SSIS all of them is successfully executed except the last task. Why? I don't know but it isn't awared of nothing.
649 rows are passed to Script Component from the file but they aren't going to my Sql table.
I have an xls spreadsheet with multiple worksheets. From each worksheet I need to load the data into sql server. The data I need to retrieve from each worksheet is not row by row, but cell by cell. For example, I need to load data from cells C44, G65, I23, A78, etc. in all the worksheets. Is this possible using SSIS?
I'm populating a table (B) in SQL Server from a Staging table (A) using a stored procedure.At any point of time, the Staging table holds 60 months' old data.In the first load of the destination table B, I get 13 months of old data whereas for every subsequent load,I need to load the data for the most recent month and delete data for the 1st(oldest) month. For example, if the load procedure runs on December 02,2006, it should pick data for the month of November,2006 from the Staging table and delete data for the 1st month.
I have a column DATA_MONTH_KEY in table B which maps to the column DATA_MONTH in my staging table A. I get the data for the first 13 months using:
(B.DATA_MONTH_KEY BETWEEN ( DATEADD(month,-13,@startdate)) AND @startdate) where startdate is the current date on which the procedure for populating table B is run. I get the value of startdate from a function.
How do i get data for the most recent months and delete the oldest month in subsequent loads?
Hello,I'm trying to load a lot of data into SQL Server Express database, using SQL Management Express. How can I import data? Because Express doesn't come with BIDS (new DTS), I can't create a package to do the import. How can I do it?Thanks.
I'm having a very irritating time trying to migrate data from a COBOL system to SQL Server.
One of the A/R Master files has approx. 200 columns.
I can export this file any number of ways that will (sometimes) load partially into my database, but always when the load succeeds, columns 75 through N simply contain NULL, even though there is data in the file. When the load fails in DTS, the error is always missing column delimiter. Using BULK INSERT the error is always something like data too long at column 75.
Putting all this together, I have deduced that something isn't working if I try to load a staging table with more than 74 columns of data. This seems like a way-too-low threshold for a robust database, especially since SQL Server is supposed to be able to handle up to 1,024 columns per table.
CREATE TABLE #Source ( Id int identity(1,1) ,categoryint ,Leaf_Node_code varchar(10) -- ,Level1_Name varchar(20) ,Level2_Name varchar(20)
[Code] ....
Here category 1 has 3 levels ,
category 2 has 4 levels , category 3 has 5 levels ,
Below is the target table, here Leaf_Node_code should populate to only for leaf nodes for each category .. Need to populate Node_id with hierarchical data
I am unable frame a sql query to handle different levels , in future #Source may have more levels .
How to handle multiple hierarchy levels .. here only leaf node should have Leaf_Node_code
CREATE TABLE TARGET_TABLE ( ID INT IDENTITY(1,1) primary key ,Node_id HIERARCHYID ,category int ,Parent_id int references TARGET_TABLE(id) ,Leaf_Node_code varchar(10) ,Namevarchar(20) )