SQL 2012 :: Create Script That Will Import Large XML Files?
Jul 28, 2014
I need to create a script that will import large XML files (500 - 7GB) on a daily basis and store the data in a relational DB structure.
What is the best and fastest way of importing such files? I have played around with smaller files and found the following.
1. SSIS XML Data Source: It doesn't seem to like the complex elements types and throws out the file.
2. Using Bulk File Import, storing the file in an XML variable and using XQuery to parse the file: This works, but it can't take a file of more than 2GB in size, so I can't use this method.
3. C# + XML Serialization: This also works, but seems to be terribly slow. I open the DB connection once, so it doesn't open and close for each db call, but still seems like it takes a long time.
How can I import large XML files quickly into a relational table structure?
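One possible direction for the question above is a streaming parser, which avoids the 2GB limit of an XML variable because the file is never held in memory as a whole. The following is only a minimal sketch under stated assumptions: it uses Python rather than the C# or SSIS mentioned in the post, the element names (`orders`, `order`, `id`, `total`) are hypothetical, and each record element is assumed to be a direct child of the root. Rows are grouped into batches that a loader could pass to a bulk insert.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag):
    """Yield one dict per record element, reading the XML incrementally."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # Flatten the record's direct children into column -> text.
            yield {child.tag: (child.text or "") for child in elem}
            # Assumption: records are direct children of the root, so clearing
            # the root discards the record that was just processed.
            root.clear()


def in_batches(rows, size):
    """Group rows into lists of at most `size` items."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Hypothetical sample document standing in for a multi-gigabyte file.
SAMPLE = (
    b"<orders>"
    b"<order><id>1</id><total>9.50</total></order>"
    b"<order><id>2</id><total>3.25</total></order>"
    b"<order><id>3</id><total>7.00</total></order>"
    b"</orders>"
)

if __name__ == "__main__":
    for batch in in_batches(iter_records(io.BytesIO(SAMPLE), "order"), 2):
        print(batch)  # a real loader would bulk-insert each batch here
```

For a real file, `source` would be a file path or an open binary file, and the batch size would be tuned to whatever the bulk-insert mechanism handles well.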
I created an SSIS solution for reading data from dBase files and storing it in SQL Server. In a ForEachDirectory loop, up to one thousand dBase files are read and stored. The system where the packages are running has 16 GB RAM. For the first few hundred dBase files everything goes fine, but then the RAM seems not to suffice any more and a temp file is created (I changed the path in BufferTempStoragePath).
How can it be that there is a need to create temp files if there is so much RAM available? Why is the RAM filled more and more during the SSIS package execution? Is there anything I can do to release some of it? (It is running in a loop and there is no need to store all the data.) Could it be caused by dBase? (I use the Microsoft Jet 4.0 OLE DB Provider.)
Another thing is that the temp file is not stored in the path I set in BufferTempStoragePath. There are sufficient permissions set, but temp file is still created in user temp folder...
I have a master table containing details of over 800000 surveys made up of approximately 400 distinct document names and versions. Each document can have as few as 10 questions but as many as 150. Each question represents one row.
My challenge is to create a separate spreadsheet for each of the 400 distinct document names and versions containing all the rows and columns present in the master table. The largest number of rows would be around 150 and therefore each spreadsheet will not be very big.
e.g. in my sample data below, I will need to create individual Excel files named as follows:
"Document1Version1.xlsx" containing all the column names and 6 rows for the 6 questions relating to Document 1 version 1
"Document1Version2.xlsx" containing all the column names and 8 rows for the 8 questions relating to Document 1 version 2
"Document2Version1.xlsx" containing all the column names and 4 rows for the 4 questions relating to Document 2 version 1
I assume that one of the first things is to create a lookup of the distinct document names and versions, assign some variables, and then use this lookup to loop through and sequentially filter the master table data ready for creating the individual Excel files.
--CREATE TEMP TABLE FOR EXAMPLE
IF OBJECT_ID('tempdb..#excelTest') IS NOT NULL DROP TABLE #excelTest CREATE TABLE #excelTest ( [rowID] [nvarchar](10) NULL, [docName] [nvarchar](50) NULL,
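The lookup-and-loop idea described in this post can be sketched as a single grouping pass over the master rows. This is a rough illustration only, under several assumptions: it is in Python rather than T-SQL or SSIS, the `docVersion` and `question` column names are hypothetical (only `rowID` and `docName` appear in the truncated table definition), and CSV stands in for `.xlsx` because writing real Excel files would need a third-party library.

```python
import csv
import io
from collections import defaultdict


def split_by_document(rows, name_col="docName", version_col="docVersion"):
    """Group master-table rows by (document name, version)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[name_col], row[version_col])].append(row)
    return groups


def file_name(doc_name, version):
    """Build a name such as 'Document1Version1.csv' (spaces removed)."""
    return f"{doc_name}Version{version}.csv".replace(" ", "")


def to_csv_text(rows, columns):
    """Render one group with every column of the master table."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample rows; the real master table has many more columns.
MASTER_ROWS = [
    {"rowID": "1", "docName": "Document 1", "docVersion": "1", "question": "Q1"},
    {"rowID": "2", "docName": "Document 1", "docVersion": "1", "question": "Q2"},
    {"rowID": "3", "docName": "Document 1", "docVersion": "2", "question": "Q1"},
    {"rowID": "4", "docName": "Document 2", "docVersion": "1", "question": "Q1"},
]

if __name__ == "__main__":
    columns = ["rowID", "docName", "docVersion", "question"]
    for (name, version), rows in split_by_document(MASTER_ROWS).items():
        print(file_name(name, version))
        print(to_csv_text(rows, columns))
```

Grouping in one pass avoids re-querying the master table once per document, which matters more with 400 groups than with the three shown here.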
I need to create a Clustered Index (CI) on a very large SQL Server 2012 database table. This table has approximately 10 billion rows and is 500 GB in size. The job ran for about 20 hours and then failed with the error "Out of disk space in tempdb". My tempdb size is 1.8 TB, yet it's still not enough.
Here is my script:
CREATE CLUSTERED INDEX CI_IndexName
ON TableName (Column1, Column2)
WITH (MAXDOP = 4, ONLINE = ON, SORT_IN_TEMPDB = ON, DATA_COMPRESSION = PAGE)
ON sh_WeekDT (Day_DT)
GO
Other than right-clicking on each individual table in SSMS and generating a CREATE script, is there a simple way to generate CREATE TABLE scripts for tables within a given database?
Background: I have a bunch of tables in one database, and I would like to add tables to a second database that have the same names and basic structures of some of the tables from the first database.
I do not need to transfer any data from the tables; this is a separate project that will use a similar data structure. I just want to generate the CREATE TABLE scripts for 30ish tables within the first database, and then I'll tweak the scripts as appropriate and run them against the new database.
I was in the process of creating additional TempDB.ndf files, and received an error saying they already exist. I checked the location and it was empty, nothing to see here. So I looked in sys.master_files and there are several tempdb files listed in various locations, all of which come up empty.
So the files are listed as online in sys.master_files, but they do not exist on the server. I restarted SQL services but it did not change anything.
Brief overview...Running SQL Server 2003 Server Enterprise 64 bit - All Service Packs and patches current SQL Server 2005 Enterprise Edition 64 bit Build Microsoft SQL Server 2005 - 9.00.3054.00 (X64) Mar 23 2007 18:41:50 Copyright (c) 1988-2005 Microsoft Corporation Enterprise Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 2)
I cannot import any SSIS packages nor create any new folders under Stored Packages. I have googled the newsgroups and looked at BOL to no avail. HELP!!!!
Importing data from an Access database, I cannot overcome the limit of 1,000 records. In DTS, I "copy one or more tables", select tables, run, and cannot see my 1,052 entries. Where can I set a max size of ~1,500 in my sql target base?
I am trying to import data into SQL Server 7. The table will be 700-800 columns, and the data will be about 150,000 records at a time. The data source is flat file.
First I create the table using a database schema, and secondly I would like to populate the table. The problem is that most of the data is numeric, and to be used for statistical analysis.
So far I have tried Bulk Insert, bcp, and dts. DTS is the only method that has worked in any way, shape or form, but that requires importing each column as a Varchar. Importing to my pre-created table doesn't work, because it is interpreting some of the source columns as character data and refusing to insert them into an int field. Bulk Insert and bcp both give error messages, and I am wondering if that is because of the size of the insert statement that is required to handle so many fields.
For the moment I am just trying to import the data in any way, but eventually, it will have to be run as an automated process, with the table structure probably needing to be altered as well.
Any help/suggestions would be very gratefully received.
I have inherited some databases with extremely large log files. I tried to truncate the transaction log, but it did not work. Can somebody please tell me how to truncate these log files?
I have many, many Access databases that are roughly 1.5GB-3GBs each and they have millions of records. Each MS Access Database file corresponds to one Database in SQL server. I'm trying to simply transform the data as it is in Access to MS SQL 2005.
I'm using the 64 bit version of Windows Server 2003 and the 64 bit version of SQL 2005. The server is running four dual core AMD Opteron processors and has 8GB of RAM with a 1TB RAID 5 configuration. I think the hardware should be sufficient but the SQL Server Import and Export Wizard can't seem to handle the large number of tables/records. If I do one table at a time, it works well; however, it produces the following error message whenever I try to import the entire database:
Pre-execute (Error) Messages Error 0xc0202009: {5A5BF7AD-E86B-4316-AD43-1912358C56F4}: An OLE DB error has occurred. Error code: 0x80004005. An OLE DB record is available. Source: "Microsoft JET Database Engine" Hresult: 0x80004005 Description: "Unspecified error". (SQL Server Import and Export Wizard)
Error 0xc020801c: Data Flow Task: The AcquireConnection method call to the connection manager "SourceConnectionOLEDB" failed with error code 0xC0202009. (SQL Server Import and Export Wizard)
Error 0xc004701a: Data Flow Task: component "Source 64 - District Corporal Punishment Class" (5743) failed the pre-execute phase and returned error code 0xC020801C. (SQL Server Import and Export Wizard)
Any ideas would be much appreciated! Thank you, Cody
This is a general question on the best way to import a large amount of data to a MS-SQL DB. I can have the data in just about any format I need to, I just don't know how to import the data. I have some experience with SQL but not much. There are about 1500 to 2000 lines of data. I am looking for the best way to get this amount of data in on a monthly basis. Any help is greatly appreciated!! Mike Charney
I am making an SSIS package that imports data from an application using a custom ODBC driver. The field in the application is set to be a "longvarchar" type field and can hold from 2 characters to 2MB of data.
I've created an ODBC data connection in the SSIS package and use a "DataReader Source" to read the data I need. The SQL statement is very simple:
Select log from tablename
When I try to run the SSIS package with that statement it just goes to yellow on the DataReader Source and stops. It stays like that until I stop it. If I select other fields except for that field it works fine. Also I've been able to get it to succeed getting the log field if I select a log record that's not too big. The largest one I've been able to get is 800 characters, but I got one with 2500 characters that just stops on yellow.
In the Progress log the last line says:
[DTS.Pipeline] Information: Execute phase is beginning.
Does anyone have any ideas on how to resolve this?
I have some large flat files that I need to export to my SQL Server database. The file sizes range from 16 MB to 116 MB. I've tried to save the files to an Excel spreadsheet and then export them in that format, but that didn't work. Does anyone have any suggestions?
I have a few tables using SQL Server 2005 Express. Currently they are holding roughly 30-40k records. I have my log files set at restricted growth to 90 MB. While I'm not close to reaching that, I would like my tables to be able to scale up to possibly millions of records. Based on that, I figure the transaction log file will probably need to have a higher threshold (unrestricted growth). For those with experience: for tables that have millions of records, what average log file size could I expect? Is it a bad idea to just shrink the log file every night during off-peak hours so that, regardless of the number of records I have, I'll always start the day with a minimal log file? Do large log files have any effect on SQL performance?
We have SQL Server running on a Windows 2003 server, only because Backup Exec requires it. At the location C:\Program Files\Microsoft SQL Server\MSSQL\Data there is this file: SuperVISorNet_log.LDF, which is 15 GB and is accessed daily. I apologize because I don't know what this is!
My question is: can this file be 'pruned' (for want of a better word) because it's taking up a lot of backup space.
I am trying to run a query that deletes duplicate records on a table with 24m records. The problem is that each time I run it the log file fills up and I get an error saying the log file is full. For this reason the query never finishes.
Is there any way to turn off logging when running a query?
I think it also has to do with the disk drive running out of space, as the log file is growing to over 12 GB.
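SQL Server does not let you switch logging off for a DELETE, so a commonly suggested workaround is to delete in small batches, each in its own short transaction, so that log space can be reused between batches (under the simple recovery model, or with log backups taken in between). The sketch below only illustrates the batching loop: it uses Python's built-in `sqlite3` as a stand-in, so the SQL dialect is not T-SQL (on SQL Server the batch limit would typically be expressed with `DELETE TOP (n)`), and the `contacts` table and `email` column are hypothetical.

```python
import sqlite3


def delete_duplicates_in_batches(conn, batch_size):
    """Delete duplicate rows a batch at a time, committing after each batch.

    Keeps the row with the lowest rowid for each email value and returns
    the total number of rows deleted.
    """
    total = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM contacts
            WHERE rowid IN (
                SELECT rowid FROM contacts
                WHERE rowid NOT IN (
                    SELECT MIN(rowid) FROM contacts GROUP BY email
                )
                LIMIT ?
            )
            """,
            (batch_size,),
        )
        conn.commit()  # each batch is its own short transaction
        if cur.rowcount == 0:
            return total
        total += cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts (email TEXT)")
    conn.executemany(
        "INSERT INTO contacts (email) VALUES (?)",
        [(e,) for e in ["a", "a", "a", "b", "b", "c"]],
    )
    conn.commit()
    print(delete_duplicates_in_batches(conn, batch_size=2))
```

The batch size is a trade-off: smaller batches keep each transaction's log footprint low but take more round trips.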
I am currently importing (and exporting) binary flat files to and from Db fields using the TEXTPTR and UPDATETEXT (or READTEXT for export) functions. This allows me to fetch/send the data in manageable packet sizes without the need to load complete files into RAM first.
Given that some files can be up to 1 GB in size, I am keen to find a new way of doing this since the announcement that TEXTPTR, READTEXT and UPDATETEXT are going to be removed from T-SQL.
I had a quick foray into SSIS but couldn't find anything suitable which brings me back to T-SQL. If anyone knows a nice elegant way of doing this and is prepared to share, that would be grand.
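For `varbinary(max)` columns, the documented T-SQL replacements are the `.WRITE` clause of `UPDATE` (in place of UPDATETEXT) and `SUBSTRING` (in place of READTEXT), both of which work with an offset and a length. The client-side half of that pattern, reading a file in fixed-size packets, can be sketched as follows. This is an illustration in Python, not the poster's code: `write_chunk` is a hypothetical callback that, in a real import, would run the parameterized `UPDATE` for one chunk.

```python
import io


def iter_chunks(stream, chunk_size):
    """Yield (offset, data) pairs until the stream is exhausted."""
    offset = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield offset, data
        offset += len(data)


def copy_in_chunks(stream, write_chunk, chunk_size=64 * 1024):
    """Send each chunk to write_chunk(offset, data); return total bytes sent."""
    total = 0
    for offset, data in iter_chunks(stream, chunk_size):
        write_chunk(offset, data)
        total += len(data)
    return total


if __name__ == "__main__":
    payload = bytes(range(256)) * 10  # stand-in for a large binary file
    sink = bytearray()

    def write_chunk(offset, data):
        # Hypothetical sink; a database writer would pass offset and data
        # as parameters to an UPDATE that uses the .WRITE clause.
        sink[offset:offset + len(data)] = data

    sent = copy_in_chunks(io.BytesIO(payload), write_chunk, chunk_size=1000)
    print(sent, bytes(sink) == payload)
```

Only one chunk is held in memory at a time, which is the property the TEXTPTR-based approach was providing.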
Hello, I have decided to use Linq for my current ASP.NET project and so far it has been good, but now I am implementing a system that will allow users to upload binary content such as pictures and videos. For ease of management and security, I have decided to store this content directly in the database. The performance hit is a minor concern because very few user-uploaded images/videos will be seen on any given page (usually just one).
From the limited tutorials I have seen on the internet, Linq supports the SQL Server varbinary column through its System.Data.Linq.Binary class. This class does not appear to support streams and instead opts to load all of the contents into memory. This content can then be converted to an array of bytes, which can then be output to the browser via the response stream. This is not good. What if I am sending a video that is very large? Varbinary supports up to 2 GB. I can't have a 2 GB video sitting in memory. It makes a lot more sense to stream it via a small buffer.
Obviously, I am going to limit the size of the content that users can upload, but the core problem remains. If I limit content size to 2 MB and I have 2 GB of memory on the server, then I can only serve 1000 users concurrently. In reality, that number would be much less because of other processes running on the server.
Is there no way to stream data from a varbinary column with Linq using a small buffer of bytes? Do I need to implement some custom logic on my Linq classes? Since these classes are automatically generated, how would I do such a thing? Thanks.
How do I insert a large chunk of text into a table column? My project is to build a news website where people can go and read news articles. The articles are provided by the author in Word format, so how do I insert a news article into the table's column? Any help would be appreciated.
It works remotely if I run it via command prompt. But when I add this to a T-SQL job on my remote SQL instance, it runs without deleting anything. What am I missing?
I have a table that I'm inserting a file into, using the Image data type to store the binary object. The code below works fine for files around 1.5 MB, but with anything larger it's as if the code won't even execute, and I get a "Page Not Found" error. I'm in the process of running some traces to find out what's going on in the back end, but I'm assuming there's something amiss with my code. The Image data type should handle files that size with no problem, but for some reason it isn't. Does anyone see anything wrong? Thanks

Dim iLength As Integer = CType(File1.PostedFile.InputStream.Length, Integer)
If iLength = 0 Then Exit Sub 'not a valid file
Dim sContentType As String = File1.PostedFile.ContentType
Dim sFileName As String, i As Integer
Dim bytContent As Byte()
'byte array sized to the file; VB bounds are inclusive, hence iLength - 1
ReDim bytContent(iLength - 1)

'strip the path off the filename
i = InStrRev(File1.PostedFile.FileName.Trim, "\")
If i = 0 Then
    sFileName = File1.PostedFile.FileName.Trim
Else
    sFileName = Right(File1.PostedFile.FileName.Trim, Len(File1.PostedFile.FileName.Trim) - i)
End If

conn = New SqlConnection(eco)
conn.Open()
cmd = New SqlCommand("INSERT INTO ECO_Attachments (ECOID, FromType, DocName, OldRev, NewRev, NtLogin, DisplayName, FileName, FileSize, FileData, ContentType) VALUES (@ECOID, @FromType, @DocName, @OldRev, @NewRev, @NtLogin, @DisplayName, @FileName, @FileSize, @FileData, @ContentType)")
cmd.Connection = conn
Try
    File1.PostedFile.InputStream.Read(bytContent, 0, iLength)
    With cmd
        .Parameters.Add("@ECOID", SqlDbType.Int)
        .Parameters.Add("@FromType", SqlDbType.NVarChar, 50)
        .Parameters.Add("@DocName", SqlDbType.NVarChar, 250)
        .Parameters.Add("@OldRev", SqlDbType.NVarChar, 50)
        .Parameters.Add("@NewRev", SqlDbType.NVarChar, 50)
        .Parameters.Add("@NTLogin", SqlDbType.NVarChar, 100)
        .Parameters.Add("@DisplayName", SqlDbType.NVarChar, 200)
        .Parameters.Add("@FileName", SqlDbType.NVarChar, 255)
        .Parameters.Add("@FileSize", SqlDbType.Real)
        .Parameters.Add("@FileData", SqlDbType.Image)
        .Parameters.Add("@ContentType", SqlDbType.NVarChar, 50)
        .Parameters("@ECOID").Value = ECOID
        .Parameters("@FromType").Value = From
        .Parameters("@DocName").Value = DocName
        .Parameters("@OldRev").Value = OldRev
        .Parameters("@NewRev").Value = NewRev
        .Parameters("@NTLogin").Value = NTLogon
        .Parameters("@DisplayName").Value = DisplayName
        .Parameters("@FileName").Value = sFileName
        .Parameters("@FileSize").Value = iLength
        .Parameters("@FileData").Value = bytContent
        .Parameters("@ContentType").Value = sContentType
        .ExecuteNonQuery() '.ExecuteScalar()
    End With
Catch ex As Exception
    Response.Write(ex) 'Handle your database error here
Finally
    conn.Close() 'close the connection on success as well as on failure
End Try
Here's my dilemma: I have a file that's 308 bytes wide by 5.7 million records. The record length is fixed, and the position and width of the fields are known within the record. When I run DTS I receive an error from the MS DTS flat file provider with the description: "Error creating file mapping view: not enough storage is available to process this command." Then, when I try to continue with the wizard, it will not allow me to separate the data into the format that I need. Is there any other way to import this file using DTS?
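Because the record length and field positions are fixed, a file like this can also be read one record at a time instead of being mapped into memory as a whole. The sketch below shows that idea outside DTS, in Python, as an illustration only: the three-field layout and the 20-byte record length are hypothetical stand-ins for the real 308-byte layout, and it assumes the record length includes any line terminator.

```python
import io

# Hypothetical layout: (field name, start offset, width) within each record.
LAYOUT = [("id", 0, 4), ("name", 4, 10), ("amount", 14, 6)]
RECORD_LENGTH = 20


def iter_fixed_records(stream, record_length, layout, encoding="ascii"):
    """Read one fixed-length record at a time and slice it into fields."""
    while True:
        record = stream.read(record_length)
        if not record:
            break
        if len(record) != record_length:
            raise ValueError("truncated record at end of file")
        yield {
            name: record[start:start + width].decode(encoding).strip()
            for name, start, width in layout
        }


# Two sample records of 20 bytes each, with no separators between them.
SAMPLE = b"0001Alice     001250" + b"0002Bob       000075"

if __name__ == "__main__":
    for row in iter_fixed_records(io.BytesIO(SAMPLE), RECORD_LENGTH, LAYOUT):
        print(row)
```

With a real file, `stream` would be the file opened in binary mode, and the yielded rows could be written out in a delimited format that the import tool handles more comfortably.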
Hi, my data files sit in the default directories and I think they are causing my partition to run out of space. I mainly use one db that I created but don't use the others (i.e. master, model, tempdb, etc). Yet I see their MDF and LDF files are growing. What can I do to shrink them down or perhaps move them off to a larger partition after shrinking?
Hi… During my web search looking for a solution I ran across SQL CE 3.5 articles. My questions about SQL CE 3.5 are:
1) Can SQL CE 3.5 handle a 4 - 6 GB file - Read - Parse (SQL)
2) Can SQL CE 3.5 act as a standalone client that a user can view a large (4-6 GB) text file? - Will I need a .NET (small) client to read the large (4-6 GB) text file?
More info: The text file will reside on the machine where the SQL CE 3.5 is installed. There is no pull to get the data.
What is the easiest way to get a large fixed width text file (200 columns) definition into SSIS? To have to define each column with the ruler would be very cumbersome.
I have several databases that have grown to 300 GB and would like to distribute the data into multiple files across multiple drives. Can I create a new database that is spread across the new drives and use a full backup to restore or am I stuck with unloading the data table by table?
I am attempting to restore the database from within VB.NET application I am making the following 3 calls:
RESTORE FILELISTONLY FROM DISK = 'C:\MyDatabase.dat'
USE Master
RESTORE DATABASE MyDatabase FROM DISK = 'C:\MyDatabase.dat'
WITH NORECOVERY,
MOVE 'MyDatabase' TO 'C:\Program Files\Microsoft SQL Server\MSSQL\Data\MyDatabase.mdf',
MOVE 'MyDatabase_log' TO 'C:\Program Files\Microsoft SQL Server\MSSQL\Data\LDF\MyDatabase.ldf',
REPLACE
RESTORE DATABASE MyDatabase FROM DISK = 'C:\MyDatabase.dat'
using SMO. This logic works fine with small *.dat files; however, when using a *.dat file of about 4 GB I get an error on the third RESTORE DATABASE call:
ExecuteNonQuery failed for Database 'master'.
An exception occurred while executing a Transact-SQL statement or batch.
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Operator aborted backup or restore. See the error messages returned to the console for more details.
The same program/logic also works fine when I use MS SQL 2005, and it runs fine from MS SQL 2005 Query Analyzer for both 2005 and 2000 databases. There seems to be a problem only with MS SQL 2000 from within VB.NET. Does anybody have any idea? I'd appreciate any response. Thanks
This is my first time posting here; I hope this question has not been asked before. I tried to search for it but I came up with nothing.
Recreating the error :
I am using VS2005. I created a Pocket PC 2003 project. I have downloaded the SQL Server Compact Edition and installed it. I get the System.Data.SqlServerCe.dll file from the installation directory. I reference to that DLL using Add Reference in VS2005.
Build it. In the Bin folder, a long list of files suddenly appears.
System.data.dll System.data.oracleClient.dll system.web.dll system.enterpriseservices.dll system.enterpriseservices.wrapper.dll system.transactions.dll and the rest of your original files in Bin
The worst of it all, all of these files are deployed into the Emulator! Causing it to run out of memory and unable to deploy.
Something is not right here, I just cannot figure it out! If this happens, each mobile device can hold only one application. That's not the way it should be, right?
If you have solved this before, do help. I am at my wits' end at the moment.
SQL Server 7/2000: We have reasonably large tables (3,000,000 rows) that we need to add some indexes for. In a test, it took over 12 hours to CREATE a new INDEX against this table. One of us suggested that we create a temp table with the new index and copy the data from the old table into the new one, then rename it. I understand this took 15 minutes. Why the heck would it be faster to move the data and build multiple indexes incrementally vs adding an index??
We are processing 60,00,000 rows (a 2 GB file) available in a flat file and loading them into database tables using OLE DB Destination components. In the data pipeline of an SSIS package we have 1 flat file source reader, 7 lookup components (full cache mode), 1 multicast component and 2 OLE DB destinations with the fast load option.
We have observed that the first 10,00,000 rows are processed and loaded into the target tables in just 4 minutes. The second set of 10,00,000 rows is processed in 15 minutes. After this, SSIS takes approximately 8 - 10 minutes to process each 1,00,000 rows. We are not able to identify the reasons for this unexpected behaviour of SSIS.
We thought that, as the input file size is 2 GB, SSIS was not able to manage it and was slowing down over the course of execution. We split the big input file into 60 small files of approximately 37 MB each. Then we modified the package by adding a For-Each loop task to process all 60 small files and load them into the database server sequentially. Even with this approach, we found that data loading slowed down drastically after processing 13 files.
In order to verify whether there is any problem with reading the source file or with the transformations, we replaced the OLE DB destination components with Flat File destinations. With a Flat File destination the time taken for processing rows is very constant: every 8 minutes the package is able to process 10,00,000 rows and write them to the destination files. So there is no problem with either the lookup components or the flat file source reader.
We are sure that the target database server is in the same state/condition from the start to the end of package execution. The client box on which we are running the package has 1 GB RAM. During package execution the CPU usage is at 30 % and PF usage is 580 MB. SP1 is also installed on both client and server.
Does anyone have a clue what is causing the slowdown of the data load over the course of package execution?