Need Suggestion On Loading A 50 Million Records Table From Oracle
Feb 16, 2006
All,
I need to load a 50 million records table monthly. Any suggestion about the best/fast way to do it?
Thanks a lot
I have a requirement to delete 1 million records from a table that holds 10 million rows and is queried 24/7 (there is no downtime window). How can I achieve that?
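One common approach is to delete in small batches inside a loop, so locks are short-lived and the log can be reused between batches. A minimal sketch (SQL Server 2005 or later); the table name, column, and purge criterion below are illustrative, not taken from the post:

DECLARE @cutoff datetime
SET @cutoff = '20200101'            -- hypothetical purge criterion
DECLARE @rows int
SET @rows = 1
WHILE @rows > 0
BEGIN
    DELETE TOP (5000) FROM dbo.BigTable
    WHERE CreatedDate < @cutoff     -- only rows that qualify for removal
    SET @rows = @@ROWCOUNT
    WAITFOR DELAY '00:00:01'        -- brief pause so 24/7 readers are not starved
END

An index on the purge criterion (CreatedDate here) keeps each batch from scanning the whole table.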
I have a directory database with approximately 80 million records, which I feed with bulk_insert. Indexing one of the fields took about 8 hours. After indexing, queries on the indexed field respond in under 1 second, but SELECT queries using LIKE on non-indexed fields take more than 2 minutes. So I decided to index 4 other fields in the database, and it looks like that indexing process is going to run for 2 days.
I am a novice in SQL database design and I am not sure this is the best way to index the table; I am just using CREATE INDEX. Any suggestions or advice welcome.
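For reference, a plain nonclustered index looks like the sketch below; table and column names are illustrative. Note that an index only helps LIKE searches without a leading wildcard; a pattern like '%smith%' still forces a scan no matter how many indexes exist:

CREATE NONCLUSTERED INDEX IX_Directory_LastName
    ON dbo.Directory (LastName)

-- helped by the index (prefix search):
SELECT * FROM dbo.Directory WHERE LastName LIKE 'smith%'
-- not helped (leading wildcard still scans):
SELECT * FROM dbo.Directory WHERE LastName LIKE '%smith%'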
I come from a web-based world where loading 1.5 million records into a temp table is suicide. I'm doing more data warehouse work now, and while looking into optimizing a buddy's proc I noticed he was loading 1.5 million records into a temp table. We had a discussion about it because, coming from the web world, I was drastically against it. He, on the other hand, didn't feel it was an issue since it only gets called once or twice a day. tempdb is set to autogrow and sits on a different drive from all the other databases on the box, with one mdf and one ldf. He creates an index on the table after the load. Why shouldn't we be loading 1.5 million records into a temp table?
I am trying to update a large table of 45 million records and the update is taking more than 2 days. Below is my approach:
1. The table has only one clustered index and no other indexes on the table.
2. I am updating in batches of about 20,000 records.
3. I changed the recovery model to bulk-logged, the auto-growth increment is set to 300MB, and there is enough disk space for the transaction log.
But still the query is running slowly.
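One thing worth trying is driving the batches off the clustered index key instead of an unordered TOP, so each batch is a contiguous range seek. A minimal sketch, assuming the clustered index is on an integer Id column; the names and the SET expression are illustrative:

DECLARE @minId int, @maxId int, @batch int
SET @minId = 0
SET @batch = 20000
SELECT @maxId = MAX(Id) FROM dbo.BigTable
WHILE @minId <= @maxId
BEGIN
    UPDATE dbo.BigTable
    SET SomeColumn = UPPER(SomeColumn)           -- illustrative change
    WHERE Id > @minId AND Id <= @minId + @batch  -- contiguous clustered-key range
    SET @minId = @minId + @batch
END

Taking log backups between batches (under bulk-logged or full recovery) lets the log space be reused instead of growing.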
I have a table on which I need to do some computations over all the data, but first I need to remove the duplicate records and insert the results into a destination table. Here's the example below. My table has 3.1 million rows. I have tried both DISTINCT and GROUP BY, but either way the select takes about half a minute to run. I'm wondering if there is a way to increase performance. Users are fine with this time since the process runs overnight, but improving it wouldn't hurt. I do have a clustered index on these fields, but that doesn't seem to help.
SELECT DateYear,
       DateMonth,
       Nbr,
       Nbr1,
       Nbr2,
       Datafield1,
       Datafield2,
[code].....
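If the listed columns define a duplicate, GROUP BY (or DISTINCT) into the destination is already about as direct as it gets; at 3.1 million rows most of the half minute is simply reading and sorting the data. A ROW_NUMBER() variant is worth knowing for when extra columns have to be carried along with each surviving row. A minimal sketch with illustrative table names:

INSERT INTO dbo.DestinationTable (DateYear, DateMonth, Nbr, Nbr1, Nbr2, Datafield1, Datafield2)
SELECT DateYear, DateMonth, Nbr, Nbr1, Nbr2, Datafield1, Datafield2
FROM (
    SELECT DateYear, DateMonth, Nbr, Nbr1, Nbr2, Datafield1, Datafield2,
           ROW_NUMBER() OVER (PARTITION BY DateYear, DateMonth, Nbr, Nbr1, Nbr2
                              ORDER BY Datafield1) AS rn   -- one survivor per key
    FROM dbo.SourceTable
) AS d
WHERE rn = 1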
We are facing a weird scenario in which a snapshot is getting corrupted after inserting/updating a few million records into a table.
SQL Server 2012
windows server 2008 R2
service pack 1
64-bit OS
Hi,
I have to load 1 million rows from a database (or flat file) to a database (or flat file).
Which task is the best solution for this?
Appreciate any assistance in this regard.
Thanks,
Das
Hi,
I have Table A, which already has 80 columns; we have to add 65 more.
We populate this table from Oracle, and we need to populate those 65 new columns from the same Oracle table.
Is it better to add those 65 columns to the existing table or to a new table?
If we use the same table, the loading time will double. If we use a new table and can run both packages (which load data from the same Oracle server into different SQL tables) at the same time, we should be fine. But if we run into temp space issues on the Oracle server, I would have to load the two tables separately, which consumes the same time as the first option.
Is there a way in SSIS to pull data from the same Oracle table into two different SQL tables at the same time?
Data_Staging:
Unique_id
Gender
Ethnicity
Race
MCP_key
Admission_Dt
Discharge_Date
Enrollment_key
Reason
Disability
Income
Employment
I need to load the data from this table into three different tables, all of which have foreign key relationships (a sketch follows the table layouts below).
Registration Table:
Registration_key (Identity) - PK
Unique_id
Gender
Ethnicity
Race
Episode:
Episode_Key (Identity) - PK
Registration_key (FK)
MCP_key
Admission_Dt
Discharge_Date
Assessment Table:
Assessment_Key (Identity) - PK
Registration_Key (FK)
Episode_Key(FK)
Enrollment_key
Reason
Disability
Income
Employment
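A minimal sketch of one way to do this in set-based T-SQL: insert each parent first, then join back to pick up the identity values. It assumes Unique_id identifies a person and (Unique_id, MCP_key, Admission_Dt) identifies an episode in Data_Staging; adjust the join keys if those assumptions don't hold:

-- 1. parents: one registration per person
INSERT INTO Registration (Unique_id, Gender, Ethnicity, Race)
SELECT DISTINCT Unique_id, Gender, Ethnicity, Race
FROM Data_Staging

-- 2. episodes: join back to pick up the generated Registration_key
INSERT INTO Episode (Registration_key, MCP_key, Admission_Dt, Discharge_Date)
SELECT DISTINCT r.Registration_key, s.MCP_key, s.Admission_Dt, s.Discharge_Date
FROM Data_Staging s
JOIN Registration r ON r.Unique_id = s.Unique_id

-- 3. assessments: join back to both parents for their identity keys
INSERT INTO Assessment (Registration_Key, Episode_Key, Enrollment_key, Reason, Disability, Income, Employment)
SELECT r.Registration_key, e.Episode_Key, s.Enrollment_key, s.Reason, s.Disability, s.Income, s.Employment
FROM Data_Staging s
JOIN Registration r ON r.Unique_id = s.Unique_id
JOIN Episode e ON e.Registration_key = r.Registration_key
              AND e.MCP_key = s.MCP_key
              AND e.Admission_Dt = s.Admission_Dt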
Hi!
I'm loading from Oracle using the OraOLEDB.Oracle.1 provider since I need unicode support and I get the following error:
TITLE: Microsoft Visual Studio
------------------------------
Error at myTask [DTS.Pipeline]: The "output column "myColumn" (9134)" has a precision that is not valid. The precision must be between 1 and 38.
------------------------------
ADDITIONAL INFORMATION:
Exception from HRESULT: 0xC0204018 (Microsoft.SqlServer.DTSPipelineWrap)
------------------------------
BUTTONS:
OK
------------------------------
For most of my queries to Oracle I can cast the columns to get rid of the error (CAST(x AS DECIMAL(10)) etc.), but this does not work for:
1) Union
I have a select like "SELECT NVL(myColumn, 0) .... FROM myTable UNION SELECT 0 AS myColumn, .... FROM DUAL"
Even if I cast the columns in both selects (SELECT CAST(NVL(myColumn, 0) AS DECIMAL(10, 0)) .... UNION SELECT CAST(0 AS DECIMAL(10, 0)) AS myColumn, .... FROM DUAL) I still get the error above.
2) SQL command from variable
The select basically looks like this:
"SELECT Column1, Column2, ... FROM myTable WHERE Updated BETWEEN User::LastLoad AND User::CurrentLoad"
Again, even if I cast all columns (like in the union), I still get the same error.
Any help would be greatly appreciated. Thanks!
I am currently working on a simple page to insert 1.6 million UK postcode records into a SQL Server table. The table has three columns: the postcode, the longitude coordinate, and the latitude coordinate. The data is sourced from a pipe (|) delimited txt file and inserted into the database using a FOR loop. The problem is that the page hangs after inserting only 10,000 records and displays either an invalid ViewState error or a "page cannot be found" error.
I assume the ViewState error stems from the fact that there is a form on the page, which simply contains a button to execute the script and a few labels to show progress. But even without the form and its ViewState, the insert still fails to complete... any ideas? Would I be better off running this on a separate thread, or should I just do it in stages and be patient? I have since modified the page to read the database on load and pick up from where it crashed.
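Rather than looping row by row from an ASP.NET page, it is usually far faster (and sidesteps the ViewState/timeout problem entirely) to let the server bulk-load the pipe-delimited file. A minimal sketch; the table name and file path are illustrative, and the file must be readable by the SQL Server service account:

BULK INSERT dbo.Postcode
FROM 'C:\data\postcodes.txt'
WITH (
    FIELDTERMINATOR = '|',   -- pipe-delimited source
    ROWTERMINATOR   = '\n',
    TABLOCK                  -- table lock; allows minimal logging when the recovery model permits
)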
Meg writes "Hi,
I have a table that has 4+ million records. I need to update those records and I am facing some performance issues. Can someone please advise?
update stage
set batch_status = 1
where update_status = 0

-- "transaction" is a reserved word, so the table name must be bracketed
update [transaction]
set aId = s.aId,
    b = s.b
from stage s
where s.aId = [transaction].aId
  and s.batch_status = 1

update stage
set update_status = 1,
    batch_status = 2
where batch_status = 1
When I run the above query with "set rowcount 1000", it runs in one minute. When I run the query for "set rowcount 10000", it runs in 1 hour 56 minutes. Can someone help me to optimize it?
Thanks.
Meg"
Hey folks... So I have a table that looks like this:

CREATE TABLE [tblStation] (
[CAMPAIGN] [varchar] (8),
[LISTNUM] [varchar] (10),
[PHONE] [varchar] (10),
[EVENTTIME] [datetime],
[STATION] [int],
[OPERATOR] [varchar] (16),
[EVENTCODE] [varchar],
[CALLSPAN] [decimal](18, 0),
[FDISP] [int],
[RECORDNUM] [varchar],
[STC] [varchar],
[PROMOC] [varchar],
[EXP_CAMP] [varchar],
[PROMO3] [varchar],
[MAXATT] [char],
[LISTNAME] [varchar],
[SITENAME] [char],
[Row_id] [int] IDENTITY
)

It's taking nine seconds to run the following command:

SELECT count([fdisp])
FROM [TrunkFiles_new].[dbo].[tblStation] WITH (NOLOCK)
WHERE fdisp IS NULL

Anyone familiar with a table of this size having performance like this? The [fdisp] column has a non-clustered index on it. Thanks in advance...
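Two things stand out here. First, COUNT([fdisp]) counts non-NULL values of fdisp, so combined with WHERE fdisp IS NULL it always returns 0; COUNT(*) is almost certainly what was intended. Second, on SQL Server 2008 or later a filtered index can hold only the NULL rows, so the count touches a tiny structure instead of the full index. A hedged sketch:

-- count of rows where fdisp is NULL (COUNT(*) rather than COUNT([fdisp]))
SELECT COUNT(*)
FROM [TrunkFiles_new].[dbo].[tblStation] WITH (NOLOCK)
WHERE fdisp IS NULL

-- SQL Server 2008+ only: a filtered index covering exactly those rows
CREATE NONCLUSTERED INDEX IX_tblStation_fdisp_null
    ON dbo.tblStation (fdisp)
    WHERE fdisp IS NULL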
How well can SQL Server support 300 million records?
Is anybody working with a database this big? Can anyone give me some input on this? It's going to be about 60GB in size.
Hello,
What is the fastest way to update 20 million records in our database?
I have tried to do a simple update statement like this:
update trail_log with (tablockx, holdlock)
set trail_log.entry_by = users.user_identity
from users
where trail_log.entry_by = users.user_id
but it takes 10-plus hours to run since it cannot commit the transaction until the very end. So I was thinking I need to commit in batches, say every 50K rows, but that is slow as well.
Set rowcount 50000
Declare @rc int
Set @rc = 50000
While @rc = 50000
Begin
    Begin Transaction
    update trail_log With (tablockx, holdlock)
    set trail_log.entry_by = users.user_identity
    from users
    where trail_log.entry_by = users.user_id
      and trail_log.entry_by not like '%[0-9]%'
    Select @rc = @@rowcount
    -- Commit the transaction
    Commit
End
go
I have let the above run for 1.5 hours and it only updated 450,000 rows. Any ideas...
Maybe I'm doing it wrong. Please Help!!
Hi all,
I have a sql script that updates records in a table with 40 million records.
There is some functionality in the script that could be put away in functions for code reuse/elegance.
Functions would cause execution overhead.
What else could I use besides functions that would give me the code reuse without the execution overhead? Is there anything like includes in T-SQL that would let me do this?
TIA..
Hi
I have a new client with an existing system that has just over 2 million business listings in one table. Each business listing is associated with one business category.
* Company Table (around 20 fields):
companyID
companyName
categoryID
state
postCode
etc.
* Category Table (5 fields)
categoryID
categoryName
etc.
We are using MSSQL 2005 Express Edition with Advanced Services
A free text search needs to be performed on companyName and categoryName, limited by region (state and/or postcode).
1) What kind of response times should I expect for the free text search? (I have not used free text search before.)
2) How should I index companyName and categoryName so they are both used in a joined query? i.e. do I just configure the free text search index on each field separately and it should work? (A sketch follows below.)
Any suggestions appreciated.
Best Regards
Kevan
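Response times for full-text CONTAINS queries on roughly 2 million short rows are typically well under a second once the catalog is populated. Each table gets its own full-text index (one per table, keyed on a unique index), and the region filter stays as an ordinary WHERE clause. A minimal sketch, assuming primary-key index names PK_Company and PK_Category; all names and the search term are illustrative:

CREATE FULLTEXT CATALOG ftBusiness AS DEFAULT

CREATE FULLTEXT INDEX ON dbo.Company (companyName)
    KEY INDEX PK_Company ON ftBusiness

CREATE FULLTEXT INDEX ON dbo.Category (categoryName)
    KEY INDEX PK_Category ON ftBusiness

-- search both names, limited by region
SELECT c.companyID, c.companyName
FROM dbo.Company AS c
JOIN dbo.Category AS cat ON cat.categoryID = c.categoryID
WHERE (CONTAINS(c.companyName, 'plumber') OR CONTAINS(cat.categoryName, 'plumber'))
  AND c.state = 'VIC'    -- illustrative region filter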
I want to compare the values of only one column across 2 tables, each with more than 4.9 million records. There is a difference of about 4,000 rows between the 2 tables.
SELECT ID From TABLE1 where ID not in (SELECT DISTINCT ID From TABLE2)
My query above ran for nearly 4.5 hours and I had to cancel it. Is there a better way to write it? I just want to find the ID values that are missing from TABLE2.
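NOT IN with a subquery over millions of rows is a common performance trap (and behaves badly if ID is nullable). NOT EXISTS, or EXCEPT, usually performs much better, especially with an index on TABLE2(ID). A minimal sketch:

-- NOT EXISTS form
SELECT t1.ID
FROM TABLE1 AS t1
WHERE NOT EXISTS (SELECT 1 FROM TABLE2 AS t2 WHERE t2.ID = t1.ID)

-- equivalent set-based form
SELECT ID FROM TABLE1
EXCEPT
SELECT ID FROM TABLE2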
Hi
I have 2 tables with more than a million records each and I have to perform a full outer join.
The problem is that the join clause contains 2 different parameters (int and string) like this:
Select *
From a full outer join b
On a.cli = b.cli OR a.reference = b.reference
Because of the OR in the clause and the millions of records, the query effectively never finishes. If I reduce it to one condition only, it works fine.
How can I join these 2 big tables with 2 conditions? (One workaround is sketched after this post.)
Thanks
Itay
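One workaround is to split the OR into pieces the optimizer can drive with index seeks: an inner join per condition, plus the unmatched rows from each side, glued together with UNION (which also removes exact duplicates). A minimal sketch; the column lists are illustrative and it assumes indexes exist on cli and reference in both tables:

SELECT a.cli AS a_cli, a.reference AS a_ref, b.cli AS b_cli, b.reference AS b_ref
FROM a JOIN b ON a.cli = b.cli
UNION
SELECT a.cli, a.reference, b.cli, b.reference
FROM a JOIN b ON a.reference = b.reference
UNION
SELECT a.cli, a.reference, NULL, NULL          -- rows in a with no match at all
FROM a
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.cli = a.cli OR b.reference = a.reference)
UNION
SELECT NULL, NULL, b.cli, b.reference          -- rows in b with no match at all
FROM b
WHERE NOT EXISTS (SELECT 1 FROM a WHERE a.cli = b.cli OR a.reference = b.reference)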
I have tried to process more than 3 million records through Fuzzy Grouping on two different servers with no success. 3 million works, but anything above 4 million doesn't. Some background:
We are trying to de-dup our customer table on: name (0.5 minimum similarity), address1 (0.5), city (0.5), state (exact), with a 0.8 minimum overall record score.
Output includes additional fields: customerid, sourceid, address2, country, phonenumber
Without SP1 installed I couldn't even get a few hundred thousand records to process
Two different servers - same problems. Note that SSIS and SQL Server are running locally on both
The higher end server has 4GB RAM, the other 2.5 GB RAM. Plenty of free disk space on both
SQL Server is configured to use 2 GB of RAM max
The page file is currently at 15GB
After running a number of test on both servers trying different batch sizes etc. the one thing I noticed is that it seems to always error out when SSIS takes over and starts chewing up all the available RAM. This happens after the index is created and SSIS starts "warming caches". On both servers SQL Server uses up about 1.6GB of RAM at this point while SSIS keeps taking over RAM until all physical RAM is used up.
Some questions:
Has anyone been able to process more then 3 million records and if so what is your hardware configuration?
Should we try running SSIS from a different server so it has access to the full amount of physical RAM? (so it doesn't have to fight for RAM with SQL Server)
Should we install Win 2003 Enterprise Server so we can add more RAM?
Any ideas why switching to the page file might be causing errors?
Thanks!!
Keith Doyle
I have 2 tables with this schema
CREATE TABLE tableValues(
[LASTENCRYPTIONDT] [datetime] NULL,
[ENCRYPTIONID] [int] NULL,
[NAME] [varchar](50) NULL
[Code] ....
I want to update tableToUpdate in batches of 5000 rows, setting LASTENCRYPTIONDT to NULL based on the join to tableValues on the ENCRYPTIONID column, and also output the updated rows into another table in case I need to roll back.
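A minimal sketch of one way to do this with UPDATE TOP and an OUTPUT clause. It assumes an audit table (dbo.tableToUpdate_Backup here, illustrative) already exists to receive the old values, and that rows are selected by the join on ENCRYPTIONID:

DECLARE @rows int
SET @rows = 1
WHILE @rows > 0
BEGIN
    UPDATE TOP (5000) u
    SET u.LASTENCRYPTIONDT = NULL
    OUTPUT deleted.ENCRYPTIONID, deleted.LASTENCRYPTIONDT
        INTO dbo.tableToUpdate_Backup (ENCRYPTIONID, LASTENCRYPTIONDT)   -- old values, for rollback
    FROM dbo.tableToUpdate AS u
    JOIN dbo.tableValues   AS v ON v.ENCRYPTIONID = u.ENCRYPTIONID
    WHERE u.LASTENCRYPTIONDT IS NOT NULL      -- skip rows already cleared so the loop terminates

    SET @rows = @@ROWCOUNT
END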
I have the table below:
CREATE TABLE [dbo].[DR_Test](
[source_item_id] [int] NOT NULL,
[source_line_no] [int] NULL,
[buyer_id] [int] NOT NULL,
[seller_member_id] [int] NULL,
[code]...
The table contains more than 80 million records, so when I fetch data using buyer_id and timezone it takes more than an hour, and buyer_id is not unique. How can I fetch the data faster, or do I need to change the structure of the table?
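80 million rows is not a problem by itself if the query can seek. Assuming the table really has a timezone column as described (the visible DDL is truncated), a composite nonclustered index on the two filter columns, with the selected columns INCLUDEd to avoid key lookups, is the usual first step. A hedged sketch with illustrative included columns:

CREATE NONCLUSTERED INDEX IX_DR_Test_buyer_timezone
    ON dbo.DR_Test (buyer_id, timezone)
    INCLUDE (source_item_id, source_line_no, seller_member_id)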
I have 1+ CSV files (using a foreach loop) which I'm doing a lot of transform work on and then inserting into a SQL database table.
Each CSV file usually contains about 2 days worth of data (contains date stamps) - somewhere in the region of 60k records per day.
The destination table currently contains 3 million+ rows and will get bigger.
I need to make sure that before inserting into the destination table, the data doesn't already exist.
I've read the following article: http://www.sqlis.com/311.aspx
While the lookup method works, it takes ages and eats up memory as it caches the 3m+ records before running for each CSV. Obviously this will only get worse as the table grows in size.
To make things a little more efficient, what I'd like to do is first derive the dates in the current file - essentially storing the max(date) and min(date) in variables. Then in the lookup SQL I would use those variables to reduce the amount of data that needs to be brought into the transformation to check against before inserting into the destination table.
Lookup SQL eg. SELECT * FROM MyTable WHERE Date BETWEEN varMinDate AND varMaxDate.
Ideally I'd use an aggregate transformation and then use the subsequent output from that either in the lookup query or store the output in vars, but I don't think you can do that and I get the feeling I'm approaching this with the wrong mindset.
Any thoughts would be great!
I have a pretty simple SSIS package that fast-loads a 100 million record table into a SQL Server 2008 table on a daily basis. It normally runs fine and completes in about 1 hour. But as this is perhaps one of our largest-running SSIS packages, about once every 2-3 weeks it fails or drops its connection. Once it fails, the large number of records starts rolling back, and that rollback can take 1+ hours, so I cannot even restart the failed SSIS package immediately. This is a problem.
I am looking for a solution or option so I do not have to wait on that rollback before restarting this particular long-running SSIS package. Is there an option or setting to leave the partially loaded data committed and not roll back? Then I could restart the SSIS package immediately, or set it to auto-restart once on failure. The first step in the SSIS package does a truncate of the destination table.
Well, I have a SSIS package loading data from SSIS raw file without any transformations.
My concern is that it is taking a very long time to load the data (a simple source-to-target flow).
How can I improve the load speed when my target is Oracle?
It's taking 9 minutes to load 67,000 records, and I have other tables with 2-3 million records.
I'm very concerned about the performance.
Please help me out.
Thank you
Hi All,
We are considering loading an Oracle data warehouse using SSIS as the ETL tool. I would like to know the team's experience with doing the same. Please share your experiences.
Thanks,
S Suresh
OK, we have built a data mart using SSIS etc. for the transformations and loading.
Our biggest single problem currently is loading data from an Oracle server to our SQL Server. Some tables from Oracle retrieve fine, but there is one particular table that just doesn't load fast enough (9 million records take over 12 hours). It seems that we are idling a lot and the transfer isn't always running.
Can anyone help with this problem?
Hello.
I'm trying to put in an SQL server database some data extracted from Oracle Server 9i.
During the load process, the OLE DB Destination in the task chokes when it finds a record containing the date '0197-01-01 00:00:00' (I got this by redirecting the error output to a column of type varchar(50)). I can't use the Conditional Split's date functions to filter this out because they also choke on the strange date.
Can anyone give a suggestion?
Thanks in advance,
Hugo Oliveira
I need a little help here. I want to transfer ONLY new records AND update any modified records from Oracle into SQL Server using DTS. How should I go about it?
a) How do I use a global variable to get the max date? Where and what DTS task should I use to complete the job, a Data Driven Query or a Transform Data task? How? Can you give me samples, or perhaps email me a demo package as well.
b) So far, what I did was:
- I have a datemodified field in my Oracle table so that I can compare it with the datelastrun of my DTS package to get new records.
- Records in Oracle having datemodified > Max(datelastrun) are transferred to the SQL Server table.
Now I am stuck as to how I should proceed - how can I transfer these records?
Hope you can give me some light. Thank you in advance.
Hi. I need to give my customer a SQL file that they can run in Query Analyzer. All the stuff they need to run is in a set of existing files. I'd like to just tell them to load one file that calls the others (this is Oracle syntax):
@file1.sql
@file2.sql
@file3.sql
Is there some way of calling these files (which are in the same directory) from a master SQL file?
Thanks
Jeff Kish
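Query Analyzer itself has no equivalent of Oracle's @file directive, but sqlcmd (SQL Server 2005 and later, also available as SQLCMD Mode in Management Studio) does, via :r. A minimal sketch of a master script, assuming the files sit in the directory sqlcmd is started from:

-- master.sql, run with: sqlcmd -S myServer -d myDatabase -i master.sql
:r file1.sql
:r file2.sql
:r file3.sql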
Hello sir,
I just started working with SQL Server, so please bear with me if my question is basic. Please help me out with this.
I have a table with the following columns:
Month       Size    Issue
April-07     750     2676
April-07    1000     1223
April-07     180     3439
April-07      90    23562
May-07       750      254
May-07       375      454
May-07       180      454
May-07        90     3434
That is just an example of my table; the real data runs into the thousands of rows.
What I want to do now is add one more column using an IF-style expression, something like this:
Issue_cases: IIf([size]=1000,[Issue]/9,IIf([size]=750,[Issue]/12,IIf([size]=375,[Issue]/24,IIf([size]=180,[Issue]/48,IIf([size]=90,[Issue]/100)))))
Please tell me what I have to do for this. Do I have to create a view, a procedure, or something else, and how? (A sketch follows at the end of this post.)
I think I have explained my question as best I can.
I need your help urgently.
Thanks
Ever Smiling
Ashish
You Have to Loss Many Times to Win Single Time
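The IIf expression above is Access/VBA syntax; in SQL Server the same logic is a CASE expression, and wrapping it in a view keeps the base table untouched. A minimal sketch, with dbo.IssueData standing in for the real table name; dividing by a decimal keeps fractional cases from being truncated:

CREATE VIEW dbo.vw_IssueCases
AS
SELECT [Month],
       Size,
       Issue,
       CASE Size
           WHEN 1000 THEN Issue / 9.0
           WHEN 750  THEN Issue / 12.0
           WHEN 375  THEN Issue / 24.0
           WHEN 180  THEN Issue / 48.0
           WHEN 90   THEN Issue / 100.0
       END AS Issue_cases
FROM dbo.IssueData

Querying the view (SELECT * FROM dbo.vw_IssueCases) then returns the original columns plus the computed Issue_cases column.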
We have a table with 16 million records, and this table is replicated.
We want to add a new column to this table. How should we go about it?
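On SQL Server 2005 and later, transactional replication propagates supported ALTER TABLE changes to subscribers by default, and adding a NULLable column is a quick metadata-level change even on 16 million rows. A hedged sketch; the table and column names are illustrative (on SQL Server 2000 you would use sp_repladdcolumn instead):

ALTER TABLE dbo.BigReplicatedTable
    ADD NewColumn int NULL    -- nullable, so the existing 16 million rows are not rewritten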