Fastest Way To Deduplicate A List
Oct 21, 2007
I'm trying to dedupe a table with only one field in it. The table has
40 million records in it. What is the fastest way?
1) Create a table with a unique constraint on it and insert into that table?
2) Create a table without a unique constraint and use insert into table select distinct un from table2?
3) Another way?
Michael
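For reference, a minimal sketch of option 2 using the post's own names (un, table2); loading a fresh table with SELECT DISTINCT and adding the unique constraint afterwards is usually cheaper than checking the constraint row by row during the insert:

-- sketch only: dedupe into a new table, then index it
SELECT DISTINCT un
INTO table1_dedup
FROM table2

-- one sorted build instead of 40 million per-row checks
ALTER TABLE table1_dedup ADD CONSTRAINT UQ_table1_dedup_un UNIQUE (un)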
May 14, 2008
Right now I have a stored procedure that goes through each of the Line and Body fields using a cursor. The problem is that this method is very slow. How would you experts solve this problem? Any hints or suggestions?
BEFORE:

Example  Part    Line         Body    Series/Engine  Year
1        1234    A,B                  WETC           1998
2        56789   91,93,94,95          WET0           1997
3        345656  S,R          5,6,12  WENC           1995

AFTER:

Example  Part    Line  Body  Series/Engine  Year
1        1234    A           WETC           1998
1        1234    B           WETC           1998
2        56789   91          WET0           1997
2        56789   93          WET0           1997
2        56789   94          WET0           1997
2        56789   95          WET0           1997
3        345656  S     5     WENC           1995
3        345656  S     6     WENC           1995
3        345656  S     12    WENC           1995
3        345656  R     5     WENC           1995
3        345656  R     6     WENC           1995
3        345656  R     12    WENC           1995
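One set-based alternative to the cursor, sketched with a Numbers helper table (one row per integer, built once) and the column names from the example above; each comma list is cracked by substring, and the same pattern applied to Body gives the cross product:

-- sketch: split the comma-separated Line column without a cursor
SELECT p.Part,
       SUBSTRING(p.Line, n.Number,
                 CHARINDEX(',', p.Line + ',', n.Number) - n.Number) AS Line
FROM Parts p
JOIN Numbers n
  ON n.Number <= LEN(p.Line)
 AND SUBSTRING(',' + p.Line, n.Number, 1) = ','
-- join the result of the analogous Body split to expand S,R x 5,6,12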
Sep 29, 2000
Hi,
In my SQL Server 7.0, I have about 250 stored procedures in each database.
Before using them for my application, I want to encrypt them all.
I must add "WITH ENCRYPTION" to each SP in every database, and that will take me a long time. Is there a faster way to encrypt all SPs in all DBs? Does anyone have a utility SP (or any other approach) to do this?
Thanks in advance.
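There is no one-step switch; each procedure has to be recreated with WITH ENCRYPTION in its header. A rough, heavily hedged generator sketch against the 7.0-era system tables; it assumes each definition fits in a single syscomments row (long ones span rows) and that the keyword AS follows the parameter list. Review every printed statement before executing it:

DECLARE @sql varchar(8000)
DECLARE proc_cur CURSOR FOR
    SELECT CONVERT(varchar(8000), c.text)
    FROM sysobjects o JOIN syscomments c ON c.id = o.id
    WHERE o.type = 'P' AND c.encrypted = 0
OPEN proc_cur
FETCH NEXT FROM proc_cur INTO @sql
WHILE @@FETCH_STATUS = 0
BEGIN
    -- turn CREATE into ALTER and splice WITH ENCRYPTION in before AS;
    -- crude string surgery, so PRINT first and EXEC only after review
    SET @sql = STUFF(@sql, CHARINDEX('CREATE', @sql), 6, 'ALTER')
    SET @sql = STUFF(@sql, CHARINDEX(' AS', @sql), 3, ' WITH ENCRYPTION AS')
    PRINT @sql
    FETCH NEXT FROM proc_cur INTO @sql
END
CLOSE proc_cur
DEALLOCATE proc_cur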
Mar 10, 2008
What is the fastest way for a stored procedure to copy a table from a linked server?
I would like to tune this statement, possibly with hints or other logging options. Assume that table_A and table_B have exactly the same table structure, and that I want to preserve table_A and all its indexes and constraints. The table will be truncated before this load, if that helps in any way.
insert into table_A select * from OpenQuery(Server,'select * from Table_B')
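If the database can run in simple or bulk-logged recovery during the load, a hedged variant: a TABLOCK hint on the target lets the insert qualify for minimal logging:

TRUNCATE TABLE table_A

INSERT INTO table_A WITH (TABLOCK)
SELECT * FROM OpenQuery(Server, 'select * from Table_B')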
TIA, Mike
Jul 20, 2005
In relation to my last post, I have a question for the SQL gurus. I need to update 70k records, and mark all those updated in a special column for further processing by another system.

So, if the record was
Key1, foo, foo, ""
it needs to become
Key1, fap, fap, "U"
if and only if the data values are actually different (as above, foo becomes fap); otherwise it must stay
Key1, foo, foo, ""

Is it quicker to:
1) get the row of the destination table, inspect all values programmatically, and determine IF an update query is needed, OR
2) just do an update on all rows, but adding (field1 <> value1 or field2 <> value2) to the update query, that is

update myTable
set field1 = 'foo', markField = 'u'
where key = 'mykey' and (field1 <> 'foo')

The first one will not generate new update queries if the record has not changed, on account of doing a select, whereas the second version always runs an update, but some of them will not affect any rows. Will I need a full index on the second version?

Thanks in advance,
Asger Henriksen
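A hedged sketch of option 2 with the post's column names (key is bracketed on the assumption that the column really is named key); rows already holding the new values fail the predicate, so they are neither rewritten nor marked:

UPDATE myTable
SET field1 = 'fap',
    field2 = 'fap',
    markField = 'U'
WHERE [key] = 'mykey'
  AND (field1 <> 'fap' OR field2 <> 'fap')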
Dec 26, 2000
Hi,
1) I need to transfer 500 GB of data from one server to another. Which is faster: DTS, BCP, or backup/restore?
2) What are the best methods for checking blocking, deadlocks and indexes?
Thank you all in advance
Richard..
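On (1), backup/restore is usually the fastest of the three for a full move of this size, since it streams pages rather than rows. A minimal sketch with placeholder names, paths and logical file names:

BACKUP DATABASE BigDb TO DISK = '\\target\share\BigDb.bak' WITH INIT

-- on the target server:
RESTORE DATABASE BigDb FROM DISK = 'D:\backups\BigDb.bak'
WITH MOVE 'BigDb_Data' TO 'D:\data\BigDb.mdf',
     MOVE 'BigDb_Log'  TO 'E:\log\BigDb.ldf'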
Nov 5, 2004
I have a master table which has demographic data such as name, dob and location, along with a primary key id. It will have about 10-12,000 records. We get a refresh file every hour which may or may not have corrections for these records, with about 3,000 records. I put this data into a table; this data should always be considered correct.

To handle the update to the master table I need to create an update process. I can take one of two approaches: just update all the records in the master table regardless of whether they are correct or not, or do some type of join and only update those that do not match (in other words, only update the ones where the names or dob don't match). There is an underlying update trigger on the patient master which will also fire if these values are changed. Any opinions on the best approach?
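A hedged sketch of the mismatch-only route, with hypothetical table and column names; the WHERE clause keeps the trigger from firing on rows that are already correct (add IS NULL handling if these columns are nullable):

UPDATE m
SET m.name = r.name,
    m.dob = r.dob,
    m.location = r.location
FROM PatientMaster m
JOIN RefreshData r ON r.patient_id = m.id
WHERE m.name <> r.name OR m.dob <> r.dob OR m.location <> r.location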
Feb 2, 2006
Hi,
I have a production server with an 8 GB db. It is a dual Xeon with 5 HDDs: 2 mirrored and 3 striped, with the db on the stripe and the log and OS on the mirror. It has 2 Gb network cards.
The application goes slow (i.e. users notice) when a backup is running, so I have placed a crossover cable from one NIC to a test server so that it can back up to a HDD on that server, and then to tape. The test server has 2 Gb NICs, and the link between the two servers is on a separate subnet. However, in the first trial of this, the backup and verify took 3 minutes longer.
Is this because the target server doesn't have a disk stripe?
What is the best config for the production server (i.e. will a slower backup to another server be less load to contend with the application)?
thanks
Fatherjack
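For what it's worth, backing up straight to a UNC path on the test server removes a copy step (the SQL Server service account needs write rights on the share; names are placeholders):

BACKUP DATABASE ProdDb
TO DISK = '\\testserver\backups\ProdDb.bak'
WITH INIT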
May 22, 2007
I've got a view that is driven from an 80 million record table in a data warehouse. I am trying to populate an aggregate table in a datamart, but am running into performance problems. The datamart table needs to be updated daily. I understand there are many factors that affect performance, but in general would the fastest approach be:
1) Truncate the datamart table
2) Perform a bcp of the view to a text file
3) Bulk Insert to the datamart table
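A hedged sketch of steps 2 and 3 with placeholder object names (-c exports character data, -T uses a trusted connection; TABLOCK helps the load qualify for minimal logging):

bcp "SELECT * FROM dw.dbo.vAggregate" queryout c:\agg.txt -c -T -S dwserver

BULK INSERT datamart.dbo.AggregateTable
FROM 'c:\agg.txt'
WITH (TABLOCK, ROWS_PER_BATCH = 500000)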
If you need more information to answer this please let me know.
Thanks,
Matt
Jul 23, 2005
Hi! We have SQL Server 2000 on our server (NT 4). Our database now has about 350,000+ rows with information about images. The table has a lot of columns, including information about image name, keywords, location, price, color mode etc. So our database doesn't include the images themselves, just a path to the location of every image. The Keywords field has data like this: cat,animal,pet,home,child with pet,child.

Now our search uses Full-Text Search, which sounded like a good idea in the beginning, but it has had problems that really reduce our search engine's performance. The search results are also not exact enough. Some of our images have the photographer's name in the keywords column, and if the photographer's name is, for example, Peter Moss, his pictures appear when a customer searches for "moss" (nature-like) pictures.

Another problem is that Full-Text Search started to be very slow when the query result contains thousands of rows. When a search term gives at most 3,000 rows, search is fast, but larger searches take 6 to 20 seconds to finish, which is not good. I have also noticed that the first search is always very slow, but the next ones are faster. It seems that the engine is just "starting" when the first query runs.

Is there a better and faster way to handle the queries? Is it better to rebuild the database somehow and use another search method than Full-Text Search? I don't know how else to handle the database when every image has about 10 to even 50 different keywords to search.

We have made the web interface and search code with ColdFusion. ColdFusion Server then takes care of sending all queries to SQL Server. I hope that somebody has some idea how to speed up our picture search.
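One common alternative to full-text here is to normalize the keywords into a child table, so "moss" matches a whole keyword rather than a substring of "Peter Moss"; a sketch with hypothetical names:

CREATE TABLE ImageKeyword (
    ImageId int NOT NULL,
    Keyword varchar(100) NOT NULL,
    CONSTRAINT PK_ImageKeyword PRIMARY KEY (Keyword, ImageId)
)

-- exact keyword match, served straight from the clustered key
SELECT i.*
FROM Images i
JOIN ImageKeyword k ON k.ImageId = i.ImageId
WHERE k.Keyword = 'moss'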
Jan 10, 2008
Can anyone point to a reference which documents the pros and cons of the various connection protocols, such as Shared Memory vs. TCP/IP? I thought I saw something indicating that shared memory is fastest, which would explain why this protocol is tried first, but now I can't find it. This resource has information on creating connection strings, but not the advantages and disadvantages.
http://msdn2.microsoft.com/en-us/library/ms187662.aspx
Jan 11, 2007
Hi,
I'd like to know the fastest way to add named instances to SQL Server 2005 (I need to add 5 named instances).
Thank you!
John
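Assuming "fastest" means scripting the installs rather than clicking through Setup five times, SQL Server 2005 Setup can be driven from the command line; a hedged sketch, run once per instance (instance name and component list are placeholders):

setup.exe /qb INSTANCENAME=INST1 ADDLOCAL=SQL_Engine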
Nov 17, 2006
I am trying to find some info on the fastest connection transport for an app that is running on the same box as a SQL 2005 instance. The app does a large number of updates (high volume of data), Win32 using native ODBC. I am trying to find info on which connection mechanism is best: sockets, pipes, etc. I read somewhere that there is a file-mapped method available, but I cannot find info on that either. Also, is there any performance difference between the old SQL ODBC driver and the new SQL Native Client ODBC driver?
Thanks.
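For a same-box app, shared memory (the file-mapped transport) is the one to beat, and it can be forced per connection with a protocol prefix on the server name; a hedged example of a SQL Native Client ODBC connection string:

Driver={SQL Native Client};Server=lpc:(local);Database=mydb;Trusted_Connection=yes;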
Jul 30, 2007
Hi All,
I'm bulk loading a ton of data into MSSQL Server 2005 Standard Edition. I used to do this process in version 2000. It seems there is some more overhead in 2005. Is there a way to drop logging to almost nothing to speed up the insert?
This is my current SQL statement to load the data.
EXEC sp_dboption 'my_stuff', 'select into/bulkcopy', 'true'
SET ANSI_WARNINGS OFF
BULK INSERT mystuff.dbo.[v1]
FROM 'c:\myfile.txt'
WITH
(
FIRSTROW = 1,
FORMATFILE = 'c:\scripts\v1.fmt',
MAXERRORS=2000,
ROWS_PER_BATCH=100000
)
Thanks,
Mike
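sp_dboption's 'select into/bulkcopy' flag is the 2000-era control; in 2005 the equivalent lever is the recovery model, plus a TABLOCK hint so the load can qualify for minimal logging. A hedged sketch (the backslashed paths are assumed, as above):

ALTER DATABASE my_stuff SET RECOVERY BULK_LOGGED

BULK INSERT mystuff.dbo.[v1]
FROM 'c:\myfile.txt'
WITH (FORMATFILE = 'c:\scripts\v1.fmt', TABLOCK, ROWS_PER_BATCH = 100000)

ALTER DATABASE my_stuff SET RECOVERY FULL
-- take a full or log backup afterwards to restart the log chain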
Nov 14, 2007
I'm writing a program that allows users to upload a CSV file. This file is then separated into 4 datatables based on certain criteria, and each datatable is uploaded into my database. I'm essentially adding new rows to the datatables, then running an update command on each using a tableadapter. The problem is that these CSV files can be large and can end up adding 4000+ new records to the database, and the update commands take a while to do it. I've sat for about five minutes on one run while it updated. I put in some time variables to see where the time is spent: it takes only seconds to parse the data and separate it into the datatables, but minutes on the update commands. Is there a more efficient way to insert this much data?
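A hedged sketch of the usual fix, SqlBulkCopy (System.Data.SqlClient, .NET 2.0+), which streams the rows in one bulk operation instead of emitting one INSERT per row; the table name is a placeholder:

using System.Data;
using System.Data.SqlClient;

static void BulkLoad(string connectionString, DataTable table)
{
    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (SqlBulkCopy bulk = new SqlBulkCopy(conn))
        {
            bulk.DestinationTableName = "dbo.ImportedRows"; // placeholder
            bulk.BatchSize = 5000;        // commit in chunks
            bulk.WriteToServer(table);    // one bulk stream, not row-by-row
        }
    }
}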
Mar 5, 2008
Hi, I have a table with one XML type column. This column holds custom field information; it's used as a way of storing ad hoc fields and data that don't fit the DB design.

<?xml version="1.0"?><contact><Reference>A39390TFH</Reference><Misc>all kinds of stuff go in here</Misc></contact>

I want to provide a way of displaying the data stored in this column in the same DataTable as normal relational data from the same table. I have been able to achieve this goal, BUT I want to know if the community has any ideas on how I could speed up the process. I am using the XML value() function; it allows me to extract the data I need:

SELECT Name, Number, Reference FROM
(
SELECT Name, Number, xmlvalues.value( ' (contact/Reference)[1] ', ' varchar(40) ' ) as Reference
FROM MyTable
WHERE Name = 'Some Dude'
AND xmlvalues is not null
) T
GROUP BY Name, Number, Reference

Does anyone know if there are better XML functions to get this data out of the XML column? There is no schema, because each xml fragment has different tags and different values.
Regards, Niall
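One option worth testing is "property promotion": wrap the value() call in a schema-bound scalar UDF and expose it as a persisted computed column, so the shredding happens once at write time instead of on every query. A hedged sketch assuming the XML column is named xmlvalues:

CREATE FUNCTION dbo.fnContactReference (@x xml)
RETURNS varchar(40) WITH SCHEMABINDING
AS
BEGIN
    RETURN @x.value('(contact/Reference)[1]', 'varchar(40)')
END
GO
ALTER TABLE MyTable
ADD ReferencePromoted AS dbo.fnContactReference(xmlvalues) PERSISTED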
Mar 15, 2004
If there are 13 million records in one table and 40 thousand records in another table, what is the fastest way of joining these two tables?
This was a question put to me that I couldn't answer properly. Could anybody give the answer, with the proper reasons behind it?
Thanks.
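There is no single answer without the schema, but one defensible sketch: index the join column on the big table and let the optimizer choose (typically a hash join, or a loop join driven by the small table when the big side is indexed). Names are hypothetical:

CREATE INDEX IX_BigTable_CustomerId ON BigTable (CustomerId)

SELECT s.CustomerId, COUNT(*)
FROM BigTable b
JOIN SmallTable s ON s.CustomerId = b.CustomerId
GROUP BY s.CustomerId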
Nov 22, 1999
We have a table that we BCP into; the data is then processed and inserted into its appropriate table. Then the table or its data needs to be removed. Removing the table or table data seems to be a very slow operation. I have tried drop table and truncate table, and either takes nearly as long as the bcp operation. The table has 12 million rows. I didn't think either operation wrote to the transaction log except for page extent management. Why are the drop and truncate so slow? Suggestions?
Mar 19, 2008
Hello,
What is the fastest way to update 20 million records in our database?
I have tried to do a simple update statement like this:
update trail_log with (tablockx, holdlock)
set trail_log.entry_by = users.user_identity
from users
where trail_log.entry_by = users.user_id
but it takes 10-plus hours to run, since it cannot commit the transactions until the very end. So I was thinking that I need to commit in batches, like after every 50K rows, but that is slow as well.
Set rowcount 50000
Declare @rc int
Set @rc = 50000
While @rc = 50000
Begin
    Begin Transaction
    update trail_log With (tablockx, holdlock)
    set trail_log.entry_by = users.user_identity
    from users
    where trail_log.entry_by = users.user_id
    and trail_log.entry_by not like '%[0-9]%'
    Select @rc = @@rowcount
    -- Commit the transaction
    Commit
End
go
I have let the above statement run for 1.5 hours and it only updated 450,000 rows. Any ideas?
Maybe I'm doing it wrong. Please help!
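One hedged alternative is to batch on an indexed key range, so each pass touches only its own slice of the table instead of re-scanning from the top; this assumes an indexed int key, here hypothetically named trail_id:

DECLARE @lo int, @hi int
SELECT @lo = MIN(trail_id), @hi = MAX(trail_id) FROM trail_log
WHILE @lo <= @hi
BEGIN
    UPDATE trail_log
    SET entry_by = users.user_identity
    FROM users
    WHERE trail_log.entry_by = users.user_id
    AND trail_log.trail_id BETWEEN @lo AND @lo + 49999
    -- each range commits on its own, keeping the log small
    SET @lo = @lo + 50000
END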
Jul 23, 2005
What is the fastest way to generate a SQL 2000 database out of SQL scripts? The SQL scripts contain the create tables, views, stored procedures, triggers, constraints, and the tables' DATA records. What are my options? isql? osql? Are there other ways? Thank you
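For reference, a hedged one-liner per script with osql (isql is its older DB-Library twin); server, database, and file names are placeholders:

osql -S myserver -E -d mydb -i create_schema.sql -n -o create_schema.log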
Jul 20, 2005
I'd like to use a stored procedure to insert large amounts of records into a table. My field A should be filled with a given range of numbers. I do the following, but I'm sure there is a better (faster) way:

select @start = max(A) from tbl where B = 'test1' and C = 'test2'
while @start <= 500000
begin
    insert into tbl (A, B, C)
    values (@start, 'test1', 'test2')
    set @start = @start + 1
end

Another question is: how do I prevent another user from inserting the same numbers into field A?
Thanks a lot for any help!
ratu
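A hedged set-based version, assuming a Numbers helper table with one row per integer (built once); the whole range goes in as a single insert, and a unique constraint on A is the clean answer to the second question:

INSERT INTO tbl (A, B, C)
SELECT n.Number, 'test1', 'test2'
FROM Numbers n
WHERE n.Number <= 500000
  AND n.Number > (SELECT MAX(A) FROM tbl WHERE B = 'test1' AND C = 'test2')
-- note: assumes at least one existing row; wrap the MAX in ISNULL(..., 0) otherwise

-- rejects duplicate numbers no matter who inserts them:
ALTER TABLE tbl ADD CONSTRAINT UQ_tbl_A UNIQUE (A)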
Jul 31, 2007
Hi All,
I have NUnit tests; in each test I create a temp db with data and drop it at the end of the test.
Since I have several connections to it during the test, I set the db to single_user and then drop it, because I can't drop it while others are connected ("others" means other connections).
This takes quite some time. I tried detach, but saw no improvement.
What would be the fastest way to drop a database?
Thanks
Avi
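A hedged sketch of the usual pattern: ROLLBACK IMMEDIATE kicks the other connections off at once, so the single-user switch (and therefore the drop) doesn't wait for them to disconnect. The database name is a placeholder:

ALTER DATABASE TestDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DROP DATABASE TestDb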
Jul 10, 2006
I need to load a lot of data into a SQL table as fast as possible (all data is string data, CSV file). I have read that the "fastest" way to load data is using the "Fast Load" option of the OLE DB Destination, but I have also read that the SQL Server Destination is fast.
1. Is there a general consensus as to which is the quickest load?
2. The file is a CSV file. Would a fixed-format file be faster to read?
TIA,
barkingdog
Aug 13, 2015
I have a big table (> 40,000,000 rows) in a SQL Server 2008 R2 database. I added a new column of type int to the table. What is the fastest way to populate the column with the value -1? The table can be locked.
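Since the table can be locked, a hedged sketch of a batched backfill; batching keeps each transaction, and therefore the log, small (the column name is a placeholder, assumed NULL to start):

WHILE 1 = 1
BEGIN
    UPDATE TOP (100000) dbo.BigTable
    SET new_col = -1
    WHERE new_col IS NULL
    IF @@ROWCOUNT = 0 BREAK  -- done when no NULLs remain
END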
Mar 3, 2006
I was wondering what is the fastest way to UPDATE lots of records. I heard the fastest way to perform lots of inserts is to use SqlCeResultSet. Would this also be the fastest way to update already existing records? If so, is this the fastest way to do that:
1. Create a SqlCeCommand object.
2. Set the CommandText to select the data I want to update
3. Call the command object's ExecuteResultSet method to create a SqlCeResultSet object
4. Call the result set object's Read method to advance to the next record
5. Use the result set object to update the values using the SqlCeResultSet.SetValue method and the Update method.
6. repeat steps 4 and 5
Also, do I call the SqlCeResultSet.Update method once per row, or just once? Also, would it be possible and faster to wrap all that in a transaction?
Would parameterized updates be faster?
Any help will be appreciated.
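A hedged sketch of the loop described above (System.Data.SqlServerCe); Update() is issued once per row, after the SetValue calls for that row. Table and column names are placeholders:

SqlCeCommand cmd = conn.CreateCommand();       // conn: an open SqlCeConnection
cmd.CommandText = "SELECT ID, Qty FROM Stock"; // placeholder table/columns
SqlCeResultSet rs = cmd.ExecuteResultSet(
    ResultSetOptions.Updatable | ResultSetOptions.Scrollable);
while (rs.Read())
{
    rs.SetInt32(1, rs.GetInt32(1) + 1); // new value for ordinal 1 (Qty)
    rs.Update();                        // commits this row's changes
}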
Jan 11, 2007
Hello SQL colleagues,
I need to add 5 new named instances on a SQL Server 2005 Enterprise Edition.
Is there a fast way to do it?
Best Regards,
John
Jul 27, 2006
If I create an index on a field in SQL Server, what will be the most efficient (fastest) field type to index a field? (This field will be a "Pointer" to a child table that will contain a list of codes, and their description.)
Would a Numeric field be quicker than a VarChar field?
VarChar would make it easier for a Human to decipher the raw records. (For example, if I used a numeric the code would be 42 or 47, while the VarChar could be 'savings' or 'checking'.)
Basically I will have the following "Master" table:

Field          Type
-----          ----
ID             Int
Name           VarChar
Status         Int -or- VarChar
Customer_Type  Int -or- VarChar
If Customer_Type is a code that can be looked up in another table, and I index that field, would I want the "Code" to be an Int or VarChar?
SQL: Select *
From Master
Where Customer_Type = <42> or <'savings'>
My Where clause would depend on the field type.
Thank you, Bryan
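For an indexed lookup code, int generally wins: 4 fixed bytes versus a wider varchar key means a smaller, denser index, and a join to the code table keeps queries human-readable anyway. A hedged sketch:

CREATE TABLE Customer_Type (
    ID int PRIMARY KEY,
    Description varchar(20) NOT NULL
)
CREATE TABLE Master (
    ID int PRIMARY KEY,
    Name varchar(50),
    Status int,
    Customer_Type int REFERENCES Customer_Type (ID)
)
CREATE INDEX IX_Master_CustomerType ON Master (Customer_Type)

-- readable without memorizing codes:
SELECT m.*
FROM Master m
JOIN Customer_Type ct ON ct.ID = m.Customer_Type
WHERE ct.Description = 'savings'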
Mar 29, 2004
ASP.NET and MSSQL run on the same machine, under Windows 2000 Server,
and the physical memory limit of MSSQL is set to 192MB.
Does anyone have any good ideas? Please share them here.
:)
May 12, 2006
So I've got 2 classes, one I wrote to interrogate databases using normal ADO.

Mine:

SqlConnection myConnection = new SqlConnection(m_sConnectionString);
SqlCommand myCommand = new SqlCommand(sQuery, myConnection);
myCommand.CommandTimeout = 120; // 120-second timeout
myConnection.Open();
SqlDataReader result = myCommand.ExecuteReader(CommandBehavior.CloseConnection);
return result;

Microsoft way:

SqlDatabase dbSvc = new SqlDatabase(m_sConnectionString);
DbCommand dbCommand = dbSvc.GetSqlStringCommand(sQuery);
return ((SqlDataReader)dbSvc.ExecuteReader(dbCommand));

What's faster?

My way:

SqlConnection myConnection = new SqlConnection(m_sConnectionString);
SqlCommand myCommand = new SqlCommand(sQuery, myConnection);
myCommand.CommandTimeout = 120; // 120-second timeout
// Use a DataTable – required for default paging
SqlDataAdapter myAdapter = new SqlDataAdapter(myCommand);
DataTable myTable = new DataTable();
myAdapter.Fill(myTable);
myConnection.Close();
myConnection.Dispose();
myConnection = null;
return (myTable);

Microsoft way:

SqlDatabase dbSvc = new SqlDatabase(m_sConnectionString);
DbCommand dbCommand = dbSvc.GetSqlStringCommand(sQuery);
DataTable dtData = null;
DataSet dsData = dbSvc.ExecuteDataSet(dbCommand);
dtData = dsData.Tables[0];
return (dtData);

Comments? Ideas?
Al
Mar 23, 2008
Hi Guys,
What is the fastest way to move a huge table (77 million records, 25 columns) across servers? The servers are not linked, though.
Thanks for the help.
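With no linked server, native-mode bcp out and back in is the usual route; a hedged sketch with placeholder names (-n keeps native format, -T uses a trusted connection, -b batches the load, and the TABLOCK hint helps it go minimally logged):

bcp SourceDb.dbo.BigTable out \\share\bigtable.dat -n -T -S sourceserver
bcp TargetDb.dbo.BigTable in \\share\bigtable.dat -n -T -S targetserver -b 50000 -h "TABLOCK"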
Jul 23, 2005
I work with databases, and some of my backup files are 30-100 GB in size. I currently use the FastCopy utility to copy files from one server to another (this is needed when I need to transfer a DB to a different server). FastCopy says it's 30 to 50% faster than Windows file copy. I read somewhere that the Windows kernel defaults to 4 MB/sec throughput for copying. Is this true? Is there a faster way or some utility to copy files? I'm aware of RoboCopy, but the documentation didn't mention anything about faster performance. All my servers are Windows 2000 NTFS. Thanks
Jul 23, 2005
I've probably not given the best title to this topic, but that reflects my relative "newbie" status.

I have a table that goes essentially

TS         DATETIME
jobnumber  VARCHAR
jobentry   VARCHAR
...

The TS is a timestamp, and the other two fields are job number and entries. There are (many) more fields, but this is the core of it. The relationship is that there will be several entries per job, with one row in the table per entry (i.e. several rows per job).

In constructing a web interface I want to create a list of recent job numbers, and summarize it broadly as follows:

max(ts), jobnumber, count(jobentry), ...

I can do this with a select command as follows:

select top 30 max(ts) as time, jobnumber, count(jobentry)
from Jobs
group by jobnumber
order by time desc

However, I'm now finding that this is quite slow now that my test table has around 400,000 entries (even though I've added indexes to most relevant fields). In particular it's much slower than

select top 30 jobnumber
from Jobs
group by jobnumber
order by jobnumber desc

leading me to suspect that the aggregate functions are slowing this down. I would guesstimate the difference in speed is around a factor of 10-20.

As I type this I think I've just realised it's the "order by time desc" that is probably causing the problem (in that it will need to calculate max(ts) for all jobnumbers in the table in order to execute the "order by" clause). I think I can see the solution now (amazing what typing out a long question can do for you :-)

My question was going to be if there were any approved methods/tricks for avoiding this sort of performance hit. It seemed to me that if I could first grab a decent number of recent records (quickly), I could then execute the aggregate functions against that smaller set of records. I know in broad terms how many entries there can be per job, so I could select enough to make at least 30 jobs, and then execute the real query against that. In my case there will probably be less than 10 entries per job, so I could grab 300 records and execute against that, instead of against the whole 400,000.

That being the case, is this best done
a) as a subquery, and if so, how?
b) by creating a temporary intermediate table (which is in essence a more explicit version of (a), isn't it?)

Another solution that occurred to me was
c) to create and maintain a summary table, with just one record per jobnumber, with a view to having it serve this particular (common) enquiry.

For (c) to work I would probably need triggers on the Jobs table to maintain the summary table. This would probably represent more overhead overall, but should certainly give a better response when the query outlined above is executed on the web site. The response time to this query is becoming critical on the web interface to avoid timeouts.

Any suggestions or comments are welcome. If it's RTFM, then I'd appreciate page numbers :-)

Thanks for reading this far :-)
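For the record, a hedged sketch of option (a): the derived table grabs the most recent slice cheaply via an index on ts, and the aggregate then runs over only that slice (300 assumes the "fewer than 10 entries per job" figure above):

select top 30 max(ts) as time, jobnumber, count(jobentry)
from (select top 300 ts, jobnumber, jobentry
      from Jobs
      order by ts desc) as recent
group by jobnumber
order by time desc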
Nov 12, 2006
Hi all,
I have a situation where my Visual C# application presents a number of fields. In order to update a student object, I wish to call a stored proc. One or more fields can be updated, and if one is left null, then I don't want to update it; instead I want to keep the old value.
I am really wondering if I am approaching this the right way. The following stored proc does what I want it to do; however, I'm thinking there may be a faster way.
Here it is:
-- Update a student, by ID.
DROP PROCEDURE p_UpdateStudent
GO
CREATE PROCEDURE p_UpdateStudent
@ID INT,
@NewFName VARCHAR(25),
@NewOName VARCHAR(25),
@NewLName VARCHAR(25),
@NewDOB DATETIME,
@NewENumber VARCHAR(10),
@NewContactAID INT,
@NewContactBID INT
AS
BEGIN
SET NOCOUNT ON;
-- DECLARE THE OLD VALUES
DECLARE @FName AS VARCHAR(25)
DECLARE @OName AS VARCHAR(25)
DECLARE @LName AS VARCHAR(25)
DECLARE @DOB AS DATETIME
DECLARE @ENumber AS VARCHAR(10)
DECLARE @ContactAID AS INT
DECLARE @ContactBID AS INT
-- Get all of the old values
SELECT @FName = FName,
       @OName = OName,
       @LName = LName,
       @DOB = DOB,
       @ENumber = ENumber,
       @ContactAID = ContactAID,
       @ContactBID = ContactBID
FROM TBL_Student WHERE ID = @ID
-- USE ISNULL to set all of the new parameters to the provided values only if they are not null
-- Keep the old ones otherwise.
SET @NewFName = ISNULL(@NewFName, @FName)
SET @NewOName = ISNULL(@NewOName, @OName)
SET @NewLName = ISNULL(@NewLName, @LName)
SET @NewDOB = ISNULL(@NewDOB, @DOB)
SET @NewENumber = ISNULL(@NewENumber, @ENumber)
SET @NewContactAID = ISNULL(@NewContactAID, @ContactAID)
SET @NewContactBID = ISNULL(@NewContactBID, @ContactBID)
-- Do the update
UPDATE TBL_Student
SET FName = @NewFName,
OName = @NewOName,
LName = @NewLName,
DOB = @NewDOB,
ENumber = @NewENumber,
ContactAID = @NewContactAID,
ContactBID = @NewContactBID
WHERE
ID = @ID
END
GO
So yeah, it works. But as you can see, I wish to keep an old copy of the values to perform checks pre-update.
Is there any faster way, or am I on the right track? I need a pro's advice :) (before I write all of my procs!!)
Thanks all.
Chris
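If the pre-read is only there to feed the ISNULL calls (and not for other checks), a commonly used shortcut is to fold it into the UPDATE itself, so the whole proc body collapses to one statement; a hedged sketch:

-- ISNULL against the row's current values: one statement, no pre-read
UPDATE TBL_Student
SET FName = ISNULL(@NewFName, FName),
    OName = ISNULL(@NewOName, OName),
    LName = ISNULL(@NewLName, LName),
    DOB = ISNULL(@NewDOB, DOB),
    ENumber = ISNULL(@NewENumber, ENumber),
    ContactAID = ISNULL(@NewContactAID, ContactAID),
    ContactBID = ISNULL(@NewContactBID, ContactBID)
WHERE ID = @ID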