SQL Server 2014 :: Composite PK / Surrogate And Sequence Identity
Jun 21, 2015
So, I have some questions about best practice in SQL Server.
1.) I have a PK like this: (company TINYINT, store TINYINT, action TINYINT, invoice INT, sn SMALLINT). I know JOINs work faster with a surrogate key, but I have only a couple of JOINs on that table. I mainly use the members of the PK in WHERE clauses, alone and combined, for reporting purposes. Is it always better to have a surrogate key, given that surrogates don't carry any of the meaning and context of the data in the current PK?
2.) In my PK above I have two candidates for using a Sequence object. Invoice starts at 1 for every (company, store, action) combination. Sn starts at 1 for every (company, store, action, invoice) combination. I would like to know whether I can implement a Sequence object here, knowing that Sequence doesn't support PARTITION BY in the OVER clause. From what I read it cannot be done via Sequence, but I have to ask. Here is a data sample for this PK.
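For reference, the usual workaround when SEQUENCE cannot restart per (company, store, action) is a small counter table incremented atomically. This is only a minimal sketch under assumed names and types, not a drop-in design:

-- Hypothetical counter table: one row per (company, store, action).
CREATE TABLE dbo.InvoiceCounter (
    company      TINYINT NOT NULL,
    store        TINYINT NOT NULL,
    [action]     TINYINT NOT NULL,
    last_invoice INT     NOT NULL CONSTRAINT DF_InvoiceCounter_last DEFAULT (0),
    CONSTRAINT PK_InvoiceCounter PRIMARY KEY (company, store, [action])
);

-- Atomic read-and-increment: the UPDATE's exclusive row lock prevents two
-- sessions from drawing the same invoice number for the same combination.
DECLARE @company TINYINT = 1, @store TINYINT = 1, @action TINYINT = 1;
DECLARE @invoice INT;

UPDATE dbo.InvoiceCounter
SET @invoice = last_invoice = last_invoice + 1
WHERE company = @company AND store = @store AND [action] = @action;

SELECT @invoice AS NextInvoice;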
The original design of my db (part of it...) is the following.
A JOB has a Number and a Description. Each JOB can have one or two TASKs (min one, max two). Each TASK is identified by the JOB it belongs to and an Index (unique only within the same JOB). Each TASK has one and only one set of INFO1, one and only one set of INFO2, one and only one set of INFO3, etc.
(There is a reason to keep INFO1, 2 and 3 separate: each of them will be linked to a different table. This might influence the answer to my real question.)
First of all, I wouldn't add any surrogate key for TASK, so as not to lose the logic behind it; plus I'd put an index on JobNum only, Index being equal to 1 or 2 only and therefore not selective.
The real question is about the INFO1 (and 2, 3, etc.) table: should I leave JobNum and Index as the PK (consider that the PK of INFO1 will be used as an FK for another table), or should I use a surrogate key, like for example
C: INFO1 (Info1ID [PK], JobNum [FKb], Index [FKb], ...)
I don't really like this solution. Actually I'd prefer the following
C: INFO1 (Info1ID [PK], ...)
where Info1ID = JobNum + Index (+ = string concatenation).
My previous post was not really clear, so I'll try again with a (hopefully) better (even if longer) example...
Consider the following...
A JOB describes the processing of a document. Each document can exist in two versions: English and French. A JOB can have 1 or 2 TASKs, each describing the processing of either the English or the French version. So we have the following:
that is, there is an identifying 1:M (where the maximum allowed for M is 2) relationship between JOB and TASK, TASK being identified by JobNum and Version (where the domain for Version is {E, F}).
Each TASK may require a TRANSLATION sub_task. Each TASK may require a TYPING sub_task. Each TASK may require a DISTRIBUTION sub_task.
For example, for a given doc, the English TASK requires TRANSLATION and DISTRIBUTION, while the French only DISTRIBUTION.
That is, there is an optional 1:1 relationship between TASK and each of TRANSLATION, TYPING and DISTRIBUTION. So we have the following:
C: TRANSLATION (JobNum [PK] [FKb], Version [PK] [FKb], DueDate, ...)
D: TYPING (JobNum [PK] [FKb], Version [PK] [FKb], DueDate, ...)
E: DISTRIBUTION (JobNum [PK] [FKb], Version [PK] [FKb], Copies, ...)
As you can see I am using the PK of TASK as FK and PK for each of the three SUB_TASKs.
To complicate things, each SUB_TASK has one or more assignments. The assignments for each SUB_TASK record different information from the others. So we have...
C: TRANSLATION (JobNum [PK] [FKb], Version [PK] [FKb], DueDate, ...)
D: TYPING (JobNum [PK] [FKb], Version [PK] [FKb], DueDate, ...)
E: DISTRIBUTION (JobNum [PK] [FKb], Version [PK] [FKb], Copies, ...)
F: TRA_ASSIGN (JobNum [PK] [FKc], Version [PK] [FKc], Index [PK], Translator, ...)
G: TYP_ASSIGN (JobNum [PK] [FKd], Version [PK] [FKd], Index [PK], Typist, ...)
H: REP_ASSIGN (JobNum [PK] [FKe], Version [PK] [FKe], Index [PK], Pages, ...)
that is, there is an identifying 1:M relationship between each SUB_TASK and its ASSIGNMENTs, each ASSIGNMENT being identified by the SUB_TASK it belongs to and an Index.
I wish I could send a pic of the ER diagram...
Maybe there is another and better way to model this: if so, any suggestion?
Given this model, should I use a surrogate key for TRANSLATION, TYPING and DISTRIBUTION, instead of the composite key? Like for example:
C: TRANSLATION (TranslationID [PK], JobNum [FKb], Version [FKb], DueDate, ...)
D: TYPING (TypingID [PK], JobNum [FKb], Version [FKb], DueDate, ...)
E: DISTRIBUTION (DistributionID [PK], JobNum [FKb], Version [FKb], Copies, ...)
this will "improve" the ASSIGNMENTs tables:
F: TRA_ASSIGN (TranslationID [PK] [FKc], Index [PK], Translator, ...)
G: TYP_ASSIGN (TypingID [PK] [FKd], Index [PK], Typist, ...)
H: REP_ASSIGN (DistributionID [PK] [FKe], Index [PK], Pages, ...)
I could even go further and use a surrogate key for TASK as well, which leads me to the following:
F: TRA_ASSIGN (TaskID [PK] [FKc], Index [PK], Translator, ...)
G: TYP_ASSIGN (TaskID [PK] [FKd], Index [PK], Typist, ...)
H: REP_ASSIGN (TaskID [PK] [FKe], Index [PK], Pages, ...)
I don't really like this second solution, but I'm still not sure about the first one, the one with the surrogate key only in the SUB_TASK tables.
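For what it's worth, a minimal T-SQL sketch of that first solution, under assumed column types (everything here is illustrative; the natural key is kept unique so no information is lost):

CREATE TABLE JOB (
    JobNum      INT PRIMARY KEY,
    Description VARCHAR(100)
);

CREATE TABLE TASK (
    JobNum  INT     NOT NULL REFERENCES JOB (JobNum),
    Version CHAR(1) NOT NULL CHECK (Version IN ('E', 'F')),
    PRIMARY KEY (JobNum, Version)
);

-- Surrogate key on the sub-task; the natural key becomes a unique constraint.
CREATE TABLE TRANSLATION (
    TranslationID INT IDENTITY(1,1) PRIMARY KEY,
    JobNum        INT     NOT NULL,
    Version       CHAR(1) NOT NULL,
    DueDate       DATE    NULL,
    CONSTRAINT UQ_TRANSLATION_Task UNIQUE (JobNum, Version),
    CONSTRAINT FK_TRANSLATION_TASK FOREIGN KEY (JobNum, Version)
        REFERENCES TASK (JobNum, Version)
);

-- The assignment table now carries a single-column FK.
CREATE TABLE TRA_ASSIGN (
    TranslationID INT      NOT NULL REFERENCES TRANSLATION (TranslationID),
    [Index]       SMALLINT NOT NULL,
    Translator    VARCHAR(50),
    CONSTRAINT PK_TRA_ASSIGN PRIMARY KEY (TranslationID, [Index])
);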
Hello, I'm looking for a way of generating the next key value that works in both MS and Sybase SQL Servers. Sybase identity columns are a bit dodgy, so...
If I have a separate table NextKey (NextKey int) with one row that I update as follows...
declare @NextKey int

update NextKey
set NextKey = NextKey + 1, @NextKey = NextKey + 1

insert into myTable (PrimaryKeyCol, ....)
values (@NextKey, ....)
are there any problems with concurrency? As I see it, the update will lock the row, so different connections will always come up with a different @NextKey value...
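For the record, a compact variant of the same pattern (this chained assignment is T-SQL only; Sybase syntax differs). The row's exclusive lock is held until commit, so two sessions cannot read the same value, and folding the variable assignment into the UPDATE leaves no window between increment and read:

DECLARE @NextKey INT;

-- Increment and read back in a single atomic statement.
UPDATE NextKey
SET @NextKey = NextKey = NextKey + 1;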
The script below works, but it breaks when DATEDIFF(day, startdate, enddate) reaches 0. For every bookid I want to iterate until the DATEDIFF is zero, then move on to the next bookid and do the same thing.
I've started using a SEQUENCE in a table instead of an identity.
I seem to be experiencing a problem where the sequence gets reset to a lower value periodically. Inserts into the table work, producing the next bigint in the sequence as the primary key, for days, and then all of a sudden duplicate primary key errors show up. When I check, the last primary key value in the table is higher than the current value of the sequence.
For example: right now I have primary key values 6000 through 7032 contiguously in the table, all of which were generated with the sequence. Suddenly I'm getting duplicate primary key errors. A quick check of the sequence shows it's at 7002, but the last inserted row has a primary key of 7032!
I'm populating this table in one place (in the application layer), leaving the primary key null, which allows the default constraint to get the next sequence.
When the problem shows up, I've reset the sequence to the higher number in the past and all is well for many days, then the problem occurs again.
The definition for the sequence is:
CREATE SEQUENCE [dbo].[IntegrationQueueSEQ]
    AS [bigint]
    START WITH 1
    INCREMENT BY 1
    MINVALUE 0
    MAXVALUE 9223372036854775807
    CYCLE
    CACHE 50
The default constraint for the primary key on the table is defined as:
ALTER TABLE [dbo].[IntegrationQueue]
    ADD CONSTRAINT [DF_IntegrationQueue_IntegrationQueueID]
    DEFAULT (NEXT VALUE FOR [dbo].[IntegrationQueueSEQ]) FOR [IntegrationQueueID]
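A hedged diagnostic sketch for the next time it happens: compare the sequence's current value against the table's maximum key, then restart the sequence above it (ALTER SEQUENCE won't take a variable directly, hence the dynamic SQL). Note that losing the CACHE on an unclean shutdown normally produces gaps going forward, not reused values, so a backward-moving sequence may point to something else, such as a database restore or a manual reset:

SELECT current_value FROM sys.sequences WHERE name = 'IntegrationQueueSEQ';
SELECT MAX(IntegrationQueueID) AS MaxKey FROM dbo.IntegrationQueue;

DECLARE @restart BIGINT =
    (SELECT MAX(IntegrationQueueID) + 1 FROM dbo.IntegrationQueue);
DECLARE @sql NVARCHAR(200) =
    N'ALTER SEQUENCE dbo.IntegrationQueueSEQ RESTART WITH '
    + CAST(@restart AS NVARCHAR(20));
EXEC sp_executesql @sql;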
I am designing a database. I want to define an automatic sequence on a table's primary key field. What is the best solution for it? I know I can enable the identity property for a field, but it has some problems (for example, its seed jumps on restarts and unsuccessful inserts). I could also use some calculated sequence: for example, I could take the max of the field's values and, after incrementing, use it as the key for the newly inserted record.
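One option, sketched here with hypothetical names: a SEQUENCE with NO CACHE avoids most of the restart jumps that identity's caching causes (at some insert-throughput cost), though rolled-back inserts can still leave gaps with any generator:

CREATE SEQUENCE dbo.OrderID_Seq AS INT START WITH 1 INCREMENT BY 1 NO CACHE;

CREATE TABLE dbo.Orders (
    OrderID INT NOT NULL
        CONSTRAINT DF_Orders_OrderID DEFAULT (NEXT VALUE FOR dbo.OrderID_Seq)
        CONSTRAINT PK_Orders PRIMARY KEY,
    OrderDate DATE NOT NULL
);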
I have a requirement where I have to concatenate fields based on their sequence given in another table, together with their respective lengths. E.g.:
Input 1:
Table A: (below are the fields and their respective values; not all fields will have values)
-----------
KSCHL - ZIC0 (KEY)
KOTABNR - 521 (KEY)
MATNR
KUNNR --> 1234567890
LIFNR
VKORG --> a234
PRCTR
KUNRE --> 4355325363
LIFRE --> 88390234
PRODH
Table B: It contains the same fields as Table A and holds the sequence number in which the concatenation should happen. The length field (LEN) holds the corresponding field lengths (pipe-delimited) to be considered in the concatenation.
Note: If the field length given in Table B doesn't match the actual size of the field, then the field should be padded with 2 left spaces during concatenation. E.g., in the above example, say LIFNR value = 88390234 (len = 8); then after the concat the value should be like below:
12345678904355325363a234 88390234
Note: The fields are not constant. I have around 40 fields like that, and any combination of fields is possible, e.g.:
I am not sure which field has the value 1, 2, etc., or how many fields form the combination. It can sometimes be 3 of the 40 fields, or 10 of the 40. I have to get those values dynamically and concatenate.
I can have any number of fields for concatenation; the above example is just for 4. It should be dynamic enough to handle any number of fields.
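A hedged sketch of the set-based part, under heavy assumptions: Table A has been unpivoted into (FieldName, FieldValue) rows, Table B holds (FieldName, SeqNo, Len), and fields whose actual length differs from Len get the two-space prefix. FOR XML PATH is used because STRING_AGG isn't available on SQL Server 2014:

-- Hypothetical shapes (Table A unpivoted to rows; Table B holds order + lengths).
CREATE TABLE dbo.TableA_Unpivoted (FieldName VARCHAR(10), FieldValue VARCHAR(50));
CREATE TABLE dbo.TableB (FieldName VARCHAR(10), SeqNo INT, Len INT);

SELECT
    (SELECT CASE WHEN LEN(a.FieldValue) <> b.Len
                 THEN '  ' + a.FieldValue   -- pad with 2 left spaces on mismatch
                 ELSE a.FieldValue END
     FROM dbo.TableA_Unpivoted a
     JOIN dbo.TableB b ON b.FieldName = a.FieldName
     ORDER BY b.SeqNo                       -- concatenation order from Table B
     FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)') AS ConcatValue;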
I have a table to store int primary keys, where the value NextId is the next value to use. I need to get the value and update it (NextId + 1) before the next person gets it. I thought using a transaction would work, but it doesn't stop a SELECT. How do I get a value (lock the row for the update), update the table (preventing any selects until done), and release it?
BEGIN TRAN

UPDATE TableIds SET NextId = NextId + 1 WHERE TableName = 'Users'
WAITFOR DELAY '00:00:20'
SELECT NextId FROM TableIds WHERE TableName = 'Users'
COMMIT TRAN
I really want to do this in the reverse order, but it doesn't work. But if I run the above and have another connection just do a select:
SELECT NextId FROM TableIds WHERE TableName = 'Users'
This query never returns. Doesn't the COMMIT TRAN release the lock on the table?
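A hedged sketch of one way around the ordering problem: OUTPUT returns the incremented value from the UPDATE itself, so no trailing SELECT (and no second chance for another session to slip in) is needed. The blocked reader above is expected behavior: under READ COMMITTED the SELECT waits on the exclusive row lock, which COMMIT TRAN does release, just 20 seconds later because of the WAITFOR.

DECLARE @NewId TABLE (NextId INT);

UPDATE TableIds
SET NextId = NextId + 1
OUTPUT inserted.NextId INTO @NewId   -- returns the post-update value
WHERE TableName = 'Users';

SELECT NextId FROM @NewId;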
Ok, I think I will need to use a temp table for this; there is no code to share as of yet. Here is the intent.
I need to insert data into two tables (a header and a detail table). The header table will give me, let's say, an order number, and this order number needs to be placed on the corresponding detail lines in the detail table.
Now, if I were inserting a single invoice with one or more detail lines: EASY, just capture @@IDENTITY into a variable and do a second insert statement.
What is happening is I will be importing a ton of invoice headers and inserting those into the header table. The details are already in the database across various tables, and I will do that insert based on a select with some joins. As stated, I need to get the invoice number from the IDENTITY of the header table for each DETAIL insert.
I am assuming the only way to do this is with a loop: insert one header, get its identity; insert the detail rows including the identity variable; and repeat.
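A set-based alternative is possible, sketched here under the assumption that each imported header row carries some natural key (a hypothetical SourceRef below) the detail SELECT can join back on; OUTPUT captures every new identity in one pass, no loop needed:

DECLARE @HeaderMap TABLE (InvoiceID INT, SourceRef VARCHAR(20));

-- Insert all headers at once, capturing new identity values alongside the
-- natural key they belong to.
INSERT INTO dbo.InvoiceHeader (SourceRef, InvoiceDate)
OUTPUT inserted.InvoiceID, inserted.SourceRef INTO @HeaderMap
SELECT SourceRef, InvoiceDate
FROM dbo.ImportStaging;

-- Insert all details in one statement, joining back through the map.
INSERT INTO dbo.InvoiceDetail (InvoiceID, LineAmount)
SELECT m.InvoiceID, d.LineAmount
FROM @HeaderMap m
JOIN dbo.StagedDetails d ON d.SourceRef = m.SourceRef;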
How do I get the next int value for a column before I do an insert in my SQL Server 2000? I'm currently using an Oracle sequence and doing something like:

select seq.nextval from dual;

Then I do my insert into 3 different tables, all using the same unique ID. I can't use the @@identity function because my application uses a connection pool and it's not guaranteed that a connection won't be used by another request, so under a lot of load there could be major problems, and this doesn't work:

insert into <table>;
select @@identity;

This doesn't work because the select @@identity might give me the value of an insert from someone else's request.

Thanks,
Brent
Recently I've been working on a multi-threaded solution based on SQL Server, and now I'm facing the following problem:
Suppose I have process No. 1 (multi-threaded) inserting data into Table A, which has an auto-generated identity column, and process No. 2 (also multi-threaded) retrieving data from Table A, generating some records, and inserting the results into Table B. Both processes do batch processing (batch retrieving and batch writing), and they run in parallel.
Now, since process No. 2 retrieves data sequentially by the identity of Table A, it finds missing results. This is because records with bigger identities are not necessarily committed earlier than those with smaller identities.
One direct solution is to add a flag field in Table A indicating whether the record has been processed by process No. 2, setting it each time a record is processed. But unfortunately the table structure is not supposed to be modified.
So is there any other good solution for this problem? Thanks.
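One hedged sketch that avoids touching Table A (assuming its identity column is named id): process No. 2 records the IDs it has handled in a table of its own, so rows that commit late with smaller identities are picked up on the next pass instead of being skipped forever:

-- Owned by process No. 2; Table A itself is left unchanged.
CREATE TABLE dbo.ProcessedA (id INT NOT NULL PRIMARY KEY);

-- Each pass: pick up any committed row not yet processed, whatever its identity.
SELECT a.*
FROM TableA a
WHERE NOT EXISTS (SELECT 1 FROM dbo.ProcessedA p WHERE p.id = a.id);

-- After a batch is written to Table B, insert the batch's ids into dbo.ProcessedA.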
I'm trying to move specific data from three linked tables on a source database to three tables on a destination database by generating dynamic SQL INSERT scripts but am getting stuck on the last set of INSERT statements. I don't think I can use SSIS because the source autonumber fields may already exist on the destination side but I could be wrong.
For simplicity, Table 1 (T1) has one autonumber key of PK1 as well as other fields (A1, B1, etc.).
Table 2 (T2) has one autonumber key of PK2 and a foreign key (FK1) that links back to PK1 as well as other fields (A2, B2, etc.).
Table 3 (T3) has one autonumber key of PK3 and a foreign key (FK2) that links back to PK2 as well as other fields (A3, B3, etc.).
Like this:
T1: PK1, A1, B1, etc.
T2: PK2, FK1, A2, B2, etc.
T3: PK3, FK2, A3, B3, etc.
So, I'm able to query the source database T1 to generate my dynamic SQL INSERT statements that will be run on the destination side. I'm also able to generate my dynamic SQL statements to insert into T2, but when I get to T3 I am stuck figuring out how to insert, because the destination side is unable to match its FK2 to the just-inserted PK2. It throws the below error.
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
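One common workaround, sketched here with the post's table names (dbo.T2_src / dbo.T3_src stand in for the source rows, and FK1 is assumed to have been remapped the same way one level up): MERGE, unlike INSERT, lets OUTPUT reference source columns, so old and new autonumber values can be captured side by side and used to remap FK2:

DECLARE @KeyMap TABLE (OldPK2 INT, NewPK2 INT);

MERGE INTO dbo.T2 AS tgt
USING dbo.T2_src AS src
   ON 1 = 0                               -- never matches: every source row inserts
WHEN NOT MATCHED THEN
    INSERT (FK1, A2, B2) VALUES (src.FK1, src.A2, src.B2)
OUTPUT src.PK2, inserted.PK2 INTO @KeyMap (OldPK2, NewPK2);

-- T3 can now be inserted with FK2 translated from old to new values.
INSERT INTO dbo.T3 (FK2, A3, B3)
SELECT m.NewPK2, s3.A3, s3.B3
FROM dbo.T3_src s3
JOIN @KeyMap m ON m.OldPK2 = s3.FK2;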
I have a table which has a composite primary key consisting of four columns, one of them being a datetime called Day.
The nice thing AFAIK with this composite key is that it prevents duplicate entries in the table for any given day. But the problem is probably two-fold:
1. multiple columns need to be used for joins, and I think this might degrade performance?
2. in client applications such as ASP.NET, these primary keys must be sent in the query string, and the query string becomes long and a little unmanageable.
A possible solution I'm thinking of is dropping the existing primary key and creating a new identity column, plus a composite unique index on the columns from the existing composite key.
I would like to have some tips, recommendations and alternatives for what I should do in this case.
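For illustration, a minimal sketch of that swap under hypothetical names (run on a quiet system; any FKs referencing the old composite key would need re-pointing too):

BEGIN TRAN;

ALTER TABLE dbo.MyTable DROP CONSTRAINT PK_MyTable;

ALTER TABLE dbo.MyTable ADD MyTableID INT IDENTITY(1,1) NOT NULL;

ALTER TABLE dbo.MyTable
    ADD CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (MyTableID);

-- Keep the old composite key's uniqueness guarantee.
CREATE UNIQUE NONCLUSTERED INDEX UX_MyTable_NaturalKey
    ON dbo.MyTable (ColA, ColB, ColC, [Day]);

COMMIT TRAN;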
How can I create a composite key (one primary key spanning two columns) for a table using SQL Server 2005 Management Studio? Can I do it using the GUI, or do I have to write SQL code to do this? Thanks
Hi guys, I have a question: I would like to know how to declare a compound attribute (in this case it would be BirthInfo) in a SQL Server script, from this diagram. Thank you guys in advance; any suggestion or help would be greatly appreciated.
I want to create a composite primary key using the MS SQL Server Enterprise Manager interface. I want to use the interface instead of writing it in a CREATE TABLE statement. I was able to create a foreign key through this interface using the "Manage Relationships" option, but I cannot find how to add a primary key consisting of two fields. Any help regarding this is highly appreciated.
Hi! When I read some reference books about SQL 7.0, I often meet 'surrogate key'. What is a surrogate key? What is its function? Could you give me a good example? Thanks very much!
Hey all, I'm trying to decide what's 'best' to use. I've been designing and creating databases for a while and have pretty much always used a surrogate key rather than a natural one. I've finally had some free time to start studying more in my spare time, and I've come across a lot of guides, articles and stories that tout that natural keys should be used whenever possible, as they're a better identifier, and that surrogate keys should only be used when there is no readily available natural key.

Now perhaps I'd be open to accepting that, but absolutely every database I come across tends to only use surrogate keys. For example, I'm doing an authentication system from scratch and am looking at the User table. Of course the user name has to be unique; should that be the primary key, or should I have a separate column with a GUID or an incrementing int or the like as the primary key? I can certainly see that username could be used. I can also see how it may be easier, when looking through the data tables, to identify who/what a row refers to with a natural key. However, it still seems sort of sloppy, for lack of a better word, to me: now I could have somebody's username (or any other piece of data used for this purpose) spread across a lot of other tables. And while writing this I just thought of the scenario where somebody needs their username changed: with that method the IDs need to be changed on all the related rows of all the other tables, whereas with a surrogate key it wouldn't matter.

Anyway, I'm mostly looking for opinions on which way to go (not just with the user example, but in general). Thanks.
I am performing a Select Into from a #table into a real table that has a surrogate key. If this is in a transaction (or not in one) am I guaranteed that the records inserted will be sequential surrogate key ids?
Select * into REALTABLE from MYPOUNDTABLE --40 rows
Can I assume that if the first one inserted is id 32 that the last one is 72?
{CREATE TABLEs and INSERTs follow...}

Gents,

I have a main table that is in a ONE-MANY relationship with many other tables. For example, if the main table is named A, there are these relationships:

A --> B
A --> C
A --> D
A --> E

with one field in common (Person). The tables B, C, D and E are history tables, with Start and End dates. Each person has a Program history (table B, i.e.), an Experience history (table C, i.e.), and so on: many different types of histories, and it may grow from here (table F, G, etc.). The included CREATE TABLEs and INSERTs contain tables A, B and C.

The problem: each tblCase (table A) record has a date. When joining all of the history tables to tblCase on Person, you obviously get a cross-product of each history unless you specify a WHERE clause that extracts one single record from each of the histories (duh... that's the point: to extract a single record from each history, because there can only be one value in effect at the time of the Case).

QUESTION: From a performance standpoint, would it behoove me to maintain the surrogate ***HistoryID from each history table in tblCase, or, assuming the indexes are set up properly, would a WHERE condition for each history be sufficient? For example, the following select works as expected:

SELECT CasePerson, CaseDate, ProCode, ExpYear
FROM tblExperienceHistory
INNER JOIN (tblCase
    INNER JOIN tblProgramHistory
        ON tblCase.CasePerson = tblProgramHistory.ProPerson)
    ON tblCase.CasePerson = tblExperienceHistory.ExpPerson
WHERE CaseDate BETWEEN ProStartDate AND ProEndDate
  AND CaseDate BETWEEN ExpStartDate AND ExpEndDate

It extracts the single record from each history for each person for each case. But I'm afraid of performance with such a scenario.

Instead, I could store each ***HistoryID in the table tblCase, and then just join on that: no WHERE needed. But the trade-off is that I'd have to build processes to maintain that. ("Hey, when you insert a record into tblCase, make sure to go get each HistoryID from the history tables!" or "If the user changes the date ranges in one of the histories, make sure to update tblCase to match the new HistoryID!")

Maybe a clustered index on each ***History table on Person/StartDate, combined with the WHERE clause, would perform as well as a real JOIN on surrogate integers. It seems cheesy to have to resort to surrogate IDs, but the performance increase might be worth it. Also, if I go that route, whenever I add a new history table I'd have to change the design of tblCase AND any SPs that reference it. With the WHERE solution, I'd only have to change the SPs.

Comments are welcome!
(tblCase grows at 250,000 records per year; the history tables will increase about 1000 records per year)

DCMFAN

CREATE TABLE [dbo].[tblCase] (
    [CaseID] [char] (5) CONSTRAINT [PK_tblCase] PRIMARY KEY CLUSTERED NOT NULL ,
    [CaseDate] [smalldatetime] NOT NULL ,
    [CasePerson] [char] (5) NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[tblExperienceHistory] (
    [ExperienceHistID] [int] IDENTITY (1, 1) NOT NULL ,
    [ExpPerson] [char] (5) NOT NULL ,
    [ExpStartDate] [smalldatetime] NOT NULL ,
    [ExpEndDate] [smalldatetime] NOT NULL ,
    [ExpYear] [int] NOT NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[tblProgramHistory] (
    [ProgramHistID] [int] IDENTITY (1, 1) NOT NULL ,
    [ProPerson] [char] (5) NOT NULL ,
    [ProStartDate] [smalldatetime] NOT NULL ,
    [ProEndDate] [smalldatetime] NOT NULL ,
    [ProCode] [int] NOT NULL
) ON [PRIMARY]
GO

INSERT INTO [tblCase] ([CaseID], [CaseDate], [CasePerson]) VALUES ('12345', '3/1/03', '00000')
INSERT INTO [tblCase] ([CaseID], [CaseDate], [CasePerson]) VALUES ('A1G34', '4/23/03', '00001')
INSERT INTO [tblExperienceHistory] ([ExpPerson], [ExpStartDate], [ExpEndDate], [ExpYear]) VALUES ('00000', '1/1/03', '5/19/03', 1)
INSERT INTO [tblExperienceHistory] ([ExpPerson], [ExpStartDate], [ExpEndDate], [ExpYear]) VALUES ('00000', '5/20/03', '12/31/03', 2)
INSERT INTO [tblExperienceHistory] ([ExpPerson], [ExpStartDate], [ExpEndDate], [ExpYear]) VALUES ('00001', '4/20/03', '11/1/03', 0)
INSERT INTO [tblProgramHistory] ([ProPerson], [ProStartDate], [ProEndDate], [ProCode]) VALUES ('00000', '2/1/03', '9/30/03', '55555')
INSERT INTO [tblProgramHistory] ([ProPerson], [ProStartDate], [ProEndDate], [ProCode]) VALUES ('00000', '10/1/03', '5/1/04', '55555')
INSERT INTO [tblProgramHistory] ([ProPerson], [ProStartDate], [ProEndDate], [ProCode]) VALUES ('00001', '1/1/03', '12/31/03', '55555')
Hi, I am having a problem bulk-updating a SQL Server table that has an identity column from a DataTable (which has no identity column) using SqlBulkCopy. I tried several approaches, but it shows no error, nor does the table get updated. Yet the identity value seems to increase every time. Thanks. Varun
Now, I like #1 because the implementation is simple: the calling code simply passes an author name and a country id, and an INSERT INTO statement is called with those parameters.
INSERT INTO authors (author_name, country_id) VALUES (@authorName, @countryId)
I like #1 because it hides the surrogate "id" key from the application calling code. But on the downside, it has more overhead, because you have to first a) verify that a country with that name exists, and b) select that id into a variable.
DECLARE @id INT;

IF EXISTS (SELECT * FROM countries WHERE country_name = @countryName)
    SELECT @id = country_id FROM countries WHERE country_name = @countryName;
(Sorry, I may have the SQL syntax wrong up there; I was just trying to demonstrate the extra overhead involved.)
I am in the process of building a fact table in a staging area. The data in the host system has numerous composite keys, so I have replaced all the composite keys in the dimensions with surrogate keys (integer) which are generated using an identity at load time. When I load the staging (fact) table, I set the default value of all the foreign keys to 0. What I must do now is update all the foreign key values with the surrogate key values from the dimensions. I'm using an update command and the original gid values from the source system in the where clause, i.e.:

UPDATE x
SET x.key_1 = y.key_1
FROM TableA x WITH (NOLOCK)
INNER JOIN TableB y WITH (NOLOCK)
    ON x.org_id = y.org_id
    AND x.bus_id = y.bus_id
    AND x.prov_gid = y.prov_gid
    AND x.log_gid = y.loc_gid;
This seems to work fine for most tables. However, I am now trying to update a table that has over 10 million rows and approximately 30 foreign keys. The script runs for hours; I usually stop it after about 8 hours when it still hasn't completed. Since the keys are dynamic and could possibly change during each load, I can't assign them during the load process.
Is there a better way to update these keys? I need to regenerate the fact tables every night, and taking this much time to reload a fact table is just not practical. I've indexed the alternate keys on all the dimensions and have also indexed the gids on the target fact table. Am I doing something wrong? Have I over-indexed the target table? Please help! Thanks, Jerry
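One thing worth trying, sketched with illustrative names: resolve several dimension keys per pass instead of one UPDATE per foreign key, so the 10M-row fact table is scanned a handful of times rather than ~30 (dropping NOLOCK on the update target is also usually wise):

UPDATE f
SET f.key_1 = ISNULL(d1.key_1, f.key_1),   -- keep the 0 default when unmatched
    f.key_2 = ISNULL(d2.key_2, f.key_2)
FROM FactStage f
LEFT JOIN Dim1 d1
    ON f.org_id = d1.org_id AND f.bus_id = d1.bus_id
LEFT JOIN Dim2 d2
    ON f.prov_gid = d2.prov_gid;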
I have a dimension called 'Caller Type' with the following attributes:
CallerTypeKey ---- surrogate key
CallerTypeID
CallerTypeDesc
CreatedByKey ---- foreign surrogate key from User Dimension
I used a Script Task to get the last used key and increment it so I can use it for new records in my dimension. However, my dimension is linked to a User Dimension, and I need the surrogate key of that once I insert the new record into the CallerType Dimension.
This is the code I am using to generate the incremental surrogate keys:
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
    Inherits UserComponent

    'Declare a variable scoped to class ScriptMain
    Dim counter As Integer

    Public Sub New()
        'This method gets called only once per execution
        'Initialise the variable
        counter = 1093
    End Sub

    'This method gets called for each row in the InputBuffer
    Public Overrides Sub Input_ProcessInputRow(ByVal Row As InputBuffer)
        'Increment the variable
        counter += 1

        'Output the value of the variable
        Row.instance = counter
    End Sub

End Class
--'Instance' is my surrogate field name
but I am getting an error saying that InputBuffer is not defined. Any idea?
If I want to add two more incremental fields, where do I have to add them?
Sorry if it sounds silly; I am very new to this scripting.
I have a database surrogate key that increments rapidly (+5000 every 30 minutes). I need my SSIS package to reset this surrogate key to avoid reaching the upper limit of the field.
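A hedged sketch of the reset itself, with hypothetical object names; which statement applies depends on whether the key is an identity column or a sequence, and either could be run from an Execute SQL Task in the package. Resetting is only safe if the old key values are no longer present in the table, otherwise duplicates follow:

-- If the key is an IDENTITY column:
DBCC CHECKIDENT ('dbo.MyTable', RESEED, 0);

-- If the key is a SEQUENCE:
ALTER SEQUENCE dbo.MySeq RESTART WITH 1;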