I have a main table that is in a one-to-many relationship with many other tables. For example, if
the main table is named A, there are these relationships:
A-->B
A-->C
A-->D
A-->E
They have one field in common (Person). The tables B, C, D and E are history tables,
with start and end dates. Each person has a Program history (table B, e.g.), an
Experience history (table C, e.g.), and so on... many different types of
histories, and it may grow from here... table F, G, etc.
The included CREATE TABLEs and INSERTs contain tables A, B and C.
The problem: each tblCase (table A) record has a date. When joining all of the
history tables to tblCase on Person, you obviously get a cross-product of the
histories unless you specify a WHERE clause that extracts one single record from
each of them (duh... that's the point: to extract a single record from each
history, because only one value can be in effect at the time of the case).
QUESTION: From a performance standpoint, would it behoove me to maintain the
surrogate ***HistoryID from each history table in tblCase, or, assuming the
indexes are set up properly, would a WHERE condition for each history be
sufficient? For example, the following select works as expected:
SELECT CasePerson, CaseDate, ProCode, ExpYear
FROM tblExperienceHistory
INNER JOIN (tblCase
    INNER JOIN tblProgramHistory
        ON tblCase.CasePerson = tblProgramHistory.ProPerson)
    ON tblCase.CasePerson = tblExperienceHistory.ExpPerson
WHERE CaseDate BETWEEN ProStartDate AND ProEndDate
  AND CaseDate BETWEEN ExpStartDate AND ExpEndDate
It extracts the single record from each history for each person for each case.
But I'm afraid of the performance of such a scenario.
Instead, I could store each ***HistoryID in tblCase and then just
join on that... no WHERE needed. But the trade-off is that I'd have to build
processes to maintain it. ("Hey, when you insert a record into tblCase, make
sure to go get each HistoryID from the history tables!" or "If the user changes
the date ranges in one of the histories, make sure to update tblCase to match the
new HistoryID!")
Maybe a clustered index on each ***History table on Person/StartDate, combined
with the WHERE clause, would perform as well as a real JOIN on surrogate
integers.
It seems cheesy to have to resort to surrogate IDs... but the performance
increase might be worth it. Also, if I go that route, whenever I add a new
history table, I'd have to change the design of tblCase AND any SPs that
reference it. With the WHERE solution, I'd only have to change the SPs.
Comments are welcome! (tblCase grows at 250,000 records per year; the history
tables will increase about 1000 records per year)
DCMFAN
CREATE TABLE [dbo].[tblCase] (
[CaseID] [char] (5) CONSTRAINT [PK_tblCase] PRIMARY KEY CLUSTERED NOT NULL ,
[CaseDate] [smalldatetime] NOT NULL ,
[CasePerson] [char] (5) NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[tblExperienceHistory] (
[ExperienceHistID] [int] IDENTITY (1, 1) NOT NULL ,
[ExpPerson] [char] (5) NOT NULL ,
[ExpStartDate] [smalldatetime] NOT NULL ,
[ExpEndDate] [smalldatetime] NOT NULL ,
[ExpYear] [int] NOT NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[tblProgramHistory] (
[ProgramHistID] [int] IDENTITY (1, 1) NOT NULL ,
[ProPerson] [char] (5) NOT NULL ,
[ProStartDate] [smalldatetime] NOT NULL ,
[ProEndDate] [smalldatetime] NOT NULL ,
[ProCode] [int] NOT NULL
) ON [PRIMARY]
GO
INSERT INTO [tblCase]([CaseID], [CaseDate], [CasePerson])
VALUES('12345', '3/1/03', '00000')
INSERT INTO [tblCase]([CaseID], [CaseDate], [CasePerson])
VALUES('A1G34', '4/23/03', '00001')
INSERT INTO [tblExperienceHistory]([ExpPerson], [ExpStartDate], [ExpEndDate],
[ExpYear])
VALUES('00000', '1/1/03', '5/19/03', 1)
INSERT INTO [tblExperienceHistory]([ExpPerson], [ExpStartDate], [ExpEndDate],
[ExpYear])
VALUES('00000', '5/20/03', '12/31/03', 2)
INSERT INTO [tblExperienceHistory]([ExpPerson], [ExpStartDate], [ExpEndDate],
[ExpYear])
VALUES('00001', '4/20/03', '11/1/03', 0)
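A minimal sketch of the date-range join against the sample data above, using Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server. SQLite has no separate clustered index, so the Person/StartDate index suggested above is a plain composite index here, and the dates are rewritten as ISO-8601 text (an assumption needed so BETWEEN compares them correctly in SQLite). Only ExpYear is selected because the sample INSERTs contain no tblProgramHistory rows; for an inner join, putting the date predicate in ON or in WHERE gives the same result.

```python
import sqlite3

# In-memory SQLite as a stand-in for SQL Server; dates stored as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCase (
    CaseID     TEXT PRIMARY KEY,
    CaseDate   TEXT NOT NULL,
    CasePerson TEXT NOT NULL
);
CREATE TABLE tblExperienceHistory (
    ExperienceHistID INTEGER PRIMARY KEY,
    ExpPerson    TEXT NOT NULL,
    ExpStartDate TEXT NOT NULL,
    ExpEndDate   TEXT NOT NULL,
    ExpYear      INTEGER NOT NULL
);
-- Composite index on (person, start date); a plain index in SQLite.
CREATE INDEX IX_Exp_Person_Start ON tblExperienceHistory (ExpPerson, ExpStartDate);
""")
conn.executemany("INSERT INTO tblCase VALUES (?, ?, ?)", [
    ("12345", "2003-03-01", "00000"),
    ("A1G34", "2003-04-23", "00001"),
])
conn.executemany(
    "INSERT INTO tblExperienceHistory (ExpPerson, ExpStartDate, ExpEndDate, ExpYear) "
    "VALUES (?, ?, ?, ?)",
    [("00000", "2003-01-01", "2003-05-19", 1),
     ("00000", "2003-05-20", "2003-12-31", 2),
     ("00001", "2003-04-20", "2003-11-01", 0)],
)

# Each case picks the one history row whose range contains CaseDate.
rows = conn.execute("""
SELECT c.CaseID, c.CaseDate, e.ExpYear
FROM tblCase AS c
INNER JOIN tblExperienceHistory AS e
    ON e.ExpPerson = c.CasePerson
   AND c.CaseDate BETWEEN e.ExpStartDate AND e.ExpEndDate
ORDER BY c.CaseID
""").fetchall()
print(rows)
```

Each case comes back exactly once, provided a person's history ranges never overlap; nothing in the schema enforces that, so overlapping ranges would bring the cross-product back.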
Hi! When I read some reference books about SQL Server 7.0, I often came across the term 'surrogate key'. What is a surrogate key? What is its function? Could you give me a good example? Thanks very much!
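A small illustration, using Python's sqlite3 module with SQLite standing in for SQL Server: the table and column names (Customer, TaxNumber, CustomerOrder) are hypothetical. CustomerID is a surrogate key, a value generated by the database with no business meaning, while TaxNumber is the natural key that comes from the real world and is still kept unique with a constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- Hypothetical tables. CustomerID: surrogate key generated by the database.
-- TaxNumber: natural key, kept unique even though it is not the primary key.
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY AUTOINCREMENT,
    TaxNumber  TEXT NOT NULL UNIQUE,
    Name       TEXT NOT NULL
);
-- Child rows reference the small, stable surrogate value.
CREATE TABLE CustomerOrder (
    OrderID    INTEGER PRIMARY KEY AUTOINCREMENT,
    CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID)
);
""")
cur = conn.execute("INSERT INTO Customer (TaxNumber, Name) VALUES ('TX-001', 'Acme')")
acme_id = cur.lastrowid  # the generated surrogate value
conn.execute("INSERT INTO CustomerOrder (CustomerID) VALUES (?)", (acme_id,))

# The UNIQUE constraint still rejects a second row with the same natural key.
try:
    conn.execute("INSERT INTO Customer (TaxNumber, Name) VALUES ('TX-001', 'Acme again')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(acme_id, duplicate_rejected)
```

The design point: the surrogate carries the joins, and the natural key keeps its uniqueness through a separate constraint.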
The original design of my db (part of it...) is the following:
A JOB has a Number and a Description. Each JOB can have one or two TASKS (min one, max two). Each TASK is identified by the JOB it belongs to and an Index (unique only within the same JOB). Each TASK has one and only one set of INFO1, one and only one set of INFO2, one and only one set of INFO3, etc.
(There is a reason to keep INFO1, 2 and 3 separate: each of them will be linked to a different table. This might influence the answer to my real question.)
First of all, I wouldn't add any surrogate key for TASK, so as not to lose the logic behind it; plus I'd put an index on JobNum only, since Index is only ever 1 or 2 and so is not selective.
The real question is about the INFO1 (and 2, 3, etc.) table: should I leave JobNum and Index as the PK (consider that the PK of INFO1 will be used as an FK for another table), or should I use a surrogate key, like for example
C: INFO1 (Info1ID [PK], JobNum [FKb], Index [FKb], ...)
I don't really like this solution. Actually I'd prefer the following
C: INFO1 (Info1ID [PK], ...)
where Info1ID = JobNum + Index (+ = string concatenation).
Hey all, I'm trying to decide what's 'best' to use. I've been designing and creating databases for a while and have pretty much always used a surrogate key rather than a normal one. I've finally had some free time to study more, and in my spare time I've read up and come across a lot of guides, articles and stories that say normal keys should be used whenever possible because they're a better identifier, and that surrogate keys should only be used when there is no readily available normal key. I'd be open to accepting that, but absolutely every database I come across tends to use only surrogate keys.
For example, I'm building an authentication system from scratch and am looking at the User table. Of course the user name has to be unique; should that be the primary key, or should I have a separate column with a GUID or an incrementing int or the like as the primary key? I can certainly see that the username could be used. I can also see how it may be easier, when looking through the data tables, to identify who/what a table is referring to with a surrogate key. However, it still seems sort of sloppy to me, for lack of a better word: now I could have somebody's username (or any other piece of data used for this purpose) spread across a lot of other tables. And while writing this I just thought of the scenario where somebody needs their username changed: with this method, the ids need to be changed on all the related rows of all the other tables, whereas with a surrogate key it wouldn't matter.
Anyway, I'm mostly looking for opinions on which way to go (not just with the user example, but more in general). Thanks.
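The rename scenario above can be sketched side by side, using Python's sqlite3 with SQLite standing in for SQL Server (which also supports ON UPDATE CASCADE on foreign keys). The table names UserN/LoginN (natural key) and UserS/LoginS (surrogate key) are hypothetical. With the username as the primary key, a cascade rewrites every child row; with a surrogate key, only the one user row changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- Option 1 (hypothetical names): username is the natural primary key.
CREATE TABLE UserN (UserName TEXT PRIMARY KEY);
CREATE TABLE LoginN (
    LoginID  INTEGER PRIMARY KEY,
    UserName TEXT NOT NULL REFERENCES UserN (UserName) ON UPDATE CASCADE
);
-- Option 2: surrogate primary key; username is unique but stored once.
CREATE TABLE UserS (UserID INTEGER PRIMARY KEY, UserName TEXT NOT NULL UNIQUE);
CREATE TABLE LoginS (
    LoginID INTEGER PRIMARY KEY,
    UserID  INTEGER NOT NULL REFERENCES UserS (UserID)
);
INSERT INTO UserN VALUES ('jsmith');
INSERT INTO LoginN (UserName) VALUES ('jsmith'), ('jsmith');
INSERT INTO UserS (UserName) VALUES ('jsmith');
INSERT INTO LoginS (UserID) VALUES (1), (1);
""")

# Natural key: the cascade rewrites the username in every child row.
conn.execute("UPDATE UserN SET UserName = 'john.smith' WHERE UserName = 'jsmith'")
# Surrogate key: only the single user row changes; children keep UserID = 1.
conn.execute("UPDATE UserS SET UserName = 'john.smith' WHERE UserID = 1")

natural_children = conn.execute("SELECT DISTINCT UserName FROM LoginN").fetchall()
surrogate_children = conn.execute(
    "SELECT DISTINCT u.UserName FROM LoginS AS l JOIN UserS AS u ON u.UserID = l.UserID"
).fetchall()
print(natural_children, surrogate_children)
```

Both end up consistent; the difference is how many rows the rename touches.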
Hello, I'm looking for a way of generating the next key value that works in both MS and Sybase SQL Server. Sybase identity columns are a bit dodgy, so...
If I have a separate table NextKey (NextKey int) with one row that I update as follows...
DECLARE @NextKey int
UPDATE NextKey SET NextKey = NextKey + 1, @NextKey = NextKey + 1
INSERT INTO myTable (PrimaryKeyCol, ....) VALUES (@NextKey, ....)
are there any problems with concurrency? As I see it, the update will lock the row, so different connections will always come up with a different @NextKey value...
My previous post was not really clear, so I'll try again with a (hopefully) better (even if longer) example...
Consider the following...
A JOB describes the processing of a document. Each document can exist in two versions: English and French. A JOB can have 1 or 2 TASKs, each describing the processing of either the English or the French version. So we have the following:
that is, there is an identifying 1:M relationship (where the maximum allowed for M is 2) between JOB and TASK, TASK being identified by JobNum and Version (where the domain for Version is {E, F}).
Each TASK may require a TRANSLATION sub_task. Each TASK may require a TYPING sub_task. Each TASK may require a DISTRIBUTION sub_task.
For example, for a given doc, the English TASK requires TRANSLATION and DISTRIBUTION, while the French only DISTRIBUTION.
That is, there is a 1:1 not-required relationship between TASK and TRANSLATION, TYPING and DISTRIBUTION. So we have the following:
C: TRANSLATION (JobNum [PK] [FKb], Version [PK] [FKb], DueDate, ...)
D: TYPING (JobNum [PK] [FKb], Version [PK] [FKb], DueDate, ...)
E: DISTRIBUTION (JobNum [PK] [FKb], Version [PK] [FKb], Copies, ...)
As you can see I am using the PK of TASK as FK and PK for each of the three SUB_TASKs.
To complicate things, each SUB_TASK has one or more assignments. The assignments for each SUB_TASK record different information from the others. So we have...
C: TRANSLATION (JobNum [PK] [FKb], Version [PK] [FKb], DueDate, ...)
D: TYPING (JobNum [PK] [FKb], Version [PK] [FKb], DueDate, ...)
E: DISTRIBUTION (JobNum [PK] [FKb], Version [PK] [FKb], Copies, ...)
F: TRA_ASSIGN (JobNum [PK] [FKc], Version [PK] [FKc], Index [PK], Translator, ...)
G: TYP_ASSIGN (JobNum [PK] [FKd], Version [PK] [FKd], Index [PK], Typist, ...)
H: REP_ASSIGN (JobNum [PK] [FKe], Version [PK] [FKe], Index [PK], Pages, ...)
that is there is an identifying 1:M relationship between each SUB_TASK and its ASSIGNMENTs, each ASSIGNMENT being identified by the SUB_TASK it belongs to and an Index.
I wish I could send a pic of the ER diagram...
Maybe there is another and better way to model this: if so, any suggestion?
Given this model, should I use for TRANSLATION, TYPING and DISTRIBUTION a surrogate key, instead of using the composite key, like for example:
C: TRANSLATION (TranslationID [PK], JobNum [FKb], Version [FKb], DueDate, ...)
D: TYPING (TypingID [PK], JobNum [FKb], Version [FKb], DueDate, ...)
E: DISTRIBUTION (DistributionID [PK], JobNum [FKb], Version [FKb], Copies, ...)
this will "improve" the ASSIGNMENTs tables:
F: TRA_ASSIGN (TranslationID [PK] [FKc], Index [PK], Translator, ...)
G: TYP_ASSIGN (TypingID [PK] [FKd], Index [PK], Typist, ...)
H: REP_ASSIGN (DistributionID [PK] [FKe], Index [PK], Pages, ...)
I could even go further using a surrogate key even for TASK, which leads me to the following:
F: TRA_ASSIGN (TaskID [PK] [FKc], Index [PK], Translator, ...)
G: TYP_ASSIGN (TaskID [PK] [FKd], Index [PK], Typist, ...)
H: REP_ASSIGN (TaskID [PK] [FKe], Index [PK], Pages, ...)
I don't really like this second solution, but I'm still not sure about the first solution, the one with the surrogate key only in the SUB_TASK tables.
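A sketch of the first solution (surrogate key only on the SUB_TASK tables), using Python's sqlite3 with SQLite standing in for SQL Server and showing TRANSLATION and TRA_ASSIGN only. The sample job and translator values are hypothetical. The point it shows: with a surrogate TranslationID, a UNIQUE constraint on (JobNum, Version) is what still enforces the 1:1 rule between TASK and TRANSLATION, and the composite foreign key still prevents a TRANSLATION for a TASK that doesn't exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE JOB (JobNum INTEGER PRIMARY KEY, Description TEXT);
-- TASK keeps its natural composite key (JobNum, Version).
CREATE TABLE TASK (
    JobNum  INTEGER NOT NULL REFERENCES JOB (JobNum),
    Version TEXT NOT NULL CHECK (Version IN ('E', 'F')),
    PRIMARY KEY (JobNum, Version)
);
-- Surrogate key, plus UNIQUE to keep "at most one TRANSLATION per TASK".
CREATE TABLE TRANSLATION (
    TranslationID INTEGER PRIMARY KEY,
    JobNum  INTEGER NOT NULL,
    Version TEXT NOT NULL,
    DueDate TEXT,
    UNIQUE (JobNum, Version),
    FOREIGN KEY (JobNum, Version) REFERENCES TASK (JobNum, Version)
);
-- Assignments reference the single surrogate column.
CREATE TABLE TRA_ASSIGN (
    TranslationID INTEGER NOT NULL REFERENCES TRANSLATION (TranslationID),
    "Index"       INTEGER NOT NULL,
    Translator    TEXT,
    PRIMARY KEY (TranslationID, "Index")
);
-- Hypothetical sample data.
INSERT INTO JOB VALUES (100, 'Sample document');
INSERT INTO TASK VALUES (100, 'E'), (100, 'F');
INSERT INTO TRANSLATION (JobNum, Version) VALUES (100, 'E');
INSERT INTO TRA_ASSIGN VALUES (1, 1, 'Alice'), (1, 2, 'Bob');
""")

def try_insert(sql):
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

# A second TRANSLATION for the same TASK violates the UNIQUE constraint.
second_translation = try_insert("INSERT INTO TRANSLATION (JobNum, Version) VALUES (100, 'E')")
# A TRANSLATION for a TASK that doesn't exist violates the composite FK.
orphan_translation = try_insert("INSERT INTO TRANSLATION (JobNum, Version) VALUES (999, 'E')")
print(second_translation, orphan_translation)
```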
I am performing a Select Into from a #table into a real table that has a surrogate key. If this is in a transaction (or not in one) am I guaranteed that the records inserted will be sequential surrogate key ids?
Select * into REALTABLE from MYPOUNDTABLE --40 rows
Can I assume that if the first one inserted is id 32, the last one is 71?
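A single-session sketch of the arithmetic, using Python's sqlite3 with SQLite standing in for SQL Server. It uses INSERT ... SELECT because the target already exists (in SQL Server, SELECT ... INTO creates a new table), and the 31 pre-existing rows are a hypothetical setup so the first new id is 32. Forty rows starting at 32 end at 32 + 40 - 1 = 71. This proves nothing about concurrent sessions: SQL Server's IDENTITY documentation states that a multi-row insert is not guaranteed consecutive values when other inserts run at the same time, so the check below verifies contiguity after the fact rather than assuming it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE REALTABLE (id INTEGER PRIMARY KEY AUTOINCREMENT, val TEXT);
CREATE TEMP TABLE MYPOUNDTABLE (val TEXT);
""")
# Hypothetical setup: 31 existing rows, so the next generated id is 32.
conn.executemany("INSERT INTO REALTABLE (val) VALUES (?)", [("old",)] * 31)
conn.executemany("INSERT INTO MYPOUNDTABLE (val) VALUES (?)",
                 [(f"new{i}",) for i in range(40)])

# Insert all 40 rows from the temp table into the existing table.
conn.execute("INSERT INTO REALTABLE (val) SELECT val FROM MYPOUNDTABLE")
first, last, n = conn.execute(
    "SELECT MIN(id), MAX(id), COUNT(*) FROM REALTABLE WHERE val LIKE 'new%'"
).fetchone()
# The ids are contiguous only if the range width equals the row count.
print(first, last, n, last - first + 1 == n)
```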
I have a query that is the initial part of a stored procedure; the results are put in a temporary table that will be used by several other operations. I'm trying to consolidate some of these, possibly getting the whole thing down to one query. One of the steps I'm trying to consolidate results in a join that looks like this (where c is a table alias created previously):
LEFT JOIN proccode_t pc ON pc.proccode=c.proccode AND (pc.effectivedate<=c.dateofservice OR pc.effectivedate IS NULL)
I don't have any influence over the schema in this database, I can only write select queries. Things are set up such that proccode_t will have several records for any given proccode, each with a different effectivedate. The oldest/original effectivedate is often NULL. This means my join up there often matches up with several records in proccode_t, when I only ever want to match the one most recent and occasionally not match anything at all.
I know a few ways to accomplish this (as covered in another recent thread here), but none of them seem to match well to this situation. A sub query that uses TOP 1 ORDER BY effectivedate DESC would almost cover it, but if the query returns NULL that might or might not mean it matched the NULL effectivedate for that proccode. It's also kinda slow.
The proccode_t table does contain another field that tells you which of the codes is currently effective, aside from just using the date, and that would be the normal way to do this here. However, this procedure is for auditing old data, so the currently effective proccode may not be the one I'm concerned with.
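One way to pick exactly one row per claim, sketched with Python's sqlite3 as a stand-in for SQL Server: the claim table, claimid and descr columns are hypothetical. Mapping NULL effectivedate to a sentinel that is older than any real date (here '0001-01-01', an assumption; for SQL Server datetime, 1753-01-01 is the earliest value) lets MAX() distinguish "matched the NULL row" from "no match at all". It still assumes each (proccode, effectivedate) pair appears only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE proccode_t (proccode TEXT, effectivedate TEXT, descr TEXT);
CREATE TABLE claim (claimid INTEGER, proccode TEXT, dateofservice TEXT);
INSERT INTO proccode_t VALUES
    ('P1', NULL,         'original'),
    ('P1', '2010-01-01', 'rev 2010'),
    ('P1', '2012-06-01', 'rev 2012');
INSERT INTO claim VALUES
    (1, 'P1', '2009-05-05'),   -- before any dated row: should match the NULL row
    (2, 'P1', '2011-03-03'),   -- should match rev 2010
    (3, 'P1', '2013-01-01'),   -- should match rev 2012
    (4, 'P9', '2013-01-01');   -- unknown code: no match at all
""")

# COALESCE turns NULL into a sentinel date so the subquery's MAX() returns the
# sentinel (NULL row in effect) or NULL (nothing in effect), never both.
sql = """
SELECT c.claimid, pc.descr
FROM claim AS c
LEFT JOIN proccode_t AS pc
    ON pc.proccode = c.proccode
   AND COALESCE(pc.effectivedate, '0001-01-01') = (
        SELECT MAX(COALESCE(p2.effectivedate, '0001-01-01'))
        FROM proccode_t AS p2
        WHERE p2.proccode = c.proccode
          AND (p2.effectivedate <= c.dateofservice OR p2.effectivedate IS NULL))
ORDER BY c.claimid
"""
result = conn.execute(sql).fetchall()
print(result)
```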
Hello, I have been having a hard time with this issue. I am attempting to join a table onto itself to get the closest earlier date onto a single row. What I mean is, I have the following data:
id  date
1   10/07/08
2   10/06/07
3   10/06/03
4   10/06/03
The new table should have the current id and the one closest to it, like so:
1  10/07/08  2     10/06/07
2  10/06/07  3     10/06/03
3  10/06/03  null  null
4  10/06/03  null  null
But I am getting duplicates due to the 10/06/03:
1  10/07/08  2     10/06/07
2  10/06/07  3     10/06/03
2  10/06/07  4     10/06/03
3  10/06/03  null  null
4  10/06/03  null  null
I want it so that if there is a duplicate, I take the id that's higher. I can't figure it out. This is my current SQL:
SELECT PB.ID, PB.StartDate, PB2.ID, PB2.StartDate
FROM table PB
LEFT OUTER JOIN table PB2
    ON PB.keyID = PB2.keyID
   AND PB2.StartDate < PB.StartDate
   AND PB.StartDate = (SELECT TOP (1) StartDate
                       FROM table PB3
                       WHERE PB.keyID = PB3.keyID
                         AND PB2.StartDate < PB3.StartDate
                       ORDER BY PB3.StartDate ASC)
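A sketch with Python's sqlite3 as a stand-in for SQL Server. It assumes the dates are mm/dd/yy (so 10/07/08 is 2008-10-07) and that all four rows share one keyID; both are assumptions. Instead of filtering on dates in the ON clause, it first finds the latest earlier date and then pins the join to exactly one ID on that date. MIN is used to reproduce the desired output above, which pairs 2 with 3; swapping in MAX would take the higher id among ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PB (ID INTEGER, keyID INTEGER, StartDate TEXT);
INSERT INTO PB VALUES
    (1, 1, '2008-10-07'),
    (2, 1, '2007-10-06'),
    (3, 1, '2003-10-06'),
    (4, 1, '2003-10-06');
""")

# Inner subquery: latest StartDate strictly before this row's, same keyID.
# Outer subquery: one ID on that date (MIN breaks the tie), so at most one match.
sql = """
SELECT a.ID, a.StartDate, b.ID, b.StartDate
FROM PB AS a
LEFT JOIN PB AS b
    ON b.ID = (
        SELECT MIN(x.ID)
        FROM PB AS x
        WHERE x.keyID = a.keyID
          AND x.StartDate = (
              SELECT MAX(y.StartDate)
              FROM PB AS y
              WHERE y.keyID = a.keyID AND y.StartDate < a.StartDate))
ORDER BY a.ID
"""
pairs = conn.execute(sql).fetchall()
print(pairs)
```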
rowID  PersonID  Start Date  End Date
=====  ========  ==========  ==========
001    6575556   19/06/2013  09/07/2013
001    6575556   20/06/2013  12/07/2013
001    6575556   21/06/2013  12/07/2013
002    9478522   15/05/2013  18/05/2013
003    7753423   22/08/2013  01/09/2013
A person can have more than one start/end date, so I get multiple rows with the same row ID and Person ID when looking at their dates.
I want to display the most recent end date and its associated data if there is more than one start/end date for the same person. I decided to do a self join with a MAX date aggregate, using this against a main SELECT from Table1:
SELECT PersonID, MAX([End Date]) AS MaxEndDate FROM Table1 GROUP BY PersonID
And join it this way:
select RowID, PersonID, [End Date] FROM Table1 INNER JOIN ( SELECT PersonID, MAX([End Date]) AS MaxEndDate
[Code] ....
When I run the sub-query on its own it gives me the single PersonID and Max Date but on self-joining with Table1 I still get the duplicates values.
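A sketch of the greatest-per-group join on the sample data, using Python's sqlite3 as a stand-in for SQL Server, with the dd/mm/yyyy dates rewritten as ISO-8601 text. The join back has to match on both PersonID and the max end date. Even then, person 6575556 has two rows ending 12/07/2013, so a tie-breaker is still needed; taking the latest start date is an assumption here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (RowID TEXT, PersonID TEXT, StartDate TEXT, EndDate TEXT);
INSERT INTO Table1 VALUES
    ('001', '6575556', '2013-06-19', '2013-07-09'),
    ('001', '6575556', '2013-06-20', '2013-07-12'),
    ('001', '6575556', '2013-06-21', '2013-07-12'),
    ('002', '9478522', '2013-05-15', '2013-05-18'),
    ('003', '7753423', '2013-08-22', '2013-09-01');
""")

# Join back on BOTH PersonID and the max end date.
max_join = """
SELECT t.RowID, t.PersonID, t.StartDate, t.EndDate
FROM Table1 AS t
INNER JOIN (SELECT PersonID, MAX(EndDate) AS MaxEndDate
            FROM Table1 GROUP BY PersonID) AS m
    ON m.PersonID = t.PersonID AND t.EndDate = m.MaxEndDate
"""
with_ties = conn.execute(max_join).fetchall()

# Tie-breaker (assumed): among rows sharing the max end date, keep the latest start.
one_per_person = conn.execute(max_join + """
WHERE t.StartDate = (SELECT MAX(s.StartDate) FROM Table1 AS s
                     WHERE s.PersonID = t.PersonID AND s.EndDate = t.EndDate)
ORDER BY t.PersonID
""").fetchall()
print(len(with_ties), one_per_person)
```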
Now, I like #1 because the implementation is simple -- the calling code simply passes an author name, and a country id and an INSERT INTO statement is called with those parameters
INSERT INTO authors VALUES (@authorName, @countryId)
I like #2, because it hides the surrogate "id" key from the application calling code. But on the downside, it has more overhead work, because you have to first a) verify that a country with that name exists, and b) select that id into a variable.
DECLARE @id INT;
IF EXISTS (SELECT * FROM countries WHERE country_name = @countryName)
    SELECT @id = country_id FROM countries WHERE country_name = @countryName;
(Sorry I may have the SQL syntax wrong up there, but I was just trying to demonstrate the extra overhead involved).
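The existence check and the id lookup can be folded into a single INSERT ... SELECT, sketched here with Python's sqlite3 as a stand-in for SQL Server (the same statement shape works in T-SQL). The author_id and author_name columns and the sample names are hypothetical. If the country name doesn't exist, the SELECT returns no rows and nothing is inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (country_id INTEGER PRIMARY KEY, country_name TEXT UNIQUE NOT NULL);
-- Hypothetical column names for authors.
CREATE TABLE authors (
    author_id   INTEGER PRIMARY KEY,
    author_name TEXT NOT NULL,
    country_id  INTEGER NOT NULL REFERENCES countries (country_id)
);
INSERT INTO countries (country_name) VALUES ('Canada'), ('France');
""")

def add_author(author_name, country_name):
    # One statement: the SELECT yields one row if the country exists, zero otherwise.
    cur = conn.execute(
        """INSERT INTO authors (author_name, country_id)
           SELECT ?, country_id FROM countries WHERE country_name = ?""",
        (author_name, country_name),
    )
    return cur.rowcount  # 1 if inserted, 0 if the country name was unknown

inserted = add_author("Jane Doe", "Canada")
skipped = add_author("John Roe", "Atlantis")
print(inserted, skipped)
```

The caller can treat a return of 0 as "unknown country" without a separate lookup round trip.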
I am in the process of building a fact table in a staging area. The data in the host system has numerous composite keys, so I have replaced all the composite keys in the dimensions with surrogate keys (integer), which are generated using an identity at load time. When I load the staging (fact) table, I set the default value of all the foreign keys to 0. What I must do now is update all the foreign key values with the surrogate key values from the dimensions. I'm using an UPDATE command with the original gid values from the source system in the join condition, i.e.:
UPDATE X
SET x.key_1 = y.key_1
FROM TableA X WITH (NOLOCK)
INNER JOIN TableB Y WITH (NOLOCK)
    ON x.org_id = y.org_id
   AND x.bus_id = y.bus_id
   AND x.prov_gid = y.prov_gid
   AND x.log_gid = y.loc_gid;
This seems to work fine for most tables. However, I am now trying to update a table that has over 10 million rows and approximately 30 foreign keys. The script runs for hours; I usually stop it after about 8 hours, when it still hasn't completed. Since the keys are dynamic and could change during each load process, I can't add them during the load process.
Is there a better way to update these keys? I need to regenerate the fact tables every night, and taking this much time to reload a fact table is just not practical. I've indexed the alternate keys on all the dimensions and have also indexed the gids on the target fact table. Am I doing something wrong? Have I over-indexed the target table? Please help! Thanks, Jerry
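One commonly used alternative, named plainly as a swapped-in technique rather than what the post does, is to resolve all the keys in the single set-based INSERT ... SELECT that writes the fact rows, after the dimensions for that night's load are in place. This avoids loading zeros and then running one UPDATE per foreign key. The sketch uses Python's sqlite3 as a stand-in for SQL Server, with hypothetical table and key names and only two of the ~30 keys; unmatched lookups fall back to 0, the post's default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical dimensions: surrogate key plus the source-system business key.
CREATE TABLE DimProvider (provider_key INTEGER PRIMARY KEY, org_id INT, prov_gid INT);
CREATE TABLE DimLocation (location_key INTEGER PRIMARY KEY, org_id INT, loc_gid INT);
CREATE INDEX IX_DimProvider_bk ON DimProvider (org_id, prov_gid);
CREATE INDEX IX_DimLocation_bk ON DimLocation (org_id, loc_gid);
-- Staged rows still carry the source gids.
CREATE TABLE StageFact (org_id INT, prov_gid INT, loc_gid INT, amount REAL);
CREATE TABLE FactTable (provider_key INT NOT NULL, location_key INT NOT NULL, amount REAL);
INSERT INTO DimProvider VALUES (10, 1, 500), (11, 1, 501);
INSERT INTO DimLocation VALUES (20, 1, 900);
INSERT INTO StageFact VALUES (1, 500, 900, 5.0), (1, 501, 999, 7.5);
""")

# One pass: every key is looked up while the fact row is written.
# LEFT JOIN + COALESCE keeps rows whose lookup fails, with key 0.
conn.execute("""
INSERT INTO FactTable (provider_key, location_key, amount)
SELECT COALESCE(p.provider_key, 0), COALESCE(l.location_key, 0), s.amount
FROM StageFact AS s
LEFT JOIN DimProvider AS p ON p.org_id = s.org_id AND p.prov_gid = s.prov_gid
LEFT JOIN DimLocation AS l ON l.org_id = s.org_id AND l.loc_gid = s.loc_gid
""")
fact = conn.execute("SELECT * FROM FactTable ORDER BY amount").fetchall()
print(fact)
```

Whether this beats the UPDATE approach on 10 million rows depends on the plan SQL Server chooses; it is worth measuring rather than assuming.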
I have a dimension called 'Caller Type' with the following attributes:
CallerTypeKey ---- surrogate key
CallerTypeID
CallerTypeDesc
CreatedByKey ---- foreign surrogate key from User Dimension
I used a Script Task to get the last used key and increment it so I can use it for new records in my dimension. However, my dimension is linked to a User dimension, and I need the surrogate key from that once I insert the new record into the CallerType dimension.
This is the code I am using to get the incremental surrogate keys:
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
    Inherits UserComponent

    ' Declare a variable scoped to class ScriptMain
    Dim counter As Integer

    Public Sub New()
        ' This method gets called only once per execution
        ' Initialise the variable
        counter = 1093
    End Sub

    ' This method gets called for each row in the InputBuffer
    Public Overrides Sub Input_ProcessInputRow(ByVal Row As InputBuffer)
        ' Increment the variable
        counter += 1
        ' Output the value of the variable
        Row.instance = counter
    End Sub
End Class
'Instance' is my surrogate key field name.
But I am getting an error saying that InputBuffer is not defined. Any idea?
If I want to add two more incrementing fields, where do I have to add them?
Sorry if it sounds silly; I am very new to this scripting.
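The logic the script is after (seed a counter from the dimension, increment per new row, and look up CreatedByKey from the User dimension) can be sketched outside SSIS with Python's sqlite3. This is not Script Component code; DimUser, UserID and the sample rows are hypothetical. It seeds from MAX(CallerTypeKey) instead of the hard-coded 1093. On the error itself, one thing worth checking (not verified against this package): the Script Component generates its buffer class and row method from the input's name, so with the default input "Input 0" they are Input0Buffer and Input0_ProcessInputRow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimUser (UserKey INTEGER PRIMARY KEY, UserID TEXT UNIQUE);
CREATE TABLE DimCallerType (
    CallerTypeKey  INTEGER PRIMARY KEY,
    CallerTypeID   TEXT,
    CallerTypeDesc TEXT,
    CreatedByKey   INTEGER REFERENCES DimUser (UserKey)
);
INSERT INTO DimUser VALUES (7, 'u.admin'), (8, 'u.ops');
INSERT INTO DimCallerType VALUES (1093, 'CT01', 'Existing', 7);
""")

# Hypothetical incoming rows: (CallerTypeID, CallerTypeDesc, creating user's business key).
new_rows = [("CT02", "Walk-in", "u.ops"), ("CT03", "Web", "u.admin")]

# Seed from the current maximum rather than a hard-coded value.
counter = conn.execute(
    "SELECT COALESCE(MAX(CallerTypeKey), 0) FROM DimCallerType").fetchone()[0]
user_keys = dict(conn.execute("SELECT UserID, UserKey FROM DimUser"))  # lookup cache

for caller_id, desc, user_id in new_rows:
    counter += 1  # same per-row increment as the script's counter
    # user_keys[user_id] raises KeyError for an unknown user; handle as needed.
    conn.execute("INSERT INTO DimCallerType VALUES (?, ?, ?, ?)",
                 (counter, caller_id, desc, user_keys[user_id]))

keys = conn.execute(
    "SELECT CallerTypeKey, CreatedByKey FROM DimCallerType ORDER BY 1").fetchall()
print(keys)
```

Additional incrementing fields would follow the same pattern: one counter per field, each seeded from its own MAX.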
I have a database surrogate key that increments very rapidly (+5000 every 30 minutes). I need my SSIS package to reset this surrogate key to avoid reaching the upper limit value for that field.
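A back-of-envelope check of how close the limit is, assuming the column is a SQL Server int (signed 32-bit, maximum 2,147,483,647), the key starts near 1, and the rate stays at exactly 5,000 per 30 minutes. All three are assumptions to verify against the actual table.

```python
# Assumptions: signed 32-bit int column, key starting near 1, constant rate.
INT_MAX = 2**31 - 1     # 2,147,483,647
per_day = 5000 * 48     # 48 half-hour periods per day

int_days = INT_MAX // per_day         # whole days until the int range is used up
years = round(int_days / 365.25, 1)
print(per_day, int_days, years)
```

Under those assumptions the int range lasts about 8,947 days, roughly 24.5 years.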
I have several stage to star (i.e. moving data from a staging table through the key lookups into a fact table) ETL transformations in a single SSIS package. Each fact table has a different set of measures but the identical foreign key set, e.g. ConsultantKey, SubsidiaryKey, ContestKey, ContestParamKey and MonthKey.
Currently I have to replicate the key lookup (Surrogate Key Pipeline, or SKP) for each data flow. If I could cache each dimension one time in the package and reuse it for each stage to fact it would be much more efficient.
Is there a way for me to reuse a common data flow?
I need to create a query to get 1 row per location and get the minimum PolicyBookingDt and RowUpdateDt from the policy table. All the attributes from the Location table should also be from the Policy that has the minimum PolicyBookingDt.
So from the above example, I need to get the following:
Getting a little confused on how to create the syntax for this.
I am trying to write an SSIS surrogate key data transform. My problem is that I can't find an example of how to add a column to the incoming columns and put some data in it. If anyone has a sample, can you please post it? I found a script option that works, but I would like an actual transform.
I want to change the work table name to work_version2 and later drop the work table. First, I created the table (work_version2) with the data structure seen below, and then inserted the data from the work table. When I tried to make WorkID a surrogate key in work_version2 using SSMS, I got the error message below on saving the changes. Is there a way to do this?
Saving changes is not permitted. The changes you have made require the following tables to be dropped and re-created. You have either made changes to a table that can't be re-created or enabled the option Prevent saving changes that require the table to be re-created. Work_version2
CREATE TABLE WORK( WorkID Int NOT NULL IDENTITY (500,1), Title Char(35) NOT NULL, Copy Char(12) NOT
I have two tables that I am trying to join, and both have similar columns that I am trying to use. One is Date and the other is LOGdate. Date's format is "01/01/2000", but LOGdate's is "01/01/2000 12:41:00 PM/AM". Can I join these two tables? If so, how do I get rid of the excess time on LOGdate?
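A sketch of joining on the date part only, using Python's sqlite3 as a stand-in, with hypothetical table names DailyTable and LogTable and ISO-8601 storage. The formats in the post are display formats; this assumes the columns hold real date/time values. SQLite's date() drops the time of day; in SQL Server 2008 and later the equivalent is CAST(l.LOGdate AS date) = d.[Date]. Wrapping the column in a function can prevent an index seek on LOGdate, so a range predicate (LOGdate at or after the day's start and before the next day) is the usual alternative on large tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DailyTable (Date TEXT);       -- date only
CREATE TABLE LogTable   (LOGdate TEXT);    -- date plus time of day
INSERT INTO DailyTable VALUES ('2000-01-01'), ('2000-01-02');
INSERT INTO LogTable VALUES
    ('2000-01-01 12:41:00'), ('2000-01-01 00:05:00'), ('2000-01-03 08:00:00');
""")

# date() strips the time portion so both sides compare as plain dates.
rows = conn.execute("""
SELECT d.Date, l.LOGdate
FROM DailyTable AS d
INNER JOIN LogTable AS l ON date(l.LOGdate) = d.Date
ORDER BY l.LOGdate
""").fetchall()
print(rows)
```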
Table Master
---------------------------
EmpID  EffectiveDateFr  Group
00001  1/1/2014         A
00001  1/5/2014         B
00001  1/9/2014         C
00001  2/1/2014         B
00001  2/20/2014        A
....
I want to create a query whose output should be:
EmpID  TransDate  Group
00001  1/1/2014   A
00001  1/2/2014   A
00001  1/3/2014   A
00001  1/4/2014   A
00001  1/5/2014   B
00001  1/6/2014   B
00001  1/15/2014  C
00001  2/1/2014   B
00001  2/2/2014   B
00001  2/20/2014  A
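A sketch of expanding effective-dated rows into one row per day with a recursive CTE, using Python's sqlite3 as a stand-in for SQL Server (T-SQL also supports recursive CTEs, with slightly different date functions). Each group is assumed to stay in effect until the day before the next EffectiveDateFr for the same EmpID. The last row runs to a cut-off date passed as :cutoff; 2014-02-28 is a hypothetical choice, since the post doesn't say where the output should stop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Master (EmpID TEXT, EffectiveDateFr TEXT, [Group] TEXT);
INSERT INTO Master VALUES
    ('00001', '2014-01-01', 'A'), ('00001', '2014-01-05', 'B'),
    ('00001', '2014-01-09', 'C'), ('00001', '2014-02-01', 'B'),
    ('00001', '2014-02-20', 'A');
""")

sql = """
WITH RECURSIVE ranges AS (
    -- In effect until the day before the next row for the same EmpID;
    -- the last row runs to :cutoff (an assumed end date).
    SELECT m.EmpID, m.EffectiveDateFr AS FromDt, m.[Group] AS Grp,
           COALESCE((SELECT date(MIN(n.EffectiveDateFr), '-1 day')
                     FROM Master AS n
                     WHERE n.EmpID = m.EmpID
                       AND n.EffectiveDateFr > m.EffectiveDateFr), :cutoff) AS ToDt
    FROM Master AS m
),
days (EmpID, TransDate, Grp, ToDt) AS (
    SELECT EmpID, FromDt, Grp, ToDt FROM ranges
    UNION ALL
    SELECT EmpID, date(TransDate, '+1 day'), Grp, ToDt
    FROM days
    WHERE TransDate < ToDt
)
SELECT EmpID, TransDate, Grp AS [Group] FROM days ORDER BY EmpID, TransDate
"""
rows = conn.execute(sql, {"cutoff": "2014-02-28"}).fetchall()
print(len(rows), rows[:6])
```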
So, I have some questions about best practice in SQL Server.
1.) I have a PK like this (company TINYINT, store TINYINT, action TINYINT, invoice INT, sn SMALLINT). I know JOINs will work faster with a surrogate key, but I have only a couple of JOINs on that table. I mainly use members of the PK in the WHERE clause, alone and combined, for reporting purposes. Is it always better to have a surrogate key, because it carries no meaning or context, unlike the data in the current PK?
2.) In my PK above I have two candidates for using a Sequence object. Invoice starts at 1 for every (company, store, action) combination. Sn starts at 1 for every (company, store, action, invoice) combination. I would like to know whether I can implement a Sequence object here, given that sequences don't support PARTITION BY in the OVER clause. From what I've read it cannot be done with a Sequence, but I have to ask. Here is a data sample for this PK
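Since a single sequence can't restart per group, a common workaround (swapped in here, not a Sequence-based answer) is to compute MAX + 1 within the group at insert time. The sketch uses Python's sqlite3 as a stand-in; InvoiceLine is a hypothetical table name. In SQL Server, two sessions could read the same MAX at the same time, so the read and the insert need to be serialized, for example inside a transaction with UPDLOCK and HOLDLOCK hints on the MAX query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE InvoiceLine (
    company INT, store INT, action INT, invoice INT, sn INT,
    PRIMARY KEY (company, store, action, invoice, sn)
)""")

def next_invoice(company, store, action):
    # Per-group numbering: restarts at 1 for each (company, store, action).
    return conn.execute(
        """SELECT COALESCE(MAX(invoice), 0) + 1 FROM InvoiceLine
           WHERE company = ? AND store = ? AND action = ?""",
        (company, store, action),
    ).fetchone()[0]

def add_invoice(company, store, action, n_lines):
    inv = next_invoice(company, store, action)
    # sn restarts at 1 within each invoice.
    conn.executemany(
        "INSERT INTO InvoiceLine VALUES (?, ?, ?, ?, ?)",
        [(company, store, action, inv, sn) for sn in range(1, n_lines + 1)],
    )
    return inv

a = add_invoice(1, 1, 1, 2)  # first invoice for group (1, 1, 1)
b = add_invoice(1, 1, 1, 1)  # second invoice for the same group
c = add_invoice(1, 2, 1, 3)  # first invoice for a different store
print(a, b, c)
```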
I have tables like the one below for my Stage and dimension tables:
Stage Table
accountid
name
address
Dimension Table
accountkey ---- surrogate key (DW key)
accountid ---- business key (transaction's primary key)
name
address
I used a Slowly Changing Dimension to detect changes to the records in my dimension table. But I had a problem when a new record exists in the stage table. The accountkey is set as the primary key, and it gets its value from a different table that stores the last account key that was created. I cannot load all the changes unless I have a business key. Is there a way I can get the "last key" from a different table in the data flow and then supply it, together with the other fields, in the new-output branch of the Slowly Changing Dimension?
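The key-store pattern described above can be sketched outside SSIS with Python's sqlite3: read the last key, number the new rows after it, and write the new last key back, all inside one explicit transaction so the store and the dimension stay in step. KeyStore, its columns and the sample accounts are hypothetical; the dimension columns come from the post.

```python
import sqlite3

# isolation_level=None so BEGIN/COMMIT are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE KeyStore (TableName TEXT PRIMARY KEY, LastKey INTEGER NOT NULL);
CREATE TABLE DimAccount (accountkey INTEGER PRIMARY KEY, accountid TEXT, name TEXT, address TEXT);
INSERT INTO KeyStore VALUES ('DimAccount', 500);
""")

def insert_new_accounts(rows):
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading LastKey
    try:
        last = conn.execute(
            "SELECT LastKey FROM KeyStore WHERE TableName = 'DimAccount'").fetchone()[0]
        keyed = [(last + i, *row) for i, row in enumerate(rows, start=1)]
        conn.executemany("INSERT INTO DimAccount VALUES (?, ?, ?, ?)", keyed)
        conn.execute("UPDATE KeyStore SET LastKey = ? WHERE TableName = 'DimAccount'",
                     (last + len(rows),))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

# Hypothetical rows from the SCD's new-output branch: (accountid, name, address).
insert_new_accounts([("A-1", "Alpha", "1 Main St"), ("A-2", "Beta", "2 Main St")])
accounts = conn.execute(
    "SELECT accountkey, accountid FROM DimAccount ORDER BY accountkey").fetchall()
last_key = conn.execute("SELECT LastKey FROM KeyStore").fetchone()[0]
print(accounts, last_key)
```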
I want to create an import table for daily rows, with an integer column like 20150430 for the date, called DayKey. This table would hold one day's rows at a time. It would then be imported into a STAGE table, which would have the same columns and would hold all of the import rows for every day. My question is this: I want to have an integer primary key, unless there is a better idea. I could make the STAGE table use an auto-incremented value for the key. Then, when I load the import table, which is truncated every day, I could take the NEXT value of the key from the STAGE table and increment by 1.
Let's say the last value in STAGE is 1000, then the next value that would be in IMPORT would be 1001 and incrementing up. Then these would be added to the STAGE table with the associated keys. There is no chance of anyone or anything else adding to the STAGE table any other way.
I want to select unique country - date pairs. It is not even necessary to have the count of each one, just the list of unique country/dates.
My query here uses 'group by' to accomplish this task, but there may be a way to do this with a self join. I believe using a self join would make the query faster.
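A self join isn't needed just to list unique pairs. A sketch with Python's sqlite3 as a stand-in, using a hypothetical table visits(country, visit_date), shows GROUP BY and SELECT DISTINCT returning the same set. Optimizers often produce the same plan for both, but whether either is faster on the real table is something to check with the actual execution plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (country TEXT, visit_date TEXT);  -- hypothetical table
INSERT INTO visits VALUES
    ('FR', '2015-04-30'), ('FR', '2015-04-30'),
    ('FR', '2015-05-01'), ('DE', '2015-04-30');
""")

# Two equivalent ways to get each (country, date) pair exactly once.
by_group = conn.execute(
    "SELECT country, visit_date FROM visits GROUP BY country, visit_date ORDER BY 1, 2"
).fetchall()
by_distinct = conn.execute(
    "SELECT DISTINCT country, visit_date FROM visits ORDER BY 1, 2"
).fetchall()
print(by_group == by_distinct, by_group)
```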