DB Design :: Import And Stage Table Surrogate Keys
Apr 30, 2015
I want to create an import table for daily rows with an integer column such as 20150430 for the date, called DayKey. This table would hold one date per day. It would then be loaded into a STAGE table, which would have the same columns and would hold all of the import rows for every day. My question is this: I want an integer primary key, unless there is a better idea. I could make the STAGE table use an auto-incremented value for the key. Then, when I load the import table, which is truncated every day, I could take the next value of the key from the STAGE table and increment by 1.
Let's say the last value in STAGE is 1000, then the next value that would be in IMPORT would be 1001 and incrementing up. Then these would be added to the STAGE table with the associated keys. There is no chance of anyone or anything else adding to the STAGE table any other way.
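The numbering scheme described above can be sketched as follows. This is only a minimal illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and every name except DayKey (import_rows, stage_rows, row_key, payload, load_day) is hypothetical.

```python
import sqlite3

def make_db():
    """Create empty IMPORT and STAGE tables with identical columns."""
    conn = sqlite3.connect(":memory:")
    for table in ("import_rows", "stage_rows"):  # hypothetical table names
        conn.execute(
            f"CREATE TABLE {table} ("
            "row_key INTEGER PRIMARY KEY, DayKey INTEGER NOT NULL, payload TEXT)"
        )
    return conn

def load_day(conn, day_key, payloads):
    """Number one day's rows after the highest key already in STAGE."""
    # IMPORT holds a single day, so it is emptied first
    # (SQLite has no TRUNCATE TABLE, hence DELETE).
    conn.execute("DELETE FROM import_rows")
    # Highest key handed out so far; 0 while STAGE is still empty.
    (last_key,) = conn.execute(
        "SELECT COALESCE(MAX(row_key), 0) FROM stage_rows"
    ).fetchone()
    rows = [(last_key + n, day_key, p) for n, p in enumerate(payloads, start=1)]
    conn.executemany("INSERT INTO import_rows VALUES (?, ?, ?)", rows)
    # Append the day's rows, keys included, to STAGE.
    conn.execute("INSERT INTO stage_rows SELECT * FROM import_rows")
    conn.commit()
    return [key for key, _, _ in rows]

conn = make_db()
first_day = load_day(conn, 20150429, ["a", "b"])        # numbering starts after 0
second_day = load_day(conn, 20150430, ["c", "d", "e"])  # continues after STAGE's max
```

Reading MAX from STAGE is only safe because, as stated, nothing else ever writes to STAGE. An auto-incremented key on STAGE itself would be another way to get the same numbering.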
Hi, I use lookups to map the surrogate keys of level 1 dimensions to my fact tables in SSIS. But how do I handle a level 2 dimension with a ValidFrom and a ValidUntil date field? I do not use an IsCurrent column, because this could cause problems with late-arriving facts.
- In DTS I used an SQL statement like this:
UPDATE SA
SET SA.DimProdRef = Dim.RecordID
FROM SAWarenEingang SA, DimProd Dim
WHERE SA.ProduktNumber = Dim.ProduktNumber
  AND SA.ArtikelkontoBewegungsdatum BETWEEN Dim.ValidFrom AND Dim.ValidUntil
Now in SSIS I want to handle the whole thing in the data flow without using a staging table:
- Using Lookups: I would have to pass the date column of each fact row into the lookup. That does not work.
- Using Execute SQL in the data flow: this would be very slow, because the statement would be executed for every row in the data flow.
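To make the matching rule concrete, here is a minimal sketch of a date-range lookup done entirely in memory: the dimension is cached once, then each fact row is resolved without a query per row. It is plain Python, not an SSIS component or API, and the function names (build_range_cache, lookup) are hypothetical. It assumes the validity ranges of one product never overlap.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date

def build_range_cache(dim_rows):
    """Group dimension versions by product number, sorted by ValidFrom."""
    cache = defaultdict(list)
    for record_id, produkt_number, valid_from, valid_until in dim_rows:
        cache[produkt_number].append((valid_from, valid_until, record_id))
    for versions in cache.values():
        versions.sort()
    return cache

def lookup(cache, produkt_number, movement_date):
    """Return the RecordID whose validity range contains movement_date, else None."""
    versions = cache.get(produkt_number, [])
    starts = [v[0] for v in versions]
    # Last version that starts on or before the movement date.
    i = bisect_right(starts, movement_date) - 1
    if i >= 0 and versions[i][0] <= movement_date <= versions[i][1]:
        return versions[i][2]
    return None

# Two versions of the same product (illustrative data only).
dim_rows = [
    (1, "P1", date(2006, 1, 1), date(2006, 6, 30)),
    (2, "P1", date(2006, 7, 1), date(9999, 12, 31)),
]
cache = build_range_cache(dim_rows)
late_fact_key = lookup(cache, "P1", date(2006, 3, 15))    # late-arriving fact
current_fact_key = lookup(cache, "P1", date(2006, 8, 1))
```

A late-arriving fact dated in the first range resolves to the older dimension row, which is the behaviour an IsCurrent flag cannot provide.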
I want to change the work table name to work_version2 and later drop the work table. First, I created the table (work_version2) with the structure shown below and then inserted the data from the work table. When I tried to make workID a surrogate key in work_version2 using SSMS, I got the error message below when saving the changes. Is there a way to do this?
Saving changes is not permitted. The changes you have made require the following tables to be dropped and re-created. You have either made changes to a table that can't be re-created or enabled the option Prevent saving changes that require the table to be re-created. Work_version2.
CREATE TABLE WORK( WorkID Int NOT NULL IDENTITY (500,1), Title Char(35) NOT NULL, Copy Char(12) NOT
My question: the dimension is SCD type 2, and every time there is a change the persn_key gets a new key value, but the fact table still points to the oldest key. How do I update the surrogate key in the fact table to the current key value? Per the requirement, the fact surrogate key must point to the current active record in the dimension.
I am in the process of building a fact table in a staging area. The data in the host system has numerous composite keys, so I have replaced all the composite keys in the dimensions with surrogate keys (integer) which are generated using an identity at load time. When I load the staging (fact) table, I have set the default value of all the foreign keys to 0. What I must do now is update all the foreign key values with the surrogate key values from the dimensions. I'm using an update command and the original gid values from the source system in the where clause, i.e.
UPDATE X
SET x.key_1 = y.key_1
FROM TableA X WITH (NOLOCK)
INNER JOIN TableB Y WITH (NOLOCK)
  ON x.org_id = y.org_id
 AND x.bus_id = y.bus_id
 AND x.prov_gid = y.prov_gid
 AND x.log_gid = y.loc_gid;
This seems to work fine for most tables. However, I am now trying to update a table that has over 10 million rows and approximately 30 foreign keys. The script runs for hours. I usually stop it after about 8 hours when it still hasn't completed. Since the keys are dynamic and could change during each load process, I can't add them during the load process.
Is there a better way to update these keys? I need to regenerate the fact tables every night, and taking this much time to reload a fact table is just not practical. I've indexed the alternate keys on all the dimensions and have also indexed the gids on the target fact table. Am I doing something wrong? Have I over-indexed the target table? Please help! Thanks Jerry
This is the code I am using to get the incremental surrogate keys:
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
    Inherits UserComponent

    'Declare a variable scoped to class ScriptMain
    Dim counter As Integer

    Public Sub New()
        'This method gets called only once per execution
        'Initialise the variable
        counter = 1093
    End Sub

    'This method gets called for each row in the InputBuffer
    Public Overrides Sub Input_ProcessInputRow(ByVal Row As InputBuffer)
        'Increment the variable
        counter += 1

        'Output the value of the variable
        Row.instance = counter
    End Sub

End Class
--'Instance' is my surrogate field name
but I am getting an error saying that InputBuffer is not defined. Any idea?
If I want to add two more incrementing fields, where do I have to add them?
Sorry if it sounds silly, I am very new to this scripting.
As you can see, there are two company references in my fact table, and the schema is a snowflake. My customer's requirements state that the contracts' amounts can be aggregated or filtered by ServiceProviderCompany and its city/profession, or by ClientCompany and its city/profession.
The first thing that came to my mind is to duplicate the whole dimension structure (one for service providers, one for clients), but I thought there should be another way around?
I use the following 3 sets of SQL code in SQL Server Management Studio Express (SSMSE) to import the csv data/files into 3 dbo tables via CREATE TABLE & BULK INSERT operations:
-- ImportCSVprojects.sql --
USE ChemDatabase
GO
CREATE TABLE Projects
(
ProjectID int,
ProjectName nvarchar(25),
LabName nvarchar(25)
);
BULK INSERT dbo.Projects
FROM 'c:myfileProjects.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = ''
)
GO
=======================================
-- ImportCSVsamples.sql --
USE ChemDatabase
GO
CREATE TABLE Samples
(
SampleID int,
SampleName nvarchar(25),
Matrix nvarchar(25),
SampleType nvarchar(25),
ChemGroup nvarchar(25),
ProjectID int
);
BULK INSERT dbo.Samples
FROM 'c:myfileSamples.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = ''
)
GO
=========================================
-- ImportCSVtestResult.sql --
USE ChemDatabase
GO
CREATE TABLE TestResults
(
AnalyteID int,
AnalyteName nvarchar(25),
Result decimal(9,3),
UnitForConc nvarchar(25),
SampleID int
);
BULK INSERT dbo.TestResults
FROM 'c:myfileLabTests.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = ''
)
GO
========================================
The 3 csv files were successfully imported into the ChemDatabase of my SSMSE.
2 questions to ask:
(1) How can I designate the primary and foreign keys for these 3 dbo tables? Should I do this after the 3 tables are created or during the import?
(2) How can I set up the relationships among these 3 dbo tables?
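As an illustration of what the key designations for these three tables could look like, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the column types are SQLite's. The choice of keys is an assumption: ProjectID and SampleID are taken to be unique, and TestResults is assumed to hold one result per analyte per sample. The helper names and demo rows are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Projects (
    ProjectID   INTEGER PRIMARY KEY,
    ProjectName TEXT,
    LabName     TEXT
);
CREATE TABLE Samples (
    SampleID   INTEGER PRIMARY KEY,
    SampleName TEXT,
    Matrix     TEXT,
    SampleType TEXT,
    ChemGroup  TEXT,
    ProjectID  INTEGER NOT NULL REFERENCES Projects (ProjectID)
);
CREATE TABLE TestResults (
    AnalyteID   INTEGER NOT NULL,
    AnalyteName TEXT,
    Result      NUMERIC,
    UnitForConc TEXT,
    SampleID    INTEGER NOT NULL REFERENCES Samples (SampleID),
    PRIMARY KEY (SampleID, AnalyteID)  -- assumed: one result per analyte per sample
);
"""

def make_chem_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    return conn

def insert_is_rejected(conn, sql, params):
    """Return True if the insert violates a key constraint."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False

conn = make_chem_db()
conn.execute("INSERT INTO Projects VALUES (1, 'Demo project', 'Demo lab')")
add_sample = "INSERT INTO Samples VALUES (?, ?, ?, ?, ?, ?)"
# A sample pointing at an existing project is accepted.
valid_rejected = insert_is_rejected(
    conn, add_sample, (10, "S-10", "Water", "Grab", "Metals", 1))
# A sample pointing at a project that does not exist is refused.
orphan_rejected = insert_is_rejected(
    conn, add_sample, (11, "S-11", "Water", "Grab", "Metals", 999))
```

In SQL Server the same constraints can also be added after the bulk load with ALTER TABLE ... ADD CONSTRAINT, provided the primary key columns are NOT NULL and the loaded data already satisfies the keys.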
We have a table in our ERP database, and we copy data from this table into another "stage" table on a nightly basis. Is there a way to dynamically alter the schema of the stage table when the source table's structure changes? In other words, if a new column is added to the source table, I would like to add the column to the stage table during the nightly refresh.
I am new to the Profisee MDM tool and am currently working on an address verification project for my organization. I would like to clear up my doubts about the Unique Identifier in the stage table and how it works. Here is what I understand so far:
Step 1) I created an Entity using the Profisee MDM UI, and it generated a stage table in the MDS database called stg.Address_leaf
Step 2) I loaded data from an external source into the MDS stage table using ETL and passed Import type as 2 and Import status id as 0
Step 3) I ran the system-generated stored procedure, something like stg.udp_Address_leaf, to load the model and passed the version name as Version_1, Log flag as '1' and Batch tag as 'Address'
Now my questions are below:
1) Which field in the MDS stage table can I use to populate the unique identifier value coming from the source? (Let's say Address_Id is the unique value for all the records coming from the source.)
2) Where and how is the Unique Identifier useful in this process? Will it be useful in the next load from stage to Model?
3) If I truncate and reload my MDS stage table in the next run and a few earlier records have been updated, how will it update those records in the Model? Will this process (the code in the SP) recognize them by the Unique Identifier column in the MDS stage table?
I'm going through my tables and rewriting them so that I can create relationship-based constraints and create foreign keys among my tables. I didn't have a problem with a few of the tables but I seem to have come across a slightly confusing hiccup.
Here's the query for my Classes table:
Code:
CREATE TABLE Classes ( class_id INT IDENTITY PRIMARY KEY NOT NULL,
This statement runs without problems and I create the relationship with my Users table just fine, having renamed it to teacher_id. I have a 1:n relationship between users and classes AND an n:m relationship, because a user can be a student or a teacher; the difference is one field, user_type, which denotes what type of user a person is. In any case, the 1:n relationship from users to classes is that of the teacher instructing the class. The problem arises when I run my query for the intermediary table between the class and the gradebook:
Code:
CREATE TABLE Classes_have_Grades ( class_id INT PRIMARY KEY NOT NULL,
Query Analyzer spits out:
There are no primary or candidate keys in the referenced table 'Classes' that match the referencing column list in the foreign key 'Classes_have_gradesFKIndex2'.
Now, I know in SQL Server 2000 you can only have one primary key. Does that mean I can have a multi-column primary key (which is in fact what I would like), or does that mean that just one field can be a primary key and that a table can have only the one primary key?
In addition, what is a "candidate" key? Will making the other fields "Candidate" keys solve my problem?
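For reference, a table has a single primary key, but that key may span several columns. A candidate key is any set of columns that uniquely identifies a row, and in SQL it is declared with PRIMARY KEY or UNIQUE. A foreign key has to reference exactly such a set. The sketch below shows that rule with Python's built-in sqlite3 module as a stand-in for SQL Server. The column list of the failing foreign key is not visible in the truncated statements above, so the two-column reference (class_id, teacher_id) and the grade_id column are assumptions made for illustration.

```python
import sqlite3

# Parent with only a single-column primary key.
PK_ONLY = (
    "CREATE TABLE Classes ("
    " class_id INTEGER PRIMARY KEY, teacher_id INTEGER NOT NULL)"
)
# Same parent, plus a declared candidate key over both columns.
WITH_CANDIDATE_KEY = (
    "CREATE TABLE Classes ("
    " class_id INTEGER PRIMARY KEY, teacher_id INTEGER NOT NULL,"
    " UNIQUE (class_id, teacher_id))"
)

def child_insert_works(parent_ddl):
    """Return True if a child row can reference (class_id, teacher_id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(parent_ddl)
    conn.execute("INSERT INTO Classes (class_id, teacher_id) VALUES (1, 7)")
    conn.execute(
        "CREATE TABLE Classes_have_Grades ("
        " class_id INTEGER NOT NULL,"
        " teacher_id INTEGER NOT NULL,"
        " grade_id INTEGER NOT NULL,"          # hypothetical column
        " PRIMARY KEY (class_id, grade_id),"   # multi-column primary key
        " FOREIGN KEY (class_id, teacher_id)"
        "   REFERENCES Classes (class_id, teacher_id))"
    )
    try:
        conn.execute("INSERT INTO Classes_have_Grades VALUES (1, 7, 100)")
    except sqlite3.Error:
        # SQLite reports the key mismatch when the child table is first
        # written to, not when it is created.
        return False
    return True

without_key = child_insert_works(PK_ONLY)
with_key = child_insert_works(WITH_CANDIDATE_KEY)
```

SQL Server raises the equivalent complaint when the constraint is created, which is the message quoted above. The usual ways out are to reference only the parent's primary key columns or to declare the referenced column set UNIQUE on the parent.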
I have seen two approaches to primary keys. The first one, which is more or less the default, is to use a surrogate key as the primary key. For each table I create an auto-numbered field that cannot be changed once it has a value. Some materials also refer to this key as a technical primary key. I usually design my databases this way.
The other approach is to build the primary key from the fields that make up the primary key in the logical database model. This approach is less popular and has some side effects, such as slightly clumsy-looking joins and inconvenient use in applications.
Question: What is the main idea behind the second approach? Or how do database designers who use the second approach explain their preference?
The ALTER TABLE statement conflicted with the FOREIGN KEY constraint "FK_tbl_position_tbl_departments". The conflict occurred in database "MyProjectDB", table "dbo.tbl_departments", column 'ID'. All of my tables have an ID field that is a primary key with Allow Null unchecked.
First of all, this is my initial thread here on dbforums. I come from the land of Broadband Reports and would like to say, Hello fellow DB enthusiasts. :)
I'm not a novice to relational databases (Access MDBs), but new to implementing a db via SQL SERVER (2000 in this case) and using Access Data Projects.
Sample Data would be
1 | CRYS | CRYSTAL STAIRS
2 | MAOF | MEXICAN AMERICAN FOUNDATION
3 | PATH | PATHWAYS
4 | CCRC | CHILD CARE RESOURCE CENTER
5 | CHSC | CHILDREN'S HOME SOCIETY OF CALIFORNIA
==========================================
THE REMAINING "FKs" FROM SCHOOL ARE SIMILAR, as are the other tables and their relationships. The foreign keys were defined in SQL using the keywords "REFERENCES" and "FOREIGN KEY."
My questions are:
(1) Is the use of an FK as a constraint any different from using an INDEX, and how?
(2) Should I alter the tables to include CASCADING Up/Down?
(3) Is the use of CHARs OK for the keys?
(4) Have I over/under-normalized any of the relationships?
We have a database with hundreds of tables, each with "CreatedByLoginId" and "ModifiedByLoginId" FK columns back to the Login table. This is all fine and well, but 500+ tables all link back to Login table every time a record is inserted or updated.
For strictly performance reasons, what do you think of us REMOVING the FK constraints on all of our tables? While this does mean that a GUID that is not a valid LoginId could potentially be put in a table, I'm not too worried about it because users don't have direct access to the database.
I have taken on a contract to improve reporting for an old HR database that was developed using FoxPro (Visual FoxPro, I think) with the data stored in SQL Server 2000. There are no foreign keys in SQL Server 2000, so the relationships are maintained inside FoxPro. Is there a way of extracting the relationships from the FoxPro code and generating foreign keys in SQL Server, so that I can do a proper design?
A view named "Viw_Labour_Cost_By_Service_Order_No" has been created and runs successfully on the server. I want to import the data that the view returns into a table using the SQL Server Import and Export Wizard. However, when I run the wizard on the server, it gives me the following error message and stops at the step Setting Source Connection
Operation stopped...
- Initializing Data Flow Task (Success)
- Initializing Connections (Success)
- Setting SQL Command (Success)
- Setting Source Connection (Error)
Messages
Error 0xc020801c: Source - Viw_Labour_Cost_By_Service_Order_No [1]: SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER. The AcquireConnection method call to the connection manager "SourceConnectionOLEDB" failed with error code 0xC0014019. There may be error messages posted before this with more information on why the AcquireConnection method call failed. (SQL Server Import and Export Wizard)
Exception from HRESULT: 0xC020801C (Microsoft.SqlServer.DTSPipelineWrap)
- Setting Destination Connection (Stopped)
- Validating (Stopped)
- Prepare for Execute (Stopped)
- Pre-execute (Stopped)
- Executing (Stopped)
- Copying to [NAV_CSG].[dbo].[Report_Labour_Cost_By_Service_Order_No] (Stopped)
- Post-execute (Stopped)
Has anyone encountered this problem before, and does anyone know what is happening?
I would like to create a table called product. My objective is to show the list of packages available for each product in a data grid view column when a product is selected. Each product may have different package types (e.g. Nos, CTN, OTR, etc.). Some products may have two packages, some three, and so on. The quantity in each package may also differ (e.g. for some products a CTN may contain 12 nos, for others 8 nos). The price for each package is also different and needs to be shown as well. How should I design the table?
Product name | Nestle milk | Rainbow milk
Packages | CTN, OTR, NOs | CTN, NOs
Price | 50, 20, 5 | 40, 6
(Remarks for your reference) | CTN=10 nos, OTR=4 nos | CTN=8 nos
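One possible shape for this, sketched with Python's built-in sqlite3 module as a stand-in for the real database: one row per product, plus one row per package type that a product is sold in. All table and column names are hypothetical. The sample rows assume that the prices above line up with the package types in the order listed and that NOs means a single unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL UNIQUE
);
CREATE TABLE ProductPackage (
    product_id        INTEGER NOT NULL REFERENCES Product (product_id),
    package_type      TEXT    NOT NULL,   -- CTN, OTR, NOs, ...
    units_per_package INTEGER NOT NULL,   -- differs per product
    price             NUMERIC NOT NULL,
    PRIMARY KEY (product_id, package_type)
);
INSERT INTO Product VALUES (1, 'Nestle milk'), (2, 'Rainbow milk');
INSERT INTO ProductPackage VALUES
    (1, 'CTN', 10, 50), (1, 'OTR', 4, 20), (1, 'NOs', 1, 5),
    (2, 'CTN', 8, 40),  (2, 'NOs', 1, 6);
""")

def packages_for(conn, product_name):
    """List (package type, units, price) for one product, largest package first."""
    return conn.execute(
        "SELECT pp.package_type, pp.units_per_package, pp.price"
        " FROM ProductPackage AS pp"
        " JOIN Product AS p ON p.product_id = pp.product_id"
        " WHERE p.product_name = ?"
        " ORDER BY pp.units_per_package DESC",
        (product_name,),
    ).fetchall()

nestle = packages_for(conn, "Nestle milk")
rainbow = packages_for(conn, "Rainbow milk")
```

The result of packages_for is the kind of list a grid column could be filled from when a product is selected. Because quantity and price live on the package row, a CTN can hold 10 units for one product and 8 for another.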
I did not know how to search the archives here yet, so sorry if this is a repeat question.
I am installing SQL Server 2000 Developer edition on my XP Pro workstation and it is failing at the stage of the setup where it attempts to configure the database and run scripts.
17:13:42 Process Exit Code: (-1)
17:13:46 Setup failed to configure the server. Refer to the server error logs and C:\WINDOWS\sqlstp.log for more information.
17:13:46 Action CleanUpInstall:
17:13:46 C:\WINDOWS\TEMP\SqlSetup\Bin\scm.exe -Silent 1 -Action 4 -Service SQLSERVERAGENT
17:13:46 Process Exit Code: (1060) The specified service does not exist as an installed service.
Hello group, I have a question regarding the OLE DB Command Stage. Currently, I am reviewing a Data Flow that runs in production. This Data Flow Inserts to the various dimension tables in our warehouse. For a particular dimension table, the flow is like this:
- Read Source records for Product combinations
- LookUp Product combinations against the current dimProduct table (cached in memory)
- Rows not found are then subjected to another LookUp on the dimProduct table (not cached). This is to find any rows inserted during the current run
- Rows not found are then Inserted to dimProduct using a Stored Procedure invoked by an OLE DB Command
- Successful Inserts then continue on, Rejected Inserts should be captured to a Flat File on our server for review.
Apparently, this last step has never been successful at capturing Rejects. Obviously, we would want to review these records to find the reason for failure. We get an empty file.
Currently, in the Stored Procedure we are using logic like this:
IF @PRODUCTCOUNT <> 0
BEGIN
RAISERROR ('DUPLICATE PRODUCT!', 10, 1)
RETURN
END
Questions:
- Is the RAISERROR command going to give us Output?
- Can we implement the OUTPUT command in our Proc invocation?
- I have not found any documentation that says the OLE DB Command Stage supports Error logging (although columns are available to be added in the Input/Output columns tab). Should we be using another Stage to accomplish this?
Can anyone give a summary of the differences between DataStage and SSIS? We are in the process of migrating existing jobs from DataStage to SSIS, so I am preparing a comparative report on the two. Please help.
ALTER TABLE [Students] WITH CHECK ADD CONSTRAINT [FK_Students_Schools] FOREIGN KEY([SchoolId]) REFERENCES [Schools] ([SchoolId])
What kind of index would ensure best performance for INSERTs/UPDATEs, so that SQL Server can most efficiently check the FK constraints? Would it be simply:
CREATE INDEX IX_Students_SchlId ON Students (SchoolId)
Or
CREATE INDEX IX_Students_SchlId ON Students (SchoolId, StudentId)
In other words, what's best practice for adding an index which best supports a Foreign Key constraint?
I have created a sample data flow to load the employee details (empid, empname, empaddr) from a flat file into an Oracle 9i database table named employee (columns: empid, empname, empaddress, all varchar2(15)) using the SLOWLY CHANGING DIMENSION transformation for insert/update on the table.
EMPID as Business key
EMPNAME and EMPADDR as changing attributes.
Connection string is using Microsoft oledb provider for oracle.
TITLE: Microsoft Visual Studio ------------------------------
Error at Data Flow Task [OLE DB Command 1 [2007]]: An OLE DB error has occurred. Error code: 0x80040E51. An OLE DB record is available. Source: "Microsoft OLE DB Provider for Oracle" Hresult: 0x80040E51 Description: "Provider cannot derive parameter information and SetParameterInfo has not been called.".
Error at Data Flow Task [OLE DB Command 1 [2007]]: Unable to retrieve destination column descriptions from the parameters of the SQL command.
Warning at {CF5DCB64-279E-45A4-A9A8-FF2FBB130980} [Insert Destination [1972]]: Cannot retrieve the column code page info from the OLE DB provider. If the component supports the "DefaultCodePage" property, the code page from that property will be used. Change the value of the property if the current string code page values are incorrect. If the component does not support the property, the code page from the component's locale ID will be used.
Errors were encountered while generating the wizard results: Error at Data Flow Task [OLE DB Command [1996]]: An OLE DB error has occurred. Error code: 0x80040E51. An OLE DB record is available. Source: "Microsoft OLE DB Provider for Oracle" Hresult: 0x80040E51 Description: "Provider cannot derive parameter information and SetParameterInfo has not been called.".
Error at Data Flow Task [OLE DB Command [1996]]: Unable to retrieve destination column descriptions from the parameters of the SQL command.
Error at Data Flow Task [OLE DB Command 1 [2007]]: An OLE DB error has occurred. Error code: 0x80040E51. An OLE DB record is available. Source: "Microsoft OLE DB Provider for Oracle" Hresult: 0x80040E51 Description: "Provider cannot derive parameter information and SetParameterInfo has not been called.".
Error at Data Flow Task [OLE DB Command 1 [2007]]: Unable to retrieve destination column descriptions from the parameters of the SQL command.
For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%u00ae+Visual+Studio%u00ae+2005&ProdVer=8.0.50727.42&EvtSrc=Microsoft.DataTransformationServices.Design.SR&EvtID=ScdWizardGenerationErrors&LinkId=20476
Please let me know how I can generate a script for all the primary keys and foreign keys of a table. That script could then be used to add the primary keys and foreign keys in another database with the same structure.
Also, how do I script the default and other constraints of a table?
There is an itemlookup table, which stores the item number and itemdescription. There is also a child table of the itemlookup table, pricehistory, which stores different prices for different date ranges. So it is a one-to-many relationship. (More than one price can be stored, each with a different date range.) And there is another table, RequestItem, that stores the foreign key of the itemlookup table to show the information from the itemlookup table. How do I then know later which date range of the price was stored in the RequestItem table? Since I only keep the foreign key of the itemlookup table, it will be impossible to keep track of the row in the pricehistory table if more than one row exists in the pricehistory table. Would it be a valid table structure to create a column for the foreign key of the pricehistory table in the RequestItem table, or are there other ways to handle this issue?
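The structure being asked about can be sketched like this, with Python's built-in sqlite3 module as a stand-in for the real database. Apart from the three table names and itemdescription, every column name and all of the sample data are hypothetical. RequestItem keeps its item reference and gains a second foreign key that pins the exact pricehistory row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE itemlookup (
    item_id         INTEGER PRIMARY KEY,
    item_number     TEXT NOT NULL UNIQUE,
    itemdescription TEXT
);
CREATE TABLE pricehistory (
    price_id   INTEGER PRIMARY KEY,
    item_id    INTEGER NOT NULL REFERENCES itemlookup (item_id),
    price      NUMERIC NOT NULL,
    valid_from TEXT NOT NULL,   -- ISO dates, illustrative only
    valid_to   TEXT NOT NULL
);
CREATE TABLE RequestItem (
    request_item_id INTEGER PRIMARY KEY,
    item_id  INTEGER NOT NULL REFERENCES itemlookup (item_id),
    price_id INTEGER NOT NULL REFERENCES pricehistory (price_id)
);
INSERT INTO itemlookup VALUES (1, 'A-100', 'Widget');
INSERT INTO pricehistory VALUES
    (1, 1, 9,  '2015-01-01', '2015-06-30'),
    (2, 1, 12, '2015-07-01', '2015-12-31');
INSERT INTO RequestItem VALUES (1, 1, 1), (2, 1, 2);
""")

def priced_request(conn, request_item_id):
    """Return (item number, price, valid_from, valid_to) for one request row."""
    return conn.execute(
        "SELECT i.item_number, ph.price, ph.valid_from, ph.valid_to"
        " FROM RequestItem AS r"
        " JOIN itemlookup AS i ON i.item_id = r.item_id"
        " JOIN pricehistory AS ph ON ph.price_id = r.price_id"
        " WHERE r.request_item_id = ?",
        (request_item_id,),
    ).fetchone()

second_request = priced_request(conn, 2)
```

One caveat with this sketch: nothing stops a RequestItem row from naming a price that belongs to a different item. Referencing pricehistory alone, or using a two-column foreign key on (item_id, price_id), would close that gap.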
Hello all,
I got a chance to peek into a database system. Part of its design is rather unfamiliar to me. When I look at the diagram generated by SQL Server, there are many floating tables. Eventually it turns out that these floating tables are actually not floating. Their table names relate to fields (as TableID) in other tables. In this case, you can get a handle on one of these tables by searching the TableID columns in other tables.
To be more specific, the database is a microarray database implemented in SQL Server 2000. They have a table called MICROARRAYS. In this table, there is a column called table_id. These table_ids are in fact the table names of a bunch of other tables.
My questions are
1) Is this good relational design?
2) How well is this kind of design supported in SQL Server?
3) Are there better alternatives?
Any other comment or link to helpful resources will also be appreciated.
Thanks,
Eric Wu
Can I have a combination of sources, some with Unique Identifiers and some without?
I need to know what happens when I have the option “Create Code values automatically” selected for the entity and at the same time pull Unique Identifiers for other sources into the same entity stage table.
When we select the option “Create Code values automatically” for the entity we create, then during the load from the external source to MDM stage we don’t send any values to the MDM stage “Code” field, and in the next step, when we execute the stored procedure to load the data from MDM stage to the MDM model, it assigns the Code values to each record in MDM stage and the MDM model.
I need to know whether, after executing the stored procedure, it will assign Code values only to the NULL records in MDM stage and not raise any error for the records that already have Code values in MDM stage.
I'm using a SqlDataSource. When an update changes a primary key, how do you cascade the change to other tables? Also, when deleting, does the cascade delete the row from the other table or only set the primary key to NULL?
I have two tables as below. I want to update a quote and change the item that it is for. So I want to run an UPDATE statement to change the cat_ref that the quote refers to. However, when I do this I get a foreign key conflict with cat_ref in the item table. How do I get around this? Thanks