SQL Server 2012 :: How To Create Staging Table To Handle Incremental Load
Jan 2, 2014
We are designing a staging layer to handle incremental loads, and I want to start with a simple scenario for the design.
In the source database there are two tables, e.g. tbl_Department and tbl_Employee. Both tables load a single table in the destination database, e.g. tbl_EmployeRecord.
The query which loads tbl_EmployeRecord is:
SELECT EMPID, EMPNAME, DEPTNAME
FROM tbl_Department D
INNER JOIN tbl_Employee E ON D.DEPARTMENTID = E.DEPARTMENTID
Now we need to identify the incremental changes in tbl_Department and tbl_Employee, store them in staging, and load only the increment to the destination.
The columns of the tables are,
tbl_Department : DEPARTMENTID,DEPTNAME
tbl_Employee : EMPID,EMPNAME,DEPARTMENTID
tbl_EmployeRecord : EMPID,EMPNAME,DEPTNAME
How do we design the staging for this to handle Insert, Update and Delete?
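A minimal sketch of one common design follows, assuming each staging table mirrors its source plus a RowAction flag ('I'/'U'/'D') populated by the extract step; the staging table names and the flag column are assumptions, not from the post.

-- Apply deletes: employees flagged 'D' are removed from the destination.
DELETE t
FROM tbl_EmployeRecord t
JOIN stg_Employee s ON s.EMPID = t.EMPID
WHERE s.RowAction = 'D';

-- Apply inserts and updates in one pass with MERGE.
MERGE tbl_EmployeRecord AS t
USING (
    SELECT e.EMPID, e.EMPNAME, d.DEPTNAME
    FROM stg_Employee e
    JOIN tbl_Department d ON d.DEPARTMENTID = e.DEPARTMENTID
    WHERE e.RowAction IN ('I', 'U')
) AS s
ON t.EMPID = s.EMPID
WHEN MATCHED THEN
    UPDATE SET t.EMPNAME = s.EMPNAME, t.DEPTNAME = s.DEPTNAME
WHEN NOT MATCHED THEN
    INSERT (EMPID, EMPNAME, DEPTNAME)
    VALUES (s.EMPID, s.EMPNAME, s.DEPTNAME);

Note that a department rename captured in a staged tbl_Department delta also changes DEPTNAME for employees that are otherwise unchanged, so department changes need a second pass that updates tbl_EmployeRecord for all affected EMPIDs.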
I'm attempting to load some data into an explicit hierarchy in MDS 2012 via the staging table and am struggling with the HierarchyName field. Specifically, I'm loading data into stg.[Entity Name]_Consolidated and using the exact name of the explicit hierarchy I've set up in the front-end web application.
Originally my hierarchy was labelled "Reporting Hierarchy"; when loading the data into staging using this name and then running the batch from the Import Data screen, I see the error message "Error - The HierarchyName is missing or is not valid.". I've checked the table mdm.tblHierarchy and can see that the name there is exactly as it is in the staging table, and I have since renamed the hierarchy to "Reporting_Hierarchy" with the same results.
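For reference, a hypothetical sketch of staging one consolidated member is below; the column list follows the documented MDS 2012 consolidated staging table, but verify it against your own stg.[Entity Name]_Consolidated definition. Besides the exact name match against mdm.tblHierarchy, trailing whitespace in the staged value and a model/version mismatch are worth ruling out.

-- Hypothetical example; adjust the entity name, batch tag and columns to your model.
INSERT INTO stg.[Entity Name]_Consolidated
    (ImportType, ImportStatus_ID, BatchTag, HierarchyName, Code, Name)
VALUES
    (0, 0, N'ConsolidatedLoad1', N'Reporting Hierarchy', N'NODE01', N'Node 01');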
We need to implement an incremental load in the database. A sample scenario: there is a view (INCOMEVW) which is built on top of a query like
CREATE VIEW INCOMEVW AS
SELECT CLIENTID, COUNTRYNAME, SUM(OUTPUT.INCOME) AS INCOME
FROM (SELECT EOCLIENT_ID AS CLIENTID, EOCOUNTRYNAME AS COUNTRYNAME, EOINCOME AS INCOME
      FROM EOCLIENT C
      INNER JOIN EOCOUNTRY CT ON C.COUNTRYCODE = CT.COUNTRYCODE
[code]...
This is a sample view. As of now there is a full load happening from the source (SELECT * FROM INCOMEVW) into the target table tbl_Income. We need to pick only the delta and load it to the target table using a staging area. The challenges are:
1) If we get the delta (inserted, updated or deleted rows) in the source tables EOCLIENT, EOCOUNTRY, ENCLIENT, ENCOUNTRY, how do we load the increment into the single target table tbl_Income?
2) How do we do the SUM operation with GROUP BY in an incremental load?
3) We are planning a daily incremental load and are thinking of creating the same table structure as the source, with Date and Flag columns to identify the date and whether each source row is an Insert, Update or Delete. But we are not sure how to frame something like this view and load it to a single target with SUM operations; see the sketch below.
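For the SUM/GROUP BY question, one common pattern is to re-aggregate only the groups touched by the delta rather than maintaining the sums arithmetically; deletes fall out naturally because the affected groups are recomputed from current source data. A minimal sketch, assuming a staging table stg_Income carrying the proposed Date and Flag columns (all names are placeholders):

-- Collect the aggregation keys touched by today's delta.
SELECT DISTINCT CLIENTID, COUNTRYNAME
INTO #ChangedKeys
FROM stg_Income
WHERE RowDate = CAST(GETDATE() AS date);

-- Remove those groups from the target...
DELETE t
FROM tbl_Income t
JOIN #ChangedKeys k
  ON k.CLIENTID = t.CLIENTID AND k.COUNTRYNAME = t.COUNTRYNAME;

-- ...and re-aggregate only those groups from the view.
INSERT INTO tbl_Income (CLIENTID, COUNTRYNAME, INCOME)
SELECT v.CLIENTID, v.COUNTRYNAME, v.INCOME
FROM INCOMEVW v
JOIN #ChangedKeys k
  ON k.CLIENTID = v.CLIENTID AND k.COUNTRYNAME = v.COUNTRYNAME;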
What is the best way to transfer data from the staging table into the main table?
Example: Staging table name: TableA_stage (# of rows: millions). Main table name: TableA_main (# of rows: billions).
Note: the staging table may have some data that is the same as the main table.
Currently I am doing the following:
* Load data into the staging table (TableA_stage)
* Remove any duplicated rows from the staging table (TableA_stage)
* Disable all indexes on the main table (TableA_main)
* Insert into the main table (TableA_main) from the staging table (TableA_stage)
* Remove any duplicated rows from the main table using a CTE (TableA_main)
* Rebuild the indexes on the main table (TableA_main)
The problem with the above method is that it takes a lot of time and the log file grows very large.
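A batched, filter-first variant can bound both problems: skipping rows that already exist in the main table avoids the post-insert dedup pass, and chunking keeps each transaction, and therefore the log, small. A minimal sketch, assuming an Id key and placeholder columns; the key index on TableA_main should stay enabled so the NOT EXISTS probe remains cheap.

DECLARE @batch int = 1000000, @rows int = 1;

WHILE @rows > 0
BEGIN
    -- Each iteration is its own transaction, so the log can be reused
    -- (SIMPLE/BULK_LOGGED recovery) or backed up (FULL recovery) between batches.
    INSERT INTO TableA_main (Id, Col1, Col2)
    SELECT TOP (@batch) s.Id, s.Col1, s.Col2
    FROM TableA_stage AS s
    WHERE NOT EXISTS (SELECT 1 FROM TableA_main AS m WHERE m.Id = s.Id);

    SET @rows = @@ROWCOUNT;
END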
I have a transaction table with about 40 crore (400 million) rows in the source. It has no timestamp or unique key columns, only Bill_Month and Bill_Year columns. For loading this table into staging I added a new datetime column, building a default bill_date with the day set to 01. Then:
* First we delete the last 3 months of data from the staging tables.
* Get the last 3 months of data from the source table.
* Load those 3 months of data from source into the staging table.
We do this because we only get updates for the last three months of data. Now I have to include this transaction table as a fact table in the DW. What will be the best practice for loading the fact table by picking data from the staging table? We also have to look up dimensions for the foreign keys.
* Should I implement the same method of deleting the last 3 months of records and loading them again?
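Extending the same rolling-window approach to the fact table is a reasonable starting point: delete the window from the fact and reload it from staging while resolving the dimension surrogate keys. A minimal sketch with placeholder fact, staging and dimension names:

DECLARE @WindowStart date = DATEADD(MONTH, -3, CAST(GETDATE() AS date));

-- Remove the reloadable window from the fact table.
DELETE FROM FactTransaction
WHERE bill_date >= @WindowStart;

-- Reload it from staging, looking up the dimension surrogate keys.
INSERT INTO FactTransaction (DateKey, CustomerKey, Amount)
SELECT d.DateKey, c.CustomerKey, s.Amount
FROM StagingTransaction s
JOIN DimDate d     ON d.FullDate = s.bill_date
JOIN DimCustomer c ON c.CustomerBusinessKey = s.CustomerId
WHERE s.bill_date >= @WindowStart;

If the windowed delete becomes too expensive at this row count, monthly partitioning with partition switching is the usual next step.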
I have one stored procedure which loads data from a flat file into a staging table dynamically, and everything is working fine. The Staging_temp table has a single column, and all the data is stored in that single column; below is the sample data.
I am working on an HR project and I have one final component that I am stuck on.
I have an Excel File that is loaded into a folder every month.
I have built a package that captures the data from the excel file and loads it into a staging table (transforming a few bits of data).
I then combine it with another table in a view.
I have another package that loads that view into a Master table and I have added a Slowly Changing Dimension so that it only updates what has been changed. (it’s a table of all employees, positions, hire dates, term dates etc).
Our HR wants to have this data in a report (with charts and tables) and they wanted it to be in a familiar format. So I made a data connection with Excel loading the data into a series of pivot tables.
I have one final component that I can't seem to figure out. At the end of every year I need to capture a count of all active employees and all termed employees for that year. Just a count.
The data is in one table labeled [EEMaster]. To test the count I have the following.
SELECT COUNT([PersNo]) AS HistoricalHC FROM [dbo].[EEMaster] WHERE [ChangeStatus] = 'Current' AND [EmpStatusName] = 'Active'
This returns the HistoricalHC for 2013 as 418.
SELECT COUNT([PersNo]) AS NumbOfTermEE FROM [dbo].[EEMaster] WHERE [ChangeStatus] = 'Current' AND [EmpStatusName] = 'Withdrawn' AND [TermYear] = '2013'
This returns the number of termed employees for 2013 as 42.
I have created a table to report from, called [dbo].[TORateFY], into which I have manually entered previous years' data.
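A hedged sketch of automating the year-end capture follows, assuming dbo.TORateFY has columns along the lines of (FiscalYear, HistoricalHC, NumbOfTermEE); a SQL Agent job scheduled at year end could run:

DECLARE @Year char(4) = CAST(YEAR(GETDATE()) AS char(4));

INSERT INTO dbo.TORateFY (FiscalYear, HistoricalHC, NumbOfTermEE)
SELECT
    @Year,
    (SELECT COUNT([PersNo]) FROM [dbo].[EEMaster]
     WHERE [ChangeStatus] = 'Current' AND [EmpStatusName] = 'Active'),
    (SELECT COUNT([PersNo]) FROM [dbo].[EEMaster]
     WHERE [ChangeStatus] = 'Current' AND [EmpStatusName] = 'Withdrawn'
       AND [TermYear] = @Year);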
After the staging_temp data gets inserted into the main table, my problem is handling a file where the number of columns is greater than in the actual table.
If you look at the sample rows, there are 4 columns separated by "¯", but I actually have only 3 columns in my main table. So how can I get only the first 3 columns from the staging_temp table?
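A minimal sketch of pulling just the first three "¯"-separated values out of the single staging column; Col1 is a placeholder column name, and rows are assumed to carry at least three values. Appending a trailing delimiter makes the third CHARINDEX safe even when extra columns follow.

SELECT
    LEFT(v.s, p1.p - 1)                       AS Col_1,
    SUBSTRING(v.s, p1.p + 1, p2.p - p1.p - 1) AS Col_2,
    SUBSTRING(v.s, p2.p + 1, p3.p - p2.p - 1) AS Col_3
FROM Staging_temp t
CROSS APPLY (SELECT t.Col1 + N'¯' AS s) v
CROSS APPLY (SELECT CHARINDEX(N'¯', v.s) AS p) p1
CROSS APPLY (SELECT CHARINDEX(N'¯', v.s, p1.p + 1) AS p) p2
CROSS APPLY (SELECT CHARINDEX(N'¯', v.s, p2.p + 1) AS p) p3;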
I have a table on Server A and it has 5 columns, including Address, ID, CreateDatetime, etc.
I need to transfer data from this table from Server A to Server B for a report purpose. The Address column in this table sometimes has two addresses in one value. An example is below:
Address
Houston, Dallas
Redmond
Sacramento
New Jersey, New York
I want to skip the rows where the address holds two values (Houston, Dallas and New Jersey, New York) when loading the destination table in Server B, and I need to do incremental loads. How should I proceed with this?
The version we are using is SQL Server 2008 R2 Standard Edition.
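A minimal sketch of the extract query, assuming CreateDatetime can serve as the incremental watermark and that a comma in Address marks the two-address rows to skip; @LastLoaded would come from a control table on Server B.

DECLARE @LastLoaded datetime = '2014-01-01';  -- placeholder; read from a control table

SELECT ID, Address, CreateDatetime            -- remaining columns elided
FROM dbo.SourceTable                          -- placeholder name on Server A
WHERE Address NOT LIKE '%,%'                  -- skips 'Houston, Dallas' style rows
  AND CreateDatetime > @LastLoaded;           -- incremental slice only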
In my ETL job I would like to truncate the staging table, but before truncating it I want to make sure that all the records have been inserted into the data model. A sample is below:
create table #stg (CreateID int, Name nvarchar(10))
insert into #stg
select 1, 'a' union all
select 2, 'b' union all
[Code] ....
How can I check between these tables and make sure that all values have been loaded into the data model, so that the staging table can be truncated?
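One way is an EXCEPT check: it returns any staging rows absent from the data model, so an empty result means the staging table is safe to truncate. A minimal sketch, assuming the model table is dbo.DataModel with the same columns:

IF NOT EXISTS (
    SELECT CreateID, Name FROM #stg
    EXCEPT
    SELECT CreateID, Name FROM dbo.DataModel
)
BEGIN
    TRUNCATE TABLE #stg;
END
ELSE
BEGIN
    RAISERROR('Staging rows are missing from the data model; not truncating.', 16, 1);
END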
I'm trying to use Excel in SSIS to import data from a spreadsheet into a staging table. The package runs well from the web server using SSMS, but when I deploy and try to execute the package I get the error below. My question: do I have to install the AccessDatabaseEngine driver on the SQL database server or on the web server where I'm executing the SSIS package?
Error: The requested OLE DB provider Microsoft.Jet.OLEDB.4.0 is not registered. If the 64-bit driver is not installed, run the package in 32-bit mode.
Has anyone seen a case where including or excluding a single column in an empty staging table influences the resultset returned by a distributed query? Both servers are SQL Server 2012. There is nothing special about the staging table: it contains 12 columns, a mixture of INT and NVARCHAR(256) columns. In one case I exclude the column and the query returns in 17 seconds; when I include it, the query does not return. With the INSERT INTO the staging table removed, the query returns in 17 seconds both with and without the column.
I've created an SSIS package that loads data from source to destination, using Lookup and Conditional Split to check for new rows and changed rows for one table.
Now I want to take this further by loading data for multiple tables, more than 100. I did this in T-SQL using dynamic SQL and a cursor.
I am attempting to perform an incremental load, inserting new rows and updating existing rows. I am using a Lookup and everything works fine, except on the first load of the destination table: as there are no records in the table at all, the lookup fails. I thought of using a row count; if the count variable is zero, load everything from the temporary table into the load table, otherwise perform the lookup and incremental load.
We are in the process of converting our existing incremental loads from DTS to SSIS.
Currently we get all the data for the past month into temp tables in the warehouse, compare on key fields, add the new rows, and update the changed rows. All of this is done using Execute SQL tasks.
Is there a better way to implement the incremental logic using SSIS, any new components that can be used to avoid so much hand-written SQL? Performance is very important, and we do a lot of aggregation after the load so that the reports run faster and we can meet customer SLAs.
We have around 20 tables that need to be loaded. Four have large amounts of data, between 20 and 40 million rows, of which we bring over around 100 thousand during each incremental run. The other tables have fewer than 100,000 rows, so it does not hurt to truncate and reload them entirely.
Hi everyone. I'm trying to figure out how to run an incremental load into a Staging table.
At this point I'm not trying to Conditional Split it between "New" and "Changed" records... just the load.
The logic in my head says that after each load, you can take the most recent "modified" date/time and store that in an incremental load table. That way, next time you run an incremental load, you just have to look up that "modified" date/time, and only load the source records with a "modified" date/time later than the record in your incremental load table. Does that plan sound feasible?
I think my problem so far is that my source is on an ADO.NET connection and my incremental load table is on my SQL Server, so when I do my load from the ADO.NET database I cannot read the data from the incremental load table.
Is my logic flawed? Any help would be appreciated.
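The plan is feasible; it is the classic high-water-mark pattern. The cross-connection issue is usually solved by reading the watermark with an Execute SQL task on the SQL Server connection into a package variable, then using that variable to parameterize the ADO.NET source query. A minimal sketch with placeholder names:

-- Control table on the SQL Server side (created once).
CREATE TABLE dbo.EtlWatermark (
    TableName    sysname      NOT NULL PRIMARY KEY,
    LastModified datetime2(3) NOT NULL
);

-- Before the data flow (Execute SQL task): read the watermark into a variable.
DECLARE @LastModified datetime2(3);
SELECT @LastModified = LastModified
FROM dbo.EtlWatermark
WHERE TableName = N'SourceTable';

-- The source query is then parameterized with that variable, e.g.:
-- SELECT * FROM SourceTable WHERE ModifiedDate > @LastModified;

-- After the data flow succeeds (another Execute SQL task): advance the mark.
DECLARE @NewMax datetime2(3);  -- set by the package to MAX(ModifiedDate) of the rows just loaded
UPDATE dbo.EtlWatermark
SET LastModified = @NewMax
WHERE TableName = N'SourceTable';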
I have a fact table which loads through a package. Currently, when I run the package, I truncate the fact table and load fresh data. Instead of truncating the fact table, I have to implement incremental logic.
For this I can use SCD or Conditional Split, but the problem here is that I have many source tables loading this fact table, so it is very difficult to trace the changes across the different source tables.
We have built a PowerPivot model that uses a couple of data sources. On a monthly basis the user extracts an Excel file with 20 samples (chosen randomly) from the model to check some aspects of the input. He can put Yes or No into a column to indicate whether he agrees, and fill in a remark field. Next, this Excel file is loaded back into the PowerPivot model (in a nightly batch) and the Yes/No field and the remark field are added to the original data. If the user refreshes his Excel sample file from the pivot, the remark and Y/N fields are filled with the values he added. So far so good. One month later, when he extracts the file for a new month, a problem arises: since the Excel file only has data for the new month, when the PP model is reloaded the data for the previous month is lost. So we need some sort of incremental load for the Excel file. Is that possible, and is this the best way to handle this situation?
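One hedged way to handle this is to persist the review fields outside the monthly extract, in a small relational table keyed by the sampled row, and include that table as a data source of the PowerPivot model; the nightly batch then upserts only the rows present in the latest Excel file, and earlier months are untouched. A sketch with placeholder names:

CREATE TABLE dbo.SampleReview (
    SampleKey   int           NOT NULL PRIMARY KEY,  -- key of the sampled row
    ReviewMonth date          NOT NULL,
    Agrees      char(1)       NULL,                  -- 'Y' / 'N'
    Remark      nvarchar(400) NULL
);

-- Nightly batch: upsert the latest Excel rows (landed in a staging table).
MERGE dbo.SampleReview AS t
USING stg.ExcelSample AS s
    ON t.SampleKey = s.SampleKey
WHEN MATCHED THEN
    UPDATE SET t.Agrees = s.Agrees, t.Remark = s.Remark
WHEN NOT MATCHED THEN
    INSERT (SampleKey, ReviewMonth, Agrees, Remark)
    VALUES (s.SampleKey, s.ReviewMonth, s.Agrees, s.Remark);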
I am trying to insert bulk data into a main table from a staging table in SQL Server 2012. If any error occurs, the whole activity is rolled back. I don't want that to happen: I want to know the records where the problem occurs, and the rest have to be inserted.
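Since a single INSERT ... SELECT is all-or-nothing, one hedged pattern is to route the problem rows to an error table first and insert only the clean remainder; the predicates below (null keys, duplicates) are placeholders for whatever actually fails in your load.

-- Capture the rows that would fail, with a reason.
INSERT INTO dbo.StagingErrors (Id, Payload, ErrorReason)
SELECT s.Id, s.Payload,
       CASE WHEN s.Id IS NULL THEN 'NULL key'
            ELSE 'Duplicate of main table' END
FROM dbo.Staging s
WHERE s.Id IS NULL
   OR EXISTS (SELECT 1 FROM dbo.Main m WHERE m.Id = s.Id);

-- Insert everything else.
INSERT INTO dbo.Main (Id, Payload)
SELECT s.Id, s.Payload
FROM dbo.Staging s
WHERE s.Id IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM dbo.Main m WHERE m.Id = s.Id);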
I am inserting into a temp table without creating it first, but this does not give a compilation error. Only when I execute the stored procedure do I get the error message that the temp table is invalid. Shouldn't this result in a compilation error rather than an execution-time one?
--create the procedure and insert into the temp table without creating it.
--no compilation error.
CREATE PROC testTemp
AS
BEGIN
    INSERT INTO #tmp(dt)
    SELECT GETDATE()
END
Only on calling the proc does this give an execution error.
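This is SQL Server's deferred name resolution: at CREATE PROCEDURE time only the syntax is checked, and a reference to a table that does not yet exist (temp tables included) is not resolved until the procedure is compiled at execution. Creating the table inside the procedure, as sketched below, makes the insert valid at run time:

CREATE PROC testTemp
AS
BEGIN
    -- Create the temp table before referencing it.
    CREATE TABLE #tmp (dt datetime);

    INSERT INTO #tmp (dt)
    SELECT GETDATE();
END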
Hi, does anyone know how to create a SQL table and then import a list just by clicking a button that calls a procedure?
CREATE TABLE clients (
    ClientID VARCHAR(5),
    ClientName VARCHAR(30),
    PRIMARY KEY (ClientID)
);
LOAD DATA LOCAL INFILE 'C:/client.csv' INTO TABLE clients
LINES TERMINATED BY ' ';
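Note that LOAD DATA LOCAL INFILE is MySQL syntax; on SQL Server the rough equivalent is BULK INSERT, which the button's click handler could invoke via a stored procedure. A minimal sketch, assuming a comma-separated file:

CREATE TABLE clients (
    ClientID   VARCHAR(5) PRIMARY KEY,
    ClientName VARCHAR(30)
);

BULK INSERT clients
FROM 'C:\client.csv'
WITH (
    FIELDTERMINATOR = ',',   -- assumed CSV layout
    ROWTERMINATOR   = '\n'
);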
I used SQL Server 2012 Management Studio to create a new table on a SQL Server 2014 instance and got this message: 'This backend version is not supported to design database diagrams or tables'. Does this mean that I have to have SQL Server 2014 Management Studio to create a table on a SQL Server 2014 instance?
I am trying to create a table that would represent the workload for each shop. In order to do that I need to have a WorkLoad table and a ShopWorkLoad table, which is actually just an aggregation of WorkLoad.
WorkLoad contains a list of the following items:
* current orders that are in process (one select statement)
* scheduled orders (another select statement)
* expected orders (a third select statement) that come through a third-party system
All of this needs to be live. So, for example, as soon as an order is added to the Order table it should be included in WorkLoad if certain conditions are met. The same goes for scheduled orders (which come from another table). Expected orders will be loaded on a daily basis (based on historical data).
The ShopWorkLoad table is an aggregation of the WorkLoad table.
Currently I did it this way:
* Added an AFTER INSERT/UPDATE trigger on the Order table: when an order is created or updated, if it meets certain conditions it is inserted into WorkLoad; otherwise it is removed from WorkLoad if it is in there and no longer meets the conditions.
* Added an AFTER INSERT/UPDATE trigger on the Schedule table: when an order is scheduled, if it meets certain conditions it is inserted into WorkLoad; otherwise it is removed from WorkLoad if it is in there and no longer meets the conditions.
* A daily job populates the WorkLoad table with expected orders based on historical values.
The final step is to create an indexed view, vShopWorkLoad.
My biggest concern is the usage of triggers which call pretty complex logic to determine whether an item should be added to the workload or not.
One other option was to create a vWorkLoad view and somehow make it an indexed view, but currently I don't see a way of doing that because the query consists of 4 UNION SELECT statements; a pseudo example is below. But even if we did it that way, how would we build an aggregated indexed view on top of the vWorkLoad indexed view?
A third option is to use a SQL Agent job which runs every x seconds (maybe 20) and executes all of these queries to populate the WorkLoad table with a delay of 10-20 seconds, but I am still not sure whether this is acceptable to the client.
A fourth option is to create 3 or 4 indexed views whose union makes up the workload. Then the ShopWorkLoad view would be built on top of these 3 or 4 indexed views, but in this case I don't know how this would affect performance, since ShopWorkLoad would be queried often. A sketch of this option follows the pseudo query below.
Example of workload pseudo query:
select WorkLoadType = 'Order in process', OrderId, ShopId, ... from Order
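For what it's worth, an indexed view cannot contain UNION, which rules out indexing vWorkLoad directly and points toward the fourth option. A sketch of one per-type indexed view, with placeholder names and conditions:

CREATE VIEW dbo.vShopOrderLoad
WITH SCHEMABINDING
AS
SELECT ShopId,
       COUNT_BIG(*) AS OrderCount   -- COUNT_BIG is required in an indexed view
FROM dbo.[Order]
WHERE StatusId = 1                  -- placeholder for the 'in process' conditions
GROUP BY ShopId;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vShopOrderLoad
    ON dbo.vShopOrderLoad (ShopId);

The trade-off is that SQL Server maintains the index on every write to the base table, which may be cheaper or dearer than the triggers depending on write volume.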
I have around 100 flat files that need to be loaded into their respective staging tables. I want to create each table at run time based on the input filename: if the input filename is ABC, the table name should be Staging_ABC; if the filename is XYZ, it should be Staging_XYZ. The table structure below needs to be created at run time:
CREATE TABLE Staging_'Filename'(
    [COL001] [varchar](4000) NULL,
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [LoadDate] [datetime] NOT NULL DEFAULT GETDATE()
)
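Since CREATE TABLE cannot take its name from a variable directly, this is usually done with dynamic SQL. A minimal sketch, assuming @FileName arrives as a parameter (for example from an SSIS ForEach loop); QUOTENAME guards the generated identifier:

DECLARE @FileName sysname = N'ABC';   -- placeholder input
DECLARE @sql nvarchar(max) = N'
CREATE TABLE ' + QUOTENAME(N'Staging_' + @FileName) + N' (
    [COL001]   [varchar](4000) NULL,
    [Id]       [int] IDENTITY(1,1) NOT NULL,
    [LoadDate] [datetime] NOT NULL DEFAULT GETDATE()
);';

EXEC sys.sp_executesql @sql;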
We have a table of 100M rows, and up until now we were fine with a nonclustered index on a varchar(4000) column because we never went above 900 bytes (yes, it is a bad design). We need to support international character sets now, so the column was changed to nvarchar(4000), and we now have data past the 900-byte index key limit.
The data is long and seems useless, but it is needed by the business, and they need to be able to search with "where bigcolumn like 'test%'". With an index, even with a huge amount of data, this was fast. Now, of course, without an index it is unusable. The wildcard is always at the end of the search. I made a full-text index on the column, and basic queries such as select * from ourtable where contains(bigcolumn, 'AReallyLongStringofTextHere') work fine unless there is a space in the data. We lose thousands of returned rows because of spaces in the data.
I have tried select * from ourtable where contains(bigcolumn, '"AReallyLongStringofTextHere that includes spaces"'), but not all of the data is returned: I get 112 rows with the CONTAINS statement, while the table-scanning statement select * from ourtable where bigcolumn like 'AReallyLongStringofTextHere that includes spaces%' returns 1939 rows. I understand that the full-text index is breaking the long string up since it contains spaces. Is there a way to retain the entire string as one index entry, or is there a way to fix my query to return all of the rows?
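Full-text indexing tokenizes on the spaces, so the string will not be kept as a single index entry. One hedged workaround for a trailing-wildcard search: index a persisted prefix of the column (450 characters of nvarchar is 900 bytes, the 2012 index key limit) and let the full column re-check the match.

-- Add and index a 450-character prefix of the wide column.
ALTER TABLE ourtable
    ADD bigcolumn_prefix AS CONVERT(nvarchar(450), LEFT(bigcolumn, 450)) PERSISTED;

CREATE INDEX IX_ourtable_prefix ON ourtable (bigcolumn_prefix);

-- Seek on the prefix, then verify against the full value.
DECLARE @s nvarchar(4000) = N'AReallyLongStringofTextHere that includes spaces';
SELECT *
FROM ourtable
WHERE bigcolumn_prefix LIKE LEFT(@s, 450) + '%'
  AND bigcolumn LIKE @s + '%';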
Hello, I have the following SP. I want to add an extra field "ranking" that just increments the row number. Another feature would be: if several users have an equal totalvalue, they should have an equal ranking number; the rankings of the following users would have to be adjusted as well. Thanks.
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS OFF
GO
CREATE PROCEDURE [dbo].[Rankings]
    @iErrorCode int OUTPUT
AS
SELECT TOP 30
    ###COUNTER##,
    [user], [totalvalue], [cash], [stocksvalue]
FROM [dbo].[users]
ORDER BY totalvalue DESC
SELECT @iErrorCode = @@ERROR
GO
SET QUOTED_IDENTIFIER OFF
GO
SET ANSI_NULLS ON
GO
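On SQL Server 2005 and later this is a window-function one-liner: RANK() gives rows with an equal totalvalue the same number and skips the following ranks accordingly (1, 1, 3, ...), which matches the behaviour described. A sketch of the procedure rewritten that way:

CREATE PROCEDURE [dbo].[Rankings]
    @iErrorCode int OUTPUT
AS
SELECT TOP 30
    RANK() OVER (ORDER BY totalvalue DESC) AS ranking,
    [user], [totalvalue], [cash], [stocksvalue]
FROM [dbo].[users]
ORDER BY totalvalue DESC;

SELECT @iErrorCode = @@ERROR;
GO

If ties should not create gaps (1, 1, 2, ...), DENSE_RANK() is the drop-in alternative.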