I'm working on my first data warehouse and I'm not sure how I should name the columns in the database.
The first phase of the data warehouse is to store a bunch of data from one third-party source. The source contains over 100 pieces of data, and the business user doesn't even know what some of the fields are, but he wants to store everything. The third party refers to each field with a somewhat cryptic short name and a longer description. The short name isn't always cryptic.
My question is: am I better off naming my columns the same as the source system's short names so that I can easily debug problems later? Or should I instead try to shorten their definitions into something meaningful? On a side note, I'm 100% positive that we'll never populate the tables in question with data from an additional source.
I'm in the process of converting a rather huge VSAM database into a set of SQL tables. I am using the same data names from the mainframe (like XDB-NAME to RDB-NAME). I load the files using Import/Export Data, and it makes the tables with column names like col001, col002, col003, etc., and always sets the data types to varchar(255). So I have to cut and paste the data names from the mainframe side to the server side (and the data types too). Is there an easier way to do this? Or am I doomed to cut-n-paste my days away... Thanks for any help.
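If the mainframe layout itself can be loaded into a table, you can generate the rename and retype statements rather than pasting them one by one. A sketch - dbo.FieldMap and all of its columns are made-up names - that emits one sp_rename plus one ALTER COLUMN per field; run the generated output afterwards:

-- one row per target column, in copybook order
CREATE TABLE dbo.FieldMap (
    TableName  sysname,
    Ordinal    int,           -- 1 maps to col001, 2 to col002, ...
    SourceName varchar(128),  -- e.g. 'XDB-NAME' from the mainframe layout
    TargetType varchar(128)   -- e.g. 'varchar(30)' or 'decimal(9,2)'
);

-- generate one rename plus one retype statement per column
SELECT 'EXEC sp_rename ''' + TableName + '.col'
       + RIGHT('00' + CONVERT(varchar(3), Ordinal), 3) + ''', '''
       + REPLACE(SourceName, '-', '_') + ''', ''COLUMN'';'
     + ' ALTER TABLE ' + TableName + ' ALTER COLUMN '
     + REPLACE(SourceName, '-', '_') + ' ' + TargetType + ';'
FROM dbo.FieldMap
ORDER BY TableName, Ordinal;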
OK, I am finally giving in on this one and asking for some help! I am trying to set up a T-SQL statement that will extract data from all the tables in the current database to a csv file, including column names!

I know that bcp cannot handle the column names, so I tried to get around this by appending the column names from a select, but unfortunately the select gives me the names in alphabetical order and not the order of the fields.

I have tried putting an order by on the select, but this does not seem to have any effect (a possible fix is sketched after the full script below). I have included the snippet of my script that is causing the problem here:
-- set up the echo command
select @colcommand = 'exec master..xp_cmdshell' + ' ''' + 'echo ' + @names
     + ' >> c:\cp\' + TABLE_CATALOG + '.' + TABLE_SCHEMA + '.' + TABLE_NAME + '.txt' + ''''
from INFORMATION_SCHEMA.TABLES
where TABLE_NAME = @TABLE
And just in case you are interested in the rest of the script, the full monster is included at the bottom of the post. Also, if you can see any more efficient ways of doing what I am trying to do, please let me know!
-- Script to create a csv file of data from all tables inside the current database

-- declare all variables
declare @command varchar(200)     -- command used for bcp
declare @fetch_status int         -- variable for fetch status in cursor
declare @TABLE varchar(200)       -- variable to hold table name
declare @colcommand varchar(200)  -- variable to hold column creation command
declare @count int                -- variable used to determine first iteration of column loop
declare @names varchar(100)       -- variable used for the column names
declare @delimiter varchar(10)    -- variable used for delimiter in column names

SET @delimiter = ','  -- set up the delimiter as comma
select @count = 0     -- initialise the COUNT variable

-- set up cursor to create the bcp command to back up the data files to csv format
declare bcpcommand cursor READ_ONLY FOR
select 'exec master..xp_cmdshell' + ' ''' + 'bcp' + ' '
     + TABLE_CATALOG + '.' + TABLE_SCHEMA + '.' + TABLE_NAME + ' out'
     + ' c:\cp\' + TABLE_CATALOG + '.' + TABLE_SCHEMA + '.' + TABLE_NAME + '.txt'
     + ' -c -t,' + ' -T' + ' -S' + @@servername + ''''
from INFORMATION_SCHEMA.TABLES
where TABLE_TYPE = 'BASE TABLE'

-- set up cursor to pick up all the tables in the given database (used for column names section)
declare dbtables cursor READ_ONLY FOR
select TABLE_NAME from INFORMATION_SCHEMA.TABLES where TABLE_TYPE = 'BASE TABLE'

open bcpcommand
select @fetch_status = 0
while @fetch_status = 0
begin
    fetch next from bcpcommand into @COMMAND
    select @fetch_status = @@fetch_status
    if @fetch_status <> 0
    begin
        continue
    end
    -- print 'Command to be run : ' + @COMMAND
    EXEC (@COMMAND)
end

-- close and tidy up
close bcpcommand
deallocate bcpcommand

-- now create the fieldname files and then echo the 2 files together!
open dbtables
select @fetch_status = 0
while @fetch_status = 0
begin
    fetch next from dbtables into @TABLE
    select @fetch_status = @@fetch_status
    if @fetch_status <> 0
    begin
        continue
    end

    SELECT @names = COALESCE(@names + @delimiter, '') + name
    FROM syscolumns
    where id = (select id from sysobjects where name = @TABLE)

    -- due to the concatenation used, the second iteration onwards has a , attached
    -- to the front of the line; this section removes the first char
    if @count <> 0
    begin
        Select @names = SUBSTRING(@names, 2, 198)
    end

    -- set up the echo command
    select @colcommand = 'exec master..xp_cmdshell' + ' ''' + 'echo ' + @names
         + ' >> c:\cp\' + TABLE_CATALOG + '.' + TABLE_SCHEMA + '.' + TABLE_NAME + '.txt' + ''''
    from INFORMATION_SCHEMA.TABLES
    where TABLE_NAME = @TABLE

    -- print 'COMMAND : ' + @colcommand
    exec (@colcommand)

    -- reset @names variable for next iteration, and set count to 1 to trigger the IF above
    select @names = ''
    select @count = 1
end

-- close and tidy up
close dbtables
deallocate dbtables
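Incidentally, on the ordering problem: concatenating into a variable with SELECT ... ORDER BY is not guaranteed to run in row order on SQL Server 2000, which may be why your ORDER BY appeared to do nothing. A more reliable sketch builds @names through a second cursor ordered by syscolumns.colorder (this also makes the leading-comma trim unnecessary):

declare @colname sysname
select @names = ''
declare colnames cursor READ_ONLY FOR
    select name from syscolumns
    where id = OBJECT_ID(@TABLE)
    order by colorder
open colnames
fetch next from colnames into @colname
while @@fetch_status = 0
begin
    -- append the delimiter only from the second column onwards
    select @names = @names
                  + case when @names = '' then '' else @delimiter end
                  + @colname
    fetch next from colnames into @colname
end
close colnames
deallocate colnames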
I have a SQL text column, alongside sp_who2 output, in table #SqlStatement, like the one row shown below:

"update Panel set PanelValue=7286 where PanelFirmwareID=4 and PanelSettingID=9004000"

I want to find which table and column names are in the text.

I tried the query below:
Select B.Statement
from #sp_who2 A
LEFT JOIN #SqlStatement B ON A.spid = B.spid
where B.Statement IN (
    SELECT T.name, C.name
    FROM sys.tables T
    JOIN sys.columns C ON T.object_id = C.object_id
    WHERE T.type = 'U'
)
Something like this, in short: find the table names and column names referenced in the statement text.
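As written, that query can't run: IN compares a single value, so the subquery cannot return both T.name and C.name. One crude but workable sketch is to pattern-match the statement text against the catalog views (expect false positives when a short name happens to occur as a substring):

SELECT B.Statement, T.name AS TableName, C.name AS ColumnName
FROM #SqlStatement AS B
JOIN sys.tables AS T
    ON B.Statement LIKE '%' + T.name + '%'
JOIN sys.columns AS C
    ON C.object_id = T.object_id
   AND B.Statement LIKE '%' + C.name + '%'
WHERE T.type = 'U';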
I was wondering if anyone has an idea of how we could find the table names and column names of the tables in our SQL Server database at runtime/dynamically, given our connection string? Please let me know.
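Assuming you can open a connection with that string and issue a query, the INFORMATION_SCHEMA views return exactly this; for example:

SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;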
I created an SSIS package which exports data from an OLE DB source to a flat file (csv format). For this I have an OLE DB source and a Flat File destination. I generate the file and file name dynamically, with the column names in the first row. If the dynamically generated file name already exists, I want to append the data to that existing file, but I don't want to append the column names again - I just want to append the new rows to the existing rows.

So let's say the first time I generate a file called File1_3132008.csv:
Col1,Col2
1,2
3,4
After some days, if my SSIS package generates the same file name, i.e. File1_3132008.csv, this time I just want to append the rows to the existing file. So the file should look like this:

Col1,Col2
1,2
3,4
5,6
7,8
But instead my file looks like this if I set the Overwrite property to false:
Col1,Col2
1,2
3,4
Col1,Col2
5,6
7,8
Can anyone help me get the file to come out as shown in the desired example above?
I have a requirement to implement CDC for 50+ tables, to feed incremental data changes into the warehouse/reporting environment rather than exporting whole tables. The largest table has more than half a billion rows.

The warehouse uses a daily copy of the OLTP db (daily DB refresh). How can I accomplish this? Is there a downside to implementing CDC just for the sake of taking incremental changes on the tables?

Is there any performance impact if we enable CDC on the OLTP db?

Can we make use of the CDC tables in the environment where we do the daily db refresh, so that the queries don't hit the OLTP database?

What is the best way to implement CDC to take incremental changes for reporting?
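For reference, CDC is enabled per database and then per table; a minimal sketch follows (dbo.Orders is a placeholder, and @supports_net_changes requires a primary key or unique index on the table). The capture process reads the transaction log via a SQL Agent job, so the overhead on the OLTP side comes mainly from the capture and cleanup jobs plus the extra storage for the change tables.

EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'Orders',
    @role_name = NULL,
    @supports_net_changes = 1;

-- pull the net changes since the last extraction
-- (a real job would persist the last LSN it processed instead of using the minimum)
DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Orders');
SET @to_lsn = sys.fn_cdc_get_max_lsn();

SELECT *
FROM cdc.fn_cdc_get_net_changes_dbo_Orders(@from_lsn, @to_lsn, N'all');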
Hi, I am new to the MSSQL 2000 DBA thing and am trying to learn more about Analysis Services/data warehousing/data mining. Can any expert out there recommend some good books or web articles to read? Thanks
I am a Crystal Reports developer and I am new to the SSIS environment. I have started to read the Professional SQL Server 2005 IS book. I am really confused by the many tasks to choose from.

I need to develop reports from a data warehouse. But first I have to send the data from the operational database (SQL Server 2000) to the warehouse (SQL Server 2005) monthly - I have a script for retrieving the data. For my package, I chose a Data Flow Task, an Execute SQL Task, and an OLE DB Destination, and it does not work.

Could someone show me what similar working packages look like? Thank you!!
My data flow consists of:

- an OLE DB source which calls a stored proc that returns a result set
- a data conversion
- an Excel destination

I am in design mode in Business Intelligence studio. My Excel destination (with an Excel connection) shows no sheet name, though I have an Execute SQL task before the data flow to create the Excel table called SHEET1. Needless to say, there are no output columns visible to do any mappings. I did go to the Excel connection to set the OpenRowset property to SHEET1, but it seems to have no effect.
I can do the export in SQL Server Management Studio and that works fine, but it is basic and does not meet my requirements. I have to customize the package to allow dynamic Excel file names based on account names, and I have to split my result set into multiple Excel sheets because Excel 2003 has a max of 65536 rows per sheet. Also, when I use the export wizard, the source is a table, whereas eventually the source has to be a stored proc with input parms.
What am I missing or doing wrong? Thanks in advance
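One likely culprit: the Excel destination validates against a sheet that exists at design time, so if SHEET1 is only created by the Execute SQL task at run time, there is nothing to map. The usual workaround is to run the same CREATE TABLE statement once by hand at design time (so the columns appear and can be mapped) and set DelayValidation to True on the data flow. The Jet DDL sent through the Excel connection looks roughly like this sketch, with placeholder column names and types:

CREATE TABLE `SHEET1` (
    `AccountName` NVARCHAR(255),
    `Amount` NUMERIC
)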
Hi, can anyone list the types of error that will make @@error nonzero? I created a procedure to update a table based on a customer id. Id 7 does not exist in table A, and I am supposed to get 'not valid id', but in this case nothing happens - I always get 'table A updated'. Thanks, Ali
begin tran

update TableA set title = 'manager' where id = 7

if (@error <> 0)
begin
    rollback tran
    print 'not valid id'
    return
end
ELSE
begin
    commit tran
    print 'table A updated'
end
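Two separate problems in that snippet. First, @error with one @ is a local variable, not the system function @@ERROR, so the test never fires. Second, even with @@ERROR, an UPDATE that matches no rows is not an error - @@ERROR stays 0 - so you need @@ROWCOUNT to detect the missing id. A sketch (TableA is a stand-in for your table name):

begin tran

update TableA set title = 'manager' where id = 7

-- capture both immediately; they are reset by the very next statement
declare @err int, @rows int
select @err = @@ERROR, @rows = @@ROWCOUNT

if @err <> 0 or @rows = 0
begin
    rollback tran
    print 'not valid id'
    return
end

commit tran
print 'table A updated'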
I need a recommendation on a data modeling tool that can be used with a data warehouse. My warehouse is running SQL 2012.
Here is my challenge: most of the tables in the warehouse do not have primary keys, and none of the tables have foreign keys on them. However, there are indexes and unique keys/indexes on the tables. I am looking for a tool in which I can create virtual relationships describing how the data is related, so it is visually easier for the ETL developers to write the code.

I have looked at both ER/Studio 11 and ERwin 9.6. Neither of them does it exactly the way I want. However, ER/Studio is pretty close.
I am working on creating a data warehouse. I have made a database which will be the data warehouse and will consist of dimension and fact tables. I know that besides the dimension and fact tables, a data warehouse should also have metadata. Now my question is: what should the structure of that metadata be, and what information should it contain?
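Structures vary a lot, but warehouse metadata usually falls into two buckets: business metadata (definitions, source-to-target mappings) and technical/process metadata (load audit). As an illustration of the latter, a sketch with made-up names:

CREATE TABLE dbo.ETL_LoadAudit (
    LoadAuditKey  int IDENTITY(1,1) PRIMARY KEY,
    PackageName   varchar(100),   -- which ETL package ran
    SourceSystem  varchar(100),
    TargetTable   varchar(128),
    RowsExtracted int,
    RowsLoaded    int,
    RowsRejected  int,
    LoadStart     datetime,
    LoadEnd       datetime,
    LoadStatus    varchar(20)     -- e.g. 'Succeeded' / 'Failed'
);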
I am going to use Microsoft SQL Server to develop my data warehouse, but one thing confuses me. Since Analysis Services can create a star schema database, do I have to pre-build a star schema database for the ETLed data? Basically, I am wondering what the relationship is between an ETLed database and the one created through Analysis Services.

Can anyone give me an explanation from the implementation perspective?
I am trying to restore my data warehouse from a January 2008 backup under a new name, to recover a table that I accidentally deleted. It is taking a long time for the restore to finish. Here is the command I am running as sa in Query Analyzer:
---
RESTORE DATABASE Warehouse_new
FROM DISK = 'H:\MSSQLData\MSSQLBACKUP\DBBackups\Warehouse\Warehouse_db_200801050600.BAK'
WITH MOVE 'Warehouse_Data' TO 'G:\MSSQLData\MSSQLData\Warehouse_New_Data.MDF',
     MOVE 'Warehouse_Log' TO 'H:\MSSQLData\MSSQLLogs\Warehouse_New_Log.ldf'
----
The Warehouse_New_Data.MDF is 375 GB and the log is 12 GB.

There is still 169 GB of free space on the drive I am restoring to, even with both Warehouse_Data.MDF and Warehouse_New_Data.MDF (each 375 GB) present.

It's been 4.5 hrs and the restore is still running. Backups take about 3.5 hrs to complete. Can I do any checks on the restore to see what point it is at? I stopped an earlier attempt using Enterprise Manager after it took 8 hours and still showed no progress.
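One check that works on SQL Server 2000 is to start the restore with the STATS option, which prints a progress message at each interval, e.g. the same command with one extra clause:

RESTORE DATABASE Warehouse_new
FROM DISK = 'H:\MSSQLData\MSSQLBACKUP\DBBackups\Warehouse\Warehouse_db_200801050600.BAK'
WITH MOVE 'Warehouse_Data' TO 'G:\MSSQLData\MSSQLData\Warehouse_New_Data.MDF',
     MOVE 'Warehouse_Log' TO 'H:\MSSQLData\MSSQLLogs\Warehouse_New_Log.ldf',
     STATS = 5  -- print a message every 5 percent

(On SQL Server 2005 and later you could instead query sys.dm_exec_requests for percent_complete on the restore session.)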
Hello. I was wondering if anyone out there could tell me how they deal with NULL values in a data warehouse? I am looking to implement a warehouse in SQL 2005 and have some fields which will have NULL values, and I would like some further ideas on how to deal with them. At my last job, dealing with Oracle, we were just going to leave the fields NULL, but in SQL how would you best recommend cleaning the data? I greatly appreciate your help and look forward to your responses. Thank you
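One common SQL Server approach - by no means the only one - is to replace NULLs with explicit default members during the load, so fact rows never point at a NULL attribute. A sketch with made-up names:

-- map NULL attributes to an explicit 'Unknown' member while loading a dimension
INSERT INTO dbo.DIM_Customer (CustomerName, Region)
SELECT COALESCE(s.CustomerName, 'Unknown'),
       COALESCE(s.Region, 'Unknown')
FROM staging.Customer AS s;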
I'm making a warehouse for our HMIS (healthcare management information system) using SSIS. I'm facing some problems now; could you please help me solve them?
Brief idea about my warehouse:

- Source: Oracle 9i
- Destination: SQL Server 2005
- ETL tool: SSIS
Problems:
How do I refresh or load the current data into the data warehouse? (Right now I'm using a truncate SQL task to delete the old/entire data in each package; I really don't want to do that in production.) For example: patient admissions data is added every day, so I want to load just the current data into my warehouse. Could you please suggest a good solution for this?
Refresh cycle timings: is there any task available in SSIS for this?
Current status:

The first-time load is completed. I set up one Execute SQL Statement control flow task to truncate the existing loaded data in SQL Server 2005, and then I run one data flow task to load the data from Oracle to SQL Server again.
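A sketch of the usual alternative to truncate-and-reload, assuming the source rows carry a last-modified timestamp (Admissions, LastModified, and the other names here are placeholders; SQL Server 2005 has no MERGE, so it is done as an UPDATE followed by an INSERT):

DECLARE @LastLoad datetime;
SELECT @LastLoad = MAX(LoadEnd) FROM dbo.ETL_LoadLog;  -- however you record the last run

-- update rows that changed since the last load
UPDATE d
SET    d.WardID = s.WardID,
       d.DischargeDate = s.DischargeDate
FROM   dbo.FactAdmissions AS d
JOIN   staging.Admissions AS s ON s.AdmissionID = d.AdmissionID
WHERE  s.LastModified > @LastLoad;

-- insert rows that are new since the last load
INSERT INTO dbo.FactAdmissions (AdmissionID, WardID, AdmissionDate, DischargeDate)
SELECT s.AdmissionID, s.WardID, s.AdmissionDate, s.DischargeDate
FROM   staging.Admissions AS s
WHERE  s.LastModified > @LastLoad
  AND NOT EXISTS (SELECT 1 FROM dbo.FactAdmissions AS d
                  WHERE d.AdmissionID = s.AdmissionID);

As for the refresh cycle: scheduling normally lives outside the package - a SQL Server Agent job runs it on whatever cycle you choose - rather than in an SSIS task.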
Is SSIS a tool for extracting realtime data from staging to a data warehouse? Realtime in my case can be loading every 15 minutes, but no more than 30 minutes. I have a data warehouse whose data is refreshed once a day, and that works fine. The data that I extract into the warehouse comes from a staging database which is a realtime replica of multiple production databases. Once a day, I have to pause replication on staging for a couple of hours to refresh the data warehouse. That's the only way SSIS can pull the data correctly; if replication is running while SSIS pulls data, it always copies fewer rows than it is supposed to.

I cannot afford to pause replication every 15 minutes just so I can refresh the data warehouse. Has anyone else had this problem? Any best practices for how to do this?
Does anybody have experience with data warehouse backup/recovery? What I want to know is how to back up and recover the databases in a data warehouse, and the cubes.
At my office, we've been slowly working on putting together a data warehouse.
We're a financial services company, and one of the services that we offer is debt collection. As far as reports go, our clients are interested in knowing how much money we collect over time. In particular, they want to know how many payments we've gotten 5, 10, and 15 months (and so on) after we receive a case. (Obviously, the 5-month payments are also included in the 10- and 15-month calculations.)
When I wrote this report using our transactional database, I was completely new to SQL and the ever-resourceful Patron Saint took pity on me, so you can see a good description of the details at http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=78510
Now that I'm no longer a total newbie at SQL, and having been through a relatively extensive seminar on data warehousing, I've been entrusted with researching certain aspects of data warehouse development (rest easy, though, folks - the real DWH work is not being done by the very inexperienced me, but by an actual professional :) ).
My question:
How would you model this kind of "relative time" in a data warehouse? How would you display the 5-month, 10-month, and 15-month payments in a DWH? I can't really imagine that the kinds of joins necessary to do this in a transactional database would be desirable in a data warehouse.
We have the following:
1.) FACT_Payment: A fact table showing each payment at the most detailed granularity. One attribute of this table is the payment date. Another attribute is a foreign key to the case dimension described below.
2.) DIM_Case: A dimension table showing information on each case, including the case start date.
3.) DIM_Date: A date dimension table.
(For added clarification: The FACT_Payment payment date has to be 5, 10, 15 months etc... after the DIM_Case start date.)
Any ideas, comments, experience with something like this?
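One way to keep the query side simple is to compute the month offset between case start and payment, either at query time or (more in the Kimball spirit) once during the ETL as a column on FACT_Payment. A query-time sketch against the tables above (Amount, CaseKey, CaseStartDate, and PaymentDate are assumed column names):

SELECT c.CaseKey,
       SUM(CASE WHEN DATEDIFF(month, c.CaseStartDate, p.PaymentDate) <= 5
                THEN p.Amount ELSE 0 END) AS PaidWithin5Months,
       SUM(CASE WHEN DATEDIFF(month, c.CaseStartDate, p.PaymentDate) <= 10
                THEN p.Amount ELSE 0 END) AS PaidWithin10Months,
       SUM(CASE WHEN DATEDIFF(month, c.CaseStartDate, p.PaymentDate) <= 15
                THEN p.Amount ELSE 0 END) AS PaidWithin15Months
FROM dbo.FACT_Payment AS p
JOIN dbo.DIM_Case AS c ON c.CaseKey = p.CaseKey
GROUP BY c.CaseKey;

If a MonthsAfterCaseStart column is stamped on each fact row at load time instead, the CASE expressions collapse to simple range predicates and no join gymnastics are needed at all.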
I need someone to give me a small briefing on how an enterprise data warehouse (EDW) differs from a regular database. Currently we have an application that accesses a database with about 18 tables. We also have a data warehouse. For some reason I was thinking that it would be possible to migrate the database into the data warehouse. The reason is that, looking at the schematic design for the data warehouse, there are some data tables that could also be used by our application that uses the DB. I guess I am confused because I am not sure whether a data warehouse is used in the same way as a database?
I want to create a data mart from an existing OLTP database - for example Northwind - or I will create an OLTP database. I don't know how to create a data mart from an OLTP database. I want to learn that step by step. Help me, please!
I'm reviewing a data warehouse design schema for a client that is following Kimball's data warehousing principles. One of the first things I noticed was a table of dates with expanded columns giving such information as the year, month, month name, fiscal year, quarter, etc. for each date. They also have a surrogate key (int) for the date value, and the fact tables store the surrogate key rather than the date value itself. They were very surprised when I questioned the purpose of this table, assuring me that Kimball is very strong on the concept of having a date dimension for each fact table. I don't see the purpose of a table containing nothing but derived date formats. I think they will take a bigger performance hit from having to link through the surrogate key than they would suffer from having to convert date values stored in the fact tables. Has anybody else ever seen this before? Does Kimball really advise this?
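For concreteness, the kind of table being described looks roughly like this sketch (real Kimball-style date dimensions usually carry many more attributes). The usual argument for it is not the derivable formats but the attributes you cannot compute from the date value alone - fiscal calendars, holidays, workdays - which turn "group by fiscal period" into a plain join instead of a function call:

CREATE TABLE dbo.DIM_Date (
    DateKey         int PRIMARY KEY,  -- surrogate key, e.g. 20080115
    FullDate        datetime,
    CalendarYear    smallint,
    CalendarQuarter tinyint,
    MonthNumber     tinyint,
    MonthName       varchar(10),
    DayOfWeekNumber tinyint,
    FiscalYear      smallint,
    FiscalQuarter   tinyint,
    IsWeekend       bit,
    IsHoliday       bit
);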
Hi, I would like to know if anyone out there really uses SQL Server 2000 (which edition?) to hold the data for a data warehouse? How much data does it handle efficiently? TIA, Frank
Hello all, I just started a new job this week, and they complain about the length of time it takes to load data into their data warehouse, which they do once a month. From what I can gather, they rebuild the indexes before the insert with an 80% fillfactor, then insert the data (with the indexes enabled), then rebuild the indexes with a 100% fillfactor. Most of my RDBMS experience is with a different product. We would have disabled the indexes and foreign keys, loaded the data, then re-enabled them, moving any records that violated the constraints into an appropriate audit table to be checked afterwards. Can someone share with me what the accepted "best practices" are for loading data efficiently into a data warehouse? Any thoughts would be deeply appreciated. Steve
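For what it's worth, on SQL Server 2005 or later the disable/load/rebuild pattern you describe looks like this sketch (dbo.FactSales and IX_FactSales_Date are placeholder names):

-- disable nonclustered indexes (disabling the clustered index would take the table offline)
ALTER INDEX IX_FactSales_Date ON dbo.FactSales DISABLE;

-- turn off foreign key checking for the duration of the load
ALTER TABLE dbo.FactSales NOCHECK CONSTRAINT ALL;

-- ... bulk load here (BULK INSERT, bcp, SSIS, ...) ...

-- rebuild all indexes; this also brings the disabled ones back online
ALTER INDEX ALL ON dbo.FactSales REBUILD WITH (FILLFACTOR = 100);

-- re-enable the foreign keys and re-validate existing rows so they stay trusted
ALTER TABLE dbo.FactSales WITH CHECK CHECK CONSTRAINT ALL;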