Looking For Recommended Approach To Merging Records
Aug 16, 2006
I am trying to create a dimension table from data in two tables. I need all records from table A, any records from table B that are not in table A, and, for the records that match, the field values from B. What would be the best way to approach this: merge join + derived columns, or union all + aggregation? Any suggestions?
It seems harder to do this in SSIS than to just do it in the database.
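For comparison, here is a minimal sketch of the database-side version of this merge, run through Python's built-in sqlite3 module rather than SSIS. The table names `table_a` and `table_b`, the key `id`, and the column `name` are hypothetical stand-ins. The LEFT JOIN plus NOT EXISTS pattern should carry over to T-SQL, where a FULL OUTER JOIN with COALESCE is another option.

```python
import sqlite3

def merge_dimension(rows_a, rows_b):
    """Return every key from A and B, preferring B's fields when a key is in both."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical two-column tables standing in for the real sources.
    conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO table_a VALUES (?, ?)", rows_a)
    conn.executemany("INSERT INTO table_b VALUES (?, ?)", rows_b)
    query = """
        -- Every row of A; use B's value when the key matches.
        SELECT a.id AS id,
               CASE WHEN b.id IS NOT NULL THEN b.name ELSE a.name END AS name
        FROM table_a AS a
        LEFT JOIN table_b AS b ON b.id = a.id
        UNION ALL
        -- Rows that exist only in B.
        SELECT b.id, b.name
        FROM table_b AS b
        WHERE NOT EXISTS (SELECT 1 FROM table_a AS a WHERE a.id = b.id)
        ORDER BY 1
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

Calling `merge_dimension([(1, "a1"), (2, "a2")], [(2, "b2"), (3, "b3")])` keeps key 1 from A, takes B's value for key 2, and adds key 3 from B.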
Mar 8, 2007
All,
I am new to DTS/SSIS and have a couple of questions about using it to solve a problem. We have an application running on SQL Server 2005 where status records are written to a status table. I need to be able to send those records over to a status table in a legacy application running on Access.
Originally, I thought about writing a custom C# stored proc and accessing Access from it, and then someone pointed me to DTS/SSIS.
Is there a way to execute the package based on a trigger event when a row is inserted or updated? If not, and I take a scheduled approach (every 3 minutes, etc.), do I have to maintain a column marking the records that have been processed so they are not picked up again?
In general, is SSIS the right approach to take? The overall business requirements are straightforward, but I am not sure whether SSIS is overkill for this.
Thanks,
Steve
Oct 15, 2005
Hello there. I am completely new to SQL and this forum, and this problem may appear very basic to you guys, but still... I was wondering if I could get some help with a database I am trying to make in MS Access.
I have used the Access TransferText function to import data from a text file into a table with an ID attached to each line, e.g.
ID Text
1 Hello world
2 This is an example
3 Of my database
I want to merge the data, or copy it into a field in a new table to get:
ID Text
1 Hello World
This is an example
Of my database
2 [more imported text from a different table]
and I have been advised that SQL is the best way to do this. Is it possible to have line breaks in a field within Microsoft Access, or would it have to be structured as
ID Text
1 Hello World This is an Example Of My Database
2 ...
And how would I write the SQL to do this?
Thanks,
Thom
Sep 8, 2013
I have a table like
Number Desc
1 Bank
2 Shop
3 Store
2 Home
1 Mall
2 House
I want to have a result as
Number Desc All
1 Bank Mall
2 Shop Home House
3 Store
using a proper select syntax
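The result above is a per-group string concatenation. As a hedged illustration, SQLite's `group_concat` aggregate does this directly, and the sketch below runs it through Python's sqlite3. The table name `places` and the column name `label` (used in place of the reserved word Desc) are assumptions. In T-SQL, SQL Server 2017 and later offer STRING_AGG, and older versions commonly used FOR XML PATH for the same job.

```python
import sqlite3

def concat_by_number(rows):
    """Collapse (number, label) rows into one row per number with labels joined."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (number INTEGER, label TEXT)")
    conn.executemany("INSERT INTO places VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT number, group_concat(label, ' ') AS all_labels "
        "FROM places GROUP BY number ORDER BY number"
    ).fetchall()
    conn.close()
    return result

sample = [(1, "Bank"), (2, "Shop"), (3, "Store"), (2, "Home"), (1, "Mall"), (2, "House")]
combined = concat_by_number(sample)
```

SQLite does not promise the order of the labels inside each concatenated string, so any required ordering would have to be handled explicitly.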
Apr 21, 2014
We have a data warehouse staging database in which we capture change history for hundreds of tables from a source system. In the source system, records are updated in place, but in our data warehouse we capture these changes by "terminating" the existing record and adding a new record reflecting the changes. In the data warehouse we add two columns to every table -- effective_date and expiration_date -- which indicate the dates the record was in effect in the source system. By convention, an expiration_date of 6/6/2079 means the record is currently still active in the source system. Each day we simply compare yesterday's version of the record (in the data warehouse) against today's version (in the source system). If differences are found in any of the columns, we terminate the record and add a new one, setting those dates appropriately.
In this example, the employee_id column is the natural key in the source system. We add the effective_date and expiration_date in the data warehouse, so those three columns together make up the key in the data warehouse. The employee_name, employee_dept, and last_login_date columns all come from the source system as well.
drop table mytbl
create table mytbl (
effective_date smalldatetime,
expiration_date smalldatetime,
employee_id int,
employee_name varchar(30),
[code]....
In the select output, you can follow the trail of changes for each of these three employees. Bob moved from dept 7 to 8 at some point; Frank didn't change departments at all; Cheryl moved from dept 6 to 9 and later back to 6. However, the last_login_date was updated frequently for all these employees.
We've tracked hundreds of tables this way for years, some with hundreds of columns. For optimization purposes, I'm now interested in trimming the fat a bit. That is, we track changes in many columns that we don't really need in our data warehouse. Some of these columns are rapidly-changing, causing all sorts of unnecessary terminate/inserts in the data warehouse. My goal is to remove these columns, reclaim the disk space and increase the ETL speed. So in this example, let's get rid of the last_login_date column.
alter table mytbl
drop column last_login_date
select *
from mytbl
order by employee_id, effective_date
Now in the select output, you can see we have many "effective duplicate" records. For example, nothing changed for Bob between 1/1/2014 and 1/31/2014 -- those really should be one record, not three. Here's the challenge: I'm looking for an efficient way to merge these "effective duplicates" together, through set-based sql updates/deletes/inserts (hoping to avoid any RBAR operations). Here's what the table ultimately should look like (cheating to get there):
create table mytbl2 (
effective_date smalldatetime,
expiration_date smalldatetime,
employee_id int,
employee_name varchar(30),
employee_dept int
[code]...
Note that Bob only has two records (he changed department), Frank only has one record (no changes), and Cheryl has three records (two department changes).
My inclination would be to drop the unwanted columns, then GROUP BY all the remaining columns from the source system, taking the MIN effective_date and MAX expiration_date. However, this doesn't work for cases like Cheryl's -- she moved to another department, then back again, so that change history needs to be retained.
As I mentioned, we have hundreds of tables, and I'd like to strip out dozens (maybe hundreds) of unused columns, so ultimately there will be millions of these pseudo-duplicates that need to be merged together. These are huge tables, so I really need to find an efficient set-based approach to this.
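Cheryl's case is what is usually called a gaps-and-islands problem: only consecutive rows with identical attributes should collapse. One set-based sketch is to flag each row whose attributes differ from the previous row for the same employee, take a running sum of those flags as an island number, and then group by it. The version below runs in SQLite (3.25 or later, for window functions) through Python's sqlite3, with dates stored as ISO text. It assumes the periods for an employee are contiguous and that the compared columns are not NULL. In T-SQL, LAG needs SQL Server 2012 or later.

```python
import sqlite3

MERGE_SQL = """
WITH flagged AS (
    SELECT *,
           -- 1 when this row differs from the employee's previous row (or is the first).
           CASE WHEN employee_name = LAG(employee_name) OVER (
                         PARTITION BY employee_id ORDER BY effective_date)
                 AND employee_dept = LAG(employee_dept) OVER (
                         PARTITION BY employee_id ORDER BY effective_date)
                THEN 0 ELSE 1 END AS is_change
    FROM mytbl
),
islands AS (
    SELECT *,
           -- Running count of changes numbers each run of identical rows.
           SUM(is_change) OVER (
               PARTITION BY employee_id ORDER BY effective_date) AS island
    FROM flagged
)
SELECT MIN(effective_date) AS eff_from,
       MAX(expiration_date) AS eff_to,
       employee_id, employee_name, employee_dept
FROM islands
GROUP BY employee_id, island, employee_name, employee_dept
ORDER BY employee_id, eff_from
"""

def merge_effective_duplicates(rows):
    """Collapse consecutive rows per employee that carry the same name and dept."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytbl (effective_date TEXT, expiration_date TEXT, "
        "employee_id INTEGER, employee_name TEXT, employee_dept INTEGER)"
    )
    conn.executemany("INSERT INTO mytbl VALUES (?, ?, ?, ?, ?)", rows)
    merged = conn.execute(MERGE_SQL).fetchall()
    conn.close()
    return merged
```

Because the grouping key includes the island number, a move from dept 6 to 9 and back to 6 stays as three separate records instead of being folded into one.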
Jun 2, 2015
I'm trying to avoid a large amount of manual data manipulation.
Here's the background: a legacy system that has (well, let's call apples apples) pretty much no method of enforcing data integrity, which has caused a fair amount of garbage data to be inserted in some tables. I am pulling one of the tables, [Individuals], from this legacy system and inserting it into a production system, into the table schema currently in place to track [Individuals] there.
Problem: Inserting the information is easy; the question is how to deduplicate the records in the staging table (where the legacy [Individuals] table has been dumped in production) prior to insertion. (I want to do this programmatically, preferably with SQL or SSIS, so that I can alter it later to allow for updating existing/inserting new.)
Staging Table Schema:
;
CREATE TABLE [dbo].[stage_Individuals](
[SysID] [int] NULL, --Unique, though it's not an index intended to identify the [Individuals]
[JJISID] [nvarchar](10) NULL,
[NameLast] [nvarchar](30) NULL,
[NameFirst] [nvarchar](30) NULL,
[NameMiddle] [nvarchar](30) NULL,
[code]....
Scenario: There are records that duplicate the JJISID, though this value is supposed to be unique for every individual. The SysID is just a clustered index (I'm assuming) within the legacy system and will most likely be dropped when inserted into the production [Individuals] table. There are records that are missing their JJISID, though this isn't supposed to happen either, but have valid information within SSN/DOB/Name/etc. that can be merged into the correct record that has a JJISID assigned. There is really no data conformity: some records have NULLs for everything except JJISID, and some records have all the [Individuals] information except the JJISID.
Currently I am running the following SQL just to get a list of the records that have a duplicate JJISID (I have others that partition by Name/DOB/etc. and will adapt whatever I come up with to be used for those as well):
;
select j.*
from (select ROW_NUMBER() OVER (PARTITION BY JJISID ORDER BY JJISID) as RowNum, stage_Individuals.*, COUNT(*) OVER (partition by jjisid) as cnt from stage_Individuals) as j
where cnt > 1 and j.JJISID is not null
Now, with SQL Server 2012 or later I could use LAG and LEAD w/ the RowNum value to do my data manipulation...but that won't work because we are on SQL Server 2008 in this environment.
[URL]
With the following as a potential solution:
GSquared (3/16/2010): Here's a query that seems to do what you need. Try it, let me know if it works.
Performance on it will be a problem, but I can't fine-tune that. You'll need to look at various methods for getting this kind of data from the table and work out which variation will be best for your data. Without access to the actual table, I can't do that.
;
WITH CTE
AS (SELECT master_id,
MIN(ID) AS first_id,
MAX(Account_Expiry) AS latest_expiry
FROM #People
GROUP BY master_id)
SELECT P1.master_id,
[code].....
Unfortunately, I don't think that will accomplish what I'm looking for - I have some records that are duplicated 6 times, and I'm wanting to keep the values within these that aren't NULL.
Basically what I'm looking for, is to update any column with a NULL value to the corresponding Duplicate [Individuals] record value for that column.
**EDIT - Example: Record 1 has a JJISID with NULL NameFirst & NameLast, BUT Record 2 has the same JJISID and values for NameFirst & NameLast. I want to propagate the NameFirst & NameLast from Record 2 into Record 1.
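One way to express that propagation without LAG or LEAD is a correlated UPDATE: for each NULL column, borrow a non-NULL value from any row with the same JJISID. The sketch below uses Python's sqlite3 with a cut-down version of the staging table (only the columns shown in the post). MAX is used simply because aggregates skip NULLs, so when duplicates hold conflicting non-NULL values the choice of donor value is arbitrary, which is an assumption that would need checking against the real data. Rows with a NULL JJISID are left alone.

```python
import sqlite3

FILL_SQL = """
UPDATE stage_Individuals
SET NameFirst = COALESCE(NameFirst, (
        SELECT MAX(d.NameFirst) FROM stage_Individuals AS d
        WHERE d.JJISID = stage_Individuals.JJISID)),
    NameLast = COALESCE(NameLast, (
        SELECT MAX(d.NameLast) FROM stage_Individuals AS d
        WHERE d.JJISID = stage_Individuals.JJISID))
WHERE JJISID IS NOT NULL
"""

def fill_nulls_from_duplicates(rows):
    """Fill NULL name columns from other rows that share the same JJISID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stage_Individuals "
        "(SysID INTEGER, JJISID TEXT, NameLast TEXT, NameFirst TEXT)"
    )
    conn.executemany("INSERT INTO stage_Individuals VALUES (?, ?, ?, ?)", rows)
    conn.execute(FILL_SQL)
    result = conn.execute(
        "SELECT SysID, JJISID, NameLast, NameFirst "
        "FROM stage_Individuals ORDER BY SysID"
    ).fetchall()
    conn.close()
    return result
```

After the fill, the duplicate rows for a JJISID carry the same non-NULL values, so keeping one row per JJISID becomes a plain ROW_NUMBER filter.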
Jan 24, 2015
I have a database full of different types of leads, some for company A, some for company B and so on, each doing a different service. However, the leads from B can be used for A and leads from A can be used for B, so I want to merge the data.
Example:
Phone Number    Name          Home Owner   Credit   Insurance
727-555-1234    Dave Thomas   Yes          B
727-555-1234    Dave Thomas                         Gieco
I would like the end result to be one record:
Phone Number    Name          Home Owner   Credit   Insurance
727-555-1234    Dave Thomas   Yes          B        Gieco
Since these were imported into SQL they all have a unique ID. Here are the current labels:
ID, phone_number, first_name, last_name, address1, address2, address3, city, state, postal_code, HOME_OWNR, HH_INCOME, CREDIT_RATING, AGE, MATCH, source_id, title, comments, dnc_flag, provider, vehicle, coverage, alt_phone, email, marital status, dob
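A merge like this can be sketched as a GROUP BY on the phone number, taking the non-empty value of each column. The example below uses Python's sqlite3 and a small subset of the labels above. The `insurance` column is hypothetical, since the post does not say which label holds the insurance value. Blank strings are converted to NULL first so that MAX ignores them, and it assumes two rows for the same phone number never disagree on a populated column.

```python
import sqlite3

def merge_leads(rows):
    """Collapse rows sharing a phone number into one row of non-empty values."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE leads (id INTEGER PRIMARY KEY, phone_number TEXT, "
        "first_name TEXT, last_name TEXT, HOME_OWNR TEXT, CREDIT_RATING TEXT, "
        "insurance TEXT)"  # insurance is an assumed column name
    )
    conn.executemany("INSERT INTO leads VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    merged = conn.execute(
        """
        SELECT phone_number,
               MAX(NULLIF(first_name, ''))    AS first_name,
               MAX(NULLIF(last_name, ''))     AS last_name,
               MAX(NULLIF(HOME_OWNR, ''))     AS HOME_OWNR,
               MAX(NULLIF(CREDIT_RATING, '')) AS CREDIT_RATING,
               MAX(NULLIF(insurance, ''))     AS insurance
        FROM leads
        GROUP BY phone_number
        ORDER BY phone_number
        """
    ).fetchall()
    conn.close()
    return merged
```

The merged rows could then be written to a new table, leaving the original imports untouched until the result has been checked.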
May 31, 2002
I have a 6.5 database running on NT 4.0 that is approximately 43GB in used space. I am making a case to management for some sort of upgrade to the whole system.
What sizes of 6.5 databases would anyone consider more "risky"? Is 43GB large for 6.5 (my thought is that it is)?
Thanks,
Kurt Symanzik
Handleman Company
Feb 9, 2004
Let's say for instance that you have a group of tables that stores address information for different groups (i.e. Doctors, Patients, Providers, etc.) Would it be better to create each table to store the address information or create an Address table that would store this information with an Address type and a link back to each table?
I prefer the second choice, but am having a hard time convincing other developers to follow this route. Maybe if I have some input from a more experienced user group I can make my point a little more effectively. Thanks in advance for any input you can provide.
Jul 14, 2006
I need to import a CSV file with a few million records and 50 fields into a table. Only 1 column in the file needs to be transformed and a second column needs to be checked for data validity (e.g. don't want to let someone pass in 'CA' for an integer field.). Two approaches come to mind:
1. Use SSIS to read the file directly into the table, then apply T-SQL to do a mass update to the single field that needs to be transformed. (With this approach it is not clear how to check the data validity in each row via T-SQL, though.)
2. Use SSIS to import the file, 1 line at a time, transforming the data and checking its validity as it goes. I suspect this approach will be much slower than that in 1) but I haven't tried it yet.
Which way do you think would be the fastest?
TIA,
barkingdog
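To make the per-row check in approach 2 concrete, here is a small sketch in Python using the standard csv module rather than SSIS. It validates one column as an integer, applies a transformation to another, and sets aside the rows that fail. The column names and the upper-casing transform are made up for illustration. In SSIS itself the usual equivalent is to redirect error rows from a data conversion to a separate output.

```python
import csv
import io

def split_valid_rows(csv_text, int_column, transform_column):
    """Return (good_rows, bad_rows); bad rows are paired with their line number."""
    good, bad = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        try:
            # Validity check: the column must parse as an integer.
            row[int_column] = int(row[int_column])
        except (TypeError, ValueError):
            bad.append((reader.line_num, row))
            continue
        # Hypothetical transformation of the single column that needs changing.
        row[transform_column] = (row[transform_column] or "").strip().upper()
        good.append(row)
    return good, bad
```

Rejected rows keep their original text, so they can be logged or written to an error file for review.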
Aug 22, 2005
Can somebody recommend books for SQL Server 2005? I am especially interested in CLR inside SQL and Business Intelligence.
Jun 5, 2007
In a situation where one may have a single master SQL Server that ultimately needs to communicate information back down to 1000's of downstream servers, what is the recommended architectural approach?
It doesn't feel right to have to add 1K-5K routes to the master SQL Server. Is there a way to have the downstream servers "broadcast" their existence to the master, so that new servers can be added and updates can happen seamlessly? Does this fall into a pub-sub scenario or is there a better way? And, if so, how to ensure an open conversation (so that one server doesn't miss information that all the other servers received)? Should the master dynamically create routes, or is it better to rely on an open conversation initiated by the downstream server?
Apr 23, 2001
I know I've seen documentation on this but I can't find it at the time. What's the recommended file locations for a SQL install.. System and Data on a RAID drive and logs on a separate drive that's mirrored..? Oh and if anyone has links to this info let me know also.
Thx!
Jan 18, 2002
Hi,
What are current thoughts about who should own a Database?
I see 3 possibilities:
1. The DOMAIN\Administrator (person who starts up the Server at Bootup)
2. 'sa', or
3. a person/user closely tied to the database.
Reasons for each?
Thanks for your opinions.
MichaelG
Jul 20, 2001
I have a couple of years of light experience with SQL server. I'd like to start studying to take the SQL 2000 exams. I have a good test environment set up and I'm reading through the Books Online. Can anyone recommend a book or books that might be helpful for me? My end goal here is to pass the test in the near future, but I want to really learn SQL rather than just learn to pass the test.
Thanks,
Allie
Nov 5, 2004
I was pondering getting an MCDBA certification. I want to learn everything about the OS I'd use, so I just wanted to get some feedback on whether to go with NT or Server 2003, etc. And would anyone here recommend getting the certification, or not bothering with it?
Jul 27, 2007
We are replicating data from server1 to server2. We expect the connection between servers to be reliable, but we cannot always guarantee uptime on both ends. We do not need real-time data access on server2. What type of replication would be best? The downside we see to snapshot is that the data will be growing over time, which means the amount replicated will continue to grow. Can we set up transactional replication and then schedule the updates so it only replicates transactions since the last update? Does this present any problems if the connection is lost at any time between the servers? At this time, we will not be making any changes to the data on server2, so nothing needs to be sent back to server1.
Aug 12, 2007
Hi,
I have a database app that deploys with sql 2005 express to each end-user. I would like to install sql 2005 express using Windows Authentication only. In this case, should I bother to set an sa password? And if I do set the sa password, how would I go about making sure that the sa password is different for every installation of sql express? Would it be recommended to save every end-user's sa password (possibly tens of thousands of passwords) just in case sql maintenance needs to be done on their computer? Any help would be greatly appreciated. Thanks!
Jul 23, 2005
I'm reading a book 'Professional SQL Server 2000 Programming' by Robert Vieira; there is a recommendation: "stay away from building views based on views". Why? What's so wrong with nested views?
May 31, 2007
I have been using the Microsoft Oracle Provider (MSDAORA) up until I needed to work with CLOB data types in Oracle. As much as I hate to switch providers at this phase in the project, I can't use MSDAORA due to the CLOB limitation.
So, what other providers are available?
I know that there is a native Oracle provider (OraOLEDB.Oracle.1) that is supported in SSIS. Does anyone have any comments on this?
Are there other options for Oracle?
Any comments, feedback, etc appreciated.
Thanks,
Rick
Feb 19, 2008
Hi,
Could anyone recommend a good book on the SQL-Server, please? I need to understand how to retrieve data with select statements and commands. It is urgent!
TIA
Dec 19, 2003
I was wondering what everyone's preferred way to install a database in an automated fashion is, i.e.:
You have a webapp. It's driven by SQL Server. You need to prompt the user for a server, username, password, and database. Once you have those, you execute the scripts against the DB.
I've been using osql.exe, but here's the situation: the installer may be run from a system which does not have the SQL Server client tools installed, which will be a problem.
So, given that the machine the application is being installed on does not have the client tools installed, how would YOU execute the provided SQL script against a remote server?
Jul 23, 2005
Hi,
Our company is an independent voice applications solution provider with a number of clients using our suite. We have a CT application suite which runs with an application server and SQL Server 7 / 2000 as the DB engines at the back end.
The SQL server has two databases configured:
Logging Database - Massive updates every second; the data grows rapidly.
Configuration Database - Generally small and updated occasionally.
Now we want to have resilience implemented on the server. We have to synchronize the two databases in real time and in an efficient manner, so that if the primary server or its databases become unavailable, the users are seamlessly switched over to the secondary server, which will have its own set of data updated and well synchronized.
Typically, it can be explained as follows:
1. We will have 2 database servers: A - Primary (acting as publisher) and B - Secondary (acting as subscriber). Our application will be initially connected to A.
2. When A becomes unavailable (for whatever reason), the application will fail over to B.
3. All the users will be switched to server B and the updates are done there, temporarily without being replicated to server A.
4. When A is back online, A needs to be brought up to date with B automatically (in other words, I shouldn't have to manually export all the data from B to A).
Our requirements are:
- The system should support bi-directional synchronization between both servers for their set of databases (the logging and configuration).
- There will be constant and heavy activity in the Logging Database, so if one server goes down the data should be logged and maintained as it is on the second server, and on fail-back no data loss should occur, with minimum latency.
- There could be a scenario where a server is failed over for a week, with constant logging each second! Once it fails back, the system should rapidly synchronize the data between the two servers' database sets without noticeable delay.
- The system should also work fine if a certain number of records are purged over a time period.
Our concern is, given the above scenario, how any SQL Server replication strategy can help us achieve these requirements.
Thanks
John
Jul 20, 2005
I know the default data path on a SQL7 server is defined in the registry key SQLDataPath. I want to be able to determine the default data path in a VB.NET application on the local machine and remote machines. Is using the Registry Class the best way to do that, or is there a SQL command that can tell me? I have read about xp_regread but I can't find it documented anywhere and I do not know what parameter list it is expecting. I thought this path may be in an Information Schema view, but I can't find it.
Thanks for any help.
Sep 10, 2006
Just wondering which scenarios are suitable for SQLCLR. Any kind of data access is not recommended, I guess. Only things that cannot be easily done in T-SQL should be done in SQLCLR, but why? Can't those things be done in the app layer itself? Scenarios recommended for SQL CLR:
- External data access like filesystem, registry etc
- Complex calculation
- Recursion without data access (this can be implemented with CTE for data access)
If data access with SQL CLR is not recommended, why should CLR even be used and the logic reside in the database layer? It makes no sense to me. Any thoughts?
Jun 29, 2006
Some of our databases have many transactions (a million or more) a day. I have read that every so often I need to rebuild indexes, update statistics for all tables (however that is done), and shrink the transaction logs.
I'm confused by all this. What are the recommended daily database maintenance steps for database "health", and how can they be done?
TIA,
barkingdog.
May 9, 2008
I did a search (google and on the forums) and found a few suggestions here and there, but I'd like something more complete to follow as far as naming conventions are concerned.
I wrote my first DB based on MySQL/Ruby/Active Record type naming convention...
- plural table names
- all lower cased
- underscores between words
- "id" is auto incrementer for each table
- something+"_at" is for datetime fields
- something+"_on" is for date fields
- referencing the primary id in another table is "tablename (singular)" + "_id".
This worked great in Ruby/MySQL, but in C#/SQL Server, it's an ambiguity nightmare! All of my "id" fields conflict, and a lot of my tables have "added_at" datetime fields that all conflict with each other. Essentially, any field that's named the same in one table as in another conflicts on joins.
For example: users post comments to stories submitted by users...
table = articles
field 1 = id
field 2 = title
field 3 = body
field 4 = user_id
table = comments
field 1 = id
field 2 = title
field 3 = body
field 4 = user_id
field 5 = article_id
Trying to join these two tables is an ambiguity nightmare but I'd like to not have to name every field uniquely or start adding table prefixes to them all...
I guess I just need some good suggestions or links to recommended table structure/naming conventions for SQL Server. Thanks in advance!
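For what it is worth, the clash between same-named columns in a join is normally handled in the query rather than in the schema: give each table a short alias, qualify every column with it, and rename the output columns with AS. The sketch below runs such a query against the articles/comments layout from the post, using Python's sqlite3 as a stand-in for SQL Server. The sample rows and the output names are invented for illustration.

```python
import sqlite3

def build_demo():
    """Create the two tables from the post with one sample row each."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT, "
        "user_id INTEGER)"
    )
    conn.execute(
        "CREATE TABLE comments (id INTEGER PRIMARY KEY, title TEXT, body TEXT, "
        "user_id INTEGER, article_id INTEGER)"
    )
    conn.execute("INSERT INTO articles VALUES (1, 'First story', 'text', 10)")
    conn.execute("INSERT INTO comments VALUES (5, 'Nice', 'text', 11, 1)")
    return conn

def comments_with_articles(conn):
    """Join on the foreign key, qualifying and renaming the clashing columns."""
    return conn.execute(
        """
        SELECT c.id    AS comment_id,
               c.title AS comment_title,
               a.id    AS article_id,
               a.title AS article_title
        FROM comments AS c
        JOIN articles AS a ON a.id = c.article_id
        ORDER BY c.id
        """
    ).fetchall()
```

With qualified names the tables can keep the short `id`, `title`, and `user_id` columns, and the calling code reads the aliased names from the result set.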
Jun 7, 2007
Hi,
I'm new to SSIS.
Is there any recommended reading material that you suggest to learn more how SSIS works?
Thank you!
May 7, 2007
We have just started using SQL 2005 and released our first few projects to production. We are currently using msdb storage for SSIS packages in production, with the 'rely on server storage' protection level, and separating each subject area by folders under msdb in Management Studio.
However, some of our DBAs feel that this is not the right approach and that we should be storing the packages as XML.
Does anyone have a recommendation for either, or considerations to take into account when deciding which storage to use?
Thanks!
Jun 6, 2007
Would anyone have a suggestion on how to setup a partner to partner NIC configuration for heartbeats/mirroring traffic? I've been told this is the recommended setup but have not found much on how to do it. We currently have a teamed NIC config for redundancy, but would like to have a separate set of NICs on each partner so that mirroring traffic is not interrupted by any regular network traffic.
We also have a witness running in full safety mode. Does this mean partnerA and partner B both need NICs with a crossover cable between them AND is it recommended for the witness to also have extra NICs to both partnerA and partnerB (w/ crossover cables)?
Any suggestions/help/links on properly configuring this would be appreciated.
Jun 9, 2007
What is the recommended data type I should use if I want to have a price field that can include "TBA"? I can't use smallmoney, I suppose, so should I use VARCHAR and then validate the string with Visual Studio?
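An alternative to VARCHAR that is sometimes suggested is to keep the column numeric and nullable, treat NULL as "not yet announced", and produce the "TBA" text only when displaying. Here is a small sketch of that idea using Python's sqlite3. The `products` table and its columns are hypothetical, and in SQL Server the column could stay smallmoney with NULLs allowed.

```python
import sqlite3

def display_price(price):
    """Format a stored price, showing 'TBA' when no price has been set."""
    return "TBA" if price is None else f"{price:.2f}"

def price_list(rows):
    """Store prices in a nullable numeric column and format them for display."""
    conn = sqlite3.connect(":memory:")
    # price is nullable: NULL means the price is still to be announced.
    conn.execute("CREATE TABLE products (name TEXT NOT NULL, price NUMERIC)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", rows)
    stored = conn.execute("SELECT name, price FROM products ORDER BY name").fetchall()
    conn.close()
    return [(name, display_price(price)) for name, price in stored]
```

Keeping the column numeric means sums, sorting, and range filters continue to work on the rows that do have a price.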
May 22, 2008
I have an app that will have up to 13 PC's. Each machine will be logging data to the 200X Express Server every 5 seconds. The data size sent each time will be around 1K each.
1. Is this within the limits of 200X Express?
2. At what point do you decide that Express is bottlenecked and Standard is needed?
Thanks for any help!
Ron Lindsey
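A quick back-of-the-envelope calculation helps frame question 1. The sketch below works out the daily volume from the figures in the post and compares it with a per-database size cap. The 4 GB default reflects the data file limit of SQL Server 2005 Express and 2008 Express (2008 R2 Express and later allow 10 GB). It treats 1K as 1024 bytes and ignores index and row overhead, so the real figure would be somewhat worse.

```python
def logging_load(machines=13, interval_s=5, row_bytes=1024, size_limit_gb=4):
    """Return (rows per day, MiB per day, days until the size limit is reached)."""
    rows_per_day = machines * (86400 // interval_s)
    bytes_per_day = rows_per_day * row_bytes
    days_to_limit = size_limit_gb * 1024**3 / bytes_per_day
    return rows_per_day, bytes_per_day / 1024**2, days_to_limit

rows_per_day, mib_per_day, days_to_limit = logging_load()
```

With the defaults this comes to 224,640 rows and roughly 219 MiB per day, so on these assumptions a 4 GB cap would be reached in under three weeks unless old rows are purged or archived. The insert rate itself (13 small rows every 5 seconds) is modest.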
Jun 5, 2006
As stated in the subject, I have a situation where, if database mirroring is employed for either manual or automatic failover, all the client connections (including web connections) use ODBC, not ADO or OLEDB etc., so what methods are recommended? Client-side redirect is not available, so I could not employ the "Data Source =A; Failover Partner=B..." option.
Right now the method employed (pre database mirroring and basically employing log shipping on SQL 2000) is to have a DNS alias for the ODBC connection so that if the server were to change in a failover situation the DNS record would have to be altered, so that all the client connections would not have to be reconfigured.
Regards,
Dominic Baines