Matching Relational Records. Is It Possible Using Data Mining?
Jun 20, 2006
Problem:
I am working on a price comparison system which finds the best prices for a purchase (or an order) from existing purchase data.
The order is stored in multiple tables including order details (stores major items purchased: e.g., PC) and order sub-details (optional items purchased with the major items: e.g., speakers, backup device, webcam etc.).
There could be a number of major items in an order and each major item could have multiple related sub items. The other variables that affect the price include trade-ins if any, sales going on at the time of order, number of units etc.
Now, for any new configuration (major items/related sub items), the system should be able to return a list of previous purchases made with similar configurations and similar variables (quantities, trade-ins etc.). Even if the same model is not present, similar PCs by the same vendor should be considered, and so on.
Questions:
Is this possible using Data mining?
If yes, which algorithm is recommended?
Also, can I assign/modify any kind of weights for certain variables (if same model: .6 ; if same model not available but PCs made by same manufacturer available: .3 ; by other manufacturers: .1)?
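The weighting idea in the last question can be sketched without any data mining tool at all. The following is only a hand-weighted similarity ranking in Python, not one of the SQL Server mining algorithms; the field names (`vendor`, `model`, `price`) and the sample data are assumptions made up for illustration, and only the weights .6/.3/.1 come from the question.

```python
# Weights from the question: same model .6, same manufacturer .3, other .1.
WEIGHTS = {"same_model": 0.6, "same_vendor": 0.3, "other": 0.1}

def model_weight(new_item, old_item):
    """Weight a past purchase by how closely its major item matches."""
    if new_item["model"] == old_item["model"]:
        return WEIGHTS["same_model"]
    if new_item["vendor"] == old_item["vendor"]:
        return WEIGHTS["same_vendor"]
    return WEIGHTS["other"]

def rank_purchases(new_item, history):
    """Return (weight, purchase) pairs, best match first."""
    scored = [(model_weight(new_item, old), old) for old in history]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical purchase history.
history = [
    {"id": 1, "vendor": "VendorA", "model": "A100", "price": 900},
    {"id": 2, "vendor": "VendorB", "model": "B200", "price": 850},
    {"id": 3, "vendor": "VendorA", "model": "A300", "price": 950},
]
ranked = rank_purchases({"vendor": "VendorA", "model": "A300"}, history)
ranked_ids = [purchase["id"] for _, purchase in ranked]
```

Other variables (quantity, trade-ins, sub items) could contribute further weighted terms to the same score.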
I have a strange request that might not be possible based on the laws of relational databases but I thought I'd give it a try.
I have three tables which for simplicity I will call A, B and C. Table A contains my master records, Table B contains user details and Table C contains some extra data.
In my initial search, joining A and B, I return 100 records. I then need to search Table C for these 100 records based on a criterion. The expected result should return all 100 rows, both the ones that match and the ones that do not. The problem is that not all 100 IDs exist in Table C, so some rows will have no corresponding record. Unfortunately, our users still want to see all 100 records in the output. Is this possible?
As always any help or direction would be appreciated.
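One common way to keep every row from the A/B join is a LEFT JOIN to C with the search criterion placed in the ON clause rather than the WHERE clause. A minimal sketch, using Python's sqlite3 as a stand-in for SQL Server; the column names and the `'keep'` criterion are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE c (id INTEGER, extra TEXT);
    INSERT INTO a VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO b VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO c VALUES (1, 'keep'), (2, 'skip');
""")

# The criterion on c lives in the ON clause, so rows of a/b that have
# no match in c (or a non-qualifying match) still come back, with NULLs.
rows = conn.execute("""
    SELECT a.id, b.user_name, c.extra
    FROM a
    JOIN b ON b.id = a.id
    LEFT JOIN c ON c.id = a.id AND c.extra = 'keep'
    ORDER BY a.id
""").fetchall()
```

Moving `c.extra = 'keep'` into a WHERE clause would filter out the NULL rows and turn the query back into an inner join.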
Hi, this is a WHERE clause I am using in a search:
WHERE (ADDRESS_STREET LIKE '%' + @Search + '%')
I am trying to do a search which returns the best-matching record. For example, if I have a record with Denver as text and I search for Denvr (the spelling error is intended), I will not get the result. How can I create a stored procedure to cope with probable spelling errors and return matching results in ranked order? Thanks
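Ranking by similarity instead of by substring match can be sketched as follows. This is application-side Python using the standard library's `difflib`, not a stored procedure; the street values and the 0.6 cutoff are assumptions. Inside SQL Server, the built-in SOUNDEX and DIFFERENCE functions are a possible starting point for the same idea.

```python
from difflib import SequenceMatcher

def rank_matches(search, candidates, cutoff=0.6):
    """Return candidates whose similarity to `search` is at least `cutoff`,
    most similar first."""
    scored = []
    for text in candidates:
        score = SequenceMatcher(None, search.lower(), text.lower()).ratio()
        if score >= cutoff:
            scored.append((score, text))
    return [text for score, text in sorted(scored, reverse=True)]

# Hypothetical ADDRESS_STREET values.
streets = ["Denver", "Boston", "Boulder", "Denver Ave"]
result = rank_matches("Denvr", streets)
```

Here the misspelled "Denvr" still ranks "Denver" first, which a plain LIKE '%Denvr%' cannot do.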
Init SC --- 89
Post NCOA --- 89
Post Supp --- 89
Revised Final State Counts --- 89
Revised Final State Counts --- 94
"Revised Final State Counts" appears in both cycles 89 and 94. How can I query the table so that I only get that one record?
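One way to find descriptions that occur in both cycles is GROUP BY with a HAVING clause that counts distinct cycles. A sketch with sqlite3 standing in for SQL Server; the table name `state_counts` and its column names are assumptions, since the post does not give them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state_counts (description TEXT, cycle INTEGER);
    INSERT INTO state_counts VALUES
        ('Init SC', 89), ('Post NCOA', 89), ('Post Supp', 89),
        ('Revised Final State Counts', 89),
        ('Revised Final State Counts', 94);
""")

# Keep only descriptions that appear in both cycle 89 and cycle 94.
rows = conn.execute("""
    SELECT description
    FROM state_counts
    WHERE cycle IN (89, 94)
    GROUP BY description
    HAVING COUNT(DISTINCT cycle) = 2
""").fetchall()
```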
There is something I don't understand. When I use join
SELECT r.CHECK_NUMBER, i.orig_file from (AP_INVOICEDOCS i join AP_DETAIL_REG r on r.PAYABLE_ID= i.PAYABLE_ID)
I am getting 76 orig_file records
But when I do
SELECT r.CHECK_NUMBER, i.orig_file from (AP_INVOICEDOCS i right outer join AP_DETAIL_REG r on r.PAYABLE_ID= i.PAYABLE_ID)
I am showing only 8 records under i.orig_file column and I am not sure why. What I need is to get all the AP_INVOICEDOCS in the matching orig_file records.
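The direction of an outer join decides which table's rows are preserved: a RIGHT OUTER JOIN in the query above keeps every AP_DETAIL_REG row, not every AP_INVOICEDOCS row. A small sketch with made-up data, using sqlite3 as a stand-in for SQL Server and writing both directions as LEFT JOINs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AP_INVOICEDOCS (PAYABLE_ID INTEGER, orig_file TEXT);
    CREATE TABLE AP_DETAIL_REG (PAYABLE_ID INTEGER, CHECK_NUMBER TEXT);
    INSERT INTO AP_INVOICEDOCS VALUES (1, 'a.pdf'), (2, 'b.pdf');
    INSERT INTO AP_DETAIL_REG VALUES (1, 'CHK-1'), (3, 'CHK-3');
""")

# Preserves every AP_INVOICEDOCS row; CHECK_NUMBER is NULL when unmatched.
keep_docs = conn.execute("""
    SELECT r.CHECK_NUMBER, i.orig_file
    FROM AP_INVOICEDOCS i
    LEFT JOIN AP_DETAIL_REG r ON r.PAYABLE_ID = i.PAYABLE_ID
    ORDER BY i.orig_file
""").fetchall()

# Equivalent to "i RIGHT OUTER JOIN r": preserves every AP_DETAIL_REG row,
# so orig_file is NULL wherever no invoice document matches.
keep_regs = conn.execute("""
    SELECT r.CHECK_NUMBER, i.orig_file
    FROM AP_DETAIL_REG r
    LEFT JOIN AP_INVOICEDOCS i ON r.PAYABLE_ID = i.PAYABLE_ID
    ORDER BY r.CHECK_NUMBER
""").fetchall()
```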
How do I return only the non-matching records from a left join? I am currently building a traffic management database to learn SQL.
I am checking for all parishes with no associated drivers. Currently I only have 2 of such.
The regular left join
select parish.name, driver.fname from parish left join driver on driver.parish=parish.name
Returns all the names of the parishes and the first name of the associated driver. The two parishes with no matches have NULL for the first name.
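Filtering the left join on the NULL side is the usual way to keep only the unmatched rows. A sketch with sqlite3 standing in for the real database; the parish and driver values are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parish (name TEXT PRIMARY KEY);
    CREATE TABLE driver (fname TEXT, parish TEXT);
    INSERT INTO parish VALUES ('Kingston'), ('Portland'), ('Trelawny');
    INSERT INTO driver VALUES ('Ann', 'Kingston');
""")

# After the left join, unmatched parishes have NULL in every driver column,
# so testing the join column for NULL leaves only those parishes.
rows = conn.execute("""
    SELECT parish.name
    FROM parish
    LEFT JOIN driver ON driver.parish = parish.name
    WHERE driver.parish IS NULL
    ORDER BY parish.name
""").fetchall()
```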
update wce_contact set blank = 'missing' where website in ('www.name1.co.uk','www.name2.co.uk','www.name3.co.uk')
I know this query will set 'blank' to 'missing' when it matches the above websites. However, if I wanted to set blank to 'missing' where mail1date is not null and mail2date is not null (and so on up to mail18date is not null), how exactly would I go about this?
I guess it would be a case of adding another bracket somewhere, but I'm unsure.
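The condition is just eighteen IS NOT NULL tests chained with AND. Typing them by hand works; the sketch below generates the statement in Python instead and checks it against an in-memory SQLite table. The `id` column and the sample rows are assumptions; the column names mail1date to mail18date follow the question.

```python
import sqlite3

# Build "mail1date IS NOT NULL AND ... AND mail18date IS NOT NULL".
columns = [f"mail{n}date" for n in range(1, 19)]
condition = " AND ".join(f"{col} IS NOT NULL" for col in columns)
update_sql = f"UPDATE wce_contact SET blank = 'missing' WHERE {condition}"

conn = sqlite3.connect(":memory:")
column_defs = ", ".join(f"{col} TEXT" for col in columns)
conn.execute(f"CREATE TABLE wce_contact (id INTEGER, blank TEXT, {column_defs})")

# Row 1 has all 18 dates filled in; row 2 is missing mail18date.
full = [1, None] + ["2006-01-01"] * 18
partial = [2, None] + ["2006-01-01"] * 17 + [None]
placeholders = ", ".join("?" * 20)
conn.executemany(f"INSERT INTO wce_contact VALUES ({placeholders})", [full, partial])

conn.execute(update_sql)
rows = conn.execute("SELECT id, blank FROM wce_contact ORDER BY id").fetchall()
```

Printing `update_sql` gives a statement that can be pasted into a query window.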
I'm hoping someone can tell me how to construct a stored procedure that deletes all records in tblA not matching the PK in tblB. This gives me the recordset of all records in tblA with no matching records in tblB (ID is the PK in tblB):
SELECT a.ID
FROM dbo.tblB b
RIGHT OUTER JOIN dbo.tblA a ON b.ID = a.ID
WHERE b.ID IS NULL
Thanks, lq
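The same anti-join can be expressed directly in a DELETE with NOT EXISTS. A sketch against an in-memory SQLite database standing in for SQL Server; the `note` column and sample rows are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblB (ID INTEGER PRIMARY KEY);
    CREATE TABLE tblA (ID INTEGER, note TEXT);
    INSERT INTO tblB VALUES (1), (2);
    INSERT INTO tblA VALUES (1, 'kept'), (2, 'kept'), (3, 'orphan');

    -- Delete every tblA row whose ID has no counterpart in tblB.
    DELETE FROM tblA
    WHERE NOT EXISTS (SELECT 1 FROM tblB WHERE tblB.ID = tblA.ID);
""")
remaining = conn.execute("SELECT ID FROM tblA ORDER BY ID").fetchall()
```

The DELETE statement could then be wrapped in a stored procedure body.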
I have a table (tblA) that records the RecordID, UserID and LastViewedDate (DateTime) of each record opened in tblB, where RecordID is the PK in tblB. I want to construct a query that groups all records in tblA by RecordID, filters by UserID, keeps only the most recent 25 RecordIDs and deletes the rest. This gets me a recordset of all RecordIDs filtered by UserID in tblA, but I can't figure out how to sort it by LastViewedDate DESC and eliminate those not in the top 25:
SELECT RecordID
FROM dbo.tblA
WHERE (UserID = 1234)
GROUP BY RecordID
Any help is appreciated! lq
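One possible shape for this is to order the grouped RecordIDs by their latest view date, keep the first N, and delete everything else for that user. The sketch uses sqlite3 with LIMIT; on SQL Server the subquery would use SELECT TOP 25 instead. The sample rows are invented and N is set to 2 to keep the demo small.

```python
import sqlite3

KEEP = 2  # the question keeps 25; a small value keeps the demo short

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblA (RecordID INTEGER, UserID INTEGER, LastViewedDate TEXT);
    INSERT INTO tblA VALUES
        (10, 1234, '2006-01-01'), (10, 1234, '2006-03-01'),
        (11, 1234, '2006-02-01'), (12, 1234, '2006-01-15'),
        (12, 9,    '2006-01-01');
""")

# Delete this user's rows whose RecordID is not among the KEEP most
# recently viewed RecordIDs (ranked by the latest view of each record).
conn.execute("""
    DELETE FROM tblA
    WHERE UserID = :user
      AND RecordID NOT IN (
          SELECT RecordID FROM tblA
          WHERE UserID = :user
          GROUP BY RecordID
          ORDER BY MAX(LastViewedDate) DESC
          LIMIT :keep)
""", {"user": 1234, "keep": KEEP})

kept = [row[0] for row in conn.execute(
    "SELECT DISTINCT RecordID FROM tblA WHERE UserID = 1234 ORDER BY RecordID")]
other_user_rows = conn.execute(
    "SELECT COUNT(*) FROM tblA WHERE UserID = 9").fetchone()[0]
```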
I have a database with thousands of records that contain personal details of customers. Some of these records pertain to the same customer - however, they have been submitted by different people, so they differ slightly in detail.
I've been looking to see if any of the data mining tools provided by Business Intelligence Studio in SQL Server 2005 will enable me to achieve a high degree of accuracy in matching records that pertain to the same customer. From what I can see, these tools seem more suited to making general predictions based on large groupings rather than the kind of precise prediction I am looking for.
So I'd appreciate it if anyone could tell me if there is any way I could use Business Intelligence Studio to match these 'duplicate' records together, or whether I will have to create a more SQL-based solution which attempts to match the customer records using SELECT statements and making assumptions about the data.
Hi Gurus! I have two tables, tbl_a and tbl_b. tbl_a has some records which are not in tbl_b, and I want to update tbl_b with those records from tbl_a. For example:
tbl_a   tbl_b
a       a
b       b
c       c
d       d
x
y
z
Now I want to update tbl_b with records 'x', 'y', 'z', leaving the matching records untouched. How can I do that? Thanks in advance!
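Adding only the missing rows is usually written as INSERT ... SELECT with a NOT EXISTS filter. A sketch with sqlite3 standing in for SQL Server; the single column name `val` is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_a (val TEXT);
    CREATE TABLE tbl_b (val TEXT);
    INSERT INTO tbl_a VALUES ('a'), ('b'), ('c'), ('d'), ('x'), ('y'), ('z');
    INSERT INTO tbl_b VALUES ('a'), ('b'), ('c'), ('d');

    -- Copy only the rows of tbl_a that tbl_b does not already have;
    -- existing tbl_b rows are not touched.
    INSERT INTO tbl_b (val)
    SELECT val FROM tbl_a
    WHERE NOT EXISTS (SELECT 1 FROM tbl_b WHERE tbl_b.val = tbl_a.val);
""")
result = [row[0] for row in conn.execute("SELECT val FROM tbl_b ORDER BY val")]
```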
Hi all, I am trying to create a diagram for our database. While creating it, I am adding some relationships which were not there (our original database is not really relational, which is why I am doing this). So sometimes I have to change a data type in order to create a relationship between columns in different tables, e.g. change char(16) to varchar(7) (I checked the field to make sure all the data in it is <= 7 characters).
But when I saved the diagram, there is an error message that state: Errors were encountered during the save process. Some of your database objects are not saved on your diagram.
'agent' table saved successfully 'VisitUSA' table - Unable to create relationship 'FK_VisitUSA_agent'. ODBC error: [Microsoft][ODBC SQL Server Driver][SQL Server]ALTER TABLE statement conflicted with COLUMN FOREIGN KEY constraint 'FK_VisitUSA_agent'. The conflict occurred in database 'CMC', table 'agent', column 'AgentCode'.
What does that mean? Is it caused by some of the AgentCode data in the VisitUSA table not being in the agent table? Thanks! Betty
I am doing some analysis on our customer base and their payment profiles. I have generated two profile strings, one for whether the balance of an account has gone up or down and one for the size of the balance in relation to the normal invoice amount for the customer. So (for example) the balance movement string will look like this:
UUUDUUUDUUUD-D00 Where U = Up, D = Down, - = no change and 0 = no change and no balance
I want to analyse these strings in two ways. The first is that I want to find customers with a similar pattern: in the example below the first and last patterns are the same, just one out of sync but should be considered the same
Movement      Multiple      CountRecords
UUUDUUUDUUUD  123012301230  1175
------------  000000000000  1163
UDUUUDUUUDUU  301230123012  1082
The second type of analysis is to find customers whose pattern has changed: in the examples above the patterns are repeated and therefore 'normal' in the records below the patterns have changed in that the first part does not match the second part.
Movement      Multiple      CountRecords
UUDUUUDUUUUU  -----------0  7
UDUUUDUUUUUU  ------------  7
Is there a good way to approach this without either a cursor or hidden RBAR? The challenge as I see it is that I have to interrogate every string to find out whether there is a repeating pattern and, if so, where it starts and how long it is (this is heuristic, because some strings will start with a repeating pattern which may then change or deteriorate). I then have to compare the string across N groups of repeating characters to see if and when it changes. I can't think of an efficient method to do this in SQL because it is not a set-based operation.
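Both checks are easy to state outside SQL. The sketch below is plain Python, offered only as an illustration of the logic and not as a set-based solution: two strings are treated as the same pattern if one is a rotation of the other, and a string counts as "repeating" if its smallest period fits at least twice. That last rule is an assumed heuristic, not something the post specifies.

```python
def canonical_rotation(s):
    """Smallest rotation of s in lexicographic order; rotations of the
    same cycle share this value."""
    return min(s[i:] + s[:i] for i in range(len(s)))

def same_cycle(a, b):
    """True if b is a rotation of a (same pattern, out of sync)."""
    return len(a) == len(b) and canonical_rotation(a) == canonical_rotation(b)

def smallest_period(s):
    """Smallest p such that s[i] == s[i - p] for every i >= p."""
    for p in range(1, len(s) + 1):
        if all(s[i] == s[i - p] for i in range(p, len(s))):
            return p

def has_repeating_pattern(s):
    """Assumed heuristic: the period must fit into the string at least twice."""
    return smallest_period(s) <= len(s) // 2

in_sync = same_cycle("UUUDUUUDUUUD", "UDUUUDUUUDUU")
normal = has_repeating_pattern("UUUDUUUDUUUD")
changed = not has_repeating_pattern("UUDUUUDUUUUU")
```

With the example strings from the post, the first and last movement patterns compare as the same cycle, and the string whose second half differs is flagged as changed.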
Folks, using Northwind as an example: the parent table is derived from Categories. I added a new column, E-Mail, and am selecting rows where Category ID <= 3. Here is my data.
Category ID Category Name Category E-mail
1 Beverages Beverages.com
2 Condiments Condiments.com
3 Confections
The child table is derived from Products. I am selecting rows where Category ID <= 3. Here is my sample data.
Category ID Product Name Quantity Per Unit
1 Chang 24 - 12 oz bottles
1 Côte de Blaye 12 - 75 cl bottles
1 Ipoh Coffee 16 - 500 g tins
1 Outback Lager 24 - 355 ml bottles
2 Aniseed Syrup 12 - 550 ml bottles
2 Chef Anton's Gumbo Mix 36 boxes
2 Louisiana Hot Spiced Okra 24 - 8 oz jars
2 Northwoods Cranberry Sauce 12 - 12 oz jars
3 Chocolade 10 pkgs.
3 Gumbär Gummibärchen 100 - 250 g bags
3 Maxilaku 24 - 50 g pkgs.
3 Scottish Longbreads 10 boxes x 8 pieces
3 Sir Rodney's Scones 24 pkgs. x 4 pieces
3 Tarte au sucre 48 pies
I would like to read the first Category ID and Category E-Mail from the Categories table (i.e. Category ID = 1) and find that in the Products table. If there is a match, extract the matching records for that category from both tables (Categories.CategoryID, Products.ProductName, Products.QuantityPerUnit) and e-mail them to the e-mail address from the parent (Categories) table. If no e-mail address is listed, do not create an output file; in this instance that is Category ID = 3. Basically, I want to select the first record from the parent table (here, Categories), search for all matching products in the Products table, and create and send an e-mail with just those matching records. Then repeat the same process for the remaining rows of the Categories table. I am expecting my e-mail output to look like this:
For Category Id: 1
2 Northwoods Cranberry Sauce 12 - 12 oz jars
I am not extracting the data for any user interface (i.e. GridView, FormView etc.). I will just create a command button in an ASP.NET 2.0 form to extract the data. My tables are in SQL 2005. I was thinking of reading the category records in a DataReader and, within the while loop, calling an SP to retrieve the matching records from the Products table. If matching records are found, call System SP_Mail to send the e-mail. The drawback is that for every category record (within the while loop) I need to call my SP to get the products data. Will that be overkill? Ideally I would like to extract my records with one call to an SP. Is there any way I can run a while loop inside the SP and extract child data based on the parent record? Any help, sample URL or tutorial page will be appreciated. Thanks
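One round trip can be enough if the parent and child rows are fetched with a single join and then split per category on the client. A sketch in Python with sqlite3 standing in for SQL 2005; the column name `Email` and the reduced sample data are assumptions, and actually sending the mail is left out.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY,
                             CategoryName TEXT, Email TEXT);
    CREATE TABLE Products (CategoryID INTEGER, ProductName TEXT,
                           QuantityPerUnit TEXT);
    INSERT INTO Categories VALUES
        (1, 'Beverages', 'Beverages.com'),
        (2, 'Condiments', 'Condiments.com'),
        (3, 'Confections', NULL);
    INSERT INTO Products VALUES
        (1, 'Chang', '24 - 12 oz bottles'),
        (1, 'Ipoh Coffee', '16 - 500 g tins'),
        (2, 'Aniseed Syrup', '12 - 550 ml bottles'),
        (3, 'Chocolade', '10 pkgs.');
""")

# One query returns all parent/child pairs; categories without an
# e-mail address are filtered out up front.
rows = conn.execute("""
    SELECT c.Email, c.CategoryID, p.ProductName, p.QuantityPerUnit
    FROM Categories c
    JOIN Products p ON p.CategoryID = c.CategoryID
    WHERE c.CategoryID <= 3 AND c.Email IS NOT NULL
    ORDER BY c.CategoryID, p.ProductName
""").fetchall()

# Split the ordered result into one message body per category.
messages = {}
for (email, _cat_id), group in groupby(rows, key=lambda r: (r[0], r[1])):
    lines = [f"{cid} {name} {qty}" for _, cid, name, qty in group]
    messages[email] = "\n".join(lines)
```

Each entry of `messages` is then the body for one e-mail, so the mail call happens once per category without a second query.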
I have an Excel file which contains some data. I want to load that into a SQL server Table. Here are my conditions :
1. If the table doesn't have any matching records from the Excel file, then my DFT should load the data from that Excel to the Dest Table.
2. If the table has one or more matching records, then the DFT should not process at all; instead I should send an email to the business stating that there are some matching records and hence the package was not processed.
P.S. If i use Lookup, I have two matching and non-matching output. which will process the non matching records into the table and matching can be redirected to any flat/Excel file. But i don't want to do this. I just want to lookup the Sql Server table and excel.
It'll be good if there is an additional option in the Lookup "Fail component on matching records".
I have a few questions for you guys. I have a client application that can be offline or online. While offline, records can be added and need to be later synced to production.
I will use RDA to pull the table down, and this is working fine. Now what if I have multiple tables where I want a foreign key relationship?
With RDA I can only pull down one table at a time, from everything I've read. Now say I create a constraint after pulling the two or more tables down. While in offline mode I create a new record in two separate tables with a foreign key/primary key relationship.
When I do the push to the server will it automatically update the foreign key reference (locally) to the right one on the production server? Or will I get a duplicate primary key error? On the production server the primary key will be different because of the identity. This is very important because I will have multiple clients.
create table a (id int, name varchar(10)); create table b(id int, sal int); insert into a values(1,'John'),(1,'ken'),(2,'paul'); insert into b values(1,400),(1,500);
select * from a cross apply( select max(sal) as sal from b where b.id = a.id)b;
Below is the result for the same:
id  name  sal
1   John  500
1   ken   500
2   paul  NULL
Now I'm not sure why the record with ID 2 is returned when using CROSS APPLY. Shouldn't it be excluded with CROSS APPLY and only displayed when using OUTER APPLY?
One thing that I noticed was that if you remove the Aggregate function MAX then the record with ID 2 is not shown in the output. I'm running this query on SQL Server 2012.
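The behaviour follows from how aggregates work: an aggregate without GROUP BY always returns exactly one row, even over an empty input, so the applied subquery is never empty and CROSS APPLY has a row to pair with ID 2. SQLite has no APPLY operator, so the sketch below only shows that underlying rule on the subquery itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (id INTEGER, sal INTEGER);
    INSERT INTO b VALUES (1, 400), (1, 500);
""")

# Aggregate over zero matching rows: still one result row, holding NULL.
with_max = conn.execute("SELECT MAX(sal) FROM b WHERE id = 2").fetchall()

# Same filter without the aggregate: no rows at all.
without_max = conn.execute("SELECT sal FROM b WHERE id = 2").fetchall()
```

Filtering on the aggregated value in the outer query (for example `WHERE b.sal IS NOT NULL`) would drop the ID 2 row again.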
I am just now starting the switch from .NET 1.1 to .NET 2.0. I really like the new way of using the SqlDataSource and setting up views declaratively as opposed to doing it all in code, which brings me to my question.
In some of my applications I have a single stored procedure return multiple result sets to a single DataSet where I have a DataRelation set up. Then I can have nested DataGrids that use the GetChildRows() method to filter the results to display the hierarchical data. I would like to do something similar with the SqlDataSource and GridViews but haven't found a way to get multiple result sets.
One thought I had was to create a strongly typed DataSet and then use the ObjectDataSource object, but I still didn't see a way to get child rows out of the data source. I've seen an example that uses a <FilterParameter> to get nested data, but there is an extra trip made to the server for each parent item, as it just puts an extra parameter in the WHERE clause of the query.
Hi, please help with this simple problem: DTS transfer or any other method?
I have a Customer_Order table and a Customer_Order_Details table. For OrderID = 1, I have 3 rows of order details. I want to transfer the Customer_Order table for OrderID 1 in DTS; the system should transfer the order table row as well as the order details table rows for OrderID = 1. How can I customize this in DTS, or is there any other way to transfer this data from the source DB to the destination DB?
KAMAL KUMAR V
Hi, I am trying to use this query to pull all the publications stored in the database and all the authors contributing to each publication (a one-to-many relationship). I am trying to use a subquery so that I can display the results on one row of a GridView (including a consecutive list of all the authors). However, I am receiving this error: Incorrect keyword near the word SET.
Maybe I need to add a temp column in the subquery to pull all the related authors for a single publication, but I don't know the SQL for this. Can anyone help?
Thanks
SELECT ISNULL(Publication.month, '') + ' ' + ISNULL(convert(nvarchar, Publication.year), '') as SingleColumn, Publication.publicationID, Publication.title
FROM Publication
WHERE Publication.publicationID IN (SELECT (convert(nvarchar, Authors.authorName)) FROM Authors INNER JOIN PublicationAuthors ON Authors.authorID = PublicationAuthors.authorID)
AND Publication.typeID IN (SELECT PublicationType.typeName FROM PublicationType INNER JOIN PublicationType ON Publication.typeID = PublicationType.typeID
Hello everyone, this is my first post here so hopefully I am not asking a common question.
I am trying to create a flat dataset in SQL 2005. Basically I run a query and I get multiple rows for the same primary key. The query I am running is quite large and has several different tables connected to it, here is a small sample of what it looks like...
Typeid(Primary Key) Individual Address
1 Sam 912 Ave. J
1 Sam 913 Ave. Q
1 Sam 914 Ave. R
2 Mike 1000 Ave. O
3 Jill 1001 Ave. O
I want it to kind of look like this
TypeID Individual Address_1 Address_2 Address_3
1 Sam 912 Ave. J 913 Ave. Q 914 Ave. R
2 Mike 1000 Ave. O
3 Jill 1001 Ave. O
As I said before, this query is pretty big, and has several variables like Address where multiple rows are being taken by one Primary Key.
If it is not possible to do this in SQL 2005, is there a program that may be able to? Right now we are using SPSS as a sort of band-aid: we run the query in small portions like the one in the sample and then restructure the results in sections, but this takes several hours.
Anyways, thanks for any help that you may be able to provide.
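Flattening like this is usually done by numbering the rows within each key and then pivoting with conditional aggregation. A sketch using sqlite3 as a stand-in; the table name `people` is an assumption, and the row number is computed with a correlated COUNT so it runs on any SQLite version. SQL Server 2005 offers ROW_NUMBER() OVER (PARTITION BY ...) for the numbering step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (typeid INTEGER, individual TEXT, address TEXT);
    INSERT INTO people VALUES
        (1, 'Sam', '912 Ave. J'), (1, 'Sam', '913 Ave. Q'),
        (1, 'Sam', '914 Ave. R'), (2, 'Mike', '1000 Ave. O'),
        (3, 'Jill', '1001 Ave. O');
""")

# rn numbers the addresses within each typeid (1, 2, 3, ...);
# MAX(CASE ...) then moves each numbered address into its own column.
rows = conn.execute("""
    SELECT typeid, individual,
           MAX(CASE WHEN rn = 1 THEN address END) AS address_1,
           MAX(CASE WHEN rn = 2 THEN address END) AS address_2,
           MAX(CASE WHEN rn = 3 THEN address END) AS address_3
    FROM (SELECT p.typeid, p.individual, p.address,
                 (SELECT COUNT(*) FROM people q
                  WHERE q.typeid = p.typeid
                    AND q.address <= p.address) AS rn
          FROM people p) AS numbered
    GROUP BY typeid, individual
    ORDER BY typeid
""").fetchall()
```

The number of Address_N columns has to be fixed in advance; a variable number would need dynamically generated SQL.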
I have played around with SSIS in addition to reading an SSIS book front to back, but I am still a little confused as to how to import an Xml file with relational data.
Basically I want to import the Xml data into three tables: categories, products and fields. A product can belong to one or more categories and has one or more fields which store information about the product.
Using the Xml Source component I can load the Xml from the file, but I can only output one section (category, product or field) at a time. Since the relationship is inferred from the hierarchical structure of the Xml (e.g. the fields don't store an ID of the product they belong to), I am not sure how to import the relationships into my tables.
If anyone has any tips on how I can go about that, then it would be most appreciated :)
Ok, I have a page on my website where we can add products to our database. We are a music store, and most products have different versions or colors. I've created 2 tables, Products and Subproducts. The Products table may hold info like Fender Stratocaster, and the Subproducts table would hold colors (Blue, Sunburst, etc). The Subproducts table has an integer field called MainProductID, which is linked to the RecordID field of the main products table.
So far the page uses a wizard which first creates the main product using a SQL data source. After the data has been added to the main products table, my page gives you the opportunity to add different sub products. The problem I am having is feeding the RecordID from the main products table into my insert parameter on the sub products data source. This is what I have tried so far. There is a FormView on the page that is bound to the main products table; after the entry is created I can physically see the info on my screen, so I know the data is there at my disposal:
SubProductsDataSource.InsertParameters.Add("@MainProductID", Formview1.datakey.item("RecordID"))
SubProductsDataSource.Insert()
Using this adds the data to the table, but the MainProductID is null.
Also, is there a cheap little way to refresh a page? When I upload the product images, it goes to the next step where you are supposed to be able to see the images you uploaded, but I don't see them, which makes me think that the page is loading faster than the images are uploading. Thanks
Hi all, I tried the following piece of code in SQL 2005 and it works fine: Select * from Table FOR XML RAW('RECORDS'), ELEMENTS, ROOT('MyTable'). But when I tried the same thing in SQL 2000, it did not work. Please suggest a way to get XML output with a custom root node name in 2000, as in 2005. Thanks in advance. Mohit
As people say, Microsoft was the first major database vendor to include data mining features in a relational database. What does this mean exactly? Thanks a lot for any guidance.
Hello, I am a beginner with ASP.NET and SQL Server. I used the full version of SQL Server Management Studio to export my aspnetdb, which was created by VS2005, to my host's SQL Server. I have a question: the relational tables are no longer relational. I noticed this when I created a database diagram. What went wrong with the export? Thanks for your help.
My input is a flat file source and it has spaces in a few columns of the data. These columns are linked to another table as a foreign key, and when I try loading them into a relational structure a foreign key violation occurs. Is there a standard method to replace these spaces?
What approach should I take so that the data gets loaded into a relational structure?
for example
Name Age Salary Address dsds 23 fghghgh
Salary description level 2345 nnncncn 4
Salary is used here only as an example; in the real scenario the data type is char.
What approach should I take to cleanse the spaces while loading the data in SSIS?
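A common cleansing step is to trim the column and turn an all-blank value into NULL, which a nullable foreign key column accepts. The sketch shows the expression with sqlite3; the staging table is hypothetical. On SQL Server 2005 the equivalent expression would be NULLIF(LTRIM(RTRIM(col)), ''), and in SSIS the same logic could presumably live in a Derived Column transformation before the destination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging (name TEXT, age INTEGER, salary TEXT);
    INSERT INTO staging VALUES ('dsds', 23, '   '), ('abcd', 30, '2345');
""")

# TRIM removes the padding; NULLIF turns the resulting empty string
# into NULL so it no longer violates the foreign key.
cleaned = conn.execute("""
    SELECT name, NULLIF(TRIM(salary), '') AS salary
    FROM staging
    ORDER BY name
""").fetchall()
```

If the foreign key column does not allow NULL, the blanks would instead need to map to a placeholder row in the referenced table.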
When I deploy the cube which is sitting on my PC (local) the following 4 errors come up:
Error 1 The datasource , 'AdventureWorksDW', contains an ImpersonationMode that that is not supported for processing operations.
Error 2 Errors in the high-level relational engine. A connection could not be made to the data source with the DataSourceID of 'Adventure Works DW', Name of 'AdventureWorksDW'.
Error 3 Errors in the OLAP storage engine: An error occurred while the dimension, with the ID of 'Customer', Name of 'Customer' was being processed.
Error 4 Errors in the OLAP storage engine: An error occurred while the 'Customer Alternate Key' attribute of the 'Customer' dimension from the 'Analysis Services Tutorial' database was being processed.
I have a task in which I have to match some attributes (required for creating a new database) against the existing database: are these attributes present in the existing database and, if so, how many are and how many are not? Please reply.
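If the attributes are column names, the comparison comes down to set intersection and difference. A tiny Python sketch with invented names; on SQL Server the existing column names could be read from INFORMATION_SCHEMA.COLUMNS.

```python
# Hypothetical attribute lists; real ones would come from the new design
# and from the existing database's metadata.
required = {"CustomerID", "Name", "Email", "Phone"}
existing = {"CustomerID", "Name", "Address"}

present = required & existing   # required attributes that already exist
missing = required - existing   # required attributes that do not exist yet

summary = (len(present), len(missing))
```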