Can anyone tell me how Business Intelligence Studio calculates the importance of a rule? I can't find the formula. I know some formulas, but the results in SQL Server are completely different.
I understand Mr. MacLennan's explanation provided at http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=282651&SiteID=1 and appreciate the time he took to explain how importance works. However, like the user with username "sang", I also ran the data in BI 2005 and got the same results listed by the aforementioned user. I did this using the following data:
donut muffin
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
n y
n y
n y
n y
n y
etc.
The rule muffin -> donut has an importance of -0.105302438, which does not match Mr. MacLennan's results. I tried switching the roles of a and b in a -> b and using different bases for the logarithms, but none of these gives -0.105302438. I also tried to calculate importance on a small data set of my own and can't reproduce the results with Mr. MacLennan's explanation on that data set either. Any thoughts on the discrepancy?
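When comparing candidate formulas by hand, it helps to compute them from the four cell counts of a 2x2 table. The minimal Python sketch below computes one commonly cited form of rule importance, log(P(B|A) / P(B|not A)), with an optional additive (Laplace-style) smoothing term. Both the formula and the smoothing are assumptions for illustration, not a confirmed description of what SQL Server computes. For muffin -> donut, the first two counts (15 and 5) match the "y y" and "n y" rows shown above; the two counts for cases without a muffin are hypothetical, since the posted data is truncated ("etc.").

```python
import math

def rule_importance(n_ab, n_a_notb, n_nota_b, n_nota_notb, base=10.0, smoothing=0.0):
    """Importance of rule A -> B computed as log(P(B|A) / P(B|not A)).

    ASSUMPTION: this formula and the additive smoothing are illustrative;
    they are not confirmed to match SQL Server's internal calculation.
    A smoothing value > 0 avoids log(0) when a count is zero.
    """
    # P(B | A): share of cases containing A that also contain B
    p_b_given_a = (n_ab + smoothing) / (n_ab + n_a_notb + 2 * smoothing)
    # P(B | not A): share of cases without A that contain B
    p_b_given_not_a = (n_nota_b + smoothing) / (n_nota_b + n_nota_notb + 2 * smoothing)
    return math.log(p_b_given_a / p_b_given_not_a, base)

# A = muffin, B = donut. 15 and 5 come from the rows above;
# 10 and 10 (cases without a muffin) are hypothetical.
for b in (10.0, math.e, 2.0):
    print(b, rule_importance(15, 5, 10, 10, base=b))
```

Trying several bases and smoothing values this way makes it quick to see whether any combination reproduces a reported score.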
Hi, I have an exercise using association data mining. My database has 350 records; I use 90 of them for data mining, and the model produces some rules, from which I choose the top ones by MSOLAP_NODE_SCORE. But when I use a SELECT statement to check a result, only 1 record agrees with the rule and 5 records do not. For example, for the rule A=a, B=b -> C=c:
select * from <my_table> where A='a' and B='b' and C='c'; ==> 1 record returned
select * from <my_table> where A='a' and B='b' and C<>'c'; ==> 5 records returned
C has 3 values (c1, c2, c); in the result of the second statement, C is c1 in 2 records and c2 in 3 records.
I don't understand how this works. I want to choose the best rules to represent my database. How should I choose importance and probability to get the best rules? For a database with 90 records and one with 350 records, which values should I use for MINIMUM_PROBABILITY, MINIMUM_SUPPORT, and MINIMUM_IMPORTANCE? When I choose rules, should I rank them by importance or by probability?
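To check a rule by hand, it helps to compute its support and probability over the same cases the model was trained on (here, the 90 training records rather than the full 350-row table). The minimal Python sketch below treats probability as the share of left-hand-side cases that also satisfy the right-hand side. The function name and the list-of-dicts data layout are hypothetical, and SQL Server's reported figures may differ if it applies smoothing or was trained on a different subset.

```python
def rule_stats(cases, lhs, rhs):
    """Return (support, probability) for the rule lhs -> rhs.

    cases: list of dicts mapping column name -> value (one dict per case).
    lhs, rhs: dicts of column -> required value.
    support: number of cases matching both sides (an absolute count).
    probability: support divided by the number of cases matching lhs.
    """
    def matches(case, cond):
        return all(case.get(col) == val for col, val in cond.items())

    lhs_cases = [c for c in cases if matches(c, lhs)]
    both = [c for c in lhs_cases if matches(c, rhs)]
    probability = len(both) / len(lhs_cases) if lhs_cases else 0.0
    return len(both), probability

# The check from the post: 1 case with C='c' and 5 with C<>'c'
# (2 with c1, 3 with c2), all having A='a' and B='b'.
cases = ([{"A": "a", "B": "b", "C": "c"}]
         + [{"A": "a", "B": "b", "C": "c1"}] * 2
         + [{"A": "a", "B": "b", "C": "c2"}] * 3)
print(rule_stats(cases, {"A": "a", "B": "b"}, {"C": "c"}))  # (1, 0.16666666666666666)
```

Measured over those six rows, the rule holds in only 1 of 6 matching cases; running the same check against just the 90 training records shows whether the model saw a different picture.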
I have run into somewhat of a "duh" question. I'm using association rules for a basket analysis, and I'm trying to get the probability of each prediction. I know this is wrong, but how do I go about running PredictProbability on each ProductPurchase prediction?
When I run the below DMX query, I get this error message...
Error (Data mining): the dot expression is not allowed in the context at line 5, column 25. Use sub-SELECT instead.
Thanks in advance...
-Young K
SELECT
  t.[AgeGroupName],
  t.[ChildrenStatusName],
  (Predict([Basket Analysis AR].[Training Product], 3)) as [ProductPurchases],
  (PredictProbability([Basket Analysis AR].[Training Product].[ProductName])) as [ProductPurchases]
From [Basket Analysis AR]
PREDICTION JOIN
  OPENQUERY([DM Reports DM],
    'SELECT [AgeGroupName], [ChildrenStatusName]
     FROM [dbo].[DM.BasketAnalysis.Contact]
     WHERE isTrainingData = 0') AS t
ON [Basket Analysis AR].[Age Group Name] = t.[AgeGroupName]
AND [Basket Analysis AR].[Children Status Name] = t.[ChildrenStatusName]
I haven't been able to find a DMX query which will spit out the cases which support a particular association rule. I was hoping it would work sort of like drillthrough but show only the cases supporting a particular rule. Am I missing something?
What I ended up doing was extracting the itemsets of the rule from the model's content then running a SQL query to retrieve the cases that contain both the left-hand and right-hand itemset of the rule. I'm hoping there's a better way.
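The workaround described above amounts to a subset test: a case supports the rule when its item set contains every item of both the left-hand and right-hand itemsets. A minimal Python sketch of that test follows; the function name and the basket data are hypothetical, and in practice the same filter would be expressed as the SQL query against the case tables.

```python
def supporting_cases(cases, lhs, rhs):
    """Return the sorted keys of cases whose items include all of lhs and rhs.

    cases: dict mapping case key -> set of items (e.g. a customer's products).
    """
    required = set(lhs) | set(rhs)
    return sorted(key for key, items in cases.items() if required <= items)

# Hypothetical baskets keyed by case id
baskets = {
    1: {"bread", "milk", "eggs"},
    2: {"bread", "milk"},
    3: {"milk", "eggs"},
    4: {"bread", "eggs"},
}
print(supporting_cases(baskets, {"bread"}, {"milk"}))  # [1, 2]
```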
I read somewhere that market basket analysis finds rules involving substitutes as often as rules involving complements, because of a consumer behavior called "horizontal variety seeking". This is when customers buy more than one product in the same category even though the products are substitutes. For example, when people buy soda at the grocery store, they buy Coke and Sprite at the same time even though they are substitutes for each other. I was wondering if anyone has experience with this anomaly and how they solved it. I found a time series model called the vector autoregressive (VAR) model, which is used to estimate price elasticities over a time period. Does anyone have experience working with the VAR model? I am having trouble figuring out what some of the variables in the model are.
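For reference, this is the standard form of a vector autoregressive model of order p, VAR(p). In a price-elasticity setting the vector of variables would typically hold, for example, the (log) sales and (log) prices of the products being compared; that choice of variables is an assumption here, not something stated in the post.

```latex
\mathbf{y}_t = \mathbf{c} + \sum_{i=1}^{p} A_i \, \mathbf{y}_{t-i} + \boldsymbol{\varepsilon}_t
```

Here y_t is the k x 1 vector of variables observed at time t, c is a k x 1 vector of intercepts, each A_i is a k x k matrix of lag coefficients, p is the number of lags, and epsilon_t is a k x 1 white-noise error vector. The off-diagonal entries of the A_i capture how one series (say, the price of Sprite) feeds into another (say, sales of Coke).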
What is the algorithm that generates the itemsets in the Association model? I'm looking to possibly use this part of the Association algorithm (i.e. the grouping into itemsets) in a separate plug-in algorithm.
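A later post in this thread states that Microsoft Association Rules uses the Apriori algorithm. As a hedged illustration of that family of techniques (not Microsoft's implementation), here is a minimal Python sketch of Apriori-style frequent-itemset generation: count single items, then repeatedly join frequent (k-1)-itemsets into k-item candidates, prune any candidate that has an infrequent subset, and keep those that meet a minimum support count. The min_support parameter here is an absolute case count chosen for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count individual items
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have k items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        # Count step: keep candidates that reach the minimum support
        frequent = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                frequent[c] = n
        result.update(frequent)
        k += 1
    return result

# Hypothetical baskets
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for itemset, n in sorted(apriori(baskets, 3).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The prune step is what keeps the candidate sets small: any superset of an infrequent itemset is discarded without being counted.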
I need to create a set of cases for a project that uses the Microsoft Association Rules algorithm to recommend products to customers. My question is: must the set of cases include all customer transactions for training, or is some percentage of the total transactions sufficient? If I do not use all customer transactions, could the algorithm leave some products out of its itemsets or rules and therefore be unable to make recommendations for them?
MS uses the Apriori algorithm in Association Rules, while other DM software has moved to the Novel Algorithm. Can you tell us why MS decided to stay with Apriori? Did you overcome the limitations it's accused of having? Thanks!
In association rules, each rule has a [support, confidence] pair. In Microsoft Association Rules, each rule has a [probability, importance] measure instead, and importance can be greater than 1.
I found the following in MSDN, but I'm not sure I understood it correctly.
MINIMUM_PROBABILITY: Specifies the minimum probability that a rule is true. For example, setting this value to 0.5 specifies that no rule with less than fifty percent probability is generated. The default is 0.4.
MAXIMUM_SUPPORT: Specifies the maximum number of cases in which an itemset can have support. If this value is less than 1, the value represents a percentage of the total cases. Values greater than 1 represent the absolute number of cases that can contain the itemset. The default is 1.
My questions are: 1) Can I express [probability, importance] in terms of [support, confidence]? If yes, how? 2) What does importance > 1 mean?
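A hedged reading of these definitions for a rule A -> B: Microsoft's "probability" of a rule corresponds to the classical confidence, and SQL Server reports support as a number of cases rather than a fraction (the MAXIMUM_SUPPORT note above about values greater than 1 being absolute counts points the same way). Importance is a logarithmic score rather than a probability, so it is not bounded by 1. If it is a base-10 log of some ratio r, as in the lift-based explanation later in this thread, then importance > 1 simply means r > 10. The exact ratio SQL Server uses is not confirmed here.

```latex
\begin{aligned}
\text{support}(A \Rightarrow B) &= \operatorname{count}(A \cup B)
  \quad \text{(cases containing every item of } A \text{ and } B\text{)} \\
\text{probability}(A \Rightarrow B) &= \text{confidence}(A \Rightarrow B)
  = P(B \mid A) = \frac{\operatorname{count}(A \cup B)}{\operatorname{count}(A)} \\
\text{importance} &= \log_{10} r > 1 \iff r > 10
\end{aligned}
```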
I am trying to associate the city in the patient table with the disease in the diseases table. I want to build an association data mining model and use it on a web form, so that when the user enters a city, the associated diseases are displayed.
Should I select all 3 tables to build the model? Could you help me decide which tables to select as the case table and which as nested tables? Which attributes from the tables should I select as key, input, and predictable?
I am using the data mining tutorials on sqlserverdatamining.com to build this model. If I run into further confusion while building the model, please suggest where I can find a complete resource, or reply here.
I appreciate Mr. Jamie's guidance so far on my academic project. I do have the book 'Data Mining with SQL Server 2005'. I have just one day left to do this and document it.
I'm hoping someone can offer suggestions. Your help is much appreciated.
managed plug-in framework that's available for download here: http://www.microsoft.com/downloads/details.aspx?familyid=DF0BA5AA-B4BD-4705-AA0A-B477BA72A9CB&displaylang=en#DMAPI.
This package includes the source code for a sample plug-in algorithm written in C#.
In this source code, all the .cs files are customized for a clustering algorithm.
If my plug-in algorithm is an association or classification algorithm, what modifications are required in the source code?
I have a table containing call records and made a mining model from that table only. The model has 3 columns: calling_number, called_number, and target_operator, and uses the Association Rules algorithm. The key is calling_number, the input is operator, and the predictable column is called_number.
The result shows no rules, but there are itemsets of size 1 and size 2. In the top row of the result, SQL Server reports a support of 1891 for called_number = 1891 and operator = 'INDOSAT'.
I queried the table with this query:
SELECT DISTINCT calling_number FROM call_records WHERE called_number = '07786000815' AND target_operator = 'INDOSAT';
It returns 2162 records instead of 1891. If I remove the DISTINCT qualifier, SQL Server returns 2159 records. Why does this differ from the mining result?
Hi, I am using SQL Server 7.0 with SP2, and I just started as a SQL DBA. I have a question: what is the importance of SIDs? When we map SQL logins to user IDs, what do we need to pay attention to regarding SIDs? Please suggest a good article or give me some suggestions...
Hi all, there is a file in the C:\Program Files\Microsoft SQL Server\MSSQL\Binn directory called sqlctr.h, which contains a lot of counter parameters. Could anyone tell me its importance, and whether we can change any of the parameters to gain performance? Thanks in advance.
// This file is generated by the description file processor.
// Please do not edit.
I am thinking of an easy way to explain importance to marketers without going into the math. This is what I have come up with so far. Does this sound correct to you guys?
Reasoning:
IMPORTANCE = log10(Improvement)
Improvement = P(X & Y) / (P(X) * P(Y))
Improvement = (probability 2 products are sold together) / (random chance 2 products are sold together)
If (probability 2 products are sold together) = (random chance 2 products are sold together), then Improvement = 1, and log10(1) = 0.
IMPORTANCE SCORE | MEANING
-2 to -1 | 10 to 100 times less likely than random chance
-1 to 0 | 1 to 10 times less likely than random chance
0 to 1 | 1 to 10 times more likely than random chance
1 to 2 | 10 to 100 times more likely than random chance
2 to 3 | 100 to 1,000 times more likely than random chance
3 to 4 | 1,000 to 10,000 times more likely than random chance
4 to 5 | 10,000 to 100,000 times more likely than random chance
5 to 6 | 100,000 to 1,000,000 times more likely than random chance
6 to 7 | 1,000,000 to 10,000,000 times more likely than random chance
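As a sanity check on the reasoning above, here is a minimal Python sketch that computes the score from raw case counts and turns it back into a plain-language ratio for a marketing audience. It follows the lift-based definition in this post, with a base-10 log as the score bands imply; whether SQL Server computes rule importance exactly this way is not confirmed here, and the counts in the example are hypothetical.

```python
import math

def importance(n_total, n_x, n_y, n_xy):
    """log10 of Improvement = P(X & Y) / (P(X) * P(Y)), from case counts."""
    p_xy = n_xy / n_total
    p_x = n_x / n_total
    p_y = n_y / n_total
    return math.log10(p_xy / (p_x * p_y))

def describe(score):
    """Plain-language reading of a score, following the table above."""
    ratio = 10 ** abs(score)
    direction = "more" if score >= 0 else "less"
    return f"about {ratio:.1f} times {direction} likely than random chance"

# Hypothetical: 100 baskets, X in 20, Y in 10, both in 10 -> Improvement = 5
score = importance(100, 20, 10, 10)
print(round(score, 3), describe(score))
```

Reporting the ratio itself ("about 5 times more likely") may be easier for marketers than the log score, while the score remains convenient for sorting rules.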
Is there a way to explicitly assign 'weights' or 'importance' factors to attributes and have them taken into account by the Association Rules and Decision Trees algorithms during training? I would like to do so without preprocessing the data. (In any case, I can't think of a way to assign a weight through preprocessing to Boolean attributes like 'smoker'.)
Those of you who have installed SQL Server 2005 may have noticed that the installation creates several new Windows groups on the server. Do not underestimate the importance of these groups.
I'm trying to search my data and sort the results by importance.
I'm using an MS Access database, and my data (table1) looks like this:
Code:
ID  NAME           TEXT
1   Apples         Good red apples
2   Bananas        Fine yellow bananas
3   Yellow apples  Great yellow apples
I want to search the data and get a result where the column "NAME" is more important than "TEXT". My SQL looks like this:
Code:
SELECT id, name, text, 1 AS searchorder FROM table1 WHERE name LIKE '*yellow*'
UNION
SELECT id, name, text, 2 AS searchorder FROM table1 WHERE text LIKE '*yellow*'
ORDER BY searchorder
The output is this:
Code:
ID  NAME           TEXT                 SEARCHORDER
3   Yellow apples  Great yellow apples  1
2   Bananas        Fine yellow bananas  2
3   Yellow apples  Great yellow apples  2
So far so good: the ordering by importance works, but I do not get unique rows because of the searchorder column.
Can I fix my SQL so that I get unique rows, where the last "Yellow apples" line does not appear, or am I lost in space?
While repeatedly testing a package that deletes from and inserts into several tables over the course of several days, my package, which had taken 45 minutes to load 1,700 XML files, began to take over 6 hours. It turned out to be an I/O bottleneck: the Avg. Disk Queue Length was around 200, and I was incurring many PAGEIOLATCH_EX waits. My development machine uses a single local disk with no RAID, so I had no options there, but I ran the maintenance wizard to rebuild indexes and statistics, defragmented the hard drive, and got back to my original 45 minutes. I guess I'll have to put a maintenance plan together to do this nightly.
Hi, I came across something called the 3-4-5 rule while going through a data mining book, but I couldn't figure out where that rule comes from or how it really works.
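The 3-4-5 rule, as described in data mining textbooks such as Han and Kamber's Data Mining: Concepts and Techniques, partitions a numeric range into relatively uniform, "natural" intervals by looking at how many distinct values the range covers at its most significant digit: 3, 6, 7, or 9 distinct values give 3 intervals (7 is split 2-3-2), 2, 4, or 8 give 4 intervals, and 1, 5, or 10 give 5 intervals. The Python sketch below implements only this top-level partitioning step and assumes the range ends are integers already rounded to the most significant digit; the full procedure (percentile trimming and recursive refinement of each interval) is omitted.

```python
def partition_345(low, high):
    """Top-level 3-4-5 partition of [low, high] into natural intervals.

    Assumes low and high are integers already rounded to the most
    significant digit of the range (e.g. -1_000_000 and 2_000_000).
    Returns a list of (start, end) pairs.
    """
    span = high - low
    if span <= 0:
        raise ValueError("high must be greater than low")
    unit = 10 ** (len(str(span)) - 1)  # value of the most significant digit
    if span % unit != 0:
        raise ValueError("range ends must be rounded to the most significant digit")
    distinct = span // unit  # 1..9 with this choice of unit

    if distinct == 7:
        # 7 distinct values: 3 intervals grouped 2-3-2
        cuts = [low, low + 2 * unit, low + 5 * unit, high]
    else:
        if distinct in (3, 6, 9):
            k = 3
        elif distinct in (2, 4, 8):
            k = 4
        else:  # 1 or 5 (10 cannot occur with this unit choice)
            k = 5
        cuts = [low + i * span / k for i in range(k)] + [high]
    return list(zip(cuts[:-1], cuts[1:]))

print(partition_345(-1_000_000, 2_000_000))
print(partition_345(0, 700))
```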
How can I set up the databases in SQL Server so that when I change the data in one table, the changes cascade down to the tables in my other databases? One database would hold the primary key table; if I had 15 other databases, I could somehow link them so that data changed in the primary key table of the first database cascades down to the other databases.
Hi, I have a database that stores data about bus links. I want to give passengers information about the price of their journey. The price depends on three factors: the starting bus stop, the ending bus stop, and the type of ticket (full, partial for students and old people, ...). So I created a table with three foreign key constraints (two for bus stops and one for ticket type). When a bus stop or a ticket type is deleted, I want all data connected with it to be deleted automatically, so I wanted to use cascading deletes. But I receive the following exception: Introducing FOREIGN KEY constraint 'FK_TicketPrices_BusStops1' on table 'TicketPrices' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints. How can I achieve this? Why would it cause cycles or multiple cascade paths?
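The "multiple cascade paths" part of the message refers to the fact that, with ON DELETE CASCADE on both bus-stop constraints, deleting one bus stop could reach TicketPrices along two different foreign keys, and SQL Server refuses to create a second cascading path to the same table. The Python sketch below models that check conceptually by counting cascade paths in a graph of cascading foreign keys. The constraint names other than FK_TicketPrices_BusStops1 and the table name TicketTypes are hypothetical, and this illustrates the rule rather than SQL Server's actual implementation.

```python
from collections import defaultdict

def count_cascade_paths(fks, source, target):
    """Count distinct cascade paths from source to target.

    fks: list of (parent_table, child_table, constraint_name) for foreign
    keys declared with ON DELETE CASCADE. Assumes the graph has no cycles.
    """
    children = defaultdict(list)
    for parent, child, _name in fks:
        children[parent].append(child)  # one edge per constraint

    def paths(table):
        if table == target:
            return 1
        return sum(paths(child) for child in children[table])

    return paths(source)

fks = [
    ("BusStops", "TicketPrices", "FK_TicketPrices_BusStops"),        # hypothetical name
    ("BusStops", "TicketPrices", "FK_TicketPrices_BusStops1"),       # from the error
    ("TicketTypes", "TicketPrices", "FK_TicketPrices_TicketTypes"),  # hypothetical
]
print(count_cascade_paths(fks, "BusStops", "TicketPrices"))     # 2 paths: rejected
print(count_cascade_paths(fks, "TicketTypes", "TicketPrices"))  # 1 path: allowed
```

A common workaround is to declare the two bus-stop constraints with ON DELETE NO ACTION and delete the dependent TicketPrices rows in an INSTEAD OF DELETE trigger on the bus-stops table.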
Hi,
I have a table with the following columns:
ID INTEGER, Name VARCHAR(32), Surname VARCHAR(32), GroupID INTEGER, SubGroupOneID INTEGER, SubGroupTwoID INTEGER
How can I create a rule/default/check that updates the SubGroupOneID and SubGroupTwoID columns when GroupID is, for example, equal to 15 on MS SQL 2000? It is impossible to make changes on the client, so I need to check the inserted/updated value of the GroupID column and automatically update the SubGroupOneID and SubGroupTwoID columns.
Sincerely,
Rustam Bogubaev
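Since the client cannot be changed, a trigger on the table is the usual server-side tool here: a CHECK constraint, rule, or default can validate or supply a value but cannot update other columns. The sketch below demonstrates the idea in SQLite through Python's sqlite3 module. The table name people, the sample rows, and the subgroup values 1 and 2 are hypothetical placeholders; on SQL Server 2000 the equivalent would be a T-SQL AFTER INSERT, UPDATE trigger that joins the table to the inserted pseudo-table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (
    ID INTEGER PRIMARY KEY,
    Name VARCHAR(32),
    Surname VARCHAR(32),
    GroupID INTEGER,
    SubGroupOneID INTEGER,
    SubGroupTwoID INTEGER
);

-- Hypothetical mapping: GroupID 15 -> SubGroupOneID 1, SubGroupTwoID 2
CREATE TRIGGER people_group_insert AFTER INSERT ON people
WHEN NEW.GroupID = 15
BEGIN
    UPDATE people SET SubGroupOneID = 1, SubGroupTwoID = 2 WHERE ID = NEW.ID;
END;

-- Fires only when GroupID itself is updated
CREATE TRIGGER people_group_update AFTER UPDATE OF GroupID ON people
WHEN NEW.GroupID = 15
BEGIN
    UPDATE people SET SubGroupOneID = 1, SubGroupTwoID = 2 WHERE ID = NEW.ID;
END;
""")

conn.execute("INSERT INTO people (ID, Name, Surname, GroupID) VALUES (1, 'Ann', 'Lee', 15)")
conn.execute("INSERT INTO people (ID, Name, Surname, GroupID) VALUES (2, 'Bob', 'Kim', 3)")
conn.execute("UPDATE people SET GroupID = 15 WHERE ID = 2")
rows = conn.execute("SELECT ID, SubGroupOneID, SubGroupTwoID FROM people ORDER BY ID").fetchall()
print(rows)  # [(1, 1, 2), (2, 1, 2)]
```

Because the triggers' own UPDATE statements do not touch GroupID, they do not re-fire the update trigger.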