Analysis Services 2005: Microsoft Association Rules Problem
Jul 11, 2007
Hi there,
I'm new to this forum. My English may not be very good, but I hope I can make myself understood.
I have a SQL Server 2005 database with 90 columns and roughly 185,000 records. I have to run Microsoft Association Rules on my laptop (Sony VAIO SZ3, Core 2 Duo, 2 GB RAM).
The problem is that the amount of RAM does not seem to be enough (it starts swapping when it reaches the 240th case).
Because of this, I decided to sample my data by extracting 10,000 records at random. It now takes about 25 minutes, but that is still too long.
Is there a better way? What is the problem: the number of columns or the number of rows?
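As a rough illustration of why the column count tends to matter more than the row count: assuming Microsoft Association Rules follows the usual Apriori-style search, each counting pass reads every case (so cost grows roughly linearly with rows), but the number of candidate itemsets can grow combinatorially with the number of columns. The sketch below only counts the possible k-column combinations for 90 columns; `states_per_column` is a hypothetical simplification that gives every column the same number of discrete states. These are upper bounds; MINIMUM_SUPPORT pruning removes most candidates in practice.

```python
from math import comb

def candidate_itemsets(n_columns: int, k: int, states_per_column: int = 1) -> int:
    """Upper bound on itemsets with k items drawn from k distinct columns.

    Hypothetical simplification: every column has the same number of discrete
    states, and an itemset picks k different columns and one state for each.
    """
    return comb(n_columns, k) * states_per_column ** k

if __name__ == "__main__":
    for k in (2, 3, 4):
        print(k, candidate_itemsets(90, k))
```

Reducing the number of columns (or the number of states per column) therefore shrinks the search space much faster than sampling rows does.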
I built a data mining model to predict which study methods are best for students to pass the examination.
Create Mining Model StudentAssociation
(
    Student_No long key,
    Gender text discrete predict,
    PassOrFail text discrete predict,
    StudyMethod table predict
    (
        MethodName text key
    )
)
Using Microsoft_Association_Rules (Minimum_Support = 0.02, Minimum_Probability = 0.03)
The mining table will contain all the methods the students used, regardless of whether they passed or failed the examination. PassOrFail will be either 'Pass' or 'Fail'.
Given the model above, can I query for the best study methods? Or should I train the model only with the students who passed the examination and ignore all those who failed?
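Keeping the students who failed is what makes a comparison possible: a rule such as StudyMethod = X -> PassOrFail = 'Pass' is only informative if its probability can be compared with how often students who did not use X passed. A minimal plain-Python sketch over hypothetical data (the method names and records are invented for illustration) computing P(Pass | method used) and P(Pass | method not used):

```python
from typing import List, Set, Tuple

# Hypothetical cases: (set of study methods used, exam outcome).
Case = Tuple[Set[str], str]

def pass_rates(cases: List[Case], method: str) -> Tuple[float, float]:
    """Return (P(Pass | method used), P(Pass | method not used))."""
    with_m = [outcome for methods, outcome in cases if method in methods]
    without_m = [outcome for methods, outcome in cases if method not in methods]
    p_with = with_m.count("Pass") / len(with_m) if with_m else 0.0
    p_without = without_m.count("Pass") / len(without_m) if without_m else 0.0
    return p_with, p_without

cases: List[Case] = [
    ({"GroupStudy", "Flashcards"}, "Pass"),
    ({"GroupStudy"}, "Pass"),
    ({"Flashcards"}, "Fail"),
    ({"Rereading"}, "Fail"),
    ({"GroupStudy", "Rereading"}, "Fail"),
    ({"Flashcards", "Rereading"}, "Pass"),
]
print(pass_rates(cases, "GroupStudy"))
```

If only the passing students were kept, every case would be 'Pass' and both rates would be 1.0, so the model could no longer tell the methods apart.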
I am using the Microsoft Association algorithm to find associations between PATIENT CITY and the likely disease. I would like to know how I can import the association model, after creating it in SQL Server BI Studio, so that I can use it in my ASP.NET web form: when the user enters a PATIENT CITY, the system should suggest the associated disease.
I do have the Data Mining with SQL Server 2005 book, but I couldn't find any resource for my objective.
Please suggest the best source or tutorial on how I can do this.
When I use MS Association Rules, I don't know how it works in the background. I studied the FP-Growth algorithm, but I have some questions: I don't know what a transaction database means. Can anyone give me an example? Thanks. I know we can store the data in a relational database, but in basket analysis, how is a transaction stored in a relational database?
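A transaction database is simply a collection of transactions, each being the set of items that occurred together (for example, one shopping basket). In a relational database a basket is usually stored in "long" form, one row per (transaction, item) pair, much like a nested table keyed by the transaction ID. The sketch below groups such rows, from a hypothetical SalesLine(TransactionID, Item) table, into the set-per-transaction form that FP-Growth and Apriori descriptions assume:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def to_transactions(rows: Iterable[Tuple[int, str]]) -> Dict[int, Set[str]]:
    """Group (transaction_id, item) rows into {transaction_id: set of items}."""
    baskets: Dict[int, Set[str]] = defaultdict(set)
    for txn_id, item in rows:
        baskets[txn_id].add(item)
    return dict(baskets)

# Hypothetical rows of a SalesLine(TransactionID, Item) table.
rows = [(1, "bread"), (1, "milk"), (2, "bread"), (2, "beer"), (2, "eggs"), (3, "milk")]
print(to_transactions(rows))
```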
Hi there, I have been trying for a long time to run Microsoft Association Rules on my database.
I have solved the memory leak problem now, but I still can't understand the output rules.
The database contains all the Italian students who graduated last year. Here in Italy, they have to fill in a questionnaire about their university experience: e.g. they rate their experience with teachers (a score from 1 to 5), they say whether they want to continue in the university field or not, and so on.
Most of the rules say: Int_Stud=1-2, RapDoc>4
Int_Stud is the column where I store the student's intention to continue at university: 1 means they want to go on, 2 means they do not want to continue studying. So, to my mind, this rule makes no sense, because it covers all students: those who want to continue at university and those who do not.
I think the problem is that Visual Studio 2005 and Analysis Services have no understanding of what Int_Stud means; they have no idea that Int_Stud can take only 2 values and that these are opposites. Is there a solution to this problem? Can I discretize this column?
Even though I know my English is not perfect, I hope I am understandable.
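An item such as Int_Stud=1-2 looks like a range label produced by discretization, which typically happens when a numeric column is treated as continuous and bucketed rather than as discrete. If that is what happened here (an assumption about how the column was defined), marking Int_Stud as a discrete column should make 1 and 2 separate items. The sketch below is a generic Python illustration, not Analysis Services code: equal-width bucketing of the two answer codes into a single bucket yields the label '1-2' for every student, while treating the codes as discrete labels keeps them apart.

```python
from typing import List

def equal_width_labels(values: List[float], buckets: int) -> List[str]:
    """Label each value with the equal-width range it falls into, e.g. '1-2'."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets
    labels = []
    for v in values:
        # Clamp so the maximum value lands in the last bucket.
        i = min(int((v - lo) / width), buckets - 1) if width else 0
        start, end = lo + i * width, lo + (i + 1) * width
        labels.append(f"{start:g}-{end:g}")
    return labels

def discrete_labels(values: List[float]) -> List[str]:
    """Treat each distinct value as its own state, e.g. '1' and '2'."""
    return [f"{v:g}" for v in values]

int_stud = [1, 2, 2, 1]
print(equal_width_labels(int_stud, 1))  # every answer ends up in the same '1-2' bucket
print(discrete_labels(int_stud))
```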
I note that there are three web viewers for data mining algorithms, namely DMNaiveBayesViewer, DMDecisionTreeViewer and DMClusterViewer. Why are there no viewers for association rules (itemsets, rules, dependency network)? Can you suggest an alternative way of showing such valuable information in a web application?
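One workaround, assuming the rules can already be retrieved from the model (for example by querying the model content from the web application), is to render them yourself as an HTML table. The sketch below covers only the rendering step in Python; the rule tuples are hypothetical and the retrieval step is not shown.

```python
from html import escape
from typing import List, Tuple

# (left-hand side, right-hand side, probability, importance); hypothetical values.
Rule = Tuple[str, str, float, float]

def rules_to_html(rules: List[Rule]) -> str:
    """Render rules as an HTML table, highest importance first."""
    rows = sorted(rules, key=lambda r: r[3], reverse=True)
    lines = ["<table>", "<tr><th>If</th><th>Then</th><th>Probability</th><th>Importance</th></tr>"]
    for lhs, rhs, prob, imp in rows:
        # Escape item names so characters such as & or < display correctly.
        lines.append(
            f"<tr><td>{escape(lhs)}</td><td>{escape(rhs)}</td>"
            f"<td>{prob:.3f}</td><td>{imp:.3f}</td></tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)

print(rules_to_html([("Milk", "Cookies", 0.62, 0.31), ("Soap & Shampoo", "Towels", 0.55, 0.48)]))
```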
I understand Mr. MacLennan's explanation provided at http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=282651&SiteID=1 and appreciate the time he took to explain how importance works. However, like the user with username "sang", I also ran the data in BI 2005 and got the same results listed by the aforementioned user. I did this using the following data:
donut muffin
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
y y
n y
n y
n y
n y
n y
etc.
The rule muffin -> donut has an importance of -0.105302438, which does not match Mr. MacLennan's results. I tried switching the roles of a and b in a -> b and using different bases for the logarithms, but I don't get -0.105302438 with any of these. I also tried to calculate importance on a small data set of my own and can't reproduce the results with Mr. MacLennan's explanation on that data set either. Any thoughts on the discrepancy?
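For checking numbers by hand, one common formulation of rule importance is a log-scaled likelihood ratio: for A -> B, log10 of P(B | A) divided by P(B | not A). The sketch below implements only that formulation; it is not confirmed to be exactly what Analysis Services computes, which may apply smoothing or other adjustments (an assumption) and could explain differences from a hand calculation.

```python
from math import log10

def rule_importance(n_a_and_b: int, n_a: int, n_not_a_and_b: int, n_not_a: int) -> float:
    """log10( P(B|A) / P(B|not A) ) for a rule A -> B.

    One common formulation only; Analysis Services may smooth or adjust these
    probabilities, so results need not match its reported importance.
    Raises ZeroDivisionError if there are no A cases or no not-A cases.
    """
    p_b_given_a = n_a_and_b / n_a
    p_b_given_not_a = n_not_a_and_b / n_not_a
    return log10(p_b_given_a / p_b_given_not_a)

print(rule_importance(15, 20, 3, 4))  # equal conditional probabilities
```

Note that with only the 20 rows listed above, every case has muffin = y, so for muffin -> donut there are no "not muffin" cases and P(donut | not muffin) is undefined; the full data set (the "etc.") is needed to compute it.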
I am doing market basket analysis for a retailer using association rules. The whole data set is huge and contains groceries, clothes, books, etc. If I want to examine the relationships between several clothing brands (e.g. LEVI'S and adidas), should I just remove all the grocery and book transactions and re-run the association rules on the subset that contains only clothing transactions? Will this work?
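Two different subsets are possible, and they answer slightly different questions: keeping only the baskets that contain at least one clothing item (with all their items), or also stripping the non-clothing items from those baskets. The sketch below shows both on hypothetical data; `CATEGORY` is an invented item-to-category lookup. Either filter changes the total number of cases, so support percentages become relative to the subset rather than to the whole store.

```python
from typing import Dict, List, Set

CATEGORY: Dict[str, str] = {  # hypothetical item -> category lookup
    "LEVI'S jeans": "clothes", "adidas shoes": "clothes",
    "milk": "grocery", "novel": "books",
}

def baskets_with_category(baskets: List[Set[str]], cat: str) -> List[Set[str]]:
    """Keep whole baskets that contain at least one item of the category."""
    return [b for b in baskets if any(CATEGORY.get(i) == cat for i in b)]

def items_of_category(baskets: List[Set[str]], cat: str) -> List[Set[str]]:
    """Keep only the category's items, dropping baskets that become empty."""
    trimmed = [{i for i in b if CATEGORY.get(i) == cat} for b in baskets]
    return [b for b in trimmed if b]

baskets = [{"LEVI'S jeans", "milk"}, {"milk", "novel"}, {"adidas shoes", "LEVI'S jeans"}]
print(baskets_with_category(baskets, "clothes"))
print(items_of_category(baskets, "clothes"))
```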
I have a question about data preparation for market basket analysis.
There are always some transactions containing only a single SKU. It seems that such transactions have nothing to do with association. Should I just exclude them?
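Single-item transactions cannot contain any pair, so pair counts are unchanged when they are removed, but the total case count and the individual item counts do change. That shifts support percentages (and a fractional MINIMUM_SUPPORT) as well as rule probabilities: in the hypothetical baskets below, P(b | a) rises from 2/3 to 1 once the lone {a} basket is dropped. A small sketch of the support effect:

```python
from typing import FrozenSet, List, Set

def support(baskets: List[Set[str]], itemset: FrozenSet[str]) -> float:
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

baskets = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"}]
multi_item_only = [b for b in baskets if len(b) > 1]

pair = frozenset({"a", "b"})
print(support(baskets, pair))          # 2 of 4 baskets
print(support(multi_item_only, pair))  # 2 of 2 baskets
```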
I want to score my data using only the association rules I filtered in the Mining Model Viewer.
Is this possible?
I noticed that the mining model prediction query uses the whole generated model (all rules).
Is there any way to influence the model at generation time?
Thanks a lot.
Hi,
I have identified where my problem lies, but I don't understand why MS SQL Server doesn't recognize the association parameters I put in. It uses the Minimum_Probability I set, but it ignores the Minimum_Importance I wrote and uses the default value instead.
I wanted to paste a screenshot here, but that isn't possible. By the way, I'm using the evaluation version, which should be the same as the Enterprise Edition.
I need to create a set of cases for a project that uses the Microsoft Association Rules algorithm to recommend products to customers. My question is: must the set of cases include all customer transactions for training, or is some percentage of the total transactions sufficient? If I do not use all customer transactions, could the algorithm leave some products out of its itemsets or rules and then be unable to make recommendations for them?
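Products that appear in too few sampled cases to reach MINIMUM_SUPPORT cannot appear in any itemset or rule, so the model cannot make recommendations involving them. The sketch below shows the effect with a simple per-item count (the first pass of an Apriori-style search) on hypothetical data; a practical check would compare the frequent items before and after sampling.

```python
from collections import Counter
from typing import List, Set

def frequent_items(baskets: List[Set[str]], min_count: int) -> Set[str]:
    """Items present in at least min_count baskets (the first Apriori pass)."""
    counts = Counter(item for b in baskets for item in b)
    return {item for item, n in counts.items() if n >= min_count}

full = [{"tv", "cable"}, {"tv", "cable"}, {"radio", "battery"}, {"radio", "battery"}, {"tv"}]
sample = full[:2] + full[4:]   # a sample that happens to miss the radio buyers
print(frequent_items(full, 2))
print(frequent_items(sample, 2))
```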
The problem is that I can't seem to reduce the minimum probability below .42 to view more rules.
I've considered that this might be because these are the only rules discovered; however, I know quite a bit about the data and I would expect many more associations.
I'm new to Analysis Services and hopefully this is a quick and easy question. I have a couple of quite large (163,000-tuple) tables whose columns essentially represent a bit vector. I would like to mine for association rules, but the '1' values are very, very sparse and they are the only values of interest. How can I get more control over the algorithm, that is, how can I stipulate that a column's state must be '1' to be considered? Any help or pointers to the relevant documentation would be great.
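A common way to handle this (a suggestion, not something confirmed for these tables) is to stop modeling each bit as a separate two-state column and instead model only the '1's as items in a nested table: each row becomes a case whose nested rows are the names of the columns set to 1. Absent items are simply not there, so the rules involve only "Existing" items. In SQL this is an unpivot that keeps rows where the value is 1; the reshaping step, sketched in Python with hypothetical column names:

```python
from typing import Dict, List, Set

def ones_to_items(rows: List[Dict[str, int]]) -> List[Set[str]]:
    """Turn each bit-vector row into the set of column names whose value is 1."""
    return [{col for col, bit in row.items() if bit == 1} for row in rows]

rows = [
    {"f1": 0, "f2": 1, "f3": 0, "f4": 1},
    {"f1": 0, "f2": 0, "f3": 0, "f4": 0},
    {"f1": 1, "f2": 1, "f3": 0, "f4": 0},
]
print(ones_to_items(rows))
```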
I'm building a mining model with MS Association Rules. After processing the model, the result includes rules such as:
E = Existing, C = Existing -> B = Existing
F = Existing -> E = Existing
C = Existing, B = Existing -> E = Existing
F = Existing -> B = Existing
B = Existing, A = Existing -> C = Existing
F = Existing, B = Existing -> E = Existing
F = Existing, E = Existing -> B = Existing
D = Existing -> A = Existing
C = Existing -> A = Existing
E = Existing, A = Existing -> B = Existing
I want to build a query that uses two or more items on the left side of a rule, for example E = Existing, C = Existing -> B = Existing. That is, I want a query that predicts: when a customer buys 'E' and 'C', he is likely to buy 'B'.
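In DMX this is usually done with a singleton prediction query whose nested table contains both E and C, followed by PredictAssociation on the nested column; the exact query depends on the model's column names, so it is not reproduced here. The matching idea itself can be sketched in plain Python: a rule fires when every item on its left-hand side is in the customer's basket. The rules are a subset of the list above; the probabilities are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# (left-hand side, right-hand side, probability); probabilities are hypothetical.
Rule = Tuple[FrozenSet[str], str, float]

RULES: List[Rule] = [
    (frozenset({"E", "C"}), "B", 0.80),
    (frozenset({"F"}), "E", 0.70),
    (frozenset({"C", "B"}), "E", 0.65),
    (frozenset({"C"}), "A", 0.60),
]

def recommend(basket: Set[str], rules: List[Rule]) -> List[Tuple[str, float]]:
    """Return items predicted by rules whose whole LHS is in the basket, best first."""
    best: Dict[str, float] = {}
    for lhs, rhs, prob in rules:
        if lhs <= basket and rhs not in basket:
            best[rhs] = max(prob, best.get(rhs, 0.0))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

print(recommend({"E", "C"}, RULES))
```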
I read the paper on sequence clustering. It seems that the main application of the algorithm is web sites. I was wondering whether I can apply this algorithm to purchase sequences from credit card data.
If so, please also tell me the difference between sequence clustering and association rules for credit card data. I realize that sequence clustering is a fully probabilistic model with prediction capability, but association rules also give the probabilities of purchasing other products.
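The practical difference shows in what each method keeps from a purchase history: association rules treat each case as an unordered set of items, while sequence clustering models the order (the Microsoft Sequence Clustering algorithm is described as combining clustering with Markov chain analysis). The sketch below, on a hypothetical card history, contrasts the set view with first-order transition counts of the kind a Markov chain is built from; it is not the algorithm itself.

```python
from collections import Counter
from typing import List, Set

def as_itemset(purchases: List[str]) -> Set[str]:
    """Association-rules view: order and repetition are discarded."""
    return set(purchases)

def transitions(purchases: List[str]) -> Counter:
    """Sequence view: count consecutive (from, to) pairs, as a first-order Markov chain would."""
    return Counter(zip(purchases, purchases[1:]))

history = ["airline", "hotel", "car rental", "hotel"]
print(as_itemset(history))
print(transitions(history))
```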
I would really appreciate it if you could help me. I am trying to create a taxonomy to be taken into account by the association rules algorithm. For example, if my data is a set of purchases from a supermarket, I could have one client who bought milk, cookies and shampoo, and another who bought cheese, cookies and soap.
I would like to specify that milk and cheese belong to the category "dairy" and that shampoo and soap belong to "personal hygiene". If there are interesting rules involving the categories, I would like them to be taken into account. Additionally, I do not want rules like "milk -> dairy": if a specific item appears in a rule, its corresponding category should not. In this scenario I could have milk and "personal hygiene" in the same rule, but not shampoo and "personal hygiene".
I have seen this done by other mining tools, but I have had trouble finding a way to do it in Analysis Services.
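If Analysis Services has no built-in taxonomy option (an assumption, not verified here), a common workaround from the generalized-association-rules literature is to preprocess the data: add each item's category as an extra item in the same basket, mine as usual, and then discard any rule in which an item and its own category both appear (such rules, e.g. milk -> dairy, hold trivially). A sketch of the two helper steps, with the categories from the supermarket example hard-coded as a hypothetical lookup:

```python
from typing import Dict, FrozenSet, Set

PARENT: Dict[str, str] = {  # item -> category, from the supermarket example
    "milk": "dairy", "cheese": "dairy",
    "shampoo": "personal hygiene", "soap": "personal hygiene",
}

def add_categories(basket: Set[str]) -> Set[str]:
    """Extend a basket with the category of every item that has one."""
    return basket | {PARENT[i] for i in basket if i in PARENT}

def mixes_item_and_own_category(items: FrozenSet[str]) -> bool:
    """True if a rule's items (both sides) contain an item together with its own category."""
    return any(PARENT.get(i) in items for i in items)

print(add_categories({"milk", "cookies", "shampoo"}))
```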
What would be the right design approach for the following problem?
I have a single table called SelectionFactors, which has the following columns and sample data:
ProjectID | Factor | FactorValue
1000 | Countries | USA
1000 | Countries | Canada
1000 | Countries | France
1000 | Languages | English
1000 | Languages | French
1000 | Company Type | Consulting
1000 | Company Type | Software
2000 | Countries | India
2000 | Countries | China
2000 | Countries | USA
2000 | Languages | English
2000 | Languages | Chinese (Simplified)
2000 | Languages | Chinese (Traditional)
2000 | Languages | Spanish
2000 | Company Type | Retail
2000 | Company Type | Dairy Products
The problem is to allow descriptive analysis of the data to find patterns in the users' selections. For instance, if Languages->English is selected, what are the project counts for the other Factor->FactorValue combinations? Countries->USA = 2, Countries->Canada = 1, Company Type->Consulting = 1, and so on.
Since all the data is in this single table, are the case table and the nested table the same? What are the keys and inputs? I only need descriptive analysis (no prediction), and ALL possible combinations MUST be part of the results; how should the model be designed?
MS uses the Apriori algorithm in Association Rules, while other DM software has moved to the Novel Algorithm. Can you tell us why MS decided to stay with Apriori? Have you overcome the limitations it is accused of having? Thanks!
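The counts described above are plain co-occurrence counts, which can be computed directly without a mining model (in SQL, roughly a self-join of SelectionFactors on ProjectID). A Python sketch over the sample rows, assuming each (Factor, FactorValue) pair appears at most once per project so that row counts equal project counts:

```python
from collections import Counter
from typing import List, Set, Tuple

Row = Tuple[int, str, str]  # (ProjectID, Factor, FactorValue)

ROWS: List[Row] = [
    (1000, "Countries", "USA"), (1000, "Countries", "Canada"), (1000, "Countries", "France"),
    (1000, "Languages", "English"), (1000, "Languages", "French"),
    (1000, "Company Type", "Consulting"), (1000, "Company Type", "Software"),
    (2000, "Countries", "India"), (2000, "Countries", "China"), (2000, "Countries", "USA"),
    (2000, "Languages", "English"), (2000, "Languages", "Chinese (Simplified)"),
    (2000, "Languages", "Chinese (Traditional)"), (2000, "Languages", "Spanish"),
    (2000, "Company Type", "Retail"), (2000, "Company Type", "Dairy Products"),
]

def co_occurrence(rows: List[Row], factor: str, value: str) -> Counter:
    """Count projects per (Factor, FactorValue) among projects that selected factor=value."""
    projects: Set[int] = {p for p, f, v in rows if (f, v) == (factor, value)}
    return Counter((f, v) for p, f, v in rows if p in projects and (f, v) != (factor, value))

counts = co_occurrence(ROWS, "Languages", "English")
print(counts[("Countries", "USA")], counts[("Countries", "Canada")], counts[("Company Type", "Consulting")])
```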
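For readers unfamiliar with the term, the level-wise idea behind Apriori is: count single items, keep the frequent ones, build candidate k-itemsets only from frequent (k-1)-itemsets, prune any candidate that has an infrequent subset, and repeat. A compact, unoptimized Python sketch of that textbook procedure (not Microsoft's implementation), on hypothetical baskets:

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def apriori(baskets: List[Set[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset contained in at least min_count baskets, with its count."""
    def count(cands):
        return {c: sum(1 for b in baskets if c <= b) for c in cands}

    items = {frozenset([i]) for b in baskets for i in b}
    frequent = {c: n for c, n in count(items).items() if n >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        cands = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        cands = {c for c in cands if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        frequent = {c: n for c, n in count(cands).items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result

baskets = [{"bread", "milk"}, {"bread", "beer", "eggs"}, {"milk", "beer", "bread"}, {"bread", "milk", "beer"}]
for itemset, n in sorted(apriori(baskets, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Each level requires a full pass over the cases, which is one of the limitations usually cited against Apriori compared with FP-Growth-style algorithms.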
If I use this code with an association model, it still returns itemsets, when it should return only nodes that have rules associated with them (according to sqlserverdatamining.com). If I add 'AND $PROBABILITY > .25' to the WHERE clause, it returns 0 results for every query I try. Any idea why this happens?
Code Snippet
SELECT FLATTENED
    (SELECT * FROM PredictAssociation([Product], 20, INCLUDE_NODE_ID, INCLUDE_STATISTICS)
     WHERE $NODEID <> '')
FROM [ProductRecommend]
PREDICTION JOIN
    OPENQUERY([ds],
        'SELECT [PRODUCTCLASSID],[DESCRIPTION] FROM [Product_Table] WHERE [PRODUCTCLASSID] = ''1234'' AND [DESCRIPTION] = ''DESC'' ') AS t
ON [ProductRecommend].[Product].[PRODUCTCLASSID] = t.[PRODUCTCLASSID]
AND [ProductRecommend].[Product].[DESCRIPTION] = t.[DESCRIPTION]
This query returns more relevant results than the version without the $NODEID filter; however, the results should have higher probabilities than .047! Please help! Thanks!
"If you have a pressing need for this fix, please contact our customer support team."
Yes, I have a pressing need for this fix, so I have to contact your customer support team. How do I do that?
You know, I only have the evaluation version of SQL Server, but I have to show that this program is good enough for association analysis. If I cannot show this, it won't be bought. Please help me.
In association rules, each rule has a [support, confidence] pair. In Microsoft Association Rules, each rule has a [probability, importance] measure instead, and importance can be greater than 1.
I found the following in MSDN, but I'm not sure I understood it correctly.
MINIMUM_PROBABILITY: Specifies the minimum probability that a rule is true. For example, setting this value to 0.5 specifies that no rule with less than fifty percent probability is generated. The default is 0.4.
MAXIMUM_SUPPORT: Specifies the maximum number of cases in which an itemset can have support. If this value is less than 1, the value represents a percentage of the total cases. Values greater than 1 represent the absolute number of cases that can contain the itemset. The default is 1.
My questions are: 1) Can I express [probability, importance] in terms of [support, confidence]? If so, how? 2) What does importance > 1 mean?
I want to filter the itemsets or rules on more than 2 attributes; how can I achieve that? (It seems I can only filter them by one attribute.) Is it possible?
Thanks a lot; I look forward to hearing from you.
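In the usual textbook terms, the support of an itemset is the fraction (or count) of cases that contain it, and the confidence of A -> B is support(A and B) / support(A), i.e. an estimate of P(B | A). Microsoft's rule probability appears to play the role of confidence, while support is reported for itemsets; importance is a separate, log-scaled measure rather than a probability, which is why it is not bounded by 1 (if it is a base-10 log ratio, a value above 1 would mean the ratio exceeds 10). Treat that mapping as an interpretation to verify against your own model. A sketch of the two textbook measures on hypothetical baskets:

```python
from typing import List, Set

def support(baskets: List[Set[str]], items: Set[str]) -> float:
    """Fraction of baskets containing all of items."""
    return sum(1 for b in baskets if items <= b) / len(baskets)

def confidence(baskets: List[Set[str]], lhs: Set[str], rhs: Set[str]) -> float:
    """Estimated P(rhs | lhs) = support(lhs and rhs) / support(lhs)."""
    return support(baskets, lhs | rhs) / support(baskets, lhs)

baskets = [{"a", "b"}, {"a", "b"}, {"a"}, {"b"}, {"c"}]
print(support(baskets, {"a", "b"}), confidence(baskets, {"a"}, {"b"}))
```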
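Outside the viewer, rules can be filtered on any combination of conditions once they have been retrieved (for example from the model content). A minimal sketch, assuming the rules are already loaded into Python as hypothetical records: keep the rules that mention all requested attributes on either side and pass both thresholds.

```python
from typing import FrozenSet, List, NamedTuple

class Rule(NamedTuple):
    lhs: FrozenSet[str]
    rhs: str
    probability: float
    importance: float

def filter_rules(rules: List[Rule], must_contain: FrozenSet[str],
                 min_probability: float = 0.0, min_importance: float = float("-inf")) -> List[Rule]:
    """Keep rules whose items (both sides) include every attribute in must_contain."""
    return [
        r for r in rules
        if must_contain <= (r.lhs | {r.rhs})
        and r.probability >= min_probability
        and r.importance >= min_importance
    ]

rules = [
    Rule(frozenset({"Gender = F", "Region = North"}), "Product = X", 0.7, 0.4),
    Rule(frozenset({"Gender = F"}), "Product = X", 0.6, 0.2),
    Rule(frozenset({"Region = North"}), "Product = Y", 0.9, 0.5),
]
print(filter_rules(rules, frozenset({"Gender = F", "Region = North"})))
```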
Hi,
First, I had to post this question here because I could not find any other place related to Microsoft SQL Server Analysis Services 2005, and it is related to SQL Server 2005. I am using Microsoft SQL Server Analysis Services 2005 and .NET 1.1 for ASP.NET (C#). I need to programmatically create measures, dimensions and cubes using .NET 1.1 (C#) in an ASP.NET page, and then access those measures, dimensions and cubes again from another ASP.NET page. My questions are:
1. To create and manipulate Analysis Services objects such as dimensions and cubes, what should I use? The Analysis Services 2005 documentation says we can use Analysis Management Objects (AMO), but I am not sure whether we can use it with .NET 1.1.
2. If AMO is not possible, what should I use? There is another technology called Decision Support Objects (DSO), but it is COM-based and was for Analysis Services 2000.
3. To query the data, what technology can I use? Microsoft says we can use ADOMD.NET, but Microsoft also offers several other technologies that seem to do similar work, such as XMLA and ASSL.
Can somebody help me with this? Please also give me some links to code samples.
I am working on an academic medical project. I have created a PATIENT table:
PATIENT_ID | NAME | CITY
a DISEASE table:
DISEASE_ID | NAME
and a relationship table PATIENT_DISEASE [FOREIGN KEYS]:
PATIENT_ID | DISEASE_ID
I am using the Microsoft Association algorithm [SQL Server 2005 BI Studio] to find associations between PATIENT CITY and the associated DISEASE. I will enter dummy data into these tables, since this is an academic project. I would like to know whether I can find the disease(s) associated with a patient's city with this algorithm, so that as soon as the user enters the patient's city, the associated disease is selected from the DISEASE table on a web interface [ASP.NET].
I would like to know: after building this association model, can I use it on my web page to suggest the disease associated with a patient's city? Or does building the model only give me association rules, so that I need to write a procedure or T-SQL statements to apply them?
I am using SQLSERVERDATAMINING.COM tutorials to build the model.
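Whichever way the prediction is finally served (for example a DMX prediction query sent from ASP.NET through ADOMD.NET), the case layout is the same: one case per patient, with PATIENT_ID as the key, CITY as an input, and the patient's diseases as a nested table from PATIENT_DISEASE. The sketch below builds that shape from the three tables and, as a plain-Python stand-in for the trained model (not the Microsoft algorithm), ranks diseases by the fraction of a city's patients who have them. The city and disease values are hypothetical dummy data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

patients: Dict[int, str] = {1: "CityA", 2: "CityA", 3: "CityB"}   # PATIENT_ID -> CITY
diseases: Dict[int, str] = {10: "Malaria", 11: "Dengue"}          # DISEASE_ID -> NAME
patient_disease: List[Tuple[int, int]] = [(1, 10), (1, 11), (2, 10), (3, 11)]

def build_cases() -> Dict[int, Tuple[str, Set[str]]]:
    """One case per patient: (city, set of disease names), i.e. the nested-table shape."""
    nested: Dict[int, Set[str]] = defaultdict(set)
    for pid, did in patient_disease:
        nested[pid].add(diseases[did])
    return {pid: (city, set(nested[pid])) for pid, city in patients.items()}

def diseases_for_city(city: str) -> List[Tuple[str, float]]:
    """Diseases ranked by the fraction of that city's patients who have them."""
    cases = [ds for c, ds in build_cases().values() if c == city]
    counts = Counter(d for ds in cases for d in ds)
    return [(d, n / len(cases)) for d, n in counts.most_common()] if cases else []

print(diseases_for_city("CityA"))
```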
I'm looking for suggestions on the right design approach for a problem that resembles basket analysis. The data to be analyzed is a dimension, Attribute_DIM, containing an ID, Attribute and Attribute_Value. Some examples of the data are:
ID | Attribute | Attribute_Value
1 | Color | Black
1 | Movie | Men in Black
1 | Book | Of Human Bondage
2 | Color | White
2 | Movie | Men in Black
2 | Book | Grapes of Wrath
We need to be able to analyze multiple selections of the dimension. For example:
Men In Black
Grapes Of Wrath Of Human Bondage
Men In Black Black 1 1
White 1 0
I have had some success using an Association Rules mining model, but I think it is overkill, since I only need descriptive, not predictive, analysis.
I'm looking for some ideas on the right approach to this problem. Ideally, we need to present the data in a cube and be able to perform member analysis of the dimension.
I have looked at several articles (including http://msdn2.microsoft.com/en-us/library/aa902637(sql.80).aspx and http://www.aspnetpro.net/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp). I'm not convinced those are the solutions and would appreciate any insight into this problem.
I am having trouble installing Microsoft SQL Server 2005 Service Pack 2 (KB 921896). When I go to Microsoft Update Home, it is the only update listed as available for my computer, but every time I try to download and install it, it fails with the error message:
Some updates were not installed
The following updates were not installed:
Microsoft SQL Server 2005 Service Pack 2 (KB 921896)
This has been going on since the beginning of December, and I have a strange feeling that I am missing other updates because this one keeps failing. I have other Windows XP machines with SQL Server 2005, and they have consistently received updates without problems; this seems to be the only one failing. (Please note that I have not kept track of the KB numbers of every update, so it is possible that my other machines have not encountered this specific update.) I also tried stopping all the services associated with SQL Server, thinking that it couldn't update the program while it was running, but that did not solve the problem.
Any suggestions on why this won't install? Thank you in advance.
How do I correctly choose the key column in a "Mining Structure" in Microsoft Analysis Services?
I have table:
"Incoming goods"
Create table Income
(
    ID int not null identity(1, 1),
    [Date] datetime not null,
    GoodID int not null,
    PriceDeliver decimal(18, 2) not null,
    PriceSalse decimal(18, 2) not null,
    CONSTRAINT PK_Income PRIMARY KEY CLUSTERED (ID),
    CONSTRAINT FK_IncomeGood foreign key (GoodID) references dbo.Goods ( ID )
)
I'm trying to build a relationship (regression) between the sale price ("Price Sale") of a good and its delivery price ("Price Deliver"), but I do not know which column is the better choice for the key column: ID or GoodID?
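If each delivery row is meant to be one observation of (PriceDeliver, PriceSalse), then the row identifier ID is the natural case key, since a case key must be unique per case and GoodID repeats across deliveries; GoodID would only make sense as the key after aggregating to one row per good. A plain least-squares fit over per-row cases illustrates the first layout (a generic sketch, not the Microsoft Linear Regression algorithm; the prices are hypothetical):

```python
from typing import List, Tuple

def fit_line(cases: List[Tuple[int, float, float]]) -> Tuple[float, float]:
    """Least-squares fit PriceSalse = a * PriceDeliver + b over (ID, PriceDeliver, PriceSalse) cases."""
    xs = [c[1] for c in cases]
    ys = [c[2] for c in cases]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx

# Hypothetical Income rows: (ID, PriceDeliver, PriceSalse); ID is unique per row, GoodID is not.
rows = [(1, 1.0, 3.5), (2, 2.0, 5.0), (3, 3.0, 6.5), (4, 4.0, 8.0)]
print(fit_line(rows))
```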
I have created a cube with Analysis Services 2005. I then published a pivot table (created in Microsoft Excel 2003) as a web page. When I view the web page within the domain LAN, everything works properly. But when I view the web page over the Internet, I get the error:
The query could not be processed:
* An error was encountered in the transport layer
* The peer prematurely closed the connection.
The web page is published on Internet Information Server (IIS); I changed the directory security of the website, but the error persists.
The firewall of the machine is disabled.
Ports 2382 and 2883 are allowed.
The components needed to access the cube are installed on the machine from which I am viewing it.
All users are allowed in the Analysis Services role.
I am trying to set up a SQL Server 2005 Analysis Services cluster on our two servers, AS02 and AS04. The server cluster is built on Majority Node Set (MNS). During the installation of Analysis Services, I don't see the available cluster groups. In Cluster Administrator, the cluster is up and running fine. The MNS cluster has no shared disk, and it has two nodes.
Any thoughts or suggestions? Or is it possible to build a SQL Server 2005 Analysis Services cluster with MNS at all?
The Windows Update web page indicates that I have BOTH a SUCCESS and a FAILURE for Microsoft SQL Server 2005 Express Edition Service Pack 2 (KB 921896). Is it possible for my machine to have TWO DIFFERENT SQL Server Express installations? Is the problem real, or just a minor ghost issue? I do have a trial CD from Microsoft: Visual Studio 2005 Team Suite.
Please note that this is what Windows Update shows: the duplicate lines are on the web page and are not a copy-and-paste mistake.
SQL Server 2005 | Microsoft SQL Server 2005 Express Edition Service Pack 2 (KB 921896) | Tuesday, March 13, 2007 | Microsoft Update
SQL Server 2005 | Microsoft SQL Server 2005 Express Edition Service Pack 2 (KB 921896) | Tuesday, March 13, 2007 | Microsoft Update
After following the link:
Installation Failure
Error Code: 0x652
Try to install the update again, or request help from one of the following resources
I am including the log file, since it may convey useful information to those experienced with SQL Express.
summary.txt log file:
Time: 03/13/2007 12:28:02.906
KB Number: KB921896
Machine: somejunk
OS Version: Microsoft Windows XP Professional Service Pack 2 (Build 2600)
Package Language: 1033 (ENU)
Package Platform: x86
Package SP Level: 2
Package Version: 3042
Command-line parameters specified: /quiet /allinstances
Cluster Installation: No

**********************************************************************************
Prerequisites Check & Status
SQLSupport: Passed

**********************************************************************************
Products Detected | Language | Level | Patch Level | Platform | Edition
Express Database Services (SQLEXPRESS) | ENU | SP2 | 2005.090.3042.00 | x86 | EXPRESS
Express Database Services (SQLExpress) | ENU | RTM | | x86 | EXPRESS
I have an existing SQL Server 2005 cluster in an active/passive configuration. I need to add Analysis Services to it as a new component. I have enough space available on the cluster disks. My questions are:
1) Do I need a new virtual IP and virtual server name for Analysis Services?
2) Do I need a separate cluster resource group for Analysis Services, with an additional disk added?