Predicting In Trees
Jun 30, 2006
Hi! I have created a DMM using Decision Trees. But when I go to the Mining Model Prediction tab and select a Predict function, I get this in the criteria column: <Scalar column reference>[, EXCLUDE_NULL|INCLUDE_NULL][, INCLUDE_NODE_ID]. When I select Result, I get this error: "An incorrect number of arguments are used in the function at line 3, column 3." I'm predicting a continuous variable.
When I delete everything except <Scalar column reference>, I get this error: "Parser: The syntax for '<' is incorrect."
When I delete everything in the criteria column, I get: "Query execution failed."
If I change the criteria to "<Scalar column reference>,INCLUDE_NULL, INCLUDE_NODE_ID", I again get the "Query execution failed" error.
I'm working from a data set I created. I had no problems with predictions using clustering, but can't seem to get Trees to work.
Jul 27, 2006
Hi,
I have built a Clustering model that captures customer demographic information and identifies various hidden clusters based on that information.
What kind of predictions can I make using the above model?
Oct 23, 2007
Hello,
I have a question regarding whether or not Data Mining can be utilized in a specific problem I have to solve.
Situation: I'm going to simplify the problem by explaining it in terms of a "pizza manufacturer". Suppose I wanted to predict the run minutes + downtime minutes (I use these to get an hourly rate: Pizzas/(run hrs + delay hrs) = Pizzas per hour) by looking at a set of input properties.
My properties could be something like the following:
# of Toppings
# of Special Pricing Stickers
Cardboard Box Indicator
Case Indicator (0 represents auto-casing, 1 represents putting in case by hand)
Machine Type (0 or 1: 0 represents an older, slower machine, 1 is newer)
Quantity of Run
(there could be up to 15 other properties that may or may not impact our rate)
Measured Values:
Run Minutes
Delay (down) minutes
Steps I've Done So Far:
I've created a couple of different data mining models for this, as I was unsure which one(s) to use. I checked the lift chart while feeding the original data set back in, and my scatter plot appeared fairly inaccurate.
I've tried using Excel to fit a linear regression, but my R-squared value was always around 0.30. I decided to try SQL Server Data Mining to see whether it could predict our rate more accurately than a linear formula.
I've played with a couple different algorithms in Data Mining, and it appeared that none of them did exceptionally well with prediction. I even checked the lift chart using the same table as I used to train the model.
What algorithm(s) might work the best?
Can I reasonably expect a prediction within a fairly strict tolerance (I'm guessing the answer to this is: "yes, if your source data represents a consistent pattern")?
How can I best use Data Mining to give an answer like "historically, your run rate has been between these 2 values with a probability of X"? I'm thinking I can use PredictProbability and the standard deviation (stdev) to some extent.
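The hourly rate defined above, and a "between these 2 values with probability X" statement, can be sketched in Python. This is a minimal illustration under an assumption that is not in the post: that the actual value is roughly normally distributed around the model's predicted value with the reported standard deviation. The function names and sample numbers are hypothetical.

```python
from statistics import NormalDist

def pizzas_per_hour(pizzas, run_minutes, delay_minutes):
    """Rate = Pizzas / (run hrs + delay hrs), as defined in the post."""
    total_hours = (run_minutes + delay_minutes) / 60.0
    return pizzas / total_hours

def prediction_interval(mean, stdev, probability=0.95):
    """Two-sided interval expected to contain the value with the given probability.
    ASSUMPTION: the prediction error is normally distributed."""
    z = NormalDist().inv_cdf(0.5 + probability / 2)
    return mean - z * stdev, mean + z * stdev

# Hypothetical run: 1,200 pizzas in 50 run minutes + 10 delay minutes
rate = pizzas_per_hour(1200, 50, 10)
# Hypothetical model output: predicted rate 1,100 per hour, standard deviation 80
low, high = prediction_interval(mean=1100.0, stdev=80.0, probability=0.95)
```

If the errors are clearly skewed or the standard deviation varies with the inputs, the normal assumption is the first thing to revisit.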
Any suggestions would be greatly appreciated.
If anyone needs further clarification, please let me know.
Thank you.
Regards,
Dan
Sep 27, 2006
I would like to create a simple regression equation to predict a player's win on their next trip. As a test, I tried to create the model using a linear regression tree based on two players. The result gives me a single node (expected) with only a coefficient instead of a regression equation. I can do the math by hand to get a regression equation and a predicted value for the next trip for each player.
The dataset I used for the simple test is:
Trip #   Player   Win
1        1001     1,250
1        1002     50
2        1001     1,450
2        1002     75
3        1001     1,600
3        1002     100
4        1001     2,000
4        1002     175
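The hand calculation described above amounts to fitting one ordinary least squares line per player (win against trip number) and extending it one trip. A minimal Python sketch on the test data; this is plain OLS, not the Microsoft Linear Regression algorithm, and the function names are illustrative.

```python
from collections import defaultdict

# (trip, player, win) rows copied from the table above
rows = [
    (1, 1001, 1250), (1, 1002, 50),
    (2, 1001, 1450), (2, 1002, 75),
    (3, 1001, 1600), (3, 1002, 100),
    (4, 1001, 2000), (4, 1002, 175),
]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def next_trip_predictions(rows):
    """Fit one line per player and predict the win for the trip after the last one."""
    by_player = defaultdict(list)
    for trip, player, win in rows:
        by_player[player].append((trip, win))
    predictions = {}
    for player, points in by_player.items():
        trips, wins = zip(*points)
        intercept, slope = fit_line(trips, wins)
        predictions[player] = intercept + slope * (max(trips) + 1)
    return predictions

predictions = next_trip_predictions(rows)
```

On these eight rows the fitted lines are win = 975 + 240 * trip for player 1001 and win = 40 * trip for player 1002, giving trip-5 predictions of 2,175 and 200.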
I also tried to predict next trip worth using a forecasting model. I was able to process the model but I was not able to browse the model content in the viewer.
Ultimately, I want to predict next-trip worth for individual players from a cube. The cube has about 1.5 to 6 million records (multiple records per player), depending on the data source.
FYI - I have created a working linear regression and a forecasting model off of a cube --- I think I am setting it up correctly.
Dec 24, 2007
Hi
I didn't understand the sample on this site, "generate DMX creation statement for a server mining structure and contained model". What is its output? You say the output is like a table, but I don't see how to use this sample, because my request was about showing the result of the time series algorithm. Please note my question: should I use a query like this to connect to the forecasting model
SELECT
PredictTimeSeries(amount)
FROM
[Forecasting]
and then save this value in the text box?
Also, the model has to be trained before a value can be shown in the text box. At which stage should I do that? Is it after creating the model or not?
In the time series algorithm, is training equivalent to historical prediction?
In the code below, why are these items unknown? After running it, we get errors on Server (in server.Connect), on Database (in Database db) and on MiningStructure (in MiningStructure ms).
private void button1_Click(object sender, EventArgs e)
{
    Microsoft.AnalysisServices.Server server = new Server();
    server.Connect("data source=localhost");
    Database db = server.Databases["DMClass"];
    foreach (MiningStructure ms in db.MiningStructures)
    {
        MessageBox.Show("Processing " + ms.Name);
        ms.Process(ProcessType.ProcessDefault);
    }
    MiningStructure msIris = db.MiningStructures["Iris"];
    MiningModel mm = msIris.CreateMiningModel(true, "newModel");
    mm.Algorithm = MiningModelAlgorithms.MicrosoftClustering;
    mm.Update();
    mm.Process(ProcessType.ProcessDefault);
    server.Disconnect();
}
Please explain this code a little.
What is the difference between AMO and ADOMD?
And can I use either of them to connect and to view the prediction value in the text box?
And how can I build a child form? Thanks a lot for your answers.
May 27, 2007
Let's take the following example:
Movie train table:
ID Class
1 +
2 +
3 -
4 +
5 -
Actor train nested table:
ID MovieID Gender
1 1 F
2 1 M
3 1 F
4 1 F
5 2 M
6 2 M
7 2 F
8 3 F
9 3 F
10 4 M
11 4 M
12 4 F
13 4 F
14 5 F
15 5 M
We want to build a classifier model in order to predict the Class of a Movie based on the Gender of the movie's actors. To deal with the nested table, Analysis Services maps each record of the nested table to an attribute of the case table. These attributes are named Actor(n).Gender with n = 1..15, so they depend on the nested table record numbers. Both the Microsoft Decision Trees and Microsoft Naive Bayes algorithms use these attributes without any modification.
We are implementing a Relational Naive Bayes algorithm and we are planning to aggregate such attributes in order to make them independent of the nested table record numbers.
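The aggregation idea can be illustrated outside Analysis Services: replace the per-record attributes Actor(n).Gender with a fixed set of per-movie counts. This Python sketch shows only that feature transformation on the training nested table above; the count_F/count_M names are hypothetical, and it is not code for the plug-in algorithm itself.

```python
from collections import Counter, defaultdict

# (actor ID, movie ID, gender) rows from the Actor training nested table
actors = [
    (1, 1, "F"), (2, 1, "M"), (3, 1, "F"), (4, 1, "F"),
    (5, 2, "M"), (6, 2, "M"), (7, 2, "F"),
    (8, 3, "F"), (9, 3, "F"),
    (10, 4, "M"), (11, 4, "M"), (12, 4, "F"), (13, 4, "F"),
    (14, 5, "F"), (15, 5, "M"),
]

def aggregate_genders(nested_rows, genders=("F", "M")):
    """Collapse each movie's nested actor rows into fixed per-gender counts,
    so the case attributes no longer depend on which actor IDs appear."""
    counts = defaultdict(Counter)
    for _actor_id, movie_id, gender in nested_rows:
        counts[movie_id][gender] += 1
    # Counter returns 0 for genders a movie does not have
    return {
        movie: {f"count_{g}": c[g] for g in genders}
        for movie, c in counts.items()
    }

features = aggregate_genders(actors)
```

Under this encoding, a movie whose actors were never seen in training (such as movie 7 in the test table) still maps onto the same two attributes.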
Next, we tried to predict some unseen cases, and here we faced a very big problem.
Let's take two more tables of unseen cases:
Movie test table:
ID Class
6 +
7 NULL
8 NULL
Actor test nested table:
ID MovieID Gender
1 6 F
2 6 M
3 6 F
4 6 F
16 7 F
17 7 M
18 7 F
19 7 F
20 7 F
21 8 M
22 8 M
23 8 F
Predicting the Class of movie 6 is not a problem, since its actors (IDs 1 to 4) were included in the training dataset, so when the records are mapped to attributes, those attributes already exist in the model. But when you try to predict movies 7 and 8, which have unseen actors, all the new attributes are simply ignored in the ALGORITHM::Predict call (in_ulCaseValues is zero!) because they do not exist in the model!
What is the solution?
May 28, 2008
Hi,
My database looks like:
CategoryID ParentID Title Sort
1 -1 Cars 1
2 1 Honda 1
3 -1 Bikes 2
4 1 Ford 2
5 1 Toyota 3
6 3 Kawasaki 1
How can I retrieve the values in the following order:
1, 2, 4, 5, 3, 6
I have:
WITH MYCTE(categoryID, parentID, Title, Sort)
(
SELECT TOP 1 categoryID, parentID, Title, Sort
FROM Categories
WHERE parentID = -1
ORDER BY Sort ASC
UNION ALL
SELECT c.categoryID, c.parentID, c.title, c.sort
FROM Categories c
INNER JOIN MYCTE cte ON (cte.categoryID = c.parentID)
)
SELECT *
FROM MYCTE
It doesn't seem to work though? Help! hehe
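The order 1, 2, 4, 5, 3, 6 is a depth-first walk of the hierarchy with siblings ordered by Sort. A common way to get it is to carry a sort path (the chain of Sort values from the root down to each row) through the recursion and order by that path at the end. This Python sketch shows only the ordering logic on the sample rows; it is an illustration, not a T-SQL translation.

```python
# (CategoryID, ParentID, Title, Sort) rows from the table above
rows = [
    (1, -1, "Cars", 1),
    (2, 1, "Honda", 1),
    (3, -1, "Bikes", 2),
    (4, 1, "Ford", 2),
    (5, 1, "Toyota", 3),
    (6, 3, "Kawasaki", 1),
]

def hierarchical_order(rows, root_parent=-1):
    """Return category IDs in depth-first order, siblings ordered by Sort."""
    by_id = {r[0]: r for r in rows}

    def sort_path(cat_id):
        # Chain of Sort values from the root down to this row,
        # e.g. Ford -> (1, 2), Kawasaki -> (2, 1)
        path = []
        current = by_id[cat_id]
        while True:
            path.append(current[3])
            if current[1] == root_parent:
                break
            current = by_id[current[1]]
        return tuple(reversed(path))

    # Tuples compare element by element, and a prefix sorts before its extensions,
    # so each parent lands directly before its children.
    return [r[0] for r in sorted(rows, key=lambda r: sort_path(r[0]))]
```

If the path is built as a string in SQL rather than as a tuple, each Sort value needs fixed-width zero padding so that, for example, 10 does not sort before 2.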
Oct 26, 2005
Hi, for a new project I'm trying to build a tree structure in SQL using one table with 'Node' & 'ParentNode' fields along with 'title', etc.
Table = Tree
Node : ParentNode : Title : Show_Record
1 0 Root 1
2 1 Child 1
Then I'm trying to get SQL to return that as XML to my tree control, 'oBout ASP TreeView'.
Now the tree control can accept XML fine as long as it's in a set format, which shouldn't be difficult and should cut my code from 200 lines to one.
However, getting SQL to return the table records as XML is proving to be a total nightmare.
I've hunted the web but am not getting very far. I've even got a couple of O'Reilly guides but still no luck, so any help with this would be excellent.
I wrote a SQL query (a basic 'select * from tree for xml raw') which returns the results as RAW XML, but when I run it in Query Analyser it returns the results as one long string broken up with '<' & '>', and it cuts off halfway through the third record.
<row node="1" parentnode="0" title="Root" type_image="book.gif" type_expanded="True"/><row node="2" parentnode="1" title="Service Delivery" type_image="page.gif" type_expanded="False"/><row node="3" parentnode="1" title="Business Support" type_image="page.
Anyone know why Query Analyser does that?
Any help in this much appreciated, as you can imagine i'm at my wits end.
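Building the nested XML outside the database can be sketched as a two-pass walk over the Node/ParentNode adjacency list. The element and attribute names here are hypothetical placeholders, since the exact format the oBout TreeView expects is not shown in the post, and Show_Record is ignored.

```python
import xml.etree.ElementTree as ET

# Sample rows shaped like the Tree table: (Node, ParentNode, Title, Show_Record)
rows = [
    (1, 0, "Root", 1),
    (2, 1, "Service Delivery", 1),
    (3, 1, "Business Support", 1),
]

def rows_to_xml(rows, root_parent=0):
    """Nest each row under its ParentNode and return the XML as a string.
    Element and attribute names are hypothetical placeholders."""
    container = ET.Element("tree")
    elements = {}
    # First pass: one element per row, so every parent exists before attaching children
    for node, _parent, title, _show in rows:
        elements[node] = ET.Element("node", id=str(node), title=title)
    # Second pass: attach to the parent element, or to the container for top-level rows
    for node, parent, _title, _show in rows:
        if parent == root_parent:
            container.append(elements[node])
        elif parent in elements:
            elements[parent].append(elements[node])
        # rows whose ParentNode does not exist are skipped
    return ET.tostring(container, encoding="unicode")

xml_text = rows_to_xml(rows)
```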
Nov 6, 2006
I am studying the behavior of 200,000 clients. Using decision trees, I would like to know whether my clients will abandon our service. I use a training set of 21,822 clients and a predictable variable "aband", which is discrete and can be 0 or 1. In my training set I have 21,597 cases in which aband is 0 and 225 cases in which aband is 1. Looking at the classification matrix obtained using a testing set (unselected data) as the input table, I can see that my decision tree doesn't recognize the cases in which aband is 1. Here is the classification matrix:
Counts for Dati Training on [Aband]
Predicted   0 (Actual)   1 (Actual)
0           21597        225
1           0            0
What should I do?
Chiara
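The matrix shows why overall accuracy is misleading here: predicting 0 for everyone is right about 99% of the time while catching none of the 225 actual abandoning clients. This Python sketch computes both figures from the matrix above and shows one common, generic remedy, oversampling the minority class in the training data. Oversampling is a general technique, not an Analysis Services setting, and the `copies` factor is an arbitrary assumption.

```python
# Classification matrix from the post, keyed by (predicted, actual)
matrix = {(0, 0): 21597, (0, 1): 225, (1, 0): 0, (1, 1): 0}

def accuracy(m):
    """Share of all cases where the predicted class equals the actual class."""
    total = sum(m.values())
    correct = sum(count for (pred, act), count in m.items() if pred == act)
    return correct / total

def recall(m, positive=1):
    """Share of actual `positive` cases that the model predicted as positive."""
    actual_pos = sum(count for (pred, act), count in m.items() if act == positive)
    return m.get((positive, positive), 0) / actual_pos if actual_pos else 0.0

def oversample(cases, label_key, minority=1, copies=10):
    """Repeat each minority-class case `copies` times (deterministic oversampling)."""
    balanced = []
    for case in cases:
        balanced.extend([case] * (copies if case[label_key] == minority else 1))
    return balanced
```

Oversampling should be applied to the training cases only; the testing set keeps its real class ratio so the matrix still reflects actual behavior.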
Mar 6, 2007
hi,
I am using the Time Series algorithm. I just want to know about the autoregression tree. My data looks like this:
Studid Date Perf
001 01/01/2007 90
001 02/01/2007 95
001 03/01/2007 89
002 01/01/2007 79
002 02/01/2007 90
002 03/01/2007 95
and so on. When I use the Model Viewer --> Decision Tree, it shows something like:
Perf = 90.0084 + 1.02 * Perf(-2) + 0.25 * Perf(-2).
What is this value, and how is it calculated?
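A leaf of an autoregression tree holds a linear formula over lagged values of the series, where Perf(-k) is read as the value k time steps before the one being predicted. This Python sketch evaluates a formula of that general shape for one series; the coefficients and history are hypothetical (not the values from the viewer above), and it assumes the terms are applied literally as printed. The viewer's exact convention may differ, so check the Microsoft Time Series documentation before relying on it.

```python
def evaluate_leaf(intercept, terms, history):
    """Evaluate value = intercept + sum(coef * value(-lag)).

    terms:   list of (coefficient, lag) pairs, lag >= 1
    history: one series (e.g. one Studid) in time order; value(-1) is the
             most recent observation, value(-2) the one before it, and so on.
    """
    return intercept + sum(coef * history[-lag] for coef, lag in terms)

# Hypothetical formula: value = 10 + 0.5 * value(-1) + 0.25 * value(-2)
prediction = evaluate_leaf(10.0, [(0.5, 1), (0.25, 2)], [80.0, 90.0, 100.0])
```

Here value(-1) is 100 and value(-2) is 90, so the sketch returns 10 + 50 + 22.5.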
May 18, 2006
I would appreciate answers to the following doubts I have regarding Decision trees, CONTAINS and using CONTAINS in a DMX query:
1. Does the MS Decision Trees algorithm work only with equality/inequality conditions for the nodes? Is it possible to use a predicate as the branching criterion for a node?
2. Can the T-SQL predicate CONTAINS(...) be used in a DMX query? I need to check if a column-value is a substring of another column and create an intermediate column that will enable me to construct a decision tree with the phrase-present/absent branch.
3. Can CONTAINS(...) be used in a SELECT clause? For example:
SELECT CONTAINS(JAT.column1, '"Good day"')
FROM JustAnotherTable;
4. Does CONTAINS(...) accept column references for both arguments? Or must the pattern (argument #2) be a literal string or a variable? For example, is the following expression valid?
SELECT * FROM JustAnotherTable JAT
WHERE CONTAINS(JAT.column1, JAT.column3);
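The intermediate phrase-present/absent column from question 2 can also be derived before training. In T-SQL, a plain substring test such as CHARINDEX(JAT.column3, JAT.column1) > 0 takes column arguments (case sensitivity then follows the column collation). The Python sketch below does the same derivation on rows held as dicts; the flag column name is hypothetical, and matching is case-insensitive by assumption.

```python
def add_phrase_flag(rows, text_col, phrase_col, flag_col="phrase_present"):
    """Return copies of `rows` (dicts) with a 0/1 column that is 1 when the value
    in `phrase_col` occurs as a substring of `text_col` (case-insensitive)."""
    flagged = []
    for row in rows:
        text = (row[text_col] or "").lower()
        phrase = (row[phrase_col] or "").lower()
        new_row = dict(row)
        # An empty phrase would trivially match, so treat it as absent
        new_row[flag_col] = int(bool(phrase) and phrase in text)
        flagged.append(new_row)
    return flagged
```

The resulting 0/1 column can then be used as an ordinary discrete input, which gives the tree a phrase-present/absent split without needing predicate-based branching.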
Aug 3, 2007
Hi,
I'm new to data mining, and have created an MS decision trees model. The model has the columns age, call outcome, call reason, country name, employee name and gender - all as inputs.
In the mining model viewer, I only get nodes for the age, despite having data for all the other columns.
Can anyone help?
Thanks
Jeremy
Dec 8, 2007
Hi,
I'm interested in understanding how the parameters work in the MS Decision Trees algorithm.
As far as I can tell, the MINIMUM_SUPPORT and COMPLEXITY_PENALTY parameters both control the number of splits and hence the depth of the tree.
Unfortunately the BOL descriptions are very brief - so can anyone tell me the difference between these 2 parameters?
Thanks
Jeremy
Mar 19, 2007
Hello.
I am trying to build a decision tree to predict prices. I have created the tree and looked at the lift charts, but I have not seen any of the traditional statistics I am used to from other programs (R-Squared, F statistics, etc.).
Does anyone have an example of how they calculated R-Squared for a decision tree on a continuous variable?
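R-squared itself does not depend on the model type: given actual values and the model's predictions for the same cases (for example, exported from a prediction query over held-out data), it is 1 - SS_res / SS_tot. A minimal Python sketch of that formula:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

Computed this way on held-out cases, the value can be negative when the model predicts worse than simply using the mean. F statistics, by contrast, rely on the degrees of freedom of a linear model and do not carry over directly to a tree.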
Thanks,
Brian
Dec 13, 2006
Hello,
I installed the Bike Buyer example and I am learning the DMX language. I wrote the following query (using MS Decision Trees):
SELECT
T.[Last Name],
[Bike Buyer],
PredictProbability(Predict([Bike Buyer])) AS [Probability]
From
[v Target Mail]
PREDICTION JOIN
OPENQUERY
(....... And so on..)
The result surprises me: in the result table, all the probabilities are equal.
Bike Buyer Probability
1 0.99994590500919611
0 0.99994590500919611
0 0.99994590500919611
0 0.99994590500919611
0 0.99994590500919611
1 0.99994590500919611
and so on.
Now I am wondering what PredictProbability means. I thought PredictProbability meant the probability that the prediction is correct. But all the probabilities are the same even though the inputs differ. Can somebody tell me what PredictProbability means, or am I using it wrong?
Thanx in advance,
Joris Valkonet
Sep 12, 2007
In a decision tree algorithm, is there a known way to force a branch at the top level? For example, I have 30 known decision patterns that are going to be completely different, and I don't want them to intermingle. I want to force a branch at the top node on one of the 30 patterns so I wouldn't have to create 30 mining models per client.
Brian
May 9, 2006
I have some accounting data, with some transaction attributes and amounts.
I'm using Decision Trees to try and predict the next month's amount for certain combinations of attributes.
I've tried two different structures for the model:
A: one with 9 discrete text input attributes.
B: another with the same 9 attributes plus an average Amount for each combination of the nine attributes, for every transaction.
When I process them and look at the dependency network, it says that the strongest link for structure A is attribute "1".
For the second, it's the average-Amount attribute.
Okay, that seems fine, but the second strongest link in structure B is attribute "2".
Shouldn't it be attribute 1 like in structure A?
Second question: if I run the same data through a Neural Network model, the predictions become much worse than with the decision tree.
I get many predictions that are negative values even though all the training data contains positive values.
The StDev is also the same for every row.
What am I doing wrong with that one? I have a lot of transactions, and I read somewhere that a Neural Network should work better than a decision tree in a case similar to mine.
The score in the lift chart for the Neural Network model is 0.00, and for Decision Trees with the same data I get around 110.
Jul 26, 2007
How is the value of Prediction Probability calculated in the context of decision trees?
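A simplified way to think about a tree's prediction probability: each case follows the splits down to one leaf, and the probability of a state is that state's share of the training cases that reached the leaf, optionally smoothed. This Python sketch is a generic illustration of that idea, not Analysis Services' exact formula (the product applies its own adjustments, so its numbers will differ).

```python
def leaf_probability(state_counts, state, smoothing=0.0):
    """Probability of `state` at a tree leaf as its (optionally Laplace-smoothed)
    share of the training cases that reached the leaf.

    state_counts: dict mapping each predictable state to its case count in the leaf
    smoothing:    pseudo-count added to every state (0.0 = plain relative frequency)
    """
    total = sum(state_counts.values())
    k = len(state_counts)
    denominator = total + smoothing * k
    if denominator == 0:
        raise ValueError("leaf has no cases and no smoothing")
    return (state_counts.get(state, 0) + smoothing) / denominator
```

One consequence of this scheme is that every case landing in the same leaf gets the same probability, so a tree with no splits returns an identical value for every row.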
Dec 7, 2006
Hi,
I am using the MS Decision Trees algorithm, and for a specific model I get the above warning. As a result, I don't get any splits in my tree. Is there anything I can do to avoid this?
Thank you for reading
Jan 5, 2007
Hi,
I am trying to run one of the mining models from the book "Delivering BI using SQL Server 2005", but I am running into "Decision Trees found no splits for model". The mining structure has 4 columns, the fourth one marked as "Predict Only". My cube slice for the model has sufficient data. I am lost. Help!
Regards
Dec 12, 2007
While recently working with several mining models, I came across something that struck me as pretty odd - and I'm hoping to find an explanation for the behavior.
Consider the following setup:
A single table in the relational database represents the only case table
A single, continuous column is the predictable
A mining structure has been created
The mining structure contains a single model, based on the MS Decision Trees algorithm
Input columns were selected for the model via the BI Studio wizard (i.e., those provided via the "Suggest" button)
The structure has been fully processed
Now, the interesting parts:
1. I view the scatterplot for the mining model, under the Mining Accuracy Chart tab
2. Back on the Mining Structure tab, I delete one of the input columns
3. I add the same column back into the structure
4. The structure is fully processed again
5. When I view the scatterplot for the mining model, under the Mining Accuracy Chart tab, a different set of data points is presented for the model predictions
6. A different set of decision trees under the Mining Model Viewer tab confirms this
How could different patterns have been found this second time around, even though all of the input columns (as well as the training cases) were the same?
(Note: I encountered this situation while creating a new mining model that was identical to an existing one. Even though the models received the exact same inputs and training cases, they yielded different results. I was able to reproduce the behavior by using steps 1-6 above, though.)
Can someone provide some insight on this behavior, or some kind of explanation of what may be happening?
Thanks,
Joe Miller
Jan 31, 2005
I would like to find information on Clustered and Non-clustered indexes and how B-trees are used. I know a clustered index is placed into a b-tree which makes sense for fast ordered searching. What data structure does a non-clustered index use and how? I tried to find info. on the web but couldn't get much detail...
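In SQL Server, a nonclustered index is also a B-tree, but its leaf level holds the index key plus a row locator (the clustering key when the table has a clustered index, or a row identifier for a heap) instead of the data rows themselves. The Python sketch below models only that two-step lookup: a sorted list searched with binary search stands in for the nonclustered B-tree's leaf level, and a dict stands in for the clustered index. Table contents and column names are hypothetical.

```python
from bisect import bisect_left

# Clustered index stand-in: full rows reachable by clustering key.
clustered = {
    101: {"id": 101, "last_name": "Smith", "city": "Oslo"},
    102: {"id": 102, "last_name": "Jones", "city": "Lima"},
    103: {"id": 103, "last_name": "Brown", "city": "Oslo"},
}

# Nonclustered index stand-in: sorted leaf entries of (index key, row locator),
# where the row locator is the clustering key.
nonclustered_leaf = sorted((row["last_name"], key) for key, row in clustered.items())

def seek(last_name):
    """Find rows by last_name: binary-search the nonclustered leaf level,
    then fetch each full row from the clustered index (a "key lookup")."""
    i = bisect_left(nonclustered_leaf, (last_name,))
    results = []
    while i < len(nonclustered_leaf) and nonclustered_leaf[i][0] == last_name:
        _, clustering_key = nonclustered_leaf[i]
        results.append(clustered[clustering_key])
        i += 1
    return results
```

A real B-tree adds intermediate levels above the sorted leaf pages so a seek touches only a few pages, but the key-then-locator structure is the part that differs from a clustered index.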
Sep 29, 2015
I followed the tutorial posted at [URL] ...
Everything was ok until the last step where I had to process the mining structure which resulted in a warning
"Informational (Data mining): Decision Trees found no splits for model, Tbl Decision Tree Example."
What does this error mean? How do I resolve it? Also, I only see the first level in the Mining Model Viewer, I don't see the levels 2 and 3.