Decision Trees
Nov 6, 2006
I am studying the behavior of 200.000 clients. With the use of decision trees I would like to know if my clients will abandon our service or not. I use a training set of 21.822 clients and I use a predict variable "aband" wich is a discrete variable and it can be 0 or 1. In my training set i have 21.597 cases in which aband is 0 and 255 cases in which aband is 1. Looking at the classification matrix obtained using as input table a testing set (unselected data) I can see that my decision tree doesn't recognize the cases in which aband is 1. Here is the Classification Matrix:
Counts for Dati Training on [Aband]
Predicted 0 (Actual) 1 (Actual)
0 21597 225
1 0 0
What should I do?
Chiara
View 3 Replies
ADVERTISEMENT
May 18, 2006
I would appreciate answers to the following doubts I have regarding Decision trees, CONTAINS and using CONTAINS in a DMX query:
1. Does MS decision tree work only off equality/inequality conditions for the nodes? Is it possible to use a predicate as the branch criteria for a node?
2. Can the T-SQL predicate CONTAINS(...) be used in a DMX query? I need to check if a column-value is a substring of another column and create an intermediate column that will enable me to construct a decision tree with the phrase-present/absent branch.
3. Can CONTAINS(...) be used in a select clause? Like -
SELECT CONTAINS(JAT.column1, '"Good day"')
FROM JustAnotherTable;
4. Does CONTAINS(...) support both arguments to be column references? Or, is it mandatory that the pattern (argument #2) has to be a literal string or a variable? E.g.: I need to know the validity of the following expression -
SELECT * FROM JustAnotherTable JAT
WHERE CONTAINS(JAT.column1, JAT.column3);
View 1 Replies
View Related
Aug 3, 2007
Hi,
I'm new to data mining, and have created an MS decision trees model. The model has the columns age, call outcome, call reason, country name, employee name and gender - all as inputs.
In the mining model viewer, I only get nodes for the age, despite having data for all the other columns.
Can anyone help?
Thanks
Jeremy
View 12 Replies
View Related
Dec 8, 2007
Hi,
I'm interested in understanding how the parametes work in the MS Decision Trees algorithm.
As far as I can tell, the MINIMUM_SUPPORT and COMPLEXITY_PENALTY parameters both control the number of splits and hence the depth of the tree.
Unfortunately the BOL descriptions are very brief - so can anyone tell me the difference between these 2 parameters?
Thanks
Jeremy
View 1 Replies
View Related
Mar 19, 2007
Hello.
I am trying to build a decision tree to predict prices. I have created the tree and looked at the lift charts, but I have not seen any of the traditional statistics I am used to from other programs (R-Squared, F statistics, etc.).
Does anyone have an example of how they calculated R-Squared for a decision tree on a continuous variable?
Thanks,
Brian
View 9 Replies
View Related
Dec 13, 2006
Hello,
I installed the bike buyer example and i am learning the DMX language. Now i wrote the following query (using MS decision trees):
SELECT
T.[Last Name],
[Bike Buyer],
PredictProbability(Predict([Bike Buyer])) AS [Probability]
From
[v Target Mail]
PREDICTION JOIN
OPENQUERY
(....... And so on..)
Now the result is surprising to me. In the resulttabel all the probabilities are equal.
Bike Buyer Probability
1 0.99994590500919611
0 0.99994590500919611
0 0.99994590500919611
0 0.99994590500919611
0 0.99994590500919611
1 0.99994590500919611
and so on.
Now i am wondering what predictProbability means. I thought that PredictProbability meant the probability that the prediction is correct. Now all the probabilities are the same and the input is different. Can somebody tell me what PredictProbability means or am I using it wrong?
Thanx in advance,
Joris Valkonet
View 6 Replies
View Related
Sep 12, 2007
In a decision tree algorithm, is there a known way to force a branch at a top level? For exmaple, I have 30 known decision patterns that are going to be completely different and I don't want them to intermingle. I wanted to force a branch at the top node on one of the 30 patterns so I wouldn't have to create 30 mining models per client.
Brian
View 4 Replies
View Related
May 9, 2006
I have some accounting data, with some transaction attributes and amounts.
I'm using Decision Trees to try and predict the next month's amount for certain combinations of attributes.
I've tried two different structures for the model:
A: one with 9 discrete text input attributes.
B: And another with the same 9 attributes + a avarage Amount for all combinations of the nine attribute for every transaction.
When i've processed them and look in the dependency network, it says that the strongest link for the structure A is attribute "1".
And for the second its the avarage-Amount attribute.
Okey, that seems fine, but the second strongest link in structure B is attribute "2".
Shouldn't it be attribute 1 like in structure A?
Second question, if I run the same data in a Neural Network model, the prediction becomes much worst then the decision tree.
I get many predictions that are negative values even though all training data contains positiv values.
The StDev becomes the same for every row also..
What am I doing wrong with that one. I have alot of transactions and a read somewhere that a Neural Network should work better than a decision tree in a case similar to mine.
The score in the "Lift chart" for the Neural Network model becomes 0,00 and for Decision Trees with the same data I get around 110.
View 1 Replies
View Related
Jul 26, 2007
How is the value of Prediction Probability calculated in the context of decision trees?
View 7 Replies
View Related
Dec 7, 2006
Hi,
I am using MS Decision Trees algorithm and for a specific model i get the above warning.As a result of that i dont get any splits in my tree. Is there anything i can do to avoid this?
Thank you for reading
View 1 Replies
View Related
Jan 5, 2007
Hi,
I am trying to run one of the mining models from the book "Delivering BI using SQl Server 2005" but I am running into "Decision Trees found no splits for model". The mining structure has 4 columns, the fourth one being marked as "Predict Only". My Cube slice for the model has sufficient data in the cube. I am lost.. Help!!
Regards
View 4 Replies
View Related
Dec 12, 2007
While recently working with several mining models, I came across something that struck me as pretty odd - and I'm hoping to find an explanation for the behavior.
Consider the following setup:
A single table in the relational database represents the only case table
A single, continuous column is the predictable
A mining structure has been created
The mining structure contains a single model, based on the MS Decision Trees algorithm
Input columns were selected for the model via the BI Studio wizard (i.e., those provided via the "Suggest" button)
The structure has been fully processed
Now, the interesting parts:
I view the scatterplot for the mining model, under the Mining Accuracy Chart tab
Back on the Mining Structure tab, I delete one of the input columns
I add the same column back into the structure
The structure is fully processed again
When I view the scatterplot for the mining model, under the Mining Accuracy Chart tab, a different set of data points are presented for the model predictions
A different set of decision trees under the Mining Model Viewer tab confirms thisHow could different patterns have been found this second time around, even though all of the input columns were the same (as well as the training cases)?
(Note: I encountered this situation while creating a new mining model that was identical to an existing one. Even though the models received the exact same inputs and training cases, they yielded different results. I was able to reproduce the behavior by using steps 1-6 above, though.)
Can someone provide some insight on this behavior, or some kind of explanation of what may be happening?
Thanks,
Joe Miller
View 3 Replies
View Related
Sep 29, 2015
I followed the tutorial posted at [URL] ...
Everything was ok until the last step where I had to process the mining structure which resulted in a warning
"Informational (Data mining): Decision Trees found no splits for model, Tbl Decision Tree Example."
What does this error mean? How do I resolve it? Also, I only see the first level in the Mining Model Viewer, I don't see the levels 2 and 3.
View 2 Replies
View Related
May 28, 2008
Hi,
My database looks like:
CategoryID ParentID Title Sort
1 -1 Cars 1
2 1 Honda 1
3 -1 Bikes 2
4 1 Ford 2
5 1 Toyota 3
6 3 Kawasaki 1
How can I retrieve the values in the following order:
1, 2, 4, 5, 3, 6
I have:
WITH MYCTE(categoryID, parentID, Title, Sort)
(
SELECT TOP 1 categoryID, parentID, Title, Sort
FROM Categories
WHERE parentID = -1
ORDER BY Sort ASC
UNION ALL
SELECT c.categoryID, c.parentID, c.title, c.sort
FROM Categories c
INNER JOIN MYCTE cte ON (cte.categoryID = c.parentID)
)
SELECT *
FROM MYCTE
It doesn't seem to work though? Help! hehe
View 4 Replies
View Related
Oct 26, 2005
Hi, for a new project i'm trying to build a tree structure in SQL using one table with 'Node' & 'ParentNode' fields along with 'title', etc.
Table = Tree
Node : ParentNode : Title : Show_Record
1 0 Root 1
2 1 Child 1
Then i'm trying to get SQL to return that in XML to my Tree Control 'oBout ASP TreeView'.
Now the tree control can accept XML fine as long as it's in a set format, which shouldn't be difficult and should cut my code from 200 lines to one.
However getting SQL to return the table records in XML is proving to be a total nightmare.
I've hunted the web but not getting very far, I've even got a couple of O'Reilly guides but still no luck, so any help would be excellent with this.
I wrote a sql query (basic 'select * from tree for xml raw') which returns the results in RAW XML, but when I run this in Query Analyser it returns the results as one long string broken up with '<' & '>' but gets to the third record and cuts off halfway.
<row node="1" parentnode="0" title="Root" type_image="book.gif" type_expanded="True"/><row node="2" parentnode="1" title="Service Delivery" type_image="page.gif" type_expanded="False"/><row node="3" parentnode="1" title="Business Support" type_image="page.
Anyone know why Query Analyser does that?
Any help in this much appreciated, as you can imagine i'm at my wits end.
:eek:
View 4 Replies
View Related
Jun 30, 2006
Hi! I have created a DMM using Trees. But when I go to the Mining Model Predition tab and select a Predict function, I get this in the criteria column: <Scalar column reference>[, EXCLUDE_NULL|INCLUDE_NULL][, INCLUDE_NODE_ID]. When select Result, I get this error: "An incorrect number of arguments are used in the function at line 3, column 3." I'm predicting a continuous variable.
But when I delete everything except <Scalar column reference> I get this error: "Parser: The syntax for '<' is incorrect."
When I delete everything in the criteria column, I get this: "Query execution failed."
If I change the criteria to "<Scalar column reference>,INCLUDE_NULL, INCLUDE_NODE_ID" I get the error again that the query execution failed.
I'm working from a data set I created. I had no problems with predictions using clustering, but can't seem to get Trees to work.
View 3 Replies
View Related
Mar 6, 2007
hi,
I am using Time series alogorrithm.I just wants to know about the autoregression tree.I am having data like
Studid Date Perf
001 01/01/2007 90
001 02/01/2007 95
001 03/01/2007 89
002 01/01/2007 79
002 02/01/2007 90
002 03/01/2007 95
Like that. when I use my Model Viewer --> Descision Tree --> It shows like
Perf = 90.0084 + 1.02 * Perf(-2) + 0.25 * Perf(-2).
What is this value and how its getting calculated?
View 1 Replies
View Related
Dec 10, 2007
Hello everyone
I'm designing a new database for a project. In this database I have a calendar table with with the following columns:
id, dateValue, year, quarter, week, month, englishMonthName, day, englishDayName, dayOfTheYear, isWeekendDay
This database also has a session monitor that logs every access to the database, with a relation to the calendar row id. This way, I can make database access reports, without replicating the date value.
My question is: for a membership table should I follow the same principle and relate member row to the session monitor, which in turn, relates to the calendar or should I put the date just there?
Some of the tables of this database will have to handle some heavy load, both for updating and selecting. This said, my question is should I make a link or put the date just there to extinguish the need to make 2 joins just to know when something was registered / updated? If I only place the relation to know the date I'll have to do something like:
SELECT
DATEADD(ss, x.timeOffsetInSeconds, c.dateValue) AS date
FROM
<somewhere> x
JOIN sessionMonitor sm ON sm.id = x.sessionMonitorId
JOIN calendar c ON c.id = sm.calendarId
Instead of just doing a select x.lastUpdateDate
How would you gurus usually deal with these situations?
Best regards
View 4 Replies
View Related
Jan 17, 2008
I've decided to put the clustered index on the edit date column on an audit table. As the edit date for a new record is always going to be higher (more recent) than the previous record, the value would go onto the end of the index. So is there still a value in (1) providing a fill specification of less than 100% and (2) padding the index?
View 13 Replies
View Related
Apr 16, 2007
I'm having this problem.....
I wanted to use the Decision Tree to show a result..... after i configure the Mining Structures..... and set all the input.... my decision tree shows only until level 2..... i have 3 input and one PredictOnly column.....where is the other input?
Say.... i have House Owner, Marital Status, Num Cars Owned and Number Of Children(PredictOnly)
my Tree only shows All ---- > Marital Status when i input all 3 together...... the other 2 doesn't seems to show.
wat should i do?? my database in SQL Server and the other keys are all correct and deploying finely.....why is this happening.....?
i'm a newbie in this software.......so any pro here can plz help me if there's actually something that i might have missed out along the way.......
Thank you again.........
View 1 Replies
View Related
Nov 13, 2001
Playing with DTS for converting text data to SQL Server table and sending email messages. Package is to be scheduled hourly.
1. Is there any easy way to force DTS to choose Success or Failure based on a SQL Task? Wish to check contents of a table for new values and branch accordingly in DTS.
2. When new records are brought in I wish to build a custom email message rather than attaching a text file. Any Idea how to build a custom message?
I am attempting to auto scan an error log on a Unix based system and email myself when errors occur. This is an exercise to test scheduling of jobs, bridging between technologies and sending out meaningful messages based on the process via email.
Thanks,
Mike Hoyt
View 1 Replies
View Related
Mar 2, 2004
Whilst on the Nth hour (n = many) of my magical journey through MS Sql BOL I've come across OLTP Vs Decision Support. After a couple of searches here could someone shore up the following for me please...
A decision support database is the same as warehouse database.
This is for static data commonly used for reporting and analysis.
OLTP is a live database (accomodates inserts, deletes, updates etc).
Is that right?
Also would it be fair to assume that a decision support database is generally going to be spawned from the historical data of an OLTP database? Any real world examples of these two terms would be greatly appreciated too.
Cheers
Dan
View 2 Replies
View Related
Jun 28, 2007
Hi,
Can we represent the Decision Tree in a programatically way in an .NET application? I understand that the outcome of a Decision Tree model can be integrated into an .NET application but not sure if we can also visualize it. Does MS SQL Server support any API to render such a tree?
Thanks a lot!
View 3 Replies
View Related
Oct 20, 2007
I have got a lot of results like the following two nodes:
All
Existing Cases: 1035298
Missing Cases: 1604
Y = 3,214,966,177,062,520,000,000.000
a >= -0.9822378254 and < -0.7867621803
Existing Cases: 45291
Missing Cases: 17
Y = 9,491,528,329,086,450,000,000.000
Every node of the tree is as odd as this. I checked the training data and found there are 5 bad points with extraordinarily high values of Y. There are over a million points, how can these five points screw up the entire analysis.
I do have good results for other predicted parameters even though they also bad points.
Any tip?
Thanks,
View 3 Replies
View Related
May 2, 2006
I have an example case below :
Customer Id
Debt Level
Income Level
Employment Type
Credit Risk
1
High
High
Self-Employed
Bad
2
High
High
Salaried
Bad
3
High
Low
Salaried
Bad
4
Low
Low
Salaried
Good
5
Low
Low
Self-Emplyed
Bad
6
Low
High
Self-Employed
Good
7
Low
High
Salaried
Good
My question is how to make a tree from the case above I mean what method we should use to split the tree. (Mannually counting)
I hope anyone could help me by explaining i details.Because i want to make some analysis how microsoft decision tree works exactly.So Please explain me the process to build the tree completely with the method.
Thanks a lot.
View 4 Replies
View Related
May 16, 2006
Hi again ....i'm trying to understand the works of microsoft decision tree algorithm
I have an example case below :
Customer Id
Debt Level
Income Level
Employment Type
Credit Risk
1
High
High
Self-Employed
Bad
2
High
High
Salaried
Bad
3
High
Low
Salaried
Bad
4
Low
Low
Salaried
Good
5
Low
Low
Self-Emplyed
Bad
6
Low
High
Self-Employed
Good
7
Low
High
Salaried
Good
My question is how about the equations used to determine a split?
Please explain me detailed.
Thanks a lot.
View 1 Replies
View Related
Dec 5, 2007
Hello,
Im working on my minor project for my Undergrad course.
I have no earlier experience on working with SQL, im the biggest noob if there ever was one.
For a part of my project i have to design a page using php and sql to query from a big student database selected details(Rank, Sex, Branch) and calculate the industrial placement chances and to construct a multiway decision search tree on SQL(im using WAMP server).
This page is supposed to help new students joining the college decide an ideal branch based on past performances and placement record. A new student will enter his rank and relevant details and the from the decision tree an ideal branch(es) with high placement history will be suggested.
My project assignment reads:
"Now from the above prepared data constuct a decision search tree implement it a either using association rules or persistent Objects and store it in secondary storage as shown
Further studies can be done to improve existing decision trees ... data mining bayesian classifier blah blah blah ... "
What i have done till now is create a table in this format:
But this hardly a tree. Rather i had flattened each path of the tree and made it into a table like:
[node] -> [node] -> [node] -> [leaf]
I have tried to read some text on how to do this, but its not making sence and most importantly im not sure what im reading is actually going to help me achieve my project goals. Right now stranded reading random articles. I have to do this within 5 days. I have asked people around here some professionals and teachers, noone seems to have done this before. A little help in direction would be greatly appreciated.
Regards
Anurag
View 6 Replies
View Related
Feb 19, 2007
Hi
Can anyone tell me the steps involved in retrieving a model's (decision tree) pmml and use the model content to devleop a web based interface. I am using SQL Server 2005.
Thanks,
Nathan
View 5 Replies
View Related
Jan 9, 2008
Hi,
I'm using SQL Server 2005 Standard Edition, and when I try to process a Decision Tree with more or less 50 input variables I get the following warning:
"Informational (Data mining): Automatic feature selection has been applied to model, TREE_2 due to the large number of attributes. Set MAXIMUM_INPUT_ATTRIBUTES and/or MAXIMUM_OUTPUT_ATTRIBUTES to increase the number of attributes considered by the algorithm."
I've tried to set MAXIMUM_INPUT_ATTRIBUTES to 10 and then there's an error saying: "The 'MAXIMUM_INPUT_ATTRIBUTES' data mining parameter is not valid for the 'TREE_2' model."
Does anyone have a clue of how can I solve it?
Thank you.
View 3 Replies
View Related
Apr 12, 2006
I have read some sources about microsoft decision tree algorithm like in claude seidman book, paper about scalable classification over sql databases and paper about learning bayesian network. But i still don't understand and i still didn't get the point on how microsoft decision tree algorithm works exactly when splitting an atribut. Because i have read that microsoft decision tree using Bayesian score to split criteria is it true?
Well, anyone could help me to understand about microsoft decision tree algorithm, please give me details explanation with some example(cases).
thanks for anyone help
View 12 Replies
View Related
Jun 10, 2006
hai ...............all
well i've read in Claude seidmann book about Data mining with microsoft decision, that the statistical techniques employed to build the decision trees include:
Cart, Chaid and C.45.Could anyone explain to me about cart,chaid and c.45? and how the tree statistical techniques influence the decision tree.
thank you so much
View 1 Replies
View Related
Dec 6, 2007
Hi all,
I am searching for days for a paper explaining in details the decision tree algorithm that Microsoft uses. It would be very nice if parameters are described in details and the theory basis illustrated. I will be very happy to know in depeth fro this algorithm and how its parameter it affects the results.
Thank you in advance
Manolis
View 1 Replies
View Related
Mar 20, 2001
We have SQL server 7 installed. And we experiencing the performance problem.
When I tried to solve this problem I have found the interesting thing:
We have the table S_EVT_ACT with the non clustered index
S_EVT_ACT_F4 created on [OWNER_PER_ID], [APPT_REPT_FLG] fields and another clustered one
S_EVT_ACT_M4 created on [ROW_STATUS], [OWNER_PER_ID] fields
I use the next select statement and before run this statement I declare the variable @P1 - @P5 and set the values to them:
declare @P1 as char(1)
declare @P2 as char(1)
declare @P3 as char(1)
declare @P4 as varchar(10)
declare @P5 as char(1)
set @P1 = 'Y'
set @P2 = 'Y'
set @P3 = 'N'
set @P4 = '1-K56'
set @P5 = 'Y'
SELECT
..
...
...
FROM
dbo.S_EVT_ACT T1 --(index = S_EVT_ACT_F4)
LEFT OUTER JOIN dbo.S_CONTACT T2 ON T1.TARGET_PER_ID = T2.ROW_ID
LEFT OUTER JOIN dbo.S_OPTY T3 ON T1.OPTY_ID = T3.ROW_ID
LEFT OUTER JOIN dbo.S_ORG_EXT T4 ON T1.TARGET_OU_ID = T4.ROW_ID
LEFT OUTER JOIN dbo.S_EVT_ACT_X T5 ON T1.ROW_ID = T5.PAR_ROW_ID
WHERE
((T1.ALARM_FLAG = @P1 OR T1.APPT_REPT_REPL_CD IS NOT NULL) AND
(T1.APPT_REPT_FLG = @P2 AND (T1.CAL_DISP_FLG = @P3 OR T1.CAL_DISP_FLG IS NULL)) AND
(T1.OWNER_PER_ID = @P4) AND
(T1.TEMPLATE_FLG != @P5 OR T1.TEMPLATE_FLG IS NULL))
ORDER BY
T1.CREATED
In this case Query Analyzer uses S_EVT_ACT_M4 index and the performance is bad!
But when I try to run the same statement and use hard code 'Y' instead of the variable @P2 the Query Analyzer uses the S_EVT_ACT_F4 index and performance is PERFECT.
Question: What the difference between using variable @P2 and hard code 'Y' in the select statement and how to configure SQL server to use the right index in the situation when I can't change the Select statement and use the hard coding?
Any Idea will be very Appreciated.
Dmitri Denejkine
stssystems
MS SQL Server DBA, MIS
(514) 426-0822 ext. 2676
View 3 Replies
View Related