Interpreting The Percentage In Decision-tree Model

Sep 15, 2006

Hi,

I used a decision-tree mining model to describe and predict fraud. The table contains 1039 records with 775 distinct values of A-number (the calling party). I used 9 columns in the model, and SQL Server reports that only 3 of them are significant in predicting fraud:

- BPN_is_too_short (called-party number is too short)
- Duration_is_zero
- Invalid_area_code

The key column is A-number, and the predicted column is Is_Fraud, whose only values are 0 and 1. No record has NULL (a missing value) in Is_Fraud.

In the first split, the Mining Legend shows:
- 625 cases of fraud
- 150 cases of non-fraud
- 0 cases of missing

In addition, the Mining Legend shows:
- 79.69% fraud
- 19.64% non-fraud
- 0.67% missing

When I compare these values, they don't match:
(A) 625/775 is 80.645%, not 79.69%
(B) 150/775 is 19.355%, not 19.64%
(C) 0 missing (NULL) cases should mean 0% missing, not 0.67%
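To make the comparison in (A) through (C) easy to rerun, here is a minimal Python sketch. It assumes, as described above, that each of the 775 distinct A-numbers is one case, divides each raw count by that total and prints the result next to the percentage read off the Mining Legend. The helper name `raw_shares` and the table layout are illustrative only; the sketch reproduces the raw arithmetic (80.645%, 19.355%, 0%) and does not model how SQL Server computes the legend's figures.

```python
# Root-node figures as quoted in the post: (label, case count, Mining Legend %)
ROOT_NODE = [
    ("fraud", 625, 79.69),
    ("non-fraud", 150, 19.64),
    ("missing", 0, 0.67),
]


def raw_shares(rows):
    """Hypothetical helper: return (label, raw %, legend %) with raw % = count / total * 100."""
    total = sum(count for _, count, _ in rows)
    return [(label, 100.0 * count / total, legend) for label, count, legend in rows]


if __name__ == "__main__":
    # One line per state: raw share, legend percentage, and legend minus raw.
    for label, raw, legend in raw_shares(ROOT_NODE):
        print(f"{label:<10} raw {raw:7.3f}%   legend {legend:6.2f}%   gap {legend - raw:+.3f}")
```

Running it prints one line per state, so the gap between the plain proportion and the legend value is visible for each of fraud, non-fraud and missing.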

Furthermore, in one node (the split on Duration_is_zero) there are 541 cases of fraud and 0 cases of non-fraud, which implies it is a leaf node. However, the Mining Legend shows:

(D) 514 cases of fraud, 99.35%
(E) 0 cases of non-fraud, 0.33%
(F) 0 cases of missing, 0.33%
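The same arithmetic can be applied to the leaf node in (D) through (F). In this minimal sketch (again using an illustrative `raw_shares` helper, restated so the block runs on its own), the raw share of fraud comes out at exactly 100% because the other two counts are zero; that holds whether the fraud count is 514, as the legend shows, or 541, as stated in the text. The legend's 99.35% / 0.33% / 0.33% split is therefore not a plain proportion of the counts.

```python
# Leaf-node figures (D)-(F) as quoted from the Mining Legend: (label, count, legend %)
LEAF_NODE = [
    ("fraud", 514, 99.35),
    ("non-fraud", 0, 0.33),
    ("missing", 0, 0.33),
]


def raw_shares(rows):
    """Hypothetical helper: return (label, raw %, legend %) with raw % = count / total * 100."""
    total = sum(count for _, count, _ in rows)
    return [(label, 100.0 * count / total, legend) for label, count, legend in rows]


if __name__ == "__main__":
    # With only one non-zero count, the raw shares are 100% / 0% / 0%.
    for label, raw, legend in raw_shares(LEAF_NODE):
        print(f"{label:<10} raw {raw:7.3f}%   legend {legend:6.2f}%")
```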

My questions
(1) Why don't the values match in cases A through C?
(2) Why don't the values match in cases D through F either, even though this node has no subtree at all?

I have searched for an explanation by reading about the underlying mathematics (entropy, the Gini index), but none of it explains the discrepancies between the counts and the percentages in the Mining Legend.

Regards,

Bernaridho

