Microsoft's Decision Tree Paper

Dec 6, 2007

Hi all,

I have been searching for days for a paper that explains in detail the decision tree algorithm Microsoft uses. It would be very nice if the parameters were described in detail and the theoretical basis illustrated. I would be very happy to understand this algorithm in depth, including how its parameters affect the results.

Thank you in advance
Manolis


Microsoft Decision Tree

May 2, 2006

I have an example case below:

Customer Id   Debt Level   Income Level   Employment Type   Credit Risk
1             High         High           Self-Employed     Bad
2             High         High           Salaried          Bad
3             High         Low            Salaried          Bad
4             Low          Low            Salaried          Good
5             Low          Low            Self-Employed     Bad
6             Low          High           Self-Employed     Good
7             Low          High           Salaried          Good

My question is how to build a tree from the case above, that is, which method we should use to split the tree (counting manually).
I hope someone can help by explaining in detail, because I want to analyze exactly how the Microsoft decision tree works. So please explain the complete process of building the tree with that method.


Thanks a lot.
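
A hand calculation for the table in this post can be sketched as follows. This is a minimal sketch that assumes plain Shannon-entropy information gain as the split score; Microsoft's algorithm may use a different score (another post on this page mentions a Bayesian score), so the numbers show the counting method rather than the product's exact output. The function and variable names are illustrative.

```python
from collections import Counter
from math import log2

# The seven training cases from the post: (debt, income, employment, credit risk).
CASES = [
    ("High", "High", "Self-Employed", "Bad"),
    ("High", "High", "Salaried", "Bad"),
    ("High", "Low", "Salaried", "Bad"),
    ("Low", "Low", "Salaried", "Good"),
    ("Low", "Low", "Self-Employed", "Bad"),
    ("Low", "High", "Self-Employed", "Good"),
    ("Low", "High", "Salaried", "Good"),
]
ATTRIBUTES = {"Debt Level": 0, "Income Level": 1, "Employment Type": 2}


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(cases, column):
    """Entropy of Credit Risk minus the weighted entropy after splitting on one column."""
    parent = entropy([row[-1] for row in cases])
    groups = {}
    for row in cases:
        groups.setdefault(row[column], []).append(row[-1])
    children = sum(len(g) / len(cases) * entropy(g) for g in groups.values())
    return parent - children


gains = {name: information_gain(CASES, col) for name, col in ATTRIBUTES.items()}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: {gain:.4f}")
print("Split first on:", best)
```

Under this score, Debt Level gives by far the largest gain (about 0.52 bits, against about 0.02 for each of the other two attributes), so it would become the root split; the same counting is then repeated inside each branch.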


Microsoft Decision Tree

May 16, 2006

Hi again. I'm trying to understand how the Microsoft decision tree algorithm works.

I have an example case below:

Customer Id   Debt Level   Income Level   Employment Type   Credit Risk
1             High         High           Self-Employed     Bad
2             High         High           Salaried          Bad
3             High         Low            Salaried          Bad
4             Low          Low            Salaried          Good
5             Low          Low            Self-Employed     Bad
6             Low          High           Self-Employed     Good
7             Low          High           Salaried          Good

My question is: what equations are used to determine a split?

Please explain in detail.

Thanks a lot.
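
For reference, the textbook entropy-based split measure can be written as below. This is a sketch of the standard definitions only, where S is the set of cases at a node, p_c is the fraction of cases in class c, and S_v is the subset of S with value v for attribute A. Whether Microsoft's algorithm uses exactly this measure or a Bayesian score is not established in this post.

```latex
H(S) = -\sum_{c} p_c \log_2 p_c
\qquad
\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
```

For the seven cases in this post (4 Bad, 3 Good), the first formula gives H(S) = -(4/7) log2(4/7) - (3/7) log2(3/7), which is about 0.985 bits.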


Microsoft Decision Tree Algorithm

Apr 12, 2006

I have read some sources about the Microsoft decision tree algorithm, such as Claude Seidman's book, a paper about scalable classification over SQL databases, and a paper about learning Bayesian networks. But I still don't understand exactly how the Microsoft decision tree algorithm works when splitting on an attribute. I have read that the Microsoft decision tree uses a Bayesian score as its split criterion. Is that true?

Could anyone help me understand the Microsoft decision tree algorithm? Please give a detailed explanation with some example cases.



Thanks for any help.


Microsoft Decision Tree Algorithm

Jun 10, 2006

Hi all,

I've read in Claude Seidman's book about data mining with Microsoft decision trees that the statistical techniques employed to build the decision trees include:

CART, CHAID and C4.5. Could anyone explain CART, CHAID and C4.5 to me, and how these three statistical techniques influence the decision tree?

Thank you so much.
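
To make the difference concrete: the three methods are usually distinguished by the statistic they use to rate a candidate split. CART is typically associated with Gini impurity, C4.5 with gain ratio, and CHAID with a chi-square test. The sketch below computes all three for one hypothetical split (3 Bad / 0 Good on one side, 1 Bad / 3 Good on the other). The counts and function names are illustrative assumptions, and none of this is claimed to be what Microsoft's implementation does.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum(n / total * log2(n / total) for n in counts if n)


def gini(counts):
    """Gini impurity of a list of class counts."""
    total = sum(counts)
    return 1.0 - sum((n / total) ** 2 for n in counts)


def split_scores(branches):
    """Rate a split given per-branch class counts, e.g. [[3, 0], [1, 3]]."""
    sizes = [sum(b) for b in branches]
    total = sum(sizes)
    parent = [sum(col) for col in zip(*branches)]  # class totals before the split

    # CART-style: reduction in Gini impurity.
    gini_gain = gini(parent) - sum(s / total * gini(b) for s, b in zip(sizes, branches))

    # C4.5-style: information gain divided by the entropy of the branch sizes.
    info_gain = entropy(parent) - sum(s / total * entropy(b) for s, b in zip(sizes, branches))
    gain_ratio = info_gain / entropy(sizes)

    # CHAID-style: Pearson chi-square statistic of the branch-by-class table.
    chi2 = 0.0
    for size, branch in zip(sizes, branches):
        for class_total, observed in zip(parent, branch):
            expected = size * class_total / total
            chi2 += (observed - expected) ** 2 / expected

    return {"gini_gain": gini_gain, "gain_ratio": gain_ratio, "chi_square": chi2}


scores = split_scores([[3, 0], [1, 3]])
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

All three statistics reward the same thing (branches that are purer than the parent), but they scale differently and can rank competing splits differently, which is one way the choice of technique influences the resulting tree.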


Design Issue With Microsoft Decision Tree Algorithm

Nov 26, 2007



I am doing some testing with the Microsoft Decision Tree algorithm and I can't get the results I am expecting. At this point I am concerned that my design might be incorrect. Here is my scenario:

Suppose I have a company which sells bikes and I am trying to predict customer satisfaction. Each customer can buy one or more bikes so I set the customer table as the case table and the bike_sale table as a nested table.






Customer Table (Case)                            Bike_Sale Table (Nested)
Cust Name   Cust Surname   Cust Satisfaction     Bike Type       Bike Quality
John        Woods          5                     Racer           5
Peter       Cole           3                     Racer           3
                                                 Mountain Bike   4
Joe         Matthews       4                     Mountain Bike   4
Tyron       Wright         2                     Mountain Bike   2
Josh        Yorke          1                     Racer           1


For testing purposes, I hid a pattern in the training data such that the customer satisfaction attribute (the attribute to be predicted) correlates strongly with the bike quality attribute, as can be seen in the sample data provided.

However, in the data mining model wizard, when I set the Cust Satisfaction attribute as the predictable one and click the Suggest button, the algorithm does not list the bike quality attribute. I also tried setting the Bike Quality attribute as the only input attribute and processing the model, but still no patterns were found. Do you have any suggestions?


PLZ HELP ME WITH THE DECISION TREE!!

Apr 16, 2007

I'm having this problem.

I wanted to use a decision tree to show a result. After I configure the mining structure and set all the inputs, my decision tree only goes down to level 2. I have 3 input columns and one PredictOnly column. Where are the other inputs?

Say I have House Owner, Marital Status, Num Cars Owned and Number Of Children (PredictOnly).

My tree only shows All -> Marital Status when I input all 3 together. The other 2 don't seem to show.

What should I do? My database is in SQL Server, the other keys are all correct, and everything deploys fine. Why is this happening?

I'm a newbie with this software, so could any expert here please help me if there's something I might have missed along the way?

Thank you again.


Decision Tree In MS SQL Server

Jun 28, 2007

Hi,



Can we represent the decision tree programmatically in a .NET application? I understand that the outcome of a decision tree model can be integrated into a .NET application, but I'm not sure whether we can also visualize it. Does MS SQL Server provide any API to render such a tree?



Thanks a lot!


Odd Decision Tree Results

Oct 20, 2007

I have got a lot of results like the following two nodes:

All
Existing Cases: 1035298
Missing Cases: 1604
Y = 3,214,966,177,062,520,000,000.000

a >= -0.9822378254 and < -0.7867621803
Existing Cases: 45291
Missing Cases: 17
Y = 9,491,528,329,086,450,000,000.000

Every node of the tree is as odd as this. I checked the training data and found that there are 5 bad points with extraordinarily high values of Y. There are over a million points; how can these five points screw up the entire analysis?

I do get good results for other predicted parameters even though they also have bad points.

Any tips?
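
The effect described in this post is easy to reproduce: the node reports an average of Y, and an average has no resistance to extreme values. The sketch below uses made-up numbers (a million ordinary values of 1.0 plus five values of 1e27) purely for illustration; they are not the poster's data.

```python
from statistics import median

# Hypothetical data: 1,000,000 ordinary values plus 5 extreme ones.
ordinary = [1.0] * 1_000_000
outliers = [1e27] * 5
values = ordinary + outliers

mean_all = sum(values) / len(values)
mean_clean = sum(ordinary) / len(ordinary)
median_all = median(values)

print(f"mean with outliers:    {mean_all:.3e}")
print(f"mean without outliers: {mean_clean:.3e}")
print(f"median with outliers:  {median_all:.3e}")
```

Five points out of a million are enough to move the average by many orders of magnitude while the median stays put, so filtering or capping such values before training is worth trying.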

Thanks,


Help With Project! Multiway Decision Tree On SQL

Dec 5, 2007

Hello,

I'm working on the minor project for my undergraduate course.
I have no earlier experience of working with SQL; I'm the biggest noob there ever was.

For part of my project I have to design a page using PHP and SQL that queries selected details (Rank, Sex, Branch) from a big student database, calculates the industrial placement chances, and constructs a multiway decision search tree in SQL (I'm using WAMP server).

This page is supposed to help new students joining the college decide on an ideal branch based on past performance and placement records. A new student will enter their rank and relevant details, and from the decision tree one or more ideal branches with a high placement history will be suggested.

My project assignment reads:
"Now from the above prepared data construct a decision search tree, implement it using either association rules or persistent objects, and store it in secondary storage as shown



Further studies can be done to improve existing decision trees ... data mining bayesian classifier blah blah blah ... "

What i have done till now is create a table in this format:



But this is hardly a tree. Rather, I have flattened each path of the tree and made it into a table like:
[node] -> [node] -> [node] -> [leaf]
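
One common way to keep the tree shape instead of flattened paths is an adjacency list: each row is a node that points to its parent. The sketch below assumes this layout and uses Python's built-in sqlite3 module so that it runs anywhere. The table name, columns, and sample values (rank bands, branch names) are hypothetical, and the same table definition should carry over to MySQL under WAMP with minor syntax changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tree_node (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES tree_node(id),  -- NULL for the root
        attribute TEXT,   -- attribute tested to reach this node, e.g. 'Rank'
        value     TEXT,   -- value of that attribute on the incoming edge
        outcome   TEXT    -- filled only on leaves, e.g. a suggested branch
    )
    """
)
conn.executemany(
    "INSERT INTO tree_node VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, None, None, None),          # root
        (2, 1, "Rank", "1-1000", None),
        (3, 1, "Rank", "1001-5000", None),
        (4, 2, "Sex", "F", "Branch A"),
        (5, 2, "Sex", "M", "Branch B"),
        (6, 3, "Sex", "F", "Branch B"),
        (7, 3, "Sex", "M", "Branch C"),
    ],
)


def classify(conn, answers):
    """Walk from the root, following the child whose edge matches the answers."""
    node_id = conn.execute("SELECT id FROM tree_node WHERE parent_id IS NULL").fetchone()[0]
    while True:
        children = conn.execute(
            "SELECT id, attribute, value, outcome FROM tree_node WHERE parent_id = ?",
            (node_id,),
        ).fetchall()
        match = next((c for c in children if answers.get(c[1]) == c[2]), None)
        if match is None:
            return None  # no matching child: unknown value or malformed tree
        if match[3] is not None:
            return match[3]  # reached a leaf
        node_id = match[0]


print(classify(conn, {"Rank": "1001-5000", "Sex": "M"}))
```

Because every node has any number of children, the structure is naturally multiway, and a lookup is one small query per level of the tree.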

I have tried to read some texts on how to do this, but it's not making sense and, most importantly, I'm not sure that what I'm reading is actually going to help me achieve my project goals. Right now I'm stranded, reading random articles. I have to do this within 5 days. I have asked people around here, some professionals and teachers, and no one seems to have done this before. A little help with direction would be greatly appreciated.

Regards

Anurag


Getting The Model's (Decision Tree) PMML

Feb 19, 2007

Hi

Can anyone tell me the steps involved in retrieving a decision tree model's PMML and using the model content to develop a web-based interface? I am using SQL Server 2005.

Thanks,

Nathan




Error When Processing Decision Tree

Jan 9, 2008

Hi,

I'm using SQL Server 2005 Standard Edition, and when I try to process a Decision Tree with more or less 50 input variables I get the following warning:


"Informational (Data mining): Automatic feature selection has been applied to model, TREE_2 due to the large number of attributes. Set MAXIMUM_INPUT_ATTRIBUTES and/or MAXIMUM_OUTPUT_ATTRIBUTES to increase the number of attributes considered by the algorithm."

I've tried to set MAXIMUM_INPUT_ATTRIBUTES to 10 and then there's an error saying: "The 'MAXIMUM_INPUT_ATTRIBUTES' data mining parameter is not valid for the 'TREE_2' model."


Does anyone have a clue how I can solve this?


Thank you.


Interpreting The Percentage In Decision-tree Model

Sep 15, 2006

Hi,

I used a decision tree mining model to describe and predict fraud. The table contains 1039 records with 775 distinct values of A-number (the calling party). I used 9 columns in the model. SQL Server reports that only 3 columns are significant in predicting fraud:

- BPN_is_too_short (called party-number is too short)
- Duration_is_zero
- Invalid_area_code

The key column is A-number, and the predicted column is Is_Fraud, whose only values are 0 and 1. There is no record with NULL (a missing value) in the Is_Fraud column.

Mining Legend shows in the first split
[-] 625 cases of fraud
[-] 150 cases of non-fraud
[-] 0 cases of missing

In addition to that, Mining Legend shows
[-] 79.69% of fraud
[-] 19.64% of non-fraud
[-] 0.67% Missing

Now when I compare those values, they don't match.
(A) 625/775 is 80.645%, not 79.69%
(B) 150/775 is 19.355%, not 19.64%
(C) 0 cases of NULL (missing value) should imply 0% of missing, not 0.67% of missing
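
One possible reading of these figures, offered as an assumption rather than a documented fact, is that the legend shows smoothed probabilities instead of raw count ratios. The sketch below applies plain additive smoothing over three states (fraud, non-fraud, missing) with a pseudo-count alpha. The value alpha = 5.3 was fitted by hand to the numbers in this post; it is not a known product constant, and the formula is not claimed to be the one SQL Server uses.

```python
def smoothed_percentages(counts, alpha):
    """Additive smoothing: add alpha to every state's count before normalizing."""
    total = sum(counts.values()) + alpha * len(counts)
    return {state: 100.0 * (n + alpha) / total for state, n in counts.items()}


counts = {"fraud": 625, "non-fraud": 150, "missing": 0}

raw = {state: 100.0 * n / sum(counts.values()) for state, n in counts.items()}
smoothed = smoothed_percentages(counts, alpha=5.3)  # alpha fitted by hand

for state in counts:
    print(f"{state:10s} raw {raw[state]:6.2f}%   smoothed {smoothed[state]:6.2f}%")
```

With that fitted alpha, the smoothed figures round to 79.69%, 19.64% and 0.67%, and a state with zero cases still receives a nonzero share. The same alpha does not reproduce the 99.35% / 0.33% / 0.33% node reported further down, so if smoothing is the explanation, the prior presumably varies from node to node.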

Furthermore in one node (with the split on duration_is_zero), there are 541 cases of fraud and 0 cases of non-fraud. This implies the node is leaf-node. However, Mining Legend shows

(D) 514 cases of fraud, 99.35%
(E) 0 cases of non-fraud, 0.33%
(F) 0 cases of missing, 0.33%


My questions
(1) Why don't the values match in cases A through C?
(2) Why don't the values match even in cases D through F, where we have no subtree at all?

I've searched for an explanation by reading up on the mathematical reasoning, entropy, and the Gini index, but none of that explains the discrepancies between the counts and the percentages in the Mining Legend.

Regards,

Bernaridho


Access To Decision Tree, Cluster... Charts From C# Or Word

Nov 20, 2007



Hi!
We use SS2005.

a.)
Let's assume you have already defined mining models and they are visible in Object Explorer for Analysis Services.

How can I show the picture in a web form using a C# API (and no, I don't want to right-click and take a snapshot of the picture in SSMS or Visual Studio)? Alternative: some time ago I read that you can do this in Word, but I can't find the article...

I can see there is a C# API for Reporting Services, so I would expect something similar for Analysis Services ;-)

b.)
Let's assume I don't want to go through Visual Studio to create models. How can I create and store new data mining models via C#? And of course: how can I force the calculation of the values for node splits etc.?


The solution for "a" will ensure that I always get a current version of the charts.
Why I ask for "b": Management thinks this will be great for root cause analysis. But I think there is a risk that the many resulting models, which will probably differ, will be more confusing than helpful.

Thanks


Possible To Save Up The Progress At Some Point Of Decision Tree Training?

Aug 24, 2007

Dear All,


Suppose I have a decision tree training job that might run for many days or months. Is it possible to tell the data mining training program to save its progress at some point, so that if the computer hangs or the power fails in the middle, it can resume the rest of the work from the save point?

Thanks

Tony Chun Tung Siu


Decision Tree Predictions Occuring At Non-leaf Node

May 2, 2007

After having built a decision tree model to predict a boolean output attribute using 64-bit SQL Server 2005 (build 9.0.3054), we have observed that predictions for some cases are being done at non-leaf nodes in the tree.



Specifically, after executing a prediction join which returns:


- CaseTable.CaseID
- MiningModel.OutputAttribute
- PredictProbability(MiningModel.OutputAttribute)
- PredictNodeId(MiningModel.OutputAttribute)



and comparing the values of PredictNodeID(MiningModel.OutputAttribute) with the mining model content column [NODE_UNIQUE_NAME] to determine the actual "rule" used to make the case-level prediction.



We have observed that for a subset of cases, predictions are being made at nodes in the tree that are not leaf nodes. Specifically, predictions are being made at a node that is 3 levels deep. The leaf nodes below this inner-tree node are 2 levels further down the tree.



Also supporting the fact that predictions are being made at this non-leaf node is that the PredictProbability corresponds exactly with the output attribute distribution at this non-leaf node.



In this particular application, we would have obtained better results if the predictions were made at the leaf-nodes.



A few questions:
1. Why are predictions with decision trees made at non-leaf nodes?
2. Is there a way to "force" predictions to occur at leaf nodes via DMX?
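
As an illustration of question 1 only: in tree implementations generally, one way a prediction can end at an interior node is that the case has no usable value for the attribute that node splits on, so the walk cannot continue and the node's own distribution is returned. Whether this is the cause here is an assumption, not something this post establishes. The node structure, node IDs, and attribute names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    node_id: str
    probability_true: float                 # output distribution stored at this node
    split_attribute: Optional[str] = None   # None for leaves
    children: Dict[str, "Node"] = field(default_factory=dict)


def predict(root: Node, case: dict) -> Node:
    """Descend while the case has a value that matches one of the node's children."""
    node = root
    while node.split_attribute is not None:
        child = node.children.get(case.get(node.split_attribute))
        if child is None:      # missing or unseen value: stop at the interior node
            return node
        node = child
    return node


leaf_a = Node("0003", 0.90)
leaf_b = Node("0004", 0.20)
inner = Node("0002", 0.55, "Region", {"North": leaf_a, "South": leaf_b})
root = Node("0001", 0.50, "HasContract", {"Yes": inner})

complete = predict(root, {"HasContract": "Yes", "Region": "North"})
partial = predict(root, {"HasContract": "Yes"})  # Region is missing
print(complete.node_id, complete.probability_true)
print(partial.node_id, partial.probability_true)
```

In this sketch the second case stops at the interior node and reports that node's probability, which is the same symptom described in the post. Checking whether the affected cases have NULL or previously unseen values for the attribute split below that node would be a quick way to test the idea.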



Thanks in advance for any information or advice.

- Paul


Possible To Speed Up The Decision Tree By Clustering The Server 2003

Aug 24, 2007

Dear All,


I have a data mining program that needs to run for days. Is it possible to speed up the training process by clustering several servers with Windows 2003 clustering services? Does clustering two quad-core computers actually give performance comparable to the sum of the two (there must be some overhead, I know)? I am actually familiar with the use of clustering. Is it just for making the server farm more reliable, or will the nodes collaborate and speed up the whole training process?

If it does speed things up, is there any limit on the number of nodes in the cluster? What versions of Windows and SQL Server do I need to achieve a speed-up of the data mining training process?

Thanks and regards

Tony Chun Tung Siu


Function Does Not Exist For Decision Tree When Running Tutorial

Feb 8, 2007

When I run the Microsoft tutorial for data mining I get this error when I get to the decision tree part.
I get a similar error for clustering in the same tutorial.
However, The Naive Bayes demo seems fine.
The messages said the project was built and deployed without errors.

Does anyone know how to fix the error:

TITLE: Microsoft Visual Studio
------------------------------

The tree graph cannot be created because of the following error:

'Query (1, 6) The '[System].[Microsoft].[AnalysisServices].[System].[DataMining].[DecisionTrees].[GetTreeScores]' function does not exist.'.

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%u00ae+Visual+Studio%u00ae+2005&ProdVer=8.0.50727.762&EvtSrc=Microsoft.AnalysisServices.Viewers.SR&EvtID=ErrorCreateGraphFailed&LinkId=20476

------------------------------
ADDITIONAL INFORMATION:

Query (1, 6) The '[System].[Microsoft].[AnalysisServices].[System].[DataMining].[DecisionTrees].[GetTreeScores]' function does not exist. (Microsoft OLE DB Provider for Analysis Services 2005)

------------------------------
BUTTONS:

OK
------------------------------


Data Mining Model Viewer Of Decision Tree Out Of Memory Error

May 18, 2007

We've successfully processed a large decision tree model in SQL Server 2005. When I try to view the tree in the mining model viewer, I get the following error:



TITLE: Microsoft Visual Studio
------------------------------

The tree graph cannot be created because of the following error:

'Exception of type 'System.OutOfMemoryException' was thrown.'.

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%u00ae+Visual+Studio%u00ae+2005&ProdVer=8.0.50727.42&EvtSrc=Microsoft.AnalysisServices.Viewers.SR&EvtID=ErrorCreateGraphFailed&LinkId=20476


The link provides no other documentation on the error.



We're using 64-bit SQL Server on a Dell workstation running XP-64 with 16GB of memory. From my view of things, we aren't close to running out of memory. Since the model processed and the error occurs when viewing the model, is this a problem with Visual Studio and not necessarily Analysis Services?



Thanks in advance.



Nick


Error Not Enough Space For Temporal Database When Processing Decision Tree Model

Sep 4, 2007

Hello,
I have a table (in Access) with about 30 fields and 1,700,000 records.
I created a mining model in AS2005 with only one key (the autonumber column called ID)
and the other attributes marked as Input and/or Predict.
When I process the model, it finishes (after 15 min.) with an error: 3183
"Not enough space in temporal disk"
After some searching, I found that this is closely related to the memory assigned to tempdb.
I tried to increase the size of tempdb but it is impossible; moreover, it starts
at 8MB but is autosized when needed.

I don't know how to solve this issue. Or, if it is a question of memory/disk space management (I have 100GB of free space in disk).

I tried the same model after changing the key (I assigned StudyID as the key). With the same data but 60,000 StudyIDs it works, so the mining model itself is OK (no nested tables, no case, too simple to be causing a memory error)...

Please, can anyone recommend a possible solution for this issue?.
Many Thanks.


PMML: One Node In A Decision Tree Containing Two States Of An Attribute As The Rule For Splitting?

Sep 22, 2006

Hi,
is there a way to import a decision tree-model from pmml where a node contains two or more states of an attribute as the split-rule?


Example:

...
<Node recordCount="600">
<CompoundPredicate booleanOperator="or">
<SimplePredicate field="color" operator="equal" value="red" />
<SimplePredicate field="color" operator="equal" value="green" />
</CompoundPredicate>
<ScoreDistribution value="true" recordCount="200"/>
<ScoreDistribution value="false" recordCount="400"/>
</Node>
...

This node should contain all cases whose color is red or green (the Microsoft Decision Trees algorithm would build a model with two steps, like red / not red and then green / not green). According to the DMG, this is valid PMML 2.1, but when I try to import it, the server complains about an unexpected value in the SimplePredicate tag.

How can I import such a node in SQL Server 2005?

Thank you in advance for any help

Chris


Microsoft's Connectivity White Paper Is Officially Published

Feb 14, 2007

Hello All!

Many of you have seen the draft version, but we finally have it out the door. You can download the new version directly from Microsoft's official site.

The momentum in our connectivity wiki is growing, so check back and see if you can get useful information on 64-bit, Office 2007 connectivity, and samples.

Some of the recent activity in our connectivity portal:

***************

The new connectivity white paper is now officially available, click here (http://download.microsoft.com/download/2/7/c/27cd7357-2649-4035-84af-e9c47df4329c/ConnectivitySSIS.doc) to download.

***************

There's a new article from Deniz Erkan on Office 2007 Connectivity (Data Sources/Microsoft Office (2007)).

***************

Bob Beauschemin provided us a sample package to connect to Excel 2007.

***************

Microsoft's Partner ETI has a new separate page for connectivity offerings for SSIS (Data Sources/ETI High Performance Data Integration).

***************

Microsoft's Partner Persistent (Data Sources/Persistent Systems Products for SSIS) has put together a list of options for SSIS connectivity.

********************************************************************


Programmatically Set The Paper Size And And Paper Layout

May 15, 2006

How can I set the paper size and paper layout programmatically in RS? I'm using C#.NET as the programming language for this. Please help, thanks!


SQL 2012 :: Sort Tree Members In Right (tree) Structure?

Apr 6, 2015

I got an assignment: how do I make the tree members appear in the right order?

/* DROP TABLE EMP
SELECT * INTO Emp FROM (
SELECT 'A' EmpID, NULL ManID, 'Name' EmpName UNION ALL
SELECT 'MAC' EmpID, 'A' ManID, 'Name__' EmpName UNION ALL
SELECT '1ABA' EmpID, 'MAC' ManID, 'Name____' EmpName UNION ALL
SELECT 'ABB' EmpID, '1ABA' ManID, 'Name______' EmpName UNION ALL
SELECT 'XB' EmpID, 'A' ManID, 'Name__' EmpName UNION ALL
SELECT 'BAC' EmpID, 'XB' ManID, 'Name____' EmpName ) b
*/

[code]....
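
One way to get the rows in tree order is a recursive query that builds a sort path from the root down to each row. The sketch below runs that idea against the sample data from this post using Python's built-in sqlite3 module. SQL Server 2012 also supports recursive common table expressions, though its syntax differs in places (for example, there is no RECURSIVE keyword and strings are concatenated with + instead of ||). The path column and the '/' separator are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (EmpID TEXT, ManID TEXT, EmpName TEXT)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?)",
    [
        ("A", None, "Name"),
        ("MAC", "A", "Name__"),
        ("1ABA", "MAC", "Name____"),
        ("ABB", "1ABA", "Name______"),
        ("XB", "A", "Name__"),
        ("BAC", "XB", "Name____"),
    ],
)

# Build a root-to-row path for every employee, then sort by that path so each
# manager is followed immediately by their whole subtree.
QUERY = """
WITH RECURSIVE tree (EmpID, ManID, EmpName, path) AS (
    SELECT EmpID, ManID, EmpName, EmpID
    FROM Emp
    WHERE ManID IS NULL
    UNION ALL
    SELECT e.EmpID, e.ManID, e.EmpName, t.path || '/' || e.EmpID
    FROM Emp AS e
    JOIN tree AS t ON e.ManID = t.EmpID
)
SELECT EmpID, ManID, EmpName FROM tree ORDER BY path
"""

rows = conn.execute(QUERY).fetchall()
for emp_id, man_id, emp_name in rows:
    print(emp_id, man_id, emp_name)
```

Sorting by the path keeps each subtree together and orders siblings by EmpID. If an ID could contain a character that sorts before the separator, a fixed-width or otherwise safer path encoding would be needed.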


Paper About SQL Server 2005

Jul 23, 2005

Hey, does anyone know a good paper or a good MS link about SQL Server 2005? I have problems installing the Beta version. Thanks very much.


We Have A New White Paper On Connectivity!

Jan 10, 2007

Dear Forum Members,

We've been working on a white paper targeting SSIS connectivity which we hope will help answer some of the key questions in the following areas :



What are the SSIS components and their support level for ADO.NET, ODBC, and OleDB?


How to deal with 64-bit connectors? what is supported, what is not?


Special sections on popular data sources such as SAP, Oracle, DB2, Flat File, XML.


A comprehensive list of data sources and available connectors from Microsoft and other 3rd parties.

You'll also find answers to why some of the things are the way they are today.

Note that this white paper is currently under official editing and publishing in Microsoft. It'll be a while before it goes public officially, but I wanted to share it with you, as the rich content it offers can't really wait. You'll find the paper in my blog, which is really a wiki site about SSIS connectivity fully open to public, so feel free to add/update content in there as you feel proper, and help the SSIS community with your wisdom!

A lot of feedback went into this white paper not only from Microsoft, but also from some of our partners and MVPs. I'd like to extend special thanks to Bob Beauschemin for authoring this challenging white paper.

Enjoy!

Deniz Erkan

Program Manager - SSIS

 


BPA Vs. Security Best Practices Paper

Jul 17, 2007





I would like to refer to the following technical article



SQL Server 2005 Security Best Practices - Operational and Administrative Tasks

http://www.microsoft.com/technet/prodtechnol/sql/2005/sql2005secbestpract.mspx



Among the best practices for SQL Server service accounts on page 8, it is recommended to 'use a separate account for each service'. I created a separate account for each service as advised and assigned each account to the relevant Windows group created for that SQL Server service during SQL setup.



Now when I run Best Practices Analyzer, its report seems to contradict what the article above says. For example, here are excerpts from the BPA report:

"We recommend that the service SQLBrowser on host MachineName be run under Network Service Account". I get similar recommendation for SQLSERVERAGENT account as well. Most importantly, it recommends that MSFTESQL be run under SQL Server Service Account.



Can any of you shed some light on this?



Thanks,

Asaf


Printing To Different Paper Trays

Sep 10, 2007

I'm trying to find out if there is any way I can embed anything in a report to tell it which paper tray to print to. So far, the only references I've found to such a capability are involved in using the Printer Delivery Extension. Does anyone know if this is indeed possible with that, or by any other means? Thanks!


Display Paper Co Authored From Same Department

Nov 27, 2006

How can I display the numbers of the papers that are co-authored by authors from the same department?
The database structure:
Department(DeptNum, Descrip, Instname, DeptName, State, Postcode)
Academic(AcNum, DeptNum, FamName, GiveName, Initials, Title)
Paper(PaNum, Title)
Author(PaNum, AcNum)
Field(FieldNum, ID, Title)
Interest(FieldNum, AcNum, Descrip)
Thanks a lot for the help.
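
A self-join on Author is one way to approach this. The sketch below assumes the question means papers with at least two distinct authors who share a DeptNum (rather than papers whose authors all come from one department). It uses Python's built-in sqlite3 module with invented sample rows and only the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Academic (AcNum INTEGER PRIMARY KEY, DeptNum INTEGER);
    CREATE TABLE Paper    (PaNum INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Author   (PaNum INTEGER, AcNum INTEGER);

    -- Invented sample data for illustration only.
    INSERT INTO Academic VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO Paper    VALUES (100, 'Paper X'), (101, 'Paper Y');
    INSERT INTO Author   VALUES (100, 1), (100, 2), (101, 1), (101, 3);
    """
)

# Pair up two different authors of the same paper and keep the pairs
# whose academics belong to the same department.
QUERY = """
SELECT DISTINCT a1.PaNum
FROM Author   AS a1
JOIN Author   AS a2 ON a2.PaNum = a1.PaNum AND a2.AcNum > a1.AcNum
JOIN Academic AS c1 ON c1.AcNum = a1.AcNum
JOIN Academic AS c2 ON c2.AcNum = a2.AcNum
WHERE c1.DeptNum = c2.DeptNum
ORDER BY a1.PaNum
"""

papers = [row[0] for row in conn.execute(QUERY)]
print(papers)
```

In the sample data, paper 100 has two authors from department 10 and is returned, while paper 101 has authors from two different departments and is not. The query itself is plain SQL and is not specific to SQLite.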


Published Paper For Db Performance Strategies?

Jul 20, 2005

I am looking for published papers on database performance tuning strategies. This is for academic purposes, so it need not be specific to any commercial database. It would be even better if the paper had some kind of method to quantify/measure performance. Has anyone come across any interesting papers about this? Thanks, ewong


Margin And Paper Orientation Programmatically

Aug 16, 2007

Hello.
I have a report but by default it prints Portrait and 1.0inch Margin. I would like to programmatically set the values of my report to 0.2 inch margin and Landscape. I am using RDLC (Local Report).
Does anyone has an idea how to achieve this?

Jose


Setting Paper Size To Be Default A4

May 30, 2006

Hi,

I have been trying to configure the Paper Size to be default "A4" instead of "Letter".
My Report is configured to 21cm x 29,7cm and margins 1,5cm.
The Body is configured to 18cm x 26,7cm.

Everything looks fine in the Preview but the Size is always "Letter". The printers are all configured for A4 printing.

Is there a way to set these default values in the Page Setup Toolbar or is it supposed to figure it out?

Thanks,
steinar


Print Reports On Legal Paper

Jun 30, 2006



While working in the report project in Visual Studio, I set my Report Layout to 14w x 8.5h with .25 margins on all sides, and the page size to be 13.5in (to take into account the margins). When I print from the report viewer control in Preview mode, the report prints as it should, in Landscape on Legal paper without me changing any settings.

When I deploy the same report to the report server, and print from there, the report prints in Landscape on Letter paper (which causes some columns to print on a second page).

Why is there a difference in the two environments? Is there something I'm missing?

The goal is for the users to be able to print the report correctly without having them change any print settings in the dialog.







