In a decision tree algorithm, is there a known way to force a branch at the top level? For example, I have 30 known decision patterns that are going to be completely different, and I don't want them to intermingle. I want to force a branch at the top node on one of the 30 patterns so that I don't have to create 30 mining models per client.
I am studying the behavior of 200,000 clients. Using decision trees, I would like to know whether my clients will abandon our service. I use a training set of 21,822 clients and a predictable variable "aband", which is discrete and can be 0 or 1. In my training set I have 21,597 cases in which aband is 0 and 225 cases in which aband is 1. Looking at the classification matrix obtained with a testing set (unselected data) as the input table, I can see that my decision tree doesn't recognize the cases in which aband is 1. Here is the classification matrix:
Counts for Dati Training on [Aband]
Predicted   0 (Actual)   1 (Actual)
0           21597        225
1           0            0
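To make the symptom in the classification matrix concrete, here is a minimal Python sketch that derives accuracy and per-class recall from the four counts quoted in the post. The layout (rows are predicted values, columns are actual values) is an assumption taken from the matrix header, and the helper names `accuracy` and `recall` are illustrative only.

```python
# Counts copied from the classification matrix in the post.
# Assumed layout: key is (predicted, actual).
matrix = {
    (0, 0): 21597,  # predicted 0, actual 0
    (0, 1): 225,    # predicted 0, actual 1
    (1, 0): 0,      # predicted 1, actual 0
    (1, 1): 0,      # predicted 1, actual 1
}


def accuracy(m):
    """Share of all cases whose predicted value equals the actual value."""
    correct = m[(0, 0)] + m[(1, 1)]
    return correct / sum(m.values())


def recall(m, label):
    """Share of the cases that actually are `label` and were predicted as `label`."""
    actual_total = m[(0, label)] + m[(1, label)]
    return m[(label, label)] / actual_total if actual_total else 0.0


print(f"accuracy:             {accuracy(matrix):.4f}")
print(f"recall for aband = 0: {recall(matrix, 0):.4f}")
print(f"recall for aband = 1: {recall(matrix, 1):.4f}")
```

The overall accuracy looks high only because almost every case has aband = 0; the recall for aband = 1 is zero, which is the behavior the post describes.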
I would appreciate answers to the following doubts I have regarding Decision trees, CONTAINS and using CONTAINS in a DMX query:
1. Does MS decision tree work only off equality/inequality conditions for the nodes? Is it possible to use a predicate as the branch criteria for a node?
2. Can the T-SQL predicate CONTAINS(...) be used in a DMX query? I need to check if a column-value is a substring of another column and create an intermediate column that will enable me to construct a decision tree with the phrase-present/absent branch.
3. Can CONTAINS(...) be used in a select clause? Like -
SELECT CONTAINS(JAT.column1, '"Good day"')
FROM JustAnotherTable;
4. Does CONTAINS(...) support both arguments to be column references? Or, is it mandatory that the pattern (argument #2) has to be a literal string or a variable? E.g.: I need to know the validity of the following expression -
I'm new to data mining, and have created an MS decision trees model. The model has the columns age, call outcome, call reason, country name, employee name and gender - all as inputs.
In the mining model viewer, I only get nodes for the age, despite having data for all the other columns.
I am trying to build a decision tree to predict prices. I have created the tree and looked at the lift charts, but I have not seen any of the traditional statistics I am used to from other programs (R-Squared, F statistics, etc.).
Does anyone have an example of how they calculated R-Squared for a decision tree on a continuous variable?
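As a rough illustration of the statistic being asked about, the sketch below computes R-squared as 1 - SS_res / SS_tot from paired actual and predicted values. It assumes the pairs have already been exported from a prediction query against a holdout set; the function name `r_squared` and the sample prices are made up for illustration and do not come from any mining model.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: how far the predictions are from the actual values.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: how far the actual values are from their own mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted prices, for illustration only.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
print(r_squared(actual, predicted))
```

A value close to 1 means the predictions explain most of the variance in the actual prices; a value near 0 means the model does no better than predicting the mean.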
I installed the bike buyer example and I am learning the DMX language. I wrote the following query (using MS Decision Trees):
SELECT T.[Last Name], [Bike Buyer], PredictProbability(Predict([Bike Buyer])) AS [Probability] From [v Target Mail] PREDICTION JOIN OPENQUERY (....... And so on..)
The result is surprising to me: in the result table, all the probabilities are equal.
Bike Buyer   Probability
1            0.99994590500919611
0            0.99994590500919611
0            0.99994590500919611
0            0.99994590500919611
0            0.99994590500919611
1            0.99994590500919611
and so on.
Now I am wondering what PredictProbability means. I thought it meant the probability that the prediction is correct, but all the probabilities are the same even though the input differs. Can somebody tell me what PredictProbability means, or am I using it wrong?
I have some accounting data, with some transaction attributes and amounts. I'm using Decision Trees to try and predict the next month's amount for certain combinations of attributes.
I've tried two different structures for the model:
A: one with 9 discrete text input attributes.
B: another with the same 9 attributes plus an average Amount over all combinations of the nine attributes for every transaction.
When I've processed them and look at the dependency network, it says that the strongest link for structure A is attribute "1", and for structure B it is the average-Amount attribute. Okay, that seems fine, but the second strongest link in structure B is attribute "2".
Shouldn't it be attribute 1, as in structure A?
Second question: if I run the same data through a Neural Network model, the predictions become much worse than with the decision tree. I get many predictions that are negative values even though all the training data contains positive values. The StDev is also the same for every row. What am I doing wrong with that one? I have a lot of transactions, and I read somewhere that a Neural Network should work better than a decision tree in a case similar to mine. The score in the "Lift chart" for the Neural Network model is 0.00, and for Decision Trees with the same data I get around 110.
I am using the MS Decision Trees algorithm, and for a specific model I get the above warning. As a result, I don't get any splits in my tree. Is there anything I can do to avoid this?
I am trying to run one of the mining models from the book "Delivering BI using SQL Server 2005", but I am running into "Decision Trees found no splits for model". The mining structure has 4 columns, the fourth one being marked as "Predict Only". My cube slice for the model has sufficient data in the cube. I am lost. Help!
While recently working with several mining models, I came across something that struck me as pretty odd - and I'm hoping to find an explanation for the behavior.
Consider the following setup:
A single table in the relational database represents the only case table
A single, continuous column is the predictable
A mining structure has been created
The mining structure contains a single model, based on the MS Decision Trees algorithm
Input columns were selected for the model via the BI Studio wizard (i.e., those provided via the "Suggest" button)
The structure has been fully processed
Now, the interesting parts:
1. I view the scatterplot for the mining model, under the Mining Accuracy Chart tab
2. Back on the Mining Structure tab, I delete one of the input columns
3. I add the same column back into the structure
4. The structure is fully processed again
5. When I view the scatterplot for the mining model, under the Mining Accuracy Chart tab, a different set of data points is presented for the model predictions
6. A different set of decision trees under the Mining Model Viewer tab confirms this
How could different patterns have been found this second time around, even though all of the input columns were the same (as well as the training cases)?
(Note: I encountered this situation while creating a new mining model that was identical to an existing one. Even though the models received the exact same inputs and training cases, they yielded different results. I was able to reproduce the behavior by using steps 1-6 above, though.)
Can someone provide some insight on this behavior, or some kind of explanation of what may be happening?
On the front sheet I've created system-wide figures with two columns per month, and a matrix that is also broken down by category (e.g. child fiction, child non-fiction, adult fiction, adult non-fiction) and has row and column totals.
I want to have a page like this for each branch in the system.
Hi, for a new project I'm trying to build a tree structure in SQL using one table with 'Node' and 'ParentNode' fields along with 'Title', etc.
Table = Tree
Node : ParentNode : Title : Show_Record
1 : 0 : Root : 1
2 : 1 : Child : 1
Then I'm trying to get SQL to return that in XML to my tree control, 'oBout ASP TreeView'.
Now the tree control can accept XML fine as long as it's in a set format, which shouldn't be difficult and should cut my code from 200 lines to one.
However getting SQL to return the table records in XML is proving to be a total nightmare.
I've hunted the web but not getting very far, I've even got a couple of O'Reilly guides but still no luck, so any help would be excellent with this.
I wrote a sql query (basic 'select * from tree for xml raw') which returns the results in RAW XML, but when I run this in Query Analyser it returns the results as one long string broken up with '<' & '>' but gets to the third record and cuts off halfway.
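For illustration, here is one way the Node/ParentNode rows described above could be nested into XML outside the database. This is a minimal Python sketch, not the format the tree control requires: the element name `node`, the wrapper element `tree`, the attribute names, and the function `build_tree_xml` are all assumptions, and rows with Show_Record = 0 are simply left out.

```python
import xml.etree.ElementTree as ET

# Rows as (Node, ParentNode, Title, Show_Record), matching the sample table in the post.
rows = [
    (1, 0, "Root", 1),
    (2, 1, "Child", 1),
]


def build_tree_xml(rows, root_parent=0):
    """Nest flat (node, parent, title, show) rows into an XML string."""
    visible = [r for r in rows if r[3]]
    # Create one element per visible row first, so parents can be looked up by id.
    elements = {
        node: ET.Element("node", id=str(node), title=title)
        for node, _parent, title, _show in visible
    }
    top = ET.Element("tree")
    for node, parent, _title, _show in visible:
        if parent == root_parent:
            top.append(elements[node])
        elif parent in elements:
            elements[parent].append(elements[node])
        # Rows whose parent is missing or hidden are left out of the output.
    return ET.tostring(top, encoding="unicode")


print(build_tree_xml(rows))
```

Building the nesting from a parent lookup keeps the code independent of row order, so the table does not have to be sorted by depth before it is read.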
Hi! I have created a DMM using Trees. But when I go to the Mining Model Prediction tab and select a Predict function, I get this in the criteria column: <Scalar column reference>[, EXCLUDE_NULL|INCLUDE_NULL][, INCLUDE_NODE_ID]. When I select Result, I get this error: "An incorrect number of arguments are used in the function at line 3, column 3." I'm predicting a continuous variable.
But when I delete everything except <Scalar column reference> I get this error: "Parser: The syntax for '<' is incorrect."
When I delete everything in the criteria column, I get this: "Query execution failed."
If I change the criteria to "<Scalar column reference>,INCLUDE_NULL, INCLUDE_NODE_ID" I get the error again that the query execution failed.
I'm working from a data set I created. I had no problems with predictions using clustering, but can't seem to get Trees to work.
I have an Itanium 64-bit server to run SSIS packages on. I have one package with three parallel streams. When I run the package in 64-bit mode using dtexec, it runs through validation and exits with no reported errors. When I run it from a job, the job fails and says to see the job log, which has no errors.
When I run it in 32 bit mode using the GUI, it runs all the way through.
Does anyone know how to launch SSIS in 32 bit mode from a job on an Itanium?
This is a really widespread issue that has been discussed more than once on the SQL CE MSDN forums. Is there any way I can commit the changes that happen at runtime (while I am developing the application), such as inserts, updates, and deletes, to the .sdf database on the machine?
As our DB has no primary keys or indexes, I've taken a copy of all populated tables and tried to force primary keys within a new DB.
The problem is that all of the tables have multiple data sets within them, one for each year. As a result, ID numbers are not unique, because they are repeated for every year in which they are active.
It's a school database, so a student who has been here for 3 years will have 3 instances of his ID number, one for each year's data set.
So how do I force primary keys if there is no unique identifier? I've been highlighting both the data set and ID columns and setting that combination as the primary key.
Essentially, I need to analyse the relationships between the tables in a diagram and also run some speed tests to see how fast the DB works when it has indexes and primary keys.
The reason I'm writing is that I've done this on ten tables, and with another 160 to do I'm just checking that I'm doing the right thing.
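The combination described above (data set plus ID) is a composite primary key. As a small sketch of how such a key behaves, the Python snippet below uses an in-memory SQLite database rather than SQL Server; the table name `student`, its column names, and the sample rows are hypothetical stand-ins for one of the school tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the same student_id may appear once per yearly data set.
conn.execute(
    """
    CREATE TABLE student (
        dataset_year INTEGER NOT NULL,
        student_id   INTEGER NOT NULL,
        name         TEXT,
        PRIMARY KEY (dataset_year, student_id)
    )
    """
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(2005, 42, "A. Pupil"), (2006, 42, "A. Pupil"), (2007, 42, "A. Pupil")],
)


def try_insert(row):
    """Return True if the row was accepted, False if the key rejected it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Same year and same ID as an existing row: the composite key rejects it.
duplicate_accepted = try_insert((2006, 42, "A. Pupil"))
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(duplicate_accepted, row_count)
```

The same ID is accepted once per year, while a second row for the same year and ID is rejected, which is the uniqueness rule the post describes.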
CASE
    WHEN CAST(wo.start_date AS TIME) BETWEEN '00:00:00' AND '00:59:59' THEN 0
    WHEN CAST(wo.start_date AS TIME) BETWEEN '01:00:00' AND '01:59:59' THEN 1
    WHEN CAST(wo.start_date AS TIME) BETWEEN '02:00:00' AND '02:59:59' THEN 2
    WHEN CAST(wo.start_date AS TIME) BETWEEN '03:00:00' AND '03:59:59' THEN 3
    WHEN CAST(wo.start_date AS TIME) BETWEEN '04:00:00' AND '04:59:59' THEN 4
[code]....
The purpose is to take a row and set it to the hour of the day that it occurred in. This works fine, however I would like to force it to display every hour 0-23 regardless of whether or not it has a corresponding row.
So, if no row exists for 0, display 0 with null values for the rest of the columns.
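One common way to get a row for every hour is to generate the 24 hour values first and LEFT JOIN the data onto them. The sketch below shows that idea in Python against an in-memory SQLite database, so the SQL is SQLite syntax and not T-SQL (T-SQL would, for example, extract the hour with DATEPART and omit the RECURSIVE keyword). The table `work_order`, its columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work_order (id INTEGER PRIMARY KEY, start_date TEXT)")
conn.executemany(
    "INSERT INTO work_order (start_date) VALUES (?)",
    [("2024-01-01 01:15:00",), ("2024-01-01 01:45:00",), ("2024-01-01 13:05:00",)],
)

query = """
WITH RECURSIVE hours(h) AS (
    SELECT 0
    UNION ALL
    SELECT h + 1 FROM hours WHERE h < 23      -- generates 0..23
)
SELECT hours.h, wo.id, wo.start_date
FROM hours
LEFT JOIN work_order AS wo
       ON CAST(strftime('%H', wo.start_date) AS INTEGER) = hours.h
ORDER BY hours.h, wo.id
"""

# Hours with no matching work order still appear, with NULL (None) in the other columns.
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the hour list is the left side of the join, the output always covers 0 through 23, and hours with several rows simply appear several times.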
In the following procedure I write the results to a temp table called #temp1. I now want to count the results of #temp1: if the count of #temp1 = 0, I want to insert 'No Records Found' into #temp.ERRORMSG; otherwise, return what is in the table.
Any idea how to do this?
ALTER PROC [dbo].[SPU_RPT_Savings_AnomalyDispatches] 40,'04/01/07|06/30/07' @PropertyID varchar(4000), @DropDown varchar(50)
AS
SELECT Client.CLIENT, Client.CLIENTID,
    ErrorEmailLog.ID, ErrorEmailLog.SITEID, ErrorEmailLog.PROPID, ErrorEmailLog.DISTINCTERRORS,
    ErrorEmailLog.ERRORMSG, ErrorEmailLog.ERRORDATETIME, ErrorEmailLog.EMAILRECIPIENTS,
    Property.PROPERTY, Property.STREET, Property.CITY, Property.STATE, Property.ZIP, Property.PHONE
INTO #TEMP1
FROM ErrorEmailLog
    INNER JOIN Property ON ErrorEmailLog.PROPID = Property.PROPID
    INNER JOIN Client ON Property.CLIENTID = Client.CLIENTID
WHERE (ErrorEmailLog.ERRORDATETIME BETWEEN SUBSTRING(CONVERT(VARCHAR(12), @DropDown), 0, 9)
Hello all, is there any way to force Autonum to generate a number before an entire record is created? Some of my forms will not work because they need a number already listed in the index (which uses Autonum) and cannot add to the table until it is created. I really need it to have a number ready and waiting as soon as the last record is completed.
I'm testing an application change that should handle a timeout on a stored procedure called from the application. The thing is, the timeout we experience in production that led to this fix is random. So is there some way for me to set up a test stored procedure, or some way to call the SP, so that I can test a timeout scenario? I'm using MFC and the CDatabase::ExecuteSQL method to call this SP, in case you were wondering. The app runs locally on the server, which has an instance of SQL Server Express 2k5 on it. The server is running Win 2k3.
I have a slight problem: a query that I have written produces rows in which two primary keys are the same. However, DISTINCT won't work in this case, as the rows are still different.
Is there a way to force one column to always be unique?
Here's the query:
SELECT TOP 5
    ORDER_ITEM.ItemID AS 'Item ID',
    ITEM.ItemName AS 'Item Name',
    (SELECT SUM(OrdItem2.ItemQuantity)
     FROM ORDER_ITEM OrdItem2
     WHERE OrdItem2.ItemID = ORDER_ITEM.ItemID) AS Total_Purchased,
    SUM(ORDER_ITEM.ItemQuantity) AS 'Customer Purchased',
    CUSTOMER.customerForename AS 'Customer Forename',
    CUSTOMER.customerSurname AS 'Customer Surname'
FROM ITEM, ORDER_ITEM, ORDER_T, CUSTOMER
WHERE ITEM.ItemID = ORDER_ITEM.ItemID
    AND ORDER_ITEM.OrderID = ORDER_T.OrderID
    AND ORDER_T.CustomerID = CUSTOMER.CustomerID
GROUP BY ORDER_ITEM.ItemID, ITEM.ItemName, CUSTOMER.customerForename, CUSTOMER.customerSurname
ORDER BY Total_Purchased DESC
The query is supposed to select the TOP 5 Products sold as well as selecting the customer that purchased the greatest amount of that item and the amount they purchased.
Currently, I get 2 duplicate rows (except for the customer's name and the number of items they purchased), like this:
ItemID 83630Mathew Smith 8 366Tony Wattage
Which is kind of annoying. Is there any way I can prevent this?
Also, apart from the joins in the WHERE clause, is there a more efficient way of writing this?
I am developing a simple DB-Library program in C calling SQL Server 2000 on Windows 2003 and NT 4. I have some T-SQL code that checks for the existence of a table, and I want to abort the program if the table doesn't exist. I issue a RAISERROR if the table doesn't exist and then call RETURN. I construct the string using sprintf and pass it to dbfcmd and dbsqlexec. Since the commands work, there is no error to halt the execution of the program. Is there an easy, clean way to force dbsqlexec to fail? Do I need a stored procedure to return an error code and then deal with that? Thanks for any advice, -Gary
A stored procedure in the cache is automatically recompiled when a table it refers to has a table structure change. User defined functions are not. Here's a simplified code sample:
set nocount on
go
create table tmpTest (a int, b int, c int)
insert into tmpTest (a, b, c) values (1, 2, 3)
insert into tmpTest (a, b, c) values (2, 3, 4)
go
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[fTest]') and xtype in (N'FN', N'IF', N'TF'))
    drop function [dbo].[fTest]
GO
CREATE FUNCTION dbo.fTest (@a int)
RETURNS TABLE
AS
RETURN (SELECT * from tmpTest where a = @a)
GO
select * from fTest(1)
CREATE TABLE dbo.Tmp_tmpTest
(
    a int NULL,
    b int NULL,
    d int NULL,
    c int NULL
) ON [PRIMARY]
IF EXISTS(SELECT * FROM dbo.tmpTest)
    EXEC('INSERT INTO dbo.Tmp_tmpTest (a, b, c) SELECT a, b, c FROM dbo.tmpTest TABLOCKX')
DROP TABLE dbo.tmpTest
EXECUTE sp_rename N'dbo.Tmp_tmpTest', N'tmpTest', 'OBJECT'
select * from fTest(1)
drop table tmpTest
Running it, the output is:
a           b           c
----------- ----------- -----------
1           2           3
Caution: Changing any part of an object name could break scripts and stored procedures.
The OBJECT was renamed to 'tmpTest'.
a           b           c
----------- ----------- -----------
1           2           NULL
(I know that "select *" is bad, but it's a lot of legacy code that I'm working with here, and that's how it's written.)
The function doesn't detect that the table has changed in structure, or even that there is no longer a dependency on tmpTest. (Appending a column rather than inserting has the same effect, in that only the first 3 columns are returned.)
DBCC FREEPROCCACHE has no effect, not that I really expected it to, but you never know...
Is there any way, other than dropping and recreating, to force a recompilation of a particular function in memory, or perhaps all functions?
Due to a lack of planning during an Active Directory migration last year, I'm now stuck with an immutable service master key on one of my production servers. Since I'm posting here, I guess it's obvious that we have no backup from which to restore. The account that all of the SQL services used to run under no longer exists, so the WITH OLD_ACCOUNT workaround is not viable. And REGENERATE fails, as expected, with Msg 15329, Level 16, State 2, Line 2, "The current master key cannot be decrypted..."
After some research, including several of Laurentiu's blog entries, it seems that my only path at this point is to use the FORCE option with REGENERATE (and then to immediately back up the service master key at several geographically disparate locations!).
Considering that:
We aren't actively using any of SQL Server's encryption capabilities, the closest we come is that one of our legacy applications calls the old PWDENCRYPT() function to hash passwords
##MS_ServiceMasterKey## is the only record in master.sys.symmetric_keys, and every other database's sys.symmetric_keys table is empty
What, if anything, am I likely to lose if I ALTER SERVICE MASTER KEY FORCE REGENERATE? My understanding is that since we don't have any database master keys and aren't using encryption, there's no real potential for corruption or loss. However, I want to be a little more confident about this before I give it a go.
I'm trying to do an interactive sort. One of the rows returned from my data source, called 'Total', should always be displayed at the bottom. Is there a way I can do this?
I've tried the below on the column header, but the total ends up either at the bottom or at the top depending on the sort direction. How can I check whether the ordering is ascending or descending? Then I could swap the 1 and the 2 around.
SELECT acct.USERNAME,
    SUM(trans.CHARGES) - SUM(trans.CREDITS) AS [Charges - Credits],
    MAX(trans.ENDPERIOD) AS [Billed Through],
    acct.FULLNAME, bill.COMPANY, bill.BILLTOCOMPANY, bill.firstname, bill.lastname,
    bill.STREET1, bill.STREET2, bill.CITY, bill.STATE, bill.ZIPCODE, bill.COUNTRY,
    acct.PHONE1, acct.PHONE2, bill.EMAIL, acct.BILLPERIOD, acct.PLAN
FROM TRANS trans, ACCTS acct, BILLING bill
WHERE trans.ACCTNUM = acct.ACCTNUM
    AND bill.ACCTNUM = acct.ACCTNUM
    AND bill.ACCTNUM = trans.ACCTNUM
    AND acct.CLOSED = 0
    AND acct.SUSPENDED = 0
GROUP BY acct.USERNAME, acct.FULLNAME, bill.COMPANY, bill.BILLTOCOMPANY, bill.firstname, bill.lastname,
    bill.STREET1, bill.STREET2, bill.CITY, bill.STATE, bill.ZIPCODE, bill.COUNTRY,
    acct.PHONE1, acct.PHONE2, bill.EMAIL, acct.BILLPERIOD, acct.PLAN
HAVING SUM(trans.CHARGES) - SUM(CREDITS) > 0
ORDER BY [Billed Through] DESC
Incorrect syntax near the keyword 'PLAN'.
If I take acct.PLAN out of the SELECT and GROUP BY, it works fine.
I've googled a bit and found the 'EXPLAIN PLAN' command, so I assume 'PLAN' is being parsed as a command and breaking the statement. I don't get why it would be taken for a command instead of a column. How does one select a column whose name is a keyword? Brackets and single quotes didn't do the trick.