Can I Ask How To Use 'weight' Variable In Descision Tree And How To Use Cross Validation In The Data Mining Procedure?
Jun 15, 2007Also when using cross validation, which descision tree should we choose?
Thanks very much
Also when using cross validation, which descision tree should we choose?
Thanks very much
Can I ask how to split the dataset into training and validation when running descision tree model?
Hi!
I try to use Data Mining HTML Tree Viewer, but has some problem. In the Connection property, i can't set a new connection to this property.
Help me!!!
I am going through the data mining web control viewer tutorial and its going great. I have been able to build and setup the viewer. The problem I am having is when I publish the site out to my web server, it gives me the following error:
Code Snippet
Error: Either the user, <domain><computerName>$, does not have access to the prospectDataMining database, or the database does not exist.
When I debug this on my local machine via Visual Studio 2005, it works GREAT! It is just when I publish the site to the web server.
I have a dedicated SSAS server along w/ a dedicated web server. To test, I published the site to the SSAS server to see if it was a connection issue. I received the same error w/ a different <domain><computerName>$.
I looked at trying to put in the optional connection info for the dataMining html tree viewer properties... but apparently dont know how to do that properly. I also checked the IIS directory security and enabled Integrated security.
What am i doing wrong? ANY help is much appreciated.
Thanks,
Cameron
We've successfully processed a large decision tree model in SQL Server 2005. When I try to view the tree in the mining model viewer, I get the following error:
TITLE: Microsoft Visual Studio
------------------------------
The tree graph cannot be created because of the following error:
'Exception of type 'System.OutOfMemoryException' was thrown.'.
For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%u00ae+Visual+Studio%u00ae+2005&ProdVer=8.0.50727.42&EvtSrc=Microsoft.AnalysisServices.Viewers.SR&EvtID=ErrorCreateGraphFailed&LinkId=20476
The link provides no other documentaiton on the error.
We're using 64-bit SQL on a Dell Workstation running XP-64 with 16GB of memory. From my view of things we aren't close to running out of memory. Since the model processed and the error occurs when viewing the model, is this a problem with Visual Studio and nont necessarily Anlaysis Services?
Thanks in advance.
Nick
Could I ask how to spit the data into training and validation sets when doing data mining?
Thanks
Hi, can someone please explain some more how to use cross-validation in SQL Server 2008 (CTP)? I read here that it "is under Accuracy Charts in Business Intelligence Development Studio, in addition to being accessible programmatically via a stored procedure call". What is the stored procedure and input parameters?
I'm actually interested in doing so inside an Integration Services package, in a data flow task.
Thanks,
Gustavo Frederico
Hi, all here,
I am wondering where can I store my mining results in data mining engine? For example, I got mining results like accuracy chart, decision trees, and other formats of results based on different mining algorithms I used for my data mining, so where can I actually store the results for reporting service use later? Is it possible to do that in SQL Server 2005?
Thanks a lot for any help and guidance in advance.
Hi! I am using the Cross Validation tab in BIDS. Can you explain why my Liklihood Log Score is a negative number (according to BOL - meaning it is worse than a random guess) when my lift chart shoes that the alogorithm is significantly better than a random guess.
Also, the BOL definition for truenegative and falsenegative are hard for me to interpret. If I have a target state of "Yes", can you please put those definitions in yes/no terms for me?
Thanks.
Ann
Dear forum users,
I am a newbie in using MS SQL server with analysis services.
There seems to be no 'cross-validation' tool in MS SQL
which is frequently used in data mining and even statistics.
Is there anyone having similar difficulties?
Is there any solution like a small scripts to divide
the given dataset with multiple folds?
Your valuable comments and feedbacks would be appreciated.
Minnetongka
How does cross-validation work in the case of models with predictable nested tables? Is it supported? For classification and regression with a flat structure, during the testing phase (that is, validation phase) of cross-validation I can think of the inputs being presented and comparing the predicted value with the real value. But in the case of nested tables, the input is not a subset of the attributes (a subset of the input vector), but whole input vectors. (For instance, complete itemsets in the case of association rules). Can you please explain some more how the validation phase works in the case of the association rules and decision trees with predictable nested tables?
thanks,
Gustavo Frederico
Hi, guys,
Thanks for your kind attention.
Just want to make things perfectly work and make the most of our fantastic SQL Server 2005 Data Mining Engine. Can any of you here give me some super advices on the validation of the mining models. As we always see, the 3 aspects of a mining model are: Score, Population correct, and Predict Probability. So the question is: How can we combine these three aspects to best judge the mining models by being able to tell which model is the best one? And to what extent can we really trust these mining models?
These are very important before we can actually bring the models into work to convince other people who have no ideas what are going on with these models. Yes, we just want to convince them with the results of these models and make the most of them and best help them getting the most from their business operations etc.
By the way please can you explain a bit details on each of these aspects? Thanks again.
I am looking forward to hearing from you shortly and thanks bunch for your help.
With best regards,
Yours sincerely,
I followed the tutorial posted at [URL] ...
Everything was ok until the last step where I had to process the mining structure which resulted in a warning
"Informational (Data mining): Decision Trees found no splits for model, Tbl Decision Tree Example."
What does this error mean? How do I resolve it? Also, I only see the first level in the Mining Model Viewer, I don't see the levels 2 and 3.
Hi, all experts here,
I would like to know if there is any way to migrate third-party data mining packages with SQL Server 2005 data mining algorithms together then we can have a comparison among all of them to get the best results for training models.
I am looking forward to hearing from you.
Thanks a lot.
With best regards,
Yours sincerely,
Hoping someone will have a solution for this error
Errors in the metadata manager. The data type of the '~CaseDetail ~MG-Fact Voic~6' measure must be the same as its source data type. This is because the aggregate function is not set to count or distinct count.
Is the problem due to the data type of the column used in the mining structure is Long, and the underlying field in the cube has a type of BigInt,or am I barking up the wrong tree?
I'm a beginner with SQL 2012 SSDT & SSMS. I get this error message when I try to deploy my project:Â
"Error 6
Error (Data mining): KEY SEQUENCE columns are not supported at the case level. The 'Customer Key' column of the 'TK448 Ch09 Cube Clustering' mining structure contains content that is not valid.
0 0
"
I am finding it hard to locate the content that is not valid. I've been trying to find a answer for this problem but can't seem to find anything. How can I locate the content that is not valid and change or delete it so that I can deploy this solution?
Having successfully created :
- a data mining structure with about 80 columns.
- a data mining model using Microsoft_Decision_Trees with 2 prediction columns.Â
I thought I would then explore the possibility of have more than 2 prediction columns, in this case 20.
I get an error message and I can't work out :
a) if this is because there's a limit to the maximum number of prediction columns and where that maximum is stated.
b) if something else has become corrupted
c) there's a know bug and if the error message is either meaningful or not.
Either way, I'm unable to complete the data mining wizardÂ
The error message is :Errors in the metadata manager. Either the mining structure with the ID of '[my model Structure]' does not exist in the database with the ID of 'DMAddinsDB', or the user does not have permissions to access the object.
Hi all,
I am using Microsoft_Time_Series and have set HISTORIC_MODEL_GAP to various values (from 1 to 21). I always get this error:
Error (Data mining): The 'HISTORIC_MODEL_GAP' data mining parameter is not valid for the 'My Time Series' model.
In Algorithm Parameters window, this parameters is not there by default, so I have to add it.
Any tip will be greatly appreciated.
Implementing data mining Add-in in an academic setting? We need to handle over 150 new students a semester and have their connection to Analysis Services survive for their four years at the college. We are introducing data mining to every freshman business student as a unit within their Intro to Excel class (close to a month of work to give them a sense of what is possible). Other courses later in their curriculum will expand on that introduction.Â
Once implemented, we would have as many as 900 connections to manage (four years from now). It is possible that multiple sections will be running at the same time, so 40 students may be accessing the data mining tools concurrently. Â
Is there a way to "bulk establish" the access credentials and establish those databases?
How to create a stored procedure to query the data mining model?
Where can I find some sample code?
Thanks.
Joe.
With SASS Database i have created Data mining Structure Using Time series algorithm, while processing the SSAS db, Data mining  taking long time to process, so how we can  reduce processing time ???
View 2 Replies View RelatedHi,
I have just run a simple data set through a model to predict a simple true or false value (i.e. binary output)
The Lift Chart/Mining Legend in Analysis Services shows three results €“ Score, Population Correct (%), and Predict Probability (%)
Population Correct I beleive is the percentage of predictions it got right out of the total number of predictions it tried to make. Is this correct?
However, I can€™t work out how the other two are derived in particular the 'SCORE'. To give a live example the scores were as follows:
Model Score Pop Correct Pred Probability
Decision Trees 0.83 76.59% 54.28%
Neural Network 0.75 67.63% 50.05%
Ideal Model 100.00%
Can anyone help with this and give a detailed explanation?
Many thanks,
S Rajput
Hi,
I am trying to model data in analysis services with the Advance Create Mining Model function in the excel addin. I am having trouble creating an association model that works like the Associate button above the Advanced button.
The format of my data is like this
OrderID Product
100 Bike
100 Helmet
100 Shoes
200 Helmet
200 basketball
200 Bat
300 Shoes
300 Socks
The associate button works perfectly since it asks me which column is the transaction id (orderid) and which column I am trying to predict (product). The advanced create mining model asks me to determine what the columns are...
OrderID=key Product=Input+Predict?
When I run the advance create mining model associate, I get a browser that gives me no rules and the support for only one item itemset (each product but no combination of products).
Does anyone know what I have to do to get it to work like the associate button?
I perform data mining on all products and a specific product category.
Do I need to create 2 data source views, one for all products and the other one for the specific product category?
Afterward, I run the Data Mining Wizard 2 times to create 2 mining structures.
I also need to add the same mining model (e.g. Bayes, Cluster) to each of these mining structures.
Is there any simple way to do it?
Thanks.
Joe.
I got assignment, how to make it appear in the right order .
/* DROP TABLE EMP
SELECT * INTO Emp FROM (
SELECT 'A' EmpID, NULL ManID, 'Name' EmpName UNION ALL
SELECT 'MAC' EmpID, 'A' ManID, 'Name__' EmpName UNION ALL
SELECT '1ABA' EmpID, 'MAC' ManID, 'Name____' EmpName UNION ALL
SELECT 'ABB' EmpID, '1ABA' ManID, 'Name______' EmpName UNION ALL
SELECT 'XB' EmpID, 'A' ManID, 'Name__' EmpName UNION ALL
SELECT 'BAC' EmpID, 'XB' ManID, 'Name____' EmpName ) b
*/
[code]....
I apologize if I posted in the wrong section, but I cannot find a solution to this. I was hoping that some one can help me figure this out.
I need this solution in a stored procedure VS using dataset because there are thousands of categories and it is extremely slow with a dataset.
I have a category table
ID intPARENTID intNAME
SAMPLE DATA:
ID
NAME
PARENT_ID
1
ANIMALS
0
2
DOGS
1
3
CATS
1
4
Abyssinian
3
5
Persian
3
6
Rurkish Van
3
7
Dalmation
2
8
German Shepherd
2
9
Irish Setter
2
10
Bulldog
2
I need the stored proc to return results in a single record set like this:
ANIMALS - DOGS - - Dalmation - - German Shepherd - - Irish Setter - - Bulldog - CATS - - Abyssinian - - Persian - - Turkish Van
Seems fairly easy but I have been battling with this for days now.
Thank you in advance,
EL
Well I have a two tables lets say they look like as follows:
Table A
CompanyID
Location
TableB
CompanyID
CompanyName
Date
The CompanyID column has values that look like this:
AAA OR
AAA.BBB OR
AAA.BBB.CCC OR
AAA.BBB.CCC.DDD OR
AAA.BBB.CCC.DDD.EEE OR
AAA.BBB.CCC.DDD.EEE.FFF
each line representing the level of the company.
we can see that CompanyID column exists in both tables. and I would like to find out the CompanyName from table B for each companyID that exists in Table A.
but the tricky part is that for a specific CompanyID in tableA I might not have a exact match.
e.g. in A lets say I have AAA.BBB.CCC.DDD.EEE for CompanyID but in table B i might not have AAA.BBB.CCC.DDD.EEE but instead have AAA.BBB or AAA.BBB.CCC or AAA.
so I have to walk up the tree or these levels to look for a match an get the companyName. the match logic is as such.
If in table A companyID = AAA.BBB.CCC.DDD.EEE then look for same value in B if NOT FOUND then
remove the last level from the companyID value from Table A. so new value of CompanyID = AAA.BBB.CCC.DDD
Now look for this new value AAA.BBB.CCC.DDD in table B. IF NOT FOUND then
remove another level from ComapnyID in table A so new companyID = AAA.BBB.CCC and look for
AAA.BBB.CCC in table B. just going up the tree by chopping off a level till I find a match.
now the question is how to do this in TSQL.... can someone help?
So what I really want to do is to look for a match between A and B based on companyID.
Hi, all here,
Thank you very much for your kind attention.
I am wondering if it is possible to use SSIS to sample data set to training set and test set directly to my data mining models without saving them somewhere as occupying too much space? Really need guidance for that.
Thank you very much in advance for any help.
With best regards,
Yours sincerely,
HiI have a table with rows of timestamped values for given variables. Each of the variables are periodically inserted at the same time. ieDateTime Variable Value.T1 V1 xxxxT1 V2 xxxxT1 V3 xxxxT2 V1 yyyyT2 V2 yyyyT2 V3 yyyyIn a SQLserver stored procedure I would like to insert this data into a temp table like the following. I don't need summaries just need a table or query where I can select on a variables value. ie "select * from temptable where V3 = yyyy"Datetime V1 V2 V3T1 xxxx xxxx xxxxT2 yyyy yyyy yyyy any suggestions for an SQL statement.thanks in advance.
View 2 Replies View RelatedHi All,
I am working on a project in which an Integration Services package should be executed 24 times a day. Everytime it is executed, it validates for tables, views etc. I am able to validate for tables, views, partition functions etc. But when I am trying to write a code as
If not exists (select name from sys.objects where type='U' and name='SP')
Create Procedure SP
Begin
End
I am trying to put the above as a task in Integration Services.
When the task is executed, it is throw an error 'Create Procedure should be the first line' etc.
How to solve the above problem? Can any help me immediately?
Thanks
I have an MVC asp.net application that stores many records in a table on sql server, in its own system. Â used the system for 2 months, worked fine accessing, changing data.
Now that other users are logging in? there is cross coupling going on. Â one user gets the data from another users sql search.
In the mvc app it had used the get async method to read the ID record from the db, i set that to synchronous. Â no effect; Â the user makes their own login id but that does nt matter either.
I'm new to SSIS, but have been programming in SQL and ASP.Net for several years. In Visual Studio 2005 Team Edition I've created an SSIS that imports data from a flat file into the database. The original process worked, but did not check the creation date of the import file. I've been asked to add logic that will check that date and verify that it's more recent than a value stored in the database before the import process executes.
Here are the task steps.
[Execute SQL Task] - Run a stored procedure that checks to see if the import is running. If so, stop execution. Otherwise, proceed to the next step.
[Execute SQL Task] - Log an entry to a table indicating that the import has started.
[Script Task] - Get the create date for the current flat file via the reference provided in the file connection manager. Assign that date to a global value (FileCreateDate) and pass it to the next step. This works.
[Execute SQL Task] - Compare this file date with the last file create date in the database. This is where the process breaks. This step depends on 2 variables defined at a global level. The first is FileCreateDate, which gets set in step 3. The second is a global variable named IsNewFile. That variable needs to be set in this step based on what the stored procedure this step calls finds out on the database. Precedence constraints direct behavior to the next proper node according to the TRUE/FALSE setting of IsNewFile.
If IsNewFile is FALSE, direct the process to a step that enters a log entry to a table and conclude execution of the SSIS.
If IsNewFile is TRUE, proceed with the import. There are 5 other subsequent steps that follow this decision, but since those work they are not relevant to this post.
Here is the stored procedure that Step 4 is calling. You can see that I experimented with using and not using the OUTPUT option. I really don't care if it returns the value as an OUTPUT or as a field in a recordset. All I care about is getting that value back from the stored procedure so this node in the decision tree can point the flow in the correct direction.
CREATE PROCEDURE [dbo].[p_CheckImportFileCreateDate]
/*
The SSIS package passes the FileCreateDate parameter to this procedure, which then compares that parameter with the date saved in tbl_ImportFileCreateDate.
If the date is newer (or if there is no date), it updates the field in that table and returns a TRUE IsNewFile bit value in a recordset.
Otherwise it returns a FALSE value in the IsNewFile column.
Example:
exec p_CheckImportFileCreateDate 'GL Account Import', '2/27/2008 9:24 AM', 0
*/
@ProcessName varchar(50)
, @FileCreateDate datetime
, @IsNewFile bit OUTPUT
AS
SET NOCOUNT ON
--DECLARE @IsNewFile bit
DECLARE @CreateDateInTable datetime
SELECT @CreateDateInTable = FileCreateDate FROM tbl_ImportFileCreateDate WHERE ProcessName = @ProcessName
IF EXISTS (SELECT ProcessName FROM tbl_ImportFileCreateDate WHERE ProcessName = @ProcessName)
BEGIN
-- The process exists in tbl_ImportFileCreateDate. Compare the create dates.
IF (@FileCreateDate > @CreateDateInTable)
BEGIN
-- This is a newer file date. Update the table and set @IsNewFile to TRUE.
UPDATE tbl_ImportFileCreateDate
SET FileCreateDate = @FileCreateDate
WHERE ProcessName = @ProcessName
SET @IsNewFile = 1
END
ELSE
BEGIN
-- The file date is the same or older.
SET @IsNewFile = 0
END
END
ELSE
BEGIN
-- This is a new process for tbl_ImportFileCreateDate. Add a record to that table and set @IsNewFile to TRUE.
INSERT INTO tbl_ImportFileCreateDate (ProcessName, FileCreateDate)
VALUES (@ProcessName, @FileCreateDate)
SET @IsNewFile = 1
END
SELECT @IsNewFile
The relevant Global Variables in the package are defined as follows:
Name : Scope : Date Type : Value
FileCreateDate : (Package Name) : DateType : 1/1/2000
IsNewFile : (Package Name) : Boolean : False
Setting the properties in the "Execute SQL Task Editor" has been the difficult part of this. Here are the settings.
General
Name = Compare Last File Create Date
Description = Compares the create date of the current file with a value in tbl_ImportFileCreateDate.
TimeOut = 0
CodePage = 1252
ResultSet = None
ConnectionType = OLE DB
Connection = MyServerDataBase
SQLSourceType = Direct input
IsQueryStoredProcedure = False
BypassPrepare = True
I tried several SQL statements, suspecting it's a syntax issue. All of these failed, but with different error messages. These are the 2 most recent attempts based on posts I was able to locate.
SQLStatement = exec ? = dbo.p_CheckImportFileCreateDate 'GL Account Import', ?, ? output
SQLStatement = exec p_CheckImportFileCreateDate 'GL Account Import', ?, ? output
Parameter Mapping
Variable Name = User::FileCreateDate, Direction = Input, DataType = DATE, Parameter Name = 0, Parameter Size = -1
Variable Name = User::IsNewFile, Direction = Output, DataType = BYTE, Parameter Name = 1, Parameter Size = -1
Result Set is empty.
Expressions is empty.
When I run this in debug mode with this SQL statement ...
exec ? = dbo.p_CheckImportFileCreateDate 'GL Account Import', ?, ? output
... the following error message appears.
SSIS package "MyPackage.dtsx" starting.
Information: 0x4004300A at Import data from flat file to tbl_GLImport, DTS.Pipeline: Validation phase is beginning.
Error: 0xC002F210 at Compare Last File Create Date, Execute SQL Task: Executing the query "exec ? = dbo.p_CheckImportFileCreateDate 'GL Account Import', ?, ? output" failed with the following error: "No value given for one or more required parameters.". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.
Task failed: Compare Last File Create Date
Warning: 0x80019002 at GLImport: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED. The Execution method succeeded, but the number of errors raised (1) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
SSIS package "MyPackage.dtsx" finished: Failure.
When the above is run tbl_ImportFileCreateDate does not get updated, so it's failing at some point when calling the procedure.
When I run this in debug mode with this SQL statement ...
exec p_CheckImportFileCreateDate 'GL Account Import', ?, ? output
... the tbl_ImportFileCreateDate table gets updated. So I know that data piece is working, but then it fails with the following message.
SSIS package "MyPackage.dtsx" starting.
Information: 0x4004300A at Import data from flat file to tbl_GLImport, DTS.Pipeline: Validation phase is beginning.
Error: 0xC001F009 at GLImport: The type of the value being assigned to variable "User::IsNewFile" differs from the current variable type. Variables may not change type during execution. Variable types are strict, except for variables of type Object.
Error: 0xC002F210 at Compare Last File Create Date, Execute SQL Task: Executing the query "exec p_CheckImportFileCreateDate 'GL Account Import', ?, ? output" failed with the following error: "The type of the value being assigned to variable "User::IsNewFile" differs from the current variable type. Variables may not change type during execution. Variable types are strict, except for variables of type Object.
". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.
Task failed: Compare Last File Create Date
Warning: 0x80019002 at GLImport: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED. The Execution method succeeded, but the number of errors raised (3) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
SSIS package "MyPackage.dtsx" finished: Failure.
The IsNewFile global variable is scoped at the package level and has a Boolean data type, and the Output parameter in the stored procedure is defined as a Bit. So what gives?
The "Possible Failure Reasons" message is so generic that it's been useless to me. And I've been unable to find any examples online that explain how to do what I'm attempting. This would seem to be a very common task. My suspicion is that one or more of the settings in that Execute SQL Task node is bad. Or that there is some cryptic, undocumented reason that this is failing.
Thanks for your help.
I am currently in the process of migrating data from Sybase to Sql server and would like to know how to test the data migrated.
As of now, we took one table data from both source and destination and compared it in Excel to check if the data migrated looks good (note, we used SSIS to migrate data). However, I would like to check if there are any other best & easy ways to apprach data validation post migration.