Nested Tables With Data Mining Training Transform?
Nov 23, 2007
Hi, I can't figure out how to put nested tables into the Data Mining Model Training Transform (SSIS). Can anybody help me? Some example, please!
Diego B.
I can't figure out how to put nested tables into the Data Mining Model Training Transform (SSIS). I can do a simple case table, but how do you get those nested tables with DM Training Transformation? Any ideas? Samples?
Thanks in advance,
-Young K
Let's take the following example:
Movie train table:
ID Class
1 +
2 +
3 -
4 +
5 -
Actor train nested table:
ID MovieID Gender
1 1 F
2 1 M
3 1 F
4 1 F
5 2 M
6 2 M
7 2 F
8 3 F
9 3 F
10 4 M
11 4 M
12 4 F
13 4 F
14 5 F
15 5 M
We want to build a classifier model in order to predict the Class of a Movie based on the Gender of the movie's actors. To deal with the nested table, Analysis Services maps each record of the nested table to an attribute of the case table. These attributes are named Actor(n).Gender with n = 1..15, and so they depend on the nested table record numbers. Both the Microsoft Decision Trees and Microsoft Naive Bayes algorithms use these attributes without any modification.
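For reference, here is a minimal DMX sketch of how a model with such a nested table could be declared; the model and column names are assumptions based on the example above, and the algorithm choice is arbitrary:
-- Hypothetical model for the Movie/Actor example above
CREATE MINING MODEL MovieClassifier
(
    MovieID LONG KEY,
    Class TEXT DISCRETE PREDICT,
    Actors TABLE
    (
        ActorID LONG KEY,
        Gender TEXT DISCRETE
    )
) USING Microsoft_Naive_Bayes;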
We are implementing a Relational Naive Bayes algorithm and we are planning to aggregate such attributes in order to make them independent of the nested table record numbers.
As a next step we tried to predict some unseen cases, and here we ran into a very big problem.
Let's take two more tables of unseen cases:
Movie test table:
ID Class
6 +
7 NULL
8 NULL
Actor test nested table:
ID MovieID Gender
1 6 F
2 6 M
3 6 F
4 6 F
16 7 F
17 7 M
18 7 F
19 7 F
20 7 F
21 8 M
22 8 M
23 8 F
Predicting the Class of movie 6 is not a problem, since that movie's actors were included in the training dataset, so when the records are mapped to attributes they already exist in the model. But when you try to predict movies 7 and 8, which have unseen actors, all the new attributes are simply ignored in the Algorithm::Predict call (in_ulCaseValues is zero!) because they do not exist in the model!
What is the solution?
Hi, all here,
Thank you very much for your kind attention.
I am wondering if it is possible to use SSIS to sample a data set into training and test sets and feed them directly into my data mining models, without saving them somewhere first, since that would take up too much space? I really need guidance on this.
Thank you very much in advance for any help.
With best regards,
Yours sincerely,
I have been trying to use SQL 2005 data mining for about 8 weeks. I am becoming frustrated because I am not able to make progress nor am I able to exploit the power of the system.
I need a training course! I have asked Microsoft in the UK for recommendations but they have been unable to help. I have searched for courses in the UK and US without success.
I am coming to the Microsoft BI event in Seattle - will there be any opportunities there to get help or find help? (In Seattle I intend to concentrate on the Excel add-ins.)
Thank you
Hi, all experts here,
I am wondering whether there is any way to select only a portion of a data set to train the mining model? In this case I mean we don't need to split the dataset in advance; what I want is to be able to select any random portion of a selected dataset to train a mining model. Any advice?
I am looking forward to hearing from you, and thanks a lot in advance for your advice and help.
With best regards,
Yours sincerely,
Could I ask how to split the data into training and validation sets when doing data mining?
Thanks
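For what it's worth, one common way to do a random split at the relational level is sketched below; the table and column names are placeholders, and the 70/30 proportion is just an example:
-- Tag each source row randomly as training or validation
SELECT c.*,
       CASE WHEN ABS(CHECKSUM(NEWID())) % 100 < 70
            THEN 'train'
            ELSE 'validate'
       END AS SampleSet
FROM dbo.SourceCases AS c;
The resulting SampleSet column can then be used to filter rows into the training and validation inputs without materializing two separate copies of the data.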
In the Mining Accuracy Chart, the predictable columns of nested tables do not show up in the "Select predictable mining model columns to show in the lift chart" table. The "Predictable column name" is empty.
Predictable columns in the case table show up, but not the predictable columns in the nested table. What am I missing?
-Young K
Hi, all here,
Thank you very much for your kind attention.
I don't think we should sample any nested tables for data mining model training, since I think any nested tables are bound to the case table. Therefore, whenever we sample the case table, the nested tables, like any other input attributes within the case table, are retrieved as inputs accordingly?
Thank you very much for any guidance to clear my confusion.
With best regards,
Yours sincerely,
I am in the process of creating an Integration Services package to automate the process of training mining models and getting predictions. Until recently, I have been processing the models directly from Business Intelligence Studio without any problems. However, when I try to use the exact same training set as an input to the Data Mining Model Training destination, I get several errors. Here is the output:
[Mining Models [1]] Error: Parser: An error occurred during pipeline processing.
[Mining Models [1]] Error: Errors in the OLAP storage engine: The process operation ended because the number of errors encountered during processing reached the defined limit of allowable errors for the operation.
[Mining Models [1]] Error: Errors in the OLAP storage engine: An error occurred while the 'CPT MODIFIER' attribute of the 'BCCA DMS ~MC-CLAIM LIN~5' dimension from the 'BCCA LRG DMS TEST' database was being processed.
[Mining Models [1]] Error: File system error: The record ID is incorrect. Physical file: . Logical file: .
[Mining Models [1]] Error: Errors in the OLAP storage engine: The process operation ended because the number of errors encountered during processing reached the defined limit of allowable errors for the operation.
[Mining Models [1]] Error: Errors in the OLAP storage engine: An error occurred while the 'BILL TYPE' attribute of the 'BCCA DMS ~MC-CLAIM LIN~5' dimension from the 'BCCA LRG DMS TEST' database was being processed.
[Mining Models [1]] Error: File system error: The record ID is incorrect. Physical file: . Logical file: .
[DTS.Pipeline] Error: The ProcessInput method on component "Mining Models" (1) failed with error code 0x80004005. The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running.
I have not been able to find an answer as to why this is happening. I found a post regarding a similar problem with processing an OLAP cube in SSIS, but it seems that the author of that post never found an answer. Has anyone else here seen similar errors when processing mining models from Integration Services?
Also, if I process the mining models manually then try to run only predictions in SSIS, I get many of the same errors. I'll keep looking into the problem myself, but I would be very grateful if someone in this forum could shed some light on this issue.
Dear All, I have a simple mining structure created by the DMX statement below. I then tried to insert some data, using MDX to extract the data from an OLAP cube, but I got the following error when I executed the insert statement.
Errors in the high-level relational engine. The 'Customer ID' column in the RELATES clause was not found in the results of the OPENROWSET query.
It seems that the append statement can't recognize the name of the column, which should be Customer ID.
How can I fix this problem?
Thanks
Tony Chun Tung Siu
The source code for the create and insert statements is below.
create mining model customerMiner
(
customerID long key ,
age long continuous,
orders table
(
orderID long key,
goodsID long discrete predict_only
)
)using [Microsoft_decision_trees] with drillthrough;
insert into mining model customerMiner
(
customerID,
age,
orders
(
skip,
orderID,
goodsID
)
)
shape
{
openQuery([Simple SSAS],
'
select {[Measures].[Customer ID], [Measures].[Age]} on columns,
{[Customer].[Customer].[Customer].members} on rows
from fi
')
}
append
(
{
openQuery([Simple SSAS],
'
select {[Measures].[Customer ID],[Measures].[Order ID2],[Measures].[Goods ID]} on columns,
[Goods].[Order ID2].[Order ID2].members on rows
from fi
')
}
relate [Customer ID] to [Customer ID]
)as orders
Hi, all here,
I am wondering where I can store my mining results in the data mining engine? For example, I get mining results like accuracy charts, decision trees, and other formats of results based on the different mining algorithms I used, so where can I actually store these results for later use by Reporting Services? Is it possible to do that in SQL Server 2005?
Thanks a lot for any help and guidance in advance.
I have a database available and it is in live production.
I want to show only three of its tables in the front end (an ASP.NET application) in a dropdown menu. I checked the internet and found only a query for seeing all the tables in the DB.
SQL 2008R2
The request is to merge (or join, or CASE statement, or UNION, or ...) up to four unique columns, all in separate tables, into a new combined table (matrix) of the results.
What is an example of the best method to do this?
I followed the tutorial posted at [URL] ...
Everything was OK until the last step, where I had to process the mining structure, which resulted in a warning:
"Informational (Data mining): Decision Trees found no splits for model, Tbl Decision Tree Example."
What does this warning mean? How do I resolve it? Also, I only see the first level in the Mining Model Viewer; I don't see levels 2 and 3.
I have a requirement to create nested tables in SQL Server. How do I create them? Just to give some background, I am trying to move the RDBMS from Oracle to SQL Server.
The structure of the tables is as follows. I have a table 'Employees' with address as one of the columns. I have one more table with columns Street, Town, Dist, State. When I query the 'Employees' table I should see the attribute names and values of all the columns of the address table in the address column.
Employees: with columns: ID, FirstName, LastName, dept, gender, dob, address
Address (Nested table): with columns : Street, Town, Dist, State
This was done in Oracle using nested tables and user-defined data types. What is the alternative for this in SQL Server? How can I achieve this requirement in SQL Server?
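One common relational alternative in SQL Server is sketched below; the column types are assumptions, and the view simply presents the address columns alongside each employee instead of a true nested-table column:
-- Employees and a separate Address table linked by a foreign key
CREATE TABLE Employees
(
    ID        INT PRIMARY KEY,
    FirstName NVARCHAR(50),
    LastName  NVARCHAR(50),
    Dept      NVARCHAR(50),
    Gender    CHAR(1),
    DOB       DATE
);

CREATE TABLE Address
(
    EmployeeID INT NOT NULL REFERENCES Employees(ID),
    Street     NVARCHAR(100),
    Town       NVARCHAR(50),
    Dist       NVARCHAR(50),
    State      NVARCHAR(50)
);

-- View that shows the address attributes with each employee row
CREATE VIEW EmployeesWithAddress AS
SELECT e.ID, e.FirstName, e.LastName, e.Dept, e.Gender, e.DOB,
       a.Street, a.Town, a.Dist, a.State
FROM Employees AS e
LEFT JOIN Address AS a ON a.EmployeeID = e.ID;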
Hi, all experts here,
I would like to know if there is any way to integrate third-party data mining packages with the SQL Server 2005 data mining algorithms, so that we can compare all of them and get the best results for training models.
I am looking forward to hearing from you.
Thanks a lot.
With best regards,
Yours sincerely,
Hoping someone will have a solution for this error:
Errors in the metadata manager. The data type of the '~CaseDetail ~MG-Fact Voic~6' measure must be the same as its source data type. This is because the aggregate function is not set to count or distinct count.
Is the problem that the data type of the column used in the mining structure is Long while the underlying field in the cube has a type of BigInt, or am I barking up the wrong tree?
I'm a beginner with SQL 2012 SSDT & SSMS. I get this error message when I try to deploy my project:
Error 6: Error (Data mining): KEY SEQUENCE columns are not supported at the case level. The 'Customer Key' column of the 'TK448 Ch09 Cube Clustering' mining structure contains content that is not valid.
I am finding it hard to locate the content that is not valid. I've been trying to find an answer to this problem but can't seem to find anything. How can I locate the content that is not valid and change or delete it so that I can deploy this solution?
Having successfully created :
- a data mining structure with about 80 columns.
- a data mining model using Microsoft_Decision_Trees with 2 prediction columns.
I thought I would then explore the possibility of having more than 2 prediction columns, in this case 20.
I get an error message and I can't work out:
a) if this is because there's a limit to the maximum number of prediction columns and where that maximum is stated.
b) if something else has become corrupted
c) whether there's a known bug and whether the error message is meaningful or not.
Either way, I'm unable to complete the data mining wizard.
The error message is: Errors in the metadata manager. Either the mining structure with the ID of '[my model Structure]' does not exist in the database with the ID of 'DMAddinsDB', or the user does not have permissions to access the object.
Hi all,
I am using Microsoft_Time_Series and have set HISTORIC_MODEL_GAP to various values (from 1 to 21). I always get this error:
Error (Data mining): The 'HISTORIC_MODEL_GAP' data mining parameter is not valid for the 'My Time Series' model.
In the Algorithm Parameters window, this parameter is not there by default, so I have to add it.
Any tip will be greatly appreciated.
Implementing the data mining add-in in an academic setting? We need to handle over 150 new students a semester and have their connection to Analysis Services survive for their four years at the college. We are introducing data mining to every freshman business student as a unit within their Intro to Excel class (close to a month of work to give them a sense of what is possible). Other courses later in their curriculum will expand on that introduction.
Once implemented, we would have as many as 900 connections to manage (four years from now). It is possible that multiple sections will be running at the same time, so 40 students may be accessing the data mining tools concurrently.
Is there a way to "bulk establish" the access credentials and establish those databases?
With an SSAS database I have created a data mining structure using the Time Series algorithm. While processing the SSAS DB, the data mining takes a long time to process, so how can we reduce the processing time?
Hi,
I have just run a simple data set through a model to predict a simple true or false value (i.e. binary output).
The Lift Chart/Mining Legend in Analysis Services shows three results: Score, Population Correct (%), and Predict Probability (%).
Population Correct, I believe, is the percentage of predictions it got right out of the total number of predictions it tried to make. Is this correct?
However, I can't work out how the other two are derived, in particular the 'Score'. To give a live example, the scores were as follows:
Model            Score   Pop Correct   Pred Probability
Decision Trees   0.83    76.59%        54.28%
Neural Network   0.75    67.63%        50.05%
Ideal Model              100.00%
Can anyone help with this and give a detailed explanation?
Many thanks,
S Rajput
Hi,
I am trying to model data in Analysis Services with the Advanced Create Mining Model function in the Excel add-in. I am having trouble creating an association model that works like the Associate button above the Advanced button.
The format of my data is like this
OrderID Product
100 Bike
100 Helmet
100 Shoes
200 Helmet
200 basketball
200 Bat
300 Shoes
300 Socks
The associate button works perfectly since it asks me which column is the transaction id (orderid) and which column I am trying to predict (product). The advanced create mining model asks me to determine what the columns are...
OrderID=key Product=Input+Predict?
When I run the Advanced Create Mining Model associate, I get a browser that gives me no rules and support only for single-item itemsets (each product but no combination of products).
Does anyone know what I have to do to get it to work like the associate button?
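For comparison, here is a hedged DMX sketch of how an association model over this kind of data is typically structured, with the order as the case and the products as a nested table whose key is both input and predictable; the names are taken from the sample data above:
-- Order is the case; the nested Products table is marked PREDICT
CREATE MINING MODEL OrderAssociations
(
    OrderID LONG KEY,
    Products TABLE PREDICT
    (
        Product TEXT KEY
    )
) USING Microsoft_Association_Rules;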
I perform data mining on all products and a specific product category.
Do I need to create 2 data source views, one for all products and the other one for the specific product category?
Afterward, I run the Data Mining Wizard 2 times to create 2 mining structures.
I also need to add the same mining model (e.g. Bayes, Cluster) to each of these mining structures.
Is there any simple way to do it?
Thanks.
Joe.
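One possibly relevant DMX construct (the structure, model, and column names below are placeholders) is ALTER MINING STRUCTURE ... ADD MINING MODEL, which adds another model to an existing structure without rebuilding it:
-- Add a Naive Bayes model to an existing mining structure
ALTER MINING STRUCTURE [All Products]
ADD MINING MODEL [All Products NB]
(
    [Customer Key],
    [Age],
    [Category Buyer] PREDICT
) USING Microsoft_Naive_Bayes;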
Hi, all experts here,
Thanks for your kind attention.
I have a question about missing-value substitution. As we know, in the algorithm parameter settings there is an option for selecting the method used to deal with missing values: None, Previous, or Mean. My question is: for a numeric data column we can select the Mean method to deal with its missing values, but what about a discrete data column? What is the best practice for dealing with this kind of data? I think this, to a great extent, affects the quality of the models being trained.
Could any of you please give me advice and suggestions on this, based on the best practices you have found?
I am looking forward to hearing from you shortly and thanks a lot in advance.
With best regards,
Yours sincerely,
I've got a dataset bound to a Table in my report. Running the report takes a good 15-20 seconds as there's a metric ton of data coming back...that part works great as is.
Now I need to be able to nest some additional detail data inside the main row group footer for each row (or not, depending on whether a field != null). I pass the subreport a value from the row, and it displays the returned data.
Normally, if I was doing this from scratch, I'd just include the data in the stored proc results; however, I'm not doing it from scratch and I have no idea how to get this done so it's not taking 5 minutes to show the report.
The "detail" stored proc might not even return data; it only ever has 30-100 records. It seems almost a shame to keep re-querying to check for results for all X thousand rows. Is there a way to maybe run it once and just keep re-filtering the data into the column I need?
ANY info would be great
Thanks
Steve
I have a decision tree mining model that has two nested tables and that amount of inputs processes in under a minute. When I add a third nested table in what I think is exactly the same way (I've tried two different ones), it never returns from counting the cases. Is there a limit on the number of nested tables one can have in a DT model? It does process the rest of the objects and measure groups but can never seem to return from counting cases, perpetually showing "Counting Cases 0".
I should probably add that the only way to stop it from processing after it hangs for 5 mins or 5 hours is to stop and start the service. After I remove a nested table and replace it with one of the new ones, it flies right through again. Something seems magical about the third nested table.
Thanks!
Brian
Hi friends,
Can somebody tell me how to do this-
How can we analyze the existing code used to transform data into the Operations Data Warehouse, and make changes to correspond to upcoming changes in the SAP data sources?
Thanks
sk
Hi,
I am trying to revamp our product database with a view to making it search-optimised and would like some guidance (or confirmation of method, if you will!!). We currently use a three table structure (Product, Brand, Cat(egory)) along the lines of:
create table product
(prod_id int not null,
brand_id int not null,
cat_id int not null
-- other stuff, e.g. tech specs, displayed text on the web page, etc.
)
with corresponding brand_id and cat_id in the other tables. While this seems relationally sound I see it as being inefficient for searching, particularly after reading the theory behind nested sets.
A new function I am building will enable users to drill down through the product list or run searches against all or part of the db:
e.g. all products from one category,
all products fitting certain search criteria,
products from several selected brands fitting certain criteria,
and any combination of the above you can think of!
The problem is, not all products have the same criteria list (in fact I would be surprised if any did) and may also be of more than one category (a digital camera with movie mode might easily fit into the digital camcorder search). I think I am correct in that a nested set would make the structure fit the requirement - things like criteria, displayable text, etc. could be nodes in their own right and each logical level might have its own criteria. For example, if a category is selected then certain text must be displayed and could list further categories or products. The next level down would then require its own displayable text - I am mainly thinking about SEO tags here. Also, I am not precious about retaining the current table structure and would like an open ended solution where I can add further data/functionality in a dynamic fashion, which nested sets seem to embody.
Does this make sense to anybody coz I think I've confused myself even more!!
Thanks in advance,
G
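As a rough illustration of the nested set idea mentioned above (the column names and the sample category are assumptions), each node stores left/right bounds, and descendants are found with a simple range test:
-- Hypothetical nested set table for the category hierarchy
CREATE TABLE category
(
    cat_id INT NOT NULL PRIMARY KEY,
    name   NVARCHAR(100) NOT NULL,
    lft    INT NOT NULL,
    rgt    INT NOT NULL
);

-- All descendants of a given category, e.g. everything under 'Cameras'
SELECT child.cat_id, child.name
FROM category AS parent
JOIN category AS child
  ON child.lft BETWEEN parent.lft AND parent.rgt
WHERE parent.name = 'Cameras';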
I need to perform some operations that seem to require multiple nested tables.
Is the following code the way I should be attempting this or is there a better way?
Here is the sample code:
SELECT tt.tdl_ID, tt.tx_ID, adj.Prof_Chgs, rc.Rvu_Comp
FROM Table1 as tt
LEFT JOIN (
SELECT tt1.Tdl_Id,
CASE
WHEN tt1.tdl_ID = '1000'
[code]....
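Since the fragment above is truncated, here is a hedged sketch of what a complete query along those lines might look like; the Charge_Amt column and the join condition are hypothetical:
-- Derived table with a CASE expression, joined back to the main table
SELECT tt.tdl_ID, tt.tx_ID, adj.Prof_Chgs
FROM Table1 AS tt
LEFT JOIN (
    SELECT tt1.tdl_ID,
           SUM(CASE WHEN tt1.tdl_ID = '1000'
                    THEN tt1.Charge_Amt ELSE 0 END) AS Prof_Chgs
    FROM Table1 AS tt1
    GROUP BY tt1.tdl_ID
) AS adj
    ON adj.tdl_ID = tt.tdl_ID;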
I'm just wondering: if you have two select statements, one nested inside another, can you reference tables from the outer query?
For example, say I would like to find all employees who have at least two clients, where employees and clients have a one-to-many relationship.
select *
from Employee e
where
(
select count(*)
from Clients c
where c.EmployeeId = e.EmployeeId
) >= 2
This obviously doesn't work - but how would I go about doing something like this?
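For what it's worth, here is a sketch of one way to express this with GROUP BY and HAVING instead of a correlated subquery; the table and column names are taken from the question:
-- Employees that appear at least twice in the Clients table
SELECT e.*
FROM Employee AS e
JOIN (
    SELECT EmployeeId
    FROM Clients
    GROUP BY EmployeeId
    HAVING COUNT(*) >= 2
) AS c
    ON c.EmployeeId = e.EmployeeId;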