How does cross-validation work in the case of models with predictable nested tables? Is it supported? For classification and regression with a flat structure, during the testing phase (that is, validation phase) of cross-validation I can think of the inputs being presented and comparing the predicted value with the real value. But in the case of nested tables, the input is not a subset of the attributes (a subset of the input vector), but whole input vectors. (For instance, complete itemsets in the case of association rules). Can you please explain some more how the validation phase works in the case of the association rules and decision trees with predictable nested tables?
In the Mining Accuracy Chart, the predictable columns of nested tables do not show up in the "Select predictable mining model columns to show in the lift chart" table. The "Predictable column name" is empty.
Predictable columns in the case table show up, but not the predictable columns in the nested table. What am I missing?
Hi, can someone please explain some more how to use cross-validation in SQL Server 2008 (CTP)? I read here that it "is under Accuracy Charts in Business Intelligence Development Studio, in addition to being accessible programmatically via a stored procedure call". What is the stored procedure and input parameters? I'm actually interested in doing so inside an Integration Services package, in a data flow task.
Hi! I am using the Cross Validation tab in BIDS. Can you explain why my Likelihood Log Score is a negative number (according to BOL, meaning it is worse than a random guess) when my lift chart shows that the algorithm is significantly better than a random guess?
Also, the BOL definitions for truenegative and falsenegative are hard for me to interpret. If I have a target state of "Yes", can you please put those definitions in yes/no terms for me?
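As a point of reference for the yes/no question above, here is a minimal sketch of the standard confusion-matrix terms with "Yes" as the target state. It assumes the usual textbook definitions; the function name and sample data are hypothetical, and the exact way BIDS derives these counts (for example, any probability threshold it applies) is not reproduced here.

```python
from collections import Counter

def confusion_counts(actual, predicted, target="Yes"):
    """Count TP/FP/TN/FN, treating `target` as the positive state."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == target and a == target:
            counts["true_positive"] += 1    # predicted Yes, actually Yes
        elif p == target and a != target:
            counts["false_positive"] += 1   # predicted Yes, actually not Yes
        elif p != target and a != target:
            counts["true_negative"] += 1    # predicted not Yes, actually not Yes
        else:
            counts["false_negative"] += 1   # predicted not Yes, actually Yes
    return counts

# Hypothetical sample data, for illustration only.
actual = ["Yes", "No", "Yes", "No", "No"]
predicted = ["Yes", "Yes", "No", "No", "No"]
result = confusion_counts(actual, predicted)
print(dict(result))
```

In these terms, a true negative is a case the model said was not "Yes" and that really was not "Yes"; a false negative is a case the model said was not "Yes" but that actually was "Yes".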
I am a newbie in using MS SQL Server with Analysis Services. There seems to be no 'cross-validation' tool in MS SQL Server, although it is frequently used in data mining and even statistics. Is anyone having similar difficulties? Is there a solution, such as a small script to divide the given dataset into multiple folds? Your valuable comments and feedback would be appreciated.
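The kind of small fold-splitting script asked about above could look like the following. This is only a sketch that works on row positions outside the database; the function name `k_fold_splits` and its parameters are hypothetical, and it does not use any Analysis Services feature.

```python
import random

def k_fold_splits(n_rows, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]   # k roughly equal folds
    for i in range(k):
        test = folds[i]
        # Training set is every row that is not in the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

for train, test in k_fold_splits(10, 3, seed=1):
    print(len(train), len(test))
```

Each row lands in exactly one test fold, so every row is used for validation once and for training k-1 times.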
I tried to utilize Mining Accuracy to analyze my models.
Mining Model         Predictable Column Name    Predictable Value
------------------------------------------------------------------
NaiveBayesModel
DecisionTreeModel
When I want to choose an option for "Predictable Column Name" on NaiveBayesModel row or DecisionTreeModel row, there is no option/value/choice on the drop box. There is also no option/value/choice for the "Predictable Value" column.
When I clicked "Lift Chart" tab to see the accuracy chart, it gave me this error message: "No mining models are selected for comparison."
Hi, I'm calling the cross validation SP like this:
call SystemGetCrossValidationResults([V Product View Event],[V Product Event 1],4,500,'V Product View Event')
and I get this error:
Parsing the query ...
Error (Data mining): The attribute, 'V Product View Event', is not valid in the current procedure call because the attribute is not a PREDICT or a PREDICT_ONLY attribute for the model, 'V Product Event 1'.
Parsing complete
In my model 'V Product View Event' is a predictable attribute. It so happens that I'm using association rules. And 'V Product View Event' is a nested table (a TableMiningStructureColumn). Inside the 'V Product View Event' nested table I have a 'Product' ScalarMiningStructureColumn. Is there any way to call the SP with nested tables in models? I'm using SQL Server 2008 June CTP.
I am wanting to show Total Quantity of Presentations2 for each EmpVol and TypeofPresentation, even if there are no Presentations done in TypeofPresentation or if an Employee did not do any Presentations.
I am close with the query below, but I am not getting back exactly what I want. Any ideas?
SELECT ISNULL(Presentations2.Quantity, 0) AS Quantity,
       TypeofPresentation.Typeofpresforrpt,
       EmpVol_2.LastName,
       EmpVol_2.FSSTVOLUNTEER
FROM Presentations2
     INNER JOIN EmpVol ON Presentations2.ID = EmpVol.ID
     RIGHT OUTER JOIN TypeofPresentation ON Presentations2.TypePresINT = TypeofPresentation.TypePresINT
     CROSS JOIN EmpVol AS EmpVol_1
     CROSS JOIN EmpVol AS EmpVol_2
GROUP BY TypeofPresentation.Typeofpresforrpt,
         EmpVol_2.LastName,
         EmpVol_2.FSSTVOLUNTEER,
         ISNULL(Presentations2.Quantity, 0)
HAVING (EmpVol_2.FSSTVOLUNTEER = N'FSST')
It seems I face a problem with the Microsoft Decision Trees model when I have a predictable variable that is continuous. I have created the whole model according to the AdventureWorks tutorial (which says that the same procedure is followed with a continuous variable) and I have flagged the variable as continuous. Even though everything seems to be going well, the results I get are not correct (after a cross-check with another project already done and checked). Is there something I am missing or that I skipped while creating the model? Any suggestions that may help me are appreciated. Thank you in advance.
I've got a dataset bound to a Table in my report. Running the report takes a good 15-20 seconds as there's a metric ton of data coming back...that part works great as is.
Now I need to be able to Nest some additional Detail Data inside the main row group footer for each row (or not, depending if a field != null). I pass the subreport a value from the row, and it displays the returned data.
Normally if I was doing this from scratch I'd just include the data in the stored proc results, however I'm not doing it from scratch and I have no idea how to get this done so it's not taking 5 minutes to show the report
The "detail" stored proc might not even return data, it only really ever has 30-100 records. It seems almost a shame to keep requerying to check for results for all XThousand rows. Is there a way to maybe run it once and just keep re-filtering the data into the column I need?
I have a decision tree mining model that has two nested tables and that amount of inputs processes in under a minute. When I add a third nested table in what I think is exactly the same way (I've tried two different ones), it never returns from counting the cases. Is there a limit on the number of nested tables one can have in a DT model? It does process the rest of the objects and measure groups but can never seem to return from counting cases, perpetually showing "Counting Cases 0".
I should probably add that the only way to stop it from processing after it hangs for 5 mins or 5 hours is to stop and start the service. After I remove a nested table and replace it with one of the new ones, it flies right through again. Something seems magical about the third nested table.
I am trying to revamp our product database with a view to making it search-optimised and would like some guidance (or confirmation of method, if you will!!). We currently use a three-table structure (Product, Brand, Cat(egory)) along the lines of:
create table product (
    prod_id int not null,
    brand_id int not null,
    cat_id int not null,
    other stuff e.g. tech. specs, displayed text on web page, etc....
)
with corresponding brand_id and cat_id in the other tables. While this seems relationally sound I see it as being inefficient for searching, particularly after reading the theory behind nested sets.
A new function I am building will enable users to drill down through the product list or run searches against all or part of the db:
e.g. all products from one category, all products fitting certain search criteria, products from several selected brands fitting certain criteria, and any combination of the above you can think of!
The problem is, not all products have the same criteria list (in fact I would be surprised if any did) and may also be of more than one category (a digital camera with movie mode might easily fit into the digital camcorder search). I think I am correct in that a nested set would make the structure fit the requirement - things like criteria, displayable text, etc. could be nodes in their own right and each logical level might have its own criteria. For example, if a category is selected then certain text must be displayed and could list further categories or products. The next level down would then require its own displayable text - I am mainly thinking about SEO tags here. Also, I am not precious about retaining the current table structure and would like an open ended solution where I can add further data/functionality in a dynamic fashion, which nested sets seem to embody.
Does this make sense to anybody coz I think I've confused myself even more!!
hi! I've been using sql server for a while but until recently have kept things pretty simple. now I'm trying to expand my horizons by trying to tackle some more complex applications, and one I'm really struggling with is nested relations. I hope this is in the right forum; if it is not, please feel free to move it, thank you!
here is my problem: I'm designing a simple "Downloads" page in asp.net, and I have two tables set up. One is Downloads and the other is DownloadCategories. here is a simplified layout of the tables:
Downloads:
    ID (PK)
    Title
    Description
    CatID (FK)
DownloadCategories:
    ID (PK)
    Name
    Description
    ParentID (FK)
basically CatID in downloads is a foreign key to the ID in downloadcategories, and ParentID is a foreign key to the ID in the same table, downloadcategories. This is set up because I want to support an infinite number of categories, each being able to support their own subcategories, which can go deeper into more subcats, and so on...
the problem is that I can't seem to get them to fill into the dataset I've created in vs 2005's designer. I have a procedure SelectAll which retrieves all rows, and SelectMain which retrieves only the topmost categories (where ParentID=Null). If I fill the datatable with SelectMain, I don't get any of the child categories, and if I call SelectAll, I get just a single table with all the rows, but no relations.
I have defined a relation in the datatable that mirrors that of the database, but no matter what I try, I cannot get it to show the relationship in the datatable when I fill it. am I doing something wrong? this is kind of how I have it setup:
dataset with Downloads and Categories datatables, relations from Categories to Downloads, and from Categories to Categories. I have the two queries added to the table adapter, and I call the SelectAll query to fill it, but all it does is fill the table with rows; it doesn't create any relations.
I hope this explanation of my problem makes sense. as I said I'm still very new to this complex stuff, and I'm hoping to get my head around it soon, because I really need the functionality (not to mention the skills!) so if you can take a moment to go over what I've explained and point out where my flaw might be, I would really appreciate it!!
if you need any more information to help, please let me know and i'll get right back to you. thanks a bunch!
I have an SQL file that contains several queries that generate numbers to populate a SQL table. The SQL file is too large for a single SP, so I am nesting them. I have 4 nested stored procedures. Each of the queries in each stored procedure dumps into its own global temp table. The final stored procedure needs to insert into a SQL table all the information gathered in the global temp tables. So the final stored proc looks something like:
Create procedure usp_myProc_4 AS
EXEC usp_myProc_3
INSERT INTO mySQLTable (a,b,c) SELECT a, b, c FROM ##myTempTable (which was created in usp_myProc_1)
INSERT INTO mySQLTable (a,b,c) SELECT a, b, c FROM ##myOtherTempTable
INSERT INTO......etc;
I have done this before and it worked fine. The only difference is that when I did this before, these insert statements were being called from within an sp_makewebtask procedure.
Now when I try to save this final stored procedure it tells me "Invalid Object Name: ##myTempTable"
How do I call on these global temp tables from my final nested stored procedure?
I have a problem in designing the tables. My main task is to learn how to give the Match Score.
I have hundreds of datasets and one of them is like this:
Test Record Number: 19
Prospect ID = 254040088233400441105260031881009
Match Score = 95
Input Record Fielding ( eg wordnumber[Field] ) : 1[1] 2[1] 3[11] 4[11] 5[11]
Prospect Word = 1 type = 1 match level = 4 input word = 1 input type = 1
Prospect Word = 2 type = 2 match level = 0 input word = NA input type = NA
Prospect Word = 3 type = 3 match level = 4 input word = 2 input type = 1
Prospect Word = 4 type = 11 match level = 4 input word = 3 input type = 11
Prospect Word = 5 type = 13 match level = 4 input word = 4 input type = 11
Prospect Word = 6 type = 14 match level = 4 input word = 5 input type = 11
Now I have all my data stored in the DB and I separated them into 3 tables and their structures are:
and the prspid in table 2 & 3 refers to the prospectID in table 1.What I did was setting:
a) prospect table as case table with id AS key, prospectID AS input & predictable;
b) and the other two as nested tables with inputword/inputfield AS key & input, prospectword/prospecttype/matchlevel/inputword/inputtype AS key & input .
But it shows an error for having multiple key columns...
And also I am thinking about using the Naive bayes algorithm. Can I also have some suggestion on this?
Hi ...I can't figure out how to put nested tables into the Data Mining Model Training Transform (SSIS). Can anybody help me? some example please...!!!?? Diego B.
I am trying to associate the city in the patient table with the disease in the diseases table. I want to build an association data mining model and use it on a web form, so that when the user enters a city, the associated disease is displayed.
Should I select all 3 tables to build the model? Could you help me decide which tables I should select as Case and which as Nested? Which attributes from the tables should I select as key, input, and predictable?
I am using the data mining tutorials on sqlserverdatamining.com to build this model. If I run into further confusion while building the model, please suggest where I can find a complete resource, or let me know here.
I appreciate Mr. Jamie's guidance so far on my academic project. I do have the book 'Data Mining with SQL Server 2005'. I am left with just one day to do this and document it.
hoping someone could suggest. your help is much appreciated.
I have a requirement to create nested tables in SQL Server. How do I create them? For background, I am trying to move the RDBMS from Oracle to SQL Server.
Structure of tables is as follows. I have table 'Employees' with address as one of the column. I have one more table with columns Street, Town, Dist, State. When I query the table 'Employees' I should see the attribute name and values of all the columns in address table in address column.
Employees: with columns: ID, FirstName, LastName, dept, gender, dob, address
Address (Nested table): with columns : Street, Town, Dist, State
This was done in Oracle using nested tables and user-defined data types. What is the alternative in SQL Server? How can I achieve this requirement in SQL Server?
I have this INNER JOIN that is fine to show all possible combinations. But I need to show only rows that have one or more Null values in tbIntersect.
Should I use nested LEFT JOINs? How?
This is the SQL statement:
sSQL = "SELECT DISTINCT tbCar100.Car100_ID, tbCar100.Description100 AS [Caractéristique 100], " & _
       "tbCar200.Car200_ID, tbCar200.Description200 AS [Caractéristique 200], " & _
       "tbCar300.Car300_ID, tbCar300.Description300 AS [Caractéristique 300], " & _
       "tbCar400.Car400_ID, tbCar400.Description400 AS [Caractéristique 400], " & _
       "tbCar500.Car500_ID, tbCar500.Description500 AS [Caractéristique 500], " & _
       "tbCar600.Car600_ID, tbCar600.Description600 AS [Caractéristique 600], " & _
       "tbCar700.Car700_ID, tbCar700.Description700 AS [Caractéristique 700], " & _
       "tbProducts.Prod_ID, tbProducts.PartNumber AS [Part Number] , tbProducts.Description AS [Description] , tbProducts.DateAdded AS [Date] " & _
       "FROM tbProducts INNER JOIN (tbCar700 INNER JOIN (tbCar600 INNER JOIN (tbCar500 INNER JOIN (tbCar400 INNER JOIN (tbCar300 INNER JOIN (tbCar100 INNER JOIN " & _
       "(tbCar200 INNER JOIN tbIntersect ON tbCar200.Car200_ID = tbIntersect.Car200_ID) " & _
       "ON tbCar100.Car100_ID = tbIntersect.Car100_ID) ON tbCar300.Car300_ID = tbIntersect.Car300_ID) ON tbCar400.Car400_ID = tbIntersect.Car400_ID) ON tbCar500.Car500_ID = tbIntersect.Car500_ID) ON tbCar600.Car600_ID = tbIntersect.Car600_ID) ON tbCar700.Car700_ID = tbIntersect.Car700_ID) ON tbProducts.Prod_ID = tbIntersect.Prod_ID " & _
       ";"
Here is the content of the tbIntersect table:
Car100_ID  Car200_ID  Car300_ID  Car400_ID  Car500_ID  Car600_ID  Car700_ID  Prod_ID  ID
1          1          1          1          1          1          1          1        1
1          2          1          1          1          1          1          NULL     19
1          3          NULL       1          1          1          1          1        20
I need to return the rows that have null data, e.g. the second row because Prod_ID is NULL and the third row because Car300_ID is NULL. In fact I need the data from the other joined tables that correspond to these ID fields.
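One way to picture the requirement above is with LEFT JOINs from tbIntersect plus an IS NULL filter. The sketch below runs against an in-memory SQLite database from Python, so the join syntax is SQLite's and not the dialect of the original statement; only two of the lookup tables are included, and the description and part-number values are hypothetical sample data. The tbIntersect rows follow the sample in the post, with the NULLs placed as the post describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbIntersect (
    Car100_ID INTEGER, Car200_ID INTEGER, Car300_ID INTEGER,
    Car400_ID INTEGER, Car500_ID INTEGER, Car600_ID INTEGER,
    Car700_ID INTEGER, Prod_ID INTEGER, ID INTEGER);
CREATE TABLE tbCar300 (Car300_ID INTEGER, Description300 TEXT);
CREATE TABLE tbProducts (Prod_ID INTEGER, PartNumber TEXT);
""")
conn.executemany(
    "INSERT INTO tbIntersect VALUES (?,?,?,?,?,?,?,?,?)",
    [
        (1, 1, 1, 1, 1, 1, 1, 1, 1),
        (1, 2, 1, 1, 1, 1, 1, None, 19),   # Prod_ID is NULL
        (1, 3, None, 1, 1, 1, 1, 1, 20),   # Car300_ID is NULL
    ],
)
# Hypothetical lookup rows, for illustration only.
conn.execute("INSERT INTO tbCar300 VALUES (1, 'Sample 300')")
conn.execute("INSERT INTO tbProducts VALUES (1, 'PN-001')")

# LEFT JOIN keeps intersect rows even when the lookup key is NULL;
# the WHERE clause then keeps only rows with at least one NULL key.
rows = conn.execute("""
    SELECT i.ID, c3.Description300, p.PartNumber
    FROM tbIntersect AS i
    LEFT JOIN tbCar300 AS c3 ON c3.Car300_ID = i.Car300_ID
    LEFT JOIN tbProducts AS p ON p.Prod_ID = i.Prod_ID
    WHERE i.Car300_ID IS NULL OR i.Prod_ID IS NULL
    ORDER BY i.ID
""").fetchall()
print(rows)
```

The same pattern would extend to the remaining tbCar tables by adding one LEFT JOIN and one IS NULL test per key column.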
Hello, I have 2 separate tables of information about people. Table A, with a key column named Anumber, contains mobile telephone numbers, and table B, with a key column named Bnumber, also contains mobile telephone numbers.
These two key columns have the same data type and hold the same information (mobile phone numbers).
Some mobile numbers from table A exist in table B, so I wanted to run a clustering algorithm in order to gain information from the two tables.
I created a new table C with all the distinct mobile numbers found in tables A and B, set the Cnumber column as the key column, and linked it with the equivalent columns Anumber and Bnumber from tables A and B.
However, when I design a training model in Business Intelligence Development Studio (New Mining Structure) and set table C as the case table and A and B as nested tables, in the "Data Mining Wizard > Specify the columns used in your analysis" window the key columns from table B and C DO NOT appear at all. So if I click Next I get a warning (You have not defined a key column for the nested tables). I proceed and add the key columns manually from the Mining Structure tab (dragging and dropping the key columns from the data source view), but when I run the clustering algorithm the results don't make sense at all, as you can see at
Quick question, I hope, I am trying to create a table that has a column that is a nested table in SQL Server 2005 Express Edition. Any ideas how I could go about doing this?
I can't figure out how to put nested tables into the Data Mining Model Training Transform (SSIS). I can do a simple case table, but how do you get those nested tables with DM Training Transformation? Any ideas? Samples?
Actor train nested table:
ID   MovieID   Gender
1    1         F
2    1         M
3    1         F
4    1         F
5    2         M
6    2         M
7    2         F
8    3         F
9    3         F
10   4         M
11   4         M
12   4         F
13   4         F
14   5         F
15   5         M
We want to build a classifier model in order to predict the Class of a Movie based on the Gender of the movie's actors. To deal with the nested table, Analysis Services maps each record of the nested table to an attribute of the case table. These attributes are named Actor(n).Gender with n = 1..15, and so they are dependent on the nested table record numbers. Both the Microsoft Decision Trees and Microsoft Naive Bayes algorithms use these attributes without any modification.
We are implementing a Relational Naive Bayes algorithm and we are planning to aggregate such attributes in order to make them independent of the nested table record numbers.
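The aggregation idea above can be sketched as follows: count genders per movie, so that the resulting features depend only on the movie and not on actor record numbers. This is only an illustration of the idea, assuming a simple count aggregate; the function name is hypothetical and it is not the Relational Naive Bayes implementation itself. The rows are the Actor training rows from the post.

```python
from collections import Counter, defaultdict

# Actor training rows from the post: (ID, MovieID, Gender)
actors = [
    (1, 1, "F"), (2, 1, "M"), (3, 1, "F"), (4, 1, "F"),
    (5, 2, "M"), (6, 2, "M"), (7, 2, "F"),
    (8, 3, "F"), (9, 3, "F"),
    (10, 4, "M"), (11, 4, "M"), (12, 4, "F"), (13, 4, "F"),
    (14, 5, "F"), (15, 5, "M"),
]

def gender_counts_per_movie(rows):
    """Aggregate nested rows into per-movie gender counts.

    The actor ID is deliberately ignored, so unseen actors in new
    cases still map onto the same aggregated attributes.
    """
    counts = defaultdict(Counter)
    for _actor_id, movie_id, gender in rows:
        counts[movie_id][gender] += 1
    return {movie: dict(c) for movie, c in counts.items()}

features = gender_counts_per_movie(actors)
print(features[1])
```

With features of this shape, a test movie whose actors never appeared in training still produces attributes (a count of F and a count of M) that exist in the model.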
As the next step, we tried to predict some unseen cases, and here we face a very big problem.
Let's take two more tables of unseen cases:
Movie test table:
ID   Class
6    +
7    NULL
8    NULL
Actor test nested table:
ID   MovieID   Gender
1    6         F
2    6         M
3    6         F
4    6         F
16   7         F
17   7         M
18   7         F
19   7         F
20   7         F
21   8         M
22   8         M
23   8         F
Predicting the Class of movie 6 is not a problem, since its actors were included in the training dataset: when the records are mapped to attributes, those attributes already exist in the model. But when you try to predict movies (7 and 8) with unseen actors, all the new attributes are simply ignored in the ALGORITHM::Predict call (in_ulCaseValues is zero!) because they do not exist in the model!
I am working on a query application, and I want to do syntax validation before I submit the dynamic SQL to the database. The expression will include ANDs, ORs, IN, (, ), >, <, etc. Has anyone done this already? Any code snippets?
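A snippet of the kind asked for above might be a small recursive-descent checker. The sketch below accepts only an assumed, very small grammar (comparisons, IN lists, AND, OR, parentheses); it is not a T-SQL parser, and all names in it are hypothetical.

```python
import re

TOKEN = re.compile(r"""\s*(?:
      (?P<num>\d+(?:\.\d+)?)
    | (?P<str>'(?:[^']|'')*')
    | (?P<op><=|>=|<>|!=|=|<|>)
    | (?P<punct>[(),])
    | (?P<word>[A-Za-z_][A-Za-z0-9_]*)
)""", re.VERBOSE)
KEYWORDS = {"AND", "OR", "IN"}

def tokenize(text):
    """Split text into (kind, value) pairs; raise ValueError on junk."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character near position {pos}")
        kind, value = m.lastgroup, m.group(m.lastgroup)
        if kind == "punct":
            kind = value                      # "(", ")" or "," is its own kind
        elif kind == "word" and value.upper() in KEYWORDS:
            kind = value.upper()
        tokens.append((kind, value))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i][0] if self.i < len(self.tokens) else None

    def expect(self, *kinds):
        if self.peek() not in kinds:
            raise ValueError(f"expected {' or '.join(kinds)} at token {self.i}")
        self.i += 1

    def expression(self):                     # expression := term (OR term)*
        self.term()
        while self.peek() == "OR":
            self.i += 1
            self.term()

    def term(self):                           # term := factor (AND factor)*
        self.factor()
        while self.peek() == "AND":
            self.i += 1
            self.factor()

    def factor(self):                         # '(' expression ')' | comparison
        if self.peek() == "(":
            self.i += 1
            self.expression()
            self.expect(")")
            return
        self.expect("word")                   # column name
        if self.peek() == "IN":               # column IN (v1, v2, ...)
            self.i += 1
            self.expect("(")
            self.expect("num", "str")
            while self.peek() == ",":
                self.i += 1
                self.expect("num", "str")
            self.expect(")")
        else:                                 # column <op> value
            self.expect("op")
            self.expect("num", "str", "word")

def is_valid(text):
    """Return True if text fits the small grammar sketched above."""
    try:
        parser = Parser(tokenize(text))
        parser.expression()
        return parser.peek() is None          # reject trailing tokens
    except ValueError:
        return False

print(is_valid("age > 30 AND (city IN ('Paris', 'Rome') OR status = 'A')"))
```

A checker like this only catches structural mistakes such as unbalanced parentheses or a missing operand; it says nothing about whether the column names exist.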