I have a question about writing a prediction query against a clustering model that has the same column added more than once.
Per Jamie, I can accomplish some crude weighting by adding a column to my model multiple times. See this post for an explanation... Now that I have that worked out, I was wondering how my DM query would look. If I have Input_A1, Input_A2, and Input_A3 all sourced from the same column in my structure, do I have to reference all three when writing my prediction query?
I have a market basket model using associations. It generated several dozen itemsets. However when I attempt to run a singleton prediction like this:
select (Predict(Orderproduct3q,INCLUDE_STATISTICS,10)) as [Recommendation]
From
[Case All]
NATURAL PREDICTION JOIN
(SELECT (SELECT '16407' AS [Pname])) AS t1
the resulting predictions don't take the itemsets into account. Instead, the predictions consist of the ranked products in the training set, ordered by frequency. This appears to happen regardless of the precise query specified within the "natural prediction join".
What's going on here and how do I generate a singleton prediction which makes use of the itemsets?
Is there a way to display the actual predicted value for an output attribute for a particular model? For example, say I am trying to predict if a particular customer is going to take advantage of a promotion (0=no, 1=yes) and I use neural networks. I know that I can use "Predict" to give me the prediction "yes" or "no" for each customer. However, the neural network actually spits out a number as a result. For example, a 0.997 would be interpreted as a "yes" for life insurance promotion. I do not want the probability that the prediction is correct. I want the actual output of the network.
The reason is that I want to compute an error rate between the predicted value and the actual value (root mean squared error or some other measure). Is there a way to compute this using the mining model prediction tab design view? I do not want to write the actual query, as I teach a course in data mining using SQL Server and my students do not know DMX queries.
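However the raw scores are obtained, the error measure itself is easy to compute once the predicted and actual values sit side by side. Below is a minimal sketch in Python, assuming the scores have already been exported into two lists; the names `predicted` and `actual` and the sample values are illustrative only and do not come from any model.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one pair of values")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical network outputs for four customers and the observed outcome
# (1 = took the promotion, 0 = did not).
predicted = [0.997, 0.120, 0.640, 0.050]
actual = [1, 0, 1, 0]

print(round(rmse(predicted, actual), 4))
```

The same arithmetic (average of squared differences, then square root) can be expressed in whatever tool holds the exported results.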
I have built a time series model to forecast sales value.
I have data from Jan 2004 to Jan 2006, and the sales value is at a day level in my database, but I am aggregating it to month level in the DSV of the mining model.
I am required to make only historical predictions using the above model, starting from Jan 2004 to Jan 2006, for every month.
I have set the HISTORIC_MODEL_COUNT and HISTORIC_MODEL_GAP parameter values to 24 and 10 respectively, and I am trying to predict for the past few months (PredictTimeSeries(SalesValue,-1,1)).
But it is throwing the following error:
Error(Data Mining): A time series prediction was requested with a start time further in the past than the internal models of the mining model, Sales Forecast, specified in the HISTORIC_MODEL_GAP and HISTORIC_MODEL_COUNT parameters can process
In fact it throws the above error irrespective of what the HISTORIC_MODEL_COUNT and HISTORIC_MODEL_GAP parameter values are.
I am not able to figure out why this problem is happening.
What should the parameter values be for the above scenario?
It would also be helpful if I could get an explanation of how these two parameters affect the historical predictions. I kind of understand that these two parameters are important for historical predictions, but I don't know why or how.
Hi. Thanks a lot for your answer. I posted my request before, but I don't know why nobody answers me. I have a project about predicting the sales of a good such as t-shirts, and I use data mining for it, so I should use the time series algorithm. That means I have data about t-shirt sales for the previous 11 months, and now I should predict how many t-shirts will be sold in the 12th month. My tables are saved in an Excel file, and that is a problem: how should I use these tables for building the model? After building a structure and a model, and predicting the value of the 12th month for this store, I use this query against the model in BI: "SELECT PredictTimeSeries(amount) From [Forecasting]". This query shows a column with the prediction value. Now I should show this value in an application, which I am building in C#. I have a form with a button; clicking on this button connects to my structure and shows the progress of connecting in a panel. Clicking on the other button sends the query ("SELECT PredictTimeSeries(amount) From [Forecasting]"), and then I can see the prediction value, that is, the value of the 12th month, in the textbox. The form has two buttons (one for connecting to the mining structure, the other for sending the PredictTimeSeries query to the structure), one textbox for showing the predicted value of t-shirt sales for the 12th month, and one panel for viewing the lift chart. Also, you said: "You are building an application that programmatically creates mining structure and a model, and then you want to train the model and display some results." I don't know how I can train my model. Should I use special code for it? If the answer is yes, please explain it and tell me which code is for training. If you can send C# code for the separate stages, please send it; I need this code, and my request is urgent.
Thanks a lot. I am very sad because nobody answers me.
I am new to SSAS and I want to try to build a "Sales" model. I will have some "Usage" data for some time spans, but I am not quite sure how to tackle this. Is there a how-to for this somewhere?
Edit: There are several locations, and a forecast is needed for each location. The icing on the cake would be if I could tell where my supplies must go first to achieve the best sales...
The potential client wants to use Oracle, but I would like to show them that SQL Server is the better tool for this ;)
I have a database table which has all the inputs, the key and the result. In Visual Studio, I created a decision tree model which has exactly the same fields as the table. However, Visual Studio automatically adds a space before the capital letters. As the field names in the data mining model and those in the database table are slightly different, I cannot use NATURAL PREDICTION JOIN. Is there any way to tell Visual Studio not to add the spaces to the variable names?
I am using BI Dev Studio for SS2005 in a research (as opposed to a production) environment. Often I want to compare the results of multiple models using the same attributes. If I switch to a different model, the Design view completely resets. Is there any way to retain the same field names with different models in the Design view?
My current workaround is to give my models similar names with AR, DT, CL, LOG, NN suffixes and make global changes in the DMX.
I have consulted the following without finding an answer: http://msdn2.microsoft.com/en-us/library/ms178445.aspx http://msdn2.microsoft.com/en-us/library/ms175642.aspx http://msdn2.microsoft.com/en-us/library/ms175678.aspx http://msdn2.microsoft.com/en-us/library/ms175637.aspx
I've developed a Clustering Model with a total population of 25,000 customers. As a result I have 10 clusters. The variables are products.
My question is:
How can I automatically detect which clusters to select if the Product Manager of Product A (one of the variables) asks me for a database of customers that are more willing to buy Product A? In that case I'd like to select the clusters where the penetration of Product A is more than 65%, and select the customers of those clusters that don't have Product A.
Basically the question is: how do I detect with a query which clusters have a penetration of a given product of more than X%?
Which query should I use to extract that information?
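To make the selection rule concrete, here is a rough sketch in Python of the logic described above: compute the penetration of a product per cluster, keep the clusters above the threshold, then list the customers in those clusters who do not own the product. It assumes each customer's cluster assignment and product set are already available (for example, from the output of a prediction query); the record layout, function names and sample data are all hypothetical.

```python
from collections import defaultdict


def clusters_above_penetration(customers, product, threshold):
    """Return {cluster: penetration} for clusters where the share of
    customers owning `product` is strictly greater than `threshold`."""
    totals = defaultdict(int)
    owners = defaultdict(int)
    for c in customers:
        totals[c["cluster"]] += 1
        if product in c["products"]:
            owners[c["cluster"]] += 1
    return {
        cluster: owners[cluster] / totals[cluster]
        for cluster in totals
        if owners[cluster] / totals[cluster] > threshold
    }


def prospects(customers, product, threshold):
    """Customers in high-penetration clusters who do not yet own `product`."""
    hot = clusters_above_penetration(customers, product, threshold)
    return [
        c["id"]
        for c in customers
        if c["cluster"] in hot and product not in c["products"]
    ]


# Hypothetical data: five customers in two clusters.
customers = [
    {"id": 1, "cluster": "Cluster 1", "products": {"A", "B"}},
    {"id": 2, "cluster": "Cluster 1", "products": {"A"}},
    {"id": 3, "cluster": "Cluster 1", "products": {"B"}},
    {"id": 4, "cluster": "Cluster 2", "products": {"B"}},
    {"id": 5, "cluster": "Cluster 2", "products": {"A"}},
]

print(clusters_above_penetration(customers, "A", 0.65))
print(prospects(customers, "A", 0.65))
```

In this sample, Cluster 1 has a penetration of 2/3 for Product A and Cluster 2 has 1/2, so only Cluster 1 passes a 65% threshold, and customer 3 is the only prospect.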
I am using the Sequence Clustering algorithm. (I've built several models with the Clustering algorithm and Decision Trees for this client, which work fine.)
Background: Sequence data must be stored in a nested table, which can have only 1 non-key attribute. I specify a mining model structure with the nested table key as the datetime, and the nested table discrete prediction column as [sort name]. This builds the model fine.
When I try to process this data mining model, I get Process failed: "Errors in the OLAP storage engine: The sort order specified for distinct count records is incorrect".
It may be that OLAP distinct count requires a numerical data type, but not from the examples I've seen. I tried this anyway - it doesn't work on numeric either - same problem. Any suggestions?
I have a very simple time series model, and processing it works fine without any problem. However, when I run the following query
SELECT
[TimeSeries].[PriceChange],
[TimeSeries].[Symbol],
PredictTimeSeries(PriceChange, -3, 2)
From
[TimeSeries]
WHERE
[TimeSeries].[Symbol] = 'x'
I get the following error:
TITLE: Microsoft SQL Server 2005 Analysis Services ------------------------------ Error (Data mining): A time series prediction was requested with a start time further in the past than the internal models of the mining model, TimeSeries, specified in the HISTORIC_MODEL_GAP and HISTORIC_MODEL_COUNT parameters can process.
The following is the excerpt of the mining model script related to the two parameters:
<AlgorithmParameters>
<AlgorithmParameter>
<Name>MISSING_VALUE_SUBSTITUTION</Name>
<Value xsi:type="xsd:string">Previous</Value>
</AlgorithmParameter>
<AlgorithmParameter>
<Name>HISTORIC_MODEL_GAP</Name>
<Value xsi:type="xsd:int">1</Value>
</AlgorithmParameter>
<AlgorithmParameter>
<Name>HISTORIC_MODEL_COUNT</Name>
<Value xsi:type="xsd:int">10</Value>
</AlgorithmParameter>
</AlgorithmParameters>
These HISTORIC_MODEL_GAP (1) and HISTORIC_MODEL_COUNT (10) values should accommodate PredictTimeSeries(PriceChange, -3, 2). Could anyone shed some light on this?
Hi, I am not getting the Mining Accuracy Chart and Mining Model Prediction. Please tell me how to do it, and how to use "filter input data used to generate the lift chart" and "select predictable mining model columns to show in the lift chart".
Hi. After building a model in BI, I want to view the chart of the model in the Mining Model Viewer. In the chart tab I can see just one prediction value, which means that my model makes a prediction for some time slice, and in prediction steps I can specify how many steps to show. In the Mining Model Viewer tab we can see the prediction chart as well as the decision tree; the chart is for showing all of the predicted values, and by choosing the prediction steps we can specify whether it shows just one predicted value, or two, or several. But sometimes I can see just one value in the chart and sometimes I can see several values. Is this difference due to my data or not? Also, for viewing historic predictions I should choose "show historic predictions", and before that I should set two parameters: HISTORIC_MODEL_COUNT and HISTORIC_MODEL_GAP. But sometimes I can't see the historic predictions. Please help me.
I am trying to find the associations between various service activities, so that when a customer buys a service activity we can recommend others to him/her.
I'm building a mining model with MS Association Rules. After processing this model, the result includes some rules (example):
E = Existing, C = Existing -> B = Existing
F = Existing -> E = Existing
C = Existing, B = Existing -> E = Existing
F = Existing -> B = Existing
B = Existing, A = Existing -> C = Existing
F = Existing, B = Existing -> E = Existing
F = Existing, E = Existing -> B = Existing
D = Existing -> A = Existing
C = Existing -> A = Existing
E = Existing, A = Existing -> B = Existing
I want to build a query that has two or more items on the left side of the rule. For example, for E = Existing, C = Existing -> B = Existing, I want to build a query to predict that when a customer buys 'E' and 'C', he is likely to buy 'B'.
Is it possible to use two algorithms together? I need to write a prediction query that uses both models, one with the clustering algorithm and one with the time series algorithm.
For example:
I have student information. I have to predict the performance of students for a certain period. The students should be classified by type, like rich kids, poor kids, and so on. I need to predict the performance of the rich kids.
Dear friends, I'm reading Wiley's Data Mining with SQL Server 2005... There are MANY things I can't understand about the MovieClick example (Chapter 3). I hope someone can help me with these troubles...
WARNING (1): I'm a dummy both with sql server and data mining. WARNING (2): My English is not good at all.
Just two questions for now:
1) When I create the model to predict the number of bedrooms for homeowners, the book says to check BEDROOMS as Predictable... question: is it also an INPUT for the model, or PREDICTABLE only?
2) I'd like to keep this model (number of bedrooms.......) and make a prediction query.
- Query builder
- Select case table -> Homeowners
- Drag the Customer ID column from the Homeowners table and drop it on the grid
- Drag the BEDROOMS column from the mining model and drop it on the grid
- On the last row: Source=PredictionFunction, Field=PredictProbability
- Drag the BEDROOMS column from the mining model and drop it into Criteria/Argument
- Add (i.e.) 'Two or Three' to the field Criteria/Argument
I execute the query and I obtain many rows in a table with the following columns: CustomerID, BEDROOMS and Expression. WHAT DOES THIS MEAN? WHICH INFO DO I GET FROM THOSE NUMBERS? WHAT CAN I LEARN FROM THEM?
I am doing this right now this way:
1) I run the DMX prediction query where I get the PredictNodeId(predict_var). My query is like this:
SELECT PredictNodeId(predict_var), model_1.predict_var, t.var_1, t.var_2 FROM model_1 PREDICTION JOIN OPENQUERY([DATA_SOURCE_1], 'SELECT var_1, var_2 FROM table_1') AS t ON model_1.var_1 = t.var_1 AND model_1.var_2 = t.var_2
2) I run a DMX query to get the node_description from model.content, iterating over each row of the result of my prediction query. This query is like this:
SELECT node_description FROM model_1.content WHERE node_name = 'node_name_var'
In this query, node_name_var = PredictNodeId(predict_var) from my prediction query. What I want to know is whether there is a way to merge Query 1 and Query 2, so that I can get the node_description in the same query where I get the PredictNodeId.
Can I use a CASE statement in a prediction query? The following query is throwing an error:
SELECT CASE [Sales Forecast Time Series].[City Code] when 'LA' then 'Los Angeles' WHEN 'CA' THEN 'California' ELSE 'OTHERS' END, PredictTimeSeries([Sales Forecast Time Series].[Sales Value],5) From [Sales Forecast Time Series]
ERROR: Parser: The statement dialect could not be resolved due to ambiguity.
Also,
is it possible to discretize the Sales Value column (the output column of the PredictTimeSeries function) using a CASE statement?
Is there a link that gives comprehensive information on what can and can't be achieved using DMX queries?
Hi, I am a novice SSAS programmer. I need a prediction query for the time series algorithm that predicts for a particular date. I don't know how to use a WHERE condition in a prediction query.
Can anyone show me how to run a prediction query and save the results to a sql table without using the T-SQL OPENQUERY tip here http://www.sqlserverdatamining.com/DMCommunity/TipsNTricks/3914.aspx? I am looking for an example in vb.net that I can use in a SSIS script task.
I have a question about what is possible with a prediction query against a nested table. Say I have a basic customer-product case and nested table mining model like so:
Mining Model DT_CustProd
(
    [Id],
    [Gender],
    [Age],
    [Products] Predict
    (
        [ProductName],
        [Quantity]
    )
)
Using Microsoft_Decision_Trees
I can write a query to find the probability of product (and quantity) A like so:
SELECT (select * from Predict(Products,INCLUDE_STATISTICS) where ProductName = 'A' )
FROM DT_CustProd
NATURAL PREDICTION JOIN
(SELECT 'M' AS [Gender], 27 AS [AGE] ) AS t
What if I know that the query customer (M,27) in question has purchased product B, how can I use that in the prediction join to predict product A? The fact that product B was purchased might influence the prediction, right?
I believe saving prediction query results to relational tables is possible (the BI studio does it!). I am not clear on how to do this w/o the BI studio, which means if I write a DMX query and want to store its output to a relational table, how do I do it?
Hi, I have three questions about several topics. In this code:
public string ConnectionString
{
    get { return "Provider=MSOLAP.3;Data Source=localhost;Initial Catalog=Adventure Works DW"; }
}
What are Data Source and Initial Catalog, and what does this code do? And if I want to use another database, how can I change this code? (This code is from the data mining viewer client project.) And in this code:
SqlConnection cn = new SqlConnection("Data Source=localhost;Initial Catalog=AdventureWorks;Integrated Security=True");
SqlCommand cm = new SqlCommand("Select AddressID,AddressLine1 from Person.Address", cn);
SqlDataAdapter da = new SqlDataAdapter();
da.SelectCommand = cm;
DataTable dt = new DataTable();
da.Fill(dt);
this.comboBox1.DisplayMember = "AddressLine1";
this.comboBox1.ValueMember = "AddressID";
this.comboBox1.DataSource = dt;
what are comboBox1.DisplayMember and comboBox1.ValueMember, and what is the difference between them?
Another question: in the Adventure Works DW project for data mining prediction, in the forecasting model, if I want to show the result of this query in the combobox in C#, how can I do that?
SELECT PredictTimeSeries(amount) From [Forecasting]
Also, this query has a result with two columns, one for the amount and the other for the time. In SQL I can save this result to an existing table or a new table with the wizard, but I want to do this in C#. That means I connect to the forecasting model with an AdomdConnection, run this query, and then show the prediction values in a DataGridView, using the Adventure Works DW database.
Another question: in the "DataMiningViewerClient" project I changed this code, as you can see below. For this code I have a form that takes the server name and catalog name, and then by clicking on a button I want to show the chart of the model in a child form, but I can't.
public Form1 form1 = new Form1();
public string m_ServerName;
public string m_CatalogName;

public Form3()
{
    m_ServerName = "";
    m_CatalogName = "";
    InitializeComponent();
}

public string ConnectionString
{
    get { return "Provider=MSOLAP.3;Data Source=localhost;Initial Catalog=Adventure Works DW"; }
}

private void ShowModel(Panel panel, string modelName)
{
    AdomdConnection conn = new AdomdConnection();
    try
    {
        MiningModelViewerControl viewer = null;
        MiningModel model = null;
        MiningService service = null;

        // Clear any existing controls from the panel
        if (panel.HasChildren)
            panel.Controls.Clear();

        // Connect to server
        conn.ConnectionString = ConnectionString;
        conn.Open();

        // Determine the viewer type based on the model service and
        // instantiate the correct viewer
        model = conn.MiningModels[modelName];
        service = conn.MiningServices[model.Algorithm];
        if (service.ViewerType == "Microsoft_TimeSeries_Viewer")
            viewer = new TimeSeriesViewer();
        else
            throw new System.Exception("Custom Viewers not supported");

        // Set up and load the viewer
        viewer.ConnectionString = ConnectionString;
        viewer.MiningModelName = modelName;
        viewer.Dock = DockStyle.Fill;
        panel.Controls.Add(viewer);
        viewer.LoadViewerData(null);
    }
    catch (System.Exception ex)
    {
        MessageBox.Show(ex.Message, "Model Load");
    }
    conn.Close();
}
When I run this code, I get one error that says: "object not found, parameter name: index". Please look at this code and answer my question. If you can answer just one of my questions, please do. Thanks a lot for your answers. With best wishes for you.
I have to perform a weighted search. I have 2 criteria, and each will be weighted so that the weights sum to 100 (e.g. 25/75, 50/50). I am just wondering if there is an easy way to apply a weighted value in SELECTs. Sorry if this is a dumb question. Thanks, Kyle
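As a rough illustration of the arithmetic involved, the sketch below combines two criterion scores using percentage weights that sum to 100 and ranks rows by the result. It is written in Python only to show the calculation; the function name, the row layout and the sample scores are hypothetical, and the same expression could be placed in a SELECT list or ORDER BY clause.

```python
def weighted_score(score_a, score_b, weight_a):
    """Combine two criterion scores; weight_a is a percentage (0-100) and
    the second criterion implicitly gets the remaining 100 - weight_a."""
    if not 0 <= weight_a <= 100:
        raise ValueError("weight_a must be between 0 and 100")
    weight_b = 100 - weight_a
    return (score_a * weight_a + score_b * weight_b) / 100


# Hypothetical rows: (name, score on criterion A, score on criterion B).
rows = [("row1", 0.9, 0.2), ("row2", 0.4, 0.8), ("row3", 0.6, 0.6)]

# Rank with a 25/75 split: criterion B counts three times as much as A.
ranked = sorted(rows, key=lambda r: weighted_score(r[1], r[2], 25), reverse=True)
print([name for name, _, _ in ranked])
```

With the 25/75 split the scores are 0.375, 0.7 and 0.6, so the order is row2, row3, row1; a 50/50 split would weight both criteria equally.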
Hi there... I've got an interesting one, that I can't seem to get my head around. Maybe some legend out there might be able to give me a hand...
I'm looking for a way to produce a weighted set of random numbers. I'm doing some work for a client at the moment, and they want to issue 3 random "reward cards" to their members at certain times. These are a bit like discount vouchers etc. The problem is that some cards need to have a higher frequency than the others. I guess it is a similar problem to baseball cards: you buy a pack of cards, you get mostly common cards, but every now and then you get a rare card.
Here is the table setup:
CREATE TABLE [dbo].[Cards](
    [CardID] [uniqueidentifier] NOT NULL CONSTRAINT [DF_Cards_CardID] DEFAULT (newid()),
    [CardName] [nvarchar](50) NOT NULL,
    [InsertRatio] [float] NULL,
    CONSTRAINT [PK_Cards] PRIMARY KEY CLUSTERED ( [CardID] ASC )
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
INSERT INTO [dbo].[Cards]([CardName],[InsertRatio]) VALUES('Common 1', NULL) /* Null implies the card is a common card */
INSERT INTO [dbo].[Cards]([CardName],[InsertRatio]) VALUES('Common 2', NULL) /* Null implies the card is a common card */
INSERT INTO [dbo].[Cards]([CardName],[InsertRatio]) VALUES('Common 3', NULL) /* Null implies the card is a common card */
INSERT INTO [dbo].[Cards]([CardName],[InsertRatio]) VALUES('Common 4', NULL) /* Null implies the card is a common card */
INSERT INTO [dbo].[Cards]([CardName],[InsertRatio]) VALUES('Common 5', NULL) /* Null implies the card is a common card */
INSERT INTO [dbo].[Cards]([CardName],[InsertRatio]) VALUES('Common 6', NULL) /* Null implies the card is a common card */
INSERT INTO [dbo].[Cards]([CardName],[InsertRatio]) VALUES('Common 7', NULL) /* Null implies the card is a common card */
INSERT INTO [dbo].[Cards]([CardName],[InsertRatio]) VALUES('Common 8', NULL) /* Null implies the card is a common card */
INSERT INTO [dbo].[Cards]([CardName],[InsertRatio]) VALUES('Common 9', NULL) /* Null implies the card is a common card */
INSERT INTO [dbo].[Cards]([CardName],[InsertRatio]) VALUES('Rare 1', 0.02) /* 1:50 ratio */
INSERT INTO [dbo].[Cards]([CardName],[InsertRatio]) VALUES('Rare 2', 0.02) /* 1:50 ratio */
INSERT INTO [dbo].[Cards]([CardName],[InsertRatio]) VALUES('Rare 3', 0.02) /* 1:50 ratio */
INSERT INTO [dbo].[Cards]([CardName],[InsertRatio]) VALUES('Very Rare 1', 0.005) /* 1:200 ratio */
So what I need to do is have a stored proc that I can execute and that returns 3 random rows. In a single run, a card can't be duplicated.
Notice the InsertRatio column? This holds the probability ratio, e.g. a 1:50 insert ratio is equal to 0.02. For the common cards, a NULL value indicates it is a common.
Eventually, this table would have about 1000 rows in it, and about 200 of those would have various ratios (e.g. 1:50, 1:200, 1:1000, 1:8000 etc).
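One way to think about the draw, sketched here in Python rather than as a stored proc, is weighted sampling without replacement: pick one card according to its weight, remove it, and repeat. The sketch rests on an assumption the post does not state: that InsertRatio is the chance of that card on a single draw, and that the common cards (NULL, shown as None) share the remaining probability equally. The names `CARDS`, `card_weights` and `draw_cards` are hypothetical.

```python
import random

# Mirrors the sample rows above: (CardName, InsertRatio); None means common.
CARDS = [(f"Common {i}", None) for i in range(1, 10)] + [
    ("Rare 1", 0.02),
    ("Rare 2", 0.02),
    ("Rare 3", 0.02),
    ("Very Rare 1", 0.005),
]


def card_weights(cards):
    """Map card name -> per-draw weight.

    Assumption (not from the post): commons split whatever probability is
    left after all the explicit insert ratios have been subtracted from 1.
    """
    rare_total = sum(ratio for _, ratio in cards if ratio is not None)
    if rare_total > 1:
        raise ValueError("insert ratios sum to more than 1")
    commons = [name for name, ratio in cards if ratio is None]
    common_weight = (1.0 - rare_total) / len(commons) if commons else 0.0
    return {
        name: (common_weight if ratio is None else ratio)
        for name, ratio in cards
    }


def draw_cards(cards, count=3, rng=random):
    """Draw `count` distinct cards, one weighted pick at a time."""
    weights = card_weights(cards)
    if count > len(weights):
        raise ValueError("cannot draw more distinct cards than exist")
    hand = []
    for _ in range(count):
        names = list(weights)
        pick = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
        hand.append(pick)
        del weights[pick]  # a card cannot repeat within one hand
    return hand


print(draw_cards(CARDS, 3, random.Random(42)))
```

Removing a picked card slightly raises the relative chance of the remaining cards on the later picks within the same hand, so the stated ratios hold exactly only for the first pick. Whether that is acceptable depends on how strictly the 1:50 and 1:200 ratios must be honoured.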