I have MS Time Seeries model using a database of over a thousand products each of which has hundreds of cases. It amazingly takes only a few minutes to finish processing the model, but when I click Mining Model Viewer to view the models, it takes many hours to show up. Once the window is open, I can choose model for different products almost instantly. Is this normal?
I have a very simple time series model which processing works fine without any problem. However when I run the following query
SELECT
[TimeSeries].[PriceChange],
[TimeSeries].[Symbol],
PredictTimeSeries(PriceChange, -3, 2)
From
[TimeSeries]
WHERE
[TimeSeries].[Symbol] = 'x'
I get the following error:
TITLE: Microsoft SQL Server 2005 Analysis Services ------------------------------ Error (Data mining): A time series prediction was requested with a start time further in the past than the internal models of the mining model, TimeSeries, specified in the HISTORIC_MODEL_GAP and HISTORIC_MODEL_COUNT parameters can process.
The following is the excerpt of the minding model script related to the two parameters:
<AlgorithmParameters>
<AlgorithmParameter>
<Name>MISSING_VALUE_SUBSTITUTION</Name>
<Value xsi:type="xsdtring">Previous</Value>
</AlgorithmParameter>
<AlgorithmParameter>
<Name>HISTORIC_MODEL_GAP</Name>
<Value xsi:type="xsd:int">1</Value>
</AlgorithmParameter>
<AlgorithmParameter>
<Name>HISTORIC_MODEL_COUNT</Name>
<Value xsi:type="xsd:int">10</Value>
</AlgorithmParameter>
</AlgorithmParameters>
These HISTORIC_MODEL_GAP (1) and HISTORIC_MODEL_COUNT (10) should accommodate PredictTimeSeries(PriceChange, -3, 2). Could anyone shed some light on this?
Is it possible to group many time series into clusters by using the clustering algorithm of the SQL server 2005. The same question applies to "association rules" technique. Any examples?
I am about to prepare a paper concerning the field of real-time data mining. Real-time here means the process of incremental training of an existing model as soon as the data arrives.
There is a number of papers introducing algorithms for incremental association analysis, incremental clustering etc. Stream mining Ãs a field which is closely related to that. The main reason for the implementation of incremental algorithms is a) the large amount of data to be mined and b) the high rate of new data that is evolving every day.
Using classical batch mining algorithms, models that are outdated for some reason, would have to be re-trained, which could be very time consuming for billions of records. And once the training is completed, the training would have to be restarted once again because a bulk of new data has been arrived.
The question that I would like to discuss now is: For what real world applications would it be a meaningful or even essential to use real-time training of models?
Two main reasons could determine the answer to that question:
You just want to incorporate new data into existing models in order to increase the prediction accuracy of your model or Your underlying data is subject to more or less massive changes (also refered to as concept drift) and you want to adapt your mining model continuously to that reality.
I'm looking for some examples or ideas where one of these cases apply and it would be a good idea to have incremental mining algorithms involved.
I'm looking forward to inspiring some discussion on that issue.
Again I encountered a very strange problem which displayed the predicted attribute values as percentage format? The data type of the attribute is actually double, why is that?
That's really frustrated.
Thanks a lot in advance for your kind advices and I am looking forward to hearing from you shortly.
I encountered a very strange problem again. Why the time series displayed on the chart are so strange? The Key time column I chose for my time series algorithm is cal_month(e.g 199001...), but why the date displayed on the time series chart is like :05/06/2448? (it should be like 199001..?) What is that data? And where exactly did it come from? What is the exact cause of this?
Hope it is clear for your help.
I am really confused on this and thanks a lot for your kind advices and help and I am looking forward to hearing from you shortly.
I want to hear from you for your experiences on how can we be convinced by what the model predicts particularly for Time Series algorithm? As we are not able to see the model accuracy chart for that, in this case, how can use the model results? E.g. we wanna know the possible sales amount of next month for a particular store in order to buy in the goods, in this case, how can we make the most of the prediction by the model? To shop ower, well, if the result is too far away from the usual sales amount, then it is unbelievable, thus, in this case, what else can we try? Keep training the models until its results sounds reasonable? Or what else can we try?Thanks in advance for your advices and help and I am looking forward to hearing from you shortly.
Hi, I have built a Time Series model to forecast Sales Value. I have used data form Jan 2005 to Sep2006 to train the model. When I write a DMX query to forecast the sales value for next 5 months, it is throwing me the output in the following format.
I have built a Sales Forecast model to predict the sales value. Along with making historic predictions for previous time periods I also want to retrieve the actual sales values for those periods.
How can I achieve this in a time series model?
I also would like to know how do mining models store the data.
Do they store the data in the same table/view format as their respecive data source view or in the Model Content format.
As the Microsoft Time Series algorithm implementation is based upon the Autoregressive Tree approach described in:
C. Meek, D. M. Chickering, D. Heckerman. Autoregressive Tree Models for Time-Series Analysis. In Proc. 2nd Intl. SIAM Conf. on Data Mining, 2002 (SDM-02). SIAM, pp. 229 €“ 244. http://www.siam.org/meetings/sdm02/proceedings/sdm02-14.pdf.
The model estimated is refererred to as an instance of "... autoregressive tree models of length p, denoted ART(p). An ART(p) model is an ART model in which each leaf node of the decision tree contains an AR(p) model, and the split variables for the decision tree are chosen from among the previous p variables in the time series..." (see the last paragraph of p. 2 of the paper).
What is the value of "p" used in the Microsoft Time Series implementation -- specifically, how many previous time series variables are used in estimating the model? It doesn't appear that this value can be specified in the algorithm parameters -- is that correct?
I am trying to model data in analysis services with the Advance Create Mining Model function in the excel addin. I am having trouble creating an association model that works like the Associate button above the Advanced button.
The format of my data is like this
OrderID Product
100 Bike
100 Helmet
100 Shoes
200 Helmet
200 basketball
200 Bat
300 Shoes
300 Socks
The associate button works perfectly since it asks me which column is the transaction id (orderid) and which column I am trying to predict (product). The advanced create mining model asks me to determine what the columns are...
OrderID=key Product=Input+Predict?
When I run the advance create mining model associate, I get a browser that gives me no rules and the support for only one item itemset (each product but no combination of products).
Does anyone know what I have to do to get it to work like the associate button?
I perform data mining on all products and a specific product category. Do I need to create 2 data source views, one for all products and the other one for the specific product category? Afterward, I run the Data Mining Wizard 2 times to create 2 mining structures. I also need to add the same mining model (e.g. Bayes, Cluster) to each of these mining structures. Is there any simple way to do it?
I have a separate date dimension marked as Date table in Tabular Model and having proper relationship with another table(e.g SalesTable) with column of date type. But still time intelligence functions are not working. I am using the date column from other table (e.g SalesTable) in the formula. What exactly going wrong in our tabular model.
Real World: Backup Strategy and implementation, how? A quote:
€œReal World:
Whether you back up to tape or disk drive, you should use the tape rotation technique. Create multiple sets, and then write to these sets on a rotating basis. With a disk drive, for example, you could create these back files on different network drives and use them as follows:
//servername/data1drive/backups/AWorks_Set1.bak. Used in week 1, 3, 5 and so on for full and differential backups. //servername/data2drive/backups/AWorks_Set2.bak. Used in week 2, 4, 6 and so on for full and differential backups. //servername/data3drive/backups/AWorks_Set3.bak. Used in the first week of the month for full and differential backups. //servername/data4drive/backups/AWorks_Set4.bak. Used in the first week of the quarter for full and differential backups.
Do not forget that each time you start a new rotation on a tape set, you should overwrite the existing media. For example, you would append all backups in week 1. Then, when starting the next rotation in week 3, you would overwrite the existing media for the first backup and then append the remaining backups for the week.€?
I understand these concepts, however in €˜the real world€™ how do you go about implementing these jobs in SQL2K and how on earth do you schedule the tasks to overwrite, for example, week 1, when on week 3€™s rotation.
Could I have real world examples or scripts for the jobs that would carry out this task? It appears that whatever course you do, it does not fully cover the above, and I have only worked on my own and never with a DBA, so I have never seen this implemented in any environment.
I would like full details on this please, as I need to get my head around it.
I get the following error when I try to load the mining model in the mining model viewer
Query (1, 6) The '[System].[Microsoft].[AnalysisServices].[System].[DataMining].[NeuralNet].[GetAttributeValues]' function does not exist.
I get a similar error when I try to load the Load Mining Accuracy Chart
Failed to execute the query due to the following error:
Query (1, 6) The '[System].[Microsoft].[AnalysisServices].[System].[DataMining].[AllOther].[GenerateLiftTableUsingDatasource]' function does not exist.
I have traditionally done web app and client server programming and am have been playing around with a new tool (Win forms) app that I eventually want to distribute to other developers and a couple of business people in our organization. I have done apps in the past that all connect to a central server for data access. The I'm working on now will have an individual DB per user and should be available locally for a desktop version of the software.
I am looking for more resources on things to consider when deploying an app with either MSDE or SQL 2005 Express. More specifically, items like long term maintenance of the db once it's installed with the user, etc. (DB bloat, transaction files, auto maintenance routines I may want to build in, etc)
I have seen all of the msdn docs on what you need to deploy (and how to do it), but I'm looking fro input from people that have deployed it and any significant pitfalls they have run into with that sort of deployment.
Any links or book references would be appreciated, thank you,
Hello: I would like to send data feeds to Sales Reps listing customers with their likelihood of purchasing a product from a certain Category. The data will basically contain Customer Name, Category, Likelihood Score (or Probability Score). When sorted in Descending order of Probabililty a Sales Rep can easily see the customers who are most likely to purchase at the top of the list and can call them.
I figured I would explore the use of SSAS data mining for this, and integrate it within an SSIS data flow (needed to send the data out to the sales reps).
I have not worked on Data Mining and so am clueless as to where to begin. I have been reading a book on it but most of the examples this far show me how the results can be viewed within AS, which is not what I am looking for.
Ideally I would like to run an algorithm against all our product categories for 2007 and store the results to a table. Then use that table as a lookup to get the probability score for specific categories that we are targeting in 2008.
I appreciate any advice for the experienced SSAS folks on this forum.
public Set DoSomthing(Set toBeProcessed, Set measuresToWorkWith)The set measurseToWorkWith is passed as {[Measures].[Measure1], [Measures].[Measure2] ...}
with the measures being real or query-scoped calculated members.
To get the value of the measure for each tuple in the set toBeProcessed, I create an Expression for each tuple (measure) in the set measuresToWorkWith then for each tuple in toBeProcessed call expression.Calculate(tuple) which returns a MDXValue.
My problem is that in order to make the code generic I need to get the real (.NET) data type of the MDXValue. The class only has explicit conversion methods ToInt16() etc which implies that the data type is known at design time.
However, if one of the measures is a query-scoped calculation then it could return a .NET double, int, bool or string.
If the measure is real then I can look up its metadata. However, it appears that if it is a formula (scoped member) then are all bets are off?
With SASS Database i have created Data mining Structure Using Time series algorithm, while processing the SSAS db, Data mining  taking long time to process, so how we can  reduce processing time ???
Hi! I want to get hole through to Analysis Services 2005 from MS SQL Server 2005 with openquery, so I created linked server (MSOLAP). How to test that I can access an "database" on Analysis Services Server?
Something like select * from openquery(linked_olap, getdate()) or openquery(linked_olap, 'print hello world') or openquery(linked_olap, select results from dataminingmodelCreatedWithVstudio) ? Thanks Bjørn
I've been doing research on the web trying to find if it's possible to create a Report Model that is based on a Data Source that is NOT SQL Server and NOT SSAS.
Specifically I'd like to be able to build a Report Model on a Web Server (or multiple web services).
Does anyone have any information relating to this? Based on my research it looks like there's a few interfaces that need to be implemented (such as ISemanticModelGenerator, etc.).
I did find a post relating to using ODBC data sources for a Report Model and the recommended solution was Linked Tables in SQL Server or UDM in SSAS. Both cases don't look feasible for a Web Service-based Data Source.
I am trying to write some DMX queries to create and populate a mining model as a SSMS analysis services project. I followed the following steps:
1. Create the mining model using a CREATE MINING MODEL ... query.
2. Followed by the - INSERT INTO MINING MODEL ... query, which fetches prediction data from another mining model to populate the mining model.
3. I now want to use this new model for prediction, which requires processing the mining model first. When I process the model, it throws the following error:
Error (Data mining): The 'XYZ_Structure' structure does not contain bindings to data (or contains bindings that are not valid) and cannot be processed.
Please suggest if I am making a mistake in the above procedure. I will appreciate all help in overcoming this issue.
Auxilliary question: How do I process the mining model programmatically?
I tried to find the graphs I saved from Data Mining Model Viewer, but where is it saved? As we see from the mining model viewer we could save the graphs there by clicking the 'save graph' button, but where is the graph?
Really need help for that.
Thank you very much in advance for any guidance and help for that.
I currently have an application which retrieves stats from a SQL database that are updated throughout the day. My application updates every 3 seconds to retrieve these stats but I found that it's quite expensive to the server's CPU when there are several clients running my application. Are there any other methods for extracting the data that won't require so much CPU. The fact that the database is being hit so much isn't good either. I have already written a webservice hoping that this would improve things but I'm not sure it has.
I am new to SQL Server 2005 Analysis Services and would like to use the OLAP Cubes as a datasource to build Mining Model . However i would like to use a particular view of the OLAP cube that i have generated to be used as the datasource for the mining model . I find that i am not able to save the Cube View while browsing the OLAP cube in Business Intelligence Studio. Is there a way i can acheive this requirement.
Any ideas regarding this will be really appreciated.
Hi I´m a SysAdmin and I´m implementing ASSP (anti-spam) to use with an email server. But the ASSP requires a txt file with all domains hosted in the server that are in a SQL Table of the server. My questions is: Is there anything to make a real time TXT file with all the domains ? maybe use trigger I dont know..