Time Series Data Mining
May 21, 2006
Is it possible to group many time series into clusters by using the clustering algorithm of the SQL server 2005. The same question applies to "association rules" technique. Any examples?
Is it possible to group many time series into clusters by using the clustering algorithm of the SQL server 2005. The same question applies to "association rules" technique. Any examples?
I probably posted my question in the wrong forum so please check here:
http://forums.microsoft.com/TechNet/ShowPost.aspx?PostID=3009605&SiteID=17
thanks
Hi all,
I have a very simple time series model which processing works fine without any problem. However when I run the following query
SELECT
[TimeSeries].[PriceChange],
[TimeSeries].[Symbol],
PredictTimeSeries(PriceChange, -3, 2)
From
[TimeSeries]
WHERE
[TimeSeries].[Symbol] = 'x'
I get the following error:
TITLE: Microsoft SQL Server 2005 Analysis Services
------------------------------
Error (Data mining): A time series prediction was requested with a start time further in the past than the internal models of the mining model, TimeSeries, specified in the HISTORIC_MODEL_GAP and HISTORIC_MODEL_COUNT parameters can process.
The following is the excerpt of the minding model script related to the two parameters:
<AlgorithmParameters>
<AlgorithmParameter>
<Name>MISSING_VALUE_SUBSTITUTION</Name>
<Value xsi:type="xsdtring">Previous</Value>
</AlgorithmParameter>
<AlgorithmParameter>
<Name>HISTORIC_MODEL_GAP</Name>
<Value xsi:type="xsd:int">1</Value>
</AlgorithmParameter>
<AlgorithmParameter>
<Name>HISTORIC_MODEL_COUNT</Name>
<Value xsi:type="xsd:int">10</Value>
</AlgorithmParameter>
</AlgorithmParameters>
These HISTORIC_MODEL_GAP (1) and HISTORIC_MODEL_COUNT (10) should accommodate PredictTimeSeries(PriceChange, -3, 2). Could anyone shed some light on this?
Hi, all experts here,
Thank you very much for your kind attention.
I encountered a very strange problem again. Why the time series displayed on the chart are so strange? The Key time column I chose for my time series algorithm is cal_month(e.g 199001...), but why the date displayed on the time series chart is like :05/06/2448? (it should be like 199001..?) What is that data? And where exactly did it come from? What is the exact cause of this?
Hope it is clear for your help.
I am really confused on this and thanks a lot for your kind advices and help and I am looking forward to hearing from you shortly.
With best regards,
Yours sincerely,
Hi all,
I have MS Time Seeries model using a database of over a thousand products each of which has hundreds of cases. It amazingly takes only a few minutes to finish processing the model, but when I click Mining Model Viewer to view the models, it takes many hours to show up. Once the window is open, I can choose model for different products almost instantly. Is this normal?
With SASS Database i have created Data mining Structure Using Time series algorithm, while processing the SSAS db, Data mining  taking long time to process, so how we can  reduce processing time ???
View 2 Replies View RelatedHi all,
I'm wondering what is the best way to store time-series data in an SQL database?? I've done a bit of investigating on the rrdtool (round robin database tool) that is used in a lot of nix based solutions for monitoring network equipment. I have a need to collect performance data from servers and routers and then produces some nice graphs from that data. I'm just not sure who i should store that data without the database growing to some huge size.
Any suggestions?
Hi,
I've got a time series of the number of new customer subscriptions, which
is the target attribute to predict. The number of subscriptions depends on
various marketing activities, such as mailings, which are known within
the time series for the past.
If I train an ART (MS Autoregression Tree), it learns the trend pattern
as well as the correlations between the marketing activities and the
target (cross-correlations), right?
What I would like to do is, providing the model with some marketing activity
planning for the future and let the model predict the number of new
subscriptions based on a) the past trend pattern and b) the future activities.
Unfortunately a time series algorithm does not provide some kind of scoring for input data.
What would be the best approach to solve this problem? How about linear regression?
But how to train a regression model with trend patterns?
Thanks for your help!
Hi, all,
Again I encountered a very strange problem which displayed the predicted attribute values as percentage format? The data type of the attribute is actually double, why is that?
That's really frustrated.
Thanks a lot in advance for your kind advices and I am looking forward to hearing from you shortly.
With best regards,
Yours sincerely,
Hi, all here,
I am wondering where can I store my mining results in data mining engine? For example, I got mining results like accuracy chart, decision trees, and other formats of results based on different mining algorithms I used for my data mining, so where can I actually store the results for reporting service use later? Is it possible to do that in SQL Server 2005?
Thanks a lot for any help and guidance in advance.
We have a set of cubes and dimensions, and we're experimenting with data mining against the cubes (primarily for forecasting applications). We have a custom time dimension (which we call calendar), not generated by the BIStudio wizard. The dimension has year/month/day/hour/... attributes. But when I try to add this Calendar dimension to the mining structure as a nested table using BI studio, it only shows the Year attribute, not the others. Other dimensions seem to show all the attributes.
Is there something we've done wrong in defining our time dimension? What determines which attributes show up as available for selection in BI studio?
I am about to prepare a paper concerning the field of real-time data mining. Real-time here means the process of incremental training of an existing model as soon as the data arrives.
There is a number of papers introducing algorithms for incremental association analysis, incremental clustering etc. Stream mining Ãs a field which is closely related to that. The main reason for the implementation of incremental algorithms is a) the large amount of data to be mined and b) the high rate of new data that is evolving every day.
Using classical batch mining algorithms, models that are outdated for some reason, would have to be re-trained, which could be very time consuming for billions of records. And once the training is completed, the training would have to be restarted once again because a bulk of new data has been arrived.
The question that I would like to discuss now is: For what real world applications would it be a meaningful or even essential to use real-time training of models?
Two main reasons could determine the answer to that question:
You just want to incorporate new data into existing models in order to increase the prediction accuracy of your model or
Your underlying data is subject to more or less massive changes (also refered to as concept drift) and you want to adapt your mining model continuously to that reality.
I'm looking for some examples or ideas where one of these cases apply and it would be a good idea to have incremental mining algorithms involved.
I'm looking forward to inspiring some discussion on that issue.
Hello,
I was working with Microsoft Time Series model (MTS) with some data, when in the mining model viewer, decision tree tab, I realized that the key time variable that I define, it was acting like a split variable.
So, I ask you, this is possible?, because, for me, this should not happen€¦.
After, I review the Data Mining Tutorial by Seth Paul, Jamie MacLennan, Zhaohui Tang and Scott Oveson, and I found, in the Forecasting part, that the key time variable (Time Index) it was acting like a split variable too, in for example, M200 pacific:Quantity and R250 Europe:Quantity.
So people, it€™s possible that a key time variable act like a split variable in a MTS model?
Thanks
Hi, all experts here,
Thank you very much for your kind attention.
I am confused on key time column selection. e.g, I want to predict monthly sales amount, then what column in date dimension should I choose to be the key time column? Is it calendar_date (the key of date dimension) column or calendar_month?
Thanks a lot for your kind advices and help and I am looking forward to hearing from you shortly.
With best regards,
Yours sincerely,
I am working on a stock price analysis project. Stock prices are stored in SQL Server as tick by tick data (that means in a high frequency) and need to be grouped by minutes in order to display them as as high, low, open and close prices in a candlestick or similar chart.
The fields in the table are:
Time (for every stock price tick)
Price (stock price)
Symbol (of the stock)
A sample time series would look like this:
11:45:01 $22.11 MSFT
11:45:02 $22.10 MSFT
11:45:04 $22.25 MSFT
...
...
...
11:45:58 $22.45 MSFT
11:46:03 $22.22 MSFT
11:46:08 $22.25 MSFT
...
...
...
11:46:59 $22.45 MSFT
The result of the SQL query should be:
Symbol Time Open High Low Close
MSFT 11:45 $22.11 $22.45 $22.10 $22.45
MSFT 11:46 $22.22 $22.45 $22.22 $22.45
This is the SQL statement I came up with but obviously doesn't work:
SELECT DATEPART(Minute,Time), MIN(Price), MAX(Price) FROM dbo.TimeSales WHERE Symbol='MSFT' GROUP BY DATEPART(Minute,Time)
I would appreciate any suggestions.
Thank you!
I followed the tutorial posted at [URL] ...
Everything was ok until the last step where I had to process the mining structure which resulted in a warning
"Informational (Data mining): Decision Trees found no splits for model, Tbl Decision Tree Example."
What does this error mean? How do I resolve it? Also, I only see the first level in the Mining Model Viewer, I don't see the levels 2 and 3.
Hi, all experts here,
I would like to know if there is any way to migrate third-party data mining packages with SQL Server 2005 data mining algorithms together then we can have a comparison among all of them to get the best results for training models.
I am looking forward to hearing from you.
Thanks a lot.
With best regards,
Yours sincerely,
Hoping someone will have a solution for this error
Errors in the metadata manager. The data type of the '~CaseDetail ~MG-Fact Voic~6' measure must be the same as its source data type. This is because the aggregate function is not set to count or distinct count.
Is the problem due to the data type of the column used in the mining structure is Long, and the underlying field in the cube has a type of BigInt,or am I barking up the wrong tree?
I'm a beginner with SQL 2012 SSDT & SSMS. I get this error message when I try to deploy my project:Â
"Error 6
Error (Data mining): KEY SEQUENCE columns are not supported at the case level. The 'Customer Key' column of the 'TK448 Ch09 Cube Clustering' mining structure contains content that is not valid.
0 0
"
I am finding it hard to locate the content that is not valid. I've been trying to find a answer for this problem but can't seem to find anything. How can I locate the content that is not valid and change or delete it so that I can deploy this solution?
Having successfully created :
- a data mining structure with about 80 columns.
- a data mining model using Microsoft_Decision_Trees with 2 prediction columns.Â
I thought I would then explore the possibility of have more than 2 prediction columns, in this case 20.
I get an error message and I can't work out :
a) if this is because there's a limit to the maximum number of prediction columns and where that maximum is stated.
b) if something else has become corrupted
c) there's a know bug and if the error message is either meaningful or not.
Either way, I'm unable to complete the data mining wizardÂ
The error message is :Errors in the metadata manager. Either the mining structure with the ID of '[my model Structure]' does not exist in the database with the ID of 'DMAddinsDB', or the user does not have permissions to access the object.
I m using the Time Series Algorithm to forecast sales across regions for various products. Assume the model is built with last 3 years data with the periodicity being monthly.
Is it possible that sometimes I can make predictions based on just 1 yr or 2 yrs data for certain products alone or certain regions alone? Can this be done without having to retrain the already built model?
Also, is it possible that using the model, i can predict week-wise / month-wise / quarterly sales as well?
Hi all,
I am using Microsoft_Time_Series and have set HISTORIC_MODEL_GAP to various values (from 1 to 21). I always get this error:
Error (Data mining): The 'HISTORIC_MODEL_GAP' data mining parameter is not valid for the 'My Time Series' model.
In Algorithm Parameters window, this parameters is not there by default, so I have to add it.
Any tip will be greatly appreciated.
Implementing data mining Add-in in an academic setting? We need to handle over 150 new students a semester and have their connection to Analysis Services survive for their four years at the college. We are introducing data mining to every freshman business student as a unit within their Intro to Excel class (close to a month of work to give them a sense of what is possible). Other courses later in their curriculum will expand on that introduction.Â
Once implemented, we would have as many as 900 connections to manage (four years from now). It is possible that multiple sections will be running at the same time, so 40 students may be accessing the data mining tools concurrently. Â
Is there a way to "bulk establish" the access credentials and establish those databases?
Hi All,
I have a table Test1:
ID date Value
AAUGVAL 2/27/198760.848
AAUGVAL 3/2/1987 64.288
AAUGVAL 3/3/1987 63.77
AAUGVAL 3/4/1987 62.495
AAUGVAL 3/5/1987 62.65
AAUGVAL 3/6/1987 62.548
AAUGVAL 3/9/1987 62.292
AAUGVAL 3/10/198763.045
AAUGVAL 3/11/198763.021
....
I am trying to see the value % changes day by day and here is is the code I wrote:
select
starttime=cast(v.date as char(8)), endtime=cast(a.date as char(8)),
startval=v.Value, endval=a.Value,
change=substring('- +', sign((a.Value-v.Value)+2,1)+ cast(abs(a.Value-v.Value) as varchar)
from
(select date,Value, ranking =(select count(distinct date) from Test1 T where T.Value<=S.Value)
from Test1 S) v left outer join
(select date,Value, ranking=(select count(distinct date) from Test1 T where T.Value<=S.Value)
from Test1 S) a
on( a.ranking=v.ranking+1)
I got the following error message:
Server: Msg 174, Level 15, State 1, Line 4
The sign function requires 1 arguments.
Server: Msg 170, Level 15, State 1, Line 7
Line 7: Incorrect syntax near 'v'.
Server: Msg 170, Level 15, State 1, Line 9
Line 9: Incorrect syntax near 'a'.
Could someone please help with this? Thank you in advance!
shiparsons
I am trying to write a stored proc the calculates a moving average overthree periods. In the following example, I need to stratify the data bypersonID and RecordID in the #Temp table, but I am not sure how to doit. Right now I am restricting the data I use to build my time series bypersonID and I get the results I want *by PersonID*. If I can figure outhow stratify by personID so I don't have to use this restriction, I'msure I can extend it to the RecordID.Create Table #Temp(tmpID int identity,DetailID int,RecordID int,AdminDate Datetime,AdminTime datetime,Status tinyint,--decimal(9,2),Location varchar(100),PersonID char(9),PatientName varchar(100),DOB Datetime,Drug varchar(100),Sort varchar(10))--populate with data by personIDinsert into#Temp(DetailID,RecordID,AdminDate,AdminTime,Status ,Location,PersonID,PatientName,DOB,Drug,Sort)Select MD.PatMedOrderDetailID, MD.PatMedOrderID, M.Date as AdminDate,Case M.Time When 'A' then '8:00:00 AM' When 'N' then '12:00:00 AM' When'P' then '4:00:00 AM'When 'H' then '8:00:00 PM' else M.Time End as Admintime,100*M.Status, P.Location,P.PersonID, P.Name as PatientName, P.DOB,D.GenericName + ' (' + D.TradeName + ') ' +D.Strength,Left(P.Location,3)From PatMedOrderDetail MD Inner Join PatMedOrder MO on MD.PatMedOrderID= MO.PatMedOrderIDinner Join PatMedPass M on MD.PatMedOrderDetailID =M.PatMedOrderDetailIDinner join Patient P on M.PersonID = P.PersonIDinner join Drugs D on MO.DrugID = D.DrugIDWhere P.PersonID = '000126230'Order by P.PersonID,MD.patMedorderID, M.Date, M.TimeSelect * from #Temp -- to view entire set--returns relevant rowsSelect Derived.RefusalRate,T.* from #Temp T inner join(select t1.tmpID, avg(t2.Status) as RefusalRatefrom #Temp t1 cross join #Temp t2WHERE t1.tmpID>=3 AND t1.tmpID BETWEEN t2.tmpID AND t2.tmpID+2group by T1.tmpIDhaving avg(t2.Status)< 100) as Derived on T.tmpID = Derived.tmpIDDrop Table #Temp*** Sent via Developersdex http://www.developersdex.com ***
View 1 Replies View RelatedGiven the following table information:HOSTNAME DATETIMEWEBNYC001 2005-06-15 10:30AMWEBNYC001 2005-06-15 10:31AMWEBNYC001 2005-06-15 10:31AMWEBNYC001 2005-06-15 10:34AMWEBNYC001 2005-06-15 10:35AMWEBNYC001 2005-06-15 10:35AMWEBNYC002 2005-06-15 10:30AMWEBNYC002 2005-06-15 10:30AMWEBNYC002 2005-06-15 10:33AMWEBNYC002 2005-06-15 10:35AMWEBNYC002 2005-06-15 10:35AMWEBNYC002 2005-06-15 10:35AMHow can I easily return the following results:HOSTNAME DATETIME COUNTWEBNYC001 2005-06-15 10:30AM 1WEBNYC001 2005-06-15 10:31AM 2WEBNYC001 2005-06-15 10:32AM 0WEBNYC001 2005-06-15 10:33AM 0WEBNYC001 2005-06-15 10:34AM 1WEBNYC001 2005-06-15 10:35AM 2WEBNYC002 2005-06-15 10:30AM 2WEBNYC002 2005-06-15 10:31AM 0WEBNYC002 2005-06-15 10:32AM 0WEBNYC002 2005-06-15 10:33AM 1WEBNYC002 2005-06-15 10:34AM 0WEBNYC002 2005-06-15 10:35AM 3Thanks!
View 2 Replies View RelatedHello,
I am new to SQL Server and learning lots very quickly! I am experienced at building databases in Access and using VBA in Access and Excel.
I have a time series of 1440 records that may have some gaps in it. I need to check the time series for gaps and then fill these or reject the time series.
The criteria for accepting and rejecting is a user defined number of time steps from 1 to 10. For example, if the user sets the maximum gap as 5 time steps and a gap has 5 or less then I simply want to lineraly interpolate betwen the two timesteps bounding the gap. If the gap is 6 time steps then I will reject the timeseries.
I have searched the BOL and MSDN for SQL Server and think there must be a solution using the PredictTimeSeries in DMX, but not quite sure if I can do this. I may be better off simply passing through the time series as a recordset and processing as I would have done in Access...(I am reluctant to do this as I have of the order 100 * 5 * 365 time series and growng by 100 each day and fear it will take quite some time...)
Can anyone help me by pointing me in the right direction please?
Unless there is a way of using PredictTimeSeries on its own, I think the solution is:
Identify if a record is the a valid one or part of a gap (ie missing values).
Identify the longest gap and reject or process data on this value.
Identify if a record preceedes or succeeds a gap.
For each gap fill it using a linear interpolation.
Thanks,
Alan.
Hi!
First question: How many months of data do you need to make this algorithm to work in Excel 2007? And is there an issue about data types in Excel for this algorithm?
I have found some odd behaviours regarding this. If I use the DM sample Excel 2007 with time series data everything works fine. If I copy and paste data into Excel 2007, from another data source, I can get a forecast of repeating values, that is one value, that will be repeated for each month that I am trying to do a forecast.
Should I avoid having time members for forecast dates in a column? Sometimes my forecast values will be placed below my dates that do not have values. If I am forecasting months in 2008, with month values from 2007 and 2006 the forecast values will be placed below my 2008 empty months.
Kind Regards
Thomas Ivarsson
I'm trying to learn about time series algorithm but I can't set the time periodicity right. I have information stored 2 times a year (semester) so I'll should set up a PERIODICITY_HINT = {2}, right? but it does not change anything.
Here is a screenshot that might help understand the problem:
I'm going to create an analysis report based on time range. The data is grouped by the hourly range. There're two problems that I'm facing.
1. How can I generate such result set so that it will give me 0 count instead of missing that column?
2. How can I vary the start and end time which depends on another table?
I believe this is quite hard to be complete within a single SQL. However, I would still want to try. The SQL server is the Express version. No analysis service is available.
Hi!
I have a table Month_Sales(Month, product_1, .., product_n). The value of column product_i is the sale in this month.
so when i build MS Time Series for this domain, i want to query to find top m product is seld most in next month??
How do i buid that query???
I am trying to predict Revenue gererated by each Person.
My Input like this:
Month Person Revenue
-----------------------------------------
20050101 Person1 $1000
20050101 Person1 $2000
20050201 Person1 $1000
20050101 Person2 $5000
20050201 Person2 $2000
20050201 Person2 $3000
Obviosly for Person1 and 200501 I expect to see on MS Time Series Viewer $3000, correct?
Instead I see REVENUE(actual) - 200501 VALUE =XXX,
Where XXX is absolutly different number.
Also there are negative numbers in forecast area which is not correct form business point
Person1 who is tough guy tryed to shoot me.
What I am doing wrong. Could you please give me an idea how to extract correct
historical and predict information?
Thnak you,
Tim.
Hi Jamie:
I am building data mining models to predict the amount of data storage in GB we will need in the future based on what we have used in the past. I have a table for each device with the amount of storage on that device for each day going back one year. I am using the Time Series algorithm to build these mining models. In many cases, where the storage size does not change abruptly, the model is able to predict several periods forward. However, when there are abrupt changes in storage size (due to factors such as truncating transaction logs on the database ), the mining model will not predict more than two periods. Is there something I can change in terms of the parameters the Time Series Algorithm uses so that it can predict farther forward in time or is this the wrong Algorithm to deal with data patterns that have a saw tooth pattern with a negative linear component.
Thanks,