Questions About SQL Server 2005 Time Series Algorithm
May 16, 2007
First of all, hello everybody; I'm new to this forum and, in fact, new to data mining.
To introduce myself: I'm a Computer Science student and I'm trying to use the Time Series algorithm for weather analysis. I know that forecasting weather is a hopeless task even for the fastest computers in the world, but what I'm trying to do is a kind of a posteriori analysis of historical data to spot dependencies or characteristic weather behavior in a specific region, and perhaps make some short-term predictions.
I tried the Time Series algorithm, although I have some doubts about the methodological justification of this choice (if you have any critical comments, please share them with me). But my main questions are about the usage of the algorithm itself:
I've read the documentation and a tutorial on this page for historical predictions, but I still don't know what exactly HistoricalModelCount and HistoricalModelGap are. I know that my historical predictions are bounded by -HistoricalModelCount*HistoricalModelGap*, but that is rather operational knowledge... The explanation is always clouded by an "internal model" phrase. Can you point me to a document where I can find more detailed information? (What is the form of the model? How is it built? etc.)
Periodicity Hint. How should I treat these optional values? Are they other possible periods in the data? I have weather measurements taken every six hours for thirteen years**, so is it a good choice to set this parameter to {365*4,4} (the first value for a year and the second for a day)?
This is a technical question and I'm really ashamed to bother you with it. On the time chart in the model viewer I can see dates from the last year only. Zooming in and out and clicking insanely on every pixel on the screen did not give any result (apart from broken mouse buttons). Is it possible to browse that data in the mining model viewer chart?
Thank you in advance for your replies!
*This formula suggests how these parameters might work, but I would like to know for sure; I don't want to make some awful mistakes in my project. :-)
**Of course I plan to reduce the amount of data, but the period will stay.
The first question is how to improve the processing speed of the TimeSeries algorithm.
Using the SQL Server 2005 TimeSeries algorithm, I built a data mining model. But after three days, it is still training. The data has 2,200,00 rows.
So what can I do to improve the processing speed?
Thanks!
The second question is about parameters in the Data Mining Query Task.
The Data Mining Query Task is used to get data from a data mining model. In the mining model form, I chose a mining model. In the query form, I wrote a DMX query, "select flattened top 100 predicttimeseries([Xssl],1) from [Time Series XSSL]". Lastly, I chose a table to receive the data from the mining model.
Hi, I am a novice SSAS programmer. I need a prediction query for the time series algorithm that predicts for a particular date. I don't know how to use a WHERE condition in a prediction query.
Currently I want to run a vanilla multivariate regression and get some statistics back about the regression that is built. For instance, besides the coefficients, I also want the two-sided p-values on the coefficients and the R2 of the model.
I've tried playing with the Microsoft_Linear_Regression algorithm and have run into two issues. I'm doing all this programmatically using DMX queries rather than through the BI studio.
(a) I can never get the coefficients from the regression to match with results I would get from running R or Excel. The results are close but still significantly off. I suspect this is because the Linear Regression is just a subset of the Decision/Regression Trees functionality, in which case some kind of Bayesian prior is being incorporated here. Is that the issue? And if so, is there some way to turn off the Bayesian scoring and get a vanilla multivariate regression? I don't see anything in the inputs to the linear regression that would let me do this, and even running Microsoft_Decision_Trees with a few different settings, I can't get the output I'm looking for. If there's no way to turn off the Bayesian scoring, can someone explain to me what the prior being used here is and how Bayesian learning is being applied to the regression?
(b) Using the Generic Tree Viewer, I see that there are a few "statistics" values in the Node_Distribution, but I'm not sure what they're referring to. One of them looks like it might be the MSE. I could play with this some more to find out, but I'm hoping someone here can save me that work and tell me what these numbers are. Hopefully they will constitute enough information for me to rebuild the p-values and the R2.
Hello, I was working with the Microsoft Time Series (MTS) algorithm and simulated data in order to evaluate it and get to know it a little better. I simulated 24 points of the model y[t] = 5.74-0.1486 y[t-1] + e[t] and 19 points of the model y[t] = 10.48-0.0486 y[t-1] + e[t] (a change of level), where e ~ N(0,0.01). The MTS output is: if time>=23.5 then AR(3) else AR(1): y[t] = 6.23-0.2536 y[t-1]. So I am wondering: how does the algorithm work with the time variable as a split variable? Like the other variables? Only considering 4 time points? Why does the MTS algorithm produce AR(p) models where p is a little large (as in the example: I simulated an AR(1) model and the output is an AR(3) model)? What about parsimonious models? An AR model is a stationary model, so what happens if some data have a trend? Do we need to eliminate the trend before the MTS algorithm can be used? Thanks for your time.
I am working on an academic project using SQL Server 2005 and Visual Studio 2005, using the Apriori algorithm to find the association between patient city and likely diseases.
I have created a PATIENT table with Patient_Id, Patient_name, Age, and City attributes and a Diseases table with Disease_Id and Disease_name. I connected these two tables many-to-many [M:N], which gave a third relation with Patient_Id and Disease_Id attributes.
I am just entering some dummy data into the patient and disease tables to make the Apriori algorithm work. When a new patient city is entered into the patient table, the system checks the patient table for the same city stored previously and, using the third relation, pulls the diseases associated with that city.
Here are my tables with attributes:
PATIENT ( Patient_Id, Patient_name, Age, City)
Diseases(Disease_Id, Disease_name)
[M:N] The third relation below exists because it is a many-to-many relationship:
PATIENT_DISEASES(Patient_Id, Disease_Id)
I believe there is a more efficient way of doing this, instead of using dummy data or these relationships. I checked the Microsoft Association algorithm and realised it is not the Apriori algorithm.
Could you suggest the best or efficient way of doing this using SQL Server 2005?
Your help and insight into this matter is highly appreciated.
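One way to explore the city-to-disease association directly in T-SQL, before reaching for a mining model, is to count co-occurrences through the junction table. This is only a sketch using the table and column names listed above. It assumes each (Patient_Id, Disease_Id) pair appears at most once in PATIENT_DISEASES, and the Confidence column is an illustrative measure (the share of a city's patients who have the disease), not the output of the Microsoft Association algorithm.

```sql
-- Share of patients in each city that have each disease.
-- Assumes PATIENT_DISEASES holds each (Patient_Id, Disease_Id) pair once.
SELECT p.City,
       d.Disease_name,
       COUNT(*) AS PatientsWithDisease,
       c.CityPatients,
       CAST(COUNT(*) AS float) / c.CityPatients AS Confidence
FROM PATIENT AS p
INNER JOIN PATIENT_DISEASES AS pd ON pd.Patient_Id = p.Patient_Id
INNER JOIN Diseases AS d ON d.Disease_Id = pd.Disease_Id
INNER JOIN (SELECT City, COUNT(*) AS CityPatients
            FROM PATIENT
            GROUP BY City) AS c ON c.City = p.City
GROUP BY p.City, d.Disease_name, c.CityPatients
ORDER BY p.City, Confidence DESC;
```

With only dummy data the numbers will of course be meaningless; the query just shows that the lookup described above can be done as a single set-based statement.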
Could anyone here please help me with this problem?
My problem is: I have registered my plug-in algorithm with SQL Server 2005 Analysis Services, and I can see it added to the Analysis Services configuration file (msmdsrv.ini). But why can I not see my algorithm in the list of algorithms when I test it? I really need help with that.
managed plug-in framework that's available for download here: http://www.microsoft.com/downloads/details.aspx?familyid=DF0BA5AA-B4BD-4705-AA0A-B477BA72A9CB&displaylang=en#DMAPI.
This package includes the source code for a sample plug-in algorithm written in C#.
In this source code, all .cs files are written for a clustering algorithm.
If my plug-in algorithm is of the association or classification type, what modifications are required in the source code?
We are currently running a corporate client with Windows 2000 and .Net 1.1. We are running a number of SQL Server 2000 applications and are now thinking of upgrading to SQL Server 2005 as part of a data consolidation exercise. I am concerned on a number of points:
Can I connect to SQL Server 2005 using old ADO connectors? We have about 40 Excel VBA solutions, and we don't want to upgrade to SQL Server 2005 if we will be unable to connect to the data source. We cannot install any new versions of MDAC or upgrade the .NET Framework, so this is a concern.
Do we need .NET 2.0 or Visual Studio 2005 to connect and work with SQL Server 2005? If so, this will be a problem as we cannot upgrade any client beyond .NET 1.1, and only have VS 2003 as a scripted application we can install for any development.
Does anyone have any experience of the KPI capabilities of SQL 2005? We are bordering on committing to a Business Objects BI platform, and having worked with BO Dashboard Manager and Performance Manager for 4 months (it was horrible), I am not relishing the prospect and would like to propose SQL 2005 as an alternative.
I recently upgraded to SQL Server 2005 for developing on my local system and can't seem to find the option that automatically puts the DROP PROCEDURE statement at the top and the user names at the bottom of a procedure that I script as new. I used to do it in the old Query Analyzer, so I'm sure it's in there somewhere. Thanks in advance for any help. RyanOC
Hi, pardon my ignorance, but I wonder if someone could answer a few questions for me.
I am writing a program which will be used by perhaps up to 100 users at a time. The program sits on any number of PCs and loads user-specific data to a given PC according to who has logged on to Windows on that PC.
A number of data items loaded from the user table have to be unique as they are usernames for other systems that my program simplifies access to.
So when a user logs on to my program for the first time a row is created for them in the user table (indexed by a GUID and their unique network name). The other unique fields are left blank and the user is given an opportunity to fill these details in.
Before writing these details to the user's row in the 'users table' the program loads the whole user table down and checks that these items are unique before committing them to that user's row in the table.
The problem, of course, is the window between the program downloading the user table into a local datatable, checking that the values are unique, and actually writing them. If someone else writes the same data into their row during that window, 2 users end up with the same data, which shouldn't be allowed; i.e. 2 users can't have the same user name for the other software.
How can I solve this problem with locking? Once the user table is downloaded into a local datatable, presumably the table is no longer locked, so another user could write data to the table.
I actually think this is going to be a pretty rare occurrence, but I still want to try to cover all eventualities.
I suspect the problem is the way my program is going about the checking.
Should I use a SQL statement like this?
IF EXISTS (SELECT username FROM users WHERE username = @username) BEGIN RAISERROR('Username already exists', 16, 1) END ELSE BEGIN INSERT etc
If so, I guess this will simplify my code. Is this the correct thing to do? And should I then just trap the errors that arise if a duplicate does occur?
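A plain IF EXISTS check followed by a write still leaves a small gap between the check and the write. One common way to close it is to do both inside a single transaction and keep the checked range locked until the commit. The following is a minimal sketch, not a definitive implementation: the table dbo.Users and the columns UserId and OtherSystemUsername are hypothetical names, and the row for the user is assumed to exist already, as described above.

```sql
-- Hypothetical names: dbo.Users, UserId, OtherSystemUsername.
CREATE PROCEDURE dbo.SetOtherSystemUsername
    @UserId   uniqueidentifier,
    @Username nvarchar(128)
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRANSACTION;

    -- UPDLOCK + HOLDLOCK keep the checked key range locked until the
    -- transaction ends, so a second session running the same check waits.
    IF EXISTS (SELECT 1
               FROM dbo.Users WITH (UPDLOCK, HOLDLOCK)
               WHERE OtherSystemUsername = @Username
                 AND UserId <> @UserId)
    BEGIN
        ROLLBACK TRANSACTION;
        RAISERROR('Username already exists.', 16, 1);
        RETURN;
    END

    UPDATE dbo.Users
    SET OtherSystemUsername = @Username
    WHERE UserId = @UserId;

    COMMIT TRANSACTION;
END
```

An index on the checked column keeps the locked range narrow. A UNIQUE constraint would be the more robust guard, but SQL Server 2005 allows only one NULL in a unique index, so the blank values described above would need handling first.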
Also some more general questions.
1) I presume that two users simultaneously looking up data from two different rows in a table don't lock the table so that one search fails? I use the code below, having set up a command to run a stored procedure that searches for a user by their network name.
2) I presume writing data to my user table a row at a time will also not cause a lock. I create a command object with all the row values in it and then call command.ExecuteNonQuery().
As a rule I close all my connections as soon as I'm done with them.
A few collation questions on SQL Server 2005 SP2, which I'll call SQLS. The default collation for SQLS is apparently SQL_Latin1_General_CP1_CI_AS. I wish to use a variation of this, the SQL_Latin1_General_CP1_CS_AI collation, but there is no such collation returned from fn_helpcollations(). Also, if I try to use this collation in a CREATE DATABASE stmt, SQLS yells about it. I see that there is a Latin1_General_CS_AI. What effects are there in using this collation? The SQL_* collations are SQL collations, while non-SQL_* collations are Windows collations, yes? SQLS runs only on Windows, so am I safe in using Latin1_General_CS_AI? What does the CP1 in the SQL collation signify? Am I asking for trouble?
------------------------------------
Assuming that I set Latin1_General_CS_AI (or any other case-sensitive collation) at the database level, I believe my DDL/DML for that database also becomes case-sensitive. How can I specify that I want ONLY my data access to be case-sensitive, and not my DDL/DML? I don't want to have to remember to type "select * from MyCamelCase" when "mycamelcase" should work.
Any help appreciated.
A new SQLS DBA..
aj
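One option worth trying is to leave the database on its case-insensitive default and apply the case-sensitive collation only to the columns (or individual comparisons) that need it, since object names are resolved using the database collation. A rough sketch follows; the table and column definitions are hypothetical and only illustrate the idea.

```sql
-- List the available collations whose names start with Latin1_General_CS_AI.
SELECT name, description
FROM fn_helpcollations()
WHERE name LIKE 'Latin1_General_CS_AI%';

-- Hypothetical table: the database keeps its case-insensitive default,
-- only this column compares case-sensitively.
CREATE TABLE dbo.MyCamelCase
(
    Id   int         NOT NULL PRIMARY KEY,
    Code varchar(50) COLLATE Latin1_General_CS_AI NOT NULL
);

-- The object name still resolves in any case; the data comparison is case-sensitive.
SELECT * FROM dbo.mycamelcase WHERE Code = 'Abc';

-- A collation can also be applied to a single comparison.
SELECT * FROM dbo.mycamelcase WHERE Code COLLATE Latin1_General_CI_AI = 'abc';
```

Mixing collations across columns can cause collation-conflict errors in joins and comparisons between them, so it is worth applying the same collation to every column that will be compared.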
Previously in SQL Server 2000, we would be in Enterprise Manager, you'd double-click on a view, and a nice little dialog box opened with the T-SQL statements; there were also check SQL syntax, apply, and cancel buttons. Not exactly Query Analyzer, just a quick lightweight dialog box. Is this feature still around? It seems I have to go into a Query Analyzer-like mode to edit a view now. I am a total newbie to version 2005. Are there any options I can set to make it behave the old way? All feedback is appreciated.
TIA,
~CK
So I am fairly new to Express and I have installed it on my development machine; much to my chagrin, it is quite difficult to import data into SQLEXPRESS. I have an 'sa' account set up and I have created a new database and a table within that database; however, when I try to import data into that table by setting up a linked server to Excel, I am having some major issues!
I ran this code first to create the linked server...
DECLARE @RC int
DECLARE @server nvarchar(128)
DECLARE @srvproduct nvarchar(128)
DECLARE @provider nvarchar(128)
DECLARE @datasrc nvarchar(4000)
DECLARE @location nvarchar(4000)
DECLARE @provstr nvarchar(4000)
DECLARE @catalog nvarchar(128)
-- Set parameter values
SET @server = 'XLTEST_SP'
SET @srvproduct = 'Excel'
SET @provider = 'Microsoft.Jet.OLEDB.4.0'
SET @datasrc = 'c:Anchor_Hocking blactionlist.xls'
OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "Anchor_Hocking" returned message "Cannot start your application. The workgroup information file is missing or opened exclusively by another user.".
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "Anchor_Hocking" reported an error. Authentication failed.
Msg 7303, Level 16, State 1, Line 1
Cannot initialize the data source object of OLE DB provider "Microsoft.Jet.OLEDB.4.0" for linked server "Anchor_Hocking".
The file is not open. I have granted full access rights to all users....I am really frustrated!
Also, how can I get SSIS on this machine with SQLEXPRESS?
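On the linked server itself: the script shown above stops before the call that actually creates the server, and the error text names a linked server "Anchor_Hocking" rather than 'XLTEST_SP', so it may be worth checking which definition is really being queried. The sketch below shows one commonly used setup for an Excel workbook through Jet; the workbook path and sheet name are hypothetical, and mapping logins to the Jet Admin user with no password is an assumption about what the provider expects here, not a confirmed fix for this error.

```sql
-- Hypothetical workbook path and sheet name; substitute your own.
EXEC sp_addlinkedserver
    @server     = N'XLTEST_SP',
    @srvproduct = N'Excel',
    @provider   = N'Microsoft.Jet.OLEDB.4.0',
    @datasrc    = N'C:\Data\Book1.xls',
    @provstr    = N'Excel 8.0';

-- Map all local logins to the Jet "Admin" user with an empty password.
EXEC sp_addlinkedsrvlogin
    @rmtsrvname  = N'XLTEST_SP',
    @useself     = N'false',
    @locallogin  = NULL,
    @rmtuser     = N'Admin',
    @rmtpassword = NULL;

-- Each worksheet appears as a table named after the sheet plus a trailing $.
SELECT * FROM XLTEST_SP...[Sheet1$];
```

The account that runs the SQL Server service also needs read access to the workbook file, which is a separate permission from the ones granted to interactive users.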
All: I am writing an Internet/Extranet-based (ASP.NET 2.0) web application that uses SQL Server 2005 as the database. I am using forms authentication in my web application. I am also storing the connection string to SQL Server in my web.config file. The connection string is encrypted using DPAPI with entropy.
I have created a SQL login account on my SQL Server for use by the web application. This is the user ID I am using in my connection string, because all persons using the application will NOT have a Windows login.
Here is my question: the login I created has defaulted to the "dbo" role and therefore has "dbo" rights to the database. I want to set up this login so that all it can do is execute stored procedures. I don't want this SQL login to be able to do anything else. In my application I am using stored procedures for ALL data access functions, via a data access layer.
Can someone guide me step by step through setting up this type of access for this SQL login? Thanks, Blue.
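A common pattern for this is a database role that holds nothing but EXECUTE permission, with the application's user as its only member. The sketch below assumes hypothetical names (MyAppDb, WebAppLogin, WebAppUser, db_executor) and assumes the stored procedures live in the dbo schema; how the login lost its dbo mapping depends on how it got it, so the first step is shown only as a comment.

```sql
USE MyAppDb;   -- hypothetical database name
GO
-- If the login currently owns the database it is mapped to dbo; ownership
-- would first have to move to another login, for example:
--   EXEC sp_changedbowner 'sa';
-- If it is instead a member of db_owner, remove it from that role.

-- Hypothetical login and user names.
CREATE USER WebAppUser FOR LOGIN WebAppLogin;
GO
CREATE ROLE db_executor;
GRANT EXECUTE ON SCHEMA::dbo TO db_executor;
EXEC sp_addrolemember 'db_executor', 'WebAppUser';
GO
```

Because of ownership chaining, procedures owned by the same owner as the tables can read and write them without the user holding table permissions. Procedures that build dynamic SQL are the usual exception, since the dynamic statement is checked against the caller's own permissions.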
I have a webpage that displays 4000 or more records in a GridView control powered by a SqlDataSource. It's very slow. I'm reading the following article on custom paging: http://aspnet.4guysfromrolla.com/articles/031506-1.aspx. This article uses an ObjectDataSource and some functionality new to SQL Server 2005 to implement custom paging.
There is a stored procedure called GetEmployeesSubsetByDepartmentIDSorted that looks like this:
ALTER PROCEDURE dbo.GetEmployeesSubsetByDepartmentIDSorted
(
    @DepartmentID int,
    @sortExpression nvarchar(50),
    @startRowIndex int,
    @maximumRows int
)
AS
IF @DepartmentID IS NULL
    -- If @DepartmentID is null, then we want to get all employees
    EXEC dbo.GetEmployeesSubsetSorted @sortExpression, @startRowIndex, @maximumRows
ELSE
BEGIN
    -- Otherwise we want to get just those employees in the specified department
    IF LEN(@sortExpression) = 0
        SET @sortExpression = 'EmployeeID'
    -- Since @startRowIndex is zero-based in the data Web control, but one-based w/ROW_NUMBER(), increment
    SET @startRowIndex = @startRowIndex + 1
    -- Issue query
    DECLARE @sql nvarchar(4000)
    SET @sql = 'SELECT EmployeeID, LastName, FirstName, DepartmentID, Salary, HireDate, DepartmentName
        FROM (SELECT EmployeeID, LastName, FirstName, e.DepartmentID, Salary, HireDate, d.Name as DepartmentName,
                     ROW_NUMBER() OVER(ORDER BY ' + @sortExpression + ') as RowNum
              FROM Employees e
              INNER JOIN Departments d ON e.DepartmentID = d.DepartmentID
              WHERE e.DepartmentID = ' + CONVERT(nvarchar(10), @DepartmentID) + '
             ) as EmpInfo
        WHERE RowNum BETWEEN ' + CONVERT(nvarchar(10), @startRowIndex) + ' AND (' + CONVERT(nvarchar(10), @startRowIndex) + ' + ' + CONVERT(nvarchar(10), @maximumRows) + ') - 1'
    -- Execute the SQL query
    EXEC sp_executesql @sql
END
The part that's bold is the part I don't understand. Can someone shed some light on this for me? What is this doing and why?
Diane
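The bold formatting did not survive in the post, so this is a guess at which part is meant: assuming the question is about the string that gets built and executed, it may help to see the same query without the string concatenation. The sketch below uses the Employees table from the procedure, drops the department filter and join, and fixes the sort column to EmployeeID purely for illustration.

```sql
DECLARE @startRowIndex int, @maximumRows int;
SET @startRowIndex = 1;    -- one-based position of the first row on the page
SET @maximumRows  = 10;    -- page size

SELECT EmployeeID, LastName, FirstName
FROM (
    -- Number every row in the requested sort order.
    SELECT EmployeeID, LastName, FirstName,
           ROW_NUMBER() OVER (ORDER BY EmployeeID) AS RowNum
    FROM Employees
) AS EmpInfo
-- Keep only the rows whose numbers fall inside the requested page.
WHERE RowNum BETWEEN @startRowIndex AND (@startRowIndex + @maximumRows) - 1;
```

The procedure builds this as a string because ROW_NUMBER() needs the sort column written into the ORDER BY clause, and a column name cannot be supplied as an ordinary parameter. Concatenating @sortExpression this way also means its value should be validated against the known column names before it is used.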
Hi all, I have the Standard edition of SQL Server. From a server that doesn't have SQL Server Standard installed, would I be able to connect to it using my connection string, or does that server have to have Standard edition too? Is this the same for Express edition? And if it is possible, what is the difference between an Express connection string and a Standard edition one? Thanks.
I have a SQL Server 2005 database (called BDHSE) on a PC which I call PC1. I have a second PC (PC2), and both are on the same network (a WLAN).
What I want is to have access to BDHSE from a VB6 application (APP1) running on PC2. All INSERT, DELETE, and UPDATE operations on records are done through APP1.
APP1 is currently running on PC1 and is to be installed on PC2.
I have these questions:
1. What do I need to install on PC2, since all the INSERT, DELETE, and UPDATE operations are done using APP1? I guess I only have to install the Microsoft SQL Native Client (with all the prerequisites, of course), but I am not sure.
2. In APP1, made in VB6, do I have to change the connection string, since I am accessing a database which is physically on PC1 while APP1 will be used on PC2?
3. Any advice you can give me on doing this will be well received.
I am confused about the key column of the case table and the key time column of the nested table when using the Time Series algorithm.
In my case, the case table structure is as below:
Territory key text (the ID is actually dimrisk_key; in this case, I use the name column binding to bind the Territory column of the case table Dimrisks),
While the nested table structure is as below:
Cal_month key time (in this case, the ID is actually dimdate_key; again, I used the name column binding property to bind Cal_month to the ID)
So my question is: as the key column of the case table has been set to Territory, does the model training still cover all the cases (rows) based on the ID of the table?
Also, in the nested table, as the key time column has been set to Cal_month rather than Dimdate_key, would the single series be based on Cal_month?
I hope this is clear enough for you to advise and help.
And I am looking forward to hearing from you shortly.
I've seen many tutorials about using TimeSeries, but all of them use a table, while I'm using a cube to represent my data. So I'm trying to build a forecast from the cube, but it doesn't work as well as it could. I've got the same problem as described in Microsoft's Adventure Works tutorial: I need to forecast a series of sales. The problem is that I can't create a second key value, as is shown in the tutorial, so I can't split the sales by product. I have created dimensions for goods and for time. The cube browser shows me a very handsome view, but the problem with the mining model still remains... Please help me! How can I solve this problem? Can I create a separate table from the cube and build the forecast from that table? Or can I solve this problem without using tables?
I mined a small table using TimeSeries. There are only 3 columns in the mining model: payment, region_name, and period. Payment is a floating-point column designated as the predicted column, region_name is a string/text column designated as the key, and period is of type numeric(6), holding year and month, designated as the key time. Build and deployment are successful. The values of period span from 200501 to 200603, with PERIODICITY_HINT set to {12}.
(1) However, the viewer displays the result as percentages, not the values of the payment column. How can I instruct the viewer (or Visual Studio) NOT to display percentages?
(2) Two key-values are missing. In the data, there are 8 key-values. But the viewer displays only 6 of them. How can this be?
Revenue 4 GB (4086 MB)
Partitioned into 12 partitions (about 340 MB each)
Record count: almost 16.8 million
Each partition has approximately 1.4 million records
Algorithm chosen: TimeSeries
Time key: Period (6-digit integer with values ranging from 200501 to 200512)
(Non-time) key: Telephone number, variable-length string/text, max 15 characters
Input-and-predicted column 1: SLI007
Input-and-predicted column 2: SLI008
I have set the value for PERIODICITY_HINT to {12}. I created the project 3 times on 3 machines. All failed. The error message is the same:
Internal error : An unexpected exception occured. Internal error: An unexpected error occurred (file 'dmtimeseries.cpp', line 646, function 'DMTimeSeries::ProcessCaseTS3and4').
I searched for the file in C:\Program Files\Microsoft SQL Server and its subfolders. I didn't find it.
What causes the error?
I tested 3 machines to check whether machine performance is the culprit:
Single-processor Pentium M 2.0 GHz with 2 GB memory
Dual-processor Xeon 2.4 GHz with 3 GB memory
Dual-processor Xeon 3.4 GHz with 4 GB memory
But the project returns the same error on all of those machines. Any idea?
Using the TimeSeries algorithm, how do I forecast more than one time period ahead? I read in your book on page 182 that the PredictTimeSeries function can take a parameter for the number of time periods you want to predict. For example, SELECT PredictTimeSeries(Bread,5) tells the algorithm to predict the next 5 time periods. Can you tell me how to change that parameter using the graphical interface?
We are running SQL Server 7.0 SP2 and are experiencing the following out-of-space error message:
"Could not allocate new page for database 'FooBar'. There are no more pages available in filegroup SECONDARY. Space can be created by dropping objects, adding additional files, or allowing file growth."
Needless to say, the database is set for 10% unlimited autogrowth and there IS available space in the partition where the filegroup resides.
Any ideas as to why this is happening? What is SQL Server's algorithm for allocating space when growing a database? Must it satisfy the request in one 'extent', and is the cause of our problem that our disk is fragmented?
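One thing that may be worth ruling out: growth and maximum size are set per file, not per database, so the files in the SECONDARY filegroup can have different settings from the primary data file. The sketch below only shows how to inspect and adjust them; the logical file name FooBar_Secondary1 is hypothetical and should be taken from the sp_helpfile output. It does not answer how the allocation algorithm itself works.

```sql
USE FooBar
-- Lists every file with its filegroup, current size, maxsize and growth setting.
EXEC sp_helpfile

-- Hypothetical logical file name; change one property per statement.
ALTER DATABASE FooBar MODIFY FILE (NAME = FooBar_Secondary1, MAXSIZE = UNLIMITED)
ALTER DATABASE FooBar MODIFY FILE (NAME = FooBar_Secondary1, FILEGROWTH = 10%)
```

If sp_helpfile shows a fixed maxsize or a growth of 0 for a file in SECONDARY, that alone would explain the message even with free space on the partition.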
We have an SSIS package that will be used for both our Test and Prod imports on the same server. The SSIS imports are identical except that Test needs all connections pointing to the Test database while Prod needs its connections pointing to the Prod database.
How can I change the connections, based on Test or Prod, used inside a single SSIS package? (I don't want to create two tweaked packages on the same server. If I find a bug in one of them, I have to correct it twice.)
Hi guys, I have read many articles on the matter and I have probably used up all my printer's ink in doing so; however, some questions still remain.
1) What happens if I have to reboot the mirror (security update, etc.)? Obviously the session is broken during the reboot, but would I have to do another backup of the principal and resync everything?
2) I know it is not best practice, but at this point I have no choice; however, I wanted to get your feedback on having two instances of SQL2005 on my development box: one for mirroring production and the second for development. The two instances would each live on their own drive (not partitioned) and have adequate memory and space. What should I expect with this?
3) Lastly, I am still uncertain whether mirroring is approved for production. Is it?
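For the reboot question, the state of each mirroring session can be read from the sys.database_mirroring catalog view on either partner, which shows whether the session has reconnected and caught up after the mirror comes back. A small sketch, with MyMirroredDb as a hypothetical database name:

```sql
-- One row per mirrored database on this instance.
SELECT DB_NAME(database_id) AS database_name,
       mirroring_role_desc,
       mirroring_state_desc
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;

-- Only relevant if the state above is SUSPENDED (hypothetical database name):
-- ALTER DATABASE MyMirroredDb SET PARTNER RESUME;
```

As far as I know, a mirror that was only offline for a reboot normally reconnects and resynchronizes on its own, provided the principal's log has not been truncated past what the mirror still needs. On the production question, mirroring in the original SQL Server 2005 release was for evaluation only and became supported for production use with SP1.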
What is the proper syntax to return records that appear in one table but NOT in the other? I have 2 tables that should contain the same records (based on shipping report number), so my join will use this field. How can I return only the data where the shipping report number appears in just one of the tables?
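One way to do this is a FULL OUTER JOIN that keeps only the rows with no match on one side. The sketch below assumes two hypothetical tables, TableA and TableB, each with a ShippingReportNumber column that is never NULL in the stored data.

```sql
-- Hypothetical tables TableA and TableB; ShippingReportNumber assumed NOT NULL.
SELECT COALESCE(a.ShippingReportNumber, b.ShippingReportNumber) AS ShippingReportNumber,
       CASE WHEN b.ShippingReportNumber IS NULL
            THEN 'Only in TableA'
            ELSE 'Only in TableB'
       END AS FoundIn
FROM TableA AS a
FULL OUTER JOIN TableB AS b
    ON a.ShippingReportNumber = b.ShippingReportNumber
WHERE a.ShippingReportNumber IS NULL
   OR b.ShippingReportNumber IS NULL;
```

If only one direction is needed, for example rows in TableA that are missing from TableB, a LEFT JOIN with WHERE b.ShippingReportNumber IS NULL or a NOT EXISTS subquery does the same job.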