I am about to prepare a paper concerning the field of real-time data mining. Real-time here means the process of incremental training of an existing model as soon as the data arrives.
There are a number of papers introducing algorithms for incremental association analysis, incremental clustering, etc. Stream mining is a closely related field. The main reasons for implementing incremental algorithms are a) the large amount of data to be mined and b) the high rate at which new data arrives every day.
Using classical batch mining algorithms, models that have become outdated for some reason would have to be re-trained, which could be very time consuming for billions of records. And once the training is completed, it would have to be restarted yet again because a new bulk of data has arrived in the meantime.
The question that I would like to discuss now is: for what real-world applications would it be meaningful or even essential to use real-time training of models?
Two main reasons could determine the answer to that question:
You just want to incorporate new data into existing models in order to increase the prediction accuracy of your model, or
Your underlying data is subject to more or less massive changes (also referred to as concept drift) and you want to adapt your mining model continuously to that reality.
I'm looking for some examples or ideas where one of these cases applies and it would be a good idea to have incremental mining algorithms involved.
I'm looking forward to inspiring some discussion on that issue.
In an SSAS database I have created a data mining structure using the Time Series algorithm. While processing the SSAS database, the data mining structure takes a long time to process. How can we reduce the processing time?
I currently have an application which retrieves stats from a SQL database that are updated throughout the day. My application polls every 3 seconds to retrieve these stats, but I found that it's quite expensive for the server's CPU when several clients are running my application. Are there any other methods for extracting the data that won't require so much CPU? The fact that the database is being hit so much isn't good either. I have already written a web service hoping that this would improve things, but I'm not sure it has.
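One pattern I am considering to cut the polling cost is to let each client ask only for rows changed since its last check, for example via a rowversion column. A minimal sketch, where the table and column names (dbo.Stats, StatId, StatValue) are made up for illustration:

-- Add a rowversion column once; SQL Server maintains it automatically
ALTER TABLE dbo.Stats ADD RowVer rowversion;
CREATE INDEX IX_Stats_RowVer ON dbo.Stats (RowVer);

-- Each client remembers the highest RowVer it has seen and polls with it
DECLARE @LastVer binary(8);
SET @LastVer = 0x0000000000000000;   -- in practice, the client's last-seen value

SELECT StatId, StatValue, RowVer
FROM dbo.Stats
WHERE RowVer > @LastVer;

When nothing has changed, the query returns zero rows and, thanks to the index, costs almost nothing, so many clients polling every few seconds becomes far cheaper than re-reading all the stats each time.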
Hi, I'm a SysAdmin and I'm implementing ASSP (anti-spam) to use with an email server. But ASSP requires a txt file with all domains hosted on the server, which are stored in a SQL table on that server. My question is: is there any way to produce a real-time TXT file with all the domains? Maybe using a trigger, I don't know.
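The trigger idea I have in mind would look roughly like this, assuming the domains live in a hypothetical table dbo.Domains (DomainName) and that xp_cmdshell is enabled with write access to the target path (all names and paths are placeholders):

-- Re-export the TXT file whenever the domains table changes
CREATE TRIGGER trg_Domains_Export
ON dbo.Domains
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- bcp writes one domain per line; database, path and server are placeholders
    EXEC master.dbo.xp_cmdshell
        'bcp "SELECT DomainName FROM MailDb.dbo.Domains" queryout "C:\assp\domains.txt" -c -T -S MYSERVER',
        no_output;
END;

I realize running bcp inside a trigger slows down every write to the table; a gentler variant would be a SQL Agent job that rebuilds the file every minute or so, which for ASSP's purposes is probably close enough to real time.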
I was wondering about the stability of SSIS when it comes to importing data on a real-time basis. Let's say you have a scenario where flat files, for instance, will be dropped at random intervals ranging from 1 second to 10 seconds apart and the importer has to import these files immediately.
I would imagine that this is done with a package which runs a loop watching the directory forever, but I stand to be corrected on the best way of doing it.
I'm not too sure whether SSIS is a good idea for this, as lots of people in my company have made negative comments about SSIS in real-time scenarios, but they cannot elaborate enough to convince me. I have done some pretty cool stuff with it and must admit that I am a fan of the technology, but I don't want to defend it into a corner.
I want to build a Windows application in Visual Studio 2005 that grabs some data from a SQL DB and displays it in a gridview. That is easy and I already have it done, but I also want to know how to show this data in real time. For example: right now we have an application that pulls some login information for all the users and displays their phone extension, their name, and their status. They can change their status by clicking on some radio buttons below the datagrid, and on change it updates the data and then posts back. However, I don't know how the guy who did it made it so that these local changes are automatically shown on everyone else's computers around the office. In other words, they are shown the live data as it changes, without having to refresh anything. How would I go about doing this, or does anyone have any places/resources that could help me accomplish this task? Thanks. Oh, and the guy who wrote this is no longer working here and he deleted the source code, so I can't look at it.
I have no idea where to post this kind of question, so here it is!
I have a requirement to retrieve Oracle 10g data into SS2000 in as near real time as possible (stupid users!) and join it with resident SS data for on-demand reporting. (We use SS replication to populate some reporting tables from other SS2000 instances, and this has spoiled the users as well as the developers!)
I would like to know if there are any clever ways of doing this, or if a plain old DTS package running in some kind of loop is the practical answer. A 1-minute delay is probably too long... I don't know if data can be pushed from the Oracle side, or if we need to write a service and use it to pull and push.
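The plainest thing I can think of is an Oracle linked server polled by a frequently scheduled job; a sketch, where the server name, TNS alias, and table names are all placeholders:

-- One-time setup: register the Oracle instance as a linked server
EXEC sp_addlinkedserver
    @server = 'ORA10',
    @srvproduct = 'Oracle',
    @provider = 'MSDAORA',
    @datasrc = 'OraTnsAlias';    -- TNS alias for the Oracle 10g instance

-- Job step, run as often as the schedule allows: pull recent rows across
INSERT INTO dbo.OraStaging (Id, Amount, ChangedAt)
SELECT Id, Amount, ChangedAt
FROM OPENQUERY(ORA10,
    'SELECT id, amount, changed_at FROM app.orders WHERE changed_at > SYSDATE - 1/1440');

The latency is then bounded by how often the scheduler will fire, which is why a true push from the Oracle side (via a gateway or a custom service) may still be needed if sub-minute delay is a hard requirement.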
Is it possible to group many time series into clusters using the clustering algorithm of SQL Server 2005? The same question applies to the "association rules" technique. Any examples?
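What I am imagining is flattening each series into one row per case (for example, one column per period). A minimal DMX sketch, with entirely made-up column names, of a clustering model over twelve monthly values per series:

// Each case is one series; M01..M12 hold its monthly values
CREATE MINING MODEL [SeriesClusters]
(
    [SeriesId] TEXT KEY,
    [M01] DOUBLE CONTINUOUS,
    [M02] DOUBLE CONTINUOUS,
    // ... M03 through M11 declared the same way ...
    [M12] DOUBLE CONTINUOUS
)
USING Microsoft_Clustering (CLUSTER_COUNT = 5)

The model would then be trained with an INSERT INTO statement feeding the pivoted rows; presumably the same pivoting idea would apply if Microsoft_Association_Rules were used instead.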
I am wondering where I can store my mining results in the data mining engine. For example, I got mining results like the accuracy chart, decision trees, and other formats of results based on the different mining algorithms I used, so where can I actually store these results for Reporting Services to use later? Is it possible to do that in SQL Server 2005?
Thanks a lot for any help and guidance in advance.
We have a set of cubes and dimensions, and we're experimenting with data mining against the cubes (primarily for forecasting applications). We have a custom time dimension (which we call Calendar), not generated by the BI Studio wizard. The dimension has year/month/day/hour/... attributes. But when I try to add this Calendar dimension to the mining structure as a nested table using BI Studio, it only shows the Year attribute, not the others. Other dimensions seem to show all their attributes.
Is there something we've done wrong in defining our time dimension? What determines which attributes show up as available for selection in BI studio?
Is there a way to keep track in real time of how long a stored procedure has been running? What I want to do is fire off a trace if a stored procedure has been running for more than, say, 5 minutes.
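What I have so far (assuming SQL Server 2005 or later) is to poll the DMVs for long-running requests; a sketch:

-- Find requests that have been executing for more than 5 minutes
SELECT r.session_id,
       r.start_time,
       DATEDIFF(minute, r.start_time, GETDATE()) AS minutes_running,
       t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE DATEDIFF(minute, r.start_time, GETDATE()) >= 5
  AND r.session_id <> @@SPID;

A SQL Agent job could run this every minute and, when it returns rows, kick off a trace (or simply log the offending statements).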
I would like to know if there is any way to integrate third-party data mining packages with the SQL Server 2005 data mining algorithms, so that we can run a comparison among all of them to get the best results for training models.
Hoping someone will have a solution for this error
Errors in the metadata manager. The data type of the '~CaseDetail ~MG-Fact Voic~6' measure must be the same as its source data type. This is because the aggregate function is not set to count or distinct count.
Is the problem that the data type of the column used in the mining structure is Long, while the underlying field in the cube has a type of BigInt, or am I barking up the wrong tree?
I'm a beginner with SQL 2012 SSDT & SSMS. I get this error message when I try to deploy my project:
"Error 6 Error (Data mining): KEY SEQUENCE columns are not supported at the case level. The 'Customer Key' column of the 'TK448 Ch09 Cube Clustering' mining structure contains content that is not valid. 0 0 " I am finding it hard to locate the content that is not valid. I've been trying to find a answer for this problem but can't seem to find anything. How can I locate the content that is not valid and change or delete it so that I can deploy this solution?
I have:
- a data mining structure with about 80 columns
- a data mining model using Microsoft_Decision_Trees with 2 prediction columns
I thought I would then explore the possibility of having more than 2 prediction columns, in this case 20.
I get an error message and I can't work out: a) whether this is because there is a limit on the maximum number of prediction columns, and where that maximum is stated; b) whether something else has become corrupted; or c) whether there is a known bug and whether the error message is meaningful or not.
Either way, I'm unable to complete the data mining wizard.
The error message is: "Errors in the metadata manager. Either the mining structure with the ID of '[my model Structure]' does not exist in the database with the ID of 'DMAddinsDB', or the user does not have permissions to access the object."
I am using Microsoft_Time_Series and have set HISTORIC_MODEL_GAP to various values (from 1 to 21). I always get this error: Error (Data mining): The 'HISTORIC_MODEL_GAP' data mining parameter is not valid for the 'My Time Series' model.
In the Algorithm Parameters window, this parameter is not there by default, so I have to add it.
Implementing the data mining add-in in an academic setting? We need to handle over 150 new students a semester and have their connection to Analysis Services survive for their four years at the college. We are introducing data mining to every freshman business student as a unit within their Intro to Excel class (close to a month of work, to give them a sense of what is possible). Other courses later in their curriculum will expand on that introduction.
Once implemented, we would have as many as 900 connections to manage (four years from now). It is possible that multiple sections will be running at the same time, so 40 students may be accessing the data mining tools concurrently.
Is there a way to "bulk establish" the access credentials and establish those databases?
I have a very simple time series model which processes fine without any problems. However, when I run the following query
SELECT
    [TimeSeries].[PriceChange],
    [TimeSeries].[Symbol],
    PredictTimeSeries(PriceChange, -3, 2)
FROM
    [TimeSeries]
WHERE
    [TimeSeries].[Symbol] = 'x'
I get the following error:
TITLE: Microsoft SQL Server 2005 Analysis Services. Error (Data mining): A time series prediction was requested with a start time further in the past than the internal models of the mining model, TimeSeries, specified in the HISTORIC_MODEL_GAP and HISTORIC_MODEL_COUNT parameters can process.
The following is the excerpt of the mining model script related to the two parameters:
<AlgorithmParameters>
<AlgorithmParameter>
<Name>MISSING_VALUE_SUBSTITUTION</Name>
<Value xsi:type="xsdtring">Previous</Value>
</AlgorithmParameter>
<AlgorithmParameter>
<Name>HISTORIC_MODEL_GAP</Name>
<Value xsi:type="xsd:int">1</Value>
</AlgorithmParameter>
<AlgorithmParameter>
<Name>HISTORIC_MODEL_COUNT</Name>
<Value xsi:type="xsd:int">10</Value>
</AlgorithmParameter>
</AlgorithmParameters>
These settings, HISTORIC_MODEL_GAP (1) and HISTORIC_MODEL_COUNT (10), should accommodate PredictTimeSeries(PriceChange, -3, 2). Could anyone shed some light on this?
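For what it's worth, my reading of the documentation is that the earliest valid start for a historical prediction is -(HISTORIC_MODEL_COUNT x HISTORIC_MODEL_GAP) = -(10 x 1) = -10, so a start of -3 ought to be comfortably within range. My only guess (and it is just a guess) is that the model is still carrying the old parameter values because it was not fully reprocessed after they were changed.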
Hi, when I create a new user in a .MDF file, the hour of creation in the field "createdate" of the table "membership" is exactly 2 hours less than the time on my computer. Probably something to do with the time difference between Europe and the UK, or... How can I set the creation hour to exactly the same hour as my computer? Thanks, Tartuffe
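From what I can tell, the column may be being written in UTC, which the ASP.NET membership tables do (assuming that is what this .MDF contains). Here is the conversion I am experimenting with at read time, with the standard aspnet_Membership table name assumed:

-- Assuming createdate is stored in UTC: shift it by the server's
-- current UTC offset when displaying it
SELECT DATEADD(minute,
               DATEDIFF(minute, GETUTCDATE(), GETDATE()),
               CreateDate) AS LocalCreateDate
FROM dbo.aspnet_Membership;

Storing UTC and converting on display seems safer than writing local times anyway, since it keeps the data unambiguous across time zones and daylight-saving changes.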
I have posted this question in another forum but no one has so far been able to provide a solution to the problem. Since I know Database Journal always has very informative and enlightening posts, I figured I'd post the question here in hopes that some guru can provide an answer.
If you're running SQL 2000 and have any jobs that have been executed, you could perform a query as such:
select last_run_time from msdb.dbo.sysjobsteps
and receive values that contain the last time a job was executed, stored in integer-datatype columns (see: sp_help sysjobsteps).
In SQL 2005 I believe the concept is the same. I think Microsoft's intent in doing this was to store the date separately from the time values, which won't work using the datetime datatype; I have read this in documentation in the past.
The challenge is to convert that data into a human-readable 12- or 24-hour time format such as 11:00 AM or 02:45:39.
Does anyone have any suggestions or clues to assist in resolving this problem? :(
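For reference, last_run_time is an int encoded as HHMMSS (e.g. 93005 means 09:30:05). The closest I have gotten is left-padding it to six digits and splicing in the colons:

-- last_run_time is an int encoded as HHMMSS, e.g. 93005 = 09:30:05
SELECT step_name,
       last_run_time,
       STUFF(STUFF(RIGHT('000000' + CAST(last_run_time AS varchar(6)), 6),
                   5, 0, ':'),
             3, 0, ':') AS last_run_time_24h
FROM msdb.dbo.sysjobsteps;

(A value of 0 just means the step has never run, so it renders as 00:00:00.)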
We are using SQL 7 at 2 different locations, on 2 different Domains that are connected via a T1 line.
We need to have our 2 SQL servers be identical at all times, meaning when a change is made on one server, it is immediately replicated to the other. The problem is that both servers are always being used, so there is no primary/backup server scenario.
Does anyone know if there is third-party software available, or if this is even possible to do, given that the databases are always open?
Can anyone guide me: which is the best method for real-time synchronisation of my production server? Is it transactional replication or a standby server?
I have two databases, one for production and another for reporting. In this scenario we need to transfer the real-time data to the reporting database. Which replication should we use, and what are the bottlenecks in this? Can anyone give some suggestions or some links?
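From what I have read, transactional replication is the usual fit for feeding a near real-time, read-only reporting copy. A heavily abbreviated sketch of what I think the setup looks like, with all names as placeholders and a Distributor assumed to be configured already:

-- Enable the production database for publishing
EXEC sp_replicationdboption
    @dbname = 'ProdDB', @optname = 'publish', @value = 'true';

-- Create a continuously running transactional publication
EXEC sp_addpublication
    @publication = 'ProdToReporting',
    @repl_freq = 'continuous',
    @status = 'active';

-- Add an article for each table to be replicated
EXEC sp_addarticle
    @publication = 'ProdToReporting',
    @article = 'Orders',
    @source_object = 'Orders';

-- Push the data to the reporting server
EXEC sp_addsubscription
    @publication = 'ProdToReporting',
    @subscriber = 'REPORTSRV',
    @destination_db = 'ReportingDB',
    @subscription_type = 'push';

I gather the usual bottlenecks are Log Reader and Distribution Agent latency and the distribution database itself, so presumably those are the first things to watch once it is running.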
db1 gets 150k records every hour. db2 is like a backup of db1; db2 is used to retrieve records to create reports using Reporting Services. What is the best way to synchronize db2 with db1? Is it to create a DTS package and set up a job that runs every hour?
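What I have in mind is an incremental copy keyed on an ever-increasing column rather than recopying everything each hour. A sketch, assuming a hypothetical dbo.Events table with an identity column and both databases on the same server (otherwise a linked server name would be prefixed):

-- Copy only the rows db2 has not seen yet, using the max key as a watermark
DECLARE @watermark bigint;
SELECT @watermark = ISNULL(MAX(EventId), 0) FROM db2.dbo.Events;

INSERT INTO db2.dbo.Events (EventId, Payload, CreatedAt)
SELECT EventId, Payload, CreatedAt
FROM db1.dbo.Events
WHERE EventId > @watermark;

At 150k rows per hour this stays small per run; the caveat is that it only captures inserts, so updated or deleted rows would need a different mechanism (e.g. replication).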
Off-topic question: what kind of language does Reporting Services use to create the reports?
Hi guys, I'm new to SQL Server, so is there anyone who could guide me about SQL Server 2005? I need some info about SSIS and SSAS. I have got a project and have some questions regarding that, so please let me know your number or mail ID so that I can contact you. Please, it's urgent.
Does anybody know a good way to get a (near) live view of a SQL CE database that doesn't involve ResultSets? I tried to use a ResultSet with the Sensitive option set, but it doesn't like the joins that I have to do in order to make the data make sense to the user.
The ETL processing for my project will be deployed as part of a larger application which is operated by non-technical users. Therefore I want to provide real-time feedback to the user on progress, errors, etc., using standard Windows GUI objects and style, for example a progress bar. The designer of course does some of this, but there won't be any designer in our deployed application (and the SSIS designer is neither intended to be used by, nor appropriate for, non-technical users anyway).
Now that I have the ETL logic working, it's time to tackle this UI requirement. I am thinking one way to do this is to start a script task that runs in parallel with the main ETL processing, open a WinForm in that task with various GUI objects whose state is based on package-scoped variables, update the package-level variables via the OnProgress event (or other events as needed) of the various SSIS tasks and components, and refresh the WinForm's display regularly via a timer event.
Does this approach seem like it would be effective? Has anyone tried maintaining an open WinForm via a script task and updating its objects via package variables in parallel while the package is running other tasks?
We currently have a proprietary in-memory DB that is used to store the latest transactions in the system, and we have a service that copies the data to a SQL Server every couple of seconds, for historical reporting purposes.
We would like to move to a more standard DB as our real-time DB, since we have scalability and availability issues. We thought about using SQL Server since this is the DB we know, but I'm not sure it's built to handle real-time data.
Does anyone have any experience in using SQL Server for "real time" applications?
Does anyone have any experience in storing the data files on RAM?
Does MS have a solution similar to Oracle's TimesTen, their real-time DB?