I have a question regarding whether or not Data Mining can be utilized in a specific problem I have to solve.
Situation: I'm going to simplify the problem by explaining it in terms of a "pizza manufacturer". Suppose I wanted to predict the run minutes + downtime minutes (I use these to get an hourly rate: Pizzas / (run hrs + delay hrs) = Pizzas per hour) by looking at a set of input properties.
My properties could be something like the following:
# of Toppings
# of Special Pricing Stickers
Cardboard Box Indicator
Case Indicator (0 represents auto-casing, 1 represents putting in case by hand)
Machine Type (0 or 1: 0 represents an older, slower machine, 1 is newer)
Quantity of Run
(there could be up to 15 other properties that may or may not impact our rate)
Measured Values:
Run Minutes
Delay (down) minutes
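The hourly-rate formula quoted above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample numbers are made up, and it assumes both measured values are recorded in minutes.

```python
def pizzas_per_hour(pizzas: int, run_minutes: float, delay_minutes: float) -> float:
    """Pizzas / (run hrs + delay hrs), with both durations given in minutes."""
    total_hours = (run_minutes + delay_minutes) / 60.0
    if total_hours <= 0:
        raise ValueError("run + delay time must be positive")
    return pizzas / total_hours


# Hypothetical run: 1200 pizzas, 90 run minutes, 30 delay minutes (2 hours total).
print(pizzas_per_hour(1200, 90, 30))
```

Predicting run minutes and delay minutes separately and combining them afterwards keeps the rate consistent with how it is measured.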
Steps I've Done So Far:
I've created a couple of different data mining models for this as I was unsure which one(s) to use. I checked the lift chart while feeding back in the original data set and my scatter plot appeared fairly inaccurate.
I've attempted to use Excel to create a linear regression; however, my R-squared value was always around 0.30. I decided to try SQL Server Data Mining to see if it could predict our rate more accurately than a linear formula.
I've played with a couple different algorithms in Data Mining, and it appeared that none of them did exceptionally well with prediction. I even checked the lift chart using the same table as I used to train the model.
What algorithm(s) might work the best?
Can I reasonably expect a prediction within a fairly strict tolerance (I'm guessing the answer to this is: "yes, if your source data represents a consistent pattern")?
How can I best utilize Data Mining to give an answer like "historically, your run rate has been between these 2 values with a probability of X"? I'm thinking I can utilize PredictProbability and stdev to some extent.
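Independently of any mining model, a statement of the form "historically, the rate was between A and B with probability X" can be read straight off the historical rates. The Python sketch below shows that idea under stated assumptions: the function names and the sample rates are hypothetical, and the "probability" here is simply the observed share of past runs, not a model prediction.

```python
from statistics import quantiles


def share_within(rates, low, high):
    """Fraction of historical rates that fall inside [low, high]."""
    if not rates:
        raise ValueError("need at least one historical rate")
    return sum(1 for r in rates if low <= r <= high) / len(rates)


def central_interval(rates, low_pct=5, high_pct=95):
    """Empirical interval between two percentiles of the historical rates."""
    if not 1 <= low_pct < high_pct <= 99:
        raise ValueError("percentiles must satisfy 1 <= low < high <= 99")
    cuts = quantiles(rates, n=100, method="inclusive")  # 99 cut points
    return cuts[low_pct - 1], cuts[high_pct - 1]


# Hypothetical pizzas-per-hour figures from past runs.
history = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
print(share_within(history, 120, 170))
print(central_interval(history))
```

Grouping the history by the input properties first (machine type, casing method, and so on) would give a separate interval per group.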
Any suggestions would be greatly appreciated.
If anyone needs further clarification, please let me know.
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[table_Data]') AND type in (N'U'))
    DROP TABLE [dbo].[table_Data]
GO
/****** Object: Table [dbo].[table_Data] Script Date: 04/21/2015 22:07:49 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[table_Data]') AND type in (N'U'))
Hi! I have created a DMM using Trees. But when I go to the Mining Model Prediction tab and select a Predict function, I get this in the criteria column: <Scalar column reference>[, EXCLUDE_NULL|INCLUDE_NULL][, INCLUDE_NODE_ID]. When I select Result, I get this error: "An incorrect number of arguments are used in the function at line 3, column 3." I'm predicting a continuous variable.
But when I delete everything except <Scalar column reference> I get this error: "Parser: The syntax for '<' is incorrect."
When I delete everything in the criteria column, I get this: "Query execution failed."
If I change the criteria to "<Scalar column reference>,INCLUDE_NULL, INCLUDE_NODE_ID" I get the error again that the query execution failed.
I'm working from a data set I created. I had no problems with predictions using clustering, but can't seem to get Trees to work.
I would like to create a simple regression equation to predict player win on their next trip. I have tried to create the model using a linear regression tree based on two players (as a test). The result gives me a single node (expected) with only a coefficient instead of a regression equation. I can do this math by hand to get a regression equation and predicted value for the next trip for each player.
The dataset I used for a simple test is:
Trip #   Player   Win
1        1001     1,250
1        1002     50
2        1001     1,450
2        1002     75
3        1001     1,600
3        1002     100
4        1001     2,000
4        1002     175
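For reference, the by-hand calculation mentioned above is an ordinary least-squares line of Win against Trip #, fitted separately per player. A minimal Python sketch follows; the function name is illustrative, and this is plain least squares rather than what the Microsoft Linear Regression algorithm does internally.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


trips = [1, 2, 3, 4]
wins = {1001: [1250, 1450, 1600, 2000], 1002: [50, 75, 100, 175]}

for player, ys in wins.items():
    slope, intercept = fit_line(trips, ys)
    next_trip = intercept + slope * 5  # predicted win on trip 5
    print(player, slope, intercept, next_trip)
```

On the test data this gives Win = 975 + 240 * Trip for player 1001 (2,175 on trip 5) and Win = 0 + 40 * Trip for player 1002 (200 on trip 5).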
I also tried to predict next trip worth using a forecasting model. I was able to process the model but I was not able to browse the model content in the viewer.
Ultimately, I want to predict next trip worth for individual players off of a cube. The cube has about 1.5-6M records (multiple records per player), depending on the data source.
FYI - I have created a working linear regression and a forecasting model off of a cube --- I think I am setting it up correctly.
Hi. In this site's sample "generate DMX creation statement for a server mining structure and contained model", I didn't understand what the output is. You say the output is like a table, but I don't know what the use of this sample is, because my request was about showing the result of the time series algorithm. Please note my question: should I use a query like "SELECT PredictTimeSeries(amount) From [Forecasting]" to connect to the forecasting model, and then save this value in the text box? Also, do I need to train the model before showing the value in the text box, and at which stage should I do it: after creating the model or not? In the time series algorithm, is training equivalent to historical prediction? In this code, why are these items unknown, so that we get errors after running it? We get errors on server in server.Connect, on Database in "Database db", and on MiningStructure in "MiningStructure ms".
private void button1_Click(object sender, EventArgs e)
{
    Microsoft.AnalysisServices.Server server = new Server();
    server.Connect("data source=localhost");
    Database db = server.Databases["DMClass"];

    foreach (MiningStructure ms in db.MiningStructures)
    {
        MessageBox.Show("Processing " + ms.Name);
        ms.Process(ProcessType.ProcessDefault);
    }

    MiningStructure msIris = db.MiningStructures["Iris"];
    MiningModel mm = msIris.CreateMiningModel(true, "newModel");
}
Please explain this code a little. What is the difference between AMO and ADOMD? Can I use either of them for connecting and for viewing the prediction value in a text box? And how can I build a child form? Thanks a lot for your answers.
Actor train nested table:
ID   MovieID   Gender
1    1         F
2    1         M
3    1         F
4    1         F
5    2         M
6    2         M
7    2         F
8    3         F
9    3         F
10   4         M
11   4         M
12   4         F
13   4         F
14   5         F
15   5         M
We want to build a classifier model in order to predict the Class of a Movie based on the Gender of the movie's actors. To deal with the nested table, Analysis Services maps each record of the nested table to an attribute of the case table. These attributes are named Actor(n).Gender with n = 1..15, so they depend on the nested table record numbers. Both the Microsoft Decision Trees and Microsoft Naive Bayes algorithms use these attributes without any modification.
We are implementing a Relational Naive Bayes algorithm and we are planning to aggregate such attributes in order to make them independent of the nested table record numbers.
As the next step, we tried to predict some unseen cases, and here we face a very big problem.
Let's take two more tables of unseen cases:
Movie test table:
ID   Class
6    +
7    NULL
8    NULL
Actor test nested table:
ID   MovieID   Gender
1    6         F
2    6         M
3    6         F
4    6         F
16   7         F
17   7         M
18   7         F
19   7         F
20   7         F
21   8         M
22   8         M
23   8         F
Predicting the Class of movie 6 is not a problem: its actors were included in the training dataset, so when the records are mapped to attributes, those attributes already exist in the model. But when you try to predict movies (7 and 8) with unseen actors, all the new attributes are simply ignored in the ALGORITHM::Predict call (in_ulCaseValues is zero!) because they do not exist in the model!
set dateformat dmy
create table tbl_sampemptable (employeeid int, StartDate datetime)
declare @employeeid int
set @employeeid = 1
declare @startdate datetime, @enddate datetime
while (@employeeid <= 1000)
begin
    set @startdate = '01/05/2008'
    set @enddate = '31/05/2008'
    while (@startdate <= @enddate)
    begin
        if (@employeeid <> 1 and @startdate <> '02/05/2008')
            insert into tbl_sampemptable values (@employeeid, @startdate)
        else if (@employeeid = 1)
            insert into tbl_sampemptable values (@employeeid, @startdate)
        set @startdate = dateadd(day, 1, @startdate)
    end
    set @employeeid = @employeeid + 1
end
select * from tbl_sampemptable
drop table tbl_sampemptable
set dateformat mdy
I have to select records from this table depending on a @count parameter. Depending on this parameter's value, it should fetch sequential dates. For example, if @count = 2 then the result should be like this:
EmployeeID   FromDate     ToDate
1            01/05/2008   02/05/2008
1            03/05/2008   04/05/2008
.
.
2            03/05/2008   04/05/2008   // note that 01/05/2008 is not selected because 02/05/2008 is missing
2            05/05/2008   06/05/2008
.
.
3            03/05/2008   04/05/2008   // note that 01/05/2008 is not selected because 02/05/2008 is missing
3            05/05/2008   06/05/2008
.
.
If @count = 3 then the result should be like this:
EmployeeID   FromDate     ToDate
1            01/05/2008   03/05/2008
1            04/05/2008   06/05/2008
.
.
2            03/05/2008   05/05/2008   // note that 01/05/2008 is not selected because 02/05/2008 is missing
2            06/05/2008   08/05/2008
.
.
3            03/05/2008   05/05/2008   // note that 01/05/2008 is not selected because 02/05/2008 is missing
3            06/05/2008   08/05/2008
.
.
How can I do this? Please help me. Thanks in advance.
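The expected output above amounts to walking each employee's dates in order, restarting whenever a day is missing, and emitting a (FromDate, ToDate) pair every time @count consecutive days have been collected. Here is a small Python sketch of that logic for a single employee; it is not T-SQL, and the function name is made up for illustration.

```python
from datetime import date, timedelta


def sequential_blocks(dates, count):
    """Return (from_date, to_date) pairs covering `count` consecutive days each."""
    blocks = []
    run = []
    for d in sorted(set(dates)):
        if run and d - run[-1] != timedelta(days=1):
            run = []  # a day is missing: drop the incomplete run
        run.append(d)
        if len(run) == count:
            blocks.append((run[0], run[-1]))
            run = []
    return blocks


# Dates for an employee other than 1: all of May 2008 except 02/05/2008.
may = [date(2008, 5, day) for day in range(1, 32) if day != 2]
for from_date, to_date in sequential_blocks(may, 2)[:2]:
    print(from_date, to_date)
```

In SQL the same grouping is usually expressed as a "gaps and islands" query, which needs the ROW_NUMBER() function available from SQL Server 2005 onward.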
I am fairly new to SQL 2005 and have taken over a migration project from 2000 to 2005. One of our scheduled jobs seems to run forever, but does not report any errors. This did not happen in the past, so I was wondering: if the agent settings for Replication Merge include the -Continuous parameter, will the job ever complete, or does it really run "continuously"?
I've a series of dates which record the date of an event. I would like to count the # of continuous days for the event. In this case, it would be 14/5, 15/5, 16/5, 17/5, 18/5, 19/5 and 20/5. Any idea how to do this in SQL?
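The counting logic can be sketched in Python as follows: sort the dates, extend the current run while each date is exactly one day after the previous one, and start a new run otherwise. The sample dates below are hypothetical (the original data was not posted), chosen so that the longest run is 14/5 to 20/5.

```python
from datetime import date, timedelta


def consecutive_runs(dates):
    """Return (start, end, length_in_days) for each run of consecutive dates."""
    runs = []
    for d in sorted(set(dates)):
        if runs and d - runs[-1][1] == timedelta(days=1):
            runs[-1][1] = d  # extend the current run
        else:
            runs.append([d, d])  # start a new run
    return [(start, end, (end - start).days + 1) for start, end in runs]


# Hypothetical event dates; the year is an assumption.
events = [date(2008, 5, 10)] + [date(2008, 5, day) for day in range(14, 21)] + [date(2008, 5, 25)]
longest = max(consecutive_runs(events), key=lambda run: run[2])
print(longest)
```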
I have a question about the discretization of continuous attribute values. How does it work? I need this information for my thesis. I have a continuous attribute, namely SKS, with range 0-20. When I use the Microsoft Decision Trees algorithm, this attribute is split into SKS <= 18 and SKS > 18. I want to know how it arrives at 18 as the split point rather than some other number.
One more question about the Microsoft Decision Trees algorithm, about the Complexity_Penalty parameter. How does it affect the algorithm? For example, if I set this value to 0.1, what does it mean and how does it relate to the growth of the tree?
Thanks a lot before for your kindness to answer my questions.. :-)
Hi All, I was wondering if there was a way to specify a range when training a model to predict continuous variables. For instance, the predicted variable can only have a range of 1 - 10.
Does anyone know exactly how to create a trace that runs continuously on a server and writes the data to a table? I know how to create a trace file with the Profiler, but I want something set up so that I don't have to have the Profiler running on the server all the time, as well as something that will restart itself if the server is rebooted. I have been looking at the xp_trace.* procedures. Is this the way to do it?
I have to trap login information in a table and have a scheduled job that runs once a month, looks for specific data in the table and sends out e-mails based on certain values.
I have written the procedure which does this; I just need to know how to set up the trace so it runs in the background continuously.
I have a table with the data below. The requirement is to replace every run of 6 or more consecutive digits with 'x'. Runs of fewer than 6 digits should not be replaced.
create table t1 (name varchar(100))
GO
INSERT into t1 select '1234ABC123456XYZ1234567890ADS'
GO
INSERT into t1 select 'cbv736456XYZ543534534545XLS'
GO
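To pin down the expected result, here is the rule expressed as a regular expression in Python (not T-SQL, which has no built-in regex replace). The post does not say whether a run becomes a single 'x' or one 'x' per digit, so the hypothetical `per_digit` flag covers both readings.

```python
import re

# Six or more ASCII digits in a row.
DIGIT_RUN = re.compile(r"[0-9]{6,}")


def mask_long_digit_runs(text, per_digit=False):
    """Replace each run of 6+ digits with 'x' (or with one 'x' per digit)."""
    if per_digit:
        return DIGIT_RUN.sub(lambda m: "x" * len(m.group()), text)
    return DIGIT_RUN.sub("x", text)


for name in ("1234ABC123456XYZ1234567890ADS", "cbv736456XYZ543534534545XLS"):
    print(mask_long_digit_runs(name))
```

With a single 'x' per run, the two sample rows become 1234ABCxXYZxADS and cbvxXYZxXLS; the leading 1234 is untouched because it is only four digits long.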
I'm currently creating an SSIS package that takes data from 3 different databases: a SQL DB, a FoxPro DB, and an Oracle DB. The data is pulled, cleansed and put into a single SQL 2005 table. The data is then pulled from this table every 15 minutes, formatted to a given specification and uploaded to an FTP site. This part is done. My question is this:
This package needs to run around the clock, non-stop. How can the package be set up to do this? It needs to pull data from the 3 DBs and put it in the common table, wait 15 minutes and do it again. Wait 15 more minutes and do it again. And so forth. The problem I'm having is that I don't see a way to set up an SSIS package so that it runs around the clock.
On same premise, I have another issue. When I try to take data from the common table and there is nothing there, it causes an error. Is there some way that you can run a test like
SELECT * FROM _table_ WHERE is_sent = 0
if results == 0 { wait 15 minutes and test again. }
else { write flat file, wait 15 minutes. }
This has to be done in the Control Flow scope, so I can't use a conditional split. This is a pretty big deal as this needs to run around the clock. Thank you in advance for your assistance.
I was wondering if anybody knows how to solve this problem. Here's the setup.
There is an ASP.NET application running on a local web server at the customer's location; it currently uses an MSDE backend database. There is a copy of the application at the customer's web hosting company so that it can be accessed from outside the customer's location; that copy runs on a full version of SQL Server 2000. We have this setup because the customer's ISP is not very reliable and the customer needs to be able to use the application even when their web connection is down. It is also used from outside their location by sales people, management and remote offices. The problem is that we want to keep both databases synchronized. We had been using Merge Replication, which was working fine until the local ID jumped because it had run out of allocated identities. This causes a problem for their accounting because now there is a gap in the document numbers.
Is there a way to have the identity field (or a generated document number) to remain continuous and unique across both databases? This needs to also work if one of the databases were to go down for a time or lose connectivity between the two servers. I'm looking for any option. We also have the option of upgrading the application to SQL Server 2005 if needed. Any ideas are appreciated.
I have set up merge replication and it works nicely.
I have set it up to work continuously, because I thought that if it can't find the subscriber or is offline then that's fine it will just sync again when it's back on line.
This is true
BUT it keeps throwing lots of messages into the event log to tell me the merge has failed.
SO
a. Can I just turn off the error reporting?
or
b. How can I get it to sync this way automatically on connection without the error messages?
I have one data pump in a series that was pumping in too many records. Doing an independent query of the source table, I found there were about 140,000 records. My pump uses a variable for the source query, nothing fancy, just a simple SELECT * FROM table WHERE DateField > '4/6/2006 12:00:00AM'. The destination is local on the SQL Server, is set by a variable, and does a fast load. When I went away and checked in BIDS while it was running (the data flow tab where you can see the record count), it was at 28,000,000 and still going!
Any ideas what could be causing this? As I say there are only 140,000 records and no joins in the query--is this a bug someone has run into before?
It seems I face a problem with the Microsoft Decision Trees model when I have a predictable variable that is continuous. I created the whole model according to the AdventureWorks tutorial (which says that the same procedure is followed with a continuous variable) and I flagged the variable as continuous. Even though everything seems to be going well, the results I get are not correct (after a cross-check with another project already done and checked). Is there something I am missing or that I skipped while creating the model? Any suggestions that may help me are appreciated. Thank you in advance.
Hi all, how can I find the continuous dates from a given date range in SQL Server 2005? E.g. for 2007-01-27 and 2007-02-02 the output should be:
2007-01-27
2007-01-28
2007-01-29
2007-01-30
2007-01-31
2007-02-01
2007-02-02
Any suggestions?
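The expansion itself is just "start date plus 0, 1, 2, ... days until the end date". A short Python sketch of that idea is below (the function name is illustrative). In SQL Server 2005 the same thing is typically done with a numbers table or a recursive CTE combined with DATEADD.

```python
from datetime import date, timedelta


def dates_between(start, end):
    """Every calendar date from start to end, both inclusive."""
    return [start + timedelta(days=offset) for offset in range((end - start).days + 1)]


for d in dates_between(date(2007, 1, 27), date(2007, 2, 2)):
    print(d.isoformat())
```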
I am trying to find the members who have a MonthStartDate continuously for 11 months.
In my example, member 123 won't have a MonthStartDate continuously for 11 months because it has a break for February ('2014-02-01'), whereas 222 and 223 have 11 continuous months, so I need to pull such members, i.e. find the members who were continuously enrolled for 11 months.
Below is the sample data I am referring to.
memid   MonthStartDate
123     2014-01-01
123     2014-03-01
123     2014-04-01
123     2014-05-01
123     2014-06-01
123     2014-07-01
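One way to reason about the check: turn each MonthStartDate into a month index (year * 12 + month), then find the longest stretch where the index increases by exactly 1. The Python sketch below illustrates this; the function name is made up, and the rows for member 222 are hypothetical because the post does not include them.

```python
from datetime import date


def longest_month_run(month_starts):
    """Length of the longest stretch of consecutive calendar months."""
    indexes = sorted({d.year * 12 + d.month for d in month_starts})
    best = run = 0
    previous = None
    for index in indexes:
        run = run + 1 if previous is not None and index == previous + 1 else 1
        best = max(best, run)
        previous = index
    return best


enrolment = {
    123: [date(2014, m, 1) for m in (1, 3, 4, 5, 6, 7)],  # rows from the post
    222: [date(2014, m, 1) for m in range(1, 12)],        # hypothetical rows
}
qualifying = [memid for memid, months in enrolment.items() if longest_month_run(months) >= 11]
print(qualifying)
```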
I am getting a SQL timeout exception after a long run of 15-20 hours. Please find the attachment for more details. My SQL queries are not taking much time to execute; they are simple UPDATE/INSERT statements against a local database.
I also checked Activity Monitor, and it looks fine. One more application connected to the same database (INSERT statements) works fine. In the connection string I have set ConnectTimeout = 2147483647 (the maximum).
I have a SQL 2000 Server that had a database called ABC, which was moved to another server on 5-15-2007. I kept the ABC database in Read_Only mode for a few days (just in case) on the old server and finally dropped it on 5-20-2007, and I think I forgot to drop the associated logins. I started seeing login failures for user 'xyz' in the error logs. When I first noticed the login failure error in the SQL Server log for login xyz, I deleted the xyz login, but it did not stop the errors.
I have been trying since then, with no luck in identifying the cause or resolving this issue. I ran a SQL Profiler trace and caught the user host names, the NTUserName in a few cases and the Application Name, and contacted the application owner and the users (who are in the trace) to stop any Windows service or scheduled job, anything that is pointing to the old server. Unfortunately, they are not aware of anything running against or pointing to the old server. The worst part is that the users whose host names are shown in the trace have never used the ABC database and have no idea what this is.
Here is what I found in the profiler trace:
TextData: Login failed for user 'xyz'
LoginName: xyz
NTUserName: AB00007
HostName: WAB000007
ApplicationName: Microsoft (r) Windows Script Host
Today I created the xyz login on the server and gave it reader permission on the model database to see if it would log a different error, but there was nothing new (the same login failure error). Finally, I had no solution other than restoring the ABC database back to my old server and setting it to Read_Only mode to stop these errors, and now I see the login 'xyz' firing the query against the database.
* Any hints or pointers as to why this is happening, and any possible solution?
We have a table in our legacy database system representing health insurance policies. The customer can, and usually does, renew the policy after 12 months. The legacy database uses the renewal string "99/99/9999" to signify "continuous until cancelled", in other words, "forever", or until cancelled.
We need to convert this legacy table into a SQL 2005 table which supports the concept of "forever". But how can we do this? ("99/99/9999" is not a valid SQL date type and we don't want to use varchar for dates.)
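Two common conventions for "forever" in a real date column are a NULL end date or a far-future sentinel such as 9999-12-31, which is the largest value a SQL Server datetime can hold. The Python sketch below shows the conversion step for either choice. It assumes the legacy strings are in MM/DD/YYYY format, which the post does not state, and the names are illustrative.

```python
from datetime import date, datetime

LEGACY_FOREVER = "99/99/9999"
FOREVER = date(9999, 12, 31)  # far-future sentinel; also the datetime maximum


def parse_renewal(text, use_sentinel=True):
    """Convert a legacy renewal string to a date, a sentinel, or None (NULL)."""
    cleaned = text.strip()
    if cleaned == LEGACY_FOREVER:
        return FOREVER if use_sentinel else None
    # Assumption: legacy dates are stored as MM/DD/YYYY.
    return datetime.strptime(cleaned, "%m/%d/%Y").date()


print(parse_renewal("99/99/9999"))
print(parse_renewal("06/30/2008"))
```

A sentinel keeps range comparisons such as "renewal date >= today" simple, while NULL avoids a magic value but needs explicit IS NULL handling in queries.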
Discuss the following SQL query with respect to performance in an application involving a large number of concurrent users creating and deleting records. The objective is to create continuous primary key integer values.
Table name: SitePage
Column     DataType
---------  -----------
PageID     BigINT
PageName   nchar(10)
Query to insert new record
DECLARE @intFlag INT
SET @intFlag = 0
WHILE (@intFlag = 0)
BEGIN
    BEGIN TRY
[Code] ....
We don't want to use an auto-increment integer value for the primary key, for the following reason:
[URL] .....
We also don't want to use SEQUENCE, as we would have to create 50 sequences for 50 tables.
Paper is 21cm x 9cm. Printer is Epson LX-300. When I set this paper size, SSRS turns the orientation to landscape and prints as if rotated clockwise to the right! I tried creating a custom paper size in the print server options, without success. I also tried setting the same paper size in Report Builder and Print Server, but failed again.
I have a report that has one subreport. I am finding that if the entire content of the subreport will not fit within the space remaining on the page that it will not start displaying data from that subreport until the next page of the report, leaving a blank section in the report. I would like it to display as much as possible on the first page and then continue on subsequent pages.
I have a job where the first step starts and checks for a condition. If it's not true, I want it to reset itself and start again in 10 minutes. I'm using sp_stop_job and sp_update_jobschedule and, initially, it looks like it works. But since it's a Daily job, the 'Next Run Date' increments to the following day. Even though I'm using sp_update_jobschedule to keep the active_start_date as the same day, it still increments. I've tried updating sysjobschedules directly, but get the same results.
Any thoughts much appreciated! Here's my code:

USE msdb

-- This is the part that goes in the job step
-- and increments the next_run_time if the condition is true.

If (Select count('x') from mytable (NoLock) Where PublicationDate > getdate()) < 1
BEGIN
    Declare @ActiveStartDate int
    Declare @ActiveStartTime int

    Select @ActiveStartDate = active_start_date from msdb.dbo.sysjobschedules (NoLock) Where schedule_id = 61
    Select @ActiveStartTime = active_start_time from msdb.dbo.sysjobschedules (NoLock) Where schedule_id = 61
I am trying to develop a sql statement that will create a recordset of the min (or max) values in x minute increments over a period of time.
e.g. over a period of 7 days, I have data that was collected in 1 minute intervals. I need to know the min (or max) value in each 10 minute interval over that same period of time.
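The underlying idea is to round each timestamp down to the start of its x-minute interval and then take MIN and MAX per interval. A small Python sketch of that bucketing follows; the function name and readings are hypothetical, and it assumes the interval length divides 60 evenly (10, 15, 30 and so on). In T-SQL the same rounding is commonly done with DATEDIFF and DATEADD inside a GROUP BY.

```python
from datetime import datetime


def bucket_min_max(readings, minutes=10):
    """Map each interval start to the (min, max) of the values inside it."""
    buckets = {}
    for timestamp, value in readings:
        # Round the timestamp down to the start of its interval.
        start = timestamp.replace(
            minute=timestamp.minute - timestamp.minute % minutes,
            second=0,
            microsecond=0,
        )
        low, high = buckets.get(start, (value, value))
        buckets[start] = (min(low, value), max(high, value))
    return dict(sorted(buckets.items()))


# Hypothetical readings.
sample = [
    (datetime(2008, 5, 1, 10, 0), 5),
    (datetime(2008, 5, 1, 10, 3), 2),
    (datetime(2008, 5, 1, 10, 9), 7),
    (datetime(2008, 5, 1, 10, 10), 4),
    (datetime(2008, 5, 1, 10, 19), 9),
]
for start, (low, high) in bucket_min_max(sample).items():
    print(start, low, high)
```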
Hey all, I need to find the ratio of the difference between 2 datetime variables to the difference between another 2 datetime variables. I figured the best way to do it is to convert the differences in both the numerator and the denominator to a number of minutes.
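The arithmetic described above, sketched in Python with hypothetical names and sample values: convert each difference to minutes, then divide, guarding against a zero-length denominator. In T-SQL, DATEDIFF(minute, ...) returns an integer, so one side needs casting to a decimal or float type before dividing to avoid integer division.

```python
from datetime import datetime


def minutes_between(start, end):
    """Elapsed time from start to end, in minutes."""
    return (end - start).total_seconds() / 60.0


def duration_ratio(start_a, end_a, start_b, end_b):
    """Ratio of interval A's length to interval B's length."""
    denominator = minutes_between(start_b, end_b)
    if denominator == 0:
        raise ZeroDivisionError("second interval has zero length")
    return minutes_between(start_a, end_a) / denominator


# Hypothetical values: a 90-minute interval against a 45-minute interval.
ratio = duration_ratio(
    datetime(2008, 5, 1, 8, 0), datetime(2008, 5, 1, 9, 30),
    datetime(2008, 5, 1, 10, 0), datetime(2008, 5, 1, 10, 45),
)
print(ratio)
```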