I have run into somewhat of a "duh" question. I'm using the association rules algorithm for a basket analysis, and I'm trying to get the probability of each prediction. I know this is wrong, but how do I go about running PredictProbability on each ProductPurchase prediction?
When I run the below DMX query, I get this error message...
Error (Data mining): the dot expression is not allowed in the context at line 5, column 25. Use sub-SELECT instead.
, t.[ChildrenStatusName]
, (Predict([Basket Analysis AR].[Training Product], 3)) as [ProductPurchases]
, (PredictProbability([Basket Analysis AR].[Training Product].[ProductName])) as [ProductPurchases]
[Basket Analysis AR]
, [ChildrenStatusName]
WHERE isTrainingData = 0
') AS t
[Basket Analysis AR].[Age Group Name] = t.[AgeGroupName]
AND [Basket Analysis AR].[Children Status Name] = t.[ChildrenStatusName]
I haven't been able to find a DMX query which will spit out the cases which support a particular association rule. I was hoping it would work sort of like drillthrough but show only the cases supporting a particular rule. Am I missing something?
What I ended up doing was extracting the itemsets of the rule from the model's content then running a SQL query to retrieve the cases that contain both the left-hand and right-hand itemset of the rule. I'm hoping there's a better way.
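The workaround described above (take the rule's left-hand and right-hand itemsets, then keep only the cases that contain both) can be sketched outside the database as follows. This is only an illustration of the filtering logic: the `cases` data and the `supporting_cases` helper are hypothetical and are not part of DMX or of the model content.

```python
# Hypothetical sample data: case id -> set of items in that basket.
cases = {
    101: {"bread", "milk", "eggs"},
    102: {"bread", "milk"},
    103: {"milk", "eggs"},
    104: {"bread", "butter"},
}


def supporting_cases(cases, lhs, rhs):
    """Return the ids of cases that contain every item of both itemsets."""
    needed = set(lhs) | set(rhs)
    # A case supports the rule only if the combined itemset is a subset of it.
    return sorted(cid for cid, items in cases.items() if needed <= items)


# Cases supporting the hypothetical rule {bread} -> {milk}
print(supporting_cases(cases, {"bread"}, {"milk"}))
```

The same subset test is what the SQL query has to express, typically by grouping on the case key and counting matched items.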
Can anyone tell me how the Business Intelligence Studio calculates the importance of a rule? I can't find the formula. I know some formulas, but the result in SQL Server is completely different.
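For what it's worth, the formula I recall from the Microsoft Association algorithm technical reference is Importance(A => B) = log(P(B | A) / P(B | not A)). The sketch below computes that from raw counts. Treat it as an assumption rather than the exact implementation: the base-10 logarithm and the use of unsmoothed counts are my guesses, and the function name and parameters are made up for illustration, so small differences from the viewer's numbers would not be surprising.

```python
import math


def rule_importance(n_total, n_a, n_b, n_ab):
    """Importance of rule A => B as log10(P(B|A) / P(B|not A)).

    n_total: number of cases, n_a: cases containing A,
    n_b: cases containing B, n_ab: cases containing both.
    Assumes 0 < n_a < n_total and that B occurs both with and without A,
    otherwise the ratio is undefined.
    """
    p_b_given_a = n_ab / n_a
    p_b_given_not_a = (n_b - n_ab) / (n_total - n_a)
    return math.log10(p_b_given_a / p_b_given_not_a)


# Hypothetical counts: B is more likely when A is present, so importance > 0.
print(rule_importance(n_total=1000, n_a=200, n_b=300, n_ab=120))
```

A value of 0 means A and B are independent, a positive value means A makes B more likely, and a negative value means A makes B less likely.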
I read somewhere that market basket analysis finds rules with substitutes as likely as rules with complements due to a consumer behavior called "horizontal variety seeking". This is when customers buy more than one product in the same category even though they are substitutes. For example, when people go to the grocery store and buy soda, they buy Coke and Sprite at the same time even though they are substitutes for each other. I was wondering if anyone has experience with this anomaly and how they solved it. I found a time series model called the vector autoregressive (VAR) model, which is used to find the elasticity of prices over a time period. Does anyone have experience working with the VAR model? I am having trouble figuring out what some of the variables in the model are.
I have a table containing call records, and I built a mining model from that table only. The model uses the Association Rules algorithm and has 3 columns: calling_number, called_number, and target_operator. The key is calling_number, the input is the operator, and the predicted column is called_number.
The result shows no rules, but there are itemsets of size 1 (one column) and 2 (two columns). For the top record of the result, SQL Server reports a support of 1891 for called_number = 1891 and operator = 'INDOSAT'.
I queried the table with this query
SELECT DISTINCT calling_number FROM call_records WHERE called_number = '07786000815' AND target_operator = 'INDOSAT';
It returns 2162 records instead of 1891. If I remove the DISTINCT qualifier, SQL Server returns 2159 records. Why do these counts differ from the mining result?
I have a market basket model using associations. It generated several dozen itemsets. However when I attempt to run a singleton prediction like this:
select (Predict(Orderproduct3q,INCLUDE_STATISTICS,10)) as [Recommendation]
[Case All]
(SELECT (SELECT '16407' AS [Pname])) AS t1
the resulting predictions don't take the itemsets into account. Instead, the predictions consist of the ranked products in the training set, ordered by frequency. This appears to happen regardless of the precise query specified within the "natural prediction join".
What's going on here and how do I generate a singleton prediction which makes use of the itemsets?
I've been playing around with the association mining model in SQL Server 2005 and built a market-basket analysis of my data that I'm pretty happy with. The next task is figuring out how to run DMX queries against the model I've just mined, so that we can possibly use it in a web-based application. This wouldn't necessarily be a difficult problem (and still may not be), but every example I've seen for the Mining Model Prediction Designer uses relational databases, and I built my mining model off OLAP. Therefore my predictable attribute is nested, and relating the mining model structure to the relational database that the cube was built from always gives me this error:
"Errors in the high-level relational engine. The 'CompanyName' column could not be found in the top-level clause of the SHAPE statement."
What I would like to do, and I'm not really even sure how I should structure any of my queries, is feed the model a product and have it return a listing of all the products it predicts. Currently, I've only been able to get the designer mode to process a singleton query, and even that didn't return any useful data. I know that this probably can be done pretty easily so any advice you may be able to offer would be greatly appreciated!!
So that you can better understand my question, my association mining structure hierarchy looks like this:
[Model] ProductRecommend
With that in mind, I'm trying to perform a query similar to this:
PredictProbability([ProductRecommend].[Product].[PRODUCTCLASSID]), <---- Throws Error for PredictProbability syntax no matter what I try to get to [PRODUCTCLASSID]
(SELECT [PRODUCT] FROM [ProductRecommend].[Product])
What would be the right design approach for the following problem?
I have a single table called SelectionFactors, which has the following columns and sample data:
ProjectID  Factor        FactorValue
1000       Countries     USA
1000       Countries     Canada
1000       Countries     France
1000       Languages     English
1000       Languages     French
1000       Company Type  Consulting
1000       Company Type  Software
2000       Countries     India
2000       Countries     China
2000       Countries     USA
2000       Languages     English
2000       Languages     Chinese (Simplified)
2000       Languages     Chinese (Traditional)
2000       Languages     Spanish
2000       Company Type  Retail
2000       Company Type  Dairy Products
The problem is to allow a descriptive analysis of the data to find patterns in the users' selections. For instance, if Languages->English is selected, what are the counts of projects for the other Factor->FactorValue combinations? Countries->USA = 2, Countries->Canada = 1, Company Type->Consulting = 1, and so on.
Since all the data is in this single table, are both the case and nested tables the same? What are the keys and inputs? I only need a descriptive analysis (no prediction) and ALL possible combinations MUST be part of the results; how should the model be designed?
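To make the expected output concrete, here is a small sketch of the counting itself, done outside any mining model, using the sample rows from the table above. The function name `co_selection_counts` is made up for illustration; it groups the rows by ProjectID and, for every project that contains the chosen Factor->FactorValue, counts each other selection once.

```python
from collections import Counter, defaultdict

# Sample rows from the SelectionFactors table: (ProjectID, Factor, FactorValue)
rows = [
    (1000, "Countries", "USA"),
    (1000, "Countries", "Canada"),
    (1000, "Countries", "France"),
    (1000, "Languages", "English"),
    (1000, "Languages", "French"),
    (1000, "Company Type", "Consulting"),
    (1000, "Company Type", "Software"),
    (2000, "Countries", "India"),
    (2000, "Countries", "China"),
    (2000, "Countries", "USA"),
    (2000, "Languages", "English"),
    (2000, "Languages", "Chinese (Simplified)"),
    (2000, "Languages", "Chinese (Traditional)"),
    (2000, "Languages", "Spanish"),
    (2000, "Company Type", "Retail"),
    (2000, "Company Type", "Dairy Products"),
]


def co_selection_counts(rows, factor, value):
    """Count projects per (Factor, FactorValue) among projects that selected (factor, value)."""
    by_project = defaultdict(set)
    for project_id, f, v in rows:
        by_project[project_id].add((f, v))

    counts = Counter()
    for selections in by_project.values():
        if (factor, value) in selections:
            # Each project contributes at most 1 to every other selection.
            counts.update(selections - {(factor, value)})
    return counts


counts = co_selection_counts(rows, "Languages", "English")
print(counts[("Countries", "USA")], counts[("Countries", "Canada")])
```

In mining-model terms this treats ProjectID as the case key and each Factor->FactorValue pair as a nested-table item, which is one plausible reading of the design question rather than a confirmed answer.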
I need to develop a language-specific data warehouse (DWH): product descriptions are available from an SAP system in multiple languages. English is the most important language and is the standard, but some countries also require product descriptions in their own language.
Productnr  Productdesc  Language
1          product      EN
1          produkt      DE
One option is to pivot the descriptions into one column per language, but that is not very elegant. I was thinking of using bridge tables to model this, but then you always have to select a language in a filter (I think).
I'm thinking of a technical solution, such that when a user logs on, the language is determined and a view determines whether to pick a certain product table specific for a certain language. But then I don't have the opportunity to interchange the different language specific fields in a report (or in my case PowerPivot).
We have a production server with a database on which a few DTS packages execute every night. Most of them run stored procedures that perform bulk inserts.
So we have to set the recovery model of the database to Simple for that period of time; otherwise the jobs will blow up our logs.
Is there any way to set up log shipping between our production and standby servers, but pause it for some time, set the recovery model of the primary database to Simple, execute the DTS bulk insert jobs, bring it back to the Full recovery model, and finally resume log shipping?
Is it possible? If yes, how can we achieve this?
If not, what could be another DR solution in this scenario?
Hi, I came across something like a 3-4-5 rule while going through a data mining book, but I couldn't work out where that rule was derived from or how it really works.
How can I set up the databases in SQL Server so that when I change the data in one table, the changes cascade down to the tables in my other databases? One database would hold a primary key table. If I had 15 other databases, could I somehow link them so that data changed in the primary key table of the first database would cascade down to the others?
Hi, I have a database which stores data about bus links. I want to give passengers information about the price of their journey. The price depends on three factors: the starting bus stop, the ending bus stop and the type of ticket (full, part for students and old people, ...). So I created a table with three foreign key constraints (two for bus stops and one for ticket type). When a bus stop or a ticket type is deleted, I want all data connected with it to be deleted automatically, so I wanted to use cascading deletes. But I receive the following exception: Introducing FOREIGN KEY constraint 'FK_TicketPrices_BusStops1' on table 'TicketPrices' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints. How can I achieve my task? Why should it cause cycles or multiple cascade paths?
Hi,
I have a table with the following columns:
ID INTEGER,
Name VARCHAR(32),
Surname VARCHAR(32),
GroupID INTEGER,
SubGroupOneID INTEGER,
SubGroupTwoID INTEGER
How can I create a rule/default/check which updates the SubGroupOneID & SubGroupTwoID columns when GroupID is, for example, equal to 15 on MSSQL2000?
It is impossible to make changes on the client, so I need to check the inserted/updated value of the GroupID column and automatically update the SubGroupOneID & SubGroupTwoID columns.
Sincerely,
Rustam Bogubaev
A short explanation of this function: it generates the daily shift pattern 1,1,2,2,3,3,4,5,... (shift 1 = morning, 2 = evening, 3 = night, ...) and it works OK. How do I do the following? I want to take this function and add a rule, so that it still generates the daily shift pattern 1,1,2,2,3,3,4,5,... but also applies the new rule:
If the employee gets shift 2 or 3 on Thursday (and only if it is Thursday; the weekend runs from Thursday until Sunday morning),
the order for this employee should be 2,2,2 or 3,3,3. That is, the employee must start the weekend and finish it with the same shift, but only if a series of 2 or 3 (2 = evening, 3 = night) starts on Thursday, and then the pattern continues. If the employee starts shift 2 (evening) on Thursday, the sequence is 2,2,2 and then 3,3,4,5,1,1,2,2,3,3,4,5,... If the employee starts shift 3 (night) on Thursday, the sequence is 3,3,3 and then 4,5,1,1,2,2,3,3,4,5,... So if the employee starts a series of 2 or 3 on Thursday, the employee must keep that shift through the weekend, from Thursday until Sunday morning.
So, my friends, can someone help me work out how to do this?
Code Block
-- need a list of employee ids with a basedate set to when they start with shift_code=1, unit=1
-- this is a minimal table to show the format
-- extra columns could be added with other info (e.g. name)
create table empbase
(
    empid int,
    basedate datetime
)
-- fill with test data
insert empbase (empid,basedate) values (12345,'2007/1/1')
insert empbase (empid,basedate) values (88877,'2007/1/5')
insert empbase (empid,basedate) values (98765,'2007/1/20')
insert empbase (empid,basedate) values (99994,'2007/6/5')
go
-------------------------------
create function shifts
(
    @mth tinyint,
    @yr smallint
)
returns @table_var table
(
    empid int,
    date datetime,
    shift_code int,
    unit int
)
as
-- generate daily shift pattern 1,1,2,2,3,3,4,5,... changing units 1,2,3,4,... every 30 days.
begin
    declare @d1 datetime
    declare @d31 datetime
    set @d1=convert(datetime,convert(char(8),@yr*10000+@mth*100+1))
    set @d31=dateadd(dd,-1,dateadd(mm,1,@d1))
    ;with n01 (i) as (select 0 as 'i' union all select 1)
    ,seq (n) as (
        select d1.i+(2*d2.i)+(4*d3.i)+(8*d4.i)+(16*d5.i) as 'n'
        from n01 as d1
        cross join n01 as d2
        cross join n01 as d3
        cross join n01 as d4
        cross join n01 as d5)
    ,dates (dt) as (
        select dateadd(dd,n,@d1) as 'dt' from seq where dateadd(dd,n,@d1) <= @d31)
    ,modval (mod,val) as (
        select 0,1 union all select 1,1 union all select 2,2 union all select 3,2
        union all select 4,3 union all select 5,3 union all select 6,4 union all select 7,5)
    insert @table_var
    select b.empid, d.dt,
        (select val from modval where mod=(datediff(dd,b.basedate,d.dt) % 8)),
        ((convert(int,(datediff(dd,b.basedate,d.dt) / 30)) % 4) + 1)
    from empbase b, dates d
    where b.basedate <= d.dt
    return
end
go
-- test for various months
select * from shifts(1,2007) order by empid,date
select * from shifts(2,2007) order by empid,date
select * from shifts(3,2007) order by empid,date
select * from shifts(4,2007) order by empid,date
select * from shifts(5,2007) order by empid,date
select * from shifts(12,2007) order by empid,date
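For anyone reading along, the core arithmetic of the function above (the modval lookup on days-since-basedate modulo 8, and the unit changing every 30 days) can be restated in a few lines. This is only a Python paraphrase of the existing logic to make it easier to reason about; the names `SHIFT_PATTERN` and `shift_and_unit` are mine, and the requested Thursday/weekend rule is deliberately not implemented here.

```python
from datetime import date

# Same mapping as the modval CTE: (days since basedate) % 8 -> shift code
SHIFT_PATTERN = [1, 1, 2, 2, 3, 3, 4, 5]


def shift_and_unit(basedate, day):
    """Return (shift_code, unit) for one employee on one day, or None before basedate."""
    days = (day - basedate).days
    if days < 0:
        return None  # the T-SQL version filters these out with basedate <= dt
    shift_code = SHIFT_PATTERN[days % 8]
    unit = (days // 30) % 4 + 1  # units 1,2,3,4 rotate every 30 days
    return shift_code, unit


# Employee 12345 has basedate 2007/1/1 in the test data.
for d in range(1, 9):
    print(date(2007, 1, d), shift_and_unit(date(2007, 1, 1), date(2007, 1, d)))
```

Because each day's shift is computed independently from the offset, a rule that stretches a Thursday series to 2,2,2 or 3,3,3 shifts everything after it, so it probably needs a running offset per employee rather than a pure lookup.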
I need to create the database design for a pretty complex project. We have data coming from a feed and being stored in a table. We need to provide a UI for users to create custom "rules". Each "rule" has to be fully customizable. Here are some examples of possible rules, tailored to the Northwind DB:
1. (Avg of all Orders with OrderValue > 10) / (Median Price of all Products)
2. 50% of the value by which the Product Price exceeds a threshold value of 10
These are just sample rules, there might be many more similar to this.
I am basically looking for pointers on the kind of architecture that would make this sort of customization possible. I am currently thinking of getting the data into a DataSet and storing the custom rules that the user creates as DataSet expressions in the DB. When the user chooses to apply a certain rule, the DataSet expression gets evaluated and returns a value.
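To make the two sample rules concrete, here is a language-neutral sketch of what evaluating them amounts to, with the rules kept in a small registry that is looked up by name. Everything here is hypothetical: the function names, the `RULES` registry, the sample numbers, and the choice to return 0 when the price does not exceed the threshold are all my assumptions, and a real design would store a parsed expression rather than Python functions.

```python
from statistics import mean, median


def avg_orders_over_median_price(order_values, product_prices, min_order_value=10):
    """Rule 1: (average of orders with value > 10) / (median product price)."""
    qualifying = [v for v in order_values if v > min_order_value]
    # mean() raises StatisticsError if no order qualifies; callers must handle that.
    return mean(qualifying) / median(product_prices)


def half_of_excess(price, threshold=10):
    """Rule 2: 50% of the amount by which the price exceeds the threshold (assumed 0 otherwise)."""
    return 0.5 * max(price - threshold, 0)


# Hypothetical registry: the UI would store the rule name and its parameters.
RULES = {
    "avg_orders_over_median_price": avg_orders_over_median_price,
    "half_of_excess": half_of_excess,
}

print(RULES["avg_orders_over_median_price"]([5, 20, 40], [10, 20, 30]))
print(RULES["half_of_excess"](30))
```

Rule 1 is an aggregate over whole tables while rule 2 is evaluated per row, so whatever expression format is chosen probably needs to distinguish those two kinds.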
I have a table in a database that keeps getting duplicate records added to it. Is there a way to set a rule so that if someone tries to add a duplicate record for that field, it will stop the record from going in?
I know creating an index would be the proper way to do this but:
1. The application does not belong to us.
2. Duplicates already exist in the table for the database.
Basically I am trying to do the most I can without making a lot of changes to the database.
Hi,
I basically have two tables with the same structure. One is an archive of the other (backup). I want to essentially insert the data into the other.
I use:
INSERT INTO table ( column, column .... )
SELECT * FROM table2
Now, table2 has a rule on various columns:
@CHARACTER IN ('Y','N')
but the column allows nulls; in the design view it says so anyway.
When I run this query I get:
A column insert or update conflicts with a rule imposed by a previous CREATE RULE statement. The statement was terminated. The conflict occurred in database 'database', table 'table', column 'column'.
The statement has been terminated.
Obviously, I've changed the names of everything.
The only data in those columns which could possibly conflict with the rule is the NULL value. Any ideas why this doesn't work?
Thanks.
Hi there, I have a small problem with defaults and rules. After creating a rule or default, we bind it to a table, and I have bound a rule to several tables. How can I see the list of objects that depend on this rule or default? I know the sp_depends stored procedure is supposed to show all dependent objects, but I could not get that to work here. The help says sp_depends works for all objects in the database, such as tables, views and so on, and defaults and rules are objects too, yet I get nothing for them. Please let me know as soon as you can; I would be very thankful. Please don't suggest the SQL-DMO Listboundedcolumns function.
I have a query with 2 subqueries, and no error message is reported, but, my problem is that the 2 subqueries do not follow the GROUP BY rule and show the total instead of by vendor...
Code:
SELECT Table1.agents AS Vendor
, Count(Table1.carS) AS Car_Sold
, Sum(Table1.carP) AS Car_Price
, Count(Table1.busS) AS MortBus_Sold
, Sum(Table1.busP) AS busPRice
It's often said that when inserting or updating into a 'large' table, disabling the non-clustered indexes is needed for performance.
Now I know the obvious way to find out if this is best or not is by testing the different options. I was wondering if there was a rule of thumb to this?
Say you have a table with half a billion rows and 4 non-clustered indexes and are only updating half a million rows; then disabling the indexes every night and re-enabling them can sometimes take far more time than the actual update. I haven't found any articles advising to disable them when a table is over X rows and you are updating Y% of them...
CREATE TABLE EDI_data_proc_log(
    ID int IDENTITY(1,1),
    comment VARCHAR(3000),
    time_recorded DATETIME DEFAULT GETDATE(),
    run_by varchar(100),
    duration int
);
When a record is inserted, I would like the duration column to be computed. This should happen only after the first record has been inserted into the table. You might say a trigger would be best. OK then, show me the syntax.
Or, I am thinking, can we write a user-defined function that will compute the value for the duration column?
--By default, I would like to update the duration column as follows:
--It should record the number of seconds between the last insertion ( You can get that time from the time_recorded column from the previous record and the current time can be obtained from the getdate() function )
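Whichever mechanism ends up doing it (trigger or function), the computation being asked for is small. The sketch below shows it in plain Python so the expected values are unambiguous; it is not T-SQL and not a trigger. The `append_log` helper and the in-memory list standing in for the table are hypothetical, and leaving the first record's duration empty is my reading of "only after the first record".

```python
from datetime import datetime


def append_log(log, comment, run_by, now):
    """Append a log record whose duration is the seconds since the previous record."""
    previous = log[-1]["time_recorded"] if log else None
    # The first record has nothing to compare against, so duration stays empty.
    duration = int((now - previous).total_seconds()) if previous is not None else None
    log.append({
        "comment": comment,
        "time_recorded": now,
        "run_by": run_by,
        "duration": duration,
    })
    return duration


log = []
append_log(log, "load started", "etl_user", datetime(2007, 1, 1, 12, 0, 0))
append_log(log, "load finished", "etl_user", datetime(2007, 1, 1, 12, 0, 45))
print([record["duration"] for record in log])
```

In the database the "previous record" would be the row with the highest time_recorded (or ID) before the new one, which is the lookup a trigger would have to perform.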
This is a general question about data modeling. I'm more curious than anything else.
There is much talk about over-training a data model, and I'm sure there is under-training as well. As a rule of thumb, depending on the algorithm, what is a good ratio of attributes to data points?
Hi, I am a Microsoft BI developer currently working on a pharmaceutical BI project. In this project, the client wants to integrate his Blaze Advisor rule engine with SSIS so that he can change the rules in Blaze Advisor at any time and see the effect on the source data. Hence, my question is:
How can I integrate Blaze Advisor with SQL Server Integration Services (the Microsoft SQL Server ETL tool), so that my business rules (written in Blaze Advisor) are used in the transformation task and all my source data is processed with the same business logic?
My attempts to solve this problem:
I have written the rules in Blaze Advisor and imported its rules into a .NET solution, which includes *.Server, *.Client and some other files. I have used the DLLs from this solution in my SSIS script task, but it does not work: it demands the *.Server & *.Client files there.
- Can you suggest a way to integrate SSIS with Blaze Advisor?
- How can I use Blaze Advisor's .NET output files as DLLs in my custom transformation?
I would be really grateful if you could suggest an approach for this particular business problem.
I have an MS Time Series model using a database of over a thousand products, each of which has hundreds of cases. Amazingly, it takes only a few minutes to finish processing the model, but when I click Mining Model Viewer to view the models, it takes many hours to show up. Once the window is open, I can choose the model for different products almost instantly. Is this normal?