Differences Between Mined Data And Queried Data : Association Rule
Sep 22, 2006
Dear all,
I have a table containing call records, and made a mining model from that table only. The model has 3 columns : calling_number, called_number, and target_operator, using Association Rule algorithm. The key is calling_number, input was operator, and predicted column called_number.
The result shows no rule, but there are results with item-set size of 1 (column) and 2 (column). On the top record of the result, SQL Server says there are 1891 support for called_number = 1891 and operator = 'INDOSAT'.
I queried the table with this query
SELECT DISTINCT calling_number
FROM call_records
WHERE called_number = '07786000815'
AND target_operator = 'INDOSAT';
It returns 2162 records instead of 1891. If I removed the DISTINCT qualifier, SQL Server returns 2159 records. Why is this differences with the result of mining?
I have run into a .. somewhat of a "duh" question. I'm running association rule to run a basket analysis, and I'm trying to get probability of each prediction. I know this is wrong, but how do I go about running PredictProbability on each ProductPurchase prediction?
When I run the below DMX query, I get this error message...
Error (Data mining): the dot expression is not allowed in the context at line 5, column 25. Use sub-SELECT instead.
Thanks in advance...
-Young K
SELECT t.[AgeGroupName] , t.[ChildrenStatusName] , (Predict([Basket Analysis AR].[Training Product], 3)) as [ProductPurchases] , (PredictProbability([Basket Analysis AR].[Training Product].[ProductName])) as [ProductPurchases] From [Basket Analysis AR] PREDICTION JOIN OPENQUERY([DM Reports DM], 'SELECT [AgeGroupName] , [ChildrenStatusName] FROM [dbo].[DM.BasketAnalysis.Contact] WHERE isTrainingData = 0 ') AS t ON [Basket Analysis AR].[Age Group Name] = t.[AgeGroupName] AND [Basket Analysis AR].[Children Status Name] = t.[ChildrenStatusName]
I haven't been able to find a DMX query which will spit out the cases which support a particular association rule. I was hoping it would work sort of like drillthrough but show only the cases supporting a particular rule. Am I missing something?
What I ended up doing was extracting the itemsets of the rule from the model's content then running a SQL query to retrieve the cases that contain both the left-hand and right-hand itemset of the rule. I'm hoping there's a better way.
Can anyone tell me, how the Business Ã?ntelligence Studio calculates the importance of a rule. I can't find the formula. I know some formulas, but the result in SQL Server is completly different.
I read somewhere that market basket analysis finds rules with substitutes as likely as rules with complements due to a consumer behavior called "horizontal variety seeking". This is when customers buy more than one product in the same category even though they are subsitutes. For example, when people go to the grocery store and buy soda, they buy coke and sprite at the same time even though they are substitutes of each other. I was wondering if anyone has experience with this anomaly and how they solved it. I found a time series model called the vector autoregressive model which is used to find the elasticity of prices over a time period. Does anyone have experience working with the VAR model? I am having trouble figuring out what some of the variables in the model are.
I have a problem about calculating data from the table. I want to sent a row to the local variables and calculate them by C#. Thank you in advance for every suggestion.
I have to send updated Employee list from employee master table to a particular email ID on every last date of Month and when a new employee is added / deleted / edited. Also need to send this as an Excel fileÂ
I tried the following but "Invalid Object name dbo.tbl_EmployeeMaster" error coming while inserting a new employee.
USE [eXact] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO ALTER TRIGGER [dbo].[trg_Email]
I got a question about the data preparation of market basket analysis.
There are always some transactions with only one single SKU product. It seems that these kind of transactions have nothing to do with association. Shall I just exclude them or what?
This is a general question about data modeling. I'm more curious than anything else.
There is much talk about over-training data model, and I'm sure there are under training as well. As a rule of thumb, depending on the algorithm, what is a good ratio of attributes vs data points?
I deleted some records out of an entity, I'd like to keep the Codes as contiguous and incremental, meaning no breaks between the code numbers.I created a business rule and applied it but codes remain the same.
I used the "Default to a generated Value" action, then selected the Code attrib. --Saved.
Then back to the Entity, I applied business rules. But nothing seemed to have happened. As there was no change in codes.
managed plug-in framework that's available for download here: http://www.microsoft.com/downloads/details.aspx?familyid=DF0BA5AA-B4BD-4705-AA0A-B477BA72A9CB&displaylang=en#DMAPI.
This package includes the source code for a sample plug-in algorithm written in C#.
in this source code all .cs files are modified for clustering algorithm
if my plugin algorithm is of association or classification type then what modifications are requried in source code???
I have Two Database that exist on Two seperate servers. The two database contain same schema and contains tables and columns of same name. Some tables have slight differences in terms of data types or Data type lenght.
For example if a Table on ServerA has a column named - CustomerSale with Varchar (100, Null) and a table on ServerB has a column named CustomerSale with Varchar (60, Null), how can i find if other columns have similar differences in all tables with the same name and columns in the two servers.
I am using SQL Server 2005. And the Two Servers are Linked Servers
What Script can i use to accomplish this task. Thanks
I am wondering if it is possible to use SSIS to sample data set to training set and test set directly to my data mining models without saving them somewhere as occupying too much space? Really need guidance for that.
I created a report with same parameters. Two of them a based on a query. The first query contains a simple select * from ... the result will be stored in the parameter @param1. The second parameter contains select * from ... where ID = @param1. So far... If a pass the reportparameters of the report i will get a valuelist for the first query, but nothing for the second one.
How can I manage that the second parameters also gets a valuelist?
I run a website which uses SQL 2000 and VB ASP. I would like to add a section to the site which posts the most popular data being queried from my SQL server. I'm sure this is possible, but I don't know where to start. Please let me know if any of you need specifics regarding my data and set up.
Hello. I work on a proprietary software package at work that has a SQL query engine. I'm able to query tables that I know exist but I would like to know what other tables are there in the db. Unfortunately I don't know the name of the database. Is there a SQL statement that will show me the name of the database that I'm running my query on?
Quick question guys, I remember running into this before but can't remember if I was able to complete it or not. What is the syntax format in the 'Report Parameters' dialog box selecting 'Non-queried' in the "value " field I want to select multiple value parameters ie 001, 002, 003, 004 for one "label" Every combination I tried it's not taking. Thanks in advance
hi please help me,i have a table queried using this sql, select name,(select count(*) from myTable a where a.name = r.name ) as Total, (select count(*) from myTable b where b.name = r.name and dnum > '1') as Used, (select count(*) from myTable c where c.name = r.name and dnum < '1') as remainingfrom myTable r group by namebut i need one more thing in this table that should look like this,nameTotalUsedRemainingPercentageA126650%B2021810%C150150% this is to add the BOLD field from the above table, but my problem is that the computation is "Used / Total = Percentage%"so how can i do this, please help methanks
I have a select statement that is being processed through oSql on Sql Server 2000. There are 2 fields in the select statement that are defined in the dateabase as nvarchar(1). When I perform my select statement, they show up in the output as 4 char fields. See dataset below for example.
How do I get rid of the extra spaces in the output? I have tried using ltrim(rtrim(fielde)) to no avail. Fieldg (the S) is a nullable field and is being processed using an isnull(filedg, ' ').
I am trying to eliminate the extra space in columns that have been returned from a DB query...the tables have character lengths of 40, I would like to return just the necessary characters.
I need to be able to set the date parameters of a report dynamically when it is run based on system time. The problem I am having is being able to compare the dates (StartDate & EndDate) against [Service Date 1]. Essentially this report will only pull the current month's data.
The date fields being created with the GETDATE, DATEADD & DATEDIFF functions are working correctly. Do I need to create a separate dataset to be able to run the parameters automatically in the actual report?
Any help would be greatly appreciated!
SELECT TodaysDate =GetDate()-2,dbo.[Billing Detail].[Service Date 1], DATEADD(mm, DATEDIFF(mm, 0, DATEADD(yy, 0, GETDATE())), 0) AS StartDate, DATEADD(dd, - 1, DATEADD(mm, DATEDIFF(mm, -1, GETDATE()), 0)) AS EndDate, dbo.[Billing Detail].Billing, dbo.[Billing Detail].Chart, dbo.[Billing Detail].Item, dbo.[Billing Detail].[Sub Item], dbo.Patient.[Patient Code], dbo.Patient.[Patient Type], dbo.[Billing Header].Charges, dbo.Practice.Name FROM dbo.[Billing Detail] INNER JOIN dbo.Patient ON dbo.[Billing Detail].Chart = dbo.Patient.[Chart Number] INNER JOIN dbo.[Billing Header] ON dbo.[Billing Detail].Billing = dbo.[Billing Header].Billing CROSS JOIN dbo.Practice WHERE (dbo.[Billing Detail].Item = 0) AND (dbo.[Billing Detail].[Sub Item] = 0) AND (dbo.[Billing Detail].[Service Date 1] Between StartDate AND EndDate
In general, what is the most appropriate way to specify a "data driven" default for a report parameter. Seems to me like options are:
1. construct an expression which operates on one of the existing report datasets (possibly the same data set used to populate the defaults dropdown)
2. construct a new dataset which returns a single value which is the desired default
what's the "normal" approach?
For example, I have an MDX dataset which contains the list of dates for which data is available. There is generally some data available for today, but normally the user is interested in the most recent day with data prior to today.
So which would be the better option out of the two above (or a third option if I'm missing something!) to tell my parameter to set itself to:
"The most recent day prior to today for which data exists".
I need to use a non-queried report parameter to filter a dataset for a report.
The dataset column I'm filtering is numeric. The dataset is not a sproc, it's a table in SQL Server that I am querying.
The Non-queried parameter values (Multi-value) are 1, 2, 3, 4, 5, >=6.
Selecting the >=6 throws the error: "Error converting data type nvarchar to numeric."
Which sucks.
Because...when I go straight into my dataset I can filter my numeric column with the exact same values (=1 or =3 or >=6) and everything works fine. The error is only raised when I use the @Parameter in the dataset.
I've tried eveything, researched everywhere online and I can't find any guidance anywhere.
I have 2 tables People and Scores. A person might have 1-5 scores (unknown at time of Query). I would like to query the two tables into a results table and if person does not have a record the score will be zero. Scores also have a test number so you know which score it is. I can get it done with Stored Proc but I have to use Temp tables and then put the temp tables together.
People Name ID Tom5 Dick2 Harry3 Larry4 Curly1 Scores PrimaryKeyPeopleIDScoreTestNumber 12801
Adding a value to a non-queried default report parameter value does not update on the target server after deployment.
To recreate
1. Create a report in Visual Studio and add a report parameter with the following properties:
Multi-value is checked Available values = "From Query" Dataset = [create a dataset that returns a table w/ a Id and Description column] Value field = [the Id field from the table] Label field=[the Description column from the table] Default values = "Non-queried" (add several values the match the IDs from the table so that some of the values in the report dropdown will show up as checked when rendering the report)
2. Build and deploy the report to the reporting server. View the report and verify the specified items are checked in the report parameter.
3. Go back to Visual Studio and add a value to the Non-queried Default values.
4. Build and deploy the report again. View the report. The newly added item is not selected.
Notes
I verified that the newly added ID exists in the rdl file (as xml) on both the development box and the server where the report was deployed. However, when I view the report parameter using Management Studio (connect to the reporting server), the newly added value for the report parameter does not exist. I verified that changes are being deployed by adding new parameters and changing other properties of the parameter. I thought maybe the rdl itself was being cached somehow - I tried restarting IIS, SQL Server, and SQL Reporting services. None worked. Note that running the report on the development box by running the project through Visual Studio DOES reflect the change to the parameter.
Work-arounds
1. Create a dataset for the report that returns a table of the Ids that you want pre-selected. The query could be something like this:
SELECT '4' AS SelectedId UNION SELECT '5' AS SelectedId UNION SELECT '6' AS SelectedId
2. Delete the report in Management Studio, then redeploy.
I have issue w/ both workarounds because for 1) it is not intuitive and you have to remember to do this for every similar case, and 2) this extra step has to occur each time the report is deployed w/ changes to the report parameter.
Hi I came across something like 3-4-5 rule while going through datamining book....but couldn't get from where that rule has been generalized and how it really works....
How can I setup the dbs in sql server so that when I change the data in one table the changes will cascade down to the tables in my other dbs. Therefore, one database would hold a primary key table. If I had 15 other dbs, then I could somehow link them so the data changed in the primary key table of the 1st database would cascade down to the other dbs.