Why Is The Node_distribution.PROBABILITY Greater Than 1 In Clustering Algorithm?
Nov 24, 2006
Hi, all experts here,
Thank you very much for your kind attention.
I have a question about node_distribution.PROBABILITY. Some attribute values have only a small amount of support in a specific node, so why do they have a large node_distribution.PROBABILITY, sometimes even greater than 1? How can node_distribution.PROBABILITY be greater than 1? How does the SQL Server 2005 data mining engine calculate node_distribution.PROBABILITY for its Clustering algorithm? I am really confused and need guidance on this.
There is another thing I don't understand in my mining model. When I query the mining model content, I find that the same attribute_value has different support and probability values for the same node within my clustering model. Why is that? I am really confused and need help with this.
I have few questions regarding Clustering algorithm.
If I process the clustering model with K (the number of clusters) ranging from 2 to n, how can I find a measure of variation and loss of information for each model (any kind of measure)? The purpose is to decide which K to use.
Which clustering method is better for segmenting data: K-means or EM?
Since we cannot use the accuracy chart for Clustering algorithms, how can we verify the accuracy of clustering models in terms of their classification and regression tasks?
Thank you very much in advance for your guidance and advice.
I have a question on the sequence clustering algorithm. It is generally used for sequence analysis, especially web path analysis. Besides that, in what other scenarios could we apply this algorithm?
Thanks a lot in advance and I am looking forward to hearing from you shortly.
I need a piece of code that returns the schema rowset of a decision tree and lets me iterate over it. I need columns such as node_type, node_unique_name, etc. Another question: node_distribution has multiple rows of information. How can I access them? I think there must be an adomddatareader object to iterate over them.
I just noticed that the sum of the node_distribution.support values is not equal to the value of node_support for a specific node in my Microsoft Clustering Algorithm model. If so, how can I correctly count the distribution of each attribute value among all cases?
Really confused.
Thank you very much in advance for any guidance and advice. Looking forward to hearing from you shortly.
Hi, I have an exercise using association data mining. My database has 350 records and I use 90 records for mining. It produces some rules, from which I choose the top ones by MSOLAP_NODE_SCORE. But when I use select statements to check the result, I get 1 record that agrees with the rule and 5 records that do not. For example, for the rule A=a, B=b -> C=c:
select * from <my_table> where A='a' and B='b' and C='c'; ==> 1 record returned
select * from <my_table> where A='a' and B='b' and C<>'c'; ==> 5 records returned
C has 3 values: c1, c2 and c. For the second statement, C includes 2 c1 and 3 c2.
I don't understand how this works. I want to choose the best rules that represent my database. How can I use importance and probability to get the best rules? For a database with 90 records and a database with 350 records, which values should I use for minimum_probability, Minimum_Support, Minimum_importance...? And when I choose rules, should I choose by importance or by probability?
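As a rough illustration of the check described above, here is a small Python sketch that turns the two counts from the SELECT statements into the textbook support and confidence of the rule. It assumes both checks were run against the full 350-row table, which the post does not actually say, and the function name `rule_stats` is made up for this example. It uses the generic definitions, not necessarily the exact numbers the Microsoft Association algorithm reports.

```python
# Counts taken from the two SELECT checks quoted in the question.
matching = 1        # rows with A='a', B='b' and C='c'
not_matching = 5    # rows with A='a', B='b' and C<>'c'
total_rows = 350    # ASSUMPTION: the checks ran against the full table


def rule_stats(matching, not_matching, total_rows):
    """Return (support, confidence) for the rule A=a, B=b -> C=c.

    support    = fraction of all rows where the whole rule holds
    confidence = fraction of rows matching the left side that also
                 match the right side
    """
    antecedent_rows = matching + not_matching
    support = matching / total_rows
    confidence = matching / antecedent_rows
    return support, confidence


support, confidence = rule_stats(matching, not_matching, total_rows)
print(f"support={support:.4f} confidence={confidence:.4f}")
```

With 1 agreeing row and 5 disagreeing rows the confidence on the checked data is only 1 in 6, so a rule that looked strong on the 90 training records may simply not generalize to the rest of the table.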
I installed the bike buyer example and i am learning the DMX language. Now i wrote the following query (using MS decision trees):
SELECT T.[Last Name], [Bike Buyer], PredictProbability(Predict([Bike Buyer])) AS [Probability] From [v Target Mail] PREDICTION JOIN OPENQUERY (....... And so on..)
The result is surprising to me. In the result table all the probabilities are equal:
Bike Buyer   Probability
1            0.99994590500919611
0            0.99994590500919611
0            0.99994590500919611
0            0.99994590500919611
0            0.99994590500919611
1            0.99994590500919611
and so on.
Now I am wondering what PredictProbability means. I thought it meant the probability that the prediction is correct. But all the probabilities are the same even though the input is different. Can somebody tell me what PredictProbability means, or am I using it wrong?
select * from (
  select flattened(*) from (
    select att1,
      topcount(predict([Trans Predictor Unified], INCLUDE_STATISTICS), $Adjustedprobability, 7) as predictedstuff
    from [Trans Predictor Model]
    prediction join
      SHAPE {openquery(DMSCS, 'select distinct CAST(att2 as nvarchar(100)) att1 from DMSCS.dbo.CartProducts order by att1 ')}
      append ({openquery(DMSCS, 'select CAST(att2 as nvarchar(100)) att1 , att4, att5 as att3 from DMSCS.dbo.CartProducts order by att1 ') }
        relate [att1] to [att1]) as [Trans Predictor Unified]
      as SHAPEQ
    on [Trans Predictor Model].[Trans Predictor Unified].att3 = SHAPEQ.[Trans Predictor Unified].att3
  ) as s
) as t
where [predictedstuff.$AdjustedProbability] > 0.5
It's working well. I would like to modify one thing: I would like to change the constant in the where condition so that it is configurable. That is, I would like to store the constant somewhere (SSAS or relational SQL). I was reading the DMX reference, but it doesn't provide much detail about the where clause's "condition expression". I also looked at a document called "OLE DB for Data Mining Specification version 1.0" of July 2000, which has the SELECT grammar in Appendix B. There it has
I am working on a text mining application in which I need to detect unusual/anomalous sentences in text. Certain sentences that I know occur very frequently are given a likelihood of 0.2 by PredictCaseLikelihood. Other sentences that are just as frequent get a much higher likelihood (>0.9). I am using the NORMALIZED option. The only significant difference between these sentences is their length. The one with the lower likelihood has only 2 words in it, whereas the one with the higher likelihood has more than 10 words. The problem is that the shorter sentences end up being interpreted as anomalous when in fact they aren't. Any suggestions?
I am confused about the "Probability of Value 1" and "Probability of Value 2" figures shown for a particular attribute value in the Neural Network viewer. For example, for one attribute value the probability of value 1 is actually very low (the same as the probability of value 2), yet the bar that shows the strength of these two probabilities is still very strong, even stronger than the bars for other attribute values that have a much higher probability of value 1 or 2. Why is that?
And, by the way, how does the neural network algorithm calculate the probability for an attribute value?
Hope my question is clear.
I am looking forward to hearing from you shortly and thanks a lot in advance.
I have a data mining model that uses the decision tree algorithm. For example, I have the following training case table:
StudentID, IQ, EQ, IsPass.
I put all the data in the table into the Microsoft decision tree data mining model. StudentID is the key for the model, IsPass is predict-only, and IQ and EQ are the inputs.
1. How can I write a DMX selection to find all NODE_UNIQUE_NAME values with a probability of IsPass > 0.7?
2. How can I write a DMX selection to find all the StudentID values that belong to the criteria defined by the node?
We have 2 environments, Testing and Production. Both run Windows 2003 Enterprise Server with SQL Server 2005. The difference is that Testing does NOT run a Windows cluster but Production does. What is the best way to transfer a database from Testing to Production?
We have other systems where both Testing and Production run on non-clustered servers, and there we use backup/restore to transfer the database. Can that be applied in this case?
I also found that there is a tool called DTC which can transfer all DB objects from one DB to another. Is it the best way to transfer between non-clustered and clustered environments?
I have a stored procedure. In that procedure I need to select a value based on which one is greater. Here is a non-working example:
select name, if (truck1.age > truck2.age, truck1.age, truck2.age) from person left join truck truck1 on truck1.make = person.make left join truck truck2 on truck2.make = person.make
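For what it's worth, the usual replacement for an inline `if(...)` in a SELECT list is a CASE expression. The sketch below shows the pattern in Python with an in-memory SQLite database standing in for SQL Server; the table `truck_pair` and its columns are made up to keep the example short, and the CASE expression itself is plain SQL that T-SQL also accepts.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: one row per person with the two ages to compare.
    CREATE TABLE truck_pair (name TEXT, age1 INTEGER, age2 INTEGER);
    INSERT INTO truck_pair VALUES ('Ann', 3, 7), ('Bob', 9, 2), ('Cy', 4, 4);
""")

# CASE picks age1 when it is strictly greater, otherwise age2.
rows = conn.execute("""
    SELECT name,
           CASE WHEN age1 > age2 THEN age1 ELSE age2 END AS greater_age
    FROM truck_pair
    ORDER BY name
""").fetchall()

print(rows)
```

Note that with LEFT JOINs either side can be NULL. A comparison against NULL is never true, so the ELSE branch is taken in that case, and you may want to wrap the columns in COALESCE first.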
I have a table which measures the changes in a feedback rating, measured by an integer. Most of my records are the same. Only the primary key & the timestamp change.
How do I query just the changes?
Example dataset:
id   rating
1    5
2    5
3    5
4    5
5    6
6    6
[code]....
There are 20 rows & 5 changes. The query I want will result in just those that are different from the ones before them:
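One possible way to get only the rows whose rating differs from the row before them is to compare each row with its immediate predecessor by id. The sketch below runs the idea in Python against an in-memory SQLite database; the table name `feedback` and the sample rows are made up (the full 20-row dataset is not shown in the post), and the timestamp column is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and sample data, not the poster's real rows.
    CREATE TABLE feedback (id INTEGER PRIMARY KEY, rating INTEGER);
    INSERT INTO feedback VALUES
        (1, 5), (2, 5), (3, 5), (4, 5), (5, 6), (6, 6), (7, 4), (8, 4);
""")

# For each row, look up the rating of the closest earlier id and keep
# the row only when the two ratings differ. The first row has no
# predecessor, so the comparison yields NULL and the row is skipped.
changes = conn.execute("""
    SELECT cur.id, cur.rating
    FROM feedback AS cur
    WHERE cur.rating <> (
        SELECT prev.rating
        FROM feedback AS prev
        WHERE prev.id = (SELECT MAX(p.id) FROM feedback AS p WHERE p.id < cur.id)
    )
    ORDER BY cur.id
""").fetchall()

print(changes)
```

If the first row should count as a change as well, it can be added with an extra `OR NOT EXISTS (...)` condition for rows that have no predecessor.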
I am comparing different columns, each of which holds specific values, and I want to get the greatest value using SQL 2000. I want something like this, but I think there is an error in my script:
CASE WHEN a > b,c,d,e THEN a WHEN b > a,c,d,e THEN b WHEN c > a,b,d,e THEN c WHEN d > a,b,c,e THEN d WHEN e > a,b,c,d THEN e END
Please help. Is there any possible way to implement this? Thank you.
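A comparison such as `a > b,c,d,e` has to be spelled out as separate conditions joined with AND. A sketch of that pattern follows, run through Python's sqlite3 module as a stand-in for SQL Server 2000; the table `scores` and its `id` column are hypothetical. Each branch only needs to compare against the columns that have not been ruled out yet, and `>=` is used so that ties still return a value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table holding the five columns to compare.
    CREATE TABLE scores (id INTEGER, a INTEGER, b INTEGER, c INTEGER,
                         d INTEGER, e INTEGER);
    INSERT INTO scores VALUES
        (1, 1, 2, 3, 4, 5),
        (2, 9, 2, 3, 4, 5),
        (3, 1, 8, 3, 8, 5),
        (4, 1, 2, 7, 4, 5);
""")

# Once a branch fails, that column cannot be the maximum, so later
# branches no longer need to compare against it.
greatest = conn.execute("""
    SELECT id,
           CASE
               WHEN a >= b AND a >= c AND a >= d AND a >= e THEN a
               WHEN b >= c AND b >= d AND b >= e THEN b
               WHEN c >= d AND c >= e THEN c
               WHEN d >= e THEN d
               ELSE e
           END AS greatest_value
    FROM scores
    ORDER BY id
""").fetchall()

print(greatest)
```

This assumes none of the columns is NULL; NULLs would need COALESCE or extra IS NULL checks.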
Here is the first part of a query for MySQL that I am trying to get working on MSSQL:
Code:
SELECT n.*, round((n.rgt-n.lft-1)/2,0) AS childs, count(*)+(n.lft>1) AS level, ((min(p.rgt)-n.rgt-(n.lft>1))/2) > 0 AS lower, (( (n.lft-max(p.lft)>1) )) AS upper FROM table n ...
But, I get this error message:
Server: Msg 170, Level 15, State 1, Line 3 Line 3: Incorrect syntax near '>'.
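The likely cause is that MySQL lets a comparison such as `(n.lft>1)` be used as a 0/1 value in the SELECT list, while SQL Server does not allow a bare comparison there. A common workaround is to wrap each comparison in a CASE expression. The sketch below checks that both forms give the same numbers, using SQLite (which, like MySQL, accepts the short form) from Python; the table `nodes` and its rows are made up, and only the `n.lft>1` sub-expression is covered, not the whole query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical nested-set rows (lft/rgt), just enough to compare forms.
    CREATE TABLE nodes (lft INTEGER, rgt INTEGER);
    INSERT INTO nodes VALUES (1, 10), (2, 5), (6, 9);
""")

# mysql_style : comparison used directly as a 0/1 value
# portable    : the same value written with CASE, which T-SQL accepts
flags = conn.execute("""
    SELECT lft,
           (lft > 1) AS mysql_style,
           CASE WHEN lft > 1 THEN 1 ELSE 0 END AS portable
    FROM nodes
    ORDER BY lft
""").fetchall()

print(flags)
```

Applied to the query above, `count(*)+(n.lft>1)` would become `count(*) + CASE WHEN n.lft > 1 THEN 1 ELSE 0 END`, and the `lower` and `upper` columns would be rewritten the same way.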
I created these unique codes, and I need the [FRMDAT] field set to "12/31/2014" in the MKLOPT table for every [JOBCOD] in the value list below whose [FRMDAT] is currently greater than "12/31/2014".
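A minimal sketch of such an update, assuming FRMDAT is a real date column. It runs in Python against an in-memory SQLite table, with dates stored as ISO text so that they compare correctly. The job codes `A100` and `B200` are placeholders, since the actual value list is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the MKLOPT table; dates kept as ISO strings for SQLite.
    CREATE TABLE MKLOPT (JOBCOD TEXT, FRMDAT TEXT);
    INSERT INTO MKLOPT VALUES
        ('A100', '2015-06-30'),
        ('A100', '2014-01-01'),
        ('B200', '2016-01-01'),
        ('Z999', '2015-06-30');
""")

cutoff = "2014-12-31"
job_codes = ["A100", "B200"]   # HYPOTHETICAL codes, replace with the real list

# Build one "?" placeholder per job code for the IN (...) list.
placeholders = ", ".join("?" for _ in job_codes)
cur = conn.execute(
    f"UPDATE MKLOPT SET FRMDAT = ? "
    f"WHERE JOBCOD IN ({placeholders}) AND FRMDAT > ?",
    [cutoff, *job_codes, cutoff],
)
updated = cur.rowcount

result = conn.execute(
    "SELECT JOBCOD, FRMDAT FROM MKLOPT ORDER BY JOBCOD, FRMDAT"
).fetchall()
print(updated, result)
```

Only rows that are both in the list and later than the cutoff are touched; rows for other codes and rows already on or before 12/31/2014 stay as they were.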
I'm supposed to output the companies that have commission rates higher than the company "Industrial Appparatus". Is there some way to modify this code so that it will work? commissionrate > ALL(Select commissionRate From salescompanydomestic Where companyName = 'Industrial Appparatus')
I have a query pulling all records with a disconnect date and a transaction date. However, I would like to retrieve any records that have a transaction date greater than 30 days from the disconnect date. I have been unable to figure out the correct formula to use. I think I need to use the datediff function in SQL, but I've never really used this function before.
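The filter being described is "transaction date minus disconnect date is more than 30 days". In T-SQL that would typically be written as `DATEDIFF(day, DisconnectDate, TransactionDate) > 30`, where the two column names are guesses because the post does not give them. The same logic, sketched in Python with made-up records so it can be run and checked:

```python
from datetime import date

# Made-up records; the field names are assumptions, not the real schema.
records = [
    {"id": 1, "disconnect_date": date(2015, 1, 1), "transaction_date": date(2015, 2, 15)},
    {"id": 2, "disconnect_date": date(2015, 1, 1), "transaction_date": date(2015, 1, 31)},
    {"id": 3, "disconnect_date": date(2015, 3, 1), "transaction_date": date(2015, 2, 1)},
]


def days_after_disconnect(rec):
    """Whole days from disconnect to transaction (negative if earlier)."""
    return (rec["transaction_date"] - rec["disconnect_date"]).days


# Keep only transactions more than 30 days after the disconnect date.
late_ids = [rec["id"] for rec in records if days_after_disconnect(rec) > 30]
print(late_ids)
```

The argument order matters: with the start date first and the end date second, a transaction that happens before the disconnect gives a negative number and is filtered out.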
Problem: I want to set compatibility_level only when it is greater than 110.
Solution: Select the compatibility level and, if it is greater than 110, run ALTER DATABASE to set the compatibility level to 110.
Issue: Regardless of the IF EXISTS check, the ALTER DATABASE statement is executed every time.
Here is the sql statement
IF EXISTS ( SELECT * FROM sys.databases where compatibility_level >110 AND name='mydatabase' ) BEGIN ALTER DATABASE mydatabase SET COMPATIBILITY_LEVEL = 110 END
I am trying to set up a column in a grouped matrix that displays a count of all records over a specified number.
The field I am counting is the response time of a transaction, and I want to count how many were over 500 milliseconds. I thought it would be something like this...
Code Snippet
=Count(Fields!ResponseTime.Value > "500")
However, this appears to just return the count of all rows and ignores the "500" part.
Am I missing something? If someone could post an alternate code snippet, that would be great.
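A likely explanation is that Count counts every non-null value, and a comparison always produces a value (True or False), so every row is counted. The usual report-expression workaround is a conditional sum such as `=Sum(IIf(Fields!ResponseTime.Value > 500, 1, 0))`, comparing against the number 500 and not the string "500". The difference between the two approaches can be seen in this small Python sketch with made-up response times:

```python
# Made-up response times in milliseconds.
response_times = [120, 640, 500, 980, 30]

# What Count(value > 500) effectively does: the comparison yields
# True or False for every row, never a null, so all rows are counted.
comparisons = [t > 500 for t in response_times]
count_of_comparisons = sum(1 for c in comparisons if c is not None)

# Conditional sum: add 1 only for rows that satisfy the condition.
count_over_500 = sum(1 if t > 500 else 0 for t in response_times)

print(count_of_comparisons, count_over_500)
```

The first number is just the row count, which matches the behaviour described above; the second is the count that was actually wanted.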
I'm using an Execute SQL Task to get a count of the records in the table. How can I make the workflow stop if it doesn't meet the count requirement, and continue to the next flow if it does? I'm looking at expressions, but I'm a bit confused about how to use them.
I have a column of varchar(2000) but when I use it in a select statement I only get the first 255 characters displayed. (all the data is there as I can see different parts using substring) How do I get the complete column to display?
I think I'm trying to do a simple query on maximum date.
I've got 100 tools that have been used over the past three years. Some of the tools are used almost every day. Other tools haven't been used for a month, while other tools haven't been used for a year or more.
Ultimately I'm trying to just find the list of tools whose latest date of use was a year ago.
I have a list of tools and a list of times each tool was used.
I think I'm going to have to do a search that finds, for each tool, the times it was used. That I can do.
What I'm not sure of is how to then pull only the latest date for each tool.
Once I get that I can then do a query off that result to pull the "oldest latest" date of use.
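Both steps can usually be done in one query: GROUP BY the tool, take MAX of the usage date, and filter on that maximum with HAVING. Below is a sketch in Python against an in-memory SQLite table; the table `tool_use`, its columns, the sample rows and the fixed "today" are all made up for illustration, and "a year ago" is read as "more than a year ago".

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical usage log: one row per time a tool was used.
    CREATE TABLE tool_use (tool TEXT, used_on TEXT);
    INSERT INTO tool_use VALUES
        ('drill', '2024-05-30'), ('drill', '2023-01-10'),
        ('saw',   '2023-02-01'), ('saw',   '2022-11-05'),
        ('lathe', '2022-03-15');
""")

today = date(2024, 6, 1)                      # fixed date so the result is repeatable
cutoff = today.replace(year=today.year - 1)   # one year before "today"

# MAX(used_on) gives each tool's latest use; HAVING keeps only tools
# whose latest use is older than the cutoff.
stale_tools = conn.execute(
    """
    SELECT tool, MAX(used_on) AS last_used
    FROM tool_use
    GROUP BY tool
    HAVING MAX(used_on) < ?
    ORDER BY last_used
    """,
    (cutoff.isoformat(),),
).fetchall()

print(stale_tools)
```

Sorting by `last_used` puts the "oldest latest" date first, so the first row is the tool that has gone unused the longest.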