I am reading the Data Mining Tutorial and am now at the Mining Algorithms section. I cannot understand any of the algorithms. For example, take the following text, which is quite a mouthful:
"The Microsoft Decision Trees algorithm supports both classification and regression and it works well for predictive modeling. Using the algorithm, you can predict both discrete and continuous attributes.
In building a model, the algorithm examines how each input attribute in the dataset affects the result of the predicted attribute, and then it uses the input attributes with the strongest relationship to create a series of splits, called nodes. As new nodes are added to the model, a tree structure begins to form. The top node of the tree describes the breakdown of the predicted attribute over the overall population. Each additional node is created based on the distribution of states of the predicted attribute as compared to the input attributes. If an input attribute is seen to cause the predicted attribute to favor one state over another, a new node is added to the model. The model continues to grow until none of the remaining attributes create a split that provides an improved prediction over the existing node. The model seeks to find a combination of attributes and their states that creates a disproportionate distribution of states in the predicted attribute, therefore allowing you to predict the outcome of the predicted attribute"
In the above text, what is meant by discrete and continuous attributes? What is regression? What are predicted attributes? What are input attributes? What is a distribution of states?
Is there a source that explains these algorithms in an easier way?
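To make a few of those terms concrete, here is a minimal sketch in Python. The data, the column names (age_group, income, bought_bike) and the helper distribution() are all made up for illustration and have nothing to do with how the Microsoft algorithm is implemented. It only shows what "distribution of states of the predicted attribute" means, first for the whole population (the top node) and then for one candidate split on an input attribute.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each row is (age_group, income, bought_bike).
# age_group is a discrete input attribute (a fixed set of states),
# income is a continuous input attribute (any number),
# bought_bike is the predicted attribute, here a discrete one.
rows = [
    ("young", 30000.0, "yes"),
    ("young", 32000.0, "yes"),
    ("young", 41000.0, "no"),
    ("old", 52000.0, "no"),
    ("old", 58000.0, "no"),
    ("old", 61000.0, "yes"),
]


def distribution(states):
    """Return the share of each state, e.g. {'yes': 0.5, 'no': 0.5}."""
    counts = Counter(states)
    total = sum(counts.values())
    return {state: count / total for state, count in counts.items()}


# Top node: breakdown of the predicted attribute over the whole population.
overall = distribution(r[2] for r in rows)

# One candidate split: group the rows by the states of age_group.
by_age = defaultdict(list)
for age_group, _income, bought in rows:
    by_age[age_group].append(bought)
split = {age: distribution(states) for age, states in by_age.items()}

print(overall)
print(split)
```

In this toy data the overall distribution is even, while the split on age_group tilts each branch toward one state. That kind of disproportionate distribution is what the quoted text says the algorithm looks for. Predicting a continuous attribute such as income instead of a yes/no state is what the text calls regression.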
I have a further question about data mining plug-in algorithms. When we say we are going to embed a user plug-in algorithm, what is the context for that? In which cases would we decide that we need one? I understand that embedding a customized plug-in algorithm means we want something more tailored, but what kinds of customized features are generally involved? Does it differ between market sectors?
I don't think we can just embed a plug-in algorithm and then compete it against the available algorithms to see which one has better prediction accuracy?
Could someone here please give me some guidance on this?
I recently started using SQL 2000 Analysis Manager. I wanted to try data mining but was unable to get the Mining Model Wizard to load available techniques.
When I select a cube and "New Mining Model" I get the following error:
"Unable to get list of data mining algorithms."
"Object of provider is not capable of performing requested operation"
Could any of you please advise whether there are tutorials and demos available that cover all the SQL Server 2005 data mining built-in algorithms?
It would be great to hear from any of you soon. Thanks a lot in advance.
PREDICATES
Used as a clause.
A. What does PREDICATES mean?
B. What does it mean when used in a where clause?
I checked BOL (Glossary) but get no explanation there.
Thanks
Jay
Hi, I am an extreme beginner to SQL Server and I'm having big trouble trying to display my SQL query properly. Basically, I want to put the results of a one-to-many query into one row per record. I have read articles and forums discussing 'concatenating the values' or creating a function, but I don't follow what they mean and I am completely lost. Can anyone provide a really simple explanation of what I need to do to resolve my duplicate row issue? I urgently need to find a solution to this. Regards
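As a rough illustration of what "concatenating the values" means, here is a small sketch using Python's built-in sqlite3 module. The tables (customers, phones) and their contents are invented for the example. Note that GROUP_CONCAT is a SQLite function, not a SQL Server one, so this only demonstrates the idea of collapsing the "many" side into one value per record; on SQL Server the user-defined function mentioned in those articles would play that role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE phones (customer_id INTEGER, phone TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO phones VALUES (1, '555-0100'), (1, '555-0101'), (2, '555-0199');
""")

# A plain one-to-many join returns one row per phone, so Ann appears twice.
joined = conn.execute("""
    SELECT c.name, p.phone
    FROM customers c JOIN phones p ON p.customer_id = c.id
""").fetchall()

# Grouping by customer and concatenating the phones gives one row per customer.
one_row_each = conn.execute("""
    SELECT c.name, GROUP_CONCAT(p.phone, ', ') AS phones
    FROM customers c JOIN phones p ON p.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

print(joined)
print(one_row_each)
```

The first query shows the "duplicate row" symptom, and the second shows the shape of result being asked for.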
I am wondering whether there is a good way to select attributes for training models, both for unsupervised algorithms such as Association Rules and Clustering and for supervised algorithms such as decision trees.
It would be very interesting to hear about any best practices and popular methods for dealing with this issue.
I am looking forward to hearing from you, and thanks for your advice.
I am wondering where I can store my mining results in the data mining engine. For example, I have mining results such as accuracy charts, decision trees, and other formats of results, depending on the mining algorithms I used, so where can I actually store the results for later use in Reporting Services? Is it possible to do that in SQL Server 2005?
Thanks a lot for any help and guidance in advance.
Here is how you get the check digit for EAN-8, EAN-13 and EAN-14.
CREATE FUNCTION dbo.fnGetEAN ( @EAN VARCHAR(13) )
RETURNS VARCHAR(14)
AS
BEGIN
    DECLARE @Index TINYINT, @Multiplier TINYINT, @Sum TINYINT
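For readers who want the rule itself, here is a minimal Python sketch of the standard EAN check digit calculation. It is not a port of the T-SQL function above, and the name ean_check_digit is made up for illustration. The payload is the code without its final check digit: starting from the rightmost payload digit, the weights alternate 3, 1, 3, 1, and the check digit brings the weighted sum up to a multiple of 10.

```python
def ean_check_digit(payload: str) -> int:
    """Return the check digit for an EAN payload given without its last digit."""
    if not payload.isdigit():
        raise ValueError("payload must contain digits only")
    total = 0
    # Walk from the rightmost digit, alternating weights 3, 1, 3, 1, ...
    for position, char in enumerate(reversed(payload)):
        weight = 3 if position % 2 == 0 else 1
        total += weight * int(char)
    return (10 - total % 10) % 10


print(ean_check_digit("400638133393"))  # EAN-13 payload, prints 1
print(ean_check_digit("7351353"))       # EAN-8 payload, prints 7
```

Because the weighting starts from the right, the same function covers the 7, 12 and 13 digit payloads of EAN-8, EAN-13 and EAN-14.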
Here is how you get the check digit for ISBN.
CREATE FUNCTION dbo.fnGetISBN ( @ISBN VARCHAR(11) )
RETURNS VARCHAR(13)
AS
BEGIN
    DECLARE @Index TINYINT, @Weight TINYINT, @Sum SMALLINT

    SELECT @Index = LEN(@ISBN), @Weight = 2, @Sum = 0

    -- Walk from right to left; digits get weights 2, 3, 4, ... and hyphens are skipped
    WHILE @Index > 0
    BEGIN
        IF SUBSTRING(@ISBN, @Index, 1) <> '-'
            SELECT @Sum = @Sum + @Weight * CAST(SUBSTRING(@ISBN, @Index, 1) AS TINYINT),
                   @Weight = @Weight + 1

        SET @Index = @Index - 1
    END

    RETURN CASE @Sum % 11
               WHEN 0 THEN @ISBN + '-0'    -- 11 - 0 = 11 does not fit in CHAR(1)
               WHEN 1 THEN @ISBN + '-X'    -- 11 - 1 = 10 is written as X
               ELSE @ISBN + '-' + CONVERT(CHAR(1), 11 - (@Sum % 11))
           END
END
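The same ISBN-10 rule can be sketched in a few lines of Python, which may be easier to test against known ISBNs. The function name isbn10_check_digit is an assumption for this example. The nine digits are weighted 10 down to 2 from left to right, and the check digit is whatever brings the weighted sum to a multiple of 11, with 10 written as X.

```python
def isbn10_check_digit(prefix: str) -> str:
    """Return the ISBN-10 check character for the first nine digits.

    Hyphens (or any other non-digit separators) in the prefix are ignored.
    """
    digits = [int(char) for char in prefix if char.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected exactly nine digits")
    # Weights run 10, 9, ..., 2 from the leftmost digit to the rightmost.
    total = sum(weight * digit for weight, digit in zip(range(10, 1, -1), digits))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


print(isbn10_check_digit("0-306-40615"))  # prints 2
print(isbn10_check_digit("0-8044-2957"))  # prints X
```

The final `% 11` handles the case where the weighted sum is already a multiple of 11, so the check digit is 0 rather than 11.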
Hi, I have just run a simple data set through a model to predict a simple true or false value (i.e. binary output). The Lift Chart/Mining Legend in Analysis Services shows three results: Score, Population Correct (%), and Predict Probability (%).
Population Correct, I believe, is the percentage of predictions it got right out of the total number of predictions it tried to make. Is this correct?
However, I can't work out how the other two are derived, in particular the 'SCORE'. To give a live example, the scores were as follows:
Model            Score   Pop Correct   Pred Probability
Decision Trees   0.83    76.59%        54.28%
Neural Network   0.75    67.63%        50.05%
Ideal Model              100.00%
Can anyone help with this and give a detailed explanation?
I am trying to model data in Analysis Services with the Advanced Create Mining Model function in the Excel add-in. I am having trouble creating an association model that works like the Associate button above the Advanced button.
The format of my data is like this
OrderID Product
100 Bike
100 Helmet
100 Shoes
200 Helmet
200 basketball
200 Bat
300 Shoes
300 Socks
The Associate button works perfectly, since it asks me which column is the transaction ID (OrderID) and which column I am trying to predict (Product). The Advanced Create Mining Model function asks me to determine what the columns are...
OrderID=key Product=Input+Predict?
When I run the Advanced Create Mining Model function with the association algorithm, I get a browser that shows no rules and support for one-item itemsets only (each product on its own, but no combination of products).
Does anyone know what I have to do to get it to work like the associate button?
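One way to see why the grouping column matters is to count itemsets by hand. The sketch below is plain Python, not the add-in's API, and the variable names are made up. It is only a guess at the cause: the symptom described (support for single items only) is what you would get if every row were treated as its own case instead of all rows with the same OrderID forming one transaction.

```python
from collections import Counter, defaultdict
from itertools import combinations

# The sample data from the post, one (OrderID, Product) pair per row.
rows = [
    (100, "Bike"), (100, "Helmet"), (100, "Shoes"),
    (200, "Helmet"), (200, "basketball"), (200, "Bat"),
    (300, "Shoes"), (300, "Socks"),
]

# If every row is its own case, a case never holds more than one product,
# so only one-item itemsets can ever be counted.
per_row_itemsets = Counter(frozenset([product]) for _order_id, product in rows)

# Grouping rows by OrderID first makes each order one case with several products.
baskets = defaultdict(set)
for order_id, product in rows:
    baskets[order_id].add(product)

# Count how many orders contain each pair of products.
pair_support = Counter()
for products in baskets.values():
    for pair in combinations(sorted(products), 2):
        pair_support[pair] += 1

print(per_row_itemsets)
print(pair_support)
```

With only three orders every pair has support 1, so in a real model the minimum support setting could also hide rules on data this small.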
I am in the design stage for an application that uses SQL Server 2005. We intend to encrypt some sensitive data using the encryption features in SQL Server 2005, and we will use symmetric key encryption. The question is: which symmetric encryption algorithm has the best performance, and how much does the key size affect performance? The data to be encrypted will be some lines of text, roughly the size of a Word document. Any ideas?
We're currently preparing for a project for a bank client of ours in which we would be using SQL Server 2008's data mining capabilities.
Among the out-of-the-box algorithms in SQL Server 2008, do the Logistics, Clustering and Logistic Regression algorithms include Dynamic Logistics, Dynamic Clustering and Dynamic Logistic Regression?
I would like to develop an application that can create data mining structures and a mining model in SQL Server 2005 with VB.NET. I tried the code from chapter 14 of the book Data Mining with SQL Server 2005, but it did not work. Any good ideas?
Thank you very much for your help. The errors I see in the code you gave in your answer are listed below, and they are more or less the same as the ones I had previously.
I tried the code, but I initially encountered the following problems.
1. Any line that has the declaration As Server or As Database, as in Public Function CreateDatabase(ByVal srv As Server, ByVal databaseName As String) As Database, gives me the error that type Database is not declared, and likewise that type Server is not declared, and it does not offer me any option.
2. In addition, As DataSource, As RelationalDataSource, As RelationalDataSourceView, As ScalarMiningStructureColumn and As DataSourceViewBinding give me the error that the type is not declared.
3. Finally, the line mc = New MiningModelColumn("Yearly income", Utils.GetSyntacticallyValidID("Yearly income", Type.GetType(MiningModelColumn))) reports that it is not accessible in this context because it is 'Private'. I have some more problems, but I think that solving the ones above will solve the rest.
I perform data mining on all products and a specific product category. Do I need to create 2 data source views, one for all products and the other one for the specific product category? Afterward, I run the Data Mining Wizard 2 times to create 2 mining structures. I also need to add the same mining model (e.g. Bayes, Cluster) to each of these mining structures. Is there any simple way to do it?
I just found that I am not able to view the accuracy chart for my mining model. The error message is: no mining models are selected for comparison. This is quite strange.
Is it possible to use two algorithms together? I need to write a prediction query that uses both a model with the clustering algorithm and a model with the time series algorithm.
For example:
I have student information and I have to predict the performance of students for a certain period. The students should be classified by type, such as rich kids, poor kids and so on. I then need to predict the performance of the rich kids.
In the info for SQL Compact Edition 3.5 it states that one of the features is:
Support for newer and more secure encryption algorithms.
I can't seem to find details of exactly what these new, more secure algorithms are. It appears 3.1 used 128-bit RSA. Is this the same in 3.5, or has this changed?
I was just wondering which algorithms are used by the different feature buttons in the Excel Add-In. For example, which algorithm does Forecast use to create the forecast?
I have a question about plug-in algorithms in SQL Server 2005. Since we are able to implement our own algorithms in the SQL Server 2005 Analysis Services architecture, what are the main benefits that can be achieved? And if we are going to implement a plug-in algorithm, what considerations should we keep in mind?
Thanks a lot in advance for any guidance and help.
Hi, I have a project that has been given to me and need help please. The complete class is as follows:
Public Class CarAccessData
    Public Function Getcarinfo() As List(Of CarInfo)
        Dim AllCarInfo As New List(Of CarInfo)
        Dim SQL As String
        SQL = "SELECT [SKU], [CarMake], [CarModel], [Carprice]"
        Dim MyConnection As SqlConnection
        MyConnection = New SqlConnection(ConfigurationManager.ConnectionStrings("AntConnectionString1").ConnectionString)
        Dim aCmd As SqlCommand
        aCmd = MyConnection.CreateCommand
        aCmd.CommandText = SQL
        aCmd.CommandType = CommandType.Text
        Dim aDataReader As SqlDataReader
        Try
            MyConnection.Open()
            aDataReader = aCmd.ExecuteReader
            While aDataReader.Read()
                AllCarInfo.Add(New CarInfo(aDataReader))
            End While
        Catch ex As Exception
            Throw ex
        Finally
            aDataReader.Close()
            MyConnection.Close()
        End Try
        Return AllCarInfo
    End Function
End Class
The bit I don't quite understand is the following snippet from above:
Dim aCmd As SqlCommand
aCmd = MyConnection.CreateCommand
aCmd.CommandText = SQL
aCmd.CommandType = CommandType.Text
Dim aDataReader As SqlDataReader
Try
    MyConnection.Open()
    aDataReader = aCmd.ExecuteReader
    While aDataReader.Read()
        AllCarInfo.Add(New CarInfo(aDataReader))
    End While
Catch ex As Exception
    Throw ex
Finally
Can anyone explain this in real ABC style, step by step, please, just so I can start to understand it? (I am quite new to this.) Many thanks, Anteater
The statement if ((columns_updated() & 2 + 4 + 8) > 0) is supposed to tell me if the 2nd, 3rd or 5th columns were updated. My question is: what designates columns 2, 3 and 5 when 2, 4 and 8 are in the statement?
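The numbers in that statement are bit values, not column positions. For the first eight columns, column n is represented by the bit with value 2^(n-1), so the arithmetic can be checked with a short Python sketch. The helper names column_bit and any_updated are made up for this illustration and are not SQL Server functions.

```python
def column_bit(n: int) -> int:
    """Bit value that stands for the n-th column (1-based): 1, 2, 4, 8, 16, ..."""
    return 1 << (n - 1)


def any_updated(mask: int, columns) -> bool:
    """True if the bitmask has a bit set for at least one of the given columns."""
    wanted = sum(column_bit(n) for n in columns)
    return (mask & wanted) > 0


# 2, 4 and 8 are the bits for columns 2, 3 and 4.
print([column_bit(n) for n in (2, 3, 4)])      # prints [2, 4, 8]
# Column 5 would need the bit with value 16, which 2 + 4 + 8 does not include.
print(column_bit(5))                           # prints 16
print(any_updated(column_bit(5), [2, 3, 4]))   # prints False
```

On that reading, 2 + 4 + 8 tests columns 2, 3 and 4, and a test that includes the 5th column would need 16 in the mask.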
I am having some problems with transactions. Although the problem seems to be in the VB code making the database call, it may be that SQL is the source of the problem.
1. There are a number of stored procs that contain transactions, most of which are inserts followed by a select statement to retrieve the most recently added ID. To start with, is a select the best way, or is @@Identity? I have read that @@Identity is global, so for an external server that is running a number of databases I stayed away from it. Did I take the right action, or is there a better way?
2. Is having SQL transactions within a call from VB.NET OK? I know that the SqlClient class doesn't support nested transactions, but does that include transactions within SQL?
3. If a stored proc is called from VB inside a SqlClient transaction, can ALL the calls made within the sproc be rolled back?
If there is a reliable way to obtain the identity without containing it in a transaction, and the internal SQL transactions are not the problem, then I am home free, so I am hoping this is the case.
I am wondering what od1 and od2 are used for in SQL queries. Are they used for joining? Can anyone explain their significance in the queries below (especially the commands in red)?
USE Northwind
SELECT OrderID, CustomerID
FROM Orders o
WHERE 20 < (SELECT Quantity
            FROM [Order Details] od
            WHERE o.OrderID = od.OrderID AND od.ProductID = 23)

USE Northwind
SELECT DISTINCT ProductName, Quantity
FROM [Order Details] od1
JOIN Products p ON od1.ProductID = p.ProductID
WHERE Quantity = (SELECT MAX(Quantity)
                  FROM [Order Details] od2
                  WHERE od1.ProductID = od2.ProductID)
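A small runnable sketch may help show what the names do. It uses Python's sqlite3 module with an invented order_details table rather than Northwind, so the data and column names are assumptions. The point it illustrates is that od1 and od2 are table aliases: two names for two separate passes over the same table, so that the inner query can refer to the row the outer query is currently looking at.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_details (order_id INTEGER, product_id INTEGER, quantity INTEGER);
    INSERT INTO order_details VALUES
        (1, 23, 10), (2, 23, 40), (3, 23, 40), (1, 7, 5), (2, 7, 3);
""")

# od1 is the outer pass over the table, od2 the inner pass.
# For each od1 row, the subquery finds the largest quantity among the
# od2 rows for the same product, and od1 is kept only if it matches it.
rows = conn.execute("""
    SELECT DISTINCT od1.product_id, od1.quantity
    FROM order_details od1
    WHERE od1.quantity = (SELECT MAX(od2.quantity)
                          FROM order_details od2
                          WHERE od2.product_id = od1.product_id)
    ORDER BY od1.product_id
""").fetchall()

print(rows)  # prints [(7, 5), (23, 40)]
```

Without the two aliases there would be no way to say "the product of the outer row" inside the subquery, because both references are to the same table.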
In T-SQL for SQL Server, what is the technical difference between the comparisons "is" and "="? For example:
set @test = null
print @test is null -> true
print @test = null -> false
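The behaviour can be reproduced outside SQL Server. The sketch below uses Python's sqlite3 module, so it shows standard SQL three-valued logic rather than T-SQL itself (in SQL Server the ANSI_NULLS setting also plays a part). The table t and column x are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# IS NULL is a test that answers true or false.
# "= NULL" is a comparison with an unknown value, so its result is unknown (NULL).
is_null, equals_null = conn.execute("SELECT NULL IS NULL, NULL = NULL").fetchone()
print(is_null)      # prints 1
print(equals_null)  # prints None

# In a WHERE clause an unknown result is not true, so the row is filtered out.
conn.executescript("""
    CREATE TABLE t (x INTEGER);
    INSERT INTO t VALUES (NULL);
""")
with_equals = conn.execute("SELECT COUNT(*) FROM t WHERE x = NULL").fetchone()[0]
with_is = conn.execute("SELECT COUNT(*) FROM t WHERE x IS NULL").fetchone()[0]
print(with_equals, with_is)  # prints 0 1
```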
OK, here is a sample table representing the problem more clearly:

A  | B  | C  | D
-----------------
a1   b1   c1   d1
a1   b2   c2   d2
a3   b3   c1   d3
a4   b4   c4   d3
a5   b5   c5   d5
a6   b6   c6   d3

The duplications are:
rows 1+2 in param A
rows 1+3 in param C
rows 3+4+6 in param D
Only row 5 is unique in all parameters.

Conclusion: rows 1+2+3+4+6 are the same user.
Goal: to find all duplicated rows and delete them all except one instance to leave.

Note: finding that row 1 is similar to row 2 in A and deleting it will lose data, because we won't know that row 1 is ALSO similar to row 3 on C, and later on find that row 3 is similar to rows 4 and 6 on D, and so on.

The simple, time-consuming (about 2 weeks) query to accomplish the task is:
SELECT count(*), A, B, C, D
FROM tbl
GROUP BY A, B, C, D
HAVING count(*) > 1

I THANK YOU ALL
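The chaining described in the note (1 matches 2 on A, 1 matches 3 on C, 3 matches 4 and 6 on D) is a connected-components problem. One way to sketch it, outside SQL and purely as an illustration, is a union-find structure in Python: rows that share a value in any column are merged into the same group, and one row per group is kept. The row numbers and the choice to keep the lowest row number are assumptions made for the example.

```python
# The sample table from the post, keyed by row number.
rows = {
    1: ("a1", "b1", "c1", "d1"),
    2: ("a1", "b2", "c2", "d2"),
    3: ("a3", "b3", "c1", "d3"),
    4: ("a4", "b4", "c4", "d3"),
    5: ("a5", "b5", "c5", "d5"),
    6: ("a6", "b6", "c6", "d3"),
}

parent = {row_id: row_id for row_id in rows}


def find(x):
    """Return the representative row of x's group (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(a, b):
    """Merge the groups of a and b, keeping the lower row number as root."""
    root_a, root_b = find(a), find(b)
    if root_a != root_b:
        parent[max(root_a, root_b)] = min(root_a, root_b)


# Two rows belong together if they share a value in the same column.
first_seen = {}
for row_id, values in rows.items():
    for column, value in enumerate(values):
        key = (column, value)
        if key in first_seen:
            union(row_id, first_seen[key])
        else:
            first_seen[key] = row_id

groups = {}
for row_id in rows:
    groups.setdefault(find(row_id), []).append(row_id)

to_delete = sorted(r for members in groups.values() for r in members[1:])
print(groups)     # prints {1: [1, 2, 3, 4, 6], 5: [5]}
print(to_delete)  # prints [2, 3, 4, 6]
```

Each row is visited once per column, so the work grows roughly linearly with the table size instead of comparing every row with every other row.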
What does each of these files contain? I can figure out that the main database is DB_Data.DAT, but why is the transaction log a .DAT file, and why are there four files instead of two? Et cetera.
I went to Microsoft to find some information about the InStr function. I need to perform a search on a string similar to the one in their example below. Can anyone explain Microsoft's example to me? I am a little confused by the parameters used and by the results it gives back.
Dim SearchString, SearchChar, MyPos
SearchString = "XXpXXpXXPXXP"    ' String to search in.
SearchChar = "P"    ' Search for "P".
MyPos = Instr(4, SearchString, SearchChar, 1)    ' A textual comparison starting at position 4. Returns 6.
MyPos = Instr(1, SearchString, SearchChar, 0)    ' A binary comparison starting at position 1. Returns 9.
MyPos = Instr(SearchString, SearchChar)    ' Comparison is binary by default (last argument is omitted). Returns 9.
MyPos = Instr(1, SearchString, "W")    ' A binary comparison starting at position 1. Returns 0 ("W" is not found).
My problem is this:
I need to scan SearchString for blank/space characters. When I find one, I need to place the values to the left and right of it in separate columns. For example, I would need to scan 'John Smith A' and then place 'John' in the FirstName column, 'Smith' in the LastName column, and 'A' in the MidName column.
I think this is how my code would read, but I am confused about how to place the results into the correct columns of my table.
My search string would be SearchString = "John Smith A" and my SearchChar would be SearchChar = " " (note that I am searching for a space/blank character).
So would my code then be like this:
Dim SearchString, SearchChar, MyPos
SearchString = "John Smith A"
SearchChar = " "
MyPos = Instr(1, SearchString, SearchChar, 0)
How do I get whatever is returned from the Instr function to a column in a table??
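InStr only returns a position; the pieces of the string still have to be cut out around that position and then written to the table with an INSERT. Here is a rough sketch of those two steps in Python. The instr() helper imitates VB's 1-based, binary-comparison InStr, and the table people with its columns FirstName, LastName and MidName is an assumption for the example, using the built-in sqlite3 module as a stand-in database.

```python
import sqlite3


def instr(start: int, text: str, sought: str) -> int:
    """1-based position of sought in text from start onward, or 0 if not found."""
    return text.find(sought, start - 1) + 1


search_string = "John Smith A"

# Positions of the first and second space (5 and 11 for this string).
first_space = instr(1, search_string, " ")
second_space = instr(first_space + 1, search_string, " ")

# Cut the three parts out around the two positions.
first_name = search_string[:first_space - 1]
last_name = search_string[first_space:second_space - 1]
mid_name = search_string[second_space:]

# Write the parts to their columns with a parameterized INSERT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (FirstName TEXT, LastName TEXT, MidName TEXT)")
conn.execute(
    "INSERT INTO people (FirstName, LastName, MidName) VALUES (?, ?, ?)",
    (first_name, last_name, mid_name),
)
stored = conn.execute("SELECT FirstName, LastName, MidName FROM people").fetchone()
print(stored)  # prints ('John', 'Smith', 'A')
```

The sketch assumes the string always contains exactly two spaces; names with fewer or more parts would need an extra check on the positions returned.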