Could anyone here please help me with this problem?
My problem is: I have registered my plug-in algorithm with SQL Server 2005 Analysis Services, and I can see the algorithm added to the Analysis Services configuration file (msmdsrv.ini). But why can I not see my algorithm in the list of algorithms when I test it? I really need help with this.
I am working with the managed plug-in framework that's available for download here: http://www.microsoft.com/downloads/details.aspx?familyid=DF0BA5AA-B4BD-4705-AA0A-B477BA72A9CB&displaylang=en#DMAPI.
This package includes the source code for a sample plug-in algorithm written in C#.
In this source code, all the .cs files are tailored to a clustering algorithm.
If my plug-in algorithm is an association or classification algorithm, what modifications are required in the source code?
I'm writing my master's thesis on a new plug-in algorithm based on the LVQ algorithm. I worked through the tutorial with the pair_wise_linear_regression algorithm and I have some doubts. I searched for the code of that algorithm in the tutorial files and could not find it. I have my new algorithm already programmed in C++ and ready to attach, but I don't know where to put it: in which file do I start defining the COM interfaces? And in which file of the tutorial's src folder is the code of the pair_wise_linear_regression algorithm?
I read that it is possible to create a custom algorithm and use it as a plug-in to SQL Server 2005. Which programming languages are available for this purpose? C++ only? Can I use .NET?
For an internship I am writing a data mining plug-in algorithm for SSAS in C#. My algorithm is a subgroup discovery algorithm, and to determine the quality of the discovered rules/patterns I need to know the support of the rules.
The rules are of the form (a = x AND b < y THEN c = z). I managed to obtain some statistics by calling MarginalStatistics.getCasesCount(..,..), but I would like more functionality.
For example, I want to evaluate the rule (column1 = 1 AND column2 = 2 THEN column3 > 0); for my example data the result should be 2. So my question is: how do I get the support of my rule in my C# algorithm? A sketch of what I mean is below.
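To make the question concrete, here is a minimal, self-contained sketch of what I mean by the support of a rule. The Case class and the sample values are made up for illustration; inside the plug-in the cases would of course come from the case set the framework feeds to the algorithm, not from an in-memory list.
//========================= begin code =======================
using System;
using System.Collections.Generic;

// Minimal sketch: a Case here is just the three columns from my example.
class Case
{
    public int Column1;
    public int Column2;
    public int Column3;

    public Case(int c1, int c2, int c3)
    {
        Column1 = c1; Column2 = c2; Column3 = c3;
    }
}

class RuleSupportSketch
{
    // Support of the rule (Column1 = 1 AND Column2 = 2) THEN Column3 > 0:
    // the number of cases that satisfy antecedent AND consequent.
    static int Support(List<Case> cases)
    {
        int count = 0;
        foreach (Case c in cases)
        {
            bool antecedent = (c.Column1 == 1 && c.Column2 == 2);
            bool consequent = (c.Column3 > 0);
            if (antecedent && consequent)
                count++;
        }
        return count;
    }

    static void Main()
    {
        // Illustrative data only; the first two cases satisfy the rule,
        // so this prints 2.
        List<Case> cases = new List<Case>();
        cases.Add(new Case(1, 2, 5));
        cases.Add(new Case(1, 2, 7));
        cases.Add(new Case(1, 3, 0));

        Console.WriteLine(Support(cases));
    }
}
//========================= end code =======================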
Thanks in advance, Joris Valkonet jorisv@avanade.com
I need to develop a Probit Regression Plug-In Algorithm. Does anyone know if the plug-in framework will reasonably handle a Probit Regression? Is anyone aware of any code or materials, specific to a Probit Regression Plug-in, that would help me to do this? I am also interested in applying the dprobit methodology found in Stata for infinitesimal changes in independent variables. Has anyone been successful using Stata to implement an SSAS plug-in algorithm?
I succeeded in making room for the _vCAttStats vector, but when I tried to provide room for the vectors inside that vector I got an assertion failure (file dmhallocator.h, line 56, expression: assert(_dmhalloc._spidmmemoryallocator != NULL)). Please see the code below, which is included in the NAVIGATOR::GetNodeArrayProperty function:
//========================= begin code =======================
//========================= end code =======================
I have a question about plug-in algorithms in SQL Server 2005. Since we are able to implement our own algorithms within the SQL Server 2005 Analysis Services architecture, my question is: what benefits can realistically be achieved? In other words, if we are going to implement a plug-in algorithm, what considerations should we keep in mind?
Thanks a lot in advance for any guidance and help.
I am working on an academic project using SQL Server 2005 and Visual Studio 2005, using the Apriori algorithm to find associations between a patient's city and likely diseases.
I have created a PATIENT table with Patient_Id, Patient_name, Age, and City attributes, and a Diseases table with Disease_Id and Disease_name. These two tables are connected in a many-to-many [M:N] relationship, which gives a third relation with Patient_Id and Disease_Id attributes.
For now I am just putting some dummy data into the patient and disease tables to make the Apriori algorithm work. When a new patient city is entered into the patient table, the system checks the patient table for the same city stored previously and, using the third relation, pulls the diseases associated with that city. (A small sketch of the support counting I have in mind follows the table definitions below.)
Here are my tables with attributes:
PATIENT ( Patient_Id, Patient_name, Age, City)
Diseases(Disease_Id, Disease_name)
The third relation below exists because of the many-to-many [M:N] relationship:
PATIENT_DISEASES(Patient_Id, Disease_Id)
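To illustrate what I ultimately want the algorithm to compute, here is a small sketch of counting the support of each (City, Disease) pair. The in-memory rows and values are placeholders standing in for the result of joining PATIENT, PATIENT_DISEASES, and Diseases.
//========================= begin code =======================
using System;
using System.Collections.Generic;

// Sketch only: each row stands in for one row of the join
// PATIENT -> PATIENT_DISEASES -> Diseases, reduced to (City, Disease_name).
class CityDiseaseSupportSketch
{
    static void Main()
    {
        // Illustrative data; in practice this comes from the three tables.
        string[,] rows =
        {
            { "Boston", "Flu" },
            { "Boston", "Flu" },
            { "Boston", "Asthma" },
            { "Austin", "Flu" }
        };

        // Count the support of each (City, Disease) pair - the basic
        // quantity Apriori needs before applying a minimum-support cutoff.
        Dictionary<string, int> support = new Dictionary<string, int>();
        for (int i = 0; i < rows.GetLength(0); i++)
        {
            string key = rows[i, 0] + " -> " + rows[i, 1];
            if (support.ContainsKey(key))
                support[key]++;
            else
                support[key] = 1;
        }

        foreach (KeyValuePair<string, int> kv in support)
            Console.WriteLine(kv.Key + ": " + kv.Value);
    }
}
//========================= end code =======================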
I do believe there is a more efficient way of doing this, instead of using dummy data or these relationships. I did check the Microsoft Association algorithm and realised it is not the Apriori algorithm.
Could you suggest the best or most efficient way of doing this using SQL Server 2005?
Your help and insight into this matter is highly appreciated.
First of all I would like to politely greet everybody, as I'm new to this forum and, in fact, new to data mining.
To introduce myself: I'm a Computer Science student trying to use the Time Series algorithm for weather analysis. I know that forecasting the weather is a hopeless task even for the fastest computers in the world, but what I'm trying to do is a kind of a posteriori analysis of historical data, to notice dependencies or characteristic weather behaviour in a specified region and perhaps make some short-term predictions.
I tried the Time Series algorithm, although I have some doubts about the methodological justification of this choice (if you have any critical comments, please share them with me). But my main questions are about the usage of the algorithm itself:
I've read the documentation and the tutorial on this page about historical predictions, but I still don't know what exactly HistoricalModelCount and HistoricalModelGap are. I know that my historical predictions are bounded by -HistoricalModelCount*HistoricalModelGap*, but that is rather operational knowledge... The explanation is always clouded with the phrase "internal model". Can you point me to a document where I can find more detailed information? (What is the form of the model? How is it built? etc.)
Periodicity Hint: how should I treat these optional values? Are they other possible periods in the data? I have weather measurements taken every six hours for thirteen years**, so is it a good choice to set this parameter to {365*4, 4} (the first value for a year, the second for a day)?
This is a technical question and I'm really ashamed to bother you with it. On the time chart in the model viewer I can see data from the last year only. Zooming in/out and clicking insanely on every pixel on the screen did not give any result (apart from broken mouse buttons). Is it possible to browse all of that data in the mining model viewer chart? Thank you in advance for your replies!
*This formula suggests how these parameters could work, but I would like to know for sure, since I don't want to make any awful mistakes in my project. :-) **Of course I plan to reduce the amount of data, but the period will stay.
The first question is about the processing speed of the Time Series algorithm.
Using the SQL Server 2005 Time Series algorithm, I built a data mining model, but after three days it is still training. The data has 2,200,00 rows.
What can I do to improve the processing speed?
Thanks!
The second question is about parameters in the Data Mining Query Task.
The Data Mining Query Task is used to get data from a data mining model. On the mining model tab I choose a mining model, and on the query tab I wrote a DMX query: "SELECT FLATTENED TOP 100 PredictTimeSeries([Xssl], 1) FROM [Time Series XSSL]". Finally, I choose a table to receive the data from the mining model.
We are running SQL Server 7.0 SP2, and are experiencing the following out-of-space error message:
"Could not allocate new page for database 'FooBar'. There are no more pages available in filegroup SECONDARY. Space can be created by dropping objects, adding additional files, or allowing file growth."
Needless to say, the database is set for 10% unlimited autogrowth and there IS available space in the partition where the filegroup resides.
Any ideas as to why this is happening? What is SQL Server's algorithm for allocating space when growing a database? Must it satisfy the request in one 'extent', and could the cause of our problem be that our disk is fragmented?
I have installed the plug-in on two different virtual machines. On one, everything works fine, but on the other, the plug-in does not seem to load when Excel starts. I can see the plug-in in the "Inactive Application Add-ins" section of the Excel Options Add-Ins panel, but when I check the two add-in options in the "Manage Add-ins" dialog, they do not move up to the "Active" section. Stopping and starting Excel does not resolve the issue.
Any ideas on how to get Excel to make the plug-ins active?
I'm having trouble with my plug-in algorithm while filling out my model rowsets.
How do I return ATTRIBUTE_NAME through GetNodeProperty if no ID exists for it? It also appears that no call is ever made to retrieve that node property.
In SQL Express SP2, when I select Tools > Options, there is a place where I am supposed to be able to specify the source control plug-in. I have SourceSafe 2005 installed on this machine, so I see these choices in the drop-down:
None
Microsoft Visual SourceSafe
Microsoft Visual SourceSafe (Internet)
The problem is that whenever I select one of the SourceSafe options, it goes back to "None".
I'm not even sure how the source control integration works, but I figured I have to select the plug-in before doing anything else.
How can I select the SourceSafe plug-in under SQL Express SP2?
I have a further consideration about data mining plug-in algorithms. When we say we are going to embed a user plug-in algorithm, what is the context for that? I mean, in which cases do we decide we need to embed a user plug-in algorithm? I know that when we say we are going to embed a customized user plug-in algorithm, it means we want something more customized, but what kinds of customized features are generally of concern? Is this independent of the particular market sector?
I don't think we can simply embed a plug-in algorithm and then compete it against the available algorithms to see which one gives better prediction accuracy, can we?
Would someone here please give me some guidance about that?
We have an SSIS package that will be used for both our Test and Prod imports on the same server. The SSIS imports are identical except that Test needs all connections pointing to the Test database while Prod needs its connections pointing to the Prod database.
How can I change the connections used inside a single SSIS package based on Test or Prod? (I don't want to create two tweaked packages on the same server; if I find a bug in one of them, I have to correct it twice.)
I am new to DM and I am not sure which algorithm would be best to use.
I am trying to build a custom comparator application that companies can use to compare themselves against other companies based on certain pieces of information. I need to group a company with 10 other companies based on 6 attributes. I need the ability to apply weightings to each of the 6 attributes and have those weightings taken into consideration when determining which 10 other companies each company is grouped with. Each group must contain 11 members: the company of the logged-in user and the 10 other companies it will be compared against.
At first I thought that clustering would be a good fit for this, but I cannot see a way to mandate that each cluster contains exactly 11 members, I cannot see a way to weight the inputs, and I think each company can only be in one cluster at a time, none of which meets my requirements. (A rough sketch of the kind of grouping I mean is below.)
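For illustration only, this is roughly the kind of weighted grouping I need. The company names, attribute values, and weights are made up, and the weighted-distance-plus-top-10 approach is just my own sketch of the requirement, not something I expect Analysis Services to provide as-is.
//========================= begin code =======================
using System;
using System.Collections.Generic;

// Sketch of the "weighted nearest companies" idea.
class Company
{
    public string Name;
    public double[] Attributes; // six numeric attributes, pre-scaled

    public Company(string name, double[] attributes)
    {
        Name = name; Attributes = attributes;
    }
}

class PeerGroupSketch
{
    // Weighted squared Euclidean distance between two companies.
    static double Distance(Company a, Company b, double[] weights)
    {
        double sum = 0.0;
        for (int i = 0; i < weights.Length; i++)
        {
            double d = a.Attributes[i] - b.Attributes[i];
            sum += weights[i] * d * d;
        }
        return sum;
    }

    // Returns the 10 companies closest to 'target' under the given weights.
    static List<Company> PeerGroup(Company target, List<Company> all, double[] weights)
    {
        List<Company> others = new List<Company>();
        foreach (Company c in all)
            if (c != target)
                others.Add(c);

        others.Sort(delegate(Company x, Company y)
        {
            return Distance(target, x, weights).CompareTo(Distance(target, y, weights));
        });

        return others.GetRange(0, Math.Min(10, others.Count));
    }

    static void Main()
    {
        double[] w = { 3, 1, 1, 1, 2, 1 }; // heavier weight on attributes 1 and 5
        List<Company> all = new List<Company>();
        all.Add(new Company("A", new double[] { 1, 0, 0, 0, 1, 0 }));
        all.Add(new Company("B", new double[] { 1, 0, 0, 0, 1, 1 }));
        all.Add(new Company("C", new double[] { 9, 9, 9, 9, 9, 9 }));

        foreach (Company c in PeerGroup(all[0], all, w))
            Console.WriteLine(c.Name); // prints B, then C
    }
}
//========================= end code =======================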
Well, I have read in Claude Seidman's data mining book that some of the algorithms inside Microsoft Decision Trees are the CART, CHAID, and C4.5 algorithms. Could anyone explain these three algorithms to me, and explain how they are used together in one case?
Hello, do you know if the algorithm for the BINARY_CHECKSUM function is documented somewhere? I would like to use it to avoid returning some string fields from the server. By returning only the checksum, I could look up the string in a hashtable, and I think this could make the code more efficient on slow connections. (A sketch of the client-side cache I have in mind is below.) Thanks in advance and kind regards, Orly Junior
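To show the pattern I have in mind, here is a rough client-side sketch. Resolve and fetchStringFromServer are placeholder names for whatever call would return the full string, and a real implementation would also have to deal with the fact that BINARY_CHECKSUM is not collision-free.
//========================= begin code =======================
using System;
using System.Collections.Generic;

// Sketch: keep strings already seen in a local cache keyed by their
// checksum, and only fetch the full string when the checksum is new.
class ChecksumCacheSketch
{
    private readonly Dictionary<int, string> cache = new Dictionary<int, string>();

    public string Resolve(int checksum, Func<int, string> fetchStringFromServer)
    {
        string value;
        if (cache.TryGetValue(checksum, out value))
            return value;                              // cache hit: nothing sent over the wire

        value = fetchStringFromServer(checksum);       // cache miss: one extra round trip
        cache[checksum] = value;
        return value;
    }

    static void Main()
    {
        ChecksumCacheSketch resolver = new ChecksumCacheSketch();
        // Placeholder fetch: in reality this would query the server by key.
        Func<int, string> fetch = delegate(int checksum) { return "value for " + checksum; };
        Console.WriteLine(resolver.Resolve(42, fetch)); // fetched from "server"
        Console.WriteLine(resolver.Resolve(42, fetch)); // served from the cache
    }
}
//========================= end code =======================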
What kind of algorithm does the MAX command use? I have a table where I need to get the last value of the transaction ID and increment it by 1 so I can use it as the next TransID every time I insert a new record into the table; I use the MAX command to obtain the last TransID in this process. However, someone suggested that there is a problem with this: if multiple users try to insert a record into the same table and processing is slow, they might end up with the same next TransID. He came up with the idea of having a separate table that contains only the TransID and using that table to determine the next TransID (a sketch of that approach is below). Will this really make a difference as far as processing speed is concerned, or is using the MAX command on the same table to come up with the next TransID enough? Do you have a better suggestion?
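For reference, this is a sketch of the separate key-table idea as I understood it. The NextTransId table and LastId column are placeholder names; the point is only that the UPDATE locks the counter row until the transaction commits, so two concurrent callers cannot read the same value, unlike SELECT MAX(...) followed by a separate INSERT.
//========================= begin code =======================
using System;
using System.Data;
using System.Data.SqlClient;

// Sketch of the "separate key table" approach. Assumes a one-row table
// NextTransId with an integer column LastId (names are illustrative).
class NextIdSketch
{
    public static int GetNextTransId(string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (SqlTransaction tx = conn.BeginTransaction(IsolationLevel.Serializable))
            {
                SqlCommand cmd = new SqlCommand(
                    "UPDATE NextTransId SET LastId = LastId + 1; " +
                    "SELECT LastId FROM NextTransId;", conn, tx);
                int nextId = (int)cmd.ExecuteScalar();

                // ... INSERT the new record here, inside the same transaction,
                // using nextId as the TransID ...

                tx.Commit();
                return nextId;
            }
        }
    }
}
//========================= end code =======================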
I have a few questions regarding the Clustering algorithm.
If I process the clustering model with K (the number of clusters) ranging from 2 to n, how do I find a measure of variation and loss of information for each model (any kind of measure)? The purpose would be to decide which K to take; a sketch of the kind of measure I mean follows these questions.
Which clustering method is better to use when segmenting data, K-means or EM?
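To make the first question more concrete, this is the kind of measure I have in mind. The points, the cluster assignments, and the choice of within-cluster sum of squares are just an illustrative sketch computed outside Analysis Services, not something the Clustering algorithm exposes directly.
//========================= begin code =======================
using System;

// Sketch of one possible "measure of variation": the within-cluster sum
// of squared distances to each cluster's centroid, computed per model so
// that models with different K can be compared. Data is made up.
class WithinClusterVariationSketch
{
    static void Main()
    {
        // points[i] = coordinates of case i; assignment[i] = its cluster id
        double[][] points =
        {
            new double[] { 1.0, 1.0 },
            new double[] { 1.2, 0.9 },
            new double[] { 5.0, 5.1 },
            new double[] { 4.8, 5.3 }
        };
        int[] assignment = { 0, 0, 1, 1 };
        int k = 2;

        // Compute the centroid of each cluster.
        double[][] centroid = new double[k][];
        int[] count = new int[k];
        for (int c = 0; c < k; c++) centroid[c] = new double[points[0].Length];
        for (int i = 0; i < points.Length; i++)
        {
            count[assignment[i]]++;
            for (int d = 0; d < points[i].Length; d++)
                centroid[assignment[i]][d] += points[i][d];
        }
        for (int c = 0; c < k; c++)
            for (int d = 0; d < centroid[c].Length; d++)
                centroid[c][d] /= count[c];

        // Within-cluster sum of squares (lower = tighter clusters).
        double wcss = 0.0;
        for (int i = 0; i < points.Length; i++)
            for (int d = 0; d < points[i].Length; d++)
            {
                double diff = points[i][d] - centroid[assignment[i]][d];
                wcss += diff * diff;
            }

        Console.WriteLine(wcss);
    }
}
//========================= end code =======================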
I want to predict which products can be sold together. Please help me decide which algorithm is best: association, clustering, or decision trees? And please let me know how to use a case table and a nested table. My table structure is:
Hi, I am using SQL Server 2005 as the back end for my project. We are developing a stand-alone web application for a client, so we need to host this application on his server. He is not willing to install SQL Server 2005 on his server, so we are going with placing the .mdf file in the data directory of the project.
Before this, when I developed on SQL Server 2005, I used the AES_256 algorithm to encrypt and decrypt the password column using symmetric keys, and it worked fine.
But when I took the project's .mdf file and added it into my project, it throws an error at the creation of the symmetric key: "Either no algorithm has been specified or the bitlength and the algorithm specified for the key are not available in this installation of Windows."
Obviously, for Person1 and 200501 I expect to see $3000 in the MS Time Series viewer, correct? Instead I see REVENUE (actual) - 200501 VALUE = XXX, where XXX is a completely different number.
Also, there are negative numbers in the forecast area, which is not correct from a business point of view; Person1, who is a tough guy, tried to shoot me. What am I doing wrong? Could you please give me an idea how to extract the correct historical and prediction information?