I need to develop a Probit Regression Plug-In Algorithm.
Does anyone know if the plug-in framework will reasonably handle a Probit Regression?
Is anyone aware of any code or materials, specific to a Probit Regression Plug-in, that would help me to do this?
I am also interested in applying the dprobit methodology found in Stata for infinitesimal changes in independent variables.
Has anyone been successful using Stata to implement an SSAS plug-in algorithm?
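For context, the quantity dprobit reports follows from the probit model itself rather than from any particular framework, so here is a sketch of the standard formulas with nothing SSAS-specific assumed: the probit model is P(y = 1 | x) = Phi(x'b), where Phi is the standard normal CDF; the marginal effect of a continuous regressor x_j is dP/dx_j = phi(x'b) * b_j, with phi the standard normal density, and dprobit-style output evaluates this at the sample means of the regressors (for a dummy regressor it reports the discrete change in probability as the dummy goes from 0 to 1).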
I'm writing my master's thesis on a new plug-in algorithm based on the LVQ algorithm. I worked through the tutorial with the pair_wise_linear_regression algorithm and I have some doubts. I searched the tutorial files for the code of that algorithm and didn't see it. I have my new algorithm programmed in C++ and ready to attach, but I don't know where to put it: in which file do I have to place it to start defining the COM interfaces? And in which file is the code of the pair_wise_linear_regression algorithm in the SRC folder of the tutorial?
I read that it is possible to create a custom algorithm and use it as a plug-in to SQL Server 2005. What programming languages are available for this purpose? C++ only? Can I use .NET?
For an internship I am writing a data mining plug-in algorithm for SSAS in C#. My algorithm is a subgroup discovery algorithm, and to determine the quality of the discovered rules/patterns I need to know the support of each rule.
The rules are of the form (a = x AND b < y THEN c = z). I managed to obtain some statistics by calling MarginalStatistics.getCasesCount(.., ..), but I would like more functionality.
For example, I want to evaluate the rule (column1 = 1 AND column2 = 2 THEN column3 > 0); the result should be 2. My question is: how do I get the support of my rule in my algorithm written in C#?
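As a point of reference (about the quantity itself, not the plug-in API): the support I am after is simply the count of training cases that satisfy both the antecedent and the consequent. Against a relational copy of the training data it would be the following, where the table name is only illustrative:

SELECT COUNT(*) AS SupportCount
FROM dbo.TrainingCases   -- hypothetical relational copy of the case set
WHERE column1 = 1 AND column2 = 2 AND column3 > 0;

Inside the plug-in itself, the same count would have to come from whatever case iteration the framework exposes to the algorithm.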
Thanks in advance, Joris Valkonet jorisv@avanade.com
Could anyone here please help me with this problem?
My problem is: I have registered my plug-in algorithm with SQL Server 2005 Analysis Services, and I can see it added to the Analysis Services configuration file (msmdsrv.ini). But why can I not see my algorithm in the list of algorithms when I test it? I really need help with this.
I succeeded in making room for the _vCAttStats vector, but when I tried providing room for the vectors inside that vector I got an assertion failure (file dmhallocator.h, line 56, expression: assert(_dmhalloc._spidmmemoryallocator != NULL)). Please see the code below, included in the NAVIGATOR::GetNodeArrayProperty function:
//========================= begin code =======================
//========================= end code =======================
There is a managed plug-in framework that's available for download here: http://www.microsoft.com/downloads/details.aspx?familyid=DF0BA5AA-B4BD-4705-AA0A-B477BA72A9CB&displaylang=en#DMAPI.
This package includes the source code for a sample plug-in algorithm written in C#.
In this source code, all the .cs files are set up for a clustering algorithm. If my plug-in algorithm is of the association or classification type, what modifications are required in the source code?
This question is regarding the LogRegHelper "A scorecard for Logistic Regression models" example on the sqlserverdatamining Tips and Tricks page. I launched TestLogReg (the Analysis Services database associated with the project) and ran Logistic Regression over it. While LogRegHelper shows the highest score, 558, for IQ (107 - 121), the Logistic Regression viewer shows that Parent Encouragement has the highest score for the case College Plans = 'Plans to Attend'. Can someone verify this and clarify?
I have a few other questions about LR:
- In the SQL Server 2005 LR Mining Model Viewer "favors" chart, what algorithm is used for generating scores?
- Can I use this score as a feature selector? Higher score => stronger predictor (input)?
- Is the coefficient weight algorithm used in LogReg wrong?
I have installed the plug-in on two different virtual machines. On one, everything works fine, but on the other, the plug-in does not seem to load when Excel starts. I can see the plug-in in the "Inactive Application Add-ins" section of the Excel Options Add-Ins panel, but when I check the two add-in options in the "Manage Add-ins" dialog, they do not move up to the "Active" section. Stopping/starting Excel does not resolve the issue.
Any ideas on how to get Excel to make the plug-ins active?
I'm having trouble with my plug-in algorithm while filling out my model rowsets.
How do I return ATTRIBUTE_NAME through GetNodeProperty if no ID exists for it? Also, it appears that no call is ever made to retrieve that node property.
In SQL Express SP2, when I select Tools > Options, there is a place where I am supposed to be able to specify the source control plug-in. I have SourceSafe 2005 installed on this machine, so I see these choices in the drop-down:
None
Microsoft Visual SourceSafe
Microsoft Visual SourceSafe (Internet)
The problem is that whenever I select one of the SourceSafe options, it goes back to "None".
I'm not even sure how the source control integration works, but I figured I have to select the plug-in before doing anything else.
How can I select the SourceSafe plug-in under SQL Express SP2?
I have a question about plug-in algorithms in SQL Server 2005. Since we are able to implement our own algorithms in the SQL Server 2005 Analysis Services architecture, my question is: what benefits can realistically be achieved? Say we are going to implement a plug-in algorithm; what considerations should we keep in mind?
Thanks a lot in advance for any guidance and help.
I have a further consideration about data mining plug-in algorithms. When we say we are going to embed a user plug-in algorithm, what is the context for that? I mean, in which cases do we think we need to embed a user plug-in algorithm? I know that when we say we are going to embed a user-customized plug-in algorithm, it means we want something more customized. But what kinds of customized features are generally of concern? Does this differ across market sectors?
I don't think we can just embed a plug-in algorithm and then pit it against the available algorithms to see which one has better prediction accuracy, can we?
Would someone here please give me some guidance about that?
I need to write some SQL to do a power regression for a trendline. I have 2 columns of data which represent my X, Y data, and all I'm after is the a and the b for the function y = ax^b. Has anyone run into this before? I know SSAS has a linear regression function, but my data really only fits the power model.
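In case it is useful, here is a minimal T-SQL sketch (not a definitive solution) that fits y = a*x^b by taking logs, since ln(y) = ln(a) + b*ln(x) reduces the problem to ordinary least squares. The table dbo.MyData and its X, Y columns are assumed names, and both columns must be strictly positive for LOG() to be valid:

WITH logged AS (
    SELECT LOG(X) AS lx, LOG(Y) AS ly
    FROM dbo.MyData                       -- hypothetical table of (X, Y) pairs
),
stats AS (
    SELECT COUNT(*) AS n,
           AVG(lx) AS mx, AVG(ly) AS my,
           SUM(lx * ly) AS sxy, SUM(lx * lx) AS sxx
    FROM logged
)
SELECT (sxy - n * mx * my) / (sxx - n * mx * mx) AS b,                   -- slope of ln(y) on ln(x)
       EXP(my - ((sxy - n * mx * my) / (sxx - n * mx * mx)) * mx) AS a   -- exp(intercept)
FROM stats;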
[using: Reporting Services 2005, SQL Server 2005, Analysis Services 2005]
Has anyone ever implemented dynamic trendlines with RS charts?
I have a requirement to create a web-based chart based on an existing Excel chart that the client is already using. This chart uses a trendline to forecast performance for 3 months out. I know in Excel it's as easy as right-click->add trendline.
Is there a similarly simple way to do this in Reporting Services? Also, the data source for this is OLAP, so if any of you are MDX gurus, is there some regression function to plot all the parallel axis points?
This is a real challenge, and I hope someone knows how to do this. I have a table, TABLE1, whose columns are the years 2001 through 2006 plus a Slope column:

[2001] [2002] [2003] [2004] [2005] [2006] [Slope]
[1]    [2]    [3]    [4]    [5]    [6]    [1]
[1.2]  [.9]   [4]    [5]    [5.4]  [6.2]  [?]

Slope is defined as "m" in the equation y = mx + b. I need a way of finding the linear equation that best fits the points so I can have SQL calculate the slope. Does anyone know how to do this? Thanks.
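One way to do this in T-SQL, sketched under the assumptions that the year columns are literally named [2001] through [2006] and that each row has some identifier (the RowID column below is hypothetical): unpivot the six values into (x, y) pairs and apply the least-squares slope formula m = SUM((x - avg_x) * (y - avg_y)) / SUM((x - avg_x)^2).

SELECT t.RowID,
       ( SUM(p.x * p.y) - COUNT(*) * AVG(p.x) * AVG(p.y) )
     / ( SUM(p.x * p.x) - COUNT(*) * AVG(p.x) * AVG(p.x) ) AS Slope     -- least-squares m
FROM dbo.TABLE1 AS t
CROSS APPLY ( SELECT 1.0 AS x, t.[2001] AS y      -- x = 1..6: the years are evenly spaced
              UNION ALL SELECT 2.0, t.[2002]
              UNION ALL SELECT 3.0, t.[2003]
              UNION ALL SELECT 4.0, t.[2004]
              UNION ALL SELECT 5.0, t.[2005]
              UNION ALL SELECT 6.0, t.[2006] ) AS p
GROUP BY t.RowID;

For the first sample row (1, 2, 3, 4, 5, 6) this gives a slope of 1, matching the expected value.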
I would like to understand the algorithm that the linear regression method uses to choose the regressors in the model from a list of possible regressors.
I think it is different from the common methods used in statistics, like stepwise, forward, or backward selection.
I have two questions about the regression tree of the Microsoft Decision Trees algorithm.
1. The Mining Legend window has a column named Histogram showing a bar for each coefficient. What does this bar mean?
2. Since each node of a regression tree corresponds to a linear regression, how can I find the regression coefficient of each node? I mean the coefficient that tells how good the regression fit is.
Hi there, we need to determine the prediction formula coefficients using multivariate regression, as is available in the Excel Analysis ToolPak [something like Y = Ax + Bz + C; find A, B, C]. It would be a very "simple" type of analysis that would run on a single table. There does not seem to be an easy built-in SQL function to perform this. However, from reading on the web, it looks like Analysis Services might be used for this task. Is there a good sample for multivariate regression?
Actually, is this a proper approach given the relative simplicity of the calculation? Do we really need to go through the trouble of setting up an Analysis Services solution just for this task?
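For what it's worth, a two-regressor fit Y = A*x + B*z + C can be computed directly in T-SQL from the closed-form normal-equation solution, without setting up an Analysis Services solution. This is only a sketch; dbo.MyTable and its float columns x, z, y are assumed names:

WITH d AS (
    SELECT x, z, y,
           AVG(x) OVER () AS ax,      -- column means
           AVG(z) OVER () AS az,
           AVG(y) OVER () AS ay
    FROM dbo.MyTable
),
s AS (
    SELECT SUM((x - ax) * (x - ax)) AS sxx,
           SUM((z - az) * (z - az)) AS szz,
           SUM((x - ax) * (z - az)) AS sxz,
           SUM((x - ax) * (y - ay)) AS sxy,
           SUM((z - az) * (y - ay)) AS szy,
           MAX(ax) AS ax, MAX(az) AS az, MAX(ay) AS ay
    FROM d
)
SELECT (szz * sxy - sxz * szy) / (sxx * szz - sxz * sxz) AS A,
       (sxx * szy - sxz * sxy) / (sxx * szz - sxz * sxz) AS B,
       ay - ((szz * sxy - sxz * szy) / (sxx * szz - sxz * sxz)) * ax
          - ((sxx * szy - sxz * sxy) / (sxx * szz - sxz * sxz)) * az AS C
FROM s;

Whether this is preferable to an Analysis Services model probably comes down to how often the calculation runs and whether anything beyond the coefficients (diagnostics, predictions) is needed.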
The results we got are a model with an intercept only. If we don't use the nested variable (the red line), we get a correct model. (We had more variables....)
When using linear regression in SQL Server 2005 Business Intelligence Studio, I interpret the information below as follows: X has a standard deviation of +/- 37.046. Is it possible to obtain the standard deviation of each coefficient in the regression expression?
However, now we face a different challenge. Running the same data through the SSAS Linear Regression model and the Excel Regression [Data Analysis] tool we get different answers:
Excel: Intercept -3.57537, x 0.242462, z 0.353668
SSAS: Intercept -2.95188545928199, x 0.201587406861264, z 0.371940525462092
In Excel we set up the Regression analysis using the 95% confidence interval. Is there a concept of a confidence interval for linear regression in SSAS? Since we are doing this for a company that has been using Excel for years, I do not think such a difference in results will be accepted...
Is there anything else we can do to ensure the answers are close? Or must we work around this and call these calculations from Excel?
I am trying to create a model using the Microsoft Linear Regression algorithm, but I want to constrain the coefficients of the parameters to non-negative values. There is a concept of bounds in SAS where we can specify the range of a coefficient. Do any of the SSAS mining algorithms support restricting the coefficient values?
Q1. Model Prediction -- Suppose we already have a trained Microsoft Linear Regression Mining Model, say, target y regressed on two variables:
x1 and x2, where y, x1, x2 are of datatype Float. We try to perform Model Prediction with an Input Table in which some records have NULL x2 values. How are the resulting predicted y values calculated?
My guess:
The resulting linear regression formula is of the form y = constant + coeff1 * (x1 - avg_x1) + coeff2 * (x2 - avg_x2),
where avg_x1 is the average of x1 in the training set, and avg_x2 is the average of x2 in the training set (correct?).
I guess that for a variable that is NULL in the Input Table, Microsoft Linear Regression just treats it as the average of that variable in the training set.
So for x2 being NULL, the whole term coeff2 * (x2 - avg_x2) just disappears, as it is zero if we substitute x2 with its average value.
Is this correct?
Q2. Model Training -- Using the above example where y is regressed on x1 and x2, suppose we have a training set that consists of 100 records in which:
y: no NULL value
x1: no NULL value
x2: 70 records out of 100 records are NULL
Can someone help explain the mathematical procedure or algorithm that produces coeff1 and coeff2?
In particular, how is the information in the "partial records" used in the regression to contribute to coeff1, the constant, etc.?
We are seeing a regression bug with the Microsoft JDBC driver 1.2 CTP.
Using this driver, we don't seem to be able to call stored procedures which return a result set, if those stored procedures use temporary tables internally.
The 1.2 CTP driver fails to access such stored procedures in both SQL Server 2000 and SQL Server 2005 databases. The previous 1.1 driver succeeds in both cases.
Here is a test case which demonstrates the problem (with IP addresses and logins omitted). The prDummy stored procedure being called is quite simple, and I've copied it below:
Code Snippet
public class MicrosoftJDBCDriverCallingStoredProceduresTest extends TestCase {
    // CREATE PROCEDURE [dbo].[prDummy]
    // AS
    //
    // CREATE TABLE #MyTempTable (
    //     someid BIGINT NOT NULL PRIMARY KEY,
    //     userid BIGINT,
    // )
    //
    // SELECT 1 as TEST2, 2 as TEST2
    // GO
    public void testStoredProcedureViaDirectJDBC() {
        Connection conn = null;
        String driverInfo = "<unknown>";
        String dbInfo = "<unknown>";
        try {
            // Set up driver & DB login...
            Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
            String connectionUrl = "jdbc:sqlserver://xxx.xxx.xxx.xxx:1433";
            Properties dbProps = new Properties();
            dbProps.put("databaseName", "xxxxxx");
            dbProps.put("user", "xxxxxx");
            dbProps.put("password", "xxxxxx");

            // Get a connection...
            conn = DriverManager.getConnection(connectionUrl, dbProps);
            driverInfo = conn.getMetaData().getDriverName() + " v" + conn.getMetaData().getDriverVersion();
            dbInfo = conn.getMetaData().getDatabaseProductName() + " v" + conn.getMetaData().getDatabaseProductVersion();

            // Perform the test...
            CallableStatement cs = conn.prepareCall("{CALL prDummy()}");
            cs.executeQuery();

            // If the previous line executes okay, the test is passed...
            System.out.println("Accessing \"" + dbInfo + "\" with driver \"" + driverInfo
                    + "\" calls the stored procedure successfully.");
        } catch (Exception e) {
            // Fail the unit test...
            fail("Accessing \"" + dbInfo + "\" with driver \"" + driverInfo
                    + "\" fails to call the stored procedure: " + e.getMessage());
        } finally {
            // Close the connection...
            try {
                if (conn != null) conn.close();
            } catch (Exception ignore) {
            }
        }
    }
}

The output of this test under both drivers and accessing both databases is as follows:
Code Snippet
Accessing "Microsoft SQL Server v8.00.2039" with driver "Microsoft SQL Server 2005 JDBC Driver v1.1.1501.101" calls the stored procedure successfully.
Accessing "Microsoft SQL Server v9.00.3042" with driver "Microsoft SQL Server 2005 JDBC Driver v1.1.1501.101" calls the stored procedure successfully.
Accessing "Microsoft SQL Server v8.00.2039" with driver "Microsoft SQL Server 2005 JDBC Driver v1.2.2323.101" fails to call the stored procedure: The statement did not return a result set.
Accessing "Microsoft SQL Server v9.00.3042" with driver "Microsoft SQL Server 2005 JDBC Driver v1.2.2323.101" fails to call the stored procedure: The statement did not return a result set.
How do I write a regression test for a stored proc that produces multiple rowsets via multiple SELECT queries? E.g.:
CREATE PROCEDURE myProc AS
SELECT 'Some stuff', GETDATE()
SELECT 'Some more stuff'
For single-select procs, I can create a temp table and INSERT #temp EXEC myProc, then evaluate the contents of the table to verify correct behavior, but that doesn't work in this case.
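One possible approach, sketched rather than definitive: since INSERT ... EXEC can only capture rowsets that all share one column structure, a test-only wrapper (the name myProc_ForTest and the @RowsetNo parameter below are made up for illustration) can expose each SELECT separately so the test can capture them into separate temp tables:

-- Test-only wrapper: each rowset of the original procedure behind its own parameter value.
CREATE PROCEDURE myProc_ForTest @RowsetNo INT
AS
    IF @RowsetNo = 1 SELECT 'Some stuff', GETDATE()
    IF @RowsetNo = 2 SELECT 'Some more stuff'
GO

-- The regression test then captures and checks each rowset on its own.
CREATE TABLE #rs1 (col1 VARCHAR(50), col2 DATETIME)
CREATE TABLE #rs2 (col1 VARCHAR(50))
INSERT #rs1 EXEC myProc_ForTest @RowsetNo = 1
INSERT #rs2 EXEC myProc_ForTest @RowsetNo = 2
-- ...then evaluate the contents of #rs1 and #rs2 exactly as with a single-select proc.

The drawback is keeping the wrapper in sync with the real procedure, so it only makes sense where the production proc cannot be split.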