Multivariate Regression Differences Between Excel And SSAS
Oct 18, 2007
That solved the application problem
However, now we face a different challenge. Running the same data through the SSAS Linear Regression model and the Excel Regression [Data Analysis] tool we get different answers:
Excel:
Intercept -3.57537
x 0.242462
z 0.353668
SSAS:
Intercept -2.95188545928199
x 0.201587406861264
z 0.371940525462092
In Excel we set up the Regression analysis using the 95% confidence interval. Is there a concept of a confidence interval for linear regression in SSAS? Since we are doing this for a company that has been using Excel for years, I do not think such a difference in results will be accepted...
Is there anything else we can do to ensure the answers are close? Or must we work around this and call these calculations from Excel?
Hi there, we need to determine the prediction formula coefficients using multivariate regression, as is available in the Excel Analysis ToolPak [something like Y = Ax + Bz + C, and find A, B, C]. It would be a very "simple" type of analysis that would run on a single table. There does not seem to be an easy built-in SQL function to perform this. However, from reading on the web, it seems Analysis Services might be used for this task? Is there a good sample for a multivariate regression?
Actually, is this a proper approach given the relative simplicity of the calculation? Do we really need to go through the trouble of setting up an Analysis Services solution just for this task?
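For the two-regressor case above, the coefficients can also be computed directly in T-SQL from the normal equations. The following is a minimal sketch, assuming a hypothetical table MyData with float columns x, z, and y (all names are placeholders); it solves Y = Ax + Bz + C in closed form via Cramer's rule:

WITH s AS (
    SELECT
        AVG(x) AS mx, AVG(z) AS mz, AVG(y) AS my,
        SUM(x * x) - SUM(x) * SUM(x) / COUNT(*) AS Sxx,   -- centered sums of squares
        SUM(z * z) - SUM(z) * SUM(z) / COUNT(*) AS Szz,
        SUM(x * z) - SUM(x) * SUM(z) / COUNT(*) AS Sxz,   -- and cross-products
        SUM(x * y) - SUM(x) * SUM(y) / COUNT(*) AS Sxy,
        SUM(z * y) - SUM(z) * SUM(y) / COUNT(*) AS Szy
    FROM MyData                                           -- placeholder table
)
SELECT
    (Szz * Sxy - Sxz * Szy) / (Sxx * Szz - Sxz * Sxz) AS A,   -- coefficient of x
    (Sxx * Szy - Sxz * Sxy) / (Sxx * Szz - Sxz * Sxz) AS B,   -- coefficient of z
    my - (Szz * Sxy - Sxz * Szy) / (Sxx * Szz - Sxz * Sxz) * mx
       - (Sxx * Szy - Sxz * Sxy) / (Sxx * Szz - Sxz * Sxz) * mz AS C   -- intercept
FROM s;

This computes the same ordinary least-squares coefficients that Excel's Regression tool produces for a two-variable model, without standing up an Analysis Services project for a single table.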
This question is regarding the LogRegHelper - "A scorecard for Logistic Regression models" example on the sqlserverdatamining Tips and Tricks page. I launched TestLogReg (the Analysis Services database associated with the project) and ran Logistic Regression over it. While LogRegHelper shows the highest score, 558, for IQ (107 - 121), the Logistic Regression viewer shows that Parent Encouragement has the highest score for the case College Plans = 'Plans to Attend'. Can someone verify this and clarify?
I have a few other questions about LR:
- In the SQL Server 2005 LR Mining Model Viewer "favors" chart, what algorithm is used for generating the Scores?
- Can I use this score as a feature selector? Higher score => stronger predictor (input)?
- Is the coefficient weight algorithm used in LogReg wrong?
When I try to connect from Excel to SSAS, I get an error message like:
A Connection attempt failed because the connected party did not properly respond after a period of time or established connection failed because connected host has failed to respond.
If I connect to SSIS, I'm able to connect correctly.
Why am I getting this error, and how can I overcome it?
I have an SSAS cube that I want to add some actions to. I've had problems adding a reporting action to the cube, so I decided just to add a URL action instead. Start simple and build on the concepts...
So I add a new action, give it a name, set the Target Type to Cells and the Target Object to All Cells. I've put no condition on the action since I want it to appear all the time.
The action content type is set to URL and the action expression is set to [URL]. I've also set a caption of "Google" under the additional properties and said that the caption is MDX (I'm aware that it isn't, but I do intend to expand on this...).
I then build and deploy my cube, open Excel (2010), and create a pivot table off the back of the cube, but when I right-click the cells in the pivot table and go to "additional actions" it tells me that there are none specified.
I am working with SSAS Tabular. I have a standalone table with 60 columns that contains 120K records; the table size is 250 MB. I am trying to build a tabular report from it, and it is taking a long time and throwing an exception (screenshot attached).
It might be a cross-join issue; as a workaround I created a dummy measure and am using it in the report. But that works for 10-20K records and beyond that throws the same exception. I have 8 GB RAM and 100 GB of free disk space.
I'm trying to pull a workbook which has a Power Pivot model into SSAS using a remote tabular instance through Visual Studio 2013. However, I'm getting an error saying, "We cannot import the workbook XXXXX.xlsx. Try placing the workbook on a server that the service account of DBNAME/Tabular has permission to read and that can be reached with a UNC path (//<server>/<shared>/<file>)". Here DBNAME is the server name and Tabular is the instance name.
I tried changing the logon name for SQL Server Analysis Services (Tabular) in the Services.msc Microsoft console. I also tried changing the logon to Local System. Still the error persists.
I have a Tabular data model and I'm returning a measure that counts employees (each row is an employee) and then a calculated column in the model that gets SeniorityInMonths.
So if an employee was hired exactly 1 year ago, they would have 12 in this column.
I want to group these into bins, but the Group option is grayed out.
I'm exploring creating local cube files (.cub) from an Excel sheet with tables. Would SSAS be able to create one cube by taking data from a 1-way table (A, B, C), a 2-way table (AxB, BxC, AxC), and a 3-way table (AxBxC)?
The All level of dimensions doesn't show up in the PivotTable Field List. I have reports where I want to show one member of a dimension compared to the total of the dimension (and not the total of the members shown), but I can't select the All level. Is there any way to do this?
I have connected to a SQL Server Analysis Services database through Excel, and I observed that the value of the date attribute is displayed as ######## in Excel for 1/1/1753.
I am able to see the value 1/1/1753 in the cube browser but not able to see the value in Excel.
How can I replace this value with a blank in Excel?
I'm having a problem with Excel 2007 DM and SQL 2005 and I hope someone out there has a solution.
Consider the following environment:
Windows XP SP2 or Windows Vista, Excel 2007, Data Mining Add-in, SSAS 2005 (with session mining models enabled, an AdventureWorksDW cube deployed and drill-through actions available).
Now take the following steps:
1. In Excel 2007 set up a connection to SSAS
2. Connect to the cube and create a new pivot table report (drag and drop whatever you like)
3. Right-click on one of the cell values in the data region and either select a drill-through action, or, select Show Details in the context menu
4. Ensure that you have at least 10 detailed records that are generated on a new worksheet page; you should have a time-based column in your detailed records
5. Select the table of detailed data, then select the Analyze tab (within the Table Tools grouping) which appears in the topmost menu above the ribbon
6. Click the Forecast button in the ribbon and choose the field which you want to predict, the time-based column (from step 4), and the number of time periods to forecast
7. Finally click OK.
Having followed these steps on both WinXP SP2 and Vista, I keep coming across the exception HResult: 0x800A03EC. Any ideas as to why this exception pops up? If I use a normal table of data (one not generated from a Show Details or drill-through action), then the Forecast button works fine.
I googled it and thought the localization settings for SSAS 2005 and Excel 2007 needed to be the same (initially they weren't). I've tried removing the auto-filters which appear atop each column in the detailed data table prior to clicking the Forecast button, and I've also tested for a series of data across a number of time periods, with the same result.
Also, a colleague of mine discovered that the column headers that appear by default from a drill-through start with "$[", and that after removing them the Forecast function appears to work.
I would have thought there would be a seamless transition in Excel 2007 between data retrieved from a cube and the DM Add-in features (or at the very least, a more meaningful exception message than the one presented).
Is there something I've missed, or, is there a KB article I haven't come across yet? As I know for a fact that the problem is reproducible, is there a fix to this problem on its way to us? Is there a useful workaround that doesn't require manual intervention?
Is there any way to track information about the connections to SSAS cubes made through local Excel files (BI usage)?
Previously, we traced information about BI usage through the BI SharePoint site. Now we want to track the users who are connecting through local Excel files.
I have a very small SSAS database, around 35 MB. I opened it in Excel 32-bit and started dragging fields into a pivot table, and it started failing with memory errors. The behavior on the SSAS server was that memory grew very fast until 8 GB (total VM memory) and then the error was reported in Excel.
What might be the issue with such a small database? I would understand it in a big database, but not in this one.
I need to write some SQL to do a power regression for a trendline. I have 2 columns of data which represent my X, Y data, and all I'm after is the a and the b for the function y = ax^b. Has anyone run into this before? I know SSAS has a linear regression function, but my data really only fits the power model.
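A power fit can be reduced to a linear fit by taking logarithms: y = a * x^b implies LOG(y) = LOG(a) + b * LOG(x). The following is a minimal T-SQL sketch under that assumption; XYData and its x, y columns are placeholder names:

WITH logs AS (
    SELECT LOG(x) AS u, LOG(y) AS v      -- natural logs; requires positive x and y
    FROM XYData                          -- placeholder table with the two data columns
    WHERE x > 0 AND y > 0
),
s AS (
    SELECT COUNT(*) AS n,
           SUM(u) AS su, SUM(v) AS sv,
           SUM(u * u) AS suu, SUM(u * v) AS suv
    FROM logs
)
SELECT
    (n * suv - su * sv) / (n * suu - su * su) AS b,                      -- exponent
    EXP((sv - (n * suv - su * sv) / (n * suu - su * su) * su) / n) AS a  -- multiplier
FROM s;

The slope of the log-log fit is b, and EXP of its intercept is a; this is the same log-transform approach Excel uses for its power trendline.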
[using: Reporting Services 2005, SQL Server 2005, Analysis Services 2005]
Has anyone ever implemented dynamic trendlines with RS charts?
I have a requirement to create a web-based chart based on an existing Excel chart that the client is already using. This chart uses a trendline to forecast performance for 3 months out. I know in Excel it's as easy as right-click->add trendline.
Is there a similarly simple way to do this in Reporting Services? Also, the data source for this is OLAP, so if any of you are MDX gurus, is there some regression function to plot all the parallel axis points?
This is a real challenge. I hope someone is smart enough to know how to do this. I have a table:
TABLE1
2001   2002   2003   2004   2005   2006   Slope
1      2      3      4      5      6      1
1.2    .9     4      5      5.4    6.2    ?
Slope is defined as "m" in the equation y = mx + b. I need a way of finding the linear equation that best fits the points so I can have SQL calculate the slope. Are there any smart people around who would know how to do this? Thanks.
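Because the x values are the six fixed, evenly spaced years, the least-squares slope for each row reduces to a fixed weighted sum of the y values: m = SUM((x - avg_x) * y) / SUM((x - avg_x)^2), with avg_x = 2003.5 and SUM((x - avg_x)^2) = 17.5. A minimal T-SQL sketch, assuming the year columns are literally named [2001] through [2006]:

UPDATE TABLE1
SET [Slope] =
    ( -2.5 * [2001] - 1.5 * [2002] - 0.5 * [2003]
      + 0.5 * [2004] + 1.5 * [2005] + 2.5 * [2006] ) / 17.5;   -- weights are (x - 2003.5)

For the two sample rows this gives 1 and roughly 1.13. If the intercept is also needed, it is b = avg_y - m * 2003.5, where avg_y is the average of the six year values in the row.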
I would like to understand the algorithm that the linear regression method uses to choose the regressors in the model from a list of possible regressors.
I think that it is different from the common methods used in statistics like stepwise, forward or backward.
I have two questions about the regression tree of Microsoft Decision Trees algorithm.
1. The Mining Legend window has a column named Histogram showing a bar for each coefficient. What does this bar mean?
2. Since each node of a regression tree corresponds to a linear regression, how can I find the regression coefficient of each node? I mean the coefficient that tells how good the regression is.
I need to develop a Probit Regression Plug-In Algorithm. Does anyone know if the plug-in framework will reasonably handle a Probit Regression? Is anyone aware of any code or materials, specific to a Probit Regression Plug-in, that would help me to do this? I am also interested in applying the dprobit methodology found in Stata for infinitesimal changes in independent variables. Has anyone been successful using Stata to implement an SSAS plug-in algorithm?
The results we got are a model with an intercept only. If we don't use the nested variable (the red line), we get a correct model. (We had more variables....)
When using linear regression in SQL Server 2005 Business Intelligence Studio, I interpret the information below as follows: X has a standard deviation of +/- 37.046. Is it possible to obtain the standard deviation of each coefficient in the regression expression?
I am trying to create a model using the Microsoft Linear Regression algorithm, but I want to constrain the coefficients of the parameters to non-negative values. There is a concept of bounds in SAS where we can specify the range of a coefficient. Do any of the SSAS mining algorithms support restricting coefficient values?
Q1. Model Prediction -- Suppose we already have a trained Microsoft Linear Regression Mining Model, say, target y regressed on two variables:
x1 and x2, where y, x1, x2 are of datatype Float. We try to perform Model Prediction with an Input Table in which some records consist of NULL x2 values. How are the resulting predicted y values calculated?
My guess:
The resulting linear regression formula is in the form:
y = constant + coeff1 * (x1 - avg_x1) + coeff2 * (x2 - avg_x2)
where avg_x1 is the average of x1 in the training set, and avg_x2 is the average of x2 in the training set (correct?).
I guess that when some variable is NULL in the Input Table, Microsoft Linear Regression just treats it as the average of that variable in the training set.
So for x2 being NULL, the whole term coeff2 * (x2 - avg_x2) just disappears, as it is zero if we substitute x2 with its average value.
Is this correct?
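Purely to illustrate the guess above (this is not confirmed SSAS behaviour), here is a T-SQL sketch of what such a prediction rule would look like; the variables @constant, @coeff1, @coeff2, @avg_x1, @avg_x2 and the table InputTable are hypothetical placeholders:

-- Hypothetical values standing in for a trained model; not taken from SSAS.
DECLARE @constant float, @coeff1 float, @coeff2 float, @avg_x1 float, @avg_x2 float;
SELECT @constant = 0.0, @coeff1 = 0.0, @coeff2 = 0.0, @avg_x1 = 0.0, @avg_x2 = 0.0;

SELECT
    @constant
    + @coeff1 * (COALESCE(x1, @avg_x1) - @avg_x1)   -- a NULL input makes its term zero
    + @coeff2 * (COALESCE(x2, @avg_x2) - @avg_x2) AS predicted_y
FROM InputTable;                                    -- placeholder table with float columns x1, x2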
Q2. Model Training -- Using the above example where y is regressed on x1 and x2, suppose we have a training set consisting of, say, 100 records in which
y: no NULL value
x1: no NULL value
x2: 70 records out of 100 records are NULL
Can someone help explain the mathematical procedure or algorithm that produces coeff1 and coeff2?
In particular, how is the information in the "partial records" used in the regression to contribute to coeff1 and the constant, etc.?
We are seeing a regression bug with the Microsoft JDBC driver 1.2 CTP.
Using this driver, we don't seem to be able to call stored procedures which return a result set, if those stored procedures use temporary tables internally.
The 1.2 CTP driver fails to access such stored procedures in both SQL Server 2000 and SQL Server 2005 databases. The previous 1.1 driver succeeds in both cases.
Here is a test case which demonstrates the problem (with IP addresses and logins omitted). The prDummy stored procedure being called is quite simple, and I've copied it below:
Code Snippet
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

import junit.framework.TestCase;

public class MicrosoftJDBCDriverCallingStoredProceduresTest extends TestCase {
    // CREATE PROCEDURE [dbo].[prDummy]
    // AS
    //
    // CREATE TABLE #MyTempTable (
    //     someid BIGINT NOT NULL PRIMARY KEY,
    //     userid BIGINT,
    // )
    //
    // SELECT 1 as TEST2, 2 as TEST2
    // GO
    public void testStoredProcedureViaDirectJDBC() {
        Connection conn = null;
        String driverInfo = "<unknown>";
        String dbInfo = "<unknown>";
        try {
            // Set up driver & DB login...
            Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
            String connectionUrl = "jdbc:sqlserver://xxx.xxx.xxx.xxx:1433";
            Properties dbProps = new Properties();
            dbProps.put("databaseName", "xxxxxx");
            dbProps.put("user", "xxxxxx");
            dbProps.put("password", "xxxxxx");

            // Get a connection...
            conn = DriverManager.getConnection(connectionUrl, dbProps);
            driverInfo = conn.getMetaData().getDriverName() + " v" + conn.getMetaData().getDriverVersion();
            dbInfo = conn.getMetaData().getDatabaseProductName() + " v" + conn.getMetaData().getDatabaseProductVersion();

            // Perform the test...
            CallableStatement cs = conn.prepareCall("{CALL prDummy()}");
            cs.executeQuery();

            // If the previous line executes okay, the test is passed...
            System.out.println("Accessing \"" + dbInfo + "\" with driver \"" + driverInfo
                    + "\" calls the stored procedure successfully.");
        } catch (Exception e) {
            // Fail the unit test...
            fail("Accessing \"" + dbInfo + "\" with driver \"" + driverInfo
                    + "\" fails to call the stored procedure: " + e.getMessage());
        } finally {
            // Close the connection...
            try {
                if (conn != null) conn.close();
            } catch (Exception ignore) {
            }
        }
    }
}

The output of this test under both drivers and accessing both databases is as follows:
Code Snippet
Accessing "Microsoft SQL Server v8.00.2039" with driver "Microsoft SQL Server 2005 JDBC Driver v1.1.1501.101" calls the stored procedure successfully.
Accessing "Microsoft SQL Server v9.00.3042" with driver "Microsoft SQL Server 2005 JDBC Driver v1.1.1501.101" calls the stored procedure successfully.
Accessing "Microsoft SQL Server v8.00.2039" with driver "Microsoft SQL Server 2005 JDBC Driver v1.2.2323.101" fails to call the stored procedure: The statement did not return a result set.
Accessing "Microsoft SQL Server v9.00.3042" with driver "Microsoft SQL Server 2005 JDBC Driver v1.2.2323.101" fails to call the stored procedure: The statement did not return a result set.
How do I write a regression test for a stored proc that produces multiple rowsets via multiple select queries? E.g.
CREATE PROCEDURE myProc AS
SELECT 'Some stuff', GETDATE()
SELECT 'Some more stuff'
For single-select procs, I can create a temp table and INSERT #temp EXEC myProc, then evaluate the contents of the table to verify correct behavior, but that doesn't work in this case.
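A minimal sketch of the single-result-set technique described above, using a hypothetical procedure mySingleProc that returns one rowset of (txt varchar(100), dt datetime); the procedure name, columns, and assertion are placeholders:

-- Capture the proc's single result set into a temp table, then assert on it.
CREATE TABLE #actual (txt varchar(100), dt datetime);

INSERT #actual (txt, dt)
EXEC mySingleProc;          -- hypothetical single-select procedure

-- Evaluate the captured rows to verify correct behaviour, e.g. the expected row is present.
IF NOT EXISTS (SELECT 1 FROM #actual WHERE txt = 'Some stuff')
    RAISERROR('Regression test failed: expected row missing.', 16, 1);

DROP TABLE #actual;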
With the number of threads it is difficult to know if this has been posted. If I use the Mining Content Viewer for Linear Regression, under Node Distribution, there are values given for Attribute Name, Attribute Value, Support, Probability, Variance, and Value Type. The output is similar to what Joris supplied in his thread about Predict Probability in Decision Trees. My questions:
1. How should these fields be interpreted?
2. With Linear Regression, is it possible to get the coefficient values and tests of significance (t-tests?), if they are not part of the output I have pointed to?
I bought the book "Data Mining with SQL Server 2005", but I can't find the solution to a problem I have.
I want to retrieve from C# the logistic regression Attribute Value (AV) Scores for the Logistic Regression Algorithm. I can see the Scores from the Microsoft Logistic Regression Viewer (the same of Neural Network Viewer), but I cannot retrieve them via DMX, OLEDB or similar.
Otherwise, is there a formula that I can use to compute that score from the coefficient, support, or probability values of the Attribute Value pair (I can read these values from DMX)? I can access them via DMX:
NODE_DISTRIBUTION -> SUPPORT and PROBABILITY ATTRIBUTE_VALUE...
with a query like
SELECT FLATTENED (SELECT ATTRIBUTE_NAME, ATTRIBUTE_VALUE FROM NODE_DISTRIBUTION WHERE VALUETYPE = ... ) FROM [MyModel].CONTENT WHERE NODE_TYPE ....