How Does Linear Regression Choose Its Regressors?
Apr 22, 2007
I would like to understand the algorithm that the linear regression method uses to choose the regressors in the model from a list of possible regressors.
I think it differs from the common methods used in statistics, such as stepwise, forward, or backward selection.
Laura Lerner
Jul 24, 2006
This is a real challenge. I hope someone is smart enough to know how to do this. I have a table, TABLE1:
2001 | 2002 | 2003 | 2004 | 2005 | 2006 | Slope
----------------------------------------------
1    | 2    | 3    | 4    | 5    | 6    | 1
1.2  | .9   | 4    | 5    | 5.4  | 6.2  | ?
Slope is defined as "m" in the equation y = mx + b. I need a way of finding the linear equation that best fits the points, so I can have SQL calculate the slope. Are there any smart people around who would know how to do this? Thanks
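The slope asked for here is the ordinary least-squares slope m = Sxy / Sxx, where Sxx = Σ(x − x̄)² and Sxy = Σ(x − x̄)(y − ȳ). It is algebraically the same as the raw-sums form m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²), which is what a SQL version would compute with SUM() and COUNT() once the six year columns are turned into (x, y) rows (for example with UNPIVOT). The sketch below assumes x is the year; using 1 to 6 instead gives the same slope. It uses the centered form because it loses less precision when x values are large, like years. The function name is hypothetical.

```python
def least_squares_slope(xs, ys):
    """Slope m of the least-squares line y = m*x + b through the points (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy = sum((x - mean_x) * (y - mean_y)); Sxx = sum((x - mean_x) ** 2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    return sxy / sxx


years = [2001, 2002, 2003, 2004, 2005, 2006]
print(least_squares_slope(years, [1, 2, 3, 4, 5, 6]))            # 1.0
print(least_squares_slope(years, [1.2, 0.9, 4, 5, 5.4, 6.2]))    # about 1.1286
```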
Jan 22, 2007
We are trying to create a linear regression model with a nested table. We used the CREATE MINING MODEL syntax as follows:
create mining model rate_plan3002_nested2
( CUST_cycle LONG KEY,
VOICE_CHARGES double CONTINUOUS predict,
DUR_PARTNER_GRP_1 double regressor CONTINUOUS ,
nested_taarif_time_3002 table
( CUST_cycle long CONTINUOUS,
TARIFF_TIME text key,
TARIFF_VOICE_DUR_ALL double regressor CONTINUOUS
)
) using microsoft_linear_regression
INSERT INTO MINING STRUCTURE [rate_plan3002_nested2_Structure]
(CUST_cycle ,
VOICE_CHARGES ,
DUR_PARTNER_GRP_1 ,
[nested_taarif_time_3002](SKIP,TARIFF_TIME ,TARIFF_VOICE_DUR_ALL)
)
SHAPE {
OPENQUERY([Cell],
'SELECT CUST_cycle ,
VOICE_CHARGES ,
DUR_PARTNER_GRP_1
FROM dbo.panel_anality_3002
order by CUST_cycle ')}
APPEND
({OPENQUERY([Cell],
'select CUST_cycle,
TARIFF_TIME,
CYCLE_DATE
from dbo.nested_taarif_time_3002
order by CUST_cycle,TARIFF_TIME')
}
relate CUST_cycle to CUST_cycle
) as nested_taarif_time_3002
The result is a model with an intercept only. If we don't use the nested variable, we get a correct model (we had more variables...).
Is there a way to do this regression correctly?
Thanks,
Dror
Sep 2, 2007
When using linear regression in the SQL Server 2005 Business Intelligence Studio, I interpret the information below as follows: X has a standard deviation of ±37.046. Is it possible to obtain the standard deviation of each coefficient in the regression expression?
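For reference, the textbook ordinary-least-squares standard errors of the coefficients can be computed outside SSAS. This is a minimal sketch for a single predictor, assuming classical OLS; it does not claim that the SSAS viewer uses these formulas. With s² = SSE / (n − 2), SE(slope) = √(s² / Sxx) and SE(intercept) = √(s² (1/n + x̄² / Sxx)); a t statistic is then coefficient / SE. The function name is hypothetical.

```python
import math


def simple_ols_with_standard_errors(xs, ys):
    """Fit y = b0 + b1*x by ordinary least squares and return
    (b0, b1, se_b0, se_b1) using the classical OLS formulas."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual variance s^2 = SSE / (n - 2): two parameters were estimated.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    return b0, b1, se_b0, se_b1


print(simple_ols_with_standard_errors([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```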
Jan 18, 2008
Hi,
I am trying to create a model using the Microsoft Linear Regression algorithm, but I want to constrain the coefficients of the parameters to non-negative values. SAS has the concept of bounds, where we can specify the range of a coefficient. Do any of the SSAS mining algorithms support restricting coefficient values?
Thanks,
DMN
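If SSAS cannot impose the bound, one workaround is to fit the constrained model on the client. The sketch below is a swapped-in technique, not an SSAS feature: non-negative least squares by projected coordinate descent, with an unconstrained intercept. Centering the data removes the intercept from the problem; each coordinate step minimizes the squared error exactly in one coefficient and then clips it at zero. The function name and the `sweeps`/`tol` parameters are hypothetical.

```python
def nonnegative_least_squares(rows, ys, sweeps=1000, tol=1e-12):
    """Fit y = b0 + sum_j b_j * x_j with every b_j >= 0 (b0 unconstrained)
    by projected coordinate descent on the least-squares objective."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    mean_y = sum(ys) / n
    # Centering removes the free intercept from the optimization.
    xc = [[r[j] - means[j] for j in range(p)] for r in rows]
    yc = [y - mean_y for y in ys]
    # Normal-equation pieces: a = Xc^T Xc, c = Xc^T yc.
    a = [[sum(xc[i][j] * xc[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    c = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    b = [0.0] * p
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(p):
            if a[j][j] == 0:
                continue  # constant column: leave its coefficient at 0
            rest = sum(a[j][k] * b[k] for k in range(p) if k != j)
            new = max(0.0, (c[j] - rest) / a[j][j])  # exact 1-D minimizer, clipped at 0
            max_change = max(max_change, abs(new - b[j]))
            b[j] = new
        if max_change < tol:
            break
    b0 = mean_y - sum(bj * mj for bj, mj in zip(b, means))
    return b0, b


# y = 3 + 2*x1 - 1*x2 exactly, so the x2 coefficient is pushed to 0.
print(nonnegative_least_squares([[1, 1], [2, -1], [3, -1], [4, 1]], [4, 8, 10, 10]))
```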
Sep 18, 2006
Q1. Model Prediction -- Suppose we already have a trained Microsoft Linear Regression mining model in which target y is regressed on two variables,
x1 and x2, where y, x1, and x2 are of data type Float. We perform model prediction with an input table in which some records have NULL x2 values. How are the resulting predicted y values calculated?
My guess:
The resulting linear regression formula has the form:
y = constant + coeff1 * (x1 - avg_x1) + coeff2 * (x2 - avg_x2)
where avg_x1 is the average of x1 in the training set, and avg_x2 is the average of x2 in the training set (correct?).
My guess is that when a variable is NULL in the input table, Microsoft Linear Regression simply treats it as the training-set average of that variable.
So when x2 is NULL, the whole term coeff2 * (x2 - avg_x2) disappears, because it is zero when x2 is replaced by its average value.
Is this correct?
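The mean-substitution rule guessed above can be written down directly. This sketch only encodes that guess; it is not confirmed SSAS behavior, and the constant, coefficients, and averages in the usage lines are made-up values. It shows that a missing x2 gives the same prediction as an x2 equal to its training average.

```python
def predict_with_mean_substitution(constant, coeffs, means, xs):
    """Evaluate y = constant + sum(coeff_i * (x_i - mean_i)), replacing a
    missing input (None) by its training mean so that its term is zero."""
    y = constant
    for coeff, mean, x in zip(coeffs, means, xs):
        value = mean if x is None else x
        y += coeff * (value - mean)
    return y


# Hypothetical model: constant = 10, coeff1 = 2, coeff2 = -0.5, avg_x1 = 3, avg_x2 = 4.
print(predict_with_mean_substitution(10.0, [2.0, -0.5], [3.0, 4.0], [5.0, None]))  # 14.0
print(predict_with_mean_substitution(10.0, [2.0, -0.5], [3.0, 4.0], [5.0, 4.0]))   # 14.0
```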
Q2. Model Training -- Using the example above, where y is regressed on x1 and x2, suppose the training set consists of 100 records in which:
y: no NULL values
x1: no NULL values
x2: 70 of the 100 records are NULL
Can someone help explain the mathematical procedure or algorithm that produces coeff1 and coeff2?
In particular, how is the information in the "partial records" used in the regression to contribute to coeff1, the constant, etc.?
Dec 19, 2006
With the number of threads it is difficult to know if this has been posted. If I use the Mining Content Viewer for Linear Regression, under Node Distribution, there are values given for Attribute Name, Attribute Value, Support, Probability, Variance, and Value Type. The output is similar to what Joris supplied in his thread about Predict Probability in Decision Trees. My questions:
1. How should these fields be interpreted?
2. With Linear Regression, is it possible to get the coefficient values and tests of significance (t-tests?), if they are not part of the output I have pointed to?
Thanks for your help with this.
Sam
Jun 24, 2007
Hello,
This question is regarding the LogRegHelper - "A scorecard for Logistic Regression models" example in sqlserverdatamining Tips and Tricks page. I launched TestLogReg (Analysis Services Database associated with the project) and ran Logistic Regression over that. While the LogReg shows the highest score for IQ (107 - 121), a score of 558, the Logistic Regression shows that Parent Encouragement has the highest score for the case College Plans = 'Plans to Attend'. Can someone verify this and clarify?
I have a few other questions about LR:
- In the SQL Server 2005 LR Mining Model Viewer's "favors" chart, what algorithm is used to generate the scores?
- Can I use this score as a feature selector? Higher score => stronger predictor (input)?
- Is the coefficient weight algorithm used in LogReg wrong?
Thanks
MA
Oct 10, 2006
I am trying to build a decision tree model with multiple regressors. I am having trouble changing the properties of the input variables to make them regressors (the only modeling flag choice is "NOT NULL"). I created a mining model with a DMX query, thinking I could change the code to include multiple regressors; however, I get a syntax error when I try to execute it. Also, I am wondering whether you can view the DMX code for a model built with the Data Mining Wizard.
My query is as follows:
CREATE MINING MODEL BVRSlotq1052
(Machine Long Key,
[Banksize] LONG CONTINUOUS REGRESSOR ,
[Cabinet] Text Discrete,
[CUID] Long Continuous PREDICT_ONLY,
[Denom] Text Discrete,
[Gametype] Text Discrete,
[Max Coins] Text Discrete,
[Par] Long Continuous,
[Pos] Text Discrete,
[Progressive] Text Discrete,
[Type]Text Discrete)
Using Microsoft_Decision_Trees
Apr 24, 2007
Is there any way to force a regressor using the time series algorithm?
Jun 7, 2007
My data looks like this:
header | data | key
-------------------
500 | 3.2 | 10
500 | 3.4 | 20
500 | 3.6 | 25
500 | 3.7 | 40
501 | 4.1 | 10
501 | 4.2 | 15
501 | 4.4 | 30
501 | 4.6 | 35
What I want to do is find the median of "data", but keyed off "key": if my desired median is 30, I want to take the two (data, key) records nearest to key = 30 and get the average of their "data" values.
...and do this within each "header" value.
Actually, to be precise, I want linear interpolation: for header = 500, I want to take the (data, key) pairs (3.6, 25) and (3.7, 40) and return the interpolated "data" value of 3.6333 (as done here (http://en.wikipedia.org/wiki/Linear_interpolation)).
So for the example above, the query would produce:
header | interp
-----------------
500 | 3.633
501 | 4.4
Is this possible, or am I crazy?
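The logic the query needs is: within each header, find the nearest key at or below the target and the nearest key at or above it, then interpolate between their data values (or return the data directly on an exact key match). This Python sketch shows only that logic, with a hypothetical function name; a SQL version would need the same two per-header lookups. Headers whose keys do not bracket the target return None here, which is an assumption about the desired behavior.

```python
from collections import defaultdict


def interpolate_by_header(rows, target_key):
    """rows: iterable of (header, data, key). For each header, linearly
    interpolate 'data' at target_key from the nearest keys below and above."""
    groups = defaultdict(list)
    for header, data, key in rows:
        groups[header].append((key, data))
    result = {}
    for header, points in groups.items():
        points.sort()
        below = [pt for pt in points if pt[0] <= target_key]
        above = [pt for pt in points if pt[0] >= target_key]
        if not below or not above:
            result[header] = None  # target_key lies outside this header's keys
            continue
        (k0, d0), (k1, d1) = below[-1], above[0]
        if k0 == k1:
            result[header] = d0  # exact key match
        else:
            result[header] = d0 + (target_key - k0) * (d1 - d0) / (k1 - k0)
    return result


rows = [(500, 3.2, 10), (500, 3.4, 20), (500, 3.6, 25), (500, 3.7, 40),
        (501, 4.1, 10), (501, 4.2, 15), (501, 4.4, 30), (501, 4.6, 35)]
print(interpolate_by_header(rows, 30))  # 500 -> about 3.6333, 501 -> 4.4
```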
Jul 23, 2005
How can I order the results of my query in a custom (non-alphabetical) order? I have a field with these values: Reg S, 144A, US, and I want to order my results by US, 144A, Reg S. I would prefer not to create another field in the table if possible.
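A common way to get a custom order without adding a column is to sort on a computed rank; in SQL that is an expression such as ORDER BY CASE field WHEN 'US' THEN 0 WHEN '144A' THEN 1 WHEN 'Reg S' THEN 2 ELSE 3 END. The same idea as a small Python sketch, with a hypothetical rank table; unlisted values sort last, alphabetically.

```python
# Hypothetical ranking for the values in the question; unlisted values sort last.
CUSTOM_ORDER = {"US": 0, "144A": 1, "Reg S": 2}


def sort_by_custom_order(values):
    # Sort key: (rank, value) so ties and unknown values stay deterministic.
    return sorted(values, key=lambda v: (CUSTOM_ORDER.get(v, len(CUSTOM_ORDER)), v))


print(sort_by_custom_order(["Reg S", "144A", "US", "Reg S"]))  # ['US', '144A', 'Reg S', 'Reg S']
```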
May 26, 2008
Hi all
I want to generate a sequential number, like 1, 2, 3, ... 1000000.
Is there a function like NEWID()? (NEWID() returns a unique GUID; I want an integer.)
I want to use the generated number inside a SQL query.
thanks
IndikaD (Virtusa Cop)
May 30, 2006
I need to write some SQL to do a power regression for a trendline. I have 2 columns of data that represent my X and Y data, and all I'm after is the a and the b for the function y = ax^b. Has anyone run into this before? I know SSAS has a linear regression function, but my data really only fits the power model.
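A standard way to get a and b is to take logarithms: ln y = ln a + b·ln x is linear, so an ordinary least-squares fit of ln y on ln x gives b as the slope and ln a as the intercept. Note that this minimizes squared error in log space, not in the original y units. The sums involved would map to SUM(LOG(x)) and similar aggregates in T-SQL, where LOG() is the natural logarithm. This Python sketch (hypothetical function name) assumes every x and y is positive.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on ln(y) = ln(a) + b*ln(x)."""
    if any(x <= 0 for x in xs) or any(y <= 0 for y in ys):
        raise ValueError("the log-log power fit needs positive x and y values")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_lx = sum(lx) / n
    mean_ly = sum(ly) / n
    sxx = sum((u - mean_lx) ** 2 for u in lx)
    sxy = sum((u - mean_lx) * (v - mean_ly) for u, v in zip(lx, ly))
    b = sxy / sxx                          # slope in log-log space
    a = math.exp(mean_ly - b * mean_lx)    # back-transform the intercept
    return a, b


print(fit_power_law([1, 2, 4, 8], [3, 12, 48, 192]))  # approximately (3.0, 2.0)
```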
Feb 14, 2008
Hi All,
We're currently preparing for a project for a bank client of ours where we would be using SQL Server 2008's data mining capabilities.
Does anyone know if logistic regression supports the following types:
Binomial (standard)
Multinomial (standard)
Conditional
Ordered
Rank-ordered
Nested
Stereotype
Regards,
Joseph
Dec 13, 2007
Hi!
I am trying to do linear regression in multiple dimensions
with SSAS (y = a + a1*x1 + ... + an*xn).
I got the equation, but I also want to see R squared and adjusted R squared, the same way Excel shows them.
How can I achieve that?
Greetings
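Given the actual y values and the model's predictions for the same cases (however they are obtained from SSAS), both statistics follow from their usual definitions: R² = 1 − SS_res / SS_tot and adjusted R² = 1 − (1 − R²)(n − 1) / (n − p − 1), where p is the number of regressors excluding the intercept. A minimal sketch, assuming the model has an intercept; the function name is hypothetical.

```python
def r_squared(ys, predictions, num_predictors):
    """Return (R^2, adjusted R^2) for a model with an intercept and
    num_predictors regressors."""
    n = len(ys)
    mean_y = sum(ys) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - num_predictors - 1)
    return r2, adj


# Fitted line y = 2.2 + 0.6x for x = 1..5 evaluated against the observed ys.
print(r_squared([2, 4, 5, 4, 5], [2.8, 3.4, 4.0, 4.6, 5.2], 1))  # about (0.6, 0.4667)
```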
Oct 3, 2006
How do I write a DMX query to return the coefficients of the independent variables in my regression equation?
Thanks,
Carrie
Feb 8, 2008
I would like to know if it is possible to add a regression line to a scatter chart.
Apr 15, 2008
[using: Reporting Services 2005, SQL Server 2005, Analysis Services 2005]
Has anyone ever implemented dynamic trendlines with RS charts?
I have a requirement to create a web-based chart based on an existing Excel chart that the client is already using. This chart uses a trendline to forecast performance for 3 months out. I know in Excel it's as easy as right-click->add trendline.
Is there a similarly simple way to do this in Reporting Services?
Also, the data source for this is OLAP, so if any of you are MDX gurus, is there some regression function to plot all the parallel axis points?
thanks for any insight.
-michael
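An Excel-style linear trendline forecast is a least-squares line fitted to the observed periods and then extended past the last one; MDX also has LinRegSlope, LinRegIntercept, and LinRegPoint functions that compute a least-squares line over a set. This Python sketch only illustrates the arithmetic, assuming equally spaced periods numbered 1..n; the function name is hypothetical, and it needs at least two observations.

```python
def linear_trend_forecast(values, periods_ahead):
    """Fit a least-squares line to values observed at periods 1..n and
    extend it periods_ahead steps past the last observation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(1, n + 1)
    mean_x = (n + 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * (n + k) for k in range(1, periods_ahead + 1)]


print(linear_trend_forecast([10, 12, 14, 16], 3))  # [18.0, 20.0, 22.0]
```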
Oct 21, 2007
I have two questions about the regression tree of Microsoft Decision Trees algorithm.
1. The mining legend window has a column named Histogram showing a bar for each coefficient. What does this bar mean?
2. Since each node of a regression tree corresponds to a linear regression, how can I find the regression coefficient of each node? I mean the coefficient that tells how good the regression is.
Any tip will be greatly appreciated.
Feb 6, 2008
Hello,
I need to develop a Probit Regression Plug-In Algorithm.
Does anyone know if the plug-in framework will reasonably handle a Probit Regression?
Is anyone aware of any code or materials, specific to a Probit Regression Plug-in, that would help me to do this?
I am also interested in applying the dprobit methodology found in Stata for infinitesimal changes in independent variables.
Has anyone been successful using Stata to implement an SSAS plug-in algorithm?
thank you,
Bill Littlewood
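Independently of the plug-in framework, the scoring math a probit model needs is small: P(y = 1 | x) = Φ(xβ), where Φ is the standard normal CDF, and for a continuous regressor the marginal effect of an infinitesimal change is dP/dx_j = φ(xβ)·β_j, where φ is the normal density. As far as I know, Stata's dprobit reports these effects evaluated at the regressor means. This sketch covers scoring only (estimating β by maximum likelihood is not shown), and the function names are hypothetical.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probit_probability_and_marginal_effects(betas, x):
    """betas[0] is the intercept; x holds one value per remaining coefficient.
    Returns P(y=1 | x) = Phi(x.beta) and dP/dx_j = phi(x.beta) * beta_j."""
    z = betas[0] + sum(b * v for b, v in zip(betas[1:], x))
    density = normal_pdf(z)
    return normal_cdf(z), [density * b for b in betas[1:]]


# To mimic dprobit, pass the sample means of the regressors as x.
print(probit_probability_and_marginal_effects([0.5, 1.0], [0.5]))
```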
Oct 11, 2007
Hi there,
We need to determine the coefficients of a prediction formula using multivariate regression, as available in the Excel Analysis ToolPak (something like Y = Ax + Bz + C, where we find A, B, and C). It would be a very "simple" type of analysis that would run on a single table. There does not seem to be an easy built-in SQL function to perform this. However, from reading on the web, it seems Analysis Services might be used for this task. Is there a good sample for a multivariate regression?
Actually, is this a proper approach given the relative simplicity of the calculation? Do we really need to go through the trouble of setting up an Analysis Service solution just for this task?
Thanks in advance
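For a fit this small, the ordinary-least-squares coefficients can also be computed directly by solving the 3×3 normal equations (XᵀX)β = Xᵀy with design columns [x, z, 1]. This is a client-side sketch under that assumption, not an SSAS feature, and the function name is hypothetical; for badly conditioned data a QR-based solver would be more robust than the normal equations.

```python
def fit_two_predictors(xs, zs, ys):
    """Ordinary least squares for y = A*x + B*z + C via the normal
    equations, solved with Gaussian elimination and partial pivoting."""
    design = [[x, z, 1.0] for x, z in zip(xs, zs)]
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(3)]
    aug = [xtx[i] + [xty[i]] for i in range(3)]  # augmented matrix [X^T X | X^T y]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; the normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= factor * aug[col][c]
    beta = [0.0] * 3
    for i in range(2, -1, -1):  # back substitution
        beta[i] = (aug[i][3] - sum(aug[i][j] * beta[j] for j in range(i + 1, 3))) / aug[i][i]
    return tuple(beta)  # (A, B, C)


# Data generated from y = 2x + 3z + 1, so the fit should recover (2, 3, 1).
print(fit_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 6], [9, 8, 19, 18, 29]))
```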
Nov 24, 2015
I'm using a bullet chart in an SSRS report, and I want to set the Maximum value in the Linear Scale properties to the highest value of the following 4 fields. Is there any way to do this? It would make all the charts line up properly.
NC_LAST_YEAR
NC_LINKED
NC_CURRENT
NC_PLAN
Oct 18, 2007
That solved the application problem.
However, now we face a different challenge. Running the same data through the SSAS Linear Regression model and the Excel Regression [Data Analysis] tool we get different answers:
Coefficient | Excel    | SSAS
-----------------------------------------------
Intercept   | -3.57537 | -2.95188545928199
x           | 0.242462 | 0.201587406861264
z           | 0.353668 | 0.371940525462092
In Excel, we set up the regression analysis using a 95% confidence level. Is there a concept of a confidence interval for linear regression in SSAS? Since we are doing this for a company that has been using Excel for years, I do not think such a difference in results will be accepted...
Is there anything else we can do to ensure the answers are close? Or do we have to work around this and call these calculations from Excel?
Aug 2, 2007
We are seeing a regression bug with the Microsoft JDBC driver 1.2 CTP.
Using this driver, we don't seem to be able to call stored procedures which return a result set, if those stored procedures use temporary tables internally.
The 1.2 CTP driver fails to access such stored procedures in both SQL Server 2000 and SQL Server 2005 databases.
The previous 1.1 driver succeeds in both cases.
Here is a test case which demonstrates the problem (with IP addresses and logins omitted). The prDummy stored procedure being called is quite simple, and I've copied it below:
Code Snippet
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

import junit.framework.TestCase;

public class MicrosoftJDBCDriverCallingStoredProceduresTest extends TestCase {

    // CREATE PROCEDURE [dbo].[prDummy]
    // AS
    //
    // CREATE TABLE #MyTempTable (
    //     someid BIGINT NOT NULL PRIMARY KEY,
    //     userid BIGINT,
    // )
    //
    // SELECT 1 as TEST2, 2 as TEST2
    // GO

    public void testStoredProcedureViaDirectJDBC() {
        Connection conn = null;
        String driverInfo = "<unknown>";
        String dbInfo = "<unknown>";
        try {
            // Set up driver & DB login...
            Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
            String connectionUrl = "jdbc:sqlserver://xxx.xxx.xxx.xxx:1433";
            Properties dbProps = new Properties();
            dbProps.put("databaseName", "xxxxxx");
            dbProps.put("user", "xxxxxx");
            dbProps.put("password", "xxxxxx");
            // Get a connection...
            conn = DriverManager.getConnection(connectionUrl, dbProps);
            driverInfo = conn.getMetaData().getDriverName() + " v" + conn.getMetaData().getDriverVersion();
            dbInfo = conn.getMetaData().getDatabaseProductName() + " v" + conn.getMetaData().getDatabaseProductVersion();
            // Perform the test...
            CallableStatement cs = conn.prepareCall("{CALL prDummy()}");
            cs.executeQuery();
            // If the previous line executes okay, the test is passed...
            System.out.println("Accessing \"" + dbInfo + "\" with driver \"" + driverInfo + "\" calls the stored procedure successfully.");
        }
        catch (Exception e) {
            // Fail the unit test...
            fail("Accessing \"" + dbInfo + "\" with driver \"" + driverInfo + "\" fails to call the stored procedure: " + e.getMessage());
        }
        finally {
            // Close the connection...
            try { if (conn != null) conn.close(); } catch (Exception ignore) { }
        }
    }
}
The output of this test under both drivers and accessing both databases is as follows:
Code Snippet
Accessing "Microsoft SQL Server v8.00.2039" with driver "Microsoft SQL Server 2005 JDBC Driver v1.1.1501.101" calls the stored procedure successfully.
Accessing "Microsoft SQL Server v9.00.3042" with driver "Microsoft SQL Server 2005 JDBC Driver v1.1.1501.101" calls the stored procedure successfully.
Accessing "Microsoft SQL Server v8.00.2039" with driver "Microsoft SQL Server 2005 JDBC Driver v1.2.2323.101" fails to call the stored procedure: The statement did not return a result set.
Accessing "Microsoft SQL Server v9.00.3042" with driver "Microsoft SQL Server 2005 JDBC Driver v1.2.2323.101" fails to call the stored procedure: The statement did not return a result set.
Nov 1, 2006
How do I write a regression test for a stored proc that produces multiple rowsets via multiple SELECT queries? E.g.:
CREATE PROCEDURE myProc AS
SELECT 'Some stuff', GETDATE()
SELECT 'Some more stuff'
For single-select procs, I can create a temp table and INSERT #temp EXEC myProc, then evaluate the contents of the table to verify correct behavior, but that doesn't work in this case.
Oct 24, 2007
I have a production server log shipping to a secondary server every 30 minutes (both SQL Server 2000). The secondary server is used both as a warm standby and for user reporting. The issue: log shipping locks the database, so reporting can't be done until the load is finished, and the load into the secondary databases has taken up to 15 minutes, leaving users only 15 minutes of every 30 to run reports. This is not acceptable. The server also needs to be used for DR.
I am looking for another solution. I can't use transactional replication, because not all of the tables in the databases have a primary key defined. So I am looking for a real-time or near-real-time reporting server that is more available for running reports, plus a warm standby server for disaster recovery. I am trying to figure out what SQL Server 2000 (or even 2005 or 2008) provides, and I am also looking at some third-party software, but I'm not sure what is best for a reasonable price.
Any help is appreciated.
Thanks....JB
Feb 19, 2008
Hi!
I bought the book “Data Mining with SQL Server 2005”, but I can’t find the solution to a problem I have.
I want to retrieve from C# the logistic regression Attribute Value (AV) Scores for the Logistic Regression Algorithm. I can see the Scores from the Microsoft Logistic Regression Viewer (the same of Neural Network Viewer), but I cannot retrieve them via DMX, OLEDB or similar.
Otherwise, is there a formula that I can use to compute that score from the coefficient, support, or probability values of the Attribute Value pair (I can read this values from DMX)?
I can access them via DMX:
NODE_DISTRIBUTION -> SUPPORT and PROBABILITY ATTRIBUTE_VALUE...
with a query like
SELECT FLATTENED (SELECT ATTRIBUTE_NAME, ATTRIBUTE_VALUE FROM NODE_DISTRIBUTION WHERE VALUETYPE = ... ) FROM [MyModel].CONTENT WHERE NODE_TYPE ....
Thanks in advance
Regards,
Marco
Jul 28, 2015
In the 70-461 objectives it says: "Ensure code non-regression by keeping consistent signature for procedure, views and function (interfaces); security implications..." I think I understand what this means in general: they want us to be able to create a view that still returns the original data even if the underlying table is modified. In other words, the view shouldn't break easily. So, for example: write code that does NOT ensure non-regression, then change the code so that it does.
Feb 16, 2005
Hi All,
I have a dilemma:
On one side, I have a column C1 that could be the primary key, because it is never null, its values are unique, and it identifies the record. The problem is that it's a char type and its length can be close to 30.
So I've planned to add another column, C2, of int type as the PK. But then I need to add a unique constraint (index) on C1. Does this improve performance anyway?
Thanks
Jul 6, 2006
Good day, everyone.
If I have a transaction table with the fields below:
transaction_no, product_id, product_desc, product_qty, product_txn, transaction_date
can an expert here point out which is the best clustered index and which are the best nonclustered indexes?
And could you please explain why? I'm not good with databases, so please explain as if to a beginner.
Thank you very much for your guidance.
May 6, 2008
(Hard to put a good subject on this one...)
I have a database containing a lot of users and these users can have four different kind of telephone numbers connected to them: "Direct phone", "Switchboard", "Cell phone", "Home phone". The phone numbers are stored in a separate table. Some users have 0 phone numbers, some have 1, some have 3 etc.
Now I have to transfer the data to another database with a strict table structure and here the table that contains the user also should contain the users phone number and an alternative phone number, if the user currently has more than one phone number connected.
This means that if, for instance, a user has three or more phone numbers, we can transfer at most two of them. This is not a big issue, though...
We have ranked the importance of the phone numbers in the order I presented them above.
What I do in my T-SQL query is use ISNULL() to check whether the user has a "Direct phone"; if not, I check for the next type, and so on.
Now to my problem! Can anyone suggest how to write the code that extracts the alternative phone? I need to check whether there is a "Direct phone" connected to the user; if so, I should NOT choose that one but the next phone number I find.