How Does Linear Regression Handle Missing Values For Prediction And For Training?
Sep 18, 2006
Q1. Model Prediction -- Suppose we already have a trained Microsoft Linear Regression Mining Model, say, target y regressed on two variables:
x1 and x2, where y, x1, x2 are of datatype Float. We try to perform Model Prediction with an Input Table in which some records consist of NULL x2 values. How are the resulting predicted y values calculated?
My guess:
The resulting linear regression formula is in the form:
y = constant + coeff1 * (x1 - avg_x1) + coeff2 * (x2 - avg_x2)
where avg_x1 is the average of x1 in the training set, and avg_x2 is the average of x2 in the training set (Correct?).
I guess that when a variable is NULL in the Input Table, Microsoft Linear Regression just treats it as the average of that variable in the training set.
So for x2 being NULL, the whole term coeff2 * (x2 - avg_x2) just disappears, as it is zero if we substitute x2 with its average value.
Is this correct?
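To make the arithmetic in the guess above concrete, here is a minimal Python sketch. It assumes the mean-centered formula quoted in the question and mean substitution for NULL inputs; both are the poster's hypothesis, not documented behavior of Microsoft Linear Regression, and the function name and coefficient values are made up for illustration.

```python
from typing import Optional


def predict(x1: Optional[float], x2: Optional[float], *, constant: float,
            coeff1: float, coeff2: float, avg_x1: float, avg_x2: float) -> float:
    """Evaluate y = constant + coeff1*(x1 - avg_x1) + coeff2*(x2 - avg_x2).

    A missing input (None) is replaced by its training-set average. This is
    the hypothesis stated in the question, not a confirmed product behavior.
    """
    x1 = avg_x1 if x1 is None else x1
    x2 = avg_x2 if x2 is None else x2
    return constant + coeff1 * (x1 - avg_x1) + coeff2 * (x2 - avg_x2)


# Hypothetical model parameters, chosen only for illustration.
params = dict(constant=10.0, coeff1=2.0, coeff2=-3.0, avg_x1=5.0, avg_x2=1.0)

full = predict(7.0, 4.0, **params)      # both inputs present
partial = predict(7.0, None, **params)  # x2 is NULL, so its term contributes 0
```

Under that assumption, a record with NULL x2 is scored as if only the constant and the x1 term were present.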
Q2. Model Training -- Using the above example in which y is regressed on x1 and x2, suppose we have a training set that, say, consists of 100 records in which
y: no NULL value
x1: no NULL value
x2: 70 records out of 100 records are NULL
Can someone help explain the mathematical procedure or algorithm that produces coeff1 and coeff2?
In particular, how is the information in the "partial records" used in the regression to contribute to coeff1 and the constant, etc.?
View 1 Replies
Jul 24, 2006
This is a real challenge. I hope someone is smart enough to know how to do this.
I have a table, TABLE1, with columns 2001 through 2006 plus a Slope column:
[2001] [2002] [2003] [2004] [2005] [2006] [Slope]
[1] [2] [3] [4] [5] [6] [1]
[1.2] [.9] [4] [5] [5.4] [6.2] [?]
Slope is defined as "m" in the equation y=mx+b.
I need a way of finding the linear equation that best fits the points so I can have SQL calculate the slope.
Are there any smart people around who would know how to do this?
Thanks
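One common reading of "best fits" is the ordinary least-squares line. The Python sketch below computes its slope for the two sample rows, taking the column years as x; the function name is hypothetical, and treating the years as x values is an assumption about what the poster intends.

```python
def least_squares_slope(xs, ys):
    """Slope m of the ordinary least-squares line y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


years = [2001, 2002, 2003, 2004, 2005, 2006]
slope_row1 = least_squares_slope(years, [1, 2, 3, 4, 5, 6])
slope_row2 = least_squares_slope(years, [1.2, 0.9, 4, 5, 5.4, 6.2])
```

The same quantity can be written with plain sums as m = (n*Σxy - Σx*Σy) / (n*Σx² - (Σx)²), which maps onto SQL aggregates once the year columns are unpivoted into (x, y) rows.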
View 3 Replies
View Related
Apr 22, 2007
I would like to understand the algorithm that the linear regression method uses to choose the regressors in the model from a list of possible regressors.
I think that it is different from the common methods used in statistics like stepwise, forward or backward.
Laura Lerner
View 8 Replies
View Related
Jan 22, 2007
We are trying to create a linear regression model with a nested table. We used the CREATE MINING MODEL syntax as follows:
create mining model rate_plan3002_nested2
( CUST_cycle LONG KEY,
VOICE_CHARGES double CONTINUOUS predict,
DUR_PARTNER_GRP_1 double regressor CONTINUOUS ,
nested_taarif_time_3002 table
( CUST_cycle long CONTINUOUS,
TARIFF_TIME text key,
TARIFF_VOICE_DUR_ALL double regressor CONTINUOUS
)
) using microsoft_linear_regression
INSERT INTO MINING STRUCTURE [rate_plan3002_nested2_Structure]
(CUST_cycle ,
VOICE_CHARGES ,
DUR_PARTNER_GRP_1 ,
[nested_taarif_time_3002](SKIP,TARIFF_TIME ,TARIFF_VOICE_DUR_ALL)
)
SHAPE {
OPENQUERY([Cell],
'SELECT CUST_cycle ,
VOICE_CHARGES ,
DUR_PARTNER_GRP_1
FROM dbo.panel_anality_3002
order by CUST_cycle ')}
APPEND
({OPENQUERY([Cell],
'select CUST_cycle,
TARIFF_TIME,
CYCLE_DATE
from dbo.nested_taarif_time_3002
order by CUST_cycle,TARIFF_TIME')
}
relate CUST_cycle to CUST_cycle
) as nested_taarif_time_3002
The result we got is a model with an intercept only. If we don't use the nested variable (the red line), we get a correct model. (We had more variables ...)
Is there a way to do this regression correctly?
Thanks,
Dror
View 7 Replies
View Related
Sep 2, 2007
When using linear regression in the SQL Server 2005 Business Intelligence Studio, I interpret the information below as follows: X has a standard deviation of +- 37.046. Is it possible to obtain the standard deviation of each coefficient in the regression expression?
View 1 Replies
View Related
Jan 18, 2008
Hi,
I am trying to create a model using the Microsoft Linear Regression algorithm, but I want to constrain the coefficients of the parameters to non-negative values. SAS has a concept of bounds, where we can specify the range of a coefficient. Do any of the SSAS mining algorithms support restricting the coefficient values?
Thanks,
DMN
View 3 Replies
View Related
Jul 19, 2007
Hi, all,
I just really wonder what a good way to deal with missing values is. Should we leave the missing values in the training data set, or replace them with other values?
What I am really concerned is that if we simply replace those missing values with other values, then how will it really affect the correctness of the training models?
I am looking forward to hearing from you for the above issue and it will be really great if we have any kind of best practices of dealing with this issue.
Thanks.
With best regards,
Yours sincerely,
View 4 Replies
View Related
Dec 19, 2006
With the number of threads it is difficult to know if this has been posted. If I use the Mining Content Viewer for Linear Regression, under Node Distribution, there are values given for Attribute Name, Attribute Value, Support, Probability, Variance, and Value Type. The output is similar to what Joris supplied in his thread about Predict Probability in Decision Trees. My questions:
1. How should these fields be interpreted?
2. With Linear Regression, is it possible to get the coefficient values and tests of significance (t-tests?), if they are not part of the output I have pointed to?
Thanks for your help with this?
Sam
View 1 Replies
View Related
Dec 3, 2007
Hi
I'm using Service Broker and keep getting errors in the log even though everything is working as expected.
SQL Server 2005
Two databases
Two end points - 1 in each database
Two stored procedures:
SP1 is activated when a message enters the sending queue. It inserts a new row in a table.
SP2 is activated when a response is sent from the receiving queue. It cleans up the sending queue.
I have a table with an update trigger.
In that trigger, if the updated row meets a certain condition, a dialogue is created and a message is sent to the sending queue.
I know that SP1 and SP2 are behaving properly because I get the expected result.
SP1 is inserting the expected data in the table.
SP2 is cleaning up the sending queue.
In the SQL Server log, however, I'm getting errors on both of the stored procs.
error #1
The activated proc <SP 1 Name> running on queue Applications.dbo.ffreceiverQueue output the following: 'The conversation handle is missing. Specify a conversation handle.'
error #2
The activated proc <SP 2 Name> running on queue ADAPT_APP.dbo.ffsenderQueue output the following: 'The conversation handle is missing. Specify a conversation handle.'
I would appreciate anybody's help with why I'm getting this. Have I set up the stored procs incorrectly?
I can provide the code of the stored procs if that helps.
thanks.
View 10 Replies
View Related
Jun 24, 2007
Hello,
This question is regarding the LogRegHelper - "A scorecard for Logistic Regression models" example in sqlserverdatamining Tips and Tricks page. I launched TestLogReg (Analysis Services Database associated with the project) and ran Logistic Regression over that. While the LogReg shows the highest score for IQ (107 - 121), a score of 558, the Logistic Regression shows that Parent Encouragement has the highest score for the case College Plans = 'Plans to Attend'. Can someone verify this and clarify?
I have a few other questions with LR
- In SQL Server 2005 LR Mining Model Viewer "favors" chart, what algorithm is used for generating Scores?
- Can I use this score as a feature selector? Higher score => stronger predictor (input)
- Is the coefficient weight algorithm used in LogReg wrong ?
Thanks
MA
View 1 Replies
View Related
May 15, 2007
Hi all,
I am trying to populate a page with data from a SQL DB; however, one field is null. This causes a StrongTypingException, and I am not sure how I should handle it to stop the application crashing. I am trying to assign a bit value from a SQL DB to a checkbox. I have tried putting
if (Convert.IsDBNull(contentRow.pag_status) == false){//Do what I want}
but this still throws the exception.
Can anyone help!?
View 5 Replies
View Related
Feb 19, 2008
Hi,
In my Excel file, the Application Date column is empty for some rows. In SSIS I am using a Data Conversion on that Application Date column to change it to a date [DT_DATE]. This data conversion gives the error "Conversion failed". In the SQL Server table, I declared the ApplicationDate column's datatype as DateTime.
I want to keep those empty date values as NULL in SQL Server.
I tried the IMEX=1 property, but it is still not working. How can I solve this error?
Thanks in advance.
View 1 Replies
View Related
Feb 22, 2008
Hi
After building a model in BI, I want to view the model's chart in the mining model viewer. In the chart tab I can see just one predicted value, meaning my model predicts for some time slice, and in the prediction steps I can specify how many steps to show in this chart.
In the mining model viewer tab we can see the prediction chart as well as the decision tree. The chart is meant to show all predicted values, and by choosing the prediction steps we can specify whether it shows one predicted value, two, or several. But sometimes I can see just one value in the chart and sometimes several.
Is this difference due to my data or not?
Also, to view historic predictions I should choose "show historic prediction", and before that I should set
two parameters: HISTORIC_MODEL_COUNT and HISTORIC_MODEL_GAP.
But I can't see historic prediction (sometimes this happens).
Please help me.
View 1 Replies
View Related
Aug 17, 2005
If myDateTimeColumn contains a <NULL> value, how do you handle that when reading into a DateTime object in your code?
DateTime myDate = Convert.ToDateTime(dr["myDateTimeColumn"]);
does not work; it throws: System.InvalidCastException: Object cannot be cast from DBNull to other types.
I am curious as to what others are doing to handle this?
View 6 Replies
View Related
Jun 18, 2004
I have a checkbox on my webform that is bound to a bit field in my SQL table. I'm fine as long as I've got the bit field set to 0 or 1, but if the field is NULL, the checkbox throws an exception during the databind.
Is there any way to handle this without removing the data binding and manually setting the value (ie: some way to intercept it before the exception gets thrown and then setting the field value in the dataset)?
Thanks!
View 1 Replies
View Related
Jul 8, 2015
How do I handle a space inside multi-value parameter values in SSRS? For example, if the values are 'KLO LUG', 'HGY KIU', 'LOT JUY', I know I can use the split function for the commas, but it's the space inside each value that is the problem.
View 3 Replies
View Related
Oct 29, 2004
I have a table that keeps track of click statistics for each one of my dealers. I am creating graphs based on the number of clicks that they received in a month, but if they didn't receive any in a certain month then that month is left out. I know I have to do some outer join, but I'm having trouble figuring out exactly how. Here is what I have:
select d.name, right(convert(varchar(25),s.stamp,105),7), isnull(count(1),0)
from tblstats s(nolock)
join tblDealer d(nolock)
on s.dealerid=d.id
where d.id=31
group by right(convert(varchar(25),s.stamp,105),7),d.name
order by 2 desc,3,1
This dealer had no clicks in April, so this is what shows up:
joe blow 10-2004 567
joe blow 09-2004 269
joe blow 08-2004 66
joe blow 07-2004 30
joe blow 06-2004 8
joe blow 05-2004 5
joe blow 03-2004 9
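The outer-join idea mentioned above amounts to starting from a complete list of months and attaching a count to each, with 0 where nothing matches. Here is a minimal Python sketch of that logic using the counts shown in the post; the function names and the (year, month) representation are illustrative assumptions, not part of the original query.

```python
def month_range(first, last):
    """Yield (year, month) tuples from first to last, inclusive."""
    year, month = first
    while (year, month) <= last:
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def fill_missing_months(counts, first, last):
    """Return a list of ((year, month), clicks), using 0 for absent months."""
    return [(ym, counts.get(ym, 0)) for ym in month_range(first, last)]


# Click counts from the post, keyed by (year, month); April 2004 has no row.
clicks = {(2004, 3): 9, (2004, 5): 5, (2004, 6): 8, (2004, 7): 30,
          (2004, 8): 66, (2004, 9): 269, (2004, 10): 567}

filled = fill_missing_months(clicks, (2004, 3), (2004, 10))
```

In SQL the equivalent would be a months table (or generated list) on the left side of a LEFT JOIN to tblstats. Note that the count would then need to be taken over a column from the stats side, since COUNT(1) also counts the row produced for an unmatched month.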
View 1 Replies
View Related
Jan 7, 2007
Hello all and a happy new year!
I used Microsoft clustering for grouping my data. Even though i already cleaned the data and have no null values i get one cluster with missing values in every attribute. (i set CLUSTER_COUNT=3 and i'm using Scalable k-means algorithm)
Does "missing" mean that the algorithm cannot group that particular tuple in another group so it consider it as missing?
Thank you in advance.
View 4 Replies
View Related
Mar 7, 2008
I have two columns, where I have the start and stop numbers (and each of them ordered asc). I would like to get a query that will tell me the missing range.
For example, after the first row, the second row is now 2617 and 3775. However, I would like to know the missing values, i.e. 2297 for start and 2616 for stop and so on as we go down the series. Thanks in advance to any help provided!
Start | Stop
-----------------
2045 | 2296
2617 | 3775
5689 | 36948
37270 | 84237
84409 | 178779
179013 | 179995
180278 | 259121
259292 | 306409
307617 | 366511
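The gap logic can be sketched in a few lines of Python: sort the ranges, then compare each stop with the next start. The sample rows are read here as Start/Stop pairs (the split of the run-together digits is inferred from the 2297/2616 example in the post), and the function name is hypothetical.

```python
def missing_ranges(ranges):
    """Return the (start, stop) gaps between consecutive inclusive ranges.

    Assumes the ranges do not overlap; they are sorted by start here.
    """
    ordered = sorted(ranges)
    gaps = []
    for (_, prev_stop), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_stop + 1:
            gaps.append((prev_stop + 1, next_start - 1))
    return gaps


rows = [(2045, 2296), (2617, 3775), (5689, 36948), (37270, 84237),
        (84409, 178779), (179013, 179995), (180278, 259121),
        (259292, 306409), (307617, 366511)]

gaps = missing_ranges(rows)
```

In a query, the same pairing of each row with its successor is what a self-join on "the smallest Start greater than this Stop" would express.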
View 6 Replies
View Related
Nov 16, 2015
I have table with column having values 1,2,3,5,6,10.
I want to get the missing values in that column between 1 and 10 i.e., min and max... using sql query.
I want to get the values 4,7,8,9.
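The requested result is the difference between the full sequence from min to max and the values actually present. A small Python sketch of that idea (the function name is illustrative):

```python
def missing_values(values):
    """Integers between min(values) and max(values) that are not in values."""
    present = set(values)
    return [n for n in range(min(present), max(present) + 1) if n not in present]


missing = missing_values([1, 2, 3, 5, 6, 10])
```

A SQL version would follow the same shape: generate or store the numbers from the minimum to the maximum, then keep those with no matching row in the table.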
View 8 Replies
View Related
May 16, 2008
I've got a field that might have spurious values in it (say, an admin adds a new row but doesn't have an entry for this field).
I'm trying to swap in the string no_image_EN.jpg if the value in the db does NOT end in .jpg. That way, any value returned is either a valid filename or no_image.
I'm having trouble with the CASE statement, particularly testing just the last few characters of the string:
select product_code,
CASE can_image_en
?? When (can_image_en LIKE '%.jpg') then can_image_en
Else 'no_image_EN.jpg'
End as can_image_en,
none of these do the trick either (some are bad syntax obviously):
? When (can_image_en LIKE '%.jpg') then can_image_en
? When LIKE '.jpg' then can_image_en
? When '%.jpg' then can_image_en
? When right(can_image_en,4) = '%.jpg' then can_image_en
This is the one that has correct syntax, though it seems to return false in ALL cases:
CASE can_image_en
When '%.jpg%' then can_image_en
Else 'no_image_EN.jpg'
View 5 Replies
View Related
May 29, 2012
We are facing the following issue: several machines/users very often execute a command similar to:
INSERT INTO TableName (FieldOne,FieldTwo) VALUES ('ValueOne','ValueTwo');
SELECT SCOPE_IDENTITY() AS Table_ID;
where TableName has a primary key defined as identity(1,1), and that Table_ID is used as a reference in other tables.
These queries are executed by different database users and from several different apps. The problem is that we are detecting lost blocks of Table_IDs: the other tables show the inserted ID as a reference, but the TableName table lacks the record with this ID. In other words, the INSERT seems to work, SCOPE_IDENTITY() returns an inserted ID, and the other tables are populated using this number. However, when we query the TableName table, the record does not exist. We are profiling the server and we're sure that there is no DELETE statement on the TableName table. This seems to happen when there are either deadlocks or blocked processes. Whenever the deadlocks and locks disappear or are resolved, everything works as expected. Why does SCOPE_IDENTITY() return the inserted ID if the INSERT action failed?
View 4 Replies
View Related
Jan 15, 2013
KEY ID GROUP
1 1 a
2 1 b
3 2 a
4 2 b
5 3 a
6 3 b
7 4 a
8 5 a
This is my simple table. I need a query that will identify the IDs that are missing the group "b", but I don't want IDs 1, 2, 3 to come up because they are part of both a and b. I just need to see anything missing "b", but not if it's part of both a and b.
The query's answer should be the rows missing the group b:
KEY ID
7 4
8 5
I tried the NULL search but since the records don't exist it cant find a null. I am writing a query to identify the missing ID without B but exclude ID that are part of A and B
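Because the "b" rows simply do not exist, the test has to be "this ID has no row with group b" and not a NULL comparison. The Python sketch below shows that logic on the sample table; the function and variable names are illustrative only.

```python
from collections import defaultdict

# (KEY, ID, GROUP) rows from the post.
rows = [(1, 1, "a"), (2, 1, "b"), (3, 2, "a"), (4, 2, "b"),
        (5, 3, "a"), (6, 3, "b"), (7, 4, "a"), (8, 5, "a")]


def rows_missing_group(rows, required="b"):
    """Return (key, id) pairs for IDs that never appear with the required group."""
    groups_by_id = defaultdict(set)
    for _key, id_, group in rows:
        groups_by_id[id_].add(group)
    return [(key, id_) for key, id_, _group in rows
            if required not in groups_by_id[id_]]


result = rows_missing_group(rows)
```

In SQL this is the shape of an anti-join, for example a NOT EXISTS subquery that looks for a row with the same ID and group 'b'.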
View 3 Replies
View Related
Dec 13, 2007
I can't figure out how to get my line chart to break when there isn't a value. For example, I have a trend line over 4 time periods. The 3rd time period is missing a value. Instead of the line ending at the 2nd period and picking up again at the 4th time period, it's connecting the line 2nd to the 4th period. I'd like it to break and for there to be no line appearing in the 3rd period. I bet that's as clear as mud, but let me know if you have any questions.
Thanks!
sash
View 1 Replies
View Related
Apr 27, 2015
I need to write the query that produces the results below. I'm not able to join the two sets in a way that displays NULLs if no purchase was made on a given day for a particular product. I need NULLs or 0s so that it shows up correctly on my SSRS report.
-- declare @from DATE='2015-1-5',@to DATE='2015-1-10'
-- test data
;with testdata as(
SELECT 1 AS Id,'1/6/2014' AS Date, 21 As Amount UNION ALL
SELECT 1 ,'1/8/2014', 25 UNION ALL
SELECT 1 ,'1/9/2014', 30 UNION ALL
SELECT 1 ,'1/10/2014', 60 UNION ALL
SELECT 1 ,'1/5/2015', 3800 UNION ALL
SELECT 1 ,'1/6/2015', 7120 UNION ALL
[code]....
View 2 Replies
View Related
Feb 8, 2007
Ok, so I must have screwed something up.
I have several databases set up for transactional replication to another instance of SQL Server 2005 for fail over purposes. Today, I restored one of those replicated databases to my development machine and discovered two surprising problems:
1) The Default Values settings in the replicated tables are missing. They are there in the publishing tables, just as they were before I set up replication. However, they are not in the subscribing tables. Now, this is not such a big issue, since I tend to send all default values in insert queries as necessary.
2) The second problem is more of an issue, since I use auto-numbered Identity columns in my tables (yes, I know that's just plain lazy...). Anyway, in the replicated tables, "Is Identity" is indeed set to yes, but despite the fact that there are thousands of records with incrementally unique IDs, SQL Server is trying to insert a record starting with 1. This, of course, throws a PK constraint error.
Obviously, if I am to use them for failover purposes, these replicated databases need to be identical in every way.
So, what did I do to cause this situation, and how do I fix it?
Thanks a bunch!
md
View 9 Replies
View Related
Jan 7, 2008
Hello,
I have the following query which grabs monthly usage data which is logged to a database table from a web page:
SELECT Button = CASE ButtonClicked
WHEN 1 THEN '1st Button'
WHEN 2 THEN '2nd Button'
WHEN 3 THEN '3rd Button'
WHEN 4 THEN '4th Button'
WHEN 5 THEN '5th Button'
WHEN 6 THEN '6th Button'
WHEN 7 THEN '7th Button'
WHEN 8 THEN '8th Button'
WHEN 9 THEN '9th Button'
ELSE 'TOTAL'
END,
COUNT(*) AS [Times Clicked]
FROM WebPageUsageLog (NOLOCK)
WHERE DateClicked BETWEEN @firstOfMonth AND @lastOfMonth
GROUP BY ButtonClicked WITH ROLLUP
ORDER BY ButtonClicked
The results look like this:
TOTAL 303
1st Button 53
2nd Button 177
3rd Button 10
4th Button 4
6th Button 18
7th Button 19
8th Button 21
9th Button 1
If a button is never clicked in a given month, it never gets logged to the table. In this example, the 5th button was not clicked during the month of December, so it does not appear in the results. I want to modify my query so it displays the name of the button and a zero (in this case "5th Button 0") in the results for any buttons that were not clicked. For some reason I am drawing a blank on how to do this. Thanks in advance.
-Dave
View 3 Replies
View Related
Jun 2, 2015
I'm trying to swap out old partitions and am getting "An error occurred while processing 'AltFile' metadata for database id 12 file id 605". File id 605 is missing from sys.sysfiles. I've tried adding new filegroups, since it seemed to be assigning them in that range, to allow the command to find a match. Once one is created and I issue the alter command, the file id of the target file changes to something else in the missing range.
The file id values seem to be managed solely by SQL Server, so I'm not sure what to try. There are hundreds of files with millions of rows, and the method has been used problem-free for years. Once in a while I do get "unable to remove file because it's not empty", which may be related. I wind up having to shrink those files and leave them in existence.
The target file group has an existing file id value when you join sys.sysfiles using the filegroup name.
I partition data in 2 tables on one filegroup per day. I swap out partition 1 each day, which makes the new earliest day's partition the new partition 1. Different databases have different day ranges depending on requirements.
View 0 Replies
View Related
Jun 7, 2007
my data is like this:
header | data | key
-------------------
500 | 3.2 | 10
500 | 3.4 | 20
500 | 3.6 | 25
500 | 3.7 | 40
501 | 4.1 | 10
501 | 4.2 | 15
501 | 4.4 | 30
501 | 4.6 | 35
and what I want to do is find the median of "data", but keyed off of "key", so if my desired median is 30, I want to take the two records (data, key) nearest to key = 30, and get the average of "data".
...and do this within each "header" value.
actually, to be precise, I want the linear interpolation, so for header = 500, I want to get the (data, key) pairs of (3.6, 25) and (3.7, 40) and return the interpolated "data" value of 3.6333 (as done here (http://en.wikipedia.org/wiki/Linear_interpolation))
so for the above example the query would produce:
header | interp
-----------------
500 | 3.633
501 | 4.4
possible, or am I crazy?
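The calculation described above can be sketched in Python: within each header, find the two (key, data) points that bracket the target key and interpolate linearly between them. The function name is hypothetical, and returning None for targets outside the key range is a choice made for this sketch, not something the post specifies.

```python
from collections import defaultdict


def interpolate_at(points, target):
    """Linearly interpolate data at key == target from (key, data) points.

    Returns None when target lies outside the range of keys.
    """
    pts = sorted(points)
    for (k0, d0), (k1, d1) in zip(pts, pts[1:]):
        if k0 <= target <= k1:
            if k1 == k0:
                return d0
            return d0 + (d1 - d0) * (target - k0) / (k1 - k0)
    if len(pts) == 1 and pts[0][0] == target:
        return pts[0][1]
    return None


# (header, data, key) rows from the post.
rows = [(500, 3.2, 10), (500, 3.4, 20), (500, 3.6, 25), (500, 3.7, 40),
        (501, 4.1, 10), (501, 4.2, 15), (501, 4.4, 30), (501, 4.6, 35)]

by_header = defaultdict(list)
for header, data, key in rows:
    by_header[header].append((key, data))

interp = {h: interpolate_at(pts, 30) for h, pts in by_header.items()}
```

For header 500 the bracketing points are (25, 3.6) and (40, 3.7), giving 3.6 + 0.1 * 5/15; for header 501 the key 30 is present, so its data value is returned.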
View 4 Replies
View Related
Jul 23, 2005
How can I order the results of my query in a non-linear fashion? I have a field with these values: Reg S, 144A, US, and I want to order my results by US, 144A, Reg S. I would prefer not to create another field in the table if possible.
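One way to get a custom order without adding a column is to map each value to a rank at sort time. A minimal Python sketch of the idea follows; the mapping name and the sample rows are illustrative.

```python
ORDER = {"US": 0, "144A": 1, "Reg S": 2}  # desired, non-alphabetical order

rows = ["Reg S", "144A", "US", "144A", "US"]

# Unknown values sort last because they get a rank past the known ones.
ordered = sorted(rows, key=lambda v: ORDER.get(v, len(ORDER)))
```

In SQL the same mapping is usually written as a CASE expression inside ORDER BY, which likewise needs no extra field in the table.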
View 4 Replies
View Related
May 26, 2008
Hi all
I want to generate a linear sequence of numbers like 1, 2, 3, ..., 1000000.
Is there any function like NEWID() (which returns a unique GUID; I want to get an integer)?
I want to use this generated number inside the SQL query.
thanks
IndikaD (Virtusa Cop)
View 12 Replies
View Related
May 30, 2006
I need to write some SQL to do a power regression for a trendline. I have 2 columns of data which represent my X, Y data, and all I'm after is the a and the b for the function y=ax^b. Has anyone run into this before? I know SSAS has a linear regression function, but my data really only fits the power model.
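A power model can be fitted with ordinary linear regression after taking logarithms, since y = a*x^b implies ln y = ln a + b*ln x. The Python sketch below does that; the function name is hypothetical and the data is synthetic, generated only to check the arithmetic.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (ln x, ln y). Requires x, y > 0."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    if any(v <= 0 for v in list(xs) + list(ys)):
        raise ValueError("power fit via logarithms needs positive x and y")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("all x values are identical; b is undefined")
    b = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly)) / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data generated from y = 2 * x**1.5, used only as a sanity check.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x ** 1.5 for x in xs]
a, b = fit_power_law(xs, ys)
```

This minimizes squared error in log space, which is not the same as minimizing it on the original scale. The sums involved are plain aggregates over LOG(x) and LOG(y), so the same computation can be expressed in SQL.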
View 4 Replies
View Related