Mining Content Viewer For Linear Regression: Node Distribution Output
Dec 19, 2006
With the number of threads it is difficult to know if this has been posted. If I use the Mining Content Viewer for Linear Regression, under Node Distribution, there are values given for Attribute Name, Attribute Value, Support, Probability, Variance, and Value Type. The output is similar to what Joris supplied in his thread about Predict Probability in Decision Trees. My questions:
1. How should these fields be interpreted?
2. With Linear Regression, is it possible to get the coefficient values and tests of significance (t-tests?), if they are not part of the output I have pointed to?
This is a real challenge. I hope someone is smart enough to know how to do this. I have a table, TABLE1, with seven columns (Column 1 - 2001, Column 2 - 2002, Column 3 - 2003, Column 4 - 2004, Column 5 - 2005, Column 6 - 2006, Column 7 - Slope):

[2001]  [2002]  [2003]  [2004]  [2005]  [2006]  [Slope]
[1]     [2]     [3]     [4]     [5]     [6]     [1]
[1.2]   [.9]    [4]     [5]     [5.4]   [6.2]   [?]

Slope is defined as "m" in the equation y = mx + b. I need a way of finding the linear equation that best fits the points so I can have SQL calculate the slope. Are there any smart people around who would know how to do this? Thanks.
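For reference, the best-fit slope asked about here is the ordinary least-squares slope, m = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - Sum(x)^2). The sketch below is only an illustration of that arithmetic in Python, using the years as x and one table row as y; the function name is hypothetical. The same five sums could be computed with SUM and COUNT aggregates in SQL once the yearly columns are unpivoted into (x, y) rows.

```python
def least_squares_slope(xs, ys):
    """Return the ordinary least-squares slope m of y = m*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denominator = n * sum_xx - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; slope is undefined")
    return (n * sum_xy - sum_x * sum_y) / denominator


years = [2001, 2002, 2003, 2004, 2005, 2006]
row_1 = [1, 2, 3, 4, 5, 6]
row_2 = [1.2, 0.9, 4, 5, 5.4, 6.2]

slope_1 = least_squares_slope(years, row_1)  # 1.0, matching the table
slope_2 = least_squares_slope(years, row_2)  # about 1.1286
```

The first row reproduces the slope of 1 already shown in the table, which is a quick sanity check on the formula.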
I would like to understand the algorithm that the linear regression method uses to choose the regressors in the model from a list of possible regressors.
I think that it is different from the common methods used in statistics like stepwise, forward or backward.
The result we got is a model with an intercept only. If we don't use the nested variable (the red line), we get a correct model. (We had more variables ....)
When using linear regression in the SQL Server 2005 Business Intelligence Studio, I interpret the information below as follows: X has a standard deviation of +- 37.046. Is it possible to obtain the standard deviation of each coefficient in the regression expression?
I am trying to create a model using the Microsoft Linear Regression algorithm, but I want to constrain the coefficients of the parameters to non-negative values. SAS has the concept of bounds, where we can specify the range of a coefficient. Do any of the SSAS mining algorithms support restricting the coefficient values?
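As background for the question about coefficient standard deviations, the textbook calculation for a one-regressor model is sketched below: the standard error of the slope is sqrt((SSE / (n - 2)) / Sxx), and the t-statistic is the slope divided by that standard error. This is a generic ordinary least-squares sketch computed from raw data, not a description of what the SSAS viewer reports; the function name and the sample data are made up for illustration.

```python
import math


def slope_with_standard_error(xs, ys):
    """Fit y = a + b*x by least squares; return (b, std_error_of_b, t_stat)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    residual_variance = sse / (n - 2)  # two estimated parameters
    std_error = math.sqrt(residual_variance / sxx)
    t_stat = slope / std_error if std_error > 0 else math.inf
    return slope, std_error, t_stat


# Hypothetical sample data, for illustration only.
sample_x = [1, 2, 3, 4, 5]
sample_y = [2.1, 3.9, 6.2, 7.8, 10.1]
b, se_b, t_b = slope_with_standard_error(sample_x, sample_y)
```

A large absolute t-statistic relative to the t distribution with n - 2 degrees of freedom indicates that the slope differs significantly from zero.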
Q1. Model Prediction -- Suppose we already have a trained Microsoft Linear Regression Mining Model, say, target y regressed on two variables:
x1 and x2, where y, x1, x2 are of datatype Float. We try to perform Model Prediction with an Input Table in which some records consist of NULL x2 values. How are the resulting predicted y values calculated?
My guess:
The resulting linear regression formula is in the form:
y = constant + coeff1 * (x1 - avg_x1) + coeff2 * (x2 - avg_x2)
where avg_x1 is the average of x1 in the training set, and avg_x2 is the average of x2 in the training set (Correct?).
I guess that when some variable is NULL in the Input Table, Microsoft Linear Regression just treats it as the average of that variable in the training set.
So for x2 being NULL, the whole term coeff2 * (x2 - avg_x2) just disappears, as it is zero if we substitute x2 with its average value.
Is this correct?
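To make the guess concrete: with a formula of the form y = constant + coeff1 * (x1 - avg_x1) + coeff2 * (x2 - avg_x2), replacing a NULL input by its training-set average makes that input's term zero. The sketch below only illustrates this hypothesis (mean substitution for NULL inputs); it is not confirmed here as what Microsoft Linear Regression actually does. The function name, parameter names, and numbers are hypothetical.

```python
def predict_with_mean_substitution(x1, x2, *, constant, coeff1, coeff2,
                                   avg_x1, avg_x2):
    """Evaluate the centered regression formula, treating None (NULL)
    inputs as the training-set average of that variable (an assumption)."""
    x1 = avg_x1 if x1 is None else x1
    x2 = avg_x2 if x2 is None else x2
    return constant + coeff1 * (x1 - avg_x1) + coeff2 * (x2 - avg_x2)


# Hypothetical model parameters, for illustration only.
params = dict(constant=10.0, coeff1=2.0, coeff2=5.0, avg_x1=1.0, avg_x2=4.0)

full_case = predict_with_mean_substitution(3.0, 6.0, **params)   # 24.0
null_x2 = predict_with_mean_substitution(3.0, None, **params)    # 14.0
```

Under this hypothesis a record with NULL x2 is scored exactly as if only the constant and the x1 term existed.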
Q2. Model Training -- Using the above example where y is regressed on x1 and x2, suppose we have a training set that consists of, say, 100 records in which
y: no NULL value
x1: no NULL value
x2: 70 records out of 100 records are NULL
Can someone help explain the mathematical procedure or algorithm that produces coeff1 and coeff2?
In particular, how is the information in the "partial records" used in the regression to contribute to coeff1, the constant, etc.?
I bought the book "Data Mining with SQL Server 2005", but I can't find the solution to a problem I have.
I want to retrieve from C# the logistic regression Attribute Value (AV) Scores for the Logistic Regression Algorithm. I can see the Scores from the Microsoft Logistic Regression Viewer (the same of Neural Network Viewer), but I cannot retrieve them via DMX, OLEDB or similar.
Otherwise, is there a formula that I can use to compute that score from the coefficient, support, or probability values of the Attribute Value pair? I can read these values via DMX:
NODE_DISTRIBUTION -> SUPPORT and PROBABILITY ATTRIBUTE_VALUE...
with a query like
SELECT FLATTENED (SELECT ATTRIBUTE_NAME, ATTRIBUTE_VALUE FROM NODE_DISTRIBUTION WHERE VALUETYPE = ... ) FROM [MyModel].CONTENT WHERE NODE_TYPE ....
I'm a beginner with SQL 2012 SSDT & SSMS. I get this error message when I try to deploy my project:
"Error 6 Error (Data mining): KEY SEQUENCE columns are not supported at the case level. The 'Customer Key' column of the 'TK448 Ch09 Cube Clustering' mining structure contains content that is not valid. 0 0 " I am finding it hard to locate the content that is not valid. I've been trying to find an answer to this problem but can't seem to find anything. How can I locate the content that is not valid and change or delete it so that I can deploy this solution?
I get the following error when I try to load the mining model in the mining model viewer
Query (1, 6) The '[System].[Microsoft].[AnalysisServices].[System].[DataMining].[NeuralNet].[GetAttributeValues]' function does not exist.
I get a similar error when I try to load the Load Mining Accuracy Chart
Failed to execute the query due to the following error:
Query (1, 6) The '[System].[Microsoft].[AnalysisServices].[System].[DataMining].[AllOther].[GenerateLiftTableUsingDatasource]' function does not exist.
Why did I get different results for the same attribute value displayed in my mining model? Any suggestions on what I may have missed?
In my case, the case table of the mining structure is the fact table, and within this mining structure I also dragged in other attributes from its related dimension tables. E.g. the schema of the mining structure is as below:
Then in my trained model (using the Microsoft Clustering algorithm), the content of the model is very strange: there are different results for the same value of the attribute 'Agent Level'. Why did that happen and how can I figure it out? Shouldn't there be only one result for each value of each attribute within one mining model?
Please shed some light on this issue. I am looking forward to hearing from you, and thanks a lot in advance for your advice.
In order to setup my forecasting mining model I have created a special view that runs against my fact table and creates time series on the level I need.
Select DFUKEY, DATE, QTY from Dim_FACT where DFKEY like '020%'
So I get the following input for my model:
time series key (e.g. DFUKEY)
date (time key)
QTY (to be predicted)
For testing purposes I created a small view (similar to AdventureWorks) that only contained a few time series. The model was created and processed in ~2 minutes or less. The viewer came up almost immediately and I was able to see results.
Now my real view has about 25000 time series I need a forecast for and that I also like to review in the viewer. If I create a mining model against that bigger view the processing takes ~15m or so and the viewer is likely to time out.
The worst part, though, is that when I try to get the forecast for a time series (see query below), it takes minutes before the answers come back.
Small problem here. I have successfully installed the Data Mining Web Controls on my PC. I can use them to display the results I want; the problem is that the expand and collapse images do not show up in the column. I can still click the blank column to expand or collapse the tree node, and it functions completely. How can I display the expand and collapse images?
I'm doing a custom clustering plugin for text to pre-process ("clean" the texts), calculate weights, estimate the number of clusters (using the PBM index) and finally, do the actual clustering.
So... I've made each of these modules in C++ and I'm putting them all together in the plugin.
My database (MDB file) has only one table, with only two fields: a key (auto-incremental) and a small text. What I intend to do is to get the text in each test case, store them together somewhere and call my classes to cluster these texts.
I'm trying to log the texts in a file (just a test) in the ProcessCase method, in the CaseProcessor class. I've done it with no problems with numerical data.
But when I load the MDB file in the Mining Structures Wizard, it says the content type of the field holding the texts is "Continuous" and the data type is "Text". Actually, when I saw it I didn't really mind.
But when I run the mining model it gives me the following error: "Error 1 Error (Data mining): The data type of the Table1.Texto mining structure column must be numeric since it has a continuous content type (Content is set to Continuous or Key Time or Key Sequence). 0 0 "
So... How do I change this content type? (The content type combobox in the Mining Structures Wizard couldn't be changed.)
I am going through the data mining web control viewer tutorial and its going great. I have been able to build and setup the viewer. The problem I am having is when I publish the site out to my web server, it gives me the following error:
Error: Either the user, <domain><computerName>$, does not have access to the prospectDataMining database, or the database does not exist.
When I debug this on my local machine via Visual Studio 2005, it works GREAT! It is just when I publish the site to the web server.
I have a dedicated SSAS server along w/ a dedicated web server. To test, I published the site to the SSAS server to see if it was a connection issue. I received the same error w/ a different <domain><computerName>$.
I looked at trying to put in the optional connection info for the dataMining html tree viewer properties... but apparently don't know how to do that properly. I also checked the IIS directory security and enabled Integrated security.
What am I doing wrong? ANY help is much appreciated.
I tried to find the graphs I saved from the Data Mining Model Viewer, but where are they saved? In the mining model viewer we can save a graph by clicking the 'save graph' button, but where does the graph go?
Really need help for that.
Thank you very much in advance for any guidance and help for that.
I've successfully created and processed a very simple neural network mining model (defined against a cube). However, when I go to the model viewer in BI studio, it displays the following error:
"Execution of the managed stored procedure GetAttributeScores failed with the following error: Exception has been thrown by the target of an invocation.Input string was not in a correct format.."
Any ideas about what's going wrong? This is with SQL Server 2005 SP1.
We've successfully processed a large decision tree model in SQL Server 2005. When I try to view the tree in the mining model viewer, I get the following error:
TITLE: Microsoft Visual Studio ------------------------------
The tree graph cannot be created because of the following error:
'Exception of type 'System.OutOfMemoryException' was thrown.'.
For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%u00ae+Visual+Studio%u00ae+2005&ProdVer=8.0.50727.42&EvtSrc=Microsoft.AnalysisServices.Viewers.SR&EvtID=ErrorCreateGraphFailed&LinkId=20476
The link provides no other documentation on the error.
We're using 64-bit SQL on a Dell Workstation running XP-64 with 16GB of memory. From my view of things we aren't close to running out of memory. Since the model processed and the error occurs when viewing the model, is this a problem with Visual Studio and not necessarily Analysis Services?
I have a framework 2.0 winforms application that uses the data mining viewer controls. I upgraded the project to visual studio 2008 and compiled it under framework 2.0. It compiles fine, but when the form with the TimeSeriesViewer control loads, the application throws the following exception:
System.Reflection.TargetInvocationException was unhandled by user code
  Message="Unable to get the window handle for the 'AxChartSpace' control. Windowless ActiveX controls are not supported."
  Source="System.Windows.Forms"
  StackTrace:
    at System.Windows.Forms.AxHost.InPlaceActivate()
    at System.Windows.Forms.AxHost.TransitionUpTo(Int32 state)
    at System.Windows.Forms.AxHost.CreateHandle()
    at System.Windows.Forms.Control.CreateControl(Boolean fIgnoreVisible)
    at System.Windows.Forms.Control.CreateControl(Boolean fIgnoreVisible)
    at System.Windows.Forms.AxHost.EndInit()
    at Microsoft.AnalysisServices.Viewers.TimeSeriesViewer.InitializeComponent()
    at Microsoft.AnalysisServices.Viewers.TimeSeriesViewer..ctor()
    at RMS2.UI.DecisionSupport.ShowModel(String modelName, Int32 tabIndex) in C:UsersDougDocumentsRmsIIRmsIIRMS2.UIDecisionSupport.cs:line 72
    at RMS2.UI.DecisionSupport.DecisionSupport_Load(Object sender, EventArgs e) in C:UsersDougDocumentsRmsIIRmsIIRMS2.UIDecisionSupport.cs:line 42
    at System.Windows.Forms.Form.OnLoad(EventArgs e)
    at System.Windows.Forms.Form.OnCreateControl()
    at System.Windows.Forms.Control.CreateControl(Boolean fIgnoreVisible)
    at System.Windows.Forms.Control.CreateControl()
    at System.Windows.Forms.Control.WmShowWindow(Message& m)
    at System.Windows.Forms.Control.WndProc(Message& m)
    at System.Windows.Forms.ScrollableControl.WndProc(Message& m)
    at System.Windows.Forms.ContainerControl.WndProc(Message& m)
    at System.Windows.Forms.Form.WmShowWindow(Message& m)
    at System.Windows.Forms.Form.WndProc(Message& m)
    at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
    at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
    at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
  InnerException: System.AccessViolationException
    Message="Attempted to read or write protected memory. This is often an indication that other memory is corrupt."
    Source="System.Windows.Forms"
    StackTrace:
      at System.Windows.Forms.UnsafeNativeMethods.IOleObject.DoVerb(Int32 iVerb, IntPtr lpmsg, IOleClientSite pActiveSite, Int32 lindex, IntPtr hwndParent, COMRECT lprcPosRect)
      at System.Windows.Forms.AxHost.DoVerb(Int32 verb)
      at System.Windows.Forms.AxHost.InPlaceActivate()
    InnerException:
The control is being added programmatically, as in the winforms samples. Can anyone suggest a workaround? Thank you in advance.
Well... As I said in other topics, I'm doing a clustering plugin for text mining. I'm facing many problems and, with your help, solving them one by one.
First of all, thanks a lot again.
Well... I've made a clustering function that is actually working very well. But I'm exporting its results to a log file I use as an algorithm trace for debugging.
My clustering method returns a vector indicating which cluster each register belongs to. For instance:
vector[0] = 1 -> The register of index 0 belongs to cluster 1.
vector[1] = 9 -> The register of index 1 belongs to cluster 9.
vector[2] = 2 -> The register of index 2 belongs to cluster 2.
...
And so on.
But... I know that none of the Navigation methods receives a structure like the one described above. I only use it to log the results to debug the algorithm.
But how do I pass this information (which register (or test case) belongs to which cluster) to the Navigation?
Thanks a lot again, and any help will be very appreciated.
This question is regarding the LogRegHelper - "A scorecard for Logistic Regression models" example in sqlserverdatamining Tips and Tricks page. I launched TestLogReg (Analysis Services Database associated with the project) and ran Logistic Regression over that. While the LogReg shows the highest score for IQ (107 - 121), a score of 558, the Logistic Regression shows that Parent Encouragement has the highest score for the case College Plans = 'Plans to Attend'. Can someone verify this and clarify?
I have a few other questions with LR
- In SQL Server 2005 LR Mining Model Viewer "favors" chart, what algorithm is used for generating Scores?
- Can I use this score as a feature selector? Higher score => stronger predictor (input)
- Is the coefficient weight algorithm used in LogReg wrong ?
I have a rather complicated report with lots and lots of textbox and line controls. When I preview the report on the Report Server, the layout is skewed and many items are out of place. But when I export the report to PDF or TIFF, the output reverts to its proper form. Why is it doing this? Is there anything that I can do to make the preview less ugly?
I am wondering where can I store my mining results in data mining engine? For example, I got mining results like accuracy chart, decision trees, and other formats of results based on different mining algorithms I used for my data mining, so where can I actually store the results for reporting service use later? Is it possible to do that in SQL Server 2005?
Thanks a lot for any help and guidance in advance.
I have an MS Time Series model using a database of over a thousand products, each of which has hundreds of cases. Amazingly, it takes only a few minutes to finish processing the model, but when I click Mining Model Viewer to view the models, it takes many hours to show up. Once the window is open, I can choose the model for different products almost instantly. Is this normal?
On SQL Server 2005 SP2 for Publisher and Distributor on the same instance, my old snapshots are not being cleaned up.
The following error is in the agent history:
Executed as user: DomainMyUser. Could not remove directory '\vmsql01ReplDatauncPublication_TRANSACTIONAL20070702104416'. Check the security context of xp_cmdshell and close other processes that may be accessing the directory. [SQLSTATE 42000] (Error 20015). The step failed.
xp_cmdshell is enabled and I can run commands like :
exec master.dbo.xp_cmdshell ' md c:TestFolder'
The permissions to the snapshot share and file system are that DomainMyUser has full control.
I have logged into the machine as this user and can remove snapshots so it does not seem to be a permission issue.
On other machines I do not get any errors but the snapshot folder still is not cleaned up.
I have not used log shipping before and find myself in a position where I need to reboot the secondary node and then the primary node and I don't actually need to failover.
Is there anything I need to be aware of? When rebooting the secondary node, I assume the transactions will be held in the primary node's log until the secondary comes back, and will just carry on once it is back up?
When rebooting the primary node, does nothing need to be done, and will log shipping just start again once it has come back?
But I'm not sure if I have to install SQL Server first on node 2, then add it to the cluster. Or does adding it to the cluster also install the software?