How do I write a DMX query to return the coefficients of the independent variables in my regression equation?
Thanks,
Carrie
All algorithm content is in the content schema rowset available through
SELECT * FROM <model name>.CONTENT
Although the schema is the same for all algorithms, each uses the schema slightly differently. The schema itself is difficult to decode, but you can download a plug-in viewer from http://www.sqlserverdatamining.com/dmcommunity/_downloads/1348.aspx that decodes all the types/etc into their parts. Once you do this, you will see what columns/etc you need from the content.
|||We do not have the sgKey.snk file. Can we generate one? If so how?
Cryptographic failure while signing assembly 'C:\Documents and Settings\dtm\My Documents\dot net examples\Generic Content Viewer\GenericContentTreeViewerSetup\obj\Debug\GenericContentTreeView.dll' -- 'Error reading key file 'c:\Documents and Settings\dtm\My Documents\dot net examples\Generic Content Viewer\GenericContentTreeViewerSetup\sgKey.snk' -- The system cannot find the file specified. '
Thanks
|||Something happened to the download - the snk file isn't the only one missing - we're looking into it. However, the setup should have installed the viewer anyway, so you should see it in BI Dev Studio, did it not?|||I understand the content viewer now. I thought maybe I was looking for something different but now I see.
Three more questions:
1. My regressor variable has three values associated with it. I understand the first (the coefficient in the regression equation) and the third, used to calculate the constant - but what is the second value? A screenshot would probably be more helpful.
2. How do I set an input variable to regressor in the mining wizard. It is not listed as a modeling flag option. If I have more than one regressor - will both regressors be included as part of the regression equation?
3. In my regression trees, if a node does not have a regression equation associated with it, is the model overtrained? How do I interpret these results?
Thanks so much,
Carrie
|||
1: I don't have it in front of me right now, so a screenshot would help :)
2: When you create a decision tree with continuous inputs and outputs, I believe the wizard automatically marks all continuous values as REGRESSOR. You can verify this by going to the Mining Models pane in the Data Mining designer. Click on a column name under the mining model (not the mining structure) and look at its properties in the property panel. This is where you can set algorithm-specific modeling flags, and where you would set or clear the REGRESSOR flag.
3. I wouldn't say it was necessarily overtrained, just that there were no significant regressors for that node. For example, assume I had a bunch of demographic data including Age and IQ as my only continuous values and I tried to predict either one. Statistically speaking, they should be independent and there shouldn't be any regressions - just constants. That being said, and since it may not be the case for your model, there are a couple of options open to you. If you think the model may be overfitting, you can increase the MINIMUM_SUPPORT parameter, or the COMPLEXITY_PENALTY parameter. Both of these have the impact of reducing the size of your tree. Additionally, the decision tree algorithm has a FORCE_REGRESSOR parameter allowing you to specify a regressor that will be included in any regression, regardless of how minimal its contribution
|||Is there a dmx query that will return the actual numeric value of the diamond (residual) in each node of the regression tree?|||I am just making sure my question is still in the cue....thanks|||Assuming your model name was "cp" and your attribute name was "IQ", I think this is the query you want
select FLATTENED NODE_CAPTION,
NODE_NAME,
(select ATTRIBUTE_VALUE AS mean,
[VARIANCE] as [variance]
from NODE_DISTRIBUTION WHERE VALUETYPE=3)
as stats
from cp.content WHERE ATTRIBUTE_NAME='IQ'
No comments:
Post a Comment