Wednesday, March 21, 2012

DMX Query for regression coefficients

How do I write a DMX query to return the coefficients of the independent variables in my regression equation?

Thanks,

Carrie

All algorithm content is in the content schema rowset available through

SELECT * FROM <model name>.CONTENT

Although the schema is the same for all algorithms, each uses the schema slightly differently. The schema itself is difficult to decode, but you can download a plug-in viewer from http://www.sqlserverdatamining.com/dmcommunity/_downloads/1348.aspx that decodes all the types/etc into their parts. Once you do this, you will see what columns/etc you need from the content.

|||

We do not have the sgKey.snk file. Can we generate one? If so how?

Cryptographic failure while signing assembly 'C:\Documents and Settings\dtm\My Documents\dot net examples\Generic Content Viewer\GenericContentTreeViewerSetup\obj\Debug\GenericContentTreeView.dll' -- 'Error reading key file 'c:\Documents and Settings\dtm\My Documents\dot net examples\Generic Content Viewer\GenericContentTreeViewerSetup\sgKey.snk' -- The system cannot find the file specified. '

Thanks

|||Something happened to the download - the snk file isn't the only one missing - we're looking into it. However, the setup should have installed the viewer anyway, so you should see it in BI Dev Studio, did it not?|||

I understand the content viewer now. I thought maybe I was looking for something different but now I see.

Three more questions:

1. My regressor variable has three values associated with it. I understand the first (the coefficient in the regression equation) and the third, used to calculate the constant - but what is the second value? A screenshot would probably be more helpful.

2. How do I set an input variable to regressor in the mining wizard. It is not listed as a modeling flag option. If I have more than one regressor - will both regressors be included as part of the regression equation?

3. In my regression trees, if a node does not have a regression equation associated with it, is the model overtrained? How do I interpret these results?

Thanks so much,

Carrie

|||

1: I don't have it in front of me right now, so a screenshot would help :)

2: When you create a decision tree with continuous inputs and outputs, I believe the wizard automatically marks all continuous values as REGRESSOR. You can verify this by going to the Mining Models pane in the Data Mining designer. Click on a column name under the mining model (not the mining structure) and look at its properties in the property panel. This is where you can set algorithm-specific modeling flags, and where you would set or clear the REGRESSOR flag.

3. I wouldn't say it was necessarily overtrained, just that there were no significant regressors for that node. For example, assume I had a bunch of demographic data including Age and IQ as my only continuous values and I tried to predict either one. Statistically speaking, they should be independent and there shouldn't be any regressions - just constants. That being said, and since it may not be the case for your model, there are a couple of options open to you. If you think the model may be overfitting, you can increase the MINIMUM_SUPPORT parameter, or the COMPLEXITY_PENALTY parameter. Both of these have the impact of reducing the size of your tree. Additionally, the decision tree algorithm has a FORCE_REGRESSOR parameter allowing you to specify a regressor that will be included in any regression, regardless of how minimal its contribution

|||Is there a dmx query that will return the actual numeric value of the diamond (residual) in each node of the regression tree?|||I am just making sure my question is still in the cue....thanks|||

Assuming your model name was "cp" and your attribute name was "IQ", I think this is the query you want

select FLATTENED NODE_CAPTION,
NODE_NAME,
(select ATTRIBUTE_VALUE AS mean,
[VARIANCE] as [variance]
from NODE_DISTRIBUTION WHERE VALUETYPE=3)
as stats
from cp.content WHERE ATTRIBUTE_NAME='IQ'

No comments:

Post a Comment