sekedar pet crepet

Thursday, April 03, 2008

Steps in Model Building

A friend asked me about best-practice steps for building a statistical model. Here is my suggestion:

1. Identify the predictor variables the model will need. Background knowledge, experience, and the availability of data should guide the choice. If a key variable is not available, find a proxy for it.

2. Still in the preparation stage, get to know the nature of your data. Check the definition and the unit of measurement of each variable.

3. Then produce some simple descriptions: frequency tables for categorical variables, and measures of central tendency and dispersion for continuous ones. Scatter plots may also help. This step lets you characterize the data. You may find some 'unusual' values, which need to be checked and fixed.

4. Manipulate your data. The descriptive step may suggest transforming some variables. You may also need to discretize continuous variables, or to create dummy variables from the categorical ones.

5. After that, select the predictor variables that are appropriate to include. You may use statistical tests of correlation or association for this, or a selection algorithm such as stepwise selection.

6. List several tentative models from the previous step and calculate several performance indicators for each. Based on these indicators, you can choose your best model. And of course, you need to validate the model before you apply it.
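
To make steps 3 to 6 concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data are synthetic, the variable names (age, noise, group) are hypothetical, and one-predictor least-squares fits with a hold-out RMSE stand in for whatever models and performance indicators you would actually use. It is not a full stepwise procedure.

```python
import random
import statistics
from collections import Counter

random.seed(0)  # fixed seed so the synthetic data are reproducible

# Synthetic data (assumed for illustration): y depends mainly on age.
n = 200
age = [random.uniform(20, 60) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]
group = [random.choice(["a", "b"]) for _ in range(n)]
y = [2.0 * a + (5.0 if g == "b" else 0.0) + random.gauss(0, 3)
     for a, g in zip(age, group)]

# Step 3: simple descriptions.
print("age mean/sd:", statistics.fmean(age), statistics.stdev(age))
print("group frequencies:", Counter(group))

# Step 4: turn the categorical variable into a 0/1 dummy variable.
group_b = [1.0 if g == "b" else 0.0 for g in group]
candidates = {"age": age, "noise": noise, "group_b": group_b}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(x, y, slope, intercept):
    """Root mean squared error of the fitted line on (x, y)."""
    return statistics.fmean(
        (b - (slope * a + intercept)) ** 2 for a, b in zip(x, y)
    ) ** 0.5


# Step 5: screen predictors by their correlation with the response.
corr = {name: pearson(col, y) for name, col in candidates.items()}
print("correlation with y:", corr)

# Step 6: fit each tentative model on a training part and
# validate it on held-out data it has not seen.
indices = list(range(n))
random.shuffle(indices)
train, test = indices[:150], indices[150:]

holdout_rmse = {}
for name, col in candidates.items():
    slope, intercept = fit_line([col[i] for i in train], [y[i] for i in train])
    holdout_rmse[name] = rmse(
        [col[i] for i in test], [y[i] for i in test], slope, intercept
    )

best = min(holdout_rmse, key=holdout_rmse.get)
print("hold-out RMSE:", holdout_rmse)
print("best single-predictor model:", best)
```

With real data you would probably compare multi-predictor models and more than one indicator, but the order of work stays the same: describe, transform, screen, then compare on data held back from fitting.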
