sekedar pet crepet

Thursday, April 17, 2008

fight for your goal

RCTI has broadcast several spots from the audition sessions of Indonesian Idol 2008, a singing contest packaged as a reality show. Thousands of teenagers and young people crowded every audition hall. The skin-burning sun was not hot enough to stop the participants, and neither could the rain drive them out of the queue. Some of them even came back after failing once or twice at the same event in previous years. Each of them is a real fighter struggling for their goal.

For me, this program is always special because it shows me the fighting spirit needed to overcome barriers on the way to an ambition. I am envious that I do not have as much of that spirit as they do. Indonesian Idol shows me that nothing comes suddenly, without pain and sacrifice: the participants left their families and spent their limited cash.

Thursday, April 03, 2008

Steps in Model Building

A friend asked me about best-practice steps in building a statistical model. Here is my suggestion:

1. Identify the (predictor) variables needed in the model. This can be done by considering background knowledge, experience, and the availability of data. A key variable may not be available, so you should find a proxy for it.

2. Still in the preparation stage, you have to understand the nature of your data. In particular, check the definitions and units of the variables.

3. Then produce some simple descriptions, such as frequency tables for categorical variables or measures of central tendency and dispersion for continuous data. You may also need to make some scatter plots. In this step we characterize the data. We may find some 'unusual' values and have to fix them.

4. Manipulate your data. The descriptive step may suggest transforming some variables. You may also need to discretize continuous data, or to create dummy variables from the categorical ones.

5. After that, select the predictor variables that are appropriate to include. You may use statistical correlation/association tests to do that, or a selection algorithm, e.g. stepwise selection.

6. List several tentative models from the previous step and calculate several performance indicators for each. Based on the indicators, you can find your best model. And of course, you need to validate your model before you apply it.
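
To make the steps a bit more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is my own assumption for illustration: the toy data, the variable names (x1, x2, y), the helper names, the choice of one-predictor straight-line models as the "tentative models", and RMSE on a held-out part of the data as the performance indicator. A real analysis would use a proper statistics package and many more checks.

```python
import statistics

# Hypothetical toy data, invented purely for illustration.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [5, 3, 6, 2, 7, 4, 6, 3],
    "y": [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9],
}


def describe(values):
    """Step 3: central tendency and dispersion of a continuous variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def make_dummies(categories):
    """Step 4: 0/1 indicator columns, dropping the first level as baseline."""
    levels = sorted(set(categories))
    return {lvl: [1 if c == lvl else 0 for c in categories] for lvl in levels[1:]}


def pearson(x, y):
    """Step 5: Pearson correlation between a predictor and the response."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fit_line(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    b = sum((p - mx) * (q - my) for p, q in zip(x, y)) / sum((p - mx) ** 2 for p in x)
    return my - b * mx, b


def rmse(model, x, y):
    """Step 6: root mean squared error of a fitted line on given data."""
    a, b = model
    return (sum((q - (a + b * p)) ** 2 for p, q in zip(x, y)) / len(y)) ** 0.5


# Step 3: describe every variable.
summary = {name: describe(col) for name, col in data.items()}

# Step 5: screen the candidate predictors by correlation with y.
correlations = {name: pearson(data[name], data["y"]) for name in ("x1", "x2")}

# Step 6: fit each tentative model on the first rows, validate on the rest.
split = 6
scores = {}
for name in ("x1", "x2"):
    model = fit_line(data[name][:split], data["y"][:split])
    scores[name] = rmse(model, data[name][split:], data["y"][split:])

best_name = min(scores, key=scores.get)
print("correlations:", correlations)
print("validation RMSE:", scores)
print("best single-predictor model uses:", best_name)
```

Holding back the last rows for validation is only the simplest possible scheme; with so few observations, cross-validation would usually be a better choice.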