Sunday, January 17, 2016

Variable Selection

First, apply simple filters to remove junk variables. For instance, drop variables that have too high a proportion of missing values, too low a coefficient of variation, and so on.
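As a rough illustration, this first filtering pass could look like the sketch below. The function names, the toy columns, and the cut-offs (`max_missing=0.5`, `min_cv=0.01`) are assumptions made up for the example; pick thresholds that suit your own data.

```python
import math
from statistics import mean, pstdev


def is_missing(v):
    """Treat None and NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def missing_fraction(values):
    """Share of entries in a column that are missing."""
    return sum(1 for v in values if is_missing(v)) / len(values)


def coefficient_of_variation(values):
    """Standard deviation divided by |mean|, over the non-missing entries."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return 0.0  # nothing observed: treat as carrying no variation
    m = mean(present)
    if m == 0:
        return float("inf")  # CV is undefined for a zero mean; do not drop on this rule
    return pstdev(present) / abs(m)


def drop_junk(columns, max_missing=0.5, min_cv=0.01):
    """Keep only columns that pass both (hypothetical) thresholds."""
    kept = {}
    for name, values in columns.items():
        if missing_fraction(values) > max_missing:
            continue  # too many missing values
        if coefficient_of_variation(values) < min_cv:
            continue  # almost constant
        kept[name] = values
    return kept


# Toy data, invented for the example.
data = {
    "income": [40.0, 55.0, 61.0, 38.0, 72.0, 49.0],
    "mostly_missing": [None, None, None, 3.0, None, None],
    "near_constant": [100.0, 100.0, 100.1, 100.0, 100.0, 100.0],
}
kept = drop_junk(data)
print(sorted(kept))
```

Only `income` survives here: one of the other columns is mostly missing and the other barely varies.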
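Then compute Weight of Evidence (WoE) and Information Value (IV) for each remaining variable.
Then drop the variables whose IV is either too low (little predictive power) or too high (suspiciously predictive, which often points to leakage from the target).
Then, within the variables you keep, choose the values (categories or bins) that have a high WoE.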
This works directly for categorical variables. For continuous variables, first bin them and then apply WoE and IV to the bins.
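The WoE and IV steps above can be sketched in plain Python as follows. This assumes a binary target where 1 marks the "bad" outcome and uses the convention WoE = ln(share of goods / share of bads) and IV = sum of (share of goods - share of bads) * WoE; some texts flip the ratio, which only changes the sign of WoE. The smoothing constant, the equal-frequency binning, and the IV cut-offs of 0.02 and 0.5 are assumptions for illustration (the cut-offs are a commonly quoted rule of thumb, not something fixed by the method).

```python
import math
from collections import defaultdict


def woe_iv(categories, target, smoothing=0.5):
    """Return ({category: WoE}, IV) for a binary target (1 = bad, 0 = good)."""
    good = defaultdict(int)
    bad = defaultdict(int)
    for c, y in zip(categories, target):
        if y == 1:
            bad[c] += 1
        else:
            good[c] += 1
    levels = sorted(set(good) | set(bad))
    # Smoothing (an assumption) avoids log(0) when a category has no goods or no bads.
    total_good = sum(good.values()) + smoothing * len(levels)
    total_bad = sum(bad.values()) + smoothing * len(levels)
    woe = {}
    iv = 0.0
    for c in levels:
        dist_good = (good[c] + smoothing) / total_good
        dist_bad = (bad[c] + smoothing) / total_bad
        woe[c] = math.log(dist_good / dist_bad)
        iv += (dist_good - dist_bad) * woe[c]
    return woe, iv


def equal_frequency_bins(values, n_bins=3):
    """Assign each value a bin index 0..n_bins-1 by rank.

    Simplification: tied values can land in different bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def iv_in_range(iv, low=0.02, high=0.5):
    """Hypothetical rule-of-thumb filter: keep variables with a moderate IV."""
    return low <= iv <= high


# Categorical variable (toy data): grade A has 6 goods / 4 bads, grade B has 4 / 6.
grade = ["A"] * 10 + ["B"] * 10
default = [0] * 6 + [1] * 4 + [0] * 4 + [1] * 6
woe, iv = woe_iv(grade, default)
print(woe, iv, iv_in_range(iv))

# Continuous variable: bin first, then reuse woe_iv on the bin labels.
ages = [22, 25, 31, 38, 45, 52, 60, 67, 71]
age_bins = equal_frequency_bins(ages, n_bins=3)
age_target = [1, 1, 0, 1, 0, 0, 0, 0, 1]
age_woe, age_iv = woe_iv(age_bins, age_target)
print(age_bins, age_woe, age_iv)
```

A positive WoE under this convention means the category holds relatively more goods than bads; IV summarises how well the variable as a whole separates the two groups.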
Finally, apply VIF (Variance Inflation Factor) to remove variables with high multicollinearity.
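A minimal sketch of the VIF step, using only the standard library. It relies on the fact that the VIF of variable j, 1 / (1 - R_j^2), equals the j-th diagonal entry of the inverse of the correlation matrix. The helper names, the toy data, and the threshold of 10 are assumptions for the example (5 and 10 are both common rules of thumb). In practice a statistics library would normally do this for you.

```python
from statistics import mean


def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length, non-constant numeric columns."""
    centered = []
    for col in columns:
        m = mean(col)
        centered.append([v - m for v in col])
    norms = [sum(d * d for d in c) ** 0.5 for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: some variables are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(data):
    """Return {name: VIF} as the diagonal of the inverse correlation matrix."""
    names = list(data)
    inv = invert(correlation_matrix([data[n] for n in names]))
    return {name: inv[i][i] for i, name in enumerate(names)}


def drop_high_vif(data, threshold=10.0):
    """Repeatedly drop the variable with the largest VIF until all are <= threshold."""
    kept = dict(data)
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return kept


# Toy data: x2 is roughly 2 * x1, x3 is only loosely related to either.
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2],
    "x3": [5, 1, 4, 2, 8, 3, 7, 6],
}
scores = vif(data)
kept = drop_high_vif(data)
print(scores)
print(sorted(kept))
```

Dropping one variable at a time and recomputing matters: removing one of two collinear variables usually brings the other's VIF back down, so both should not be discarded together.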
