Sunday, October 20, 2013

Different types of regression

I have always felt that regression is a very versatile tool. It can be used for measurement (to explain what happened), for analysis (to understand drivers) and for forecasting. It has a long history and still has a firm place in our analytical toolkit.

Some of the evolution of regression is very interesting from the perspective of how its shortcomings have been addressed. The main arguments against regression are that it does not handle multicollinearity well (a real problem when you need driver analysis) and that some of its assumptions (like the independence of the errors and the explanatory variables) never seem to be satisfied in practice. Research on these fronts has led to methods that address these issues. There are three interesting ideas that I want to highlight in this week's blog post.

There are many ways to handle multicollinearity in analysis. Its importance comes from the fact that when one needs to measure the impact of a key variable, that measurement should be free of other correlated variables that could bias it. Principal component analysis and factor analysis are options for handling multicollinearity, but interpreting the results after those transformations is a significant challenge. Latent class is a good way of handling this (and I will be discussing it in the future). Ridge (and Lasso) regression is a simpler idea: to handle multicollinearity, a small amount of bias is deliberately introduced into the coefficient estimates by penalizing large coefficients. This has the effect of reducing the variance of the estimates, which leads to more stable, better-behaved results from an analysis perspective.
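Here is a minimal sketch of the idea, assuming Python with NumPy and scikit-learn are available; the synthetic data and the alpha value are purely illustrative choices, not a recommendation.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)    # x2 is nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3 * x1 + 2 * x2 + rng.normal(size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)          # alpha controls the amount of shrinkage (bias)

print("OLS coefficients:  ", ols.coef_)     # typically unstable when the columns are collinear
print("Ridge coefficients:", ridge.coef_)   # shrunk, more stable across samples

The alpha penalty is the "bias" being introduced: larger values shrink the coefficients more, trading a little bias for a large reduction in their variance.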

One other disadvantage of least squares regression is its lack of flexibility. Variable transformations and interactions add some flexibility, but there is one technique that adds a lot more. Local regression, also known as LOESS (or LOWESS, locally weighted scatterplot smoothing), adds the kind of flexibility that many machine learning techniques have. It is more computationally intensive, but it can deliver flexible yet interpretable results. Local regression essentially fits a weighted model on a local subset of the data around each point, and can therefore capture very non-linear relationships well.
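As a rough sketch (assuming Python with NumPy and statsmodels; the frac setting is an illustrative choice), LOWESS fits a weighted regression on the nearest fraction of points around each x value:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 300)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)   # clearly non-linear signal

# Each fitted value comes from a weighted regression on the nearest 30% of points
smoothed = sm.nonparametric.lowess(y, x, frac=0.3)

print(smoothed[:5])   # columns: sorted x, locally fitted y

The frac parameter plays the role of the local subset size: smaller values track wiggles more closely, larger values give smoother, more interpretable fits.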


One interesting issue in regression usage has been the difficulty of dealing with counter-intuitive results. Bayesian regression provides a way to formulate hypotheses (prior beliefs) that can be incorporated directly into the regression analysis. This lets prior knowledge play an important role in the analysis while damping very counter-intuitive results. Of course, as with all regression techniques, the modeler still needs to use their judgment to get to the best model.
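A minimal sketch, assuming Python with scikit-learn (its BayesianRidge model places priors on the coefficients; the data here is made up for illustration):

import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(1)
n = 50                                      # deliberately small sample
X = rng.normal(size=(n, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + rng.normal(size=n)

model = BayesianRidge().fit(X, y)
mean, std = model.predict(X[:3], return_std=True)   # predictions with uncertainty

print("Posterior coefficient means:", model.coef_)
print("Predictive mean / std for first three rows:", mean, std)

The priors pull implausibly large coefficients back toward zero, which is one simple way of letting prior knowledge temper counter-intuitive estimates; richer prior structures call for a full probabilistic framework such as PyMC or Stan.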

In any case, there is a lot more to regression than meets the eye! 
