I have always felt that regression is a very versatile tool. It
can be used for measurement (to
explain what happened), for analysis (to understand drivers) and for
forecasting. It has a long history and remains relevant in today's analytical toolkit.
The evolution of regression is interesting from the perspective of how its shortcomings have been addressed. The main criticisms are that it does not handle multicollinearity well (a real problem when you need driver analysis) and that some of its assumptions (such as the independence of the errors and the explanatory variables) never seem to be satisfied in practice. Research along these dimensions has led to methods that address these issues. There are three interesting ideas that I want to highlight in this week's blog post.
There are many ways to handle multicollinearity in analysis. It matters because when you need to measure the impact of a key variable, the estimate must not be distorted by other, correlated variables. Principal component analysis and factor analysis are options for handling multicollinearity, but interpreting the results afterwards is a significant challenge. Latent class models are a good way of handling this (and I will discuss them in a future post). Ridge (and Lasso) regression is a simple idea for handling multicollinearity within regression itself. Conceptually, Ridge regression introduces a small amount of bias into the coefficient estimates by shrinking them towards zero; this reduces the variance of the estimates, which leads to more stable, more usable results from an analysis perspective.
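As a minimal sketch of the idea, the snippet below fits ordinary least squares and Ridge to synthetic data with two nearly collinear predictors; the data, the alpha value and the use of scikit-learn are all illustrative assumptions, not a prescription.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data: two highly correlated predictors (illustrative only)
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3 * x1 + 2 * x2 + rng.normal(scale=1.0, size=n)

# Ordinary least squares: coefficients can be unstable under collinearity
ols = LinearRegression().fit(X, y)

# Ridge: the alpha penalty shrinks coefficients, trading a little bias
# for a large reduction in their variance
ridge = Ridge(alpha=1.0).fit(X, y)

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```

A larger alpha means more shrinkage (and more bias); in practice alpha is usually chosen by cross-validation rather than fixed by hand.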
Another disadvantage of least squares regression is its lack of flexibility. Variable transformations and interactions add some flexibility, but there is one technique that adds much more. Local regression (also known as LOESS, or LOWESS for locally weighted scatterplot smoothing) offers the kind of flexibility that many machine learning techniques have. It is more computationally intensive, but it delivers that flexibility while keeping results interpretable. Local regression essentially fits simple models on local subsets of the data and can therefore handle very non-linear relationships well.
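As an illustration, here is a minimal sketch using the lowess smoother from statsmodels on made-up, noisy sine-wave data; the data and the frac setting are assumptions for the example.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic non-linear data (illustrative only)
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 300)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# frac controls the size of each local subset: smaller values follow
# local structure more closely, larger values give a smoother fit
smoothed = lowess(y, x, frac=0.2)   # returns columns: sorted x, fitted y

print(smoothed[:5])
```

The frac parameter is the knob that trades flexibility against smoothness, much like the choice of neighbourhood size in other local methods.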
One interesting issue in regression usage has been the difficulty of dealing with counter-intuitive results. Bayesian regression provides a way to formulate hypotheses as priors that are incorporated directly into the regression analysis. This lets prior knowledge play an important role in the analysis while reducing the chance of wildly counter-intuitive results. Of course, as with all regression techniques, the modeler will still need to apply their own judgment to get to the best model.
In any case, there is a lot more to regression than meets the eye!