While I am someone who likes to go deep into techniques, I rarely get the time (some intellectual honesty would require me to admit that I get distracted easily by what is happening around me) to really understand something technical. However, there is an expectation that I "get it" fast! So for this week, I wanted to actually go deep into one such thought process and bring the concepts out as simple, intuitive ideas.
This week I want to get into Stochastic Gradient Boosting. Partly because I understand it well enough to explain, I guess, but another, more personal reason is that I once had a dinner conversation with the inventor of this methodology. Jerome Friedman gave a presentation at an event at my previous job, and since I was organizing the event I was able to join him for dinner along with a few other colleagues. It always feels good to be around statistical royalty, and these folks are quite down to earth. While the dinner was good, the conversation was better, as we learned about his Princeton days when he was a colleague of John Nash, the famous Nobel Prize-winning economist.
Anyway, coming to the key idea of this blog! Stochastic gradient boosting is an approach for improving supervised learning methods. In a typical classification problem, accuracy needs to be improved without overfitting the data. With any single algorithm, all one can typically do is come up with better features to improve the model. There is significant learning to be had from the classification error itself, though. Wherever the error is high, there is an opportunity for improvement: fit another model to the residual error of what you have built so far (using whatever algorithm got you this far), and you can reduce it further. There is one problem to watch out for, however. Part of that error is just noise, so chasing it aggressively invites spurious relationships. This is handled by penalizing each error model, shrinking its contribution and fitting it on a random subsample of the data (the "stochastic" part), so a variable only ends up significantly impacting the analysis if there is real value coming through it being in the model. This, in a nutshell, is SGB, and TreeNet is a commercial implementation of it for decision tree algorithms.
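To make that loop concrete, here is a minimal sketch in R of what "modeling the error" looks like, using squared-error loss, small rpart trees, shrinkage, and row subsampling. This is my own toy illustration, not Friedman's exact algorithm; the data and parameter values are made up for the example.

```r
# Toy sketch of stochastic gradient boosting with squared-error loss.
# Assumption: rpart is available; the data below is simulated for illustration.
library(rpart)

set.seed(1)
n <- 500
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)

n_trees   <- 100
shrinkage <- 0.1   # the "penalty" on each error model
subsample <- 0.5   # the "stochastic" part: each tree sees a random half of the rows

pred  <- rep(mean(y), n)        # start from a constant model
trees <- vector("list", n_trees)

for (m in seq_len(n_trees)) {
  idx   <- sample(n, size = floor(subsample * n))   # random subsample of rows
  resid <- y - pred                                  # current errors
  fit   <- rpart(resid ~ x,
                 data = data.frame(x = x, resid = resid)[idx, ],
                 control = rpart.control(maxdepth = 2))
  pred  <- pred + shrinkage * predict(fit, newdata = dat)  # shrunken update
  trees[[m]] <- fit
}

mean((y - pred)^2)   # training error keeps shrinking as trees are added
```

The shrinkage factor is what keeps any single error model (and any noise it happens to pick up) from dominating; improvement has to show up consistently across many rounds to matter.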
R also has an implementation of Stochastic Gradient Boosting. Actually, it has many. The gbm package is a good place to start, as it is a straightforward implementation of Friedman's gradient boosting machine (with bagging-style row subsampling built in), and from there one can explore packages that apply boosting to other algorithms, including regression (l2boost) and SVM (wSVM).
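For the curious, here is a quick sketch of what a gbm call looks like. The argument names follow the package documentation, but the toy two-class iris problem and the parameter values are just my choices for illustration.

```r
# Sketch of the gbm package interface on a made-up two-class problem.
library(gbm)

dat <- subset(iris, Species != "setosa")
dat$y <- as.numeric(dat$Species == "versicolor")   # 0/1 outcome for bernoulli loss

fit <- gbm(y ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
           data              = dat,
           distribution      = "bernoulli",
           n.trees           = 500,
           shrinkage         = 0.05,   # learning rate
           bag.fraction      = 0.5,    # stochastic subsampling of rows per tree
           interaction.depth = 2,
           cv.folds          = 5)

best <- gbm.perf(fit, method = "cv")                        # pick the tree count by CV
p    <- predict(fit, dat, n.trees = best, type = "response") # predicted probabilities
head(p)
```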
I guess as a next step I should read Jerome Friedman's paper and synthesize this!