Saturday, September 21, 2013

Analytics education... What is the best way to get ideas across?

While trying to figure out what to do in life, I am in the process of exploring the analytics education space. Given my background in this space, and the goal of retiring into this line of work, I have been thinking about this for a while. However, I have not done a whole lot in this space other than some training delivered in a haphazard manner until the beginning of this year. 

I am a firm believer in the idea that if you want to learn something you need to teach it (hopefully to a bunch of interested folks). I have developed some perspectives on what it takes to be a good business analyst, and a significant element of business input is needed for that. However, the quantitative element is very important, because business intuition takes time to develop. Until recently, though, the focus on this element has been minimal in most education programs. MBA programs are now introducing more rigorous quant subjects and could actually make a full-time quant MBA a possibility soon. Is it possible to make it better now? It feels like it should be available in college too, and not just in specialized MBA programs.

In the interest of my personal journey, I have decided to do a couple of things. The first is to take a step back and do some learning. I have signed up for a couple of courses on Coursera to see how it feels to learn something in a new world. I might be challenged from a pure discipline perspective, but I will need to try. The second thing I am trying to do is evaluate different mediums from the perspective of analytics learning. My particular interest is in quantitative subjects, but this will be an interesting experience in exploring other areas as well, as there are significant learnings I am hoping to get. These mediums span the breadth of technology, from the plain classroom to the Android / Windows / iPhone app. There seems to have been a sea change in this world since the time I studied many of these subjects.


I have been talking to the director of a leading MBA education institution in Bangalore about conducting training at their location. This seemed like a good place to start looking at how outsiders take to analytical education as compared to insiders. (My perspective is that when you pay for something, you are more than willing to learn, but if it is free then who cares - I am a shining example of this.) A colleague of mine also introduced me to a company that does coaching. I need to see how that will work out, as I am still struggling to get my thoughts on my future in order, but it looks like there is some potential to do interesting stuff there too. 

Friday, September 13, 2013

Treenet and Stochastic Gradient Boosting

While I am someone who likes to go deep into techniques, I rarely get the time (maybe intellectual honesty would require me to say that I get distracted easily by what is happening around me) to understand something technical. However, there is an expectation that "I get it faster!" So for this week, I wanted to actually go deep into one such technique and distill the concepts into simple, intuitive ideas.

This week I want to get into Stochastic Gradient Boosting. Partly because I understand it well enough to explain, I guess, but another, more personal reason is that I had a dinner conversation with the inventor of this methodology. Jerome Friedman made a presentation at an event at my previous job, and since I was organizing the event I was able to meet up with him for dinner along with a few other colleagues. It always feels good to be with statistical royalty, and these folks are quite down to earth. While dinner was good, the conversation was better, as we learned about his Princeton days when he was a colleague of John Nash, the famous Nobel Prize winning economist.

Anyways, coming to the key idea of this blog!!! Stochastic gradient boosting is an approach used to improve supervised learning methods. In a typical classification problem, accuracy needs to be improved without overfitting the data. With any single algorithm, all one can typically do is come up with better features to improve the model. There is significant learning to be had from the classification error, though. Wherever error is high, there is an opportunity for improvement. Modeling the error (of whatever algorithm you have already used to get this far) allows you to reduce it further. However, there is one problem to watch out for: the errors contain noise as well as signal, so we need to watch out for spurious relationships. Penalizing each error-correction step (shrinking its contribution) reduces the impact of spurious variables, unless there is genuine value coming from them being in the model. The "stochastic" part comes from fitting each step on a random subsample of the data, which further guards against overfitting. This, in a nutshell, is SGB, and Treenet is a commercial implementation of this for decision tree algorithms.
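
To make the loop concrete, here is a rough sketch in Python of a toy squared-error version, using scikit-learn regression trees as the base learners (the function name and parameter values are mine for illustration; this is not the Treenet implementation itself):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def stochastic_gradient_boost(X, y, n_rounds=100, learning_rate=0.1, subsample=0.5):
    """Each round fits a small tree to the residuals of the current prediction."""
    rng = np.random.default_rng(0)
    prediction = np.full(len(y), y.mean())   # start from a constant model
    trees = []
    for _ in range(n_rounds):
        residuals = y - prediction           # "model the error" of the fit so far
        # the stochastic part: fit this round's tree on a random subsample
        idx = rng.choice(len(y), size=int(subsample * len(y)), replace=False)
        tree = DecisionTreeRegressor(max_depth=3).fit(X[idx], residuals[idx])
        # shrinkage: penalize each correction so spurious splits stay small
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees, prediction
```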

R also has an implementation of Stochastic Gradient Boosting. Actually, it has many. The GBM package is a good place to start, as it has a straightforward implementation of boosted trees (including the bagging-style subsampling that makes it stochastic), and from there one can explore more advanced packages that implement boosting for other algorithms, including regression (l2boost) and SVM (wSVM).
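
For comparison, the analogous call outside R, in Python's scikit-learn, looks roughly like this (the dataset and parameter values below are purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

model = GradientBoostingClassifier(
    n_estimators=200,     # number of boosting rounds (n.trees in gbm)
    learning_rate=0.05,   # shrinkage
    max_depth=3,          # depth of each tree (interaction.depth in gbm)
    subsample=0.5,        # the stochastic part: each tree sees half the data
    random_state=0,
)
model.fit(X, y)
print(model.score(X, y))
```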

I guess as a next step I should read Jerome Friedman's paper and synthesize this! 

Monday, September 9, 2013

Analytical software for analysts - is it way too complex?


Is there analytical software out there that actually makes learning from data intuitive? I have experience with quite a few of these packages, but none of them stay intuitive for the average business analyst once you go beyond looking at data in one or two dimensions. While this is good for business, I must admit that it makes life difficult, as the problems one has to tackle get quite mundane when responding to queries from the not-so-statistically literate. 

What would be the ideal requirements for someone to actually be able to get ideas from data? Let us assume that the average user has a sense of the business he / she is dealing in. At the end of the analysis, the user should be able to get a sense of how to drive the business forward, or at least a good sense of which drivers are worth exploring further. Let us further assume that the average business user has the ability to understand counter-intuitive results, can comfortably handle two-dimensional analysis, can possibly handle three-dimensional analysis, but will be unable to move beyond that. 

Ideally when my business problems are well-defined (in the sense that I at least know what I want to solve initially even though I might realize that I need to solve something much larger later), then these tools should be able to at least drive some initial value for the analysts by incorporating these business requirements. But when I am sifting through data without a clue as to what I am looking for, how do I identify patterns that are meaningful and at the same time not require me to be in that business domain forever?

Regression analysis requires a significant understanding of the statistics to be able to confidently drive the analysis. CART / CHAID type algorithms are relatively easier to understand, but I am not sure there are decent software implementations that make the learning from CHAID / CART intuitive. Bayesian networks or topological data analysis might be an answer, but I have not worked enough with these to have a viewpoint on the implementation perspective. These are good at identifying patterns but do not necessarily make it easier for the business to read the results.
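
As a small illustration of why tree-based methods are easier to read back to a business audience, a fitted tree can be printed as plain if/then rules. A quick sketch in Python using scikit-learn and its bundled iris sample data (just an illustration, not one of the commercial CART / CHAID products):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)

# prints nested if/then splits such as "petal width (cm) <= 0.80"
print(export_text(tree, feature_names=list(iris.feature_names)))
```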

Ultimately I believe business problems need to be solved with the business context in mind, and there is no general-purpose software that will enable that. Is it time for one to be created?

Saturday, September 7, 2013

To Bayes or not to Bayes

Why is this argument important? For a long time, the frequentist position was that the data generating mechanism had a distribution whose parameters were fixed. This made sense (why would that parameter change in any case), and all you would do is estimate that parameter based on the data you observe. Bayesian inference, however, gained practical traction much later and postulated (I am not sure who did it specifically) that I should be using any prior information I have about the parameter, and not necessarily let the estimate be driven purely by data.
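
A minimal worked example of the Bayesian idea, in Python: put a Beta prior on an unknown rate, then update it with observed successes and failures (all the numbers here are made up):

```python
from scipy.stats import beta

prior_a, prior_b = 2, 8          # prior belief: the rate is probably around 20%
successes, failures = 30, 70     # freshly observed data

post_a, post_b = prior_a + successes, prior_b + failures
posterior = beta(post_a, post_b)

print(posterior.mean())          # point estimate blending prior and data
print(posterior.interval(0.95))  # 95% credible interval for the rate
```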

While this sounded quite radical in theory, there have been significant contributions that have enabled the idea to be used successfully in very practical applications. Specifically, Bayesian regression is quite useful for building updating models that use a continuous data collection mechanism, as opposed to waiting until models deteriorate to the point of having to be rebuilt. This can incorporate a good test-and-learn setup from a data input perspective. These models have very practical applications in credit scoring, churn analysis and customer acquisition.
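
A sketch of what that updating looks like for a simple Bayesian linear regression with a Gaussian prior and known noise variance (textbook conjugate formulas; the data and variable names here are invented for illustration). The posterior after each wave of data becomes the prior for the next, so the model keeps learning without a full rebuild:

```python
import numpy as np

def bayes_linreg_update(prior_mean, prior_cov, X, y, noise_var=1.0):
    """One conjugate update of a Gaussian prior over the regression weights."""
    prior_prec = np.linalg.inv(prior_cov)
    post_prec = prior_prec + X.T @ X / noise_var
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_cov

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
mean, cov = np.zeros(2), np.eye(2) * 10.0    # vague prior to start with
for _ in range(5):                            # five "waves" of newly collected data
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=1.0, size=50)
    mean, cov = bayes_linreg_update(mean, cov, X, y)
print(mean)   # drifts toward the true coefficients as data accumulates
```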

The machine learning world took to Bayes' theorem a lot more seriously than the statistical crowd. Algorithms that assumed prior knowledge and were then updated based on fresh data seemed to make a lot more sense. Spam filtering is one of the biggest applications of this theorem. A general rule that defines spam based on many emails can serve as a baseline, and the model can then be updated based on user characteristics and performance. This allows the spam filter to become very customized to the user.
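
A toy sketch of that updating loop with scikit-learn's multinomial naive Bayes (the emails and labels below are made up; a real filter would use far richer features):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

baseline_mails = ["win money now", "cheap meds offer",
                  "meeting at noon", "project status update"]
baseline_labels = [1, 1, 0, 0]                # 1 = spam, 0 = ham

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(baseline_mails)

clf = MultinomialNB()
clf.partial_fit(X, baseline_labels, classes=[0, 1])   # the generic baseline model

# later: the user flags one of their own mails, and the filter updates itself
user_mail = vectorizer.transform(["limited time offer just for you"])
clf.partial_fit(user_mail, [1])

print(clf.predict(vectorizer.transform(["cheap offer limited time"])))
```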

Judea Pearl is one of the pioneers in looking at Bayesian inference from a fresh perspective. Graph theory has been part of mathematics for quite a long time, but the use of Bayesian theory enabled a fresh perspective in this domain, and Bayesian networks are the result of this marriage. The network structure allows one to incorporate many more variables in the model and reason about causal relationships, something that was previously available mainly in the time series domain (will write on this later!).
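
A tiny hand-rolled example of the idea (the classic burglary / earthquake / alarm toy network, with made-up probabilities): the graph structure factorizes the joint distribution, and queries fall out of Bayes' theorem by enumerating over the hidden variables:

```python
from itertools import product

# structure: Burglary -> Alarm <- Earthquake
P_B = {True: 0.01, False: 0.99}
P_E = {True: 0.02, False: 0.98}
P_A_given_BE = {                      # P(Alarm = True | Burglary, Earthquake)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}

def joint(b, e, a):
    """Joint probability factorized along the network structure."""
    p_a = P_A_given_BE[(b, e)]
    return P_B[b] * P_E[e] * (p_a if a else 1 - p_a)

# query P(Burglary = True | Alarm = True) by summing out the hidden variable E
num = sum(joint(True, e, True) for e in (True, False))
den = sum(joint(b, e, True) for b, e in product((True, False), repeat=2))
print(num / den)   # the alarm going off sharply raises the probability of a burglary
```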

The bottom line, as I see it, is that the frequentist approach is outdated and we need to develop the Bayesian perspective when looking at new models. This should be the way we think about incorporating models in the real world.