I recently got pulled into a conversation with one of our sharpest analysts. He dreams of a career in India's bureaucracy, yet he is also a very capable coder. As we talked, he started describing a competition on Kaggle. While I heard him out, and found myself getting excited about doing something hands-on, I realized that the industry has grown enormously around me and I had not taken the time to appreciate that growth.
Kaggle is one of many websites that offer competitions; KD Nuggets, Analytic Bridge and others host them too. What is interesting is where the solution approaches seem to be heading. A few years ago (or maybe many years ago, which could be a separate blog topic), we discussed in grad school the coming of age of machine learning: given large amounts of data, how can you get accurate predictions for different problems? With the focus solely on prediction, these algorithms were able to match many statistical techniques, largely because they are free of the constraints that a data-generating model imposes on a statistician. So why constrain ourselves with a hypothesis-driven approach at all? Has statistics lost its chance to be the next cool thing, and will machine learning take over? It makes sense to understand why statistics is even relevant in this day and age.
Many machine learning algorithms are inherently black boxes because of the complex way they relate the input characteristics to the quantity being predicted. While there are ways of understanding which characteristics are important and how sensitive the predictions are to them, these measures can be misleading if not diagnosed properly. Most machine learning techniques therefore include a substantial validation component to ensure that the algorithms are robust and can handle exceptional cases.
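To make the validation and importance points a little more concrete, here is a minimal sketch in Python. It is purely illustrative and assumes things the post does not specify: synthetic data in which only feature x0 drives the target, a hand-rolled k-nearest-neighbours regressor standing in for a black-box model, a held-out set for validation, and permutation importance (shuffle one feature, then measure how much the held-out error grows) as one of several possible ways of probing which characteristics matter. Function names such as make_data and knn_predict are hypothetical.

```python
import random


def make_data(n, rng):
    """Synthetic data (an assumption for illustration): y depends on x0 only; x1 is pure noise."""
    rows, targets = [], []
    for _ in range(n):
        x0, x1 = rng.random(), rng.random()
        rows.append((x0, x1))
        targets.append(3.0 * x0 + rng.gauss(0.0, 0.1))
    return rows, targets


def knn_predict(train_x, train_y, point, k=5):
    """Average the targets of the k nearest training rows (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, point)), y)
        for row, y in zip(train_x, train_y)
    )
    return sum(y for _, y in dists[:k]) / k


def mse(train_x, train_y, test_x, test_y, k=5):
    """Mean squared error of the k-NN model on a held-out set."""
    errors = [(knn_predict(train_x, train_y, p, k) - y) ** 2 for p, y in zip(test_x, test_y)]
    return sum(errors) / len(errors)


def permutation_importance(train_x, train_y, test_x, test_y, rng, k=5):
    """Increase in held-out MSE when one feature column is shuffled."""
    baseline = mse(train_x, train_y, test_x, test_y, k)
    importances = []
    for j in range(len(test_x[0])):
        column = [row[j] for row in test_x]
        rng.shuffle(column)  # break the link between feature j and the target
        shuffled = [row[:j] + (v,) + row[j + 1:] for row, v in zip(test_x, column)]
        importances.append(mse(train_x, train_y, shuffled, test_y, k) - baseline)
    return baseline, importances


rng = random.Random(0)
train_x, train_y = make_data(300, rng)
test_x, test_y = make_data(100, rng)  # held-out validation set, never used for fitting
baseline, importances = permutation_importance(train_x, train_y, test_x, test_y, rng)
print(f"held-out MSE: {baseline:.3f}")
for j, imp in enumerate(importances):
    print(f"feature x{j}: MSE increase when shuffled = {imp:.3f}")
```

In this setup, shuffling x0 should sharply increase the held-out error while shuffling x1 should barely move it. With correlated features or a small validation set, however, such scores can mislead, which is the diagnostic caveat raised above.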
Where does this lead us? One of the most interesting expectations of machine learning is that we could live in an I, Robot kind of environment, where machines predict survival rates based on what they have learned. Google has designed an algorithm that can identify cats (though I am not sure what it would call them), and there is potential for machines to get smarter over time.
BTW, here is a plug for one more analytics competition. It should be fun if you are in college, and it is quite rewarding financially!