We frequently come across customer questions that are variations of the above issue. For example, a financial services company will want to know how to select the best prospects for its new product launch campaign. A collection agency within an asset management company will want to find out which loan defaulters to pursue so that it has the best chance of collection. A technology services company will want to know which IT support issues are likely to become major problems, so that it can prioritize correctly.
The three examples listed are all classification problems, which can be handled by a host of business analytics techniques, including Naive Bayes, Decision Trees, Logistic Regression, and Support Vector Machines, to name a few. Many of these techniques may perform more or less equivalently in a given situation. But this abundance of choice also presents a problem.
The analysts at these businesses do recognize that there are many possible techniques to choose from, but they are strapped for time and cannot launch extensive R&D projects. They understand that analytics and statistics are complex subjects that require in-depth knowledge for best deployment. The good news is that open source software now makes it possible to run an "ensemble" of techniques at a fraction of the cost and time required a few years ago. So the best approach is to run several algorithms side by side on the same data and compare their performance before making a choice.
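To make the "run several and compare" idea concrete, here is a minimal sketch that scores a few candidate classifiers with k-fold cross-validation on the same data. Everything in it is an illustrative assumption rather than a recommended implementation: the synthetic two-feature data set, the function names, and the hand-written Gaussian Naive Bayes and logistic regression, which stand in for the library versions an analyst would normally take from an open source package.

```python
import math
import random

def make_data(n=200, seed=0):
    """Synthetic two-class data (an assumption): class 1 is shifted by +2 on both features."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        rows.append(([rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)], label))
    rng.shuffle(rows)
    return rows

def fit_majority(train):
    """Baseline: always predict the most common training label."""
    ones = sum(y for _, y in train)
    guess = 1 if ones * 2 >= len(train) else 0
    return lambda x: guess

def fit_naive_bayes(train):
    """Gaussian Naive Bayes: per-class log prior plus per-feature mean and variance."""
    stats = {}
    for c in (0, 1):
        xs = [x for x, y in train if y == c]
        cols = list(zip(*xs))
        means = [sum(col) / len(col) for col in cols]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(cols, means)]
        stats[c] = (math.log(len(xs) / len(train)), means, variances)

    def predict(x):
        def log_score(c):
            prior, means, variances = stats[c]
            return prior + sum(-0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
                               for v, m, s in zip(x, means, variances))
        return max((0, 1), key=log_score)
    return predict

def fit_logistic(train, lr=0.3, epochs=300):
    """Logistic regression trained by plain batch gradient descent."""
    w = [0.0] * len(train[0][0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in train:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(train) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(train)
    return lambda x: 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

def cross_val_accuracy(fit, rows, k=5):
    """Mean accuracy over k folds; fold i holds out every k-th row starting at i."""
    scores = []
    for i in range(k):
        test = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        predict = fit(train)
        scores.append(sum(predict(x) == y for x, y in test) / len(test))
    return sum(scores) / k

data = make_data()
candidates = {
    "majority baseline": fit_majority,
    "naive Bayes": fit_naive_bayes,
    "logistic regression": fit_logistic,
}
# Every candidate sees exactly the same folds, so the scores are comparable.
results = {name: cross_val_accuracy(fit, data) for name, fit in candidates.items()}
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {acc:.3f}")
best = max(results, key=results.get)
```

Including a trivial baseline is a deliberate choice: if a candidate cannot clearly beat "always predict the most common class", differences among the more sophisticated techniques are unlikely to matter. On real data the same loop would typically wrap library models and a metric suited to the business question instead of raw accuracy.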
In fact, for some well-studied problems, the final choice of technique comes down more to business issues than to technical ones. Two very important business criteria are the following:
- Can an analyst explain to a non-technical manager why a particular solution makes sense?
- If the analysis is not merely a one-time project, is the solution easy to deploy in the long run, so that experts are not needed to apply it to every new situation?