In common practice, statistical models are created by data scientists who use a general-purpose programming language to choose among many modeling packages. Languages keep adding features, and the number of available modeling packages is open-ended. As the technology grows, the conceptual distance between the data scientist and the business end user tends to grow as well. Yet the client's desire to receive actionable information in a plain, intuitive form remains the same.
datadecisions Group asserts that the best approach to modeling must embody three principles.
Identify and understand the business need
The first principle is that everything must begin with the needs of the end user. The end user's need for understanding and insight is most easily met by models built on a single predictive variable (univariate models). A univariate model can be understood at a glance from a curve or a table. Because there may be a thousand or more candidate predictors, a model must be generated automatically for each one.
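To make the idea of automatically generated univariate models concrete, here is a minimal Python sketch. It assumes a binary target (for example, whether a customer responded), numeric predictors, and equal-frequency binning. The function names `univariate_table` and `build_all_univariate` and the sample data are hypothetical, and the source does not say how datadecisions Group actually builds its univariate models. Each resulting table can be read directly by an end user: one row per range of the predictor, showing the observed response rate in that range.

```python
from statistics import mean


def univariate_table(x, y, n_bins=4):
    """Build a univariate model for one predictor as a table of bins.

    Each row is (low, high, count, response_rate). Equal-frequency binning
    is an illustrative assumption; tied x values may straddle a boundary.
    """
    pairs = sorted(zip(x, y))  # order records by predictor value
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:  # more bins than records
            continue
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        rows.append((xs[0], xs[-1], len(chunk), mean(ys)))
    return rows


def build_all_univariate(data, target, n_bins=4):
    """Automatically build one univariate table for every candidate predictor."""
    y = data[target]
    return {name: univariate_table(values, y, n_bins)
            for name, values in data.items() if name != target}


# Hypothetical sample: two candidate predictors and a binary response.
data = {
    "age": [22, 35, 47, 51, 29, 63, 40, 58],
    "income": [30, 52, 61, 75, 40, 90, 55, 80],
    "responded": [0, 0, 1, 1, 0, 1, 1, 1],
}
tables = build_all_univariate(data, "responded", n_bins=2)
for name, rows in tables.items():
    print(name)
    for low, high, count, rate in rows:
        print(f"  {low}-{high}: n={count}, response rate={rate:.2f}")
```

The same loop runs unchanged whether there are two candidate predictors or a thousand, which is the point of automating this step.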
Be resourceful with automation
As much of the work as possible should be automated for the modeler, multiplying his efficiency and productivity. The multivariate model is then the result of automatically and efficiently combining qualified univariate models. The univariate models thus serve as building blocks, each of which the end user can easily understand. More than a decade of experience has shown us that this works very well in practice.
Other decisions can be aided by applying the automation principle wherever possible. For example, given many univariate models, how can we efficiently cull the best of them for the multivariate model? In our approach, the models are entered into a statistical competition so that the best performers can be recommended objectively. The user keeps final control over what goes into the multivariate model, but he should be spared the considerable time and effort of making that selection without the help of automation.
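One way such a competition could be run is sketched below in Python, under stated assumptions: each univariate model has already scored a holdout sample, the target is binary, and models are ranked by area under the ROC curve (AUC). The metric, the `min_auc` cutoff, the `top_k` limit, and all names here are hypothetical illustrations, not the competition the article itself uses.

```python
def auc(scores, y):
    """Area under the ROC curve via the Mann-Whitney pairwise comparison.

    Counts how often a positive record outscores a negative one; ties count half.
    """
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    if not pos or not neg:
        raise ValueError("holdout sample needs both classes")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def run_competition(holdout_scores, y, min_auc=0.55, top_k=2):
    """Rank univariate models by holdout AUC and recommend the strongest.

    holdout_scores maps predictor name -> that model's scores on holdout records.
    Returns the full ranking (for review) and the recommended names.
    """
    results = {name: auc(scores, y) for name, scores in holdout_scores.items()}
    ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
    recommended = [name for name, score in ranked if score >= min_auc][:top_k]
    return ranked, recommended


# Hypothetical holdout data: binary outcomes and each model's scores.
y = [0, 0, 1, 1, 0, 1]
holdout_scores = {
    "age": [0.2, 0.3, 0.7, 0.8, 0.1, 0.6],
    "income": [0.4, 0.6, 0.5, 0.7, 0.3, 0.4],
    "region": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
}
ranked, recommended = run_competition(holdout_scores, y)
for name, score in ranked:
    print(f"{name}: AUC={score:.3f}")
print("recommended:", recommended)
```

Returning the full ranking alongside the recommendation reflects the division of labor described above: the machine proposes, and the user makes the final call.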
Use the correct data
Protecting all parties against incorrect data must be part of the automation. This is a sensible idea that receives surprisingly little attention. For example, when scoring, one should not assume that the data being scored come from the same population as the data that was modeled. With suitable effort, such distributional differences can be detected and reported. It is the modeler’s responsibility to protect the end user from data-processing and other errors that could subvert the entire process.
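One widely used check for this kind of population drift is the Population Stability Index (PSI). For each predictor, PSI compares the scoring data with the development (modeling) data bin by bin: PSI = sum over bins of (a - e) * ln(a / e), where e and a are the development and scoring proportions in that bin. The Python sketch below assumes quartile bins cut from the development data. PSI is a swapped-in technique rather than a method the article names, and the 0.1 and 0.25 thresholds are a common rule of thumb, not a standard.

```python
import math
from bisect import bisect_right


def bin_edges(dev_values, n_bins=4):
    """Interior cut points at quantiles of the development (modeling) data."""
    s = sorted(dev_values)
    return [s[i * len(s) // n_bins] for i in range(1, n_bins)]


def proportions(values, edges, eps=1e-6):
    """Share of values in each bin; eps stands in for empty bins to avoid log(0)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1  # values equal to an edge go up a bin
    return [max(c / len(values), eps) for c in counts]


def psi(dev_values, score_values, n_bins=4):
    """Population Stability Index of the scoring data against the development data."""
    edges = bin_edges(dev_values, n_bins)
    e = proportions(dev_values, edges)
    a = proportions(score_values, edges)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


def stability_flag(value, warn=0.1, alert=0.25):
    """Rule-of-thumb reading of a PSI value (thresholds are a convention)."""
    if value < warn:
        return "stable"
    if value < alert:
        return "monitor"
    return "investigate"


# Hypothetical predictor values: development sample vs. scoring batches.
dev = list(range(100))
mild = list(range(5, 105))       # small upward shift
shifted = list(range(50, 150))   # large upward shift
for label, batch in [("same", dev), ("mild", mild), ("shifted", shifted)]:
    value = psi(dev, batch)
    print(f"{label}: PSI={value:.3f} -> {stability_flag(value)}")
```

Running a check like this on every predictor before scoring, and reporting any that come back as `investigate`, is one way to build the protection described above into the automation rather than leaving it to chance.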
Automation is one aspect of easing the many decisions a modeler must make; with it, the great majority of these choices can be made quickly and naturally with a mouse click. Such common-sense ideas amount to assigning to the machine everything it can do to lighten and speed the modeler's work. Once freed from much of the mechanics of model production, the modeler is better able to pursue the vital work of crafting perceptive predictors that both improve prediction accuracy and offer the end user superior insights.
Conclusion
While the details of model construction can be complex, implementing these three principles provides clear benefits to end users. Recognizing the business areas that can be positively affected enables diverse team members to align on goals and objectives.