There are two well-known, industry-accepted methodologies for predictive analytics, and we strongly recommend that you adopt one for your organization.
The method the Monkey Miners adopted is called CRISP-DM, pictured above. If you use SAS as your analytics software, chances are you are using the SEMMA method for data mining. We do not explain SEMMA here, but plenty of information about it is available on the internet.
Selecting a methodology gives your work a credibility that will help you build trust within your organization. More than that, it gives you a guide that keeps you focused and breaks the work into smaller goals that keep you on track during the months it can take to complete the process.
Business Understanding is probably the most important phase of CRISP-DM, as it lays the groundwork for the entire analytics process. It isn’t just about learning the business process; it’s also about defining the goal and setting expectations for your results. Getting this right does not guarantee a successful result, but if you get it wrong, you will likely spend months on a project that does not end well.
Once you have a goal and you know what you have to do, you move on to Data Understanding: you have to be able to recreate the business logic in the data you are given. You will read all of the documentation on the data you can find, but you will also spend a lot of time experimenting with and testing the data to see what you can do with it.
During this phase you will need to see if you can define the population and the target variable. It’s also a good idea to use the data you have gathered so far to rapidly prototype a test model and see what is achievable.
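To make "population" and "target variable" concrete, here is a minimal sketch in Python. It assumes a made-up customer cancellation problem; the records, the field names (`joined`, `cancelled_on`), and the dates are all hypothetical and only there to illustrate the idea. The population is everyone who could be scored on a given snapshot date, and the target is what happened to them in the window that follows.

```python
from datetime import date

# Hypothetical customer records; every field name and value is an assumption.
customers = [
    {"id": 1, "joined": date(2019, 3, 1), "cancelled_on": date(2021, 2, 10)},
    {"id": 2, "joined": date(2020, 6, 15), "cancelled_on": None},
    {"id": 3, "joined": date(2021, 1, 5), "cancelled_on": None},
    {"id": 4, "joined": date(2018, 9, 9), "cancelled_on": date(2019, 9, 9)},
]

SNAPSHOT = date(2021, 1, 1)      # the day the model would be run
OUTCOME_END = date(2022, 1, 1)   # end of the window we want to predict

def in_population(c):
    """Customer joined before the snapshot and had not cancelled by then."""
    cancelled = c["cancelled_on"]
    return c["joined"] < SNAPSHOT and (cancelled is None or cancelled >= SNAPSHOT)

def target(c):
    """1 if the customer cancelled inside the outcome window, else 0."""
    cancelled = c["cancelled_on"]
    return int(cancelled is not None and SNAPSHOT <= cancelled < OUTCOME_END)

population = [c for c in customers if in_population(c)]
labels = {c["id"]: target(c) for c in population}

# The base rate is the simplest possible "prototype": any model has to beat it.
base_rate = sum(labels.values()) / len(labels)
print(labels, base_rate)
```

If you cannot write these two small functions from the data you have been given, that is a sign you need to go back to the business and clarify the goal.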
During Data Preparation, you will carefully prepare the data and, when necessary, create new data that clarifies the existing data. A large part of this phase goes into using past data to simulate the conditions you will have when the model is run. Expect this phase to take about 80% of the time and effort of the entire process.
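Simulating the conditions you will have when the model is run mostly means being strict about dates: a feature may only use information that was already known on the day of the simulated run. The sketch below is one way to picture that, assuming a hypothetical transaction log; the field names and the `features_as_of` helper are invented for illustration.

```python
from datetime import date

# Hypothetical transaction log; fields and values are assumptions.
transactions = [
    {"customer": 1, "on": date(2020, 11, 3), "amount": 40.0},
    {"customer": 1, "on": date(2020, 12, 20), "amount": 25.0},
    {"customer": 1, "on": date(2021, 1, 15), "amount": 90.0},  # after the snapshot
    {"customer": 2, "on": date(2020, 7, 1), "amount": 10.0},
]

def features_as_of(customer_id, snapshot, rows):
    """Build features from rows dated strictly before the snapshot only."""
    known = [r for r in rows if r["customer"] == customer_id and r["on"] < snapshot]
    last = max((r["on"] for r in known), default=None)
    return {
        "n_purchases": len(known),
        "total_spend": sum(r["amount"] for r in known),
        "days_since_last": (snapshot - last).days if last else None,
    }

snapshot = date(2021, 1, 1)
f1 = features_as_of(1, snapshot, transactions)
print(f1)
```

The purchase on 15 January 2021 is deliberately ignored. Including it would make the model look better on past data than it could ever be in production, where that purchase has not happened yet.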
Modeling is usually considered the most fun phase by modellers. You will try different models, or combinations of models, to find which one provides the most reliable and useful result for your organization.
There are a lot of ways to test a model to make sure it will work: simulation testing on other years, statistical tests and, my personal favourite, pilot testing.
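Simulation testing on other years can be sketched as a small backtest: build the model on earlier years only, then score a later year it has never seen. The example below is a rough illustration, not a recommended model. The `history` rows are made up, and the "model" is just the outcome rate per segment, standing in for whatever you actually built.

```python
from collections import defaultdict

# Hypothetical history of (year, segment, outcome); all values are made up.
history = [
    (2019, "A", 1), (2019, "A", 0), (2019, "B", 0), (2019, "B", 0),
    (2020, "A", 1), (2020, "A", 1), (2020, "B", 0), (2020, "B", 1),
    (2021, "A", 1), (2021, "A", 0), (2021, "B", 0), (2021, "B", 0),
]

def fit_segment_rates(rows):
    """Stand-in 'model': the observed outcome rate for each segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [positives, total]
    for _, segment, outcome in rows:
        counts[segment][0] += outcome
        counts[segment][1] += 1
    return {seg: pos / total for seg, (pos, total) in counts.items()}

def accuracy(rates, rows, threshold=0.5):
    """Share of rows where the thresholded segment rate matches the outcome."""
    hits = sum(
        int(rates.get(seg, 0.0) >= threshold) == outcome
        for _, seg, outcome in rows
    )
    return hits / len(rows)

def backtest(rows, test_year):
    """Train on years before test_year, evaluate on test_year only."""
    train = [r for r in rows if r[0] < test_year]
    test = [r for r in rows if r[0] == test_year]
    return accuracy(fit_segment_rates(train), test)

results = {year: backtest(history, year) for year in (2020, 2021)}
print(results)
```

A model that scores well on the years it was built on but poorly on the held-out years is a reason to go back to an earlier phase before moving on to a pilot.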
Models do not always get to live in a final home that feeds them the same data they were built on. Converting a model to run in a production environment can take a long time, occasionally nearly as long as it took to build the model itself.
Notice all the arrows. The process does not just go forward, it also goes backwards. If you are doing a good job, you will discover things you did not know before, and sometimes a discovery will cause you to go back and make changes to a previous phase. This is a good thing. Predictive Analytics is an “iterative” process, which means that, by design, it goes back to earlier phases when needed.
This is the process that we use, and we refer to it often, which is why we put it on its own page to serve as a continuous reference.