It’s very tricky to do predictive analytics without quality software. Fortunately for all of us, there is a great selection of high-quality software to choose from. Even better news for people who are starting out: the quality of the open-source software is outstanding. So much so that some commercial products are built using open-source software as their foundation, and others allow open-source software to be added to their commercial products as modules. It is an exciting time to get into predictive analytics.
We have a soft spot and an unconcealed bias toward IBM Modeler, because it’s the one we trained on when our organization first purchased it, back when it was originally offered by SPSS as Clementine. As the saying goes, it’s easy to learn and use, but can take years to master. The GUI is very visual and very friendly. You don’t need to know very much about software coding, but a little knowledge of BASIC helps. Similarly, you don’t need to know anything about SQL, but that too can help. IBM Modeler offers a rich and easy-to-use palette of options for the first-time user, and great support, but all of this comes at a cost: it is quite expensive. They do offer a limited free trial if you want to get to know “her” better (we still call her Clementine for sentimental reasons).
SAS Enterprise Miner
For as long as I have been using IBM Modeler, I have heard about SAS, which has also been selling hard to our organization and seems to have just as many fans. If you have a statistics background and have used SAS before, then I can understand the draw. SAS Enterprise Miner offers a more statistical view of the models, which gives an experienced statistician greater understanding of, and access to, their research. That said, I have come to understand that it is not as easy for a new user to learn, as it involves a lot more coding.
If you are looking for open-source analytics software, then RapidMiner usually appears at the top of any search. RapidMiner consistently scores high in third-party reviews and can be used free of charge, depending on how many records you are modelling. After a certain size, the price scales with the number of records. I downloaded this software to review it and found that it also offers a nice GUI and far more modelling options than my first favourite, IBM Modeler. If you are looking to get set up for in-house analytics and you have a smaller budget, I don’t see how you can go wrong with RapidMiner.
Orange is right up there with RapidMiner, and is also very highly recommended. Like RapidMiner, Orange is open source, but there does not appear to be a fee structure for its use. In the next few weeks we intend to review this software in detail and compare it ourselves to the others. For now, we can only tell you that we have not tried it ourselves, but we looked at the reviews and it seems to be a sound alternative.
Knime Analytics Platform
Recently, Alex directed me to Knime Analytics, which I had heard of but never tried. Knime appears to be cloud based, and that alone piques my interest. While the organization I work for does not permit its data to exist on the cloud, I am interested in looking into this further because more and more data is being loaded into the cloud these days.
Underneath many of these platforms lies R, a powerful statistical language in its own right, which is also completely open source. Using R as it is reminds me a lot of the good old days working in DOS. It does not come with a pretty GUI like the other packages; the data is just in there, and all you see is a blinking cursor.
That said, if this does not sound like much fun to you, there are also many, many open-source interfaces for R that add GUI capability, graphing and other tools. In fact, many of the modelling components themselves are offered as individual modules. Just pick and choose what you like. Because so many analytics packages are built on R, having a good understanding of R and its workings can only help you if you choose one of the other analytics tools.
Of course, if you don’t want to learn R, that’s okay too. I’ve managed to get by without it for the last twelve years and I am doing fine so far. Consider this, though: because R is open source, it is moving faster and growing more powerful than many of the mainstream applications, so learning R might give you an extra edge when building your models. I am looking at learning R myself, just for that reason.