Some analytics terminology may be new to some readers, so we are providing answers from our own perspective to make it easier to follow along.
What is Predictive Analytics?
Predictive Analytics is the science of using data to link what happens before to what happens after.
Events do not take place randomly in time; they flow from one to another in a way that tends to make sense. If you have the right data, you can find the links and patterns that connect the events that came before to the events that followed. Once you know that pattern, you can match what took place before against data in the present and arrive at a reasonable expectation of what is likely to happen in the future.
It doesn’t have to be a cause-and-effect relationship. A crowing rooster does not cause the sun to rise, but it reliably comes before sunrise. There only needs to be a link to find and a pattern that is useful.
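To make the idea concrete, here is a minimal sketch in Python, assuming a single numeric "before" measurement and a roughly straight-line relationship to the "after" outcome. The data, the example measures (weekly visits and next-week orders) and the function names `fit_line` and `predict` are hypothetical. The fitted line only captures the link between before and after; like the rooster, it says nothing about cause.

```python
# Minimal predictive-analytics sketch: learn a link from "before" to "after"
# on historical data, then apply it to a present-day "before" value.
# All data and names here are hypothetical, for illustration only.

def fit_line(xs, ys):
    """Ordinary least squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(model, x):
    """Apply the learned before -> after pattern to a new 'before' value."""
    intercept, slope = model
    return intercept + slope * x

# Historical pairs: something measured before (e.g. website visits in a week)
# and the outcome that followed (e.g. orders the next week). Invented numbers.
before = [100, 150, 200, 250, 300]
after = [12, 17, 22, 27, 32]

model = fit_line(before, after)
# Match this week's "before" value against the learned pattern.
print(predict(model, 220))
```

Real work would use more factors and a proper modeling library; on Python 3.10 and later, `statistics.linear_regression` performs the same single-variable fit.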
What is Data Mining?
Data Mining is the discipline of sifting through a mountain of data to find the handful of information you can actually use.
We capture data on almost everything, but almost all of it is of no use when you are trying to answer a simple question. There is data about when databases were created, how many cars are on a given highway, the values of codes that link to other tables and what kind of toothpaste you prefer, none of which matters when researching a cure for cancer. Data mining is the science of selecting the right data, preparing and cleaning it, and turning it into information that serves the person asking the question.
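The three steps of selecting, cleaning and turning data into information can be sketched in a few lines of Python. This is a toy under stated assumptions: the records, the field names (`treatment`, `response`, `db_created`, `toothpaste`) and the question "what is the average response per treatment?" are all hypothetical, chosen to echo the irrelevant data described above.

```python
# Minimal data-mining sketch: select the fields that matter to one question,
# clean them, and summarize them into an answer.
# All field names and values below are hypothetical.
from collections import defaultdict

raw_records = [
    {"patient_id": 1, "treatment": "A", "response": "0.62", "db_created": "2019-01-04", "toothpaste": "mint"},
    {"patient_id": 2, "treatment": "B", "response": "0.48", "db_created": "2019-01-04", "toothpaste": None},
    {"patient_id": 3, "treatment": " a ", "response": "0.70", "db_created": "2019-02-11", "toothpaste": "herbal"},
    {"patient_id": 4, "treatment": "B", "response": None, "db_created": "2019-02-11", "toothpaste": "mint"},
    {"patient_id": 5, "treatment": "B", "response": "0.52", "db_created": "2019-03-20", "toothpaste": "mint"},
]

def select(records, fields):
    """Keep only the fields relevant to the question being asked."""
    return [{f: r.get(f) for f in fields} for r in records]

def clean(records):
    """Drop incomplete rows and normalize inconsistent values."""
    cleaned = []
    for r in records:
        if r["treatment"] is None or r["response"] is None:
            continue  # missing data: skip the row rather than guess
        cleaned.append({
            "treatment": r["treatment"].strip().upper(),  # " a " -> "A"
            "response": float(r["response"]),             # text -> number
        })
    return cleaned

def summarize(records):
    """Turn cleaned data into information: mean response per treatment."""
    groups = defaultdict(list)
    for r in records:
        groups[r["treatment"]].append(r["response"])
    return {t: sum(v) / len(v) for t, v in groups.items()}

answer = summarize(clean(select(raw_records, ["treatment", "response"])))
print(answer)
```

Dropping incomplete rows is only one possible cleaning choice; depending on the question, filling in or flagging missing values may serve better.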
What is Big Data?
Big Data is a popular culture reference that serves only to make people pay more attention to how they put the data they gather to good use. It’s mostly used to sound clever and doesn’t really mean anything. If you need a more definitive answer, look to the five hallmarks of Big Data, which I remember as the five V’s of Big Data: Volume, Velocity, Variety, Value and Veracity.
If you have just one of these factors, you likely do not have Big Data, but if you have a combination of them, you are playing in the Big Data world. Volume is the amount of data. Velocity is how rapidly the data arrives: for example, is it historical or is it live? Variety refers to the number of sources and the range of the data, which are often not linked by any common identifying number or name. Value is the quality of the data; the more missing fields or scrambled records you have, the more likely you are to be in the Big Data arena. Finally, Veracity refers to the accuracy and precision of your data, which is often an issue with data collection and surveys.
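The rule of thumb above (one V on its own probably isn't Big Data, a combination is) can be written down as a simple checklist. This Python sketch is hypothetical: the flag names and the threshold of two V's are illustrative assumptions, not an industry standard.

```python
# Hypothetical checklist for the five V's rule of thumb: a single factor is
# probably not Big Data, a combination of factors is.
# The threshold of two is an illustrative assumption.
FIVE_VS = ("volume", "velocity", "variety", "value", "veracity")

def big_data_factors(flags):
    """Return the V's that apply, given a dict of V name -> bool."""
    unknown = set(flags) - set(FIVE_VS)
    if unknown:
        raise ValueError(f"not one of the five V's: {sorted(unknown)}")
    return [v for v in FIVE_VS if flags.get(v, False)]

def looks_like_big_data(flags, threshold=2):
    """Treat a combination (by default two or more) of the V's as Big Data."""
    return len(big_data_factors(flags)) >= threshold

print(looks_like_big_data({"volume": True}))
print(looks_like_big_data({"volume": True, "velocity": True, "variety": True}))
```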
The bottom line with Big Data is that there is no bottom line. If you don’t have Big Data, that’s fantastic, because we would treat you and your data in exactly the same way as someone with Big Data. Big Data just takes longer.