What can be predicted with Data Science? And what can’t?

Paulo C. Rios Jr.
Mar 20, 2021

Data Science is predictive in a particular way

Some business people grow wary whenever prediction is mentioned. They associate it with fantasy and fake fortune tellers. Furthermore, many don’t understand what can and what cannot be predicted in Data Science. In this post we explain the predictive nature of Data Science in more detail.

Is prediction something that only belongs to science fiction? Of course not, as we have already tried to explain in other posts (for example, see my posts “Data Science is NOT Statistics” and “Can we predict the future?”). But what exactly can be predicted by Data Science? Because of its exploratory and strategic nature, Data Science is able to search for and find trends, patterns and relationships in the data with a specific goal in mind: actionable intelligence. Its findings must hold true for yet unknown data and for the future. And they must be actionable.

Are all its predictions about the future? This is another common source of misunderstanding. What is predicted can be something happening right now. In this case, prediction means finding out something that was previously unknown. Once Data Science finds the hidden relationships among variables in the data, it can predict the result of a given combination of variables, for example by classifying the outcome:

  • Healthcare: a health diagnosis (which disease? will the patient survive or die?)
  • Manufacturing: a critical machinery failure, so that service disruptions can be predicted before they occur
  • Logistics: an efficient or inefficient operation/resource
  • Medicine: a damaging treatment
  • Public health: a disease identification
  • Trading: a decreasing or increasing oil price
  • Retail: a compelling combination of products and/or features
  • Banking: a fraudulent or legitimate transaction
  • Environmental protection: better or worse air quality
  • Utilities: an unplanned outage of power
  • Marketing: a successful or unsuccessful campaign, a good or bad channel
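
To make the idea of classifying an outcome concrete, here is a minimal sketch for the banking case in the list above. It uses a nearest-centroid rule, which is only one of many possible classifiers and is chosen here for brevity. The transactions, the two features and the function names are all made up for illustration.

```python
from math import dist

def fit_centroids(rows, labels):
    """Return the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, row):
    """Assign the row to the class whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], row))

# Hypothetical transactions: (amount in thousands, hour of day as a fraction of 24)
transactions = [(0.02, 0.5), (0.05, 0.6), (0.03, 0.4),
                (2.5, 0.1), (3.0, 0.15), (2.8, 0.05)]
labels = ["legitimate"] * 3 + ["fraudulent"] * 3

centroids = fit_centroids(transactions, labels)
new_label = predict(centroids, (2.6, 0.12))  # a transaction the model has not seen
print(new_label)
```

Note that nothing here is about the future: the new transaction has already happened, and the prediction uncovers something about it that was previously unknown.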

The result of a combination of variables can also be numeric, so the prediction can be a quantity, for example:

  • Marketing: by how much will sales change when certain channels are used?
  • Banking: how many customers with a certain profile will default?
  • Logistics: by how much will costs be reduced by using better routes?
  • Public health: how many more medicines will be needed under different conditions?
  • Healthcare: by how much will blood sugar levels change when a certain drug is used?
  • Manufacturing: how many hours will pass before a machine breaks down?
  • Insurance: how much will customers spend on medical expenses in a given period?
  • Environmental protection: how many more pollutants will be in the air under certain scenarios?
  • Quality control: how many items will have faults under given conditions?
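
A numeric prediction of this kind can be sketched with a straight-line fit by ordinary least squares, here for the marketing question in the list above. The spend and sales figures are invented, and a real problem would rarely be this clean or have only one variable.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: channel spend and resulting sales, in arbitrary units
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.1, 13.9, 16.2, 17.8, 20.0]

slope, intercept = fit_line(spend, sales)
predicted_sales = intercept + slope * 6.0  # estimate for a spend level not yet tried
print(round(slope, 2), round(predicted_sales, 2))
```

The slope answers the "by how much" question: it estimates how much sales change for each extra unit of spend, under the assumption that the relationship is roughly linear.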

Because of its exploratory nature, Data Science can also make a different kind of prediction: what is the structure of this data? How many distinct groups exist? For example:

  • Marketing: how many types of customers?
  • Retail: which products are commonly bought together?
  • Consumer goods: which chemicals bring the best flavor/taste?
  • Public health: how many types of infections in each area?
  • Healthcare: how many types of cancer cell lines?
  • Environmental protection: how many kinds of pollutants in the air/water?
  • Public policy: how many communities will positively respond to our initiative?
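
Finding groups can be sketched with a plain k-means on a single variable, here for the "types of customers" question in the list above. The spend values are invented, the starting centers are picked in a deliberately simple way, and comparing the within-group spread for several values of k is only one rough heuristic for judging how many groups exist.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain k-means on one-dimensional data with a deterministic start."""
    ordered = sorted(values)
    # Spread the starting centers across the sorted data (a simple, arbitrary choice).
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

def spread(centers, groups):
    """Total squared distance of every value to its group center."""
    return sum((v - c) ** 2 for c, g in zip(centers, groups) for v in g)

# Hypothetical yearly spend per customer
yearly_spend = [20, 22, 25, 24, 95, 100, 105, 98, 300, 310, 290, 305]

centers, groups = kmeans_1d(yearly_spend, k=3)
spread_by_k = {k: spread(*kmeans_1d(yearly_spend, k)) for k in range(1, 5)}
print(centers)
print(spread_by_k)
```

In this toy data the spread drops sharply up to three groups and only slightly after that, which suggests three types of customers. No labels were given in advance: the groups come from the structure of the data itself.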

Can Data Science predict everything? No. If there are no relationships among the variables in the data, no trends, no patterns and no groups, that is, if the data is totally random and has no underlying structure, then there is no knowledge to be extracted from it and no prediction can be made. For example, Data Science can’t predict the outcome of the roll of a fair die or of a lottery draw.
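
The die example can be checked with a small simulation. The sketch below "learns" from simulated rolls which face most often follows each face, then tests that rule on fresh rolls. The seed, the sample sizes and the choice of rule are arbitrary assumptions. Because each roll is independent, the rule should do about as well as blind guessing, roughly 1 in 6.

```python
import random
from collections import Counter, defaultdict

rng = random.Random(0)  # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(60_000)]
train, test = rolls[:30_000], rolls[30_000:]

# "Learn" which face most often followed each face in the training rolls.
followers = defaultdict(Counter)
for prev, nxt in zip(train, train[1:]):
    followers[prev][nxt] += 1
best_guess = {prev: counts.most_common(1)[0][0] for prev, counts in followers.items()}

# Apply the learned rule to rolls it has never seen.
hits = sum(best_guess[prev] == nxt for prev, nxt in zip(test, test[1:]))
accuracy = hits / (len(test) - 1)
print(f"accuracy: {accuracy:.3f}, blind guessing: {1 / 6:.3f}")
```

Any apparent pattern found in the training rolls is noise, so it does not carry over to new rolls.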

But for many things that are critical to business, healthcare, retail, marketing, finance, public policy, banking, insurance, public health, quality control, human resource management, environmental protection, trading, manufacturing and logistics, Data Science excels. It helps us shape a more desirable future by giving us better control over the present.

Originally posted on my blog Cyzne.com
