Data Science is predictive in a particular way

Some business people get annoyed when prediction is involved. They believe that prediction is more akin to fantasy and fortune telling. Furthermore, many don’t understand what can and what cannot be predicted in Data Science. In this post we explain in more detail the predictive nature of Data Science.

Is prediction something that only belongs to science fiction? Of course not, as we have already tried to explain in other posts (for example, see my posts “Data Science is NOT Statistics” and “Can we predict the future?”). But what exactly can be predicted by Data Science? Because of…

Many people, especially those working in or with IT (Information Technology) departments, are familiar with software engineering practices and with the typical features of software development projects. Unfortunately many of them have the same expectations for Data Science projects. But they must change some entrenched and old practices. In this post we explain what is unique in the goals that Data Science projects pursue and why these unique aspects make their return on investment much higher.

Software development projects have as their ultimate goal the creation of software with determined features (that were specified in an iterative or not…

Deep Learning is one of the most successful areas of Machine Learning. It has now been applied to problems of very different types. It has also reached levels of accuracy in some of these problems that no other Machine Learning model has ever reached. But what is Deep Learning? Is there more than one type of Deep Learning model? How does Deep Learning differ from other areas of Machine Learning? In this post we answer these questions and more.

Deep Learning, as it is done nowadays, has really nothing to do with the actual simulation of the human neural network…

Data Science initiatives have a low cost base and a high ROI

The best financial aspect of Data Science projects isn’t only their low cost base but especially their large return on investment (ROI). They are one of the best financial investments you can make in your business. In this post we give you more details and explain how they can be run with great financial returns, low initial investment and no hidden costs (now and in the future).

Instead of the very expensive software mandated by large vendors, Data Science projects can be done with open source software that costs nearly nothing. There is a very active community, including some of…

ETL tools can play a major role in your analytics project

ETL tools are an important part of any data analytics or machine learning project, since the required data is usually spread across different data sources. ETL (extract, transform, load) is therefore a much-needed part of the process. In this post we explore the best open source ETL tools available.

The ETL process basically involves:

  • the extraction of data from homogeneous or heterogeneous data sources,
  • the transformation of the data so that it can be stored in the proper format or structure for querying and analytics,
  • the loading of the data into the final target (database, operational data store, data mart…
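
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not one of the ETL tools the post goes on to discuss. The two CSV strings, the table name customer_sales and the function names extract, transform and load are all hypothetical, chosen just to make the flow visible.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for two separate data sources.
CRM_CSV = "customer_id,name\n1, Alice \n2,Bob\n"
SALES_CSV = "customer_id,amount\n1,10.50\n1,4.50\n2,7.00\n"


def extract(text):
    """Extract: read the rows of a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(customers, sales):
    """Transform: clean names, cast types and total the sales per customer."""
    totals = {}
    for row in sales:
        cid = int(row["customer_id"])
        totals[cid] = totals.get(cid, 0.0) + float(row["amount"])
    return [
        (int(c["customer_id"]), c["name"].strip(), totals.get(int(c["customer_id"]), 0.0))
        for c in customers
    ]


def load(rows, conn):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_sales "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO customer_sales VALUES (?, ?, ?)", rows)
    conn.commit()


# An in-memory SQLite database plays the role of the final target here.
conn = sqlite3.connect(":memory:")
load(transform(extract(CRM_CSV), extract(SALES_CSV)), conn)
result = conn.execute(
    "SELECT name, total FROM customer_sales ORDER BY customer_id"
).fetchall()
```

In a real project each step would typically be handled by a dedicated tool and the target would be a proper data store, but the extract, transform, load order stays the same.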

Online programming job interviews with tasks to be performed under critical eyes and within a limited time don’t serve their purpose. There is a much better option.

Online programming interviews with tasks to be performed under critical eyes and within a limited time are the opposite of what they aim to measure.

One of the best books I have read in the past years is “Deep Work” by Cal Newport. In this book, Newport shows how many of the greatest achievements of humankind were accomplished through focused, concentrated work in isolation. …

When working with supervised models, evaluating model performance is a lot easier: you just need to compare the predicted target values with the actual target values.

But unsupervised models don't have any target values, so they seem more challenging to evaluate. That is not always so. Clustering is a case with good performance evaluation metrics.

We can think of what proper clusters should look like. They have:

  • Large distances between the different clusters (that is, they are well separated)
  • Very small distances between observations in the same cluster

Silhouette coefficients are a good way to measure these two desired…
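
As a rough sketch of how the two properties above are combined, the standard silhouette coefficient of an observation is s = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The excerpt is cut off before it shows any code, so the function name silhouette_scores and the four toy points below are assumptions made for illustration, not the author's implementation.

```python
from math import dist
from statistics import mean


def silhouette_scores(points, labels):
    """Return one silhouette coefficient per observation (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for a single-member cluster
            continue
        # a: mean distance to the other members of the same cluster
        # (p itself is in `own` but contributes a distance of 0).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(dist(p, q) for q in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Made-up toy data: two tight groups that are far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_scores(points, [0, 0, 1, 1])  # groups match the geometry
bad = silhouette_scores(points, [0, 1, 0, 1])   # groups cut across it
```

Values close to 1 indicate well-separated, compact clusters, while negative values suggest an observation sits closer to another cluster than to its own. In this toy data the first labeling scores high and the second scores below zero.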

Customers are a key ingredient in the success of nearly any business in the market. Predictive customer analytics can help you to identify your most promising markets and their customers, to reach them through the most effective channels and, once you have acquired them, to offer them the best customer experience to keep them as your customers.

When the potential customer base is large and global, it is costly to search for means to target the right customer at the right price point with the right service and product features. There are also global competitors. Your business should try to…

The main goal of statistical data analysis, that is, data science, is to understand data. Once you understand it, you can also control it and manage it to meet your present and future goals, including business goals.

Statistical data analysis allows us to find relationships, associations, trends and patterns in data. These are things that you cannot see with the naked eye, but the powerful tools of statistics can reveal them, from classic and basic methods like simple linear regression analysis to newer and more advanced methods like support vector machines.
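
To make the simplest of these methods concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The function name fit_line and the five data points are made up for illustration and do not come from the post.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope is the covariance of x and y divided by the variance of x
    # (the shared 1/n factor cancels, so plain sums are enough).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up data that lies exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
```

The fitted slope summarizes the relationship in one number: how much y changes, on average, for each unit change in x.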

If just 20 years ago a feature film like "Toy Story"…

Paulo C. Rios Jr.

Data Science, Machine Learning and AI expert
