Guide to Fitting, Predicting and Creating Functions for Machine Learning Models

4 min read · Dec 25, 2020

In the world of data science, there are a variety of machine learning models at a data scientist’s disposal. Often, we are presented with a dataset and need to identify which ML algorithm suits the data best. While automated tools such as TPOT can handle this search for us, as a student it is useful to fit each model manually in order to fully understand how it works.

The aim of this blog is to walk through how to instantiate, fit, predict with, and preview the results of a machine learning model. After that walk-through, I will show how to create a function that performs the same steps, to help new students build a quicker, more efficient workflow. Let’s get started!

1. Context

The dataset we will explore covers 2015 Flight Delays and Cancellations. The goal of the analysis is to see which factors help predict whether a flight will be delayed or not, in particular whether a specific airport, airline, month, or day of the week influences the result. To start, I examined the distribution in our dataset of cancelled and non-cancelled flights:
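The original chart is not shown here. As a minimal sketch, the class balance could be checked along the following lines; the label list is made-up illustration data, not the 2015 flight dataset.

```python
from collections import Counter

# Hypothetical labels for illustration only: 1 = cancelled, 0 = not cancelled.
labels = [0] * 97 + [1] * 3

counts = Counter(labels)
total = sum(counts.values())

# Print each class with its count and its share of the dataset.
for cls, n in sorted(counts.items()):
    print(f"class {cls}: {n} rows ({n / total:.1%})")
```

If the data lives in a pandas DataFrame, calling `value_counts(normalize=True)` on the target column gives the same shares.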

2. Fitting the Model

I then prepared to fit the model by setting our X and y values, followed by a train-test split. The distribution above shows a strong class imbalance in the dataset, so I applied SMOTE (Synthetic Minority Over-sampling Technique) to mitigate it. See the SMOTE documentation for more details.
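The code for this step is not shown here. The sketch below illustrates the general shape of the step on synthetic data: a stratified train-test split, then oversampling of the training set only. To stay dependency-light it uses a simplified, hand-written SMOTE-style interpolation for a binary target (`smote_like` is a hypothetical helper, not the library implementation). With the imbalanced-learn package installed, the usual call is `SMOTE().fit_resample(X_train, y_train)`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote_like(X, y, minority=1, k=5, seed=0):
    """Simplified SMOTE-style oversampling for a binary target: interpolate
    between a minority sample and one of its k nearest minority neighbours
    until both classes have the same number of rows."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    # Ask for k + 1 neighbours: each point is normally its own nearest one.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the segment, in [0, 1)
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])
    X_res = np.vstack([X, synthetic])
    y_res = np.concatenate([y, np.full(n_new, minority)])
    return X_res, y_res


# Synthetic, imbalanced stand-in data (roughly 90% class 0, 10% class 1).
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=42)

# Stratify so both splits keep the original class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Resample the training data only; the test set stays untouched.
X_train_res, y_train_res = smote_like(X_train, y_train)
print(np.bincount(y_train), "->", np.bincount(y_train_res))
```

Resampling after the split, and only on the training portion, keeps synthetic rows out of the data used for evaluation.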

3. Vanilla Model Example

We use these resampled X and y values to fit the model, but all predictions are made on the original X_train and X_test values. See below for a basic, vanilla example using the AdaBoost algorithm.

The model performs quite well overall, with particularly strong accuracy and precision scores on the test set. Further tuning of the hyperparameters could improve the model. That is outside the scope of this blog post, but it is important to note that so far we have only trained the vanilla model.
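The original code is not shown here; a minimal sketch of a vanilla AdaBoost fit might look like the following. The data is synthetic, and simple random oversampling (via `sklearn.utils.resample`) stands in for SMOTE purely so that the example runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic imbalanced data standing in for the flight dataset.
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Stand-in for SMOTE: duplicate minority rows until the classes balance.
n_major = int((y_train == 0).sum())
X_min_up = resample(X_train[y_train == 1], n_samples=n_major, random_state=42)
X_train_res = np.vstack([X_train[y_train == 0], X_min_up])
y_train_res = np.array([0] * n_major + [1] * n_major)

# Vanilla model: default hyperparameters, fit on the resampled data.
ada = AdaBoostClassifier(random_state=42)
ada.fit(X_train_res, y_train_res)

# Predict on the original, un-resampled features.
train_preds = ada.predict(X_train)
test_preds = ada.predict(X_test)

print("Test accuracy: ", accuracy_score(y_test, test_preds))
print("Test precision:", precision_score(y_test, test_preds))
```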

4. Creating a Function

The process for fitting many machine learning algorithms is the same: fit on the SMOTE values, predict on the original X_train and X_test, and then display the accuracy, precision, recall and f1 scores. Other results can be shown as well, such as the classification report and a confusion matrix. If we were to run several ML models, we would need to rewrite the same code each time.

A much faster and more reader-friendly approach is to create a function that performs all these steps, and then call it on each model that you instantiate.
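For reference, the four scores mentioned above come straight from the confusion-matrix counts. The counts below are invented to make the arithmetic easy to follow.

```python
# Hypothetical confusion-matrix counts for a binary classifier.
tp, fp, fn, tn = 40, 10, 20, 930

accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are real
recall = tp / (tp + fn)                     # share of real positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# prints: accuracy=0.970 precision=0.800 recall=0.667 f1=0.727
```

Here accuracy is 97% even though a third of the positive cases are missed, which is why accuracy alone can mislead on an imbalanced dataset.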

Take a look at the function below:
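The original function is not shown here. The following is a sketch reconstructed from the description in this section, not the author's exact code: the name `evaluate_model`, the argument order, the dictionary keys, the returned dictionary, and the `plot` flag are all assumptions.

```python
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)


def evaluate_model(X_res, y_res, X_train, y_train, X_test, y_test,
                   clf, name, plot=True):
    """Fit clf on the resampled data and score it on the original splits.

    Hypothetical reconstruction; `plot` is an added convenience flag.
    """
    clf.fit(X_res, y_res)
    train_preds = clf.predict(X_train)
    test_preds = clf.predict(X_test)

    # Collect the model type, its name, and all scores in one dictionary.
    result = {"model": type(clf).__name__, "name": name}
    scorers = {"accuracy": accuracy_score, "precision": precision_score,
               "recall": recall_score, "f1": f1_score}
    for split, y_true, y_pred in (("train", y_train, train_preds),
                                  ("test", y_test, test_preds)):
        for metric, scorer in scorers.items():
            result[f"{split}_{metric}"] = scorer(y_true, y_pred)

    print(f"Classification report (train) for {name}:")
    print(classification_report(y_train, train_preds))
    for key, value in result.items():
        if key not in ("model", "name"):
            print(f"{key}: {value:.3f}")

    if plot:
        # Imported lazily so matplotlib is only needed when plotting.
        import matplotlib.pyplot as plt
        fig, axes = plt.subplots(1, 2, figsize=(10, 4))
        for ax, title, y_true, y_pred in (
            (axes[0], "Train", y_train, train_preds),
            (axes[1], "Test", y_test, test_preds),
        ):
            cm = confusion_matrix(y_true, y_pred)
            ConfusionMatrixDisplay(cm).plot(ax=ax)
            ax.set_title(f"{name}: {title}")
        plt.tight_layout()
        plt.show()

    return result
```

Under these assumptions, a call would look like `evaluate_model(X_train_res, y_train_res, X_train, y_train, X_test, y_test, AdaBoostClassifier(), "AdaBoost")`.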

The function takes in several inputs: the X and y SMOTE values, the original X_train, y_train, X_test, and y_test, followed by the model classifier, and then the name of the model.

Next, the model is fit using the SMOTE Xs and Ys. The predictions are then calculated using the original X_train and X_test values. Then, a dictionary called “result” is created, which contains the model type and name, then the accuracy, precision, recall and f1 scores for both the train and test sets.

Finally, the function prints out several items, so that when the function is called, specific things are shown. First, we print the classification report for the train data, followed by the accuracy, precision, recall and f1 scores that were calculated in the dictionary within the function. Lastly, the function creates subplots using Matplotlib to display the confusion matrices for the train and test data.

While this code looks like a lot, it is quite easy to understand once it is used in practice. Below, we call the function on the AdaBoost classifier that we instantiated before.

And that’s it! One simple function was written, and we now have all the results for the model in one place. If you would like to run different machine learning models, simply instantiate the classifier, then run the model through your function, and voilà!

Hopefully this post gave you a solid understanding of how building functions can simplify your modeling process by eliminating the need to write the same code repeatedly. It is worth considering a function whenever you build machine learning models, as it can make your work a lot easier to follow.
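As a rough illustration of that reuse, the sketch below runs a few classifiers through one helper and ranks them. `quick_eval` is a condensed, hypothetical stand-in for the full function, the choice of models is arbitrary, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def quick_eval(clf, name, X_fit, y_fit, X_test, y_test):
    """Condensed stand-in for the full evaluation function."""
    clf.fit(X_fit, y_fit)
    return {"name": name, "test_f1": f1_score(y_test, clf.predict(X_test))}


X, y = make_classification(n_samples=600, weights=[0.8], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

models = {
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# X_train/y_train stand in for the resampled data to keep the sketch short.
results = [quick_eval(clf, name, X_train, y_train, X_test, y_test)
           for name, clf in models.items()]

# Rank the models by test F1, best first.
for row in sorted(results, key=lambda r: r["test_f1"], reverse=True):
    print(f"{row['name']:<20} test F1 = {row['test_f1']:.3f}")
```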

Thank you for reading and good luck with building your machine learning models!

Written by Adina Steinman
