So far, I have covered various topics involving data collection, data mining, data processing and machine learning. While all of these skills matter in the role of a data scientist, it is equally important to know how to present your findings to others in a clear, user-friendly manner. With this in mind, I want to use this blog post as an introduction to the world of data visualization, particularly the use of Tableau.

Tableau is a powerful tool that can help data scientists showcase their datasets, analyses and results. …

Between all of my technically oriented posts, I thought it would be beneficial to speak a little about myself and my choice to pursue the Data Science Program at Flatiron School.

I have always been an analytically oriented person. I began my studies at Brandeis University, focusing on business and economics. My economics courses included statistics for economics and econometrics, among others, and these classes provided me with a strong mathematical foundation. However, my undergraduate degree was in the liberal arts, and while I was exposed to various areas of business and economics, I did not have a specific specialty.

As a result, I…

Oftentimes in machine learning, you are working towards solving a classification problem. For example, will someone default on their credit card payment, or pay it on time? Or, based on specific factors, will a flight be classified as delayed or non-delayed? In order to solve these problems, a dataset needs to contain examples of each possible outcome, otherwise known as each “class”. However, the amount of information available for each class may be uneven. If 90% of your data belongs to one class, and 10% to the other, you are facing the issue of class imbalance. If you…
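A quick way to see whether a dataset is imbalanced is to count how often each class appears. The snippet below is a minimal sketch using made-up labels that mirror the 90%/10% split described above; the `labels` list and the `class_proportions` helper are illustrative assumptions, not part of any particular dataset or library.

```python
from collections import Counter

# Hypothetical labels: 90 on-time payments (0) and 10 defaults (1).
labels = [0] * 90 + [1] * 10


def class_proportions(y):
    """Return each class's share of the dataset as a fraction of all rows."""
    counts = Counter(y)
    total = len(y)
    return {cls: n / total for cls, n in counts.items()}


proportions = class_proportions(labels)
print(proportions)
```

If one class holds the overwhelming share of rows, as it does here, that is a sign that class imbalance may need to be handled before modeling.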

SQL is a powerful tool that is used amongst data analysts and scientists in the tech industry. Building a repertoire that includes both SQL and Python knowledge can really help give an aspiring data scientist a leg up in their job interviews, as well as on the job itself! I have begun my SQL learning journey and have decided to create this short blog post that outlines some of the basic queries that can be done in SQL.

  1. The “SELECT” statement:

SELECT * FROM table_name;

When you have a table in SQL, the most basic thing you can do is…
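To try the `SELECT *` query without setting up a database server, one option is Python's built-in `sqlite3` module. The sketch below assumes a small, made-up table (the `id` and `name` columns and the two rows are purely illustrative) and runs the statement shown above against it.

```python
import sqlite3

# In-memory database, so the example leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, created only for this illustration.
conn.execute("CREATE TABLE table_name (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)

# SELECT * returns every column of every row in the table.
rows = conn.execute("SELECT * FROM table_name;").fetchall()
print(rows)

conn.close()
```

Each row comes back as a tuple with one entry per column.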

Netflix, Spotify, Facebook and many other platforms have powerful algorithms and tools that target specific advertising to their users. What makes these advertisements so powerful is that they are tailored to users’ existing preferences. These giants have comprehensive data that have been used to build sophisticated recommendation systems, resulting in effective, targeted recommendations. But how would a new, up-and-coming startup go about creating a recommendation system? How does a company build this from scratch?

Many companies use collaborative filtering recommendation systems. For example, Dick’s Sporting Goods advertises the message that “The more you shop, the better our recommendations are!” The more customers shop, the more data the company has on users and items, and the better it can use this data to recommend other items or features that improve the customer’s shopping experience.

Collaborative filtering relies on the fact that the company has existing data; otherwise, the recommendation system will suffer from the “cold start problem”, and the company will need to find another way to generate data in order to provide recommendations. For the context…
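To make the idea concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. Everything in it is an assumption made for illustration: the `ratings` dictionary, the user and item names, and the simplified cosine similarity computed only over items both users have rated. It is not how any particular company's system works.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ana": {"tent": 5, "boots": 4, "kayak": 1},
    "ben": {"tent": 4, "boots": 5, "stove": 4},
    "cy": {"kayak": 5, "paddle": 4, "tent": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity over the items both users rated; 0.0 if none are shared."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, all_ratings, top_n=1):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in all_ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(all_ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in all_ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Drop items with no supporting similarity at all.
    return [item for item in ranked if scores[item] > 0][:top_n]


existing_user_recs = recommend("ana", ratings)
# A brand-new user has no ratings, so there is nothing to compare against.
new_user_recs = recommend("dee", {**ratings, "dee": {}})
print(existing_user_recs, new_user_recs)
```

The second call illustrates the cold start problem: with no ratings for the new user, every similarity is zero and the function has nothing to recommend.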

In the world of data science, there are a variety of machine learning models at a data scientist’s disposal. Often, we are presented with a dataset and are required to identify which ML algorithm suits our data best. While there are automated tools such as TPOT that can handle this selection for us, as a student, it is useful to fit each model manually, in order to fully understand how each model works.

The aim of this post is to walk through how to instantiate and fit a machine learning model, generate predictions with it, and preview the results. However, once I walk…
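The instantiate, fit, predict, and preview pattern can be sketched without any external library. The class below is a hypothetical stand-in written for this illustration; it only imitates the fit/predict interface popularized by scikit-learn and is not a real library model. The training and test values are made up.

```python
class NearestCentroid1D:
    """Toy one-feature classifier with a fit/predict interface (illustrative stand-in)."""

    def fit(self, X, y):
        # Learn one centroid (the mean feature value) per class.
        sums, counts = {}, {}
        for x, label in zip(X, y):
            sums[label] = sums.get(label, 0.0) + x
            counts[label] = counts.get(label, 0) + 1
        self.centroids_ = {label: sums[label] / counts[label] for label in sums}
        return self

    def predict(self, X):
        # Assign each value to the class whose centroid is closest.
        return [
            min(self.centroids_, key=lambda c: abs(x - self.centroids_[c]))
            for x in X
        ]


# Hypothetical training and test data.
X_train = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
y_train = ["low", "low", "low", "high", "high", "high"]
X_test = [1.2, 8.8, 3.0]
y_test = ["low", "high", "low"]

model = NearestCentroid1D()          # instantiate
model.fit(X_train, y_train)          # fit
predictions = model.predict(X_test)  # predict

# Preview results: share of test points classified correctly.
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(predictions, accuracy)
```

With a real library model, the same four steps usually apply; only the class being instantiated changes.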

In statistics, a common tool is the linear regression model. As a data scientist, it is easy to get caught up in the technical improvements of a model: improving the R-squared, reducing the RMSE, and removing features with high p-values. However, it is important not to lose sight of the context of your analysis: what do the regression results actually mean? Here we will go through several steps of regression interpretation so that we can understand the results we produce and apply them to the business problem at hand!
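For readers who want to see where numbers like R-squared and RMSE come from, here is a minimal sketch of a one-feature least-squares fit in plain Python. The `x` and `y` values are made up for illustration and are unrelated to any dataset discussed in this post.

```python
from math import sqrt

# Hypothetical data: one predictor and one outcome.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]


def fit_simple_ols(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_simple_ols(x, y)
predicted = [intercept + slope * xi for xi in x]

mean_y = sum(y) / len(y)
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # unexplained variation
ss_tot = sum((yi - mean_y) ** 2 for yi in y)                  # total variation

r_squared = 1 - ss_res / ss_tot  # share of variation in y explained by the line
rmse = sqrt(ss_res / len(y))     # typical prediction error, in the units of y

print(slope, intercept, r_squared, rmse)
```

The slope is the piece with the most direct interpretation: it estimates how much `y` changes, on average, for each one-unit increase in `x`.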


To begin, let’s look at the snapshot below…

The TMDb (The Movie Database) API is powerful and easy to use. New to the world of APIs? Read on for some tips on how to get started!

By now, the middle of quarantine, it is safe to say that I have watched the majority of movies on Netflix. So you can imagine that I was excited to discover that our first Data Science project would involve analyzing movies! And thus began my process of investigating the TMDb API.

Using the tmdbsimple Wrapper

One great thing about the TMDb API is that there is a Python wrapper available, called “tmdbsimple”, which simplifies the use of…

Adina Steinman

Data Science Student at Flatiron School
