In the data science process, we need to do some preprocessing before applying machine learning algorithms. This can include basic data analysis steps such as handling missing values and outliers, and data cleaning. We also apply scaling (a data transformation) to some data.

Scaling is not mandatory, but some machine learning algorithms perform better when the data is scaled first.

The main purpose of scaling is to avoid the effects of greater numeric ranges. …
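
As a rough illustration of the point about numeric ranges, here is a minimal sketch of two common scaling approaches, min-max scaling and standardization, written in plain Python. The function names and the sample `ages` and `incomes` values are made up for this example and do not come from the original article.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no range information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift values to zero mean and scale them to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical features with very different numeric ranges.
ages = [20, 30, 40, 50]
incomes = [20_000, 40_000, 60_000, 100_000]

scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)
standardized_incomes = standardize(incomes)

print(scaled_ages)
print(scaled_incomes)
```

After scaling, both features live in the same range, so the larger raw magnitude of `incomes` no longer dominates distance-based or gradient-based computations.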

We have already touched on the importance of deploying a model and sharing it with others. We need to share our model with stakeholders to collaborate or get feedback. Therefore, we need a web app with powerful, interactive content for the colleagues or clients to whom we want to showcase our work. Streamlit is one of the most practical and fastest ways to create a web app. You can browse the posts below to learn more about Streamlit and see simple model deployment examples.

In this tutorial, we will create a virtual environment. In this virtual environment, we will prepare a Python…

Streamlit is an open-source Python library that makes it easy to build beautiful custom web-apps for machine learning and data science.

Data scientists run the data science process to arrive at a creative solution and a model. They create a product, and at the end of this process they have to share it with their stakeholders in order to collaborate or get feedback. Stakeholders can be customers, colleagues, or anyone else, including people located in another city or even another country.

To share the model, it is necessary to deploy it over the web. Deploying the model is the most…

We previously covered encoding and its importance. In short, machine learning models are mathematical models whose algorithms work with numerical data types, and neural networks work with numerical data types as well. Therefore, we need encoding methods to convert non-numerical data into meaningful numerical data. We have covered the encoding methods and the options for applying them at this link.

In this story, we will look at the Pandas get_dummies method. Pandas get_dummies is the easiest way to implement one-hot encoding, and it has very useful parameters, of which we…
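
To make the idea concrete, here is a minimal sketch of one-hot encoding with `pd.get_dummies`. The DataFrame and its `color` and `price` columns are made up for this example. The parameters shown (`columns`, `prefix`, `drop_first`, `dtype`) are real `get_dummies` arguments, but choosing them here is our assumption and not necessarily the set the article goes on to cover.

```python
import pandas as pd

# Hypothetical data: one categorical column and one numerical column.
df = pd.DataFrame(
    {
        "color": ["red", "green", "blue", "green"],
        "price": [10, 12, 9, 11],
    }
)

# One-hot encode only the "color" column; "price" is left untouched.
# dtype=int gives 0/1 columns instead of booleans.
encoded = pd.get_dummies(df, columns=["color"], prefix="color", dtype=int)

# drop_first=True drops the first category, leaving k-1 indicator columns.
encoded_k1 = pd.get_dummies(df, columns=["color"], drop_first=True, dtype=int)

print(encoded)
print(encoded_k1)
```

Dropping one category with `drop_first=True` is a common choice for linear models, where a full set of indicator columns is redundant with the intercept.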

Label Encoding vs One Hot Encoding

Data science techniques such as machine learning and deep learning models need numerical data. We start our analysis with categorical and numerical data types. When preparing the data for the model, we drop the categorical columns we don’t need, or we use techniques such as regex to extract numerical values from them. By **“categorical data”** I mean non-numeric data such as text, object, datetime, etc. There will always be some columns that we need but from which we cannot extract numerical values with regex or another function. …
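
As an illustration of the regex step mentioned above, here is a minimal sketch that pulls the first number out of a text value using Python's standard `re` module. The helper name `extract_number` and the sample strings are hypothetical. Values with no number in them return `None`, and those are the cases where an encoding method is needed instead.

```python
import re


def extract_number(text):
    """Return the first integer or decimal number found in text, or None."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


# Hypothetical raw values as they might appear in a text column.
raw_values = ["120 hp", "1.5 L", "unknown"]
numeric_values = [extract_number(value) for value in raw_values]

print(numeric_values)
```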

Data analysis is a long process with several steps. First of all, we need to get to know the data, which means understanding every feature in the dataset. Then we must detect the missing values and clean these NaN values out of our dataset. We can fill them with a summary value (mean, median, etc.), or we can write our own function to fill them. We can also drop columns that are not helpful or that contain too many NaN values.
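
The steps above can be sketched with pandas as follows. This is a minimal example on made-up data. The column names and the 50% threshold for dropping a column are assumptions chosen for illustration, not values from the original article.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with missing values.
df = pd.DataFrame(
    {
        "age": [25, np.nan, 35, 40],
        "income": [50_000, 60_000, np.nan, 80_000],
        "notes": [np.nan, np.nan, np.nan, "vip"],
    }
)

# Step 1: detect missing values per column.
missing_counts = df.isna().sum()

# Step 2: drop columns where more than half of the values are missing
# (the 0.5 threshold is an arbitrary choice for this sketch).
mostly_missing = df.columns[df.isna().mean() > 0.5]
cleaned = df.drop(columns=mostly_missing)

# Step 3: fill the remaining NaN values with the mean or the median.
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].mean())
cleaned["income"] = cleaned["income"].fillna(cleaned["income"].median())

print(missing_counts)
print(cleaned)
```

Whether the mean, the median, or a custom function is the right fill strategy depends on the distribution of each feature. The median is usually less sensitive to outliers.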

This process can change. It depends on the data and target. But we must…

Data Scientist | Machine Learning Proficiency | Industrial Engineer