Overfitting using Linear Regression

yoganandha reddy Gali
6 min read · Apr 7, 2021


In this blog, I will share my knowledge of overfitting.

What is Overfitting?

Overfitting is a modelling error that occurs when a model learns from noise: it memorizes each and every example in the training set, and when new data appears, it misclassifies it. Overfitting is more likely to occur with complex models and small datasets. An overfitted model has low training error and high testing error.

We can overcome overfitting by increasing the data size, reducing model complexity, or using regularization.

What is Underfitting?

Underfitting is another modelling error, like overfitting; it occurs when the model is not complex enough to find the relationship between the features and labels. An underfitted model has both high training error and high testing error.

To understand this better, we will use linear regression with polynomial features.

What is Linear Regression?

Linear regression is a simple machine learning algorithm that predicts an output from independent input features. It uses the following formula:

y = a_0 + a_1*X

We will now dive directly into the code and learn more about this along the way.

Importing libraries

NumPy is a mathematical Python library that supports high-dimensional arrays and matrices, along with functions that operate on them.

Matplotlib is a plotting library that helps with plotting graphs.

sklearn (scikit-learn) is a machine learning library for Python that offers many regression and classification algorithms.
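The original post embeds its code as gists. Below is a minimal sketch of the imports the rest of the walkthrough relies on (pandas is included for the weights table shown later):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error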

Now, we will generate 20 pairs of data points using the formula below, where N is Gaussian noise:

y = sin(2*pi*X) + N

Here we generate 20 random pairs of data and split them into train and test pairs.
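A minimal sketch of that step. The seed, the uniform range of X, the noise scale of 0.2, and the alternating train/test split are all assumptions; the post only specifies 20 pairs and the sine-plus-noise formula:

rng = np.random.RandomState(42)  # seed value is an assumption
X = np.sort(rng.uniform(0.0, 1.0, 20))  # 20 random inputs
y = np.sin(2 * np.pi * X) + rng.normal(0.0, 0.2, 20)  # y = sin(2*pi*X) + N

X_train, X_test = X[::2], X[1::2]  # 10 train / 10 test pairs
y_train, y_test = y[::2], y[1::2]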

The normal linear regression equation cannot give accurate values when the data is distributed like this.

So we use linear regression with polynomial features: we fit polynomial equations instead of a linear one. For example:

y = a_0 + a_1*X + a_2*X²  # an example of an order-2 equation

y = a_0  # the order-0 equation
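The fitting loop itself is a gist in the original; the sketch below captures what the text describes, with order 0 handled manually as explained next (the list and dict names are my own):

orders = range(10)
models, weights, train_err, test_err = {}, [], [], []

for order in orders:
    if order == 0:
        # order 0 raised an error for the author, so its "features"
        # are entered by hand as a single constant column of ones
        Xtr = np.ones((len(X_train), 1))
        Xte = np.ones((len(X_test), 1))
    else:
        poly = PolynomialFeatures(degree=order)
        Xtr = poly.fit_transform(X_train[:, np.newaxis])
        Xte = poly.transform(X_test[:, np.newaxis])
    # PolynomialFeatures already adds the bias column, so no extra intercept
    model = LinearRegression(fit_intercept=False).fit(Xtr, y_train)
    models[order] = model
    weights.append(model.coef_)
    train_err.append(mean_squared_error(y_train, model.predict(Xtr)))
    test_err.append(mean_squared_error(y_test, model.predict(Xte)))

grid = np.linspace(0, 1, 200)[:, np.newaxis]  # smooth x-axis for the curves
for order in (0, 1, 3, 9):
    feats = np.ones((200, 1)) if order == 0 else PolynomialFeatures(order).fit_transform(grid)
    plt.plot(grid, models[order].predict(feats), label=f"order {order}")
plt.scatter(X_train, y_train, c="k")
plt.legend()
plt.show()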

In the above code, we generate polynomial features from order 0 to 9 and train a model on each. Then we plot the graphs of the order 0, 1, 3, and 9 models.

Since generating polynomial features for order 0 gives an error, I derived the features for that order manually, in the following way:

The order-0 equation is y = a_0, a constant, so the polynomial features reduce to a single constant column, which I entered by hand for order 0.

We also gathered the weights, the training error, and the testing error of each model in lists.

The output of the above code is shown below.

Here we extract the values from the nested list and display a table of the models and their weights.
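A sketch of that table with pandas (laying it out as one column per model order is my assumption):

table = pd.DataFrame({f"order {o}": pd.Series(w) for o, w in zip(orders, weights)})
print(table.round(2))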

Above is the table of models with their orders and weights.

Here we plot the training and testing errors of each model.
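A sketch of that plot, assuming mean squared error as the metric:

plt.plot(list(orders), train_err, "o-", label="training error")
plt.plot(list(orders), test_err, "o-", label="testing error")
plt.xlabel("polynomial order")
plt.ylabel("mean squared error")
plt.legend()
plt.show()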

We can clearly see that for the order-0 model, both the train and test errors are high, so we can say that the order-0 model is underfitted.

We can see that from the order-5 model to the order-9 model, the train error is low and the test error is high, so these models are overfitted.

Now we will create another 100 data points and see how they look against the order-9 model.
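A sketch of that step, drawing 100 fresh points from the same sine-plus-noise distribution and overlaying the order-9 curve:

X_new = np.sort(rng.uniform(0.0, 1.0, 100))
y_new = np.sin(2 * np.pi * X_new) + rng.normal(0.0, 0.2, 100)

grid = np.linspace(0, 1, 200)[:, np.newaxis]
feats9 = PolynomialFeatures(degree=9).fit_transform(grid)
plt.scatter(X_new, y_new, s=10, label="100 new points")
plt.plot(grid, models[9].predict(feats9), "r-", label="order-9 model")
plt.scatter(X_train, y_train, c="k", label="training points")
plt.legend()
plt.show()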

Here we can see that the model passes exactly through all of its training points, which tells us that it is an overfitted model.

Here we regularize the order-9 model with Ridge regression.
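A sketch of that step with sklearn's Ridge, whose alpha parameter plays the role of lambda; the lambda grid below is assembled from the values mentioned later in the post:

lambdas = [10, 1, 0.1, 0.01, 0.001, 1e-4, 1e-5, 1e-6, 1e-7]
poly9 = PolynomialFeatures(degree=9)
Xtr9 = poly9.fit_transform(X_train[:, np.newaxis])
Xte9 = poly9.transform(X_test[:, np.newaxis])
ridge_models = {lam: Ridge(alpha=lam).fit(Xtr9, y_train) for lam in lambdas}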

Above are the graphs of the models with different lambda values.

What is Ridge Regression?

Ridge regression is also known as L2 regularization.

Ridge regression adds the squared sum of the weights (coefficients) as a penalty term to the loss function. It is used to overcome the overfitting problem.

L2 regularization looks like this:

Loss = Σ(y_i − ŷ_i)² + λ * Σ(a_j)²

where ŷ_i is the model's prediction for the i-th point and a_j are the model's weights.

Ridge regression is linear regression with L2 regularization.

Finding the optimal lambda value is crucial, so we experimented with different lambda values.

Here we plot the training and testing errors with L2 regularization at each lambda value.
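A sketch of that plot, reusing the ridge models fitted above:

ridge_train = [mean_squared_error(y_train, m.predict(Xtr9)) for m in ridge_models.values()]
ridge_test = [mean_squared_error(y_test, m.predict(Xte9)) for m in ridge_models.values()]

plt.plot(np.log10(lambdas), ridge_train, "o-", label="training error")
plt.plot(np.log10(lambdas), ridge_test, "o-", label="testing error")
plt.xlabel("log10(lambda)")
plt.ylabel("mean squared error")
plt.legend()
plt.show()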

So the best unregularized model is the one with order 4, and among the regularized models, the best is the one with lambda = 0.0001.

References

Contribution

I used the above references only for the idea of how to plot subplots using matplotlib, for the theory behind the basic concepts, and for the necessity of np.newaxis. Everything else I did on my own. I created the data pairs using normal and Gaussian distributions. Instead of using pipelines, I used the models directly. I plotted the table of weights using pandas. I applied L2 regularization to the order-9 polynomial features and plotted those graphs.

Challenge

While coding, I faced many challenges. One of them: when I passed the X values to the regression model, I got errors. The error occurred because X has only one feature, and sklearn expects a 2-D array even then. I overcame it by changing the shape of X using np.newaxis.
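For example (illustrative values only):

x = np.array([0.1, 0.2, 0.3])  # shape (3,): 1-D, sklearn's .fit() rejects this
x_2d = x[:, np.newaxis]        # shape (3, 1): 3 samples with 1 feature each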

Another challenge came when I tried to get polynomial features for order zero. To overcome that, I manually entered the polynomial features for that order.

Earlier, when I plotted the graphs, I got zig-zag lines instead of a smooth curve. That happened because the points were created with a random function and were unordered along the x-axis. I solved this by sorting the points before plotting them.
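A sketch of the fix, with hypothetical variable names:

idx = np.argsort(x_points)              # x_points/y_points are hypothetical stand-ins
plt.plot(x_points[idx], y_points[idx])  # line drawn left to right, no zig-zags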

Experiments

Instead of plotting just orders 0, 1, 3, and 9, I plotted every order from 0 to 9 to see how the models behave at each order.

After plotting these graphs, I found that the order-0, order-1, and order-2 models are underfitted, and the models from order 5 to order 9 are overfitted.

To see how the order-9 polynomial model performs with lambda values of 10 and 1/10000000, I plotted graphs for those two values as well.

As a result, I got the above graphs, which show that the models with lambda 10 and lambda 1 produce the same curves, just as the models with lambda 1/100000 and 1/1000000 produce similar ones.

I got different graphs when I changed the seed value of the random generator.

These experiments helped me conclude which model is best.

You can find code here.
