A Gentle Introduction to Logistic Regression

Swagata Ashwani
2 min read · May 24, 2022

Photo by Annie Spratt on Unsplash

What is Logistic Regression?

Logistic Regression is a supervised learning algorithm that predicts a category, such as yes/no or any other set of discrete classes. It is primarily used for binary classification problems but can also be extended to multi-class classification problems.

Real-life examples of Logistic Regression:

Some real-life examples where Logistic Regression is used:

  • Gmail: an email classifier that tells us whether an incoming email should be marked as “spam” or “not spam”.
  • Health care: checking radiological images to predict whether a tumour is benign or malignant.

Types of Logistic Regression:

  • Binary logistic regression: The dependent variable is dichotomous, i.e. it has only two possible outcomes (e.g. 0 or 1).
  • Multinomial logistic regression: The dependent variable has three or more possible outcomes, and these values have no specified order.
  • Ordinal logistic regression: The response variable has three or more possible outcomes, and these values do have a defined order.

Assumptions of Logistic Regression:

Compared with Linear Regression, Logistic Regression makes fewer assumptions, but it does assume the following:

  • There should be no extreme outliers in the data, since they can distort the fitted coefficients.
  • There should be no high correlations (multicollinearity) among the predictors.
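One way to check the multicollinearity assumption is to compute the pairwise correlation between predictors and flag any pair that is strongly correlated. The sketch below uses only the Python standard library. The helper names `pearson` and `correlated_pairs`, the 0.8 cutoff, and the sample data are all assumptions made for illustration and are not part of the original article.

```python
import math
from itertools import combinations


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length columns."""
    # Assumes neither column is constant (a constant column has zero variance).
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns: dict[str, list[float]], cutoff: float = 0.8):
    """List predictor pairs whose absolute correlation exceeds the cutoff."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > cutoff:  # the 0.8 default is an illustrative choice
            flagged.append((name_a, name_b, r))
    return flagged


# Made-up data: height_cm and height_in carry the same information.
predictors = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [40.0, 22.0, 35.0, 51.0, 28.0],
}
for name_a, name_b, r in correlated_pairs(predictors):
    print(name_a, name_b, round(r, 3))
```

In this made-up data only the two height columns are flagged, so one of them could be dropped before fitting the model.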

Logistic Regression Representation:

Like linear regression, logistic regression is a linear model, but instead of predicting a continuous value it predicts a class/category. It is represented using a sigmoid function as follows:

y = 1 / (1 + e^(-(w0 + w1x)))

where y is the predicted probability of the class and w0 + w1x is the linear model within logistic regression.
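To make the representation concrete, here is a minimal Python sketch of the sigmoid applied to the linear model, i.e. 1 / (1 + e^(-(w0 + w1x))). The function names and the weights `w0 = -3.0`, `w1 = 1.5` are hypothetical values picked for illustration. In practice the weights are learned from training data.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number z to a value between 0 and 1."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(x: float, w0: float, w1: float) -> float:
    """Predicted probability y for a single feature value x."""
    return sigmoid(w0 + w1 * x)


# Hypothetical weights chosen only for illustration.
w0, w1 = -3.0, 1.5
for x in (0.0, 2.0, 4.0):
    print(x, round(predict_probability(x, w0, w1), 3))
```

With these weights the linear part w0 + w1x is zero at x = 2, so the predicted probability there is exactly 0.5. Smaller x values give probabilities below 0.5 and larger ones give probabilities above it.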

The final outcome of logistic regression is either 0 or 1, but the model itself calculates a probability between 0 and 1. That is why its output is represented as an S-shaped curve.

Consider an example where the probability of an event occurring is 0.6. Does this output belong to class 0 or class 1?

A threshold is used to categorize the probabilities of logistic regression into discrete classes.

y = 0 if predicted probability < 0.5

y = 1 if predicted probability ≥ 0.5

With this threshold, a predicted probability of 0.6 belongs to class 1.
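The thresholding rule above can be sketched in a few lines of Python. The function name `classify` is hypothetical, and sending a probability exactly equal to the threshold to class 1 is an assumption made here so that every probability gets a label.

```python
def classify(probability: float, threshold: float = 0.5) -> int:
    """Return class 1 if the probability reaches the threshold, else class 0."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Assumption: a probability exactly equal to the threshold goes to class 1.
    return 1 if probability >= threshold else 0


print(classify(0.6))                 # 1: 0.6 is above the default 0.5 cutoff
print(classify(0.6, threshold=0.7))  # 0: a stricter cutoff flips the label
```

The second call shows why the choice of cutoff matters: the same predicted probability can land in either class depending on the threshold.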

In short, to predict the Y label (spam/not spam, cancer/not cancer, fraud/not fraud, etc.), you have to set a probability cutoff, or threshold.

Written by Swagata Ashwani

I love talking Data! Data Scientist with a passion for finding optimized solutions in the AI space. Follow me here: https://www.linkedin.com/in/swagata-ashwani/
