ghConnectHub

How to solve problems with Logistic regression

analyticagh |
Education & Learning
Using predictive models to make decisions is nothing new, quants and statisticians have made use of these models since the beginning of time. But what has changed?, is that still efficient?. With new advancement in technology, big data and the use of AI and Machine learning (ML) being adopted in business, mathematical computations have now become the preserve of computers and machines because they deliver real time, fast and accurate results within a fraction of a second, even with big data. One of such predictive models that is used to make predictions in business is Logistic Regression.

What is Logistic Regression?

Logistic regression (LR) a classification algorithm that is used where the target variable or outcome is of a categorical nature based on a series of independent variables. The main objective behind Logistic Regression is to determine the relationship between features and the probability of a particular outcome.
Logistic regression can be binary or multinomial logistic regression. Binary logistic regression is the most used form of regression where the dependent variable or outcome is binary. Multinomial logistic regression is where the dependent variable has more than two outcomes.

Logistic regression is based on the concept of probabilities.

What is Logistic or Sigmoid function?

In order to map predicted values to probabilities, we use the Sigmoid function. The function maps any real value into ]values between 0 and 1

Diagram

What is Logistic Regression used for?

Logistic regression can be used to compute or predict the probability of a binary event or outcome occurring , and to deal with issues or problems of classification. A binary event is an event that has two outcomes. For example, choosing between Yes /No, On/Off, Male/Female, Rain/No Rain, Spam/Not Spam.


Why do we log values in Regression?

Take for instance income level, it is commonly accepted to be right skewed, because people earn disproportionate large amounts of income, as a result, the mean will be much higher than the median. The skewness in that particular data set may contradict the prevailing norm. Logs help to reduce the magnitude of these values, also, using logs lowers the impact of heteroskedasticity 

 

When should logistic regression be used?

Logistic regression is used when dependent variable or outcome can take only two values , and if the data is linearly separable, it is more efficient to classify it into two separate classes.

Assumptions of Logistic Regression

The following are the main assumptions of logistic regression:
  1. There is little to no multicollinearity between the independent variables.
  2. The independent variables are linearly related to the log odds (log (p/(1-p)).
  3. With binary logistic regression, the dependent variable is dichotomous or binary; it fits into two distinct categories.
  4. The observations or sample data sizes should be large for better results.
  5. There are no outliers (an observation that lies outside or is at an abnormal distance from other values in a random in a sample population).

Linear regression vs. Logistic regression

With logistic regression, it predicts the categorical variable for one or more independent variables, linear regression on the other hand predicts the continuous variable. Simply put, logistic regression provides a constant output, whereas linear regression offers a continuous output. Since the outcome is continuous in linear regression, there are infinite possible values for the outcome. But for logistic regression, the number of possible outcome values is limited and falls between zero and one.

Cost function in Logistic Regression

A Cost function is used to check the performance of a model and help us to reach an optimal solution. In linear regression, we used mean squared error (MSE) as the cost function. But in logistic regression, using the mean of the squared differences between actual and predicted outcomes as the cost function might give a wavy, non-convex outcome displaying many local optima The cost function for logistic regression called log loss which is also derived from the maximum likelihood estimation method.

What is Gradient Descent in Logistic Regression?

Gradient Descent is one of the popular Machine Learning ML & deep learning optimization techniques used in linear regression, logistic regression and neural networks to minimize a cost function. Once the cost function is minimized, it can then be reliably used to make accurate predictions.

Gradient descent is an optimization algorithm used to find the values of parameters (coefficients) of a function that minimizes a cost function. Gradient descent is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm.

There are three types of Gradient Descent Algorithms:
  • Batch Gradient Descent
  • Stochastic Gradient Descent
  • Mini-Batch Gradient Descent

Sample logistic problem: Given the speed, time of transaction and amount, predict if the customer is a fraudulent customer or a legitimate customer?

How to make predictions in Logistic regression

Almost anyone with knowledge and understanding in statistics, mathematics or data science, it would be quite easy for you to make reliable predictions once you have gathered your data. Microsoft Excel, SPSS, Python, R are some of the applications or packages at your disposal that can aid you in making reliable predictions.



Like most individuals who will want to setup quickly, Cloud software solutions is a good and cost effective option and provides the platform for you to train and deploy your model from your dataset.

Below are the notable machine learning Cloud software providers:
  • AWS (Amazon Web Services)
  • Microsoft Azure
  • IBM
  • Google Cloud

What Next? In this blog post, having presented to you the basic concept of Logistic Regression, don't stop here, apply it to make predictions and master it.



References: Wikipedia , Medium -Saishruthi Swaminathan , Medium -Abhinav Mazumdar