
Linear regression: Using math to make business predictions

In a world where there is a cause-and-effect relationship between events, the outcome will more often than not depend on the inputs. How do you find out whether these relationships affect the outcome, and to what extent the activities or inputs influence it? Linear regression provides a solution that helps you make reliable decisions.

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables.

 

The linear regression formula

Y = mX + c

A regression equation expresses a relationship between one or more independent variables (also known as X-variables, regressors, predictors, or right-hand-side (RHS) variables) and a dependent variable (also known as the Y-variable, left-hand-side (LHS) variable, or outcome). Its components are:

  • Dependent variable (Y)
  • Independent variable (X)
  • Intercept (c)
  • Coefficient (m), also known as the slope
Estimating these coefficients enables you to make predictions and quantify the effect of each input. If your regression is unbiased, the fitted Y will be your best estimate of the outcome, for example of how price and advertising affect sales.
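
To make this concrete, here is a minimal sketch in Python (one of the tools mentioned later in this post) that estimates m and c from a small made-up dataset; the numbers are purely illustrative assumptions, not real business data.

```python
import numpy as np

# Hypothetical data: input X (e.g., advertising spend) and outcome Y (e.g., sales)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])

# np.polyfit with degree 1 returns the least-squares slope (m) and intercept (c)
m, c = np.polyfit(X, Y, 1)
print(f"Y = {m:.2f}X + {c:.2f}")

# Predict Y for a new value of X using the fitted line
x_new = 6.0
print("Predicted Y:", m * x_new + c)
```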

[Figure: Linear regression diagram]

Assumptions of Linear regression

The linear regression algorithm assumes a linear relationship between the independent variables and the dependent variable Y.
But why the need for assumptions when using linear regression? If the assumptions do not hold, we cannot reliably use the model to make accurate predictions.

The following assumptions hold for linear regression:
  1. Linear relationship: A linear relationship shows up as a sloped straight line where, as one variable changes, the other changes in proportion. A scatterplot is used to visually inspect linearity. If the relationship displayed in the scatterplot is not linear, the data may need to be transformed (for example, with a log transform) before fitting the model.
  2. Homoscedasticity: The variance of the residuals (error terms) is the same across all values of the independent variables.
  3. Independence: The observations should be independent of one another, i.e., not correlated. Put simply, the degree of similarity between a time series and a lagged version of itself over successive intervals should be negligible. The Durbin-Watson statistic tests for the presence or absence of autocorrelation; it ranges from 0 to 4, with values between 0 and 2 indicating positive autocorrelation, values between 2 and 4 indicating negative autocorrelation, and a value of 2 indicating no autocorrelation (a quick check is sketched after this list).
  4. Normality: The residuals should be normally distributed; on a normal probability (Q-Q) plot, normally distributed residuals fall along a diagonal line.
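
As a quick illustration of checking the independence assumption, the following sketch uses the statsmodels library to fit a model on synthetic data and compute the Durbin-Watson statistic; the data is entirely made up for demonstration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Synthetic data: Y depends linearly on X plus random noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 50)
Y = 3.0 * X + 2.0 + rng.normal(0, 1, 50)

# Fit an OLS model and test the residuals for autocorrelation
model = sm.OLS(Y, sm.add_constant(X)).fit()
dw = durbin_watson(model.resid)
print(f"Durbin-Watson statistic: {dw:.2f}")  # a value near 2 suggests no autocorrelation
```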

How do you test for reliability in Linear Regression?

Various measures of reliability can be used to assess how the dependent and independent variables influence each other. The most common technique used to estimate coefficients in statistics is ordinary least squares (OLS). Some of the techniques and measures used are:
  • Ordinary Least Squares (OLS)
  • Coefficient of Determination (R-squared)
  • Standard Error of Estimates
  • Standard Error of Coefficient
In this post, the OLS technique will be the focus; a quick look at these reliability measures is sketched below.
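
For a rough feel of the measures listed above, this sketch fits an OLS model with statsmodels on synthetic data and reads off the R-squared, the standard errors of the coefficients, and the standard error of the estimate; all numbers are illustrative.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data for demonstration only
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, 40)
Y = 2.5 * X + 4.0 + rng.normal(0, 2, 40)

model = sm.OLS(Y, sm.add_constant(X)).fit()
print("R-squared:", model.rsquared)                      # coefficient of determination
print("Std errors of coefficients:", model.bse)          # standard error of each coefficient
print("Std error of the estimate:", np.sqrt(model.mse_resid))
```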

What is cost function in linear regression?

The cost function measures how close the model's predictions are to the actual outcomes.

In linear regression, we use the sum of squared errors (SSE) or the mean squared error (MSE) as the cost function.

Sum of Squared Errors (SSE): SSE = Σᵢ (yᵢ − ŷᵢ)²
Mean Squared Error (MSE): MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)²
where yᵢ is the actual value, ŷᵢ is the predicted value, and n is the number of observations.

In order to fit the best line through the points on a scatterplot, we use the Sum of Squared Errors (SSE) to compare candidate lines and pick the one that minimizes the error. The errors are the differences between the actual values and the predicted values; a small comparison is sketched below.
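
A minimal sketch of this comparison, using made-up actual values and the predictions of two hypothetical candidate lines:

```python
import numpy as np

y_actual = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_a = np.array([2.8, 5.1, 7.3, 8.7])  # predictions from candidate line A
y_pred_b = np.array([3.5, 4.2, 7.9, 9.6])  # predictions from candidate line B

def sse(y, y_hat):
    return np.sum((y - y_hat) ** 2)   # sum of squared errors

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)  # mean squared error

print("Line A: SSE =", sse(y_actual, y_pred_a), "MSE =", mse(y_actual, y_pred_a))
print("Line B: SSE =", sse(y_actual, y_pred_b), "MSE =", mse(y_actual, y_pred_b))
# The line with the smaller SSE/MSE is the better fit
```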


Optimization Techniques in Linear Regression

Ordinary least squares (OLS): A non-iterative method that fits the model by minimizing the sum of squared (or mean squared) differences between observed and predicted values.

Gradient Descent: Gradient descent can be thought of as the direction you take to reach the least possible error. The model's error differs at different parameter values, and gradient descent iteratively steps in the direction that reduces it fastest.
It is an algorithm used in machine learning and AI to optimize a cost function, that is, to find the parameter values that give the minimum possible error.
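
A minimal gradient descent implementation for the simple Y = mX + c model might look like the following; the data, learning rate, and iteration count are illustrative assumptions, not prescriptions.

```python
import numpy as np

# Hypothetical data roughly following Y = mX + c plus noise
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])

m, c = 0.0, 0.0   # start from arbitrary parameter values
lr = 0.01         # learning rate (step size)
n = len(X)

for _ in range(5000):
    y_hat = m * X + c
    # Gradients of the MSE cost with respect to m and c
    dm = (-2 / n) * np.sum(X * (Y - y_hat))
    dc = (-2 / n) * np.sum(Y - y_hat)
    # Step against the gradient to reduce the error
    m -= lr * dm
    c -= lr * dc

print(f"Y = {m:.2f}X + {c:.2f}")
```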


After optimization with OLS or gradient descent, you find that the SSE value computed earlier has been minimized or significantly reduced.

Sample regression problem: predict the value of sales (Y) given the X-variables, when the selling price of a product is $100 and you spend $5 million on advertising.

How to make predictions in linear regression

Anyone with a working knowledge of statistics, mathematics, or data science can pull this off and make reliable predictions once the data has been gathered. Microsoft Excel, SPSS, Python, and R are some of the applications and packages that can help with these predictions.
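
As a sketch of the sample problem above, the following uses scikit-learn with an entirely hypothetical historical dataset (the numbers are invented for illustration) to predict sales at a $100 selling price and a $5 million advertising spend.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical historical data: each row is [price ($), advertising spend ($ millions)]
X = np.array([[90, 3], [95, 4], [100, 4], [105, 5], [110, 6], [115, 6]])
y = np.array([120, 135, 130, 145, 150, 148])  # sales (thousands of units)

# Fit a linear regression model on the historical data
model = LinearRegression().fit(X, y)

# Predict sales for a $100 price and $5M advertising spend
print("Predicted sales:", model.predict([[100, 5]])[0])
```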



If, like most folks, you want to set up quickly, cloud software solutions are a good, cost-effective option and provide a platform to train and deploy a model from your dataset.

Below are some notable cloud providers offering machine learning services:
  • AWS (Amazon Web Services)
  • Microsoft Azure
  • IBM
  • Google Cloud

What Next?

This blog post has presented the basic concepts of linear regression. Don't stop here; apply it to make predictions and master it.

