Univariate Linear Regression

Regression

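In regression, we fit a line through the data points to predict a continuous target variable from one or more independent explanatory variables. In univariate linear regression there is a single explanatory variable, x.
Let the line be y = α + βx, where α is the intercept and β is the slope.
We use the least squared error approach. We square the error because: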

  1. It makes all errors positive, so positive and negative errors cannot cancel each other out
  2. It diminishes errors whose magnitude is less than 1 and magnifies errors whose magnitude is greater than 1
  3. The squares of the projections of the error along the x and y axes add up to the square of the error itself, for example (r sin θ)² + (r cos θ)² = r²
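The first two reasons can be checked with a few numbers. This is a minimal sketch; the residual values are made up for illustration and do not come from any dataset in this article.

```python
# Illustrative residuals (made-up values, not from the article's data).
errors = [2.0, -2.0, 0.5, -0.5]

signed_total = sum(errors)
squared_total = sum(e ** 2 for e in errors)

print(signed_total)   # 0.0: positive and negative errors cancel
print(squared_total)  # 8.5: squared errors accumulate

# Squaring shrinks magnitudes below 1 and grows magnitudes above 1.
print(0.5 ** 2, 2.0 ** 2)  # 0.25 4.0
```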

Error = y-α-βx
Squared error = (y-α-βx)²
Sum of squared error = ∑(y-α-βx)²
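The sum of squared error can be written as a small function of α and β. This is a sketch; the function name sum_squared_error and the sample points are assumptions chosen for illustration.

```python
def sum_squared_error(alpha, beta, xs, ys):
    """Sum of (y - alpha - beta * x)^2 over all data points."""
    return sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))


# Illustrative points lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

print(sum_squared_error(0.0, 2.0, xs, ys))  # 0.0 for the perfect line
print(sum_squared_error(0.0, 1.0, xs, ys))  # 14.0 for a worse line (1 + 4 + 9)
```

Fitting the regression means finding the pair (α, β) for which this function is smallest.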
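Now we need to minimize the sum of squared error with respect to alpha and beta. We do this by setting its partial derivative with respect to each parameter to zero.
Error minimized with respect to alpha gives:

∑(y-α-βx) = 0, that is, ∑y = nα + β∑x

Error minimized with respect to beta gives:

∑x(y-α-βx) = 0, that is, ∑xy = α∑x + β∑x²

We can solve these two equations simultaneously to get the values of alpha and beta, or else we can solve them by eliminating one variable. To eliminate alpha, multiply the first equation by ∑x and the second by n:

∑x∑y = nα∑x + β(∑x)²    (1)
n∑xy = nα∑x + nβ∑x²    (2)

Now subtracting 2 from 1 gives:

∑x∑y - n∑xy = β((∑x)² - n∑x²)

Now on dividing both sides by n² we get:

x̄ȳ - mean(xy) = β(x̄² - mean(x²))

where x̄ = ∑x/n, ȳ = ∑y/n, mean(xy) = ∑xy/n and mean(x²) = ∑x²/n.

Which can be simplified as:

β = (mean(xy) - x̄ȳ) / (mean(x²) - x̄²)

Hence:

β = Cov(x, y) / Var(x)

For finding alpha, divide the condition ∑y = nα + β∑x by n:

α = ȳ - βx̄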
Python code for implementing Univariate Linear Regression
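A minimal sketch of such an implementation follows, using the closed-form least-squares result β = Cov(x, y) / Var(x) and α = ȳ - βx̄. The class name Regressor matches the name used later in this article; the method names fit and predict and the sample data are assumptions.

```python
class Regressor:
    """Univariate least-squares regression: y = alpha + beta * x."""

    def fit(self, xs, ys):
        n = len(xs)
        if n != len(ys) or n < 2:
            raise ValueError("xs and ys must have the same length (at least 2)")
        x_mean = sum(xs) / n
        y_mean = sum(ys) / n
        # Sums proportional to Cov(x, y) and Var(x); the 1/n factor cancels.
        cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        var_x = sum((x - x_mean) ** 2 for x in xs)
        if var_x == 0:
            raise ValueError("all x values are identical; slope is undefined")
        self.beta = cov_xy / var_x
        self.alpha = y_mean - self.beta * x_mean
        return self

    def predict(self, xs):
        """Return alpha + beta * x for each x."""
        return [self.alpha + self.beta * x for x in xs]


# Illustrative data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

model = Regressor().fit(xs, ys)
print(model.alpha, model.beta)  # 1.0 2.0
print(model.predict([6.0]))     # [13.0]
```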

Regression using scikit-learn
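The same fit can be obtained with scikit-learn's LinearRegression estimator. This is a sketch that assumes NumPy and scikit-learn are installed; the data values and variable names are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data (assumed values).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features).
X = x.reshape(-1, 1)

sk_model = LinearRegression().fit(X, y)
alpha = sk_model.intercept_   # fitted intercept
beta = sk_model.coef_[0]      # fitted slope for the single feature

print(alpha, beta)
print(sk_model.predict(np.array([[6.0]])))  # prediction for x = 6
```

The intercept is fitted by default, so intercept_ corresponds to α and coef_[0] to β.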

Regression using Statsmodels

Standard error

The standard error of the regression (S), also known as the standard error of the estimate, represents the typical distance between the observed values and the regression line. Conveniently, it tells you how wrong the regression model is on average, in the units of the response variable. Smaller values are better because they indicate that the observations are closer to the fitted line.
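Here is the formula for the standard error:

S = √( ∑(y-ŷ)² / (n-2) )

Here ŷ (y hat) denotes the prediction that we have made using the formula y = α + βx, n is the number of observations, and n-2 is the degrees of freedom left after estimating α and β.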
Here is the code to find the standard error using the Regressor class that we have created
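A minimal sketch is shown here. It takes the predictions as a plain list, so it works with the output of any fitted model, including a Regressor-style class. The function name standard_error, the n-2 denominator (the usual degrees of freedom for a line with two estimated parameters), and the sample data are assumptions.

```python
import math


def standard_error(ys, y_preds):
    """Standard error of the regression: sqrt(SSE / (n - 2))."""
    n = len(ys)
    if n != len(y_preds):
        raise ValueError("ys and y_preds must have the same length")
    if n <= 2:
        raise ValueError("need more than 2 observations")
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_preds))
    return math.sqrt(sse / (n - 2))


# Illustrative data; alpha and beta are the least-squares fit for these points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
alpha, beta = 2.2, 0.6

y_preds = [alpha + beta * x for x in xs]
s = standard_error(ys, y_preds)
print(s)  # sqrt(2.4 / 3), roughly 0.894
```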

R Squared (R²)

R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.
It is the percentage of the response variable variation that is explained by a linear model.
Or:
R-squared = Explained variation / Total variation
The formula for R² is:

R² = 1 - ∑(y-ypred)² / ∑(y-ȳ)²

Where:
ypred are our predictions using the regression model
ȳ is the mean of the observed values of y
Here is the Python code to find the R²:
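A minimal sketch, computing R² as one minus the ratio of the residual sum of squares to the total sum of squares. The function name r_squared and the sample data are assumptions.

```python
def r_squared(ys, y_preds):
    """R^2 = 1 - sum((y - y_pred)^2) / sum((y - y_mean)^2)."""
    if len(ys) != len(y_preds):
        raise ValueError("ys and y_preds must have the same length")
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, y_preds))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("y is constant; R^2 is undefined")
    return 1 - ss_res / ss_tot


# Illustrative data; alpha and beta are the least-squares fit for these points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
alpha, beta = 2.2, 0.6

y_preds = [alpha + beta * x for x in xs]
r2 = r_squared(ys, y_preds)
print(r2)  # roughly 0.6: the line explains about 60% of the variation in y
```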

Aayushmaan Jain

A data science enthusiast currently doing bachelor's degree in data science
