AIGuys

Deflating the AI hype and bringing real research and insights on the latest SOTA AI research…

Multivariate Linear Regression

Introduction to Multivariate Linear Regression

In this kind of regression, we have multiple features predicting a single outcome; in other words, a single dependent variable is explained by multiple independent variables.
In this regression, we will use the Gauss-Markov setup, which makes the following assumptions:

  • Errors follow the normal distribution with mean 0 and variance σ²I, i.e. ε ~ N(0, σ²I)
  • The errors are homoscedastic (they have the same variance σ² for all observations)
  • Distinct error terms are uncorrelated

Hence, for a model containing p regressors and n observations, we can write the model in matrix form as:

Y = Xβ + ε

Where:

  • Y is the matrix of responses, of order n×1
  • X is the matrix of independent variables, of order n×p
  • β is the matrix of the coefficients of the regressors, of order p×1
  • ε is the matrix of errors, of order n×1

Multivariate Linear Regression

Now the error can be given by:

ε = Y − Xβ

Since, in matrix notation, the sum of squares of all the elements of a vector a is given by ∑a² = aᵀa, the summation of the squared errors can be written as:

∑ε² = εᵀε

Now substituting ε = Y − Xβ in ∑ε² = εᵀε gives:

εᵀε = (Y − Xβ)ᵀ(Y − Xβ) = YᵀY − βᵀXᵀY − YᵀXβ + βᵀXᵀXβ

Since βᵀXᵀY and YᵀXβ are scalars, and each is the transpose of the other, they are equal, so we can write:

εᵀε = YᵀY − 2βᵀXᵀY + βᵀXᵀXβ

For the ordinary least squares approach, ∑ε² should be minimum, hence (∂∑ε²)/(∂β) = 0:

∂(εᵀε)/∂β = −2XᵀY + 2XᵀXβ = 0

which gives the OLS estimator:

β̂ = (XᵀX)⁻¹XᵀY
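The closed-form solution can be checked numerically. Below is a minimal NumPy sketch on simulated data (the coefficients and sample sizes are illustrative, not from the article) that computes β̂ = (XᵀX)⁻¹XᵀY directly and compares it with NumPy's least-squares solver:

```python
import numpy as np

# Toy data: n = 50 observations, p = 3 regressors plus an intercept column.
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # n x (p+1) design matrix
beta_true = np.array([2.0, 1.0, -3.0, 0.5])
Y = X @ beta_true + rng.normal(scale=0.1, size=n)           # Y = X beta + noise

# Closed-form OLS estimate: beta_hat = (X^T X)^(-1) X^T Y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```

In practice, solvers like `np.linalg.lstsq` are preferred over explicitly inverting XᵀX, since they are more numerically stable when the design matrix is ill-conditioned.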

Proof that the estimator we have found is BLUE (Best Linear Unbiased Estimator)

Plotting the residuals of a fitted model, we see that there are both positive and negative errors centred around zero, and a histogram of the errors resembles a normal distribution. Hence we can safely assume:
ε ~ N(0, σ²I)
where I is an identity matrix of order n×n.
The residuals are uncorrelated, since all the off-diagonal elements of the covariance matrix σ²I are 0, which means the covariance between any two distinct error terms is 0.
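The zero-mean behaviour of the residuals can be illustrated with a small simulation (the data below is made up for illustration). When the model includes an intercept, the OLS residuals sum to exactly zero, so positive and negative errors balance out:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + 2 regressors
beta = np.array([1.0, 2.0, -1.0])
Y = X @ beta + rng.normal(scale=0.5, size=n)

# Fit via the normal equation and inspect the residuals.
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ Y
residuals = Y - X @ beta_hat

# With an intercept column in X, the residuals sum to zero
# (up to floating-point error), since X^T e = 0 includes the ones column.
print(abs(residuals.mean()) < 1e-10)  # True
```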

We now verify that the estimator β̂ = (XᵀX)⁻¹XᵀY satisfies all three BLUE properties:

  • Linear: β̂ = (XᵀX)⁻¹XᵀY is a linear combination of Y.
  • Unbiased: substituting Y = Xβ + ε in β̂ = (XᵀX)⁻¹XᵀY gives β̂ = (XᵀX)⁻¹XᵀXβ + (XᵀX)⁻¹Xᵀε. Since (XᵀX)⁻¹XᵀX = I, the equation becomes β̂ = β + (XᵀX)⁻¹Xᵀε. Since E(ε) = 0, we have E(β̂) = β, so we can say that the estimator is unbiased.
  • Best: let β̃ = [(XᵀX)⁻¹Xᵀ + D]Y be another linear estimator of the regression coefficients, where D is an arbitrary p×n matrix. For β̃ to be an unbiased estimator, E(β̃) = β. Taking expectations, E(β̃) = [(XᵀX)⁻¹Xᵀ + D]Xβ = β + DXβ, so for the estimator to be unbiased, DX = 0. Now Var(β̃) = σ²[(XᵀX)⁻¹Xᵀ + D][(XᵀX)⁻¹Xᵀ + D]ᵀ = σ²(XᵀX)⁻¹ + σ²DDᵀ, using DX = 0. Since DDᵀ is a non-negative definite matrix, Var(β̃) ≥ Var(β̂) = σ²(XᵀX)⁻¹. Since the variance of any other linear unbiased estimator is at least as large as the variance of our estimator, we can conclude that our estimator is the Best Linear Unbiased Estimator.
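Unbiasedness can also be seen empirically with a short Monte Carlo sketch (simulated data; the coefficients and sample sizes are arbitrary): refitting on many independent noise draws, the average of the estimates approaches the true β.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
beta = np.array([3.0, -2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed design matrix

# Refit on many independent noise draws; the average estimate
# should approach the true beta, illustrating E(beta_hat) = beta.
estimates = []
for _ in range(2000):
    Y = X @ beta + rng.normal(size=n)
    estimates.append(np.linalg.inv(X.T @ X) @ X.T @ Y)

print(np.round(np.mean(estimates, axis=0), 2))  # close to [3. -2.]
```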
R Squared (R²)

R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.
It is the percentage of the response variable variation that is explained by a linear model.
Or:
R-squared = Explained variation / Total variation
The formula for R² is:

R² = 1 − ∑(y − y_pred)² / ∑(y − ȳ)²

Where:
y_pred are the predictions from our regression model and ȳ is the mean of the observed values
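This formula is straightforward to compute directly. A small NumPy sketch (the observed values and predictions below are made up for illustration):

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])       # observed values (illustrative)
y_pred = np.array([2.8, 5.1, 7.2, 8.9, 11.0])  # model predictions (illustrative)

ss_res = np.sum((y - y_pred) ** 2)    # unexplained (residual) variation
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
r2 = 1 - ss_res / ss_tot
print(round(r2, 4))  # 0.9975
```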

Adjusted R squared (Adjusted R²)

The adjusted R-squared is a modified version of R-squared that accounts for the number of predictors in the model. The adjusted R-squared increases only if a new term improves the model more than would be expected by chance, and decreases when a predictor improves the model by less than expected by chance.
The formula for adjusted R² is given by:

Adjusted R² = 1 − (1 − R²)(n − 1) / (n − p − 1)

where n is the number of observations and p is the number of regressors.
Python code for Multivariate Linear Regression
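Below is a minimal end-to-end sketch in NumPy, using simulated data: it fits the coefficients via the normal equation derived above and reports both R² and adjusted R². This is an illustrative implementation, not the article's original code.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 100, 3

# Simulated data: p = 3 regressors plus an intercept column (illustrative values).
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([1.5, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# OLS fit via the normal equation: beta_hat = (X^T X)^(-1) X^T y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
y_pred = X @ beta_hat

# R-squared and adjusted R-squared (p counts regressors, excluding the intercept).
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print("coefficients:", np.round(beta_hat, 2))  # close to beta_true
print("R^2:", round(r2, 3), "adjusted R^2:", round(adj_r2, 3))
```

The same fit can be obtained with scikit-learn's `LinearRegression`, which handles the intercept and numerical details internally; the explicit version above is shown to match the derivation in this article.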

Published in AIGuys

Written by Aayushmaan Jain

A data science enthusiast currently pursuing a bachelor's degree in data science
