Correlation

Correlation is a measure of the strength and direction of the linear relationship between two variables. The correlation coefficient is used in various statistical analyses and machine learning algorithms.

Need for correlation

Intuition behind correlation

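To analyze the correlation between two variables x and y, we can first plot the data as a scatter plot of y against x.

Then we draw a horizontal line at y = mean(y) and a vertical line at x = mean(x), which splits the plot into four quadrants.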

Now we can shift the origin to (mean(x), mean(y)), so that each point (x, y) becomes (x - mean(x), y - mean(y)). We then take the product of the abscissa and the ordinate of each point. This product is positive in quadrants 1 and 3 and negative in quadrants 2 and 4. In other words, the product is positive when x and y lie on the same side of their means (y tends to increase as x increases) and negative when they lie on opposite sides (y tends to decrease as x increases). We then sum this product over all the points and divide the sum by the number of points, n, so that the result does not depend on the size of the data series. The quantity we obtain is known as the covariance, which is given by the formula

cov(x, y) = (1/n) * sum((x_i - mean(x)) * (y_i - mean(y)))
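The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `covariance` and the sample data are made up for the example, and the sum is divided by the number of points n, as described in the text.

```python
from statistics import fmean


def covariance(xs, ys):
    """Covariance as described above: the mean of the products of centered coordinates."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_x = fmean(xs)
    mean_y = fmean(ys)
    # Shift the origin to (mean_x, mean_y) and multiply the new coordinates.
    # The product is positive in quadrants 1 and 3 and negative in quadrants 2 and 4.
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    # Divide by the number of points so the result does not grow with the series length.
    return sum(products) / len(products)


# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(covariance(xs, ys))                     # positive: y tends to rise with x
print(covariance(xs, list(reversed(xs))))     # negative: y falls as x rises
```

Note that the value still carries the units of x times the units of y, so it cannot be compared across different data series.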

Now we divide the covariance by the product of the standard deviations of x and y. We do so because

  1. It cancels the units of each column in the bivariate data series, which makes the correlation independent of units so that we can compare any two series
  2. It brings the correlation into the range of -1 to 1, which provides a universal scale for comparing bivariate data series
  3. Since the standard deviation is positive for any non-constant series, it does not change the sign of the covariance

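Hence the formula for correlation is given by

r = cov(x, y) / (std(x) * std(y))

where std(x) and std(y) are the standard deviations of x and y, computed with the same divisor n as the covariance.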
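As a rough sketch of this normalization, the covariance can be divided by the two standard deviations as shown below. The function name `correlation` and the sample data are assumptions made for the example. Both the covariance and the standard deviations use the divisor n.

```python
from math import sqrt
from statistics import fmean


def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations of xs and ys."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x, mean_y = fmean(xs), fmean(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy)) / n
    std_x = sqrt(sum(a * a for a in dx) / n)
    std_y = sqrt(sum(b * b for b in dy) / n)
    if std_x == 0 or std_y == 0:
        # A constant series has no spread, so the ratio is undefined.
        raise ValueError("correlation is undefined for a constant series")
    return cov / (std_x * std_y)


# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 3))  # roughly 0.775 for this data
```

Unlike the covariance, the result has no units, so values from different data series can be compared directly.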

Properties of correlation

  1. It is always between -1 and 1, inclusive
  2. The correlation coefficient is independent of changes of origin and scale: it does not change when you add a constant to, or subtract a constant from, each element of a series, or when you multiply or divide each element by a positive constant. Multiplying or dividing by a negative constant flips the sign of the correlation but leaves its magnitude unchanged
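These properties can be checked numerically. The sketch below uses a small helper, `pearson`, which is a name chosen for this example rather than a library function, along with made-up sample data and arbitrary constants.

```python
from math import isclose, sqrt


def pearson(xs, ys):
    """Correlation coefficient; the 1/n factors cancel, so plain sums are enough."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical sample data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

base = pearson(xs, ys)
shifted = pearson([x + 100 for x in xs], ys)   # change of origin
scaled = pearson([x * 2.5 for x in xs], ys)    # change of scale, positive factor
flipped = pearson([x * -3 for x in xs], ys)    # negative factor flips the sign

print(isclose(base, shifted), isclose(base, scaled), isclose(flipped, -base))
```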

Inference of correlation

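  1. If the correlation coefficient is close to 0, there is little or no linear relationship between x and y. This implies no correlation, but it does not by itself mean that x and y are independent, because a nonlinear relationship can still exist
  2. If the correlation coefficient is close to 1, the value of y increases as the value of x increases. This implies strong positive correlation
  3. If the correlation coefficient is close to -1, the value of y decreases as the value of x increases. This implies strong negative correlation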
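To make the interpretation concrete, the sketch below computes the coefficient for three small, made-up data series: an increasing line, a decreasing line, and a symmetric parabola. The helper name `pearson` is an assumption for this example. The parabola case shows a coefficient near 0 even though y is completely determined by x, so a value near 0 only rules out a linear relationship.

```python
from math import sqrt


def pearson(xs, ys):
    """Correlation coefficient computed from centered sums."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, chosen only for illustration.
xs = [-2, -1, 0, 1, 2]
increasing = [2 * x + 1 for x in xs]    # y rises with x
decreasing = [-2 * x + 1 for x in xs]   # y falls as x rises
parabola = [x ** 2 for x in xs]         # y depends on x, but not linearly

r_up = pearson(xs, increasing)
r_down = pearson(xs, decreasing)
r_curve = pearson(xs, parabola)
print(r_up, r_down, r_curve)  # close to 1, close to -1, close to 0
```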

Demerits of correlation

A data science enthusiast currently pursuing a bachelor's degree in data science

Aayushmaan Jain