The technique of Gradient Descent

  1. Learning rate — Controls how large a step the algorithm takes down the cost surface on each update; a higher learning rate can mean faster descent, but steps that are too large tend to overshoot and miss the local minimum during gradient descent
  2. Epoch — One full pass over the training data; the number of epochs determines how many times the parameters are revised in search of the values that minimise the cost
Figures: variation of the cost as the parameters are changed by gradient descent — the cost decreases as theta is adjusted over successive iterations
Update rule for gradient descent
  1. Theta new — The updated value of theta obtained after an iteration of gradient descent, which yields more accurate predictions
  2. Theta old — The current value of theta, which is to be updated
  3. Negative sign — This guides the algorithm in the right direction: if the derivative of the cost function is positive, the update moves theta to the left, towards decreasing cost; if the derivative is negative, the update moves theta to the right, again towards decreasing cost
  4. Learning rate — Determines the size of the steps taken towards the minimum when searching for the parameters with the least cost
  5. Derivative — The gradient of the cost function with respect to theta; its magnitude also helps determine the size of the steps taken towards the minimum
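Putting these terms together, the update rule takes the standard form (a sketch; alpha denotes the learning rate and J the cost function):

\theta_{new} = \theta_{old} - \alpha \cdot \frac{\partial J(\theta)}{\partial \theta_{old}}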
  1. Linear Regression
Hypothesis function for Linear Regression
  1. Theta 0 is the intercept
  2. Theta 1 is the coefficient (slope) of the regression
  3. X is the input
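With these symbols, the hypothesis for a single input takes the standard linear form:

h_\theta(x) = \theta_0 + \theta_1 x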
Cost function of Linear Regression
  1. m is the number of training examples
  2. h(theta) is the hypothesis function
  3. y is the target variable
  • Positive and negative errors might cancel each other out, so taking the square of the errors prevents this cancellation
  • Squaring shrinks errors that are less than 1 in magnitude and amplifies errors greater than 1 in magnitude, imposing a heavier penalty on large errors and helping the model learn more effectively
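Using the symbols above, the squared-error cost takes the standard form (the 1/2 factor is a common convention, assumed here because it simplifies the derivative):

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2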
Differentiation of cost function with respect to theta 0
Differentiation of cost function with respect to theta 1
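Assuming the cost is defined with the 1/(2m) factor as above, these derivatives work out to the standard expressions:

\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)

\frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \cdot x^{(i)}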
Worked example (shown in the figures): a sample dataset with a column of ones for the intercept, the hypothesis computed as a dot product, the resulting error and cost, and the differentiation of the cost function expressed and evaluated as a dot product
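The dot products in the figures correspond to the usual vectorised expressions, assuming X already carries the column of ones:

h = X\theta, \qquad J(\theta) = \frac{1}{2m} \lVert X\theta - y \rVert^2, \qquad \nabla_\theta J = \frac{1}{m} X^{T}(X\theta - y)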
Python Implementation of Gradient Descent for Linear Regression
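The original code was embedded as an image, so the sketch below is a minimal NumPy reconstruction of the same idea; the function name, learning rate and epoch count are illustrative choices, not the author's exact code.

import numpy as np

def gradient_descent(X, y, learning_rate=0.01, epochs=1000):
    """Batch gradient descent for linear regression.

    X is assumed to already contain a leading column of ones
    (the 'sample dataset with ones' above); y is the target vector.
    """
    m = len(y)
    theta = np.zeros(X.shape[1])              # start with all parameters at zero
    costs = []

    for _ in range(epochs):
        predictions = X.dot(theta)            # hypothesis: h = X . theta
        error = predictions - y               # error vector
        costs.append((error ** 2).sum() / (2 * m))  # squared-error cost

        gradient = X.T.dot(error) / m         # derivative of the cost as a dot product
        theta = theta - learning_rate * gradient    # update rule: theta_new = theta_old - alpha * dJ/dtheta

    return theta, costs

# Illustrative usage on a tiny made-up dataset
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + 1                                 # true relationship: theta0 = 1, theta1 = 2
X = np.column_stack([np.ones_like(x), x])     # add the column of ones
theta, costs = gradient_descent(X, y, learning_rate=0.05, epochs=2000)
print("Learned parameters:", theta)           # should approach [1, 2]
print("Final cost:", costs[-1])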
