800 ML Questions and Answers

Interview book
Author: Taha Heidari
Email: taha.heidari@aalto.fi


  1. What is the Normal Equation in linear regression?

    It is an alternative to the gradient descent algorithm: a closed-form solution that finds the best parameters without any iterations, using only matrix algebra:

    $w = (X^T X)^{-1} X^T y$

    import numpy as np
    # Closed-form solution: w = (X^T X)^(-1) X^T y
    # (np.linalg.solve(X.T @ X, X.T @ y) is numerically more stable)
    w = np.linalg.inv(X.T @ X) @ X.T @ y
    

    where the hypothesis is:
    $h_w(x) = w^T x$
    and the cost function is:
    $J(w) = \frac{1}{2m} \sum_{i=1}^{m} [h_w(x^{(i)}) - y^{(i)}]^2$
    or, in matrix form:
    $J(w) = \frac{1}{2m} (Xw - y)^T (Xw - y)$
    review: $(AB)^T = B^T A^T$
    review: $\frac{\partial (X^T w)}{\partial w} = \frac{\partial (w^T X)}{\partial w} = X$
    $J(w) = \frac{1}{2m} (w^T X^T - y^T)(Xw - y)$
    $J(w) = \frac{1}{2m} (w^T X^T X w - w^T X^T y - y^T X w + y^T y)$
    $\frac{\partial J(w)}{\partial w} = \frac{1}{2m} (2X^T X w - 2X^T y) = 0$
    $X^T X w = X^T y$
    $w = (X^T X)^{-1} X^T y$
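
    A minimal numerical check of the derivation above (the synthetic data and shapes are assumptions for illustration only); the closed-form weights should match NumPy's least-squares solver:

    import numpy as np
    rng = np.random.default_rng(0)
    m, n = 100, 3
    # Assumed synthetic data: bias column plus n random features
    X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
    true_w = np.array([2.0, -1.0, 0.5, 3.0])
    y = X @ true_w + 0.1 * rng.normal(size=m)
    # Normal equation
    w_normal = np.linalg.inv(X.T @ X) @ X.T @ y
    # Reference solution via least squares (numerically more stable)
    w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.allclose(w_normal, w_lstsq))  # True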

  2. What is the batch gradient descent algorithm?

    It refers to the variant of gradient descent that uses all of the training data to compute each parameter update, instead of using only a subset of the data:
    $w = w - \alpha \frac{\partial J(w)}{\partial w}$
    $J(w) = \frac{1}{2m} \sum_{i=1}^{m} [h_w(x^{(i)}) - y^{(i)}]^2$
    where $m$ is **the number of all training examples**.
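
    A minimal sketch of batch gradient descent for linear regression (the learning rate and iteration count below are assumptions, not tuned values):

    import numpy as np

    def batch_gradient_descent(X, y, alpha=0.1, n_iters=1000):
        m, n = X.shape
        w = np.zeros(n)
        for _ in range(n_iters):
            grad = (X.T @ (X @ w - y)) / m  # gradient of J(w) over ALL m examples
            w = w - alpha * grad            # simultaneous update of all parameters
        return w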

  3. What is vectorisation?

    To get a much faster implementation of ML algorithms, we need to vectorise the formulation of the problem, e.g. writing the prediction as a single dot product:
    $\vec{w}^T \vec{x}$
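
    A small sketch contrasting a Python loop with the vectorised dot product (the vector length is an arbitrary assumption); the vectorised call runs in optimized compiled code and is orders of magnitude faster:

    import numpy as np
    rng = np.random.default_rng(0)
    w = rng.normal(size=1_000_000)
    x = rng.normal(size=1_000_000)
    # Loop version: one Python-level multiply-add per element (slow)
    pred_loop = 0.0
    for j in range(len(w)):
        pred_loop += w[j] * x[j]
    # Vectorised version: w^T x as a single call (fast)
    pred_vec = w @ x
    print(np.isclose(pred_loop, pred_vec))  # True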
