800 ML Questions and Answers

Interview book
Author: Taha Heidari
Email: taha.heidari@aalto.fi


  1. What is the Normal Equation in linear regression?

    It is an alternative to the gradient descent algorithm: a closed-form solution that finds the best parameters without any iterations, using only matrix algebra:

    $w = (X^T X)^{-1} X^T y$

    import numpy as np
    # Closed-form solution: w = (X^T X)^(-1) X^T y
    # (np.linalg.solve(X.T @ X, X.T @ y) is numerically more stable)
    w = np.linalg.inv(X.T @ X) @ X.T @ y
    

    where the hypothesis is:
    $h_w(x) = w^T x$
    and the cost function is:
    $J(w) = \frac{1}{2m} \sum_{i=1}^{m} [h_w(x^{(i)}) - y^{(i)}]^2$
    or, in matrix form:
    $J(w) = \frac{1}{2m} (Xw - y)^T (Xw - y)$
    review: $(AB)^T = B^T A^T$
    review: $\frac{\partial (X^T w)}{\partial w} = \frac{\partial (w^T X)}{\partial w} = X$
    $J(w) = \frac{1}{2m} (w^T X^T - y^T)(Xw - y)$
    $J(w) = \frac{1}{2m} (w^T X^T X w - w^T X^T y - y^T X w + y^T y)$
    $\frac{\partial J(w)}{\partial w} = \frac{1}{2m} (2X^T X w - 2X^T y) = 0$
    $X^T X w = X^T y$
    $w = (X^T X)^{-1} X^T y$
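
    A minimal numerical check of the derivation above (the synthetic data and shapes are assumptions for illustration only); the closed-form weights should match NumPy's least-squares solver:

    import numpy as np
    rng = np.random.default_rng(0)
    m, n = 100, 3
    # Assumed synthetic data: bias column plus n random features
    X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])
    true_w = np.array([2.0, -1.0, 0.5, 3.0])
    y = X @ true_w + 0.1 * rng.normal(size=m)
    # Normal equation
    w_normal = np.linalg.inv(X.T @ X) @ X.T @ y
    # Reference solution via least squares (numerically more stable)
    w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.allclose(w_normal, w_lstsq))  # True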

  2. What is the batch gradient descent algorithm?

    It refers to the variant of gradient descent that uses all of the training data to compute each parameter update, instead of using only a subset of the data:
    $w = w - \alpha \frac{\partial J(w)}{\partial w}$
    $J(w) = \frac{1}{2m} \sum_{i=1}^{m} [h_w(x^{(i)}) - y^{(i)}]^2$
    where $m$ is **the number of all training examples**.
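
    A minimal sketch of batch gradient descent for linear regression (the learning rate and iteration count below are assumptions, not tuned values):

    import numpy as np

    def batch_gradient_descent(X, y, alpha=0.1, n_iters=1000):
        m, n = X.shape
        w = np.zeros(n)
        for _ in range(n_iters):
            grad = (X.T @ (X @ w - y)) / m  # gradient of J(w) over ALL m examples
            w = w - alpha * grad            # simultaneous update of all parameters
        return w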

  3. What is vectorisation?

    To get a much faster implementation of ML algorithms, we need to vectorise the formulation of the problem, e.g. writing the prediction as a single dot product:
    $\vec{w}^T \vec{x}$
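
    A small sketch contrasting a Python loop with the vectorised dot product (the vector length is an arbitrary assumption); the vectorised call runs in optimized compiled code and is orders of magnitude faster:

    import numpy as np
    rng = np.random.default_rng(0)
    w = rng.normal(size=1_000_000)
    x = rng.normal(size=1_000_000)
    # Loop version: one Python-level multiply-add per element (slow)
    pred_loop = 0.0
    for j in range(len(w)):
        pred_loop += w[j] * x[j]
    # Vectorised version: w^T x as a single call (fast)
    pred_vec = w @ x
    print(np.isclose(pred_loop, pred_vec))  # True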
