Evaluate a model and decide what to do next.

Model Evaluation

Train Test Split

Split the dataset into 70% for training and 30% for testing.
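A minimal NumPy sketch of a random 70/30 split. It assumes `X` and `y` are NumPy arrays with one example per row. `train_test_split` here is a hypothetical helper written for illustration; scikit-learn ships its own `sklearn.model_selection.train_test_split` with a different signature. The examples are shuffled first so that any ordering in the raw data does not leak into one split.

```python
import numpy as np

def train_test_split(X, y, train_frac=0.7, seed=0):
    """Hypothetical helper: shuffle the examples, then split into train and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))               # random order of example indices
    n_train = int(round(train_frac * len(X)))   # 70% of the examples by default
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Usage: 10 examples -> 7 for training, 3 for testing
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```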

Train/test Procedure for Linear Regression

Fit the parameters by minimizing the cost function $J$ (which includes the regularization term):

$$ J(\mathbf{w},b) = \frac{1}{2m_{train}} \sum\limits_{i = 1}^{m_{train}} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m_{train}} \sum_{j=1}^{n} w_j^2 $$
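The regularized cost above can be computed directly in NumPy. This is a sketch assuming `X` is an (m, n) array of training features, `y` the (m,) targets, `w` an (n,) weight vector and `b` a scalar; `linear_cost_reg` is a hypothetical name. As in the formula, only `w` is penalized, not `b`.

```python
import numpy as np

def linear_cost_reg(X, y, w, b, lambda_):
    """Regularized squared-error cost J(w, b) for linear regression (hypothetical helper)."""
    m = X.shape[0]
    f = X @ w + b                                   # predictions f_wb(x^(i))
    squared_error = np.sum((f - y) ** 2) / (2 * m)  # data term
    reg = lambda_ / (2 * m) * np.sum(w ** 2)        # penalty on w only, not on b
    return squared_error + reg

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
cost_plain = linear_cost_reg(X, y, np.array([2.0]), 0.0, lambda_=0.0)
cost_reg = linear_cost_reg(X, y, np.array([2.0]), 0.0, lambda_=3.0)
print(cost_plain, cost_reg)  # 0.0 2.0
```

Here `w = 2, b = 0` fits the data exactly, so the whole cost with $\lambda = 3$ comes from the penalty $\frac{3}{2 \cdot 3} \cdot 2^2 = 2$.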

Compute Test Error:

$$ J_{test}(\mathbf{w},b) = \frac{1}{2m_{test}} \sum\limits_{i = 1}^{m_{test}} (f_{\mathbf{w},b}(\mathbf{x}_{test}^{(i)}) - y_{test}^{(i)})^2 $$

Compute Train Error:

$$ J_{train}(\mathbf{w},b) = \frac{1}{2m_{train}} \sum\limits_{i = 1}^{m_{train}} (f_{\mathbf{w},b}(\mathbf{x}_{train}^{(i)}) - y_{train}^{(i)})^2 $$

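If the model overfits the training data, $J_{train}(\mathbf{w},b)$ will be low while $J_{test}(\mathbf{w},b)$ will be high, so $J_{test}$ is the better indicator of how well the model generalizes to new examples.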
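A sketch of computing both errors once the parameters are fitted. Note that the regularization term is dropped: $J_{train}$ and $J_{test}$ measure prediction error only. The parameters and data below are hypothetical values chosen so the train error is zero and the test error is not; `linear_error` is a hypothetical helper name.

```python
import numpy as np

def linear_error(X, y, w, b):
    """Unregularized squared-error cost, used for both J_train and J_test."""
    m = X.shape[0]
    f = X @ w + b                      # predictions on this split
    return np.sum((f - y) ** 2) / (2 * m)

# Hypothetical parameters from training, evaluated on both splits
w, b = np.array([2.0]), 0.0
X_train, y_train = np.array([[1.0], [2.0]]), np.array([2.0, 4.0])
X_test, y_test = np.array([[3.0], [4.0]]), np.array([7.0, 9.0])

j_train = linear_error(X_train, y_train, w, b)
j_test = linear_error(X_test, y_test, w, b)
print(j_train, j_test)  # 0.0 0.5
```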

Train/test Procedure for Logistic Regression

Fit the parameters by minimizing the cost function $J$ (which includes the regularization term):

$$ J(\vec{w},b) = -\frac{1}{m_{train}} \sum\limits_{i = 1}^{m_{train}} [y^{(i)}\log(f_{\vec{w}, b}(\vec{x}^{(i)})) + (1 - y^{(i)})\log(1 - f_{\vec{w}, b}(\vec{x}^{(i)}))] + \frac{\lambda}{2m_{train}}\sum_{j=1}^{n} w_j^2 $$
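A NumPy sketch of the regularized logistic cost, assuming the same array shapes as in the linear case and a sigmoid model $f_{\vec{w},b}(\vec{x}) = 1 / (1 + e^{-(\vec{w} \cdot \vec{x} + b)})$. `logistic_cost_reg` is a hypothetical name. The sketch does not clip probabilities, so a prediction of exactly 0 or 1 would make a log term infinite.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_reg(X, y, w, b, lambda_):
    """Regularized cross-entropy cost J(w, b) for logistic regression (hypothetical helper)."""
    m = X.shape[0]
    f = sigmoid(X @ w + b)                     # predicted probabilities f_wb(x^(i))
    loss = -np.sum(y * np.log(f) + (1 - y) * np.log(1 - f)) / m
    reg = lambda_ / (2 * m) * np.sum(w ** 2)   # penalty on w only
    return loss + reg

# With all features zero, every prediction is 0.5, so the data term is log(2)
X = np.array([[0.0], [0.0]])
y = np.array([0.0, 1.0])
cost0 = logistic_cost_reg(X, y, np.array([1.0]), 0.0, lambda_=0.0)
cost2 = logistic_cost_reg(X, y, np.array([1.0]), 0.0, lambda_=2.0)
print(cost0, cost2)  # ≈ 0.6931, ≈ 1.1931
```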

Compute Test Error:

$$ J_{test}(\vec{w},b) = -\frac{1}{m_{test}} \sum\limits_{i = 1}^{m_{test}} [y_{test}^{(i)} \log(f_{\vec{w}, b}(\vec{x}_{test}^{(i)})) + (1 - y_{test}^{(i)})\log(1 - f_{\vec{w}, b}(\vec{x}_{test}^{(i)}))] $$

Compute Train Error:

$$ J_{train}(\vec{w},b) = -\frac{1}{m_{train}} \sum\limits_{i = 1}^{m_{train}} [y_{train}^{(i)}\log(f_{\vec{w}, b}(\vec{x}_{train}^{(i)})) + (1 - y_{train}^{(i)})\log(1 - f_{\vec{w}, b}(\vec{x}_{train}^{(i)}))] $$
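As with linear regression, the train and test errors use the cross-entropy term without the regularization penalty. A sketch with a hypothetical helper `logistic_error` and hypothetical fitted parameters:

```python
import numpy as np

def logistic_error(X, y, w, b):
    """Unregularized cross-entropy cost, used for both J_train and J_test."""
    m = X.shape[0]
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # predicted probabilities
    return -np.sum(y * np.log(f) + (1 - y) * np.log(1 - f)) / m

# Hypothetical parameters evaluated on a tiny test set
w, b = np.array([1.0]), 0.0
X_test = np.array([[2.0], [-2.0]])
y_test = np.array([1.0, 0.0])
j_test = logistic_error(X_test, y_test, w, b)
print(j_test)  # log(1 + e^-2) ≈ 0.1269
```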

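A more common way to measure train and test error for a classification model is the misclassification rate: predict $\hat{y} = 1$ if $f_{\vec{w},b}(\vec{x}) \geq 0.5$ and $\hat{y} = 0$ otherwise. Then $J_{test}(\vec{w}, b)$ is the fraction of the test set that has been misclassified, and $J_{train}(\vec{w}, b)$ is the fraction of the training set that has been misclassified.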
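A sketch of the misclassification rate, assuming a sigmoid model and a 0.5 decision threshold; `misclassification_error` and the data are hypothetical.

```python
import numpy as np

def misclassification_error(X, y, w, b, threshold=0.5):
    """Fraction of examples whose thresholded prediction differs from the label."""
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
    y_hat = (f >= threshold).astype(int)     # predicted class labels
    return np.mean(y_hat != y)

X = np.array([[2.0], [-1.0], [0.5], [-3.0]])
y = np.array([1, 0, 0, 1])
err = misclassification_error(X, y, np.array([1.0]), 0.0)
print(err)  # 0.5
```

The predictions are `[1, 0, 1, 0]`, so the last two examples are wrong and the error is 2/4 = 0.5.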

Model Selection

Train Cross-Validation Test Split

Split the dataset into 60% for training, 20% for cross-validation, and 20% for testing.
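A minimal sketch of the three-way split, extending the shuffle-then-slice idea to three sets. `train_cv_test_split` is a hypothetical helper; whatever remains after the training and cross-validation portions becomes the test set.

```python
import numpy as np

def train_cv_test_split(X, y, train_frac=0.6, cv_frac=0.2, seed=0):
    """Hypothetical helper: shuffle, then split into train / cross-validation / test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(round(train_frac * len(X)))
    n_cv = int(round(cv_frac * len(X)))
    # Cut the shuffled indices at n_train and n_train + n_cv; the rest is the test set
    train, cv, test = np.split(idx, [n_train, n_train + n_cv])
    return (X[train], y[train]), (X[cv], y[cv]), (X[test], y[test])

X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10)
(Xtr, ytr), (Xcv, ycv), (Xte, yte) = train_cv_test_split(X, y)
print(len(Xtr), len(Xcv), len(Xte))  # 6 2 2
```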

Evaluate and Choose

Train all models on the training set.

Evaluate all models on the cross-validation set and pick the one with the lowest cross-validation error.

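Finally, use the test set to give a fair estimate of how well the chosen model generalizes. The estimate is fair because the test set was used neither to fit the parameters nor to choose the model.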
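The three steps can be sketched for a hypothetical choice between polynomial models of degree 1 to 8 on synthetic data. Assumptions: the data are made up, `half_mse` is a hypothetical helper, and each model is fitted with `np.polyfit` (plain least squares, no regularization term) rather than by minimizing the regularized $J$ above. The errors use the unregularized $\frac{1}{2m}$ squared-error formula.

```python
import numpy as np

def half_mse(y_pred, y):
    """Squared-error cost (1/2m) * sum((y_pred - y)^2), without regularization."""
    return np.mean((y_pred - y) ** 2) / 2

# Hypothetical synthetic data: quadratic ground truth plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = 0.5 * x**2 - x + rng.normal(scale=0.5, size=100)

# 60% training, 20% cross-validation, 20% test
idx = rng.permutation(len(x))
tr, cv, te = idx[:60], idx[60:80], idx[80:]

models, cv_errors = {}, {}
for d in range(1, 9):  # candidate models: polynomial degree d = 1..8
    models[d] = np.polyfit(x[tr], y[tr], deg=d)                   # 1. train on training set
    cv_errors[d] = half_mse(np.polyval(models[d], x[cv]), y[cv])  # 2. evaluate on CV set

best_d = min(cv_errors, key=cv_errors.get)                        # pick lowest CV error
test_error = half_mse(np.polyval(models[best_d], x[te]), y[te])   # 3. fair estimate on test set
print(f"chosen degree: {best_d}, J_test = {test_error:.3f}")
```

The degree is chosen using only the cross-validation error, so the test error reported at the end has not been used for any decision.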