Evaluate a model and decide what to do next.
Model Evaluation
Train Test Split
Split the dataset into two parts: 70% for training and 30% for testing.
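The 70/30 split above can be sketched in a few lines of plain Python. The function name `train_test_split`, the `seed` parameter, and the toy data are assumptions made for illustration; shuffling before the cut is a common choice, not something these notes prescribe.

```python
import random

def train_test_split(examples, train_fraction=0.7, seed=0):
    """Shuffle a copy of `examples` and cut it into (train, test)."""
    shuffled = list(examples)
    # Shuffle so that both parts are drawn from the same distribution.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical data: ten (x, y) pairs.
data = [(x, 2 * x + 1) for x in range(10)]
train, test = train_test_split(data)
print(len(train), len(test))  # 7 3
```

Fixing the seed keeps the split reproducible, so train and test errors can be compared across runs.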
Train/test Procedure for Linear Regression
Fit the parameters by minimizing the cost function J (which contains a regularization term) $$ J(\mathbf{w},b) = \frac{1}{2m_{train}} \sum\limits_{i = 1}^{m_{train}} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m_{train}} \sum_{j=1}^{n} w_j^2 $$
Compute Test Error: $$ J_{test}(\mathbf{w},b) = \frac{1}{2m_{test}} \sum\limits_{i = 1}^{m_{test}} (f_{\mathbf{w},b}(\mathbf{x_{test}}^{(i)}) - {y_{test}}^{(i)})^2 $$
Compute Train Error: $$ J_{train}(\mathbf{w},b) = \frac{1}{2m_{train}} \sum\limits_{i = 1}^{m_{train}} (f_{\mathbf{w},b}(\mathbf{x_{train}}^{(i)}) - {y_{train}}^{(i)})^2 $$
If the model overfits the training set, $J_{train}(\mathbf{w},b)$ will be low while $J_{test}(\mathbf{w},b)$ will be high.
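As a rough illustration of the procedure, the sketch below handles the single-feature case ($n = 1$), where the regularized cost has a closed-form minimizer. The function names and the toy data are assumptions. The evaluation function deliberately omits the regularization term, matching the formulas for $J_{test}$ and $J_{train}$.

```python
def fit_ridge_1d(xs, ys, lam):
    """Minimize J(w, b) for f(x) = w*x + b with one feature.

    b is not regularized, so b = mean(y) - w * mean(x) and
    w = Sxy / (Sxx + lam). Assumes Sxx + lam > 0.
    """
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + lam)
    b = y_mean - w * x_mean
    return w, b

def squared_error_cost(xs, ys, w, b):
    """Mean squared error divided by 2, with no regularization term."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Hypothetical split: fit on the training part, evaluate on both parts.
x_train, y_train = [0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8]
x_test, y_test = [4, 5], [9.3, 10.6]
w, b = fit_ridge_1d(x_train, y_train, lam=0.1)
j_train = squared_error_cost(x_train, y_train, w, b)
j_test = squared_error_cost(x_test, y_test, w, b)
print(j_train, j_test)
```

Comparing the two printed values is the diagnostic: a test error far above the training error suggests the model does not generalize well.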
Train/test Procedure for Logistic Regression
Fit the parameters by minimizing the cost function J (which contains a regularization term) $$ J(\vec{w},b) = -\frac{1}{m_{train}} \sum\limits_{i = 1}^{m_{train}} [y^{(i)}\log(f_{\vec{w}, b}(\vec{x}^{(i)})) + (1 - y^{(i)})\log(1 - f_{\vec{w}, b}(\vec{x}^{(i)}))] + \frac{\lambda}{2m_{train}}\sum_{j=1}^{n} w_j^2 $$
Compute Test Error: $$ J_{test}(\vec{w},b) = -\frac{1}{m_{test}} \sum\limits_{i = 1}^{m_{test}} [y_{test}^{(i)} \log(f_{\vec{w}, b}(\vec{x}_{test}^{(i)})) + (1 - y_{test}^{(i)})\log(1 - f_{\vec{w}, b}(\vec{x}_{test}^{(i)}))] $$
Compute Train Error: $$ J_{train}(\vec{w},b) = -\frac{1}{m_{train}} \sum\limits_{i = 1}^{m_{train}} [y_{train}^{(i)}\log(f_{\vec{w}, b}(\vec{x}_{train}^{(i)})) + (1 - y_{train}^{(i)})\log(1 - f_{\vec{w}, b}(\vec{x}_{train}^{(i)}))] $$
A more common way to measure train and test error for a classification model: $J_{test}(\vec{w}, b)$ is the fraction of the test set that is misclassified, and $J_{train}(\vec{w}, b)$ is the fraction of the training set that is misclassified.
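Both ways of scoring a classifier can be sketched as follows. The inputs are the model outputs $f_{\vec{w},b}(\vec{x}^{(i)})$ as probabilities, plus the 0/1 labels. The function names, the 0.5 decision threshold, and the clipping constant `eps` are assumptions made for illustration.

```python
import math

def log_loss(probs, labels, eps=1e-15):
    """Average logistic loss, with no regularization term."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)

def misclassified_fraction(probs, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction differs from the label."""
    wrong = sum(
        1 for p, y in zip(probs, labels)
        if (1 if p >= threshold else 0) != y
    )
    return wrong / len(labels)

# Hypothetical model outputs on a four-example test set.
test_probs = [0.9, 0.2, 0.6, 0.4]
test_labels = [1, 0, 0, 1]
print(log_loss(test_probs, test_labels))
print(misclassified_fraction(test_probs, test_labels))  # 0.5
```

The misclassified fraction is easier to interpret than the logistic loss, which may be why it is the more common choice for reporting.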
Model Selection
Train Cross-Validation Test Split
Split the dataset into 60% for training, 20% for cross-validation, and 20% for testing.
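A minimal sketch of the 60/20/20 split, again in plain Python. The function name, the default fractions as parameters, and the `seed` argument are assumptions; whatever remains after the training and cross-validation parts becomes the test set.

```python
import random

def train_cv_test_split(examples, train_fraction=0.6, cv_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and cut it into (train, cv, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    n_cv = round(len(shuffled) * cv_fraction)
    train = shuffled[:n_train]
    cv = shuffled[n_train:n_train + n_cv]
    test = shuffled[n_train + n_cv:]  # the remainder
    return train, cv, test

# Hypothetical data: one hundred example indices.
train, cv, test = train_cv_test_split(range(100))
print(len(train), len(cv), len(test))  # 60 20 20
```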
Evaluate and Choose
Train all models on the training set.
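Evaluate all models on the cross-validation set and pick the one with the lowest cross-validation error.
Finally, report the chosen model's error on the test set. Because the test set was used neither for fitting nor for choosing the model, this gives a fair estimate of how well the model generalizes.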
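The three steps can be sketched end to end. In this illustration the candidate models are single-feature linear regressions that differ only in the regularization strength $\lambda$. That choice, the function names, and the toy data are all assumptions; the same loop applies to any family of candidates, such as polynomial degrees.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form regularized fit of f(x) = w*x + b (b is not regularized)."""
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + lam)  # assumes sxx + lam > 0
    return w, y_mean - w * x_mean

def squared_error_cost(xs, ys, w, b):
    """Squared-error cost without the regularization term."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))

def select_model(train, cv, test, lambdas):
    """Train on `train`, choose on `cv`, report on `test`."""
    (x_tr, y_tr), (x_cv, y_cv), (x_te, y_te) = train, cv, test
    # 1. Train every candidate on the training set only.
    candidates = [(lam, fit_ridge_1d(x_tr, y_tr, lam)) for lam in lambdas]
    # 2. Keep the candidate with the lowest cross-validation error.
    lam, (w, b) = min(
        candidates, key=lambda c: squared_error_cost(x_cv, y_cv, *c[1])
    )
    # 3. Touch the test set once, for the chosen model only.
    return {
        "lam": lam,
        "w": w,
        "b": b,
        "j_cv": squared_error_cost(x_cv, y_cv, w, b),
        "j_test": squared_error_cost(x_te, y_te, w, b),
    }

# Hypothetical, already-split data.
train = ([0, 1, 2, 3, 4, 5], [1.2, 2.9, 5.1, 7.0, 8.8, 11.1])
cv = ([6, 7], [13.1, 14.8])
test = ([8, 9], [17.2, 18.9])
print(select_model(train, cv, test, lambdas=[0.0, 1.0, 10.0, 100.0]))
```

Note that `j_cv` tends to be an optimistic estimate, since the model was chosen to minimize it, which is why the sketch reports `j_test` separately.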