Prediction error for record \(i\): \(e_i = y_i - \hat{y}_i\), with squared error \(e_i^2 = (y_i - \hat{y}_i)^2\).
Mean Error
\[ME = \frac{1}{n}\sum_{i=1}^n e_i\]
Mean Absolute Error
\[MAE = \frac{1}{n}\sum_{i=1}^n |e_i|\]
Mean Squared Error
\[MSE = \frac{1}{n}\sum_{i=1}^n e^2_i\]
Mean Percentage Error
\[MPE = \frac{1}{n}\sum_{i=1}^n \frac{e_i}{y_i}\]
Mean Absolute Percentage Error
\[MAPE = \frac{1}{n}\sum_{i=1}^n \left|\frac{e_i}{y_i}\right|\]
Root Mean Squared Error
\[RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^n e^2_i}\]
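As a minimal sketch, these measures can be computed directly with NumPy; the arrays `y` and `y_hat` below are hypothetical actuals and predictions used only for illustration:

```python
import numpy as np

# Hypothetical actual values and predictions
y = np.array([10.0, 12.0, 9.0, 15.0, 11.0])
y_hat = np.array([11.0, 11.5, 10.0, 13.0, 11.2])

e = y - y_hat                      # prediction errors e_i

ME   = e.mean()                    # Mean Error (signed; reveals bias)
MAE  = np.abs(e).mean()            # Mean Absolute Error
MSE  = (e ** 2).mean()             # Mean Squared Error
MPE  = (e / y).mean()              # Mean Percentage Error (as a fraction)
MAPE = np.abs(e / y).mean()        # Mean Absolute Percentage Error (as a fraction)
RMSE = np.sqrt(MSE)                # Root Mean Squared Error

print(ME, MAE, MSE, MPE, MAPE, RMSE)
```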
Graphical representations reveal more than metrics alone.
Accuracy measures on the training set describe how well the model fits the data it was built on.
Accuracy measures on the test (holdout) set reflect the model's ability to predict new data; a sketch of this distinction follows.
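The same measure (RMSE here) can be computed on both partitions and compared; the helper name and toy actual/predicted values below are assumptions for illustration:

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error of predictions y_hat against actuals y."""
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return np.sqrt((e ** 2).mean())

# Hypothetical actuals/predictions on the two partitions
rmse_train = rmse([10, 12, 9, 15], [10.2, 11.8, 9.5, 14.6])  # model fit
rmse_test  = rmse([11, 8, 14],     [12.5, 9.8, 11.9])        # performance on new data

print(rmse_train, rmse_test)  # a large gap suggests overfitting
```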
Naive approach: relies solely on \(Y\), ignoring all predictor information.
Outcome: Numeric
Naive Benchmark: Average (\(\bar{Y}\))
A good prediction model should outperform the benchmark criterion in terms of predictive accuracy.
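A sketch of that comparison for a numeric outcome, where the naive forecast for every test record is the training average \(\bar{Y}\); all data below are hypothetical:

```python
import numpy as np

def rmse(y, y_hat):
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return np.sqrt((e ** 2).mean())

# Hypothetical data
y_train    = np.array([10.0, 12.0, 9.0, 15.0, 11.0])
y_test     = np.array([13.0, 8.0, 14.0])
model_pred = np.array([12.1, 9.2, 12.8])           # hypothetical model predictions

naive_pred = np.full_like(y_test, y_train.mean())  # benchmark: predict Y-bar for every record

print("model RMSE:", rmse(y_test, model_pred))
print("naive RMSE:", rmse(y_test, naive_pred))     # the model should come out lower
```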
Columns give the actual class; rows give the predicted class.

Predicted | Positive | Negative |
---|---|---|
Positive | A - TP | B - FP |
Negative | C - FN | D - TN |
\(A\) - True Positive (TP)
\(B\) - False Positive (FP)
\(C\) - False Negative (FN)
\(D\) - True Negative (TN)
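A minimal sketch of extracting the four cells from actual/predicted class vectors; the label vectors are hypothetical:

```python
import numpy as np

# Hypothetical actual and predicted classes (1 = positive, 0 = negative)
actual    = np.array([1, 1, 0, 1, 0, 0, 1, 0])
predicted = np.array([1, 0, 0, 1, 1, 0, 1, 0])

A = np.sum((predicted == 1) & (actual == 1))   # true positives
B = np.sum((predicted == 1) & (actual == 0))   # false positives
C = np.sum((predicted == 0) & (actual == 1))   # false negatives
D = np.sum((predicted == 0) & (actual == 0))   # true negatives

print(np.array([[A, B],
                [C, D]]))   # rows = predicted, columns = actual
```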
\[Sensitivity = \frac{A}{A+C}\]
\[Specificity = \frac{D}{B+D}\]
\[Prevalence = \frac{A+C}{A+B+C+D}\]
\[\text{Detection Rate} = \frac{A}{A+B+C+D}\]
\[\text{Detection Prevalence} = \frac{A+B}{A+B+C+D}\]
\[\text{Balanced Accuracy} = \frac{Sensitivity + Specificity}{2}\]
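Given the four cell counts, the rates above follow directly; the counts used here are hypothetical:

```python
# Hypothetical confusion-matrix counts (rows = predicted, columns = actual)
A, B, C, D = 40, 10, 5, 45          # TP, FP, FN, TN
n = A + B + C + D

sensitivity          = A / (A + C)
specificity          = D / (B + D)
prevalence           = (A + C) / n
detection_rate       = A / n
detection_prevalence = (A + B) / n
balanced_accuracy    = (sensitivity + specificity) / 2

print(sensitivity, specificity, prevalence,
      detection_rate, detection_prevalence, balanced_accuracy)
```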
\[Precision = \frac{A}{A+B}\]
\[Recall = \frac{A}{A+C}\]
\[F_1 = \frac{2 \times (\text{precision} \times \text{recall})}{\text{precision} + \text{recall}}\] The \(F_1\) score is the harmonic mean of precision and recall: it reaches its best value at 1 and its worst at 0, and precision and recall contribute to it equally.
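A sketch of precision, recall, and \(F_1\), reusing the hypothetical cell counts from above:

```python
A, B, C = 40, 10, 5                  # TP, FP, FN (hypothetical counts)

precision = A / (A + B)
recall    = A / (A + C)
f1 = 2 * (precision * recall) / (precision + recall)   # harmonic mean of precision and recall

print(precision, recall, f1)
```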
\[F_\beta = \frac{(1+\beta^2) \times (\text{precision} \times \text{recall})}{(\beta^2 \times \text{precision}) + \text{recall}}\]
Weighted harmonic mean of the precision and recall, reaching its optimal value at 1 and worst value at 0.
The beta parameter determines the weight of recall in the combined score.
\(\beta < 1\): gives more weight to precision
\(\beta > 1\): favors recall
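A sketch of the weighted score; `f_beta` is a helper defined here for illustration, and the precision/recall values are the hypothetical ones from above:

```python
def f_beta(precision, recall, beta):
    """Weighted harmonic mean of precision and recall."""
    return ((1 + beta**2) * precision * recall) / (beta**2 * precision + recall)

precision, recall = 0.8, 0.889       # hypothetical values

print(f_beta(precision, recall, beta=0.5))  # beta < 1: emphasizes precision
print(f_beta(precision, recall, beta=1.0))  # beta = 1: plain F1
print(f_beta(precision, recall, beta=2.0))  # beta > 1: favors recall
```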
Positive Predictive Value (PPV)
\[PPV = \frac{sensitivity \times prevalence}{(sensitivity \times prevalence)+((1-specificity)\times (1-prevalence))}\]
Negative Predictive Value (NPV)
\[NPV = \frac{specificity \times (1-prevalence)}{( (1-sensitivity) \times prevalence)+(specificity \times (1-prevalence))}\]
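A sketch of both values computed from sensitivity, specificity, and prevalence; the inputs are the hypothetical values from the earlier counts:

```python
# Hypothetical values (consistent with A, B, C, D = 40, 10, 5, 45)
sensitivity, specificity, prevalence = 0.889, 0.818, 0.45

ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence))

npv = (specificity * (1 - prevalence)) / (
    (1 - sensitivity) * prevalence + specificity * (1 - prevalence))

print(ppv, npv)
```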
\[TPR = \frac{TP}{TP+FN}\]
\[FPR = \frac{FP}{FP+TN}\]
Area Under the ROC Curve (AUC): the ROC curve plots TPR against FPR as the classification threshold varies.
Perfect classifier: \(AUC = 1\)
Random classifier: \(AUC = 0.5\)
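A sketch using scikit-learn's `roc_curve` and `roc_auc_score`; the labels and predicted probabilities below are hypothetical:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and predicted probabilities of the positive class
y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.65, 0.8, 0.4, 0.3, 0.55, 0.7])

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # points on the ROC curve
auc = roc_auc_score(y_true, y_score)                # area under that curve

print(auc)   # 1.0 = perfect ranking, 0.5 = no better than random
```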