Tuesday 12 May 2020

Evaluation Metrics of Models

My Takeaways

1. The metrics covered are for Regression and Classification (mostly Classification); metrics for Clustering are not covered.

2. RMSE (Root Mean Squared Error) suits Regression. RMSLE (Root Mean Squared Logarithmic Error) is also a Regression metric, not a Classification one as I previously understood; it penalizes relative rather than absolute error, so it suits targets that span a wide range. Variance was also used to compare models. A quick sketch of the two is below.
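
A minimal sketch of the difference, with hypothetical numbers I made up: RMSLE treats a miss as a ratio, so one large-valued target does not dominate the score the way it does for RMSE.

```python
import numpy as np

y_true = np.array([100.0, 200.0, 3000.0])   # hypothetical targets
y_pred = np.array([110.0, 180.0, 2500.0])   # hypothetical predictions

# RMSE: absolute errors, so the 500-unit miss on the 3000 target dominates
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

# RMSLE: log1p turns errors into ratios (and tolerates zeros);
# assumes non-negative targets
rmsle = np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

print(f"RMSE:  {rmse:.3f}")
print(f"RMSLE: {rmsle:.3f}")
```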

The below measures are for classification. The main concern they address seems to be the gap in classification: each item should be classified as this or that based on its labels or attributes, even when the classes are spread in a curvilinear, not linearly separable, way.

Static Measures - measures that do not change over time, computed once for the whole population.

3. Confusion Matrix - I got confused with it initially, but it is just the tally of true/false positives and true/false negatives. Sensitivity states how sensitive the model is to the real positive inputs (TP / (TP + FN)), and Specificity states that the model is not capturing wrong classifications among the negatives (TN / (TN + FP)). It also yields Accuracy, the share of all correctly predicted outcomes. See the sketch below.
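
A minimal sketch of pulling these rates out of the matrix, using scikit-learn on made-up labels and predictions:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical model output

# for binary 0/1 labels, ravel() yields the four cells in this order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)            # recall / true positive rate
specificity = tn / (tn + fp)            # true negative rate
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(sensitivity, specificity, accuracy)
```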

4. F1 Score - I first took it as the harmonic mean of Sensitivity and Specificity, which is why I was confused; it is actually the harmonic mean of Recall and Precision (Recall is the same thing as Sensitivity, but Precision, TP / (TP + FP), is not Specificity). The harmonic mean ensures that one extreme value pulls the score down rather than averaging out. F-Beta has a parameter to tune the balance toward Recall or toward Precision, as in the sketch below.
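
A minimal sketch, again on made-up predictions, checking the harmonic-mean claim and showing how beta shifts the weighting:

```python
from sklearn.metrics import f1_score, fbeta_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical model output

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)

print("harmonic mean of P and R:", 2 * p * r / (p + r))
print("f1 score:                ", f1_score(y_true, y_pred))  # same number
print("f2   (leans to recall):   ", fbeta_score(y_true, y_pred, beta=2))
print("f0.5 (leans to precision):", fbeta_score(y_true, y_pred, beta=0.5))
```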


Dynamic Measures - measures computed over a changing time window or population.

5. Gain vs Lift Chart - We split our scored data into deciles and keep checking the outcome metrics cumulatively. It neatly identifies the threshold past which the model inflects (pass/fail). A decile-table sketch is below.
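
A minimal gains/lift sketch on synthetic scores (all data here is invented): sort by score, cut into ten equal slices, and compare the positives captured against a random baseline.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
score = rng.random(1000)                           # hypothetical model scores
label = (rng.random(1000) < score).astype(int)     # labels correlated with score

df = pd.DataFrame({"score": score, "label": label}).sort_values("score", ascending=False)
df["decile"] = np.arange(len(df)) // (len(df) // 10)   # 0 = top-scoring 10%

# cumulative share of positives captured per decile, and lift vs. random
gain = df.groupby("decile")["label"].sum().cumsum() / df["label"].sum()
lift = gain / (np.arange(1, 11) / 10)

print(pd.DataFrame({"cum_gain": gain, "lift": lift}))
```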

6. Kolmogorov-Smirnov chart - similar to the Gain vs Lift Chart. It gives the maximum difference between the cumulative +ve and -ve score distributions; the bigger the gap, the better the model separates the two.
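
A minimal sketch using scipy's two-sample KS test on invented score distributions:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
pos_scores = rng.normal(0.7, 0.15, 500)   # hypothetical scores of positives
neg_scores = rng.normal(0.4, 0.15, 500)   # hypothetical scores of negatives

# KS statistic = largest vertical gap between the two cumulative curves
result = ks_2samp(pos_scores, neg_scores)
print("KS statistic:", result.statistic)   # nearer 1 = better separation
```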

7. Area Under the ROC Curve (AUC-ROC) - I love the term Receiver Operating Characteristic (ROC); the name comes from signal detection (radar receivers), and you can picture the model as a partly impaired human ear being tested on how well it hears the signal. As said, it measures the gap in the classification: AUC is the probability that a randomly chosen positive is ranked above a randomly chosen negative. Compared to the above two measures, this one does not change much with respect to the population's class distribution.
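
A minimal sketch with hand-picked scores:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [0, 0, 1, 1, 0, 1, 1, 0]                    # hypothetical labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.5]   # hypothetical scores

print("AUC:", roc_auc_score(y_true, y_score))

# the raw curve, if you want to plot TPR against FPR per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
```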

8. Log Loss - Takes the model's own predicted probability for each observation, not just the hard class label as in the prior measures, and figures out the loss; confident wrong predictions are punished hardest. The lower the loss, the better the model.
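
A minimal sketch, with the textbook binary formula written out next to sklearn's version for comparison:

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = [1, 0, 1, 0]          # hypothetical labels
y_prob = [0.9, 0.2, 0.6, 0.4]  # hypothetical predicted P(y = 1)

print("sklearn:", log_loss(y_true, y_prob))

# same thing by hand: -(1/N) * sum( y*log(p) + (1-y)*log(1-p) )
y, p = np.array(y_true), np.array(y_prob)
print("manual: ", -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
```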

9. Gini Coefficient - a measure derived from AUC-ROC: Gini = 2 * AUC - 1, so 0 means random ranking and 1 means perfect.
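
A one-liner sketch reusing the toy scores from the AUC example:

```python
from sklearn.metrics import roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.5]

gini = 2 * roc_auc_score(y_true, y_score) - 1
print("Gini:", gini)   # 0 = random ranking, 1 = perfect ranking
```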

10. Concordant-Discordant Ratio - again a name with a listening flavour: it states whether the model's rankings are in tune. Take every pair made of one positive and one negative case; the pair is concordant if the positive got the higher predicted probability, and discordant if it got the lower one. The higher the concordant share, the better. A brute-force sketch is below.
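
A minimal brute-force sketch over all positive-negative pairs (fine for toy data; real implementations avoid the quadratic loop):

```python
from itertools import product

y_true  = [0, 0, 1, 1, 0, 1]              # hypothetical labels
y_score = [0.2, 0.5, 0.4, 0.8, 0.3, 0.7]  # hypothetical probabilities

pos = [s for y, s in zip(y_true, y_score) if y == 1]
neg = [s for y, s in zip(y_true, y_score) if y == 0]

pairs = list(product(pos, neg))
concordant = sum(p > n for p, n in pairs)   # positive ranked above negative
discordant = sum(p < n for p, n in pairs)   # negative ranked above positive
ties = len(pairs) - concordant - discordant

print("concordant ratio:", concordant / len(pairs))
print("discordant ratio:", discordant / len(pairs))
```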

Again, Linear Regression & Classification Measures

11. R-Squared/Adjusted R-Squared - a measure in the RMSE family: R-Squared compares the model's squared error against that of simply predicting the mean (i.e., against the variance), and Adjusted R-Squared further penalizes adding predictors that do not help. A sketch is below.
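
A minimal sketch on made-up numbers (k, the number of predictors, is assumed to be 2 here):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # hypothetical targets
y_pred = np.array([2.8, 5.3, 6.9, 9.4])   # hypothetical predictions

r2 = r2_score(y_true, y_pred)             # 1 - SS_res / SS_tot

n, k = len(y_true), 2                     # k = assumed predictor count
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print("R2:", r2, "Adjusted R2:", adj_r2)
```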

Cross Validation is nothing but a strategy for testing before production rather than in production; I am not sure how it becomes a measure, but it is also considered there. A sketch is below.
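
A minimal sketch of k-fold cross validation with scikit-learn on its bundled iris data:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# score the same model on 5 held-out folds instead of one split
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("fold accuracies:", scores)
print("mean accuracy:  ", scores.mean())
```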

The reference blog does not deal with Gradient Descent, Regularization parameters, or taking the derivative in gradient descent to find the optimal point and improve the model. Those are part of training the model, not evaluating it.

Reference:
https://www.analyticsvidhya.com/blog/2019/08/11-important-model-evaluation-error-metrics/

