Understanding Accuracy
In a binary classification problem, what does the accuracy metric measure?
- The ratio of correct predictions to the total number of predictions
- The percentage of false positives among all positive predictions
- The ratio of true negatives to false negatives
- The number of true positives divided by total samples
- The sum of recall and precision
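To make the definition concrete, here is a minimal sketch with scikit-learn; the labels are made up purely for illustration.

```python
from sklearn.metrics import accuracy_score

# Toy labels, invented for illustration: 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy = correct predictions / total predictions.
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(manual)                           # 0.75
print(accuracy_score(y_true, y_pred))   # 0.75 -- same value
```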
Precision vs. Recall
If a medical test for a rare disease has high recall but low precision, what does this indicate about the test?
- It correctly detects most actual disease cases, but also raises many false alarms
- It rarely detects the disease but is always correct when it does
- It never produces false positives
- It has more true negatives than true positives
- It always has a higher F1-score than accuracy
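A quick sketch of the high-recall, low-precision pattern, using hypothetical labels for a screening test that flags almost everyone as positive.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical screening results: 1 = has the disease, 0 = healthy.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]   # the test flags 7 of 10 patients

print(recall_score(y_true, y_pred))      # 1.0   -> every actual case is caught
print(precision_score(y_true, y_pred))   # ~0.43 -> most positive results are false alarms
```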
True Negatives in Confusion Matrix
In a 2×2 confusion matrix laid out with the positive class first (rows = actual class, columns = predicted class), which cell corresponds to the true negatives?
- The lower-right cell
- The lower-left cell
- The upper-right cell
- The two off-diagonal cells
- The upper-left cell
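The cell positions depend on how the matrix is ordered. As a rough sketch, scikit-learn's confusion_matrix sorts labels ascending by default (true negatives in the upper-left), but passing labels=[1, 0] produces the positive-first layout assumed by this question.

```python
from sklearn.metrics import confusion_matrix

# Toy labels, invented for illustration: 1 = positive, 0 = negative.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]

# labels=[1, 0] puts the positive class first in both rows and columns.
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)
# [[TP FN]
#  [FP TN]]   -> true negatives sit in the lower-right cell
# With the default label order [0, 1], the layout flips and TN is upper-left.
```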
Evaluating F1-Score
Why might you prefer the F1-score over accuracy when evaluating a model on an imbalanced dataset?
- F1-score balances both precision and recall, whereas accuracy can be misleading if classes are imbalanced
- F1-score is only useful when there are no true negatives
- Accuracy penalizes false positives more than false negatives
- F1-score is easier to interpret than recall
- F1-score is always higher than accuracy
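A small illustration, using invented and heavily imbalanced labels, of why accuracy can look fine while the F1-score exposes the problem.

```python
from sklearn.metrics import accuracy_score, f1_score

# Invented imbalanced data: 1 positive sample out of 10.
y_true = [0] * 9 + [1]
y_pred = [0] * 10    # a "model" that always predicts the majority class

print(accuracy_score(y_true, y_pred))              # 0.9 -> looks strong
print(f1_score(y_true, y_pred, zero_division=0))   # 0.0 -> it never finds a positive
```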
Interpreting ROC-AUC
What does a ROC-AUC score of 0.5 indicate about a classifier’s performance?
- The classifier performs no better than random chance
- The classifier is perfect
- The classifier has high precision but low recall
- The classifier predicts all classes correctly
- The classifier has maximized the F1-score
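To see the 0.5 baseline empirically, this sketch scores random labels with random numbers that carry no information about the classes.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=10_000)   # random ground-truth labels (toy data)
y_score = rng.random(10_000)               # scores unrelated to the labels

# Uninformative scores rank positives above negatives no better than chance,
# so the ROC-AUC comes out close to 0.5.
print(roc_auc_score(y_true, y_score))
```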
Precision Calculation
Given 80 true positives, 20 false positives, and 100 false negatives, what is the precision?
- 0.80
- 0.44
- 0.20
- 0.67
- 0.50
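The arithmetic behind the calculation, written out in Python; recall is included only to show where the 100 false negatives would matter.

```python
tp, fp, fn = 80, 20, 100

# Precision looks only at predicted positives: TP / (TP + FP).
precision = tp / (tp + fp)   # 80 / 100 = 0.80

# Recall, by contrast, uses the false negatives: TP / (TP + FN).
recall = tp / (tp + fn)      # 80 / 180 ≈ 0.44

print(precision, recall)
```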
Model Selection with Metrics
You have two models: Model A with higher accuracy but lower recall, and Model B with slightly lower accuracy but much higher recall. When would Model B be preferred?
- When missing positive cases is more costly than having false alarms
- If true negatives are the most important
- When the classes are perfectly balanced
- If precision is the only metric to optimize
- When Model A has a better ROC-AUC
Averaging in Multi-class Problems
Which method of averaging precision, recall, and F1-score gives every class equal weight, regardless of how many samples each class contains?
- Macro-average
- Micro-average
- Weighted-average
- Harmonic mean
- Geometric mean
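A minimal sketch contrasting the averaging modes on an invented, imbalanced three-class problem.

```python
from sklearn.metrics import f1_score

# Invented labels with one dominant class.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 2, 2, 2]

# macro: average the per-class F1 scores, so every class counts equally.
print(f1_score(y_true, y_pred, average="macro"))
# micro: pool TP/FP/FN over all samples, so every sample counts equally.
print(f1_score(y_true, y_pred, average="micro"))
# weighted: per-class F1 weighted by how many samples each class has.
print(f1_score(y_true, y_pred, average="weighted"))
```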
Metric Implementation in Code
Which of the following scikit-learn functions can you use to compute the F1-score for a binary classification problem in Python?
- f1_score(y_true, y_pred)
- accuracy(y_true, y_pred)
- roc_auc_score(y_pred, y_true)
- calculate_f1(y_pred, y_true)
- recall_score(y_true, y_pred)
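A short usage sketch of the correct call; note that scikit-learn's metric functions take the ground truth first and the predictions second. The labels are made up.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 1, 1, 0, 1, 1, 0, 0]   # ground truth
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]   # model predictions

print(precision_score(y_true, y_pred))   # TP / (TP + FP)
print(recall_score(y_true, y_pred))      # TP / (TP + FN)
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
```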
Ready for Production?
A model deployed to production must meet certain metric thresholds on unseen data. Why is relying solely on training metrics a bad idea?
- Training metrics can be overly optimistic and may not reflect true performance on unseen data
- Training metrics are always lower than test metrics
- Test metrics are ignored by most organizations
- Unseen data has no impact on production decisions
- Training metrics measure only recall
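A rough sketch of the usual safeguard: evaluate on data held out from training. The synthetic dataset and logistic regression below are stand-ins for whatever real data and model are in play.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data stands in for a real dataset here.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# Hold out data the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The training score tends to be optimistic; the held-out score is the one
# to compare against a production threshold.
print("train F1:", f1_score(y_train, model.predict(X_train)))
print("test  F1:", f1_score(y_test, model.predict(X_test)))
```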