Understanding Accuracy
In a binary classification problem, what does the accuracy metric measure?
- The ratio of correct predictions to the total number of predictions
- The percentage of false positives among all positive predictions
- The ratio of true negatives to false negatives
- The number of true positives divided by total samples
- The sum of recall and precision
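To make the definition concrete, here is a minimal sketch with scikit-learn; the labels are made up purely for illustration.

```python
from sklearn.metrics import accuracy_score

# Toy labels, invented for illustration: 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Accuracy = correct predictions / total predictions.
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(manual)                           # 0.75
print(accuracy_score(y_true, y_pred))   # 0.75 -- same value
```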
Precision vs. Recall
If a medical test for a rare disease has high recall but low precision, what does this indicate about the test?
- It correctly detects most actual disease cases, but also raises many false alarms
- It rarely detects the disease but is always correct when it does
- It never produces false positives
- It has more true negatives than true positives
- It always has a higher F1-score than accuracy
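A quick sketch of the high-recall, low-precision pattern, using hypothetical labels for a screening test that flags almost everyone as positive.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical screening results: 1 = has the disease, 0 = healthy.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]   # the test flags 7 of 10 patients

print(recall_score(y_true, y_pred))      # 1.0   -> every actual case is caught
print(precision_score(y_true, y_pred))   # ~0.43 -> most positive results are false alarms
```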
True Negatives in Confusion Matrix
In a 2×2 confusion matrix laid out with the positive class first (rows = actual class, columns = predicted class), which cell corresponds to the true negatives?
- The lower-right cell
- The lower-left cell
- The upper-right cell
- The two off-diagonal cells
- The upper-left cell
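The cell positions depend on how the matrix is ordered. As a rough sketch, scikit-learn's confusion_matrix sorts labels ascending by default (true negatives in the upper-left), but passing labels=[1, 0] produces the positive-first layout assumed by this question.

```python
from sklearn.metrics import confusion_matrix

# Toy labels, invented for illustration: 1 = positive, 0 = negative.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]

# labels=[1, 0] puts the positive class first in both rows and columns.
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
print(cm)
# [[TP FN]
#  [FP TN]]   -> true negatives sit in the lower-right cell
# With the default label order [0, 1], the layout flips and TN is upper-left.
```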
Evaluating F1-Score
Why might you prefer the F1-score over accuracy when evaluating a model on an imbalanced dataset?
- F1-score balances both precision and recall, whereas accuracy can be misleading if classes are imbalanced
- F1-score is only useful when there are no true negatives
- Accuracy penalizes false positives more than false negatives
- F1-score is easier to interpret than recall
- F1-score is always higher than accuracy
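A small illustration, using invented and heavily imbalanced labels, of why accuracy can look fine while the F1-score exposes the problem.

```python
from sklearn.metrics import accuracy_score, f1_score

# Invented imbalanced data: 1 positive sample out of 10.
y_true = [0] * 9 + [1]
y_pred = [0] * 10    # a "model" that always predicts the majority class

print(accuracy_score(y_true, y_pred))              # 0.9 -> looks strong
print(f1_score(y_true, y_pred, zero_division=0))   # 0.0 -> it never finds a positive
```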
Interpreting ROC-AUC
What does a ROC-AUC score of 0.5 indicate about a classifier’s performance?
- The classifier performs no better than random chance
- The classifier is perfect
- The classifier has high precision but low recall
- The classifier predicts all classes correctly
- The classifier has maximized the F1-score
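To see the 0.5 baseline empirically, this sketch scores random labels with random numbers that carry no information about the classes.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=10_000)   # random ground-truth labels (toy data)
y_score = rng.random(10_000)               # scores unrelated to the labels

# Uninformative scores rank positives above negatives no better than chance,
# so the ROC-AUC comes out close to 0.5.
print(roc_auc_score(y_true, y_score))
```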
Precision Calculation
Given 80 true positives, 20 false positives, and 100 false negatives, what is the precision?
- 0.80
- 0.44
- 0.20
- 0.67
- 0.50
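The arithmetic behind the calculation, written out in Python; recall is included only to show where the 100 false negatives would matter.

```python
tp, fp, fn = 80, 20, 100

# Precision looks only at predicted positives: TP / (TP + FP).
precision = tp / (tp + fp)   # 80 / 100 = 0.80

# Recall, by contrast, uses the false negatives: TP / (TP + FN).
recall = tp / (tp + fn)      # 80 / 180 ≈ 0.44

print(precision, recall)
```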
Model Selection with Metrics
You have two models: Model A with higher accuracy but lower recall, and Model B with slightly lower accuracy but much higher recall. When would Model B be preferred?
- When missing positive cases is more costly than having false alarms
- If true negatives are the most important
- When the classes are perfectly balanced
- If precision is the only metric to optimize
- When Model A has a better ROC-AUC
Averaging in Multi-class Problems
Which method of averaging precision, recall, and F1-score gives every class equal weight, regardless of how many samples each class contains?
- Macro-average
- Micro-average
- Weighted-average
- Harmonic mean
- Geometric mean
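A minimal sketch contrasting the averaging modes on an invented, imbalanced three-class problem.

```python
from sklearn.metrics import f1_score

# Invented labels with one dominant class.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 2, 2, 2]

# macro: average the per-class F1 scores, so every class counts equally.
print(f1_score(y_true, y_pred, average="macro"))
# micro: pool TP/FP/FN over all samples, so every sample counts equally.
print(f1_score(y_true, y_pred, average="micro"))
# weighted: per-class F1 weighted by how many samples each class has.
print(f1_score(y_true, y_pred, average="weighted"))
```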
Metric Implementation in Code
Which of the following scikit-learn functions can you use to compute the F1-score for a binary classification problem in Python?
- f1_score(y_true, y_pred)
- accuracy(y_true, y_pred)
- roc_auc_score(y_pred, y_true)
- calculate_f1(y_pred, y_true)
- recall_score(y_true, y_pred)
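A short usage sketch of the correct call; note that scikit-learn's metric functions take the ground truth first and the predictions second. The labels are made up.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 1, 1, 0, 1, 1, 0, 0]   # ground truth
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]   # model predictions

print(precision_score(y_true, y_pred))   # TP / (TP + FP)
print(recall_score(y_true, y_pred))      # TP / (TP + FN)
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
```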
Ready for Production?
A model deployed to production must meet certain metric thresholds on unseen data. Why is relying solely on training metrics a bad idea?
- Training metrics can be overly optimistic and may not reflect true performance on unseen data
- Training metrics are always lower than test metrics
- Test metrics are ignored by most organizations
- Unseen data has no impact on production decisions
- Training metrics measure only recall
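A rough sketch of the usual safeguard: evaluate on data held out from training. The synthetic dataset and logistic regression below are stand-ins for whatever real data and model are in play.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data stands in for a real dataset here.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# Hold out data the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The training score tends to be optimistic; the held-out score is the one
# to compare against a production threshold.
print("train F1:", f1_score(y_train, model.predict(X_train)))
print("test  F1:", f1_score(y_test, model.predict(X_test)))
```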