Model Metrics Mastery Quiz: Evaluating Machine Learning Performance

  1. Understanding Accuracy

    In a binary classification problem, what does the accuracy metric measure?

    1. The proportion of correct predictions to total predictions
    2. The percentage of false positives among all positive predictions
    3. The ratio of true negatives to false negatives
    4. The number of true positives divided by total samples
    5. The sum of recall and precision
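
    To make this concrete, here is a minimal sketch of computing accuracy with scikit-learn; the labels below are invented purely for illustration:

    ```python
    from sklearn.metrics import accuracy_score

    # Hypothetical ground-truth and predicted labels for a binary problem
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    # Accuracy = correct predictions / total predictions
    print(accuracy_score(y_true, y_pred))  # 0.75 (6 of 8 correct)

    # The same ratio computed by hand
    print(sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true))  # 0.75
    ```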
  2. Precision vs. Recall

    If a medical test for a rare disease has high recall but low precision, what does this indicate about the test?

    1. It correctly detects most actual disease cases, but also raises many false alarms
    2. It rarely detects the disease but is always correct when it does
    3. It never produces false positives
    4. It has more true negatives than true positives
    5. It always has a higher F1-score than accuracy
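
    A small sketch makes the high-recall/low-precision pattern concrete; the confusion-matrix counts below are invented for a hypothetical rare-disease screening test:

    ```python
    # Invented counts for a rare-disease screening test
    tp = 95    # sick patients correctly flagged
    fn = 5     # sick patients missed
    fp = 400   # healthy patients incorrectly flagged (false alarms)

    recall = tp / (tp + fn)     # 0.95  -> catches almost every real case
    precision = tp / (tp + fp)  # ~0.19 -> most positive results are false alarms
    print(recall, precision)
    ```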
  3. True Negatives in Confusion Matrix

    Which part of the confusion matrix corresponds to true negatives?

    1. The lower-right cell
    2. The bottom-left cell
    3. The top-right cell
    4. The lower-middle cell
    5. The upper-left cell
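
    For reference, here is how scikit-learn lays out a 2x2 confusion matrix; note that texts which list the positive class first use the mirrored layout, so the cell holding the true negatives depends on the label ordering convention:

    ```python
    from sklearn.metrics import confusion_matrix

    y_true = [0, 0, 1, 1, 0, 1]
    y_pred = [0, 1, 1, 0, 0, 1]

    cm = confusion_matrix(y_true, y_pred)  # rows = actual, columns = predicted
    print(cm)
    # With scikit-learn's default label order (0 first, 1 second) the layout is:
    # [[TN, FP],
    #  [FN, TP]]
    # A matrix written with the positive class first flips this, placing TN in
    # the opposite corner, so always check the label order of the matrix you read.
    ```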
  4. Evaluating F1-Score

    Why might you prefer the F1-score over accuracy when evaluating a model on an imbalanced dataset?

    1. F1-score balances both precision and recall, whereas accuracy can be misleading if classes are imbalanced
    2. F1-score is only useful when there are no true negatives
    3. Accuracy penalizes false positives more than false negatives
    4. F1-score is easier to interpret than recall
    5. F1-score is always higher than accuracy
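
    A quick sketch on an invented imbalanced dataset shows how the two metrics can disagree; the 95/5 class split and the always-negative "model" are assumptions made for illustration:

    ```python
    from sklearn.metrics import accuracy_score, f1_score

    # Imbalanced toy data: 95 negatives, 5 positives
    y_true = [0] * 95 + [1] * 5
    y_pred = [0] * 100  # a model that always predicts the majority class

    print(accuracy_score(y_true, y_pred))             # 0.95 -- looks great
    # zero_division=0 silences the undefined-precision warning (scikit-learn >= 0.22)
    print(f1_score(y_true, y_pred, zero_division=0))  # 0.0  -- no positives found
    ```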
  5. Interpreting ROC-AUC

    What does a ROC-AUC score of 0.5 indicate about a classifier’s performance?

    1. The classifier performs no better than random chance
    2. The classifier is perfect
    3. The classifier has high precision but low recall
    4. The classifier predicts all classes correctly
    5. The classifier has maximized the F1-score
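
    As a sketch, scoring random labels with random, uninformative scores lands near the chance level; the synthetic data below is an assumption made for illustration:

    ```python
    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=10_000)  # random binary labels
    y_score = rng.random(10_000)              # scores unrelated to the labels

    # Scores that carry no information about the labels hover around AUC = 0.5
    print(roc_auc_score(y_true, y_score))     # ~0.5
    ```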
  6. Precision Calculation

    Given 80 true positives, 20 false positives, and 100 false negatives, what is the precision?

    1. 0.80
    2. 0.44
    3. 0.20
    4. 0.67
    5. 0.50
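
    The arithmetic, written out as a short sketch using the counts given in the question:

    ```python
    tp, fp, fn = 80, 20, 100

    precision = tp / (tp + fp)  # 80 / 100 = 0.80; false negatives do not appear here
    recall = tp / (tp + fn)     # 80 / 180 ≈ 0.44; false negatives affect recall instead
    print(round(precision, 2), round(recall, 2))
    ```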
  7. Model Selection with Metrics

    You have two models: Model A with higher accuracy but lower recall, and Model B with slightly lower accuracy but much higher recall. When would Model B be preferred?

    1. When missing positive cases is more costly than having false alarms
    2. If true negatives are the most important
    3. When the classes are perfectly balanced
    4. If precision is the only metric to optimize
    5. When Model A has a better ROC-AUC
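
    One way to make the trade-off concrete is an invented cost model; the counts, costs, and the expected_cost helper below are illustrative assumptions, not part of the question:

    ```python
    def expected_cost(fn, fp, cost_fn=100.0, cost_fp=1.0):
        """Total cost when missing a positive (fn) is far costlier than a false alarm (fp)."""
        return fn * cost_fn + fp * cost_fp

    # Model A: higher accuracy, lower recall -> misses more positives
    # Model B: lower accuracy, higher recall -> fewer misses, more false alarms
    print(expected_cost(fn=30, fp=10))  # Model A: 3010.0
    print(expected_cost(fn=5, fp=60))   # Model B:  560.0 -- cheaper when misses dominate
    ```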
  8. Averaging in Multi-class Problems

    Which method of averaging precision, recall, and F1-score treats all samples equally regardless of class size?

    1. Macro-average
    2. Micro-average
    3. Weighted-average
    4. Harmonic mean
    5. Geometric mean
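
    A minimal multi-class sketch comparing the averaging modes scikit-learn supports; the toy labels are invented, with class 2 deliberately rare:

    ```python
    from sklearn.metrics import f1_score

    # Small imbalanced multi-class example (class 2 is rare)
    y_true = [0, 0, 0, 0, 1, 1, 1, 2]
    y_pred = [0, 0, 0, 1, 1, 1, 0, 2]

    # micro:    pools all samples, so every sample counts equally
    # macro:    averages per-class scores, so every class counts equally
    # weighted: per-class scores weighted by class frequency
    for avg in ("micro", "macro", "weighted"):
        print(avg, f1_score(y_true, y_pred, average=avg))
    ```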
  9. Metric Implementation in Code

    Which of the following calls computes the F1-score for a binary classification problem with scikit-learn in Python?

    1. f1_score(y_true, y_pred)
    2. accuracy(y_true, y_pred)
    3. roc_auc_score(y_pred, y_true)
    4. calculate_f1(y_pred, y_true)
    5. recall_score(y_true, y_pred)
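
    A minimal usage sketch of scikit-learn's f1_score, with invented labels; the ground truth goes first and the predictions second:

    ```python
    from sklearn.metrics import f1_score

    y_true = [0, 1, 1, 0, 1, 1]
    y_pred = [0, 1, 0, 0, 1, 1]

    # F1 is the harmonic mean of precision and recall
    print(f1_score(y_true, y_pred))
    ```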
  10. Ready for Production?

    A model deployed to production must meet certain metric thresholds on unseen data. Why is relying solely on training metrics a bad idea?

    1. Training metrics can be overly optimistic and may not reflect true performance on unseen data
    2. Training metrics are always lower than test metrics
    3. Test metrics are ignored by most organizations
    4. Unseen data has no impact on production decisions
    5. Training metrics measure only recall
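
    A sketch of the usual safeguard, evaluating on a held-out split; the synthetic dataset and LogisticRegression below are stand-ins chosen for illustration:

    ```python
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Synthetic data; the point is the train/held-out split, not the model choice
    X, y = make_classification(n_samples=1_000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)

    # The training score is usually the optimistic number; the held-out score is
    # closer to what the model will do on unseen production data.
    print("train:", accuracy_score(y_train, model.predict(X_train)))
    print("test: ", accuracy_score(y_test, model.predict(X_test)))
    ```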