Overfitting vs Underfitting: Model Tuning Quiz

Challenge your understanding of overfitting, underfitting, and key model tuning concepts in this focused quiz. Explore the causes, consequences, and possible solutions for model performance issues using real-world scenarios and essential terminology.

  1. Identifying Overfitting in Model Evaluation

    A decision tree model has a training accuracy of 98% but a validation accuracy of only 69%. Which issue does this most likely indicate?

    1. Overfitting
    2. Optimizer instability
    3. Data leakage
    4. Underfitting

    Explanation: Overfitting occurs when a model performs exceptionally well on training data but poorly on unseen validation data, usually because it has learned noise or irrelevant details. Underfitting would result in poor performance on both sets. Data leakage involves unintended information from the validation or test data passing into training, which isn't the primary issue here. Optimizer instability refers to problems during the optimization process itself (such as a diverging or oscillating loss), not to a persistent gap between training and validation accuracy.
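A minimal sketch of how this gap shows up in practice, assuming scikit-learn and a small synthetic dataset with deliberately noisy labels (the dataset and parameters below are illustrative, not from the question):

```python
# Illustrative sketch: an unconstrained decision tree memorizes noisy training
# labels, producing a large gap between training and validation accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% label noise (flip_y) so there is noise to memorize.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                  random_state=0)

# No depth limit: the tree can grow until it fits the training set almost perfectly.
tree = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)
print(f"training accuracy:   {tree.score(X_train, y_train):.2f}")  # near 1.00
print(f"validation accuracy: {tree.score(X_val, y_val):.2f}")      # noticeably lower
```

Constraining the tree (for example via `max_depth` or `min_samples_leaf`) is one common way to narrow that gap.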

  2. Concept of Underfitting with a Linear Model

    If a linear regression model performs poorly on both training and test sets when fitting a clearly nonlinear dataset, which phenomenon is occurring?

    1. Overfitting
    2. Feature explosion
    3. Model selection bias
    4. Underfitting

    Explanation: Underfitting happens when a model is too simple to capture the underlying data pattern, leading to low accuracy on training and test sets alike. This is typical when fitting a linear model to nonlinear data. Model selection bias refers to errors from repeatedly choosing models based on their validation performance, not a simple lack of complexity. Overfitting would yield strong training performance but poor test performance. Feature explosion would indicate an overly large, complex feature set, which is unrelated here.
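A minimal sketch of this scenario, assuming scikit-learn and an illustrative synthetic dataset with a deliberately nonlinear target:

```python
# Illustrative sketch: a straight line fit to a quadratic target scores poorly
# on the training set and the test set alike -- the hallmark of underfitting.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = X.ravel() ** 2 + 0.1 * rng.standard_normal(400)  # clearly nonlinear relationship

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
lin = LinearRegression().fit(X_train, y_train)

# Both R^2 scores come out low and similar: the model lacks the capacity
# to represent the curvature, not the data to learn it from.
print(f"train R^2: {lin.score(X_train, y_train):.2f}")
print(f"test  R^2: {lin.score(X_test, y_test):.2f}")
```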

  3. Regularization and Model Complexity

    Which regularization technique can help reduce overfitting by penalizing large weights in a model?

    1. L2 regularization
    2. Early stopping
    3. Data normalization
    4. Data augmentation

    Explanation: L2 regularization works by adding a penalty for large weights to the loss function, discouraging the model from becoming too complex. Data normalization helps stabilize training but does not directly prevent overfitting. Early stopping halts training to avoid overfitting, but does not penalize weights. Data augmentation increases dataset variety, helping generalization, but is not a direct regularization penalty.
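A minimal sketch of the L2 penalty in action, assuming scikit-learn: Ridge regression adds a term proportional to the squared weight norm to the squared-error loss, so the fitted coefficients stay small (the polynomial degree and alpha below are illustrative choices):

```python
# Illustrative sketch: comparing an unregularized high-degree polynomial fit
# with a Ridge (L2-penalized) fit of the same degree on the same data.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 1))
y = X.ravel() ** 2 + 0.1 * rng.standard_normal(30)

plain = make_pipeline(PolynomialFeatures(degree=15), LinearRegression()).fit(X, y)
ridged = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=1.0)).fit(X, y)

# The L2 penalty shrinks the coefficients, keeping the model less complex
# even though both pipelines use the same degree-15 feature expansion.
print("largest |w| without penalty:", np.abs(plain[-1].coef_).max())
print("largest |w| with L2 penalty:", np.abs(ridged[-1].coef_).max())
```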

  4. Model Tuning to Address Underfitting

    Which strategy is best for correcting underfitting in a polynomial regression model where predictions are consistently inaccurate?

    1. Adding more regularization
    2. Reducing training data
    3. Increasing model complexity
    4. Decreasing the number of polynomial features

    Explanation: To correct underfitting, especially if the model is too simple, increasing complexity (for example, by using a higher-degree polynomial) can help capture data patterns. Reducing training data generally worsens performance. Decreasing polynomial features would further restrict the model, increasing underfitting. More regularization would constrain the model further, which is not suitable for underfitting cases.
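A minimal sketch of fixing underfitting by adding capacity, assuming scikit-learn and an illustrative nonlinear target; raising the polynomial degree is the "increase model complexity" lever here:

```python
# Illustrative sketch: a degree-1 fit underfits a curved target, while a
# higher-degree polynomial in the same pipeline captures it on train and test.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.cos(X).ravel() + 0.1 * rng.standard_normal(300)  # curved, non-monotonic target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
for degree in (1, 4):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    # Expect the degree-1 scores to be low on both sets and the degree-4
    # scores to be high on both: the added complexity removes the underfit.
    print(f"degree={degree}  train R^2={model.score(X_train, y_train):.2f}  "
          f"test R^2={model.score(X_test, y_test):.2f}")
```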

  5. Learning Curves and Diagnosing Model Issues

    When analyzing learning curves, what does it typically mean if both training and validation errors are high and closely matched?

    1. The model has imbalanced classes
    2. The model is overfitting to noise
    3. The model is underfitting the data
    4. The model suffers from data drift

    Explanation: High and similar training and validation errors suggest the model cannot effectively learn the patterns in the data, a clear sign of underfitting. Overfitting would instead show a large gap between the two error curves. Data drift refers to a change in the data distribution over time, not to matching errors. Imbalanced classes can distort error metrics in other ways, but they do not by themselves produce consistently high, closely matched errors on both sets.
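A minimal sketch of computing a learning curve, assuming scikit-learn; the deliberately underpowered model (a depth-1 decision stump) is illustrative and stands in for any model that is too simple for the task:

```python
# Illustrative sketch: learning_curve reports training and cross-validation
# scores at growing training sizes; when both stay low and close together,
# more data will not help -- the model is underfitting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           random_state=0)

sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(max_depth=1, random_state=0),  # a stump: too simple
    X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5))

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={int(n):4d}  train acc={tr:.2f}  validation acc={va:.2f}")
```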