Challenge your understanding of overfitting, underfitting, and key model-tuning concepts in this focused quiz. Explore the causes, consequences, and possible solutions for model performance issues using real-world scenarios and essential terminology.
A decision tree model has a training accuracy of 98% but a validation accuracy of only 69%. Which issue does this most likely indicate?
Explanation: Overfitting occurs when a model performs exceptionally well on training data but poorly on unseen validation data, usually because it has learned noise or irrelevant details. Underfitting would result in poor performance on both sets. Data leakage involves unintended information passing into training, which isn't the primary issue here. Optimizer instability refers to problems during training optimization, not to the large train/validation accuracy gap seen here.
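The diagnosis above can be sketched as a small helper that compares the two accuracies. This is an illustrative heuristic, not a standard function; the 0.10 gap threshold and the 0.75 floor are assumed rules of thumb.

```python
# Hypothetical helper: classify a model's fit from training vs. validation
# accuracy. Threshold values are illustrative assumptions.
def diagnose_fit(train_acc: float, val_acc: float,
                 gap_threshold: float = 0.10) -> str:
    gap = train_acc - val_acc
    if gap > gap_threshold:
        return "overfitting"    # strong on training, weak on validation
    if train_acc < 0.75 and val_acc < 0.75:
        return "underfitting"   # poor everywhere
    return "reasonable fit"

# The decision tree from the question: 98% training vs. 69% validation.
print(diagnose_fit(0.98, 0.69))  # → overfitting
```

The 29-point gap dwarfs any sensible threshold, so the high-variance (overfitting) branch fires.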
If a linear regression model performs poorly on both training and test sets when fitting a clearly nonlinear dataset, which phenomenon is occurring?
Explanation: Underfitting happens when a model is too simple to capture the underlying data pattern, leading to low accuracy everywhere. This is typical when using a linear model on nonlinear data. Model selection bias refers to errors from repeatedly choosing models based on their validation performance, not a simple lack of complexity. Overfitting would yield great training performance but poor testing. Feature explosion would indicate overly complex features but is unrelated here.
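The scenario can be reproduced numerically: fitting a straight line to a clearly quadratic target leaves a large, similar error on both splits. A minimal numpy sketch, with an illustrative even/odd train-test split:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
y = x**2 + rng.normal(0, 0.1, size=x.shape)   # clearly nonlinear target

# Illustrative split: alternate points go to train and test.
x_train, x_test = x[::2], x[1::2]
y_train, y_test = y[::2], y[1::2]

# A degree-1 (linear) fit cannot capture the parabola.
coeffs = np.polyfit(x_train, y_train, deg=1)
mse_train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
mse_test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)

# Both errors are large and similar: the signature of underfitting.
print(mse_train, mse_test)
```

Overfitting would instead show a near-zero training error with a much larger test error; here the model is simply too restricted to do well anywhere.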
Which regularization technique can help reduce overfitting by penalizing overly complex weights in a model?
Explanation: L2 regularization works by adding a penalty for large weights to the loss function, discouraging the model from becoming too complex. Data normalization helps stabilize training but does not directly prevent overfitting. Early stopping halts training to avoid overfitting, but does not penalize weights. Data augmentation increases dataset variety, helping generalization, but is not a direct regularization penalty.
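The weight-shrinking effect of the L2 penalty can be seen directly in closed-form ridge regression, w = (XᵀX + λI)⁻¹Xᵀy. A minimal numpy sketch on synthetic data (the `ridge_fit` helper and the λ values are assumptions for illustration):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam*I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(0, 0.1, size=100)

w_plain = ridge_fit(X, y, lam=0.0)    # ordinary least squares (no penalty)
w_ridge = ridge_fit(X, y, lam=10.0)   # L2-penalized weights

# The penalty shrinks the weight vector toward zero.
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_plain))  # → True
```

Larger λ means stronger shrinkage: the penalty term λ‖w‖² makes large weights expensive, trading a little training fit for better generalization.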
Which strategy is best for correcting underfitting in a polynomial regression model where predictions are consistently inaccurate?
Explanation: To correct underfitting, especially if the model is too simple, increasing complexity (for example, by using a higher-degree polynomial) can help capture data patterns. Reducing training data generally worsens performance. Decreasing polynomial features would further restrict the model, increasing underfitting. More regularization would constrain the model further, which is not suitable for underfitting cases.
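The fix can be demonstrated by raising the polynomial degree on a cubic target: the training error drops sharply once the model family is rich enough. A numpy sketch (the cubic target and the degree choices are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-2, 2, 100)
y = x**3 - x + rng.normal(0, 0.05, size=x.shape)   # cubic target

def train_mse(degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, deg=degree)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

# Degree 1 underfits the cubic; degree 3 matches the data's complexity.
print(train_mse(1), train_mse(3))
```

The degree-3 model's error is close to the noise floor, while the degree-1 model leaves most of the cubic structure unexplained. The same logic cuts the other way for overfitting, where degrees far above the true complexity start fitting noise.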
When analyzing learning curves, what does it typically mean if both training and validation errors are high and closely matched?
Explanation: High and similar training and validation errors suggest the model cannot effectively learn patterns from the data, a clear sign of underfitting. Overfitting would result in a large gap between errors. Data drift refers to a change in data distribution, not matching errors. Imbalanced classes may cause other error patterns, but not consistently high matched errors across sets.
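The learning-curve reading above can be encoded as a toy diagnostic over the final training and validation errors. The threshold values are illustrative assumptions, not fixed rules:

```python
# Hypothetical diagnostic over final learning-curve errors; the
# high_error and small_gap thresholds are assumed for illustration.
def read_learning_curve(train_err: float, val_err: float,
                        high_error: float = 0.3,
                        small_gap: float = 0.05) -> str:
    gap = val_err - train_err
    if train_err > high_error and gap < small_gap:
        return "underfitting"   # both errors high and closely matched
    if gap >= small_gap:
        return "overfitting"    # validation error well above training error
    return "good fit"

print(read_learning_curve(0.35, 0.37))  # → underfitting
```

High, closely matched errors land in the first branch; a wide train/validation gap, the hallmark of overfitting, lands in the second.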