Interpretation of Slope in Linear Regression
In a simple linear regression analysis relating students’ study hours (X) to their test scores (Y), what does the slope coefficient represent?
- A) The expected change in test score for each additional hour studied
- B) The minimum value that test scores can reach
- C) The ratio between average test scores and study hours
- D) The proportion of study hours explained by test scores
- E) The sum of squared residuals divided by the number of students
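For intuition, here is a minimal sketch using simulated data and statsmodels (both choices are illustrative, not part of the question): the fitted slope estimates the expected change in test score for each additional hour studied.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, 200)                      # study hours (X)
scores = 50 + 4.0 * hours + rng.normal(0, 5, 200)    # true slope: 4 points per hour

X = sm.add_constant(hours)                           # add the intercept column
fit = sm.OLS(scores, X).fit()
print(fit.params)  # [intercept, slope]; slope ~ 4 = expected score gain per extra hour
```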
Gauss-Markov Theorem Application
Which assumption is necessary for the Ordinary Least Squares (OLS) estimator in linear regression to be the Best Linear Unbiased Estimator (BLUE)?
- A) The variance of the errors is constant (homoscedasticity)
- B) There must be at least twice as many predictors as observations
- C) The dependent variable must be binary
- D) Residuals must always be positive
- E) Predictors are always measured in logarithmic scales
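A small Monte Carlo sketch (the simulated data and parameter values are assumptions) illustrating the role of homoscedasticity: with constant error variance, the OLS slope is unbiased and its classical standard error matches the estimator's actual spread.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
X = sm.add_constant(x)

slopes = []
for _ in range(2000):
    y = 1.0 + 2.0 * x + rng.normal(0, 3, 100)   # homoscedastic: constant error SD of 3
    slopes.append(sm.OLS(y, X).fit().params[1])

print(np.mean(slopes))             # ~2.0: the estimator is unbiased
print(np.std(slopes))              # empirical spread of the slope estimates
print(sm.OLS(y, X).fit().bse[1])   # classical SE from one fit, close to the spread above
```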
Interpreting the R-squared Value
What does an R-squared value of 0.82 indicate in the context of a linear regression predicting house prices from square footage?
- A) 82% of the variance in house prices is explained by square footage
- B) 18% of the house prices are predictable from other variables
- C) The residual standard error is 0.82
- D) 82% of the coefficients are statistically significant
- E) The correlation between square footage and price is 0.82
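As a quick check of the definition, this sketch (simulated house data; the numbers are illustrative) computes R-squared two ways and shows it is the share of variance in the response explained by the predictor.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
sqft = rng.uniform(800, 3500, 300)
price = 50_000 + 120 * sqft + rng.normal(0, 40_000, 300)

fit = sm.OLS(price, sm.add_constant(sqft)).fit()
print(fit.rsquared)  # share of price variance explained by square footage

# Same number from the definition: R^2 = 1 - SS_resid / SS_total
ss_res = np.sum(fit.resid ** 2)
ss_tot = np.sum((price - price.mean()) ** 2)
print(1 - ss_res / ss_tot)
```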
Dealing with Multicollinearity
If two independent variables in a multiple linear regression are highly correlated, what is the main risk introduced into the model?
- A) Coefficient estimates may become unstable and difficult to interpret
- B) The regression line will always pass through the origin
- C) The model will automatically switch to non-linear regression
- D) Prediction residuals will necessarily become negative
- E) The number of observations will have to double
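To see the instability concretely, this sketch (simulated data; the near-duplicate predictor is a deliberate assumption) fits a model with two almost-identical predictors and flags the problem with variance inflation factors (VIFs).

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(0, 1, 200)
x2 = x1 + rng.normal(0, 0.05, 200)        # nearly a copy of x1: severe collinearity
y = 3 * x1 + rng.normal(0, 1, 200)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
print(fit.params[1:])   # the two slopes split the effect of x1 unpredictably
print(fit.bse[1:])      # with badly inflated standard errors

# VIFs far above the common rule-of-thumb cutoff of 10 flag the problem
for i in (1, 2):
    print(variance_inflation_factor(X, i))
```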
Detecting Non-Linearity
When examining residual plots after fitting a linear regression model, which pattern suggests that the linearity assumption may have been violated?
- A) Residuals forming a distinct curve or pattern rather than being randomly scattered
- B) Residuals are all near zero
- C) Residuals exactly match the predicted values
- D) Residuals show a perfectly vertical line
- E) Residuals have only positive values
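The sketch below (quadratic truth, straight-line fit; both simulated) shows the telltale systematic pattern: mean residuals by region of X trace a U-shape instead of hovering near zero.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = np.linspace(-3, 3, 200)
y = x ** 2 + rng.normal(0, 0.5, 200)        # true relationship is quadratic

fit = sm.OLS(y, sm.add_constant(x)).fit()   # misspecified straight-line fit
resid = fit.resid

# Mean residual over thirds of the x range: a (+, -, +) pattern signals curvature;
# a correctly specified model would give means near zero everywhere.
for lo, hi in [(-3, -1), (-1, 1), (1, 3)]:
    mask = (x >= lo) & (x < hi)
    print(f"x in [{lo}, {hi}): mean residual = {resid[mask].mean():+.2f}")
```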
Assumption Checking: Independence
Why is it problematic if the errors in a linear regression model are autocorrelated, such as in time series data?
- A) Standard error estimates become unreliable, leading to invalid hypothesis tests
- B) The slope will always be zero
- C) R-squared value will exceed 1
- D) The residual plot shows only negative values
- E) The response variable must be categorical
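A minimal illustration (the AR(1) errors with coefficient 0.8 are an assumed example) using the Durbin-Watson statistic, a standard check for autocorrelated residuals:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
n = 300
x = np.arange(n, dtype=float)

# AR(1) errors: each error carries over 0.8 of the previous one
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.normal(0, 1)

fit = sm.OLS(2 + 0.5 * x + e, sm.add_constant(x)).fit()

# Durbin-Watson near 2 indicates no autocorrelation; values well below 2 flag
# positive autocorrelation, so classical standard errors are too optimistic.
print(durbin_watson(fit.resid))
```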
Interpreting Regression Output
If a 95% confidence interval for a regression coefficient includes zero, what can you conclude about that predictor?
- A) The predictor is not statistically significant at the 5% level
- B) The predictor explains all the variability in the response
- C) The predictor must be removed from the dataset
- D) The predictor causes perfect multicollinearity
- E) The regression coefficients cannot be interpreted
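To connect the confidence interval to the t-test, here is a sketch with a deliberately weak simulated effect (the effect size and noise level are assumptions): the 95% interval straddles zero exactly when the two-sided p-value exceeds 0.05.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.normal(0, 1, 80)
y = 5 + 0.05 * x + rng.normal(0, 2, 80)   # very weak true effect

fit = sm.OLS(y, sm.add_constant(x)).fit()
lo, hi = fit.conf_int(alpha=0.05)[1]      # 95% CI for the slope
print(lo, hi)                             # likely straddles zero
print(fit.pvalues[1])                     # correspondingly, p > 0.05
```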
Effect of Outliers
How do influential outliers typically affect the fitted regression line?
- A) They can disproportionately shift the regression line and bias parameter estimates
- B) They improve the generalizability of the model
- C) They automatically reduce the residual standard deviation to zero
- D) They have no impact on the line due to normalization
- E) They always make the R-squared value larger
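This sketch (one fabricated high-leverage point added to otherwise clean simulated data) shows the slope being dragged away from its true value, and uses Cook's distance to locate the culprit.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 50)
y = 1 + 2 * x + rng.normal(0, 1, 50)

# Append one high-leverage outlier far from the rest of the data
x_out = np.append(x, 30.0)
y_out = np.append(y, 0.0)

clean = sm.OLS(y, sm.add_constant(x)).fit()
dirty = sm.OLS(y_out, sm.add_constant(x_out)).fit()
print(clean.params[1], dirty.params[1])   # one point drags the slope away from 2

# Cook's distance pinpoints the influential observation (the appended point)
cooks = dirty.get_influence().cooks_distance[0]
print(np.argmax(cooks), cooks.max())
```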
Application of Dummy Variables
When including categorical independent variables, such as gender or region, in a linear regression model, which technique is commonly used?
- A) Creating dummy variables for the categories
- B) Calculating the square root of their means
- C) Ignoring the categorical variables entirely
- D) Applying the chi-square test directly
- E) Dividing all predictors by the median value
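As a concrete example (simulated data; the region effects are made up), pandas can expand a categorical column into dummy variables, dropping one category as the baseline to avoid perfect collinearity:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(8)
df = pd.DataFrame({
    "hours": rng.uniform(0, 10, 120),
    "region": rng.choice(["North", "South", "West"], 120),
})
region_effect = df["region"].map({"North": 0, "South": 5, "West": -2})
df["score"] = 50 + 3 * df["hours"] + region_effect + rng.normal(0, 4, 120)

# One dummy per category, dropping the first ("North") as the baseline
X = pd.get_dummies(df[["hours", "region"]], columns=["region"], drop_first=True)
X = sm.add_constant(X.astype(float))
fit = sm.OLS(df["score"], X).fit()
print(fit.params)  # each region dummy = expected shift relative to North
```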
Addressing Heteroscedasticity
If a plot of residuals against fitted values shows a fan or cone shape, what statistical issue might be present, and what is a common remedy?
- A) Heteroscedasticity; try transforming the dependent variable or using robust standard errors
- B) Multicollinearity; remove one of the variables
- C) Non-stationarity; use lagged variables
- D) Homoscedasticity; proceed as usual
- E) Autocorrelation; shuffle the observations
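Finally, a sketch (the error spread grows with x by construction) that detects the fan shape with the Breusch-Pagan test and applies one common remedy, heteroscedasticity-robust (HC3) standard errors; transforming the response, e.g. modeling log(y), is another.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(9)
x = rng.uniform(1, 10, 200)
y = 2 + 3 * x + rng.normal(0, x)          # error SD grows with x: fan shape

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Breusch-Pagan test: a small p-value indicates heteroscedasticity
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
print(lm_pvalue)

# Remedy: heteroscedasticity-robust (HC3) standard errors
robust = sm.OLS(y, X).fit(cov_type="HC3")
print(fit.bse[1], robust.bse[1])          # classical vs. robust SE for the slope
```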