Unlocking Linear Regression: Advanced Concepts and Applications Quiz

  1. Interpretation of Slope in Linear Regression

    In a simple linear regression analysis relating students’ study hours (X) to their test scores (Y), what does the slope coefficient represent?

    1. A) The expected change in test score for each additional hour studied
    2. B) The minimum value that test scores can reach
    3. C) The ratio between average test scores and study hours
    4. D) The proportion of study hours explained by test scores
    5. E) The sum of squared residuals divided by the number of students
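The intuition behind the slope can be verified with a small hand computation. The study-hours and score data below are made up purely for illustration:

```python
# Hypothetical data: hours studied (X) vs. test score (Y).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# OLS slope: sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores)) / sum(
    (x - mean_x) ** 2 for x in hours
)
intercept = mean_y - slope * mean_x

# Each additional hour of study is associated with a `slope`-point
# change in the expected score.
print(round(slope, 2))  # 4.1
```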
  2. Gauss-Markov Theorem Application

    Which assumption is necessary for the Ordinary Least Squares (OLS) estimator in linear regression to be the Best Linear Unbiased Estimator (BLUE)?

    1. A) The variance of the errors is constant (homoscedasticity)
    2. B) There must be at least twice as many predictors as observations
    3. C) The dependent variable must be binary
    4. D) Residuals must always be positive
    5. E) Predictors are always measured in logarithmic scales
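One ingredient of the BLUE property, unbiasedness, is easy to see by simulation. The sketch below uses made-up parameters (numpy assumed available): it redraws homoscedastic errors many times and averages the OLS slope estimates:

```python
import numpy as np

rng = np.random.default_rng(42)
true_slope = 3.0
x = np.linspace(0, 1, 50)

# Repeatedly redraw errors with constant variance and refit the slope.
estimates = []
for _ in range(2000):
    y = 1.0 + true_slope * x + rng.normal(scale=0.5, size=x.size)
    slope = np.cov(x, y, bias=True)[0, 1] / x.var()
    estimates.append(slope)

# The average estimate hovers around the true slope: no systematic bias.
print(np.mean(estimates))
```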
  3. Interpreting the R-squared Value

    What does an R-squared value of 0.82 indicate in the context of a linear regression predicting house prices from square footage?

    1. A) 82% of the variance in house prices is explained by square footage
    2. B) 18% of the house prices are predictable from other variables
    3. C) The residual standard error is 0.82
    4. D) 82% of the coefficients are statistically significant
    5. E) The correlation between square footage and price is 0.82
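The definition of R² can be checked directly; the square-footage and price figures below are invented for illustration, and the last line confirms that in simple regression R² equals the squared correlation:

```python
# Hypothetical data: square footage vs. price (in $1000s).
sqft = [1000, 1500, 2000, 2500, 3000]
price = [200, 260, 330, 390, 440]

n = len(sqft)
mx = sum(sqft) / n
my = sum(price) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(sqft, price))
sxx = sum((x - mx) ** 2 for x in sqft)
syy = sum((y - my) ** 2 for y in price)

slope = sxy / sxx
intercept = my - slope * mx
fitted = [intercept + slope * x for x in sqft]

# R^2 = 1 - SS_residual / SS_total: the share of price variance
# explained by square footage.
ss_res = sum((y - f) ** 2 for y, f in zip(price, fitted))
r_squared = 1 - ss_res / syy

# In simple regression, R^2 is also the squared correlation.
r = sxy / (sxx * syy) ** 0.5
print(round(r_squared, 3), round(r ** 2, 3))
```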
  4. Dealing with Multicollinearity

    If two independent variables in a multiple linear regression are highly correlated, what is the main risk this introduces to the model?

    1. A) Coefficient estimates may become unstable and difficult to interpret
    2. B) The regression line will always pass through the origin
    3. C) The model will automatically switch to non-linear regression
    4. D) Prediction residuals will necessarily become negative
    5. E) The number of observations will have to double
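A standard way to quantify this risk is the variance inflation factor (VIF). The sketch below assumes numpy and uses simulated predictors where one is nearly a copy of the other:

```python
import numpy as np

# Simulated data: x2 is x1 plus a little noise, so they are
# highly correlated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)

def vif(target, others):
    """VIF = 1 / (1 - R^2) from regressing one predictor on the rest.
    Large values signal that the coefficient's variance is inflated."""
    X = np.column_stack([np.ones(len(target))] + others)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    r2 = 1 - resid.var() / target.var()
    return 1 / (1 - r2)

v = vif(x2, [x1])
print(v)  # far above the common rule-of-thumb cutoff of 5-10
```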
  5. Detecting Non-Linearity

    When examining residual plots after fitting a linear regression model, which pattern suggests that the linearity assumption may have been violated?

    1. A) Residuals forming a distinct curve or pattern rather than being randomly scattered
    2. B) Residuals are all near zero
    3. C) Residuals exactly match the predicted values
    4. D) Residuals show a perfectly vertical line
    5. E) Residuals have only positive values
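A tiny worked example of that telltale curve: fitting a line to data that are actually quadratic leaves residuals that trace a U shape rather than scattering randomly (data below are made up):

```python
# Truly quadratic relationship, symmetric about zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

# By symmetry, sum((x - x_bar) * y) = 0, so the OLS slope is 0 and the
# fitted line is just the horizontal line at mean(y).
mean_y = sum(ys) / len(ys)
resid = [y - mean_y for y in ys]

# Residuals are positive at the ends, negative in the middle: a curve,
# not random scatter -- evidence the linearity assumption is violated.
print(resid)
```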
  6. Assumption Checking: Independence

    Why is it problematic if the errors in a linear regression model are autocorrelated, such as in time series data?

    1. A) Standard error estimates become unreliable, leading to invalid hypothesis tests
    2. B) The slope will always be zero
    3. C) R-squared value will exceed 1
    4. D) The residual plot shows only negative values
    5. E) The response variable must be categorical
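A quick numeric screen for autocorrelated residuals is the Durbin-Watson statistic; the residual sequences below are invented to show the two extremes:

```python
def durbin_watson(resid):
    """DW = sum of squared successive differences / sum of squares.
    Values near 2 suggest no autocorrelation; near 0, positive
    autocorrelation; near 4, negative autocorrelation."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

drifting = [1.0, 1.1, 1.2, 1.1, 1.0, 0.9]   # slowly wandering: DW near 0
alternating = [1, -1, 1, -1, 1, -1]         # sign-flipping: DW near 4
print(durbin_watson(drifting), durbin_watson(alternating))
```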
  7. Interpreting Regression Output

    If a 95% confidence interval for a regression coefficient includes zero, what can you conclude about that predictor?

    1. A) The predictor may not be statistically significant at the 5% level
    2. B) The predictor explains all the variability in the response
    3. C) The predictor must be removed from the dataset
    4. D) The predictor causes perfect multicollinearity
    5. E) The regression coefficients cannot be interpreted
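The duality behind this question: a 95% CI for a coefficient contains zero exactly when the two-sided t-test fails to reject H0: beta = 0 at the 5% level. The sketch below uses the large-sample critical value 1.96 as a simplifying assumption (exact values come from the t distribution):

```python
def ci_contains_zero(beta_hat, se, t_crit=1.96):
    """True if the confidence interval beta_hat +/- t_crit * se
    straddles zero."""
    lower, upper = beta_hat - t_crit * se, beta_hat + t_crit * se
    return lower <= 0 <= upper

def fails_to_reject(beta_hat, se, t_crit=1.96):
    """True if |t| < t_crit, i.e. not significant at the 5% level."""
    return abs(beta_hat / se) < t_crit

# The two criteria always agree (hypothetical estimates below).
print(ci_contains_zero(0.8, 0.5), fails_to_reject(0.8, 0.5))  # True True
print(ci_contains_zero(2.0, 0.5), fails_to_reject(2.0, 0.5))  # False False
```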
  8. Effect of Outliers

    How do influential outliers typically affect the fitted regression line?

    1. A) They can disproportionately shift the regression line and bias parameter estimates
    2. B) They improve the generalizability of the model
    3. C) They automatically reduce the residual standard deviation to zero
    4. D) They have no impact on the line due to normalization
    5. E) They always make the R-squared value larger
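The pull of a single high-leverage point is easy to demonstrate; the data below are contrived so that one outlier drags a perfect slope of 2 well below it:

```python
def ols_slope(xs, ys):
    """OLS slope from paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]               # perfect line with slope 2
slope_clean = ols_slope(xs, ys)

# One influential point far from the others in X drags the line down.
slope_outlier = ols_slope(xs + [10], ys + [0])
print(slope_clean, slope_outlier)
```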
  9. Application of Dummy Variables

    When including categorical independent variables, such as gender or region, in a linear regression model, which technique is commonly used?

    1. A) Creating dummy variables for the categories
    2. B) Calculating the square root of their means
    3. C) Ignoring the categorical variables entirely
    4. D) Applying the chi-square test directly
    5. E) Dividing all predictors by the median value
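Dummy (one-hot) coding can be sketched in a few lines; note that one level is dropped as the reference category so the dummies plus the intercept are not perfectly collinear (region labels below are made up):

```python
# Hypothetical categorical predictor.
regions = ["north", "south", "east", "south", "north"]

levels = sorted(set(regions))   # ['east', 'north', 'south']
reference = levels[0]           # 'east' becomes the baseline category

# One 0/1 column per non-reference level.
dummies = [
    [1 if r == level else 0 for level in levels if level != reference]
    for r in regions
]
print(dummies)  # columns: is_north, is_south; 'east' rows are all zeros
```

Each dummy coefficient is then interpreted as the expected shift relative to the reference category.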
  10. Addressing Heteroscedasticity

    If a plot of residuals against fitted values shows a fan or cone shape, what statistical issue might be present and what is a common remedy?

    1. A) Heteroscedasticity; try transforming the dependent variable or using robust standard errors
    2. B) Multicollinearity; remove one of the variables
    3. C) Non-stationarity; use lagged variables
    4. D) Homoscedasticity; proceed as usual
    5. E) Autocorrelation; shuffle the observations
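The log-transform remedy can be illustrated by simulation (numpy assumed; parameters are invented). The helper below uses the correlation between |residual| and fitted values as a crude numeric stand-in for eyeballing a fan shape:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 300)
# Multiplicative errors: spread of y grows with x (fan-shaped residuals).
y = np.exp(0.1 + 0.3 * x + rng.normal(scale=0.2, size=x.size))

def abs_resid_fitted_corr(x, y):
    """Fit OLS of y on x; return corr(|residual|, fitted).
    Clearly positive values suggest heteroscedasticity."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    return np.corrcoef(np.abs(resid), fitted)[0, 1]

raw = abs_resid_fitted_corr(x, y)          # spread grows with the mean
logged = abs_resid_fitted_corr(x, np.log(y))  # near zero after the log
print(raw, logged)
```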