Statistical Analysis in R: Hypothesis Testing and Regression Quiz

Assess your understanding of hypothesis testing techniques and regression analysis in R with these foundational questions. Strengthen your knowledge of interpreting p-values, selecting statistical tests, and applying linear regression concepts for data analysis.

  1. Interpreting a p-value

    If the p-value from a t-test in R is 0.03 and the significance level is 0.05, what should you conclude about the null hypothesis?

    1. Reject the null hypothesis
    2. The test is invalid
    3. Accept the null hypothesis
    4. Increase the sample size

    Explanation: A p-value of 0.03 is less than the significance level of 0.05, so you should reject the null hypothesis: the data provide statistically significant evidence against it. Accepting the null hypothesis is incorrect, because the p-value is below the threshold and, in any case, a test can only fail to reject the null, never prove it true. Increasing the sample size is a study-design decision, not a conclusion drawn from a completed test. The test is not invalid simply because the p-value is low.
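
    The decision rule above can be sketched in R as follows. The data, the hypothesized mean of 70, and the variable names are illustrative assumptions, not part of the quiz; the point is only how the p-value returned by t.test is compared with the significance level.

    ```r
    # Hypothetical data: 30 simulated exam scores (an assumption for illustration).
    set.seed(42)
    scores <- rnorm(30, mean = 74, sd = 8)

    alpha <- 0.05                      # chosen significance level
    result <- t.test(scores, mu = 70)  # one-sample t-test of H0: mean = 70

    # The p-value is stored in the returned "htest" object.
    decision <- if (result$p.value < alpha) "reject H0" else "fail to reject H0"

    print(result$p.value)
    print(decision)
    ```

    Note that the second branch reads "fail to reject", not "accept": a large p-value is not evidence that the null hypothesis is true.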

  2. Choosing the right test

    Which function in R would you typically use to perform a two-sample t-test comparing means of two independent groups?

    1. lm
    2. chi.sq.test
    3. anova.test
    4. t.test

    Explanation: The t.test function compares the means of two groups, making it the correct choice here. chi.sq.test is not an R function; the similarly named chisq.test tests associations between categorical variables, not differences in means. anova.test is not a standard function name either, and lm fits linear regression models rather than running a t-test directly.
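
    A minimal sketch of a two-sample t-test on simulated data follows. The group names and values are hypothetical. It shows both the two-vector call and the formula interface, which should give the same result here.

    ```r
    # Hypothetical measurements for two independent groups.
    set.seed(1)
    group_a <- rnorm(25, mean = 50, sd = 5)
    group_b <- rnorm(25, mean = 54, sd = 5)

    # Two-vector interface. By default var.equal = FALSE, so this is Welch's t-test.
    res <- t.test(group_a, group_b)
    print(res$estimate)  # sample means of the two groups
    print(res$p.value)

    # Formula interface: a numeric response split by a two-level grouping variable.
    df <- data.frame(
      value = c(group_a, group_b),
      group = rep(c("a", "b"), each = 25)
    )
    res_formula <- t.test(value ~ group, data = df)
    print(res_formula$p.value)
    ```

    Passing var.equal = TRUE would request the classic pooled-variance test instead of Welch's version.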

  3. Interpreting regression coefficients

    If the coefficient for 'hours_studied' is 2.5 in a simple linear regression predicting test scores in R, what does this mean?

    1. Test scores decrease by 2.5 for each hour studied
    2. Each hour studied increases the predicted test score by 2.5 points
    3. The regression model is not significant
    4. The intercept increases by 2.5

    Explanation: A coefficient of 2.5 for 'hours_studied' means that each additional hour studied is associated with a 2.5-point increase in the predicted test score. The size of a coefficient alone says nothing about whether the model is significant, so that option is incorrect. A decrease would require a negative coefficient, and the intercept is a separate parameter from the slope.
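
    The interpretation above can be checked on simulated data. In this sketch the data are generated with a true slope of 2.5 (an assumption made for illustration), and the estimated slope is compared with the change in predicted score between 4 and 5 hours of study.

    ```r
    # Simulated data with a known slope of 2.5 (hypothetical, for illustration).
    set.seed(7)
    hours_studied <- runif(100, min = 0, max = 10)
    test_score <- 60 + 2.5 * hours_studied + rnorm(100, sd = 1)
    study <- data.frame(hours_studied, test_score)

    fit <- lm(test_score ~ hours_studied, data = study)
    slope <- coef(fit)[["hours_studied"]]

    # Predicted scores for 4 and 5 hours; their difference equals the slope.
    pred <- predict(fit, newdata = data.frame(hours_studied = c(4, 5)))
    gain <- unname(pred[2] - pred[1])

    print(slope)
    print(gain)
    ```

    The estimated slope will be close to, but not exactly, 2.5 because of the added noise.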

  4. Understanding the null hypothesis

    In hypothesis testing, what does the null hypothesis typically state?

    1. The variables are strongly correlated
    2. There is no effect or no difference
    3. The regression line always fits the data perfectly
    4. There is always a significant effect

    Explanation: The null hypothesis usually claims there is no effect or no difference, serving as the default assumption to be tested. A claim that an effect exists belongs to the alternative hypothesis, not the null, and no hypothesis asserts that an effect is "always" significant. Strong correlation is likewise not asserted by the null hypothesis. A perfect regression fit is a property of a model, not a statement tested as a null hypothesis.

  5. Assumptions in linear regression

    Which assumption must be checked before performing a simple linear regression analysis in R?

    1. Variables are all categorical
    2. Linearity between predictor and response
    3. Constant sample size
    4. No missing data in any software

    Explanation: One key assumption in linear regression is that the relationship between the predictor and the response variable is linear. Sample size should be adequate but does not need to be constant. Simple linear regression requires a numeric response, so a model in which every variable is categorical would not fit. Missing data should be handled, but its absence is a data-quality matter rather than a regression assumption.
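
    One common way to check linearity is to look at the residuals of a fitted line: a systematic curve in them suggests the straight-line model is missing structure. The sketch below uses simulated data (an assumption for illustration) where the true relationship is quadratic.

    ```r
    # Simulated data where the true relationship is curved, not linear.
    set.seed(11)
    x <- seq(0, 10, length.out = 50)
    y_curved <- 3 + 2 * x^2 + rnorm(50, sd = 1)

    fit_curved <- lm(y_curved ~ x)
    r <- resid(fit_curved)

    # Residuals are positive at both ends and negative in the middle:
    # a U-shaped pattern that signals a violation of linearity.
    print(round(r[c(1, 25, 50)], 2))

    # In an interactive session, the usual visual check is:
    # plot(fitted(fit_curved), resid(fit_curved)); abline(h = 0)
    ```

    With a truly linear relationship, the residuals would scatter around zero with no visible trend.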

  6. Chi-squared test usage

    When would you use the chisq.test function in R to analyze data?

    1. Creating scatterplots
    2. Comparing categorical variables in a contingency table
    3. Comparing means from two numeric samples
    4. Testing correlation between two continuous variables

    Explanation: chisq.test is appropriate for analyzing associations between categorical variables in a contingency table. Comparing means between numeric samples would instead use t.test. Testing correlation is performed with cor.test, and scatterplots are for visualization, not hypothesis testing.
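
    As a rough illustration, chisq.test can be applied to a 2x2 table of counts. The counts and the row and column labels below are invented for the example.

    ```r
    # Hypothetical contingency table: group (rows) by outcome (columns).
    counts <- matrix(
      c(30, 10,
        20, 40),
      nrow = 2, byrow = TRUE,
      dimnames = list(
        group = c("treatment", "control"),
        outcome = c("improved", "not_improved")
      )
    )

    # For 2x2 tables, chisq.test applies Yates' continuity correction by default.
    res <- chisq.test(counts)

    print(res$expected)  # expected counts under independence
    print(res$p.value)
    ```

    The expected counts come from the row and column totals; the test measures how far the observed counts depart from them.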

  7. Two-tailed vs. one-tailed test

    Which statement best describes a two-tailed hypothesis test in R?

    1. It requires more samples than a one-tailed test
    2. It tests for any difference, regardless of direction
    3. It only tests if one mean is greater than another
    4. It is used only in regression models

    Explanation: A two-tailed test checks for a difference in either direction, so it can detect a mean that is higher or lower than the reference. Testing only whether one mean is greater than another describes a one-tailed test. Neither sample size requirements nor use in regression models defines a two-tailed test.
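
    In t.test, the direction is controlled by the alternative argument, which defaults to "two.sided". The sketch below runs all three versions on the same simulated sample (the data are an assumption for illustration).

    ```r
    # Hypothetical sample tested against a mean of 0.
    set.seed(3)
    x <- rnorm(20, mean = 0.5)

    two     <- t.test(x, mu = 0)                           # alternative = "two.sided"
    greater <- t.test(x, mu = 0, alternative = "greater")  # H1: mean > 0
    less    <- t.test(x, mu = 0, alternative = "less")     # H1: mean < 0

    print(c(two = two$p.value, greater = greater$p.value, less = less$p.value))

    # For the t-test, the two-sided p-value is twice the smaller one-sided p-value.
    print(2 * min(greater$p.value, less$p.value))
    ```

    The direction of a one-tailed test should be chosen before looking at the data, not after seeing which side gives the smaller p-value.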

  8. Significance level concept

    What does a significance level (often set at 0.05) represent in hypothesis testing?

    1. The estimated mean difference
    2. The strength of the correlation
    3. The maximum probability of committing a Type I error
    4. The minimum sample size required

    Explanation: The significance level, or alpha, determines the maximum probability of making a Type I error: wrongly rejecting the null hypothesis. It does not relate to sample size or correlation strength. The estimated mean difference is a test outcome, not a significance threshold.
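
    The link between alpha and the Type I error rate can be illustrated with a small simulation. In this sketch the null hypothesis is true in every repetition, so each rejection is a Type I error; the sample size and number of repetitions are arbitrary choices.

    ```r
    # Simulate many t-tests where H0 (mean = 0) is actually true.
    set.seed(123)
    alpha <- 0.05
    p_values <- replicate(5000, t.test(rnorm(30), mu = 0)$p.value)

    # Share of tests that wrongly reject H0; expected to be close to alpha.
    type1_rate <- mean(p_values < alpha)
    print(type1_rate)
    ```

    Lowering alpha reduces this rate, at the cost of making real effects harder to detect.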

  9. Using lm for regression

    What is the main purpose of the lm function in R?

    1. Drawing histograms
    2. Fitting linear regression models
    3. Calculating chi-squared values
    4. Simulating random data

    Explanation: The lm function is used specifically to fit linear regression models in R. Calculating chi-squared values involves other functions like chisq.test. Simulating data uses functions like rnorm, while drawing histograms is done with hist, not lm.
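
    A minimal example of lm, using the cars dataset that ships with R (stopping distance versus speed). The dataset is chosen here only for convenience.

    ```r
    # Fit stopping distance as a linear function of speed.
    fit <- lm(dist ~ speed, data = cars)

    print(coef(fit))                   # intercept and slope
    print(summary(fit)$coefficients)   # estimates, standard errors, t values, p-values
    print(summary(fit)$r.squared)      # proportion of variance explained
    ```

    The formula dist ~ speed reads as "dist modeled by speed"; further predictors can be added with +.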

  10. Interpreting an ANOVA result

    If the ANOVA table in R shows a p-value less than 0.05, what does this suggest about the group means?

    1. All group means are identical
    2. Regression cannot be performed
    3. At least one group mean is significantly different
    4. No variance exists in the data

    Explanation: A p-value below 0.05 in ANOVA indicates that at least one group mean differs significantly from the others, although it does not identify which one. It does not mean all group means are the same or that variance is absent. ANOVA results do not preclude carrying out regression, so that distractor is incorrect.
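
    A one-way ANOVA can be sketched with aov on the built-in PlantGrowth dataset (plant weight across three groups). The dataset is used here purely as a convenient example.

    ```r
    # One-way ANOVA: does mean weight differ across the three groups?
    fit <- aov(weight ~ group, data = PlantGrowth)
    anova_table <- summary(fit)
    print(anova_table)

    # Extract the p-value for the group effect from the ANOVA table.
    p_value <- anova_table[[1]][["Pr(>F)"]][1]
    print(p_value)

    # A post-hoc test such as Tukey's HSD indicates which pairs of groups differ.
    print(TukeyHSD(fit))
    ```

    The ANOVA p-value answers only whether any difference exists; the pairwise comparisons are a separate follow-up step.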