Assess your understanding of hypothesis testing techniques and regression analysis in R with these foundational questions. Strengthen your knowledge of interpreting p-values, selecting statistical tests, and applying linear regression concepts for data analysis.
If the p-value from a t-test in R is 0.03 and the significance level is 0.05, what should you conclude about the null hypothesis?
Explanation: A p-value of 0.03 is less than the significance level of 0.05, so you should reject the null hypothesis, indicating significant evidence against it. Accepting the null hypothesis is incorrect, as statistical testing does not prove the null is true. Increasing sample size is unrelated to interpreting results after the test. The test is not invalid simply because the p-value is low.
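As a minimal sketch, the decision rule described above can be written as a comparison between the p-value and the significance level. The helper name decide is hypothetical, introduced only for illustration; the values 0.03 and 0.05 come from the question.

```r
# Hypothetical helper: compare a p-value to a significance level
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) {
    "reject H0"
  } else {
    "fail to reject H0"  # note: this is not the same as accepting H0
  }
}

decision <- decide(0.03, alpha = 0.05)
print(decision)
```

The wording "fail to reject" in the second branch is deliberate: a large p-value does not show that the null hypothesis is true.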
Which function in R would you typically use to perform a two-sample t-test comparing means of two independent groups?
Explanation: The t.test function compares the means of two groups, making it the correct choice here. chi.sq.test is not a standard R function; the similarly named chisq.test tests associations in categorical data, not differences in means. anova.test is not a standard function name either, and lm fits linear regression models rather than running a t-test directly.
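A short example of a two-sample t-test may help here. The two vectors below are made-up scores used only for illustration; by default t.test runs Welch's version of the test, which does not assume equal variances.

```r
# Made-up scores for two independent groups (illustrative data only)
group_a <- c(72, 85, 78, 90, 66, 81, 75, 88)
group_b <- c(64, 70, 69, 77, 61, 73, 68, 72)

# Welch two-sample t-test (var.equal = FALSE is the default)
result <- t.test(group_a, group_b)

print(result$estimate)  # sample means of the two groups
print(result$p.value)   # p-value for the difference in means
```

Passing var.equal = TRUE would request the classic pooled-variance test instead.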
If the coefficient for 'hours_studied' is 2.5 in a simple linear regression predicting test scores in R, what does this mean?
Explanation: A coefficient of 2.5 for 'hours_studied' means that each additional hour studied is associated with a 2.5 point increase in the predicted test score. It does not relate to model significance, so that option is incorrect. The negative change is not supported by a positive coefficient, and the intercept is a separate parameter from the slope.
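To make the interpretation concrete, the sketch below fits a simple regression on a tiny made-up dataset that was constructed so the fitted slope works out to 2.5. The data frame and the column name test_score are assumptions for illustration; only hours_studied and the value 2.5 come from the question.

```r
# Made-up data, built so the least-squares slope is 2.5 and the intercept is 50
study <- data.frame(
  hours_studied = c(1, 2, 3, 4, 5),
  test_score    = c(53.5, 53, 59.5, 58, 63.5)
)

fit <- lm(test_score ~ hours_studied, data = study)
print(coef(fit))  # intercept and slope

# The slope is the change in the predicted score per additional hour
pred <- predict(fit, newdata = data.frame(hours_studied = c(4, 5)))
print(unname(pred[2] - pred[1]))
```

The difference between the two predictions equals the slope, which is what "a 2.5 point increase per additional hour" means.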
In hypothesis testing, what does the null hypothesis typically state?
Explanation: The null hypothesis usually claims there is no effect or difference, serving as the default assumption to be tested. A claim that an effect exists belongs to the alternative hypothesis, not the null. The null hypothesis does not assert a strong correlation, and a perfect fit in regression is unrelated to what a null hypothesis states.
Which assumption must be checked before performing a simple linear regression analysis in R?
Explanation: One key assumption in linear regression is that the relationship between the predictor and the response variable is linear. Sample size should be adequate, but there is no requirement that it be constant. Simple linear regression needs a numeric response variable, so a dataset in which every variable is categorical would not fit. Missing data should be addressed, but the absence of missing values is a data-quality concern rather than an assumption of the regression model.
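One common way to check the linearity assumption is a residuals-versus-fitted plot. The sketch below uses the built-in cars dataset purely as a stand-in example; it is not part of the quiz.

```r
# Fit a simple regression on the built-in 'cars' dataset (stand-in example)
fit <- lm(dist ~ speed, data = cars)

# Residuals vs fitted values: a curved pattern would suggest non-linearity
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)  # reference line at zero
```

If the points scatter around the zero line without a clear curve, a linear relationship is a reasonable working assumption.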
When would you use the chisq.test function in R to analyze data?
Explanation: chisq.test is appropriate for analyzing associations between categorical variables in a contingency table. Comparing means between numeric samples would instead use t.test. Testing correlation is performed with cor.test, and scatterplots are for visualization, not hypothesis testing.
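As an illustration, the sketch below runs chisq.test on a small 2x2 contingency table. The counts and the labels group and preference are invented for the example.

```r
# Invented 2x2 contingency table of counts
counts <- matrix(
  c(30, 20,
    15, 35),
  nrow = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B"), preference = c("yes", "no"))
)

# For 2x2 tables, chisq.test applies Yates' continuity correction by default
result <- chisq.test(counts)

print(result$expected)  # expected counts under independence
print(result$p.value)   # p-value for the test of association
```

Checking result$expected is a useful habit, since the chi-squared approximation is less reliable when expected counts are small.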
Which statement best describes a two-tailed hypothesis test in R?
Explanation: A two-tailed test checks for a difference in either direction, rather than only for a greater or only for a lesser value. Testing only whether one mean is greater describes a one-tailed test. Neither sample size requirements nor the use of a regression model defines a two-tailed test.
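In t.test, the direction of the test is chosen with the alternative argument ("two.sided", "less", or "greater"). The sketch below compares a two-tailed and a one-tailed test on the same made-up sample; the data and the reference value mu = 5 are assumptions for illustration.

```r
# Made-up sample; we test against a hypothesised mean of 5
x <- c(5.1, 4.9, 5.6, 5.8, 5.3, 5.7, 5.2, 5.5)

two_sided <- t.test(x, mu = 5, alternative = "two.sided")  # the default
one_sided <- t.test(x, mu = 5, alternative = "greater")

print(two_sided$p.value)
print(one_sided$p.value)
```

Because the sample mean here lies above 5, the one-tailed p-value in the "greater" direction is half the two-tailed one.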
What does a significance level (often set at 0.05) represent in hypothesis testing?
Explanation: The significance level, or alpha, sets the maximum probability of making a Type I error, that is, rejecting a null hypothesis that is actually true. It does not relate to sample size or correlation strength. The estimated mean difference is a test outcome, not a significance threshold.
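A small simulation can illustrate what alpha controls. In the sketch below, both samples are drawn from the same distribution, so the null hypothesis is true by construction and every rejection is a Type I error. The seed, sample sizes, and number of repetitions are arbitrary choices.

```r
set.seed(1)
alpha  <- 0.05
n_sims <- 2000

# Repeatedly test two samples drawn from the same normal distribution
p_values <- replicate(n_sims, {
  x <- rnorm(20)
  y <- rnorm(20)
  t.test(x, y)$p.value
})

# Proportion of false rejections; it should land close to alpha
type1_rate <- mean(p_values < alpha)
print(type1_rate)
```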
What is the main purpose of the lm function in R?
Explanation: The lm function is used specifically to fit linear regression models in R. Calculating chi-squared values involves other functions like chisq.test. Simulating data uses functions like rnorm, while drawing histograms is done with hist, not lm.
If the ANOVA table in R shows a p-value less than 0.05, what does this suggest about the group means?
Explanation: A p-value below 0.05 in ANOVA indicates that at least one group mean differs significantly from at least one other; it does not identify which groups differ. It does not mean all group means are the same or that variance is absent. ANOVA results do not preclude carrying out regression, so that distractor is incorrect.
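The sketch below shows one way to obtain such an ANOVA table with aov and extract its p-value. The data frame and its three groups are invented for illustration; TukeyHSD is shown as one possible follow-up for seeing which pairs of groups differ.

```r
# Invented measurements for three groups
scores <- data.frame(
  value = c(10, 12, 11, 13, 12,
            14, 15, 13, 16, 15,
            20, 22, 21, 19, 23),
  group = factor(rep(c("A", "B", "C"), each = 5))
)

fit <- aov(value ~ group, data = scores)
print(summary(fit))  # the ANOVA table

# Extract the p-value for the group effect from the table
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
print(p_value < 0.05)

# Pairwise comparisons to see which group means differ
print(TukeyHSD(fit))
```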