Mastering Missing Data: Choosing Between Mean, Median, Mode, or Drop Quiz

  1. Missing Data Basics

    When you encounter missing values in a dataset, which strategy involves removing entire rows that contain missing values?

    1. A. Dropping
    2. B. Averaging
    3. C. Interpolating
    4. D. Filling with minimum
    5. E. Standardizing
  2. Mean Imputation

    If a column of numbers has missing values, which method replaces the missing values with the arithmetic average of the existing data?

    1. A. Mode imputation
    2. B. Mean imputation
    3. C. Median replacement
    4. D. Maximum imputation
    5. E. Random sampling
  3. Median Replacement

    For a dataset containing outliers, which statistic gives the most robust replacement for missing values: the mean, the median, or the mode?

    1. A. Mean
    2. B. Minimum
    3. C. Median
    4. D. Mode
    5. E. All give the same result
  4. Mode for Categorical Data

    When handling missing values in a categorical column (e.g., color: red, blue, green), which imputation method is most appropriate?

    1. A. Median
    2. B. Mean
    3. C. Mode
    4. D. Drop the column
    5. E. Use next value
  5. Imputation vs. Deletion

    If a dataset has only a few missing values, which action generally preserves more of the data: imputing or dropping?

    1. A. Dropping
    2. B. Imputing
    3. C. Replacing all
    4. D. Ignoring missing values
    5. E. Normalizing
  6. Mean Weakness

    Why might replacing missing values with the mean not be the best choice in a skewed dataset?

    1. A. Mean always equals median
    2. B. Mean is sensitive to outliers
    3. C. Mean is always higher
    4. D. Mean ignores missing values
    5. E. Mean is for categorical data
  7. Unique Situations

    If every value in a column is missing, what is the most logical action?

    1. A. Fill with mode
    2. B. Replace with zeros
    3. C. Drop the column
    4. D. Forward fill
    5. E. Fill with random values
  8. Continuous vs. Categorical

    Which method is least appropriate for dealing with missing data in a continuous numerical variable?

    1. A. Mean imputation
    2. B. Median imputation
    3. C. Zero replacement
    4. D. Mode imputation
    5. E. Interpolation
  9. Consequence of Dropping

    What is a potential downside of dropping all rows with missing data from your dataset?

    1. A. Increased accuracy
    2. B. Reduced sample size
    3. C. Less missing data
    4. D. More outliers
    5. E. Extra variables
  10. Real-life Example

    Suppose a dataset records student scores, and some scores are missing. Which fill value would be distorted the most if one student scored much higher than the rest?

    1. A. Drop missing scores
    2. B. Fill with mode
    3. C. Fill with mean
    4. D. Fill with median
    5. E. Fill with minimum
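
For review after attempting the quiz, here is a minimal sketch of the four strategies the questions compare. It uses only Python's standard `statistics` module. The score list and the `impute` helper are hypothetical, made up for illustration; a real project would more likely use a data-frame library.

```python
from statistics import mean, median, mode

# Hypothetical student scores; None marks a missing value.
# One score (100) is much higher than the rest.
scores = [70, 72, 75, None, 71, 100, None, 72]

# Values that are actually present.
observed = [s for s in scores if s is not None]


def impute(values, fill):
    """Return a copy of values with every None replaced by fill."""
    return [fill if v is None else v for v in values]


# Candidate fill values computed from the observed scores only.
mean_fill = mean(observed)      # pulled upward by the single high score
median_fill = median(observed)  # middle of the sorted scores
mode_fill = mode(observed)      # most frequent score

# Strategy 1: drop the missing entries (sample size shrinks).
dropped = observed

# Strategies 2-4: keep every entry and fill the gaps.
filled_mean = impute(scores, mean_fill)
filled_median = impute(scores, median_fill)
filled_mode = impute(scores, mode_fill)

print("rows kept after dropping:", len(dropped), "of", len(scores))
print("mean fill:", mean_fill)
print("median fill:", median_fill)
print("mode fill:", mode_fill)
```

Comparing the printed fill values shows how a single high score shifts the mean more than the median, and comparing the row counts shows the cost of dropping.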