Quiz on Handling Missing or Inconsistent Data During Data Preprocessing — Questions & Answers

This quiz contains 10 questions. Below is a complete reference of all questions, answer choices, and correct answers. You can use this section to review after taking the interactive quiz above.

  1. Question 1: Types of Missing Data

    Which type of missing data occurs when the probability of missingness is related to the observed data but not to the missing data itself?

    • Missing at Random (MAR)
    • Missing Completely at Random (MCAR)
    • Missing Not at Random (MNAR)
    • Missing by Randomization (MBR)
    • Missign at Ramdon (typographical error)

    Correct answer: Missing at Random (MAR)
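
    To make the MAR definition concrete, here is a small simulation sketch. The column names (age, income), the sample size, and the missingness probabilities are illustrative assumptions, not part of the quiz. The chance that income is missing depends only on the fully observed age column, never on the income value itself.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 1000

# Hypothetical data: age is always observed, income will get holes.
age = rng.integers(20, 80, size=n)
income = rng.normal(50_000, 10_000, size=n)

# MAR mechanism: missingness probability is a function of observed age only.
p_missing = np.where(age > 50, 0.8, 0.1)
mask = rng.random(n) < p_missing

df = pd.DataFrame({"age": age, "income": income})
df.loc[mask, "income"] = np.nan

# Missingness rates differ by age group, which is the signature of MAR here.
rate_older = df.loc[df["age"] > 50, "income"].isna().mean()
rate_younger = df.loc[df["age"] <= 50, "income"].isna().mean()
```

    If the mask had been computed from income instead of age, the same code would simulate MNAR.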

  2. Question 2: Identifying Missing Values in Pandas

    Given the pandas DataFrame df, which code correctly counts the total number of missing values in the entire DataFrame?

    • df.isnull().sum().sum()
    • df.null().count()
    • df.missings().total()
    • df.isnull().count().sum()
    • df.hasna().sum().totl()

    Correct answer: df.isnull().sum().sum()
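
    The difference between the correct answer and the count() distractor can be checked on a toy DataFrame. The data below is a made-up example: the first sum() gives one count per column and the second collapses those counts into a single total.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with three missing cells in total.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": ["Oslo", "Lima", None, None],
})

missing_per_column = df.isnull().sum()          # Series: age -> 1, city -> 2
total_missing = int(missing_per_column.sum())   # 3

# count() counts non-null entries of the boolean mask, and the mask itself
# has no nulls, so this is just the number of cells (8), not the missing count.
total_cells = int(df.isnull().count().sum())
```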

  3. Question 3: Handling Missing Numerical Data

    What is a common and simple method for handling missing values in numerical columns during preprocessing?

    • Imputing the mean
    • Imputing with the value 'unknown'
    • Replacing with zero always
    • Duplicating nearby values
    • Removing the value entirely

    Correct answer: Imputing the mean
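
    A minimal sketch of mean imputation, assuming a hypothetical numeric column named income. Series.mean() skips NaN by default, so the mean is computed from the observed values only.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with one missing value.
df = pd.DataFrame({"income": [40.0, np.nan, 60.0, 80.0]})

mean_income = df["income"].mean()                # (40 + 60 + 80) / 3 = 60.0
df["income"] = df["income"].fillna(mean_income)  # NaN replaced by 60.0
```

    Mean imputation is simple but shrinks the variance of the column, and the median is often preferred when the data is skewed.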

  4. Question 4: Dropping Rows with Missing Data

    What effect does using dropna(axis=0) in pandas have on a DataFrame?

    • It removes rows containing missing values.
    • It fills missing values with zeros.
    • It removes columns containing missing values.
    • It replaces missing values with NA.
    • It converts missing values to None.

    Correct answer: It removes rows containing missing values.
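
    The two axis values are easy to mix up, so here is a short comparison on made-up data. dropna returns a new DataFrame and leaves the original unchanged unless inplace=True is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" has one missing value, column "b" has none.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

rows_dropped = df.dropna(axis=0)  # drops row 1, keeps rows 0 and 2
cols_dropped = df.dropna(axis=1)  # drops column "a", keeps column "b"
```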

  5. Question 5: Imputing Categorical Variables

    When handling missing values in a categorical feature, what is a common imputation strategy?

    • Filling with the mode
    • Filling with the median
    • Filling with the minimum
    • Using mean imputation
    • Replacing with NULL

    Correct answer: Filling with the mode
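
    A sketch of mode imputation for a categorical column, using a hypothetical color column. Series.mode() returns a Series because ties are possible, so the first entry is taken here.

```python
import pandas as pd

# Hypothetical categorical column with two missing entries.
df = pd.DataFrame({"color": ["red", "blue", None, "red", None]})

most_frequent = df["color"].mode()[0]              # "red"; missing values are ignored
df["color"] = df["color"].fillna(most_frequent)
```

    When several categories tie, mode()[0] picks the first in sorted order, which may or may not be the right choice for a given dataset.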

  6. Question 6: Detecting Inconsistent Data

    If a column for gender contains entries like 'Male', 'male', 'M', and 'femal', what inconsistency does this scenario illustrate?

    • Inconsistent data representation
    • Data redundancy
    • Missing data at random
    • Irrelevant feature
    • Datta duplications (with typo)

    Correct answer: Inconsistent data representation
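
    One way to surface this kind of inconsistency is to compare the number of distinct labels before and after basic normalization. The sketch below uses the four entries from the question. Lowercasing merges 'Male' and 'male', but the abbreviation and the misspelling still need an explicit fix.

```python
import pandas as pd

# The entries listed in the question.
gender = pd.Series(["Male", "male", "M", "femal"])

raw_distinct = gender.nunique()                    # 4 distinct spellings

# Trimming whitespace and lowercasing resolves case differences only.
normalized = gender.str.strip().str.lower()
normalized_distinct = normalized.nunique()         # 3: "male", "m", "femal"

# value_counts() lists the remaining labels for manual review.
remaining_labels = normalized.value_counts()
```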

  7. Question 7: Using Interpolation Methods

    In time series data, which pandas method allows you to fill missing values by inferring from neighboring points?

    • interpolate()
    • insertna()
    • replace_nulls()
    • imputeAvg()
    • backfillna()

    Correct answer: interpolate()
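
    A minimal sketch of interpolate() on a daily series. The dates and values are invented for illustration. The default method is linear and treats the points as equally spaced.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with a two-day gap.
s = pd.Series(
    [10.0, np.nan, np.nan, 40.0],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

filled = s.interpolate()  # linear by default: 10, 20, 30, 40
```

    For irregularly spaced timestamps, s.interpolate(method="time") weights the fill by the actual time gaps instead.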

  8. Question 8: Flagging Missing Values

    Why might you want to add a binary indicator column to flag missing values before imputation?

    • To help models learn patterns involving missingness
    • To reduce the size of your dataset
    • To drop more columns in preprocessing
    • To normalize the data
    • To replace all values with the mean

    Correct answer: To help models learn patterns involving missingness
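
    The order of operations matters here: the flag has to be created before the values are filled, otherwise the information about which rows were missing is lost. A sketch with hypothetical column names, using the median as the fill value:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with two missing values.
df = pd.DataFrame({"income": [40.0, np.nan, 60.0, np.nan]})

# Step 1: record missingness as a 0/1 indicator before imputing.
df["income_missing"] = df["income"].isna().astype(int)

# Step 2: impute; the median of the observed values (40 and 60) is 50.
df["income"] = df["income"].fillna(df["income"].median())
```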

  9. Question 9: Consistent Value Formatting

    Which of the following methods is effective for handling inconsistent categorical values such as 'USA', 'United States', and 'us'?

    • Standardizing values using mapping or replacement
    • Imputing missing values with the most frequent value
    • Removing duplicate records only
    • Random forest imputation
    • Sorting the DataFrame

    Correct answer: Standardizing values using mapping or replacement
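
    Standardizing by mapping can be sketched as follows. The canonical label 'USA' and the lookup table are assumptions chosen for the example, and the fourth entry with stray whitespace is added to show why trimming comes first.

```python
import pandas as pd

# The three variants from the question plus one hypothetical padded entry.
country = pd.Series(["USA", "United States", "us", " usa "])

# Assumed lookup table: normalized spelling -> canonical label.
canonical = {"usa": "USA", "united states": "USA", "us": "USA"}

# Normalize whitespace and case, then map to the canonical label.
cleaned = country.str.strip().str.lower().map(canonical)
```

    Series.map returns NaN for any value not in the lookup table, which is a convenient way to spot variants the table does not yet cover.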

  10. Question 10: Removing Columns with Too Much Missing Data

    What is an appropriate action if a feature column in your dataset contains more than 80% missing values?

    • Consider dropping the column
    • Always fill missing values with zeros
    • Remove only the rows where this feature is missing
    • Impute the mean without any checks
    • Convert the column to a categorical variable

    Correct answer: Consider dropping the column
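
    A sketch of how such columns might be found and dropped. The 0.8 threshold comes from the question; the column names and data are invented. Taking the mean of the boolean mask gives the fraction of missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "sparse" is 90% missing, "partial" is 20% missing.
df = pd.DataFrame({
    "id": range(10),
    "sparse": [1.0] + [np.nan] * 9,
    "partial": [np.nan] * 2 + [1.0] * 8,
})

missing_fraction = df.isnull().mean()                     # per-column share of NaN
to_drop = missing_fraction[missing_fraction > 0.8].index  # columns above the threshold
reduced = df.drop(columns=to_drop)
```

    The word "consider" in the answer is deliberate: a mostly empty column can still be worth keeping if the missingness itself is informative.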