Quiz: Mastering Handling Missing or Inconsistent Data in Datasets — Questions & Answers

This quiz contains 10 questions. Below is a complete reference of all questions, answer choices, and correct answers for review.

  1. Question 1: Imputation Basics

    Which method replaces missing numeric data with the most common value in a column?

    • Mode imputation
    • Median imputation
    • Mean imputation
    • Forward fill
    • Drop row

    Correct answer: Mode imputation
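
    A minimal pandas sketch of mode imputation, using a made-up `rooms` column. `Series.mode()` returns a Series because several values can tie for most common, so the example takes the first entry. Mode imputation is most natural for discrete or categorical columns; for continuous data the mean or median is more common.

```python
import pandas as pd

# Hypothetical data: a discrete numeric column with two missing values.
df = pd.DataFrame({"rooms": [2, 3, 3, None, 4, 3, None]})

# mode() ignores NaN by default and may return several tied values; take the first.
most_common = df["rooms"].mode().iloc[0]

# Replace every missing value with the most frequent observed value.
df["rooms_filled"] = df["rooms"].fillna(most_common)
print(most_common)
print(df["rooms_filled"].tolist())
```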

  2. Question 2: Handling Categorical Missing Values

    If you have a missing value in a categorical feature, what is a commonly used placeholder to represent missing data?

    • 'Unknown'
    • 'Mean'
    • '333'
    • 'Nullify'
    • 'Random'

    Correct answer: 'Unknown'
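
    A short sketch with a hypothetical `color` column: filling gaps with an explicit 'Unknown' label keeps those rows and turns "missing" into its own visible category instead of a silent gap.

```python
import pandas as pd

# Hypothetical categorical column with two missing entries.
df = pd.DataFrame({"color": ["red", None, "blue", None, "red"]})

# Use an explicit placeholder category for the missing values.
df["color"] = df["color"].fillna("Unknown")
print(df["color"].value_counts())
```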

  3. Question 3: Recognizing Inconsistent Data

    A country column contains the values 'USA', 'U.S.A.', 'United States', and 'usa'. What best describes this scenario?

    • Inconsistent data formatting
    • Complete missing data
    • Properly encoded dataset
    • Noisy numeric data
    • Type conversion error

    Correct answer: Inconsistent data formatting
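
    One way to fix formatting inconsistencies like these, sketched under assumptions: normalize case and punctuation first so trivial variants collapse, then map known aliases onto one canonical label. The `aliases` dictionary is hypothetical and would need to cover the variants in your own data; unmapped values fall back to their original text.

```python
import pandas as pd

df = pd.DataFrame({"country": ["USA", "U.S.A.", "United States", "usa", "Canada"]})

# Step 1: lowercase, drop periods, trim whitespace so 'U.S.A.' and 'usa' become identical.
normalized = (
    df["country"]
    .str.lower()
    .str.replace(".", "", regex=False)
    .str.strip()
)

# Step 2: map normalized aliases to one canonical label (hypothetical mapping).
aliases = {"usa": "United States", "united states": "United States", "canada": "Canada"}

# Values missing from the mapping become NaN after map(), so fall back to the original.
df["country_clean"] = normalized.map(aliases).fillna(df["country"])
print(df)
```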

  4. Question 4: Detecting Missing Data with Pandas

    Which Pandas function would you use to detect missing values in a DataFrame?

    • isnull()
    • missing()
    • fillna()
    • imputate()
    • dropdf()

    Correct answer: isnull()
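
    A small illustration with made-up data: `isnull()` (and its alias `isna()`) returns a Boolean DataFrame of the same shape, and summing it counts the missing values in each column. Both `np.nan` and `None` are treated as missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 31], "city": ["Oslo", "Lima", None]})

mask = df.isnull()        # True wherever a value is missing
print(mask)
print(df.isnull().sum())  # number of missing values per column
```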

  5. Question 5: Dropping Rows or Columns

    In Pandas, which function is used to remove all rows with at least one missing value?

    • dropna()
    • removerows()
    • trmna()
    • isnotna()
    • clear()

    Correct answer: dropna()
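
    A sketch of common `dropna()` variants on a hypothetical three-column DataFrame. With no arguments it drops any row containing at least one missing value; `axis`, `how`, and `subset` change what gets dropped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, np.nan, 3, 4],
    "b": ["x", "y", None, "z"],
    "c": [10, 20, 30, 40],
})

rows_complete = df.dropna()               # drop rows with any missing value (how="any")
cols_complete = df.dropna(axis=1)         # drop columns with any missing value instead
rows_all_missing = df.dropna(how="all")   # drop only rows where every value is missing
rows_a_present = df.dropna(subset=["a"])  # only consider column 'a' when deciding
print(rows_complete)
```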

  6. Question 6: Forward Fill Usage

    What does the 'ffill' method do when handling missing data in a time series?

    • It fills missing values with the previous non-null value.
    • It replaces missing values with zeros.
    • It drops all remaining nulls at the end of the data.
    • It duplicates the next valid entry.
    • It fills missing values with random values.

    Correct answer: It fills missing values with the previous non-null value.
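
    A minimal sketch on hypothetical daily readings. `ffill()` carries the last valid value forward; a leading gap stays missing because nothing precedes it, and `limit` caps how many consecutive gaps get filled. The "next valid entry" behavior from the distractor is what `bfill()` does.

```python
import numpy as np
import pandas as pd

# Hypothetical daily sensor readings with gaps.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
temps = pd.Series([np.nan, 20.5, np.nan, np.nan, 22.0, np.nan], index=idx)

filled = temps.ffill()          # propagate the previous non-null value forward
limited = temps.ffill(limit=1)  # fill at most one consecutive gap
print(filled)
```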

  7. Question 7: Numeric Data Imputation

    If you want to minimize the effect of outliers when filling missing numeric values, which method should you use?

    • Median imputation
    • Mean imputation
    • Mode imputation
    • Random sample imputation
    • Zero imputation

    Correct answer: Median imputation
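
    A quick comparison on made-up income figures (in thousands) containing one extreme outlier. The outlier drags the mean far above typical values, while the median stays near the middle of the data, so median imputation produces a more representative fill value.

```python
import numpy as np
import pandas as pd

# Hypothetical incomes with one outlier (1000) and one missing value.
income = pd.Series([40, 42, 45, 47, 50, 1000, np.nan])

mean_fill = income.fillna(income.mean())      # mean is pulled up by the outlier
median_fill = income.fillna(income.median())  # median is robust to it
print(income.mean(), income.median())
```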

  8. Question 8: Data Consistency

    Given a dataset whose gender column contains 'M', 'F', and 'Femail', what is the correct way to make the data consistent?

    • Standardize entries like 'Femail' to 'F'
    • Leave all as is
    • Replace all 'M' and 'F' with 'Unknown'
    • Remove all rows with 'Femail'
    • Ignore inconsistencies

    Correct answer: Standardize entries like 'Femail' to 'F'
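
    A sketch of the standardization step, assuming 'M' and 'F' are the only valid codes. The `corrections` mapping is hypothetical and would grow as new misspellings turn up; checking against an allowed set afterwards flags any remaining unexpected values rather than silently keeping them.

```python
import pandas as pd

df = pd.DataFrame({"gender": ["M", "F", "Femail", "F", "M"]})

# Map known misspellings onto the canonical codes (hypothetical mapping).
corrections = {"Femail": "F"}
df["gender"] = df["gender"].replace(corrections)

# Flag anything still outside the allowed set for manual review.
allowed = {"M", "F"}
unexpected = df.loc[~df["gender"].isin(allowed), "gender"]
print(df["gender"].value_counts())
```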

  9. Question 9: Advanced Imputation

    You decide to use KNN imputation on missing values. What does KNN imputation primarily rely on?

    • Similarity to nearby data points
    • Random guessing
    • Filling with overall mean
    • Dropping rows with nulls
    • Reversing the data columns

    Correct answer: Similarity to nearby data points
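
    To show the idea behind KNN imputation, here is a simplified from-scratch sketch in NumPy: for each row with gaps, find the k most similar complete rows (Euclidean distance on the features that row does have) and fill each gap with those neighbors' average. The function name `knn_impute` is hypothetical. Library implementations such as scikit-learn's `KNNImputer` handle more cases (for example, donors that are themselves partially missing). Because the method is distance-based, features are usually scaled first.

```python
import numpy as np

def knn_impute(X, k=2):
    """Hypothetical helper: fill NaNs using the mean of the k nearest complete rows.

    Simplified sketch: only rows with no missing values act as donors,
    and it assumes at least k such rows exist.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]  # donor rows with no gaps
    for i, row in enumerate(X):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        # Euclidean distance computed only on the features this row has.
        dists = np.sqrt(((complete[:, observed] - row[observed]) ** 2).sum(axis=1))
        nearest = complete[np.argsort(dists)[:k]]
        # Average the neighbors' values for the missing features.
        filled[i, missing] = nearest[:, missing].mean(axis=0)
    return filled

X = np.array([
    [1.0, 2.0],
    [1.1, 2.1],
    [8.0, 9.0],
    [8.2, 9.2],
    [1.05, np.nan],  # resembles the first two rows on feature 0
])
result = knn_impute(X, k=2)
print(result[4, 1])         # average of the two nearest rows' feature-1 values
print(np.nanmean(X[:, 1]))  # plain column mean, for comparison
```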

  10. Question 10: Identifying Incomplete Rows

    Which code snippet identifies rows in a Pandas DataFrame 'df' that contain any missing values?

    • df[df.isnull().any(axis=1)]
    • df[df.notnull().all(axis=0)]
    • df.fillna(df.mean())
    • df[df.empty()]
    • df.remove(nan=True)

    Correct answer: df[df.isnull().any(axis=1)]
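
    A small example with made-up data showing why `axis=1` matters: `any(axis=1)` reduces across the columns of each row, giving one Boolean per row that can index the DataFrame. With `axis=0` the reduction runs down each column instead, giving one value per column, which is why the second answer choice does not select rows. Negating the condition with `notnull().all(axis=1)` keeps only complete rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan, 3], "b": [4, 5, np.nan], "c": [7, 8, 9]})

row_has_missing = df.isnull().any(axis=1)  # one bool per row
incomplete = df[row_has_missing]           # rows with at least one missing value
complete = df[df.notnull().all(axis=1)]    # rows with no missing values

per_column = df.notnull().all(axis=0)      # one bool per column, not per row
print(incomplete)
```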