Quiz on Handling Missing or Inconsistent Data During Data Preprocessing

  1. Types of Missing Data

    Which type of missing data occurs when the probability of missingness is related to the observed data but not to the missing data itself?

    1. Missing at Random (MAR)
    2. Missing Completely at Random (MCAR)
    3. Missing Not at Random (MNAR)
    4. Missing by Randomization (MBR)
    5. Missign at Ramdon (typographical error)
  2. Identifying Missing Values in Pandas

    Given the pandas DataFrame df, which code correctly counts the total number of missing values in the entire DataFrame?

    1. df.isnull().sum().sum()
    2. df.null().count()
    3. df.missings().total()
    4. df.isnull().count().sum()
    5. df.hasna().sum().totl()
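
For reference, a minimal sketch of counting missing values, assuming a small hypothetical DataFrame (the column names and values are made up for illustration). It also shows why chaining `count()` instead of `sum()` gives a different number.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value in "age", two in "city".
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": ["Oslo", "Lima", None, None],
})

# isnull() yields a boolean frame; the first sum() counts True per column,
# the second sum() adds the per-column counts together.
per_column = df.isnull().sum()
total_missing = int(per_column.sum())

# count() on the boolean frame counts every cell, missing or not,
# so this is the number of cells rather than the number of missing values.
total_cells = int(df.isnull().count().sum())

print(per_column)
print(total_missing, total_cells)
```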
  3. Handling Missing Numerical Data

    What is a common and simple method for handling missing values in numerical columns during preprocessing?

    1. Imputing the mean
    2. Imputing with the value 'unknown'
    3. Replacing with zero always
    4. Duplicating nearby values
    5. Removing the value entirely
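
For reference, mean imputation of a numerical column might look like the sketch below. The `income` column and its values are hypothetical, and whether the mean is a sensible fill value depends on the distribution of the data.

```python
import numpy as np
import pandas as pd

# Hypothetical numerical column with one missing entry.
df = pd.DataFrame({"income": [40.0, np.nan, 60.0, 50.0]})

# mean() skips NaN by default, so it averages the three observed values.
mean_income = df["income"].mean()

# fillna() returns a new Series with NaN replaced by the mean.
df["income_filled"] = df["income"].fillna(mean_income)

print(df)
```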
  4. Dropping Rows with Missing Data

    What effect does using dropna(axis=0) in pandas have on a DataFrame?

    1. It removes rows containing missing values.
    2. It fills missing values with zeros.
    3. It removes columns containing missing values.
    4. It replaces missing values with NA.
    5. It converts missing values to None.
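
For reference, the sketch below contrasts `dropna(axis=0)` with `dropna(axis=1)` on a small hypothetical DataFrame, so the effect of the `axis` argument can be checked directly.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" has one missing value, column "b" has none.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# axis=0 drops every row that contains at least one missing value.
rows_dropped = df.dropna(axis=0)

# axis=1 drops every column that contains at least one missing value.
cols_dropped = df.dropna(axis=1)

print(rows_dropped)
print(cols_dropped)
```

Neither call modifies `df` itself; both return a new DataFrame.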
  5. Imputing Categorical Variables

    When handling missing values in a categorical feature, what is a common imputation strategy?

    1. Filling with the mode
    2. Filling with the median
    3. Filling with the minimum
    4. Using mean imputation
    5. Replacing with NULL
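
For reference, a minimal sketch of mode imputation for a categorical feature. The `color` Series is hypothetical. Note that `mode()` can return several values when there is a tie; taking the first one is an assumption made here for simplicity.

```python
import pandas as pd

# Hypothetical categorical feature with two missing entries.
s = pd.Series(["red", "blue", None, "red", None], name="color")

# mode() ignores missing values by default and returns a Series,
# because several categories can tie for most frequent.
most_frequent = s.mode().iloc[0]

# Replace the missing entries with the most frequent category.
filled = s.fillna(most_frequent)

print(filled.tolist())
```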
  6. Detecting Inconsistent Data

    If a column for gender contains entries like 'Male', 'male', 'M', and 'femal', what inconsistency does this scenario illustrate?

    1. Inconsistent data representation
    2. Data redundancy
    3. Missing data at random
    4. Irrelevant feature
    5. Datta duplications (with typo)
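
For reference, one way to surface this kind of problem is to compare the distinct values in a column against a set of expected labels. The sketch below uses the four entries from the question; the expected set `{"male", "female"}` is an assumption made for illustration.

```python
import pandas as pd

# The entries from the question above.
gender = pd.Series(["Male", "male", "M", "femal"])

# Raw distinct values: case differences alone produce extra categories.
raw_distinct = gender.nunique()

# Trimming whitespace and lowercasing removes the case differences only.
normalized = gender.str.strip().str.lower()
normalized_distinct = normalized.nunique()

# Anything outside the expected labels still needs manual mapping.
expected = {"male", "female"}  # assumed canonical labels
unexpected = sorted(set(normalized) - expected)

print(raw_distinct, normalized_distinct, unexpected)
```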
  7. Using Interpolation Methods

    In time series data, which pandas method allows you to fill missing values by inferring from neighboring points?

    1. interpolate()
    2. insertna()
    3. replace_nulls()
    4. imputeAvg()
    5. backfillna()
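
For reference, a minimal sketch of filling gaps in a time series from neighboring points. The dates and values are hypothetical. With its default settings, `interpolate()` fills linearly and treats the observations as equally spaced.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with two isolated gaps.
idx = pd.date_range("2024-01-01", periods=5, freq="D")
ts = pd.Series([10.0, np.nan, 14.0, np.nan, 18.0], index=idx)

# Default method is linear: each gap becomes the midpoint of its neighbors.
filled = ts.interpolate()

print(filled)
```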
  8. Flagging Missing Values

    Why might you want to add a binary indicator column to flag missing values before imputation?

    1. To help models learn patterns involving missingness
    2. To reduce the size of your dataset
    3. To drop more columns in preprocessing
    4. To normalize the data
    5. To replace all values with the mean
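
For reference, a sketch of adding a missingness indicator before imputing. The `income` column and the `income_missing` name are hypothetical, and median imputation is only one possible choice for the fill step.

```python
import numpy as np
import pandas as pd

# Hypothetical numerical column with two missing entries.
df = pd.DataFrame({"income": [40.0, np.nan, 60.0, np.nan]})

# Record where values were missing BEFORE imputing; after the fill,
# this information would no longer be recoverable from "income".
df["income_missing"] = df["income"].isna().astype(int)

# Impute with the median of the observed values (an illustrative choice).
df["income"] = df["income"].fillna(df["income"].median())

print(df)
```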
  9. Consistent Value Formatting

    Which of the following methods is effective for handling inconsistent categorical values such as 'USA', 'United States', and 'us'?

    1. Standardizing values using mapping or replacement
    2. Imputing missing values with the most frequent value
    3. Removing duplicate records only
    4. Random forest imputation
    5. Sorting the DataFrame
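
For reference, standardizing such values with a mapping could be sketched as follows. The canonical label "US" and the extra "Canada" entry are assumptions added for illustration; values without a mapping are left unchanged.

```python
import pandas as pd

# The three variants from the question, plus one unrelated value.
country = pd.Series(["USA", "United States", "us", "Canada"])

# Hypothetical lookup from a normalized key to a canonical label.
canonical = {"usa": "US", "united states": "US", "us": "US"}

# Normalize case and whitespace so the lookup needs fewer entries.
key = country.str.strip().str.lower()

# map() returns NaN for keys not in the dict; fall back to the original value.
standardized = key.map(canonical).fillna(country)

print(standardized.tolist())
```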
  10. Removing Columns with Too Much Missing Data

    What is an appropriate action if a feature column in your dataset contains more than 80% missing values?

    1. Consider dropping the column
    2. Always fill missing values with zeros
    3. Remove only the rows where this feature is missing
    4. Impute the mean without any checks
    5. Convert the column to a categorical variable
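
For reference, a sketch of finding and dropping columns whose share of missing values exceeds a threshold. The DataFrame is hypothetical, and the 0.8 cutoff simply mirrors the 80% figure in the question; it is not a universal rule.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "rare" is 90% missing, "score" is 10% missing.
df = pd.DataFrame({
    "id": list(range(1, 11)),
    "rare": [np.nan] * 9 + [1.0],
    "score": [0.5, np.nan, 0.7, 0.9, 0.4, 0.6, 0.8, 0.3, 0.2, 0.1],
})

# The mean of the boolean mask is the fraction of missing values per column.
missing_share = df.isna().mean()

threshold = 0.8  # assumed cutoff, taken from the question
to_drop = missing_share[missing_share > threshold].index.tolist()

# drop() returns a new DataFrame without the selected columns.
reduced = df.drop(columns=to_drop)

print(missing_share)
print(list(reduced.columns))
```

Inspecting `missing_share` before dropping leaves room to keep a sparse column when it is known to be important.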