Quiz on Handling Missing or Inconsistent Data During Data Preprocessing

Types of Missing Data
Which type of missing data occurs when the probability of missingness is related to the observed data but not to the missing data itself?
1. Missing at Random (MAR)
2. Missing Completely at Random (MCAR)
3. Missing Not at Random (MNAR)
4. Missing by Randomization (MBR)
5. Missign at Ramdon (typographical error)
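For reference, the mechanism described in the question can be simulated. The sketch below is illustrative only: the column names (age, income), the age cutoff of 50, and the probabilities are assumptions chosen for the example, not part of the quiz.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
age = rng.integers(20, 70, size=n)            # fully observed variable
income = rng.normal(50_000, 10_000, size=n)   # variable that will receive gaps

# Missingness in income depends only on the observed age column,
# never on the income value itself (assumed probabilities).
p_missing = np.where(age >= 50, 0.5, 0.05)
mask = rng.random(n) < p_missing

df = pd.DataFrame({"age": age, "income": income})
df.loc[mask, "income"] = np.nan

# Share of missing income within each age group
rate_older = df.loc[df["age"] >= 50, "income"].isnull().mean()
rate_younger = df.loc[df["age"] < 50, "income"].isnull().mean()
```

Comparing rate_older with rate_younger shows that the missingness rate differs by an observed column, which is the pattern the question describes.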
Identifying Missing Values in Pandas
Given the pandas DataFrame df, which code correctly counts the total number of missing values in the entire DataFrame?
1. df.isnull().sum().sum()
2. df.null().count()
3. df.missings().total()
4. df.isnull().count().sum()
5. df.hasna().sum().totl()
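For reference, a minimal sketch of how per-column and whole-frame counts differ. The DataFrame contents are hypothetical and only serve to make the numbers checkable.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame; column names are illustrative only.
df = pd.DataFrame({
    "age": [25, np.nan, 31, np.nan],
    "city": ["Oslo", None, "Lima", "Pune"],
})

per_column = df.isnull().sum()       # missing values per column
total_missing = per_column.sum()     # missing values in the whole frame

# count() on the boolean mask counts every cell, missing or not,
# so this gives the number of cells rather than the number of gaps.
total_cells = df.isnull().count().sum()
```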
Handling Missing Numerical Data
What is a common and simple method for handling missing values in numerical columns during preprocessing?
1. Imputing the mean
2. Imputing with the value 'unknown'
3. Replacing with zero always
4. Duplicating nearby values
5. Removing the value entirely
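For reference, a minimal sketch of mean imputation on a numerical column. The column name income and its values are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical numerical column with one gap
df = pd.DataFrame({"income": [40.0, np.nan, 60.0, 80.0]})

# mean() skips missing values by default, so it uses 40, 60 and 80
mean_income = df["income"].mean()

# Replace the gap with the column mean
df["income"] = df["income"].fillna(mean_income)
```

In a modeling workflow the mean would normally be computed on the training split only and then reused for validation and test data; that step is left out here for brevity.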
Dropping Rows with Missing Data
What effect does using dropna(axis=0) in pandas have on a DataFrame?
1. It removes rows containing missing values.
2. It fills missing values with zeros.
3. It removes columns containing missing values.
4. It replaces missing values with NA.
5. It converts missing values to None.
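For reference, a small sketch contrasting the two axis settings of dropna. The frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "a" has one gap, column "b" is complete
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

rows_dropped = df.dropna(axis=0)   # drops every row that contains a gap
cols_dropped = df.dropna(axis=1)   # drops every column that contains a gap
```

Neither call modifies df in place; each returns a new DataFrame.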
Imputing Categorical Variables
When handling missing values in a categorical feature, what is a common imputation strategy?
1. Filling with the mode
2. Filling with the median
3. Filling with the minimum
4. Using mean imputation
5. Replacing with NULL
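For reference, a minimal sketch of filling a categorical column with its most frequent value. The column name color and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical categorical column with two gaps
df = pd.DataFrame({"color": ["red", "blue", None, "red", None]})

# mode() returns a Series (there can be ties), so take the first entry
most_frequent = df["color"].mode()[0]

df["color"] = df["color"].fillna(most_frequent)
```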
Detecting Inconsistent Data
If a column for gender contains entries like 'Male', 'male', 'M', and 'femal', what inconsistency does this scenario illustrate?
1. Inconsistent data representation
2. Data redundancy
3. Missing data at random
4. Irrelevant feature
5. Datta duplications (with typo)
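For reference, one way to surface this kind of problem is to list the distinct labels and their frequencies. The sketch below uses the four entries from the question; the normalization steps are an assumed first pass, not a complete cleaning routine.

```python
import pandas as pd

# The entries from the question, held in a hypothetical Series
gender = pd.Series(["Male", "male", "M", "femal"])

# Distinct raw labels: case variants show up as separate categories
raw_labels = gender.unique()

# Trimming whitespace and lowercasing merges pure case variants,
# but abbreviations ("m") and misspellings ("femal") remain visible
normalized = gender.str.strip().str.lower()
counts = normalized.value_counts()
```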
Using Interpolation Methods
In time series data, which pandas method allows you to fill missing values by inferring from neighboring points?
1. interpolate()
2. insertna()
3. replace_nulls()
4. imputeAvg()
5. backfillna()
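For reference, a minimal sketch of filling gaps in a short daily series from its neighbors. The dates and values are hypothetical, and the default linear method is used, which treats the points as equally spaced.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with two consecutive gaps
s = pd.Series(
    [10.0, np.nan, np.nan, 40.0],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# Default method is linear: gaps are filled on a straight line
# between the nearest known values on either side
filled = s.interpolate()
```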
Flagging Missing Values
Why might you want to add a binary indicator column to flag missing values before imputation?
1. To help models learn patterns involving missingness
2. To reduce the size of your dataset
3. To drop more columns in preprocessing
4. To normalize the data
5. To replace all values with the mean
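For reference, a minimal sketch of adding an indicator column before imputing. The names income and income_missing are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical numerical column with one gap
df = pd.DataFrame({"income": [40.0, np.nan, 60.0]})

# Record where the value was missing BEFORE imputing,
# otherwise that information is lost
df["income_missing"] = df["income"].isnull().astype(int)

# Impute afterwards (mean imputation chosen only for illustration)
df["income"] = df["income"].fillna(df["income"].mean())
```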
Consistent Value Formatting
Which of the following methods is effective for handling inconsistent categorical values such as 'USA', 'United States', and 'us'?
1. Standardizing values using mapping or replacement
2. Imputing missing values with the most frequent value
3. Removing duplicate records only
4. Random forest imputation
5. Sorting the DataFrame
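For reference, a minimal sketch of collapsing the three spellings from the question into one label. The mapping dictionary and the choice of "USA" as the canonical form are assumptions for the example.

```python
import pandas as pd

# The three spellings from the question, held in a hypothetical Series
country = pd.Series(["USA", "United States", "us"])

# Assumed lookup table from normalized spelling to canonical label
canonical = {"usa": "USA", "united states": "USA", "us": "USA"}

# Normalize case and whitespace first so the lookup table stays small
normalized = country.str.strip().str.lower()

# map() yields NaN for unmapped spellings; fall back to the original value
cleaned = normalized.map(canonical).fillna(country)
```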
Removing Columns with Too Much Missing Data
What is an appropriate action if a feature column in your dataset contains more than 80% missing values?
1. Consider dropping the column
2. Always fill missing values with zeros
3. Remove only the rows where this feature is missing
4. Impute the mean without any checks
5. Convert the column to a categorical variable
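For reference, a minimal sketch of finding and removing columns above a missingness threshold. The frame is hypothetical, and the 0.8 cutoff simply mirrors the 80% figure in the question; whether to drop a column still depends on how informative it is.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "notes" is 90% missing, "score" is 10% missing
df = pd.DataFrame({
    "id": list(range(1, 11)),
    "notes": [np.nan] * 9 + ["x"],
    "score": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
})

# The mean of the boolean mask is the fraction of missing values per column
missing_share = df.isnull().mean()

threshold = 0.8  # assumed cutoff taken from the question
to_drop = missing_share[missing_share > threshold].index

reduced = df.drop(columns=to_drop)
```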