Quiz: Mastering the Handling of Missing or Inconsistent Data in Datasets

Imputation Basics
Which method replaces missing numeric data with the most common value in a column?
1. Mode imputation
2. Median imputation
3. Mean imputation
4. Forward fill
5. Drop row
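To make the options above concrete, here is a minimal sketch of filling gaps with the most frequent value. The Series name `ages` and its values are made up for illustration and are not part of the quiz.

```python
import pandas as pd

# Hypothetical numeric column with two missing entries.
ages = pd.Series([30, 25, 30, None, 41, None])

# mode() returns a Series because several values can tie; take the first one.
most_common = ages.mode().iloc[0]

# Replace every missing entry with that most frequent value.
filled = ages.fillna(most_common)
```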
Handling Categorical Missing Values
If you have a missing value in a categorical feature, what is a commonly used placeholder to represent missing data?
1. 'Unknown'
2. 'Mean'
3. '333'
4. 'Nullify'
5. 'Random'
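As a rough illustration of a placeholder category, the sketch below fills gaps in a hypothetical `colors` column. The column name and values are assumptions chosen only for the example.

```python
import pandas as pd

# Hypothetical categorical column with two missing entries.
colors = pd.Series(["red", None, "blue", None])

# Use an explicit label so the missing entries remain visible as their own category.
filled = colors.fillna("Unknown")
```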
Recognizing Inconsistent Data
A country column contains the values 'USA', 'U.S.A.', 'United States', and 'usa'. What best describes this scenario?
1. Inconsistent data formatting
2. Completely missing data
3. Properly encoded dataset
4. Noisy numeric data
5. Type conversion error
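One possible way to clean up spellings like those in the question is to normalize case and punctuation, then map to a single label. The `canonical` mapping below is a hypothetical lookup table written for this example, not a standard one.

```python
import pandas as pd

countries = pd.Series(["USA", "U.S.A.", "United States", "usa"])

# Hypothetical lookup from a normalized spelling to one canonical label.
canonical = {"usa": "USA", "united states": "USA"}

# Lowercase, drop periods, and trim whitespace before looking up the label.
normalized = countries.str.lower().str.replace(".", "", regex=False).str.strip()

# Spellings absent from the mapping would become NaN and need a follow-up check.
standardized = normalized.map(canonical)
```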
Detecting Missing Data with Pandas
Which Pandas function would you use to detect missing values in a DataFrame?
1. isnull()
2. missing()
3. fillna()
4. imputate()
5. dropdf()
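A short sketch of how missing values might be detected and counted in practice. The DataFrame `df` and its columns are invented for the example.

```python
import pandas as pd

# Hypothetical frame with one missing value in each column.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", "y", None]})

# isnull() returns a boolean frame of the same shape: True where a value is missing.
mask = df.isnull()

# Summing the booleans counts the missing values per column.
missing_per_column = mask.sum()
```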
Dropping Rows or Columns
In Pandas, which function is used to remove all rows with at least one missing value?
1. dropna()
2. removerows()
3. trmna()
4. isnotna()
5. clear()
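For illustration, the sketch below drops incomplete rows and, for comparison, incomplete columns. The frame and its column names are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical frame: columns "a" and "b" each have a gap, "c" is complete.
df = pd.DataFrame({
    "a": [1.0, None, 3.0],
    "b": ["x", "y", None],
    "c": [7, 8, 9],
})

# Default behaviour (axis=0): remove every row that has at least one missing value.
complete_rows = df.dropna()

# axis=1 removes every column that has at least one missing value instead.
complete_cols = df.dropna(axis=1)
```

Dropping is simple but discards data, so it is usually worth checking how many rows remain afterwards.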
Forward Fill Usage
What does the 'ffill' method do when handling missing data in a time series?
1. It fills missing values with the previous non-null value.
2. It replaces missing values with zeros.
3. It drops all remaining nulls at the end of the data.
4. It duplicates the next valid entry.
5. It fills missing values with random values.
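A minimal sketch of forward fill on a small daily series. The name `readings`, the dates, and the values are hypothetical. Note that a gap at the very start has no earlier value to copy, so it stays missing.

```python
import pandas as pd

# Hypothetical daily readings with gaps, including one at the very start.
readings = pd.Series(
    [None, 10.0, None, None, 12.5, None],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

# Propagate the last observed value forward into each gap.
filled = readings.ffill()
```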
Numeric Data Imputation
If you want to minimize the effect of outliers when filling missing numeric values, which method should you use?
1. Median imputation
2. Mean imputation
3. Mode imputation
4. Random sample imputation
5. Zero imputation
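To see how an outlier affects the fill value, the sketch below compares the mean and the median on a hypothetical `incomes` column containing one extreme entry. The numbers are invented for the example.

```python
import pandas as pd

# Hypothetical column: three typical values, one gap, and one extreme outlier.
incomes = pd.Series([40, 42, 45, None, 1000])

# Both statistics skip missing values by default.
mean_value = incomes.mean()      # pulled far upward by the outlier
median_value = incomes.median()  # stays close to the typical values

filled_with_median = incomes.fillna(median_value)
```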
Data Consistency
Given a dataset with 'M', 'F', and 'Femail' as entries for gender, what is the correct way to make the data consistent?
1. Standardize entries like 'Femail' to 'F'
2. Leave all as is
3. Replace all 'M' and 'F' with 'Unknown'
4. Remove all rows with 'Femail'
5. Ignore inconsistencies
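One way such a misspelling could be corrected in pandas is with an explicit replacement mapping. The Series `gender` and the mapping are assumptions for this sketch.

```python
import pandas as pd

# Hypothetical column containing one misspelled entry.
gender = pd.Series(["M", "F", "Femail", "F"])

# Map the known misspelling to its standard code; other values are left unchanged.
cleaned = gender.replace({"Femail": "F"})
```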
Advanced Imputation
You decide to use KNN imputation on missing values. What does KNN imputation primarily rely on?
1. Similarity to nearby data points
2. Random guessing
3. Filling with overall mean
4. Dropping rows with nulls
5. Reversing the data columns
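As a hedged sketch, KNN imputation can be tried with scikit-learn's `KNNImputer`, assuming scikit-learn and NumPy are installed. The array `X` is made-up data, and `n_neighbors=2` is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical feature matrix with two missing entries.
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, 4.0, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

# Each gap is filled with the average of that feature over the 2 most similar rows,
# where similarity is measured on the features both rows have present.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
```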
Identifying Incomplete Rows
Which code snippet identifies rows in a Pandas DataFrame 'df' that contain any missing values?
1. df[df.isnull().any(axis=1)]
2. df[df.notnull().all(axis=0)]
3. df.fillna(df.mean())
4. df[df.empty()]
5. df.remove(nan=True)
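A small sketch of selecting incomplete rows with a boolean mask. The contents of `df` are invented for the example.

```python
import pandas as pd

# Hypothetical frame: row 0 is complete, rows 1 and 2 each have a gap.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", "y", None]})

# any(axis=1) collapses each row to True if at least one column is missing.
row_has_missing = df.isnull().any(axis=1)

# Boolean indexing keeps only the rows flagged True.
incomplete = df[row_has_missing]
```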
