Types of Missing Data
Which type of missing data occurs when the probability of missingness is related to the observed data but not to the missing data itself?
- Missing at Random (MAR)
- Missing Completely at Random (MCAR)
- Missing Not at Random (MNAR)
- Missing by Randomization (MBR)
- Missign at Ramdon (typographical error)
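The idea of missingness that depends on observed data can be illustrated with a small simulation. This is only a sketch: the column names, group labels, and probabilities are made up for illustration, and the chance that income is missing depends on the observed age column, not on income itself.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 1000

# Hypothetical data: 'age' is always observed, 'income' will get gaps.
age = rng.integers(20, 70, size=n)
income = 1000 + 50 * age + rng.normal(0, 100, size=n)
df = pd.DataFrame({"age": age, "income": income})

# Assumed mechanism: the probability that income is missing depends
# only on the observed age group (0.4 for 50+, 0.1 otherwise).
age_group = np.where(df["age"] >= 50, "50+", "under 50")
p_missing = np.where(age_group == "50+", 0.4, 0.1)
df.loc[rng.random(n) < p_missing, "income"] = np.nan

# The missingness rate differs by observed age group.
rate_by_group = df["income"].isna().groupby(age_group).mean()
print(rate_by_group)
```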
Identifying Missing Values in Pandas
Given the pandas DataFrame df, which code correctly counts the total number of missing values in the entire DataFrame?
- df.isnull().sum().sum()
- df.null().count()
- df.missings().total()
- df.isnull().count().sum()
- df.hasna().sum().totl()
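A minimal sketch of counting missing values, assuming a small made-up DataFrame (the column names and values are illustrative only). It also shows why chaining count() instead of sum() gives a different number.

```python
import numpy as np
import pandas as pd

# Hypothetical example data, not taken from the quiz.
df = pd.DataFrame({
    "age": [25, np.nan, 31, np.nan],
    "city": ["Oslo", "Lima", None, "Pune"],
})

per_column = df.isnull().sum()          # missing values per column
total_missing = int(per_column.sum())   # missing values in the whole DataFrame

# count() on the boolean mask counts every cell, missing or not.
total_cells = int(df.isnull().count().sum())

print(total_missing, total_cells)
```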
Handling Missing Numerical Data
What is a common and simple method for handling missing values in numerical columns during preprocessing?
- Imputing the mean
- Imputing with the value 'unknown'
- Replacing with zero always
- Duplicating nearby values
- Removing the value entirely
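Mean imputation for a numerical column might look like the sketch below. The income column is a hypothetical example. In practice the mean would usually be computed on the training split only.

```python
import numpy as np
import pandas as pd

# Hypothetical numerical column with one gap.
df = pd.DataFrame({"income": [40.0, np.nan, 60.0, 50.0]})

# Series.mean() skips NaN by default, so this averages 40, 60 and 50.
mean_income = df["income"].mean()

# Replace the missing entry with the column mean.
df["income"] = df["income"].fillna(mean_income)
print(df)
```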
Dropping Rows with Missing Data
What effect does using dropna(axis=0) in pandas have on a DataFrame?
- It removes rows containing missing values.
- It fills missing values with zeros.
- It removes columns containing missing values.
- It replaces missing values with NA.
- It converts missing values to None.
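The difference between the two axis settings of dropna can be seen in a short sketch. The DataFrame here is a made-up example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column 'a' has one missing value in row 1.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

rows_dropped = df.dropna(axis=0)  # drops rows that contain any NaN
cols_dropped = df.dropna(axis=1)  # drops columns that contain any NaN

print(rows_dropped)
print(cols_dropped)
```

dropna returns a new DataFrame by default, so df itself is left unchanged.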
Imputing Categorical Variables
When handling missing values in a categorical feature, what is a common imputation strategy?
- Filling with the mode
- Filling with the median
- Filling with the minimum
- Using mean imputation
- Replacing with NULL
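For a categorical column, filling with the most frequent value can be sketched as follows. The color column is a hypothetical example.

```python
import pandas as pd

# Hypothetical categorical column with two gaps.
df = pd.DataFrame({"color": ["red", "blue", None, "red", None]})

# mode() returns a Series (there can be ties), so take the first entry.
most_frequent = df["color"].mode()[0]

df["color"] = df["color"].fillna(most_frequent)
print(df)
```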
Detecting Inconsistent Data
If a column for gender contains entries like 'Male', 'male', 'M', and 'femal', what inconsistency does this scenario illustrate?
- Inconsistent data representation
- Data redundancy
- Missing data at random
- Irrelevant feature
- Datta duplications (with typo)
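One way to spot this kind of problem is to list the distinct values in the column. The sketch below uses a made-up Series that mirrors the entries in the question. Lowercasing removes case differences, but the abbreviation and the misspelling still need manual review.

```python
import pandas as pd

# Hypothetical gender column mirroring the entries in the question.
gender = pd.Series(["Male", "male", "M", "femal", "Male"])

# Raw counts expose spelling and case variants of the same category.
raw_counts = gender.value_counts()

# Trimming whitespace and lowercasing collapses case-only variants.
normalized = gender.str.strip().str.lower()
distinct = sorted(normalized.unique())

print(raw_counts)
print(distinct)
```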
Using Interpolation Methods
In time series data, which pandas method allows you to fill missing values by inferring from neighboring points?
- interpolate()
- insertna()
- replace_nulls()
- imputeAvg()
- backfillna()
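A minimal sketch of linear interpolation on a daily series follows. The dates and values are made up for illustration. With method="linear", pandas treats the points as equally spaced.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with two consecutive gaps.
idx = pd.date_range("2024-01-01", periods=5, freq="D")
s = pd.Series([10.0, np.nan, np.nan, 16.0, 18.0], index=idx)

# Fill the gaps on a straight line between the neighboring known points.
filled = s.interpolate(method="linear")
print(filled)
```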
Flagging Missing Values
Why might you want to add a binary indicator column to flag missing values before imputation?
- To help models learn patterns involving missingness
- To reduce the size of your dataset
- To drop more columns in preprocessing
- To normalize the data
- To replace all values with the mean
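Adding an indicator column before imputing might look like this sketch. The column names are hypothetical, and median imputation is just one possible fill.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
df = pd.DataFrame({"income": [40.0, np.nan, 60.0]})

# Record where values were missing BEFORE they are overwritten.
df["income_missing"] = df["income"].isna().astype(int)

# Then impute; here the median of the observed values is used.
df["income"] = df["income"].fillna(df["income"].median())
print(df)
```

The flag must be created first, because the information about which rows were missing is lost once the gaps are filled.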
Consistent Value Formatting
Which of the following methods is effective for handling inconsistent categorical values such as 'USA', 'United States', and 'us'?
- Standardizing values using mapping or replacement
- Imputing missing values with the most frequent value
- Removing duplicate records only
- Random forest imputation
- Sorting the DataFrame
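Standardizing variants with a mapping can be sketched as below. The mapping dictionary and the extra 'Canada' entry are illustrative assumptions. A real dataset would need its own list of variants.

```python
import pandas as pd

# Hypothetical country column with several spellings of one category.
country = pd.Series(["USA", "United States", "us", "Canada"])

# Assumed mapping from normalized variants to one canonical label.
mapping = {"usa": "USA", "united states": "USA", "us": "USA"}

# Normalize case and whitespace, map known variants, and keep the
# original value wherever the mapping has no entry.
key = country.str.strip().str.lower()
standardized = key.map(mapping).fillna(country)
print(standardized)
```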
Removing Columns with Too Much Missing Data
What is an appropriate action if a feature column in your dataset contains more than 80% missing values?
- Consider dropping the column
- Always fill missing values with zeros
- Remove only the rows where this feature is missing
- Impute the mean without any checks
- Convert the column to a categorical variable
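Finding and dropping columns above a missingness threshold could be sketched as follows. The 0.8 cutoff comes from the question. The column names and data are made up, and whether to drop should still depend on how informative the feature is.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one column is 90% missing, another 50% missing.
df = pd.DataFrame({
    "id": range(10),
    "mostly_empty": [1.0] + [np.nan] * 9,
    "some_missing": [np.nan, 2.0] * 5,
})

# The mean of the boolean mask is the fraction of missing values per column.
missing_share = df.isna().mean()

# Select columns whose missing share exceeds the threshold and drop them.
to_drop = missing_share[missing_share > 0.8].index
reduced = df.drop(columns=to_drop)
print(missing_share)
print(list(reduced.columns))
```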