Quiz: Mastering Encoding of Categorical Variables for Machine Learning

Basics of Categorical Encoding
Which encoding technique transforms each unique category value into a new binary column in the dataset?
1. One-Hot Encoding
2. Label Encoding
3. Ordinal Encoding
4. Hash Encoding
5. Frequency Encoding
Label Encoding Usage
Label Encoding is most appropriate for which kind of categorical variable?
1. Ordinal variables with a meaningful order
2. Nominal variables with no intrinsic order
3. Continuous variables
4. Variables with missing values only
5. Numerical features
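As an illustration of integer-encoding a variable whose categories have a natural order, the following minimal sketch maps hypothetical T-shirt sizes to their ranks. The size labels and the names SIZE_ORDER and size_to_code are assumptions made for this example only.

```python
# Hypothetical ordinal feature: the categories have a natural order.
SIZE_ORDER = ["small", "medium", "large"]

# Map each category to its rank so the integers preserve that order.
size_to_code = {size: rank for rank, size in enumerate(SIZE_ORDER)}

sizes = ["medium", "small", "large", "small"]
encoded = [size_to_code[s] for s in sizes]
print(encoded)  # [1, 0, 2, 0]
```

The order is spelled out explicitly here because an encoder that assigns codes alphabetically would give large=0, medium=1, small=2, which does not match the intended ranking.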
Implications of Incorrect Encoding
What could happen if you apply Label Encoding to nominal variables when building a regression model?
1. The model may mistakenly interpret the categories as ordered
2. The model will ignore the feature
3. No effect; label encoding always works
4. It accelerates model training
5. It automatically normalizes the data
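To make the effect concrete, here is a small sketch that fits a straight line to integer-coded categories using plain least squares. The colors, the per-category target means, and the integer codes are all hypothetical values chosen for illustration.

```python
# Hypothetical per-category target means for a nominal feature (no real order).
category_means = {"red": 10.0, "green": 50.0, "blue": 20.0}

# Arbitrary integer labels, as a label encoder might assign them.
codes = {"red": 0, "green": 1, "blue": 2}

xs = [codes[c] for c in category_means]
ys = list(category_means.values())

# Ordinary least-squares fit of y = intercept + slope * code.
n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
intercept = y_mean - slope * x_mean

# The fitted line must change by the same step between consecutive codes.
predictions = {c: intercept + slope * codes[c] for c in category_means}
for c in category_means:
    print(c, category_means[c], round(predictions[c], 2))
```

With these numbers the fitted values are roughly 21.67, 26.67, and 31.67, so the line cannot reproduce the value 50 for the middle code: the integer codes force the categories onto an evenly spaced scale that the data does not have.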
One-Hot Encoding and Feature Explosion
For a categorical feature with 50 unique values, how many columns will One-Hot Encoding produce (without dropping any column)?
1. 50
2. 49
3. 25
4. 2
5. 51
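The column count can be checked with a short pure-Python sketch of one-hot encoding. The helper one_hot and the category names cat_0 through cat_49 are hypothetical and exist only for this example.

```python
def one_hot(values):
    """Return a dict mapping each unique category to a 0/1 indicator column."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical feature with 50 unique values, each appearing twice.
values = [f"cat_{i}" for i in range(50)] * 2

columns = one_hot(values)
print(len(columns))  # 50
```

Each row has exactly one indicator set to 1, and the number of indicator columns equals the number of unique categories.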
Code Snippet: Pandas get_dummies
What does the code snippet pd.get_dummies(df['color']) return?
1. A DataFrame with one binary column for each unique color value
2. A list of unique values in the 'color' column
3. A Series with category frequencies
4. A single column with encoded integers
5. An error due to missing argument
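A runnable version of the snippet, assuming a small hypothetical DataFrame (only the 'color' column name comes from the question), might look like this:

```python
import pandas as pd

# Hypothetical data; only the 'color' column name comes from the question.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

dummies = pd.get_dummies(df["color"])
print(type(dummies).__name__)  # DataFrame
print(list(dummies.columns))   # ['blue', 'green', 'red']
print(dummies.shape)           # (4, 3)
```

Depending on the pandas version, the indicator values are stored as booleans or as 0/1 integers; the dtype argument of get_dummies can be used to request a specific type.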
Handling High Cardinality
Which encoding method is typically more efficient for categorical variables with high cardinality (many unique categories)?
1. Hash Encoding
2. One-Hot Encoding
3. Binary Encoding
4. Label Encoding
5. Dummy Variable Encoding
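The idea behind hashing a category into a fixed number of columns can be sketched with the standard library alone. The bucket count N_BUCKETS, the helper names, and the user_* categories are assumptions for illustration; this is not the implementation of any particular library.

```python
import hashlib

N_BUCKETS = 8  # hypothetical number of output columns, chosen for illustration


def hash_bucket(category, n_buckets=N_BUCKETS):
    """Map a category string to a stable bucket index in [0, n_buckets)."""
    digest = hashlib.sha256(category.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def hash_encode(category, n_buckets=N_BUCKETS):
    """Return a fixed-width indicator vector with a 1 in the hashed bucket."""
    vector = [0] * n_buckets
    vector[hash_bucket(category, n_buckets)] = 1
    return vector


# 1,000 hypothetical unique categories still produce only N_BUCKETS columns.
categories = [f"user_{i}" for i in range(1000)]
encoded = [hash_encode(c) for c in categories]
print(len(encoded[0]))  # 8
```

The width stays fixed no matter how many categories appear, at the cost of collisions: different categories can land in the same bucket. The sketch uses hashlib rather than the built-in hash() because string hashes from hash() vary between Python processes by default.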
Dropping First Column in One-Hot Encoding
Why might you set drop_first=True when using one-hot encoding in pandas?
1. To avoid multicollinearity by dropping one redundant column
2. To speed up computation by half
3. It encodes categories as numbers starting from 1
4. Required for all scikit-learn models
5. It preserves the original category names
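The effect of drop_first can be seen by comparing the two encodings side by side. The color values below are hypothetical example data.

```python
import pandas as pd

# Hypothetical categorical feature.
colors = pd.Series(["red", "green", "blue", "green"], name="color")

full = pd.get_dummies(colors)
reduced = pd.get_dummies(colors, drop_first=True)

print(list(full.columns))     # ['blue', 'green', 'red']
print(list(reduced.columns))  # ['green', 'red']

# In the full encoding every row sums to 1, so any one column is
# determined by the others; that redundancy is what drop_first removes.
print(full.astype(int).sum(axis=1).tolist())  # [1, 1, 1, 1]
```

A row with zeros in both remaining columns now stands for the dropped category ('blue' in this example), so no information is lost.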
Encoding for Tree-Based Models
Which encoding is generally acceptable for categorical features when using tree-based models such as Random Forest?
1. Label Encoding
2. One-Hot Encoding
3. Frequency Encoding
4. Hash Encoding
5. MinMax Encoding
Ordinal Encoding Pitfalls
What is a possible drawback of using Ordinal Encoding on nominal categorical features?
1. It introduces a false sense of order among categories
2. It creates too many columns
3. It's not supported in pandas
4. It only works for missing values
5. It is slower than one-hot encoding
Target Encoding Usage
What does Target Encoding (a.k.a. mean encoding) replace each category with?
1. The mean of the target variable for each category
2. The length of each string in the category
3. Randomly assigned unique integers
4. A new column per category
5. The total count of each category
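A minimal sketch of mean encoding with pandas follows. The column names city and price and all values are hypothetical; real uses typically add smoothing and compute the means on training folds only, which this sketch omits.

```python
import pandas as pd

# Hypothetical training data: a categorical feature and a numeric target.
df = pd.DataFrame(
    {
        "city": ["A", "A", "B", "B", "B", "C"],
        "price": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    }
)

# Mean of the target within each category.
category_means = df.groupby("city")["price"].mean()

# Replace each category with its target mean.
df["city_encoded"] = df["city"].map(category_means)
print(df["city_encoded"].tolist())  # [15.0, 15.0, 40.0, 40.0, 40.0, 60.0]
```

Because the encoded value is derived from the target itself, computing it on the same rows the model is trained on can leak target information, which is why the fold-based variant is usually preferred.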
