Quiz: Mastering Encoding of Categorical Variables for Machine Learning

  1. Basics of Categorical Encoding

    Which encoding technique transforms each unique category value into a new binary column in the dataset?

    1. One-Hot Encoding
    2. Label Encoding
    3. Ordinal Encoding
    4. Hash Encoding
    5. Frequency Encoding
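
    A minimal sketch (assuming scikit-learn; the color values are hypothetical) of how one-hot encoding turns each unique category into its own binary column:

        from sklearn.preprocessing import OneHotEncoder

        colors = [["red"], ["green"], ["blue"], ["green"]]
        encoder = OneHotEncoder()                 # returns a sparse matrix by default
        matrix = encoder.fit_transform(colors)
        print(encoder.categories_)                # [array(['blue', 'green', 'red'], dtype=object)]
        print(matrix.toarray().shape)             # (4, 3) -- one 0/1 column per unique value
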
  2. Label Encoding Usage

    When using Label Encoding, which kind of categorical variables is it most appropriate for?

    1. Ordinal variables with a meaningful order
    2. Nominal variables with no intrinsic order
    3. Continuous variables
    4. Variables with missing values only
    5. Numerical features
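
    A minimal sketch (assuming scikit-learn; the size labels are hypothetical) of LabelEncoder; note it assigns integers alphabetically, so a meaningful order usually has to be mapped explicitly:

        from sklearn.preprocessing import LabelEncoder

        sizes = ["small", "medium", "large", "medium"]
        encoder = LabelEncoder()
        codes = encoder.fit_transform(sizes)
        print(list(codes))             # [2, 1, 0, 1] -- alphabetical, not small < medium < large
        print(list(encoder.classes_))  # ['large', 'medium', 'small']
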
  3. Implications of Incorrect Encoding

    What could happen if you apply Label Encoding to nominal variables when building a regression model?

    1. The model may mistakenly interpret the categories as ordered
    2. The model will ignore the feature
    3. No effect; label encoding always works
    4. It accelerates model training
    5. It automatically normalizes the data
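
    A small illustration (hypothetical data) of the arbitrary numeric order that integer codes impose on a nominal feature:

        import pandas as pd

        cities = pd.Series(["paris", "tokyo", "oslo"])
        codes, uniques = pd.factorize(cities)
        print(codes.tolist(), list(uniques))  # [0, 1, 2] ['paris', 'tokyo', 'oslo']
        # A regression coefficient on this column treats oslo (2) as "greater than"
        # tokyo (1) and paris (0), an ordering with no real-world meaning.
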
  4. One-Hot Encoding and Feature Explosion

    For a categorical feature with 50 unique values, how many columns will One-Hot Encoding produce (without dropping any column)?

    1. 50
    2. 49
    3. 25
    4. 2
    5. 51
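
    A quick check (hypothetical data) of the resulting column count, with and without dropping the first level:

        import pandas as pd

        feature = pd.Series([f"cat_{i}" for i in range(50)])
        print(pd.get_dummies(feature).shape[1])                   # 50 columns
        print(pd.get_dummies(feature, drop_first=True).shape[1])  # 49 columns
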
  5. Code Snippet: Pandas get_dummies

    Given the code snippet: pd.get_dummies(df['color']), what does this code return?

    1. A DataFrame with one binary column for each unique color value
    2. A list of unique values in the 'color' column
    3. A Series with category frequencies
    4. A single column with encoded integers
    5. An error due to missing argument
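
    A runnable version of the snippet (with a hypothetical df) for inspecting what it returns:

        import pandas as pd

        df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})
        result = pd.get_dummies(df["color"])
        print(type(result).__name__)    # DataFrame
        print(result.columns.tolist())  # ['blue', 'green', 'red']
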
  6. Handling High Cardinality

    Which encoding method is typically more efficient for categorical variables with high cardinality (many unique categories)?

    1. Hash Encoding
    2. One-Hot Encoding
    3. Binary Encoding
    4. Label Encoding
    5. Dummy Variable Encoding
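
    A minimal sketch (assuming scikit-learn's FeatureHasher; the user IDs are hypothetical) of hashing a high-cardinality feature into a fixed number of columns:

        from sklearn.feature_extraction import FeatureHasher

        hasher = FeatureHasher(n_features=32, input_type="string")
        samples = [[f"user_{i}"] for i in range(10_000)]  # one category string per sample
        hashed = hasher.transform(samples)
        print(hashed.shape)  # (10000, 32) -- width stays fixed regardless of cardinality
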
  7. Dropping First Column in One-Hot Encoding

    Why might you set drop_first=True when using one-hot encoding in pandas?

    1. To avoid multicollinearity by dropping one redundant column
    2. To speed up computation by half
    3. It encodes categories as numbers from 1
    4. Required for all scikit-learn models
    5. It preserves the original category names
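
    A quick sketch (hypothetical data) of the effect of drop_first=True:

        import pandas as pd

        sizes = pd.Series(["S", "M", "L", "M"])
        print(pd.get_dummies(sizes).columns.tolist())                   # ['L', 'M', 'S']
        print(pd.get_dummies(sizes, drop_first=True).columns.tolist())  # ['M', 'S']
        # The dropped level is implied: a row with M=0 and S=0 can only be 'L'.
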
  8. Encoding for Tree-Based Models

    Which encoding is generally acceptable for categorical features when using tree-based models such as Random Forest?

    1. Label Encoding
    2. One-Hot Encoding
    3. Frequency Encoding
    4. Hash Encoding
    5. MinMax Encoding
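
    A minimal sketch (hypothetical data, assuming scikit-learn) of fitting a Random Forest on an integer-coded categorical feature; tree splits on the codes do not rely on a linear relationship:

        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier

        df = pd.DataFrame({"city": ["paris", "oslo", "tokyo", "oslo"],
                           "label": [1, 0, 1, 0]})
        df["city_code"] = df["city"].astype("category").cat.codes  # integer codes
        model = RandomForestClassifier(n_estimators=10, random_state=0)
        model.fit(df[["city_code"]], df["label"])
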
  9. Ordinal Encoding Pitfalls

    What is a possible drawback of using Ordinal Encoding on nominal categorical features?

    1. It introduces a false sense of order among categories
    2. It creates too many columns
    3. It's not supported in pandas
    4. It only works for missing values
    5. It is slower than one-hot encoding
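
    A minimal sketch (assuming scikit-learn's OrdinalEncoder; the colors are hypothetical) showing the ranking it implies on a nominal feature:

        from sklearn.preprocessing import OrdinalEncoder

        colors = [["red"], ["blue"], ["green"]]
        encoder = OrdinalEncoder()
        print(encoder.fit_transform(colors).ravel())  # [2. 0. 1.]
        # blue=0 < green=1 < red=2 is purely alphabetical, not a meaningful ranking
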
  10. Target Encoding Usage

    What does Target Encoding (a.k.a. mean encoding) replace each category with?

    1. The mean of the target variable for each category
    2. The length of each string in the category
    3. Randomly assigned unique integers
    4. A new column per category
    5. The total count of each category
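
    A minimal sketch (hypothetical data) of plain mean/target encoding with pandas; in practice, smoothing or out-of-fold encoding is typically added to limit target leakage:

        import pandas as pd

        df = pd.DataFrame({"city":   ["oslo", "oslo", "paris", "paris", "paris"],
                           "target": [1, 0, 1, 1, 0]})
        means = df.groupby("city")["target"].mean()  # oslo: 0.5, paris: 0.667
        df["city_encoded"] = df["city"].map(means)   # each category -> its target mean
        print(df)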