Getting Started with Data Cleaning and Preprocessing: Easy Quiz — Questions & Answers

This quiz contains 10 questions. Below is a complete reference of all questions, answer choices, and correct answers for review.

  1. Question 1: Identifying Missing Data

    When a dataset contains empty cells in the 'Age' column, which term best describes those empty values?

    • A. Missing values
    • B. Duplicated entries
    • C. Outliers
    • D. Noisy data
    • E. Meta values

    Correct answer: A. Missing values
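
    Here is a minimal sketch of how to find empty cells using only Python's standard library. It runs on a small hypothetical CSV; the data and the `missing_rows` helper are illustrative assumptions. When you read a file with `csv.DictReader`, an empty cell arrives as an empty string, so the check treats both `None` and `""` as missing. In pandas, `DataFrame.isna()` plays a similar role for NaN/None values.

```python
import csv
import io

# Hypothetical CSV data: Ben and Dev have empty 'Age' cells.
raw = "Name,Age\nAna,34\nBen,\nCaro,29\nDev,\n"

rows = list(csv.DictReader(io.StringIO(raw)))

def missing_rows(records, column):
    """Return the records whose value in `column` is empty or absent."""
    return [r for r in records if r.get(column) in (None, "")]

missing = missing_rows(rows, "Age")
print(len(missing), [r["Name"] for r in missing])  # 2 ['Ben', 'Dev']
```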

  2. Question 2: Removing Duplicate Records

    Which data cleaning step involves deleting repeated rows, such as the same customer record appearing twice in a table?

    • A. Standardization
    • B. Tokenization
    • C. Removing duplicates
    • D. Clustering
    • E. Smoothing

    Correct answer: C. Removing duplicates
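
    The sketch below removes exact duplicate rows. It keeps the first occurrence of each row and preserves the original order. The customer tuples and the `remove_exact_duplicates` function are hypothetical. For tables, pandas offers `DataFrame.drop_duplicates()` for the same job.

```python
# Hypothetical customer rows: the first and third are exact duplicates.
customers = [
    ("C001", "Ana Lima", "ana@example.com"),
    ("C002", "Ben Ode", "ben@example.com"),
    ("C001", "Ana Lima", "ana@example.com"),
    ("C003", "Caro Wu", "caro@example.com"),
]

def remove_exact_duplicates(rows):
    """Keep the first occurrence of each exact row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

deduped = remove_exact_duplicates(customers)
print(len(customers), "->", len(deduped))  # 4 -> 3
```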

  3. Question 3: Understanding Outliers

    In a sales dataset, what could a single entry with a sales value far higher than all the others indicate?

    • A. Consistency
    • B. Trend
    • C. Outlier
    • D. Normalization
    • E. Null value

    Correct answer: C. Outlier
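
    Tukey's fences are one common rule of thumb for flagging outliers. A value is flagged if it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The sketch uses hypothetical sales figures and `statistics.quantiles` (Python 3.8+). Other rules, such as z-score thresholds, are equally valid choices.

```python
import statistics

# Hypothetical daily sales: 980 is far above the rest.
sales = [120, 135, 128, 140, 131, 125, 980]

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

print(iqr_outliers(sales))  # [980]
```

    A flagged value is only a candidate. A genuinely large sale should be kept, while a data-entry error should be corrected or removed.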

  4. Question 4: Dealing with Typos

    If a column meant to store 'Yes' or 'No' contains values like 'Ye' or 'N0', what data issue is this?

    • A. Encoding drift
    • B. Typo errors
    • C. Scaling
    • D. Feature selection
    • E. Outlier injection

    Correct answer: B. Typo errors
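
    A simple way to repair typos in a Yes/No column is to normalize case and whitespace, then map known variants to canonical labels. The `CANONICAL` table below is an assumption built for this example. In practice, you would list the variants actually observed in your data. Unrecognized entries come back as `None` so that someone can review them instead of the code guessing.

```python
# Hypothetical mapping from observed variants to canonical labels;
# the variants listed here are assumptions, so build yours from the data.
CANONICAL = {
    "yes": "Yes", "ye": "Yes", "y": "Yes",
    "no": "No", "n0": "No", "n": "No",
}

def fix_yes_no(value):
    """Map a raw entry to 'Yes'/'No', or None if it is unrecognized."""
    key = value.strip().lower()
    return CANONICAL.get(key)

raw = ["Yes", "Ye", "N0", " no ", "maybe"]
print([fix_yes_no(v) for v in raw])  # ['Yes', 'Yes', 'No', 'No', None]
```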

  5. Question 5: Data Normalization

    What is it called when you rescale numerical values to a common range, such as converting all ages to values between 0 and 1?

    • A. Normalization
    • B. Parsing
    • C. Categorizing
    • D. Merging
    • E. Randomization

    Correct answer: A. Normalization
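
    Min-max normalization rescales each value with x' = (x - min) / (max - min). The smallest value maps to 0 and the largest maps to 1. The sketch below uses hypothetical ages and a hypothetical `min_max_normalize` helper. It returns all zeros when every value is identical, because there is no spread to rescale.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages = [18, 30, 42, 66]
print(min_max_normalize(ages))  # [0.0, 0.25, 0.5, 1.0]
```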

  6. Question 6: Categorical Data Encoding

    What is the term for converting text labels such as 'red', 'green', and 'blue' in a color column into integer codes?

    • A. Decoding
    • B. Label encoding
    • C. Aggregation
    • D. Sampling
    • E. Binning

    Correct answer: B. Label encoding
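
    Here is a sketch of label encoding in plain Python. Each distinct label gets an integer code, assigned here in sorted order. That ordering is an assumption, though it matches what scikit-learn's `LabelEncoder` does. The `label_encode` helper is hypothetical.

```python
def label_encode(labels):
    """Assign each distinct label an integer code, in sorted label order."""
    mapping = {label: code for code, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping

colors = ["red", "green", "blue", "green", "red"]
codes, mapping = label_encode(colors)
print(mapping)  # {'blue': 0, 'green': 1, 'red': 2}
print(codes)    # [2, 1, 0, 1, 2]
```

    Integer codes imply an order (red > green > blue) that colors do not actually have. For nominal categories like these, one-hot encoding is often preferred when a model would read the codes as magnitudes.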

  7. Question 7: Feature Scaling Tools

    Which method puts features measured in different units, such as 'height' in cm and 'weight' in kg, on a comparable scale so that no feature dominates the analysis simply because of its units?

    • A. Feature removal
    • B. Feature scaling
    • C. Data shuffling
    • D. Row expansion
    • E. Random injection

    Correct answer: B. Feature scaling
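
    Standardization (z-score scaling) is one common form of feature scaling. Each value becomes (x - mean) / standard deviation. The sketch uses the population standard deviation, as scikit-learn's `StandardScaler` does. The height and weight figures and the `standardize` helper are hypothetical. Min-max normalization, as in the previous question, is another option.

```python
import statistics

def standardize(values):
    """Rescale values to mean 0 and population standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Constant feature: nothing to scale, so every z-score is 0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical measurements for five people.
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 66, 74, 80]

print([round(z, 2) for z in standardize(height_cm)])  # [-1.41, -0.71, 0.0, 0.71, 1.41]
print([round(z, 2) for z in standardize(weight_kg)])  # [-1.37, -0.78, 0.0, 0.78, 1.37]
```

    After scaling, both features are unitless and centered on 0 with the same spread. Neither one dominates a distance-based method just because centimetres produce larger numbers than kilograms.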

  8. Question 8: Handling Inconsistent Formats

    If some dates are formatted as '01/02/2023' and others as '2023-02-01', what type of problem does this present?

    • A. Data consistency issue
    • B. Filtering noise
    • C. Overfitting pattern
    • D. Cluster drift
    • E. Smoothing outliers

    Correct answer: A. Data consistency issue
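
    The sketch below resolves mixed date formats. It tries each known format in turn and writes every date back out in ISO 8601 form (YYYY-MM-DD). Note that '01/02/2023' is ambiguous. The code assumes day-first order (1 February 2023), which makes it match '2023-02-01'; a month-first source would need `%m/%d/%Y` instead. The format list and `to_iso_date` are assumptions for illustration.

```python
from datetime import datetime

# Assumed input formats: '%d/%m/%Y' reads '01/02/2023' as 1 February 2023.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def to_iso_date(text):
    """Parse a date string in any known format and return it as YYYY-MM-DD."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"unrecognized date format: {text!r}")

print([to_iso_date(d) for d in ["01/02/2023", "2023-02-01"]])
# ['2023-02-01', '2023-02-01']
```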

  9. Question 9: Dealing with Noisy Data

    Suppose a sensor records temperatures of 20, 21, 500, 22, and 23. What term describes erroneous or random readings like 500 that can distort analysis?

    • A. Noise
    • B. Dropouts
    • C. Typecasting
    • D. Encapsulation
    • E. Nulling

    Correct answer: A. Noise
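
    A rolling median filter is one common way to reduce noise. It replaces each reading with the median of its neighbours. The sketch below uses a centered window of three readings, shortened at the edges; that window size is an assumed setting. A rolling mean would be pulled upward by the 500 spike, whereas the median ignores it. The `rolling_median` helper is hypothetical.

```python
import statistics

def rolling_median(values, window=3):
    """Replace each value by the median of a centered window (shorter at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(statistics.median(values[lo:hi]))
    return smoothed

readings = [20, 21, 500, 22, 23]
print(rolling_median(readings))  # [20.5, 21, 22, 23, 22.5]
```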

  10. Question 10: Imputing Missing Values

    If you fill empty cells in a 'salary' column with the average salary from the data, which technique are you using?

    • A. Sampling
    • B. Imputation
    • C. Redundancy removal
    • D. One-hot thresholding
    • E. Discretization

    Correct answer: B. Imputation
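
    Here is a minimal sketch of mean imputation. It computes the average of the observed salaries and uses it to fill each empty cell, represented here as `None`. The salary figures and `impute_mean` are hypothetical.

```python
import statistics

def impute_mean(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: column has no observed values")
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

salaries = [50000, None, 60000, 70000, None]
print(impute_mean(salaries))  # [50000, 60000.0, 60000, 70000, 60000.0]
```

    A few very high values often skew salaries, so the median is a common, more robust alternative to the mean.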