Getting Started with Data Cleaning and Preprocessing: Easy Quiz

  1. Identifying Missing Data

    When a dataset contains empty cells in the 'Age' column, which term best describes those empty values?

    1. A. Missing values
    2. B. Duplicated entries
    3. C. Outliers
    4. D. Noisy data
    5. E. Meta values
  2. Removing Duplicate Records

    Which data cleaning step involves deleting repeated rows, such as having the exact same customer information appear twice in a table?

    1. A. Standardization
    2. B. Tokenization
    3. C. Removing duplicates
    4. D. Clustering
    5. E. Smoothing
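
As a rough illustration of deleting repeated rows, here is a minimal sketch in plain Python. The helper name `drop_repeated_rows` and the sample customer records are hypothetical, and the sketch assumes every row is a flat dictionary with hashable values.

```python
def drop_repeated_rows(rows):
    """Return rows with exact repeats removed, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        # A sorted tuple of (column, value) pairs is a hashable fingerprint of the row.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sample data: the third row repeats the first exactly.
customers = [
    {"name": "Ana", "email": "ana@example.com"},
    {"name": "Ben", "email": "ben@example.com"},
    {"name": "Ana", "email": "ana@example.com"},
]
deduplicated = drop_repeated_rows(customers)
```

Only rows that match in every column are treated as repeats here; near-matches (for example, different capitalization) would need extra handling.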
  3. Understanding Outliers

    In a sales dataset, a single entry showing a sales value much higher than the others could indicate what?

    1. A. Consistency
    2. B. Trend
    3. C. Outlier
    4. D. Normalization
    5. E. Null value
  4. Dealing with Typos

    If a column meant to store 'Yes' or 'No' contains values like 'Ye' or 'N0', what data issue is this?

    1. A. Encoding drift
    2. B. Typo errors
    3. C. Scaling
    4. D. Feature selection
    5. E. Outlier injection
  5. Data Normalization

    What is it called when you adjust numerical values to a similar range, such as converting all ages to values between 0 and 1?

    1. A. Normalization
    2. B. Parsing
    3. C. Categorizing
    4. D. Merging
    5. E. Randomization
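
One common way to bring values into the 0 to 1 range is min-max rescaling. The sketch below assumes that approach; the function name `rescale_to_unit_range` and the sample ages are made up for illustration.

```python
def rescale_to_unit_range(values):
    """Map values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; assumption: map them all to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample ages.
ages = [18, 30, 42, 66]
scaled_ages = rescale_to_unit_range(ages)
```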
  6. Categorical Data Encoding

    Transforming text labels like 'red', 'green', and 'blue' in a color column into numbers is best known as what?

    1. A. Decoding
    2. B. Label encoding
    3. C. Aggregation
    4. D. Sampling
    5. E. Binning
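
Turning text labels into numbers might look like the sketch below. The helper `labels_to_integers` is hypothetical, and assigning codes in alphabetical order is just one possible convention.

```python
def labels_to_integers(labels):
    """Assign each distinct label an integer code and return (codes, mapping)."""
    # Assumption: codes follow the alphabetical order of the distinct labels.
    mapping = {label: code for code, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


colors = ["red", "green", "blue", "green"]
codes, mapping = labels_to_integers(colors)
```

The integer codes imply an ordering (blue < green < red) that the colors do not really have, which is worth keeping in mind when the numbers feed into a model.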
  7. Feature Scaling Tools

    Which method can you use to ensure all features contribute equally to analysis, such as giving equal weight to 'height' in cm and 'weight' in kg?

    1. A. Feature removal
    2. B. Feature scaling
    3. C. Data shuffling
    4. D. Row expansion
    5. E. Random injection
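
To put columns measured in different units on a comparable footing, one option is to convert each column to z-scores (subtract the mean, divide by the standard deviation). This is only one of several possible approaches; the function name `standardize` and the sample measurements are assumptions for illustration.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert values to z-scores using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # No spread in the data; assumption: return zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical measurements in different units.
heights_cm = [150, 160, 170, 180]
weights_kg = [50, 60, 70, 80]
z_heights = standardize(heights_cm)
z_weights = standardize(weights_kg)
```

After the transformation both columns are unitless, with mean 0 and standard deviation 1, so neither dominates simply because of its unit.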
  8. Handling Inconsistent Formats

    If some dates are formatted as '01/02/2023' and others as '2023-02-01', what type of problem does this present?

    1. A. Data consistency issue
    2. B. Filtering noise
    3. C. Overfitting pattern
    4. D. Cluster drift
    5. E. Smoothing outliers
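
Mixed date formats can be brought into a single representation along the lines of the sketch below. It assumes that slash-separated dates are day/month/year, which the question does not specify; `KNOWN_FORMATS` and `to_iso_date` are hypothetical names.

```python
from datetime import datetime

# Assumption: slash-separated dates are DD/MM/YYYY, not MM/DD/YYYY.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def to_iso_date(text):
    """Parse text with the first matching known format and return YYYY-MM-DD."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # Try the next format.
    raise ValueError(f"unrecognized date format: {text!r}")


raw_dates = ["01/02/2023", "2023-02-01"]
iso_dates = [to_iso_date(d) for d in raw_dates]
```

A date such as '01/02/2023' is ambiguous on its own, so the day/month convention has to come from knowledge of the data source rather than from the code.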
  9. Dealing with Noisy Data

    Suppose a sensor records temperatures of 20, 21, 500, 22, and 23; what is the term for erroneous or random readings, such as the 500 here, that may distort analysis?

    1. A. Noise
    2. B. Dropouts
    3. C. Typecasting
    4. D. Encapsulation
    5. E. Nulling
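
A suspicious reading like the 500 in the question can be flagged automatically. The sketch below uses the median absolute deviation, which is one possible rule among many; the function name `flag_suspect_readings` and the threshold of 5.0 are assumptions chosen for illustration.

```python
from statistics import median


def flag_suspect_readings(values, threshold=5.0):
    """Flag values whose distance from the median exceeds threshold * MAD."""
    mid = median(values)
    # Median absolute deviation: a spread estimate that extreme values barely affect.
    mad = median([abs(v - mid) for v in values])
    if mad == 0:
        # No measurable spread; assumption: flag nothing.
        return [False] * len(values)
    return [abs(v - mid) / mad > threshold for v in values]


readings = [20, 21, 500, 22, 23]
flags = flag_suspect_readings(readings)
```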
  10. Imputing Missing Values

    If you fill empty cells in a 'salary' column with the average salary from the data, which technique are you using?

    1. A. Sampling
    2. B. Imputation
    3. C. Redundancy removal
    4. D. One-hot thresholding
    5. E. Discretization
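
Filling empty cells with the column average can be sketched as follows. The sketch assumes empty cells are represented as `None`; the function name `fill_with_mean` and the sample salaries are hypothetical.

```python
from statistics import mean


def fill_with_mean(values):
    """Replace None entries with the mean of the values that are present."""
    known = [v for v in values if v is not None]
    if not known:
        # Nothing to average; assumption: return the data unchanged.
        return list(values)
    average = mean(known)
    return [average if v is None else v for v in values]


# Hypothetical salary column with one empty cell.
salaries = [40000, None, 60000, 50000]
filled = fill_with_mean(salaries)
```

The mean is sensitive to extreme values, so for skewed columns such as salaries the median is sometimes used instead.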