Data Cleaning and Transformation with dplyr: Fundamentals Quiz

Assess your understanding of essential data cleaning and transformation techniques using dplyr, including filtering, selecting, mutating, summarizing, and handling missing values. Strengthen your data manipulation skills with practical questions designed for easy comprehension.

  1. Filtering Rows with Conditions

    Which dplyr function keeps only the rows where the column 'age' is greater than 30?

    1. select
    2. filter
    3. fillter
    4. arrange

    Explanation: The 'filter' function is specifically used to select rows based on given conditions, such as 'age' greater than 30. 'fillter' is a misspelling and is not a valid function in dplyr. The 'select' function changes which columns are included, not which rows. The 'arrange' function orders the rows but does not filter them out.
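
    A minimal sketch of the row filter described above. The data frame 'people' and its values are made up for illustration and are not part of the quiz.

    ```r
    library(dplyr)

    # Hypothetical example data (names and ages are illustrative).
    people <- data.frame(
      name = c("Ana", "Ben", "Cleo", "Dev"),
      age  = c(25, 31, 42, 30)
    )

    # Keep only the rows where age is strictly greater than 30.
    over_30 <- people %>% filter(age > 30)
    print(over_30)
    ```

    Note that a row with age exactly 30 is dropped, because the condition uses '>' rather than '>='.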

  2. Selecting Specific Columns

    If you want to keep only the columns 'name' and 'score' from a data frame, which dplyr function is appropriate?

    1. subset
    2. sort
    3. select
    4. slect

    Explanation: The correct function is 'select', which returns only the specified columns from the data. 'sort' is a base R function for ordering the values of a vector, not for choosing columns. 'subset' is a base R function that can filter both rows and columns, but it is not a dplyr verb. 'slect' is a typo and does not exist in dplyr.
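
    As a small illustration of column selection, here is a sketch that assumes a made-up data frame 'students' with one extra column, 'class', which is not mentioned in the question.

    ```r
    library(dplyr)

    # Hypothetical example data; 'class' is an extra column to be dropped.
    students <- data.frame(
      name  = c("Ana", "Ben"),
      score = c(88, 92),
      class = c("A", "B")
    )

    # Keep only the 'name' and 'score' columns, in that order.
    name_score <- students %>% select(name, score)
    print(name_score)
    ```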

  3. Creating New Columns

    To create a new column called 'total' as the sum of 'math' and 'english' columns, which dplyr function should you use?

    1. summarize
    2. mutate
    3. update
    4. mutat

    Explanation: 'mutate' is the dplyr function for creating new columns or modifying existing ones using calculations or expressions. 'summarize' reduces many values down to one summary per group, not per row, so it is not suitable here. 'update' is not a dplyr function. 'mutat' is a common typographical error.
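
    The following sketch shows one way the 'total' column could be added. The data frame 'marks' and its values are assumptions made for the example.

    ```r
    library(dplyr)

    # Hypothetical example data with one row per student.
    marks <- data.frame(
      math    = c(70, 85),
      english = c(80, 90)
    )

    # Add a new column computed row by row from the existing columns.
    with_total <- marks %>% mutate(total = math + english)
    print(with_total)
    ```

    The result has the same number of rows as the input, which is the key difference from 'summarize'.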

  4. Sorting Data

    How can you reorder the rows of a data frame by the column 'salary' in descending order?

    1. descend(salary)
    2. arrange(desc(salary))
    3. filter(salary)
    4. arrnage(salary)

    Explanation: The 'arrange' function sorts rows, and 'desc' is used to specify descending order. 'descend' is not a valid function in dplyr. 'filter' selects rows based on logical conditions but does not sort them. 'arrnage' is a typo of 'arrange'.
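
    A short sketch of descending sort, assuming an illustrative data frame 'staff' that is not part of the quiz.

    ```r
    library(dplyr)

    # Hypothetical example data.
    staff <- data.frame(
      name   = c("Ana", "Ben", "Cleo"),
      salary = c(52000, 61000, 48000)
    )

    # Sort rows from the highest salary to the lowest.
    by_salary <- staff %>% arrange(desc(salary))
    print(by_salary)
    ```

    Without 'desc', 'arrange(salary)' would sort in ascending order instead.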

  5. Summarizing Data

    If you need to calculate the mean 'score' for each 'class', which dplyr function helps along with group_by?

    1. summerize
    2. summarize
    3. compact
    4. mutate

    Explanation: 'summarize' works with 'group_by' to create summary statistics, such as the mean score per class, returning one row per group. 'mutate' adds or modifies columns while keeping every row, so it does not collapse values into one result per group. 'compact' is not a dplyr function. 'summerize' is a frequent misspelling of 'summarize'.
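
    To make the grouping step concrete, here is a sketch with a made-up data frame 'scores'. The output column name 'mean_score' is chosen for the example.

    ```r
    library(dplyr)

    # Hypothetical example data: two classes with two scores each.
    scores <- data.frame(
      class = c("A", "A", "B", "B"),
      score = c(80, 90, 70, 60)
    )

    # Group by class, then collapse each group to a single mean.
    class_means <- scores %>%
      group_by(class) %>%
      summarize(mean_score = mean(score))
    print(class_means)
    ```

    The result has one row per class rather than one row per student.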

  6. Removing Duplicate Rows

    Which function in dplyr helps remove duplicate rows from a data frame?

    1. duplicates
    2. distict
    3. unique_rows
    4. distinct

    Explanation: The correct answer is 'distinct', which returns only unique rows from the data. 'unique_rows' and 'duplicates' are not dplyr functions. 'distict' is a typographical error and is not recognized by dplyr.
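
    A brief sketch of duplicate removal, assuming an illustrative data frame 'records' in which one row appears twice.

    ```r
    library(dplyr)

    # Hypothetical example data; the second and third rows are identical.
    records <- data.frame(
      id   = c(1, 2, 2, 3),
      city = c("Oslo", "Lima", "Lima", "Pune")
    )

    # Keep one copy of each fully identical row.
    unique_records <- records %>% distinct()
    print(unique_records)
    ```

    Passing column names, as in 'distinct(city)', would instead return the unique values of just those columns.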

  7. Handling Missing Values

    When cleaning data, which function, commonly used in dplyr pipelines, removes all rows with a missing value in any column?

    1. remove_na
    2. drop_na
    3. omit_na
    4. keep_na

    Explanation: 'drop_na' removes rows that contain missing values, making it the correct choice here. Strictly speaking, it is provided by the tidyr package rather than dplyr, but both belong to the tidyverse and it works directly in a dplyr pipeline. 'remove_na' and 'omit_na' sound similar but are not dplyr or tidyr functions. 'keep_na' would imply retaining missing values, which is the opposite of what is needed.
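
    The sketch below shows the call on a made-up data frame 'survey'. It loads tidyr, which is the package that defines 'drop_na', and also shows a dplyr-only equivalent using 'filter' with 'if_all' (assuming dplyr 1.0.4 or later).

    ```r
    library(dplyr)
    library(tidyr)  # drop_na() is defined in tidyr

    # Hypothetical example data with missing values in two different columns.
    survey <- data.frame(
      id   = 1:4,
      age  = c(25, NA, 40, 33),
      city = c("Oslo", "Lima", NA, "Pune")
    )

    # Remove every row that has an NA in any column.
    complete_rows <- survey %>% drop_na()
    print(complete_rows)

    # dplyr-only alternative: keep rows where all columns are non-missing.
    complete_rows_dplyr <- survey %>%
      filter(if_all(everything(), ~ !is.na(.x)))
    ```

    To check only some columns, name them in the call, for example 'drop_na(age)'.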

  8. Combining Data Frames by Rows

    Which dplyr function stacks two data frames on top of each other, combining their rows?

    1. join_rows
    2. bind_rows
    3. combine
    4. merge

    Explanation: 'bind_rows' is used for row-wise binding, combining multiple data frames by stacking their rows and matching columns by name. 'merge' is a base R function for joining data frames on common columns, not for simply stacking them. 'combine' and 'join_rows' are not dplyr functions for this purpose.
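
    As an illustration of stacking, this sketch assumes two made-up data frames, 'batch_1' and 'batch_2', with the same columns.

    ```r
    library(dplyr)

    # Hypothetical example data split across two data frames.
    batch_1 <- data.frame(id = 1:2, score = c(80, 90))
    batch_2 <- data.frame(id = 3:4, score = c(70, 60))

    # Stack the second data frame underneath the first.
    all_batches <- bind_rows(batch_1, batch_2)
    print(all_batches)
    ```

    If a column exists in only one of the inputs, 'bind_rows' keeps it and fills the missing entries with NA.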

  9. Renaming Columns

    To rename the column 'height' to 'tallness' in a data frame, which dplyr function should be used?

    1. renmae
    2. changenames
    3. rename
    4. rename_col

    Explanation: 'rename' changes the names of existing columns using the form new_name = old_name and leaves all other columns untouched. 'changenames' and 'rename_col' might look appropriate but are not valid dplyr functions. 'renmae' is a misspelling and would result in an error.
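
    A minimal sketch of the renaming step, assuming a made-up data frame 'people'. Note the argument order: the new name goes on the left.

    ```r
    library(dplyr)

    # Hypothetical example data.
    people <- data.frame(
      name   = c("Ana", "Ben"),
      height = c(165, 180)
    )

    # rename() takes new_name = old_name.
    renamed <- people %>% rename(tallness = height)
    print(names(renamed))
    ```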

  10. Sampling Random Rows

    Which dplyr function should you use to select a random sample of 10 rows from a data frame?

    1. random_pick
    2. sample_n
    3. select_n
    4. sampel_n

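    Explanation: 'sample_n' is designed to randomly select a specified number of rows from a data set. Since dplyr 1.0.0 it has been superseded by 'slice_sample', for example slice_sample(n = 10), although 'sample_n' still works. 'random_pick' and 'select_n' are not dplyr functions and will not work for this purpose. 'sampel_n' is a misspelling that will result in an error.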
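
    The sketch below draws 10 random rows from a made-up data frame 'population'. The seed value is arbitrary and only makes the draw repeatable. The second call uses 'slice_sample', the newer equivalent available in dplyr 1.0.0 and later.

    ```r
    library(dplyr)

    # Hypothetical example data with 100 rows.
    population <- data.frame(id = 1:100)

    # Fix the random seed so repeated runs draw the same rows.
    set.seed(42)

    # Draw 10 rows at random, without replacement.
    picked <- population %>% sample_n(10)

    # Newer equivalent (dplyr >= 1.0.0).
    picked_new <- population %>% slice_sample(n = 10)

    print(nrow(picked))
    ```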