Data Cleaning and Transformation with dplyr: Fundamentals Quiz

Assess your understanding of essential data cleaning and transformation techniques using dplyr, including filtering, selecting, mutating, summarizing, and handling missing values. Strengthen your data manipulation skills with practical questions designed for easy comprehension.

Filtering Rows with Conditions
Which dplyr function keeps only the rows where the column 'age' is greater than 30?
1. select
2. filter
3. fillter
4. arrange
Explanation: The 'filter' function is specifically used to select rows based on given conditions, such as 'age' greater than 30. 'fillter' is a misspelling and is not a valid function in dplyr. The 'select' function changes which columns are included, not which rows. The 'arrange' function orders the rows but does not filter them out.
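As a minimal sketch of the filtering described above, the example below builds a small hypothetical data frame (the name 'people' and its values are assumptions made for illustration) and keeps only the rows where 'age' is greater than 30.

```r
library(dplyr)

# Hypothetical example data; names and values are assumptions for illustration.
people <- data.frame(
  name = c("Ana", "Ben", "Cy", "Dee"),
  age  = c(25, 31, 30, 47)
)

# Keep only the rows where age is strictly greater than 30.
over_30 <- people %>% filter(age > 30)
```

Note that the row with age exactly 30 is dropped, because the condition uses a strict comparison.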
Selecting Specific Columns
If you want to keep only the columns 'name' and 'score' from a data frame, which dplyr function is appropriate?
1. subset
2. sort
3. select
4. slect
Explanation: The correct function is 'select', which returns only the specified columns from the data. 'sort' is a base R function for ordering vectors, not for choosing columns. 'subset' is also a base R function; it can filter both rows and columns, but it is not a dplyr verb. 'slect' is a typo and does not exist in dplyr.
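A short sketch of column selection, assuming a hypothetical data frame 'students' with one extra column that should be dropped:

```r
library(dplyr)

# Hypothetical example data; the 'class' column exists only to be dropped.
students <- data.frame(
  name  = c("Ana", "Ben", "Cy"),
  score = c(88, 92, 79),
  class = c("A", "B", "A")
)

# Keep only the 'name' and 'score' columns, in that order.
name_and_score <- students %>% select(name, score)
```

All rows are preserved; only the set of columns changes.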
Creating New Columns
To create a new column called 'total' as the sum of 'math' and 'english' columns, which dplyr function should you use?
1. summarize
2. mutate
3. update
4. mutat
Explanation: 'mutate' is the dplyr function for creating new columns or modifying existing ones using calculations or expressions. 'summarize' reduces many values down to one summary per group, not per row, so it is not suitable here. 'update' is not a dplyr function. 'mutat' is a common typographical error.
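To illustrate, here is a minimal sketch that adds a 'total' column, assuming a hypothetical data frame 'marks' with 'math' and 'english' columns:

```r
library(dplyr)

# Hypothetical example data for illustration.
marks <- data.frame(
  student = c("Ana", "Ben", "Cy"),
  math    = c(70, 85, 90),
  english = c(80, 75, 95)
)

# Add a new column computed row by row from two existing columns.
with_total <- marks %>% mutate(total = math + english)
```

The original columns stay in place and 'total' is appended as the last column.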
Sorting Data
How can you reorder the rows of a data frame by the column 'salary' in descending order?
1. descend(salary)
2. arrange(desc(salary))
3. filter(salary)
4. arrnage(salary)
Explanation: The 'arrange' function sorts rows, and 'desc' is used to specify descending order. 'descend' is not a valid function in dplyr. 'filter' selects rows based on logical conditions but does not sort them. 'arrnage' is a typo of 'arrange'.
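A minimal sketch of descending sort, assuming a hypothetical data frame 'staff' with a numeric 'salary' column:

```r
library(dplyr)

# Hypothetical example data for illustration.
staff <- data.frame(
  name   = c("Ana", "Ben", "Cy"),
  salary = c(52000, 75000, 61000)
)

# Sort rows from highest to lowest salary.
by_salary <- staff %>% arrange(desc(salary))
```

Without 'desc', 'arrange(salary)' would sort in ascending order instead.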
Summarizing Data
If you need to calculate the mean 'score' for each 'class', which dplyr function helps along with group_by?
1. summerize
2. summarize
3. compact
4. mutate
Explanation: 'summarize' works with 'group_by' to collapse each group into a single row of summary statistics, such as the mean score per class. 'mutate' adds or modifies columns while keeping every row, so on grouped data it would repeat the class mean on each row rather than collapse the data. 'compact' is not a dplyr function. 'summerize' is a frequent misspelling of 'summarize'.
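The grouping-then-summarizing pattern can be sketched as follows, assuming a hypothetical data frame 'results' with 'class' and 'score' columns; the output column name 'mean_score' is also an assumption:

```r
library(dplyr)

# Hypothetical example data for illustration.
results <- data.frame(
  class = c("A", "A", "B", "B", "B"),
  score = c(80, 90, 70, 60, 80)
)

# One output row per class, holding that class's mean score.
class_means <- results %>%
  group_by(class) %>%
  summarize(mean_score = mean(score))
```

The result has one row per distinct value of 'class' rather than one row per student.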
Removing Duplicate Rows
Which function in dplyr helps remove duplicate rows from a data frame?
1. duplicates
2. distict
3. unique_rows
4. distinct
Explanation: The correct answer is 'distinct', which returns only unique rows from the data. 'unique_rows' and 'duplicates' are not dplyr functions. 'distict' is a typographical error and is not recognized by dplyr.
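A minimal sketch of de-duplication, assuming a hypothetical data frame 'orders' that contains repeated rows:

```r
library(dplyr)

# Hypothetical example data with two repeated rows.
orders <- data.frame(
  id   = c(1, 2, 2, 3, 1),
  item = c("pen", "ink", "ink", "pad", "pen")
)

# Keep one copy of each unique row; the first occurrence is retained.
unique_orders <- orders %>% distinct()
```

Called with no column arguments, 'distinct' compares entire rows, so two rows count as duplicates only when every column matches.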
Handling Missing Values
When cleaning data, which function, commonly used alongside dplyr, removes all rows with a missing value in any column?
1. remove_na
2. drop_na
3. omit_na
4. keep_na
Explanation: 'drop_na' removes rows that contain missing values, making it the correct function for cleaning data in this way. Note that 'drop_na' is exported by the tidyr package rather than dplyr, so tidyr (or the tidyverse) must be loaded to use it. 'remove_na' and 'omit_na' sound similar but are not actual dplyr or tidyr functions. 'keep_na' would imply retaining missing values, which is the opposite of what is needed.
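The sketch below shows 'drop_na' on a hypothetical data frame 'survey'. Because 'drop_na' is exported by tidyr, the example loads tidyr alongside dplyr. It also shows a dplyr-only alternative built from 'filter' and 'if_all', which assumes dplyr 1.0.4 or later.

```r
library(dplyr)
library(tidyr)  # drop_na() lives in tidyr, not dplyr

# Hypothetical example data with missing values in two different columns.
survey <- data.frame(
  id    = 1:4,
  score = c(10, NA, 30, 40),
  grade = c("a", "b", NA, "d")
)

# Remove every row that has an NA in any column.
cleaned <- survey %>% drop_na()

# dplyr-only alternative (assumes dplyr >= 1.0.4 for if_all()).
cleaned_dplyr <- survey %>%
  filter(if_all(everything(), ~ !is.na(.x)))
```

To drop rows based on specific columns only, pass them to 'drop_na', for example 'drop_na(score)'.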
Combining Data Frames by Rows
Which dplyr function stacks two data frames on top of each other, combining their rows?
1. join_rows
2. bind_rows
3. combine
4. merge
Explanation: 'bind_rows' is used for row-wise binding, combining multiple data frames by stacking their rows. 'merge' is a base R function that joins data frames on common columns rather than simply stacking them. 'combine' and 'join_rows' are not dplyr functions for this purpose.
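A minimal sketch of row-wise binding, assuming two hypothetical data frames 'batch_1' and 'batch_2' with the same columns:

```r
library(dplyr)

# Two hypothetical data frames with matching column names.
batch_1 <- data.frame(id = c(1, 2), score = c(90, 85))
batch_2 <- data.frame(id = c(3, 4), score = c(70, 95))

# Stack the rows of batch_2 underneath the rows of batch_1.
stacked <- bind_rows(batch_1, batch_2)
```

'bind_rows' matches columns by name, and a column missing from one input is filled with NA in that input's rows.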
Renaming Columns
To rename the column 'height' to 'tallness' in a data frame, which dplyr function should be used?
1. renmae
2. changenames
3. rename
4. rename_col
Explanation: 'rename' safely changes the name of existing columns. 'changenames' and 'rename_col' might look appropriate but are not valid dplyr functions. 'renmae' is a common typographical error and would result in an error.
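A minimal sketch of renaming, assuming a hypothetical data frame 'people' with a 'height' column. The new name goes on the left of the equals sign and the existing name on the right.

```r
library(dplyr)

# Hypothetical example data for illustration.
people <- data.frame(
  name   = c("Ana", "Ben"),
  height = c(165, 180)
)

# Syntax is rename(new_name = old_name).
renamed <- people %>% rename(tallness = height)
```

All other columns, and the column order, are left unchanged.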
Sampling Random Rows
Which dplyr function should you use to select a random sample of 10 rows from a data frame?
1. random_pick
2. sample_n
3. select_n
4. sampel_n
Explanation: 'sample_n' randomly selects a specified number of rows from a data frame. In dplyr 1.0.0 and later it is superseded by 'slice_sample(n = 10)', although 'sample_n' still works. 'random_pick' and 'select_n' are not dplyr functions and will not work for this purpose. 'sampel_n' is a misspelling that will result in an error.
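A minimal sketch of random sampling, assuming a hypothetical 50-row data frame 'measurements'. It shows 'sample_n' and also 'slice_sample', which replaces it in dplyr 1.0.0 and later. The seed value is arbitrary and only makes the draw repeatable.

```r
library(dplyr)

set.seed(42)  # arbitrary seed so the random draw is repeatable

# Hypothetical example data: 50 rows with an id and a random value.
measurements <- data.frame(id = 1:50, value = rnorm(50))

# Draw 10 rows at random, without replacement (the default).
picked <- measurements %>% sample_n(10)

# Equivalent call using the newer verb (assumes dplyr >= 1.0.0).
picked_new <- measurements %>% slice_sample(n = 10)
```

The two calls draw independent samples, so they generally return different rows.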
