Explore essential practices in data cleaning, manipulation, and visualization…
Start QuizDiscover essential techniques for exploring datasets using Pandas built-in…
Start QuizExplore the foundational preprocessing steps that enhance the quality…
Start QuizExplore key Pandas techniques for data visualization, preprocessing, and…
Start QuizTest your knowledge of using hash maps and sets…
Start QuizTest your knowledge of data preprocessing essentials! This quiz…
Start QuizSharpen your skills in feature engineering with this quiz!…
Start QuizLevel up your understanding of data preprocessing with this…
Start QuizSharpen your skills in feature engineering with this quiz!…
Start QuizTest your knowledge of outlier detection in datasets! Learn…
Start QuizSharpen your skills in handling missing data! This quiz…
Start QuizTest your knowledge of data cleaning fundamentals! This beginner-friendly…
Start QuizTest your foundational knowledge of SQL joins, group-by aggregations, and per-entity statistics to enhance feature engineering in relational data. This quiz covers key concepts like join types, aggregate functions, and techniques for generating features such as entity counts and group-wise summaries.
This quiz contains 16 questions. Below is a complete reference of all questions, answer choices, and correct answers. You can use this section to review after taking the interactive quiz above.
Which SQL join includes all rows from the left table and only the matching rows from the right table when enriching customer records with their purchase data?
Correct answer: LEFT JOIN
Explanation: A LEFT JOIN returns all records from the left table (including those without matches in the right table), which is useful when you want every customer even if some have no purchases. An INNER JOIN only gives records with matching rows in both tables, so customers without purchases would be excluded. RIGHT JOIN returns all records from the right table regardless of matches in the left, which would not fit this scenario. CROSS JOIN creates all possible combinations, which is generally not used for feature enrichment.
Which SQL aggregate function is best used to count the number of orders each user has placed?
Correct answer: COUNT
Explanation: COUNT is used to count the number of non-null rows or values, making it ideal for counting orders per user. SUM adds numeric values, not the number of records. AVG calculates the average, not the count, and MIN returns the minimum value, not the total number of occurrences.
What SQL clause is used to group purchase data by user_id so you can calculate total spending per user?
Correct answer: GROUP BY
Explanation: GROUP BY groups rows by specified columns such as user_id, allowing you to apply aggregate functions like SUM for feature creation. ORDER BY only sorts results and does not aggregate. WHERE filters rows before applying aggregations, and HAVING filters groups after aggregation.
What happens with an INNER JOIN between a users table and a transactions table if a user has no transactions?
Correct answer: The user will be excluded from the result set.
Explanation: INNER JOIN only returns rows where there are matches in both tables, so users without transactions are omitted. LEFT JOIN would include the user with NULL transaction values. It will not automatically display zeros for missing transactions unless handled in the query. No error occurs; it's valid to have mismatched rows.
To calculate the average transaction amount per customer, which SQL function should be used with GROUP BY?
Correct answer: AVG
Explanation: AVG returns the average of numeric values, which is exactly what is needed for average transaction amount. MIN returns the lowest value, not the average. CONCAT combines text values, not appropriate here. JOIN is not a function, but a keyword for linking tables.
Which join type combines two tables by showing only rows where there are matching keys in both, such as matching students to their classes?
Correct answer: INNER JOIN
Explanation: INNER JOIN only returns rows with matching keys in both tables, which ensures only students enrolled in classes appear. FULL JOIN includes all rows from both tables regardless of matches. LEFT JOIN includes all from the left table. CROSS JOIN creates all possible combinations, not filtered by keys.
If you want to get both total and average sales per store in a sales table, which aggregate functions can you combine with GROUP BY?
Correct answer: SUM and AVG
Explanation: SUM calculates total sales, while AVG computes the average, making them suitable to combine with GROUP BY for these features. COUNT tells you the number of sales, not the totals or averages. MIN and MAX show the extremes, not the sums or averages. JOIN is not an aggregate function.
Which SQL clause is used to filter groups after aggregation, such as returning only users with more than three logins?
Correct answer: HAVING
Explanation: HAVING lets you filter based on the result of aggregate functions, like 'COUNT(*) > 3'. WHERE cannot be used with aggregated results. ORDER BY sorts the results, and GROUP is not a valid clause on its own.
To count the number of unique items each user purchased, which SQL expression is appropriate?
Correct answer: COUNT(DISTINCT item_id)
Explanation: COUNT(DISTINCT item_id) counts the number of unique item_ids for each user. SUM(UNIQUE ...) is invalid because SUM adds values and UNIQUE is not an SQL keyword. COUNT(ALL ...) is redundant as COUNT by itself counts all non-null values, not unique ones. DISTINCT COUNT(...) is incorrect syntax.
Which query would produce a feature showing the highest score achieved by each player from a games table?
Correct answer: SELECT player_id, MAX(score) FROM games GROUP BY player_id;
Explanation: Using MAX(score) with GROUP BY player_id delivers the highest score for each player. SUM(score) gives the total, not the maximum. The other queries group by score, which is not useful for per-player stats, and have incorrect aggregate logic.
What values will appear in the columns from the left table if there is no match in the right table when using a RIGHT JOIN?
Correct answer: NULL
Explanation: RIGHT JOIN returns NULL in columns from the left table if there's no match with the right table row. Zero is not automatically substituted unless handled in the query. Empty strings also do not appear unless it's a string column with a default, and duplicate key is unrelated to join behavior.
When joining sales and inventory tables on both product_id and store_id, what must you include in your ON clause?
Correct answer: Both product_id and store_id conditions
Explanation: The ON clause should match both product_id and store_id to ensure correct rows are joined from each table. Using only one key may result in incorrect matches. GROUP BY is used after JOIN for aggregations, not for specifying join logic.
Which situation is a CROSS JOIN most likely to be useful for in feature engineering?
Correct answer: Generating all possible combinations between entities
Explanation: CROSS JOIN creates all possible pairings between rows of two tables, which might be useful for feature construction in some scenarios. It does not filter nulls, aggregate values, or restrict records to only matches (as INNER JOIN does).
How can you create both per-user total and average purchase value in a single SQL query from a purchases table?
Correct answer: By grouping by user_id and selecting SUM(amount) and AVG(amount)
Explanation: You can select SUM(amount) and AVG(amount) together while grouping by user_id to get total and average values per user. LEFT JOIN is for combining tables, not aggregating. COUNT and DISTINCT can count users but do not give total or average purchases. ORDER BY only sorts and doesn't aggregate.
After calculating per-category item counts using GROUP BY category_id, which clause would you use to display the categories with the most items at the top?
Correct answer: ORDER BY count DESC
Explanation: ORDER BY count DESC puts the highest counts first, which is ideal for sorting categories. WHERE doesn't work after aggregation for this purpose. HAVING filters groups but doesn’t sort. GROUP BY count is not valid; you group by a column, not by an aggregated value.
In feature engineering with joins, how can you avoid duplicate rows when joining user and event tables if users have multiple events?
Correct answer: Aggregate event data with GROUP BY before joining
Explanation: Aggregating event data with GROUP BY before joining ensures one row per user and prevents duplication in the output. CROSS JOIN would increase duplicates rather than reduce them. DISTINCT may remove duplicates, but won't address the source of redundancy if multiple event rows exist. Joining on non-key columns might create more duplicates rather than prevent them.