Test your understanding of SQL joins, co-occurrence analysis, and top-N aggregations in user–item interaction scenarios. This quiz covers essential query techniques for analyzing how users interact with items and identifying popular connections.
Which SQL JOIN type returns only the rows where a user has both a purchase record and a profile in separate tables?
Explanation: INNER JOIN combines rows from two tables based on a related column and only returns matches found in both tables. LEFT JOIN would include unmatched user records from the left table, while FULL JOIN includes all records from both tables regardless of matches. CROSS JOIN returns every combination of rows between both tables, which is not suitable for finding matched user records only.
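As a rough illustration of the matching behaviour described above, the sketch below runs an INNER JOIN on an in-memory SQLite database. The tables `profiles` and `purchases`, their columns, and the sample rows are assumptions made up for this example, not part of the quiz.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
# User 2 has a profile but no purchase; user 3 has a purchase but no profile.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profiles (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchases (user_id INTEGER, item_id TEXT);
    INSERT INTO profiles VALUES (1, 'ana'), (2, 'ben');
    INSERT INTO purchases VALUES (1, 'book'), (3, 'lamp');
""")

# INNER JOIN keeps only users present in both tables.
inner_rows = conn.execute("""
    SELECT p.user_id, p.name, b.item_id
    FROM profiles AS p
    INNER JOIN purchases AS b ON b.user_id = p.user_id
    ORDER BY p.user_id
""").fetchall()

print(inner_rows)
```

With this sample data, only user 1 survives the join, because users 2 and 3 each lack a row in one of the two tables.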
If you want to find pairs of items that are often bought together by the same users, which SQL concept is best suited for this analysis?
Explanation: A self-join lets you reference the same table more than once, which is how you pair up different items bought by the same user. An outer join mainly preserves unmatched rows, which is not needed here. UNION combines the results of different queries without connecting related rows. Window functions compute values across related rows without collapsing them, but they do not by themselves create item-item pairs.
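One possible shape for such a self-join is sketched below using SQLite. The `purchases` table, the aliases `a` and `b`, and the sample rows are hypothetical; the `a.item_id < b.item_id` condition is one common way to keep each unordered pair once.

```python
import sqlite3

# Hypothetical purchase log, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (user_id INTEGER, item_id TEXT);
    INSERT INTO purchases VALUES
        (1, 'bread'), (1, 'butter'),
        (2, 'bread'), (2, 'butter'), (2, 'jam'),
        (3, 'jam');
""")

# Join the table to itself on user_id; the "<" keeps each unordered
# item pair once and drops an item paired with itself.
pair_rows = conn.execute("""
    SELECT a.item_id AS item_a, b.item_id AS item_b,
           COUNT(DISTINCT a.user_id) AS shared_users
    FROM purchases AS a
    JOIN purchases AS b
      ON a.user_id = b.user_id
     AND a.item_id < b.item_id
    GROUP BY a.item_id, b.item_id
    ORDER BY shared_users DESC, item_a, item_b
""").fetchall()

print(pair_rows)
```

Here bread and butter come out on top because two users bought both, while every other pair is shared by a single user.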
To compute how many unique users have interacted with each item, which aggregate function should be used in SQL?
Explanation: COUNT(DISTINCT user_id) returns the number of different users linked to each item, which is essential for user-item interaction analysis. SUM, AVG, and MIN would calculate numerical summaries of user IDs instead of counting distinct users. This makes COUNT(DISTINCT user_id) the appropriate choice.
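The difference between counting rows and counting distinct users can be seen in a small sketch like the following. The `interactions` table and its rows are assumed for the example; user 1 deliberately interacts with the same item twice.

```python
import sqlite3

# Hypothetical interaction log; user 1 touches 'mug' twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interactions (user_id INTEGER, item_id TEXT);
    INSERT INTO interactions VALUES
        (1, 'mug'), (1, 'mug'), (2, 'mug'), (2, 'pen');
""")

# COUNT(*) counts every event; COUNT(DISTINCT user_id) counts each user once.
item_stats = conn.execute("""
    SELECT item_id,
           COUNT(*) AS total_events,
           COUNT(DISTINCT user_id) AS unique_users
    FROM interactions
    GROUP BY item_id
    ORDER BY item_id
""").fetchall()

print(item_stats)
```

For 'mug' the two counts differ (three events, two users), which is exactly the case where COUNT(*) would overstate reach.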
Given a table where each row indicates a user viewing a product, what SQL approach lets you find the top N combinations of users who have viewed the same item together?
Explanation: A self-join on the item column pairs users who viewed the same product. Adding a condition such as a.user_id < b.user_id keeps each pair once and stops a user from being paired with themselves; grouping by the two user IDs and ordering by the shared-item count then gives the top N pairs. RIGHT JOIN is unnecessary and does not by itself match users on the same item, while UNION ALL simply stacks rows. NATURAL JOIN matches on every column the two tables share by name, which can produce join conditions you did not intend.
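A minimal sketch of this user-pair query, run against SQLite, might look like the following. The table `item_views`, the aliases `v1` and `v2`, the variable `top_n`, and the data are all assumptions for illustration.

```python
import sqlite3

# Hypothetical view log, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_views (user_id INTEGER, item_id TEXT);
    INSERT INTO item_views VALUES
        (1, 'a'), (2, 'a'), (3, 'a'),
        (1, 'b'), (2, 'b'),
        (3, 'c');
""")

top_n = 2  # how many user pairs to return

# Self-join on item_id; "v1.user_id < v2.user_id" keeps each pair once.
user_pairs = conn.execute("""
    SELECT v1.user_id AS user_a, v2.user_id AS user_b,
           COUNT(DISTINCT v1.item_id) AS shared_items
    FROM item_views AS v1
    JOIN item_views AS v2
      ON v1.item_id = v2.item_id
     AND v1.user_id < v2.user_id
    GROUP BY v1.user_id, v2.user_id
    ORDER BY shared_items DESC, user_a, user_b
    LIMIT ?
""", (top_n,)).fetchall()

print(user_pairs)
```

The extra sort keys after `shared_items DESC` are only there to make ties come back in a stable order.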
Which combination of SQL clauses is best for retrieving the 5 items with the highest number of user interactions?
Explanation: ORDER BY interaction_count DESC LIMIT 5 sorts items by interaction count and returns the top 5, making it ideal for top-N queries in dialects that support LIMIT. GROUP BY user_id LIMIT 5 groups by users, not items. HAVING COUNT(*) = 5 finds items with exactly 5 interactions, not the top ones. SELECT TOP 5 is specific to dialects such as SQL Server, and without an ORDER BY it does not guarantee that the returned rows are the most popular items.
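The top-5 pattern can be tried out with a sketch like the one below, which uses SQLite's LIMIT syntax. The `interactions` table and the seven made-up items are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interactions (user_id INTEGER, item_id TEXT)")

# Hypothetical data: item 'a' gets 7 interactions, 'b' gets 6, ... 'g' gets 1.
counts = {"a": 7, "b": 6, "c": 5, "d": 4, "e": 3, "f": 2, "g": 1}
conn.executemany(
    "INSERT INTO interactions VALUES (?, ?)",
    [(user_id, item) for item, n in counts.items() for user_id in range(n)],
)

# Aggregate per item, sort by the count, and keep the first five rows.
top_items = conn.execute("""
    SELECT item_id, COUNT(*) AS interaction_count
    FROM interactions
    GROUP BY item_id
    ORDER BY interaction_count DESC, item_id
    LIMIT 5
""").fetchall()

print(top_items)
```

Items 'f' and 'g' fall outside the cut. In SQL Server the same idea would be written with SELECT TOP 5 plus an ORDER BY, and standard SQL offers FETCH FIRST 5 ROWS ONLY.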
When constructing a co-occurrence matrix for items based on user purchases, which columns should typically be joined on?
Explanation: Joining on user_id links multiple item purchases by the same user, which is necessary for co-occurrence analysis. Joining only on item_id does not show which items co-occurred in the same user's behavior. Joining on date or order_total would not associate the correct co-purchasing behavior.
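To show how a join on user_id can feed an actual matrix, the sketch below collects the joined counts into a nested dictionary. The `purchases` table, the data, and the `matrix` variable are hypothetical, and using `<>` here (so that both directions of each pair appear) is just one way to get a symmetric result.

```python
import sqlite3
from collections import defaultdict

# Hypothetical purchase log, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (user_id INTEGER, item_id TEXT);
    INSERT INTO purchases VALUES
        (1, 'tea'), (1, 'cup'),
        (2, 'tea'), (2, 'cup'), (2, 'kettle');
""")

# Join on user_id so that items bought by the same user line up;
# "<>" keeps both (x, y) and (y, x) but skips the diagonal.
cells = conn.execute("""
    SELECT a.item_id, b.item_id, COUNT(DISTINCT a.user_id)
    FROM purchases AS a
    JOIN purchases AS b
      ON a.user_id = b.user_id
     AND a.item_id <> b.item_id
    GROUP BY a.item_id, b.item_id
""").fetchall()

# Reshape the long-format rows into matrix[row_item][col_item] = count.
matrix = defaultdict(dict)
for row_item, col_item, shared_users in cells:
    matrix[row_item][col_item] = shared_users

print(dict(matrix))
```

Looking up `matrix['tea']['cup']` and `matrix['cup']['tea']` gives the same count, since the join produces each pair in both orders.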
If you want to list all items and include those that have never been purchased by any user, which SQL join should you use with the user–item interaction logs?
Explanation: LEFT JOIN includes all records from the left table (items) and any matches from the interaction logs, so items appear whether or not they have purchases. INNER JOIN would exclude items with no user interactions. RIGHT JOIN, with items still on the left, would instead preserve every row of the interaction logs, so never-purchased items would still be dropped. CROSS JOIN would pair every item with every log row, regardless of whether they are actually related.
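A small sketch of this pattern, under an assumed `items` catalogue and `purchases` log, is shown below. Counting `p.user_id` rather than `*` matters here: unmatched items get a NULL on the right side, and COUNT over a column ignores NULLs.

```python
import sqlite3

# Hypothetical catalogue and purchase log; 'chair' was never bought.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchases (user_id INTEGER, item_id INTEGER);
    INSERT INTO items VALUES (1, 'lamp'), (2, 'desk'), (3, 'chair');
    INSERT INTO purchases VALUES (10, 1), (11, 1), (10, 2);
""")

# items is the left table, so every item survives the join.
# COUNT(p.user_id) skips the NULLs produced for unmatched items.
item_counts = conn.execute("""
    SELECT i.name, COUNT(p.user_id) AS purchase_count
    FROM items AS i
    LEFT JOIN purchases AS p ON p.item_id = i.item_id
    GROUP BY i.item_id, i.name
    ORDER BY i.item_id
""").fetchall()

print(item_counts)
```

The never-purchased item still appears in the result, with a count of zero.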
Which SQL clause should you use to show only those items purchased more than 10 times, after grouping by item_id?
Explanation: HAVING filters groups after aggregation, allowing you to show only items purchased more than 10 times. WHERE cannot filter on aggregated values because it is applied to individual rows before aggregation. GROUPED is not a SQL keyword. LIMIT restricts the number of output rows but does not filter by count.
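The threshold filter can be sketched as follows with SQLite. The `purchases` table and the three made-up items are assumptions; 'ink' is given exactly 10 purchases to show that "more than 10" excludes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id INTEGER, item_id TEXT)")

# Hypothetical data: 'pen' bought 12 times, 'ink' 10 times, 'pad' 3 times.
counts = {"pen": 12, "ink": 10, "pad": 3}
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(user_id, item) for item, n in counts.items() for user_id in range(n)],
)

# HAVING is evaluated per group, after COUNT(*) has been computed.
popular = conn.execute("""
    SELECT item_id, COUNT(*) AS purchase_count
    FROM purchases
    GROUP BY item_id
    HAVING COUNT(*) > 10
    ORDER BY item_id
""").fetchall()

print(popular)
```

Only 'pen' passes the filter; moving the same condition into a WHERE clause would be rejected, since the count does not exist yet at that stage.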
What technique should you use to ensure that each user-item pair appears only once in SQL aggregation results?
Explanation: GROUP BY user_id, item_id collapses the rows for each user-item pair into a single group, so every pair appears once in the output. ORDER BY only sorts the results; it does not aggregate them. SELECT * would return all raw records, including duplicates. CROSS JOIN produces all possible pairs, regardless of actual interactions.
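The collapsing effect of grouping on both columns is sketched below. The `interactions` table and its rows are assumed for the example, with one user-item pair logged twice on purpose.

```python
import sqlite3

# Hypothetical log; the pair (1, 'a') is recorded twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interactions (user_id INTEGER, item_id TEXT);
    INSERT INTO interactions VALUES
        (1, 'a'), (1, 'a'), (1, 'b'), (2, 'a');
""")

# One output row per (user_id, item_id) group, with its event count.
pair_counts = conn.execute("""
    SELECT user_id, item_id, COUNT(*) AS events
    FROM interactions
    GROUP BY user_id, item_id
    ORDER BY user_id, item_id
""").fetchall()

print(pair_counts)
```

Four raw rows become three grouped rows, and the repeated pair is reported once with a count of 2.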
To determine the average number of items interacted with by each user, which SQL aggregate function is most suitable?
Explanation: AVG(item_count) gives the average number of items per user, provided item_count has first been computed per user (for example, in a subquery that counts items with GROUP BY user_id); AVG then averages those per-user counts. SUM(user_id) adds user IDs, which has no analytical value. MAX(user_id) finds the highest user ID, not the average interaction count. COUNT(*) just totals rows and does not produce an average per user.
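The two-step shape (count per user, then average) might be sketched like this. The `interactions` table, the subquery alias `per_user`, and the data are hypothetical, and counting distinct items per user is an assumption about what "items interacted with" should mean.

```python
import sqlite3

# Hypothetical log: user 1 touches three distinct items (one twice),
# user 2 touches one item.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interactions (user_id INTEGER, item_id TEXT);
    INSERT INTO interactions VALUES
        (1, 'a'), (1, 'b'), (1, 'c'), (1, 'a'),
        (2, 'a');
""")

# Inner query: distinct items per user. Outer query: average of those counts.
avg_items = conn.execute("""
    SELECT AVG(item_count)
    FROM (
        SELECT user_id, COUNT(DISTINCT item_id) AS item_count
        FROM interactions
        GROUP BY user_id
    ) AS per_user
""").fetchone()[0]

print(avg_items)
```

The per-user counts here are 3 and 1, so the average comes out at 2.0. Swapping COUNT(DISTINCT item_id) for COUNT(*) would average raw events instead.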