SQL Joins and Aggregations for Data Pipelines Quiz

Test your knowledge of SQL joins, aggregations, deduplication, rollups, and upserts. This quiz helps you review important SQL concepts for building effective pipeline transforms and data workflows.

  1. Understanding INNER JOIN Usage

    Which SQL join type returns only the rows that have matching values in both tables, such as when combining orders and customers?

    1. LEFT JOIN
    2. OUTER JOIN
    3. FULL JOIN
    4. INNER JOIN

    Explanation: INNER JOIN returns rows that have matching values in both joined tables, making it useful for combining related data. LEFT JOIN includes all rows from the left table, even if there is no match, while FULL JOIN combines all rows from both sides, with NULLs if there is no match. OUTER JOIN is a general term and is not a specific operator in SQL syntax.

  2. Applying GROUP BY for Aggregations

    What SQL clause should you use to roll up total sales amounts by region from a transactions table?

    1. ORDER BY
    2. GROUP BY
    3. HAVING
    4. DISTINCT

    Explanation: GROUP BY groups rows sharing the same value in specified columns, commonly used with aggregate functions to create rollups. ORDER BY is for sorting results. HAVING filters groups after aggregation, and DISTINCT eliminates duplicate rows without aggregation.

  3. Using COUNT for Deduplication

    If you want to count the number of unique customer IDs in a sales table, which SQL function should you use?

    1. SUM
    2. AVG
    3. COUNT(DISTINCT ...)
    4. COUNT(*)

    Explanation: COUNT(DISTINCT ...) counts unique, non-null values in a column, perfect for deduplicating results. SUM adds up values, AVG returns the mean, and COUNT(*) counts all rows, including duplicates.

  4. LEFT JOIN Result Rows

    If you use a LEFT JOIN between users and orders, what will happen to users with no orders?

    1. They are excluded from the result.
    2. They are duplicated in the result.
    3. They appear with NULLs for order columns.
    4. They appear with zeroes in order columns.

    Explanation: LEFT JOIN includes all rows from the left table (users) and fills in NULLs for right table (orders) columns where no match exists. They are not excluded (that's INNER JOIN), nor would they have zeroes unless specifically assigned. Duplication only happens if multiple matches exist.

  5. Identifying a Rollup Query

    Which query type is used to provide subtotals and grand totals, such as sales per region plus an overall total?

    1. PIVOT
    2. ROLLUP
    3. INTERSECT
    4. UNION

    Explanation: ROLLUP is a GROUP BY extension that adds subtotal and total rows to aggregation results. PIVOT transforms rows to columns, UNION combines results of multiple queries, and INTERSECT returns common rows from queries.

  6. Handling Duplicate Rows

    What SQL keyword can be used in a SELECT query to remove duplicate rows from the returned result set?

    1. DISTINCT
    2. GROUP
    3. FILTER
    4. UNIQUE

    Explanation: DISTINCT returns only unique rows in the result set. UNIQUE is not a SQL keyword for this purpose, though it's a constraint on columns. FILTER and GROUP serve other roles, like conditional aggregation and grouping.

  7. Purpose of an UPSERT Operation

    In database terminology, what does an 'upsert' typically do during data transformations?

    1. Insert new rows or update existing ones
    2. Only delete rows
    3. Only select rows
    4. Rename columns

    Explanation: An upsert inserts new records or updates existing ones if they conflict with primary keys, enabling merge-like functionality. Deleting rows, selecting rows, and renaming columns are unrelated operations.

  8. Selecting Latest Entry per Group

    To select the most recent order for each customer, which SQL construct is commonly required?

    1. Window functions
    2. CASCADE
    3. CROSS JOIN
    4. TRIGGER

    Explanation: Window functions, such as ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...), help pick the latest entry per group. CROSS JOIN creates all combinations; CASCADE and TRIGGER relate to other database features.

  9. Joining for Deduplication

    When deduplicating rows based on email in a users table, which SQL technique is often used?

    1. MINUS
    2. DROP TABLE
    3. Self-join
    4. RIGHT JOIN

    Explanation: A self-join allows comparing rows within the same table to find or remove duplicates. RIGHT JOIN relates to joining two different tables, DROP TABLE deletes data structures, and MINUS returns set differences.

  10. Aggregating After a Join

    Which aggregate function would you use to calculate total revenue by joining orders and products tables?

    1. LIST
    2. MEDIAN
    3. SUM
    4. COUNT

    Explanation: SUM is used to add together numeric values, such as total revenue. COUNT tallies rows, MEDIAN is not standard in all SQL dialects, and LIST is not a common aggregate function.

  11. Identifying FULL OUTER JOIN Results

    What does a FULL OUTER JOIN between tables A and B return?

    1. All rows from A and B, with NULLs where there is no match
    2. Only rows with matching keys
    3. All rows from A only
    4. Only non-matching rows

    Explanation: FULL OUTER JOIN gives every row from both tables, placing NULLs in columns where no match exists. It does not restrict to just matching or non-matching rows, nor does it limit the result to just table A.

  12. Purpose of HAVING Clause

    After grouping sales by item, how do you filter groups to show only those with totals above 100?

    1. HAVING
    2. AS
    3. WHERE
    4. INTO

    Explanation: HAVING filters the results after aggregation, such as groups of sales totals. WHERE operates before grouping, INTO creates new tables, and AS is used for renaming (aliasing).

  13. Recognizing CROSS JOIN Outcomes

    What is created when you perform a CROSS JOIN between two small tables?

    1. Null results for unmatched rows
    2. Rows with only unique keys
    3. A Cartesian product of both tables
    4. A filtered subset of matching rows

    Explanation: A CROSS JOIN produces every possible combination of rows, known as a Cartesian product. It does not filter or match rows, nor is it related to uniqueness or nulls for unmatched data.

  14. Choosing the Correct JOIN for All Records

    To show all customers, even those without any sales, along with their sales if available, which JOIN is appropriate?

    1. LEFT JOIN
    2. NATURAL JOIN
    3. RIGHT JOIN
    4. INNER JOIN

    Explanation: LEFT JOIN includes all customers and attaches available sales, using NULLs where none exist. INNER JOIN only returns matches, RIGHT JOIN is the reverse direction, and NATURAL JOIN automatically joins on columns with the same name but doesn't specifically include all records from one side.

  15. Ensuring Atomic Row Selection

    Which function can help ensure only a single unique row is selected for each group in an aggregation transformation?

    1. FIRST()
    2. ROW_NUMBER()
    3. SUM()
    4. COLLATE

    Explanation: ROW_NUMBER() assigns a unique sequential integer to rows within a partition, helping pick one row per group. SUM() aggregates values, FIRST() is not a standard SQL function, and COLLATE is used for sorting and comparison settings.

  16. Identifying Duplicates in Table Data

    Which SQL operator can you use in conjunction with GROUP BY to find duplicate email addresses in a contact list?

    1. ORDER BY ASC
    2. MINUS
    3. HAVING COUNT(*) u003E 1
    4. OVERLAP

    Explanation: HAVING COUNT(*) u003E 1 filters grouped results to show only values that occur more than once, essential for finding duplicates. ORDER BY simply sorts data, MINUS shows differences between result sets, and OVERLAP is not standard for deduplication.