SQL Joins and Aggregations for Data Pipelines Quiz

Test your knowledge of SQL joins, aggregations, deduplication, rollups, and upserts. This quiz helps you review important SQL concepts for building effective pipeline transforms and data workflows.

Understanding INNER JOIN Usage
Which SQL join type returns only the rows that have matching values in both tables, such as when combining orders and customers?
1. LEFT JOIN
2. OUTER JOIN
3. FULL JOIN
4. INNER JOIN
Explanation: INNER JOIN returns rows that have matching values in both joined tables, making it useful for combining related data. LEFT JOIN includes all rows from the left table, even if there is no match, while FULL JOIN combines all rows from both sides, with NULLs if there is no match. OUTER JOIN is a general term and is not a specific operator in SQL syntax.
Applying GROUP BY for Aggregations
What SQL clause should you use to roll up total sales amounts by region from a transactions table?
1. ORDER BY
2. GROUP BY
3. HAVING
4. DISTINCT
Explanation: GROUP BY groups rows sharing the same value in specified columns, commonly used with aggregate functions to create rollups. ORDER BY is for sorting results. HAVING filters groups after aggregation, and DISTINCT eliminates duplicate rows without aggregation.
Using COUNT for Deduplication
If you want to count the number of unique customer IDs in a sales table, which SQL function should you use?
1. SUM
2. AVG
3. COUNT(DISTINCT ...)
4. COUNT(*)
Explanation: COUNT(DISTINCT ...) counts unique, non-null values in a column, perfect for deduplicating results. SUM adds up values, AVG returns the mean, and COUNT(*) counts all rows, including duplicates.
LEFT JOIN Result Rows
If you use a LEFT JOIN between users and orders, what will happen to users with no orders?
1. They are excluded from the result.
2. They are duplicated in the result.
3. They appear with NULLs for order columns.
4. They appear with zeroes in order columns.
Explanation: LEFT JOIN includes all rows from the left table (users) and fills in NULLs for right table (orders) columns where no match exists. They are not excluded (that's INNER JOIN), nor would they have zeroes unless specifically assigned. Duplication only happens if multiple matches exist.
Identifying a Rollup Query
Which query type is used to provide subtotals and grand totals, such as sales per region plus an overall total?
1. PIVOT
2. ROLLUP
3. INTERSECT
4. UNION
Explanation: ROLLUP is a GROUP BY extension that adds subtotal and total rows to aggregation results. PIVOT transforms rows to columns, UNION combines results of multiple queries, and INTERSECT returns common rows from queries.
Handling Duplicate Rows
What SQL keyword can be used in a SELECT query to remove duplicate rows from the returned result set?
1. DISTINCT
2. GROUP
3. FILTER
4. UNIQUE
Explanation: DISTINCT returns only unique rows in the result set. UNIQUE is not a SQL keyword for this purpose, though it's a constraint on columns. FILTER and GROUP serve other roles, like conditional aggregation and grouping.
Purpose of an UPSERT Operation
In database terminology, what does an 'upsert' typically do during data transformations?
1. Insert new rows or update existing ones
2. Only delete rows
3. Only select rows
4. Rename columns
Explanation: An upsert inserts new records or updates existing ones if they conflict with primary keys, enabling merge-like functionality. Deleting rows, selecting rows, and renaming columns are unrelated operations.
Selecting Latest Entry per Group
To select the most recent order for each customer, which SQL construct is commonly required?
1. Window functions
2. CASCADE
3. CROSS JOIN
4. TRIGGER
Explanation: Window functions, such as ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...), help pick the latest entry per group. CROSS JOIN creates all combinations; CASCADE and TRIGGER relate to other database features.
Joining for Deduplication
When deduplicating rows based on email in a users table, which SQL technique is often used?
1. MINUS
2. DROP TABLE
3. Self-join
4. RIGHT JOIN
Explanation: A self-join allows comparing rows within the same table to find or remove duplicates. RIGHT JOIN relates to joining two different tables, DROP TABLE deletes data structures, and MINUS returns set differences.
Aggregating After a Join
Which aggregate function would you use to calculate total revenue by joining orders and products tables?
1. LIST
2. MEDIAN
3. SUM
4. COUNT
Explanation: SUM is used to add together numeric values, such as total revenue. COUNT tallies rows, MEDIAN is not standard in all SQL dialects, and LIST is not a common aggregate function.
Identifying FULL OUTER JOIN Results
What does a FULL OUTER JOIN between tables A and B return?
1. All rows from A and B, with NULLs where there is no match
2. Only rows with matching keys
3. All rows from A only
4. Only non-matching rows
Explanation: FULL OUTER JOIN gives every row from both tables, placing NULLs in columns where no match exists. It does not restrict to just matching or non-matching rows, nor does it limit the result to just table A.
Purpose of HAVING Clause
After grouping sales by item, how do you filter groups to show only those with totals above 100?
1. HAVING
2. AS
3. WHERE
4. INTO
Explanation: HAVING filters the results after aggregation, such as groups of sales totals. WHERE operates before grouping, INTO creates new tables, and AS is used for renaming (aliasing).
Recognizing CROSS JOIN Outcomes
What is created when you perform a CROSS JOIN between two small tables?
1. Null results for unmatched rows
2. Rows with only unique keys
3. A Cartesian product of both tables
4. A filtered subset of matching rows
Explanation: A CROSS JOIN produces every possible combination of rows, known as a Cartesian product. It does not filter or match rows, nor is it related to uniqueness or nulls for unmatched data.
Choosing the Correct JOIN for All Records
To show all customers, even those without any sales, along with their sales if available, which JOIN is appropriate?
1. LEFT JOIN
2. NATURAL JOIN
3. RIGHT JOIN
4. INNER JOIN
Explanation: LEFT JOIN includes all customers and attaches available sales, using NULLs where none exist. INNER JOIN only returns matches, RIGHT JOIN is the reverse direction, and NATURAL JOIN automatically joins on columns with the same name but doesn't specifically include all records from one side.
Ensuring Atomic Row Selection
Which function can help ensure only a single unique row is selected for each group in an aggregation transformation?
1. FIRST()
2. ROW_NUMBER()
3. SUM()
4. COLLATE
Explanation: ROW_NUMBER() assigns a unique sequential integer to rows within a partition, helping pick one row per group. SUM() aggregates values, FIRST() is not a standard SQL function, and COLLATE is used for sorting and comparison settings.
Identifying Duplicates in Table Data
Which SQL operator can you use in conjunction with GROUP BY to find duplicate email addresses in a contact list?
1. ORDER BY ASC
2. MINUS
3. HAVING COUNT(*) u003E 1
4. OVERLAP
Explanation: HAVING COUNT(*) u003E 1 filters grouped results to show only values that occur more than once, essential for finding duplicates. ORDER BY simply sorts data, MINUS shows differences between result sets, and OVERLAP is not standard for deduplication.

SQL Joins and Aggregations for Data Pipelines Quiz

Understanding INNER JOIN Usage

Applying GROUP BY for Aggregations

Using COUNT for Deduplication

LEFT JOIN Result Rows

Identifying a Rollup Query

Handling Duplicate Rows

Purpose of an UPSERT Operation

Selecting Latest Entry per Group

Joining for Deduplication

Aggregating After a Join

Identifying FULL OUTER JOIN Results

Purpose of HAVING Clause

Recognizing CROSS JOIN Outcomes

Choosing the Correct JOIN for All Records

Ensuring Atomic Row Selection

Identifying Duplicates in Table Data