Common SQL Join & Aggregation Mistakes: Spot and Fix Quiz

Challenge your understanding of SQL joins and aggregations by identifying ten frequent mistakes and their solutions. This quiz targets foundational errors, practical examples, and tips for accuracy in SQL queries related to database systems.

Confusing INNER JOIN with OUTER JOIN
Which SQL clause would you use to retrieve all records from both tables, including non-matching ones, when joining 'customers' and 'orders'?
1. FULL OUTER JOIN
2. INNER JOIN
3. CROSS JOIN
4. SELF JOIN
Explanation: FULL OUTER JOIN returns all records from both tables and fills unmatched fields with NULLs. INNER JOIN only returns matching records in both tables. CROSS JOIN combines every row of the first table with every row of the second, regardless of matching. SELF JOIN joins a table with itself, which is not relevant here.
Using SELECT * in Aggregations
If you want to count the number of orders per customer, why shouldn't you use SELECT * with GROUP BY in your query?
1. SELECT * is ambiguous with GROUP BY
2. SELECT * increases query speed
3. SELECT * always produces errors
4. SELECT * hides NULL values
Explanation: Using SELECT * with GROUP BY causes ambiguity because it tries to select all columns, including those not grouped or aggregated, which SQL does not allow. Selecting all columns can trigger errors unless each column is aggregated or included in GROUP BY. SELECT * does not speed up queries, doesn't always produce errors unless misused, and does not hide NULL values.
Missing ON Clause in JOIN
What common mistake occurs if you use JOIN without an ON clause between 'employees' and 'departments'?
1. It creates a CROSS JOIN with all combinations
2. It treats it as a LEFT JOIN
3. It sorts data by default
4. It returns only matching rows
Explanation: A JOIN without an ON clause produces a CROSS JOIN, returning every possible combination of rows between the tables. This creates a much larger result set than intended. LEFT JOIN and sorting are unrelated to the absence of the ON clause. Only proper ON clauses ensure matching rows.
Incorrect Aggregation Placement
Which mistake leads to errors when you write SELECT name, COUNT(*) FROM users?
1. Not including name in GROUP BY
2. Using COUNT with no columns
3. Using an alias for COUNT
4. Having no ORDER BY clause
Explanation: Columns in SELECT statements that are not aggregated must appear in the GROUP BY clause to avoid SQL errors. Simply using COUNT without columns is allowed, and using an alias or lacking ORDER BY does not cause aggregation errors.
Null Handling in Joins
If you perform a LEFT JOIN from 'products' to 'categories', which records from 'products' might have NULL category fields?
1. Products with no matching category
2. Products belonging to all categories
3. All products without prices
4. Categories with no products
Explanation: LEFT JOIN returns all 'products', filling category fields with NULL if there is no match in 'categories'. Products assigned to all categories or without prices are unrelated to this join. Rows representing categories without products are not directly represented in the 'products' table with LEFT JOIN.
Using WHERE Instead of ON in JOINs
Why should join conditions be specified in the ON clause instead of the WHERE clause when joining 'students' and 'classes'?
1. It affects join type and result set
2. It is always faster
3. It allows OUTER JOINs to execute as INNER JOINs
4. It makes queries unambiguous
Explanation: Using the ON clause preserves OUTER JOIN semantics, while using WHERE can inadvertently filter out rows and turn an OUTER JOIN effectively into an INNER JOIN. It doesn't inherently make the query faster, nor does it always clarify unambiguity. OUTER and INNER JOINs are distinguished by placement of join conditions.
Aggregating with DISTINCT Incorrectly
What is a common mistake when writing SELECT COUNT(DISTINCT column1, column2) FROM data?
1. COUNT(DISTINCT) supports only one column argument
2. It sums all values in columns
3. It removes all duplicates from the table
4. It erroneously calculates averages
Explanation: Many SQL dialects allow COUNT(DISTINCT col) for a single column; using multiple columns like COUNT(DISTINCT column1, column2) may not be supported and causes errors. The other options misinterpret the function: it neither sums nor averages values, nor does it remove all duplicates from the table.
Forgetting Table Aliases
When joining tables with columns of the same name, what often happens if you forget to use table aliases?
1. SQL returns an 'ambiguous column' error
2. SQL automatically chooses the correct table
3. NULL values are returned for duplicate columns
4. All columns are combined into one
Explanation: If two tables have columns with the same name and you do not use aliases or prefix them, SQL cannot determine which column to use and throws an 'ambiguous column' error. SQL does not automatically choose which table to use, does not fill in NULLs, nor does it combine same-named columns.
Grouping by Non-Selected Columns
Which issue arises if your GROUP BY references columns not present in your SELECT clause?
1. There is no error; it works as intended
2. Only selected columns are grouped
3. SQL removes those columns from results
4. It always produces incorrect aggregates
Explanation: Grouping by columns not shown in SELECT works fine—the results will be grouped by those columns, but only the selected fields appear in the result. SQL does not remove any columns automatically, and the aggregates themselves continue to use the correct groups.
Joins Yielding Duplicate Rows
Why might an INNER JOIN between 'authors' and 'books' produce duplicate author names in the results?
1. Authors with multiple books repeat once per book
2. The JOIN did not use the authors' table
3. DISTINCT keyword was misspelled
4. The tables have no matching columns
Explanation: INNER JOIN produces one result row per matching pair. If an author wrote several books, their name repeats for each book. Not using the table at all doesn't produce repeated data, misspelling DISTINCT only affects deduplication, and lacking matching columns leads to empty results.
Filtering Before Aggregation
When should you use the HAVING clause instead of WHERE in a query grouping orders by customer?
1. To filter aggregated results after GROUP BY
2. To filter data before grouping
3. To join additional tables
4. To sort records alphabetically
Explanation: HAVING filters records after GROUP BY and aggregation are performed, making it ideal for conditions on aggregated data. WHERE filters raw data before aggregation. HAVING cannot be used to join tables or sort data, which require different SQL clauses.
Using Aggregates with Non-Grouped Columns
What mistake is shown in SELECT name, SUM(sales) FROM reps without grouping by name?
1. It causes a syntax error due to non-aggregated column
2. SUM automatically groups by name
3. It ignores the SUM function
4. SUM returns zero always
Explanation: You must group by all non-aggregated columns in your SELECT statement; otherwise, SQL throws an error. SUM does not automatically handle grouping, and ignoring the SUM function or returning zero is not correct behavior here.
Selecting DISTINCT with Aggregates
How does applying DISTINCT to only one column in an aggregate function affect results like SELECT COUNT(DISTINCT status) FROM orders?
1. It counts unique status values only
2. It returns all rows in the orders table
3. It sums the status values
4. It removes duplicate rows from the table
Explanation: COUNT(DISTINCT status) counts the number of different values in the status column. It does not return all table rows, sum values, or remove table duplicates, as these are misinterpretations of the COUNT and DISTINCT behavior.
Incorrect JOIN Order
What happens if you use LEFT JOIN but accidentally reverse the tables: SELECT * FROM B LEFT JOIN A ON B.a_id = A.id?
1. You get all rows from B, matched with A where possible
2. You get all rows from A, matched with B where possible
3. It returns only matched rows from both tables
4. The JOIN is invalid in SQL
Explanation: In a LEFT JOIN, all rows from the first table (here, B) are included, with data from A filled in when matches exist. Reversing the order changes which table is the 'left' one. The join itself remains valid SQL, and only matched results or returning all rows from A do not apply.
Improper Use of Aggregates in JOINs
What mistake occurs if you aggregate data in one table before joining, like SELECT avg(price) FROM products JOIN sales ON product_id = id?
1. Aggregated data ignores sales context if not grouped appropriately
2. AVG always requires DISTINCT
3. It produces a syntax error
4. It doubles the aggregated value
Explanation: If you aggregate 'products' before joining with 'sales', you might miss grouping by relevant attributes related to sales, which leads to incorrect analysis. Using AVG does not always require DISTINCT, SQL does not necessarily double values, and a syntax error only occurs if the query structure is violated.
Joining Unrelated Tables Without Key
What is the likely result of joining 'employees' and 'locations' without a logical key or ON condition?
1. A Cartesian product of both tables
2. A query that fails with an error
3. An empty result set
4. Only one row returned
Explanation: A join without a key or ON condition leads to a Cartesian product, producing every possible row combination between the tables. The query does not necessarily fail or return an empty set, nor does it reduce results to just one row unless more restrictive criteria are specified.

Common SQL Join & Aggregation Mistakes: Spot and Fix Quiz

Confusing INNER JOIN with OUTER JOIN

Using SELECT * in Aggregations

Missing ON Clause in JOIN

Incorrect Aggregation Placement

Null Handling in Joins

Using WHERE Instead of ON in JOINs

Aggregating with DISTINCT Incorrectly

Forgetting Table Aliases

Grouping by Non-Selected Columns

Joins Yielding Duplicate Rows

Filtering Before Aggregation

Using Aggregates with Non-Grouped Columns

Selecting DISTINCT with Aggregates

Incorrect JOIN Order

Improper Use of Aggregates in JOINs

Joining Unrelated Tables Without Key