Challenge your understanding of SQL joins and aggregations by identifying ten frequent mistakes and their solutions. This quiz targets foundational errors, practical examples, and tips for accuracy in SQL queries related to database systems.
Which SQL clause would you use to retrieve all records from both tables, including non-matching ones, when joining 'customers' and 'orders'?
Explanation: FULL OUTER JOIN returns all records from both tables and fills unmatched fields with NULLs. INNER JOIN only returns matching records in both tables. CROSS JOIN combines every row of the first table with every row of the second, regardless of matching. SELF JOIN joins a table with itself, which is not relevant here.
If you want to count the number of orders per customer, why shouldn't you use SELECT * with GROUP BY in your query?
Explanation: Using SELECT * with GROUP BY causes ambiguity because it tries to select all columns, including those not grouped or aggregated, which SQL does not allow. Selecting all columns can trigger errors unless each column is aggregated or included in GROUP BY. SELECT * does not speed up queries, doesn't always produce errors unless misused, and does not hide NULL values.
What common mistake occurs if you use JOIN without an ON clause between 'employees' and 'departments'?
Explanation: A JOIN without an ON clause produces a CROSS JOIN, returning every possible combination of rows between the tables. This creates a much larger result set than intended. LEFT JOIN and sorting are unrelated to the absence of the ON clause. Only proper ON clauses ensure matching rows.
Which mistake leads to errors when you write SELECT name, COUNT(*) FROM users?
Explanation: Columns in SELECT statements that are not aggregated must appear in the GROUP BY clause to avoid SQL errors. Simply using COUNT without columns is allowed, and using an alias or lacking ORDER BY does not cause aggregation errors.
If you perform a LEFT JOIN from 'products' to 'categories', which records from 'products' might have NULL category fields?
Explanation: LEFT JOIN returns all 'products', filling category fields with NULL if there is no match in 'categories'. Products assigned to all categories or without prices are unrelated to this join. Rows representing categories without products are not directly represented in the 'products' table with LEFT JOIN.
Why should join conditions be specified in the ON clause instead of the WHERE clause when joining 'students' and 'classes'?
Explanation: Using the ON clause preserves OUTER JOIN semantics, while using WHERE can inadvertently filter out rows and turn an OUTER JOIN effectively into an INNER JOIN. It doesn't inherently make the query faster, nor does it always clarify unambiguity. OUTER and INNER JOINs are distinguished by placement of join conditions.
What is a common mistake when writing SELECT COUNT(DISTINCT column1, column2) FROM data?
Explanation: Many SQL dialects allow COUNT(DISTINCT col) for a single column; using multiple columns like COUNT(DISTINCT column1, column2) may not be supported and causes errors. The other options misinterpret the function: it neither sums nor averages values, nor does it remove all duplicates from the table.
When joining tables with columns of the same name, what often happens if you forget to use table aliases?
Explanation: If two tables have columns with the same name and you do not use aliases or prefix them, SQL cannot determine which column to use and throws an 'ambiguous column' error. SQL does not automatically choose which table to use, does not fill in NULLs, nor does it combine same-named columns.
Which issue arises if your GROUP BY references columns not present in your SELECT clause?
Explanation: Grouping by columns not shown in SELECT works fine—the results will be grouped by those columns, but only the selected fields appear in the result. SQL does not remove any columns automatically, and the aggregates themselves continue to use the correct groups.
Why might an INNER JOIN between 'authors' and 'books' produce duplicate author names in the results?
Explanation: INNER JOIN produces one result row per matching pair. If an author wrote several books, their name repeats for each book. Not using the table at all doesn't produce repeated data, misspelling DISTINCT only affects deduplication, and lacking matching columns leads to empty results.
When should you use the HAVING clause instead of WHERE in a query grouping orders by customer?
Explanation: HAVING filters records after GROUP BY and aggregation are performed, making it ideal for conditions on aggregated data. WHERE filters raw data before aggregation. HAVING cannot be used to join tables or sort data, which require different SQL clauses.
What mistake is shown in SELECT name, SUM(sales) FROM reps without grouping by name?
Explanation: You must group by all non-aggregated columns in your SELECT statement; otherwise, SQL throws an error. SUM does not automatically handle grouping, and ignoring the SUM function or returning zero is not correct behavior here.
How does applying DISTINCT to only one column in an aggregate function affect results like SELECT COUNT(DISTINCT status) FROM orders?
Explanation: COUNT(DISTINCT status) counts the number of different values in the status column. It does not return all table rows, sum values, or remove table duplicates, as these are misinterpretations of the COUNT and DISTINCT behavior.
What happens if you use LEFT JOIN but accidentally reverse the tables: SELECT * FROM B LEFT JOIN A ON B.a_id = A.id?
Explanation: In a LEFT JOIN, all rows from the first table (here, B) are included, with data from A filled in when matches exist. Reversing the order changes which table is the 'left' one. The join itself remains valid SQL, and only matched results or returning all rows from A do not apply.
What mistake occurs if you aggregate data in one table before joining, like SELECT avg(price) FROM products JOIN sales ON product_id = id?
Explanation: If you aggregate 'products' before joining with 'sales', you might miss grouping by relevant attributes related to sales, which leads to incorrect analysis. Using AVG does not always require DISTINCT, SQL does not necessarily double values, and a syntax error only occurs if the query structure is violated.
What is the likely result of joining 'employees' and 'locations' without a logical key or ON condition?
Explanation: A join without a key or ON condition leads to a Cartesian product, producing every possible row combination between the tables. The query does not necessarily fail or return an empty set, nor does it reduce results to just one row unless more restrictive criteria are specified.