Test your knowledge with this SQL quiz focused on key data analyst interview topics, including duplicate detection, joins, subqueries, window functions, aggregation, and conditional logic. This quiz is designed to help candidates prepare for data analyst interviews by covering essential SQL interview questions and scenarios commonly asked for entry-level positions.
What is the primary purpose of using GROUP BY on columns that should be unique, followed by HAVING COUNT(*) > 1 in an SQL query?
Explanation: Grouping by columns that should be unique places rows with identical values in the same group, and HAVING COUNT(*) > 1 keeps only the groups that contain more than one row, which are the duplicates. The query does not delete duplicates; it only helps in finding them. Identifying the primary key is a schema-level task, and inner joins are for combining tables, not for detecting duplicates.
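As a minimal sketch of this pattern, the snippet below runs the query against an in-memory SQLite database from Python. The `customers` table and its `email` column are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical table: email is supposed to be unique but is not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com")],
)

# Group on the column that should be unique and keep groups with more than one row.
duplicates = conn.execute(
    """
    SELECT email, COUNT(*) AS occurrences
    FROM customers
    GROUP BY email
    HAVING COUNT(*) > 1
    """
).fetchall()
print(duplicates)  # [('a@example.com', 2)]
```

The table itself is unchanged after the query runs; only the duplicated value and its count are reported.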
If you join two tables using an INNER JOIN, which records will be included in the result set?
Explanation: INNER JOIN returns only the rows where there is a match in both tables based on the join condition. The outer joins also keep unmatched rows: LEFT OUTER JOIN keeps every row from the first (left) table, RIGHT OUTER JOIN keeps every row from the second (right) table, and FULL OUTER JOIN keeps unmatched rows from both.
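The difference can be seen in a small sketch that contrasts INNER JOIN with LEFT JOIN. The `employees` and `departments` tables and their columns are hypothetical, not part of the quiz.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER, name TEXT, dept_id INTEGER)")
conn.execute("CREATE TABLE departments (dept_id INTEGER, dept_name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ana", 10), (2, "Ben", 20), (3, "Cy", None)],
)
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(10, "Sales"), (30, "Legal")],
)

# INNER JOIN: only employees whose dept_id matches a department row.
inner_rows = conn.execute(
    """
    SELECT e.name, d.dept_name
    FROM employees e
    INNER JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
    """
).fetchall()

# LEFT JOIN: every employee, with NULL (None in Python) where no department matches.
left_rows = conn.execute(
    """
    SELECT e.name, d.dept_name
    FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id
    """
).fetchall()

print(inner_rows)  # [('Ana', 'Sales')]
print(left_rows)   # [('Ana', 'Sales'), ('Ben', None), ('Cy', None)]
```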
Which SQL query will correctly return the second-highest salary from the 'employees' table?
Explanation: The correct query uses a subquery to find the maximum salary and then selects the highest value below it, which yields the second-highest. The MIN function gives the lowest salary. ROWNUM is an Oracle-specific pseudocolumn rather than standard SQL, and filtering on 'ROWNUM = 2' does not reliably return the second-highest salary. The last option retrieves the two lowest salaries, not the second-highest.
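One way to write the subquery approach described here is sketched below, run through SQLite from Python. The sample salaries are made up, and a tie at the top is included to show that the result is the second-highest distinct value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ana", 50000), ("Ben", 70000), ("Cy", 70000), ("Di", 60000)],
)

# The inner query finds the maximum; the outer query takes the largest value below it.
second_highest = conn.execute(
    """
    SELECT MAX(salary)
    FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
    """
).fetchone()[0]
print(second_highest)  # 60000
```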
Why is the HAVING clause used with GROUP BY in SQL?
Explanation: HAVING filters aggregated results, such as those produced by GROUP BY. It is not used for renaming, ordering, or ranking. ORDER BY is for sorting, aliases and CTEs are for naming, and window functions such as RANK() are used for ranking.
Given an 'employees' table with columns: emp_id, name, salary, manager_id, what does a self-join using e.manager_id = m.emp_id help you retrieve?
Explanation: A self-join links employees to their respective managers using the manager_id and emp_id relationship. It does not identify duplicates or find highest salaries, and salary information can still be included unless filtered out.
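A minimal sketch of this self-join, using the column names from the question and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER, name TEXT, salary INTEGER, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Dana", 9000, None), (2, "Eli", 7000, 1), (3, "Fay", 9500, 1)],
)

# Alias e is the employee side of the join, alias m is the manager side.
pairs = conn.execute(
    """
    SELECT e.name AS employee, m.name AS manager
    FROM employees e
    JOIN employees m ON e.manager_id = m.emp_id
    ORDER BY e.emp_id
    """
).fetchall()
print(pairs)  # [('Eli', 'Dana'), ('Fay', 'Dana')]
```

Dana has no manager_id, so an inner self-join leaves her out of the employee column; a LEFT JOIN would keep her with a NULL manager.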
Which statement best defines a window function in SQL?
Explanation: Window functions calculate values over related rows while maintaining row-level detail. They differ from aggregate functions, which collapse rows, and do not modify table structures or filter groups after aggregation.
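The contrast with plain aggregation can be sketched as follows. This assumes the SQLite library bundled with Python is version 3.25 or newer, which is when window functions were added; the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ana", 100), ("Ben", 200), ("Cy", 300)],
)

# Window function: every row is kept and carries the total alongside its own salary.
windowed = conn.execute(
    """
    SELECT name, salary, SUM(salary) OVER () AS total
    FROM employees
    ORDER BY name
    """
).fetchall()

# Plain aggregate: the three rows collapse into a single result row.
aggregated = conn.execute("SELECT SUM(salary) FROM employees").fetchall()

print(windowed)    # [('Ana', 100, 600), ('Ben', 200, 600), ('Cy', 300, 600)]
print(aggregated)  # [(600,)]
```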
In the context of window functions, what is a key difference between ROW_NUMBER() and RANK()?
Explanation: RANK() assigns the same rank to tied values and skips the next ranks accordingly, while ROW_NUMBER() always increments and provides unique row numbers. Neither function collapses rows, and both are used independently from GROUP BY.
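A short sketch that puts both functions side by side on data containing a tie. It assumes SQLite 3.25 or newer for window function support, and the salaries are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ana", 300), ("Ben", 200), ("Cy", 200), ("Di", 100)],
)

# Both functions use the same ordering; only their handling of the tie differs.
rows = conn.execute(
    """
    SELECT salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
           RANK()       OVER (ORDER BY salary DESC) AS rnk
    FROM employees
    ORDER BY row_num
    """
).fetchall()
print(rows)  # [(300, 1, 1), (200, 2, 2), (200, 3, 2), (100, 4, 4)]
```

Which of the two tied rows receives row number 2 and which receives 3 is arbitrary unless the ORDER BY includes a tie-breaking column.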
What does 'ORDER BY total_sales DESC LIMIT 3' achieve when querying a 'sales_data' table?
Explanation: ORDER BY total_sales DESC sorts sales from highest to lowest, and LIMIT 3 returns only the first three rows, showing the top three sales. This does not update records or filter based on averages, nor does it sort by name.
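As an illustration, the clause can be run against a small `sales_data` table. The `region` column and the figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (region TEXT, total_sales INTEGER)")
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?)",
    [("North", 500), ("South", 900), ("East", 300), ("West", 700), ("Central", 800)],
)

# Sort from highest to lowest, then keep only the first three rows.
top_three = conn.execute(
    """
    SELECT region, total_sales
    FROM sales_data
    ORDER BY total_sales DESC
    LIMIT 3
    """
).fetchall()
print(top_three)  # [('South', 900), ('Central', 800), ('West', 700)]
```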
What is the main difference between the UNION and UNION ALL operators in SQL?
Explanation: UNION eliminates duplicates from the result set, while UNION ALL includes all rows. Neither operator exists only to sort results, and neither requires identical column names; the queries only need the same number of columns with compatible types. Both can combine more than two result sets.
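A small sketch of the two operators on overlapping inputs. The tables `a` and `b` are hypothetical, and the ORDER BY is there only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (n INTEGER)")
conn.execute("CREATE TABLE b (n INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (3,)])

# UNION removes the duplicate 2 that appears in both tables.
union_rows = conn.execute(
    "SELECT n FROM a UNION SELECT n FROM b ORDER BY n"
).fetchall()

# UNION ALL keeps every row from both inputs.
union_all_rows = conn.execute(
    "SELECT n FROM a UNION ALL SELECT n FROM b ORDER BY n"
).fetchall()

print(union_rows)      # [(1,), (2,), (3,)]
print(union_all_rows)  # [(1,), (2,), (2,), (3,)]
```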
Which SQL feature allows you to assign custom categories to rows based on column values, similar to IF/ELSE logic?
Explanation: The CASE expression enables row-level conditional logic, assigning categories or new values based on data conditions. GROUP BY groups data, HAVING filters aggregates, and ORDER BY sorts results, but none of them provides conditional branching.
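For illustration, the sketch below assigns a salary band to each row with CASE. The band names and thresholds are arbitrary assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ana", 90000), ("Ben", 60000), ("Cy", 30000)],
)

# Conditions are checked top to bottom; the first match wins, ELSE is the fallback.
bands = conn.execute(
    """
    SELECT name,
           CASE
               WHEN salary >= 80000 THEN 'high'
               WHEN salary >= 50000 THEN 'mid'
               ELSE 'low'
           END AS band
    FROM employees
    ORDER BY name
    """
).fetchall()
print(bands)  # [('Ana', 'high'), ('Ben', 'mid'), ('Cy', 'low')]
```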
How can the ROW_NUMBER() window function aid in managing duplicate records in a table?
Explanation: Assigning ROW_NUMBER() numbers the rows within each set of duplicates, so that you can identify which ones to keep or delete. It does not automatically delete, collapse, or randomize data.
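One possible cleanup built on this idea is sketched below. It assumes SQLite 3.25 or newer, a hypothetical `customers` table, and a policy of keeping the row with the lowest `id` in each duplicate group; a real cleanup would choose its own rule for which copy to keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com"), (4, "a@example.com")],
)

# Number the rows inside each email group; the first row is kept, the rest are deleted.
conn.execute(
    """
    DELETE FROM customers
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
            FROM customers
        )
        WHERE rn > 1
    )
    """
)

remaining = conn.execute("SELECT id, email FROM customers ORDER BY id").fetchall()
print(remaining)  # [(1, 'a@example.com'), (2, 'b@example.com')]
```

ROW_NUMBER() only labels the rows; the deletion happens because of the explicit DELETE statement wrapped around it.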
What does a query joining the 'employees' table to itself and filtering for e.salary > m.salary tell you?
Explanation: This self-join compares each employee with their manager based on salaries, finding cases where the employee earns more. It does not return overall highest salary, averages, or identify duplicates.
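A sketch of this comparison, assuming the join condition e.manager_id = m.emp_id and invented salaries:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER, name TEXT, salary INTEGER, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Dana", 9000, None), (2, "Eli", 7000, 1), (3, "Fay", 9500, 1)],
)

# Pair each employee (e) with their manager (m), then keep pairs where e earns more.
out_earners = conn.execute(
    """
    SELECT e.name, e.salary, m.name, m.salary
    FROM employees e
    JOIN employees m ON e.manager_id = m.emp_id
    WHERE e.salary > m.salary
    """
).fetchall()
print(out_earners)  # [('Fay', 9500, 'Dana', 9000)]
```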
What is the effect of using PARTITION BY in a window function like ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC)?
Explanation: PARTITION BY divides the rows into one group per department, and ROW_NUMBER() restarts counting at 1 within each group. It does not collapse rows, rank only the top row, or eliminate NULLs.
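The restart of the numbering can be seen in a small sketch. It assumes SQLite 3.25 or newer; the department names and salaries are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "Eng", 120), ("Ben", "Eng", 100), ("Cy", "Ops", 90), ("Di", "Ops", 95)],
)

# Numbering starts again at 1 in each department, highest salary first.
numbered = conn.execute(
    """
    SELECT department, name,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
    ORDER BY department, rn
    """
).fetchall()
print(numbered)
# [('Eng', 'Ana', 1), ('Eng', 'Ben', 2), ('Ops', 'Di', 1), ('Ops', 'Cy', 2)]
```

Filtering this result on rn = 1 in an outer query would return the top earner in each department.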
Why is UNION ALL typically faster than UNION in SQL queries?
Explanation: UNION removes duplicate records, requiring extra computation, whereas UNION ALL simply combines results without extra filtering. Indexing and JOIN mechanics are unrelated, and sorting is not the main performance difference here.
Which clause would you use to filter groups after aggregation rather than individual rows before aggregation?
Explanation: HAVING is specifically used to filter groups post-aggregation. WHERE filters before aggregation, ORDER BY sorts the result, and CREATE is for making database objects.
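The sketch below uses both clauses in one query to show the order in which they apply. The `orders` table, its columns, and the thresholds are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("x", 50), ("x", 200), ("y", 30), ("y", 40), ("z", 500)],
)

# WHERE drops individual rows below 40 before grouping;
# HAVING then drops whole groups whose total is not above 100.
totals = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount >= 40
    GROUP BY customer
    HAVING SUM(amount) > 100
    ORDER BY customer
    """
).fetchall()
print(totals)  # [('x', 250), ('z', 500)]
```

Customer y survives the WHERE step with one row of 40 but is removed by HAVING because the group total is only 40.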
If two rows have the same highest value and are ranked using RANK() OVER (ORDER BY value DESC), what will the assigned ranks look like for the top three values: 100, 100, 90?
Explanation: RANK() assigns the same rank to tied values and skips the next rank. For values 100, 100, and 90, the first two get rank 1, and the next gets rank 3. Other options do not reflect how SQL RANK() works.
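The ranks for 100, 100, and 90 can be checked with a short sketch. It assumes SQLite 3.25 or newer, and the `scores` table name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (value INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?)", [(100,), (100,), (90,)])

# The two tied rows share rank 1, so rank 2 is skipped and 90 receives rank 3.
ranked = conn.execute(
    """
    SELECT value, RANK() OVER (ORDER BY value DESC) AS rnk
    FROM scores
    ORDER BY value DESC
    """
).fetchall()
print(ranked)  # [(100, 1), (100, 1), (90, 3)]
```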