SQL Data Analyst Interview Concepts Quiz Quiz

Test your knowledge with this SQL quiz focused on key data analyst interview topics, including duplicate detection, joins, subqueries, window functions, aggregation, and conditional logic. This quiz is designed to help candidates prepare for data analyst interviews by covering essential SQL interview questions and scenarios commonly asked for entry-level positions.

  1. Duplicate Row Detection

    What is the primary purpose of using GROUP BY on columns that should be unique, followed by HAVING COUNT(*) u003E 1 in an SQL query?

    1. To identify the primary key columns
    2. To perform an inner join between tables
    3. To delete all duplicate rows directly
    4. To identify duplicate rows in the data

    Explanation: Using GROUP BY on unique columns groups together identical rows, and HAVING COUNT(*) u003E 1 filters those groups where duplicates exist. The query does not delete duplicates; it only helps in finding them. Identifying the primary key is a schema-level task, and inner joins are for combining tables, not for detecting duplicates.

  2. INNER JOIN vs LEFT OUTER JOIN

    If you join two tables using an INNER JOIN, which records will be included in the result set?

    1. All records from the first table and matching records from the second table
    2. Only records with matching values in both tables
    3. All records from the second table and matching records from the first table
    4. All records from both tables regardless of matches

    Explanation: INNER JOIN returns only the rows where there is a match in both tables based on the join condition. LEFT OUTER JOIN includes all records from the first (left) table, while FULL OUTER JOIN and RIGHT OUTER JOIN behave differently, including unmatched rows as well.

  3. Fetching the Second-Highest Value

    Which SQL query will correctly return the second-highest salary from the 'employees' table?

    1. SELECT MAX(salary) FROM employees WHERE salary u003C (SELECT MAX(salary) FROM employees);
    2. SELECT DISTINCT salary FROM employees ORDER BY salary ASC LIMIT 2;
    3. SELECT MIN(salary) FROM employees;
    4. SELECT salary FROM employees WHERE ROWNUM = 2;

    Explanation: The correct query uses a subquery to find the maximum salary and then selects the highest value below it, which yields the second-highest. The MIN function gives the lowest salary, and 'ROWNUM = 2' is not standard SQL and may not give the correct result. The last option retrieves the two lowest, not the second-highest salary.

  4. GROUP BY and HAVING Usage

    Why is the HAVING clause used with GROUP BY in SQL?

    1. To order groups by their count
    2. To assign ranks to groups
    3. To filter results after aggregation is performed
    4. To rename group columns

    Explanation: HAVING filters aggregated results, such as those produced by GROUP BY. It is not used for renaming, ordering, or ranking. ORDER BY is for sorting, and aliases/CTEs are for naming. RANK or window functions should be used for ranking.

  5. Self-Join for Organization Hierarchies

    Given an 'employees' table with columns: emp_id, name, salary, manager_id, what does a self-join using e.manager_id = m.emp_id help you retrieve?

    1. Relationships between employees and their managers
    2. Duplicate employee records in the table
    3. The highest salary in the company
    4. The list of employees without salary information

    Explanation: A self-join links employees to their respective managers using the manager_id and emp_id relationship. It does not identify duplicates or find highest salaries, and salary information can still be included unless filtered out.

  6. SQL Window Function Basics

    Which statement best defines a window function in SQL?

    1. A function that modifies table structure
    2. A function that only filters groups after aggregation
    3. A function that only operates on raw, ungrouped table rows
    4. A function that performs calculations across a set of rows related to the current row without collapsing rows

    Explanation: Window functions calculate values over related rows while maintaining row-level detail. They differ from aggregate functions, which collapse rows, and do not modify table structures or filter groups after aggregation.

  7. ROW_NUMBER vs RANK in Window Functions

    In the context of window functions, what is a key difference between ROW_NUMBER() and RANK()?

    1. RANK() can only be used with GROUP BY, while ROW_NUMBER() cannot
    2. ROW_NUMBER() collapses duplicate rows, while RANK() does not
    3. RANK() assigns the same rank to ties, skipping subsequent numbers; ROW_NUMBER() gives each row a unique, sequential number
    4. ROW_NUMBER() assigns NULLs when there is a tie

    Explanation: RANK() assigns the same rank to tied values and skips the next ranks accordingly, while ROW_NUMBER() always increments and provides unique row numbers. Neither function collapses rows, and both are used independently from GROUP BY.

  8. Top-N Query Techniques

    What does 'ORDER BY total_sales DESC LIMIT 3' achieve when querying a 'sales_data' table?

    1. It updates the top 3 sales records
    2. It only shows products with sales below the average
    3. It lists all products ordered by name
    4. It retrieves the top 3 products with the highest total sales

    Explanation: ORDER BY total_sales DESC sorts sales from highest to lowest, and LIMIT 3 returns only the first three rows, showing the top three sales. This does not update records or filter based on averages, nor does it sort by name.

  9. UNION vs UNION ALL

    What is the main difference between the UNION and UNION ALL operators in SQL?

    1. UNION requires identical column names, but UNION ALL does not
    2. UNION ALL can only combine two tables
    3. UNION removes duplicate rows, whereas UNION ALL keeps all rows including duplicates
    4. UNION ALL sorts results while UNION does not

    Explanation: UNION eliminates duplicates from the result set, while UNION ALL includes all rows. Neither operator handles only sorting or requires identical column names, and both can combine more than two result sets.

  10. Conditional Logic with CASE

    Which SQL feature allows you to assign custom categories to rows based on column values, similar to IF/ELSE logic?

    1. HAVING clause
    2. CASE statement
    3. GROUP BY clause
    4. ORDER BY clause

    Explanation: The CASE statement enables row-level conditional logic, assigning categories or new values based on data conditions. GROUP BY groups data, HAVING filters aggregates, and ORDER BY sorts results but does not provide conditional branching.

  11. Removing Duplicates with ROW_NUMBER

    How can the ROW_NUMBER() window function aid in managing duplicate records in a table?

    1. It assigns a unique sequence to each row in a partition, making it easy to identify and remove extra duplicates
    2. It updates duplicate values to NULL automatically
    3. It collapses duplicates into one row
    4. It generates random numbers to mask duplicates

    Explanation: Assigning ROW_NUMBER allows you to number duplicate rows, so that you can identify which ones to keep or delete. It does not perform automatic deletion, collapse or randomization of data.

  12. Finding Employees Who Earn More Than Their Managers

    What does a query joining the 'employees' table to itself and filtering for e.salary u003E m.salary tell you?

    1. Duplicate salary records
    2. Which employees have salaries higher than their managers
    3. Employees with the highest salary overall
    4. The average salary per manager

    Explanation: This self-join compares each employee with their manager based on salaries, finding cases where the employee earns more. It does not return overall highest salary, averages, or identify duplicates.

  13. Partitioning in Window Functions

    What is the effect of using PARTITION BY in a window function like ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC)?

    1. It collapses all departments into one group
    2. It assigns ranks only to the top row
    3. It removes NULL salaries
    4. It restarts the row numbering for each department

    Explanation: PARTITION BY divides the data into departments, and ROW_NUMBER() restarts counting for each group. The function does not collapse data, only rank the top row, or eliminate NULLs.

  14. Performance of UNION vs UNION ALL

    Why is UNION ALL typically faster than UNION in SQL queries?

    1. UNION ALL indexes the result set automatically
    2. UNION ALL does not perform duplicate elimination, while UNION does
    3. UNION requires more JOIN operations than UNION ALL
    4. UNION ALL sorts the output but UNION does not

    Explanation: UNION removes duplicate records, requiring extra computation, whereas UNION ALL simply combines results without extra filtering. Indexing and JOIN mechanics are unrelated, and sorting is not the main performance difference here.

  15. Aggregating and Filtering with HAVING

    Which clause would you use to filter groups after aggregation rather than individual rows before aggregation?

    1. WHERE
    2. HAVING
    3. CREATE
    4. ORDER BY

    Explanation: HAVING is specifically used to filter groups post-aggregation. WHERE filters before aggregation, ORDER BY sorts the result, and CREATE is for making database objects.

  16. Ranking with Ties

    If two rows have the same highest value and are ranked using RANK() OVER (ORDER BY value DESC), what will the assigned ranks look like for the top three values: 100, 100, 90?

    1. Ranks 1, 1, and 2
    2. Ranks 1, 1, and 3
    3. Ranks 1, 2, and 3
    4. Ranks 2, 2, and 3

    Explanation: RANK() assigns the same rank to tied values and skips the next rank. For values 100, 100, and 90, the first two get rank 1, and the next gets rank 3. Other options do not reflect how SQL RANK() works.