Mastering SQL Ranking, Partitioning, and Deduplication Quiz

Explore advanced SQL concepts used to analyze and organize data efficiently, including window functions like ROW_NUMBER, RANK, and DENSE_RANK. Ideal for those looking to excel at real-world analytics and reporting scenarios.

  1. Purpose of the ROW_NUMBER() Function

    What is the main purpose of using ROW_NUMBER() in a SQL query that analyzes top-selling products by region?

    1. To eliminate duplicate rows automatically
    2. To calculate the average sales per region
    3. To group data by product category
    4. To assign a unique sequential number to each row within a partition

    Explanation: ROW_NUMBER() assigns a unique sequential number to each row based on specified sorting within a partition, which helps identify the top product per region without ties. It does not group data, remove duplicates directly, or calculate averages—those require different functions.

  2. Handling Ties with RANK()

    If two products have the same total sales within a region, how does the RANK() function handle their ranking?

    1. Ignores tied values and skips ranking them
    2. Assigns continuous ranks with no gaps
    3. Assigns the same rank to both products and leaves a gap in subsequent ranks
    4. Assigns different ranks to each product

    Explanation: RANK() assigns the same rank to tied values and then leaves a gap in the ranking sequence. It does not assign different or continuous ranks, and it does not ignore tied values.

  3. Effect of DENSE_RANK() on Ranking

    When using DENSE_RANK(), what happens if there are ties in the sorted values within a partition?

    1. All rows are assigned the same rank
    2. Duplicates are removed from the results
    3. Tied rows receive different ranks
    4. Tied rows receive the same rank, and subsequent ranks are continuous with no gaps

    Explanation: DENSE_RANK() gives identical ranks to tied rows and continues with the next consecutive integer for following rows. It does not assign different ranks or remove duplicates, and not all rows share a single rank.

  4. Choosing ROW_NUMBER() for Deduplication

    Why would you choose ROW_NUMBER() when seeking to retain only the top-selling product per region in your result set?

    1. It ensures ties are explicitly represented
    2. It guarantees a single unique row per partition based on your sorting
    3. It calculates summations of sales
    4. It groups sales by product category automatically

    Explanation: ROW_NUMBER() allows you to rank rows and easily filter for the first row per partition, ensuring only one item (e.g., top seller) per group. It does not group by category, represent ties, or sum values directly.

  5. Partitioning Data in Window Functions

    In the query provided, what is the effect of PARTITION BY region in the window functions?

    1. It serves as a filter to exclude certain regions
    2. It prevents any ordering within the data
    3. It combines sales data across all regions
    4. It resets the ranking calculation separately for each region

    Explanation: PARTITION BY divides the dataset into separate regions so that ranking functions start over for each region. It does not aggregate regions together, ignore ordering, or act as a filter.

  6. Calculating Top Sellers Using Multiple Ranks

    Why might you use all three functions—ROW_NUMBER(), RANK(), and DENSE_RANK()—together in a sales report by region?

    1. To analyze different ways of handling ties and identifying the top seller
    2. To remove duplicate sales records directly
    3. To calculate total sales for each product
    4. To group products by category automatically

    Explanation: Using all three functions allows you to see each method's ranking logic, especially in scenarios with ties. They do not directly remove duplicates, sum sales, or group by category.

  7. Ranking with RANK() Example

    Given a region with sales totals: A=200, B=200, C=150, what ranks will RANK() assign to these products when ordered by sales descending?

    1. A=1, B=1, C=3
    2. A=1, B=2, C=3
    3. A=1, B=1, C=2
    4. A=1, B=2, C=2

    Explanation: RANK() assigns 1 to both A and B since their sales are equal, then skips rank 2 and assigns 3 to C. It does not assign unique, non-skipping ranks or misnumbered sequences.

  8. DENSE_RANK() with No Gaps Example

    If products X and Y both have highest sales in a region, followed by Z, how will DENSE_RANK() rank them?

    1. X=1, Y=1, Z=3
    2. X=1, Y=2, Z=2
    3. X=1, Y=2, Z=3
    4. X=1, Y=1, Z=2

    Explanation: DENSE_RANK() gives the same rank to X and Y for tied highest sales, then assigns the next rank, 2, to Z. It does not skip numbers or assign duplicate ranks elsewhere.

  9. Impact of ORDER BY in Window Functions

    In the context of window functions, how does the ORDER BY clause within an OVER() affect ranking?

    1. It determines how rows are sorted before ranks or numbers are assigned
    2. It groups rows by region only
    3. It filters out rows with lower sales
    4. It removes duplicate products from the output

    Explanation: ORDER BY in the window function sets the order in which rows are ranked within each partition. It does not filter, deduplicate, or group rows by itself.

  10. Main Difference Between ROW_NUMBER(), RANK(), and DENSE_RANK()

    What is the primary difference between ROW_NUMBER(), RANK(), and DENSE_RANK() when applied to partitions with ties?

    1. All three assign identical ranks when ties exist
    2. ROW_NUMBER() assigns a unique value to each row, while RANK() and DENSE_RANK() both assign the same rank to ties but differ in handling gaps
    3. ROW_NUMBER() and DENSE_RANK() assign the same values when there are ties
    4. RANK() and DENSE_RANK() both skip ties completely

    Explanation: ROW_NUMBER() does not allow ties and gives unique numbers per row, while RANK() leaves gaps after ties and DENSE_RANK() does not. RANK() and DENSE_RANK() do not skip tied rows or assign identical ranks in all cases.