Explore your understanding of advanced querying with window functions in TimescaleDB. This quiz reviews key concepts such as ranking, aggregation, partitioning, and analytic SQL for time-series data, helping you master efficient data analysis with window functions.
Which window function assigns a unique sequential integer to rows within each partition, starting at 1 for the first row in each partition?
Explanation: ROW_NUMBER assigns a unique, sequential integer to each row in the order specified within the partition. RANK and DENSE_RANK are similar but assign the same number to rows with identical values and can skip or repeat ranks. NTILE divides rows into a specified number of groups and does not guarantee sequential unique integers. Therefore, ROW_NUMBER is the correct function for this behavior.
When using window functions, what is the role of the PARTITION BY clause in a query analyzing daily sales data per store?
Explanation: The PARTITION BY clause splits data into partitions, such as by store, so calculations like rankings or sums reset for each store. It doesn't filter rows; that's what WHERE does. It doesn't increase the number of rows but instead groups them logically. It also does not perform ordering by itself; the ORDER BY clause within the window function handles sorting.
In a query using SUM(quantity) OVER (ORDER BY date), what effect does the ORDER BY inside the OVER clause have on the result?
Explanation: ORDER BY within the OVER clause determines the sequence for cumulative calculations, such as a running sum by date. It does not randomize, which would require a different function. Filtering out duplicates or limiting to one row per date is not the purpose of an ORDER BY within a window function.
What value does the LAG function return when used to access the previous temperature value in a time-ordered dataset?
Explanation: LAG is used to return the value from a previous row based on the defined order, such as time. It does not provide the next row's value (LEAD does that), nor does it give the average or earliest value, which are achieved with other functions or explicit queries. Thus, LAG retrieves the temperature from the previous row.
Why would you use the NTILE window function on a table of sensor readings ordered by value?
Explanation: NTILE divides ordered rows into a defined number of nearly equal-size groups, useful for percentiles or quartiles. It does not specifically calculate medians, nor does it filter out duplicates or return only first group values. Its main function is to assign group numbers for distribution analysis.
Which of the following shows the correct syntax for using AVG(temperature) as a window function for each sensor?
Explanation: The correct syntax requires placing the window function inside the OVER clause, specifying PARTITION BY for grouping. The second and fourth options don't correctly use SQL syntax for window functions, and the third option misplaces keywords. Only the first option is valid SQL.
What is the key difference between RANK and DENSE_RANK when ranking rows with duplicate values?
Explanation: RANK will leave gaps in ranking after tied values, while DENSE_RANK assigns sequential numbers with no gaps. DENSE_RANK does not use random numbers, and although they are similar, their treatment of ties is different. RANK does not guarantee a unique number for each row in the case of ties.
In window functions, what do frame clauses such as ROWS BETWEEN 1 PRECEDING AND CURRENT ROW specify?
Explanation: Frame clauses define which rows are considered for each window calculation, such as including one preceding row to the current row. They are not responsible for ordering the overall query, setting column counts, or creating primary keys. The frame determines calculation scope row by row.
When analyzing stock prices over time, which window function would you use to retrieve the price from the next row?
Explanation: LEAD returns data from the next row in the specified order, perfect for comparing current stock prices to upcoming values. LAG gets values from previous rows, not next. RANK and SUM do not retrieve adjacent values; they provide rankings or aggregate totals instead.
Which query is best for getting a cumulative sum of rainfall amounts ordered by day, when analyzing weather data?
Explanation: Using SUM with OVER (ORDER BY ...) gives a cumulative or running total per day. GROUP BY alone aggregates data but does not provide a running sum. The third option has incorrect syntax, and the fourth does not show how to calculate the running total. Only the first option implements the intended cumulative calculation.