Test your foundational knowledge of SQL data cleaning and preprocessing concepts, including handling duplicates, joins, window functions, grouping, and aggregate queries. Perfect for interview preparation and understanding key SQL techniques for structured data management.
Which SQL JOIN type returns all rows from the left table and matching rows from the right table, inserting NULLs where there is no match?
Explanation: LEFT JOIN returns all records from the left table and includes matched records from the right table, inserting NULLs for non-matching rows. INNER JOIN only includes rows present in both tables. FULL JOIN brings all rows from both tables, filling NULLs for missing matches in either table. RIGHT JOIN does the opposite of LEFT JOIN, returning all rows from the right table.
What SQL clause is typically used with GROUP BY to filter only groups that appear more than once, useful for identifying duplicates?
Explanation: HAVING is used to filter groups formed by GROUP BY, so it's ideal for finding duplicates when combined with COUNT. WHERE filters individual rows before grouping, not groups themselves. ORDER BY sorts results but does not filter groups. LIMIT restricts the number of results returned and isn't used for grouping.
Which clause should you use to arrange the result set in ascending or descending order based on a column's values?
Explanation: ORDER BY sorts query results by the specified columns in ascending (default) or descending order. GROUP BY is used for aggregating data and does not guarantee sort order. 'HAVIN' is a misspelling of HAVING, which filters groups, and SELECTION is not a valid SQL clause.
Which of the following best describes a window function in SQL?
Explanation: Window functions perform row-wise calculations that can access data from other rows in their window, enabling tasks like running totals or rankings. Traditional aggregation collapses rows into summary results. Deleting duplicates and sorting are not the core purpose of window functions.
To correctly compute the median in SQL for a numeric column, which two key window functions might you need?
Explanation: ROW_NUMBER() assigns a unique sequential number to rows, and COUNT() gives the total count, both essential for determining the middle value(s) for median calculation. SUM() and AVG() are for totals and averages, RANK() is for ranking, and MIN()/MAX() find lowest and highest values but do not directly help to locate the median.
Which SQL operator allows you to filter rows where a column value is missing or undefined?
Explanation: IS NULL is used to test whether a column contains a missing or undefined value. LIKE is for pattern matching in strings. GROUPED is not a valid SQL operator, and UNIQUE is commonly used in constraints, not filtering.
If you want to count the number of employees in each department, which SQL clause should you use to group records by department?
Explanation: GROUP BY collects rows sharing the same value in a specified column, enabling you to aggregate, such as counting employees per department. 'ORDERED BY' and 'GROUPED' are not valid SQL clauses, and 'HAVIG' is a misspelling of HAVING.
What combination of SQL keywords helps you retrieve only unique rows from a table?
Explanation: SELECT DISTINCT removes duplicate rows from a query's result set. UNION combines result sets from two queries, but does not remove duplicates unless UNION DISTINCT is used. SELECT REPEAT and SELECT OFF are not valid SQL commands.
Which SQL string function is commonly used to extract part of a string, such as separating first and last names?
Explanation: SUBSTRING or similar functions extract a portion of a string, helpful for splitting data like names. PRIMARY is not a string function, AGGREGATE refers to summary operations, and LINK is unrelated to string extraction.
If you want to make all text in a column lowercase for consistency, which SQL function should you use?
Explanation: LOWER() converts all letters in a string to lowercase, helping to standardize data entries. UCASE() is a non-standard synonym for UPPER() and usually capitalizes text, not lowercases it. COUNT() is used for counting rows, not changing text case.