Test your knowledge of common SQL and Python interview questions, covering key concepts such as window functions, data cleaning, aggregation, and database manipulation. This quiz is ideal for candidates preparing for data analytics or data engineering roles where SQL and Python skills are required.
This quiz contains 10 questions. Below is a complete reference of all questions, answer choices, and correct answers. You can use this section to review after taking the interactive quiz above.
Which SQL clause allows you to select the top 2 selling products in each category from a sales table, for example, using product revenue figures?
Correct answer: RANK() OVER (PARTITION BY category ORDER BY revenue DESC)
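Explanation: RANK() OVER (PARTITION BY category ORDER BY revenue DESC) assigns a rank within each category based on revenue, making it possible to filter for the top N records per group. Because a window function cannot be referenced directly in WHERE, the rank is normally computed in a subquery or CTE and filtered in the outer query (rank <= 2). Note that RANK() gives tied rows the same rank, so a tie can return more than two rows per category; ROW_NUMBER() returns exactly two. 'GROUP BY category LIMIT 2' does not return two rows per group but only two groups in total. 'WHERE revenue IN (SELECT TOP 2 revenue)' incorrectly attempts to select the top two revenues globally, not per category. 'ORDER BY revenue PARTITION category' is not valid SQL syntax.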
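The ranking pattern above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a reference implementation: the sales table, its columns, and the sample rows are assumptions made up for the example, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

# In-memory database with a hypothetical sales table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, product TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("toys", "robot", 300.0),
        ("toys", "puzzle", 120.0),
        ("toys", "kite", 80.0),
        ("books", "atlas", 90.0),
        ("books", "novel", 150.0),
        ("books", "comic", 40.0),
    ],
)

# Rank inside a subquery, then filter on the rank in the outer query.
top_two_sql = """
SELECT category, product, revenue
FROM (
    SELECT category, product, revenue,
           RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
    FROM sales
) AS ranked
WHERE rnk <= 2
ORDER BY category, revenue DESC
"""
top_two = conn.execute(top_two_sql).fetchall()
for row in top_two:
    print(row)
```

The subquery is needed because the rank only exists after the window function has been evaluated, which happens later than WHERE in the same query level.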
Which pair of pandas methods would you use to count the number of NULL and non-NULL values in each column of a DataFrame named df?
Correct answer: isnull().sum() and notnull().sum()
Explanation: df.isnull().sum() and df.notnull().sum() return the number of null and non-null values in each column, which is ideal for data quality checks. 'dropna().count() and fillna().count()' do not count nulls directly. 'nunique() and value_counts()' report unique values and value frequencies, not nulls. 'isna().mean() and notna().mean()' give proportions, not counts.
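A small sketch of the difference between counts and proportions, using a made-up DataFrame (the column names and values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing values.
df = pd.DataFrame(
    {
        "name": ["Ann", None, "Cy"],
        "age": [31.0, np.nan, np.nan],
    }
)

null_counts = df.isnull().sum()       # nulls per column
non_null_counts = df.notnull().sum()  # non-nulls per column
null_share = df.isna().mean()         # fraction of nulls per column, not a count

print(null_counts)
print(non_null_counts)
print(null_share)
```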
If you need to compute a running total of revenue by date in SQL, which function should you use in your SELECT statement?
Correct answer: SUM(revenue) OVER (ORDER BY date)
Explanation: SUM(revenue) OVER (ORDER BY date) calculates a cumulative running total ordered by date. COUNT(revenue) PARTITION BY date is missing the OVER keyword and, even if corrected, would count rows per date rather than sum revenue. SUM(revenue) GROUP BY date only returns totals per date, not running sums. AVG(revenue) OVER (ORDER BY date) gives running averages, not totals.
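As a rough illustration, the running total can be tried out with sqlite3 (SQLite 3.25 or newer). The daily_revenue table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_revenue (date TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO daily_revenue VALUES (?, ?)",
    [("2024-01-01", 100.0), ("2024-01-02", 50.0), ("2024-01-03", 25.0)],
)

# With only ORDER BY in the window, the default frame runs from the first
# row up to the current row, which yields a cumulative sum.
running = conn.execute(
    """
    SELECT date, revenue,
           SUM(revenue) OVER (ORDER BY date) AS running_total
    FROM daily_revenue
    ORDER BY date
    """
).fetchall()
for row in running:
    print(row)
```

If several rows share the same date, the default frame treats them as peers and they all receive the same running total; the sample data avoids this by using one row per date.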
To remove duplicate rows based on the 'email' column from a CSV file in pandas, which code would you use?
Correct answer: df = df.drop_duplicates(subset='email')
Explanation: The correct method is df.drop_duplicates(subset='email'), which by default keeps only the first occurrence of each unique email. The options 'delete_duplicates', 'remove_duplicates', and 'drop_duplication' are not pandas DataFrame methods, so calling them raises an AttributeError.
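A minimal sketch of the deduplication step. The CSV content is invented and read from an in-memory string so the example is self-contained; with a real file, the path would be passed to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = (
    "name,email\n"
    "Ann,ann@example.com\n"
    "Bob,bob@example.com\n"
    "Annie,ann@example.com\n"
)
raw = pd.read_csv(io.StringIO(csv_text))

# Default keep="first": the earliest row for each email survives.
df = raw.drop_duplicates(subset="email")

# keep="last" retains the latest row for each email instead.
df_last = raw.drop_duplicates(subset="email", keep="last")

print(df)
print(df_last)
```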
Given a transaction table, how would you identify the customer with the greatest total spend using SQL?
Correct answer: SELECT customer_id, SUM(order_total) AS total_spent FROM orders GROUP BY customer_id ORDER BY total_spent DESC LIMIT 1;
Explanation: Aggregating with SUM(order_total), grouping by customer, and ordering by total_spent in descending order before limiting to one result gives the customer with the highest spend. Option 2 incorrectly tries to filter by the maximum value. Option 3 is incomplete and only valid in some SQL dialects. Option 4 misuses HAVING without a proper condition.
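The aggregate-then-sort pattern can be checked against a small in-memory table. The orders rows below are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 50.0), (2, 200.0), (1, 175.0), (3, 120.0)],
)

# Sum per customer, sort by the total, keep only the top row.
top_spender = conn.execute(
    """
    SELECT customer_id, SUM(order_total) AS total_spent
    FROM orders
    GROUP BY customer_id
    ORDER BY total_spent DESC
    LIMIT 1
    """
).fetchone()
print(top_spender)
```

LIMIT 1 returns a single row even when two customers tie for the highest total, so a tie needs separate handling if it matters.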
Which SQL query pattern would best identify dates where revenue is more than three times the average revenue of the previous 7 days?
Correct answer: HAVING revenue > 3 * AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING)
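Explanation: Comparing each date's revenue with 3 times the average of the preceding 7 rows flags significant surges. Two caveats apply to the pattern as written. First, most SQL engines do not accept a window function inside HAVING or WHERE, so in practice the moving average is computed in a subquery or CTE and the comparison is applied in the outer query. Second, ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING covers the previous 7 rows, which equals the previous 7 days only when the table has exactly one row per date. Option 2 misapplies GROUP BY and aggregates. Option 3 is for finding maximum revenue, not surges. Option 4 aggregates total revenue, not a moving average.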
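One way to run this surge check is to compute the moving average in a CTE and filter in the outer query, since engines such as SQLite and PostgreSQL reject window functions in HAVING. The sketch below assumes a hypothetical daily_revenue table with one row per date and requires SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_revenue (date TEXT, revenue REAL)")

# Seven flat days followed by one spike (illustrative data only).
rows = [(f"2024-01-{day:02d}", 100.0) for day in range(1, 8)]
rows.append(("2024-01-08", 400.0))
conn.executemany("INSERT INTO daily_revenue VALUES (?, ?)", rows)

surge_sql = """
WITH with_avg AS (
    SELECT date, revenue,
           AVG(revenue) OVER (
               ORDER BY date
               ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
           ) AS prev_avg
    FROM daily_revenue
)
SELECT date, revenue
FROM with_avg
WHERE revenue > 3 * prev_avg
ORDER BY date
"""
surges = conn.execute(surge_sql).fetchall()
print(surges)
```

The first date has no preceding rows, so its prev_avg is NULL and the comparison filters it out rather than flagging it.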
Which Python library and method combination allows you to connect to a PostgreSQL database and fetch the results of a query?
Correct answer: psycopg2 library with connect() and cursor().fetchall()
Explanation: psycopg2 is a widely used PostgreSQL adapter for Python; connect() opens the connection, and a cursor's execute() followed by fetchall() retrieves the result set. pyodbc is used for ODBC connections and does not have a fetch_rows() method. sqlite3 is not for PostgreSQL, and Cursor.read() does not exist. mysqlclient works with MySQL, not PostgreSQL.
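The connect / cursor / execute / fetchall sequence comes from Python's DB-API 2.0, which psycopg2 and the built-in sqlite3 module both implement. The sketch below uses sqlite3 as a stand-in so it runs without a PostgreSQL server; the fetch_all helper and the users table are hypothetical names introduced for illustration.

```python
import sqlite3


def fetch_all(conn, query, params=()):
    """Run a query on any DB-API 2.0 connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(query, params)
        return cur.fetchall()
    finally:
        cur.close()


# Stand-in connection. With PostgreSQL this would be a psycopg2.connect(...)
# call with your own connection settings (not shown here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

rows = fetch_all(conn, "SELECT id, name FROM users ORDER BY id")
print(rows)
```

One difference to keep in mind when switching drivers: sqlite3 uses ? as the parameter placeholder, while psycopg2 uses %s.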
To find the maximum order value made by each user in an orders table, which SQL pattern would you use?
Correct answer: SELECT user_id, MAX(order_total) AS max_order FROM orders GROUP BY user_id;
Explanation: Grouping by user_id and applying MAX gives the largest order per user. Option 2 only returns the global maximum. Option 3 uses MAX inside WHERE, which is not allowed because WHERE is evaluated before aggregation. Option 4 misuses HAVING: HAVING filters groups after aggregation, so it would drop users rather than return every user's maximum.
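A quick check of the per-user maximum on invented data (the orders rows are assumptions for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, order_total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (1, 75.0), (2, 40.0), (2, 10.0)],
)

# One output row per user, carrying that user's largest order.
max_orders = conn.execute(
    """
    SELECT user_id, MAX(order_total) AS max_order
    FROM orders
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()
print(max_orders)
```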
After running a SQL SELECT query on a table, how do you convert the results into a pandas DataFrame for analysis?
Correct answer: df = pd.read_sql(query, conn)
Explanation: pd.read_sql(query, conn) reads the results of a SQL query directly into a pandas DataFrame. read_csv() expects a file or CSV text, not a database query. DataFrame(sql_results, as_frame=True) is not a valid constructor for this purpose. pd.read_query does not exist in pandas; the related functions are read_sql and read_sql_query.
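A minimal sketch of loading query results into a DataFrame, using an in-memory SQLite database so it runs anywhere pandas is installed. The orders table and its rows are hypothetical.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, order_total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (1, 75.0), (2, 40.0)],
)

query = "SELECT user_id, order_total FROM orders ORDER BY order_total DESC"

# Column names come from the query; rows become DataFrame rows.
df = pd.read_sql(query, conn)
print(df)
```

pandas accepts a sqlite3 connection directly. For other databases it generally expects a SQLAlchemy connectable, so a raw driver connection may trigger a warning.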
Which SQL query method lets you join 'orders' and 'users' tables on user ID and return only completed orders with user names?
Correct answer: SELECT o.order_id, u.name FROM orders o JOIN users u ON o.user_id = u.id WHERE o.status = 'Completed';
Explanation: The correct approach involves an explicit JOIN with ON, selecting fields from each table and filtering by status. Option 2 uses an implicit join, which is less clear and prone to errors. Options 3 and 4 are incomplete or invalid SQL statements with syntax issues.
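The explicit join can be exercised on two small tables. The rows below are invented, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER, status TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (10, 1, "Completed"),
        (11, 2, "Pending"),
        (12, 2, "Completed"),
        (13, 1, "Cancelled"),
    ],
)

# The join condition lives in ON; the row filter lives in WHERE.
completed = conn.execute(
    """
    SELECT o.order_id, u.name
    FROM orders o
    JOIN users u ON o.user_id = u.id
    WHERE o.status = 'Completed'
    ORDER BY o.order_id
    """
).fetchall()
print(completed)
```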