Test your knowledge of common data analyst interview questions with this beginner-friendly quiz. Covering Power BI, SQL, Excel, and Python basics, this quiz is perfect for anyone preparing for a data analyst role or seeking to assess their foundational data analysis skills.
In Power BI, which type of table typically contains numeric values for analysis, such as sales amounts or quantities?
Explanation: A fact table stores measurable, quantitative data for analysis, like sales or profit figures, and is central to most data models. Dimension tables hold descriptive attributes, such as names or categories, and do not usually store numbers for calculations. Image tables are not standard terminology in Power BI modeling. A transaction log tracks changes but is not used directly for analysis in Power BI.
What is DAX in Power BI, and what is its primary purpose?
Explanation: DAX (Data Analysis Expressions) is a formula language designed for creating calculations and aggregations in Power BI and related tools such as Power Pivot for Excel and Analysis Services tabular models. It is not a visualization tool, nor does it store data like a database. DAX is a language built into these tools, not an add-in in its own right.
If you discover missing or blank values while preparing a dataset, which is a commonly recommended first step?
Explanation: The first step should always be to identify and understand why data is missing before deciding on a course of action. Deleting rows right away may remove important information. Ignoring the issue can lead to inaccurate results, and simply changing column formats does not address missing data.
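As a minimal sketch of that first step, assuming the dataset is already loaded into a Pandas DataFrame (the column names and values below are made up for illustration), counting the gaps per column and inspecting the affected rows shows where data is missing before anything is dropped or filled:

```python
import pandas as pd

# Hypothetical dataset with a few gaps (None is treated as missing).
df = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "region": ["North", None, "South", "North"],
    "amount": [120.0, 85.5, None, None],
})

# Count missing values in each column.
missing_per_column = df.isna().sum()

# Pull out the rows that contain at least one missing value for inspection.
rows_with_gaps = df[df.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_gaps)
```

Only after looking at which columns and rows are affected does it make sense to choose between dropping, filling, or going back to the data source.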
Which of the following operators can be used in the SQL WHERE clause to filter records?
Explanation: The AND operator allows combining multiple conditions in the WHERE clause for more precise filtering. PIVOT is used for restructuring data, not filtering rows. SUM is an aggregate function, not an operator for filtering. PASTE is not an SQL function or operator.
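A small sketch of AND in a WHERE clause, run here through Python's built-in sqlite3 module against an in-memory database. The sales table and its rows are hypothetical and exist only for this example:

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "North", 120.0), (2, "South", 300.0), (3, "North", 450.0)],
)

# Both conditions must be true for a row to be returned.
rows = conn.execute(
    "SELECT id, amount FROM sales WHERE region = 'North' AND amount > 200"
).fetchall()

print(rows)
conn.close()
```

Only the row that is both in the North region and above 200 passes the filter.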
What is the main purpose of using JOINs in SQL queries?
Explanation: SQL JOINs are designed to combine records from multiple tables using common columns, supporting complex queries. Deleting tables is done with DROP, not JOIN. Sorting data is managed with ORDER BY, and encryption is a separate security feature.
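The idea can be sketched with an INNER JOIN on a shared column, again using Python's sqlite3 module. The customers and orders tables, their columns, and their contents are assumptions made for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 75.0), (12, 2, 20.0);
""")

# Match each order to its customer through the common customer_id column.
joined = conn.execute("""
    SELECT c.name, o.order_id, o.amount
    FROM orders AS o
    INNER JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

print(joined)
conn.close()
```

Each result row combines columns from both tables, which neither table could provide on its own.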
In SQL, how does the DENSE_RANK() function differ from the RANK() function?
Explanation: DENSE_RANK() assigns consecutive ranks without gaps for rows with tied values, whereas RANK() leaves gaps after ties. For the values 90, 90, and 80, RANK() returns 1, 1, 3 and DENSE_RANK() returns 1, 1, 2. Neither function sums or deletes rows, and neither changes data types, such as converting text to numbers.
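To see both functions side by side, here is a sketch using Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or later, which is when window functions were added, and the scores table is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("A", 90), ("B", 90), ("C", 80), ("D", 70)],
)

# Players A and B tie on 90 points, so the two functions diverge from row C on.
ranked = conn.execute("""
    SELECT player,
           points,
           RANK()       OVER (ORDER BY points DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY points DESC) AS dense_rnk
    FROM scores
    ORDER BY points DESC, player
""").fetchall()

for row in ranked:
    print(row)
conn.close()
```

After the tie for first place, RANK() skips to 3 for player C while DENSE_RANK() continues with 2.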
What is the main use of Pivot Tables in Excel?
Explanation: Pivot Tables are used to efficiently summarize, analyze, and explore large datasets in Excel. They do not serve to permanently filter out data or create scripts. Adding background images is unrelated to pivot table functionality.
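For readers who also work in Python, a rough analogue of an Excel Pivot Table is the Pandas pivot_table function. The sketch below is not Excel itself. It uses made-up sales data to show the same summarize-by-row-and-column idea:

```python
import pandas as pd

# Hypothetical transaction-level data.
sales = pd.DataFrame({
    "region": ["North", "North", "North", "South", "South"],
    "product": ["Pen", "Pen", "Ink", "Pen", "Ink"],
    "amount": [10, 5, 20, 30, 40],
})

# Regions become rows, products become columns, and amounts are summed per cell.
summary = sales.pivot_table(
    index="region", columns="product", values="amount", aggfunc="sum"
)

print(summary)
```

The two North/Pen transactions collapse into a single summed cell, while the underlying rows in sales are left unchanged.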
Which formula in Excel would you use to search for a value in the first column of a table and return a value in the same row from another column?
Explanation: VLOOKUP looks up a value in the first column and retrieves information from other columns in the same row. AVERAGEIF calculates averages based on a condition, COUNTBLANK counts blank cells, and CONCATENATE joins text, but none of these fetch data from tables like VLOOKUP.
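The lookup logic can be sketched in a few lines of Python. This is only an illustration of exact-match behavior (VLOOKUP with FALSE as the fourth argument), not how Excel implements it. The vlookup helper and the product table are hypothetical:

```python
def vlookup(lookup_value, table, col_index):
    """Scan the first column for an exact match and return the value
    from the same row at col_index (1-based, as in Excel)."""
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None  # Excel would show #N/A when nothing matches


# Hypothetical table: product code, name, unit price.
products = [
    ("P001", "Pen", 1.50),
    ("P002", "Notebook", 3.25),
]

price = vlookup("P002", products, 3)
missing = vlookup("P999", products, 3)

print(price, missing)
```

As in Excel, the search happens only in the first column, and the column index counts from 1.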
Which reference type in Excel changes both the row and column when a formula is copied to another cell?
Explanation: Relative references (for example, A1) adjust both the column and the row when a formula is copied to another cell, making formulas dynamic. Absolute references (for example, $A$1) keep the reference fixed using dollar signs. Mixed references (for example, $A1 or A$1) lock either the row or the column, so only one part changes. Sequential reference is not an Excel term.
Which Pandas function in Python is commonly used to read a CSV file into a DataFrame?
Explanation: The read_csv function loads CSV files directly into DataFrames for analysis. load_excel is not a Pandas function (Excel files are read with read_excel), and there are no standard Pandas functions named importfile or file_read.
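A minimal sketch of read_csv follows. To keep it self-contained, the CSV content is supplied from an in-memory string through io.StringIO rather than a file on disk. In practice a file path would be passed instead, and the column names here are made up:

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk.
csv_text = "name,score\nAda,91\nGrace,88\n"

# read_csv accepts a path or any file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)
```

The header row becomes the column names, and the score column is parsed as integers.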
In Pandas, which indexer would you use to select rows by label instead of by integer position?
Explanation: The loc[] indexer selects data using explicit labels, making it suitable for named rows. iloc[] selects by integer position, not by label. read[] and find[] are not valid Pandas indexers.
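The difference is easiest to see on a DataFrame whose index is not the default 0, 1, 2. The index labels and scores below are invented for the example:

```python
import pandas as pd

# String labels as the row index.
df = pd.DataFrame({"score": [91, 88, 75]}, index=["ada", "grace", "alan"])

# loc: select by row label and column label.
by_label = df.loc["grace", "score"]

# iloc: select by integer position (first row, first column).
by_position = df.iloc[0, 0]

print(by_label, by_position)
```

Here loc finds the row named "grace", while iloc takes whichever row happens to be first.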
What is a DataFrame in the context of Python’s Pandas library?
Explanation: A DataFrame is a two-dimensional table with labeled rows and columns, similar to a spreadsheet. A one-dimensional labeled array is called a Series in Pandas, not a DataFrame. File-reading functions and programming interfaces are unrelated to the definition of a DataFrame.
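A short sketch of the distinction, using made-up product data: the DataFrame is the two-dimensional table, and selecting one of its columns yields a one-dimensional Series.

```python
import pandas as pd

# A DataFrame: rows and named columns, like a small spreadsheet.
df = pd.DataFrame({
    "product": ["Pen", "Notebook"],
    "price": [1.50, 3.25],
})

# Selecting a single column returns a Series.
prices = df["price"]

print(type(df).__name__, df.shape)
print(type(prices).__name__, prices.shape)
```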
Which SQL function calculates the total sum of values in a selected column?
Explanation: The SUM() function calculates the total of all numeric values in a column. ROWID is a row identifier exposed by some databases, such as Oracle and SQLite, not an aggregate function. GROUP BY organizes records into groups but does not perform calculations itself, and INDEX() is not a standard SQL aggregate function.
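The following sketch shows SUM() on its own and combined with GROUP BY, using Python's sqlite3 module and a hypothetical sales table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100.0), ("North", 50.0), ("South", 25.0)],
)

# SUM() over the whole column gives one grand total.
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# GROUP BY only forms the groups; SUM() does the calculation within each one.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(total)
print(by_region)
conn.close()
```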
What does Row-Level Security (RLS) allow you to control in Power BI reports?
Explanation: RLS restricts data access so different users see only the rows relevant to them, enhancing security and personalization. It does not affect report file size, available chart types, or report refresh timing.
In Power BI, where is a calculated column created, and how does it differ from a measure?
Explanation: A calculated column is created at the table level with values calculated for each row, whereas measures are dynamic and computed in the context of report filters. Conditional formatting does not create columns, and calculated columns do not replace or add images to reports.
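The contrast can be illustrated with a loose Pandas analogy. This is not DAX and not how Power BI evaluates anything internally. The table, the line_total column, and the total_sales function are all hypothetical:

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South"],
    "quantity": [2, 1, 4],
    "unit_price": [10.0, 20.0, 5.0],
})

# Like a calculated column: computed once per row and stored in the table.
sales["line_total"] = sales["quantity"] * sales["unit_price"]


# Like a measure: an aggregation evaluated against whatever rows
# the current filter leaves visible.
def total_sales(frame):
    return frame["line_total"].sum()


all_regions = total_sales(sales)
north_only = total_sales(sales[sales["region"] == "North"])

print(all_regions, north_only)
```

The stored line_total values never change, while the result of total_sales depends on the filter applied before it is called, much as a measure responds to slicers and report filters.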
What is a Bookmark in Power BI and how is it typically used?
Explanation: A Bookmark saves a specific view of a report, allowing users to return to that state or create interactive presentations. It does not connect data sources or provide encryption. Scheduling refresh is managed by separate options in Power BI.