Discover essential concepts and best practices for data manipulation, cleaning, and analysis using Python's powerful Pandas library. This quiz covers key skills from creating data structures to advanced time series methods.
What is the main difference between a Pandas Series and a DataFrame?
Explanation: A Series is a one-dimensional labeled array, while a DataFrame is a two-dimensional structure with rows and columns. Both can hold various data types. Series can hold strings if needed, so option 2 is incorrect. Neither structure automatically visualizes data without external code, making option 3 wrong. Speed depends on the data and operation, not the structure itself, so option 4 is not accurate.
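As an illustration of the distinction, here is a minimal sketch. The data and the variable names (ages, people, names) are made up for this example and are not part of the quiz.

```python
import pandas as pd

# A Series: one-dimensional, with one index label per value.
ages = pd.Series([25, 32, 47], index=["a", "b", "c"], name="Age")

# A DataFrame: two-dimensional, with a row index and named columns.
people = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cy"], "Age": [25, 32, 47]},
    index=["a", "b", "c"],
)

print(ages.ndim, ages.shape)      # 1 (3,)
print(people.ndim, people.shape)  # 2 (3, 2)

# Selecting a single column of a DataFrame gives back a Series.
age_column = people["Age"]
print(type(age_column).__name__)  # Series

# A Series is not limited to numbers; it can hold strings too.
names = pd.Series(["Ann", "Bob", "Cy"])
print(len(names))                 # 3
```

Each column of a DataFrame behaves like a Series that shares the DataFrame's row index.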
How would you select all rows in a DataFrame where the 'Age' column is greater than 30?
Explanation: The correct way to filter rows based on a condition in Pandas is using boolean indexing: df[df['Age'] > 30]. Options 2 and 3 use non-standard or incorrect syntax. Option 4 uses the wrong comparison and would select rows where 'Age' is less than 30.
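A short sketch of boolean indexing, assuming a small DataFrame invented for this example (the names and ages are illustrative only):

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cy", "Di"], "Age": [25, 32, 47, 30]}
)

# The comparison produces a boolean Series, one True/False per row.
mask = df["Age"] > 30

# Indexing the DataFrame with that mask keeps only the True rows.
over_30 = df[mask]  # same as df[df["Age"] > 30]

print(over_30["Name"].tolist())  # ['Bob', 'Cy']
```

Note that the row with Age equal to 30 is excluded, because the comparison is strictly greater than.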
Which method would you use to remove all rows with missing values from a Pandas DataFrame?
Explanation: df.dropna(inplace=True) removes every row that contains at least one missing value and modifies df itself rather than returning a new DataFrame. The methods remove_nulls() and delete_missing() do not exist in Pandas. df.fillna(0) replaces missing values with zero instead of removing the rows.
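The difference between dropping and filling can be sketched as follows. The DataFrame and its column names (A, B) are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, None, 3.0], "B": [4.0, 5.0, None]})

# Without inplace, dropna() returns a new DataFrame and leaves df unchanged.
cleaned = df.dropna()
print(len(cleaned), len(df))  # 1 3

# fillna(0) keeps every row and replaces the missing values with 0.
filled = df.fillna(0)
print(len(filled))            # 3

# With inplace=True, df itself loses the rows that contain missing values.
df.dropna(inplace=True)
print(len(df))                # 1
```

Only the first row survives dropna() here, since it is the only row with no missing value in any column.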
If you want to calculate the average value of each group in the 'Category' column, which Pandas operation would you use?
Explanation: The groupby() method in Pandas groups rows by the values in 'Category', and mean() computes the average for each group. merge() combines DataFrames rather than aggregating them, and summarize() is not a Pandas method. groupby('Category').sum() would compute sums, not averages.
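A minimal sketch of a grouped average, assuming a hypothetical 'Value' column alongside 'Category' (the quiz does not name the numeric column):

```python
import pandas as pd

df = pd.DataFrame(
    {"Category": ["A", "B", "A", "B"], "Value": [10, 20, 30, 40]}
)

# Group rows by Category, then average the Value column within each group.
means = df.groupby("Category")["Value"].mean()
print(means["A"], means["B"])  # 20.0 30.0

# For comparison, sum() adds the values in each group instead of averaging.
sums = df.groupby("Category")["Value"].sum()
print(sums["A"], sums["B"])    # 40 60
```

The result is a Series indexed by the distinct values of 'Category'.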
How do you convert a column containing date strings to proper datetime objects in a DataFrame?
Explanation: pd.to_datetime() converts a column of date strings into datetime objects, enabling time-based operations. Option 2 misuses the dt accessor, which requires the column to already be datetime type. Option 3 uses an incorrect function. Option 4 attempts to cast dates to a numeric type, which is not appropriate for date handling.
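The conversion can be sketched like this. The column name 'date' and the sample values are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2024-01-15", "2024-02-20", "2024-03-05"]})

# Before conversion the column holds plain strings, so the .dt accessor
# is not available on it yet.
df["date"] = pd.to_datetime(df["date"])

# After conversion, time-based attributes work through .dt.
months = df["date"].dt.month.tolist()
print(months)  # [1, 2, 3]
```

For strings in a non-standard layout, pd.to_datetime() also accepts a format argument, for example format="%d/%m/%Y".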