Pandas and NumPy: Data Wrangling Essentials Quiz

Explore key concepts and techniques for efficient data wrangling using Pandas and NumPy in this beginner-friendly quiz. Assess your understanding of arrays, DataFrames, data selection, reshaping, and core data manipulation functions vital for powerful data analysis tasks.

  1. Identifying DataFrame Creation

    Which of the following code snippets correctly creates a DataFrame from a Python dictionary with columns 'A' and 'B' using Pandas?

    1. pd.DataFrame({'A': [1,2], 'B': [3,4]})
    2. pd.DataFrame(['A': [1,2], 'B': [3,4]})
    3. np.DataFrame({'A': [1,2], 'B': [3,4]})
    4. pd.DataArray({'A': [1,2], 'B': [3,4]})

    Explanation: The first option uses pd.DataFrame to create a DataFrame from a dictionary, which is the standard method in Pandas. The second option wraps dictionary-style key: value pairs in a list bracket, which is a Python syntax error. np.DataFrame is incorrect because NumPy does not have a DataFrame constructor. pd.DataArray does not exist; DataArray belongs to other libraries such as xarray.
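
    As a minimal sketch of the dictionary-based construction, the keys become column names and each list supplies one column's values:

```python
import pandas as pd

# Keys become column names; each list holds that column's values.
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

print(df.columns.tolist())  # ['A', 'B']
print(df.shape)             # (2, 2)
```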

  2. Selecting Rows with Boolean Indexing

    In Pandas, if df is a DataFrame, which command selects all rows where the column 'score' is greater than 80?

    1. df.loc[df['score'] < 80]
    2. df.iloc[df['score'] > 80]
    3. df[df['score'] > 80]
    4. df['score' > 80]

    Explanation: df[df['score'] > 80] correctly selects rows where the 'score' column is greater than 80 using boolean indexing. The first option would select scores less than 80, which is the opposite of the requirement. The second option misuses iloc, which is designed for integer-position indexing and does not accept a boolean Series as a mask. The fourth option compares the string 'score' with the number 80 before any indexing happens, which raises a TypeError.
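
    A short sketch of boolean indexing follows. The sample names and scores are made up for illustration only:

```python
import pandas as pd

# Hypothetical sample data, not taken from the quiz.
df = pd.DataFrame({'name': ['Ann', 'Ben', 'Cy'], 'score': [92, 75, 81]})

# The comparison yields a boolean Series, one True/False per row.
mask = df['score'] > 80

# Indexing with the mask keeps only the rows where it is True.
high = df[mask]

# df.loc accepts the same boolean mask and returns the same rows.
same = df.loc[mask]

print(high['name'].tolist())  # ['Ann', 'Cy']
```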

  3. Understanding NumPy Array Shape

    If arr is a NumPy array created with arr = np.array([[5, 6, 7], [8, 9, 10]]), what is the shape of arr?

    1. (3, 2)
    2. (6,)
    3. (2, 3)
    4. (2, 6)

    Explanation: The array arr has 2 rows and 3 columns, making its shape (2, 3). The first option reverses the dimensions. The second option treats the array as flattened, which is not correct in this case. The fourth option is incorrect because the array holds only 6 elements, while a (2, 6) shape would require 12.
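
    The shape can be checked directly. This sketch also shows where the two tempting wrong answers come from (transposing and flattening):

```python
import numpy as np

arr = np.array([[5, 6, 7], [8, 9, 10]])

print(arr.shape)          # (2, 3): 2 rows, 3 columns
print(arr.size)           # 6 elements in total
print(arr.T.shape)        # (3, 2): transposing swaps the dimensions
print(arr.ravel().shape)  # (6,): flattening gives a 1-D array
```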

  4. Renaming Columns in Pandas

    Which function should you use to rename a column called 'old_name' to 'new_name' in a Pandas DataFrame?

    1. df.switch_columns('old_name', 'new_name')
    2. df.changename('old_name', 'new_name')
    3. df.relabel({'old_name': 'new_name'})
    4. df.rename(columns={'old_name': 'new_name'})

    Explanation: df.rename(columns={'old_name': 'new_name'}) is the correct method for renaming columns in Pandas. df.switch_columns, df.changename, and df.relabel are not Pandas DataFrame methods, so calling any of them raises an AttributeError.
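
    A minimal sketch of the rename call, assuming a small made-up DataFrame. Note that rename returns a new DataFrame by default and leaves the original unchanged:

```python
import pandas as pd

# Hypothetical data; the 'other' column is only there to show it is untouched.
df = pd.DataFrame({'old_name': [1, 2], 'other': [3, 4]})

# The mapping is {current_name: replacement_name}.
renamed = df.rename(columns={'old_name': 'new_name'})

print(renamed.columns.tolist())  # ['new_name', 'other']
print(df.columns.tolist())       # ['old_name', 'other']
```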

  5. Finding Missing Values with Pandas

    What function would you use to count the number of missing (NaN) values in each column of a DataFrame called df?

    1. df.count_nan()
    2. df.isnull().sum()
    3. df.countna()
    4. df.isnan()

    Explanation: The correct approach is df.isnull().sum(), which marks each missing value as True and then sums those flags column by column. df.count_nan() and df.countna() are not valid Pandas methods. df.isnan() fails because a DataFrame has no isnan method; the similarly named np.isnan is a NumPy function that operates on arrays.
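
    The two-step pattern can be sketched as follows, using made-up data with a known number of gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column 'a' has 1 missing value, column 'b' has 2.
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, np.nan, 6.0]})

# isnull() returns a DataFrame of booleans; sum() counts the True values
# in each column because True is treated as 1.
missing = df.isnull().sum()

print(missing.to_dict())  # {'a': 1, 'b': 2}
```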

  6. Concatenating NumPy Arrays

    Given two NumPy arrays a = np.array([1, 2]) and b = np.array([3, 4]), which function can combine them into one array [1, 2, 3, 4]?

    1. np.merge(a, b)
    2. pd.merge(a, b)
    3. np.combine(a, b)
    4. np.concatenate([a, b])

    Explanation: np.concatenate([a, b]) is the correct function to join two NumPy arrays along an existing axis. pd.merge is used with DataFrames, not arrays. np.combine and np.merge do not exist in NumPy, and would cause errors.
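
    A minimal sketch of the call. The arrays are passed together as one sequence rather than as separate arguments:

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4])

# Join along the existing (and only) axis of these 1-D arrays.
combined = np.concatenate([a, b])

print(combined.tolist())  # [1, 2, 3, 4]
```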

  7. Dropping Rows with Pandas

    How would you remove a row with the index label 2 from a Pandas DataFrame called df?

    1. df.drop(2)
    2. df.delete(2)
    3. df.delrow(2)
    4. df.remove_row(2)

    Explanation: df.drop(2) is the correct method to remove a row by its index label in Pandas. The other options, delete, delrow, and remove_row, are not Pandas DataFrame methods, so calling them raises an AttributeError.
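
    As a brief sketch with made-up data, drop removes the row whose label is 2 and returns a new DataFrame, so the result has to be assigned to keep it:

```python
import pandas as pd

# Hypothetical data with the default integer index labels 0, 1, 2, 3.
df = pd.DataFrame({'x': [10, 20, 30, 40]})

# Drop the row labeled 2; the original df is left unchanged.
trimmed = df.drop(2)

print(trimmed.index.tolist())  # [0, 1, 3]
print(len(df))                 # 4
```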

  8. Accessing DataFrame Elements with iloc

    Which command would retrieve the value from the first row and second column of a DataFrame df using integer location?

    1. df.getvalue(0,1)
    2. df.access(1,2)
    3. df.iloc[0, 1]
    4. df.loc[0, 1]

    Explanation: df.iloc[0, 1] accesses the first row and second column using zero-based integer indexing. df.loc uses label-based indexing, which may not correspond to position. df.getvalue and df.access are not valid DataFrame methods for this purpose.
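
    The difference between position and label is easiest to see when the index is not the default integers. The row and column labels below are made up for illustration:

```python
import pandas as pd

# Hypothetical frame with string labels for both rows and columns.
df = pd.DataFrame({'p': [1, 2], 'q': [3, 4]}, index=['r1', 'r2'])

# iloc uses zero-based positions: row 0, column 1.
by_position = df.iloc[0, 1]

# loc needs the actual labels to reach the same cell.
by_label = df.loc['r1', 'q']

print(by_position, by_label)  # 3 3
```

    With these labels, df.loc[0, 1] would raise a KeyError, since no row is labeled 0 and no column is labeled 1.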

  9. Converting Data Types in Pandas Series

    Given a Pandas Series s containing numbers as strings, which method converts s to integers?

    1. s.type(int)
    2. s.astype(int)
    3. s.totype('int')
    4. s.convert(int)

    Explanation: s.astype(int) is the correct method to convert the data type of a Pandas Series. s.convert, s.totype, and s.type are not Series methods, so none of them performs the conversion.
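
    A minimal sketch of the conversion, assuming every string in the Series is a valid integer literal:

```python
import pandas as pd

# Numbers stored as strings.
s = pd.Series(['1', '2', '3'])

# astype returns a new Series with an integer dtype.
as_int = s.astype(int)

print(as_int.tolist())  # [1, 2, 3]
print(as_int.sum())     # 6
```

    If any entry cannot be parsed as an integer, astype(int) raises a ValueError.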

  10. Replacing Values in a DataFrame

    Which command correctly replaces all occurrences of the value 0 in a DataFrame df with NaN?

    1. df.replace(0, np.nan)
    2. df.fillna(0, np.nan)
    3. df.switch(0, np.nan)
    4. df.nanreplace(0)

    Explanation: df.replace(0, np.nan) is the correct way to substitute all 0 values with NaN in a DataFrame using Pandas. df.switch and df.nanreplace are not valid DataFrame methods. df.fillna is used for replacing NaN values, not for targeting a specific value like 0.
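
    A short sketch of the replacement on made-up data. Because NaN is a floating-point value, the affected integer columns come back as floats:

```python
import numpy as np
import pandas as pd

# Hypothetical data containing two zeros.
df = pd.DataFrame({'a': [0, 1], 'b': [2, 0]})

# Every 0 becomes NaN; other values are kept.
cleaned = df.replace(0, np.nan)

print(int(cleaned.isnull().sum().sum()))  # 2
```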