Principal Component Analysis (PCA) Quiz

Challenge your understanding of Principal Component Analysis (PCA) with this focused quiz covering its fundamentals, objectives, and applications. Enhance your grasp of dimensionality reduction, eigenvalues, and data transformation in PCA.

  1. Purpose of PCA

    What is the primary objective of applying Principal Component Analysis (PCA) to a data set with many correlated variables?

    1. To reduce the dimensionality while retaining most of the variance
    2. To increase the number of variables for deeper analysis
    3. To normalize all values to a zero mean
    4. To remove outliers through automated clustering

    Explanation: PCA is mainly used to reduce the number of variables by transforming them into principal components that capture the majority of the data's variance. It does not increase the variable count (option 2) nor directly remove outliers (option 4). While data is centered before PCA, normalization to zero mean (option 3) is a preprocessing step, not the main goal.
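The point above can be sketched with scikit-learn's `PCA` on a small synthetic dataset (the data and sizes here are illustrative assumptions, not part of the quiz): five correlated features are compressed into two components that still retain almost all of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative toy data: 200 samples, 5 features built from 2 latent factors,
# so the features are strongly correlated by construction.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = base @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))

# Reduce 5 features to 2 principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the total variance retained by the 2 components.
retained = pca.explained_variance_ratio_.sum()
```

Because the data is essentially two-dimensional plus small noise, the two retained components capture nearly all of the variance, which is exactly the trade-off described in option 1.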

  2. Eigenvalues and Principal Components

    Why are eigenvalues important when interpreting principal components in PCA?

    1. They indicate the amount of variance captured by each principal component
    2. They represent angles between original variables
    3. They label each variable after rotation
    4. They serve as scaling factors for matrix normalization

    Explanation: Eigenvalues reveal how much variance each principal component explains, guiding how many components to retain. They do not represent geometric angles between variables (option 2), label variables after rotation (option 3), or serve as mere scaling factors for matrix normalization (option 4).
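In scikit-learn the eigenvalues of the covariance matrix surface as `explained_variance_`, sorted in decreasing order; `explained_variance_ratio_` gives each one as a fraction of total variance. A minimal sketch (the toy data is an assumption for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data with deliberately unequal per-feature scales,
# so the eigenvalues differ clearly in size.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

pca = PCA().fit(X)
eigenvalues = pca.explained_variance_    # eigenvalues of the sample covariance matrix
ratios = pca.explained_variance_ratio_   # same values as fractions of total variance
```

The eigenvalues match the eigenvalues of `np.cov(X.T)` directly, which is why they quantify variance per component rather than angles or labels.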

  3. Component Selection

    Suppose you apply PCA to a dataset, and the first two components explain 85% of the variance. What is a recommended next step?

    1. Use the smallest eigenvalues as the main features
    2. Retain only the first two components for further analysis
    3. Increase the number of components to 100
    4. Discard the results since 100% variance is not explained

    Explanation: If the first two components cover most of the variance, retaining them simplifies the data while preserving key information. Using the smallest eigenvalues (option 1) would discard the major sources of variation, and increasing to 100 components (option 3) defeats PCA's purpose. Demanding 100% explained variance (option 4) is impractical, as minor components often capture noise.
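scikit-learn supports this selection rule directly: passing a float between 0 and 1 as `n_components` keeps the smallest number of components whose cumulative explained variance reaches that threshold. A sketch under assumed toy data:

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: 8 observed features driven by 3 latent factors
# with very unequal variances, so a few components suffice.
rng = np.random.default_rng(2)
base = rng.normal(size=(150, 3)) * np.array([4.0, 2.0, 0.3])
X = base @ rng.normal(size=(3, 8))

# Keep just enough components to explain at least 85% of the variance.
pca = PCA(n_components=0.85)
X_reduced = pca.fit_transform(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
```

Here far fewer than the original 8 dimensions are retained, mirroring the quiz scenario of keeping only the leading components.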

  4. Data Preprocessing

    Why is it generally important to standardize features before applying PCA to a dataset with variables on different scales?

    1. Because only standardization creates binary variables
    2. Because standardization always leads to zero variance
    3. Because variables with larger scales could dominate the principal components
    4. Because PCA automatically standardizes data

    Explanation: Standardizing ensures that each variable contributes comparably, preventing those with larger ranges from dominating the principal components. Standardization does not create binary variables (option 1) or lead to zero variance (option 2), and PCA does not standardize data automatically (option 4).
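The dominance effect is easy to demonstrate with `StandardScaler`: two independent features on very different scales (the scales below are illustrative assumptions) give a first component that is almost entirely the large-scale feature, while standardizing first splits the variance roughly evenly.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Two independent features on wildly different scales
# (think metres vs. millimetres).
rng = np.random.default_rng(3)
X = np.column_stack([rng.normal(0, 1000, 500),
                     rng.normal(0, 1, 500)])

# Without scaling, the large-scale feature dominates PC1.
raw_ratio = PCA().fit(X).explained_variance_ratio_[0]

# After standardization, both features contribute comparably.
X_std = StandardScaler().fit_transform(X)
std_ratio = PCA().fit(X_std).explained_variance_ratio_[0]
```

`raw_ratio` is close to 1.0, whereas `std_ratio` falls near 0.5, since the standardized features are uncorrelated with equal variance.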

  5. Interpretation of Loadings

    After performing PCA, what does a high absolute value in the loading of a variable on a principal component imply?

    1. The variable must always be removed
    2. The variable strongly influences that principal component
    3. The variable is not included in any component
    4. The component explains very little variance

    Explanation: A high absolute loading marks a variable's strong influence on the corresponding principal component, highlighting an important relationship. There is no requirement to remove such variables (option 1), variables are not excluded from components (option 3), and a high loading does not imply that the component explains little variance (option 4).
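This can be illustrated with scikit-learn's `components_` attribute, whose rows are the unit-norm principal directions (note that "loadings" is sometimes defined as these weights scaled by the square root of the eigenvalue; the sketch below uses the raw weights, and the toy data is an illustrative assumption).

```python
import numpy as np
from sklearn.decomposition import PCA

# Feature 0 carries almost all the variance; features 1 and 2
# are near-constant noise.
rng = np.random.default_rng(4)
X = np.column_stack([rng.normal(0, 10, 400),
                     rng.normal(0, 0.1, 400),
                     rng.normal(0, 0.1, 400)])

pca = PCA(n_components=1).fit(X)
loadings = pca.components_[0]  # weight of each original variable on PC1

# The dominant variable has the largest absolute loading.
dominant = int(np.argmax(np.abs(loadings)))
```

PC1 aligns almost exactly with feature 0, so its absolute loading is close to 1, showing how a high absolute loading flags a strongly influential variable.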