Advanced Principal Component Analysis: Eigenvalues and Eigenvectors Quiz

Challenge your understanding of advanced Principal Component Analysis concepts focused on eigenvalues and eigenvectors, including their calculation, interpretation, and role in dimensionality reduction and in explaining data variance. Ideal for anyone seeking to deepen their knowledge of PCA mechanics and the role of linear algebra in machine learning.

  1. Interpreting Eigenvalues in PCA

    In Principal Component Analysis, what does a higher eigenvalue indicate about a principal component when analyzing a data set’s covariance matrix?

    1. The principal component is more likely to be correlated with the original variables.
    2. The principal component explains a larger proportion of the data's variance.
    3. The principal component is unrelated to the data's mean.
    4. The principal component has fewer missing values.

    Explanation: A higher eigenvalue in PCA indicates that its corresponding principal component accounts for a larger share of the variance in the data. Explained variance is a key metric for feature selection and dimensionality reduction. Correlation with the original variables is not what an eigenvalue measures, which is a common misconception. The number of missing values is unrelated to eigenvalues, and principal components are derived from the covariance structure of the mean-centered data, so they are not 'unrelated' to the mean.
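
    To make this concrete, here is a minimal sketch, assuming NumPy and a small hypothetical data matrix: the eigenvalues of the covariance matrix are turned into the proportion of variance each principal component explains.

```python
import numpy as np

# Hypothetical data matrix: 100 samples, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Covariance matrix of the variables (columns).
cov = np.cov(X, rowvar=False)

# eigh is appropriate for symmetric matrices; eigenvalues come back ascending.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Proportion of variance explained by each component, largest first.
explained_ratio = np.sort(eigenvalues)[::-1] / eigenvalues.sum()
print(explained_ratio)
```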

  2. Understanding Eigenvectors

    Which property best describes eigenvectors in the context of Principal Component Analysis applied to a standardized data set?

    1. They define the directions of maximum variance in the data.
    2. They correspond to the means of the variables.
    3. They represent the standardized values of the dataset.
    4. They are always orthogonal to the data axes.

    Explanation: Eigenvectors in PCA indicate the directions, or axes, along which variance is maximized, guiding the transformation of data into principal components. They do not represent the means but instead show the directions for principal components. While eigenvectors for a symmetric covariance matrix are orthogonal to each other, they are not necessarily orthogonal to the data axes. They are also not standardized values themselves.
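
    A small illustration, assuming NumPy and hypothetical correlated two-dimensional data: the top eigenvector marks the direction of maximum variance, and the eigenvectors of the symmetric covariance matrix are mutually orthonormal.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical correlated data, so the direction of maximum variance
# is not aligned with either original axis.
X = rng.multivariate_normal([0, 0], [[3.0, 2.0], [2.0, 2.0]], size=500)

cov = np.cov(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Top eigenvector (last column, since eigh sorts eigenvalues ascending).
top = eigenvectors[:, -1]

# Variance of the data projected onto the top eigenvector equals the
# largest eigenvalue -- the maximum achievable along any direction.
print(np.var(X @ top, ddof=1), eigenvalues[-1])

# Eigenvectors are orthonormal to each other (not to the data axes).
print(np.allclose(eigenvectors.T @ eigenvectors, np.eye(2)))
```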

  3. Calculating Principal Components

    After extracting the top two principal components from a data set, what is the correct process for projecting the data onto this new two-dimensional space?

    1. Multiply the original data matrix by the matrix of eigenvalues.
    2. Divide the original data matrix by the number of observations.
    3. Multiply the original data matrix by the matrix of the two principal eigenvectors.
    4. Add the two principal eigenvalues to each data point.

    Explanation: Projecting data onto principal components involves a matrix multiplication of the standardized data with the selected eigenvectors, as sketched below. Eigenvalues are not directly used for projection but for ranking components. Neither adding eigenvalues nor dividing by the number of observations is a relevant step in PCA’s dimensionality reduction process.
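
    As a sketch of the projection step, assuming NumPy and a hypothetical 200 x 5 data matrix: the centered data is multiplied by the matrix whose columns are the two leading eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))        # hypothetical 200 x 5 data matrix

# Center (or standardize) the data before projecting.
X_centered = X - X.mean(axis=0)

cov = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Eigenvectors for the two largest eigenvalues form a 5 x 2 matrix W.
order = np.argsort(eigenvalues)[::-1]
W = eigenvectors[:, order[:2]]

# Projection: (200 x 5) @ (5 x 2) -> 200 x 2 scores.
scores = X_centered @ W
print(scores.shape)                  # (200, 2)
```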

  4. Variance Explained by Principal Components

    If the first principal component explains 65% of the variance and the second explains 20%, what percentage of the original data variation is preserved by using the first two principal components?

    1. 65%
    2. 45%
    3. 85%
    4. 100%

    Explanation: Variance preserved by selected principal components is the sum of their individual explained variances, which is 65% plus 20%, totaling 85%. 45% and 65% underestimate the actual variance retained. 100% would only apply if all components were included.
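
    The arithmetic is a simple cumulative sum, sketched here with NumPy and hypothetical explained-variance ratios.

```python
import numpy as np

# Hypothetical explained-variance ratios for four components.
explained = np.array([0.65, 0.20, 0.10, 0.05])

# Cumulative variance preserved by the first k components.
cumulative = np.cumsum(explained)
print(cumulative[1])   # ~0.85: the first two components preserve 85%
```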

  5. Role of Covariance Matrix

    Why is the covariance matrix used in PCA rather than the correlation matrix when all variables have the same scale?

    1. Because it preserves the variance structure and relationships in their original units.
    2. Because it guarantees a higher dimensional space.
    3. Because it removes the effect of scaling and standardizes the data.
    4. Because it eliminates outliers in the data.

    Explanation: Using the covariance matrix for variables with similar scales maintains their real variance and relationships, which might be lost with correlation. The correlation matrix standardizes data, but that’s only necessary when variables are on different scales. The covariance matrix does not impact dimensionality or handle outliers.
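
    A brief comparison, assuming NumPy and hypothetical data measured in the same units but with different spreads: the covariance matrix keeps each variable's actual variance, while the correlation matrix rescales every variable to unit variance.

```python
import numpy as np

rng = np.random.default_rng(3)
# Two hypothetical variables in the same units, with standard deviations 2 and 5.
X = rng.normal(scale=[2.0, 5.0], size=(300, 2))

# Covariance keeps the original variances (roughly 4 and 25 here) ...
print(np.diag(np.cov(X, rowvar=False)))

# ... while correlation standardizes every variable to unit variance,
# discarding that scale information.
print(np.diag(np.corrcoef(X, rowvar=False)))   # [1., 1.]
```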

  6. Eigenvalue Decomposition Outcome

    Upon performing eigenvalue decomposition of a covariance matrix in PCA, which output is directly interpreted as the importance of each principal component?

    1. The number of rows in the data set.
    2. The original variable means.
    3. The product of the data matrix and the eigenvectors.
    4. The sorted list of eigenvalues, with higher values being more important.

    Explanation: Eigenvalues, once sorted, directly show the importance of each principal component—the higher the eigenvalue, the more variance explained. The product of the data and eigenvectors yields projected scores, not importance. Variable means and the number of rows do not measure component importance.
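
    As a sketch with NumPy and hypothetical data, the sorted eigenvalues that rank the components can be obtained from the covariance matrix directly, or equivalently from the singular values of the centered data.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 4))
Xc = X - X.mean(axis=0)

# Eigenvalues of the covariance matrix, sorted high-to-low, rank the components.
eigenvalues = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
print(eigenvalues)

# The squared singular values of the centered data, divided by (n - 1),
# give the same sorted values.
singular_values = np.linalg.svd(Xc, compute_uv=False)
print(singular_values**2 / (len(Xc) - 1))
```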

  7. Eigenvectors and Data Transformation

    When transforming a data set using PCA, which mathematical object is applied to rotate the data axes to the directions of greatest variance?

    1. Eigenvalues of the data matrix
    2. The identity matrix
    3. The cluster centers
    4. Eigenvectors of the covariance matrix

    Explanation: It’s the eigenvectors that define the directions of new axes (principal components) onto which the data is rotated for maximum variance. Eigenvalues rank the axes but do not perform the transformation. The identity matrix leaves the data unchanged, and cluster centers are unrelated to PCA.
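
    A small sketch, assuming NumPy and hypothetical two-dimensional data: rotating the centered data with the full eigenvector matrix produces a diagonal covariance, with the eigenvalues on the diagonal.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.multivariate_normal([0, 0], [[3.0, 1.5], [1.5, 1.0]], size=400)
Xc = X - X.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Rotating the data onto the eigenvector axes decorrelates it: the covariance
# of the rotated data is diagonal, holding the eigenvalues.
rotated = Xc @ eigenvectors
print(np.round(np.cov(rotated, rowvar=False), 6))
```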

  8. Sum of Eigenvalues in PCA

    Given a covariance matrix derived from four standardized variables, what does the sum of its eigenvalues represent in PCA?

    1. The number of principal components.
    2. The average value of the variables.
    3. The minimum value in the data set.
    4. The total variance in the original data set.

    Explanation: The sum of the eigenvalues of a covariance matrix equals the total variance present in the data set. Because the four variables here are standardized, each contributes a variance of one, so this total happens to equal the number of variables; in general, however, the sum of eigenvalues measures total variance, not a count of components. Minimum and average values are not represented by eigenvalues.
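
    A quick check, assuming NumPy and four hypothetical standardized variables: the sum of the eigenvalues equals the trace of the covariance matrix, i.e., the total variance (four here, one per standardized variable).

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(250, 4))
# Standardize the four hypothetical variables, as in the question.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

cov = np.cov(Xs, rowvar=False)
eigenvalues = np.linalg.eigvalsh(cov)

# Sum of eigenvalues == trace of the covariance matrix == total variance (4.0).
print(eigenvalues.sum(), np.trace(cov))
```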

  9. Selecting Dimensions in PCA

    When selecting how many principal components to retain, which criterion is most directly informed by examining the sorted eigenvalues?

    1. Retain the same number of components as variables.
    2. Retain components that together explain a desired cumulative percentage of variance.
    3. Retain only those with negative eigenvalues.
    4. Retain components with the smallest eigenvalues.

    Explanation: The sorted eigenvalues reveal how much variance each principal component explains, guiding the choice based on cumulative variance. Negative eigenvalues are not possible for a positive semi-definite covariance matrix. Retaining as many components as variables defeats dimensionality reduction. Smallest eigenvalues explain the least variance.
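
    A minimal sketch, assuming NumPy and hypothetical sorted explained-variance ratios, of keeping the smallest number of components that reaches a target cumulative variance.

```python
import numpy as np

# Hypothetical explained-variance ratios, already sorted high-to-low.
explained = np.array([0.55, 0.25, 0.12, 0.05, 0.03])

threshold = 0.90
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative variance meets the threshold.
k = int(np.searchsorted(cumulative, threshold) + 1)
print(k, cumulative[k - 1])   # 3 components explain ~92% >= 90%
```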

  10. Reconstruction Error in PCA

    If only the first two principal components are used to reconstruct data originally in five dimensions, what does the sum of the remaining three eigenvalues represent?

    1. The product of all explained variances.
    2. The total mean of the original variables.
    3. The variance lost or not captured in the reduced representation.
    4. The number of new samples created.

    Explanation: The sum of the discarded eigenvalues quantifies the variance not retained during dimensionality reduction, representing information loss. Means are unaffected by dimensionality reduction. The product of variances is not meaningful here, and PCA does not create new samples, only new representations.
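
    A closing sketch, assuming NumPy and hypothetical five-dimensional data: reconstructing from the top two components and confirming that the residual variance equals the sum of the three discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 5)) @ rng.normal(size=(5, 5))   # hypothetical 5-D data
Xc = X - X.mean(axis=0)

eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigenvalues)[::-1]

# Keep only the first two principal components.
W = eigenvectors[:, order[:2]]
X_hat = (Xc @ W) @ W.T            # project down, then map back to 5-D

# Residual variance equals the sum of the three discarded eigenvalues.
residual_var = np.sum((Xc - X_hat) ** 2) / (len(Xc) - 1)
print(residual_var, eigenvalues[order[2:]].sum())
```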