Explore core concepts and applications of Principal Component Analysis (PCA) within machine learning workflows. This quiz assesses your understanding of PCA's purpose and processes, and of its impact on data preprocessing and dimensionality reduction.
What is the main purpose of applying Principal Component Analysis (PCA) to a dataset in a typical machine learning workflow?
Explanation: PCA is primarily used to reduce the dimensionality of a dataset by transforming it into a smaller set of uncorrelated variables, the principal components, that preserve most of the original variance. Increasing the number of features is not the goal; the point is to summarize the data efficiently. Replacing missing values or encoding categorical variables are handled by other preprocessing methods, not PCA.
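For illustration, here is a minimal sketch of this idea using scikit-learn on synthetic data (the sample, feature, and component counts are arbitrary choices, not part of the question):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # 100 samples, 10 features

pca = PCA(n_components=3)               # keep 3 uncorrelated components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                  # (100, 3)
print(pca.explained_variance_ratio_)    # share of variance per component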
Why is it important to standardize or normalize features before applying PCA in a workflow?
Explanation: PCA relies on the variance of each feature, so features with larger scales can dominate the result if the data is not standardized. PCA does not require binary variables, nor does standardization make data sparser. Normalization does not necessarily remove outliers; its main effect is ensuring that features contribute equally to the analysis.
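A small sketch of this effect, assuming scikit-learn and synthetic data in which one feature has a much larger scale than the other:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# The second feature has a far larger scale than the first.
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])

# Without scaling, the first component aligns almost entirely with
# the large-scale feature.
print(PCA(n_components=1).fit(X).components_)

# After standardization, both features contribute comparably.
X_scaled = StandardScaler().fit_transform(X)
print(PCA(n_components=1).fit(X_scaled).components_)
```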
When applying PCA, what do the new variables called 'principal components' represent?
Explanation: Principal components are constructed as linear combinations of the original features and are ordered so that each captures as much of the remaining variance as possible. They are neither the original features themselves nor random; they summarize the most important patterns. Because PCA is unsupervised, they are computed without reference to any target labels.
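The linear-combination view can be checked directly. This sketch (synthetic data, scikit-learn assumed) projects the centered data onto the component weights by hand:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

pca = PCA(n_components=2).fit(X)

# Each row of components_ holds the weights that combine the original
# features into one principal component.
print(pca.components_.shape)            # (2, 4): 2 components over 4 features

# Projecting the centered data onto those weights reproduces transform().
manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(manual, pca.transform(X)))   # True
```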
Suppose PCA is used to reduce a dataset from ten features to three principal components, retaining 90% of the variance. What does this indicate?
Explanation: Retaining 90% variance means the first three principal components contain most of the important information from the data. However, these three components do not enable perfect reconstruction; some information is lost. Using all ten components would indeed capture the full variance, and PCA does not specifically remove outliers.
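A sketch of what retaining most of the variance looks like in code, using synthetic data built to have low-dimensional structure; the specific numbers are illustrative:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Ten features generated from three latent factors plus small noise,
# so a few components carry most of the variance.
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(200, 10))

pca = PCA(n_components=3).fit(X)
print(pca.explained_variance_ratio_.sum())   # close to, but below, 1.0

# Reconstruction from three components is approximate: the discarded
# variance is lost for good.
X_hat = pca.inverse_transform(pca.transform(X))
print(np.linalg.norm(X - X_hat))             # nonzero reconstruction error

# n_components can also be given as a variance fraction; scikit-learn
# then keeps the fewest components reaching that level.
print(PCA(n_components=0.90).fit(X).n_components_)
```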
How does PCA handle highly correlated features in a dataset?
Explanation: PCA combines correlated features into shared principal components, thereby capturing their common information more efficiently. It does not exclude correlated features or simply give more weight to uncorrelated ones. PCA does the opposite of duplication: it reduces redundancy rather than increasing it.
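A sketch with two artificially correlated features, showing almost all of the variance collapsing onto a single component (the data and correlation strength are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=500)
# Two features that are almost copies of each other.
X = np.column_stack([base, base + 0.05 * rng.normal(size=500)])

pca = PCA().fit(X)
# Nearly all variance lands on the first component, because the two
# correlated features share almost all of their information.
print(pca.explained_variance_ratio_)    # roughly [0.999..., 0.000...]
```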
How can integrating PCA into a machine learning pipeline help reduce the risk of overfitting?
Explanation: PCA helps remove irrelevant or redundant information by retaining only the most important directions in the data, which can reduce overfitting and help models generalize better. Increasing the number of features or discarding data points are not effective ways to counter overfitting, and dataset balance is unrelated to PCA's function.
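One common way to wire this up is a scikit-learn Pipeline; the dataset, component count, and classifier below are illustrative choices, not the only valid ones:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),       # drop low-variance directions
    ("clf", LogisticRegression(max_iter=1000)),
])

# Fitting PCA inside the pipeline keeps each CV fold free of leakage.
print(cross_val_score(pipe, X, y, cv=5).mean())
```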
After applying PCA to a set of numeric features, what is the nature of the output for each sample?
Explanation: The output of PCA is a transformed dataset in which each sample is described by its values along the principal component axes. PCA does not produce feature names or categorical labels, nor does it return the original data unprocessed.
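A minimal sketch of the output shape, using synthetic numeric data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))             # 5 samples, 8 numeric features

scores = PCA(n_components=2).fit_transform(X)
print(scores.shape)                     # (5, 2): one row per sample
print(scores[0])                        # sample 0's values along PC1, PC2
```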
In PCA, what mathematical concept defines the direction of each principal component?
Explanation: The direction of each principal component is defined by an eigenvector of the covariance matrix, which points in a direction of maximum variance. Feature means and medians do not determine these directions, and random permutations of the data are irrelevant.
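This connection can be verified numerically. The sketch below (synthetic data) compares an eigenvector of the covariance matrix with scikit-learn's first component; they should agree up to sign:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))  # correlated features

# Eigendecomposition of the covariance matrix (np.cov centers the data).
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
top = eigvecs[:, np.argmax(eigvals)]     # eigenvector of largest eigenvalue

sk_top = PCA(n_components=1).fit(X).components_[0]

# The two unit vectors agree up to sign, so |dot product| is ~1.
print(abs(top @ sk_top))                 # ~1.0
```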
How is PCA commonly used to assist in the visualization of high-dimensional data?
Explanation: PCA allows complex, high-dimensional data to be visualized in a simpler 2D or 3D space, making patterns more apparent. It does not convert numerical data to categories, encrypt the data, or perform dataset splitting for model validation.
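A typical sketch of this use, projecting the 4-dimensional iris dataset onto two components for plotting (the dataset and plot styling are illustrative choices):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
X_2d = PCA(n_components=2).fit_transform(X)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Iris projected onto its first two principal components")
plt.show()
```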
Which of the following is a limitation of Principal Component Analysis when used in machine learning workflows?
Explanation: PCA only captures linear patterns and may miss nonlinear relationships in the data. It does not guarantee any model performance by itself and cannot handle categorical variables unless they are encoded numerically. PCA also does not require that all data values be normalized to between 0 and 1.
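The linearity limitation is easy to see on data with nonlinear structure. The sketch below contrasts plain PCA with an RBF-kernel variant; the use of KernelPCA and its gamma value are illustrative choices:

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: structure a linear projection cannot separate.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Linear PCA only rotates the data; the rings stay intermixed.
X_lin = PCA(n_components=2).fit_transform(X)

# An RBF-kernel PCA can unfold the nonlinear structure.
X_rbf = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

# Plotting X_lin and X_rbf (colored by y) makes the contrast visible:
# the rings overlap in the linear projection but separate in the
# kernel projection.
print(X_lin.shape, X_rbf.shape)
```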