Challenge your understanding of core machine learning concepts with these scenario-driven, practical interview-style questions. Ideal for anyone brushing up on machine learning fundamentals by reviewing key algorithms, model concepts, and essential terminology in the AI and machine learning domain.
Which type of machine learning requires labeled data to train the model on input-output mappings?
Explanation: Supervised learning relies on labeled data, meaning each input has a corresponding correct output, helping the model learn the relationship. Unsupervised learning does not use labeled outputs, instead grouping or clustering data based solely on input similarity. Reinforcement learning involves decision-making with rewards or penalties, not explicit labels per example. Transfer learning is a technique for leveraging a model trained on one task for another and is not itself a type of learning based on data labeling.
If your task is to predict the price of a house based on its features, which type of machine learning problem is this?
Explanation: Predicting a continuous numerical value, such as a house price, is a regression problem. Classification would involve assigning discrete labels, like whether a house is luxury or standard. Clustering divides items into groups without labels, and association is about finding rules that describe large portions of your data, none of which fit this scenario.
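As a minimal sketch (the feature names and prices below are made up for illustration), a regression model for this task could look like:

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training data: [square_meters, num_bedrooms] -> price
X = [[50, 1], [80, 2], [120, 3], [200, 4]]
y = [150_000, 240_000, 350_000, 560_000]  # continuous targets => regression

model = LinearRegression().fit(X, y)
print(model.predict([[100, 2]]))  # outputs a continuous price, not a class label
```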
What is it called when a model performs well on training data but poorly on unseen test data?
Explanation: Overfitting occurs when a model memorizes the training data, including noise, leading to poor performance on new data. Underfitting means a model does not learn enough from the data to make accurate predictions. Generalization is the model’s ability to perform well on unseen data. Normalization refers to scaling input variables and is unrelated to model performance concerns.
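One quick way to observe overfitting in practice, sketched with scikit-learn on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data (flip_y injects label noise)
X, y = make_classification(n_samples=200, n_features=20, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # unconstrained depth
print("train accuracy:", tree.score(X_tr, y_tr))  # typically ~1.0 (memorized noise)
print("test accuracy:", tree.score(X_te, y_te))   # noticeably lower => overfitting
```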
In machine learning, what tradeoff describes how increasing a model's complexity can reduce bias but increase variance?
Explanation: The bias-variance tradeoff explains that more complex models may capture patterns better (lower bias) but can become overly sensitive to noise (higher variance). Hyperparameter grid relates to searching combinations of settings, batch normalization normalizes layers in neural networks, and gradient checking validates gradient computations, none of which directly refer to this tradeoff.
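The tradeoff is often illustrated by varying a model's complexity, for example polynomial degree, on synthetic data (a sketch, not a definitive benchmark):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)

# degree 1: too simple (high bias); degree 15: fits noise (high variance)
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(degree, mse)  # error is typically lowest at a moderate degree
```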
Which algorithm classifies new examples based on the majority class among the k closest training samples?
Explanation: K-Nearest Neighbors (KNN) classifies data points by looking at the labels of the nearest points and choosing the majority class. Decision trees make splits based on feature values, Naive Bayes uses Bayes' theorem with independence assumptions, and Principal Component Analysis is a dimensionality reduction method, not a classifier.
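A minimal sketch with scikit-learn (the toy points are hypothetical):

```python
from sklearn.neighbors import KNeighborsClassifier

# Two small clusters of 2-D points with class labels
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[4.5, 5.0]]))  # majority class among the 3 nearest points -> 1
```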
Why do some machine learning algorithms require feature scaling before training?
Explanation: Feature scaling ensures features with larger numeric ranges do not dominate calculations, especially in algorithms like KNN or SVM that use distance metrics. Increasing feature space dimensionality is unrelated to feature scaling. Introducing randomness and removing duplicates are distinct preprocessing steps not associated with feature scaling.
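For example (toy values chosen to exaggerate the scale mismatch):

```python
from sklearn.preprocessing import StandardScaler

# Income in dollars dwarfs age in years; a raw Euclidean distance
# would be dominated almost entirely by the income column.
X = [[25, 40_000], [30, 85_000], [45, 60_000]]

X_scaled = StandardScaler().fit_transform(X)  # each column: zero mean, unit variance
print(X_scaled)  # both features now contribute comparably to distances
```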
What is the main purpose of one-hot encoding in categorical feature processing?
Explanation: One-hot encoding creates binary columns for each category, enabling algorithms to interpret categorical data. It does not necessarily lower computational cost and may actually increase dimensionality. One-hot encoding doesn't introduce non-linearity or ensure equal variance among variables.
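A short sketch using pandas (the color column is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
print(pd.get_dummies(df, columns=["color"]))
# Produces binary columns color_blue, color_green, color_red --
# one per category, with no implied ordering between them
```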
What does a linear Support Vector Machine essentially compute to separate two classes?
Explanation: A linear SVM finds the hyperplane that maximizes the margin between classes. It does not construct a decision forest—that's associated with random forests. SVMs do not output probability distributions by default, nor do they use recursive averages to separate data.
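A minimal sketch with scikit-learn's SVC on hypothetical, linearly separable points:

```python
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [1, 0], [5, 5], [6, 5], [5, 6]]
y = [0, 0, 0, 1, 1, 1]

svm = SVC(kernel="linear").fit(X, y)
print(svm.coef_, svm.intercept_)  # parameters of the maximum-margin hyperplane
print(svm.support_vectors_)       # the boundary points that define the margin
```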
In model training, why do we use a loss function?
Explanation: The loss function quantifies the difference between predicted and actual values, guiding the training process. Generating subsets of data refers to techniques like bootstrapping; encoding target variables is done via encoding schemes; partitioning data into clusters relates to clustering algorithms, not loss functions.
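As a worked example, mean squared error, one common loss for regression, computed on made-up values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.5, 2.0])

# Mean squared error: average of squared differences between prediction and truth
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (0.25 + 0.25 + 0.0) / 3 = 0.1666...
```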
Why do practitioners use k-fold cross-validation when developing a machine learning model?
Explanation: K-fold cross-validation splits the data into k folds and rotates through them, so every example is used for validation exactly once; averaging the per-fold scores gives a more robust performance estimate. It does not eliminate preprocessing, reduce model parameters, or enforce feature orthogonality, which are unrelated processes.
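A minimal sketch with scikit-learn (the iris dataset and logistic regression are used purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5 folds: each fold serves as the validation set exactly once
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())  # per-fold accuracy and the more robust average
```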
What is a potential problem when applying label encoding to a categorical feature such as color: {red, green, blue}?
Explanation: Label encoding assigns integers to each category, possibly implying an order where none exists. This can mislead algorithms sensitive to numerical order. Encoding does not always improve training speed, nor does it guarantee accuracy. It does not transform numbers to text; it works in the opposite direction.
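A short demonstration with scikit-learn's LabelEncoder:

```python
from sklearn.preprocessing import LabelEncoder

colors = ["red", "green", "blue", "green"]
encoded = LabelEncoder().fit_transform(colors)
print(encoded)  # [2, 1, 0, 1] -- implies blue < green < red,
                # an ordering that does not exist for colors
```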
Why is it important to use a separate test set when evaluating your machine learning model?
Explanation: A test set simulates how the model will perform on future, real-world data. Its main purpose is unbiased evaluation. It does not reduce memory usage or tune hyperparameters, and splitting uses up some data for testing rather than increasing training data.
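A minimal sketch of a held-out evaluation with scikit-learn (the dataset and split ratio are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Hold out 20% of the data; the model never sees it during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)
print(model.score(X_test, y_test))  # unbiased estimate of real-world performance
```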
What does each step of the gradient descent algorithm attempt to accomplish during model training?
Explanation: Gradient descent updates model parameters to reduce the loss function, guiding the model toward better predictions. Its goal is not to encourage overfitting; randomness in feature selection is unrelated. Batch size changes affect computational efficiency, not the purpose of gradient descent steps.
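A bare-bones sketch of gradient descent on a one-parameter least-squares problem (the data are made up):

```python
import numpy as np

# Fit y = w * x by minimizing mean squared error with gradient descent
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])  # roughly y = 2x

w, lr = 0.0, 0.01
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)  # derivative of the loss w.r.t. w
    w -= lr * grad                       # step opposite the gradient: loss decreases
print(w)  # converges near 2.0
```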
Which statement best describes the interpretability of decision tree models?
Explanation: Decision trees are known for their clear, rule-based structure that is easily visualized. They do not require feature scaling like some algorithms do. Decision trees can be used for both classification and regression tasks and can handle categorical, not just binary, features.
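For example, scikit-learn can print a fitted tree as plain if/else rules:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# The learned model reads as a small set of human-readable threshold rules
print(export_text(tree, feature_names=load_iris().feature_names))
```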
What is the main advantage of using ensemble methods like bagging or boosting in machine learning?
Explanation: Ensemble methods aggregate results from different models to achieve better performance. They do not guarantee perfect accuracy and may even take longer to train due to model combination. Ensembles work with varying dataset sizes, not just small ones.
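A quick comparison sketch, a single tree versus a bagged ensemble of trees, on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

single = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
forest = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
print(single, forest)  # the bagged ensemble usually scores higher than one tree
```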
What fundamental assumption does the Naive Bayes classifier make about features?
Explanation: Naive Bayes assumes that each feature is conditionally independent of the others, given the class label. It does not require features to share a scale, does not depend on linear separability, and is not restricted to categorical variables.
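A minimal sketch with Gaussian Naive Bayes (the iris dataset is used purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)

# Per class, each feature is modeled with its own independent Gaussian;
# the joint likelihood is simply the product of per-feature likelihoods.
print(nb.predict(X[:3]), nb.predict_proba(X[:3]).round(3))
```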
If a model predicts 90 out of 100 test labels correctly, what is the classification accuracy?
Explanation: Accuracy is calculated as the number of correct predictions divided by the total number of predictions, yielding 90%. Ten percent and nine percent are incorrect, and accuracy cannot exceed 100%, so 180% is invalid.
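The arithmetic, spelled out:

```python
# Accuracy = correct predictions / total predictions
correct, total = 90, 100
print(correct / total)  # 0.9, i.e. 90%
```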
What does the confusion matrix help you to analyze in a classification problem?
Explanation: A confusion matrix breaks down predictions by true and predicted class, showing both correct and misclassified instances. Feature correlation and loss function details are not visualized here, nor does it inform on training duration.
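A small example with scikit-learn (the labels are made up):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows = true class, columns = predicted class:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))  # [[3, 1], [1, 3]] here
```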
In binary classification, what does the ROC (Receiver Operating Characteristic) curve plot?
Explanation: The ROC curve demonstrates the trade-off between the true positive rate and false positive rate at various threshold settings. It does not compare training versus test accuracy, feature importance versus overfitting, nor does it plot loss value over training epochs.
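A minimal sketch using scikit-learn's roc_curve on made-up scores:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]  # model's predicted probabilities for class 1

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(fpr, tpr)                         # one (FPR, TPR) point per threshold
print(roc_auc_score(y_true, y_scores))  # area under the curve summarizes them
```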
What can result from using information from the test set during model training?
Explanation: Including test data information in training causes data leakage, making performance evaluation unreliable. Faster feature encoding, feature independence, and model capacity are not direct consequences of this issue.
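A sketch of the leaky pattern versus the correct one (random data, purely illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = np.random.rand(100, 3), np.random.randint(0, 2, 100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Leaky: fitting the scaler on ALL data lets test-set statistics
# influence training:
#   X_scaled = StandardScaler().fit_transform(X)  # then split -> leakage

# Correct: fit preprocessing on the training set only
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)
```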
What is the primary goal in clustering problems such as grouping customers by purchase behavior?
Explanation: Clustering identifies naturally occurring groups among data points without prior labels. Assigning ground truth classes describes classification, maximizing overfitting is not a goal, and clustering is an unsupervised technique, not supervised.
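A minimal sketch with k-means (the customer features are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer features: [avg_order_value, orders_per_month]
X = np.array([[20, 1], [25, 2], [22, 1], [200, 8], [210, 9], [190, 7]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # group membership discovered without any labels
```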