Challenge your understanding of core machine learning concepts with these scenario-driven, practical interview-style questions. Ideal for anyone brushing up on machine learning fundamentals by reviewing key algorithms, model concepts, and essential terminology in the AI and machine learning domain.
Which type of machine learning requires labeled data to train the model on input-output mappings?
Explanation: Supervised learning relies on labeled data, meaning each input has a corresponding correct output, helping the model learn the relationship. Unsupervised learning does not use labeled outputs, instead grouping or clustering data based solely on input similarity. Reinforcement learning involves decision-making with rewards or penalties, not explicit labels per example. Transfer learning is a technique for leveraging a model trained on one task for another and is not itself a type of learning based on data labeling.
If your task is to predict the price of a house based on its features, which type of machine learning problem is this?
Explanation: Predicting a continuous numerical value, such as a house price, is a regression problem. Classification would involve assigning discrete labels, like whether a house is luxury or standard. Clustering divides items into groups without labels, and association is about finding rules that describe large portions of your data, none of which fit this scenario.
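As a minimal sketch (the feature names and prices below are made up for illustration), a regression model for this task could look like:

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training data: [square_meters, num_bedrooms] -> price
X = [[50, 1], [80, 2], [120, 3], [200, 4]]
y = [150_000, 240_000, 350_000, 560_000]  # continuous targets => regression

model = LinearRegression().fit(X, y)
print(model.predict([[100, 2]]))  # outputs a continuous price, not a class label
```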
What is it called when a model performs well on training data but poorly on unseen test data?
Explanation: Overfitting occurs when a model memorizes the training data, including noise, leading to poor performance on new data. Underfitting means a model does not learn enough from the data to make accurate predictions. Generalization is the model’s ability to perform well on unseen data. Normalization refers to scaling input variables and is unrelated to model performance concerns.
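One quick way to observe overfitting in practice, sketched with scikit-learn on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data (flip_y injects label noise)
X, y = make_classification(n_samples=200, n_features=20, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # unconstrained depth
print("train accuracy:", tree.score(X_tr, y_tr))  # typically ~1.0 (memorized noise)
print("test accuracy:", tree.score(X_te, y_te))   # noticeably lower => overfitting
```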
In machine learning, what tradeoff describes how increasing a model's complexity can reduce bias but increase variance?
Explanation: The bias-variance tradeoff explains that more complex models may capture patterns better (lower bias) but can become overly sensitive to noise (higher variance). Hyperparameter grid relates to searching combinations of settings, batch normalization normalizes layers in neural networks, and gradient checking validates gradient computations, none of which directly refer to this tradeoff.
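The tradeoff is often illustrated by varying a model's complexity, for example polynomial degree, on synthetic data (a sketch, not a definitive benchmark):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)

# degree 1: too simple (high bias); degree 15: fits noise (high variance)
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(degree, mse)  # error is typically lowest at a moderate degree
```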
Which algorithm classifies new examples based on the majority class among the k closest training samples?
Explanation: K-Nearest Neighbors (KNN) classifies data points by looking at the labels of the nearest points and choosing the majority class. Decision trees make splits based on feature values, Naive Bayes uses Bayes' theorem with independence assumptions, and Principal Component Analysis is a dimensionality reduction method, not a classifier.
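A minimal sketch with scikit-learn (the toy points are hypothetical):

```python
from sklearn.neighbors import KNeighborsClassifier

# Two small clusters of 2-D points with class labels
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[4.5, 5.0]]))  # majority class among the 3 nearest points -> 1
```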
Why do some machine learning algorithms require feature scaling before training?
Explanation: Feature scaling ensures features with larger numeric ranges do not dominate calculations, especially in algorithms like KNN or SVM that use distance metrics. Increasing feature space dimensionality is unrelated to feature scaling. Introducing randomness and removing duplicates are distinct preprocessing steps not associated with feature scaling.
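For example (toy values chosen to exaggerate the scale mismatch):

```python
from sklearn.preprocessing import StandardScaler

# Income in dollars dwarfs age in years; a raw Euclidean distance
# would be dominated almost entirely by the income column.
X = [[25, 40_000], [30, 85_000], [45, 60_000]]

X_scaled = StandardScaler().fit_transform(X)  # each column: zero mean, unit variance
print(X_scaled)  # both features now contribute comparably to distances
```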
What is the main purpose of one-hot encoding in categorical feature processing?
Explanation: One-hot encoding creates binary columns for each category, enabling algorithms to interpret categorical data. It does not necessarily lower computational cost and may actually increase dimensionality. One-hot encoding doesn't introduce non-linearity or ensure equal variance among variables.
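A short sketch using pandas (the color column is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
print(pd.get_dummies(df, columns=["color"]))
# Produces binary columns color_blue, color_green, color_red --
# one per category, with no implied ordering between them
```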
What does a linear Support Vector Machine essentially compute to separate two classes?
Explanation: A linear SVM finds the hyperplane that maximizes the margin between classes. It does not construct a decision forest—that's associated with random forests. SVMs do not output probability distributions by default, nor do they use recursive averages to separate data.
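A minimal sketch with scikit-learn's SVC on hypothetical, linearly separable points:

```python
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [1, 0], [5, 5], [6, 5], [5, 6]]
y = [0, 0, 0, 1, 1, 1]

svm = SVC(kernel="linear").fit(X, y)
print(svm.coef_, svm.intercept_)  # parameters of the maximum-margin hyperplane
print(svm.support_vectors_)       # the boundary points that define the margin
```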
In model training, why do we use a loss function?
Explanation: The loss function quantifies the difference between predicted and actual values, guiding the training process. Generating subsets of data refers to techniques like bootstrapping; encoding target variables is done via encoding schemes; partitioning data into clusters relates to clustering algorithms, not loss functions.
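As a worked example, mean squared error, one common loss for regression, computed on made-up values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.5, 2.0])

# Mean squared error: average of squared differences between prediction and truth
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # (0.25 + 0.25 + 0.0) / 3 = 0.1666...
```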
Why do practitioners use k-fold cross-validation when developing a machine learning model?
Explanation: K-fold cross-validation splits the data into k folds and rotates through them, so every example is used for validation exactly once; averaging the per-fold scores gives a more robust performance estimate. It does not eliminate preprocessing, reduce model parameters, or enforce feature orthogonality, which are unrelated processes.
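A minimal sketch with scikit-learn (the iris dataset and logistic regression are used purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5 folds: each fold serves as the validation set exactly once
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())  # per-fold accuracy and the more robust average
```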
What is a potential problem when applying label encoding to a categorical feature such as color: {red, green, blue}?
Explanation: Label encoding assigns integers to each category, possibly implying an order where none exists. This can mislead algorithms sensitive to numerical order. Encoding does not always improve training speed, nor does it guarantee accuracy. It does not transform numbers to text; it works in the opposite direction.
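A short demonstration with scikit-learn's LabelEncoder:

```python
from sklearn.preprocessing import LabelEncoder

colors = ["red", "green", "blue", "green"]
encoded = LabelEncoder().fit_transform(colors)
print(encoded)  # [2, 1, 0, 1] -- implies blue < green < red,
                # an ordering that does not exist for colors
```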
Why is it important to use a separate test set when evaluating your machine learning model?
Explanation: A test set simulates how the model will perform on future, real-world data. Its main purpose is unbiased evaluation. It does not reduce memory usage or tune hyperparameters, and splitting uses up some data for testing rather than increasing training data.
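A minimal sketch of a held-out evaluation with scikit-learn (the dataset and split ratio are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Hold out 20% of the data; the model never sees it during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)
print(model.score(X_test, y_test))  # unbiased estimate of real-world performance
```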
What does each step of the gradient descent algorithm attempt to accomplish during model training?
Explanation: Gradient descent updates model parameters to reduce the loss function, guiding the model toward better predictions. Its goal is not to encourage overfitting; randomness in feature selection is unrelated. Batch size changes affect computational efficiency, not the purpose of gradient descent steps.
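A bare-bones sketch of gradient descent on a one-parameter least-squares problem (the data are made up):

```python
import numpy as np

# Fit y = w * x by minimizing mean squared error with gradient descent
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])  # roughly y = 2x

w, lr = 0.0, 0.01
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)  # derivative of the loss w.r.t. w
    w -= lr * grad                       # step opposite the gradient: loss decreases
print(w)  # converges near 2.0
```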
Which statement best describes the interpretability of decision tree models?
Explanation: Decision trees are known for their clear, rule-based structure that is easily visualized. They do not require feature scaling like some algorithms do. Decision trees can be used for both classification and regression tasks and can handle categorical, not just binary, features.
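For example, scikit-learn can print a fitted tree as plain if/else rules:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# The learned model reads as a small set of human-readable threshold rules
print(export_text(tree, feature_names=load_iris().feature_names))
```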
What is the main advantage of using ensemble methods like bagging or boosting in machine learning?
Explanation: Ensemble methods aggregate results from different models to achieve better performance. They do not guarantee perfect accuracy and may even take longer to train due to model combination. Ensembles work with varying dataset sizes, not just small ones.
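A quick comparison sketch, a single tree versus a bagged ensemble of trees, on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

single = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
forest = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
print(single, forest)  # the bagged ensemble usually scores higher than one tree
```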
What fundamental assumption does the Naive Bayes classifier make about features?
Explanation: Naive Bayes assumes that each feature is conditionally independent of the others, given the class label. It does not require features to share a scale, does not depend on linear separability, and is not restricted to categorical variables.
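A minimal sketch with Gaussian Naive Bayes (the iris dataset is used purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)

# Per class, each feature is modeled with its own independent Gaussian;
# the joint likelihood is simply the product of per-feature likelihoods.
print(nb.predict(X[:3]), nb.predict_proba(X[:3]).round(3))
```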
If a model predicts 90 out of 100 test labels correctly, what is the classification accuracy?
Explanation: Accuracy is calculated as the number of correct predictions divided by the total number of predictions, yielding 90%. Ten percent and nine percent are incorrect, and accuracy cannot exceed 100%, so 180% is invalid.
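The arithmetic, spelled out:

```python
# Accuracy = correct predictions / total predictions
correct, total = 90, 100
print(correct / total)  # 0.9, i.e. 90%
```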
What does the confusion matrix help you to analyze in a classification problem?
Explanation: A confusion matrix breaks down predictions by true and predicted class, showing both correct and misclassified instances. Feature correlation and loss function details are not visualized here, nor does it inform on training duration.
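A small example with scikit-learn (the labels are made up):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows = true class, columns = predicted class:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))  # [[3, 1], [1, 3]] here
```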
In binary classification, what does the ROC (Receiver Operating Characteristic) curve plot?
Explanation: The ROC curve demonstrates the trade-off between the true positive rate and false positive rate at various threshold settings. It does not compare training versus test accuracy, feature importance versus overfitting, nor does it plot loss value over training epochs.
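A minimal sketch using scikit-learn's roc_curve on made-up scores:

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]  # model's predicted probabilities for class 1

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(fpr, tpr)                         # one (FPR, TPR) point per threshold
print(roc_auc_score(y_true, y_scores))  # area under the curve summarizes them
```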
What can result from using information from the test set during model training?
Explanation: Including test data information in training causes data leakage, making performance evaluation unreliable. Faster feature encoding, feature independence, and model capacity are not direct consequences of this issue.
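A sketch of the leaky pattern versus the correct one (random data, purely illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = np.random.rand(100, 3), np.random.randint(0, 2, 100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Leaky: fitting the scaler on ALL data lets test-set statistics
# influence training:
#   X_scaled = StandardScaler().fit_transform(X)  # then split -> leakage

# Correct: fit preprocessing on the training set only
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)
```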
What is the primary goal in clustering problems such as grouping customers by purchase behavior?
Explanation: Clustering identifies naturally occurring groups among data points without prior labels. Assigning ground truth classes describes classification, maximizing overfitting is not a goal, and clustering is an unsupervised technique, not supervised.
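A minimal sketch with k-means (the customer features are hypothetical):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer features: [avg_order_value, orders_per_month]
X = np.array([[20, 1], [25, 2], [22, 1], [200, 8], [210, 9], [190, 7]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # group membership discovered without any labels
```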