Machine Learning Fundamentals: Common Interview Problem Quiz

Challenge your understanding of core machine learning concepts with these scenario-driven, interview-style questions. Ideal for interview preparation: each question reviews a key algorithm, model concept, or essential piece of machine learning terminology.

  1. Supervised vs Unsupervised Learning

    Which type of machine learning requires labeled data to train the model on input-output mappings?

    1. Supervised learning
    2. Unsupervised learning
    3. Reinforcement learning
    4. Transfer learning

    Explanation: Supervised learning relies on labeled data, meaning each input has a corresponding correct output, helping the model learn the relationship. Unsupervised learning does not use labeled outputs, instead grouping or clustering data based solely on input similarity. Reinforcement learning involves decision-making with rewards or penalties, not explicit labels per example. Transfer learning is a technique for leveraging a model trained on one task for another and is not itself a type of learning based on data labeling.
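
    A minimal code sketch of the distinction follows (scikit-learn is used purely for illustration; the quiz does not prescribe any library, and the toy data is made up): a supervised fit consumes both inputs and labels, while an unsupervised fit consumes inputs only.

      # Supervised fit uses (X, y); unsupervised fit uses X alone.
      from sklearn.linear_model import LogisticRegression
      from sklearn.cluster import KMeans

      X = [[0.0], [1.0], [2.0], [3.0]]   # inputs (toy 1-D feature)
      y = [0, 0, 1, 1]                   # labels, available only in the supervised case

      supervised = LogisticRegression().fit(X, y)             # learns an input-output mapping
      unsupervised = KMeans(n_clusters=2, n_init=10).fit(X)   # groups inputs by similarity only

      print(supervised.predict([[1.5]]))   # predicted label for a new input
      print(unsupervised.labels_)          # cluster assignments, no ground truth used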

  2. Classification vs Regression

    If your task is to predict the price of a house based on its features, which type of machine learning problem is this?

    1. Regression
    2. Classification
    3. Clustering
    4. Association

    Explanation: Predicting a continuous numerical value, such as a house price, is a regression problem. Classification would involve assigning discrete labels, like whether a house is luxury or standard. Clustering divides items into groups without labels, and association is about finding rules that describe large portions of your data, none of which fit this scenario.
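
    A minimal sketch of the regression framing, using scikit-learn with invented house data (square footage predicting price; both the library choice and the numbers are illustrative assumptions):

      # Regression: predict a continuous value (price) from features (square footage).
      from sklearn.linear_model import LinearRegression

      X = [[900], [1200], [1500], [2000]]     # feature: square footage (toy values)
      y = [150000, 190000, 240000, 310000]    # target: price, a continuous number

      model = LinearRegression().fit(X, y)
      print(model.predict([[1700]]))   # outputs a continuous price estimate, not a class label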

  3. Overfitting Concept

    What is it called when a model performs well on training data but poorly on unseen test data?

    1. Overfitting
    2. Underfitting
    3. Generalization
    4. Normalization

    Explanation: Overfitting occurs when a model memorizes the training data, including its noise, leading to poor performance on new data. Underfitting means a model has not learned enough from the data to make accurate predictions. Generalization is the model’s ability to perform well on unseen data, so it describes the opposite of the problem in question. Normalization refers to scaling input features and is a preprocessing step, not a description of the train-test performance gap.
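
    A small sketch of how overfitting typically shows up in practice: an overly flexible model scores far higher on its own training data than on held-out data. The scikit-learn calls and the synthetic dataset below are illustrative assumptions, not part of the quiz.

      # Overfitting symptom: a large gap between training and test accuracy.
      from sklearn.datasets import make_classification
      from sklearn.model_selection import train_test_split
      from sklearn.tree import DecisionTreeClassifier

      X, y = make_classification(n_samples=300, n_features=20, random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

      tree = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_tr, y_tr)
      print("train accuracy:", tree.score(X_tr, y_tr))   # usually ~1.0 (memorized)
      print("test accuracy: ", tree.score(X_te, y_te))   # typically noticeably lower on unseen data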

  4. Bias-Variance Tradeoff

    In machine learning, what tradeoff describes how increasing a model's complexity can reduce bias but increase variance?

    1. Bias-variance tradeoff
    2. Hyperparameter grid
    3. Batch normalization
    4. Gradient checking

    Explanation: The bias-variance tradeoff explains that more complex models may capture patterns better (lower bias) but can become overly sensitive to noise (higher variance). A hyperparameter grid is the set of setting combinations explored during grid search, batch normalization normalizes layer activations in neural networks, and gradient checking validates gradient computations; none of these refer to this tradeoff.
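
    One way to see the tradeoff, sketched with NumPy polynomial fits on noisy samples of a sine curve (the degrees, noise level, and random seed are arbitrary choices for illustration): a low-degree fit underfits (high bias), while a very high-degree fit tends to chase noise (high variance).

      # Bias-variance intuition: sweep model complexity (polynomial degree).
      import numpy as np

      rng = np.random.default_rng(0)
      x = np.linspace(0, 1, 30)
      y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)   # noisy samples

      x_grid = np.linspace(0, 1, 200)
      y_clean = np.sin(2 * np.pi * x_grid)                             # noise-free target

      for degree in (1, 4, 12):
          coeffs = np.polyfit(x, y, degree)                  # fit polynomial of this degree
          train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
          true_err = np.mean((np.polyval(coeffs, x_grid) - y_clean) ** 2)
          # Training error keeps falling as degree grows; error against the true curve
          # typically falls, then rises again once the fit starts tracking noise.
          print(degree, round(train_err, 3), round(true_err, 3))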

  5. K-Nearest Neighbors Function

    Which algorithm classifies new examples based on the majority class among the k closest training samples?

    1. K-Nearest Neighbors
    2. Decision Tree
    3. Naive Bayes
    4. Principal Component Analysis

    Explanation: K-Nearest Neighbors (KNN) classifies data points by looking at the labels of the nearest points and choosing the majority class. Decision trees make splits based on feature values, Naive Bayes uses Bayes' theorem with independence assumptions, and Principal Component Analysis is a dimensionality reduction method, not a classifier.
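
    A minimal KNN sketch with scikit-learn (an illustrative choice of library; k=3 and the iris dataset are arbitrary assumptions made only for the example):

      # KNN: predict by majority vote among the k nearest training points.
      from sklearn.datasets import load_iris
      from sklearn.neighbors import KNeighborsClassifier

      X, y = load_iris(return_X_y=True)
      knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

      # The prediction for a point is the majority class of its 3 closest training samples.
      print(knn.predict(X[:1]))                              # predicted class
      print(knn.kneighbors(X[:1], return_distance=False))    # indices of the 3 nearest neighbors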

  6. Feature Scaling Reasoning

    Why do some machine learning algorithms require feature scaling before training?

    1. To ensure all features contribute equally to distance calculations
    2. To increase the feature space dimensionality
    3. To introduce randomness into the dataset
    4. To automatically remove duplicate records

    Explanation: Feature scaling ensures features with larger numeric ranges do not dominate calculations, especially in algorithms like KNN or SVM that use distance metrics. Increasing feature space dimensionality is unrelated to feature scaling. Introducing randomness and removing duplicates are distinct preprocessing steps not associated with feature scaling.
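
    A short sketch of why scaling matters for distance-based models: without scaling, the feature measured in tens of thousands dominates the Euclidean distance. The numbers and the StandardScaler choice below are illustrative assumptions.

      # Feature scaling: put features on comparable ranges so no single feature
      # dominates distance computations.
      import numpy as np
      from sklearn.preprocessing import StandardScaler

      X = np.array([[1.0, 50000.0],
                    [2.0, 51000.0],
                    [3.0, 49000.0]])   # the second feature has a much larger numeric range

      print(np.linalg.norm(X[0] - X[1]))            # distance dominated by the second feature

      X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance per column
      print(np.linalg.norm(X_scaled[0] - X_scaled[1]))   # both features now contribute comparably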

  7. One-Hot Encoding Purpose

    What is the main purpose of one-hot encoding in categorical feature processing?

    1. To convert categorical variables into a format that can be used by machine learning algorithms
    2. To lower the computational cost of training
    3. To create non-linear relationships between features
    4. To ensure each variable has equal variance

    Explanation: One-hot encoding creates binary columns for each category, enabling algorithms to interpret categorical data. It does not necessarily lower computational cost and may actually increase dimensionality. One-hot encoding doesn't introduce non-linearity or ensure equal variance among variables.
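
    A minimal one-hot encoding sketch with pandas (one common tool for this; the color column is a toy example, not from the quiz):

      # One-hot encoding: one binary indicator column per category, no implied ordering.
      import pandas as pd

      df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
      print(pd.get_dummies(df, columns=["color"]))
      # Produces color_blue, color_green, color_red indicator columns,
      # which numeric algorithms can consume directly.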

  8. Support Vector Machine Output

    What does a linear Support Vector Machine essentially compute to separate two classes?

    1. A maximum-margin hyperplane
    2. A decision forest
    3. A probability distribution
    4. A recursive average

    Explanation: A linear SVM finds the hyperplane that maximizes the margin between classes. It does not construct a decision forest—that's associated with random forests. SVMs do not output probability distributions by default, nor do they use recursive averages to separate data.
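
    A small sketch of a linear SVM; the learned hyperplane is exposed through its coefficients and intercept. The scikit-learn calls and synthetic data are illustrative assumptions.

      # Linear SVM: learns a maximum-margin hyperplane w·x + b = 0.
      from sklearn.datasets import make_classification
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=200, n_features=4, random_state=0)
      svm = SVC(kernel="linear").fit(X, y)

      print(svm.coef_)          # w: normal vector of the separating hyperplane
      print(svm.intercept_)     # b: offset of the hyperplane
      print(svm.predict(X[:5])) # class labels, not probabilities, by default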

  9. Loss Function Role

    In model training, why do we use a loss function?

    1. To measure how well the model predicts the target values
    2. To generate random subsets of data
    3. To encode target variables
    4. To partition data into clusters

    Explanation: The loss function quantifies the difference between predicted and actual values, guiding the training process. Generating subsets of data refers to techniques like bootstrapping; encoding target variables is done via encoding schemes; partitioning data into clusters relates to clustering algorithms, not loss functions.
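
    A tiny sketch of one loss computation, mean squared error, done by hand to show that it simply measures the gap between predictions and targets (the arrays are made up):

      # Loss function: quantify the gap between predictions and true targets.
      import numpy as np

      y_true = np.array([3.0, 5.0, 2.5])
      y_pred = np.array([2.5, 5.0, 4.0])

      mse = np.mean((y_true - y_pred) ** 2)   # mean squared error
      print(mse)   # 0.8333...; training tries to drive this number down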

  10. Cross-Validation Utility

    Why do practitioners use k-fold cross-validation when developing a machine learning model?

    1. To obtain a more reliable estimate of model performance
    2. To eliminate the need for preprocessing
    3. To reduce the number of model parameters
    4. To ensure features are orthogonal

    Explanation: K-fold cross-validation splits data into folds, training and validating the model on all subsets, leading to a more robust performance estimate. It does not eliminate preprocessing, reduce model parameters, or enforce feature orthogonality, which are unrelated processes.
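
    A minimal 5-fold cross-validation sketch with scikit-learn (the logistic regression model, the iris dataset, and k=5 are all arbitrary choices made for illustration):

      # k-fold cross-validation: average the score over k train/validation splits.
      from sklearn.datasets import load_iris
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      X, y = load_iris(return_X_y=True)
      scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

      print(scores)          # one accuracy value per fold
      print(scores.mean())   # a more reliable single estimate than one train/test split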

  11. Label Encoding Caveat

    What is a potential problem when applying label encoding to a categorical feature such as color: {red, green, blue}?

    1. Algorithms may mistakenly treat encoded values as ordered
    2. Encoding always decreases training time
    3. It guarantees higher prediction accuracy
    4. It converts numerical features to text

    Explanation: Label encoding assigns integers to each category, possibly implying an order where none exists. This can mislead algorithms sensitive to numerical order. Encoding does not always improve training speed, nor does it guarantee accuracy. It does not transform numbers to text; it works in the opposite direction.
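
    A short sketch of the caveat: the integers produced by label encoding follow an arbitrary (here alphabetical) order that a numeric model may treat as meaningful. Using scikit-learn's LabelEncoder here is an illustrative choice.

      # Label encoding assigns arbitrary integers, which can imply a false ordering.
      from sklearn.preprocessing import LabelEncoder

      colors = ["red", "green", "blue", "green"]
      encoded = LabelEncoder().fit_transform(colors)

      print(encoded)   # [2 1 0 1]: blue=0, green=1, red=2 (alphabetical, not meaningful)
      # A distance-based or linear model would now treat red as "greater than" blue,
      # which is why one-hot encoding is usually safer for nominal categories.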

  12. Train-Test Split Importance

    Why is it important to use a separate test set when evaluating your machine learning model?

    1. To evaluate model performance on unseen data
    2. To save memory
    3. To automatically tune hyperparameters
    4. To increase the size of the training set

    Explanation: A test set simulates how the model will perform on future, real-world data. Its main purpose is unbiased evaluation. It does not reduce memory usage or tune hyperparameters, and splitting uses up some data for testing rather than increasing training data.
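
    A minimal hold-out split sketch (the 80/20 ratio, the fixed random seed, the dataset, and the model below are illustrative conventions, not requirements):

      # Hold out a test set so evaluation happens on data the model never saw.
      from sklearn.datasets import load_iris
      from sklearn.model_selection import train_test_split
      from sklearn.tree import DecisionTreeClassifier

      X, y = load_iris(return_X_y=True)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

      model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)   # fit on training data only
      print(model.score(X_te, y_te))   # evaluation on data that was never used for fitting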

  13. Gradient Descent Step

    What does each step of the gradient descent algorithm attempt to accomplish during model training?

    1. Decrease the loss function value
    2. Increase model overfitting
    3. Randomly select feature variables
    4. Increase the batch size

    Explanation: Gradient descent updates model parameters to reduce the loss function, guiding the model toward better predictions. Its goal is not to encourage overfitting; randomness in feature selection is unrelated. Batch size changes affect computational efficiency, not the purpose of gradient descent steps.
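
    A hand-rolled gradient descent sketch for a one-parameter least-squares problem; each step moves the parameter against the gradient of the loss. The learning rate, iteration count, and toy data are arbitrary choices for illustration.

      # Gradient descent: repeatedly step parameters downhill on the loss surface.
      import numpy as np

      x = np.array([1.0, 2.0, 3.0, 4.0])
      y = 3.0 * x            # true relationship y = 3x; we try to recover w = 3

      w = 0.0                # initial guess
      lr = 0.01              # learning rate
      for _ in range(200):
          grad = np.mean(2 * (w * x - y) * x)   # d/dw of the mean squared error
          w -= lr * grad                        # step in the direction that lowers the loss

      print(w)   # converges toward 3.0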

  14. Decision Tree Property

    Which statement best describes the interpretability of decision tree models?

    1. They are generally easy to interpret and visualize
    2. They always require feature scaling
    3. They cannot be used for regression tasks
    4. They only work with binary features

    Explanation: Decision trees are known for their clear, rule-based structure that is easily visualized. They do not require feature scaling like some algorithms do. Decision trees can be used for both classification and regression tasks and can handle categorical, not just binary, features.
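
    A short sketch of the interpretability claim: scikit-learn can print a fitted tree as plain if/else rules (the iris dataset and the shallow depth are chosen only to keep the printout readable):

      # Decision trees can be read directly as human-readable rules.
      from sklearn.datasets import load_iris
      from sklearn.tree import DecisionTreeClassifier, export_text

      data = load_iris()
      tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

      # Prints nested "petal width (cm) <= ..." style rules, one line per split.
      print(export_text(tree, feature_names=list(data.feature_names)))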

  15. Ensemble Methods Purpose

    What is the main advantage of using ensemble methods like bagging or boosting in machine learning?

    1. They combine multiple models to improve prediction accuracy
    2. They guarantee 100% accuracy on all datasets
    3. They completely eliminate training time
    4. They are only suitable for small datasets

    Explanation: Ensemble methods aggregate results from different models to achieve better performance. They do not guarantee perfect accuracy and may even take longer to train due to model combination. Ensembles work with varying dataset sizes, not just small ones.
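
    A brief sketch comparing a single decision tree with a bagged ensemble of trees (a random forest); the ensemble's cross-validated score is usually, though not always, higher. The dataset and settings are illustrative assumptions.

      # Ensembles: many models voting together usually beat one model alone.
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score
      from sklearn.tree import DecisionTreeClassifier

      X, y = make_classification(n_samples=500, n_features=20, random_state=0)

      single = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
      forest = cross_val_score(
          RandomForestClassifier(n_estimators=100, random_state=0), X, y, cv=5
      ).mean()

      print("single tree:  ", single)
      print("random forest:", forest)   # typically higher, but not guaranteed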

  16. Naive Bayes Assumption

    What fundamental assumption does the Naive Bayes classifier make about features?

    1. All features are conditionally independent given the class label
    2. Features must be on the same scale
    3. The dataset is linearly separable
    4. Features are always categorical

    Explanation: Naive Bayes assumes that the presence or absence of a feature is independent of others, given the output class. It does not require features to share scale, does not depend on linear separability, and is not restricted to categorical variables.
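
    A minimal Gaussian Naive Bayes sketch; internally the model combines per-feature likelihoods as a product, which is exactly where the conditional-independence assumption enters. The dataset and scikit-learn usage are illustrative.

      # Naive Bayes: per-feature likelihoods are combined as if features were
      # conditionally independent given the class.
      from sklearn.datasets import load_iris
      from sklearn.naive_bayes import GaussianNB

      X, y = load_iris(return_X_y=True)
      nb = GaussianNB().fit(X, y)

      print(nb.theta_[0])            # per-feature means for class 0, estimated independently
      print(nb.predict_proba(X[:1])) # class probabilities built from the product of likelihoods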

  17. Evaluating Classification Accuracy

    If a model predicts 90 out of 100 test labels correctly, what is the classification accuracy?

    1. 90%
    2. 10%
    3. 180%
    4. 9%

    Explanation: Accuracy is the number of correct predictions divided by the total number of predictions: 90 / 100 = 0.90, or 90%. Ten percent is the error rate rather than the accuracy, nine percent does not follow from these counts, and accuracy cannot exceed 100%, so 180% is invalid.
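
    The arithmetic, spelled out, along with the equivalent scikit-learn call on toy label arrays (shown only for completeness; the arrays are invented):

      # Accuracy = correct predictions / total predictions.
      correct, total = 90, 100
      print(correct / total)       # 0.9 -> 90%

      # Equivalent computation with sklearn on toy labels:
      from sklearn.metrics import accuracy_score
      y_true = [1] * 100
      y_pred = [1] * 90 + [0] * 10             # 90 correct, 10 wrong
      print(accuracy_score(y_true, y_pred))    # 0.9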

  18. Confusion Matrix Meaning

    What does the confusion matrix help you to analyze in a classification problem?

    1. The number of correct and incorrect predictions for each class
    2. The correlation between features
    3. The underlying loss function
    4. The training time required

    Explanation: A confusion matrix breaks down predictions by true and predicted class, showing both correct and misclassified instances. Feature correlation and loss function details are not visualized here, nor does it inform on training duration.
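
    A small sketch of reading a confusion matrix on toy binary labels (the label vectors are made up; scikit-learn's row/column convention is used):

      # Confusion matrix: rows = true class, columns = predicted class (sklearn layout).
      from sklearn.metrics import confusion_matrix

      y_true = [0, 0, 0, 1, 1, 1, 1]
      y_pred = [0, 1, 0, 1, 1, 0, 1]

      print(confusion_matrix(y_true, y_pred))
      # [[2 1]     2 true negatives, 1 false positive
      #  [1 3]]    1 false negative, 3 true positives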

  19. ROC Curve Purpose

    In binary classification, what does the ROC (Receiver Operating Characteristic) curve plot?

    1. True positive rate against false positive rate
    2. Training accuracy versus test accuracy
    3. Feature importance versus overfitting
    4. Loss value against number of epochs

    Explanation: The ROC curve demonstrates the trade-off between the true positive rate and false positive rate at various threshold settings. It does not compare training versus test accuracy, feature importance versus overfitting, nor does it plot loss value over training epochs.
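
    A short sketch: given true labels and predicted scores, roc_curve returns the false positive rate and true positive rate at each threshold (the scores here are invented, and scikit-learn is an illustrative choice):

      # ROC curve: true positive rate vs false positive rate across thresholds.
      from sklearn.metrics import roc_auc_score, roc_curve

      y_true = [0, 0, 1, 1]
      scores = [0.1, 0.4, 0.35, 0.8]   # model's predicted probability of class 1

      fpr, tpr, thresholds = roc_curve(y_true, scores)
      print(fpr)    # false positive rate at each threshold
      print(tpr)    # true positive rate at each threshold
      print(roc_auc_score(y_true, scores))   # area under the curve (0.75 here)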

  20. Data Leakage Issue

    What can result from using information from the test set during model training?

    1. Data leakage, leading to inflated performance estimates
    2. Faster feature encoding
    3. Improved feature independence
    4. Reduced model capacity

    Explanation: Including test data information in training causes data leakage, making performance evaluation unreliable. Faster feature encoding, feature independence, and model capacity are not direct consequences of this issue.

  21. Clustering Objective

    What is the primary goal in clustering problems such as grouping customers by purchase behavior?

    1. To find natural groupings (clusters) whose members are similar to each other and different from other groups
    2. To label each item with ground truth classes
    3. To maximize overfitting
    4. To train in a supervised manner

    Explanation: Clustering identifies naturally occurring groups among data points without prior labels. Assigning ground truth classes describes classification, maximizing overfitting is not a goal, and clustering is an unsupervised technique, not supervised.
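
    A minimal clustering sketch with k-means on synthetic "customer" data; note that no labels are passed to fit. The number of clusters, the dataset, and the scikit-learn usage are illustrative assumptions.

      # Clustering: group points by similarity, with no ground-truth labels involved.
      from sklearn.cluster import KMeans
      from sklearn.datasets import make_blobs

      X, _ = make_blobs(n_samples=300, centers=3, random_state=0)   # generated labels are ignored

      kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)   # fit on X only
      print(kmeans.labels_[:10])       # cluster assignment per point
      print(kmeans.cluster_centers_)   # one centroid per discovered group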