Explore 15 essential math concepts and problem-solving skills frequently tested in machine learning interviews. This easy-level quiz covers topics like statistics, linear algebra, probability, calculus, and their applications in fundamental AI and machine learning problems.
Consider the data set [2, 4, 4, 8, 100]. What is the median value of this set?
Explanation: The median is the middle value after sorting the set, which is 4 in this case. 8 and 100 are higher values but do not fall in the center position. 24 is not present in the data. 100 is the largest number and is not the median.
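As a quick check, the median can be computed by sorting and taking the middle element. The sketch below uses only Python's standard library; the variable names are illustrative.

```python
import statistics

data = [2, 4, 4, 8, 100]

# Sort the values, then take the middle one (the list has odd length).
ordered = sorted(data)
middle = ordered[len(ordered) // 2]

print(middle)                   # 4
print(statistics.median(data))  # 4
print(statistics.mean(data))    # 23.6, pulled upward by 100
```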
If the probability of an event A is 0.4, what is the probability that A does not occur?
Explanation: The probability that event A does not occur is 1 minus the probability that it does occur, so 1 - 0.4 = 0.6. 0.4 is the probability that A occurs, not that it does not. 1.4 exceeds 1, the largest value a probability can take. 0.2 does not relate to the given value.
Which statement best describes the standard deviation in a data set used for training a model?
Explanation: Standard deviation quantifies how much data points deviate from the mean value. The number of data points relates to the count, not standard deviation. Data being sorted is not about standard deviation. The most frequent value is called the mode.
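To make this concrete, here is a small sketch with two made-up data sets that share the same mean but differ in spread. The numbers are invented for illustration, and the population standard deviation (`statistics.pstdev`) is an arbitrary choice here.

```python
import statistics

# Two hypothetical data sets with the same mean but different spread.
tight = [9, 10, 10, 11]
spread = [0, 5, 15, 20]

print(statistics.mean(tight), statistics.mean(spread))  # 10 10

# Population standard deviation: root of the mean squared deviation.
print(statistics.pstdev(tight))   # about 0.71
print(statistics.pstdev(spread))  # about 7.91
```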
What is the dot product of two vectors [1, 2] and [3, 4]?
Explanation: The dot product multiplies matching elements and sums the results: (1*3) + (2*4) = 3 + 8 = 11. 14 comes from multiplying within each vector instead, (1*2) + (3*4). 21 comes from multiplying the element sums, (1+2) * (3+4). 10 comes from pairing cross-elements, (1*4) + (2*3).
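The same calculation in plain Python, as a minimal sketch (the names `u` and `v` are just labels for the two vectors):

```python
u = [1, 2]
v = [3, 4]

# Multiply matching elements, then sum the products.
dot = sum(a * b for a, b in zip(u, v))

print(dot)  # 11
```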
Why is feature normalization important before applying k-nearest neighbors to a dataset with age in years and income in dollars?
Explanation: Normalization puts features on the same scale so no single feature dominates the distance metric. Making the dataset larger is not a goal of normalization. Converting to integers is unnecessary for normalization. Changing the order is unrelated to normalization.
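The effect can be sketched with Euclidean distance and min-max scaling. The three people and the feature ranges below are assumptions made up for illustration; they are not part of the quiz.

```python
import math

# Assumed feature ranges, for illustration only.
AGE_RANGE = (18, 90)
INCOME_RANGE = (0, 200_000)

def scale(person):
    """Min-max scale (age, income) into [0, 1] using the assumed ranges."""
    age, income = person
    return (
        (age - AGE_RANGE[0]) / (AGE_RANGE[1] - AGE_RANGE[0]),
        (income - INCOME_RANGE[0]) / (INCOME_RANGE[1] - INCOME_RANGE[0]),
    )

query = (25, 50_000)
older_same_income = (60, 50_000)  # 35 years older
same_age_richer = (25, 51_000)    # 1,000 dollars more

# Raw distances: the income gap dwarfs the age gap.
raw_older = math.dist(query, older_same_income)   # 35.0
raw_richer = math.dist(query, same_age_richer)    # 1000.0

# Scaled distances: the 35-year age gap is now the larger one.
scaled_older = math.dist(scale(query), scale(older_same_income))
scaled_richer = math.dist(scale(query), scale(same_age_richer))

print(raw_older, raw_richer)
print(scaled_older, scaled_richer)
```

On raw values, k-nearest neighbors would treat the person 35 years older as far closer than the one earning 1,000 dollars more; after scaling, the ordering flips.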
In gradient descent, what does the gradient represent at a point on the loss curve?
Explanation: The gradient points toward the steepest increase of the function, which gradient descent uses to move in the opposite direction and minimize loss. The midpoint relates to coordinates, not gradients. The average value is unrelated to gradients. The function's minimum value is found by moving against the gradient.
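A minimal sketch of this idea on a one-dimensional loss. The loss function, starting point, and learning rate are arbitrary choices for illustration.

```python
def loss(w):
    # Hypothetical loss with its minimum at w = 3.
    return (w - 3) ** 2

def grad(w):
    # Derivative of the loss: positive where the loss rises as w grows.
    return 2 * (w - 3)

w = 0.0
learning_rate = 0.1

for _ in range(100):
    # Step against the gradient to reduce the loss.
    w -= learning_rate * grad(w)

print(w, loss(w))  # w ends up very close to 3, loss close to 0
```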
Given a matrix A = [[1, 2], [3, 4]], what is the transpose of A?
Explanation: Transposing a matrix swaps rows with columns, so the result is [[1, 3], [2, 4]]. Keeping the matrix unchanged, as in the second option, is not a transpose. The third and fourth options reverse the entries, which is not the definition of transpose.
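In plain Python, one common way to transpose a nested list is `zip(*A)`, shown here as a short sketch:

```python
A = [[1, 2], [3, 4]]

# zip(*A) groups the first elements of each row, then the second, and so on,
# so each column of A becomes a row of the result.
A_T = [list(col) for col in zip(*A)]

print(A_T)  # [[1, 3], [2, 4]]
```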
A model reaches 98% accuracy on training data but only 50% on test data. What is most likely happening?
Explanation: High accuracy on training but poor test accuracy usually means the model memorizes training data and doesn't generalize, indicating overfitting. Underfitting would show poor performance on both sets. A perfectly fitted model would also perform well on the test set, and random guessing would not reach high training accuracy.
If P(A) = 0.3, P(B) = 0.5, and P(A and B) = 0.15, what is P(A|B)?
Explanation: Conditional probability P(A|B) = P(A and B)/P(B) = 0.15/0.5 = 0.3. The option 0.5 is P(B), not the conditional probability. 0.15 is P(A and B), not P(A|B). The result equals P(A) because A and B are independent here: P(A) * P(B) = 0.3 * 0.5 = 0.15 = P(A and B).
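The formula can be checked numerically. This sketch uses the values from the question and `math.isclose` to avoid relying on exact floating-point equality.

```python
import math

p_a = 0.3
p_b = 0.5
p_a_and_b = 0.15

# Definition of conditional probability: P(A|B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b

print(p_a_given_b)

# P(A|B) equals P(A) exactly when A and B are independent.
print(math.isclose(p_a * p_b, p_a_and_b))  # True
```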
Which function outputs 1 if the input is positive and 0 if the input is negative or zero?
Explanation: The Heaviside step function jumps from 0 to 1 when the input crosses zero. The sigmoid outputs values between 0 and 1 but never reaches exactly 0 or 1. ReLU outputs 0 for negative inputs but outputs the exact value for positives. Tanh outputs values from -1 to 1.
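A side-by-side sketch of the four functions. The step function below follows the question's convention of returning 0 at an input of exactly zero; other conventions for that point exist.

```python
import math

def heaviside(x):
    # 1 for positive input, 0 for negative or zero (the question's convention).
    return 1 if x > 0 else 0

def sigmoid(x):
    # Smooth curve with outputs strictly between 0 and 1 for these inputs.
    return 1 / (1 + math.exp(-x))

def relu(x):
    # 0 for negative input, the input itself for positive input.
    return max(0.0, x)

for x in (-2.0, 0.0, 2.0):
    print(x, heaviside(x), round(sigmoid(x), 3), relu(x), round(math.tanh(x), 3))
```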
How does an outlier, such as the value 999 in the set [1, 2, 3, 4, 999], most affect the mean and median?
Explanation: Outliers have a large effect on the mean because every value contributes to the average, while the median depends only on the middle of the sorted data and is more resistant. Here the mean is 201.8 while the median is 3. The second and third options are false because the mean is more sensitive to a single large value. The last option is incorrect as both are at least somewhat affected.
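Comparing the set with and without the outlier shows the difference in sensitivity. This is a small sketch using the standard library:

```python
import statistics

clean = [1, 2, 3, 4]
with_outlier = [1, 2, 3, 4, 999]

# The mean jumps by roughly 200; the median moves by only 0.5.
print(statistics.mean(clean), statistics.mean(with_outlier))      # 2.5 201.8
print(statistics.median(clean), statistics.median(with_outlier))  # 2.5 3
```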
In linear algebra, what do you call a nonzero vector that is only scaled by a scalar factor, not rotated off its line, when a matrix is applied to it?
Explanation: An eigenvector is a nonzero vector that a specific matrix maps to a scalar multiple of itself; the scalar is the eigenvalue, and a negative eigenvalue flips the vector along the same line. The determinant is a scalar property of matrices. Trace is the sum of diagonal elements, not a vector. Norm measures the length of a vector.
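A small sketch with a hypothetical 2x2 matrix, chosen only because its eigenvectors are easy to read off. The helper `matvec` is an illustrative name, not a library function.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical matrix for illustration.
M = [[2, 1],
     [1, 2]]

print(matvec(M, [1, 1]))   # [3, 3]  -> 3 * [1, 1], eigenvalue 3
print(matvec(M, [1, -1]))  # [1, -1] -> 1 * [1, -1], eigenvalue 1
print(matvec(M, [1, 0]))   # [2, 1]  -> not a multiple of [1, 0]
```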
Which of the following is best described as a discrete random variable in a machine learning context?
Explanation: A discrete random variable can take only distinct, separate values, such as a count of spam messages. Temperature can take continuous values, so it's not discrete. Blood pressure and time are also continuous variables and can have fractional values.
Which concept allows updating the probability estimate for a hypothesis as more evidence is observed?
Explanation: Bayes' Theorem is used to update the probability of a hypothesis in light of new evidence. The Pythagorean Theorem is related to geometry, not probabilities. Linear regression is used for prediction, not for updating probabilities. K-means is a clustering method.
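A worked sketch of one Bayesian update. The spam-filter scenario and all three probabilities are made-up numbers for illustration.

```python
import math

# Hypothetical numbers, for illustration only.
p_spam = 0.2              # prior: P(spam)
p_word_given_spam = 0.6   # likelihood: P(word | spam)
p_word_given_ham = 0.05   # likelihood: P(word | not spam)

# Total probability of seeing the word in any message.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(spam | word) = P(word | spam) * P(spam) / P(word).
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(p_spam_given_word)  # about 0.75, up from the prior of 0.2
```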
What is the primary mathematical motivation for adding a regularization term to a machine learning model's loss function?
Explanation: Regularization discourages excessively complex models by adding a penalty to the loss, helping prevent overfitting. Increasing model size is not the goal. Driving training error to zero tends to cause overfitting rather than prevent it. Slowing computation is not a motivation.
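As a sketch of how the penalty works, the example below adds an L2 (squared-weight) term to a loss. L2 is one common choice and is assumed here; the weights, data-loss values, and the strength `lam` are invented for illustration.

```python
def l2_penalty(weights, lam):
    # Penalty grows with the squared size of the weights.
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam):
    # Total loss = fit to the data + penalty on model complexity.
    return data_loss + l2_penalty(weights, lam)

lam = 0.01
small_weights = [0.5, -0.3]
large_weights = [12.0, -9.0]

# Assume the large-weight model fits the training data slightly better.
loss_small = regularized_loss(0.20, small_weights, lam)
loss_large = regularized_loss(0.15, large_weights, lam)

print(loss_small, loss_large)  # the penalty outweighs the small gain in fit
```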
What does a correlation coefficient of -1 indicate about the relationship between two variables, X and Y?
Explanation: A correlation of -1 signifies that one variable increases as the other decreases in a perfect linear way. No relationship would show a correlation near zero. A perfect positive relationship is described by +1, not -1. 'Unrelated' is inaccurate for a -1 correlation.
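A short sketch that computes the Pearson correlation by hand for data lying exactly on a downward-sloping line. The data and the helper name `pearson` are illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [10 - 2 * v for v in x]  # y falls by 2 whenever x rises by 1

r = pearson(x, y)
print(r)
```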