Sharpen your understanding of key machine-learning concepts by exploring essential tools, syntax, and methods in Scikit-Learn and PyTorch. This quiz covers core fundamentals, data manipulation, model training, and evaluation techniques, making it ideal for anyone seeking practical ML proficiency.
Which Scikit-Learn object would you typically use for supervised classification tasks such as predicting whether a customer will buy a product?
Explanation: KNeighborsClassifier is designed for supervised classification, making it ideal for predicting categorical outcomes, such as customer purchase decisions. LinearRegression is for regression, predicting continuous values rather than categories. KMeans is used for unsupervised clustering, not classification. PCA is a dimensionality reduction tool rather than a classifier.
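A minimal sketch of how this might look in practice (the customer features and labels below are made up purely for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy feature matrix (e.g. age, visited_promo_page) and purchase labels
X = [[25, 1], [40, 0], [35, 1], [50, 0]]
y = [1, 0, 1, 0]

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)
print(clf.predict([[30, 1]]))  # predicted purchase decision for a new customer
```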
In Scikit-Learn, which common Python object represents the feature data (X) passed to fit methods?
Explanation: Scikit-Learn models expect the feature data X as a 2-dimensional array-like structure, typically a NumPy array (or something array-compatible such as a pandas DataFrame). Plain lists and tuples may be converted internally, but they lack the explicit shape and dtype guarantees that arrays provide. Dictionaries are not array-like and cannot serve as a feature matrix directly. Arrays provide the structured and efficient format needed for ML operations.
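For example, the feature matrix is conventionally a 2-D NumPy array with one row per sample and one column per feature (toy values below):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# 2-D NumPy array: rows are samples, columns are features
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression()
model.fit(X, y)  # X.shape == (4, 2), y.shape == (4,)
```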
What function creates a tensor filled with zeros of shape (3, 2) in PyTorch?
Explanation: torch.zeros(3, 2) is the correct function for creating a tensor of zeros with the desired dimensions in PyTorch. tensor.zeros(3, 2) and pt.zeros((3,2)) are not valid function calls. zeros(3,2) misses the required module prefix and thus is not recognized.
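In code:

```python
import torch

z = torch.zeros(3, 2)   # 3 rows, 2 columns, all zeros
print(z.shape)          # torch.Size([3, 2])
```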
Which Scikit-Learn function can be used to divide your dataset into training and testing subsets while maintaining randomization?
Explanation: The train_test_split function is specifically designed to split data into training and testing sets with optional randomization. split_data and divide_dataset are not Scikit-Learn functions. random_partition is not a recognized method for this purpose in Scikit-Learn.
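A typical call looks like this (an 80/20 split with a fixed random seed for reproducibility; the data here is synthetic):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```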
Which PyTorch class is commonly used to optimize neural network parameters during training with stochastic gradient descent?
Explanation: torch.optim.SGD is the standard PyTorch optimizer implementing stochastic gradient descent, commonly used to update parameters during training. torch.Fit is not a valid class. torch.optim is the module that contains the optimizers, not an optimizer itself. Adadelta is a different optimization algorithm, so it is not the answer for plain SGD.
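A rough sketch of a single SGD update step (the tiny linear model and random data are placeholders for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)                       # minimal stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 4)
target = torch.randn(8, 1)
loss = nn.MSELoss()(model(x), target)

optimizer.zero_grad()
loss.backward()
optimizer.step()                              # one SGD parameter update
```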
Which method of a Scikit-Learn model do you call to get predicted labels for new input data after fitting?
Explanation: The predict() method generates predicted labels based on the trained model and new input data. fit() is for training the model, not prediction. score() evaluates model performance but does not provide predictions. transform() is mainly used for data transformation, especially in preprocessing.
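Reusing the nearest-neighbors idea, prediction after fitting might look like this (toy one-feature data):

```python
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit([[0], [1], [2], [3]], [0, 0, 1, 1])

labels = clf.predict([[0.4], [2.6]])   # predicted labels for new inputs
print(labels)                          # [0 1]
```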
How do you ensure a PyTorch tensor supports automatic gradient calculation during training?
Explanation: Setting requires_grad = True on a tensor, either when it is created or afterwards, ensures that PyTorch tracks operations on it for gradient calculation. set_requires_grad(True) and activate_gradient() are not valid PyTorch syntax, and torch.enable_grad() is a global context manager rather than a way to enable tracking for a particular tensor. Only the requires_grad attribute controls this behavior per tensor.
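Both common ways of enabling tracking are shown below (random tensors, purely illustrative):

```python
import torch

# Enable gradient tracking at creation time...
w = torch.randn(3, requires_grad=True)

# ...or switch it on afterwards via the in-place method
v = torch.randn(3)
v.requires_grad_(True)

loss = (w * v).sum()
loss.backward()
print(w.grad)   # gradients are now populated
```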
If you are training a regression model in PyTorch, which loss function is most suitable for measuring prediction error?
Explanation: MSELoss (mean squared error loss) is ideal for regression tasks, measuring the average squared difference between predictions and actual values. CrossEntropyLoss and NegativeLogLoss are better suited for classification. CategoricalLoss is not a recognized PyTorch loss function.
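A small example of computing the mean squared error between predictions and targets (made-up numbers):

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()
predictions = torch.tensor([2.5, 0.0, 2.1])
targets = torch.tensor([3.0, -0.5, 2.0])

loss = criterion(predictions, targets)   # mean of squared differences
print(loss.item())
```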
In Scikit-Learn, what type of value is typically returned by the score() method of a classifier?
Explanation: The score() method for classifiers returns the accuracy, which is the proportion of correctly classified examples. Loss is measured by separate loss functions, feature importance requires a different method, and prediction probabilities are retrieved using predict_proba(), not score().
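Continuing the toy classifier idea from above, score() returns a float between 0 and 1:

```python
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit([[0], [1], [2], [3]], [0, 0, 1, 1])

accuracy = clf.score([[0], [3]], [0, 1])   # fraction of correct predictions
print(accuracy)                            # 1.0 here
```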
What operation would you use in PyTorch to change a tensor's shape from (10, 2, 2) to (10, 4)?
Explanation: reshape(10, 4) is the correct way to alter the tensor's shape, ensuring data order is preserved. flatten() would convert it to (40,) instead of (10,4). resize and expand do not serve the purpose here, and may lead to unexpected results or errors.
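In code:

```python
import torch

t = torch.randn(10, 2, 2)
flat = t.reshape(10, 4)   # keeps the 10 samples, merges the last two dims
print(flat.shape)         # torch.Size([10, 4])
```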
Which Scikit-Learn transformer would you use to scale features so they have zero mean and unit variance, often necessary before training?
Explanation: StandardScaler standardizes features by removing the mean and scaling to unit variance. Normalizer rescales individual samples but does not center data. MinMaxScaler scales features to a fixed range, usually 0 to 1, and MaxAbsScaler scales based on maximum absolute value. Only StandardScaler achieves zero mean and unit variance.
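A quick check on toy data confirms the zero-mean, unit-variance result:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.mean(axis=0))   # approximately 0 per column
print(X_scaled.std(axis=0))    # approximately 1 per column
```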
In the context of model training in PyTorch, what does the term 'batch size' refer to?
Explanation: Batch size indicates how many data samples are processed simultaneously before updating the model. The number of epochs refers to how many times the model sees the entire dataset. Neuron count and layer count relate to network architecture, not batch processing during training.
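In PyTorch the batch size is usually set on a DataLoader; here is a sketch with random placeholder data:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

features = torch.randn(100, 4)
labels = torch.randint(0, 2, (100,))

dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=16, shuffle=True)

for batch_x, batch_y in loader:
    # each iteration yields up to 16 samples; a parameter update
    # would normally happen once per batch inside this loop
    pass
```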
Which process in Scikit-Learn converts a categorical string feature like ['red', 'blue', 'green'] into a binary matrix?
Explanation: OneHotEncoder transforms categorical variables into a binary matrix, which is essential for algorithms requiring numeric input. LabelEncoder converts strings to integer labels but does not produce a binary matrix. Imputer handles missing values. Normalizer adjusts magnitude, not encoding.
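For example (using .toarray() to display the dense binary matrix, since the encoder returns a sparse matrix by default):

```python
from sklearn.preprocessing import OneHotEncoder

colors = [['red'], ['blue'], ['green']]
encoder = OneHotEncoder()
binary_matrix = encoder.fit_transform(colors).toarray()
print(encoder.categories_)   # categories found, in sorted order
print(binary_matrix)         # one 0/1 column per category
```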
How do you access the parameters (weights) of a neural network model in PyTorch for inspection or modification?
Explanation: model.parameters() provides an iterator over network parameters such as weights and biases, allowing inspection and modification (model.named_parameters() additionally yields each parameter's name). model.weights and model.vars() are not valid attributes of a PyTorch module. model.get_params() belongs to Scikit-Learn estimators, not to PyTorch.
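A short illustration with a throwaway two-layer model:

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

for param in model.parameters():
    print(param.shape, param.requires_grad)

# named_parameters() additionally exposes each parameter's name
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
```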
Which function helps you perform k-fold cross-validation in Scikit-Learn to assess your model's robustness?
Explanation: cross_val_score computes scores for your model using k-fold cross-validation, offering a measure of generalization. fit_predict is for clustering and is not associated with cross-validation. cross_fold_test and model_assess are not Scikit-Learn functions.
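A minimal sketch with random placeholder data (for a classifier, the default scoring is accuracy):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((50, 3))
y = rng.integers(0, 2, 50)

scores = cross_val_score(LogisticRegression(), X, y, cv=5)   # 5-fold CV
print(scores)          # one score per fold
print(scores.mean())   # overall cross-validated estimate
```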