Sharpen your understanding of key machine-learning concepts by exploring essential tools, syntax, and methods in Scikit-Learn and PyTorch. This quiz covers core fundamentals, data manipulation, model training, and evaluation techniques, making it ideal for anyone seeking practical ML proficiency.
Which Scikit-Learn object would you typically use for supervised classification tasks such as predicting whether a customer will buy a product?
Explanation: KNeighborsClassifier is designed for supervised classification, making it ideal for predicting categorical outcomes, such as customer purchase decisions. LinearRegression is for regression, predicting continuous values rather than categories. KMeans is used for unsupervised clustering, not classification. PCA is a dimensionality reduction tool rather than a classifier.
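A minimal sketch of how this might look in practice (the customer features and labels below are made up purely for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy feature matrix (e.g. age, visited_promo_page) and purchase labels
X = [[25, 1], [40, 0], [35, 1], [50, 0]]
y = [1, 0, 1, 0]

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)
print(clf.predict([[30, 1]]))  # predicted purchase decision for a new customer
```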
In Scikit-Learn, which common Python object represents the feature data (X) passed to fit methods?
Explanation: Scikit-Learn models expect the feature data X as a 2-dimensional array-like structure, typically a NumPy array (or something array-compatible such as a pandas DataFrame). Plain lists and tuples may be converted internally, but they lack the explicit shape and dtype guarantees that arrays provide. Dictionaries are not array-like and cannot serve as a feature matrix directly. Arrays provide the structured and efficient format needed for ML operations.
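For example, the feature matrix is conventionally a 2-D NumPy array with one row per sample and one column per feature (toy values below):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# 2-D NumPy array: rows are samples, columns are features
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression()
model.fit(X, y)  # X.shape == (4, 2), y.shape == (4,)
```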
What function creates a tensor filled with zeros of shape (3, 2) in PyTorch?
Explanation: torch.zeros(3, 2) is the correct function for creating a tensor of zeros with the desired dimensions in PyTorch. tensor.zeros(3, 2) and pt.zeros((3,2)) are not valid function calls. zeros(3,2) misses the required module prefix and thus is not recognized.
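In code:

```python
import torch

z = torch.zeros(3, 2)   # 3 rows, 2 columns, all zeros
print(z.shape)          # torch.Size([3, 2])
```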
Which Scikit-Learn function can be used to divide your dataset into training and testing subsets while maintaining randomization?
Explanation: The train_test_split function is specifically designed to split data into training and testing sets with optional randomization. split_data and divide_dataset are not Scikit-Learn functions. random_partition is not a recognized method for this purpose in Scikit-Learn.
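A typical call looks like this (an 80/20 split with a fixed random seed for reproducibility; the data here is synthetic):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```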
Which PyTorch class is commonly used to optimize neural network parameters during training with stochastic gradient descent?
Explanation: torch.optim.SGD is the standard PyTorch optimizer implementing stochastic gradient descent, commonly used to update parameters during training. torch.Fit is not a valid class. torch.optim is the module that contains the optimizers, not an optimizer itself. Adadelta is a different optimization algorithm, so it is not the answer for plain SGD.
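A rough sketch of a single SGD update step (the tiny linear model and random data are placeholders for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)                       # minimal stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 4)
target = torch.randn(8, 1)
loss = nn.MSELoss()(model(x), target)

optimizer.zero_grad()
loss.backward()
optimizer.step()                              # one SGD parameter update
```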
Which method of a Scikit-Learn model do you call to get predicted labels for new input data after fitting?
Explanation: The predict() method generates predicted labels based on the trained model and new input data. fit() is for training the model, not prediction. score() evaluates model performance but does not provide predictions. transform() is mainly used for data transformation, especially in preprocessing.
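Reusing the nearest-neighbors idea, prediction after fitting might look like this (toy one-feature data):

```python
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit([[0], [1], [2], [3]], [0, 0, 1, 1])

labels = clf.predict([[0.4], [2.6]])   # predicted labels for new inputs
print(labels)                          # [0 1]
```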
How do you ensure a PyTorch tensor supports automatic gradient calculation during training?
Explanation: Setting requires_grad = True on a tensor, either when it is created or afterwards, ensures that PyTorch tracks operations on it for gradient calculation. set_requires_grad(True) and activate_gradient() are not valid PyTorch syntax, and torch.enable_grad() is a global context manager rather than a way to enable tracking for a particular tensor. Only the requires_grad attribute controls this behavior per tensor.
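Both common ways of enabling tracking are shown below (random tensors, purely illustrative):

```python
import torch

# Enable gradient tracking at creation time...
w = torch.randn(3, requires_grad=True)

# ...or switch it on afterwards via the in-place method
v = torch.randn(3)
v.requires_grad_(True)

loss = (w * v).sum()
loss.backward()
print(w.grad)   # gradients are now populated
```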
If you are training a regression model in PyTorch, which loss function is most suitable for measuring prediction error?
Explanation: MSELoss (mean squared error loss) is ideal for regression tasks, measuring the average squared difference between predictions and actual values. CrossEntropyLoss and NegativeLogLoss are better suited for classification. CategoricalLoss is not a recognized PyTorch loss function.
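A small example of computing the mean squared error between predictions and targets (made-up numbers):

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()
predictions = torch.tensor([2.5, 0.0, 2.1])
targets = torch.tensor([3.0, -0.5, 2.0])

loss = criterion(predictions, targets)   # mean of squared differences
print(loss.item())
```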
In Scikit-Learn, what type of value is typically returned by the score() method of a classifier?
Explanation: The score() method for classifiers returns the accuracy, which is the proportion of correctly classified examples. Loss is measured by separate loss functions, feature importance requires a different method, and prediction probabilities are retrieved using predict_proba(), not score().
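Continuing the toy classifier idea from above, score() returns a float between 0 and 1:

```python
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit([[0], [1], [2], [3]], [0, 0, 1, 1])

accuracy = clf.score([[0], [3]], [0, 1])   # fraction of correct predictions
print(accuracy)                            # 1.0 here
```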
What operation would you use in PyTorch to change a tensor's shape from (10, 2, 2) to (10, 4)?
Explanation: reshape(10, 4) is the correct way to alter the tensor's shape, ensuring data order is preserved. flatten() would convert it to (40,) instead of (10,4). resize and expand do not serve the purpose here, and may lead to unexpected results or errors.
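In code:

```python
import torch

t = torch.randn(10, 2, 2)
flat = t.reshape(10, 4)   # keeps the 10 samples, merges the last two dims
print(flat.shape)         # torch.Size([10, 4])
```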
Which Scikit-Learn transformer would you use to scale features so they have zero mean and unit variance, often necessary before training?
Explanation: StandardScaler standardizes features by removing the mean and scaling to unit variance. Normalizer rescales individual samples but does not center data. MinMaxScaler scales features to a fixed range, usually 0 to 1, and MaxAbsScaler scales based on maximum absolute value. Only StandardScaler achieves zero mean and unit variance.
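A quick check on toy data confirms the zero-mean, unit-variance result:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.mean(axis=0))   # approximately 0 per column
print(X_scaled.std(axis=0))    # approximately 1 per column
```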
In the context of model training in PyTorch, what does the term 'batch size' refer to?
Explanation: Batch size indicates how many data samples are processed simultaneously before updating the model. The number of epochs refers to how many times the model sees the entire dataset. Neuron count and layer count relate to network architecture, not batch processing during training.
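In PyTorch the batch size is usually set on a DataLoader; here is a sketch with random placeholder data:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

features = torch.randn(100, 4)
labels = torch.randint(0, 2, (100,))

dataset = TensorDataset(features, labels)
loader = DataLoader(dataset, batch_size=16, shuffle=True)

for batch_x, batch_y in loader:
    # each iteration yields up to 16 samples; a parameter update
    # would normally happen once per batch inside this loop
    pass
```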
Which process in Scikit-Learn converts a categorical string feature like ['red', 'blue', 'green'] into a binary matrix?
Explanation: OneHotEncoder transforms categorical variables into a binary matrix, which is essential for algorithms requiring numeric input. LabelEncoder converts strings to integer labels but does not produce a binary matrix. Imputer handles missing values. Normalizer adjusts magnitude, not encoding.
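For example (using .toarray() to display the dense binary matrix, since the encoder returns a sparse matrix by default):

```python
from sklearn.preprocessing import OneHotEncoder

colors = [['red'], ['blue'], ['green']]
encoder = OneHotEncoder()
binary_matrix = encoder.fit_transform(colors).toarray()
print(encoder.categories_)   # categories found, in sorted order
print(binary_matrix)         # one 0/1 column per category
```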
How do you access the parameters (weights) of a neural network model in PyTorch for inspection or modification?
Explanation: model.parameters() provides an iterator over network parameters such as weights and biases, allowing inspection and modification (model.named_parameters() additionally yields each parameter's name). model.weights and model.vars() are not valid attributes of a PyTorch module. model.get_params() belongs to Scikit-Learn estimators, not to PyTorch.
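A short illustration with a throwaway two-layer model:

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

for param in model.parameters():
    print(param.shape, param.requires_grad)

# named_parameters() additionally exposes each parameter's name
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
```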
Which function helps you perform k-fold cross-validation in Scikit-Learn to assess your model's robustness?
Explanation: cross_val_score computes scores for your model using k-fold cross-validation, offering a measure of generalization. fit_predict is for clustering and is not associated with cross-validation. cross_fold_test and model_assess are not Scikit-Learn functions.
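A minimal sketch with random placeholder data (for a classifier, the default scoring is accuracy):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((50, 3))
y = rng.integers(0, 2, 50)

scores = cross_val_score(LogisticRegression(), X, y, cv=5)   # 5-fold CV
print(scores)          # one score per fold
print(scores.mean())   # overall cross-validated estimate
```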