Start QuizExplore core concepts of Convolutional Neural Networks (CNNs) in computer vision, covering layers, operations, and basic architecture principles to reinforce your foundational understanding of image processing with neural networks.
This quiz contains 10 questions. Below is a complete reference of the questions, correct answers, and explanations. You can use this section to review after taking the interactive quiz above.
In a typical Convolutional Neural Network for image classification, which type of layer is most commonly used to reduce the spatial dimensions of feature maps?
Correct answer: Pooling layer
Explanation: Pooling layers, such as max pooling or average pooling, are frequently used in CNNs to downsample or reduce the spatial dimensions of feature maps while retaining important information. While convolutional layers extract features, they usually do not reduce the dimensions as pooling does. Dense layers connect every neuron but do not manage spatial size, and dropout layers prevent overfitting without altering feature map dimensions.
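To make the downsampling concrete, here is a minimal sketch of a 2D pooling operation in NumPy (the function name `pool2d` and the toy 4×4 feature map are illustrative, not from any particular library):

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Downsample a 2D feature map by taking the max (or mean) of each window."""
    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(fmap).shape)  # (2, 2) -- spatial dimensions halved
```

A 2×2 window with stride 2 halves each spatial dimension while keeping the strongest activation in each region.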
When processing an image with CNNs, what is the main function of a kernel (or filter)?
Correct answer: To extract specific features like edges or textures from the input
Explanation: Kernels in convolutional layers move across the input image to detect local features such as edges and textures, enabling feature extraction for later processing. Randomly removing activations to reduce overfitting is the job of dropout, not kernels. Connecting neurons fully is handled by dense layers, while normalization is performed by other preprocessing steps, not the kernel itself.
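As a sketch of how a kernel extracts features, the hand-rolled convolution below (a plain "valid" cross-correlation, with an illustrative Sobel-like kernel) responds strongly exactly where the image has a vertical edge:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# An image with a vertical edge: dark left half, bright right half.
image = np.zeros((5, 5))
image[:, 3:] = 1.0

# A simple vertical-edge kernel: negative weights left, positive right.
kernel = np.array([[-1., 0., 1.],
                   [-1., 0., 1.],
                   [-1., 0., 1.]])
response = conv2d(image, kernel)
print(response)  # nonzero only in the columns covering the edge
```

In a trained CNN the kernel weights are learned rather than hand-designed, but the mechanism is the same sliding dot product.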
In the context of convolutions, what does increasing the stride parameter accomplish when sliding a kernel over an image?
Correct answer: It reduces the spatial size of the feature map
Explanation: A higher stride means the kernel moves more pixels at each step, resulting in a smaller feature map and less computational cost. Increasing the number of output channels is controlled by the number of filters, not stride. A deeper network is achieved by stacking more layers, and while stride changes coverage, it does not necessarily improve edge detection accuracy.
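The shrinking effect of stride follows from the standard output-size formula for a valid convolution, `floor((n - k) / s) + 1`, sketched here with illustrative sizes:

```python
# n = input size, k = kernel size, s = stride.
def out_size(n, k, s):
    return (n - k) // s + 1

print(out_size(32, 3, 1))  # 30: stride 1 barely shrinks the map
print(out_size(32, 3, 2))  # 15: stride 2 roughly halves it
```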
Which activation function is most commonly used after convolutional layers in modern CNN architectures due to its simplicity and effectiveness?
Correct answer: ReLU (Rectified Linear Unit)
Explanation: The ReLU activation function is widely favored in CNNs because it accelerates convergence and helps alleviate the vanishing gradient problem. Sigmoid and tanh can be used but may lead to slower training and saturation issues. Softmax is typically used only in the output layer for classification, not directly after convolutions.
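ReLU itself is a one-liner, which is part of its appeal; a minimal NumPy sketch:

```python
import numpy as np

def relu(x):
    """ReLU: passes positive values through, zeroes out negatives."""
    return np.maximum(0, x)

acts = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(acts))  # [0.  0.  0.  1.5 3. ]
```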
What is the primary role of padding in convolutional layers within CNNs?
Correct answer: To preserve the spatial dimensions of the input feature map
Explanation: Padding adds extra pixels (typically zeros) around the input feature map, ensuring that the spatial dimensions are preserved after convolution. It does not normalize pixel values, alter stride, or combine feature maps; those actions are handled by normalization, stride settings, or separate architectural techniques.
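For a k×k kernel with stride 1, "same" padding adds `(k - 1) // 2` zeros on each side; the sketch below (with an arbitrary 6×6 input) shows the output size coming back to the input size:

```python
import numpy as np

x = np.random.rand(6, 6)
k = 3
pad = (k - 1) // 2
x_padded = np.pad(x, pad)              # zero padding on all four sides
out_len = x_padded.shape[0] - k + 1    # valid-convolution output size
print(x_padded.shape)  # (8, 8)
print(out_len)         # 6 -- same as the input height/width
```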
Suppose a convolutional layer outputs several feature maps when processing a grayscale image; what do these feature maps represent?
Correct answer: Different learned patterns or features at that layer
Explanation: Each feature map corresponds to a filter detecting a specific pattern or feature in the input image, such as textures or shapes. They do not represent random noise, which is unrelated to learned feature maps. Multiple color channels are a property of the input, not the feature maps, and probability scores are produced only at the very end in the output layer.
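A sketch of the idea, assuming three hand-picked 3×3 filters standing in for learned ones: applying each filter to a single grayscale image yields one feature map per filter, stacked into a `(filters, height, width)` array.

```python
import numpy as np

# Three illustrative 3x3 filters, each "detecting" a different pattern.
filters = np.stack([
    np.array([[-1., 0., 1.]] * 3),       # vertical edges
    np.array([[-1., 0., 1.]] * 3).T,     # horizontal edges
    np.full((3, 3), 1.0 / 9.0),          # local average (blur)
])

image = np.random.rand(8, 8)             # one grayscale input
maps = []
for f in filters:
    out = np.empty((6, 6))
    for i in range(6):
        for j in range(6):
            out[i, j] = np.sum(image[i:i+3, j:j+3] * f)
    maps.append(out)

feature_maps = np.stack(maps)
print(feature_maps.shape)  # (3, 6, 6): one 6x6 map per filter
```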
During the transition from convolutional layers to fully-connected layers in a CNN, what does the flattening operation do?
Correct answer: It converts the multi-dimensional feature maps into a 1D vector
Explanation: Flattening takes the 2D or 3D output from convolutional and pooling layers and organizes it into a single long vector, suitable for input to fully-connected layers. It does not directly address overfitting, which is the role of dropout. Increasing map depth and non-linear transformations are handled by other layers.
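Flattening is just a reshape; with illustrative sizes (32 feature maps of 7×7, a common shape in small classifiers), the sketch below shows the resulting 1D vector:

```python
import numpy as np

# 32 feature maps of size 7x7 flatten to one 1568-element vector
# before entering the fully-connected layers.
conv_output = np.random.rand(32, 7, 7)
flat = conv_output.reshape(-1)
print(flat.shape)  # (1568,)
```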
Given a feature map, which pooling operation retains only the highest value from each region to downsample the map?
Correct answer: Max pooling
Explanation: Max pooling selects the highest value from each region covered by the pooling window, emphasizing the most prominent features. Min pooling, which is rarely used, would select the smallest value. Mean pooling takes the average, and soft pooling is a less common technique not typically used in basic CNNs.
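The per-region operations compared on one tiny 2×2 window (values chosen for illustration):

```python
import numpy as np

region = np.array([[1., 8.],
                   [3., 2.]])
print(region.max())   # max pooling keeps 8.0, the most prominent activation
print(region.mean())  # mean pooling gives 3.5
print(region.min())   # min pooling (rarely used) would give 1.0
```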
What does parameter sharing mean in the context of CNNs?
Correct answer: The same set of kernel weights is used across different spatial locations in the input
Explanation: Parameter sharing in CNNs means that each filter (kernel) uses the same weights as it moves across various positions of the input, making the model more efficient. Sharing weights between networks describes transfer learning or model sharing, not parameter sharing. Duplicating weights or averaging activations are not characteristic of parameter sharing.
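Parameter sharing is why convolutional layers are so compact; a back-of-the-envelope sketch for an illustrative 28×28 input:

```python
# One shared 3x3 kernel plus a bias, reused at every spatial position:
k = 3
conv_params = k * k + 1           # 10 parameters total

# A single fully-connected unit over the same 28x28 input:
dense_params = 28 * 28 + 1        # 785 parameters for one output

print(conv_params, dense_params)  # 10 785
```

Because the kernel's weights are reused everywhere, the count stays at 10 regardless of image size, while the dense layer's count grows with the input.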
When a CNN achieves high accuracy on training data but performs poorly on new, unseen images, what is this scenario commonly called?
Correct answer: Overfitting
Explanation: Overfitting occurs when a model learns training data too closely and fails to perform well on new data because it cannot generalize. Underfitting is when a model performs poorly on both training and test data. Generalization is the desired ability to do well on new data. Normalization is a preprocessing step to scale data and is unrelated to model performance on unseen data.