Convolutional Neural Networks Essentials for Computer Vision Quiz

Explore core concepts of Convolutional Neural Networks (CNNs) in computer vision, covering layers, operations, and basic architecture principles to reinforce your foundational understanding of image processing with neural networks.

  1. Basic CNN Architecture

    In a typical Convolutional Neural Network for image classification, which type of layer is most commonly used to reduce the spatial dimensions of feature maps?

    1. Dropout layer
    2. Dense layer
    3. Convolutional layer
    4. Pooling layer

    Explanation: Pooling layers, such as max pooling or average pooling, are the standard way to downsample feature maps in CNNs, reducing their spatial dimensions while retaining the most important information. Convolutional layers primarily extract features and, with stride 1 and appropriate padding, leave the spatial size unchanged. Dense layers connect every input to every output but play no role in managing spatial size, and dropout layers prevent overfitting without altering feature map dimensions.
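
    To see this downsampling concretely, here is a minimal PyTorch sketch (the 8x8 input size is an illustrative assumption):

    ```python
    import torch
    import torch.nn as nn

    # A batch holding one single-channel 8x8 feature map (random values).
    x = torch.randn(1, 1, 8, 8)

    # 2x2 max pooling with stride 2 halves each spatial dimension.
    pool = nn.MaxPool2d(kernel_size=2, stride=2)
    y = pool(x)

    print(x.shape)  # torch.Size([1, 1, 8, 8])
    print(y.shape)  # torch.Size([1, 1, 4, 4])
    ```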

  2. Kernel Function in CNNs

    When processing an image with CNNs, what is the main function of a kernel (or filter)?

    1. To extract specific features like edges or textures from the input
    2. To normalize pixel values across the image
    3. To connect every input neuron to every output neuron
    4. To reduce overfitting by randomly removing activations

    Explanation: Kernels in convolutional layers move across the input image to detect local features such as edges and textures, enabling feature extraction for later processing. Randomly removing activations to reduce overfitting is the job of dropout, not kernels. Connecting neurons fully is handled by dense layers, while normalization is performed by other preprocessing steps, not the kernel itself.
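
    As a rough illustration, the sketch below applies a hand-crafted Sobel-style kernel with PyTorch; in a real CNN the kernel weights are learned during training rather than fixed by hand:

    ```python
    import torch
    import torch.nn.functional as F

    # Hand-crafted vertical-edge (Sobel) kernel, shaped for conv2d:
    # (out_channels, in_channels, height, width).
    sobel_x = torch.tensor([[-1., 0., 1.],
                            [-2., 0., 2.],
                            [-1., 0., 1.]]).reshape(1, 1, 3, 3)

    # A toy 5x5 grayscale "image": dark left half, bright right half.
    img = torch.zeros(1, 1, 5, 5)
    img[..., 3:] = 1.0

    # The output responds strongly where the vertical edge sits.
    edges = F.conv2d(img, sobel_x)
    print(edges.squeeze())
    ```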

  3. Stride in Convolutions

    In the context of convolutions, what does increasing the stride parameter accomplish when sliding a kernel over an image?

    1. It makes the network deeper
    2. It increases the number of output channels
    3. It enhances the accuracy of edge detection
    4. It reduces the spatial size of the feature map

    Explanation: A higher stride means the kernel moves more pixels at each step, resulting in a smaller feature map and less computational cost. Increasing the number of output channels is controlled by the number of filters, not stride. A deeper network is achieved by stacking more layers, and while stride changes coverage, it does not necessarily improve edge detection accuracy.
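
    For a square input of width W, kernel size K, padding P, and stride S, the output width is floor((W - K + 2P) / S) + 1. A minimal PyTorch sketch (the 32x32 input is an illustrative assumption):

    ```python
    import torch
    import torch.nn as nn

    x = torch.randn(1, 1, 32, 32)  # one single-channel 32x32 input

    # Same 3x3 kernel and padding, two different strides.
    conv_s1 = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1)
    conv_s2 = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)

    print(conv_s1(x).shape)  # torch.Size([1, 1, 32, 32])
    print(conv_s2(x).shape)  # torch.Size([1, 1, 16, 16]): stride 2 halves the map
    ```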

  4. Activation Functions for CNNs

    Which activation function is most commonly used after convolutional layers in modern CNN architectures due to its simplicity and effectiveness?

    1. Sigmoid
    2. Softmax
    3. Tanh
    4. ReLU (Rectified Linear Unit)

    Explanation: The ReLU activation function is widely favored in CNNs because it accelerates convergence and helps alleviate the vanishing gradient problem. Sigmoid and tanh can be used but may lead to slower training and saturation issues. Softmax is typically used only in the output layer for classification, not directly after convolutions.
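
    A minimal sketch of ReLU's behavior, which is simply max(0, x), using PyTorch:

    ```python
    import torch
    import torch.nn as nn

    relu = nn.ReLU()
    x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

    # Negative activations become 0; positive ones pass through unchanged.
    print(relu(x))  # tensor([0.0000, 0.0000, 0.0000, 1.5000, 3.0000])
    ```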

  5. Role of Padding in Convolutions

    What is the primary role of padding in convolutional layers within CNNs?

    1. To preserve the spatial dimensions of the input feature map
    2. To combine multiple feature maps into one
    3. To scale pixel values between 0 and 1
    4. To increase the stride of the convolution

    Explanation: Padding adds extra pixels (typically zeros) around the border of the input feature map; with an appropriately chosen amount, often called "same" padding, the spatial dimensions are preserved after convolution. Padding does not normalize pixel values, alter the stride, or combine feature maps; those effects come from normalization layers, stride settings, or separate architectural techniques.
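
    A minimal PyTorch sketch contrasting no padding with "same" padding for a 3x3 kernel (the 28x28 input size is an illustrative assumption):

    ```python
    import torch
    import torch.nn as nn

    x = torch.randn(1, 1, 28, 28)

    # Without padding, a 3x3 kernel shrinks each spatial side by 2.
    no_pad = nn.Conv2d(1, 1, kernel_size=3, padding=0)
    # One pixel of zero padding ("same" padding for a 3x3 kernel)
    # preserves the spatial size.
    same_pad = nn.Conv2d(1, 1, kernel_size=3, padding=1)

    print(no_pad(x).shape)    # torch.Size([1, 1, 26, 26])
    print(same_pad(x).shape)  # torch.Size([1, 1, 28, 28])
    ```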

  6. Interpreting Feature Maps

    Suppose a convolutional layer outputs several feature maps when processing a grayscale image; what do these feature maps represent?

    1. Final probability scores for classes
    2. Different learned patterns or features at that layer
    3. Random noise added during training
    4. Multiple color channels of the input image

    Explanation: Each feature map corresponds to a filter detecting a specific pattern or feature in the input image, such as textures or shapes. They do not represent random noise, which is unrelated to learned feature maps. Multiple color channels are a property of the input, not the feature maps, and probability scores are produced only at the very end in the output layer.
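
    A minimal PyTorch sketch (the choice of 16 filters and a 28x28 input is illustrative):

    ```python
    import torch
    import torch.nn as nn

    # One grayscale input channel, 16 learned filters.
    conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
    img = torch.randn(1, 1, 28, 28)

    # One feature map per filter, each responding to a different learned pattern.
    maps = conv(img)
    print(maps.shape)  # torch.Size([1, 16, 28, 28])
    ```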

  7. Flattening in CNNs

    During the transition from convolutional layers to fully-connected layers in a CNN, what does the flattening operation do?

    1. It increases the depth of the feature maps
    2. It converts the multi-dimensional feature maps into a 1D vector
    3. It performs non-linear transformations
    4. It reduces overfitting by dropouts

    Explanation: Flattening takes the 2D or 3D output from convolutional and pooling layers and organizes it into a single long vector, suitable for input to fully-connected layers. It does not directly address overfitting, which is the role of dropout. Increasing map depth and non-linear transformations are handled by other layers.
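
    A minimal PyTorch sketch (the 32x4x4 shape and the 10-class output are illustrative assumptions):

    ```python
    import torch
    import torch.nn as nn

    # Output of a final pooling stage: 32 feature maps of size 4x4.
    x = torch.randn(1, 32, 4, 4)

    flat = nn.Flatten()(x)   # -> (1, 32 * 4 * 4) = (1, 512)
    fc = nn.Linear(512, 10)  # fully-connected layer over the flat vector

    print(flat.shape)      # torch.Size([1, 512])
    print(fc(flat).shape)  # torch.Size([1, 10])
    ```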

  8. Pooling Layer Types

    Given a feature map, which pooling operation retains only the highest value from each region to downsample the map?

    1. Min pooling
    2. Max pooling
    3. Soft pooling
    4. Mean pooling

    Explanation: Max pooling selects the highest value from each region covered by the pooling window, emphasizing the most prominent features. Min pooling, which is rarely used, would select the smallest value. Mean pooling takes the average, and soft pooling is a less common technique not typically used in basic CNNs.
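
    A small PyTorch sketch contrasting max and average pooling on a hand-written 4x4 map (the values are illustrative):

    ```python
    import torch
    import torch.nn as nn

    x = torch.tensor([[1., 9., 2., 0.],
                      [3., 4., 1., 5.],
                      [0., 2., 8., 1.],
                      [6., 1., 0., 2.]]).reshape(1, 1, 4, 4)

    # Max pooling keeps the largest value in each 2x2 region.
    print(nn.MaxPool2d(2)(x))  # [[9., 5.], [6., 8.]]
    # Average (mean) pooling averages each region instead.
    print(nn.AvgPool2d(2)(x))  # [[4.25, 2.00], [2.25, 2.75]]
    ```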

  9. Parameter Sharing in CNNs

    What does parameter sharing mean in the context of CNNs?

    1. Sharing model weights between different networks
    2. The same set of kernel weights is used across different spatial locations in the input
    3. Averaging the activations of multiple layers
    4. Duplicating all weights throughout the network

    Explanation: Parameter sharing in CNNs means that each filter (kernel) uses the same weights as it moves across various positions of the input, making the model more efficient. Sharing weights between networks describes transfer learning or model sharing, not parameter sharing. Duplicating weights or averaging activations are not characteristic of parameter sharing.
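
    A minimal PyTorch sketch comparing parameter counts (the layer sizes are illustrative assumptions):

    ```python
    import torch.nn as nn

    # A 3x3 conv from 1 to 16 channels reuses the same 3x3 weights at every
    # spatial position: 16 * (3*3*1) weights + 16 biases = 160 parameters.
    conv = nn.Conv2d(1, 16, kernel_size=3)
    print(sum(p.numel() for p in conv.parameters()))  # 160

    # A dense layer from a flattened 28x28 image to 16 units needs a separate
    # weight per input pixel: 28*28*16 weights + 16 biases = 12560 parameters.
    dense = nn.Linear(28 * 28, 16)
    print(sum(p.numel() for p in dense.parameters()))  # 12560
    ```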

  10. Overfitting in CNNs

    When a CNN achieves high accuracy on training data but performs poorly on new, unseen images, what is this scenario commonly called?

    1. Generalization
    2. Normalization
    3. Overfitting
    4. Underfitting

    Explanation: Overfitting occurs when a model learns training data too closely and fails to perform well on new data because it cannot generalize. Underfitting is when a model performs poorly on both training and test data. Generalization is the desired ability to do well on new data. Normalization is a preprocessing step to scale data and is unrelated to model performance on unseen data.