Transformers Tokenization and Fine-Tuning Fundamentals Quiz

Assess your understanding of tokenization techniques and fine-tuning strategies in modern transformer models. This quiz features key concepts, useful practices, and common terminology in natural language processing, helping you grasp essential steps for effective model customization.

  1. Tokenization Basics

    What is the main role of tokenization when preparing text data for a transformer-based model?

    1. To find synonyms for each word
    2. To translate text into another language
    3. To sort text alphabetically
    4. To split text into smaller units such as words or subwords

    Explanation: Tokenization breaks down text into units that models can process, such as words or subwords, enabling effective embedding and analysis. Translating or sorting text does not create meaningful input for language models. Finding synonyms changes the meaning rather than preparing data for input.
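The splitting step can be sketched with a minimal regex-based tokenizer (a simplification: real tokenizers such as WordPiece or BPE would further split rare words into subwords):

```python
import re

def simple_tokenize(text):
    # Split text into lowercase word and punctuation tokens; this is the
    # "smaller units" step, before any subword handling or id mapping.
    return re.findall(r"\w+|[^\w\s]", text.lower())

tokens = simple_tokenize("Transformers process tokens, not raw text!")
```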

  2. Special Tokens in Tokenization

    Why are special tokens like [CLS] and [SEP] added during tokenization in transformer models?

    1. To correct grammar errors
    2. To help with sentence classification and separation
    3. To increase word frequency
    4. To randomize input order

    Explanation: Special tokens such as [CLS] and [SEP] help models recognize sentence boundaries or identify the start of sequences, crucial for various tasks. They are not used for grammatical corrections or altering word frequencies. Randomizing inputs would disrupt model understanding.
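A BERT-style wrapping of one or two token lists can be sketched as follows (a toy illustration of where [CLS] and [SEP] go, not a real tokenizer call):

```python
def add_special_tokens(tokens_a, tokens_b=None):
    # [CLS] marks the sequence start (its final hidden state is often used
    # for classification); [SEP] terminates and separates segments.
    sequence = ["[CLS]"] + tokens_a + ["[SEP]"]
    if tokens_b is not None:
        sequence += tokens_b + ["[SEP]"]
    return sequence
```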

  3. Types of Tokenizers

    Which tokenizer type most effectively manages rare or unknown words by splitting them into subword units?

    1. Byte Pair Encoding tokenizer
    2. Whole word tokenizer
    3. Rule-based tokenizer
    4. Character-case tokenizer

    Explanation: Byte Pair Encoding (BPE) tokenizers decompose rare words into frequent subwords, allowing better handling of unfamiliar vocabulary. Rule-based tokenizers typically split on whitespace or punctuation and may not handle rare words efficiently. Whole word tokenizers don't split words, while character-case tokenizers are not standard in this context.
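The subword idea can be sketched with a greedy longest-match segmenter (this is closer to WordPiece than to true BPE, which applies learned merge rules, but both decompose a rare word into vocabulary subwords; the `##` continuation prefix and the tiny vocabulary here are illustrative assumptions):

```python
def subword_split(word, vocab):
    # Greedily match the longest known piece at each position; non-initial
    # pieces carry a "##" continuation prefix, WordPiece-style.
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                break
            end -= 1
        if end == start:          # no piece matched: whole word is unknown
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces

vocab = {"token", "##ization", "##izer"}
```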

  4. Padding and Attention Masks

    When batching input sequences of different lengths, what is the purpose of padding?

    1. To make all sequences the same length for efficient parallel processing
    2. To select the most important tokens for attention
    3. To hide sensitive personal information
    4. To accelerate tokenization

    Explanation: Padding ensures that all input sequences in a batch are of equal length, facilitating efficient computation. It does not serve privacy or speed up tokenization tasks directly. Attention selection is managed through separate mechanisms, not padding.
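Padding and the accompanying attention mask can be sketched together (a minimal version of what library methods like a tokenizer's `padding=True` option produce; `pad_id=0` is an assumption matching many vocabularies):

```python
def pad_batch(sequences, pad_id=0):
    # Pad every id sequence to the batch maximum; the attention mask marks
    # real tokens (1) vs. padding (0) so the model ignores the filler.
    max_len = max(len(s) for s in sequences)
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in sequences]
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return input_ids, attention_mask
```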

  5. Fine-Tuning Objective

    In the context of fine-tuning, what does it mean to customize a pre-trained transformer on a new dataset?

    1. To remove training data to reduce overfitting
    2. To only change the tokenizer
    3. To build a new model architecture from scratch
    4. To adjust model weights based on task-specific labeled examples

    Explanation: Fine-tuning involves continuing to train a pre-trained model with a new, task-specific dataset so its weights adapt to the targeted application. Creating a new architecture is not fine-tuning, adjusting only the tokenizer is insufficient, and removing training data typically hurts performance rather than constituting fine-tuning.
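The "adjust weights from labeled examples" idea can be sketched with one gradient step on a single-feature logistic head (a deliberately tiny stand-in for the full training loop a library trainer would run over the whole model):

```python
import math

def fine_tune_step(w, b, x, y, lr=0.1):
    # One cross-entropy gradient step: the weight and bias move toward
    # predicting the task-specific label y. Fine-tuning repeats such steps
    # over a labeled dataset, starting from pre-trained weights.
    p = 1 / (1 + math.exp(-(w * x + b)))   # predicted probability
    grad = p - y                            # dLoss/dlogit for cross-entropy
    return w - lr * grad * x, b - lr * grad
```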

  6. Transfer Learning Benefit

    What is one main benefit of using a pre-trained transformer model before fine-tuning on your own text data?

    1. It predicts future training labels automatically
    2. It removes the need for tokenization
    3. It requires less data to achieve good results
    4. It guarantees perfect accuracy

    Explanation: Leveraging a pre-trained model allows the use of less task-specific data because the model has already learned general language patterns. Perfect accuracy is never guaranteed, tokenization is still required, and the model does not predict labels for future data without training.

  7. Truncation in Tokenization

    If an input sentence is longer than the model's maximum sequence length, what is typically done during tokenization?

    1. The sentence is truncated to fit the allowed length
    2. The sentence is reversed
    3. The sentence is discarded from the dataset
    4. Additional padding is added to exceed the limit

    Explanation: Truncating longer sequences ensures consistency with the model's input size limits. Adding padding beyond this limit is not supported, and discarding data reduces dataset size unnecessarily. Reversing sequences is not a standard practice in tokenization.
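Truncation itself is a simple slice; the sketch below assumes BERT-style inputs, where room is usually reserved for the special tokens:

```python
def truncate(token_ids, max_length):
    # Keep at most max_length ids; with [CLS]/[SEP] one would typically
    # truncate to max_length - 2 before adding the special tokens.
    return token_ids[:max_length]
```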

  8. Importance of Label Mapping

    During fine-tuning for classification, why is it important to map category names to integer labels?

    1. Because models process numbers, not text, as output labels
    2. Because mapping decreases the number of tokens needed
    3. Because names help the optimizer adjust weights faster
    4. Because mapping makes the data longer

    Explanation: Most models expect integer-encoded labels for classification tasks. Mapping neither lengthens the data nor reduces the token count, and optimizer speed is not influenced by the label names themselves.
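A deterministic label mapping (and its inverse, for decoding predictions back to names) can be sketched as:

```python
def build_label_maps(category_names):
    # Sort for a reproducible name -> integer assignment; models emit
    # integer class indices, so the inverse map decodes predictions.
    label2id = {name: i for i, name in enumerate(sorted(category_names))}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label
```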

  9. Freezing Layers During Fine-Tuning

    What does it mean to 'freeze' some layers of a transformer model during fine-tuning?

    1. To prevent their weights from updating during training
    2. To convert floating-point weights to integers
    3. To increase their training speed
    4. To change their position in the model

    Explanation: Freezing layers means their parameters do not change, preserving previously learned representations and sometimes reducing computational cost. It does not refer to numerical type changes or physical relocation in architecture. Increasing speed is a possible effect, not the definition.
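Freezing can be sketched with a toy trainable flag (the `Layer` class and name prefixes are illustrative; in PyTorch the equivalent is setting `requires_grad = False` on the pre-trained parameters):

```python
class Layer:
    def __init__(self, name):
        self.name, self.trainable = name, True

def freeze(layers, prefix):
    # Mark matching layers non-trainable; an optimizer that respects the
    # flag skips their weight updates, preserving learned representations.
    for layer in layers:
        if layer.name.startswith(prefix):
            layer.trainable = False
```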

  10. Handling Unknown Words

    When a tokenizer encounters a completely unknown word not in its vocabulary, what typically happens?

    1. It replaces it with a random word from the vocabulary
    2. It automatically learns the word's meaning
    3. It deletes the word from the input
    4. It assigns a special unknown token to that word

    Explanation: Unknown tokens signal the model that a part of the input was not represented in the vocabulary. Replacing, deleting, or auto-learning meanings are not standard in tokenization, as complete context or semantics would be lost or misrepresented.
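The fallback can be sketched as a dictionary lookup with a default (the id 100 for [UNK] mirrors BERT's vocabulary, but is an assumption here):

```python
def convert_tokens_to_ids(tokens, vocab, unk_id=100):
    # Map each token to its vocabulary id, falling back to the [UNK] id
    # for any token not present in the vocabulary.
    return [vocab.get(t, unk_id) for t in tokens]
```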