
Fundamentals of Large Language Model Techniques — Questions & Answers

Test your understanding of essential concepts and techniques in Large Language Models, including tokenization, efficient fine-tuning, decoding strategies, temperature settings, and masked language modeling. This quiz is designed for those seeking to grasp the basics of LLMs and their optimization in natural language processing applications.

This quiz contains 10 questions. Below is a complete reference of all questions, answer choices, and correct answers, which you can use for review after taking the quiz.

  1. Question 1: Tokenization Basics

    What does tokenization accomplish in the context of large language models?

    • It compresses images for faster processing.
    • It breaks text into smaller units called tokens such as words or subwords.
    • It stores entire documents as single data points.
    • It translates text directly into another language.

    Correct answer: It breaks text into smaller units called tokens such as words or subwords.

    Explanation: Tokenization is the process of splitting text into smaller units (tokens), which may be words, subwords, or characters, making it possible for LLMs to process and understand the input. Translating text, storing documents, and compressing images are unrelated tasks and do not fulfill the specific goal of turning language into model-compatible sequences.

  2. Question 2: Tokenization Example

    Given the word 'tokenization', how might a typical tokenization method process it?

    • 'tokenzation'
    • 'tokenization' as a whole
    • 'to', 'ken', 'iza', 'tion'
    • 'token' and 'ization'

    Correct answer: 'token' and 'ization'

    Explanation: Tokenization can split the word 'tokenization' into subwords like 'token' and 'ization', which helps manage rare words and keeps the vocabulary small. Using the entire word as one token is possible but less flexible. Splitting into arbitrarily small chunks like 'to', 'ken', 'iza', 'tion' is unlikely for most LLM approaches, and 'tokenzation' contains a typo and is not a valid split.
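    The subword split above can be sketched with a greedy longest-match tokenizer (WordPiece-style). The toy vocabulary here is an assumption for illustration, not any real model's vocabulary:

```python
# Toy greedy longest-match subword tokenizer (WordPiece-style sketch).
# The vocabulary below is made up for illustration only.
VOCAB = {"token", "ization", "iza", "tion", "to", "ken",
         "t", "o", "k", "e", "n", "i", "z", "a"}

def subword_tokenize(word):
    """Split a word into the longest vocabulary pieces, left to right."""
    pieces, start = [], 0
    while start < len(word):
        # Try the longest possible substring first, shrinking until a match.
        for end in range(len(word), start, -1):
            if word[start:end] in VOCAB:
                pieces.append(word[start:end])
                start = end
                break
        else:
            raise ValueError(f"cannot tokenize {word!r}")
    return pieces

print(subword_tokenize("tokenization"))  # ['token', 'ization']
```

    Because 'token' is the longest vocabulary entry starting at position 0, the greedy matcher never falls back to tiny pieces like 'to' or 'ken' unless nothing longer fits.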

  3. Question 3: LoRA Overview

    In the context of LLM fine-tuning, what does LoRA (Low-Rank Adaptation) primarily achieve?

    • Removes layers to reduce accuracy
    • Increases the model size by adding duplicated layers
    • Adds trainable parameters to existing layers without increasing overall model size
    • Changes raw text into tokens

    Correct answer: Adds trainable parameters to existing layers without increasing overall model size

    Explanation: LoRA freezes the original weights and injects small trainable low-rank matrices into existing layers, changing the model's behavior while adding only a tiny fraction of new parameters. It does not duplicate layers or remove them (which could lower performance), nor does it handle tokenization.
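    A minimal plain-Python sketch of the idea: the frozen weight W is combined with a low-rank product A @ B, and only A and B would be trained. Dimensions and values are made up for illustration; real LoRA operates on large transformer weight matrices:

```python
# Minimal LoRA sketch: frozen weight W plus a trainable low-rank update A @ B.

def matmul(X, Y):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_forward(x, W, A, B, alpha=1.0):
    """y = x @ (W + alpha * A @ B); W stays frozen, only A and B train."""
    delta = matmul(A, B)  # low-rank update; rank = inner dimension of A and B
    W_adapted = [[w + alpha * d for w, d in zip(wr, dr)]
                 for wr, dr in zip(W, delta)]
    return matmul(x, W_adapted)

# 2x2 frozen weight; rank-1 adapter: A is 2x1, B is 1x2 (4 new numbers vs 4 frozen).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5], [0.5]]
B = [[0.1, 0.2]]
x = [[1.0, 2.0]]
print(lora_forward(x, W, A, B))
```

    The rank-1 adapter here adds 4 trainable numbers next to the 4 frozen ones; at real model scale the ratio is far smaller, which is the point of the technique.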

  4. Question 4: QLoRA Function

    What is the main advantage of using QLoRA over standard LoRA?

    • It removes the need for tokenization.
    • It increases computation time by making the model larger.
    • It translates model outputs to new languages.
    • It further reduces memory usage by quantizing model weights to lower bits.

    Correct answer: It further reduces memory usage by quantizing model weights to lower bits.

    Explanation: QLoRA builds on LoRA by applying quantization, often to 4 bits, significantly reducing memory usage during training. It does not make the model larger or slower, nor is it related to translation or tokenization processes.
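    The memory saving comes from storing each frozen weight in far fewer bits. Below is a simplified sketch of symmetric 4-bit linear quantization; real QLoRA uses the NF4 data type plus double quantization, which this does not implement:

```python
# Simplified symmetric 4-bit quantization: one shared scale, integers in [-8, 7].

def quantize_4bit(weights):
    """Map floats to 4-bit integers with a single per-group scale factor."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the 4-bit integers."""
    return [qi * scale for qi in q]

weights = [0.42, -0.17, 0.93, -0.61]
q, scale = quantize_4bit(weights)
approx = dequantize(q, scale)
# Each weight now needs 4 bits instead of 32, at the cost of a small rounding error.
```

    During QLoRA training the frozen base weights are stored in this compressed form, while the small LoRA adapters remain in higher precision.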

  5. Question 5: Beam Search Concept

    Which statement best describes beam search in text generation for LLMs?

    • It ignores all but the least probable words.
    • It randomly selects next words without ranking.
    • It always picks only the highest-probability word at every step.
    • It keeps multiple top candidate sequences at each step to find the most likely output.

    Correct answer: It keeps multiple top candidate sequences at each step to find the most likely output.

    Explanation: Beam search maintains several hypotheses at each step, allowing the model to explore multiple promising sequences rather than just the single best local choice. Greedy decoding picks only the top word, making it less flexible. Random or lowest-probability selection does not characterize beam search.
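    A toy beam search over a hand-made next-token probability table (the table and beam width are assumptions for illustration):

```python
import math

# Toy next-token distributions: maps the last token to continuations and probabilities.
next_probs = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.9, "dog": 0.1},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
}

def beam_search(start, steps, beam_width=2):
    """Keep the beam_width highest log-probability sequences at each step."""
    beams = [([start], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, p in next_probs[seq[-1]].items():
                candidates.append((seq + [tok], score + math.log(p)))
        # Prune back down to the top beam_width sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

best_seq, best_score = beam_search("<s>", steps=3)[0]
```

    With a beam width of 2, the search keeps the "a ..." hypothesis alive even though "the" is locally more probable, and ends up with the higher-probability sentence overall (0.4 × 0.9 = 0.36 versus 0.6 × 0.5 = 0.30).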

  6. Question 6: Greedy vs. Beam Search

    How does greedy decoding differ from beam search during LLM output generation?

    • Greedy decoding chooses only the highest-scoring word at each step.
    • Greedy decoding uses temperature to control randomness.
    • Greedy decoding sorts outputs alphabetically.
    • Greedy decoding maintains several possible word sequences in parallel.

    Correct answer: Greedy decoding chooses only the highest-scoring word at each step.

    Explanation: Greedy decoding selects the most probable next token at each step, resulting in a single path. Only beam search keeps multiple paths. Sorting alphabetically or adjusting temperature are not defining characteristics of greedy decoding.
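    Greedy decoding is the degenerate case where only one path survives. A sketch over a toy probability table (the table is an assumption for illustration):

```python
# Greedy decoding sketch: always take the single most probable next token.
next_probs = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.55, "dog": 0.45},
    "a":   {"cat": 0.9, "dog": 0.1},
}

def greedy_decode(start, steps):
    """Extend the sequence by the argmax token at each step; no alternatives kept."""
    seq = [start]
    for _ in range(steps):
        options = next_probs[seq[-1]]
        seq.append(max(options, key=options.get))
    return seq

print(greedy_decode("<s>", 2))  # ['<s>', 'the', 'cat']
```

    Greedy decoding commits to "the" (probability 0.6) at the first step, even though a sequence starting with "a" could have a higher total probability; keeping multiple hypotheses is exactly what distinguishes beam search.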

  7. Question 7: Temperature Parameter

    What effect does increasing the temperature parameter in LLM text generation have?

    • It always chooses the most probable token, giving deterministic results.
    • It increases the diversity of possible outputs by making less likely tokens more probable.
    • It translates text into multiple languages automatically.
    • It decreases randomness, making text more repetitive.

    Correct answer: It increases the diversity of possible outputs by making less likely tokens more probable.

    Explanation: A higher temperature flattens the probability distribution, so unlikely tokens have a better chance of being selected, leading to more varied outputs. Lower temperature does the opposite by making the output more deterministic. Translation and repetition are not directly related to the temperature setting.
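    The flattening effect can be seen directly by dividing the logits by the temperature before the softmax. The logit values below are made up for illustration:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax; higher T flattens the result."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

    At temperature 0.2 nearly all probability mass sits on the top token, while at 2.0 the distribution is much flatter and unlikely tokens become viable. As the temperature approaches zero the distribution collapses onto the single most likely token, which is why very low settings produce the predictable, repetitive outputs described in Question 8.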

  8. Question 8: Low Temperature Behavior

    If the temperature parameter is set very close to zero during text generation, what is most likely to happen?

    • The model outputs are always random and surprising.
    • The model runs multiple decoding paths simultaneously.
    • The model splits all words into individual letters.
    • The model outputs become highly predictable and repetitive.

    Correct answer: The model outputs become highly predictable and repetitive.

    Explanation: A very low temperature increases the chance of picking the most likely tokens repeatedly, causing predictable and often repetitive responses. Maximum randomness comes from higher temperatures, not lower. Multiple decoding paths are a feature of beam search, not temperature, and splitting words into letters is a tokenization concern, not a temperature effect.

  9. Question 9: Masked Language Modeling Purpose

    What is the purpose of masked language modeling (MLM) during LLM pretraining?

    • To help models learn context by predicting missing tokens in a sentence.
    • To generate random text without context.
    • To translate masked text into different formats.
    • To reduce the number of model parameters.

    Correct answer: To help models learn context by predicting missing tokens in a sentence.

    Explanation: MLM builds contextual understanding by masking some tokens and training the model to predict them from the surrounding text, strengthening its grasp of semantics. Neither translation nor parameter reduction is the goal of MLM, and generating purely random text is unrelated.
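    A simplified sketch of how a masked training example is prepared (BERT-style masking; the 15% rate and the example sentence are illustrative, and real MLM pipelines also sometimes keep or randomize the selected positions rather than always masking them):

```python
import random

# Prepare an MLM example: hide some tokens and keep them as prediction targets.
MASK, MASK_PROB = "[MASK]", 0.15

def make_mlm_example(tokens, seed=1):
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < MASK_PROB:
            inputs.append(MASK)
            labels.append(tok)      # the model is trained to recover this token
        else:
            inputs.append(tok)
            labels.append(None)     # unmasked positions contribute no MLM loss
    return inputs, labels

tokens = "the cat sat on the mat".split()
inputs, labels = make_mlm_example(tokens)
```

    The model sees the `inputs` sequence with its gaps and is scored only on how well it predicts the hidden `labels`, forcing it to rely on the surrounding context.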

  10. Question 10: Masked vs. Unmasked Input

    Why does masking certain words in MLM improve LLM language understanding?

    • It ensures only common words are used in training.
    • It compresses sentence length for efficiency.
    • It forces the model to use context clues to infer missing information.
    • It reduces training time by skipping sentences.

    Correct answer: It forces the model to use context clues to infer missing information.

    Explanation: By masking words, the model must analyze the remaining context to correctly predict them, strengthening its grasp of language patterns. Skipping sentences, focusing only on common words, or shortening input do not contribute to contextual learning in the same targeted way.