Understanding LLMs
Which statement best describes a Large Language Model (LLM)?
- A handwritten set of language rules for translation.
- A simple algorithm for sorting text alphabetically.
- A small database of vocabulary and grammar rules.
- A large-scale neural network trained on vast text data to understand and generate human language.
- A collection of unrelated images and audio files.
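Background sketch: the snippet below, assuming the Hugging Face transformers library and using the small gpt2 checkpoint purely as an example, shows a pretrained language model generating a continuation of a prompt.

```python
# A minimal sketch of an LLM generating text, assuming the Hugging Face
# transformers library is installed; gpt2 is used only because it is small.
from transformers import pipeline

# Load a pretrained language model behind a text-generation pipeline.
generator = pipeline("text-generation", model="gpt2")

# The model continues the prompt, predicting one token at a time.
print(generator("A large language model is", max_new_tokens=20)[0]["generated_text"])
```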
Transformer Structure
Which of the following is a key component of the Transformer architecture commonly used in LLMs?
- Single-layer perceptron
- Tree-to-sequence mapping
- Random forest splitting
- Multi-head attention
- Convolutional pooling
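Background sketch: a minimal multi-head attention example using PyTorch's built-in nn.MultiheadAttention; the embedding size, head count, and sequence length are arbitrary illustrative choices, not values from any specific LLM.

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 64, 8
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# Self-attention: queries, keys, and values all come from the same sequence.
x = torch.randn(1, 10, embed_dim)           # (batch, sequence length, embedding)
output, attn_weights = mha(x, x, x)

print(output.shape)        # torch.Size([1, 10, 64])
print(attn_weights.shape)  # torch.Size([1, 10, 10]) -- averaged over heads by default
```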
Training Processes
What is the main difference between pretraining and fine-tuning in large language models (LLMs)?
- Fine-tuning always results in a larger model size.
- Pretraining uses only labeled data; fine-tuning uses unlabeled data.
- Pretraining trains on images; fine-tuning trains on audio.
- Pretraining learns general patterns; fine-tuning adapts to specific tasks.
- Pretraining occurs after fine-tuning has finished.
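Background sketch: assuming the Hugging Face transformers library, the snippet below reuses a pretrained bert-base-uncased checkpoint and attaches a fresh two-label classification head; the two-label task is a hypothetical example of what fine-tuning would then adapt the model to.

```python
# General pretraining, then task-specific fine-tuning, in two steps.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Step 1 (already done by others): bert-base-uncased was pretrained on large
# unlabeled text corpora to learn general language patterns.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Step 2: reuse those pretrained weights and add a fresh classification head,
# which would then be fine-tuned on a small labeled dataset for the target task.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
```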
Tokenization Basics
Why is tokenization important when working with LLMs?
- It sorts words alphabetically before training.
- It adds random noise to improve results.
- It splits text into smaller units for model processing.
- It translates text into other languages automatically.
- It filters out numbers from the dataset.
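Background sketch: assuming the Hugging Face transformers library, the snippet below tokenizes a sentence with the bert-base-uncased tokenizer to show how text becomes subword pieces and then the integer IDs the model consumes.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Tokenization splits text into smaller units."
print(tokenizer.tokenize(text))   # subword pieces, e.g. ['token', '##ization', ...]
print(tokenizer.encode(text))     # the integer IDs the model actually processes
```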
Working with Context
What does 'context window' mean when referring to LLMs such as GPT-3?
- The order of tokenization in the dataset.
- The total training time for the model.
- A collection of random sentences fed to the model.
- The screen size required for model output.
- The maximum number of tokens the model can process at once.
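Background sketch: assuming the Hugging Face transformers library, the snippet below truncates an over-long input to GPT-2's 1024-token window; GPT-3's window is larger, but the principle is the same.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

long_text = "word " * 5000                        # far more text than fits
encoded = tokenizer(long_text, truncation=True, max_length=1024)

# Tokens beyond the window are dropped; the model can only attend to what fits.
print(len(encoded["input_ids"]))                  # 1024
```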
Masked Language Modeling
How does a masked language model like BERT typically operate?
- By predicting words that have been masked in a sentence using surrounding context.
- By generating new stories from scratch.
- By replacing all punctuation with tokens.
- By translating masked words into different languages.
- By removing all stopwords from text.
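Background sketch: assuming the Hugging Face transformers library, the fill-mask pipeline below uses bert-base-uncased to predict a word hidden behind its [MASK] token from the context on both sides.

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT scores candidate words for the masked position using bidirectional context.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```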
Model Comparisons
Which feature distinguishes BERT from GPT models?
- BERT cannot be fine-tuned; GPT can.
- GPT trains solely on labeled data; BERT does not.
- Both use decision tree architectures.
- BERT is bidirectional, focusing on understanding; GPT is autoregressive, focusing on generation.
- BERT only processes images; GPT only processes audio.
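Background sketch: in plain PyTorch, the difference between autoregressive and bidirectional attention largely comes down to whether future positions are masked; the 5-token sequence length below is an arbitrary illustrative choice.

```python
import torch

seq_len = 5

# GPT-style causal mask: position i may only attend to positions <= i,
# so the model generates text left to right.
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

# BERT-style encoding uses no causal mask: every position can attend to the
# full sentence in both directions, which suits understanding tasks.
bidirectional_mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)

print(causal_mask)         # True marks blocked (future) positions
print(bidirectional_mask)  # nothing blocked
```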
Understanding Attention
In a Transformer, what do attention weights represent?
- How much focus the model gives to each token when processing input.
- The font size of input text.
- The physical memory used for training.
- The size of the training dataset.
- The total number of layers in the model.
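Background sketch: attention weights can be computed from scratch as a softmax over scaled query-key dot products; the snippet below uses random tensors purely for illustration.

```python
import torch
import torch.nn.functional as F

d = 16                                  # dimension of each query/key vector
q = torch.randn(1, 10, d)               # one query vector per token
k = torch.randn(1, 10, d)               # one key vector per token

scores = q @ k.transpose(-2, -1) / d ** 0.5
weights = F.softmax(scores, dim=-1)     # each row sums to 1

# weights[0, i, j] is how much focus token i places on token j.
print(weights.shape)                    # torch.Size([1, 10, 10])
print(weights[0, 0].sum())              # ~1.0
```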
Deployment Challenges
Which is a common limitation of deploying LLMs in production environments?
- Complete absence of bias in outputs.
- High resource consumption and latency issues.
- Unlimited context window size for tokens.
- Instantaneous real-time performance without errors.
- Zero risk of data privacy concerns.
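Background sketch: a rough timing check, assuming the Hugging Face transformers library and the small gpt2 model; production-scale LLMs are far larger, so latency and memory costs grow accordingly.

```python
import time
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

start = time.perf_counter()
generator("Deploying large models is", max_new_tokens=50)
print(f"generation took {time.perf_counter() - start:.2f}s on this hardware")
```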
Improving Model Safety
Which approach helps reduce the chances of an LLM generating biased or toxic content?
- Doubling the batch size each epoch
- Using grayscale images during training
- Reinforcement learning with human feedback (RLHF)
- Lowering the model’s parameter count
- Randomly removing output tokens
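Background sketch: RLHF is often summarized by the objective below, in which a reward model r_phi trained on human preference comparisons is maximized while a KL penalty (weighted by beta) keeps the tuned policy pi_theta close to the pretrained reference policy pi_ref; this is the commonly cited formulation, not any specific system's exact loss.

$$
\max_{\pi_\theta}\;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}
\big[\, r_\phi(x, y) \,\big]
\;-\;
\beta\, \mathrm{KL}\!\big(\pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x)\big)
$$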