Quiz: Best Practices in Handling and Preprocessing User Input for LLMs — Questions & Answers

This quiz contains 10 questions. The complete reference below lists every question, its answer choices, and the correct answer; use it to review after taking the interactive quiz.

  1. Question 1: Importance of Chunking User Input

    Why should unstructured user data, such as text from forms or chat logs, be chunked before submitting it to a large language model?

    • It enhances retrieval quality and reduces hallucination risk
    • It increases computation time and model error
    • It merges unrelated data into one block
    • It always reduces the accuracy of responses
    • It ensures that the LLM receives only numeric input

    Correct answer: It enhances retrieval quality and reduces hallucination risk

  2. Question 2: Choosing Effective Chunk Size

    Which principle should guide the selection of chunk size when splitting user input for an LLM?

    • Balance between including enough context without exceeding the model's context window
    • Always use the smallest possible chunk (single word)
    • Chunks should always be at least 1000 characters
    • Larger chunks always produce more accurate answers
    • Chunking does not affect LLM performance

    Correct answer: Balance between including enough context without exceeding the model's context window
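The balancing principle above can be sketched in code. This is a minimal illustration, not a production chunker: it uses whitespace word count as a crude proxy for tokens (real pipelines would use the model's tokenizer) and greedily packs whole sentences into chunks that stay under a budget.

```python
import re

def chunk_by_budget(text, max_tokens=256):
    """Greedily pack whole sentences into chunks that stay under a token budget."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())  # crude token proxy: whitespace word count
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Keeping sentences intact preserves local context; the budget keeps each chunk comfortably inside the model's context window.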

  3. Question 3: Overlap in Chunking

    What is the main purpose of introducing overlapping sections between chunks when preparing text for LLM input?

    • To preserve semantic context that may span across chunk boundaries
    • To increase redundancy and storage usage unnecessarily
    • To make chunking more random
    • To reduce the number of relevant chunks retrieved
    • To speed up the chunking process with no other benefit

    Correct answer: To preserve semantic context that may span across chunk boundaries
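A sliding window over word tokens is one simple way to realize this overlap. The sketch below is a toy version (word lists rather than model tokens); the key property is that consecutive chunks share their boundary region, so content spanning a cut is never lost.

```python
def sliding_chunks(words, size=100, overlap=20):
    """Yield word-window chunks where consecutive chunks share `overlap` words."""
    step = size - overlap
    for start in range(0, max(len(words) - overlap, 1), step):
        yield words[start:start + size]
```

The last `overlap` words of each chunk reappear as the first `overlap` words of the next, at the cost of some redundant storage.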

  4. Question 4: Semantic Chunking

    What does semantic chunking refer to when processing unstructured user text for LLMs?

    • Splitting text based on meaningful units such as headings and sections
    • Cutting text into fixed-size byte blocks only
    • Merging all data into a single chunk
    • Chunking at arbitrary word intervals
    • Ignoring data contained in tables and images

    Correct answer: Splitting text based on meaningful units such as headings and sections
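For documents with explicit structure, semantic chunking can be as simple as splitting on headings. The sketch below assumes Markdown-style `#` headings; richer formats would need a proper parser.

```python
import re

def split_by_headings(markdown_text):
    """Split a Markdown document into sections, one chunk per heading."""
    # Zero-width split: each section starts at a line beginning with 1-6 '#'.
    parts = re.split(r"(?m)^(?=#{1,6}\s)", markdown_text)
    return [p.strip() for p in parts if p.strip()]
```

Each resulting chunk is a self-contained section (heading plus body), which keeps semantically related text together.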

  5. Question 5: Prompt Engineering for Extraction

    Which practice enhances the reliability of LLM data extraction from user input?

    • Providing clear instructions and optionally few-shot examples in prompts
    • Adding unrelated questions to confuse the model
    • Using only system prompts with no user data
    • Avoiding examples in the prompt at all costs
    • Submitting raw, unformatted input without prompting

    Correct answer: Providing clear instructions and optionally few-shot examples in prompts
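A minimal prompt builder shows what "clear instructions plus a few-shot example" looks like in practice. The field names and the example pair here are illustrative, not from any particular system.

```python
def build_extraction_prompt(user_text):
    """Assemble an extraction prompt with explicit instructions and one few-shot example."""
    return (
        "Extract the person's name and city from the text below.\n"
        'Respond with JSON only, using the keys "name" and "city".\n\n'
        "Example input: Maria moved to Lisbon last spring.\n"
        'Example output: {"name": "Maria", "city": "Lisbon"}\n\n'
        f"Input: {user_text}\n"
        "Output:"
    )
```

The instruction pins down the output format, and the worked example shows the model exactly what a correct response looks like.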

  6. Question 6: Use of Constrained Decoding

    How can constrained decoding make an LLM's output safer and more structured when it processes user input?

    • By limiting the generation to valid output formats and values
    • By encouraging the model to hallucinate new data
    • By forcing the LLM to ignore input details
    • By generating outputs only in natural language
    • By randomly skipping parts of the text

    Correct answer: By limiting the generation to valid output formats and values
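The core idea of constrained decoding can be shown with a toy selection step. Real implementations operate on the model's logits with a grammar or schema; here a plain score dictionary stands in for the logits, and a whitelist stands in for the grammar.

```python
def constrained_pick(token_scores, allowed):
    """Pick the highest-scoring token, considering only tokens the format allows."""
    valid = {t: s for t, s in token_scores.items() if t in allowed}
    if not valid:
        raise ValueError("no token satisfies the output constraint")
    return max(valid, key=valid.get)
```

Even if the model's top-scoring token is invalid, the constraint masks it out, so the generated output can never leave the allowed format.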

  7. Question 7: Handling Multimodal Input

    When should a vision-language model (VLM) be used instead of a text-only LLM for preprocessing user data?

    • When the user data includes rich images, tables, or non-machine-readable depictions
    • Only when the data is exclusively in plain text paragraphs
    • When audio files need to be transcribed
    • When input data is all in XML format
    • Whenever lower computation cost is the priority

    Correct answer: When the user data includes rich images, tables, or non-machine-readable depictions

  8. Question 8: Evaluating Extraction Performance

    Which metrics are most relevant for evaluating the effectiveness of structured data extraction from user input with LLMs?

    • Precision, recall, and F1-score based on TP, FP, and FN entries
    • Pixel accuracy and frame rate
    • Page count and file size
    • Number of chunks per request
    • Training time of the underlying model

    Correct answer: Precision, recall, and F1-score based on TP, FP, and FN entries
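These metrics follow directly from comparing the extracted entries against a gold set. The sketch below treats both as sets of hashable entries; in practice entries might need normalization before comparison.

```python
def extraction_scores(predicted, gold):
    """Compute precision, recall, and F1 from sets of extracted entries."""
    tp = len(predicted & gold)   # true positives: correctly extracted
    fp = len(predicted - gold)   # false positives: spurious extractions
    fn = len(gold - predicted)   # false negatives: missed entries
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Precision penalizes spurious extractions, recall penalizes misses, and F1 balances the two.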

  9. Question 9: Handling Complex Document Structures

    How should unstructured documents containing tables, plots, and unusual formatting be handled before submitting to LLMs?

    • Preprocess with specialized tools to partition and extract structured elements
    • Ignore non-text elements and only submit paragraph text
    • Merge all data into a single text chunk regardless of type
    • Convert all content to images
    • Only process the text that directly follows tables and images

    Correct answer: Preprocess with specialized tools to partition and extract structured elements
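Dedicated document-parsing tools do this far more robustly, but the idea of partitioning can be sketched with a toy heuristic: separate pipe-delimited table rows from prose so each element type can be routed to the right handler.

```python
def partition_lines(document):
    """Crudely separate pipe-delimited table rows from prose lines."""
    tables, prose = [], []
    for line in document.splitlines():
        # Heuristic assumption: >= 2 pipe characters marks a table row.
        (tables if line.count("|") >= 2 else prose).append(line)
    return {"tables": tables, "prose": prose}
```

Once partitioned, tables can be serialized into a structured format while prose goes through normal text chunking.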

  10. Question 10: Addressing Ambiguities in Text

    What is a recommended approach for splitting user input when pronouns or references may cross sentence boundaries?

    • Use chunking methods that incorporate sentence overlap or semantic similarity
    • Always split after every period, regardless of context
    • Do not chunk, send full text as-is
    • Split strictly by fixed character count only
    • Chunk at random intervals to diversify input

    Correct answer: Use chunking methods that incorporate sentence overlap or semantic similarity
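Sentence-level windows with overlap are one such method. The sketch below groups sentences into overlapping windows, so a pronoun in one sentence usually lands in the same chunk as its antecedent; semantic-similarity chunking would go further by grouping related sentences, which this toy version does not attempt.

```python
import re

def sentence_window_chunks(text, window=3, overlap=1):
    """Group sentences into overlapping windows so cross-sentence references stay together."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    step = window - overlap
    return [" ".join(sentences[i:i + window])
            for i in range(0, max(len(sentences) - overlap, 1), step)]
```

Each window shares `overlap` sentences with its neighbor, so a reference that crosses a chunk boundary is still resolvable in at least one chunk.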