Quiz: Best Practices in Handling and Preprocessing User Input for LLMs

  1. The Importance of Chunking User Input

    Why should unstructured user data, such as text from forms or chat logs, be chunked before submitting it to a large language model?

    1. It enhances retrieval quality and reduces hallucination risk
    2. It increases computation time and model error
    3. It merges unrelated data into one block
    4. It always reduces the accuracy of responses
    5. It ensures that the LLM receives only numeric input
  2. Choosing Effective Chunk Size

    Which principle should guide the selection of chunk size when splitting user input for an LLM?

    1. Balance between including enough context without exceeding the model's context window
    2. Always use the smallest possible chunk (single word)
    3. Chunks should always be at least 1000 characters
    4. Larger chunks always produce more accurate answers
    5. Chunking does not affect LLM performance
  3. Overlap in Chunking

    What is the main purpose of introducing overlapping sections between chunks when preparing text for LLM input?

    1. To preserve semantic context that may span across chunk boundaries
    2. To increase redundancy and storage usage unnecessarily
    3. To make chunking more random
    4. To reduce the number of relevant chunks retrieved
    5. To speed up the chunking process with no other benefit
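The overlap idea in the question above can be sketched in a few lines. This is a minimal fixed-size chunker, not a production splitter; the `chunk_size` and `overlap` values are illustrative.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks whose windows overlap.

    The overlap preserves context (e.g. a pronoun and its referent)
    that would otherwise be severed at a chunk boundary.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # each chunk starts `step` chars after the last
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("x" * 500, chunk_size=200, overlap=50)
# the last 50 characters of each chunk repeat as the first 50 of the next
```

In practice, libraries usually split on sentence or token boundaries rather than raw characters, but the sliding-window principle is the same.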
  4. Semantic Chunking

    What does semantic chunking refer to when processing unstructured user text for LLMs?

    1. Splitting text based on meaningful units such as headings and sections
    2. Cutting text into fixed-size byte blocks only
    3. Merging all data into a single chunk
    4. Chunking at arbitrary word intervals
    5. Ignoring data contained in tables and images
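Splitting on meaningful units, as the correct answer describes, can be sketched for markdown-style text by cutting at headings. This is a simplified illustration; real semantic chunkers also consider paragraphs, embeddings, or document-specific structure.

```python
import re

def semantic_chunks(text: str) -> list[str]:
    """Split markdown-like text at headings so each chunk is a coherent section."""
    # Zero-width split: break immediately before any line starting with '#'
    parts = re.split(r"(?m)^(?=#{1,6} )", text)
    return [p.strip() for p in parts if p.strip()]

doc = "# Intro\nHello.\n## Details\nMore text."
semantic_chunks(doc)  # one chunk per heading-delimited section
```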
  5. Prompt Engineering for Extraction

    Which practice enhances the reliability of LLM data extraction from user input?

    1. Providing clear instructions and optionally few-shot examples in prompts
    2. Adding unrelated questions to confuse the model
    3. Using only system prompts with no user data
    4. Avoiding examples in the prompt at all costs
    5. Submitting raw, unformatted input without prompting
  6. Use of Constrained Decoding

    How can constrained decoding make LLM outputs safer and more structured when the model is handling user input?

    1. By limiting the generation to valid output formats and values
    2. By encouraging the model to hallucinate new data
    3. By forcing the LLM to ignore input details
    4. By generating outputs only in natural language
    5. By randomly skipping parts of the text
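The core mechanism behind the correct answer can be shown with a toy example: at each decoding step, candidates that would violate the output constraint are masked out before the best-scoring option is picked. Real constrained decoders operate on token logits inside the model; the scores and label set here are made up for illustration.

```python
# Allowed outputs under the constraint; anything else is masked out.
ALLOWED = {"positive", "negative", "neutral"}

def constrained_pick(model_scores: dict[str, float]) -> str:
    """Pick the highest-scoring candidate that satisfies the constraint."""
    valid = {tok: s for tok, s in model_scores.items() if tok in ALLOWED}
    if not valid:
        raise ValueError("no valid continuation under the constraint")
    return max(valid, key=valid.get)

constrained_pick({"positive": 0.2, "banana": 0.9, "negative": 0.5})
# "banana" scores highest but is invalid, so "negative" is chosen
```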
  7. Handling Multimodal Input

    When should a vision-language model (VLM) be used instead of a text-only LLM for preprocessing user data?

    1. When the user data includes rich images, tables, or non-machine-readable depictions
    2. Only when the data is exclusively in plain text paragraphs
    3. When audio files need to be transcribed
    4. When input data is all in XML format
    5. Whenever lower computation cost is the priority
  8. Evaluating Extraction Performance

    Which metrics are most relevant for evaluating the effectiveness of structured data extraction from user input with LLMs?

    1. Precision, recall, and F1-score based on TP, FP, and FN entries
    2. Pixel accuracy and frame rate
    3. Page count and file size
    4. Number of chunks per request
    5. Training time of the underlying model
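The metrics named in the correct answer follow directly from the TP/FP/FN counts. A minimal sketch, with guards for the zero-denominator edge cases:

```python
def extraction_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from true positives, false positives,
    and false negatives in the extracted entries."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

p, r, f1 = extraction_metrics(tp=8, fp=2, fn=2)
# precision = 0.8, recall = 0.8, F1 = 0.8
```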
  9. Handling Complex Document Structures

    How should unstructured documents containing tables, plots, and unusual formatting be handled before submitting to LLMs?

    1. Preprocess with specialized tools to partition and extract structured elements
    2. Ignore non-text elements and only submit paragraph text
    3. Merge all data into a single text chunk regardless of type
    4. Convert all content to images
    5. Only process the text that directly follows tables and images
  10. Addressing Ambiguities in Text

    What is a recommended approach for splitting user input when pronouns or references may cross sentence boundaries?

    1. Use chunking methods that incorporate sentence overlap or semantic similarity
    2. Always split after every period, regardless of context
    3. Do not chunk, send full text as-is
    4. Split strictly by fixed character count only
    5. Chunk at random intervals to diversify input