Natural Language Processing — 9 Key Techniques Explained Quiz

Explore the essential NLP techniques that turn raw text into useful signals for AI and machine learning applications. This quiz covers foundational concepts like tokenization, word embeddings, n-grams, and more.

  1. Tokenization

    Which NLP technique involves splitting a sequence of text into smaller units such as words, sentences, or characters?

    1. Stop Words
    2. Named Entity Recognition
    3. Tokenization
    4. Word Embeddings

    Explanation: Tokenization is the process of dividing text into tokens, such as words, sentences, or characters, making it a foundational step in NLP workflows. Stop words are filtered out as uninformative, but that step presupposes existing tokens. Named Entity Recognition identifies specific entities like names or locations, not general segmentation. Word embeddings represent tokens as vectors and themselves rely on prior tokenization.
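    The splitting described above can be sketched with a minimal regex-based word tokenizer (illustrative only; production systems use trained tokenizers such as those in NLTK or spaCy):

    ```python
    import re

    def tokenize(text):
        # Match runs of word characters, or single punctuation marks
        return re.findall(r"\w+|[^\w\s]", text)

    print(tokenize("Hello, world!"))  # → ['Hello', ',', 'world', '!']
    ```

    Even this simple rule makes choices (e.g. how contractions split) that real tokenizers handle more carefully.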

  2. Stemming and Lemmatization

    What is the primary purpose of stemming and lemmatization in NLP?

    1. To split text into sentences
    2. To measure sentence similarity
    3. To identify important keywords
    4. To reduce words to their base or root form

    Explanation: Stemming and lemmatization aim to normalize words by reducing them to their base or dictionary form, improving consistency for analysis. Splitting text into sentences is handled by sentence tokenization, not stemming or lemmatization. Keyword identification uses frequency or significance metrics, and sentence similarity is assessed using other features like embeddings or n-grams.
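    A toy suffix-stripping stemmer illustrates the normalization idea (real stemmers such as Porter's apply many ordered rules, and lemmatizers additionally consult a dictionary of valid forms):

    ```python
    def naive_stem(word):
        # Strip a few common English suffixes; purely illustrative
        for suffix in ("ing", "ed", "es", "s"):
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                return word[: -len(suffix)]
        return word

    print(naive_stem("jumped"))   # → 'jump'
    print(naive_stem("running"))  # → 'runn' (stems need not be real words)
    ```

    The second example shows why lemmatization is sometimes preferred: a lemmatizer would map "running" to the dictionary form "run".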

  3. Bag of Words and TF-IDF

    How does the Bag of Words (BoW) technique represent a document for machine learning models?

    1. It encodes sentences as sequences of word vectors
    2. It counts the frequency of each word in a fixed vocabulary
    3. It parses sentences for grammatical structure
    4. It assigns weights to words based on context

    Explanation: BoW represents documents as vectors of word counts, capturing the frequency of each word from a predefined vocabulary. Encoding sentences as sequences of word vectors pertains to word embeddings. Parsing grammatical structure is related to syntactic analysis. Assigning weights based on context describes methods like attention or contextual embeddings, not BoW.
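    The count-vector idea, plus the TF-IDF reweighting named in this section's heading, can be sketched in a few lines (a minimal version of what libraries like scikit-learn's CountVectorizer and TfidfVectorizer do):

    ```python
    import math
    from collections import Counter

    def bow_vector(doc, vocab):
        # Count occurrences of each vocabulary word in the document
        counts = Counter(doc.lower().split())
        return [counts[w] for w in vocab]

    def idf(term, docs):
        # Inverse document frequency: log(N / df); assumes term occurs in at least one doc
        df = sum(1 for d in docs if term in d.lower().split())
        return math.log(len(docs) / df)

    docs = ["the cat sat on the mat", "the dog sat"]
    vocab = ["the", "cat", "dog", "sat"]
    print(bow_vector(docs[0], vocab))  # → [2, 1, 0, 1]
    ```

    Multiplying each count by its term's IDF yields the TF-IDF weight, which downweights words like "the" that appear in every document.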

  4. N-grams

    Which statement best describes n-grams in the context of NLP?

    1. N-grams are sequences of n consecutive items from a given text
    2. N-grams remove irrelevant words from analysis
    3. N-grams are the most common keywords in a document
    4. N-grams identify subject and object in sentences

    Explanation: N-grams refer to overlapping sequences of n consecutive words, tokens, or characters and help capture local context. Identifying common keywords is a separate task, typically based on frequency or significance. Removing irrelevant words refers to stop word elimination. Recognizing grammatical roles like subject and object relates to parsing.
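    The overlapping-window definition translates directly into code; a minimal sketch over word tokens:

    ```python
    def ngrams(tokens, n):
        # Overlapping windows of n consecutive tokens
        return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]

    print(ngrams("the quick brown fox".split(), 2))
    # → [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
    ```

    Setting n = 1 gives unigrams (plain tokens); larger n captures more local context at the cost of sparser counts.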

  5. Named Entity Recognition

    What does Named Entity Recognition (NER) specifically identify in text?

    1. It highlights punctuation and formatting
    2. It computes sentence sentiment
    3. It separates compound words into single tokens
    4. It detects and classifies proper nouns such as names, locations, or organizations

    Explanation: NER is focused on finding and classifying entities like people, places, or organizations within a text. Sentiment analysis deals with emotional polarity, not entity recognition. Decomposing compound words is related to tokenization, not NER. Punctuation and formatting are generally addressed during preprocessing, not in entity recognition.
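    As a toy illustration of the detect-and-classify idea, a gazetteer lookup tags known entities by dictionary matching. This is not how real NER works: production systems (e.g. spaCy's trained pipelines) use statistical models that generalize to unseen entities and use context.

    ```python
    # Hypothetical mini-gazetteer for illustration only
    GAZETTEER = {
        "Alice": "PERSON",
        "Paris": "LOCATION",
        "Google": "ORGANIZATION",
    }

    def toy_ner(tokens):
        # Tag tokens found in the entity dictionary with their class
        return [(t, GAZETTEER[t]) for t in tokens if t in GAZETTEER]

    print(toy_ner("Alice moved to Paris to work at Google".split()))
    # → [('Alice', 'PERSON'), ('Paris', 'LOCATION'), ('Google', 'ORGANIZATION')]
    ```

    The dictionary approach fails on ambiguous or unseen names ("Paris Hilton", a new startup), which is exactly why NER is treated as a learned classification task.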