Explore the essential NLP techniques that transform text into powerful insights for AI and machine learning applications. This quiz covers foundational concepts like tokenization, word embeddings, n-grams, and more.
Which NLP technique involves splitting a sequence of text into smaller units such as words, sentences, or characters?
Explanation: Tokenization is the process of dividing text into tokens, such as words, sentences, or characters, making it a foundational step in NLP workflows. Stop-word removal filters out uninformative words, but that step presupposes tokens already exist. Named Entity Recognition identifies specific entities like names or locations, not general segmentation. Word embeddings represent tokens as vectors but rely on prior tokenization.
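To make the distinction concrete, here is a minimal Python sketch of word- and sentence-level tokenization using simple regular expressions. The patterns are illustrative assumptions, not what production tokenizers (which handle abbreviations, contractions, Unicode, etc.) actually do.

```python
import re

def tokenize_words(text):
    """Split text into word tokens, treating punctuation as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def tokenize_sentences(text):
    """Naively split text into sentences on terminal punctuation.

    A real sentence tokenizer would also handle abbreviations like 'Dr.'.
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

text = "Tokenization is fundamental. It splits text into units!"
print(tokenize_words("Hello, world!"))   # word + punctuation tokens
print(tokenize_sentences(text))          # two sentence tokens
```

Note that the same text yields different token streams depending on the chosen granularity, which is why tokenization choices ripple through every later NLP step.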
What is the primary purpose of stemming and lemmatization in NLP?
Explanation: Stemming and lemmatization aim to normalize words by reducing them to their base or dictionary form, improving consistency for analysis. Splitting text into sentences is handled by sentence tokenization, not stemming or lemmatization. Keyword identification uses frequency or significance metrics, and sentence similarity is assessed using other features like embeddings or n-grams.
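The normalization idea can be sketched with a toy suffix-stripping stemmer in Python. The suffix list and length threshold below are illustrative assumptions; real stemmers such as the Porter stemmer apply many more rules, and lemmatizers additionally consult a dictionary to return valid base forms.

```python
def naive_stem(word):
    """Toy suffix-stripping stemmer: chop a known suffix if enough stem remains.

    Like real stemmers, this can produce non-dictionary stems
    (e.g. 'running' -> 'runn'); lemmatization would instead yield 'run'.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = ["jumped", "cats", "running"]
print([naive_stem(w) for w in words])
```

The point of the example is the effect, not the algorithm: mapping inflected variants toward a shared base form lets "jumped" and "jump" count as the same feature downstream.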
How does the Bag of Words (BoW) technique represent a document for machine learning models?
Explanation: BoW represents documents as vectors of word counts, capturing the frequency of each word from a predefined vocabulary. Encoding sentences as sequences of word vectors pertains to word embeddings. Parsing grammatical structure is related to syntactic analysis. Assigning weights based on context describes methods like attention or contextual embeddings, not BoW.
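A count-vector representation is easy to show directly. The following Python sketch builds a vocabulary from a tiny corpus and maps each document to a vector of word counts; the corpus and helper names are illustrative assumptions.

```python
from collections import Counter

def bow_vector(doc_tokens, vocabulary):
    """Represent a tokenized document as a count vector over a fixed vocabulary."""
    counts = Counter(doc_tokens)
    return [counts[word] for word in vocabulary]

docs = [["the", "cat", "sat"], ["the", "dog", "sat", "sat"]]
# Fixed, sorted vocabulary shared by all documents.
vocab = sorted({w for d in docs for w in d})
vectors = [bow_vector(d, vocab) for d in docs]
print(vocab)     # ['cat', 'dog', 'sat', 'the']
print(vectors)   # [[1, 0, 1, 1], [0, 1, 2, 1]]
```

Notice that word order is discarded entirely: only the per-word counts survive, which is exactly what "bag" of words means.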
Which statement best describes n-grams in the context of NLP?
Explanation: N-grams refer to overlapping sequences of n consecutive words, tokens, or characters and help capture local context. Identifying common keywords is a separate task, typically based on frequency or significance. Removing irrelevant words refers to stop word elimination. Recognizing grammatical roles like subject and object relates to parsing.
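The "overlapping sequences" idea reduces to a one-line sliding window. This Python sketch (function name is an illustrative choice) extracts the n-grams of a token list:

```python
def ngrams(tokens, n):
    """Return all overlapping sequences of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["to", "be", "or", "not", "to", "be"]
print(ngrams(tokens, 2))  # bigrams: ('to', 'be'), ('be', 'or'), ...
print(ngrams(tokens, 3))  # trigrams
```

Because consecutive n-grams share n-1 tokens, each one carries a little local word-order context that a plain bag of words throws away.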
What does Named Entity Recognition (NER) specifically identify in text?
Explanation: NER is focused on finding and classifying entities like people, places, or organizations within a text. Sentiment analysis deals with emotional polarity, not entity recognition. Decomposing compound words is related to tokenization, not NER. Punctuation and formatting are generally addressed during preprocessing, not in entity recognition.
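As a deliberately simplified illustration of "find and classify entity spans," the Python sketch below matches text against a tiny hand-made gazetteer. The entity table and function name are assumptions for this example; real NER systems use trained statistical or neural models rather than fixed lookup lists.

```python
import re

# Toy lookup table of known entities; purely illustrative.
GAZETTEER = {
    "Ada Lovelace": "PERSON",
    "London": "LOCATION",
    "Google": "ORGANIZATION",
}

def toy_ner(text):
    """Return (span, label, start_offset) for each known entity found in text."""
    found = []
    for entity, label in GAZETTEER.items():
        for match in re.finditer(re.escape(entity), text):
            found.append((entity, label, match.start()))
    return sorted(found, key=lambda t: t[2])

print(toy_ner("Ada Lovelace visited London."))
```

Even this toy version shows the two halves of the task the explanation describes: locating the span of each entity and assigning it a category, as opposed to judging sentiment or splitting tokens.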