Test your understanding of building a basic keyword search engine with a hash-map-based inverted index. This quiz covers term-frequency counting, result sorting, pagination, and effective caching strategies for repeated queries.
What is the primary role of an inverted index in a keyword search system over documents?
Explanation: An inverted index maps each keyword to the documents in which it appears (its postings list), so a search can retrieve matching documents without scanning every document. Storing full document content is not its main function; full content is typically stored separately and referenced by document ID. Converting documents into hash values serves purposes such as deduplication or security, not keyword search. Encrypting queries relates to privacy, not to the structure of an inverted index.
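A minimal sketch of building an inverted index with a Python dict may make the idea concrete. The tokenizer (lowercase, split on non-alphanumeric runs), the function names `tokenize` and `build_inverted_index`, and the sample documents are all illustrative assumptions. Note that the index stores only document IDs while the full text stays in `docs`.

```python
from collections import defaultdict
import re


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase the text and keep runs of letters/digits as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Map each term to the set of document IDs that contain it (its postings).
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


# Hypothetical documents; full content is kept here, not in the index.
docs = {
    1: "The quick brown fox",
    2: "A quick search engine",
    3: "Brown bears and brown foxes",
}
index = build_inverted_index(docs)
print(sorted(index["quick"]))  # [1, 2]
print(sorted(index["brown"]))  # [1, 3]
```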
Which data structure is most efficient for implementing an inverted index that allows fast lookup of documents by keyword?
Explanation: A hash map offers average constant-time lookup of each keyword, making it a natural fit for an inverted index. An unsorted array must be scanned linearly to find a keyword; a sorted array supports binary search, but each lookup still costs O(log n) rather than O(1) on average. Linked lists and stacks are inefficient for large-scale keyword lookups because they require iterating over elements.
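To contrast the two lookups, here is a small sketch that answers the same queries with a dict (average O(1)) and with a sorted term array searched by `bisect` (O(log n)). The postings data and helper names are hypothetical.

```python
import bisect

# Hypothetical postings: term -> sorted list of document IDs.
postings = {
    "brown": [1, 3],
    "fox": [1],
    "quick": [1, 2],
    "search": [2],
}


def lookup_hash(term: str) -> list[int]:
    # Hash map lookup: average O(1) per term.
    return postings.get(term, [])


# Sorted-array alternative: parallel lists of terms and their postings.
sorted_terms = sorted(postings)
sorted_postings = [postings[t] for t in sorted_terms]


def lookup_sorted(term: str) -> list[int]:
    # Binary search: O(log n) comparisons instead of a linear scan.
    i = bisect.bisect_left(sorted_terms, term)
    if i < len(sorted_terms) and sorted_terms[i] == term:
        return sorted_postings[i]
    return []


for term in ["quick", "fox", "zebra"]:
    assert lookup_hash(term) == lookup_sorted(term)
print(lookup_hash("quick"))  # [1, 2]
```

Both return the same postings; the dict simply avoids the logarithmic search, and it also makes inserting new terms cheap, whereas a sorted array must shift elements on insertion.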
When ranking documents for a keyword search, why is term frequency useful as a ranking signal?
Explanation: Term frequency is the number of times a keyword appears in a document; a higher count suggests the document is more relevant to that keyword. Counting how many documents contain a keyword is document frequency, not term frequency. Encrypting keywords is unrelated to ranking, and term frequency does not directly affect retrieval speed.
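One way to record term frequency, sketched under the assumption that each postings entry maps a document ID to a count, is to run `collections.Counter` over each document's tokens. The same structure also shows the difference from document frequency. The names `build_tf_index` and `tokenize` and the sample texts are illustrative.

```python
from collections import Counter, defaultdict
import re


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase the text and keep runs of letters/digits as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_tf_index(docs: dict[int, str]) -> dict[str, dict[int, int]]:
    # term -> {doc_id: number of times the term occurs in that document}
    index: dict[str, dict[int, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return dict(index)


docs = {
    1: "brown fox",
    2: "brown bear, brown dog, brown cat",
    3: "quick brown fox and brown hare",
}
tf_index = build_tf_index(docs)
print(tf_index["brown"])       # {1: 1, 2: 3, 3: 2}  term frequency per document
print(len(tf_index["brown"]))  # 3  document frequency: documents containing "brown"
```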
If you want to display the most relevant documents first in search results, which sorting method should you use when ranking with term frequency?
Explanation: Sorting in descending order by term frequency places the documents in which the keyword occurs most often at the top, which is where the most relevant results should appear. Alphabetical sorting and sorting by document size are unrelated to keyword relevance, and random ordering is not useful for ranking search results.
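A short sketch of descending term-frequency ranking follows. Breaking ties by ascending document ID is an added assumption that keeps the order deterministic (useful once results are paginated or cached); the function name `rank_by_tf` is hypothetical.

```python
def rank_by_tf(postings: dict[int, int]) -> list[int]:
    # postings: doc_id -> term frequency for a single keyword.
    # Sort by descending frequency, then by ascending doc ID to break ties.
    ranked = sorted(postings.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked]


print(rank_by_tf({1: 1, 2: 3, 3: 2, 4: 3}))  # [2, 4, 3, 1]
```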
In a search system, what is the primary advantage of implementing pagination for search results?
Explanation: Pagination splits search results into fixed-size pages, which makes browsing easier and improves performance because each request only needs to render and transfer one page of results. It does not increase the number of matching documents; it only structures the output. Grouping keywords and encrypting page data are unrelated to pagination's main purpose.
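A minimal pagination helper might slice the ranked list and report the page count using ceiling division. The function name `paginate`, the 1-based page convention, and the sample IDs are assumptions for illustration.

```python
import math


def paginate(results: list, page: int, per_page: int) -> tuple[list, int]:
    """Return the items on a 1-based page and the total number of pages."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    total_pages = math.ceil(len(results) / per_page)
    start = (page - 1) * per_page
    # Slicing past the end simply yields an empty list.
    return results[start:start + per_page], total_pages


ranked_doc_ids = list(range(101, 126))  # 25 hypothetical ranked results
items, total_pages = paginate(ranked_doc_ids, page=3, per_page=10)
print(items)        # [121, 122, 123, 124, 125]
print(total_pages)  # 3
```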
A search query returns 80 results, and you want to show 10 results per page. Which result indices will be displayed on the fourth page (using 1-based indexing)?
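Explanation: With 10 results per page, page p shows results (p − 1) × 10 + 1 through p × 10: page one shows 1–10, page two 11–20, page three 21–30, and page four 31–40. The other ranges straddle page boundaries: 10–19 spans pages one and two, 20–30 spans pages two and three, and 40–49 spans pages four and five. Thus, 31–40 is correct.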
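The same arithmetic can be written as a small function. The name `page_bounds` is hypothetical, and clamping the end index to the total number of results (for a short last page) is an added assumption.

```python
def page_bounds(page: int, per_page: int, total: int) -> tuple[int, int]:
    """1-based indices of the first and last result shown on `page`."""
    start = (page - 1) * per_page + 1
    end = min(page * per_page, total)  # the last page may be partially filled
    return start, end


print(page_bounds(4, 10, 80))  # (31, 40)
print(page_bounds(8, 10, 80))  # (71, 80)
```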
When caching search results for repeated queries, what should the cache key uniquely represent?
Explanation: A cache key should uniquely identify a query by combining its normalized keywords with every parameter that affects the results (such as page number and page size), so the correct results are returned for each input. The total document count and the index hash are the same for every query and therefore cannot distinguish between queries, and browser type does not affect the search results.
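A sketch of cache-key construction follows. It assumes that matching is case-insensitive and that word order does not change the results (true for a bag-of-words engine, false if phrase queries are supported), so the terms are lowercased and sorted. The function name `make_cache_key` is hypothetical.

```python
def make_cache_key(query: str, page: int, per_page: int) -> tuple:
    # Assumption: case and word order do not affect results, so normalize
    # the query to a sorted tuple of lowercase terms.
    terms = tuple(sorted(query.lower().split()))
    # Include every parameter that changes the returned results.
    return (terms, page, per_page)


k1 = make_cache_key("Brown Fox", page=1, per_page=10)
k2 = make_cache_key("fox brown", page=1, per_page=10)
k3 = make_cache_key("fox brown", page=1, per_page=20)
print(k1 == k2)  # True: same normalized query and parameters
print(k1 == k3)  # False: different page size
```

A tuple is used because it is hashable and can serve directly as a dict key.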
If a user changes the paging parameter of a search (for example, moves from page 2 to page 3), what should happen to the cache key to retrieve the correct results?
Explanation: Including the page number gives each cached page its own key, so the correct results are retrieved for that page. Keeping the key constant, or building it from the keywords alone, would cause the cached page-2 results to be served for page 3. Encrypting the result set is unrelated to cache-key correctness.
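The failure mode is easy to reproduce with a plain dict cache. This sketch uses hypothetical ranked results and 2 results per page, and compares a keyword-only key with a key that includes the page number.

```python
# Hypothetical ranked results for one query, paged 2 per page.
RESULTS = {"fox": [5, 9, 2, 7, 4, 1]}
PER_PAGE = 2


def fetch_page(query: str, page: int) -> list[int]:
    # Stand-in for the real (expensive) search for one page.
    start = (page - 1) * PER_PAGE
    return RESULTS.get(query, [])[start:start + PER_PAGE]


def cached_search(cache: dict, key, query: str, page: int) -> list[int]:
    # Compute and store on a miss; otherwise return whatever is cached.
    if key not in cache:
        cache[key] = fetch_page(query, page)
    return cache[key]


# Buggy: the key ignores the page, so page 3 is served page 2's results.
bad_cache: dict = {}
cached_search(bad_cache, "fox", "fox", 2)
print(cached_search(bad_cache, "fox", "fox", 3))  # [2, 7]  (wrong page)

# Correct: the page number is part of the key.
good_cache: dict = {}
cached_search(good_cache, ("fox", 2), "fox", 2)
print(cached_search(good_cache, ("fox", 3), "fox", 3))  # [4, 1]
```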
What issue might arise if two different keywords are assigned the same hash index in a hash-map-based inverted index?
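Explanation: If collisions are not resolved, the document lists of distinct keywords can become mixed, producing wrong search results. A correct hash map resolves collisions, for example by chaining or open addressing, and compares the stored keyword itself rather than only its hash, so each keyword's postings stay separate; the cost is extra comparisons, not a performance gain. Collisions do not remove keywords, and the index can continue to grow as long as collisions are handled properly.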
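To see collision handling at work, here is a toy hash table that resolves collisions by separate chaining. It deliberately uses a weak, deterministic hash (the sum of character codes, an illustrative choice rather than what Python's dict uses) so that anagrams such as "cat" and "act" land in the same bucket. The class `ChainedIndex` is hypothetical.

```python
class ChainedIndex:
    """Toy hash-map inverted index that resolves collisions by chaining."""

    def __init__(self, num_buckets: int = 8) -> None:
        # Each bucket is a list of [term, set_of_doc_ids] entries.
        self.buckets: list[list[list]] = [[] for _ in range(num_buckets)]

    def _bucket(self, term: str) -> list[list]:
        # Deliberately weak hash: anagrams always collide.
        return self.buckets[sum(map(ord, term)) % len(self.buckets)]

    def add(self, term: str, doc_id: int) -> None:
        bucket = self._bucket(term)
        for entry in bucket:
            if entry[0] == term:  # compare the full key, not just the hash
                entry[1].add(doc_id)
                return
        bucket.append([term, {doc_id}])

    def get(self, term: str) -> set[int]:
        for stored_term, doc_ids in self._bucket(term):
            if stored_term == term:
                return doc_ids
        return set()


idx = ChainedIndex()
idx.add("cat", 1)
idx.add("act", 2)  # same bucket as "cat"
print(idx.get("cat"), idx.get("act"))  # {1} {2}
```

Because each entry keeps its keyword and lookups compare it, the two postings stay separate even though they share a bucket.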
How does introducing a cache for search queries optimize keyword search performance in a high-traffic system?
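Explanation: Caching lets the system return results for repeated queries immediately, without repeating the index lookup, scoring, and sorting work. It speeds up access rather than slowing it down, it does not force the use of slow storage, and it does not eliminate ranking signals such as term frequency, because the cached results were already ranked when they were first computed.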
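Python's standard `functools.lru_cache` offers one simple way to cache repeated queries in-process. In this sketch the term-frequency index, the scoring rule (summing term frequencies across query terms), and the function name `search` are assumptions. The query is passed as a tuple because cached arguments must be hashable, and the page parameters are part of the argument list, so they become part of the cache key automatically.

```python
from functools import lru_cache

# Hypothetical term-frequency index: term -> {doc_id: count}.
TF_INDEX = {
    "brown": {1: 1, 2: 3, 3: 2},
    "fox": {1: 1, 3: 1},
}


@lru_cache(maxsize=1024)
def search(terms: tuple[str, ...], page: int, per_page: int) -> tuple[int, ...]:
    # Expensive path: gather postings, sum term frequencies, sort, paginate.
    scores: dict[int, int] = {}
    for term in terms:
        for doc_id, tf in TF_INDEX.get(term, {}).items():
            scores[doc_id] = scores.get(doc_id, 0) + tf
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    start = (page - 1) * per_page
    return tuple(ranked[start:start + per_page])


search(("brown", "fox"), 1, 10)           # miss: computed and stored
result = search(("brown", "fox"), 1, 10)  # hit: served from the cache
print(result)               # (2, 3, 1)
print(search.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=1024, currsize=1)
```

Returning a tuple keeps cached results immutable. If the index changes, stale entries would need to be dropped, for example with `search.cache_clear()`. A multi-server deployment would more likely use a shared cache, which this sketch does not cover.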