Explore core concepts and techniques in string hashing and rolling hash with this engaging quiz. Strengthen your understanding of hash functions, collision handling, modulus properties, and real-world applications relevant to efficient string processing.
When applying a hash function to a string using modulus N, such as hash(s) mod N, what is the guaranteed range of possible output values for the hash?
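Explanation: For a positive modulus N, hash(s) mod N always lies in the range 0 to N-1. The remainder of a division by N can be zero, but it is always strictly less than N. The distractor '1 to N' is incorrect because it includes N, which can never occur. '1 to N-1' excludes zero, which is a valid output. '0 to N' is incorrect because N itself is never a remainder. One implementation caveat: this guarantee assumes a non-negative hash or a floored modulus. In C, C++, and Java, the % operator applied to a negative operand can return a negative value, so such code typically adds N and reduces again.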
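A minimal sketch of the range guarantee, assuming a simple polynomial string hash. The quiz does not specify a hash function, so `poly_hash` and its default base and modulus are illustrative choices. The last two lines contrast Python's floored `%` with C-style truncated remainders, which `math.fmod` follows.

```python
import math

def poly_hash(s: str, base: int = 131, mod: int = 1_000_000_007) -> int:
    """Illustrative polynomial hash s[0]*base^(n-1) + ... + s[n-1], reduced modulo `mod`."""
    h = 0
    for ch in s:
        # With a positive modulus, Python's % always returns a value in [0, mod - 1].
        h = (h * base + ord(ch)) % mod
    return h

N = 97
for word in ["", "a", "hash", "rolling hash", "zzzzzzzz"]:
    value = poly_hash(word, mod=N)
    assert 0 <= value <= N - 1  # never N, possibly 0 (the empty string hashes to 0)
    print(f"{word!r:>16} -> {value}")

# Remainder of a negative operand:
print(-7 % 3)            # 2    (Python: result takes the sign of the divisor)
print(math.fmod(-7, 3))  # -1.0 (C-style: result takes the sign of the dividend)
```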
Which key advantage does the rolling hash technique provide when calculating hash values for all substrings of length k in a long text?
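Explanation: The main advantage of rolling hash is that the hash of each window can be updated in constant time from the previous one, by removing the contribution of the outgoing character and adding the incoming one, instead of recomputing it from scratch in O(k) time. Hashing all windows of a length-n text therefore takes O(n) time rather than O(nk), which significantly speeds up searching. Rolling hash does not prevent hash collisions, so the second option is incorrect. It also neither reduces the string size nor eliminates the need for the modulus operation, so the last two distractors are incorrect as well.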
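The constant-time update can be sketched as follows, assuming a polynomial hash with an illustrative base of 131 and modulus 10^9 + 7 (neither value comes from the quiz). `window_hashes` and `direct_hash` are hypothetical helper names. Only the first window is hashed in full; every later window reuses the previous hash, and the result is checked against recomputing each window from scratch.

```python
def direct_hash(s: str, base: int = 131, mod: int = 1_000_000_007) -> int:
    """Recompute a polynomial hash from scratch: O(len(s)) work."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h

def window_hashes(text: str, k: int, base: int = 131, mod: int = 1_000_000_007) -> list[int]:
    """Hash every length-k window of `text`, updating each later window in O(1)."""
    n = len(text)
    if k <= 0 or k > n:
        return []
    high = pow(base, k - 1, mod)          # weight of the leftmost (outgoing) character
    h = direct_hash(text[:k], base, mod)  # only the first window is hashed in full
    hashes = [h]
    for i in range(k, n):
        h = (h - ord(text[i - k]) * high) % mod  # remove the outgoing character
        h = (h * base + ord(text[i])) % mod      # shift left and append the incoming one
        hashes.append(h)
    return hashes

text, k = "the quick brown fox", 5
rolled = window_hashes(text, k)
recomputed = [direct_hash(text[i:i + k]) for i in range(len(text) - k + 1)]
print(rolled == recomputed)  # True
```

Python's `%` keeps the intermediate value non-negative after the subtraction; a C or Java port would need to add `mod` before reducing.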
While comparing hashes of two different strings, which scenario is most likely to result in a hash collision when using a basic hash function with a small modulus value (e.g., mod 100)?
Explanation: A small modulus leaves only a few possible hash values (exactly 100 for mod 100), so many different strings must share the same remainder, leading to hash collisions. By the pigeonhole principle, any 101 distinct strings are guaranteed to contain at least one colliding pair. The second option is unrelated to the modulus's effect. The third option, using a unique prime for each character, tends to reduce collisions rather than cause them. Hash functions typically produce non-negative results, which makes the last option incorrect.
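The pigeonhole argument can be checked directly with a small sketch, assuming an illustrative polynomial hash with base 31 (`poly_hash` and `find_collision` are hypothetical helpers, not from the quiz). The 676 distinct two-letter lowercase strings cannot fit into 100 buckets without sharing one, while a modulus of 10^9 + 7 separates them all.

```python
from itertools import product
from string import ascii_lowercase

def poly_hash(s: str, base: int = 31, mod: int = 100) -> int:
    """Illustrative polynomial hash reduced modulo `mod`."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h

def find_collision(words, mod):
    """Return the first pair of distinct words with equal hashes, or None."""
    seen = {}
    for w in words:  # the words are assumed to be distinct
        h = poly_hash(w, mod=mod)
        if h in seen:
            return seen[h], w
        seen[h] = w
    return None

# All 676 two-letter lowercase strings.
words = ["".join(p) for p in product(ascii_lowercase, repeat=2)]

print(len({poly_hash(w, mod=100) for w in words}))  # at most 100 distinct values
print(find_collision(words, mod=100))               # some colliding pair
print(find_collision(words, mod=1_000_000_007))     # None
```

With the large modulus, the hash of a two-letter string is exactly 31*a + b for character codes a and b, and since the codes of lowercase letters span fewer than 31 values, no two distinct strings can share a hash.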
Which of the following problems is commonly solved using a rolling hash algorithm in practical applications?
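Explanation: Rolling hashes are widely used in substring search algorithms such as Rabin-Karp, which compare the pattern's hash with the hash of each text window and update the window hash in constant time as it slides. Because equal hashes can still result from a collision, a candidate match is normally confirmed by a direct character comparison. Finding the longest common subsequence is usually solved with dynamic programming, not rolling hash. Counting character occurrences and sorting arrays don't benefit from rolling hash techniques, so those options are incorrect as well.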
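A minimal Rabin-Karp sketch, assuming a polynomial hash with an illustrative base of 256 and modulus 10^9 + 7. The function name `rabin_karp` and the choice to return an empty list for an empty pattern are this sketch's own conventions. Every hash match is verified character by character, so collisions can slow the search down but never produce a wrong answer.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return the start indices of all occurrences of `pattern` in `text`."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the character leaving the window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for start in range(n - m + 1):
        # Equal hashes only mark a candidate; compare directly to rule out a collision.
        if t_hash == p_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start + m < n:
            # Slide the window one position right in O(1).
            t_hash = (t_hash - ord(text[start]) * high) % mod
            t_hash = (t_hash * base + ord(text[start + m])) % mod
    return matches

print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```

Passing a deliberately tiny modulus such as `mod=3` produces many false hash matches, but the direct comparison filters them out, so the result is unchanged and only the running time suffers.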
Why is it a common practice to choose a large prime number as the modulus when implementing rolling hash functions?
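Explanation: A large prime modulus spreads hash values more evenly and reduces the likelihood of collisions. A large modulus provides many possible hash values, and a prime modulus shares no factor with any base smaller than it, so powers of the base never reduce to zero and every character keeps influencing the hash. Over a prime modulus p, two distinct strings of length L can collide for at most L-1 choices of base, so with a base chosen uniformly at random modulo p, the collision probability for a fixed pair is at most (L-1)/p. A prime modulus does not make hash values always even, so the second option is incorrect. Primes also do not guarantee shorter hash values, and collision resistance has nothing to do with overwriting data, so the last two distractors are inappropriate.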
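One concrete failure of a non-prime modulus, as a sketch: with base 256 and modulus 2^32 (illustrative values chosen because they share the factor 2), 256^4 equals 2^32, so every character more than four positions from the end drops out of the hash. `poly_hash` is a hypothetical helper; the odd prime 10^9 + 7 stands in for a "large prime modulus".

```python
def poly_hash(s: str, base: int, mod: int) -> int:
    """Illustrative polynomial hash reduced modulo `mod`."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h

BASE = 256
POWER_OF_TWO_MOD = 2**32   # shares the factor 2 with BASE
PRIME_MOD = 1_000_000_007  # odd prime, so gcd(BASE, PRIME_MOD) == 1

# "cable" and "table" differ only in their first character, whose weight is
# 256**4 == 2**32, which is 0 modulo 2**32.
print(poly_hash("cable", BASE, POWER_OF_TWO_MOD) == poly_hash("table", BASE, POWER_OF_TWO_MOD))  # True
print(poly_hash("cable", BASE, PRIME_MOD) == poly_hash("table", BASE, PRIME_MOD))                # False
```

Under the prime modulus the two hashes differ by (99 - 116) * 2^32 modulo p, which is nonzero because p divides neither 17 nor any power of 2.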