Assess your understanding of shard key selection strategies with ten straightforward questions on best practices, considerations, and potential pitfalls in database sharding. This quiz helps reinforce core knowledge about effective shard key design for balanced and scalable distributed systems.
Which of the following is a primary objective when selecting a shard key for distributing data across multiple database nodes?
Explanation: The main goal in choosing a shard key is to balance both data and query load evenly, helping prevent any single node from becoming overloaded. Making queries slower or ensuring all data is stored on one shard contradicts the core purpose of sharding. Minimizing the number of shards defeats the intent of scale-out architectures.
What problem is most likely to occur if a shard key, such as a 'Country' field with only a few possible values, concentrates many records on a small subset of shards?
Explanation: A shard key like 'Country' has few distinct values, and real data is rarely spread evenly across them, so most records can end up on a few shards, creating skew or 'hot spots'. This reduces efficiency and limits scalability. Such a key prevents, rather than achieves, load balancing and even distribution, and distributed locking errors are unrelated to shard key selection in this context.
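To make the skew concrete, here is a minimal Python sketch that assigns records to shards by hashing the shard-key value. The data set, the 8-shard cluster, and the helpers `shard_for` and `records_per_shard` are hypothetical, and MD5 stands in for whatever placement function a real database uses. Sharding on `country` can use at most five shards and puts the dominant country's 60% of records on one of them, while sharding on `user_id` spreads the same records roughly evenly.

```python
import hashlib
from collections import Counter

def shard_for(value, num_shards):
    """Map a shard-key value to a shard index with a stable hash.

    MD5 is used only for its even spread; Python's built-in hash() is salted per
    process for strings, so it would give different placements on each run.
    """
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def records_per_shard(records, key, num_shards):
    """Count how many records land on each shard when sharding on `key`."""
    counts = Counter(shard_for(r[key], num_shards) for r in records)
    return [counts.get(s, 0) for s in range(num_shards)]

# Hypothetical data: 60% of users are in "US", 10% in each of four other countries.
COUNTRY_BY_BUCKET = ["US"] * 6 + ["DE", "FR", "JP", "BR"]
records = [{"user_id": i, "country": COUNTRY_BY_BUCKET[i % 10]} for i in range(10_000)]

by_country = records_per_shard(records, "country", 8)
by_user = records_per_shard(records, "user_id", 8)
print("sharded on country:", by_country)  # at most 5 non-empty shards, one holding >= 6000
print("sharded on user_id:", by_user)     # roughly 1250 per shard
```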
Why is it important to align shard key selection with typical query patterns, such as always querying by 'user_id'?
Explanation: When the shard key appears in the most common query filters, the database can route each query directly to the one shard that holds the matching data instead of broadcasting it to every shard (a scatter-gather query), which improves efficiency. Randomizing writes is unrelated to query patterns, composite keys are not always necessary, and the need for indexes is a separate concern from this alignment.
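A minimal routing sketch, assuming hash-based placement on `user_id` across four shards; `shard_for` and `shards_to_query` are hypothetical helpers, not any database's router API. A filter that includes the shard key resolves to a single shard, while a filter without it has to be sent to all of them.

```python
import hashlib

NUM_SHARDS = 4          # hypothetical cluster size
SHARD_KEY = "user_id"   # the field the data is sharded on

def shard_for(value):
    """Stable hash placement (illustrative; real databases use their own scheme)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def shards_to_query(filters):
    """Return the shards a router must contact for an equality-filter query.

    If the filter pins the shard key, only the shard owning that value can hold
    matches (a targeted query). Otherwise every shard must be asked and the
    partial results merged (a scatter-gather query).
    """
    if SHARD_KEY in filters:
        return [shard_for(filters[SHARD_KEY])]
    return list(range(NUM_SHARDS))

print(shards_to_query({"user_id": 42, "status": "active"}))  # a single shard
print(shards_to_query({"email": "a@example.com"}))           # [0, 1, 2, 3]
```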
Which characteristic best describes an effective shard key for a scalable system?
Explanation: A good shard key has many distinct values (high cardinality) and matches common query parameters. Low cardinality causes skew, a constant value places all data on a single shard, and assigning values randomly per day ensures neither balanced splitting nor efficient querying.
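One rough way to screen candidate keys is to profile their values before sharding. The sketch below assumes you can sample a column's values; `key_profile`, `looks_shardable`, and the thresholds (at least ten distinct values per shard, no single value above 5% of rows) are hypothetical rules of thumb, not limits taken from any particular database.

```python
from collections import Counter

def key_profile(values):
    """Summarize a candidate shard key: its cardinality and the share held by its most common value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        "cardinality": len(counts),
        "top_value_share": max(counts.values()) / total,
    }

def looks_shardable(values, num_shards, min_values_per_shard=10, max_top_share=0.05):
    """Hypothetical heuristic: many more distinct values than shards, and no dominant value."""
    profile = key_profile(values)
    return (profile["cardinality"] >= min_values_per_shard * num_shards
            and profile["top_value_share"] <= max_top_share)

user_ids = list(range(1_000))
constant = ["same"] * 1_000
status = ["active"] * 700 + ["inactive"] * 300

print(key_profile(status))            # {'cardinality': 2, 'top_value_share': 0.7}
print(looks_shardable(user_ids, 8))   # True
print(looks_shardable(constant, 8))   # False
print(looks_shardable(status, 8))     # False
```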
What risk arises when selecting a monotonically increasing field, like a timestamp, as the shard key?
Explanation: Under range-based sharding, a monotonically increasing key sends every new insert into the highest key range, so a single shard absorbs all new writes and becomes a bottleneck that limits write scalability. Complete query failure and simultaneous shard splitting are not typical outcomes of this choice, and shard key selection alone does not generally cause data loss.
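A small simulation shows the effect under range-based sharding. The four fixed shard ranges and the helper `shard_for_range` are hypothetical; real systems split and migrate ranges dynamically, but the range covering the largest key values still receives every new insert.

```python
import bisect

# Hypothetical range-based layout: shard i owns keys in [LOWER_BOUNDS[i], LOWER_BOUNDS[i + 1]),
# and the last shard's range is open-ended.
LOWER_BOUNDS = [0, 1_000, 2_000, 3_000]

def shard_for_range(key):
    """Return the index of the shard whose key range contains `key`."""
    return bisect.bisect_right(LOWER_BOUNDS, key) - 1

# Simulate 1,000 new inserts with ever-increasing keys, like timestamps taken at insert time.
latest_existing_key = 3_500
new_keys = [latest_existing_key + i for i in range(1, 1_001)]
targets = {shard_for_range(k) for k in new_keys}
print(targets)  # {3}: every new write hits the last shard
```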
Which approach may help avoid hot spots caused by range-based placement of a string shard key, such as a product code?
Explanation: Hashing the full string spreads values pseudo-randomly across shards, so sequential or alphabetically adjacent codes no longer land on the same shard. The trade-off is that a range scan over the codes must then query every shard. Sorting alphabetically, or using only part of the code (such as the first three characters or the vowels), may still distribute data unevenly if those portions do not vary enough.
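A minimal comparison, assuming a hypothetical catalog in which 90% of product codes share the prefix "ABC" and 8 shards are used; `hashed_shard` and `prefix_shard` are illustrative helpers, with MD5 standing in for a database's own hash function. Hashing the whole code spreads the records evenly, while hashing only the first three characters sends every "ABC" code to one shard.

```python
import hashlib
from collections import Counter

def hashed_shard(product_code, num_shards):
    """Hash the full code to a shard index (MD5 for spread, not security)."""
    digest = hashlib.md5(product_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def prefix_shard(product_code, num_shards, prefix_len=3):
    """Hash only the first characters, so codes sharing a prefix always collide."""
    return hashed_shard(product_code[:prefix_len], num_shards)

# Hypothetical catalog: sequentially issued codes, most of them under one prefix.
codes = [f"ABC{n:05d}" for n in range(9_000)] + [f"XYZ{n:05d}" for n in range(1_000)]

full = Counter(hashed_shard(c, 8) for c in codes)
prefix = Counter(prefix_shard(c, 8) for c in codes)
print("full-code hash:", sorted(full.values()))    # roughly 1250 per shard
print("prefix hash:   ", sorted(prefix.values()))  # one shard holds at least 9000
```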
Once data is sharded using an initial shard key, what is usually required to change the shard key afterward?
Explanation: Changing the shard key typically requires redistributing existing data across the shards to match the new key, which is a complex and potentially costly operation. Renaming a field or deleting data does not change the shard key, and in most systems the process is not automatic.
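To see why resharding is expensive, this sketch counts how many records would have to change shards if a hypothetical `orders` collection, currently hash-sharded on `customer_id`, were resharded on `order_id`. The helpers and data are made up for illustration, and real systems handle the actual copy in their own way; the point is the scale: with 8 shards, roughly 7 of every 8 records end up on a different shard.

```python
import hashlib

def shard_for(value, num_shards):
    """Stable hash placement (illustrative, not a specific database's algorithm)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def records_to_move(records, old_key, new_key, num_shards):
    """Return records whose shard under `new_key` differs from their shard under `old_key`."""
    return [r for r in records
            if shard_for(r[old_key], num_shards) != shard_for(r[new_key], num_shards)]

# Hypothetical orders: 10,000 orders from 500 customers, currently sharded on customer_id.
orders = [{"order_id": i, "customer_id": f"cust-{i % 500}"} for i in range(10_000)]
moving = records_to_move(orders, "customer_id", "order_id", num_shards=8)
print(f"{len(moving)} of {len(orders)} records must move to another shard")
```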
How does low cardinality of a shard key, such as a 'Gender' field, typically affect distribution?
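Explanation: Because records with the same shard-key value stay together, a key with only a few unique values can place data on at most as many shards as it has distinct values, so a few shards carry the whole load while the rest sit idle, which undermines the purpose of sharding. Low cardinality improves neither balance nor query speed, and storage needs are not directly affected by key diversity.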
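The ceiling that low cardinality puts on distribution can be shown directly. The sketch assumes hash-based placement over 16 shards; the two-valued field and the helper `usable_shards` are hypothetical.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical cluster size

def shard_for(value):
    """Stable hash placement; every record with the same key value maps to the same shard."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def usable_shards(distinct_values):
    """Set of shards that can receive any data for the given distinct key values."""
    return {shard_for(v) for v in distinct_values}

low = usable_shards(["value_a", "value_b"])  # a field with only two distinct values
high = usable_shards(range(10_000))          # a high-cardinality ID field
print(f"low cardinality: {len(low)} of {NUM_SHARDS} shards usable")    # at most 2
print(f"high cardinality: {len(high)} of {NUM_SHARDS} shards usable")
```

Adding shards does not help the two-valued key: however many shards exist, at most two of them can ever hold data.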
In a multi-tenant architecture, which shard key strategy helps separate data per tenant while ensuring balanced distribution?
Explanation: Combining the tenant ID with a second field that varies within each tenant keeps each tenant's data logically grouped while letting a large tenant's data span several shards instead of overloading one. Using only the tenant ID could cause skew when some tenants are much larger than others, while giving every record the same key value or assigning keys randomly per session prevents logical grouping, balancing, or both.
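A minimal sketch of the compound approach, assuming hash-based placement over 8 shards and a hypothetical workload in which one tenant owns 90% of the rows. The second key component here is a bucket derived from each record's ID (`BUCKETS_PER_TENANT` sub-keys per tenant); that bucketing scheme and the helper names are illustrative assumptions, and any field that varies within a tenant could play the same role.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 8
BUCKETS_PER_TENANT = 16  # hypothetical: number of sub-keys each tenant's rows are split into

def stable_hash(*parts):
    """Stable 64-bit hash of one or more key components (MD5 for spread, not security)."""
    data = "|".join(str(p) for p in parts).encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")

def shard_tenant_only(rec):
    """Shard key = tenant_id: all of a tenant's rows share one shard."""
    return stable_hash(rec["tenant_id"]) % NUM_SHARDS

def shard_tenant_compound(rec):
    """Shard key = (tenant_id, bucket): a tenant's rows spread over up to BUCKETS_PER_TENANT sub-keys."""
    bucket = stable_hash(rec["record_id"]) % BUCKETS_PER_TENANT
    return stable_hash(rec["tenant_id"], bucket) % NUM_SHARDS

# Hypothetical workload: tenant "big" owns 9,000 rows; seven small tenants share the other 1,000.
records = [
    {"tenant_id": "big" if i % 10 else f"small-{i % 7}", "record_id": i}
    for i in range(10_000)
]

tenant_only = Counter(shard_tenant_only(r) for r in records)
compound = Counter(shard_tenant_compound(r) for r in records)
big_shards = {shard_tenant_compound(r) for r in records if r["tenant_id"] == "big"}
print("tenant_id only:", max(tenant_only.values()), "rows on the busiest shard")  # at least 9000
print("compound key:  ", max(compound.values()), "rows on the busiest shard")
print("shards holding the big tenant:", len(big_shards))
```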
Which shard key choice best supports efficient range queries on a 'date' field without causing hot spots?
Explanation: Combining a hashed user ID with the date spreads new records across shards, because each day's writes come from many users, while date-range queries for a given user still go to a single shard. Using only the date as the shard key sends sequential inserts to one shard and creates hot spots, while random UUIDs and unrelated fields such as the title do not support efficient date-range queries.
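The sketch below models this with hash-based placement on the user ID, while each shard keeps its rows ordered by the compound key (hashed user ID, date). The four shards, the in-memory lists, and the helpers `insert` and `user_date_range` are hypothetical stand-ins for a real database. One day's writes come from many users and therefore spread across shards, and a date-range query for one user is answered from a single shard with two binary searches.

```python
import bisect
import hashlib
from datetime import date, timedelta

NUM_SHARDS = 4  # hypothetical cluster size

def user_hash(user_id):
    """Stable 64-bit hash of a user ID (MD5 for spread, not security)."""
    return int.from_bytes(hashlib.md5(str(user_id).encode("utf-8")).digest()[:8], "big")

# Each shard holds rows sorted by the compound key (hashed user ID, date).
shards = [[] for _ in range(NUM_SHARDS)]

def insert(user_id, day, payload):
    """Place the row by its hashed user ID, keeping the shard ordered by (hash, date)."""
    h = user_hash(user_id)
    bisect.insort(shards[h % NUM_SHARDS], (h, day, payload))

def user_date_range(user_id, start, end):
    """Return (day, payload) rows for one user with start <= day <= end, reading one shard."""
    h = user_hash(user_id)
    rows = shards[h % NUM_SHARDS]
    lo = bisect.bisect_left(rows, (h, start))
    hi = bisect.bisect_left(rows, (h, end + timedelta(days=1)))
    return [(day, payload) for _, day, payload in rows[lo:hi]]

# Hypothetical data: 20 users, one event per user per day for 30 days.
first_day = date(2024, 1, 1)
for user in range(20):
    for offset in range(30):
        insert(user, first_day + timedelta(days=offset), f"event-{user}-{offset}")

# A single day's writes land on several shards instead of one "latest date" shard.
shards_for_one_day = {user_hash(u) % NUM_SHARDS for u in range(20)}
print(len(shards_for_one_day), "shards receive writes for a single day")

rows = user_date_range(7, date(2024, 1, 5), date(2024, 1, 9))
print(len(rows))  # 5
```

A date-range query across all users would still need to visit every shard; this compound key makes per-user date ranges cheap, not global ones.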