Explore the fundamental differences between database replication and sharding with these essential questions. This quiz helps clarify concepts, use cases, and terminology related to data distribution and fault tolerance.
Which primary purpose correctly distinguishes sharding from replication in database systems?
Explanation: Sharding is used to split and distribute data across multiple machines to achieve scalability, while replication creates copies of the same data to improve availability and fault tolerance. Option B reverses the definitions, making it incorrect. Option C misrepresents replication's role and incorrectly states that sharding increases redundancy. Option D introduces unrelated concepts of deletion and data addition, which are not accurate for either technique.
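The contrast can be sketched in a few lines of Python. This is only an illustrative sketch: plain dictionaries stand in for servers, and the `replicate` and `shard` helpers are hypothetical names, not part of any database product.

```python
# Hypothetical in-memory illustration; plain dicts stand in for servers.
records = {"alice": 1, "bob": 2, "carol": 3, "dave": 4}

def replicate(data, n_servers):
    """Replication: every server receives a full copy of the data."""
    return [dict(data) for _ in range(n_servers)]

def shard(data, n_servers):
    """Sharding: each record lands on exactly one server."""
    servers = [{} for _ in range(n_servers)]
    for i, (key, value) in enumerate(sorted(data.items())):
        # Round-robin split, chosen only to keep the example short.
        servers[i % n_servers][key] = value
    return servers

replicas = replicate(records, 2)
shards = shard(records, 2)

print([len(s) for s in replicas])  # every replica holds all records
print([len(s) for s in shards])    # each shard holds only a subset
```

Real systems usually split data by a shard key rather than round-robin, but the shape of the result is the same: replicas are identical, shards are disjoint.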
In a replication setup, what is true about the location of data copies?
Explanation: In replication, each replica stores a complete copy of the data on different machines to enhance availability. Option B describes sharding, where data is partitioned across servers. Option C incorrectly suggests replicas store only recent data, which is not a feature of replication. Option D is misleading, since replication is meant to work across multiple machines, not within one.
If user data is spread across three servers such that each server holds data for a different group of users, which technique is being applied?
Explanation: Sharding divides data among servers based on criteria like user groups or regions to distribute the load. Replication copies the entire dataset to several servers, not just parts. Mirroring is similar to replication but is not the same as distributing different portions of data. Indexing is unrelated to data distribution—it organizes data for faster searches.
Which technique primarily helps a database remain accessible even if a server fails?
Explanation: Replication increases data availability by keeping identical copies on multiple servers, so the system can continue serving requests if one server fails. Sharding distributes the data, but losing a shard makes that shard's portion of the data unavailable unless the shard is itself replicated. Buffering and partitioning are different mechanisms that do not guarantee high availability when a server fails.
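A minimal sketch of the failover idea, assuming a hypothetical `Server` class with an `online` flag and a `read_with_failover` helper (both invented for illustration): the read succeeds as long as at least one replica is still reachable.

```python
class Server:
    """Toy stand-in for a database server holding a full copy of the data."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.online = True

    def get(self, key):
        if not self.online:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]

def read_with_failover(replicas, key):
    """Try each replica in turn and return the first successful answer."""
    for server in replicas:
        try:
            return server.get(key)
        except ConnectionError:
            continue  # this replica is unreachable; try the next one
    raise ConnectionError("no replica available")

data = {"order-1": "shipped"}
replicas = [Server("replica-a", data), Server("replica-b", data)]
replicas[0].online = False  # simulate a server failure
status = read_with_failover(replicas, "order-1")
```

With a single unreplicated shard in place of the replica list, the same failure would leave the key unreachable.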
When a database is sharded by region, what happens when a user from Asia queries their data?
Explanation: In sharding, only the shard that holds the relevant data processes the query, which reduces load on the other shards. Option B incorrectly describes full dataset replication, not sharding. Option C confuses sharding with replication terminology. Option D is false, as the query should succeed as long as the correct shard is online.
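As a rough sketch of region-based routing, the snippet below keeps one dictionary per region and records which shards a query contacts. The region names, user IDs, and the `query_user` function are assumptions made up for this example.

```python
# Hypothetical region-keyed shards; names and contents are illustrative only.
SHARDS = {
    "asia": {"user_asia_1": "profile A"},
    "europe": {"user_eu_1": "profile B"},
    "americas": {"user_us_1": "profile C"},
}

def query_user(region, user_id, touched):
    """Look up a user on the shard for their region only.

    `touched` records which shards were contacted, so the routing is visible.
    """
    touched.append(region)
    return SHARDS[region].get(user_id)

touched = []
result = query_user("asia", "user_asia_1", touched)
```

After the call, `touched` contains only the Asia shard; the Europe and Americas shards did no work for this query.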
In replicated databases, what challenge can occur when handling write operations?
Explanation: If writes are not propagated to every replica promptly, replicas can fall out of sync and return different results for the same query. Data loss is rare with proper configuration. Replication does not cause queries to work only for old data, and indexes are still important regardless of replication.
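The inconsistency window can be illustrated with a toy asynchronous setup. This is a sketch under stated assumptions: the `Primary` class, its `pending` queue, and the explicit `sync()` step are hypothetical simplifications, not how any particular database ships its writes.

```python
class Primary:
    """Toy primary that applies writes locally and ships them to followers later."""

    def __init__(self, followers):
        self.data = {}
        self.pending = []          # writes not yet sent to followers
        self.followers = followers # each follower is a plain dict here

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def sync(self):
        """Apply all pending writes to every follower, in order."""
        for key, value in self.pending:
            for follower in self.followers:
                follower[key] = value
        self.pending.clear()

follower = {}
primary = Primary([follower])

primary.write("balance", 100)
stale_read = follower.get("balance")   # follower has not seen the write yet
primary.sync()
fresh_read = follower.get("balance")   # follower has now caught up
```

Between `write` and `sync`, a read served by the follower disagrees with the primary. Synchronous replication closes that window at the cost of slower writes.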
What is the main performance benefit provided by sharding in large databases?
Explanation: Sharding splits the data across servers, allowing workloads to be distributed and balanced as the dataset grows. Option B describes replication, not sharding. Option C contradicts sharding, which is about distributing data, not merging it. Option D is unrelated; encryption is a separate feature.
Which statement best applies to write operations in a sharded environment?
Explanation: In sharding, a write operation is directed to the shard responsible for that data segment, improving efficiency. Broadcasting writes to every shard is inefficient and incorrect. Routing all writes through a single primary replica is a concept from some replication models, not sharding. Ignoring write requests is not acceptable behavior for any robust data system.
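One common way to pick the responsible shard is to hash the shard key. The sketch below assumes four in-memory shards and hypothetical `shard_for` and `write` helpers; it uses `hashlib` rather than Python's built-in `hash()`, because string hashing is randomized per process and would not give stable routing.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for the example
shards = [{} for _ in range(NUM_SHARDS)]

def shard_for(shard_key: str) -> int:
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def write(shard_key: str, value) -> int:
    """Store the value on exactly one shard and return that shard's index."""
    index = shard_for(shard_key)
    shards[index][shard_key] = value
    return index

index = write("user-42", {"name": "Ada"})
holders = [i for i, s in enumerate(shards) if "user-42" in s]
```

Only one shard ends up holding the record. A drawback of plain modulo hashing is that changing `NUM_SHARDS` remaps most keys, which is why production systems often use range-based or consistent-hashing schemes instead.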
In common usage, a 'replica set' relates to replication, while a 'shard key' is associated with which concept?
Explanation: A shard key determines how data is split among the shards and is a key concept in sharding. Mirroring and replication can share datasets, but the term 'shard key' is not used with them. Caching is not related to data distribution techniques like sharding or replication.
For a system that needs to handle massive data growth while keeping query response times low, which technique is more suitable?
Explanation: Sharding is designed to scale both data storage and performance by distributing data across multiple servers. Replication improves availability and redundancy but does not by itself address large data volumes, because every replica still stores the full dataset. Snapshotting and archiving are backup strategies, not distribution techniques to improve query performance.
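A back-of-the-envelope calculation makes the storage difference concrete. The `per_server_gb` function and the 900 GB figure are hypothetical, and the sharding branch assumes a perfectly even split, which real shard keys rarely achieve.

```python
def per_server_gb(total_gb: float, n_servers: int, strategy: str) -> float:
    """Estimate how much data each server must hold under each strategy."""
    if strategy == "replication":
        # Every replica stores the whole dataset, regardless of server count.
        return total_gb
    if strategy == "sharding":
        # Assumes data is spread evenly across shards.
        return total_gb / n_servers
    raise ValueError(f"unknown strategy: {strategy}")

replicated = per_server_gb(900, 3, "replication")
sharded = per_server_gb(900, 3, "sharding")
```

Adding servers lowers the per-server load only in the sharded case; with replication, each added server carries the full dataset again.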