Explore the essential concepts of sharding and replication in distributed databases with this fundamentals quiz. Enhance your understanding of data partitioning, fault tolerance, and scalability strategies used to manage large-scale database systems.
Which of the following best describes sharding in the context of databases?
Explanation: Sharding involves splitting data into smaller, independent segments, each stored on a different server. This improves scalability and performance because each shard holds only part of the data and serves only part of the load. Duplicating data is related to replication, not sharding. Compressing data and table merging are unrelated to the concept of sharding in databases.
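As a rough illustration of the idea above, the following sketch routes each record to one of several shards with a stable hash. The shard count, the `shard_for`, `put`, and `get` helpers, and the in-memory dicts standing in for servers are all assumptions made for this example, not part of any particular database.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count chosen for the example


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each dict stands in for an independent server holding one shard.
shards = [dict() for _ in range(NUM_SHARDS)]


def put(key: str, value: dict) -> None:
    """Store the record only on the shard that owns its key."""
    shards[shard_for(key)][key] = value


def get(key: str):
    """Look up the record on the single shard that owns its key."""
    return shards[shard_for(key)].get(key)


put("user:1001", {"name": "Ada"})
put("user:1002", {"name": "Grace"})
```

Each record lives on exactly one shard, so no data is duplicated; that is what separates this from replication.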
What is the main purpose of implementing replication in a database system?
Explanation: Replication creates additional copies of the same data on different servers, promoting data availability and fault tolerance in case one server fails. Minimizing network traffic and combining tables are not primary goals of replication. Enforcing unique constraints is typically handled through schema design, not through replication.
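The fault-tolerance benefit can be sketched in a few lines. This is a simplified model, assuming synchronous copying to every server and a hypothetical `Server` class with an `alive` flag; real systems handle failure detection and recovery in far more detail.

```python
class Server:
    """Hypothetical stand-in for a database server holding a full copy."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


def write_all(servers, key, value):
    """Copy the same value to every live server (simplified, synchronous)."""
    for server in servers:
        if server.alive:
            server.data[key] = value


def read_any(servers, key):
    """Return the value from the first live server that has the key."""
    for server in servers:
        if server.alive and key in server.data:
            return server.data[key]
    raise KeyError(key)


servers = [Server("a"), Server("b"), Server("c")]
write_all(servers, "order:7", "shipped")

servers[0].alive = False  # simulate one server failing
result = read_any(servers, "order:7")  # still answered by a surviving copy
```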
If an e-commerce database stores all users with last names starting with A-M on one server and N-Z on another, what practice is this an example of?
Explanation: Dividing users based on last name across different servers is an example of sharding, where data is split into distinct partitions (shards). Replication would involve duplicating the same data on multiple servers, not dividing it. Compression and sorting deal with data format and order, not distribution.
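The A-M / N-Z split from the question is a range-based shard rule. A minimal sketch, assuming hypothetical server names and ASCII last names only:

```python
def shard_for_last_name(last_name: str) -> str:
    """Route last names starting with A-M to one server and N-Z to another."""
    first = last_name.strip()[:1].upper()
    if not (first.isascii() and first.isalpha()):
        raise ValueError(f"cannot route last name: {last_name!r}")
    # Server names are placeholders for this example.
    return "server_a_m" if first <= "M" else "server_n_z"
```

For example, `shard_for_last_name("Miller")` selects the A-M server and `shard_for_last_name("Nguyen")` selects the N-Z server. Each user is stored on exactly one of the two.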
Which type of replication involves one primary server sending updates to multiple secondary servers?
Explanation: Master-slave replication (also called primary-secondary or primary-replica replication) has a primary server (master) that propagates updates to secondary servers (slaves), which hold redundant copies of the data. Peer-to-peer replication involves all nodes acting as equals rather than designating a master. Client-server and circular replication describe other arrangements and do not name this one-primary, many-secondaries relationship.
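The one-primary, many-secondaries flow can be sketched as follows. The `Primary` and `Secondary` classes are hypothetical, and updates are pushed synchronously for simplicity; production systems commonly ship a log of changes instead.

```python
class Secondary:
    """Receives updates from the primary; never accepts writes directly."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    """Accepts all writes and forwards each one to every secondary."""

    def __init__(self, secondaries):
        self.data = {}
        self.secondaries = secondaries

    def write(self, key, value):
        self.data[key] = value
        for secondary in self.secondaries:
            secondary.apply(key, value)


replicas = [Secondary(), Secondary()]
primary = Primary(replicas)
primary.write("sku:42", 19.99)
```

Writes enter through a single node, which is the difference from a peer-to-peer arrangement where any node may accept them.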
When all rows for a subset of users are stored on one server, this is known as horizontal sharding. What is vertical sharding?
Explanation: Vertical sharding splits a table by columns, storing different attributes on separate servers, while horizontal sharding divides data by rows. Duplicating tables is replication, not sharding. Sorting or encryption methods are unrelated to the structure of vertical sharding.
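To make the row-versus-column distinction concrete, here is a small sketch that splits the same table both ways. The sample rows, column names, and helper functions are invented for illustration.

```python
rows = [
    {"id": 1, "name": "Ada", "email": "ada@example.com", "bio": "Mathematician"},
    {"id": 2, "name": "Grace", "email": "grace@example.com", "bio": "Admiral"},
    {"id": 3, "name": "Alan", "email": "alan@example.com", "bio": "Logician"},
]


def horizontal_split(rows, num_shards):
    """Horizontal sharding: whole rows are distributed across shards."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[row["id"] % num_shards].append(row)
    return shards


def vertical_split(rows, column_groups, key="id"):
    """Vertical sharding: each column group becomes its own table.

    Every piece keeps the key column so rows can be matched up again.
    """
    return [
        [{key: row[key], **{col: row[col] for col in cols}} for row in rows]
        for cols in column_groups
    ]


by_rows = horizontal_split(rows, 2)
by_columns = vertical_split(rows, [["name", "email"], ["bio"]])
```

In `by_rows` each shard holds complete records for some users; in `by_columns` each piece holds every user but only some attributes.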
What problem might occur if there is a delay in applying updates on a replica server compared to the primary?
Explanation: Replication lag occurs when replica servers are delayed in reflecting the most recent updates from the primary server, so reads from a replica may return outdated data. Indexing lag concerns how quickly indexes are updated and is not about synchronization between primary and replica. Sharding delay and partial commit are not terms used for this particular issue.
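A stale read caused by lag can be reproduced with a toy model. This sketch assumes the primary keeps an ordered update log and the replica applies it only when `catch_up` is called; the class and method names are hypothetical.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered (key, value) updates

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class LaggingReplica:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def catch_up(self, primary):
        """Apply every log entry the replica has not yet seen."""
        for key, value in primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.log)

    def lag(self, primary):
        """Lag measured as the count of unapplied log entries."""
        return len(primary.log) - self.applied


primary = Primary()
replica = LaggingReplica()

primary.write("balance", 100)
replica.catch_up(primary)

primary.write("balance", 250)      # replica has not applied this yet
stale = replica.data["balance"]    # outdated value served by the replica
lag_before = replica.lag(primary)

replica.catch_up(primary)
fresh = replica.data["balance"]    # replica now matches the primary
```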
In a replicated system, what term describes when all copies of the data contain the same information?
Explanation: Consistency refers to all copies of data being the same across replicas, ensuring users receive accurate information. Redundancy refers to the existence of multiple copies, not their correctness. Concurrency is about simultaneous operations, and parity is a term from error checking (or the even/odd property of a number), not a description of replicas holding the same data.
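One simple way to check whether copies agree is to compare a fingerprint of each replica's contents. The sketch below assumes replicas are small, JSON-serializable dicts; the `fingerprint` and `replicas_consistent` helpers are invented for this example.

```python
import hashlib
import json


def fingerprint(data: dict) -> str:
    """Hash the contents in a key-order-independent way."""
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def replicas_consistent(replicas) -> bool:
    """True when every replica holds exactly the same data."""
    return len({fingerprint(replica) for replica in replicas}) <= 1
```

Note that three replicas holding different values are still redundant (there are multiple copies) but not consistent.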
What is a potential challenge when querying data that is sharded by user ID across several servers?
Explanation: In sharded systems, performing joins across multiple shards can be challenging and may result in slower performance, because the rows that must be combined live on different servers and have to be gathered over the network. Data is not inherently out of sync or duplicated in sharding. Indexes can still be used within individual shards.
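The cost difference shows up even in a toy model. In this sketch (sample users, shard count, and function names are all assumptions), a lookup by the shard key touches one shard, while a query on any other attribute has to visit every shard and merge the results.

```python
NUM_SHARDS = 3
shards = [[] for _ in range(NUM_SHARDS)]


def insert(user):
    """Place the user on the shard chosen by user_id (the shard key)."""
    shards[user["user_id"] % NUM_SHARDS].append(user)


def find_by_user_id(user_id):
    """Single-shard lookup: the shard key tells us where to look."""
    for user in shards[user_id % NUM_SHARDS]:
        if user["user_id"] == user_id:
            return user
    return None


def find_by_country(country):
    """Scatter-gather: every shard is queried and results are merged.

    Returns (matching users sorted by user_id, number of shards visited).
    """
    matches = []
    visited = 0
    for shard in shards:
        visited += 1
        matches.extend(u for u in shard if u["country"] == country)
    return sorted(matches, key=lambda u: u["user_id"]), visited


for uid, country in [(1, "DE"), (2, "US"), (3, "DE"), (4, "FR"), (5, "US"), (6, "DE")]:
    insert({"user_id": uid, "country": country})

german_users, shards_visited = find_by_country("DE")
```

Joins behave like the second function: unless both sides are sharded on the join key, data has to be pulled from several servers before it can be combined.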
How can replication help scale out read operations in a distributed database system?
Explanation: Replication allows read operations to be distributed across several replica servers, increasing the system's read capacity and performance. Deleting data, partitioning tables, and reducing indexes do not directly support scaling read operations via replication.
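Read scaling can be sketched as a router that sends writes to the primary and rotates reads across replicas. This is a minimal model with round-robin selection and synchronous propagation; the `ReadRouter` class is hypothetical, and real routers also account for replica health and lag.

```python
import itertools
from collections import Counter


class ReadRouter:
    """Sends writes to the primary and spreads reads over the replicas."""

    def __init__(self, primary: dict, replicas: list):
        self.primary = primary
        self.replicas = replicas
        self._next = itertools.cycle(range(len(replicas)))
        self.reads_per_replica = Counter()

    def write(self, key, value):
        self.primary[key] = value
        for replica in self.replicas:  # synchronous copy, for simplicity
            replica[key] = value

    def read(self, key):
        index = next(self._next)  # round-robin choice of replica
        self.reads_per_replica[index] += 1
        return self.replicas[index].get(key)


router = ReadRouter(primary={}, replicas=[{}, {}, {}])
router.write("title", "Sharding 101")
results = [router.read("title") for _ in range(9)]
```

With three replicas, each one serves a third of the reads, so adding replicas raises total read capacity while writes still go through the primary.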
Why is the choice of a good shard key important in sharding a database?
Explanation: A well-chosen shard key ensures balanced data and query workloads, preventing bottlenecks and hot spots on any single shard. Shard keys do not influence replication directly, do not inherently increase latency, and have no connection with disabling write operations.
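The effect of the shard key on balance can be seen by hashing the same records on two different fields. The synthetic records below (90% of users in one country) and the helper names are assumptions chosen to make the skew obvious.

```python
import hashlib
from collections import Counter


def shard_of(value, num_shards):
    """Stable hash of a key value onto a shard index."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribution(records, key, num_shards):
    """Count how many records land on each shard for a given shard key."""
    return Counter(shard_of(record[key], num_shards) for record in records)


# Synthetic data: 900 users in "US", 100 in "CA".
records = [
    {"user_id": i, "country": "CA" if i % 10 == 0 else "US"}
    for i in range(1000)
]

by_country = distribution(records, "country", 4)  # low-cardinality key
by_user_id = distribution(records, "user_id", 4)  # high-cardinality key
```

Sharding on `country` puts at least 900 of the 1000 records on a single shard, a hot spot, while sharding on `user_id` spreads them roughly evenly across all four.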