Sharding and Replication Fundamentals Quiz

Explore the essential concepts of sharding and replication in distributed databases with this fundamentals quiz. Enhance your understanding of data partitioning, fault tolerance, and scalability strategies used to manage large-scale database systems.

  1. Definition of Sharding

    Which of the following best describes sharding in the context of databases?

    1. Dividing data into smaller, independent segments across multiple servers
    2. Merging two tables into one
    3. Creating duplicate copies of whole databases on different servers
    4. Compressing data to save disk space

    Explanation: Sharding splits data into smaller, independent segments (shards), each stored on a different server. Because each shard holds only part of the data and can be managed separately, storage and load are spread across machines, which improves scalability and performance. Duplicating data is replication, not sharding. Compressing data and merging tables are unrelated to sharding.
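    As an illustration, here is a minimal Python sketch of hash-based sharding. It assumes three hypothetical shards named `shard-0` to `shard-2` and string record keys; real systems differ in how they pick a shard. Each key is hashed to exactly one shard, so every record is stored once and the shards together hold the full dataset.

```python
import hashlib

# Hypothetical shard names used only for illustration.
SHARDS = ["shard-0", "shard-1", "shard-2"]


def shard_for(key: str, shards: list[str] = SHARDS) -> str:
    """Return the single shard that owns `key`, using a stable hash of the key."""
    # hashlib gives the same digest in every process; Python's built-in hash()
    # for strings is randomized per process, so it is unsuitable for routing.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# Split a set of user IDs into independent segments, one list per shard.
placement: dict[str, list[str]] = {shard: [] for shard in SHARDS}
for user_id in ["u1", "u2", "u3", "u4", "u5", "u6"]:
    placement[shard_for(user_id)].append(user_id)

print(placement)
```

    A plain modulo placement like this moves most keys when the number of shards changes, which is why production systems often use consistent hashing or a lookup directory instead.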

  2. Objective of Replication

    What is the main purpose of implementing replication in a database system?

    1. To improve data availability and fault tolerance
    2. To automatically combine multiple tables
    3. To enforce unique constraints on fields
    4. To minimize network traffic

    Explanation: Replication creates additional copies of the same data on different servers, promoting data availability and fault tolerance in case one server fails. Minimizing network traffic and combining tables are not primary goals of replication. Enforcing unique constraints is typically handled through schema design, not through replication.
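    To show why extra copies improve availability, the following minimal Python sketch keeps the same data on three toy servers and answers a read from any server that is still up. The `Server` and `ReplicatedStore` classes are hypothetical and ignore resynchronization of a server that comes back after missing writes.

```python
from typing import Optional


class Server:
    """A toy server holding a full copy of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}
        self.up = True


class ReplicatedStore:
    """Keeps the same data on every server so that reads survive a server failure."""

    def __init__(self, servers: list[Server]) -> None:
        self.servers = servers

    def write(self, key: str, value: str) -> None:
        # Simplification: copy the write to every server that is currently up.
        # A server that is down misses the write and would need to resync later.
        for server in self.servers:
            if server.up:
                server.data[key] = value

    def read(self, key: str) -> Optional[str]:
        # Serve the read from the first available copy.
        for server in self.servers:
            if server.up:
                return server.data.get(key)
        raise RuntimeError("no replica is available")


servers = [Server("a"), Server("b"), Server("c")]
store = ReplicatedStore(servers)
store.write("order:1", "shipped")
servers[0].up = False          # simulate a failure of server "a"
print(store.read("order:1"))   # prints "shipped", answered by server "b"
```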

  3. Sharding Example

    If an e-commerce database stores all users with last names starting with A-M on one server and N-Z on another, what practice is this an example of?

    1. Replication
    2. Compression
    3. Sorting
    4. Sharding

    Explanation: Dividing users based on last name across different servers is an example of sharding, where data is split into distinct partitions (shards). Replication would involve duplicating the same data on multiple servers, not dividing it. Compression and sorting deal with data format and order, not distribution.
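    The A-M / N-Z split from the question is range-based sharding. A minimal Python sketch, assuming hypothetical server names `server-1` and `server-2`, routes each user by the first letter of their last name:

```python
# Hypothetical range map: (first initial, last initial, server) for each shard.
RANGES = [("A", "M", "server-1"), ("N", "Z", "server-2")]


def server_for_last_name(last_name: str) -> str:
    """Route a user to a shard by the first letter of their last name."""
    initial = last_name.strip()[:1].upper()
    for low, high, server in RANGES:
        if low <= initial <= high:
            return server
    # Empty names and names starting with digits or accented letters fall outside both ranges.
    raise ValueError(f"no shard covers last name {last_name!r}")


print(server_for_last_name("Garcia"))  # server-1
print(server_for_last_name("Nguyen"))  # server-2
```

    Range sharding keeps alphabetically close names on the same server, which makes range scans cheap, but shards can become uneven if more names fall into one range than the other.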

  4. Replication Type Identification

    Which type of replication involves one primary server sending updates to multiple secondary servers?

    1. Peer-to-peer replication
    2. Master-slave replication
    3. Circular replication
    4. Client-server replication

    Explanation: In master-slave (also called primary-secondary) replication, a single primary server (master) accepts writes and propagates each update to the secondary servers (slaves), which hold copies of the data for backup and for serving reads. In peer-to-peer replication, all nodes act as equals and no master is designated. Circular replication passes updates around a ring of servers, and client-server describes a general system architecture; neither names this one-primary, many-secondaries relationship.
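    The following minimal Python sketch illustrates the one-primary, many-secondaries flow. The `Primary` and `Secondary` classes are hypothetical; the primary pushes each write synchronously for simplicity, whereas many real systems ship their change log to secondaries asynchronously.

```python
class Secondary:
    """Copy of the data that only changes when the primary sends it an update."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, int] = {}

    def apply(self, key: str, value: int) -> None:
        self.data[key] = value


class Primary:
    """The only server that accepts writes; it forwards each write to all secondaries."""

    def __init__(self, secondaries: list[Secondary]) -> None:
        self.data: dict[str, int] = {}
        self.log: list[tuple[str, int]] = []  # ordered record of every write
        self.secondaries = secondaries

    def write(self, key: str, value: int) -> None:
        self.data[key] = value
        self.log.append((key, value))
        # Synchronous push for simplicity; many systems ship the log asynchronously.
        for secondary in self.secondaries:
            secondary.apply(key, value)


replicas = [Secondary("s1"), Secondary("s2")]
primary = Primary(replicas)
primary.write("balance:7", 100)
primary.write("balance:7", 80)
print([r.data for r in replicas])  # [{'balance:7': 80}, {'balance:7': 80}]
```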

  5. Horizontal vs. Vertical Sharding

    When all rows for a subset of users are stored on one server, this is known as horizontal sharding. What is vertical sharding?

    1. Storing encrypted and unencrypted data separately
    2. Sorting data alphabetically
    3. Duplicating the same table across servers
    4. Storing different columns of a table on separate servers

    Explanation: Vertical sharding splits a table by columns, storing different attributes on separate servers, while horizontal sharding divides data by rows. Duplicating tables is replication, not sharding. Sorting and encryption are unrelated to how vertical sharding divides a table.
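    A minimal Python sketch of vertical sharding, assuming a hypothetical `users` table and two hypothetical servers, each of which stores the primary key `id` plus its own group of columns. Keeping the key in every fragment is what allows full rows to be rebuilt.

```python
# Hypothetical column groups: each server keeps the primary key plus its own columns.
COLUMN_GROUPS = {
    "server-core": ["name", "email"],
    "server-profile": ["bio", "photo_url"],
}


def vertical_split(rows: list[dict], groups: dict[str, list[str]], key: str = "id") -> dict[str, list[dict]]:
    """Split each row by columns; every fragment keeps the key so rows can be rejoined."""
    shards: dict[str, list[dict]] = {server: [] for server in groups}
    for row in rows:
        for server, columns in groups.items():
            shards[server].append({key: row[key], **{col: row[col] for col in columns}})
    return shards


def reassemble(shards: dict[str, list[dict]], key: str = "id") -> dict:
    """Rebuild full rows by merging fragments that share the same key."""
    merged: dict = {}
    for fragments in shards.values():
        for fragment in fragments:
            merged.setdefault(fragment[key], {}).update(fragment)
    return merged


users = [
    {"id": 1, "name": "Ada", "email": "ada@example.com", "bio": "Engineer", "photo_url": "/img/1.png"},
    {"id": 2, "name": "Lin", "email": "lin@example.com", "bio": "Analyst", "photo_url": "/img/2.png"},
]
shards = vertical_split(users, COLUMN_GROUPS)
print(shards["server-core"][0])  # {'id': 1, 'name': 'Ada', 'email': 'ada@example.com'}
```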

  6. Replication Lag

    What is the name of the problem that occurs when a replica server applies updates later than the primary does?

    1. Indexing lag
    2. Replication lag
    3. Partial commit
    4. Sharding delay

    Explanation: Replication lag occurs when replica servers fall behind the primary in applying its most recent updates, so reads served by a replica may return outdated data. Indexing lag refers to delays in updating indexes, not to synchronization between a primary and its replicas. Sharding delay and partial commit are not terms used for this issue.
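    To make the effect visible, here is a minimal Python sketch of asynchronous replication in which the replica queues updates and applies them later. The `AsyncReplica` class and `write` helper are hypothetical, and lag is measured here as the number of unapplied updates; real systems more often report it as elapsed time or a log-position difference.

```python
from collections import deque


class AsyncReplica:
    """Replica that receives updates immediately but applies them later."""

    def __init__(self) -> None:
        self.data: dict[str, int] = {}
        self.pending: deque = deque()  # updates shipped but not yet applied

    def receive(self, key: str, value: int) -> None:
        self.pending.append((key, value))

    def apply_next(self) -> None:
        key, value = self.pending.popleft()
        self.data[key] = value

    def lag(self) -> int:
        # Lag measured as the number of unapplied updates.
        return len(self.pending)


primary: dict[str, int] = {}
replica = AsyncReplica()


def write(key: str, value: int) -> None:
    primary[key] = value          # the primary applies the write at once
    replica.receive(key, value)   # the replica only queues it


write("stock:42", 10)
replica.apply_next()              # replica catches up on the first write
write("stock:42", 9)              # second write is still pending on the replica
print(primary["stock:42"], replica.data["stock:42"], replica.lag())  # 9 10 1
```

    A client reading `stock:42` from the replica at this point sees the stale value 10 until the pending update is applied.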

  7. Data Consistency in Replication

    In a replicated system, what term describes the state in which all copies of the data contain the same information?

    1. Parity
    2. Concurrency
    3. Consistency
    4. Redundancy

    Explanation: Consistency means that all replicas hold the same data, so a read returns the same value no matter which copy serves it. Redundancy refers to the existence of multiple copies, not to whether they agree. Concurrency concerns simultaneous operations, and parity is an error-checking term (whether a count of bits is even or odd), not a measure of agreement between copies.
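    One way to check whether replicas are consistent is to compare digests of their contents instead of the full data. The minimal Python sketch below assumes each replica is a dict with string keys and JSON-serializable values; `fingerprint` and `replicas_consistent` are hypothetical helper names.

```python
import hashlib
import json


def fingerprint(data: dict) -> str:
    """Digest of a replica's contents; equal data gives equal digests."""
    # sort_keys makes the encoding independent of insertion order.
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def replicas_consistent(replicas: list[dict]) -> bool:
    """True when every replica holds exactly the same key-value pairs."""
    return len({fingerprint(r) for r in replicas}) <= 1


r1 = {"a": 1, "b": 2}
r2 = {"b": 2, "a": 1}  # same data, different insertion order
r3 = {"a": 1, "b": 3}  # diverged copy
print(replicas_consistent([r1, r2]))      # True
print(replicas_consistent([r1, r2, r3]))  # False
```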

  8. Challenge with Sharding

    What is a potential challenge when querying data that is sharded by user ID across several servers?

    1. Indexes cannot be used
    2. Data is always out of sync
    3. All data must be duplicated
    4. Cross-shard joins can be complex and slower

    Explanation: In sharded systems, performing joins across multiple shards can be challenging and may result in slower performance because the data is spread out. Data is not inherently out of sync or duplicated in sharding. Indexes can still be used within individual shards.
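    The cost of a cross-shard join can be sketched in Python as follows. The layout is hypothetical: users are sharded by `user_id` and orders by `order_id`, both with a simple modulo on three shards. Fetching one user is a single-shard lookup, but finding that user's orders means querying every order shard and merging the results in the application.

```python
NUM_SHARDS = 3


def shard_of(key: int) -> int:
    return key % NUM_SHARDS


# Users are sharded by user_id; orders are sharded by order_id.
user_shards: list[dict[int, str]] = [{} for _ in range(NUM_SHARDS)]
order_shards: list[list[tuple[int, int, float]]] = [[] for _ in range(NUM_SHARDS)]


def add_user(user_id: int, name: str) -> None:
    user_shards[shard_of(user_id)][user_id] = name


def add_order(order_id: int, user_id: int, total: float) -> None:
    order_shards[shard_of(order_id)].append((order_id, user_id, total))


def user_with_orders(user_id: int) -> tuple[str, list[tuple[int, int, float]], int]:
    """Join one user with their orders; also report how many shards were queried."""
    name = user_shards[shard_of(user_id)][user_id]  # one shard: users live by user_id
    shards_queried = 1
    orders: list[tuple[int, int, float]] = []
    for shard in order_shards:                      # scatter: orders are not placed by user_id
        shards_queried += 1
        orders.extend(row for row in shard if row[1] == user_id)
    return name, sorted(orders), shards_queried     # gather and merge in the application


add_user(1, "Ada")
add_user(2, "Lin")
add_order(10, 1, 5.0)
add_order(11, 1, 7.5)
add_order(12, 2, 3.0)
print(user_with_orders(1))  # ('Ada', [(10, 1, 5.0), (11, 1, 7.5)], 4)
```

    Sharding orders by `user_id` as well would place each user's orders on the same shard as the user, turning this join into a single-shard query; co-locating related data this way is a common design choice.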

  9. Replication for Read Scaling

    How can replication help scale out read operations in a distributed database system?

    1. By directing read queries to multiple replica servers
    2. By deleting old data regularly
    3. By partitioning tables into smaller pieces
    4. By reducing the number of indexes

    Explanation: Replication allows read operations to be distributed across several replica servers, increasing the system's read capacity and performance. Deleting data, partitioning tables, and reducing indexes do not directly support scaling read operations via replication.
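    A minimal Python sketch of read scaling: a hypothetical `ReadWriteRouter` sends writes to the primary and rotates reads round-robin across the replicas. Classifying statements by a leading `SELECT` is a deliberate simplification for illustration.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads across replicas in turn."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self._next_replica = itertools.cycle(replicas)  # round-robin over replicas

    def route(self, query: str) -> str:
        # Crude classification for illustration: treat SELECT statements as reads.
        if query.lstrip().upper().startswith("SELECT"):
            return next(self._next_replica)
        return self.primary


router = ReadWriteRouter("primary", ["replica-1", "replica-2", "replica-3"])
targets = [router.route("SELECT * FROM products") for _ in range(4)]
print(targets)  # ['replica-1', 'replica-2', 'replica-3', 'replica-1']
print(router.route("UPDATE products SET price = 5 WHERE id = 1"))  # primary
```

    Adding replicas raises read capacity, but every write still goes through the single primary, and reads served by a replica can be stale if it lags behind.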

  10. Shard Key Choice

    Why is the choice of a good shard key important in sharding a database?

    1. It determines how evenly data and queries are distributed across shards
    2. It disables write operations
    3. It improves how data is replicated
    4. It increases network latency

    Explanation: A well-chosen shard key ensures balanced data and query workloads, preventing bottlenecks and hot spots on any single shard. Shard keys do not directly influence replication, do not inherently increase latency, and have nothing to do with disabling write operations.
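    To see how the shard key affects balance, this minimal Python sketch hashes a hypothetical workload of 1,000 users (80% in one country) onto four shards, first by country and then by user ID. The workload and the `shard_by` helper are assumptions made for illustration.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4


def shard_by(value: str) -> int:
    """Map a shard-key value to a shard with a stable hash."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Hypothetical workload: 1,000 users, 80% of them in one country.
users = [(f"user-{i}", "US" if i % 10 < 8 else "CA") for i in range(1000)]

# Low-cardinality, skewed key: only two distinct values, so at most two shards get data.
by_country = Counter(shard_by(country) for _, country in users)
# High-cardinality key: user IDs spread roughly evenly over all shards.
by_user_id = Counter(shard_by(user_id) for user_id, _ in users)

print(sorted(by_country.values()))
print(sorted(by_user_id.values()))
```

    With the country key, at least 800 users land on a single shard, creating a hot spot, while the user-ID key spreads the same users across all four shards.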