Replication vs Sharding: Key Differences Quiz

Explore the fundamental differences between database replication and sharding with these essential questions. This quiz helps clarify concepts, use cases, and terminology related to data distribution and fault tolerance.

  1. Purpose Distinction

    Which primary purpose correctly distinguishes sharding from replication in database systems?

    1. Sharding distributes data for scalability, while replication copies data for availability.
    2. Sharding increases redundancy, while replication splits data for better performance.
    3. Sharding copies data for backup, while replication splits data for faster queries.
    4. Sharding deletes duplicate data, while replication adds more data.

    Explanation: Sharding splits data and distributes it across multiple machines to achieve scalability, while replication creates copies of the same data to improve availability and fault tolerance. Option 2 reverses the two definitions. Option 3 misstates both techniques: sharding does not copy data for backup, and replication does not split data. Option 4 introduces deletion and data addition, which describe neither technique.
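    The distinction above can be sketched with a toy in-memory model, assuming each "node" is just a Python dict. The function names `replicate` and `shard` are hypothetical and do not come from any database product: replication hands every node a full copy, while sharding places each record on exactly one node.

```python
# Toy in-memory model: each "node" is a plain dict (hypothetical, for illustration).
records = {"alice": 1, "bob": 2, "carol": 3, "dave": 4}


def replicate(data, num_nodes):
    """Replication: every node receives its own full copy of the data."""
    return [dict(data) for _ in range(num_nodes)]


def shard(data, num_nodes):
    """Sharding: each record is placed on exactly one node, chosen from its key."""
    nodes = [{} for _ in range(num_nodes)]
    for key, value in data.items():
        # Sum of character codes gives a placement that is stable across runs
        # (Python's built-in hash() of a str is randomized per process).
        index = sum(map(ord, key)) % num_nodes
        nodes[index][key] = value
    return nodes


replicas = replicate(records, 3)
shards = shard(records, 3)
print([len(node) for node in replicas])    # every replica holds all 4 records
print([sorted(node) for node in shards])   # each record appears on exactly one shard
```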

  2. Data Location in Replication

    In a replication setup, what is true about the location of data copies?

    1. Replication embeds data copies within a single machine.
    2. Every replica stores only the newest data.
    3. Each replica contains only part of the data on one machine.
    4. Each replica contains all data on separate machines.

    Explanation: In replication, each replica stores a complete copy of the data on a different machine, so losing one machine does not lose the data. Option 3 describes sharding, where data is partitioned across servers. Option 2 incorrectly suggests replicas store only recent data, which is not a feature of replication. Option 1 is wrong because replication is meant to work across multiple machines, not within one.
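    A minimal sketch of this placement rule, assuming a hypothetical `Replica` record that pairs a host name with its data: each host receives its own complete copy, and placing two replicas on the same machine is rejected because they would fail together.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    host: str                                   # hypothetical machine name
    data: dict = field(default_factory=dict)    # this replica's own copy


def build_replica_set(dataset, hosts):
    """Place one complete, independent copy of `dataset` on each host."""
    if len(set(hosts)) != len(hosts):
        # Two copies on one machine would fail together, defeating the purpose.
        raise ValueError("each replica must live on a separate machine")
    return [Replica(host, dict(dataset)) for host in hosts]


dataset = {"order-1": "shipped", "order-2": "pending"}
replica_set = build_replica_set(dataset, ["db-a", "db-b", "db-c"])
for replica in replica_set:
    print(replica.host, sorted(replica.data))   # every host lists both orders
```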

  3. Data Distribution Example

    If user data is spread across three servers such that each server holds data for a different group of users, which technique is being applied?

    1. Replication
    2. Mirroring
    3. Indexing
    4. Sharding

    Explanation: Sharding divides data among servers based on criteria such as user groups or regions to distribute the load. Replication copies the entire dataset to several servers rather than splitting it. Mirroring is a form of replication: it keeps identical copies instead of distributing different portions of the data. Indexing is unrelated to data distribution; it organizes data for faster searches.
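    One common way to assign user groups to servers is range sharding on a user ID. The sketch below assumes hypothetical ID boundaries (1000 and 2000) and server names; a real deployment would choose its ranges from its own data.

```python
import bisect

# Hypothetical range boundaries: IDs below 1000 go to server-0,
# 1000-1999 to server-1, and 2000 and above to server-2.
BOUNDARIES = [1000, 2000]
SERVERS = ["server-0", "server-1", "server-2"]


def server_for(user_id: int) -> str:
    """Range sharding: locate the ID range the user falls into."""
    return SERVERS[bisect.bisect_right(BOUNDARIES, user_id)]


placement = {}
for uid in [7, 999, 1000, 1500, 2000, 4242]:
    placement.setdefault(server_for(uid), []).append(uid)
print(placement)   # each server ends up with a disjoint group of users
```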

  4. Resilience to Failures

    Which technique primarily helps a database remain accessible even if a server fails?

    1. Sharding
    2. Buffering
    3. Partitioning
    4. Replication

    Explanation: Replication increases availability by keeping identical copies on multiple servers, so the system can keep serving requests if one server fails. Sharding distributes the data, but losing a shard makes that part of the data unavailable. Partitioning, like sharding, splits data rather than copying it, and buffering only holds data temporarily in memory; neither provides a spare copy to fail over to.
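    The difference in failure behavior can be sketched with a toy `Node` class (hypothetical, with an `up` flag standing in for server health). A replicated read falls through to any live copy, whereas a sharded read fails for keys whose only home is the downed shard.

```python
class Node:
    """Toy server: a name, the data it stores, and whether it is reachable."""

    def __init__(self, name, data, up=True):
        self.name = name
        self.data = data
        self.up = up


def read_replicated(replicas, key):
    """Any live replica holds the full dataset, so try them in turn."""
    for node in replicas:
        if node.up:
            return node.data[key]
    raise RuntimeError("all replicas are down")


def read_sharded(shards, shard_index, key):
    """Only one shard owns the key; if that shard is down, the key is unreachable."""
    node = shards[shard_index]
    if not node.up:
        raise RuntimeError(f"{node.name} is down; its data is unavailable")
    return node.data[key]


full = {"x": 1, "y": 2}
replicas = [Node("r1", dict(full), up=False), Node("r2", dict(full)), Node("r3", dict(full))]
shards = [Node("s0", {"x": 1}, up=False), Node("s1", {"y": 2})]

print(read_replicated(replicas, "x"))   # r1 is down, so r2 answers
print(read_sharded(shards, 1, "y"))     # s1 is up, so "y" is still readable
# read_sharded(shards, 0, "x") would raise: "x" lived only on the downed shard s0
```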

  5. Query Handling in Sharding

    When a database is sharded by region, what happens when a user from Asia queries their data?

    1. All shards must process every user query.
    2. The query is refused unless all shards are online.
    3. Replicas synchronize the user's data before responding.
    4. Only the shard containing Asia's data responds to the query.

    Explanation: When the query includes the shard key (here, the region), it is routed only to the shard that owns that data, so the other shards carry no extra load. Option 1 describes a scatter-gather query, which is needed only when the query lacks the shard key. Option 2 is false: the query succeeds as long as the shard holding Asia's data is online. Option 3 confuses sharding with replication.
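    A sketch of a region-keyed router, assuming a hypothetical in-memory layout where each region is one shard. A query that names its region touches only that shard; a lookup without the shard key has to ask every shard in turn (often called scatter-gather).

```python
# Hypothetical region-sharded layout: each shard holds one region's users.
SHARDS = {
    "asia": {"u1": {"name": "Mei"}},
    "europe": {"u2": {"name": "Lars"}},
    "americas": {"u3": {"name": "Ana"}},
}
contacted = []   # records which shards each lookup touched


def get_user(region, user_id):
    """Targeted query: the region (the shard key here) selects a single shard."""
    contacted.append(region)
    return SHARDS[region].get(user_id)


def find_user_anywhere(user_id):
    """Without the shard key, every shard may have to be asked (scatter-gather)."""
    for region, shard in SHARDS.items():
        contacted.append(region)
        if user_id in shard:
            return shard[user_id]
    return None


print(get_user("asia", "u1"), contacted)   # only the Asia shard was contacted
```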

  6. Write Operations and Consistency

    In replicated databases, what challenge can occur when handling write operations?

    1. Inconsistent data if updates are not synchronized across replicas
    2. Queries only work for old data
    3. Loss of all data during replication
    4. Replication removes the need for indexes

    Explanation: If updates are not propagated to every replica, the replicas drift out of sync and different replicas can return different values for the same data. Replication does not make queries work only for old data, and it is a safeguard against data loss rather than a cause of it. Indexes remain just as important with or without replication.
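    The stale-read problem can be reproduced with a simplified asynchronous setup, assuming one hypothetical primary that queues its changes and a single replica that applies them later. Real systems track a log position per replica; this toy version simply drains a shared queue.

```python
from collections import deque


class Primary:
    def __init__(self):
        self.data = {}
        self.log = deque()   # changes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        # Asynchronous: the write completes before any replica applies it.
        self.log.append((key, value))


class Replica:
    def __init__(self):
        self.data = {}

    def apply_pending(self, primary):
        """Drain the primary's change log, bringing this replica up to date."""
        while primary.log:
            key, value = primary.log.popleft()
            self.data[key] = value


primary, replica = Primary(), Replica()
primary.write("balance", 100)
replica.apply_pending(primary)
primary.write("balance", 50)    # not yet replicated

print(primary.data["balance"], replica.data["balance"])   # the replica still shows the old value
replica.apply_pending(primary)
print(replica.data["balance"])                            # back in sync after catching up
```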

  7. Main Benefit of Sharding

    What is the main performance benefit provided by sharding in large databases?

    1. It automatically encrypts data during storage.
    2. It allows multiple servers to store different portions of data, balancing the load.
    3. It merges all user data into one server.
    4. It makes multiple identical backups for disaster recovery.

    Explanation: Sharding splits the data across servers, allowing workloads to be distributed and balanced as the dataset grows. Option 4 describes replication, not sharding. Option 3 contradicts sharding, which is about distributing data, not merging it. Option 1 is unrelated; encryption is a separate feature.
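    To see why load stays balanced, here is a sketch of hash sharding: a stable hash of each key (SHA-256, chosen only for its even spread, not for security) is reduced modulo the shard count. The key format `user-N` and the shard count of 4 are assumptions for illustration.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Hash sharding: a stable hash of the key, reduced modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


keys = [f"user-{i}" for i in range(10_000)]
load = Counter(shard_for(k, 4) for k in keys)
print(sorted(load.items()))   # each of the 4 shards receives roughly a quarter of the keys
```

    With plain modulo placement, changing the shard count moves most keys to a different shard; consistent hashing or range-based splitting is often used to limit that movement.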

  8. Replication vs. Sharding Operations

    Which statement best applies to write operations in a sharded environment?

    1. All writes are broadcast to every shard.
    2. Write requests are ignored in a sharded system.
    3. Writes go only to the relevant shard based on data key.
    4. Writes only happen on a single primary replica.

    Explanation: In sharding, a write is directed only to the shard responsible for that key's data segment, leaving the other shards free for other work. Broadcasting every write to every shard (option 1) would defeat the purpose of splitting the data. A single primary that accepts all writes (option 4) describes primary-based replication, not sharding. Option 2 is simply false: sharded systems accept writes.
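    A minimal sketch of write routing, assuming a hypothetical `ShardedStore` that uses CRC-32 of the shard key to pick a shard. Each `put` updates exactly one shard, and a later `get` with the same key is routed to the same place.

```python
import zlib


class ShardedStore:
    """Toy router: each write lands on exactly one shard, chosen by the shard key."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]
        self.writes_per_shard = [0] * num_shards

    def _index(self, shard_key):
        # crc32 is stable across processes, unlike Python's built-in str hash
        return zlib.crc32(shard_key.encode("utf-8")) % len(self.shards)

    def put(self, shard_key, value):
        i = self._index(shard_key)
        self.shards[i][shard_key] = value
        self.writes_per_shard[i] += 1

    def get(self, shard_key):
        return self.shards[self._index(shard_key)].get(shard_key)


store = ShardedStore(3)
store.put("cart:42", ["book"])
print(store.writes_per_shard)   # exactly one shard recorded the write
print(store.get("cart:42"))     # the read is routed to that same shard
```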

  9. Terminology Difference

    In common terminology, a 'replica set' relates to replication, while a 'shard key' is associated with which concept?

    1. Mirroring
    2. Caching
    3. Sharding
    4. Replication

    Explanation: A shard key determines how data is split among the shards and is a core concept in sharding. Mirroring and replication keep full copies of a dataset, so they have no use for a shard key. Caching is not a data distribution technique like sharding or replication.
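    How the shard key shapes the split can be shown by partitioning the same hypothetical documents on two different fields. With `user_id` (many distinct values) the documents spread over all shards; with `country` (only two values here) at most two shards can ever receive data.

```python
import zlib
from collections import Counter

# Hypothetical documents: nine in ten users are in the US, the rest in Canada.
docs = [{"user_id": f"u{i}", "country": "US" if i % 10 else "CA"} for i in range(1000)]


def split_by(field, num_shards=4):
    """Count documents per shard when `field` is used as the shard key."""
    return Counter(zlib.crc32(doc[field].encode("utf-8")) % num_shards for doc in docs)


print(split_by("user_id"))   # many distinct values: documents spread over all 4 shards
print(split_by("country"))   # two distinct values: at most 2 shards ever receive data
```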

  10. Selecting Between Replication and Sharding

    For a system that needs to handle massive data growth while keeping query response times low, which technique is more suitable?

    1. Sharding
    2. Replication
    3. Snapshotting
    4. Archiving

    Explanation: Sharding is designed to scale both storage and performance by distributing data across multiple servers. Replication improves availability and can spread read traffic, but every replica still holds the full dataset, so it does not address storage growth or write throughput. Snapshotting and archiving are backup strategies, not distribution techniques for improving query performance.
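    Back-of-the-envelope arithmetic makes the scaling difference concrete. This sketch assumes an 800 GB dataset and an even split across shards, and it ignores indexes and other overhead.

```python
def per_node_storage_gb(total_gb, num_nodes, strategy):
    """Approximate data each node must hold, ignoring indexes and overhead."""
    if strategy == "replication":
        return total_gb               # every node keeps the full dataset
    if strategy == "sharding":
        return total_gb / num_nodes   # assumes an even split across shards
    raise ValueError(f"unknown strategy: {strategy}")


for nodes in (2, 4, 8):
    print(nodes,
          per_node_storage_gb(800, nodes, "replication"),
          per_node_storage_gb(800, nodes, "sharding"))
```

    Adding nodes shrinks each shard's share but leaves each replica's size unchanged. In practice the two are often combined: each shard is itself replicated, so the cluster gains both capacity and fault tolerance.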