Understanding the Write and Read Paths in Cassandra Quiz

Explore fundamental concepts of the write and read paths in Cassandra's architecture with this concise quiz. Assess your knowledge of replication, consistency, memtables, commit logs, and related components crucial for reliable distributed database operations.

  1. Commit Log Role

    When a write request is processed, what is the primary purpose of the commit log?

    1. To store replicated data across nodes
    2. To automatically delete old data
    3. To provide durability by recording every write operation
    4. To optimize read operations with faster lookup

    Explanation: The commit log provides durability by recording each write operation on disk, so that writes still held only in memory can be replayed after a failure. It does not speed up reads; the read path is served by memtables and SSTables. Replicated data is stored by the replication mechanism, not by the commit log. Old data is removed elsewhere, through TTL expiry and compaction.

  2. Memtable's Function

    What is the memtable in Cassandra primarily used for during the write path?

    1. Indexing data for faster reads across the cluster
    2. Managing user authentication tokens
    3. Encrypting data before replication
    4. Temporarily storing written data in memory before flushing to disk

    Explanation: The memtable is an in-memory write-back cache that holds recently written data until it is flushed to disk as an SSTable. Reads also consult the memtable, but it does not index data across the cluster, and it plays no part in encryption or in handling authentication tokens.
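
The write path described in questions 1 and 2 can be sketched as a toy, single-node model. This is an illustration only, not Cassandra's actual code: the class name `ToyStorageEngine`, the `flush_threshold` parameter, and clearing the whole commit log on flush are all simplifying assumptions.

```python
import json


class ToyStorageEngine:
    """Toy model of one node's write path (hypothetical, not Cassandra's code)."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # immutable, key-sorted tables standing in for disk files
        self.flush_threshold = flush_threshold  # assumed trigger: number of keys

    def write(self, key, value):
        # 1. Record the write in the commit log first, for durability.
        self.commit_log.append(json.dumps({"key": key, "value": value}))
        # 2. Apply the write to the memtable.
        self.memtable[key] = value
        # 3. Flush once the memtable reaches the threshold.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # The flushed table is sorted by key and never modified afterwards.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Flushed writes no longer need the log for recovery (simplification).
        self.commit_log.clear()

    def replay(self):
        # After a crash, rebuild the memtable from the commit log.
        self.memtable = {}
        for line in self.commit_log:
            record = json.loads(line)
            self.memtable[record["key"]] = record["value"]
```

Calling `write` a few times and then emptying `memtable` by hand simulates a crash; `replay` restores the lost writes from the log.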

  3. SSTable Characteristics

    After flushing a memtable, what file format is used on disk to store the data?

    1. XMLTable
    2. TempTable
    3. SSTable
    4. CacheFile

    Explanation: Data is stored in SSTables (Sorted String Tables) on disk after it is flushed from the memtable. There are no file formats named TempTable or CacheFile used in this context, and XMLTable is not a component of Cassandra data storage.

  4. Consistency Level Usage

    Which aspect of a write or read request determines how many replicas must acknowledge an operation before success?

    1. Partition key length
    2. Commit log size
    3. Node IP range
    4. Consistency level

    Explanation: The consistency level specifies how many replicas must acknowledge a read or write before the request is considered successful. Commit log size, node IP range, and partition key length have no bearing on acknowledgment.

  5. Coordinator Node Role

    During both read and write operations, which node is responsible for overseeing the request and aggregating responses?

    1. Coordinator node
    2. Seed node
    3. Index node
    4. Replica node

    Explanation: The coordinator node manages the request, forwarding it to the appropriate replicas and collecting their responses. A seed node is a contact point that joining nodes use to discover the cluster through gossip; it has no special role in coordinating requests. Replica nodes store the actual data, and Cassandra has no distinct index node role.

  6. Quorum Consistency Example

    If a table has a replication factor of 3, what is the minimum number of replica nodes that must respond under 'QUORUM' consistency?

    1. 0
    2. 1
    3. 2
    4. 3

    Explanation: Under QUORUM consistency, a majority of replicas must respond, calculated as floor(replication factor / 2) + 1. With a replication factor of 3, that is floor(3 / 2) + 1 = 2. One response is insufficient for a quorum, three are required only when the consistency level is ALL, and zero is never sufficient.
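
The quorum arithmetic can be checked with a few lines of Python. The function name `required_acks` is hypothetical, and only three consistency levels are modeled here as an assumption to keep the sketch short.

```python
def required_acks(consistency_level, replication_factor):
    """Return how many replicas must respond (illustrative helper)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    levels = {
        "ONE": 1,
        # Majority: integer division drops the fraction before adding 1.
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    if consistency_level not in levels:
        raise ValueError(f"level not modeled in this sketch: {consistency_level}")
    return levels[consistency_level]


for rf in (1, 3, 4, 5):
    print(rf, required_acks("QUORUM", rf))
```

Note that an even replication factor gains no extra fault tolerance at QUORUM: a replication factor of 4 needs 3 responses, so it tolerates one unavailable replica, the same as a replication factor of 3.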

  7. Hinted Handoff

    What does the system do when a node is temporarily down during a write operation?

    1. Rejects all write requests until the node returns
    2. Deletes the data immediately
    3. Stores a hint to deliver the write later when the node recovers
    4. Sends a notification to users

    Explanation: The coordinator stores a hint so that the missed write can be delivered once the node recovers, which helps the replicas converge. Data is not deleted because of a temporary outage, and writes are not rejected as long as enough other replicas are available to satisfy the consistency level. Notifying users is not part of the write path.
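
A minimal sketch of hinted handoff, assuming an in-process model in which the coordinator itself keeps the hints. The `Replica` and `Coordinator` classes and their methods are hypothetical names for illustration; real hint storage, expiry, and delivery are more involved.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = {}  # replica name -> list of (key, value) writes it missed

    def write(self, key, value, required_acks):
        live = [r for r in self.replicas if r.up]
        # The write fails only if the consistency level cannot be met.
        if len(live) < required_acks:
            raise RuntimeError("not enough live replicas for this consistency level")
        for replica in self.replicas:
            if replica.up:
                replica.apply(key, value)
            else:
                # Remember the write so it can be delivered later.
                self.hints.setdefault(replica.name, []).append((key, value))
        return len(live)

    def replay_hints(self, replica):
        # Called once the replica is reachable again.
        for key, value in self.hints.pop(replica.name, []):
            replica.apply(key, value)
```

With three replicas and one of them down, a write that needs two acknowledgments still succeeds, and `replay_hints` brings the recovered replica up to date.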

  8. Read Repair

    What is the main purpose of the read repair process when a read request is performed?

    1. To compress large files on disk
    2. To encrypt user queries
    3. To back up data to external storage
    4. To synchronize out-of-date replicas during reads

    Explanation: Read repair updates out-of-date replicas with the most recent version of the data when a read detects a discrepancy between them. It does not back up data, compress files, or encrypt queries; other tools and processes handle those functions.
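
Read repair can be sketched as follows, assuming each replica is a plain dictionary mapping a key to a `(timestamp, value)` pair and that the newest timestamp wins. The function name `read_with_repair` and this data layout are assumptions made for the example.

```python
def read_with_repair(replicas, key):
    """Read `key` from every replica, repair stale copies, return the newest value.

    `replicas` is a list of dicts mapping key -> (timestamp, value).
    """
    responses = [(replica, replica.get(key)) for replica in replicas]
    present = [cell for _, cell in responses if cell is not None]
    if not present:
        return None
    # Reconcile: the cell with the highest timestamp is treated as current.
    newest = max(present, key=lambda cell: cell[0])
    for replica, cell in responses:
        if cell != newest:
            # Push the current version to replicas that are stale or missing it.
            replica[key] = newest
    return newest[1]
```

After one call, every replica that took part in the read holds the same version of the key.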

  9. Replica Selection

    Which factor does the partitioner use to determine which node stores a particular piece of data?

    1. Column type
    2. Consistency level
    3. Partition key
    4. Memtable threshold

    Explanation: The partitioner hashes the partition key to a token, and the token determines which node owns the data; this spreads data evenly across the cluster and makes lookups fast. Consistency level affects acknowledgment, not data distribution. Memtable threshold relates to in-memory storage limits, and column type does not influence node selection.
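
The mapping from partition key to nodes can be sketched with a small token ring. This is an assumption-laden illustration: MD5 stands in for the real hash function, the ring is shrunk to 2**32 tokens, replicas are simply the next nodes clockwise, and the names `token_for` and `TokenRing` are hypothetical.

```python
import bisect
import hashlib


def token_for(partition_key, ring_size=2**32):
    # Hash the partition key to a position on the ring (MD5 is a stand-in).
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % ring_size


class TokenRing:
    def __init__(self, node_tokens):
        # node_tokens: dict of node name -> token owned by that node
        self.entries = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.entries]

    def replicas_for(self, partition_key, replication_factor):
        token = token_for(partition_key)
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self.tokens, token) % len(self.entries)
        count = min(replication_factor, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


ring = TokenRing({"n1": 0, "n2": 2**30, "n3": 2**31, "n4": 3 * 2**30})
print(ring.replicas_for("user:42", 3))
```

The same key always hashes to the same token, so any node can work out where the data lives without a central lookup table.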

  10. Read Path Optimization

    How does the Bloom filter assist during the read path?

    1. By reducing the need to check every SSTable for requested data
    2. By managing replica synchronization automatically
    3. By deleting duplicate records
    4. By compressing all SSTable files

    Explanation: The Bloom filter reports whether a requested row might be present in an SSTable or is definitely absent, so SSTables that cannot contain the data are skipped without a disk read. It can return false positives but never false negatives. It does not compress SSTables, handle replica synchronization, or delete duplicates.
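
A minimal Bloom filter sketch shows why an SSTable can be skipped safely. The sizes, the SHA-256-based hashing, and the names `BloomFilter`, `might_contain`, and `sstables_to_read` are assumptions chosen for readability, not Cassandra's implementation.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive several bit positions from the key by salting the hash input.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(key))


def sstables_to_read(key, sstables):
    """sstables: list of (name, BloomFilter) pairs; return names worth checking."""
    return [name for name, bloom in sstables if bloom.might_contain(key)]
```

A key that was added always passes its filter, so a `False` answer never hides real data; the only cost of a false positive is one unnecessary SSTable lookup.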