Explore fundamental concepts of the write and read path in Cassandra architecture with this concise quiz. Assess your knowledge of replication, consistency, memtables, commit logs, and related components crucial for reliable distributed database operations.
When a write request is processed, what is the primary purpose of the commit log?
Explanation: The commit log ensures data durability by recording each write operation so that data can be recovered in case of a failure. It does not optimize read operations directly—that is managed by memtables and SSTables. Storing replicated data is handled by the replication mechanism, not the commit log itself. Automatic deletion of old data is managed elsewhere, such as through compaction and TTL settings, not by the commit log.
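The durability guarantee described above can be sketched as a toy model: each write is appended to a log before memory is updated, and the log is replayed after a crash. The class and method names (`ToyNode`, `write`, `crash`, `recover`) are hypothetical and only illustrate the idea; this is not how Cassandra is implemented internally.

```python
# Toy model of "log first, then update memory". All names are hypothetical.

class ToyNode:
    def __init__(self):
        self.commit_log = []  # stands in for the append-only log on disk
        self.memtable = {}    # in-memory state, lost if the process dies

    def write(self, key, value):
        self.commit_log.append((key, value))  # durable record comes first
        self.memtable[key] = value            # then the in-memory update

    def crash(self):
        self.memtable = {}  # memory is wiped; the log survives

    def recover(self):
        # Replay logged writes in their original order to rebuild memory.
        for key, value in self.commit_log:
            self.memtable[key] = value


node = ToyNode()
node.write("user:1", "alice")
node.write("user:1", "alicia")
node.crash()
node.recover()
print(node.memtable)
```

Replaying in order matters here: the later write to the same key has to win after recovery, just as it did before the crash.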
What is the memtable in Cassandra primarily used for during the write path?
Explanation: The memtable is an in-memory, write-back cache that buffers recent writes until they are flushed to disk as an SSTable. Reads also consult it for recently written data, but it does not index data across the cluster, perform data encryption, or handle authentication tokens.
After flushing a memtable, what file format is used on disk to store the data?
Explanation: Data is stored in SSTables (Sorted String Tables) on disk after it is flushed from the memtable. There are no file formats named TempTable or CacheFile used in this context, and XMLTable is not a component of Cassandra data storage.
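As a rough illustration of the memtable-to-SSTable flush, the sketch below buffers writes in a dictionary and, once a threshold is reached, writes them out as an immutable, key-sorted tuple. The names `ToyMemtable` and `flush_threshold`, and the choice to count rows rather than bytes, are assumptions made for the example and not Cassandra settings.

```python
# Toy memtable that flushes to sorted, immutable "SSTables".
# Names and the row-count threshold are hypothetical.

class ToyMemtable:
    def __init__(self, flush_threshold=3):
        self.rows = {}
        self.flush_threshold = flush_threshold
        self.sstables = []  # each entry is an immutable tuple sorted by key

    def put(self, key, value):
        self.rows[key] = value
        if len(self.rows) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort by key and freeze the contents, then start a fresh memtable.
        self.sstables.append(tuple(sorted(self.rows.items())))
        self.rows = {}


table = ToyMemtable(flush_threshold=3)
table.put("c", 3)
table.put("a", 1)
table.put("b", 2)  # third row reaches the threshold and triggers a flush
print(table.sstables)
```

The flushed data is never modified in place, which is why later housekeeping such as compaction works by writing new files rather than editing old ones.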
Which aspect of a write or read request determines how many replicas must acknowledge an operation before success?
Explanation: The consistency level specifies how many replicas must confirm the operation—either a read or write—before the request is considered complete. Commit log size does not relate to successful acknowledgment. Node IP range and partition key length are unrelated to operation acknowledgment.
During both read and write operations, which node is responsible for overseeing the request and aggregating responses?
Explanation: The coordinator node manages the request, forwarding it to the appropriate replicas and collecting their responses. Seed nodes act as contact points that help nodes discover the cluster when they join; they do not coordinate requests. Replica nodes store the actual data, and index nodes are not a distinct role in Cassandra.
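The interplay between the coordinator and the consistency level can be sketched as follows: the coordinator sends the write to every replica, counts acknowledgments, and reports success only if the count meets the level's requirement. `SketchReplica`, `coordinate_write`, and the `REQUIRED_ACKS` table are hypothetical names, and only three of Cassandra's consistency levels are modelled.

```python
# Toy coordinator: forward a write, count acks, compare with the consistency level.
# All names are hypothetical; only ONE, QUORUM and ALL are modelled.

REQUIRED_ACKS = {
    "ONE": lambda rf: 1,
    "QUORUM": lambda rf: rf // 2 + 1,
    "ALL": lambda rf: rf,
}


class SketchReplica:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def write(self, key, value):
        if not self.up:
            return False  # a node that is down sends no acknowledgment
        self.data[key] = value
        return True


def coordinate_write(replicas, key, value, consistency_level):
    required = REQUIRED_ACKS[consistency_level](len(replicas))
    acks = sum(1 for replica in replicas if replica.write(key, value))
    return acks >= required


replicas = [SketchReplica("a"), SketchReplica("b"), SketchReplica("c", up=False)]
print(coordinate_write(replicas, "k", "v", "QUORUM"))  # 2 of 3 acks
print(coordinate_write(replicas, "k", "v", "ALL"))     # needs 3 of 3
```

Note that the failed `ALL` write in this sketch still lands on the two live replicas; the consistency level governs what the client is told, not whether reachable replicas apply the write.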
If a table has a replication factor of 3, what is the minimum number of replica nodes that must respond under 'QUORUM' consistency?
Explanation: Under QUORUM consistency, a majority of replicas must respond, calculated as floor(replication factor / 2) + 1. With a replication factor of 3, that is floor(3 / 2) + 1 = 2, so at least 2 replicas must reply. One is insufficient for quorum, three is not necessary unless the consistency level is set to ALL, and zero is never sufficient.
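The quorum arithmetic is easy to check with integer division. The helper name `quorum` is made up for this example.

```python
# Majority of replicas: integer-divide the replication factor by 2, then add 1.

def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1


for rf in (1, 2, 3, 5):
    print(f"RF={rf} -> QUORUM needs {quorum(rf)} replica(s)")
```

Because two majorities of the same replica set always share at least one member, a QUORUM read is guaranteed to reach at least one replica that acknowledged the latest QUORUM write.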
What does the system do when a node is temporarily down during a write operation?
Explanation: The coordinator keeps a hint so that the missed write can be delivered once the node recovers, which supports eventual consistency. Data is not deleted because of a temporary outage. Writes are not all rejected either: they can continue on the other replicas as long as the consistency level is met. While notifications may be sent in some systems, the primary function here is the hinted handoff.
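A minimal sketch of hinted handoff, assuming a coordinator that keeps hints in a simple in-memory list: writes for a replica that is down are set aside and replayed when it returns. `HintReplica`, `HintingCoordinator`, and `replay_hints` are hypothetical names, and real hints are persisted and expire after a configurable window, which this sketch omits.

```python
# Toy hinted handoff. All names are hypothetical; hint expiry is not modelled.

class HintReplica:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}


class HintingCoordinator:
    def __init__(self):
        self.hints = []  # (replica, key, value) tuples awaiting delivery

    def write(self, replicas, key, value):
        for replica in replicas:
            if replica.up:
                replica.data[key] = value
            else:
                self.hints.append((replica, key, value))  # remember the missed write

    def replay_hints(self):
        still_pending = []
        for replica, key, value in self.hints:
            if replica.up:
                replica.data[key] = value  # deliver the write that was missed
            else:
                still_pending.append((replica, key, value))
        self.hints = still_pending


a, b, c = HintReplica("a"), HintReplica("b"), HintReplica("c", up=False)
coordinator = HintingCoordinator()
coordinator.write([a, b, c], "k", "v")
c.up = True                 # the node comes back
coordinator.replay_hints()
print(c.data)
```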
What is the main purpose of the read repair process when a read request is performed?
Explanation: Read repair brings replicas up to date with the most recent version of the data when a discrepancy between them is detected during a read request. It does not back up data, compress files, or encrypt queries. These functions are managed by other tools or processes.
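Read repair can be sketched as follows, assuming each replica is a plain dictionary mapping a key to a `(value, timestamp)` pair and that the version with the highest timestamp wins. The function name `read_with_repair` and this data layout are assumptions made for illustration.

```python
# Toy read repair: pick the newest version by timestamp, push it to stale replicas.
# Replicas are plain dicts of key -> (value, timestamp); the layout is hypothetical.

def read_with_repair(replicas, key):
    responses = [(replica, replica.get(key)) for replica in replicas]
    present = [cell for _, cell in responses if cell is not None]
    if not present:
        return None  # no replica has the key
    newest = max(present, key=lambda cell: cell[1])  # highest timestamp wins
    for replica, cell in responses:
        if cell != newest:
            replica[key] = newest  # repair a stale or missing copy
    return newest[0]


replica_set = [
    {"k": ("old", 1)},
    {"k": ("new", 2)},
    {},  # this replica missed the write entirely
]
print(read_with_repair(replica_set, "k"))
print(replica_set)
```

After the call, all three dictionaries hold the same newest pair, which is the convergence the explanation describes.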
Which factor does the partitioner use to determine which node stores a particular piece of data?
Explanation: The partitioner hashes the partition key into a token, and that token determines which node stores the data, giving an even spread and quick lookup. Consistency level affects acknowledgment, not data distribution. Memtable threshold is related to in-memory storage limits, and column type does not influence node selection.
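The mapping from partition key to node can be sketched as a hash followed by a lookup on a sorted token ring. MD5 from the standard library is used here purely as a stand-in hash, and `token_for`, `ToyRing`, and the node tokens are invented for the example.

```python
# Toy token ring: hash the partition key, then find the first node token >= it.
# MD5 is a stand-in hash; names and token values are hypothetical.

import bisect
import hashlib


def token_for(partition_key: str) -> int:
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")  # value in [0, 2**64)


class ToyRing:
    def __init__(self, node_tokens):
        # node_tokens maps node name -> the token that node owns.
        self.ring = sorted((token, name) for name, token in node_tokens.items())
        self.tokens = [token for token, _ in self.ring]

    def owner(self, partition_key: str) -> str:
        index = bisect.bisect_left(self.tokens, token_for(partition_key))
        # Keys hashing past the last token wrap around to the first node.
        return self.ring[index % len(self.ring)][1]


ring = ToyRing({"node-a": 2**62, "node-b": 2**63, "node-c": 3 * 2**62})
for key in ("user:1", "user:2", "user:3"):
    print(key, "->", ring.owner(key))
```

Since the owner depends only on the key's hash, any node can compute where a partition lives without consulting a central directory.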
How does the bloom filter assist during the read path?
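Explanation: The bloom filter helps determine whether a requested row might be present in an SSTable, thus avoiding unnecessary disk reads. It is probabilistic: it can report that a row might be present when it is not, but it never reports a row as absent when it is actually there. It does not compress SSTables or handle replica synchronization, and it is not responsible for deleting duplicates.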
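A minimal Bloom filter sketch shows why a "no" answer lets the read path skip an SSTable safely. The class name `ToyBloomFilter`, the bit-array size, the number of hashes, and the use of SHA-256 are all assumptions chosen for readability rather than a description of Cassandra's implementation.

```python
# Toy Bloom filter: a few hash positions per key in a fixed-size bit array.
# Sizes, hash choice and names are hypothetical.

import hashlib


class ToyBloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, key: str):
        # Derive several positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str):
        for position in self._positions(key):
            self.bits[position] = True

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[position] for position in self._positions(key))


bloom = ToyBloomFilter()
for key in ("user:1", "user:2"):
    bloom.add(key)

if bloom.might_contain("user:1"):
    print("user:1 may be in this SSTable, so read it")
```

A key that was added always tests positive, so skipping an SSTable on a negative answer never loses data; the only cost of a false positive is one unnecessary disk read.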