Challenge your understanding of advanced Cassandra deployment strategies across multiple data centers and cloud environments. This quiz covers replication, consistency, failover, and configuration best practices to optimize distributed database performance and availability.
Which replication strategy is typically used for deploying Cassandra across multiple data centers while controlling how replicas are placed in each one?
Explanation: NetworkTopologyStrategy is designed for multi-data center deployments and lets you define replica distribution per data center. SimpleStrategy is meant for single data center setups and does not give control over multiple locations. RoundRobinStrategy and DataAwareStrategy are not valid replication strategies in Cassandra, making them incorrect choices for this scenario.
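For illustration, here is a minimal sketch using the Python cassandra-driver; the contact point, keyspace name, and data center names (us_east, eu_west) are hypothetical, and the DC names must match whatever your snitch actually reports.

```python
from cassandra.cluster import Cluster

# Connect to any reachable node; the contact point is illustrative.
cluster = Cluster(["10.0.0.1"])
session = cluster.connect()

# NetworkTopologyStrategy takes a replica count per data center.
# The DC names must match what the snitch reports, not arbitrary labels.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS app_data
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'us_east': 3,
        'eu_west': 3
    }
""")
```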
When reading from a multi-DC Cassandra cluster, which consistency level ensures the query reflects the most recent write by consulting replicas in every data center?
Explanation: EACH_QUORUM requires a quorum of replicas from each data center to respond, ensuring strong consistency across all DCs. LOCAL_ONE only reads from one replica in the local data center. ALL waits for all replicas in all data centers, which might not be practical, and TWO reads from two replicas but does not guarantee one from each data center.
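A hedged sketch of an EACH_QUORUM read with the Python driver; the app_data keyspace and users table are hypothetical, and this assumes a Cassandra version that accepts EACH_QUORUM for reads (3.0 and later).

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1"])
session = cluster.connect("app_data")

# EACH_QUORUM waits for a quorum of replicas in every data center,
# so the coordinator cannot answer from the local DC alone.
stmt = SimpleStatement(
    "SELECT * FROM users WHERE user_id = %s",
    consistency_level=ConsistencyLevel.EACH_QUORUM,
)
rows = session.execute(stmt, ("some-user-id",))
```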
To minimize latency for clients reading and writing data to Cassandra from different geographical regions, which deployment pattern should be preferred?
Explanation: Placing nodes in multiple data centers located close to clients reduces network round-trip time, thereby minimizing latency. Centralizing everything in one data center increases latency for users farther away. Having only one node per data center does not provide adequate redundancy. Random node placement leads to inconsistent latency and does not follow deployment best practice.
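For example, one sketch of DC-local routing with the Python driver's execution profiles; the local_dc value and contact point are hypothetical.

```python
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Route queries to the data center nearest this client ('eu_west' is
# illustrative); TokenAwarePolicy additionally prefers replica nodes.
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc="eu_west")
    )
)
cluster = Cluster(
    ["10.1.0.1"],  # a contact point in the local DC
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
)
session = cluster.connect()
```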
In a multi-DC and cloud environment, which snitch is recommended to properly route traffic and determine network topology?
Explanation: GossipingPropertyFileSnitch reads each node's data center and rack from a local properties file and propagates that topology through gossip, making it ideal for multi-DC and cloud deployments. SimpleSnitch does not recognize data centers or racks. RoundRobinSnitch and BasicSnitch are not snitches that exist in Cassandra, so they are not available options at all.
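A minimal sketch of the two pieces of configuration this snitch relies on, shown here as an inline string for illustration; the dc and rack values are hypothetical.

```python
# GossipingPropertyFileSnitch is enabled in cassandra.yaml:
#     endpoint_snitch: GossipingPropertyFileSnitch
# Each node declares only its OWN location in
# cassandra-rackdc.properties; gossip spreads it to the rest of the
# cluster. In cloud deployments, rack is commonly mapped to the
# availability zone.
RACKDC_PROPERTIES = """\
dc=us_east
rack=us-east-1a
"""
```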
When migrating Cassandra to a cloud environment with multiple availability zones, what is an essential best practice to maximize fault tolerance?
Explanation: Even node distribution across zones ensures high availability and failover capability if one zone becomes unavailable. Running all nodes in a single zone, whether the smallest or the cheapest, concentrates everything in one failure domain and increases risk. Doubling the node count in one zone adds capacity, not redundancy, and undermines fault tolerance.
What is the recommended replication factor for each data center in a production-grade multi-DC Cassandra cluster for most workloads?
Explanation: A replication factor of 3 is the widely accepted production default because it balances consistency, availability, and resource usage: QUORUM operations need only 2 of 3 replicas, so each data center can lose one replica and keep serving. A factor of 1 gives no redundancy. Factors of 5 or 7 add more replicas than most workloads need, increasing storage and cost without a proportional availability benefit.
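The quorum arithmetic behind this can be made explicit; a small illustrative calculation:

```python
def quorum(rf: int) -> int:
    # A quorum is a majority of replicas: floor(rf / 2) + 1.
    return rf // 2 + 1

for rf in (1, 3, 5, 7):
    tolerated = rf - quorum(rf)  # replicas that can be down
    print(f"RF={rf}: quorum={quorum(rf)}, tolerates {tolerated} down")

# RF=3 already tolerates one failed replica per DC at QUORUM;
# RF=5 or 7 mostly add storage and write amplification.
```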
In a multi-DC Cassandra cluster deployment, what is the primary role of seed nodes?
Explanation: Seed nodes serve as gossip contact points: a node joining the cluster contacts a seed first to discover the topology and bootstrap itself into the ring. Seeds do not manage backups or act as special query handlers, and disk balancing is handled internally by Cassandra regardless of seed configuration.
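As an illustrative sketch, the shape of the seeds list in cassandra.yaml, shown as an inline string; the addresses are hypothetical.

```python
# Seeds are plain gossip contact points listed in cassandra.yaml; a
# common practice is two or three seeds per data center, with the same
# list configured on every node. Addresses are illustrative.
SEED_PROVIDER_YAML = """\
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1,10.0.0.2,10.1.0.1,10.1.0.2"
"""
```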
Why is running repair operations regularly important in a multi-DC Cassandra cluster?
Explanation: Repair is Cassandra's anti-entropy mechanism: it compares replicas and synchronizes any differences, ensuring that all copies of the data, especially those in different DCs, stay consistent and do not diverge. Repair does not change the replication strategy or instantly reduce storage; old data removal is handled by compaction and tombstone expiry, not by repair.
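A hedged sketch of triggering such a repair from Python; it assumes nodetool is on PATH and uses a hypothetical keyspace name.

```python
import subprocess

# Run a primary-range repair on this node. '-pr' repairs only the token
# ranges this node owns, so running it on every node in turn covers the
# cluster exactly once.
subprocess.run(["nodetool", "repair", "-pr", "app_data"], check=True)
```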
How does Cassandra ensure high availability for writes if a node fails in one data center of a multi-DC setup?
Explanation: Cassandra's replication lets the remaining replicas, including those in other DCs, accept incoming writes, so the system stays available. Writes are not halted cluster-wide, and the failed node simply stops receiving them. Writes it misses are stored as hints on the coordinator and replayed when the node recovers (hinted handoff), so they are not permanently lost.
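A minimal sketch of such a write with the Python driver; the keyspace, table, and values are hypothetical.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1"])
session = cluster.connect("app_data")

# With RF=3 per DC, LOCAL_QUORUM needs 2 of 3 local replicas, so this
# write still succeeds if one replica node is down; the coordinator
# stores a hint and replays it when that node returns.
stmt = SimpleStatement(
    "INSERT INTO users (user_id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
session.execute(stmt, ("some-user-id", "Ada"))
```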
When expanding a multi-DC Cassandra cluster by adding a new data center, what is a key step to ensure clients route requests correctly?
Explanation: Ensuring the snitch and any connected drivers recognize the new data center is crucial for proper routing. Erasing data, reformatting disks, or disabling encryption would harm the cluster or its security and are not relevant steps when adding a new data center.
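A hedged sketch of the accompanying schema step when a new DC joins, reusing the hypothetical keyspace and DC names from the earlier sketches.

```python
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])
session = cluster.connect()

# Extend the keyspace's replication map to include the new DC; the
# names are illustrative and must match what the snitch reports.
session.execute("""
    ALTER KEYSPACE app_data
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'us_east': 3,
        'eu_west': 3,
        'ap_south': 3
    }
""")
# Then stream existing data to each new node from an existing DC:
#     nodetool rebuild -- us_east
```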