Test your knowledge of backup, restore, and disaster recovery processes in Apache Cassandra. Review best practices, key concepts, and essential techniques for ensuring data protection and resilience in distributed systems.
Which type of backup in Cassandra captures the entire content of a data set at a specific point in time?
Explanation: A full backup copies all data, giving a complete copy of the stored information as it existed at a specific moment. In Cassandra this is typically taken with a snapshot (nodetool snapshot). A differential backup, in contrast, captures only the changes since the last full backup, not the entire dataset. Incremental restore is not a backup type; it refers to a restore that applies incremental data. A snapshot schedule is a plan for when backups are taken, not a type of backup.
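To make the distinction concrete, here is a toy Python model (not Cassandra code) of which files each backup type would copy. It assumes each SSTable is identified by a hypothetical file name and the time it was written; because SSTables are immutable, new data always arrives as new files.

```python
from datetime import datetime
from typing import Dict, Set

# Toy model (hypothetical file names): SSTable name -> time it was written.
# SSTables are immutable, so data changes always show up as new files.
Files = Dict[str, datetime]


def full_backup(files: Files) -> Set[str]:
    """A full backup copies every file, whatever its age."""
    return set(files)


def written_after(files: Files, cutoff: datetime) -> Set[str]:
    """Files written after `cutoff`."""
    return {name for name, written in files.items() if written > cutoff}


def differential_backup(files: Files, last_full: datetime) -> Set[str]:
    """Everything written since the last FULL backup."""
    return written_after(files, last_full)


def incremental_backup(files: Files, last_any_backup: datetime) -> Set[str]:
    """Only what was written since the last backup of ANY kind."""
    return written_after(files, last_any_backup)


if __name__ == "__main__":
    files = {
        "nb-1-big-Data.db": datetime(2024, 1, 1, 6),
        "nb-2-big-Data.db": datetime(2024, 1, 2, 6),
        "nb-3-big-Data.db": datetime(2024, 1, 3, 6),
    }
    last_full = datetime(2024, 1, 1, 12)         # full backup on day 1
    last_incremental = datetime(2024, 1, 2, 12)  # incremental on day 2
    print(sorted(full_backup(files)))                           # all three files
    print(sorted(differential_backup(files, last_full)))        # nb-2 and nb-3
    print(sorted(incremental_backup(files, last_incremental)))  # nb-3 only
```

The differential and incremental functions share the same filter; only the reference point differs. That is why a differential restore needs the last full backup plus one differential, while an incremental restore needs the full backup plus every incremental taken since.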
When performing a backup in Cassandra, what does creating a snapshot primarily do?
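Explanation: Creating a snapshot in Cassandra first flushes memtables to disk and then creates hard links to the table's SSTable files, preserving the current data state almost instantly and with minimal impact on running operations. Because hard links share the underlying files, a new snapshot initially uses almost no extra disk space, although it keeps alive SSTables that compaction would otherwise delete. A snapshot does not delete any files or restore data, and encryption is not its purpose, so the other options are incorrect.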
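The hard-link mechanism can be demonstrated with plain files. The sketch below is a toy model, not Cassandra's implementation: it links every *.db file in a directory into a snapshots/<tag> subdirectory (mirroring where Cassandra places snapshots inside each table directory) and checks that the link shares the original file's inode. The file name and tag are hypothetical.

```python
from __future__ import annotations

import os
import tempfile


def take_snapshot(table_dir: str, tag: str) -> list[str]:
    """Hard-link each *.db file in table_dir into table_dir/snapshots/<tag>."""
    snap_dir = os.path.join(table_dir, "snapshots", tag)
    os.makedirs(snap_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(table_dir)):
        src = os.path.join(table_dir, name)
        if name.endswith(".db") and os.path.isfile(src):
            os.link(src, os.path.join(snap_dir, name))  # no data is copied
            linked.append(name)
    return linked


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table_dir:
        live = os.path.join(table_dir, "nb-1-big-Data.db")
        with open(live, "w") as f:
            f.write("row data")
        print(take_snapshot(table_dir, "before-upgrade"))  # ['nb-1-big-Data.db']
        snap = os.path.join(table_dir, "snapshots", "before-upgrade", "nb-1-big-Data.db")
        print(os.stat(live).st_ino == os.stat(snap).st_ino)  # True: same file on disk
        os.remove(live)  # e.g. compaction deletes the live SSTable
        with open(snap) as f:
            print(f.read())  # the data is still readable through the snapshot
```

Because the snapshot keeps a deleted file alive, old snapshots should be removed (for example with nodetool clearsnapshot) once they have been copied somewhere safe.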
What is the first essential step when restoring a table from backup in Cassandra after data loss?
Explanation: When restoring by copying snapshot files into a table's data directory, you must first stop the affected node (typically after running nodetool drain) to avoid conflicts with the running process and ensure a clean restore. Copying snapshot files into a running node risks data inconsistency. Changing cluster tokens is not needed for a restore, and repair is run after the restore, not before.
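When a table is restored by copying snapshot files back into its data directory, the file-level part can be sketched in Python. This is an illustrative sketch under stated assumptions, not an official tool: node_is_running is a hypothetical callable you would supply (for example, one that checks for the Cassandra process), the node is assumed to have been drained and stopped by whatever service manager your installation uses, and the snapshot is assumed to sit under the table directory in snapshots/<tag>. Snapshot metadata files (manifest.json, schema.cql) are skipped.

```python
from __future__ import annotations

import os
import shutil
import tempfile
from typing import Callable

# Snapshot metadata that should not be copied into the live table directory.
SNAPSHOT_METADATA = {"manifest.json", "schema.cql"}


def restore_table_from_snapshot(table_dir: str, tag: str,
                                node_is_running: Callable[[], bool]) -> list[str]:
    """Replace a table's live SSTable files with those in snapshots/<tag>."""
    # Step 1: refuse to touch files while the node is up.
    if node_is_running():
        raise RuntimeError("stop the node (after nodetool drain) before restoring")
    snap_dir = os.path.join(table_dir, "snapshots", tag)
    if not os.path.isdir(snap_dir):
        raise FileNotFoundError(snap_dir)
    # Step 2: remove the current SSTable files; subdirectories such as
    # snapshots/ and backups/ are left alone.
    for name in os.listdir(table_dir):
        path = os.path.join(table_dir, name)
        if os.path.isfile(path):
            os.remove(path)
    # Step 3: copy the snapshot's SSTable files into place.
    restored = []
    for name in sorted(os.listdir(snap_dir)):
        src = os.path.join(snap_dir, name)
        if os.path.isfile(src) and name not in SNAPSHOT_METADATA:
            shutil.copy2(src, os.path.join(table_dir, name))
            restored.append(name)
    # Steps 4-5 (outside this sketch): start the node, then run nodetool repair.
    return restored


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table_dir:
        snap_dir = os.path.join(table_dir, "snapshots", "nightly")
        os.makedirs(snap_dir)
        for name in ("nb-1-big-Data.db", "manifest.json"):
            with open(os.path.join(snap_dir, name), "w") as f:
                f.write("snapshot contents")
        with open(os.path.join(table_dir, "nb-7-big-Data.db"), "w") as f:
            f.write("damaged data")
        print(restore_table_from_snapshot(table_dir, "nightly", lambda: False))
        # ['nb-1-big-Data.db']
        print(sorted(os.listdir(table_dir)))  # ['nb-1-big-Data.db', 'snapshots']
```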
What is the main advantage of using incremental backups in Cassandra?
Explanation: Incremental backups capture only the SSTables flushed since the last snapshot, saving time and storage. When incremental_backups is enabled in cassandra.yaml, Cassandra hard-links each newly flushed SSTable into the table's backups directory, so the overhead is far lower than taking repeated full snapshots. Backups do not delete old data, and snapshots are not automatically deleted after every operation.
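When incremental_backups is set to true in cassandra.yaml, the table's backups directory only ever gains new, immutable SSTable files, so shipping them off the node reduces to a set difference. A minimal sketch, assuming a hypothetical record of which file names have already been uploaded and hypothetical file sizes:

```python
from typing import Dict, List, Set, Tuple


def plan_incremental_upload(backups: Dict[str, int],
                            uploaded: Set[str]) -> Tuple[List[str], int]:
    """Return the backup files not yet uploaded and their total size in bytes.

    `backups` maps file name -> size for the table's backups/ directory.
    SSTables are immutable, so a name already uploaded never needs re-sending.
    """
    pending = sorted(name for name in backups if name not in uploaded)
    return pending, sum(backups[name] for name in pending)


if __name__ == "__main__":
    backups = {
        "nb-1-big-Data.db": 500_000_000,
        "nb-2-big-Data.db": 20_000_000,
        "nb-3-big-Data.db": 5_000_000,
    }
    pending, size = plan_incremental_upload(backups, {"nb-1-big-Data.db"})
    print(pending)  # ['nb-2-big-Data.db', 'nb-3-big-Data.db']
    print(size)     # 25000000, versus 525000000 for a full copy
```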
In Cassandra disaster recovery, what does RTO (Recovery Time Objective) define?
Explanation: RTO is the maximum acceptable time to restore service after a disaster. It does not measure data loss; the maximum acceptable amount of data loss, expressed as a window of time, is the RPO (Recovery Point Objective). The interval between backups bears on RPO rather than RTO, and backup file size is not what RTO defines, so the other options are incorrect.
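The difference between the two objectives can be checked with simple arithmetic. The sketch below is a simplified model that assumes backups are the only recovery source (it ignores surviving replicas and commit log archiving), so worst-case data loss equals the backup interval; all durations are hypothetical.

```python
from datetime import timedelta


def check_objectives(backup_interval: timedelta, restore_duration: timedelta,
                     rpo: timedelta, rto: timedelta) -> dict:
    """Compare a backup plan against recovery objectives (simplified model)."""
    # Worst case: disaster strikes just before the next backup runs,
    # so everything written since the previous backup is lost.
    worst_case_loss = backup_interval
    return {
        "rpo_met": worst_case_loss <= rpo,   # how much data we can afford to lose
        "rto_met": restore_duration <= rto,  # how long we can afford to be down
    }


if __name__ == "__main__":
    result = check_objectives(
        backup_interval=timedelta(hours=6),
        restore_duration=timedelta(hours=3),
        rpo=timedelta(hours=4),
        rto=timedelta(hours=4),
    )
    print(result)  # {'rpo_met': False, 'rto_met': True}
```

Here the plan meets the RTO but not the RPO; shortening the backup interval to four hours or less would fix the RPO without changing the RTO at all.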
Why is setting a regular backup schedule critical for Cassandra clusters?
Explanation: A regular schedule bounds the time since the last good backup, so after a failure you can restore recent data and keep worst-case data loss within the RPO. Network speed is unrelated to a backup schedule. Backups do not shrink data size, nor do they disable recovery logs.
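A schedule is easiest to reason about when each run gets a predictable snapshot tag. The sketch below generates the next few runs as nodetool snapshot invocations (nodetool snapshot -t <tag> <keyspace>); the tag format, the keyspace name, and the idea of handing these commands to cron or another scheduler are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def snapshot_schedule(start: datetime, interval: timedelta, count: int,
                      keyspace: str) -> List[Tuple[datetime, List[str]]]:
    """Return (run time, command argv) pairs for the next `count` snapshots."""
    runs = []
    for i in range(count):
        when = start + i * interval
        tag = when.strftime("sched-%Y%m%dT%H%M")  # hypothetical tag format
        runs.append((when, ["nodetool", "snapshot", "-t", tag, keyspace]))
    return runs


if __name__ == "__main__":
    # "shop" is a hypothetical keyspace name.
    for when, argv in snapshot_schedule(datetime(2024, 1, 1), timedelta(hours=6), 4, "shop"):
        print(when.isoformat(), " ".join(argv))
```

Because the gap between consecutive runs is fixed, the interval doubles as the worst-case data-loss window, which makes it easy to compare against the RPO.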
After restoring backups to a Cassandra node, which operation is commonly performed to ensure data consistency across replicas?
Explanation: Running a repair (nodetool repair) after the restored node rejoins the cluster brings its data back in sync with the other replicas, including writes that arrived after the backup was taken. Resetting tokens and upgrading software are not regular parts of a restore. Deleting commit logs may cause data loss and is not a standard post-restore step.
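Repair compares data between replicas (Cassandra uses Merkle trees to find differing ranges) and streams the differences, after which conflicting versions are resolved by write timestamp, with the newest write winning. The sketch below models only that final last-write-wins reconciliation, not the Merkle-tree comparison or streaming; it ignores tombstones and timestamp ties, and the keys and values are hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, int]  # (value, write timestamp in microseconds)
Replica = Dict[str, Cell]


def reconcile(replicas: List[Replica]) -> Replica:
    """Merge replicas with last-write-wins: the highest timestamp wins per key."""
    merged: Replica = {}
    for replica in replicas:
        for key, cell in replica.items():
            if key not in merged or cell[1] > merged[key][1]:
                merged[key] = cell
    return merged


def repair(replicas: List[Replica]) -> List[Replica]:
    """Give every replica the same reconciled data (simplified model)."""
    merged = reconcile(replicas)
    return [dict(merged) for _ in replicas]


if __name__ == "__main__":
    restored_node = {"user:1": ("old@example.com", 100)}  # state from the backup
    healthy_node = {"user:1": ("new@example.com", 200), "user:2": ("bob@example.com", 150)}
    for replica in repair([restored_node, healthy_node]):
        print(replica)  # both replicas now hold the newer user:1 and user:2
```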
Why should Cassandra backup files be stored in multiple, separate locations?
Explanation: Keeping backup copies in separate locations, such as another data center or off-site storage, means the data survives if one site is compromised or fails. Where backups are stored does not meaningfully affect disk fragmentation, query efficiency, or CPU usage, so improved disaster recovery resilience is the only real benefit among the options.
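Copies only help if they are intact, so a common pattern is to verify each copy against a checksum of the source. In the sketch below, local directories stand in for genuinely separate locations (another data center, a different cloud region, or off-site object storage); the archive name is hypothetical.

```python
import hashlib
import os
import shutil
import tempfile
from typing import Dict, List


def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_backup(archive: str, destinations: List[str]) -> Dict[str, bool]:
    """Copy `archive` to every destination and report whether each copy verifies."""
    expected = sha256_of(archive)
    results = {}
    for dest in destinations:
        os.makedirs(dest, exist_ok=True)
        copied = shutil.copy2(archive, dest)  # returns the new file's path
        results[dest] = sha256_of(copied) == expected
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        archive = os.path.join(root, "snapshot-nightly.tar")
        with open(archive, "wb") as f:
            f.write(b"example backup bytes")
        sites = [os.path.join(root, "site-a"), os.path.join(root, "site-b")]
        print(replicate_backup(archive, sites))
```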
What role do commit logs play during data recovery in Cassandra?
Explanation: The commit log is an append-only, durable record of writes. When a node restarts after a crash, Cassandra replays it to recover writes that were still in memtables and had not yet been flushed to SSTables, which makes the commit log vital to data safety after a crash. Commit logs do not create backups, store backup schedules, or manage authentication.
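The replay idea can be illustrated with a toy node in Python. This is a deliberately simplified model, not Cassandra's storage engine: a list stands in for the durable commit log, a dict for the memtable, and flushed_upto plays the role of the commit log position already covered by flushed SSTables.

```python
from typing import Dict, List, Optional, Tuple


class ToyNode:
    def __init__(self) -> None:
        self.commit_log: List[Tuple[str, str]] = []  # durable, append-only
        self.memtable: Dict[str, str] = {}           # in memory, lost on crash
        self.sstables: List[Dict[str, str]] = []     # durable, immutable
        self.flushed_upto = 0                        # log entries already in SSTables

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # append to the log first...
        self.memtable[key] = value            # ...then apply in memory

    def flush(self) -> None:
        self.sstables.append(dict(self.memtable))
        self.memtable = {}
        self.flushed_upto = len(self.commit_log)

    def crash(self) -> None:
        self.memtable = {}  # everything not yet flushed is gone from memory

    def replay_commit_log(self) -> None:
        # Re-apply only the entries that no SSTable already contains.
        for key, value in self.commit_log[self.flushed_upto:]:
            self.memtable[key] = value

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            if key in table:
                return table[key]
        return None


if __name__ == "__main__":
    node = ToyNode()
    node.write("a", "1")
    node.flush()
    node.write("b", "2")   # only in the memtable and the commit log
    node.crash()
    print(node.read("b"))  # None: the unflushed write is missing
    node.replay_commit_log()
    print(node.read("b"))  # 2: recovered from the commit log
```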
What is the primary purpose of conducting disaster recovery drills in Cassandra environments?
Explanation: Simulated drills confirm, before a real failure occurs, that backups can actually be restored and that the restore process is reliable and fast enough to meet recovery objectives such as the RTO. Drills do not prune tables, delete old backups, or change user roles. Their primary focus is organizational preparedness.
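A drill is most useful when it produces a pass/fail result against stated objectives. The sketch below times a restore and checks completeness; restore is a hypothetical callable that would wrap your real procedure (for example, restoring into a staging cluster) and return the number of rows it can read back, while the expected row count and RTO are values you would supply.

```python
import time
from typing import Callable, Dict


def run_drill(restore: Callable[[], int], expected_rows: int,
              rto_seconds: float) -> Dict[str, object]:
    """Time a restore and check it against the RTO and the expected data."""
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    restored_rows = restore()
    elapsed = time.monotonic() - start
    within_rto = elapsed <= rto_seconds
    data_complete = restored_rows == expected_rows
    return {
        "elapsed_seconds": round(elapsed, 3),
        "within_rto": within_rto,
        "data_complete": data_complete,
        "passed": within_rto and data_complete,
    }


if __name__ == "__main__":
    def fake_restore() -> int:
        time.sleep(0.1)  # stands in for the real restore work
        return 1000      # rows read back after the restore

    print(run_drill(fake_restore, expected_rows=1000, rto_seconds=3600))
```

Recording elapsed_seconds from each drill over time also shows whether the restore is drifting toward the RTO as the dataset grows.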