Explore foundational concepts and architectural principles behind Cassandra, including data modeling, partitioning, replication, consistency, and core terminology. This quiz is designed to help you review essential Cassandra features for building scalable, distributed databases.
Which component is the basic storage unit in Cassandra, where data is stored as rows identified by primary keys?
Explanation: A table is the basic storage unit in Cassandra, storing rows and columns of data identified by primary keys. 'Bucket', 'Block', and 'Binder' are not appropriate terms for Cassandra's core data model, as they refer to concepts in other contexts or are unrelated terms.
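The row-and-primary-key idea can be sketched with a toy model; the nested-dict layout, the `user_1` key, and the date strings below are illustrative stand-ins, not Cassandra's actual storage format:

```python
# Toy model of a Cassandra table: rows grouped into partitions and
# identified by a primary key (partition key + clustering key).
table = {}  # partition key -> {clustering key -> row columns}

def insert(partition_key, clustering_key, columns):
    table.setdefault(partition_key, {})[clustering_key] = columns

insert("user_1", "2024-01-01", {"action": "login"})
insert("user_1", "2024-01-02", {"action": "logout"})

print(len(table["user_1"]))  # → 2 (two rows share the user_1 partition)
```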
In Cassandra’s distributed architecture, what term describes the copies of data across multiple nodes to ensure reliability?
Explanation: Replication refers to the process of storing copies of data on multiple nodes for fault tolerance and availability. While 'Duplication' sounds similar, it's not the specific term used in Cassandra. 'Synchronization' relates to data consistency, and 'Fragmentation' concerns storage inefficiencies.
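As a rough sketch of how replication places copies on several nodes, mirroring SimpleStrategy-style "walk the ring" placement (the node names are invented, and MD5 stands in for a real token lookup):

```python
import hashlib

# Replica placement sketch: start at the partition's position on the ring
# and take the next RF distinct nodes clockwise.
NODES = ["node_a", "node_b", "node_c", "node_d"]  # hypothetical cluster

def replicas(partition_key: str, replication_factor: int = 3) -> list:
    token = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    start = token % len(NODES)  # stand-in for a real token lookup
    return [NODES[(start + i) % len(NODES)] for i in range(replication_factor)]

print(replicas("user_42"))  # three distinct nodes each hold a copy of the row
```

With a replication factor of 3, losing any single node still leaves two live copies, which is the fault tolerance the explanation describes.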
What is the main purpose of the partition key in a Cassandra table, as seen in an example where user_id determines data placement?
Explanation: The partition key is used to determine how rows are distributed across different nodes. It does not control the sorting of columns, data encryption, or memory allocation. The other options confuse partitioning with fundamentally different functionalities.
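A minimal sketch of the idea, using MD5 as a stand-in for Cassandra's Murmur3 partitioner (the node names and user IDs are made up):

```python
import hashlib

# The partition key alone determines which node owns a row:
# hash the key to a token, then map the token onto the ring of nodes.
NODES = ["n0", "n1", "n2", "n3"]

def owner(user_id: str) -> str:
    token = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return NODES[token % len(NODES)]

# Rows with the same user_id always land on the same node,
# no matter what their other columns contain.
assert owner("alice") == owner("alice")
```

Because placement depends only on the partition key, a query that supplies the key can be routed straight to the owning replicas instead of scanning the whole cluster.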
Which two out of Consistency, Availability, and Partition Tolerance does Cassandra prioritize in case of network failures?
Explanation: Cassandra is designed to provide high availability and partition tolerance, often at the expense of strict consistency during network partitions. 'Consistency and Partition Tolerance' would require sacrificing availability, which is not Cassandra’s default approach. The CAP theorem makes 'Consistency and Availability' impossible to guarantee together once a partition occurs, and 'Durability and Scalability' are not part of the CAP theorem at all.
What happens first when data is written to Cassandra, for example, inserting a record into a table?
Explanation: Cassandra first records the write in an in-memory structure called the memtable (after appending it to the commit log for durability) and later flushes it to disk as an immutable SSTable for long-term storage. 'Immediate distribution' relates to replication, which the coordinator handles alongside the local write, not to the first storage step. Data is not encrypted and deleted by default, nor is it sent directly to archival storage upon writing.
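The write path described above can be mimicked in a few lines; the tiny flush threshold and the names `commit_log`, `memtable`, and `sstables` are simplifications of the real machinery:

```python
# Write path sketch: append to the commit log for durability, update the
# in-memory memtable, and flush it to an immutable SSTable when it fills up.
commit_log, memtable, sstables = [], {}, []
MEMTABLE_LIMIT = 2  # artificially tiny flush threshold for illustration

def write(key, value):
    commit_log.append((key, value))      # durable append-only record
    memtable[key] = value                # fast in-memory write happens first
    if len(memtable) >= MEMTABLE_LIMIT:
        sstables.append(dict(memtable))  # flush a snapshot to "disk"
        memtable.clear()

write("k1", "v1")
write("k2", "v2")  # the second write triggers the flush
print(len(sstables), len(memtable))  # → 1 0
```

Appending to memory and a sequential log is what makes Cassandra's writes fast; the expensive sorted-on-disk work happens later, at flush time.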
When a user queries data with a 'QUORUM' consistency level, what is Cassandra ensuring?
Explanation: The 'QUORUM' consistency level requires a majority of replicas to respond to a read or write request. Cassandra has no fixed primary replica, so 'only the primary replica responds' does not apply, and requiring every replica is the 'ALL' consistency level. QUORUM by itself does not guarantee 'always the most recent data': a quorum read is only certain to see the latest write when that write also used QUORUM or stronger.
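The majority arithmetic is simple to state in code, using the standard quorum formula floor(RF / 2) + 1:

```python
# A quorum is a majority of replicas: floor(RF / 2) + 1.
def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1

assert quorum(3) == 2  # with RF=3, two replicas must respond
assert quorum(5) == 3  # with RF=5, three must respond
```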
In Cassandra’s architecture, what is the primary function of a node?
Explanation: A node is an individual server responsible for holding part of the overall database and handling read/write requests. Cassandra's peer-to-peer design has no dedicated component for coordinating global transactions; any node can act as coordinator for a given request. Query compilation and file compression are unrelated to the main purpose of a node.
Why does Cassandra use a gossip protocol among its nodes in a cluster?
Explanation: The gossip protocol allows nodes to exchange information about their status, cluster membership, and health. It does not broadcast schema details to clients, nor does it encrypt data or log queries. The other options suggest functionalities that are handled by different mechanisms.
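A toy, deterministic sketch of the epidemic spread (real gossip picks random peers and exchanges richer state such as heartbeats and schema versions; the node names here are invented):

```python
# Each round, every node exchanges its membership view with a neighbour;
# after a couple of rounds, every node has heard about every other node.
nodes = ["a", "b", "c", "d"]
views = {n: {n} for n in nodes}  # node -> set of nodes it knows about

def gossip_round():
    for i, node in enumerate(nodes):
        peer = nodes[(i + 1) % len(nodes)]  # deterministic peer choice
        merged = views[node] | views[peer]  # both sides learn the union
        views[node], views[peer] = set(merged), set(merged)

for _ in range(2):
    gossip_round()

assert all(view == set(nodes) for view in views.values())
```

The appeal of gossip is that no central registry is needed: pairwise exchanges are enough for cluster state to reach every node in a handful of rounds.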
What does the term 'data center' refer to in a Cassandra cluster?
Explanation: In Cassandra, a data center is a logical grouping of nodes used for replication and workload isolation, often mirroring real-world geography. It is not a backup device, processing unit, or network switch; those terms describe hardware components, not Cassandra's logical topology.
What does it mean that Cassandra offers tunable consistency for read and write requests?
Explanation: Tunable consistency lets clients specify the number of replica responses required, allowing trade-offs between consistency and availability. Automatic deletion of data, periodic updates of replicas, or exact-once query processing are unrelated or inaccurate representations of tunable consistency.
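One consequence of tunable consistency is worth sketching: when read acknowledgements plus write acknowledgements exceed the replication factor (R + W > RF), the read set must overlap at least one replica that holds the latest write:

```python
# Tunable consistency trade-off: a read is guaranteed to overlap the latest
# write whenever R + W > RF (e.g. QUORUM reads combined with QUORUM writes).
def read_sees_latest_write(read_acks: int, write_acks: int, rf: int) -> bool:
    return read_acks + write_acks > rf

RF = 3
assert read_sees_latest_write(2, 2, RF)      # QUORUM + QUORUM: consistent
assert not read_sees_latest_write(1, 1, RF)  # ONE + ONE: reads may be stale
```

Choosing weaker levels such as ONE buys lower latency and higher availability at the cost of possibly stale reads, which is exactly the trade-off the explanation describes.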