Cassandra Basics: Key Concepts and Architecture Quiz

Explore foundational concepts and architectural principles behind Cassandra, including data modeling, partitioning, replication, consistency, and core terminology. This quiz is designed to help you review essential Cassandra features for building scalable, distributed databases.

  1. Core Data Model Element

    Which component is the basic storage unit in Cassandra, where data is stored as rows identified by primary keys?

    1. Binder
    2. Table
    3. Bucket
    4. Block

    Explanation: A table is the basic storage unit in Cassandra. Tables belong to a keyspace, and each row in a table is identified by its primary key. 'Bucket' and 'Block' are storage terms from other systems, such as object stores and file systems, and 'Binder' is not a database term, so none of them describes Cassandra's core data model.
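
    As a rough mental model (a simplified sketch, not Cassandra's on-disk format), a table can be pictured as a map from partition key to rows, where each row is identified by its full primary key and the rows in a partition are kept in clustering-column order. The class and column names below (`ToyTable`, `user_id`, `event_time`) are hypothetical.

```python
from bisect import insort
from collections import defaultdict


class ToyTable:
    """Hypothetical table with PRIMARY KEY ((user_id), event_time):
    user_id is the partition key, event_time the clustering column."""

    def __init__(self):
        # partition key -> list of (clustering key, value), kept sorted
        self.partitions = defaultdict(list)

    def insert(self, user_id, event_time, value):
        rows = self.partitions[user_id]
        # Writing an existing primary key overwrites that row (an upsert).
        for i, (ck, _) in enumerate(rows):
            if ck == event_time:
                rows[i] = (event_time, value)
                return
        insort(rows, (event_time, value))  # keep clustering order

    def select(self, user_id):
        # .get avoids creating an empty partition for unknown keys
        return list(self.partitions.get(user_id, []))


t = ToyTable()
t.insert("u1", 3, "logout")
t.insert("u1", 1, "login")
t.insert("u1", 1, "login-retry")
print(t.select("u1"))  # [(1, 'login-retry'), (3, 'logout')]
```

    Note that the second write to key ("u1", 1) replaces the first rather than adding a duplicate row, mirroring how an INSERT in Cassandra behaves as an upsert.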

  2. Understanding Replication

    In Cassandra’s distributed architecture, what term describes keeping copies of data on multiple nodes to ensure reliability?

    1. Fragmentation
    2. Duplication
    3. Synchronization
    4. Replication

    Explanation: Replication is the process of storing copies of data on multiple nodes for fault tolerance and availability; the replication factor (RF), set per keyspace, is the number of copies kept. 'Duplication' sounds similar but is not the term Cassandra uses. 'Synchronization' relates to keeping data consistent, and 'Fragmentation' concerns storage inefficiency.
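
    Replica placement can be sketched in Python. This is a simplified model, assuming one token per node on a hypothetical 0 to 99 token ring (real clusters use a far larger token space and usually virtual nodes): the first replica is the node that owns the data's token, and the remaining replicas are the next distinct nodes clockwise, roughly what SimpleStrategy does.

```python
from bisect import bisect_left

# Hypothetical ring: (token, node), sorted by token.
RING = [(0, "node-A"), (25, "node-B"), (50, "node-C"), (75, "node-D")]


def replicas_for(token, rf, ring=RING):
    """Pick rf distinct nodes by walking the ring clockwise from the token."""
    tokens = [t for t, _ in ring]
    distinct = len({node for _, node in ring})
    # The first node whose token is >= the data's token owns it; wrap past the end.
    i = bisect_left(tokens, token) % len(ring)
    result = []
    while len(result) < min(rf, distinct):
        node = ring[i % len(ring)][1]
        if node not in result:  # skip repeats (matters with virtual nodes)
            result.append(node)
        i += 1
    return result


print(replicas_for(30, 3))  # ['node-C', 'node-D', 'node-A']
```

    With RF = 3 on this ring, losing any single node still leaves two copies of every partition.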

  3. Concept of Partition Key

    What is the main purpose of the partition key in a Cassandra table, for example a table whose partition key is user_id?

    1. Sorting columns within a row
    2. Encrypting the data
    3. Deciding which node will store the row
    4. Assigning memory limits

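    Explanation: The partition key determines how data is distributed across nodes: Cassandra's partitioner hashes it into a token, and the token decides which nodes store the partition, so all rows with the same user_id are stored together. The ordering of rows within a partition is controlled by clustering columns, not the partition key, and the partition key plays no role in encryption or memory limits.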
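
    A sketch of how a partition key might map to a node, assuming a hypothetical four-node ring over a 32-bit token space. It uses MD5 from Python's standard library as a stand-in hash; Cassandra's default Murmur3Partitioner uses a different hash and 64-bit tokens. Python's built-in `hash()` is avoided because it is randomized per process for strings, while data placement must be deterministic.

```python
import hashlib
from bisect import bisect_left

# Hypothetical ring over tokens 0 .. 2**32 - 1: (token, node), sorted by token.
RING = [(2**30, "node-A"), (2**31, "node-B"), (3 * 2**30, "node-C"), (2**32 - 1, "node-D")]


def token_for(partition_key: str) -> int:
    """Stand-in partitioner: first 4 bytes of MD5 as an unsigned 32-bit token."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def owner(partition_key: str) -> str:
    """Node owning the smallest ring token >= the key's token."""
    tokens = [t for t, _ in RING]
    return RING[bisect_left(tokens, token_for(partition_key)) % len(RING)][1]


for user_id in ("user_1", "user_2", "user_1"):
    print(user_id, token_for(user_id), owner(user_id))
```

    Every row with the same user_id hashes to the same token and therefore lands on the same node, which is why a query that supplies the partition key can be routed directly to the right replicas.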

  4. CAP Theorem Alignment

    Which two of Consistency, Availability, and Partition Tolerance does Cassandra prioritize when a network partition occurs?

    1. Consistency and Partition Tolerance
    2. Availability and Partition Tolerance
    3. Durability and Scalability
    4. Consistency and Availability

    Explanation: Cassandra is designed for high availability and partition tolerance, accepting weaker consistency during network partitions by default, although stricter per-request consistency levels can trade some availability back for consistency. 'Consistency and Partition Tolerance' would require refusing requests to stay consistent, which is not Cassandra’s default approach. 'Consistency and Availability' cannot both be guaranteed once a partition occurs, and 'Durability and Scalability' are not part of the CAP theorem.
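
    To make the trade-off concrete, here is a toy model, assuming a replication factor of 3 and hypothetical helpers that count how many replicas a coordinator can still reach on its side of a partition. A request succeeds only if enough replicas are reachable for its consistency level, so ONE keeps serving with a single reachable replica, while ALL (and, with one replica, QUORUM) is rejected; in Cassandra such requests fail with an unavailable error.

```python
def required(level: str, rf: int) -> int:
    """Replica acknowledgements needed per level (single data center, simplified)."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]


def can_serve(level: str, rf: int, reachable: int) -> bool:
    """True if the coordinator can reach enough live replicas for this level."""
    return reachable >= required(level, rf)


RF = 3
for reachable in (3, 2, 1):
    ok = [lvl for lvl in ("ONE", "QUORUM", "ALL") if can_serve(lvl, RF, reachable)]
    print(f"{reachable}/{RF} replicas reachable -> {ok}")
# 3/3 replicas reachable -> ['ONE', 'QUORUM', 'ALL']
# 2/3 replicas reachable -> ['ONE', 'QUORUM']
# 1/3 replicas reachable -> ['ONE']
```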

  5. Write Process in Cassandra

    What happens first when data is written to Cassandra, for example when inserting a record into a table?

    1. Data is encrypted and then deleted
    2. Data is immediately distributed to all nodes
    3. Data is recorded in the commit log and an in-memory memtable before being flushed to disk
    4. Data is sent to archival storage

    Explanation: On each replica, Cassandra first appends the write to the commit log, a sequential on-disk log used for crash recovery, and records it in an in-memory structure called the memtable. The memtable is later flushed to disk as an immutable SSTable. The data is not sent to every node: the coordinator forwards the write only to the replicas that own the partition. Data is not encrypted and then deleted, nor is it sent to archival storage.
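
    The per-replica write path can be modeled in a few lines. This is a hypothetical sketch: the class name, the three-entry flush threshold, and clearing the whole commit log on flush are simplifications (real flushes are triggered by memory and commit log size limits, and commit log segments are recycled once all their data has been flushed).

```python
from dataclasses import dataclass, field


@dataclass
class ToyReplica:
    """Hypothetical model: commit log append -> memtable -> flush to SSTable."""
    flush_threshold: int = 3
    commit_log: list = field(default_factory=list)
    memtable: dict = field(default_factory=dict)
    sstables: list = field(default_factory=list)

    def write(self, key, value):
        self.commit_log.append((key, value))  # durable, sequential append first
        self.memtable[key] = value            # then the in-memory table
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted, immutable SSTable and start fresh.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs replay after a crash


r = ToyReplica()
for k, v in [("b", 2), ("a", 1), ("c", 3)]:
    r.write(k, v)
print(r.sstables)  # [(('a', 1), ('b', 2), ('c', 3))]
```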

  6. Consistency Level Usage

    When a client reads data with the 'QUORUM' consistency level, what does Cassandra ensure?

    1. Only the primary replica responds
    2. The data is always the most recent
    3. A majority of replicas acknowledge the operation
    4. All nodes in the cluster must respond

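    Explanation: The 'QUORUM' consistency level requires a majority of replicas, floor(RF / 2) + 1, to acknowledge a read or write; with a replication factor of 3, that is 2 replicas. Cassandra has no primary replica, since all replicas are peers, and waiting for every replica (not every node in the cluster) is the 'ALL' level. A QUORUM read on its own does not guarantee the most recent data; that holds only when the read and write replica counts overlap (R + W > RF), for example QUORUM writes combined with QUORUM reads.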
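
    The quorum size and the overlap rule can be checked with a short sketch. The helper names `quorum` and `overlaps` are hypothetical; in a multi-data-center keyspace, the replication factor used for QUORUM is the sum of the per-data-center factors.

```python
def quorum(rf: int) -> int:
    """Replicas needed for QUORUM: a strict majority of the replication factor."""
    return rf // 2 + 1


def overlaps(read_acks: int, write_acks: int, rf: int) -> bool:
    """True if any read replica set must intersect any write replica set."""
    return read_acks + write_acks > rf


for rf in (3, 5):
    q = quorum(rf)
    print(f"RF={rf}: QUORUM={q}, QUORUM reads see QUORUM writes: {overlaps(q, q, rf)}")
# RF=3: QUORUM=2, QUORUM reads see QUORUM writes: True
# RF=5: QUORUM=3, QUORUM reads see QUORUM writes: True
```

    By contrast, with RF = 3, ONE reads after ONE writes (1 + 1 = 2, not greater than 3) can miss the latest write.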

  7. Role of a Node

    In Cassandra’s architecture, what is the primary function of a node?

    1. Coordinating global transactions
    2. Compressing large files
    3. Storing and managing a subset of data in the cluster
    4. Compiling queries into machine code

    Explanation: A node is a single Cassandra instance, typically one per server, that stores a subset of the cluster's data (the token ranges it owns) and serves read and write requests. Cassandra has no central transaction coordinator; any node can act as the coordinator for an individual client request, forwarding it to the replicas that own the data. Query compilation and file compression are not a node's main purpose.

  8. Purpose of Gossip Protocol

    Why does Cassandra use a gossip protocol among its nodes in a cluster?

    1. To encrypt data in transit
    2. To broadcast table schema to clients
    3. To share state information and maintain cluster awareness
    4. To log query histories

    Explanation: Gossip is a peer-to-peer protocol through which nodes periodically exchange information about their own state and the state of other nodes they know about, including membership and health, so every node builds a view of the cluster without a central authority. It does not send schema details to clients, and it does not log queries. Encryption of data in transit is configured separately through client and internode encryption.
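
    A toy simulation shows how state spreads without any central coordinator. It is a hypothetical sketch: each node keeps only a heartbeat counter per known node and, every round, merges its view with one randomly chosen peer, keeping the highest counter seen. Real Cassandra gossip also tracks generation numbers and application state, and feeds a failure detector.

```python
import random


def merge(a: dict, b: dict) -> None:
    """Bidirectional exchange: both views keep the highest heartbeat seen per node."""
    for node in set(a) | set(b):
        newest = max(a.get(node, -1), b.get(node, -1))
        a[node] = b[node] = newest


def simulate(nodes, seed=7, max_rounds=100):
    """Run gossip rounds until every node knows every other node."""
    rng = random.Random(seed)
    views = {n: {n: 0} for n in nodes}  # each node starts knowing only itself
    for rnd in range(1, max_rounds + 1):
        for node in nodes:
            views[node][node] += 1  # bump own heartbeat
            peer = rng.choice([p for p in nodes if p != node])
            merge(views[node], views[peer])
        if all(set(view) == set(nodes) for view in views.values()):
            return rnd, views
    return None, views


rounds, views = simulate(["A", "B", "C", "D", "E"])
print(f"all nodes aware of each other after {rounds} round(s)")
print(views["A"])
```

    Because each exchange merges two views, knowledge spreads quickly even though every node talks to only one peer per round.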

  9. Understanding a Data Center

    What does the term 'data center' refer to in a Cassandra cluster?

    1. A central processing unit
    2. A logical grouping of nodes, often for physical or geographic separation
    3. A backup storage device attached to the cluster
    4. A network switch used for routing

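    Explanation: In Cassandra, a data center is a logical grouping of nodes used for replication and load balancing, often reflecting physical or geographic location. Replication can be configured per data center, for example with NetworkTopologyStrategy. A data center is not a backup device, a processing unit, or a network switch; those are hardware components, not Cassandra architecture concepts.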
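
    Per-data-center replication can be sketched as follows. The keyspace and data center names (`shop`, `dc_east`, `dc_west`) are hypothetical; the CQL in the comment uses the standard NetworkTopologyStrategy syntax. QUORUM counts replicas across all data centers, while LOCAL_QUORUM counts only those in the coordinator's data center, which avoids waiting on cross-region replies.

```python
# Hypothetical keyspace, created in CQL with:
#   CREATE KEYSPACE shop WITH replication =
#     {'class': 'NetworkTopologyStrategy', 'dc_east': 3, 'dc_west': 2};
REPLICATION = {"dc_east": 3, "dc_west": 2}


def quorum(replication: dict) -> int:
    """QUORUM: majority of all replicas, summed across data centers."""
    return sum(replication.values()) // 2 + 1


def local_quorum(replication: dict, local_dc: str) -> int:
    """LOCAL_QUORUM: majority of the replicas in the coordinator's data center."""
    return replication[local_dc] // 2 + 1


print(quorum(REPLICATION))                   # 3
print(local_quorum(REPLICATION, "dc_east"))  # 2
print(local_quorum(REPLICATION, "dc_west"))  # 2
```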

  10. Meaning of Tunable Consistency

    What does it mean that Cassandra offers tunable consistency for read and write requests?

    1. Every query is processed exactly once
    2. Replicas automatically update themselves at fixed intervals
    3. Clients can choose how many replicas must acknowledge a request
    4. All data is eventually deleted after writing

    Explanation: Tunable consistency lets clients choose, per request, how many replicas must respond (for example ONE, QUORUM, or ALL), trading consistency against availability and latency. Automatic deletion of data, periodic replica updates, and exactly-once query processing are not what tunable consistency means.
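
    Because the level is chosen per request, read and write levels can be combined to suit a workload. The sketch below, a simplified single-data-center model with a hypothetical `acks` helper, lists which (write, read) level pairs guarantee that a read contacts at least one replica holding the latest acknowledged write, using the rule W + R > RF.

```python
from itertools import product

LEVELS = ("ONE", "TWO", "QUORUM", "ALL")


def acks(level: str, rf: int) -> int:
    """Replica responses each level waits for (single data center, simplified)."""
    return {"ONE": 1, "TWO": 2, "QUORUM": rf // 2 + 1, "ALL": rf}[level]


def overlapping_pairs(rf: int):
    """(write, read) level pairs whose replica sets must intersect: W + R > RF."""
    return [(w, r) for w, r in product(LEVELS, LEVELS)
            if acks(w, rf) + acks(r, rf) > rf]


pairs = overlapping_pairs(3)
print(len(pairs))                     # 11
print(("QUORUM", "QUORUM") in pairs)  # True
print(("ONE", "ONE") in pairs)        # False
```

    A pair such as (ONE, ALL) favors fast writes at the cost of reads that fail if any replica is down, while (QUORUM, QUORUM) balances the two sides.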