Partitioning and Clustering Fundamentals in Cassandra Quiz

Explore key concepts of partitioning and clustering in Cassandra with this quiz, covering primary keys, partition keys, clustering columns, data distribution, and query patterns. It is aimed at learners seeking a foundational understanding of how Cassandra organizes and accesses data across its distributed architecture.

  1. Partition Key Identification

    In a table with primary key ((user_id, order_id), date), which part is considered the partition key?

    1. user_id and order_id
    2. order_id and date
    3. date
    4. user_id only

    Explanation: user_id and order_id are both enclosed in the inner set of parentheses, so together they form a composite partition key. The partition key is hashed to decide which nodes store the data. date, listed after the inner parentheses, is a clustering column, not part of the partition key. order_id and date together do not form the partition key, and user_id alone is insufficient because both columns inside the inner parentheses are required.
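    The parentheses rule can be applied mechanically. The helper below, split_primary_key, is hypothetical (it is not part of Cassandra or any driver) and is a minimal sketch assuming plain, unquoted column names: it returns the partition key columns and the clustering columns for a PRIMARY KEY definition.

```python
def split_primary_key(pk: str) -> tuple[list[str], list[str]]:
    """Split a CQL-style PRIMARY KEY definition into (partition key, clustering columns)."""
    s = pk.strip()
    if not s.startswith("("):
        # e.g. "id": a single column is the whole partition key
        return [s], []
    inner = s[1:-1].strip()  # drop the outer parentheses
    if inner.startswith("("):
        # Composite partition key: the columns inside the inner parentheses
        close = inner.index(")")
        partition = [c.strip() for c in inner[1:close].split(",")]
        rest = inner[close + 1:].lstrip(", ")
    else:
        # Simple partition key: only the first column
        first, _, rest = inner.partition(",")
        partition = [first.strip()]
    # Everything after the partition key is a clustering column
    clustering = [c.strip() for c in rest.split(",") if c.strip()]
    return partition, clustering


print(split_primary_key("((user_id, order_id), date)"))  # (['user_id', 'order_id'], ['date'])
print(split_primary_key("(user_id, order_id, date)"))    # (['user_id'], ['order_id', 'date'])
```

    The second call shows why the inner parentheses matter: without them, only user_id is the partition key and order_id becomes a clustering column.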

  2. Role of the Clustering Column

    What is the main function of a clustering column in a Cassandra table?

    1. It determines the table schema
    2. It splits data between different nodes
    3. It assigns the replication factor
    4. It defines the sort order of rows within a partition

    Explanation: The clustering column controls how rows are organized and sorted inside each partition for efficient range queries. It does not split data between nodes, which is handled by the partition key. The replication factor is set at the keyspace level, not by clustering columns. Table schema involves more than just clustering columns.
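    To make "sorted inside each partition" concrete, here is a minimal in-memory sketch. ToyTable is a hypothetical name; it assumes one clustering column and ignores duplicates, replication, and storage details. Rows are grouped by partition key and kept in clustering order, so a range query within one partition is a slice found by binary search.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Partitions keyed by partition key; each holds rows sorted by clustering key."""

    def __init__(self):
        # partition key -> list of (clustering_key, value), kept in ascending order
        self.partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, value):
        # insort keeps the partition's rows in clustering order on every write
        bisect.insort(self.partitions[partition_key], (clustering_key, value))

    def range_query(self, partition_key, start, end):
        """Rows with start <= clustering_key < end, read from a single partition."""
        rows = self.partitions.get(partition_key, [])
        # A 1-tuple (k,) sorts before every (k, value), so these find the slice bounds
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]


t = ToyTable()
t.insert("u1", "2024-03-02", "b")
t.insert("u1", "2024-03-01", "a")
t.insert("u1", "2024-03-05", "c")
print(t.range_query("u1", "2024-03-01", "2024-03-03"))  # [('2024-03-01', 'a'), ('2024-03-02', 'b')]
```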

  3. Data Distribution Mechanism

    How does Cassandra ensure that data is evenly distributed across nodes?

    1. By manually assigning nodes
    2. By using clustering columns
    3. By table definition
    4. By hashing the partition key

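    Explanation: Cassandra's partitioner hashes the partition key into a token, and the token's position on the cluster's token ring determines which nodes store each partition. Because hash values are spread roughly uniformly, data ends up approximately evenly distributed, provided the partition key has many distinct values. Clustering columns affect row order, not node placement. Manually assigning nodes or relying on the table definition alone does not provide this automatic load balancing.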
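    The token ring idea can be sketched in a few lines. This is a simplified model under stated assumptions: the token function uses the first 8 bytes of an MD5 digest as a stand-in hash (Cassandra's default Murmur3Partitioner uses a different hash and signed 64-bit tokens), the four node names and their evenly spaced tokens are hypothetical, and replication is ignored. Each node owns the range of tokens ending at its own token.

```python
import bisect
import hashlib
from collections import Counter


def token(partition_key: str) -> int:
    # ASSUMPTION: stand-in hash, first 8 bytes of MD5 as an unsigned 64-bit integer
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")


class TokenRing:
    """Toy ring: a node owns the tokens after the previous node's token, up to its own."""

    def __init__(self, node_tokens: dict[str, int]):
        self.ring = sorted((t, node) for node, t in node_tokens.items())
        self.tokens = [t for t, _ in self.ring]

    def node_for(self, partition_key: str) -> str:
        # First node whose token is >= the key's token, wrapping around past the end
        i = bisect.bisect_left(self.tokens, token(partition_key))
        return self.ring[i % len(self.ring)][1]


# Hypothetical nodes with evenly spaced tokens over [0, 2**64)
ring = TokenRing({"node_a": 2**62, "node_b": 2**63, "node_c": 3 * 2**62, "node_d": 2**64 - 1})
counts = Counter(ring.node_for(f"user-{i}") for i in range(10_000))
print(counts)  # each node receives roughly a quarter of the keys
```

    The same key always hashes to the same token, so every read and write for a partition is routed to the same place without any lookup table.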

  4. Primary Key Components

    In Cassandra, which of the following best describes the primary key?

    1. It consists of all the columns in the table
    2. It only includes the first column of the table
    3. It consists of partition key and optional clustering columns
    4. It is the same as the replication key

    Explanation: The primary key is made up of the partition key and zero or more clustering columns, which together determine row uniqueness and data layout. It is not the same as a "replication key", which does not exist as a structural concept. The first column alone may not represent the full primary key. Not all columns are part of the primary key; only the columns listed in the PRIMARY KEY definition are.
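    One consequence of "the primary key determines uniqueness" is that writing a row whose full primary key already exists does not create a second row: Cassandra treats the insert as an upsert and the new column values replace the old ones. The sketch below models that with a plain dictionary; upsert is a hypothetical helper, and write timestamps and tombstones are deliberately ignored.

```python
def upsert(table: dict, partition_key: tuple, clustering_key: tuple, columns: dict) -> None:
    # The full primary key (partition key + clustering columns) identifies the row.
    # Writing the same key again merges the new values over the existing ones
    # (simplified: real Cassandra resolves conflicts per cell using write timestamps).
    row = table.setdefault((partition_key, clustering_key), {})
    row.update(columns)


orders = {}
upsert(orders, ("u1",), ("2024-03-01",), {"total": 10})
upsert(orders, ("u1",), ("2024-03-01",), {"total": 12, "status": "paid"})  # same primary key
upsert(orders, ("u1",), ("2024-03-02",), {"total": 5})                    # new clustering value
print(len(orders))  # 2
```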

  5. Partition Size Considerations

    What is a potential issue when a partition in Cassandra becomes too large?

    1. Increased replication factor
    2. Slower read and write performance
    3. Reduced cluster size
    4. Smaller disk usage

    Explanation: Large partitions slow down reads and writes: requests that touch them do more disk I/O and use more memory, and maintenance work such as compaction and repair becomes more expensive. The replication factor is not influenced by partition size; it is set per keyspace. Cluster size is unrelated to partition size, and large partitions do not reduce disk usage.
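    Partition size is largely a consequence of how many distinct values the partition key has. The sketch below counts rows per partition for two candidate partition keys over hypothetical synthetic data (3 countries, 300 users, 6,000 events); partition_sizes is an illustrative helper, not a Cassandra tool, and no particular "too large" threshold is implied.

```python
from collections import Counter


def partition_sizes(rows, partition_key_cols):
    """Count rows per partition when rows are keyed by the given columns."""
    return Counter(tuple(row[c] for c in partition_key_cols) for row in rows)


# Hypothetical synthetic events
rows = [
    {"country": ["US", "DE", "JP"][i % 3], "user_id": i % 300, "event_id": i}
    for i in range(6000)
]

by_country = partition_sizes(rows, ["country"])
by_country_user = partition_sizes(rows, ["country", "user_id"])
print(len(by_country), max(by_country.values()))            # 3 2000
print(len(by_country_user), max(by_country_user.values()))  # 300 20
```

    Keying by country alone packs the data into three large partitions, while adding user_id spreads the same rows across many small ones.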

  6. Query Patterns and Primary Keys

    Which query is most efficient for accessing data in Cassandra?

    1. Querying with only clustering columns
    2. Querying by the full primary key
    3. Querying by non-primary key columns
    4. Querying all data without filters

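    Explanation: A query that supplies the full primary key is routed directly to the nodes owning that partition, and the clustering columns locate the row within it, making it fast and efficient. Filtering on non-primary key columns is rejected unless a secondary index exists or the query adds ALLOW FILTERING, which may scan many partitions. Querying by clustering columns alone, without the partition key, likewise requires ALLOW FILTERING and has to examine every partition. Querying all data without filters is resource-intensive and slow.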
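    The cost difference shows up as the number of partitions a query has to examine. In this minimal sketch, toy_select and the sample table are hypothetical; the table maps each partition key to a dictionary of rows keyed by clustering value. Supplying the partition key touches one partition, while filtering on the clustering column alone touches all of them, roughly what ALLOW FILTERING permits.

```python
def toy_select(table, partition_key=None, clustering_key=None):
    """Return (matching rows, number of partitions examined)."""
    if partition_key is not None:
        # Partition key supplied: jump straight to that single partition
        candidates = [table.get(partition_key, {})]
    else:
        # No partition key: every partition must be examined
        candidates = list(table.values())
    matches = []
    for partition in candidates:
        if clustering_key is None:
            matches.extend(partition.values())
        elif clustering_key in partition:
            matches.append(partition[clustering_key])
    return matches, len(candidates)


# 1,000 hypothetical users, three days of rows each
table = {
    f"user-{u}": {day: {"user": f"user-{u}", "day": day} for day in ("d1", "d2", "d3")}
    for u in range(1000)
}
print(toy_select(table, "user-7", "d2")[1])       # 1
print(toy_select(table, clustering_key="d2")[1])  # 1000
```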

  7. Effect of Changing Partition Key

    What happens if you change the partition key of a table with existing data?

    1. Nothing changes for existing data
    2. All existing data must be rewritten
    3. Only new data uses the new key
    4. Data is automatically reassigned to new keys

    Explanation: Cassandra does not allow the primary key columns of an existing table to be altered, so changing the partition key means creating a new table and copying every row into it; the existing rows are placed according to tokens computed from the old partition key. Data is never reassigned automatically, and a single table cannot store new data under a different key while keeping old data under the original one.
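    The reason every row must be rewritten is that each row's location is derived from its key. The in-memory sketch below re-keys all rows under a new primary key layout; migrate and the sample rows are hypothetical, and in a real cluster the equivalent step is a job that reads from the old table and writes each row into the new one.

```python
from collections import defaultdict


def migrate(old_rows, new_partition_cols, new_clustering_cols):
    """Rebuild a toy table under a new primary key: every row is re-keyed."""
    new_table = defaultdict(dict)
    for row in old_rows:
        pk = tuple(row[c] for c in new_partition_cols)   # new partition key
        ck = tuple(row[c] for c in new_clustering_cols)  # new clustering key
        new_table[pk][ck] = dict(row)
    return dict(new_table)


old_rows = [
    {"user_id": 1, "order_id": 10, "date": "2024-03-01"},
    {"user_id": 1, "order_id": 11, "date": "2024-03-02"},
    {"user_id": 2, "order_id": 10, "date": "2024-03-01"},
]
new_table = migrate(old_rows, ["user_id", "order_id"], ["date"])
print(sorted(new_table))  # [(1, 10), (1, 11), (2, 10)]
```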

  8. Partition vs. Clustering Column Roles

    Which statement best distinguishes a partition key from a clustering column?

    1. Partition key decides data location; clustering column sorts rows within a partition
    2. Both are used only for sorting data
    3. Clustering column sets replication; partition key is used for sorting
    4. Partition key sorts data; clustering column determines node location

    Explanation: The partition key determines which nodes the partition resides on, while clustering columns control the order of rows inside the partition. Clustering columns do not affect node placement or replication, and the partition key does not sort rows within a partition. The two roles are not interchangeable: only clustering columns determine row order.

  9. Composite Partition Keys

    Which situation would benefit from using a composite partition key in Cassandra?

    1. When sorting data within a partition is required
    2. When global ordering is necessary across the table
    3. When a table contains only a single column
    4. When you need to group data by multiple fields, like (city, department)

    Explanation: Composite partition keys partition data on a combination of fields, such as city and department, so that rows queried together are stored together. Sorting within a partition relies on clustering columns, not the partition key. Cassandra does not order rows across partitions (with the default partitioner, partitions are ordered by token, not by key value), so a composite key cannot provide global ordering. A table with a single column has nothing to combine, so a composite key does not apply there.
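    With a composite partition key, all components are hashed together into one token, so only rows sharing every component land in the same partition. This is a minimal sketch: composite_token uses MD5 over the components joined by a NUL separator as a stand-in (Cassandra serializes composite keys differently and hashes them with its own partitioner), and the employee records are hypothetical.

```python
import hashlib
from collections import defaultdict


def composite_token(*key_parts: str) -> int:
    # ASSUMPTION: stand-in encoding and hash; the separator keeps ("ab", "c") != ("a", "bc")
    encoded = "\x00".join(key_parts).encode()
    return int.from_bytes(hashlib.md5(encoded).digest()[:8], "big")


employees = [
    ("Paris", "Sales", "Ana"),
    ("Paris", "Sales", "Luc"),
    ("Paris", "IT", "Zoe"),
    ("Oslo", "Sales", "Per"),
]

# Group rows by the token of the composite key (city, department)
partitions = defaultdict(list)
for city, dept, name in employees:
    partitions[composite_token(city, dept)].append(name)

print(len(partitions))  # 3: (Paris, Sales), (Paris, IT), (Oslo, Sales)
```

    Paris/Sales and Paris/IT get different tokens, so a query for one department in a city reads exactly one partition.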

  10. Purpose of Clustering Order

    Why might you specify a clustering order (ASC or DESC) on a table's clustering column?

    1. To optimize retrieval of most recent or oldest records first
    2. To increase table's replication factor
    3. To enforce unique values in a column
    4. To distribute partitions evenly across nodes

    Explanation: Clustering order helps return results in the desired sequence, useful for time-series data or retrieving latest entries quickly. It does not affect partition distribution, which is managed by the partition key. Unique constraints are not enforced by clustering order, and replication factor is unrelated to data ordering.
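    Here is a minimal sketch of why descending order helps "latest first" reads, assuming a single timestamp clustering column. TimeSeriesPartition is a hypothetical class that keeps rows newest-first, mirroring a table declared with descending clustering order, so fetching the latest n readings is just taking the first n rows with no sort at read time.

```python
from datetime import datetime


class TimeSeriesPartition:
    """One partition whose rows are kept newest-first (descending by timestamp)."""

    def __init__(self):
        self.rows = []  # list of (timestamp, reading), descending by timestamp

    def insert(self, ts: datetime, reading: float) -> None:
        # Linear scan for the insert position keeps the sketch short
        i = 0
        while i < len(self.rows) and self.rows[i][0] > ts:
            i += 1
        self.rows.insert(i, (ts, reading))

    def latest(self, n: int):
        # Newest rows are already at the front: like a LIMIT on a DESC-clustered table
        return self.rows[:n]


p = TimeSeriesPartition()
p.insert(datetime(2024, 3, 1, 12), 20.5)
p.insert(datetime(2024, 3, 1, 14), 21.0)
p.insert(datetime(2024, 3, 1, 13), 20.8)
print(p.latest(1))  # [(datetime.datetime(2024, 3, 1, 14, 0), 21.0)]
```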