Explore key concepts of partitioning and clustering in Cassandra with this quiz, covering primary keys, partition keys, clustering columns, data distribution, and query patterns. Ideal for learners seeking a foundational understanding of how data is organized and accessed in Cassandra's distributed architecture.
In a table with primary key ((user_id, order_id), date), which part is considered the partition key?
Explanation: user_id and order_id are both enclosed in the first set of parentheses, indicating they form the partition key. The partition key determines the node on which the data is stored. The date is a clustering column, not a partition key. order_id and date together do not form the partition key, and user_id alone is insufficient since both columns are required for partitioning.
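The split between partition key and clustering column can be sketched in plain Python. This is only an in-memory model, not how Cassandra stores data; the sample rows, the `status` column, and the helper name `partition_key_of` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows for a table with PRIMARY KEY ((user_id, order_id), date).
rows = [
    {"user_id": 1, "order_id": 10, "date": "2024-01-02", "status": "shipped"},
    {"user_id": 1, "order_id": 10, "date": "2024-01-01", "status": "placed"},
    {"user_id": 1, "order_id": 11, "date": "2024-01-03", "status": "placed"},
]

PARTITION_KEY = ("user_id", "order_id")  # the columns inside the inner parentheses
CLUSTERING_COLUMNS = ("date",)           # the columns that follow them


def partition_key_of(row):
    """Return the tuple of partition key values for a row."""
    return tuple(row[col] for col in PARTITION_KEY)


# Group rows by partition key, then order each group by the clustering columns.
partitions = defaultdict(list)
for row in rows:
    partitions[partition_key_of(row)].append(row)

for part in partitions.values():
    part.sort(key=lambda r: tuple(r[col] for col in CLUSTERING_COLUMNS))
```

In this model the two rows for `(1, 10)` share a partition and are ordered by `date`, while `(1, 11)` forms a separate partition.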
What is the main function of a clustering column in a Cassandra table?
Explanation: The clustering column controls how rows are organized and sorted inside each partition for efficient range queries. It does not split data between nodes, which is handled by the partition key. The replication factor is set at the keyspace level, not by clustering columns. Table schema involves more than just clustering columns.
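To illustrate why sorted rows make range queries cheap, here is a minimal sketch that models one partition as a sorted list and answers a range query with binary search. The names `insert_row` and `range_query` and the sample data are assumptions made for illustration, not Cassandra APIs.

```python
import bisect

# One partition, modeled as a list of (date, payload) kept sorted by date.
partition = []


def insert_row(date, payload):
    """Insert a row while keeping the partition sorted by its clustering value."""
    bisect.insort(partition, (date, payload))


def range_query(start, end):
    """Return rows with start <= date < end using two binary searches."""
    lo = bisect.bisect_left(partition, (start,))
    hi = bisect.bisect_left(partition, (end,))
    return partition[lo:hi]


insert_row("2024-01-03", "c")
insert_row("2024-01-01", "a")
insert_row("2024-01-02", "b")
insert_row("2024-01-04", "d")

january_2_to_3 = range_query("2024-01-02", "2024-01-04")
```

Because the rows are already in clustering order, the query touches only the contiguous slice it needs instead of scanning the whole partition.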
How does Cassandra ensure that data is evenly distributed across nodes?
Explanation: Cassandra hashes the partition key to a token, and the token determines which node owns the row. Because the hash spreads keys roughly uniformly, data is distributed evenly as long as the partition key has enough distinct values. Clustering columns affect row order, not node placement. Data distribution isn't achieved by manually assigning nodes or by table definition alone, as neither provides automatic load balancing.
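The hash-to-node mapping can be sketched with a simplified token ring. This sketch makes several assumptions: it uses MD5 from the standard library as a stand-in hash (Cassandra's default partitioner is Murmur3-based), gives each node a single evenly spaced token rather than virtual nodes, and uses made-up node names.

```python
import bisect
import hashlib
from collections import Counter

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
TOKEN_SPACE = 2 ** 64

# Each node owns the token range that ends at its own token (evenly spaced here).
ring = [((i + 1) * TOKEN_SPACE // len(NODES) - 1, node) for i, node in enumerate(NODES)]
tokens = [token for token, _ in ring]


def token_for(partition_key):
    """Hash the partition key to a 64-bit token (MD5 is a stand-in hash)."""
    digest = hashlib.md5(repr(partition_key).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def node_for(partition_key):
    """Find the first node whose token is >= the key's token."""
    index = bisect.bisect_left(tokens, token_for(partition_key))
    return ring[index][1]


# Count how many of 10,000 distinct partition keys land on each node.
counts = Counter(node_for((user_id,)) for user_id in range(10_000))
```

With many distinct partition keys, each node in this model should receive roughly a quarter of the keys; a partition key with only a handful of distinct values would not spread this way.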
In Cassandra, which of the following best describes the primary key?
Explanation: The primary key is made up of the partition key and zero or more clustering columns, and it determines both row uniqueness and data layout. There is no "replication key" in Cassandra's data model, so the primary key cannot be equated with one. The first column alone may not represent the full primary key. Not all columns are part of the primary key, only those explicitly declared in it.
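One consequence of the primary key defining uniqueness is that writing a row with an existing primary key replaces the earlier values rather than creating a duplicate. A minimal sketch, modeling a table as a dictionary keyed by (partition key, clustering key); the `insert` helper and the `status` column are hypothetical.

```python
# Table modeled as a dict keyed by (partition key, clustering key).
table = {}


def insert(user_id, order_id, date, status):
    """Write a row; an existing row with the same primary key is overwritten."""
    primary_key = ((user_id, order_id), (date,))
    table[primary_key] = {"status": status}


insert(1, 10, "2024-01-01", "placed")
insert(1, 10, "2024-01-01", "shipped")    # same primary key: replaces the row above
insert(1, 10, "2024-01-02", "delivered")  # different clustering value: new row
```

After these three writes the model holds two rows, because the first two share a primary key.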
What is a potential issue when a partition in Cassandra becomes too large?
Explanation: Large partitions can slow reads and writes because more data must be read from disk and moved over the network, and they concentrate load on the nodes that hold them. The replication factor is not influenced by partition size; it is set per keyspace. Cluster size changes are unrelated to partition sizes, and large partitions increase disk usage rather than decrease it.
Which query is most efficient for accessing data in Cassandra?
Explanation: Queries that specify the full primary key are routed directly to the correct partition and locate a single row, making them fast and efficient. Columns outside the primary key are not indexed by default, so filtering on them forces a scan across partitions. A query that supplies only clustering columns, without the partition key, cannot be routed to a single partition and is therefore inefficient. Querying all data is resource-intensive and slow.
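The cost difference can be made concrete by counting how many rows each access path examines. This is an in-memory sketch under assumed data (100 partitions of 5 rows each, with a made-up `amount` column); the function names are hypothetical.

```python
# partitions: partition key -> {clustering key -> row}
partitions = {
    (user_id,): {(day,): {"amount": user_id * 10 + day} for day in range(1, 6)}
    for user_id in range(1, 101)
}


def get_by_primary_key(user_id, day):
    """Direct lookup by full primary key; examines exactly one row."""
    row = partitions[(user_id,)][(day,)]
    return row, 1


def get_by_amount(amount):
    """Filter on a non-key column; must examine every row in every partition."""
    examined = 0
    matches = []
    for part in partitions.values():
        for row in part.values():
            examined += 1
            if row["amount"] == amount:
                matches.append(row)
    return matches, examined


row, rows_examined_by_key = get_by_primary_key(7, 3)
matches, rows_examined_by_scan = get_by_amount(73)
```

Both calls find the same row, but the key lookup examines 1 row while the scan examines all 500.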
What happens if you change the partition key of a table with existing data?
Explanation: Cassandra does not allow a table's primary key to be altered, so changing the partition key requires creating a new table and migrating the data, since the original storage layout depends on the old partition key. Existing data is not automatically redistributed. Using the new key only for new data isn't possible without a redesigned schema.
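The migration amounts to reading every row from the old layout and rewriting it under the new key. A minimal in-memory sketch, assuming an old table partitioned by `user_id` and a new one partitioned by `city`; the table contents and the function name are hypothetical, and a real migration would read from and write to the cluster instead.

```python
from collections import defaultdict

# Old layout: PRIMARY KEY ((user_id), order_id), with city as a regular column.
old_table = {
    (1,): {(10,): {"city": "Oslo"}, (11,): {"city": "Lima"}},
    (2,): {(12,): {"city": "Oslo"}},
}


def migrate_to_city_partitioning(old):
    """Rewrite every row into a new layout: PRIMARY KEY ((city), user_id, order_id)."""
    new_table = defaultdict(dict)
    for (user_id,), rows in old.items():
        for (order_id,), row in rows.items():
            new_table[(row["city"],)][(user_id, order_id)] = {}
    return dict(new_table)


new_table = migrate_to_city_partitioning(old_table)
```

Every row moves to a partition chosen by its new key, which is why the change cannot be applied in place.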
Which statement best distinguishes a partition key from a clustering column?
Explanation: The partition key determines on which node the partition will reside, while clustering columns control the order of rows inside the partition. Clustering columns do not affect node placement or replication. It is not the case that both simply sort data; only clustering columns determine row order.
Which situation would benefit from using a composite partition key in Cassandra?
Explanation: Composite partition keys allow partitioning data based on a combination of fields, such as city and department, aligning partitioning with access patterns. Sorting within a partition relies on clustering columns, not the partition key. Global ordering isn't supported in Cassandra, and composite keys make no sense for single-column tables.
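To see how a composite partition key changes the grouping, the sketch below counts rows per partition for a single-column key and for a two-column key. The employee data, city names, and department names are invented for illustration.

```python
from collections import Counter

# Hypothetical employee rows: 2 cities x 3 departments x 4 employees each.
employees = []
for city in ("Oslo", "Lima"):
    for department in ("sales", "eng", "ops"):
        for _ in range(4):
            employees.append(
                {"city": city, "department": department, "employee_id": len(employees)}
            )

# Rows per partition when partitioning by city alone.
by_city = Counter((e["city"],) for e in employees)

# Rows per partition when partitioning by the composite key (city, department).
by_city_and_department = Counter((e["city"], e["department"]) for e in employees)
```

In this model the composite key produces six partitions of four rows instead of two partitions of twelve, which suits queries that always supply both city and department.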
Why might you specify a clustering order (ASC or DESC) on a table's clustering column?
Explanation: Clustering order helps return results in the desired sequence, useful for time-series data or retrieving latest entries quickly. It does not affect partition distribution, which is managed by the partition key. Unique constraints are not enforced by clustering order, and replication factor is unrelated to data ordering.
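A descending clustering order can be sketched as a partition stored newest-first, so fetching the latest entries is a read from the front. This is an in-memory model; the negated-timestamp trick, the helper names, and the sample readings are assumptions for illustration.

```python
import bisect

# Partition stored newest-first: timestamps are negated so that the
# list's ascending order corresponds to descending time.
readings = []


def insert_reading(timestamp, value):
    """Insert a reading while keeping the partition in descending time order."""
    bisect.insort(readings, (-timestamp, value))


def latest(n):
    """Return the n most recent readings as (timestamp, value) pairs."""
    return [(-neg_ts, value) for neg_ts, value in readings[:n]]


insert_reading(100, 20.5)
insert_reading(300, 21.0)
insert_reading(200, 19.8)

two_latest = latest(2)
```

Because the newest rows sit at the start of the partition, reading the latest entries needs no extra sorting at query time.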