Explore core concepts of Cassandra indexing with this quiz on secondary indexes and materialized views, focusing on their functionality, use cases, and differences. Ideal for those seeking to reinforce their understanding of efficient data retrieval techniques in distributed databases.
Which statement best describes the main purpose of a secondary index in Cassandra?
Explanation: Secondary indexes in Cassandra are used to enable querying based on columns that are not part of the primary key. They do not increase write performance; in fact, they can add some overhead. Secondary indexes do not group tables, nor are they primarily intended for reducing storage space. Their core job is to support more flexible queries.
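As a rough illustration of the idea above, the following Python sketch models a table keyed by `user_id` with an index on a non-key column. It is a toy in-memory model, not Cassandra's implementation; the class `UsersTable` and the `city` column are hypothetical names chosen for the example. It shows both the more flexible lookup and the extra work the index adds to each write.

```python
from collections import defaultdict


class UsersTable:
    """Toy table keyed by user_id with a hypothetical index on 'city'."""

    def __init__(self):
        self.rows = {}                      # primary key -> row
        self.city_index = defaultdict(set)  # city -> set of primary keys

    def upsert(self, user_id, name, city):
        old = self.rows.get(user_id)
        if old is not None:
            # Extra write-path work: remove the stale index entry first.
            self.city_index[old["city"]].discard(user_id)
        self.rows[user_id] = {"name": name, "city": city}
        self.city_index[city].add(user_id)

    def find_by_city(self, city):
        # Find matching keys in the index, then fetch each row by primary key.
        keys = self.city_index.get(city, ())
        return sorted(self.rows[k]["name"] for k in keys)


table = UsersTable()
table.upsert(1, "Ana", "Lisbon")
table.upsert(2, "Ben", "Oslo")
table.upsert(3, "Caro", "Lisbon")
print(table.find_by_city("Lisbon"))  # ['Ana', 'Caro']
```

Note that the index holds only primary keys, so every match still costs a second lookup in the table itself.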
In which scenario would a materialized view be more appropriate than a secondary index?
Explanation: Materialized views in Cassandra are useful when you need an alternative primary key for efficient data retrieval. They don’t provide join functionality, data compression, or encryption capabilities directly. Each incorrect option describes a feature or behavior not supported by materialized views.
What is a common limitation of secondary indexes in Cassandra, especially on large datasets with low cardinality values?
Explanation: Secondary indexes can become inefficient and slow on large datasets with low cardinality because many rows can map to the same index entry, making lookups costly. They do not handle sharding or require unique columns. Rather than improving write speeds, secondary indexes may slightly reduce them due to the extra indexing overhead.
What happens to a materialized view when data in the base table is modified?
Explanation: Cassandra maintains materialized views automatically, so a write to the base table triggers the corresponding change in the view. The other options are incorrect because the view is neither left outdated, manually rebuilt for every change, nor deleted and recreated. View updates are applied asynchronously, so the view is eventually consistent with the base table rather than instantly identical to it.
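The propagation described above can be sketched with a small Python model. This is a simplified, synchronous illustration under assumed names (`BaseWithView`, an `email` view key); in Cassandra the view update happens in the background rather than inside the write call.

```python
class BaseWithView:
    """Toy base table keyed by user_id plus a view partitioned by email."""

    def __init__(self):
        self.base = {}  # user_id -> row
        self.view = {}  # email -> {user_id: copy of row}

    def write(self, user_id, email, name):
        old = self.base.get(user_id)
        if old is not None:
            # The row may move to a different view partition: drop the stale copy.
            partition = self.view[old["email"]]
            del partition[user_id]
            if not partition:
                del self.view[old["email"]]
        row = {"email": email, "name": name}
        self.base[user_id] = row
        # The view keeps its own physical copy under the alternative key.
        self.view.setdefault(email, {})[user_id] = dict(row)

    def by_email(self, email):
        partition = self.view.get(email, {})
        return [row["name"] for _, row in sorted(partition.items())]


t = BaseWithView()
t.write(1, "a@example.com", "Ana")
t.write(1, "ana@example.com", "Ana")   # update changes the view key
print(t.by_email("a@example.com"))     # []
print(t.by_email("ana@example.com"))   # ['Ana']
```

The caller only ever writes to the base table; the stale view row is removed and the new one added as a side effect of that write.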
If you need to query rows by a non-primary key column that is frequently updated, which approach might be less optimal?
Explanation: Secondary indexes on frequently updated columns can cause performance problems, because every change to the column also requires removing the old index entry and writing a new one. Materialized views or table designs with appropriate key structures generally handle this access pattern better, although a view also adds work to each write. Retrieving data using only primary keys doesn’t involve this column and is not relevant to the situation.
Which statement accurately distinguishes materialized views from secondary indexes in Cassandra?
Explanation: Materialized views physically store data under a new primary key, which duplicates it, while secondary indexes store references to the data and don’t duplicate the actual content. The claim that secondary indexes are always faster is untrue; their performance depends on query patterns. Materialized views are updated automatically after creation. Secondary indexes do not support joining tables.
What is the effect of dropping a secondary index in Cassandra?
Explanation: Once a secondary index is dropped, queries that filter on that column's values will be rejected unless they specify ALLOW FILTERING; queries by primary key continue to work. Dropping an index does not delete the table's data, require recreating the table, or inherently improve all read operations; it only changes which queries on that column are possible.
Why should secondary indexes generally be avoided on columns with very few distinct values (low cardinality)?
Explanation: Low cardinality columns result in secondary index entries pointing to large numbers of rows, making queries inefficient. Small indexes are not inherently problematic, nor does index size affect compression or block writes. The main concern is the performance penalty during lookups.
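To make the fan-out concrete, here is a small Python calculation of how many rows each distinct indexed value points to on average. The column names `status` and `customer_id` and the row counts are made-up example data, not measurements from a real cluster.

```python
from collections import Counter


def rows_per_index_entry(values):
    """Average number of rows that each distinct indexed value points to."""
    counts = Counter(values)
    return len(values) / len(counts)


n = 100_000
status = ["active" if i % 2 else "inactive" for i in range(n)]  # 2 distinct values
customer_id = [i % 5_000 for i in range(n)]                     # 5,000 distinct values

print(rows_per_index_entry(status))       # 50000.0
print(rows_per_index_entry(customer_id))  # 20.0
```

A lookup on the two-valued column has to touch about half the table, which is little better than scanning it.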
What is a key storage consideration when using materialized views in Cassandra?
Explanation: A materialized view stores its own copy of the underlying data under a new primary key, which increases disk consumption. Views are not just metadata; the data is physically duplicated. They do not compress pointers or minimize storage, so each additional view adds to disk use.
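The storage trade-off can be approximated with simple arithmetic. The Python sketch below counts stored cells in a deliberately crude model (one cell per column value per row, and two cells per row for an index entry); the function name and the counting rule are assumptions for illustration and do not reflect Cassandra's actual on-disk format.

```python
def cells_stored(rows, base_columns, view_columns=None, index_column=False):
    """Count cells in a toy model: one cell per column value per row."""
    total = rows * base_columns
    if view_columns is not None:
        # A view keeps its own copy of every selected column.
        total += rows * view_columns
    if index_column:
        # An index keeps only (indexed value, primary key) per row.
        total += rows * 2
    return total


base_only = cells_stored(1000, 5)
with_index = cells_stored(1000, 5, index_column=True)
with_view = cells_stored(1000, 5, view_columns=5)
print(base_only, with_index, with_view)  # 5000 7000 10000
```

In this model a view that selects every column doubles the stored data, while an index adds a smaller, fixed amount per row.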
In Cassandra, what is typically required when querying a non-primary key column that does not have a secondary index or materialized view?
Explanation: Without an index or materialized view, Cassandra rejects a query that filters on a non-primary key column unless ALLOW FILTERING is specified, in which case it must scan the table to locate matching rows, which is inefficient for large tables. Direct high-speed filtering and joins are not supported. Columns are not indexed automatically; an index must be created explicitly.
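The cost of such a scan is easy to see in a small Python sketch. The function `scan_by_column` and the `country` data are hypothetical; the point is only that every row is examined regardless of how few match.

```python
def scan_by_column(rows, column, value):
    """Check every row for a match; the work grows with table size."""
    examined = 0
    matches = []
    for key, row in rows.items():
        examined += 1
        if row[column] == value:
            matches.append(key)
    return matches, examined


# 10,000 rows, of which only every hundredth has country "PT".
rows = {i: {"country": "PT" if i % 100 == 0 else "NO"} for i in range(10_000)}
matches, examined = scan_by_column(rows, "country", "PT")
print(len(matches), examined)  # 100 10000
```

Returning 100 rows required reading all 10,000, whereas a lookup by primary key touches only the requested row.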