Cassandra Indexing: Secondary Indexes & Materialized Views Essentials Quiz

Explore core concepts of Cassandra indexing with this quiz on secondary indexes and materialized views, focusing on their functionality, use cases, and differences. Ideal for those seeking to reinforce their understanding of efficient data retrieval techniques in distributed databases.

  1. Purpose of Secondary Indexes

    Which statement best describes the main purpose of a secondary index in Cassandra?

    1. To increase write performance for all queries
    2. To group multiple tables under a single key
    3. To reduce storage space for large datasets
    4. To allow efficient querying on non-primary key columns

    Explanation: Secondary indexes in Cassandra are used to enable querying based on columns that are not part of the primary key. They do not increase write performance; in fact, they can add some overhead. Secondary indexes do not group tables, nor are they primarily intended for reducing storage space. Their core job is to support more flexible queries.
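
    As a rough mental model only, a secondary index (created in CQL with `CREATE INDEX`) can be pictured as a map from each value of a non-key column to the primary keys of the rows that hold it. The sketch below is a toy in-memory model, not Cassandra's storage format; the `ToyTable` class and the `user_id`/`country` columns are hypothetical.

```python
from collections import defaultdict


class ToyTable:
    """Toy table keyed by primary key, with a secondary index on one non-key column."""

    def __init__(self, indexed_column):
        self.rows = {}                     # primary key -> row dict
        self.indexed_column = indexed_column
        self.index = defaultdict(set)      # indexed value -> set of primary keys

    def insert(self, pk, row):
        old = self.rows.get(pk)
        if old is not None:
            # Drop the stale index entry before recording the new value.
            self.index[old[self.indexed_column]].discard(pk)
        self.rows[pk] = row
        self.index[row[self.indexed_column]].add(pk)

    def get_by_pk(self, pk):
        # Primary-key read: a direct lookup, no index involved.
        return self.rows.get(pk)

    def get_by_indexed_value(self, value):
        # Non-key read: the index yields matching keys, then each row is fetched.
        return [self.rows[pk] for pk in sorted(self.index.get(value, ()))]


users = ToyTable(indexed_column="country")
users.insert(1, {"user_id": 1, "country": "NZ"})
users.insert(2, {"user_id": 2, "country": "FR"})
users.insert(3, {"user_id": 3, "country": "NZ"})
print([r["user_id"] for r in users.get_by_indexed_value("NZ")])  # [1, 3]
```

    Without the index, finding every `NZ` user would mean examining every row; with it, the lookup starts from the matching keys.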

  2. Scenario for Materialized Views

    In which scenario would a materialized view be more appropriate than a secondary index?

    1. When you need to retrieve data using a different primary key structure
    2. When you need to compress the data for storage
    3. When you want to join data from two unrelated tables
    4. When you want to automatically encrypt sensitive columns

    Explanation: Materialized views in Cassandra are useful when you need an alternative primary key for efficient data retrieval. They don’t provide join functionality, data compression, or encryption capabilities directly. Each incorrect option describes a feature or behavior not supported by materialized views.
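
    A materialized view can be thought of as a second copy of the base rows, organized under a different primary key. In CQL such a view is declared with `CREATE MATERIALIZED VIEW`, and its primary key must contain every primary-key column of the base table plus at most one other column, which is why this toy sketch keys the view by `(email, user_id)`. The schema and names are hypothetical, and plain dictionaries stand in for Cassandra's storage.

```python
from collections import defaultdict

# Base table: primary key is user_id (hypothetical schema).
base = {
    101: {"user_id": 101, "email": "ana@example.com", "name": "Ana"},
    102: {"user_id": 102, "email": "bo@example.com", "name": "Bo"},
}


def build_view(base_rows):
    """Copy every base row into a view partitioned by email and clustered by user_id."""
    view = defaultdict(dict)  # email -> {user_id -> copied row}
    for user_id, row in base_rows.items():
        view[row["email"]][user_id] = dict(row)  # a physical copy, not a reference
    return view


users_by_email = build_view(base)
# Reading by email is now a direct partition lookup on the view.
print(list(users_by_email["bo@example.com"].values()))
```

    The design choice is the point: the view answers "find by email" with a key lookup, at the cost of storing the data twice.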

  3. Secondary Index Limitation

    What is a common limitation of secondary indexes in Cassandra, especially on large datasets with low cardinality values?

    1. They always improve write speeds
    2. They automatically shard data between multiple nodes
    3. They require all columns to be unique
    4. They may perform poorly because the index can become very large and inefficient

    Explanation: Secondary indexes can become inefficient and slow on large datasets with low cardinality because many rows can map to the same index entry, making lookups costly. They do not handle sharding or require unique columns. Rather than improving write speeds, secondary indexes may slightly reduce them due to the extra indexing overhead.
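
    The toy calculation below (synthetic data, hypothetical column names) counts how many rows sit behind a single index entry for a two-valued column versus a column with one value per row. As the number of distinct values shrinks, each entry must reference far more rows, and in a real cluster those rows may be spread across many partitions.

```python
from collections import Counter

N_ROWS = 100_000

# Synthetic rows: 'is_active' has 2 distinct values, 'session_id' has one per row.
rows = [{"pk": i, "is_active": i % 2 == 0, "session_id": f"s-{i}"} for i in range(N_ROWS)]


def rows_per_index_entry(rows, column):
    """Average number of rows referenced by one index entry for `column`."""
    distinct_values = Counter(row[column] for row in rows)
    return len(rows) / len(distinct_values)


print(rows_per_index_entry(rows, "is_active"))   # 50000.0
print(rows_per_index_entry(rows, "session_id"))  # 1.0
```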

  4. Materialized View Updates

    What happens to a materialized view when data in the base table is modified?

    1. The view is deleted and recreated each time
    2. The view must be manually rebuilt after every change
    3. The materialized view is automatically updated to reflect changes
    4. The view is unaffected by any changes

    Explanation: Materialized views are maintained automatically: a write to the base table triggers the corresponding update in the view, so no manual rebuild is needed. The other options are incorrect because the view is not deleted and recreated, does not require a manual rebuild after every change, and is not left unaffected. Note that view updates are eventually consistent with the base table, so a read from the view can briefly lag behind a base-table write.
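
    The toy model below sketches one way automatic view maintenance can work; the names are hypothetical and it is not Cassandra's write path. Each base-table write first reads the existing row, so that if the view's key column (`email` here) changed, the stale view row can be removed before the new one is written. Cassandra likewise performs a read-before-write on the base table when maintaining views, which is one reason views add write overhead.

```python
class ToyBaseWithView:
    """Base table keyed by user_id, plus a view keyed by (email, user_id)."""

    def __init__(self):
        self.base = {}   # user_id -> row
        self.view = {}   # (email, user_id) -> copied row

    def upsert(self, user_id, **columns):
        old = self.base.get(user_id)                      # read-before-write
        new = dict(old or {}, user_id=user_id, **columns)
        if old is not None:
            # Remove the old view row; its key may no longer match the new email.
            self.view.pop((old["email"], user_id), None)
        self.base[user_id] = new
        self.view[(new["email"], user_id)] = dict(new)

    def delete(self, user_id):
        old = self.base.pop(user_id, None)
        if old is not None:
            self.view.pop((old["email"], user_id), None)


t = ToyBaseWithView()
t.upsert(7, email="old@example.com", name="Cy")
t.upsert(7, email="new@example.com")   # changes the view's key column
print(sorted(t.view))                  # [('new@example.com', 7)]
```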

  5. Index Type for Non-Primary Key Query

    If you need to query rows by a non-primary key column that is frequently updated, which approach might be less optimal?

    1. Creating a secondary index on the frequently updated column
    2. Designing the table with the column as a clustering key from the start
    3. Retrieving data using the primary key only
    4. Using a materialized view with the column as part of its primary key

    Explanation: Secondary indexes on frequently updated columns can lead to performance issues, because every change to the column also requires maintaining the index. Designing the table around the query, or using a materialized view, is generally the preferred alternative, although changing a key column still means deleting and re-inserting rows, so these options carry write costs of their own. Retrieving data by primary key alone does not involve this column, so it does not address the scenario.
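
    A toy counter makes the maintenance cost concrete. In this sketch (hypothetical names; real Cassandra index maintenance differs in detail), every change to an indexed value removes the row's old index entry and adds a new one, so index work grows with how often the column changes, not only with how many rows are inserted.

```python
class CountingIndex:
    """Toy secondary index that counts index writes caused by base-table updates."""

    def __init__(self):
        self.values = {}        # primary key -> current indexed value
        self.entries = {}       # indexed value -> set of primary keys
        self.index_writes = 0   # index entries added or removed

    def write(self, pk, value):
        old = self.values.get(pk)
        if old == value:
            return                          # unchanged value: no index work in this model
        if old is not None:
            self.entries[old].discard(pk)
            self.index_writes += 1          # remove the stale entry
        self.entries.setdefault(value, set()).add(pk)
        self.index_writes += 1              # add the new entry
        self.values[pk] = value


idx = CountingIndex()
for step in range(1_000):
    idx.write(pk=42, value=f"status-{step % 3}")   # one row whose value keeps changing
print(idx.index_writes)
```

    A single row updated 1,000 times produces close to two index writes per update here, which is the overhead the explanation warns about.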

  6. Materialized Views vs. Secondary Indexes

    Which statement accurately distinguishes materialized views from secondary indexes in Cassandra?

    1. Secondary indexes are faster for all types of queries than materialized views
    2. Materialized views copy and store data using a new primary key, while secondary indexes store only pointers to existing data
    3. Secondary indexes support joins between tables
    4. Materialized views cannot be updated after creation

    Explanation: Materialized views physically store data based on a new primary key, leading to duplication, while secondary indexes store references to data and don’t duplicate the actual content. The claim of secondary indexes always being faster is untrue; their performance depends on query patterns. Materialized views are updated automatically after creation. Secondary indexes do not support joining tables.

  7. Dropping Indexes

    What is the effect of dropping a secondary index in Cassandra?

    1. All data in the table is deleted
    2. It immediately improves all read performance
    3. Queries on the indexed column using non-primary key values will no longer work
    4. The table schema must be recreated

    Explanation: Once a secondary index is dropped, queries that filter on that column without restricting the primary key are rejected by default; they can only run with ALLOW FILTERING, which scans data. Dropping an index does not delete table data, does not require recreating the table, and does not automatically improve read performance; its main effect is on which queries that column can support.

  8. Index Cardinality Impact

    Why should secondary indexes generally be avoided on columns with very few distinct values (low cardinality)?

    1. Because many rows will map to each index entry, causing slow lookups
    2. Because writes to the table will be blocked
    3. Because such indexes compress poorly
    4. Because the index becomes too small to be useful

    Explanation: In a low-cardinality column, each index entry points to a large number of rows, so a query on one value must gather many rows and becomes inefficient. The other options do not apply: such an index does not block writes, compression is not the concern, and the problem is not that the index is too small. The main cost is the performance penalty during lookups.

  9. Materialized Views Storage

    What is a key storage consideration when using materialized views in Cassandra?

    1. Materialized views only store metadata and no actual data
    2. Disk usage is unaffected by the number of materialized views
    3. Materialized views store a copy of the selected data, resulting in increased disk usage
    4. Materialized views reduce storage by storing compressed pointers

    Explanation: Materialized views duplicate the underlying data under a new primary key, which increases disk consumption. They are not just metadata; the selected data is physically stored again. They do not store compressed pointers or reduce storage, and each additional materialized view increases disk usage further.
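
    A back-of-the-envelope model shows how quickly this adds up. The sketch counts stored column values rather than bytes and ignores compression, replication, and per-row overhead; its parameters and the assumption that an index entry holds one value plus one key reference are simplifications for illustration only.

```python
def stored_values(n_rows, n_columns, n_views, view_columns, n_indexes):
    """Count stored column values (not bytes) for a base table plus views and indexes.

    Toy assumptions: every view selects `view_columns` columns per row, and every
    index entry stores one indexed value plus one primary-key reference.
    """
    base = n_rows * n_columns
    views = n_views * n_rows * view_columns
    indexes = n_indexes * n_rows * 2
    return {"base": base, "views": views, "indexes": indexes}


print(stored_values(n_rows=1_000_000, n_columns=10, n_views=2, view_columns=10, n_indexes=1))
```

    Under these assumptions, two full-width views store twice as many values as the base table itself, while one index adds only a fraction of that.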

  10. Query Restriction Without Secondary Index

    In Cassandra, what is typically required when querying a non-primary key column that does not have a secondary index or materialized view?

    1. You can directly filter the column at high speed
    2. You must scan the entire table to find matching rows
    3. The column is automatically indexed by default
    4. You can use joins to connect tables and filter the column

    Explanation: Without an index or materialized view, Cassandra rejects such a query by default. It runs only if you add ALLOW FILTERING, in which case Cassandra scans the data to find matching rows, which is inefficient for large tables. Direct high-speed filtering and joins are not supported, and columns are not indexed automatically unless an index is created explicitly.
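
    Cassandra refuses a filter on an unindexed, non-key column unless the query includes ALLOW FILTERING. The toy function below mimics that rule with a hypothetical `allow_filtering` flag; it is not a driver API. A primary-key lookup touches one row, while an opted-in filter examines every row.

```python
class InvalidQuery(Exception):
    """Raised when a filter would require a scan and the caller has not opted in."""


def select_where(table, column, value, allow_filtering=False):
    """Toy query over `table` (primary key -> row dict).

    Returns (matching rows, number of rows examined).
    """
    if column == "pk":
        row = table.get(value)
        return ([row] if row is not None else []), 1    # direct key lookup
    if not allow_filtering:
        raise InvalidQuery(f"filtering on '{column}' needs an index, a view, or allow_filtering")
    examined = 0
    matches = []
    for row in table.values():                          # full scan: every row is read
        examined += 1
        if row.get(column) == value:
            matches.append(row)
    return matches, examined


table = {i: {"pk": i, "city": "Oslo" if i % 10 == 0 else "Rome"} for i in range(1_000)}
try:
    select_where(table, "city", "Oslo")
except InvalidQuery as err:
    print("rejected:", err)
matches, examined = select_where(table, "city", "Oslo", allow_filtering=True)
print(len(matches), examined)   # 100 1000
```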