AWS Glue, Athena, and Redshift Fundamentals Quiz

Explore essential AWS data and analytics services with this quiz on Glue, Athena, and Redshift. Challenge your understanding of cloud-based data integration, querying, and warehousing concepts, ideal for foundational learning and exam preparation.

AWS Glue's Main Functionality
Which capability best describes the primary use of AWS Glue in a data workflow?
1. Data cataloging and ETL
2. Managing NoSQL databases
3. Data warehousing for structured data
4. Running SQL queries on static files
Explanation: The main functionality of AWS Glue is to catalog data and perform ETL (extract, transform, load) tasks. This helps in organizing, cleaning, and preparing data for analytics. Running SQL queries on static files is a core feature of Athena, not Glue. Data warehousing for structured data is better associated with Redshift. Managing NoSQL databases is unrelated to Glue's primary purpose.
Querying Data in S3
If you need to run SQL queries on data stored in S3 buckets without loading it into a database, which tool should you use?
1. Glue
2. Redshift
3. Elasticsearch
4. Athena
Explanation: Athena lets users query data in S3 directly with SQL, making it ideal for quickly analyzing raw or semi-structured files. Glue is designed for ETL and cataloging. Redshift is a data warehouse that typically requires data to be loaded first. Elasticsearch, while used for searching log data, is not used for running SQL queries on S3 files.
Purpose of Redshift
Redshift is primarily suited for which of the following tasks in a cloud environment?
1. Real-time messaging
2. Scheduling ETL jobs
3. Massively parallel data warehousing
4. Indexing JSON documents
Explanation: Redshift is designed for massively parallel processing, making it excellent for large-scale data warehousing and analytics. It is not intended for indexing JSON documents, which is handled by different tools. Scheduling ETL jobs is primarily a function of Glue. Real-time messaging is outside the core scope of Redshift.
Glue Crawlers
What is the role of a crawler in Glue when connected to a new data source?
1. Crawlers delete duplicate files
2. Crawlers discover schema and catalog table metadata
3. Crawlers encrypt data in transit
4. Crawlers optimize SQL query performance
Explanation: Crawlers in Glue scan data sources to automatically discover schemas and create metadata tables in the data catalog. They do not delete duplicate files or optimize SQL query performance directly. Encrypting data in transit is usually managed by security configurations, not by crawlers themselves.
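To make schema discovery concrete, the following is a minimal Python sketch that samples a few records and infers one type per column. It only illustrates the idea and is not how Glue crawlers are implemented; the function names `infer_type` and `infer_schema`, the type labels, and the sample records are all hypothetical.

```python
def infer_type(value: str) -> str:
    """Guess a simple type label for one raw string value."""
    try:
        int(value)
        return "bigint"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        return "string"


def infer_schema(records: list[dict[str, str]]) -> dict[str, str]:
    """Infer one type per column, widening when samples disagree."""
    widening_order = ["bigint", "double", "string"]
    schema: dict[str, str] = {}
    for record in records:
        for column, value in record.items():
            guessed = infer_type(value)
            current = schema.get(column, guessed)
            # Keep whichever type is wider, so "5" and "19.99" become double.
            schema[column] = max(current, guessed, key=widening_order.index)
    return schema


# Hypothetical sample records, as raw strings read from a file.
sample = [
    {"order_id": "1001", "amount": "19.99", "region": "eu-west-1"},
    {"order_id": "1002", "amount": "5", "region": "us-east-1"},
]
print(infer_schema(sample))
```

The resulting column-to-type mapping is the kind of metadata that would be stored as a table definition in a catalog.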
Schema-on-Read Concept
Which AWS analytics service allows you to apply schema-on-read when analyzing S3 data?
1. Glue
2. DynamoDB
3. Redshift
4. Athena
Explanation: Athena applies the schema to the data only when you read or query it, known as schema-on-read. Glue assists in cataloging schemas but does not perform the querying itself. Redshift typically uses schema-on-write by loading fully structured data. DynamoDB is a NoSQL database and not an analytics tool.
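Schema-on-read can be sketched in plain Python: the stored data stays as untyped text, and column names and types are applied only when the data is read. This is an illustrative sketch rather than Athena's actual mechanism; `RAW_CSV`, `SCHEMA`, and `read_with_schema` are hypothetical names.

```python
import csv
import io

# Raw file contents as they might sit in storage: untyped text only.
RAW_CSV = "1001,19.99,eu-west-1\n1002,5.00,us-east-1\n"

# The schema lives apart from the data and is applied only at read time.
SCHEMA = [("order_id", int), ("amount", float), ("region", str)]


def read_with_schema(raw: str, schema):
    """Parse raw text into typed rows while reading."""
    for fields in csv.reader(io.StringIO(raw)):
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


rows = list(read_with_schema(RAW_CSV, SCHEMA))
total = sum(row["amount"] for row in rows)
print(rows[0])
print(round(total, 2))
```

Changing `SCHEMA` changes how the same stored bytes are interpreted, without rewriting the data. With schema-on-write, by contrast, the types are fixed when the data is loaded.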
Redshift Columnar Storage
Why does Redshift use columnar storage for its data tables?
1. To limit compatibility to certain file types
2. To support only unstructured data
3. To increase storage costs
4. To speed up analytical queries on large datasets
Explanation: Columnar storage allows Redshift to read only the relevant columns needed for a query, leading to faster processing of analytical queries on large datasets. This approach does not increase storage costs; it often reduces them. Redshift is designed for structured data, not unstructured data. Its compatibility is not limited to particular file types because of this storage method.
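The benefit of reading only the relevant columns can be shown with a small Python sketch that stores the same table in a row layout and a column layout, then counts how many values each layout touches to sum one column. The table contents and function names are hypothetical, and real columnar engines add compression and block-level storage that this sketch ignores.

```python
# The same three-row table in a row-oriented layout.
ROWS = [
    {"order_id": 1, "region": "eu", "amount": 10.0},
    {"order_id": 2, "region": "us", "amount": 20.0},
    {"order_id": 3, "region": "eu", "amount": 30.0},
]

# Column-oriented layout: one contiguous list per column.
COLUMNS = {name: [row[name] for row in ROWS] for name in ROWS[0]}


def sum_amount_row_store(rows):
    """Sum one column, counting every field of every row as read."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # the whole row is fetched
        total += row["amount"]
    return total, values_read


def sum_amount_column_store(columns):
    """Sum one column by reading only that column."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


print(sum_amount_row_store(ROWS))        # total and values read, row layout
print(sum_amount_column_store(COLUMNS))  # total and values read, column layout
```

Both layouts return the same total, but the column layout reads a third as many values here. The gap widens as tables gain more columns that a given query does not need.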
Glue Job Scheduling
You need to automate a daily ETL process. Which Glue feature handles this scheduling?
1. Lambda policies
2. Athena workgroups
3. Redshift clusters
4. Glue triggers
Explanation: Glue triggers can be set up to initiate ETL jobs on a schedule or based on events, automating regular data workflows. Athena workgroups are used for managing query execution. Redshift clusters refer to compute resources for warehousing, not scheduling. Lambda policies are related to permissions for serverless compute functions.
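As a rough illustration, a daily scheduled trigger could be described with a request like the one built below. This is a sketch under stated assumptions: the trigger name, job name, and helper function `daily_trigger_request` are hypothetical, and the dictionary keys follow the parameters of Glue's `CreateTrigger` API as exposed by boto3's `create_trigger`. The code only builds the request, so it runs without AWS credentials.

```python
def daily_trigger_request(trigger_name: str, job_name: str, hour_utc: int) -> dict:
    """Build keyword arguments for a scheduled Glue trigger (hypothetical helper)."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        # Format: cron(minutes hours day-of-month month day-of-week year), in UTC.
        "Schedule": f"cron(0 {hour_utc} * * ? *)",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


# Hypothetical names; replace them with a real trigger and job name.
request = daily_trigger_request("nightly-orders-trigger", "orders-etl", hour_utc=2)
print(request["Schedule"])

# With boto3 installed and AWS credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("glue").create_trigger(**request)
```

Keeping the request construction separate from the API call makes the schedule easy to check before anything is created in an account.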
Data Cataloging with Glue
If you have diverse data sources and want a unified metadata repository, which service's catalog should you use?
1. Glue Data Catalog
2. Redshift Engine
3. Workgroup Metadata
4. Athena Catalog
Explanation: The Glue Data Catalog provides a unified metadata repository that multiple analytics services can access for consistent data discovery and management. Athena uses the Glue Data Catalog but does not create one of its own. Redshift Engine relates to data warehousing, not cataloging. Workgroup Metadata is not a standard service feature in this context.
Athena Pricing Model
How is pricing typically calculated when using Athena to query data in S3?
1. Based on the amount of data scanned per query
2. By the number of tables created
3. Using monthly subscription fees
4. By schema complexity
Explanation: Athena charges according to the volume of data scanned by each query, which makes query optimization important. Neither the number of tables nor schema complexity directly affects cost. No monthly subscription is required to use Athena; costs are pay-per-query.
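The arithmetic behind scan-based pricing can be sketched as follows. The rate of 5 USD per TB scanned, the 10 MB minimum per query, and the binary definition of a terabyte are assumptions used for illustration; actual pricing varies by region and may change, so check the current pricing page before relying on these numbers.

```python
PRICE_PER_TB_USD = 5.00              # assumed rate; verify for your region
MIN_BYTES_PER_QUERY = 10 * 1024**2   # assumed 10 MB minimum charge per query
BYTES_PER_TB = 1024**4               # assumed binary terabyte


def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost in USD of one query from the bytes it scans."""
    billable_bytes = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable_bytes / BYTES_PER_TB * PRICE_PER_TB_USD


full_scan = estimate_query_cost(200 * 1024**3)   # query scanning 200 GB
reduced_scan = estimate_query_cost(20 * 1024**3) # same query scanning 20 GB

print(f"full scan:    ${full_scan:.4f}")
print(f"reduced scan: ${reduced_scan:.4f}")
```

Under these assumptions, cost scales linearly with bytes scanned, so a query that reads a tenth of the data costs a tenth as much.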
Redshift Spectrum Feature
Which Redshift feature enables querying data stored directly in S3, extending the warehouse's capacity?
1. Redshift Copy
2. Athena Connector
3. Redshift Spectrum
4. Glue Jobs
Explanation: Redshift Spectrum allows users to run SQL queries on both data in Redshift and data stored directly in S3, increasing flexibility. Redshift Copy is used for loading data into Redshift, not querying S3. Glue Jobs are for ETL operations, not direct querying by Redshift. Athena Connector is not a feature of Redshift.
