Test your foundational knowledge of data engineering with this beginner-friendly quiz covering key Python, AWS, and SQL topics. Ideal for interview preparation and review of essential concepts like inheritance, decorators, AWS Glue, SQL GROUP BY, database schemas, and more.
Which keyword is used in Python to create a child class that inherits from a parent class?
Explanation: In Python, the 'class' keyword is used to define both parent and child classes, supporting inheritance of attributes and methods. 'super' is used inside methods to call the parent class, not to define the child class. 'def' is used for defining functions, and 'import' is for importing modules.
What is a decorator in Python used for?
Explanation: Decorators in Python are used to modify or enhance the behavior of functions or methods without changing their code. They are not related to encrypting data, sorting, or class inheritance. The other options describe entirely different features or functionalities within Python.
When looping over a list in Python and you want both the index and the item, which built-in function should you use?
Explanation: The 'enumerate' function allows you to loop over a sequence while keeping track of both the index and the item. 'zip' combines two or more iterables into tuples, 'sum' adds items together, and 'counter' is not a built-in function for this purpose.
Which Python module is commonly used to write unit tests for your code?
Explanation: 'unittest' is the standard module included with Python for writing unit tests. While 'pytest' is also used for testing, it is third-party and not the standard library option. 'testit' and 'pycheck' are not standard Python modules for unit testing.
What is the main purpose of AWS Glue in data engineering tasks?
Explanation: AWS Glue is designed for ETL (Extract, Transform, Load) tasks like preparing, transforming, and cataloging data. Managing servers and running containers are tasks for other AWS services, while storing backup files is not the main focus of Glue.
How can you provide additional Python packages to an AWS Glue job?
Explanation: AWS Glue jobs require you to upload .whl or .egg files to S3 and reference them using the --extra-py-files parameter. Installing with pip inside the job script is not supported, and simply placing the packages in a directory or schema will not work.
In SQL, which clause is used to group rows that have the same values in specified columns, often for aggregate functions?
Explanation: The GROUP BY clause is used to organize rows into groups with the same values, enabling aggregate calculations. ORDER BY sorts results, WHERE filters rows before grouping, and HAVING filters groups after they're created.
If you have a list ['grape', 'apple', 'banana'] and sort it using Python's sort() method, what will be the first item in the sorted list?
Explanation: Sorting the list lexicographically (like in the dictionary) puts 'apple' first. 'banana' and 'grape' come later. 'bananaapple' is not in the original list, so it's incorrect.
Which statement best describes a database schema?
Explanation: A database schema defines the structure of the entire database, including tables and columns. It does not refer to individual rows, query collections, or SQL functions.
In Python, which type of object copying duplicates nested objects completely rather than just referencing them?
Explanation: Deep copy creates a new object and recursively copies all nested elements, avoiding shared references. Shallow copy only duplicates the outer object. Direct assignment points to the same object, and 'lambda copy' is not a copying term.