Challenge your understanding of InfluxDB schema design with questions covering measurement practices, tag and field selection, timestamp handling, and best practices for time series database structures. Strengthen your skills in organizing time series data for efficient queries and optimal performance.
Which approach is most recommended when naming measurements for a temperature sensor network storing city temperature values?
Explanation: Using a generic measurement name such as 'temperature' is recommended because it groups similar data together, which keeps queries simple and the schema compact. Putting the city name in the measurement name or creating a measurement for each device can lead to schema explosion, making management and querying inefficient. Spaces and special characters in a measurement name reduce readability and must be escaped or quoted, which invites query errors. Keeping measurement names simple and consistent improves maintainability.
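As a rough illustration of the naming advice above, the sketch below formats a point in InfluxDB line protocol with one generic measurement and the city carried as a tag. The helper `to_line` and the tag and field names are hypothetical, and escaping of special characters is deliberately left out.

```python
def to_line(measurement, tags, fields, timestamp_ns=None):
    """Format one point as line protocol (simplified: no escaping)."""
    # Tags follow the measurement name, separated by commas.
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    # Fields come after a single space.
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    line = f"{measurement}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


# One generic measurement; the city is a tag, not part of the name.
good = to_line("temperature", {"city": "paris"}, {"value": 23.5})
stamped = to_line("temperature", {"city": "oslo"}, {"value": 21.7},
                  1700000000000000000)
print(good)
print(stamped)
```

With this layout, adding a new city adds a tag value rather than a new measurement.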
When collecting power usage data from multiple buildings, which property should you store as a tag rather than a field?
Explanation: Building ID is a property that is useful for filtering and grouping data in queries, so it should be stored as a tag. Power consumption is a fluctuating measured value and should be a field; the timestamp is a built-in part of every point, not a tag or a field. Record status may change too often to serve as an effective tag and is usually better as a field for analysis. Tags are indexed, making them ideal for metadata that helps in searching.
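The split described above can be sketched as a small helper that separates a raw record into tags and fields. The record keys (`building_id`, `power_kw`, `status`) and the `TAG_KEYS` set are assumptions made up for this example.

```python
# Assumption: only building_id is metadata used for filtering and grouping.
TAG_KEYS = {"building_id"}


def split_record(record):
    """Return (tags, fields) for one raw record."""
    # Tag values are always strings in InfluxDB.
    tags = {k: str(v) for k, v in record.items() if k in TAG_KEYS}
    # Everything else is a measured or frequently changing value.
    fields = {k: v for k, v in record.items() if k not in TAG_KEYS}
    return tags, fields


tags, fields = split_record(
    {"building_id": "B12", "power_kw": 41.7, "status": "ok"}
)
print(tags, fields)
```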
Why is it considered a best practice to avoid high cardinality in tag values?
Explanation: Tags with high cardinality (many unique values) can result in excessive memory usage and decreased insert performance, because every unique combination of tag values defines a separate series that must be stored and indexed. Low cardinality tags do not inherently prevent data loss. Contrary to one distractor, high cardinality actually harms query and storage performance; it does not improve them or lower storage usage.
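To make the cost of cardinality concrete, here is a back-of-the-envelope sketch: the upper bound on the number of series in a measurement is the product of the number of distinct values of each tag. The tag names and counts are invented for illustration.

```python
from math import prod


def max_series(tag_value_counts):
    """Upper bound on series count: product of distinct values per tag."""
    return prod(tag_value_counts.values())


# Bounded metadata only.
low = max_series({"region": 5, "building_id": 200})
# Adding one unbounded tag multiplies the bound by its value count.
high = max_series({"region": 5, "building_id": 200, "request_id": 1_000_000})
print(low, high)
```

The real series count is usually below this bound, since not every combination occurs, but the multiplication shows why a single unbounded tag dominates.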
If you omit the timestamp when inserting data, what will most time series databases typically do?
Explanation: Time series databases typically use the current system time when a timestamp is not provided, ensuring each data point has a temporal context; InfluxDB, for example, uses the system time of its host machine in UTC. Rejecting data or using zero as the timestamp is not default behavior. Duplicating the previous timestamp would compromise data integrity and sequence. Automatic assignment stamps each point with its arrival time, so points stay in the order they were received.
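The fallback behavior can be mimicked on the client side. The sketch below is not database code; `with_timestamp` and the `time` key are hypothetical names, and the clock is injectable so the behavior can be checked deterministically.

```python
import time


def with_timestamp(point, now_ns=time.time_ns):
    """Return the point with a nanosecond timestamp, filling in 'now' if absent."""
    if point.get("time") is None:
        # No timestamp supplied: fall back to the current clock reading.
        return {**point, "time": now_ns()}
    return point


explicit = with_timestamp({"value": 1, "time": 7})
defaulted = with_timestamp({"value": 1}, now_ns=lambda: 42)
print(explicit, defaulted)
```

Supplying an explicit timestamp at the source is generally safer when network delay or batching would otherwise shift the recorded time.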
Which is the most suitable field name for storing humidity percentage in a weather data measurement?
Explanation: The field name 'humidity_percent' is clear, descriptive, and follows naming conventions, making data management easy. 'Humid' is too vague, 'humid!%20' includes special characters (which can cause parsing issues), and 'humiditylevelreading' is unnecessarily long and unclear. Clear and concise field names help maintain schema clarity.
In a scenario collecting server CPU usage metrics, which value is best stored as a tag for efficient querying?
Explanation: Server region is typical metadata used for filtering and grouping, so storing it as a tag is efficient. The timestamp is intrinsic to every time series point and is handled automatically. CPU usage percentage and disk read speed change rapidly and are therefore best stored as fields, not tags, to avoid high cardinality and maintain performance.
What is 'schema explosion' and how can you prevent it when designing InfluxDB schemas?
Explanation: Schema explosion refers to performance and storage problems that arise from having too many unique measurements or tag values. Limiting unique tags and measurements prevents it. While duplicating field names or having inconsistent timestamps can cause other issues, they do not result in schema explosion. Missing data is handled separately by validation.
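One way to spot the problem early is to count distinct series in a sample batch before writing it. In this sketch a series is identified by its measurement plus its sorted tag set; the measurement `cpu` and the tags `host` and `request_id` are made-up examples.

```python
def series_key(measurement, tags):
    """Identify a series by measurement plus its sorted tag set."""
    return measurement + "".join(f",{k}={v}" for k, v in sorted(tags.items()))


def count_series(points):
    """Count distinct series among (measurement, tags) pairs."""
    return len({series_key(m, t) for m, t in points})


# Three hosts, reused across all points: the series count stays at three.
bounded = [("cpu", {"host": f"h{i % 3}"}) for i in range(100)]
# A per-point unique tag creates one series per point.
unbounded = [("cpu", {"host": f"h{i % 3}", "request_id": str(i)})
             for i in range(100)]
print(count_series(bounded), count_series(unbounded))
```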
Which field data type is most suitable for storing temperature values such as 23.5, 21.7, and 22.9?
Explanation: A float data type is recommended for storing temperature values with decimals, providing accuracy for measurements like 23.5. Boolean is only for true or false values and is not appropriate for numeric data. String is used for textual information, not numerical measurements. Integer cannot represent fractional temperatures, leading to data loss.
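In line protocol the field type is carried by the literal's syntax: a bare decimal is a float, an integer takes an `i` suffix, strings are double-quoted, and booleans are written as `true` or `false`. The helper below is a hypothetical formatter sketching that mapping from Python values.

```python
def format_field_value(value):
    """Render a Python value as a line protocol field literal."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


print(format_field_value(23.5))       # float literal
print(format_field_value(int(23.5)))  # integer literal; the fraction is gone
```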
Why should you assign a retention policy to time series data in production environments?
Explanation: Retention policies specify how long data is kept, helping to control storage consumption and ensuring old, irrelevant data is removed. They do not enforce uniqueness of fields or tags. While retention policies are important, they are not used for authentication or security measures in database management.
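The effect of a retention period can be modeled with a simple filter over in-memory points. This is only a conceptual sketch: the database enforces retention internally, and `apply_retention` and the point layout are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def apply_retention(points, keep_for, now):
    """Keep only points whose timestamp falls within the retention window."""
    cutoff = now - keep_for
    return [p for p in points if p["time"] >= cutoff]


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
points = [
    {"time": datetime(2024, 1, 1, tzinfo=timezone.utc), "value": 21.7},
    {"time": datetime(2024, 1, 30, tzinfo=timezone.utc), "value": 23.5},
]
# With a 7-day window, only the most recent point survives.
kept = apply_retention(points, timedelta(days=7), now)
print(kept)
```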
What schema design principle increases the efficiency of queries filtering by sensor location?
Explanation: Storing location as a tag allows the database to index it, enabling fast and efficient queries filtering by location. Fields are not indexed, so filtering on a location field requires scanning the stored values. Embedding location in the measurement name scatters the data across many measurements and forces pattern matching on names to query it. Relying on external documentation provides no help for querying. Using tags correctly enhances query performance.
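Why an indexed tag helps can be pictured with a toy inverted index that maps each tag value to the positions of matching points. This is a conceptual model only, not how InfluxDB implements its index, and all names here are hypothetical.

```python
from collections import defaultdict


def build_tag_index(points, tag_key):
    """Map each value of tag_key to the positions of points carrying it."""
    index = defaultdict(list)
    for i, p in enumerate(points):
        index[p["tags"][tag_key]].append(i)
    return index


points = [
    {"tags": {"location": "roof"}, "fields": {"temp": 21.7}},
    {"tags": {"location": "lobby"}, "fields": {"temp": 23.5}},
    {"tags": {"location": "roof"}, "fields": {"temp": 22.9}},
]
index = build_tag_index(points, "location")
# A lookup by location touches only matching points, with no full scan.
roof_temps = [points[i]["fields"]["temp"] for i in index["roof"]]
print(roof_temps)
```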