Explore foundational concepts and techniques for time series prediction in machine learning. This quiz covers key methods, features, error metrics, and challenges involved in forecasting time-dependent data, helping learners solidify their understanding of the core principles.
Which of the following datasets is considered time series data?
Explanation: Hourly temperature measurements are taken at consistent intervals, and their order matters, making them a classic example of time series data. A population census and sales figures by region do not inherently involve a time sequence, so their ordering carries no meaning. A list of countries sorted alphabetically is not time-based at all. Time series data requires temporal ordering, which only the correct option includes.
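For instance, hourly temperature readings indexed by timestamp form a time series; the minimal pandas sketch below is illustrative (the dates, values, and variable names are made up for this example).

import pandas as pd

# Hourly timestamps with one temperature reading each: both the spacing and the order matter.
timestamps = pd.date_range("2024-01-01", periods=6, freq="h")
temps = pd.Series([2.1, 1.8, 1.5, 1.9, 2.4, 3.0], index=timestamps, name="temperature_c")
print(temps)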
What is predicted in a simple autoregressive (AR) time series model at time t?
Explanation: Autoregressive models predict the present value using a linear combination of its previous values, capturing how past observations influence the current one. Predicting a random value has no logical basis. The average or maximum of all previous values misses the sequential dependency central to AR models. Thus, only the first option accurately describes AR.
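As a sketch of the idea, an AR(2) model estimates coefficients on the two previous values and uses them to predict the current one. The toy series, lag order, and ordinary-least-squares fit below are illustrative, not a production implementation.

import numpy as np

# Toy series with a clear dependence on its own past (illustrative values).
y = np.array([1.0, 1.2, 1.5, 1.9, 2.2, 2.6, 3.1, 3.5, 4.0, 4.4])

p = 2  # lag order: condition on the two previous values
# Design matrix [1, y_{t-1}, y_{t-2}] for each target y_t.
X = np.column_stack([np.ones(len(y) - p), y[p - 1:-1], y[:-p]])
targets = y[p:]

# Ordinary least squares estimate of the intercept and AR coefficients.
coef, *_ = np.linalg.lstsq(X, targets, rcond=None)

# One-step-ahead forecast: a linear combination of the last two observed values.
next_value = coef[0] + coef[1] * y[-1] + coef[2] * y[-2]
print(next_value)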
In time series forecasting, what does a moving average model typically compute?
Explanation: A moving average model predicts by averaging a set number of recent values, smoothing out random short-term fluctuations. The cumulative sum adds up all values but doesn't average them. The difference gives changes, not a forecasted value. The product is unrelated to usual statistical forecasting methods. Only the mean of recent values is correct.
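A minimal sketch of this idea: forecast the next value as the mean of the last few observations. The window size and sales figures below are illustrative.

import pandas as pd

sales = pd.Series([20, 22, 21, 25, 27, 26, 30])

window = 3  # number of most recent values to average
# Forecast for the next period: the mean of the last `window` observations, i.e. mean of [27, 26, 30].
forecast = sales.tail(window).mean()
print(forecast)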
Which derived feature is commonly added to time series data for daily sales prediction?
Explanation: The day of the week can capture weekly seasonality or patterns in sales data, providing helpful context for prediction. Longitude is not a temporal feature. The event start year is static and irrelevant for short-term sales trends. File size is unrelated to temporal characteristics in sales data.
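A small sketch of deriving that feature with pandas; the dates, sales values, and column names are illustrative.

import pandas as pd

# Daily sales indexed by date (illustrative values).
dates = pd.date_range("2024-03-01", periods=7, freq="D")
df = pd.DataFrame({"sales": [120, 135, 150, 160, 210, 240, 130]}, index=dates)

# Derive the day of week (0 = Monday, 6 = Sunday) as an extra predictor for weekly patterns.
df["day_of_week"] = df.index.dayofweek
print(df)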
If a time series shows similar upward trends every December, what is this repeated pattern called?
Explanation: Seasonality describes periodic patterns that repeat at regular intervals, such as yearly increases every December. Heteroscedasticity refers to changing variance. Random walks have no predictable pattern. Interpolation involves estimating intermediate values. Only seasonality correctly defines the repeated pattern.
Why is it important for some time series models to work with stationary data?
Explanation: Many time series models assume stable statistical properties; non-stationarity can distort results and make predictions unreliable. While making data stationary does not eliminate noise, it makes underlying patterns more predictable. Increasing randomness is not desired. Non-stationary data is, in fact, harder—not easier—to model.
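One common way to move toward stationarity, not the only one, is differencing: modelling the change from one step to the next rather than the raw level. The trending values below are illustrative.

import pandas as pd

# A trending series: its mean keeps rising, so the raw values are not stationary.
values = pd.Series([100, 104, 107, 111, 114, 118, 121])

# First difference: step-to-step change, which removes the steady upward drift.
diffed = values.diff().dropna()
print(diffed)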
What is the role of a lag feature in time series analysis?
Explanation: Lag features help models use past values to predict future ones, capturing temporal dependencies. They do not measure prediction speed or count missing values, and they do not smooth fluctuations unless combined with other techniques such as moving averages.
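A minimal sketch of creating a lag feature with a pandas shift; the column names and values are illustrative.

import pandas as pd

df = pd.DataFrame({"demand": [50, 52, 55, 53, 58, 60]})

# lag_1 holds the previous period's demand, so a model can condition on the last observed value.
df["lag_1"] = df["demand"].shift(1)
print(df)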
Which approach is most appropriate for splitting time series data into training and testing sets?
Explanation: Time series prediction mimics real-world scenarios where past data predicts the future, so holding out the latest data for testing is the logical approach. Random shuffling breaks the temporal sequence; sorting by value ignores time. Using only the last point as the test set can be insufficient for evaluation.
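A sketch of a chronological split that holds out the most recent 20% for testing; the series and the split fraction are illustrative.

import pandas as pd

series = pd.Series(range(100))  # stand-in for an ordered time series

split = int(len(series) * 0.8)
train = series.iloc[:split]   # earlier observations used for fitting
test = series.iloc[split:]    # most recent observations, kept unseen during training
print(len(train), len(test))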
Which error metric is suitable for evaluating time series regression predictions involving temperature forecasts?
Explanation: MAE measures the average magnitude of errors in regression predictions, making it appropriate for continuous variables like temperature. Accuracy, confusion matrix, and precision are primarily used for classification tasks, not regression.
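A quick sketch of computing MAE for temperature forecasts; the actual and predicted values are illustrative.

import numpy as np

actual = np.array([21.0, 22.5, 19.8, 20.4])
predicted = np.array([20.5, 23.0, 20.0, 21.0])

# Mean absolute error: average size of the forecast errors, in the same units as the data (degrees).
mae = np.mean(np.abs(actual - predicted))
print(mae)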
If a time series has occasional missing values, what is a common, simple method to fill them?
Explanation: Forward filling uses the most recent available value to fill in gaps, preserving trends and sequence. Replacing with the maximum or a random number can create distortions. Dropping all rows with missing data may remove too much information, especially if only a few values are missing.
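A sketch of forward filling gaps with pandas; the readings below are illustrative.

import numpy as np
import pandas as pd

readings = pd.Series([10.0, np.nan, np.nan, 12.0, 13.0])

# Forward fill: each gap is replaced by the most recent observed value (10.0 here).
filled = readings.ffill()
print(filled)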
In time series forecasting, what does the term 'forecast horizon' refer to?
Explanation: The forecast horizon is how far ahead the predictions extend, such as predicting the next 5 days. Trend line height, error variance, and window average do not describe how far ahead the forecast reaches.
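For example, with daily data and a 5-day horizon the model produces one prediction for each of the next five days. The sketch below uses a naive persistence forecast purely to illustrate the horizon; the dates and values are made up.

import pandas as pd

history = pd.Series(
    [100, 102, 101, 105, 107],
    index=pd.date_range("2024-06-01", periods=5, freq="D"),
)

horizon = 5  # forecast 5 days beyond the last observation
future_index = pd.date_range(history.index[-1] + pd.Timedelta(days=1), periods=horizon, freq="D")
# Naive persistence forecast: repeat the last known value across the whole horizon.
forecast = pd.Series(history.iloc[-1], index=future_index)
print(forecast)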
How does time series forecasting fundamentally differ from general regression problems?
Explanation: Time series forecasting leverages the time order and dependencies, essential for predicting future values, while standard regression does not usually consider data order. Forecasting is not restricted to small datasets, does not ignore trends/seasonality, and is often applied to continuous data.
What is the main goal of applying smoothing techniques like exponential smoothing to time series data?
Explanation: Smoothing helps highlight underlying patterns by minimizing random, short-term variations. Increasing variance would make analysis harder. Shuffling breaks the time structure, and artificially adding seasonality usually is not a goal.
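A brief sketch of exponential smoothing via the pandas exponentially weighted mean; the smoothing factor and noisy values are illustrative.

import pandas as pd

noisy = pd.Series([20, 25, 18, 30, 22, 28, 21])

# Exponentially weighted mean: recent values get more weight, and short-term noise is damped.
smoothed = noisy.ewm(alpha=0.3, adjust=False).mean()
print(smoothed)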
Which process refers to calculating statistics (like mean) over a sliding subset within the time series?
Explanation: A rolling window computes statistics for a moving subset, such as a 7-day moving average. Bootstrapping involves sampling with replacement. Feature selection is choosing predictors. Clustering groups data but does not focus on subsequence calculations.
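A sketch of a 7-day rolling window computing the mean over each sliding subset; the stand-in daily values are illustrative.

import pandas as pd

daily = pd.Series(range(1, 15))  # two weeks of stand-in daily values

# Each entry is the mean of the current value and the six before it (NaN until 7 values are available).
rolling_mean = daily.rolling(window=7).mean()
print(rolling_mean)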
What is the main purpose of decomposing a time series into components?
Explanation: Decomposition separates a series into its underlying trend, seasonal, and residual components, making analysis clearer. It does not mean merging series. Artificial feature creation is a separate process, as is removing time-based attributes, which actually reduces useful information.
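A sketch using statsmodels' seasonal_decompose, assuming statsmodels is available; the synthetic monthly series, additive model, and 12-month period are illustrative.

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly series: an upward trend plus a repeating 12-month seasonal cycle.
index = pd.date_range("2020-01-01", periods=48, freq="MS")
trend = np.linspace(100, 148, 48)
seasonal = 10 * np.sin(2 * np.pi * np.arange(48) / 12)
series = pd.Series(trend + seasonal, index=index)

# Split the series into trend, seasonal, and residual components.
result = seasonal_decompose(series, model="additive", period=12)
print(result.trend.dropna().head())
print(result.seasonal.head())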
Which scenario describes data leakage in time series prediction?
Explanation: Data leakage occurs if future data is used inappropriately during training, which unrealistically improves model performance. Proper train-test split, lag features, and forward fill methods avoid including information unavailable at prediction time, thus preventing leakage.
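As a sketch of one way to avoid leakage when normalising: compute the statistics on the training portion only and reuse them for the test portion, so no future information reaches the model. The values and split fraction below are illustrative.

import pandas as pd

series = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 20.0, 22.0])

split = int(len(series) * 0.75)
train, test = series.iloc[:split], series.iloc[split:]

# Leakage-free normalisation: the mean and std come from the training data only,
# never from the future values the model is supposed to predict.
mean, std = train.mean(), train.std()
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std
print(train_scaled.round(2).tolist(), test_scaled.round(2).tolist())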