Explore core differences between model-based and model-free reinforcement learning methods by answering questions about definitions, characteristics, and simple scenarios. This quiz helps you assess your understanding of key RL approaches, decision making, planning, and learning mechanisms in artificial intelligence.
Which of the following best describes a model-free reinforcement learning method?
Explanation: A model-free RL approach does not construct a model of the environment's transition probabilities; instead, it learns which actions to take directly from interaction with the environment. Building a complete map of the environment is characteristic of model-based RL. Using only supervised learning is not reinforcement learning at all. Requiring prior expert demonstrations is associated with imitation learning, not model-free RL.
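To make the distinction concrete, here is a minimal sketch of a model-free learner (tabular Q-learning) in Python. The environment interface (`env.reset()`, `env.step(action)`, `env.actions`) and the hyperparameters are illustrative assumptions, not part of the quiz or any specific library.

```python
# Minimal model-free sketch (tabular Q-learning): the agent never
# estimates transition probabilities; it updates action values
# directly from observed (state, action, reward, next_state) samples.
# `env` is an assumed toy environment exposing reset(), step(action),
# and a list of discrete actions -- an illustrative interface only.
from collections import defaultdict
import random

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy choice from the current value estimates.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # Model-free update: bootstrap only from the observed transition.
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```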
In model-based reinforcement learning, what is the main additional component compared to model-free RL?
Explanation: Model-based RL learns or is given a model of how its actions affect the environment, which allows the agent to plan ahead. Relying on a single exploration strategy or ignoring past experience does not distinguish model-based methods from model-free ones. Responding only to negative rewards is not a defining feature of any standard RL method and would severely limit learning.
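As a rough illustration of that additional component, the sketch below estimates a transition and reward model from observed experience by simple counting. The class name `EmpiricalModel` and its interface are hypothetical, chosen only for this example.

```python
# Sketch of the extra component in model-based RL: an empirical model
# of the environment's dynamics, estimated from observed transitions.
from collections import defaultdict

class EmpiricalModel:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.reward_sum = defaultdict(float)                 # (s, a) -> summed reward
        self.visits = defaultdict(int)                       # (s, a) -> visit count

    def update(self, state, action, reward, next_state):
        # Record one real transition; the model improves as experience accumulates.
        self.counts[(state, action)][next_state] += 1
        self.reward_sum[(state, action)] += reward
        self.visits[(state, action)] += 1

    def transition_probs(self, state, action):
        # Maximum-likelihood estimate of P(s' | s, a) from the counts.
        n = self.visits[(state, action)]
        return {s2: c / n for s2, c in self.counts[(state, action)].items()}

    def expected_reward(self, state, action):
        # Average observed reward for this state-action pair.
        return self.reward_sum[(state, action)] / self.visits[(state, action)]
```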
Which statement best highlights a typical advantage of model-based RL over model-free RL in new environments?
Explanation: Model-based RL can simulate the environment using its learned or given model, which enables efficient planning and faster learning from limited experience. Acting randomly is inefficient and does not describe model-based RL. Model-free RL does not model environment transitions, so it cannot predict them, let alone perfectly. All RL methods, including model-free ones, benefit from exploration to improve the policy.
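One concrete way a learned model speeds up learning is Dyna-style planning: between real environment steps, the agent replays simulated transitions drawn from its model to refine its value estimates. The sketch below assumes the hypothetical `EmpiricalModel` and Q-table from the earlier sketches.

```python
# Dyna-style planning sketch: extra value updates from transitions
# sampled from the learned model, requiring no new real experience.
import random

def planning_updates(Q, model, actions, alpha=0.1, gamma=0.99, n_updates=20):
    seen = [sa for sa, n in model.visits.items() if n > 0]  # (state, action) pairs with data
    if not seen:
        return
    for _ in range(n_updates):
        state, action = random.choice(seen)
        probs = model.transition_probs(state, action)
        # Sample a next state from the model's estimated transition distribution.
        next_state = random.choices(list(probs), weights=list(probs.values()))[0]
        reward = model.expected_reward(state, action)
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```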
How does planning typically differ between model-based and model-free RL agents when facing a maze as an environment?
Explanation: Model-based agents can plan by simulating moves with a learned model of the environment, which helps them find the best path. Model-free agents do not build explicit maps; instead, they rely on value estimates learned from past experience. Random walking is not a strategy characteristic of model-based agents. Model-free agents can still update their strategies through learning, so the last option is incorrect.
Which type of RL is generally better at adapting quickly when environment rules change slightly after training?
Explanation: Model-based RL can adapt quickly when the environment changes, because it updates its model and replans. Model-free RL is slower to adapt because it relies on accumulating new experience rather than on an explicit model of the environment. Randomized control and passive learning are not standard RL approaches and do not specifically address adapting to environment changes.
In which RL approach is the one-step lookahead planning method most commonly used?
Explanation: Model-based RL methods use one-step lookahead by simulating possible next states and rewards before taking an action. Model-free RL lacks an explicit model and so cannot perform lookahead in this manner. Unsupervised learning does not involve rewards or planning. 'Reactive learning' is not a standard technical term in RL.
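A minimal sketch of one-step lookahead, reusing the hypothetical model interface from the earlier sketches: each candidate action is scored by the model's predicted reward plus the discounted value of the predicted next states, and the best-scoring action is chosen. The `state_value` function stands in for whatever value estimate the agent maintains.

```python
# One-step lookahead sketch: query the learned model for each action's
# predicted outcomes before acting, rather than reacting from stored values alone.
def one_step_lookahead(state, actions, model, state_value, gamma=0.99):
    def backed_up(action):
        if model.visits.get((state, action), 0) == 0:
            return float("-inf")  # no data for this pair; never selected
        probs = model.transition_probs(state, action)
        expected_next = sum(p * state_value(s2) for s2, p in probs.items())
        return model.expected_reward(state, action) + gamma * expected_next

    return max(actions, key=backed_up)
```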
If an agent in a board game learns by trial and error and does not try to predict opponent moves in advance, what RL approach is it most likely using?
Explanation: In this scenario, the agent learns directly from the outcomes of its actions rather than by simulating future steps, which is characteristic of model-free RL. Model-based RL tries to predict and plan for future opponent moves. Predictive regression learning is not an RL approach, and optimal control refers to a broader field rather than describing this specific agent behavior.
Which RL method focuses mainly on learning a value function or policy directly from reward signals, without using a model of the environment?
Explanation: Model-free RL methods such as Q-learning and policy gradients learn values and policies from experience only, without simulating or modeling the environment. Model-based simulation requires a transition model. Environment regression and trajectory learning are not standard terms describing core RL approaches.
Which RL method typically requires more computational resources per decision when a new action choice needs to be made?
Explanation: Model-based RL often spends extra computation at decision time simulating outcomes before acting, especially when planning several steps ahead. Model-free RL is usually faster at decision time since it only looks up or computes actions from stored values. Exploratory decision making and passive RL are not distinct learning methods and do not directly relate to computational demands at decision time.
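The contrast in decision-time cost can be sketched directly, again using the hypothetical Q-table and model interface from the earlier examples: the model-free choice is a single argmax over stored values, while the model-based choice runs a small recursive search whose cost grows with the planning depth and the number of actions.

```python
# Decision-time cost contrast (illustrative sketch).

def act_model_free(state, actions, Q):
    # One argmax over stored action values: cheap at decision time.
    return max(actions, key=lambda a: Q[(state, a)])

def act_model_based(state, actions, model, gamma=0.99, depth=3):
    # Recursive planning with the learned model: cost grows with depth
    # and branching factor, so each decision takes more computation.
    def planned_q(s, a, d):
        if model.visits.get((s, a), 0) == 0:
            return 0.0  # no data for this pair; fall back to a neutral value
        reward = model.expected_reward(s, a)
        if d == 1:
            return reward
        probs = model.transition_probs(s, a)
        expected_next = sum(p * max(planned_q(s2, a2, d - 1) for a2 in actions)
                            for s2, p in probs.items())
        return reward + gamma * expected_next

    return max(actions, key=lambda a: planned_q(state, a, depth))
```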
Is 'Model-based' versus 'Model-free' the same as 'on-policy' versus 'off-policy' methods in reinforcement learning?
Explanation: Model-based vs. model-free is about whether the agent uses an explicit model of the environment, while on-policy vs. off-policy refers to how data is collected and used for learning. Model-free methods are not always off-policy; they can be on-policy as well (SARSA is on-policy, Q-learning is off-policy, and both are model-free). Model-based and on-policy methods are not identical, and model-based methods can be implemented in both on- and off-policy ways.