The Bellman Equation is a fundamental recursive relationship used in dynamic programming and reinforcement learning to describe the optimal value of a decision-making problem. It expresses the principle of optimality, which states that an optimal policy (a sequence of decisions) is composed of optimal sub-policies. Mathematically, it can be represented as:

$$V(s) = \max_{a}\Big[\, R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \,\Big]$$
Here, $V(s)$ is the value function representing the maximum expected return starting from state $s$, $R(s, a)$ is the immediate reward received after taking action $a$ in state $s$, $\gamma$ is the discount factor (ranging from 0 to 1) that prioritizes immediate rewards over future ones, and $P(s' \mid s, a)$ is the transition probability to the next state $s'$ given the current state $s$ and action $a$. The equation thus captures the idea that the value of a state is the immediate reward plus the discounted expected value of the best successor states, giving a recursive rule for making optimal decisions over time.
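To make this concrete, the Python sketch below applies the Bellman optimality update repeatedly (value iteration) until the value function stops changing. The two-state, two-action MDP, its transition probabilities `P`, rewards `R`, and the discount factor are made-up illustrative values, not taken from the text above.

```python
import numpy as np

gamma = 0.9  # discount factor

# Hypothetical toy MDP: P[s, a, s'] are transition probabilities,
# R[s, a] are immediate rewards. Values chosen only for illustration.
P = np.array([
    [[0.8, 0.2], [0.1, 0.9]],  # transitions from state 0 under actions 0, 1
    [[0.5, 0.5], [0.0, 1.0]],  # transitions from state 1 under actions 0, 1
])
R = np.array([
    [1.0, 0.0],  # rewards in state 0 for actions 0, 1
    [0.0, 2.0],  # rewards in state 1 for actions 0, 1
])

V = np.zeros(2)  # initial value function
for _ in range(1000):
    # Bellman optimality update:
    # Q[s, a] = R(s, a) + gamma * sum_s' P(s'|s,a) * V(s')
    Q = R + gamma * (P @ V)
    V_new = Q.max(axis=1)  # V(s) = max_a Q(s, a)
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop once updates are negligible
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy with respect to the converged V
print("Optimal values:", V)
print("Optimal policy:", policy)
```

Because each update is a contraction (for $\gamma < 1$), the iteration converges to the unique fixed point of the equation, and acting greedily with respect to that fixed point yields an optimal policy.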