Reinforcement Learning -- Finite Markov Decision
Reinforcement Learning — Finite Markov Decision Processes (MDP) MDPs are a classical formalization of sequential decision making, where actions influence not just immediate rewards, but also subsequent situations, or states, and through those future rewards. Thus MDPs involve delayed reward and the need to tradeoff immediate and delayed reward. Whereas in bandit problems we estimated the value $q_*(a)$ of each action $a$, in MDPs we estimate the value $q_*(s, a)$ of each action $a$ in each state s, or we estimate the value $v_*(s)$ of each state given optimal action selections....