I have implemented the value iteration algorithm for a simple Markov decision process (Wikipedia) in Python. To keep the structure (states, actions, transitions, rewards) of a particular Markov decision process and iterate over it, I have used the following data structures:
dictionary for states and actions that are available for those states:
SA = {'state A': {'action 1', 'action 2', ...}, ...}
dictionary for transition probabilities:
T = {('state A', 'action 1'): {'state B': probability}, ...}
dictionary for rewards:
R = {('state A', 'action 1'): {'state B': reward}, ...}
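For a concrete toy example (the state names, action names, and numbers are made up purely to illustrate the layout), the three dictionaries might be filled in like this:

```python
# A made-up two-state MDP, just to show the shape of the dictionaries
# described above; the names and numbers are arbitrary.
SA = {'state A': {'action 1', 'action 2'},
      'state B': {'action 1'}}

# T[(s, a)][s2] = P(s2 | s, a); probabilities for each (s, a) sum to 1
T = {('state A', 'action 1'): {'state A': 0.3, 'state B': 0.7},
     ('state A', 'action 2'): {'state A': 1.0},
     ('state B', 'action 1'): {'state B': 1.0}}

# R[(s, a)][s2] = reward received on the transition from s to s2 under a
R = {('state A', 'action 1'): {'state A': 0.0, 'state B': 5.0},
     ('state A', 'action 2'): {'state A': 1.0},
     ('state B', 'action 1'): {'state B': 0.0}}
```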
My question is: is this the right approach? What are the most suitable data structures (in Python) for an MDP?
I have implemented Markov decision processes in Python before and found the following code useful:
http://aima.cs.berkeley.edu/python/mdp.html
This code is taken from Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig.
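Whether or not you reuse that code, the dictionary layout from your question is already enough to write value iteration directly. Here is a rough sketch over those structures; the toy MDP and the names `value_iteration`, `gamma`, and `eps` are my own choices for illustration, not taken from the AIMA code:

```python
# Toy MDP in the question's dictionary layout (names/numbers invented).
SA = {'A': {'stay', 'go'}, 'B': {'stay'}}
T = {('A', 'stay'): {'A': 1.0},
     ('A', 'go'):   {'B': 1.0},
     ('B', 'stay'): {'B': 1.0}}
R = {('A', 'stay'): {'A': 0.0},
     ('A', 'go'):   {'B': 1.0},
     ('B', 'stay'): {'B': 0.0}}

def value_iteration(SA, T, R, gamma=0.9, eps=1e-6):
    """Return the optimal value function as a dict {state: value}."""
    V = {s: 0.0 for s in SA}
    while True:
        delta = 0.0
        for s, actions in SA.items():
            # Bellman backup: V(s) = max_a sum_{s'} P(s'|s,a) * (R(s,a,s') + gamma * V(s'))
            v_new = max(
                sum(p * (R[(s, a)][s2] + gamma * V[s2])
                    for s2, p in T[(s, a)].items())
                for a in actions
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < eps:
            return V

V = value_iteration(SA, T, R)
```

One advantage of this layout is that `T[(s, a)]` stores only the reachable successor states, so the inner sum naturally skips zero-probability transitions without needing a dense |S|x|S| matrix per action.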