I'm trying to get an agent to learn the mouse movements necessary to best perform some task in a reinforcement learning setting (i.e. the reward signal is the only feedback for learning).
I'm hoping to use the Q-learning technique, but while I've found a way to extend this method to continuous state spaces, I can't seem to figure out how to accommodate a problem with a continuous action space.
I could just force all mouse movement to be of a certain magnitude and in only a certain number of different directions, but any reasonable way of making the actions discrete would yield a huge action space. Since standard Q-learning requires the agent to evaluate all possible actions, such an approximation doesn't solve the problem in any practical sense.
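For a sense of scale, here is a toy sketch of what such a discretisation would look like (the grid resolutions are made up for illustration):

```python
import itertools
import math

# Hypothetical discretisation: each action is a (direction, magnitude) pair
# drawn from fixed grids. The resolutions below are invented, not prescribed.
directions = [i * 2 * math.pi / 360 for i in range(360)]  # 1-degree steps
magnitudes = list(range(1, 101))                           # 1..100 pixels per move

actions = list(itertools.product(directions, magnitudes))
print(len(actions))  # 36000 discrete actions

# Standard Q-learning still has to evaluate Q(s, a) for every one of these
# actions at each step to pick argmax_a Q(s, a), and exploration has to cover
# all of them -- so refining the grid only makes the problem worse.
```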
Fast forward to this year: researchers from DeepMind proposed a deep reinforcement learning actor-critic method for dealing with both continuous state and action spaces. It is based on a technique called the deterministic policy gradient. See the paper Continuous control with deep reinforcement learning and some implementations.
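The algorithm is commonly known as DDPG (deep deterministic policy gradient). Roughly: an actor network mu(s) outputs a continuous action directly, and a critic Q(s, a) is trained like ordinary Q-learning, except that the intractable max over continuous actions is replaced by the actor's output; the actor is then updated by ascending the critic's value of its own actions. Below is a minimal, illustrative PyTorch sketch of the two update steps; the network sizes, hyperparameters, and helper names are my own and not taken from the paper:

```python
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy mu(s) -> action in [-1, 1]^act_dim."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Action-value function Q(s, a)."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def ddpg_update(actor, critic, actor_t, critic_t, batch,
                actor_opt, critic_opt, gamma=0.99, tau=0.005):
    s, a, r, s2, done = batch  # tensors: states, actions, rewards, next states, done flags

    # Critic: regress Q(s, a) toward a bootstrapped target computed with the
    # *target* networks -- ordinary Q-learning, but with mu_t(s') standing in
    # for the max over continuous actions.
    with torch.no_grad():
        q_target = r + gamma * (1 - done) * critic_t(s2, actor_t(s2)).squeeze(-1)
    critic_loss = nn.functional.mse_loss(critic(s, a).squeeze(-1), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient -- push mu(s) toward actions the
    # critic scores highly by maximising Q(s, mu(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft (Polyak) update of the target networks.
    for net, net_t in ((actor, actor_t), (critic, critic_t)):
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)

# Usage sketch with invented dimensions: e.g. a few screen features in,
# a 2-D mouse displacement out.
obs_dim, act_dim = 8, 2
actor, critic = Actor(obs_dim, act_dim), Critic(obs_dim, act_dim)
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
```

In practice the batch comes from a replay buffer, and exploration is handled by adding noise to the actor's output when interacting with the environment; the sketch above only shows why the method sidesteps the argmax over a continuous action space.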