Reinforcement Learning

How to search action space

seoho Song 2019. 7. 24. 19:25

1. Greedy approach : Always take the action with the highest estimated reward at each time step.

- Cons : If the second-best action happens to be taken first and looks good, the agent keeps choosing it and never gets a chance to switch to the better one.
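The drawback above can be seen in a tiny simulation. This is only a minimal sketch under assumed conditions: a two-armed bandit with deterministic rewards, value estimates initialized to zero, and ties broken by the lowest index. The names `greedy_action` and `run_greedy` are made up for illustration.

```python
def greedy_action(q_values):
    """Return the index of the action with the highest estimated value.

    Ties are broken in favor of the lowest index.
    """
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run_greedy(true_rewards, steps=100):
    """Run a purely greedy agent and return how often each action was taken.

    Assumption: each action pays a fixed (deterministic) reward.
    """
    q = [0.0] * len(true_rewards)       # estimated value of each action
    counts = [0] * len(true_rewards)    # how often each action was taken
    for _ in range(steps):
        a = greedy_action(q)
        r = true_rewards[a]
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]  # incremental mean of observed rewards
    return counts


# Action 1 is better (0.8 vs 0.3), but action 0 is tried first.
counts = run_greedy([0.3, 0.8])
print(counts)  # [100, 0]: the agent never tries action 1
```

After the first step the estimate for action 0 is 0.3, which already beats the untouched estimate of 0.0 for action 1, so the greedy rule never looks at action 1 again.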

 

2. Random approach : Take a random action at every time step.

- Cons : Ideal only when a random policy happens to be the optimal policy.

 

3. Epsilon-greedy approach : Take the action with the highest estimated reward, but with a small probability take a random action instead. Epsilon is a hyperparameter that gives the probability of taking a random action. It is usually initialized to a high value and gradually decreased to a small constant (e.g. 0.1). This is almost the standard approach.
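A possible sketch of this rule in Python is shown below. The linear decay schedule, the starting value of 1.0 and the 1000-step horizon are assumptions chosen for illustration (only the final value 0.1 comes from the note above), and the function names are hypothetical.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """With probability epsilon explore randomly, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                     # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


def decayed_epsilon(step, start=1.0, end=0.1, decay_steps=1000):
    """Linearly anneal epsilon from `start` to `end`, then hold it at `end`.

    Assumption: a linear schedule; other schedules (e.g. exponential) are
    equally common.
    """
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)


rng = random.Random(0)          # seeded only to make the run repeatable
q_estimates = [0.1, 0.9, 0.3]   # hypothetical value estimates
for step in (0, 500, 1000, 5000):
    eps = decayed_epsilon(step)
    action = epsilon_greedy_action(q_estimates, eps, rng)
    print(step, round(eps, 2), action)
```

Early on, epsilon is close to 1 and the agent behaves almost like the random approach; once epsilon has decayed, it behaves mostly like the greedy approach while still exploring occasionally.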

 

4. Boltzmann approach : Sample an action with probability weighted by its estimated value. A softmax over the value estimates gives the probability of each action.

- Pros : Also takes into account information about the values of the other actions, not only the best one.
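The softmax weighting could look roughly like the following. This is a sketch, not a reference implementation: the `temperature` parameter is an assumption that the note above does not mention (it is a common way to control how strongly the sampling favors high-value actions), and the function names are hypothetical.

```python
import math
import random


def softmax(q_values, temperature=1.0):
    """Turn value estimates into action probabilities.

    Subtracting the maximum first keeps math.exp from overflowing and does
    not change the resulting probabilities.
    """
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_action(q_values, temperature=1.0, rng=random):
    """Sample one action index according to the softmax probabilities."""
    probs = softmax(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


q_estimates = [1.0, 2.0, 0.5]   # hypothetical value estimates
print(softmax(q_estimates))     # the highest-valued action gets the largest share
print(boltzmann_action(q_estimates, rng=random.Random(0)))
```

Unlike epsilon-greedy, which explores uniformly at random, this scheme tries an action with a slightly lower estimate more often than one with a much lower estimate.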