Definition
A learning policy is called GLIE (Greedy in the Limit with Infinite Exploration) if it satisfies the following two conditions:
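- All state-action pairs are explored infinitely many times: $\lim_{k \to \infty} N_k(s, a) = \infty$, where $N_k(s, a)$ is the incremental count of visits to the pair $(s, a)$.
- The learning policy converges to a greedy policy: $\lim_{k \to \infty} \pi_k(a \mid s) = \mathbf{1}\big(a = \arg\max_{a'} Q_k(s, a')\big)$, where $Q_k$ is the action-value estimate at iteration $k$.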
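As an illustration of the two conditions above, a commonly cited example of a GLIE policy is epsilon-greedy with an exploration rate that decays as 1/k. The sketch below is not part of the definition: the function names, the single-state setting, and the fixed action values are all assumptions chosen for brevity. The exploration rate tends to zero, so the probability of the greedy action tends to one, while the sum of the rates diverges, which is what keeps every action being tried in states that are visited infinitely often.

```python
import random


def glie_epsilon(k: int) -> float:
    """Exploration rate at step k (1-indexed). Assumed schedule: epsilon_k = 1/k."""
    return 1.0 / k


def epsilon_greedy_action(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Ties are broken in favour of the lowest action index.
    return q_values.index(max(q_values))


def greedy_probability(q_values, epsilon):
    """Probability of the greedy action, assuming a unique maximiser."""
    return 1.0 - epsilon + epsilon / len(q_values)


def simulate_counts(q_values, num_steps, seed=0):
    """Visit counts per action in a single hypothetical state with fixed Q-values."""
    rng = random.Random(seed)
    counts = [0] * len(q_values)
    for k in range(1, num_steps + 1):
        action = epsilon_greedy_action(q_values, glie_epsilon(k), rng)
        counts[action] += 1  # incremental count of the (state, action) pair
    return counts


if __name__ == "__main__":
    q = [0.0, 1.0]  # hypothetical action values; action 1 is greedy
    print("visit counts:", simulate_counts(q, 10_000))
    for k in (1, 10, 1000):
        print(k, greedy_probability(q, glie_epsilon(k)))
```

A finite simulation can only suggest the limiting behaviour. Non-greedy counts grow roughly logarithmically under this schedule, so they increase slowly even though they are unbounded in the limit.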