Reinforcement Learning Based Empirical Comparison of UCB, Epsilon-Greedy, and Thompson Sampling

Document Type: Primary Research Paper


School of Computer Science, UPES, Dehradun, Uttarakhand, India


Reinforcement learning is the problem of learning what to do, that is, how to map situations to actions, so as to maximize a numerical reward signal. The learner is not told which actions to take; instead, it must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These two characteristics, trial-and-error search and delayed reward, are the two most important distinguishing features of reinforcement learning. The main difference between reinforcement learning and other forms of learning is that it uses training information that evaluates the actions taken rather than instructing by giving the correct actions. This is what creates the need for active exploration, an explicit search for good actions. Purely evaluative feedback indicates how good the action taken was, but not whether it was the best or the worst action available. This paper presents a comparative analysis of the Epsilon-Greedy, UCB (Upper Confidence Bound), and Thompson Sampling algorithms. The experimental results give a clear picture of how the three algorithms compare.
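To make the three strategies named above concrete, the following is a minimal sketch of each on a Bernoulli multi-armed bandit. It is an illustration under stated assumptions, not the experimental setup of this paper: the function names (`run_bandit`, `epsilon_greedy`, `ucb1`, `thompson_sampling`), the arm probabilities, the horizon, `epsilon = 0.1`, the UCB1 form of the confidence bonus, and the uniform Beta(1, 1) prior are all hypothetical choices made for the example.

```python
import math
import random


def empirical_mean(pulls, wins, arm):
    """Observed success rate of an arm (0.0 if it was never pulled)."""
    return wins[arm] / pulls[arm] if pulls[arm] > 0 else 0.0


def epsilon_greedy(pulls, wins, t, rng, epsilon=0.1):
    """Explore a random arm with probability epsilon, else exploit the best mean."""
    k = len(pulls)
    if rng.random() < epsilon:
        return rng.randrange(k)
    return max(range(k), key=lambda a: empirical_mean(pulls, wins, a))


def ucb1(pulls, wins, t, rng):
    """UCB1: pull each arm once, then maximize mean + sqrt(2 ln t / n_a)."""
    k = len(pulls)
    for arm in range(k):
        if pulls[arm] == 0:
            return arm
    return max(
        range(k),
        key=lambda a: empirical_mean(pulls, wins, a)
        + math.sqrt(2.0 * math.log(t) / pulls[a]),
    )


def thompson_sampling(pulls, wins, t, rng):
    """Sample each arm's success rate from its Beta posterior; pick the largest."""
    k = len(pulls)
    samples = [
        # Assumed uniform Beta(1, 1) prior: alpha = 1 + successes, beta = 1 + failures.
        rng.betavariate(1 + wins[a], 1 + pulls[a] - wins[a])
        for a in range(k)
    ]
    return max(range(k), key=lambda a: samples[a])


def run_bandit(select_arm, true_probs, horizon=2000, seed=0):
    """Play one policy for `horizon` steps; return (total reward, pulls per arm)."""
    rng = random.Random(seed)
    k = len(true_probs)
    pulls = [0] * k
    wins = [0] * k
    total = 0
    for t in range(1, horizon + 1):
        arm = select_arm(pulls, wins, t, rng)
        # Bernoulli reward: 1 with the arm's (hidden) success probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total += reward
    return total, pulls


if __name__ == "__main__":
    probs = [0.1, 0.5, 0.9]  # hypothetical arm success probabilities
    for policy in (epsilon_greedy, ucb1, thompson_sampling):
        total, pulls = run_bandit(policy, probs)
        print(policy.__name__, "reward:", total, "pulls:", pulls)
```

All three policies receive only evaluative feedback (the reward of the arm actually pulled), so each needs some mechanism for exploration: random choice in Epsilon-Greedy, an optimism bonus in UCB, and posterior sampling in Thompson Sampling.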