Let's play a game based on the k-armed bandit problem. The green buttons above may give a different reward each time you press them. The goal is to find out which button would give you the most reward, in expectation. This will help you get a sense of what a bandit agent faces! Press any button to get started. We will pause the game after 10 button presses (actions).

You have taken 10 actions so far! One button gives the most reward in expectation, meaning the highest average reward you would get if you pressed that button forever. Which button do you think it is? After you guess, press "Answer" below to find out.

You can continue to the next section on the left or, to keep playing, just select the button you want to press.
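If you would like to try the same game in code, here is a minimal sketch. It is not the code behind the buttons on this page: the number of buttons, the Gaussian rewards, and the names `make_bandit`, `press`, and `play` are all assumptions made for illustration. It presses buttons at random and keeps a running average of the rewards seen for each one, which is the simplest way to estimate a button's reward in expectation.

```python
import random

def make_bandit(k, rng):
    # Assumption: each button's true expected reward is drawn from N(0, 1).
    return [rng.gauss(0.0, 1.0) for _ in range(k)]

def press(true_means, button, rng):
    # Assumption: each press returns the button's true mean plus unit-variance noise.
    return rng.gauss(true_means[button], 1.0)

def play(true_means, presses, rng):
    k = len(true_means)
    counts = [0] * k        # how many times each button was pressed
    estimates = [0.0] * k   # running average reward per button
    for _ in range(presses):
        button = rng.randrange(k)  # pick a button uniformly at random
        reward = press(true_means, button, rng)
        counts[button] += 1
        # Incremental update of the sample average for this button.
        estimates[button] += (reward - estimates[button]) / counts[button]
    return counts, estimates

rng = random.Random(0)
true_means = make_bandit(4, rng)
counts, estimates = play(true_means, 10, rng)

# A button that was never pressed keeps its initial estimate of 0.0.
guess = max(range(len(estimates)), key=lambda b: estimates[b])
best = max(range(len(true_means)), key=lambda b: true_means[b])
print("presses per button:", counts)
print("guess after 10 presses:", guess, "| truly best button:", best)
```

With only 10 presses the guess is often wrong, because each average is built from just a few noisy rewards. Raising `presses` pulls the estimates toward the true expected rewards, which is what "if you pressed that button forever" refers to.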