Policy or Value ? Loss Function and Playing Strength in AlphaZero
By a mysterious writer
Last updated 19 November 2024
Recently, AlphaZero has achieved outstanding performance in playing Go, Chess, and Shogi. Players in AlphaZero consist of a combination of Monte Carlo Tree Search and a Deep Q-network that is trained using self-play. The unified Deep Q-network has a policy head and a value head. During training, AlphaZero minimizes the sum of the policy loss and the value loss. However, it is not clear whether, and under which circumstances, other formulations of the objective function are better. Therefore, in this paper, we perform experiments with combinations of these two optimization targets. Self-play is a computationally intensive method, but by using small games we are able to run multiple test cases. We use a lightweight open-source reimplementation of AlphaZero on two different games. We investigate optimizing the two targets independently, and also try different combinations (sum and product). Our results indicate that, at least for relatively simple games such as 6x6 Othello and Connect Four, optimizing the sum, as AlphaZero does, performs consistently worse than other objectives, in particular worse than optimizing only the value loss. Moreover, we find that care must be taken in computing the playing strength. Tournament Elo ratings differ from training Elo ratings: training Elo ratings, though cheap to compute and frequently reported, can be misleading and may lead to bias. It is currently not clear how these results transfer to more complex games, or whether there is a phase transition between our setting and the AlphaZero application to Go, where the sum is seemingly the better choice.
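The four objectives compared in the abstract (policy loss only, value loss only, their sum, and their product) can be illustrated for a single training example with a minimal sketch. It assumes the loss forms standard in AlphaZero: cross-entropy between the search-derived move distribution and the policy head output, and squared error between the game outcome and the value head output. The abstract does not state the exact forms used in the paper, and the regularization term is omitted here. The function names and the `mode` parameter are hypothetical and are not taken from the reimplementation the authors used.

```python
import math


def policy_loss(target_pi, predicted_p, eps=1e-12):
    """Cross-entropy between the MCTS move distribution and the policy head output.

    Assumed form (standard AlphaZero); eps guards against log(0).
    """
    return -sum(t * math.log(p + eps) for t, p in zip(target_pi, predicted_p))


def value_loss(target_z, predicted_v):
    """Squared error between the game outcome z and the value head output v."""
    return (target_z - predicted_v) ** 2


def combined_loss(target_pi, predicted_p, target_z, predicted_v, mode="sum"):
    """Return one of the four objectives discussed in the abstract.

    `mode` is a hypothetical switch: "policy", "value", "sum", or "product".
    """
    lp = policy_loss(target_pi, predicted_p)
    lv = value_loss(target_z, predicted_v)
    if mode == "policy":
        return lp
    if mode == "value":
        return lv
    if mode == "sum":
        return lp + lv  # the objective AlphaZero minimizes
    if mode == "product":
        return lp * lv
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    pi = [0.5, 0.5]   # search-derived target over two moves
    p = [0.5, 0.5]    # policy head prediction
    z, v = 1.0, 0.5   # game outcome and value head prediction
    for mode in ("policy", "value", "sum", "product"):
        print(mode, combined_loss(pi, p, z, v, mode))
```

In an actual training loop the chosen scalar would be averaged over a batch and minimized by gradient descent. Note that the product objective scales each head's gradient by the other head's loss, which is one way its training dynamics can differ from the sum.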
AlphaZero, a novel Reinforcement Learning Algorithm, in JavaScript, by Carlos Aguayo
AlphaZero Explained · On AI
Reimagining Chess with AlphaZero, February 2022
Value targets in off-policy AlphaZero: a new greedy backup
Frontiers AlphaZe∗∗: AlphaZero-like baselines for imperfect information games are surprisingly strong
AlphaGo Zero – How and Why it Works – Tim Wheeler
Policy or Value ? Loss Function and Playing Strength in AlphaZero-like Self-play
Does the neural net of AlphaZero only evaluate the score of a given chess position or does it do something else? - Quora