DeepMind bets on an AI system capable of playing poker, chess, Go and more


DeepMind, the AI lab backed by Google’s parent company, Alphabet, has a long history of investing in game-playing AI systems. The lab’s philosophy is that games, although they have no obvious business application, are challenges particularly relevant to cognitive and reasoning skills. This makes them useful benchmarks for the advancement of AI. Over the past few decades, games have given rise to the kind of self-learning AI that powers computer vision, self-driving cars, and natural language processing.

Continuing this line of work, DeepMind has created a system called Player of Games, which the company first revealed in a research paper posted to the arXiv.org preprint server this week. Unlike the game-playing systems DeepMind developed previously, such as the chess-winning AlphaZero and the StarCraft II-besting AlphaStar, Player of Games can perform well in both perfect information games (e.g., chess and the Chinese board game Go) and imperfect information games (e.g., poker).

Tasks like planning routes around congestion, negotiating contracts, and even interacting with customers all involve trade-offs and taking into account how people’s preferences coincide and conflict, as in games. Even when AI systems are self-interested, they might benefit from coordinating, cooperating, and interacting among groups of people or organizations. Systems like Player of Games, which can reason about the goals and motivations of others, could pave the way for AI that works successfully with others, including handling the questions that arise around maintaining trust.

Imperfect versus perfect

Imperfect information games contain information that is hidden from players during the game. In contrast, perfect information games show all information to every player from the start.

Perfect information games take a fair amount of forethought and planning to play well. Players need to process what they see on the board and determine what their opponents are likely to do while working toward the ultimate goal of winning. Imperfect information games, on the other hand, require players to take hidden information into account and work out how they should act next in order to win, including bluffing or teaming up against an opponent.
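The distinction can be made concrete with a toy sketch. Everything in it (the `GameState` class, the `observe` function, and the sample cards) is a hypothetical illustration and is not taken from DeepMind’s paper; it only shows what "hidden information" means for what a player is allowed to see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GameState:
    """Toy game state (hypothetical structure, for illustration only)."""
    public: tuple   # information visible to everyone, e.g. the board
    private: dict   # player name -> cards that only that player holds


def observe(state, player, perfect_information):
    """Return the part of the state that `player` is allowed to see."""
    if perfect_information:
        # Chess or Go: nothing is hidden, so every player sees everything.
        return {"public": state.public, "private": dict(state.private)}
    # Poker-like game: public information plus the player's own cards only.
    return {"public": state.public,
            "private": {player: state.private[player]}}


state = GameState(
    public=("flop: 2h 7d Kc",),
    private={"alice": ("As", "Ad"), "bob": ("9c", "9s")},
)
alice_view = observe(state, "alice", perfect_information=False)
full_view = observe(state, "alice", perfect_information=True)
```

In the imperfect information case, `alice_view` contains no entry for Bob’s cards, so Alice has to reason over every hand Bob might hold rather than over a single known position.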

Systems like AlphaZero excel at perfect information games like chess, while algorithms like DeepStack and Libratus perform remarkably well at imperfect information games like poker. But DeepMind claims Player of Games is the first “general and sound search algorithm” to achieve strong performance in both perfect and imperfect information games.

“[Player of Games] learns to play [games] from scratch, just by repeatedly playing the game in self-play mode,” DeepMind principal researcher Martin Schmid, one of the co-creators of Player of Games, told VentureBeat. “It’s a step towards generality – Player of Games is able to play perfect and imperfect information games, while sacrificing some strength in terms of performance. AlphaZero is stronger than Player of Games in perfect information games, but [it’s] not designed for imperfect information games.”

Although Player of Games is extremely generalizable, it can’t play just any game. Schmid says that the system needs to think through all the possible perspectives of each player given a game situation. While there is only one perspective in perfect information games, there can be many in imperfect information games (around 2,000 for poker, for example). Moreover, unlike MuZero, DeepMind’s successor to AlphaZero, Player of Games also needs to know the rules of the game it is playing. MuZero can pick up the rules of perfect information games on the fly.

In its research, DeepMind evaluated Player of Games, trained using Google’s TPUv4 accelerator chipsets, on chess, Go, Texas Hold’Em, and the strategy board game Scotland Yard. For Go, it set up a 200-game tournament between AlphaZero and Player of Games. DeepMind also pitted Player of Games against top-performing systems including the Go programs GnuGo and Pachi and the chess engine Stockfish, as well as AlphaZero. Player of Games’ Texas Hold’Em match was played against the freely available Slumbot, and the algorithm played Scotland Yard against a bot developed by Joseph Antonius Maria Nijssen that the DeepMind co-authors dubbed “PimBot.”

Above: An abstract view of Scotland Yard, which Player of Games can consistently win.

Image Credit: DeepMind

In chess and Go, Player of Games proved stronger than Stockfish and Pachi in some, but not all, configurations, and it won 0.5% of its games against the stronger agent, AlphaZero. Despite the heavy losses against AlphaZero, DeepMind believes that Player of Games was playing at the level of “a high level human amateur,” and possibly even at the professional level.

Player of Games fared better at poker and Scotland Yard. Against Slumbot, the algorithm won an average of 7 milli big blinds per hand (mbb/hand), where mbb/hand is the average number of big blinds won per 1,000 hands. (A big blind equals the minimum bet.) Meanwhile, in Scotland Yard, DeepMind reports that Player of Games won “significantly” against PimBot, even when PimBot was given more opportunities to search for winning moves.
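The mbb/hand unit is simple arithmetic, and a short sketch may make it concrete. The function name and the sample numbers below are hypothetical and do not come from DeepMind’s paper; they are chosen only so that the result matches the 7 mbb/hand figure reported against Slumbot.

```python
def mbb_per_hand(total_chips_won, big_blind, hands_played):
    """Average winnings in milli big blinds per hand.

    This equals the number of big blinds won per 1,000 hands.
    """
    if big_blind <= 0 or hands_played <= 0:
        raise ValueError("big_blind and hands_played must be positive")
    big_blinds_won = total_chips_won / big_blind
    # Scale by 1,000 to express the per-hand rate in thousandths of a big blind.
    return 1000.0 * big_blinds_won / hands_played


# Hypothetical session: 10,000 hands with a 100-chip big blind and a net
# win of 7,000 chips, i.e. 70 big blinds over the whole session.
rate = mbb_per_hand(total_chips_won=7_000, big_blind=100, hands_played=10_000)
print(rate)  # 7.0
```

A rate of 7 mbb/hand therefore means winning about seven big blinds for every 1,000 hands played, a small edge per hand that only becomes visible over a long match.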

Future work

Schmid believes Player of Games is a big step toward truly general game-playing systems, but far from the last one. The general trend in the experiments was that the algorithm performed better given more computational resources (Player of Games trained on a dataset of 17 million “steps,” or actions, for Scotland Yard alone), and Schmid expects this approach to scale for the foreseeable future.

“[O]ne would expect that the applications that benefited from AlphaZero might also benefit from Player of Games,” Schmid said. “Making these algorithms even more general is exciting research.”

Of course, approaches that favor massive amounts of compute put organizations with fewer resources, such as startups and academic institutions, at a disadvantage. This has become especially true in the language domain, where massive models like OpenAI’s GPT-3 have achieved leading performance, but with resource requirements (often in the millions of dollars) far exceeding the budgets of most research groups.

Costs sometimes exceed what is considered acceptable even at a deep-pocketed company like DeepMind. For AlphaStar, the company’s researchers deliberately didn’t try multiple ways of designing a key component because executives considered the training cost too high. DeepMind didn’t make its first profit until last year, when it raked in £826 million ($1.13 billion) in revenue. The year before, DeepMind recorded losses of $572 million and incurred $1 billion in debt.

It is estimated that AlphaZero cost tens of millions of dollars to train. DeepMind hasn’t disclosed the research budget for Player of Games, but it is unlikely to be low given that the number of training steps for each game ranged from hundreds of thousands to millions.

As research moves from games to other, more commercial areas, such as app recommendations, data center cooling optimization, weather forecasting, materials modeling, mathematics, health care, and atomic energy computation, the effects of this inequity are likely to become more pronounced. “[A]n interesting question is whether this level of play is achievable with less computational resources,” Schmid and his fellow co-authors ask, but leave unanswered, in the paper.
