Crazy Stone

There are many board games all over the world. You may enjoy chess, Othello, backgammon, Go, and so on. Go is a game in which two players take turns placing black and white stones on the board, and the winner is decided by comparing the areas controlled by each player. You may have seen a Go board like the one in the image above. I learned how to play Go in elementary school and have enjoyed it ever since.

In most board games, even top professional human players find it difficult to beat artificial intelligence (AI) players. One of the most famous contests between humans and computers is “Deep Blue versus Garry Kasparov”, a pair of six-game chess matches between world chess champion Garry Kasparov and an IBM supercomputer called Deep Blue. In 1997, Deep Blue defeated Kasparov. It was the first time a computer won a match against a reigning world chess champion under tournament conditions [1].

However, Go is still dominated by human players. Top professional Go players are still stronger than AI players, although AI players keep getting better as their algorithms improve. Crazy Stone, developed by Rémi Coulom, is one of the strongest Go-playing engines. On 21 March 2014, at the second annual Densei-sen competition, Crazy Stone defeated Norimoto Yoda, a Japanese professional 9-dan, in a 19×19 game with four handicap stones by a margin of 2.5 points [2]. On 17 March 2015, at the third Densei-sen, Chikun Cho (the 25th Hon’inbo) defeated Crazy Stone in a 19×19 game with three handicap stones by a margin of 0.5 points by resignation (185 moves) [3]. So the human player won that game. However, the handicap was reduced from four stones in 2014 to three stones in 2015, so I am not sure human players will keep winning the competition in 2016.

For AI players like Crazy Stone, the secret is a technology called “reinforcement learning”, which selects actions so as to maximize future reward. Because of this, it can also support decision making in other fields, such as investment management, helicopter control and advertising optimization. Let me look at the details.

1. Reinforcement learning can handle delayed rewards

Unlike in a quiz show, where each answer is judged right away, in board games it takes time to find out whether each action was good or bad. For example, a Go board has a grid of 19 by 19 lines, so a game lasts for many moves. At the beginning of the game, it is difficult to know whether a move is good or bad because the end of the game is still far away. In other words, the reward for an action is not given immediately after it is taken. Reinforcement learning has a mechanism to handle such delayed rewards.
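To make the idea of delayed rewards concrete, here is a minimal Python sketch (not taken from any Go engine) of the discounted return, one standard way reinforcement learning gives earlier actions credit for a later reward. The return at step t is G_t = r_t + γ·G_{t+1}, where r_t is the reward received right after the action at step t and γ (gamma) is a discount factor between 0 and 1. The five-move game and the value gamma = 0.9 are hypothetical.

```python
def discounted_returns(rewards, gamma=0.9):
    """Compute G_t = r_t + gamma * G_{t+1} for every step t, working backwards."""
    returns = [0.0] * len(rewards)
    g = 0.0  # return from the step after the current one (0 beyond the last step)
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


# Hypothetical 5-move game: no reward until the final move wins (+1).
rewards = [0, 0, 0, 0, 1]
for t, g in enumerate(discounted_returns(rewards, gamma=0.9)):
    print(f"move {t}: return {g:.4f}")
# prints returns 0.6561, 0.7290, 0.8100, 0.9000, 1.0000 for moves 0 to 4
```

Even though only the last move is rewarded, every earlier move receives a share of the credit, shrinking by a factor of gamma for each step it lies further from the win.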

2. Reinforcement learning can calculate optimal sequential actions

In reinforcement learning, agents play a major role. An agent takes actions based on its observations and its strategy. These actions form a sequence, or “path”, rather than a single one-off choice. This is similar to the way we make decisions ourselves, which is why reinforcement learning can support human decision making. In addition, each action can change the environment and therefore what the agent observes next, whereas many other methods usually assume that actions have no impact on the environment.
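The agent loop described above can be sketched with tabular Q-learning, one common reinforcement learning algorithm chosen here purely for illustration (it is not how Crazy Stone itself works). The environment is a hypothetical five-cell corridor: the agent starts in cell 0, each action moves it left or right (so the action changes the environment's state), and only reaching cell 4 gives a reward of +1. The function names and the parameters alpha, gamma and epsilon are illustrative assumptions.

```python
import random

# Hypothetical toy environment: cells 0..4 in a corridor; reaching cell 4 wins (+1).
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Apply an action; it changes the state, and only the goal gives a reward."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def best_action(q, state, rng):
    """Greedy choice with random tie-breaking."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = best_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # The target looks one step ahead, so the final reward propagates back over episodes.
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_path(q, start=0, max_steps=20):
    """Follow the learned policy without exploration; return the visited cells."""
    rng = random.Random(0)
    path, state = [start], start
    for _ in range(max_steps):
        state, _, done = step(state, best_action(q, state, rng))
        path.append(state)
        if done:
            break
    return path


q = q_learning()
print(greedy_path(q))  # expected: [0, 1, 2, 3, 4]
```

After training, following the highest-valued action in each cell yields the path 0, 1, 2, 3, 4: the agent has learned a whole sequence of actions rather than a one-off choice, even though the reward only arrives at the end.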

3. Reinforcement learning is flexible enough to use many methods of searching

This is important in practice. Some problems, like Go, have a huge space in which to search for optimal actions, so we need to try several search methods. Reinforcement learning is flexible enough to accommodate these methods.
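As a small illustration of that flexibility, the sketch below keeps the learning loop fixed and swaps in two different action-selection (search) rules, epsilon-greedy and UCB1, on a hypothetical three-armed bandit, which is a one-step simplification of the full reinforcement learning problem. The win probabilities, step count and function names are assumptions made for this example; a real Go engine searches a vastly larger space.

```python
import math
import random


def epsilon_greedy(counts, values, rng, epsilon=0.1):
    """With probability epsilon try a random action, otherwise the best estimate so far."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])


def ucb1(counts, values, rng):
    """UCB1: try each action once, then add a bonus that shrinks as an action is visited."""
    for a, n in enumerate(counts):
        if n == 0:
            return a
    total = sum(counts)
    return max(range(len(values)),
               key=lambda a: values[a] + math.sqrt(2 * math.log(total) / counts[a]))


def run(select, win_probs, steps=2000, seed=0):
    """Play `steps` rounds with the given selection rule; return total reward and visit counts."""
    rng = random.Random(seed)
    counts = [0] * len(win_probs)
    values = [0.0] * len(win_probs)  # running mean reward of each action
    total_reward = 0.0
    for _ in range(steps):
        a = select(counts, values, rng)
        reward = 1.0 if rng.random() < win_probs[a] else 0.0
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]
        total_reward += reward
    return total_reward, counts


# Hypothetical win probabilities of three candidate moves (unknown to the agent).
win_probs = [0.3, 0.5, 0.7]
for name, select in [("epsilon-greedy", epsilon_greedy), ("UCB1", ucb1)]:
    total_reward, counts = run(select, win_probs)
    print(f"{name}: total reward {total_reward:.0f}, visits per move {counts}")
```

Because `run` only needs a function that maps the current statistics to an action, another search strategy can be tried by writing one more function with the same signature.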

If you would like to study it in more detail, I recommend the lectures by David Silver (Google DeepMind London; Royal Society University Research Fellow, University College London).

In the future, many devices will have sensors in them and be connected to the internet. Each device will periodically send information such as its location, temperature and the weather. As a result, a massive amount of time series data will be generated and collected automatically through the internet. Based on these data, we need to choose sequences of actions that maximize rewards. For example, with data from automobile engines, we would like to know when a minor repair is needed and when an overhaul is needed to keep the engines working for longer. With data from customers, we would like to know when to send sales notifications to maximize sales in the long run. Reinforcement learning might be used to support these kinds of business decisions.

I would like to develop my own AI Go player that is stronger than I am. It would be fun to play games against it! Would you like to try it?

Source

1. Deep Blue versus Garry Kasparov

https://en.wikipedia.org/wiki/Deep_Blue_versus_Garry_Kasparov

2. Denseisen (Japanese only)

http://entcog.c.ooco.jp/entcog/densei/past.html

3. 2nd game in the 3rd Denseisen

http://entcog.c.ooco.jp/entcog/densei/densei3/2nd_game.html

Note: Toshifumi Kuga’s opinions and analyses are personal views and are intended to be for informational purposes and general interest only and should not be construed as individual investment advice or solicitation to buy, sell or hold any security or to adopt any investment strategy. The information in this article is rendered as at publication date and may change without notice and it is not intended as a complete analysis of every material fact regarding any country, region, market or investment.

Data from third-party sources may have been used in the preparation of this material, and I, the author of this article, have not independently verified or validated such data. I accept no liability whatsoever for any loss arising from the use of this information, and reliance upon the comments, opinions and analyses in the material is at the sole discretion of the user.


toshistats


Could you win the game “Go” against computers? They are smarter now!