Let us consider “Brain as a Service” again now!


Two years ago, I wrote an article about the computer Go player “AlphaGo” and talked about “Brain as a Service” in the future, because AlphaGo is so strong and can improve itself through reinforcement learning with self-play. Now I am more confident that “Brain as a Service” will be available in the near future. Let us consider why I think so.

 

1. Self-play without human intervention

In October 2017, DeepMind released a new version of its computer Go player, “AlphaGo Zero“. The previous version of AlphaGo learned from human play at the early stage of training, but AlphaGo Zero can improve itself without human intervention or knowledge. Starting with nothing, it became stronger than the human Go champion. This is incredible! Of course, the real world is not the game of Go, so we must adapt self-play to our real-life problems. But fundamentally, there are many chances to improve our society by using self-play, as it can provide super-human solutions when correctly implemented. AlphaGo Zero proves this is true.

 

2. Reinforcement learning can be researched anywhere on Earth

Now I am researching OpenAI Gym, which is an environment/simulator for reinforcement learning (RL). It is provided by OpenAI, a nonprofit organization established by Elon Musk and Sam Altman. OpenAI provides not only theoretical research results but also the code to implement them in our own systems. It means that as long as we have access to the internet, we can start our own research on reinforcement learning based on OpenAI Gym. No capital is required, as the code of OpenAI Gym is provided for free: just download and use it, as it is open-source software. Applications of RL like AlphaGo can be developed anywhere in the world. If you want to try it, go to the OpenAI Gym website and set up OpenAI Gym by yourself. You can enjoy cool results of reinforcement learning!
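For example, after installing the package (for example with `pip install gym`), the basic interaction loop looks roughly like the sketch below. This is a minimal sketch using the classic Gym API of that time (newer releases changed the reset/step signatures), and CartPole-v1 is just one example environment bundled with Gym:

```python
import gym

env = gym.make("CartPole-v1")       # a simple control task bundled with Gym
observation = env.reset()           # start a new episode

for _ in range(1000):
    action = env.action_space.sample()                  # random policy, for illustration
    observation, reward, done, info = env.step(action)  # advance the simulator one step
    if done:                                            # the pole fell or time ran out
        observation = env.reset()

env.close()
```

Replacing the random policy with a learning agent is where the actual RL research begins.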

 

3. It will be easier to obtain data from the world

Google said “The real world as your playground: Build real-world games with Google Maps APIs” last week. It means that any game developer can create real-world games by using Google Maps. We can access countless 3D buildings, roads, landmarks, and parks all over the world as digital assets. This is amazing! But this should not be considered just a matter of games. It is one example of how we can obtain data from the world, because we can create real-world computer-vision simulators with this service. In addition to that, I would like to mention blockchain a little. Blockchain can be used to connect the world in a transparent manner. I imagine that much of the data inside companies and organizations will become accessible more easily through blockchain in the near future. Therefore we will be able to accelerate AI development with far more data than we have now, at a rapid pace. This is exciting!

 

 

“These things could be the trigger that changes the landscape of our businesses, societies, and lives, because computers may suddenly become sophisticated enough to work just like our brains. AlphaGo teaches us that it may happen even while only a few people expect it. Yes, this is why I think that the age of “Brain as a Service” will come in the near future. What do you think of that?”

This is what I said two years ago. Of course, it is impossible to predict exactly when “Brain as a Service” will be available. But I am sure we are moving in this direction step by step. Do you agree?

 

 

 

Note: Toshifumi Kuga’s opinions and analyses are personal views and are intended to be for informational purposes and general interest only and should not be construed as individual investment advice or solicitation to buy, sell or hold any security or to adopt any investment strategy.  The information in this article is rendered as at publication date and may change without notice and it is not intended as a complete analysis of every material fact regarding any country, region, market or investment.

Data from third-party sources may have been used in the preparation of this material, and I, the author of the article, have not independently verified or validated such data. I and TOSHI STATS SDN. BHD. accept no liability whatsoever for any loss arising from the use of this information, and reliance upon the comments, opinions and analyses in the material is at the sole discretion of the user.


“Monte Carlo tree search” is the key in AlphaGo Zero!

In October last year, Google DeepMind released “AlphaGo Zero“. It is stronger than all previous versions of AlphaGo, although this new version uses no human knowledge of Go for training. It performs self-play and gets stronger by itself. I was very surprised to hear the news, because in general we need a lot of data to train a model.

Today, I would like to consider why AlphaGo Zero works so well from the viewpoint of a Go player, as I have played Go for entertainment for many years. I am not a professional Go player, but I have expertise in both Go and deep learning. So this is a good opportunity for me to consider it now.

When I play Go, I decide my next move based on intuition in many cases, because I am confident that “it is right”. But when the position is more complex and I am not so sure what the best move is, I try out in my mind (not on the real board) many sequences that my opponent and I could play in turn, and choose the best move based on those trials. We call this “Yomi” in Japanese. Unfortunately, I sometimes perform “Yomi” incorrectly and then make a wrong move. Professional Go players perform “Yomi” much more accurately than I do. This is the key to being a strong Go player.

 

Then I wonder how AlphaGo Zero can perform “Yomi” effectively. I think this is the key to understanding AlphaGo Zero. Let me consider these points.

 

1. Monte Carlo tree search (MCTS) performs “Yomi” effectively

The next move can be decided by the policy/value function alone, but there might be another, better move, so we need to search for it. MCTS is used for this search in AlphaGo Zero. Based on the paper, MCTS can find a better move than the one originally chosen by the policy/value function. DeepMind says MCTS works as a “powerful policy improvement operator”, so an “improved MCTS-based policy” can be obtained. This is great, as it means that AlphaGo Zero can perform “Yomi” just like us.
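To make the idea concrete, here is a minimal sketch of the selection rule (often called PUCT) that MCTS in AlphaGo Zero uses to balance the network’s prior probability against the statistics gathered by search. The Node structure and the value of c_puct are my own illustrative assumptions, not DeepMind’s code:

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float                    # P(s,a): move probability from the policy/value network
    visit_count: int = 0            # N(s,a): how often search has tried this move
    total_value: float = 0.0        # W(s,a): accumulated value, so Q(s,a) = W / N
    children: dict = field(default_factory=dict)

def select_move(node: Node, c_puct: float = 1.0):
    """Return the child move maximizing Q(s,a) + U(s,a)."""
    total_visits = sum(c.visit_count for c in node.children.values())
    best_move, best_score = None, -math.inf
    for move, child in node.children.items():
        q = child.total_value / child.visit_count if child.visit_count else 0.0
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move
```

Moves that look promising (high Q) or that the network likes but search has rarely tried (high U) are explored first, which is very close to how a human narrows down candidate moves during “Yomi”.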

 

2. A whole game can be played through self-play without human knowledge

I wondered how a whole game of Go can be played without human knowledge. The paper explains it as follows: “Self-play with search—using the improved MCTS-based policy to select each move, then using the game winner z as a sample of the value—may be viewed as a powerful policy evaluation operator.” So just by playing games against itself, the winner of each game is obtained as a training sample. These results are used for the next learning process. Therefore the “Yomi” performed by AlphaGo Zero becomes more and more accurate.
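Put as code, the sample-collection loop might look like the sketch below. The `game` and `run_mcts` objects are hypothetical stand-ins for the full system; the point is only that every stored position is labeled with the eventual winner z:

```python
import random

def self_play_episode(game, run_mcts):
    """Collect (state, search_policy, winner) triples from one self-play game.
    `game` and `run_mcts` are hypothetical interfaces, not DeepMind's code."""
    history, state = [], game.initial_state()
    while not game.is_over(state):
        pi = run_mcts(state)                    # dict: move -> visit-count probability
        history.append((state, pi))
        move = random.choices(list(pi.keys()), weights=list(pi.values()))[0]
        state = game.play(state, move)
    z = game.winner(state)                      # +1 or -1, read off the finished game
    # every position in the game is labeled with the same final outcome z
    return [(s, p, z) for (s, p) in history]
```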

 

 

3. This training algorithm is very efficient for learning from scratch

Computers are very good at performing simulations many times automatically. So without human knowledge given in advance, AlphaGo Zero gets stronger and stronger as it performs self-play over and over. Based on the paper, starting from random play, AlphaGo Zero outperformed the previous version of AlphaGo, the one that beat Lee Sedol in March 2016, after just 72 hours of training. This is incredible, because only 72 hours were required to develop, from scratch and without human knowledge, a model that beats professional players.

 

 

Overall, AlphaGo Zero is incredible. If the AlphaGo Zero training algorithm can be applied to our businesses, an AI professional businessman might be created in 72 hours without human knowledge. That would be incredibly sophisticated!

Hope you enjoy the story of how AlphaGo Zero works. This time I gave an overview of the mechanism of AlphaGo Zero. If you are interested in more details, I recommend watching the video by DeepMind. In my next article, I would like to go a little deeper into MCTS and the training of the models. It should be exciting! See you again soon!

 

“Mastering the game of Go without human knowledge”
David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel & Demis Hassabis
Published in Nature, Vol. 550, 19 October 2017

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.

This could be a new way to train deep reinforcement learning in 2018!


As the end of this year is coming soon, I would like to consider what comes next in artificial intelligence next year. So this week I reviewed several research papers and found something interesting to me: the “Genetic Algorithm (GA)”. The paper (1) explains that GA can be applied to deep reinforcement learning to optimize the parameters of deep neural networks. This must be exciting for many researchers and programmers of deep learning.

 

According to Wikipedia, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection. As I ran a project using GA in Tokyo more than 10 years ago, I would like to revisit GA in the context of deep learning in 2018. To prepare for that, I would like to explain the major components of GA below:

  • Gene: These are the instruments that we modify in order to optimize. They have a function similar to human genes: to adapt to the environment around them, they can be modified through the GA operations explained below.
  • Generation: This is the collection of individual genes at a certain period. As time passes, new generations are created recurrently.
  • Selection: Based on fitness values, better genes are selected to create the next generation. There are many ways to do this; the best genes may be retained unchanged.
  • Crossover: Parts of the components of genes are exchanged between different genes. This accelerates diversification among genes.
  • Mutation: Some components of a gene are changed into other ones. Mutation and crossover are derived from the processes of evolution in nature.

 

The image below explains GA with ease (2). Based on fitness values, genes with higher scores are selected. When reproduction is performed, some of the selected genes remain the same as before (the elite strategy). Crossover and mutation are applied to the rest of the genes to create the next generation. You can see that the fitness values in generation t+1 are higher than in generation t. This is the basic framework of GA; there are many variations of GA in terms of how next generations are created. A minimal code sketch follows the figure.

(Figure: the basic cycle of a genetic algorithm, from the IBA laboratory (2))
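As a concrete illustration, the sketch below implements this cycle end to end: elite retention, fitness-proportional selection, one-point crossover, and bit-flip mutation. The fitness function is a deliberately trivial assumption (count the 1-bits in a gene), chosen only to show the mechanics:

```python
import random

GENE_LENGTH, POP_SIZE, GENERATIONS = 20, 30, 50
MUTATION_RATE, ELITE_COUNT = 0.01, 2

def fitness(gene):
    return sum(gene)                              # toy objective: count the 1-bits

def select_parents(population):
    # fitness-proportional (roulette-wheel) selection; +1 avoids all-zero weights
    weights = [fitness(g) + 1 for g in population]
    return random.choices(population, weights=weights, k=2)

def crossover(a, b):
    point = random.randint(1, GENE_LENGTH - 1)    # one-point crossover
    return a[:point] + b[point:]

def mutate(gene):
    return [1 - bit if random.random() < MUTATION_RATE else bit for bit in gene]

population = [[random.randint(0, 1) for _ in range(GENE_LENGTH)]
              for _ in range(POP_SIZE)]

for t in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    next_gen = population[:ELITE_COUNT]           # elite strategy: keep the best as-is
    while len(next_gen) < POP_SIZE:
        parent_a, parent_b = select_parents(population)
        next_gen.append(mutate(crossover(parent_a, parent_b)))
    population = next_gen

print("best fitness:", max(fitness(g) for g in population))
```

In the deep reinforcement learning setting of the paper (1), the “gene” would instead be the flattened weight vector of a neural network, and the fitness would be the score the network achieves in its environment.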

I put simple Python code for GA-based portfolio management on GitHub. If you are interested in more details of GA, please look at it here.

 

Although GA has a long history, its application to deep learning is relatively new. At TOSHI STATS, which is an AI start-up, I continue to research how GA can be applied to deep learning so that optimization can be performed effectively. I hope I can update you soon in 2018. Happy new year to everyone!

 

 

 

1. “Deep Neuroevolution: Genetic Algorithms are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning”, Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O. Stanley, Jeff Clune, Uber AI Labs, 18 December 2017

2. IBA laboratory, a research laboratory for Genetic and Evolutionary Computation (GEC) at the Graduate School of Engineering, The University of Tokyo, Japan.

 

 

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.

 

Let us overview the variations of deep learning now!


This weekend, I am researching recurrent neural networks (RNN) as I want to develop my own small chatbot. I also ran convnet programs as I wanted to confirm how accurate they are. So I think it is good timing to overview the variations of deep learning, because this makes it easier to learn each network in detail.

 

1. Fully connected network

This is the basis of deep learning. When you hear the words “deep learning”, they mean a “fully connected network” in most cases. Let us see the program in my article from last week again: you can see “fully_connected” in it. This network is similar to the network in our brain. A small code sketch follows the figure below.

(Figure: Deep Learning)
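To show what “fully connected” means in code, here is a minimal numpy sketch of a single layer: every input unit is connected to every output unit through the weight matrix W. The ReLU activation and the layer sizes are my own illustrative choices:

```python
import numpy as np

def fully_connected(x, W, b):
    # every output unit sees every input unit through W; ReLU is an assumption
    return np.maximum(0.0, W @ x + b)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)            # 4 input features
W = rng.standard_normal((3, 4))       # 3 output units, each connected to all 4 inputs
b = np.zeros(3)
print(fully_connected(x, W, b))
```

A deep network simply stacks several such layers, training W and b in each of them.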

 

2. Convolutional neural network (Convnet)

This is mainly used for image recognition and computer vision. There are many variations of convnets to achieve higher accuracy. Do you remember my recommendation of TED presentations from before? Let us see it again when you want to know more about convnets.
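For intuition, here is a minimal numpy sketch of the 2D convolution at the core of a convnet: one small filter slides over the whole image, so nearby pixels are combined while the number of weights stays small. The toy image and filter are my own assumptions:

```python
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # the same small filter is applied at every location of the image
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)         # a toy 4x4 "image"
edge_filter = np.array([[1.0, -1.0]])         # responds to horizontal change
print(conv2d(image, edge_filter))
```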

 

3. Recurrent neural network (RNN)

The biggest advantage of RNN is that there is no need to use a fixed-size input (a convnet needs one). Therefore it is frequently used in natural language processing, as our sentences are sometimes very short and sometimes very long. It means that RNN can handle sequences of input data effectively. In order to overcome the difficulties that arise when parameters are learned, many kinds of RNN have been developed and are used now. A small code sketch follows the figure below.

(Figure: RNN)
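The core recurrence is compact enough to sketch in a few lines of numpy: the same weights are reused at every time step, which is exactly why the input sequence can have any length. The sizes and random weights below are illustrative assumptions:

```python
import numpy as np

def rnn_last_state(inputs, W_x, W_h, h0):
    h = h0
    for x_t in inputs:                     # works for any sequence length
        h = np.tanh(W_x @ x_t + W_h @ h)   # the hidden state carries the past along
    return h

rng = np.random.default_rng(0)
W_x = rng.standard_normal((5, 3))          # input-to-hidden weights
W_h = rng.standard_normal((5, 5))          # hidden-to-hidden weights
sequence = [rng.standard_normal(3) for _ in range(7)]   # 7 steps of 3 features each
print(rnn_last_state(sequence, W_x, W_h, np.zeros(5)))
```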

 

4. Reinforcement learning (RL)

In RL, “the output is an action or sequence of actions and the only supervisory signal is an occasional scalar reward.”

  • The goal in selecting each action is to maximize the expected sum of the future rewards. We usually use a discount factor for delayed rewards so that we don’t have to look too far into the future.

This is a good explanation, taken from p. 46 of lecture_slides-lec1 of “Neural Networks for Machine Learning” by Geoffrey Hinton on Coursera.
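The discount factor mentioned in the quoted bullet is easy to make concrete. A tiny Python sketch of the discounted return G = r1 + γ·r2 + γ²·r3 + … follows; the reward list and γ are arbitrary examples:

```python
def discounted_return(rewards, gamma=0.99):
    g = 0.0
    for r in reversed(rewards):    # work backwards: G_t = r_t + gamma * G_{t+1}
        g = r + gamma * g
    return g

# a reward of 5 that arrives three steps from now counts a little less than 5
print(discounted_return([1.0, 0.0, 0.0, 5.0]))
```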

 

 

Many researchers all over the world have been developing new models. Therefore new kinds of networks may be added in the near future. Until then, these models can be considered building blocks for implementing deep learning algorithms to solve our problems. Let us use them effectively!

 

Notice: TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.

Will self-driving cars come to us in 2020?

Since last year, the development of self-driving cars has accelerated rapidly. When I wrote about it last year, some people may not have been convinced that self-driving cars would come true. But now no one can doubt self-driving cars anymore. The question is when they will arrive in front of us. I would like to consider several key points in developing the technology of self-driving cars.

 

1. Data from experiments

Data is the key to developing self-driving cars effectively, because self-driving cars need artificial intelligence in them to drive the car by themselves without human intervention. As you know, artificial intelligence looks like our brain. When we are born, our brain is almost empty; as we grow, we learn many things through our experiences. This is the same for artificial intelligence: it needs massive amounts of data to learn. Recently, Google and Fiat Chrysler Automobiles NV announced that they will cooperate to enhance the development of self-driving cars. According to the article on Bloomberg, “The carmaker plans to develop about 100 self-driving prototypes based on the Chrysler Pacifica hybrid-powered minivan that will be used by Google to test its self-driving technology.” (1) The more cars are used in the experiments, the more data they can obtain. Therefore, it enables Google to develop self-driving cars more rapidly.

 

2. Algorithm of artificial intelligence

With data from experiments, artificial intelligence will become more sophisticated. The algorithms of artificial intelligence, which are called “deep learning”, should become more effective from now on. Because driving a car generates sequences of data and needs sequential decision-making processes, such as stop, go, turn right, accelerate, and so on, we need algorithms which can handle these situations. In my view, the combination of deep learning and reinforcement learning can be useful for this. This kind of technology is developed in research centers such as Google DeepMind, which is famous for its artificial intelligence Go player. It says this technology can be used for robotics, medical research, and economics. So why not for self-driving cars?

 

3. Interactions with human drivers

It seems to be very difficult to decide who is responsible for driving the car. Initially, self-driving cars might appear with a steering wheel and brakes. It means that humans can intervene in the operation of self-driving cars. When accidents happen, who is responsible? Humans or machines? When self-driving cars without a steering wheel and brakes are available, machines are responsible, as humans cannot control the cars anymore; the machines are 100% responsible for accidents. It is very difficult to decide which is better, self-driving cars with or without a steering wheel and brakes. It depends on the development of technologies and regulations.

 

The impact on society will be huge when self-driving cars are introduced. Buses, taxis, and trucks could be replaced with self-driving cars. Not only drivers but also road maintenance companies, car insurance companies, roadside shops, traffic light makers, railway companies, highway operating companies, car maintenance companies, and car parking providers will also be heavily impacted. Governments should consider how we can implement self-driving cars in our societies effectively. I do not think we have spare time to consider it. Let us start today!

 

(1) http://www.bloomberg.com/news/articles/2016-05-03/fiat-google-said-to-plan-partnership-on-self-driving-minivans

 

Note: Toshifumi Kuga’s opinions and analyses are personal views and are intended to be for informational purposes and general interest only and should not be construed as individual investment advice or solicitation to buy, sell or hold any security or to adopt any investment strategy.  The information in this article is rendered as at publication date and may change without notice and it is not intended as a complete analysis of every material fact regarding any country, region, market or investment.

Data from third-party sources may have been used in the preparation of this material, and I, the author of the article, have not independently verified or validated such data. I accept no liability whatsoever for any loss arising from the use of this information, and reliance upon the comments, opinions and analyses in the material is at the sole discretion of the user.

Could you win the game “Go” against computers? They are smarter now!

There are many board games all over the world. I think you enjoy chess, Othello, backgammon, Go, and so on. Go is a game where two players place white and black stones in turn, and the winner is decided by comparing the areas owned by each player. You might have seen Go before, as in the image above. I learned how to play Go when I was in elementary school and have enjoyed it ever since.

In most board games, even top professional human players have difficulty beating artificial intelligence (AI) players. One of the most famous stories about competition between humans and computers is “Deep Blue vs. Garry Kasparov”, a six-game chess match between chess champion Garry Kasparov and an IBM supercomputer called Deep Blue. In 1997 Deep Blue defeated Garry Kasparov. It was the first win by a computer against a world chess champion under tournament regulations (1).

However, Go is still dominated by human players. Top professional Go players are still stronger than AI players, although the AI players are getting better as their algorithms improve. Crazy Stone is one of the strongest Go playing engines, developed by Rémi Coulom. On 21 March 2014, at the second annual Densei-sen competition, Crazy Stone defeated Norimoto Yoda, a Japanese professional 9-dan, in a 19×19 game with four handicap stones by a margin of 2.5 points (2). On 17 March 2015, Chikun Cho (the 25th Hon’inbo) defeated Crazy Stone in a 19×19 game with three handicap stones by a margin of 0.5 points (185 moves) (3). The human player won against the AI player in that game, but the handicap shrank from four stones in 2014 to three stones in 2015. I am not sure human players will continue to win the competition in 2016.

For AI players like Crazy Stone, the secret is a technology called “reinforcement learning”, which is used to select actions that maximize future reward. So it can be used to support decision making in areas such as investment management, helicopter control, and advertising optimization. Let me look at the details.

 

1. Reinforcement learning can handle delayed rewards

Unlike quiz shows, in board games it takes time to realize whether each action is good or bad. For example, a board of Go has a grid of 19 lines by 19 lines, so at the beginning of the game it is difficult to know whether each move is good or bad, as we have a long way to go to the end of the game. In other words, the reward for each action is not provided immediately after it is taken. Reinforcement learning has a mechanism to handle such cases.

2. Reinforcement learning can calculate optimal sequential actions

In reinforcement learning, agents play a major role. Agents take actions based on their observations and strategy, and actions can form a “path”, not just a one-off action. This is similar to our decision-making process; therefore, reinforcement learning can support human decision making. Note that, unlike one-off predictions, each action in reinforcement learning can change the environment, which is why sequences of actions must be planned as a whole.

3. Reinforcement learning is flexible enough to use many search methods

This is practically important. Like Go, some problems have a huge space to search for optimal actions. Therefore, we need to try several methods to do so. Reinforcement learning is flexible enough to try these search methods.

If you would like to study it in more detail, I recommend the lectures by David Silver, Google DeepMind London, Royal Society University Research Fellow, University College London.

 

In the future, a lot of devices will have sensors in them and be connected to the internet. Each device will periodically send information such as location, temperature, and weather. Therefore massive amounts of time-series data will be generated and collected automatically through the internet. Based on these data, we need sequential actions to maximize rewards. If we have data from automobile engines, we should know when a minor repair is needed and when an overhaul is needed to make the engines work for a longer period. If we have data from customers, we should know when sales notifications should be sent to maximize the amount of sales in the long run. Reinforcement learning might be used to support these kinds of business decisions.

I would like to develop my own AI Go player that is better than I am. It must be fun to play games against it! Would you like to try?

Sources
1. Deep Blue versus Garry Kasparov
2. Densei-sen (Japanese only)

Note: Toshifumi Kuga’s opinions and analyses are personal views and are intended to be for informational purposes and general interest only and should not be construed as individual investment advice or solicitation to buy, sell or hold any security or to adopt any investment strategy.  The information in this article is rendered as at publication date and may change without notice and it is not intended as a complete analysis of every material fact regarding any country, region, market or investment.

Data from third-party sources may have been used in the preparation of this material, and I, the author of the article, have not independently verified or validated such data. I accept no liability whatsoever for any loss arising from the use of this information, and reliance upon the comments, opinions and analyses in the material is at the sole discretion of the user.

Can you win Atari games against computers? It no longer seems possible


I think it is better to watch the YouTube interview here first. Onstage at TED2014, Charlie Rose interviews Google CEO Larry Page about his far-off vision for the company. Page talks about the company’s recent acquisition of DeepMind, an AI that is learning some surprising things. Starting at 2 minutes 30 seconds into the interview, he talks about DeepMind for two minutes.

 

According to the white paper from DeepMind, which was bought by Google for 650m USD in January 2014, in three Atari 2600 games, Breakout, Enduro, and Pong, humans cannot win against the computer once it has learned, over a couple of hours, how each game works. The same single program is used for every game, and no input about how to win the specific game is given in advance. It means that this one program has to learn how to obtain a high score from scratch, by itself. As a result, across the six games tested, the computer recorded higher scores than human experts in three of them. It is amazing.

Reinforcement learning, a branch of machine learning, is used in this challenge. It is different from the machine learning used in image recognition and natural language processing. In reinforcement learning, reward functions are used to decide the best policy among many choices in the long run. We can say, in short, “how much should we give up today’s lunch in order to maximize the total sum of lunches tomorrow and later?”. We always face these kinds of problems, but they are difficult for computers to answer. However, DeepMind proved that reinforcement learning works well against these kinds of problems when they presented the demo at the end of 2013.
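To make the “give up today’s lunch” trade-off concrete, here is a minimal sketch of the tabular Q-learning update that underlies this kind of agent. The real DeepMind system approximates Q with a deep neural network trained on screen pixels; the table, constants, and helper names here are my own illustrative assumptions:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # learning rate, discount, exploration rate
Q = defaultdict(float)                    # Q[(state, action)] -> expected future score

def choose_action(state, actions):
    if random.random() < EPSILON:                        # sometimes explore
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])     # otherwise act greedily

def update(state, action, reward, next_state, actions):
    best_next = max(Q[(next_state, a)] for a in actions)
    # nudge Q toward the immediate reward plus the discounted best next value
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

The discount factor GAMMA is exactly the “lunch” trade-off: it controls how much a reward tomorrow is worth compared with the same reward today.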

 

If this kind of decision-making becomes available from computers, it will have a huge impact on intellectual jobs, such as lawyers, fund managers, analysts, and corporate officers, because they make decisions over a long-term horizon rather than for tomorrow’s outcomes. They have a lot of past experiences, some of them successes and others failures, and they can use these experiences when they make plans for the future. If computers can use the same logic as humans and make decisions by themselves, it could be a revolution for intellectual jobs. For example, at board meetings, computers may answer questions about management strategy from board members based on a massive amount of past examples and tell them how to maximize future cash flow by using reinforcement learning. Future cash flow is the most important thing to board members, because shareholders require them to maximize it.

 

Currently a lot of discussion about our future jobs is going on, because it is probable that many jobs will be replaced by computers in the near future. If reinforcement learning keeps improving, the CEOs of companies might be replaced by computers, and shareholders might welcome them in the future?!