How can we create good movies based on big data?

27/08/2015 / toshistats

Last Sunday, my son came to Kuala Lumpur, as he is on summer vacation now. So I took him to the movie theater. I chose “Mission: Impossible – Rogue Nation” to entertain him. In the movie, Tom Cruise is very active. I cannot believe he is older than I am! My son and I enjoyed the movie very much, as his action scenes are amazing.

That made me wonder how we can create good movies. Every year many movies are made, but few of them stay in people’s minds for long. Let me consider the question here for a while.

1. How can we define what good movies are?

There are many ways to evaluate movies. Critics can assess their quality, for example. But I would like to keep it simple. The number of customers who watch a movie, or its sales revenue such as the “box office”, can be used as a measure of a “good movie” because it is easy to collect and measure. So, according to our definition of “good movies”, the more people watch a movie, the better it is.

2. Let us consider what is related to the number of customers or sales revenue.

Many things relate to them. As “Mission: Impossible” shows, actors and actresses are very important. The director is also important. In addition, where was the movie made? Is it an action movie, a love story or a thriller? And so on. All of these may be closely related to the number of customers or the sales revenue of a movie. So past data about these factors should be collected.

3. How can we obtain predictions of the number of customers or sales revenue of an unseen movie in advance?

Do you remember last week’s letter about “Target” and “Features”? The “Target” is what we want to predict, and “Features” are things that are related to the “Target”. Predictions of the “Target” can be obtained by inputting “Features” into a “Statistical model”. I would like to call this unit a “module”. I summarize it as follows.

According to our definition of “good movies”, the Target is the number of customers or the sales revenue of the movie. Features are the actors and actresses, the category of the story, the locations where the movie was shot, and so on. These features are input to statistical models to obtain predictions of the target for unseen movies. Based on this analysis, we could predict the sales of movies before they are shown in theaters, which means that good movies could be created based on this prediction. When the prediction is accurate, film production companies might increase sales revenue because they can create good movies based on the predicted Targets. In reality, however, we need to prepare a lot of data to predict accurately. In addition, customer preferences might change suddenly, and it is very difficult to update statistical models in advance to follow such changes. Therefore, there is a risk that statistical models cannot follow changes in circumstances in a timely manner.
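As a rough illustration of the “module” (Features go into a statistical model, and a prediction of the Target comes out), here is a minimal Python sketch. The movie records, the feature names and the very simple averaging model are all assumptions made up for this example; a real analysis would need far more data and a proper statistical model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical past movies: three features plus the target (box office, in millions).
# All names and numbers are invented for illustration.
PAST_MOVIES = [
    {"genre": "action", "lead": "actor_a", "location": "london", "revenue": 300.0},
    {"genre": "action", "lead": "actor_b", "location": "tokyo", "revenue": 200.0},
    {"genre": "romance", "lead": "actor_a", "location": "paris", "revenue": 100.0},
    {"genre": "thriller", "lead": "actor_c", "location": "london", "revenue": 80.0},
]
FEATURES = ("genre", "lead", "location")
TARGET = "revenue"


def fit(movies):
    """Learn the average target for every (feature, value) pair seen in the past."""
    grouped = defaultdict(list)
    for movie in movies:
        for name in FEATURES:
            grouped[(name, movie[name])].append(movie[TARGET])
    return {
        "overall": mean(m[TARGET] for m in movies),
        "by_value": {key: mean(values) for key, values in grouped.items()},
    }


def predict(model, movie):
    """Average the per-feature estimates; unseen values fall back to the overall mean."""
    estimates = [
        model["by_value"].get((name, movie[name]), model["overall"])
        for name in FEATURES
    ]
    return mean(estimates)


model = fit(PAST_MOVIES)
new_movie = {"genre": "action", "lead": "actor_a", "location": "vienna"}
prediction = predict(model, new_movie)
print(f"Predicted box office: {prediction:.1f}")
```

The averaging model is chosen only because it is easy to follow by hand. It could be swapped for a regression or any other statistical model without changing the shape of the module.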

It should be noted that more features will become available as computers learn to understand videos and movies. This technology is now in progress (Source 1). It will enable computers to turn videos into text. For example, when there is a scene where a swan is on a lake, the computer understands the video and automatically writes sentences that describe the scene. This means that a whole movie could be transformed into text without human intervention, so movies could be analyzed based on their stories, and more features could be identified from this analysis. When new kinds of data become available to us, they may enable us to obtain more features and improve the accuracy of predictions. Would you like to make your own movie in the future?

Source

1. A picture is worth a thousand (coherent) words: building a natural description of images, 17 Nov 2014, Google Research

http://googleresearch.blogspot.co.uk/2014/11/a-picture-is-worth-thousand-coherent.html

Note: Toshifumi Kuga’s opinions and analyses are personal views and are intended for informational purposes and general interest only, and should not be construed as individual investment advice or as a solicitation to buy, sell or hold any security or to adopt any investment strategy. The information in this article is rendered as of the publication date and may change without notice, and it is not intended as a complete analysis of every material fact regarding any country, region, market or investment.

Data from third-party sources may have been used in the preparation of this material and I, the author of the article, have not independently verified or validated such data. I accept no liability whatsoever for any loss arising from the use of this information, and reliance upon the comments, opinions and analyses in the material is at the sole discretion of the user.

“Prediction” is very important in analyzing big data of the business

20/08/2015 / toshistats

It is a good time to reconsider “Big data and digital economy”, because the LinkedIn group of this name now has a four-month history and more than 100 participants. I appreciate the cooperation of all of you.

At the beginning of the 2000s, I worked in the risk management department of a Japanese consumer finance company. The company had a credit risk model that could predict who was likely to default. I studied it in detail and understood why it worked so accurately. I found that if I collected a lot of data about customers, I could obtain accurate predictions of default for each customer.

Now, in 2015, I have researched many algorithms and statistical models, including state-of-the-art “deep learning”. While such models have many uses and objectives, in my view the most important one for business people is “prediction”, just as in my experience at the consumer finance company, because they must make good business decisions to compete in markets.

If you are in the health care industry, you may be interested in predictions about who is likely to be cured. If you are in sales, you may be interested in predictions about who is likely to come to the shop and buy the products. If you are in marketing, you may be interested in who is likely to click an advertisement on the web. Whatever you do, predictions are very important for your business because they enable you to take the right actions. Let me explain the key points about predictions.

Target

What are you interested in predicting? The revenue of your business? The number of customers? The satisfaction rate based on client feedback? The price of wine in the near future? You can name anything you want. We call it the “Target”. So first, the “Target” should be defined so that you can make the right business decisions.

Features

Secondly, let us find things related to your target. For example, if you are a salesperson interested in who is likely to buy the products, features are “attributes of each customer such as age, sex and occupation”, “behavior of each customer, such as how many times he/she comes to the shop per month and when he/she last bought the products”, “what he/she clicked in the web shop”, and so on. Based on the prediction, you can send coupons or tickets to customers who are “highly likely to buy” in order to increase your sales. If you are interested in the price of wine, features may be temperature, amount of rain, locations of farms, and so on. If you can predict the price of wine, you might make good investments in wine. These are just simple examples. In reality, the number of features may be 100, 1000 or more, depending on all the data you have. Usually, the more data you have, the more accurate your predictions are. This is why data is very important for obtaining predictions.

Evaluation of predictions

Finally, by inputting features into statistical models, predictions of the target can be obtained. You can therefore predict who is likely to buy the products when you plan marketing strategies, which makes those strategies more effective. Unfortunately, customer preferences may change in the long run. When situations and environments such as customer preferences change, predictions may no longer be accurate. So it is important to evaluate predictions and update statistical models periodically. No model can work accurately forever.
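One simple way to picture this periodic evaluation is to compare predictions with what actually happened, month by month, and to flag the first month in which the error becomes too large. The sketch below is only an assumption about how that could look: the monthly figures, the tolerance and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical monthly records: (month, actual sales, predicted sales).
MONTHLY_RESULTS = [
    ("2015-05", [100, 120, 90], [98, 125, 92]),
    ("2015-06", [110, 95, 130], [105, 100, 128]),
    ("2015-07", [150, 60, 140], [110, 100, 120]),  # preferences have shifted
]
TOLERANCE = 10.0  # assumed acceptable mean absolute error


def mean_absolute_error(actuals, predictions):
    """Average size of the gap between what happened and what was predicted."""
    return mean(abs(a - p) for a, p in zip(actuals, predictions))


def first_month_needing_update(results, tolerance):
    """Return the first month whose error exceeds the tolerance, or None."""
    for month, actuals, predictions in results:
        if mean_absolute_error(actuals, predictions) > tolerance:
            return month
    return None


errors = {m: mean_absolute_error(a, p) for m, a, p in MONTHLY_RESULTS}
stale_month = first_month_needing_update(MONTHLY_RESULTS, TOLERANCE)
print(errors)
print("Update the model from:", stale_month)
```

In this made-up data the error stays small in May and June and jumps in July, so July is the month in which the model should be reviewed and updated.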

Once you can obtain predictions, you can build them into daily processes rather than treating them as a one-off analysis. This means that data-driven decisions are made on a daily basis, which is one of the biggest aspects of the “digital economy”. From retail shops to health care and the financial industry, predictions are already used in many fields. The methods of prediction are sometimes regarded as a “black box”, but I do not think it is good to use predictions without understanding the methods behind them. I would like to explain them in my weekly letters in the future. I hope you enjoy them!

Could you win the game “Go” against computers? They are smarter now!

13/08/2015 / toshistats

There are many board games all over the world. I think you enjoy Chess, Othello, Backgammon, Go, and so on. Go is a game in which two players take turns placing black and white stones, and the winner is decided by comparing the areas owned by each player. You might have seen Go before, as in the image above. I learned how to play Go when I was in elementary school and have enjoyed it ever since.

In most board games, even top professional human players find it difficult to beat artificial intelligence (AI) players. One of the most famous stories of competition between humans and computers is “Deep Blue vs. Garry Kasparov”, the six-game chess matches between chess champion Garry Kasparov and an IBM supercomputer called Deep Blue. In 1997 Deep Blue defeated Garry Kasparov. It was the first win by a computer against a world chess champion under tournament regulations (Source 1).

However, Go is still dominated by human players. Top professional Go players are still stronger than AI players, although the AI players are getting better as their algorithms improve. Crazy Stone, developed by Rémi Coulom, is one of the strongest Go-playing engines. On 21 March 2014, at the second annual Densei-sen competition, Crazy Stone defeated Norimoto Yoda, a Japanese professional 9-dan, in a 19×19 game with four handicap stones by a margin of 2.5 points (Source 2). On 17 March 2015, Chikun Cho (the 25th Hon’inbo) defeated Crazy Stone in a 19×19 game with three handicap stones by a margin of 0.5 points by resignation (185 moves) (Source 3). The human player won that game, but the handicap shrank from four stones in 2014 to three stones in 2015. I am not sure human players will continue to win the competition in 2016.

For AI players like Crazy Stone, the secret is a technology called “Reinforcement learning”, which is used to select actions that maximize future reward. It can therefore be used to support decision making in areas such as investment management, helicopter control and advertising optimization. Let me look at the details.

1. Reinforcement learning can handle delayed rewards

Unlike in quiz shows, in board games it takes time to find out whether each action was good or bad. For example, a Go board has a grid of 19 lines by 19 lines. At the beginning of the game, it is difficult to know whether each action is good or bad because there is a long way to go to the end of the game. In other words, the reward for an action is not provided immediately after the action is taken. Reinforcement learning has a mechanism to handle such cases.
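One common mechanism for delayed rewards is the discounted return: the reward received at the end of the game is passed back to earlier moves, shrinking a little at each step. The sketch below assumes a tiny five-move game with a single reward for winning and a discount factor of 0.9; both the game and the numbers are invented for illustration and are not taken from any real Go program.

```python
def discounted_returns(rewards, gamma):
    """Work backwards from the end so each move gets credit for later rewards."""
    returns = []
    future = 0.0
    for reward in reversed(rewards):
        future = reward + gamma * future
        returns.append(future)
    returns.reverse()
    return returns


# Hypothetical game: no reward for the first four moves, then 1.0 for winning.
rewards = [0.0, 0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(rewards, gamma=0.9)
print([round(g, 4) for g in returns])
```

With these numbers the first move receives a value of about 0.66 and the last move receives 1.0, so even early moves get some credit for the final win.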

2. Reinforcement learning can calculate optimal sequential actions

In Reinforcement learning, agents play a major role. Agents take actions based on their observations and strategy. Actions can form a “path” rather than just a one-off action, which is similar to our own decision-making process. Therefore, Reinforcement learning can support human decision making. Note that actions usually have an impact on the environment: each action changes the situation in which the next action is chosen.
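To make the idea of a “path” of actions concrete, here is a minimal sketch of tabular Q-learning, one well-known reinforcement learning method, on a toy problem. Everything in it is an assumption for illustration: the agent walks along a line of five positions, only reaching the last position gives a reward, and the learning rate, discount factor and number of episodes are arbitrary choices. It is not how Crazy Stone or any real Go engine works.

```python
import random

N_STATES = 5                  # positions 0..4 on a line; position 4 is the goal
ACTIONS = ("left", "right")
ALPHA, GAMMA = 0.5, 0.9       # assumed learning rate and discount factor


def step(state, action):
    """Toy environment: return the agent's next position and the reward."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, seed=0):
    """Learn action values by trial and error over many short games."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)  # explore at random; Q-learning is off-policy
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


q_table = train()
# The learned "path": the best action in each non-goal position.
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the best action in every position should be “right”, which is the shortest path to the goal, even though only the final step is ever rewarded.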

3. Reinforcement learning is flexible enough to use many methods of searching

This is important in practice. As in Go, some problems have a huge space to search for optimal actions, so we need to try several search methods. Reinforcement learning is flexible enough to accommodate them.

If you would like to study it in more detail, I recommend the lectures by David Silver of Google DeepMind, London, who is a Royal Society University Research Fellow at University College London.

In the future, a lot of devices will have sensors in them and be connected to the internet. Each device will periodically send information such as location, temperature and weather. A massive amount of time series data will therefore be generated and collected automatically through the internet. Based on this data, we need sequential actions to maximize rewards. If we have data from automobile engines, we should know when a minor repair is needed and when an overhaul is needed to make the engines work for a longer period. If we have data from customers, we should know when notifications of sales should be sent to maximize the amount of sales in the long run. Reinforcement learning might be used to support this kind of business decision.

I would like to develop my own AI Go player that plays better than I do. It must be fun to play games against it! Would you like to try it?

Source

1. Deep Blue versus Garry Kasparov

https://en.wikipedia.org/wiki/Deep_Blue_versus_Garry_Kasparov

2. Denseisen (Japanese only)

http://entcog.c.ooco.jp/entcog/densei/past.html

3. 2nd game in the 3rd Denseisen http://entcog.c.ooco.jp/entcog/densei/densei3/2nd_game.html

Do it yourself for programming of image recognition. It works!

06/08/2015 / toshistats

Recently, Facebook, Pinterest and Instagram have become very popular. Users generate and send a lot of pictures and images, with a wide variety from human faces to landscapes. To enhance these services, image recognition technology has been developed at an astonishing rate. With this technology, computers can understand what the objects in images are. Today, I would like to re-create a simple image recognition program just by following tutorials on the web.

Image recognition can be done with state-of-the-art “deep learning”, one of the latest developments in computer programming. It sounds so complicated that business people may not want to try it by themselves. However, because programming languages for deep learning are provided as open source and good tutorials are available on the web, business people can program simple image recognition by themselves even if they have no expertise in computer science. Let me tell you about my experience.

1. Choose programming languages

There are several programming languages for deep learning. I chose “Torch”, which is provided by Facebook Artificial Intelligence Research, as it became open source at the beginning of this year. I think it is easy for beginners to learn.

2. Find good tutorials for the theory

To understand the theory behind image recognition, I looked for tutorials, and the best I found are the lectures provided by the Computer Science Department of the University of Oxford (Source 1). They are a good reference for understanding what deep learning is and how it is applied. Even though the theory is not always required for programming, I recommend watching the lectures before programming in order to grasp the broad picture of image recognition.

3. Let us program image recognition and see what the computer says

The program itself is provided by the tutorial (Source 2). The tutorial uses an image dataset with the classes ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’ and ‘truck’, so the computer should classify each image into one of these 10 classes. I just copied and pasted the programs provided in the tutorial, which took less than 10 minutes. I ran the program and obtained the results. Then I chose three of the results to see what the computer says. The names of the objects above the images are the correct answers. The computer provides its answer as the probability of each class, so the sum of the 10 numbers below is close to “1”.

In this result, the correct answer is “frog”. In the computer’s answer, frog has the highest probability, 0.4749…. So the computer made a good guess!

In this result, the correct answer is “cat”. In the computer’s answer, cat has the highest probability, 0.3508…. So the computer made a good guess!

In this result, the correct answer is “automobile”. In the computer’s answer, automobile has the highest probability, 0.3622…. So the computer made a good guess! Although this program is not perfect in terms of accuracy over the whole set of test results, it is a reasonable way to learn how to program image recognition.
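For readers who wonder why the 10 numbers add up to about 1, here is a small Python sketch. It assumes that the network produces one raw score per class and that the scores are turned into probabilities with the “softmax” function, which is a common choice but is my assumption here; the scores below are made up and are not output from the Torch tutorial.

```python
import math

CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck"]


def softmax(scores):
    """Turn raw scores into positive numbers that sum to 1."""
    top = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for one image, one per class.
scores = [0.1, -0.3, 1.2, 0.8, 0.9, 0.5, 2.4, 0.0, -1.0, -0.6]
probabilities = softmax(scores)

# The computer's answer is the class with the highest probability.
best = max(range(len(CLASSES)), key=lambda i: probabilities[i])
print(CLASSES[best], round(probabilities[best], 4))
print("Sum of probabilities:", round(sum(probabilities), 6))
```

Here the class with the largest raw score, “frog”, also gets the highest probability, and the other nine classes share the rest of the total.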

You may not be a computer scientist. However, it is good to program this image recognition yourself, because it enables you to understand how state-of-the-art deep learning works. Once you have done it, you no longer need to regard image recognition as a “black box”. That is beneficial for you in the age of the digital economy.

Yes, Torch and the tutorials are free. No fee is required. Why not try it as a hobby?

Source

1. Machine Learning: 2014-2015, Nando de Freitas, the Computer Science Department of University of Oxford https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/

2. Deep Learning with Torch – A 60-minute blitz

https://github.com/soumith/cvpr2015/blob/master/Deep%20Learning%20with%20Torch.ipynb
