Wouldn’t it be awesome to create your own news broadcast?


News broadcasters such as CNN, the Financial Times and Bloomberg are known to everyone. If you could make your own news broadcast, it would be awesome and amazing. But is it possible? One of the obstacles is how to collect articles and information from all over the world in real time. Of course, I do not have my own network of news correspondents around the globe. So what should we do about that?

Last week I found a blog post about “GDELT 2.0”. The GDELT Project, founded and led by Kalev H. Leetaru, monitors the events driving global society and creates a free, open platform for computing on the entire world. GDELT stands for the Global Database of Events, Language, and Tone. The project is now entering a new stage, “GDELT 2.0”. Compared with “GDELT 1.0”, “GDELT 2.0” makes a great deal of progress, as follows.

 

1.  “GDELT 2.0” can cover documents and information written in 65 languages

Many languages are written and spoken around the world. If we want to cover the whole Earth, we need to understand languages other than English. For example, an apple is called “Ringo” in Japanese. If computers cannot read what “Ringo” means, it is impossible to collect information about apples in Japan, because few articles are translated from Japanese into English. There is no need to worry about that, though. “GDELT 2.0” handles it with real-time machine translation, a function called “GDELT Translingual”. Global news that GDELT monitors in 65 languages, representing 98.4% of its daily non-English monitoring volume, is translated into English in real time. It is amazing because the media of the non-Western world can be included in our coverage, with no language barriers to worry about.

 

2. “GDELT 2.0” is updated in near real time

The “GDELT 2.0” blog says: “In essence, within 15 minutes of GDELT monitoring a news report breaking anywhere the world, it has translated it, processed it to identify all events, counts, quotes, people, organizations, locations, themes, emotions, relevant imagery, video, and embedded social media posts, placed it into global context, and made all of this available via a live open metadata firehose enabling open research on the planet itself.” This data used to be updated once a day. Now it is updated every 15 minutes. I think this is critically important when we try to create our own news broadcast.
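To get a feel for the 15-minute rhythm, here is a minimal Python sketch that rounds a time down to the most recent quarter-hour and formats it as a 14-digit stamp. I am assuming that update batches are labeled with quarter-hour stamps in the form YYYYMMDDHHMMSS, and the function name is my own invention, so please check the GDELT documentation for the exact file naming before relying on it.

```python
from datetime import datetime


def latest_batch_stamp(now: datetime, minutes: int = 15) -> str:
    """Round `now` down to the previous `minutes` boundary.

    Assumption: batches are stamped on quarter-hour boundaries
    as YYYYMMDDHHMMSS. This is an illustration, not GDELT's API.
    """
    floored_minute = now.minute - (now.minute % minutes)
    floored = now.replace(minute=floored_minute, second=0, microsecond=0)
    return floored.strftime("%Y%m%d%H%M%S")


if __name__ == "__main__":
    # A report seen at 10:37:12 falls into the 10:30 batch.
    print(latest_batch_stamp(datetime(2015, 2, 19, 10, 37, 12)))
```

A news program built on this data could ask for the newest batch every quarter of an hour instead of waiting for a daily file.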

 

3. “GDELT 2.0” runs content analysis on each article in near real time

“GDELT 2.0” can also judge whether articles are positive or negative. The blog says that “GDELT 2.0” can quantify the extraordinary array of latent emotional and thematic signals subconsciously encoded in the world’s media each day. 18 content analysis systems totaling more than 2,230 dimensions are now run on each news article seen by GDELT each day, and all of these scores are available. This is called the “Global Content Analysis Measures (GCAM)”.

 

In short, information from all over the world is updated with real-time machine translation and content analysis. It is definitely amazing. With the “GDELT 2.0” database, we might create our own news broadcast! Would you like to try it now?

If you are interested in “GDELT 2.0”, this video is a nice introduction.


Malaysia is the top emerging digital economy in the Digital Evolution Index


Last week I found an interesting report about the digital economy. It is called the “Digital Evolution Index”, produced by the Fletcher School at Tufts University in collaboration with MasterCard and DataCash. Malaysia is at the top of the “Break Out” category. It is ranked 23rd out of 50 nations and is one of the fastest-moving countries in the index from 2008 to 2013.

In this report, I found that only 2.9 billion people worldwide have access to the internet so far. The remaining people cannot use the internet because they have no access to it. However, technology is progressing in many emerging countries, such as Malaysia, China and India. Let us consider this progress country by country and what will take place in the future.

 

The website states that the index is calculated according to four pillars.

1. Demand: covers consumer income and demographics as well as internet usage

2. Supply: focuses on technology and infrastructure and whether or not they can support digital commerce and transactions.

3. Institutions: accounts for government policy and access to trade.

4. Innovation: rates the environment for creating startups and the overall competitive landscape.

In short, if there are many customers using the internet, infrastructure that supports e-commerce, support from governments and an environment that promotes innovation, the index will be higher.

Based on these scores, countries are classified into the four categories below.

1. Stand Out: These countries historically achieve high levels of digital transactions and continue to maintain that level.

2. Watch Out: The common thread among these countries is that they have both significant opportunities and challenges. Their economies function well in spite of limitations.

3. Break Out: Primarily, these are developing countries that have low but growing scores. While they are attractive to investors because of rapid improvement, they’re also riskier.

4. Stall Out: Typically, this group has a history of strong growth, but it’s no longer being achieved. Because of various factors, these countries are at risk of slipping in their development.
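The four categories can be read as a combination of two things: how high the score is now, and whether it is still growing. Here is a minimal Python sketch of that reading. The cut-off values, the `momentum` input and the function name are my own assumptions for illustration, and the report’s actual method may differ.

```python
def classify(score: float, momentum: float,
             score_cut: float = 50.0, momentum_cut: float = 0.0) -> str:
    """Place a country into one of the four categories.

    score:    current index level (hypothetical 0..100 scale)
    momentum: change in the score over the period (hypothetical)
    The two cut-offs are illustrative, not taken from the report.
    """
    high = score >= score_cut
    rising = momentum > momentum_cut
    if high and rising:
        return "Stand Out"   # high level that is maintained
    if high:
        return "Stall Out"   # strong history, growth has stopped
    if rising:
        return "Break Out"   # low but growing score
    return "Watch Out"       # opportunities and challenges


if __name__ == "__main__":
    # Made-up inputs, not figures from the report.
    print(classify(score=35.0, momentum=4.0))   # low but growing
    print(classify(score=70.0, momentum=-1.0))  # high but slipping
```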

 

It is no surprise that there are many developed nations in the Stand Out category, such as the US, Canada, Singapore and Hong Kong. But it is surprising that there are many Asian nations in the Break Out category, such as Malaysia, Thailand and China. Although India, the Philippines, Vietnam and Indonesia are in Watch Out, they are very close to Break Out. As you know, China and India each have a population of over a billion people, and the ASEAN nations together have about six hundred million. It means that the digital economy will spread on a massive scale there in the future, especially as Android phones get cheaper and everyone can afford a smartphone to connect to the internet.

 

I live and work in Kuala Lumpur, Malaysia. I agree with this index, as mobile internet works very well and the cost is reasonable here. I pay RM 30 (1 USD is about RM 3.6) per month for mobile internet and voice calls. The speed is enough for e-mail and social networks, although it is a little slow for watching movies. 4G internet service is also available if you pay more.

From Malaysia to India, there is vast potential to expand the digital economy. I would like to find “the next billion users” there.

This new toy looks so bright! Do you know why?


Last week I found that a new toy for young children called “CogniToys” is being developed through a project on Kickstarter, one of the biggest crowdfunding platforms. The developer is Elemental Path, one of the three winners of the IBM Watson competition. Let us see why it is so bright!

According to the company’s website, this toy is connected to the internet. When a child talks to the toy, it can reply because it understands what the child says and can answer the child’s questions. It usually takes less than one second to answer because the IBM Watson-powered system is powerful enough to compute answers quickly.

 

Let us look at the descriptions of this company’s technology.

“The Elemental Path technology is built to easily license and integrate into existing product lines. Our dialog engine is able to utilize some of the most advanced language processing algorithms available driving the personalization of our platform, and keeping the interaction going between toy and child.”

The key words are: 1. Dialog, 2. Language processing, 3. Personalization

 

1. Dialog

This toy communicates with children through conversation rather than programming, so it needs a technology called “speech recognition”. This technology is also used in real-time machine translation, such as in Microsoft’s Skype.

 

2. Language processing

In the field of machine learning, this is called “natural language processing”. Based on the structure of sentences and phrases, the toy understands what children say. IBM Watson is very strong in natural language processing because it had to understand the meaning of the questions when it competed on the quiz show “Jeopardy!”.

 

3. Personalization

It helps if the toy already knows a child’s preferences when the child talks to it. This technology is called “personalization”. Through its interactions with children, the toy can learn what they like to know about. This technology is often used by companies such as Amazon and Netflix. As far as I know, the method of personalization has not been disclosed. I am very interested in how the personalization mechanism works.
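Since the real method is not disclosed, I can only guess. One very simple way to think about personalization is to count which topics a child asks about and to prefer the most frequent ones. The Python sketch below shows only this idea; the class, its methods and the topics are all hypothetical and have nothing to do with Elemental Path’s actual system.

```python
from collections import Counter


class PreferenceProfile:
    """Toy example: remember how often each topic comes up."""

    def __init__(self) -> None:
        self.topic_counts = Counter()

    def record(self, topic: str) -> None:
        # Called once for every question the child asks on a topic.
        self.topic_counts[topic] += 1

    def favorite_topics(self, n: int = 2) -> list:
        # The n topics seen most often so far, most frequent first.
        return [topic for topic, _ in self.topic_counts.most_common(n)]


if __name__ == "__main__":
    profile = PreferenceProfile()
    for topic in ["dinosaurs", "space", "dinosaurs",
                  "trains", "space", "dinosaurs"]:
        profile.record(topic)
    print(profile.favorite_topics())
```

A real system would surely use much richer signals, but even simple counting lets the toy steer the conversation toward what the child enjoys.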

 

In short, machine learning enables this toy to work and be smart. Machine learning functions are provided as a service by big IT companies, such as IBM and Microsoft. Therefore, more applications of this kind are expected to come to market in the future. This is amazing, isn’t it? I imagine that future versions of the toy will be able to see images, identify what they are and share them with children, because image recognition is also offered as a service by big companies.

I ordered one CogniToy through Kickstarter. It is expected to be delivered in November this year. I will report how it works when I get it!

 

Note:IBM, IBM Watson Analytics, the IBM logo are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. 

What can computers do now? They look very smart!


Lately I found that several companies, such as Microsoft and IBM, provide services based on machine learning. Let us see what is going on now.

These new services are based on recent progress in machine learning. For example, machine translation between English and Spanish is provided by Microsoft’s Skype. It uses natural language processing based on machine learning. Although it only started in December 2014, the quality of the service is expected to improve quickly as many people use it and the computer learns from their data.

 

It is useful to go through what computers can do these days so that you can imagine new services in the future. First, computers can look at images and videos and identify what they show. This is image recognition. Second, they can listen to our speech and interpret what we mean. This is speech recognition. They can also translate one language into another. This is machine translation. Third, computers can search based on concepts rather than keywords. Fourth, they can calculate the best choice among the potential options. This is optimization. In short, computers can see, listen, read, speak and think.

These functions are used in many products and services, although you may not notice it. For example, IBM Watson Analytics provides these functions to developers as a platform as a service.

 

I expect these functions to enable computers to behave just like us. In the initial phase, they may not be very good, just like a baby. However, machine learning allows computers to learn from experience. It means that computers will perform better than we do in many fields. As you know, in Shogi, one of the most popular Japanese board games, computer players can already beat teams of human professionals. This is amazing!

Going forward, I recommend that you understand how computers are progressing in terms of the functions above. Many companies, such as Google and Facebook, invest a great deal of money in this field. Therefore, many services are expected to be released in the near future. Some of the new services may greatly affect our jobs, education and society. Some of them may give rise to new industries.

 

Some day, when you are in the room, the computer will identify you by computer vision and ask if you would like a cup of coffee. The computer holds a lot of data, such as temperature, weather, time, season and your preferences, and makes the best coffee for you. If you want to know how this coffee was made, the computer gives you a detailed report about it. All settings are done automatically. It is the ultimate coffee maker, driven by powerful computer algorithms. Do you want one?

 

 


Can you win Atari games against computers? It no longer seems possible


I think it is better to watch the YouTube video of the interview first. Onstage at TED2014, Charlie Rose interviews Google CEO Larry Page about his far-off vision for the company. Page talks through the company’s recent acquisition of DeepMind, an AI that is learning some surprising things. Starting about 2 minutes 30 seconds into the interview, he talks about DeepMind for two minutes.

 

According to a white paper from DeepMind, which Google bought for 650 million USD in January 2014, humans cannot beat computers in three Atari 2600 games, Breakout, Enduro and Pong, after the computer has spent a couple of hours learning how each game works. The same single program is used for every game, and it is given no input in advance about how to win a specific game. It means that one program has to learn by itself, from scratch, how to obtain a high score. Of the seven games tested, the computer recorded higher scores than a human expert in three. It is amazing.

Reinforcement learning, one type of machine learning, is used in this challenge. It is different from the machine learning used in image recognition and natural language processing. In reinforcement learning, reward functions are used to decide which policy is best among many choices in the long run. In short, we can say it asks “how much of today’s lunch should we give up in order to maximize the total sum of lunches from tomorrow onward?” We always face this kind of problem, but it is difficult for computers to answer. However, DeepMind proved that reinforcement learning works well on this kind of problem when they presented the demo at the end of 2013.
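The lunch question can be made concrete with the “discounted return” that reinforcement learning tries to maximize: each future reward is multiplied by a discount factor between 0 and 1, so the further away a reward is, the less it counts. Here is a minimal Python sketch. The lunch values and the two plans are made up by me for illustration, and this is only the scoring rule, not DeepMind’s program.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


# Hypothetical "lunch value" per day under two plans.
feast_today = [10, 2, 2, 2]   # big lunch now, small lunches later
save_today = [4, 6, 6, 6]     # modest lunch now, better lunches later

if __name__ == "__main__":
    for gamma in (0.9, 0.1):
        feast = discounted_return(feast_today, gamma)
        save = discounted_return(save_today, gamma)
        better = "save" if save > feast else "feast"
        print(f"gamma={gamma}: feast={feast:.3f} save={save:.3f} -> {better}")
```

With a patient discount factor such as 0.9, giving up part of today’s lunch scores higher, while an impatient factor such as 0.1 favors the feast. The learning problem is to find, from experience alone, the choices that maximize this sum.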

 

If computers become capable of this kind of decision-making, it will have a huge impact on intellectual jobs, such as lawyers, fund managers, analysts and corporate officers, because they make decisions over a long-term horizon rather than for tomorrow’s outcomes. They have a lot of past experience, some of it successes and some failures, and they use it when they plan for the future. If computers can use the same logic as humans and make decisions by themselves, it can be a revolution for intellectual jobs. For example, at company board meetings, computers may answer board members’ questions about management strategy based on a massive number of past examples and tell them how to maximize future cash flow by using reinforcement learning. Future cash flow is the most important thing to board members because shareholders require them to maximize it.

 

Currently there is a lot of discussion about our future jobs, because it is probable that many jobs will be replaced by computers in the near future. If reinforcement learning keeps improving, the CEOs of companies might be replaced by computers, and shareholders might even welcome them!?

 

IBM Watson Analytics works well for business managers!


IBM Watson Analytics was released on 4 December 2014. It is a new service where data analysis can be done through conversation, with no programming needed. I am very interested in this service, so I opened an IBM Watson Analytics account and tried it for a week. I wanted to check how this service works and whether it is good for business managers with no data analysis expertise. Here is my report.

 

I think IBM Watson Analytics is good for beginners in data analysis because it is easy to visualize data and we can do predictive analysis without writing code. I used data that includes the scores of exam 1 and exam 2 and the admission results. This data can be obtained from Exercise 2 of the Machine Learning course on Coursera. Here is the chart drawn by IBM Watson Analytics. To draw this chart, all you have to do is upload the data, write or choose “what is the relationship between Exam1 and Exam2 by result”, and adjust some options in the red box below. In the chart, green points mean “admitted” and blue points mean “not admitted”. This lets us understand easily what the data means.

[Chart: Exam1 vs. Exam2 scores by admission result, drawn by IBM Watson Analytics]

 

Let us move on to prediction. We can analyze the data in detail here because statistical models are running behind the scenes. I chose “result” as the target of this analysis. This target is categorical, as it includes only “1: admitted” and “0: not admitted”, so a logistic regression model, which is one of the classification methods, is chosen automatically by IBM Watson Analytics. Here are the results of this analysis. In the red box, explanations of the analysis are presented automatically. From the matrix of scores for each exam, we can estimate the probability of admission. This is good for business managers, as this kind of analysis usually requires programming in R, MATLAB or Python.

[Chart: prediction results for “result”, drawn by IBM Watson Analytics]

 

In my view, logistic regression is the first model to learn for classification because it is easy to understand and can be applied to many fields. For example, I used this model to analyze how likely counterparties were to default when I worked in the financial industry. For marketing, the target can be interpreted as whether or not a customer buys the product. For machine maintenance, the target can be interpreted as normal or failed. The more data is collected, the more fields we can apply this classification analysis to. I hope many business managers become familiar with logistic regression by using IBM Watson Analytics.

IBM Watson Analytics has only just started, so improvements may be needed to make the service better. However, it is also true that business managers can analyze data without programming by using it. I highly appreciate the efforts made by IBM.
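For readers who wonder what such a model does behind the scenes, here is a minimal logistic regression sketch in plain Python. It fits two weights and a bias by gradient descent and then returns a probability of admission. The exam rows are hypothetical numbers I made up (scaled to the range 0 to 1), not the Coursera data, and this is of course not how IBM Watson Analytics is implemented.

```python
import math

# Hypothetical rows: (exam1, exam2, admitted), scores scaled to 0..1.
DATA = [
    (0.35, 0.40, 0), (0.30, 0.45, 0), (0.45, 0.35, 0), (0.50, 0.40, 0),
    (0.80, 0.75, 1), (0.70, 0.85, 1), (0.75, 0.70, 1), (0.90, 0.80, 1),
]


def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit(data, lr=1.0, steps=5000):
    """Batch gradient descent on the average log loss."""
    w1 = w2 = b = 0.0
    n = len(data)
    for _ in range(steps):
        g1 = g2 = gb = 0.0
        for x1, x2, y in data:
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y  # prediction error
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def admission_probability(params, exam1, exam2):
    """Estimated probability of admission for one pair of scores."""
    w1, w2, b = params
    return sigmoid(w1 * exam1 + w2 * exam2 + b)


if __name__ == "__main__":
    params = fit(DATA)
    print(admission_probability(params, 0.9, 0.9))  # strong scores
    print(admission_probability(params, 0.3, 0.3))  # weak scores
```

The same code works for the marketing and maintenance examples if the target is changed to “buy or not” or “normal or failed”.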

 

 


What is the best language for data analysis in 2015?


 

 

RedMonk issued its ranking of the popularity of programming languages. This research has been conducted periodically since 2010, and the chart below comes from it. Although general-purpose languages such as JavaScript occupy the top 10, statistical languages are getting popular. R is ranked 13th and MATLAB is ranked 16th. I have used MATLAB since 2001 and R since 2013, and I am currently studying JavaScript. Through this I noticed the difference between R, which is a statistical language, and general-purpose languages. Let us consider it in detail, along with a good way to learn statistical languages such as R and MATLAB.

 

[Chart: RedMonk programming language rankings, 2015]

 

1.  R focuses on data

Because R is a statistical language, it focuses on the data to be analyzed. This data is handled in R as vectors and matrices. Unlike JavaScript, there is no need to declare variables to handle data in R. There is no need to distinguish between scalars and vectors, either. So it is easy to start analyzing data with R, especially for beginners. I therefore think the best way to learn R is to become familiar with vectors and matrices, because that is how data is represented in R.

 

2.  R has a lot of functions to analyze data

R has a lot of functions because many professionals contribute statistical models to it. Currently there are more than 7,000 contributed collections of functions, which are called “R packages”. This is one of the biggest advantages of learning R for data analysis. If you are interested in the “linear regression model”, which is the simplest model for predicting the prices of goods and services, all you have to do is write the command “lm”, and R outputs the parameters so that price predictions can be obtained.
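To show what a command like “lm” computes for the simplest case of one input variable, here is a short least-squares sketch. I wrote it in plain Python rather than R so that every step is visible. The function names and the price data are my own hypothetical examples, and R’s real implementation is more general than this.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(params, x):
    """Predicted y for a new x using the fitted parameters."""
    intercept, slope = params
    return intercept + slope * x


if __name__ == "__main__":
    # Hypothetical data: size of a service plan vs. its price.
    sizes = [1, 2, 3, 4]
    prices = [3, 5, 7, 9]
    params = fit_line(sizes, prices)
    print(params)              # fitted intercept and slope
    print(predict(params, 5))  # predicted price for size 5
```

In R the fitting step is a single line, which is exactly why the language is so comfortable for data analysis.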

 

3. R makes it easy to visualize data

If you would like to draw a graph, all you have to do is write “plot”, and a simple graph appears on the screen. When there are many data series and you would like to know the relationships among them, all you have to do is write “pairs”, and a grid of scatter charts appears so that we can understand the relationship between each pair. Please look at the example of charts drawn by “pairs”.

[Chart: scatter plot matrix drawn by “pairs” in R]

 

R is open source and free to anyone. MATLAB, however, is proprietary software, which means that you have to buy a license if you would like to use it. But do not worry about that. Octave, which is similar to MATLAB, is available as open-source software without a license fee. I recommend R or Octave to beginners in data analysis because there is no need to pay any fee.

Going forward, R will surely become more popular among programming languages. It is available to everyone at no cost. R has been introduced as a major language for data analysis in my company, and I recommend that all of you learn R as I do. It is fun, isn’t it?

Can we talk to computers without a programming language?


IBM announced on 4 December 2014 that Watson Analytics provides data analysis and visualization as a service without programming. It said that this “breakthrough natural language-based cognitive service that can provide instant access to powerful predictive and visual analytic tools for businesses, is available in beta”. Let us consider what kind of impact IBM Watson Analytics will have on us.

 

Watson Analytics is good at natural language processing. For example, if doctors ask Watson Analytics how to cure a disease, it understands their questions, searches massive amounts of data and answers. There is no need for doctors to write code. It means that we may move from “we should learn computer programming” to “we should know how to have a conversation with computers”. This may enable many non-programmers to use computers effectively.

In addition, Watson Analytics is also good at handling unstructured data, which includes text, images, voice and video. Therefore Watson Analytics can analyze e-mail, social media content and pictures taken by consumers. So it may become possible to recommend what we should eat at a restaurant from a picture of its menu, because computers hold our health data and can choose the best meal for our health by analyzing the picture.

In terms of algorithms, the functions above can be achieved by machine learning. So the more people use this service, the more accurate its answers become, because computers learn from a lot of data and keep getting better.

 

IBM Watson Analytics may change the landscape of every industry. Traditionally, data analysis has been carried out by data scientists, using numerical data and programming languages. With this new kind of data analysis by IBM Watson Analytics, however, it can be carried out by businessmen and businesswomen, using e-mail, pictures, video and natural language. Machine translation from one language to another will also be available, so there will be fewer language barriers going forward. This must be the democratization of data analysis. It will be exciting when it happens in 2015!

 

Note:IBM, the IBM logo are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. 

Mobile services will be enhanced by machine learning dramatically in 2015, part 2


Happy new year! The beginning of 2015 is a good time to consider what will happen in the fields of machine learning and mobile services this year. Following last week’s blog post, we consider recommender systems and the internet of things, as well as investment technologies. I hope you enjoy it!

 

3. Recommender systems

Recommender systems are widely used, from big companies such as amazon.com to small and medium-sized companies. Going forward, as image recognition technology progresses rapidly, consumer-generated data such as pictures and videos will surely be used to analyze consumers’ behavior and build up their preferences effectively. It means that unstructured data can be collected and analyzed by machine learning in order to make recommendations more accurate. This creates a virtuous cycle: the more people take pictures with smartphones and send them through the internet, the more accurate recommendations become. It is a good example of personalization. In 2015, a lot of mobile services will have personalization functions so that everyone can be satisfied with them.
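As a small illustration of how picture data could feed a recommender, suppose image recognition has already turned each user’s photos into counts of tags. A simple approach is to find the user whose tags are most similar and suggest the tags that person has but you do not. The Python sketch below uses cosine similarity for this. The users, tags and counts are all made up, and real recommender systems are far more elaborate.

```python
import math


def cosine(a, b):
    """Cosine similarity between two tag-count dictionaries."""
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(target, others):
    """Suggest tags from the most similar user that the target lacks."""
    best = max(others, key=lambda name: cosine(target, others[name]))
    return sorted(tag for tag in others[best] if tag not in target)


# Hypothetical tag counts extracted from each user's photos.
me = {"coffee": 3, "hiking": 1}
others = {
    "amy": {"coffee": 4, "hiking": 2, "camping": 1},
    "bob": {"sushi": 5, "coffee": 1},
}

if __name__ == "__main__":
    print(recommend(me, others))
```

Every extra photo adds to the tag counts, which is the virtuous cycle described above.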

 

4. Internet of things

This is also one of the big themes of the internet. As sensors get smaller and cheaper, many devices and pieces of equipment, from smartphones to automobiles, carry more sensors. These sensors are connected to the internet and send data in real time. This will completely change the way we maintain equipment. If the fuel efficiency of your car is getting worse, it may be caused by an engine fault, so maintenance will be needed as soon as possible. By using the classification algorithms of machine learning, it should be possible to predict fatal failures of automobiles, trains and even homes. All notifications will be sent to smartphones in real time. It leads to a greener society, as efficiency increases in terms of energy consumption and emission control.
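The fuel efficiency example can be sketched in a few lines of Python. To keep it short, a simple threshold rule stands in here for a trained classifier: the car is flagged when the average of its latest readings falls well below its normal level. The function name, the window size, the 85% ratio and the readings are my own assumptions for illustration.

```python
def needs_maintenance(readings, baseline, window=3, drop_ratio=0.85):
    """Flag a car whose recent fuel efficiency is well below normal.

    readings:   fuel efficiency values (e.g. km per liter), oldest first
    baseline:   the car's normal fuel efficiency
    window:     how many of the latest readings to average
    drop_ratio: flag when the average is below baseline * drop_ratio
    """
    if len(readings) < window:
        return False  # not enough data yet
    recent = readings[-window:]
    return sum(recent) / window < baseline * drop_ratio


if __name__ == "__main__":
    # Hypothetical sensor data sent by two cars.
    print(needs_maintenance([15.1, 14.9, 12.0, 12.2, 11.9], baseline=15.0))
    print(needs_maintenance([15.1, 14.9, 15.0], baseline=15.0))
```

A real service would learn the rule from many cars’ histories and send the warning to the owner’s smartphone.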

 

5. Investment technology

As far as I know, few new technologies were introduced into investment and asset management in 2014. However, I imagine that some fin-tech companies might use reinforcement learning, one of the categories of machine learning. Unlike in image recognition and machine translation, the right answers are not so clear in investment and asset management. Reinforcement learning might solve this problem in practice, so that machine learning can be applied to this field. Of course, the results of the analysis should be sent to smartphones in real time to support investment decisions.

 

Mobile services will be enhanced dramatically in 2015 because machine learning technologies will be connected to each customer’s mobile phone. Mobile services with machine learning will change the landscape of every industry sooner rather than later. Congratulations!

 

Mobile services will be enhanced by machine learning dramatically in 2015


Merry Christmas! The end of 2014 is approaching. It is a good time to consider what will happen in the fields of machine learning and mobile services in 2015. This week we consider machine translation and image recognition; next week, recommender systems and the internet of things, as well as mobile services powered by machine learning. I hope you enjoy it!

 

1.  Machine translation / Text mining

Skype is a top innovator in this field. Microsoft has already announced that machine translation between English and Spanish is available in Skype. So in 2015, it should become possible to translate between English and other languages. Text translation is also available among 40 languages in its chat service. So language barriers are getting lower and lower. It is still difficult for computers to answer questions automatically, but this is also gradually improving. Mizuho Bank announced that it will use IBM Watson, one of the most famous artificial intelligence systems, to assist call center operators. These technologies make it easier to develop global services, as manuscripts and frequent Q&As are translated from one language to another automatically. I love that because my educational programs can be expanded all over the world!

 

2. Image recognition

Since computers learned to identify images of cats automatically by deep learning, image recognition technology has progressed dramatically. SoftBank announced that Pepper, its new robot for consumers, will be able to read human emotions. In my view, the most important factor in reading emotions must be image recognition of human facial expressions. Pepper is presumably very good at this, which is why it can read human emotions. Image recognition technology is very useful for us, as each smartphone has a good camera and it is easy for people to take pictures and send them to the cloud and social media. Image recognition enables us to analyze the massive amount of images sent through the internet. That data must be a treasure for us.

 

These machine learning technologies will surely be connected to each customer’s mobile phone in 2015. It means that mobile services will be enhanced dramatically by machine learning. All the information around us will be collected through the internet and sent to machine learning systems in real time, and they will return the best answer for each individual. This will be the standard model of mobile services as the speed of calculation and communication increases rapidly.

Next week we consider recommender systems, the internet of things and investment technology. See you next week!