Challenge to Machine Learning


Machine learning is becoming a popular and attractive way to analyze big data. Its algorithms have a long history, going back to the 1950s. However, machine learning has come into the spotlight among data scientists only recently, because the large amounts of data, computing resources and data storage that it requires have become available at reasonable cost. I would like to introduce a basic machine learning algorithm using the R language, which I recommended before.

1.  Problem set

The observed data are x = [1,2,3] and y = [5,7,9]. I would like to find a and b, assuming that the relationship can be expressed as y = ax + b. Yes, it is obvious that a = 2 and b = 3; however, I want to obtain this solution by using an algorithm to calculate it.
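As a reference point, R's built-in lm function fits the same straight line by ordinary least squares, so its coefficients can serve as a check on whatever an iterative algorithm produces. This is only a sanity check, not the method this post is about, and the variable name fit is my own choice.

```r
# Reference fit with base R, used only to check the answer.
x <- c(1, 2, 3)
y <- c(5, 7, 9)

fit <- lm(y ~ x)

# The intercept corresponds to b and the coefficient on x to a,
# so these should be close to b = 3 and a = 2.
print(coef(fit))
```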

 

2. Algorithm

This is my machine learning program to find a and b. I would like to focus on the for loop of the program, which repeats the steps below.

First step : update the parameters

Second step : calculate the updated value of the cost function with the updated parameters

Third step : compare the updated value with the old value of the cost function and stop the calculation if it is considered to have converged

Otherwise, go back to the first step above.

These three steps are used in machine learning algorithms in general, so it is useful to remember them.
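For the program below, the three steps can be written as formulas. The notation is my own reading of the code, not something stated in it: X is the matrix x (a column of ones and the observed x values), \theta is the parameter vector t, \alpha is the learning rate le, and m is the number of observations.

```latex
% Cost function (half the mean squared error)
J(\theta) = \frac{1}{2m} (X\theta - y)^{\top} (X\theta - y)

% First step: update the parameters
\theta_{\text{new}} = \theta - \frac{\alpha}{m} X^{\top} (X\theta - y)

% Second and third steps: evaluate the cost and test for convergence
\left| J(\theta_{\text{new}}) - J(\theta) \right| \le 10^{-8}
```

The update subtracts \alpha times the gradient of J, which is why this procedure is usually called gradient descent.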

 

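ML<-function(le){

  # x: a column of 1s (for b) and the observed x values (for a)
  x=matrix(c(1,1,1,1,2,3),3,2)
  y=matrix(c(5, 7, 9),3,1)
  # initial parameters: t[1] is b, t[2] is a
  t=matrix(1,2,1)
  m=length(y)
  h=x%*%t
  # initial value of the cost function
  j=1/(2*m)*(t(h-y)%*%(h-y))

  for (i in seq(1,1000)){
    h=x%*%t
    # first step: update the parameters
    tnew=t-le/m*t(x)%*%(h-y)
    # second step: calculate the cost function with the updated parameters
    hnew=x%*%tnew
    jnew=1/(2*m)*(t(hnew-y)%*%(hnew-y))
    # third step: stop if the cost function has converged
    if (abs(jnew-j)<=10^(-8)) break
    t=tnew
    j=jnew
    print(i)
    print(t)
  }
}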

 

3.  The result of calculation

I use le=0.1 as the learning rate. The program prints the parameters at every iteration, and the last values printed are below.

[1] 521
         [,1]
[1,] 2.997600
[2,] 2.001056

This means that the value of the cost function converged after 521 iterations. The results are a = 2.001056 and b = 2.997600, which are very close to the true values a = 2 and b = 3. So this algorithm can be considered to have found the solution.
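To inspect only the final values instead of reading several hundred printed iterations, the same loop can be wrapped so that it returns its result. The following is a minimal sketch under my own assumptions: the function name gradient_descent, the arguments tol and max_iter, and the returned list are hypothetical and not part of the program above.

```r
# Hypothetical variant of the loop: returns the result instead of printing it.
gradient_descent <- function(le, tol = 1e-8, max_iter = 1000) {
  x <- matrix(c(1, 1, 1, 1, 2, 3), 3, 2)  # columns: 1s for b, observed x for a
  y <- matrix(c(5, 7, 9), 3, 1)
  theta <- matrix(1, 2, 1)                # initial guess: b = 1, a = 1
  m <- length(y)

  # Cost function: half the mean squared error.
  cost <- function(th) {
    r <- x %*% th - y
    as.numeric(t(r) %*% r) / (2 * m)
  }

  j <- cost(theta)
  updates <- 0
  for (i in seq_len(max_iter)) {
    # First step: update the parameters.
    theta_new <- theta - le / m * t(x) %*% (x %*% theta - y)
    # Second step: cost with the updated parameters.
    j_new <- cost(theta_new)
    # Third step: stop once the cost changes by no more than tol.
    if (abs(j_new - j) <= tol) break
    theta <- theta_new
    j <- j_new
    updates <- updates + 1
  }
  list(b = theta[1, 1], a = theta[2, 1], updates = updates)
}

res <- gradient_descent(0.1)
print(res)
```

Returning a list keeps the parameters and the number of updates together, so they can be compared with the true values a = 2 and b = 3 directly.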

 

This algorithm is one of the simplest ones. However, it has the fundamental structure that can be applied to other, more complex algorithms. So I recommend that you implement it yourself and become familiar with this kind of algorithm. In short: 1. Update the parameters. 2. Calculate the updated value of the cost function. 3. Check whether the updated value has converged. Yes, it is that simple!

TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using algorithms, instructions, methods or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the codes and the software.
