A day with .Net

My day to day experience in .net

Archive for the ‘Machine Learning’ Category

The Democratization Of Machine Learning With ML.NET – Linear Regression With Gradient Boosted Tree (Ensemble learning method)

Posted by vivekcek on September 26, 2018

NB: This post is based on ML.NET version 0.4. There will be some changes in version 0.5.

Microsoft has decided to make machine learning accessible to .NET developers through ML.NET.
The ML.NET API is simple and easy to learn. You do not have to work through large mathematical equations or implement the algorithms yourself.

The diagram below shows the process of building models using ML.NET.

1. Prepare your data.
2. Decide the algorithm to use.
3. Train a model using the selected algorithm.
4. Evaluate the model.
5. Save the model.
6. Use the saved model in your .NET applications.

[Figure: the ML.NET model-building process]

The diagram below shows the main components of ML.NET.

1. Transforms: Use these when you want to transform your data, for example to convert categorical values to numerical vectors.
2. Learners: The available algorithms. Select the one that best fits your data and problem.
3. Misc: APIs for loading data, running evaluations and so on.

[Figure: the main components of ML.NET]

ML.NET follows a pipeline architecture. The pipeline architecture is shown below.

In this post, I will build a regression model with ML.NET. The problem we are going to solve is predicting New York taxi fares.
The data is divided into a training set and a test set. Download both using the links below.

Train Data
Test Data

Our data includes the columns below. The last column, “fare_amount”, is our label (the value to be predicted).

[Figure: the columns of the taxi fare dataset]

ML.NET is cross-platform, which means you can do this on any OS that has .NET Core 2.0 installed. You can use either VS Code or Visual Studio. I am going to use VS Code and the .NET Core CLI.

Follow the steps below (the complete source code is provided at the end).

1. Make sure the .NET Core 2.0 SDK is installed.

2. Create a .NET Core console app from the command prompt.

dotnet new console -o myApp
cd myApp

3. Install the ML.NET package.

dotnet add package Microsoft.ML --version 0.4.0

4. Download and save the training and test datasets from the links above.

5. Include the namespaces below.

using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Models;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using System.Threading.Tasks;
using Microsoft.ML.Runtime.Api;

6. Create two classes that represent our input and output.

    public class TaxiTrip
    {
        [Column("0")]
        public string VendorId;
        [Column("1")]
        public string RateCode;
        [Column("2")]
        public float PassengerCount;
        [Column("3")]
        public float TripTime;
        [Column("4")]
        public float TripDistance;
        [Column("5")]
        public string PaymentType;
        [Column("6")]
        public float FareAmount;
    }

    public class TaxiTripFarePrediction
    {
        [ColumnName("Score")]
        public float FareAmount;
    }
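
To make the column mapping concrete, here is a minimal Python sketch (not ML.NET code) of the idea behind the Column attributes: each field is read from the CSV column at the given index. The parse_trip helper, the header names other than fare_amount, and the two sample rows are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TaxiTrip:
    vendor_id: str
    rate_code: str
    passenger_count: float
    trip_time: float
    trip_distance: float
    payment_type: str
    fare_amount: float


def parse_trip(row):
    """Build a TaxiTrip from one CSV row, reading each field by column index."""
    return TaxiTrip(
        vendor_id=row[0],
        rate_code=row[1],
        passenger_count=float(row[2]),
        trip_time=float(row[3]),
        trip_distance=float(row[4]),
        payment_type=row[5],
        fare_amount=float(row[6]),
    )


# Hypothetical sample in the same column order as the classes above.
# Header names and values are assumptions, not copied from the real dataset.
SAMPLE_CSV = """vendor_id,rate_code,passenger_count,trip_time,trip_distance,payment_type,fare_amount
VTS,1,1,1140,3.75,CRD,15.5
CMT,1,2,480,1.2,CSH,7.0
"""

reader = csv.reader(io.StringIO(SAMPLE_CSV))
next(reader)  # skip the header row, similar to useHeader: true
trips = [parse_trip(row) for row in reader]
print(trips[0])
```

The string columns stay as strings at this point; they are converted to numbers in a later step.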

7. Declare the data paths. Store your CSV files inside a folder named “Data”.

static readonly string _datapath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-train.csv");
static readonly string _testdatapath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-test.csv");
static readonly string _modelpath = Path.Combine(Environment.CurrentDirectory, "Data", "Model.zip");

8. Declare the learning pipeline and load the data.

var pipeline = new LearningPipeline();
pipeline.Add(new TextLoader(_datapath).CreateFrom<TaxiTrip>(useHeader: true, separator: ','));

After this step the data is held in memory as shown below.

[Figure: the loaded data in memory]

9. Copy the FareAmount column to the Label column.

pipeline.Add(new ColumnCopier(("FareAmount", "Label")));

This step adds a new column (Label) to our data by copying the values of FareAmount.

[Figure: the data with the new Label column]

10. Convert the string values to numerical vectors, because our algorithm only supports numerical vectors.

pipeline.Add(new CategoricalOneHotVectorizer("VendorId",
                                             "RateCode",
                                             "PaymentType"));

After this step the data looks like below. The string values are now converted to numerical vectors.

[Figure: the data with string columns converted to numerical vectors]
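
As a rough illustration of what one-hot encoding does, here is a small Python sketch. It is not the ML.NET implementation; the function names and the sample column are made up, and ML.NET may order the slots or handle unseen values differently.

```python
def fit_categories(values):
    """Return the distinct values in first-seen order; each gets one slot."""
    categories = []
    for value in values:
        if value not in categories:
            categories.append(value)
    return categories


def one_hot(value, categories):
    """Encode value as a vector with a single 1.0 at its category's index."""
    return [1.0 if value == category else 0.0 for category in categories]


payment_types = ["CSH", "CRD", "CSH", "UNK"]  # made-up sample column
categories = fit_categories(payment_types)
encoded = [one_hot(value, categories) for value in payment_types]
print(categories)  # ['CSH', 'CRD', 'UNK']
print(encoded[1])  # [0.0, 1.0, 0.0]
```

Each string becomes a vector with exactly one non-zero entry, which is a form the learner can work with.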

11. Now combine our input columns into a single column named Features. Note that the code below leaves TripTime out.

pipeline.Add(new ColumnConcatenator("Features",
                                    "VendorId",
                                    "RateCode",
                                    "PassengerCount",
                                    "TripDistance",
                                    "PaymentType"));

This step adds a new column (Features) to our data structure.

[Figure: the data with the new Features column]
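
Conceptually, the concatenation flattens the one-hot vectors and the numeric columns of a row into one long vector. A minimal Python sketch, with a hypothetical concat_features helper and made-up values for a single trip:

```python
def concat_features(*parts):
    """Flatten scalars and vectors into one flat feature vector, in order."""
    features = []
    for part in parts:
        if isinstance(part, (list, tuple)):
            features.extend(float(x) for x in part)
        else:
            features.append(float(part))
    return features


# Made-up one-hot vectors for one trip, plus its numeric columns.
vendor_id = [1.0, 0.0]
rate_code = [1.0, 0.0, 0.0]
passenger_count = 1
trip_distance = 10.33
payment_type = [0.0, 1.0]

features = concat_features(vendor_id, rate_code, passenger_count,
                           trip_distance, payment_type)
print(features)
print(len(features))  # 9
```

The order of the arguments decides the position of each value in the final vector, just as the order of the column names does in the pipeline step.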

12. Next is the important step of selecting the algorithm. In our case I am using FastTreeRegressor, a gradient boosted decision tree algorithm, which tends to perform better than a linear model on non-linear datasets.

pipeline.Add(new FastTreeRegressor());
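
If you are curious what "gradient boosted" means, the following Python sketch shows the core idea for squared error: start from the mean, then repeatedly fit a tiny tree to the current residuals and add a damped version of it to the prediction. This is a toy with a single feature and one-split trees (stumps), not the FastTree algorithm itself; all names and data are made up, and it assumes at least two distinct input values.

```python
def fit_stump(xs, residuals):
    """Find the single-split tree (stump) that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_trees=50, learning_rate=0.1):
    """Boosting for squared error: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict_boosted(model, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


# Made-up non-linear data: the fare grows with distance, then flattens.
distances = [0.5, 1.0, 2.0, 3.0, 5.0, 8.0, 12.0, 20.0]
fares = [4.0, 5.5, 8.0, 10.5, 15.0, 21.0, 24.0, 25.0]

small = fit_boosted(distances, fares, n_trees=5)
large = fit_boosted(distances, fares, n_trees=100)
print(mse(small, distances, fares), mse(large, distances, fares))
```

Adding more trees lowers the training error step by step, which is why an ensemble of many weak trees can follow a curve that a single straight line cannot.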

13. Now we train our model. The algorithm uses the Features and Label columns.

PredictionModel<TaxiTrip, TaxiTripFarePrediction> model = pipeline.Train<TaxiTrip, TaxiTripFarePrediction>();

What we have done so far can be represented as below.

14. Once you have trained the model, you can evaluate its performance on the test data by checking RSquared. The closer RSquared is to 1, the better the model performs.

var testData = new TextLoader(_testdatapath).CreateFrom<TaxiTrip>(useHeader: true, separator: ',');

var evaluator = new RegressionEvaluator();
RegressionMetrics metrics = evaluator.Evaluate(model, testData);

Console.WriteLine($"Rms = {metrics.Rms}");
Console.WriteLine($"RSquared = {metrics.RSquared}");

[Figure: console output showing the Rms and RSquared values]
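
For reference, here is how the two metrics can be computed by hand in Python. These are the textbook definitions of root mean squared error and R-squared; I am assuming they correspond to metrics.Rms and metrics.RSquared. The numbers are made up.

```python
import math


def rms_error(actual, predicted):
    """Root of the mean squared difference between actual and predicted."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r_squared(actual, predicted):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [10.0, 20.0, 30.0, 40.0]     # made-up fares
predicted = [12.0, 18.0, 33.0, 39.0]  # made-up predictions
print(rms_error(actual, predicted))
print(r_squared(actual, predicted))
```

A perfect prediction gives an R-squared of exactly 1, and a model that always predicts the mean gives 0.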

15. Now predict a new value as below.

TaxiTripFarePrediction prediction = model.Predict(TestTrips.Trip1);
Console.WriteLine("Predicted fare: {0}, actual fare: 29.5", prediction.FareAmount);

[Figure: console output showing the predicted fare]

16. Here TestTrips is a static class that holds a sample trip.


    static class TestTrips
    {
        internal static readonly TaxiTrip Trip1 = new TaxiTrip
        {
            VendorId = "VTS",
            RateCode = "1",
            PassengerCount = 1,
            TripDistance = 10.33f,
            PaymentType = "CSH",
            FareAmount = 0 // predict it. actual = 29.5
        };
    }

17. To save and retrieve your model use the code below.

// Both calls are asynchronous, so the calling method must be declared async.
await model.WriteAsync(_modelpath);
PredictionModel<TaxiTrip, TaxiTripFarePrediction> loadedModel = await PredictionModel.ReadAsync<TaxiTrip, TaxiTripFarePrediction>(_modelpath);

18. The full source code is given below.

using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Models;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using System.Threading.Tasks;
using Microsoft.ML.Runtime.Api;

namespace LinearRegression
{

    class Program
    {

        static readonly string _datapath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-train.csv");
        static readonly string _testdatapath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-test.csv");
        static readonly string _modelpath = Path.Combine(Environment.CurrentDirectory, "Data", "Model.zip");

        static void Main(string[] args)
        {
            var pipeline = new LearningPipeline();
            pipeline.Add(new TextLoader(_datapath).CreateFrom<TaxiTrip>(useHeader: true, separator: ','));
            pipeline.Add(new ColumnCopier(("FareAmount", "Label")));
            pipeline.Add(new CategoricalOneHotVectorizer("VendorId",
                                             "RateCode",
                                             "PaymentType"));

            pipeline.Add(new ColumnConcatenator("Features",
                                    "VendorId",
                                    "RateCode",
                                    "PassengerCount",
                                    "TripDistance",
                                    "PaymentType"));

            pipeline.Add(new FastTreeRegressor());

            PredictionModel<TaxiTrip, TaxiTripFarePrediction> model = pipeline.Train<TaxiTrip, TaxiTripFarePrediction>();
            //await model.WriteAsync(_modelpath);
            //PredictionModel<TaxiTrip, TaxiTripFarePrediction> loadedModel = await PredictionModel.ReadAsync<TaxiTrip, TaxiTripFarePrediction>(_modelpath);

            var testData = new TextLoader(_testdatapath).CreateFrom<TaxiTrip>(useHeader: true, separator: ',');

            var evaluator = new RegressionEvaluator();
            RegressionMetrics metrics = evaluator.Evaluate(model, testData);

            Console.WriteLine($"Rms = {metrics.Rms}");
            Console.WriteLine($"RSquared = {metrics.RSquared}");

            TaxiTripFarePrediction prediction = model.Predict(TestTrips.Trip1);
            Console.WriteLine("Predicted fare: {0}, actual fare: 29.5", prediction.FareAmount);

            Console.Read();
        }
    }

    public class TaxiTrip
    {
        [Column("0")]
        public string VendorId;
        [Column("1")]
        public string RateCode;
        [Column("2")]
        public float PassengerCount;
        [Column("3")]
        public float TripTime;
        [Column("4")]
        public float TripDistance;
        [Column("5")]
        public string PaymentType;
        [Column("6")]
        public float FareAmount;
    }

    public class TaxiTripFarePrediction
    {
        [ColumnName("Score")]
        public float FareAmount;
    }

    static class TestTrips
    {
        internal static readonly TaxiTrip Trip1 = new TaxiTrip
        {
            VendorId = "VTS",
            RateCode = "1",
            PassengerCount = 1,
            TripDistance = 10.33f,
            PaymentType = "CSH",
            FareAmount = 0 // predict it. actual = 29.5
        };
    }
}

Posted in Machine Learning

Predicting linear regression with Tensorflow and Azure Machine Learning Studio (Comparison, Gradient descent)

Posted by vivekcek on September 20, 2017

In this post I compare the predictions made by TensorFlow and Azure Machine Learning Studio.
I am using a dataset from the Coursera machine learning course. The dataset is provided at the end of this post.
Here are the predicted values I got from TensorFlow and Azure Machine Learning for the input “8.5172”.

Azure

Tensorflow

I used linear regression with a gradient descent optimizer. Below are the values used for the number of epochs and the learning rate in both TensorFlow and Azure ML.
Epoch=1000
Learning rate=0.01

The Azure model and algorithm settings are given below.

The TensorFlow code is given below.

# Written against the TensorFlow 1.x API (tf.placeholder, tf.Session).
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf

# Parameters
display_step = 50
learning_rate = 0.01
training_epochs = 1000

data = pd.read_csv('ex1data1.txt', names=['population', 'profit'])

X_data = data[['population']]
Y_data = data[['profit']]

n_samples = X_data.shape[0]  # Number of rows

# tf Graph Input
X = tf.placeholder('float', shape=X_data.shape)
Y = tf.placeholder('float', shape=Y_data.shape)

# Set model weights
W = tf.Variable(tf.zeros([1, 1]), name='weight')
b = tf.Variable(tf.zeros(1), name='bias')

# Construct a linear model
pred = tf.add(tf.multiply(X, W), b)

# Mean squared error
# cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
cost = tf.reduce_mean(tf.square(pred-Y)) / 2.0

# Gradient descent
# may try other optimizers like AdadeltaOptimizer, AdagradOptimizer, AdamOptimizer, FtrlOptimizer or RMSPropOptimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    cost_value, w_value, b_value = (0.0, 0.0, 0.0)
    for epoch in range(training_epochs):
        # Fit all training data
        _, cost_value, w_value, b_value = sess.run((optimizer, cost, W, b), feed_dict={X: X_data, Y: Y_data})

        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            print('Epoch:', '%04d' % (epoch+1), 'cost=', '{:.9f}'.format(cost_value), \
                'W=', w_value, 'b=', b_value)

    print('Optimization Finished!')
    print('Training cost=', cost_value, 'W=', w_value, 'b=', b_value, '\n')
    print('Evaluation')
    print(w_value*8.5172+b_value)
    # Graphic display
    plt.plot(X_data, Y_data, 'ro', label='Original data')
    plt.plot(X_data, w_value * X_data + b_value, label='Fitted line')
    plt.legend()
    plt.show()

Please use the dataset below. Save it as ex1data1.txt, the file name the code above reads.

6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483
8.5781,12
6.4862,6.5987
5.0546,3.8166
5.7107,3.2522
14.164,15.505
5.734,3.1551
8.4084,7.2258
5.6407,0.71618
5.3794,3.5129
6.3654,5.3048
5.1301,0.56077
6.4296,3.6518
7.0708,5.3893
6.1891,3.1386
20.27,21.767
5.4901,4.263
6.3261,5.1875
5.5649,3.0825
18.945,22.638
12.828,13.501
10.957,7.0467
13.176,14.692
22.203,24.147
5.2524,-1.22
6.5894,5.9966
9.2482,12.134
5.8918,1.8495
8.2111,6.5426
7.9334,4.5623
8.0959,4.1164
5.6063,3.3928
12.836,10.117
6.3534,5.4974
5.4069,0.55657
6.8825,3.9115
11.708,5.3854
5.7737,2.4406
7.8247,6.7318
7.0931,1.0463
5.0702,5.1337
5.8014,1.844
11.7,8.0043
5.5416,1.0179
7.5402,6.7504
5.3077,1.8396
7.4239,4.2885
7.6031,4.9981
6.3328,1.4233
6.3589,-1.4211
6.2742,2.4756
5.6397,4.6042
9.3102,3.9624
9.4536,5.4141
8.8254,5.1694
5.1793,-0.74279
21.279,17.929
14.908,12.054
18.959,17.054
7.2182,4.8852
8.2951,5.7442
10.236,7.7754
5.4994,1.0173
20.341,20.992
10.136,6.6799
7.3345,4.0259
6.0062,1.2784
7.2259,3.3411
5.0269,-2.6807
6.5479,0.29678
7.5386,3.8845
5.0365,5.7014
10.274,6.7526
5.1077,2.0576
5.7292,0.47953
5.1884,0.20421
6.3557,0.67861
9.7687,7.5435
6.5159,5.3436
8.5172,4.2415
9.1802,6.7981
6.002,0.92695
5.5204,0.152
5.0594,2.8214
5.7077,1.8451
7.6366,4.2959
5.8707,7.2029
5.3054,1.9869
8.2934,0.14454
13.394,9.0551
5.4369,0.61705

Posted in Azure, Machine Learning, Tensorflow

Linear Regression With TensorFlow – Part1

Posted by vivekcek on September 18, 2017

Hi guys, in this blog post I explain how we can implement simple linear regression with TensorFlow. Simple linear regression means regression with a single input and a single output.

In future posts I will cover:

1. Linear regression with multiple features.
2. Polynomial regression.
3. Regularized/Normalized linear regression.
4. Linear regression with external data.

Today I want to keep the problem simple, so I am using some inline data for the analysis.

I hope you have some knowledge of machine learning. If not, please take a course on Coursera. Coursera has a good machine learning course by Andrew Ng.

Do we really need TensorFlow to do linear regression? We could just as well implement it in Octave, MATLAB, plain Python, scikit-learn and so on.

Understanding the mathematics and statistics behind linear regression is more important than the tool we are going to use.

So how will we approach this problem?

First we need to check that the input and output in our data are roughly linearly related. For that we plot the data. You can use the code below to do so.

import matplotlib.pyplot as plt
import numpy 

train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                         7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                         2.827,3.465,1.65,2.904,2.42,2.94,1.3])
plt.plot(train_X, train_Y, 'ro', label='Original data')
plt.legend()
plt.show()

From the plot it is clear that the data follows a roughly linear trend, so we can use simple linear regression with it.
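
Besides looking at the plot, you can put a number on the linear relationship with the Pearson correlation coefficient. This is an optional check that I am adding as a sketch; the pearson_r helper is hypothetical and uses only plain Python on the same inline data. Values close to 1 or -1 indicate a strong linear relationship.

```python
import math

train_X = [3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
           7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1]
train_Y = [1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
           2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3]


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(train_X, train_Y)
print("Pearson r =", r)
```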

Next we are going to define our hypothesis, cost function and optimizer. I assume the reader knows what a cost is and how it is minimized.

For linear regression the hypothesis we are going to use is the equation of a straight line.

hypothesis (prediction) = WX + b, where W is the slope, b is the y-intercept and X is our input.
In TensorFlow these are called the weight (W) and the bias (b).

Next, what is the cost? The cost measures how far the predictions are from the actual values. We need to minimize this cost to find the best W and b.

The cost function for linear regression is given below.
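
Written out for the n training pairs (x_i, y_i), and assuming the mean squared error form that the code in this post computes with tf.reduce_mean(tf.square(hypothesis - Y)), the cost is:

```latex
J(W, b) = \frac{1}{n} \sum_{i=1}^{n} \left( W x_i + b - y_i \right)^2
```

Some texts divide by 2n instead of n. That changes the scale of the cost, but not the W and b that minimize it.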

To minimize the cost we are going to use the gradient descent algorithm.
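
To show what the optimizer does under the hood, here is a plain Python sketch of batch gradient descent on the mean squared error, using the same inline data, learning rate and number of epochs as this post. It is my own illustration, not what TensorFlow runs internally; the function names are hypothetical and I start from W = b = 0 instead of random values so the run is repeatable.

```python
train_x = [3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
           7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1]
train_y = [1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
           2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3]


def mse_cost(w, b):
    """Mean squared error of the line w * x + b on the training data."""
    n = len(train_x)
    return sum((w * x + b - y) ** 2 for x, y in zip(train_x, train_y)) / n


def gradient_step(w, b, learning_rate):
    """One gradient descent update of w and b on the mean squared error."""
    n = len(train_x)
    errors = [w * x + b - y for x, y in zip(train_x, train_y)]
    grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, train_x))
    grad_b = 2.0 / n * sum(errors)
    return w - learning_rate * grad_w, b - learning_rate * grad_b


w, b = 0.0, 0.0  # assumption: fixed start instead of random initial values
initial_cost = mse_cost(w, b)
for epoch in range(1000):
    w, b = gradient_step(w, b, learning_rate=0.01)
final_cost = mse_cost(w, b)
print("W =", w, "b =", b, "cost =", final_cost)
```

Each step moves W and b a small distance against the gradient of the cost, so the cost shrinks from epoch to epoch as long as the learning rate is small enough.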

The full code is given below.

from __future__ import print_function

import tensorflow as tf
import numpy
import matplotlib.pyplot as plt
rng = numpy.random

# Hyperparameters
learning_rate = 0.01
training_epochs = 1000
display_step = 50

# Training data
train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                         7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                         2.827,3.465,1.65,2.904,2.42,2.94,1.3])
n_samples = train_X.shape[0]

# Graph inputs (TensorFlow 1.x placeholders)
X = tf.placeholder("float")
Y = tf.placeholder("float")

# Model weight and bias, randomly initialized
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

# Linear hypothesis: X * W + b
hypothesis = tf.add(tf.multiply(X, W), b)

# Mean squared error cost
cost = tf.reduce_mean(tf.square(hypothesis - Y))

# Gradient descent optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Op that initializes the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:

    sess.run(init)
    for epoch in range(training_epochs):
        sess.run(optimizer, feed_dict={X: train_X, Y: train_Y})


        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: train_X, Y:train_Y})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c), \
                "W=", sess.run(W), "b=", sess.run(b))

    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
    print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')

    # Graphic display
    plt.plot(train_X, train_Y, 'ro', label='Original data')
    plt.plot(train_X, sess.run(W) * train_X + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()

    # Testing example
    test_X = numpy.asarray([6.83, 4.668, 8.9, 7.91, 5.7, 8.7, 3.1, 2.1])
    test_Y = numpy.asarray([1.84, 2.273, 3.2, 2.831, 2.92, 3.24, 1.35, 1.03])

    print("Testing... (Mean square loss Comparison)")
    testing_cost = sess.run(
        cost,
        feed_dict={X: test_X, Y: test_Y})  # same function as cost above
    print("Testing cost=", testing_cost)
    print("Absolute mean square loss difference:", abs(
        training_cost - testing_cost))

    plt.plot(test_X, test_Y, 'bo', label='Testing data')
    plt.plot(train_X, sess.run(W) * train_X + sess.run(b), label='Fitted line')
    plt.legend()
    plt.show()

Posted in Machine Learning, Tensorflow