A day with .Net

My day to day experience in .net

Archive for September, 2018

The Democratization Of Machine Learning With ML.NET – Linear Regression With Gradient Boosted Tree (Ensemble learning method)

Posted by vivekcek on September 26, 2018

NB: This post is based on ML.NET version 0.4. Expect some changes in version 0.5.

Microsoft has decided to make Machine Learning accessible to .NET developers through ML.NET.
The ML.NET API is simple and easy to learn. You can forget about large mathematical equations and implementations of algorithms.

The diagram below shows the process of building models using ML.NET.

1. Prepare your data.
2. Decide the algorithm to use.
3. Train a model using the selected algorithm.
4. Evaluate the model.
5. Save the model.
6. Use the saved model in your .NET applications.

[Image 1: the process of building models using ML.NET]

The diagram below shows the main components of ML.NET.

1. Transforms: Use these when you want to transform your data, for example to convert categorical values to numerical vectors.
2. Learners: The available algorithms. Select one that fits your data and problem.
3. Misc: APIs for loading data, evaluating models, and so on.

[Image 2: the main components of ML.NET]

ML.NET follows a pipeline architecture. The pipeline architecture is shown below.

In this post, I will build a regression model with ML.NET. The problem we are going to solve is predicting New York taxi fares.
The data is divided into a training set and a test set. Download both from the links below.

Train Data
Test Data

Our data includes the columns below. The last column, “fare_amount”, is our label (the value to be predicted).

[Image 3: the columns of the dataset]

ML.NET is cross-platform, which means you can do this on any OS that has .NET Core 2.0 installed. You can use either VS Code or Visual Studio. I will use VS Code and the .NET Core CLI.

Follow the steps below (the completed source code is provided at the end).

1. Make sure the .NET Core 2.0 SDK is installed.

2. Create a .NET Core console app from the command prompt.

dotnet new console -o myApp
cd myApp

3. Install the ML.NET package.

dotnet add package Microsoft.ML --version 0.4.0

4. Download and save the training and test datasets from the links above.

5. Include the namespaces below.

using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Models;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using System.Threading.Tasks;
using Microsoft.ML.Runtime.Api;

6. Create two classes that represent our input and output.

    public class TaxiTrip
    {
        [Column("0")]
        public string VendorId;
        [Column("1")]
        public string RateCode;
        [Column("2")]
        public float PassengerCount;
        [Column("3")]
        public float TripTime;
        [Column("4")]
        public float TripDistance;
        [Column("5")]
        public string PaymentType;
        [Column("6")]
        public float FareAmount;
    }

    public class TaxiTripFarePrediction
    {
        [ColumnName("Score")]
        public float FareAmount;
    }

7. Declare the data paths. Store your CSV files inside a folder named “Data”.

static readonly string _datapath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-train.csv");
static readonly string _testdatapath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-test.csv");
static readonly string _modelpath = Path.Combine(Environment.CurrentDirectory, "Data", "Model.zip");

8. Declare the learning pipeline and load the data.

var pipeline = new LearningPipeline();
pipeline.Add(new TextLoader(_datapath).CreateFrom<TaxiTrip>(useHeader: true, separator: ','));

After this step, the data looks like below in memory.

[Image 4: the loaded data in memory]
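
Roughly speaking, the loader reads each line, splits it on the separator, and maps the cells to fields by position (the numbers in the Column attributes). The sketch below imitates that in plain C# for a single line. It is only an illustration and not how TextLoader is implemented; CsvRowSketch and Row are made-up names.

```csharp
using System;
using System.Globalization;

// Illustration only: not the TextLoader implementation, no ML.NET types involved.
public static class CsvRowSketch
{
    // Mirrors the fields of TaxiTrip, in column order 0..6.
    public sealed class Row
    {
        public string VendorId;
        public string RateCode;
        public float PassengerCount;
        public float TripTime;
        public float TripDistance;
        public string PaymentType;
        public float FareAmount;
    }

    // Splits one data line on ',' and maps each cell to a field by position.
    // The header line is assumed to have been skipped already.
    public static Row Parse(string line)
    {
        string[] cells = line.Split(',');
        if (cells.Length != 7)
        {
            throw new FormatException("Expected 7 columns but found " + cells.Length);
        }

        return new Row
        {
            VendorId = cells[0],
            RateCode = cells[1],
            PassengerCount = float.Parse(cells[2], CultureInfo.InvariantCulture),
            TripTime = float.Parse(cells[3], CultureInfo.InvariantCulture),
            TripDistance = float.Parse(cells[4], CultureInfo.InvariantCulture),
            PaymentType = cells[5],
            FareAmount = float.Parse(cells[6], CultureInfo.InvariantCulture)
        };
    }
}
```

For example, CsvRowSketch.Parse("VTS,1,1,0,10.33,CSH,29.5") gives a row with TripDistance 10.33 and FareAmount 29.5. That line reuses the values of the test trip shown later in this post, with a placeholder 0 for the trip time.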

9. Copy the FareAmount column to the Label column.

pipeline.Add(new ColumnCopier(("FareAmount", "Label")));

In this step a new column (Label) is added to our data by copying the values of FareAmount.

[Image 5: the data with the new Label column]

10. Convert string values to numerical vectors, because our algorithm only supports numerical input.

pipeline.Add(new CategoricalOneHotVectorizer("VendorId",
                                             "RateCode",
                                             "PaymentType"));

After this step the data will look like below. String values are now converted to numerical vectors.

[Image 6: the data after converting string values to numerical vectors]
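
If one-hot encoding is new to you, the idea can be sketched in a few lines of plain C#. This is only an illustration of the concept, not ML.NET's implementation; OneHotSketch and its methods are made-up names. Each distinct string gets a slot in a vector, and a value is encoded as a vector with a 1 in its own slot and 0 everywhere else.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: not part of ML.NET.
public static class OneHotSketch
{
    // Collects the distinct values in order of first appearance.
    public static List<string> BuildVocabulary(IEnumerable<string> values)
    {
        var vocabulary = new List<string>();
        foreach (string value in values)
        {
            if (!vocabulary.Contains(value))
            {
                vocabulary.Add(value);
            }
        }
        return vocabulary;
    }

    // Encodes a value as a vector with a single 1 at the value's index.
    // A value that is not in the vocabulary encodes as all zeros.
    public static float[] Encode(string value, IList<string> vocabulary)
    {
        var vector = new float[vocabulary.Count];
        int index = vocabulary.IndexOf(value);
        if (index >= 0)
        {
            vector[index] = 1f;
        }
        return vector;
    }
}
```

With a payment type column containing "CSH", "CRD", "CSH" (the second value is an assumed example), the vocabulary is ["CSH", "CRD"], so "CSH" encodes to [1, 0] and "CRD" to [0, 1].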

11. Now copy all of our input columns into a single column named Features.

pipeline.Add(new ColumnConcatenator("Features",
                                    "VendorId",
                                    "RateCode",
                                    "PassengerCount",
                                    "TripDistance",
                                    "PaymentType"));

In this step a new column (Features) is added to our data structure.

[Image 7: the data with the new Features column]
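
Conceptually, the Features column is just all the numeric inputs of a row laid end to end in one vector. A minimal sketch of that idea in plain C# follows; ConcatSketch is a made-up name and this is not how ColumnConcatenator is implemented.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: not part of ML.NET.
public static class ConcatSketch
{
    // Joins several numeric vectors into one, keeping their order.
    public static float[] Concatenate(params float[][] parts)
    {
        var features = new List<float>();
        foreach (float[] part in parts)
        {
            features.AddRange(part);
        }
        return features.ToArray();
    }
}
```

For example, joining a one-hot vector [1, 0], a passenger count [1] and a trip distance [10.33] gives the four-element vector [1, 0, 1, 10.33].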

12. Next is the important step of selecting the algorithm. In our case I am using FastTreeRegressor,
a gradient boosted decision tree algorithm, which tends to perform better on nonlinear datasets.

pipeline.Add(new FastTreeRegressor());
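
To give a feel for what a gradient boosted tree learner does, here is a minimal sketch in plain C#. It is not FastTree's implementation and uses no ML.NET types; BoostingSketch, Stump, Model, FitStump and Train are made-up names. It assumes squared loss, a single feature, and one-split trees (stumps): each round fits a stump to the current residuals and adds a scaled copy of it to the model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: not FastTree's implementation.
public static class BoostingSketch
{
    // A one-split regression tree ("stump") over a single feature.
    public sealed class Stump
    {
        public double Threshold;
        public double LeftValue;  // returned when x <= Threshold
        public double RightValue; // returned when x > Threshold

        public double Predict(double x)
        {
            return x <= Threshold ? LeftValue : RightValue;
        }
    }

    // The boosted model: a base value plus a sum of scaled stumps.
    public sealed class Model
    {
        public double BaseValue;
        public double LearningRate;
        public readonly List<Stump> Stumps = new List<Stump>();

        public double Predict(double x)
        {
            return BaseValue + Stumps.Sum(s => LearningRate * s.Predict(x));
        }
    }

    // Finds the split that minimizes squared error against the targets.
    public static Stump FitStump(double[] xs, double[] targets)
    {
        // Fallback when no split helps: predict the mean everywhere.
        double mean = targets.Average();
        var best = new Stump { Threshold = double.PositiveInfinity, LeftValue = mean, RightValue = mean };
        double bestError = targets.Sum(t => (t - mean) * (t - mean));

        foreach (double threshold in xs.Distinct())
        {
            var left = new List<double>();
            var right = new List<double>();
            for (int i = 0; i < xs.Length; i++)
            {
                (xs[i] <= threshold ? left : right).Add(targets[i]);
            }
            if (right.Count == 0) continue; // every point on one side: not a split

            double leftMean = left.Average();
            double rightMean = right.Average();
            double error = left.Sum(t => (t - leftMean) * (t - leftMean))
                         + right.Sum(t => (t - rightMean) * (t - rightMean));
            if (error < bestError)
            {
                bestError = error;
                best = new Stump { Threshold = threshold, LeftValue = leftMean, RightValue = rightMean };
            }
        }
        return best;
    }

    // Trains by repeatedly fitting a stump to the residuals (y - prediction).
    // For squared loss the residual is the negative gradient, hence "gradient" boosting.
    public static Model Train(double[] xs, double[] ys, int rounds, double learningRate)
    {
        var model = new Model { BaseValue = ys.Average(), LearningRate = learningRate };
        double[] predictions = Enumerable.Repeat(model.BaseValue, ys.Length).ToArray();

        for (int round = 0; round < rounds; round++)
        {
            double[] residuals = ys.Select((y, i) => y - predictions[i]).ToArray();
            Stump stump = FitStump(xs, residuals);
            model.Stumps.Add(stump);
            for (int i = 0; i < xs.Length; i++)
            {
                predictions[i] += learningRate * stump.Predict(xs[i]);
            }
        }
        return model;
    }
}
```

On a step-shaped toy dataset such as xs = {1, 2, 3, 4} and ys = {2, 2, 10, 10}, a straight line cannot fit the jump, while a few rounds of boosting with a learning rate of 0.5 bring the predictions close to 2 and 10. This is the kind of nonlinear pattern where tree-based learners do well.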

13. Now we train our model. Our algorithm will use the Features and Label columns.

PredictionModel<TaxiTrip, TaxiTripFarePrediction> model = pipeline.Train<TaxiTrip, TaxiTripFarePrediction>();

What we have done so far can be represented as below.

14. Once you have trained the model, you can evaluate its performance by checking RSquared. The closer RSquared is to 1, the better the model fits the test data.

var testData = new TextLoader(_testdatapath).CreateFrom<TaxiTrip>(useHeader: true, separator: ',');

var evaluator = new RegressionEvaluator();
RegressionMetrics metrics = evaluator.Evaluate(model, testData);

Console.WriteLine($"Rms = {metrics.Rms}");
Console.WriteLine($"RSquared = {metrics.RSquared}");

[Image 8]
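
The evaluator computes these metrics for you, but it helps to know what they mean. The sketch below uses the usual textbook definitions in plain C#; MetricsSketch is a made-up name, and the library's own computation may differ in detail. Rms is the square root of the mean squared difference between actual and predicted values, and RSquared is 1 minus the ratio of the model's squared error to the squared error of always predicting the mean.

```csharp
using System;
using System.Linq;

// Illustration only: textbook definitions, not ML.NET's RegressionEvaluator.
public static class MetricsSketch
{
    // Root mean squared error: sqrt(mean((actual - predicted)^2)). Lower is better.
    public static double Rms(double[] actual, double[] predicted)
    {
        double sumOfSquares = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            double diff = actual[i] - predicted[i];
            sumOfSquares += diff * diff;
        }
        return Math.Sqrt(sumOfSquares / actual.Length);
    }

    // R squared: 1 - (residual sum of squares / total sum of squares).
    // 1 means a perfect fit; 0 means no better than predicting the mean.
    // Not meaningful when all actual values are identical (total is 0).
    public static double RSquared(double[] actual, double[] predicted)
    {
        double mean = actual.Average();
        double residual = 0;
        double total = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            residual += (actual[i] - predicted[i]) * (actual[i] - predicted[i]);
            total += (actual[i] - mean) * (actual[i] - mean);
        }
        return 1 - residual / total;
    }
}
```

For actual values {1, 2, 3}, a perfect prediction gives Rms 0 and RSquared 1, while always predicting the mean {2, 2, 2} gives RSquared 0.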

15. Now predict a new value as below.

TaxiTripFarePrediction prediction = model.Predict(TestTrips.Trip1);
Console.WriteLine("Predicted fare: {0}, actual fare: 29.5", prediction.FareAmount);

[Image 9]

16. Here TestTrips is a static class, defined as below.


    static class TestTrips
    {
        internal static readonly TaxiTrip Trip1 = new TaxiTrip
        {
            VendorId = "VTS",
            RateCode = "1",
            PassengerCount = 1,
            TripDistance = 10.33f,
            PaymentType = "CSH",
            FareAmount = 0 // predict it. actual = 29.5
        };
    }

17. To save and retrieve your model, use the code below. Both calls are awaited, so they must run inside an async method.

await model.WriteAsync(_modelpath);
PredictionModel<TaxiTrip, TaxiTripFarePrediction> loadedModel = await PredictionModel.ReadAsync<TaxiTrip, TaxiTripFarePrediction>(_modelpath);

18. The full source code is given below.

using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Models;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using System.Threading.Tasks;
using Microsoft.ML.Runtime.Api;

namespace LinearRegression
{

    class Program
    {

        static readonly string _datapath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-train.csv");
        static readonly string _testdatapath = Path.Combine(Environment.CurrentDirectory, "Data", "taxi-fare-test.csv");
        static readonly string _modelpath = Path.Combine(Environment.CurrentDirectory, "Data", "Model.zip");

        static void Main(string[] args)
        {
            var pipeline = new LearningPipeline();
            pipeline.Add(new TextLoader(_datapath).CreateFrom<TaxiTrip>(useHeader: true, separator: ','));
            pipeline.Add(new ColumnCopier(("FareAmount", "Label")));
            pipeline.Add(new CategoricalOneHotVectorizer("VendorId",
                                             "RateCode",
                                             "PaymentType"));

            pipeline.Add(new ColumnConcatenator("Features",
                                    "VendorId",
                                    "RateCode",
                                    "PassengerCount",
                                    "TripDistance",
                                    "PaymentType"));

            pipeline.Add(new FastTreeRegressor());

            PredictionModel<TaxiTrip, TaxiTripFarePrediction> model = pipeline.Train<TaxiTrip, TaxiTripFarePrediction>();
            // To save and reload the model, make Main async (C# 7.1 or later) and use:
            //await model.WriteAsync(_modelpath);
            //PredictionModel<TaxiTrip, TaxiTripFarePrediction> loadedModel = await PredictionModel.ReadAsync<TaxiTrip, TaxiTripFarePrediction>(_modelpath);

            var testData = new TextLoader(_testdatapath).CreateFrom<TaxiTrip>(useHeader: true, separator: ',');

            var evaluator = new RegressionEvaluator();
            RegressionMetrics metrics = evaluator.Evaluate(model, testData);

            Console.WriteLine($"Rms = {metrics.Rms}");
            Console.WriteLine($"RSquared = {metrics.RSquared}");

            TaxiTripFarePrediction prediction = model.Predict(TestTrips.Trip1);
            Console.WriteLine("Predicted fare: {0}, actual fare: 29.5", prediction.FareAmount);

            Console.Read();
        }
    }

    public class TaxiTrip
    {
        [Column("0")]
        public string VendorId;
        [Column("1")]
        public string RateCode;
        [Column("2")]
        public float PassengerCount;
        [Column("3")]
        public float TripTime;
        [Column("4")]
        public float TripDistance;
        [Column("5")]
        public string PaymentType;
        [Column("6")]
        public float FareAmount;
    }

    public class TaxiTripFarePrediction
    {
        [ColumnName("Score")]
        public float FareAmount;
    }

    static class TestTrips
    {
        internal static readonly TaxiTrip Trip1 = new TaxiTrip
        {
            VendorId = "VTS",
            RateCode = "1",
            PassengerCount = 1,
            TripDistance = 10.33f,
            PaymentType = "CSH",
            FareAmount = 0 // predict it. actual = 29.5
        };
    }
}
