Sunday, June 9, 2019

I saw Mark Kalal speak on ML.NET at the Twin Cities .NET User Group on Thursday night.

Microsoft.ML is the NuGet package to get for a Visual Studio 2019 project, but independent of that the ML.NET Model Builder is a pretty good thing to download from here. Once you have it, you may right-click on a project in Visual Studio 2019 and pick "Add" and then "Machine Learning" from beneath the "Add" to loop in some of the wonders we are about to discuss.

This was the second tech talk I had seen on machine learning after this one (it touched on supervised learning, which breaks into classification and regression) and naturally there was some overlap. I will try to focus on what was new in what is ahead. Unsupervised learning breaks into clustering (Have you seen the film Rangasthalam?) and association (Have you seen the films Rangasthalam and A Quiet Place both?), and Netflix uses this to profile you and make recommendations. Reinforcement learning gets into positive and negative reinforcement. For example, Arthur Samuel coined the term machine learning while attempting to build the perfect checkers player, a program that learned from experience by way of positive and negative reinforcement.

If you want to build something like conceptnet.io (it helps computers understand the semantics of English words), or a spam filter for email, or something like Facebook's facial recognition, there are a few tools available to you: Keras; TensorFlow (it supports Nvidia GPUs and even TPUs, tensor processing units, which are hardware built specifically for machine learning); Microsoft Cognitive Services; Weka in the Java space; MathWorks; Prophet; Python, which has about 90% of the pie of what has been baked in the machine learning space; and, yes, ML.NET, which is what we care about.

Let us say, in a simple example, that we want to write a program that makes sense of images of fruits and categorizes them. Well, we could maybe tell an apple and a banana apart by matching on the color red for an apple and the color yellow for a banana, right? What if the apple is green and we have an unripe green banana in a different image though? Rules pick up exceptions, and then more rules, and soon you have to give up on rules and try to have an algorithm. The good news is that once data starts to pick up in volume, the various algorithms you try tend to become equal in their ability to predict. I'm sure there are some exceptions to this in that an algorithm probably needs to be "legit," but assuming that it is, it will eventually take you to the same place all of the competent rivals do. Volume (think big data) is one of three traits of your inputs for your algorithms to pay attention to, with the others being velocity and variety. In the case of velocity, be wary of shoving in data faster than you can process it. ONNX, which is pronounced like onyx and stands for Open Neural Network Exchange, is a file format standard for the machine learning space. 90% of all of the data that humanity has gathered digitally has been collected in the past two years, so the big data space is blossoming now. data.gov and data.worldbank.org are go-to spots. Some POCO code for a Fruit that was presented looked like so:

public class Fruit
{
   // LoadColumn (from Microsoft.ML.Data) maps a CSV column index to a property
   [LoadColumn(0)]
   public string Color { get; set; }
   
   [LoadColumn(1)]
   public string Size { get; set; }
   
   [LoadColumn(2)]
   public string Class { get; set; }
}
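The LoadColumn attributes above map to comma-separated column positions in fruit.csv. The real training file wasn't shown (and the Color = "2" in the prediction sample later suggests its values may even have been numerically coded), but a hypothetical file consistent with the attributes would look like:

```
Color,Size,Class
red,small,apple
yellow,long,banana
green,small,apple
green,long,banana
```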


We will try to get a FruitPrediction for each Fruit and a FruitPrediction looks like this:

public class FruitPrediction
{
   // filled in from the pipeline's "PredictedLabel" output column
   [ColumnName("PredictedLabel")]
   public string Class;
}
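Before the ML.NET pipeline below, it is worth making the rules-pick-up-exceptions point from the fruit discussion concrete. This is a hypothetical sketch of mine, not from the talk, of where the hand-rolled alternative heads:

```csharp
// A hand-written rule-based classifier: every edge case (green apples,
// unripe green bananas) demands another rule, and the rule list only
// ever grows -- which is exactly the treadmill an ML algorithm sidesteps.
public static string ClassifyByRules(string color, string size)
{
    if (color == "red") return "apple";
    if (color == "yellow") return "banana";
    if (color == "green" && size == "small") return "apple";  // green apples exist...
    if (color == "green" && size == "long") return "banana";  // ...and so do unripe bananas
    return "unknown";                                         // and the exceptions keep coming
}
```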


The following code was shown off by Mark Kalal in an attempt to get predictions. I took pictures of it with my phone and some of it got cut off. This code was a procedural blob inside of a method in a C# console application.

MLContext mlContext;
string trainDataPath = "c:\\dev\\Data\\fruit.csv";
PredictionEngine<Fruit, FruitPrediction> predEngine;
ITransformer trainedModel;
IDataView trainingDataView;

mlContext = new MLContext();

trainingDataView = mlContext.Data.LoadFromTextFile<Fruit>(trainDataPath, hasHeader: true, separatorChar: ',');

// my photos cut several of the lines below off partway; the endings are my
// best reconstruction from the standard ML.NET multiclass pipeline shape
var pipeline = mlContext.Transforms.Conversion.MapValueToKey(inputColumnName: "Class", outputColumnName: "Label")
   .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "Color",
         outputColumnName: "ColorFeaturized"))
   .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "Size",
         outputColumnName: "SizeFeaturized"))
   .Append(mlContext.Transforms.Concatenate("Features", "ColorFeaturized",
         "SizeFeaturized"));

// the slide showed the StochasticDualCoordinateAscent trainer from the
// preview bits; ML.NET 1.x renamed it SdcaMaximumEntropy
var trainingPipeline = pipeline.Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
   .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

trainedModel = trainingPipeline.Fit(trainingDataView);

Fruit singleIssue = new Fruit() { Color = "2", Size = "3" };
// ML.NET 1.x hangs this off mlContext.Model instead of the model itself
predEngine = mlContext.Model.CreatePredictionEngine<Fruit, FruitPrediction>(trainedModel);

var prediction = predEngine.Predict(singleIssue);
Console.WriteLine("Prediction - Result: " + prediction.Class);
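The talk stopped at a single in-process prediction, but the usual next step (not shown Thursday; this is the stock ML.NET 1.x model persistence API as I understand it, picking up the variables from the code above) is to save the model and load it back elsewhere:

```csharp
// Save the trained model plus its input schema to a .zip for later use.
mlContext.Model.Save(trainedModel, trainingDataView.Schema, "c:\\dev\\Data\\fruitModel.zip");

// Later, perhaps in another process, load it back and predict again.
DataViewSchema schema;
ITransformer loadedModel = mlContext.Model.Load("c:\\dev\\Data\\fruitModel.zip", out schema);
var engine = mlContext.Model.CreatePredictionEngine<Fruit, FruitPrediction>(loadedModel);
```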


MLContext sounds a lot like DbContext, huh? Ha ha. Microsoft names...
