Make predictions with ML.NET models without defining schema classes

Introduction

To make predictions with ML.NET models, you often have to define schema classes for your model inputs and outputs. In previous posts, I wrote about how you can use Netron to inspect an ML.NET model to determine the names and types of your model inputs and outputs. If you have samples of your input and output data in JSON format, you can automate the generation of model input and output classes by using Visual Studio's "Paste JSON as Classes" feature. However, what if you want to make predictions without defining these classes? In this post, I'll show how you can use the .NET DataFrame API to make predictions with ML.NET models without having to create model input and output classes. Code snippets are in F#, but notebooks with complete C# and F# code can be found in the mlnet-noschema-predictions repo on GitHub.

Install and reference packages

In addition to the Microsoft.ML ML.NET NuGet package, you'll also need the Microsoft.Data.Analysis NuGet package to use the .NET DataFrame API. For more information on the .NET DataFrame API, see an introduction to DataFrame.
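
If you're working in an F# script or a notebook, one way to pull in both packages is with nuget reference directives. This is a minimal sketch: no versions are pinned here, and you may prefer to pin them for reproducibility.

```fsharp
// Resolve the packages from NuGet (F# script or notebook only).
#r "nuget: Microsoft.ML"
#r "nuget: Microsoft.Data.Analysis"
```

In a compiled project, adding the same two packages with dotnet add package serves the same purpose.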

Once your packages are installed, reference them in your application.

open Microsoft.ML
open Microsoft.Data.Analysis

Initialize MLContext and load the model

The MLContext is the entry point of ML.NET applications. Use it to load your model. The model used in this case categorizes sentiment as positive or negative. See the "Use Netron to inspect an ML.NET model" blog post to learn more about the model.

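// Create the MLContext and load the model. In F#, the input schema
// (an out parameter in C#) comes back as the second element of a tuple.
let ctx = MLContext()
let model, schema = ctx.Model.Load("sentiment_model.zip")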

Both the model and the input schema are returned when you load the model. The input schema is a DataViewSchema object containing a collection of Columns.
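
To see what that collection holds, you can iterate over the schema and print each column's index, name, and type. The sketch below builds a stand-in schema with DataViewSchema.Builder so that it runs without a model file. The single SentimentText column is an assumption for illustration, and describeColumns is a hypothetical helper. With a real model, you would pass in the schema returned by ctx.Model.Load instead.

```fsharp
#r "nuget: Microsoft.ML"

open Microsoft.ML
open Microsoft.ML.Data

// Stand-in schema (assumption): one text column named "SentimentText".
// With a real model, use the schema returned by ctx.Model.Load instead.
let builder = DataViewSchema.Builder()
builder.AddColumn("SentimentText", TextDataViewType.Instance)
let exampleSchema = builder.ToSchema()

// Hypothetical helper: describe every column as "index: name (type)".
let describeColumns (schema: DataViewSchema) =
    schema
    |> Seq.map (fun column -> sprintf "%d: %s (%O)" column.Index column.Name column.Type)
    |> List.ofSeq

describeColumns exampleSchema |> List.iter (printfn "%s")
```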

Define input and output column names

The input and output column names are for the DataFrames containing your input data and predictions. They help the model map the input and output values.

Use the schema that was loaded with the model to get the names of your input columns.

let inputColumnNames = 
    schema 
    |> Seq.map(fun column -> column.Name) 
    |> Array.ofSeq

Since this is a binary classification model, by default only two columns are returned as part of the prediction: PredictedLabel and Score.

You can create an array containing the names of these columns. For more information on default output columns, see the ML.NET Tasks documentation.

let outputColumnNames = [| "PredictedLabel" ; "Score" |]

Create input data for predictions

Use the LoadCsvFromString method to load your input data into a DataFrame. In this case, there's only one column and one data instance, so I represent it as a string literal. I also pass in the names of the input columns.

let sampleInput = "This was a very bad steak"

let inputDataFrame = 
    DataFrame.LoadCsvFromString(
        sampleInput, 
        header=false, 
        columnNames=inputColumnNames)
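
The same call can also load more than one row, since each line of the string is parsed as a row. The sketch below rests on a few assumptions: the SentimentText column name is hypothetical (in this post it comes from the model's schema), the second review is made-up sample text, and a tab separator is used so that commas inside the text aren't treated as field delimiters.

```fsharp
#r "nuget: Microsoft.Data.Analysis"

open Microsoft.Data.Analysis

// Hypothetical column name; normally taken from the model's input schema.
let columnNames = [| "SentimentText" |]

// Made-up sample reviews, one per row.
let reviews =
    [ "This was a very bad steak"
      "Great food, friendly staff" ]

// Join the reviews with newlines so each one is parsed as its own row.
let csv = String.concat "\n" reviews

// A tab separator keeps the comma in the second review inside one field.
let reviewsDataFrame =
    DataFrame.LoadCsvFromString(
        csv,
        separator = '\t',
        header = false,
        columnNames = columnNames)

printfn "Rows: %d, Columns: %d" reviewsDataFrame.Rows.Count reviewsDataFrame.Columns.Count
```

If your text can itself contain tabs or quotes, this approach would need more care. I haven't explored those cases here.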

Make predictions

Now that you've loaded your input data, it's time to use the model to make predictions.

let predictionDV = 
    inputDataFrame 
    |> model.Transform 

Calling the Transform method returns an IDataView with your predictions. You can then convert the IDataView into a DataFrame for further processing with the ToDataFrame method.

let prediction = predictionDV.ToDataFrame(1L, outputColumnNames)

The resulting DataFrame should look something like the following:

index  PredictedLabel  Score
0      False           -2.1337974
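
Individual values can be read out of a DataFrame like this one by column name and row index. To keep the sketch runnable without the model file, it builds a stand-in DataFrame with the same two columns and values. The Boolean and Single column types, and the reading of True as positive sentiment, are assumptions. With the real prediction DataFrame, only the lines after the stand-in apply.

```fsharp
#r "nuget: Microsoft.Data.Analysis"

open Microsoft.Data.Analysis

// Stand-in for the prediction DataFrame (assumption: Boolean and Single columns).
let labelColumn = BooleanDataFrameColumn("PredictedLabel", [| false |])
let scoreColumn = SingleDataFrameColumn("Score", [| -2.1337974f |])
let prediction =
    DataFrame(labelColumn :> DataFrameColumn, scoreColumn :> DataFrameColumn)

// Column indexers return boxed values, so unbox to the expected type.
let predictedLabel = prediction.Columns.["PredictedLabel"].[0L] :?> bool
let score = prediction.Columns.["Score"].[0L] :?> float32

// Assumption: true means positive sentiment, false means negative.
let sentiment = if predictedLabel then "positive" else "negative"
printfn "Sentiment: %s (score %f)" sentiment score
```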

Conclusion

If you want to load a model and make predictions without defining classes for your input and output schemas, you can load your data into a DataFrame using the .NET DataFrame API. While this solution works, DataFrames and IDataViews process data differently, and I haven't tested whether it scales to larger datasets.

