Make predictions with ML.NET models without defining schema classes
Introduction
To make predictions with ML.NET models, you often have to define schema classes for your model inputs and outputs. In previous posts, I showed how you can use Netron to inspect an ML.NET model and determine the names and types of its inputs and outputs. If you have JSON samples of your input and output data, you can generate the model input and output classes automatically with Visual Studio's "Paste JSON as Classes" feature. But what if you want to make predictions without defining these classes at all? In this post, I'll show how you can use the .NET DataFrame API to make predictions with ML.NET models without creating model input and output classes. Code snippets are in F#, but notebooks with complete C# and F# code can be found in the mlnet-noschema-predictions repo on GitHub.
Install and reference packages
In addition to the Microsoft.ML NuGet package, you'll also need the Microsoft.Data.Analysis NuGet package to use the .NET DataFrame API. For more information on the .NET DataFrame API, see An Introduction to DataFrame.
Once your packages are installed, reference them in your application.
open Microsoft.ML
open Microsoft.Data.Analysis
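If you're following along in an F# script (.fsx) or a .NET Interactive notebook instead of a project, you can reference both packages inline with `#r "nuget: ..."` directives before opening the namespaces. This is a minimal sketch; leaving the versions unpinned is an assumption for brevity, so in practice pin versions compatible with the ML.NET version your model was trained with.

```fsharp
// Reference the NuGet packages from an F# script or notebook cell
// (versions intentionally left unpinned in this sketch)
#r "nuget: Microsoft.ML"
#r "nuget: Microsoft.Data.Analysis"

open Microsoft.ML
open Microsoft.Data.Analysis
```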
Initialize MLContext and load the model
The MLContext is the entry point of ML.NET applications. Use it to load your model. The model used in this case classifies sentiment as positive or negative. See the "Use Netron to inspect an ML.NET model" blog post to learn more about the model.
let ctx = MLContext()

// Load returns the trained model (an ITransformer) and the schema of its input
let model, schema = ctx.Model.Load("sentiment_model.zip")
Loading the model returns both the model and its input schema. The input schema is a DataViewSchema object containing a collection of columns, each with a name and a type.
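Because each schema column carries its type as well as its name, you can print both to double-check what the model expects before building any input data. This is a small sketch that assumes the Microsoft.ML package is installed and that sentiment_model.zip is in the working directory; the printed format is only for illustration.

```fsharp
// Assumes the Microsoft.ML package is installed and sentiment_model.zip is in the working directory
open Microsoft.ML

let ctx = MLContext()
let model, schema = ctx.Model.Load("sentiment_model.zip")

// Each DataViewSchema.Column exposes its position (Index), Name, and Type (a DataViewType)
for column in schema do
    printfn "%d: %s (%O)" column.Index column.Name column.Type
```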
Define input and output column names
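You need column names in two places: to build the DataFrame that holds your input data, and to select the prediction columns from the model's output. The input column names must match the names the model expects so it can map each input value to the right column.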
Use the schema that was loaded with the model to get the names of your input columns.
let inputColumnNames =
    schema
    |> Seq.map (fun column -> column.Name)
    |> Array.ofSeq
Since this is a binary classification model, by default the model adds two prediction columns (models trained with a calibrated trainer also add a Probability column):
- Score
- PredictedLabel
You can create an array containing the names of these columns. For more information on default output columns, see the ML.NET Tasks documentation.
let outputColumnNames = [| "PredictedLabel" ; "Score" |]
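Rather than hard-coding the output column names, you can ask the model for the schema it would produce. `ITransformer.GetOutputSchema` takes an input schema and returns the corresponding output schema without running any data through the model. The output schema typically also contains the input columns and any intermediate columns the training pipeline created, which is why only the prediction columns are selected later on. This sketch assumes the same sentiment_model.zip file used throughout this post.

```fsharp
// Assumes the Microsoft.ML package is installed and sentiment_model.zip is in the working directory
open Microsoft.ML

let ctx = MLContext()
let model, schema = ctx.Model.Load("sentiment_model.zip")

// Compute the schema the model produces for this input schema; no data is processed
let outputSchema = model.GetOutputSchema(schema)

// Hidden columns are ones replaced by a later column with the same name, so skip them
let visibleOutputColumnNames =
    outputSchema
    |> Seq.filter (fun column -> not column.IsHidden)
    |> Seq.map (fun column -> column.Name)
    |> Array.ofSeq

// Typically lists the input column(s), intermediate columns, and the prediction columns
visibleOutputColumnNames |> Array.iter (printfn "%s")
```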
Create input data for predictions
Use the LoadCsvFromString method to load your input data into a DataFrame. In this case, there's only one column and one data instance, so I represent the input as a string literal. Since the string has no header row, set header to false and provide the input column names through the columnNames parameter.
let sampleInput = "This was a very bad steak"

let inputDataFrame =
    DataFrame.LoadCsvFromString(
        sampleInput,
        header = false,
        columnNames = inputColumnNames)
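Because LoadCsvFromString parses its input as CSV, unquoted text that contains commas or line breaks won't map cleanly onto a single column. When you want to score several sentences at once, one alternative (my own suggestion, not the approach used in the repo) is to build the DataFrame directly from a StringDataFrameColumn. The sample sentences are hypothetical, and the sketch assumes the model has a single text input column, as the sentiment model in this post does.

```fsharp
// Assumes the Microsoft.ML and Microsoft.Data.Analysis packages are installed
open Microsoft.ML
open Microsoft.Data.Analysis

let ctx = MLContext()
let model, schema = ctx.Model.Load("sentiment_model.zip")

// Assumption: the model has a single text input column
let inputColumnName = schema.[0].Name

// Hypothetical inputs; the second contains a comma, which needs no escaping here
let sampleInputs = [| "This was a very bad steak"; "The service was quick, friendly and helpful" |]

// Build the text column directly instead of parsing a CSV string
let textColumn = StringDataFrameColumn(inputColumnName, sampleInputs)
let batchDataFrame = DataFrame(textColumn :> DataFrameColumn)

// Score every row, then keep one output row per input and only the prediction columns
let batchPredictionDV = model.Transform(batchDataFrame)
let batchPredictions =
    batchPredictionDV.ToDataFrame(int64 sampleInputs.Length, [| "PredictedLabel"; "Score" |])

printfn "%O" batchPredictions
```

Passing the number of inputs as the row limit keeps the conversion from truncating the results when you score more than one row.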
Make predictions
Now that you've loaded your input data, it's time to use the model to make predictions.
let predictionDV =
    inputDataFrame
    |> model.Transform
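Calling the Transform method returns an IDataView with your predictions. You can then convert the IDataView into a DataFrame for further processing with the ToDataFrame method. The first argument is the maximum number of rows to include (1L here, since there's a single input), and the column names that follow select which columns to keep.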
let prediction = predictionDV.ToDataFrame(1L, outputColumnNames)
The resulting DataFrame should look something like the following:
index | PredictedLabel | Score |
---|---|---|
0 | False | -2.1337974 |
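To use the prediction in code rather than just display it, you can index into the DataFrame by column name and then by row index. A DataFrameColumn's row indexer returns the value boxed as an object, so this sketch casts to bool and float32; that assumes PredictedLabel and Score come back as Boolean and Single values, as the table above suggests.

```fsharp
// Assumes the Microsoft.ML and Microsoft.Data.Analysis packages are installed
open Microsoft.ML
open Microsoft.Data.Analysis

let ctx = MLContext()
let model, schema = ctx.Model.Load("sentiment_model.zip")
let inputColumnNames = schema |> Seq.map (fun column -> column.Name) |> Array.ofSeq

let inputDataFrame =
    DataFrame.LoadCsvFromString("This was a very bad steak", header = false, columnNames = inputColumnNames)

let prediction =
    model.Transform(inputDataFrame).ToDataFrame(1L, [| "PredictedLabel"; "Score" |])

// DataFrame.[name] returns a DataFrameColumn; its row indexer returns the value boxed as obj
let predictedLabel = prediction.["PredictedLabel"].[0L] :?> bool
let score = prediction.["Score"].[0L] :?> float32

printfn "PredictedLabel: %b, Score: %f" predictedLabel score
```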
Conclusion
If you want to load a model and make predictions without defining classes for your input and output schemas, you can load your data into a DataFrame using the .NET DataFrame API. While this approach works, DataFrames hold all of their data in memory, whereas IDataViews are designed to stream data lazily, and I haven't tested whether this approach scales to larger datasets.