ML Estimators
An estimator is an abstraction of a learning algorithm that fits a model to a dataset.
Technically, an Estimator takes an input DataFrame and a set of parameters (a ParamMap), fits a model to that data, and produces a Model, i.e. a Transformer, that can then calculate predictions for any DataFrame-based input dataset.
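To make the Estimator, Model and Transformer relationship concrete without a Spark cluster, here is a minimal plain-Scala sketch. All names in it (Transformer, Model, Estimator, MeanCenterer, MeanCenterModel) are hypothetical stand-ins that simplify Spark's actual class hierarchy, and a Seq[Double] plays the role of a DataFrame.

```scala
// Hypothetical, simplified model of the Estimator contract (not Spark's API).
// A Seq[Double] stands in for a DataFrame.

trait Transformer {
  def transform(data: Seq[Double]): Seq[Double]
}

// A Model is a Transformer that was produced by fitting an Estimator.
abstract class Model extends Transformer

// An Estimator fits a model of type M to a dataset.
abstract class Estimator[M <: Model] {
  def fit(data: Seq[Double]): M
}

// Fitted model: remembers the mean learned during fit and subtracts it.
class MeanCenterModel(val mean: Double) extends Model {
  def transform(data: Seq[Double]): Seq[Double] = data.map(_ - mean)
}

// Estimator: learns the mean of its (assumed non-empty) input.
class MeanCenterer extends Estimator[MeanCenterModel] {
  def fit(data: Seq[Double]): MeanCenterModel =
    new MeanCenterModel(data.sum / data.size)
}

// fit on one dataset, then transform a different one
val model = new MeanCenterer().fit(Seq(1.0, 2.0, 3.0))
val centered = model.transform(Seq(4.0, 5.0))
```

Here fitting learns a mean of 2.0 from the training data, and the resulting model applies it to new input, yielding Seq(2.0, 3.0). The learning happens once in fit; the returned model is reusable on any dataset.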
- Estimator is the contract in Spark MLlib for estimators that fit models to a dataset.
- An Estimator accepts parameters that you can set through dedicated setter methods when you create it. You can also pass extra parameters (as a ParamMap) when you fit a model; they take precedence over the values set through the setters for that fit only.
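import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.DataFrame

// Define parameters upon creating an Estimator
val lr = new LogisticRegression().
  setMaxIter(5).
  setRegParam(0.01)

// Training data (elided); by default needs "label" and "features" columns
val training: DataFrame = ...

// fit returns a LogisticRegressionModel, i.e. a Transformer
val model1 = lr.fit(training)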
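The interplay between setter-defined parameters and extra parameters given at fit time can be sketched without Spark. In Spark itself the corresponding call is lr.fit(training, paramMap) with an org.apache.spark.ml.param.ParamMap; the ToyRegression and ToyModel classes below, their string-keyed parameters and their default values are hypothetical and only illustrate the precedence rule.

```scala
// Hypothetical estimator illustrating parameter precedence (not Spark's API).

// The "model" here only records the parameters it was fitted with.
final case class ToyModel(maxIter: Int, regParam: Double)

class ToyRegression {
  // Illustrative defaults, overridden through setters
  private var params: Map[String, Double] =
    Map("maxIter" -> 100.0, "regParam" -> 0.0)

  // Setters return the estimator so calls can be chained
  def setMaxIter(v: Int): ToyRegression = {
    params = params.updated("maxIter", v.toDouble); this
  }
  def setRegParam(v: Double): ToyRegression = {
    params = params.updated("regParam", v); this
  }

  // Extra params win over setter values because `++` keeps the right-hand
  // value for duplicate keys; `params` itself is never modified here.
  def fit(data: Seq[Double], extra: Map[String, Double] = Map.empty): ToyModel = {
    val effective = params ++ extra
    // A real estimator would train on `data` here; this sketch only records params.
    ToyModel(effective("maxIter").toInt, effective("regParam"))
  }
}

val est = new ToyRegression().setMaxIter(5).setRegParam(0.01)
val data = Seq(1.0, 2.0, 3.0)
val m1 = est.fit(data)                          // setter values: maxIter 5
val m2 = est.fit(data, Map("maxIter" -> 10.0))  // maxIter overridden for this call
```

Calling est.fit(data) again after the override still yields a maxIter of 5, since the extra map only affects the call it is passed to and is never written back into the estimator.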