This is a continuation of the previous blog post, Beyond Binary: Unlocking Business Insights with the Power of Multinomial Classifiers: “The Lab” - Part 2. In this post, we’ll apply the One-versus-Rest (OvR) approach to train five binary models in Einstein Studio and combine their results for probabilistic consistency.
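Each binary model scores its own class independently. As a result, the five probabilities for one customer generally won’t sum to 1. Below is a minimal sketch of what “probabilistic consistency” can mean. It assumes each model returns a probability for its positive class and applies one common approach: divide each score by the sum of all five. The class names and scores are hypothetical, and the combination method used later in this series may differ.

```python
def normalize_ovr_scores(scores: dict[str, float]) -> dict[str, float]:
    """Rescale independent One-versus-Rest probabilities so they sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one score must be positive")
    # Proportional rescaling: each class keeps its share of the total score.
    return {label: p / total for label, p in scores.items()}


# Hypothetical per-model outputs for a single customer (illustrative, not real results).
raw = {
    "disengagement": 0.60,
    "content": 0.30,
    "competitor": 0.20,
    "boring": 0.10,
    "poor": 0.00,
}
combined = normalize_ovr_scores(raw)
print(combined)
```

Proportional rescaling leaves the ranking of the classes unchanged. It only turns the five scores into a distribution that can be compared directly.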
Part 3: Training the Binary Model Ensemble
- Click on the Einstein Studio tab.
- To start training a model (we will train five), click the “Add Predictive Model” button.
- Select “Create a model from scratch” and then click “Next.”
- Select the first of the five binary datasets we created in the previous blog post. In this instance, select “disengagement binary” from the “default” data space and click “Next.”
- Next, set the goal of the model. The objective is for the model to learn to predict the bucketed labels, which for this dataset are in the column “disengagement OvR bucket.” Select “disengagement OvR bucket” from the picklist.
- Model Builder will display detailed information about the chosen label. The next step is to set the direction of the prediction goal, which can be crucial in a binary use case. For instance, if the model predicts the likelihood that a customer purchases a product (a favorable outcome), the goal should be set to Maximize. Every form of churn is undesirable, but “Maximize” is still chosen here because the prediction output is only an intermediate result used to calculate a final score. This keeps the five outcomes easy to track and avoids double-negative reasoning.
In the picklist, there are two choices (hence the term binary prediction model): “disengagement” or “other” – this follows the One-versus-Rest strategy. Here, select “disengagement.”
Then click “Next.”
- On the next screen, you can select the input variables (known in statistics as independent or predictor variables). The machine will use these variables to predict “disengagement.” If you are familiar with your data, it is advisable to switch off Autopilot and select the predictor variables manually to prevent data leakage.
In this exercise, the selected input variables are “screen_time_weekly”, “average_bit_rate”, “content_completion_rate”, “search_success_rate”, and “login_frequency”. Model Builder will disregard the remaining variables. This is desirable because they carry no relevant information and could introduce spurious correlations.
It is crucial that the predictor variables have a behavioral relationship with the churn reason the machine is being trained to predict.
- Click “Next.”
- On this screen, you can let the machine choose the most suitable algorithm. The options are Generalized Linear Model (GLM), Gradient Boosted Machine (GBM), and XGBoost. Leave the Automatic Selection switch enabled.
- Click “Next.” This screen provides a final overview of the selected settings. Note that Model Builder has chosen XGBoost for training on this data.
- Click “Save” and provide a version description. A description is advisable because Einstein Studio Model Builder makes it easy to re-run training if issues are discovered during a run, and the description helps tell the versions apart.
- Click “Save and Train” to start the training process. Training can take some time, in some cases up to 24 hours.
- Apply the same process to the remaining datasets: ‘content binary’, ‘competitor binary’, ‘boring binary’, and ‘poor binary’.
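The sketch below illustrates what each of the five training setups amounts to. For every target class it:

- relabels rows One-versus-Rest (the target class keeps its name, everything else becomes “other”);
- keeps only the five manually chosen predictors;
- refuses to run if “Churn Reason” is selected as an input.

Several details are hypothetical and labeled as such in the code:

- Rows are assumed to be plain Python dictionaries.
- The positive-class names (other than “disengagement”), the “customer_id” field, and the sample records are made up for illustration.
- The label column name mirrors “disengagement OvR bucket.”

In Einstein Studio itself this is all done through Data Prep and the Model Builder UI.

```python
PREDICTORS = [
    "screen_time_weekly",
    "average_bit_rate",
    "content_completion_rate",
    "search_success_rate",
    "login_frequency",
]
# Hypothetical positive-class names, one per binary dataset.
POSITIVE_CLASSES = ["disengagement", "content", "competitor", "boring", "poor"]
LEAKY_COLUMNS = {"Churn Reason"}  # the OvR labels are derived from this column


def build_ovr_dataset(rows: list[dict], positive: str, predictors: list[str]) -> list[dict]:
    """Return rows with only the chosen predictors plus a binary OvR label column."""
    leaked = LEAKY_COLUMNS.intersection(predictors)
    if leaked:
        raise ValueError(f"leaky predictor(s) selected: {sorted(leaked)}")
    label_col = f"{positive} OvR bucket"
    dataset = []
    for row in rows:
        # Copy only the allowlisted predictors; Churn Reason is read but never copied.
        record = {col: row[col] for col in predictors}
        # One-versus-Rest: the target class keeps its name, everything else is "other".
        record[label_col] = positive if row["Churn Reason"] == positive else "other"
        dataset.append(record)
    return dataset


# Hypothetical customer records (not from the original dataset).
customers = [
    {"customer_id": 1, "Churn Reason": "disengagement", "screen_time_weekly": 1.5,
     "average_bit_rate": 3.2, "content_completion_rate": 0.2,
     "search_success_rate": 0.7, "login_frequency": 1},
    {"customer_id": 2, "Churn Reason": "competitor", "screen_time_weekly": 9.0,
     "average_bit_rate": 5.1, "content_completion_rate": 0.8,
     "search_success_rate": 0.9, "login_frequency": 6},
]

# One training set per binary model, as in the five Model Builder runs.
training_sets = {cls: build_ovr_dataset(customers, cls, PREDICTORS) for cls in POSITIVE_CLASSES}
print([r["disengagement OvR bucket"] for r in training_sets["disengagement"]])
print([r["competitor OvR bucket"] for r in training_sets["competitor"]])
```

Setting the goal to Maximize “disengagement” means the model’s score is the probability of the “disengagement” class. The probability of “other” is simply its complement, which is why the direction choice does not lose information for a binary model.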
How to Interpret the Results
Once training completes, you can evaluate each model’s performance on the Model Training Quality screen, where Model Builder helps interpret the results. For example, is a high AUC (area under the ROC curve) always a good sign?
In an ideal scenario, a high AUC is desirable. However, in practice, an excessively high AUC may indicate data leakage. Data leakage occurs when the model has access to information that would not be available in a real-world prediction scenario, leading to overly optimistic performance metrics.
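To make the AUC discussion concrete: AUC is the probability that a randomly chosen positive example (here, “disengagement”, coded 1) scores higher than a randomly chosen negative one (“other”, coded 0), with ties counting half. Below is a minimal stdlib sketch of that pairwise definition with hypothetical scores. It is O(n·m), so it only suits illustration; tools like Model Builder compute AUC more efficiently. A perfect separation gives exactly 1.0, the “too good to be true” value that warrants a leakage check.

```python
def pairwise_auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores.
labels = [1, 1, 0, 0]
print(pairwise_auc([0.9, 0.4, 0.6, 0.1], labels))  # 0.75
print(pairwise_auc([0.9, 0.8, 0.3, 0.1], labels))  # 1.0: perfect separation
```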
For example, the multi-category column ‘Churn Reason’ directly determines the binary label columns derived from it, so a model given ‘Churn Reason’ as an input could simply read off the answer. This is why ‘Churn Reason’ was excluded before the bucket-transformed datasets were sent to the output nodes. This extra step prevents ‘Churn Reason’ from being inadvertently included in any of the training setups.
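One quick way to see why ‘Churn Reason’ leaks is to check whether a candidate column determines the label on its own. If every distinct value of the column maps to exactly one label, a model can score perfectly just by memorizing that mapping. The helper below is a hypothetical sanity check, not a Model Builder feature, and the sample columns are made up. It is only meaningful for low-cardinality categorical columns: a continuous column with all-unique values will trivially pass, so a True result is a prompt to investigate, not proof of leakage.

```python
from collections import defaultdict


def determines_label(feature_values: list, labels: list) -> bool:
    """True if every distinct feature value co-occurs with exactly one label value."""
    if len(feature_values) != len(labels):
        raise ValueError("feature_values and labels must have the same length")
    labels_per_value = defaultdict(set)
    for value, label in zip(feature_values, labels):
        labels_per_value[value].add(label)
    return all(len(seen) == 1 for seen in labels_per_value.values())


# Hypothetical rows: the OvR label was derived from churn_reason.
churn_reason = ["disengagement", "competitor", "disengagement", "poor"]
ovr_label = ["disengagement", "other", "disengagement", "other"]
plan_tier = ["basic", "basic", "premium", "premium"]  # hypothetical unrelated column

print(determines_label(churn_reason, ovr_label))  # True: the column gives away the label
print(determines_label(plan_tier, ovr_label))     # False
```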
Because the original dataset is simulated and the strong signals in the predictor variables are intentional, a high AUC here does not necessarily indicate data leakage. A future post will analyze in detail how to interpret the various metrics and charts on the Model Training Quality screen.
This concludes Part 3 on building the binary models. In the next installment of this series, we will combine the binary models into a single, cohesive model.