This post continues the previous blog post, Beyond Binary: Unlocking Business Insights with the Power of Multinomial Classifiers: “The Lab” – Part 4. This step is optional: we’ll see how you can build a matrix report of actual vs. predicted churn reasons.
Part 5: Constructing a confusion matrix to view the results
This part is optional but it helps to see a summary of how well the classifiers predict the correct classes. We will build a Matrix Summary in Reports.
Create a “New” report. Look for the name of the DMO you created in Part 4 of the series.
On the next page, give the report a descriptive title. Then, at the columns selector, click on the down button and choose “Add Row-Level Formula”:
In this Row-Level Formula, our goal is to convert probability scores into categorical predictions. The approach is straightforward: the category with the highest probability score is selected. For instance, if the “disengagement score” is the highest among all scores, the prediction will be “Disengagement” as the reason for churn.
We need to consider how to handle ties, such as when two scores are equal. In these cases, we’ll simply select the first category, even though this might seem a bit arbitrary. To implement this, we’ll use a nested IF statement. While this adds some complexity to the formula, it’s the most straightforward method for converting probabilities into categorical predictions.
Important: here’s how the nested IF statement should look in pseudo-code for a situation with three categories. Keep this structure in mind when you write the formula for the confusion matrix:
IF(score1 >= score2 && score1 >= score3, "Category 1",
IF(score2 >= score1 && score2 >= score3, "Category 2",
"Category 3"))
This structure ensures that the highest score determines the category, with ties resolved by selecting the first category listed.
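To make the tie-breaking rule concrete, here is a minimal Python sketch of the same logic, written outside of Salesforce purely for illustration. The function name `predict_category` and the sample scores are my own assumptions, not part of the report setup.

```python
def predict_category(scores):
    """Return the label with the highest score; the earliest entry wins ties.

    `scores` is a list of (label, score) pairs, listed in the same order
    as the tests in the nested IF statement.
    """
    best_label, best_score = scores[0]
    for label, score in scores[1:]:
        # Strictly greater: an equal score that appears later never displaces
        # an earlier one, which mirrors the >= tests in the nested IF.
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical scores: Category 1 and Category 2 are tied for the highest value.
row = [("Category 1", 0.40), ("Category 2", 0.40), ("Category 3", 0.20)]
print(predict_category(row))  # Category 1, because the first listed category wins the tie
```

Reordering the pairs changes only which category wins a tie, not which category wins when one score is clearly the highest.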
Note: Row-Level Formulas operate on data within the same row, which means we’ll be creating a new column for this Row-Level Formula to display the results.
Let’s aim to create a matrix report with a clear diagonal pattern, where the counts of correct predictions run from the top-left to the bottom-right. To achieve this, I’ll structure the nested IF statements in the following order:
- Boring Content Score
- Content Score
- Disengagement Score
- Poor Video Score
- Competitor Score
The order was obtained by grouping Churn Reason and scrolling down to see the order in which the groups were presented:
Pro-tip: when you hit refresh, the preview may not have anything to show. Just hit the Run button on the upper right of the Report page to see the groupings.
- Fill in the Column Name as “Predicted”, give it a description
- Since our results will be text, make sure the Formula Output type is “Text”
- The nested IF statement is a comparison of all the probability scores, all of which have “score” in their column names. So, for convenience, search for columns with “score” in their names
Begin by writing the formula. Following the nested IF statement structure from the pseudo-code, fill in the score comparisons in the specified order. Remember to maintain the commas, spaces, comparison operators and the parentheses.
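The formula I used is shown below. Note that it tests the competitor score second rather than last. As far as I can tell, the test order only decides which category wins a tie; the position of each label in the matrix comes from how the report sorts the group values.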
IF(Multinomial_Churn_Reasons__dlm.boring_content_c_formula__c >= Multinomial_Churn_Reasons__dlm.competitor_c_formula__c && Multinomial_Churn_Reasons__dlm.boring_content_c_formula__c >= Multinomial_Churn_Reasons__dlm.content_c_formula__c && Multinomial_Churn_Reasons__dlm.boring_content_c_formula__c >= Multinomial_Churn_Reasons__dlm.disengagement_c_formula__c && Multinomial_Churn_Reasons__dlm.boring_content_c_formula__c >= Multinomial_Churn_Reasons__dlm.poor_video_c_formula__c, "boring content",
IF(Multinomial_Churn_Reasons__dlm.competitor_c_formula__c >= Multinomial_Churn_Reasons__dlm.boring_content_c_formula__c && Multinomial_Churn_Reasons__dlm.competitor_c_formula__c >= Multinomial_Churn_Reasons__dlm.content_c_formula__c && Multinomial_Churn_Reasons__dlm.competitor_c_formula__c >= Multinomial_Churn_Reasons__dlm.disengagement_c_formula__c && Multinomial_Churn_Reasons__dlm.competitor_c_formula__c >= Multinomial_Churn_Reasons__dlm.poor_video_c_formula__c, "Prefer Competitor",
IF(Multinomial_Churn_Reasons__dlm.content_c_formula__c >= Multinomial_Churn_Reasons__dlm.boring_content_c_formula__c && Multinomial_Churn_Reasons__dlm.content_c_formula__c >= Multinomial_Churn_Reasons__dlm.competitor_c_formula__c && Multinomial_Churn_Reasons__dlm.content_c_formula__c >= Multinomial_Churn_Reasons__dlm.disengagement_c_formula__c && Multinomial_Churn_Reasons__dlm.content_c_formula__c >= Multinomial_Churn_Reasons__dlm.poor_video_c_formula__c, "Content",
IF( Multinomial_Churn_Reasons__dlm.disengagement_c_formula__c >= Multinomial_Churn_Reasons__dlm.boring_content_c_formula__c && Multinomial_Churn_Reasons__dlm.disengagement_c_formula__c >= Multinomial_Churn_Reasons__dlm.competitor_c_formula__c && Multinomial_Churn_Reasons__dlm.disengagement_c_formula__c >= Multinomial_Churn_Reasons__dlm.content_c_formula__c && Multinomial_Churn_Reasons__dlm.disengagement_c_formula__c >= Multinomial_Churn_Reasons__dlm.poor_video_c_formula__c, "Disengagement", "Poor Video Quality"))))
Pro-tip: keep the nesting structure exactly as shown in the formula above (the formula is syntactically correct). Check that there is a space on either side of each >= comparison operator and on either side of each logical “AND” operator (&&). Insert the variable names instead of typing them out.
Note: Pay special attention to the last line of my reference formula. The innermost IF carries two strings: “Disengagement” when its test passes, and “Poor Video Quality” as the fallback when none of the tests pass.
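Because the formula is long and repetitive, it is easy to drop a comparison or a closing parenthesis. One way to double-check your work is to generate the expected text and compare it with what you built in the editor. The Python sketch below is a hypothetical helper of my own (`build_nested_if` is not a Salesforce feature); it uses the field names and labels from the formula above and puts the fallback string on its own line. Treat the output as a reference to check against, since the tip above still recommends inserting field names rather than typing or pasting them.

```python
def build_nested_if(fields):
    """Build a nested IF formula from (field_name, label) pairs.

    The last pair is the fallback (the final "else" value), so it gets no test.
    """
    names = [name for name, _ in fields]
    formula = f'"{fields[-1][1]}"'
    # Work backwards so that each earlier test wraps the tests that follow it.
    for name, label in reversed(fields[:-1]):
        tests = " && ".join(f"{name} >= {other}" for other in names if other != name)
        formula = f'IF({tests}, "{label}",\n{formula})'
    return formula


PREFIX = "Multinomial_Churn_Reasons__dlm."
fields = [
    (PREFIX + "boring_content_c_formula__c", "boring content"),
    (PREFIX + "competitor_c_formula__c", "Prefer Competitor"),
    (PREFIX + "content_c_formula__c", "Content"),
    (PREFIX + "disengagement_c_formula__c", "Disengagement"),
    (PREFIX + "poor_video_c_formula__c", "Poor Video Quality"),
]
print(build_nested_if(fields))
```

With five categories this produces four IF tests of four comparisons each, followed by the fallback string and four closing parentheses.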
Validate your formula and click on “Apply”.
Next, return to the Report screen. Under Group Rows, choose “Churn Reason” to represent your known categories. Then, under Group Columns, select your Row-Level Formula, “Predicted,” which reflects the model’s prediction.
If you hit preview, you may not see anything. That’s because we need to report on the whole validation dataset instead of a partial preview, so go ahead and click on Save & Run or Run.
Once the run completes, you’ll have the Confusion Matrix, which summarizes our multinomial classifier’s performance.
For example, in the “Boring Content” category, the model correctly predicted 199 out of 201 instances (199/201 ≈ 0.990, so just over 99% of the actual “Boring Content” rows were classified correctly). This is an excellent result. In this simulated dataset, only a few rows are misclassified. However, in real-world scenarios, many categories may overlap, leading to more nuanced predictions.
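The per-category figure quoted above is the number of correct predictions in one row of the matrix divided by that row's total, often called recall. As a rough sketch of what the report is counting, the Python snippet below tallies (actual, predicted) pairs and computes that ratio. The helper names and the four sample rows are made up for illustration and are not the report's data.

```python
from collections import Counter


def confusion_matrix(pairs):
    """Count (actual, predicted) pairs, like the report's row and column groupings."""
    return Counter(pairs)


def recall(matrix, label):
    """Share of rows whose actual class is `label` that were also predicted as `label`."""
    total = sum(n for (actual, _), n in matrix.items() if actual == label)
    return matrix[(label, label)] / total if total else float("nan")


# Made-up rows for illustration only.
pairs = [
    ("Content", "Content"),
    ("Content", "Content"),
    ("Content", "Disengagement"),
    ("Disengagement", "Disengagement"),
]
matrix = confusion_matrix(pairs)
print(recall(matrix, "Content"))  # 2 of the 3 actual "Content" rows are correct

# The figure from the report: 199 correct out of 201 actual "Boring Content" rows.
print(f"{199 / 201:.2%}")  # 99.00%
```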
It’s important to be cautious of results that seem too good to be true. In an upcoming blog post, we’ll explore why overly perfect results can actually be problematic.
Concluding Remarks:
For numerous businesses, identifying whether a customer will churn represents only a fraction of the process. Elevating the analysis by predicting the Churn Reason provides actionable insights, enabling businesses to proactively address potential churn.
For instance, if the predicted churn reason is Poor Video Quality, offering a subscription discount may not be as effective as fixing the quality problem itself, whereas a discount may be a better fit for a customer who prefers a competitor’s service. This kind of mismatch in intervention is more likely when relying on binary predictions that only indicate whether a customer will churn, without specifying the underlying reason.
Multinomial classification enhances the power and utility of predictive models by offering a nuanced understanding of customer behavior. This deeper insight allows businesses to implement targeted interventions, thereby improving customer retention and overall satisfaction. The ability to predict specific reasons for churn transforms data into actionable strategies, making multinomial classification an invaluable tool for business intelligence.