Beyond Binary: Unlocking Business Insights with the Power of Multinomial Classifiers – Part 1
Multinomial classifiers open new possibilities for making business insights richer and more actionable. Unlike traditional binary models, which predict simple yes-or-no outcomes, multinomial classifiers dive deeper—identifying the specific reasons behind customer behaviors. In this blog, we’ll explore how multinomial classifiers can transform insights, using customer retention in a streaming service as a powerful example of their potential.
Part 1: What Are Multinomial Classifiers and Why Do We Need Them?
Imagine you’re managing a popular streaming service, and every month, a certain percentage of subscribers decide to leave. Traditional binary prediction models might help you identify whether a subscriber is likely to churn, but you want to go a step further—specifically, to understand what action might keep them engaged. This is where multinomial classifiers come in, addressing a gap that many businesses overlook. The good news? It’s achievable within Data Cloud with just a bit of extra configuration.
Let’s use a real-world example. Instead of just knowing that a subscriber is likely to churn, imagine being able to predict the specific reason behind their decision. Are they leaving because they found a better deal elsewhere? Are they dissatisfied with the content? Or perhaps they’re facing financial difficulties? By understanding the exact reason, you can take targeted, preemptive actions (let’s call them proactive solutions) to address these issues before they turn into churn.
For instance, if you know a subscriber is likely to leave due to financial difficulties, you could offer them a temporary discount or a more affordable plan. If they’re unhappy with the content, you could recommend new shows or movies that match their preferences. This approach not only helps in retaining customers but also provides valuable insights into what drives customer behavior.
Common churn reasons for a streaming service include:
- Content dissatisfaction: Subscribers might feel the available content no longer meets their interests.
- Better alternatives: Competitors might offer more attractive packages or exclusive content.
- Financial constraints: Economic factors might make it difficult for subscribers to justify the expense.
- Technical issues: Persistent problems with streaming quality or app performance can frustrate users.
- Lack of engagement: Subscribers might not use the service enough to see its value.
To predict these reasons, you can track early indicators such as changes in viewing patterns, decreased login frequency, or increased buffering complaints. These indicators let you engineer features and metrics that support a more robust, proactive churn prevention strategy.
If you have more churn reasons—or a long list of possibilities—these can be grouped into an “other” category. Monitoring predictions that frequently fall into the “other” category can reveal areas that warrant additional research, helping you decide whether to expand your list of categories.
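As a minimal sketch of that grouping step (in pandas, with invented reason labels), it might look like this:

```python
import pandas as pd

# Hypothetical churn-reason labels; the names and counts are illustrative only.
df = pd.DataFrame({
    "churn_reason": [
        "Content Dissatisfaction", "Content Dissatisfaction",
        "Better Alternatives", "Better Alternatives",
        "Financial Constraints", "Financial Constraints",
        "Technical Issues", "Technical Issues",
        "Lack of Engagement", "Lack of Engagement",
        "Moved Abroad", "Account Sharing Change",
    ]
})

# Keep the five most frequent reasons; bucket everything else as "Other".
top_reasons = df["churn_reason"].value_counts().nlargest(5).index
df["reason_grouped"] = df["churn_reason"].where(
    df["churn_reason"].isin(top_reasons), other="Other"
)
print(df["reason_grouped"].value_counts())
```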
If you’re facing a similar problem, then this is the blog post for you. In this post, I’ll explain how multinomial classifiers can be applied to business use cases like this and how they can revolutionize your approach to customer retention. It’s simple enough for you to implement, so let’s dive in.
Quantifying Churn Reasons: Turning Insights into Actionable Metrics
To make our churn reason predictions manageable, we don’t need to capture every possible reason subscribers might leave. Instead, focusing on the top five reasons (or top ten) can provide significant insights. For our example, we’ll use these five classes:
- Content Dissatisfaction
- Better Alternatives
- Financial Constraints
- Technical Issues
- Lack of Engagement
Some of these reasons are straightforward to quantify. For instance, Technical Issues and Lack of Engagement can be measured through early indicators like the frequency of buffering, contacts with technical support, and usage metrics. But what about the others? Let’s explore how we can turn them into quantifiable metrics:
- Content Dissatisfaction: This can be tricky, but we can look at metrics like the number of times a user skips content, the frequency of low ratings, or the lack of interaction with new releases. Surveys and feedback forms can also provide direct insights.
- Better Alternatives: To quantify this, we could track mentions of competitors in customer support interactions or social media. Additionally, monitoring the frequency of cancellations shortly after a competitor’s promotional campaign can be a useful indicator.
- Financial Constraints: This can be inferred from changes in payment behavior, such as late payments or downgrades to cheaper plans. Economic data, like unemployment rates in the subscriber’s region, can also serve as a proxy.
By redefining these reasons as quantifiable metrics, we can create a more robust prediction model. For example, if we notice a subscriber frequently skipping content and giving low ratings, this might indicate content dissatisfaction. Similarly, if a user downgrades their plan and misses payments, financial constraints could be the underlying reason.
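To make this concrete, here is a minimal sketch of what such a per-subscriber feature table might look like in pandas. Every column name and threshold below is an assumption for illustration, not something prescribed by Data Cloud:

```python
import pandas as pd

# Illustrative per-subscriber features derived from raw event logs.
# Every column name and value here is an assumption for this sketch.
features = pd.DataFrame({
    "subscriber_id":        [101, 102, 103],
    "skips_per_week":       [14, 2, 5],       # content dissatisfaction signal
    "low_ratings_90d":      [6, 0, 1],        # content dissatisfaction signal
    "competitor_mentions":  [0, 3, 0],        # better-alternatives signal
    "late_payments_90d":    [0, 0, 2],        # financial-constraints signal
    "plan_downgraded":      [0, 0, 1],        # financial-constraints signal
    "buffering_events_30d": [1, 0, 22],       # technical-issues signal
    "logins_per_week":      [5.5, 1.0, 4.0],  # engagement signal
})

# Simple derived flag: heavy skipping plus repeated low ratings hints at
# content dissatisfaction (the threshold values are arbitrary).
features["content_dissatisfaction_flag"] = (
    (features["skips_per_week"] > 10) & (features["low_ratings_90d"] >= 3)
).astype(int)
print(features)
```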
In the next sections, we’ll dive deeper into how these metrics can be used to build a multinomial classifier that helps businesses not only predict churn but also understand the reasons behind it. This way, they can take proactive steps to address these issues and retain their subscribers.
Use Einstein Studio Model Builder
While Einstein Studio Model Builder offers convenient point-and-click model building, it currently doesn’t support multiple classes directly. However, this is manageable with the one-versus-rest (OvR) strategy.
One-Versus-Rest (OvR) Method
The one-versus-rest (OvR) method, also known as one-versus-all, is a strategy to extend binary classification algorithms to multi-class classification problems. Here’s how it works:
- Splitting the Problem: For a multi-class classification problem with n classes, the OvR method splits it into n separate binary classification problems. Each binary classifier is trained to distinguish one class from all the others.
- Training the Classifiers: Each classifier is trained to predict whether an instance belongs to its respective class or not. For example, if we have three classes (A, B, and C), we would train three classifiers:
- Classifier 1: Class A vs. Not Class A (Classes B and C)
- Classifier 2: Class B vs. Not Class B (Classes A and C)
- Classifier 3: Class C vs. Not Class C (Classes A and B)
- Making Predictions: When making a prediction, each classifier provides a score or probability for its class.
- Score Normalization: Because the scores come from independent binary classifiers, they need to be put on a comparable scale; a common choice is the softmax function, which converts raw scores into probabilities that sum to 1. This makes the confidence levels of the classifiers easy to interpret and compare, and the class with the highest probability becomes the final prediction. A minimal sketch follows this list.
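Here is that normalization step in Python; the raw scores are placeholders:

```python
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    """Convert raw per-class scores into probabilities that sum to 1."""
    # Subtracting the max is a standard numerical-stability trick;
    # it does not change the resulting probabilities.
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# Placeholder raw scores from three independent binary classifiers (A, B, C).
raw = np.array([2.0, 0.5, -1.0])
probs = softmax(raw)
print(probs.round(2))  # -> [0.79 0.18 0.04]
print(probs.sum())     # -> 1.0 (up to floating-point rounding)
```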
Example Scenario
Imagine you have a dataset with three types of fruits: apples, oranges, and bananas. Using the OvR strategy, you would create three binary classifiers:
- Classifier 1: Apples vs. the rest (Oranges + Bananas)
- Classifier 2: Oranges vs. the rest (Apples + Bananas)
- Classifier 3: Bananas vs. the rest (Apples + Oranges)
When a new fruit instance is presented, each classifier provides a score indicating how likely the instance is to belong to its class. The scores from the three binary classifiers are then normalized, as in the sketch below.
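Here is what that could look like with scikit-learn, spelling out the three binary splits by hand. The toy features (weight in grams, peel thickness in mm) and their values are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy fruit data: [weight_g, peel_thickness_mm]; values are invented.
X = np.array([[150, 0.8], [160, 0.9], [140, 3.0],
              [155, 3.2], [120, 2.0], [118, 2.2]])
y = np.array(["apple", "apple", "orange", "orange", "banana", "banana"])

classes = ["apple", "orange", "banana"]
raw_scores = []
for cls in classes:
    # One binary classifier per class: this class vs. the rest.
    target = (y == cls).astype(int)
    clf = LogisticRegression().fit(X, target)
    # decision_function returns the raw (pre-sigmoid) score for one new fruit.
    raw_scores.append(clf.decision_function([[152, 1.0]])[0])

# Normalize the three raw scores with softmax so they sum to 1.
raw_scores = np.array(raw_scores)
probs = np.exp(raw_scores - raw_scores.max())
probs /= probs.sum()
print(dict(zip(classes, probs.round(3))))
```

In practice, scikit-learn’s `sklearn.multiclass.OneVsRestClassifier` automates this loop; it is written out here to mirror the five separate models we’ll train in Einstein Model Builder.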
Pros and Cons of One-Versus-Rest
Pros:
- Simplicity: OvR is straightforward to implement and understand. It leverages existing binary classifiers, making it easy to apply.
- Flexibility: It can be used with any binary classification algorithm, allowing for a wide range of models to be employed.
- Interpretability: Each classifier’s decision boundary is easy to interpret, as it only needs to distinguish one class from the rest.
Cons:
- Imbalanced Data: OvR can struggle with imbalanced datasets, where some classes are much more frequent than others. This can lead to biased classifiers.
- Computational Cost: Training n separate classifiers can be computationally expensive, especially for large datasets with many classes.
- Overlapping Classes: If classes are not well separated, the classifiers might have difficulty distinguishing between them, leading to lower overall accuracy.
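The imbalance point is worth emphasizing: each “one vs. rest” split is skewed by construction, because the “rest” bucket pools every other class. One common mitigation, sketched below with scikit-learn on synthetic data, is to weight classes inversely to their frequency:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic, deliberately imbalanced binary split (~10% positives),
# mimicking a single "one class vs. rest" training set.
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + rng.normal(scale=2.0, size=1000) > 2.8).astype(int)

# class_weight="balanced" reweights samples inversely to class frequency,
# so the rare positive class isn't drowned out by the "rest" bucket.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
print(f"positives: {y.mean():.1%}")
```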
High-level plan to build out the Multinomial Classifier using Einstein Model Builder
1. Data Preparation for Training Predictive Models
   - The primary objective is to create five separate training datasets by converting the multiclass column into binary columns, one “class vs. rest” target per churn reason.
   - Use Data Transforms to accomplish this task.
2. Model Training
   - Train five binary prediction models using the datasets created in Step 1.
   - Evaluate the models at a high level; a more detailed post on model evaluation may follow.
   - Activate the models to make them operational.
3. Post-Prediction Calculations (Softmax)
   - Use Data Transforms to score records with the five binary classification models via the AI Model node.
   - Within the same Data Transform, combine the results into normalized scores so the five class probabilities sum to 1 (see the sketch after this list).
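The real implementation will use Data Transforms and the AI Model node inside Data Cloud (covered in Part 2), but here is a rough pandas/NumPy sketch of the same logic, with invented column names and random numbers standing in for the five trained models’ outputs:

```python
import numpy as np
import pandas as pd

CLASSES = ["Content Dissatisfaction", "Better Alternatives",
           "Financial Constraints", "Technical Issues", "Lack of Engagement"]

rng = np.random.default_rng(42)
train = pd.DataFrame({"churn_reason": rng.choice(CLASSES, size=8)})

# Step 1: derive one binary target column per churn reason (class vs. rest),
# which yields the five training datasets for the five binary models.
for cls in CLASSES:
    train[f"is_{cls}"] = (train["churn_reason"] == cls).astype(int)

# Step 3: combine the five binary models' scores with a softmax.
# Random numbers stand in for the real model outputs here.
raw_scores = rng.random((len(train), len(CLASSES)))
exp = np.exp(raw_scores - raw_scores.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)
train["predicted_reason"] = [CLASSES[i] for i in probs.argmax(axis=1)]
print(train[["churn_reason", "predicted_reason"]])
```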
In the next part of this series, we’ll dive deep into the actual implementation and prepare the needed datasets.