In order to satisfy increasingly diversified customer demands, product and service portfolios are steadily becoming larger. B2B and B2C companies are expected to guide their customers to the best fitting bundle in their large panoply of products. Predicting the likelihood of a customer or prospect purchasing a particular product is essential and requires a deep analysis of large amounts of data, for which machine learning capabilities are often used. For companies that primarily target business customers, finding the best set of products for every customer is commonly referred to as propensity to buy. This article explains how to analyse and predict propensity to buy with Einstein Discovery. The use case was designed and written in collaboration with Reinier van Leuken, and I want to say thank you for the teamwork.
Large Variety of Targeted Customers
Let’s start with a brief introduction to some important types of markets. This distinction clarifies which markets are best suited for the machine learning use case and which are better served by other types of models.
- Direct / Indirect: In a direct market, a product manufacturer sells to point-of-sale customers directly. In an indirect market, the manufacturer sells to a distributor first, who then sells to point-of-sale customers. This is a very common distinction in the Consumer Goods space, where companies can be in either direct or indirect markets, or in a combination of both.
- B2B: A B2B market addresses business customers. An example is the telecommunication space in which companies sell internet subscriptions and phone services to other companies.
- B2B2C: In a B2B2C market, companies sell to end consumers through intermediaries such as distributors (for example, Consumer Packaged Goods companies).
- B2C: A B2C market addresses consumers directly. An example is the luxury industry, in which companies generally have their own stores to sell their creations to their customers.
Propensity to buy is a valuable use case for both B2B and B2B2C
Propensity to buy is a good fit for B2B and B2B2C markets, both direct and indirect. To promote their products, companies in those markets rely on a large population of sales representatives who, on a daily basis, interact with their customers to check product presence and look for upsell opportunities. Within their large portfolio, sales reps need to quickly find high-value customers to visit. Once on-site, they need to come up with the best fitting products to propose to that customer. Please note that, even in an indirect B2B2C market, the sales rep visits point-of-sale customers to check product presence and to influence sales (even if only indirectly).
A high-potential customer is a customer who has a high probability of accepting a new, high-margin product. The fit of a product for the customer depends on a combination of both the product characteristics (size, price, and so on) and the customer characteristics (region, demographics, and so on). Finding those best fits is like looking for gold in a river: an intense but extremely valuable process. Now imagine an automated assistant that guides you directly to the gold! This guidance is referred to as propensity to buy, which is why it is such an important puzzle to solve.
Propensity to buy is not the same as a product recommendation in B2C!
It’s common to confuse propensity to buy with product recommendations (recommending products directly to consumers) in B2C. It is, after all, about choosing the right product according to a consumer profile. However, from a machine learning point of view, they are very different.
First, a consumer (person) and a company are simply not comparable in terms of behavioural characterisation. For example, the complete overview of a consumer’s purchases is not always known, because they can buy across different platforms and have different identities on each one. As a consequence, the data used for consumer product recommendations are not about past purchases, but rather about high-level consumer profiles and what they have included in the same basket. This is a key difference that partially explains why very different models are used. In fact, in the traditional product recommendation world, there is a greater need for clustering algorithms. These generally use so-called collaborative filtering algorithms that cluster both consumers and products, matching them together depending on what similarly behaving customers have purchased in the past. Clustering examines the interaction of multiple variables to group similar products or customers. Collaborative filtering belongs to the unsupervised learning subset of machine learning problems. Einstein Discovery, on the other hand, only addresses supervised learning tasks, for which it uses different algorithms (for example, Generalized Linear Model or Gradient Boosting Machine).
Second, sizing differs between product recommendation and propensity to buy. Product recommendation is designed for thousands or millions of products and customer types. This is not ideal for the algorithms used by Einstein Discovery.
Leveraging Propensity to Buy in the CRM Context
One of the biggest threats for AI projects is the lack of adoption by business users. This is often due to not explaining the predictions coming from the model (in terms of top factors) and not embedding these results in an operational system. Einstein Analytics is embedded in the familiar Salesforce experience, which enables users to quickly build rich user experiences by combining AI and data visualisation capabilities for decision support. As a consequence, the Einstein Analytics platform increases adoption. Without leaving their day-to-day operational user experience, the sales rep can leverage integrated analytics dashboards to understand which customers to prioritise, and then navigate to the particular targeted account to understand which products to propose.
Dashboards are completely configurable. Dashboards can incorporate predictions and other data coming from external sources or from Salesforce. The goal is first to give the sales rep contextual and explained predictions, but also to provide a consultative selling story to use in the interaction with the customer.
An Example
The following gif shows a dashboard that can be included on the home page for sales reps.
The X-axis represents a Health score, which reflects customer loyalty (in terms of total purchases). The Y-axis represents the future potential of a customer, which is calculated using the propensity to buy model.
This dashboard allows for an informed prioritisation of the customers, balancing current health with future potential, in exactly the way the sales rep desires. The sales rep can attend to boosting already loyal customers even further, focus on the not-so-loyal customers with a high potential, and so on.
After selecting the Accounts to target, the sales rep can take action directly from the dashboard, such as adding them to today’s route plan or inspecting further details on the relevant Account pages.
The following gif shows a dashboard embedded in the account page.
On the upper right side, all the SKUs already purchased by the customer are ranked with their corresponding sales volumes. Once we select a SKU, we see in the Treemap chart all the other SKUs that are purchased by customers who have also purchased the selected SKU, but that are not yet purchased by the Account that we are inspecting here. The middle dashboard shows those whitespace SKUs ranked by their Einstein prediction scores.
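The whitespace logic described above can be illustrated with a minimal sketch. It is not how the dashboard itself is built; the function name `whitespace_skus`, the shop and SKU identifiers, and the `scores` dictionary standing in for the Einstein prediction scores are all hypothetical.

```python
def whitespace_skus(baskets, account, selected_sku):
    """SKUs bought by other customers who also bought `selected_sku`,
    excluding SKUs the inspected account already buys."""
    already_bought = baskets[account]
    candidates = set()
    for customer, skus in baskets.items():
        if customer != account and selected_sku in skus:
            candidates |= skus
    return candidates - already_bought


# Hypothetical purchase history: customer -> set of SKUs purchased.
baskets = {
    "shop_1": {"sku_a", "sku_b"},
    "shop_2": {"sku_a", "sku_c", "sku_d"},
    "shop_3": {"sku_b", "sku_e"},
}

# Hypothetical stand-ins for the prediction scores of the whitespace SKUs.
scores = {"sku_c": 0.4, "sku_d": 0.8}

# Whitespace SKUs for shop_1 after selecting sku_a, best score first.
ranked = sorted(
    whitespace_skus(baskets, "shop_1", "sku_a"),
    key=scores.get,
    reverse=True,
)
```

Here only shop_2 shares the selected SKU, so its remaining SKUs become the candidates, ordered by score.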
If a product is highly scored by Einstein in a particular shop, then it is probably because it is a success in other shops with characteristics similar to the one being studied. However, Einstein’s scoring is not solely based on the products’ IDs, but also on their features. Basing the machine learning on those product features as well can surface similar products that may also fit well, even if those exact products have not yet been proposed in similar shops; it is their characteristics that are used to estimate the likelihood of success. For example, suppose a newly launched product has a taste that is very similar to an already introduced product. The new product now benefits from the existing observations and gets ranked highly.
Einstein does not dictate to the sales rep which products to propose. Rather, it suggests new insights in a human-machine collaborative way. In fact, all results are explained transparently, which avoids black-box approaches and increases business trust and adoption.
How do you set up such a dashboard?
Required Input Data
To come up with such recommendations, Einstein Discovery is essentially solving a classification problem by answering the question “Is the product a good fit for this customer?” For that, the model needs to learn from both the successes and the failures of introducing new products to customers. Therefore, the model needs as input both past sales and product suggestions that were made to customers but not accepted.
The data must be enriched with the relevant contextual information:
- Product characteristics: for example, size, colour, and so on.
- Customer characteristics: for example, region, demographics, and so on.
- Customer past purchases: Past purchases of customers are a good predictor of what they will buy in the future. It is therefore highly recommended that you add binary columns (one per product or product family) to the dataset that indicate whether (yes, no) the customer already has the associated product. The model uses those characteristics as drivers to quantify customer interest in a certain product. Note that this increases the number of columns in the Einstein Discovery Story. Thus, in order to keep the model manageable, it is recommended to do this only for the main products, or to aggregate to the product family level.
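As a minimal sketch of the past-purchase columns described in the last bullet, the snippet below derives one yes/no flag per product family from an order history. The customer names, product families, and the `has_<family>` column naming are hypothetical, and in practice this preparation could equally be done in your data preparation tool.

```python
# Hypothetical order history: (customer, product family) pairs.
orders = [
    ("cust_1", "lager"),
    ("cust_1", "stout"),
    ("cust_2", "lager"),
]

# Only the main families get a column, to keep the column count manageable.
main_families = ["lager", "stout", "cider"]


def past_purchase_flags(orders, families):
    """Return {customer: {"has_<family>": "yes" or "no"}}."""
    owned = {}
    for customer, family in orders:
        owned.setdefault(customer, set()).add(family)
    return {
        customer: {
            f"has_{family}": "yes" if family in bought else "no"
            for family in families
        }
        for customer, bought in owned.items()
    }


flags = past_purchase_flags(orders, main_families)
```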
Data Structure Example
Let’s simplify to a case with only three products (A, B and C) and two customers with the following information:
This information requires a transformation before it can be used as input data to Einstein Discovery. All possible combinations of products and customers need to be explicitly present, with a row for every Product-Customer combination, and an output variable that specifies whether the product was purchased by the customer or not:
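The transformation can be sketched as a cross join between customers and products. The purchases below are made up for illustration and do not reproduce the article's original tables; the column names `customer`, `product`, and `purchased` are also assumptions.

```python
from itertools import product

products = ["A", "B", "C"]
customers = ["customer_1", "customer_2"]

# Hypothetical purchases as (customer, product) pairs.
purchases = {("customer_1", "A"), ("customer_1", "B"), ("customer_2", "C")}


def build_training_rows(customers, products, purchases):
    """One row per customer-product combination, with a yes/no outcome."""
    rows = []
    for customer, prod in product(customers, products):
        rows.append({
            "customer": customer,
            "product": prod,
            "purchased": "yes" if (customer, prod) in purchases else "no",
        })
    return rows


rows = build_training_rows(customers, products, purchases)
```

With three products and two customers this yields six rows, and every combination that was never purchased appears explicitly as a "no".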
Definition of yes and no: It does not make sense to include all past purchases of a customer. Why not? Because their behaviour may change over time. One possible solution is to estimate the average period between two consecutive orders. Positive examples (output variable = yes) then only include orders that fall within the most recent period. After you have decided on your definition of the period, you can make the necessary calculations in the data manager. The rationale behind this idea is that if a customer did not buy the product recently, their interest has either disappeared, or they were never interested in the first place.
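One way to express this rule is sketched below. It assumes the period is computed per customer from their order dates and that "the last period" is measured back from a reference date; both are assumptions, as are the function names and the sample dates.

```python
from datetime import date


def average_order_gap(order_dates):
    """Mean number of days between consecutive orders (needs >= 2 orders)."""
    ordered = sorted(order_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def label_recent(last_purchase, today, period_days):
    """'yes' if the product was bought within the last period, else 'no'."""
    if last_purchase is None:
        return "no"
    return "yes" if (today - last_purchase).days <= period_days else "no"


# Hypothetical order history for one customer.
order_dates = [date(2023, 1, 1), date(2023, 1, 31), date(2023, 3, 2)]
period = average_order_gap(order_dates)
today = date(2023, 3, 10)

recent_label = label_recent(date(2023, 3, 2), today, period)
stale_label = label_recent(date(2023, 1, 1), today, period)
```

A product last bought within the average gap counts as a positive; an older or missing purchase counts as a negative.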
How will my model learn from positive customer purchases if they are much less frequent than the negative ones?
Discovery is solving a classification problem by answering the question “Is it a good fit?” Discovery then needs to learn from the successes and failures of introducing new products to customers. As mentioned previously, in a perfect scenario, we have explicit negatives and positives: the customer has been clearly introduced to a product and has expressed their acceptance or refusal. However, looking at all possible combinations will generate a lot of negative examples and far fewer positive examples, because a customer will typically have only a few products and not all of them. The model will, in this case, underperform. The model’s objective is mainly to maximise accuracy, which is the overall ratio of correct predictions, and not a ratio per specific class (negatives and positives are the two classes of the predicted output). If one class is much more frequent than the other, then the rare one will barely count in the accuracy, and thus positive predictions in the propensity to buy use case will not be accurate.
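A small worked example makes the problem concrete. The numbers are hypothetical: 5 positives among 100 customer-product rows, scored by a trivial model that always predicts "no". The helper functions are plain illustrations, not part of Einstein Discovery.

```python
def accuracy(actual, predicted):
    """Share of all predictions that are correct."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def recall(actual, predicted, positive="yes"):
    """Share of actual positives that were predicted as positive."""
    hits = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    return hits / sum(a == positive for a in actual)


# Hypothetical dataset: 5 positives among 100 customer-product rows.
actual = ["yes"] * 5 + ["no"] * 95

# A trivial model that never predicts a purchase.
always_no = ["no"] * 100

overall_accuracy = accuracy(actual, always_no)
positive_recall = recall(actual, always_no)
```

The trivial model is right on 95 of 100 rows, yet it finds none of the positives, which are the only predictions a sales rep cares about.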
We therefore need to reduce the number of negative examples and restore a better “class balance”. Without resorting to oversampling of the positive examples, this means dropping some of the negative examples. But which negative examples should we drop? This is a process that requires careful consideration so as not to introduce a bias into the model. One approach that we have used successfully in the past is to keep only the negatives for the top X% of customers. Our hypothesis was that the relationship is better with big customers, and thus they have probably been introduced to the products or know them better than smaller customers. When these customers don’t purchase a product, it is assumed to be for an educated reason, and not just because they did not hear about it. Therefore, their negative examples are more worthwhile to keep than the negative examples of smaller customers. A good ratio is roughly 30% positives in the total training dataset. The exact fraction of customers to use is then calculated so that the negative examples make up about 70% of the total dataset.
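The selection can be sketched as follows. This is only one possible reading of the approach: customer size is assumed to be measured by revenue, customers are added whole from largest to smallest until the negative target is reached, and all names and figures are hypothetical.

```python
def select_negatives(rows, customer_revenue, positive_share=0.30):
    """Keep all positives, plus the negatives of the largest customers,
    until positives make up roughly `positive_share` of the result."""
    positives = [r for r in rows if r["purchased"] == "yes"]
    negatives = [r for r in rows if r["purchased"] == "no"]
    # Number of negatives needed for the requested class balance.
    target = round(len(positives) * (1 - positive_share) / positive_share)

    kept, top_customers = [], []
    by_size = sorted(customer_revenue, key=customer_revenue.get, reverse=True)
    for customer in by_size:
        if len(kept) >= target:
            break
        top_customers.append(customer)
        kept.extend(r for r in negatives if r["customer"] == customer)
    return positives + kept, top_customers


# Hypothetical data: 10 customers, 10 products, one purchase per customer.
customer_revenue = {f"c{i}": 1000 - 100 * i for i in range(10)}
rows = [
    {"customer": f"c{i}", "product": f"p{j}", "purchased": "yes" if i == j else "no"}
    for i in range(10)
    for j in range(10)
]

training_rows, top_customers = select_negatives(rows, customer_revenue)
share = sum(r["purchased"] == "yes" for r in training_rows) / len(training_rows)
```

Because customers are kept or dropped as a whole, the resulting share lands near the target, not exactly on it, which matches the "more or less 30%" guidance.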
Where to find customers’ past purchases
In the direct market, customers make purchases from the company itself. Therefore, past orders usually come from the company’s ERP systems. However, in the indirect market (let’s stick to the CPG space as an example), retailers and restaurants usually buy from distributors. They rarely report their actual sales to the CPG companies. In this case, some workarounds must be found: the sales can be estimated from the visit reports of sales reps, which reveal what products are on the shelf, or obtained from the distributors themselves if there is a contractual agreement to exchange such data.
Can we always have one model for multiple products’ propensity?
When predicting the propensity to buy for multiple products, using a single model works only if all products can be described by a common set of features. Those features should explain why a consumer would choose a certain product over another. For example, in the beer market, SKUs can be completely defined by their flavour and size. Those features would almost completely explain why a consumer would choose one beer over another.
However, if the products are of a different nature or type, then multiple models are needed: one model per product family with a congruent feature set, or one model per single product. For example, in the postal delivery space, services are very different (digital, physical mailing, and so on). Describing those products by a common set of attributes can be challenging. In this case, multiple models are required to cover the total product portfolio.
Summary
We believe in the collaboration between humans and machines, and the propensity to buy use case is one of the best examples of it. The fit between a product and a prospective customer is calculated using past observed behaviours and keeps improving with sales input. Integrating the predictions in users’ screens with additional business context reinforces the confidence that sales representatives have in the recommendations, and makes their product proposals and explanations to customers more trustworthy. Einstein Analytics also enables fast deployment of the use case with no algorithm development. Only drag-and-drop operations are required to incorporate the recommendations in users’ screens.