The phrase “Time is of the essence” is used to express urgency in all kinds of fields, from legal to medical. In the context of predictive modeling, I like to read it as time being one of the most important concepts to consider. Unfortunately, though, that concept of time is often overlooked. Simply put, what is true today may no longer be true tomorrow. It is therefore crucial that we make predictions in the right temporal context. That’s what Projected Predictions, a new Einstein Discovery feature in Summer ’22, is all about. Let’s look at a few use cases that I have recently encountered while working with Einstein Discovery users.
Use Cases
Example 1
A manufacturer of building materials sells goods to construction companies. Let’s consider window frames as an example. The customer, i.e. the construction company, has the option to buy these frames fully finished or semi-finished, e.g. uncoated. Sometimes they may not buy these products at all and instead produce the frames from scratch themselves. This choice is heavily impacted by the raw material price. When the raw material price is low, the margins for the manufacturer of building materials are high, and it becomes attractive for the customer to make the frames themselves. However, when the raw material price increases, making the frames becomes less and less profitable for the construction company, which may then decide to buy the frames from the manufacturer. So, seen from the manufacturer’s perspective, the probability of winning an Opportunity depends on the raw material price, which changes over time with market dynamics.
When predicting the probability of winning an Opportunity, what matters isn’t today’s raw material price, but the price at the expected close date of the Opportunity, in the future. After all, that future date is when the customer makes the final make-or-buy decision. So to predict the most accurate Opportunity win probability, we first need to forecast the raw material price and then use the forecasted value (i.e. the price at the close date) to generate a prediction.
The illustration above explains the dynamics of time. The predictive model uses 4 variables to predict win probability:
- Product
- Industry
- Segment
- Raw Material Price
The first three variables are static and don’t change over time. The raw material price, however, fluctuates, as we can observe from the past pricing data. Today, the raw material price is $50. The Opportunity is expected to close in 164 days, however, and if we extrapolate the price dynamics, we expect that the raw material price will have increased to $130 by the close date! This may lead to a completely different win probability.
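To make the idea of extrapolating to the close date concrete, here is a minimal sketch that fits a straight line to a price history and evaluates it 164 days ahead. The price history and the choice of a simple least-squares line are my own assumptions for illustration; this is not the forecasting method Einstein Discovery uses internally, and the numbers do not reproduce the $130 from the example.

```python
# Hypothetical price history: (days relative to today, price in dollars).
history = [(-120, 10.0), (-90, 20.0), (-60, 30.0), (-30, 40.0), (0, 50.0)]


def project_price(history, days_ahead):
    """Fit a least-squares line to the history and evaluate it days_ahead days from today."""
    xs = [day for day, _ in history]
    ys = [price for _, price in history]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least squares for a single predictor.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in history) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * days_ahead


price_today = history[-1][1]
price_at_close = project_price(history, days_ahead=164)
print(f"today: {price_today:.2f}, projected at close date: {price_at_close:.2f}")
```

The point is only that the value fed into the model at prediction time is `price_at_close`, not `price_today`.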
Example 2
A non-profit organization that I worked with is heavily dependent on private donations. To optimize their efforts during the donor cultivation lifecycle, they are interested in the probability of a household becoming a major donor, which means donating over $25,000 in a single donation. This organization employs a team of gift officers who engage with those potential donors. Building the relationship takes time, and the gift officers maintain an expected donation date, similar to how a sales representative maintains a close date on an Opportunity.
This major donation is typically not the first donation that these donors make to the cause; a series of smaller donations (often over a longer period of time) precedes such a generous gift. Not surprisingly, the total sum of money donated to date is highly indicative of the probability of making a major donation of $25,000 or more.
And you guessed it already: when making the probability-to-major-donation prediction, it’s not today’s total donation to date that matters, but the forecasted sum of donations at the expected major donation date.
Example 3
A final example considers churn prediction. Here we consider a company that offers service plans to its customers, where each plan has a defined maximum usage. How much of the plan the customer actually uses is up to them. The company tracks this in a monthly KPI called usage per customer, defined as Monthly Available Services / Monthly Used Services. Whenever a customer has a high risk of churning, a service retention agent reaches out to the customer to set up an improvement plan to try to keep the customer on board.
Typically, such a retention program takes time, and therefore it is important to start turning the customer around before it’s too late. Let’s suppose this retention program takes 5 months. This company then needs to predict the probability of a customer churning in 5 months. If that particular probability is high, the time to act is now.
And indeed – to predict the probability for a customer to churn in 5 months, we need to know the forecasted usage KPI for that customer in 5 months from now.
What all these scenarios have in common is that they are about predictions for a point in the future (e.g. the expected close date, the expected major donation date, a potential future churn date) and that these predictions are based on a numeric variable that changes over time (raw material price, total donated sum, usage rate). To obtain an accurate prediction, we therefore first need to forecast this numeric variable to the right future point in time and use that forecasted value to make the prediction. And that is precisely what projected predictions do.
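The two-step pattern described here (forecast the changing input first, then score the model with the forecast) can be sketched in a few lines. Everything in this block is hypothetical: `toy_win_probability` is a made-up logistic curve standing in for a trained model, and `naive_forecast` is a deliberately simple trend extension, not the forecasting logic of Einstein Discovery.

```python
import math


def toy_win_probability(raw_material_price):
    """Hypothetical stand-in for a trained model: a higher price gives a higher win probability."""
    return 1.0 / (1.0 + math.exp(-(raw_material_price - 90.0) / 20.0))


def naive_forecast(values, steps_ahead):
    """Extend the average step between observations by steps_ahead intervals."""
    avg_step = (values[-1] - values[0]) / (len(values) - 1)
    return values[-1] + avg_step * steps_ahead


def projected_prediction(model, values, steps_ahead):
    """Forecast the time-varying input first, then score the model with the forecast."""
    forecasted_value = naive_forecast(values, steps_ahead)
    return model(forecasted_value)


# Hypothetical monthly raw material prices, oldest first.
monthly_prices = [30.0, 35.0, 40.0, 45.0, 50.0]

p_today = toy_win_probability(monthly_prices[-1])
p_projected = projected_prediction(toy_win_probability, monthly_prices, steps_ahead=5)
print(f"using today's price: {p_today:.2f}, using the projected price: {p_projected:.2f}")
```

Note that the model itself is untouched; only the value passed into it changes.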
Please note that none of this requires training the model differently. All of this functionality applies only at prediction time. After all, what the model needs to learn from is still the relationship between, for example, the raw material price and the opportunity outcome. That relationship is captured correctly as long as each row of the training data reflects an opportunity with its outcome (closed won or closed lost) and the raw material price at the close date.
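As an illustration of what such training rows could look like, the sketch below pairs each closed opportunity with the raw material price on its close date. The field names, dates, and prices are hypothetical and only serve to show the shape of the data.

```python
from datetime import date

# Hypothetical raw material price history, keyed by date.
price_by_date = {
    date(2022, 1, 10): 48.0,
    date(2022, 3, 1): 61.0,
    date(2022, 5, 20): 95.0,
}

# Hypothetical closed opportunities with their outcomes.
closed_opportunities = [
    {"product": "Window frame", "close_date": date(2022, 3, 1), "won": False},
    {"product": "Window frame", "close_date": date(2022, 5, 20), "won": True},
]


def build_training_rows(opportunities, price_by_date):
    """Attach the price as of the close date (not as of today) to each outcome."""
    rows = []
    for opp in opportunities:
        rows.append(
            {
                "product": opp["product"],
                "raw_material_price": price_by_date[opp["close_date"]],
                "won": opp["won"],
            }
        )
    return rows


training_rows = build_training_rows(closed_opportunities, price_by_date)
```

This sketch raises a `KeyError` when no price is recorded for a close date; a real pipeline would need a rule for that case.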
Setting it up
Let’s see how to set this up, to solve for use case example number 2, i.e. predicting the probability of becoming a major gift donor based on the total donation amount so far.
Projected Predictions are a part of the feature transformations in Einstein Discovery. These are data transformation processes that are applied to the input training data and the data that is used to generate predictions when the model has been deployed. These are powerful transformations; define once and use them twice.
They are configured in the Story Settings. There, we find this option for every numerical feature (i.e. every measure in the training dataset). When selecting ‘Projected Predictions’, we see that there are various settings we need to configure. Let’s walk through them one by one.
Trend Dataset
This is a CRM Analytics dataset that contains the historic trend of the numeric variable that we will forecast prior to making a prediction. In this case, the projected variable is ‘Donation Amount Prior to Major Gift’, so we selected a dataset called ‘Donation History’. The grain of this dataset is {Household, Month}, so it contains the total donation so far at a monthly level for each household. A good way of capturing this data is by using Salesforce field history tracking on that field!
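To picture the {Household, Month} grain, here is a small sketch that turns individual donations into one row per household and month with the running total so far. The donation records are invented, and the column names are borrowed from the settings in this walkthrough; how you actually build the dataset in CRM Analytics (for example with a recipe) is up to you.

```python
from collections import defaultdict
from datetime import date

# Hypothetical individual donations: (household record id, donation date, amount).
donations = [
    ("HH-001", date(2022, 1, 15), 100.0),
    ("HH-001", date(2022, 1, 28), 50.0),
    ("HH-001", date(2022, 3, 3), 200.0),
    ("HH-002", date(2022, 2, 10), 500.0),
]


def build_trend_rows(donations):
    """Return one row per household and month with activity, holding the running total."""
    monthly = defaultdict(float)
    for household, day, amount in donations:
        # Truncate each date to the first of its month to get the monthly grain.
        monthly[(household, day.replace(day=1))] += amount

    rows = []
    running = defaultdict(float)
    for (household, month), amount in sorted(monthly.items()):
        running[household] += amount
        rows.append(
            {
                "Record ID": household,
                "Date": month,
                "Donation Amount Prior to Major Gift": running[household],
            }
        )
    return rows


trend_rows = build_trend_rows(donations)
```

Months without any donation are simply missing in this sketch; depending on how the trend is consumed, you may want to fill those gaps by carrying the previous total forward.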
Unique Identifier Column
Here, we specify a column from the Trend Dataset that links it to the object that the model will be deployed to. In this case, the ‘Donation History’ dataset contains a column called ‘Record ID’ that stores the record id of the household, for identification purposes.
‘Donation Amount Prior to Major Gift’ column
The name of this setting is identical to the name of the model variable that we are projecting for. With this setting, we specify which column from the Trend Dataset contains the projected variable, i.e. in this case the total donation given thus far.
Time Interval Column
The trend dataset tracks values over time, and with this setting we specify which column from the dataset contains the date field. In this instance, the column is called ‘Date’.
Time Interval
The time interval column is in date format, so with this setting we specify the granularity at which we want to project (Day, Week, Month, or Quarter).
Number of Intervals to Project Ahead
With this setting, we control the projection horizon, i.e. how far ahead to project. For example, when the interval is Month, and this setting is set to 5, the prediction is projected 5 months into the future.
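Together, the Time Interval and the Number of Intervals to Project Ahead determine the point in time the prediction is for. The helper below is a hypothetical illustration of that arithmetic using only the Python standard library; it is not part of any Salesforce API, and the end-of-month handling is my own assumption.

```python
import calendar
from datetime import date, timedelta


def project_date(start, interval, count):
    """Return the date that lies count intervals after start.

    interval is one of "Day", "Week", "Month", or "Quarter".
    """
    if interval == "Day":
        return start + timedelta(days=count)
    if interval == "Week":
        return start + timedelta(weeks=count)
    if interval not in ("Month", "Quarter"):
        raise ValueError(f"unsupported interval: {interval}")

    months = (1 if interval == "Month" else 3) * count
    # Count months from year 0 so that year rollovers are handled by divmod.
    month_index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    # Clamp the day for shorter months, e.g. the 31st becomes the 28th in February.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


horizon = project_date(date(2022, 8, 31), "Month", 5)
print(horizon)
```

With the interval set to Month and 5 intervals ahead, a prediction made on August 31, 2022 is for the end of January 2023.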
Seasonality
Many trends over time also contain seasonal effects, and this setting controls how seasonality is handled (auto-detect, none, or a specific number of periods).
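To give a feel for what “a specific number of periods” means, here is a sketch of a seasonal naive forecast, which simply repeats the most recent full season. I chose this technique only because it is easy to follow; the source does not say which forecasting method Einstein Discovery applies, and the usage numbers are made up.

```python
def seasonal_naive_forecast(values, season_length, steps_ahead):
    """Forecast by repeating the last observed season of length season_length."""
    if season_length < 1 or len(values) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = values[-season_length:]
    # Step 1 reuses the first value of the last season, step 2 the second, and so on.
    return [last_season[step % season_length] for step in range(steps_ahead)]


# Hypothetical quarterly usage with a repeating pattern every 4 periods.
quarterly_usage = [10, 20, 30, 40, 12, 22, 32, 42]
forecast = seasonal_naive_forecast(quarterly_usage, season_length=4, steps_ahead=5)
print(forecast)
```

Setting the number of periods to 4 on quarterly data corresponds to a yearly pattern, and 12 would do the same on monthly data.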
Using the Projected Predictions
When an Einstein Discovery model is trained using projected predictions, it affects all prediction channels. For a detailed overview of the different deployment types, see this article. There is one prediction channel where we can actually see the impact of this visualized, and that is the Einstein Predictions Lightning Component that can be embedded on Salesforce pages.
The component now lists the prediction including the projection horizon, e.g. a high churn probability in 9 months.
Note: The component above shows the results of use case 3, not use case 2 (which is the one we set the model up for). Don’t let that confuse you; it is done simply to illustrate and cover two use cases at the same time.
When clicking the link that reads “in nine months”, the visualization shown here on the right is opened. We see the revenue trend going downwards (purple dashed line) with decreasing confidence (purple corridors) and we see the different predictions that are associated with these revenue numbers. As the revenue line dips below 600, for this particular customer the churn probability increases to 98%.
In summary
Projected Predictions is a powerful feature that lets you work temporal patterns into your predictions. Especially when your prediction is heavily dependent on numeric variables, it is important to consider whether these variables have constant values or change over time. When they change over time, chances are that the accuracy of your prediction can be improved with projected predictions. That way, these numeric variables will first be forecasted to the point in time that your prediction is actually for. The prediction is then based on the time-adjusted, forecasted value instead of a value that will be outdated when that point comes!
What numeric fields do you have in your models that actually change over time?