Einstein Discovery – Live Predictions with Snowflake
Training a machine learning model in Einstein Discovery relies upon data in CRM Analytics (CRMA) datasets. The tight coupling with CRMA datasets provides many benefits to customers in terms of data availability, feature transformations/calculations, a rich ETL layer, connections to dozens of common data sources, and so on. Once deployed to Salesforce, Einstein Discovery models can provide real-time predictions and recommendations on data the model has not seen before. This architecture works very well for scenarios where the center of gravity for the data (and the records to be scored) is in Salesforce.
Of course, sometimes the data required for building Einstein Discovery models does not reside in Salesforce. For that purpose, it’s been possible for a while now to join the external data (at prediction time) with data in a CRM Analytics dataset. That, however, requires moving the external data for predictions into a CRMA supplemental dataset and keeping it up to date there.
Now, in Winter ’23, we’ve introduced a capability that eliminates the need to move the external data for Snowflake customers. Specifically, if the data is in a Snowflake table, it can stay there, because Snowflake can now serve as the host of the supplemental dataset. This means you can leave data in Snowflake and still have those features available at prediction time for records in CRM. This blog post provides a brief overview and walkthrough of the new “Make Live Predictions with Snowflake” capability.
This new type of “live dataset join” in Einstein Discovery allows predictions to be computed using data from a system external to Salesforce. The only data that needs to be moved is for the initial training of the model. So for example, you might have useful predictor variables/features in Snowflake, but perhaps you really only need a small fraction of that to train a quality model with Einstein Discovery. Therefore, once model training is completed, and you’re ready to deploy the model, the relevant features in Snowflake can be used for real-time predictions in Salesforce – without moving the data.
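To make the idea of a prediction-time join concrete, here is a minimal conceptual sketch in Python. It is not how Einstein Discovery is implemented. The in-memory dictionary stands in for a Snowflake table, the function name, the "Loan Amount" field, and all sample values are hypothetical, and only the Region column names come from the example later in this post.

```python
from typing import Optional

# Hypothetical stand-in for a Snowflake table, keyed by the join ID.
# In the real feature this data stays in Snowflake and is read live.
# All values here are made up for illustration.
SNOWFLAKE_FEATURES = {
    "R-001": {"Region Assessment": "Favorable", "Region Population": 250000},
    "R-002": {"Region Assessment": "Unfavorable", "Region Population": 48000},
}


def build_scoring_row(crm_record: dict, join_key: str) -> Optional[dict]:
    """Combine a CRM record with its external features at prediction time."""
    external = SNOWFLAKE_FEATURES.get(crm_record[join_key])
    if external is None:
        return None  # no matching external row for this record
    # The merged row is built only for scoring; the CRM record is not modified.
    return {**crm_record, **external}


crm_record = {"ID": "R-001", "Loan Amount": 18000}  # hypothetical CRM fields
scoring_row = build_scoring_row(crm_record, "ID")
```

The point of the sketch is that the external columns are looked up by key when a record is scored and are never copied onto the CRM record itself.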
This opens up a new opportunity for customers who may have most of the necessary data for predictive modeling in CRM, but also want to enrich predictions with data from Snowflake. For some customers, this also means that you can effectively use Snowflake as a “Feature Store” to supply enriched/transformed inputs for your Einstein Discovery models. There are numerous examples where data elements in Snowflake could be valuable features for a machine learning model that will provide scores and recommendations in Salesforce. Some common types of data in Snowflake might include:
- Transactional data
- Sales and revenue history
- Custom features created with complex transformations
- Marketing data (e.g. web site clicks)
- Payment and financial data
- Scores from existing ML models (e.g. Total Lifetime Value, CSAT)
Walkthrough
With that brief background on this new feature, we will now walk through a simple example of how to set it up and start generating predictions in Salesforce – using a combination of data in CRM plus data resident in Snowflake. Please note this walkthrough assumes you have at least the basic knowledge and experience required for navigating Analytics Studio, creating Recipes, Datasets, Einstein Discovery Stories, etc.
Step 1 – Dataset Readiness
For this article, we are going to assume you’ve already curated a CRM Analytics dataset that you intend to use for model training. If you’re not familiar with creating CRM Analytics datasets, there are numerous blogs, videos, and online training courses that show how.
In our example, we are going to build a loan approval prediction. In the CRM Analytics dataset below, you can see that our outcome variable is a binary value. Along with the Approved column, there are several other features that will be used to train our model. These include two columns (Region Assessment and Region Population) which were extracted from Snowflake to enrich our available CRM data.
Step 2 – Create a Live Dataset
Next, we will create an external dataset that exposes ‘live’ data from Snowflake. We will use this later as a supplemental dataset in Einstein Discovery.
You can see below that our Snowflake table has three columns which we attached to CRM Analytics with the Live connector. Two of those columns are important predictor variables for our model. The ID column is also critical because we need it to join the Snowflake data to the records in the Salesforce object we want to deploy our model to.
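Because the join depends entirely on the ID column, it can be worth checking the keys before you deploy. The sketch below is one possible way to do that in plain Python, assuming you have already exported the ID values from both sides into lists. The function names and sample IDs are hypothetical and are not part of any Salesforce or Snowflake API.

```python
from collections import Counter


def unmatched_ids(crm_ids, snowflake_ids):
    """Return CRM IDs that have no matching row in the Snowflake table."""
    return sorted(set(crm_ids) - set(snowflake_ids))


def duplicate_ids(ids):
    """Return IDs that appear more than once, which would make a join ambiguous."""
    return sorted(key for key, count in Counter(ids).items() if count > 1)


# Hypothetical sample keys, for illustration only.
crm_ids = ["A", "B", "C"]
snowflake_ids = ["A", "C", "C", "D"]

missing = unmatched_ids(crm_ids, snowflake_ids)
ambiguous = duplicate_ids(snowflake_ids)
```

Any ID reported as missing would have no Snowflake features to join, and any ID reported as ambiguous matches more than one Snowflake row.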
Step 3 – Create Einstein Discovery Story/Model
Now that we have our datasets prepared, we can move on to training the model in Einstein Discovery. As usual, we create a Story with our outcome variable – in this case, we are maximizing the likelihood of loan approved = true.
Notice in the story below that the Region fields from Snowflake are two of the three ‘most important’ variables in our predictive model.
Step 4 – Deploy
Once you are finished iterating on your Einstein Discovery model, then of course it’s time to deploy it to Salesforce. This is where the real power of Live Predictions with Snowflake will become evident. The first thing you will do after selecting Deploy from your Einstein Discovery Story is to connect the model to the appropriate object. In our use case, we are selecting the Loan Approval object.
In the Map Model Variable step of the deploy wizard, you will need to add your Snowflake Live dataset as a supplemental dataset and map your model variables appropriately (shown below). Notice that the ID column from our Snowflake data matches an ID field in our CRM object.
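The mapping step can be thought of as assigning every model variable to exactly one source: a field on the CRM object or a column in the supplemental dataset. The following Python sketch illustrates that idea only. The deploy wizard does this through the UI, and the function name, the "Loan Amount" variable, and the name-based matching rule are assumptions made for the example.

```python
def map_model_variables(model_variables, crm_fields, supplemental_columns):
    """Assign each model variable to a source and collect any left unmapped."""
    mapping = {}
    unmapped = []
    for variable in model_variables:
        if variable in crm_fields:
            mapping[variable] = "crm"
        elif variable in supplemental_columns:
            mapping[variable] = "supplemental"
        else:
            unmapped.append(variable)
    return mapping, unmapped


# "Loan Amount" is a hypothetical CRM field; the Region columns are the
# Snowflake columns from this walkthrough.
model_variables = ["Loan Amount", "Region Assessment", "Region Population"]
crm_fields = {"ID", "Loan Amount"}
supplemental_columns = {"ID", "Region Assessment", "Region Population"}

mapping, unmapped = map_model_variables(
    model_variables, crm_fields, supplemental_columns
)
```

In this sketch an empty `unmapped` list means every model variable has a source, which is the condition the mapping step is meant to satisfy.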
Step 5 – View Predictions in Salesforce
After deploying the model to the appropriate object, and adding the Einstein Discovery Lightning component to the page, you should see something similar to the screen below. Notice in our model the top predictors are the Region variables that we joined from Snowflake. Even though these data elements are not persisted anywhere in Salesforce, they are available as predictors on each record so that your users can better understand the probability score.
To wrap things up, this new Einstein Discovery capability in Winter ’23 allows you to enrich CRM records with actionable predictive insights that use “live” Snowflake data. I’m eager to hear how you use this new functionality for your machine learning use cases!