We all want our data to refresh at lightning speed. Slow refreshes mean stale data, and potentially some messages from our end users asking “what gives”. So when there’s a way to speed up our connections to external data sources, I know I’m listening!
By configuring our external data store connections (such as Snowflake and S3), we are able to achieve “incremental sync” with these connections in Data Manager. While this article focuses on a Snowflake connection for example’s sake, the functionality shown applies to other external data stores such as S3.
What is incremental sync?
An incremental sync is when only new, updated, and deleted records are pulled relative to the latest successful sync. This method of data refresh results in fewer records being pulled from your data store, meaning faster refreshes! Consider the example below. Initially in CRMA, we have a dataset of three rows. In our external database, however, there is a new row, “Taylor”, shown in green. An incremental sync would only process the one new “Taylor” row into our CRMA dataset, rather than copying all four rows over as a Full Sync would.
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-12.png?resize=1200%2C278&ssl=1)
In the real world, we aren’t dealing with single-digit row counts. You likely have thousands, if not millions, of rows of data to work with. Being able to reduce a refresh job from processing, say, 10 million rows down to 80,000 rows means a faster, more resource-efficient data refresh.
To accomplish this in CRM Analytics with external data, we will explore three approaches: the Data Sync Filter approach in CRM Analytics, the Custom View approach in the external connection (Snowflake), and finally the Existing Dataset Append approach (Beta at the time of writing, GA Summer ’25)! First, let’s establish our external connection.
Setting Up Your External Data Connection
If you are familiar with our external connectors in CRM Analytics, you know that we only have “Full Sync” as an option when connecting to objects in Data Manager. Have no fear! We can still optimize our external connection to achieve incremental sync. Let’s get started with our example. We first need to replicate our data into CRM Analytics from Snowflake. We do this by entering Data Manager in our org and configuring the Snowflake Input connector.
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-16.png?resize=1200%2C612&ssl=1)
More information about each argument in the connector window can be found in our help docs here. Once configured, we can see the selection of Views and Tables from our Snowflake account. For this example, we are going to connect to a table, GLOBALORDERS, that houses all of our orders.
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-15.png?resize=1200%2C635&ssl=1)
We can now save our connected object. Once we connect to our “Global Orders” Table from Snowflake, we run an initial recipe to register the Table as a Dataset in CRM Analytics. Be sure to register the dataset in an App in Analytics Studio (not My Private App!). This is crucial, as it is our initial full copy of data from Snowflake into CRMA’s data store. From here, we are able to incrementally add new rows rather than copying all of our data over on every refresh.
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-13.png?resize=678%2C472&ssl=1)
Great! We now have our Snowflake connection configured and our Table from Snowflake replicated as a dataset in an App in Analytics Studio. If we ran our sync and recipe as defined now, it would be a Full Sync each time. With some filtering, we can configure each sync to be incremental!
We have two major approach types to consider: a Data Sync Filter approach, or a Custom View in our external database approach. This is largely a matter of whether we define our refresh logic within CRM Analytics or within the architecture of our external data store. For learning’s sake, we will cover both.
We will start by looking at our dataset in Snowflake. New orders for 2024 have come in and are now in our GLOBALORDERS table. We need to sync this data into CRM Analytics!
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-14.png?resize=1200%2C497&ssl=1)
Data Sync Filter Approach
At a high level, we can use a data sync filter on a date field to pull only this year’s rows. We then append those rows to our existing main dataset in CRMA. To handle duplicate rows, we drop the current year’s data from our existing dataset with a filter node before appending.
We start by going to our Snowflake Input connection. From there, we select our GLOBALORDERS Table and navigate to the Data Sync Filter tab. This allows us to define a filter that, when a sync runs for this Table, is passed to the data store and executed as part of a WHERE clause when querying the Table.
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-18.png?resize=1200%2C643&ssl=1)
We have now defined the sync to only pull rows from our table where the Order_Date is in 2024 or later. Notice that we define the filter in the context and logic of our Snowflake instance. This is because the filter is computed in Snowflake, so we need to use the API names of our fields in the Snowflake Table and follow Snowflake’s date logic. We can test our Data Sync Filter by running the sync.
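As a rough sketch, assuming the Snowflake column’s API name is ORDER_DATE (your field name and cutoff date will differ), the filter expression might look like this Snowflake-style condition:

```sql
-- Hypothetical Data Sync Filter expression. CRMA passes this to Snowflake,
-- where it runs as part of the WHERE clause of the sync query.
-- Keep only orders placed in 2024 or later.
ORDER_DATE >= DATE '2024-01-01'

-- An equivalent form using Snowflake's YEAR() function:
-- YEAR(ORDER_DATE) >= 2024
```

Because the expression executes in Snowflake, any function or column name you use here must be valid in Snowflake SQL.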
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-19.png?resize=1200%2C660&ssl=1)
The sync ran successfully, and only three rows came in! These are the three rows of 2024 orders from our Snowflake table. Great! Now we can use a recipe to automate adding these rows into our dataset.
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-17.png?resize=1200%2C550&ssl=1)
Our recipe overwrites our existing dataset: it reads the existing dataset from CRMA, appends the new rows from the GLOBALORDERS object we just defined our data sync filter on via an Append node, then outputs the result to overwrite the initial dataset. The key thing to mention is that we filter the 2024 data out of the existing dataset before appending, which avoids duplicates in our dataset. Once we save and run this recipe, we can go back to the dataset in Analytics Studio and see it matches our Snowflake table one-to-one! We can schedule our recipe run and rest easy knowing we have a more performant sync.
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-20.png?resize=1200%2C678&ssl=1)
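If it helps to think of the recipe in SQL terms, the filter-plus-append pattern is roughly equivalent to the sketch below; the table names are illustrative, since the actual work happens in recipe nodes rather than SQL.

```sql
-- Conceptual equivalent of the recipe (not actual CRMA syntax):
-- keep the pre-2024 rows from the existing dataset, then append the
-- freshly synced 2024 rows from the filtered GLOBALORDERS object.
SELECT * FROM GLOBAL_ORDERS_DATASET      -- input: existing CRMA dataset
WHERE YEAR(ORDER_DATE) < 2024            -- Filter node: drop this year's rows
UNION ALL
SELECT * FROM GLOBALORDERS_SYNCED_ROWS;  -- Append node: rows from the filtered sync
```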
Custom View in Snowflake Approach
Alternatively, we can apply our logic in Snowflake by using Views. Here, we create a new View that exposes only the most relevant data, then connect to that View in CRMA’s Data Manager.
First, we create the custom View in Snowflake. In the WHERE clause of the View’s SQL statement, we apply our filter logic to subset the data. In this View, I am getting a rolling window of the last 6 months of order data from the ORDERDATA Table.
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-22.png?resize=1200%2C490&ssl=1)
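For reference, a rolling-window view like this might be defined along the following lines; the database, schema, view, and column names here are assumptions for illustration.

```sql
-- Hypothetical Snowflake view exposing only the last 6 months of orders.
CREATE OR REPLACE VIEW ANALYTICS.PUBLIC.RECENT_ORDERS_V AS
SELECT *
FROM ANALYTICS.PUBLIC.ORDERDATA
WHERE ORDER_DATE >= DATEADD(month, -6, CURRENT_DATE());
```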
We can verify our WHERE clause in the Data Preview pane, seeing that, of my original dataset, only order 167533 was placed in the last 6 months.
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-21.png?resize=765%2C367&ssl=1)
Now that our view looks good, we can go into CRMA’s Data Manager and add it as a new object in the Snowflake Connection.
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-24.png?resize=1200%2C651&ssl=1)
Once we save our new Object, we can create a recipe to incrementally add our new rows from the View into our CRMA dataset.
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-23.png?resize=1200%2C767&ssl=1)
The logic of our recipe follows that of the Data Sync Filter approach we covered above, with two key differences. First, we use the Custom View from Snowflake as the input node that appends to the CRMA dataset. Second, our filter node matches the WHERE clause logic we defined in the View. This is so that we do not duplicate any rows in CRMA! Since our View filters the data to only the last 6 months of rows, we drop the last 6 months of data in CRMA so we can update it with the latest information. Since we are using relative dates, you will want to ensure that your Snowflake account time zone matches your Salesforce org time zone! If they do not match, adjust the WHERE clause in Snowflake and the Filter node in your recipe to match.
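If the time zones differ and you would rather not change org settings, one option is to anchor the view’s rolling window to your Salesforce org’s time zone explicitly. Here is a sketch, assuming an org in ‘America/New_York’ and the same hypothetical table as above:

```sql
-- Evaluate the 6-month cutoff in the Salesforce org's time zone
-- ('America/New_York' is a placeholder) rather than the Snowflake
-- session time zone.
SELECT *
FROM ANALYTICS.PUBLIC.ORDERDATA
WHERE ORDER_DATE >= DATEADD(
        month, -6,
        CONVERT_TIMEZONE('America/New_York', CURRENT_TIMESTAMP())::DATE
      );
```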
Snowflake Approach Using New Optimized Append
The most performant approach is to use the new Existing Dataset (Append) output node in recipes (Beta as of writing, targeted for GA in Summer ‘25). This feature minimizes the number of rows of data processed in each recipe run. For example, if our Global Orders dataset in CRMA were 10,000,000 rows and the latest round of order data from our Snowflake View were 50,000 rows, this approach would load and process only the 50,000 new rows per recipe run. With the prior approaches, each recipe run would load both the new 50,000-row view and the 10,000,000-row dataset, then process 10,050,000 rows of data, resulting in a longer recipe run. So, this new Append approach is much faster, accounting for 60% faster recipe run times in our testing!
Let’s get started. For example’s sake, I’ve created a new Snowflake View that pulls the last 7 days of order data. I have then connected to this new View in my Snowflake Input connection in Data Manager.
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-26.png?resize=1200%2C532&ssl=1)
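Under the hood, this new View is just a narrower version of the earlier one; assuming the same hypothetical table and column names, it could look like:

```sql
-- Hypothetical Snowflake view exposing only the last 7 days of orders.
CREATE OR REPLACE VIEW ANALYTICS.PUBLIC.ORDERS_LAST_7_DAYS_V AS
SELECT *
FROM ANALYTICS.PUBLIC.ORDERDATA
WHERE ORDER_DATE >= DATEADD(day, -7, CURRENT_DATE());
```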
Create a new recipe with our Snowflake View as the input node and the new output node type as the output. Note that while we do not include any transforms, joins, or other operations in this example recipe, you certainly can for your use case!
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-25.png?resize=1196%2C764&ssl=1)
We’ve defined a pretty sleek recipe. Our input node pulls in 7 days’ worth of data from Snowflake, then our new output node type appends those new rows to our main Global Orders dataset in CRMA. Let’s click into that node to see what’s happening.
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-28.png?resize=1200%2C709&ssl=1)
Here we can see we’ve selected the new “Existing Dataset (Append)” output type and defined the “Append To Dataset” in the output node. This effectively follows the logic of the recipe from the “Custom View in Snowflake Approach” above, but with a key advantage: whenever this recipe runs, only the Snowflake Custom View is loaded by the recipe engine, meaning we do not load the main Global Orders dataset from CRMA at all. Fewer rows mean faster recipes. But why don’t we include a filter to drop duplicate rows like we have in our recipes so far? Thanks to careful refresh scheduling, we don’t need that filter node.
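In rough SQL terms, the difference is an INSERT of just the new rows versus rebuilding the whole dataset each run; the names below are illustrative only, since the CRMA dataset is not actually a Snowflake table.

```sql
-- Prior approaches (conceptually): rebuild the whole dataset every run,
-- reading both the 10,000,000-row dataset and the 50,000 new rows.
CREATE OR REPLACE TABLE GLOBAL_ORDERS AS
SELECT * FROM GLOBAL_ORDERS
WHERE ORDER_DATE < DATEADD(day, -7, CURRENT_DATE())
UNION ALL
SELECT * FROM ORDERS_LAST_7_DAYS_V;

-- Existing Dataset (Append), conceptually: only the ~50,000 new rows
-- are read and written; the existing rows are never loaded.
INSERT INTO GLOBAL_ORDERS
SELECT * FROM ORDERS_LAST_7_DAYS_V;
```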
Now, we schedule the Snowflake Connection data sync in Data Manager to run every 7 days. While we use a 7-day window for example’s sake, this could expand or contract based on your use case (for example, a 1-day filter on the view in Snowflake and a daily data sync in Data Manager).
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-27.png?resize=1200%2C648&ssl=1)
Now that we have scheduled our Snowflake data sync, we can finally schedule the recipe to run after the external connection syncs.
![](https://i0.wp.com/www.salesforceblogger.com/wp-content/uploads/2025/01/image-29.png?resize=1200%2C576&ssl=1)
Will this run after every external connection syncs? Nope, just after the external connection(s) used in our recipe sync. Since we have used only the Snowflake_Input connection, this recipe will run only when that connection syncs. Let’s zoom out. We now have our Snowflake View querying the last week (7 days) of data, our Data Manager connection syncing every week, and our recipe adding rows to our main Global Orders dataset every week. We can now let automation do its work and watch our dataset automatically refresh with current data.
Considerations
Optimizing your external data connections all comes down to using logic to subset your data. In this case, we explored how that logic can be handled entirely on the CRMA side or largely by our external data store (Snowflake). We also looked at how we can use a new output node type to achieve the most performant approach!
Which approach makes the most sense is up to you and your organization. Think about factors such as filter complexity, maintenance and governance, compute time, and cost, to name a few. That said, always be sure you are bringing in only the data necessary for your analysis. The most surefire way to reduce query cost and complexity is to exclude irrelevant data!
For more reading on external data connections in CRM Analytics, check out this article.