The Concepts of Data Sync – Data Orchestration Part 2
This blog is part of the Data Orchestration blog series. Having understood how to connect to different data sources in part 1 of this blog series, we will now take a closer look at data sync and its importance with regard to data orchestration.
What Are Data Connect and Data Sync?
Tableau CRM (Einstein Analytics) by itself does not have any data. In order to get data into Tableau CRM, we first need to set up a connection between our data source and Tableau CRM, as explained in the previous part of this blog series. We call this setup a “Connection”. The data could live in different data sources, and we will have to set up a connection to each of them. Tableau CRM has a list of out-of-the-box connectors available to bring in data, and the list of data sources you can connect to has increased with every release.
Note: You can check the latest list of available connectors here.
Note: We can connect to the same data source multiple times. See more on how to create another SFDC Local Connection.
Once we have connected to a data source, we can then decide which objects we want to sync and bring their data into the cached data layer. For every connection, we need to enable at least one object for the connection to show in the list of connected data sources. The data gets cached in Tableau CRM, and this is not counted against the data row limit. There is no limit to the amount of data that can be cached, though the external connectors have some limitations of their own, which we will cover later.
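If you prefer to inspect this setup programmatically rather than through the UI, the Analytics REST API exposes the connections and their synced objects. The sketch below is a minimal Python example; the instance URL, access token, API version, and the exact response field names are assumptions for illustration, so verify them against the Analytics REST API Developer Guide for your org.

```python
# Minimal sketch: list Tableau CRM connections and their synced objects via the
# Analytics REST API. The instance URL, access token, API version, and response
# field names below are assumptions for illustration; check the Analytics REST API
# Developer Guide for the exact shapes in your API version.
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"  # hypothetical org URL
ACCESS_TOKEN = "00D...session_token"                     # obtain via OAuth
API_VERSION = "v56.0"                                    # assumed API version

headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

# List the connections (data connectors) set up in the org.
connectors = requests.get(
    f"{INSTANCE_URL}/services/data/{API_VERSION}/wave/dataConnectors",
    headers=headers,
).json()
for connector in connectors.get("dataConnectors", []):
    print(connector.get("name"), "-", connector.get("connectorType"))

# List the objects enabled for data sync (connected / replicated objects).
synced = requests.get(
    f"{INSTANCE_URL}/services/data/{API_VERSION}/wave/replicatedDatasets",
    headers=headers,
).json()
for obj in synced.get("replicatedDatasets", []):
    print(obj.get("sourceObjectName"), "synced via", obj.get("connector", {}).get("name"))
```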
The Importance of Cached Data
The major advantage of having the data cached in Tableau CRM is that we are able to query and fetch results faster, as we are not hitting the actual data source. Rather, we are just using the synced data which is locally available in Tableau CRM. As mentioned above, the cached data does not count against the data row limit, as it is not yet stored as a dataset that can be explored by users.
The image below shows the two ways we can get data into a dashboard: from a dataset via data prep and data sync, or directly from the Salesforce object itself via Salesforce Direct. The query from the dataset is much faster compared to Salesforce Direct (live data), as we covered in detail in part 1 of this blog series. Live data and data from a dataset both have their advantages, and the use case determines which approach to take.
Note: For help with determining a good use case for live data, see tips for Salesforce Direct data queries.
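To make the comparison concrete, here is a minimal Python sketch of the two query paths: a SAQL query against a synced dataset through the Analytics query endpoint, and a SOQL query against the live Opportunity object (the same kind of query a Salesforce Direct step issues). The dataset id/version, credentials, and endpoint details are assumptions for illustration only.

```python
# Minimal sketch contrasting the two query paths described above. The dataset
# id/version, credentials, and API version are hypothetical; verify the endpoint
# details against the Analytics REST API Developer Guide.
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"  # hypothetical org URL
ACCESS_TOKEN = "00D...session_token"
API_VERSION = "v56.0"
headers = {"Authorization": f"Bearer {ACCESS_TOKEN}", "Content-Type": "application/json"}

# 1) Query the synced, cached data in a dataset with SAQL via the Analytics query endpoint.
saql = (
    'q = load "0Fbxx0000004CyhCAE/0Fcxx0000004DFWCA2"; '  # hypothetical "datasetId/versionId"
    "q = group q by 'StageName'; "
    "q = foreach q generate 'StageName' as 'Stage', count() as 'Count';"
)
dataset_result = requests.post(
    f"{INSTANCE_URL}/services/data/{API_VERSION}/wave/query",
    headers=headers,
    json={"query": saql},
).json()

# 2) Query the live Salesforce object directly with SOQL (comparable to Salesforce Direct).
soql = "SELECT StageName, COUNT(Id) FROM Opportunity GROUP BY StageName"
live_result = requests.get(
    f"{INSTANCE_URL}/services/data/{API_VERSION}/query",
    headers=headers,
    params={"q": soql},
).json()
```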
Data Sync Considerations
Now we understand that by having the data synced via “Data Connect”, we are leveraging the cached data layer, which enables us to query large datasets faster. Leveraging this approach comes with a series of logistical considerations. One important consideration is how frequently we need or want this cached data refreshed. There are two ways to kick off a data sync: we can run it manually or we can schedule the data connection.
When setting up connected data and scheduling the connections, there are certain limits to take into consideration that can influence your data orchestration.
- The maximum number of concurrent data sync runs: 3.
- The maximum number of objects that can be enabled for data sync: 100.
- The maximum amount of time each data sync can run for local objects: 48 hours.
- The maximum amount of time each data sync can run for remote objects: 12 hours.
Note: See more information on how to schedule, run, and monitor your data sync.
When you set up a schedule for your data connection, you can choose to have the cached data refreshed at intervals as short as 15 minutes and as long as once a month. However, be sure to understand the limitations above as well as the run times of the data sync, as it may not be realistic to keep a 15-minute schedule.
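As a rough way to pressure-test a plan against the limits above, a small back-of-the-envelope check like the one below can help. The run times, object counts, and connection names are made up for illustration; the limit constants mirror the list above.

```python
# Minimal sketch: sanity-check a planned sync setup against the documented limits
# and against observed run times before committing to an aggressive schedule.
# Connection names, object counts, and run times are invented for illustration.
MAX_CONCURRENT_SYNC_RUNS = 3
MAX_SYNC_ENABLED_OBJECTS = 100

planned_connections = {
    "SFDC_Local_High_Frequency": {"objects": 5, "run_minutes": 10, "schedule_minutes": 15},
    "SFDC_Local_Low_Frequency": {"objects": 40, "run_minutes": 95, "schedule_minutes": 24 * 60},
}

total_objects = sum(cfg["objects"] for cfg in planned_connections.values())
if total_objects > MAX_SYNC_ENABLED_OBJECTS:
    print(f"Too many objects enabled for sync: {total_objects} > {MAX_SYNC_ENABLED_OBJECTS}")

if len(planned_connections) > MAX_CONCURRENT_SYNC_RUNS:
    print("More connections than concurrent sync slots; overlapping runs will queue.")

for name, cfg in planned_connections.items():
    if cfg["run_minutes"] >= cfg["schedule_minutes"]:
        print(
            f"{name}: a {cfg['schedule_minutes']}-minute schedule is unrealistic; "
            f"the sync alone takes ~{cfg['run_minutes']} minutes."
        )
```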
Different Types of Data Sync
Looking at your local Salesforce data, and hence the local connector, there are three different data sync types to choose from: Full Sync, Incremental Sync, and Periodic Full Sync. Below, each type is clarified to make it easier to choose which one fits your scenario.
- Full Sync: Pulls all records from the chosen Salesforce object, and overwrites the records from the previous sync. The complete cached data is wiped clean and fresh cached data is built for that object. In the below image, we see that we have made a data connection to the local Salesforce org (SFDC_Local) and are doing a full sync for the User object.
- Incremental Sync: Pulls only new, updated, and deleted records to reflect the changes since the previous sync for the defined Salesforce object. This is done by looking at the last modified date of a given record; if it falls between the last sync and now, the whole record is updated. Be aware of formula fields: if they are calculated based on related objects, the last modified date might not be updated even though the value of the formula field has changed, so you can be at risk of data drift (see the sketch after this list). Regardless, this sync type runs faster, as the cached data for the object is not replaced wholesale; hence incremental sync is faster than a full sync.
- Periodic Full Sync: Runs an incremental sync on each scheduled sync. However, it also runs a weekly full sync on the first sync that takes place after Friday 11 PM in your org’s time zone. Hence periodic full sync ensures that there is no data drift, which might be the case with incremental sync as explained previously.
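The data drift risk mentioned under incremental sync is easier to see with a small simulation. The sketch below mimics the idea of an incremental sync driven purely by the last modified date; the records, field names, and timestamps are invented for illustration, and this is not Tableau CRM's actual implementation.

```python
# Minimal sketch of the idea behind incremental sync: only records whose
# LastModifiedDate falls after the previous sync are pulled and merged into the
# cached copy. All records, fields, and timestamps here are made up.
from datetime import datetime, timezone

last_sync = datetime(2023, 5, 1, 2, 0, tzinfo=timezone.utc)

cached_records = {
    "001A": {"Id": "001A", "Name": "Acme", "Score__c": 10},
    "001B": {"Id": "001B", "Name": "Globex", "Score__c": 20},
}

# Records as they look in Salesforce now. Note that 001B's formula field changed
# because a related object changed, but its own LastModifiedDate did not move:
# this is the data drift scenario described above.
source_records = [
    {"Id": "001A", "Name": "Acme Corp", "Score__c": 15,
     "LastModifiedDate": datetime(2023, 5, 2, 9, 30, tzinfo=timezone.utc)},
    {"Id": "001B", "Name": "Globex", "Score__c": 25,
     "LastModifiedDate": datetime(2023, 4, 20, 8, 0, tzinfo=timezone.utc)},
]

for record in source_records:
    if record["LastModifiedDate"] > last_sync:
        cached_records[record["Id"]] = {
            k: v for k, v in record.items() if k != "LastModifiedDate"
        }

print(cached_records["001A"]["Score__c"])  # 15 -> picked up by the incremental sync
print(cached_records["001B"]["Score__c"])  # still 20 -> drifted; a periodic full sync corrects it
```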
Mixing Up the Data Sync Type
A question you may ask now is: Can I have different syncs for different objects? The answer is yes, depending on the object we can choose to have a full sync, incremental sync, or periodic full sync. Below you can see how we can have different syncs for different objects.
Multiple Local Connectors and Data Sync Frequency
It is possible to have different sync frequencies by having multiple local connectors, dividing the objects between the local connectors, and scheduling them at different intervals.
The business use case will define your data sync type as well as the frequency. Generally speaking, there are a few things to keep in mind before we decide on the cached data refresh rate.
Understand the Type of Data
First, you should consider the type of data you are dealing with, as it will help define how often your data changes and thereby how often it needs to be refreshed.
There are two types of data: transactional data and master data.
- Transactional Data: This is data related to day-to-day transactions, hence it's typically updated regularly. Transactional data usually requires a higher refresh rate, and you could set this data to be refreshed every hour. An example of transactional data could be “Events”, where the data typically changes frequently and you might want to keep track of it in near real time.
- Master Data: This is data that remains unchanged over a period of time. An example of master data is the Account object, where data would not change as dynamically as “Events” data, discussed above. We typically don’t need to have a high refresh frequency for this data and a daily refresh would be sufficient.
How to Decide the Schedule Frequency?
The image below shows two connections: one for master data objects and another for transactional data objects. Having two connections gives us the flexibility to run them on different schedules.
As a best practice, you set up two connections. One connection holds the objects with transactional data and is set with a higher refresh rate. The second connection holds the objects with master data, which can be refreshed less frequently.
Consider the following example with two connectors as illustrated in the image below.
- Connection 1: SFDC_Local_Low_Frequency: Accounts / Products: refreshed every day
- Connection 2: SFDC_Local_High_Frequency: Events / Tasks: refreshed every hour
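Expressed as plain configuration, the plan above might look like the sketch below. The object API names and schedule values are illustrative; they simply mirror the example connections.

```python
# Minimal sketch of the two-connection setup described above, expressed as a plain
# configuration mapping. Connection names mirror the example; Account, Product2,
# Event, and Task are the standard API names for the objects mentioned, and the
# schedule values are illustrative only.
SYNC_PLAN = {
    "SFDC_Local_Low_Frequency": {
        "objects": ["Account", "Product2"],
        "refresh_every_minutes": 24 * 60,   # master data: daily is usually enough
    },
    "SFDC_Local_High_Frequency": {
        "objects": ["Event", "Task"],
        "refresh_every_minutes": 60,        # transactional data: hourly
    },
}

for connection, settings in SYNC_PLAN.items():
    print(
        f"{connection}: {', '.join(settings['objects'])} "
        f"every {settings['refresh_every_minutes']} minutes"
    )
```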
Object Level Sync Settings (For SFDC_LOCAL Connection Only)
The object-level sync settings described above, including the different sync options (full sync, incremental sync, or periodic full sync), are only available for the local Salesforce connector and should be leveraged, as they allow for a faster sync process. However, remember that we set the sync schedule at the connection level and not at the object level. In addition, you should keep the following limits in mind when setting up your local connectors:
- You can have a maximum of 10 SFDC_LOCAL connections.
- The maximum number of objects that can be enabled for data sync: 100 (external and internal objects).
In the next part of the data orchestration blog series, we bring all of these data connection and data sync learnings together by looking at a scenario and outlining what to keep in mind as you plan this part of the data orchestration process. Or head back to review the other blogs in the Data Orchestration blog series.