Practical Approach to Converting Dataflows to Recipes
Have you heard the saying "all great JSON must come to an end"? Well, maybe not… But if "go with the dataflow" has been your mantra, it's time to hop on the new trend and start living by a new one: "follow the recipe." Dataflows are slowly being deprecated, and while it's fun to reminisce about the good ole days when there was no Visual Editor, we made changes directly in the dataflow JSON, and we had to find creative solutions to complex requirements, it's time to look toward a future where data prep is relatively easy regardless of your technical background. By the way, we'll still have great JSON; now it's just in the background… like your favorite socks or pillow.
NOTE: To read more about next-gen data prep, check out the blog published earlier this year on the future of dataflows and recipes.
So what do we do about all the dataflows you have already created, the ones that power your dashboards and that your users depend on? You should, of course, convert them to recipes. This might seem like a daunting process, so this blog aims to give you some best practices for the approach you should take. We've gathered input from product managers and technical architects who have helped convert highly complex dataflows. This blog is a joint collaboration that walks you through a methodical, practical conversion plan based on the expertise of Von Clark McClendon, which you can benefit from immediately. Let's do this!
Approach overview
The point of view of this blog is based on large enterprise customers, where multiple people are involved in the conversion project. It's very likely that a consultant or implementation partner is involved, so it's important to note that knowledge from different stakeholders is critical and customer engagement is necessary. Though the point of view is an enterprise one, others can still leverage this blog and the approach it puts forward by modifying or simplifying where applicable. Throughout this blog, it's assumed you have good expertise in CRM Analytics features and know how to navigate the tool; therefore, we will not go into detailed guidance on where or how to click in the tool.
Finally, the blog aims to lay out a repeatable, structured approach and will reference other Salesforce blogs and Help/Training pages where more detailed instructions are needed.
As you may know, a conversion tool has been introduced to help with the dataflow-to-recipe conversion; however, the conversion process deserves more consideration than just running the tool. We suggest the following steps, which we will delve into in more depth throughout the blog.
- Planning
  - Identify key players/teams.
  - Create the conversion plan/draft.
  - Conduct a discovery call with the Business Analyst or Data Architect.
  - Review key metrics for the org's dataflows.
  - Run the Adoption Analytics app.
  - Review stats/take action.
- Dataflow optimization
  - Review the CRMA asset lifecycle (DEV to PROD).
  - Identify the (most complex) dataflow to convert.
  - Review the dataflow.
  - Optimize the dataflow.
- Conversion
  - Convert the dataflow.
  - Review the output recipe.
- Recipe optimization
  - Optimize or re-create the recipe.
  - Run and test the recipe.
  - Split into multiple recipes if applicable.
- Finalize conversion
  - Update dataset and field references.
  - Test, test, test!
  - Deploy and schedule your new recipe(s).
  - Remove assets no longer needed.
Before we drill into the details of each step, please note that it's always best to develop in a sandbox, specifically a recently refreshed full sandbox if possible, because you are able to test data volume and timings without impacting your production environment. This also helps minimize any impact on the KPIs/SLAs of production users who rely on datasets, recipes, and/or dashboards to make their business decisions.
NOTE: To learn more about the development cycle, please refer to this best practice blog series.
Finally, a Learning Days webinar was delivered on this topic. The session explores the spreadsheets described in the following sections in great detail, so you can benefit from watching the webinar alongside this blog. Check out the webinar here.
Step 1: Planning
As with any new project, you need to start off with, at a minimum, a kick-off call to:
- Define project scope
- Identify stakeholders
- Define the project plan or steps to be taken
- Define the timeline
During the discovery call, it's imperative to involve all the key stakeholders, so ask yourself: who in your organization will carry out, or "own," the conversion project? Who will be the project sponsor? Who knows, and more importantly understands, the logic of the dataflow best? Who knows how the output of the dataflow is being leveraged? Who is responsible for data governance in CRM Analytics? Those are the key people that you will need, at a minimum, to engage and involve in the kick-off call.
As prep for this call, it's critical to identify any and all key objects/metrics for your data sync and dataflows – basically any jobs processed in the Data Manager. These will be used throughout the project to benchmark the outputs, which in turn helps identify the dataflow you will tackle first.
There are, of course, myriad ways of documenting this, but we suggest the structured matrixes outlined in the following sections.
Replication log example
The replication log helps you identify how long each one of your objects takes to run data sync. By recording the different dates and times of day that replication runs, we are further able to identify a pattern of approximately how long it takes to complete. This overview is important because it must be considered when looking at dataflow or recipe completion times, since either needs updated data before transforming and denormalizing it into datasets.
NOTE: It may be good to review the data orchestration blog series to better understand how to manage data sync, including multiple local connectors and sync options for master and transactional data.
In the example above, you will notice a date column defining the date the data sync ran (of course, you can have multiple data syncs a day) as well as the time the data sync kicked off. The following columns are dedicated to the different objects you are syncing. Each object has a total and an actual column, which capture the time it took to sync that object with and without queued time, respectively. You can find all of these details in the job monitor log in the Data Manager.
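If you keep the replication log in a spreadsheet, a few lines of scripting can turn it into the pattern overview described above. Below is a minimal Python/pandas sketch; it assumes a hypothetical CSV export with one row per object per sync run, and the file and column names are placeholders rather than any Salesforce export format.

```python
import pandas as pd

# Illustrative only: assumes your replication log is kept in a long format
# (one row per object per data sync run) and exported to CSV.
# The file and column names are placeholders for your own spreadsheet.
log = pd.read_csv("replication_log.csv")  # columns: date, time, object, total_min, actual_min

# Queued time is the gap between the total elapsed time and the actual sync time.
log["queued_min"] = log["total_min"] - log["actual_min"]

# Averaging per object across runs reveals the pattern of how long each object
# typically takes to sync, and how much of that is just waiting in the queue.
summary = (
    log.groupby("object")[["actual_min", "queued_min"]]
    .mean()
    .sort_values("actual_min", ascending=False)
)
print(summary)
```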
Dataflow log example
The dataflow table above helps you identify where the complexity lies by listing not only the total time it takes to run each dataflow but also the time per node group and, finally, how long each individual node ran. Typically, there is one dataflow that is "problematic": one that has become more and more complex over time and now takes longer to run. This is most likely where you want to start your conversion project. Tick off one dataflow at a time; do not convert all your dataflows simultaneously.
If there are any questions on how to prioritize which dataflow to convert, it can be a good exercise to review which dashboard is most used and look at the datasets that power that dashboard. To aid this, you can install the Adoption Analytics templated app, which provides insight into which datasets and dashboards your users (DAU/MAU) depend on most, and include this in your prioritization process.
During the call, make sure to divide tasks and define how often you will meet to discuss the conversion project's progress and field any questions that may have come up. Typically, a 15-30 minute check-in should be scheduled twice a week.
Key Objects example
The Key Objects table comprises all objects and related fields/columns to be replicated, including row counts. This is captured by Org ID and instance (DEV to PROD); in other words, you would need this for each environment you are using throughout the project.
This is a critical table to capture, as it baselines performance metrics/throughput and helps you understand them. It also gives a quick overview for comparing and contrasting field utilization (possible removals/clean-ups) across the deployment.
Additionally, prior to the conversion, you may leverage filter logic on a specific object to prune data further; for example, if an object only needs current year (CY) data, the object filter would be Created Day > 20220101. Continue to analyze this and update the table as you prep and optimize the dataflow for conversion to a recipe.
Step 2: Dataflow Optimization
As mentioned, during the kick-off call you should be able to identify the dataflow you want to tackle first. But remember: do not work on more than one dataflow at a time. The very first task is to gather even more information about the dataflow you are working on, so let's dive into that.
Dataset Usage
The first question to answer is where the output of the dataflow is used. The dataflow will have one or multiple datasets (1:N) that are used in visualizations such as lenses, dashboards, and/or stories. It's critically important to understand where these assets are leveraged downstream (end to end) from the dataflows (data asset lineage), so make sure you know where to update references later. This also helps pinpoint where to focus your testing.
The datasets are represented by register nodes in the dataflow. For each dataset, you can click the drop-down menu next to the dataset and select "Edit" from Analytics Studio, or "Open in Analytics Studio" if you are in the Data Manager. This will give a view of the source and all the different dependencies it has, as illustrated in the pictures below.
Dataflow Node Timings
The next item on the list is to gather the details of the dataflow run times. This is a crucial exercise, as it will help you identify the complexities in the dataflow and thus where you need to focus your attention. In our experience, more and more functionality is added to dataflows over time without cleaning out what is no longer necessary, and they become performance bottlenecks. You do not want to carry unnecessary transformations and fields over to the recipe, so it's best to review this beforehand, at the granularity of each individual node.
The best way to get this data is to simply look in the job monitor of the Data Manager and expand the details for the dataflow run you are interested in. Another, perhaps quicker, way is to leverage Mohan's plugin to get a CSV with the node label, timing, and node type. For more information on how to leverage this plugin, check out this blog.
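Once you have the node-level timings in a CSV, a quick script can rank the nodes and summarize where the time goes. The following is a hedged Python/pandas sketch; the file name and column names are assumptions (adjust them to whatever your export actually contains), not the plugin's guaranteed output format.

```python
import pandas as pd

# Assumed CSV layout: one row per dataflow node with its label, type, and run time.
# Rename the columns below to match the actual export.
nodes = pd.read_csv("dataflow_node_timings.csv")  # columns: node_label, node_type, duration_sec

# The slowest individual nodes are usually the first candidates for optimization,
# or for moving into a separate recipe later in the project.
print(nodes.sort_values("duration_sec", ascending=False).head(10))

# Total time per node type shows whether joins, computed fields, or digests
# dominate the overall run time.
print(nodes.groupby("node_type")["duration_sec"].sum().sort_values(ascending=False))
```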
Optimize
With all the specifics pertaining to the dataflow execution gathered, it's now time to review the dataflow even further and optimize it where applicable. We have found that over time, most dataflows tend to have more transformations and fields added to them, which increases the run time, yet the datasets may very well contain more than what the business needs. Needless to say, it does not make sense to carry this technical debt into the dataflow conversion, and you should address it before progressing to the conversion tool.
Now Darshna Sharma has already written an excellent blog on how to optimize long-running dataflows, so instead of repeating all the steps, tools, tips, and tricks here, please review and complete the steps outlined in her blog.
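Alongside that optimization pass, it can help to take a quick inventory of what the dataflow actually contains. Here is a small Python sketch, offered as an illustration only, that parses a downloaded dataflow definition JSON, counts nodes by transformation type, and lists the registered datasets so you can cross-check them against what your dashboards actually use; the file name is a placeholder.

```python
import json
from collections import Counter

# A dataflow definition JSON is a map of node names to
# {"action": <transformation type>, "parameters": {...}}.
# The file name below is a placeholder for your downloaded dataflow JSON.
with open("sales_dataflow.json") as f:
    dataflow = json.load(f)

# Count nodes by transformation type to see where the bulk of the logic sits.
print(Counter(node.get("action") for node in dataflow.values()))

# List the register nodes, i.e. the datasets this dataflow produces.
for name, node in dataflow.items():
    if node.get("action") == "sfdcRegister":
        print(name, "->", node.get("parameters", {}).get("alias"))
```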
Prepping for Conversion
As you hopefully know, recipes have more advanced transformations than dataflows, and certain data use cases are addressed differently in the two tools. This means there will most likely be some "creative dataflow solutions" that could be addressed more efficiently in recipes rather than converted 1-to-1.
From our experience, there are some "gotchas" and data use cases that need to be highlighted in the context of converting your dataflow to a recipe, so that you can address them. Some of these are merely things to remember to set up after the conversion, whereas others are crucial to handle before trying to convert. Let's have a closer look.
When analyzing your dataflow for the parts highlighted in the table, you can benefit from leveraging Mohan's plugin to get a visual representation of your dataflow like the one you know from the dataflow canvas, but without having to click every single node to preview the logic behind it. Check out this blog for how to use the plugin, what commands to use, and details on what will be visually available.
As you look through the above table and find these scenarios in your dataflow, note them down and, depending on your findings, remove the logic from your dataflow.
Step 3: Convert the dataflow
With the deep-dive analysis complete, we are getting closer to hitting that convert button and leveraging the dataflow-to-recipe conversion tool. But before you attempt to use it, there are some things to keep in mind about the tool:
- The tool is only available in orgs that have data sync enabled in the Analytics Settings.
- The tool will only convert eligible transformations from the dataflow – see details in the following.
- The original dataflow is not changed.
- The run schedule for the original dataflow is not changed.
- Using the conversion tool creates one corresponding recipe.
- A dataflow can be converted multiple times, however, each conversion will replace the corresponding recipe definition.
- If you want to edit a recipe generated by the conversion tool you should save it as a new recipe to avoid it being overwritten.
- The recipe generated by the conversion tool is not automatically run and does not inherit the dataflow scheduling.
- The datasets created by the recipe will not overwrite datasets from your dataflow; instead, they will have "_df2R3" added to the end of the dataset name.
As mentioned above, the tool will review the dataflow logic defined in the dataflow JSON and map it to recipe functionality. To understand what is mapped, how it is mapped, and, perhaps more importantly, what is not mapped, please review the Salesforce Help documentation on Dataflow Conversion Mapping before clicking that magic conversion button. You should expect to modify the output recipe before it can run.
NOTE: Some configurations/transformations will not be converted, so you do need to review your dataflow against the current mapping definition found in the Dataflow Conversion Mapping documentation.
After reviewing this document, you may find additional logic to remove before the conversion or to revisit after the conversion; please do so before converting anything.
Another great resource to check out is Jim Pan's "Abra Cadabra, Dataflows to Recipes!" blog, which contains some key considerations and recommendations.
Once you feel confident that you have completed the dataflow revision, you can proceed to convert your dataflow to a recipe using the conversion tool. All you have to do is navigate to your dataflow in the old Data Manager, then click the drop-down next to the dataflow name and choose "Convert to Recipe".
How long it takes to convert the dataflow to a recipe depends on how big your dataflow is, but generally speaking it's pretty fast. Once the recipe is ready, it will open automatically in your browser. You might notice that certain things didn't convert correctly, so it's time to review all the warnings you spot in the recipe.
Step 4: Review the output recipe and optimize it
With the click of a button, you have your converted dataflow as a recipe. But as mentioned before, dataflows and recipes simply work differently, so we cannot just click the button and say we are done. We do need to review the results and make changes.
Firstly, go through your recipe and review all the nodes with a warning symbol. These are clear indications that something didn't convert correctly, and the help text should guide you to the modifications you need to make.
Secondly, it's time to tackle all the logic you found and removed during the conversion prep step. This goes back to the table we presented with all the known pitfalls we have come across while working on dataflow conversion projects. Recommendations for some of the most common use cases are addressed in the blog Abra Cadabra, Dataflows to Recipes!
NOTE: Some more complex dataflow logic may need to be addressed with a new approach, including but not limited to multi-value handling. We hope to address these in separate blogs as time permits and update this blog with links to the approaches.
Once you have implemented your changes, it's time to run your recipe and test it. It sounds simple, but this is an absolutely crucial part and should not be skipped. We have three goals here:
- Understand whether everything runs without errors.
- Validate that the transformations deliver the desired results.
- Understand the timings of the recipe.
NOTE: Make sure you are using a recently refreshed full sandbox to fully test the run time of your new recipe.
For the first two points, you would have to go back to your recipe and fix the logic in case something isn't as expected. For the third point, we have seen some discrepancies between dataflow and recipe run times; this is where the run times we pulled in the planning phase come in handy. Know that the product team is aware of this and is looking into it. However, as mentioned, dataflows have in most cases expanded over time, and it may be smart to start splitting your recipe into multiple recipes. Let's review an example.
Let's say your converted recipe is flattening the role hierarchy and joining the results back into your dataset. Your role hierarchy most likely isn't changing several times a day, or even every day, so it could be a win to remove this complexity from your converted recipe. Instead, you can create a new recipe where you flatten the hierarchy and create a staged dataset as the output, which you can use in your converted recipe.
Note: As of Winter '23, staged datasets are a beta feature and have to be enabled by Salesforce Support. For more information, check out the Summer '22 Release Notes.
Unfortunately, a guide to when and how to split a recipe does not exist, as every recipe is unique to its use case. However, if the data and logic for a section of your recipe don't change with the same frequency as the rest of the recipe, that section might be a good candidate for a split.
Regardless, if you do choose to split your recipe into multiple recipes, make sure to thoroughly test each one individually to verify that your transformations and data are correct.
Step 5: Finalize conversion
Once you are happy with your recipe(s) and have fully tested the results, you can move ahead and start using the datasets in your dashboards.
Backup and Update References
The easiest way to update the references is to return to your recipe and review the output nodes. First, you want to double-check that all the API names of your fields are the same as they are in the dataset originating from your dataflow. You can get a better overview of this by switching from the 'Preview' tab to the 'Column' tab (as seen below) and comparing it to your field usage spreadsheet.
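For wide datasets, comparing dozens of API names by eye is error-prone, so a small script can do the diff for you. The sketch below assumes, purely for illustration, that you have pasted both field lists into plain-text files (one API name per line); the file names are placeholders.

```python
# Hedged sketch for spotting field API name discrepancies between the original
# dataset (from your field usage spreadsheet) and the recipe output node.
with open("original_dataset_fields.txt") as f:
    original = {line.strip() for line in f if line.strip()}
with open("recipe_output_fields.txt") as f:
    recipe = {line.strip() for line in f if line.strip()}

# Fields your dashboards may reference that the recipe no longer produces.
print("Missing from recipe:", sorted(original - recipe))

# Fields the recipe adds or renames; check these for unintended API name changes.
print("New in recipe:", sorted(recipe - original))
```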
If you have discrepancies, it's time to fix them and avoid load errors on your dashboards and lenses. You can go back to where the field originated – transformation or join nodes – to change the labels and API names, or you can create a new transformation node and leverage the 'Edit Attribute' quick transformation. Just note that adding another transformation might impact your recipe run time.
Next, in the output nodes, you want to change the dataset API name to match that of the dataset originating from your dataflow. In other words, remove the '_df2R3' from the end of the API name. This will overwrite the original datasets and replace them with the output of the recipe.
Finally, you want to check that the datasets are saved in the correct app and, if not, update that to ensure users are still able to access the datasets and, again, to avoid any dashboard or lens load errors.
And don’t forget to run the recipe before moving on to the next step.
Test, Deploy & Clean
Once you have updated the references, it's time to test again; this time the focus should be on the analytics assets you have updated. You really just need to follow your standard user acceptance testing, making sure you:
- Identify any errors.
- Test data security.
- Do end-to-end performance testing.
NOTE: For more information on how to manage the asset life cycle including UAT, check out the blog Typical CRM Analytics Asset Lifecycle.
And when everyone is happy, it's time to deploy your changes. For more details on how to deploy your analytics assets, make sure to review the blog Uncovering Deployment Techniques for CRM Analytics, which covers in detail how to approach your deployment, including pre- and post-deployment steps.
One important thing to remember in the deployment process is to set up the recipe schedule and remove the dataflow schedule.
Once everything is validated in production, you can start the clean-up process for your dataflow. Guess what: there is another great blog on how to approach cleaning up your Data Manager assets. It's worth noting that this blog covers the analysis part as well; however, since you have already done that at the beginning of your conversion project, you can skip a few steps and go straight to the cleanup section of the blog.
And that’s it! Please let us know your thoughts or use cases in the comments below.