Abra Cadabra, Dataflows to Recipes!
Do you love how simple it is to join, cleanse, and enrich your data using Data Prep Recipes, and wish you can transform all of your TCRM dataflows to recipes? But there are just too many of them and you don’t have time for all that work? Even more, perhaps you have these massive, spaghetti-looking dataflows full of legacy SAQL logic, that even just thinking of having to rebuild them in recipes gives you a headache? Do you, over your morning coffee, find yourself lamenting, “if only there’s some magic button that automatically converts my dataflows to recipes for me”?
Lament no more! The magic button is here. In the Summer ’22 release, you can convert your data flows to recipes using the Dataflow Conversion to Recipe (beta) with just a few clicks!
Here’s an example where I convert a Sales Analytics app dataflow into a recipe:
The Dataflow Converter (Beta) will convert essentially most if not all elements in your data flow into a corresponding capability in recipes.
Key Highlights
- The Dataflow Converter (Beta) creates and saves a new recipe based on the source data flow; it does not change the source data flow or its scheduling at all.
- The output datasets in the newly converted recipes are created with a suffix _df2R3 and will not overwrite the original datasets in the source data flow so your datasets and dashboards are completely unaffected. All of the field API names in the new datasets will remain exactly the same as the original datasets also.
- When you are happy with the generated recipe, you can update the dataset API names in the Output nodes; the recipe will overwrite the datasets based on the API names you specify.
- A dataflow can be converted to a recipe multiple times; only one corresponding recipe will be created, and it will be overwritten with each conversion request.
- If you want to make modifications to a converted recipe, you should save it as a new recipe and modify that to avoid the recipe being inadvertently overwritten by a subsequent conversion.
- The converter scans the input data sources (objects, datasets, etc) for column metadata, so if the input data sources are missing, or if fields referenced in the dataflow are missing, the converter won’t work. For example, if you upload the dataflow definition from your buddy (whose org may not have the same objects as your org), that dataflow may not be valid (i.e. it may not run) so it may not be convertible.
- Not all capabilities in data flows are convertible in the Summer ’22 release (see Considerations and Recommendations section below). However, converting a valid dataflow should always create a recipe and should not fail. If you experience issues with the Dataflow Converter (Beta), please contact me directly (see below for details) or Salesforce Support.
Considerations and Recommendations
- Date attributes such as fiscalMonthOffset, firstDayOfWeek, etc, should be defined as a Date Attribute in Data Prep Settings. Those date attributes automatically apply to all date fields in datasets created in recipes, so you do not need to apply them on a field-by-field basis like you had to in Dataflows.
- Conversion for “isMultiValue” and “multiValueSeparator” is not yet supported; if you have them in your computeExpression SAQL expression and you need to generate multivalue fields, you can leverage the new multivalue functions coming in Summer ’22:
split
: returns a multivalue from a delimited stringarray_join
: returns a delimited string from a multivaluearray_contains
: returns true if the multivalue contains the specified stringsize
: returns number of elements in a multivaluearray
: returns a multivalue consisting of the input parameters
- Field attributes such as “precision” and “scale” are not yet supported in conversion for sfdcDigest; if you need to override them in recipes, you can do it via the “Edit Attribute” transform.
- SAQL Functions in “defaultValue” parameters are not supported, and we have no plans to support at this time. Instead, you can implement it using coalesce function or case statement in Custom Formula.
- Self-referential ComputeRelative (eg a computeRelative node with saqlExpression that references the column it generates) is not supported. Depending on your use cases, you should rebuild them using other functions in recipes:
- Using computeRelative to aggregate data based on some data grouping? You can do that with aggregate node
- Using computeRelative to concatenate multiple text values in one column into a column based on a data grouping? You can do that with a Join node (using Lookup with MultiValue) and
array_join
function - Using computeRelative to generate row numbers? You can do that using
Row_Number
function in recipes - Using computeRelative along with filter to keep only the most recent records based on a data grouping (eg last activity record for an account), etc? You can do that using Aggregate and Filter node.
- SAQL “
like
” operator is not supported; you should usestartsWith
,endsWith
, orcontains
functions in Custom Formula - Not all SAQL date capabilities are supported yet: date filtering, relative date ranges, references to date epochs, etc, may not convert; it’s also best-practice to examine them for better ways to solve certain problems. For example, did you know that you can add hours to a DateTime field by using
DateTimeField + interval 6 hours
in recipes? It’s much easier than epoch arithmetics. Similarly, you can compare Dates or DateTime fields in recipes using simple operators like =, <, >, <=, >= without having to convert to epoch or some other formats! - SAQL reference of date parts (eg CreatedDate_Month) is not supported; if you only need the numeric value of the date parts, you can use the corresponding functions like
Year
orMonth
, etc. If you really need the formatted string versions of the date parts, you can map them based on this chart:
Dataflow (SAQL) | Recipe (SQL) |
---|---|
DateField_sec_epoch | to_unix_timestamp(DateField) |
DateField_day_epoch | to_unix_timestamp(DateField) / (60 * 60 * 24) |
DateField_Year | format_number(year(DateField), ‘0000’) |
DateField_Quarter | format_number(quarter(DateField), ‘0′) |
DateField_Month | format_number(month(DateField), ’00’) |
DateField_Week | format_number(weekofyear(DateField), ’00’) |
DateField_Day | format_number(day(DateField), ’00’) |
DateField_Hour | format_number(hour(DateField), ’00’) |
DateField_Minute | format_number(minute(DateField), ’00’) |
DateField_Second | format_number(second(DateField), ’00’) |
toDate() | to_timestamp() |
date_to_epoch() | to_unix_timestamp() |
month_first_day(toDate(‘CloseDate_sec_epoch’)) | date_trunc(‘month’, CloseDate) |
month_last_day(toDate(‘CloseDate_sec_epoch’)) | date_trunc(‘month’, CloseDate) + INTERVAL 1 month – INTERVAL 1 day |
week_first_day(toDate(‘CloseDate_sec_epoch’)) | date_trunc(‘week’, CloseDate) |
Other first_day or last_day functions | Use date_trunc on the different parts: ‘year’, ‘quarter’, ‘month’, ‘week’. |
toString(toDate(‘CloseDate_sec_epoch’), “yyyy-MM-dd”) | date_format(CloseDate, ‘yyyy-MM-dd’) |
- SAQL Date_diff is en route for Winter ’23 (Safe Harbor). In the meantime, this is how you achieve the SAQL date_diff functionality in recipes:
SAQL Date_Diff | Recipe Custom Formula Equivalent |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Next Steps
When Summer ’22 hits your sandbox, I want you to find your biggest, your oldest, your most complex dataflows, and convert them to recipes. If you run into any issues at all, please log a case and reach out to me.
Separately, do you have a SAQL function that you can’t find the equivalent of in Recipes? Is there a use case you can’t figure out how to implement in recipes? Do you feel that we need more Date or DateTime functions? Please contact me on LinkedIn, the Trailblazer Community or join us on Slack at DataTribe, I’d love to hear from you and your specific use cases!
I first thought this was an April Fool’s Day joke. 🙂
Wow! Huge news.
Are you able to comment on other considerations such as relative performance and error diagnostics?
Not sure if I am up to date, but my understanding is that currently Data Flows are superior for both.
The latest and greatest Data Manager in Summer ‘22 will break down recipe execution by more granular components, and we are continuing to make enhancements to make our customers lives easier!
As for perf, recipes and dataflows have very different characteristics so it’s a case-by-case situation. One thing that’s for sure: we have a myriad of perf enhancements for recipes already in the works and it’ll get faster every release!
Love this! Thanks for the blog, Jim 🙂
Thanks Jim – this is fantastic news. Really looking forward to having a go at moving a pivotal dataflow to a recipe so I can enhance my demo environment. Also, huge thanks for taking the time to give us so many really useful date function tips. Great in combination with your specific date blog post. Keep up the great work – we use this kit every single day, so it’s very important to us all.
Hello Jim,
Thx for sharing this post. Is this feature also supporting Flatten nodes for security?
We are using Flatten for role sharing security and adding some parameters manually in Dataflow json code like;
“action”: “flatten”,
“parameters”: {
“include_self_id”: true,
“self_field”: “Id”,
“multi_field”: “ParentRoles”,
“parent_field”: “ParentRoleId”,
“path_field”: “RolePath”,
“source”: “Dgst_UserRole”
Do you think is this something we will able to cover with this feature ?
Yes, flatten is available as a transform in recipes so you can convert safely!
I’m playing around with the new split function, but can’t get it to work as desired. Do you have any example for that?
Is there a way I can get the sales cloud template app compute relative computations for ValidFromDate and ValidToDate in recipe (screenshot) emailed? Decoding the dataflow isn’t a one-to-one conversion.
Could you please post.
I got the derived fields for ValidToDate and ValidFromDate and Duration if anyone wants them I can post the formulas.
Hi Francis,
I’d love to use the formulas. Could you post them, please?
Cheers,
David
Fabulous!
Hi Jim
I have a problem when I am joining nodes. When making several join nodes from the same transformation(or actually every node-type) node, the left source of the join node changes. This seems like a bug.
I am not sure if you are aware of this problem, but I can give you a call, to explain the bug in more detail.
Hi Jim
When I want to make several connections from one node, I have some problems. When I make the second join connection, the left key of the first join node changes. This is a bug.
I don’t know if you are aware of this problem, but give me a call, so we can discuss further.
Hi Jim
I noticed that input nodes (ex edgemart nodes in dataflows) take a lot of time (about 20 min) instead of edgemart nodes in dataflows (max 1 min ).
Is there a way to fix this problem?
Thanks in advance