Dataflow performance with field usage analysis
Some of you might have long-running dataflows that, for any number of reasons, you want to run faster. But knowing where to start can be time-consuming. Siva Teja Ghattepally has provided a brilliant webinar with techniques to optimize dataflow performance. To aid this process, Mohan Chinnappan's analytics plugin provides a great command that allows you to:
- Get an overview of your dataflow nodes and their performance,
- Get an overview of which fields are not used and can be removed,
- Get full insight into a node without the clicking required in the actual dataflow editor.
In this blog, I will walk through how to use this ‘analyze’ command. Please note that you will need the Salesforce CLI and Mohan’s plugin to follow along. Please check out this blog for details on how to install or update the plugin.
Note: this blog uses sfdx-mohanc-plugins version 0.0.119. To see the latest details around the command, check out GitHub.
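If you still need to install or update the plugin, the standard Salesforce CLI plugin commands should do the trick; a minimal sketch:
sfdx plugins:install sfdx-mohanc-plugins
# or, to update plugins you already have installed:
sfdx plugins:update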
The dataflow jobs analyze command
The main command for this blog is the dataflow jobs analyze command. Let’s have a look at the options for the command by using the following:
sfdx mohanc:ea:dataflow:jobs:analyze -h
Let’s have a closer look at the options for this command.
Username
Use the -u option to specify a username to use for your command.
--The option
sfdx mohanc:ea:dataflow:jobs:analyze -u <insert username>
--Example
sfdx mohanc:ea:dataflow:jobs:analyze -u rikke@demo.org
Dataflow job id
Use the -j option to specify a dataflow job id to use for your command.
--The option
sfdx mohanc:ea:dataflow:jobs:analyze -u <insert username> -j <insert dataflow job id>
--Example
sfdx mohanc:ea:dataflow:jobs:analyze -u rikke@demo.org -j 03CB000000383oAMAQ
Dataflow id
Use the -d option to specify a dataflow id to use for your command.
--The option
sfdx mohanc:ea:dataflow:jobs:analyze -u <insert username> -j <insert dataflow job id> -d <insert dataflow id>
--Example
sfdx mohanc:ea:dataflow:jobs:analyze -u rikke@demo.org -j 03CB000000383oAMAQ -d 02K3h000000MtyuEAC
The dataflow jobs list command
To use the dataflow jobs analyze command we need a dataflow job id, which we can get by using the dataflow jobs list command. To see the options for this command enter the following:
sfdx mohanc:ea:dataflow:jobs:list -h
Let’s have a closer look at the option for this command.
Username
Use the -u option to specify a username to use for your command.
--The option
sfdx mohanc:ea:dataflow:jobs:list -u <insert username>
--Example
sfdx mohanc:ea:dataflow:jobs:list -u rikke@demo.org
The dataflow list command
To use the dataflow jobs analyze command we also need a dataflow id, which we can get by using the dataflow list command. To see the options for this command enter the following:
sfdx mohanc:ea:dataflow:list -h
Let’s have a closer look at the option for this command.
Username
Use the -u option to specify a username to use for your command.
--The option
sfdx mohanc:ea:dataflow:list -u <insert username>
--Example
sfdx mohanc:ea:dataflow:list -u rikke@demo.org
Analyze the dataflow job
Having looked at the dataflow jobs analyze command and the additional commands we need to get the dataflow id and dataflow job id, let’s walk through the steps to get a deeper look at how a given dataflow performs.
Note: Before using these commands you will have to log in to the desired org by using the command sfdx force:auth:web:login, which will launch the login window in a browser.
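For example, to log in and give the org an alias at the same time (a minimal sketch; myDemoOrg is just a placeholder):
sfdx force:auth:web:login -a myDemoOrg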
Step 1 – use the dataflow:jobs:list command to extract the list of jobs run in the org.
sfdx mohanc:ea:dataflow:jobs:list
Step 2 – define the username for the target org by adding the -u option.
sfdx mohanc:ea:dataflow:jobs:list -u rikke@discovery.gs0
Step 3 – press enter.
Step 4 – find the dataflow run you want to analyze and copy the id. I am saving the id in a text editor. Note that the list includes both dataflow and data sync runs, so it may be long. Essentially this list is identical to what you see in the Data Monitor in the Data Manager.
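If the list is long, standard shell tools can help narrow it down. A minimal sketch, assuming the job name appears in the command’s text output ("Sales App Dataflow" is just a placeholder name):
sfdx mohanc:ea:dataflow:jobs:list -u rikke@discovery.gs0 | grep "Sales App Dataflow"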
Step 5 – use the dataflow:list command to extract the list of dataflows in the org.
sfdx mohanc:ea:dataflow:list
Step 6 – define the username for the target org by adding the -u option.
sfdx mohanc:ea:dataflow:list -u rikke@discovery.gs0
Step 7 – press enter.
Step 8 – find the dataflow in question, copy the id and save it just as you did for the dataflow job id.
Step 9 – use the dataflow:jobs:analyze command to analyze a specific job and dataflow.
sfdx mohanc:ea:dataflow:jobs:analyze
Step 10 – define the username for the target org by adding the -u option.
sfdx mohanc:ea:dataflow:jobs:analyze -u rikke@discovery.gs0
Step 11 – define the dataflow job id from previously using the -j option.
sfdx mohanc:ea:dataflow:jobs:analyze -u rikke@discovery.gs0 -j 03CB000000383oAMAQ
Step 12 – define the dataflow id from previously using the -d option.
sfdx mohanc:ea:dataflow:jobs:analyze -u rikke@discovery.gs0 -j 03CB000000383oAMAQ -d 02KB0000000BRisMAG
Step 13 – press enter.
Once the command is done you will see three files being generated:
- A JSON file with the dataflow id as the name,
- A CSV file with the dataflow id as the name – this is the same file you get when using the dataflow:fieldUsage command, which you can read more about in this blog (see the sketch after this list),
- An SVG file with the dataflow job id as the name.
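The CSV gives you a quick, scriptable way to spot unused fields. A minimal sketch, assuming the field name sits in the first column and the usage count in the last – check the header row of your own file first, as the layout may differ:
awk -F',' 'NR > 1 && $NF == 0 {print $1}' 02KB0000000BRisMAG.csv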
Step 14 – locate the SVG file generated from the command on your computer and open it in your browser.
The SVG file will show:
- each node from the dataflow,
- the action type for each node,
- the parameters for each node,
- the duration it took to run each node,
- the input and output rows for each node,
- for digest nodes, the object used,
- for computeExpression nodes, the SAQL expression,
- for register nodes, the fields and their usage count across lenses and dashboards – highlighting the fields that are not used.
Below you can see some examples of how nodes are represented.
You can use the SVG file to analyze how each node is performing, as well as to see where fields might be removed because they are not being used. Do remember that date components are automatically created and cannot be removed unless the date field isn’t used at all. Finally, the visual representation with timings can aid you in investigating which nodes to focus on when trying to optimize performance. For more details on dataflow optimization, please check out the Learning Days recorded webinar on the subject.
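If you run this analysis regularly, the whole flow can be wrapped in a small shell script. This is just a sketch – the username, job id, and dataflow id below are placeholders you would replace with values looked up via the list commands:
#!/bin/bash
# Placeholders – replace with your own values
ORG_USER="rikke@demo.org"
JOB_ID="03CB000000383oAMAQ"
DATAFLOW_ID="02KB0000000BRisMAG"

# List jobs and dataflows to look up the ids
sfdx mohanc:ea:dataflow:jobs:list -u "$ORG_USER"
sfdx mohanc:ea:dataflow:list -u "$ORG_USER"

# Analyze the chosen job and dataflow
sfdx mohanc:ea:dataflow:jobs:analyze -u "$ORG_USER" -j "$JOB_ID" -d "$DATAFLOW_ID"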