Export node details of a dataflow job
When you are in the Data Manager checking the data monitor, you get a list of all the jobs that have recently run in your Tableau CRM (Einstein Analytics) org. When looking at how the jobs are running – especially the dataflows – you can expand a job and see all the nodes that have been run. This is helpful for several reasons: first of all, if a dataflow failed you can quickly identify which node you need to correct, but it is also beneficial when you want to look at which nodes are taking a long time to run and identify opportunities for improvement.
Having this detail in the Data Monitor is great, but sometimes it’s nice to export these details for further analysis or maybe even a backup. In this blog, I will walk through how this is possible with Mohan Chinnappan‘s analytics plugin. Please check out this blog for details on how to install or update the plugin.
Note: this blog uses sfdx-mohanc-plugins version 0.0.122. To see the latest details around the command, check out GitHub.
The dataflow jobs timing command
The main command for this blog is the dataflow jobs timing command. Let’s have a look at the options for the command by using the following:
sfdx mohanc:ea:dataflow:jobs:timing -h
Let’s have a closer look at the options for this command.
Username
Use the -u option to specify a username to use for your command.
- The option: sfdx mohanc:ea:dataflow:jobs:timing -u <insert username>
- Example: sfdx mohanc:ea:dataflow:jobs:timing -u rikke@demo.org
Dataflow job id
Use the -j option to specify a dataflow job id to use for your command.
- The option: sfdx mohanc:ea:dataflow:jobs:timing -u <insert username> -j <insert dataflow job id>
- Example: sfdx mohanc:ea:dataflow:jobs:timing -u rikke@demo.org -j 03CB000000383oAMAQ
The dataflow jobs list command
To use the dataflow jobs timing command we need to have a dataflow job id, which we can get by using the dataflow jobs list command. To see the options for this command, enter the following:
sfdx mohanc:ea:dataflow:jobs:list -h
Let’s have a closer look at the option for this command.
Username
Use the -u option to specify a username to use for your command.
- The option: sfdx mohanc:ea:dataflow:jobs:list -u <insert username>
- Example: sfdx mohanc:ea:dataflow:jobs:list -u rikke@demo.org
Export dataflow job details
Having looked at the dataflow jobs timing command as well as the dataflow jobs list command to get the dataflow job id, let’s walk through the steps to take a deeper look at how a given dataflow performs.
Note: Before using these commands you will have to log in to the desired org by using the command sfdx force:auth:web:login, which will launch the login window in a browser.
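For example:
sfdx force:auth:web:login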
Step 1 – use the dataflow:jobs:list command to extract the list of jobs run in the org.
sfdx mohanc:ea:dataflow:jobs:list
Step 2 – define the username for the target org by adding the -u option.
sfdx mohanc:ea:dataflow:jobs:list -u rikke@discovery.gs0
Step 3 – press enter.
Step 4 – find the dataflow job you want to export the details from and copy the id; I am saving the id in a text editor. Note that both dataflow and data sync jobs appear in the list, so it may be a long list. Essentially this list is identical to what you see in the Data Monitor in the Data Manager.
Step 5 – use the dataflow:jobs:timing command to export the timing and node details from a dataflow job.
sfdx mohanc:ea:dataflow:jobs:timing
Step 6 – define the username for the target org by adding the -u option.
sfdx mohanc:ea:dataflow:jobs:timing -u rikke@discovery.gs0
Step 7 – define the dataflow job id from step 4 by adding the -j option.
sfdx mohanc:ea:dataflow:jobs:timing -u rikke@discovery.gs0 -j 03CB000000383oAMAQ
Step 8 – press enter.
Once the command is done, you will see three files being generated:
- A JSON file named with the dataflow job id followed by ‘timing’.
- A CSV file named with the dataflow job id followed by ‘timing’. This file is unformatted.
- A CSV file named ‘DFTiming’, which has been formatted to be uploaded to Tableau CRM to visualize the timing of the nodes.
The CSV files should open automatically on your computer, but if they don’t, locate them (check the exact naming in the command window), open them up, and you will see all the details for the job.
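If you want to run the whole export in one go, the steps above can also be combined into a small shell script. Below is a minimal sketch using the example username and job id from this blog – replace them with your own values, and note it assumes you have already logged in with sfdx force:auth:web:login:

#!/bin/sh
# List the recent dataflow jobs so you can pick the job id you need
sfdx mohanc:ea:dataflow:jobs:list -u rikke@discovery.gs0
# Export the timing and node details for the chosen job
sfdx mohanc:ea:dataflow:jobs:timing -u rikke@discovery.gs0 -j 03CB000000383oAMAQ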
Visualizing dataflow timings
You can take the file DFTiming.csv and upload it to Tableau CRM to visualize how each node is performing. You can upload it within the platform in Analytics Studio or the Data Manager, but you can also leverage the dataset load command from the plugin. For the latter, please refer to the blog Uploading datasets via CLI, which walks through all the steps.
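Note that the exact command path for the dataset load command may differ by plugin version; my assumption is that it follows the same namespace pattern as the commands above, so verify the options with the -h flag before using it:

sfdx mohanc:ea:dataset:load -h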
All of these recent blogs regarding Mohan’s additions to the tool have been incredibly helpful and incredibly insightful, thank you!!!
Hi Rikke,
is there any way to automate these steps?
Should be possible with code.
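For example, the two commands could be wrapped in a shell script like the sketch earlier in the blog and scheduled to run automatically. A hypothetical crontab entry – assuming sfdx is on the PATH, the org is already authorized, and /path/to/export-timing.sh is your script – could look like this:

# Run the export script every night at 2am
0 2 * * * /path/to/export-timing.sh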