Export your Einstein Analytics datasets
Have you ever been in a situation where you need to download a dataset from Einstein Analytics? Maybe to use it in another system, or simply to have a backup of your data? It is actually possible, and Mohan Chinnappan has made it even easier with his Dataset Export Utils. This blog will cover how to use this tool; however, it assumes you have experience with the Salesforce DX CLI. If not, check out this trail from Trailhead.
Note: You need to have installed the Salesforce Command Line Interface (CLI). Check out this Trailhead module to install the CLI.
Creating a dataflow
Before we get to the actual export of data, we need a dataflow. This dataflow can be complex or simple, as long as it produces the data you want. In my example I will keep it simple and use just two nodes, or data transformations: an edgemart to bring in the dataset and an export to export it.
Note: You will only have the export node if you have Einstein Discovery (the Einstein Analytics Plus license).
Head to your data manager to set up your dataflow. I have created a brand new dataflow, but you can easily do this in existing dataflows as well. If you are using this in a production environment rather than a demo org, make sure it works with your general data orchestration.
As mentioned, my dataflow is simple. First I add an edgemart node, or transformation, to bring in an existing dataset of opportunities; all I need to do is give the node a name and pick the dataset I want to use. Next, I add the export node to be able to export my data as a CSV file. Here I give the node a name, choose the edgemart as my source, and define a user to perform the action. You can leave the target as is, “Einstein Discovery”. See the steps I took below.
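For reference, a minimal dataflow definition with these two nodes might look roughly like the JSON below. This is a sketch only: the node names, dataset alias, and user id are placeholders, and the exact keys are whatever the dataflow editor generates for you when you configure the nodes in the UI.

```json
{
  "Load_Opportunities": {
    "action": "edgemart",
    "parameters": { "alias": "Opportunities" }
  },
  "export_Opportunities": {
    "action": "export",
    "parameters": {
      "source": "Load_Opportunities",
      "userId": "005B00000012345",
      "target": "EinsteinDiscovery"
    }
  }
}
```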
Note: The user chosen in the export node must have the Einstein Analytics Admin permission set.
You may wonder why the target is Einstein Discovery. Well, this node is a bit of a legacy thing. Previously, Einstein Discovery was not part of Analytics Studio, and users had to go elsewhere in the platform to make their predictions. The data manager, however, was a powerful tool for shaping data before making predictions, so the export node let you shape your data and then make it available to Einstein Discovery by exporting a CSV file to Salesforce Core, where it could be picked up. While Einstein Discovery has retired this mechanism, we can now use it for downloading full datasets.
Where is your data now?
When your dataflow runs, the data is exported to an sObject and stored for 48 hours, so you need to grab it before then. Your dataset is split into several parts if it exceeds 32 MB, and you need to grab all parts to get all your data. Let’s have a quick look at how this looks in Workbench.
In order to download the dataset, we need to know the id of the export. To find it, you can run a SOQL query against the DatasetExport object. The main fields to include in your query are Id and PublisherInfo, so you know what to extract and which row is the relevant one. See the query and steps below.
SELECT Id,PublisherInfo FROM DatasetExport
Remember that the file can be split into several parts. To get the actual data, we need to query a different object, DatasetExportPart, applying the Id from the query we just ran as a filter. You will end up with a query similar to this:
SELECT DatasetExportId,Id,Owner,PartNumber FROM DatasetExportPart WHERE DatasetExportId = '0PxB0000000TOnXKAW'
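If you would rather script these lookups than click through Workbench, the same SOQL can be sent to the standard REST query endpoint (on a real request you would also pass your session id in an Authorization: Bearer header). A minimal sketch of building the request URL; the instance URL here is a placeholder:

```python
from urllib.parse import quote

API_VERSION = "v48.0"  # matches the version used in REST Explorer below

def soql_query_url(instance_url, soql):
    """Build the standard REST query endpoint URL for a SOQL query."""
    return f"{instance_url}/services/data/{API_VERSION}/query/?q={quote(soql)}"

# The query from this section, against a placeholder instance:
print(soql_query_url("https://yourInstance.salesforce.com",
                     "SELECT Id,PublisherInfo FROM DatasetExport"))
```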
With the part id(s) noted down, we can now use the REST Explorer and a GET request to grab the data. The path to use is:
/services/data/v48.0/sobjects/DatasetExportPart/<InsertPartId>/DataFile
You need to replace <InsertPartId> with the id we just found by querying the DatasetExportPart object. As my id was “0PyB0000000TPsdKAG”, it looks like this for me:
/services/data/v48.0/sobjects/DatasetExportPart/0PyB0000000TPsdKAG/DataFile
Clicking “Execute” returns the data, which you can then copy.
Note: If you have multiple parts you would need to repeat this step for each part and append the data afterward.
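If you script the download of multiple parts, note that only part 1 carries the column headers, so the parts must be stitched together in PartNumber order. A small sketch of that stitching step; the part contents below are stand-ins for the DataFile payloads you would fetch:

```python
def join_parts(parts):
    """Concatenate CSV parts in PartNumber order.

    `parts` is a list of (part_number, csv_text) tuples, as you would
    assemble from querying DatasetExportPart and fetching each DataFile.
    Only part 1 contains the header row, so ordering matters.
    """
    ordered = sorted(parts, key=lambda p: p[0])
    return "".join(text for _, text in ordered)

# Parts may come back from the API in any order:
parts = [
    (2, "0065x000003ABCD,Closed Won\n"),
    (1, "OpportunityId,StageName\n0065x000003WXYZ,Prospecting\n"),
]
print(join_parts(parts))  # header row first, then rows in part order
```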
As you can see, this is a very manual process, so let’s look at the Dataset Export Utils mentioned in the introduction of this blog.
Installing the plugin
Before installing the plugin, you can find all the details about it here, including a list of all the commands it offers.
Note: You need to have node.js installed to leverage this plugin – download it from https://nodejs.org/en/download/.
In your command window (I am using Mac’s Terminal where I have already authenticated the org I will be using) enter the following command:
sfdx plugins:install sfdx-mohanc-plugins
When prompted to confirm the installation, simply enter y and the installation kicks off. Afterward, I run a command to confirm the installation has completed successfully.
--To see if the new plugin is installed successfully
sfdx plugins
Okay, with the plugin installed, what commands can you use? There are two commands to highlight: exportList and export. The easiest way to understand them is via the help option, which we will look at in the following sections.
exportList command
This command is useful for seeing all the export ids available to use when exporting your dataset. To see the exportList options enter the following in the command window:
sfdx mohanc:ea:dataset:exportList -h
This will give a list of options available for the exportList command as seen in the image below.
Username
Use the -u option to specify a username to use for your command.
--The option
sfdx mohanc:ea:dataset:exportList -u <insert username>

--Example
sfdx mohanc:ea:dataset:exportList -u rikke@demo.org
export command
This command is what you will use to grab the files we exported in the dataflow. To see the export options enter the following in the command window:
sfdx mohanc:ea:dataset:export -h
This will give a list of options available for the export command as seen in the image below.
Let’s try to put these options to use by looking at the most common options for exporting your datasets.
Username
Use the -u option to specify a username to use for your command.
--The option
sfdx mohanc:ea:dataset:export -u <insert username>

--Example
sfdx mohanc:ea:dataset:export -u rikke@demo.org
Export id
Use the -e option to specify the export id to grab. This refers to the DatasetExport object and the id we previously queried in Workbench. Note that you can leave the -e option out, in which case the latest export is taken instead of a specific id.
--The option
sfdx mohanc:ea:dataset:export -u <insert username> -e <insert DatasetExportId>

--Example
sfdx mohanc:ea:dataset:export -u rikke@demo.org -e 0PxB0000000TOnXKAW
It is also possible to define the file path, name, and extension, which I will show in the demo below.
Note: Before using the plugin, make sure to authenticate to the org you want to use by running the command sfdx force:auth:web:login, which will open up your browser and prompt you to log in.
Viewing exports available
Before we can export our data, we may need to find the relevant export id (DatasetExport id) to use, especially if you have multiple export nodes across your dataflows. Of course, if you are not interested in using the -e option in the export command, you can skip this part. Regardless, using the exportList command we can easily find the DatasetExport id, owner id, and export node name. Let’s have a look at the steps to take.
Step 1 – use the exportList command from the plugin
sfdx mohanc:ea:dataset:exportList
Step 2 – define the username to use by adding the -u option
sfdx mohanc:ea:dataset:exportList -u rikke@demo.org
As you can see from the image above, the command returns a list of the DatasetExport ids available, along with the owner id that was defined in the user parameter of the export node and the name of the export node. Hence the first id is the DatasetExport id, the second is the owner id, and the third value is the node name.
Looking at the result above, I am interested in the dataset coming from the node export_Opportunities. All we need from the string is the first id, which is highlighted below; the rest are merely attributes to identify the dataset export.
0PxB0000000TOnXKAW,03CB0000002rbwTMAQ:export_Opportunities
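If you are scripting around the plugin, each exportList line can be split into its three pieces. A quick sketch, assuming the output keeps the `<exportId>,<ownerId>:<nodeName>` shape shown above:

```python
def parse_export_line(line):
    """Split an exportList line of the form '<exportId>,<ownerId>:<nodeName>'."""
    ids, node_name = line.split(":", 1)
    export_id, owner_id = ids.split(",", 1)
    return export_id, owner_id, node_name

line = "0PxB0000000TOnXKAW,03CB0000002rbwTMAQ:export_Opportunities"
export_id, owner_id, node = parse_export_line(line)
print(export_id)  # the DatasetExport id to pass to -e
```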
Exporting your dataset with the Dataset Export Utils
Having found the dataset export we are interested in, let’s look at how we export the dataset we created with the dataflow.
Taking the options from before into consideration let’s construct the command we want to use.
Step 1 – use the export command from the plugin
sfdx mohanc:ea:dataset:export
Step 2 – define the username to use by adding the -u option
sfdx mohanc:ea:dataset:export -u rikke@demo.org
Step 3 – as I had multiple exports in my org, I want to specify the DatasetExport id by adding the -e option. But remember, you can leave this out and just get the latest exported dataset.
sfdx mohanc:ea:dataset:export -u rikke@demo.org -e 0PxB0000000TOnXKAW
Step 4 – I could technically use the above command; however, that would result in my dataset being printed in the command window, and I would much rather have a CSV file. Hence I am going to add a path, including the name and extension of my file.
sfdx mohanc:ea:dataset:export -u rikke@demo.org -e 0PxB0000000TOnXKAW > Downloads/Blog/Blog-Opportunities.csv
And that’s it, that is how you can export your datasets from Einstein Analytics. It is worth mentioning that this plugin doesn’t have a file size limit, as all parts of the export are automatically downloaded and joined together.
If you want to import the dataset to another org, check out the blog on the load command.
Hi Rikke, this is really helpful! I have found a bug with the plugin and am not sure where to report it, so hopefully you can pass this on if this is not the right place. When I use the sfdx mohanc:ea:dataset:export command, the files are joined together, but not in order of their PartNumber. DatasetExportPart 1 always has the column headers, so if it is not the first part in the combined file, the data gets out of order and doesn’t fall under the right headings. Appreciate it if you can help pass this on to Mohan. Thanks!
Thanks for letting me know. Try updating the plugin; there should be ordering built into the latest version.
Thanks! I updated the plugin and it looks like that solved the issue.
That’s great!! Thanks for letting us know.
Glad to have found this place, as I am wrestling with Salesforce (data querying).
But for point of reference, if I wanted to export (less than a million rows) from Oracle, I would write a simple query in PL/SQL Developer, then export to Excel. Boom. Done.
William, your point of reference does not really apply. TCRM datasets are not a mutable transactional data store; they are an end-user data index built for aggregate BI queries at scale and performance in the cloud, so the comparison is not an appropriate one. You could not scale thousands of users querying billions of rows of records in Excel, which is what our datasets are designed for. In general, we want people to use the data in TCRM rather than export it to external systems that lack its security and freshness.
Hi Rikke,
Is there any way to automate these steps?
Hi Rikki,
Is there a way to get the dataset row count history (similar to restored data) in Workbench?
Hi Rikke,
Thank you for the blog. I have a question: does the 32 MB limit apply to data we are exporting from a report (Values Table in Tableau CRM)? I have read that only 10k rows can be downloaded as a CSV or Excel file. Can we increase that somehow? What is the maximum downloadable size for a report in EA?
Best Regards,
Pranit.
Hi Rikke,
thanks for the post! Unfortunately I’m facing an error while exporting my dataset (30 million rows):
“The “chunk” argument must be of type string or an instance of Buffer or Uint8Array. Received an instance of Object”.
May you please help?
Thanks a lot!
Nevermind, resolved: I unchecked the “Specify CSV partition settings” in the Recipe
Hi, I am getting the same issue. I have unchecked the “Specify CSV partition settings”, and I am trying to export 27 million rows.
Thanks!
You saved my day, Rikke!
Anyway, do you know how to delete DatasetExport?
Thanks!
One quick question, when I’m pulling data out of Tableau CRM I’m noticing that the data is a few days older than the dataflow we created above. Do you have any idea why that is?
Hi, is there any way to schedule these steps? Thanks for your help!
Looks like even the Salesforce help documentation found this blog entry useful:
https://help.salesforce.com/s/articleView?id=sf.bi_integrate_recipe_output_results_csv.htm&type=5
Is there a way to automate this?
Thank you so very much. This is the first tutorial that has worked for us. I love that this process gives you a .csv with all of the rows instead of separate, chunked up ones. Thank you again!
Check out my new free Recipe Output for CRM Analytics app, a lot of questions and issues mentioned in the comments have been addressed in my app, enjoy it!
https://appexchange.salesforce.com/listingDetail?listingId=a0N4V00000Hs4jqUAB
Click here for the Demo Video
https://youtu.be/Wmo43ZONLaw
Click here for the Configuration Guide – Exporting CRM analytics Recipe Output CSV Files Made Easy
https://www.linkedin.com/pulse/exporting-salesforce-crm-analytics-recipe-output-csv-files-paul-liu
#Tableau CRM #Einstein Analytics
Hi,
Is there any plugin/command available to find field usage from a recipe now?
Any idea how to change the location of the output JSON and .csv files? The default location appears to be the windows\system32 folder, which I can’t write to, so the output files fail to be written.