SAQL simply explained – Part 1
When we create dashboards in Einstein Analytics most of us rely on the UI. In the dashboard editor, we click “New Step” select our dataset, define the measure and a grouping whereafter we add it to the dashboard. In other words, with point and click, we can create a great dashboard that gives insight into the data. However, sometimes we cannot achieve what we want in the UI, at least not yet, and that’s when we can use SAQL to get the result we want. For Einstein Analytics beginners SAQL may seem foreign, but it’s simple enough and it can be learned. So let’s have a look at the anatomy of the SAQL query.
Facts about SAQL
First of all, SAQL stands for Salesforce Analytics Query Language, it is what the engine used at the runtime of the query – it is a query language based on Apache Pig. When we create a step in a dashboard by default we use compact format, but when the dashboard is being executed the compact format is converted into SAQL.
As mentioned above when we want to do more complex queries we need to switch to SAQL. There are many use cases for using SAQL for instance if you want to load multiple datasets or use specific parameters of the timeseries() statement.
One thing to remember is that once you switch to SAQL, you cannot go back. You should make sure the use case for using SAQL is valid, since after the switch you can no longer modify the query with point and click. The reason is that when we switch to SAQL we are using a different step type with different parameters in the dashboard JSON. Step types are just the type of query you are going to use. You won’t be able to see this change unless you go to the dashboard JSON, where you’ll see that the step type has changed from ‘aggregateflex’ to ‘saql’ and the query is no longer using compact format. Hence I would always recommend you try to complete your query in the UI including compare tables.
Anatomy of the SAQL query
To be honest, whenever I work with SAQL I try to do as much as possible in the UI and only when I need to I switch to SAQL. It actually has nothing to do with the switch of the step type, I do it because that way I have to do minimal manual editing, which means I am less likely to make simple mistakes. And it also forces you to see if it is possible to solve via the UI before leveraging SAQL, that way you make it easier to maintain down the road.
So let’s create a step. I’ve picked the opportunity dataset in an Analytics Trailhead org, I’ve grouped by stage and change the measure to sum of amount.
Let’s switch to SAQL by clicking the ‘SAQL mode’ in the top right corner and have a look at the SAQL query.
1 q = load "opportunities"; 2 q = group q by 'StageName'; 3 q = foreach q generate 'StageName' as 'StageName', sum('Amount') as 'sum_Amount'; 4 q = order q by 'StageName' asc; 5 q = limit q 2000;
The first time you look at the SAQL it can be a little hard to understand, but like anything, it gets easier with practice. When I created my step I first selected my dataset and in line 1 above you will see that my dataset is being loaded.
The next thing I did was add a grouping and a measure. In line 2 you can see that I have grouped by stage. However, my query is not projecting any grouping or measure unless I mention it in a ‘foreach’ statement, which is why stage and the sum of amount are mentioned in line 3.
In a query, we can also define how it’s being sorted by adding an ‘order’ statement as in line 4. Finally, we have applied a limit, which limits the result to only show the 2000 first groupings, which is what is being done in line 5. In theory, you can remove line 4 and 5 as they are not necessary for the query to run, but they are always added by default.
Let’s look a little deeper
Hopefully, going through the SAQL it doesn’t seem so scary anymore. But I am sure as you looked it over, you also noticed a certain pattern. A query is constructed of different statements, that each serves a purpose in the query. To simplify it each line in the query above is a statement.
A statement will have an input stream (some data to work with), potentially an operator (that looks at criteria for the input stream) and an output stream. The latter can be used later in the query. In the above example, we had five statements, that each has a keyword that determines what the statement does: ‘load’, ‘group’, ‘foreach’, ‘order’ and ‘limit’. Each statement is completed by adding a semicolon.
Each query will always start with a ‘load’ statement, and the data that is loaded will be used in the following statements. Each of these statements are given a name or unique identifier that can be used as a reference later in the query. In the above example, you can see that this name by default is ‘q’, however, it can be changed to anything you wish. As mentioned previously, a keyword defines what to do in a statement. Directly following the keyword an input stream needs to be defined by using the name or unique identifier from a previous statement. In the below illustration you can see the unique identifier, the keyword, and the input stream as well as how they follow each other.
By calling a new statement the same as the previous you overwrite the output. Hence once you overwrite the ‘q’ then information from the previous statement is lost. This becomes important to remember when you do something more complex.
Syntax worth knowing
There are some things to keep in mind when writing SAQL, things that will cause errors if they are not done as expected.
- Keywords (load, group, foreach, etc.) are written in lower case and they are case sensitive.
- When you are calling API names from the dataset they should be enclosed in single quotes like ‘StageName’. Also, notice that API names are case sensitive as well.
- Strings always have to be enclosed in double quotes like “opportunities”.
For more details on syntax feel free to refer to the official documentation.
That’s not all
As you probably have guessed this is not all there is the SAQL. We will continue uncovering different constructs of a SAQL query in the following parts of this blog series. In the next part we will look at groupings, measures and order in more details. It is, however, worth mentioning that the SAQL query that we have looked at in this blog, does not qualify as a use case for SAQL, meaning I would in this simple query not have switched from compact form to SAQL.
Nice walk through..
thanks
Thank you so much for this great tutorial!
Thanks for the blog… It took me a while to understand the concept ‘once you switch to SAQL, you cannot go back’.
Do you see anytime in the future that Salesforce will totally get rid of SAQL and switch to aggregateflex?