SAQL simply explained – Part 6
If you have been following along in this blog series, I’m sure you are starting to see how powerful SAQL can be joining data as we see fit, but there is more to the story. You can create new derived fields that we already touched upon in part 5 of this blog series, but let’s have a look at simple calculations and bucketing. Before getting started it’s worth mentioning that the examples that we will be covering in this blog can also be achieved while creating the dataset or by a compare table. The bucketing example can be partly done in the compare table, but you will need SAQL to achieve the final grouping. With this in mind, the examples shown are not necessarily using best practice however the examples are relatively simple and serving as building blocks that can be combined as you see fit in a use case where SAQL is needed.
In the first part of this blog series, we covered the anatomy of a SAQL query and we discussed how groupings and measures need to be projected in the foreach statement. It’s in this same statement we can manipulate our data further and come up with new derived fields. For the examples, we will be using the below query as a starting point.
1 q = load "opportunity"; 2 q = group q by 'Account.Name'; 3 q = foreach q generate 'Account.Name' as 'Account.Name', sum('Amount') as 'sum_Amount'; 4 q = order q by 'sum_Amount' desc; 5 q = limit q 2000;
Calculating new measures
Let’s say for each account you want to take 3% of their opportunities as a fee, which is not calculated on the account, so you need to do it in your query. As mentioned we can manipulate our foreach statement a little bit and thereby perform the calculation we need.
First, how do we calculate 3%? Well, my math from primary school tells me we can take our Amount field and multiply with 0.03 to get the 3% fee. That means we are looking at something like below.
'Amount' * 0.03
While the above is correct, we cannot simply add that in our foreach statement as this is not an aggregated measure. By adding the aggregation function it becomes something we can work with.
sum('Amount') * 0.03
Remember that we need to give our columns an alias, so I want to call my new derived measure “Fee”:
sum('Amount') * 0.03 as 'Fee'
Now, all we have left to do is add it to the foreach statement. I’ve chosen to leave out the opportunity amount and just have the fee, but you could, of course, include both measures. Below you will see the end query.
1 q = load "opportunity"; 2 q = group q by 'Account.Name'; 3 q = foreach q generate 'Account.Name' as 'Account.Name', sum('Amount') * 0.03 as 'Fee'; 4 q = order q by 'Fee' desc; 5 q = limit q 2000;
Notice I didn’t only make a change in my foreach statement (line 3) I also had to modify my line 4 as I removed my Amount from the query, hence I changed ‘sum_Amount’ to ‘Fee’. If I hadn’t done this Einstein Analytics would complain that it cannot find the identifier, which makes sense as it no longer exist.
With those changes, you can switch to chart mode and use your new calculated measure.
Case statement aka bucketing
Looking at the starting query from the beginning of this blog the use case for the bucketing example is to create a new column in my query result that looks at the amount for each Account Name and evaluate if this is a highly valued customer or not. The criteria to be a highly valued customer is if the amount is greater than 2 million. Hence I need to evaluate the amount field, if it’s greater than 2,000,000 then return “true” in a new column or dimension called ‘isValuedCustomer’, if not then return “false”.
If you are familiar with formula fields in core Salesforce, you’ve probably had to write an if statement or two. In Einstein Analytics we do not operate with if statements, but something that in my view is easier to grasp and read a case statement. The syntax is pretty straightforward just remember it should all be in lower case unless it’s an API name and in one line:
case when [logical expression] then [output value] else [output value] end
The basic idea is that you have a logical expression such as amount is greater than 2,000,000, if that is true you return the value expressed after “then”, which in our example is “true”. If the logical expression is not met then return the output value expressed after “else”, which for us would be “false”. Note that you of course have 100% flexibility over what the output value is and you can for instance have it return “cat” and “dog” if that made any sense.
There are a few more interesting things to know about the case statement. For instance, you can add as many ‘when’ statements as you want:
case when [logical expression] then [output value] when [logical expression] then [output value] ... when [logical expression] then [output value] else [output value] end
And you can also leave out the else statement, which may result in null values if you haven’t considered all scenarios in your logical expressions:
case when [logical expression] then [output value] when [logical expression] then [output value] end
Going back to the use case and with the syntax, we just looked at my case statement would look something like this:
case when 'Amount' > 2000000 then "true" else "false" end
Note that the API name is in single quotes, the number stands alone and text strings are in double-quotes, this is to make sure that Einstein Analytics interprets the value correctly.
A useful tip for the logical expression, instead of remembering the syntax I will often open a new lens with the same dataset and simply apply a filter with the logical expression and switch to SAQL. That way I can copy-paste the expression into my case statement.
With the case statement at hand we want to add it into our query, which as mentioned should be part of the foreach statement. In our example, we only have one foreach statement, but technically there is no ‘Amount’ field. The foreach statement projects aggregated values based on the function you choose into a name you define, typically the function followed by the field name. Why is this important? Well, if we add the case statement as is we face two issues:
- it will tell us that we are trying to call a none aggregated measure, which we cannot do and
- we haven’t defined the alias of the case statement’s output.
Considering the above points, the first thing we want to do is modify our case statement slightly. We want to make sure the Amount reference is an aggregated measure by using sum(‘Amount’) and that the output has an alias (column name). As my use case stated I will call it ‘isValuedCustomer’ by adding as ‘isValuedCustomer’.
case when sum('Amount') > 2000000 then "true" else "false" end as 'isValuedCustomer'
Now where to add the case statement. Well, depending on the order you want the columns in, you can add it in before or after any of the existing fields just remember to add your comma separating the different columns. I’ve added mine at the end of the statement (line 3).
1 q = load "opportunity"; 2 q = group q by 'Account.Name'; 3 q = foreach q generate 'Account.Name' as 'Account.Name', sum('Amount') as 'sum_Amount', case when 'sum_Amount' > 2000000 then "true" else "false" end as 'isValuedCustomer'; 4 q = order q by 'sum_Amount' desc; 5 q = limit q 2000;
You can run the query now and see the result. If you try to switch from query mode to chart mode you will notice the new field we created is hidden. You’ll be able to view it in a compare table, but if you look closely the new field is added as a column and not a grouping. This also means we will not be able to show this as a grouped dimension in a chart.
We can, of course, fix that. All we have to do is add a new group statement that groups by both ‘Account.Name’ and ‘isValuedCustomer’, just as we covered in part 2 of this blog series. So with those learnings, I’ve added lines 4 and 5 in the query below.
1 q = load "opportunity"; 2 q = group q by 'Account.Name'; 3 q = foreach q generate 'Account.Name' as 'Account.Name', sum('Amount') as 'sum_Amount', case when 'sum_Amount' > 2000000 then "true" else "false" end as 'isValuedCustomer'; 4 q = group q by ('isValuedCustomer', 'Account.Name'); 5 q = foreach q generate 'isValuedCustomer', 'Account.Name', sum('sum_Amount') as 'sum_Amount'; 6 q = order q by 'sum_Amount' desc; 7 q = limit q 2000;
Now you can run the query and switch to a chart showing our new derived dimension as a grouping.
In this example, we generated a new dimension, but you can of course also generate new measures using the same principles.