SAQL simply explained – Part 7
In this part of the SAQL series, we are going to look at the rollup modifier, which is used to create total and subtotals in your query. Now as with the previous parts of this series, total and subtotals can also be achieved in the UI by using compare tables. Hence just because you need to see total and subtotals is not a good enough reason to switch to SAQL, however, this example will remain simple to explain the concepts, so they can be used as building blocks for your own use cases where SAQL is needed.
For the walkthrough of the rollup function, I will be using the following SAQL query with one measure Amount, and two groupings Industry and Account Type from the Opportunity dataset you get when you install the Sales Analytics templated app.
1 q = load "opportunity"; 2 q = group q by ('Account.Industry', 'Account.Type'); 3 q = foreach q generate 'Account.Industry' as 'Account.Industry', 'Account.Type' as 'Account.Type', sum('Amount') as 'sum_Amount'; 4 q = order q by ('Account.Industry' asc, 'Account.Type' asc); 5 q = limit q 2000;
Using rollup()
When you want to use total and subtotals in a query we need to define it for our grouping by using the rollup() modifier and optionally identify the total and subtotals using the grouping() function. Let’s first have a look at the grouping.
In line 2 from the above query we group by two dimensions, all we have to do is add the rollup modifier before the dimensions. This will tell the query engine to calculate the total for the first grouping (Industry) and subtotals for second grouping (Account Type). Our grouping will look like this:
q = group q by rollup('Account.Industry', 'Account.Type');
If you try to run your query now you will notice that you will get total and subtotal in your table below the query. The only problem is the values are not identified as total and subtotal.
Using grouping()
When we are using the rollup() function the query engine is aggregating the records by the values in a dimension but also adding those together to create a total. Let’s look at an example; consider Industry from our query above. All the records in the opportunity dataset aggregated by the industry resulting in 1,000,000 for the industry Insurance, 150,000 for Communications, etc. At the same time, it also aggregates all those industries resulting in 1,150,000 as a total. The query engine looks at the aggregates and categorizes the individual industries and the total with a boolean (true/false) if it’s a total. If the value is an individual industry like “Insurance” then 0 (false) is returned, if it’s the total value then 1 (true) is returned. In order to get the engine to return the 0 or 1, we use the grouping() function in combination with the dimension we are interested in.
grouping('Account.Industry')
You might now be thinking, why is this relevant? Well, we can use this information to give our totals labels if we take our learnings around bucketing from part 6 of this blog series. Expanding on the statement above we are using a case statement to say if it’s a total call it total but if it’s not a total just take the value from the dimension like Insurance and Communications. The statement would look like this:
case when grouping('Account.Industry') == 1 then "Total" else 'Account.Industry' end
Since we are using this in the foreach statement we need to project the value as an alias and the easiest is just to call it Account.Industry, that way we maintain the formatting from the XMD, which controls labels, colors, etc. Our statement will then look like this:
case when grouping('Account.Industry') == 1 then "Total" else 'Account.Industry' end as 'Account.Industry'
Since we have two groupings and we also want to include the totals for Account Type, we have to do the same exercise there. We do have to remember that this is the second grouping hence this would be considered a subtotal when we look at it in context with the industries. The statement would look like this:
case when grouping('Account.Type') == 1 then "Subtotal" else 'Account.Type' end as 'Account.Type'
Putting this together the foreach statement looks like this:
q = foreach q generate case when grouping('Account.Industry') == 1 then "Total" else 'Account.Industry' end as 'Account.Industry', case when grouping('Account.Type') == 1 then "Subtotal" else 'Account.Type' end as 'Account.Type', sum('Amount') as 'sum_Amount';
And if the whole query now looks like this:
1 q = load "opportunity"; 2 q = group q by rollup('Account.Industry', 'Account.Type'); 3 q = foreach q generate case when grouping('Account.Industry') == 1 then "Total" else 'Account.Industry' end as 'Account.Industry', case when grouping('Account.Type') == 1 then "Subtotal" else 'Account.Type' end as 'Account.Type', sum('Amount') as 'sum_Amount'; 4 q = order q by ('Account.Industry' asc, 'Account.Type' asc); 5 q = limit q 2000;
Note you can of course be more specific than “Total” and “Subtotal” and use something like “All Industries” and “All Account Types”. For additional information check out the documentation on rollup() and grouping().
Use it with unique
In addition, the rollup modifier can be used in connection with other aggregates like unique to count distinct numbers. For instance in the example below I’ve replaced sum(‘Amount’) with unique(‘Account.Id’) counting how many accounts there are for each Account Type and Industry.
1 q = load "opportunity"; 2 q = group q by rollup('Account.Industry', 'Account.Type'); 3 q = foreach q generate case when grouping('Account.Industry') == 1 then "Total" else 'Account.Industry' end as 'Account.Industry', case when grouping('Account.Type') == 1 then "Subtotal" else 'Account.Type' end as 'Account.Type', unique('AccountId') as 'unique_AccountId'; 4 q = order q by ('Account.Industry' asc, 'Account.Type' asc); 5 q = limit q 2000;
Just starting my EA journey and I wanted to greatly thank you for this and your other content. It has been instrumental in my understanding and jumping from DevOp/Architect to EA.
Thank you!!!
I appreciate that, thank you ? also I’m glad the blog is helping you get started.
Dear Rikke
I have 2 grouping dimensions and I use grouping and rollup as described in your post and in some other sources.
My question is: is there a way to only include the subtotals (i.e. the rollups of the second grouping dimension) but not the grand totals (i.e. the rollup of the first grouping dimension)?
Like if you have a dimension for Continent and another for Country and you are not interested in the Continent totals, only in the Country totals? is there a way to not show the Continent totals at all?
Thank you for your help and kind regards,
Peter
Hello Rikke,
Thanks for this great post. My groupings are Sub Region and Channel but my Sub Region data do not show up after I added grouping, only shown as Total.
But how do you do subtotal if you have multiple dimensions say more than 3 and you want to subtotal on a specific dimension?
can you please help for rollup on three fields. It is not total/subtotal as the last row when we use case for all the three fields. I have given alias using “All – “.
q = load “Sales_KPI_DS”;
–q = group q by (‘YEAR_ID_to_dimension’, ‘SALES_formula_bucket’, ‘PRODUCTLINE’);
q = group q by rollup(‘YEAR_ID_to_dimension’, ‘SALES_formula_bucket’, ‘PRODUCTLINE’);
–q = foreach q generate ‘YEAR_ID_to_dimension’ as ‘YEAR_ID_to_dimension’, ‘SALES_formula_bucket’ as ‘SALES_formula_bucket’, ‘PRODUCTLINE’ as ‘PRODUCTLINE’, sum(‘SALES_formula’) as ‘sum_SALES_formula’;
q = foreach q generate
(case
when grouping(‘YEAR_ID_to_dimension’)==1 then “All Year”
else ‘YEAR_ID_to_dimension’
end) as ‘YEAR_ID_to_dimension’,
(case
when grouping(‘SALES_formula_bucket’)==1 then “-”
else ‘SALES_formula_bucket’
end) as ‘SALES_formula_bucket’,
–(case
— when grouping(‘PRODUCTLINE’)==1 then “All Product-Line”
— else ‘PRODUCTLINE’
–end) as ‘PRODUCTLINE’,
‘PRODUCTLINE’ as ‘PRODUCTLINE’,
sum(‘SALES_formula’) as ‘sum_SALES_formula’;
q = order q by (‘YEAR_ID_to_dimension’ asc, ‘SALES_formula_bucket’ asc, ‘PRODUCTLINE’ asc);
q = limit q 2000;