Even to an experienced SQL query writer,
the mechanics of blending data, rather than joining it, can be difficult to
understand upon initial exposure. Data blending is a fairly advanced concept;
this article explains the concept of data blending at a basic level using Excel
as an example.
Data blending in Tableau is the operation
of combining multiple data sources into the same view by finding common fields
between them to join on.
How is this different from a regular join?
Let’s first describe the equivalent of a
data blend using an example with Excel spreadsheets.
Suppose you have two different Excel
spreadsheets, the first sheet contains data on sales, the second sheet contains
quota for sales. If you wanted to do a side by side comparison of sales vs. sales
quota by country, you would likely first aggregate the sales data to be summed
by country, then perform a vlookup to the sales quota sheet on each country to
grab the quota for that country and place the result next to the sales for that
country in the aggregated table.
This concept is referred to as a
post-aggregate join, as the vlookup to the sales quota spreadsheet did not
occur until after the sales data was already aggregated by country. Therefore,
the “join” to the second data source was done post-aggregation. This process of
joining data sources post-aggregation is referred to as data blending in
Tableau.
Had we performed a vlookup on the sales
data before aggregating by country, we would likely have multiple sales records
for each country with the quota next to each record. In order to get the
desired comparison, we would have to aggregate the data by summing the sales
column by country, and grabbing the max or min sales quota by country. This
process is understandably more work, as two different aggregations have to
occur, and much more data ends up being processed at once because the vlookup occurred
before the aggregation.
Creating this in Tableau would mean having
your sales data in one data source, sales quota data in another data source,
creating an aggregated view of the sum of sales by country, and simply dragging
quota onto the visualization next to the sales column. Tableau will automatically
recognize the common field of Country between the two data sources, and use
that to do a post-aggregate join. Tableau does not join the two data sources
until after the data is aggregated. Given that the desired visualization
aggregates the primary data source, the post-aggregate join is much better for
performance than joining at the row-level first and then performing the
aggregate calculations.
Good write-up! The last paragraph really helps explain what blending is doing.
ReplyDeleteHappy to hear it was helpful!
Delete