This article is part of the Advanced Analysis series of data visualization tutorials.
In this tutorial, we will test Benford's Law on sample sales data. This tutorial uses the video demonstration of Benford's Law found here: Benford's Law.
Benford's Law, also known as the First-Digit Law, describes the frequency distribution of leading digits in many real-life sources of data, including sales data. In this distribution, 1 occurs as the leading digit about 30% of the time, 2 about 17.6% of the time, and 3 about 12.5% of the time, while larger digits occur in that position less frequently: 9 is the first digit less than 5% of the time [Wikipedia]. The frequencies for digits 1-9 follow a logarithmic pattern. Benford's Law can be used to detect accounting fraud, among various other forgeries.
We will use a bar chart to visualize the distribution of leading digits in our sales data. The packaged Tableau workbook for this analysis is linked below.
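The percentages quoted above come from the standard statement of Benford's Law, in which the expected share of leading digit d is log10(1 + 1/d). The following Python sketch is not part of the Tableau workbook; the function name benford_expected is chosen here purely for illustration. It prints the expected share for each digit so you can compare it against the bar chart built later.

```python
import math

def benford_expected(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Expected share for every possible leading digit
expected = {d: benford_expected(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d}: {p:.1%}")
```

The nine shares sum to 1, since the product of (1 + 1/d) for d from 1 to 9 is exactly 10.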
Download the packaged Tableau workbook "Benford's Law.twbx" and open the file. The file is available here: Benford's Law.twbx
Build the first view
Now that you have your data source set up, begin building the view.
Step 1
Create a calculated field to extract the leftmost digit of the Sales field.
Right-click Sales from the Dimensions pane, select Create Calculated Field...
In the "Name" textbox, type Leftmost Sales Value
In the "Formula" textbox, type LEFT(STR([Sales]), 1)
Click OK.
Step 2
From the Dimensions pane, drag Leftmost Sales Value to the Columns shelf.
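To see what the calculated field does outside Tableau, here is a rough Python analogue of LEFT(STR([Sales]), 1). Both function names are hypothetical, and Python's str() may not format numbers exactly the way Tableau's STR() does, so treat this as a sketch only. It also shows one caveat: taking the first character returns "0" for amounts below 1 and "-" for negative amounts. Whether the sample workbook contains such values is not stated; the second function shows one possible way to handle them.

```python
from typing import Optional

def leftmost_character(value: float) -> str:
    """Rough analogue of LEFT(STR([Sales]), 1): first character of the text form."""
    return str(value)[0]

def leading_digit(value: float) -> Optional[int]:
    """First nonzero digit of `value`, or None if there is none (e.g. for 0)."""
    # abs() drops the sign; scanning skips leading zeros and the decimal point
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    return None

for amount in (261.49, 0.85, -12.5):
    print(amount, leftmost_character(amount), leading_digit(amount))
```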
Step 3
From the Measures pane, drag Number of Records to the Rows shelf.
Step 4
Right-click the Number of Records pill in the Rows shelf and select Quick Table Calculation > Percent of Total
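The Percent of Total quick table calculation divides each bar's record count by the total record count. A minimal Python sketch of the same arithmetic follows. The sales amounts are made up for illustration and are not taken from the workbook, and leading_digit_shares is a hypothetical helper name.

```python
from collections import Counter

def leading_digit_shares(values):
    """Share of records per leading character (count divided by total count)."""
    counts = Counter(str(v)[0] for v in values)
    total = sum(counts.values())
    return {digit: count / total for digit, count in sorted(counts.items())}

# Hypothetical sales amounts, for illustration only
sample_sales = [261.96, 731.94, 14.62, 957.58, 22.37,
                48.86, 7.28, 907.15, 18.50, 114.90]

shares = leading_digit_shares(sample_sales)
for digit, share in shares.items():
    print(f"{digit}: {share:.0%}")
```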
Step 5
Click the "Abc" toolbar icon to show mark labels in the view.
The final view looks like this.
As we can see, the frequency of leading digits in our sales data falls off in a logarithmic pattern from 1 to 9, which is consistent with Benford's Law. Although the data set is fictitious and was probably not built with Benford's Law in mind, it still follows this pattern, which makes the result all the more interesting.
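A visual check can be backed up with a simple number. The sketch below reports the largest absolute gap between observed shares and the Benford shares log10(1 + 1/d). The observed values are placeholders rather than the workbook's actual results, the function name is hypothetical, and this gap is only an informal measure, not a formal statistical test.

```python
import math

def max_deviation_from_benford(observed):
    """Largest absolute gap between observed shares and log10(1 + 1/d)."""
    return max(
        abs(observed.get(d, 0.0) - math.log10(1 + 1 / d))
        for d in range(1, 10)
    )

# Placeholder shares keyed by leading digit (not read from the workbook)
observed = {1: 0.31, 2: 0.17, 3: 0.13, 4: 0.10, 5: 0.08,
            6: 0.06, 7: 0.06, 8: 0.05, 9: 0.04}

print(f"largest gap: {max_deviation_from_benford(observed):.3f}")
```

To use it with real data, replace the placeholder dictionary with the percentages shown on the mark labels of the final view.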