Friday, January 16, 2015

Advanced Analysis – Benford's Law (Step by Step Tutorial)

This article is part of the Advanced Analysis series of data visualization tutorials.

In this tutorial, we will test Benford's Law on sample sales data. The tutorial is based on the video demonstration of Benford's Law found here: Benford's Law

Benford's Law, also known as the First-Digit Law, describes the frequency distribution of leading digits in many real-life sources of data, including sales data. Under this distribution, 1 is the leading digit about 30% of the time, 2 about 17.6% of the time, and 3 about 12.5% of the time, while larger digits appear in that position less often: 9 is the first digit less than 5% of the time [Wikipedia]. The frequencies for digits 1-9 follow a logarithmic pattern. Because fabricated figures often deviate from this distribution, Benford's Law can help detect accounting fraud and other kinds of forgery.
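The percentages quoted above come from Benford's formula P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9. A minimal Python sketch that computes the expected share for each digit (the function name `benford_expected` is our own, not part of any library):

```python
import math


def benford_expected() -> dict[int, float]:
    """Expected share of each leading digit 1-9 under Benford's Law."""
    # P(d) = log10(1 + 1/d); the nine terms telescope to log10(10) = 1
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    for digit, share in benford_expected().items():
        print(f"{digit}: {share:.1%}")
```

Run as a script, this prints one line per digit, from "1: 30.1%" down to "9: 4.6%", which matches the figures in the paragraph above.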

We will use a bar chart to visualize the distribution of leading digits in our sales data, working from the packaged Tableau workbook linked below.

Download the data
Download the packaged Tableau workbook "Benford's Law.twbx" and open the file. The file is available here: Benford's Law.twbx

Build the first view
Now that you have your data source set up, begin building the view.

Step 1.1
Create a calculated field to extract the leftmost digit for the Sales field. 
Right-click Sales in the Measures pane and select Create Calculated Field...




Step 1.2
In the "Name" textbox, type Leftmost Sales Value
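In the "Formula" textbox, type LEFT(STR([Sales]), 1)
This formula converts Sales to text and keeps its first character, so it assumes every Sales value is positive and at least 1: a negative value would return "-" and a value below 1 would return "0".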
Click OK.
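For readers who want to check the logic outside Tableau, here is a rough Python analogue of the Leftmost Sales Value formula, assuming Sales values arrive as plain floats. The second function, `first_significant_digit`, is a hypothetical helper of our own (not part of the original workbook) that skips the sign and any leading zeros. Python's `str` and Tableau's `STR` do not format numbers identically, but they should agree on the first character for ordinary values like these.

```python
from typing import Optional


def leftmost_char(sales: float) -> str:
    # Analogue of the Tableau formula LEFT(STR([Sales]), 1):
    # returns "-" for negative values and "0" for values below 1
    return str(sales)[:1]


def first_significant_digit(sales: float) -> Optional[int]:
    # Hypothetical helper: first non-zero digit of |sales|, or None for zero.
    # Also works for scientific notation such as "2.5e-07", since the
    # mantissa comes first in the string.
    for ch in str(abs(sales)):
        if ch in "123456789":
            return int(ch)
    return None


print(leftmost_char(261.96), leftmost_char(-15.2), leftmost_char(0.85))
print(first_significant_digit(-15.2), first_significant_digit(0.85))
```

The example values are illustrative only. The first line prints `2 - 0` and the second prints `1 8`.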




Step 2
From the Dimensions pane, drag Leftmost Sales Value to the Columns shelf.


Step 3
From the Measures pane, drag Number of Records to the Rows shelf.
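Steps 2 and 3 group the rows by leading digit and count them, since Number of Records contributes 1 for every row. A rough Python equivalent using `collections.Counter`; the function name and the list of sales values are made up for illustration:

```python
from collections import Counter


def count_by_leading_char(sales_values: list[float]) -> Counter:
    # One column per leading character; each row adds 1,
    # like SUM([Number of Records]) for that column
    return Counter(str(v)[:1] for v in sales_values)


sample_sales = [120.5, 18.25, 2.99, 310.0, 1045.75, 27.4, 9.5]  # illustrative values only
print(sorted(count_by_leading_char(sample_sales).items()))
# [('1', 3), ('2', 2), ('3', 1), ('9', 1)]
```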

Step 4
Right-click the Number of Records pill in the Rows shelf and select Quick Table Calculation > Percent of Total.
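With only Leftmost Sales Value on Columns, Percent of Total divides each column's count by the total count in the view, so the bars add up to 100%. The same arithmetic in Python, where `percent_of_total` is a hypothetical helper and the counts are illustrative:

```python
def percent_of_total(counts: dict[str, int]) -> dict[str, float]:
    # Share of each leading digit as a percentage of all records in the view
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing to divide; avoids ZeroDivisionError
    return {digit: 100 * n / total for digit, n in counts.items()}


shares = percent_of_total({"1": 3, "2": 2, "3": 1, "9": 1})
print({d: round(p, 1) for d, p in shares.items()})
# {'1': 42.9, '2': 28.6, '3': 14.3, '9': 14.3}
```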




Step 5
Click the "Abc" toolbar icon to show mark labels in the view.









The final view looks like this.
















As we can see, the frequency of leading digits in our sales data decreases from 1 to 9 in a roughly logarithmic pattern, which is consistent with Benford's Law for this sample data set. The data set is fictitious and was likely not modeled after Benford's Law, which makes it all the more interesting that it follows this pattern.
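Reading the chart by eye is a reasonable first check. If you export the count of records per leading digit from the view, a chi-square goodness-of-fit statistic is one common way to put a number on "follows the pattern"; this test is our addition, not part of the original tutorial. A minimal sketch, with made-up counts and a hypothetical function name:

```python
import math

# Critical value of the chi-square distribution with 8 degrees of freedom
# (9 digits - 1) at the 5% significance level
CHI2_CRIT_8DF_5PCT = 15.507


def benford_chi_square(counts: dict[int, int]) -> float:
    """Chi-square statistic comparing observed leading-digit counts with Benford's Law."""
    n = sum(counts.get(d, 0) for d in range(1, 10))
    if n == 0:
        raise ValueError("need at least one observation")
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        observed = counts.get(d, 0)
        stat += (observed - expected) ** 2 / expected
    return stat


# Illustrative counts only, not taken from the workbook
observed = {1: 301, 2: 176, 3: 125, 4: 97, 5: 79, 6: 67, 7: 58, 8: 51, 9: 46}
stat = benford_chi_square(observed)
print(f"chi-square = {stat:.3f}; below 5% critical value: {stat < CHI2_CRIT_8DF_5PCT}")
```

A statistic below the critical value means the counts are not significantly different from Benford's distribution at the 5% level. With very large data sets the test flags even small deviations, so treat it as a guide alongside the chart.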
