Using graphics to communicate with data

This article decomposes a graphic that visually portrays the speed trap test, a real-time test that checks whether a pair of VPN connections from the same person is realistic in terms of when the connections were established and from which geographic locations they originated. Pairs of connections that fail the test are potential anomalies that demand further investigation.

The fundamental idea this graphic communicates is that a person failing the speed trap test could not possibly have traveled from one geographic location to the next in the time that elapsed between the connections established from those two locations. The graphic enhances an alerting system that is primarily used by a team of data scientists.
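As a rough illustration, the check behind the graphic can be sketched in Python. This is not the alerting system's actual implementation: the function names, the 900 km/h speed ceiling, and the tuple layout for a connection are all assumptions made for the sake of the example.

```python
import math
from datetime import datetime, timezone

EARTH_RADIUS_KM = 6371.0  # mean Earth radius
MAX_SPEED_KM_H = 900.0    # assumed ceiling, roughly a commercial jet's cruising speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def speed_trap(first, second, max_speed_km_h=MAX_SPEED_KM_H):
    """Compare the apparent distance with the distance reachable in the elapsed time.

    Each connection is a hypothetical (timestamp, latitude, longitude) tuple.
    """
    # Order the pair by timestamp so the elapsed time is never negative.
    (t1, lat1, lon1), (t2, lat2, lon2) = sorted([first, second], key=lambda c: c[0])
    hours = (t2 - t1).total_seconds() / 3600
    apparent_km = haversine_km(lat1, lon1, lat2, lon2)
    reachable_km = max_speed_km_h * hours
    return {
        "hours": hours,
        "apparent_km": apparent_km,
        "reachable_km": reachable_km,
        "passed": apparent_km <= reachable_km,
    }
```

With two connections five hours apart, one from hypothetical coordinates near Mumbai (19.08, 72.88) and one near New York (40.71, -74.01), the pair fails: the great-circle distance is roughly 12,500 km, while only 4,500 km is reachable at the assumed speed.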

We can show that with a line to denote time and an annotation to denote the maximum distance the person could have traveled.

Interpreting an arrow, a ubiquitous encoding, does not require much cognitive effort.

Time flows from left to right, and the vertical line shows why the test failed. That said, the graphic could benefit from spatial context: the line's length, direction, and angle can be drawn in accordance with reality.

The graphic's horizontal and vertical scales now map to longitude and latitude coordinates.
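One simple way to map coordinates to scales is a pair of linear scales, sketched below in Python. The article does not say which projection the graphic uses, so the equirectangular mapping, the canvas size, and the names make_linear_scale and project are assumptions.

```python
def make_linear_scale(domain, range_):
    """Return a function that maps values from domain linearly onto range_."""
    d0, d1 = domain
    r0, r1 = range_

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


WIDTH, HEIGHT = 960, 480  # hypothetical canvas size in pixels

# Longitude runs left to right across the canvas.
x = make_linear_scale((-180.0, 180.0), (0.0, WIDTH))
# Latitude is flipped because screen y grows downward while latitude grows northward.
y = make_linear_scale((90.0, -90.0), (0.0, HEIGHT))


def project(lat, lon):
    """Map a (latitude, longitude) pair in degrees to (x, y) pixel coordinates."""
    return x(lon), y(lat)
```

Under these assumptions, the point at latitude 0 and longitude 0 lands at the center of the canvas, and the north-west corner of the domain lands at the pixel origin.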

Any detail we introduce can demand more from the reader and must therefore serve a justifiable purpose, namely maximizing some fundamental quality. In this piece, that quality is readability.

Another detail our graphic could benefit from is a more realistic portrayal of the area from which the person could legitimately have established the second connection. We can show that by replacing the vertical line with a circle that extends outward from the first location.
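The outline of such a circle can be approximated by stepping around the first location at a fixed distance. The Python sketch below uses the standard destination-point formula on a sphere. The function names, the number of steps, and the spherical Earth model are assumptions, not details taken from the graphic itself.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def destination(lat, lon, bearing_deg, distance_km):
    """Point reached by traveling distance_km from (lat, lon) along bearing_deg."""
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    theta = math.radians(bearing_deg)
    phi1 = math.radians(lat)
    lam1 = math.radians(lon)
    sin_phi2 = (math.sin(phi1) * math.cos(delta)
                + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    # Clamp to guard against floating-point drift just outside [-1, 1].
    phi2 = math.asin(max(-1.0, min(1.0, sin_phi2)))
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2),
    )
    # Normalize longitude to the range [-180, 180).
    lon2 = (math.degrees(lam2) + 540) % 360 - 180
    return math.degrees(phi2), lon2


def reachable_circle(lat, lon, radius_km, steps=72):
    """Approximate the reachable circle as a list of (lat, lon) outline points."""
    return [destination(lat, lon, 360 * i / steps, radius_km) for i in range(steps)]
```

The resulting points can be passed through whatever coordinate-to-pixel mapping the graphic uses and joined into a closed path. The radius would be the maximum distance the person could have traveled in the elapsed time.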

Interaction is an excellent way to reveal detail. Had there been a need for the piece to, say, show how large the area covered by the circle is, revealing a line of text by way of mousing over the circle would have been a good choice.

In light of the graphic's objective, denoting direction seems unnecessary—that a person went from India to the United States isn't important; that the person traveled more than double the distance possible in five hours is.

To give a sense of direction, but not persist it, the graphic can instead transition the line on first load and denote both geographic locations using the same encoding—a circle.

The decision also leads, in my view, to an aesthetic improvement. You may decide otherwise if you find that the result is not an improvement for your own graphic.

We perceive the pairing of each text label with a dot because of the two elements' proximity to one another. See Gestalt principles for more.

So as to not overwhelm the reader, we can transition the circle after a brief delay. This tweak helps bolster the narrative that the person began in one location, then showed up in another, and this is how far the person could have traveled. The graphic need not show all the data, only the portion that the audience is likely to find useful.

To borrow a turn of phrase, the audience is, often, the message.

We have so far encoded elements in our graphics using position in two dimensions (x and y). To further maximize our fundamental quality, we can double-code the person's two states—distance-apparently-traveled and distance-could-have-traveled—using a third dimension (z) by way of a pair of contrasting colors.

A limited color palette helps mitigate accessibility issues caused by color blindness. For a palette that remains readable for people with color blindness, consider Viridis.

In order to reduce any potential ambiguity, we can annotate the graphic with prose that explains what each line segment denotes and how distance-could-have-traveled is determined.

The graphic's white space plays a crucial role in helping to highlight the graphic's non-white space. Both are important.

The values shown for time and distance are approximations. As with the rest of the graphic, we are trading precision for insight.
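A small helper can make the approximation explicit in the labels. This is a minimal Python sketch. The name approx_label and the rounding steps are hypothetical choices, not the graphic's actual formatting rules.

```python
def approx_label(value, step, unit):
    """Round value to the nearest multiple of an integer step and format it as a label."""
    # round() with a single argument returns an int, so the thousands separator applies.
    rounded = round(value / step) * step
    return f"about {rounded:,} {unit}"
```

For instance, a computed distance of 12541.3 km with a step of 500 would be labeled "about 12,500 km", and an elapsed time of 4.9 hours with a step of 1 would be labeled "about 5 hours".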

The graphic, simple though it is, aims to satisfy the objective of portraying why a VPN connection may be anomalous while maximizing the quality of readability. Every decision that contributes to the graphic in its final form is made with that objective and that quality in mind. Furthermore, decisions about which abstractions might make sense are driven to an extent by who the audience is and the level of granularity that is likely to afford them insight.