Visualization Curmudgeon: Visualization Primitives and Data Correspondences

Would a formal ontology for data visualization be useful? Could it either improve our existing visualizations or point to new possible ones? I think analysis and improvement of existing visualizations is tractable and mostly understood. From my perspective, there is plenty of work to be done just applying the few criteria I've described for model introspection and all of the design thinking that Tufte has proposed, especially in "The Visual Display of Quantitative Information". I believe that building tighter correspondences between visual representation and underlying data is also tractable, simply by examining the visualizations we already have. A systematization might highlight gaps or unimagined combinations of complementary elements, such as the relatively new Sunburst visualization, which organizes hierarchical information radially. But if finding new combinations is the only potential use, a list of design elements may be more desirable, and much less work, than a formal ontology.

For this reason, I'd like to look at parts of visualization that are primitives or nearly so. We'll make a first-pass partial list. Visual elements should correspond to the data sources they represent. We should appreciate the value of intuitive correspondence.

Points in space should represent single samples

Symbols rather than dots should correspond to categorical values or different classes
Words can function as symbols inside the data space, or as the data themselves, but act as labels outside the data space

Line segments

If aligned with an axis, they represent scalar values
If arranged parallel to others, they are compared scalar values
If connecting two points, they represent a relationship between those two points

Points in a 2D space

May represent a sample in a 2D projection
May be points along a function in that space
In a graph, positional information is less informative but points should be arranged to ease reading of relationships
Without axes and not in a graph, positional information, usually through proximity, should express a relation with other samples

Curves express a function, or a boundary in the data space, but these are analogous

May also substitute for line segments in graphs

Areas

Multiple rectangles with one edge aligned and of constant length should be interpreted as line segments
Rectangles naturally represent the product of two variables if aligned with orthogonal axes
May also represent a fractional component of a larger whole

May be preferable to a divided line segment, stacked bar, or compared parallel bars in some cases, especially for hierarchical division of a quantity, because areas give more space for labels

The danger is that readers impute meaning to the spatial arrangement of the subdivisions

Non-rectangular forms should not be used to compare single scalar values

Unless these forms have some underlying 2D nature
One may have a desire to plot circles with area (or diameter) representing scalars. These are difficult to compare visually, and should be avoided in favor of rectangles.

Other geometrical objects are even worse
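
To make the circle problem concrete, here is a minimal sketch of the arithmetic, assuming two made-up scalars where one is twice the other. The function names and the `scale` parameter are mine, purely for illustration. It compares the two encodings mentioned above: scaling a circle's area with the value versus scaling its diameter.

```python
import math


def radius_for_area(value, scale=1.0):
    """Radius of a circle whose AREA is proportional to value."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return math.sqrt(scale * value / math.pi)


def radius_for_diameter(value, scale=1.0):
    """Radius of a circle whose DIAMETER is proportional to value."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return scale * value / 2.0


def circle_area(radius):
    return math.pi * radius ** 2


# Hypothetical scalars: `large` is exactly twice `small`.
small, large = 10.0, 20.0

# Area encoding: drawn areas keep the 2:1 ratio, but the diameters differ
# only by a factor of sqrt(2), which is hard to judge by eye.
area_ratio_area_encoding = (
    circle_area(radius_for_area(large)) / circle_area(radius_for_area(small))
)
diameter_ratio_area_encoding = radius_for_area(large) / radius_for_area(small)

# Diameter encoding: diameters keep the 2:1 ratio, but the drawn area
# grows with the square of the value.
area_ratio_diameter_encoding = (
    circle_area(radius_for_diameter(large))
    / circle_area(radius_for_diameter(small))
)
```

With diameter encoding, a value twice as large covers four times the ink. With area encoding, the ink is proportional but the circle is only about 1.41 times as wide. Either way the reader has to guess which property carries the number, which is not a problem for a bar whose length is the only thing that varies.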

Colors

From a discrete palette represent distinct labels, which should have accompanying text
From a continuous palette, represent a single continuous variable, which should have an accompanying color bar
At some point, I'd like to illustrate a more complex idea with hue and color value representing two orthogonal dimensions, such as correlation with a positive or negative output class and strength of correlation. The space above the added V in this figure shows a color space with decent differentiability of points and intuitive interpretation.

Edited Wikipedia example of cross-section of HSV solid
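
As a rough sketch of the hue-plus-value idea, the following uses Python's standard `colorsys` module to map a signed correlation to a color. Hue carries the sign (which output class) and HSV value carries the strength. The function name, the two hue choices, and the fixed saturation of 1.0 are my assumptions for illustration; this is not a reconstruction of the figure.

```python
import colorsys


def correlation_to_rgb(correlation, positive_hue=0.6, negative_hue=0.0):
    """Map a correlation in [-1, 1] to an (r, g, b) triple in [0, 1].

    Hue encodes the sign (assumed: bluish for positive, red for negative).
    HSV value (brightness) encodes the strength, so weak correlations
    fade toward black. Saturation is held at 1.0 (an assumption).
    """
    if not -1.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    hue = positive_hue if correlation >= 0 else negative_hue
    strength = abs(correlation)
    return colorsys.hsv_to_rgb(hue, 1.0, strength)


# Hypothetical correlations, one per feature.
examples = [-1.0, -0.5, 0.0, 0.5, 1.0]
colors = [correlation_to_rgb(c) for c in examples]
```

Because the two dimensions are independent in HSV, a reader can separate "which class" from "how strongly" at a glance. A legend would still need to show both: a label per hue and a bar for brightness.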

"Small Multiples"

Visual distinction and comparison between similar objects

Parallel and tiled items are interpreted as analogues with differences in features

You could browse through all of Mike Bostock's examples and not come across something as funky as Chernoff faces, but they are an example of small multiples

These are the most obvious visual forms that I believe are intuitive to interpret. I will add to this list if I think there are more, but it may be very small overall.


Friday, December 18, 2015
