Friday, December 18, 2015

Visualization Primitives and Data Correspondences

Would a formal ontology for data visualization be useful?  Could it either improve our existing visualizations or point to new possible ones?  I think analysis and improvement of existing visualizations is tractable and mostly understood.  From my perspective, there is plenty of work to be done just applying the few criteria I've described for model introspection and all of the design thinking that Tufte has proposed, especially in "The Visual Display of Quantitative Information".  I believe we can build tighter correspondences between visual representation and underlying data simply by examining the visualizations we already have.  A systematization might highlight gaps or unimagined combinations of complementary elements, such as the relatively new Sunburst visualization, which organizes hierarchical information radially.  But if finding new combinations is the only potential use, a plain list of design elements may be more desirable, and much less work, than a formal ontology.

For this reason, I'd like to look at the parts of visualization that are primitives or nearly so.  We'll make a first-pass partial list.  The guiding principle is that visual elements should correspond to the data they represent, and that the correspondence should be intuitive.

  • Points in space should represent single samples
    • Symbols rather than dots should correspond to categorical values or different classes
    • Words can function as symbols inside the data space, or data themselves, but are labels outside the data space
  • Line segments 
    • If aligned with an axis, they represent scalar values
    • If arranged parallel to others, they are compared scalar values
    • If connecting two points, they represent a relationship between those two points
  • Points in a 2D space 
    • May represent a sample in a 2D projection
    • May be points along a function in that space
    • In a graph, positional information is less informative but points should be arranged to ease reading of relationships
    • Without axes and not in a graph, positional information, usually through proximity, should express a relation with other samples
  • Curves express a function or a boundary in the data space; the two uses are analogous
    • May also substitute for line segments in graphs
  • Areas
    • Multiple rectangles with one edge aligned and of constant length should be interpreted as line segments
    • Rectangles naturally represent the product of two variables if aligned with orthogonal axes
    • May also represent a fractional component of a larger whole
      • May be preferable to a divided line segment, stacked bar, or compared parallel bars in some cases, especially for hierarchical division of a quantity, where areas give more space for labels
        • The danger is that readers impute meaning to the spatial arrangement of the subdivisions
    • Non-rectangular forms should not be used to compare single scalar values 
      • Unless these forms have some underlying 2D nature
      • One may have a desire to plot circles with area (or diameter) representing scalars.  These are difficult to compare visually, and should be avoided in favor of rectangles.
        • Other geometrical objects are even worse
  • Colors
    • From a discrete palette, represent distinct labels, which should have accompanying text
    • From a continuous palette, represent a single continuous variable, which should have an accompanying color bar
    • At some point, I'd like to illustrate a more complex idea with hue and color value representing two orthogonal dimensions, such as correlation with a positive or negative output class and strength of correlation.  The space above the added V in this figure shows a color space with decent differentiability of points and intuitive interpretation.
[Figure: Edited Wikipedia example of cross-section of HSV solid]
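As a rough sketch of the hue-plus-value idea, the snippet below maps a signed correlation to a color: hue carries the sign (which output class) and HSV value carries the strength.  The function name `correlation_to_rgb`, the two hue choices, and the decision to hold saturation at 1.0 are my own assumptions for illustration; the region of the HSV cross-section described above may call for a different slice of the color space.

```python
import colorsys


def correlation_to_rgb(corr, pos_hue=0.6, neg_hue=0.0):
    """Map a correlation in [-1, 1] to an (r, g, b) triple in [0, 1].

    Hue encodes the sign (hypothetical choice: blue-ish for positive,
    red for negative).  HSV value encodes the strength, so weak
    correlations fade toward black.  Saturation is fixed at 1.0, which
    is an assumption of this sketch, not something the post specifies.
    """
    if not -1.0 <= corr <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    hue = pos_hue if corr >= 0 else neg_hue
    value = abs(corr)
    return colorsys.hsv_to_rgb(hue, 1.0, value)


if __name__ == "__main__":
    for c in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(c, correlation_to_rgb(c))
```

One consequence of this particular mapping is that a correlation of zero is black regardless of class, which keeps the two dimensions from being confused where the sign carries no information.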


  • "Small Multiples"
    • Visual distinction and comparison between similar objects
      • Parallel and tiled items are interpreted as analogues with differences in features
    • You could browse through all of Mike Bostock's examples and not come across something as funky as Chernoff faces, but they are an example of small multiples
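Returning to the item on circles in the Areas list: part of the difficulty is that the same scalar can be tied to either the diameter or the area, and the two choices disagree badly.  The sketch below is only arithmetic, with hypothetical function names, showing how far apart they are for a pair of values.

```python
import math


def diameter_for_area_encoding(value, unit_area=1.0):
    """Diameter of a circle whose AREA is value * unit_area."""
    return 2.0 * math.sqrt(value * unit_area / math.pi)


def area_ratio_for_diameter_encoding(a, b):
    """Ratio of circle areas when DIAMETER is proportional to the value.

    Area grows with the square of the diameter, so the drawn areas
    differ by (a / b) ** 2 rather than a / b.
    """
    return (a / b) ** 2


if __name__ == "__main__":
    a, b = 4.0, 1.0
    # Area encoding: a value 4x larger gets a circle only 2x as wide.
    print(diameter_for_area_encoding(a) / diameter_for_area_encoding(b))
    # Diameter encoding: a value 4x larger covers 16x the ink.
    print(area_ratio_for_diameter_encoding(a, b))
```

A rectangle with one fixed edge avoids the ambiguity, since its length and its area scale together.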


These are the most obvious visual forms that I believe are intuitive to interpret.  I will add to this list if I think there are more, but it may be very small overall.
