Friday, December 18, 2015

Utility of Interactivity

In a previous post, I referenced the Sunburst visualization, which makes allows users to interactively explore a decision tree in its prediction, the confidence in the prediction, and the split field at any given node in the tree.  It does not also give the full (expected) class distribution, so some information is unnecessarily lost.  The viewer can guess at the distribution by looking at children nodes, however.  In general, I think this is a clever and well-made decision tree visualization because of its use of space, but as noted, it might not be very useful.  Here is a sunburst with more labels.  For this task, at least, the labeling scheme harder to read then equivalent horizontally aligned design.  There should also be a legend for split fields, and for the classes when in the appropriate views.  

Interaction can be dangerous, in my opinion, because it transforms the presentation of data into a tool for exploration.  If it is treated in the same way as static visualization, it can mislead both the creator and the viewer by leaving the most important information to be discovered, rather than sought after beforehand.

Static visualization freezes points in the exploration process.  If I am sorting through many histograms and find one particularly interesting, I will save it or add comments in the IPython notebook.  When I then go to present my data science work, I will pick out the few static visualizations I want to make an argument.  I do plenty of the same exploration that interactivity facilitates, but I am aware of exactly what I'm doing through the commands used.  

Interactivity can be used for seamless exploration of a data set, but exploration is only one step of the data science process. My working hypothesis is that interactive visualizations should never be used as a final form, because it passes the burden of interpretation onto the user.  Figures ought to support a prose argument.  It is not that users should be left to decide for themselves - for that they too should have access to a data set.  Rather, the creator must know what exactly is being said.  I fear that the ease of exploration gives a false sense of security and leads to incomplete thoughts.  If it's so easy, one could always go back an look at it again.  It also reduces the burden need to arrive at the most important information quickly.  The presentation of data can be distributed over many potential views, instead of concentrated in a single one.  We might feel that we can escape the information loss of compression by creation of a network of connections between all possible frames or data views.  In doing so, we might forget that effective communication depends on compression.  We want to be informed quickly, accurately, and with high confidence that we are learning what is important, rather than being presented with all of the data.

So what exactly is interactivity in visualization?  My curmudgeonly view is that it is decompression allowed by ease of transition between data views.  As a form of exploration, it's great, but danger lies in its shiny animations.

No comments:

Post a Comment