Tal's Weblog

A collection of assorted trinkets and memorabilia

Ruminations on interactive documents

tl;dr (or Abstract) Online media (from scientific literature to newspapers to blogs) have the potential to revolutionize debate and scientific literacy by incorporating explorable models and their underlying data directly into reporting — and subsequently public discourse. I'll show examples of converting plain text to interactive documents and describe some potential authoring tools to make creation of these documents fast and straightforward.

The set-up

I've been spending a lot of time recently thinking about data-driven debate and discussion. Many have argued (I think very correctly) that claims of "alternative facts" and "fake news" are due in no small part to siloed social networks and a lack of common ground.

The logical follow-up question for me is: what tools can we use (or create) to help get everyone back on the same page? Furthermore, how can those tools make all the hard work done by scientists, engineers, economists, and policy-makers accessible and intuitive to people without years of specialized training in those fields? Two experts in a field might disagree, but if the models and data each of them uses to reach a conclusion are shared and communicated transparently, at least everyone is debating within a shared space of possibility.

As a biomedical data scientist working with electronic health record data, I'm used to biased and missing information, and my job is to design models that correct for these issues. But in the primary source of science, The Literature, incomplete methods sections and poorly documented data sets (when they're included at all) are problematic, to say the least.

A number of solutions exist for leveraging the web to better disseminate models and data, but they're often geared towards researchers. More damningly, these tools always sit at least one link away from the document describing them; even with the proliferation of open-access and online-only scientific journals, the ways in which readers can interact with data and models are still largely limited to what a printed page can do.

I can't think of anyone this decade who has highlighted these shortcomings, and suggested concrete ways to move forward, better than Bret Victor. For a climate, data, and technology geek like me, his monumental essay "What can a technologist do about climate change?" is the equivalent of holy writ on these issues. In describing how a reader should respond to differing points of view in editorial articles about the Cash for Clunkers program, Bret writes:

The real question is — why are readers and decision-makers forced to “believe” anything at all? Many claims made during the debate offered no numbers to back them up. Claims with numbers rarely provided context to interpret those numbers. And never — never! — were readers shown the calculations behind any numbers. Readers had to make up their minds on the basis of hand-waving, rhetoric, bombast.

The interactive example he goes on to show is well worth exploring and exemplifies what I think all media (from scientific literature to front-page news articles to blog posts) should aspire to. Here's another:

Current evidence-sharing using text and figures

I recently published an open-access drug-drug interaction discovery paper in the Journal of the American College of Cardiology. In the study, we used a combination of clinical data mining and wet-lab experiments to predict, and then experimentally confirm, that two commonly prescribed drugs (ceftriaxone and lansoprazole) interact to cause a dangerous change in heart rhythm.

Drugs typically cause this arrhythmia by physically blocking a protein in the heart called the hERG channel, which helps coordinate the heartbeat. In one part of the study, we used a computational model of a human heart cell to predict the effect of blocking this channel. Here is that section from the paper:

We found that, in combination, ceftriaxone and lansoprazole block the hERG channel, and block of this channel is responsible for prolonging the QT interval on the electrocardiogram.

We used a computational model of the human ventricular myocyte (ref) to simulate the action potential for the hERG block we observed for ceftriaxone, lansoprazole, and the combination from our laboratory experiments. We ran the model for a ventricular action potential paced at 1 Hz with baseline conditions and 10% or 55% block of hERG current (chosen using the current blocks observed experimentally; see results in top figure). We evaluated the action potential duration at 70% of repolarization (APD70).

Using the hERG current blocks observed in the electrophysiology experiments as input to the computational model, the APD prolongation (measured as APD70) was 9 ms for the combination of 1 μM lansoprazole and 100 μM ceftriaxone (shown in brown) and 50 ms for 10 μM lansoprazole and 100 μM ceftriaxone (shown in red; see bottom figure).

First figure: hERG experimental results
Second figure: Computational model plot

This is all well and good, but the first figure holds many more experimental data points, and the second figure would quickly get crowded if we plotted the model output for every one of them. Even as supplementary material, the data would be hidden behind a series of links. Additionally, many questions a reader might have, such as "What does the action potential look like at 100% block?" or "What's the range of action potential prolongations the model can output?", simply can't be answered with these static documents.
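Answering a question like the second one comes down to recomputing one summary number per simulated trace. Below is a minimal Python sketch of APD70, and of the prolongation relative to the unblocked baseline (APD70 with block minus APD70 at baseline). It assumes a single-beat trace stored as parallel lists of times (ms) and voltages (mV), takes the resting potential from the first sample, and starts the action potential at the steepest sampled upstroke. These are common conventions, not necessarily the exact ones used in the paper, and `apd` and `apd_prolongation` are hypothetical names.

```python
from typing import Sequence, Tuple

Trace = Tuple[Sequence[float], Sequence[float]]  # (times in ms, voltages in mV)


def apd(times: Sequence[float], voltages: Sequence[float], repolarization: float = 0.7) -> float:
    """Action potential duration at a fraction of repolarization (0.7 gives APD70).

    Assumed conventions: one beat per trace, the first sample is the resting
    potential, the AP starts at the steepest rise (largest forward-difference
    dV/dt before the peak), and it ends when V first falls to
    peak - repolarization * (peak - rest), linearly interpolated between samples.
    """
    if len(times) != len(voltages) or len(times) < 3:
        raise ValueError("need at least three matching time/voltage samples")
    if not 0 < repolarization <= 1:
        raise ValueError("repolarization must be in (0, 1]")
    rest = voltages[0]
    peak_idx = max(range(len(voltages)), key=lambda i: voltages[i])
    if peak_idx == 0:
        raise ValueError("trace has no upstroke")
    peak = voltages[peak_idx]
    # Start of the AP: left edge of the steepest sampled rise before the peak.
    start_idx = max(
        range(peak_idx),
        key=lambda i: (voltages[i + 1] - voltages[i]) / (times[i + 1] - times[i]),
    )
    threshold = peak - repolarization * (peak - rest)
    for j in range(peak_idx + 1, len(voltages)):
        if voltages[j] <= threshold:
            # Interpolate the crossing time between samples j - 1 and j.
            v0, v1 = voltages[j - 1], voltages[j]
            frac = (v0 - threshold) / (v0 - v1)
            t_end = times[j - 1] + frac * (times[j] - times[j - 1])
            return t_end - times[start_idx]
    raise ValueError("trace never repolarizes to the requested level")


def apd_prolongation(baseline: Trace, blocked: Trace, repolarization: float = 0.7) -> float:
    """Prolongation (ms) of a blocked trace relative to the baseline trace."""
    return apd(*blocked, repolarization) - apd(*baseline, repolarization)
```

The interactive version described below needs exactly this number once per simulated block level; computing it up front keeps the browser side to a simple lookup.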

Embedding models and data directly into text

Taking inspiration from the examples shown in Bret Victor's essay, I created an interactive version of the study excerpt by linking D3 with Tangle, a JavaScript library created by Bret for generating reactive documents. Here's the resulting dynamic document:

First figure: hERG experimental results

We used a computational model of the human ventricular myocyte (ref) to simulate the action potential for the hERG block we observed for ceftriaxone, lansoprazole, and the combination from our laboratory experiments. We ran the model for a ventricular action potential paced at 1 Hz with baseline conditions and for every possible current block from 5% to 100%, including the current blocks observed experimentally (see results in top figure). We evaluated the action potential duration at 70% of repolarization (APD70).

We observed a block of 10% for the combination of 1 μM lansoprazole / 100 μM ceftriaxone and 55% for 10 μM lansoprazole / 100 μM ceftriaxone. For [selected]% block, the action potential prolongation was [computed] ms.

Adjust the percent block by dragging it to see the changes in the model and results.

Now that's more like it! Note that when selecting a current block for which we have experimental results in the top figure (e.g. at 40% block), the text to the right of the interactive element updates to describe the drug concentrations used to observe that block. The code to generate this interactive document is available on GitHub.
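The behaviour described here, where dragging one number updates the prose, the chart, and the experimental annotation together, follows Tangle's model: variables are set in an `initialize` function, and derived values are recomputed in an `update` function whenever the reader changes something. The Python sketch below imitates that loop outside the browser to show the logic. `ReactiveModel`, `update`, and `sentence` are hypothetical names, not Tangle's API, and the lookup tables hold only the two block levels quoted in this post; a real document would load the full simulated sweep.

```python
from typing import Any, Callable, Dict, List, Optional

Model = Dict[str, Any]

# Only the two block levels quoted in this post; a real document would load the full sweep.
PROLONGATION_MS: Dict[int, float] = {10: 9.0, 55: 50.0}
OBSERVED_WITH: Dict[int, str] = {
    10: "1 μM lansoprazole / 100 μM ceftriaxone",
    55: "10 μM lansoprazole / 100 μM ceftriaxone",
}


class ReactiveModel:
    """Tangle-like loop: any change reruns update(), then re-renders every view."""

    def __init__(self, initial: Model, update: Callable[[Model], None]) -> None:
        self.values: Model = dict(initial)
        self._update = update
        self._views: List[Callable[[Model], None]] = []
        self._update(self.values)

    def subscribe(self, view: Callable[[Model], None]) -> None:
        """Register a view (the prose, a chart) and render it once immediately."""
        self._views.append(view)
        view(self.values)

    def set_value(self, name: str, value: Any) -> None:
        """What dragging an adjustable number would trigger."""
        self.values[name] = value
        self._update(self.values)
        for view in self._views:
            view(self.values)


def update(m: Model) -> None:
    """Derive the model output and the experimental annotation from the block (%)."""
    m["prolongation_ms"] = PROLONGATION_MS.get(m["block"])
    m["observed_with"] = OBSERVED_WITH.get(m["block"])


def sentence(m: Model) -> str:
    """Render the reactive sentence for the current block."""
    ms: Optional[float] = m["prolongation_ms"]
    shown = f"{ms:g} ms" if ms is not None else "not computed in this sketch"
    text = f"For {m['block']}% block, the action potential prolongation was {shown}."
    if m["observed_with"] is not None:
        text += f" This block was observed experimentally with {m['observed_with']}."
    return text


rendered: List[str] = []
doc = ReactiveModel({"block": 10}, update)
doc.subscribe(lambda m: rendered.append(sentence(m)))
doc.set_value("block", 55)  # as if the reader dragged the number from 10 to 55
```

Keeping every derived value in one `update` step means the prose and the chart can never disagree: both views read the same recomputed model.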

Authoring considerations and next steps

The example above is already a step forward, and it runs in any web browser, but how should one go about creating these sorts of interactive documents? In this case, I sat down for part of an afternoon and ran the MATLAB computational model for each current block. I then converted the model output to JSON using Python and wired together the JavaScript so that the interactive Tangle element updates a D3 chart.
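The Python conversion step might look something like the sketch below. It assumes each MATLAB run was exported as a CSV file with `time` and `voltage` header columns and named `ap_block_<pct>.csv`; that naming scheme and the function names are hypothetical, not taken from the repository. It writes a single JSON object keyed by block percentage so the JavaScript side can look up a trace directly from the slider value.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List, Tuple

Trace = Tuple[List[float], List[float]]  # (time in ms, membrane voltage in mV)


def read_trace_csv(path: Path) -> Trace:
    """Read one exported run; assumes a header row with 'time' and 'voltage' columns."""
    times: List[float] = []
    voltages: List[float] = []
    with path.open(newline="") as fh:
        for row in csv.DictReader(fh):
            times.append(float(row["time"]))
            voltages.append(float(row["voltage"]))
    return times, voltages


def traces_to_json(traces: Dict[int, Trace]) -> str:
    """One JSON object keyed by block percentage, in ascending block order."""
    payload = {
        str(block): {"time": times, "voltage": voltages}
        for block, (times, voltages) in sorted(traces.items())
    }
    return json.dumps(payload)


def export_directory(directory: Path, out_file: Path) -> None:
    """Collect every ap_block_<pct>.csv (hypothetical naming) into a single JSON file."""
    traces: Dict[int, Trace] = {}
    for path in directory.glob("ap_block_*.csv"):
        block = int(path.stem.rsplit("_", 1)[1])
        traces[block] = read_trace_csv(path)
    out_file.write_text(traces_to_json(traces))
```

A call such as `export_directory(Path("model_output"), Path("traces.json"))` (both paths illustrative) produces one file the page can fetch once. Keying by block rather than writing an array keeps the lookup in the browser to a single property access.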

This workflow was fine for a proof of concept, but it begins to break down when thinking about creating interactive documents on a regular basis. The same motivation John Gruber had in creating Markdown applies here; why should authors be taken out of the headspace of writing in order to "mark up" a document with HTML and CSS and D3 and Tangle (and helper JavaScript and Python and R, etc.)?

I think the solution is Dynamic Markdown: Markdown extended with syntax for reactive elements.

#### An adjustable number ####
I would walk [500 miles]{walk_distance: 100..1000 by 10}.  
And I would walk [500]{second_walk} more.
@second_walk = @walk_distance * 3

The syntax above will be familiar to anyone who's used Markdown, with some new additions. That syntax is processed in the browser to render the corresponding HTML and JavaScript (check the site source or open the JavaScript console to see it in action).
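To make the rendering step concrete, here is a rough Python sketch of a compiler for the adjustable-number example above. It is not the project's code, which runs as JavaScript in the browser. It turns each `[text]{name: lo..hi by step}` span into a `<span>` with the `TKAdjustableNumber` class and `data-var`, `data-min`, `data-max`, and `data-step` attributes (modelled on TangleKit's conventions; check them against the Tangle documentation), turns `[text]{name}` into a plain display span, and turns each `@target = expression` line into a statement in the Tangle model's `update` function. The function name and the `root_id` parameter are hypothetical, and every other line is simply wrapped in a paragraph rather than run through a full Markdown renderer.

```python
import html
import re
from typing import List, Tuple

# [display text]{name}  or  [display text]{name: lo..hi by step}
SPAN = re.compile(
    r"\[(?P<text>[^\]]*)\]\{(?P<name>\w+)"
    r"(?::\s*(?P<lo>-?\d+(?:\.\d+)?)\.\.(?P<hi>-?\d+(?:\.\d+)?)"
    r"\s+by\s+(?P<step>\d+(?:\.\d+)?))?\}"
)
FORMULA = re.compile(r"^@(?P<target>\w+)\s*=\s*(?P<expr>.+)$")  # @target = expression
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")
VAR_REF = re.compile(r"@(\w+)")


def compile_dynamic_markdown(source: str, root_id: str = "doc") -> Tuple[str, str]:
    """Return (html, javascript) for the adjustable-number subset of the syntax."""
    paragraphs: List[str] = []
    initial: List[str] = []  # statements for the model's initialize()
    updates: List[str] = []  # statements for the model's update()

    def to_span(m: "re.Match[str]") -> str:
        text, name = m["text"], m["name"]
        number = NUMBER.search(text)
        if number is None:
            value, before, after = "", "", text
        else:
            value, before, after = number.group(), text[: number.start()], text[number.end():]
        attrs = f'data-var="{name}"'
        if m["lo"] is not None:
            lo, hi, step = m["lo"], m["hi"], m["step"]
            attrs += (f' class="TKAdjustableNumber" data-min="{lo}"'
                      f' data-max="{hi}" data-step="{step}"')
            initial.append(f"this.{name} = {value or lo};")
        # Only the number lives inside the span; surrounding words stay as plain text.
        return f"{html.escape(before)}<span {attrs}>{value}</span>{html.escape(after)}"

    for line in source.splitlines():
        formula = FORMULA.match(line.strip())
        if formula:
            expr = VAR_REF.sub(r"this.\1", formula["expr"])
            updates.append(f"this.{formula['target']} = {expr};")
            continue
        if not line.strip():
            continue
        pieces, pos = [], 0
        for m in SPAN.finditer(line):
            pieces.append(html.escape(line[pos:m.start()]))
            pieces.append(to_span(m))
            pos = m.end()
        pieces.append(html.escape(line[pos:]))
        paragraphs.append("<p>" + "".join(pieces).strip() + "</p>")

    markup = f'<div id="{root_id}">\n' + "\n".join(paragraphs) + "\n</div>"
    script = (
        f'new Tangle(document.getElementById("{root_id}"), {{\n'
        f"  initialize: function () {{ {' '.join(initial)} }},\n"
        f"  update: function () {{ {' '.join(updates)} }}\n"
        "});"
    )
    return markup, script
```

Because `update` recomputes `second_walk` from `walk_distance`, the 500 typed into the second span acts as a placeholder; once the model runs, the displayed value should follow the formula rather than the literal text.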

The project is up on GitHub and currently supports a subset of the elements in Tangle; I'll update this post as I add syntax for new elements (including D3 templates for declaratively describing interactive charts).

I'll conclude with this thought: when reading a document that includes any type of data analysis, we only get that author's perspective (often trimmed for space) rather than the full range of answers the model and data make possible. Allowing readers both to (a) explore the evidence an author uses to support her argument and (b) understand how the underlying model actually works is hard to do concisely with static text and graphics. But I think it's much more likely to happen when not only an embedded data visualization but the document text itself reacts to a reader's questions and points of view. Let's make it so.