Notes on Engineering Health, September 2019

“Technical debt” is a cost of doing business well known to software developers. The term implies that choosing a short-term (if pragmatic) implementation today means a payment will come due later: the developers know full well that the system will need to be refactored to allow for a more complete or scalable solution.

Jonathan Zittrain, co-founder of the Berkman Klein Center for Internet & Society at Harvard University, recently wrote (shorter version here; longer version here) about a new kind of debt that the technical world is beginning to rack up, which he calls “intellectual debt”:

An approach to discovery—answers first, explanations later—accrues what I call intellectual debt. It’s possible to discover what works without knowing why it works, and then to put that insight to use immediately, assuming that the underlying mechanism will be figured out later. In some cases, we pay off this intellectual debt quickly. But, in others, we let it compound, relying, for decades, on knowledge that’s not fully known.

Zittrain points out that the ever-increasing use of machine learning to make “theory-free” predictions (that is, predictions whose rationale we don’t understand, whether or not the prediction is correct) is a source of rapidly mounting intellectual debt. As with technical debt, we are making short-term decisions to outsource judgement and control to systems that “just work.” Which is fine, until they don’t.

Zittrain identifies three ways in particular in which our intellectual debts could come due. The first is the problem of adversarial examples manipulating deep learning systems. Zittrain’s first example of this problem is the alteration of a few pixels in a picture of a cat: a human eye still sees just a cat, but a sophisticated neural net sees, with 99.99% certainty, a photograph of guacamole. In a less trivial example of this technique, a system designed to classify skin lesions as benign or malignant was tricked into making inaccurate medical judgements. In this scenario, malicious or even just inadvertent spoofing could lead to significant health risks.

The second problem called out by Zittrain is the compounding of intellectual debt caused by “the coming pervasiveness of machine learning models.” Here the issue is that data produced by one machine learning system is increasingly being used to train other machine learning systems. The potential for an unrecognized flaw in the initial system to propagate exponentially through all its connected systems leads to a problem known in engineering as cascading failure.

The final challenge of accreting intellectual debt that Zittrain calls out is that the tools of machine learning are equally applicable in the private sector and in academia. Historically, it has been the role of pure academics to pay off our intellectual debts by, as Zittrain notes, “backfilling the theory,” while industry has generally been happy to just apply the right answer (think of marketing a drug without a known mechanism of action versus doing the basic science to sort out how the drug actually works). While this answers-only approach may work in the short term, the likely shift of support away from basic research risks not replenishing the seed corn necessary to make the next set of fundamental breakthroughs as our current understanding of various phenomena reaches its limits. Zittrain cites a provocative essay from the field of protein folding in exploring this concern.

Zittrain ends the longer version of his essay by pointing out:

Most important, we should not deceive ourselves into thinking that answers alone are all that matters: indeed, without theory, they may not be meaningful answers at all. As associational and predictive engines spread and inhale ever more data, the risk of spurious correlations itself skyrockets. Consider one brilliant amateur’s running list of very tight associations found, not because of any genuine association, but because with enough data, meaningless, evanescent patterns will emerge. The list includes almost perfect correlations between the divorce rate in Maine and the per capita consumption of margarine, and between U.S. spending on science, space, and technology and suicides by hanging, strangulation, and suffocation. 
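
The point about spurious correlations can be illustrated with a small simulation. The sketch below is not from Zittrain's essay; the function names, the choice of Gaussian noise, and the sizes used are all assumptions made for illustration. It generates short series of pure random noise, which by construction have no genuine relationship to one another, and reports the strongest correlation found between any pair.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(num_series, num_points, seed=0):
    """Largest |correlation| between any two independent noise series.

    Hypothetical helper: every series is independent Gaussian noise,
    so any strong correlation it reports arises purely by chance.
    """
    rng = random.Random(seed)
    series = [
        [rng.gauss(0, 1) for _ in range(num_points)]
        for _ in range(num_series)
    ]
    return max(
        abs(pearson(a, b)) for a, b in itertools.combinations(series, 2)
    )


if __name__ == "__main__":
    # Ten data points per series, roughly a decade of yearly figures.
    for count in (5, 20, 200):
        best = strongest_chance_correlation(count, num_points=10)
        print(f"{count:>3} unrelated series: strongest |r| = {best:.3f}")
```

Because the number of pairs grows roughly with the square of the number of series, the strongest chance correlation tends to climb as more unrelated series are compared, even though no series has anything to do with any other. That is the mechanism behind lists like the one Zittrain cites.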

Remaining vigilant to these issues, while continuing to harness the power of statistics and computation, is a requirement for making sure that our intellectual debts do not overwhelm our pressing need to continue to invest in the improvement of our health and healthcare.