# The cr.yp.to blog

## 2020.12.06: Optimizing for the wrong metric, part 1: Microsoft Word

The boss needed item 3 inserted into a numbered list of hundreds of items. The intern used a mouse to select the original 3 on the screen, then typed 4, then selected the original 4, then typed 5, then scrolled down, then selected the original 5, then typed 6, and so on. Another intern sat watching the screen to make sure there were no mistakes.

I happened to be in the room for other reasons. I remember the horror of watching the beginning of this barbaric editing process. Those poor interns!

When I enter a list of items into the computer, what I'm typing doesn't look like

     1. ...
     2. ...
     3. ...


but more like

     * ...
     * ...
     * ...


Each asterisk is a special command to the computer, telling the computer to automatically display the next number for the reader. The reader eventually sees

     1. ...
     2. ...
     3. ...


but that isn't what I typed. This small difference produces a tremendous saving of time whenever I insert an item, or delete an item, or move an item.
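As a concrete illustration, here is a minimal sketch assuming LaTeX's standard `enumerate` environment, where `\item` plays the role of the asterisk. The item text is placeholder content, not taken from any real document.

```latex
\documentclass{article}
\begin{document}
\begin{enumerate}
  \item First item   % displayed to the reader as "1."
  \item Second item  % displayed as "2."
  \item Third item   % displayed as "3."
\end{enumerate}
\end{document}
```

Inserting a new `\item` line anywhere in the list is a single edit; every later number is recomputed the next time the document is compiled.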

If I decide later to skip the numbers and use bullets instead, I tell the computer to introduce each list item with a bullet. This is one command covering the whole list. There's also a command that does the same thing for the whole document. There isn't separate work for each item. It's no problem if a coauthor later wants to change bullets back to numbers.
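In LaTeX, one way to sketch this (assuming the standard `article` class and its `enumerate` environment) is to redefine the label used for top-level numbered lists. For a single list, changing `enumerate` to `itemize` has the same effect.

```latex
\documentclass{article}
% One preamble line: every top-level enumerate list in the document
% is now introduced by a bullet instead of a number.
\renewcommand{\labelenumi}{\textbullet}
\begin{document}
\begin{enumerate}
  \item First item   % displayed with a bullet
  \item Second item  % displayed with a bullet
\end{enumerate}
\end{document}
```

Deleting the `\renewcommand` line changes the bullets back to numbers, again without touching any individual item.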

The interns, I suppose, would be manually changing "1." and "2." and "3." and so on to "•" and "•" and "•" and so on. Or maybe they would be trying to figure out how some search-and-replace feature could do the same thing; let's hope the document doesn't have a sentence somewhere that talks about something that happened in the year 2001. Or maybe the interns would be quitting and finding a better job.

[Note added 2020.12.07: I was expecting that many of my readers would already be accustomed to relying on the computer for automatic numbering. I was surprised, however, to see some comments along the lines of "Inconceivable!" from readers unable to imagine how the interns could have been in a different situation, going through such a shockingly inefficient revision process. Here's a hint: Each item in the list looked like a flush-left paragraph, like the paragraphs in this blog post, adjacent to the left margin. The text being selected by the mouse, for example to change "3" to "4", was to the right of the margin, like the rest of the text in each item.]

**Abstraction as a time-saver for authors.** This use of asterisks is just one example of how I'm often typing something more abstract than what's seen by the ultimate reader. I don't type "Figure 12" or "see [41]", for example; I type things like "Figure \ref{network-measurements}" and "see \cite{multiplication-survey}", and I let the computer automatically convert "\ref{network-measurements}" and "\cite{multiplication-survey}" into numbers to display for the reader.
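A minimal LaTeX sketch of how these names get attached to the things they refer to, assuming the standard `article` class. The figure body and the bibliography entry are placeholders, not real content.

```latex
\documentclass{article}
\begin{document}
\section{Measurements}
Figure~\ref{network-measurements} shows the results;
see \cite{multiplication-survey}.

\begin{figure}
  \centering
  \fbox{placeholder for the plot}
  \caption{Network measurements.}
  \label{network-measurements}  % the name used by \ref above
\end{figure}

\begin{thebibliography}{9}
  % Placeholder entry; the key is the name used by \cite above.
  \bibitem{multiplication-survey} Placeholder bibliography entry.
\end{thebibliography}
\end{document}
```

LaTeX typically needs two runs to resolve the numbers. Moving the figure or adding earlier figures and references changes the displayed numbers without any edit to the sentences that mention them.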

With one extra command, covering the entire document, I can tell the computer to include section numbers as part of all figure numbers in the document, so that the figures are easier for the reader to find: e.g., Figures 3.1 and 3.2 and 3.3 are in Section 3. With another command, again covering the entire document, I can tell the computer to cite all authors by name rather than by number.
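In LaTeX, these two document-wide switches might look like the following preamble fragment. This is a sketch that assumes the `amsmath` and `natbib` packages and a BibTeX bibliography; other packages can achieve the same effect.

```latex
% Preamble fragment (goes between \documentclass and \begin{document}).

\usepackage{amsmath}
\numberwithin{figure}{section}   % figures become 3.1, 3.2, ... in Section 3

\usepackage[authoryear]{natbib}  % citations show author names and year
\bibliographystyle{plainnat}     % a bibliography style compatible with natbib
```

Neither line requires touching any individual `\ref` or `\cite` in the body of the document.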

As another example, I was recently editing a mathematical paper, and I decided that a particular concept would be easier for the reader to remember if I changed the notation that I was using for the concept. The notation was all over the paper, but this change took just a few seconds of editing. I had given a name to the concept, had told the computer once to display this name as a particular notation, and had then typed this name throughout the paper, so there was only one place where I had to change the notation.
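In LaTeX this is typically done with a macro. The sketch below uses a hypothetical concept name `\hgt` and hypothetical notations; neither comes from the paper being described.

```latex
\documentclass{article}
% The only place where the notation for the concept is defined.
\newcommand{\hgt}[1]{h(#1)}
% To change the notation throughout the paper, edit that one line, e.g.:
% \newcommand{\hgt}[1]{|#1|}
\begin{document}
If $\hgt{x} < \hgt{y}$ then $\hgt{x} + 1 \le \hgt{y}$.
\end{document}
```

Every use of `\hgt{...}` in the body picks up the new notation on the next compile.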

Of course one can't, and shouldn't try to, prepare in advance for every possible change to a document. But it's not hard to prepare for the most likely changes. This small initial effort saves a tremendous amount of time later. When I say "small", I'm including the effort to select a document-creation system that's designed to make this sort of thing easy.

(As a side note, programmers will recognize this as an example of the information-hiding strategy introduced by Parnas, and will recognize that modern program-creation systems are designed to make this easy.)

Microsoft Word isn't completely missing abstractions, but these abstractions are competing for user-interface resources against features encouraging the user to work at lower abstraction layers. The extra effort to use the abstractions ends up pushing users into doing something simpler, something that just works now, and paying heavily for this choice later when the document is being revised.

Have I done a scientific study proving that Microsoft Word is less efficient than LaTeX? No. I'd love to see a careful study of this topic. Short-term, this would help guide new authors to make sensible choices. Longer-term, insights from this sort of study could be the basis for further improving our document-creation systems. I certainly don't think that the existing systems are perfect. (Example.)

Imagine, however, that a study looks only at the time for someone looking at a printout to create a document matching this printout. This would be blind to the time for subsequent edits. This would be blind to the suffering of those interns. This would incorrectly conclude that typing "1. ... 2. ... 3. ..." and "see [41]" is more efficient than typing "* ... * ... * ..." and "see \cite{multiplication-survey}". It is slightly more efficient in this limited metric, but it is much less efficient in the metric that matters, namely the total time spent by the user.

**An example of a "scientific" study.** At this point you're probably thinking that nobody could possibly miss such an obvious issue. This brings me to the main topic of this blog post, a 2014 peer-reviewed study "An Efficiency Comparison of Document Preparation Systems Used in Academic Research and Development" by two psychologists, Knauff and Nejasmic ("KN").

Participants in the study were given a page of text and were given a limited time to type the page into the computer. There were three different types of text:

• simple prose with a few footnotes;
• a page with a complicated table of data; and
• a page with many mathematical formulas.

Participants were scored on the basis of how much text they typed and how accurately they typed it. The time was so rushed that a significant fraction of participants didn't finish typing the whole page, even for the case of simple prose.

The study considered two document-creation systems: LaTeX and Microsoft Word, in each case with "all tools, editors, plug-ins, and add-ons" that participants were "accustomed to using". Of course different "add-ons" could have different efficiency, and of course there are other document-creation systems, but these are topics for another blog post.

The study produced many pages of results, which I'll summarize by saying that Word did slightly better on the prose and much better on the table, while LaTeX did better on the formulas. The study authors made no effort to measure any subsequent document-editing step.

**Slithering from one metric to another.** The fundamental mistake in the KN paper is the change of cost metric.

The original question was how efficiently authors are creating documents: in particular, how efficiently authors are creating academic research papers. KN claimed in their title to be comparing "efficiency" of "document preparation systems used in academic research". But they then quietly changed this metric in three ways:

• They considered only the efficiency of an initial fragment of the document-creation process, ignoring the time spent revising documents. They provided no reason to believe that the efficiency of this fragment was well correlated with what they had previously claimed to be measuring. Nothing in their paper acknowledges the most obvious reason for a negative correlation, namely that slightly more work at the outset makes revisions much easier later. In my experience, document-creation systems vary in how well they support this work.
• KN didn't even measure the time taken for this initial fragment of the process. Instead they imposed a rushed time limit, and measured how incomplete and inaccurate the resulting document was. Again they provided no reason to believe that what they measured was well correlated with what they had previously claimed to be measuring. Perhaps they were assuming that more mistakes will take more time to fix, but my experience is that some types of mistakes are much easier to fix than others, and that document-creation systems vary in the types of mistakes they encourage.
• KN didn't even measure creating a new document, which is what academics are actually spending their time doing. People who were writing papers in the age of typewriters will remember writing and editing papers by hand before tediously typing the final pages, but that was because editing a typed page ranged from annoying (white-out, or sometimes scissors and tape) to super-annoying (retyping the whole page). Today the initial writing on paper is skipped, and typing is interleaved in small chunks with parts of the author's thought process, making the typing process much less boring. I'm continually re-reading and thinking about what I just typed. Is the error rate of the academic's modern typing process well correlated with the error rate of the archaic retyping process that KN measured? Again KN provide no reason to believe this.

Did KN use the honest title "A comparison of the unreliability of rushed retyping of a page using document preparation systems that are also used for academic research and development"? No. Would you expect a journal to accept a paper with such a title?

Instead they used a title claiming, without justification, to measure something else: "An efficiency comparison of document preparation systems used in academic research and development". So they were advertising metric X, the efficiency of academic document preparation, while actually studying metric Y, the unreliability of rushed retyping of an existing page. Anyone who simply asks "Could a Y comparison mispredict an X comparison?" will immediately come up with all sorts of reasons that the answer is yes.

**Fake science, piled higher and deeper.** I'll close this review by commenting on some quotes from KN:

> We empirically compared the usability of LaTeX and Word under highly realistic working conditions.

Where is the justification for the claim that this was "highly realistic"? Is it "highly realistic" to have researchers starting from an existing page of text, rushing to type the page into the computer, and then not spending any time revising the text? Perhaps the authors of this study produce their own papers this way, but they don't say this, and they also don't justify extrapolating from anecdotal evidence.

Let me suggest a followup study of the following hypothesis. Compared to researchers who use LaTeX, researchers who use Microsoft Word produce papers that are significantly worse, not just in appearance but also in content. One of the reasons for this is that researchers who use Microsoft Word need much more time for revisions than researchers who use LaTeX, and as a result are systematically deterred from making revisions that would significantly improve the content of their papers.

> Our study suggests that LaTeX should be used as a document preparation system only in cases in which a document is heavily loaded with mathematical equations. ... LaTeX is also used frequently for text that does not contain a significant amount of mathematical symbols and formula [sic]. We believe that the use of LaTeX under these circumstances is highly problematic and that researchers should reflect on the criteria that drive their preferences to use LaTeX over Microsoft Word for text that does not require significant mathematical representations.

If scientists claim that researchers "should reflect on" something, aren't they under an obligation to cite at least a small sample of the previous literature doing exactly this? Of course there were also various responses to this study, and the responses generally sound like things that people had already thought through.

> A striking result of our study is that LaTeX users are highly satisfied with their system despite reduced usability and productivity. From a psychological perspective, this finding may be related to motivational factors, i.e., the driving forces that compel or reinforce individuals to act in a certain way to achieve a desired goal. A vital motivational factor is the tendency to reduce cognitive dissonance. ... This bias is usually unconscious and becomes stronger as the effort to reject the chosen alternative increases, which is similar in nature to the case of learning and using LaTeX.

It's certainly striking to see the contrast between (1) LaTeX users being more satisfied than Microsoft Word users and (2) KN claiming that Microsoft Word is more efficient. It's even more striking to see how KN explained this: basically, LaTeX users are emotionally unable to handle the thought that they might be making a mistake in using LaTeX, and thus bury this thought under an artificial feeling of satisfaction. Using LaTeX is a happiness drug! Hey, buddy, want to give LaTeX a try? It'll make you feel great! First page is free!

A much more obvious explanation is that KN screwed up their entire study by choosing an unrealistic efficiency metric. Nowhere did KN acknowledge this explanation. From a psychological perspective, this surprising blindness on the part of KN may be related to motivational factors, such as the tendency to reduce cognitive dissonance. Authors who have put work into a study have a bias towards being satisfied with their own work, and this interferes with them rationally considering the possibility that their study was fundamentally flawed.

> A third decision criterion that should factor into a researcher's choice of a document preparation system is the cost of research and development to the public or industry. Researchers have a responsibility to act economically and efficiently to create new technologies and theories that benefit society, especially in cases in which research is publicly funded. ... Given these numbers it remains an open question to determine the amount of taxpayer money that is spent worldwide for researchers to use LaTeX over a more efficient document preparation system, which would free up their time to advance their respective field.

Wow. We're supposed to be making decisions about how public money is used on the basis of an "efficiency" study that incorrectly equates different efficiency metrics and displays no understanding of the importance of the selection of a metric?

> From our point of view empirical results make a stronger case than claims that are not based on such empirical findings. It is astonishing how some commentators ignore the basic principles of scientific decision-making that is, collecting facts, control over variables, using systematic methods, careful measurement, connecting causes and effects, and making rational evidence-based decisions, instead of generalizing personal impressions or opinions.

This isn't a quote from the paper itself; it's a followup comment from KN.

Did KN carry out a scientific study of the efficiency of preparing academic research papers? No, they didn't. What they actually measured was something different. This change of metric undermined their "facts", their "control", their "systematic methods", their "careful measurement", their "connecting causes and effects", and their "rational evidence-based decisions". Their title and main conclusions were, and are, speculation posing as science.

I'm not saying that quantitative efficiency studies are a bad thing in general. Again, I would love to see a properly designed study of the total cost of creating a document. But this would take serious effort, first to monitor the entire lifecycle of various documents and then to see how efficiently these lifecycles can be reproduced in different document-creation systems.

The authors of this particular study didn't want to bother spending so much time, so they switched to another metric that was easier to measure. The fundamental problem is that this metric says very little about the user's actual costs.

Version: This is version 2020.12.07 of the 20201206-msword.html web page.