Operational Transformation, the real time collaborative editing algorithm

pubgcheatsfree
5 min readFeb 20, 2024

--

I have been researching the optimistic concurrency algorithm for real time collaboration known as Operational Transformation (OT). OT is the algorithm used by many real time collaborative editors that you see today, such as Google Wave, Google Docs, Etherpad, Mockingbird, and Novell’s Vibe Cloud (formerly known as Pulse). This is the first in a series of posts I will be writing on OT.

As I am writing this series, I am I also implementing the ideas being discussed in JavaScript. I have made two public repositories on github:

  1. operational-transformation — This is an Operational Transformation library for plain text which assumes nothing about how the storage backend for your documents works, what kind of frontend you will be using, etc. This is where most of the ideas discussed in this series will be implemented.
  2. operational-transformation-example — This repository contains an example application which takes advantage of the OT library.

Why Operational Transformation?

Operational Transformation is a good fit for web applications because it is optimistic. Consider the scenario where multiple clients are concurrently editing the same document hosted on a central server. The goal is to provide a real time editing experience so that the clients can collaborate as they change the document, and each client’s changes are preserved.

OT optimisticly assumes that whatever operations are currently being applied to the document on any given client will not conflict with any operations that might be applied at this same moment by one of the other clients. Conflicts can’t be discovered, let alone dealt with, until some point in the future. This frees operations to be applied to the document as they are generated by the client and the user interface is updated in a responsive manner. There is no need to request a lock on the document from the server.

Locking, and other pessimistic concurrency algorithms, are unrealistic within the constrains of the web application environment because of the high latency associated with communicating round trip with the server and other clients. The majority of clients would be powerless to change the document for many seconds at a time while another client has acquired the lock. This is a recipe for bad user experience.

Furthermore, it is natural to stop thinking of changes at the document level, and instead focus on individual operations being applied. Operations are much more malleable than whole documents. If two clients change their documents such that they become out of sync, how do you bring them back to the same state while preserving both changes in a meaningful way? In comparison, it is easy to adjust the index of a single insertion operation on client A because of a deletion from client B, and then share this adjusted operation with client B so that both clients’ documents are in the same state.

Finally, Operational Transformation allows us to only consider two operations at any given time. Rather than fret about how client A’s changes will affect client B’s document after client B just applied client C’s recent changes, we only compare one client’s document and the central master document at any given time. Not only is this much more simple, with many less edge cases, but it also directly maps on to the client/server architecture of the web.

A Bird’s Eye View of Operational Transformation

At the heart of the aptly-named Operational Transformation is the transformation of operations such that they can be applied to documents whose states have diverged, bringing them both back to the same state. In other words,

xform(a, b) = {a', b'}

such that if applying the operation a followed by the operation b’ to document Y yields document Z, applying the operation b followed by the operation a’ to document Y must also yield document Z.

The one caveat to this rule is summarized in the paper High Latency, Low-Bandwidth Windowing in the Jupiter Collaboration System:

Of course, there are many possible functions that have this property. For example, the function xform(c, s) = {delete everything, delete everything} would satisfy our ordering property, but would probably not satisfy many users! In general, the transforms should try to find some “reasonable” way to combine the two operations into a final effect.

However, xform only handles the case where each side has diverged by one operation. In practice, it is hardly ever the case that we only consider divergences of one. To rectify the situation, xform will be called repeatedly, as you traverse across document state space until both the client and server’s documents reach the same state. I will expand on these transformations in a later post in this series. In the meantime, see Understanding and Applying Operational Transformation for more.

Resources

Read the following papers and articles to learn more about Operational Transformation and to prepare for the rest of this series.

High Latency, Low-Bandwidth Windowing in the Jupiter Collaboration System

Authored by David A. Nichols, Pavel Curtis, Michael Dixon, and John Lamping.

This is the paper that I have seen referenced most often by other papers and articles discussing Operational Transformation. It is very well written and approachable, and was the first paper published describing OT as we know it today.

Understanding and Applying Operational Transformation

Authored by Daniel Spiewak.

This is a great article written by one of the developers working on Novell Vibe Cloud (previously named Pulse). Vibe Cloud implements the Wave Protocol, and the article reflects this by focusing on Wave’s implementation of OT and Google’s contribution to OT theory with the idea of composing operations. Nevertheless, this piece is incredibly informative and provides a very detailed walkthrough of the one of the more difficult transformations you will have to do.

Wikipedia on Operational Transformation

Wikipedia’s article on OT is very good, but not as approachable as either of the two resources above. I suggest reading them first and then only reading the Wikipedia entry once you feel you feel comfortable with the others.

Very Simple Operational Transformation in Erlang

This was a very simple implementation of OT that I wrote in Erlang. It is written in somewhat of a Literate Programming style, so it should be fairly easy to follow along with. I will be going in to much more depth in this series, and with the JavaScript implementation, than I previously did in this project.

The goals of the Erlang project was to prove to myself that I understood OT, could implement a simple version of it, and pass all of my tests. The goals of this series of blog posts, and accompanying JavaScript OT implementation, is a production-worthy real time collaboration app.

--

--