p2p software

Decentralized software is on the rise again with the recent explosion in popularity of CRDTs (conflict-free replicated data types), popularized by Martin Kleppmann. These data structures “solve” the problem of conflict resolution among replicas in a distributed system, so that many clients holding different states of the same data can all operate independently and sync when consistency is needed.

These ideas are being popularized in libraries and tools such as:

Yjs: an open-source library of collaborative data types (it provides CRDT versions of common JavaScript data structures, notably Map and Array)
libp2p: a networking stack for building p2p applications, which originated in the IPFS project. In short, it defines a “pubsub” interface for sending messages to all peers subscribed to a given “topic.”
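To make the “topic” idea concrete, here is a minimal in-memory sketch of a pubsub interface. It is not libp2p's API; the `InMemoryPubSub` class, its method names, and the topic strings are all hypothetical, and a real implementation would deliver messages over the network rather than through direct function calls.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[bytes], None]


class InMemoryPubSub:
    """Toy stand-in for a topic-based pubsub layer (hypothetical, not libp2p)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: bytes) -> None:
        # Deliver to every handler subscribed to this topic, and to no one else.
        for handler in list(self._subscribers[topic]):
            handler(message)


hub = InMemoryPubSub()
inbox_a: List[bytes] = []
inbox_b: List[bytes] = []
hub.subscribe("doc-123", inbox_a.append)  # peer A follows document 123
hub.subscribe("doc-456", inbox_b.append)  # peer B follows a different document
hub.publish("doc-123", b"update-1")
```

Only peer A's inbox receives the update, since peer B never subscribed to that topic.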

Several kinds of software fall under the “decentralized” umbrella, including:

p2p software: Scuttlebutt, Canvas, BitTorrent
federated software: Bluesky or Mastodon
centralized arbiter: something like Coda or Figma
local-first: can be a combination of the above but focuses on making full functionality available while offline

Each of these areas has its own goals, but decentralized software as a whole hopes to deliver better individual user experiences (via local-first or offline availability, speed of functionality, and the ability to own your own data), easy collaborative extensions (easy sync between devices), and community-stewarded platforms (via federated management and the decentralized choice of who to sync with).

Data Format

A big challenge is supporting the popular data structures developers already use in a way that requires no behavioral changes. This has proven difficult, partly because of the risk from bad actors (see Jacky’s post on making a Byzantine-fault-tolerant JSON CRDT).

There are some well-known CRDTs, including:

G-Set (Grow-only Set): a set that only supports additions, so merging two replicas is just a set union (in spirit, an append-only log)
2P-Set (Two-Phase Set): a G-Set that allows removals by tracking a second “tombstone” set (remove wins, though, so once an element is removed it can never be re-added)
LWW-Element-Set: the same idea as a 2P-Set, but it attaches a timestamp to each operation and prioritizes the last write, so a removed element can be re-added later
Sequence CRDTs:
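Setting sequence CRDTs aside for now, the three set CRDTs listed above can be sketched in a few lines each. This is a minimal state-based sketch, not any library's implementation: the class and method names are made up for illustration, timestamps are assumed to be comparable integers (e.g. from a logical clock), and the LWW set breaks ties in favor of the add, which is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Set


@dataclass
class GSet:
    items: Set[Hashable] = field(default_factory=set)

    def add(self, item: Hashable) -> None:
        self.items.add(item)

    def merge(self, other: "GSet") -> "GSet":
        return GSet(self.items | other.items)  # union is the whole merge


@dataclass
class TwoPSet:
    added: Set[Hashable] = field(default_factory=set)
    removed: Set[Hashable] = field(default_factory=set)  # tombstones

    def add(self, item: Hashable) -> None:
        self.added.add(item)

    def remove(self, item: Hashable) -> None:
        if item in self.added:
            self.removed.add(item)

    def contains(self, item: Hashable) -> bool:
        # A tombstone always wins, so a removed item can never come back.
        return item in self.added and item not in self.removed

    def merge(self, other: "TwoPSet") -> "TwoPSet":
        return TwoPSet(self.added | other.added, self.removed | other.removed)


@dataclass
class LWWElementSet:
    add_times: Dict[Hashable, int] = field(default_factory=dict)
    remove_times: Dict[Hashable, int] = field(default_factory=dict)

    def add(self, item: Hashable, ts: int) -> None:
        self.add_times[item] = max(ts, self.add_times.get(item, ts))

    def remove(self, item: Hashable, ts: int) -> None:
        self.remove_times[item] = max(ts, self.remove_times.get(item, ts))

    def contains(self, item: Hashable) -> bool:
        if item not in self.add_times:
            return False
        if item not in self.remove_times:
            return True
        # Assumption: ties go to the add.
        return self.add_times[item] >= self.remove_times[item]

    def merge(self, other: "LWWElementSet") -> "LWWElementSet":
        merged = LWWElementSet(dict(self.add_times), dict(self.remove_times))
        for item, ts in other.add_times.items():
            merged.add(item, ts)
        for item, ts in other.remove_times.items():
            merged.remove(item, ts)
        return merged


g = GSet({"a"}).merge(GSet({"b"}))

a, b = TwoPSet(), TwoPSet()
a.add("x")
b = b.merge(a)   # b learns about "x"
b.remove("x")    # b removes it
a.add("x")       # a "re-adds" concurrently; this has no effect once merged
two_p = a.merge(b)

lww = LWWElementSet()
lww.add("x", ts=1)
lww.remove("x", ts=2)
lww.add("x", ts=3)  # a later add brings the element back
```

Because every merge is built from unions and per-key maxima, merging in either order gives the same state, which is what lets replicas sync without coordination.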

Networking

Permissioning

This is still one of the most open-ended questions of this field, in terms of handling sensitive scoped actions and access.

The general approach that has emerged so far (via jazz.tools and others) is to create a blessed arbiter, functionally just another client connected to the same room, that must sign off on any permissioned action a client tries to make. For example, a client that wants to change their email commits a pending updateEmail action on their own user ID and must wait for the arbiter to sign it before everyone else considers it valid.
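A rough sketch of that sign-off flow, under heavy assumptions: the `Action` and `Arbiter` classes, the field names, and the “users may only edit their own record” policy are all hypothetical and not taken from jazz.tools. An HMAC stands in for a real signature here, which means only the holder of the secret can verify; a real system would use public-key signatures so that every client can check the arbiter's approval.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Action:
    kind: str                        # e.g. "updateEmail"
    actor: str                       # user ID of the client proposing the action
    target: str                      # user ID the action applies to
    payload: Dict[str, str]
    signature: Optional[str] = None  # stays None while the action is pending

    def body(self) -> bytes:
        # Canonical encoding of everything the arbiter signs.
        fields = {"kind": self.kind, "actor": self.actor,
                  "target": self.target, "payload": self.payload}
        return json.dumps(fields, sort_keys=True).encode()


class Arbiter:
    def __init__(self, secret: bytes) -> None:
        self._secret = secret

    def _sign(self, action: Action) -> str:
        return hmac.new(self._secret, action.body(), hashlib.sha256).hexdigest()

    def review(self, action: Action) -> Action:
        # Hypothetical policy: users may only change their own record.
        if action.actor == action.target:
            action.signature = self._sign(action)
        return action

    def is_valid(self, action: Action) -> bool:
        if action.signature is None:
            return False  # still pending, or rejected
        return hmac.compare_digest(self._sign(action), action.signature)


arbiter = Arbiter(b"demo-secret")
own = Action("updateEmail", actor="alice", target="alice",
             payload={"email": "alice@example.com"})
forged = Action("updateEmail", actor="mallory", target="alice",
                payload={"email": "mallory@example.com"})

valid_while_pending = arbiter.is_valid(own)
arbiter.review(own)
arbiter.review(forged)
```

After review, Alice's own update carries a signature and is accepted, while Mallory's attempt to edit Alice's record stays unsigned. Changing the payload after signing also invalidates the signature.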

local-first software

local-first

Timing and Clocks

Being aligned on time (more specifically, on the order in which events happened) is one of the most important foundations for decentralized software, because that ordering is needed to resolve conflicts while preserving as much correctness as possible.

You can’t just add a “timestamp” to every event because machine clocks are not exactly reliable:

Until someone points out to you that the source you’re using for a timestamp – the local machine’s “wall clock” – isn’t completely reliable. Clocks drift, sometimes they go backward, and a malicious or particularly incompetent user could set their device’s clock to a wildly inaccurate value. For example, if one client’s clock is a day ahead, and they send some changes to an object, other clients would be completely unable to make changes to that object until the following day, when their local timestamps finally become “later” than the bad-actor’s timestamps
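The failure mode in that quote is easy to reproduce with a toy last-write-wins register keyed on wall-clock time. This is only an illustrative sketch: `LWWRegister` and the timestamp values are made up, and times are plain epoch seconds.

```python
from dataclasses import dataclass
from typing import Optional

DAY = 24 * 60 * 60


@dataclass
class LWWRegister:
    value: Optional[str] = None
    timestamp: float = float("-inf")

    def write(self, value: str, timestamp: float) -> bool:
        """Apply the write only if its timestamp is newer than the stored one."""
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp
            return True
        return False


true_now = 1_700_000_000  # arbitrary "real" time in epoch seconds
reg = LWWRegister()

# A client whose clock is a day ahead writes first.
accepted_skewed = reg.write("from skewed client", true_now + DAY)
# An honest client writes a minute later in real time and is ignored.
accepted_honest = reg.write("from honest client", true_now + 60)
# Only once real time passes the bad timestamp do honest writes land again.
accepted_next_day = reg.write("from honest client", true_now + DAY + 60)
```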

The most common approach is to use 1) Lamport timestamps (which guarantee an ordering) or 2) hybrid logical clocks (which do the same but also carry a representation of “datetime”). These mechanisms do not guarantee knowledge of the real ordering (although hybrid logical clocks store the “real” time value), but they do guarantee that 1) all events created on a single machine will be correctly ordered and 2) once A sends event(s) to B, all events created on B after receiving them will be ordered after those sent events.
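Both clocks fit in a short sketch. The class and method names below are my own, physical time is passed in explicitly so the example is deterministic, and the Lamport timestamp is paired with a node ID purely to break ties. The hybrid logical clock follows the usual update rules (keep the highest physical time seen, bump a counter when physical time does not advance), but treat it as an illustration rather than a reference implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LamportClock:
    node_id: str
    counter: int = 0

    def tick(self) -> Tuple[int, str]:
        """Stamp a new local event."""
        self.counter += 1
        return (self.counter, self.node_id)

    def receive(self, remote: Tuple[int, str]) -> None:
        """Catch up to a timestamp seen on an incoming event."""
        self.counter = max(self.counter, remote[0])


@dataclass
class HybridLogicalClock:
    wall: int = 0     # highest physical time seen so far
    logical: int = 0  # counter used when physical time does not advance

    def now(self, physical: int) -> Tuple[int, int]:
        """Stamp a new local event given the current physical time."""
        if physical > self.wall:
            self.wall, self.logical = physical, 0
        else:
            self.logical += 1
        return (self.wall, self.logical)

    def receive(self, remote: Tuple[int, int], physical: int) -> Tuple[int, int]:
        """Stamp the receipt of an event carrying a remote timestamp."""
        r_wall, r_logical = remote
        new_wall = max(self.wall, r_wall, physical)
        if new_wall == self.wall == r_wall:
            self.logical = max(self.logical, r_logical) + 1
        elif new_wall == self.wall:
            self.logical += 1
        elif new_wall == r_wall:
            self.logical = r_logical + 1
        else:
            self.logical = 0
        self.wall = new_wall
        return (self.wall, self.logical)


a, b = LamportClock("A"), LamportClock("B")
a1, a2, a3 = a.tick(), a.tick(), a.tick()
b1 = b.tick()   # concurrent with A's events; its position is arbitrary
b.receive(a3)   # A sends its events to B
b2 = b.tick()   # guaranteed to sort after everything B received

hlc = HybridLogicalClock()
h1 = hlc.now(physical=900)
h2 = hlc.receive((1000, 0), physical=901)  # sender's clock runs ahead
h3 = hlc.now(physical=902)                 # still ordered after h2
```

Note that `b1` sorts before `a3` even though the two were concurrent, which is the “no knowledge of real ordering” caveat in practice.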

Overlap with collaborative software

Effectively, offline-capable and real-time collaborative software are two sides of the same coin. Offline use divides the state temporally, while real-time collaboration divides it spatially. In both cases, CRDTs and decentralized software are about state reconciliation.

Tradeoffs with traditional client-server software

~~infinitely growing log of events (can be addressed by rebasing / pruning)~~ actually, does this only apply to state-based CRDTs? Operation-based CRDTs may only need to transmit updates and can drop what has happened before?
- do you still have this problem from having to keep deleted nodes around, though?
data is more complex and not inherently relational (can be addressed by creating relational materialized views of the data)
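As a rough illustration of the materialized-view idea, the sketch below folds a log of field-level operations into an in-memory SQLite table. The op format, the `users` table, and the last-write-wins rule are all assumptions made for the example; a real system would derive the view from whatever its CRDT actually stores.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical op format: (logical_timestamp, node_id, row_id, column, value)
Op = Tuple[int, str, str, str, str]


def materialize(ops: Iterable[Op]) -> sqlite3.Connection:
    """Build a queryable relational view from an unordered op log."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT, email TEXT)")
    # Apply ops in timestamp order so the last write to each cell wins.
    for _ts, _node, row_id, column, value in sorted(ops):
        if column not in ("name", "email"):
            continue  # skip fields the view does not model
        db.execute("INSERT OR IGNORE INTO users (id) VALUES (?)", (row_id,))
        # The column name is safe to interpolate: it was checked against the whitelist above.
        db.execute(f"UPDATE users SET {column} = ? WHERE id = ?", (value, row_id))
    return db


ops: List[Op] = [
    (2, "B", "u1", "email", "new@example.com"),
    (1, "A", "u1", "email", "old@example.com"),
    (1, "A", "u1", "name", "Ada"),
]
view = materialize(ops)
rows = view.execute("SELECT id, name, email FROM users").fetchall()
```

The view can be thrown away and rebuilt from the log at any time, so ordinary SQL queries work on top of it without the log itself needing to be relational.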

spencer's cafe
