Demystifying JAM

[Image: jam-pen-polkadot.png]

The following is a ground-up explanation of Polkadot 1, Polkadot 2, and how Polkadot will evolve into JAM. It is targeted at a technical audience: specifically, readers who are not very familiar with Polkadot, but who have a good high-level understanding of blockchain-based systems and are possibly familiar with one other ecosystem at a technical level. I believe reading this is a great prelude to reading the JAM graypaper.

Background Knowledge

This article makes use of, and assumes familiarity with, the following concepts:

Prelude: Polkadot 1

First, a recap of what I consider the top novel features of Polkadot 1.

Let's dive further into sharded execution and what we mean by it.

Sharded Execution: All About Cores

For now, we are talking in the context of an L1 network that hosts other L2 "blockchain" networks, much like Polkadot and Ethereum. Therefore, the words L2 and Parachain can be used interchangeably.

The core problem of blockchain scalability can be stated as follows: there exists a set of validators, whose execution of some code can be trusted through the crypto-economics of proof-of-stake. By default, these validators are expected to re-execute the entirety of each other's work. Therefore, the system as a whole is not scalable, so long as we force all validators to (re-)execute everything at all times.

Note that increasing the number of validators in this model doesn't really increase the system's throughput, so long as the above absolute re-execution principle is in place.


The above describes a monolithic (as opposed to sharded) blockchain: inputs (i.e. blocks) are processed by all of the network's validators, one by one.

In such a system, if the L1 wants to host further L2s, all validators now have to re-execute the work of all L2s as well. Obviously, this does not scale. Optimistic rollups are one way to circumvent this issue, in that re-execution (i.e. fraud proofs) happens only when someone claims that fraud has occurred. SNARK-based rollups circumvent it by leveraging the fact that verifying a SNARK proof is significantly cheaper than generating it, making it reasonable for all validators to verify a SNARK proof. More on this in the Scalability Space Map appendix.

A naive solution to sharding is to merely split the validator set into smaller subsets, and have each subset re-execute the blocks of one L2. What is the issue with this approach? We are sharding both the execution and the economic security of the network. The security of such an L2 is lower than that of the L1, and it drops further and further as we carve the validator set into more shards.
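As a back-of-the-envelope illustration of this dilution, consider the following sketch. The parameters, 1000 validators, a third of them malicious, and a shard counted as compromised once malicious validators hold a 2/3 supermajority of its committee, are my own illustrative assumptions, not taken from any protocol:

```rust
/// ln C(n, k), computed incrementally to avoid overflow.
fn ln_choose(n: u64, k: u64) -> f64 {
    if k > n {
        return f64::NEG_INFINITY; // C(n, k) = 0
    }
    (1..=k).map(|i| ((n - k + i) as f64).ln() - (i as f64).ln()).sum()
}

/// P(X >= 2/3 of committee), where X ~ Hypergeometric(n, bad, committee)
/// is the number of malicious validators in a randomly drawn committee.
fn p_compromised(n: u64, bad: u64, committee: u64) -> f64 {
    let threshold = (2 * committee + 2) / 3; // ceil(2/3 * committee)
    (threshold..=committee.min(bad))
        .map(|i| {
            (ln_choose(bad, i) + ln_choose(n - bad, committee - i)
                - ln_choose(n, committee))
            .exp()
        })
        .sum()
}

fn main() {
    let (n, bad) = (1000, 333); // a third of the validators are malicious
    for shards in [1u64, 10, 50, 100] {
        let committee = n / shards;
        println!(
            "{shards:>3} shards (committee of {committee:>4}): P(compromised) ≈ {:.2e}",
            p_compromised(n, bad, committee)
        );
    }
}
```

A single committee of all 1000 validators can never be captured by 333 malicious ones, while a 10-validator committee (100 shards) reaches a malicious supermajority roughly 2% of the time: naively sharding the validator set trades security for throughput.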

Unlike optimistic rollups, which cannot afford re-execution at all times, Polkadot was designed with execution sharding in mind: it can have a subset of its validators re-execute L2 blocks, whilst providing sufficient crypto-economic evidence to all network participants that the L2 block is as secure as if the entire validator set had re-executed it. This is possible through the novel (and recently formally published) ELVES mechanism.

In short, one can see ELVES as a "Cynical Rollup" mechanism. Through a few rounds of validators proactively asking other validators if an L2 block is valid, we can reach an extremely high probability that the L2 block is valid. Indeed, in case of any disputes, very soon the entire validator set is asked to participate. This is explained in detail in an article by Rob Habermeier, Polkadot co-founder.

ELVES is why Polkadot can have two properties previously assumed to be mutually exclusive: "Sharded Execution", with "Shared Security". This is the main technological outcome of Polkadot 1 when it comes to scalability.

Now, moving on to the "Core" analogy.

An execution-sharded blockchain is very much like a CPU: in much the same way that a CPU can have many cores that execute instructions in parallel, Polkadot can progress L2 blocks in parallel. This is why an L2 on Polkadot is called a Parachain[1], and the environment in which the smaller subgroup of validators re-executes a single L2 block is called a "core". Each core can be abstracted as "a group of validators working in coordination".

You can imagine a monolithic blockchain as one that ingests a single block per time-slot, while Polkadot ingests 1 relay-chain block, plus 1 parachain block per core, per time-slot.

Heterogeneous

So far, we have only talked about scalability, and the fact that Polkadot provides sharded execution. It is important to note that each of Polkadot's shards is an entirely different application[2]. This is achieved through the use of a bytecode-stored meta-protocol: a protocol in which the definition of the blockchain is stored as bytecode in the state of the blockchain itself. In Polkadot 1.0, WASM was the bytecode of choice; in JAM, PVM/RISC-V is being adopted.
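As a toy sketch of this idea, the chain's own state-transition logic lives in its state under a well-known key (the `:code` key below matches what Polkadot's implementation actually uses; everything else, including run_in_vm, is illustrative):

```rust
use std::collections::HashMap;

/// A toy chain whose own definition is stored in its state as bytecode.
struct Chain {
    state: HashMap<Vec<u8>, Vec<u8>>,
}

impl Chain {
    fn execute_block(&mut self, block: &[u8]) {
        // Load the current runtime bytecode from the chain's own state...
        let code = self
            .state
            .get(&b":code"[..])
            .cloned()
            .expect("runtime code must exist in state");
        // ...and run the block through it (WASM in Polkadot 1.x, PVM/RISC-V in JAM).
        run_in_vm(&code, block, &mut self.state);
    }

    /// Upgrading the protocol is just writing new bytecode into state,
    /// which is why such chains can upgrade themselves without forks.
    fn upgrade(&mut self, new_code: Vec<u8>) {
        self.state.insert(b":code".to_vec(), new_code);
    }
}

/// Stand-in for instantiating the VM and invoking the runtime's entry
/// points; a hypothetical signature for illustration only.
fn run_in_vm(_code: &[u8], _block: &[u8], _state: &mut HashMap<Vec<u8>, Vec<u8>>) {}
```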

All in all, this is why Polkadot is called a heterogeneous sharded blockchain. Each of the L2s is an entirely different application.

Polkadot 2

A big part of Polkadot 2 is about making cores more flexible to use. In the original Polkadot model, a core could be rented only for 6 months and up to 2 years at a time. This is suitable for resourceful businesses, but less so for small teams. The feature that enables Polkadot cores to be used more flexibly is called "agile coretime". In this model, Polkadot cores can be rented for as little as one block at a time, and for up to a month at a time, with price-cap guarantees for those who want to rent long-term.


Polkadot 2, among other features, is unfolding as we speak, and not much further needs to be said about it here.

Inside Core vs. On the Chain

To understand JAM, it is first useful to look into what happens in a Polkadot core when an L2 block comes in.

The following contains a lot of simplification.

Recall that a core is constituted primarily of a group of validators. So, when we say "data is sent to the core", we mean it is gossiped to this group of validators.

  1. An L2 block, plus a subset of that L2's state, is sent to the core. This is all the data needed to execute the L2 block[3].
  2. A subset of validators, those that constitute the core, re-execute the L2 block and proceed with further consensus-related tasks.
  3. The core validators make the data needed for re-execution available to other validators (outside the core). Based on the ELVES rules, further validators might decide to re-execute this L2 block, and they need this data to do so.

Note that all of the operations so far happen outside Polkadot's main block and state transition function. Everything so far happens inside a core and the data availability layer.

  4. Eventually, a small commitment to the L2's latest state becomes visible in the main Polkadot relay-chain state. Unlike everything else so far, this operation, which is much cheaper than the actual re-execution of the L2 block, affects the main Polkadot state, becomes visible in a Polkadot block, and is executed by all Polkadot validators.
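The following is a highly simplified sketch of these four steps; every type and name in it is illustrative rather than part of the actual protocol:

```rust
use std::collections::HashMap;

struct Submission {
    l2_block: Vec<u8>,     // step 1: the L2 block...
    state_subset: Vec<u8>, // ...and the subset of L2 state needed to execute it
}

struct Commitment {
    l2_state_root: [u8; 32], // step 4: the only thing that lands on-chain
}

/// Steps 1-2, in-core: only the handful of validators constituting this
/// core run this, in parallel with all the other cores.
fn in_core_execute(s: &Submission) -> Commitment {
    // Stand-in for re-executing the L2 block against the state subset.
    let mut root = [0u8; 32];
    for (i, b) in s.l2_block.iter().chain(s.state_subset.iter()).enumerate() {
        root[i % 32] ^= b;
    }
    Commitment { l2_state_root: root }
}

/// Step 3: keep the re-execution data available, so that validators
/// outside the core can audit the block under the ELVES rules.
fn make_available(da_layer: &mut Vec<Submission>, s: Submission) {
    da_layer.push(s);
}

/// Step 4, on-chain: executed by all validators, but only this small,
/// cheap commitment touches the relay-chain state.
fn accumulate(relay_state: &mut HashMap<Vec<u8>, [u8; 32]>, c: Commitment) {
    relay_state.insert(b"l2_latest_state".to_vec(), c.l2_state_root);
}
```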

From the above, let's explore some of the operations that Polkadot is performing:

From steps 1 and 2, we can learn that there exists a new type of execution in Polkadot, one that differs from the normal blockchain state transition function. Typically, all validators of a network execute some work, and as the outcome, the main blockchain state is updated. We call this an on-chain operation; it is what happens in step 4. What happens inside a core (steps 1 and 2), however, differs from this. We call this novel type of blockchain computation in-core execution.

Next, from step 3, we can deduce that Polkadot already provides a native Data Availability (henceforth DA) layer, and L2s automatically use it to keep their execution evidence available for some period of time. Yet, the blob that can be posted to this DA[4] layer is of a fixed format: it is always the evidence needed to re-execute an L2 block. Moreover, a parachain's code never reads from the DA layer.

Understanding the above is the foundation for understanding the majority of what JAM does. To recap, Polkadot already exposes three primitives: in-core execution, on-chain execution, and the DA layer.

JAM

With the understanding gained in the previous section, we can smoothly transition into what JAM is.

JAM is a new protocol, heavily inspired by Polkadot, and fully compatible with it, aiming to replace the Polkadot relay chain and make the usage of cores radically un-opinionated[5].

JAM builds on top of Polkadot 2, in that it tries to make Polkadot cores more accessible, but in ways radically more flexible and un-opinionated than agile-coretime.

This is primarily achieved by exposing the 3 main primitives discussed in the previous section to programmers, namely: on-chain execution, in-core execution, and the DA layer.

In other words, JAM:

  1. Allows both the work that is done in-core and on-chain to be fully programmable.
  2. Allows arbitrary data to be read from and written into the Polkadot DA layer.

This is a foundational description of what JAM aims to be. Needless to say, a lot has been simplified here, and the protocol might still evolve.

With this foundation in mind, we can now dive further into a few details of JAM in the coming sections.

Service and Work Items

This generality is why, in the context of JAM, what used to be called an L2/Parachain is now called a Service, and what used to be called a block/transaction is now called a Work-Item or Work-Package. Concretely, Work-Items belong to a Service, and a Work-Package is a group of Work-Items. Both terms are deliberately chosen to be generic enough to encapsulate use-cases beyond a blockchain/L2.

A service is described by 3 entry points, two of which are fn refine() and fn accumulate()[6]. The former describes what the service does in-core, and the latter describes what it does on-chain.
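A minimal sketch of what these entry points could look like, including the fn on_message() mentioned in footnote 6. The names follow the article; the signatures are made up for illustration and are not the actual JAM ABI:

```rust
/// Illustrative identifier for a service.
type ServiceId = u32;

/// A sketch of a JAM service's entry points.
trait Service {
    /// In-core: processes a work-item on some core, in parallel with all
    /// other cores, distilling it into a smaller result.
    fn refine(&self, work_item: Vec<u8>) -> Vec<u8>;

    /// On-chain: executed by all validators, folding refined results
    /// into the service's portion of the JAM state.
    fn accumulate(&mut self, refined: Vec<Vec<u8>>);

    /// Called when a message from another service arrives.
    fn on_message(&mut self, from: ServiceId, message: Vec<u8>);
}
```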

Finally, the names of these entry points are why the protocol is called JAM: the Join-Accumulate Machine. Join refers to fn refine(): all Polkadot cores do a lot of work in parallel for different services, and the data is distilled into a smaller subset, which is then passed on to the next stage. Accumulate is when the results of all the above are accumulated into the main JAM state. This is the on-chain execution part.

Work-items can specify exactly what code they execute in-core and on-chain, and whether, how, and what they read and write to/from the Distributed Data Lake.

Semi Coherence

Recall from existing material around XCM, Polkadot's language of choice for parachain communication, that all such communication is asynchronous. That is, a message is sent, and its reply cannot be waited upon.

Asynchrony is the manifestation of an incoherent system, and is a major drawback of systems that are permanently sharded, such as Polkadot 1 and Polkadot 2 and the existing L2 landscape in Ethereum.

Yet, as described in the graypaper section 2.4, a fully coherent system that is always synchronous for all of its tenants can also only grow so much without compromising generality, accessibility or resilience.

Synchronous ~ Coherent || Asynchronous ~ Incoherent

This is another area where JAM stands out: through the introduction of multiple properties, JAM achieves a novel middle ground: a semi-coherent system, one in which sub-systems that communicate often have a chance at creating a coherent environment with one another, without forcing the entire system to be coherent. This is best described in this interview with Dr. Gavin Wood, the author of the graypaper:

Another way to look at this is to see Polkadot/JAM as a sharded system where the boundaries of those shards are fluid, and determined on the fly.

Polkadot has always been sharded, and fully heterogenous.

Now, it will be sharded, heterogenous, AND the boundaries of those shards can be determined flexibly, what @gavofyork is referring to as semi-coherent system in https://t.co/tjAboJL9IA

— Kian Paimani (@kianenigma) May 15, 2024

The properties that enable this are:

  1. Access to a stateless, parallel in-core execution, in which different services can interact synchronously only with other services that reside on the same core in that particular block, as well as on-chain execution, in which a service has access to the outcome of all services across all cores.
  2. JAM does not enforce any particular scheduling of services. Services that talk to each other frequently can create an economic incentive for their sequencers to build Work-Packages containing the Work-Items of services that communicate often. This enables them to reside on the same core and, in practice, talk to each other as if they were in a synchronous environment (see the sketch after this list).
  3. Moreover, JAM services have access to the DA layer and can use it as an ephemeral, yet extremely cheap, data layer. Once data is placed in the DA layer, it is eventually propagated to all cores, but it is guaranteed to be available in the same core immediately. Therefore, JAM services that schedule themselves onto the same core in consecutive blocks[7] can enjoy a much higher degree of access to data.
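As a sketch of the incentive described in point 2 (all types illustrative): a sequencer packs the work-items of two frequently-communicating services into a single work-package, so that they land on the same core together:

```rust
type ServiceId = u32;

struct WorkItem {
    service: ServiceId,
    payload: Vec<u8>,
}

/// A group of work-items, destined to be executed together on one core.
struct WorkPackage {
    items: Vec<WorkItem>,
}

/// Naive packing rule: co-schedule two chatty services so that their
/// work-items share a core and can interact as if synchronously.
fn build_package(pending: Vec<WorkItem>, chatty: (ServiceId, ServiceId)) -> WorkPackage {
    let items = pending
        .into_iter()
        .filter(|w| w.service == chatty.0 || w.service == chatty.1)
        .collect();
    WorkPackage { items }
}
```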

It is important to note that while the above is possible in JAM, it is not enforced at the protocol layer. Therefore, certain interfaces are expected to be asynchronous in theory, yet able to act synchronously in practice through elegant abstractions and incentives. One such example is CorePlay, discussed in the next section.

CorePlay

This section describes CorePlay, an experimental idea in the context of JAM, which can be described as a new model for programming smart contracts. At the time of writing, CorePlay is un-specified[8] and remains an idea.

To understand CorePlay, we first need to introduce JAM's virtual machine of choice: the PVM.

PVM

An important detail of JAM and CorePlay is the Polkadot Virtual Machine, PVM for short. The low-level details of the PVM are beyond the scope of this article and are best described in the graypaper by the domain experts. For the sake of this article, we only need to highlight a few properties of the PVM:

The latter, the ability of PVM execution to be paused and resumed, is particularly important for CorePlay.

CorePlay is one example of using JAM's flexible primitives to create a synchronous and scalable smart-contract environment with a very flexible programming interface. CorePlay suggests deploying actor-based smart contracts directly on JAM cores, enabling them to enjoy a synchronous programming interface: they can be coded as an ordinary fn main(), in which they communicate using let _result = other_coreplay_actor(data).await?. If other_coreplay_actor is on the same core in that JAM block, this call is synchronous; if it is on another core, the actor is paused, to be resumed in a later JAM block. This is possible precisely because of JAM services, their flexible scheduling, and the PVM's properties.
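A sketch of such an actor in Rust; only the .await pattern above comes from the CorePlay idea itself, while all names here, and the routing behavior described in the comments, are hypothetical:

```rust
/// Illustrative error type.
#[derive(Debug)]
struct CallError;

/// Stand-in for a call into another CorePlay actor. A real CorePlay
/// runtime would route this call: if the callee is scheduled on the same
/// core in this JAM block, it resolves synchronously; otherwise the PVM
/// pauses this actor (a continuation) and resumes it in a later JAM
/// block, once the reply is available.
async fn other_coreplay_actor(data: Vec<u8>) -> Result<Vec<u8>, CallError> {
    Ok(data)
}

/// An actor written as plain, sequential code, as described above.
async fn my_actor(data: Vec<u8>) -> Result<(), CallError> {
    let _result = other_coreplay_actor(data).await?;
    // ...carry on as if the call had been synchronous all along.
    Ok(())
}
```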

CoreChains Service

Finally, let's wrap up by reminding readers of the main reason why we said JAM is fully compatible with Polkadot: Polkadot's main product is parachains in the agile-coretime fashion, and this product persists in JAM.

The first service that will be deployed in JAM is likely to be one that is called something along the lines of CoreChains or Parachains. This is the service that will allow existing Polkadot-2-style parachains to be executed on JAM.

Further services can be deployed on JAM, and the existing CoreChains service will be able to communicate with them, but Polkadot's existing product offering will remain strong, and only new doors will be opened for existing parachain teams.


Appendix: Data Sharding

The majority of this article covered scalability from the perspective of execution sharding. We can also look at the same problem from the perspective of data. Interestingly, we find a situation similar to the one described in Semi Coherence: a fully coherent system is better in principle but does not scale, a fully incoherent system scales but is not desirable, and JAM, with its semi-coherent model, offers a new possibility.

Fully Coherent System: This is what we see in a fully synchronous smart-contract platform such as Solana, or in those brave enough to deploy only on Ethereum L1. All application data is stored on-chain and is easily accessible to all other applications: a perfect property for programmability, but not scalable.

Incoherent System: Application data is kept outside the L1, in different, isolated shards. Extremely scalable, but not great for composability. This is the model of Polkadot and of the Ethereum rollup landscape.

JAM, besides providing both of the above, also allows programmers to post arbitrary data into the JAM DA layer, which is, in some sense, a middle ground between on-chain and off-chain data. A novel category of applications can be written that leverages the DA layer for the majority of the application data, while persisting only what is absolutely crucial into the JAM state.
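A sketch of this pattern, with toy stand-ins for the host-provided DA layer and JAM state:

```rust
use std::collections::HashMap;

/// Keep the bulk of the application data in the DA layer, and persist
/// only a small commitment to it in the JAM state.
fn save_app_data(
    da_layer: &mut HashMap<[u8; 32], Vec<u8>>, // ephemeral, extremely cheap
    jam_state: &mut HashMap<&'static str, [u8; 32]>, // persistent, on-chain
    app_data: Vec<u8>,
) {
    let commitment = toy_hash(&app_data);
    da_layer.insert(commitment, app_data);
    jam_state.insert("app_data_root", commitment);
}

/// Stand-in for a real cryptographic hash.
fn toy_hash(data: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, b) in data.iter().enumerate() {
        out[i % 32] ^= b;
    }
    out
}
```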

Appendix: Scalability Space Map

This part re-explains our view of the blockchain scalability landscape. It is also covered in the graypaper; this is a more concise version of that.

Scalability in blockchains largely follows the approaches used in traditional distributed systems: scaling up (vertical) and scaling out (horizontal).

Scaling up is what the likes of Solana are doing: hyper-optimizing both the code and the hardware to achieve maximal throughput.

Scaling out is what Ethereum and Polkadot are doing: reducing the amount of work that has to be done by everyone. In a traditional distributed system, this is done by adding more replicated machines. In a blockchain, the "computation machine" is the entire validator set of the network. By splitting work between members of this set (as ELVES does), or by optimistically discounting their duties (as optimistic rollups do), we reduce the workload on the validator set as a whole, and therefore scale the system out.

Scaling out in blockchains is analogous to "reducing the number of machines that have to execute everything".

To summarize:

  1. Scaling up: Beefy hardware + optimization in a monolithic blockchain.
  2. Scaling out:
    1. Optimistic rollups
    2. SNARK-based rollups
    3. ELVES: Polkadot's cynical rollups

Appendix: Same Hardware, Kernel Update

This section builds on an analogy provided by Rob Habermeier in Sub0 2023 (Polkadot: Kernel/Userland | Sub0 2023 - YouTube), demonstrating what JAM is as an upgrade to Polkadot: a kernel update on top of the same hardware.

In a typical computer, we can divide the entire stack into 3 segments:

  1. Hardware
  2. Kernel
  3. Userland

In Polkadot, the hardware, i.e. the essence of what provides computation and data availability, has always been the cores, as described above.

The kernel in Polkadot has, in practice[9], so far consisted of two things:

  1. The parachains protocol: an opinionated, rigid way of using cores.
  2. A set of low-level functionalities, such as the DOT token and its transferability, staking, governance, and the like.

Both of these reside in the Polkadot relay chain today.

Finally, the userland applications are instances of parachains, their native tokens, and whatever else is built on top of them.

We can visualize this as follows:

Polkadot always envisioned moving more of its core functionality to its first-class users: parachains. This is what the Minimal Relay RFC aims to achieve.

This implies that the Polkadot relay chain deals only with providing the parachains protocol, somewhat shrinking the kernel space.

Once this architecture is achieved, it is easier to visualize what the JAM migration will look like: JAM will drastically shrink Polkadot's kernel space and make it more general-purpose. The parachains protocol will move to user space, becoming merely one of the many ways that applications can be written on top of the same cores (hardware) and kernel (JAM).

This should also, one last time, illustrate why JAM is a replacement only for the Polkadot relay chain, and not the parachains.

In other words, we can see the JAM migration as a kernel upgrade. The underlying hardware remains the same, and a large chunk of the old kernel is moved to the userland for simplicity.

Resources


  1. Parallel Chain. ↩︎

  2. That is, a different blockchain, or state transition function. ↩︎

  3. In Polkadot 1.0, the combination of the parachain block and the subset of state needed to execute it is called the PoV (Proof of Validity), and the parachain's code that validators run against it is called the PVF (Parachain Validation Function). ↩︎

  4. Note that the graypaper calls this DA layer the Distributed Data Lake, DDL for short. For the sake of reducing the number of new keywords in this article, we continue to refer to it as the DA, or DA layer. ↩︎

  5. It is crucial to point out that JAM is only meant to replace the Polkadot relay chain. Parachains, and all applications that run on top of Polkadot, remain intact, mainly thanks to the CoreChains service. ↩︎

  6. The third is an fn on_message(), which is called when a message from another service arrives. ↩︎

  7. For more information about the scheduling of services, see the "Authorization" section in the graypaper. ↩︎

  8. The best written resource on CorePlay is this draft RFC. ↩︎

  9. Emphasis on "in practice": even in the aforementioned talk, Rob describes the parachains protocol as a userland application of Polkadot. But this is a theoretical framing, and in reality the parachains protocol has been well baked into the core Polkadot protocol, i.e. the kernel itself. ↩︎