
Optimizing DAG Consensus: How the Shoal Framework Drops Bullshark Latency on Aptos

Aptos Labs recently solved two important open problems in DAG BFT, significantly reducing latency and, for the first time, eliminating the need for timeouts in deterministic asynchronous protocols. Overall, the Shoal framework improves Bullshark's latency by 40% in the fault-free case and by 80% under faults.

Shoal is a framework that enhances any Narwhal-based consensus protocol (e.g., DAG-Rider, Tusk, and Bullshark) with pipelining and leader reputation. Pipelining reduces DAG ordering latency by introducing an anchor in every round, while leader reputation further improves latency by ensuring that anchors are associated with the fastest validators. Moreover, leader reputation allows Shoal to leverage asynchronous DAG construction to eliminate timeouts in all scenarios. This lets Shoal provide universal responsiveness, which subsumes the usually sought-after optimistic responsiveness.

The technique behind Shoal is relatively simple: it runs multiple instances of the underlying protocol one after another. When instantiated with Bullshark, we get a group of "sharks" running a relay race.


Motivation

In the pursuit of high performance in blockchain networks, the focus has long been on reducing communication complexity. However, this approach has not led to significant gains in throughput. For example, the Hotstuff implementation in early versions of Diem achieved only 3,500 TPS, far below the target of 100k+ TPS.

Recent breakthroughs stem from the realization that data propagation is the main bottleneck of leader-based protocols, and that it can benefit from parallelization. The Narwhal system separates data propagation from the core consensus logic and proposes an architecture in which all validators propagate data simultaneously, while the consensus component orders only a small amount of metadata. The Narwhal paper reports a throughput of 160,000 TPS.

In a previous article, we introduced Quorum Store, our Narwhal implementation that separates data propagation from consensus, and described how to use it to scale the current consensus protocol, Jolteon. Jolteon is a leader-based protocol that combines Tendermint's linear fast path with PBFT-style view changes, reducing Hotstuff's latency by 33%. However, leader-based consensus protocols clearly cannot fully exploit Narwhal's throughput potential: even with data propagation separated from consensus, the Hotstuff/Jolteon leader remains a bottleneck as throughput grows.

Therefore, we decided to deploy Bullshark, a zero-communication-overhead consensus protocol, on top of the Narwhal DAG. Unfortunately, compared to Jolteon, the DAG structure that supports Bullshark's high throughput incurs a 50% latency cost.

This article explains how Shoal significantly reduces Bullshark's latency.

DAG-BFT Background

Each vertex in the Narwhal DAG is associated with a round number. To enter round r, a validator must first obtain n-f vertices belonging to round r-1. Each validator can broadcast one vertex per round, and each vertex must reference at least n-f vertices from the previous round. Due to network asynchrony, different validators may observe different local views of the DAG at any point in time.

A key property of the DAG is non-equivocation: if two validators have the same vertex v in their local views of the DAG, then they have exactly the same causal history of v.
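To make the round structure and causal histories concrete, here is a minimal Python sketch; all names are illustrative and not taken from the Aptos codebase:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vertex:
    author: str          # validator that broadcast this vertex
    round: int           # DAG round this vertex belongs to
    parents: frozenset   # references to at least n - f vertices of the previous round

class LocalDag:
    """One validator's local view of the DAG (views may differ across validators)."""

    def __init__(self, n: int, f: int):
        self.n, self.f = n, f
        self.by_round: dict[int, set] = {}

    def add(self, v: Vertex) -> None:
        self.by_round.setdefault(v.round, set()).add(v)

    def can_enter_round(self, r: int) -> bool:
        # A validator broadcasts its round-r vertex only after obtaining
        # n - f vertices belonging to round r - 1.
        return r == 1 or len(self.by_round.get(r - 1, ())) >= self.n - self.f

    def causal_history(self, v: Vertex) -> set:
        # All vertices reachable from v via parent references. By non-equivocation,
        # any two validators holding v compute exactly the same set.
        seen, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                stack.extend(u.parents)
        return seen
```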


Total Order

A total order over all vertices in the DAG can be agreed upon with no additional communication overhead. To achieve this, validators in DAG-Rider, Tusk, and Bullshark interpret the structure of the DAG as a consensus protocol, where a vertex represents a proposal and an edge represents a vote.

Although their quorum-intersection logic over the DAG structure differs, all existing Narwhal-based consensus protocols share the following structure:

  1. Predefined anchors: every few rounds there is a pre-determined leader, whose vertex is called the anchor.

  2. Ordering anchors: validators independently but deterministically decide which anchors to order and which to skip.

  3. Ordering causal histories: validators process their ordered anchor list one by one and, for each anchor, order all previously unordered vertices in its causal history (see the sketch after this list).
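A minimal sketch of this shared structure, reusing the `LocalDag` sketch above; `should_order` is a hypothetical placeholder for each protocol's quorum-intersection rule in step (2):

```python
def order_dag(dag, anchors, already_ordered: set) -> list:
    """Steps (1)-(3): walk the predefined anchors, decide order-vs-skip for each,
    and order the not-yet-ordered causal history of every ordered anchor."""
    output = []
    for anchor in anchors:                     # (1) anchors are predetermined
        if not should_order(dag, anchor):      # (2) independent but deterministic
            continue                           #     decision; this anchor is skipped
        history = dag.causal_history(anchor)   # (3) order the anchor's causal history
        fresh = history - already_ordered
        # Any deterministic tie-break works, e.g. by (round, author).
        output.extend(sorted(fresh, key=lambda v: (v.round, v.author)))
        already_ordered |= history
    return output
```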

The key to safety is ensuring that in step (2), all honest validators create ordered anchor lists that share the same prefix. In Shoal, we make the following observation about all of these protocols:

All validators agree on the first ordered anchor point.

Bullshark latency

Bullshark's latency depends on the number of rounds between ordered anchors in the DAG. Although the partially synchronous version of Bullshark is more practical and has better latency than the asynchronous version, it is still far from optimal.

Problem 1: average block latency. In Bullshark, every even round has an anchor, and every odd-round vertex is interpreted as a vote. In the common case it takes two DAG rounds to order an anchor; however, the vertices in an anchor's causal history need more rounds while waiting for an anchor to be ordered. In general, vertices in odd rounds need three rounds, and non-anchor vertices in even rounds need four.
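As a rough illustration of this arithmetic, here is a sketch counting common-case DAG rounds from a vertex's creation to its ordering (a simplification that ignores skipped anchors and network variance):

```python
def bullshark_ordering_rounds(vertex_round: int, is_anchor: bool) -> int:
    """Common-case number of DAG rounds until a vertex is ordered in vanilla
    Bullshark, where anchors live in even rounds and odd rounds carry votes."""
    if is_anchor:
        return 2                  # anchor + one voting round to order it
    if vertex_round % 2 == 1:
        return 3                  # odd round: next anchor is 1 round away, +2 to order it
    return 4                      # even non-anchor: next anchor is 2 rounds away, +2

assert bullshark_ordering_rounds(4, is_anchor=True) == 2
assert bullshark_ordering_rounds(3, is_anchor=False) == 3
assert bullshark_ordering_rounds(4, is_anchor=False) == 4
```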

Problem 2: latency under failures. If a round's leader fails to broadcast its anchor fast enough, the anchor cannot be ordered (and is therefore skipped), and all unordered vertices from earlier rounds must wait for the next anchor to be ordered. This significantly hurts performance in geo-replicated deployments, especially since Bullshark uses timeouts to wait for the leader.


Shoal Framework

Shoal enhances Bullshark (or any other Narwhal-based BFT protocol) with a pipeline, allowing an anchor in every round and reducing the latency of all non-anchor vertices in the DAG to three rounds. Shoal also introduces a zero-overhead leader-reputation mechanism into the DAG that biases selection toward fast leaders.

Challenge

In the context of DAG protocols, pipelining and leader reputation were considered hard problems, for the following reasons:

  1. Previous pipelining attempts tried to modify Bullshark's core logic, which seemed essentially impossible.

  2. Leader reputation, introduced in DiemBFT and formalized in Carousel, dynamically selects future leaders (the anchors, in Bullshark) based on validators' past performance. While disagreement on leader identity does not violate safety in those protocols, in Bullshark it could lead to completely different orderings. This exposes the core issue: dynamically and deterministically choosing round anchors is necessary to solve consensus, yet validators need to agree on an ordered history in order to choose future anchors.

As evidence of the problem's difficulty, we note that Bullshark implementations, including the one currently in production, do not support these features.

Protocol

Despite the challenges above, the solution turns out to lie in simplicity.

In Shoal, we rely on the ability to perform local computation over the DAG, which makes it possible to save and reinterpret information from earlier rounds. Building on the core insight that all validators agree on the first ordered anchor, Shoal sequentially composes multiple Bullshark instances for pipelining, so that (i) the first ordered anchor becomes the switching point between instances, and (ii) the anchor's causal history is used to compute leader reputation.

![Detailed explanation of the Shoal framework: how to reduce Bullshark latency on Aptos](https://img-cdn.gateio.im/webp-social/moments-46d37add0d9e81b2f295edf8eddd907f.webp)

Pipeline

Bullshark selects anchors via a predefined mapping F that maps rounds to leaders. Shoal runs Bullshark instances one after another, such that for each instance the anchors are determined in advance by F. Each instance orders one anchor, which triggers the switch to the next instance.

Initially, Shoal launches the first instance of Bullshark in the first round of the DAG and runs it until the first ordered anchor is determined, say in round r. All validators agree on this anchor, so all of them can confidently reinterpret the DAG starting from round r+1. Shoal simply launches a new Bullshark instance at round r+1.

In the best case, this allows Shoal to order one anchor per round: the anchor of the first round is ordered by the first instance; Shoal then starts a new instance in round two, which orders its own anchor; a further instance then orders the anchor in round three, and the process continues, as sketched below.
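A sketch of this relay. The helpers `has_rounds_from` and `run_until_first_ordered_anchor` are hypothetical stand-ins for the DAG interface and for a single Bullshark instance; the latter is assumed to return the round of the instance's first ordered anchor together with the ordered prefix it produces:

```python
def shoal_pipeline(dag, leader_map, start_round: int = 1) -> list:
    """Run Bullshark instances back to back: each instance runs until its first
    anchor is ordered (say, in round r); a fresh instance is then launched at
    round r + 1. In the best case, one anchor is ordered in every round."""
    ordered = []
    r = start_round
    while dag.has_rounds_from(r):  # hypothetical: more DAG rounds remain to interpret
        # The instance interprets the DAG from round r onward, locating its
        # predefined anchors via the mapping F (leader_map).
        anchor_round, prefix = run_until_first_ordered_anchor(dag, leader_map, r)
        ordered.extend(prefix)     # the anchor plus its unordered causal history
        r = anchor_round + 1       # all validators agree on this switching point
    return ordered
```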

![Detailed explanation of the Shoal framework: how to reduce Bullshark latency on Aptos](https://img-cdn.gateio.im/webp-social/moments-0b0928cb6240e994c1514c75e080a4b2.webp)

Reputation of Leaders

During Bullshark's ordering, skipped anchors increase latency, and pipelining alone is powerless here: a new instance cannot be launched before the previous instance orders an anchor. To handle missed anchors, Shoal uses a reputation mechanism that assigns each validator a score based on its recent activity, ensuring that the corresponding leaders are less likely to be selected in the future. Validators that are responsive and participate in the protocol receive high scores; otherwise, a validator is assigned a low score, since it may be crashed, slow, or malicious.

The idea is to deterministically recompute the predefined round-to-leader mapping F at every score update, biasing it toward higher-scoring leaders. For validators to agree on the new mapping, they must agree on the scores, and therefore on the history from which the scores are derived.

In Shoal, pipelining and leader reputation combine naturally, since both use the same core technique: reinterpreting the DAG after reaching agreement on the first ordered anchor.

In fact, the only difference is that after ordering the anchor of round r, validators compute a new mapping F', effective from round r+1, based on the causal history of that ordered anchor. They then execute a new Bullshark instance with the updated anchor-selection function F' starting from round r+1.
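A sketch of the reputation update. The scoring rule below (counting each validator's vertices in the agreed causal history) is an invented stand-in for whatever activity metric is actually used; the point is only that F' is derived deterministically from data all validators already agree on:

```python
import hashlib

def recompute_leader_map(validators, anchor_history, from_round: int):
    """Derive the updated mapping F' from the causal history of the anchor
    ordered in round r. All validators agree on that history, so they all
    deterministically compute the same F' for rounds >= from_round."""
    # Stand-in score: number of vertices each validator contributed recently.
    scores = {v: sum(1 for u in anchor_history if u.author == v) for v in validators}
    eligible = sorted(v for v in validators if scores[v] > 0) or sorted(validators)

    def leader_of(r: int) -> str:
        # Deterministic pseudo-random choice among well-scoring validators.
        h = int.from_bytes(hashlib.sha256(str(r).encode()).digest()[:8], "big")
        return eligible[h % len(eligible)]

    return leader_of  # F'
```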

![Detailed explanation of the Shoal framework: how to reduce Bullshark latency on Aptos](https://img-cdn.gateio.im/webp-social/moments-859e732e16c3eee0e2c93422474debc2.webp)

No more timeouts

Timeouts play a crucial role in all leader-based deterministic partially synchronous BFT implementations. However, the complexity they introduce multiplies the internal states that must be managed and observed, which complicates debugging and demands more observability techniques.

Timeouts can also significantly increase latency, because configuring them properly matters greatly and they often need dynamic adjustment, being highly dependent on the environment (the network). Before moving to the next leader, the protocol pays the full timeout latency penalty for a faulty leader. Timeouts therefore cannot be set too conservatively; yet if they are too short, the protocol may skip over good leaders. For example, we observed that under high load, leaders in Jolteon/Hotstuff were overwhelmed, and timeouts expired before they could drive progress.

Unfortunately, leader-based protocols such as Hotstuff and Jolteon inherently require timeouts to ensure the protocol makes progress whenever a leader fails. Without timeouts, even a crashed leader could halt the protocol forever. And since a faulty leader cannot be distinguished from a slow one during asynchronous periods, timeouts may cause validators to view-change through all leaders without any consensus progress.

In Bullshark, timeouts are used in the DAG construction to ensure that honest leaders add their anchors fast enough during periods of synchrony.
