The Zilliqa Design Story Piece by Piece: Part 1 (Network Sharding)

Note: ➤ We’re now on Slack! Join our community, ask us questions, and get updated on the latest (and hopefully the greatest!)

As introduced in our previous introductory article, Zilliqa is a new blockchain platform capable of processing thousands of transactions per second. Zilliqa hence has the potential to rival traditional payment methods (such as VISA and MasterCard). Even more importantly, Zilliqa’s transaction throughput increases (roughly) linearly with its network size.

This article starts a series in which we break down Zilliqa’s design piece by piece. In this article, we present the core idea of sharding that makes Zilliqa scale. Sharding in Zilliqa takes many forms: network sharding, transaction sharding, and computational sharding. The most important being network sharding as the other sharding mechanisms are built atop the network sharding layer.

So, what is network sharding anyway?

Well, network sharding (which we will refer to in this article simply as sharding unless it is unclear from the context) is a mechanism that allows the Zilliqa network to be divided into smaller groups of nodes each referred to as a shard. Simply put, imagine a network of 1,000 nodes, then, one may divide the network into 10 shards each composed of 100 nodes.

Network sharding is the secret sauce that makes Zilliqa truly scalable. Imagine our example network of 1,000 nodes. Zilliqa would automatically divide the network into 10 shards each with 100 nodes. Now, these shards can process transactions in parallel. If each shard is capable of processing 10 transactions per second, then all shards together can process 100 transactions per second. The ability to process transactions in parallel due to the sharded architecture ensures that the throughput in Zilliqa linearly increases with the size of the network.

The idea of sharding is certainly not new, and in fact it can be traced back to the field of databases, where it is employed to improve performance, scalability and I/O bandwidth. The idea of sharding in the context of blockchains however was first put forth in an academic paper co-authored by Zilliqa team members in 2015.

Eh! Is that it? Sounds so simple!

While the key idea is simple, making it work in practice is not quite so. To understand the underlying challenges, let us first start with some of the key problems that we would like to solve to make sharding a reality:

  1. Sybil resistance. Since Zilliqa is a public blockchain platform, it is open to any person with a working computer. On one hand, for any public blockchain such as Zilliqa to work, it must have a sufficient number of nodes. But, on the other hand, by having a public blockchain, one also opens up to malicious nodes. By definition, malicious nodes will try to subvert the system by spanning multiple nodes and influence any majority-based decision making process. This is what is often referred to as sybil attacks.
  2. Shard creation. Let us assume for a while that the problem of sybil resistance is somehow solved. So, we have say 1,000 nodes in the network and now we wish to create 10 shards each having 100 nodes. How should we do it and who decides which node gets assigned to which shard? It certainly cannot be a node or a set of nodes controlled by say Zilliqa team members. This is because if the Zilliqa team ever became malicious (only hypothetically), then they could cluster all malicious nodes in a single shard and compromise the security of Zilliqa. By the way, in case you do not know who Zilliqa team members are, take a break and see the team composition. We will be waiting for you right here.
  3. [Actually right here!] Shard size. The question that we ask here is simple: Can we have a tiny shard, say composed of 10 nodes? Well the answer is of course no! Things cannot be that simple, can they? In fact, if the shard size is small, then it becomes easier to be taken control of by attackers. Moreover, since these shards run the rest of the Zilliqa protocol (such as consensus), we certainly do not want to have a tiny shard (potentially composed of only malicious nodes) making decisions on which transaction can be accepted or rejected.

How does Zilliqa solve these issues?

Below, we present the approach that Zilliqa takes to solve the above issues.

  1. Keeping sybils in check. There are several possible ways to make sybil attacks costly or difficult to mount. For instance, by asking nodes to deposit a considerably large sum of money (or tokens) as a collateral. Or by asking them to perform some computationally intensive task also known as proof-of-work (PoW). Zilliqa uses PoW. Every new node who wishes to join the Zilliqa network has to first perform a PoW. Existing nodes in the network validate the PoW and authorize the node to join the network. PoW serves as an entry ticket to the network. Only nodes holding valid tickets can join the network. PoW makes it hard for any real-world entity to span multiple nodes.
  2. Automated shard creation. The use of PoW in Zilliqa automatically provides a way to create shards. Zilliqa first elects a dedicated set of nodes called the directory service committee (or DS committee). The election is based on PoW. At regular intervals aka a DS-epoch, one of the members of the DS committee is pushed out and a new member is added (using a first-in-first-out policy). Hence, at any time the size of the DS committee is fixed.The new node that gets into the committee is the one that solves the PoW the fastest. Once the DS committee is elected, it initiates the sharding process. All the other nodes in the network now perform another PoW. This PoW is then validated by the DS committee. Depending on the PoW submission and some randomness, each node gets assigned to a specific shard. The last few bits of a PoW submission decide which shard the node will be assigned to.
  3. Choosing the right size. Choosing the right shard size is critical to the security of the system. Consider our example network of size 1000 of which 1/4 (i.e., 250) are malicious nodes and the remaining 3/4 are honest. Now, let us create a shard of different sizes and check the fraction of malicious nodes in it. Shard creation using PoW is equivalent to sampling a subset of nodes uniformly at random. This is because PoW submissions are generated using a hash function. The figure below shows the probability that at least 1/3 of the shard members are malicious as a function of shard size. Note that with our example shard size of 100, this probability is around 0.04. As we want to have an honest super-majority (later for consensus), a shard size of 100 is clearly insecure. The good news is that the probability decreases with the increase in shard size. Starting from 600 nodes, the probability drops to 1 in a million. For this reason, Zilliqa always considers a minimum shard size of 600.
Probability that shard (of a given size) has at least 1/3 malicious nodes. Since we want to limit the number of malicious nodes, the smaller the probability is, the more secure Zilliqa will be. For shard size of 100, the probability is considerably high. In Zilliqa, we chose a shard size of 600 which yields a probability of 1 in a million.

If you are curious to know how we derive the probability, take a look at the sharding paper.

What does sharding bring to the table?

As discussed at the beginning of the article, network sharding opens up avenues for parallel transaction processing — each shard should now be able to independently process transactions and hence yield high throughput. In fact, whenever a transaction reaches the network, it gets assigned to a specific shard. The assignment is determined by the first few bits of the sending address of the transaction. This is called transaction sharding.

Sharding also allows to perform computations or run smart contracts in a very efficient manner. For instance, a subset of shards can act as mappers, the others as reducers and perform a map-reduce task very efficiently. This is also referred to as computational sharding.

Closing remarks

A shard can however still have malicious nodes in it. Recall that any malicious node (just as an honest node) can do PoW and join the network. It is crucial that despite the presence of these malicious nodes (in limited numbers), the shard must be able to agree on a new set of transactions and propose the next block.

How Zilliqa builds a secure and efficient consensus protocol with this sharded architecture is the topic of the next piece of this series.

Here’s how you can follow our progress — we would love to have you join our community of technology, financial services, and crypto enthusiasts!

➤ Follow us on Twitter,

➤ Subscribe to our Newsletter,

➤ Subscribe to our Blog,

➤ Ask us questions on Slack