Zilliqa Technical Blog 22 October 2019

Latest upgrade speeds up smart contract transactions; recorded execution times have been reduced by up to 60 percent!

Greetings everyone,

Apart from crossing the million-transaction mark around a week ago, the Zilliqa mainnet has also seen an uptick in smart contract processing over the past month. While the network has mostly been resilient enough to manage the load, there were two occasions (specifically, on October 2nd and 18th) when we experienced some downtime because the nodes were unable to handle the traffic. Fortunately, we used the opportunity during the more recent network stall to upgrade the mainnet to the new version 5.1.0. This upgrade comes with several improvements that should help avoid similar incidents in the future. We explain in detail below:

Core Tech

The cause of both network stalls was essentially a mismatch in the number of smart contract transactions processed among the Directory Service nodes. This happened particularly in transaction epochs where smart contracts occupied the bulk of the processing time in each node. The mismatch manifests itself as a gas consumption difference, which becomes noticeable when smart contract transactions are involved.

  • In the 2nd October network stall, the mismatch occurred when the consensus leader processed fewer transactions than the rest of the committee. This scenario would normally be corrected by a view change, where the nodes elect a new leader among themselves. However, we discovered a flaw in our state reversal logic that was triggered only in this exact scenario. Without proper state reversal, the nodes were never restored to the exact state they were in before consensus, so each view change failed in turn and view changes kept occurring indefinitely. This ultimately led to the observed stall.
  • In the 18th October network stall, the mismatch occurred when the consensus leader and most of the committee processed more transactions than a relatively small number of nodes. In this case, a majority of the committee was still performing up to par, so smart contracts didn’t influence the result of the consensus. However, the nodes that failed to finish processing the same list of transactions ended up rejecting the mined block and, as a result, fell out of sync with the rest of the committee. Such nodes are effectively inactive for the rest of the DS epoch and can only rejoin after the next round of proof-of-work. This situation happened twice in the same DS epoch, and by the second time there were enough slower nodes to leave the committee short of the minimum number of nodes required to reach consensus. At that point, with a significant portion of the DS nodes no longer active participants in the network, a view change was insufficient to correct the situation, and a stall was inevitable.
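To make the arithmetic behind the 18th October stall concrete, here is a minimal Python sketch. The committee size, the dropout counts, and the two-thirds-plus-one threshold in the hypothetical `min_for_consensus` helper are illustrative assumptions, not values taken from the mainnet or from the Zilliqa codebase.

```python
from typing import List, Tuple


def min_for_consensus(committee_size: int) -> int:
    """Assumed threshold: strictly more than two thirds of the committee."""
    return (2 * committee_size) // 3 + 1


def simulate_dropouts(
    committee_size: int, dropouts_per_round: List[int]
) -> List[Tuple[int, bool]]:
    """Track the active node count after each round of block rejections.

    Nodes that fall out of sync stay inactive for the rest of the DS epoch,
    so the active count can only shrink while the threshold stays fixed.
    """
    required = min_for_consensus(committee_size)
    active = committee_size
    history = []
    for dropped in dropouts_per_round:
        active -= dropped
        history.append((active, active >= required))
    return history


# Made-up numbers: a 600-node committee losing 120 and then 100 slow nodes.
history = simulate_dropouts(600, [120, 100])
for round_no, (active, ok) in enumerate(history, start=1):
    print(f"round {round_no}: {active} active, consensus possible: {ok}")
```

With these made-up numbers, the committee can still reach consensus after the first round (480 active against 401 required) but not after the second (380 active). Electing a new leader does not bring the inactive nodes back, which is why a view change cannot help at that point.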

Following the introduction of version 5.1.0 into the mainnet, we are now able to address these stalls along multiple fronts:

  • First, the state reversal bug has been fixed. If another view change occurs from here on, all the nodes should now be able to revert to the right states, with or without smart contracts.
  • Second, version 5.1.0 also includes two other notable bug fixes. One is related to our view change operation itself. It turns out we weren’t properly saving the latest leader after view changes. This is of little consequence for miners in the mainnet, but it did make our stall recovery operations much slower. The other bug fix involves our code for terminating transaction processing when the gas limit is exceeded. Earlier, we were still unnecessarily processing a transaction even though doing so would go beyond the gas limit. Such a transaction should be skipped, and version 5.1.0 now does exactly that.
  • Third, an ounce of prevention is always better than a pound of cure. As alluded to in our blog post a few weeks back, we’ve added code to prevent a miner from launching a Zilliqa node on a machine that doesn’t meet the minimum hardware specifications. By preventing such nodes from entering the network in the first place, we can at least enforce some level of computational capacity for the mainnet collectively and provide better stability to the network. Miners will also know immediately if their node is below the minimum hardware specifications.
  • Finally, apart from preventive and corrective actions, we should, of course, continue to explore enhancing our transaction processing capabilities. In version 5.1.0, we’ve upgraded our libjson-rpc-cpp library from 0.7.0 to 1.2.0. After profiling our smart contract execution, we discovered the older library version had a performance bottleneck around a busy-wait loop with “usleep” in the socket listening code, which the newer version removed in favor of non-blocking sockets. Interestingly, with the library upgrade alone, we are seeing smart contract transaction execution time reduced by up to 60%.
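The gas limit fix described in the second point can be illustrated with a short Python sketch. This is not the Zilliqa implementation: the `Tx` type, the `process_epoch` function, and the `execute` callback are hypothetical names, and the sketch assumes a transaction never uses more gas than its own declared limit.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tx:
    tx_id: str
    gas_limit: int  # maximum gas this transaction may consume


def process_epoch(
    txs: List[Tx],
    epoch_gas_limit: int,
    execute: Callable[[Tx], int],
) -> Tuple[List[str], List[str]]:
    """Return (processed_ids, skipped_ids) for one epoch.

    `execute` is a hypothetical callback that runs the transaction and
    returns the gas it actually used (assumed to be <= tx.gas_limit).
    """
    remaining = epoch_gas_limit
    processed: List[str] = []
    skipped: List[str] = []
    for tx in txs:
        # Check the budget before executing, so no time is spent on a
        # transaction that could push the epoch past its gas limit.
        if tx.gas_limit > remaining:
            skipped.append(tx.tx_id)
            continue
        remaining -= execute(tx)
        processed.append(tx.tx_id)
    return processed, skipped


executed: List[str] = []  # records which transactions were actually run


def fake_execute(tx: Tx) -> int:
    executed.append(tx.tx_id)
    return tx.gas_limit  # pretend the transaction uses its full limit


txs = [Tx("a", 60), Tx("b", 50), Tx("c", 30)]
processed, skipped = process_epoch(txs, epoch_gas_limit=100, execute=fake_execute)
print(processed, skipped, executed)
```

In this example, transaction "b" does not fit in the gas left after "a", so it is skipped without ever being executed, while the smaller "c" still fits and is processed.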

Version 5.1.0 includes other improvements and fixes not described here (including the improved sharding algorithm previously described), but overall we should see further stability in the mainnet moving forward.

Scilla updates:

As previously mentioned, we are working on providing support for zkSNARK primitives in Scilla. This work is now complete and will be part of the next release of Scilla. Meanwhile, work on the Scilla compiled execution backend is progressing. We have set up the compiler (written in OCaml, and to be open-sourced in the near future) to link to the LLVM OCaml bindings, and we have also set up a basic build system (using CMake) for the Scilla virtual machine and run-time library, both of which are to be written in C++.

We also have a couple of bug fixes to the Scilla code base:

  1. Fixed a bug in the dune config files (dune is the build framework used by Scilla). The fix eliminates error messages that were seen in IDEs when browsing the Scilla OCaml codebase and provides more accurate type information to the IDE.
  2. Fixed a bug in our pattern checker: the pattern checker wasn’t being run for code imported from libraries.

For further information, connect with us on one of our social channels: