From BTC Script to Subscript: Anatomy of Smart Contract Languages
The Extropy Institute website defines a smart contract in a broad sense: a program that lets users define the transaction logic they need themselves. In this sense, smart contracts exist in almost all Blockchain systems, including the best-known ones such as Bitcoin, Ethereum, Hyperledger, Parity, and Zcash.
Classified by the expressiveness of their programming languages and by their operating environments, smart contracts fall into three types: script-based, Turing-complete, and verifiable contracts.
Bitcoin allows simple transaction logic, such as changing the conditions under which bitcoins can be spent, to be implemented by writing sequences of stack-based opcodes; this mechanism is called the Bitcoin script system. Ethereum provides a smart contract platform based on a Turing-complete language and was the earliest platform to offer Turing-complete smart contracts.
Ethereum provides the Ethereum Virtual Machine (EVM), inside which contract code runs. Ethereum users write smart contracts in a dedicated language and compile them into EVM bytecode for execution. Hyperledger takes a different Turing-complete approach: its smart contracts are language-independent and run in Docker containers. Contract code can be written in a general-purpose programming language, compiled, and packaged into a Docker image, with the container serving as the execution environment.
With the development of the Polkadot multi-chain system, developers have designed new languages for cross-chain interoperable smart contracts. For example, Polkadot’s TrustBase project provides a smart contract development system built on the Substrate framework and compatible with the WebAssembly (WASM) virtual machine, and has independently developed Subscript, a new, easy-to-use language for writing Polkadot-native smart contracts.
This article reviews the mainstream smart contract languages of Bitcoin, Ethereum, and Polkadot. Following the path from the original Bitcoin scripting language to today’s Subscript gives readers a view of how smart contract languages have evolved.
1. Bitcoin Scripting Language
Bitcoin Script is a simple, stack-based language written in Reverse Polish Notation (RPN). It is used to write the Locking Script and Unlocking Script of the Unspent Transaction Outputs (UTXOs) in Bitcoin transactions. The Locking Script sets the conditions that must be met to spend an output; the Unlocking Script satisfies those conditions so that the UTXO can be unlocked and spent. When a transaction is validated, the Unlocking Script and the Locking Script of each UTXO it spends are executed together: the Unlocking Script runs first, and the Locking Script then runs on the stack it leaves behind. The result (True/False) determines whether the transaction meets the spending conditions.
The Bitcoin scripting language is deliberately simple, much like the instruction set of an embedded device. Script instructions are called opcodes and are grouped into constants, flow control, stack operations, arithmetic operations, bitwise operations, cryptographic operations, reserved words, and so on. The OP_DUP opcode used below is a stack operation. Script is not Turing complete: its opcodes include no loops or complex flow control, so a script runs in a bounded number of steps, which rules out infinite loops and other "logic bombs" introduced by careless coding. This restricted execution environment and simple execution logic make it easier to verify the security of programmable money and prevent script flaws from being exploited by attackers.
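To make the stack discipline and the bounded execution concrete, here is a minimal sketch of a toy RPN evaluator in Python. It assumes a tiny hypothetical opcode subset (integer pushes, OP_ADD, OP_EQUAL, OP_VERIFY) and is not Bitcoin Core’s interpreter; the point is that with no jump or loop opcodes, every token runs exactly once, so the step count equals the script length.

```python
# Toy evaluator for a tiny, loop-free subset of Bitcoin-style script.
# Assumption: only integer literals, OP_ADD, OP_EQUAL and OP_VERIFY are
# modeled; real Bitcoin Script has many more opcodes and stricter encoding.

def run_script(tokens: list[str]) -> tuple[bool, int]:
    """Execute tokens left to right on a stack.

    Returns (result, steps). There are no jump or loop opcodes, so each
    token is executed at most once and steps never exceeds len(tokens).
    """
    stack: list[int] = []
    steps = 0
    for tok in tokens:
        steps += 1
        if tok == "OP_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif tok == "OP_EQUAL":
            b, a = stack.pop(), stack.pop()
            stack.append(1 if a == b else 0)
        elif tok == "OP_VERIFY":
            if not stack.pop():          # abort as soon as a check fails
                return False, steps
        else:
            stack.append(int(tok))       # push a numeric constant
    # The script succeeds if the top of the stack is non-zero.
    return bool(stack and stack[-1]), steps


print(run_script("2 3 OP_ADD 5 OP_EQUAL".split()))  # (True, 5)
print(run_script("2 3 OP_ADD 6 OP_EQUAL".split()))  # (False, 5)
```

Because evaluation is a single left-to-right pass, a verifier can bound the cost of any script just by looking at its length.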
Most outputs processed by the Bitcoin system are locked with a "Pay to Public Key Hash" (P2PKH) script. The Locking Script contains the hash of a public key (the Bitcoin address) and is satisfied by an Unlocking Script that provides the public key together with a digital signature created with the corresponding private key. For example, if user A pays user B, the Locking Script can be written as:
OP_DUP OP_HASH160 〈B Public Key Hash〉 OP_EQUALVERIFY OP_CHECKSIG
To spend this output, user B provides an Unlocking Script containing B’s digital signature and public key:
〈B Signature〉 〈B Public Key〉
Nodes in the Bitcoin system combine the Unlocking Script and the Locking Script into a single verification script:
〈B Signature〉 〈B Public Key〉 OP_DUP OP_HASH160 〈B Public Key Hash〉 OP_EQUALVERIFY OP_CHECKSIG
The verification script is executed on the stack, and its result determines whether the transaction is valid.
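The combined script above can be traced with a small Python stack machine. This is a sketch under loud assumptions: fake_hash160 is a stand-in for HASH160 (RIPEMD-160 of SHA-256), and toy_sign/toy_checksig stand in for real signature creation and verification and offer no security at all; every function name here is hypothetical. What the sketch does show is the order of stack operations that P2PKH performs.

```python
import hashlib

# Assumption: fake_hash160 replaces RIPEMD160(SHA256(x)) with a truncated
# double SHA-256 so the sketch runs on any Python build.
def fake_hash160(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:20]

# Assumption: a toy "signature" that anyone can forge; it only lets the
# OP_CHECKSIG step be demonstrated, unlike real ECDSA/Schnorr signatures.
def toy_sign(pubkey: bytes, tx: bytes) -> bytes:
    return hashlib.sha256(b"toy-sig" + pubkey + tx).digest()

def toy_checksig(sig: bytes, pubkey: bytes, tx: bytes) -> bool:
    return sig == toy_sign(pubkey, tx)

def execute(script: list, tx: bytes) -> bool:
    """Run data pushes (bytes) and opcodes (str) on a single stack."""
    stack: list = []
    for item in script:
        if isinstance(item, bytes):
            stack.append(item)                       # push data
        elif item == "OP_DUP":
            stack.append(stack[-1])                  # duplicate the public key
        elif item == "OP_HASH160":
            stack.append(fake_hash160(stack.pop()))  # hash the copy
        elif item == "OP_EQUALVERIFY":
            if stack.pop() != stack.pop():           # compare with the locked hash
                return False
        elif item == "OP_CHECKSIG":
            pubkey, sig = stack.pop(), stack.pop()
            stack.append(b"\x01" if toy_checksig(sig, pubkey, tx) else b"")
        else:
            raise ValueError(f"unsupported opcode {item}")
    return bool(stack) and stack[-1] == b"\x01"

b_pubkey = b"B public key"
tx = b"A pays B 1 BTC"
locking = ["OP_DUP", "OP_HASH160", fake_hash160(b_pubkey),
           "OP_EQUALVERIFY", "OP_CHECKSIG"]
unlocking = [toy_sign(b_pubkey, tx), b_pubkey]

print(execute(unlocking + locking, tx))  # True
```

Supplying a different public key makes OP_EQUALVERIFY fail, and a signature over a different transaction makes OP_CHECKSIG leave a false value on the stack.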
2. Ethereum Turing-complete language
Because scripting languages such as Bitcoin Script are not Turing complete, the transaction patterns their smart contracts can express are very limited and suit mainly cryptocurrency applications. Vitalik Buterin therefore launched Ethereum, a smart contract platform that supports a Turing-complete language. Ethereum provides dedicated languages for smart contract development, whereas most other systems and platforms use general-purpose programming languages.
2.1 Solidity Language
1) Special data type: address. A smart contract running on Ethereum is treated as a special kind of account, the contract account. Like external accounts, contract accounts are identified by a 20-byte address, so Solidity provides a dedicated address type for referring to contracts and accounts.
2) Flexible declaration order. Within a scope, state variables and functions can be used before they are defined; there is no required order, so a definition may come after the statement that uses it.
3) Two data locations: memory and storage. Memory works like variable storage in other high-level languages and is released after use; function parameters use memory by default (newer compiler versions require reference-type parameters to declare their location explicitly). Much of the state on a Blockchain, however, must be recorded permanently, so state variables are kept in storage by default. Programmers can also specify the data location explicitly with keywords.
4) Built-in payment support. The payable keyword supports paying and receiving digital currencies such as ETH at the code level, allowing a contract to accept transfers and hold funds.
5) Exceptions that roll back state. By default, exceptions are not caught and handled by the program itself; instead, an exception automatically reverts the changes made by the current call, which keeps the contract’s state data consistent and makes contract execution atomic.
6) Strict visibility control. Functions and state variables have four visibility levels: external, internal, public, and private. They restrict which functions and state variables can be called or accessed inside the contract, from outside it, and by derived contracts. Solidity also provides special variables tailored to smart contracts; they live in the global namespace and are mainly used to obtain information about the Blockchain, as listed in Table 1.
Table 1 Special variables in smart contract programming language
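The storage/memory split, payable, and rollback-on-exception behaviors can be modeled in a few lines of Python. This is a hypothetical sketch, not EVM semantics: ToyContract, Revert, and the deposit functions are invented for illustration; storage is a dict that persists across calls, locals play the role of memory, and a snapshot of storage is restored when a call reverts.

```python
import copy

class Revert(Exception):
    """Raised to abort a call, loosely like Solidity's revert/require."""

class ToyContract:
    """Hypothetical model of a Solidity-style contract.

    - self.storage plays the role of storage: it persists across calls.
    - Local variables inside a call play the role of memory.
    - call() snapshots storage and restores it if the call reverts, so a
      failed call leaves no partial state changes behind.
    """

    def __init__(self) -> None:
        self.storage = {"balance": 0, "deposits": {}}

    def call(self, fn, *args, value: int = 0, payable: bool = False):
        if value and not payable:
            # Rejected before any state has been touched.
            raise Revert("function is not payable")
        snapshot = copy.deepcopy(self.storage)
        try:
            self.storage["balance"] += value        # contract receives the value
            return fn(self, *args, value=value)
        except Revert:
            self.storage = snapshot                 # roll back every write
            raise

def deposit(c: ToyContract, sender: str, value: int) -> int:
    deps = c.storage["deposits"]
    deps[sender] = deps.get(sender, 0) + value
    return deps[sender]

def deposit_at_most_100(c: ToyContract, sender: str, value: int) -> int:
    new_total = deposit(c, sender, value)           # writes storage first
    if new_total > 100:                             # then checks a condition
        raise Revert("deposit limit exceeded")
    return new_total

c = ToyContract()
c.call(deposit_at_most_100, "alice", value=60, payable=True)
try:
    c.call(deposit_at_most_100, "alice", value=60, payable=True)
except Revert:
    pass
print(c.storage)  # {'balance': 60, 'deposits': {'alice': 60}}
```

The second call writes to storage before failing its check, yet the printed state shows none of those writes, which is the atomicity property item 5 describes.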
2.2 Serpent language
Serpent is a high-level language dedicated to writing smart contracts, with a design very similar to Python; it aims to combine the efficiency of a low-level language with an easy-to-use programming style and features tailored to smart contracts. The latest version of the compiler is written in C++ so that it can be embedded in a wider range of client programs. Although Serpent resembles Python, there are many differences:
1) Serpent values cannot exceed 2^256 - 1 (256 bits); larger values overflow;
2) Serpent does not support decimal numeric types;
3) Serpent does not support lists, dictionaries, or some other advanced features;
4) Serpent has no first-class functions. Functions can be defined in a contract and called from within it, but variables other than storage variables do not persist between calls;
5) Like Solidity, Serpent supports persistent storage variables;
6) Like Solidity, Serpent can use the extern statement to call other contracts and their functions;
7) As a language running on the Blockchain, Serpent also supports the special variables in Table 1.
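The 2^256 limit in item 1 comes from the EVM’s 256-bit word. A minimal sketch, assuming EVM-style unsigned wraparound (ADD and MUL taken modulo 2^256); Serpent’s own handling of signed values is not modeled, and the helper names are hypothetical:

```python
# Assumption: arithmetic wraps modulo 2**256, as the EVM's ADD and MUL do.
WORD_BITS = 256
MODULUS = 2 ** WORD_BITS
MAX_UINT256 = MODULUS - 1

def add256(a: int, b: int) -> int:
    """Add two 256-bit words, wrapping around on overflow."""
    return (a + b) % MODULUS

def mul256(a: int, b: int) -> int:
    """Multiply two 256-bit words, wrapping around on overflow."""
    return (a * b) % MODULUS

print(add256(MAX_UINT256, 1))                     # 0: the result wrapped around
print(add256(MAX_UINT256 - 1, 1) == MAX_UINT256)  # True
```

Python integers are unbounded, so the modulo has to be applied explicitly; on the EVM it happens silently, which is why overflow is a classic source of contract bugs.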
2.3 Verifiable language Pact
Pact, whose interpreter is written in Haskell, is used to write smart contracts that run directly on the Kadena Blockchain. It is mainly used in commercial transactions that demand both security and efficiency.
A Pact smart contract consists of three parts: Tables, Keys, and Modules, which handle contract data storage, authorization checks, and contract code, respectively.
The main features of the language are:
1) The language is deliberately Turing incomplete and supports neither loops nor recursion;
2) The code is human-readable and runs embedded in the Blockchain;
3) It supports modular design and imports;
4) It supports a key-row database model;
5) It supports type inference;
6) It supports key rotation;
7) It supports integration with industrial databases.
Pact’s syntax is similar to LISP: code is written as nested expressions that map directly onto a syntax tree, which makes it quick to parse and execute. For example, the following function calculates an average:
(defun average (a b)
  "take the average of a and b"
  (/ (+ a b) 2))
The code defines the average function, which calculates the average of two numbers. Because the source is already in syntax-tree form, the interpreter can parse and execute it with little overhead.
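To illustrate why LISP-style code maps so directly onto a syntax tree, the following Python sketch tokenizes, parses, and evaluates the body of the average function. It is not Pact’s interpreter: only +, -, * and / are supported, and / is modeled as Python true division, whereas Pact’s integer division may behave differently.

```python
import operator

# Hypothetical operator table for this sketch; "/" uses true division.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def tokenize(src: str) -> list[str]:
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens: list[str]):
    """Turn a token list into a nested-list syntax tree."""
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)          # drop the closing ")"
        return node
    try:
        return int(tok)
    except ValueError:
        return tok             # a symbol such as "a" or "+"

def evaluate(node, env: dict):
    if isinstance(node, int):
        return node
    if isinstance(node, str):
        return env[node]       # look up a variable
    op, *args = node           # the first element names the operation
    return OPS[op](*(evaluate(arg, env) for arg in args))

tree = parse(tokenize("(/ (+ a b) 2)"))
print(tree)                               # ['/', ['+', 'a', 'b'], 2]
print(evaluate(tree, {"a": 3, "b": 5}))   # 4.0
```

The parser does almost no work: each parenthesized group becomes one list node, and evaluation is a direct recursive walk over that tree.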
2.4 Hyperledger smart contract language
Hyperledger smart contracts (chaincode) are generally written in Go, though other languages such as Java are also supported. Go was designed by Robert Griesemer, Rob Pike, and Ken Thompson starting in late 2007 and was open-sourced in November 2009; it is Turing complete. Every Go program is made up of packages, and execution always starts in the main package. Go has the following characteristics:
1) A good concurrency model that lets programs make full use of multi-core and networked machines. Go implements concurrency with goroutines and passes messages over channels, sharing memory by communicating rather than communicating by sharing memory.
2) Simple design. Code is concise and uniformly formatted, which makes it highly readable and maintainable. The language has only 25 keywords, yet through methods, interfaces, and struct embedding it covers most of what other languages provide with objects, inheritance, and overloading.
3) C interoperability. Through cgo, Go code can call C code directly and reuse the large body of existing C libraries.
4) Error handling. Go reports errors mainly as ordinary return values and uses the defer keyword together with the built-in panic and recover functions for exceptional conditions. Unlike Java’s try-catch blocks, this keeps error handling explicit at each call site.
5) Automatic garbage collection. Go needs no explicit deallocation such as C++’s delete or C’s free; unused memory is reclaimed automatically.
2.5 TrustBase smart contract language
Subscript, the development language of the TrustBase Parachain, is a native smart contract language designed for WASM. It can be used on any Substrate-compatible smart contract platform to develop native Polkadot smart contracts.
TrustBase Parachain has also developed an IDE and testing tools for Subscript. The following tools form a complete contract development ecosystem:
1) Subscript Workbench: browser-based IDE development environment;
2) Subscript Onechain: TrustBase’s contract chain;
3) Subscript Tempest: smart contract testing and verification framework.
Subscript stores data using an account-based model and deploys contracts through transactions. A deployed contract pays state rent and is suspended when its rent runs out. This design fits well with cross-chain smart contract execution in the Polkadot ecosystem.
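State rent can be pictured as a prepaid balance that is drained block by block. The source states only that a contract is suspended when its rent runs out; the flat per-block charge, the field names, and the check-before-charge order below are hypothetical assumptions, not TrustBase parameters.

```python
from dataclasses import dataclass

# Hypothetical model: a flat rent is charged on every new block, and the
# contract is suspended on the first block whose charge it cannot cover.

@dataclass
class DeployedContract:
    rent_balance: int        # prepaid rent, in abstract units
    rent_per_block: int      # assumed flat charge per block
    suspended: bool = False

    def on_new_block(self) -> None:
        """Charge one block of rent, or suspend if the balance is too low."""
        if self.suspended:
            return
        if self.rent_balance < self.rent_per_block:
            self.suspended = True
            return
        self.rent_balance -= self.rent_per_block

c = DeployedContract(rent_balance=25, rent_per_block=10)
for _ in range(3):
    c.on_new_block()
print(c.rent_balance, c.suspended)  # 5 True
```

Charging rent in proportion to occupied state gives contracts an incentive not to leave unused data on chain indefinitely.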
Like Solidity contracts, smart contracts developed in Subscript can be upgraded. The contract library functions are implemented with AS in the contract language. They include basic cryptographic functions, on-chain information (random numbers, block height, block time, etc.), and smart contract operations (transfers, calls to other contracts, and calls to on-chain modules).
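Upgradeability in Solidity is commonly achieved with a proxy that keeps the contract state and forwards calls to a replaceable implementation. The Python sketch below assumes that pattern; Proxy, CounterV1, and CounterV2 are hypothetical names, and passing the proxy’s storage dict to the implementation loosely mirrors the EVM’s delegatecall.

```python
# Hypothetical proxy pattern: state lives in the proxy, logic is swappable.

class CounterV1:
    @staticmethod
    def increment(storage: dict) -> int:
        storage["count"] = storage.get("count", 0) + 1
        return storage["count"]

class CounterV2:
    @staticmethod
    def increment(storage: dict) -> int:
        # Upgraded logic: step by 2, reusing the same storage layout.
        storage["count"] = storage.get("count", 0) + 2
        return storage["count"]

class Proxy:
    def __init__(self, implementation, admin: str) -> None:
        self.storage: dict = {}              # state survives upgrades
        self.implementation = implementation
        self.admin = admin

    def upgrade(self, caller: str, new_implementation) -> None:
        if caller != self.admin:
            raise PermissionError("only the admin can upgrade")
        self.implementation = new_implementation

    def call(self, method: str):
        # Forward to the current implementation with the proxy's storage.
        return getattr(self.implementation, method)(self.storage)

p = Proxy(CounterV1, admin="owner")
p.call("increment")            # count = 1
p.upgrade("owner", CounterV2)
print(p.call("increment"))     # 3: new logic, old state preserved
```

Keeping the storage layout compatible between versions is the main constraint this design imposes on upgrades.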
Because Subscript’s semantics are easier to understand, developers can use Turing-complete flexibility more safely, so smart contracts developed in Subscript are expected to have fewer security vulnerabilities.
Subscript is designed for WASM from its API to its syntax. It applies strict type and language checks throughout and provides generics to support encapsulating third-party packages. Specifically:
1) Static syntax checking. Unlike TypeScript, which targets dynamically typed runtime environments, Subscript performs strict static checking at compile time, avoiding the dynamic features of TypeScript that cannot be compiled ahead of time efficiently. By declaring or inferring concrete types, the compiler can produce code with predictable performance from the start of execution while keeping the generated WASM binary small.
2) Strict types. Subscript’s basic types are designed around the WASM standard and use WASM’s own integer and floating-point types, so developers can choose exactly the numeric type they need.
3) Low-level access. When a smart contract interacts with the environment outside its sandbox, only basic integer types can be passed as parameters. Subscript provides complete syntax for defining external interface types, and also has built-in functions for accessing the underlying WASM directly, covering integer operations, virtual machine stack access, memory loads, and other operations.
4) Generics. Generic types can be defined to support code reuse; Subscript uses generics to define a series of reusable library functions.
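Strict WASM integer types mean that arithmetic follows the WebAssembly specification: an i32 is a 32-bit pattern, i32.add wraps modulo 2^32, and signedness is chosen by the instruction (i32.div_s versus i32.div_u) rather than stored in the value. The Python helpers below are hypothetical emulations of those instructions, not Subscript functions; the traps WASM raises on division by zero and signed-division overflow are not modeled.

```python
# Emulation of a few WASM i32 instructions on 32-bit bit patterns.
MASK32 = 0xFFFF_FFFF

def i32_add(a: int, b: int) -> int:
    """i32.add: wrap the sum modulo 2**32."""
    return (a + b) & MASK32

def as_signed(x: int) -> int:
    """Reinterpret a 32-bit pattern as a two's-complement signed value."""
    return x - (1 << 32) if x & 0x8000_0000 else x

def i32_div_u(a: int, b: int) -> int:
    """i32.div_u: treat both operands as unsigned."""
    return (a & MASK32) // (b & MASK32)

def i32_div_s(a: int, b: int) -> int:
    """i32.div_s: signed division truncating toward zero."""
    sa, sb = as_signed(a & MASK32), as_signed(b & MASK32)
    q = abs(sa) // abs(sb)
    if (sa < 0) != (sb < 0):
        q = -q
    return q & MASK32

minus_six = (-6) & MASK32                   # bit pattern 0xFFFFFFFA
print(hex(i32_add(MASK32, 1)))              # 0x0: wrapped
print(as_signed(i32_div_s(minus_six, 4)))   # -1: truncated toward zero
print(i32_div_u(minus_six, 4))              # 1073741822: same bits, unsigned
```

The same 32 bits give different quotients depending on the instruction chosen, which is why a language that maps its types directly onto WASM’s makes numeric behavior predictable.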
Subscript provides a rich set of library functions for developers, divided into three parts: the standard library, the core library, and the extension library.
2.5.1 Subscript library functions
The Subscript standard library includes basic mathematical operations, array operations, string processing, and memory access. The functions of the Subscript Core Lib can be called directly from contract code and include:
1) Basic cryptographic functions: blake2b, sha3, sha256;
2) On-chain information (random numbers, block height, block time, etc.);
3) Smart contract operations: transfers, calls to other contracts, and calls to other on-chain modules (XCMP cross-chain messaging, staking, governance, etc.).
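For reference, the three hash functions named above can be computed with Python’s hashlib. Two assumptions apply: "sha3" is taken to mean NIST SHA3-256, which is not the Keccak-256 variant Ethereum uses, and BLAKE2b is configured with a 32-byte digest for illustration. The variants and output sizes actually exposed by the Subscript Core Lib may differ.

```python
import hashlib

# Assumptions: sha3 = NIST SHA3-256; blake2b truncated to a 32-byte digest.
message = b"TrustBase"

digests = {
    "blake2b-256": hashlib.blake2b(message, digest_size=32).hexdigest(),
    "sha3-256": hashlib.sha3_256(message).hexdigest(),
    "sha256": hashlib.sha256(message).hexdigest(),
}

for name, hex_digest in digests.items():
    print(f"{name:12} {hex_digest}")
```

All three produce 256-bit digests here, but they are different functions, so a contract and an off-chain client must agree on exactly which variant is used.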
The Subscript Support Lib is a modular collection of contract library functions that includes many commonly used contract templates. By extending the templates in this library, developers can integrate widely used contract functionality directly, improve contract security, and avoid re-implementing basic functions. The extension library includes the following contracts:
1) ERC20-compatible contract: token library functions compatible with the ERC20 interface;
2) ERC721-compatible contract: supports creating non-fungible tokens;
3) Permission control contract: provides account-based permission control to derived contracts through contract base classes;
4) Proxy contract: makes contracts upgradeable through an abstract contract interface;
5) Governance contract: provides on-chain governance through voting;
6) Multi-signature contract: supports multi-signature addresses in multiple account formats.
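To show what "ERC20 compatible" implies, here is a minimal in-memory Python model of the ERC-20 functions (balanceOf, transfer, approve, allowance, transferFrom, written here in snake_case). It is a hypothetical sketch, not the Subscript Support Lib: events, decimals, and token metadata are omitted.

```python
# Hypothetical in-memory model of the ERC-20 interface.

class InsufficientFunds(Exception):
    pass

class ERC20Sketch:
    def __init__(self, owner: str, total_supply: int) -> None:
        self.total_supply = total_supply
        self.balances: dict[str, int] = {owner: total_supply}
        self.allowances: dict[tuple[str, str], int] = {}

    def balance_of(self, account: str) -> int:
        return self.balances.get(account, 0)

    def transfer(self, sender: str, to: str, amount: int) -> bool:
        if amount < 0:
            raise ValueError("amount must be non-negative")
        if self.balance_of(sender) < amount:
            raise InsufficientFunds("balance too low")
        self.balances[sender] -= amount
        self.balances[to] = self.balance_of(to) + amount
        return True

    def approve(self, owner: str, spender: str, amount: int) -> bool:
        # Let spender move up to `amount` of owner's tokens.
        self.allowances[(owner, spender)] = amount
        return True

    def allowance(self, owner: str, spender: str) -> int:
        return self.allowances.get((owner, spender), 0)

    def transfer_from(self, spender: str, owner: str, to: str, amount: int) -> bool:
        if self.allowance(owner, spender) < amount:
            raise InsufficientFunds("allowance too low")
        self.transfer(owner, to, amount)            # also checks owner's balance
        self.allowances[(owner, spender)] -= amount
        return True

token = ERC20Sketch("alice", 1000)
token.transfer("alice", "bob", 100)
token.approve("bob", "carol", 30)
token.transfer_from("carol", "bob", "dave", 30)     # carol spends bob's allowance
print(token.balance_of("bob"), token.balance_of("dave"))  # 70 30
```

The allowance is reduced only after the transfer succeeds, so a failed transfer_from leaves both balances and allowances unchanged.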
3. Polkadot cross-chain smart contract language comparison
Because Subscript was developed specifically for the Polkadot cross-chain smart contract ecosystem, the most natural comparison is with Ink!, Parity’s Rust-based language. Compared with Ink!, Subscript offers easier-to-use features:
1) No Rust background is required, so web developers can get started quickly;
4) An easy-to-use development environment: contracts can be deployed and tested in existing JavaScript IDEs.
Table 2 Comparison of the features of all languages discussed in this article
Beyond the common features listed in the table, Subscript achieves compatibility with the WASM virtual machine on top of the Polkadot Substrate framework. Compared with Solidity’s EVM compatibility, WASM offers better backward compatibility with Polkadot’s sharded multi-chain structure. Polkadot smart contracts developed in Subscript will not carry historical baggage like that of the Ethereum smart contract platform, which will become crucial as the Polkadot Parachain ecosystem grows.
Starting from the Bitcoin scripting language, this article has traced the evolution of smart contract languages and analyzed the characteristics of Subscript, Polkadot’s cross-chain smart contract language. By September 2020, Subscript had completed its preliminary development and debugging and had received a Web3 Foundation Grant. As Blockchain development branches into different tracks, and especially as Polkadot’s cross-chain smart contract track emerges, new languages that match each track’s requirements are likely to be tried and adopted by more developers.