ArkStream Capital: A deep dive into zero-knowledge investment opportunities about the blockchain scaling and privacy protection (1)

15 min readAug 25, 2022

Ray

Preface

Currently, in the network of Bitcoin or Ethereum, many transactions are pending and waiting for the bundle in block to be confirmed. Since the confirmation mechanism has severely damaged the user experience, transaction congestion has become an urgent problem waiting for a solution in the entire industry. In addition, due to the transparency of blockchain, once the address is marked, the privacy for all of our transactions is gone. With the development of DeFi, in the dark forest of cryptocurrencies, the demand for privacy protection has never been so urgent. When the DeFi lego block is on the verge of collapse, the clear and visible liquidation price and sniping address liquidation have become the important cause of the unstoppable price down.

Geeks and developers have spent years exploring the two major puzzles: network concurrent capacity and privacy. Zero-knowledge-proof technology with its unique characteristic is just as important as the transparency, decentralization, and immutability of blockchain technology. It brings systematic solutions to blockchain scaling and privacy issues. This article will start with the definition and implementation of zero-knowledge proof, using examples from many popular case projects, discussing ZK’s exploration in the two major areas: scaling and privacy. From the perspective of investors, the investment opportunities will be thoroughly pondered.

Table of Contents:

· Definition of zero-knowledge proof

· Implementation of zero-knowledge proof

· Virtual machine of zero-knowledge proof

· Zero-knowledge proof and scaling

· Zero-knowledge proof and privacy

· Investment trends of zero-knowledge proof

Definition of zero-knowledge proof

In socializing and people’s everyday life, the majority tends to believe what they see and hear, however, the owners of valuable information or knowledge want people to believe that they have ownership over the information without revealing the core secrets. To address such a demand, Zero-knowledge proof is born. Utilizing more academic expression, the zero-knowledge proof aims to prove and verify that “the knowledge is true’’ for both parties, in which the owner of knowledge is the prover and the other party is the verifier.

We can use a rough sketch to directly illustrate what zero-knowledge proof can achieve.

Figure 1: Proof without ZKP and with ZKP

From this figure, we can dig out three properties of zero-knowledge proof:

Completeness: an honest prover who proves “the knowledge is true” can certainly convince the verifier to believe.

Soundness: the prover cannot prove “false knowledge” as “true knowledge”

Zero-knowledge: in the whole proof, the verifier will not obtain any knowledge related to secrets except for “knowledge is true”.

These three properties can be stated in three simple expressions that are: truth cannot be false and falsity cannot be true; the unknown is true and the unknown remains unknown. The third property, in a more colloquial way, it is can be understood as: I tell “what is true”, you verify “what is true”, but you don’t know what it is exactly.

Figure 2: Three properties and two features of the zero-knowledge proof

The first two properties are easier to understand and notice than the third one, the third property is really easy to get confused and get

neglected. When both verifier and prover parties know the content of knowledge, it is easy to verify that “knowledge is true”, but it doesn’t fit into the zero-knowledge requirement. Conversely, if zero-knowledge is satisfied, it is difficult to verify. Just consider, how can we determine there is no knowledge leaking during the verifying and proving processes? If there is no knowledge leaking, how does the verifier know the knowledge is true? After all, he doesn’t know what the knowledge is, or even, he doesn’t know whether the knowledge exists or not! However, the fact is that knowledge must exist, otherwise, there is no point to prove the “knowledge is true”!

Let’s talk about more specific cases, for example, we all believe that Satoshi Nakamoto must exist in this world, otherwise, Bitcoin would never have been invented. However, many years later, we do not know whether Satoshi Nakamoto is alive in this world. If Satoshi Nakamoto is alive, he wants to tell the world he is alive. Then, Satoshi Nakamoto can use zero-knowledge proof to deliver the message that “I am still alive”, now the world can verify and obtain this message base on zero-knowledge proof, but whether Satoshi Nakamoto uses a private key or whether he signs in to the old account of BitcoinTalk to verify himself remain unknown. Another scenario is when Satoshi Nakamoto invented the bitcoin, the Genesis address was the owner’s first bitcoin address. When this address uses the private key for activities, at the same time it delivers a message that Satoshi Nakamoto is still alive. But this no longer satisfies the property of zero-knowledge, because the world knows Satoshi Nakamoto is still alive and his private key is still active.

Simply speaking, both knowledge privacy and knowledge validity can be ensured by the zero-knowledge proof. These two factors directly decide zero-knowledge proof can be applied to two scenarios in the crypto world: privacy and scaling. Without a doubt, there are more subdivided applications in privacy scenarios, such as privacy payment, anonymous vote, and even privacy public blockchain, etc., it can also be universally understood as asset payment privacy and logical universal privacy. As for the scaling scenarios, they are jointly achieved by the completeness and soundness of the zero-knowledge proof as well as other technical means.

Implementation of Zero-Knowledge Proof:

Zero-knowledge proof is collaborated, invented, and implemented by using a lot of mathematical and cryptographic knowledge. The implementation of the whole proof process can be divided into interactive proof and non-interactive proof based on whether it is interactive or non-interactive. The interactive proof is a proof process that requires both the prover and the verifier to alternately proceed with certain orders and rules, and then complete the proof by random probability. The non-interactive proof is a proof process that requires the prover to compute and submit all the proof materials at one single time according to the proof rules or the proof procedures and then the verifier can directly use these proof materials for verification. In another extreme understanding, the non-interactive proof is compressing multiple steps of interactive proof into one-step interaction. Additionally, non-interactive proof can be seen as separating the entire implementation into a process of proof and a process of verification.

Next, we use two different scenarios to simulate the differences between interactive proof and non-interactive proof.

Simplified scenario 1. Let’s suppose that Satoshi Nakamoto’s Genesis private key reappears in the world, now, it is necessary to prove if this private key is still active.

Processes of interactive proof:

1. The verifier speaks to the Genesis private key prover: write a sentence after a certain block height: Hello World.

2. The Genesis private key prover will leave a message to the whole network after a certain block height as required: Hello World.

3. The verifier verifies if the Genesis private key leaves a message in the network after a certain block height as required: Hello World.

4. Repeat these three steps 1, 2, and 3. When the number of times reaches a certain level, it can be determined from the perspective of probability that the private key is basically active.

Processes of non-interactive proof:

A device/emulator/Turing machine generates a shared, random string XYZ, and then the string is published, the Genesis private key prover is required to leave the message to the whole network in the form of “XYZ, Hello World”.
According to the rules of the device/emulator/Turing machine, the Genesis private key prover uses the string XYZ, with its own Hello World, and leaves the message to the whole network: “XYZ, Hello World”.
The verifier who holds the string XYZ can directly decide that the private key is active after seeing the message from the Genesis private key.

Simplified scenario 2. Assuming that person Alice understands the Gaussian algorithm, person Bob does not.

Processes of interactive proof:

1. Bob asks Alice to compute 1+2+3+……+8887+8888.

2. Alice computes the result in 30 seconds by using the Gaussian algorithm, and then Alice tells the result to Bob.

3. Bob uses simple addition to examine whether the result is correct.

4. Repeat these three steps 1, 2, and 3. When the number of times reaches a certain level, it can be determined from the perspective of the possibility that Alice understands the Gaussian algorithm.

Processes of non-interactive proof:

1. A device/emulator/Turing machine generates a shared, random whole number X.

2. Alice computes the result of 1+2+3+……+(X-1) +X by using the Gaussian algorithm.

3. Bob uses simple addition to examine whether the result is correct, if it is, then Alice understands the Gaussian algorithm.

Through the examples above, we can find that non-interactive proof is superior to interactive proof: 1. It does not rely on a specific verifier, it is an excellent solution when 1 prover faces N verifiers; 2. It is not limited to the interactive moment, it can be done at any moment. However, we also notice that simulator/Turing machines are needed to participate in the non-interactive proof!

From the scenarios, the core concept of zero-knowledge proof is introduced: the emulator/Turing machine can hide the knowledge but still be able to determine “knowledge is true”, therefore the proof and verification can smoothly proceed.

To achieve the core functionality of the emulator/Turing machine in non-interactive proof, the current technical schemes are Random Oracle and Common Reference String (CRS), and CRS is the most widely used scheme. We consider that there is no significant difference between Random Oracle and Common Reference String. CRS must be generated by a trusted third party, then it is shared with both verifier and prover. Here, the generation of CRS must be guaranteed to be random, thus, the process of CRS generation can be addressed as a trusted setup.

We can compare the emulator/Turing machine with God, God is public, it can not only help the prover to hide knowledge and complete the proof but also identify the false and true knowledge, and assist the verifier to complete verification. God is perfect, however, ultimately God has to be created by men, the process of creation can be correspondingly applied to the trusted setup. Furthermore, God has applicability, and it is not necessarily universal, for example, Jesus is the God of the west, and Tathagata is the God of the east (applicability of trusted setup). If we think one step further, God may also be unreliable. (Trusted setup security)

All proofs have to come back to specific procedures. Just like how we solve the mathematic proof question, there are procedures of solution. As for zero-knowledge proof, there is a specific terminology for the procedure which is called: Circuit. For QAP/QSP (quadratic arithmetic program and quadratic span program), Boolean circuits and arithmetic circuits, we can roughly classify them as the specific procedures of non-interactive proof. There are thousands of mathematical proof questions in this world, we need to write corresponding procedures to solve all these proof questions. In the same way, circuits are specially written for zero-knowledge proof under different knowledge. Of course, certain types of writing processes for knowledge proofs can be simplified by sharing the circuit framework. For the different mathematical proof questions, we need to write down separate solving procedures for (specific circuits), and for one certain type of mathematical proof question we can write general answering procedures (general circuits), even more, there is another type of question, we can directly cite from completed answering procedures and conclusion. (Composability and interactivity of circuits).

Let’s talk about circuits location based on previous mathematical proof questions

It is known that N is an integer, please prove:1+2+3+……+n-1+n= (1+n) * n/2

Answering procedures/circuit:

1. When n=1, 1= (1+1) *1/2=1, conclusion achieved;

2. Assume that when n=k, conclusion achieved, meaning: 1+2+3+……+(k-1) +k=(1+k) * K/2

3. So, when n=k+1, thus 1+2+3+……+k+(k+1) =(1+k) * k/2+(k+1) =(1+(k+1)) * (k+1) / 2

4. When n = k+1, the conclusion achieved

5. So, we know that the proof equation is achieved.

As regards other concepts and terminology, such as the rank-1 constraint system (R1CS), nondeterministic polynomial problem (NP), polynomials, knowledge of polynomials, factorization, fuzzy computation, information, knowledge, circuit satisfiability, complete safety, semantic safety, indistinguishability, mapping, homomorphism, finite field, cyclic group, Fiat-Shamir transformation, The Elliptic Curve Digital Signature Algorithm (ECDSA)……. Due it involves too many concepts and mathematic computations, except for mathematician and cryptographist, we don’t recommend spending too much brainpower and energy in this aspect. The only key information we need to know is that the implementation of different zero-knowledge proof schemes lies in whether the trusted setup is introduced, the applicability of the trusted setup and the difficulty level of writing circuits.

For implementing different zero-knowledge proof schemes, we can use five dimensions to determine the pros and cons of implementation schemes, which are time cost, space cost for both verifier and prover as well as security.

The time cost of prover: that is the computation time, which determine the prove process speed

The time cost of verifier: that is the time consumed to verify;

The space cost of Prover and verifier: these combine to become the concept : proof size, which decides the requirements for storage space, andit is often called succinctness.

Security: some protocols need to introduce the trusted setup, and corresponding implementation schemes rely on the trusted setup for security. In addition, there are applicability differences in trusted setup, which are the permanent type and one-time type.

Differences in implantation schemes lead to two major factional protocols: zk-SNARK stands for “zero-knowledge Succinct non-interactive argument of knowledge, and zk-STARK stands for “zero-knowledge scalable transparent argument of knowledge”.

zk-SNARK: Zero-Knowledge Succinct Non-Interactive Arguments of Knowledge. It is a non-interactive proof that requires a trusted setup, different implementation schemes would require different trusted setups and effects. Since the beginning of ZCash which has been adopted and developed for many years, there are also plenty of specifically optimized implementation schemes, some of the famous schemes are ZCash, zkSync, Aztec, etc. Due to the long time period, a large number of derivative protocols were born: Schnorr protocol, Pinocchio protocol, Marlin protocol, Sonic Protocol, Libra protocol, PLONK protocol, Spartan protocol and BulletProof protocol and so on.

zk-STARK: Zero-Knowledge Scalable Transparent Argument of Knowledge. It is a non-interactive proof that does not require a trusted setup and requires post-quantum cryptography, it also demands large space costs.

Invented and proposed by Starkware team , nowsome dedicated projects are using such as dYdX (currently announced shifting to Cosmos), Immutable and Deversifi.

Considering the impossible trilemma of blockchain, the implementing schemes of zero-knowledge proof can be addressed as the imperfect trilemma, although all the implementing schemes can achieve completeness, soundness and zero-knowledge at the same time, however, due to the requirements of applied scenarios, there will always be trade-offs. The more realistic situation is if we focus on completeness and soundness, the zero-knowledge side will be weakened or even neglected (scaling Layer 2). Furthermore, zero-knowledge itself does not have the nature of decentralization, if the protocols/projects require decentralized features, then they will need to combine with a decentralization design. (Extremely difficult)

Zero-knowledge proof virtual machine

After exploring different implementation schemes, next, we are going to talk about how we are going to launch and apply schemes. Nowadays advanced programming languages like C++, Java, Rust, and Solidity are widely used in internet applications and products. Virtual machine/interpreter is the black box for program execution, they can translate the program into the language that the machine can recognize and understand, and then substitute the machine to run the program. If there is no zero-knowledge proof virtual machine, when we use zero-knowledge proof technology to program, we always need to write a corresponding implementing circuit for the program. This type of circuit writing, testing and productivity are difficult and low efficient. For this very reason, we need to make zero-knowledge technology more popular and efficient, so it is necessary to have a zero-knowledge virtual machine (zkVM) which can generate proofs, it can also submit and verify programs. As for which advanced programming language the zkVM is going to adopt, whether it is C++, Java or other specifically designed circuits programming advanced languages (Stareware’s Cairo, zkSync’s Zinc), all these depend on the design and capability of the zkVM. It should be emphasized that zkVM does not require to use in blockchain area.

For zkVM, it is very important that they can support programs universally if they can only virtualize certain types of programs, the versatility of the zkVM will be greatly reduced. Apart from versatility, there are other indicators to consider, which include succinctness, recursive availability, composability and accessibility.

Here we are mainly talking about recursive availability:

Question: Assuming that the computer only supports two-digit addition and doesn’t support multi-digit addition and multiplication.

Please compute 1+2+3+…….+99+100?

The non-recursive version of the solution:

1+2=3, 3+3=6, 6+4=10,…, 4950+100=5050

A total of 99 additions were performed. (The concept of iteration is not introduced here, please search and learn if you are interested)

Recursive version of solution:

1+2+3+….+99+100

In this case, we only compute the total sum of 1 and (2+3+….99+100), the same method can be used again for the computation of (2+3+….+99+100), which is to switch from computing (2+3+…+99+100) to compute 2 and (3+..+99+100) instead, by repeating this method till it meets the restriction of adding two digits, that is 99+100=199, finally, we use 199 to end the recursion and compute the result.

recursive availability allows you to directly reuse the solution. This can greatly simplify the thought process of solving practical problems.

Previously we mentioned that the use of zkVM has no hard link to the blockchain, so, next we are going to try and explore the combination of zkMV and blockchain. First of all, let’s go back to Ethereum, this innovative world computer with smart contracts or in a more technical way a distributed virtual machine. Broadly defined EVM (Ethereum virtual machine) is equivalent to Ethereum and includes modules such as variable machine state, read-only EVM code and stored account data. Narrowly defined EVM is equivalent to read-only EVM code and EVM opcode of the Ethereum virtual machine.

As far as Ethereum is concerned, the whole network maintains the canonical chain of consensus all the time, this canonical chain is determined by the longest block. The transactions is bundled in Block will drive the Ethereum world state to change. When we complain about the shortage of network throughput and privacy, the real root cause is the bottleneck and flaws of Ethereum’s internal components. At the moment, let us enjoy the beautiful expectation for now if all the modules in the whole Ethereum can be zero-knowledge proven, then the computation of transactional execution will no longer require Ethereum’s processing, Ethereum only needs to verify the validity of transactions and Ethereum’s internal account data can be selectively open and transparent according to demands. In this way, the transaction processing capability of Ethereum can be greatly released, and privacy data on the chain can also be ensured. With the current diversity of zero-knowledge proof implementation, Ethereum is not limited to a certain type of zero-knowledge implementation, it can compatibly support the implementations of all types of mainstream zero-knowledge proof, and at the same time, it can keep openness and scalability

The expectation is always beautiful, but the reality is often bitter or even cruel. When Ethereum was founded, introducing zero-knowledge proof was not considered. Therefore, if Ethereum is directly zero-knowledge proven, the biggest problem that comes with it will be the vast number of computations caused by components of Ethereum’s protocols, in essence, the advantage of zero-knowledge proof is not utilized at all. Sure, Ethereum will iterate and update itself with time and technological development. Just like the epochal change of understanding consensus mechanisms from PoW to PoS (Proof of Work to Proof of Stake), it is possible to achieve Ethereum zero-knowledge proven through future Ethereum upgrades.

Figure 5: ETHEREUM VIRTUAL MACHINE (EVM)

Figure 6: ETHEREUM VIRTUAL MACHINE (EVM)

On the 4th of August, 2022, the founder of Ethereum Vitalik Buterin published an article: The different types of ZK-EVMs. This article is similar to the Ethereum EVM categories previously proposed by the Scroll team, and Ethereum researcher Justin Drake. We can have a clearer understanding of the pros and cons of zkEVM/zkVMs from the figures below.

Figure 7: The different types of ZK-EVMs

Now, let’s briefly summarise the progress of the whole zkVMs/zkEVMs technology team and project.

The study and implementation of zkVM and zkEVM face different implementing problems, there is no sequential dependency, so they can progress simultaneously. For specific design challenges and solutions, please read the information and documents from each project. There are different considerations for different projects, at the moment, most of them will prioritize the zkEVM method, after all the solid demand for scaling will always exist. In addition, Ethereum’s ultimate goal for ZK is to be fully compatible. From this perspective, research projects based on zkEVM maybe evolve into Ethereum’s multiple light nodes client end. As for zkVM, we can only say that blockchain is going to be its partner on the path, and its distant dreams are not limited to the blockchain.

References:

[1] https://github.com/sec-bit/learning-zkp/blob/master/zkp-resource-list.md zero-knowledge proof learning resources summary

[2] https://mp.weixin.qq.com/s/808jMXvIUqB973aVHrAzGQ Foresight Ventures: understanding zk, zkVM, zkEVM and the future

[3] https://www.odaily.news/post/5178462 Scroll study: design challenges and solutions of zkEVM

[4] https://www.odaily.news/post/5177903 LD Capital: An overview of ZK all-star project

[5] https://mirror.xyz/cvalleylive.eth/Deag4YB_EYHaTDv0lc5iPhLpWxzMLG1tyC2GSTwJE4k hardcore interpretation: why is ZK important？

[6] https://ethereum.org/en/developers/docs/evm/ Ethereum Virtual Machine

ArkStream Capital: A deep dive into zero-knowledge investment opportunities about the blockchain scaling and privacy protection (1)

Written by ArkStream Capital

No responses yet