ArkStream Capital: Why We Are Investing in Verifiable Computation and Space and Time
Date: Aug 28, 2024
TL;DR
1. Verifiability is one of the core characteristics of blockchain, ensuring basic transparency and security through independent verification of every operation or transaction on the network.
2. On-chain verifiability refers to performing all computation and execution on the blockchain, while off-chain verifiability involves moving some data and computation off the chain.
3. The verifiable computation layer is an off-chain verifiability service that supports computation scenarios requiring high trust and security.
4. Zero-Knowledge Proof (ZKP) technology plays a crucial role in the development of the verifiable computation layer.
5. Relational databases are based on the relational model, which is grounded in the mathematical principles of set theory and first-order logic, providing a rigorous formal framework for data storage and manipulation.
6. The computational theory of the relational model is relational algebra, which provides a set of mathematical operations for data manipulation, such as selection, projection, and join.
7. SxT’s Proof of SQL technology enhances the verifiability of SQL, bridging the gap between Web2 and Web3.
8. SxT’s core cryptography team has developed and open-sourced the Blitzar GPU acceleration framework to speed up Proof of SQL.
9. SxT achieves HTAP (Hybrid Transactional/Analytical Processing) through its data fabric architecture, enabling real-time transactions and complex analysis. It also stores common data from multiple mainstream blockchains in a ZKP-friendly manner, providing tamperproof, real-time, and decodable data services.
I. Blockchain Verifiability and the Verifiable Computation Layer
Blockchain Verifiability
Verifiability refers to the ability to independently verify every operation or transaction within the network, ensuring its authenticity and legality. For blockchain technology, verifiability is one of its core features, ensuring basic transparency and security.
In Satoshi Nakamoto’s Bitcoin whitepaper, he proposed a new peer-to-peer electronic cash system aimed at eliminating reliance on third-party intermediaries. The core of this system is the verifiability of transactions. Every transaction in the Bitcoin network is verified and confirmed through a Proof-of-Work (PoW) mechanism. Miners create new blocks by solving complex mathematical problems, and these blocks contain several transaction records. Each block is linked to the previous one through a hash function, forming an immutable blockchain.
In this system, transaction verifiability is achieved by multiple participating nodes. Each node can verify the legitimacy of the transactions within a block, and these verification processes are public and transparent. Anyone can download the open-source Bitcoin client to view and verify transaction records on the blockchain. This transparent and verifiable mechanism ensures the credibility and security of Bitcoin transactions, preventing the double-spending problem.
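The hash-linking described above can be illustrated with a deliberately simplified Python sketch. It omits Proof-of-Work, signatures, and Merkle trees, and the block contents are invented; the point is only that tampering with any earlier block breaks the chain of hashes that later blocks commit to.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents deterministically."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def verify_chain(chain: list[dict]) -> bool:
    """Check that every block commits to the hash of its predecessor."""
    return all(curr["prev_hash"] == block_hash(prev)
               for prev, curr in zip(chain, chain[1:]))

# A toy two-block chain: each block records transactions and the previous block's hash.
genesis = {"prev_hash": "0" * 64, "txs": ["alice->bob:1"]}
block_1 = {"prev_hash": block_hash(genesis), "txs": ["bob->carol:0.5"]}
print(verify_chain([genesis, block_1]))   # True

genesis["txs"][0] = "alice->mallory:1"    # tamper with history...
print(verify_chain([genesis, block_1]))   # ...and the link no longer verifies: False
```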
Ethereum extended the application scope of blockchain by introducing smart contracts, enabling more complex transactions and applications. The verifiability of smart contracts lies in their code and execution results. Smart contracts are code that runs on the Ethereum blockchain, and their code is public, allowing anyone to view and audit it. The process of verifying a smart contract’s execution is carried out by every node in the network. Each node independently executes the smart contract code and verifies the consistency of the results. This decentralized verification mechanism ensures the reliability of smart contract execution results and eliminates the risk of a single point of failure. All contract execution processes are recorded on the blockchain, ensuring transparency and immutability.
On-Chain and Off-Chain Verifiability
When all transactions and smart contract executions occur on the blockchain and are jointly verified by the network nodes, we refer to this as on-chain verifiability; Bitcoin's P2PKH transactions are an example. However, in some cases, concentrating all data and computation on the blockchain can be inefficient and costly. To address this, the industry has developed off-chain verifiability solutions, including Rollups based on optimistic fraud proofs or zero-knowledge proofs, and decentralized oracle networks.
On-chain verifiability relies on the blockchain's consensus mechanism and node verification, ensuring that all transactions and computations are transparent and secure. Off-chain verifiability, by contrast, improves efficiency and reduces costs by moving part of the data and computation off-chain, relying on independent verification networks or on highly trusted on-chain verification. For instance, Rollups reduce the on-chain load by processing and batching large numbers of transactions before submitting them to a layer-one network, thereby inheriting the layer-one network's security. Decentralized oracle networks use off-chain data sources and computation, submitting the results on-chain to ensure data reliability and accuracy.
Off-chain verifiability overcomes the limitations of pure on-chain verifiability, making it applicable to more scenarios. It not only improves the scalability and efficiency of blockchains but also reduces transaction costs, making complex computation and large-scale data processing possible. Whether in finance, gaming, or social networking, off-chain verifiability shows broad application prospects.
Verifiable Computation Layer
The verifiable computation layer is an off-chain verifiability service that supports a wide range of computation tasks. This layer is particularly suited to scenarios requiring high trust and security, including but not limited to:
1. AI Model Inference: Executing artificial intelligence model inference processes to ensure accuracy and traceability of the results.
2. Blockchain Data Indexing: Providing fast query and indexing services for blockchain data, enhancing the efficiency and convenience of data retrieval.
In this way, the verifiable computation layer expands the application scope of blockchain and AI, allowing complex computations to be executed off-chain while maintaining the verifiability and transparency of the results.
Zero-Knowledge Proof (ZKP) technology plays a crucial role in the development of the verifiable computation layer. As ZKP technology continues to advance, the verifiable computation layer can be segmented into several specialized domains to meet different computational needs and application scenarios:
1. ZKP Coprocessor Network: This network consists of specially designed coprocessors optimized for zero-knowledge proof computation. These coprocessors can efficiently execute and verify complex computations that are difficult to handle on traditional blockchains, thereby expanding the computational capabilities of the blockchain.
2. General-Purpose Verifiable Computation Layer: This is a broader computation platform designed to provide cross-platform verifiable computation services. It not only serves the blockchain industry but also seamlessly integrates with existing Web2 systems. This versatility allows verifiable computation to be applied in a wide range of fields, including but not limited to financial services and artificial intelligence.
The general-purpose verifiable computation layer has several significant advantages compared to the ZKP coprocessor network:
1. Broader Applicability: It is not limited to specific computational tasks and can adapt to a wide variety of computational needs.
2. Facilitates Technological Integration: As a bridge for transitioning from Web2 to Web3, it aids in integrating existing systems with blockchain technology, driving innovation and diversifying applications.
3. Enhanced Interoperability: By offering compatibility with Web2 environments, the general-purpose verifiable computation layer enhances interoperability between different systems and platforms.
On the other hand, the technical complexity of implementing a general-purpose verifiable computation layer can be a drawback. It must accommodate a wide variety of computational tasks and scenarios, which raises the requirements on abstraction and compatibility in its design. Additionally, maintaining broad applicability may introduce extra performance overhead. In contrast, the ZKP coprocessor network, focused on specific tasks, can be more efficient.
The broader verifiable computation layer includes various subtypes, such as Zero-Knowledge Virtual Machines (zkVM), Zero-Knowledge Machine Learning (zkML), Zero-Knowledge Coprocessors (zkCoprocessor), oracles, and some general-purpose verifiable computation layers. In certain application scenarios, these technologies may overlap in functionality — for example, between zkCoprocessors and general-purpose verifiable computation layers — but they generally complement each other, creating synergies. To clearly define their respective functional scopes and advantages, we have provided a simple distinction between them.
II. Why We Invested in Space and Time
One promising project actively exploring the general-purpose verifiable computation layer is Space and Time (hereafter referred to as SxT). By integrating advanced zero-knowledge proof technology, SxT has independently developed key technologies, including Proof of SQL and a high-performance GPU proof generator. SxT plans to launch a Layer 2 based on the zkSync technology stack, providing high-performance verifiable computation services for AI and blockchain.
Before diving into Proof of SQL, let’s briefly review the basics of relational databases and SQL, which will help us better understand the concept and mechanism of Proof of SQL.
An Introduction to Relational Databases and SQL
Since their inception in the 1970s, relational databases, based on the relational model, have been the cornerstone of data management. The relational model is grounded in the mathematical principles of set theory and first-order logic, providing a strict formal framework for data storage and manipulation. The introduction of relational databases represented a major shift in data management, moving from early file systems to a structured and normalized tabular form. This shift made data organization more orderly and operations more consistent and reliable. A common analogy is that a table in a relational database is akin to a worksheet in Excel, representing an entity in the relational model. A single table can be queried and indexed, while different tables can be related to one another through relational operations.
The computational theory behind relational models is relational algebra, which offers a set of mathematical operations for manipulating data, such as selection, projection, and join. Relational algebra defines the basic units of data operations, allowing data to be queried and manipulated without altering the data itself. SQL (Structured Query Language) is a high-level query language based on relational algebra theory. It provides a declarative way to query, update, and manage data. SQL is designed so that users can write queries in a syntax close to natural language, while the underlying database management system is responsible for optimizing and executing these queries.
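To make the mapping from relational algebra to SQL concrete, here is a small self-contained sketch using Python's built-in sqlite3 module; the tables, columns, and data are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two toy relations (tables): users and orders.
cur.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.5), (3, 2, 7.0);
""")

# Selection (WHERE), projection (the column list), join, and aggregation in one query:
cur.execute("""
    SELECT u.name, SUM(o.amount) AS total        -- projection + aggregation
    FROM users u
    JOIN orders o ON o.user_id = u.id            -- join
    WHERE o.amount > 5                           -- selection
    GROUP BY u.name
""")
print(cur.fetchall())   # e.g. [('alice', 15.5), ('bob', 7.0)]
```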
With SQL, relational databases can perform complex data computations and support various queries, including multi-table joins, nested subqueries, and aggregation operations. This capability allows relational databases to efficiently handle not only daily transactional operations (such as insert, update, and delete) but also execute complex data analysis tasks, such as business intelligence and report generation. Modern relational databases have further extended SQL’s capabilities to support window functions, recursive queries, and large-scale data processing in distributed environments, enhancing their application in complex computations.
The combination of relational databases and SQL makes data management and analysis more intuitive and efficient. In today’s computing landscape, relational databases remain a core component of many applications, and SQL continues to play a key role as the primary tool for querying and manipulating data in large-scale data processing and complex computations.
Architecture and Implementation of Relational Databases
A relational database system is composed of two main parts: the client and the server, which work together to ensure efficient data management and operations.
The client acts as the bridge between the user and the database, enabling users to execute SQL queries and commands easily through a graphical user interface (GUI) or a command-line interface (CLI). The core responsibilities of the client include:
1. Providing an interface for querying and data manipulation, allowing users to retrieve, add, update, or delete data in the database.
2. Supporting data analysis, enabling users to apply advanced data processing techniques such as aggregate functions and window functions.
3. Managing the database, including maintaining database objects and controlling user permissions.
The server is the heart of the database system, responsible for handling client requests, executing data operations, and maintaining data storage. The server’s key responsibilities include:
1. Query Processing: This involves parsing the user’s SQL queries, devising execution strategies, and executing these queries.
2. Data Storage: Ensuring data persistence and managing storage structures like tables and indexes to optimize read and write efficiency.
Due to its complexity, the server typically comprises multiple components, such as:
1. Storage Engine: Responsible for the physical storage of data and indexing mechanisms. Different storage engines may be optimized for specific types of operations.
2. Parser: Reads and analyzes SQL statements, converting them into an internal format that can be executed.
3. Indexer: Creates and maintains indexes to speed up data retrieval.
4. Query Optimizer: Parses SQL queries and generates efficient execution plans.
5. Executor: Executes the SQL according to the parsed query plan, retrieving or modifying data.
6. Transaction Manager: Ensures the atomicity, consistency, isolation, and durability (ACID) of database operations.
Note: The components such as the parser, storage engine, optimizer, and executor can collectively be understood as the Query Execution Engine.
Through this layered and modular design, relational databases can flexibly adapt to various data management needs while maintaining operational efficiency and data integrity.
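As a rough illustration of how the parser, optimizer, indexer, and executor cooperate, the sketch below asks SQLite (used here only as a convenient stand-in, with a made-up schema) to show the execution plan its optimizer chooses for a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, ts INTEGER);
    CREATE INDEX idx_events_kind ON events(kind);   -- maintained by the indexer
""")

# The parser turns the SQL text into an internal representation, the optimizer
# chooses a plan (here: a search using the index on `kind`), and the executor
# would then carry that plan out against the storage engine.
for row in cur.execute("EXPLAIN QUERY PLAN SELECT * FROM events WHERE kind = 'transfer'"):
    print(row)
# e.g. (2, 0, 0, 'SEARCH events USING INDEX idx_events_kind (kind=?)')
```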
Among the many relational databases that implement SQL are MySQL, an open-source system well suited to rapid development and small to medium-sized applications; PostgreSQL, a standards-compliant system ideal for complex queries and strict data-integrity requirements; and SQL Server, an enterprise-grade system with comprehensive integrated tools.
Proof of SQL
Even the most sophisticated design can pose potential risks if misused. SQL is no exception, with one of the most well-known risks being SQL injection attacks. Attackers can embed malicious code into legitimate SQL queries, manipulating the database to execute unauthorized operations. This can lead to sensitive data leaks, data tampering, or even complete system control. Traditionally, parameterized queries, input validation, and the principle of least privilege are employed to prevent SQL injection attacks. However, despite these defensive measures, potential risks in SQL query execution still exist, and data users may find it challenging to effectively verify the execution process.
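As context for the traditional defenses mentioned above, here is a minimal Python sketch (using the built-in sqlite3 module and a made-up schema) contrasting a string-spliced query with a parameterized one. It shows why parameterization blocks the classic injection pattern, though, as noted, it does nothing to make the query's execution itself verifiable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE accounts (user TEXT, balance REAL)")
cur.execute("INSERT INTO accounts VALUES ('alice', 100.0)")

user_input = "alice' OR '1'='1"   # a classic injection payload

# Vulnerable: the payload is spliced into the SQL text and changes its meaning.
unsafe = f"SELECT * FROM accounts WHERE user = '{user_input}'"
print(cur.execute(unsafe).fetchall())               # returns every row

# Parameterized: the payload is bound as a plain value, never parsed as SQL.
print(cur.execute("SELECT * FROM accounts WHERE user = ?", (user_input,)).fetchall())  # []
```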
SxT’s Proof of SQL technology enhances the verifiability of SQL, bridging the gap between Web2 and Web3. This technology has already been integrated into Google BigQuery and is available on the Microsoft Azure Marketplace.
The sequence diagram in the Proof of SQL documentation clearly shows where zero-knowledge proofs (ZKPs) enter the SQL execution process, along with the additional mechanisms SxT introduces, such as data ingestion and commitment. Proof of SQL abstracts three core roles: the Data Source, the Verifier, and the Prover. Before a data table is created, Proof of SQL ensures the integrity and tamper resistance of the original data through the data ingestion and commitment steps. When a specific SQL query is executed, the Prover not only runs the traditional SQL query but also generates and submits a zero-knowledge proof, enhancing the security and verifiability of the query.
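To make the commit-then-verify flow concrete, here is a deliberately simplified Python sketch. It is not Proof of SQL's actual scheme (which relies on homomorphic cryptographic commitments and real zero-knowledge proofs); the hash-based commitment, table contents, and query below are all invented for illustration.

```python
import hashlib

def commit(rows: list[tuple]) -> str:
    """Toy table commitment: a running hash over the rows, in order.
    Proof of SQL itself uses homomorphic cryptographic commitments,
    not a plain hash; this only illustrates the overall flow."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(row).encode())
    return h.hexdigest()

# 1. Ingestion: the data source publishes a commitment to the table up front.
table = [(1, "alice", 10.0), (2, "bob", 7.0)]
published_commitment = commit(table)

# 2. Query time: an untrusted prover claims to hold the same table and returns
#    a result; the verifier checks the prover's data against the commitment.
prover_table = [(1, "alice", 10.0), (2, "bob", 7.0)]
assert commit(prover_table) == published_commitment    # data was not tampered with
result = sum(amount for _, _, amount in prover_table)  # the "query"
print(result)  # 17.0
```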
To support these different functions, SxT has designed a three-layer decentralized node architecture consisting of the Prover, the Validator (Indexing Node), and the Transaction Serving (Consensus Node). In this architecture, the Prover's role matches the sequence diagram discussed above, while the roles of the Data Source and the Verifier are taken on by the Validator (Indexing Node) and the Transaction Serving (Consensus Node).
The Validator (Indexing Node) is a lightweight node that does not require GPU support and focuses on indexing data and generating cryptographic fingerprints for the Prover nodes. These nodes obtain raw blockchain data from RPC providers, decode it, and convert it into a relational data model. They submit the indexed data to the transaction nodes for consensus approval and efficient storage. Additionally, Validators strengthen data security by providing cryptographic signature libraries that data providers use before ingestion. The Validator design supports an updatable commitment scheme, providing precise cryptographic assurances for each row of data in the database. Their commitment signatures are crucial for verifying every transaction and event on the blockchain.
The Transaction Serving (Consensus Node) handles multiple network requirements, including Byzantine Fault Tolerance (BFT) consensus, serialization and compression during data ingestion, and low-latency, tamper-resistant query serving. These nodes use threshold signatures to reach consensus when they receive indexed blockchain data commitments from the Validator nodes, buffer raw data for storage compression, and verify transactions in a decentralized manner through the consensus service. The design employs Multi-Version Concurrency Control (MVCC) to ensure data consistency: data is buffered in memory as Apache Arrow record batches before being compressed into Parquet files, optimizing storage efficiency. Additionally, transaction nodes use the Apache DataFusion library to serve fast queries that do not require zero-knowledge proofs, and they expose a comprehensive REST API for common blockchain data requests such as token transfers, balance queries, transaction history, and block metadata.
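To illustrate the buffer-then-compress pattern described above, here is a minimal sketch using the pyarrow library; it is not SxT's ingestion code, and the schema and values are invented.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Buffer incoming data in memory as Arrow record batches, then flush the
# buffer into a compressed, columnar Parquet file (schema and values invented).
schema = pa.schema([("block", pa.int64()), ("tx_hash", pa.string()), ("value", pa.float64())])

batches = [
    pa.RecordBatch.from_pydict(
        {"block": [100, 101], "tx_hash": ["0xabc", "0xdef"], "value": [1.5, 0.2]}, schema=schema),
    pa.RecordBatch.from_pydict(
        {"block": [102], "tx_hash": ["0x123"], "value": [3.0]}, schema=schema),
]

table = pa.Table.from_batches(batches)           # consolidate the in-memory buffer
pq.write_table(table, "blocks.parquet")          # Parquet compresses column by column
print(pq.read_table("blocks.parquet").num_rows)  # 3
```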
ZKP Proof Generation and Acceleration Framework
In ZKP projects, proof generation is not only the foundation of technical implementation but also the cornerstone of realizing the value of ZKP. Without a reliable prover, the practical application and potential value of ZKP technology cannot be achieved. Ensuring the accuracy of ZKP proof generation is a prerequisite for the widespread adoption and continued development of this technology.
The process of generating proofs by SxT’s Prover is a highly integrated four-step process. First, the Prover parses the SQL query text and creates an Abstract Syntax Tree (AST), which serves as the foundation for subsequent steps. Next, the Prover executes the query and constructs an interactive protocol for proof generation, which relies on the data in the database. During the query execution, the Prover generates necessary intermediate values, known as “witnesses,” which are critical for verifying the query execution but are unknown to the verifier. The Prover then computes commitments, which are cryptographic representations of the witness data, used to prove its existence without revealing the data itself. Finally, the Prover constructs constraints, which are part of the proof, to demonstrate the correctness of the query results to the verifier. Throughout the process, the Fiat-Shamir transformation in ZKP allows the proof process to be non-interactive, meaning the verifier does not need real-time communication with the Prover to verify the validity of the proof.
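The Fiat-Shamir idea mentioned above can be sketched in a few lines: the verifier's random challenge is replaced by a hash of the public transcript, so prover and verifier can each derive the same challenge without any interaction. The transcript contents and modulus below are placeholders, not SxT's actual protocol parameters.

```python
import hashlib

def fiat_shamir_challenge(transcript: bytes, modulus: int) -> int:
    """Derive the verifier's 'random' challenge by hashing the public transcript,
    so no live interaction with a verifier is needed (heavily simplified)."""
    digest = hashlib.sha256(transcript).digest()
    return int.from_bytes(digest, "big") % modulus

# The transcript would include the statement being proven and the prover's
# commitments; prover and verifier can both recompute the same challenge.
transcript = b"statement: SELECT SUM(value) FROM transfers | commitment: 0x51ab..."
print(fiat_shamir_challenge(transcript, modulus=2**61 - 1))
```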
Moreover, since the Prover is crucial to the execution of ZKP technology, its efficiency directly impacts usability. ZKP technology can only realize its application value and be adopted in real-world production environments if proof generation time is near real-time or accelerated to a user-acceptable range. To this end, SxT’s core cryptography team developed and open-sourced Blitzar, a GPU acceleration framework designed to speed up Proof of SQL. Performance benchmarks available on SxT’s official website provide a clear view of its outstanding performance.
HTAP and Community-Operated Data Warehouse
The diverse demands placed on data operations have led to two different focuses in SQL implementations: OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing). OLTP mainly handles data writes and updates, while OLAP focuses on data computation, querying, and reading. For example, PostgreSQL is an OLTP database that emphasizes data integrity, whereas Snowflake, as an OLAP database, excels at real-time analysis.
In the blockchain world, both OLTP and OLAP are needed. To address this, SxT implements HTAP (Hybrid Transactional/Analytical Processing) through its data fabric architecture. This architecture integrates high-performance OLTP and OLAP capabilities, enabling it to handle real-time transactions and complex analyses. It is particularly optimized for processing blockchain data and can handle massive volumes of events and transactions from major blockchains. Additionally, as a decentralized multi-node system, SxT can easily scale beyond 1TB of data while providing row-based OLTP in-memory caching and column-based GPU-accelerated OLAP, ensuring high throughput and real-time data analysis. Currently, SxT stores general data from multiple mainstream blockchains in a ZKP-friendly manner and offers real-time, decodable, and tamperproof data services.
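A toy contrast between a row-oriented write buffer and a columnar analytical view gives a feel for why HTAP systems keep both representations. The sketch below uses pyarrow with invented data; it is not SxT's engine.

```python
import pyarrow as pa
import pyarrow.compute as pc

# Row-oriented in-memory buffer: cheap to append to, suits transactional writes (OLTP side).
row_buffer = [
    {"token": "ETH", "amount": 1.2},
    {"token": "ETH", "amount": 0.3},
    {"token": "BTC", "amount": 0.05},
]

# Columnar view of the same data: suits scans and aggregations (OLAP side).
columns = pa.table({
    "token":  [r["token"] for r in row_buffer],
    "amount": [r["amount"] for r in row_buffer],
})

eth_rows = columns.filter(pc.equal(columns["token"], "ETH"))  # columnar scan
print(pc.sum(eth_rows["amount"]).as_py())                     # 1.5
```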
Meanwhile, SxT’s whitepaper introduces the design of a decentralized data warehouse, offering smart contracts an efficient way to reduce computational and storage burdens, thereby accelerating contract execution and lowering on-chain transaction costs. This solution aligns with the decentralized ethos of Web3, enabling decentralized control of data through community-owned and operated nodes. Users can directly manage and access data while choosing between data transparency or privacy protection through zero-knowledge proofs as needed. As more chains and participants join the network, SxT’s network effects and scalability are significantly enhanced, enriching public datasets. This design not only allows data consumers to verify the accuracy of information but also ensures that everyone can participate in the data collection, processing, and service provisioning process, promoting the democratization of data sharing and usage.
AI and RAG
Large language models (LLMs) face limitations in training, such as context window size, generalization capability, external knowledge access restrictions, and the repetitiveness and predictability of generated content. To overcome these limitations, Retrieval-Augmented Generation (RAG) technology leverages vector search databases to store and manage data in vector form. By calculating the distance between query vectors and vectors in the database, RAG can retrieve data points that are semantically closest to the user's input, thereby extending the context window for LLMs and enhancing the specificity and accuracy of answers.
Although RAG enhances LLM functionality, it also faces challenges such as processing latency and a high dependency on the quality of the retrieved information. The intelligence of RAG-enhanced LLMs largely depends on the relevance and quality of the information retrieved by the vector search database. It is foreseeable that the next generation of LLMs will integrate RAG technology, traditional training methods, and structured data processing.
Currently, SxT has launched Proof of SQL and Proof of Vector Search technologies specifically designed for RAG. These technologies provide LLMs with a new approach, enabling them to integrate the latest contextual information, access a wide range of data sources in real-time, and perform in-depth analysis of structured data, all achieved in a traceable and verifiable manner.
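The retrieval step in RAG can be sketched in a few lines of Python. The three-dimensional "embeddings" below are hand-made stand-ins for a real embedding model and an indexed vector database.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: higher means semantically closer."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "vector database": pre-embedded documents (vectors are made up;
# a real system would use an embedding model and an indexed vector store).
docs = {
    "ETH gas fees spiked in August 2024": np.array([0.9, 0.1, 0.0]),
    "The BTC halving reduced block rewards": np.array([0.1, 0.8, 0.2]),
}
query_vec = np.array([0.85, 0.15, 0.05])  # embedding of the user's question

# Retrieve the closest document and splice it into the prompt as extra context.
best_doc = max(docs, key=lambda d: cosine(query_vec, docs[d]))
prompt = f"Context: {best_doc}\n\nQuestion: Why were transactions expensive in August?"
print(prompt)  # the LLM now answers with retrieved context inside its window
```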
Application Scenarios
Due to SxT’s SQL compatibility and verifiability, it has distinct advantages in both Web2 and Web3 application scenarios, making it highly attractive to numerous projects. Its whitepaper details multiple application scenarios, including:
· Building flexible ZK-Rollup/L2 solutions
· Enabling data-driven dApp development
· Providing secure bridging and multi-chain data backends
· Supporting decentralized lending and derivatives markets
· Enhancing reward systems in gaming and social applications
· Ensuring transaction security and custody of digital assets
· Implementing tokenization of physical assets and dynamic NFTs
Additionally, it can provide transparency and security for settlement systems, third-party audits, liquidity pool management, and CeFi platforms.
III. Conclusion
The story of the verifiable computation field is still in its early stages, even though several projects have been deeply involved in it for years. Over the past six months, the concept has gradually gained attention. For instance, the Bonsol project, launched by a research institution that collaborates closely with the Solana Foundation, and the recent efforts of Fabric, a company focused on achieving verifiability at the chip level, both show how active this field has become. In this wave of technological innovation, SxT has made continuous progress on its Proof of SQL technology through years of R&D and has also proposed innovative solutions in key data technologies like RAG, keeping pace with the development of AI. In the future, we have every reason to believe that SxT will achieve widespread adoption across multiple fields, driving technological advancement and industry development!
Notes
1. Integrity and verifiability differ in that integrity ensures data has not been tampered with, while verifiability ensures the authenticity and legality of data and operations through independent verification.
2. Chainlink’s Functions can achieve some verifiable computation functionalities, but the use cases are relatively limited.
3. Verifiability can be seen as a higher-order property, akin to self-actualization in Maslow's hierarchy of needs, positioned above usability and security.
References
Space and Time Website:
https://www.spaceandtime.io/sxt-platform/indexed-data-and-integrations#indexed-data
Succinct Website:
Axiom Developer Docs:
https://docs.axiom.xyz/protocol/zero-knowledge-proofs/introduction-to-zk
Archetype Verifiable Compute:
https://archetype.mirror.xyz/Lov-dI8FOueUt4J4MXPH9gXLyS4VXfHCdEmSg67jzoM
ABCDE zk Coprocessor:
https://medium.com/@ABCDE.com/en-abcde-a-deep-dive-into-zk-coprocessor-and-its-future-1d1b3f33f946
RISC Zero Developer Docs:
https://dev.risczero.com/api/next/use-cases
Lagrange Website:
https://www.lagrange.dev/zk-coprocessor
Brevis SDK Doc:
SxT Blitzar:
https://github.com/spaceandtimelabs/blitzar-rs
Chainlink Functions:
https://functions.chain.link/playground
Google BigQuery:
Microsoft Azure Marketplace:
Anagram Bonsol:
https://blog.anagram.xyz/bonsol-verifiable-compute/
SQL Joins:
https://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
SxT Proof of SQL:
https://github.com/spaceandtimelabs/sxt-proof-of-sql
Proof of Vector:
https://www.spaceandtime.io/blog/vector-search-to-success
MySQL Architecture:
https://dev.mysql.com/doc/refman/8.4/en/pluggablestorage-overview.html
ArkStream Capital is a venture capital firm specializing in early-stage investments in Web3 unicorns.
Founded by crypto experts with pedigrees from MIT, Stanford, Tencent, Google, and BlackRock, ArkStream leverages eight years of deep Web3 expertise to drive the zero-to-one growth of its portfolio companies.
ArkStream Capital is managing a portfolio of over 100 companies, including Aave, Flow, Sei, Manta, Fhenix, Merlin, Particle Network, and Space and Time.