This proposal introduces a design for "minimum value selection" storage proofs on Merkle trees. The design consists of two main components:
Additionally, the proposal discusses the practical implementation of this design in various scenarios and suggests some improvements to the ERC-721 and ERC-1155 standards.
The ERC-721 and ERC-1155 standards are widely used in the NFT fields. However, the current standards do not provide a mechanism for verifying the existence of public data. This is a major obstacle to the development of many applications, such as decentralized data markets, decentralized data storage, and decentralized data oracles.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 and RFC 8174.
MixHash is a Merkle tree root hash value that incorporates data length information. Its structure is as follows:
Given a file, we can construct a MixHash through the following defined steps:
File MUST Split into 1KB chunks. MUST Pad zeros to the end of the last chunk if needed.
Calculate the hash for each chunk and the low 128bits is the Merkle Tree leaf value.
Construct a Merkle tree , root node hash algorithm is 256bits, other node use low 128bits of the 256bits hash result.
Return the combination of hash type, the file size, and the low 192 bits of the Merkle tree root node hash.
MixHash retains a length of 256 bits, so replacing the widely used Keccak256 and SHA256 with MixHash incurs no additional cost. Although including the file length in the upper 62 bits compromises security to some extent, the 192-bit hash length is already sufficient for defending against hash collisions.
The following is the pseudo code for generating Mixhash:
When MixHash is used to identify a piece of public data, anyone can construct a storage proof to demonstrate possession of a copy of that data. Here is a typical process for using a public data storage proof:
mix_hash_d) based on a block at height h. A 256-bit nonce value for the proof is derived from this block (usually directly using the block's hash).m. This is done by attempting to append the nonce value to the end of each chunk to minimize the new Merkle tree root hash. After determining m, the path m_path and leaf node value m_leaf_data of m are extracted.h using {mix_hash_d, h, m, m_path, m_leaf_data} and submits it to the public network.m, m_path, and m_leaf_data based on mix_hash_d: verifying that m is indeed a chunk of D. The timeliness of the proof can be verified through h. After passing both correctness and timeliness checks, the public network calculates proof_result_m based on the nonce value and existing proof information, and saves it.{mix_hash_d, h, better_m, better_m_path, better_m_leaf_data} to challenge the published storage proof.proof_result_m and proof_result_better_m. A successful challenge indicates the old storage proof was forged. If no one challenges the published storage proof within a certain timeframe, it can be considered correct from a game-theoretic perspective.With an understanding of the above process, let us describe the generation and verification of storage proofs more precisely using Pseudocode.
To minimize the size of the storage proof as much as possible, we have optimized the implementation of getMerkleTreeRoot: besides the RootHash, the hash values of other nodes are truncated to the lower 128 bits. This approach effectively compresses the hash value of a complete Merkle tree to half its size. The full implementation details can be found in the subsequent Reference Implementation section.
As can be seen from the process described above, the core of constructing public data storage proofs is based on a public, non-repeating nonce value generated at a specific moment. It requires traversing the entire content of the file within a designated time to construct a correct proof. Without restrictions, this process is vulnerable to external data source attacks: Suppliers do not store data locally but obtain it through network requests when constructing storage proofs. How does our design prevent such attacks?
Time-Limited Response: Suppliers must submit storage proofs within a specified time. On a typical public network like Ethereum, the block time is about 15 seconds. A typical maximum block interval could be 2 (MAX_BLOCK_DISTANCE = 2), meaning Suppliers must complete the construction and submission of the storage proof within 30 seconds. This duration is insufficient for most data sources to complete transmission, thus Suppliers must store data locally to have a chance to construct storage proofs within the allotted time.
Economic Game Theory: The economic model based on public data storage proofs usually rewards the first Supplier to submit a correct storage proof. This means that, from a game-theoretic standpoint, the inherent delay in using external data sources to construct storage proofs reduces the likelihood of successful submission. Economically, it's less profitable than the expected gains from storing data locally. The economic model incentivizes Suppliers to store data locally.
Using a strategy combining block interval limitations and priority for first-time submissions is often effective in defending against external data source attacks. The effectiveness of this approach primarily relies on the difference in speed between reading files from local storage and retrieving files from the network. We can define the success rate `R`` of defending against external data source attacks using the following formula:
The larger the AvgProofTime, the lower the success rate of defending against Sourcing Attack. Currently, the most significant factor affecting AvgProofTime is the average time for on-chain transactions. For example, in the BTC network, the time for 2 blocks is approximately 20 minutes. With such a large AvgProofTime, the success rate `R`` decreases rapidly, making sourcing attacks more likely to succeed. We can introduce a dynamically adjustable Proof of Work (PoW) mechanism to further defend against Sourcing Attack. This modification alters the formula as follows:
With the introduction of the Proof of Work (PoW) concept, the strategy for submitting storage proofs becomes: constructing and submitting storage proofs within a specified time while endeavoring to complete as much PoW computation as possible. In the valid proof time window, the storage proof with the greater amount of PoW computation prevails. Such a mechanism can effectively defend against external data source attacks, especially when AvgProofTime is large.
Integrating a PoW mechanism into the design of public data storage proofs is not complex. A simple implementation could modify the second step to:
The Pseudocode adjusted according to the above modifications is as follows:
Applying this mechanism increases the cost of generating storage proofs, which deviates from our initial intent to reduce the widespread effective storage of public data. Moreover, heavily relying on a PoW-based economic model may allow Suppliers with significant advantages in PoW through specialized hardware to disrupt the basic participatory nature of the game, reducing the widespread distribution of public data. Therefore, it is advised not to enable the PoW mechanism unless absolutely necessary.
The storage proofs discussed in this paper are not suitable for storing very small files, as small files inherently struggle to defend against external data source attacks.
Public data storage proofs do not address the issue of whether the data is genuinely public. Therefore, it is important to verify the public nature of MixHash in specific scenarios (which is often not easy). Allowing Suppliers to submit storage proofs for any MixHash and receive rewards would lead to a situation where Suppliers create data only they possess and exploit this to gain rewards through constructed attacks, ultimately leading to the collapse of the entire ecosystem.
We can use the existing Ethereum ecosystem to confirm whether a MixHash is public data and track its value. For any contracts related to unstructured data, the ERCPublicDataOwner interface can be implemented. This interface determines whether a specific MixHash is associated with the current contract and attempts to return an Owner address corresponding to a MixHash. Additionally, for the existing and widely recognized NFT ecosystem, we suggest that new ERC-721 and ERC-1155 contracts implement a new extension interface ERC721MixHashVerify. This interface can explicitly associate an NFT with a MixHash. The specific interface definition is as follows:
The ERC721MixHashVerfiy extension is OPTIONAL for ERC-721 smart contracts or ERC-1155 smart contracts. This extension can help establish a relationship between specified NFT and MixHash.
Storage proofs (often referred to as space-time proofs) have long been a subject of interest, with numerous implementations and related projects existing.
Using HashType allows storage proofs to be compatible with EVM-compatible public blockchain systems, as well as BTC-Like public blockchain systems. In fact, MixHash could become a new cross-chain value anchor: it can track the value of the same data represented by MixHash across different public blockchain networks using different models, achieving the aggregation of cross-chain values. Considering the need for backward compatibility, we have set the default HashType of MixHash to SHA256. Two categories of HashType remain unused, leaving ample room for future expansion.
PublicDataProofDemo includes test cases written using Hardhat.
PublicDataProof Demo
DMC public data inscription
Learn more background and existing attempts
This storage proof revolves around public data. In demonstrating storage proofs, it often involves sending 1KB segments of the data to the public network. Therefore, please do not use the storage proof design presented in this paper for private data.
The design of MixHash can support storage proofs for private files, but this requires some adjustments in the processing of the original data and the construction of the storage proof. A detailed discussion on the design of storage proofs for private files is beyond the scope of this paper. In fact, some of the projects mentioned in the Reference Implementation section use both public data storage proofs and private data storage proofs.
Copyright and related rights waived via CC0.