EIP-7745: Trustless log and transaction index

An efficient, light client and DHT friendly replacement for block header bloom filters


Metadata
Status: Draft
Standards Track: Core
Created: 2024-07-17
Authors
Zsolt Felföldi (@zsfelfoldi)
Requires

Abstract


Replace the fixed 2048 bit log event bloom filters with a new lookup index data structure that can adapt to the changing number of events per block and consistently guarantee a sufficiently low false positive ratio, allowing efficient trustless proofs of log event queries, canonical block hash and transaction hash lookups.

The proposed structure maps all index entries (log events, transaction and block markers) onto a global linear index space and hashes them into a binary Merkle tree based on that index. It also contains a filter map for every fixed length section of the index space. These are two dimensional sparse bit maps that provide an efficient probabilistic method for looking up indexed values or query patterns of values, yielding potential matches in the form of exact positions in the linear index space. Unlike the per-block bloom filters, they allow searching for specific events by accessing only a small portion of the entire dataset which can also be proven with a Merkle proof, making the search both efficient and light client friendly.

The proposed structure can be efficiently used both for local search and for remote proof generation/verification, thereby simplifying implementation of provers and verifiers. It also allows validators that are not interested in either searching or proving logs to generate the index root hash by maintaining a minimal index state with a relatively small (hard capped) size.

Motivation


Adding logs has a significantly lower gas cost and should accordingly be less resource consuming than writing to the state. The original design of per-block bloom filters achieves this goal as there is no complex data structure like the state to update; the set of logs emitted in each block is entirely contained within the header and receipts belonging to that block. Logs mostly just have long term storage costs. On the other hand, searching logs over a long range of blocks is very expensive and the current bloom filters do not really help.

Bloom filters are only useful as long as they are sufficiently sparse. The false positive ratio rises rapidly with the number of events per filter and the density of 1 bits in the filter bit vector. In the current bloom filter each log address and topic sets 3 out of a fixed 2048 bits, which resulted in sufficiently sparse filters in the beginning, but with the increase of block gas limits the false positive ratio soon made the filter practically useless. Mainnet blocks currently add over 1000 log addresses and topics on average, so the bloom filter size would need to increase about tenfold in order to achieve acceptable false positive rates again. This would raise the block header size to about 3 kilobytes. Even if the per-block bloom filters were enlarged to a sufficient size, log search would still require accessing the entire header chain. Searching just the most recent one year of history would cost over 6 gigabytes of data, not counting the access of the actual logs where the bloom filters have a match. The current situation is even worse, requiring a large portion of the full block receipt sets to be accessed due to the high false positive rate of the bloom filters. Transaction inclusion lookup is also very expensive, requiring all block bodies in the searched range.

While full nodes and RPC servers can and usually do build additional index structures to aid certain lookups, these structures cannot be verified remotely. While it is important to improve the transaction processing capabilities of Ethereum, trustless provability is required throughout the entire stack, from end user applications signing transactions, to the same or other end user applications getting the results they need. Scaling transaction processing only makes sense as long as there is also a trustless and scalable way to access the relevant results of those transactions. Relying on trusted servers and indexers defeats the whole point of Ethereum.

Specification


Terms and definitions

  • index entry: a single entry in the index_entries tree associated with an indexed event. It is either a log entry, a transaction entry or a block entry. An index entry generates one or more map values. Each log entry adds one address value and 0..4 topic values (in this order). Each transaction entry adds one transaction value and each block entry adds one block value.
  • map value: a searchable value associated with an index entry. It is either an address value, a topic value, a transaction value or a block value. Each map value is represented by a 32 byte hash (the map value hash) which is calculated as sha2(address), sha2(topic), sha2(tx_hash + b"\x01") or sha2(block_hash + b"\x02") (see the sketch following this list).
  • map value index: a single position in the global linear index space. A new map value index is assigned to each indexed map value.
  • map entry index: the position where an index entry is located in the index_entries tree. Each index entry is located at the position corresponding to the map value index of the first map value it generates.
  • filter map: a MAP_WIDTH by MAP_HEIGHT sized sparse bit map assigned to a VALUES_PER_MAP length section of the map value index space. Each map value assigned to this range is marked on the map at a row and column that depends on the map value index, mapping layer and the map value hash. Rows are sparsely encoded as a list of marked column indices (in strictly ascending order, which also coincides with the order of occurrence). Each map contains at most VALUES_PER_MAP marks and therefore the chance of false positives is kept at a constant low level.
  • mapping layer: each layer defines a (map index, map value hash) => row index mapping and imposes a limit on the number of map values mapped into each row using that particular mapping. Multiple mapping layers can be applied simultaneously on the same filter map. Each map value is mapped using the lowest possible mapping layer where the number of marks in the assigned row is lower than the limit. On layer 0 the row index mapping only changes once per index epoch. On higher layers the mapping changes more frequently and also the row size limit is higher.
  • index epoch: a MAPS_PER_EPOCH sized group of consecutive filter maps stored in the hash tree in a way so that multiple rows of adjacent filter maps with the same row index can be efficiently retrieved in a single Merkle multiproof.
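As a minimal illustration of the map value hash derivation defined above (assuming sha2 denotes SHA-256; the helper names are ours and not taken from the reference implementation):

```python
import hashlib

def sha2(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# Map value hashes as defined above; the suffix bytes distinguish
# transaction and block values from address and topic values.
def address_value(address: bytes) -> bytes:        # 20 byte log address
    return sha2(address)

def topic_value(topic: bytes) -> bytes:            # 32 byte log topic
    return sha2(topic)

def transaction_value(tx_hash: bytes) -> bytes:    # 32 byte transaction hash
    return sha2(tx_hash + b"\x01")

def block_value(block_hash: bytes) -> bytes:       # 32 byte block hash
    return sha2(block_hash + b"\x02")
```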

Consensus data format

Block headers

Beginning at the execution timestamp FORK_TIMESTAMP, execution clients MUST replace the logs_bloom field of the header schema with log_index_root which is the root hash of the LogIndex structure after adding the logs emitted in the given block.

The resulting RLP encoding of the header is therefore:
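The exact field list depends on which forks are active at FORK_TIMESTAMP; the sketch below assumes the current (post-Prague) header layout, with log_index_root occupying the slot previously held by logs_bloom:

```
rlp([
    parent_hash,
    ommers_hash,
    coinbase,
    state_root,
    transactions_root,
    receipts_root,
    log_index_root,  # replaces logs_bloom
    difficulty,
    number,
    gas_limit,
    gas_used,
    timestamp,
    extra_data,
    prev_randao,
    nonce,
    base_fee_per_gas,
    withdrawals_root,
    blob_gas_used,
    excess_blob_gas,
    parent_beacon_block_root,
    requests_hash,
])
```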


Container types

These definitions use the ProgressiveList and ProgressiveByteList SSZ types as defined in EIP-7916.

Note that the LogIndex and IndexEpoch container definitions below are "naive" definitions suitable to define SSZ merkleization. The entire log index is typically too big to be represented in memory as a whole. A more efficient implementation can be found here: log_index.py, binary_tree.py (taken from the EELS implementation). This code keeps only the currently processed parts of the tree in memory, expanding untouched empty subtrees on demand and collapsing finalized subtrees.

Also note that the containers defined here (including FilterRow) have a non-initialized default value, represented as a null leaf in the Merkle tree. Initialization of these containers is explicit and happens at a well defined point. IndexEpoch is initialized when the first index entry is added to the epoch. All rows of a map are initialized when the first index entry is added to the map. IndexEntry is initialized when added; entry positions belonging to unused map entry indices are left uninitialized. The Log container is only initialized in case of log entries. If it is initialized then topics and data lists in the Log container are always initialized.


The entry_meta vector has a fixed format but its interpretation depends on the type of index entry, and the type of entry can also be determined based on its contents (and based on whether log_entry is initialized or not). Whenever IndexEntry is initialized, entry_meta contains one of these containers:


Index entry types and corresponding map values

Three types of index entries are added during state transition: log, transaction and block. For each processed transaction, a transaction entry is added first, then log entries are added in the order of EVM execution. The block entry referring to the processed block is not added during the state transition because it is supposed to reference the block hash which is not known yet. It can be added either after block building/validation or at the beginning of the next state transition. In either case, the block entry of block N first appears in the log index state of block N+1 as the first new index entry.

The following table shows an example of mapping all index entry types onto the map value index space:

| Index entry | Map entry index | Map values | Map value indices |
|-------------|-----------------|------------|-------------------|
| Block #0 | 0 | Block | 0 |
| Tx #1/0 | 1 | Transaction | 1 |
| Log #1/0/0 | 2 | Addr, 3x Topic | 2, 3, 4, 5 |
| Log #1/0/1 | 6 | Addr, 3x Topic | 6, 7, 8, 9 |
| Tx #1/1 | 10 | Transaction | 10 |
| Log #1/1/0 | 11 | Addr, 2x Topic | 11, 12, 13 |
| Log #1/1/1 | 14 | Addr, Topic | 14, 15 |
| Log #1/1/2 | 16 | Addr, 2x Topic | 16, 17, 18 |
| Block #1 | 19 | Block | 19 |
| Tx #2/0 | 20 | Transaction | 20 |
| Log #2/0/0 | 21 | Addr, 4x Topic | 21, 22, 23, 24, 25 |

Note that the missing map entry indices are represented in the index_entries tree with uninitialized IndexEntry containers (zero value leaves in the index_entries vector). Most of these empty entries correspond to the topic value indices. Also, the last few entries of each map might be empty because the map values generated by a single index entry never cross a filter map boundary. So, for example, when VALUES_PER_MAP is 0x10000 and next_index is 0x2FFFE and a log entry with one address and two topics is added, the last two positions of the map are left empty and the new log entry is added starting from position 0x30000.
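A sketch of this boundary rule, assuming a next_index counter tracking the next free position in the map value index space (names are illustrative, not taken from the reference implementation):

```python
VALUES_PER_MAP = 2**16

def assign_value_indices(next_index: int, value_count: int) -> tuple[int, list[int]]:
    # If the values generated by one index entry would cross a filter map
    # boundary, skip ahead to the start of the next map and leave the last
    # few positions of the current map unused.
    if next_index // VALUES_PER_MAP != (next_index + value_count - 1) // VALUES_PER_MAP:
        next_index = (next_index // VALUES_PER_MAP + 1) * VALUES_PER_MAP
    indices = list(range(next_index, next_index + value_count))
    return next_index + value_count, indices

# Example from the text: a log entry with one address and two topics added
# at next_index == 0x2FFFE starts at 0x30000, leaving two positions empty.
assert assign_value_indices(0x2FFFE, 3)[1] == [0x30000, 0x30001, 0x30002]
```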

Filter map row encoding

Each row of the filter map is encoded as a series of little endian binary encoded column indices. For encoding simplicity the column indices are always encoded as 32 bit values, even though MAP_WIDTH is less than 2**32 and the indices would fit into fewer bytes. This only applies to consensus encoding and hashing though; local storage and proof encoding can use a more tightly packed format.
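A sketch of the consensus row encoding (illustrative helper names):

```python
MAP_WIDTH = 2**24

def encode_row(column_indices: list[int]) -> bytes:
    # Consensus encoding: every column index as a 32 bit little endian value,
    # even though MAP_WIDTH < 2**32 would allow a tighter packing.
    assert all(0 <= c < MAP_WIDTH for c in column_indices)
    return b"".join(c.to_bytes(4, "little") for c in column_indices)

def decode_row(encoded: bytes) -> list[int]:
    assert len(encoded) % 4 == 0
    return [int.from_bytes(encoded[i:i + 4], "little") for i in range(0, len(encoded), 4)]
```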

Proposed constants

| Name | Value |
|------|-------|
| MAP_WIDTH | 2**24 |
| MAP_HEIGHT | 2**16 |
| VALUES_PER_MAP | 2**16 |
| MAPS_PER_EPOCH | 2**10 |
| MAX_EPOCH_HISTORY | 2**24 |
| MAX_ROW_LENGTH | [8, 168, 2728, 10920] |
| MAPPING_FREQUENCY | [2**10, 2**6, 2**2, 1] |

Note that MAX_ROW_LENGTH and MAPPING_FREQUENCY parameters are specified as lists because the applicable values depend on the used mapping layer. For layers higher than the index of the last elements of these lists the last element is applicable. Also note that the MAX_ROW_LENGTH values correspond to the cumulative capacity of ProgressiveList tree levels; a layer 0 row fits into the first level, a layer 1 row fits into the first three levels and so on.
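The correspondence with ProgressiveList level capacities can be checked with a few lines of arithmetic, assuming the 1, 4, 16, ... chunk-per-level progression of EIP-7916 and 8 four-byte column indices per 32 byte chunk:

```python
ELEMENTS_PER_CHUNK = 32 // 4   # 8 column indices per 32 byte chunk
cumulative = []
chunks = 0
for level in range(6):
    chunks += 4 ** level       # levels hold 1, 4, 16, 64, ... chunks
    cumulative.append(chunks * ELEMENTS_PER_CHUNK)

print(cumulative)              # [8, 40, 168, 680, 2728, 10920]
# MAX_ROW_LENGTH = [8, 168, 2728, 10920] matches the cumulative capacity
# after 1, 3, 5 and 6 levels respectively.
```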

Constructing the filter map

For each VALUES_PER_MAP long section of the map value index space a filter map is generated. These are fixed size MAP_WIDTH by MAP_HEIGHT sparse bit maps and each map value is marked on the map with a single bit being set to one. The number of marks in a row (the length of the sparse encoded row) is referred to as "row length", not to be confused with the constant MAP_WIDTH.

Epochs and mapping layers

In order to allow efficient search of a certain map value in a long historical range, filter maps are organized into index epochs, each consisting of a fixed MAPS_PER_EPOCH number of maps. In the most usual case (when row density is around average or below) row mapping stays the same during an entire epoch. The hash tree is organized in a way that instead of putting all rows of a single map close to each other, rows of the same row index throughout an entire epoch are close to each other and therefore are cheap to access and/or prove with a Merkle proof.

In order to mitigate collisions in densely populated rows, the concept of mapping layers is introduced: if a certain row already has a certain number of entries then the row mapping is changed and very frequent map values are mapped into multiple rows. Initially, when a map is empty, every map value is mapped using mapping layer 0 or the "base layer". If a row reaches MAX_ROW_LENGTH[0] then any further map values mapped onto that row in the base layer mapping will use layer 1 mapping instead. On layer 1 a different row is assigned to the same map value and the row length limit is increased to MAX_ROW_LENGTH[1]. If this row also reaches its limit, layer 2 mapping is used and so on. Note that a row filled at a certain mapping layer can be grown further on a higher layer. Different map values colliding in the same row on a certain layer are probably mapped into different rows on the next layer, which means that a very popular value might populate multiple rows (also very long ones) but an unlucky, less popular one colliding with it on the base layer will probably just have to move one layer up. The search process is similar: if the searcher finds that the row belonging to the searched value is full according to the base layer limit then it also has to check the next layer and so on, until it finds a non-full row.

If a row is longer than the limit according to the layer the searcher is looking at then it can safely ignore the extra entries assuming that they were added by another value on a higher layer. The ProgressiveList container makes it efficient to prove row data belonging to the lower layer even if there is much more data in the same row added on a higher layer.
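The layer selection described above can be sketched as follows; get_row_length and get_row_index stand in for the row lookup and the row mapping of the next subsection, and all names are illustrative:

```python
MAX_ROW_LENGTH = [8, 168, 2728, 10920]

def row_limit(layer: int) -> int:
    # For layers beyond the end of the list the last element applies.
    return MAX_ROW_LENGTH[min(layer, len(MAX_ROW_LENGTH) - 1)]

def select_row(map_index: int, value_hash: bytes, get_row_length, get_row_index) -> tuple[int, int]:
    # Use the lowest mapping layer whose assigned row is not yet full;
    # a row filled at one layer can still grow further on a higher layer.
    layer = 0
    while True:
        row_index = get_row_index(value_hash, map_index, layer)
        if get_row_length(map_index, row_index) < row_limit(layer):
            return layer, row_index
        layer += 1
```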

Row mapping

While the base layer row mapping stays the same for an entire epoch, higher layer mappings change more frequently. Each mapping change has a cost in terms of database access overhead and Merkle proof size overhead, and the epoch size is chosen so that these overheads stay sufficiently low compared to the cost of accessing the actual useful data. On higher layers where the rows are longer, more frequent remapping is affordable because the useful data size per map is also higher. It is also desirable because a less frequent map value will then only suffer from colliding with a more popular one for a shorter time.

The row index is calculated as follows:
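The normative definition lives in log_index.py; the sketch below is one plausible formulation consistent with the constants above, where the row depends on the map index only through the mapping period of the given layer, so it stays constant for MAPPING_FREQUENCY[layer] consecutive maps (the exact hash input layout is an assumption):

```python
import hashlib

MAP_HEIGHT = 2**16
MAPPING_FREQUENCY = [2**10, 2**6, 2**2, 1]

def get_row_index(value_hash: bytes, map_index: int, layer: int) -> int:
    # On layer 0 the mapping changes once per epoch (every 2**10 maps),
    # on higher layers it changes more frequently.
    freq = MAPPING_FREQUENCY[min(layer, len(MAPPING_FREQUENCY) - 1)]
    period_start = map_index - map_index % freq
    h = hashlib.sha256(
        value_hash + layer.to_bytes(4, "little") + period_start.to_bytes(4, "little")
    ).digest()
    return int.from_bytes(h[:4], "little") % MAP_HEIGHT
```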


The following figure shows how map values are mapped to rows on different mapping layers. Each dot represents a map row and the numbers indicate the mapping layer on which the row has been assigned to the given map value. Note that it might happen that a higher layer mapping coincides with a lower layer mapping for the same value; this does not cause any problem though as the row is simply grown further on the higher layer. The search algorithm can also simply revisit the same row in a higher layer iteration if necessary and process the rest of the row that it first ignored.


Column mapping

Column mapping assumes that MAP_WIDTH is a multiple of VALUES_PER_MAP. The column index is calculated as follows:
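As with the row mapping, the normative definition is in log_index.py; the sketch below reflects the structure described in the next paragraph: the high bits of the column index encode the position of the value within the map, while the low bits carry a quasi-random collision filter derived from the map value and its exact index (the hash input layout is an assumption):

```python
import hashlib

MAP_WIDTH = 2**24
VALUES_PER_MAP = 2**16
COLUMNS_PER_VALUE = MAP_WIDTH // VALUES_PER_MAP   # width of each value's own "rectangle"

def collision_filter(value_hash: bytes, value_index: int) -> int:
    # Quasi-random part, different for every (map value, map value index) pair.
    h = hashlib.sha256(value_hash + value_index.to_bytes(8, "little")).digest()
    return int.from_bytes(h[:4], "little") % COLUMNS_PER_VALUE

def get_column_index(value_hash: bytes, value_index: int) -> int:
    # Position of the value inside its map, scaled up, plus the collision
    # filter; the map value index stays recoverable from map and column index.
    return (value_index % VALUES_PER_MAP) * COLUMNS_PER_VALUE + collision_filter(value_hash, value_index)
```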


As shown in the figure below, this mapping practically assigns a MAP_WIDTH // VALUES_PER_MAP by MAP_HEIGHT rectangle to each map value index and ensures that each map value places exactly one mark in its own rectangle (the letters A-D represent different map values). This property also ensures that the map value index can be restored from the map index and column index; column indices never collide and keep the original order of map value indices, allowing efficient Merkle exclusion proofs of certain column indices in long rows.


Updating the log index and calculating the root hash

See the following functions in log_index.py:

  • log_index_add_log_entries
  • log_index_add_transaction_entry
  • log_index_add_block_entry
  • log_index_root

Note that the block entry may be added either after building/verifying the given block or before processing the next one, but log_index_root should always be calculated after adding transaction and log entries and before adding the block entry for the same block (block building would not even be possible otherwise as the log index root is referenced in the block header).

Finding potential matches

Determining whether a column index found in the appropriate row is relevant for the searched map value is possible by restoring the map value index and then calculating the column index from it again in order to check whether the quasi-random collision filter part matches the expected value for the given map value:
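A sketch of this check, reusing the illustrative get_column_index helper from the column mapping sketch above:

```python
VALUES_PER_MAP = 2**16
COLUMNS_PER_VALUE = 2**24 // VALUES_PER_MAP   # MAP_WIDTH // VALUES_PER_MAP

def restore_value_index(map_index: int, column_index: int) -> int:
    # Invert the position part of the column mapping.
    return map_index * VALUES_PER_MAP + column_index // COLUMNS_PER_VALUE

def is_potential_match(map_index: int, column_index: int, value_hash: bytes) -> bool:
    # Recompute the column index the searched value would have produced at the
    # restored position; only matching collision filter parts survive.
    return get_column_index(value_hash, restore_value_index(map_index, column_index)) == column_index
```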


Iterating through all relevant mapping layers and corresponding rows is similar to how new map values are added. Filtering all potential matches from all relevant rows can be done with the following function:
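The reference function is part of log_index.py; a simplified illustrative sketch, building on the helpers of the previous sketches (get_row returns the decoded row for a given map and row index):

```python
def potential_matches(map_index: int, value_hash: bytes, get_row) -> list[int]:
    # Walk up the mapping layers just like the insertion logic: marks beyond the
    # current layer's limit are ignored (they belong to other values mapped here
    # on higher layers) and the iteration stops at the first non-full row.
    matches, layer = set(), 0
    while True:
        row_index = get_row_index(value_hash, map_index, layer)
        row = get_row(map_index, row_index)
        limit = row_limit(layer)
        for column_index in row[:limit]:
            if is_potential_match(map_index, column_index, value_hash):
                matches.add(restore_value_index(map_index, column_index))
        if len(row) < limit:
            return sorted(matches)
        layer += 1
```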


Initialization, minimal state and wire protocol extension

The log index tree has the property that over time most of the tree nodes get finalized, which also means that they are no longer needed in order to be able to add new entries. At every single moment there is one filter map being updated, and the few most recently completed maps might still be affected by a reorg, but if the chain finalizes properly then all but the few most recent maps are also finalized. Internal tree nodes with all finalized children are also finalized and can be collapsed into the lowest finalized ancestor. Similarly, internal nodes covering a range that has not been touched yet can be stored without having their children in memory, and they can be expanded on demand.

Certain types of clients (those only interested in validation and not in serving users) have the option to just keep a minimal log index state. Those who are interested in event history should probably also only keep the minimal state in memory and store the finalized and collapsed subtrees on disk. Clients can also initialize their log index with such a minimal state and then optionally regenerate the older parts from blocks and receipts locally. When the log index state of a finalized block is represented and no room for rollback is needed, the minimal tree representation simplifies into a set of Merkle branches leading to the map rows and log entry corresponding to next_index (with everything on the left side of these branches collapsed and nothing on their right side expanded yet). The LogIndexMinimalState container contains this minimal amount of information required to add further events (but not to roll back). Once the log index is added to consensus, the Ethereum wire protocol should also be extended with a message pair that requests/sends this container as a snapshot of the log index after the latest finalized block (in order to be able to serve this data, clients have to save a snapshot at each beacon chain epoch boundary and keep it until it becomes finalized).


Assuming an efficient encoding of the current filter_map_rows where the variable length of each filter row is encoded in two bytes and the column indices tightly packed into 3 bytes each, the minimal log index state can be serialized into 21300808 bytes. This number can be considered as a worst case estimate of the amount of data required to initialize the log index data structure at any point.

Note that initialization is also possible on an epoch boundary, in which case the log index is very cheap to initialize (only requires the epoch branch and the last block entry branch from the last epoch) but in this case all blocks and receipts between the boundary and the current head are required to generate the last unfinished epoch.

Rationale


Design goals

The proposed data structure is intended to realize a balance between the cost of adding items and accessing old ones. In a search structure over a constantly growing dataset there is typically a tradeoff between the cost of adding new data and the cost of searching the existing dataset. One extreme is just linearly storing the data, which is practically the case now with logs, with the bloom filters being mostly useless. The other extreme is one big Merkle tree with all map values ever used as keys and the list of all occurrences (possibly in a further merkleized format) as values. With billions of unique map values, adding new entries here is expected to have costs similar to that of the state, with multiple lookups and modifications/insertions at random places in a database on the order of magnitude of hundreds of gigabytes. Another issue where this is similar to the state is that removing old entries is hard and expensive. Adding logs is supposed to be cheaper than writing the state, so solutions between these two extremes were considered as potentially practical, with multiple smaller structures generated periodically.

The filter maps have a fixed tree size and an efficient tree hashing scheme. Filter entries are sorted into rows based on content and position in a way that allows quick linear database access and size efficient Merkle proofs. The difficulties arising from certain types of events being much more frequent than others are also mitigated. Update and maintenance costs are also limited as tree nodes are eventually finalized and the number of non-finalized non-empty nodes is always hard capped, ensuring moderate memory requirements. Initialization costs of the data structure at any point of the chain are also capped. The additional database storage cost of filter maps is about 15-20% of the size of the actual logs, depending on certain implementation tradeoffs. The Merkle tree structure also makes it easy to discard entire epochs along with the corresponding Merkle subtrees, making the implementation of history expiry of the log index simple.

Map value index space

In each block a varying number of map values are emitted. In addition to inefficient search, another drawback of per-block fixed size bloom filters is the varying filter utilization leading to over-utilized filters giving many false positives in some blocks and/or wastefully under-utilized filters in some blocks. Block gas limits also tend to change significantly over the long term so any future-proof solution has to be able to adapt to the varying number of map values per block.

Mapping map values on their own linear index space ensures uniform filter utilization of identically structured filter maps. Compared to the alternative of constantly changing adaptive filter size, this approach greatly improves the efficiency of database access and Merkle proofs. It also allows efficient search pattern filtering as each potential match provides exact position information.

Index entries tree

Hashing every event into the index_entries tree keyed by map entry index is additional work and complexity but it simplifies the proving process significantly. Proving the events based only on map value indices would be possible if receipts had a map value index pointer hashed into them (so the client could also verify that the received receipt actually covers the map value index derived from the filter map); in this case though an extra mechanism would be necessary to prove that the receipt is referenced in a block that is part of the same canonical chain where the log index root was referenced. Proving old canonical blocks is currently theoretically possible through the beacon chain but painful and would also require extra infrastructure. On the other hand, the index_entries tree not only solves the problem of proving the matching canonical receipts; with the block entries it also provides a convenient way to prove canonical EL blocks through EL data structures only, and even provides provable canonical block hash to block number lookups.

Note though that storing the entire index_entries trees directly in their proposed merkleized format on disk is not really efficient. This is not an issue for validators that want to maintain a minimal state; for them, updating the index_entries subtrees is really cheap as they only need to maintain a single Merkle branch pointing to the next map value index. Provers can implement index_entries efficiently by storing a subset of the Merkle tree nodes (the ones at least a few levels above leaf level in order to save space) and generating the rest on demand based on the receipts that encode the same data in a more compact form.

Alternative filter structures considered

One question considered was whether to periodically build lookup trees of events with each unique map value emitted in a given period being a separate key under which only the occurrences of that specific event are listed, or to use a more compressed fixed size tree format where different map values might collide (though preferably not too many of them). This decision mostly boiled down to data access efficiency, both in terms of local disk access and remote Merkle proof size. Identically structured trees can be efficiently arranged in larger units (called index epochs here), with values belonging to the same key in subsequent trees of an epoch located close to each other. This improves database access speed. It also allows smaller Merkle proofs with a series of leaves encoded together in an efficient format and internal nodes on only two boundary branches. Database writes are also efficient as the order of adding tree entries is not random and all the non-finalized parts of the tree can be kept in memory with a hard capped memory requirement.

The other design decision considered here was whether to hash entire index entries into the list of map value occurrences or to just store position info and have a separate tree of log entries. Though the separate filter_maps and index_entries structures do present some additional complexity, the second option was chosen because of the size of Merkle proofs. Looking up a rarely used map value means encountering more irrelevant entries than relevant ones. By adding probabilistic filter information to position information, the vast majority of the irrelevant entries can be filtered out after accessing/proving 3 bytes of data instead of an entire map value (32 bytes). When proving matches of multiple map value patterns this 10x advantage exists even with more frequent individual lookup values. Tests have shown that realistic log searches often yield a lot more matches for the individual map values themselves than for the pattern itself. Pattern matching can be performed on the position information of individual potential matches and actual index entries only need to be proven at the potential pattern matches.

In conclusion, for the given application the fixed tree size approach with separate position info plus a probabilistic collision filter seemed to be the most appropriate.

False positive rate

From the filter maps a set of potential matches can be derived for any block range and map value or pattern of map values. These matches can then be looked up in the corresponding index_entries trees and actually matching events can be added to the set of results. The design guarantees that the set of potential matches includes all actual matches while also having a consistently low false positive rate.

False positives can happen when the quasi-random collision filter part of a column index accidentally matches the expected value even though it was generated by a map value other than the searched one. The chance of this happening is VALUES_PER_MAP / MAP_WIDTH per colliding entry in a row that is relevant for the search. Assuming that most entries in a map are different from the searched one and assuming a uniform random distribution of entries, the average number of colliding entries found in a relevant row is VALUES_PER_MAP / MAP_HEIGHT, giving an average false positive rate of VALUES_PER_MAP ** 2 / MAP_WIDTH / MAP_HEIGHT per filter map.

Though certain map values might be emitted a lot more than others and therefore the row index distribution might not be entirely uniform, periodical remapping of rows and using multiple mapping layers ensures that over a long enough search period random collisions with more frequent map values do even out. Mapping layers do have another consequence though; if any row has at least MAX_ROW_LENGTH[0] entries then the search logic requires looking into another row that is mapped to the searched map value on the next mapping layer. The maximum possible number of such rows is VALUES_PER_MAP / MAX_ROW_LENGTH[0] and therefore the chance of randomly hitting one is VALUES_PER_MAP / MAX_ROW_LENGTH[0] / MAP_HEIGHT in the worst case. In this case an extra row has to be processed, with extra chance of finding false positives. A collision with a frequent value at a certain mapping layer does not indicate a collision on the next layer though, therefore the expected number of entries in that row is no different from the first one. Having to process a third row would presume that the second one had at least MAX_ROW_LENGTH[1] entries. The chance of this happening after the first coincidence is practically negligible in the context of expected false positives.

The expected number of false positives for a single map value search can be estimated as VALUES_PER_MAP ** 2 / MAP_WIDTH / MAP_HEIGHT * (1 + VALUES_PER_MAP / MAX_ROW_LENGTH[0] / MAP_HEIGHT) per filter map. With the proposed constants this roughly equals 0.0044 false positives per map. As of March 2025 the average number of map values emitted in a mainnet block is slightly over 1000 while a filter map consists of 65536 map values. This gives a rough estimate of one false positive per 14000 blocks, which costs the searcher an extra lookup in an index_entries tree. The expected number of false positives in the entire chain history is around 1200.
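The numbers above follow directly from the proposed constants:

```python
MAP_WIDTH, MAP_HEIGHT, VALUES_PER_MAP = 2**24, 2**16, 2**16
MAX_ROW_LENGTH_0 = 8

base_rate = VALUES_PER_MAP**2 / MAP_WIDTH / MAP_HEIGHT      # 2**-8 ~ 0.0039 per map
extra_row = VALUES_PER_MAP / MAX_ROW_LENGTH_0 / MAP_HEIGHT  # 0.125 chance of a second relevant row
per_map = base_rate * (1 + extra_row)                       # ~0.0044 false positives per map
per_block = per_map * 1000 / VALUES_PER_MAP                 # ~1000 map values per mainnet block
print(per_map, 1 / per_block)                               # ~0.0044, on the order of one false positive per 14-15 thousand blocks
```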

Note that this is only true for a single value search while a typical pattern search requiring certain values on multiple positions has an exponentially lower false positive rate. For example if the pattern is [Addr, Topic1, Topic2] then three map value searches are performed and an actual log lookup is only necessary if the first search yields N, the second N+1 and the third N+2 simultaneously. If necessary, the rate can be easily reduced by using a higher MAP_WIDTH, at the cost of growing the size of encoded rows.

Backwards Compatibility


The existing log filter API (eth_getLogs, eth_newFilter, eth_getFilterLogs, eth_getFilterChanges) can be implemented with the new filter data structure. Applications relying on this API can operate without any change, benefiting from higher search performance. Repricing the LOG opcode might be considered after performing benchmarks, but the extra processing cost is not significant while the extra storage cost is around 15%. Other than that the EVM is not affected in any way as it only emits logs but does not directly access them.

Security Considerations


Safe access with a remote prover

In order to prove a complete set of matches matching a given search pattern in a given block range, the prover needs to

  • prove the map value index range that corresponds to the searched block number range by proving the block entries of first_block - 1 and last_block
  • prove the relevant rows of filter maps based on map index and row index (verifier can determine the relevant rows in advance based on the map values in the search pattern and the relevant map value index range)
  • prove the actual index entry belonging to any potentially matching map value index and also the block entry with the same block number if the blockHash of the log position info is needed

Since all three steps can be realized with Merkle proofs of the same LogIndex structure referenced in the block headers, any search with a remote prover is as safe as the client's knowledge about the chain head.

Deliberate false positive attacks

The design guarantees that false positive rates even out statistically over several epochs, even in case of random collisions with very frequent values, ensuring that an excessive amount of false positives will not make the bandwidth and processing costs of the search prohibitively high. All of this is true for random collisions only though, not deliberately created collisions. A deliberate attack on a certain important map value in order to raise its false positive rate cannot be ruled out entirely since, with a low amount of filter data generated per map value, it is always possible to "mine" another value that generates colliding filter data. The column mapping used here makes this attack a lot harder though, since the column index depends on both the map value and the exact map value index, making this attack only possible for block builders, who are probably offered MEV rewards for much more lucrative manipulations of the transaction set than making the search of certain events slightly more expensive.

Copyright


Copyright and related rights waived via CC0.