EIP-3508: Transaction Data Opcodes


Metadata
Status: Stagnant
Standards Track: Core
Created: 2021-04-16
Authors
Alex Papageorgiou (@alex-ppg)

Simple Summary


Provide access to original transaction data.

Abstract


This EIP introduces the following three EVM instructions: ORIGINDATALOAD, ORIGINDATASIZE, and ORIGINDATACOPY.

These three instructions are meant to provide access to the original transaction's data payload, enabling a gas-efficient way of accessing large data payloads in cross-contract calls.

Motivation


As the Ethereum development scene matures, smart contracts take on more ambitious and complex features, which often require complex and at times large data structures. Given the inherent limits of the EVM, however, transporting large data structures between contracts is costly. In some cases it is outright futile: the operation either cannot execute within the gas limit or requires a large amount of ETH to cover its gas cost.

The purpose of this EIP is to render these features viable by introducing a way for multi-contract systems to access the same in-memory data source without transmitting the full payload between them.

This EIP enables elaborate smart contract features to become part of a larger call chain by efficiently reading data from the original transaction payload rather than requiring the data to be passed in as call-level data. It will mainly benefit advanced trustless schemes, such as efficient verification of Merkle Patricia trees that validate a storage value at a particular Ethereum block, or EVM-based layer 2 solutions.

A side effect of this change is that smart contract systems relying entirely on origin data inherently guarantee that the data they receive has not been altered by an intermediate smart contract call.

Specification


ORIGINDATALOAD (0x47), ORIGINDATASIZE (0x48) and ORIGINDATACOPY (0x49)

These instructions operate similarly to their call-prefixed counterparts, except that they operate on the original data of a transaction instead of the current call's data. In detail:

  • ORIGINDATALOAD (0x47) performs similarly to CALLDATALOAD (0x35)
  • ORIGINDATASIZE (0x48) performs similarly to CALLDATASIZE (0x36)
  • ORIGINDATACOPY (0x49) performs similarly to CALLDATACOPY (0x37)
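The intended behaviour can be sketched in Python by mirroring the semantics of the CALLDATA* instructions. This is a minimal illustration and not client code: the function names are hypothetical, out-of-range reads are assumed to be zero-padded as they are for CALLDATALOAD and CALLDATACOPY, and memory is modelled as a plain byte array without word-aligned expansion.

```python
WORD = 32  # EVM word size in bytes


def origin_data_load(origin_data: bytes, offset: int) -> int:
    """Model of ORIGINDATALOAD: read one 32-byte word starting at `offset`.

    Bytes past the end of the payload read as zero (assumed to match
    CALLDATALOAD).
    """
    chunk = origin_data[offset:offset + WORD]
    return int.from_bytes(chunk.ljust(WORD, b"\x00"), "big")


def origin_data_size(origin_data: bytes) -> int:
    """Model of ORIGINDATASIZE: length of the original payload in bytes."""
    return len(origin_data)


def origin_data_copy(memory: bytearray, dest: int, offset: int,
                     length: int, origin_data: bytes) -> None:
    """Model of ORIGINDATACOPY: copy `length` bytes into `memory` at `dest`.

    Bytes past the end of the payload are copied as zero (assumed to match
    CALLDATACOPY). Memory grows byte-wise here, which is a simplification.
    """
    if length == 0:
        return
    chunk = origin_data[offset:offset + length].ljust(length, b"\x00")
    end = dest + length
    if len(memory) < end:
        memory.extend(b"\x00" * (end - len(memory)))
    memory[dest:end] = chunk
```

In this model every frame of a call chain would pass the same `origin_data` value, whereas the CALLDATA* equivalents would receive the per-call payload.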

As the data is retrieved once again from the execution environment, the costs for the three instructions will be G_verylow, G_base and G_base + G_verylow * (number of words copied, rounded up) respectively.
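As a worked illustration, the costs as stated in the preceding paragraph can be computed as follows. The numeric values G_base = 2 and G_verylow = 3 are assumptions taken from the Yellow Paper fee schedule, the function names are hypothetical, and memory expansion costs are not modelled.

```python
G_BASE = 2     # assumed Yellow Paper value for G_base
G_VERYLOW = 3  # assumed Yellow Paper value for G_verylow


def words(length: int) -> int:
    """Number of 32-byte words needed to hold `length` bytes, rounded up."""
    return (length + 31) // 32


def origin_data_load_gas() -> int:
    return G_VERYLOW


def origin_data_size_gas() -> int:
    return G_BASE


def origin_data_copy_gas(length: int) -> int:
    # Formula as stated in this section: G_base + G_verylow * words copied.
    return G_BASE + G_VERYLOW * words(length)
```

Under these assumed constants, copying 100 bytes spans 4 words, so `origin_data_copy_gas(100)` evaluates to 2 + 3 * 4 = 14.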

The transaction data the ORIGINDATA* opcodes operate on will be equivalent to the calldata specified in the args* parameter to the nearest AUTHCALL (0xf7) up the stack. If there is no AUTHCALL in the stack then ORIGINDATA* will operate on the transaction's original data field.
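The lookup rule above can be sketched as a walk over the call stack. The `Frame` type and `resolve_origin_data` function are hypothetical names used only for illustration, and the sketch assumes that a frame entered through AUTHCALL counts as an AUTHCALL "up the stack" for code running inside that frame.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    kind: str        # instruction that created the frame, e.g. "CALL" or "AUTHCALL"
    calldata: bytes  # calldata supplied to that frame


def resolve_origin_data(tx_data: bytes, call_stack: List[Frame]) -> bytes:
    """Return the payload that ORIGINDATA* would operate on.

    `call_stack` is ordered from the outermost frame to the current frame.
    The nearest AUTHCALL frame wins; without one, the transaction's own
    data field is used.
    """
    for frame in reversed(call_stack):
        if frame.kind == "AUTHCALL":
            return frame.calldata
    return tx_data
```

For example, with a stack of CALL, AUTHCALL, CALL, the innermost frame would read the calldata passed to the AUTHCALL rather than the transaction's data field.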

This interaction ensures full compatibility with EIP-3074 and ensures that this EIP does not reintroduce discrimination into the system, e.g. through contracts that rely entirely on ORIGINDATA* and thus allow only EOAs to supply data to them.

Rationale


AUTHCALL (0xf7) Interaction

EIP-3074, which will be part of the London fork, introduces a new call instruction, AUTHCALL (0xf7), that replaces a transaction's ORIGIN (0x32) with the context variable authorized. The intention of AUTHCALL is to prevent the discrimination between smart contracts and EOAs that ORIGIN initially facilitated. As a result, it is sensible to also replace the values retrieved by the ORIGINDATA* opcodes with the ones used in the AUTHCALL.

Naming Conventions

The ORIGIN-prefixed instructions conform to the existing naming convention of the CALL-prefixed instructions, given the existence of the ORIGIN (0x32) instruction, which is equivalent to the CALLER (0x33) instruction but operates on the original transaction's context.

Instruction Address Space

The 0x30-0x3f instruction address space has been exhausted by instructions that already provide information about the execution context of a call, so a new range suitable for the purposes of this EIP had to be identified.

Given that the EIP-1344 CHAINID opcode was included at 0x46, it made sense to place additional transaction-related data beyond it, since the chain ID is included in transaction payloads as well as in the blocks themselves. This leaves the 0x46-0x4f address space reserved for further transaction-related data that may be necessary in the future, such as the EOA's nonce.

Gas Costs

The opcodes ORIGINDATALOAD (0x47), ORIGINDATASIZE (0x48), and ORIGINDATACOPY (0x49) perform essentially the same operations as the opcodes CALLDATALOAD (0x35), CALLDATASIZE (0x36), and CALLDATACOPY (0x37) respectively and thus share the exact same gas costs.

Instruction Space Pollution

One can argue that multiple new EVM instructions pollute the EVM instruction address space and could cause issues in assigning sensible instruction codes to future instructions. This issue was assessed, and an alternative was considered in which the raw RLP-encoded transaction would be made accessible to the EVM. This would future-proof the new instruction set, as it could also expose other members of the transaction that may be desired on-chain in the future. However, it would also make the ORIGIN opcode redundant.

Backwards Compatibility


The EIP does not alter or adjust existing functionality provided by the EVM and as such, no known issues exist.

Test Cases


TODO.

Security Considerations


Introspective Contracts

Taken in isolation, the values returned by ORIGINDATALOAD and ORIGINDATACOPY should be considered insecure: they can easily be spoofed by creating an entry smart contract with the appropriate function signature and arguments that then invokes other contracts within the call chain. In brief, one should always assume that tx.data != calldata, and these instructions should not be used as an introspection tool alone.
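The spoofing scenario can be illustrated with a small Python model. The `CallContext` type, the `naive_origin_check` function, and the selector value are hypothetical and chosen only for illustration. The sketch assumes no AUTHCALL is on the stack, so the origin data equals the transaction's data field.

```python
from dataclasses import dataclass


@dataclass
class CallContext:
    tx_data: bytes   # what ORIGINDATA* would read in this frame
    calldata: bytes  # what CALLDATA* would read in this frame


# Arbitrary 4-byte selector the target contract expects (illustrative value).
EXPECTED_SELECTOR = bytes.fromhex("a9059cbb")


def naive_origin_check(ctx: CallContext) -> bool:
    """Introspection that trusts origin data alone (the pattern warned against)."""
    return ctx.tx_data[:4] == EXPECTED_SELECTOR


payload = EXPECTED_SELECTOR + b"\x00" * 32

# The EOA calls the target directly: origin data and calldata agree.
direct = CallContext(tx_data=payload, calldata=payload)

# The EOA sends the same payload to an entry contract, which then calls the
# target with different calldata. The origin data is unchanged.
relayed = CallContext(tx_data=payload, calldata=bytes.fromhex("deadbeef"))
```

Both `direct` and `relayed` pass `naive_origin_check`, even though in the relayed case the target is being invoked with calldata that differs from the transaction payload.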

Denial-of-Service Attack

An initial concern that may arise from this EIP is the additional contextual data that node software must provide to the EVM so that it can access the necessary data via the ORIGINDATALOAD and ORIGINDATACOPY instructions.

This would increase memory consumption. However, the increase should be negligible, if it exists at all, given that the data of a transaction should already be in memory as part of its execution, which is a step in the overall inclusion of a transaction within a block.

Multi-Contract System Gas Reduction

Most complex smart contract systems deployed on Ethereum today rely on cross-contract interactions in which values are passed from one contract to another via function calls. The ORIGIN-prefixed instruction set would let such systems access the original transaction data at any step in the call chain. As a side effect, cross-contract calls could ultimately consume less gas if the data passed between them is reduced.

The gas reduction, however, would be an implementation-based optimization that applies only to rudimentary memory arguments rather than storage-based data, the latter of which is most commonly used in these types of calls. As a result, the overall gas reduction from this change will be negligible for most implementations.

Copyright


Copyright and related rights waived via CC0.