EIP-3508: Transaction Data Opcodes
Simple Summary
Provide access to original transaction data.
Abstract
This EIP introduces the following three EVM instructions: ORIGINDATALOAD, ORIGINDATASIZE, and ORIGINDATACOPY.
These three instructions are meant to provide access to the original transaction's data payload, enabling a gas-efficient way of accessing large data payloads in cross-contract calls.
Motivation
As the Ethereum development scene matures, more ambitious and complex features are introduced into smart contracts, and these often require complex and at times large data structures. Given the inherent limits of the EVM, however, transporting large data structures between contracts is costly. In some scenarios the operation becomes futile: it either cannot be executed within the gas limit or requires a large amount of ETH to cover its gas cost.
The purpose of this EIP is to render these features viable by introducing a way via which multi-contract systems are able to access the same in-memory data source without necessarily transmitting the full payload between them.
This EIP enables elaborate smart contract features to become part of a larger call chain by efficiently reading data from the original transaction payload rather than requiring the data to be passed in as call-level data. Its inclusion will mainly benefit advanced trustless schemes, such as efficient verification of Merkle Patricia trees that validate the storage value of a particular Ethereum block, or EVM-based layer 2 solutions.
A side-effect of this change is that smart contract systems relying entirely on origin data inherently guarantee that the data they receive has not been altered by an intermediate smart contract call.
Specification
ORIGINDATALOAD (0x47), ORIGINDATASIZE (0x48) and ORIGINDATACOPY (0x49)
These instructions are meant to operate similarly to their call-prefixed counterparts, except that they operate on the original data of a transaction instead of the current call's data. In detail:
- ORIGINDATALOAD (0x47) performs similarly to CALLDATALOAD (0x35)
- ORIGINDATASIZE (0x48) performs similarly to CALLDATASIZE (0x36)
- ORIGINDATACOPY (0x49) performs similarly to CALLDATACOPY (0x37)
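The behaviour listed above can be sketched in Python. This is a minimal model, not a client implementation. It assumes the ORIGINDATA* instructions zero-pad reads past the end of the data in the same way as their CALLDATA* counterparts. The function names and the bytearray standing in for EVM memory are illustrative and do not come from the EIP, and memory is grown byte-wise here rather than in 32-byte words.

```python
WORD = 32  # EVM word size in bytes


def origindataload(origin_data: bytes, offset: int) -> int:
    """Return the 32-byte word at `offset`, zero-padded past the end of the data."""
    chunk = origin_data[offset:offset + WORD]
    return int.from_bytes(chunk.ljust(WORD, b"\x00"), "big")


def origindatasize(origin_data: bytes) -> int:
    """Return the length of the origin data in bytes."""
    return len(origin_data)


def origindatacopy(memory: bytearray, origin_data: bytes,
                   dest_offset: int, offset: int, size: int) -> None:
    """Copy `size` bytes of origin data into `memory`, zero-padding past the end."""
    if size == 0:
        return  # a zero-length copy touches no memory
    chunk = origin_data[offset:offset + size].ljust(size, b"\x00")
    end = dest_offset + size
    if len(memory) < end:
        # Simplified memory expansion (assumption: byte-granular, not word-granular).
        memory.extend(b"\x00" * (end - len(memory)))
    memory[dest_offset:end] = chunk
```

For example, with a 36-byte payload made of a 4-byte selector followed by one 32-byte argument, `origindataload(data, 4)` returns the argument as an integer, while a read that starts near the end of the payload is padded with zero bytes on the right.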
As the data is retrieved once again from the execution environment, the costs for the three instructions will be G_verylow, G_base and G_base + G_verylow * (number of words copied, rounded up) respectively.
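As a worked illustration, the cost formulas stated above can be written out as follows. The numeric values for G_base (2) and G_verylow (3) are the Yellow Paper fee schedule values and are assumed to apply unchanged. The function names are hypothetical, and any memory expansion cost incurred by the copy is deliberately left out of this sketch.

```python
G_BASE = 2     # Yellow Paper fee schedule value (assumed unchanged)
G_VERYLOW = 3  # Yellow Paper fee schedule value (assumed unchanged)


def origindataload_gas() -> int:
    """Cost of ORIGINDATALOAD as stated in this section."""
    return G_VERYLOW


def origindatasize_gas() -> int:
    """Cost of ORIGINDATASIZE as stated in this section."""
    return G_BASE


def origindatacopy_gas(size_in_bytes: int) -> int:
    """Cost of ORIGINDATACOPY as stated in this section, excluding memory expansion."""
    words = (size_in_bytes + 31) // 32  # number of 32-byte words, rounded up
    return G_BASE + G_VERYLOW * words
```

Under these assumptions, copying 100 bytes spans four words and costs 2 + 3 * 4 = 14 gas before memory expansion.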
The transaction data the ORIGINDATA* opcodes operate on will be equivalent to the calldata specified in the args* parameter to the nearest AUTHCALL (0xf7) up the stack. If there is no AUTHCALL in the stack then ORIGINDATA* will operate on the transaction's original data field.
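The resolution rule above can be sketched as a walk over the call stack. The Frame type, its field names, and the "CALL"/"AUTHCALL" labels are hypothetical and exist only for this illustration. The sketch also assumes that a frame entered via AUTHCALL counts as its own nearest AUTHCALL, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    kind: str        # how the frame was entered, e.g. "CALL" or "AUTHCALL" (hypothetical labels)
    calldata: bytes  # the data supplied to this call


def resolve_origin_data(tx_data: bytes, call_stack: List[Frame]) -> bytes:
    """Return the data ORIGINDATA* would read in the current (innermost) frame.

    call_stack[0] is the outermost frame and call_stack[-1] is the current one.
    """
    # Walk from the current frame outwards and stop at the nearest AUTHCALL.
    for frame in reversed(call_stack):
        if frame.kind == "AUTHCALL":
            return frame.calldata
    # No AUTHCALL anywhere in the stack: fall back to the transaction's data field.
    return tx_data
```

With a stack of CALL, AUTHCALL, CALL, the innermost frame would read the calldata given to the AUTHCALL; with only CALL frames it would read the transaction's data field.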
This interaction ensures full compatibility with EIP-3074 and ensures that no form of discrimination is introduced back into the system by this EIP, e.g. by contracts relying entirely on ORIGINDATA* and thus allowing only EOAs to supply data to them.
Rationale
AUTHCALL (0xf7) Interaction
EIP-3074, which has been considered for the London fork, introduces a new call instruction called AUTHCALL (0xf7) that replaces a transaction's ORIGIN (0x32) with the context variable authorized. The intention of AUTHCALL is to prevent the discrimination between smart contracts and EOAs that ORIGIN initially facilitated. As a result, it is sensible to also replace the values retrieved by the ORIGINDATA* opcodes with the ones used in the AUTHCALL.
Naming Conventions
The ORIGIN-prefixed instructions attempt to conform to the existing naming convention of CALL-prefixed instructions, given the existence of the ORIGIN (0x32) instruction, which is equivalent to the CALLER (0x33) instruction but operates on the original transaction's context.
Instruction Address Space
The 0x30-0x3f instruction address space has been exhausted by instructions that already provide information about the execution context of a call, so a new range suitable for the purposes of this EIP had to be identified.
Given that the EIP-1344 CHAINID opcode was included at 0x46, it made sense to place additional transaction-related data beyond it, since the chain ID is included in transaction payloads in addition to the blocks themselves. This leaves the 0x46-0x4f address space reserved for further transaction-related data that may be necessary in the future, such as the EOA's nonce.
Gas Costs
The opcodes ORIGINDATALOAD (0x47), ORIGINDATASIZE (0x48), and ORIGINDATACOPY (0x49) perform essentially the same operations as the opcodes CALLDATALOAD (0x35), CALLDATASIZE (0x36), and CALLDATACOPY (0x37) respectively, and thus share the exact same gas costs.
Instruction Space Pollution
One can argue that multiple new EVM instructions pollute the EVM instruction address space and could cause issues in assigning sensible instruction codes to future instructions. This issue was assessed, and a methodology by which the raw RLP-encoded transaction could be made accessible to the EVM was considered. This would future-proof the new instruction set, as it would be usable for other members of the transaction that may need to be accessible on-chain in the future. However, it would also make the ORIGIN opcode redundant.
Backwards Compatibility
The EIP does not alter or adjust existing functionality provided by the EVM and as such, no known issues exist.
Test Cases
TODO.
Security Considerations
Introspective Contracts
In isolation, the ORIGINDATALOAD and ORIGINDATACOPY values should be considered insecure, as they can easily be spoofed by creating an entry smart contract with the appropriate function signature and arguments that subsequently invokes other contracts within the call chain. In brief, one should always assume that tx.data != calldata, and these instructions should not be used as an introspection tool alone.
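The spoofing scenario above can be illustrated with a small Python model. Everything here is hypothetical: the selector value, the function names, and the idea of a contract checking only the first four bytes of the origin data are assumptions made for this sketch and do not describe any real contract.

```python
EXPECTED_SELECTOR = bytes.fromhex("12345678")  # hypothetical 4-byte function selector


def flawed_introspection(origin_data: bytes) -> bool:
    """Flawed check: trusts the origin data alone to infer how the contract was called."""
    return origin_data[:4] == EXPECTED_SELECTOR


def data_matches(origin_data: bytes, calldata: bytes) -> bool:
    """True only when the current call received exactly the transaction's data."""
    return origin_data == calldata


# The transaction targets an attacker-controlled entry contract. Its data is crafted
# to carry the selector and arguments that the target contract expects to see.
tx_data = EXPECTED_SELECTOR + (42).to_bytes(32, "big")

# The entry contract then calls the target with entirely different calldata.
forwarded_calldata = bytes.fromhex("deadbeef")

spoofed = flawed_introspection(tx_data)                 # the flawed check passes
consistent = data_matches(tx_data, forwarded_calldata)  # but tx.data != calldata
```

In this model the flawed check passes even though the target never received the data it inspected, which is why the origin data should not be the sole basis for introspection.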
Denial-of-Service Attack
An initial concern that may arise from this EIP is the additional contextual data that must be provided at the software level of nodes to the EVM in order for it to be able to access the necessary data via the ORIGINDATALOAD and ORIGINDATACOPY instructions.
This would lead to an increase in memory consumption. However, this increase should be negligible, if it exists at all, given that the data of a transaction should already exist in memory as part of its execution process, which is a step in the overall inclusion of a transaction within a block.
Multi-Contract System Gas Reduction
Most complex smart contract systems deployed on Ethereum today rely on cross-contract interactions whereby values are passed from one contract to another via function calls. The ORIGIN-prefixed instruction set would enable such systems to access the original transaction data at any step in the call chain's execution. Cross-contract calls could therefore consume less gas if the data passed between them is reduced as a side-effect of this change.
The gas reduction, however, would be an implementation-based optimization that would also be solely applicable for rudimentary memory arguments rather than storage-based data, the latter of which is most commonly utilized in these types of calls. As a result, the overall gas reduction observed by this change will be negligible for most implementations.
Copyright
Copyright and related rights waived via CC0.