EIP-2315: Simple Subroutines for the EVM

Two opcodes for efficient, safe, and static subroutines.


Metadata
Status: WithdrawnStandards Track: CoreCreated: 2019-10-17
Authors
Greg Colvin (@gcolvin), Martin Holst Swende (@holiman), Brooklyn Zelenka (@expede), John Max Skaller (skaller@internode.on.net)

Abstract


This proposal provides a complete, efficient, safe and static control-flow facility.

It introduces two new opcodes to support calling and returning from subroutines:

  • RJUMPSUB relative_offset -- relative jump to subroutine
  • RETURNSUB -- return to PC after most recent RJUMPSUB.

It depends on the two new opcodes proposed by EIP-4200 to support static jumps:

  • RJUMP relative_offset — relative jump to PC + relative_offset
  • RJUMPI relative_offset — conditional relative jump

It deprecates JUMP and JUMPI, allowing valid code to support streaming, one-pass, and other near-linear compilers.

In concert with EIP-3540 and EIP-3670 it ensures, at initialization time, that valid code will not execute invalid instructions or jump to invalid locations, will not underflow stack, will maintain consistent numbers of inputs and outputs for subroutines, and will have bounded stack height in the absence of recursion.

This is among the simplest possible proposals that meets these requirements.

Motivation


A complete control-flow facility.

Jumps, conditional jumps and subroutines were proposed by Alan Turing in 1945 as a means of organizing the logic of the code and the design of the memory crystals for his Automatic Computing Engine:

We wish to be able to arrange that sequences of orders can divide at various points, continuing in different ways according to the outcome of the calculations to date... We also wish to be able to arrange for the splitting up of operations into subsidiary operations... To start on a subsidiary operation we need only make a note of where we left off the major operation and then apply the first instruction of the subsidiary. When the subsidiary is over we look up the note and continue with the major operation.

— Alan Turing — in B.E. Carpenter, R.W. Doran, "The other Turing machine." The Computer Journal, Volume 20, Issue 3, January 1977.

In more contemporary terms, we have sequences of instructions, jumps and conditional jumps that divide sequences into blocks, subroutine calls, and a stack of addresses to return to. The details vary, but similar facilities have proven their value across a long line of important machines over the last 75 years, including most all of the machines we have programmed or implemented -- physical machines including the Burroughs 5000, CDC 7600, IBM 360, DEC PDP-11 and VAX, Motorola 68000, Sun SPARC, and Intel x86s, as well as virtual machines for Scheme, Forth, Pascal, Java, Wasm, and others.

Unlike these machines, the Ethereum Virtual Machine does not provide subroutine operations. Instead, they must be synthesized using the dynamic JUMP instruction, which takes its destination on the stack. Further, the EVM provides only dynamic jumps, impeding the static analysis we need.

Efficient control-flow.

Efficient to write by hand, compile from high level labguages, validate at deploy time, interpret by VMs, and compile to machine code.

Static jumps, conditional jumps, and subroutines are sufficient and efficient in space and time, as shown by historical experience and as we will show for the EVM below.

Safe control-flow.

The EVM has unusually high requirements for safety. Not only do many smart contracts control inordinately large amounts of valuable Ether, but once placed on the blockchain any defects are visible to attackers and cannot be repaired. We propose to statically validate important safety constraints on code at initialization time.

Static control-flow.

The EVM's dynamic jumps cause two major problems. First, the need to synthesize static jumps and subroutines with dynamic jumps wastes space and gas with needlessly complex code, as we will show below.

Worse, jumps that can dynamically branch to any destination in the code can cause quadratic "path explosions" when traversing the flow of control. For Ethereum, this is a denial-of-service vulnerability that prevents us, at initialization time, from validating the safe use of EVM code and from compiling EVM code to machine code.

We need static control-flow to validate program safety and to compile EVM bytecode to machine code -- in time and space linear in the size of the code.

Specification


Opcodes

RJUMPSUB (0x5f) relative_offset

Transfers control to a subroutine.

  1. Decode the relative_offset from the immediate data at PC.
  2. Push the current PC + 3 to the return stack.
  3. Set PC to PC + relative_offset.

The relative_offset is relative to the current PC. The offset is encoded as a two-byte, twos-complement signed integer, stored MSB-first.

The gas cost is low.

RETURNSUB (0x5e)

Returns control to the caller of a subroutine.

  1. Pop the return stack to PC.

The gas cost is verylow.

Notes:

  • Values popped off the return stack do not need to be validated, since they are alterable only by RJUMPSUB and RETURNSUB.
  • The description above lays out the semantics of these instructions in terms of a return stack. But the actual state of the return stack is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementer may code RJUMPSUB to unobservably push PC on the return stack rather than PC + 1, which is allowed so long as RETURNSUB observably returns control to the PC + 3 location.)

Validity

Execution is defined in the Yellow Paper as a sequence of changes in the EVM state. The conditions on valid code are preserved by state changes. At runtime, if execution of an instruction would violate a condition the execution is in an exceptional halting state. The Yellow Paper defines six such states.

  1. Insufficient gas
  2. More than 1024 stack items
  3. State modification during a static call
  4. Insufficient stack items
  5. Invalid jump destination
  6. Invalid instruction

We would like to consider EVM code valid iff no execution of the program can lead to an exceptional halting state. In practice, we must test at runtime for the first three conditions. We don’t know how much gas there will be, we don’t know how deep a recursion may go, analysis of stack depth even for non-recursive programs is nontrivial, and we don't know whether a call will be static. All of the remaining conditions MUST be validated statically, in time and space quasi-linear in the size of the code.

Static Constraints on Valid Code

  • Every instruction MUST be valid:
    • The JUMP and JUMPI instructions ARE NOT valid.
  • Every jump MUST be valid:
    • The RJUMP, RJUMPI, or RJUMPSUB instructions MUST NOT address immediate data or addresses outside of their code section.
  • The stacks MUST be valid:
    • The number of items on the data stack MUST always be positive.
    • The number of items on the return stack MUST always be positive.
  • The data stack MUST be consistently aligned:
    • The data stack height is
      • the absolute difference between the current stack pointer and the stack pointer on entry to the current subroutine.
    • It MUST be the same for every reachable path through a given PC and
    • MUST NOT exceed 1024.

Rationale


This is a purely semantic specification, placing no constraints on the syntax of code sections beyond being a sequence of opcodes and immediate data – a subroutine is not a contiguous sequence of bytecode, it is a subgraph of the bytecode's control-flow graph. The EVM is a simple state machine. We only promise that valid code will not, as it were, jam up the gears of the machine.

By avoiding syntactic constraints we allow for optimizations like tail call elimination, multiple-entry subroutines, moving "cold" code out of line, and other ways of reducing redundancy and keeping "hot" code in cache. Since we wish to support one-pass compilation of EVM code to machine code it is crucial that the EVM code be as well optimized as possible up front.

Validation

Rather than enforce constraints via syntax, we enforce them via validation.

The constraints on valid code cover all of the exceptional halting states that we can validate and allow code to be validated and compiled in time and space quasi-linear in the size of the code.

The RJUMP, RJUMPI and RJUMPSUB instructions take their relative_offset as immediate arguments, which cannot change at runtime. Having constant destinations for all jumps means that all jump destinations can be validated at initialization time, not runtime. Dynamic jumps can branch to any destination in the code, so exploitable quadratic "path explosions" are possible when traversing the control flow graph. Deprecating JUMP and JUMPI prevents this.

Requiring a consistently aligned data stack

  • prevents stack underflow
  • ensures that all calls to a subroutine have the same number of inputs and the same number of outputs and
  • ensures that stack height is bounded in the absence of recursion.

Requiring a consistently aligned data stack also allows some algorithms that traverse the control-flow graph -- including code validation and compilation -- to break cycles at joins, again preventing quadratic path explosion. When a traversal gets to a PC it has visited before it is either at the beginning of a loop or the entry to a function. Since the stack height at that PC is constant we know that loops will not grow stack, and that the number of arguments to a subroutine will always be the same -- there may be no need to traverse that path again.

Note: The JVM and Wasm enforce similar constraints for similar reasons.

Alternative Designs

There are a few major designs for a subroutine facility, two of which are considered here. The others are mostly not appropriate for the EVM, such as the Wheeler Jump -- self-modifying code that writes return addresses into called subroutines.

1. Keep return addresses on a dedicated return stack. Turing's design is often used by stack machines, including those for Forth, Java, Wasm, and others. The data stack is used for computation, with a dedicated stack for return addresses. A single instruction suffices to call, and another to return from a routine.

2. Keep return addresses on the data stack. This design is often used by register machines, including those from CDC, IBM, DEC, Intel, and ARM. The registers are used primarily for computation, and the stack maintains call frames for return addresses, arguments, and local variables. On the EVM there are no registers for computation, so using the stack for both purposes can cause the sort of inefficiencies we see below. Pascal p-code does use this design, but as part of a complex calling convention with dedicated registers.

We prefer the dedicated return stack.

  • It maintains a clear separation between calculation and flow of control:
    • the data stack is free of vulnerable return addresses and
    • it's impossible to overwrite the return stack.
  • It improves efficiency:
    • it uses native arithmetic rather than 256-bit EVM instructions for the return address,
    • doesn't use up a data stack slot for the return address and
    • needs less motion of 256-bit data on the stack.

Efficiency

We illustrate here how subroutine instructions can be used to reduce the complexity and gas costs of both ordinary and optimized subroutine calls compared to using JUMP.

Simple Subroutine Call

Consider these examples of a fairly minimal subroutine, including the code to call it.

Subroutine call, using RJUMPSUB:


Total gas: 19

Subroutine call, using JUMP:


Total: 41 gas.

Using RJUMPSUB versus JUMP saves 41 - 19 = 22 gas — a 54% improvement.

Tail Call Optimization

Of course in cases like this one we can optimize the tail call, so that the return from SQUARE actually returns from TEST_SQUARE.

Tail call optimization, using RJUMPSUB and RETURNSUB:


Total: 17 gas

Tail call optimization, using JUMP:


Total: 38 gas

Using RJUMPSUB versus JUMP saves 38 - 17 = 21 gas — a 55% improvement.

Efficiency Caveats

We can see that these instructions provide a simpler and more gas-efficient subroutine mechanism than using JUMP — in our examples they cut gas use by about half.

Clearly, the benefits of this efficiency are greater for programs that have been factored into smaller subroutines. How small? Wrapping code in a subroutine costs only 8 gas using RJUMPSUB and RETURNSUB versus 30 gas using JUMP, PUSH and SWAP as above.

Costs

The low cost of RJUMPSUB versus the mid cost of JUMP is justified by needing only to decode the immediate two byte destination to the PC and push the return address on the return stack, all using native arithmetic, versus using the data stack with emulated 256-bit instructions.

The verylow cost of RETURNSUB is justified by needing only to pop the return stack into the PC. Benchmarking will be needed to tell if the costs are well-balanced.

Backwards Compatibility


These changes affect the semantics of existing EVM code: bytes that would have been interpreted as valid jump destinations may now be interpreted as immediate data. Since this proposal depends on the Ethereum Object Format to signal the change this is not a practical issue.

Test Cases


Simple routine

This should jump into a subroutine, back out and stop.

Bytecode: 0x5f0003005e (RJUMPSUB 3, RETURNSUB, STOP)

PcOpCostStackRStack
0RJUMPSUB5[][]
2STOP0[][]
3RETURNSUB3[][]

Output: 0x Consumed gas: 10

Two levels of subroutines

This should execute fine, going into one two depths of subroutines

Bytecode: 0x5f00045F00025200 (RJUMPSUB 4, RJUMPSUB 2, RETURNSUB, RETURNSUB, STOP)

PcOpCostStackRStack
0RJUMPSUB5[][]
3RJUMPSUB5[][]
4RETURNSUB5[][]
5RETURNSUB5[][]
6STOP0[][]

Consumed gas: 20

Failure 1: invalid jump

This should fail, since the given location is outside of the code-range.

Bytecode: 0X5fff(RJUMPSUB -1)

PcOpCostStackRStack
0RJUMPSUB10[][]

Failure 2: shallow return stack

This should fail at first opcode, due to shallow return_stack

Bytecode: 0x5e (RETURNSUB)

PcOpCostStackRStack
0RETURNSUB5[][]

Subroutine at end of code

In this example the RJUMPSUB is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' after the bytecode, and not exit with error

Bytecode: 0x5c00045e5fffff (RJUMP 4, RETURNSUB, RJUMPSUB -1)

PcOpCostStackRStack
0RJUMP5[][]
3RETURNSUB5[][]
4RJUMPSUB5[][]
7STOP0[][]

Consumed gas: 15

Reference Implementation


The following is a pseudo-Python implementation of an algorithm for predicating code validity. An equivalent algorithm must be run at initialization time.

This algorithm performs a symbolic execution of the program that recursively traverses the code, emulating its control flow and stack use and checking for violations of the rules above.

It runs in time equal to O(vertices + edges) in the program's control-flow graph, where edges represent control flow and the vertices represent basic blocks — thus the algorithm takes time proportional to the size of the code. It maintains a stack of continuations for conditional jumps, the size of which is at most proportional to the size of the code.

Validation Function

** Note: this function is known to be incomplete and incorrect. **

For simplicity's sake we assume that all jumpdest analysis and prior validation has been done, including EIP-3540, EIP-3670, and EIP-4200, so EOF headers and sections are well-formed, and there are no invalid instructions or jumps. In practice, all passes of validation can be folded into a single loop no recursion.

We also assume some helper functions.

  • is_valid(opcode) returns true iff opcode is valid.
  • is_terminator(opcode) returns true iff opcode is terminator.
  • is_valid_jumpdest(pc) returns true iff pc is a valid jump destination.
  • immediate_data(pc) returns the immediate data for the instruction at pc.
  • immediate_size(opcode) returns the size of the immediate data for an opcode.
  • removed_items(opcode) returns the number of items removed from the data_stack by the opcode.
  • added_items(opcode) returns the number of items added to the data_stack by the opcode.

Security Considerations


These changes introduce new flow control instructions. They do not introduce any new security considerations. This EIP is intended to improve security by validating a higher level of safety for EVM code deployed on the blockchain. The validation algorithm must be quasi-linear in time and space to not be a denial of service vulnerability. The algorithm here makes one linear-time pass of the bytecode, and uses a stack of continuations that cannot exceed the number of RJUMPI instructions in the code.

Copyright


Copyright and related rights waived via CC0.