The problem of infinitely expanding state has been discussed as a side effect of other issues and as something that requires short-term fixes, but it hasn’t been addressed head-on in a way that prompts discussion of sustainable, long-term changes to the incentive structures for blockchain users.
This post is not deeply technical but I hope it will prompt the necessarily technical discussion to support and challenge it.
Basically, I have two conclusions:
We need to model the actual relationship between state growth and node capacity so we can clearly determine when we’re above or below the line of sustainability.
We should look into actively incentivizing behavior that releases state so it can be pruned away, e.g. refunding gas fees for releasing state, which would help mitigate this effect.
I recommend reading through these resources to get up to speed:
A response to the above, The Ethereum Blockchain size has exceeded 1TB and yes it’s an issue (warning: long read). A bit peripheral, but it covers many of the counter-arguments, like why light nodes aren’t an answer and why this matters in the first place.
Claim: Scalability brings significantly bigger state size problems than the ones already showing up in Ethereum.
Let’s say Ethereum’s ~10 tps results in a potential storage growth rate of roughly 50 GB per year. This is probably a low-end estimate, but let’s run with it.
Now enter the larger-but-not-scalable blockchains like EOS, which can process something like 1k-10k tps, so we’d imagine their state to grow 100x-1,000x faster, or up to 50 TB per year at capacity.
Now enter the scalable blockchains that claim upwards of 100k tps (10,000x Ethereum’s throughput), where full capacity would mean state growth of roughly 500 TB per year.
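The back-of-the-envelope scaling above can be sketched directly. This assumes, as the text does, that state growth scales linearly with throughput and that Ethereum’s ~10 tps yields ~50 GB per year; both figures are rough assumptions, not measurements:

```python
# Linear extrapolation of yearly state growth from throughput.
# Baseline assumption (from the text): ~10 tps -> ~50 GB/year.
ETH_TPS = 10
ETH_GROWTH_GB_PER_YEAR = 50

def state_growth_tb_per_year(tps: float) -> float:
    """Extrapolate yearly state growth (in TB) linearly from tps."""
    gb = ETH_GROWTH_GB_PER_YEAR * (tps / ETH_TPS)
    return gb / 1000  # GB -> TB

for label, tps in [("Ethereum", 10),
                   ("EOS-class", 10_000),
                   ("'scalable' chain", 100_000)]:
    print(f"{label}: ~{state_growth_tb_per_year(tps):g} TB/year at capacity")
```

Running this reproduces the numbers in the text: ~0.05 TB/year for Ethereum, ~50 TB/year at 10k tps, and ~500 TB/year at 100k tps.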
Nodes already struggle to sync with the Ethereum network, so there’s no way a node could keep up with the state sizes these other networks are proposing.
Just as important as the rate of increase is the fact that state grows without bound, and the fundamental design of the system encourages this.
Counter Arguments and Solutions
A common counter is that Moore’s Law will keep disk space expanding fast enough to keep up, mitigating the runaway effects of infinite state expansion. This seems plainly wrong on two counts.
First, the numbers above show disk space requirements growing well beyond what any doubling effect can keep up with.
Second, there are other bottlenecks that grow with state, like the ability of network and database operations to keep up with validation needs.
Another mitigation is layer 1 scaling in the form of state sharding (like we’re proposing). This helps to mitigate the issue by splitting the state across multiple shards so nodes only carry the relevant bits they need to validate transactions on their shard. But as state overall increases to infinity, the average state per node (let’s pretend it’s split fully across nodes) would still increase significantly unless the rate of growth in network node capacity increases fast enough to keep up.
It’s basically the same problem: you can think of each shard as operating like its own Ethereum network, so each shard sees the same (let’s call it 50 GB) state increase every year at its maximum capacity. Meanwhile, cross-shard communication overhead keeps growing as the network topology expands into more shards, so in reality it’s not as simple as thinking about a single shard in isolation.
Either way, as time goes to infinity, we again find ourselves battling the ability of nodes to store, process and transmit the necessary state against the overall network activity.
In the “good” case, improvements in technology outpace the growth of the network usage and node capability remains sufficient to keep up. More nodes can join the network and it decentralizes over time.
In the “bad” case, network activity and state size exceed what nodes can handle, nodes fall out of sync, and we either see a large spike in transaction costs that destroys the economics of businesses built on top of the chain, or the runaway effects continue until the last node drops out and the chain dies.
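The race between cumulative state growth and node capacity can be sketched as a toy model. The growth rates and capacities below are illustrative assumptions I’ve picked to show both cases, not measurements of any real network:

```python
from typing import Optional

def years_until_overflow(shard_growth_gb_per_year: float,
                         node_capacity_gb: float,
                         capacity_growth_rate: float,
                         horizon_years: int = 100) -> Optional[int]:
    """Return the first year in which cumulative per-shard state exceeds a
    node's storage capacity, or None if capacity keeps up over the horizon.
    State grows linearly; capacity compounds annually."""
    state = 0.0
    capacity = node_capacity_gb
    for year in range(1, horizon_years + 1):
        state += shard_growth_gb_per_year
        capacity *= (1 + capacity_growth_rate)
        if state > capacity:
            return year
    return None

# "Good" case: 50 GB/year of state vs a 2 TB node growing 20%/year.
print(years_until_overflow(50, 2000, 0.20))  # -> None (capacity keeps up)
# "Bad" case: same state growth vs a 200 GB node growing only 5%/year.
print(years_until_overflow(50, 200, 0.05))   # -> 6 (node falls behind in year 6)
```

The point of the exercise is that the outcome flips entirely on the relative rates, which is exactly why the modeling called for below matters.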
The problem is that I haven’t yet seen an analysis that puts these factors next to each other on a chart so we can tell which case we’re actually in.
Pruning (see this Stack Overflow post) basically allows garbage collection of old, now-obsolete branches of the state trie whenever new state transitions edit or delete data. In a system with deterministic finality, like PoS with BFT (unlike probabilistic PoW), that’s viable.
The way pruning is generally presented is as a passive efficiency that is gained as a byproduct of everyday transactions. I’m particularly interested in how we can actually incentivize active pruning by creating incentives for applications to release their state.
It seems simply unsustainable that a one-time fee lets you lock up state for all eternity, creating infinite future costs for the network. At the very least, we should incentivize people to release state so it can be pruned away, e.g. by rebating some of their gas fees. The economics here are particularly interesting (presumably you’d be more incentivized to remove old state when gas prices are high, which seems like an appropriate mechanism but might increase congestion further at exactly that moment), and the concept bears further research.
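One way the rebate idea could be structured is as a storage deposit: part of the fee paid to write state is escrowed and refunded when that state is deleted. Everything here, the class name, the per-byte rate, and the rebate fraction, is hypothetical, a sketch of the concept rather than any existing protocol’s mechanism:

```python
from dataclasses import dataclass

@dataclass
class StorageDeposit:
    """Hypothetical 'bottle deposit' for on-chain storage: writers escrow
    a deposit proportional to bytes stored, refunded on deletion."""
    deposit_per_byte: int

    def charge(self, nbytes: int) -> int:
        """Deposit escrowed when nbytes of state are written."""
        return nbytes * self.deposit_per_byte

    def rebate(self, nbytes: int, rebate_fraction: float = 1.0) -> int:
        """Refund paid out when nbytes of state are deleted.
        A rebate_fraction < 1 lets the protocol retain part of the
        deposit, so deletion is rewarded but not fully free."""
        return int(nbytes * self.deposit_per_byte * rebate_fraction)

d = StorageDeposit(deposit_per_byte=100)
paid = d.charge(32)       # escrowed for a 32-byte storage slot -> 3200
back = d.rebate(32, 0.8)  # 80% refunded when the slot is cleared -> 2560
print(paid, back)
```

A partial rebate (the 0.8 above) is one knob for the congestion concern raised in the text: it keeps the incentive to free state without making the round trip a pure wash.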
Recommendations and Open Questions
We need to nail down the math on this. What are the second-order effects of increasing state size, like network usage, database query times and compute requirements? If we can model these, we can at least gut-check assumptions about the ability of Moore’s Law, Nielsen’s Law or whatever law to keep up.
We need to think about how to incentivize freeing up space to create a strong countering force to this runaway state. I’d love to explore refunding a portion of gas fees for freeing state, even if that means charging more up front, sort of like a bottle deposit at the grocery store.