Fatal flaw of Ethereum 2.0

Jun 6, 2020

Original: North American block king

Source: Those things in the blockchain

Following the Bitcoin halving, another major event is brewing in the crypto community: Ethereum is about to upgrade. Before I start talking about Ethereum 2.0, let me first ask a question.

Suppose a bank has total reserves of 10,000 yuan and operates separate ATMs in cities A and B. One person withdraws 5,000 yuan from the ATM in city A, and another person withdraws 5,000 yuan in city B. How much money is left in the bank?

This is a very simple math problem, and I believe everyone can give the correct answer. The bank had 10,000 yuan in total, and A and B each withdrew 5,000 yuan, so 10,000 yuan was withdrawn altogether. 10000 - 10000 = 0, so the bank has 0 yuan left.

The question is simple for us, but not so easy for a computer. Suppose we want to write a program that does the above. The difficulty lies in keeping the bank's data synchronized in real time while the withdrawals are in progress. If it is not synchronized, then when B operates the ATM, the reserve amount B reads may still be 10,000, without A's 5,000 deducted. That is obviously wrong! Let me simulate it with a Java program and you will see.

We first define a simple class called Bank. It has a single field, balance, which holds the account balance, and only two operations: withdraw and getBalance.

public class Bank {
    // Bank balance
    private int balance;

    public Bank(int balance) {
        this.balance = balance;
    }

    // Customer withdrawal
    public void withdraw(int value) {
        try {
            Thread.sleep(300); // simulated delay of 0.3 seconds
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        this.balance -= value;
    }

    // Query the current balance
    public int getBalance() {
        return this.balance;
    }
}

Next is the main program used for demonstration:

public class Demo {
    public static void main(String[] args) throws InterruptedException {
        Bank bank = new Bank(10000); // the bank's initial balance

        Runnable Atm1 = () -> {
            bank.withdraw(5000);
            System.out.println("A withdraws 5000");
        };

        Runnable Atm2 = () -> {
            bank.withdraw(5000);
            System.out.println("B withdraws 5000");
        };

        Thread A = new Thread(Atm1); // thread for customer A
        Thread B = new Thread(Atm2); // thread for customer B
        A.start(); // A starts withdrawing
        B.start(); // B starts withdrawing
        A.join();  // wait for A to finish
        B.join();  // wait for B to finish

        // Show the balance
        System.out.println("Bank balance: " + bank.getBalance());
    }
}

In this program, we set the initial balance to 10,000 and simulate two threads, A and B. Both make a withdrawal at about the same time, and both take 5,000 yuan. Let's look at what happens when we run it.

After A and B have each withdrawn 5000, the bank has nothing left, so the program should print a balance of 0. Instead, it still shows 5000. This is where the problem lies! Interestingly, if you run the program repeatedly, you will find that the result may differ from run to run. Sometimes it shows 0, sometimes 5000. In computing this phenomenon is known as a race condition: multiple threads compete for the same resource, and their updates interfere with one another. In our case, A and B use two different ATMs, which start two withdrawal threads, and both threads have to modify the bank balance. B in effect snatches away A's right to change the data, so that A's freshly written update is immediately overwritten by B.

So why does this happen? It comes down to how a CPU executes instructions. Every instruction needs to know what it operates on and what that operand's value is; otherwise the instruction is meaningless. Where does it find the operand? The CPU has components called registers dedicated to holding this information. An instruction reads the operand's value from a register before it can execute. AX, for example, is a CPU register that can hold a 16-bit binary number.

In our example, withdraw is the withdrawal instruction and its operand is the bank balance. What is the value of this balance? The CPU looks it up in the register.

Once the value is obtained, the instruction begins to execute. In the process it modifies the contents of the register, which completes the data update. This is how our bank balance gets updated. If a new instruction then asks for the current balance, it goes back to that register to find the answer. (Strictly speaking, the balance lives in memory and is copied into a register for the calculation and then written back. For simplicity, the rest of this discussion treats that shared storage location as "the register".)

But a computer does not execute just one instruction per unit of time. In many cases, multiple instructions run simultaneously. Otherwise, how could you browse the web while listening to music? To achieve this "parallel operation", the system uses multithreading: instructions are encapsulated in different threads, and proper scheduling lets multiple programs run in parallel. For example, we can use one thread to execute the withdraw instruction and at the same time assign another thread to query the current balance (getBalance).

The query operation does not change the state of the register, so these two threads are safe together. But if you then introduce a third thread that also performs a withdrawal, things become much harder, because it may compete with thread 1 for the same register. Suppose threads 1 and 3 update the register at the same time. It is quite possible that when thread 3 runs, thread 1 has not yet had time to update the value of balance, so the value thread 3 reads is still the one from before the update, i.e. balance = 10000. After thread 1 finishes, the balance is updated to 5000, but that does not help, because thread 3 is already in progress. Thread 3's update is therefore still based on the old value of 10000, and the final balance is still 5000 (10000 - 5000 = 5000).
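The interleaving just described can be forced to occur on every run rather than left to chance. The following is only a minimal sketch and not the article's original program: the class name `LostUpdateDemo`, the method `runOnce`, and the `CountDownLatch` that makes both threads read the balance before either writes are assumptions introduced for illustration.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class, not part of the article's program.
class LostUpdateDemo {
    private int balance = 10000;

    // Runs two withdrawals of 5000 that both read before either writes,
    // and returns the final balance.
    static int runOnce() {
        LostUpdateDemo bank = new LostUpdateDemo();
        CountDownLatch bothHaveRead = new CountDownLatch(2);

        Runnable withdraw5000 = () -> {
            int seen = bank.balance;      // read the shared balance (10000)
            bothHaveRead.countDown();
            try {
                bothHaveRead.await();     // wait until the other thread has read too
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            bank.balance = seen - 5000;   // write back a result based on the stale value
        };

        Thread thread1 = new Thread(withdraw5000);
        Thread thread3 = new Thread(withdraw5000);
        thread1.start();
        thread3.start();
        try {
            thread1.join();
            thread3.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return bank.balance;
    }

    public static void main(String[] args) {
        System.out.println("Final balance: " + runOnce()); // Final balance: 5000
    }
}
```

Because both threads compute `seen - 5000` from the same value of 10000, one withdrawal is lost no matter which thread writes last.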

To avoid this situation, we must ensure that the state of the register stays synchronized under multi-threaded operation. Although the threads all share the same piece of data, they must take turns. In the case above, we must ensure that while thread 1 is operating on the register, no other thread can access it. Only after thread 1 has finished may thread 3 proceed. In this way, the register data read by each thread is synchronized.

To synchronize multiple threads, the system provides a "protection lock" (lock) mechanism, which marks the shared register resource as "locked". Any thread that accesses the register can lock it. Other threads then cannot get access and can only wait their turn. The lock is released only after the current thread has finished, at which point the waiting threads are woken up automatically and begin to access the resource. In our example, thread 1 can take the lock when it accesses the register, which forces thread 3 to wait. After thread 1 finishes, the balance is updated to 10000 - 5000 = 5000. The lock is then released, and thread 3 is woken up and begins to access the variable balance, taking the lock itself in the same way. The value it now reads is 5000, and after it finishes the balance is updated to 5000 - 5000 = 0.

Correspondingly, modifying our Java source code as follows implements this "protection lock" mechanism:

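import java.util.concurrent.locks.ReentrantLock;

public class Bank {
    // Bank balance
    private int balance;
    private final ReentrantLock lock = new ReentrantLock();

    // …….

    // Customer withdrawal
    public void withdraw(int value) {
        lock.lock(); // acquire the protection lock
        try {
            try {
                Thread.sleep(300); // simulated delay of 0.3 seconds
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            this.balance -= value;
        } finally {
            lock.unlock(); // release the protection lock, even if an exception is thrown
        }
    }
}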

Running the program again, you can see that the bank balance is correctly updated to 0 after the two withdrawals by A and B.
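Since the listing above elides part of the class, here is a self-contained variant that can be run as is. It is a sketch under stated assumptions rather than the author's code: the class name `SynchronizedBank` and the method `runOnce` are hypothetical, the delay is shortened to 10 ms, and mutual exclusion is obtained with Java's `synchronized` keyword instead of `ReentrantLock`.

```java
// Hypothetical variant of the article's Bank class: same idea, but mutual
// exclusion comes from `synchronized` instead of ReentrantLock.
class SynchronizedBank {
    private int balance;

    SynchronizedBank(int balance) {
        this.balance = balance;
    }

    // Only one thread at a time may run a synchronized method on this object,
    // so the read and the write below can never interleave with another withdrawal.
    synchronized void withdraw(int value) {
        int seen = balance;
        try {
            Thread.sleep(10); // widen the window between read and write
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        balance = seen - value;
    }

    synchronized int getBalance() {
        return balance;
    }

    // Runs two concurrent withdrawals of 5000 and returns the final balance.
    static int runOnce() {
        SynchronizedBank bank = new SynchronizedBank(10000);
        Thread a = new Thread(() -> bank.withdraw(5000));
        Thread b = new Thread(() -> bank.withdraw(5000));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return bank.getBalance();
    }

    public static void main(String[] args) {
        System.out.println("Final balance: " + runOnce()); // Final balance: 0
    }
}
```

`synchronized` is the simpler choice when the whole method is the critical section; `ReentrantLock` is useful when finer control is needed, such as timed or interruptible lock acquisition.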

So we can see that although each thread is independent, the scheduling of all threads is centralized. The scheduler is like a brain. It has to allocate resources to the different threads sensibly and arrange their order of execution to keep the data synchronized. This brain must therefore know which registers are locked, which threads are accessing them, and which threads are waiting. In other words, it has a "God's-eye view" from which it can monitor the status of every thread and every register in real time.

That is how a single program on one computer works, and at a larger scale a deployment across many machines works the same way. Take the Taobao website as an example. It has to process tens of millions of transaction requests on Double 11, so one server is definitely not enough. It must deploy multiple server nodes and distribute the requests evenly across them through a load balancer.

But no matter how many server nodes there are and no matter how many requests arrive, in the end they all access the same database! This point is very important, because only then can you introduce a "protection lock" mechanism that locks the relevant database table during a transaction and keeps reads and writes synchronized. Suppose a Tmall store has 10 Chanel bags on sale at 20% off and 100 people rush to buy them. Once the first person has placed an order and the transaction has started, the database must put the other purchase requests in a waiting queue. This ensures that the next person sees 9 bags rather than the original 10.
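The flash-sale scenario can be sketched in a few lines. This is an illustration under stated assumptions, not Taobao's actual system: the class `FlashSale` stands in for the single shared database row, and `tryBuy` and `simulate` are hypothetical names. Each buyer is a thread, and the stock check and the decrement happen under one lock.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the shop's single shared database row.
class FlashSale {
    private int stock;

    FlashSale(int stock) {
        this.stock = stock;
    }

    // The check and the decrement happen under one lock, so no two buyers
    // can both claim the last bag.
    synchronized boolean tryBuy() {
        if (stock == 0) {
            return false;
        }
        stock--;
        return true;
    }

    synchronized int getStock() {
        return stock;
    }

    // Simulates `buyers` concurrent purchase requests against `initialStock`
    // items and returns how many purchases succeeded.
    static int simulate(int initialStock, int buyers) {
        FlashSale sale = new FlashSale(initialStock);
        AtomicInteger sold = new AtomicInteger();
        Thread[] threads = new Thread[buyers];
        for (int i = 0; i < buyers; i++) {
            threads[i] = new Thread(() -> {
                if (sale.tryBuy()) {
                    sold.incrementAndGet();
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sold.get();
    }

    public static void main(String[] args) {
        System.out.println("Bags sold: " + simulate(10, 100)); // Bags sold: 10
    }
}
```

With 100 buyers and 10 bags, exactly 10 purchases succeed, because every request goes through the same lock on the same object.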

Therefore, although Taobao's node deployment is a distributed architecture, it is essentially centralized. This centralization shows in how node threads are monitored and scheduled, and in the database design. In other words, there is a control center behind Taobao's servers that can check the status of every node in real time, and all these nodes access the same database. It is this centralized architecture that lets so many nodes process so many requests in parallel while still keeping the data synchronized.

The distributed architecture of a public chain is completely different, because it is decentralized by nature and each node acts on its own. First, it has no control center that governs the status of all nodes. Second, it has no centralized database for the nodes to access; each node maintains its own database independently. Multiple ledgers therefore exist at the same time, and we can only introduce a voting mechanism to settle on one final ledger and so achieve synchronization between nodes indirectly. Bitcoin uses hash-power voting to select the longest blockchain as the final ledger. Although different nodes will produce multiple blockchain ledgers, which raises the Byzantine Generals problem, Bitcoin's algorithm works, because it has a single-chain structure and only one block can be produced per unit of time. Different nodes can broadcast a block at the same time, but Bitcoin's mining mechanism guarantees that only one of them ends up in the chain. So Bitcoin is essentially a single-threaded database read and write operation.
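The "longest chain wins" rule can be sketched as a small selection function. This is a simplified illustration, not Bitcoin's implementation: the class `LongestChain`, the method `select`, and the block labels are hypothetical, and chains are compared by length as in the text above, whereas real Bitcoin nodes compare accumulated proof of work.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the selection rule described above.
class LongestChain {
    // Returns the longest candidate chain; on a tie the earlier candidate wins.
    // Assumes at least one candidate is supplied.
    static List<String> select(List<List<String>> candidates) {
        List<String> best = candidates.get(0);
        for (List<String> chain : candidates) {
            if (chain.size() > best.size()) {
                best = chain;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two nodes agree up to block b1 and then diverge.
        List<String> nodeA = Arrays.asList("genesis", "b1", "b2a");
        List<String> nodeB = Arrays.asList("genesis", "b1", "b2b", "b3b");
        System.out.println(select(Arrays.asList(nodeA, nodeB))); // [genesis, b1, b2b, b3b]
    }
}
```

Every node that applies the same rule to the same set of candidates arrives at the same ledger, which is how agreement is reached without a central database.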

Ethereum originally had no problem, because like Bitcoin it has a single-chain structure and uses PoW consensus. But after the upgrade to 2.0 the problem becomes serious, because Ethereum 2.0 introduces a mechanism called "sharding". In simple terms, it borrows Taobao's load-balancer approach: set up multiple nodes to process different requests in batches. For example, if there are 10,000 transaction requests, node A handles 5,000 and node B handles the remaining 5,000, so wouldn't that be faster? I admit the intention is good, but in practice it does not work. According to the Ethereum 2.0 documentation, it first introduces a main chain called the Beacon chain, which records the status of all transactions and serves as the core ledger. It then divides the whole node network into different areas, each of which acts as a shard, much like the servers behind a load balancer. Each shard processes different transaction requests and records them on the main chain.

You may be a little confused at this point, so let me explain it another way. If you have read the introduction above, you should have a basic understanding of multi-thread synchronization. Ethereum 2.0 has a similar architecture. You can think of the Beacon chain as a central database and of each shard as an independent thread. The block reported by each thread is different. For example, the block from shard 1 contains transactions 1 to 3000, and the block from shard 2 contains transactions 3000 to 6000. So Ethereum 2.0 amounts to a multi-threaded database read and write operation. This is fundamentally different from Bitcoin.

When multiple threads operate on the same database, data synchronization problems arise easily, so the correct approach is to lock the database while each thread executes and keep other threads from accessing it at the same time. The same applies here: when a shard updates the Beacon chain, a "protection lock" must be placed on the main chain to force the other shards into a waiting queue. Vitalik Buterin has indeed taken this into account and is preparing to introduce such a "protection lock" mechanism. But here is where the mistake lies: this Beacon chain is not a single central database.

Remember that Ethereum is a public chain, and a public chain is decentralized! So every mining node has its own copy of the Beacon chain, and the "lock" here is a lock placed on your own copy. This locked state is obviously not synchronized with the other nodes, so the remaining shard nodes will go on accessing the main chain. At that point the race condition I described earlier arises between different shards, and the update from shard 1 may be overwritten by shard 2.
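The argument that a per-node lock excludes nobody else can be illustrated with a toy model. This sketch models only the article's reasoning and makes no claim about how the real Ethereum 2.0 protocol behaves: the class `NodeReplica`, its fields, and the method `replicasDiverge` are hypothetical names. Each node object holds its own copy of the chain and its own lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the article's argument: every node keeps its own
// copy of the Beacon chain and its own lock.
class NodeReplica {
    final ReentrantLock lock = new ReentrantLock();
    final List<String> beaconChain = new ArrayList<>();

    // Locks only this node's copy and appends the block to it.
    void appendLocally(String block) {
        lock.lock();
        try {
            beaconChain.add(block);
        } finally {
            lock.unlock();
        }
    }

    // Returns true if the two replicas end up with different chains.
    static boolean replicasDiverge() {
        NodeReplica node1 = new NodeReplica();
        NodeReplica node2 = new NodeReplica();

        node1.lock.lock(); // node 1 "locks the Beacon chain", i.e. its own copy
        try {
            // Node 1's lock is invisible to node 2, whose own lock is free,
            // so this append is not blocked.
            node2.appendLocally("block from shard 2");
            // Node 1 already holds its lock; ReentrantLock lets it lock again.
            node1.appendLocally("block from shard 1");
        } finally {
            node1.lock.unlock();
        }
        return !node1.beaconChain.equals(node2.beaconChain);
    }

    public static void main(String[] args) {
        System.out.println("Replicas diverge: " + replicasDiverge()); // Replicas diverge: true
    }
}
```

In the earlier bank example both threads contended for one lock on one object; here there are two locks on two copies, so neither update waits for the other and the copies end up different.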

This shows that a protection lock in a decentralized architecture is nothing but a sham. Ethereum 2.0 also allows read and write operations between shards, which exposes the same multi-thread synchronization problem.

Some people may ask: can't we synchronize the lock state of the Beacon chain to the other shards? That brings back the voting problem. Each shard node sees the main chain from its own perspective, so the state each one sees is different. For example, node 1 locks the main chain in its own records, but node 2 does not agree, because it never saw the lock. Node 1 thinks there is a lock, and node 2 thinks there is none. The Byzantine Generals problem reappears, so the matter can only be decided by a vote. But Ethereum 2.0 uses PoS rather than PoW, so a consensus based on choosing the longest chain is meaningless. Producing a block costs nothing, so whoever wins the right to produce a block can broadcast several blocks at once, and the longest chain no longer represents the broadest consensus. Counting the votes then becomes more complicated. Even if the consensus algorithm counts the votes correctly and decides that node 1 wins, that also means the block from shard 2 is discarded. What is the point of sharding then? If you want to keep the block from shard 2, you must put that thread in a waiting queue. The problem is that you have nothing like a control center that can allocate resources globally across threads. You do not even know the status of each thread. So doesn't this lead straight back to the old path of centralization?

Based on the above analysis, I conclude that when Ethereum 2.0 goes live, there will be many data synchronization problems. Not only will the shards be out of sync, but the Beacon chain copies held by different nodes will be out of sync as well. A public chain is different from Taobao. If data gets out of sync on Taobao, the worst case is that someone fixes the database or restarts the server. On a public chain, however, data that is out of sync will split the miner community and cause a fork, which is a very serious problem.

Written by 比特币新闻 — 区块链新闻