Solana recently experienced severe performance degradation due to network congestion. The TPS (number of transactions processed per second) dropped by orders of magnitude (from thousands to tens) for several hours.
Technically, this problem was caused by performance bugs in Solana, in particular in the transaction processing unit (TPU). During periods of market volatility, bots flood the network with duplicate transactions, which bogs down the TPU.
This article elaborates on the design of the TPU and highlights some intricacies.
When a user submits a transaction, it includes a pre-compiled representation of a sequence of instructions, called a “message”:
The message must be signed by one or more keypairs:
The resulting signatures are also included in the transaction and, together with the message content, are sent to the Solana cluster via RPCRequest:
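To make the relationship between a message and its signatures concrete, here is a minimal sketch. The `Message` and `Transaction` structs, the `to_bytes` serialization, and the `sign_message` helper are simplified, hypothetical stand-ins and not the actual Solana SDK types or wire format. Signing is abstracted as a caller-supplied function so that the sketch needs no cryptography library.

```rust
/// Simplified stand-in for a Solana message (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub account_keys: Vec<[u8; 32]>,
    pub instructions: Vec<Vec<u8>>,
}

impl Message {
    /// Toy serialization: concatenates keys and instruction bytes.
    /// This is NOT the real Solana wire format.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut bytes = Vec::new();
        for key in &self.account_keys {
            bytes.extend_from_slice(key);
        }
        for ix in &self.instructions {
            bytes.extend_from_slice(ix);
        }
        bytes
    }
}

/// Simplified stand-in for a signed transaction.
#[derive(Debug, Clone, PartialEq)]
pub struct Transaction {
    pub signatures: Vec<[u8; 64]>,
    pub message: Message,
}

/// Signs the serialized message with every signer and bundles the
/// signatures with the message. Returns `None` when there are no signers,
/// since a message must be signed by at least one keypair.
pub fn sign_message<F>(message: Message, signers: &[F]) -> Option<Transaction>
where
    F: Fn(&[u8]) -> [u8; 64],
{
    if signers.is_empty() {
        return None;
    }
    let bytes = message.to_bytes();
    let signatures = signers.iter().map(|sign| sign(&bytes)).collect();
    Some(Transaction { signatures, message })
}
```

In a real client, each signer would be an Ed25519 keypair, and the signatures would be computed over the message's actual serialized form.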
Upon receiving a transaction, the TPU processes it in three main stages: fetch_stage, sigverify_stage, and banking_stage.
These three stages run on different threads that communicate via message passing over crossbeam_channel (a multi-producer multi-consumer channel).
The TPU creates a channel of unbounded capacity with (packet_sender, packet_receiver):
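As a rough illustration of what an unbounded channel means here, the sketch below uses the standard library's `std::sync::mpsc::channel` as a stand-in for crossbeam_channel. Note the difference: the standard channel is multi-producer single-consumer, whereas crossbeam_channel is multi-producer multi-consumer. The `Packet` alias and the `new_packet_channel` helper are hypothetical names used only for this example.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical packet type for this sketch: just raw bytes.
pub type Packet = Vec<u8>;

/// Creates an unbounded channel. Sending never blocks, so a fast producer
/// can queue packets faster than the consumer drains them.
pub fn new_packet_channel() -> (Sender<Packet>, Receiver<Packet>) {
    channel()
}
```

Because the capacity is unbounded, nothing in the channel itself pushes back on the sender, which is one reason the downstream stage has to shed load on its own.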
The fetch_stage reads the packets on the transaction sockets, and simply forwards them to the sigverify_stage using packet_sender.
The sigverify_stage receives the transaction packets from packet_receiver and uses TransactionSigVerifier to verify if the signature in each packet is valid.
It assumes each packet contains one transaction. The packets are verified in parallel using all available CPU cores (verification can also run on a GPU if one is available).
Note that the TPU creates another channel (verified_sender, verified_receiver), and it uses verified_sender to forward the verified transactions to the next stage (banking_stage).
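The overall data flow between the three stages can be sketched as a small threaded pipeline. This is an illustration under simplifying assumptions, not the actual TPU code: it uses `std::sync::mpsc` instead of crossbeam_channel, the hypothetical `run_pipeline` function takes the incoming packets as a vector instead of reading sockets, and signature verification is replaced by a caller-supplied `is_valid` predicate.

```rust
use std::sync::mpsc::channel;
use std::thread;

/// Runs a toy fetch -> sigverify -> banking pipeline and returns the
/// packets that reached the final stage.
pub fn run_pipeline(raw_packets: Vec<Vec<u8>>, is_valid: fn(&[u8]) -> bool) -> Vec<Vec<u8>> {
    let (packet_sender, packet_receiver) = channel::<Vec<u8>>();
    let (verified_sender, verified_receiver) = channel::<Vec<u8>>();

    // Fetch stage: forwards every incoming packet unchanged.
    let fetch = thread::spawn(move || {
        for packet in raw_packets {
            if packet_sender.send(packet).is_err() {
                break;
            }
        }
        // packet_sender is dropped here, which closes the first channel.
    });

    // Sigverify stage: forwards only the packets that pass verification.
    let sigverify = thread::spawn(move || {
        for packet in packet_receiver {
            if is_valid(&packet) && verified_sender.send(packet).is_err() {
                break;
            }
        }
        // verified_sender is dropped here, which closes the second channel.
    });

    // Banking stage: collects the verified packets (a real bank would
    // execute the transactions instead).
    let banking = thread::spawn(move || verified_receiver.into_iter().collect::<Vec<_>>());

    fetch.join().expect("fetch thread panicked");
    sigverify.join().expect("sigverify thread panicked");
    banking.join().expect("banking thread panicked")
}
```

Each stage owns only its end of a channel, so the stages share no mutable state and shut down in order once the upstream sender is dropped.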
The sigverify_stage not only verifies signatures but also filters out redundant packets and discards excess packets in order to improve performance. The fixes for the recent performance degradation were applied in this component.
It contains three steps:
The discard_excess_packets function is defined as:
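One plausible way to discard excess packets fairly is to keep packets in round-robin order across senders, so that a single spamming address cannot crowd out everyone else. The sketch below illustrates that idea only; it is not the actual Solana implementation, and the `Packet` struct with its `addr`, `data`, and `discard` fields is a hypothetical simplification.

```rust
use std::collections::{BTreeMap, VecDeque};
use std::net::IpAddr;

/// Hypothetical packet for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Packet {
    pub addr: IpAddr,
    pub data: Vec<u8>,
    pub discard: bool,
}

/// Keeps at most `max_packets` packets, chosen round-robin across sender
/// addresses, and marks every other packet as discarded.
pub fn discard_excess_packets(packets: &mut [Packet], max_packets: usize) {
    // Group the indices of live packets by sender address.
    let mut by_sender: BTreeMap<IpAddr, VecDeque<usize>> = BTreeMap::new();
    for (i, packet) in packets.iter_mut().enumerate() {
        if !packet.discard {
            by_sender.entry(packet.addr).or_default().push_back(i);
            // Assume discarded until the packet is selected below.
            packet.discard = true;
        }
    }

    // Take one packet per sender per pass until the limit is reached.
    let mut kept = 0;
    while kept < max_packets && !by_sender.is_empty() {
        by_sender.retain(|_, queue| {
            if kept < max_packets {
                if let Some(i) = queue.pop_front() {
                    packets[i].discard = false;
                    kept += 1;
                }
            }
            // Drop senders that have no packets left.
            !queue.is_empty()
        });
    }
}
```

With this policy, a sender that submits four packets and a sender that submits one both get a packet through before the first sender's second packet is considered.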
The ed25519_dalek::PublicKey.verify function is defined as:
It takes a signature and a message as input, and verifies the signature with respect to the message using the key pair’s public key.
Note that the ed25519_dalek::PublicKey.verify function is non-trivial and subtle, and it is not audited.
The banking_stage creates a thread which executes in a loop to process the received transactions batch by batch. The number of transactions in each batch is limited by
The banking_stage uses an important component called bank to load and execute transactions. The function is defined as:
For each transaction, the bank uses MessageProcessor to process the transaction message:
This method calls each instruction in the message over the set of loaded accounts.
For each instruction, it calls the program entrypoint and verifies that the result of the call does not violate the bank’s accounting rules.
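The per-instruction loop can be sketched as follows. This is a heavily simplified illustration, not the actual MessageProcessor: accounts are reduced to a slice of balances, the `Entrypoint` and `Instruction` types are hypothetical, and the only accounting rule checked is an assumed one, namely that an instruction must not change the total balance across all accounts.

```rust
/// Hypothetical program entrypoint: mutates balances given instruction data.
pub type Entrypoint = fn(&mut [u64], &[u8]) -> Result<(), String>;

/// Hypothetical instruction: a program to call plus its input data.
pub struct Instruction {
    pub program: Entrypoint,
    pub data: Vec<u8>,
}

/// Sums balances in u128 so that the sum itself cannot overflow.
fn total(balances: &[u64]) -> u128 {
    balances.iter().map(|&b| b as u128).sum()
}

/// Calls each instruction in order and verifies after every call that the
/// total balance is unchanged (the assumed accounting rule in this sketch).
/// Unlike a real runtime, this sketch does not roll back on failure.
pub fn process_message(instructions: &[Instruction], accounts: &mut [u64]) -> Result<(), String> {
    for (index, ix) in instructions.iter().enumerate() {
        let before = total(accounts);
        (ix.program)(accounts, &ix.data)
            .map_err(|e| format!("instruction {} failed: {}", index, e))?;
        if total(accounts) != before {
            return Err(format!("instruction {} changed the total balance", index));
        }
    }
    Ok(())
}
```

A transfer between two accounts passes the check, while a program that credits an account without debiting another is rejected.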
Internally, the bank creates an InvokeContext to execute each instruction:
Each transaction has a limited compute budget (by default 200_000 units), defined in ComputeBudget:
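To show how a compute budget constrains execution, here is a minimal metering sketch. Only the default of 200_000 units comes from the text above; the `ComputeMeter` struct, its methods, and the `ComputeBudgetExceeded` error are hypothetical names for illustration and not the actual Solana definitions.

```rust
/// Default per-transaction budget, as stated in the article.
pub const DEFAULT_MAX_UNITS: u64 = 200_000;

/// Error returned when an operation would exceed the remaining budget.
#[derive(Debug, PartialEq)]
pub struct ComputeBudgetExceeded;

/// Hypothetical meter that tracks the units left for one transaction.
pub struct ComputeMeter {
    remaining: u64,
}

impl ComputeMeter {
    pub fn new(max_units: u64) -> Self {
        ComputeMeter { remaining: max_units }
    }

    /// Charges `units` against the budget. If the budget is insufficient,
    /// the meter is drained to zero and an error is returned.
    pub fn consume(&mut self, units: u64) -> Result<(), ComputeBudgetExceeded> {
        match self.remaining.checked_sub(units) {
            Some(left) => {
                self.remaining = left;
                Ok(())
            }
            None => {
                self.remaining = 0;
                Err(ComputeBudgetExceeded)
            }
        }
    }

    pub fn remaining(&self) -> u64 {
        self.remaining
    }
}
```

In such a design, every instruction in a transaction charges the same meter, so the budget bounds the total work of the transaction and not of each instruction separately.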
Executing an instruction in the bank involves many complications, such as
We will elaborate on these details and the bank life cycle in the next article.
sec3 is a security research firm that prepares Solana projects for millions of users. sec3’s Launch Audit is a rigorous, researcher-led code examination that investigates and certifies mainnet-grade smart contracts; sec3’s continuous auditing software platform, X-ray, integrates with GitHub to progressively scan pull requests, helping projects fortify code before deployment; and sec3’s post-deployment security solution, WatchTower, ensures funds stay safe. sec3 is building technology-based scalable solutions for Web3 projects to ensure protocols stay safe as they scale.
To learn more about sec3, please visit https://www.sec3.dev