Handling P2P Connectivity Challenges in Personal Nodes


OverNode (now OverScape) has been my primary project since I joined the current team. Its goal is clear: lowering the barriers users face when maintaining a validator node for Over Protocol. To that end, I developed a GUI client for desktop users in 2023, and in 2024 I have been working as a Protocol Engineer to support and maintain a resilient P2P network.

The main issue users encounter is connectivity. Users often experience sudden drops in their peer count, which leads to two critical problems: their nodes fail to sync, and their validators perform poorly. Through extensive investigation, I found that these issues fall into two main categories: (1) incorrect client implementations and (2) user environment factors. Let's delve into each category.

Client Bugs in the P2P Layer.

As our CLI clients are forked from Ethereum clients (go-ethereum for the EL, Prysm for the CL), I needed to thoroughly review their implementation. For example, the following log kept appearing:

time="2024-08-02 11:31:08" level=debug msg="Could not ping peer" error="invalid sequence number provided" prefix=sync

One critical issue is the persistence of MetaData. According to the consensus-specs, each CL client should manage its own metadata and store its peers' metadata. The sequence number (seq_number) versions metadata updates and is distinct from the sequence number in the ENR.

To maintain a healthy connection, a node continuously revalidates its peers by pinging them. Let me denote Alice as the sender and Bob as the receiver. A ping message contains Alice's sequence number, and Bob validates the message by comparing the sequence number in the ping with the one in his own table. If Bob receives a larger sequence number, he updates his table accordingly. A smaller sequence number, however, indicates that Alice is attempting to establish a connection with stale metadata. Alice then keeps pinging Bob in a loop, incrementing her bad-peer score on Bob's side, until eventually both nodes classify each other as bad peers.

To suppress the IP colocation issue, OverNode enables --p2p-static-id to persist the node ID. When the ID is fixed, the metadata must also persist. However, in the current Prysm implementation there is no write operation for the updated sequence number, so on every restart a node reinitializes its sequence number to 0. I submitted a PR to address this issue in Prysm while also fixing it in our client.

See: fix: update p2p `metaData` file when it is changed (prysmaticlabs/prysm, PR #14401), which addresses Issue #13586, where the beacon-chain did not update the metaData file as expected.

Additionally, there was a port-mapping issue in go-ethereum (fixed in PR #28911): the mapped port was deleted prematurely during the UPnP lifecycle, unexpectedly breaking inbound connections. Dynamic port mapping is especially important for our users, as they primarily rely on router-based connections.


User Environment Factors in P2P Issues.

As Ethereum's beacon chain is based on wall-clock time, time drift is critical for participating in consensus. Within a given slot time (12 seconds), a proposer must build a block and broadcast it, while attesters receive the block via gossipsub and broadcast their votes (called attestations) to the network. Aggregators for each committee then collect all the votes they have seen and submit them to the network. A node that is not properly synchronized in time cannot fully participate or perform its duties as expected (e.g., its attestation might be included late).

One key point to note is that the discovery protocol is not affected by system time. The rationale behind discv5 clearly states this. Discv5 uses a handshake method to exchange session keys, which prevents replay attacks.

There have been reports describing the impact of time drift in Prysm (#8144, #13936). While most node operators run their nodes in a Linux environment, approximately 70% of OverNode users run it on Windows. It has also been observed that Windows users are more prone to experiencing time drifts.

This is why OverNode alerts users who suffer from significant time drift issues by recommending the use of NTP servers. Since adjusting time settings requires administrator permissions, OverNode provides guidance for accessing the system settings page specific to each OS.

Another issue arises from the router configuration. Most of our users operate nodes over Wi-Fi. If the router supports UPnP, a node can accept inbound requests from its peers. However, a significant number of routers either do not support UPnP or have it disabled. Nodes that only send outbound requests without providing data to peers are often referred to as 'leaf nodes'. If too many nodes behave as leaf nodes, the entire network becomes unhealthy due to the lack of nodes capable of providing data. In such cases, the best advice so far is to encourage users to: 1) enable UPnP, 2) use a wired LAN connection, or 3) manually forward ports for P2P communication.

// go-ethereum
// IP address limits.
bucketIPLimit, bucketSubnet = 2, 24 // at most 2 addresses from the same /24
tableIPLimit, tableSubnet   = 10, 24

// prysm
// DefaultColocationLimit restricts how many peer identities we can see from a single ip or ipv6 subnet.
DefaultColocationLimit = 5

The last issue I want to highlight is the colocation limit. If a user frequently resets their node and rejoins the network with a new identity, the colocation limit for each bucket comes into effect, and within a few hours the user may be rejected by other peers. This can be mitigated by persisting the node ID across reboots and limiting the frequency of resets.


These are the issues I have addressed to ensure robust connectivity for users. Building on this investigation, I developed a practical guide to help users operate their own nodes effectively.