On relay search we show an amber dot next to the relay nickname when it is overloaded. This means that one or many of the following load metrics have been triggered:

Note that if a relay reaches an overloaded state we show it for 72 hours after the relay has recovered.

假若发生节点过载,请执行以下步骤:

  1. https://status.torproject.org/ 的“Tor 网络”(Tor Network)分类检查任何已知问题。

  2. 考虑为你的系统调整 sysctl 优化网络,内存和 CPU 负载。

  3. 请考虑启用 MetricsPort 了解实际情况。

调整 sysctl 优化网络,内存和 CPU 负载

TCP port exhaustion

If you are experiencing TCP port exhaustion consider expanding your local port range. You can do that with

# sysctl -w net.ipv4.ip_local_port_range="15000 64000"

# echo 15000 64000 > /proc/sys/net/ipv4/ip_local_port_range

Keep in mind that tuning sysctl as described is not permanent and will be lost upon restart. You need to add the configuration to /etc/sysctl.conf or to a file in /etc/sysctl.d/ to make it permanent.

MetricsPort

To understand the well-being of Tor relays and the Tor network it is vital to provide and have access to relay metrics. Relay overload information has been added to relay descriptors since 0.4.6+ but it was not until Tor >= 0.4.7.1-alpha that an interface to the underlying relay metrics was available: the metrics port.

Enabling MetricsPort

Tor provides access to the metrics port via a torrc configuration option called MetricsPort.

It's important to understand that exposing the tor MetricsPort publicly is dangerous for the Tor network users, which is why that port is not enabled by default and its access has to be governed by an access policy. Please take extra precaution and care when opening this port, and close it when you are done debugging.

Let's assume you are the only user on a server that runs a Tor relay. You can enable the metrics port adding this to your torrc file:

MetricsPort 127.0.0.1:9035
MetricsPortPolicy accept 127.0.0.1

And then you will be able to easily retrieve the metrics with:

# curl http://127.0.0.1:9035/metrics

which are by default in a Prometheus format.

Note: every user on that server will be able to access those relay metrics in the example above. In general, set a very strict access policy with MetricsPortPolicy and consider using your operating systems firewall features for defense in depth.

For a more detailed explanation about MetricsPort and MetricsPortPolicy see tor's man page.

MetricsPort output

Here is an example of what output enabling MetricsPort will produce:

# HELP tor_relay_load_onionskins_total Total number of onionskins handled
# TYPE tor_relay_load_onionskins_total counter
tor_relay_load_onionskins_total{type="tap",action="processed"} 0
tor_relay_load_onionskins_total{type="tap",action="dropped"} 0
tor_relay_load_onionskins_total{type="fast",action="processed"} 0
tor_relay_load_onionskins_total{type="fast",action="dropped"} 0
tor_relay_load_onionskins_total{type="ntor",action="processed"} 0
tor_relay_load_onionskins_total{type="ntor",action="dropped"} 0
# HELP tor_relay_exit_dns_query_total Total number of DNS queries done by this relay
# TYPE tor_relay_exit_dns_query_total counter
tor_relay_exit_dns_query_total{record="A"} 0
tor_relay_exit_dns_query_total{record="PTR"} 0
tor_relay_exit_dns_query_total{record="AAAA"} 0
# HELP tor_relay_exit_dns_error_total Total number of DNS errors encountered by this relay
# TYPE tor_relay_exit_dns_error_total counter
tor_relay_exit_dns_error_total{record="A",reason="success"} 0
tor_relay_exit_dns_error_total{record="A",reason="format"} 0
tor_relay_exit_dns_error_total{record="A",reason="serverfailed"} 0
tor_relay_exit_dns_error_total{record="A",reason="notexist"} 0
tor_relay_exit_dns_error_total{record="A",reason="notimpl"} 0
tor_relay_exit_dns_error_total{record="A",reason="refused"} 0
tor_relay_exit_dns_error_total{record="A",reason="truncated"} 0
tor_relay_exit_dns_error_total{record="A",reason="unknown"} 0
tor_relay_exit_dns_error_total{record="A",reason="tor_timeout"} 0
tor_relay_exit_dns_error_total{record="A",reason="shutdown"} 0
tor_relay_exit_dns_error_total{record="A",reason="cancel"} 0
tor_relay_exit_dns_error_total{record="A",reason="nodata"} 0
tor_relay_exit_dns_error_total{record="PTR",reason="success"} 0
tor_relay_exit_dns_error_total{record="PTR",reason="format"} 0
tor_relay_exit_dns_error_total{record="PTR",reason="serverfailed"} 0
tor_relay_exit_dns_error_total{record="PTR",reason="notexist"} 0
tor_relay_exit_dns_error_total{record="PTR",reason="notimpl"} 0
tor_relay_exit_dns_error_total{record="PTR",reason="refused"} 0
tor_relay_exit_dns_error_total{record="PTR",reason="truncated"} 0
tor_relay_exit_dns_error_total{record="PTR",reason="unknown"} 0
tor_relay_exit_dns_error_total{record="PTR",reason="tor_timeout"} 0
tor_relay_exit_dns_error_total{record="PTR",reason="shutdown"} 0
tor_relay_exit_dns_error_total{record="PTR",reason="cancel"} 0
tor_relay_exit_dns_error_total{record="PTR",reason="nodata"} 0
tor_relay_exit_dns_error_total{record="AAAA",reason="success"} 0
tor_relay_exit_dns_error_total{record="AAAA",reason="format"} 0
tor_relay_exit_dns_error_total{record="AAAA",reason="serverfailed"} 0
tor_relay_exit_dns_error_total{record="AAAA",reason="notexist"} 0
tor_relay_exit_dns_error_total{record="AAAA",reason="notimpl"} 0
tor_relay_exit_dns_error_total{record="AAAA",reason="refused"} 0
tor_relay_exit_dns_error_total{record="AAAA",reason="truncated"} 0
tor_relay_exit_dns_error_total{record="AAAA",reason="unknown"} 0
tor_relay_exit_dns_error_total{record="AAAA",reason="tor_timeout"} 0
tor_relay_exit_dns_error_total{record="AAAA",reason="shutdown"} 0
tor_relay_exit_dns_error_total{record="AAAA",reason="cancel"} 0
tor_relay_exit_dns_error_total{record="AAAA",reason="nodata"} 0
# HELP tor_relay_load_tcp_exhaustion_total Total number of times we ran out of TCP ports
# TYPE tor_relay_load_tcp_exhaustion_total counter
tor_relay_load_tcp_exhaustion_total 0
# HELP tor_relay_load_socket_total Total number of sockets
# TYPE tor_relay_load_socket_total gauge
tor_relay_load_socket_total{state="opened"} 135
tor_relay_load_socket_total 1048544
# HELP tor_relay_load_oom_bytes_total Total number of bytes the OOM has freed by subsystem
# TYPE tor_relay_load_oom_bytes_total counter
tor_relay_load_oom_bytes_total{subsys="cell"} 0
tor_relay_load_oom_bytes_total{subsys="dns"} 0
tor_relay_load_oom_bytes_total{subsys="geoip"} 0
tor_relay_load_oom_bytes_total{subsys="hsdir"} 0
# HELP tor_relay_load_global_rate_limit_reached_total Total number of global connection bucket limit reached
# TYPE tor_relay_load_global_rate_limit_reached_total counter
tor_relay_load_global_rate_limit_reached_total{side="read"} 0
tor_relay_load_global_rate_limit_reached_total{side="write"} 0

Let's find out what some of these lines actually mean:

tor_relay_load_onionskins_total{type="ntor",action="dropped"} 0

When a relay starts seeing "dropped", it is a CPU/RAM problem usually.

Tor is sadly single threaded except for when the "onion skins" are processed. The "onion skins" are the cryptographic work that needs to be done on the famous "onion layers" in every circuits.

When tor processes the layers we use a thread pool and outsource all of that work to that pool. It can happen that this pool starts dropping work due to memory or CPU pressure and this will trigger an overload state.

If your server is running at capacity this will likely be triggered.

tor_relay_exit_dns_error_total{...}

Any counter in the "*_dns_error_total" realm indicates a potential DNS related problem. However, we realized during the 0.4.7 release cycle that DNS errors are way too noisy and contain too many false positives to be useful for overload reporting purposes. We therefore don't use them anymore for that purpose starting with 0.4.6.9 and 0.4.7.4-alpha. However, we still keep DNS metrics around to give the relay operator insight into what is going on with their relay.

DNS timeout issues and errors only apply to Exit nodes.

tor_relay_load_oom_bytes_total{...}

An Out-Of-Memory invocation indicates a RAM problem. The relay might need more RAM or it is leaking memory. If you noticed that the tor process is leaking memory, please report the issue either via Tor gitLab or sending an email to the tor-relays mailing list.

Tor has its own OOM handler and it is invoked when 75%, of the total memory tor thinks is available, is reached. Thus, let say tor thinks it can use 2GB in total then at 1.5GB of memory usage, it will start freeing memory. That is considered an overload state.

To estimate the amount of memory it has available, when tor starts, it will use MaxMemInQueues or, if not set, will look at the total RAM available on the system and apply this algorithm:

    if RAM >= 8GB {
      memory = RAM * 40%
    } else {
      memory = RAM * 75%
    }
    /* Capped. */
    memory = min(memory, 8GB) -> [8GB on 64bit and 2GB on 32bit)
    /* Minimum value. */
    memory = max(250MB, memory)

To avoid an overloaded state we recommend to run a relay above 2GB of RAM on 64bit. 4GB is advised, although of course it doesn't hurt to add more RAM if you can.

One might notice that tor could be called by the OS OOM handler itself. Because tor takes the total memory on the system when it starts, if the overall system has many other applications running using RAM, it ends up eating too much memory. In this case the OS could OOM tor, without tor even noticing memory pressure.

tor_relay_load_socket_total

These lines indicate the relay is running out of sockets. The solution is to increase ulimit -n for the tor process.

tor_relay_load_tcp_exhaustion_total

These lines indicate the relay is running out of TCP ports.

Try to tune sysctl as described above.

tor_relay_load_global_rate_limit_reached_total

If this counter is incremented by some noticeable value over a short period of time, the relay is congested. It is likely being used as a Guard by a big onion service or for an ongoing DDoS on the network.

If your relay is still overloaded and you don't know why, please get in touch with network-report@torproject.org. You can encrypt your email using network-report OpenPGP key.