RTP Circuit Breaker: Interactions with Explicit Congestion Notification

How should the RTP circuit breaker react to persistent excessive congestion signalled via ECN? Indeed, should the circuit breaker react to such a congestion signal?

During the IESG review of the RTP circuit breaker draft, Mirja Kühlewind asked whether its response to ECN-CE markings was correct. This turns out to be a surprisingly difficult question to answer, and what is appropriate guidance for an implementation now might not be appropriate in the future.

The guidelines in RFC 3168 are that “upon the receipt by an ECN-Capable transport of a single CE packet, the congestion control algorithms followed at the end-systems MUST be essentially the same as the congestion control response to a single dropped packet” (RFC 6679 has similar language). Since the congestion circuit breaker responds to the same congestion signals as a congestion control algorithm, this suggests that it should consider ECN-CE marked packets as lost packets when calculating the TCP throughput estimate to determine if the congestion circuit breaker triggers. Accordingly, the -15 version of the RTP circuit breaker draft states that “ECN-CE marked packets SHOULD be treated as if they were lost when calculating if the congestion-based RTP circuit breaker (Section 4.3) has been met, unless the RTP implementation can determine that the ECN-CE marking on this path is not reliable”.
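
As a rough sketch of what treating ECN-CE marks as losses means in practice, the following Python fragment folds the marks into the loss fraction before computing a TCP throughput estimate. It assumes the simplified TCP throughput equation X = s / (R * sqrt(2p/3)) and a trigger threshold of ten times that estimate. The function names, the `ce_reliable` flag, and the default threshold are illustrative assumptions, not text taken from the draft.

```python
import math


def loss_fraction(lost: int, ce_marked: int, expected: int,
                  ce_reliable: bool = True) -> float:
    """Fraction of packets treated as congestion losses in one reporting interval.

    ECN-CE marked packets count as lost unless marking on this path is
    known to be unreliable (ce_reliable is a hypothetical input).
    """
    if expected <= 0:
        return 0.0
    signalled = lost + (ce_marked if ce_reliable else 0)
    return min(1.0, signalled / expected)


def tcp_throughput_estimate(packet_size: float, rtt: float, p: float) -> float:
    """Simplified TCP throughput estimate in bytes per second.

    Assumes X = s / (R * sqrt(2p/3)); returns infinity when p is zero.
    """
    if p <= 0.0:
        return math.inf
    return packet_size / (rtt * math.sqrt(2.0 * p / 3.0))


def exceeds_estimate(send_rate: float, packet_size: float, rtt: float,
                     p: float, factor: float = 10.0) -> bool:
    """True if the sending rate is more than `factor` times the estimate."""
    return send_rate > factor * tcp_throughput_estimate(packet_size, rtt, p)
```

With 2 lost and 8 ECN-CE marked packets out of 100 expected, the loss fraction passed to the estimate is 0.1 rather than 0.02, so the estimate is lower and the circuit breaker condition is met at a lower sending rate.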

More recent work, however, has suggested that the response to an ECN-CE mark ought to be less severe than the response to packet loss. For example, Naeem Khademi's work on TCP alternate back-off for ECN makes the argument that TCP congestion control ought to back off less in response to an ECN-CE mark than to packet loss, because networks that generate ECN-CE marks tend to use AQM schemes with much smaller buffers. For RTP congestion control, the current versions of both NADA and SCReAM suggest responding differently to ECN-CE marked packets than to lost packets, for quality of experience reasons, but make different proposals for how the response ought to change. These proposals imply that a different circuit breaker threshold should be used for congestion signalled by ECN-CE marks than for congestion signalled by packet loss, but unfortunately offer no clear guidance on how the threshold ought to be changed.

Looking further forward, there are suggestions that forthcoming AQM proposals might mark packets with ECN-CE significantly more aggressively than at present, as part of a move towards using more scalable congestion control algorithms, like DCTCP, in a dual-queue scenario to lower latencies. Any such deployment would likely be incompatible with deployed TCP implementations, so this is not a short-term issue, but it would require significant changes to the congestion circuit breaker response. Specifically, it requires a congestion control algorithm where the throughput is proportional to 1/p, where p is the ECN-CE marking probability, rather than to 1/sqrt(p), where p is the loss event rate, and equivalent changes to the circuit breaker.
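
To illustrate why the two response functions call for different circuit breaker thresholds, the short sketch below compares how the relative rate changes with the signal probability p under each model. The constants of proportionality are set to 1, an arbitrary choice made for illustration, so only the ratios are meaningful. The function names are hypothetical.

```python
import math


def relative_rate_sqrt(p: float) -> float:
    """Relative throughput under a 1/sqrt(p) response (classic TCP-like)."""
    return 1.0 / math.sqrt(p)


def relative_rate_linear(p: float) -> float:
    """Relative throughput under a 1/p response (scalable, DCTCP-like)."""
    return 1.0 / p


def rate_reduction(model, p_before: float, p_after: float) -> float:
    """Factor by which the rate falls when p rises from p_before to p_after."""
    return model(p_before) / model(p_after)


if __name__ == "__main__":
    # Quadrupling the signal probability from 1% to 4%.
    print(rate_reduction(relative_rate_sqrt, 0.01, 0.04))
    print(rate_reduction(relative_rate_linear, 0.01, 0.04))
```

Quadrupling p roughly halves the rate under the 1/sqrt(p) model but cuts it to about a quarter under the 1/p model. The same value of p therefore corresponds to different rates in the two models, and a threshold derived from one cannot be reused unchanged for the other.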

Given this, what should be the response of the RTP circuit breaker to ECN-CE marks? Firstly, it's important to note that it is safe for the circuit breaker to ignore ECN-CE marks entirely, since excessive persistent congestion will eventually lead to packet loss that will trigger the circuit breaker. Doing this will protect the network from congestion collapse, but might result in sub-optimal user experience for competing flows that share the bottleneck queue, since that queue will be driven to overflow, inducing high latency.

If this is a concern, the only current guidance that can be given is for implementations to treat ECN-CE marked packets as equivalent to lost packets, whilst being aware that this might trigger the circuit breaker prematurely in future, depending on how AQM and ECN deployment evolve. Developers who implement a circuit breaker based on ECN-CE marks will need to track future developments in AQM standards and deployed ECN marking behaviour, and ensure their implementations are updated to match.
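
One way to keep both options open, and to make later updates easier, is to isolate the choice behind a single policy setting. The sketch below is a minimal illustration of that idea. The `CePolicy` enum and `congestion_events` function are hypothetical names, not part of any RTP library or of the draft.

```python
from enum import Enum


class CePolicy(Enum):
    """How the circuit breaker treats ECN-CE marked packets (hypothetical)."""
    IGNORE = "ignore"                # rely on eventual packet loss alone
    TREAT_AS_LOSS = "treat_as_loss"  # count each mark as a lost packet


def congestion_events(lost: int, ce_marked: int, policy: CePolicy) -> int:
    """Number of packets counted as lost for the circuit breaker calculation."""
    if policy is CePolicy.TREAT_AS_LOSS:
        return lost + ce_marked
    return lost
```

If marking behaviour changes in future, a new policy value can be added in one place without touching the rest of the circuit breaker logic.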

The -16 version of the RTP circuit breaker draft will be updated to reflect this guidance.