12 June 2016
How should the
RTP circuit breaker react to persistent excessive congestion
signalled via ECN? Indeed, should the circuit breaker
react to such a congestion signal?
During the IESG review of the RTP circuit breaker draft,
Mirja Kühlewind asked whether its response to ECN-CE markings was
correct. This turns out to be a surprisingly difficult question to
answer, and guidance that is appropriate for implementations today might
not be appropriate in the future.
RFC 3168 states that “upon the receipt by an ECN-Capable
transport of a single CE packet, the congestion control algorithms
followed at the end-systems MUST be essentially the same as the
congestion control response to a single dropped packet”
(RFC 6679 has similar
language). Since the congestion circuit breaker responds to the same
congestion signals as a congestion control algorithm, this suggests
that it should consider ECN-CE marked packets as lost packets when
calculating the TCP throughput estimate to determine if the congestion
circuit breaker triggers. Accordingly, the -15 version of the RTP
circuit breaker draft states that “ECN-CE marked packets SHOULD
be treated as if they were lost when calculating if the
congestion-based RTP circuit breaker (Section 4.3) has been met, unless
the RTP implementation can determine that the ECN-CE marking on this
path is not reliable”.
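The sketch below shows what “treated as if they were lost” could mean when evaluating the congestion-based circuit breaker. It is a minimal sketch under stated assumptions, not the draft's normative algorithm. It assumes the TCP throughput equation from RFC 5348 with b = 1 and t_RTO = 4R, approximates the loss event rate by the fraction of packets lost or CE-marked in one reporting interval, and assumes a trigger when the sending rate exceeds ten times the estimate. The `factor` and `ce_reliable` parameters and all function names are hypothetical.

```python
import math


def tcp_throughput(s, rtt, p, b=1):
    """Estimated TCP throughput in bytes/second (RFC 5348 equation).

    s: mean packet size in bytes; rtt: round-trip time in seconds;
    p: loss event rate in [0, 1]; b: packets acknowledged per ACK.
    """
    if p <= 0:
        return math.inf  # no congestion signals, so no meaningful limit
    t_rto = 4 * rtt  # common simplification for the retransmission timeout
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * 3 * math.sqrt(3 * b * p / 8) * p * (1 + 32 * p ** 2))
    return s / denom


def congestion_fraction(expected, lost, ce_marked, ce_reliable=True):
    """Fraction of expected packets carrying a congestion signal.

    ECN-CE marked packets are counted as lost unless ECN-CE marking on
    this path has been judged unreliable (hypothetical flag).
    """
    if expected <= 0:
        return 0.0
    signals = lost + (ce_marked if ce_reliable else 0)
    return min(signals / expected, 1.0)


def circuit_breaker_triggers(send_rate, s, rtt, expected, lost, ce_marked,
                             ce_reliable=True, factor=10.0):
    """True if send_rate (bytes/s) exceeds `factor` times the TCP estimate.

    `factor` is an assumed value; consult the draft for the real condition.
    """
    p = congestion_fraction(expected, lost, ce_marked, ce_reliable)
    return send_rate > factor * tcp_throughput(s, rtt, p)
```

With 1000-byte packets, a 100 ms round-trip time and 5 of 100 packets CE-marked, the estimate is roughly 37 kB/s, so a sender at 500 kB/s would trip the breaker. Declaring the path's ECN-CE marking unreliable removes those marks from the calculation, and with no losses the breaker does not trigger.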
More recent work, however, has suggested that the response to an ECN-CE
mark ought to be less severe than the response to packet loss. For
example, Naeem Khademi's work on
TCP alternate back-off for ECN makes the argument that TCP
congestion control ought to back-off less in response to an ECN-CE
mark than to packet loss, because networks that generate ECN-CE marks
tend to use AQM schemes with much smaller buffers. For RTP congestion
control, the current versions of both
suggest responding differently to ECN-CE marked packets than to lost
packets, for quality of experience reasons, but make different
proposals for how the response ought to change. These imply that
a different circuit breaker threshold should be used for congestion
signalled by ECN-CE marks than for congestion signalled by packet
loss, but unfortunately offer no clear guidance on how the threshold
ought to be changed.
Looking further forward, there are suggestions that
forthcoming AQM proposals
might mark packets with ECN-CE in a significantly more aggressive
manner than at present, as part of a move towards using more
scalable congestion control algorithms, like
DCTCP, in a
dual-queue scenario to lower latencies.
Any such deployment would likely be incompatible with deployed TCP
implementations, so this is not a short-term issue, but it would require
significant changes to the congestion circuit breaker response
(specifically, it would require a congestion control algorithm whose
throughput is proportional to 1/p, where p is the ECN-CE marking
probability, rather than to 1/sqrt(p), where p is the loss event rate,
and equivalent changes to the circuit breaker).
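A small numerical sketch helps show why the 1/p versus 1/sqrt(p) distinction matters. It assumes the simplified classic model W = sqrt(3/(2p)) packets per round trip, and a scalable response W = C/p where the constant C = 2 is an assumption chosen for illustration only. A scalable flow running at a given window sees a marking probability far higher than a classic flow would at the same rate; feeding that marking rate into a classic-style estimate therefore predicts a much lower rate than the flow is legitimately achieving.

```python
import math

MATHIS_C = math.sqrt(1.5)  # classic model: W = sqrt(3 / (2p)) packets per RTT
SCALABLE_C = 2.0           # ASSUMED constant for a DCTCP-like W = C / p response


def classic_window(p):
    """Packets per RTT predicted by the classic 1/sqrt(p) model."""
    return MATHIS_C / math.sqrt(p)


def scalable_marking_rate(window):
    """Marking probability a scalable (1/p) flow sees at a given window."""
    return SCALABLE_C / window


def underestimate_factor(window):
    """Ratio of a scalable flow's actual window to the classic-model estimate."""
    p = scalable_marking_rate(window)
    return window / classic_window(p)
```

Under these assumptions, a scalable flow at 100 packets per round trip sees p = 0.02, for which the classic model predicts about 8.7 packets per round trip, an underestimate of roughly 11.5 times. The ratio grows as the square root of the window, so a circuit breaker that treats such marks as loss in a 1/sqrt(p) model could trigger on flows that are responding correctly.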
Given this, what should be the response of the RTP circuit breaker
to ECN-CE marks? Firstly, it's important to note that it is safe for
the circuit breaker to ignore ECN-CE marks entirely, since excessive
persistent congestion will eventually lead to packet loss that will
trigger the circuit breaker. Doing this will protect the network from
congestion collapse, but might result in sub-optimal user experience
for competing flows that share the bottleneck queue, since that queue
will be driven to overflow, inducing high latency.
If this is a concern, the only current guidance that can be given is
for implementations to treat ECN-CE marked packets as equivalent to
lost packets, whilst being aware that this might trigger the circuit
breaker prematurely in future, depending on how AQM and ECN deployment
evolves. Developers that implement a circuit breaker based on ECN-CE
marks will need to track future developments in AQM standards and
deployed ECN marking behaviour, and ensure their implementations are
updated to match.
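One way to keep this choice easy to revisit as AQM and ECN deployment evolve is to make the ECN-CE response an explicit, configurable policy rather than a hard-coded rule. The sketch below is hypothetical: `CEPolicy` and `congestion_fraction` are illustrative names, and only the two options discussed above are modelled.

```python
from enum import Enum


class CEPolicy(Enum):
    """How ECN-CE marks feed into the congestion circuit breaker (hypothetical)."""
    IGNORE = "ignore"                # safe: persistent congestion ends in loss
    TREAT_AS_LOSS = "treat_as_loss"  # current guidance when queueing delay matters


def congestion_fraction(expected, lost, ce_marked, policy=CEPolicy.TREAT_AS_LOSS):
    """Fraction of expected packets counted as congestion signals."""
    if expected <= 0:
        return 0.0
    signals = lost
    if policy is CEPolicy.TREAT_AS_LOSS:
        signals += ce_marked  # ECN-CE marks count exactly like lost packets
    return min(signals / expected, 1.0)
```

If future AQM standards call for a different response, a new policy value can be added and the default changed in one place, without touching the rest of the circuit breaker logic.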
The -16 version of the
RTP circuit breaker draft will be updated to reflect this guidance.