csperkins.org

Multiplexing and RTP Sessions

The following is extracted from version -02 of our RTP Requirements for RTC-Web draft, and describes some of the issues relating to RTP session multiplexing. This was discussed at IETF 81 in Quebec City. The linked set of slides on Multiplexing RTP Sessions were prepared for the RTC-Web working group session at that meeting, but were not presented to the working group: discussion in a break-out meeting covered the key points.

Expected Topologies

RTC-Web focuses on peer-to-peer connections established from clients in web browsers, so the topologies primarily considered are the following, which are discussed further in RTP Topologies [RFC5117]. They are depicted and briefly explained here for the reader's convenience.

The point to point topology (Figure 1) is going to be very common in single user to single user applications.

Figure 1: Point to Point

For small multiparty sessions it is practical to create RTP sessions by letting every participant send individual unicast RTP/UDP flows to each of the other participants (Figure 2). This is called multi-unicast and is unfortunately not discussed in the RTP Topologies RFC. This topology has the benefit of not requiring central nodes. The downside is that it increases the bandwidth used at each sender, which must send one copy of its media streams to every other participant in the session. It is therefore limited to scenarios with few end-points, unless the media is very low bandwidth.

Figure 2: Multi-Unicast

Note that, if the RTC-Web framework is to support this topology, it must be possible to connect one RTP session to multiple, individually established peer-to-peer flows.
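The sender-side cost of multi-unicast described above can be sketched with a little arithmetic. This is a minimal illustration, assuming every participant sends one stream of the same bit rate and ignoring RTCP and header overhead; the function name and the 500 kbit/s figure are hypothetical.

```python
def multi_unicast_upstream_kbps(participants: int, stream_kbps: float) -> float:
    """Upstream bit rate one sender needs in a multi-unicast session.

    Assumption: the sender transmits one copy of its stream to every
    other participant, so the cost grows linearly with the session size.
    """
    if participants < 2:
        return 0.0  # nobody else to send to
    return (participants - 1) * stream_kbps


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, "participants:", multi_unicast_upstream_kbps(n, 500.0), "kbit/s")
```

With 4 participants and a 500 kbit/s stream, each sender needs 1500 kbit/s of upstream capacity, which is why the topology only suits small sessions or very low bandwidth media.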

An RTP mixer (Figure 3) is a centralised point that selects or mixes content in a conference to optimise the RTP session, so that each end-point only needs to connect to one entity, the mixer. The mixer also reduces the bit-rate needed, as the media sent from the mixer to the end-point can be optimised in different ways. These optimisations include choosing media only from the currently most active speaker, or mixing audio together so that only one audio stream is required instead of 3 in the depicted scenario. The downside of the mixer is that someone has to provide it.

Figure 3: RTP Mixer with Only Unicast Paths

If one wants a less complex central node, it is possible to use a relay (called a Transport Translator) (Figure 4) that forwards the media to the other end-points but doesn't perform any media processing. It simply forwards the media from each end-point to all the others. Thus end-point A only needs to send its media once, to the relay, but it will still receive 3 RTP streams if B, C and D are all currently transmitting.

Figure 4: RTP Translator with Only Unicast Paths
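To compare the three multiparty topologies, the number of media streams each end-point sends and receives can be tabulated. The sketch below is only an illustration under stated assumptions: every participant transmits one stream, and the mixer delivers a single mixed or selected stream to each end-point. The names `EndpointLoad` and `endpoint_load` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointLoad:
    sent: int      # streams the end-point transmits
    received: int  # streams the end-point receives


def endpoint_load(topology: str, participants: int) -> EndpointLoad:
    """Per-end-point stream counts, assuming all participants transmit."""
    others = participants - 1
    if topology == "multi-unicast":
        # One copy sent to, and one stream received from, every other peer.
        return EndpointLoad(sent=others, received=others)
    if topology == "mixer":
        # Assumption: the mixer returns a single mixed or selected stream.
        return EndpointLoad(sent=1, received=1)
    if topology == "relay":
        # The relay forwards every other participant's stream unmodified.
        return EndpointLoad(sent=1, received=others)
    raise ValueError(f"unknown topology: {topology}")


if __name__ == "__main__":
    for name in ("multi-unicast", "mixer", "relay"):
        print(name, endpoint_load(name, 4))
```

For the four end-points A, B, C and D, the relay case gives one stream sent and three received, matching the description of the Transport Translator.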

To support a legacy end-point (B) that doesn't fulfil the requirements of RTC-Web, it is possible to insert a Translator (Figure 5) that ensures that, from A's perspective, B looks like a fully compliant end-point. Thus it is the combination of the Translator and B that looks like the end-point B. The intention is that the presence of the translator is transparent to A; however, it is not certain that this is possible. This case is therefore included so that it can be discussed whether any mechanism specified for use in RTC-Web results in such issues, and how to handle them.

Figure 5: RTP Translator Towards Legacy End-Point

RTP Multiplexing Points

There are three fundamental points of multiplexing within the RTP framework:

These multiplexing points are a fundamental part of the design of RTP and are discussed in Section 5.2 of [RFC3550]. Of special importance is the need to separate different RTP sessions using a multiplexing mechanism at some layer lower than RTP, rather than trying to combine several RTP sessions implicitly into one lower layer flow. This is discussed further in the next section.

RTP Session Multiplexing

In today's networks, with prolific use of Network Address Translators (NAT) and Firewalls (FW), there is a desire to reduce the number of transport layer ports used by a real-time media application using RTP. This has led some to suggest multiplexing two or more RTP sessions on a single transport layer flow, using either the Payload Type or SSRC to demultiplex the sessions, in violation of the rules outlined above. This is not the first time that people have looked at RTP and questioned the need to use separate RTP sessions for different media types, and even more the potential need to separate media streams of the same type into different sessions because of their different purposes. Section 5.2 of [RFC3550] outlines some of the resulting problems; we elaborate on that discussion, and on other problems that occur if one violates this part of the RTP design and architecture.
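To make the Payload Type and SSRC fields mentioned above concrete, the following sketch parses the 12-byte fixed RTP header defined in [RFC3550]. The function name and the sample packet values are hypothetical. Note that the header carries no field identifying the RTP session, which is why a receiver normally tells sessions apart by the transport flow a packet arrived on.

```python
import struct


def parse_rtp_fixed_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header (no CSRC list, no extension)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet has at least a 12-byte fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,           # 2 bits
        "padding": bool(b0 & 0x20),   # 1 bit
        "extension": bool(b0 & 0x10), # 1 bit
        "csrc_count": b0 & 0x0F,      # 4 bits
        "marker": bool(b1 & 0x80),    # 1 bit
        "payload_type": b1 & 0x7F,    # 7 bits
        "sequence_number": seq,       # 16 bits
        "timestamp": timestamp,       # 32 bits
        "ssrc": ssrc,                 # 32 bits
    }


if __name__ == "__main__":
    # Hypothetical packet: version 2, payload type 96, made-up SSRC.
    sample = struct.pack("!BBHII", 0x80, 96, 1, 160, 0x1234ABCD)
    header = parse_rtp_fixed_header(sample)
    print(header["payload_type"], hex(header["ssrc"]))
```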

Why RTP Sessions Should be Demultiplexed by the Transport

As discussed in Section 5.2 of [RFC3550], multiplexing several RTP sessions (e.g., audio and video) onto a single transport layer flow introduces the following problems:

We do note that some of the above issues are resolved as long as there is explicit separation of the RTP sessions when they are transported over the same lower layer transport, for example by inserting a multiplexing layer between the lower transport and the RTP/RTCP headers. However, a number of the above issues are not resolved by this.

In the RTCWEB context, i.e., web browsers running on various end-points, it might appear unlikely that flow-based QoS is available on the end-points that will support RTCWEB. We agree that users in the common cases, in their home network or at WiFi hotspots, are unlikely to have flow-based QoS available. However, for enterprise users, especially those using intranet applications, the availability of QoS and the desire to use it are not implausible. There are also web users on networks that are more resource-constrained than wired and WiFi networks, for example cellular networks. The current access network QoS mechanisms for user traffic in 3GPP cellular technology are flow based.

RTP's design hasn't been changed, although topics related to session multiplexing have been discussed at various points in RTP's 20 year history. The fact is that numerous RTP mechanisms and extensions have been defined on the assumption that one can perform session multiplexing when needed. Mechanisms that have been identified as problematic if one doesn't do session separation are:

As can be seen, the requirement that separate RTP sessions are carried in separate transport-layer flows is fundamental to the design of RTP. Because of this design principle, implementors of services and applications using RTP have not commonly violated this model, and have separated RTP sessions onto different transport layer flows. After 15 years of deployment of RTP in its current form, any move to change this assumption must carefully consider the backwards compatibility problems that it will cause. In particular, since widespread use of multiplexed RTP sessions in RTC-Web will almost certainly lead to their use in other scenarios, the discussion regarding compatibility must be wider than just whether multiplexing works for the extremely limited subset of RTP use cases currently being considered in the RTC-Web group. Any such multiplexing extension to RTP must therefore be developed by the AVTCORE working group, since it has much broader applicability and scope than RTC-Web.

Arguments for a single transport flow

The arguments we are aware of for using a single underlying transport (e.g., UDP) flow for all media, rather than one flow for each type of media, are the following:

Summary

As we have noted in the preceding sections, implicit multiplexing of multiple RTP sessions onto a single transport flow raises a large number of backwards compatibility issues. It has been argued that these issues are either not important, since the RTP features disrupted are not of interest to the current set of RTC-Web use cases, or can be solved by somehow explicitly dividing the SSRC space into different regions for different RTP sessions. We believe the first argument is short-sighted: those RTP features may not be important today, but the successful deployment of simple RTC-Web applications will generate interest in trying more advanced scenarios, which may well need those features. Partitioning the SSRC space to separate RTP sessions results in a new set of issues, of which the biggest, from our point of view, is that it effectively creates a new variant of the RTP protocol that is incompatible with standard RTP. Having two different variants of the core functionality of RTP will make it much more difficult to develop future protocol extensions, and the new variant will likely also have a different set of extensions that work. In addition, the two versions aren't directly interoperable, and will force anyone who wants to interconnect them to deploy (complex) gateways. It also reduces the common user base and the interest in maintaining and developing either version.

On the other hand, we are sympathetic to the arguments that using a single transport flow saves some time in setup processing, saves some resources on NATs and FWs between the communicating end-points, and may have a somewhat higher success rate of session establishment.

Thus we consider it required that RTP sessions are multiplexed using an explicit mechanism. We strongly recommend, on grounds of simplicity and interoperability, that this mechanism be the use of a unique UDP flow for each RTP session. However, we can accept a WG consensus that using a single transport layer flow between peers is the default, with the use of separate UDP flows also supported as a fallback, under one constraint: that the RTP sessions are explicitly multiplexed in such a way that existing mechanisms or extensions to RTP are not prevented from working, and that the solution does not create an alternative variant of RTP (i.e., it must not disrupt RTCP processing or the RTP semantics). In this latter case we recommend that some type of multiplexing layer is inserted between the UDP flow and the RTP/RTCP headers to separate the RTP sessions, since removing this shim layer and gatewaying to standard RTP sessions is simpler than trying to separate RTP sessions that have been implicitly multiplexed together in order to gateway them to standard RTP sessions.
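The idea of an explicit multiplexing layer between the UDP flow and the RTP/RTCP headers can be sketched as follows. This is purely illustrative: the one-byte session identifier is an assumed, hypothetical shim format, not one defined by the draft or by any RFC, and the names `mux` and `demux` are invented for this example.

```python
import struct
from typing import Tuple

# Hypothetical shim: a single byte naming the RTP session, placed in
# front of the unmodified RTP or RTCP packet inside each UDP datagram.
SHIM = struct.Struct("!B")


def mux(session_id: int, packet: bytes) -> bytes:
    """Prefix an RTP/RTCP packet with the (assumed) session identifier."""
    if not 0 <= session_id <= 255:
        raise ValueError("session_id must fit in one byte")
    return SHIM.pack(session_id) + packet


def demux(datagram: bytes) -> Tuple[int, bytes]:
    """Split a datagram into its session identifier and the original packet."""
    if len(datagram) < SHIM.size:
        raise ValueError("datagram too short to carry the shim")
    (session_id,) = SHIM.unpack_from(datagram)
    return session_id, datagram[SHIM.size:]


if __name__ == "__main__":
    audio_packet = b"\x80\x60" + bytes(10)  # placeholder 12-byte RTP header
    session_id, packet = demux(mux(1, audio_packet))
    print(session_id, packet == audio_packet)
```

Because the RTP and RTCP packets pass through untouched, a gateway to standard RTP sessions in this sketch only has to strip the shim and forward each packet on the UDP flow for its session.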