draft-ietf-avtcore-multiplex-guidelines-04.txt   draft-ietf-avtcore-multiplex-guidelines-05.txt 
skipping to change at page 1, line 17 skipping to change at page 1, line 17
University of Glasgow University of Glasgow
H. Alvestrand H. Alvestrand
Google Google
R. Even R. Even
H. Zheng H. Zheng
Huawei Huawei
October 30, 2017 October 30, 2017
Guidelines for using the Multiplexing Features of RTP to Support Guidelines for using the Multiplexing Features of RTP to Support
Multiple Media Streams Multiple Media Streams
draft-ietf-avtcore-multiplex-guidelines-04 draft-ietf-avtcore-multiplex-guidelines-05
Abstract Abstract
The Real-time Transport Protocol (RTP) is a flexible protocol that The Real-time Transport Protocol (RTP) is a flexible protocol that
can be used in a wide range of applications, networks, and system can be used in a wide range of applications, networks, and system
topologies. That flexibility makes for wide applicability, but can topologies. That flexibility makes for wide applicability, but can
complicate the application design process. One particular design complicate the application design process. One particular design
question that has received much attention is how to support multiple question that has received much attention is how to support multiple
media streams in RTP. This memo discusses the available options and media streams in RTP. This memo discusses the available options and
design trade-offs, and provides guidelines on how to use the design trade-offs, and provides guidelines on how to use the
skipping to change at page 7, line 43 skipping to change at page 7, line 43
An RTP Session is the highest semantic layer in the RTP protocol, and An RTP Session is the highest semantic layer in the RTP protocol, and
represents an association between a group of communicating endpoints. represents an association between a group of communicating endpoints.
The set of participants that form an RTP session is defined as those The set of participants that form an RTP session is defined as those
that share a single synchronisation source space [RFC3550]. That is, that share a single synchronisation source space [RFC3550]. That is,
if a group of participants are each aware of the synchronisation if a group of participants are each aware of the synchronisation
source identifiers belonging to the other participants, then those source identifiers belonging to the other participants, then those
participants are in a single RTP session. A participant can become participants are in a single RTP session. A participant can become
aware of a synchronisation source identifier by receiving an RTP aware of a synchronisation source identifier by receiving an RTP
packet containing it in the SSRC field or CSRC list, by receiving an packet containing it in the SSRC field or CSRC list, by receiving an
RTCP packet mentioning it in an SSRC field, or through signalling RTCP packet mentioning it in an SSRC field, or through signalling
(e.g., the SDP “a=ssrc:” attribute). Thus, the scope of an (e.g., the SDP "a=ssrc:" attribute). Thus, the scope of an RTP
RTP session is determined by the participants' network session is determined by the participants' network interconnection
interconnection topology, in combination with RTP and RTCP forwarding topology, in combination with RTP and RTCP forwarding strategies
strategies deployed by the endpoints and any middleboxes, and by the deployed by the endpoints and any middleboxes, and by the signalling.
signalling.
RTP does not contain a session identifier. Rather, it relies on the RTP does not contain a session identifier. Rather, it relies on the
underlying transport layer to separate different sessions, and on the underlying transport layer to separate different sessions, and on the
signalling to identify sessions in a manner that is meaningful to the signalling to identify sessions in a manner that is meaningful to the
application. The signalling layer might give sessions an explicit application. The signalling layer might give sessions an explicit
identifier, or their identification might be implicit based on the identifier, or their identification might be implicit based on the
addresses and ports used. Accordingly, a single RTP Session can have addresses and ports used. Accordingly, a single RTP Session can have
multiple associated identifiers, explicit and implicit, belonging to multiple associated identifiers, explicit and implicit, belonging to
different contexts. For example, when running RTP on top of UDP/IP, different contexts. For example, when running RTP on top of UDP/IP,
an RTP endpoint can identify and delimit an RTP Session from other an RTP endpoint can identify and delimit an RTP Session from other
RTP Sessions using the UDP source and destination IP addresses and RTP Sessions using the UDP source and destination IP addresses and
UDP port numbers. Another example is when using SDP grouping UDP port numbers. Another example is when using SDP grouping
framework [RFC5888] which uses an identifier per “m=”-line; if framework [RFC5888] which uses an identifier per "m="-line; if there
there is a one-to-one mapping between “m=”-lines and RTP is a one-to-one mapping between "m="-lines and RTP sessions, that
sessions, that grouping framework identifier will identify an RTP grouping framework identifier will identify an RTP Session.
Session. [I-D.ietf-mmusic-sdp-bundle-negotiation] extends the [I-D.ietf-mmusic-sdp-bundle-negotiation] extends the "m="-line for
“m=”-line for bundled media, which adds complexity to bundled media, which adds complexity to demultiplexing media stream.
demultiplexing media stream. Section 10.2 of Section 10.2 of [I-D.ietf-mmusic-sdp-bundle-negotiation] provides
[I-D.ietf-mmusic-sdp-bundle-negotiation] provides information about information about how RTP/RTCP streams are associated with SDP media
how RTP/RTCP streams are associated with SDP media description. description.
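As a rough illustration of the paragraph above, the following Python sketch keys RTP sessions on the UDP addresses and ports a packet was seen on, and keeps a separate synchronisation source space per session. The names TransportKey and SessionTable, and the labels used, are hypothetical and not taken from the draft; real endpoints also have to deal with bundled media and signalled identifiers, which this sketch ignores.

```python
from typing import Dict, NamedTuple, Set


class TransportKey(NamedTuple):
    """Implicit RTP session identifier: UDP addresses and ports (assumed layout)."""
    local_addr: str
    local_port: int
    remote_addr: str
    remote_port: int


class SessionTable:
    """Maps transport keys to an application label and a per-session SSRC space."""

    def __init__(self) -> None:
        self._labels: Dict[TransportKey, str] = {}
        self._ssrcs: Dict[TransportKey, Set[int]] = {}

    def register(self, key: TransportKey, label: str) -> None:
        # The label stands in for an explicit identifier given by signalling.
        self._labels[key] = label
        self._ssrcs.setdefault(key, set())

    def on_packet(self, key: TransportKey, ssrc: int) -> str:
        # RTP carries no session identifier, so the transport key decides.
        if key not in self._labels:
            raise KeyError("packet does not belong to a known RTP session")
        self._ssrcs[key].add(ssrc)
        return self._labels[key]

    def known_ssrcs(self, key: TransportKey) -> Set[int]:
        return set(self._ssrcs[key])
```

The same SSRC value seen on two different transport keys is recorded in two separate SSRC spaces, which reflects the point that the session, not the SSRC, is delimited by the transport.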
RTP sessions are globally unique, but their identity can only be RTP sessions are globally unique, but their identity can only be
determined by the communication context at an endpoint of the determined by the communication context at an endpoint of the
session, or by a middlebox that is aware of the session context. The session, or by a middlebox that is aware of the session context. The
relationship between RTP sessions depends on the underlying relationship between RTP sessions depends on the underlying
application, transport, and signalling protocol. The RTP protocol application, transport, and signalling protocol. The RTP protocol
makes no normative statements about the relationship between makes no normative statements about the relationship between
different RTP sessions, however the applications that use more than different RTP sessions, however the applications that use more than
one RTP session will have some higher layer understanding of the one RTP session will have some higher layer understanding of the
relationship between the sessions they create. relationship between the sessions they create.
skipping to change at page 8, line 48 skipping to change at page 8, line 47
multiple synchronisation source identifiers if it contains multiple multiple synchronisation source identifiers if it contains multiple
RTP sources (i.e., if it sends multiple media streams). Endpoints RTP sources (i.e., if it sends multiple media streams). Endpoints
that are both RTP sources and RTP sinks use the same synchronisation that are both RTP sources and RTP sinks use the same synchronisation
sources in both roles. At any given time, an RTP source has one and sources in both roles. At any given time, an RTP source has one and
only one SSRC - although that can change over the lifetime of the RTP only one SSRC - although that can change over the lifetime of the RTP
source or sink. source or sink.
The synchronisation source identifier is a 32-bit unsigned integer. The synchronisation source identifier is a 32-bit unsigned integer.
It is present in every RTP and RTCP packet header, and in the payload It is present in every RTP and RTCP packet header, and in the payload
of some RTCP packet types. It can also be present in SDP signalling. of some RTCP packet types. It can also be present in SDP signalling.
Unless pre-signalled using the SDP “a=ssrc:” attribute Unless pre-signalled using the SDP "a=ssrc:" attribute [RFC5576], the
[RFC5576], the synchronisation source identifier is chosen at random. synchronisation source identifier is chosen at random. It is not
It is not dependent on the network address of the endpoint, and is dependent on the network address of the endpoint, and is intended to
intended to be unique within an RTP session. Synchronisation source be unique within an RTP session. Synchronisation source identifier
identifier collisions can occur, and are handled as specified in collisions can occur, and are handled as specified in [RFC3550] and
[RFC3550] and [RFC5576], resulting in the synchronisation source
identifier of the affected RTP sources and/or sinks changing. An [RFC5576], resulting in the synchronisation source identifier of the
RTP source that changes its RTP Session identifier (e.g. source affected RTP sources and/or sinks changing. An RTP source that
transport address) during a session has to choose a new SSRC changes its RTP Session identifier (e.g. source transport address)
identifier to avoid being interpreted as a looped source. during a session has to choose a new SSRC identifier to avoid being
interpreted as a looped source.
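The random choice of a 32-bit synchronisation source identifier can be sketched as follows. This is a minimal Python illustration, not the procedure from [RFC3550]: the function names new_ssrc and resolve_collision are hypothetical, and the full collision handling (for example sending an RTCP BYE for the old identifier and detecting loops) is deliberately left out.

```python
import secrets
from typing import Set


def new_ssrc(in_use: Set[int]) -> int:
    """Pick a random 32-bit SSRC that is not already known in this RTP session."""
    while True:
        candidate = secrets.randbits(32)  # uniform over 0 .. 2**32 - 1
        if candidate not in in_use:
            return candidate


def resolve_collision(own_ssrc: int, in_use: Set[int]) -> int:
    """Return a replacement SSRC when own_ssrc collides with another participant.

    Simplified assumption: the colliding value is still listed in in_use,
    so the replacement can never equal it.
    """
    return new_ssrc(in_use | {own_ssrc})
```

Because the value space has 2**32 entries, the retry loop almost never runs more than once in practice; it exists only so that a known identifier is never reused.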
Synchronisation source identifiers that belong to the same Synchronisation source identifiers that belong to the same
synchronisation context (i.e., that represent media streams that can synchronisation context (i.e., that represent media streams that can
be synchronised using information in RTCP SR packets) are indicated be synchronised using information in RTCP SR packets) are indicated
by use of identical CNAME chunks in corresponding RTCP SDES packets. by use of identical CNAME chunks in corresponding RTCP SDES packets.
SDP signalling can also be used to provide explicit grouping of SDP signalling can also be used to provide explicit grouping of
synchronisation sources [RFC5576]. synchronisation sources [RFC5576].
In some cases, the same SSRC Identifier value is used to relate In some cases, the same SSRC Identifier value is used to relate
streams in two different RTP Sessions, such as in Multi-Session streams in two different RTP Sessions, such as in Multi-Session
skipping to change at page 9, line 32 skipping to change at page 9, line 32
RTP sessions. RTP sessions.
Note that RTP sequence number and RTP timestamp are scoped by the Note that RTP sequence number and RTP timestamp are scoped by the
synchronisation source. Each RTP source will have a different synchronisation source. Each RTP source will have a different
synchronisation source, and the corresponding media stream will have synchronisation source, and the corresponding media stream will have
a separate RTP sequence number and timestamp space. a separate RTP sequence number and timestamp space.
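To make the scoping concrete, the sketch below parses the 12-byte fixed RTP header defined in [RFC3550] and keeps the most recent sequence number and timestamp separately for each SSRC. The class and function names are hypothetical; CSRC lists, header extensions, and sequence number wrap-around are not handled.

```python
import struct
from typing import Dict, NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least a 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    # b0: version in the top 2 bits; b1: marker bit, then 7-bit payload type.
    return RtpHeader(b0 >> 6, b1 & 0x7F, seq, ts, ssrc)


class PerSourceState:
    """Tracks the last header seen for each SSRC, one number space per source."""

    def __init__(self) -> None:
        self.last: Dict[int, RtpHeader] = {}

    def update(self, packet: bytes) -> RtpHeader:
        header = parse_rtp_header(packet)
        self.last[header.ssrc] = header
        return header
```

Two sources may legitimately use the same sequence number at the same moment; keying the state on the SSRC keeps their number spaces apart.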
An SSRC identifier is used by different types of sources as well as An SSRC identifier is used by different types of sources as well as
sinks: sinks:
Real Media Source: Connected to a “physical” media source, Real Media Source: Connected to a "physical" media source, for
for example a camera or microphone. example a camera or microphone.
Processed Media Source: A source with some attributed property Processed Media Source: A source with some attributed property
generated by some network node, for example a filtering function generated by some network node, for example a filtering function
in an RTP mixer that provides the most active speaker based on in an RTP mixer that provides the most active speaker based on
some criteria, or a mix representing a set of other sources. some criteria, or a mix representing a set of other sources.
RTP Sink: A source that does not generate any RTP media stream in RTP Sink: A source that does not generate any RTP media stream in
itself (e.g. an endpoint or middlebox only receiving in an RTP itself (e.g. an endpoint or middlebox only receiving in an RTP
session). It still needs a sender SSRC for use as source in RTCP session). It still needs a sender SSRC for use as source in RTCP
reports. reports.
skipping to change at page 10, line 45 skipping to change at page 10, line 45
3.2.4. RTP Payload Type 3.2.4. RTP Payload Type
Each Media Stream utilises one or more RTP payload formats. An RTP Each Media Stream utilises one or more RTP payload formats. An RTP
payload format describes how the output of a particular media codec payload format describes how the output of a particular media codec
is framed and encoded into RTP packets. The payload format used is is framed and encoded into RTP packets. The payload format used is
identified by the payload type field in the RTP data packet header. identified by the payload type field in the RTP data packet header.
The combination therefore identifies a specific Media Stream encoding The combination therefore identifies a specific Media Stream encoding
format. The format definition can be taken from [RFC3551] for format. The format definition can be taken from [RFC3551] for
statically allocated payload types, but ought to be explicitly statically allocated payload types, but ought to be explicitly
defined in signalling, such as SDP, both for static and dynamic defined in signalling, such as SDP, both for static and dynamic
Payload Types. The term “format” here includes whatever can Payload Types. The term "format" here includes whatever can be
be described by out-of-band signalling means. In SDP, the term described by out-of-band signalling means. In SDP, the term "format"
“format” includes media type, RTP timestamp sampling rate, includes media type, RTP timestamp sampling rate, codec, codec
codec, codec configuration, payload format configurations, and configuration, payload format configurations, and various robustness
various robustness mechanisms such as redundant encodings [RFC2198]. mechanisms such as redundant encodings [RFC2198].
The payload type is scoped by sending endpoint within an RTP Session. The payload type is scoped by sending endpoint within an RTP Session.
All synchronisation sources sent from a single endpoint share the All synchronisation sources sent from a single endpoint share the
same payload type definitions. The RTP Payload Type is designed same payload type definitions. The RTP Payload Type is designed
such that only a single Payload Type is valid at any time instant in such that only a single Payload Type is valid at any time instant in
the RTP source's RTP timestamp time line, effectively time- the RTP source's RTP timestamp time line, effectively time-
multiplexing different Payload Types if any change occurs. The multiplexing different Payload Types if any change occurs. The
payload type used can change on a per-packet basis for an SSRC, for payload type used can change on a per-packet basis for an SSRC, for
example a speech codec making use of generic comfort noise [RFC3389]. example a speech codec making use of generic comfort noise [RFC3389].
If there is a true need to send multiple Payload Types for the same If there is a true need to send multiple Payload Types for the same
skipping to change at page 11, line 32 skipping to change at page 11, line 32
The payload type is not a multiplexing point at the RTP layer (see The payload type is not a multiplexing point at the RTP layer (see
Appendix A for a detailed discussion of why using the payload type as Appendix A for a detailed discussion of why using the payload type as
an RTP multiplexing point does not work). The RTP payload type is, an RTP multiplexing point does not work). The RTP payload type is,
however, used to determine how to render a media stream, and so can however, used to determine how to render a media stream, and so can
be viewed as selecting a rendering context. The rendering context be viewed as selecting a rendering context. The rendering context
can be defined by the signalling, and the RTP payload type number is can be defined by the signalling, and the RTP payload type number is
sometimes used to associate an RTP media stream with the signalling. sometimes used to associate an RTP media stream with the signalling.
This association is possible provided unique RTP payload type numbers This association is possible provided unique RTP payload type numbers
are used in each context. For example, an RTP media stream can be are used in each context. For example, an RTP media stream can be
associated with an SDP “m=” line by comparing the RTP payload associated with an SDP "m=" line by comparing the RTP payload type
type numbers used by the media stream with payload types signalled in numbers used by the media stream with payload types signalled in the
the “a=rtpmap:” lines in the media sections of the SDP. If "a=rtpmap:" lines in the media sections of the SDP. If RTP media
RTP media streams are being associated with signalling contexts based streams are being associated with signalling contexts based on the
on the RTP payload type, then the assignment of RTP payload type RTP payload type, then the assignment of RTP payload type numbers
numbers needs to be unique across signalling contexts; if the same needs to be unique across signalling contexts; if the same RTP
RTP payload format configuration is used in multiple contexts, then a payload format configuration is used in multiple contexts, then a
different RTP payload type number has to be assigned in each context different RTP payload type number has to be assigned in each context
to ensure uniqueness. If the RTP payload type number is not being to ensure uniqueness. If the RTP payload type number is not being
used to associate RTP media streams with a signalling context, then used to associate RTP media streams with a signalling context, then
the same RTP payload type number can be used to indicate the exact the same RTP payload type number can be used to indicate the exact
same RTP payload format configuration in multiple contexts. In case same RTP payload format configuration in multiple contexts. In case
of bundled media, Section 10.2 of of bundled media, Section 10.2 of
[I-D.ietf-mmusic-sdp-bundle-negotiation] provides more information on [I-D.ietf-mmusic-sdp-bundle-negotiation] provides more information on
SDP signalling. SDP signalling.
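The association described above can be sketched in Python as a lookup from RTP payload type number to the SDP media section that declares it. This is an illustrative sketch only: the function names are hypothetical, only "m=" and "a=rtpmap:" lines are examined, and statically allocated payload types listed without an rtpmap line are ignored.

```python
from typing import Dict, List, Optional


def payload_type_owners(sdp: str) -> Dict[int, List[int]]:
    """Map each payload type number to the indexes of the m= sections using it."""
    owners: Dict[int, List[int]] = {}
    section = -1  # index of the current m= section, -1 while at session level
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            section += 1
        elif line.startswith("a=rtpmap:") and section >= 0:
            pt = int(line[len("a=rtpmap:"):].split()[0])
            owners.setdefault(pt, []).append(section)
    return owners


def associate(payload_type: int, owners: Dict[int, List[int]]) -> Optional[int]:
    """Return the m= section index, or None if unknown or not unique."""
    sections = owners.get(payload_type, [])
    return sections[0] if len(sections) == 1 else None
```

A payload type number that appears in more than one media section yields None, which is the situation the text warns about: the association only works when payload type numbers are unique across signalling contexts.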
3.3. Issues Related to RTP Topologies 3.3. Issues Related to RTP Topologies
skipping to change at page 13, line 25 skipping to change at page 13, line 25
existing RTP session and when it is better to use multiple RTP existing RTP session and when it is better to use multiple RTP
sessions. This section tries to discuss the various considerations sessions. This section tries to discuss the various considerations
needed. needed.
3.4.1. The RTP Specification 3.4.1. The RTP Specification
RFC 3550 contains some recommendations and a bullet list with 5 RFC 3550 contains some recommendations and a bullet list with 5
arguments for different aspects of RTP multiplexing. Let's review arguments for different aspects of RTP multiplexing. Let's review
Section 5.2 of [RFC3550], reproduced below: Section 5.2 of [RFC3550], reproduced below:
“For efficient protocol processing, the number of multiplexing "For efficient protocol processing, the number of multiplexing points
points should be minimised, as described in the integrated layer should be minimised, as described in the integrated layer processing
processing design principle [ALF]. In RTP, multiplexing is provided design principle [ALF]. In RTP, multiplexing is provided by the
by the destination transport address (network address and port destination transport address (network address and port number) which
number) which is different for each RTP session. For example, in a is different for each RTP session. For example, in a teleconference
teleconference composed of audio and video media encoded separately, composed of audio and video media encoded separately, each medium
each medium SHOULD be carried in a separate RTP session with its own SHOULD be carried in a separate RTP session with its own destination
destination transport address. transport address.
Separate audio and video streams SHOULD NOT be carried in a single Separate audio and video streams SHOULD NOT be carried in a single
RTP session and demultiplexed based on the payload type or SSRC RTP session and demultiplexed based on the payload type or SSRC
fields. Interleaving packets with different RTP media types but fields. Interleaving packets with different RTP media types but
using the same SSRC would introduce several problems: using the same SSRC would introduce several problems:
1. If, say, two audio streams shared the same RTP session and the 1. If, say, two audio streams shared the same RTP session and the
same SSRC value, and one were to change encodings and thus same SSRC value, and one were to change encodings and thus
acquire a different RTP payload type, there would be no general acquire a different RTP payload type, there would be no general
way of identifying which stream had changed encodings. way of identifying which stream had changed encodings.
skipping to change at page 14, line 27 skipping to change at page 14, line 27
RTP session would avoid the first three problems but not the last RTP session would avoid the first three problems but not the last
two. two.
On the other hand, multiplexing multiple related sources of the same On the other hand, multiplexing multiple related sources of the same
medium in one RTP session using different SSRC values is the norm for medium in one RTP session using different SSRC values is the norm for
multicast sessions. The problems listed above don't apply: an RTP multicast sessions. The problems listed above don't apply: an RTP
mixer can combine multiple audio sources, for example, and the same mixer can combine multiple audio sources, for example, and the same
treatment is applicable for all of them. It might also be treatment is applicable for all of them. It might also be
appropriate to multiplex streams of the same medium using different appropriate to multiplex streams of the same medium using different
SSRC values in other scenarios where the last two problems do not SSRC values in other scenarios where the last two problems do not
apply.” apply."
Let's consider one argument at a time. The first is an argument for Let's consider one argument at a time. The first is an argument for
using a different SSRC for each individual media stream, which is very using a different SSRC for each individual media stream, which is very
applicable. applicable.
The second argument is advocating against using payload type The second argument is advocating against using payload type
multiplexing, which still stands as can be seen by the extensive multiplexing, which still stands as can be seen by the extensive
list of issues found in Appendix A. list of issues found in Appendix A.
The third argument is yet another argument against payload type The third argument is yet another argument against payload type
multiplexing. multiplexing.
The fourth is an argument against multiplexing media streams that The fourth is an argument against multiplexing media streams that
require different handling into the same session. As we saw in the require different handling into the same session. As we saw in the
discussion of RTP mixers, the RTP mixer has to embed application discussion of RTP mixers, the RTP mixer has to embed application
logic in order to handle streams anyway; the separation of streams logic in order to handle streams anyway; the separation of streams
according to stream type is just another piece of application logic, according to stream type is just another piece of application logic,
which might or might not be appropriate for a particular application. which might or might not be appropriate for a particular application.
A type of application that can mix different media sources A type of application that can mix different media sources "blindly"
“blindly” is the audio-only “telephone” bridge; most is the audio-only "telephone" bridge; most other types of application
other types of application need application-specific logic to perform need application-specific logic to perform the mix correctly.
the mix correctly.
The fifth argument discusses network aspects that we will discuss The fifth argument discusses network aspects that we will discuss
more below in Section 4.2. It also goes into aspects of more below in Section 4.2. It also goes into aspects of
implementation, like decomposed endpoints where different processes implementation, like decomposed endpoints where different processes
or inter-connected devices handle different aspects of the whole or inter-connected devices handle different aspects of the whole
multi-media session. multi-media session.
A summary of RFC 3550's view on multiplexing is to use unique SSRCs A summary of RFC 3550's view on multiplexing is to use unique SSRCs
for anything that is its own media/packet stream, and to use for anything that is its own media/packet stream, and to use
different RTP sessions for media streams that don't share a media different RTP sessions for media streams that don't share a media
skipping to change at page 15, line 28 skipping to change at page 15, line 27
consideration regarding the usage of RTP session and considers consideration regarding the usage of RTP session and considers
multiple media types in one RTP session as possible choice for the multiple media types in one RTP session as possible choice for the
RTP application designer. RTP application designer.
3.4.2. Multiple SSRCs in a Session 3.4.2. Multiple SSRCs in a Session
Using multiple SSRCs in an RTP session at one endpoint requires Using multiple SSRCs in an RTP session at one endpoint requires
resolving some unclear aspects of the RTP specification. These could resolving some unclear aspects of the RTP specification. These could
potentially lead to some interoperability issues as well as some potentially lead to some interoperability issues as well as some
potential significant inefficiencies. These are further discussed in potential significant inefficiencies. These are further discussed in
“RTP Considerations for Endpoints Sending Multiple Media "RTP Considerations for Endpoints Sending Multiple Media Streams"
Streams” [RFC8108]. An application designer needs to consider [RFC8108]. An application designer needs to consider these issues and
these issues and the impact that the availability or lack of the the impact that the availability or lack of the optimization in the
optimization in the endpoints has on their application. endpoints has on their application.
If an application is affected by the issues described, using If an application is affected by the issues described, using
Multiple RTP sessions can mitigate these issues. Multiple RTP sessions can mitigate these issues.
3.4.3. Binding Related Sources 3.4.3. Binding Related Sources
A common problem in a number of various RTP extensions has been how A common problem in a number of various RTP extensions has been how
to bind related RTP sources and their media streams together. This to bind related RTP sources and their media streams together. This
issue is common to both using additional SSRCs and Multiple RTP issue is common to both using additional SSRCs and Multiple RTP
sessions. sessions.
skipping to change at page 31, line 27 skipping to change at page 31, line 27
types needs to be handled. types needs to be handled.
h. If the applications need finer control over which session h. If the applications need finer control over which session
participants are included in different sets of security participants are included in different sets of security
associations, most key-management will have difficulties associations, most key-management will have difficulties
establishing such a session. establishing such a session.
5.5. Summary 5.5. Summary
There are some clear relations between these archetypes. Both the There are some clear relations between these archetypes. Both the
“single SSRC per RTP session” and the “multiple media "single SSRC per RTP session" and the "multiple media types in one
types in one session” are cases which require full explicit session" are cases which require full explicit signalling of the
signalling of the media stream relations. However, they operate on media stream relations. However, they operate on two different
two different levels where the first primarily enables session level levels where the first primarily enables session level binding, and
binding, and the second needs to do it all on SSRC level. From the second needs to do it all on SSRC level. From another
another perspective, the two solutions are the two extreme points perspective, the two solutions are the two extreme points when it
when it comes to number of RTP sessions needed. comes to number of RTP sessions needed.
The two other archetypes “Multiple SSRCs of the Same Media The two other archetypes "Multiple SSRCs of the Same Media Type" and
Type” and “Multiple Sessions for one Media Type” are "Multiple Sessions for one Media Type" are examples of two other
examples of two other cases that first of all allow for some cases that first of all allow for some implicit mapping of the role
implicit mapping of the role or usage of the media streams based on or usage of the media streams based on which RTP session they appear
which RTP session they appear in. It thus potentially allows for in. It thus potentially allows for less signalling and in particular
less signalling and in particular reduced need for real-time reduced need for real-time signalling in dynamic sessions. They also
signalling in dynamic sessions. They also represent points in represent points in between the first two when it comes to amount of
between the first two when it comes to amount of RTP sessions RTP sessions established, i.e. representing an attempt to reduce the
established, i.e. representing an attempt to reduce the amount of amount of sessions as much as possible without compromising the
sessions as much as possible without compromising the functionality functionality the session provides both on network level and on
the session provides both on network level and on signalling level. signalling level.
6. Summary considerations and guidelines 6. Summary considerations and guidelines
6.1. Guidelines 6.1. Guidelines
This section contains a number of recommendations for implementers or This section contains a number of recommendations for implementers or
specification writers when it comes to handling multi-stream. specification writers when it comes to handling multi-stream.
Do not Require the same SSRC across Sessions: As discussed in Do not Require the same SSRC across Sessions: As discussed in
Section 3.4.3 there exist drawbacks in using the same SSRC in Section 3.4.3 there exist drawbacks in using the same SSRC in
multiple RTP sessions as a mechanism to bind related media streams multiple RTP sessions as a mechanism to bind related media streams
 End of changes. 13 change blocks. 
73 lines changed or deleted 72 lines changed or added
