draft-ietf-avtcore-multiplex-guidelines-00.txt   draft-ietf-avtcore-multiplex-guidelines-01.txt 
Network Working Group M. Westerlund Network Working Group M. Westerlund
Internet-Draft B. Burman Internet-Draft B. Burman
Intended status: Informational Ericsson Intended status: Informational Ericsson
Expires: October 24, 2013 C. Perkins Expires: January 16, 2014 C. Perkins
University of Glasgow University of Glasgow
H. Alvestrand H. Alvestrand
Google Google
April 22, 2013 July 15, 2013
Guidelines for using the Multiplexing Features of RTP Guidelines for using the Multiplexing Features of RTP to Support
draft-ietf-avtcore-multiplex-guidelines-00 Multiple Media Streams
draft-ietf-avtcore-multiplex-guidelines-01
Abstract Abstract
Real-time Transport Protocol (RTP) is a flexible protocol possible to The Real-time Transport Protocol (RTP) is a flexible protocol that
use in a wide range of applications and network and system can be used in a wide range of applications, networks, and system
topologies. This flexibility and the implications of different topologies. That flexibility makes for wide applicability, but can
choices should be understood by any application developer using RTP. complicate the application design process. One particular design
To facilitate that understanding, this document contains an in-depth question that has received much attention is how to support multiple
discussion of the usage of RTP's multiplexing points; the RTP session media streams in RTP. This memo discusses the available options and
and the Synchronisation Source Identifier (SSRC). The document tries design trade-offs, and provides guidelines on how to use the
to give guidance and source material for an analysis on the most multiplexing features of RTP to support multiple media streams.
suitable choices for the application being designed.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 24, 2013. This Internet-Draft will expire on January 16, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 5 2.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4
2.2. Subjects Out of Scope . . . . . . . . . . . . . . . . . . 6 2.2. Subjects Out of Scope . . . . . . . . . . . . . . . . . . 6
3. RTP Concepts . . . . . . . . . . . . . . . . . . . . . . . . 7 3. Reasons for Multiplexing and Grouping RTP Media Streams . . . 6
3.1. Session . . . . . . . . . . . . . . . . . . . . . . . . . 7 4. RTP Multiplexing Points . . . . . . . . . . . . . . . . . . . 7
3.2. SSRC . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4.1. RTP Session . . . . . . . . . . . . . . . . . . . . . . . 8
3.3. CSRC . . . . . . . . . . . . . . . . . . . . . . . . . . 9 4.2. Synchronisation Source (SSRC) . . . . . . . . . . . . . . 9
3.4. Payload Type . . . . . . . . . . . . . . . . . . . . . . 10 4.3. Contributing Source (CSRC) . . . . . . . . . . . . . . . 10
4. Multiple Streams Alternatives . . . . . . . . . . . . . . . . 11 4.4. RTP Payload Type . . . . . . . . . . . . . . . . . . . . 11
5. RTP Topologies and Issues . . . . . . . . . . . . . . . . . . 12 5. RTP Topologies and Issues . . . . . . . . . . . . . . . . . . 12
5.1. Point to Point . . . . . . . . . . . . . . . . . . . . . 13 5.1. Point to Point . . . . . . . . . . . . . . . . . . . . . 12
5.2. Translators & Gateways . . . . . . . . . . . . . . . . . 13 5.2. Translators & Gateways . . . . . . . . . . . . . . . . . 13
5.3. Point to Multipoint Using Multicast . . . . . . . . . . . 14 5.3. Point to Multipoint Using Multicast . . . . . . . . . . . 13
5.4. Point to Multipoint Using an RTP Transport Translator . . 15 5.4. Point to Multipoint Using an RTP Transport Translator . . 14
5.5. Point to Multipoint Using an RTP Mixer . . . . . . . . . 15 5.5. Point to Multipoint Using an RTP Mixer . . . . . . . . . 15
6. Multiple Streams Discussion . . . . . . . . . . . . . . . . . 16 6. RTP Multiplexing: When to Use Multiple RTP Sessions . . . . . 15
6.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 16 6.1. RTP and RTCP Protocol Considerations . . . . . . . . . . 16
6.2. RTP/RTCP Aspects . . . . . . . . . . . . . . . . . . . . 16 6.1.1. The RTP Specification . . . . . . . . . . . . . . . . 16
6.2.1. The RTP Specification . . . . . . . . . . . . . . . . 16 6.1.2. Multiple SSRCs in a Session . . . . . . . . . . . . . 18
6.2.2. Multiple SSRCs in a Session . . . . . . . . . . . . . 19 6.1.3. Handling Varying Sets of Senders . . . . . . . . . . 19
6.2.3. Handling Varying Sets of Senders . . . . . . . . . . 19 6.1.4. Cross Session RTCP Requests . . . . . . . . . . . . . 19
6.2.4. Cross Session RTCP Requests . . . . . . . . . . . . . 19 6.1.5. Binding Related Sources . . . . . . . . . . . . . . . 19
6.2.5. Binding Related Sources . . . . . . . . . . . . . . . 20 6.1.6. Forward Error Correction . . . . . . . . . . . . . . 21
6.2.6. Forward Error Correction . . . . . . . . . . . . . . 21 6.1.7. Transport Translator Sessions . . . . . . . . . . . . 21
6.2.7. Transport Translator Sessions . . . . . . . . . . . . 22 6.2. Interworking Considerations . . . . . . . . . . . . . . . 21
6.3. Interworking . . . . . . . . . . . . . . . . . . . . . . 22 6.2.1. Types of Interworking . . . . . . . . . . . . . . . . 22
6.3.1. Types of Interworking . . . . . . . . . . . . . . . . 22 6.2.2. RTP Translator Interworking . . . . . . . . . . . . . 22
6.3.2. RTP Translator Interworking . . . . . . . . . . . . . 22 6.2.3. Gateway Interworking . . . . . . . . . . . . . . . . 22
6.3.3. Gateway Interworking . . . . . . . . . . . . . . . . 23 6.2.4. Multiple SSRC Legacy Considerations . . . . . . . . . 23
6.3.4. Multiple SSRC Legacy Considerations . . . . . . . . . 24 6.3. Network Considerations . . . . . . . . . . . . . . . . . 24
6.4. Network Aspects . . . . . . . . . . . . . . . . . . . . . 24 6.3.1. Quality of Service . . . . . . . . . . . . . . . . . 24
6.4.1. Quality of Service . . . . . . . . . . . . . . . . . 25 6.3.2. NAT and Firewall Traversal . . . . . . . . . . . . . 25
6.4.2. NAT and Firewall Traversal . . . . . . . . . . . . . 25 6.3.3. Multicast . . . . . . . . . . . . . . . . . . . . . . 26
6.4.3. Multicast . . . . . . . . . . . . . . . . . . . . . . 27 6.3.4. Multiplexing multiple RTP Session on a Single
6.4.4. Multiplexing multiple RTP Session on a Single
Transport . . . . . . . . . . . . . . . . . . . . . . 27 Transport . . . . . . . . . . . . . . . . . . . . . . 27
6.5. Security Aspects . . . . . . . . . . . . . . . . . . . . 28 6.4. Security and Key Management Considerations . . . . . . . 27
6.5.1. Security Context Scope . . . . . . . . . . . . . . . 28 6.4.1. Security Context Scope . . . . . . . . . . . . . . . 27
6.5.2. Key Management for Multi-party session . . . . . . . 29 6.4.2. Key Management for Multi-party session . . . . . . . 28
6.5.3. Complexity Implications . . . . . . . . . . . . . . . 29 6.4.3. Complexity Implications . . . . . . . . . . . . . . . 28
7. Arch-Types . . . . . . . . . . . . . . . . . . . . . . . . . 29 7. Archetypes . . . . . . . . . . . . . . . . . . . . . . . . . 29
7.1. Single SSRC per Session . . . . . . . . . . . . . . . . . 29 7.1. Single SSRC per Session . . . . . . . . . . . . . . . . . 29
7.2. Multiple SSRCs of the Same Media Type . . . . . . . . . . 31 7.2. Multiple SSRCs of the Same Media Type . . . . . . . . . . 31
7.3. Multiple Sessions for one Media type . . . . . . . . . . 32 7.3. Multiple Sessions for one Media type . . . . . . . . . . 32
7.4. Multiple Media Types in one Session . . . . . . . . . . . 34 7.4. Multiple Media Types in one Session . . . . . . . . . . . 34
7.5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 35 7.5. Summary . . . . . . . . . . . . . . . . . . . . . . . . . 35
8. Summary considerations and guidelines . . . . . . . . . . . . 36 8. Summary considerations and guidelines . . . . . . . . . . . . 35
8.1. Guidelines . . . . . . . . . . . . . . . . . . . . . . . 36 8.1. Guidelines . . . . . . . . . . . . . . . . . . . . . . . 35
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 36
10. Security Considerations . . . . . . . . . . . . . . . . . . . 37 10. Security Considerations . . . . . . . . . . . . . . . . . . . 37
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 37 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 37
11.1. Normative References . . . . . . . . . . . . . . . . . . 37 11.1. Normative References . . . . . . . . . . . . . . . . . . 37
11.2. Informative References . . . . . . . . . . . . . . . . . 37 11.2. Informative References . . . . . . . . . . . . . . . . . 37
Appendix A. Dismissing Payload Type Multiplexing . . . . . . . . 41 Appendix A. Dismissing Payload Type Multiplexing . . . . . . . . 41
Appendix B. Proposals for Future Work . . . . . . . . . . . . . 43 Appendix B. Proposals for Future Work . . . . . . . . . . . . . 43
Appendix C. RTP Specification Clarifications . . . . . . . . . . 43 Appendix C. Signalling considerations . . . . . . . . . . . . . 43
C.1. RTCP Reporting from all SSRCs . . . . . . . . . . . . . . 44 C.1. Signalling Aspects . . . . . . . . . . . . . . . . . . . 43
C.2. RTCP Self-reporting . . . . . . . . . . . . . . . . . . . 44 C.1.1. Session Oriented Properties . . . . . . . . . . . . . 44
C.3. Combined RTCP Packets . . . . . . . . . . . . . . . . . . 44 C.1.2. SDP Prevents Multiple Media Types . . . . . . . . . . 45
Appendix D. Signalling considerations . . . . . . . . . . . . . 45 C.1.3. Signalling Media Stream Usage . . . . . . . . . . . . 45
D.1. Signalling Aspects . . . . . . . . . . . . . . . . . . . 45 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 45
D.1.1. Session Oriented Properties . . . . . . . . . . . . . 45
D.1.2. SDP Prevents Multiple Media Types . . . . . . . . . . 46
D.1.3. Signalling Media Stream Usage . . . . . . . . . . . . 46
Appendix E. Changes from -01 to -02 . . . . . . . . . . . . . . 47
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 48
1. Introduction 1. Introduction
Real-time Transport Protocol (RTP) [RFC3550] is a commonly used The Real-time Transport Protocol (RTP) [RFC3550] is a commonly used
protocol for real-time media transport. It is a protocol that protocol for real-time media transport. It is a protocol that
provides great flexibility and can support a large set of different provides great flexibility and can support a large set of different
applications. RTP has several multiplexing points designed for applications. RTP has several multiplexing points designed for
different purposes. These enable support of multiple media streams different purposes. These enable support of multiple media streams
and switching between different encoding or packetization of the and switching between different encoding or packetization of the
media. By using multiple RTP sessions, sets of media streams can be media. By using multiple RTP sessions, sets of media streams can be
structured for efficient processing or identification. Thus the structured for efficient processing or identification. Thus the
question for any RTP application designer is how to best use the RTP question for any RTP application designer is how to best use the RTP
session, the SSRC and the payload type to meet the application's session, the SSRC and the payload type to meet the application's
needs. needs.
The purpose of this document is to provide clear information about The purpose of this document is to provide clear information about
the possibilities of RTP when it comes to multiplexing. The RTP the possibilities of RTP when it comes to multiplexing. The RTP
application designer should understand the implications that come application designer needs to understand the implications that come
from a particular usage of the RTP multiplexing points. The document from a particular usage of the RTP multiplexing points. The document
will recommend against some usages as being unsuitable, in general or will recommend against some usages as being unsuitable, in general or
for particular purposes. for particular purposes.
RTP was from the beginning designed for multiple participants in a RTP was from the beginning designed for multiple participants in a
communication session. This is not restricted to multicast, as some communication session. This is not restricted to multicast, as some
may believe, but also provides functionality over unicast, using believe, but also provides functionality over unicast, using either
either multiple transport flows below RTP or a network node that re- multiple transport flows below RTP or a network node that re-
distributes the RTP packets. The re-distributing node can for distributes the RTP packets. The re-distributing node can for
example be a transport translator (relay) that forwards the packets example be a transport translator (relay) that forwards the packets
unchanged, a translator performing media or protocol translation in unchanged, a translator performing media or protocol translation in
addition to forwarding, or an RTP mixer that creates new conceptual addition to forwarding, or an RTP mixer that creates new sources from
sources from the received streams. In addition, multiple streams may the received streams. In addition, multiple streams can occur when a
occur when a single endpoint has multiple media sources, like single endpoint has multiple media sources, like multiple cameras or
multiple cameras or microphones that need to be sent simultaneously. microphones that need to be sent simultaneously.
This document has been written due to increased interest in more This document has been written due to increased interest in more
advanced usage of RTP, resulting in questions regarding the most advanced usage of RTP, resulting in questions regarding the most
appropriate RTP usage. The limitations in some implementations, RTP/ appropriate RTP usage. The limitations in some implementations, RTP/
RTCP extensions, and signalling have also been exposed. It is RTCP extensions, and signalling have also been exposed. It is
expected that some limitations will be addressed by updates or new expected that some limitations will be addressed by updates or new
extensions resolving the shortcomings. The authors also hope that extensions resolving the shortcomings. The authors also hope that
clarification on the usefulness of some functionalities in RTP will clarification on the usefulness of some functionalities in RTP will
result in more complete implementations in the future. result in more complete implementations in the future.
The document starts with some definitions and then goes into the The document starts with some definitions and then goes into the
existing RTP functionalities around multiplexing. Both the desired existing RTP functionalities around multiplexing. Both the desired
behaviour and the implications of a particular behaviour depend on behaviour and the implications of a particular behaviour depend on
which topologies are used, which requires some consideration. This which topologies are used, which requires some consideration. This
is followed by a discussion of some choices in multiplexing behaviour is followed by a discussion of some choices in multiplexing behaviour
and their impacts. Some arch-types of RTP usage are discussed. and their impacts. Some archetypes of RTP usage are discussed.
Finally, some recommendations and examples are provided. Finally, some recommendations and examples are provided.
This document is currently an individual contribution, but it is the
intention of the authors that this should become a WG document that
objectively describes and provides suitable recommendations for which
there is WG consensus. Currently this document only represents the
views of the authors. The authors gladly accept any feedback on the
document and will be happy to discuss suitable recommendations.
2. Definitions 2. Definitions
2.1. Terminology 2.1. Terminology
The following terms and abbreviations are used in this document: The following terms and abbreviations are used in this document:
Endpoint: A single entity sending or receiving RTP packets. It may Endpoint: A single entity sending or receiving RTP packets. It can
be decomposed into several functional blocks, but as long as it be decomposed into several functional blocks, but as long as it
behaves as a single RTP stack entity it is classified as a single behaves as a single RTP stack entity it is classified as a single
endpoint. endpoint.
Multiparty: A communication situation including multiple end-points. Multiparty: A communication situation including multiple endpoints.
In this document it will be used to refer to situations where more In this document it will be used to refer to situations where more
than two end-points communicate. than two endpoints communicate.
Media Source: The source of a stream of data of one Media Type. It Media Source: The source of a stream of data of one Media Type. It
can either be a single media capturing device such as a video can either be a single media capturing device such as a video
camera, a microphone, or a specific output of a media production camera, a microphone, or a specific output of a media production
function, such as an audio mixer, or some video editing function. function, such as an audio mixer, or some video editing function.
Sending data from a Media Source may cause multiple RTP sources to
Sending data from a Media Source can cause multiple RTP sources to
send multiple Media Streams. send multiple Media Streams.
Media Stream: A sequence of RTP packets using a single SSRC that Media Stream: A sequence of RTP packets using a single SSRC that
together carries part or all of the content of a specific Media together carries part or all of the content of a specific Media
Type from a specific sender source within a given RTP session. Type from a specific sender source within a given RTP session.
RTP Source: The originator or source of a particular Media Stream. RTP Source: The originator or source of a particular Media Stream.
Identified using an SSRC in a particular RTP session. An RTP Identified using an SSRC in a particular RTP session. An RTP
source is the source of a single media stream, and is associated source is the source of a single media stream, and is associated
with a single endpoint and a single Media Source. An RTP Source with a single endpoint and a single Media Source. An RTP Source
is just called a Source in RFC 3550. is just called a Source in RFC 3550.
Media Sink: A recipient of a Media Stream. The endpoint sinking RTP Sink: A recipient of a Media Stream. The Media Sink is
media are Identified using one or more SSRCs. There may be more identified using one or more SSRCs. There can be more than one
than one Media Sink for one RTP source. RTP Sink for one RTP source.
CNAME: "Canonical name" - identifier associated with one or more RTP CNAME: "Canonical name" - identifier associated with one or more RTP
sources from a single endpoint. Defined in the RTP specification sources from a single endpoint. Defined in the RTP specification
[RFC3550]. A CNAME identifies a synchronisation context. A CNAME [RFC3550]. A CNAME identifies a synchronisation context. A CNAME
is associated with a single endpoint, although some RTP nodes will is associated with a single endpoint, although some RTP nodes will
use an end-point's CNAME on that end-point's behalf. An endpoint use an endpoint's CNAME on that endpoint's behalf. An endpoint can
may use multiple CNAMEs. A CNAME is intended to be globally use multiple CNAMEs. A CNAME is intended to be globally unique
unique and stable for the full duration of a communication and stable for the full duration of a communication session.
session. [RFC6222][I-D.ietf-avtcore-6222bis] gives updated [RFC6222][I-D.ietf-avtcore-6222bis] gives updated guidelines for
guidelines for choosing CNAMEs. choosing CNAMEs.
Media Type: Audio, video, text or data whose form and meaning are Media Type: Audio, video, text or data whose form and meaning are
defined by a specific real-time application. defined by a specific real-time application.
Multiplex: The operation of taking multiple entities as input, Multiplexing: The operation of taking multiple entities as input,
aggregating them onto some common resource while keeping the aggregating them onto some common resource while keeping the
individual entities addressable such that they can later be fully individual entities addressable such that they can later be fully
and unambiguously separated (de-multiplexed) again. and unambiguously separated (de-multiplexed) again.
RTP Session: As defined by [RFC3550], the endpoints belonging to the RTP Session: As defined by [RFC3550], the endpoints belonging to the
same RTP Session are those that share a single SSRC space. That same RTP Session are those that share a single SSRC space. That
is, those endpoints can see an SSRC identifier transmitted by any is, those endpoints can see an SSRC identifier transmitted by any
one of the other endpoints. An endpoint can receive an SSRC one of the other endpoints. An endpoint can receive an SSRC
either as SSRC or as CSRC in RTP and RTCP packets. Thus, the RTP either as SSRC or as CSRC in RTP and RTCP packets. Thus, the RTP
Session scope is decided by the endpoints' network interconnection Session scope is decided by the endpoints' network interconnection
topology, in combination with RTP and RTCP forwarding strategies topology, in combination with RTP and RTCP forwarding strategies
deployed by endpoints and any interconnecting middle nodes. deployed by endpoints and any interconnecting middle nodes.
RTP Session Group: One or more RTP sessions that are used together RTP Session Group: One or more RTP sessions that are used together
to perform some function. Examples are multiple RTP sessions used to perform some function. Examples are multiple RTP sessions used
to carry different layers of a layered encoding. In an RTP to carry different layers of a layered encoding. In an RTP
Session Group, CNAMEs are assumed to be valid across all RTP Session Group, CNAMEs are assumed to be valid across all RTP
sessions, and designate synchronisation contexts that can cross sessions, and designate synchronisation contexts that can cross
RTP sessions. RTP sessions.
Source: Term that should not be used alone. An RTP Source, as Source: Term that ought not be used alone. An RTP Source, as
identified by its SSRC, is the source of a single Media Stream; a identified by its SSRC, is the source of a single Media Stream; a
Media Source can be the source of multiple Media Streams. Media Source can be the source of multiple Media Streams.
SSRC: An RTP 32-bit unsigned integer used as identifier for a RTP SSRC: A 32-bit unsigned integer used as identifier for a RTP Source.
Source.
CSRC: Contributing Source. An SSRC identifier used in a context, like CSRC: Contributing Source. An SSRC identifier used in a context, like
the RTP header's CSRC list, where it is clear that the Media Source the RTP header's CSRC list, where it is clear that the Media Source
is not the source of the media stream, instead only a contributor is not the source of the media stream, instead only a contributor
to the Media Stream. to the Media Stream.
Signalling: The process of configuring endpoints to participate in Signalling: The process of configuring endpoints to participate in
one or more RTP sessions. one or more RTP sessions.
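As an illustration of the identifiers defined above, the following minimal sketch reads the payload type, SSRC, and CSRC list from the fixed RTP header, assuming the header layout of [RFC3550]. The names RtpHeader and parse_rtp_header are hypothetical and chosen only for this example; header extensions and padding are ignored.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    """Illustrative container for the fixed RTP header fields."""
    version: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header and the CSRC list that follows it."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack_from("!BBHII", packet, 0)
    version = b0 >> 6
    if version != 2:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = b0 & 0x0F  # CC field: number of 32-bit CSRC entries
    if len(packet) < 12 + 4 * csrc_count:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = tuple(
        struct.unpack_from("!I", packet, 12 + 4 * i)[0]
        for i in range(csrc_count)
    )
    return RtpHeader(
        version=version,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
        csrcs=csrcs,
    )
```

A receiver would typically key its per-stream state on the ssrc field, and treat any values in csrcs as contributors rather than as the source of the Media Stream.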
2.2. Subjects Out of Scope 2.2. Subjects Out of Scope
skipping to change at page 7, line 4 skipping to change at page 6, line 33
This document is focused on issues that affect RTP. Thus, issues This document is focused on issues that affect RTP. Thus, issues
that involve signalling protocols, such as whether SIP, Jingle or that involve signalling protocols, such as whether SIP, Jingle or
some other protocol is in use for session configuration, the some other protocol is in use for session configuration, the
particular syntaxes used to define RTP session properties, or the particular syntaxes used to define RTP session properties, or the
constraints imposed by particular choices in the signalling constraints imposed by particular choices in the signalling
protocols, are mentioned only as examples in order to describe the protocols, are mentioned only as examples in order to describe the
RTP issues more precisely. RTP issues more precisely.
This document assumes the applications will use RTCP. While there This document assumes the applications will use RTCP. While there
are such applications that don't send RTCP, they do not conform to are such applications that don't send RTCP, they do not conform to
the RTP specification, and thus should be regarded as reusing the RTP the RTP specification, and thus can be regarded as reusing the RTP
packet format, not as implementing the RTP protocol. packet format but not implementing the RTP protocol.
3. RTP Concepts 3. Reasons for Multiplexing and Grouping RTP Media Streams
This section describes the existing RTP tools that are particularly The reasons why an endpoint might choose to send multiple media
important when discussing multiplexing of different media streams. streams are widespread. In the below discussion, please keep in mind
that the reasons for having multiple media streams vary and include
but are not limited to the following:
3.1. Session o Multiple Media Sources
The RTP Session is the highest semantic level in RTP and contains all o Multiple Media Streams might be needed to represent one Media
of the RTP functionality. RTP itself has no normative statements Source (for instance when using layered encodings)
about the relationship between different RTP sessions.
Identifier: RTP in itself does not contain any Session identifier, o A Retransmission stream might repeat the content of another Media
but relies either on the underlying transport or on the used Stream
signalling protocol, depending on in which context the identifier o An FEC stream might provide material that can be used to repair
is used (e.g. transport or signalling). Due to this, a single another Media Stream
RTP Session may have multiple associated identifiers belonging to
different contexts.
Position: Depending on underlying transport and signalling o Alternative Encodings, for instance different codecs for the same
protocol. For example, when running RTP on top of UDP, an audio stream
RTP endpoint can identify and delimit an RTP Session from
other RTP Sessions through the UDP source and destination
transport address, consisting of network address and port
number(s). Commonly, RTP and RTCP use separate ports and
the destination transport address is in fact an address
pair, but in the case of RTP/RTCP multiplex [RFC5761] there
is only a single port. Another example is SDP signalling
[RFC4566], where the grouping framework [RFC5888] uses an
identifier per "m="-line. If there is a one-to-one mapping
between "m="-line and RTP Session, that grouping framework
identifier can identify a single RTP Session.
Usage: Identify separate RTP Sessions. o Alternative formats, for instance multiple resolutions of the same
video stream
Uniqueness: Globally unique, but identity can only be detected by For each of these, it is necessary to decide if each additional media
the general communication context for the specific endpoint. stream gets its own SSRC multiplexed within a RTP Session, or if it
is necessary to use additional RTP sessions to group the media
streams. The choice between these made due to one reason might not
be the choice suitable for another reason. In the above list, the
different items have different levels of maturity in the discussion
on how to solve them. The clearest understanding is associated with
multiple media sources of the same media type. However, all warrant
discussion and clarification on how to deal with them. As the
discussion below will show, in reality we cannot choose a single one
of the two solutions. To utilise RTP well and as efficiently as
possible, both are needed. The real issue is finding the right
guidance on when to create RTP sessions and when additional SSRCs in
an RTP session are the right choice.
Inter-relation: Depending on the underlying transport and 4. RTP Multiplexing Points
signalling protocol.
Special Restrictions: None. This section describes the multiplexing points present in the RTP
protocol that can be used to distinguish media streams and groups of
media streams. Figure 1 outlines the process of demultiplexing
incoming RTP streams:
A RTP source in an RTP session that changes its source transport
address during a session must also choose a new SSRC identifier to
avoid being interpreted as a looped source.

                    |
                    | packets
         +--        v
         |    +------------+
         |    |   Socket   |
         |    +------------+
         |      ||      ||
    RTP  | RTP/ ||      |+-----> SCTP ( ...and any other protocols)
 Session | RTCP ||      +------> STUN (multiplexed using same port)
         +--    ||
         +--    ||
         |  (split by SSRC)
         |   ||   ||   ||
         |   ||   ||   ||
   Media |  +--+ +--+ +--+
 Streams |  |PB| |PB| |PB|  Jitter buffer, process RTCP, FEC, etc.
         |  +--+ +--+ +--+
         +-- |    |    |
         (pick rendering context based on PT)
         +-- |   /     |
         |   +---+     |
         |   /   |     |
 Payload |  +--+ +--+ +--+
 Formats |  |CR| |CR| |CR|  Codecs and rendering
         |  +--+ +--+ +--+
         +--

Figure 1: RTP Demultiplexing Process

The set of participants considered part of the same RTP Session is
defined by the RTP specification [RFC3550] as those that share a
single SSRC space. That is, those participants that can see an SSRC
identifier transmitted by any one of the other participants. A
participant can receive an SSRC either as SSRC or CSRC in RTP and
RTCP packets. Thus, the RTP Session scope is decided by the
participants' network interconnection topology, in combination with
RTP and RTCP forwarding strategies deployed by endpoints and any
interconnecting middle nodes.
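
As a rough illustration of the demultiplexing steps outlined in Figure 1, the following Python sketch classifies a datagram arriving on a shared socket, splits RTP packets by SSRC, and then picks a rendering context from the payload type. It is a sketch under stated assumptions, not a definitive implementation: the first-byte ranges follow the demultiplexing rule commonly used when STUN and RTP share a port, treating a second byte of 192-223 as an RTCP packet type is an assumption for RTP and RTCP on one port, and the names classify, parse_rtp, Demux and the payload type table are hypothetical.

```python
import struct


def classify(datagram: bytes) -> str:
    """Rough classification of a datagram received on a shared port."""
    if len(datagram) < 2:
        return "unknown"
    first = datagram[0]
    if first <= 3:
        return "stun"
    if 128 <= first <= 191:  # RTP/RTCP version 2 in the top two bits
        # Assumption: a second byte of 192..223 is an RTCP packet type.
        return "rtcp" if 192 <= datagram[1] <= 223 else "rtp"
    return "other"


def parse_rtp(datagram: bytes):
    """Return (payload_type, sequence_number, timestamp, ssrc)."""
    if len(datagram) < 12:
        raise ValueError("datagram shorter than the fixed RTP header")
    _b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", datagram[:12])
    return b1 & 0x7F, seq, ts, ssrc  # low 7 bits of byte 1 are the PT


class Demux:
    """Split RTP by SSRC, then pick a rendering context by payload type."""

    def __init__(self, pt_to_context):
        self.pt_to_context = pt_to_context  # hypothetical PT -> context map
        self.streams = {}                   # ssrc -> [(seq, pt, context)]

    def handle(self, datagram: bytes):
        kind = classify(datagram)
        if kind != "rtp":
            return kind, None
        pt, seq, _ts, ssrc = parse_rtp(datagram)
        context = self.pt_to_context.get(pt, "unknown-format")
        self.streams.setdefault(ssrc, []).append((seq, pt, context))
        return kind, ssrc
```

The per-SSRC list stands in for the "PB" boxes of the figure. A real receiver would place a jitter buffer and RTCP processing there.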
3.2. SSRC 4.1. RTP Session
An SSRC identifies a RTP source or a Media Sink. For end-points that An RTP Session is the highest semantic layer in the RTP protocol, and
both source and sink media streams its SSRCs are used in both roles. represents an association between a group of communicating endpoints.
At any given time, a RTP source has one and only one SSRC - although The set of participants that form an RTP session is defined as those
that may change over the lifetime of the RTP source or sink. An RTP that share a single synchronisation source space [RFC3550]. That is,
Session serves one or more RTP sources. if a group of participants are each aware of the synchronisation
source identifiers belonging to the other participants, then those
participants are in a single RTP session. A participant can become
aware of a synchronisation source identifier by receiving an RTP
packet containing it in the SSRC field or CSRC list, by receiving an
RTCP packet mentioning it in an SSRC field, or through signalling
(e.g., the SDP "a=ssrc:" attribute). Thus, the scope of an RTP
session is determined by the participants' network interconnection
topology, in combination with RTP and RTCP forwarding strategies
deployed by the endpoints and any middleboxes, and by the signalling.
Identifier: Synchronisation Source (SSRC), 32-bit unsigned number. RTP does not contain a session identifier. Rather, it relies on the
underlying transport layer to separate different sessions, and on the
signalling to identify sessions in a manner that is meaningful to the
application. The signalling layer might give sessions an explicit
identifier, or their identification might be implicit based on the
addresses and ports used. Accordingly, a single RTP Session can have
multiple associated identifiers, explicit and implicit, belonging to
different contexts. For example, when running RTP on top of UDP/IP,
an RTP endpoint can identify and delimit an RTP Session from other
RTP Sessions using the UDP source and destination IP addresses and
UDP port numbers. Another example is when using SDP grouping
framework [RFC5888] which uses an identifier per "m="-line; if there
is a one-to-one mapping between "m="-lines and RTP sessions, that
grouping framework identifier will identify an RTP Session.
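
Since RTP carries no session identifier, an endpoint has to keep its own mapping from transport context to whatever label the signalling uses. The Python sketch below shows one way this could look for RTP over UDP/IP. The TransportKey and SessionTable names, the use of a string label (for instance a grouping framework identifier), and the example addresses are all illustrative assumptions.

```python
from typing import Dict, NamedTuple, Optional


class TransportKey(NamedTuple):
    """Implicit RTP session identity when running over UDP/IP."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


class SessionTable:
    """Maps the implicit transport identity to an explicit signalling label."""

    def __init__(self) -> None:
        self._by_transport: Dict[TransportKey, str] = {}

    def register(self, key: TransportKey, label: str) -> None:
        self._by_transport[key] = label

    def lookup(self, key: TransportKey) -> Optional[str]:
        # None means the packet does not belong to any known RTP session.
        return self._by_transport.get(key)


table = SessionTable()
table.register(TransportKey("192.0.2.1", 5004, "198.51.100.7", 6000), "audio")
table.register(TransportKey("192.0.2.1", 5006, "198.51.100.7", 6002), "video")
```

The same RTP session thus has two identifiers in this sketch, the address and port tuple and the signalling label, each meaningful in a different context.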
Position: In every RTP and RTCP packet header. May be present in RTP sessions are globally unique, but their identity can only be
RTCP payload. May be present in SDP signalling. determined by the communication context at an endpoint of the
session, or by a middlebox that is aware of the session context. The
relationship between RTP sessions depends on the underlying
application, transport, and signalling protocol. The RTP protocol
makes no normative statements about the relationship between
different RTP sessions, however the applications that use more than
one RTP session will have some higher layer understanding of the
relationship between the sessions they create.
Usage: Identify individual RTP sources and Media Sinks within an 4.2. Synchronisation Source (SSRC)
RTP Session. Refer to individual RTP sources and Media
Sinks in RTCP messages and SDP signalling.
Uniqueness: Randomly chosen, intended to be globally unique A synchronisation source (SSRC) identifies an RTP source or an RTP
within an RTP Session and not dependent on network address. sink. Every endpoint will have at least one synchronisation source
SSRC value collisions may occur and must be handled as identifier, even if it does not send media (endpoints that are only
specified in RTP [RFC3550]. RTP sinks still send RTCP, and use their synchronisation source
identifier in the RTCP packets they send). An endpoint can have
multiple synchronisation source identifiers if it contains multiple
RTP sources (i.e., if it sends multiple media streams). Endpoints
that are both RTP sources and RTP sinks use the same synchronisation
sources in both roles. At any given time, an RTP source has one and
only one SSRC - although that can change over the lifetime of the RTP
source or sink.
Inter-relation: SSRC belonging to the same synchronisation The synchronisation Source identifier is a 32-bit unsigned integer.
context (originating from the same endpoint), within or It is present in every RTP and RTCP packet header, and in the payload
between RTP Sessions, are indicated through use of identical of some RTCP packet types. It can also be present in SDP signalling.
SDES CNAME items in RTCP compound packets with those SSRC as Unless pre-signalled using the SDP "a=ssrc:" attribute [RFC5576], the
originating source. SDP signalling can provide explicit synchronisation source identifier is chosen at random. It is not
SSRC grouping [RFC5576]. When CNAME is inappropriate or dependent on the network address of the endpoint, and is intended to
insufficient, there exist a few other methods to relate be unique within an RTP session. Synchronisation source identifier
different SSRC. One such case is session-based RTP collisions can occur, and are handled as specified in [RFC3550] and
retransmission [RFC4588]. In some cases, the same SSRC [RFC5576], resulting in the synchronisation source identifier of the
Identifier value is used to relate streams in two different affected RTP sources and/or sinks changing. An RTP source that
RTP Sessions, such as in Multi-Session Transmission of changes its RTP Session identifier (e.g. source transport address)
scalable video [RFC6190]. during a session has to choose a new SSRC identifier to avoid being
interpreted as a looped source.
Special Restrictions: All RTP implementations must be prepared to Synchronisation source identifiers that belong to the same
use procedures for SSRC collision handling, which results in an synchronisation context (i.e., that represent media streams that can
SSRC number change. A RTP source that changes its RTP Session be synchronised using information in RTCP SR packets) are indicated
identifier (e.g. source transport address) during a session must by use of identical CNAME chunks in corresponding RTCP SDES packets.
also choose a new SSRC identifier to avoid being interpreted as SDP signalling can also be used to provide explicit grouping of
looped source. synchronisation sources [RFC5576].
Note that RTP sequence number and RTP timestamp are scoped by SSRC In some cases, the same SSRC Identifier value is used to relate
and thus independent between different SSRCs. streams in two different RTP Sessions, such as in Multi-Session
Transmission of scalable video [RFC6190]. This is NOT RECOMMENDED
since there is no guarantee of uniqueness in SSRC values across
RTP sessions.
Note that RTP sequence number and RTP timestamp are scoped by the
synchronisation source. Each RTP source will have a different
synchronisation source, and the corresponding media stream will have
a separate RTP sequence number and timestamp space.
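
The two points above, that sequence numbers are scoped by SSRC and that SSRCs sharing a CNAME belong to one synchronisation context, can be sketched as follows. This is a minimal Python illustration, assuming the RTP and RTCP SDES fields have already been parsed. The SourceTable class and its method names are hypothetical.

```python
from collections import defaultdict


class SourceTable:
    """Keeps independent state per SSRC and groups SSRCs by CNAME."""

    def __init__(self):
        self.last_seq = {}   # ssrc -> last RTP sequence number seen
        self.cname_of = {}   # ssrc -> CNAME learned from RTCP SDES

    def on_rtp(self, ssrc, seq):
        """Return the sequence step for this SSRC (mod 2**16), or None
        for the first packet seen from the SSRC."""
        prev = self.last_seq.get(ssrc)
        self.last_seq[ssrc] = seq
        if prev is None:
            return None
        return (seq - prev) & 0xFFFF

    def on_sdes_cname(self, ssrc, cname):
        self.cname_of[ssrc] = cname

    def sync_groups(self):
        """Return {cname: set of SSRCs that can be synchronised together}."""
        groups = defaultdict(set)
        for ssrc, cname in self.cname_of.items():
            groups[cname].add(ssrc)
        return dict(groups)
```

Because the state is keyed by SSRC, interleaved packets from two sources never disturb each other's sequence tracking, even when their sequence numbers are far apart.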
An SSRC identifier is used by different types of sources as well as An SSRC identifier is used by different types of sources as well as
sinks: sinks:
Real Media Source: Connected to a "physical" media source, for Real Media Source: Connected to a "physical" media source, for
example a camera or microphone. example a camera or microphone.
Conceptual Media Source: A source with some attributed property Processed Media Source: A source with some attributed property
generated by some network node, for example a filtering function generated by some network node, for example a filtering function
in an RTP mixer that provides the most active speaker based on in an RTP mixer that provides the most active speaker based on
some criteria, or a mix representing a set of other sources. some criteria, or a mix representing a set of other sources.
Media Sink: A source that does not generate any RTP media stream in RTP Sink: A source that does not generate any RTP media stream in
itself (e.g. an endpoint or middlebox only receiving in an RTP itself (e.g. an endpoint or middlebox only receiving in an RTP
session), but anyway need a sender SSRC for use as source in RTCP session). It still needs a sender SSRC for use as source in RTCP
reports. reports.
Note that an endpoint that generates more than one media type, e.g. a Note that an endpoint that generates more than one media type, e.g. a
conference participant sending both audio and video, need not (and conference participant sending both audio and video, need not (and
commonly should not) use the same SSRC value across RTP sessions. commonly does not) use the same SSRC value across RTP sessions. RTCP
RTCP Compound packets containing the CNAME SDES item are the Compound packets containing the CNAME SDES item are the designated
designated method to bind an SSRC to a CNAME, effectively cross- method to bind an SSRC to a CNAME, effectively cross-correlating
correlating SSRCs within and between RTP Sessions as coming from the SSRCs within and between RTP Sessions as coming from the same
same endpoint. The main property attributed to SSRCs associated with endpoint. The main property attributed to SSRCs associated with the
the same CNAME is that they are from a particular synchronisation same CNAME is that they are from a particular synchronisation context
context and may be synchronised at playback. and can be synchronised at playback.
An RTP receiver receiving a previously unseen SSRC value must An RTP receiver receiving a previously unseen SSRC value will
interpret it as a new source. It may in fact be a previously interpret it as a new source. It might in fact be a previously
existing source that had to change SSRC number due to an SSRC existing source that had to change SSRC number due to an SSRC
conflict. However, the originator of the previous SSRC should have conflict. However, the originator of the previous SSRC ought to have
ended the conflicting source by sending an RTCP BYE for it prior to ended the conflicting source by sending an RTCP BYE for it prior to
starting to send with the new SSRC, so the new SSRC is anyway starting to send with the new SSRC, so the new SSRC is anyway
effectively a new source. effectively a new source.
3.3. CSRC 4.3. Contributing Source (CSRC)
The Contributing Source (CSRC) is not a separate identifier, but an
usage of the SSRC identifier. It is optionally included in the RTP
header as list of up to 15 contributing RTP sources. CSRC shares the
SSRC number space and specifies which set of SSRCs that has
contributed to the RTP payload. However, even though each RTP packet
and SSRC can be tagged with the contained CSRCs, the media
representation of an individual CSRC is in general not possible to
extract from the RTP payload since it is typically the result of a
media mixing (merge) operation (by an RTP mixer) on the individual
media streams corresponding to the CSRC identifiers. The exception
is the case when only a single CSRC is indicated as this represent
forwarding of a media stream, possibly modified. The RTP header
extension for Mixer-to-Client Audio Level Indication [RFC6465]
expands on the receivers information about a packet with a CSRC list.
Due to these restrictions, CSRC will not be considered a fully
qualified multiplex point and will be disregarded in the rest of this
document.
3.4. Payload Type
Each Media Stream utilises one or more encoding formats, identified
by the Payload Type.
The Payload Type is not a multiplexing point. Appendix A gives some
of the many reasons why attempting to use it as a multiplexing point
will have bad results.
Identifier: Payload Type number.
Position: In every RTP header and in signalling.
Usage: Identify a specific Media Stream encoding format. The The Contributing Source (CSRC) is not a separate identifier. Rather
format definition may be taken from [RFC3551] for statically a synchronisation source identifier is listed as a CSRC in the RTP
allocated Payload Types, but should be explicitly defined in header of a packet generated by an RTP mixer if the corresponding
signalling, such as SDP, both for static and dynamic Payload SSRC was in the header of one of the packets that contributed to the
Types. The term "format" here includes whatever can be mix.
described by out-of-band signalling means. In SDP, the term
"format" includes media type, RTP timestamp sampling rate,
codec, codec configuration, payload format configurations,
and various robustness mechanisms such as redundant
encodings [RFC2198].
Uniqueness: Scoped by sending endpoint within an RTP Session. To It is not possible, in general, to extract media represented by an
avoid any potential for ambiguity, it is desirable that individual CSRC since it is typically the result of a media mixing
payload types are unique across all sending endpoints within (merge) operation by an RTP mixer on the individual media streams
an RTP session, but this is often not true in practice. All corresponding to the CSRC identifiers. The exception is the case
SSRC in an RTP session sent from an single endpoint share when only a single CSRC is indicated as this represent forwarding of
the same Payload Types definitions. The RTP Payload Type is a media stream, possibly modified. The RTP header extension for
designed such that only a single Payload Type is valid at Mixer-to-Client Audio Level Indication [RFC6465] expands on the
any time instant in the SSRC's RTP timestamp time line, receivers information about a packet with a CSRC list. Due to these
effectively time-multiplexing different Payload Types if any restrictions, CSRC will not be considered a fully qualified
change occurs. Used Payload Type may change on a per-packet multiplexing point and will be disregarded in the rest of this
basis for an SSRC, for example a speech codec making use of document.
generic Comfort Noise [RFC3389].
Inter-relation: There are some uses where Payload Type numbers 4.4. RTP Payload Type
need to be unique across RTP Sessions. This is for example
the case in Media Decoding Dependency [RFC5583] where
Payload Types are used to describe media dependency across
RTP Sessions. Another example is session-based RTP
retransmission [RFC4588].
Special Restrictions: Using different RTP timestamp clock rates for Each Media Stream utilises one or more RTP payload formats. An RTP
the RTP Payload Types in use in the same RTP Session have issues payload format describes how the output of a particular media codec
such as potential for loss of synchronisation. Payload Type clock is framed and encoded into RTP packets. The payload format used is
rate switching requires some special consideration that is identified by the payload type field in the RTP data packet header.
described in the multiple clock rates specification The combination therefore identifies a specific Media Stream encoding
[I-D.ietf-avtext-multiple-clock-rates]. format. The format definition can be taken from [RFC3551] for
statically allocated payload types, but ought to be explicitly
defined in signalling, such as SDP, both for static and dynamic
Payload Types. The term "format" here includes whatever can be
described by out-of-band signalling means. In SDP, the term "format"
includes media type, RTP timestamp sampling rate, codec, codec
configuration, payload format configurations, and various robustness
mechanisms such as redundant encodings [RFC2198].
The payload type is scoped by sending endpoint within an RTP Session.
All synchronisation sources sent from an single endpoint share the
same payload types definitions. The RTP Payload Type is designed
such that only a single Payload Type is valid at any time instant in
the RTP source's RTP timestamp time line, effectively time-
multiplexing different Payload Types if any change occurs. The
payload type used can change on a per-packet basis for an SSRC, for
example a speech codec making use of generic comfort noise [RFC3389].
If there is a true need to send multiple Payload Types for the same If there is a true need to send multiple Payload Types for the same
SSRC that are valid for the same RTP Timestamps, then redundant SSRC that are valid for the same instant, then redundant encodings
encodings [RFC2198] can be used. Several additional constraints than [RFC2198] can be used. Several additional constraints than the ones
the ones mentioned above need to be met to enable this use, one of mentioned above need to be met to enable this use, one of which is
which is that the combined payload sizes of the different Payload that the combined payload sizes of the different Payload Types ought
Types must not exceed the transport MTU. not exceed the transport MTU.
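
To make the MTU constraint on redundant encodings concrete, the Python sketch below estimates the size of a packet using the RFC 2198 payload format. It assumes the fixed 12-byte RTP header with no CSRC list or header extension, a 4-byte header per redundant block and a 1-byte header for the primary block, and 28 bytes of IPv4 plus UDP overhead. The function names and the default MTU are illustrative assumptions.

```python
RTP_FIXED_HEADER = 12            # no CSRC list, no header extension
RED_REDUNDANT_BLOCK_HEADER = 4   # F bit, block PT, timestamp offset, length
RED_PRIMARY_BLOCK_HEADER = 1     # F bit and block PT only
RED_MAX_REDUNDANT_BLOCK_LEN = 1023  # block length is a 10-bit field


def red_packet_size(primary_len, redundant_lens):
    """Size in bytes of an RTP packet carrying one primary block and
    the given redundant blocks."""
    for n in redundant_lens:
        if n > RED_MAX_REDUNDANT_BLOCK_LEN:
            raise ValueError("redundant block too long for the length field")
    return (RTP_FIXED_HEADER
            + RED_REDUNDANT_BLOCK_HEADER * len(redundant_lens)
            + RED_PRIMARY_BLOCK_HEADER
            + primary_len
            + sum(redundant_lens))


def fits_in_mtu(primary_len, redundant_lens, mtu=1500, lower_overhead=28):
    """True if the packet fits in the MTU after IP and UDP overhead
    (assumed here to be IPv4 + UDP = 28 bytes)."""
    return red_packet_size(primary_len, redundant_lens) <= mtu - lower_overhead
```

For example, a 160-byte primary block with one 160-byte redundant block gives a 337-byte packet under these assumptions. SRTP authentication tags or IPv6 headers would reduce the available space further.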
Other aspects of RTP payload format use are described in RTP Payload Other aspects of RTP payload format use are described in RTP Payload
HowTo [I-D.ietf-payload-rtp-howto]. HowTo [I-D.ietf-payload-rtp-howto].
4. Multiple Streams Alternatives The payload type is not a multiplexing point at the RTP layer (see
Appendix A for a detailed discussion of why using the payload type as
The reasons why an endpoint may choose to send multiple media streams an RTP multiplexing point does not work). The RTP payload type is,
are widespread. In the below discussion, please keep in mind that however, used to determine how to render a media stream, and so can
the reasons for having multiple media streams vary and include but be viewed as selecting a rendering context. The rendering context
are not limited to the following: can be defined by the signalling, and the RTP payload type number is
sometimes used to associate an RTP media stream with the signalling.
o Multiple Media Sources This association is possible provided unique RTP payload type numbers
are used in each context. For example, an RTP media stream can be
o Multiple Media Streams may be needed to represent one Media Source associated with an SDP "m=" line by comparing the RTP payload type
(for instance when using layered encodings) numbers used by the media stream with payload types signalled in the
o A Retransmission stream may repeat the content of another Media "a=rtpmap:" lines in the media sections of the SDP. If RTP media
Stream streams are being associated with signalling contexts based on the
RTP payload type, then the assignment of RTP payload type numbers
o An FEC stream may provide material that can be used to repair MUST be unique across signalling contexts; if the same RTP payload
another Media Stream format configuration is used in multiple contexts, then a different
RTP payload type number has to be assigned in each context to ensure
o Alternative Encodings, for instance different codecs for the same uniqueness. If the RTP payload type number is not being used to
audio stream associated RTP media streams with a signalling context, then the same
RTP payload type number can be used to indicate the exact same RTP
o Alternative formats, for instance multiple resolutions of the same payload format configuration in multiple contexts.
video stream
Thus the choice made due to one reason may not be the choice suitable
for another reason. In the above list, the different items have
different levels of maturity in the discussion on how to solve them.
The clearest understanding is associated with multiple media sources
of the same media type. However, all warrant discussion and
clarification on how to deal with them.
This section reviews the alternatives to enable multi-stream
handling. Let's start with describing mechanisms that could enable
multiple media streams, independent of the purpose for having
multiple streams.
Additional SSRC: Each additional Media Stream gets its own SSRC
within a RTP Session.
Multiple RTP Sessions: Using additional RTP Sessions to handle
additional Media Streams.
As the below discussion will show, in reality we cannot choose a
single one of the two solutions. To utilise RTP well and as
efficiently as possible, both are needed. The real issue is finding
the right guidance on when to create RTP sessions and when additional
SSRCs in an RTP session is the right choice.
5. RTP Topologies and Issues 5. RTP Topologies and Issues
The impact of how RTP Multiplex is performed will in general vary The impact of how RTP multiplexing is performed will in general vary
with how the RTP Session participants are interconnected, described with how the RTP Session participants are interconnected, described
by RTP Topology [RFC5117] and its intended successor by RTP Topology [RFC5117] and its intended successor
[I-D.westerlund-avtcore-rtp-topologies-update]. [I-D.westerlund-avtcore-rtp-topologies-update].
5.1. Point to Point 5.1. Point to Point
Even the most basic use case, denoted Topo-Point-to-Point in Even the most basic use case, denoted Topo-Point-to-Point in
[I-D.westerlund-avtcore-rtp-topologies-update], raises a number of [I-D.westerlund-avtcore-rtp-topologies-update], raises a number of
considerations that are discussed in detail below (Section 6). They considerations that are discussed in detail below (Section 6). They
range over such aspects as: range over such aspects as:
skipping to change at page 13, line 25 skipping to change at page 13, line 5
o Do I need network differentiation in form of QoS? o Do I need network differentiation in form of QoS?
o Can the application more easily process and handle the media o Can the application more easily process and handle the media
streams if they are in different RTP sessions? streams if they are in different RTP sessions?
o Do I need to use additional media streams for RTP retransmission o Do I need to use additional media streams for RTP retransmission
or FEC. or FEC.
o etc. o etc.
The application designer will have to make choices here. The point The point to point topology can contain one to many RTP sessions with
to point topology can contain one to many RTP sessions with one to one to many media sources per session, each having one or more RTP
many media sources per session, resulting in one or more RTP source sources per media source.
(SSRC) per media source.
5.2. Translators & Gateways 5.2. Translators & Gateways
A point to point communication can end up in a situation when the A point to point communication can end up in a situation when the
peer it is communicating with is not compatible with the other peer peer it is communicating with is not compatible with the other peer
for various reasons: for various reasons:
o No common media codec for a media type thus requiring transcoding o No common media codec for a media type thus requiring transcoding
o Different support for multiple RTP sources and RTP sessions o Different support for multiple RTP sources and RTP sessions
o Usage of different media transport protocols, i.e RTP or other. o Usage of different media transport protocols, i.e RTP or other.
o Usage of different transport protocols, e.g. UDP, DCCP, TCP o Usage of different transport protocols, e.g. UDP, DCCP, TCP
o Different security solutions, e.g. IPsec, TLS, DTLS, SRTP with o Different security solutions, e.g. IPsec, TLS, DTLS, SRTP with
different keying mechanisms. different keying mechanisms.
This is in many situations resolved by the inclusion of a translator In many situations this is resolved by the inclusion of a translator
in-between the two peers, as described by Topo-PtP-Translator in between the two peers, as described by Topo-PtP-Translator in
[I-D.westerlund-avtcore-rtp-topologies-update]. The translator's [I-D.westerlund-avtcore-rtp-topologies-update]. The translator's
main purpose is to make the peer look to the other peer like main purpose is to make the peer look to the other peer like
something it is compatible with. There may also be other reasons something it is compatible with. There can also be other reasons
than compatibility to insert a translator in the form of a middlebox than compatibility to insert a translator in the form of a middlebox
or gateway, for example a need to monitor the media streams. If the or gateway, for example a need to monitor the media streams. If the
stream transport characteristics are changed by the translator, stream transport characteristics are changed by the translator,
appropriate media handling can require thorough understanding of the appropriate media handling can require thorough understanding of the
application logic, specifically any congestion control or media application logic, specifically any congestion control or media
adaptation. adaptation.
5.3. Point to Multipoint Using Multicast 5.3. Point to Multipoint Using Multicast
This section discusses the Point to Multi-point using Multicast to The Point to Multi-point topology uses Multicast to interconnect
interconnect the session participants. This includes both Topo-ASM the session participants. This includes both Topo-ASM and Topo-SSM
and Topo-SSM in [I-D.westerlund-avtcore-rtp-topologies-update]. in [I-D.westerlund-avtcore-rtp-topologies-update].
Special considerations must be made as multicast is a one to many Special considerations need to be made as multicast is a one to many
distribution system. For example, the only practical method for distribution system. For example, the only practical method for
adapting the bit-rate sent towards a given receiver for large groups adapting the bit-rate sent towards a given receiver for large groups
is to use a set of multicast groups, where each multicast group is to use a set of multicast groups, where each multicast group
represents a particular bit-rate. Otherwise the whole group gets represents a particular bit-rate. Otherwise the whole group gets
media adapted to the participant with the worst conditions. The media adapted to the participant with the worst conditions. The
media encoding is either scalable, where multiple layers can be media encoding is either scalable, where multiple layers can be
combined, or simulcast where a single version is selected. By either combined, or simulcast, where a single version is selected. By
selecting or combining multicast groups, the receiver can control the either selecting or combining multicast groups, the receiver can
bit-rate sent on the path to itself. It is also common that streams control the bit-rate sent on the path to itself. It is also common
that improve transport robustness are sent in their own multicast that streams that improve transport robustness are sent in their own
group to allow for interworking with legacy or to support different multicast group to allow for interworking with legacy or to support
levels of protection. different levels of protection.
The result of this is some common behaviours for RTP multicast: The result of this is some common behaviours for RTP multicast:
1. Multicast applications use a group of RTP sessions, not one. 1. Multicast applications use a group of RTP sessions, not one.
Each endpoint will need to be a member of a number of RTP Each endpoint will need to be a member of a number of RTP
sessions in order to perform well. sessions in order to perform well.
2. Within each RTP session, the number of Media Sinks is likely to 2. Within each RTP session, the number of RTP Sinks is likely to be
be much larger than the number of RTP sources. much larger than the number of RTP sources.
3. Multicast applications need signalling functions to identify the 3. Multicast applications need signalling functions to identify the
relationships between RTP sessions. relationships between RTP sessions.
4. Multicast applications need signalling functions to identify the 4. Multicast applications need signalling functions to identify the
relationships between SSRCs in different RTP sessions. relationships between SSRCs in different RTP sessions.
All multicast configurations share a signalling requirement; all of All multicast configurations share a signalling requirement; all of
the participants will need to have the same RTP and payload type the participants will need to have the same RTP and payload type
configuration. Otherwise, A could for example be using payload type configuration. Otherwise, A could for example be using payload type
97 as the video codec H.264 while B thinks it is MPEG-2. It should 97 as the video codec H.264 while B thinks it is MPEG-2. It is to be
be noted that SDP offer/answer [RFC3264] has issues with ensuring noted that SDP offer/answer [RFC3264] is not appropriate for ensuring
this property. The signalling aspects of multicast are not explored this property. The signalling aspects of multicast are not explored
further in this memo. further in this memo.
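
The payload type mismatch described above (A using payload type 97 for H.264 while B takes it to be MPEG-2) can be detected by comparing the configurations the participants hold. The Python sketch below is one possible check. The find_pt_conflicts name, the dictionary layout, and the format strings are illustrative assumptions, not taken from any signalling protocol.

```python
def find_pt_conflicts(configs):
    """configs maps participant -> {payload type number: format string}.

    Returns {pt: {format: [participants]}} for every payload type number
    that is bound to more than one format across the participants.
    """
    seen = {}
    for participant, pt_map in configs.items():
        for pt, fmt in pt_map.items():
            seen.setdefault(pt, {}).setdefault(fmt, []).append(participant)
    return {pt: fmts for pt, fmts in seen.items() if len(fmts) > 1}


configs = {
    "A": {0: "PCMU/8000", 97: "H264/90000"},
    "B": {0: "PCMU/8000", 97: "MPV/90000"},
}
conflicts = find_pt_conflicts(configs)
```

An empty result means all participants agree on every payload type number in use, which is the property that multicast and transport translator configurations need.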
Security solutions for this type of group communications are also Security solutions for this type of group communications are also
challenging. First of all the key-management and the security challenging. First of all the key-management and the security
protocol must support group communication. Source authentication protocol needs to support group communication. Source authentication
becomes more difficult and requires special solutions. For more requires special solutions. For more discussion on this please
discussion on this please review Options for Securing RTP Sessions review Options for Securing RTP Sessions
[I-D.ietf-avtcore-rtp-security-options]. [I-D.ietf-avtcore-rtp-security-options].
5.4. Point to Multipoint Using an RTP Transport Translator 5.4. Point to Multipoint Using an RTP Transport Translator
This mode is described as Topo-Translator in This mode is described as Topo-Translator in
[I-D.westerlund-avtcore-rtp-topologies-update]. [I-D.westerlund-avtcore-rtp-topologies-update].
Transport Translators (Relays) result in an RTP session situation Transport Translators (Relays) result in an RTP session situation
that is very similar to how an ASM group RTP session would behave. that is very similar to how an ASM group RTP session would behave.
One of the most important aspects with the simple relay is that it is One of the most important aspects with the simple relay is that it is
only rewriting transport headers, no RTP modifications nor media only rewriting transport headers, no RTP modifications nor media
transcoding occur. The most obvious downside of this basic relaying transcoding occur. The most obvious downside of this basic relaying
is that the translator has no control over how many streams need to is that the translator has no control over how many streams need to
be delivered to a receiver. Nor can it simply select to deliver only be delivered to a receiver. Nor can it simply select to deliver only
certain streams, as this creates session inconsistencies: If the certain streams, as this creates session inconsistencies: If the
translator temporarily stops a stream, this prevents some receivers translator temporarily stops a stream, this prevents some receivers
from reporting on it. From the sender's perspective it will look from reporting on it. From the sender's perspective it will look
like a transport failure. Applications having needs to stop or like a transport failure. Applications needing to stop or switch
switch streams in the central node should consider using an RTP mixer streams in the central node ought to consider using an RTP mixer to
to avoid this issue. avoid this issue.
The Transport Translator has the same signalling requirement as The Transport Translator has the same signalling requirement as
multicast: All participants must have the same payload type multicast: All participants need to have the same payload type
configuration. Most of the ASM security issues also arise here. configuration. Most of the ASM security issues also arise here.
Some alternative when it comes to solution do exist as there after Some alternatives when it comes to solution do exist, as there exists
all exist a central node to communicate with. One that also can a central node to communicate with, one that also can enforce some
enforce some security policies depending on the level of trust placed security policies depending on the level of trust placed in the node.
in the node.
5.5. Point to Multipoint Using an RTP Mixer 5.5. Point to Multipoint Using an RTP Mixer
A mixer, described by Topo-Mixer in A mixer, described by Topo-Mixer in
[I-D.westerlund-avtcore-rtp-topologies-update], is a centralised node [I-D.westerlund-avtcore-rtp-topologies-update], is a centralised node
that selects or mixes content in a conference to optimise the RTP that selects or mixes content in a conference to optimise the RTP
session so that each endpoint only needs to connect to one entity, the session so that each endpoint only needs to connect to one entity, the
mixer. The media sent from the mixer to the end-point can be mixer. The media sent from the mixer to the endpoint can be
optimised in different ways. These optimisations include methods optimised in different ways. These optimisations include methods
like only choosing media from the currently most active speaker or like only choosing media from the currently most active speaker or
mixing together audio so that only one audio stream is required. mixing together audio so that only one audio stream is needed.
Mixers have some downsides, the first is that the mixer must be a Mixers have some downsides, the first is that the mixer has to be a
trusted node as they either perform media operations or at least trusted node as they repacketize the media, and can perform media
repacketize the media. When using SRTP, both media operations and transformation operations. When using SRTP, both media operations
repacketization requires that the mixer verifies integrity, decrypts and repacketization requires that the mixer verifies integrity,
the content, performs the operation and forms new RTP packets, decrypts the content, performs the operation and forms new RTP
encrypts and integrity-protects them. This applies to all types of packets, encrypts and integrity-protects them. This applies to all
mixers. The second downside is that all these operations and types of mixers. The second downside is that all these operations
optimisations of the session requires processing. How much depends and optimisations of the session requires processing. How much
on the implementation, as will become evident below. depends on the implementation, as will become evident below.
A mixer, unlike a pure transport translator, is always application A mixer, unlike a pure transport translator, is always application
specific: the application logic for stream mixing or stream selection specific: the application logic for stream mixing or stream selection
has to be embedded within the mixer, and controlled using application has to be embedded within the mixer, and controlled using application
specific signalling. The implementation of a mixer can take several specific signalling. The implementation of a mixer can take several
different forms and we will discuss the main themes available that different forms, as discussed below.
doesn't break RTP.
Please note that a Mixer could also contain translator
functionalities, like a media transcoder to adjust the media bit-rate
or codec used for a particular RTP media stream.
6. Multiple Streams Discussion
6.1. Introduction A Mixer can also contain translator functionalities, like a media
transcoder to adjust the media bit-rate or codec used for a
particular RTP media stream.
6. RTP Multiplexing: When to Use Multiple RTP Sessions
Using multiple media streams is a well supported feature of RTP. Using multiple media streams is a well supported feature of RTP.
However, it can be unclear for most implementers or people writing However, it can be unclear for most implementers or people writing
RTP/RTCP applications or extensions attempting to apply multiple RTP/RTCP applications or extensions attempting to apply multiple
streams when it is most appropriate to add an additional SSRC in an streams when it is most appropriate to add an additional SSRC in an
existing RTP session and when it is better to use multiple RTP existing RTP session and when it is better to use multiple RTP
sessions. This section tries to discuss the various considerations sessions. This section tries to discuss the various considerations
needed. The next section then concludes with some guidelines. needed. The next section then concludes with some guidelines.
6.2. RTP/RTCP Aspects 6.1. RTP and RTCP Protocol Considerations
This section discusses RTP and RTCP aspects worth considering when This section discusses RTP and RTCP aspects worth considering when
selecting between using an additional SSRC and Multiple RTP sessions. selecting between using an additional SSRC and Multiple RTP sessions.
6.2.1. The RTP Specification 6.1.1. The RTP Specification
RFC 3550 contains some recommendations and a bullet list with 5 RFC 3550 contains some recommendations and a bullet list with 5
arguments for different aspects of RTP multiplexing. Let's review arguments for different aspects of RTP multiplexing. Let's review
Section 5.2 of [RFC3550], reproduced below: Section 5.2 of [RFC3550], reproduced below:
"For efficient protocol processing, the number of multiplexing points "For efficient protocol processing, the number of multiplexing points
should be minimised, as described in the integrated layer processing should be minimised, as described in the integrated layer processing
design principle [ALF]. In RTP, multiplexing is provided by the design principle [ALF]. In RTP, multiplexing is provided by the
destination transport address (network address and port number) which destination transport address (network address and port number) which
is different for each RTP session. For example, in a teleconference is different for each RTP session. For example, in a teleconference
skipping to change at page 17, line 46 skipping to change at page 17, line 24
either single- or multiple-process implementations. either single- or multiple-process implementations.
Using a different SSRC for each medium but sending them in the same Using a different SSRC for each medium but sending them in the same
RTP session would avoid the first three problems but not the last RTP session would avoid the first three problems but not the last
two. two.
On the other hand, multiplexing multiple related sources of the same On the other hand, multiplexing multiple related sources of the same
medium in one RTP session using different SSRC values is the norm for medium in one RTP session using different SSRC values is the norm for
multicast sessions. The problems listed above don't apply: an RTP multicast sessions. The problems listed above don't apply: an RTP
mixer can combine multiple audio sources, for example, and the same mixer can combine multiple audio sources, for example, and the same
treatment is applicable for all of them. It may also be appropriate treatment is applicable for all of them. It might also be
to multiplex streams of the same medium using different SSRC values appropriate to multiplex streams of the same medium using different
in other scenarios where the last two problems do not apply." SSRC values in other scenarios where the last two problems do not
apply."
Let's consider one argument at a time. The first is an argument for Let's consider one argument at a time. The first is an argument for
using different SSRC for each individual media stream, which is very using different SSRC for each individual media stream, which is very
applicable. applicable.
The second argument is advocating against using payload type The second argument is advocating against using payload type
multiplexing, which still stands as can be seen by the extensive multiplexing, which still stands as can be seen by the extensive
list of issues found in Appendix A. list of issues found in Appendix A.
The third argument is yet another argument against payload type The third argument is yet another argument against payload type
multiplexing. multiplexing.
The fourth is an argument against multiplexing media streams that The fourth is an argument against multiplexing media streams that
require different handling into the same session. As we saw in the require different handling into the same session. As we saw in the
discussion of RTP mixers, the RTP mixer has to embed application discussion of RTP mixers, the RTP mixer has to embed application
logic in order to handle streams anyway; the separation of streams logic in order to handle streams anyway; the separation of streams
according to stream type is just another piece of application logic, according to stream type is just another piece of application logic,
which may or may not be appropriate for a particular application. A which might or might not be appropriate for a particular application.
type of application that can mix different media sources "blindly" is A type of application that can mix different media sources "blindly"
the audio only "telephone" bridge; most other type of application is the audio only "telephone" bridge; most other type of application
needs application-specific logic to perform the mix correctly. needs application-specific logic to perform the mix correctly.
The fifth argument discusses network aspects that we will discuss The fifth argument discusses network aspects that we will discuss
more below in Section 6.4. It also goes into aspects of more below in Section 6.3. It also goes into aspects of
implementation, like decomposed endpoints where different processes implementation, like decomposed endpoints where different processes
or inter-connected devices handle different aspects of the whole or inter-connected devices handle different aspects of the whole
multi-media session. multi-media session.
A summary of RFC 3550's view on multiplexing is to use unique SSRCs A summary of RFC 3550's view on multiplexing is to use unique SSRCs
for anything that is its own media/packet stream, and to use for anything that is its own media/packet stream, and to use
different RTP sessions for media streams that don't share media type. different RTP sessions for media streams that don't share a media
The first this document support as very valid. The latter is one type. This document supports the first point; it is very valid. The
thing which is further discussed in this document as something the latter is one thing which is further discussed in this document as
application developer needs to make a conscious choice for. something the application developer needs to make a conscious choice
for, but where imposing a single solution on all usages of RTP is
inappropriate.
6.2.1.1. Different Media Types Recommendations 6.1.1.1. Different Media Types: Recommendations
The above quote from RTP [RFC3550] includes a strong recommendation: The above quote from RTP [RFC3550] includes a strong recommendation:
"For example, in a teleconference composed of audio and video "For example, in a teleconference composed of audio and video
media encoded separately, each medium SHOULD be carried in a media encoded separately, each medium SHOULD be carried in a
separate RTP session with its own destination transport address." separate RTP session with its own destination transport address."
It was identified in "Why RTP Sessions Should Be Content Neutral" It was identified in "Why RTP Sessions Should Be Content Neutral"
[I-D.alvestrand-rtp-sess-neutral] that the above statement is poorly [I-D.alvestrand-rtp-sess-neutral] that the above statement is poorly
supported by any of the motivations provided in the RTP supported by any of the motivations provided in the RTP
specification. This has resulted in the creation of the specification. This has resulted in the creation of the
"Multiple Media Types in an RTP Session" specification "Multiple Media Types in an RTP Session" specification
[I-D.ietf-avtcore-multi-media-rtp-session] which intend to update [I-D.ietf-avtcore-multi-media-rtp-session] which intends to update
this recommendation. That document has a detailed analysis of the this recommendation. That document has a detailed analysis of the
potential issues in having multiple media types in the same RTP potential issues in having multiple media types in the same RTP
session. This document tries to provide a more overarching session. This document tries to provide a more overarching
consideration regarding the usage of RTP sessions and considers consideration regarding the usage of RTP sessions and considers
multiple media types in one RTP session as a possible choice for the multiple media types in one RTP session as a possible choice for the
RTP application designer. RTP application designer.
6.2.2. Multiple SSRCs in a Session 6.1.2. Multiple SSRCs in a Session
Using multiple SSRCs in an RTP session at one endpoint has some Using multiple SSRCs in an RTP session at one endpoint requires
unclarities in the RTP specification. These could potentially lead resolving some unclear aspects of the RTP specification. These could
to some interoperability issues as well as some potential significant potentially lead to some interoperability issues as well as some
inefficiencies. These are further discussed in "RTP Considerations potential significant inefficiencies. These are further discussed in
for Endpoints Sending Multiple Media Streams" "RTP Considerations for Endpoints Sending Multiple Media Streams"
[I-D.lennox-avtcore-rtp-multi-stream]. An application designer may [I-D.lennox-avtcore-rtp-multi-stream]. An application designer needs
need to consider these issues and the impact availability or lack of to consider these issues and the impact availability or lack of the
the optimization in the endpoints has on their application. optimization in the endpoints has on their application.
If an application will become affected by the issues described, using If an application will become affected by the issues described, using
Multiple RTP sessions can mitigate these issues. Multiple RTP sessions can mitigate these issues.
6.2.3. Handling Varying Sets of Senders 6.1.3. Handling Varying Sets of Senders
In some applications, the set of simultaneously active sources varies In some applications, the set of simultaneously active sources varies
within a larger set of session members. A receiver can then possibly within a larger set of session members. A receiver can then possibly
try to use a set of decoding chains that is smaller than the number try to use a set of decoding chains that is smaller than the number
of senders, switching the decoding chains between different senders. of senders, switching the decoding chains between different senders.
As each media decoding chain may contain state, either the receiver As each media decoding chain can contain state, either the receiver
must be able to save the state of swapped-out senders, or the needs to be able to save the state of swapped-out senders, or
sender must be able to send data that permits the receiver to the sender needs to be able to send data that permits the receiver to
reinitialise when it resumes activity. reinitialise when it resumes activity.
This behaviour will cause similar issues independent of Additional This behaviour will cause similar issues independent of Additional
SSRC or Multiple RTP session. SSRC or Multiple RTP session.
6.2.4. Cross Session RTCP Requests 6.1.4. Cross Session RTCP Requests
There currently exists no functionality to make truly synchronised There currently exists no functionality to make truly synchronised
and atomic RTCP messages with some type of request semantics across and atomic RTCP messages with some type of request semantics across
multiple RTP Sessions. Instead, separate RTCP messages will have to multiple RTP Sessions. Instead, separate RTCP messages will have to
be sent in each session. This gives streams in the same RTP session be sent in each session. This gives streams in the same RTP session
a slight advantage as RTCP messages for different streams in the same a slight advantage as RTCP messages for different streams in the same
session can be sent in a compound RTCP packet. Thus providing an session can be sent in a compound RTCP packet, thus providing an
atomic operation if different modifications of different streams are atomic operation if different modifications of different streams are
requested at the same time. requested at the same time.
When using multiple RTP sessions, the RTCP timing rules in the When using multiple RTP sessions, the RTCP timing rules in the
sessions and the transport aspects, such as packet loss and jitter, sessions and the transport aspects, such as packet loss and jitter,
prevent a receiver from relying on atomic operations, forcing it to prevent a receiver from relying on atomic operations, forcing it to
use more robust and forgiving mechanisms. use more robust and forgiving mechanisms.
6.2.5. Binding Related Sources 6.1.5. Binding Related Sources
A common problem in a number of various RTP extensions has been how A common problem in a number of various RTP extensions has been how
to bind related RTP sources and their media streams together. This to bind related RTP sources and their media streams together. This
issue is common to both using additional SSRCs and Multiple RTP issue is common to both using additional SSRCs and Multiple RTP
sessions. sessions.
The solutions can be divided into some groups, RTP/RTCP based, The solutions can be divided into some groups, RTP/RTCP based,
Signalling based (SDP), grouping related RTP sessions, and grouping Signalling based (SDP), grouping related RTP sessions, and grouping
SSRCs within an RTP session. Most solutions are explicit, but some SSRCs within an RTP session. Most solutions are explicit, but some
implicit methods have also been applied to the problem. implicit methods have also been applied to the problem.
The SDP-based signalling solutions are: The SDP-based signalling solutions are:
SDP Media Description Grouping: The SDP Grouping Framework [RFC5888] SDP Media Description Grouping: The SDP Grouping Framework [RFC5888]
uses various semantics to group any number of media descriptions. uses various semantics to group any number of media descriptions.
These have previously been considered primarily as grouping RTP These have previously been considered primarily as grouping RTP
sessions, but this may change. sessions, but this might change.
SDP SSRC grouping: Source-Specific Media Attributes in SDP [RFC5576] SDP SSRC grouping: Source-Specific Media Attributes in SDP [RFC5576]
includes a solution for grouping SSRCs the same way as the includes a solution for grouping SSRCs the same way as the
Grouping framework groups Media Descriptions. Grouping framework groups Media Descriptions.
This supports a lot of use cases. Both solutions have shortcomings SDP MSID grouping: Media Stream Identifiers [I-D.ietf-mmusic-msid]
in cases where the session's dynamic properties are such that it is includes a solution for grouping SSRCs that is independent of
difficult or resource consuming to keep the list of related SSRCs up their allocation to RTP sessions.
to date. As they are two related but still separated solutions it is
not well specified to group SSRCs across multiple RTP sessions and This supports a lot of use cases. All these solutions have
SDP media descriptions. shortcomings in cases where the session's dynamic properties are such
that it is difficult or resource consuming to keep the list of
related SSRCs up to date.
Among RTP/RTCP based solutions, when binding to an endpoint or Among RTP/RTCP based solutions, when binding to an endpoint or
synchronization context (i.e. the CNAME) has not been sufficient and synchronization context (i.e. the CNAME) has not been sufficient and
one has multiple RTP sessions, the approach has been to use the same one has multiple RTP sessions, the approach has been to use the same
SSRC value across all the RTP sessions. RTP Retransmission [RFC4588] SSRC value across all the RTP sessions. RTP Retransmission [RFC4588]
in multiple RTP session mode, Generic FEC [RFC5109], as well as the in multiple RTP session mode, Generic FEC [RFC5109], as well as the
RTP payload format for Scalable Video Coding [RFC6190] in Multi RTP payload format for Scalable Video Coding [RFC6190] in Multi
Session Transmission (MST) mode use this method. This method clearly works Session Transmission (MST) mode use this method. This method clearly works
but might have some downside in RTP sessions with many participating but might have some downside in RTP sessions with many participating
SSRCs. The birthday paradox ensures that if you populate a single SSRCs. The birthday paradox ensures that if you populate a single
session with 9292 SSRCs at random, the chances are approximately 1% session with 9292 SSRCs at random, the chances are approximately 1%
that at least one collision will occur. When a collision occurs, this that at least one collision will occur. When a collision occurs, this
forces one to change SSRC in all RTP sessions and thus to forces one to change SSRC in all RTP sessions and thus to
resynchronize all of them instead of only the single media stream resynchronize all of them instead of only the single media stream
having the collision. having the collision.
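The 1% figure above can be checked with the usual birthday-bound approximation. The following is a minimal sketch, assuming SSRCs are drawn uniformly at random from the 32-bit SSRC space; the function name and constant are illustrative and do not come from any RTP specification.

```python
import math

# Assumption: SSRC identifiers are 32 bits wide and chosen uniformly at random.
SSRC_SPACE = 2 ** 32


def collision_probability(num_ssrcs: int, space: int = SSRC_SPACE) -> float:
    """Approximate probability that at least two of num_ssrcs randomly
    chosen identifiers are equal (birthday-bound approximation)."""
    pairs = num_ssrcs * (num_ssrcs - 1) / 2
    # 1 - exp(-pairs / space), written with expm1 for numerical accuracy.
    return -math.expm1(-pairs / space)


if __name__ == "__main__":
    print(f"{collision_probability(9292):.4%}")
```

With 9292 SSRCs there are roughly 43 million pairs, which is about 1% of the 2^32 possible values, hence the approximately 1% collision chance quoted in the text.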
It can be noted that Section 8.3 of the RTP Specification [RFC3550] It can be noted that Section 8.3 of the RTP Specification [RFC3550]
recommends using a single SSRC space across all RTP sessions for recommends using a single SSRC space across all RTP sessions for
layered coding. layered coding.
Another solution that has been applied to binding SSRCs have been an Another solution that has been applied to binding SSRCs has been an
implicit method used by RTP Retransmission [RFC4588] when doing implicit method used by RTP Retransmission [RFC4588] when doing
retransmissions in the same RTP session as the source RTP media retransmissions in the same RTP session as the source RTP media
stream. This issues an RTP retransmission request, and then awaits a stream. This issues an RTP retransmission request, and then awaits a
new SSRC carrying the RTP retransmission payload and where that SSRC new SSRC carrying the RTP retransmission payload and where that SSRC
is from the same CNAME. This limits a requestor to having only one is from the same CNAME. This limits a requestor to having only one
outstanding request on any new source SSRCs per endpoint. outstanding request on any new source SSRCs per endpoint.
There exist no RTP/RTCP based mechanism capable of supporting There exists no RTP/RTCP based mechanism capable of supporting
explicit association across multiple RTP sessions as well as within an explicit association across multiple RTP sessions as well as within an
RTP session. A proposed solution for handling this issue is RTP session. A proposed solution for handling this issue is
[I-D.westerlund-avtext-rtcp-sdes-srcname]. This can potentially be [I-D.westerlund-avtext-rtcp-sdes-srcname]. If accepted, this can
part of an SDP based solution also by reusing the same identifiers potentially also be part of an SDP based solution also by reusing the
and name space. same identifiers and name space.
6.2.6. Forward Error Correction 6.1.6. Forward Error Correction
There exist a number of Forward Error Correction (FEC) based schemes There exist a number of Forward Error Correction (FEC) based schemes
for how to reduce the packet loss of the original streams. Most of for how to reduce the packet loss of the original streams. Most of
the FEC schemes will protect a single source flow. The protection is the FEC schemes will protect a single source flow. The protection is
achieved by transmitting a certain amount of redundant information achieved by transmitting a certain amount of redundant information
that is encoded such that it can repair one or more packet loss over that is encoded such that it can repair one or more packet losses
the set of packets they protect. This sequence of redundant over the set of packets they protect. This sequence of redundant
information also needs to be transmitted as its own media stream, or information also needs to be transmitted as its own media stream, or
in some cases instead of the original media stream. Thus many of in some cases instead of the original media stream. Thus many of
these schemes create a need for binding the related flows as these schemes create a need for binding related flows as discussed
discussed above. They also create additional flows that need to be above. Looking at the history of these schemes, there are schemes
transported. Looking at the history of these schemes, there is both using multiple SSRCs and schemes using multiple RTP sessions, and
schemes using multiple SSRCs and multiple RTP sessions, and some some schemes that support both modes of operation.
schemes that support both modes of operation.
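To illustrate how redundant information can repair a packet loss over the set of packets it protects, the following is a minimal sketch of a single XOR parity packet. It is not the wire format of [RFC5109] or of any other FEC scheme; the function names are hypothetical, and equal-length payloads are assumed for simplicity.

```python
from typing import List, Optional


def xor_parity(payloads: List[bytes]) -> bytes:
    """Return the byte-wise XOR of all payloads (the redundant packet)."""
    parity = bytearray(len(payloads[0]))
    for payload in payloads:
        if len(payload) != len(parity):
            raise ValueError("sketch assumes equal-length payloads")
        for i, byte in enumerate(payload):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single lost payload (marked None) from the parity."""
    if sum(p is None for p in received) != 1:
        raise ValueError("one parity packet repairs exactly one loss")
    present = [p for p in received if p is not None]
    # XOR of the surviving payloads and the parity equals the lost payload.
    return xor_parity(present + [parity])


if __name__ == "__main__":
    source = [b"abcd", b"efgh", b"ijkl"]
    parity = xor_parity(source)
    print(recover_missing([source[0], None, source[2]], parity))
```

The parity packet is exactly the kind of additional flow that has to be carried either as its own SSRC or in its own RTP session, and that has to be bound to the source flow it protects.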
Using multiple RTP sessions supports the case where some set of Using multiple RTP sessions supports the case where some set of
receivers may not be able to utilise the FEC information. By placing receivers might not be able to utilise the FEC information. By
it in a separate RTP session, it can easily be ignored. placing it in a separate RTP session, it can easily be ignored.
In usages involving multicast, having the FEC information on its own In usages involving multicast, having the FEC information on its own
multicast group, and therefore in its own RTP session, allows for multicast group, and therefore in its own RTP session, allows for
flexibility, for example when using Rapid Acquisition of Multicast flexibility. This is especially useful when receivers see very
Groups (RAMS) [RFC6285]. During the RAMS burst where data is heterogeneous packet loss rates. Those receivers that are not seeing
received over unicast and where it is possible to combine with packet loss don't need to join the multicast group with the FEC data,
unicast based retransmission [RFC4588], there is no need to burst the and so avoid the overhead of receiving unnecessary FEC packets, for
FEC data related to the burst of the source media streams needed to example.
catch up with the multicast group. This saves bandwidth to the
receiver during the burst, enabling quicker catch up. When the
receiver has caught up and joins the multicast group(s) for the
source, it can at the same time join the multicast group with the FEC
information. Having the source stream and the FEC in separate groups
allows for easy separation in the Burst/Retransmission Source (BRS)
without having to individually classify packets.
6.2.7. Transport Translator Sessions 6.1.7. Transport Translator Sessions
A basic Transport Translator relays any incoming RTP and RTCP packets A basic Transport Translator relays any incoming RTP and RTCP packets
to the other participants. The main difference between Additional to the other participants. The main difference between Additional
SSRCs and Multiple RTP Sessions resulting from this use case is that SSRCs and Multiple RTP Sessions resulting from this use case is that
with Additional SSRCs it is not possible for a particular session with Additional SSRCs it is not possible for a particular session
participant to decide to receive a subset of media streams. When participant to decide to receive a subset of media streams. When
using separate RTP sessions for the different sets of media streams, using separate RTP sessions for the different sets of media streams,
a single participant can choose to leave one of the sessions but not a single participant can choose to leave one of the sessions but not
the other. the other.
6.3. Interworking 6.2. Interworking Considerations
There are several different kinds of interworking, and this section There are several different kinds of interworking, and this section
discusses two related ones. The interworking between different discusses two related ones. The interworking between different
applications and the implications of potentially different choices of applications and the implications of potentially different choices of
usage of RTP's multiplexing points. The second topic relates to what usage of RTP's multiplexing points. The second topic relates to what
limitations may have to be considered working with some legacy limitations have to be considered working with some legacy
applications. applications.
6.3.1. Types of Interworking 6.2.1. Types of Interworking
It is not uncommon that applications or services of similar usage, It is not uncommon that applications or services of similar usage,
especially the ones intended for interactive communication, ends up especially the ones intended for interactive communication, encounter
in a situation where one wants to interconnect two or more of these a situation where one wants to interconnect two or more of these
applications. applications.
In these cases one ends up in a situation where one might use a In these cases one ends up in a situation where one might use a
gateway to interconnect applications. This gateway then needs to gateway to interconnect applications. This gateway then needs to
change the multiplexing structure or adhere to limitations in each change the multiplexing structure or adhere to limitations in each
application. application.
There are two fundamental approaches to gatewaying: RTP bridging, There are two fundamental approaches to gatewaying: RTP Translator
where the gateway acts as an RTP Translator, and the two applications interworking (RTP bridging), where the gateway acts as an RTP
are members of the same RTP session, and RTP termination, where there Translator, and the two applications are members of the same RTP
session, and Gateway Interworking (with RTP termination), where there
are independent RTP sessions running from each interconnected are independent RTP sessions running from each interconnected
application to the gateway. application to the gateway.
6.3.2. RTP Translator Interworking 6.2.2. RTP Translator Interworking
From an RTP perspective the RTP Translator approach could work if all From an RTP perspective the RTP Translator approach could work if all
the applications are using the same codecs with the same payload the applications are using the same codecs with the same payload
types, have made the same multiplexing choices, have the same types, have made the same multiplexing choices, have the same
capabilities in number of simultaneous media streams combined with capabilities in number of simultaneous media streams combined with
the same set of RTP/RTCP extensions being supported. Unfortunately the same set of RTP/RTCP extensions being supported. Unfortunately
this may not always be true. this might not always be true.
When one is gatewaying via an RTP Translator, a natural requirement When one is gatewaying via an RTP Translator, a natural requirement
is that the two applications being interconnected must use the same is that the two applications being interconnected need to use the
approach to multiplexing. Furthermore, if one of the applications is same approach to multiplexing. Furthermore, if one of the
capable of working in several modes (such as being able to use applications is capable of working in several modes (such as being
Additional SSRCs or Multiple RTP sessions at will), and the other one able to use Additional SSRCs or Multiple RTP sessions at will), and
is not, successful interconnection depends on locking the more the other one is not, successful interconnection depends on locking
flexible application into the operating mode where interconnection the more flexible application into the operating mode where
can be successful, even if no participants using the less flexible interconnection can be successful, even if no participants using the
application are present when the RTP sessions are being created. less flexible application are present when the RTP sessions are being
created.
6.3.3. Gateway Interworking 6.2.3. Gateway Interworking
When one terminates RTP sessions at the gateway, there are certain When one terminates RTP sessions at the gateway, there are certain
tasks that the gateway must carry out: tasks that the gateway has to carry out:
o Generating appropriate RTCP reports for all media streams o Generating appropriate RTCP reports for all media streams
(possibly based on incoming RTCP reports), originating from SSRCs (possibly based on incoming RTCP reports), originating from SSRCs
controlled by the gateway. controlled by the gateway.
o Handling SSRC collision resolution in each application's RTP o Handling SSRC collision resolution in each application's RTP
sessions. sessions.
o Signalling, choosing and policing appropriate bit-rates for each o Signalling, choosing and policing appropriate bit-rates for each
session. session.
If either of the applications has any security applied, e.g. in the If either of the applications has any security applied, e.g. in the
form of SRTP, the gateway must be able to decrypt incoming packets form of SRTP, the gateway needs to be able to decrypt incoming
and re-encrypt them in the other application's security context. packets and re-encrypt them in the other application's security
This is necessary even if all that's required is a simple remapping context. This is necessary even if all that's needed is a simple
of SSRC numbers. If this is done, the gateway also needs to be a remapping of SSRC numbers. If this is done, the gateway also needs
member of the security contexts of both sides, of course. to be a member of the security contexts of both sides, of course.
Other tasks a gateway may need to apply include transcoding (for Other tasks a gateway might need to apply include transcoding (for
incompatible codec types), rescaling (for incompatible video size incompatible codec types), rescaling (for incompatible video size
requirements), suppression of content that is known not to be handled requirements), suppression of content that is known not to be handled
in the destination application, or the addition or removal of in the destination application, or the addition or removal of
redundancy coding or scalability layers to fit the need of the redundancy coding or scalability layers to fit the need of the
destination domain. destination domain.
From the above, we can see that the gateway needs to have an intimate From the above, we can see that the gateway needs to have an intimate
knowledge of the application requirements; a gateway is by its nature knowledge of the application requirements; a gateway is by its nature
application specific, not a commodity product. application specific, not a commodity product.
This fact reveals the potential for these gateways to block evolution This fact reveals the potential for these gateways to block evolution
of the applications by blocking unknown RTP and RTCP extensions that of the applications by blocking unknown RTP and RTCP extensions that
the regular application has been extended with. the regular application has been extended with.
If one uses security functions, like SRTP, they can as seen above If one uses security functions, like SRTP, they can as seen above
incur both additional risk due to the gateway needing to be in incur both additional risk due to the gateway needing to be in
security association between the endpoints, unless the gateway is on security association between the endpoints, unless the gateway is on
the transport level, and additional complexities in form of the the transport level, and additional complexities in form of the
decrypt-encrypt cycles needed for each forwarded packet. SRTP, due decrypt-encrypt cycles needed for each forwarded packet. SRTP, due
to its keying structure, also requires that each RTP session must to its keying structure, also requires that each RTP session needs
have different master keys, as use of the same key in two RTP different master keys, as use of the same key in two RTP sessions can
sessions can result in two-time pads that completely break the result in two-time pads that completely break the confidentiality of
confidentiality of the packets. the packets.
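The two-time pad risk mentioned above can be illustrated with a small sketch. It is not SRTP: the keystream function below is a hypothetical stand-in built on SHA-256, chosen only so the example is self-contained, and the names (keystream, master_key, payload_a, payload_b) are assumptions made for illustration. The point it shows is general to stream-style ciphers: if two RTP sessions share a master key and a packet in each happens to use the same SSRC and packet index, both packets are protected with the same keystream, and XORing the two ciphertexts cancels the keystream entirely.

```python
import hashlib


def keystream(master_key: bytes, ssrc: int, index: int, length: int) -> bytes:
    """Toy keystream bound to (key, SSRC, packet index).

    This is NOT the SRTP key derivation or AES counter mode; it is a
    stand-in with the same dependency structure, for illustration only.
    """
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            master_key
            + ssrc.to_bytes(4, "big")
            + index.to_bytes(6, "big")
            + counter.to_bytes(4, "big")
        ).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Two RTP sessions wrongly configured with the same master key.
master_key = b"shared-master-key"
ssrc, index = 0x1234ABCD, 1

payload_a = b"session A audio!"  # packet sent in the first session
payload_b = b"session B video!"  # packet sent in the second session

ks = keystream(master_key, ssrc, index, len(payload_a))
cipher_a = xor(payload_a, ks)
cipher_b = xor(payload_b, ks)

# An observer without the key can still compute the XOR of the plaintexts.
leak = xor(cipher_a, cipher_b)
```

Here leak equals the XOR of the two plaintexts, so knowledge of (or a good guess at) one payload reveals the other. Using a different master key per RTP session removes the shared keystream.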
6.3.4. Multiple SSRC Legacy Considerations
6.2.4. Multiple SSRC Legacy Considerations
Historically, the most common RTP use cases have been point to point Historically, the most common RTP use cases have been point to point
Voice over IP (VoIP) or streaming applications, commonly with no more Voice over IP (VoIP) or streaming applications, commonly with no more
than one media source per endpoint and media type (typically audio than one media source per endpoint and media type (typically audio
and video). Even in conferencing applications, especially voice and video). Even in conferencing applications, especially voice
only, the conference focus or bridge has provided a single stream only, the conference focus or bridge has provided a single stream
with a mix of the other participants to each participant. It is also with a mix of the other participants to each participant. It is also
common to have individual RTP sessions between each endpoint and the common to have individual RTP sessions between each endpoint and the
RTP mixer, meaning that the mixer functions as an RTP-terminating RTP mixer, meaning that the mixer functions as an RTP-terminating
gateway. gateway.
When establishing RTP sessions that may contain endpoints that aren't When establishing RTP sessions that can contain endpoints that aren't
updated to handle multiple streams following these recommendations, a updated to handle multiple streams following these recommendations, a
particular application can have issues with multiple SSRCs within a particular application can have issues with multiple SSRCs within a
single session. These issues include: single session. These issues include:
1. Need to handle more than one stream simultaneously rather than 1. Need to handle more than one stream simultaneously rather than
replacing an already existing stream with a new one. replacing an already existing stream with a new one.
2. Be capable of decoding multiple streams simultaneously. 2. Be capable of decoding multiple streams simultaneously.
3. Be capable of rendering multiple streams simultaneously. 3. Be capable of rendering multiple streams simultaneously.
This indicates that gateways attempting to interconnect to this class This indicates that gateways attempting to interconnect to this class
of devices must make sure that only one media stream of each type of devices has to make sure that only one media stream of each type
gets delivered to the endpoint if it's expecting only one, and that gets delivered to the endpoint if it's expecting only one, and that
the multiplexing format is what the device expects. It is highly the multiplexing format is what the device expects. It is highly
unlikely that RTP translator-based interworking can be made to unlikely that RTP translator-based interworking can be made to
function successfully in such a context. function successfully in such a context.
6.4. Network Aspects 6.3. Network Considerations
The multiplexing choice has impact on network level mechanisms that The multiplexing choice has impact on network level mechanisms that
need to be considered by the implementor. need to be considered by the implementor.
6.4.1. Quality of Service 6.3.1. Quality of Service
When it comes to Quality of Service mechanisms, they are either flow When it comes to Quality of Service mechanisms, they are either flow
based or marking based. RSVP [RFC2205] is an example of a flow based based or marking based. RSVP [RFC2205] is an example of a flow based
mechanism, while Diff-Serv [RFC2474] is an example of a marking based mechanism, while Diff-Serv [RFC2474] is an example of a marking based
one. For a marking based scheme, the method of multiplexing will not one. For a marking based scheme, the method of multiplexing will not
affect the possibility to use QoS. affect the possibility to use QoS.
However, for a flow based scheme there is a clear difference between However, for a flow based scheme there is a clear difference between
the methods. Additional SSRC will result in all media streams being the methods. Additional SSRC will result in all media streams being
part of the same 5-tuple (protocol, source address, destination part of the same 5-tuple (protocol, source address, destination
address, source port, destination port) which is the most common address, source port, destination port) which is the most common
selector for flow based QoS. Thus, separation of the level of QoS selector for flow based QoS. Thus, separation of the level of QoS
between media streams is not possible. That is however possible when between media streams is not possible. That is however possible when
using multiple RTP sessions, where each media stream for which a using multiple RTP sessions, where each media stream for which a
separate QoS handling is desired can be in a different RTP session separate QoS handling is desired can be in a different RTP session
that can be sent over different 5-tuples. that can be sent over different 5-tuples.
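The difference for flow based QoS can be sketched as follows. The classifier below is a deliberately simplified, hypothetical model of a 5-tuple selector; the addresses, ports and SSRC values are made up for illustration, and real RSVP filter specifications are more involved. It shows that two media streams sent as additional SSRCs share one selector entry, while the same streams sent in separate RTP sessions over separate ports yield two entries that can be given different treatment.

```python
from collections import namedtuple

# Field order follows the 5-tuple as listed in the text above.
FiveTuple = namedtuple("FiveTuple", "proto src_addr dst_addr src_port dst_port")
Packet = namedtuple("Packet", "flow ssrc")


def classify(packets):
    """Group packets as a flow based selector would: by 5-tuple only.

    The SSRC is recorded per flow to show what ends up sharing a flow;
    the selector itself never looks at it.
    """
    flows = {}
    for packet in packets:
        flows.setdefault(packet.flow, set()).add(packet.ssrc)
    return flows


# Additional SSRC: both media streams in one RTP session, one 5-tuple.
session = FiveTuple("udp", "192.0.2.1", "198.51.100.7", 5004, 6004)
additional_ssrc = [Packet(session, 0xA), Packet(session, 0xB)]

# Multiple RTP sessions: each media stream on its own 5-tuple.
multiple_sessions = [
    Packet(FiveTuple("udp", "192.0.2.1", "198.51.100.7", 5004, 6004), 0xA),
    Packet(FiveTuple("udp", "192.0.2.1", "198.51.100.7", 5006, 6006), 0xB),
]

shared = classify(additional_ssrc)      # one flow carrying both SSRCs
separate = classify(multiple_sessions)  # two flows, one SSRC each
```

With the Additional SSRC input the selector sees a single flow, so any reservation applies to both streams alike; with multiple RTP sessions each stream can be matched separately.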
6.4.2. NAT and Firewall Traversal 6.3.2. NAT and Firewall Traversal
In today's network there exist a large number of middleboxes. The In today's network there exist a large number of middleboxes. The
ones that normally have most impact on RTP are Network Address ones that normally have most impact on RTP are Network Address
Translators (NAT) and Firewalls (FW). Translators (NAT) and Firewalls (FW).
Below we analyze and comment on the impact of requiring more Below we analyze and comment on the impact of requiring more
underlying transport flows in the presence of NATs and Firewalls: underlying transport flows in the presence of NATs and Firewalls:
End-Point Port Consumption: A given IP address only has 65536 End-Point Port Consumption: A given IP address only has 65536
available local ports per transport protocol for all consumers of available local ports per transport protocol for all consumers of
ports that exist on the machine. This is normally never an issue ports that exist on the machine. This is normally never an issue
for an end-user machine. It can become an issue for servers that for an end-user machine. It can become an issue for servers that
handle large number of simultaneous streams. However, if the handle large number of simultaneous streams. However, if the
application uses ICE to authenticate STUN requests, a server can application uses ICE to authenticate STUN requests, a server can
serve multiple endpoints from the same local port, and use the serve multiple endpoints from the same local port, and use the
whole 5-tuple (source and destination address, source and whole 5-tuple (source and destination address, source and
destination port, protocol) as identifier of flows after having destination port, protocol) as identifier of flows after having
securely bound them to the remote endpoint address using the STUN securely bound them to the remote endpoint address using the STUN
request. In theory the minimum number of media server ports request. In theory the minimum number of media server ports
needed is the maximum number of simultaneous RTP Sessions a needed is the maximum number of simultaneous RTP Sessions a
single endpoint may use. In practice, implementation will single endpoint can use. In practice, implementation will
probably benefit from using more server ports to simplify probably benefit from using more server ports to simplify
implementation or avoid performance bottlenecks. implementation or avoid performance bottlenecks.
NAT State: If an endpoint sits behind a NAT, each flow it generates NAT State: If an endpoint sits behind a NAT, each flow it generates
to an external address will result in a state that has to be kept to an external address will result in a state that has to be kept
in the NAT. That state is a limited resource. In home or Small in the NAT. That state is a limited resource. In home or Small
Office/Home Office (SOHO) NATs, memory or processing are usually Office/Home Office (SOHO) NATs, memory or processing are usually
the most limited resources. For large scale NATs serving many the most limited resources. For large scale NATs serving many
internal endpoints, available external ports are likely the scarce internal endpoints, available external ports are likely the scarce
resource. Port limitations are primarily a problem for larger resource. Port limitations are primarily a problem for larger
skipping to change at page 26, line 39 skipping to change at page 26, line 22
has found a working candidate pair. Based on that working pair has found a working candidate pair. Based on that working pair
the needed extra time is to in parallel establish the, in most the needed extra time is to in parallel establish the, in most
cases 2-3, additional flows. However, packet loss causes extra cases 2-3, additional flows. However, packet loss causes extra
delays, at least 100 ms, which is the minimal retransmission timer delays, at least 100 ms, which is the minimal retransmission timer
for ICE. for ICE.
NAT Traversal Failure Rate: Due to the need to establish more than a NAT Traversal Failure Rate: Due to the need to establish more than a
single flow through the NAT, there is some risk that establishing single flow through the NAT, there is some risk that establishing
the first flow succeeds but that one or more of the additional the first flow succeeds but that one or more of the additional
flows fail. The risk that this happens is hard to quantify, but flows fail. The risk that this happens is hard to quantify, but
it should be fairly low as one flow from the same interfaces has ought to be fairly low as one flow from the same interfaces has
just been successfully established. Thus only rare events such as just been successfully established. Thus only rare events such as
NAT resource overload, or selecting particular port numbers that NAT resource overload, or selecting particular port numbers that
are filtered etc, should be reasons for failure. are filtered etc, ought to be reasons for failure.
Deep Packet Inspection and Multiple Streams: Firewalls differ in how Deep Packet Inspection and Multiple Streams: Firewalls differ in how
deeply they inspect packets. There exists some potential that deeply they inspect packets. There exists some potential that
deeply inspecting firewalls will have similar legacy issues with deeply inspecting firewalls will have similar legacy issues with
multiple SSRCs as some stack implementations. multiple SSRCs as some stack implementations.
Additional SSRC keeps the additional media streams within one RTP Additional SSRC keeps the additional media streams within one RTP
Session and transport flow and does not introduce any additional NAT Session and transport flow and does not introduce any additional NAT
traversal complexities per media stream. This can be compared with traversal complexities per media stream. This can be compared with
normally one or two additional transport flows per RTP session when normally one or two additional transport flows per RTP session when
using multiple RTP sessions. Additional lower layer transport flows using multiple RTP sessions. Additional lower layer transport flows
will be required, unless an explicit de-multiplexing layer is added will be needed, unless an explicit de-multiplexing layer is added
between RTP and the transport protocol. A proposal for how to between RTP and the transport protocol. A proposal for how to
multiplex multiple RTP sessions over the same single lower layer multiplex multiple RTP sessions over the same single lower layer
transport exists in [I-D.westerlund-avtcore-transport-multiplexing]. transport exists in [I-D.westerlund-avtcore-transport-multiplexing].
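As a rough way to compare the two approaches, the number of lower layer transport flows can be counted as in the sketch below. It rests on assumptions that are not stated in the text: that the "one or two" flows per RTP session correspond to RTP and RTCP either sharing a port or using separate ports, and that in the multiple RTP sessions case every media stream gets its own session. The function name and the string labels are hypothetical.

```python
def transport_flows(media_streams: int, multiplexing: str, rtcp_mux: bool) -> int:
    """Count lower layer transport flows needed between two endpoints.

    media_streams: number of media streams the application sends.
    multiplexing:  "additional-ssrc" or "multiple-sessions" (labels are
                   illustrative, not protocol elements).
    rtcp_mux:      True if RTP and RTCP share one flow (assumption: this
                   is what distinguishes one flow per session from two).
    """
    if media_streams < 1:
        raise ValueError("at least one media stream is needed")
    if multiplexing == "additional-ssrc":
        rtp_sessions = 1  # all streams share a single RTP session
    elif multiplexing == "multiple-sessions":
        rtp_sessions = media_streams  # assumed: one session per stream
    else:
        raise ValueError(f"unknown multiplexing choice: {multiplexing}")
    flows_per_session = 1 if rtcp_mux else 2
    return rtp_sessions * flows_per_session


for streams in (1, 2, 4):
    print(
        streams,
        transport_flows(streams, "additional-ssrc", rtcp_mux=True),
        transport_flows(streams, "multiple-sessions", rtcp_mux=False),
    )
```

Under these assumptions the Additional SSRC count stays constant as streams are added, while the multiple RTP sessions count grows linearly, and each extra flow is one more NAT binding to create and keep alive.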
6.4.3. Multicast 6.3.3. Multicast
Multicast groups provide powerful semantics for a number of real- Multicast groups provide powerful semantics for a number of real-
time applications, especially the ones that desire broadcast-like time applications, especially the ones that desire broadcast-like
behaviours with one endpoint transmitting to a large number of behaviours with one endpoint transmitting to a large number of
receivers, like in IPTV. But those same semantics do result in a receivers, like in IPTV. But those same semantics do result in a
certain number of limitations. certain number of limitations.
One limitation is that for any group, sender side adaptation to the One limitation is that for any group, sender side adaptation to the
actual receiver properties causes degradation for all participants to actual receiver properties causes degradation for all participants to
what is supported by the receiver with the worst conditions among the what is supported by the receiver with the worst conditions among the
group participants. In most cases this is not acceptable. Instead group participants. In most cases this is not acceptable. Instead
various receiver based solutions are employed to ensure that the various receiver based solutions are employed to ensure that the
receivers achieve best possible performance. By using scalable receivers achieve best possible performance. By using scalable
encoding and placing each scalability layer in a different multicast encoding and placing each scalability layer in a different multicast
group, the receiver can control the amount of traffic it receives. group, the receiver can control the amount of traffic it receives.
To have each scalability layer on a different multicast group, one To have each scalability layer on a different multicast group, one
RTP session per multicast group is used. RTP session per multicast group is used.
RTP can't function correctly if media streams sent over different In addition, the transport flow considerations in multicast are a bit
multicast groups where considered part of the same RTP session. different from unicast; NATs are not useful in the multicast
First of all the different layers needs different SSRCs or the environment, meaning that the entire port range of each multicast
sequence number space seen for a receiver of any sub set of the address is available for distinguishing between RTP sessions.
layers would have sender side holes. Thus triggering packet loss
reactions. Also any RTCP reporting of such a session would be non
consistent and making it difficult for the sender to determine the
sessions actual state.
Thus it appears easiest and most straightforward to use multiple RTP Thus it appears easiest and most straightforward to use multiple RTP
sessions. In addition, the transport flow considerations in sessions for sending different media flows used for adapting to
multicast are a bit different from unicast. First of all there is no network conditions.
shortage of port space, as each multicast group has its own port
space.
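Receiver driven adaptation with one scalability layer per multicast group can be sketched as below. The sketch assumes cumulative layers, where a layer is only useful if all lower layers are also received, and it assumes the receiver already has an estimate of its available bandwidth; how that estimate is obtained is out of scope. The function name and the bit-rate figures are illustrative only.

```python
def layers_to_join(layer_bitrates_kbps, available_kbps):
    """Return the indices of the multicast groups a receiver should join.

    layer_bitrates_kbps: bit-rate of each scalability layer, base layer
                         first; each layer is sent in its own multicast
                         group and its own RTP session.
    available_kbps:      the receiver's estimated available bandwidth.

    Layers are assumed cumulative, so joining stops at the first layer
    that does not fit.
    """
    joined = []
    total = 0
    for index, rate in enumerate(layer_bitrates_kbps):
        if total + rate > available_kbps:
            break
        joined.append(index)
        total += rate
    return joined


layers = [500, 1000, 2000]  # base layer plus two enhancement layers

well_connected = layers_to_join(layers, 4000)  # joins every group
constrained = layers_to_join(layers, 1600)     # base + first enhancement
```

Each receiver makes this choice independently, so a poorly connected receiver does not reduce the quality available to the others.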
6.4.4. Multiplexing multiple RTP Session on a Single Transport 6.3.4. Multiplexing multiple RTP Session on a Single Transport
For applications that doesn't need flow based QoS and like to save For applications that don't need flow based QoS and like to save
ports and NAT/FW traversal costs and where usage of multiple media ports and NAT/FW traversal costs and where usage of multiple media
types in one RTP session is not suitable, there is a proposal for how types in one RTP session is not suitable, there is a proposal for how
to achieve multiplexing of multiple RTP sessions over the same lower to achieve multiplexing of multiple RTP sessions over the same lower
layer transport [I-D.westerlund-avtcore-transport-multiplexing]. layer transport [I-D.westerlund-avtcore-transport-multiplexing].
Using such a solution would allow Multiple RTP sessions without most Using such a solution would allow Multiple RTP sessions without most
of the perceived downsides of Multiple RTP sessions creating a need of the perceived downsides of Multiple RTP sessions creating a need
for additional transport flows. for additional transport flows, but this solution would require
support from all functions that handle RTP packets, including
firewalls.
6.5. Security Aspects 6.4. Security and Key Management Considerations
When dealing with point-to-point, 2-member RTP sessions only, there When dealing with point-to-point, 2-member RTP sessions only, there
are few security issues that are relevant to the choice of having one are few security issues that are relevant to the choice of having one
RTP session or multiple RTP sessions. However, there are a few RTP session or multiple RTP sessions. However, there are a few
aspects of multiparty sessions that might warrant consideration. For aspects of multiparty sessions that might warrant consideration. For
general information of possible methods of securing RTP, please general information of possible methods of securing RTP, please
review RTP Security Options [I-D.ietf-avtcore-rtp-security-options]. review RTP Security Options [I-D.ietf-avtcore-rtp-security-options].
6.5.1. Security Context Scope 6.4.1. Security Context Scope
When using SRTP [RFC3711] the security context scope is important and When using SRTP [RFC3711] the security context scope is important and
can be a necessary differentiation in some applications. As SRTP's can be a necessary differentiation in some applications. As SRTP's
crypto suites (so far) is built around symmetric keys, the receiver crypto suites (so far) are built around symmetric keys, the receiver
will need to have the same key as the sender. This means that will need to have the same key as the sender. This means that
no one in a multi-party session can be certain that a received packet no one in a multi-party session can be certain that a received packet
really was sent by the claimed sender and not by another party having really was sent by the claimed sender and not by another party having
access to the key. In most cases this is a sufficient security access to the key. In most cases this is a sufficient security
property, but there are a few cases where this does create property, but there are a few cases where this does create issues.
situations.
The first case is when someone leaves a multi-party session and one The first case is when someone leaves a multi-party session and one
wants to ensure that the party that left can no longer access the wants to ensure that the party that left can no longer access the
media streams. This requires that everyone re-keys without media streams. This requires that everyone re-keys without
disclosing the keys to the excluded party. disclosing the keys to the excluded party.
A second case is when using security as an enforcing mechanism for A second case is when using security as an enforcing mechanism for
differentiation. Take for example a scalable layer or a high quality differentiation. Take for example a scalable layer or a high quality
simulcast version which only premium users are allowed to access. simulcast version which only premium users are allowed to access.
The mechanism preventing a receiver from getting the high quality The mechanism preventing a receiver from getting the high quality
stream can be based on the stream being encrypted with a key that stream can be based on the stream being encrypted with a key that
user can't access without paying premium, having the key-management user can't access without paying premium, having the key-management
limit access to the key. limit access to the key.
SRTP [RFC3711] has not special functions for dealing with different SRTP [RFC3711] has no special functions for dealing with different
sets of master keys for different SSRCs. The key-management sets of master keys for different SSRCs. The key-management
functions has different capabilities to establish different set of functions have different capabilities to establish different set of
keys, normally on a per end-point basis. DTLS-SRTP [RFC5764] and keys, normally on a per endpoint basis. For example, DTLS-SRTP
Security Descriptions [RFC4568] for example establish different keys [RFC5764] and Security Descriptions [RFC4568] establish different
for outgoing and incoming traffic from an end-point. This key usage keys for outgoing and incoming traffic from an endpoint. This key
must be written into the cryptographic context, possibly associated usage has to be written into the cryptographic context, possibly
with different SSRCs. associated with different SSRCs.
6.5.2. Key Management for Multi-party session 6.4.2. Key Management for Multi-party session
Performing key-management for multi-party sessions can be a challenge. Performing key-management for multi-party sessions can be a challenge.
This section considers some of the issues. This section considers some of the issues.
Multi-party sessions, such as transport translator based sessions and Multi-party sessions, such as transport translator based sessions and
multicast sessions, cannot use Security Description [RFC4568] nor multicast sessions, cannot use Security Description [RFC4568] nor
DTLS-SRTP [RFC5764] without an extension as each endpoint provides DTLS-SRTP [RFC5764] without an extension as each endpoint provides
its set of keys. In centralised conference, the signalling its set of keys. In centralised conferences, the signalling
counterpart is a conference server and the media plane unicast counterpart is a conference server and the media plane unicast
counterpart (to which DTLS messages would be sent) is the transport counterpart (to which DTLS messages would be sent) is the transport
translator. Thus an extension like Encrypted Key Transport translator. Thus an extension like Encrypted Key Transport
[I-D.ietf-avt-srtp-ekt] is needed or a MIKEY [RFC3830] based solution [I-D.ietf-avt-srtp-ekt] is needed or a MIKEY [RFC3830] based solution
that allows for keying all session participants with the same master that allows for keying all session participants with the same master
key. key.
6.5.3. Complexity Implications 6.4.3. Complexity Implications
The usage of security functions can surface complexity implications The usage of security functions can surface complexity implications
of the choice of multiplexing and topology. This becomes especially of the choice of multiplexing and topology. This becomes especially
evident in RTP topologies having any type of middlebox that processes evident in RTP topologies having any type of middlebox that processes
or modifies RTP/RTCP packets. Where there is very small overhead for or modifies RTP/RTCP packets. Where there is very small overhead for
an RTP translator or mixer to rewrite an SSRC value in the RTP packet an RTP translator or mixer to rewrite an SSRC value in the RTP packet
of an unencrypted session, the cost of doing it when using of an unencrypted session, the cost of doing it when using
cryptographic security functions is higher. For example if using cryptographic security functions is higher. For example if using
SRTP [RFC3711], the actual security context and exact crypto key are SRTP [RFC3711], the actual security context and exact crypto key are
determined by the SSRC field value. If one changes it, the determined by the SSRC field value. If one changes it, the
encryption and authentication tag must be performed using another encryption and authentication tag needs to be performed using another
key. Thus changing the SSRC value implies a decryption using the old key. Thus changing the SSRC value implies a decryption using the old
SSRC and its security context followed by an encryption using the new SSRC and its security context followed by an encryption using the new
one. one.
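The cost described above can be made concrete with a toy model. The cipher below is a hypothetical stand-in (a SHA-256 based keystream bound to the key and the SSRC), not SRTP, and it omits the authentication tag altogether; the function names are assumptions made for illustration. It shows why a middlebox cannot simply overwrite the SSRC of a protected packet: the receiver would derive a keystream for the new SSRC and recover garbage, so the middlebox has to unprotect with the old SSRC and protect again with the new one.

```python
import hashlib


def toy_keystream(key: bytes, ssrc: int, length: int) -> bytes:
    """Keystream bound to (key, SSRC). A stand-in, not the SRTP cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + ssrc.to_bytes(4, "big") + counter.to_bytes(4, "big")
        ).digest()
        counter += 1
    return out[:length]


def toy_protect(key: bytes, ssrc: int, payload: bytes) -> bytes:
    """XOR the payload with the keystream selected by the SSRC."""
    stream = toy_keystream(key, ssrc, len(payload))
    return bytes(p ^ s for p, s in zip(payload, stream))


# XOR with the same keystream is its own inverse.
toy_unprotect = toy_protect


def rewrite_ssrc(key_in, key_out, old_ssrc, new_ssrc, ciphertext):
    """What a translator or mixer has to do: decrypt, then re-encrypt."""
    plaintext = toy_unprotect(key_in, old_ssrc, ciphertext)
    return toy_protect(key_out, new_ssrc, plaintext)


key = b"session-key"
old_ssrc, new_ssrc = 0x11111111, 0x22222222
payload = b"media payload"
packet = toy_protect(key, old_ssrc, payload)

# Naive approach: relabel the packet and leave the ciphertext untouched.
naive = toy_unprotect(key, new_ssrc, packet)

# Correct approach: one decrypt and one encrypt per forwarded packet.
rewritten = rewrite_ssrc(key, key, old_ssrc, new_ssrc, packet)
recovered = toy_unprotect(key, new_ssrc, rewritten)
```

In this model recovered equals the original payload while naive does not. In a real deployment the authentication tag would also have to be recomputed, which adds to the per-packet cost.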
7. Arch-Types 7. Archetypes
This section discusses some arch-types of how RTP multiplexing can be This section discusses some archetypes of how RTP multiplexing can be
used in applications to achieve certain goals and a summary of their used in applications to achieve certain goals and a summary of their
implications. For each arch-type there is discussion of benefits and implications. For each archetype there is discussion of benefits and
downsides. downsides.
7.1. Single SSRC per Session 7.1. Single SSRC per Session
In this arch-type each endpoint in a point-to-point session has only In this archetype each endpoint in a point-to-point session has only
a single SSRC, thus the RTP session contains only two SSRCs, one a single SSRC, thus the RTP session contains only two SSRCs, one
local and one remote. This session can be used both unidirectional, local and one remote. This session can be used both unidirectional,
i.e. only a single media stream or bi-directional, i.e. both i.e. only a single media stream or bi-directional, i.e. both
endpoints have one media stream each. If the application needs endpoints have one media stream each. If the application needs
additional media flows between the endpoints, they will have to additional media flows between the endpoints, they will have to
establish additional RTP sessions. establish additional RTP sessions.
The Pros: The Pros:
1. This arch-type has great legacy interoperability potential as it 1. This archetype has great legacy interoperability potential as it
will not tax any RTP stack implementations. will not tax any RTP stack implementations.
2. The signalling has good possibilities to negotiate and describe 2. The signalling has good possibilities to negotiate and describe
the exact formats and bit-rates for each media stream, especially the exact formats and bit-rates for each media stream, especially
using today's tools in SDP. using today's tools in SDP.
3. It does not matter if usage or purpose of the media stream is 3. It does not matter if usage or purpose of the media stream is
signalled on media stream level or session level as there is no signalled on media stream level or session level as there is no
difference. difference.
4. It is possible to control security association per RTP session 4. It is possible to control security association per RTP media
with current key-management. stream with current key-management, since each media stream is
directly related to an RTP session, and the keying operates on a
per-session basis.
The Cons: The Cons:
a. The number of required RTP sessions grows directly in proportion a. The number of RTP sessions grows directly in proportion with the
with the number of media streams, which has the implications: number of media streams, which has the implications:
* Linear growth of the amount of NAT/FW state with number of * Linear growth of the amount of NAT/FW state with number of
media streams. media streams.
* Increased delay and resource consumption from NAT/FW * Increased delay and resource consumption from NAT/FW
traversal. traversal.
* Likely larger signalling message and signalling processing * Likely larger signalling message and signalling processing
requirement due to the amount of session related information. requirement due to the amount of session related information.
* Higher potential for a single media stream to fail during * Higher potential for a single media stream to fail during
transport between the endpoints. transport between the endpoints.
b. When the number of RTP sessions grows, the amount of explicit b. When the number of RTP sessions grows, the amount of explicit
state for relating media streams also grows, linearly or possibly state for relating media streams also grows, linearly or possibly
exponentially, depending on how the application needs to relate exponentially, depending on how the application needs to relate
media streams. media streams.
c. The port consumption may become a problem for centralised c. The port consumption might become a problem for centralised
services, where the central node's port consumption grows rapidly services, where the central node's port consumption grows rapidly
with the number of sessions. with the number of sessions.
d. For applications where the media streams are highly dynamic in d. For applications where the media streams are highly dynamic in
their usage, i.e. entering and leaving, the amount of signalling their usage, i.e. entering and leaving, the amount of signalling
can grow high. Issues arising from the timely establishment of can grow high. Issues arising from the timely establishment of
additional RTP sessions can also arise. additional RTP sessions can also arise.
e. Cross session RTCP requests needs is likely to exist and may e. Cross session RTCP requests might be needed, and the fact that
cause issues. they're impossible can cause issues.
f. If the same SSRC value is reused in multiple RTP sessions rather f. If the same SSRC value is reused in multiple RTP sessions rather
than being randomly chosen, interworking with applications that than being randomly chosen, interworking with applications that
use another multiplexing structure than this application will use another multiplexing structure than this application will
have issues and require SSRC translation. require SSRC translation.
g. Cannot be used with Any Source Multicast (ASM) as one cannot g. Cannot be used with Any Source Multicast (ASM) as one cannot
guarantee that only two endpoints participate as packet senders. guarantee that only two endpoints participate as packet senders.
Using SSM, it is possible to restrict to these requirements if no Using SSM, it is possible to restrict to these requirements if no
RTCP feedback is injected back into the SSM group. RTCP feedback is injected back into the SSM group.
h. For most security mechanisms, each RTP session or transport flow h. For most security mechanisms, each RTP session or transport flow
requires individual key-management and security association requires individual key-management and security association
establishment thus increasing the overhead. establishment thus increasing the overhead.
RTP applications that need to inter-work with legacy RTP RTP applications that need to inter-work with legacy RTP
applications, like VoIP and video conferencing, can potentially applications, like most deployed VoIP and video conferencing
benefit from this structure. However, a large number of media solutions, can potentially benefit from this structure. However, a
descriptions in SDP can also run into issues with existing large number of media descriptions in SDP can also run into issues
implementations. For any application needing a larger number of with existing implementations. For any application needing a larger
media flows, the overhead can become very significant. This number of media flows, the overhead can become very significant.
structure is also not suitable for multi-party sessions, as any given This structure is also not suitable for multi-party sessions, as any
media stream from each participant, although having same usage in the given media stream from each participant, although having same usage
application, must have its own RTP session. In addition, the dynamic in the application, needs its own RTP session. In addition, the
behaviour that can arise in multi-party applications can tax the dynamic behaviour that can arise in multi-party applications can tax
signalling system and make timely media establishment more difficult. the signalling system and make timely media establishment more
difficult.
7.2. Multiple SSRCs of the Same Media Type 7.2. Multiple SSRCs of the Same Media Type
In this arch-type, each RTP session serves only a single media type. In this archetype, each RTP session serves only a single media type.
The RTP session can contain multiple media streams, either from a The RTP session can contain multiple media streams, either from a
single endpoint or due to multiple endpoints. This commonly creates single endpoint or from multiple endpoints. This commonly creates a
a low number of RTP sessions, typically only two one for audio and low number of RTP sessions, typically only one for audio and one for
one for video with a corresponding need for two listening ports when video, with a corresponding need for two listening ports when using
using RTP and RTCP multiplexing. RTP/RTCP multiplexing.
The Pros: The Pros:
1. Low number of RTP sessions needed compared to single SSRC case. 1. Low number of RTP sessions needed compared to single SSRC case.
This implies: This implies:
* Reduced NAT/FW state * Reduced NAT/FW state
* Lower NAT/FW Traversal Cost in both processing and delay. * Lower NAT/FW Traversal Cost in both processing and delay.
skipping to change at page 32, line 32 skipping to change at page 32, line 11
a. May have some need for cross session RTCP requests for things a. May have some need for cross session RTCP requests for things
that affect both media types in an asynchronous way. that affect both media types in an asynchronous way.
b. Some potential for concern with legacy implementations that do b. Some potential for concern with legacy implementations that do
not support the RTP specification fully when it comes to handling not support the RTP specification fully when it comes to handling
multiple SSRC per endpoint. multiple SSRC per endpoint.
c. Will not be able to control security association for sets of c. Will not be able to control security association for sets of
media streams within the same media type with today's key- media streams within the same media type with today's key-
management mechanisms, only between SDP media descriptions. management mechanisms, unless these are split into different RTP
sessions.
For RTP applications where all media streams of the same media type For RTP applications where all media streams of the same media type
share same usage, this structure provides efficiency gains in amount share same usage, this structure provides efficiency gains in amount
of network state used and provides more faith sharing with other of network state used and provides more fate sharing with other media
media flows of the same type. At the same time, it is still flows of the same type. At the same time, it is still maintaining
maintaining almost all functionalities when it comes to negotiation almost all functionalities when it comes to negotiation in the
in the signalling of the properties for the individual media type and signalling of the properties for the individual media type and also
also enabling flow based QoS prioritisation between media types. It enabling flow based QoS prioritisation between media types. It
handles multi-party sessions well, independently of multicast or handles multi-party sessions well, independently of multicast or
centralised transport distribution, as additional sources can centralised transport distribution, as additional sources can
dynamically enter and leave the session. dynamically enter and leave the session.
7.3. Multiple Sessions for one Media type 7.3. Multiple Sessions for one Media type
In this arch-type one goes one step further than in the above In this archetype one goes one step further than in the above
(Section 7.2) by using multiple RTP sessions also for a single media (Section 7.2) by using multiple RTP sessions also for a single media
type. The main reason for going in this direction is that the RTP type, but still not as far as having a single SSRC per RTP session.
The main reason for going in this direction is that the RTP
application needs separation of the media streams due to their usage. application needs separation of the media streams due to their usage.
Some typical reasons for going to this arch-type are scalability over Some typical reasons for going to this archetype are scalability over
multicast, simulcast, need for extended QoS prioritisation of media multicast, simulcast, need for extended QoS prioritisation of media
streams due to their usage in the application, or the need for fine streams due to their usage in the application, or the need for fine-
granular signalling using today's tools. grained signalling using today's tools.
The Pros: The Pros:
1. More suitable for Multicast usage where receivers can 1. More suitable for Multicast usage where receivers can
individually select which RTP sessions they want to participate individually select which RTP sessions they want to participate
in, assuming each RTP session has its own multicast group. in, assuming each RTP session has its own multicast group.
2. Detailed indication of the application's usage of the media 2. Indication of the application's usage of the media stream, where
stream, where multiple different usages exist. multiple different usages exist.
3. Less need for SSRC specific explicit signalling for each media 3. Less need for SSRC specific explicit signalling for each media
stream and thus reduced need for explicit and timely signalling. stream and thus reduced need for explicit and timely signalling.
4. Enables detailed QoS prioritisation for flow based mechanisms. 4. Enables detailed QoS prioritisation for flow based mechanisms.
5. Works well with de-composite endpoints. 5. Works well with de-composite endpoints.
6. Handles dynamic usage of media streams well. 6. Handles dynamic usage of media streams well.
skipping to change at page 33, line 45 skipping to change at page 33, line 27
a. Increases the amount of RTP sessions compared to Multiple SSRCs a. Increases the amount of RTP sessions compared to Multiple SSRCs
of the Same Media Type. of the Same Media Type.
b. Increased amount of session configuration state. b. Increased amount of session configuration state.
c. May need synchronised cross-session RTCP requests and require c. May need synchronised cross-session RTCP requests and require
some consideration due to this. some consideration due to this.
d. For media streams that are part of scalability, simulcast or d. For media streams that are part of scalability, simulcast or
transport robustness it will be needed to bind sources, which transport robustness it will be needed to bind sources, which
must support multiple RTP sessions. need to support multiple RTP sessions.
e. Some potential for concern with legacy implementations that do e. Some potential for concern with legacy implementations that do
not support the RTP specification fully when it comes to handling not support the RTP specification fully when it comes to handling
multiple SSRC per endpoint. multiple SSRC per endpoint.
f. Higher overhead for security association establishment. f. Higher overhead for security association establishment.
g. If the applications need finer control than on media type level g. If the applications need finer control than on media type level
over which session participants that are included in different over which session participants that are included in different
sets of security associations, most of today's key-management sets of security associations, most of today's key-management
skipping to change at page 34, line 20 skipping to change at page 34, line 7
For more complex RTP applications that have several different usages For more complex RTP applications that have several different usages
for media streams of the same media type and / or use scalability or for media streams of the same media type and / or use scalability or
simulcast, this solution can enable those functions at the cost of simulcast, this solution can enable those functions at the cost of
increased overhead associated with the additional sessions. This increased overhead associated with the additional sessions. This
type of structure is suitable for more advanced applications as well type of structure is suitable for more advanced applications as well
as multicast based applications requiring differentiation to as multicast based applications requiring differentiation to
different participants. different participants.
7.4. Multiple Media Types in one Session 7.4. Multiple Media Types in one Session
This arch-type is to use a single RTP session for multiple different This archetype is to use a single RTP session for multiple different
media types, like audio and video, and possibly also transport media types, like audio and video, and possibly also transport
robustness mechanisms like FEC or Retransmission. Each media stream robustness mechanisms like FEC or Retransmission. Each media stream
will use its own SSRC and a given SSRC value from a particular will use its own SSRC and a given SSRC value from a particular
endpoint will never use the SSRC for more than a single media type. endpoint will never use the SSRC for more than a single media type.
The Pros: The Pros:
1. Single RTP session which implies: 1. Single RTP session which implies:
* Minimal NAT/FW state. * Minimal NAT/FW state.
skipping to change at page 35, line 8 skipping to change at page 34, line 43
The Cons: The Cons:
a. Less suitable for interworking with other applications that use a. Less suitable for interworking with other applications that use
individual RTP sessions per media type or multiple sessions for a individual RTP sessions per media type or multiple sessions for a
single media type, due to need of SSRC translation. single media type, due to need of SSRC translation.
b. Negotiation of bandwidth for the different media types is b. Negotiation of bandwidth for the different media types is
currently not possible in SDP. This requires SDP extensions to currently not possible in SDP. This requires SDP extensions to
enable payload or source specific bandwidth. Likely to be a enable payload or source specific bandwidth. Likely to be a
problem due to media type asymmetry in required bandwidth. problem due to media type asymmetry in needed bandwidth.
c. Not suitable for de-composite end-points as it requires higher c. Not suitable for de-composite endpoints.
bandwidth and processing.
d. Flow based QoS cannot provide separate treatment to some media d. Flow based QoS cannot provide separate treatment to some media
streams compared to other in the single RTP session. streams compared to others in the single RTP session.
e. If there is significant asymmetry between the media streams RTCP e. If there is significant asymmetry between the media streams' RTCP
reporting needs, there are some challenges in configuration and reporting needs, there are some challenges in configuration and
usage to avoid wasting RTCP reporting on the media stream that usage to avoid wasting RTCP reporting on the media stream that
does not need that frequent reporting. does not need that frequent reporting.
f. Not suitable for applications where some receivers like to f. Not suitable for applications where some receivers like to
receive only a subset of the media streams, especially if receive only a subset of the media streams, especially if
multicast or transport translator is being used. multicast or transport translator is being used.
g. Additional concern with legacy implementations that does not g. Additional concern with legacy implementations that do not
support the RTP specification fully when it comes to handling support the RTP specification fully when it comes to handling
multiple SSRC per endpoint, as also multiple simultaneous media multiple SSRC per endpoint, as also multiple simultaneous media
types need to be handled. types need to be handled.
h. If the applications need finer control over which session h. If the applications need finer control over which session
participants that are included in different sets of security participants that are included in different sets of security
associations, most key-management will have difficulties associations, most key-management will have difficulties
establishing such a session. establishing such a session.
7.5. Summary 7.5. Summary
There are some clear relations between these arch-types. Both the There are some clear relations between these archetypes. Both the
"single SSRC per RTP session" and the "multiple media types in one "single SSRC per RTP session" and the "multiple media types in one
session" are cases which require full explicit signalling of the session" are cases which require full explicit signalling of the
media stream relations. However, they operate on two different media stream relations. However, they operate on two different
levels where the first primarily enables session level binding, and levels where the first primarily enables session level binding, and
the second needs to do it all on SSRC level. From another the second needs to do it all on SSRC level. From another
perspective, the two solutions are the two extreme points when it perspective, the two solutions are the two extreme points when it
comes to number of RTP sessions required. comes to number of RTP sessions needed.
The two other arch-types "Multiple SSRCs of the Same Media Type" and The two other archetypes "Multiple SSRCs of the Same Media Type" and
"Multiple Sessions for one Media Type" are examples of two other "Multiple Sessions for one Media Type" are examples of two other
cases that first of all allow for some implicit mapping of the role cases that first of all allow for some implicit mapping of the role
or usage of the media streams based on which RTP session they appear or usage of the media streams based on which RTP session they appear
in. It thus potentially allows for less signalling and in particular in. It thus potentially allows for less signalling and in particular
reduced need for real-time signalling in dynamic sessions. They also reduced need for real-time signalling in dynamic sessions. They also
represent points in between the first two when it comes to amount of represent points in between the first two when it comes to amount of
RTP sessions established, i.e. representing an attempt to reduce the RTP sessions established, i.e. representing an attempt to reduce the
amount of sessions as much as possible without compromising the amount of sessions as much as possible without compromising the
functionality the session provides both on network level and on functionality the session provides both on network level and on
signalling level. signalling level.
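The relation between the four archetypes and the number of RTP sessions can be illustrated with a small sketch. It is not part of the draft: the `Stream` tuple, the `rtp_session_counts` function and the example usages ("main", "slides") are hypothetical names, and the model makes simplifying assumptions. It counts only the media streams sent by one endpoint, assumes one RTP session per media stream for the single-SSRC case, and assumes one RTP session per (media type, usage) pair for the "multiple sessions for one media type" case.

```python
from collections import namedtuple

# Hypothetical description of one media stream sent by a single endpoint:
# its media type and its usage in the application.
Stream = namedtuple("Stream", ["media_type", "usage"])


def rtp_session_counts(streams):
    """Return the number of RTP sessions each archetype would need.

    Assumption: sessions are counted from the point of view of one
    sending endpoint, under the simplified model described above.
    """
    media_types = {s.media_type for s in streams}
    type_usage_pairs = {(s.media_type, s.usage) for s in streams}
    return {
        # One session per media stream (Section 7.1).
        "single SSRC per RTP session": len(streams),
        # One session per media type (Section 7.2).
        "multiple SSRCs of the same media type": len(media_types),
        # One session per media type and usage (Section 7.3).
        "multiple sessions for one media type": len(type_usage_pairs),
        # Everything in one session (Section 7.4).
        "multiple media types in one session": 1 if streams else 0,
    }


# Example: one audio stream, two main camera streams and one slide stream.
example = [
    Stream("audio", "main"),
    Stream("video", "main"),
    Stream("video", "main"),
    Stream("video", "slides"),
]
counts = rtp_session_counts(example)
```

Under these assumptions the two extreme archetypes give the largest and smallest counts, and the two intermediate archetypes fall in between, which matches the ordering described in the summary.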
8. Summary considerations and guidelines 8. Summary considerations and guidelines
8.1. Guidelines 8.1. Guidelines
This section contains a number of recommendations for implementors or This section contains a number of recommendations for implementors or
specification writers when it comes to handling multi-stream. specification writers when it comes to handling multi-stream.
Do not Require the same SSRC across Sessions: As discussed in Do not Require the same SSRC across Sessions: As discussed in
Section 6.2.5 there exist drawbacks in using the same SSRC in Section 6.1.5 there exist drawbacks in using the same SSRC in
multiple RTP sessions as a mechanism to bind related media streams multiple RTP sessions as a mechanism to bind related media streams
together. It is instead recommended that a mechanism to together. It is instead suggested that a mechanism to explicitly
explicitly signal the relation is used, either in RTP/RTCP or in signal the relation is used, either in RTP/RTCP or in the used
the used signalling mechanism that establishes the RTP session(s). signalling mechanism that establishes the RTP session(s).
Use additional SSRCs for additional Media Sources: In the cases an RTP Use additional SSRCs for additional Media Sources: In the cases where an
endpoint needs to transmit additional media streams of the same RTP endpoint needs to transmit additional media streams of the
media type in the application, with the same processing same media type in the application, with the same processing
requirements at the network and RTP layers, it is recommended to requirements at the network and RTP layers, it is suggested to
send them as additional SSRCs in the same RTP session. For send them as additional SSRCs in the same RTP session. For
example a telepresence room where there are three cameras, and example a telepresence room where there are three cameras, and
each camera captures 2 persons sitting at the table, sending each each camera captures 2 persons sitting at the table, sending each
camera as its own SSRC within a single RTP session is recommended. camera as its own SSRC within a single RTP session is suggested.
Use additional RTP sessions for streams with different requirements: Use additional RTP sessions for streams with different requirements:
When media streams have different processing requirements from the When media streams have different processing requirements from the
network or the RTP layer at the endpoints, it is recommended that network or the RTP layer at the endpoints, it is suggested that
the different types of streams are put in different RTP sessions. the different types of streams are put in different RTP sessions.
This includes the case where different participants want different This includes the case where different participants want different
subsets of the set of RTP streams. subsets of the set of RTP streams.
When using multiple RTP Sessions use grouping: When using Multiple When using multiple RTP Sessions use grouping: When using Multiple
RTP session solutions, it is recommended to be explicitly group RTP session solutions, it is suggested to explicitly group the
the involved RTP sessions when needed using the signalling involved RTP sessions when needed using the signalling mechanism,
mechanism, for example The Session Description Protocol (SDP) for example The Session Description Protocol (SDP) Grouping
Grouping Framework [RFC5888], using some appropriate grouping Framework [RFC5888], using some appropriate grouping semantics.
semantics.
RTP/RTCP Extensions May Support Additional SSRCs as well as Multiple RTP sessions: RTP/RTCP Extensions May Support Additional SSRCs as well as Multiple RTP sessions:
When defining an RTP or RTCP extension, the creator needs to When defining an RTP or RTCP extension, the creator needs to
consider if this extension is applicable to usage with additional consider if this extension is applicable to usage with additional
SSRCs and Multiple RTP sessions. Any extension intended to be SSRCs and Multiple RTP sessions. Any extension intended to be
generic is recommended to support both. Applications that are not generic is suggested to support both. Applications that are not
as generally applicable will have to consider if interoperability as generally applicable will have to consider if interoperability
is better served by defining a single solution or providing both is better served by defining a single solution or providing both
options. options.
Transport Support Extensions: When defining new RTP/RTCP extensions Transport Support Extensions: When defining new RTP/RTCP extensions
intended for transport support, like the retransmission or FEC intended for transport support, like the retransmission or FEC
mechanisms, they are recommended to include support for both mechanisms, they are expected to include support for both
additional SSRCs and multiple RTP sessions so that application additional SSRCs and multiple RTP sessions so that application
developers can choose freely from the set of mechanisms without developers can choose freely from the set of mechanisms without
concerning themselves with which of the multiplexing choices a concerning themselves with which of the multiplexing choices a
particular solution supports. particular solution supports.
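As an illustration of the grouping guideline above, the following sketch builds the SDP attribute lines that the SDP Grouping Framework [RFC5888] uses to tie media descriptions together: a session-level "a=group" line listing identification tags, and an "a=mid" line in each grouped media description. The function name `sdp_group_lines` and the tags "audio1" and "video1" are hypothetical; "LS" (lip synchronisation) is one of the semantics defined in [RFC5888], chosen here only as an example.

```python
def sdp_group_lines(semantics, mids):
    """Build the session-level group line and the per-media mid lines.

    semantics: grouping semantics token, e.g. "LS".
    mids: identification tags, one per grouped media description.
    """
    group = "a=group:" + semantics + " " + " ".join(mids)
    mid_lines = {mid: "a=mid:" + mid for mid in mids}
    return group, mid_lines


# Hypothetical tags for one audio and one video media description.
group_line, mid_lines = sdp_group_lines("LS", ["audio1", "video1"])
```

The group line belongs in the session-level part of the SDP, while each "a=mid" line belongs inside the media description it identifies.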
9. IANA Considerations 9. IANA Considerations
This document makes no request of IANA. This document makes no request of IANA.
Note to RFC Editor: this section may be removed on publication as an Note to RFC Editor: this section can be removed on publication as an
RFC. RFC.
10. Security Considerations 10. Security Considerations
There is discussion of the security implications of choosing SSRC vs There is discussion of the security implications of choosing SSRC vs
Multiple RTP session in Section 6.5. Multiple RTP session in Section 6.4.
11. References 11. References
11.1. Normative References 11.1. Normative References
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003. Applications", STD 64, RFC 3550, July 2003.
11.2. Informative References 11.2. Informative References
skipping to change at page 38, line 8 skipping to change at page 37, line 43
progress), June 2012. progress), June 2012.
[I-D.ietf-avt-srtp-ekt] [I-D.ietf-avt-srtp-ekt]
Wing, D., McGrew, D., and K. Fischer, "Encrypted Key Wing, D., McGrew, D., and K. Fischer, "Encrypted Key
Transport for Secure RTP", draft-ietf-avt-srtp-ekt-03 Transport for Secure RTP", draft-ietf-avt-srtp-ekt-03
(work in progress), October 2011. (work in progress), October 2011.
[I-D.ietf-avtcore-6222bis] [I-D.ietf-avtcore-6222bis]
Begen, A., Perkins, C., Wing, D., and E. Rescorla, Begen, A., Perkins, C., Wing, D., and E. Rescorla,
"Guidelines for Choosing RTP Control Protocol (RTCP) "Guidelines for Choosing RTP Control Protocol (RTCP)
Canonical Names (CNAMEs)", draft-ietf-avtcore-6222bis-02 Canonical Names (CNAMEs)", draft-ietf-avtcore-6222bis-06
(work in progress), April 2013. (work in progress), July 2013.
[I-D.ietf-avtcore-multi-media-rtp-session] [I-D.ietf-avtcore-multi-media-rtp-session]
Westerlund, M., Perkins, C., and J. Lennox, "Multiple Westerlund, M., Perkins, C., and J. Lennox, "Sending
Media Types in an RTP Session", draft-ietf-avtcore-multi- Multiple Types of Media in a Single RTP Session", draft-
media-rtp-session-02 (work in progress), February 2013. ietf-avtcore-multi-media-rtp-session-03 (work in
progress), July 2013.
[I-D.ietf-avtcore-rtp-security-options] [I-D.ietf-avtcore-rtp-security-options]
Westerlund, M. and C. Perkins, "Options for Securing RTP Westerlund, M. and C. Perkins, "Options for Securing RTP
Sessions", draft-ietf-avtcore-rtp-security-options-02 Sessions", draft-ietf-avtcore-rtp-security-options-03
(work in progress), February 2013. (work in progress), May 2013.
[I-D.ietf-avtext-multiple-clock-rates] [I-D.ietf-avtext-multiple-clock-rates]
Petit-Huguenin, M. and G. Zorn, "Support for Multiple Petit-Huguenin, M. and G. Zorn, "Support for Multiple
Clock Rates in an RTP Session", draft-ietf-avtext- Clock Rates in an RTP Session", draft-ietf-avtext-
multiple-clock-rates-09 (work in progress), April 2013. multiple-clock-rates-09 (work in progress), April 2013.
[I-D.ietf-mmusic-msid]
Alvestrand, H., "Cross Session Stream Identification in
the Session Description Protocol", draft-ietf-mmusic-
msid-00 (work in progress), February 2013.
[I-D.ietf-mmusic-sdp-bundle-negotiation] [I-D.ietf-mmusic-sdp-bundle-negotiation]
Holmberg, C., Alvestrand, H., and C. Jennings, Holmberg, C., Alvestrand, H., and C. Jennings,
"Multiplexing Negotiation Using Session Description "Multiplexing Negotiation Using Session Description
Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp- Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp-
bundle-negotiation-03 (work in progress), February 2013. bundle-negotiation-04 (work in progress), June 2013.
[I-D.ietf-payload-rtp-howto] [I-D.ietf-payload-rtp-howto]
Westerlund, M., "How to Write an RTP Payload Format", Westerlund, M., "How to Write an RTP Payload Format",
draft-ietf-payload-rtp-howto-03 (work in progress), April draft-ietf-payload-rtp-howto-04 (work in progress), June
2013. 2013.
[I-D.lennox-avtcore-rtp-multi-stream] [I-D.lennox-avtcore-rtp-multi-stream]
Lennox, J., Westerlund, M., Wu, W., and C. Perkins, "RTP Lennox, J., Westerlund, M., Wu, W., and C. Perkins, "RTP
Considerations for Endpoints Sending Multiple Media Considerations for Endpoints Sending Multiple Media
Streams", draft-lennox-avtcore-rtp-multi-stream-02 (work Streams", draft-lennox-avtcore-rtp-multi-stream-02 (work
in progress), February 2013. in progress), February 2013.
[I-D.lennox-mmusic-sdp-source-selection] [I-D.lennox-mmusic-sdp-source-selection]
Lennox, J. and H. Schulzrinne, "Mechanisms for Media Lennox, J. and H. Schulzrinne, "Mechanisms for Media
skipping to change at page 42, line 5 skipping to change at page 41, line 42
with the same SSRC, thus using the same timestamp and sequence number with the same SSRC, thus using the same timestamp and sequence number
space. This has many effects: space. This has many effects:
1. Putting restraint on RTP timestamp rate for the multiplexed 1. Putting restraint on RTP timestamp rate for the multiplexed
media. For example, media streams that use different RTP media. For example, media streams that use different RTP
timestamp rates cannot be combined, as the timestamp values need timestamp rates cannot be combined, as the timestamp values need
to be consistent across all multiplexed media frames. Thus to be consistent across all multiplexed media frames. Thus
streams are forced to use the same rate. When this is not streams are forced to use the same rate. When this is not
possible, Payload Type multiplexing cannot be used. possible, Payload Type multiplexing cannot be used.
2. Many RTP payload formats may fragment a media object over 2. Many RTP payload formats can fragment a media object over
multiple packets, like parts of a video frame. These payload multiple packets, like parts of a video frame. These payload
formats need to determine the order of the fragments to formats need to determine the order of the fragments to
correctly decode them. Thus it is important to ensure that all correctly decode them. Thus it is important to ensure that all
fragments related to a frame or a similar media object are fragments related to a frame or a similar media object are
transmitted in sequence and without interruptions within the transmitted in sequence and without interruptions within the
object. This can relatively simply be solved on the sender side object. This can relatively simply be solved on the sender side
by ensuring that the fragments of each media stream are sent in by ensuring that the fragments of each media stream are sent in
sequence. sequence.
3. Some media formats require uninterrupted sequence number space 3. Some media formats require uninterrupted sequence number space
skipping to change at page 42, line 33 skipping to change at page 42, line 23
4. Sending multiple streams in the same sequence number space makes 4. Sending multiple streams in the same sequence number space makes
it impossible to determine which Payload Type and thus which it impossible to determine which Payload Type and thus which
stream a packet loss relates to. stream a packet loss relates to.
5. If RTP Retransmission [RFC4588] is used and there is a loss, it 5. If RTP Retransmission [RFC4588] is used and there is a loss, it
is possible to ask for the missing packet(s) by SSRC and is possible to ask for the missing packet(s) by SSRC and
sequence number, not by Payload Type. If only some of the sequence number, not by Payload Type. If only some of the
Payload Type multiplexed streams are of interest, there is no Payload Type multiplexed streams are of interest, there is no
way of telling which missing packet(s) belong to the interesting way of telling which missing packet(s) belong to the interesting
stream(s) and all lost packets must be requested, wasting stream(s) and all lost packets need be requested, wasting
bandwidth. bandwidth.
6. The current RTCP feedback mechanisms are built around providing 6. The current RTCP feedback mechanisms are built around providing
feedback on media streams based on stream ID (SSRC), packet feedback on media streams based on stream ID (SSRC), packet
(sequence numbers) and time interval (RTP Timestamps). There is (sequence numbers) and time interval (RTP Timestamps). There is
almost never a field to indicate which Payload Type is reported, almost never a field to indicate which Payload Type is reported,
so sending feedback for a specific media stream is difficult so sending feedback for a specific media stream is difficult
without extending existing RTCP reporting. without extending existing RTCP reporting.
7. The current RTCP media control messages [RFC5104] specification 7. The current RTCP media control messages [RFC5104] specification
skipping to change at page 43, line 24 skipping to change at page 43, line 10
is no defined way to group Payload Types. is no defined way to group Payload Types.
10. It is currently not possible to signal bandwidth requirements 10. It is currently not possible to signal bandwidth requirements
per media stream when using Payload Type Multiplexing. per media stream when using Payload Type Multiplexing.
11. Most existing SDP media level attributes cannot be applied on a 11. Most existing SDP media level attributes cannot be applied on a
per Payload Type level and would require re-definition in that per Payload Type level and would require re-definition in that
context. context.
12. A legacy endpoint that doesn't understand the indication that 12. A legacy endpoint that doesn't understand the indication that
different RTP payload types are different media streams may be different RTP payload types are different media streams might be
slightly confused by the large amount of possibly overlapping or slightly confused by the large amount of possibly overlapping or
identically defined RTP Payload Types. identically defined RTP Payload Types.
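Items 4 and 5 above can be illustrated with a minimal Python sketch. It assumes a simplified packet model of (sequence number, payload type) tuples received from a single SSRC; the function name missing_sequence_numbers is hypothetical and not taken from any RTP library. A receiver can find gaps in the shared sequence number space, but a gap carries no information about which payload type the lost packet had.

```python
def missing_sequence_numbers(received):
    """Return the sequence numbers missing between the lowest and highest seen.

    `received` is a list of (seq, payload_type) tuples from one SSRC.
    Wrap-around of the 16-bit sequence number is ignored for brevity
    (an assumption of this sketch, not how a real receiver works).
    """
    seen = {seq for seq, _pt in received}
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


# Two streams (payload types 96 and 97) multiplexed in one sequence space.
received = [(100, 96), (101, 97), (103, 96), (104, 97), (106, 97)]
lost = missing_sequence_numbers(received)
print(lost)
```

The result is only a list of sequence numbers. A receiver interested solely in payload type 96 cannot tell which of the lost packets belonged to that stream, so it would have to request all of them.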
Appendix B. Proposals for Future Work Appendix B. Proposals for Future Work
The above discussion and guidelines indicate that a small set of The above discussion and guidelines indicate that a small set of
extension mechanisms could greatly improve the situation when it extension mechanisms could greatly improve the situation when it
comes to using multiple streams independently of Multiple RTP session comes to using multiple streams independently of Multiple RTP session
or Additional SSRC. These extensions are: or Additional SSRC. These extensions are:
Media Source Identification: A Media source identification that can Media Source Identification: A Media source identification that can
be used to bind together media streams that are related to the be used to bind together media streams that are related to the
same media source. A proposal same media source. A proposal
[I-D.westerlund-avtext-rtcp-sdes-srcname] exists for a new SDES [I-D.westerlund-avtext-rtcp-sdes-srcname] exists for a new SDES
item SRCNAME that also can be used with the a=ssrc SDP attribute item SRCNAME that also can be used with the a=ssrc SDP attribute
to provide signalling layer binding information. to provide signalling layer binding information.
MSID: A Media Stream identification scheme that can be used to
signal relationships between SSRCs that can be in the same or in
different RTP sessions. Described in [I-D.ietf-mmusic-msid]
SSRC limitations within RTP sessions: By providing a signalling SSRC limitations within RTP sessions: By providing a signalling
solution that allows the signalling peers to explicitly express solution that allows the signalling peers to explicitly express
both support and limitations on how many simultaneous media both support and limitations on how many simultaneous media
streams an endpoint can handle within a given RTP Session. That streams an endpoint can handle within a given RTP Session. That
ensures that usage of Additional SSRC occurs when supported and ensures that usage of Additional SSRC occurs when supported and
without overloading an endpoint. This extension is proposed in without overloading an endpoint. This extension is proposed in
[I-D.westerlund-avtcore-max-ssrc]. [I-D.westerlund-avtcore-max-ssrc].
Appendix C. RTP Specification Clarifications Appendix C. Signalling considerations
This section describes a number of clarifications to the RTP
specifications that are likely necessary for aligned behaviour when
RTP sessions contain more SSRCs than one local and one remote.
All of the below proposals are under consideration in
[I-D.lennox-avtcore-rtp-multi-stream].
C.1. RTCP Reporting from all SSRCs
When one has multiple SSRC in an RTP node, all these SSRC must send
some RTP or RTCP packet as long as the SSRC exist. It is not
sufficient that only one SSRC in the node sends report blocks on the
incoming RTP streams; any SSRC that intends to remain in the session
must send some packets to avoid timing out according to the rules in
RFC 3550 section 6.3.5.
It has been hypothesised that a third party monitor may be confused
by not necessarily being able to determine that all these SSRC are in
fact co-located and originate from the same stack instance; if this
hypothesis is true, this may argue for having all the sources send
full reception reports, even though they are reporting the same
packet delivery.
The contrary argument is that such double reporting may confuse the
third party monitor even more by making it seem that utilisation of
the last-hop link to the recipient is (number of SSRCs) times higher
than what it actually is.
C.2. RTCP Self-reporting
For any RTP node that sends more than one SSRC, there is the question
if SSRC1 needs to report its reception of SSRC2 and vice versa. The
reason that they in fact need to report on all other local streams as
being received is report consistency. The hypothetical third party
monitor that considers the full matrix of media streams and all known
SSRC reports on these media streams would detect a gap in the reports
which could be a transport issue unless identified as in fact being
sources from the same node.
C.3. Combined RTCP Packets
When a node contains multiple SSRCs, it is questionable if an RTCP
compound packet can only contain RTCP packets from a single SSRC or
if multiple SSRCs can include their packets in a joint compound
packet. The high level question is a matter for any receiver
processing on what to expect. In addition to that question there is
the issue of how to use the RTCP timer rules in these cases, as the
existing rules are focused on determining when a single SSRC can
send.
Appendix D. Signalling considerations
Signalling is not an architectural consideration for RTP itself, so Signalling is not an architectural consideration for RTP itself, so
this discussion has been moved to an appendix. However, it is hugely this discussion has been moved to an appendix. However, it is hugely
important for anyone building complete applications, so it is important for anyone building complete applications, so it is
deserving of discussion. deserving of discussion.
The issues raised here need to be addressed in the WGs that deal with The issues raised here need to be addressed in the WGs that deal with
signalling; they cannot be addressed by tweaking, extending or signalling; they cannot be addressed by tweaking, extending or
profiling RTP. profiling RTP.
D.1. Signalling Aspects C.1. Signalling Aspects
There exist various signalling solutions for establishing RTP There exist various signalling solutions for establishing RTP
sessions. Many are SDP [RFC4566] based, however SDP functionality is sessions. Many are SDP [RFC4566] based, however SDP functionality is
also dependent on the signalling protocols carrying the SDP. also dependent on the signalling protocols carrying the SDP.
RTSP [RFC2326] and SAP [RFC2974] both use SDP in a declarative RTSP [RFC2326] and SAP [RFC2974] both use SDP in a declarative
fashion, while SIP [RFC3261] uses SDP with the additional definition fashion, while SIP [RFC3261] uses SDP with the additional definition
of Offer/Answer [RFC3264]. The impact on signalling and especially of Offer/Answer [RFC3264]. The impact on signalling and especially
SDP needs to be considered as it can greatly affect how to deploy a SDP needs to be considered as it can greatly affect how to deploy a
certain multiplexing point choice. certain multiplexing point choice.
D.1.1. Session Oriented Properties C.1.1. Session Oriented Properties
One aspect of the existing signalling is that it is focused around One aspect of the existing signalling is that it is focused around
sessions, or at least in the case of SDP the media description. sessions, or at least in the case of SDP the media description.
There are a number of things that are signalled on a session level/ There are a number of things that are signalled on a session level/
media description but those are not necessarily strictly bound to an media description but those are not necessarily strictly bound to an
RTP session and could be of interest to signal specifically for a RTP session and could be of interest to signal specifically for a
particular media stream (SSRC) within the session. The following particular media stream (SSRC) within the session. The following
properties have been identified as being potentially useful to signal properties have been identified as being potentially useful to signal
not only on RTP session level: not only on RTP session level:
skipping to change at page 46, line 15 skipping to change at page 44, line 38
o Which SSRC will use which RTP Payload Types (this will be o Which SSRC will use which RTP Payload Types (this will be
visible from the first media packet, but is sometimes useful to visible from the first media packet, but is sometimes useful to
know before packet arrival). know before packet arrival).
Some of these issues are clearly SDP's problem rather than RTP Some of these issues are clearly SDP's problem rather than RTP
limitations. However, if the aim is to deploy a solution using limitations. However, if the aim is to deploy a solution using
additional SSRCs that contains several sets of media streams with additional SSRCs that contains several sets of media streams with
different properties (encoding/packetization parameter, bit-rate, different properties (encoding/packetization parameter, bit-rate,
etc), putting each set in a different RTP session would directly etc), putting each set in a different RTP session would directly
enable negotiation of the parameters for each set. If insisting on enable negotiation of the parameters for each set. If insisting on
Additional SSRC only, a number of signalling extensions are needed to additional SSRC only, a number of signalling extensions are needed to
clarify that there are multiple sets of media streams with different clarify that there are multiple sets of media streams with different
properties and that they shall in fact be kept different, since a properties and that they need in fact be kept different, since a
single set will not satisfy the application's requirements. single set will not satisfy the application's requirements.
For some parameters, such as resolution and framerate, an SSRC-linked For some parameters, such as resolution and framerate, an SSRC-linked
mechanism has been proposed: mechanism has been proposed:
[I-D.lennox-mmusic-sdp-source-selection]. [I-D.lennox-mmusic-sdp-source-selection].
D.1.2. SDP Prevents Multiple Media Types C.1.2. SDP Prevents Multiple Media Types
SDP chose to use the m= line both to delineate an RTP session and to SDP chose to use the m= line both to delineate an RTP session and to
specify the top level of the MIME media type; audio, video, text, specify the top level of the MIME media type; audio, video, text,
image, application. This media type is used as the top-level media image, application. This media type is used as the top-level media
type for identifying the actual payload format bound to a particular type for identifying the actual payload format bound to a particular
payload type using the rtpmap attribute. This binding has to be payload type using the rtpmap attribute. This binding has to be
loosened in order to use SDP to describe RTP sessions containing loosened in order to use SDP to describe RTP sessions containing
multiple MIME top level types. multiple MIME top level types.
There is an accepted WG item in the MMUSIC WG to define how multiple There is an accepted WG item in the MMUSIC WG to define how multiple
media lines describe a single underlying transport media lines describe a single underlying transport
[I-D.ietf-mmusic-sdp-bundle-negotiation] and thus it becomes possible [I-D.ietf-mmusic-sdp-bundle-negotiation] and thus it becomes possible
in SDP to define one RTP session with media types having different in SDP to define one RTP session with media types having different
MIME top level types. MIME top level types.
D.1.3. Signalling Media Stream Usage C.1.3. Signalling Media Stream Usage
Media streams being transported in RTP have some particular usage in Media streams being transported in RTP have some particular usage in
an RTP application. This usage of the media stream is in many an RTP application. This usage of the media stream is in many
applications so far implicitly signalled. For example, an applications so far implicitly signalled. For example, an
application may choose to take all incoming audio RTP streams, mix application might choose to take all incoming audio RTP streams, mix
them and play them out. However, in more advanced applications that them and play them out. However, in more advanced applications that
use multiple media streams there will be more than a single usage or use multiple media streams there will be more than a single usage or
purpose among the set of media streams being sent or received. RTP purpose among the set of media streams being sent or received. RTP
applications will need to signal this usage somehow. The signalling applications will need to signal this usage somehow. The signalling
used will have to identify the media streams affected by their RTP- used will have to identify the media streams affected by their RTP-
level identifiers, which means that they have to be identified either level identifiers, which means that they have to be identified either
by their session or by their SSRC + session. by their session or by their SSRC + session.
In some applications, the receiver cannot utilise the media stream at In some applications, the receiver cannot utilise the media stream at
all before it has received the signalling message describing the all before it has received the signalling message describing the
media stream and its usage. In other applications, there exists a media stream and its usage. In other applications, there exists a
default handling that is appropriate. default handling that is appropriate.
If all media streams in an RTP session are to be treated in the same If all media streams in an RTP session are to be treated in the same
way, identifying the session is enough. If SSRCs in a session are to way, identifying the session is enough. If SSRCs in a session are to
be treated differently, signalling must identify both the session and be treated differently, signalling needs to identify both the session
the SSRC. and the SSRC.
If this signalling affects how any RTP central node, like an RTP If this signalling affects how any RTP central node, like an RTP
mixer or translator that selects, mixes or processes streams, treats mixer or translator that selects, mixes or processes streams, treats
the streams, the node will also need to receive the same signalling the streams, the node will also need to receive the same signalling
to know how to treat media streams with different usage in the right to know how to treat media streams with different usage in the right
fashion. fashion.
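The identification rule in this section (session alone when all streams are treated alike, session plus SSRC otherwise) can be sketched as a simple lookup. This is a minimal illustration only: the function name stream_usage, the session identifiers and the usage labels are all hypothetical, and no particular signalling protocol is assumed.

```python
def stream_usage(session_usage, ssrc_usage, session_id, ssrc):
    """Resolve the signalled usage for one media stream.

    An entry keyed by (session, SSRC) takes precedence; otherwise the
    session-wide entry applies. Returns None if nothing was signalled,
    in which case an application might fall back to default handling.
    """
    key = (session_id, ssrc)
    if key in ssrc_usage:
        return ssrc_usage[key]
    return session_usage.get(session_id)


# Hypothetical signalled state: one audio session treated uniformly,
# one video session whose SSRCs have different usages.
session_usage = {"audio-1": "mix-and-play"}
ssrc_usage = {
    ("video-1", 0x1234): "main-camera",
    ("video-1", 0x5678): "slides",
}

print(stream_usage(session_usage, ssrc_usage, "audio-1", 0xAAAA))
print(stream_usage(session_usage, ssrc_usage, "video-1", 0x5678))
```

An RTP mixer or translator that treats streams differently by usage would need the same two tables, which is why the text notes that such a node has to receive the same signalling.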
Appendix E. Changes from -01 to -02
o Added Harald Alvestrand as co-author.
o Removed unused term "Media aggregate".
o Added term "RTP session group", noted that CNAMEs are assumed to
bind across the sessions of an RTP session group, and used it when
appropriate (TODO)
o Moved discussion of signalling aspects to appendix
o Removed all suggestion that PT can be a multiplexing point
o Normalised spelling of "endpoint" to follow RFC 3550 and not use a
hyphen.
o Added CNAME to definition list.
o Added term "Media Sink" for the thing that is identified by a
listen-only SSRC.
o Added term "RTP source" for the thing that transmits one media
stream, separating it from "Media Source". [[OUTSTANDING: Whether
to use "RTP Source" or "Media Sender" here]]
o Rewrote section on distributed endpoint, noting that this, like
any endpoint that wants a subset of a set of RTP streams, needs
multiple RTP sessions.
o Removed all substantive references to the undefined term "purpose"
from the main body of the document when it referred to the purpose
of an RTP stream.
o Moved the summary section of section 6 to the guidelines section
that it most closely supports.
Authors' Addresses Authors' Addresses
Magnus Westerlund Magnus Westerlund
Ericsson Ericsson
Farogatan 6 Farogatan 6
SE-164 80 Kista SE-164 80 Kista
Sweden Sweden
Phone: +46 10 714 82 87 Phone: +46 10 714 82 87
Email: magnus.westerlund@ericsson.com Email: magnus.westerlund@ericsson.com
Bo Burman Bo Burman
 End of changes. 211 change blocks. 
683 lines changed or deleted 590 lines changed or added
