draft-ietf-rtcweb-rtp-usage-03.txt | draft-ietf-rtcweb-rtp-usage-04.txt | |||
---|---|---|---|---|
Network Working Group C. Perkins | Network Working Group C. Perkins | |||
Internet-Draft University of Glasgow | Internet-Draft University of Glasgow | |||
Intended status: Standards Track M. Westerlund | Intended status: Standards Track M. Westerlund | |||
Expires: December 6, 2012 Ericsson | Expires: January 17, 2013 Ericsson | |||
J. Ott | J. Ott | |||
Aalto University | Aalto University | |||
June 4, 2012 | July 16, 2012 | |||
Web Real-Time Communication (WebRTC): Media Transport and Use of RTP | Web Real-Time Communication (WebRTC): Media Transport and Use of RTP | |||
draft-ietf-rtcweb-rtp-usage-03 | draft-ietf-rtcweb-rtp-usage-04 | |||
Abstract | Abstract | |||
The Web Real-Time Communication (WebRTC) framework provides support | The Web Real-Time Communication (WebRTC) framework provides support | |||
for direct interactive rich communication using audio, video, text, | for direct interactive rich communication using audio, video, text, | |||
collaboration, games, etc. between two peers' web-browsers. This | collaboration, games, etc. between two peers' web-browsers. This | |||
memo describes the media transport aspects of the WebRTC framework. | memo describes the media transport aspects of the WebRTC framework. | |||
It specifies how the Real-time Transport Protocol (RTP) is used in | It specifies how the Real-time Transport Protocol (RTP) is used in | |||
the WebRTC context, and gives requirements for which RTP features, | the WebRTC context, and gives requirements for which RTP features, | |||
profiles, and extensions need to be supported. | profiles, and extensions need to be supported. | |||
skipping to change at page 1, line 39 | skipping to change at page 1, line 39 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on December 6, 2012. | This Internet-Draft will expire on January 17, 2013. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2012 IETF Trust and the persons identified as the | Copyright (c) 2012 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 17 | skipping to change at page 2, line 17 | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2. Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
4. WebRTC Use of RTP: Core Protocols . . . . . . . . . . . . . . 6 | 4. WebRTC Use of RTP: Core Protocols . . . . . . . . . . . . . . 6 | |||
4.1. RTP and RTCP . . . . . . . . . . . . . . . . . . . . . . . 6 | 4.1. RTP and RTCP . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
4.2. Choice of RTP Profile . . . . . . . . . . . . . . . . . . 7 | 4.2. Choice of the RTP Profile . . . . . . . . . . . . . . . . 7 | |||
4.3. Choice of RTP Payload Formats . . . . . . . . . . . . . . 7 | 4.3. Choice of RTP Payload Formats . . . . . . . . . . . . . . 8 | |||
4.4. RTP Session Multiplexing . . . . . . . . . . . . . . . . . 8 | 4.4. RTP Session Multiplexing . . . . . . . . . . . . . . . . . 9 | |||
4.5. RTP and RTCP Multiplexing . . . . . . . . . . . . . . . . 8 | 4.5. RTP and RTCP Multiplexing . . . . . . . . . . . . . . . . 10 | |||
4.6. Reduced Size RTCP . . . . . . . . . . . . . . . . . . . . 9 | 4.6. Reduced Size RTCP . . . . . . . . . . . . . . . . . . . . 10 | |||
4.7. Symmetric RTP/RTCP . . . . . . . . . . . . . . . . . . . . 9 | 4.7. Symmetric RTP/RTCP . . . . . . . . . . . . . . . . . . . . 11 | |||
4.8. Generation of the RTCP Canonical Name (CNAME) . . . . . . 10 | 4.8. Choice of RTP Synchronisation Source (SSRC) . . . . . . . 11 | |||
5. WebRTC Use of RTP: Extensions . . . . . . . . . . . . . . . . 10 | 4.9. Generation of the RTCP Canonical Name (CNAME) . . . . . . 11 | |||
5.1. Conferencing Extensions . . . . . . . . . . . . . . . . . 10 | 5. WebRTC Use of RTP: Extensions . . . . . . . . . . . . . . . . 12 | |||
5.1.1. Full Intra Request . . . . . . . . . . . . . . . . . . 11 | 5.1. Conferencing Extensions . . . . . . . . . . . . . . . . . 12 | |||
5.1.2. Picture Loss Indication . . . . . . . . . . . . . . . 11 | 5.1.1. Full Intra Request (FIR) . . . . . . . . . . . . . . . 13 | |||
5.1.3. Slice Loss Indication . . . . . . . . . . . . . . . . 11 | 5.1.2. Picture Loss Indication (PLI) . . . . . . . . . . . . 13 | |||
5.1.4. Reference Picture Selection Indication . . . . . . . . 12 | 5.1.3. Slice Loss Indication (SLI) . . . . . . . . . . . . . 13 | |||
5.1.5. Temporary Maximum Media Stream Bit Rate Request . . . 12 | 5.1.4. Reference Picture Selection Indication (RPSI) . . . . 14 | |||
5.2. Header Extensions . . . . . . . . . . . . . . . . . . . . 12 | 5.1.5. Temporal-Spatial Trade-off Request (TSTR) . . . . . . 14 | |||
5.2.1. Rapid Synchronisation . . . . . . . . . . . . . . . . 12 | 5.1.6. Temporary Maximum Media Stream Bit Rate Request . . . 14 | |||
5.2.2. Client to Mixer Audio Level . . . . . . . . . . . . . 13 | 5.2. Header Extensions . . . . . . . . . . . . . . . . . . . . 14 | |||
5.2.3. Mixer to Client Audio Level . . . . . . . . . . . . . 13 | 5.2.1. Rapid Synchronisation . . . . . . . . . . . . . . . . 15 | |||
6. WebRTC Use of RTP: Improving Transport Robustness . . . . . . 13 | 5.2.2. Client-to-Mixer Audio Level . . . . . . . . . . . . . 15 | |||
6.1. Retransmission . . . . . . . . . . . . . . . . . . . . . . 14 | 5.2.3. Mixer-to-Client Audio Level . . . . . . . . . . . . . 15 | |||
6.2. Forward Error Correction (FEC) . . . . . . . . . . . . . . 15 | 6. WebRTC Use of RTP: Improving Transport Robustness . . . . . . 16 | |||
6.2.1. Basic Redundancy . . . . . . . . . . . . . . . . . . . 15 | 6.1. Negative Acknowledgements and RTP Retransmission . . . . . 16 | |||
6.2.2. Block Based FEC . . . . . . . . . . . . . . . . . . . 16 | 6.2. Forward Error Correction (FEC) . . . . . . . . . . . . . . 17 | |||
6.2.3. Recommendations for FEC . . . . . . . . . . . . . . . 17 | ||||
7. WebRTC Use of RTP: Rate Control and Media Adaptation . . . . . 17 | 7. WebRTC Use of RTP: Rate Control and Media Adaptation . . . . . 17 | |||
7.1. Congestion Control Requirements . . . . . . . . . . . . . 19 | 7.1. Congestion Control Requirements . . . . . . . . . . . . . 18 | |||
7.2. Rate Control Boundary Conditions . . . . . . . . . . . . . 19 | 7.2. Rate Control Boundary Conditions . . . . . . . . . . . . . 19 | |||
7.3. RTCP Limiations . . . . . . . . . . . . . . . . . . . . . 19 | 7.3. RTCP Limitations for Congestion Control . . . . . . . . . 19 | |||
7.4. Legacy Interop Limitations . . . . . . . . . . . . . . . . 20 | 7.4. Congestion Control Interoperability With Legacy Systems . 20 | |||
8. WebRTC Use of RTP: Performance Monitoring . . . . . . . . . . 21 | 8. WebRTC Use of RTP: Performance Monitoring . . . . . . . . . . 20 | |||
9. WebRTC Use of RTP: Future Extensions . . . . . . . . . . . . . 21 | 9. WebRTC Use of RTP: Future Extensions . . . . . . . . . . . . . 21 | |||
10. Signalling Considerations . . . . . . . . . . . . . . . . . . 21 | 10. Signalling Considerations . . . . . . . . . . . . . . . . . . 21 | |||
11. WebRTC API Considerations . . . . . . . . . . . . . . . . . . 23 | 11. WebRTC API Considerations . . . . . . . . . . . . . . . . . . 22 | |||
11.1. API MediaStream to RTP Mapping . . . . . . . . . . . . . . 23 | 11.1. API MediaStream to RTP Mapping . . . . . . . . . . . . . . 22 | |||
12. RTP Implementation Considerations . . . . . . . . . . . . . . 23 | 12. RTP Implementation Considerations . . . . . . . . . . . . . . 23 | |||
12.1. RTP Sessions and PeerConnection . . . . . . . . . . . . . 24 | 12.1. RTP Sessions and PeerConnection . . . . . . . . . . . . . 23 | |||
12.2. Multiple Sources . . . . . . . . . . . . . . . . . . . . . 25 | 12.2. Multiple Sources . . . . . . . . . . . . . . . . . . . . . 25 | |||
12.3. Multiparty . . . . . . . . . . . . . . . . . . . . . . . . 25 | 12.3. Multiparty . . . . . . . . . . . . . . . . . . . . . . . . 25 | |||
12.4. SSRC Collision Detection . . . . . . . . . . . . . . . . . 27 | 12.4. SSRC Collision Detection . . . . . . . . . . . . . . . . . 26 | |||
12.5. Contributing Sources . . . . . . . . . . . . . . . . . . . 28 | 12.5. Contributing Sources . . . . . . . . . . . . . . . . . . . 27 | |||
12.6. Media Synchronization . . . . . . . . . . . . . . . . . . 29 | 12.6. Media Synchronization . . . . . . . . . . . . . . . . . . 28 | |||
12.7. Multiple RTP End-points . . . . . . . . . . . . . . . . . 29 | 12.7. Multiple RTP End-points . . . . . . . . . . . . . . . . . 28 | |||
12.8. Simulcast . . . . . . . . . . . . . . . . . . . . . . . . 30 | 12.8. Simulcast . . . . . . . . . . . . . . . . . . . . . . . . 29 | |||
12.9. Differentiated Treatment of Flows . . . . . . . . . . . . 30 | 12.9. Differentiated Treatment of Flows . . . . . . . . . . . . 29 | |||
13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 31 | 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 31 | |||
14. Security Considerations . . . . . . . . . . . . . . . . . . . 32 | 14. Security Considerations . . . . . . . . . . . . . . . . . . . 31 | |||
15. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 32 | 15. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 31 | |||
16. References . . . . . . . . . . . . . . . . . . . . . . . . . . 32 | 16. References . . . . . . . . . . . . . . . . . . . . . . . . . . 32 | |||
16.1. Normative References . . . . . . . . . . . . . . . . . . . 32 | 16.1. Normative References . . . . . . . . . . . . . . . . . . . 32 | |||
16.2. Informative References . . . . . . . . . . . . . . . . . . 35 | 16.2. Informative References . . . . . . . . . . . . . . . . . . 34 | |||
Appendix A. Supported RTP Topologies . . . . . . . . . . . . . . 37 | Appendix A. Supported RTP Topologies . . . . . . . . . . . . . . 36 | |||
A.1. Point to Point . . . . . . . . . . . . . . . . . . . . . . 37 | A.1. Point to Point . . . . . . . . . . . . . . . . . . . . . . 36 | |||
A.2. Multi-Unicast (Mesh) . . . . . . . . . . . . . . . . . . . 40 | A.2. Multi-Unicast (Mesh) . . . . . . . . . . . . . . . . . . . 39 | |||
A.3. Mixer Based . . . . . . . . . . . . . . . . . . . . . . . 43 | A.3. Mixer Based . . . . . . . . . . . . . . . . . . . . . . . 42 | |||
A.3.1. Media Mixing . . . . . . . . . . . . . . . . . . . . . 43 | A.3.1. Media Mixing . . . . . . . . . . . . . . . . . . . . . 42 | |||
A.3.2. Media Switching . . . . . . . . . . . . . . . . . . . 46 | A.3.2. Media Switching . . . . . . . . . . . . . . . . . . . 45 | |||
A.3.3. Media Projecting . . . . . . . . . . . . . . . . . . . 49 | A.3.3. Media Projecting . . . . . . . . . . . . . . . . . . . 48 | |||
A.4. Translator Based . . . . . . . . . . . . . . . . . . . . . 52 | A.4. Translator Based . . . . . . . . . . . . . . . . . . . . . 51 | |||
A.4.1. Transcoder . . . . . . . . . . . . . . . . . . . . . . 52 | A.4.1. Transcoder . . . . . . . . . . . . . . . . . . . . . . 51 | |||
A.4.2. Gateway / Protocol Translator . . . . . . . . . . . . 53 | A.4.2. Gateway / Protocol Translator . . . . . . . . . . . . 52 | |||
A.4.3. Relay . . . . . . . . . . . . . . . . . . . . . . . . 55 | A.4.3. Relay . . . . . . . . . . . . . . . . . . . . . . . . 54 | |||
A.5. End-point Forwarding . . . . . . . . . . . . . . . . . . . 59 | A.5. End-point Forwarding . . . . . . . . . . . . . . . . . . . 58 | |||
A.6. Simulcast . . . . . . . . . . . . . . . . . . . . . . . . 60 | A.6. Simulcast . . . . . . . . . . . . . . . . . . . . . . . . 59 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 61 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 60 | |||
1. Introduction | 1. Introduction | |||
The Real-time Transport Protocol (RTP) [RFC3550] provides a framework | The Real-time Transport Protocol (RTP) [RFC3550] provides a framework | |||
for delivery of audio and video teleconferencing data and other real- | for delivery of audio and video teleconferencing data and other real- | |||
time media applications. Previous work has defined the RTP protocol, | time media applications. Previous work has defined the RTP protocol, | |||
along with numerous profiles, payload formats, and other extensions. | along with numerous profiles, payload formats, and other extensions. | |||
When combined with appropriate signalling, these form the basis for | When combined with appropriate signalling, these form the basis for | |||
many teleconferencing systems. | many teleconferencing systems. | |||
The Web Real-Time communication (WebRTC) framework is a new protocol | The Web Real-Time communication (WebRTC) framework provides the | |||
framework that provides support for direct, interactive, real-time | protocol building blocks to support direct, interactive, real-time | |||
communication using audio, video, collaboration, games, etc., between | communication using audio, video, collaboration, games, etc., between | |||
two peers' web-browsers. This memo describes how the RTP framework | two peers' web-browsers. This memo describes how the RTP framework | |||
is to be used in the WebRTC context. It proposes a baseline set of | is to be used in the WebRTC context. It proposes a baseline set of | |||
RTP features that must be implemented by all WebRTC-aware browsers, | RTP features that are to be implemented by all WebRTC-aware end- | |||
along with suggested extensions for enhanced functionality. | points, along with suggested extensions for enhanced functionality. | |||
The WebRTC overview [I-D.ietf-rtcweb-overview] outlines the complete | The WebRTC overview [I-D.ietf-rtcweb-overview] outlines the complete | |||
WebRTC framework, of which this memo is a part. | WebRTC framework, of which this memo is a part. | |||
The structure of this memo is as follows. Section 2 outlines our | The structure of this memo is as follows. Section 2 outlines our | |||
rationale in preparing this memo and choosing these RTP features. | rationale in preparing this memo and choosing these RTP features. | |||
Section 3 defines requirement terminology. Requirements for core RTP | Section 3 defines requirement terminology. Requirements for core RTP | |||
protocols are described in Section 4 and recommended RTP extensions | protocols are described in Section 4 and recommended RTP extensions | |||
are described in Section 5. Section 6 outlines mechanisms that can | are described in Section 5. Section 6 outlines mechanisms that can | |||
increase robustness to network problems, while Section 7 describes | increase robustness to network problems, while Section 7 describes | |||
the required congestion control and rate adaptation mechanisms. The | the required congestion control and rate adaptation mechanisms. The | |||
discussion of required RTP mechanisms concludes in Section 8 with a | discussion of mandated RTP mechanisms concludes in Section 8 with a | |||
review of performance monitoring and network management tools that | review of performance monitoring and network management tools that | |||
can be used in the WebRTC context. Section 9 gives some guidelines | can be used in the WebRTC context. Section 9 gives some guidelines | |||
for future incorporation of other RTP and RTP Control Protocol (RTCP) | for future incorporation of other RTP and RTP Control Protocol (RTCP) | |||
extensions into this framework. Section 10 describes requirements | extensions into this framework. Section 10 describes requirements | |||
placed on the signalling channel. Section 11 discusses the | placed on the signalling channel. Section 11 discusses the | |||
relationship between features of the RTP framework and the WebRTC | relationship between features of the RTP framework and the WebRTC | |||
application programming interface (API), and Section 12 discusses RTP | application programming interface (API), and Section 12 discusses RTP | |||
implementation considerations. This memo concludes with an appendix | implementation considerations. This memo concludes with an appendix | |||
discussing several different RTP Topologies, and how they affect the | discussing several different RTP Topologies, and how they affect the | |||
RTP session(s) and various implementation details of possible | RTP session(s) and various implementation details of possible | |||
realization of central nodes. | realization of central nodes. | |||
2. Rationale | 2. Rationale | |||
The RTP framework comprises the RTP data transfer protocol, the RTP | The RTP framework comprises the RTP data transfer protocol, the RTP | |||
control protocol, and numerous RTP payload formats, profiles, and | control protocol, and numerous RTP payload formats, profiles, and | |||
extensions. This range of add-ons has allowed RTP to meet various | extensions. This range of add-ons has allowed RTP to meet various | |||
needs that were not envisaged by the original protocol designers, and | needs that were not envisaged by the original protocol designers, and | |||
to support many new media encodings, but raises the question of what | to support many new media encodings, but raises the question of what | |||
features should be supported by new implementations? The development | extensions are to be supported by new implementations. The | |||
of the WebRTC framework provides an opportunity for us to review the | development of the WebRTC framework provides an opportunity for us to | |||
available RTP features and extensions, and to define a common | review the available RTP features and extensions, and to define a | |||
baseline feature set for all WebRTC implementations of RTP. This | common baseline feature set for all WebRTC implementations of RTP. | |||
builds on the past 15 years development of RTP to mandate the use of | This builds on the past 15 years development of RTP to mandate the | |||
extensions that have shown widespread utility, while still remaining | use of extensions that have shown widespread utility, while still | |||
compatible with the wide installed base of RTP implementations where | remaining compatible with the wide installed base of RTP | |||
possible. | implementations where possible. | |||
| | RTP and RTCP extensions not discussed in this document can still be | |||
| | implemented by a WebRTC end-point, but they are considered optional, | |||
| | are not required for interoperability, and do not provide features | |||
| | needed to address the WebRTC use cases and requirements | |||
| | [I-D.ietf-rtcweb-use-cases-and-requirements]. | |||
While the baseline set of RTP features and extensions defined in this | While the baseline set of RTP features and extensions defined in this | |||
memo is targetted at the requirements of the WebRTC framework, it is | memo is targeted at the requirements of the WebRTC framework, it is | |||
expected to be broadly useful for other conferencing-related uses of | expected to be broadly useful for other conferencing-related uses of | |||
RTP. In particular, it is likely that this set of RTP features and | RTP. In particular, it is likely that this set of RTP features and | |||
extensions will be apppropriate for other desktop or mobile video | extensions will be appropriate for other desktop or mobile video | |||
conferencing systems, or for room-based high-quality telepresence | conferencing systems, or for room-based high-quality telepresence | |||
applications. | applications. | |||
3. Terminology | 3. Terminology | |||
This memo specifies various requirements levels for implementation or | This memo specifies various requirements levels for implementation or | |||
use of RTP features and extensions. When we describe the importance | use of RTP features and extensions. When we describe the importance | |||
of RTP extensions, or the need for implementation support, we use the | of RTP extensions, or the need for implementation support, we use the | |||
following requirement levels to specify the importance of the feature | following requirement levels to specify the importance of the feature | |||
in the WebRTC framework: | in the WebRTC framework: | |||
skipping to change at page 5, line 50 | skipping to change at page 6, line 9 | |||
that it enhances the product while another vendor may omit the | that it enhances the product while another vendor may omit the | |||
same item. An implementation which does not include a particular | same item. An implementation which does not include a particular | |||
option MUST be prepared to interoperate with another | option MUST be prepared to interoperate with another | |||
implementation which does include the option, though perhaps with | implementation which does include the option, though perhaps with | |||
reduced functionality. In the same vein an implementation which | reduced functionality. In the same vein an implementation which | |||
does include a particular option MUST be prepared to interoperate | does include a particular option MUST be prepared to interoperate | |||
with another implementation which does not include the option | with another implementation which does not include the option | |||
(except, of course, for the feature the option provides.) | (except, of course, for the feature the option provides.) | |||
These key words are used in a manner consistent with their definition | These key words are used in a manner consistent with their definition | |||
in [RFC2119]. | in [RFC2119]. The above interpretation of these key words applies | |||
| | only when written in ALL CAPS. Lower- or mixed-case uses of these | |||
| | key words are not to be interpreted as carrying special significance | |||
| | in this memo. | |||
| | We define the following terms: | |||
| | RTP Media Stream: A sequence of RTP packets, and associated RTCP | |||
| | packets, using a single synchronisation source (SSRC) that | |||
| | together carries part or all of the content of a specific Media | |||
| | Type from a specific sender source within a given RTP session. | |||
| | RTP Session: As defined by [RFC3550], the endpoints belonging to the | |||
| | same RTP Session are those that share a single SSRC space. That | |||
| | is, those endpoints can see an SSRC identifier transmitted by any | |||
| | one of the other endpoints. An endpoint can see an SSRC either | |||
| | directly in RTP and RTCP packets, or as a contributing source | |||
| | (CSRC) in RTP packets from a mixer. The RTP Session scope is | |||
| | hence decided by the endpoints' network interconnection topology, | |||
| | in combination with RTP and RTCP forwarding strategies deployed by | |||
| | endpoints and any interconnecting middle nodes. | |||
| | WebRTC MediaStream: The MediaStream concept defined by the W3C in | |||
| | the API. | |||
| | Other terms are used according to their definitions from the RTP | |||
| | Specification [RFC3550] and WebRTC overview | |||
| | [I-D.ietf-rtcweb-overview] documents. | |||
4. WebRTC Use of RTP: Core Protocols | 4. WebRTC Use of RTP: Core Protocols | |||
The following sections describe the core features of RTP and RTCP | The following sections describe the core features of RTP and RTCP | |||
that MUST be implemented, along with the mandated RTP profiles and | that need to be implemented, along with the mandated RTP profiles and | |||
payload formats. Also described are the core extensions providing | payload formats. Also described are the core extensions providing | |||
essential features that all WebRTC implementations MUST implement to | essential features that all WebRTC implementations need to implement | |||
function effectively on today's networks. | to function effectively on today's networks. | |||
4.1. RTP and RTCP | 4.1. RTP and RTCP | |||
The Real-time Transport Protocol (RTP) [RFC3550] is REQUIRED to be | The Real-time Transport Protocol (RTP) [RFC3550] is REQUIRED to be | |||
implemented as the media transport protocol for WebRTC. RTP itself | implemented as the media transport protocol for WebRTC. RTP itself | |||
comprises two parts: the RTP data transfer protocol, and the RTP | comprises two parts: the RTP data transfer protocol, and the RTP | |||
control protocol (RTCP). RTCP is a fundamental and integral part of | control protocol (RTCP). RTCP is a fundamental and integral part of | |||
RTP, and MUST be implemented in all WebRTC applications. | RTP, and MUST be implemented in all WebRTC applications. | |||
The following RTP and RTCP features are sometimes omitted in limited | The following RTP and RTCP features are sometimes omitted in limited | |||
functionality implementations of RTP, but are REQUIRED in all WebRTC | functionality implementations of RTP, but are REQUIRED in all WebRTC | |||
implementations: | implementations: | |||
o Support for use of multiple simultaneous SSRC values in a single | o Support for use of multiple simultaneous SSRC values in a single | |||
RTP session, including support for RTP end-points that send many | RTP session, including support for RTP end-points that send many | |||
SSRC values simultaneously. | SSRC values simultaneously. | |||
o Random choice of SSRC on joining a session; collision detection | o Random choice of SSRC on joining a session; collision detection | |||
and resolution for SSRC values. | and resolution for SSRC values (but see also Section 4.8). | |||
o Support reception of RTP data packets containing CSRC lists, as | o Support for reception of RTP data packets containing CSRC lists, | |||
generated by RTP mixers. | as generated by RTP mixers, and RTCP packets relating to CSRCs. | |||
o Support for sending correct synchronization information in the | o Support for sending correct synchronization information in the | |||
RTCP Sender Reports, with RECOMMENDED support for the rapid RTP | RTCP Sender Reports, to allow a receiver to implement lip-sync, | |||
synchronisation extensions (see Section 5.2.1). | with RECOMMENDED support for the rapid RTP synchronisation | |||
| | extensions (see Section 5.2.1). | |||
o Support for standard RTCP packet types, include SR, RR, SDES, and | o Support for sending and receiving RTCP SR, RR, SDES, and BYE | |||
BYE packets. | packet types, with OPTIONAL support for other RTCP packet types; | |||
| | implementations MUST ignore unknown RTCP packet types. | |||
o Support for multiple end-points in a single RTP session, and for | o Support for multiple end-points in a single RTP session, and for | |||
scaling the RTCP transmission interval according to the number of | scaling the RTCP transmission interval according to the number of | |||
participants in the session; support randomised RTCP transmission | participants in the session; support for randomised RTCP | |||
intervals to avoid synchronisation of RTCP reports. | transmission intervals to avoid synchronisation of RTCP reports; | |||
| | support for RTCP timer reconsideration. | |||
| | o Support for configuring the RTCP bandwidth as a fraction of the | |||
| | media bandwidth, and for configuring the fraction of the RTCP | |||
| | bandwidth allocated to senders, e.g., using the SDP "b=" line. | |||
It is known that a significant number of legacy RTP implementations, | It is known that a significant number of legacy RTP implementations, | |||
especially those targetted for purely VoIP systems, do not support | especially those targeted at VoIP-only systems, do not support all of | |||
all of the above features. | the above features, and in some cases do not support RTCP at all. | |||
| | Implementers are advised to consider the requirements for graceful | |||
| | degradation when interoperating with legacy implementations. | |||
Other implementation considerations are discussed in Section 12. | Other implementation considerations are discussed in Section 12. | |||
4.2. Choice of RTP Profile | 4.2. Choice of the RTP Profile | |||
The complete specification of RTP for a particular application domain | The complete specification of RTP for a particular application domain | |||
requires the choice of an RTP Profile. For WebRTC use, the "Extended | requires the choice of an RTP Profile. For WebRTC use, the "Extended | |||
Secure RTP Profile for Real-time Transport Control Protocol (RTCP)- | Secure RTP Profile for Real-time Transport Control Protocol (RTCP)- | |||
Based Feedback (RTP/SAVPF)" [RFC5124] is REQUIRED to be implemented. | Based Feedback (RTP/SAVPF)" [RFC5124] is REQUIRED to be implemented. | |||
This builds on the basic RTP/AVP profile [RFC3551], the RTP profile | This builds on the basic RTP/AVP profile [RFC3551], the RTP profile | |||
for RTCP-based feedback (RTP/AVPF) [RFC4585], and the secure RTP | for RTCP-based feedback (RTP/AVPF) [RFC4585], and the secure RTP | |||
profile (RTP/SAVP) [RFC3711]. | profile (RTP/SAVP) [RFC3711]. | |||
The RTP/AVPF part of RTP/SAVPF is required to get the improved RTCP | The RTCP-based feedback extensions are needed for the improved RTCP | |||
timer model, that allows more flexible transmission of RTCP packets | timer model, that allows more flexible transmission of RTCP packets | |||
in response to events, rather than strictly according to bandwidth. | in response to events, rather than strictly according to bandwidth. | |||
This is vital for being able to report congestion events. The RTP/ | This is vital for being able to report congestion events. These | |||
AVPF profile also saves RTCP bandwidth, and will commonly only use | extensions also save RTCP bandwidth, and will commonly only use the | |||
the full RTCP bandwidth allocation when there are many events that | full RTCP bandwidth allocation if there are many events that require | |||
require feedback. The RTP/AVPF functionality is also needed to make | feedback. They are also needed to make use of the RTP conferencing | |||
use of the RTP conferencing extensions discussed in Section 5.1. | extensions discussed in Section 5.1. | |||
Note: The enhanced RTCP timer model defined in the RTP/AVPF | Note: The enhanced RTCP timer model defined in the RTP/AVPF | |||
profile is backwards compatible with legacy systems that implement | profile is backwards compatible with legacy systems that implement | |||
only the base RTP/AVP profile, given some constraints on parameter | only the base RTP/AVP profile, given some constraints on parameter | |||
configuration such as the RTCP bandwidth value and "trr-int" (the | configuration such as the RTCP bandwidth value and "trr-int" (the | |||
most important factor for interworking with RTP/AVP end-points via | most important factor for interworking with RTP/AVP end-points via | |||
a gateway is to set the trr-int parameter to a value representing | a gateway is to set the trr-int parameter to a value representing | |||
4 seconds). | 4 seconds). | |||
The RTP/SAVP part of the RTP/SAVPF profile is for support for Secure | The secure RTP profile is needed to provide SRTP media encryption, | |||
RTP (SRTP) [RFC3711]. This provides media encryption, integrity | integrity protection, replay protection and a limited form of source | |||
protection, replay protection and a limited form of source | ||||
authentication. | authentication. | |||
WebRTC implementation MUST NOT send packets using the RTP/AVP profile | WebRTC implementations MUST NOT send packets using the basic RTP/AVP | |||
or the RTP/AVPF profile; they MUST use the RTP/SAVPF profile. WebRTC | profile or the RTP/AVPF profile; they MUST employ the full RTP/SAVPF | |||
implementations MUST support DTLS-SRTP [RFC5764] for key-management. | profile to protect all RTP and RTCP packets that are generated. The | |||
default and mandatory-to-implement transforms listed in Section 5 of | ||||
[RFC3711] SHALL apply. | ||||
(tbd: There is ongoing discussion on what additional keying mechanism | Implementations MUST support DTLS-SRTP [RFC5764] for key-management. | |||
is to be required, what are the mandated cryptographic transforms. | Other key management schemes MAY be supported. | |||
This section needs to be updated based on the results of that | ||||
discussion.) | ||||
4.3. Choice of RTP Payload Formats | 4.3. Choice of RTP Payload Formats | |||
(tbd: say something about the choice of RTP Payload Format for | The requirement from Section 6 of [RFC3551] that "Audio applications | |||
WebRTC. If there is a mandatory to implement set of codecs, this | operating under this profile SHOULD, at a minimum, be able to send | |||
should reference them. In any case, it should reference a discussion | and/or receive payload types 0 (PCMU) and 5 (DVI4)" applies, since | |||
of signalling for the choice of codec, once that discussion reaches | Section 4.2 of this memo mandates the use of the RTP/SAVPF profile, | |||
closure.) | which inherits this restriction from the RTP/AVP profile. | |||
Endpoints may signal support for multiple media formats, or multiple | ||||
(tbd: there is ongoing discussion on whether support for other audio | ||||
and video codecs is to be mandated) | ||||
Endpoints MAY signal support for multiple media formats, or multiple | ||||
configurations of a single format, provided each uses a different RTP | configurations of a single format, provided each uses a different RTP | |||
payload type number. An endpoint that has signalled it's support for | payload type number. An endpoint that has signalled its support for | |||
multiple formats is REQUIRED to accept data in any of those formats | multiple formats is REQUIRED to accept data in any of those formats | |||
at any time, unless it has previously signalled limitations on it's | at any time, unless it has previously signalled limitations on its | |||
decoding capability. This is modified if several media types are | decoding capability. | |||
sent in the same RTP session, in that case a source (SSRC) is | ||||
restricted to switch between any RTP payload format established for | This requirement is constrained if several media types are sent in | |||
the media type that is being sent by that source; see Section 4.4. | the same RTP session. In such a case, a source (SSRC) is restricted | |||
To support rapid rate adaptation, RTP does not require signalling in | to switching only between the RTP payload formats signalled for the | |||
media type that is being sent by that source; see Section 4.4. To | ||||
support rapid rate adaptation, RTP does not require signalling in | ||||
advance for changes between payload formats that were signalled | advance for changes between payload formats that were signalled | |||
during session setup. | during session setup. | |||
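The payload-type restriction described above can be illustrated with a small receiver-side check. This is a minimal sketch, not a normative procedure. It assumes a hypothetical table of payload types signalled at session setup (the PT numbers are illustrative only). It also assumes that a source's media type is bound by the first packet seen from it, whereas a real implementation would normally take that binding from signalling.

```python
# Hypothetical payload types signalled during session setup: PT -> media type.
SIGNALLED_PTS = {0: "audio", 8: "audio", 96: "video", 97: "video"}


def rtp_ssrc_and_pt(packet: bytes) -> tuple:
    """Extract (SSRC, payload type) from a fixed 12-byte RTP header."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    payload_type = packet[1] & 0x7F  # low 7 bits; the top bit is the marker
    ssrc = int.from_bytes(packet[8:12], "big")
    return ssrc, payload_type


class PayloadTypeGate:
    """Accept payload type switches only within the media type of each source."""

    def __init__(self, signalled: dict):
        self.signalled = dict(signalled)
        self.media_of_ssrc = {}  # SSRC -> media type (assumed: learned from first packet)

    def accept(self, packet: bytes) -> bool:
        ssrc, pt = rtp_ssrc_and_pt(packet)
        media = self.signalled.get(pt)
        if media is None:
            return False  # payload type was never signalled
        # setdefault binds the SSRC to the media type of its first accepted packet
        return self.media_of_ssrc.setdefault(ssrc, media) == media
```

No signalling is needed for a switch from PT 0 to PT 8 on an audio source, but a switch from PT 0 to a video payload type would be rejected.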
An RTP sender that changes between two RTP payload types that use | ||||
different RTP clock rates MUST follow the recommendations in Section | ||||
4.1 of [I-D.ietf-avtext-multiple-clock-rates]. RTP receivers MUST | ||||
follow the recommendations in Section 4.3 of | ||||
[I-D.ietf-avtext-multiple-clock-rates], in order to support sources | ||||
that switch between clock rates in an RTP session (these | ||||
recommendations for receivers are backwards compatible with the case | ||||
where senders use only a single clock rate). | ||||
4.4. RTP Session Multiplexing | 4.4. RTP Session Multiplexing | |||
An association amongst a set of participants communicating with RTP | An association amongst a set of participants communicating with RTP | |||
is known as an RTP session. A participant may be involved in | is known as an RTP session. A participant can be involved in | |||
multiple RTP sessions at the same time. In a multimedia session, | multiple RTP sessions at the same time. In a multimedia session, | |||
each medium has typically been carried in a separate RTP session with | each medium has typically been carried in a separate RTP session with | |||
its own RTCP packets (i.e., one RTP session for the audio, with a | its own RTCP packets (i.e., one RTP session for the audio, with a | |||
separate RTP session running on a different transport connection for | separate RTP session using a different transport address for the | |||
the video; if SDP is used, this corresponds to one RTP session for | video; if SDP is used, this corresponds to one RTP session for each | |||
each "m=" line in the SDP). WebRTC implementations of RTP are | "m=" line in the SDP). WebRTC implementations of RTP are REQUIRED to | |||
REQUIRED to implement support for multimedia sessions in this way, | implement support for multimedia sessions in this way, for | |||
for compatibility with legacy systems. | compatibility with legacy systems. | |||
In today's networks, however, with the widespread use of Network | In today's networks, however, with the widespread use of Network | |||
Address/Port Translators (NAT/NAPT) and Firewalls (FW), it is | Address/Port Translators (NAT/NAPT) and Firewalls (FW), it is | |||
desirable to reduce the number of transport layer ports used by real- | desirable to reduce the number of transport addresses used by real- | |||
time media applications using RTP by combining multimedia traffic in | time media applications using RTP by combining multimedia traffic in | |||
a single RTP session. (Details of how this is to be done are tbd, | a single RTP session. (Details of how this is to be done are tbd, | |||
but see [I-D.lennox-rtcweb-rtp-media-type-mux], | but see [I-D.lennox-rtcweb-rtp-media-type-mux], | |||
[I-D.holmberg-mmusic-sdp-bundle-negotiation] and | [I-D.holmberg-mmusic-sdp-bundle-negotiation] and | |||
[I-D.westerlund-avtcore-multiplex-architecture].) Using a single RTP | [I-D.westerlund-avtcore-multiplex-architecture].) Using a single RTP | |||
session also affects the possibility for differentiated treament of | session also affects the possibility for differentiated treatment of | |||
media flows. This is further discussed in Section 12.9. | media flows. This is further discussed in Section 12.9. | |||
WebRTC implementations of RTP are REQUIRED to support multiplexing of | WebRTC implementations of RTP are REQUIRED to support multiplexing of | |||
a multimedia session onto a single RTP session according to (tbd). | a multimedia session onto a single RTP session according to (tbd). | |||
If such RTP session multiplexing is to be used, this MUST be | If such RTP session multiplexing is to be used, this MUST be | |||
negotiated during the signalling phase. Support for multiple RTP | negotiated during the signalling phase. Support for multiple RTP | |||
sessions over a single UDP flow as defined by | sessions over a single UDP flow as defined by | |||
[I-D.westerlund-avtcore-transport-multiplexing] is RECOMMENDED. | [I-D.westerlund-avtcore-transport-multiplexing] is RECOMMENDED/ | |||
OPTIONAL. | ||||
4.5. RTP and RTCP Multiplexing | (tbd: No consensus on the level of including support of Multiple RTP | |||
sessions over a single UDP flow.) | ||||
Historically, RTP and RTCP have been run on separate transport-layer | 4.5. RTP and RTCP Multiplexing | |||
ports (e.g., two UDP ports for each RTP session, one port for RTP and | ||||
one port for RTCP). With the increased use of Network Address/Port | ||||
Translation (NAPT) this has become problematic, since maintaining | ||||
multiple NAT bindings can be costly. It also complicates firewall | ||||
administration, since multiple ports must be opened to allow RTP | ||||
traffic. To reduce these costs and session setup times, support for | ||||
multiplexing RTP data packets and RTCP control packets on a single | ||||
port [RFC5761] for each RTP session is REQUIRED. | ||||
(tbd: Are WebRTC implementations required to support the case where | Historically, RTP and RTCP have been run on separate transport layer | |||
the RTP and RTCP are run on separate UDP ports, for interoperability | addresses (e.g., two UDP ports for each RTP session, one port for RTP | |||
with legacy systems?) | and one port for RTCP). With the increased use of Network Address/ | |||
Port Translation (NAPT) this has become problematic, since | ||||
maintaining multiple NAT bindings can be costly. It also complicates | ||||
firewall administration, since multiple ports need to be opened to | ||||
allow RTP traffic. To reduce these costs and session setup times, | ||||
support for multiplexing RTP data packets and RTCP control packets on | ||||
a single port for each RTP session is REQUIRED, as specified in | ||||
[RFC5761]. For backwards compatibility, implementations are also | ||||
REQUIRED to support sending of RTP and RTCP to separate destination | ||||
ports. | ||||
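When RTP and RTCP share one port, a receiver has to tell the two apart from the packet itself. The sketch below assumes the rule from Section 4 of [RFC5761] as we read it: RTP payload types 64-95 are not used, so a second octet in the range 192-223 can only be an RTCP packet type. The handler callbacks are hypothetical placeholders.

```python
def is_rtcp(packet: bytes) -> bool:
    """Classify a datagram received on a multiplexed RTP/RTCP port.

    Assumes RTP payload types 64-95 are never used, so a second octet
    in 192-223 (marker bit + PT 64-95) can only be an RTCP packet type.
    """
    if len(packet) < 2 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP/RTCP version 2 packet")
    return 192 <= packet[1] <= 223


def demultiplex(packet: bytes, on_rtp, on_rtcp) -> None:
    """Dispatch one datagram to the (hypothetical) RTP or RTCP handler."""
    (on_rtcp if is_rtcp(packet) else on_rtp)(packet)
```

When the backwards-compatible mode with separate ports is in use, the classification comes from the destination port instead, and this check is not needed.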
Note that the use of RTP and RTCP multiplexed onto a single transport | Note that the use of RTP and RTCP multiplexed onto a single transport | |||
port ensures that there is occasional traffic sent on that port, even | port ensures that there is occasional traffic sent on that port, even | |||
if there is no active media traffic. This may be useful to keep- | if there is no active media traffic. This can be useful to keep NAT | |||
alive NAT bindings, and is the recommended method for application level | bindings alive, and is the recommended method for application level | |||
keep-alives of RTP sessions [RFC6263]. | keep-alives of RTP sessions [RFC6263]. | |||
4.6. Reduced Size RTCP | 4.6. Reduced Size RTCP | |||
RTCP packets are usually sent as compound RTCP packets, and [RFC3550] | RTCP packets are usually sent as compound RTCP packets, and [RFC3550] | |||
requires that those compound packets start with a Sender Report (SR) | requires that those compound packets start with a Sender Report (SR) | |||
or Receiver Report (RR) packet. When using frequent RTCP feedback | or Receiver Report (RR) packet. When using frequent RTCP feedback | |||
messages, these general statistics are not needed in every packet and | messages, these general statistics are not needed in every packet and | |||
unnecessarily increase the mean RTCP packet size. This can limit the | unnecessarily increase the mean RTCP packet size. This can limit the | |||
frequency at which RTCP packets can be sent within the RTCP bandwidth | frequency at which RTCP packets can be sent within the RTCP bandwidth | |||
share. | share. | |||
To avoid this problem, [RFC5506] specifies how to reduce the mean | To avoid this problem, [RFC5506] specifies how to reduce the mean | |||
RTCP message and allow for more frequent feedback. Frequent | RTCP message size and allow for more frequent feedback. Frequent | |||
feedback, in turn, is essential to make real-time application quickly | feedback, in turn, is essential to make real-time applications | |||
aware of changing network conditions and allow them to adapt their | quickly aware of changing network conditions, and to allow them to | |||
transmission and encoding behaviour. Support for RFC5506 is | adapt their transmission and encoding behaviour. Support for sending | |||
REQUIRED. | RTCP feedback packets as [RFC5506] non-compound packets is REQUIRED | |||
when signalled. For backwards compatibility, implementations are | ||||
also REQUIRED to support the use of compound RTCP feedback packets. | ||||
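The difference between compound and reduced-size RTCP can be sketched as a receiver check that walks the RTCP length fields. This is a simplified illustration: it only checks that a compound packet starts with an SR or RR, and does not verify the other RFC 3550 compound-packet rules (such as the presence of an SDES CNAME item). The reduced_size_signalled flag stands in for whatever the signalling layer reports.

```python
RTCP_SR, RTCP_RR = 200, 201


def split_rtcp(datagram: bytes) -> list:
    """Split a UDP payload into individual RTCP packets using their length fields."""
    packets, offset = [], 0
    while offset < len(datagram):
        if len(datagram) - offset < 4 or datagram[offset] >> 6 != 2:
            raise ValueError("malformed RTCP header")
        # The length field counts 32-bit words minus one.
        size = (int.from_bytes(datagram[offset + 2:offset + 4], "big") + 1) * 4
        if offset + size > len(datagram):
            raise ValueError("truncated RTCP packet")
        packets.append(datagram[offset:offset + size])
        offset += size
    return packets


def is_compound(datagram: bytes) -> bool:
    """True if the datagram starts with an SR or RR, as compound packets must."""
    packets = split_rtcp(datagram)
    return bool(packets) and packets[0][1] in (RTCP_SR, RTCP_RR)


def accept_rtcp(datagram: bytes, reduced_size_signalled: bool) -> bool:
    """Compound packets are always accepted; reduced-size ones only when signalled."""
    return is_compound(datagram) or reduced_size_signalled
```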
4.7. Symmetric RTP/RTCP | 4.7. Symmetric RTP/RTCP | |||
To ease traversal of NAT and firewall devices, implementations are | To ease traversal of NAT and firewall devices, implementations are | |||
REQUIRED to implement Symmetric RTP [RFC4961]. This requires that | REQUIRED to implement and use Symmetric RTP [RFC4961]. This requires | |||
the IP address and port used for sending and receiving RTP and RTCP | that the IP address and port used for sending and receiving RTP and | |||
packets are identical. The reasons for using symmetric RTP are | RTCP packets are identical. The reasons for using symmetric RTP are | |||
primarily to avoid issues with NAT and Firewalls by ensuring that the | primarily to avoid issues with NAT and Firewalls by ensuring that the | |||
flow is actually bi-directional and thus kept alive and registered as | flow is actually bi-directional and thus kept alive and registered as | |||
a flow the intended recipient actually wants. In addition it saves | a flow the intended recipient actually wants. In addition, it saves | |||
resources in the form of ports at the end-points, but also in the | resources, specifically ports at the end-points, but also in the | |||
network as NAT mappings or firewall state is not unnecessarily bloated. | network as NAT mappings or firewall state is not unnecessarily bloated. | |||
Also the amount of QoS state is reduced. | Also the amount of QoS state is reduced. | |||
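Symmetric RTP amounts to using a single socket for both directions, so the source address a peer observes is also the address on which the endpoint receives. The sketch below demonstrates this on the loopback interface. Binding to port 0 (letting the operating system choose) is an assumption made for the demonstration; a real endpoint would use the port negotiated in signalling.

```python
import socket


def open_rtp_socket(host: str = "127.0.0.1") -> socket.socket:
    """Open one UDP socket used for BOTH sending and receiving (symmetric RTP)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, 0))  # port 0: OS-chosen; real stacks use the negotiated port
    sock.settimeout(2.0)
    return sock


def symmetric_exchange() -> dict:
    """Send A -> B, then have B reply to the source address it observed."""
    a, b = open_rtp_socket(), open_rtp_socket()
    try:
        a.sendto(b"rtp-from-a", b.getsockname())
        _, seen_by_b = b.recvfrom(2048)
        # B replies to the observed source, which is also where A listens.
        b.sendto(b"rtp-from-b", seen_by_b)
        reply, reply_src = a.recvfrom(2048)
        return {
            "a_addr": a.getsockname(),
            "b_addr": b.getsockname(),
            "seen_by_b": seen_by_b,
            "reply": reply,
            "reply_src": reply_src,
        }
    finally:
        a.close()
        b.close()
```

Because each side sends from the port it receives on, a NAT between them sees one bidirectional flow instead of two unrelated ones.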
4.8. Generation of the RTCP Canonical Name (CNAME) | 4.8. Choice of RTP Synchronisation Source (SSRC) | |||
Implementations are REQUIRED to support signalled RTP SSRC values, | ||||
using the "a=ssrc:" SDP attribute defined in Sections 4.1 and 5 of | ||||
[RFC5576], and MUST also support the "previous-ssrc" source attribute | ||||
defined in Section 6.2 of [RFC5576]. Other attributes defined in | ||||
[RFC5576] MAY be supported. | ||||
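A signalled SSRC arrives as an "a=ssrc:" line with an SSRC identifier followed by a source attribute and an optional value. The sketch below splits such a line into its parts. The example values are hypothetical, and attribute-specific value parsing (including that of "previous-ssrc") is deliberately left out.

```python
def parse_ssrc_attribute(line: str) -> tuple:
    """Parse 'a=ssrc:<ssrc-id> <attribute>[:<value>]' into (ssrc, attribute, value)."""
    prefix = "a=ssrc:"
    if not line.startswith(prefix):
        raise ValueError("not an a=ssrc: line")
    ssrc_text, _, attribute = line[len(prefix):].strip().partition(" ")
    ssrc = int(ssrc_text)
    if not 0 <= ssrc < 2**32:
        raise ValueError("SSRC out of 32-bit range")
    # Only the first ':' separates name from value; the value may contain more.
    name, sep, value = attribute.partition(":")
    return ssrc, name, (value if sep else None)
```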
Use of the "a=ssrc:" attribute is OPTIONAL. Implementations MUST | ||||
support random SSRC assignment, and MUST support SSRC collision | ||||
detection and resolution, both according to [RFC3550]. | ||||
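Random SSRC assignment with collision resolution can be sketched as follows. This simplification treats any remote packet carrying the local SSRC as a collision. RFC 3550 additionally uses transport addresses to distinguish collisions from loops, which this sketch omits. The class and method names are hypothetical.

```python
import secrets


class SsrcState:
    """Random SSRC choice with simplified collision handling (loop detection omitted)."""

    def __init__(self):
        self.known_remote = set()
        self.local = self._fresh()

    def _fresh(self) -> int:
        while True:
            candidate = secrets.randbits(32)  # uniformly random 32-bit value
            if candidate not in self.known_remote:
                return candidate

    def on_remote_ssrc(self, ssrc: int):
        """Record a remote SSRC; on collision, pick a new local SSRC.

        Returns the abandoned SSRC (for which an RTCP BYE should be sent), or None.
        """
        self.known_remote.add(ssrc)
        if ssrc != self.local:
            return None
        old = self.local
        self.local = self._fresh()
        return old
```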
4.9. Generation of the RTCP Canonical Name (CNAME) | ||||
The RTCP Canonical Name (CNAME) provides a persistent transport-level | The RTCP Canonical Name (CNAME) provides a persistent transport-level | |||
identifier for an RTP endpoint. While the Synchronisation Source | identifier for an RTP endpoint. While the Synchronisation Source | |||
(SSRC) identifier for an RTP endpoint may change if a collision is | (SSRC) identifier for an RTP endpoint can change if a collision is | |||
detected, or when the RTP application is restarted, it's RTCP CNAME | detected, or when the RTP application is restarted, its RTCP CNAME is | |||
is meant to stay unchanged, so that RTP endpoints can be uniquely | meant to stay unchanged, so that RTP endpoints can be uniquely | |||
identified and associated with their RTP media streams. For proper | identified and associated with their RTP media streams within a set | |||
functionality, each RTP endpoint needs to have a unique RTCP CNAME | of related RTP sessions. For proper functionality, each RTP endpoint | |||
value. | needs to have a unique RTCP CNAME value. | |||
The RTP specification [RFC3550] includes guidelines for choosing a | The RTP specification [RFC3550] includes guidelines for choosing a | |||
unique RTP CNAME, but these are not sufficient in the presence of NAT | unique RTP CNAME, but these are not sufficient in the presence of NAT | |||
devices. In addition, some may find long-term persistent identifiers | devices. In addition, long-term persistent identifiers can be | |||
problematic from a privacy viewpoint. Accordingly, support for | problematic from a privacy viewpoint. Accordingly, support for | |||
generating short-term persistent RTCP CNAMEs following method (b) | generating short-term persistent RTCP CNAMEs following method (b) | |||
specified in Section 4.2 of "Guidelines for Choosing RTP Control | specified in Section 4.2 of "Guidelines for Choosing RTP Control | |||
Protocol (RTCP) Canonical Names (CNAMEs)" [RFC6222] is REQUIRED, | Protocol (RTCP) Canonical Names (CNAMEs)" [RFC6222] is RECOMMENDED. | |||
since this addresses both concerns. | Note, however, that this does not resolve the privacy concern as | |||
there is not sufficient randomness to avoid tracking of an end-point. | ||||
A WebRTC end-point MUST support reception of any CNAME that matches | ||||
the syntax limitations specified by the RTP specification [RFC3550] | ||||
and cannot assume that any CNAME will follow the recommended | ||||
form above. | ||||
(tbd: there seems to be a growing consensus that the working group | ||||
wants randomly-chosen CNAME values; need to reference a draft that | ||||
describes how this is to be done) | ||||
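To illustrate the randomly-chosen CNAMEs mentioned in the note above, the sketch below base64-encodes 96 bits from a cryptographically strong random source, giving a 16-character string. This is an assumed scheme shown for illustration; it is not presented as method (b) of [RFC6222]. The receiver-side check reflects only that any CNAME is accepted as long as it fits the 8-bit SDES item length field.

```python
import base64
import secrets


def random_cname() -> str:
    """96 bits of cryptographically strong randomness, base64-encoded (16 characters)."""
    return base64.b64encode(secrets.token_bytes(12)).decode("ascii")


def is_receivable_cname(cname: str) -> bool:
    """Accept any CNAME whose UTF-8 encoding fits the 8-bit SDES item length."""
    return len(cname.encode("utf-8")) <= 255
```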
5. WebRTC Use of RTP: Extensions | 5. WebRTC Use of RTP: Extensions | |||
There are a number of RTP extensions that are either required to | There are a number of RTP extensions that are either needed to obtain | |||
obtain full functionality, or extremely useful to improve on the | full functionality, or extremely useful to improve on the baseline | |||
baseline performance, in the WebRTC application context. One set of | performance, in the WebRTC application context. One set of these | |||
these extensions is related to conferencing, while others are more | extensions is related to conferencing, while others are more generic | |||
generic in nature. The following subsections describe the various | in nature. The following subsections describe the various RTP | |||
RTP extensions mandated or strongly recommended within WebRTC. | extensions mandated or suggested for use within the WebRTC context. | |||
5.1. Conferencing Extensions | 5.1. Conferencing Extensions | |||
RTP is inherently a group communication protocol. Groups can be | RTP is inherently a group communication protocol. Groups can be | |||
implemented using a centralised server, multi-unicast, or using IP | implemented using a centralised server, multi-unicast, or using IP | |||
multicast. While IP multicast was popular in early deployments, in | multicast. While IP multicast was popular in early deployments, in | |||
today's practice, overlay-based conferencing dominates, typically | today's practice, overlay-based conferencing dominates, typically | |||
using one or more central servers to connect endpoints in a star or | using one or more central servers to connect endpoints in a star or | |||
flat tree topology. These central servers can be implemented in a | flat tree topology. These central servers can be implemented in a | |||
number of ways as discussed in Appendix A, and in the memo on RTP | number of ways as discussed in Appendix A, and in the memo on RTP | |||
Topologies [RFC5117]. | Topologies [RFC5117]. | |||
As discussed in Section 3.5 of [RFC5117], the use of a video | As discussed in Section 3.5 of [RFC5117], the use of a video | |||
switching MCU makes the use of RTCP for congestion control, or any | switching MCU makes the use of RTCP for congestion control, or any | |||
type of quality reports, very problematic. Also, as discussed in | type of quality reports, very problematic. Also, as discussed in | |||
section 3.6 of [RFC5117], the use of a content modifying MCU with | section 3.6 of [RFC5117], the use of a content modifying MCU with | |||
RTCP termination breaks RTP loop detection and removes the ability | RTCP termination breaks RTP loop detection and removes the ability | |||
for receivers to identify active senders. Accordingly, only RTP | for receivers to identify active senders. RTP Transport Translators | |||
Transport Translators (relays), RTP Mixers, and end-point based | (Topo-Translator) are not of immediate interest to WebRTC, although | |||
forwarding topologies are supported in WebRTC. These RECOMMENDED | the main difference compared to point to point is the possibility of | |||
topologies are expected to be supported by all WebRTC end-points | seeing multiple different transport paths in any RTCP feedback. | |||
(these three topologies require no special support in the end-point, | Accordingly, only Point to Point (Topo-Point-to-Point), Multiple | |||
if the RTP features mandated in this memo are implemented). | concurrent Point to Point (Mesh) and RTP Mixers (Topo-Mixer) | |||
topologies are needed to achieve the use-cases to be supported in | ||||
WebRTC initially. These RECOMMENDED topologies are expected to be | ||||
supported by all WebRTC end-points (these topologies require no | ||||
special RTP-layer support in the end-point if the RTP features | ||||
mandated in this memo are implemented). | ||||
The RTP protocol extensions to be used with conferencing, described | The RTP extensions described below to be used with centralised | |||
below, are not required for correctness; an RTP endpoint that does | conferencing -- where one RTP Mixer (e.g., a conference bridge) | |||
not implement these extensions will work correctly, but offer poor | receives a participant's RTP media streams and distributes them to | |||
performance. Support for the listed extensions will greatly improve | the other participants -- are not necessary for interoperability; an | |||
the quality of experience, however, in the context of centralised | RTP endpoint that does not implement these extensions will work | |||
conferencing, where one RTP Mixer (Conference Focus) receives a | correctly, but may offer poor performance. Support for the listed | |||
participants media streams and distribute them to the other | extensions will greatly improve the quality of experience and, to | |||
participants. These messages are defined in the Extended RTP Profile | provide a reasonable baseline quality, some these extensions are | |||
for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/ | mandatory to be supported by WebRTC end-points. | |||
AVPF) [RFC4585] and the "Codec Control Messages in the RTP Audio- | ||||
Visual Profile with Feedback (AVPF)" (CCM) [RFC5104] and are fully | ||||
usable by the Secure variant of this profile (RTP/SAVPF) [RFC5124]. | ||||
5.1.1. Full Intra Request | The RTCP packets assisting in such operation are defined in the | |||
Extended RTP Profile for Real-time Transport Control Protocol (RTCP)- | ||||
Based Feedback (RTP/AVPF) [RFC4585] and the "Codec Control Messages | ||||
in the RTP Audio-Visual Profile with Feedback (AVPF)" (CCM) [RFC5104] | ||||
and are fully usable by the Secure variant of this profile (RTP/ | ||||
SAVPF) [RFC5124]. | ||||
5.1.1. Full Intra Request (FIR) | ||||
The Full Intra Request is defined in Sections 3.5.1 and 4.3.1 of the | The Full Intra Request is defined in Sections 3.5.1 and 4.3.1 of the | |||
Codec Control Messages [RFC5104]. This message is used to have the | Codec Control Messages [RFC5104]. This message is used to make the | |||
mixer request a new Intra picture from a participant in the session. | mixer request a new Intra picture from a participant in the session. | |||
This is used when switching between sources to ensure that the | This is used when switching between sources to ensure that the | |||
receivers can decode the video or other predicted media encoding with | receivers can decode the video or other predictive media encoding | |||
long prediction chains. It is REQUIRED that this feedback message is | with long prediction chains. It is REQUIRED that this feedback | |||
supported by RTP senders in WebRTC, since it greatly improves the | message is supported by RTP senders in WebRTC, since it greatly | |||
user experience when using centralised mixer-based conferencing. | improves the user experience when using centralised mixer-based | |||
conferencing. | ||||
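A FIR is a payload-specific feedback packet (PT 206, FMT 4) carrying one or more entries. Each entry holds a target SSRC and an 8-bit command sequence number. As we read [RFC5104], a repeated FIR from the same requester keeps the same sequence number, so a sender should not produce another decoder refresh point for it. The sketch below applies that rule. Producing the intra picture itself is codec-specific and is not shown.

```python
import struct

RTCP_PSFB = 206  # payload-specific feedback
FMT_FIR = 4


def parse_fir(packet: bytes) -> tuple:
    """Return (requester SSRC, [(media sender SSRC, seq nr), ...]) from a FIR packet."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTCP version 2 packet")
    if packet[1] != RTCP_PSFB or packet[0] & 0x1F != FMT_FIR:
        raise ValueError("not a FIR")
    size = (struct.unpack("!H", packet[2:4])[0] + 1) * 4
    if size < 12 or size > len(packet) or (size - 12) % 8:
        raise ValueError("bad FIR length")
    requester = struct.unpack("!I", packet[4:8])[0]
    # Each FCI entry: SSRC (32 bits), seq nr (8 bits), reserved (24 bits).
    entries = [struct.unpack("!IB3x", packet[off:off + 8]) for off in range(12, size, 8)]
    return requester, entries


class FirResponder:
    """Decide whether a FIR asks this sender for a new decoder refresh point."""

    def __init__(self, local_ssrc: int):
        self.local_ssrc = local_ssrc
        self.last_seq = {}  # requester SSRC -> last seq nr acted upon

    def should_send_intra(self, packet: bytes) -> bool:
        requester, entries = parse_fir(packet)
        for ssrc, seq in entries:
            if ssrc == self.local_ssrc:
                if self.last_seq.get(requester) == seq:
                    return False  # repetition of a request already served
                self.last_seq[requester] = seq
                return True
        return False
```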
5.1.2. Picture Loss Indication | 5.1.2. Picture Loss Indication (PLI) | |||
The Picture Loss Indication is defined in Section 6.3.1 of the RTP/ | The Picture Loss Indication is defined in Section 6.3.1 of the RTP/ | |||
AVPF profile [RFC4585]. It is used by a receiver to tell the sending | AVPF profile [RFC4585]. It is used by a receiver to tell the sending | |||
encoder that it lost the decoder context and would like to have it | encoder that it lost the decoder context and would like to have it | |||
repaired somehow. This is semantically different from the Full Intra | repaired somehow. This is semantically different from the Full Intra | |||
Request above as there can exist multiple methods to fulfil the | Request above as there may be multiple methods to fulfill the | |||
request. It is RECOMMENDED that this feedback message is supported | request. It is REQUIRED that senders understand and react to this | |||
as a loss tolerance mechanism. | feedback message as a loss tolerance mechanism; receivers MAY send | |||
PLI messages. | ||||
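A PLI is a payload-specific feedback packet (PT 206) with FMT 1 and no feedback control information. It is therefore always three 32-bit words: the common header, the SSRC of the packet sender, and the SSRC of the media source. The sketch below builds and recognises one. How the sender repairs the decoder context is codec-specific and left out. Unless reduced-size RTCP has been signalled, the PLI would be sent inside a compound packet.

```python
import struct

RTCP_PSFB = 206
FMT_PLI = 1


def build_pli(sender_ssrc: int, media_ssrc: int) -> bytes:
    """PLI: PSFB with FMT=1 and no FCI, so the length field is 2 (3 words - 1)."""
    first_octet = (2 << 6) | FMT_PLI  # V=2, P=0, FMT=1
    return struct.pack("!BBHII", first_octet, RTCP_PSFB, 2, sender_ssrc, media_ssrc)


def pli_target(packet: bytes):
    """Return the media source SSRC a PLI refers to, or None if it is not a PLI."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        return None
    if packet[1] != RTCP_PSFB or packet[0] & 0x1F != FMT_PLI:
        return None
    return struct.unpack("!I", packet[8:12])[0]
```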
5.1.3. Slice Loss Indication | 5.1.3. Slice Loss Indication (SLI) | |||
The Slice Loss Indicator is defined in Section 6.3.2 of the RTP/AVPF | The Slice Loss Indicator is defined in Section 6.3.2 of the RTP/AVPF | |||
profile [RFC4585]. It is used by a receiver to tell the encoder that | profile [RFC4585]. It is used by a receiver to tell the encoder that | |||
it has detected the loss or corruption of one or more consecutive | it has detected the loss or corruption of one or more consecutive | |||
macroblocks, and would like to have these repaired somehow. The use | macroblocks, and would like to have these repaired somehow. The use | |||
of this feedback message is OPTIONAL as a loss tolerance mechanism. | of this feedback message is OPTIONAL as a loss tolerance mechanism. | |||
5.1.4. Reference Picture Selection Indication | 5.1.4. Reference Picture Selection Indication (RPSI) | |||
Reference Picture Selection Indication (RPSI) is defined in Section | Reference Picture Selection Indication (RPSI) is defined in Section | |||
6.3.3 of the RTP/AVPF profile [RFC4585]. Some video coding standards | 6.3.3 of the RTP/AVPF profile [RFC4585]. Some video coding standards | |||
allow the use of older reference pictures than the most recent one | allow the use of older reference pictures than the most recent one | |||
for predictive coding. If such a codec is in use, and if the | for predictive coding. If such a codec is in use, and if the | |||
encoder has learned about a loss of encoder-decoder synchronicity, a | encoder has learned about a loss of encoder-decoder synchronisation, | |||
known-as-correct reference picture can be used for future coding. | a known-as-correct reference picture can be used for future coding. | |||
The RPSI message allows this to be signalled. The use of this RTCP | The RPSI message allows this to be signalled. | |||
feedback message is OPTIONAL as a loss tolerance mechanism. | ||||
5.1.5. Temporary Maximum Media Stream Bit Rate Request | Support for RPSI messages is OPTIONAL. | |||
5.1.5. Temporal-Spatial Trade-off Request (TSTR) | ||||
The temporal-spatial trade-off request and notification are defined | ||||
in Sections 3.5.2 and 4.3.2 of [RFC5104]. This request can be used | ||||
to ask the video encoder to change the trade-off it makes between | ||||
temporal and spatial resolution, for example to prefer high spatial | ||||
image quality but low frame rate. | ||||
Support for TSTR requests and notifications is OPTIONAL. | ||||
5.1.6. Temporary Maximum Media Stream Bit Rate Request | ||||
This feedback message is defined in Sections 3.5.4 and 4.2.1 of the | This feedback message is defined in Sections 3.5.4 and 4.2.1 of the | |||
Codec Control Messages [RFC5104]. This message and its notification | Codec Control Messages [RFC5104]. This message and its notification | |||
message is used by a media receiver, to inform the sending party that | message are used by a media receiver to inform the sending party that | |||
there is a current limitation on the amount of bandwidth available to | there is a current limitation on the amount of bandwidth available to | |||
this receiver. This can be for various reasons, and can for example | this receiver. This may have various reasons; for example, an RTP | |||
be used by an RTP mixer to limit the media sender being forwarded by | mixer may use this message to limit the media rate of the sender | |||
the mixer (without doing media transcoding) to fit the bottlenecks | being forwarded by the mixer (without doing media transcoding) to fit | |||
existing towards the other session participants. It is REQUIRED that | the bottlenecks existing towards the other session participants. It | |||
this feedback message is supported. | is REQUIRED that this feedback message is supported. An RTP media | |||
stream sender receiving a TMMBR for its SSRC MUST follow the | ||||
limitations set by the message; the sending of TMMBR requests is | ||||
OPTIONAL. | ||||
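Each TMMBR entry carries the target SSRC followed by a 32-bit word with a 6-bit exponent, a 17-bit mantissa, and a 9-bit measured overhead. The maximum bitrate is mantissa * 2^exp bit/s. The sketch below encodes and decodes one entry and applies it to a sender's rate ceiling. It is simplified in two ways: it ignores the bounding-set computation across multiple requesters, and it does not build the TMMBN notification or the enclosing RTPFB (PT 205, FMT 3) header.

```python
import struct


def encode_tmmbr_fci(ssrc: int, bitrate_bps: int, overhead: int) -> bytes:
    """Pack one TMMBR FCI entry: SSRC, then exp (6) | mantissa (17) | overhead (9)."""
    exp, mantissa = 0, bitrate_bps
    while mantissa > 0x1FFFF:  # shrink into 17 bits, rounding the rate down
        mantissa >>= 1
        exp += 1
    word = (exp << 26) | (mantissa << 9) | (overhead & 0x1FF)
    return struct.pack("!II", ssrc, word)


def decode_tmmbr_fci(fci: bytes) -> tuple:
    """Return (target SSRC, maximum bitrate in bit/s, measured overhead in bytes)."""
    ssrc, word = struct.unpack("!II", fci[:8])
    exp = word >> 26
    mantissa = (word >> 9) & 0x1FFFF
    return ssrc, mantissa << exp, word & 0x1FF


def rate_ceiling_after_tmmbr(local_ssrc: int, ceiling_bps: int, fci: bytes) -> int:
    """A sender follows a TMMBR addressed to its own SSRC; others are ignored here."""
    target, limit, _overhead = decode_tmmbr_fci(fci)
    return limit if target == local_ssrc else ceiling_bps
```

Rounding the mantissa down during encoding means that the decoded limit never exceeds the requested one. For example, 300001 bit/s is encoded as 300000 bit/s.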
5.2. Header Extensions | 5.2. Header Extensions | |||
The RTP specification [RFC3550] provides the capability to include | The RTP specification [RFC3550] provides the capability to include | |||
RTP header extensions containing in-band data, but the format and | RTP header extensions containing in-band data, but the format and | |||
semantics of the extensions are poorly specified. The use of header | semantics of the extensions are poorly specified. The use of header | |||
extensions is OPTIONAL in the WebRTC context, but if they are used, | extensions is OPTIONAL in the WebRTC context, but if they are used, | |||
they MUST be formatted and signalled following the general mechanism | they MUST be formatted and signalled following the general mechanism | |||
for RTP header extensions defined in [RFC5285], since this gives | for RTP header extensions defined in [RFC5285], since this gives | |||
well-defined semantics to RTP header extensions. | well-defined semantics to RTP header extensions. | |||
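In the one-byte header form of [RFC5285], the extension block starts with the value 0xBEDE and a length in 32-bit words. It then holds elements whose first byte packs a 4-bit local ID and a 4-bit length-minus-one. Zero bytes are padding, and ID 15 ends processing. The sketch below parses that form only. The two-byte header form, and the mapping from local IDs to extension URIs negotiated in signalling, are out of scope.

```python
import struct

ONE_BYTE_PROFILE = 0xBEDE


def parse_one_byte_extensions(ext: bytes) -> dict:
    """Parse a one-byte-header extension block into {local ID: element data}.

    ext starts at the 16-bit "defined by profile" field of the RTP header extension.
    """
    if len(ext) < 4:
        raise ValueError("truncated extension header")
    profile, words = struct.unpack("!HH", ext[:4])
    if profile != ONE_BYTE_PROFILE:
        raise ValueError("not the one-byte header form")
    body = ext[4:4 + 4 * words]
    if len(body) != 4 * words:
        raise ValueError("truncated extension body")
    elements, i = {}, 0
    while i < len(body):
        if body[i] == 0:  # padding byte
            i += 1
            continue
        ident, size = body[i] >> 4, (body[i] & 0x0F) + 1
        if ident == 15:  # reserved ID: stop processing the block
            break
        i += 1
        if i + size > len(body):
            raise ValueError("element overruns extension block")
        elements[ident] = body[i:i + size]
        i += size
    return elements
```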
skipping to change at page 13, line 11 ¶ | skipping to change at page 15, line 25 ¶ | |||
Many RTP sessions require synchronisation between audio, video, and | Many RTP sessions require synchronisation between audio, video, and | |||
other content. This synchronisation is performed by receivers, using | other content. This synchronisation is performed by receivers, using | |||
information contained in RTCP SR packets, as described in the RTP | information contained in RTCP SR packets, as described in the RTP | |||
specification [RFC3550]. This basic mechanism can be slow, however, | specification [RFC3550]. This basic mechanism can be slow, however, | |||
so it is RECOMMENDED that the rapid RTP synchronisation extensions | so it is RECOMMENDED that the rapid RTP synchronisation extensions | |||
described in [RFC6051] be implemented. The rapid synchronisation | described in [RFC6051] be implemented. The rapid synchronisation | |||
extensions use the general RTP header extension mechanism [RFC5285], | extensions use the general RTP header extension mechanism [RFC5285], | |||
which requires signalling, but are otherwise backwards compatible. | which requires signalling, but are otherwise backwards compatible. | |||
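The basic synchronisation step maps an RTP timestamp onto the sender's NTP wallclock. It uses the (NTP timestamp, RTP timestamp) pair from the most recent SR and the payload format's clock rate. Audio and video packets mapped this way share a common timeline. As we understand [RFC6051], its header extension supplies an NTP timestamp directly in RTP packets, so the same mapping can be made without waiting for an SR. The sketch below shows the arithmetic only, including 32-bit RTP timestamp wraparound.

```python
def ntp64_to_seconds(ntp: int) -> float:
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to seconds."""
    return (ntp >> 32) + (ntp & 0xFFFFFFFF) / 2**32


def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int, sr_ntp_seconds: float,
                     clock_rate: int) -> float:
    """Map an RTP timestamp to sender wallclock using the (NTP, RTP) pair from an SR."""
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 1 << 31:  # interpret as a signed 32-bit difference (wraparound)
        diff -= 1 << 32
    return sr_ntp_seconds + diff / clock_rate
```

For a 90 kHz video clock, an RTP timestamp 45000 ticks after the SR's RTP timestamp corresponds to 0.5 s after the SR's NTP time.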
5.2.2. Client to Mixer Audio Level | 5.2.2. Client-to-Mixer Audio Level | |||
The Client to Mixer Audio Level [RFC6464] is an RTP header extension | The Client to Mixer Audio Level extension [RFC6464] is an RTP header | |||
used by a client to inform a mixer about the level of audio activity | extension used by a client to inform a mixer about the level of audio | |||
in the packet the header is attached to. This enables a central node | activity in the packet to which the header is attached. This enables | |||
to make mixing or selection decisions without decoding or detailed | a central node to make mixing or selection decisions without decoding | |||
inspection of the payload. Thus reducing the needed complexity in | or detailed inspection of the payload, reducing the complexity in | |||
some types of central RTP nodes. It can also be used to save | some types of central RTP nodes. It can also save decoding resources | |||
decoding resources in a WebRTC receiver in a mesh topology, which if | in receivers, which can choose to decode only the most relevant RTP | |||
it has limited decoding resources, may select to decode only the most | media streams based on audio activity levels. | |||
relevant media streams based on audio activity levels. | ||||
The Client-to-Mixer Audio Level [RFC6464] extension is RECOMMENDED to | The Client-to-Mixer Audio Level [RFC6464] extension is RECOMMENDED to | |||
be implemented. If it is implemented, it is REQUIRED that the header | be implemented. If it is implemented, it is REQUIRED that the header | |||
extensions are encrypted according to | extensions are encrypted according to | |||
[I-D.ietf-avtcore-srtp-encrypted-header-ext] since the information | [I-D.ietf-avtcore-srtp-encrypted-header-ext] since the information | |||
contained in these header extensions can be considered sensitive. | contained in these header extensions can be considered sensitive. | |||
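The following sketch shows the one-byte level format of [RFC6464] (a voice-activity flag in the top bit and the level in -dBov in the low seven bits), and how a mixer might rank sources without decoding any audio. The function names and the "loudest n" selection policy are hypothetical, and the sketch operates on header values after any SRTP header-extension decryption.

```python
def encode_audio_level(level_dbov: float, voice: bool) -> int:
    """Pack an RFC 6464 level byte: V flag in the top bit, -dBov (0..127) below it."""
    level = min(127, max(0, round(-level_dbov)))  # clamp to the 7-bit range
    return (0x80 if voice else 0x00) | level


def decode_audio_level(byte: int) -> tuple[bool, int]:
    """Return (voice_activity, level_dbov); 0 dBov is loudest, -127 dBov quietest."""
    return bool(byte & 0x80), -(byte & 0x7F)


def loudest_sources(levels: dict[int, int], n: int) -> list[int]:
    """Hypothetical mixer policy: pick the n SSRCs with the highest audio level."""
    # A smaller -dBov value means a louder signal, so sort ascending on the 7-bit level.
    return sorted(levels, key=lambda ssrc: levels[ssrc] & 0x7F)[:n]
```

For instance, a mixer holding level bytes {1: 0x80 | 40, 2: 0x80 | 10, 3: 60} and forwarding two sources would pick SSRCs 2 and 1.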
5.2.3. Mixer to Client Audio Level | 5.2.3. Mixer-to-Client Audio Level | |||
The Mixer to Client Audio Level header extension [RFC6465] provides | The Mixer to Client Audio Level header extension [RFC6465] provides | |||
the client with the audio level of the different sources mixed into a | the client with the audio level of the different sources mixed into a | |||
common mix by an RTP mixer. This enables a user interface to indicate | common mix by an RTP mixer. This enables a user interface to indicate | |||
the relative activity level of each session participant, rather than | the relative activity level of each session participant, rather than | |||
just being included or not based on the CSRC field. This is a pure | just being included or not based on the CSRC field. This is a pure | |||
optimisation of non-critical functions, and is hence OPTIONAL to | optimisation of non-critical functions, and is hence OPTIONAL to | |||
implement. If it is implemented, it is REQUIRED that the header | implement. If it is implemented, it is REQUIRED that the header | |||
extensions are encrypted according to | extensions are encrypted according to | |||
[I-D.ietf-avtcore-srtp-encrypted-header-ext] since the information | [I-D.ietf-avtcore-srtp-encrypted-header-ext] since the information | |||
contained in these header extensions can be considered sensitive. | contained in these header extensions can be considered sensitive. | |||
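A client receiving [RFC6465] levels could pair them with the CSRC list as shown below. This is a minimal sketch with a hypothetical function name; it assumes the level bytes have already been extracted (and decrypted) from the header extension.

```python
def csrc_levels(csrcs: list[int], level_bytes: bytes) -> dict[int, int]:
    """Pair each CSRC with its RFC 6465 audio level, in dBov (0 loudest, -127 quietest)."""
    # Levels appear in the same order as the CSRC list. The top bit of each byte
    # is not part of the level, so it is masked off. Unmatched entries are ignored.
    return {csrc: -(b & 0x7F) for csrc, b in zip(csrcs, level_bytes)}
```

For example, csrc_levels([0x11, 0x22], bytes([20, 45])) returns {0x11: -20, 0x22: -45}, which a user interface could map onto per-participant activity meters.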
6. WebRTC Use of RTP: Improving Transport Robustness | 6. WebRTC Use of RTP: Improving Transport Robustness | |||
There are some tools that can make RTP flows robust against packet | There are some tools that can make RTP flows robust against packet | |||
loss and reduce the impact on media quality. However they all add | loss and reduce the impact on media quality. However, they all add | |||
extra bits compared to a non-robust stream. These extra bits need to | extra bits compared to a non-robust stream. These extra bits need to | |||
be considered, and the aggregate bit-rate must be rate controlled. | be considered, and the aggregate bit-rate must be rate-controlled. | |||
Thus improving robustness might require a lower base encoding | Thus, improving robustness might require a lower base encoding | |||
quality, but has the potential to give that quality with fewer | quality, but has the potential to deliver that quality with fewer | |||
errors. The mechanisms described in the following sub-sections can | errors. The mechanisms described in the following sub-sections can | |||
be used to improve tolerance to packet loss. | be used to improve tolerance to packet loss. | |||
6.1. Retransmission | 6.1. Negative Acknowledgements and RTP Retransmission | |||
Support for RTP retransmission as defined by "RTP Retransmission | ||||
Payload Format" [RFC4588] is RECOMMENDED. | ||||
The retransmission scheme in RTP allows flexible application of | ||||
retransmissions. Only selected missing packets can be requested by | ||||
the receiver. It also allows for the sender to prioritise between | ||||
missing packets based on the sender's knowledge about their content. | ||||
Compared to TCP, RTP retransmission also allows one to give up on a | ||||
packet that despite retransmission(s) still has not been received | ||||
within a time window. | ||||
"WebRTC Media Transport Requirements" [I-D.cbran-rtcweb-data] raises | ||||
two issues that they think make RTP Retransmission unsuitable for | ||||
WebRTC. We here consider these issues and explain why they are in | ||||
fact not a reason to exclude RTP retransmission from the tool box | ||||
available to WebRTC media sessions. | ||||
The additional latency added by [RFC4588] will exceed the latency | As a consequence of supporting the RTP/SAVPF profile, implementations | |||
threshold for interactive voice and video: RTP Retransmission will | will support negative acknowledgements (NACKs) for RTP data packets | |||
require at least one round trip time for a retransmission request | [RFC4585]. This feedback can be used to inform a sender of the loss | |||
and repair packet to arrive. Thus the general suitability of | of particular RTP packets, subject to the capacity limitations of the | |||
using retransmissions will depend on the actual network path | RTCP feedback channel. A sender can use this information to optimise | |||
latency between the end-points. In many of the actual usages the | the user experience by adapting the media encoding to compensate for | |||
latency between two end-points will be low enough for RTP | known lost packets, for example. | |||
retransmission to be effective. Interactive communication with | ||||
end-to-end delays of 400 ms still provides fair quality. Even | ||||
removing half of that in end-point delays allows functional | ||||
retransmission between end-points on the same continent. In | ||||
addition, some applications may accept temporary delay spikes to | ||||
allow for retransmission of crucial codec information such as | ||||
parameter sets and intra pictures, rather than getting no media at | ||||
all. | ||||
The undesirable increase in packet transmission at the point when | Senders are REQUIRED to understand the Generic NACK message defined | |||
congestion occurs: Congestion loss will impact the rate control's | in Section 6.2.1 of [RFC4585], but MAY choose to ignore this feedback | |||
view of available bit-rate for transmission. When using | (following Section 4.2 of [RFC4585]). Receivers MAY send NACKs for | |||
retransmission, one will have to prioritise between performing | missing RTP packets; [RFC4585] provides some guidelines on when to | |||
retransmissions and the quality one can achieve with one's | send NACKs. It is not expected that a receiver will send a NACK for | |||
adaptable codecs. In many use cases one prefers error-free or low | every lost RTP packet; rather, it should consider the cost of sending | |||
rates of error with reduced base quality over high degrees of | NACK feedback, and the importance of the lost packet, to make an | |||
error at a higher base quality. | informed decision on whether it is worth telling the sender about a | |||
packet loss event. | ||||
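As a sketch of the wire encoding only, the following function packs a set of lost packets into Generic NACK FCI entries as laid out in Section 6.2.1 of [RFC4585]: a 16-bit PID naming one lost packet and a 16-bit BLP bitmask whose least significant bit refers to PID+1. The function name is hypothetical, and deciding which losses are worth reporting is left to the caller, as discussed above.

```python
import struct


def generic_nack_fci(lost: list[int]) -> bytes:
    """Encode lost packets as RFC 4585 Generic NACK FCI entries (16-bit PID, 16-bit BLP).

    `lost` holds extended (unwrapped) sequence numbers, so plain sorting is wrap-safe.
    """
    seqs = sorted(set(lost))
    fci = bytearray()
    i = 0
    while i < len(seqs):
        pid = seqs[i]
        blp = 0
        i += 1
        # Fold later losses within PID+1 .. PID+16 into the bitmask; the least
        # significant bit of BLP refers to PID+1.
        while i < len(seqs) and seqs[i] - pid <= 16:
            blp |= 1 << (seqs[i] - pid - 1)
            i += 1
        fci += struct.pack("!HH", pid & 0xFFFF, blp)
    return bytes(fci)
```

Losses of packets 100, 101, 103, and 120 produce two entries: PID 100 with BLP 0x0005, and PID 120 with BLP 0.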
The WebRTC end-point implementations will need to both select when to | The RTP Retransmission Payload Format [RFC4588] offers the ability to | |||
enable RTP retransmissions based on API settings and measurements of | retransmit lost packets based on NACK feedback. Retransmission needs | |||
the actual round trip time. In addition for each NACK request that a | to be used with care in interactive real-time applications to ensure | |||
media sender receives it will need to make a prioritisation based on | that the retransmitted packet arrives in time to be useful, but can | |||
the importance of the requested media, the probability that the | be effective in environments with relatively low network RTT (an RTP | |||
packet will reach the receiver in time for being usable, the | sender can estimate the RTT to the receivers using the information in | |||
consumption of available bit-rate and the impact on the media quality | RTCP SR and RR packets). The use of retransmissions can also | |||
for new encodings. | increase the forward RTP bandwidth, and can potentially worsen the | |||
problem if the packet loss was caused by network congestion. We | ||||
note, however, that retransmission of an important lost packet to | ||||
repair decoder state may be lower cost than sending a full intra | ||||
frame. It is not appropriate to blindly retransmit RTP packets in | ||||
response to a NACK. The importance of lost packets and the | ||||
likelihood of them arriving in time to be useful need to be | ||||
considered before RTP retransmission is used. | ||||
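The RTT estimate mentioned above follows from Section 6.4.1 of [RFC3550]: the sender subtracts the LSR and DLSR fields of a report block from the report's arrival time. The sketch below computes that estimate and pairs it with a hypothetical usefulness check; the 10 ms margin and the assumption that one-way delay is about half the RTT are illustrative choices, not requirements of this memo.

```python
def rtt_from_report(arrival_mid32: int, lsr: int, dlsr: int) -> float:
    """RTT in seconds from an RTCP report block, per RFC 3550 Section 6.4.1.

    arrival_mid32 is the report's arrival time as the middle 32 bits of an NTP
    timestamp; LSR and DLSR come from the report block. All three use units of
    1/65536 second, and the subtraction is taken modulo 2^32.
    """
    if lsr == 0:
        raise ValueError("receiver has not yet seen a Sender Report (LSR is 0)")
    return ((arrival_mid32 - lsr - dlsr) & 0xFFFFFFFF) / 65536.0


def worth_retransmitting(now: float, playout_deadline: float, rtt: float,
                         margin: float = 0.010) -> bool:
    """Hypothetical policy: resend only if the copy should arrive before playout.

    Assumes the one-way delay is about rtt / 2 and that the receiver's playout
    deadline has been estimated in the sender's clock.
    """
    return now + rtt / 2 + margin < playout_deadline
```

A real sender would combine such a timing check with the importance of the lost packet and the current congestion state before answering a NACK.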
To conclude, the issues raised are implementation concerns that an | Receivers are REQUIRED to implement support for RTP retransmission | |||
implementation needs to take into consideration; they are not | packets [RFC4588]. Senders MAY send RTP retransmission packets in | |||
arguments against including a highly versatile and efficient packet | response to NACKs if the RTP retransmission payload format has been | |||
loss repair mechanism. | negotiated for the session, and if the sender believes it is useful | |||
to send a retransmission of the packet(s) referenced in the NACK. An | ||||
RTP sender is not expected to retransmit every NACKed packet. | ||||
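For concreteness, a minimal sketch of an [RFC4588] retransmission packet in SSRC-multiplexed mode: the retransmission stream has its own SSRC and sequence numbers and reuses the original timestamp, and its payload starts with the 2-byte original sequence number (OSN). The function names are hypothetical, copying the marker bit is an assumption of this sketch, and the RTP header is limited to the fixed 12 bytes (no CSRCs or header extensions).

```python
import struct


def build_rtx_packet(orig_seq: int, orig_ts: int, marker: bool, orig_payload: bytes,
                     rtx_pt: int, rtx_seq: int, rtx_ssrc: int) -> bytes:
    """Wrap a lost packet's payload in an RFC 4588 retransmission packet."""
    byte0 = 0x80  # version 2, no padding, no extension, CC = 0
    byte1 = (0x80 if marker else 0x00) | (rtx_pt & 0x7F)  # marker copied from original
    header = struct.pack("!BBHII", byte0, byte1, rtx_seq & 0xFFFF,
                         orig_ts & 0xFFFFFFFF, rtx_ssrc & 0xFFFFFFFF)
    # RFC 4588 payload: original sequence number (OSN), then the original payload.
    return header + struct.pack("!H", orig_seq & 0xFFFF) + orig_payload


def unwrap_rtx_payload(rtx_payload: bytes) -> tuple[int, bytes]:
    """Receiver side: split a retransmission payload into (OSN, original payload)."""
    (osn,) = struct.unpack("!H", rtx_payload[:2])
    return osn, rtx_payload[2:]
```

A receiver uses the OSN recovered by unwrap_rtx_payload() to put the repaired payload back into the original stream's sequence space.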
6.2. Forward Error Correction (FEC) | 6.2. Forward Error Correction (FEC) | |||
Support of some type of FEC to combat the effects of packet loss is | The use of Forward Error Correction (FEC) can provide effective | |||
beneficial, but is heavily application dependent. However, some FEC | protection against some degree of packet loss, at the cost of steady | |||
mechanisms are encumbered. | bandwidth overhead. There are several FEC schemes that are defined | |||
for use with RTP. Some of these schemes are specific to a particular | ||||
The main benefit from FEC is the relatively low additional delay | RTP payload format, others operate across RTP packets and can be used | |||
needed to protect against packet losses. The transmission of any | with any payload format. It should be noted that using redundancy | |||
repair packets should preferably be done with a time delay that is | encoding or FEC will lead to increased playout delay, which should be | |||
just larger than any loss events normally encountered. That way the | considered when choosing the redundancy or FEC formats and their | |||
repair packet isn't also lost in the same event as the source data. | respective parameters. | |||
The number of repair packets needed varies depending on the amount | ||||
and pattern of packet loss to be recovered, and on the mechanism used | ||||
to derive repair data. The latter choice also affects the | ||||
additional delay required both to encode the repair packets and, | ||||
in the receiver, to recover the lost packet(s). | ||||
6.2.1. Basic Redundancy | ||||
The method for providing basic redundancy is simply to retransmit a | ||||
packet that was sent some time earlier. This is relatively simple in | ||||
theory: one saves each outgoing source (original) packet in a buffer, | ||||
marked with its actual transmission time, and some X ms later | ||||
transmits the packet again, where X is selected to be longer than | ||||
common loss events. Thus any loss event shorter than X can be | ||||
recovered, assuming that another loss event does not occur before all | ||||
the packets lost in the first event have been received. | ||||
The downside of basic redundancy is the overhead. To give each packet | ||||
one chance of recovery, the transmission rate increases by 100%, as | ||||
each packet needs to be sent twice. It is possible to send only the | ||||
most important packets redundantly, reducing the overhead below 100% | ||||
at the cost of protecting fewer packets. | ||||
In addition the basic retransmission of the same packet using the | ||||
same SSRC in the same RTP session is not possible in RTP context. | ||||
The reason is that one would then destroy the RTCP reporting if one | ||||
sends the same packet twice with the same sequence number. Thus one | ||||
needs more elaborate mechanisms. | ||||
RTP Payload Format Support: Some RTP payload formats support basic | ||||
redundancy within the RTP payload format itself. Examples are | ||||
AMR-WB [RFC4867] and G.719 [RFC5404]. | ||||
RTP Payload for Redundant Audio Data: This audio and text redundancy | ||||
format defined in [RFC2198] allows for multiple levels of | ||||
redundancy with different delay in their transmissions, as long as | ||||
the source plus payload parts to be redundantly transmitted | ||||
together fit into one MTU. This should work fine for most | ||||
interactive audio and text use cases as both the codec bit-rates | ||||
and the framing intervals normally allow for this requirement to | ||||
hold. This payload format also does not increase the packet rate, as | ||||
original data and redundant data are sent together. This format | ||||
does not allow perfect recovery, only recovery of information | ||||
deemed necessary for audio, for example the sequence number of the | ||||
original data is lost. | ||||
RTP Retransmission Format: The RTP Retransmission Payload format | ||||
[RFC4588] can be used to pro-actively send redundant packets using | ||||
either SSRC or session multiplexing. By using different SSRCs or | ||||
a different session for the redundant packets, the RTCP receiver | ||||
reports will be correct. The retransmission payload format is | ||||
used to recover the packet's original data, thus enabling a perfect | ||||
recovery. | ||||
Duplication Grouping Semantics in the Session Description Protocol: | ||||
This [I-D.begen-mmusic-redundancy-grouping] is a proposal for new | ||||
SDP signalling to indicate media stream duplication using | ||||
different RTP sessions, or different SSRCs to separate the source | ||||
and the redundant copy of the stream. | ||||
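As an example of the [RFC2198] layout, the sketch below builds a redundant audio payload: a 4-byte header for each redundant block (F bit, payload type, 14-bit timestamp offset, 10-bit block length), a 1-byte header for the primary block, then the block data in the same order. The function name is hypothetical; the caller is assumed to supply timestamp offsets in media clock units.

```python
import struct


def build_red_payload(primary_pt: int, primary: bytes,
                      redundant: list[tuple[int, int, bytes]]) -> bytes:
    """Build an RFC 2198 payload from redundant (pt, ts_offset, data) blocks,
    oldest first, followed by the primary encoding."""
    headers = bytearray()
    for pt, ts_offset, data in redundant:
        if ts_offset >= 1 << 14 or len(data) >= 1 << 10:
            raise ValueError("timestamp offset or block length too large for RFC 2198")
        # 4-byte header: F=1 | block PT (7) | timestamp offset (14) | block length (10)
        headers += struct.pack("!I", (1 << 31) | ((pt & 0x7F) << 24)
                               | (ts_offset << 10) | len(data))
    headers.append(primary_pt & 0x7F)  # final 1-byte header: F=0 | PT
    return bytes(headers) + b"".join(data for _, _, data in redundant) + primary
```

With 20 ms frames at an 8 kHz clock, repeating the previous frame gives a timestamp offset of 160, so build_red_payload(0, primary, [(0, 160, previous)]) carries one level of redundancy in the same packet as the primary data.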
6.2.2. Block Based FEC | ||||
Block based redundancy collects a number of source packets into a | ||||
data block for processing. The processing results in some number of | ||||
repair packets that are then transmitted to the other end, allowing the | ||||
receiver to attempt to recover some number of lost packets in the | ||||
block. The benefit of block-based approaches is that the overhead | ||||
can be lower than 100% while still recovering one or more lost source | ||||
packets from the block. The optimal block codes allow for each | ||||
received repair packet to repair a single loss within the block. | ||||
Thus 3 repair packets that are received should allow for any set of 3 | ||||
packets within the block to be recovered. In reality, one commonly | ||||
does not reach this level of performance for every block size and number | ||||
of repair packets, and taking the computational complexity into | ||||
account there are even more trade-offs to make among the codes. | ||||
One result of the block based approach is the extra delay, as one | ||||
needs to collect enough data together before being able to calculate | ||||
the repair packets. In addition, a sufficient amount of the block needs | ||||
to be received prior to recovery. Thus, additional delay is added on | ||||
both the sending and receiving sides to make it possible to recover any | ||||
packet within the block. | ||||
The redundancy overhead and the transmission pattern of source and | ||||
repair data can be altered from block to block, thus allowing an | ||||
adaptive process adjusting to meet the actual amount of loss seen on | ||||
the network path and reported in RTCP. | ||||
The alternatives that exist for block based FEC with RTP are the | ||||
following: | ||||
RTP Payload Format for Generic Forward Error Correction: This RTP | ||||
payload format [RFC5109] defines an XOR based recovery packet. | ||||
This is the simplest block-based FEC scheme in terms of | ||||
processing. It also has limited capabilities, as each repair | ||||
packet can only repair a single loss. To handle multiple closely | ||||
spaced losses, a scheme of hierarchical encodings is needed, | ||||
thus increasing the overhead significantly. | ||||
Forward Error Correction (FEC) Framework: This framework | ||||
[I-D.ietf-fecframe-framework] defines how not only RTP packets but | ||||
also arbitrary packet flows can be protected. Some solutions | ||||
produced or under development in the FECFRAME WG are RTP-specific. | ||||
There exist alternatives supporting block codes such as Reed-Solomon | ||||
and Raptor. | ||||
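The XOR principle behind [RFC5109] can be sketched as follows: one parity packet computed over a group of payloads repairs any single loss in that group, which is why multiple nearby losses need more elaborate (for example hierarchical) protection. Like the length-recovery field in RFC 5109, the sketch XORs each payload's length so the repaired payload can be trimmed, but it omits the FEC header, RTP header protection, and protection levels, so it is not wire-compatible. All names are hypothetical.

```python
import struct
from typing import Optional


def _framed(payload: bytes) -> bytes:
    # Prefix the length so it is protected too (RFC 5109 similarly XORs a length field).
    return struct.pack("!H", len(payload)) + payload


def make_parity(payloads: list[bytes]) -> bytes:
    """XOR of all length-prefixed payloads, zero-padded to the longest."""
    framed = [_framed(p) for p in payloads]
    parity = bytearray(max(len(f) for f in framed))
    for f in framed:
        for i, b in enumerate(f):
            parity[i] ^= b
    return bytes(parity)


def recover_missing(payloads: list[Optional[bytes]], parity: bytes) -> list[bytes]:
    """Repair a single missing payload (marked None) in a protected group."""
    missing = [i for i, p in enumerate(payloads) if p is None]
    if len(missing) != 1:
        raise ValueError("one XOR parity packet repairs exactly one loss per group")
    acc = bytearray(parity)
    for p in payloads:
        if p is not None:
            for i, b in enumerate(_framed(p)):
                acc[i] ^= b
    (length,) = struct.unpack("!H", acc[:2])
    repaired = list(payloads)
    repaired[missing[0]] = bytes(acc[2:2 + length])
    return repaired
```

The overhead here is one parity packet per group, so protecting groups of four packets costs about 25% rather than the 100% of basic redundancy.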
6.2.3. Recommendations for FEC | If an RTP payload format negotiated for use in a WebRTC session | |||
supports redundant transmission or FEC as a standard feature of that | ||||
payload format, then that support MAY be used in the WebRTC session, | ||||
subject to any appropriate signalling. | ||||
Open Issue: Whether FEC is needed and, if so, which FEC scheme is | There are several block-based FEC schemes that are designed for use | |||
to be recommended for support needs to be decided and | with RTP independent of the chosen RTP payload format. At the time | |||
documented. | of this writing there is no consensus on which, if any, of these FEC | |||
schemes is appropriate for use in the WebRTC context. Accordingly, | ||||
this memo makes no recommendation on the choice of block-based FEC | ||||
for WebRTC use. | ||||
7. WebRTC Use of RTP: Rate Control and Media Adaptation | 7. WebRTC Use of RTP: Rate Control and Media Adaptation | |||
WebRTC will be used in very varied network environments with a | WebRTC will be used in very varied network environments with a | |||
hetrogenous set of link technologies, including wired and wireless, | heterogeneous set of link technologies, including wired and wireless, | |||
interconnecting peers at different topological locations resulting in | interconnecting peers at different topological locations resulting in | |||
network paths with widely varying one way delays, bit-rate capacity, | network paths with widely varying one way delays, bit-rate capacity, | |||
load levels and traffic mixes. In addition individual end-points | load levels and traffic mixes. In addition, individual end-points | |||
will open one or more WebRTC sessions between one or more peers. | will open one or more WebRTC sessions between one or more peers. | |||
Each of these sessions may contain different mixes of media and data | Each of these sessions may contain different mixes of media and data | |||
flows. Assymetric usage of media bit-rates and number of media | flows. Asymmetric usage of media bit-rates and number of RTP media | |||
streams is also to be expected. A single end-point may receive zero | streams is also to be expected. A single end-point may receive zero | |||
to many simultanous media streams while itself transmitting one or | to many simultaneous RTP media streams while itself transmitting one | |||
more streams. | or more streams. | |||
From a quality perspective, the WebRTC application is very dependent | From a quality perspective, the WebRTC application is very dependent | |||
on the media adapation working well so that an end-point doesn't | on the media adaptation working well so that an end-point doesn't | |||
transmit significantly more than the path is capable of handling. If | transmit significantly more than the path is capable of handling. If | |||
it did, the result would be high levels of packet loss or delay | it did, the result would be high levels of packet loss or delay | |||
spikes causing media degradations. | spikes causing media quality degradation. | |||
WebRTC applications using more than a single media stream of any | WebRTC applications using more than a single RTP media stream of any | |||
media type or data flows has an additional concern. In this case the | media type or data flows have an additional concern. In this case, | |||
different flows should try to avoid affecting each other negatively. | the different flows should try to avoid affecting each other | |||
In addition in case there is a resource limiation, the available | negatively. In addition, in case there is a resource limitation, the | |||
resources needs to be shared. How to share them is something the | available resources need to be shared. How to share them is | |||
application should prioritize so that the limiation in quality or | something the application should prioritize so that the limitations | |||
capabilities are the ones that provide the least affect on the | in quality or capabilities are those that have the least impact on | |||
application. | the application. | |||
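One possible sharing policy, shown only as a hypothetical sketch since this memo does not specify one, is weighted max-min fairness: each flow receives capacity in proportion to an application-assigned priority weight, flows that need less than their share are capped at their maximum, and the surplus is redistributed among the others.

```python
def share_bandwidth(available_bps: float,
                    flows: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Split capacity among flows in proportion to priority weights.

    flows maps a flow name to (weight, max_bps). Flows whose proportional share
    exceeds their maximum are capped, and the leftover is redistributed among the
    rest (weighted max-min fairness). Weights are assumed to be positive.
    """
    alloc = {name: 0.0 for name in flows}
    active = dict(flows)
    remaining = available_bps
    while active:
        total_weight = sum(w for w, _ in active.values())
        capped = [n for n, (w, cap) in active.items()
                  if remaining * w / total_weight >= cap]
        if not capped:
            for n, (w, _) in active.items():
                alloc[n] = remaining * w / total_weight
            break
        for n in capped:
            alloc[n] = active.pop(n)[1]
            remaining -= alloc[n]
    return alloc
```

With 1 Mbit/s available, weights 1:3:1, and caps of 64 kbit/s (audio), 2 Mbit/s (video), and 500 kbit/s (data), audio is capped at 64 kbit/s and the remaining 936 kbit/s is split 702/234 kbit/s between video and data.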
This hetrogenous situation results in a requirement to have | Overall, the diversity of operating environments leads to the need for | |||
functionality that adapts to the available capacity and that competes | functionality that adapts to the available capacity and that competes | |||
fairly with other network flows. If it did not compete fairly | fairly with other network flows. If it did not compete fairly | |||
enough, WebRTC could be used as an attack method for starving out | enough, WebRTC could be used as an attack method for starving out | |||
other traffic on specific links as long as the attacker is able to | other traffic on specific links as long as the attacker is able to | |||
create traffic across a specific link. This is not far-fetched for a | create traffic across the links in question. A possible attack | |||
web-service capable of attracting large number of end-points and use | scenario is to use a web-service capable of attracting large numbers | |||
the service, combined with BGP routing state a server could pick | of end-points, combined with BGP routing state to have the server | |||
client pairs to drive traffic to specific paths. | pick client pairs to drive traffic to specific paths. | |||
The above estalish a clear need based on several reasons why there | The above clearly motivates the need for a well working media | |||
need to be a well working media adaptation mechanism. This mechanism | adaptation mechanism. This mechanism also has a number of | |||
also has a number of requirements on what services it should provide | requirements on what services it should provide and what performance | |||
and what performance it needs to provide. | it needs to provide. | |||
The biggest issue is that there is no standardised, ready-to-use | The biggest issue is that there is no standardised, ready-to-use | |||
mechanism that can simply be included in WebRTC. Thus there will be | mechanism that can simply be included in WebRTC. Thus, there will be | |||
need for the IETF to produce such a specification. Therefore the | a need for the IETF to produce such a specification. Therefore, the | |||
suggested way forward is to specify requirements on any solution for | suggested way forward is to specify requirements on any solution for | |||
the media adaptation. These requirements is for now proposed to be | the media adaptation. For now, we propose that these requirements be | |||
documented in this specification. In addition a proposed detailed | documented in this specification. In addition, a proposed detailed | |||
solution will be developed, but is expected to take longer to | solution will be developed, but is expected to take longer to | |||
finalize than this document. | finalize than this document. | |||
7.1. Congestion Control Requirements | 7.1. Congestion Control Requirements | |||
Requirements for congestion control of WebRTC sessions are discussed | Requirements for congestion control of WebRTC sessions are discussed | |||
in [I-D.jesup-rtp-congestion-reqs]. | in [I-D.jesup-rtp-congestion-reqs]. | |||
Implementations are REQUIRED to implement the RTP circuit breakers | Implementations are REQUIRED to implement the RTP circuit breakers | |||
described in [I-D.perkins-avtcore-rtp-circuit-breakers]. | described in [I-D.perkins-avtcore-rtp-circuit-breakers]. | |||
(tbd: Should add the RTP/RTCP mechanisms that a WebRTC | ||||
implementation is required to support. Potential candidates include | ||||
Transmission Timestamps (RFC 5450).) | ||||
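Circuit-breaker style checks compare a stream's sending rate with what a TCP flow could achieve on the same path. As a sketch only, the code below assumes the TCP throughput equation of [RFC5348] Section 3.1, with b = 1 and t_RTO approximated as 4 * RTT, and a hypothetical trip factor; the actual conditions are those defined in [I-D.perkins-avtcore-rtp-circuit-breakers], not this code, and the function names are illustrative.

```python
import math


def tcp_friendly_rate_bps(s_bytes: float, rtt_s: float, p: float, b: float = 1.0) -> float:
    """TCP throughput equation of RFC 5348 Section 3.1, returned in bits per second,
    with t_RTO approximated as 4 * RTT."""
    if p <= 0:
        return math.inf  # no loss observed: the equation gives no upper bound
    t_rto = 4 * rtt_s
    denom = (rtt_s * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return 8 * s_bytes / denom


def circuit_breaker_trips(send_rate_bps: float, s_bytes: float, rtt_s: float,
                          p: float, factor: float = 10.0) -> bool:
    """Hypothetical check: trip if sending far above the TCP-compatible rate."""
    return send_rate_bps > factor * tcp_friendly_rate_bps(s_bytes, rtt_s, p)
```

With 1000-byte packets, a 100 ms RTT, and 1% loss, the equation gives roughly 0.9 Mbit/s, so a stream sending 20 Mbit/s on that path would trip this check.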
7.2. Rate Control Boundary Conditions | 7.2. Rate Control Boundary Conditions | |||
The session establishment signalling will establish certain boundaries | The session establishment signalling will establish certain boundaries | |||
within which the media bit-rate adaptation can act. First, the set of | within which the media bit-rate adaptation can act. First, the set of | |||
media codecs provides practical limits on the bit-rate range in which | media codecs provides practical limits on the bit-rate range in which | |||
useful quality can be achieved, and on the packetization choices that | useful quality can be achieved, and on the packetization choices that | |||
exist. Next, the signalling can establish maximum media bit-rate | exist. Next, the signalling can establish maximum media bit-rate | |||
boundaries using SDP b=AS or b=CT. | boundaries using SDP b=AS or b=CT. | |||
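As a hypothetical sketch of how these limits could feed into rate control, the functions below read b=AS and b=CT lines (both expressed in kilobits per second per RFC 4566) and clamp a congestion controller's target rate to them. Treating b=CT, a conference total, as a simple cap on one stream is a simplification of this sketch, and the function names are illustrative.

```python
def sdp_bandwidth_limits(media_section: str) -> dict[str, int]:
    """Collect b=AS and b=CT values from SDP lines, converted from kbit/s to bit/s."""
    limits: dict[str, int] = {}
    for line in media_section.splitlines():
        line = line.strip()
        if not line.startswith("b="):
            continue
        bwtype, _, value = line[2:].partition(":")
        if bwtype in ("AS", "CT") and value.isdigit():
            limits[bwtype] = int(value) * 1000  # RFC 4566: kilobits per second
    return limits


def clamp_target_rate(target_bps: float, limits: dict[str, int]) -> float:
    """Keep the rate controller's target within every signalled maximum."""
    return min([target_bps, *limits.values()])
```

A media section containing "b=AS:512" would therefore limit an 800 kbit/s target to 512 kbit/s.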
7.3. RTCP Limiations | (tbd: This section needs expanding on how to use these limits) | |||
7.3. RTCP Limitations for Congestion Control | ||||
Experience with the congestion control algorithms of TCP [RFC5681], | Experience with the congestion control algorithms of TCP [RFC5681], | |||
TFRC [RFC5348], and DCCP [RFC4341], [RFC4342], [RFC4828], has shown | TFRC [RFC5348], and DCCP [RFC4341], [RFC4342], [RFC4828], has shown | |||
that feedback on packet arrivals needs to be sent roughly once per | that feedback on packet arrivals needs to be sent roughly once per | |||
round trip time. We note that the capabilities of real-time media | round trip time. We note that real-time media traffic may not | |||
traffic to adapt to changing path conditions may be less rapid than | have to adapt to changing path conditions as rapidly as needed for | |||
for the elastic applications TCP was designed for, but frequent | the elastic applications TCP was designed for, but frequent feedback | |||
feedback is still required to allow the congestion control algorithm | is still required to allow the congestion control algorithm to track | |||
to track the path dynamics. | the path dynamics. | |||
The total RTCP bandwidth is limited in its transmission rate to a | The total RTCP bandwidth is limited in its transmission rate to a | |||
fraction of the RTP traffic (by default 5%). RTCP packets are larger | fraction of the RTP traffic (by default 5%). RTCP packets are larger | |||
than, e.g., TCP ACKs (even when non-compound RTCP packets are used). | than, e.g., TCP ACKs (even when non-compound RTCP packets are used). | |||
The media stream bit rate thus limits the maximum feedback rate as a | The RTP media stream bit rate thus limits the maximum feedback rate | |||
function of the mean RTCP packet size. | as a function of the mean RTCP packet size. | |||
Interactive communication may not be able to afford waiting for | Interactive communication may not be able to afford waiting for | |||
packet losses to occur to indicate congestion, because an increase in | packet losses to occur to indicate congestion, because an increase in | |||
playout delay due to queuing (most prominent in wireless networks) | playout delay due to queuing (most prominent in wireless networks) | |||
may easily lead to packets being dropped due to late arrival at the | may easily lead to packets being dropped due to late arrival at the | |||
receiver. Therefore, more sophisticated cues may need to be reported | receiver. Therefore, more sophisticated cues may need to be reported | |||
-- to be defined in a suitable congestion control framework as noted | -- to be defined in a suitable congestion control framework as noted | |||
above -- which, in turn, increases the report size again. For | above -- which, in turn, increases the report size again. For | |||
example, different RTCP XR report blocks (jointly) provide the | example, different RTCP XR report blocks (jointly) provide the | |||
necessary details to implement a variety of congestion control | necessary details to implement a variety of congestion control | |||
skipping to change at page 20, line 10 ¶ | skipping to change at page 20, line 7 ¶ | |||
In group communication, the RTCP bandwidth needs to be | In group communication, the RTCP bandwidth needs to be | |||
shared by all group members, reducing the capacity and thus the | shared by all group members, reducing the capacity and thus the | |||
reporting frequency per node. | reporting frequency per node. | |||
Example: assuming 512 kbit/s video, the default 5% RTCP share yields | Example: assuming 512 kbit/s video, the default 5% RTCP share yields | |||
3200 bytes/s of RTCP bandwidth, split across two entities in a | 3200 bytes/s of RTCP bandwidth, split across two entities in a | |||
point-to-point session. An | point-to-point session. An | |||
endpoint could thus send a report of 100 bytes about every 70ms or | endpoint could thus send a report of 100 bytes about every 70ms or | |||
for every other frame in a 30 fps video. | for every other frame in a 30 fps video. | |||
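The arithmetic in the example can be reproduced with a small helper. It is a simplification, assuming the RTCP share is split evenly among members and ignoring the minimum interval, randomisation, and sender/receiver split of [RFC3550]; the function name is hypothetical.

```python
def mean_rtcp_interval(media_bps: float, mean_rtcp_bytes: float, members: int,
                       rtcp_fraction: float = 0.05) -> float:
    """Average seconds between RTCP reports from one member.

    Ignores RFC 3550's minimum interval, randomisation, and sender/receiver
    bandwidth split, and assumes the RTCP share is divided evenly among members.
    """
    rtcp_bytes_per_s = media_bps * rtcp_fraction / 8
    return mean_rtcp_bytes * members / rtcp_bytes_per_s
```

mean_rtcp_interval(512_000, 100, 2) evaluates to 0.0625 s, i.e. a report roughly every 62.5 ms, in line with the approximate figure in the example.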
7.4. Legacy Interop Limitations | 7.4. Congestion Control Interoperability With Legacy Systems | |||
Congestion control interoperability with most types of legacy devices, | ||||
even using a translator, could be difficult. There are numerous | ||||
reasons for this: | ||||
No RTCP Support: There exist legacy implementations that do not | ||||
even implement RTCP at all. Thus no feedback at all is provided. | ||||
RTP/AVP Minimal RTCP Interval of 5s: RTP [RFC3550] under the RTP/AVP | ||||
profile specifies a recommended minimal fixed interval of 5 | ||||
seconds. Sending RTCP report blocks only once every 5 seconds makes | ||||
it very difficult for a sender to use these reports and react to | ||||
any congestion event. | ||||
RTP/AVP Scaled Minimal Interval: If a legacy device uses the scaled | There are legacy implementations that do not implement RTCP, and | |||
minimal RTCP compound interval, the "RECOMMENDED value for the | hence do not provide any congestion feedback. Congestion control | |||
reduced minimum in seconds is 360 divided by the session bandwidth | cannot be performed with these end-points. WebRTC implementations | |||
in kilobits/second" ([RFC3550], section 6.2). The minimal | that must interwork with such end-points MUST limit their | |||
interval drops below a second, still several times the RTT in | transmission to a low rate, equivalent to a VoIP call using a low | |||
almost all paths in the Internet, when the session bandwidth | bandwidth codec, that is unlikely to cause any significant | |||
becomes 360 kbps. A session bandwidth of 1 Mbps still has a | congestion. | |||
minimal interval of 360 ms. Thus, with the exception of rather | ||||
high bandwidth sessions, getting frequent enough RTCP Report | ||||
Blocks to report on the order of the RTT is very difficult as long | ||||
as the legacy device uses the RTP/AVP profile. | ||||
RTP/AVPF Supporting Legacy Device: If a legacy device supports RTP/ | When interworking with legacy implementations that support RTCP using | |||
AVPF, this enables negotiation of important parameters for | the RTP/AVP profile [RFC3551], congestion feedback is provided in | |||
frequent reporting, such as the "trr-int" parameter, and the | RTCP RR packets every few seconds. Implementations that are required | |||
possibility that the end-point supports some useful feedback | to interwork with such end-points MUST ensure that they keep within | |||
format for congestion control purposes, such as TMMBR [RFC5104]. | the RTP circuit breaker [I-D.perkins-avtcore-rtp-circuit-breakers] | |||
constraints to limit the congestion they can cause. | ||||
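The arithmetic behind the scaled minimal interval quoted above can be checked with a few lines of Python. This is a minimal sketch of that single RFC 3550 formula only; the function and constant names are hypothetical, and the full RTCP interval computation (randomization, scaling by the number of members) is deliberately left out.

```python
RTP_AVP_FIXED_MIN_S = 5.0  # recommended fixed minimum RTCP interval under RTP/AVP


def reduced_min_rtcp_interval(session_bw_kbps: float) -> float:
    """Reduced minimum RTCP interval in seconds (RFC 3550, Section 6.2).

    The recommended value is 360 divided by the session bandwidth in
    kilobits/second. Randomization and member-count scaling are omitted.
    """
    if session_bw_kbps <= 0:
        raise ValueError("session bandwidth must be positive")
    return 360.0 / session_bw_kbps


for kbps in (64, 72, 360, 1000):
    interval = reduced_min_rtcp_interval(kbps)
    relation = "below" if interval < RTP_AVP_FIXED_MIN_S else "not below"
    print(f"{kbps:>5} kbps: {interval:.3f} s ({relation} the 5 s fixed minimum)")
```

At 360 kbps the result is exactly 1 s and at 1 Mbps it is 0.36 s, matching the figures in the text; the reduced minimum only falls below the 5 s fixed minimum for session bandwidths above 72 kbps.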
It has been suggested on the WebRTC mailing list that, when | If a legacy end-point supports RTP/AVPF, this enables negotiation of | |||
interoperating with really limited legacy devices, a WebRTC end-point | important parameters for frequent reporting, such as the "trr-int" | |||
not send more than 64 kbps of media streams, to avoid causing | parameter, and the possibility that the end-point supports some | |||
massive congestion on most Internet paths when communicating | useful feedback format for congestion control purposes, such as TMMBR | |||
with a legacy node that does not provide sufficient feedback for effective | [RFC5104]. Implementations that are required to interwork with such | |||
congestion control. This warrants further discussion, as there are | end-points MUST ensure that they stay within the RTP circuit breaker | |||
clearly a number of link layers that do not even provide that | [I-D.perkins-avtcore-rtp-circuit-breakers] constraints to limit the | |||
bit-rate consistently, even without competing traffic. | congestion they can cause, but may find that they can achieve better | |||
congestion response depending on the amount of feedback that is | ||||
available. | ||||
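A fixed low-rate cap toward an end-point that gives no feedback (a VoIP-like rate in the newer text, 64 kbps in the mailing-list suggestion) could be enforced with a token bucket, a standard rate-limiting technique that this memo does not itself prescribe. The sketch below assumes 64 kbps, an 8000-bit burst and 200-byte packets purely for illustration; the class name is hypothetical.

```python
import time
from typing import Callable, Optional


class TokenBucketPacer:
    """Hypothetical helper: caps sending at a fixed bit-rate.

    Tokens are counted in bits. The bucket refills at rate_bps and never
    holds more than burst_bits, which bounds the size of any burst.
    """

    def __init__(self, rate_bps: float, burst_bits: float,
                 clock: Optional[Callable[[], float]] = None) -> None:
        self.rate_bps = rate_bps
        self.burst_bits = burst_bits
        self.tokens = burst_bits
        self.clock = clock if clock is not None else time.monotonic
        self.last = self.clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst_bits, self.tokens + elapsed * self.rate_bps)
        self.last = now

    def try_send(self, packet_bytes: int) -> bool:
        """Consume tokens and return True if the packet may be sent now."""
        self._refill()
        bits = packet_bytes * 8
        if bits <= self.tokens:
            self.tokens -= bits
            return True
        return False  # caller should queue or drop the packet


# Simulated clock: offer one 200-byte packet every 10 ms (160 kbps) for 1 s.
now = [0.0]
pacer = TokenBucketPacer(rate_bps=64_000, burst_bits=8_000, clock=lambda: now[0])
sent = 0
for i in range(100):
    now[0] = i * 0.01
    if pacer.try_send(200):
        sent += 1
print(f"sent {sent} of 100 packets")
```

Injecting the clock keeps the example deterministic; a real sender would use the default monotonic clock and queue rather than drop rejected packets where latency allows.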
8. WebRTC Use of RTP: Performance Monitoring | 8. WebRTC Use of RTP: Performance Monitoring | |||
RTCP contains a basic set of RTP flow monitoring points, such as | RTCP contains a basic set of RTP flow monitoring metrics, such as | |||
packet loss and jitter. There exist a number of extensions that | packet loss and jitter. There are a number of extensions that could | |||
could be included in the set to be supported. However, in most cases | be included in the set to be supported. However, in most cases the | |||
the RTP monitoring that is needed depends on the application, which | RTP monitoring that is needed depends on the application, which makes | |||
makes it difficult to select which to include when the set of | it difficult to select which to include when the set of applications | |||
applications is very large. | is very large. | |||
Exposing some metrics in the WebRTC API should be considered, allowing | Exposing some metrics in the WebRTC API should be considered, allowing | |||
the application to gather the measurements of interest. However, | the application to gather the measurements of interest. However, | |||
security implications for the different data sets exposed will need | security implications for the different data sets exposed will need | |||
to be considered. | to be considered. | |||
(tbd: If any RTCP XR metrics should be added is still an open | ||||
question, but possible to extend at a later stage) | ||||
9. WebRTC Use of RTP: Future Extensions | 9. WebRTC Use of RTP: Future Extensions | |||
It is possible that the core set of RTP protocols and RTP extensions | It is possible that the core set of RTP protocols and RTP extensions | |||
specified in this memo will prove insufficient for the future needs | specified in this memo will prove insufficient for the future needs | |||
of WebRTC applications. In this case, future updates to this memo | of WebRTC applications. In this case, future updates to this memo | |||
MUST be made following the Guidelines for Writers of RTP Payload | MUST be made following the Guidelines for Writers of RTP Payload | |||
Format Specifications [RFC2736] and Guidelines for Extending the RTP | Format Specifications [RFC2736] and Guidelines for Extending the RTP | |||
Control Protocol [RFC5968], and SHOULD take into account any future | Control Protocol [RFC5968], and SHOULD take into account any future | |||
guidelines for extending RTP and related protocols that have been | guidelines for extending RTP and related protocols that have been | |||
developed. | developed. | |||
Authors of future extensions are urged to consider the wide range of | Authors of future extensions are urged to consider the wide range of | |||
environments in which RTP is used when recommending extensions, since | environments in which RTP is used when recommending extensions, since | |||
extensions that are applicable in some scenarios can be problematic | extensions that are applicable in some scenarios can be problematic | |||
in others. Where possible, the WebRTC framework should adopt RTP | in others. Where possible, the WebRTC framework should adopt RTP | |||
extensions that are of general utility, to enable easy gatewaying to | extensions that are of general utility, to enable easy gatewaying to | |||
other applications using RTP, rather than adopt mechanisms that are | other applications using RTP, rather than adopt mechanisms that are | |||
narrowly targeted at specific WebRTC use cases. | narrowly targeted at specific WebRTC use cases. | |||
10. Signalling Considerations | 10. Signalling Considerations | |||
RTP is built with the assumption of an external signalling channel | RTP is built with the assumption of an external signalling channel | |||
that can be used to configure the RTP sessions and their features. | that can be used to configure the RTP sessions and their features. | |||
The basic configuration of an RTP session consists of the following | The basic configuration of an RTP session consists of the following | |||
parameters: | parameters: | |||
RTP Profile: The name of the RTP profile to be used in the session. The | RTP Profile: The name of the RTP profile to be used in the session. The | |||
RTP/AVP [RFC3551] and RTP/AVPF [RFC4585] profiles can interoperate | RTP/AVP [RFC3551] and RTP/AVPF [RFC4585] profiles can interoperate | |||
on a basic level, as can their secure variants RTP/SAVP [RFC3711] | on a basic level, as can their secure variants RTP/SAVP [RFC3711] | |||
and RTP/SAVPF [RFC5124]. The secure variants of the profiles do | and RTP/SAVPF [RFC5124]. The secure variants of the profiles do | |||
not directly interoperate with the non-secure variants, due to the | not directly interoperate with the non-secure variants, due to the | |||
presence of additional header fields as well as the | presence of additional header fields as well as the | |||
cryptographic transformation of the packet content. As WebRTC | cryptographic transformation of the packet content. As WebRTC | |||
requires the use of the SAVPF profile, only a single profile will | requires the use of the RTP/SAVPF profile, the profile could be | |||
need to be signalled. Interworking functions may transform this | inferred, as there is only a single one, but SDP still requires this | |||
into SAVP for a legacy use case by indicating to the WebRTC end- | information to be signalled. Interworking functions may transform | |||
point a SAVPF end-point and limiting the usage of the a=rtcp-fb | this into RTP/SAVP for a legacy use case by indicating to the | |||
attribute to indicate a trr-int value of 4 seconds. | WebRTC end-point an RTP/SAVPF end-point and limiting the usage of | |||
| the a=rtcp-fb attribute to indicate a trr-int value of 4 | |||
| seconds. | |||
Transport Information: Source and destination address(es) and ports | Transport Information: Source and destination IP address(es) and | |||
for RTP and RTCP MUST be signalled for each RTP session. In | ports for RTP and RTCP MUST be signalled for each RTP session. In | |||
WebRTC these end-points will be provided by ICE, which signals | WebRTC these transport addresses will be provided by ICE, which | |||
candidates and arrives at nominated candidate pairs. If RTP and | signals candidates and arrives at nominated candidate address | |||
RTCP multiplexing [RFC5761] is to be used, such that a single port | pairs. If RTP and RTCP multiplexing [RFC5761] is to be used, such | |||
is used for RTP and RTCP flows, this MUST be signalled (see | that a single port is used for RTP and RTCP flows, this MUST be | |||
Section 4.5). If several RTP sessions are to be multiplexed onto | signalled (see Section 4.5). If several RTP sessions are to be | |||
a single transport layer flow, this MUST also be signalled (see | multiplexed onto a single transport layer flow, this MUST also be | |||
Section 4.4). | signalled (see Section 4.4). | |||
RTP Payload Types, media formats, and media format | RTP Payload Types, media formats, and media format | |||
parameters: The mapping between media type names (and hence the RTP | parameters: The mapping between media type names (and hence the RTP | |||
payload formats to be used) and the RTP payload type numbers must | payload formats to be used) and the RTP payload type numbers MUST | |||
be signalled. Each media type may also have a number of media | be signalled. Each media type MAY also have a number of media | |||
type parameters that must also be signalled to configure the codec | type parameters that MUST also be signalled to configure the codec | |||
and RTP payload format (the "a=fmtp:" line from SDP). | and RTP payload format (the "a=fmtp:" line from SDP). | |||
RTP Extensions: The RTP extensions one intends to use need to be | RTP Extensions: The RTP extensions to be used SHOULD be agreed upon, | |||
agreed upon, including any parameters for each respective | including any parameters for each respective extension. At the | |||
extension. At the very least, this will help avoid using | very least, this will help avoid using bandwidth for features | |||
bandwidth for features that the other end-point will ignore. For | that the other end-point will ignore. For certain mechanisms, | |||
certain mechanisms, however, such agreement is required, since | however, such agreement is required, since interoperability | |||
interoperability otherwise fails. | otherwise fails. | |||
RTCP Bandwidth: Support for exchanging RTCP Bandwidth values between the | RTCP Bandwidth: Support for exchanging RTCP Bandwidth values between the | |||
end-points will be necessary, as described in "Session Description | end-points will be necessary. This SHALL be done as described in | |||
Protocol (SDP) Bandwidth Modifiers for RTP Control Protocol (RTCP) | "Session Description Protocol (SDP) Bandwidth Modifiers for RTP | |||
Bandwidth" [RFC3556], or something semantically equivalent. This | Control Protocol (RTCP) Bandwidth" [RFC3556], or something | |||
also ensures that the end-points have a common view of the RTCP | semantically equivalent. This also ensures that the end-points | |||
bandwidth, which is important, as widely differing views of the | have a common view of the RTCP bandwidth, which is important, as | |||
bandwidth may lead to failure to interoperate. | widely differing views of the bandwidth may lead to failure to | |||
interoperate. | ||||
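To make the parameters listed above concrete, here is a minimal Python sketch that collects them from one SDP media section. It assumes well-formed input, handles only the attribute forms shown, and is not a general SDP parser; the function, class and field names are hypothetical. The b=RS and b=RR modifiers [RFC3556] carry bits per second, and RFC 4585 carries the trr-int value, in milliseconds, in the a=rtcp-fb attribute.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MediaConfig:
    """RTP session parameters collected from one SDP media section."""
    profile: str = ""
    payload_types: Dict[int, str] = field(default_factory=dict)  # PT -> encoding/clock[/channels]
    fmtp: Dict[int, str] = field(default_factory=dict)           # PT -> format parameters
    rtcp_mux: bool = False
    rtcp_bw_bps: Dict[str, int] = field(default_factory=dict)    # "RS"/"RR" -> bits/s
    trr_int_ms: Optional[int] = None


def parse_media_section(sdp: str) -> MediaConfig:
    cfg = MediaConfig()
    for raw in sdp.strip().splitlines():
        line = raw.strip()  # also drops the trailing CR of CRLF line endings
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt list>; <proto> is the RTP profile
            cfg.profile = line.split()[2]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            cfg.payload_types[int(pt)] = encoding
        elif line.startswith("a=fmtp:"):
            pt, params = line[len("a=fmtp:"):].split(" ", 1)
            cfg.fmtp[int(pt)] = params
        elif line == "a=rtcp-mux":
            cfg.rtcp_mux = True
        elif line.startswith(("b=RS:", "b=RR:")):
            cfg.rtcp_bw_bps[line[2:4]] = int(line[5:])
        elif line.startswith("a=rtcp-fb:") and " trr-int " in line:
            cfg.trr_int_ms = int(line.rsplit(" ", 1)[1])
    return cfg


EXAMPLE = """\
m=audio 49170 RTP/SAVPF 0 96
b=RS:800
b=RR:2400
a=rtpmap:0 PCMU/8000
a=rtpmap:96 opus/48000/2
a=fmtp:96 minptime=10
a=rtcp-mux
a=rtcp-fb:* trr-int 4000
"""
cfg = parse_media_section(EXAMPLE)
print(cfg)
```

The example values (port, payload types, bandwidths) are illustrative only; the 4000 ms trr-int mirrors the 4 second legacy interworking case discussed above.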
These parameters are often expressed in SDP messages conveyed within | These parameters are often expressed in SDP messages conveyed within | |||
an offer/answer exchange. RTP does not depend on SDP or on the | an offer/answer exchange. RTP does not depend on SDP or on the | |||
offer/answer model, but does require all the necessary parameters to | offer/answer model, but does require all the necessary parameters to | |||
be agreed somehow, and provided to the RTP implementation. We note | be agreed upon, and provided to the RTP implementation. We note that | |||
that in the WebRTC context it will depend on the signalling model and | in the WebRTC context it will depend on the signalling model and API | |||
API how these parameters need to be configured, but they will need | how these parameters need to be configured, but they will need to | |||
to either be set in the API or explicitly signalled between the peers. | either be set in the API or explicitly signalled between the peers. | |||
11. WebRTC API Considerations | 11. WebRTC API Considerations | |||
The following sections describe how the WebRTC API features map onto | The following sections describe how the WebRTC API features map onto | |||
the RTP mechanisms described in this memo. | the RTP mechanisms described in this memo. | |||
11.1. API MediaStream to RTP Mapping | 11.1. API MediaStream to RTP Mapping | |||
The WebRTC API and its media function have the concept of a | The WebRTC API and its media function have the concept of a WebRTC | |||
MediaStream that consists of zero or more tracks. A track is | MediaStream that consists of zero or more tracks. A track is an | |||
an individual stream of media from any type of media source, such as a | individual stream of media from any type of media source, such as a | |||
microphone or a camera, but also conceptual sources, such as an audio mix | microphone or a camera; conceptual sources, such as an audio mix | |||
or a video composition. The tracks within a MediaStream are expected | or a video composition, are also possible. The tracks within a WebRTC | |||
to be synchronized. | MediaStream are expected to be synchronized. | |||
A track corresponds to the media received with one particular SSRC. | A track corresponds to the media received with one particular SSRC. | |||
There might be additional SSRCs associated with that SSRC, like for | There might be additional SSRCs associated with that SSRC, like for | |||
RTP retransmission or Forward Error Correction. However, one SSRC | RTP retransmission or Forward Error Correction. However, one SSRC | |||
will identify a media stream and its timing. | will identify an RTP media stream and its timing. | |||
Thus a MediaStream is a collection of SSRCs carrying the different | As a result, a WebRTC MediaStream is a collection of SSRCs carrying | |||
media included in the synchronized aggregate. Thus the | the different media included in the synchronized aggregate. | |||
synchronization state associated with the included SSRCs is also part | Therefore, the synchronization state associated with the included | |||
of the concept. One important thing to consider is that there can be | SSRCs is also part of the concept. It is important to consider that | |||
multiple different MediaStreams containing a given Track (SSRC). | there can be multiple different WebRTC MediaStreams containing a | |||
Thus, to avoid unnecessary duplication of media at transport level, one | given Track (SSRC). To avoid unnecessary duplication of media at the | |||
needs to bind a given SSRC to the MediaStreams it is | transport level in such cases, a binding is needed that defines | |||
associated with at the signalling level. | which WebRTC MediaStreams a given SSRC is associated with at the | |||
| signalling level. | |||
A proposal for how the binding between MediaStreams and SSRC can be | A proposal for how the binding between WebRTC MediaStreams and SSRC | |||
done exists in "Cross Session Stream Identification in the Session | can be done is specified in "Cross Session Stream Identification in | |||
Description Protocol" [I-D.alvestrand-rtcweb-msid]. | the Session Description Protocol" [I-D.alvestrand-rtcweb-msid]. | |||
(tbd: This text must be improved and achieved consensus on. Interim | ||||
meeting in June 2012 shows large differences in opinions.) | ||||
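The SSRC-to-MediaStream binding described above amounts to inverting the signalled MediaStream membership. In the sketch below the stream identifiers, SSRC values and function name are hypothetical, and the membership is assumed to have already been extracted from signalling (for instance by a mechanism such as [I-D.alvestrand-rtcweb-msid]). It only illustrates that a shared SSRC is transmitted once while being associated with several MediaStreams.

```python
from collections import defaultdict
from typing import Dict, List, Set


def bind_ssrcs(streams: Dict[str, List[int]]) -> Dict[int, Set[str]]:
    """Invert MediaStream -> [SSRC] into SSRC -> {MediaStream ids}.

    The result tells a receiver which MediaStreams each received SSRC
    belongs to, so the media for a shared track is sent only once.
    """
    binding: Dict[int, Set[str]] = defaultdict(set)
    for stream_id, ssrcs in streams.items():
        for ssrc in ssrcs:
            binding[ssrc].add(stream_id)
    return dict(binding)


signalled = {
    "camera-and-mic": [0x1111, 0x2222],  # video track, audio track
    "mic-only": [0x2222],                # the same audio track reused
}
binding = bind_ssrcs(signalled)
print({hex(ssrc): sorted(ids) for ssrc, ids in binding.items()})
```

Here only two SSRCs need to be transmitted, although the two MediaStreams together reference three tracks.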
12. RTP Implementation Considerations | 12. RTP Implementation Considerations | |||
The following provides some guidance on the implementation of the RTP | The following provides some guidance on the implementation of the RTP | |||
features described in this memo. | features described in this memo. | |||
This section discusses RTP functionality that is part of the RTP | This section discusses RTP functionality that is part of the RTP | |||
standard, required by decisions made, or to enable use cases raised | standard, required by decisions made, or to enable use cases raised | |||
and their motivations. This discussion is done from a WebRTC end- | and their motivations. This discussion is from a WebRTC end-point | |||
point perspective. It will occasionally go into central nodes, but as | perspective. It will occasionally talk about central nodes, but as | |||
the specification is for an end-point, that is where the focus lies. | this specification is for an end-point, this is where the focus lies. | |||
For more discussion on the central nodes and details about RTP | For more discussion on the central nodes and details about RTP | |||
topologies please review Appendix A. | topologies please see Appendix A. | |||
The section will touch on the relation with certain RTP/RTCP | The section will touch on the relation with certain RTP/RTCP | |||
extensions, but will focus on the RTP core functionality. The | extensions, but will focus on the RTP core functionality. The | |||
functionality to implement, and the level of requirement for each, | functionality to implement, and the level of requirement for each, | |||
is defined in Section 2. | is defined in Section 2. | |||
12.1. RTP Sessions and PeerConnection | 12.1. RTP Sessions and PeerConnection | |||
An RTP session is an association among RTP nodes, which have one | An RTP session is an association among RTP nodes, which have one | |||
common SSRC space. An RTP session can include any number of end- | common SSRC space. An RTP session can include any number of end- | |||
points and nodes sourcing, sinking, manipulating or reporting on the | points and nodes sourcing, sinking, manipulating or reporting on the | |||
media streams being sent within the RTP session. A PeerConnection | RTP media streams being sent within the RTP session. A | |||
is a point to point association between an end-point and another | PeerConnection is a point-to-point association between an end- | |||
node. That peer node may be either an end-point or a centralized | point and another node. That peer node may be either an end-point or | |||
processing node of some type, thus the RTP session may terminate | a centralized processing node of some type; thus, the RTP session may | |||
immediately on the far end of the PeerConnection, but it may also | terminate immediately on the far end of the PeerConnection, but it | |||
continue as further discussed below in Multiparty (Section 12.3) and | may also continue as further discussed below in Multiparty | |||
Multiple RTP End-points (Section 12.7). | (Section 12.3) and Multiple RTP End-points (Section 12.7). | |||
A PeerConnection can contain one or more RTP sessions depending on how | A PeerConnection can contain one or more RTP sessions depending on how | |||
it is set up and how many UDP flows it uses. A common usage has been | it is set up and how many UDP flows it uses. A common usage has been | |||
to have one RTP session per media type, e.g. one for audio and one | to have one RTP session per media type, e.g. one for audio and one | |||
for video, each sent over different UDP flows. However, the default | for video, each sent over different UDP flows. However, the default | |||
usage in WebRTC will be to use one RTP session for all media types. | usage in WebRTC will be to use one RTP session for all media types. | |||
This usage then needs only one UDP flow, as RTP and RTCP | This usage then needs only one UDP flow, as RTP and RTCP | |||
multiplexing is also mandated (Section 4.5). However, for legacy | multiplexing is also mandated (Section 4.5). However, for legacy | |||
interworking and network prioritization (Section 12.9) based on flows, | interworking and network prioritization (Section 12.9) based on | |||
a WebRTC end-point needs to support a mode of operation where one RTP | flows, a WebRTC end-point needs to support a mode of operation where | |||
session per media type is used. Currently, each RTP session must use | one RTP session per media type is used. Currently, each RTP session | |||
its own UDP flow. Discussions are ongoing about a solution enabling | must use its own UDP flow. Discussions are ongoing about a solution | |||
multiple RTP sessions over a single UDP flow; see Section 4.4. | enabling multiple RTP sessions over a single UDP flow; see | |||
Section 4.4. | ||||
The multi-unicast or mesh based multi-party topology (Figure 1) is | The multi-unicast- or mesh-based multi-party topology (Figure 1) is a | |||
best raised in this section as it concerns the relation between RTP | good example for this section as it concerns the relation between RTP | |||
sessions and PeerConnections. In this topology, each participant | sessions and PeerConnections. In this topology, each participant | |||
sends individual unicast RTP/UDP/IP flows to each of the other | sends individual unicast RTP/UDP/IP flows to each of the other | |||
participants using independent PeerConnections in a full mesh. This | participants using independent PeerConnections in a full mesh. This | |||
topology has the benefit of not requiring central nodes. The | topology has the benefit of not requiring central nodes. The | |||
downside is that it increases the used bandwidth at each sender by | downside is that it increases the used bandwidth at each sender by | |||
requiring one copy of the media streams for each participant that are | requiring one copy of the RTP media streams for each participant that | |||
part of the same session beyond the sender itself. Hence, this | are part of the same session beyond the sender itself. Hence, this | |||
topology is limited to scenarios with few participants unless the | topology is limited to scenarios with few participants unless the | |||
media is very low bandwidth. | media is very low bandwidth. | |||
+---+ +---+ | +---+ +---+ | |||
| A |<---->| B | | | A |<---->| B | | |||
+---+ +---+ | +---+ +---+ | |||
^ ^ | ^ ^ | |||
\ / | \ / | |||
\ / | \ / | |||
v v | v v | |||
skipping to change at page 25, line 29 ¶ | skipping to change at page 25, line 14 ¶ | |||
session, spanning multiple peer-to-peer transport layer connections, | session, spanning multiple peer-to-peer transport layer connections, | |||
or as several pairwise RTP sessions, one between each pair of peers. | or as several pairwise RTP sessions, one between each pair of peers. | |||
To maintain a coherent mapping between RTP sessions and | To maintain a coherent mapping between RTP sessions and | |||
PeerConnections, we recommend that one implements this as | PeerConnections, we recommend that one implements this as | |||
individual RTP sessions. The only downside is that end-point A will | individual RTP sessions. The only downside is that end-point A will | |||
not learn of the quality of any transmission happening between B and | not learn of the quality of any transmission happening between B and | |||
C based on RTCP. This has not been seen as a significant downside as | C based on RTCP. This has not been seen as a significant downside as | |||
no one has yet seen a clear need for why A would need to know about | no one has yet seen a clear need for why A would need to know about | |||
B's and C's communication. An advantage of using separate RTP | B's and C's communication. An advantage of using separate RTP | |||
sessions is that it enables using different media bit-rates to the | sessions is that it enables using different media bit-rates to the | |||
different peers, thus not forcing B to endure the same quality | different peers, thus not forcing B to endure the same quality | |||
reductions if there are limitations in the transport from A to C as C | reductions if there are limitations in the transport from A to C as C | |||
will. | will. | |||
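The cost of the full mesh described in this section grows linearly with the number of participants for each sender's uplink, and quadratically in the total number of PeerConnections. The sketch below assumes, for illustration only, that every participant sends one 500 kbps stream; the function names are hypothetical.

```python
def mesh_upload_kbps(participants: int, stream_kbps: float) -> float:
    """Uplink rate per sender in a full mesh: one copy per other participant."""
    if participants < 1:
        raise ValueError("need at least one participant")
    return (participants - 1) * stream_kbps


def mesh_peer_connections(participants: int) -> int:
    """Total number of PeerConnections in a full mesh of n participants."""
    return participants * (participants - 1) // 2


for n in (3, 5, 8):
    print(f"{n} participants: {mesh_upload_kbps(n, 500.0):.0f} kbps uplink each, "
          f"{mesh_peer_connections(n)} PeerConnections")
```

With a central node, each sender would instead upload a single copy, which is why the mesh is limited to few participants or very low bandwidth media.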
12.2. Multiple Sources | 12.2. Multiple Sources | |||
A WebRTC end-point may have multiple cameras, microphones or audio | A WebRTC end-point may have multiple cameras, microphones or audio | |||
inputs, thus a single end-point can source multiple media streams | inputs and thus a single end-point can source multiple RTP media | |||
of the same media type concurrently. In addition, the above discussed | streams of the same media type concurrently. Even if an end-point | |||
requirement to support multiple media types in one single RTP session | does not have multiple media sources of the same media type, it will | |||
means that even an end-point that has one audio and one video | be required to support transmission using multiple SSRCs concurrently | |||
source still needs to transmit using two SSRCs concurrently. As | in the same RTP session. This is due to the requirement on a WebRTC | |||
multi-party conferences are supported, as discussed below in | end-point to support multiple media types in one RTP session. For | |||
Section 12.3, a WebRTC end-point will need to be capable of | example, one audio and one video source can result in the end-point | |||
receiving, decoding and playing out multiple media streams of the same | sending with two different SSRCs in the same RTP session. As multi- | |||
type concurrently. | party conferences are supported, as discussed below in Section 12.3, | |||
| a WebRTC end-point will need to be capable of receiving, decoding and | |||
| playing out multiple RTP media streams of the same type concurrently. | |||
Open Issue: Is any mechanism needed to signal limitations in the | tbd: Is any mechanism needed to signal limitations in the number of | |||
number of SSRCs that an end-point can handle? | SSRCs that an end-point can handle? | |||
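Because several SSRCs share one RTP session, and by default one UDP flow with RTP and RTCP multiplexed (Section 4.5), a receiver has to separate RTCP from RTP and then route RTP packets by SSRC. A minimal sketch follows; the RTCP test uses the second-octet range 192-223 along the lines of RFC 5761, and the function names are hypothetical. A real receiver would also handle CSRCs, header extensions, SRTP and jitter buffering, all omitted here.

```python
import struct
from collections import defaultdict
from typing import Dict, List


def is_rtcp(packet: bytes) -> bool:
    """Classify a packet arriving on a multiplexed RTP/RTCP port.

    RTCP packet types occupy 192-223 in the second octet; RTP payload
    types 64-95 are avoided so that RTP packets never fall in this range.
    """
    return len(packet) >= 2 and 192 <= packet[1] <= 223


def rtp_ssrc(packet: bytes) -> int:
    """Return the SSRC from the 12-byte fixed RTP header (RFC 3550)."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    (ssrc,) = struct.unpack_from("!I", packet, 8)
    return ssrc


def demux(packets: List[bytes]) -> Dict[int, List[bytes]]:
    """Group RTP packets by SSRC; RTCP packets are skipped here."""
    per_source: Dict[int, List[bytes]] = defaultdict(list)
    for pkt in packets:
        if is_rtcp(pkt):
            continue  # would be handed to the RTCP processing instead
        per_source[rtp_ssrc(pkt)].append(pkt)
    return dict(per_source)


def make_rtp(pt: int, seq: int, ts: int, ssrc: int) -> bytes:
    """Build a bare 12-byte RTP header (V=2, no CSRCs, marker bit clear)."""
    return struct.pack("!BBHII", 0x80, pt & 0x7F, seq, ts, ssrc)


audio, video = 0x1111, 0x2222
receiver_report = struct.pack("!BBHI", 0x80, 201, 1, 0xAAAA)  # RTCP RR, no report blocks
packets = [make_rtp(0, 1, 160, audio), make_rtp(96, 1, 3000, video),
           receiver_report, make_rtp(0, 2, 320, audio)]
streams = demux(packets)
print({hex(ssrc): len(pkts) for ssrc, pkts in streams.items()})
```

Keying on the SSRC rather than on the payload type is what allows one session to carry several sources of the same media type concurrently.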
12.3. Multiparty | 12.3. Multiparty | |||
There exist numerous situations and clear use cases for WebRTC | There are numerous situations and clear use cases for WebRTC | |||
supporting multi-party sessions. This can be realized in | supporting multi-party RTP sessions. This can be realized | |||
a number of ways using a number of different implementation | in a number of ways using a number of different implementation | |||
strategies. This section focuses on the different sets of WebRTC end-point | strategies. In the following, the focus is on the different sets of | |||
requirements that arise from different sets of multi-party | WebRTC end-point requirements that arise from different sets of | |||
topologies. | multi-party topologies. | |||
The multi-unicast mesh (Figure 1) based multi-party topology | The multi-unicast mesh (Figure 1)-based multi-party topology | |||
discussed above provides a non-centralized solution but can easily | discussed above provides a non-centralized solution but may incur a | |||
tax the end-points' outgoing paths. It may also consume a large amount | heavy tax on the end-points' outgoing paths. It may also consume | |||
of encoding resources if each outgoing stream is specifically | a large amount of encoding resources if each outgoing stream is | |||
encoded. If an encoding is transmitted to multiple parties, either | specifically encoded. If an encoding is transmitted to multiple | |||
as in the mesh case or when using relaying central nodes (see below), | parties, as in some implementations of the mesh case, the end-point | |||
the end-point needs to be able to create media streams | needs to be able to create RTP media streams that suit the | |||
that suit the requirements of multiple destinations. These | requirements of multiple destinations. These requirements may | |||
requirements may depend both on the transport path and on the | depend both on the transport path and on the different end-points' | |||
different end-points' preferences related to playout of the media. | preferences related to playout of the media. | |||
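When one encoding must serve several destinations, one simple policy, assumed here rather than taken from this memo, is to satisfy the most restrictive destination. The ReceiverLimits fields and the function name are hypothetical; in practice such limits might come from TMMBR [RFC5104] or from signalling.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ReceiverLimits:
    """Hypothetical per-destination constraints on a shared encoding."""
    max_bitrate_bps: int
    max_width: int
    max_height: int
    max_framerate: float


def merge_limits(limits: Iterable[ReceiverLimits]) -> ReceiverLimits:
    """Most-restrictive merge: one encoding that every destination accepts."""
    items = list(limits)
    if not items:
        raise ValueError("no destinations")
    return ReceiverLimits(
        max_bitrate_bps=min(r.max_bitrate_bps for r in items),
        max_width=min(r.max_width for r in items),
        max_height=min(r.max_height for r in items),
        max_framerate=min(r.max_framerate for r in items),
    )


merged = merge_limits([
    ReceiverLimits(1_500_000, 1280, 720, 30.0),  # well-provisioned peer
    ReceiverLimits(400_000, 640, 360, 15.0),     # constrained peer
])
print(merged)
```

The trade-off is that well-provisioned receivers get the lowest common quality; producing separate encodings avoids this at the cost of the extra encoding resources mentioned above. Taking the minimum width and height independently can also yield a shape no receiver asked for, which a real implementation would need to handle.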
+---+ +------------+ +---+ | +---+ +------------+ +---+ | |||
| A |<---->| |<---->| B | | | A |<---->| |<---->| B | | |||
+---+ | | +---+ | +---+ | | +---+ | |||
| Mixer | | | Mixer | | |||
+---+ | | +---+ | +---+ | | +---+ | |||
| C |<---->| |<---->| D | | | C |<---->| |<---->| D | | |||
+---+ +------------+ +---+ | +---+ +------------+ +---+ | |||
Figure 2: RTP Mixer with Only Unicast Paths | Figure 2: RTP Mixer with Only Unicast Paths | |||
A Mixer (Figure 2) is an RTP end-point that optimizes the | A Mixer (Figure 2) is an RTP end-point that optimizes the | |||
transmission of media streams from certain perspectives, either by | transmission of RTP media streams from certain perspectives, either | |||
only sending some of the received media streams to any given receiver | by only sending some of the received RTP media streams to any given | |||
or by providing a combined media stream out of a set of contributing | receiver or by providing a combined RTP media stream out of a set of | |||
streams. There exist various methods of implementation as discussed | contributing streams. There are various methods of implementation as | |||
in Appendix A.3. A common aspect is that these central nodes use a | discussed in Appendix A.3. A common aspect is that these central | |||
number of tools to control the media encoding provided by a WebRTC | nodes may use a number of tools to control the media encoding | |||
end-point. This includes functions like requesting that the encoder | provided by a WebRTC end-point. This includes functions like | |||
break the encoding chain and produce a so-called Intra frame. | requesting that the encoder break the encoding chain and produce a | |||
Another is limiting the bit-rate of a given stream to better suit the | so-called Intra frame. Another is limiting the bit-rate of a given | |||
mixer view of the multiple down-streams. Others are controlling the | stream to better suit the mixer view of the multiple down-streams. | |||
most suitable frame-rate, picture resolution, and the trade-off between | Others are controlling the most suitable frame-rate, picture | |||
frame-rate and spatial quality. | resolution, and the trade-off between frame-rate and spatial quality. | |||
A mixer takes on significant responsibility for correctly performing | A mixer takes on significant responsibility for correctly performing | |||
congestion control, identity management, and synchronization | congestion control, source identification, and synchronization | |||
management, while providing media optimization suitable for the application. | management, while providing the application with suitable media optimizations. | |||
Mixers also need to be trusted nodes when it comes to security, as they | Mixers also need to be trusted nodes when it comes to security, as they | |||
manipulate either RTP or the media itself before sending it on | manipulate either RTP or the media itself before sending it on | |||
towards the end-point(s), and thus must be able to decrypt and then | towards the end-point(s); thus they must be able to decrypt and then | |||
encrypt it before sending it out. There exists one type of central | encrypt it before sending it out. | |||
node, the relay, that one does not need to trust with the keys to the | ||||
media. The relay operates only on the IP/UDP level of the transport. | ||||
It is configured so that it forwards any RTP/RTCP packets from A | ||||
to the other participants B-D. | ||||
+---+ +---+ | ||||
| | +-----------+ | | | ||||
| A |<------->| DTLS-SRTP |<------->| C | | ||||
| |<-- -->| HOST |<-- -->| | | ||||
+---+ \ / +-----------+ \ / +---+ | ||||
X X | ||||
+---+ / \ +-----------+ / \ +---+ | ||||
| |<-- -->| RTP |<-- -->| | | ||||
| B |<------->| RELAY |<------->| D | | ||||
| | +-----------+ | | | ||||
+---+ +---+ | ||||
Figure 3: DTLS-SRTP host and RTP Relay Separated | ||||
To accomplish the security properties discussed above using a relay, | ||||
one needs a separate key handling server and also support for | ||||
distributing the different keys, such as Encrypted Key Transport | ||||
[I-D.ietf-avt-srtp-ekt]. The relay also creates a situation where | ||||
multiple end-points are visible in the RTCP reporting and any | ||||
feedback events. This is yet another situation, in addition to the | ||||
mesh, where the end-point will need logic for merging | ||||
different requirements and preferences. This is discussed in more | ||||
detail in Section 12.7. | ||||
+---+ +---+ +---+ | ||||
| A |--->| B |--->| C | | ||||
+---+ +---+ +---+ | ||||
Figure 4: MediaStream Forwarding | ||||
The above Figure 4 depicts a possible scenario where an WebRTC end- | ||||
point (A) sends a media stream to B. B decides to forward the media | ||||
stream to C. This can either be realized in B (WebRTC end-point) | ||||
using a simple relay functionality creating similar consideration and | ||||
implementation requirements. Another implmentation strategy in B | ||||
could be to select to transcode the media from A to C, thus breaking | ||||
most of the dependecies between A and C. In that case A is not | ||||
required to be aware of B forwarding the media to C. | ||||
12.4. SSRC Collision Detection | 12.4. SSRC Collision Detection | |||
The RTP standard [RFC3550] requires any RTP implementation to have | The RTP standard [RFC3550] requires any RTP implementation to have | |||
support for detecting and handling SSRC collisions, i.e. when two | support for detecting and handling SSRC collisions, i.e., resolving the | |||
different end-points uses the same SSRC value. This requirement | conflict when two different end-points use the same SSRC value. This | |||
applies also to WebRTC end-points. There exist several scenarios | requirement also applies to WebRTC end-points. There are several | |||
where SSRC collisions may occur. | scenarios where SSRC collisions may occur. | |||
In a point to point session where each SSRC are associated with | In a point-to-point session where each SSRC is associated with either | |||
either of the two end-points and where the main media carrying SSRC | of the two end-points and where the main media carrying SSRC | |||
identifier will be announced in the signalling there is less likely | identifier will be announced in the signalling channel, a collision | |||
to occur due to the information about used SSRCs provided by Source- | is less likely to occur due to the information about used SSRCs | |||
Specific SDP Attributes [RFC5576]. Still if both end-points starts | provided by Source-Specific SDP Attributes [RFC5576]. Still, if both | |||
uses an new SSRC identifier prior to having signalled it to the peer | end-points start using a new SSRC identifier prior to having | |||
and received acknowledgement on the signalling message there can be | signalled it to the peer and received acknowledgement on the | |||
collisions. The Source-Specific SDP Attributes [RFC5576] contains no | signalling message, there can be collisions. The Source-Specific SDP | |||
mechanism to resolve SSRC collisions or reject a end-points usage of | Attributes [RFC5576] contains no mechanism to resolve SSRC collisions | |||
an SSRC. | or to reject an end-point's usage of an SSRC. | |||
There could also appear unsignalled SSRCs, this may be considered a | Unsignalled SSRCs could also appear. This is more likely than | |||
bug. This is more likely than it appears as certain RTP | it appears as certain RTP functions need extra SSRCs to provide | |||
functionalities need extra SSRCs to provide functionality related to | functionality related to another (the "main") SSRC, for example, SSRC | |||
another SSRC, for example SSRC multiplexed RTP retransmission | multiplexed RTP retransmission [RFC4588]. In those cases, an end- | |||
[RFC4588]. In those cases an end-point can create a new SSRC which | point can create a new SSRC that strictly doesn't need to be | |||
strictly don't need to be announced over the signalling channel to | announced over the signalling channel to function correctly on both | |||
function correctly on both RTP and PeerConnection level. | RTP and PeerConnection level. | |||
The more likely cases for SSRC collision is that multiple end-points | The more likely case for SSRC collision is that multiple end-points | |||
in an multiparty creates new soruces and signalls those towards the | in a multiparty conference create new sources and signal those | |||
central server. In cases where the SSRC/CSRC are propogated between | towards the central server. In cases where the SSRC/CSRC are | |||
the different end-points from the central node collisions can occur. | propagated between the different end-points from the central node | |||
collisions can occur. | ||||
Another scenario is when the central node manage to connect an end- | Another scenario is when the central node manages to connect an end- | |||
points PeerConnection to another PeerConnectio the end-point it has. | point's PeerConnection to another PeerConnection that the end-point | |||
Thus forming a loop where the end-point will receive its own traffic. | already has, thus forming a loop where the end-point will receive its | |||
This must be considered a bug, but still if it occurs it is important | own traffic. While this is clearly a bug, it is important | |||
that the end-point can handle the situation. | that the end-point is able to recognise and handle the case when it | |||
occurs. | ||||
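The following is a minimal sketch, not taken from RFC 3550 or any WebRTC API, of how an end-point might tell a collision apart from a loop. It assumes the end-point learns (SSRC, CNAME) pairs from incoming RTCP SDES packets, and it simplifies the RFC 3550, Section 8.2 algorithm, which additionally tracks source transport addresses. The class and method names are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SsrcTracker:
    """Hypothetical, simplified SSRC collision and loop check (after RFC 3550, Section 8.2)."""
    local_ssrc: int
    local_cname: str
    # Remote SSRC -> CNAME first seen for it (learned from RTCP SDES).
    remote: dict = field(default_factory=dict)

    def _fresh_ssrc(self) -> int:
        # Draw a new random 32-bit SSRC that is not already known to be in use.
        while True:
            candidate = random.getrandbits(32)
            if candidate != self.local_ssrc and candidate not in self.remote:
                return candidate

    def on_sdes_cname(self, ssrc: int, cname: str) -> str:
        """Classify an incoming (SSRC, CNAME) pair from an RTCP SDES packet."""
        if ssrc == self.local_ssrc:
            if cname == self.local_cname:
                # Our own traffic has come back to us: a forwarding loop.
                return "loop"
            # Another end-point uses our SSRC: record it and switch to a new one.
            # (RFC 3550 also has the old SSRC leave with an RTCP BYE; omitted here.)
            self.remote[ssrc] = cname
            self.local_ssrc = self._fresh_ssrc()
            return "collision"
        known = self.remote.setdefault(ssrc, cname)
        if known != cname:
            # Two other end-points share one SSRC; only they can resolve it.
            return "third-party collision"
        return "ok"
```

Keying on the CNAME is what lets the sketch treat the looped PeerConnection case as distinct from a genuine collision: in a loop, both the SSRC and the CNAME are the end-point's own.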
12.5. Contributing Sources | 12.5. Contributing Sources | |||
Contributing Sources (CSRC) is a functionality in RTP header that | Contributing Sources (CSRC) is a functionality in the RTP header that | |||
enables a RTP node combing multiple sources into one to identify the | allows an RTP node to combine media packets from multiple sources | |||
sources that has gone into the combination. For WebRTC end-point the | into one and to identify which sources yielded the result. For | |||
support of contributing sources are trivial. The set of CSRC are | WebRTC end-points, supporting contributing sources is trivial. The | |||
provided for a given RTP packet. This information can then be | set of CSRCs is provided in a given RTP packet. This information can | |||
exposed towards the applications using some form of API, most likely | then be exposed to the applications using some form of API, possibly | |||
a mapping back into MediaStream identities to avoid having to expose | a mapping back into WebRTC MediaStream identities to avoid having to | |||
two namespaces and the handling of SSRC collision handling to the | expose two namespaces and SSRC collision handling to | |||
JavaScript. | the JavaScript. | |||
There are also at least one extension that is dependent on the CRSRC | (tbd: should the API provide the ability to add a CSRC list to an | |||
list being used, that is the Mixer to client audio level [RFC6465], | outgoing packet? this is only useful if the sender is mixing content) | |||
that enhances the information provided by the CSRC to actual energy | ||||
levels for audio for each contributing source. | There is also at least one extension that depends on the CSRC list | |||
being used: the Mixer-to-client audio level [RFC6465], which enhances | ||||
the information provided by the CSRC to actual energy levels for | ||||
audio for each contributing source. | ||||
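To illustrate why CSRC support is described as trivial for an end-point, the sketch below extracts the CSRC list from the RTP fixed header as laid out in RFC 3550, Section 5.1: the low four bits of the first octet give the CSRC count (CC), and that many 32-bit identifiers follow the 12-octet fixed header. The function name is hypothetical, and it does not parse header extensions such as the RFC 6465 audio levels.

```python
import struct


def parse_csrcs(packet: bytes) -> list[int]:
    """Return the CSRC list from an RTP packet's fixed header (RFC 3550, Section 5.1)."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte fixed RTP header")
    version = packet[0] >> 6
    if version != 2:
        raise ValueError(f"unexpected RTP version {version}")
    cc = packet[0] & 0x0F  # CSRC count: 0..15
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("truncated CSRC list")
    # Each CSRC is a 32-bit identifier in network byte order.
    return list(struct.unpack(f"!{cc}I", packet[12:end]))
```

A browser could then map each returned CSRC to a MediaStream identity before exposing it to JavaScript, as the text suggests.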
12.6. Media Synchronization | 12.6. Media Synchronization | |||
When an end-point has more than one media source being sent one need | When an end-point sends media from more than one media source, it | |||
to consider if these media source are to be synchronized. In RTP/ | needs to consider if (and which of) these media sources are to be | |||
RTCP synchronziation is provided by having a set of media streams be | synchronized. In RTP/RTCP, synchronisation is provided by having a | |||
indicated as comming from the same synchroniztion context and logical | set of RTP media streams be indicated as coming from the same | |||
end-point by using the same CNAME identifier. | synchronisation context and logical end-point by using the same CNAME | |||
identifier. | ||||
The next provision is that all media sources internal clock, i.e. | The next provision is that the internal clocks of all media sources, | |||
what drives the RTP timestamp can be correlated with a system clock | i.e., what drives the RTP timestamp, can be correlated to a system | |||
that is provided in RTCP Sender Reports encoded in an NTP format. By | clock that is provided in RTCP Sender Reports encoded in an NTP | |||
having the RTP timestamp to system clock being provided for all | format. By correlating all RTP timestamps to a common system clock | |||
sources the relation of the different media stream, also across | for all sources, the timing relation of the different RTP media | |||
multiple RTP sessions can if chosen to be synchronized. The | streams, also across multiple RTP sessions, can be derived at the | |||
requirement is for the media sender to provide the information, the | receiver and, if desired, the streams can be synchronized. The | |||
receiver can chose to use it or not. | requirement is for the media sender to provide the correlation | |||
information; it is up to the receiver to use it or not. | ||||
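The correlation described above can be sketched as follows. Each RTCP Sender Report carries an NTP timestamp and the RTP timestamp for the same instant (RFC 3550, Section 6.4.1), so a receiver can map any later RTP timestamp of that SSRC onto the sender's wallclock. The class name and the example values are assumptions for illustration; the difference is taken modulo 2^32 so that RTP timestamp wrap-around is handled.

```python
from dataclasses import dataclass


@dataclass
class SenderReportMapping:
    """Hypothetical holder for the timing pair from the latest RTCP SR of one SSRC."""
    ntp_seconds: int      # NTP timestamp, most significant word
    ntp_fraction: int     # NTP timestamp, least significant word (units of 2**-32 s)
    rtp_timestamp: int    # RTP timestamp corresponding to the same instant
    clock_rate: int       # RTP clock rate of the payload format, e.g. 90000 for video

    def to_wallclock(self, rtp_ts: int) -> float:
        """Map an RTP timestamp of this stream to sender wallclock time (NTP seconds)."""
        ntp = self.ntp_seconds + self.ntp_fraction / 2**32
        # Signed 32-bit difference, so timestamp wrap-around is handled.
        delta = (rtp_ts - self.rtp_timestamp) & 0xFFFFFFFF
        if delta >= 2**31:
            delta -= 2**32
        return ntp + delta / self.clock_rate


# Audio and video from the same CNAME, each with its own SR mapping.
audio = SenderReportMapping(3900000000, 2**31, 160000, 48000)
video = SenderReportMapping(3900000001, 0, 0xFFFFFF00, 90000)
```

With these values, audio timestamp 208000 and video timestamp 44744 (which has wrapped past 2^32) both map to 3900000001.5. The receiver can therefore play them out together, provided the two streams share a CNAME.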
12.7. Multiple RTP End-points | 12.7. Multiple RTP End-points | |||
A number of usages of RTP discussed here results in that an WebRTC | Some usages of RTP beyond the recommended topologies result in a | |||
end-point sending media in an RTP session out over an PeerConnection | WebRTC end-point that sends media in an RTP session over a single | |||
will receive receiver reports from multiple RTP receiving nodes. | PeerConnection receiving receiver reports from multiple RTP | |||
Note that receiving multiple receiver reports are expected due to | receivers. Note that receiving multiple receiver reports is expected | |||
that any RTP node that has multiple SSRCs are required to report on | because any RTP node that has multiple SSRCs is required to report to | |||
the media sender. The difference here is that they are multiple | the media sender. The difference here is that they are multiple | |||
nodes, and thus will have different path characteristics. | nodes, and thus will likely have different path characteristics. | |||
The topologies relevant to WebRTC when this can occur are centralized | RTP Mixers may create a situation where an end-point experiences | |||
relay and a end-point forwarding a media stream. Mixers are expected | something in-between a session with only two end-points and one with | |||
to not forward media stream reports across itself due to the | multiple end-points. Mixers are expected not to forward RTCP reports | |||
difference in the media stream provided to different end-points which | regarding RTP media streams across themselves. This is due to the | |||
the original media source lacks information about the mixers | difference in the RTP media streams provided to the different end- | |||
manipulation. | points. The original media source lacks information about a mixer's | |||
manipulations prior to sending them to the different receivers. This | ||||
setup also results in an end-point's feedback or requests going | ||||
to the mixer. When the mixer can't act on this by itself, it is | ||||
forced to go to the original media source to fulfill the receiver's | ||||
request. This will not necessarily be explicitly visible in the RTP and | ||||
RTCP traffic, but the interactions and the time to complete them will | ||||
indicate such dependencies. | ||||
Having multiple RTP nodes receive ones RTP flow and send reports and | The topologies in which an end-point receives receiver reports from | |||
feedback about it has several impacts. As previously discussed | multiple other end-points are the centralized relay, multicast and an | |||
(Section 12.3) any codec control and rate control needs to be capable | end-point forwarding an RTP media stream. Having multiple RTP nodes | |||
of merging the requirements and preferences to provide a single best | receive an RTP flow and send reports and feedback about it has | |||
according to the situation media stream. Specifically when it comes | several impacts. As previously discussed (Section 12.3) any codec | |||
to congestion control it needs to be capable of identifying the | control and rate control needs to be capable of merging the | |||
requirements and preferences to provide the single RTP media stream | ||||
encoding best suited to the situation. Specifically, when it | ||||
comes to congestion control it needs to be capable of identifying the | ||||
different end-points to form independent congestion state information | different end-points to form independent congestion state information | |||
for each different path. | for each different path. | |||
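As a rough illustration of keeping independent congestion state per path while still producing one encoding, the sketch below keys state on the reporting end-point. The reporter key (for example, the reporter's CNAME), the class names, and the loss-based update rule are all assumptions made for this example. This is not a congestion control algorithm specified for WebRTC. The only protocol detail used is that the "fraction lost" field of an RTCP report block is an 8-bit fixed-point value, i.e., the loss ratio multiplied by 256 (RFC 3550, Section 6.4.1).

```python
from dataclasses import dataclass, field


@dataclass
class PathState:
    rate_bps: float


@dataclass
class MultiReceiverRateControl:
    """Hypothetical sketch: one congestion state per reporting end-point.

    The update rule is a placeholder for illustration only.
    """
    initial_rate_bps: float
    paths: dict = field(default_factory=dict)  # reporter key -> PathState

    def on_receiver_report(self, reporter: str, fraction_lost: int) -> None:
        # fraction_lost is the 8-bit fixed-point field: loss ratio = value / 256.
        state = self.paths.setdefault(reporter, PathState(self.initial_rate_bps))
        loss = fraction_lost / 256
        if loss > 0.10:
            state.rate_bps *= 1 - loss / 2   # back off on this path only
        elif loss < 0.02:
            state.rate_bps *= 1.05           # probe upwards slowly

    def target_rate_bps(self) -> float:
        # A single encoding sent to every receiver must fit the most constrained path.
        if not self.paths:
            return self.initial_rate_bps
        return min(p.rate_bps for p in self.paths.values())
```

Taking the minimum is only one possible merging policy. Another could be to exclude a persistently poor receiver, a choice the text leaves to the end-point's logic.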
Providing source authentication in multi-party is a challange. In | Providing source authentication in multi-party scenarios is a | |||
the mixer based topologies an end-points source authentication is | challenge. In the mixer-based topologies, end-points' source | |||
based on verifying that media comes from the mixer by cryptographic | authentication is based on, firstly, verifying that media comes from | |||
verification and secondly trust the mixer to correctly identify any | the mixer by cryptographic verification and, secondly, trust in the | |||
source towards the end-point. In RTP sessions where multiple end- | mixer to correctly identify any source towards the end-point. In RTP | |||
points are directly visible to an end-point all end-points have | sessions where multiple end-points are directly visible to an end- | |||
knowledge about each others master keys, and can thus inject packets | point, all end-points will have knowledge about each other's master | |||
claimed to come from another end-point in the session. Any node | keys, and can thus inject packets claimed to come from another end- | |||
performing relay can perform non-cryptographic mitigation by | point in the session. Any node performing relay can perform non- | |||
preventing forwarding of packets that has SSRC fields that has | cryptographic mitigation by preventing forwarding of packets that | |||
previously come from other end-points. For cryptographic | have SSRC fields that came from other end-points before. For | |||
verification of the source SRTP will require additional security | cryptographic verification of the source, SRTP would require | |||
mechanisms, like TESLA for SRTP [RFC4383]. | additional security mechanisms, like TESLA for SRTP [RFC4383]. | |||
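The non-cryptographic mitigation at a relay can be sketched as a table that binds each SSRC to the end-point that first used it. Here an end-point is assumed to be identified by its source transport address. The sketch is hypothetical and deliberately minimal. It provides no authentication, since whoever sends an SSRC first owns it. It also ignores SSRC changes after BYE or collision resolution.

```python
from dataclasses import dataclass, field

Address = tuple[str, int]  # (IP address, UDP port) of the sending end-point


@dataclass
class RelaySsrcFilter:
    """Hypothetical relay-side filter against SSRC spoofing between end-points."""
    owner: dict = field(default_factory=dict)  # SSRC -> Address that first used it

    def should_forward(self, ssrc: int, src: Address) -> bool:
        # The first sender of an SSRC becomes its owner; packets carrying that
        # SSRC from any other end-point are dropped instead of forwarded.
        first = self.owner.setdefault(ssrc, src)
        return first == src
```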
12.8. Simulcast | 12.8. Simulcast | |||
This section discusses simulcast in the meaning of providing a node, | This section discusses simulcast in the meaning of providing a node, | |||
for example a Mixer, with multiple different encoded version of the | for example a Mixer, with multiple different encoded versions of the | |||
same media source. In the WebRTC context that appears to be most | same media source. In the WebRTC context, this could be accomplished | |||
easily accomplished by establishing mutliple PeerConnection all being | in two ways. One is to establish multiple PeerConnections, all being | |||
feed the same set of MediaStreams. Each PeerConnection is then | fed the same set of WebRTC MediaStreams. Another method is to use | |||
configured to deliver a particular media quality and thus media bit- | multiple WebRTC MediaStreams that are configured with different media | |||
rate. This will work well as long as the end-point implements | parameters. This would result in multiple different RTP media | |||
indepdentent media encoding for each PeerConnection and not share the | streams (SSRCs) being in use, each with a different encoding of the | |||
encoder. Simulcast will fail if the end-point uses a common encoder | same media source (camera, microphone). | |||
instance to multiple PeerConnections. | ||||
Thus it should be considered to explicitly signal which of the two | When intending to use simulcast it is important that this is made | |||
implementation strategies that are desired and which will be done. | explicit so that the end-points don't automatically try to optimize | |||
At least making the application and possible the central node | away the different encodings and provide a single common version. | |||
interested in receiving simulcast of an end-points media streams to | Thus, some explicit indications that the intent really is to have | |||
be aware if it will function or not. | different media encodings is likely required. It should be noted | |||
that it might be a central node, rather than a WebRTC end-point, that | ||||
would benefit from receiving simulcasted media sources. | ||||
tbd: How to perform simulcast needs to be determined and the | ||||
appropriate API or signalling for its usage needs to be defined. | ||||
12.9. Differentiated Treatment of Flows | 12.9. Differentiated Treatment of Flows | |||
There exist use cases for differentiated treatment of media streams. | There are use cases for differentiated treatment of RTP media | |||
Such differentiation can happen at several places in the system. | streams. Such differentiation can happen at several places in the | |||
First of all is the prioritization within the end-point for which | system. First of all is the prioritization within the end-point | |||
media streams that should be sent, there allocation of bit-rate out | sending the media, which controls both which RTP media streams | |||
of the current available aggregate as determined by the congestion | will be sent and their allocation of bit-rate out of the current | |||
control. | available aggregate as determined by the congestion control. | |||
Secondly, the transport can prioritize a media streams. This is done | Secondly, the network can prioritize packet flows, including RTP | |||
according to three methods; | media streams. Typically, differential treatment includes two steps: | |||
first, identifying whether an IP packet belongs to a class that | ||||
should be treated differently; second, the actual mechanism to | ||||
prioritize packets. This is done according to three methods: | ||||
Diffserv: The end-point could mark the packet with a diffserv code | Diffserv: The end-point marks a packet with a diffserv code point to | |||
point to indicate to the network how the WebRTC application and | indicate to the network that the packet belongs to a particular | |||
browser would like this particular packet treated. | class. | |||
Flow based: Prioritization of all packets belonging to a particular | Flow based: Packets that shall be given a particular treatment are | |||
media flow or RTP session by keeping them in separated UDP flows. | identified using a combination of IP addresses and port numbers. | |||
Thus enabling either end-point initiated or network initiated | ||||
prioritization of the flow. | ||||
Deep Packet Inspection: A network classifier (DPI) inspects the | Deep Packet Inspection: A network classifier (DPI) inspects the | |||
packet and tries to determine if the packet represents a | packet and tries to determine if the packet represents a | |||
particular application and type that is to be prioritized. | particular application and type that is to be prioritized. | |||
With the exception of diffserv both flow based and DPI have issues | With the exception of diffserv both flow based and DPI have issues | |||
with running multiple media types and flows on a single UDP flow, | with running multiple media types and flows on a single UDP flow, | |||
especially when combined with data transport (SCTP/DTLS). DPI has | especially when combined with data transport (SCTP/DTLS). DPI has | |||
issues due to that multiple different type of flows are aggregated | issues because multiple types of flows are aggregated and thus it | |||
and thus becomes more difficult to apply analysis on. The flow based | becomes more difficult to analyse them. The flow-based | |||
differentiation will provide the same treatment to all packets within | differentiation will provide the same treatment to all packets within | |||
the flow. Thus relative prioritization is not possible. In addition | the flow, i.e., relative prioritization is not possible. Moreover, | |||
if the resources are limited it may not be possible to provide | if the resources are limited it may not be possible to provide | |||
differential treatment compared to best-effort for all the flows in a | differential treatment compared to best-effort for all the flows in a | |||
WebRTC application. | WebRTC application. | |||
When flow based differentiation is available the WebRTC application | When flow-based differentiation is available the WebRTC application | |||
needs to know about so that it can provide the separation of the | needs to know about it so that it can provide the separation of the | |||
media streams onto different UDP flows to enable a more granular | RTP media streams onto different UDP flows to enable a more granular | |||
usage of flow based differentiation. | usage of flow based differentiation. | |||
Diffserv is based on that either the end-point or a classifier can | Diffserv assumes that either the end-point or a classifier can mark | |||
mark the packets with an appropriate DSCP so the packets is treated | the packets with an appropriate DSCP so that the packets are treated | |||
according to that marking. If the end-point is to mark the traffic | according to that marking. If the end-point is to mark the traffic | |||
there exist two requirements in the WebRTC context. The first is | two requirements arise in the WebRTC context: 1) The WebRTC | |||
that the WebRTC application or browser knows which DSCP to use and | application or browser has to know which DSCP to use and that it can | |||
that it can use them on some set of media streams. Secondly the | use them on some set of RTP media streams. 2) The information needs | |||
information needs to be propagated to the operating system when | to be propagated to the operating system when transmitting the | |||
transmitting the packet. | packet. | |||
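For the second requirement, this minimal sketch shows one common way a browser could pass a DSCP to the operating system for IPv4 UDP traffic: setting the IP_TOS socket option through Python's standard socket module. DSCP occupies the upper six bits of the former TOS octet, and the low two bits are left for ECN (RFC 2474, RFC 3168). The function names are hypothetical. Expedited Forwarding (DSCP 46) is used only as an example value, since which DSCPs WebRTC should use is still open. IPv6 uses a different option (IPV6_TCLASS), and some operating systems ignore or restrict IP_TOS.

```python
import socket


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper bits of the IPv4 TOS octet (RFC 2474)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2  # the low two bits remain available for ECN


def open_marked_udp_socket(dscp: int) -> socket.socket:
    """Open an IPv4 UDP socket whose outgoing packets request the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

For example, dscp_to_tos(46) yields 184 (0xB8). Because marking applies per socket here, separating RTP media streams onto different UDP flows is also what makes per-stream marking straightforward.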
Open Issue: How will the WebRTC application and/or browser know that | tbd: The model for providing differentiated treatment needs to be | |||
differentiated treatment is desired and available and ensure that it | evolved. This includes: | |||
gets the information required to correctly configure the WebRTC | ||||
multimedia conference. | 1. How the application can prioritize MediaStreamTracks differently | |||
in the API | ||||
2. How the browser or application determines availability of | ||||
transport differentiation | ||||
3. How to learn about any configuration information for transport | ||||
differentiation, such as DSCPs. | ||||
13. IANA Considerations | 13. IANA Considerations | |||
This memo makes no request of IANA. | This memo makes no request of IANA. | |||
Note to RFC Editor: this section may be removed on publication as an | Note to RFC Editor: this section may be removed on publication as an | |||
RFC. | RFC. | |||
14. Security Considerations | 14. Security Considerations | |||
RTP and its various extensions each have their own security | RTP and its various extensions each have their own security | |||
considerations. These should be taken into account when considering | considerations. These should be taken into account when considering | |||
the security properties of the complete suite. We currently don't | the security properties of the complete suite. We currently don't | |||
think this suite creates any additional security issues or | think this suite creates any additional security issues or | |||
properties. The use of SRTP [RFC3711] will provide protection or | properties. The use of SRTP [RFC3711] will provide protection or | |||
mitigation against all the fundamental issues by offering | mitigation against most of the fundamental issues by offering | |||
confidentiality, integrity and partial source authentication. A | confidentiality, integrity and partial source authentication. A | |||
mandatory to implement media security solution will be required to be | mandatory to implement media security solution will be required to be | |||
picked. We currently don't discuss the key-management aspect of SRTP | picked. We currently don't discuss the key-management aspect of SRTP | |||
in this memo, that needs to be done taking the WebRTC communication | in this memo, that needs to be done taking the WebRTC communication | |||
model into account. | model into account. | |||
The guidelines in [I-D.ietf-avtcore-srtp-vbr-audio] apply when using | Privacy concerns, including the generation of non-trackable CNAMEs, | |||
variable bit rate (VBR) audio codecs, for example Opus or the Mixer | are under discussion. | |||
audio level header extensions. | ||||
The guidelines in [RFC6562] apply when using variable bit rate (VBR) | ||||
audio codecs, for example Opus or the Mixer audio level header | ||||
extensions. | ||||
Security considerations for the WebRTC work are discussed in | Security considerations for the WebRTC work are discussed in | |||
[I-D.ietf-rtcweb-security]. | [I-D.ietf-rtcweb-security]. | |||
15. Acknowledgements | 15. Acknowledgements | |||
The authors would like to thank Harald Alvestrand, Cary Bran, Charles | The authors would like to thank Harald Alvestrand, Cary Bran, Charles | |||
Eckel and Cullen Jennings for valuable feedback. | Eckel and Cullen Jennings for valuable feedback. | |||
16. References | 16. References | |||
skipping to change at page 32, line 47 ¶ | skipping to change at page 32, line 22 ¶ | |||
Using Session Description Protocol (SDP) Port Numbers", | Using Session Description Protocol (SDP) Port Numbers", | |||
draft-holmberg-mmusic-sdp-bundle-negotiation-00 (work in | draft-holmberg-mmusic-sdp-bundle-negotiation-00 (work in | |||
progress), October 2011. | progress), October 2011. | |||
[I-D.ietf-avtcore-srtp-encrypted-header-ext] | [I-D.ietf-avtcore-srtp-encrypted-header-ext] | |||
Lennox, J., "Encryption of Header Extensions in the Secure | Lennox, J., "Encryption of Header Extensions in the Secure | |||
Real-Time Transport Protocol (SRTP)", | Real-Time Transport Protocol (SRTP)", | |||
draft-ietf-avtcore-srtp-encrypted-header-ext-01 (work in | draft-ietf-avtcore-srtp-encrypted-header-ext-01 (work in | |||
progress), October 2011. | progress), October 2011. | |||
[I-D.ietf-avtcore-srtp-vbr-audio] | [I-D.ietf-avtext-multiple-clock-rates] | |||
Perkins, C. and J. Valin, "Guidelines for the use of | Petit-Huguenin, M. and G. Zorn, "Support for Multiple | |||
Variable Bit Rate Audio with Secure RTP", | Clock Rates in an RTP Session", | |||
draft-ietf-avtcore-srtp-vbr-audio-04 (work in progress), | draft-ietf-avtext-multiple-clock-rates-05 (work in | |||
December 2011. | progress), May 2012. | |||
[I-D.ietf-rtcweb-overview] | [I-D.ietf-rtcweb-overview] | |||
Alvestrand, H., "Overview: Real Time Protocols for Brower- | Alvestrand, H., "Overview: Real Time Protocols for Brower- | |||
based Applications", draft-ietf-rtcweb-overview-03 (work | based Applications", draft-ietf-rtcweb-overview-04 (work | |||
in progress), March 2012. | in progress), June 2012. | |||
[I-D.ietf-rtcweb-security] | [I-D.ietf-rtcweb-security] | |||
Rescorla, E., "Security Considerations for RTC-Web", | Rescorla, E., "Security Considerations for RTC-Web", | |||
draft-ietf-rtcweb-security-02 (work in progress), | draft-ietf-rtcweb-security-03 (work in progress), | |||
March 2012. | June 2012. | |||
[I-D.jesup-rtp-congestion-reqs] | ||||
Jesup, R. and H. Alvestrand, "Congestion Control | ||||
Requirements For Real Time Media", | ||||
draft-jesup-rtp-congestion-reqs-00 (work in progress), | ||||
March 2012. | ||||
[I-D.lennox-rtcweb-rtp-media-type-mux] | [I-D.lennox-rtcweb-rtp-media-type-mux] | |||
Rosenberg, J. and J. Lennox, "Multiplexing Multiple Media | Rosenberg, J. and J. Lennox, "Multiplexing Multiple Media | |||
Types In a Single Real-Time Transport Protocol (RTP) | Types In a Single Real-Time Transport Protocol (RTP) | |||
Session", draft-lennox-rtcweb-rtp-media-type-mux-00 (work | Session", draft-lennox-rtcweb-rtp-media-type-mux-00 (work | |||
in progress), October 2011. | in progress), October 2011. | |||
[I-D.perkins-avtcore-rtp-circuit-breakers] | [I-D.perkins-avtcore-rtp-circuit-breakers] | |||
Perkins, C. and V. Singh, "RTP Congestion Control: Circuit | Perkins, C. and V. Singh, "RTP Congestion Control: Circuit | |||
Breakers for Unicast Sessions", | Breakers for Unicast Sessions", | |||
draft-perkins-avtcore-rtp-circuit-breakers-00 (work in | draft-perkins-avtcore-rtp-circuit-breakers-00 (work in | |||
progress), March 2012. | progress), March 2012. | |||
[I-D.westerlund-avtcore-multiplex-architecture] | ||||
Westerlund, M., Burman, B., and C. Perkins, "RTP | ||||
Multiplexing Architecture", | ||||
draft-westerlund-avtcore-multiplex-architecture-01 (work | ||||
in progress), March 2012. | ||||
[I-D.westerlund-avtcore-transport-multiplexing] | [I-D.westerlund-avtcore-transport-multiplexing] | |||
Westerlund, M. and C. Perkins, "Multiple RTP Sessions on a | Westerlund, M. and C. Perkins, "Multiple RTP Sessions on a | |||
Single Lower-Layer Transport", | Single Lower-Layer Transport", | |||
draft-westerlund-avtcore-transport-multiplexing-02 (work | draft-westerlund-avtcore-transport-multiplexing-02 (work | |||
in progress), March 2012. | in progress), March 2012. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
[RFC2736] Handley, M. and C. Perkins, "Guidelines for Writers of RTP | [RFC2736] Handley, M. and C. Perkins, "Guidelines for Writers of RTP | |||
skipping to change at page 34, line 37 ¶ | skipping to change at page 33, line 48 ¶ | |||
Hakenberg, "RTP Retransmission Payload Format", RFC 4588, | Hakenberg, "RTP Retransmission Payload Format", RFC 4588, | |||
July 2006. | July 2006. | |||
[RFC4961] Wing, D., "Symmetric RTP / RTP Control Protocol (RTCP)", | [RFC4961] Wing, D., "Symmetric RTP / RTP Control Protocol (RTCP)", | |||
BCP 131, RFC 4961, July 2007. | BCP 131, RFC 4961, July 2007. | |||
[RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, | [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, | |||
"Codec Control Messages in the RTP Audio-Visual Profile | "Codec Control Messages in the RTP Audio-Visual Profile | |||
with Feedback (AVPF)", RFC 5104, February 2008. | with Feedback (AVPF)", RFC 5104, February 2008. | |||
[RFC5109] Li, A., "RTP Payload Format for Generic Forward Error | ||||
Correction", RFC 5109, December 2007. | ||||
[RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for | [RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for | |||
Real-time Transport Control Protocol (RTCP)-Based Feedback | Real-time Transport Control Protocol (RTCP)-Based Feedback | |||
(RTP/SAVPF)", RFC 5124, February 2008. | (RTP/SAVPF)", RFC 5124, February 2008. | |||
[RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP | [RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP | |||
Header Extensions", RFC 5285, July 2008. | Header Extensions", RFC 5285, July 2008. | |||
[RFC5506] Johansson, I. and M. Westerlund, "Support for Reduced-Size | [RFC5506] Johansson, I. and M. Westerlund, "Support for Reduced-Size | |||
Real-Time Transport Control Protocol (RTCP): Opportunities | Real-Time Transport Control Protocol (RTCP): Opportunities | |||
and Consequences", RFC 5506, April 2009. | and Consequences", RFC 5506, April 2009. | |||
skipping to change at page 35, line 24 ¶ | skipping to change at page 34, line 34 ¶ | |||
(CNAMEs)", RFC 6222, April 2011. | (CNAMEs)", RFC 6222, April 2011. | |||
[RFC6464] Lennox, J., Ivov, E., and E. Marocco, "A Real-time | [RFC6464] Lennox, J., Ivov, E., and E. Marocco, "A Real-time | |||
Transport Protocol (RTP) Header Extension for Client-to- | Transport Protocol (RTP) Header Extension for Client-to- | |||
Mixer Audio Level Indication", RFC 6464, December 2011. | Mixer Audio Level Indication", RFC 6464, December 2011. | |||
[RFC6465] Ivov, E., Marocco, E., and J. Lennox, "A Real-time | [RFC6465] Ivov, E., Marocco, E., and J. Lennox, "A Real-time | |||
Transport Protocol (RTP) Header Extension for Mixer-to- | Transport Protocol (RTP) Header Extension for Mixer-to- | |||
Client Audio Level Indication", RFC 6465, December 2011. | Client Audio Level Indication", RFC 6465, December 2011. | |||
[RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of | ||||
Variable Bit Rate Audio with Secure RTP", RFC 6562, | ||||
March 2012. | ||||
16.2. Informative References | 16.2. Informative References | |||
[I-D.alvestrand-rtcweb-msid] | [I-D.alvestrand-rtcweb-msid] | |||
Alvestrand, H., "Cross Session Stream Identification in | Alvestrand, H., "Cross Session Stream Identification in | |||
the Session Description Protocol", | the Session Description Protocol", | |||
draft-alvestrand-rtcweb-msid-02 (work in progress), | draft-alvestrand-rtcweb-msid-02 (work in progress), | |||
May 2012. | May 2012. | |||
[I-D.begen-mmusic-redundancy-grouping] | ||||
Begen, A., Cai, Y., and H. Ou, "Duplication Grouping | ||||
Semantics in the Session Description Protocol", | ||||
draft-begen-mmusic-redundancy-grouping-03 (work in | ||||
progress), March 2012. | ||||
[I-D.cbran-rtcweb-data] | ||||
Bran, C. and C. Jennings, "RTC-Web Non-Media Data | ||||
Transport Requirements", draft-cbran-rtcweb-data-00 (work | ||||
in progress), July 2011. | ||||
[I-D.ietf-avt-srtp-ekt] | [I-D.ietf-avt-srtp-ekt] | |||
Wing, D., McGrew, D., and K. Fischer, "Encrypted Key | Wing, D., McGrew, D., and K. Fischer, "Encrypted Key | |||
Transport for Secure RTP", draft-ietf-avt-srtp-ekt-03 | Transport for Secure RTP", draft-ietf-avt-srtp-ekt-03 | |||
(work in progress), October 2011. | (work in progress), October 2011. | |||
[I-D.ietf-fecframe-framework] | [I-D.ietf-rtcweb-use-cases-and-requirements] | |||
Watson, M., Begen, A., and V. Roca, "Forward Error | Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real- | |||
Correction (FEC) Framework", | Time Communication Use-cases and Requirements", | |||
draft-ietf-fecframe-framework-15 (work in progress), | draft-ietf-rtcweb-use-cases-and-requirements-09 (work in | |||
June 2011. | progress), June 2012. | |||
[RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., | [I-D.jesup-rtp-congestion-reqs] | |||
Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse- | Jesup, R. and H. Alvestrand, "Congestion Control | |||
Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, | Requirements For Real Time Media", | |||
September 1997. | draft-jesup-rtp-congestion-reqs-00 (work in progress), | |||
March 2012. | ||||
[I-D.westerlund-avtcore-multiplex-architecture] | ||||
Westerlund, M., Burman, B., and C. Perkins, "RTP | ||||
Multiplexing Architecture", | ||||
draft-westerlund-avtcore-multiplex-architecture-01 (work | ||||
in progress), March 2012. | ||||
[RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion | [RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion | |||
Control Protocol (DCCP) Congestion Control ID 2: TCP-like | Control Protocol (DCCP) Congestion Control ID 2: TCP-like | |||
Congestion Control", RFC 4341, March 2006. | Congestion Control", RFC 4341, March 2006. | |||
[RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for | [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for | |||
Datagram Congestion Control Protocol (DCCP) Congestion | Datagram Congestion Control Protocol (DCCP) Congestion | |||
Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, | Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, | |||
March 2006. | March 2006. | |||
[RFC4383] Baugher, M. and E. Carrara, "The Use of Timed Efficient | [RFC4383] Baugher, M. and E. Carrara, "The Use of Timed Efficient | |||
Stream Loss-Tolerant Authentication (TESLA) in the Secure | Stream Loss-Tolerant Authentication (TESLA) in the Secure | |||
Real-time Transport Protocol (SRTP)", RFC 4383, | Real-time Transport Protocol (SRTP)", RFC 4383, | |||
February 2006. | February 2006. | |||
[RFC4828] Floyd, S. and E. Kohler, "TCP Friendly Rate Control | [RFC4828] Floyd, S. and E. Kohler, "TCP Friendly Rate Control | |||
(TFRC): The Small-Packet (SP) Variant", RFC 4828, | (TFRC): The Small-Packet (SP) Variant", RFC 4828, | |||
April 2007. | April 2007. | |||
[RFC4867] Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie, | ||||
"RTP Payload Format and File Storage Format for the | ||||
Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband | ||||
(AMR-WB) Audio Codecs", RFC 4867, April 2007. | ||||
[RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, | [RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, | |||
January 2008. | January 2008. | |||
[RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP | [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP | |||
Friendly Rate Control (TFRC): Protocol Specification", | Friendly Rate Control (TFRC): Protocol Specification", | |||
RFC 5348, September 2008. | RFC 5348, September 2008. | |||
[RFC5404] Westerlund, M. and I. Johansson, "RTP Payload Format for | ||||
G.719", RFC 5404, January 2009. | ||||
[RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific | [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific | |||
Media Attributes in the Session Description Protocol | Media Attributes in the Session Description Protocol | |||
(SDP)", RFC 5576, June 2009. | (SDP)", RFC 5576, June 2009. | |||
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
Control", RFC 5681, September 2009. | Control", RFC 5681, September 2009. | |||
[RFC5968] Ott, J. and C. Perkins, "Guidelines for Extending the RTP | [RFC5968] Ott, J. and C. Perkins, "Guidelines for Extending the RTP | |||
Control Protocol (RTCP)", RFC 5968, September 2010. | Control Protocol (RTCP)", RFC 5968, September 2010. | |||
skipping to change at page 37, line 20 ¶ | skipping to change at page 36, line 23 ¶ | |||
RTP supports both unicast and group communication, with participants | RTP supports both unicast and group communication, with participants | |||
being connected using a wide range of transport-layer topologies. Some | being connected using a wide range of transport-layer topologies. Some | |||
of these topologies involve only the end-points, while others use RTP | of these topologies involve only the end-points, while others use RTP | |||
translators and mixers to provide in-network processing. Properties | translators and mixers to provide in-network processing. Properties | |||
of some RTP topologies are discussed in [RFC5117], and we further | of some RTP topologies are discussed in [RFC5117], and we further | |||
describe those expected to be useful for WebRTC in the following. We | describe those expected to be useful for WebRTC in the following. We | |||
also discuss important RTP session requirements that the topology or | also discuss important RTP session requirements that the topology or | |||
implementation variant can place on a WebRTC end-point. | implementation variant can place on a WebRTC end-point. | |||
This section includes RTP topologies beyond the recommended ones. | ||||
This is an attempt to highlight the differences, in many cases small, | ||||
in implementation needed to support a larger set of | ||||
possible topologies. | ||||
A.1. Point to Point | A.1. Point to Point | |||
The point-to-point RTP topology (Figure 5) is the simplest scenario | The point-to-point RTP topology (Figure 3) is the simplest scenario | |||
for WebRTC applications. This is expected to be very common for | for WebRTC applications. This is expected to be very common for | |||
user-to-user calls. | user-to-user calls. | |||
+---+ +---+ | +---+ +---+ | |||
| A |<------->| B | | | A |<------->| B | | |||
+---+ +---+ | +---+ +---+ | |||
Figure 5: Point to Point | Figure 3: Point to Point | |||
As this is the basic topology, let us use it to highlight a couple | As this is the basic topology, let us use it to highlight a couple | |||
of details that are common for all RTP usage in the WebRTC context. | of details that are common for all RTP usage in the WebRTC context. | |||
The first is the intention to multiplex RTP and RTCP over the same | The first is the intention to multiplex RTP and RTCP over the same | |||
UDP flow. The second is the question of using only a single RTP | UDP flow. The second is the question of using only a single RTP | |||
session, or one per media type for legacy interoperability. The | session, or one per media type for legacy interoperability. The | |||
third is the question of using multiple sender sources (SSRCs) per | third is the question of using multiple sender sources (SSRCs) per | |||
end-point. | end-point. | |||
Historically, RTP and RTCP have been run on separate UDP ports. With | Historically, RTP and RTCP have been run on separate UDP ports. With | |||
the increased use of Network Address/Port Translation (NAPT) this has | the increased use of Network Address/Port Translation (NAPT) this has | |||
skipping to change at page 38, line 13 ¶ | skipping to change at page 37, line 21 ¶ | |||
(e.g., audio and video), then each type of media can be sent as a | (e.g., audio and video), then each type of media can be sent as a | |||
separate RTP session using a different 5-tuple, allowing for separate | separate RTP session using a different 5-tuple, allowing for separate | |||
transport level treatment of each type of media. Alternatively, all | transport level treatment of each type of media. Alternatively, all | |||
types of media can be multiplexed onto a single 5-tuple as a single | types of media can be multiplexed onto a single 5-tuple as a single | |||
RTP session, or as several RTP sessions if using a demultiplexing | RTP session, or as several RTP sessions if using a demultiplexing | |||
shim. Multiplexing different types of media onto a single 5-tuple | shim. Multiplexing different types of media onto a single 5-tuple | |||
places some limitations on how RTP is used, as described in "RTP | places some limitations on how RTP is used, as described in "RTP | |||
Multiplexing Architecture" | Multiplexing Architecture" | |||
[I-D.westerlund-avtcore-multiplex-architecture]. It is not expected | [I-D.westerlund-avtcore-multiplex-architecture]. It is not expected | |||
that these limitations will significantly affect the scenarios | that these limitations will significantly affect the scenarios | |||
targetted by WebRTC, but they may impact interoperability with legacy | targeted by WebRTC, but they may impact interoperability with legacy | |||
systems. | systems. | |||
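When RTP and RTCP are multiplexed onto the same UDP flow, as discussed above, a receiver has to separate the two on arrival. The sketch below is illustrative only and assumes the RFC 5761 convention, under which RTP payload types 64-95 are not used on a multiplexed flow, so a second octet in the range 192-223 can only be an RTCP packet type. The function name `classify_packet` is hypothetical.

```python
def classify_packet(packet: bytes) -> str:
    """Classify a datagram received on a flow carrying both RTP and RTCP.

    Sketch assuming RFC 5761-style multiplexing: RTP payload types 64-95
    are avoided, so a second octet in 192..223 can only be an RTCP packet
    type (for example 200 = SR, 201 = RR).
    """
    # Both the RTP and the RTCP headers are at least 4 octets long.
    if len(packet) < 4:
        return "invalid"
    # The two most significant bits of the first octet carry the version (2).
    if packet[0] >> 6 != 2:
        return "invalid"
    if 192 <= packet[1] <= 223:
        return "rtcp"
    return "rtp"
```

An RTP packet with the marker bit set and payload type 96 has a second octet of 224, which lies outside the RTCP range and is therefore still classified as RTP.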
An RTP session has good support for simultaneously transporting | An RTP session has good support for simultaneously transporting | |||
multiple media sources. Each media source uses a unique SSRC | multiple media sources. Each media source uses a unique SSRC | |||
identifier, and each SSRC has independent RTP sequence number and | identifier, and each SSRC has independent RTP sequence number and | |||
timestamp spaces. This is utilized in WebRTC for several cases. One | timestamp spaces. This is utilized in WebRTC for several cases. One | |||
is to enable multiple media sources of the same type: an end-point | is to enable multiple media sources of the same type: an end-point | |||
that has two video cameras can potentially transmit video from both | that has two video cameras can potentially transmit video from both | |||
to its peer(s). Another usage is when a single RTP session is being | to its peer(s). Another usage is when a single RTP session is being | |||
used for multiple media types, thus an end-point can transmit both | used for multiple media types, thus an end-point can transmit both | |||
skipping to change at page 39, line 28 ¶ | skipping to change at page 38, line 28 ¶ | |||
| | | | +-Video-| |-Video-+ | | | | | | | | | +-Video-| |-Video-+ | | | | | |||
| | | | | AV1|---------------->| | | | | | | | | | | | AV1|---------------->| | | | | | | |||
| | | | | AV2|---------------->| | | | | | | | | | | | AV2|---------------->| | | | | | | |||
| | | | | |<----------------|BV1 | | | | | | | | | | | |<----------------|BV1 | | | | | | |||
| | | | +-------| |-------+ | | | | | | | | | +-------| |-------+ | | | | | |||
| | | +---------| |---------+ | | | | | | | +---------| |---------+ | | | | |||
| | +-----------| |-----------+ | | | | | +-----------| |-----------+ | | | |||
| +-------------| |-------------+ | | | +-------------| |-------------+ | | |||
+---------------+ +---------------+ | +---------------+ +---------------+ | |||
Figure 6: Point to Point: Multiple RTP sessions | Figure 4: Point to Point: Multiple RTP sessions | |||
As can be seen above in the Point to Point: Multiple RTP sessions | As can be seen above in the Point to Point: Multiple RTP sessions | |||
(Figure 6), the single PeerConnection contains two RTP sessions over | (Figure 4), the single PeerConnection contains two RTP sessions over | |||
different UDP flows, UDP1 and UDP2, i.e., their 5-tuples will be | different UDP flows, UDP1 and UDP2, i.e., their 5-tuples will be | |||
different, normally in the source and destination ports. The first | different, normally in the source and destination ports. The first | |||
RTP session (RTP1) carries audio, one stream in each direction (AA1 | RTP session (RTP1) carries audio, one stream in each direction (AA1 | |||
and BA1). The second RTP session contains two video streams from A | and BA1). The second RTP session contains two video streams from A | |||
to B (AV1 and AV2) and one from B to A (BV1). | to B (AV1 and AV2) and one from B to A (BV1). | |||
+-A-------------+ +-B-------------+ | +-A-------------+ +-B-------------+ | |||
| +-PeerC1------| |-PeerC1------+ | | | +-PeerC1------| |-PeerC1------+ | | |||
| | +-UDP1------| |-UDP1------+ | | | | | +-UDP1------| |-UDP1------+ | | | |||
| | | +-RTP1----| |-RTP1----+ | | | | | | | +-RTP1----| |-RTP1----+ | | | | |||
skipping to change at page 40, line 24 ¶ | skipping to change at page 39, line 24 ¶ | |||
| | | | +-Video-| |-Video-+ | | | | | | | | | +-Video-| |-Video-+ | | | | | |||
| | | | | AV1|---------------->| | | | | | | | | | | | AV1|---------------->| | | | | | | |||
| | | | | AV2|---------------->| | | | | | | | | | | | AV2|---------------->| | | | | | | |||
| | | | | |<----------------|BV1 | | | | | | | | | | | |<----------------|BV1 | | | | | | |||
| | | | +-------| |-------+ | | | | | | | | | +-------| |-------+ | | | | | |||
| | | +---------| |---------+ | | | | | | | +---------| |---------+ | | | | |||
| | +-----------| |-----------+ | | | | | +-----------| |-----------+ | | | |||
| +-------------| |-------------+ | | | +-------------| |-------------+ | | |||
+---------------+ +---------------+ | +---------------+ +---------------+ | |||
Figure 7: Point to Point: Single RTP session. | Figure 5: Point to Point: Single RTP session. | |||
In Figure 7, there is only a single UDP flow and RTP session (RTP1). | In Figure 5, there is only a single UDP flow and RTP session (RTP1). | |||
This RTP session carries a total of five (5) media streams (SSRCs). | This RTP session carries a total of five (5) RTP media streams | |||
From A to B there are audio (AA1) and two video streams (AV1 and | (SSRCs). From A to B there are audio (AA1) and two video streams | |||
AV2). From B to A there are audio (BA1) and video (BV1). | (AV1 and AV2). From B to A there are audio (BA1) and video (BV1). | |||
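Because every SSRC has its own sequence number and timestamp space, a receiver on a single RTP session like the one above keys its per-source state on the SSRC. The following is a minimal sketch under that assumption. The class and field names are hypothetical, and it parses only the 12-octet fixed RTP header from RFC 3550, ignoring CSRC lists and header extensions.

```python
import struct
from dataclasses import dataclass


@dataclass
class SourceState:
    """Receive state for one SSRC; each SSRC has its own seq/timestamp space."""
    highest_seq: int
    last_timestamp: int
    packets: int = 0


def parse_rtp_fixed_header(packet: bytes) -> tuple[int, int, int, int]:
    """Return (payload_type, sequence_number, timestamp, ssrc)."""
    if len(packet) < 12 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    _, marker_pt, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return marker_pt & 0x7F, seq, timestamp, ssrc


class SsrcDemultiplexer:
    """Routes the packets of one RTP session to per-SSRC state."""

    def __init__(self) -> None:
        self.sources: dict[int, SourceState] = {}

    def receive(self, packet: bytes) -> int:
        _pt, seq, timestamp, ssrc = parse_rtp_fixed_header(packet)
        state = self.sources.get(ssrc)
        if state is None:
            # First packet from a previously unseen media source.
            state = SourceState(highest_seq=seq, last_timestamp=timestamp)
            self.sources[ssrc] = state
        elif (seq - state.highest_seq) % 65536 < 32768:
            # Newer (or a duplicate) when ahead by less than half of the
            # 16-bit sequence space, which also handles wrap-around.
            state.highest_seq = seq
            state.last_timestamp = timestamp
        state.packets += 1
        return ssrc
```

Keying on the SSRC alone is what lets audio and both video streams from A share one 5-tuple: their sequence numbers never interfere, even when one of them wraps.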
A.2. Multi-Unicast (Mesh) | A.2. Multi-Unicast (Mesh) | |||
For small multiparty calls, it is practical to set up a multi-unicast | For small multiparty calls, it is practical to set up a multi-unicast | |||
topology (Figure 8), which unfortunately is not discussed in the RTP | topology (Figure 6), which unfortunately is not discussed in the RTP | |||
Topologies RFC [RFC5117]. In this topology, each participant sends | Topologies RFC [RFC5117]. In this topology, each participant sends | |||
individual unicast RTP/UDP/IP flows to each of the other participants | individual unicast RTP/UDP/IP flows to each of the other participants | |||
using independent PeerConnections in a full mesh. | using independent PeerConnections in a full mesh. | |||
+---+ +---+ | +---+ +---+ | |||
| A |<---->| B | | | A |<---->| B | | |||
+---+ +---+ | +---+ +---+ | |||
^ ^ | ^ ^ | |||
\ / | \ / | |||
\ / | \ / | |||
v v | v v | |||
+---+ | +---+ | |||
| C | | | C | | |||
+---+ | +---+ | |||
Figure 8: Multi-unicast | Figure 6: Multi-unicast | |||
This topology has the benefit of not requiring central nodes. The | This topology has the benefit of not requiring central nodes. The | |||
downside is that it increases the bandwidth used at each sender by | downside is that it increases the bandwidth used at each sender by | |||
requiring one copy of the media streams for each participant that is | requiring one copy of the RTP media streams for each participant that | |||
part of the same session beyond the sender itself. Hence, this | is part of the same session beyond the sender itself. Hence, this | |||
topology is limited to scenarios with few participants unless the | topology is limited to scenarios with few participants unless the | |||
media is very low bandwidth. The multi-unicast topology could be | media is very low bandwidth. The multi-unicast topology could be | |||
implemented as a single RTP session, spanning multiple peer-to-peer | implemented as a single RTP session, spanning multiple peer-to-peer | |||
transport layer connections, or as several pairwise RTP sessions, one | transport layer connections, or as several pairwise RTP sessions, one | |||
between each pair of peers. To maintain a coherent mapping between | between each pair of peers. To maintain a coherent mapping between | |||
RTP sessions and PeerConnections, we recommend implementing this as | RTP sessions and PeerConnections, we recommend implementing this as | |||
individual RTP sessions. The only downside is that end-point A will | individual RTP sessions. The only downside is that end-point A will | |||
not learn of the quality of any transmission happening between B and | not learn of the quality of any transmission happening between B and | |||
C based on RTCP. This has not been seen as a significant downside, | C based on RTCP. This has not been seen as a significant downside, | |||
since no one has yet seen a need | since no one has yet seen a need | |||
skipping to change at page 41, line 49 ¶ | skipping to change at page 40, line 49 ¶ | |||
| | | | +-RTP2----| |-RTP2----+ | | | | | | | | +-RTP2----| |-RTP2----+ | | | | |||
| | +----+ | | | +-Audio-| |-Audio-+ | | | | | | | +----+ | | | +-Audio-| |-Audio-+ | | | | | |||
| +->|ENC2|--+-+-+-+--->AA2|------------->| | | | | | | | +->|ENC2|--+-+-+-+--->AA2|------------->| | | | | | | |||
| +----+ | | | | |<-------------|CA1 | | | | | | | +----+ | | | | |<-------------|CA1 | | | | | | |||
| | | | +-------| |-------+ | | | | | | | | | +-------| |-------+ | | | | | |||
| | | +---------| |---------+ | | | | | | | +---------| |---------+ | | | | |||
| | +-----------| |-----------+ | | | | | +-----------| |-----------+ | | | |||
| +-------------| |-------------+ | | | +-------------| |-------------+ | | |||
+--------------------------+ +---------------+ | +--------------------------+ +---------------+ | |||
Figure 9: Session strcuture for Multi-Unicast Setup | Figure 7: Session structure for Multi-Unicast Setup | |||
Let us review how the RTP sessions look from A's perspective by | Let us review how the RTP sessions look from A's perspective by | |||
considering both how the media is handled and what PeerConnections | considering both how the media is handled and what PeerConnections | |||
and RTP sessions are set up in Figure 9. A's microphone is | and RTP sessions are set up in Figure 7. A's microphone is | |||
captured, and the digital audio can then be fed into two different | captured, and the digital audio can then be fed into two different | |||
encoder instances, each associated with one of two different | encoder instances, each associated with one of two different | |||
PeerConnections (PeerC1 and PeerC2), each containing an independent | PeerConnections (PeerC1 and PeerC2), each containing an independent | |||
RTP session (RTP1 and RTP2). The SSRCs in each RTP session will be | RTP session (RTP1 and RTP2). The SSRCs in each RTP session will be | |||
completely independent, and the media bit-rate produced by each | completely independent, and the media bit-rate produced by each | |||
encoder can also be tuned to the congestion control requirements of | encoder can also be tuned to the congestion control requirements of | |||
the path from A to B independently of the path from A to C. | the path from A to B independently of the path from A to C. | |||
For media encodings that are more resource-intensive, like video, | For media encodings that are more resource-intensive, like video, | |||
one can expect that resource-constrained end-points will commonly | one can expect that resource-constrained end-points will commonly | |||
use a different implementation strategy, where the encoder is shared | use a different implementation strategy, where the encoder is shared | |||
between the different PeerConnections, as shown below in Figure 10. | between the different PeerConnections, as shown below in Figure 8. | |||
+-A----------------------+ +-B-------------+ | +-A----------------------+ +-B-------------+ | |||
|+---+ | | | | |+---+ | | | | |||
||CAM| +-PeerC1------| |-PeerC1------+ | | ||CAM| +-PeerC1------| |-PeerC1------+ | | |||
|+---+ | +-UDP1------| |-UDP1------+ | | | |+---+ | +-UDP1------| |-UDP1------+ | | | |||
| | | | +-RTP1----| |-RTP1----+ | | | | | | | | +-RTP1----| |-RTP1----+ | | | | |||
| V | | | +-Video-| |-Video-+ | | | | | | V | | | +-Video-| |-Video-+ | | | | | |||
|+----+ | | | | |<----------------|BV1 | | | | | | |+----+ | | | | |<----------------|BV1 | | | | | | |||
||ENC |----+-+-+-+--->AV1|---------------->| | | | | | | ||ENC |----+-+-+-+--->AV1|---------------->| | | | | | | |||
|+----+ | | | +-------| |-------+ | | | | | |+----+ | | | +-------| |-------+ | | | | | |||
| | | | +---------| |---------+ | | | | | | | | +---------| |---------+ | | | | |||
skipping to change at page 42, line 46 ¶ | skipping to change at page 41, line 46 ¶ | |||
| | | | +-RTP2----| |-RTP2----+ | | | | | | | | +-RTP2----| |-RTP2----+ | | | | |||
| | | | | +-Video-| |-Video-+ | | | | | | | | | | +-Video-| |-Video-+ | | | | | |||
| +-------+-+-+-+--->AV2|---------------->| | | | | | | | +-------+-+-+-+--->AV2|---------------->| | | | | | | |||
| | | | | |<----------------|CV1 | | | | | | | | | | | |<----------------|CV1 | | | | | | |||
| | | | +-------| |-------+ | | | | | | | | | +-------| |-------+ | | | | | |||
| | | +---------| |---------+ | | | | | | | +---------| |---------+ | | | | |||
| | +-----------| |-----------+ | | | | | +-----------| |-----------+ | | | |||
| +-------------| |-------------+ | | | +-------------| |-------------+ | | |||
+------------------------+ +---------------+ | +------------------------+ +---------------+ | |||
Figure 10: Single Encoder Multi-Unicast Setup | Figure 8: Single Encoder Multi-Unicast Setup | |||
This clearly saves resources consumed by encoding, but does | This clearly saves resources consumed by encoding, but does | |||
introduce the need for end-point A to decide how it encodes the | introduce the need for end-point A to decide how it encodes the | |||
media so that it suits delivery to both B and C. This is not limited | media so that it suits delivery to both B and C. This is not limited | |||
to congestion control; the preferred resolution to receive, based on | to congestion control; the preferred resolution to receive, based on | |||
the available display area, is another aspect requiring | the available display area, is another aspect requiring | |||
consideration. The need for this type of decision logic arises in | consideration. The need for this type of decision logic arises in | |||
several different topologies and implementations. | several different topologies and implementations. | |||
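One possible form of this decision logic, for the single-encoder case only, is sketched below. The policy is an assumption chosen for illustration and is not specified by this document: the bit-rate is capped by the most constrained path, since the same encoded stream goes to every peer, while the resolution is capped at the largest display area any receiver prefers. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReceiverPrefs:
    """Hypothetical per-PeerConnection input to a shared encoder."""
    name: str
    max_bitrate_bps: int   # e.g. from that path's congestion controller
    preferred_width: int   # based on the receiver's display area
    preferred_height: int


@dataclass(frozen=True)
class EncoderConfig:
    bitrate_bps: int
    width: int
    height: int


def choose_shared_encoding(receivers: list[ReceiverPrefs]) -> EncoderConfig:
    if not receivers:
        raise ValueError("need at least one receiver")
    # One encoded stream is sent to every peer, so it must not exceed
    # the rate allowed on the most constrained path.
    bitrate = min(r.max_bitrate_bps for r in receivers)
    # No receiver benefits from more pixels than it prefers, so cap the
    # resolution at the largest preferred display area.
    largest = max(receivers, key=lambda r: r.preferred_width * r.preferred_height)
    return EncoderConfig(bitrate, largest.preferred_width, largest.preferred_height)
```

With B preferring 1280x720 over a 1.5 Mbit/s path and C limited to 400 kbit/s, this policy yields 400 kbit/s at 1280x720, which illustrates the tension described above. An implementation might instead lower the resolution, or fall back to a separate encoder per PeerConnection.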
A.3. Mixer Based | A.3. Mixer Based | |||
A mixer (Figure 11) is a centralised point that selects or mixes | A mixer (Figure 9) is a centralised point that selects or mixes | |||
content in a conference to optimise the RTP session so that each | content in a conference to optimise the RTP session so that each | |||
end-point only needs to connect to one entity, the mixer. The mixer | end-point only needs to connect to one entity, the mixer. The mixer | |||
can also reduce the bit-rate needed from the mixer down to each | can also reduce the bit-rate needed from the mixer down to each | |||
conference participant, as the media sent from the mixer to the | conference participant, as the media sent from the mixer to the | |||
end-point can be optimised in different ways. These optimisations | end-point can be optimised in different ways. These optimisations | |||
include methods like only choosing media from the currently most | include methods like only choosing media from the currently most | |||
active speaker, or mixing together audio so that only one audio | active speaker, or mixing together audio so that only one audio | |||
stream is required instead of 3 in the depicted scenario (Figure 11). | stream is required instead of 3 in the depicted scenario (Figure 9). | |||
+---+ +------------+ +---+ | +---+ +------------+ +---+ | |||
| A |<---->| |<---->| B | | | A |<---->| |<---->| B | | |||
+---+ | | +---+ | +---+ | | +---+ | |||
| Mixer | | | Mixer | | |||
+---+ | | +---+ | +---+ | | +---+ | |||
| C |<---->| |<---->| D | | | C |<---->| |<---->| D | | |||
+---+ +------------+ +---+ | +---+ +------------+ +---+ | |||
Figure 11: RTP Mixer with Only Unicast Paths | Figure 9: RTP Mixer with Only Unicast Paths | |||
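The saving offered by the mixer can be made concrete with simple arithmetic. Assuming each end-point contributes one audio stream and a media-mixing mixer returns a single mix to each end-point, the per-end-point stream counts compare as follows; the function is a hypothetical illustration.

```python
def audio_streams_per_endpoint(participants: int, topology: str) -> tuple[int, int]:
    """Return (sent, received) audio streams for one end-point.

    Assumes one audio source per end-point. In a mesh, each end-point sends
    a copy to, and receives a stream from, every other participant. With a
    media-mixing mixer, it sends one stream and receives one mix.
    """
    if participants < 2:
        raise ValueError("a conference needs at least two participants")
    if topology == "mesh":
        others = participants - 1
        return others, others
    if topology == "mixer":
        return 1, 1
    raise ValueError(f"unknown topology: {topology}")
```

For the four-party scenario in the figure, a mesh gives each end-point three incoming audio streams, whereas the mixer reduces this to one, matching the "one instead of 3" noted above.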
Mixers have two downsides. The first is that the mixer must be a | Mixers have two downsides. The first is that the mixer must be a | |||
trusted node, as it either performs media operations or at least | trusted node, as it either performs media operations or at least | |||
repacketizes the media. When SRTP is used, both types of operation | repacketizes the media. When SRTP is used, both types of operation | |||
require the mixer to verify integrity, decrypt the content, perform | require the mixer to verify integrity, decrypt the content, perform | |||
its operation, form new RTP packets, and then encrypt and integrity | its operation, form new RTP packets, and then encrypt and integrity | |||
protect them. This applies to all types of mixers described below. | protect them. This applies to all types of mixers described below. | |||
The second downside is that all these operations and optimizations of | The second downside is that all these operations and optimizations of | |||
the session require processing. How much depends on the | the session require processing. How much depends on the | |||
implementation, as will become evident below. | implementation, as will become evident below. | |||
The implementation of a mixer can take several different forms, and | The implementation of a mixer can take several different forms, and | |||
we will discuss the main approaches available that do not break RTP. | we will discuss the main approaches available that do not break RTP. | |||
Please note that a mixer could also contain translator | Please note that a mixer could also contain translator | |||
functionality, like a media transcoder to adjust the media bit-rate | functionality, like a media transcoder to adjust the media bit-rate | |||
or codec used on a particular media stream. | or codec used on a particular RTP media stream. | |||
A.3.1. Media Mixing | A.3.1. Media Mixing | |||
This type of mixer is the one that can most clearly be called an RTP | This type of mixer is the one that can most clearly be called an RTP | |||
mixer, and is likely the one most people think of when they hear the | mixer, and is likely the one most people think of when they hear the | |||
term mixer. Its basic pattern of operation is that it receives the | term mixer. Its basic pattern of operation is that it receives the | |||
different participants' media streams, selects which are to be | different participants' RTP media streams, selects which are to be | |||
included in a media-domain mix of the incoming media streams, and then | included in a media-domain mix of the incoming RTP media streams, and | |||
creates a single outgoing stream from this mix. | then creates a single outgoing stream from this mix. | |||
Audio mixing is straightforward and commonly possible for a number of | Audio mixing is straightforward and commonly possible for a number of | |||
participants. Let us assume that you want to mix N streams from | participants. Let us assume that you want to mix N streams from | |||
different participants. The mixer then needs to perform N decodings. | different participants. The mixer then needs to perform N decodings. | |||
It then needs to produce N or N+1 mixes. Different mixes are needed | It then needs to produce N or N+1 mixes. Different mixes are needed | |||
so that each contributing source gets a mix that does not contain | so that each contributing source gets a mix that does not contain | |||
itself, as this would result in an echo. When N is lower than the | itself, as this would result in an echo. When N is lower than the | |||
number of all participants, one may produce a mix of all N streams | number of all participants, one may produce a mix of all N streams | |||
for the group that is currently not included in the mix, thus N+1 | for the group that is currently not included in the mix, thus N+1 | |||
mixes. These audio streams are then | mixes. These audio streams are then | |||
skipping to change at page 44, line 33 ¶ | skipping to change at page 43, line 33 ¶ | |||
video streams can be done. In fact, it can be done in a number of | video streams can be done. In fact, it can be done in a number of | |||
ways: tiling the different streams to create a chessboard is one; | ways: tiling the different streams to create a chessboard is one; | |||
selecting someone as more important and showing them large, with a | selecting someone as more important and showing them large, with a | |||
number of other sources smaller, is another. Here too, one commonly | number of other sources smaller, is another. Here too, one commonly | |||
needs to produce a number of different compositions so that the | needs to produce a number of different compositions so that the | |||
contributing participants do not need to see themselves. The mixer | contributing participants do not need to see themselves. The mixer | |||
then re-encodes the created video stream, RTP packetizes it, and | then re-encodes the created video stream, RTP packetizes it, and | |||
sends it out. | sends it out. | |||
The problem with media mixing is that it both consume large amount of | The problem with media mixing is that it both consume large amount of | |||
media processing and encoding resources. The second is the quality | media processing and encoding resources. The second is the quality | |||
degradation created by decoding and re-encoding the media stream. | degradation created by decoding and re-encoding the RTP media stream. | |||
Its advantage is that it is quite simplistic for the clients to | Its advantage is that it is quite simplistic for the clients to | |||
handle as they don't need to handle local mixing and composition. | handle as they don't need to handle local mixing and composition. | |||
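The audio mixing scheme described above, in which each of the N selected speakers receives a mix that excludes their own audio while everyone else receives the mix of all N streams (N+1 mixes in total), can be illustrated with a minimal Python sketch. It assumes already decoded 16-bit linear PCM frames of equal length and sample rate; the function make_mixes and the participant names are hypothetical, and a real mixer would additionally handle gain control, resampling and re-encoding.

```python
from typing import Dict, List, Tuple

Frame = List[int]  # one frame of decoded 16-bit linear PCM samples (assumed)


def clip16(x: int) -> int:
    """Saturate a summed sample to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def make_mixes(selected: Dict[str, Frame]) -> Tuple[Dict[str, Frame], Frame]:
    """Return (mix excluding self for each selected speaker, full mix).

    With N selected speakers this yields N + 1 distinct mixes.
    """
    if not selected:
        return {}, []
    length = len(next(iter(selected.values())))
    # Sum all selected streams once, unclipped, to keep full precision.
    total = [sum(frame[i] for frame in selected.values()) for i in range(length)]
    # Each selected speaker hears the total minus their own contribution,
    # so nobody is sent an echo of themselves.
    minus_self = {
        name: [clip16(total[i] - frame[i]) for i in range(length)]
        for name, frame in selected.items()
    }
    # Participants outside the selected set hear all N streams.
    full_mix = [clip16(s) for s in total]
    return minus_self, full_mix
```

Summing the N streams once and subtracting each speaker's own frame means the N+1 mixes cost one subtraction per speaker rather than N separate summations. For example, make_mixes({"A": [100, -200], "B": [50, 50], "C": [0, 1000]}) gives A the mix [50, 1050] and every non-selected participant [150, 850].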
+-A-------------+ +-MIXER--------------------------+ | +-A-------------+ +-MIXER--------------------------+ | |||
| +-PeerC1------| |-PeerC1--------+ | | | +-PeerC1------| |-PeerC1--------+ | | |||
| | +-UDP1------| |-UDP1--------+ | | | | | +-UDP1------| |-UDP1--------+ | | | |||
| | | +-RTP1----| |-RTP1------+ | | +-----+ | | | | | +-RTP1----| |-RTP1------+ | | +-----+ | | |||
| | | | +-Audio-| |-Audio---+ | | | +---+ | | | | | | | | +-Audio-| |-Audio---+ | | | +---+ | | | | |||
| | | | | AA1|------------>|---------+-+-+-+-|DEC|->| | | | | | | | | AA1|------------>|---------+-+-+-+-|DEC|->| | | | |||
| | | | | |<------------|MA1 <----+ | | | +---+ | | | | | | | | | |<------------|MA1 <----+ | | | +---+ | | | | |||
skipping to change at page 45, line 46 ¶ | skipping to change at page 44, line 46 ¶ | |||
| | | | +-Audio-| |-Audio---+ | | | +---+ | | | | | | | | +-Audio-| |-Audio---+ | | | +---+ | | | | |||
| | | | | CA1|------------>|---------+-+-+-+-|DEC|->| | | | | | | | | CA1|------------>|---------+-+-+-+-|DEC|->| | | | |||
| | | | | |<------------|MA3 <----+ | | | +---+ | | | | | | | | | |<------------|MA3 <----+ | | | +---+ | | | | |||
| | | | +-------| |(BA1+CA1)|\| | | +---+ | | | | | | | | +-------| |(BA1+CA1)|\| | | +---+ | | | | |||
| | | +---------| |---------+ +-+-+-|ENC|<-| A+B | | | | | | +---------| |---------+ +-+-+-|ENC|<-| A+B | | | |||
| | +-----------| |-----------+ | | +---+ | | | | | | +-----------| |-----------+ | | +---+ | | | | |||
| +-------------| |-------------+ | +-----+ | | | +-------------| |-------------+ | +-----+ | | |||
+---------------+ |---------------+ | | +---------------+ |---------------+ | | |||
+--------------------------------+ | +--------------------------------+ | |||
Figure 12: Session and SSRC details for Media Mixer | Figure 10: Session and SSRC details for Media Mixer | |||
From an RTP perspective, media mixing can be very straightforward, as | From an RTP perspective, media mixing can be very straightforward, as | |||
can be seen in Figure 12. The mixer presents one SSRC towards the | can be seen in Figure 10. The mixer presents one SSRC towards the | |||
peer client, e.g. MA1 to Peer A, which is the media mix of the other | peer client, e.g. MA1 to Peer A, which is the media mix of the other | |||
participants. As each peer receives a different version produced by | participants. As each peer receives a different version produced by | |||
the mixer, there is no actual relation between the different RTP | the mixer, there is no actual relation between the different RTP | |||
sessions in the actual media or the transport level information. | sessions in the actual media or the transport level information. | |||
There is, however, one connection between RTP1-RTP3 in this figure. It | There is, however, one connection between RTP1-RTP3 in this figure. It | |||
has to do with the SSRC space and the identity information. When A | has to do with the SSRC space and the identity information. When A | |||
receives the MA1 stream, which is a combination of the BA1 and CA1 streams | receives the MA1 stream, which is a combination of the BA1 and CA1 streams | |||
in the other PeerConnections, RTP could enable the mixer to include | in the other PeerConnections, RTP could enable the mixer to include | |||
CSRC information in the MA1 stream to identify the contributing | CSRC information in the MA1 stream to identify the contributing | |||
sources BA1 and CA1. | sources BA1 and CA1. | |||
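The CSRC mechanism mentioned above can be illustrated by constructing the 12-byte fixed RTP header of RFC 3550 followed by a CSRC list. In this sketch the mixer's packet carries its own SSRC (standing in for MA1) and lists the contributing sources (standing in for BA1 and CA1) as CSRCs; the helper name build_rtp_header, the numeric SSRC values and the dynamic payload type 111 are illustrative assumptions.

```python
import struct
from typing import List


def build_rtp_header(payload_type: int, seq: int, timestamp: int,
                     ssrc: int, csrcs: List[int], marker: bool = False) -> bytes:
    """Build an RTP fixed header plus CSRC list (no header extension)."""
    if len(csrcs) > 15:
        raise ValueError("the 4-bit CC field allows at most 15 CSRCs")
    byte0 = (2 << 6) | len(csrcs)  # V=2, P=0, X=0, CC=number of CSRCs
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # M bit and PT
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    # Each contributing source is appended as a 32-bit CSRC identifier.
    return header + b"".join(struct.pack("!I", c & 0xFFFFFFFF) for c in csrcs)


# Hypothetical values: mixer SSRC for MA1, CSRCs for BA1 and CA1.
MA1, BA1, CA1 = 0x11111111, 0x22222222, 0x33333333
packet_header = build_rtp_header(111, 1, 960, MA1, [BA1, CA1])
```

With two CSRCs the CC field is 2, so the first header byte is 0x82 and the header grows from 12 to 20 bytes.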
skipping to change at page 46, line 28 ¶ | skipping to change at page 45, line 28 ¶ | |||
the different legs. For the above situation, commonly nothing more | the different legs. For the above situation, commonly nothing more | |||
than the Source Description (SDES) information and RTCP BYE for CSRC | than the Source Description (SDES) information and RTCP BYE for CSRC | |||
needs to be exposed. The main goal would be to enable the correct | needs to be exposed. The main goal would be to enable the correct | |||
binding against the application logic and other information sources. | binding against the application logic and other information sources. | |||
This also enables loop detection in the RTP session. | This also enables loop detection in the RTP session. | |||
A.3.1.1. RTP Session Termination | A.3.1.1. RTP Session Termination | |||
There exists a possible implementation choice to have the RTP | There exists a possible implementation choice to have the RTP | |||
sessions being separated between the different legs in the multi- | sessions being separated between the different legs in the multi- | |||
party communication session and only generate media streams in each | party communication session and only generate RTP media streams in | |||
without carrying, at the RTP/RTCP level, any identity information about the | each without carrying, at the RTP/RTCP level, any identity information | |||
contributing sources. This removes both the functionality that CSRC | about the contributing sources. This removes both the functionality | |||
can provide and the possibility to use any extensions that build on | that CSRC can provide and the possibility to use any extensions that | |||
CSRC and the loop detection. It may appear a simplification if SSRC | build on CSRC and the loop detection. It may appear a simplification | |||
collisions occur between two different end-points, as they need not | if SSRC collisions occur between two different end-points, as | |||
be resolved and can instead be remapped between the independent | they need not be resolved and can instead be remapped between the | |||
sessions if at all exposed. However, SSRC/CSRC remapping | independent sessions if at all exposed. However, SSRC/CSRC remapping | |||
requires that SSRC/CSRC are never exposed to the WebRTC JavaScript | requires that SSRC/CSRC are never exposed to the WebRTC JavaScript | |||
client to use as references, as they only have local importance; | client to use as references, as they only have local importance; | |||
if they were used in a multi-party session scope, the result would be | if they were used in a multi-party session scope, the result would be | |||
misreferencing. Also, SSRC collision handling will still be needed | misreferencing. Also, SSRC collision handling will still be needed | |||
as it may occur between the mixer and the end-point. | as it may occur between the mixer and the end-point. | |||
Session termination may appear to resolve some issues; however, it | Session termination may appear to resolve some issues; however, it | |||
creates other issues that need resolving, like loop detection, | creates other issues that need resolving, like loop detection, | |||
identification of contributing sources and the need to handle mapped | identification of contributing sources and the need to handle mapped | |||
identities and ensure that the right one is used towards the right | identities and ensure that the right one is used towards the right | |||
skipping to change at page 47, line 9 ¶ | skipping to change at page 46, line 9 ¶ | |||
A.3.2. Media Switching | A.3.2. Media Switching | |||
An RTP Mixer based on media switching avoids the media decoding and | An RTP Mixer based on media switching avoids the media decoding and | |||
encoding cycle in the mixer, but not the decryption and re-encryption | encoding cycle in the mixer, but not the decryption and re-encryption | |||
cycle as one rewrites RTP headers. This both reduces the amount of | cycle as one rewrites RTP headers. This both reduces the amount of | |||
computational resources needed in the mixer and increases the media | computational resources needed in the mixer and increases the media | |||
quality per transmitted bit. This is achieved by letting the mixer | quality per transmitted bit. This is achieved by letting the mixer | |||
have a number of SSRCs that represent conceptual or functional | have a number of SSRCs that represent conceptual or functional | |||
streams the mixer produces. These streams are created by selecting | streams the mixer produces. These streams are created by selecting | |||
media from one of the media streams received by the mixer and forward | media from one of the RTP media streams received by the mixer and | |||
the media using the mixer's own SSRCs. The mixer can then switch | forward the media using the mixer's own SSRCs. The mixer can then | |||
between available sources if that is required by the concept for the | switch between available sources if that is required by the concept | |||
source, like currently active speaker. | for the source, like currently active speaker. | |||
To achieve a coherent RTP media stream from the mixer's SSRC, the | To achieve a coherent RTP media stream from the mixer's SSRC, the | |||
mixer is forced to rewrite the incoming RTP packet's header. First, | mixer is forced to rewrite the incoming RTP packet's header. First, | |||
the SSRC field must be set to the value of the Mixer's SSRC. | the SSRC field must be set to the value of the Mixer's SSRC. | |||
Secondly, the sequence number must be the next in the sequence of | Secondly, the sequence number must be the next in the sequence of | |||
outgoing packets it sent. Thirdly, the RTP timestamp value needs to | outgoing packets it sent. Thirdly, the RTP timestamp value needs to | |||
be adjusted using an offset that changes each time one switches media | be adjusted using an offset that changes each time one switches media | |||
source. Finally, depending on the negotiation, the RTP payload type | source. Finally, depending on the negotiation, the RTP payload type | |||
value representing this particular RTP payload configuration may have | value representing this particular RTP payload configuration may have | |||
to be changed if the different PeerConnections have not arrived on | to be changed if the different PeerConnections have not arrived on | |||
skipping to change at page 48, line 48 ¶ | skipping to change at page 47, line 48 ¶ | |||
| | | | +-Video-| |-Video---+ | | | | | | | | | | | +-Video-| |-Video---+ | | | | | | | |||
| | | | | CV1|------------>|---------+-+-+-+------->| | | | | | | | | CV1|------------>|---------+-+-+-+------->| | | | |||
| | | | | |<------------|MV11 <---+-+-+-+-AV1----| | | | | | | | | |<------------|MV11 <---+-+-+-+-AV1----| | | | |||
| | | | | |<------------|MV12 <---+-+-+-+-EV1----| | | | | | | | | |<------------|MV12 <---+-+-+-+-EV1----| | | | |||
| | | | +-------| |---------+ | | | | | | | | | | | +-------| |---------+ | | | | | | | |||
| | | +---------| |-----------+ | | | | | | | | | +---------| |-----------+ | | | | | | |||
| | +-----------| |-------------+ | +-----+ | | | | +-----------| |-------------+ | +-----+ | | |||
| +-------------| |---------------+ | | | +-------------| |---------------+ | | |||
+---------------+ +--------------------------------+ | +---------------+ +--------------------------------+ | |||
Figure 13: Media Switching RTP Mixer | Figure 11: Media Switching RTP Mixer | |||
The Media Switching RTP mixer can, similar to the Media Mixing one, | The Media Switching RTP mixer can, similar to the Media Mixing one, | |||
reduce the bit-rate needed towards the different peers by selecting | reduce the bit-rate needed towards the different peers by selecting | |||
and switching in a sub-set of media streams out of the ones it | and switching in a sub-set of RTP media streams out of the ones it | |||
receives from the conference participants. | receives from the conference participants. | |||
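The four header rewrites listed earlier for a media switching mixer (its own SSRC, continuous outgoing sequence numbers, a timestamp offset recomputed on every switch, and an optional payload type mapping) can be sketched as below. The SwitchedStream class is hypothetical and only the header fields are modelled; the timestamp gap inserted at a switch (an assumed 3000 ticks, one 30 fps frame interval at a 90 kHz clock) is a simplification, and the decryption and re-encryption the text mentions are not shown.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RtpHeader:
    ssrc: int
    seq: int
    timestamp: int
    payload_type: int


class SwitchedStream:
    """Hypothetical model of one conceptual output stream of a switching mixer."""

    def __init__(self, mixer_ssrc: int, first_seq: int = 0,
                 switch_ts_gap: int = 3000,
                 pt_map: Optional[Dict[int, int]] = None):
        self.mixer_ssrc = mixer_ssrc
        self.next_seq = first_seq & 0xFFFF
        self.switch_ts_gap = switch_ts_gap  # assumed gap inserted at a switch
        self.pt_map = pt_map or {}          # incoming PT -> PT used on this leg
        self.current_source: Optional[int] = None
        self.ts_offset = 0                  # stays 0 for the very first source
        self.last_out_ts: Optional[int] = None

    def forward(self, pkt: RtpHeader) -> RtpHeader:
        if pkt.ssrc != self.current_source:
            # Source switch: choose an offset so outgoing timestamps keep
            # advancing from the last packet sent on this stream.
            if self.last_out_ts is not None:
                target = (self.last_out_ts + self.switch_ts_gap) & 0xFFFFFFFF
                self.ts_offset = (target - pkt.timestamp) & 0xFFFFFFFF
            self.current_source = pkt.ssrc
        out_ts = (pkt.timestamp + self.ts_offset) & 0xFFFFFFFF
        out = RtpHeader(
            ssrc=self.mixer_ssrc,   # 1. the mixer's own SSRC
            seq=self.next_seq,      # 2. next number in the outgoing sequence
            timestamp=out_ts,       # 3. per-switch timestamp offset applied
            payload_type=self.pt_map.get(pkt.payload_type, pkt.payload_type),  # 4.
        )
        self.next_seq = (self.next_seq + 1) & 0xFFFF
        self.last_out_ts = out_ts
        return out
```

Keeping the offset fixed until the next switch preserves the source's own timestamp spacing, while the receiver sees one uninterrupted sequence number space under the mixer's SSRC.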
To ensure that a media receiver can correctly decode the media stream | To ensure that a media receiver can correctly decode the RTP media | |||
after a switch, it becomes necessary to ensure, for state saving | stream after a switch, it becomes necessary to ensure, for state | |||
codecs, that they start from a default state at the point of switching. | saving codecs, that they start from a default state at the point of | |||
Thus one common tool for video is to request that the encoder | switching. Thus one common tool for video is to request that the | |||
creates an intra picture, something that isn't dependent on earlier | encoder creates an intra picture, something that isn't dependent on | |||
state. This can be done using Full Intra Request RTCP codec control | earlier state. This can be done using Full Intra Request RTCP codec | |||
message as discussed in Section 5.1.1. | control message as discussed in Section 5.1.1. | |||
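As an illustration of the Full Intra Request mentioned above, the sketch below serializes an RTCP FIR message in the layout defined by RFC 5104: a payload-specific feedback packet (PT 206, FMT 4) whose "SSRC of media source" field is set to 0, followed by one 8-byte FCI entry per targeted media sender holding that sender's SSRC and an 8-bit command sequence number. The function name build_fir and the example SSRC values are hypothetical. A mixer would increase the sequence number for each new request, and reuse it for repetitions, so the media sender can tell a repeated request from a new one.

```python
import struct
from typing import List, Tuple

PT_PSFB = 206  # payload-specific feedback message (RFC 4585)
FMT_FIR = 4    # Full Intra Request (RFC 5104)


def build_fir(sender_ssrc: int, requests: List[Tuple[int, int]]) -> bytes:
    """Build an RTCP FIR packet.

    requests: (SSRC of the media sender, command sequence number) pairs.
    """
    # FCI entry: 32-bit SSRC, 8-bit sequence number, 24 reserved zero bits.
    fci = b"".join(struct.pack("!IB3x", ssrc & 0xFFFFFFFF, seq_nr & 0xFF)
                   for ssrc, seq_nr in requests)
    # RTCP length field: packet size in 32-bit words minus one.
    length_words = (12 + len(fci)) // 4 - 1
    header = struct.pack("!BBHII", (2 << 6) | FMT_FIR, PT_PSFB, length_words,
                         sender_ssrc & 0xFFFFFFFF,
                         0)  # "SSRC of media source" is unused for FIR
    return header + fci
```

A single request produces a 20-byte packet with a length field of 4; each additional media sender adds another 8-byte FCI entry.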
Also in this type of mixer, one could consider terminating the RTP | Also in this type of mixer, one could consider terminating the RTP | |||
sessions fully between the different PeerConnections. The same | sessions fully between the different PeerConnections. The same | |||
arguments and considerations as discussed in Appendix A.3.1.1 apply | arguments and considerations as discussed in Appendix A.3.1.1 apply | |||
here. | here. | |||
A.3.3. Media Projecting | A.3.3. Media Projecting | |||
Another method for handling media in the RTP mixer is to project all | Another method for handling media in the RTP mixer is to project all | |||
potential sources (SSRCs) into a per end-point independent RTP | potential sources (SSRCs) into a per end-point independent RTP | |||
skipping to change at page 51, line 4 ¶ | skipping to change at page 50, line 4 ¶ | |||
| | | | +-Video-| |-Video---+ | | | | | | | | | | | +-Video-| |-Video---+ | | | | | | | |||
| | | | | CV1|------------>|---------+-+-+-+------->| | | | | | | | | CV1|------------>|---------+-+-+-+------->| | | | |||
| | | | | |<------------|AV1 <----+-+-+-+--------| | | | | | | | | |<------------|AV1 <----+-+-+-+--------| | | | |||
| | | | | | : : : |: : : : : : : : : : :| | | | | | | | | | : : : |: : : : : : : : : : :| | | | |||
| | | | | |<------------|EV1 <----+-+-+-+--------| | | | | | | | | |<------------|EV1 <----+-+-+-+--------| | | | |||
| | | | +-------| |---------+ | | | | | | | | | | | +-------| |---------+ | | | | | | | |||
| | | +---------| |-----------+ | | | | | | | | | +---------| |-----------+ | | | | | | |||
| | +-----------| |-------------+ | +-----+ | | | | +-----------| |-------------+ | +-----+ | | |||
| +-------------| |---------------+ | | | +-------------| |---------------+ | | |||
+---------------+ +--------------------------------+ | +---------------+ +--------------------------------+ | |||
Figure 14: Media Projecting Mixer | Figure 12: Media Projecting Mixer | |||
So in the six-participant conference depicted above (Figure 14), | So in the six-participant conference depicted above (Figure 12), | |||
one can see that end-point A will in this case be aware of 5 incoming | one can see that end-point A will in this case be aware of 5 incoming | |||
SSRCs, BV1-FV1. If this mixer intends to have the same behavior as in | SSRCs, BV1-FV1. If this mixer intends to have the same behavior as in | |||
Appendix A.3.2 where the mixer provides the end-points with the two | Appendix A.3.2 where the mixer provides the end-points with the two | |||
latest speaking end-points, then only two out of these five SSRCs | latest speaking end-points, then only two out of these five SSRCs | |||
will concurrently transmit media to A. As the mixer selects which | will concurrently transmit media to A. As the mixer selects which | |||
source in the different RTP sessions that transmit media to the end- | source in the different RTP sessions that transmit media to the end- | |||
points each media stream will require some rewriting when being | points each RTP media stream will require some rewriting when being | |||
projected from one session into another. The main thing is that the | projected from one session into another. The main thing is that the | |||
sequence number will need to be consecutively incremented based on | sequence number will need to be consecutively incremented based on | |||
the packet actually being transmitted in each RTP session. Thus the | the packet actually being transmitted in each RTP session. Thus the | |||
RTP sequence number offset will change each time a source is turned | RTP sequence number offset will change each time a source is turned | |||
on in an RTP session. | on in an RTP session. | |||
As the RTP sessions are independent, the SSRC numbers used can be | As the RTP sessions are independent, the SSRC numbers used can be | |||
handled independently as well, thus working around any SSRC collisions | handled independently as well, thus working around any SSRC collisions | |||
by having remapping tables between the RTP sessions. However, the | by having remapping tables between the RTP sessions. However, the | |||
related MediaStream signalling must be correspondingly changed to | related WebRTC MediaStream signalling must be correspondingly | |||
ensure consistent MediaStream to SSRC mappings between the different | changed to ensure consistent WebRTC MediaStream to SSRC mappings | |||
PeerConnections and the same comment that higher functions must not | between the different PeerConnections and the same comment that | |||
use SSRC as references to media streams applies also here. | higher functions must not use SSRC as references to RTP media streams | |||
applies also here. | ||||
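The per-session remapping tables mentioned above can be sketched as a small bidirectional map kept for each outgoing RTP session. This sketch assumes the mixer keeps a source's original SSRC whenever it does not collide with an SSRC already in use in that session, and draws a fresh random 32-bit value otherwise; sources are keyed by (incoming session, SSRC) so that two sources that picked the same SSRC in different sessions stay distinct. The SsrcRemapper class and the session labels are hypothetical, and the corresponding rewrite of the WebRTC MediaStream signalling is not shown.

```python
import random
from typing import Dict, Optional, Set, Tuple

Origin = Tuple[str, int]  # (identifier of the incoming RTP session, SSRC)


class SsrcRemapper:
    """Hypothetical SSRC remapping table for one outgoing RTP session."""

    def __init__(self, rng: Optional[random.Random] = None):
        self.rng = rng or random.Random()
        self.to_local: Dict[Origin, int] = {}
        self.to_origin: Dict[int, Origin] = {}
        self.reserved: Set[int] = set()  # SSRCs used by the receiving end-point

    def reserve(self, ssrc: int) -> None:
        """Mark an SSRC as unavailable, e.g. one the end-point itself sends."""
        self.reserved.add(ssrc)

    def local_ssrc(self, origin: Origin) -> int:
        """Return the SSRC under which `origin` is projected into this session."""
        if origin in self.to_local:
            return self.to_local[origin]
        candidate = origin[1]  # keep the original SSRC when it is free
        while candidate in self.reserved or candidate in self.to_origin:
            candidate = self.rng.getrandbits(32)  # collision: draw a new value
        self.to_local[origin] = candidate
        self.to_origin[candidate] = origin
        return candidate

    def origin_of(self, local: int) -> Optional[Origin]:
        """Reverse lookup, e.g. to route codec control requests to the source."""
        return self.to_origin.get(local)
```

Keeping the original value whenever possible means remapping, and the signalling changes it entails, is only needed for the sources that actually collide.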
The mixer will also be responsible for acting on any RTCP codec control | The mixer will also be responsible for acting on any RTCP codec control | |||
requests coming from an end-point and for deciding if it can act on them | requests coming from an end-point and for deciding if it can act on them | |||
locally or needs to translate the request into the RTP session that | locally or needs to translate the request into the RTP session that | |||
contains the media source. Both end-points and the mixer will need | contains the media source. Both end-points and the mixer will need | |||
to implement conference-related codec control functionality to | to implement conference-related codec control functionality to | |||
provide a good experience: Full Intra Request to request from the | provide a good experience: Full Intra Request to request from the | |||
media source to provide switching points between the sources, | media source to provide switching points between the sources, | |||
Temporary Maximum Media Bit-rate Request (TMMBR) to enable the mixer | Temporary Maximum Media Bit-rate Request (TMMBR) to enable the mixer | |||
to aggregate congestion control response towards the media source and | to aggregate congestion control response towards the media source and | |||
have it adjust its bit-rate in case the limitation is not in the | have it adjust its bit-rate in case the limitation is not in the | |||
source to mixer link. | source to mixer link. | |||
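The TMMBR message mentioned above expresses the requested maximum bit-rate as a mantissa and a power-of-two exponent. The sketch below encodes and decodes one TMMBR FCI entry following the RFC 5104 layout (32-bit SSRC of the media sender, then a 6-bit exponent, a 17-bit mantissa and a 9-bit measured overhead in bytes). It covers only the FCI entry, not the surrounding RTCP feedback header, and the function names are hypothetical.

```python
import struct
from typing import Tuple


def encode_tmmbr_fci(media_ssrc: int, max_bps: int, overhead_bytes: int) -> bytes:
    """Pack one TMMBR FCI entry (8 bytes); bit-rate = mantissa * 2**exp."""
    if max_bps < 0 or not 0 <= overhead_bytes <= 0x1FF:
        raise ValueError("bit-rate must be >= 0 and overhead fit in 9 bits")
    exp, mantissa = 0, max_bps
    while mantissa > 0x1FFFF:  # mantissa must fit in 17 bits
        mantissa >>= 1         # drop a low bit, rounding the rate down
        exp += 1
    if exp > 0x3F:
        raise ValueError("bit-rate too large for a 6-bit exponent")
    word = (exp << 26) | (mantissa << 9) | overhead_bytes
    return struct.pack("!II", media_ssrc & 0xFFFFFFFF, word)


def decode_tmmbr_fci(fci: bytes) -> Tuple[int, int, int]:
    """Return (media SSRC, maximum bit-rate in bps, measured overhead)."""
    ssrc, word = struct.unpack("!II", fci)
    exp = word >> 26
    mantissa = (word >> 9) & 0x1FFFF
    return ssrc, mantissa << exp, word & 0x1FF
```

Rounding the mantissa down means the signalled limit never exceeds the rate the mixer asked for; 500 kbps, for instance, is encoded exactly as 125000 × 2^2.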
This version of the mixer also puts different requirements on the | This version of the mixer also puts different requirements on the | |||
end-point when it comes to decoder instances and handling of the | end-point when it comes to decoder instances and handling of the RTP | |||
media streams providing media. As each projected SSRC can at any | media streams providing media. As each projected SSRC can at any | |||
time provide media, the end-point either needs to handle having that | time provide media, the end-point either needs to handle having that | |||
many allocated decoder instances or have efficient switching of | many allocated decoder instances or have efficient switching of | |||
decoder contexts in a more limited set of actual decoder instances to | decoder contexts in a more limited set of actual decoder instances to | |||
cope with the switches. The WebRTC application also gets more | cope with the switches. The WebRTC application also gets more | |||
responsibility to update how the provided media is to be presented to | responsibility to update how the provided media is to be presented to | |||
the user. | the user. | |||
A.4. Translator Based | A.4. Translator Based | |||
There is also a variety of translators. The core commonality is that | There is also a variety of translators. The core commonality is that | |||
they do not need to make themselves visible at the RTP level by | they do not need to make themselves visible at the RTP level by | |||
having an SSRC themselves. Instead, they sit between one or more end- | having an SSRC themselves. Instead, they sit between one or more end- | |||
points and perform translation at some level. It can be media | points and perform translation at some level. It can be media | |||
transcoding, protocol translation or covering missing functionality | transcoding, protocol translation or covering missing functionality | |||
for a legacy device or simply relaying packets between transport domains | for a legacy end-point or simply relaying packets between transport | |||
or to realize multi-party. We will go into detail below. | domains or to realize multi-party. We will go into detail below. | |||
A.4.1. Transcoder | A.4.1. Transcoder | |||
A transcoder operates on the media level and is really used for two | A transcoder operates on the media level and is really used for two | |||
purposes: the first is to allow two end-points that do not have a | purposes: the first is to allow two end-points that do not have a | |||
common set of media codecs to communicate by translating from one | common set of media codecs to communicate by translating from one | |||
codec to another. The second is to change the bit-rate to a lower | codec to another. The second is to change the bit-rate to a lower | |||
one. For WebRTC end-points communicating with each other, only the | one. For WebRTC end-points communicating with each other, only the | |||
first one should be relevant at all. In certain legacy deployments, a | first one should be relevant at all. In certain legacy deployments, a | |||
media transcoder will be necessary to ensure both codecs and bit-rate | media transcoder will be necessary to ensure both codecs and bit-rate | |||
fall within the envelope the legacy device supports. | fall within the envelope the legacy end-point supports. | |||
As transcoding requires access to the media, the transcoder must | As transcoding requires access to the media, the transcoder must | |||
be within the security context and access any media encryption and | be within the security context and access any media encryption and | |||
integrity keys. On the RTP plane a media transcoder will in practice | integrity keys. On the RTP plane a media transcoder will in practice | |||
fork the RTP session into two different domains that are highly | fork the RTP session into two different domains that are highly | |||
decoupled when it comes to media parameters and reporting, but not | decoupled when it comes to media parameters and reporting, but not | |||
identities. To maintain signalling bindings to SSRCs, a transcoder is | identities. To maintain signalling bindings to SSRCs, a transcoder is | |||
likely to need to use the SSRC of one end-point to represent the | likely to need to use the SSRC of one end-point to represent the | |||
transcoded media stream to the other end-point(s). The congestion | transcoded RTP media stream to the other end-point(s). The | |||
control loop can be terminated in the transcoder as the media bit- | congestion control loop can be terminated in the transcoder as the | |||
rate being sent by the transcoder can be adjusted independently of | media bit-rate being sent by the transcoder can be adjusted | |||
the incoming bit-rate. However, for optimizing performance and | independently of the incoming bit-rate. However, for optimizing | |||
resource consumption the translator needs to consider what signals or | performance and resource consumption the translator needs to consider | |||
bit-rate reductions it should send towards the source end-point. For | what signals or bit-rate reductions it should send towards the source | |||
example, receiving a 2.5 Mbps video stream and then sending out a 250 kbps | end-point. For example, receiving a 2.5 Mbps video stream and then | |||
video stream after transcoding is a waste of resources. In most | sending out a 250 kbps video stream after transcoding is a waste of | |||
cases, a 500 kbps video stream from the source in the right resolution | resources. In most cases, a 500 kbps video stream from the source in | |||
is likely to provide equal quality after transcoding as the 2.5 Mbps | the right resolution is likely to provide equal quality after | |||
source stream. At the same time, increasing the media bit-rate further | transcoding as the 2.5 Mbps source stream. At the same time, | |||
than what is needed to represent the incoming quality accurately is | increasing the media bit-rate further than what is needed to represent the | |||
also a waste of resources. | incoming quality accurately is also a waste of resources. | |||
+-A-------------+ +-Translator------------------+ | +-A-------------+ +-Translator------------------+ | |||
| +-PeerC1------| |-PeerC1--------+ | | | +-PeerC1------| |-PeerC1--------+ | | |||
| | +-UDP1------| |-UDP1--------+ | | | | | +-UDP1------| |-UDP1--------+ | | | |||
| | | +-RTP1----| |-RTP1------+ | | | | | | | +-RTP1----| |-RTP1------+ | | | | |||
| | | | +-Audio-| |-Audio---+ | | | +---+ | | | | | | +-Audio-| |-Audio---+ | | | +---+ | | |||
| | | | | AA1|------------>|---------+-+-+-+-|DEC|----+ | | | | | | | AA1|------------>|---------+-+-+-+-|DEC|----+ | | |||
| | | | | |<------------|BA1 <----+ | | | +---+ | | | | | | | | |<------------|BA1 <----+ | | | +---+ | | | |||
| | | | | | | |\| | | +---+ | | | | | | | | | | |\| | | +---+ | | | |||
| | | | +-------| |---------+ +-+-+-|ENC|<-+ | | | | | | | +-------| |---------+ +-+-+-|ENC|<-+ | | | |||
skipping to change at page 53, line 33 ¶ | skipping to change at page 52, line 33 ¶ | |||
| | | | +-Audio-| |-Audio---+ | | | +---+ | | | | | | | | +-Audio-| |-Audio---+ | | | +---+ | | | | |||
| | | | | BA1|------------>|---------+-+-+-+-|DEC|--+ | | | | | | | | BA1|------------>|---------+-+-+-+-|DEC|--+ | | | |||
| | | | | |<------------|AA1 <----+ | | | +---+ | | | | | | | | |<------------|AA1 <----+ | | | +---+ | | | |||
| | | | | | | |\| | | +---+ | | | | | | | | | | |\| | | +---+ | | | |||
| | | | +-------| |---------+ +-+-+-|ENC|<---+ | | | | | | +-------| |---------+ +-+-+-|ENC|<---+ | | |||
| | | +---------| |-----------+ | | +---+ | | | | | +---------| |-----------+ | | +---+ | | |||
| | +-----------| |-------------+ | | | | | +-----------| |-------------+ | | | |||
| +-------------| |---------------+ | | | +-------------| |---------------+ | | |||
+---------------+ +-----------------------------+ | +---------------+ +-----------------------------+ | |||
Figure 15: Media Transcoder | Figure 13: Media Transcoder | |||
Figure 15 exposes some important details. First of all, you can see | Figure 13 exposes some important details. First of all, you can see | |||
the SSRC identifiers used by the translator are those of the corresponding | the SSRC identifiers used by the translator are those of the corresponding | |||
end-points. Secondly, there is a relation between the RTP sessions | end-points. Secondly, there is a relation between the RTP sessions | |||
in the two different PeerConnections that is represented by having | in the two different PeerConnections that is represented by having | |||
both parts be identified by the same level and they need to share | both parts be identified by the same level and they need to share | |||
certain contexts. Also, certain types of RTCP messages will need to be | certain contexts. Also, certain types of RTCP messages will need to be | |||
bridged between the two parts. Certain RTCP feedback messages are | bridged between the two parts. Certain RTCP feedback messages are | |||
likely to need to be sourced by the translator in response to actions | likely to need to be sourced by the translator in response to actions | |||
by the translator and its media encoder. | by the translator and its media encoder. | |||
A.4.2. Gateway / Protocol Translator | A.4.2. Gateway / Protocol Translator | |||
Gateways are used when some protocol feature that is required is not | Gateways are used when some protocol feature that is required is not | |||
supported by an end-point that wants to participate in a session. The RTP | supported by an end-point that wants to participate in a session. The RTP | |||
translator in Figure 16 takes on the role of ensuring that from the | translator in Figure 14 takes on the role of ensuring that from the | |||
perspective of participant A, participant B appears as a fully | perspective of participant A, participant B appears as a fully | |||
compliant WebRTC end-point (that is, it is the combination of the | compliant WebRTC end-point (that is, it is the combination of the | |||
Translator and participant B that looks like a WebRTC end point). | Translator and participant B that looks like a WebRTC end point). | |||
+------------+ | +------------+ | |||
| | | | | | |||
+---+ | Translator | +---+ | +---+ | Translator | +---+ | |||
| A |<---->| to legacy |<---->| B | | | A |<---->| to legacy |<---->| B | | |||
+---+ | end-point | +---+ | +---+ | end-point | +---+ | |||
WebRTC | | Legacy | WebRTC | | Legacy | |||
+------------+ | +------------+ | |||
Figure 16: Gateway (RTP translator) towards legacy end-point | Figure 14: Gateway (RTP translator) towards legacy end-point | |||
For WebRTC there are a number of requirements that could force the | For WebRTC there are a number of requirements that could force the | |||
need for a gateway if a WebRTC end-point is to communicate with a | need for a gateway if a WebRTC end-point is to communicate with a | |||
legacy end-point, such as support of ICE and DTLS-SRTP for | legacy end-point, such as support of ICE and DTLS-SRTP for | |||
key management. On the RTP level, the main functions that may be missing | key management. On the RTP level, the main functions that may be missing | |||
in a legacy implementation that otherwise supports RTP are RTCP in | in a legacy implementation that otherwise supports RTP are RTCP in | |||
general, SRTP implementation, congestion control and feedback | general, SRTP implementation, congestion control and feedback | |||
messages required to make it work. | messages required to make it work. | |||
+-A-------------+ +-Translator------------------+ | +-A-------------+ +-Translator------------------+ | |||
skipping to change at page 54, line 51 ¶ | skipping to change at page 53, line 51 ¶ | |||
| | | +-Audio-| |-Audio---+ +---+-+ | | || | | | | +-Audio-| |-Audio---+ +---+-+ | | || | |||
| | | | |<---RTCP---->|<--------+----------+ | | || | | | | | |<---RTCP---->|<--------+----------+ | | || | |||
| | | | BA1|------------>|---------+--------------+ | || | | | | | BA1|------------>|---------+--------------+ | || | |||
| | | | |<------------|AA1 <----+----------------+ || | | | | | |<------------|AA1 <----+----------------+ || | |||
| | | +-------| |---------+ || | | | | +-------| |---------+ || | |||
| | +---------| |----------------------------+| | | | +---------| |----------------------------+| | |||
| +-----------| |-----------+ | | | +-----------| |-----------+ | | |||
| | | | | | | | | | |||
+---------------+ +-----------------------------+ | +---------------+ +-----------------------------+ | |||
Figure 17: RTP/RTCP Protocol Translator | Figure 15: RTP/RTCP Protocol Translator | |||
The legacy gateway may be implemented in several ways, and what it | The legacy gateway may be implemented in several ways, and what it | |||
needs to change is highly dependent on what functions it needs to proxy | needs to change is highly dependent on what functions it needs to proxy | |||
for the legacy end-point. One possibility is depicted in Figure 17 | for the legacy end-point. One possibility is depicted in Figure 15 | |||
where the RTP media streams are compatible and forwarded without | where the RTP media streams are compatible and forwarded without | |||
changes. However, their RTP header values are captured to enable the | changes. However, their RTP header values are captured to enable the | |||
RTCP translator to create RTCP reception information related to the | RTCP translator to create RTCP reception information related to the | |||
leg between the end-point and the translator. This can then be | leg between the end-point and the translator. This can then be | |||
combined with the more basic RTCP reports that the legacy endpoint | combined with the more basic RTCP reports that the legacy endpoint | |||
(B) provides to give compatible and expected RTCP reporting to A. | (B) provides to give compatible and expected RTCP reporting to A. | |||
This enables at least full congestion control on the path between A | This enables at least full congestion control on the path between A | |||
and the translator. If B has limited ability to respond to congestion | and the translator. If B has limited ability to respond to congestion | |||
for the media, then the translator may need the capability | for the media, then the translator may need the capability | |||
to perform media transcoding to address cases where it otherwise | to perform media transcoding to address cases where it otherwise | |||
skipping to change at page 55, line 38 ¶ | skipping to change at page 54, line 38 ¶ | |||
encryption and integrity protection operations to resolve mismatches in | encryption and integrity protection operations to resolve mismatches in | |||
security systems. | security systems. | |||
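The translator behaviour described above captures RTP header values so that it can generate RTCP reception information for the leg towards the legacy end-point. That behaviour can be sketched with the receiver statistics algorithm from RFC 3550 (Appendix A.1 and A.3). This is a minimal, simplified sketch. The class name `ReceptionStats` is hypothetical, and the sketch omits source probation, restart detection and 24-bit clamping of the cumulative loss.

```python
from dataclasses import dataclass

RTP_SEQ_MOD = 1 << 16  # RTP sequence numbers are 16-bit


@dataclass
class ReceptionStats:
    """Hypothetical per-SSRC state a translator keeps from captured RTP headers.

    Simplified from RFC 3550 Appendix A.1: no probation or restart handling.
    """
    base_seq: int
    max_seq: int
    cycles: int = 0          # wrap count, in units of 2^16
    received: int = 1        # includes duplicates, as in RFC 3550
    expected_prior: int = 0
    received_prior: int = 0

    @classmethod
    def first_packet(cls, seq: int) -> "ReceptionStats":
        return cls(base_seq=seq, max_seq=seq)

    def update(self, seq: int) -> None:
        # Forward distance from the highest sequence number seen, modulo 2^16.
        delta = (seq - self.max_seq) % RTP_SEQ_MOD
        if 0 < delta < RTP_SEQ_MOD // 2:
            if seq < self.max_seq:  # sequence number wrapped around
                self.cycles += RTP_SEQ_MOD
            self.max_seq = seq
        # Duplicates and reordered packets are still counted as received.
        self.received += 1

    def extended_max(self) -> int:
        return self.cycles + self.max_seq

    def cumulative_lost(self) -> int:
        expected = self.extended_max() - self.base_seq + 1
        return expected - self.received

    def fraction_lost(self) -> int:
        """8-bit fixed-point fraction lost since the previous report."""
        expected = self.extended_max() - self.base_seq + 1
        expected_interval = expected - self.expected_prior
        received_interval = self.received - self.received_prior
        self.expected_prior = expected
        self.received_prior = self.received
        lost_interval = expected_interval - received_interval
        if expected_interval == 0 or lost_interval <= 0:
            return 0
        return (lost_interval << 8) // expected_interval
```

The translator would then merge these values into the reception report blocks it forwards to A, alongside whatever B reports itself.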
A.4.3. Relay | A.4.3. Relay | |||
There exists a class of translators that operate at the transport level | There exists a class of translators that operate at the transport level | |||
below RTP and thus do not affect RTP/RTCP packets directly. They | below RTP and thus do not affect RTP/RTCP packets directly. They | |||
come in two distinct flavors: the first bridges between two | come in two distinct flavors: the first bridges between two | |||
different transport or address domains, functioning more as a gateway, | different transport or address domains, functioning more as a gateway, | |||
and the second provides a group communication | and the second provides a group communication | |||
feature as depicted below in Figure 18. | feature as depicted below in Figure 16. | |||
+---+ +------------+ +---+ | +---+ +------------+ +---+ | |||
| A |<---->| |<---->| B | | | A |<---->| |<---->| B | | |||
+---+ | | +---+ | +---+ | | +---+ | |||
| Translator | | | Translator | | |||
+---+ | | +---+ | +---+ | | +---+ | |||
| C |<---->| |<---->| D | | | C |<---->| |<---->| D | | |||
+---+ +------------+ +---+ | +---+ +------------+ +---+ | |||
Figure 18: RTP Translator (Relay) with Only Unicast Paths | Figure 16: RTP Translator (Relay) with Only Unicast Paths | |||
The first kind is straightforward and is likely to exist in the WebRTC | The first kind is straightforward and is likely to exist in the WebRTC | |||
context when a legacy end-point is compatible except for | context when a legacy end-point is compatible except for | |||
ICE, and thus needs a gateway that terminates ICE and then | ICE, and thus needs a gateway that terminates ICE and then | |||
forwards all the RTP/RTCP traffic and key management to the end-point, | forwards all the RTP/RTCP traffic and key management to the end-point, | |||
only rewriting the IP/UDP headers to forward the packets to the legacy node. | only rewriting the IP/UDP headers to forward the packets to the legacy node. | |||
The second type is useful if one wants a less complex central node or | The second type is useful if one wants a less complex central node or | |||
a central node that is outside of the security context and thus does | a central node that is outside of the security context and thus does | |||
not have access to the media. This relay takes on the role of | not have access to the media. This relay takes on the role of | |||
forwarding the media (RTP and RTCP) packets to the other end-points | forwarding the media (RTP and RTCP) packets to the other end-points | |||
but doesn't perform any RTP or media processing. Such a device | but doesn't perform any RTP or media processing. Such a device | |||
simply forwards the media from each sender to all of the other | simply forwards the media from each sender to all of the other | |||
participants, and is sometimes called a transport-layer translator. | participants, and is sometimes called a transport-layer translator. | |||
In Figure 18, participant A will only need to send its media once to | In Figure 16, participant A will only need to send its media once to | |||
the relay, which will redistribute it by sending a copy of the stream | the relay, which will redistribute it by sending a copy of the stream | |||
to participants B, C, and D. Participant A will still receive three | to participants B, C, and D. Participant A will still receive three | |||
RTP streams with the media from B, C and D if they transmit | RTP streams with the media from B, C and D if they transmit | |||
simultaneously. From an RTP perspective, this results in an RTP | simultaneously. From an RTP perspective, this results in an RTP | |||
session that behaves equivalently to one transported over IP Any | session that behaves equivalently to one transported over IP Any | |||
Source Multicast (ASM). | Source Multicast (ASM). | |||
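The forwarding rule of such a transport-layer relay can be sketched minimally as follows. The sketch assumes the relay already knows each participant's transport address (for example, from ICE). The class `TransportRelay` and its methods are hypothetical, and the sketch leaves out socket I/O so that only the fan-out logic is shown.

```python
from typing import List, Tuple

Address = Tuple[str, int]  # (IP address, UDP port)


class TransportRelay:
    """Hypothetical relay: copies each RTP/RTCP datagram, unmodified,
    to every participant other than its sender."""

    def __init__(self) -> None:
        self.participants: List[Address] = []

    def join(self, addr: Address) -> None:
        if addr not in self.participants:
            self.participants.append(addr)

    def forward(self, src: Address, datagram: bytes) -> List[Tuple[Address, bytes]]:
        if src not in self.participants:
            return []  # drop traffic from unknown transport addresses
        # No RTP or media processing: every other participant gets an
        # identical copy, and only the IP/UDP destination differs.
        return [(dst, datagram) for dst in self.participants if dst != src]
```

Because every datagram reaches every other participant, the relay never forwards one source selectively. That property matters for the RTCP consistency requirement discussed further below.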
This results in one common RTP session between all participants | This results in one common RTP session between all participants | |||
despite there being independent PeerConnections created to the | despite there being independent PeerConnections created to the | |||
translator, as depicted below in Figure 19. | translator, as depicted below in Figure 17. | |||
+-A-------------+ +-RELAY--------------------------+ | +-A-------------+ +-RELAY--------------------------+ | |||
| +-PeerC1------| |-PeerC1--------+ | | | +-PeerC1------| |-PeerC1--------+ | | |||
| | +-UDP1------| |-UDP1--------+ | | | | | +-UDP1------| |-UDP1--------+ | | | |||
| | | +-RTP1----| |-RTP1-------------------------+ | | | | | +-RTP1----| |-RTP1-------------------------+ | | |||
| | | | +-Video-| |-Video---+ | | | | | | | +-Video-| |-Video---+ | | | |||
| | | | | AV1|------------>|---------------------------+ | | | | | | | | AV1|------------>|---------------------------+ | | | |||
| | | | | |<------------|BV1 <--------------------+ | | | | | | | | | |<------------|BV1 <--------------------+ | | | | |||
| | | | | |<------------|CV1 <------------------+ | | | | | | | | | | |<------------|CV1 <------------------+ | | | | | |||
| | | | +-------| |---------+ | | | | | | | | | | +-------| |---------+ | | | | | | |||
skipping to change at page 57, line 48 ¶ | skipping to change at page 56, line 48 ¶ | |||
| | | | +-Video-| |-Video---+ | | | | | | | | | | +-Video-| |-Video---+ | | | | | | |||
| | | | | CV1|------------>|-------------------------+ | | | | | | | | | CV1|------------>|-------------------------+ | | | | |||
| | | | | |<------------|AV1 <----------------------+ | | | | | | | | |<------------|AV1 <----------------------+ | | | |||
| | | | | |<------------|BV1 <------------------+ | | | | | | | | |<------------|BV1 <------------------+ | | | |||
| | | | +-------| |---------+ | | | | | | | +-------| |---------+ | | | |||
| | | +---------| |------------------------------+ | | | | | +---------| |------------------------------+ | | |||
| | +-----------| |-------------+ | | | | | +-----------| |-------------+ | | | |||
| +-------------| |---------------+ | | | +-------------| |---------------+ | | |||
+---------------+ +--------------------------------+ | +---------------+ +--------------------------------+ | |||
Figure 19: Transport Multi-party Relay | Figure 17: Transport Multi-party Relay | |||
As the relay forwards RTP and RTCP packets between the UDP flows, as indicated | As the relay forwards RTP and RTCP packets between the UDP flows, as indicated | |||
by the arrows for the media flow, a given WebRTC end-point, like A, | by the arrows for the media flow, a given WebRTC end-point, like A, | |||
will see the remote sources BV1 and CV1. There will also be two | will see the remote sources BV1 and CV1. There will also be two | |||
different network paths: between A and B, and between A and C. As a result, | different network paths: between A and B, and between A and C. As a result, | |||
client A must be able to handle the case, when determining | client A must be able to handle the case, when determining | |||
congestion state, that multiple destinations might exist on the | congestion state, that multiple destinations might exist on the | |||
far side of a PeerConnection and that these paths shall be treated | far side of a PeerConnection and that these paths shall be treated | |||
differently. It also results in a requirement to combine the | differently. It also results in a requirement to combine the | |||
different congestion states into a decision to transmit a particular | different congestion states into a decision to transmit a particular | |||
media stream suitable to all participants. | RTP media stream suitable to all participants. | |||
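One simple way of combining the per-path congestion states is to let the most constrained path bound the rate of the single stream that all destinations receive. This policy is assumed here purely for illustration; it is not one that the text prescribes, and alternatives such as simulcast exist. The function name and the per-destination rate input are hypothetical.

```python
from typing import Dict


def common_send_rate(path_estimates_bps: Dict[str, float]) -> float:
    """Hypothetical combiner: choose one sending rate for an RTP media stream
    that every destination behind the PeerConnection can sustain.

    path_estimates_bps maps a remote end-point (e.g. "B", "C") to the rate its
    own congestion state currently allows. Because the relay delivers the same
    packets to all of them, the slowest path bounds the stream.
    """
    if not path_estimates_bps:
        raise ValueError("no congestion state available for any path")
    return min(path_estimates_bps.values())
```

For example, with estimates of 1.5 Mbit/s towards B and 400 kbit/s towards C, A would send at 400 kbit/s.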
It is also important to note that the relay cannot perform selective | It is also important to note that the relay cannot perform selective | |||
relaying of some sources and not others. The reason is that the RTCP | relaying of some sources and not others. The reason is that the RTCP | |||
reporting in that case becomes inconsistent, and without explicit | reporting in that case becomes inconsistent, and without explicit | |||
information about the blocking it must be interpreted as severe | information about the blocking it must be interpreted as severe | |||
congestion. | congestion. | |||
In this usage it is also necessary that the session management has | In this usage it is also necessary that the session management has | |||
configured a common RTP configuration, including RTP payload | configured a common RTP configuration, including RTP payload | |||
formats, since when A sends a packet with pt=97 it will arrive at both B | formats, since when A sends a packet with pt=97 it will arrive at both B | |||
skipping to change at page 58, line 40 ¶ | skipping to change at page 57, line 40 ¶ | |||
RTP session. | RTP session. | |||
The second problem can be solved in two ways. The first is a | The second problem can be solved in two ways. The first is a | |||
common master key from which all participants derive their per-source keys for | common master key from which all participants derive their per-source keys for | |||
SRTP. The second alternative, which might be more practical, is that | SRTP. The second alternative, which might be more practical, is that | |||
each end-point has its own key used to protect all RTP/RTCP packets | each end-point has its own key used to protect all RTP/RTCP packets | |||
it sends. Each participant's key is then distributed to the other | it sends. Each participant's key is then distributed to the other | |||
participants. This second method could be implemented using DTLS- | participants. This second method could be implemented using DTLS- | |||
SRTP to a special key server and then using Encrypted Key Transport | SRTP to a special key server and then using Encrypted Key Transport | |||
[I-D.ietf-avt-srtp-ekt] to distribute the key actually used to the | [I-D.ietf-avt-srtp-ekt] to distribute the key actually used to the | |||
other participants in the RTP session (Figure 20). The first one could | other participants in the RTP session (Figure 18). The first one could | |||
be achieved using MIKEY messages in SDP. | be achieved using MIKEY messages in SDP. | |||
+---+ +---+ | +---+ +---+ | |||
| | +-----------+ | | | | | +-----------+ | | | |||
| A |<------->| DTLS-SRTP |<------->| C | | | A |<------->| DTLS-SRTP |<------->| C | | |||
| |<-- -->| HOST |<-- -->| | | | |<-- -->| HOST |<-- -->| | | |||
+---+ \ / +-----------+ \ / +---+ | +---+ \ / +-----------+ \ / +---+ | |||
X X | X X | |||
+---+ / \ +-----------+ / \ +---+ | +---+ / \ +-----------+ / \ +---+ | |||
| |<-- -->| RTP |<-- -->| | | | |<-- -->| RTP |<-- -->| | | |||
| B |<------->| RELAY |<------->| D | | | B |<------->| RELAY |<------->| D | | |||
| | +-----------+ | | | | | +-----------+ | | | |||
+---+ +---+ | +---+ +---+ | |||
Figure 20: DTLS-SRTP host and RTP Relay Separated | Figure 18: DTLS-SRTP host and RTP Relay Separated | |||
The relay can still verify that a given SSRC isn't used or spoofed by | The relay can still verify that a given SSRC isn't used or spoofed by | |||
another participant within the multi-party session by binding SSRCs | another participant within the multi-party session by binding SSRCs | |||
on their first usage to a given source address and port pair. | on their first usage to a given source address and port pair. | |||
Packets carrying that source SSRC from other addresses can be | Packets carrying that source SSRC from other addresses can be | |||
suppressed to prevent spoofing. This is possible as long as SRTP is | suppressed to prevent spoofing. This is possible as long as SRTP is | |||
used, which leaves the SSRC of the packet originator in RTP and RTCP | used, which leaves the SSRC of the packet originator in RTP and RTCP | |||
packets in the clear. If such a packet-level method for enforcing | packets in the clear. If such a packet-level method for enforcing | |||
source authentication within the group is insufficient, then | source authentication within the group is insufficient, then | |||
cryptographic methods such as TESLA [RFC4383] exist that could be used for | cryptographic methods such as TESLA [RFC4383] exist that could be used for | |||
true source authentication. | true source authentication. | |||
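The SSRC-binding check described above can be sketched minimally as follows. The sketch assumes that RTP and RTCP are multiplexed on one port as in RFC 5761, where a second octet in the range 192..223 marks an RTCP packet. The RTP SSRC sits in octets 8-11 and the RTCP sender SSRC in octets 4-7; both remain unencrypted under SRTP/SRTCP. The names `sender_ssrc` and `SsrcBinder` are hypothetical.

```python
from typing import Dict, Tuple

Address = Tuple[str, int]  # (source IP address, source UDP port)


def sender_ssrc(packet: bytes) -> int:
    """Return the originator SSRC from an (S)RTP or (S)RTCP packet header.

    Assumes RTP/RTCP multiplexing (RFC 5761): second octet 192..223 => RTCP.
    """
    if len(packet) < 8:
        raise ValueError("packet too short")
    if 192 <= packet[1] <= 223:
        return int.from_bytes(packet[4:8], "big")   # RTCP: SSRC of packet sender
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP header")
    return int.from_bytes(packet[8:12], "big")      # RTP: SSRC field


class SsrcBinder:
    """Hypothetical relay filter: bind each SSRC to the transport address it
    was first seen from and suppress that SSRC arriving from any other address."""

    def __init__(self) -> None:
        self.bindings: Dict[int, Address] = {}

    def accept(self, src: Address, packet: bytes) -> bool:
        ssrc = sender_ssrc(packet)
        bound = self.bindings.setdefault(ssrc, src)  # bind on first usage
        return bound == src
```

This check only prevents spoofing between transport addresses. It does not authenticate the sender cryptographically, which is the gap that mechanisms such as TESLA address.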
A.5. End-point Forwarding | A.5. End-point Forwarding | |||
A WebRTC end-point (B in Figure 21) will receive a MediaStream (set | A WebRTC end-point (B in Figure 19) will receive a WebRTC | |||
of SSRCs) over a PeerConnection (from A). For the moment it is not | MediaStream (set of SSRCs) over a PeerConnection (from A). For the | |||
decided whether the end-point is allowed to, in its turn, send that | moment it is not decided whether the end-point is allowed to, in its | |||
MediaStream over another PeerConnection to C. This section discusses | turn, send that WebRTC MediaStream over another PeerConnection to C. | |||
the RTP and end-point implications of allowing such functionality, | This section discusses the RTP and end-point implications of allowing | |||
which on the API level is extremely simple to perform. | such functionality, which on the API level is extremely simple to | |||
perform. | ||||
+---+ +---+ +---+ | +---+ +---+ +---+ | |||
| A |--->| B |--->| C | | | A |--->| B |--->| C | | |||
+---+ +---+ +---+ | +---+ +---+ +---+ | |||
Figure 21: MediaStream Forwarding | Figure 19: MediaStream Forwarding | |||
There exist two main approaches to how B forwards the media from A to | There exist two main approaches to how B forwards the media from A to | |||
C. The first one is to simply relay the media stream. The second one | C. The first one is to simply relay the RTP media stream. The second | |||
is for B to act as a transcoder. Let us consider both approaches. | one is for B to act as a transcoder. Let us consider both approaches. | |||
A relay approach will result in the WebRTC end-points needing | A relay approach will result in the WebRTC end-points needing | |||
the same capabilities as discussed in Relay | the same capabilities as discussed in Relay | |||
(Appendix A.4.3). Thus A will see an RTP session that is extended | (Appendix A.4.3). Thus A will see an RTP session that is extended | |||
beyond the PeerConnection and see two different receiving end-points | beyond the PeerConnection and see two different receiving end-points | |||
with different path characteristics (B and C). Thus A's congestion | with different path characteristics (B and C). Thus A's congestion | |||
control needs to be capable of handling this. The security solution | control needs to be capable of handling this. The security solution | |||
can either support a mechanism that allows A to inform C about the key | can either support a mechanism that allows A to inform C about the key | |||
A is using despite B and C having agreed on another set of keys, or | A is using despite B and C having agreed on another set of keys, or | |||
have B decrypt and then re-encrypt using a new key. | have B decrypt and then re-encrypt using a new key. | |||
The relay-based approach has the advantage that B does not need to | The relay-based approach has the advantage that B does not need to | |||
transcode the media, thus both maintaining the quality of the encoding | transcode the media, thus both maintaining the quality of the encoding | |||
and reducing B's complexity requirements. If the right security | and reducing B's complexity requirements. If the right security | |||
solutions are supported, then C will also be able to verify the | solutions are supported, then C will also be able to verify the | |||
authenticity of the media coming from A. As a downside, A is forced to | authenticity of the media coming from A. As a downside, A is forced to | |||
take both B and C into consideration when delivering content. | take both B and C into consideration when delivering content. | |||
The media transcoder approach is similar to having B act as a Mixer | The media transcoder approach is similar to having B act as a Mixer | |||
terminating the RTP session, combined with the transcoder as discussed | terminating the RTP session, combined with the transcoder as discussed | |||
in Appendix A.4.1. A will only see B as the receiver of its media. B | in Appendix A.4.1. A will only see B as the receiver of its media. B | |||
will be responsible for producing a media stream suitable for the B to C | will be responsible for producing an RTP media stream suitable for the B to | |||
PeerConnection. This may require media transcoding for congestion | C PeerConnection. This may require media transcoding for congestion | |||
control purposes to produce a suitable bit-rate. This loses media | control purposes to produce a suitable bit-rate. This loses media | |||
quality in the transcoding and forces B to spend resources on the | quality in the transcoding and forces B to spend resources on the | |||
transcoding. The media transcoding does result in a separation of | transcoding. The media transcoding does result in a separation of | |||
the two different legs, removing almost all dependencies. B could | the two different legs, removing almost all dependencies. B could | |||
choose to implement logic to optimize its media transcoding | choose to implement logic to optimize its media transcoding | |||
operation, for example by requesting media properties that are | operation, for example by requesting media properties that are | |||
also suitable for C, thus trying to avoid having to transcode the | also suitable for C, thus trying to avoid having to transcode the | |||
content and only forwarding the media payloads between the two sides. | content and only forwarding the media payloads between the two sides. | |||
For that optimization to be practical, WebRTC end-points must support | For that optimization to be practical, WebRTC end-points must support | |||
sufficiently good tools for codec control. | sufficiently good tools for codec control. | |||
A.6. Simulcast | A.6. Simulcast | |||
This section discusses simulcast in the sense of providing a node, | This section discusses simulcast in the sense of providing a node, | |||
for example a stream-switching Mixer, with multiple different encoded | for example a stream-switching Mixer, with multiple different encoded | |||
versions of the same media source. In the WebRTC context that appears | versions of the same media source. In the WebRTC context that appears | |||
to be most easily accomplished by establishing multiple | to be most easily accomplished by establishing multiple | |||
PeerConnections, all being fed the same set of MediaStreams. Each | PeerConnections, all being fed the same set of WebRTC MediaStreams. | |||
PeerConnection is then configured to deliver a particular media | Each PeerConnection is then configured to deliver a particular media | |||
quality and thus media bit-rate. This will work well as long as the | quality and thus media bit-rate. This will work well as long as the | |||
end-point implements media encoding according to Figure 9. Then each | end-point implements media encoding according to Figure 7. Then each | |||
PeerConnection will receive an independently encoded version and the | PeerConnection will receive an independently encoded version and the | |||
codec parameters can be agreed specifically in the context of this | codec parameters can be agreed specifically in the context of this | |||
PeerConnection. | PeerConnection. | |||
For simulcast to work, one needs to prevent the end-point from delivering | For simulcast to work, one needs to prevent the end-point from delivering | |||
content encoded as depicted in Figure 10. If a single encoder | content encoded as depicted in Figure 8. If a single encoder | |||
instance feeds multiple PeerConnections, the intention of | instance feeds multiple PeerConnections, the intention of | |||
performing simulcast will fail. | performing simulcast will fail. | |||
Thus, explicit signalling of which of the two | Thus, explicit signalling of which of the two | |||
implementation strategies is desired, and which will be used, should be considered. | implementation strategies is desired, and which will be used, should be considered. | |||
At a minimum, this would make the application, and possibly the central node | At a minimum, this would make the application, and possibly the central node | |||
interested in receiving simulcast of an end-point's media streams, | interested in receiving simulcast of an end-point's RTP media streams, | |||
aware of whether it will function or not. | aware of whether it will function or not. | |||
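The condition above can be checked when an implementation reports which of the two strategies it will use. Simulcast only works if no encoder instance is shared between PeerConnections. The following is a minimal sketch under that assumption. The identifiers and the mapping are hypothetical, and the WebRTC API is not assumed to expose encoder instances in this form.

```python
from typing import Dict


def simulcast_possible(encoder_of: Dict[str, str]) -> bool:
    """Hypothetical check: encoder_of maps a PeerConnection id to the id of
    the encoder instance feeding it. Simulcast works only if every
    PeerConnection has its own encoder, i.e. no encoder id repeats."""
    encoders = list(encoder_of.values())
    return len(encoders) == len(set(encoders))
```

An end-point could signal the result of such a check to the application or the central node, so that they know in advance whether the requested simulcast will actually deliver independently encoded versions.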
Authors' Addresses | Authors' Addresses | |||
Colin Perkins | Colin Perkins | |||
University of Glasgow | University of Glasgow | |||
School of Computing Science | School of Computing Science | |||
Glasgow G12 8QQ | Glasgow G12 8QQ | |||
United Kingdom | United Kingdom | |||
Email: csp@csperkins.org | Email: csp@csperkins.org | |||
End of changes. 228 change blocks. | ||||
934 lines changed or deleted | 902 lines changed or added | |||
This html diff was produced by rfcdiff 1.46. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |