draft-ietf-rtcweb-rtp-usage-04.txt | draft-ietf-rtcweb-rtp-usage-05.txt | |||
---|---|---|---|---|
Network Working Group C. Perkins | Network Working Group C. Perkins | |||
Internet-Draft University of Glasgow | Internet-Draft University of Glasgow | |||
Intended status: Standards Track M. Westerlund | Intended status: Standards Track M. Westerlund | |||
Expires: January 17, 2013 Ericsson | Expires: April 25, 2013 Ericsson | |||
J. Ott | J. Ott | |||
Aalto University | Aalto University | |||
July 16, 2012 | October 22, 2012 | |||
Web Real-Time Communication (WebRTC): Media Transport and Use of RTP | Web Real-Time Communication (WebRTC): Media Transport and Use of RTP | |||
draft-ietf-rtcweb-rtp-usage-04 | draft-ietf-rtcweb-rtp-usage-05 | |||
Abstract | Abstract | |||
The Web Real-Time Communication (WebRTC) framework provides support | The Web Real-Time Communication (WebRTC) framework provides support | |||
for direct interactive rich communication using audio, video, text, | for direct interactive rich communication using audio, video, text, | |||
collaboration, games, etc. between two peers' web-browsers. This | collaboration, games, etc. between two peers' web-browsers. This | |||
memo describes the media transport aspects of the WebRTC framework. | memo describes the media transport aspects of the WebRTC framework. | |||
It specifies how the Real-time Transport Protocol (RTP) is used in | It specifies how the Real-time Transport Protocol (RTP) is used in | |||
the WebRTC context, and gives requirements for which RTP features, | the WebRTC context, and gives requirements for which RTP features, | |||
profiles, and extensions need to be supported. | profiles, and extensions need to be supported. | |||
skipping to change at page 1, line 39 | skipping to change at page 1, line 39 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on January 17, 2013. | This Internet-Draft will expire on April 25, 2013. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2012 IETF Trust and the persons identified as the | Copyright (c) 2012 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 19 | skipping to change at page 2, line 19 | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2. Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
4. WebRTC Use of RTP: Core Protocols . . . . . . . . . . . . . . 6 | 4. WebRTC Use of RTP: Core Protocols . . . . . . . . . . . . . . 6 | |||
4.1. RTP and RTCP . . . . . . . . . . . . . . . . . . . . . . . 6 | 4.1. RTP and RTCP . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
4.2. Choice of the RTP Profile . . . . . . . . . . . . . . . . 7 | 4.2. Choice of the RTP Profile . . . . . . . . . . . . . . . . 7 | |||
4.3. Choice of RTP Payload Formats . . . . . . . . . . . . . . 8 | 4.3. Choice of RTP Payload Formats . . . . . . . . . . . . . . 8 | |||
4.4. RTP Session Multiplexing . . . . . . . . . . . . . . . . . 9 | 4.4. RTP Session Multiplexing . . . . . . . . . . . . . . . . . 8 | |||
4.5. RTP and RTCP Multiplexing . . . . . . . . . . . . . . . . 10 | 4.5. RTP and RTCP Multiplexing . . . . . . . . . . . . . . . . 9 | |||
4.6. Reduced Size RTCP . . . . . . . . . . . . . . . . . . . . 10 | 4.6. Reduced Size RTCP . . . . . . . . . . . . . . . . . . . . 10 | |||
4.7. Symmetric RTP/RTCP . . . . . . . . . . . . . . . . . . . . 11 | 4.7. Symmetric RTP/RTCP . . . . . . . . . . . . . . . . . . . . 10 | |||
4.8. Choice of RTP Synchronisation Source (SSRC) . . . . . . . 11 | 4.8. Choice of RTP Synchronisation Source (SSRC) . . . . . . . 10 | |||
4.9. Generation of the RTCP Canonical Name (CNAME) . . . . . . 11 | 4.9. Generation of the RTCP Canonical Name (CNAME) . . . . . . 11 | |||
5. WebRTC Use of RTP: Extensions . . . . . . . . . . . . . . . . 12 | 5. WebRTC Use of RTP: Extensions . . . . . . . . . . . . . . . . 11 | |||
5.1. Conferencing Extensions . . . . . . . . . . . . . . . . . 12 | 5.1. Conferencing Extensions . . . . . . . . . . . . . . . . . 11 | |||
5.1.1. Full Intra Request (FIR) . . . . . . . . . . . . . . . 13 | 5.1.1. Full Intra Request (FIR) . . . . . . . . . . . . . . . 12 | |||
5.1.2. Picture Loss Indication (PLI) . . . . . . . . . . . . 13 | 5.1.2. Picture Loss Indication (PLI) . . . . . . . . . . . . 12 | |||
5.1.3. Slice Loss Indication (SLI) . . . . . . . . . . . . . 13 | 5.1.3. Slice Loss Indication (SLI) . . . . . . . . . . . . . 13 | |||
5.1.4. Reference Picture Selection Indication (RPSI) . . . . 14 | 5.1.4. Reference Picture Selection Indication (RPSI) . . . . 13 | |||
5.1.5. Temporal-Spatial Trade-off Request (TSTR) . . . . . . 14 | 5.1.5. Temporal-Spatial Trade-off Request (TSTR) . . . . . . 13 | |||
5.1.6. Temporary Maximum Media Stream Bit Rate Request . . . 14 | 5.1.6. Temporary Maximum Media Stream Bit Rate Request | |||
| | (TMMBR) . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
5.2. Header Extensions . . . . . . . . . . . . . . . . . . . . 14 | 5.2. Header Extensions . . . . . . . . . . . . . . . . . . . . 14 | |||
5.2.1. Rapid Synchronisation . . . . . . . . . . . . . . . . 15 | 5.2.1. Rapid Synchronisation . . . . . . . . . . . . . . . . 14 | |||
5.2.2. Client-to-Mixer Audio Level . . . . . . . . . . . . . 15 | 5.2.2. Client-to-Mixer Audio Level . . . . . . . . . . . . . 14 | |||
5.2.3. Mixer-to-Client Audio Level . . . . . . . . . . . . . 15 | 5.2.3. Mixer-to-Client Audio Level . . . . . . . . . . . . . 15 | |||
6. WebRTC Use of RTP: Improving Transport Robustness . . . . . . 16 | 6. WebRTC Use of RTP: Improving Transport Robustness . . . . . . 15 | |||
6.1. Negative Acknowledgements and RTP Retransmission . . . . . 16 | 6.1. Negative Acknowledgements and RTP Retransmission . . . . . 15 | |||
6.2. Forward Error Correction (FEC) . . . . . . . . . . . . . . 17 | 6.2. Forward Error Correction (FEC) . . . . . . . . . . . . . . 16 | |||
7. WebRTC Use of RTP: Rate Control and Media Adaptation . . . . . 17 | 7. WebRTC Use of RTP: Rate Control and Media Adaptation . . . . . 16 | |||
7.1. Congestion Control Requirements . . . . . . . . . . . . . 18 | 7.1. Boundary Conditions and Circuit Breakers . . . . . . . . . 17 | |||
7.2. Rate Control Boundary Conditions . . . . . . . . . . . . . 19 | 7.2. RTCP Limitations for Congestion Control . . . . . . . . . 18 | |||
7.3. RTCP Limitations for Congestion Control . . . . . . . . . 19 | 7.3. Congestion Control Interoperability With Legacy Systems . 19 | |||
7.4. Congestion Control Interoperability With Legacy Systems . 20 | 8. WebRTC Use of RTP: Performance Monitoring . . . . . . . . . . 19 | |||
8. WebRTC Use of RTP: Performance Monitoring . . . . . . . . . . 20 | 9. WebRTC Use of RTP: Future Extensions . . . . . . . . . . . . . 20 | |||
9. WebRTC Use of RTP: Future Extensions . . . . . . . . . . . . . 21 | 10. Signalling Considerations . . . . . . . . . . . . . . . . . . 20 | |||
10. Signalling Considerations . . . . . . . . . . . . . . . . . . 21 | 11. WebRTC API Considerations . . . . . . . . . . . . . . . . . . 21 | |||
11. WebRTC API Considerations . . . . . . . . . . . . . . . . . . 22 | 11.1. API MediaStream to RTP Mapping . . . . . . . . . . . . . . 21 | |||
11.1. API MediaStream to RTP Mapping . . . . . . . . . . . . . . 22 | 12. RTP Implementation Considerations . . . . . . . . . . . . . . 22 | |||
12. RTP Implementation Considerations . . . . . . . . . . . . . . 23 | 12.1. RTP Sessions and PeerConnection . . . . . . . . . . . . . 22 | |||
12.1. RTP Sessions and PeerConnection . . . . . . . . . . . . . 23 | 12.2. Multiple Sources . . . . . . . . . . . . . . . . . . . . . 24 | |||
12.2. Multiple Sources . . . . . . . . . . . . . . . . . . . . . 25 | 12.3. Multiparty . . . . . . . . . . . . . . . . . . . . . . . . 24 | |||
12.3. Multiparty . . . . . . . . . . . . . . . . . . . . . . . . 25 | 12.4. SSRC Collision Detection . . . . . . . . . . . . . . . . . 25 | |||
12.4. SSRC Collision Detection . . . . . . . . . . . . . . . . . 26 | 12.5. Contributing Sources . . . . . . . . . . . . . . . . . . . 26 | |||
12.5. Contributing Sources . . . . . . . . . . . . . . . . . . . 27 | 12.6. Media Synchronization . . . . . . . . . . . . . . . . . . 27 | |||
12.6. Media Synchronization . . . . . . . . . . . . . . . . . . 28 | 12.7. Multiple RTP End-points . . . . . . . . . . . . . . . . . 27 | |||
12.7. Multiple RTP End-points . . . . . . . . . . . . . . . . . 28 | 12.8. Simulcast . . . . . . . . . . . . . . . . . . . . . . . . 28 | |||
12.8. Simulcast . . . . . . . . . . . . . . . . . . . . . . . . 29 | ||||
12.9. Differentiated Treatment of Flows . . . . . . . . . . . . 29 | 12.9. Differentiated Treatment of Flows . . . . . . . . . . . . 29 | |||
13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 31 | 13. Open Issues . . . . . . . . . . . . . . . . . . . . . . . . . 30 | |||
14. Security Considerations . . . . . . . . . . . . . . . . . . . 31 | 14. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 31 | |||
15. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 31 | 15. Security Considerations . . . . . . . . . . . . . . . . . . . 31 | |||
16. References . . . . . . . . . . . . . . . . . . . . . . . . . . 32 | 16. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 32 | |||
16.1. Normative References . . . . . . . . . . . . . . . . . . . 32 | 17. References . . . . . . . . . . . . . . . . . . . . . . . . . . 32 | |||
16.2. Informative References . . . . . . . . . . . . . . . . . . 34 | 17.1. Normative References . . . . . . . . . . . . . . . . . . . 32 | |||
| | 17.2. Informative References . . . . . . . . . . . . . . . . . . 35 | |||
Appendix A. Supported RTP Topologies . . . . . . . . . . . . . . 36 | Appendix A. Supported RTP Topologies . . . . . . . . . . . . . . 36 | |||
A.1. Point to Point . . . . . . . . . . . . . . . . . . . . . . 36 | A.1. Point to Point . . . . . . . . . . . . . . . . . . . . . . 37 | |||
A.2. Multi-Unicast (Mesh) . . . . . . . . . . . . . . . . . . . 39 | A.2. Multi-Unicast (Mesh) . . . . . . . . . . . . . . . . . . . 40 | |||
A.3. Mixer Based . . . . . . . . . . . . . . . . . . . . . . . 42 | A.3. Mixer Based . . . . . . . . . . . . . . . . . . . . . . . 43 | |||
A.3.1. Media Mixing . . . . . . . . . . . . . . . . . . . . . 42 | A.3.1. Media Mixing . . . . . . . . . . . . . . . . . . . . . 43 | |||
A.3.2. Media Switching . . . . . . . . . . . . . . . . . . . 45 | A.3.2. Media Switching . . . . . . . . . . . . . . . . . . . 46 | |||
A.3.3. Media Projecting . . . . . . . . . . . . . . . . . . . 48 | A.3.3. Media Projecting . . . . . . . . . . . . . . . . . . . 49 | |||
A.4. Translator Based . . . . . . . . . . . . . . . . . . . . . 51 | A.4. Translator Based . . . . . . . . . . . . . . . . . . . . . 52 | |||
A.4.1. Transcoder . . . . . . . . . . . . . . . . . . . . . . 51 | A.4.1. Transcoder . . . . . . . . . . . . . . . . . . . . . . 52 | |||
A.4.2. Gateway / Protocol Translator . . . . . . . . . . . . 52 | A.4.2. Gateway / Protocol Translator . . . . . . . . . . . . 53 | |||
A.4.3. Relay . . . . . . . . . . . . . . . . . . . . . . . . 54 | A.4.3. Relay . . . . . . . . . . . . . . . . . . . . . . . . 55 | |||
A.5. End-point Forwarding . . . . . . . . . . . . . . . . . . . 58 | A.5. End-point Forwarding . . . . . . . . . . . . . . . . . . . 59 | |||
A.6. Simulcast . . . . . . . . . . . . . . . . . . . . . . . . 59 | A.6. Simulcast . . . . . . . . . . . . . . . . . . . . . . . . 60 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 60 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 61 | |||
1. Introduction | 1. Introduction | |||
The Real-time Transport Protocol (RTP) [RFC3550] provides a framework | The Real-time Transport Protocol (RTP) [RFC3550] provides a framework | |||
for delivery of audio and video teleconferencing data and other real- | for delivery of audio and video teleconferencing data and other real- | |||
time media applications. Previous work has defined the RTP protocol, | time media applications. Previous work has defined the RTP protocol, | |||
along with numerous profiles, payload formats, and other extensions. | along with numerous profiles, payload formats, and other extensions. | |||
When combined with appropriate signalling, these form the basis for | When combined with appropriate signalling, these form the basis for | |||
many teleconferencing systems. | many teleconferencing systems. | |||
skipping to change at page 4, line 28 | skipping to change at page 4, line 28 | |||
is to be used in the WebRTC context. It proposes a baseline set of | is to be used in the WebRTC context. It proposes a baseline set of | |||
RTP features that are to be implemented by all WebRTC-aware end- | RTP features that are to be implemented by all WebRTC-aware end- | |||
points, along with suggested extensions for enhanced functionality. | points, along with suggested extensions for enhanced functionality. | |||
The WebRTC overview [I-D.ietf-rtcweb-overview] outlines the complete | The WebRTC overview [I-D.ietf-rtcweb-overview] outlines the complete | |||
WebRTC framework, of which this memo is a part. | WebRTC framework, of which this memo is a part. | |||
The structure of this memo is as follows. Section 2 outlines our | The structure of this memo is as follows. Section 2 outlines our | |||
rationale in preparing this memo and choosing these RTP features. | rationale in preparing this memo and choosing these RTP features. | |||
Section 3 defines requirement terminology. Requirements for core RTP | Section 3 defines requirement terminology. Requirements for core RTP | |||
protocols are described in Section 4 and recommended RTP extensions | protocols are described in Section 4 and suggested RTP extensions are | |||
are described in Section 5. Section 6 outlines mechanisms that can | described in Section 5. Section 6 outlines mechanisms that can | |||
increase robustness to network problems, while Section 7 describes | increase robustness to network problems, while Section 7 describes | |||
the required congestion control and rate adaptation mechanisms. The | congestion control and rate adaptation mechanisms. The discussion of | |||
discussion of mandated RTP mechanisms concludes in Section 8 with a | mandated RTP mechanisms concludes in Section 8 with a review of | |||
review of performance monitoring and network management tools that | performance monitoring and network management tools that can be used | |||
can be used in the WebRTC context. Section 9 gives some guidelines | in the WebRTC context. Section 9 gives some guidelines for future | |||
for future incorporation of other RTP and RTP Control Protocol (RTCP) | incorporation of other RTP and RTP Control Protocol (RTCP) extensions | |||
extensions into this framework. Section 10 describes requirements | into this framework. Section 10 describes requirements placed on the | |||
placed on the signalling channel. Section 11 discusses the | signalling channel. Section 11 discusses the relationship between | |||
relationship between features of the RTP framework and the WebRTC | features of the RTP framework and the WebRTC application programming | |||
application programming interface (API), and Section 12 discusses RTP | interface (API), and Section 12 discusses RTP implementation | |||
implementation considerations. This memo concludes with an appendix | considerations. This memo concludes with an appendix discussing | |||
discussing several different RTP Topologies, and how they affect the | several different RTP Topologies, and how they affect the RTP | |||
RTP session(s) and various implementation details of possible | session(s) and various implementation details of possible realization | |||
realization of central nodes. | of central nodes. | |||
2. Rationale | 2. Rationale | |||
The RTP framework comprises the RTP data transfer protocol, the RTP | The RTP framework comprises the RTP data transfer protocol, the RTP | |||
control protocol, and numerous RTP payload formats, profiles, and | control protocol, and numerous RTP payload formats, profiles, and | |||
extensions. This range of add-ons has allowed RTP to meet various | extensions. This range of add-ons has allowed RTP to meet various | |||
needs that were not envisaged by the original protocol designers, and | needs that were not envisaged by the original protocol designers, and | |||
to support many new media encodings, but raises the question of what | to support many new media encodings, but raises the question of what | |||
extensions are to be supported by new implementations. The | extensions are to be supported by new implementations. The | |||
development of the WebRTC framework provides an opportunity for us to | development of the WebRTC framework provides an opportunity for us to | |||
review the available RTP features and extensions, and to define a | review the available RTP features and extensions, and to define a | |||
common baseline feature set for all WebRTC implementations of RTP. | common baseline feature set for all WebRTC implementations of RTP. | |||
This builds on the past 15 years development of RTP to mandate the | This builds on the past 15 years development of RTP to mandate the | |||
use of extensions that have shown widespread utility, while still | use of extensions that have shown widespread utility, while still | |||
remaining compatible with the wide installed base of RTP | remaining compatible with the wide installed base of RTP | |||
implementations where possible. | implementations where possible. | |||
RTP and RTCP extensions not discussed in this document can still be | Other RTP and RTCP extensions not discussed in this document can be | |||
implemented by a WebRTC end-point, but they are considered optional, | implemented by WebRTC end-points if they are beneficial for new use | |||
are not required for interoperability, and do not provide features | cases. However, they are not necessary to address the WebRTC use | |||
needed to address the WebRTC use cases and requirements | cases and requirements identified to date | |||
[I-D.ietf-rtcweb-use-cases-and-requirements]. | [I-D.ietf-rtcweb-use-cases-and-requirements]. | |||
While the baseline set of RTP features and extensions defined in this | While the baseline set of RTP features and extensions defined in this | |||
memo is targeted at the requirements of the WebRTC framework, it is | memo is targeted at the requirements of the WebRTC framework, it is | |||
expected to be broadly useful for other conferencing-related uses of | expected to be broadly useful for other conferencing-related uses of | |||
RTP. In particular, it is likely that this set of RTP features and | RTP. In particular, it is likely that this set of RTP features and | |||
extensions will be appropriate for other desktop or mobile video | extensions will be appropriate for other desktop or mobile video | |||
conferencing systems, or for room-based high-quality telepresence | conferencing systems, or for room-based high-quality telepresence | |||
applications. | applications. | |||
3. Terminology | 3. Terminology | |||
This memo specifies various requirements levels for implementation or | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
use of RTP features and extensions. When we describe the importance | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
of RTP extensions, or the need for implementation support, we use the | document are to be interpreted as described in [RFC2119]. The RFC | |||
following requirement levels to specify the importance of the feature | 2119 interpretation of these key words applies only when written in | |||
in the WebRTC framework: | ALL CAPS. Lower- or mixed-case uses of these key words are not to be | |||
| | interpreted as carrying special significance in this memo. | |||
MUST: This word, or the terms "REQUIRED" or "SHALL", mean that the | ||||
definition is an absolute requirement of the specification. | ||||
SHOULD: This word, or the adjective "RECOMMENDED", mean that there | ||||
may exist valid reasons in particular circumstances to ignore a | ||||
particular item, but the full implications must be understood and | ||||
carefully weighed before choosing a different course. | ||||
MAY: This word, or the adjective "OPTIONAL", mean that an item is | ||||
truly optional. One vendor may choose to include the item because | ||||
a particular marketplace requires it or because the vendor feels | ||||
that it enhances the product while another vendor may omit the | ||||
same item. An implementation which does not include a particular | ||||
option MUST be prepared to interoperate with another | ||||
implementation which does include the option, though perhaps with | ||||
reduced functionality. In the same vein an implementation which | ||||
does include a particular option MUST be prepared to interoperate | ||||
with another implementation which does not include the option | ||||
(except, of course, for the feature the option provides.) | ||||
These key words are used in a manner consistent with their definition | ||||
in [RFC2119]. The above interpretation of these key words applies | ||||
only when written in ALL CAPS. Lower- or mixed-case uses of these | ||||
key words are not to be interpreted as carrying special significance | ||||
in this memo. | ||||
We define the following terms: | We define the following terms: | |||
RTP Media Stream: A sequence of RTP packets, and associated RTCP | RTP Media Stream: A sequence of RTP packets, and associated RTCP | |||
packets, using a single synchronisation source (SSRC) that | packets, using a single synchronisation source (SSRC) that | |||
together carries part or all of the content of a specific Media | together carries part or all of the content of a specific Media | |||
Type from a specific sender source within a given RTP session. | Type from a specific sender source within a given RTP session. | |||
RTP Session: As defined by [RFC3550], the endpoints belonging to the | RTP Session: As defined by [RFC3550], the endpoints belonging to the | |||
same RTP Session are those that share a single SSRC space. That | same RTP Session are those that share a single SSRC space. That | |||
skipping to change at page 8, line 5 | skipping to change at page 7, line 28 | |||
Implementers are advised to consider the requirements for graceful | Implementers are advised to consider the requirements for graceful | |||
degradation when interoperating with legacy implementations. | degradation when interoperating with legacy implementations. | |||
Other implementation considerations are discussed in Section 12. | Other implementation considerations are discussed in Section 12. | |||
4.2. Choice of the RTP Profile | 4.2. Choice of the RTP Profile | |||
The complete specification of RTP for a particular application domain | The complete specification of RTP for a particular application domain | |||
requires the choice of an RTP Profile. For WebRTC use, the "Extended | requires the choice of an RTP Profile. For WebRTC use, the "Extended | |||
Secure RTP Profile for Real-time Transport Control Protocol (RTCP)- | Secure RTP Profile for Real-time Transport Control Protocol (RTCP)- | |||
Based Feedback (RTP/SAVPF)" [RFC5124] is REQUIRED to be implemented. | Based Feedback (RTP/SAVPF)" [RFC5124] as extended by | |||
This builds on the basic RTP/AVP profile [RFC3551], the RTP profile | [I-D.terriberry-avp-codecs] MUST be implemented. This builds on the | |||
for RTCP-based feedback (RTP/AVPF) [RFC4585], and the secure RTP | basic RTP/AVP profile [RFC3551], the RTP profile for RTCP-based | |||
profile (RTP/SAVP) [RFC3711]. | feedback (RTP/AVPF) [RFC4585], and the secure RTP profile (RTP/SAVP) | |||
| | [RFC3711]. | |||
The RTCP-based feedback extensions are needed for the improved RTCP | The RTCP-based feedback extensions [RFC4585] are needed for the | |||
timer model, that allows more flexible transmission of RTCP packets | improved RTCP timer model, that allows more flexible transmission of | |||
in response to events, rather than strictly according to bandwidth. | RTCP packets in response to events, rather than strictly according to | |||
This is vital for being able to report congestion events. These | bandwidth. This is vital for being able to report congestion events. | |||
extensions also save RTCP bandwidth, and will commonly only use the | These extensions also save RTCP bandwidth, and will commonly only use | |||
full RTCP bandwidth allocation if there are many events that require | the full RTCP bandwidth allocation if there are many events that | |||
feedback. They are also needed to make use of the RTP conferencing | require feedback. They are also needed to make use of the RTP | |||
extensions discussed in Section 5.1. | conferencing extensions discussed in Section 5.1. | |||
Note: The enhanced RTCP timer model defined in the RTP/AVPF | Note: The enhanced RTCP timer model defined in the RTP/AVPF | |||
profile is backwards compatible with legacy systems that implement | profile is backwards compatible with legacy systems that implement | |||
only the base RTP/AVP profile, given some constraints on parameter | only the base RTP/AVP profile, given some constraints on parameter | |||
configuration such as the RTCP bandwidth value and "trr-int" (the | configuration such as the RTCP bandwidth value and "trr-int" (the | |||
most important factor for interworking with RTP/AVP end-points via | most important factor for interworking with RTP/AVP end-points via | |||
a gateway is to set the trr-int parameter to a value representing | a gateway is to set the trr-int parameter to a value representing | |||
4 seconds). | 4 seconds). | |||
The secure RTP profile is needed to provide SRTP media encryption, | The secure RTP profile [RFC3711] is needed to provide media | |||
integrity protection, replay protection and a limited form of source | encryption, integrity protection, replay protection and a limited | |||
authentication. | form of source authentication. WebRTC implementations MUST NOT send | |||
| | packets using the basic RTP/AVP profile or the RTP/AVPF profile; they | |||
WebRTC implementations MUST NOT send packets using the basic RTP/AVP | MUST employ the full RTP/SAVPF profile to protect all RTP and RTCP | |||
profile or the RTP/AVPF profile; they MUST employ the full RTP/SAVPF | packets that are generated. The default and mandatory to implement | |||
profile to protect all RTP and RTCP packets that are generated. The | transforms listed in Section 5 of [RFC3711] SHALL apply. | |||
default and mandatory-to-implement transforms listed in Section 5 of | ||||
[RFC3711] SHALL apply. | ||||
Implementations MUST support DTLS-SRTP [RFC5764] for key-management. | Implementations MUST support DTLS-SRTP [RFC5764] for key-management. | |||
Other key management schemes MAY be supported. | Other key management schemes MAY be supported. | |||
4.3. Choice of RTP Payload Formats | 4.3. Choice of RTP Payload Formats | |||
The requirement from Section 6 of [RFC3551] that "Audio applications | Implementations MUST follow the WebRTC Audio Codec and Processing | |||
operating under this profile SHOULD, at a minimum, be able to send | Requirements [I-D.ietf-rtcweb-audio] and SHOULD follow the updated | |||
and/or receive payload types 0 (PCMU) and 5 (DVI4)" applies, since | recommendations for audio codecs in the RTP/AVP Profile | |||
Section 4.2 of this memo mandates the use of the RTP/SAVPF profile, | [I-D.terriberry-avp-codecs]. Support for other audio codecs is | |||
which inherits this restriction from the RTP/AVP profile. | OPTIONAL. | |||
(tbd: there is ongoing discussion on whether support for other audio | ||||
and video codecs is to be mandated) | ||||
Endpoints MAY signal support for multiple media formats, or multiple | (tbd: the mandatory to implement video codec is not yet decided) | |||
configurations of a single format, provided each uses a different RTP | ||||
payload type number. An endpoint that has signalled its support for | ||||
multiple formats is REQUIRED to accept data in any of those formats | ||||
at any time, unless it has previously signalled limitations on its | ||||
decoding capability. | ||||
| | Endpoints MAY signal support for multiple RTP payload formats, or | |||
| | multiple configurations of a single RTP payload format, provided each | |||
| | payload format uses a different RTP payload type number. An endpoint | |||
| | that has signalled support for multiple RTP payload formats SHOULD | |||
| | accept data in any of those payload formats at any time, unless it | |||
| | has previously signalled limitations on its decoding capability. | |||
This requirement is constrained if several media types are sent in | This requirement is constrained if several media types are sent in | |||
the same RTP session. In such a case, a source (SSRC) is restricted | the same RTP session. In such a case, a source (SSRC) is restricted | |||
to switching only between the RTP payload formats signalled for the | to switching only between the RTP payload formats signalled for the | |||
media type that is being sent by that source; see Section 4.4. To | media type that is being sent by that source; see Section 4.4. To | |||
support rapid rate adaptation, RTP does not require signalling in | support rapid rate adaptation by changing codec, RTP does not require | |||
advance for changes between payload formats that were signalled | advance signalling for changes between RTP payload formats that were | |||
during session setup. | signalled during session set-up. | |||
An RTP sender that changes between two RTP payload types that use | An RTP sender that changes between two RTP payload types that use | |||
different RTP clock rates MUST follow the recommendations in Section | different RTP clock rates MUST follow the recommendations in Section | |||
4.1 of [I-D.ietf-avtext-multiple-clock-rates]. RTP receivers MUST | 4.1 of [I-D.ietf-avtext-multiple-clock-rates]. RTP receivers MUST | |||
follow the recommendations in Section 4.3 of | follow the recommendations in Section 4.3 of | |||
[I-D.ietf-avtext-multiple-clock-rates], in order to support sources | [I-D.ietf-avtext-multiple-clock-rates], in order to support sources | |||
that switch between clock rates in an RTP session (these | that switch between clock rates in an RTP session (these | |||
recommendations for receivers are backwards compatible with the case | recommendations for receivers are backwards compatible with the case | |||
where senders use only a single clock rate). | where senders use only a single clock rate). | |||
skipping to change at page 10, line 22 ¶ | skipping to change at page 9, line 41 ¶ | |||
sessions over a single UDP flow.) | sessions over a single UDP flow.) | |||
4.5. RTP and RTCP Multiplexing | 4.5. RTP and RTCP Multiplexing | |||
Historically, RTP and RTCP have been run on separate transport layer | Historically, RTP and RTCP have been run on separate transport layer | |||
addresses (e.g., two UDP ports for each RTP session, one port for RTP | addresses (e.g., two UDP ports for each RTP session, one port for RTP | |||
and one port for RTCP). With the increased use of Network Address/ | and one port for RTCP). With the increased use of Network Address/ | |||
Port Translation (NAPT) this has become problematic, since | Port Translation (NAPT) this has become problematic, since | |||
maintaining multiple NAT bindings can be costly. It also complicates | maintaining multiple NAT bindings can be costly. It also complicates | |||
firewall administration, since multiple ports need to be opened to | firewall administration, since multiple ports need to be opened to | |||
allow RTP traffic. To reduce these costs and session setup times, | allow RTP traffic. To reduce these costs and session set-up times, | |||
support for multiplexing RTP data packets and RTCP control packets on | support for multiplexing RTP data packets and RTCP control packets on | |||
a single port for each RTP session is REQUIRED, as specified in | a single port for each RTP session is REQUIRED, as specified in | |||
[RFC5761]. For backwards compatibility, implementations are also | [RFC5761]. For backwards compatibility, implementations are also | |||
REQUIRED to support sending of RTP and RTCP to separate destination | REQUIRED to support sending of RTP and RTCP to separate destination | |||
ports. | ports. | |||
Note that the use of RTP and RTCP multiplexed onto a single transport | Note that the use of RTP and RTCP multiplexed onto a single transport | |||
port ensures that there is occasional traffic sent on that port, even | port ensures that there is occasional traffic sent on that port, even | |||
if there is no active media traffic. This can be useful to keep NAT | if there is no active media traffic. This can be useful to keep NAT | |||
bindings alive, and is the recommended method for application level | bindings alive, and is the recommended method for application level | |||
keep-alives of RTP sessions [RFC6263]. | keep-alives of RTP sessions [RFC6263]. | |||
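A minimal sketch of how a receiver might separate RTP from RTCP on a single multiplexed port, in the spirit of [RFC5761]. The function name classify_packet and the "other" return value are illustrative assumptions and come from neither draft. The check relies on the common convention that an RTCP packet type in the range 192-223 cannot collide with an RTP payload type, provided the dynamic payload types avoid 64-95.

```python
def classify_packet(datagram: bytes) -> str:
    """Classify a datagram received on a port shared by RTP and RTCP."""
    if len(datagram) < 2:
        return "other"
    # Both RTP and RTCP carry the protocol version (2) in the top two bits.
    if datagram[0] >> 6 != 2:
        return "other"
    # Second octet: RTCP packet type, or marker bit plus 7-bit RTP payload type.
    second_octet = datagram[1]
    if 192 <= second_octet <= 223:
        return "rtcp"
    return "rtp"
```

For example, a datagram whose second octet is 200 (Sender Report) is classified as RTCP, while one carrying payload type 96 is classified as RTP whether or not the marker bit is set.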
4.6. Reduced Size RTCP | 4.6. Reduced Size RTCP | |||
RTCP packets are usually sent as compound RTCP packets, and [RFC3550] | RTCP packets are usually sent as compound RTCP packets, and [RFC3550] | |||
requires that those compound packets start with a Sender Report (SR) | requires that those compound packets start with a Sender Report (SR) | |||
or Receiver Report (RR) packet. When using frequent RTCP feedback | or Receiver Report (RR) packet. When using frequent RTCP feedback | |||
messages, these general statistics are not needed in every packet and | messages under the RTP/AVPF Profile [RFC4585] these statistics are | |||
unnecessarily increase the mean RTCP packet size. This can limit the | not needed in every packet, and unnecessarily increase the mean RTCP | |||
frequency at which RTCP packets can be sent within the RTCP bandwidth | packet size. This can limit the frequency at which RTCP packets can | |||
share. | be sent within the RTCP bandwidth share. | |||
To avoid this problem, [RFC5506] specifies how to reduce the mean | To avoid this problem, [RFC5506] specifies how to reduce the mean | |||
RTCP message size and allow for more frequent feedback. Frequent | RTCP message size and allow for more frequent feedback. Frequent | |||
feedback, in turn, is essential to make real-time applications | feedback, in turn, is essential to make real-time applications | |||
quickly aware of changing network conditions, and to allow them to | quickly aware of changing network conditions, and to allow them to | |||
adapt their transmission and encoding behaviour. Support for sending | adapt their transmission and encoding behaviour. Support for sending | |||
RTCP feedback packets as [RFC5506] non-compound packets is REQUIRED | RTCP feedback packets as [RFC5506] non-compound packets is REQUIRED, | |||
when signalled. For backwards compatibility, implementations are | but MUST be negotiated using the signalling channel before use. For | |||
also REQUIRED to support the use of compound RTCP feedback packets. | backwards compatibility, implementations are also REQUIRED to support | |||
the use of compound RTCP feedback packets if the remote endpoint does | ||||
not agree to the use of non-compound RTCP in the signalling exchange. | ||||
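The negotiation rule above can be sketched as a simple send-side check. This is an illustration under stated assumptions, not text from either draft: the helper names is_compound and may_send are hypothetical, and the check only inspects the first packet type in the datagram (200 for SR, 201 for RR).

```python
RTCP_SR = 200  # Sender Report packet type
RTCP_RR = 201  # Receiver Report packet type


def is_compound(rtcp_datagram: bytes) -> bool:
    """True if the datagram starts with an SR or RR, as compound RTCP does."""
    return len(rtcp_datagram) >= 2 and rtcp_datagram[1] in (RTCP_SR, RTCP_RR)


def may_send(rtcp_datagram: bytes, reduced_size_negotiated: bool) -> bool:
    """Compound RTCP is always allowed; non-compound only after negotiation."""
    return is_compound(rtcp_datagram) or reduced_size_negotiated
```

A feedback-only datagram (for instance one starting with packet type 206) is rejected by may_send unless the signalling exchange agreed to non-compound RTCP.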
4.7. Symmetric RTP/RTCP | 4.7. Symmetric RTP/RTCP | |||
To ease traversal of NAT and firewall devices, implementations are | To ease traversal of NAT and firewall devices, implementations are | |||
REQUIRED to implement and use Symmetric RTP [RFC4961]. This requires | REQUIRED to implement and use Symmetric RTP [RFC4961]. This requires | |||
that the IP address and port used for sending and receiving RTP and | that the IP address and port used for sending and receiving RTP and | |||
RTCP packets are identical. The reason for using symmetric RTP is | RTCP packets are identical. The reason for using symmetric RTP is | |||
primarily to avoid issues with NAT and Firewalls by ensuring that the | primarily to avoid issues with NAT and Firewalls by ensuring that the | |||
flow is actually bi-directional and thus kept alive and registered as | flow is actually bi-directional and thus kept alive and registered as | |||
a flow the intended recipient actually wants. In addition, it saves | a flow the intended recipient actually wants. In addition, it saves | |||
skipping to change at page 11, line 46 ¶ | skipping to change at page 11, line 20 ¶ | |||
detected, or when the RTP application is restarted, its RTCP CNAME is | detected, or when the RTP application is restarted, its RTCP CNAME is | |||
meant to stay unchanged, so that RTP endpoints can be uniquely | meant to stay unchanged, so that RTP endpoints can be uniquely | |||
identified and associated with their RTP media streams within a set | identified and associated with their RTP media streams within a set | |||
of related RTP sessions. For proper functionality, each RTP endpoint | of related RTP sessions. For proper functionality, each RTP endpoint | |||
needs to have a unique RTCP CNAME value. | needs to have a unique RTCP CNAME value. | |||
The RTP specification [RFC3550] includes guidelines for choosing a | The RTP specification [RFC3550] includes guidelines for choosing a | |||
unique RTP CNAME, but these are not sufficient in the presence of NAT | unique RTP CNAME, but these are not sufficient in the presence of NAT | |||
devices. In addition, long-term persistent identifiers can be | devices. In addition, long-term persistent identifiers can be | |||
problematic from a privacy viewpoint. Accordingly, support for | problematic from a privacy viewpoint. Accordingly, support for | |||
generating short-term persistent RTCP CNAMEs following method (b) | generating short-term persistent RTCP CNAMEs following | |||
specified in Section 4.2 of "Guidelines for Choosing RTP Control | [I-D.rescorla-avtcore-6222bis] is RECOMMENDED. | |||
Protocol (RTCP) Canonical Names (CNAMEs)" [RFC6222] is RECOMMENDED. | ||||
Note, however, that this does not resolve the privacy concern as | ||||
there is not sufficient randomness to avoid tracking of an end-point. | ||||
A WebRTC end-point MUST support reception of any CNAME that matches | A WebRTC end-point MUST support reception of any CNAME that matches | |||
the syntax limitations specified by the RTP specification [RFC3550] | the syntax limitations specified by the RTP specification [RFC3550] | |||
and cannot assume that any CNAME will be according to the recommended | and cannot assume that any CNAME will be chosen according to the form | |||
form above. | suggested above. | |||
(tbd: there seems to be a growing consensus that the working group | ||||
wants randomly-chosen CNAME values; need to reference a draft that | ||||
describes how this is to be done) | ||||
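As an illustration of a randomly chosen, short-term persistent CNAME, the sketch below draws random bytes from a cryptographic source and encodes them as text. The 96-bit length and the base64 encoding are assumptions made for this example. Neither draft states them, and the function name generate_cname is hypothetical.

```python
import base64
import secrets


def generate_cname(num_bits: int = 96) -> str:
    """Return a random CNAME; call once per session and reuse it."""
    # Assumption: 96 random bits, base64 encoded (16 characters, no padding).
    raw = secrets.token_bytes(num_bits // 8)
    return base64.b64encode(raw).decode("ascii")
```

The value would be generated once when the related RTP sessions are set up and then kept unchanged across SSRC collisions, so that it still identifies the endpoint for the lifetime of the session.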
5. WebRTC Use of RTP: Extensions | 5. WebRTC Use of RTP: Extensions | |||
There are a number of RTP extensions that are either needed to obtain | There are a number of RTP extensions that are either needed to obtain | |||
full functionality, or extremely useful to improve on the baseline | full functionality, or extremely useful to improve on the baseline | |||
performance, in the WebRTC application context. One set of these | performance, in the WebRTC application context. One set of these | |||
extensions is related to conferencing, while others are more generic | extensions is related to conferencing, while others are more generic | |||
in nature. The following subsections describe the various RTP | in nature. The following subsections describe the various RTP | |||
extensions mandated or suggested for use within the WebRTC context. | extensions mandated or suggested for use within the WebRTC context. | |||
5.1. Conferencing Extensions | 5.1. Conferencing Extensions | |||
RTP is inherently a group communication protocol. Groups can be | RTP is inherently a group communication protocol. Groups can be | |||
implemented using a centralised server, multi-unicast, or using IP | implemented using a centralised server, multi-unicast, or using IP | |||
multicast. While IP multicast was popular in early deployments, in | multicast. While IP multicast was popular in early deployments, in | |||
today's practice, overlay-based conferencing dominates, typically | today's practice, overlay-based conferencing dominates, typically | |||
using one or more central servers to connect endpoints in a star or | using one or more central servers to connect endpoints in a star or | |||
flat tree topology. These central servers can be implemented in a | flat tree topology. These central servers can be implemented in a | |||
number of ways as discussed in Appendix A, and in the memo on RTP | number of ways as discussed in Appendix A, and in the memo on RTP | |||
Topologies [RFC5117]. | Topologies [I-D.westerlund-avtcore-rtp-topologies-update]. | |||
As discussed in Section 3.5 of [RFC5117], the use of a video | As discussed in Section 3.7 of | |||
[I-D.westerlund-avtcore-rtp-topologies-update], the use of a video | ||||
switching MCU makes the use of RTCP for congestion control, or any | switching MCU makes the use of RTCP for congestion control, or any | |||
type of quality reports, very problematic. Also, as discussed in | type of quality reports, very problematic. Also, as discussed in | |||
section 3.6 of [RFC5117], the use of a content modifying MCU with | section 3.8 of [I-D.westerlund-avtcore-rtp-topologies-update], the | |||
RTCP termination breaks RTP loop detection and removes the ability | use of a content modifying MCU with RTCP termination breaks RTP loop | |||
for receivers to identify active senders. RTP Transport Translators | detection and removes the ability for receivers to identify active | |||
(Topo-Translator) are not of immediate interest to WebRTC, although | senders. RTP Transport Translators (Topo-Translator) are not of | |||
the main difference compared to point to point is the possibility of | immediate interest to WebRTC, although the main difference compared | |||
seeing multiple different transport paths in any RTCP feedback. | to point to point is the possibility of seeing multiple different | |||
Accordingly, only Point to Point (Topo-Point-to-Point), Multiple | transport paths in any RTCP feedback. Accordingly, only Point to | |||
concurrent Point to Point (Mesh) and RTP Mixers (Topo-Mixer) | Point (Topo-Point-to-Point), Multiple concurrent Point to Point | |||
topologies are needed to achieve the use-cases to be supported in | (Mesh) and RTP Mixers (Topo-Mixer) topologies are needed to achieve | |||
WebRTC initially. These RECOMMENDED topologies are expected to be | the use-cases to be supported in WebRTC initially. These RECOMMENDED | |||
supported by all WebRTC end-points (these topologies require no | topologies are expected to be supported by all WebRTC end-points | |||
special RTP-layer support in the end-point if the RTP features | (these topologies require no special RTP-layer support in the end- | |||
mandated in this memo are implemented). | point if the RTP features mandated in this memo are implemented). | |||
The RTP extensions described below to be used with centralised | The RTP extensions described below to be used with centralised | |||
conferencing -- where one RTP Mixer (e.g., a conference bridge) | conferencing -- where one RTP Mixer (e.g., a conference bridge) | |||
receives a participant's RTP media streams and distributes them to | receives a participant's RTP media streams and distributes them to | |||
the other participants -- are not necessary for interoperability; an | the other participants -- are not necessary for interoperability; an | |||
RTP endpoint that does not implement these extensions will work | RTP endpoint that does not implement these extensions will work | |||
correctly, but may offer poor performance. Support for the listed | correctly, but might offer poor performance. Support for the listed | |||
extensions will greatly improve the quality of experience and, to | extensions will greatly improve the quality of experience and, to | |||
provide a reasonable baseline quality, some of these extensions are | provide a reasonable baseline quality, some of these extensions are | |||
mandatory to be supported by WebRTC end-points. | mandatory to be supported by WebRTC end-points. | |||
The RTCP packets assisting in such operation are defined in the | The RTCP conferencing extensions are defined in Extended RTP Profile | |||
Extended RTP Profile for Real-time Transport Control Protocol (RTCP)- | for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/ | |||
Based Feedback (RTP/AVPF) [RFC4585] and the "Codec Control Messages | AVPF) [RFC4585] and the "Codec Control Messages in the RTP Audio- | |||
in the RTP Audio-Visual Profile with Feedback (AVPF)" (CCM) [RFC5104] | Visual Profile with Feedback (AVPF)" (CCM) [RFC5104] and are fully | |||
and are fully usable by the Secure variant of this profile (RTP/ | usable by the Secure variant of this profile (RTP/SAVPF) [RFC5124]. | |||
SAVPF) [RFC5124]. | ||||
5.1.1. Full Intra Request (FIR) | 5.1.1. Full Intra Request (FIR) | |||
The Full Intra Request is defined in Sections 3.5.1 and 4.3.1 of the | The Full Intra Request is defined in Sections 3.5.1 and 4.3.1 of the | |||
Codec Control Messages [RFC5104]. This message is used to make the | Codec Control Messages [RFC5104]. This message is used to make the | |||
mixer request a new Intra picture from a participant in the session. | mixer request a new Intra picture from a participant in the session. | |||
This is used when switching between sources to ensure that the | This is used when switching between sources to ensure that the | |||
receivers can decode the video or other predictive media encoding | receivers can decode the video or other predictive media encoding | |||
with long prediction chains. It is REQUIRED that this feedback | with long prediction chains. It is REQUIRED that WebRTC senders | |||
message is supported by RTP senders in WebRTC, since it greatly | understand and react to this feedback message since it greatly | |||
improves the user experience when using centralised mixers-based | improves the user experience when using centralised mixer-based | |||
conferencing. | conferencing; support for sending the FIR message is OPTIONAL. | |||
5.1.2. Picture Loss Indication (PLI) | 5.1.2. Picture Loss Indication (PLI) | |||
The Picture Loss Indication is defined in Section 6.3.1 of the RTP/ | The Picture Loss Indication is defined in Section 6.3.1 of the RTP/ | |||
AVPF profile [RFC4585]. It is used by a receiver to tell the sending | AVPF profile [RFC4585]. It is used by a receiver to tell the sending | |||
encoder that it lost the decoder context and would like to have it | encoder that it lost the decoder context and would like to have it | |||
repaired somehow. This is semantically different from the Full Intra | repaired somehow. This is semantically different from the Full Intra | |||
Request above as there may be multiple methods to fulfill the | Request above as there could be multiple ways to fulfil the | |||
request. It is REQUIRED that senders understand and react to this | request. It is REQUIRED that WebRTC senders understand and react to | |||
feedback message as a loss tolerance mechanism; receivers MAY send | this feedback message as a loss tolerance mechanism; receivers MAY | |||
PLI messages. | send PLI messages. | |||
5.1.3. Slice Loss Indication (SLI) | 5.1.3. Slice Loss Indication (SLI) | |||
The Slice Loss Indication is defined in Section 6.3.2 of the RTP/AVPF | The Slice Loss Indication is defined in Section 6.3.2 of the RTP/AVPF | |||
profile [RFC4585]. It is used by a receiver to tell the encoder that | profile [RFC4585]. It is used by a receiver to tell the encoder that | |||
it has detected the loss or corruption of one or more consecutive | it has detected the loss or corruption of one or more consecutive | |||
macroblocks, and would like to have these repaired somehow. The use | macro blocks, and would like to have these repaired somehow. The use | |||
of this feedback message is OPTIONAL as a loss tolerance mechanism. | of this feedback message is OPTIONAL as a loss tolerance mechanism. | |||
5.1.4. Reference Picture Selection Indication (RPSI) | 5.1.4. Reference Picture Selection Indication (RPSI) | |||
Reference Picture Selection Indication (RPSI) is defined in Section | Reference Picture Selection Indication (RPSI) is defined in Section | |||
6.3.3 of the RTP/AVPF profile [RFC4585]. Some video coding standards | 6.3.3 of the RTP/AVPF profile [RFC4585]. Some video coding standards | |||
allow the use of older reference pictures than the most recent one | allow the use of older reference pictures than the most recent one | |||
for predictive coding. If such a codec is in use, and if the | for predictive coding. If such a codec is in use, and if the | |||
encoder has learned about a loss of encoder-decoder synchronisation, | encoder has learned about a loss of encoder-decoder synchronisation, | |||
a known-as-correct reference picture can be used for future coding. | a known-as-correct reference picture can be used for future coding. | |||
The RPSI message allows this to be signalled. | The RPSI message allows this to be signalled. Support for RPSI | |||
messages is OPTIONAL. | ||||
Support for RPSI messages is OPTIONAL. | ||||
5.1.5. Temporal-Spatial Trade-off Request (TSTR) | 5.1.5. Temporal-Spatial Trade-off Request (TSTR) | |||
The temporal-spatial trade-off request and notification are defined | The temporal-spatial trade-off request and notification are defined | |||
in Sections 3.5.2 and 4.3.2 of [RFC5104]. This request can be used | in Sections 3.5.2 and 4.3.2 of [RFC5104]. This request can be used | |||
to ask the video encoder to change the trade-off it makes between | to ask the video encoder to change the trade-off it makes between | |||
temporal and spatial resolution, for example to prefer high spatial | temporal and spatial resolution, for example to prefer high spatial | |||
image quality but low frame rate. | image quality but low frame rate. Support for TSTR requests and | |||
notifications is OPTIONAL. | ||||
Support for TSTR requests and notifications is OPTIONAL. | ||||
5.1.6. Temporary Maximum Media Stream Bit Rate Request | 5.1.6. Temporary Maximum Media Stream Bit Rate Request (TMMBR) | |||
This feedback message is defined in Sections 3.5.4 and 4.2.1 of the | This feedback message is defined in Sections 3.5.4 and 4.2.1 of the | |||
Codec Control Messages [RFC5104]. This message and its notification | Codec Control Messages [RFC5104]. This message and its notification | |||
message are used by a media receiver to inform the sending party that | message are used by a media receiver to inform the sending party that | |||
there is a current limitation on the amount of bandwidth available to | there is a current limitation on the amount of bandwidth available to | |||
this receiver. This may have various reasons; for example, an RTP | this receiver. There can be various reasons for this: for example, an | |||
mixer may use this message to limit the media rate of the sender | RTP mixer can use this message to limit the media rate of the sender | |||
being forwarded by the mixer (without doing media transcoding) to fit | being forwarded by the mixer (without doing media transcoding) to fit | |||
the bottlenecks existing towards the other session participants. It | the bottlenecks existing towards the other session participants. It | |||
is REQUIRED that this feedback message is supported. A RTP media | is REQUIRED that this feedback message is supported. WebRTC senders | |||
stream sender receiving a TMMBR for its SSRC MUST follow the | are REQUIRED to implement support for TMMBR messages, and MUST follow | |||
limitations set by the message; the sending of TMMBR requests is | bandwidth limitations set by a TMMBR message received for their SSRC. | |||
OPTIONAL. | The sending of TMMBR requests is OPTIONAL. | |||
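To make the bandwidth limit concrete, the following sketch packs and unpacks one TMMBR FCI entry using the layout from [RFC5104]: a 32-bit SSRC, a 6-bit exponent, a 17-bit mantissa and a 9-bit measured overhead, where the bit rate is the mantissa multiplied by two to the power of the exponent. The function names are illustrative assumptions, and the encoder rounds the bit rate down when it does not fit the mantissa exactly.

```python
import struct


def encode_tmmbr_fci(ssrc: int, max_bitrate_bps: int, overhead_bytes: int) -> bytes:
    """Pack one TMMBR FCI entry (8 octets, network byte order)."""
    exp = 0
    mantissa = max_bitrate_bps
    # Shift until the value fits in the 17-bit mantissa (rounds down).
    while mantissa >= (1 << 17):
        mantissa >>= 1
        exp += 1
    word = (exp << 26) | (mantissa << 9) | (overhead_bytes & 0x1FF)
    return struct.pack("!II", ssrc, word)


def decode_tmmbr_fci(fci: bytes) -> tuple:
    """Return (ssrc, max_bitrate_bps, overhead_bytes) from an FCI entry."""
    ssrc, word = struct.unpack("!II", fci)
    exp = word >> 26
    mantissa = (word >> 9) & 0x1FFFF
    overhead = word & 0x1FF
    return ssrc, mantissa << exp, overhead


def apply_limit(target_bps: int, tmmbr_limit_bps: int) -> int:
    """A sender follows the limit by never exceeding it."""
    return min(target_bps, tmmbr_limit_bps)
```

A sender that receives a TMMBR for its SSRC would decode the entry and cap its encoder target with apply_limit until the limit is updated or removed.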
5.2. Header Extensions | 5.2. Header Extensions | |||
The RTP specification [RFC3550] provides the capability to include | The RTP specification [RFC3550] provides the capability to include | |||
RTP header extensions containing in-band data, but the format and | RTP header extensions containing in-band data, but the format and | |||
semantics of the extensions are poorly specified. The use of header | semantics of the extensions are poorly specified. The use of header | |||
extensions is OPTIONAL in the WebRTC context, but if they are used, | extensions is OPTIONAL in the WebRTC context, but if they are used, | |||
they MUST be formatted and signalled following the general mechanism | they MUST be formatted and signalled following the general mechanism | |||
for RTP header extensions defined in [RFC5285], since this gives | for RTP header extensions defined in [RFC5285], since this gives | |||
well-defined semantics to RTP header extensions. | well-defined semantics to RTP header extensions. | |||
skipping to change at page 16, line 4 ¶ | skipping to change at page 15, line 15 ¶ | |||
5.2.3. Mixer-to-Client Audio Level | 5.2.3. Mixer-to-Client Audio Level | |||
The Mixer to Client Audio Level header extension [RFC6465] provides | The Mixer to Client Audio Level header extension [RFC6465] provides | |||
the client with the audio level of the different sources mixed into a | the client with the audio level of the different sources mixed into a | |||
common mix by a RTP mixer. This enables a user interface to indicate | common mix by a RTP mixer. This enables a user interface to indicate | |||
the relative activity level of each session participant, rather than | the relative activity level of each session participant, rather than | |||
just being included or not based on the CSRC field. This is a pure | just being included or not based on the CSRC field. This is a pure | |||
optimisation of non-critical functions, and is hence OPTIONAL to | optimisation of non-critical functions, and is hence OPTIONAL to | |||
implement. If it is implemented, it is REQUIRED that the header | implement. If it is implemented, it is REQUIRED that the header | |||
extensions are encrypted according to | extensions are encrypted according to | |||
[I-D.ietf-avtcore-srtp-encrypted-header-ext] since the information | [I-D.ietf-avtcore-srtp-encrypted-header-ext] since the information | |||
contained in these header extensions can be considered sensitive. | contained in these header extensions can be considered sensitive. | |||
6. WebRTC Use of RTP: Improving Transport Robustness | 6. WebRTC Use of RTP: Improving Transport Robustness | |||
There are some tools that can make RTP flows robust against Packet | There are some tools that can make RTP flows robust against Packet | |||
loss and reduce the impact on media quality. However, they all add | loss and reduce the impact on media quality. However, they all add | |||
extra bits compared to a non-robust stream. These extra bits need to | extra bits compared to a non-robust stream. These extra bits need to | |||
be considered, and the aggregate bit-rate must be rate-controlled. | be considered, and the aggregate bit-rate MUST be rate-controlled. | |||
Thus, improving robustness might require a lower base encoding | Thus, improving robustness might require a lower base encoding | |||
quality, but has the potential to deliver that quality with fewer | quality, but has the potential to deliver that quality with fewer | |||
errors. The mechanisms described in the following sub-sections can | errors. The mechanisms described in the following sub-sections can | |||
be used to improve tolerance to packet loss. | be used to improve tolerance to packet loss. | |||
6.1. Negative Acknowledgements and RTP Retransmission | 6.1. Negative Acknowledgements and RTP Retransmission | |||
As a consequence of supporting the RTP/SAVPF profile, implementations | As a consequence of supporting the RTP/SAVPF profile, implementations | |||
will support negative acknowlegdements (NACKs) for RTP data packets | will support negative acknowledgements (NACKs) for RTP data packets | |||
[RFC4585]. This feedback can be used to inform a sender of the loss | [RFC4585]. This feedback can be used to inform a sender of the loss | |||
of particular RTP packets, subject to the capacity limitations of the | of particular RTP packets, subject to the capacity limitations of the | |||
RTCP feedback channel. A sender can use this information to optimise | RTCP feedback channel. A sender can use this information to optimise | |||
the user experience by adapting the media encoding to compensate for | the user experience by adapting the media encoding to compensate for | |||
known lost packets, for example. | known lost packets, for example. | |||
Senders are REQUIRED to understand the Generic NACK message defined | Senders are REQUIRED to understand the Generic NACK message defined | |||
in Section 6.2.1 of [RFC4585], but MAY choose to ignore this feedback | in Section 6.2.1 of [RFC4585], but MAY choose to ignore this feedback | |||
(following Section 4.2 of [RFC4585]). Receivers MAY send NACKs for | (following Section 4.2 of [RFC4585]). Receivers MAY send NACKs for | |||
missing RTP packets; [RFC4585] provides some guidelines on when to | missing RTP packets; [RFC4585] provides some guidelines on when to | |||
send NACKs. It is not expected that a receiver will send a NACK for | send NACKs. It is not expected that a receiver will send a NACK for | |||
every lost RTP packet, rather it should consider the cost of sending | every lost RTP packet, rather it needs to consider the cost of | |||
NACK feedback, and the importance of the lost packet, to make an | sending NACK feedback, and the importance of the lost packet, to make | |||
informed decision on whether it is worth telling the sender about a | an informed decision on whether it is worth telling the sender about | |||
packet loss event. | a packet loss event. | |||
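The Generic NACK referred to above carries, per FCI entry, a 16-bit packet ID (PID) and a 16-bit bitmask of following lost packets (BLP), where the least significant BLP bit stands for PID+1. The sketch below groups lost sequence numbers into such entries. The function name build_nack_fci is an assumption, and the code makes no attempt to decide which losses are worth reporting. That decision is left to the receiver, as the text describes.

```python
def build_nack_fci(lost_seqs):
    """Group lost RTP sequence numbers into (PID, BLP) pairs."""
    entries = []
    remaining = sorted(set(s & 0xFFFF for s in lost_seqs))
    while remaining:
        pid = remaining[0]
        blp = 0
        rest = []
        for seq in remaining[1:]:
            offset = (seq - pid) & 0xFFFF
            if 1 <= offset <= 16:
                # Bit (offset - 1) marks packet PID + offset as lost.
                blp |= 1 << (offset - 1)
            else:
                rest.append(seq)
        entries.append((pid, blp))
        remaining = rest
    return entries
```

For losses of packets 100, 101, 103 and 200 this yields two entries: PID 100 with BLP 0x0005, and PID 200 with an empty BLP.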
The RTP Retransmission Payload Format [RFC4588] offers the ability to | The RTP Retransmission Payload Format [RFC4588] offers the ability to | |||
retransmit lost packets based on NACK feedback. Retransmission needs | retransmit lost packets based on NACK feedback. Retransmission needs | |||
to be used with care in interactive real-time applications to ensure | to be used with care in interactive real-time applications to ensure | |||
that the retransmitted packet arrives in time to be useful, but can | that the retransmitted packet arrives in time to be useful, but can | |||
be effective in environments with relatively low network RTT (an RTP | be effective in environments with relatively low network RTT (an RTP | |||
sender can estimate the RTT to the receivers using the information in | sender can estimate the RTT to the receivers using the information in | |||
RTCP SR and RR packets). The use of retransmissions can also | RTCP SR and RR packets). The use of retransmissions can also | |||
increase the forward RTP bandwidth, and can potentially worsen the | increase the forward RTP bandwidth, and can potentially worsen the | |||
problem if the packet loss was caused by network congestion. We | problem if the packet loss was caused by network congestion. We | |||
note, however, that retransmission of an important lost packet to | note, however, that retransmission of an important lost packet to | |||
repair decoder state may be lower cost than sending a full intra | repair decoder state can have lower cost than sending a full intra | |||
frame. It is not appropriate to blindly retransmit RTP packets in | frame. It is not appropriate to blindly retransmit RTP packets in | |||
response to a NACK. The importance of lost packets and the | response to a NACK. The importance of lost packets and the | |||
likelihood of them arriving in time to be useful needs to be | likelihood of them arriving in time to be useful needs to be | |||
considered before RTP retransmission is used. | considered before RTP retransmission is used. | |||
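The round-trip time estimate mentioned above can be computed from a received report block as the arrival time minus the LSR and DLSR fields, all expressed in units of 1/65536 seconds, following [RFC3550]. The sketch below shows that calculation together with a deliberately simple usefulness check. The helper worth_retransmitting and its playout_budget_s parameter are hypothetical, and a real sender would also weigh the importance of the lost packet.

```python
def rtt_seconds(arrival_ntp_mid32: int, lsr: int, dlsr: int) -> float:
    """RTT from an RR block: arrival - LSR - DLSR, in 1/65536 s units."""
    rtt_units = (arrival_ntp_mid32 - lsr - dlsr) & 0xFFFFFFFF
    return rtt_units / 65536.0


def worth_retransmitting(rtt_s: float, playout_budget_s: float) -> bool:
    """Assumed heuristic: only retransmit if it can arrive before playout."""
    return rtt_s < playout_budget_s
```

With an arrival time of 0xb7108000, an LSR of 0xb7052000 and a DLSR of 0x00054000, the estimate is 6.125 seconds, which would rule out retransmission for any interactive playout budget.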
Receivers are REQUIRED to implement support for RTP retransmission | Receivers are REQUIRED to implement support for RTP retransmission | |||
packets [RFC4588]. Senders MAY send RTP retransmission packets in | packets [RFC4588]. Senders MAY send RTP retransmission packets in | |||
response to NACKs if the RTP retransmission payload format has been | response to NACKs if the RTP retransmission payload format has been | |||
negotiated for the session, and if the sender believes it is useful | negotiated for the session, and if the sender believes it is useful | |||
to send a retransmission of the packet(s) referenced in the NACK. An | to send a retransmission of the packet(s) referenced in the NACK. An | |||
RTP sender is not expected to retransmit every NACKed packet. | RTP sender is not expected to retransmit every NACKed packet. | |||
6.2. Forward Error Correction (FEC) | 6.2. Forward Error Correction (FEC) | |||
The use of Forward Error Correction (FEC) can provide an effective | The use of Forward Error Correction (FEC) can provide an effective | |||
protection against some degree of packet loss, at the cost of steady | protection against some degree of packet loss, at the cost of steady | |||
bandwidth overhead. There are several FEC schemes that are defined | bandwidth overhead. There are several FEC schemes that are defined | |||
for use with RTP. Some of these schemes are specific to a particular | for use with RTP. Some of these schemes are specific to a particular | |||
RTP payload format, others operate across RTP packets and can be used | RTP payload format, others operate across RTP packets and can be used | |||
with any payload format. It should be noted that using redundancy | with any payload format. It needs to be noted that using redundant | |||
encoding or FEC will lead to increased playout delay, which should be | encoding or FEC will lead to increased play out delay, which needs to | |||
considered when choosing the redundancy or FEC formats and their | be considered when choosing the redundancy or FEC formats and their | |||
respective parameters. | respective parameters. | |||
If an RTP payload format negotiated for use in a WebRTC session | If an RTP payload format negotiated for use in a WebRTC session | |||
supports redundant transmission or FEC as a standard feature of that | supports redundant transmission or FEC as a standard feature of that | |||
payload format, then that support MAY be used in the WebRTC session, | payload format, then that support MAY be used in the WebRTC session, | |||
subject to any appropriate signalling. | subject to any appropriate signalling. | |||
There are several block-based FEC schemes that are designed for use | There are several block-based FEC schemes that are designed for use | |||
with RTP independent of the chosen RTP payload format. At the time | with RTP independent of the chosen RTP payload format. At the time | |||
of this writing there is no consensus on which, if any, of these FEC | of this writing there is no consensus on which, if any, of these FEC | |||
schemes is appropriate for use in the WebRTC context. Accordingly, | schemes is appropriate for use in the WebRTC context. Accordingly, | |||
this memo makes no recommendation on the choice of block-based FEC | this memo makes no recommendation on the choice of block-based FEC | |||
for WebRTC use. | for WebRTC use. | |||
7. WebRTC Use of RTP: Rate Control and Media Adaptation | 7. WebRTC Use of RTP: Rate Control and Media Adaptation | |||
WebRTC will be used in very varied network environments with a | WebRTC will be used in heterogeneous network environments using a | |||
heterogeneous set of link technologies, including wired and wireless, | variety of link technologies, including both wired and wireless | |||
interconnecting peers at different topological locations resulting in | links, to interconnect potentially large groups of users around the | |||
network paths with widely varying one way delays, bit-rate capacity, | world. As a result, the network paths between users can have widely | |||
load levels and traffic mixes. In addition, individual end-points | varying one-way delays, available bit-rates, load levels, and traffic | |||
will open one or more WebRTC sessions between one or more peers. | mixtures. Individual end-points can open one or more RTP sessions to | |||
Each of these sessions may contain different mixes of media and data | each participant in a WebRTC conference, and there can be several | |||
flows. Asymmetric usage of media bit-rates and number of RTP media | participants. Each of these RTP sessions can contain different types | |||
streams is also to be expected. A single end-point may receive zero | of media, and the type of media, bit rate, and number of flows can be | |||
to many simultaneous RTP media streams while itself transmitting one | highly asymmetric. Non-RTP traffic can share the network paths with RTP | |||
or more streams. | flows. Since the network environment is not predictable or stable, | |||
| | WebRTC endpoints MUST ensure that the RTP traffic they generate can | |||
The WebRTC application is very dependent from a quality perspective | adapt to match changes in the available network capacity. | |||
on the media adaptation working well so that an end-point doesn't | ||||
transmit significantly more than the path is capable of handling. If | ||||
it did, the result would be high levels of packet loss or delay | ||||
spikes causing media quality degradation. | ||||
WebRTC applications using more than a single RTP media stream of any | ||||
media type or data flows have an additional concern. In this case, | ||||
the different flows should try to avoid affecting each other | ||||
negatively. In addition, in case there is a resource limitation, the | ||||
available resources need to be shared. How to share them is | ||||
something the application should prioritize so that the limitations | ||||
in quality or capabilities are those that have the least impact on | ||||
the application. | ||||
Overall, the diversity of operating environments leads to the need for | ||||
functionality that adapts to the available capacity and that competes | ||||
fairly with other network flows. If it did not compete fairly | ||||
enough, WebRTC could be used as an attack method for starving out | ||||
other traffic on specific links as long as the attacker is able to | ||||
create traffic across the links in question. A possible attack | ||||
scenario is to use a web-service capable of attracting large numbers | ||||
of end-points, combined with BGP routing state to have the server | ||||
pick client pairs to drive traffic to specific paths. | ||||
The above clearly motivates the need for a well working media | ||||
adaptation mechanism. This mechanism also has a number of | ||||
requirements on what services it should provide and what performance | ||||
it needs to provide. | ||||
The biggest issue is that there is no standardised and ready-to-use | ||||
mechanism that can simply be included in WebRTC. Thus, there will be | ||||
a need for the IETF to produce such a specification. Therefore, the | ||||
suggested way forward is to specify requirements on any solution for | ||||
the media adaptation. For now, we propose that these requirements be | ||||
documented in this specification. In addition, a proposed detailed | ||||
solution will be developed, but is expected to take longer to | ||||
finalize than this document. | ||||
7.1. Congestion Control Requirements | ||||
Requirements for congestion control of WebRTC sessions are discussed | ||||
in [I-D.jesup-rtp-congestion-reqs]. | ||||
Implementations are REQUIRED to implement the RTP circuit breakers | The quality of experience for users of WebRTC implementations is very | |||
described in [I-D.perkins-avtcore-rtp-circuit-breakers]. | dependent on effective adaptation of the media to the limitations of | |||
| | the network. End-points have to be designed so they do not transmit | |||
| | significantly more data than the network path can support, except for | |||
| | very short time periods, otherwise high levels of network packet loss | |||
| | or delay spikes will occur, causing media quality degradation. The | |||
| | limiting factor on the capacity of the network path might be the link | |||
| | bandwidth, or it might be competition with other traffic on the link | |||
| | (this can be non-WebRTC traffic, traffic due to other WebRTC flows, | |||
| | or even competition with other WebRTC flows in the same session). | |||
(tbd: Should add the RTP/RTCP Mechanisms that a WebRTC | An effective media congestion control algorithm is therefore an | |||
implementation is required to support. Potential candidates include | essential part of the WebRTC framework. However, at the time of this | |||
Transmission Timestamps (RFC 5450).) | writing, there is no standard congestion control algorithm that can | |||
| | be used for interactive media applications such as WebRTC flows. | |||
| | Some requirements for congestion control algorithms for WebRTC | |||
| | sessions are discussed in [I-D.jesup-rtp-congestion-reqs], and it is | |||
| | expected that a future version of this memo will mandate the use of a | |||
| | congestion control algorithm that satisfies these requirements. | |||
7.2. Rate Control Boundary Conditions | 7.1. Boundary Conditions and Circuit Breakers | |||
The session establishment signalling will establish certain boundaries | In the absence of a concrete congestion control algorithm, all WebRTC | |||
that the media bit-rate adaptation can act within. First of all, the | implementations MUST implement the RTP circuit breaker algorithm that | |||
set of media codecs provides practical limitations on the supported | is described in [I-D.ietf-avtcore-rtp-circuit-breakers]. The circuit | |||
bit-rate span where it can provide useful quality, and the | breaker defines a conservative boundary condition for safe operation, | |||
packetization choices that exist. Next, the signalling can establish | chosen such that applications that trigger the circuit breaker will | |||
maximum media bit-rate boundaries using SDP b=AS or b=CT. | almost certainly be causing severe network congestion. Any future | |||
| | RTP congestion control algorithms are expected to operate within the | |||
| | envelope allowed by the circuit breaker. | |||
(tbd: This section needs expanding on how to use these limits) | The session establishment signalling will also necessarily establish | |||
| | boundaries to which the media bit-rate will conform. The choice of | |||
| | media codecs provides upper- and lower-bounds on the supported bit- | |||
| | rates that the application can utilise to provide useful quality, and | |||
| | the packetization choices that exist. In addition, the signalling | |||
| | channel can establish maximum media bit-rate boundaries using the SDP | |||
| | "b=AS:" or "b=CT:" lines, and the RTP/AVPF Temporary Maximum Media | |||
| | Stream Bit Rate (TMMBR) Requests (see Section 5.1.6 of this memo). | |||
| | The combination of media codec choice and signalled bandwidth limits | |||
| | SHOULD be used to limit traffic based on known bandwidth limitations, | |||
| | for example the capacity of the edge links, to the extent possible. | |||
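As a rough illustration of how the codec limit and the signalled limits mentioned above might be combined, the Python sketch below simply takes the smallest limit that is known. The function and parameter names are hypothetical and do not come from the draft. It assumes the "b=AS:" value is given in kbit/s (the SDP default unit) and the TMMBR value in bit/s, and it ignores the overhead field carried in TMMBR. It is not a congestion control algorithm, only an upper bound that such an algorithm would operate beneath.

```python
from typing import Optional


def bitrate_ceiling_bps(codec_max_bps: int,
                        sdp_b_as_kbps: Optional[int] = None,
                        tmmbr_bps: Optional[int] = None) -> int:
    """Return the lowest known upper bound on the media bit-rate, in bit/s.

    All names here are illustrative assumptions, not an API from the draft.
    """
    limits = [codec_max_bps]
    if sdp_b_as_kbps is not None:
        # SDP "b=AS:" is expressed in kilobits per second.
        limits.append(sdp_b_as_kbps * 1000)
    if tmmbr_bps is not None:
        # A TMMBR request carries a bit-rate in bits per second.
        limits.append(tmmbr_bps)
    return min(limits)
```

For instance, a codec that can usefully run up to 2 Mbit/s, offered with "b=AS:512", would be capped at 512000 bit/s until a lower TMMBR request arrives.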
7.3. RTCP Limitations for Congestion Control | 7.2. RTCP Limitations for Congestion Control | |||
Experience with the congestion control algorithms of TCP [RFC5681], | Experience with the congestion control algorithms of TCP [RFC5681], | |||
TFRC [RFC5348], and DCCP [RFC4341], [RFC4342], [RFC4828], has shown | TFRC [RFC5348], and DCCP [RFC4341], [RFC4342], [RFC4828], has shown | |||
that feedback on packet arrivals needs to be sent roughly once per | that feedback on packet arrivals needs to be sent roughly once per | |||
round trip time. We note that the real-time media traffic may not | round trip time. We note that the real-time media traffic might not | |||
have to adapt to changing path conditions as rapidly as needed for | have to adapt to changing path conditions as rapidly as needed for | |||
the elastic applications TCP was designed for, but frequent feedback | the elastic applications TCP was designed for, but frequent feedback | |||
is still required to allow the congestion control algorithm to track | is still needed to allow the congestion control algorithm to track | |||
the path dynamics. | the path dynamics. | |||
The total RTCP bandwidth is limited in its transmission rate to a | The total RTCP bandwidth is limited in its transmission rate to a | |||
fraction of the RTP traffic (by default 5%). RTCP packets are larger | fraction of the RTP traffic (by default 5%). RTCP packets are larger | |||
than, e.g., TCP ACKs (even when non-compound RTCP packets are used). | than, e.g., TCP ACKs (even when non-compound RTCP packets are used). | |||
The RTP media stream bit rate thus limits the maximum feedback rate | The RTP media stream bit rate thus limits the maximum feedback rate | |||
as a function of the mean RTCP packet size. | as a function of the mean RTCP packet size. | |||
Interactive communication may not be able to afford waiting for | Interactive communication might not be able to afford waiting for | |||
packet losses to occur to indicate congestion, because an increase in | packet losses to occur to indicate congestion, because an increase in | |||
playout delay due to queuing (most prominent in wireless networks) | play out delay due to queuing (most prominent in wireless networks) | |||
may easily lead to packets being dropped due to late arrival at the | can easily lead to packets being dropped due to late arrival at the | |||
receiver. Therefore, more sophisticated cues may need to be reported | receiver. Therefore, more sophisticated cues might need to be | |||
-- to be defined in a suitable congestion control framework as noted | reported -- to be defined in a suitable congestion control framework | |||
above -- which, in turn, increase the report size again. For | as noted above -- which, in turn, increase the report size again. | |||
example, different RTCP XR report blocks (jointly) provide the | For example, different RTCP XR report blocks (jointly) provide the | |||
necessary details to implement a variety of congestion control | necessary details to implement a variety of congestion control | |||
algorithms, but the (compound) report size grows quickly. | algorithms, but the (compound) report size grows quickly. | |||
In group communication, the RTCP bandwidth needs to be | In group communication, the RTCP bandwidth needs to be | |||
shared by all group members, reducing the capacity and thus the | shared by all group members, reducing the capacity and thus the | |||
reporting frequency per node. | reporting frequency per node. | |||
Example: assuming 512 kbit/s video yields 3200 bytes/s RTCP | Example: assuming 512 kbit/s video yields 3200 bytes/s RTCP | |||
bandwidth, split across two entities in a point-to-point session. An | bandwidth, split across two entities in a point-to-point session. An | |||
endpoint could thus send a report of 100 bytes about every 70ms or | endpoint could thus send a report of 100 bytes about every 70ms or | |||
for every other frame in a 30 fps video. | for every other frame in a 30 fps video. | |||
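The arithmetic behind this example can be reproduced with a short Python sketch. The function name and parameters are hypothetical. It assumes the default 5% RTCP fraction and an equal split of the RTCP bandwidth among participants, and it ignores the RTCP timing rules (randomisation, minimum intervals), so it only gives the long-run average interval.

```python
def rtcp_report_interval_s(media_kbps: float,
                           report_bytes: int,
                           participants: int = 2,
                           rtcp_fraction: float = 0.05) -> float:
    """Average time between RTCP reports for one participant, in seconds."""
    # Total RTCP budget in bytes per second (5% of the media rate by default).
    rtcp_bytes_per_s = media_kbps * 1000 * rtcp_fraction / 8
    # Assumption: the budget is split equally between all participants.
    per_participant = rtcp_bytes_per_s / participants
    return report_bytes / per_participant


interval = rtcp_report_interval_s(media_kbps=512, report_bytes=100)
print(f"{interval * 1000:.1f} ms between reports")
```

With the figures from the example (512 kbit/s, two participants, 100-byte reports) the RTCP budget is 3200 bytes/s, or 1600 bytes/s per endpoint, which gives an interval of 62.5 ms. That is in line with the rounded "about every 70ms" figure and with one report per two frames at 30 fps (66.7 ms).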
7.4. Congestion Control Interoperability With Legacy Systems | 7.3. Congestion Control Interoperability With Legacy Systems | |||
There are legacy implementations that do not implement RTCP, and | There are legacy implementations that do not implement RTCP, and | |||
hence do not provide any congestion feedback. Congestion control | hence do not provide any congestion feedback. Congestion control | |||
cannot be performed with these end-points. WebRTC implementations | cannot be performed with these end-points. WebRTC implementations | |||
that must interwork with such end-points MUST limit their | that need to interwork with such end-points MUST limit their | |||
transmission to a low rate, equivalent to a VoIP call using a low | transmission to a low rate, equivalent to a VoIP call using a low | |||
bandwidth codec, that is unlikely to cause any significant | bandwidth codec, that is unlikely to cause any significant | |||
congestion. | congestion. | |||
When interworking with legacy implementations that support RTCP using | When interworking with legacy implementations that support RTCP using | |||
the RTP/AVP profile [RFC3551], congestion feedback is provided in | the RTP/AVP profile [RFC3551], congestion feedback is provided in | |||
RTCP RR packets every few seconds. Implementations that are required | RTCP RR packets every few seconds. Implementations that have to | |||
to interwork with such end-points MUST ensure that they keep within | interwork with such end-points MUST ensure that they keep within the | |||
the RTP circuit breaker [I-D.perkins-avtcore-rtp-circuit-breakers] | RTP circuit breaker [I-D.ietf-avtcore-rtp-circuit-breakers] | |||
constraints to limit the congestion they can cause. | constraints to limit the congestion they can cause. | |||
If a legacy end-point supports RTP/AVPF, this enables negotiation of | If a legacy end-point supports RTP/AVPF, this enables negotiation of | |||
important parameters for frequent reporting, such as the "trr-int" | important parameters for frequent reporting, such as the "trr-int" | |||
parameter, and the possibility that the end-point supports some | parameter, and the possibility that the end-point supports some | |||
useful feedback format for congestion control purposes such as TMMBR | useful feedback format for congestion control purposes such as TMMBR | |||
[RFC5104]. Implementations that are required to interwork with such | [RFC5104]. Implementations that have to interwork with such end- | |||
end-points MUST ensure that they stay within the RTP circuit breaker | points MUST ensure that they stay within the RTP circuit breaker | |||
[I-D.perkins-avtcore-rtp-circuit-breakers] constraints to limit the | [I-D.ietf-avtcore-rtp-circuit-breakers] constraints to limit the | |||
congestion they can cause, but may find that they can achieve better | congestion they can cause, but might find that they can achieve | |||
congestion response depending on the amount of feedback that is | better congestion response depending on the amount of feedback that | |||
available. | is available. | |||
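The three legacy cases above can be summarised as a small decision table. The Python sketch below is only an illustration: the enum, the function, and the dictionary keys are hypothetical, and the 64 kbit/s figure is an assumed stand-in for "a VoIP call using a low bandwidth codec", since the text gives no number.

```python
from enum import Enum

# Assumption: placeholder for the unspecified "low rate"; not from the draft.
LOW_RATE_CAP_BPS = 64_000


class LegacyPeer(Enum):
    NO_RTCP = "no RTCP"
    RTP_AVP = "RTP/AVP"
    RTP_AVPF = "RTP/AVPF"


def sender_constraints(peer: LegacyPeer) -> dict:
    """Map the kind of legacy peer to the constraints described in the text."""
    if peer is LegacyPeer.NO_RTCP:
        # No congestion feedback at all: only a fixed low-rate cap is named.
        return {"rate_cap_bps": LOW_RATE_CAP_BPS,
                "within_circuit_breaker": False,
                "feedback": "none"}
    if peer is LegacyPeer.RTP_AVP:
        # RTCP RR packets arrive every few seconds.
        return {"rate_cap_bps": None,
                "within_circuit_breaker": True,
                "feedback": "RTCP RR every few seconds"}
    # RTP/AVPF: trr-int can be negotiated and TMMBR might be supported.
    return {"rate_cap_bps": None,
            "within_circuit_breaker": True,
            "feedback": "RTP/AVPF (trr-int, possibly TMMBR)"}
```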
8. WebRTC Use of RTP: Performance Monitoring | 8. WebRTC Use of RTP: Performance Monitoring | |||
RTCP contains a basic set of RTP flow monitoring metrics like | RTCP contains a basic set of RTP flow monitoring metrics like | |||
packet loss and jitter. There are a number of extensions that could | packet loss and jitter. There are a number of extensions that could | |||
be included in the set to be supported. However, in most cases which | be included in the set to be supported. However, in most cases which | |||
RTP monitoring is needed depends on the application, which makes | RTP monitoring is needed depends on the application, which makes | |||
it difficult to select which to include when the set of applications | it difficult to select which to include when the set of applications | |||
is very large. | is very large. | |||
Exposing some metrics in the WebRTC API should be considered allowing | Exposing some metrics in the WebRTC API needs to be considered | |||
the application to gather the measurements of interest. However, | allowing the application to gather the measurements of interest. | |||
security implications for the different data sets exposed will need | However, security implications for the different data sets exposed | |||
to be considered in this. | will need to be considered in this. | |||
(tbd: If any RTCP XR metrics should be added is still an open | (tbd: If any RTCP XR metrics need to be added is still an open | |||
question, but possible to extend at a later stage) | question, but possible to extend at a later stage) | |||
9. WebRTC Use of RTP: Future Extensions | 9. WebRTC Use of RTP: Future Extensions | |||
It is possible that the core set of RTP protocols and RTP extensions | It is possible that the core set of RTP protocols and RTP extensions | |||
specified in this memo will prove insufficient for the future needs | specified in this memo will prove insufficient for the future needs | |||
of WebRTC applications. In this case, future updates to this memo | of WebRTC applications. In this case, future updates to this memo | |||
MUST be made following the Guidelines for Writers of RTP Payload | MUST be made following the Guidelines for Writers of RTP Payload | |||
Format Specifications [RFC2736] and Guidelines for Extending the RTP | Format Specifications [RFC2736] and Guidelines for Extending the RTP | |||
Control Protocol [RFC5968], and SHOULD take into account any future | Control Protocol [RFC5968], and SHOULD take into account any future | |||
guidelines for extending RTP and related protocols that have been | guidelines for extending RTP and related protocols that have been | |||
developed. | developed. | |||
Authors of future extensions are urged to consider the wide range of | Authors of future extensions are urged to consider the wide range of | |||
environments in which RTP is used when recommending extensions, since | environments in which RTP is used when recommending extensions, since | |||
extensions that are applicable in some scenarios can be problematic | extensions that are applicable in some scenarios can be problematic | |||
in others. Where possible, the WebRTC framework should adopt RTP | in others. Where possible, the WebRTC framework will adopt RTP | |||
extensions that are of general utility, to enable easy gatewaying to | extensions that are of general utility, to enable easy implementation | |||
other applications using RTP, rather than adopt mechanisms that are | of a gateway to other applications using RTP, rather than adopt | |||
narrowly targeted at specific WebRTC use cases. | mechanisms that are narrowly targeted at specific WebRTC use cases. | |||
10. Signalling Considerations | 10. Signalling Considerations | |||
RTP is built with the assumption of an external signalling channel | RTP is built with the assumption of an external signalling channel | |||
that can be used to configure the RTP sessions and their features. | that can be used to configure the RTP sessions and their features. | |||
The basic configuration of an RTP session consists of the following | The basic configuration of an RTP session consists of the following | |||
parameters: | parameters: | |||
RTP Profile: The name of the RTP profile to be used in the session. The | RTP Profile: The name of the RTP profile to be used in the session. The | |||
RTP/AVP [RFC3551] and RTP/AVPF [RFC4585] profiles can interoperate | RTP/AVP [RFC3551] and RTP/AVPF [RFC4585] profiles can interoperate | |||
on a basic level, as can their secure variants RTP/SAVP [RFC3711] | on a basic level, as can their secure variants RTP/SAVP [RFC3711] | |||
and RTP/SAVPF [RFC5124]. The secure variants of the profiles do | and RTP/SAVPF [RFC5124]. The secure variants of the profiles do | |||
not directly interoperate with the non-secure variants, due to the | not directly interoperate with the non-secure variants, due to the | |||
presence of additional header fields in addition to any | presence of additional header fields in addition to any | |||
cryptographic transformation of the packet content. As WebRTC | cryptographic transformation of the packet content. As WebRTC | |||
requires the usage of the RTP/SAVPF profile this can be inferred | requires the usage of the RTP/SAVPF profile this can be inferred | |||
as there is only a single profile, but in SDP this is still | as there is only a single profile, but in SDP this is still | |||
required information to be signalled. Interworking functions may | information that has to be signalled. Interworking functions | |||
transform this into RTP/SAVP for a legacy use case by indicating | might transform this into RTP/SAVP for a legacy use case by | |||
to the WebRTC end-point an RTP/SAVPF end-point and limiting the | indicating to the WebRTC end-point an RTP/SAVPF end-point and | |||
usage of the a=rtcp attribute to indicate a trr-int value of 4 | limiting the usage of the a=rtcp attribute to indicate a trr-int | |||
seconds. | value of 4 seconds. | |||
Transport Information: Source and destination IP address(es) and | Transport Information: Source and destination IP address(es) and | |||
ports for RTP and RTCP MUST be signalled for each RTP session. In | ports for RTP and RTCP MUST be signalled for each RTP session. In | |||
WebRTC these transport addresses will be provided by ICE that | WebRTC these transport addresses will be provided by ICE that | |||
signals candidates and arrives at nominated candidate address | signals candidates and arrives at nominated candidate address | |||
pairs. If RTP and RTCP multiplexing [RFC5761] is to be used, such | pairs. If RTP and RTCP multiplexing [RFC5761] is to be used, such | |||
that a single port is used for RTP and RTCP flows, this MUST be | that a single port is used for RTP and RTCP flows, this MUST be | |||
signalled (see Section 4.5). If several RTP sessions are to be | signalled (see Section 4.5). If several RTP sessions are to be | |||
multiplexed onto a single transport layer flow, this MUST also be | multiplexed onto a single transport layer flow, this MUST also be | |||
signalled (see Section 4.4). | signalled (see Section 4.4). | |||
skipping to change at page 22, line 28 ¶ | skipping to change at page 21, line 28 ¶ | |||
that the other end-point will ignore. But for certain mechanisms | that the other end-point will ignore. But for certain mechanisms | |||
there is a requirement for this to happen as interoperability | there is a requirement for this to happen as interoperability | |||
failure otherwise happens. | failure otherwise happens. | |||
RTCP Bandwidth: Support for exchanging RTCP Bandwidth values to the | RTCP Bandwidth: Support for exchanging RTCP Bandwidth values to the | |||
end-points will be necessary. This SHALL be done as described in | end-points will be necessary. This SHALL be done as described in | |||
"Session Description Protocol (SDP) Bandwidth Modifiers for RTP | "Session Description Protocol (SDP) Bandwidth Modifiers for RTP | |||
Control Protocol (RTCP) Bandwidth" [RFC3556], or something | Control Protocol (RTCP) Bandwidth" [RFC3556], or something | |||
semantically equivalent. This also ensures that the end-points | semantically equivalent. This also ensures that the end-points | |||
have a common view of the RTCP bandwidth; this is important as too | have a common view of the RTCP bandwidth; this is important as too | |||
different views of the bandwidths may lead to failure to | different views of the bandwidths can lead to failure to | |||
interoperate. | interoperate. | |||
These parameters are often expressed in SDP messages conveyed within | These parameters are often expressed in SDP messages conveyed within | |||
an offer/answer exchange. RTP does not depend on SDP or on the | an offer/answer exchange. RTP does not depend on SDP or on the | |||
offer/answer model, but does require all the necessary parameters to | offer/answer model, but does require all the necessary parameters to | |||
be agreed upon, and provided to the RTP implementation. We note that | be agreed upon, and provided to the RTP implementation. We note that | |||
in the WebRTC context it will depend on the signalling model and API | in the WebRTC context it will depend on the signalling model and API | |||
how these parameters need to be configured, but they will need to | how these parameters need to be configured, but they will need to | |||
either be set in the API or explicitly signalled between the peers. | either be set in the API or explicitly signalled between the peers. | |||
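To make the parameters above more concrete, the Python sketch below pulls the RTP profile, the bandwidth modifiers, and the RTP/RTCP multiplexing indication out of one SDP media section. The SDP fragment, its port and bandwidth values, and the function name are made up for illustration. "b=RS:" and "b=RR:" are the RTCP bandwidth modifiers of [RFC3556] and "a=rtcp-mux" is the attribute of [RFC5761]. This is a minimal reader for this fragment only, not a complete SDP parser.

```python
# Hypothetical media section; the values are illustrative only.
EXAMPLE_SDP = """\
m=audio 49170 RTP/SAVPF 96
b=AS:64
b=RS:800
b=RR:2400
a=rtcp-mux
"""


def parse_media_section(sdp: str) -> dict:
    """Extract the profile, b= modifiers and rtcp-mux flag from one m= section."""
    info = {"profile": None, "bandwidth": {}, "rtcp_mux": False}
    for raw in sdp.strip().splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...; <proto> names the RTP profile.
            info["profile"] = line[2:].split()[2]
        elif line.startswith("b="):
            # b=<modifier>:<value>
            modifier, _, value = line[2:].partition(":")
            info["bandwidth"][modifier] = int(value)
        elif line == "a=rtcp-mux":
            info["rtcp_mux"] = True
    return info


print(parse_media_section(EXAMPLE_SDP))
```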
skipping to change at page 23, line 25 ¶ | skipping to change at page 22, line 25 ¶ | |||
there can be multiple different WebRTC MediaStreams containing a | there can be multiple different WebRTC MediaStreams containing a | |||
given Track (SSRC). To avoid unnecessary duplication of media at the | given Track (SSRC). To avoid unnecessary duplication of media at the | |||
transport level in such cases, a need arises for a binding defining | transport level in such cases, a need arises for a binding defining | |||
which WebRTC MediaStreams a given SSRC is associated with at the | which WebRTC MediaStreams a given SSRC is associated with at the | |||
signalling level. | signalling level. | |||
A proposal for how the binding between WebRTC MediaStreams and SSRC | A proposal for how the binding between WebRTC MediaStreams and SSRC | |||
can be done is specified in "Cross Session Stream Identification in | can be done is specified in "Cross Session Stream Identification in | |||
the Session Description Protocol" [I-D.alvestrand-rtcweb-msid]. | the Session Description Protocol" [I-D.alvestrand-rtcweb-msid]. | |||
(tbd: This text must be improved and consensus reached on it. Interim | (tbd: This text needs to be improved and consensus reached on it. | |||
meeting in June 2012 shows large differences in opinions.) | Interim meeting in June 2012 shows large differences in opinions.) | |||
12. RTP Implementation Considerations | 12. RTP Implementation Considerations | |||
The following provides some guidance on the implementation of the RTP | The following provides some guidance on the implementation of the RTP | |||
features described in this memo. | features described in this memo. | |||
This section discusses RTP functionality that is part of the RTP | This section discusses RTP functionality that is part of the RTP | |||
standard, required by decisions made, or to enable use cases raised | standard, needed by decisions made, or to enable use cases raised and | |||
and their motivations. This discussion is from a WebRTC end-point | their motivations. This discussion is from a WebRTC end-point | |||
perspective. It will occasionally talk about central nodes, but as | perspective. It will occasionally talk about central nodes, but as | |||
this specification is for an end-point, this is where the focus lies. | this specification is for an end-point, this is where the focus lies. | |||
For more discussion on the central nodes and details about RTP | For more discussion on the central nodes and details about RTP | |||
topologies please see Appendix A. | topologies please see Appendix A. | |||
The section will touch on the relation with certain RTP/RTCP | The section will touch on the relation with certain RTP/RTCP | |||
extensions, but will focus on the RTP core functionality. Which | extensions, but will focus on the RTP core functionality. Which | |||
functionalities are needed, and the level of requirement for | functionalities are needed, and the level of requirement for | |||
implementing them, are defined in Section 2. | implementing them, are defined in Section 2. | |||
12.1. RTP Sessions and PeerConnection | 12.1. RTP Sessions and PeerConnection | |||
An RTP session is an association among RTP nodes, which have one | An RTP session is an association among RTP nodes, which have one | |||
common SSRC space. An RTP session can include any number of end- | common SSRC space. An RTP session can include any number of end- | |||
points and nodes sourcing, sinking, manipulating or reporting on the | points and nodes sourcing, sinking, manipulating or reporting on the | |||
RTP media streams being sent within the RTP session. A | RTP media streams being sent within the RTP session. A | |||
PeerConnection is a point-to-point association between an end- | PeerConnection is a point-to-point association between an end- | |||
point and another node. That peer node may be either an end-point or a | point and another node. That peer node can be either an end-point or a | |||
centralized processing node of some type; thus, the RTP session may | centralized processing node of some type; thus, the RTP session can | |||
terminate immediately on the far end of the PeerConnection, but it | terminate immediately on the far end of the PeerConnection, but it | |||
may also continue as further discussed below in Multiparty | might also continue as further discussed below in Multiparty | |||
(Section 12.3) and Multiple RTP End-points (Section 12.7). | (Section 12.3) and Multiple RTP End-points (Section 12.7). | |||
A PeerConnection can contain one or more RTP sessions depending on how | A PeerConnection can contain one or more RTP sessions depending on how | |||
it is set up and how many UDP flows it uses. A common usage has been | it is set up and how many UDP flows it uses. A common usage has been | |||
to have one RTP session per media type, e.g. one for audio and one | to have one RTP session per media type, e.g. one for audio and one | |||
for video, each sent over different UDP flows. However, the default | for video, each sent over different UDP flows. However, the default | |||
usage in WebRTC will be to use one RTP session for all media types. | usage in WebRTC will be to use one RTP session for all media types. | |||
This usage then uses only one UDP flow, as RTP and RTCP | This usage then uses only one UDP flow, as RTP and RTCP | |||
multiplexing is also mandated (Section 4.5). However, for legacy | multiplexing is also mandated (Section 4.5). However, for legacy | |||
interworking and network prioritization (Section 12.9) based on | interworking and network prioritization (Section 12.9) based on | |||
flows, a WebRTC end-point needs to support a mode of operation where | flows, a WebRTC end-point needs to support a mode of operation where | |||
one RTP session per media type is used. Currently, each RTP session | one RTP session per media type is used. Currently, each RTP session | |||
must use its own UDP flow. Discussions are ongoing on a solution | has to use its own UDP flow. Discussions are ongoing on a solution | |||
enabling multiple RTP sessions over a single UDP flow; see | enabling multiple RTP sessions over a single UDP flow; see | |||
Section 4.4. | Section 4.4. | |||
The multi-unicast- or mesh-based multi-party topology (Figure 1) is a | The multi-unicast- or mesh-based multi-party topology (Figure 1) is a | |||
good example for this section as it concerns the relation between RTP | good example for this section as it concerns the relation between RTP | |||
sessions and PeerConnections. In this topology, each participant | sessions and PeerConnections. In this topology, each participant | |||
sends individual unicast RTP/UDP/IP flows to each of the other | sends individual unicast RTP/UDP/IP flows to each of the other | |||
participants using independent PeerConnections in a full mesh. This | participants using independent PeerConnections in a full mesh. This | |||
topology has the benefit of not requiring central nodes. The | topology has the benefit of not requiring central nodes. The | |||
downside is that it increases the used bandwidth at each sender by | downside is that it increases the used bandwidth at each sender by | |||
skipping to change at page 25, line 20 ¶ | skipping to change at page 24, line 20 ¶ | |||
C based on RTCP. This has not been seen as a significant downside as | C based on RTCP. This has not been seen as a significant downside as | |||
no one has yet seen a clear need for why A would need to know about | no one has yet seen a clear need for why A would need to know about | |||
B's and C's communication. An advantage of using separate RTP | B's and C's communication. An advantage of using separate RTP | |||
sessions is that it enables using different media bit-rates to the | sessions is that it enables using different media bit-rates to the | |||
different peers, thus not forcing B to endure the same quality | different peers, thus not forcing B to endure the same quality | |||
reductions if there are limitations in the transport from A to C as C | reductions if there are limitations in the transport from A to C as C | |||
will. | will. | |||
12.2. Multiple Sources | 12.2. Multiple Sources | |||
A WebRTC end-point may have multiple cameras, microphones or audio | A WebRTC end-point might have multiple cameras, microphones or audio | |||
inputs and thus a single end-point can source multiple RTP media | inputs and thus a single end-point can source multiple RTP media | |||
streams of the same media type concurrently. Even if an end-point | streams of the same media type concurrently. Even if an end-point | |||
does not have multiple media sources of the same media type it will | does not have multiple media sources of the same media type it has to | |||
be required to support transmission using multiple SSRCs concurrently | support transmission using multiple SSRCs concurrently in the same | |||
in the same RTP session. This is due to the requirement on a WebRTC | RTP session. This is due to the requirement on a WebRTC end-point | |||
end-point to support multiple media types in one RTP session. For | to support multiple media types in one RTP session. For example, one | |||
example, one audio and one video source can result in the end-point | audio and one video source can result in the end-point sending with | |||
sending with two different SSRCs in the same RTP session. As multi- | two different SSRCs in the same RTP session. As multi-party | |||
party conferences are supported, as discussed below in Section 12.3, | conferences are supported, as discussed below in Section 12.3, a | |||
a WebRTC end-point will need to be capable of receiving, decoding and | WebRTC end-point will need to be capable of receiving, decoding and | |||
playout multiple RTP media streams of the same type concurrently. | play out multiple RTP media streams of the same type concurrently. | |||
tbd: Is any mechanism needed to signal limitations in the number of | tbd: Is any mechanism needed to signal limitations in the number of | |||
SSRC that an end-point can handle? | active SSRC that an end-point can handle? | |||
12.3. Multiparty | 12.3. Multiparty | |||
There are numerous situations and clear use cases for WebRTC | There are numerous situations and clear use cases for WebRTC | |||
supporting RTP sessions supporting multi-party. This can be realized | supporting RTP sessions supporting multi-party. This can be realized | |||
in a number of ways using a number of different implementation | in a number of ways using a number of different implementation | |||
strategies. In the following, the focus is on the different set of | strategies. In the following, the focus is on the different set of | |||
WebRTC end-point requirements that arise from different sets of | WebRTC end-point requirements that arise from different sets of | |||
multi-party topologies. | multi-party topologies. | |||
The multi-unicast mesh (Figure 1)-based multi-party topology | The multi-unicast mesh (Figure 1)-based multi-party topology | |||
discussed above provides a non-centralized solution but may incur a | discussed above provides a non-centralized solution but can incur a | |||
heavy tax on the end-points' outgoing paths. It may also consume | heavy tax on the end-points' outgoing paths. It can also consume | |||
large amount of encoding resources if each outgoing stream is | large amount of encoding resources if each outgoing stream is | |||
specifically encoded. If an encoding is transmitted to multiple | specifically encoded. If an encoding is transmitted to multiple | |||
parties, as in some implementations of the mesh case, a requirement | parties, as in some implementations of the mesh case, a requirement | |||
on the end-point becomes to be able to create RTP media streams | on the end-point becomes to be able to create RTP media streams | |||
suitable for multiple destinations requirements. These requirements | suitable for multiple destinations requirements. These requirements | |||
may both be dependent on transport path and the different end-points' | can both be dependent on transport path and the different end-points' | |||
preferences related to playout of the media. | preferences related to play out of the media. | |||
+---+ +------------+ +---+ | +---+ +------------+ +---+ | |||
| A |<---->| |<---->| B | | | A |<---->| |<---->| B | | |||
+---+ | | +---+ | +---+ | | +---+ | |||
| Mixer | | | Mixer | | |||
+---+ | | +---+ | +---+ | | +---+ | |||
| C |<---->| |<---->| D | | | C |<---->| |<---->| D | | |||
+---+ +------------+ +---+ | +---+ +------------+ +---+ | |||
Figure 2: RTP Mixer with Only Unicast Paths | Figure 2: RTP Mixer with Only Unicast Paths | |||
A Mixer (Figure 2) is an RTP end-point that optimizes the | A Mixer (Figure 2) is an RTP end-point that optimizes the | |||
transmission of RTP media streams from certain perspectives, either | transmission of RTP media streams from certain perspectives, either | |||
by only sending some of the received RTP media stream to any given | by only sending some of the received RTP media stream to any given | |||
receiver or by providing a combined RTP media stream out of a set of | receiver or by providing a combined RTP media stream out of a set of | |||
contributing streams. There are various methods of implementation as | contributing streams. There are various methods of implementation as | |||
discussed in Appendix A.3. A common aspect is that these central | discussed in Appendix A.3. A common aspect is that these central | |||
nodes may use a number of tools to control the media encoding | nodes can use a number of tools to control the media encoding | |||
provided by a WebRTC end-point. This includes functions like | provided by a WebRTC end-point. This includes functions like | |||
requesting breaking the encoding chain and have the encoder produce a | requesting breaking the encoding chain and have the encoder produce a | |||
so called Intra frame. Another is limiting the bit-rate of a given | so called Intra frame. Another is limiting the bit-rate of a given | |||
stream to better suit the mixer view of the multiple down-streams. | stream to better suit the mixer view of the multiple down-streams. | |||
Others are controlling the most suitable frame-rate, picture | Others are controlling the most suitable frame-rate, picture | |||
resolution, the trade-off between frame-rate and spatial quality. | resolution, the trade-off between frame-rate and spatial quality. | |||
A mixer gets a significant responsibility to correctly perform | A mixer gets a significant responsibility to correctly perform | |||
congestion control, source identification, manage synchronization | congestion control, source identification, manage synchronization | |||
while providing the application with suitable media optimizations. | while providing the application with suitable media optimizations. | |||
Mixers also need to be trusted nodes when it comes to security as it | Mixers also need to be trusted nodes when it comes to security as it | |||
manipulates either RTP or the media itself before sending it on | manipulates either RTP or the media itself before sending it on | |||
towards the end-point(s), thus they must be able to decrypt and then | towards the end-point(s), thus they need to be able to decrypt and | |||
encrypt it before sending it out. | then encrypt it before sending it out. | |||
12.4. SSRC Collision Detection | 12.4. SSRC Collision Detection | |||
The RTP standard [RFC3550] requires any RTP implementation to have | The RTP standard [RFC3550] requires any RTP implementation to have | |||
support for detecting and handling SSRC collisions, i.e., resolve the | support for detecting and handling SSRC collisions, i.e., resolve the | |||
conflict when two different end-points use the same SSRC value. This | conflict when two different end-points use the same SSRC value. This | |||
requirement also applies to WebRTC end-points. There are several | requirement also applies to WebRTC end-points. There are several | |||
scenarios where SSRC collisions may occur. | scenarios where SSRC collisions can occur. | |||
In a point-to-point session where each SSRC is associated with either | In a point-to-point session where each SSRC is associated with either | |||
of the two end-points and where the main media carrying SSRC | of the two end-points and where the main media carrying SSRC | |||
identifier will be announced in the signalling channel, a collision | identifier will be announced in the signalling channel, a collision | |||
is less likely to occur due to the information about used SSRCs | is less likely to occur due to the information about used SSRCs | |||
provided by Source-Specific SDP Attributes [RFC5576]. Still if both | provided by Source-Specific SDP Attributes [RFC5576]. Still if both | |||
end-points start using a new SSRC identifier prior to having | end-points start using a new SSRC identifier prior to having | |||
signalled it to the peer and received acknowledgement on the | signalled it to the peer and received acknowledgement on the | |||
signalling message, there can be collisions. The Source-Specific SDP | signalling message, there can be collisions. The Source-Specific SDP | |||
Attributes [RFC5576] contains no mechanism to resolve SSRC collisions | Attributes [RFC5576] contains no mechanism to resolve SSRC collisions | |||
or reject an end-point's usage of an SSRC. | or reject an end-point's usage of an SSRC. | |||
There could also appear unsignalled SSRCs. This is more likely than | There could also appear SSRC values that are not signalled. This is | |||
it appears as certain RTP functions need extra SSRCs to provide | more likely than it appears as certain RTP functions need extra SSRCs | |||
functionality related to another (the "main") SSRC, for example, SSRC | to provide functionality related to another (the "main") SSRC, for | |||
multiplexed RTP retransmission [RFC4588]. In those cases, an end- | example, SSRC multiplexed RTP retransmission [RFC4588]. In those | |||
point can create a new SSRC that strictly doesn't need to be | cases, an end-point can create a new SSRC that strictly doesn't need | |||
announced over the signalling channel to function correctly on both | to be announced over the signalling channel to function correctly on | |||
RTP and PeerConnection level. | both RTP and PeerConnection level. | |||
The more likely case for SSRC collision is that multiple end-points | The more likely case for SSRC collision is that multiple end-points | |||
in a multiparty conference create new sources and signal those | in a multiparty conference create new sources and signal those | |||
towards the central server. In cases where the SSRC/CSRC are | towards the central server. In cases where the SSRC/CSRC are | |||
propagated between the different end-points from the central node | propagated between the different end-points from the central node | |||
collisions can occur. | collisions can occur. | |||
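The collision handling required by [RFC3550] can be illustrated with a minimal Python sketch. It assumes the general RFC 3550 behaviour of abandoning a colliding SSRC (a real stack would send an RTCP BYE for it) and choosing a fresh random 32-bit value. The names new_ssrc and LocalSource are hypothetical and are not part of any WebRTC API; loop detection based on source transport addresses is deliberately left out.

```python
import secrets


def new_ssrc(in_use):
    """Pick a random 32-bit SSRC that is not in the set of known SSRCs."""
    while True:
        candidate = secrets.randbits(32)
        if candidate not in in_use:
            return candidate


class LocalSource:
    """Hypothetical model of one locally generated RTP media stream."""

    def __init__(self, known_remote):
        # SSRCs already learned, e.g. from signalling or received packets.
        self.known_remote = set(known_remote)
        self.ssrc = new_ssrc(self.known_remote)
        # SSRCs we gave up; a real stack would send an RTCP BYE for each.
        self.retired = []

    def on_remote_ssrc(self, remote_ssrc):
        """Record a remote SSRC; return True if a collision was resolved."""
        self.known_remote.add(remote_ssrc)
        if remote_ssrc != self.ssrc:
            return False
        # Collision: retire our identifier and pick a new, unused one.
        self.retired.append(self.ssrc)
        self.ssrc = new_ssrc(self.known_remote)
        return True
```

Because the remote SSRC is added to the known set before a replacement is drawn, the new identifier cannot equal the one that just collided.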
Another scenario is when the central node manages to connect an end- | Another scenario is when the central node manages to connect an end- | |||
point's PeerConnection to another PeerConnection the end-point | point's PeerConnection to another PeerConnection the end-point | |||
already has, thus forming a loop where the end-point will receive its | already has, thus forming a loop where the end-point will receive its | |||
skipping to change at page 27, line 42 ¶ | skipping to change at page 26, line 42 ¶ | |||
12.5. Contributing Sources | 12.5. Contributing Sources | |||
Contributing Sources (CSRC) is a functionality in the RTP header that | Contributing Sources (CSRC) is a functionality in the RTP header that | |||
allows an RTP node to combine media packets from multiple sources | allows an RTP node to combine media packets from multiple sources | |||
into one and to identify which sources yielded the result. For | into one and to identify which sources yielded the result. For | |||
WebRTC end-points, supporting contributing sources is trivial. The | WebRTC end-points, supporting contributing sources is trivial. The | |||
set of CSRCs is provided in a given RTP packet. This information can | set of CSRCs is provided in a given RTP packet. This information can | |||
then be exposed to the applications using some form of API, possibly | then be exposed to the applications using some form of API, possibly | |||
a mapping back into WebRTC MediaStream identities to avoid having to | a mapping back into WebRTC MediaStream identities to avoid having to | |||
expose two namespaces and the handling of SSRC collisions to | expose two name spaces and the handling of SSRC collisions to | |||
the JavaScript. | the JavaScript. | |||
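As an illustration of how the CSRC set can be read from a given RTP packet, the following Python sketch parses the fixed RTP header defined in [RFC3550]: the low four bits of the first octet carry the CSRC count (CC), and the CSRC identifiers follow the 12-octet fixed header as 32-bit big-endian words. The function name parse_csrcs is hypothetical; how a browser would expose this information to applications is not defined here.

```python
import struct


def parse_csrcs(packet: bytes):
    """Return (ssrc, [csrc, ...]) from a raw RTP packet."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-octet fixed RTP header")
    if packet[0] >> 6 != 2:
        raise ValueError("not RTP version 2")
    cc = packet[0] & 0x0F                 # CSRC count, 0..15
    if len(packet) < 12 + 4 * cc:
        raise ValueError("truncated CSRC list")
    # SSRC occupies octets 8..11; the CSRC list starts at octet 12.
    ssrc = struct.unpack_from("!I", packet, 8)[0]
    csrcs = list(struct.unpack_from(f"!{cc}I", packet, 12))
    return ssrc, csrcs
```

A packet sent by an end-point that does no mixing has CC set to zero, in which case the returned list is empty.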
(tbd: should the API provide the ability to add a CSRC list to an | (tbd: does the API need to provide the ability to add a CSRC list to | |||
outgoing packet? this is only useful if the sender is mixing content) | an outgoing packet? this is only useful if the sender is mixing | |||
content) | ||||
There is also at least one extension that depends on the CRSRC list | There is also at least one extension that depends on the CSRC list | |||
being used: the Mixer-to-client audio level [RFC6465], which enhances | being used: the Mixer-to-client audio level [RFC6465], which enhances | |||
the information provided by the CSRC to actual energy levels for | the information provided by the CSRC to actual energy levels for | |||
audio for each contributing source. | audio for each contributing source. | |||
12.6. Media Synchronization | 12.6. Media Synchronization | |||
When an end-point sends media from more than one media source, it | When an end-point sends media from more than one media source, it | |||
needs to consider if (and which of) these media sources are to be | needs to consider if (and which of) these media sources are to be | |||
synchronized. In RTP/RTCP, synchronisation is provided by having a | synchronized. In RTP/RTCP, synchronisation is provided by having a | |||
set of RTP media streams be indicated as coming from the same | set of RTP media streams be indicated as coming from the same | |||
skipping to change at page 28, line 30 ¶ | skipping to change at page 27, line 31 ¶ | |||
receiver and, if desired, the streams can be synchronized. The | receiver and, if desired, the streams can be synchronized. The | |||
requirement is for the media sender to provide the correlation | requirement is for the media sender to provide the correlation | |||
information; it is up to the receiver to use it or not. | information; it is up to the receiver to use it or not. | |||
12.7. Multiple RTP End-points | 12.7. Multiple RTP End-points | |||
Some usages of RTP beyond the recommended topologies mean that a | Some usages of RTP beyond the recommended topologies mean that a | |||
WebRTC end-point sending media in an RTP session out over a single | WebRTC end-point sending media in an RTP session out over a single | |||
PeerConnection will receive receiver reports from multiple RTP | PeerConnection will receive receiver reports from multiple RTP | |||
receivers. Note that receiving multiple receiver reports is expected | receivers. Note that receiving multiple receiver reports is expected | |||
because any RTP node that has multiple SSRCs is required to report to | because any RTP node that has multiple SSRCs has to report to the | |||
the media sender. The difference here is that they are multiple | media sender. The difference here is that they are multiple nodes, | |||
nodes, and thus will likely have different path characteristics. | and thus will likely have different path characteristics. | |||
RTP Mixers may create a situation where an end-point experiences a | RTP Mixers can create a situation where an end-point experiences a | |||
situation in-between a session with only two end-points and multiple | situation in-between a session with only two end-points and multiple | |||
end-points. Mixers are expected to not forward RTCP reports | end-points. Mixers are expected to not forward RTCP reports | |||
regarding RTP media streams across themselves. This is due to the | regarding RTP media streams across themselves. This is due to the | |||
difference in the RTP media streams provided to the different end- | difference in the RTP media streams provided to the different end- | |||
points. The original media source lacks information about a mixer's | points. The original media source lacks information about a mixer's | |||
manipulations prior to sending it to the different receivers. This | manipulations prior to sending it to the different receivers. This | |||
setup also results in that an end-point's feedback or requests goes | scenario also results in that an end-point's feedback or requests | |||
to the mixer. When the mixer can't act on this by itself, it is | goes to the mixer. When the mixer can't act on this by itself, it is | |||
forced to go to the original media source to fulfill the receivers | forced to go to the original media source to fulfil the receivers | |||
request. This will not necessarily be explicitly visible in any RTP and | request. This will not necessarily be explicitly visible in any RTP and | |||
RTCP traffic, but the interactions and the time to complete them will | RTCP traffic, but the interactions and the time to complete them will | |||
indicate such dependencies. | indicate such dependencies. | |||
The topologies in which an end-point receives receiver reports from | The topologies in which an end-point receives receiver reports from | |||
multiple other end-points are the centralized relay, multicast and an | multiple other end-points are the centralized relay, multicast and an | |||
end-point forwarding an RTP media stream. Having multiple RTP nodes | end-point forwarding an RTP media stream. Having multiple RTP nodes | |||
receive an RTP flow and send reports and feedback about it has | receive an RTP flow and send reports and feedback about it has | |||
several impacts. As previously discussed (Section 12.3) any codec | several impacts. As previously discussed (Section 12.3) any codec | |||
control and rate control needs to be capable of merging the | control and rate control needs to be capable of merging the | |||
skipping to change at page 29, line 41 ¶ | skipping to change at page 28, line 43 ¶ | |||
feed the same set of WebRTC MediaStreams. Another method is to use | feed the same set of WebRTC MediaStreams. Another method is to use | |||
multiple WebRTC MediaStreams that are differently configured when it | multiple WebRTC MediaStreams that are differently configured when it | |||
comes to the media parameters. This would result in multiple | comes to the media parameters. This would result in multiple | |||
different RTP Media Streams (SSRCs) being in use with different | different RTP Media Streams (SSRCs) being in use with different | |||
encoding based on the same media source (camera, microphone). | encoding based on the same media source (camera, microphone). | |||
When intending to use simulcast it is important that this is made | When intending to use simulcast it is important that this is made | |||
explicit so that the end-points don't automatically try to optimize | explicit so that the end-points don't automatically try to optimize | |||
away the different encodings and provide a single common version. | away the different encodings and provide a single common version. | |||
Thus, some explicit indications that the intent really is to have | Thus, some explicit indications that the intent really is to have | |||
different media encodings is likely required. It should be noted | different media encodings is likely needed. It is to be noted that | |||
that it might be a central node, rather than an WebRTC end-point that | it might be a central node, rather than an WebRTC end-point that | |||
would benefit from receiving simulcasted media sources. | would benefit from receiving simulcast media sources. | |||
tbd: How to perform simulcast needs to be determined and the | tbd: How to perform simulcast needs to be determined and the | |||
appropriate API or signalling for its usage needs to be defined. | appropriate API or signalling for its usage needs to be defined. | |||
12.9. Differentiated Treatment of Flows | 12.9. Differentiated Treatment of Flows | |||
There are use cases for differentiated treatment of RTP media | There are use cases for differentiated treatment of RTP media | |||
streams. Such differentiation can happen at several places in the | streams. Such differentiation can happen at several places in the | |||
system. First of all is the prioritization within the end-point | system. First of all is the prioritization within the end-point | |||
sending the media, which controls, both which RTP media streams that | sending the media, which controls, both which RTP media streams that | |||
will be sent, and their allocation of bit-rate out of the current | will be sent, and their allocation of bit-rate out of the current | |||
available aggregate as determined by the congestion control. | available aggregate as determined by the congestion control. | |||
Secondly, the network can prioritize packet flows, including RTP | Secondly, the network can prioritize packet flows, including RTP | |||
media streams. Typically, differential treatment includes two steps, | media streams. Typically, differential treatment includes two steps, | |||
the first being identifying whether an IP packet belongs to a class | the first being identifying whether an IP packet belongs to a class | |||
which should be treated differently, the second the actual mechanism | that has to be treated differently, the second the actual mechanism | |||
to prioritize packets. This is done according to three methods; | to prioritize packets. This is done according to three methods; | |||
Diffserv: The end-point marks a packet with a diffserv code point to | DiffServ: The end-point marks a packet with a DiffServ code point to | |||
indicate to the network that the packet belongs to a particular | indicate to the network that the packet belongs to a particular | |||
class. | class. | |||
Flow based: Packets that shall be given a particular treatment are | Flow based: Packets that need to be given a particular treatment are | |||
identified using a combination of IP and port address. | identified using a combination of IP and port address. | |||
Deep Packet Inspection: A network classifier (DPI) inspects the | Deep Packet Inspection: A network classifier (DPI) inspects the | |||
packet and tries to determine if the packet represents a | packet and tries to determine if the packet represents a | |||
particular application and type that is to be prioritized. | particular application and type that is to be prioritized. | |||
With the exception of diffserv both flow based and DPI have issues | With the exception of DiffServ both flow based and DPI have issues | |||
with running multiple media types and flows on a single UDP flow, | with running multiple media types and flows on a single UDP flow, | |||
especially when combined with data transport (SCTP/DTLS). DPI has | especially when combined with data transport (SCTP/DTLS). DPI has | |||
issues because multiple types of flows are aggregated and thus it | issues because multiple types of flows are aggregated and thus it | |||
becomes more difficult to analyse them. The flow-based | becomes more difficult to analyse them. The flow-based | |||
differentiation will provide the same treatment to all packets within | differentiation will provide the same treatment to all packets within | |||
the flow, i.e., relative prioritization is not possible. Moreover, | the flow, i.e., relative prioritization is not possible. Moreover, | |||
if the resources are limited it may not be possible to provide | if the resources are limited it might not be possible to provide | |||
differential treatment compared to best-effort for all the flows in a | differential treatment compared to best-effort for all the flows in a | |||
WebRTC application. | WebRTC application. | |||
When flow-based differentiation is available the WebRTC application | When flow-based differentiation is available the WebRTC application | |||
needs to know about it so that it can provide the separation of the | needs to know about it so that it can provide the separation of the | |||
RTP media streams onto different UDP flows to enable a more granular | RTP media streams onto different UDP flows to enable a more granular | |||
usage of flow based differentiation. | usage of flow based differentiation. | |||
Diffserv assumes that either the end-point or a classifier can mark | DiffServ assumes that either the end-point or a classifier can mark | |||
the packets with an appropriate DSCP so that the packets are treated | the packets with an appropriate DSCP so that the packets are treated | |||
according to that marking. If the end-point is to mark the traffic | according to that marking. If the end-point is to mark the traffic | |||
two requirements arise in the WebRTC context: 1) The WebRTC | two requirements arise in the WebRTC context: 1) The WebRTC | |||
application or browser has to know which DSCP to use and that it can | application or browser has to know which DSCP to use and that it can | |||
use them on some set of RTP media streams. 2) The information needs | use them on some set of RTP media streams. 2) The information needs | |||
to be propagated to the operating system when transmitting the | to be propagated to the operating system when transmitting the | |||
packet. | packet. These issues are discussed in DSCP and other packet markings | |||
for RTCWeb QoS [I-D.ietf-rtcweb-qos]. | ||||
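The second requirement, propagating the marking to the operating system, can be sketched in Python as follows. This is an assumption-laden illustration, not a recommendation: the DSCP value 46 (Expedited Forwarding) is used only as an example and is not mandated by this memo, only IPv4 sockets are handled, and whether the marking is honoured depends on the operating system and the network. The helper names dscp_to_tos and mark_socket are hypothetical.

```python
import socket


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper bits of the IPv4 TOS/DS octet."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0..63)")
    return dscp << 2  # the low two bits are used for ECN


def mark_socket(sock: socket.socket, dscp: int) -> None:
    """Ask the OS to mark packets sent on an IPv4 socket with the DSCP."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))


if __name__ == "__main__":
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    mark_socket(udp, 46)  # example value only (Expedited Forwarding)
    udp.close()
```

Since the marking applies to the whole socket, RTP media streams that need different code points would, in this sketch, have to be sent on different UDP flows or have the option changed between sends.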
tbd: The model for providing differentiated treatment needs to be | tbd: The model for providing differentiated treatment needs to be | |||
evolved. This includes: | evolved. Most of this is not the responsibility of this memo. | |||
However, this memo could include: | ||||
1. How the application can prioritize MediaStreamTracks differently | 1. How can the application prioritize MediaStreamTracks | |||
in the API | differently in the API? | |||
2. How the browser or application determine availability of | 2. How MediaStreamTrack prioritization maps to the RTP level, and | |||
transport differentiation | what type of marking behaviour can occur on the RTP media stream | |||
and its datagram? | ||||
3. How to learn about any configuration information for transport | 13. Open Issues | |||
differentiation, such as DSCPs. | ||||
13. IANA Considerations | This section contains a summary of the open issues or to be done | |||
things noted in the document: | ||||
1. Need to add references to the RTP payload format for the Video | ||||
Codec chosen in Section 4.3. | ||||
2. The methods and solutions for RTP multiplexing over a single | ||||
transport is not yet finalized in Section 4.4. | ||||
3. RTP congestion control algorithms will probably require some | ||||
feedback information to be conveyed in RTCP. Are the tools that | ||||
are mandated by this memo sufficient, or do we need additional | ||||
information? | ||||
4. RTP congestion control could be implemented using either a | ||||
sender-based algorithm or a receiver-based algorithm. To ensure | ||||
interoperability, does this memo need to mandate which end is in | ||||
charge of congestion control for a path? | ||||
5. Still open if any RTCP XR performance metrics are needed, as | ||||
discussed in Section 8. | ||||
6. The API mapping to RTP level concepts has to be agreed and | ||||
documented in Section 11. | ||||
7. An open question if any requirements are needed to agree and | ||||
limit the number of simultaneously used media sources (SSRCs) | ||||
within an RTP session. See Section 12.2. | ||||
8. Is an API needed for expressing any application level media | ||||
mixing of an RTP media stream so that the correct CSRC list can | ||||
be set as discussed in Section 12.5? | ||||
9. The method for achieving simulcast of a media source has to be | ||||
decided as discussed in Section 12.8. | ||||
10. Possible documentation of what support for differentiated | ||||
treatment that are needed on RTP level as the API and the | ||||
network level specification matures as discussed in | ||||
Section 12.9. | ||||
11. Editing of Appendix A to remove redundancy between this and the | ||||
update of RTP Topologies | ||||
[I-D.westerlund-avtcore-rtp-topologies-update]. | ||||
14. IANA Considerations | ||||
This memo makes no request of IANA. | This memo makes no request of IANA. | |||
Note to RFC Editor: this section may be removed on publication as an | Note to RFC Editor: this section is to be removed on publication as | |||
RFC. | an RFC. | |||
14. Security Considerations | 15. Security Considerations | |||
RTP and its various extensions each have their own security | The security considerations for the WebRTC framework are described in | |||
considerations. These should be taken into account when considering | [I-D.ietf-rtcweb-security]. The overall security architecture for | |||
the security properties of the complete suite. We currently don't | WebRTC is described in [I-D.ietf-rtcweb-security-arch]. | |||
think this suite creates any additional security issues or | ||||
properties. The use of SRTP [RFC3711] will provide protection or | ||||
mitigation against most of the fundamental issues by offering | ||||
confidentiality, integrity and partial source authentication. A | ||||
mandatory to implement media security solution will be required to be | ||||
picked. We currently don't discuss the key-management aspect of SRTP | ||||
in this memo, that needs to be done taking the WebRTC communication | ||||
model into account. | ||||
Privacy concerns are under discussion and the generation of non- | The security considerations of the RTP specification, the RTP/SAVPF | |||
trackable CNAMEs are under discussion. | profile, and the various RTP/RTCP extensions and RTP payload formats | |||
that form the complete protocol suite described in this memo apply. | ||||
We do not believe there are any new security considerations resulting | ||||
from the combination of these various protocol extensions. | ||||
The guidelines in [RFC6562] apply when using variable bit rate (VBR) | The Extended Secure RTP Profile for Real-time Transport Control | |||
audio codecs, for example Opus or the Mixer audio level header | Protocol (RTCP)-Based Feedback [RFC5124] (RTP/SAVPF) provides | |||
extensions. | handling of fundamental issues by offering confidentiality, integrity | |||
and partial source authentication. A mandatory to implement media | ||||
security solution is (tbd). | ||||
Security considerations for the WebRTC work are discussed in | tbd: Privacy concerns, and the generation of untraceable CNAMEs, are | |||
[I-D.ietf-rtcweb-security]. | under discussion. | |||
15. Acknowledgements | The guidelines in [RFC6562] apply when using variable bit rate (VBR) | |||
audio codecs, e.g., Opus or the Mixer audio level header extensions. | ||||
16. Acknowledgements | ||||
The authors would like to thank Harald Alvestrand, Cary Bran, Charles | The authors would like to thank Harald Alvestrand, Cary Bran, Charles | |||
Eckel and Cullen Jennings for valuable feedback. | Eckel and Cullen Jennings for valuable feedback. | |||
16. References | 17. References | |||
16.1. Normative References | 17.1. Normative References | |||
[I-D.holmberg-mmusic-sdp-bundle-negotiation] | [I-D.holmberg-mmusic-sdp-bundle-negotiation] | |||
Holmberg, C. and H. Alvestrand, "Multiplexing Negotiation | Holmberg, C. and H. Alvestrand, "Multiplexing Negotiation | |||
Using Session Description Protocol (SDP) Port Numbers", | Using Session Description Protocol (SDP) Port Numbers", | |||
draft-holmberg-mmusic-sdp-bundle-negotiation-00 (work in | draft-holmberg-mmusic-sdp-bundle-negotiation-00 (work in | |||
progress), October 2011. | progress), October 2011. | |||
[I-D.ietf-avtcore-rtp-circuit-breakers] | ||||
Perkins, C. and V. Singh, "RTP Congestion Control: Circuit | ||||
Breakers for Unicast Sessions", | ||||
draft-ietf-avtcore-rtp-circuit-breakers-00 (work in | ||||
progress), October 2012. | ||||
[I-D.ietf-avtcore-srtp-encrypted-header-ext] | [I-D.ietf-avtcore-srtp-encrypted-header-ext] | |||
Lennox, J., "Encryption of Header Extensions in the Secure | Lennox, J., "Encryption of Header Extensions in the Secure | |||
Real-Time Transport Protocol (SRTP)", | Real-Time Transport Protocol (SRTP)", | |||
draft-ietf-avtcore-srtp-encrypted-header-ext-01 (work in | draft-ietf-avtcore-srtp-encrypted-header-ext-02 (work in | |||
progress), October 2011. | progress), July 2012. | |||
[I-D.ietf-avtext-multiple-clock-rates] | [I-D.ietf-avtext-multiple-clock-rates] | |||
Petit-Huguenin, M. and G. Zorn, "Support for Multiple | Petit-Huguenin, M. and G. Zorn, "Support for Multiple | |||
Clock Rates in an RTP Session", | Clock Rates in an RTP Session", | |||
draft-ietf-avtext-multiple-clock-rates-05 (work in | draft-ietf-avtext-multiple-clock-rates-06 (work in | |||
progress), May 2012. | progress), October 2012. | |||
[I-D.ietf-rtcweb-audio] | ||||
Valin, J. and C. Bran, "WebRTC Audio Codec and Processing | ||||
Requirements", draft-ietf-rtcweb-audio-00 (work in | ||||
progress), September 2012. | ||||
[I-D.ietf-rtcweb-overview] | [I-D.ietf-rtcweb-overview] | |||
Alvestrand, H., "Overview: Real Time Protocols for Brower- | Alvestrand, H., "Overview: Real Time Protocols for Brower- | |||
based Applications", draft-ietf-rtcweb-overview-04 (work | based Applications", draft-ietf-rtcweb-overview-04 (work | |||
in progress), June 2012. | in progress), June 2012. | |||
[I-D.ietf-rtcweb-security] | [I-D.ietf-rtcweb-security] | |||
Rescorla, E., "Security Considerations for RTC-Web", | Rescorla, E., "Security Considerations for RTC-Web", | |||
draft-ietf-rtcweb-security-03 (work in progress), | draft-ietf-rtcweb-security-03 (work in progress), | |||
June 2012. | June 2012. | |||
[I-D.ietf-rtcweb-security-arch] | ||||
Rescorla, E., "RTCWEB Security Architecture", | ||||
draft-ietf-rtcweb-security-arch-05 (work in progress), | ||||
October 2012. | ||||
[I-D.lennox-rtcweb-rtp-media-type-mux] | [I-D.lennox-rtcweb-rtp-media-type-mux] | |||
Rosenberg, J. and J. Lennox, "Multiplexing Multiple Media | Rosenberg, J. and J. Lennox, "Multiplexing Multiple Media | |||
Types In a Single Real-Time Transport Protocol (RTP) | Types In a Single Real-Time Transport Protocol (RTP) | |||
Session", draft-lennox-rtcweb-rtp-media-type-mux-00 (work | Session", draft-lennox-rtcweb-rtp-media-type-mux-00 (work | |||
in progress), October 2011. | in progress), October 2011. | |||
[I-D.perkins-avtcore-rtp-circuit-breakers] | [I-D.rescorla-avtcore-6222bis] | |||
Perkins, C. and V. Singh, "RTP Congestion Control: Circuit | Rescorla, E. and A. Begen, "Guidelines for Choosing RTP | |||
Breakers for Unicast Sessions", | Control Protocol (RTCP) Canonical Names (CNAMEs)", | |||
draft-perkins-avtcore-rtp-circuit-breakers-00 (work in | draft-rescorla-avtcore-6222bis-00 (work in progress), | |||
progress), March 2012. | October 2012. | |||
[I-D.terriberry-avp-codecs] | ||||
Terriberry, T., "Update to Recommended Codecs for the AVP | ||||
RTP Profile", draft-terriberry-avp-codecs-00 (work in | ||||
progress), August 2012. | ||||
[I-D.westerlund-avtcore-transport-multiplexing] | [I-D.westerlund-avtcore-transport-multiplexing] | |||
Westerlund, M. and C. Perkins, "Multiple RTP Sessions on a | Westerlund, M. and C. Perkins, "Multiple RTP Sessions on a | |||
Single Lower-Layer Transport", | Single Lower-Layer Transport", | |||
draft-westerlund-avtcore-transport-multiplexing-02 (work | draft-westerlund-avtcore-transport-multiplexing-04 (work | |||
in progress), March 2012. | in progress), October 2012. | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
[RFC2736] Handley, M. and C. Perkins, "Guidelines for Writers of RTP | [RFC2736] Handley, M. and C. Perkins, "Guidelines for Writers of RTP | |||
Payload Format Specifications", BCP 36, RFC 2736, | Payload Format Specifications", BCP 36, RFC 2736, | |||
December 1999. | December 1999. | |||
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. | [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. | |||
Jacobson, "RTP: A Transport Protocol for Real-Time | Jacobson, "RTP: A Transport Protocol for Real-Time | |||
skipping to change at page 34, line 22 ¶ | skipping to change at page 34, line 45 ¶ | |||
[RFC5761] Perkins, C. and M. Westerlund, "Multiplexing RTP Data and | [RFC5761] Perkins, C. and M. Westerlund, "Multiplexing RTP Data and | |||
Control Packets on a Single Port", RFC 5761, April 2010. | Control Packets on a Single Port", RFC 5761, April 2010. | |||
[RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer | [RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer | |||
Security (DTLS) Extension to Establish Keys for the Secure | Security (DTLS) Extension to Establish Keys for the Secure | |||
Real-time Transport Protocol (SRTP)", RFC 5764, May 2010. | Real-time Transport Protocol (SRTP)", RFC 5764, May 2010. | |||
[RFC6051] Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP | [RFC6051] Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP | |||
Flows", RFC 6051, November 2010. | Flows", RFC 6051, November 2010. | |||
[RFC6222] Begen, A., Perkins, C., and D. Wing, "Guidelines for | ||||
Choosing RTP Control Protocol (RTCP) Canonical Names | ||||
(CNAMEs)", RFC 6222, April 2011. | ||||
[RFC6464] Lennox, J., Ivov, E., and E. Marocco, "A Real-time | [RFC6464] Lennox, J., Ivov, E., and E. Marocco, "A Real-time | |||
Transport Protocol (RTP) Header Extension for Client-to- | Transport Protocol (RTP) Header Extension for Client-to- | |||
Mixer Audio Level Indication", RFC 6464, December 2011. | Mixer Audio Level Indication", RFC 6464, December 2011. | |||
[RFC6465] Ivov, E., Marocco, E., and J. Lennox, "A Real-time | [RFC6465] Ivov, E., Marocco, E., and J. Lennox, "A Real-time | |||
Transport Protocol (RTP) Header Extension for Mixer-to- | Transport Protocol (RTP) Header Extension for Mixer-to- | |||
Client Audio Level Indication", RFC 6465, December 2011. | Client Audio Level Indication", RFC 6465, December 2011. | |||
[RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of | [RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of | |||
Variable Bit Rate Audio with Secure RTP", RFC 6562, | Variable Bit Rate Audio with Secure RTP", RFC 6562, | |||
March 2012. | March 2012. | |||
16.2. Informative References | 17.2. Informative References | |||
[I-D.alvestrand-rtcweb-msid] | [I-D.alvestrand-rtcweb-msid] | |||
Alvestrand, H., "Cross Session Stream Identification in | Alvestrand, H., "Cross Session Stream Identification in | |||
the Session Description Protocol", | the Session Description Protocol", | |||
draft-alvestrand-rtcweb-msid-02 (work in progress), | draft-alvestrand-rtcweb-msid-02 (work in progress), | |||
May 2012. | May 2012. | |||
[I-D.ietf-avt-srtp-ekt] | [I-D.ietf-avt-srtp-ekt] | |||
Wing, D., McGrew, D., and K. Fischer, "Encrypted Key | Wing, D., McGrew, D., and K. Fischer, "Encrypted Key | |||
Transport for Secure RTP", draft-ietf-avt-srtp-ekt-03 | Transport for Secure RTP", draft-ietf-avt-srtp-ekt-03 | |||
(work in progress), October 2011. | (work in progress), October 2011. | |||
[I-D.ietf-rtcweb-qos] | ||||
Dhesikan, S., Druta, D., Jones, P., and J. Polk, "DSCP and | ||||
other packet markings for RTCWeb QoS", | ||||
draft-ietf-rtcweb-qos-00 (work in progress), October 2012. | ||||
[I-D.ietf-rtcweb-use-cases-and-requirements] | [I-D.ietf-rtcweb-use-cases-and-requirements] | |||
Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real- | Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real- | |||
Time Communication Use-cases and Requirements", | Time Communication Use-cases and Requirements", | |||
draft-ietf-rtcweb-use-cases-and-requirements-09 (work in | draft-ietf-rtcweb-use-cases-and-requirements-09 (work in | |||
progress), June 2012. | progress), June 2012. | |||
[I-D.jesup-rtp-congestion-reqs] | [I-D.jesup-rtp-congestion-reqs] | |||
Jesup, R. and H. Alvestrand, "Congestion Control | Jesup, R. and H. Alvestrand, "Congestion Control | |||
Requirements For Real Time Media", | Requirements For Real Time Media", | |||
draft-jesup-rtp-congestion-reqs-00 (work in progress), | draft-jesup-rtp-congestion-reqs-00 (work in progress), | |||
March 2012. | March 2012. | |||
[I-D.westerlund-avtcore-multiplex-architecture] | [I-D.westerlund-avtcore-multiplex-architecture] | |||
Westerlund, M., Burman, B., and C. Perkins, "RTP | Westerlund, M., Burman, B., Perkins, C., and H. | |||
Multiplexing Architecture", | Alvestrand, "Guidelines for using the Multiplexing | |||
draft-westerlund-avtcore-multiplex-architecture-01 (work | Features of RTP", | |||
in progress), March 2012. | draft-westerlund-avtcore-multiplex-architecture-02 (work | |||
in progress), July 2012. | ||||
[I-D.westerlund-avtcore-rtp-topologies-update] | ||||
Westerlund, M. and S. Wenger, "RTP Topologies", | ||||
draft-westerlund-avtcore-rtp-topologies-update-01 (work in | ||||
progress), October 2012. | ||||
[RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion | [RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion | |||
Control Protocol (DCCP) Congestion Control ID 2: TCP-like | Control Protocol (DCCP) Congestion Control ID 2: TCP-like | |||
Congestion Control", RFC 4341, March 2006. | Congestion Control", RFC 4341, March 2006. | |||
[RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for | [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for | |||
Datagram Congestion Control Protocol (DCCP) Congestion | Datagram Congestion Control Protocol (DCCP) Congestion | |||
Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, | Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, | |||
March 2006. | March 2006. | |||
[RFC4383] Baugher, M. and E. Carrara, "The Use of Timed Efficient | [RFC4383] Baugher, M. and E. Carrara, "The Use of Timed Efficient | |||
Stream Loss-Tolerant Authentication (TESLA) in the Secure | Stream Loss-Tolerant Authentication (TESLA) in the Secure | |||
Real-time Transport Protocol (SRTP)", RFC 4383, | Real-time Transport Protocol (SRTP)", RFC 4383, | |||
February 2006. | February 2006. | |||
[RFC4828] Floyd, S. and E. Kohler, "TCP Friendly Rate Control | [RFC4828] Floyd, S. and E. Kohler, "TCP Friendly Rate Control | |||
(TFRC): The Small-Packet (SP) Variant", RFC 4828, | (TFRC): The Small-Packet (SP) Variant", RFC 4828, | |||
April 2007. | April 2007. | |||
[RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, | ||||
January 2008. | ||||
[RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP | [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP | |||
Friendly Rate Control (TFRC): Protocol Specification", | Friendly Rate Control (TFRC): Protocol Specification", | |||
RFC 5348, September 2008. | RFC 5348, September 2008. | |||
[RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific | [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific | |||
Media Attributes in the Session Description Protocol | Media Attributes in the Session Description Protocol | |||
(SDP)", RFC 5576, June 2009. | (SDP)", RFC 5576, June 2009. | |||
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
Control", RFC 5681, September 2009. | Control", RFC 5681, September 2009. | |||
skipping to change at page 36, line 18 ¶ | skipping to change at page 36, line 44 ¶ | |||
[RFC6263] Marjou, X. and A. Sollaud, "Application Mechanism for | [RFC6263] Marjou, X. and A. Sollaud, "Application Mechanism for | |||
Keeping Alive the NAT Mappings Associated with RTP / RTP | Keeping Alive the NAT Mappings Associated with RTP / RTP | |||
Control Protocol (RTCP) Flows", RFC 6263, June 2011. | Control Protocol (RTCP) Flows", RFC 6263, June 2011. | |||
Appendix A. Supported RTP Topologies | Appendix A. Supported RTP Topologies | |||
RTP supports both unicast and group communication, with participants | RTP supports both unicast and group communication, with participants | |||
being connected using a wide range of transport-layer topologies. Some | being connected using a wide range of transport-layer topologies. Some | |||
of these topologies involve only the end-points, while others use RTP | of these topologies involve only the end-points, while others use RTP | |||
translators and mixers to provide in-network processing. Properties | translators and mixers to provide in-network processing. Properties | |||
of some RTP topologies are discussed in [RFC5117], and we further | of some RTP topologies are discussed in | |||
[I-D.westerlund-avtcore-rtp-topologies-update], and we further | ||||
describe those expected to be useful for WebRTC in the following. We | describe those expected to be useful for WebRTC in the following. We | |||
also go into important RTP session aspects that the topology or | also go into important RTP session aspects that the topology or | |||
implementation variant can place on a WebRTC end-point. | implementation variant can place on a WebRTC end-point. | |||
This section includes RTP topologies beyond the recommended ones. | This section includes RTP topologies beyond the RECOMMENDED ones. | |||
This in an attempt to highlight the differencies and the in many case | ||||
This in an attempt to highlight the differences and the in many case | ||||
small differences in implementation to support a larger set of | small differences in implementation to support a larger set of | |||
possible topologies. | possible topologies. | |||
(tbd: This section needs reworking and clearer relation to | ||||
[I-D.westerlund-avtcore-rtp-topologies-update].) | ||||
A.1. Point to Point | A.1. Point to Point | |||
The point-to-point RTP topology (Figure 3) is the simplest scenario | The point-to-point RTP topology (Figure 3) is the simplest scenario | |||
for WebRTC applications. This is going to be very common for user to | for WebRTC applications. This is going to be very common for user to | |||
user calls. | user calls. | |||
+---+ +---+ | +---+ +---+ | |||
| A |<------->| B | | | A |<------->| B | | |||
+---+ +---+ | +---+ +---+ | |||
skipping to change at page 36, line 51 ¶ | skipping to change at page 37, line 35 ¶ | |||
of details that are common for all RTP usage in the WebRTC context. | of details that are common for all RTP usage in the WebRTC context. | |||
First is the intention to multiplex RTP and RTCP over the same UDP- | First is the intention to multiplex RTP and RTCP over the same UDP- | |||
flow. Secondly is the question of using only a single RTP session or | flow. Secondly is the question of using only a single RTP session or | |||
one per media type for legacy interoperability. Thirdly is the | one per media type for legacy interoperability. Thirdly is the | |||
question of using multiple sender sources (SSRCs) per end-point. | question of using multiple sender sources (SSRCs) per end-point. | |||
Historically, RTP and RTCP have been run on separate UDP ports. With | Historically, RTP and RTCP have been run on separate UDP ports. With | |||
the increased use of Network Address/Port Translation (NAPT) this has | the increased use of Network Address/Port Translation (NAPT) this has | |||
become problematic, since maintaining multiple NAT bindings can be | become problematic, since maintaining multiple NAT bindings can be | |||
costly. It also complicates firewall administration, since multiple | costly. It also complicates firewall administration, since multiple | |||
ports must be opened to allow RTP traffic. To reduce these costs and | ports need to be opened to allow RTP traffic. To reduce these costs | |||
session setup times, support for multiplexing RTP data packets and | and session set-up times, support for multiplexing RTP data packets | |||
RTCP control packets on a single port [RFC5761] will be supported. | and RTCP control packets on a single port [RFC5761] will be | |||
supported. | ||||
In cases where there is only one type of media (e.g., a voice-only | In cases where there is only one type of media (e.g., a voice-only | |||
call) this topology will be implemented as a single RTP session, with | call) this topology will be implemented as a single RTP session, with | |||
bidirectional flows of RTP and RTCP packets, all then multiplexed | bidirectional flows of RTP and RTCP packets, all then multiplexed | |||
onto a single 5-tuple. If multiple types of media are to be used | onto a single 5-tuple. If multiple types of media are to be used | |||
(e.g., audio and video), then each type of media can be sent as a | (e.g., audio and video), then each type of media can be sent as a | |||
separate RTP session using a different 5-tuple, allowing for separate | separate RTP session using a different 5-tuple, allowing for separate | |||
transport level treatment of each type of media. Alternatively, all | transport level treatment of each type of media. Alternatively, all | |||
types of media can be multiplexed onto a single 5-tuple as a single | types of media can be multiplexed onto a single 5-tuple as a single | |||
RTP session, or as several RTP sessions if using a demultiplexing | RTP session, or as several RTP sessions if using a demultiplexing | |||
shim. Multiplexing different types of media onto a single 5-tuple | shim. Multiplexing different types of media onto a single 5-tuple | |||
places some limitations on how RTP is used, as described in "RTP | places some limitations on how RTP is used, as described in "RTP | |||
Multiplexing Architecture" | Multiplexing Architecture" | |||
[I-D.westerlund-avtcore-multiplex-architecture]. It is not expected | [I-D.westerlund-avtcore-multiplex-architecture]. It is not expected | |||
that these limitations will significantly affect the scenarios | that these limitations will significantly affect the scenarios | |||
targeted by WebRTC, but they may impact interoperability with legacy | targeted by WebRTC, but they can impact interoperability with legacy | |||
systems. | systems. | |||
An RTP session have good support for simultanously transport multiple | An RTP session have good support for simultaneously transport | |||
media sources. Each media source uses an unique SSRC identifier and | multiple media sources. Each media source uses an unique SSRC | |||
each SSRC has independent RTP sequence number and timestamp spaces. | identifier and each SSRC has independent RTP sequence number and | |||
This is being utilized in WebRTC for several cases. One is to enable | timestamp spaces. This is being utilized in WebRTC for several | |||
multiple media sources of the same type, an end-point that has two | cases. One is to enable multiple media sources of the same type, an | |||
video cameras can potentially transmitt video from both to its | end-point that has two video cameras can potentially transmit video | |||
peer(s). Another usage is when a single RTP session is being used | from both to its peer(s). Another usage is when a single RTP session | |||
for both multiple media types, thus an end-point can transmit both | is being used for both multiple media types, thus an end-point can | |||
audio and video to the peer(s). Thirdly to support multi-party cases | transmit both audio and video to the peer(s). Thirdly to support | |||
as will be discussed below support for multiple SSRC of the same | multi-party cases as will be discussed below support for multiple | |||
media type are required. | SSRC of the same media type is needed. | |||
Thus we can introduce a couple of different notiations in the below | Thus we can introduce a couple of different notations in the below | |||
two alternate figures of a single peer connection in a a point to | two alternate figures of a single peer connection in a point to point | |||
point setup. The first depicting a setup where the peer connection | set-up. The first depicting a setup where the peer connection | |||
established has two different RTP sessions, one for audio and one for | established has two different RTP sessions, one for audio and one for | |||
video. The second one using a single RTP session. In both cases A | video. The second one using a single RTP session. In both cases A | |||
has two video streams to send and one audio stream. B has only one | has two video streams to send and one audio stream. B has only one | |||
audio and video stream. These are used to illustrate the relation | audio and video stream. These are used to illustrate the relation | |||
between a peerConnection, the UDP flow(s), the RTP session(s) and the | between a peerConnection, the UDP flow(s), the RTP session(s) and the | |||
SSRCs that will be used in the later cases also. In the below | SSRCs that will be used in the later cases also. In the below | |||
figures RTCP flows are not included. They will flow bi-directionally | figures RTCP flows are not included. They will flow bi-directionally | |||
between any RTP session instances in the different nodes. | between any RTP session instances in the different nodes. | |||
+-A-------------+ +-B-------------+ | +-A-------------+ +-B-------------+ | |||
skipping to change at page 39, line 34 ¶ | skipping to change at page 40, line 34 ¶ | |||
Figure 5: Point to Point: Single RTP session. | Figure 5: Point to Point: Single RTP session. | |||
In Figure 5 there is only a single UDP flow and RTP session (RTP1). | In Figure 5 there is only a single UDP flow and RTP session (RTP1). | |||
This RTP session carries a total of five (5) RTP media streams | This RTP session carries a total of five (5) RTP media streams | |||
(SSRCs). From A to B there is Audio (AA1) and two video (AV1 and | (SSRCs). From A to B there is Audio (AA1) and two video (AV1 and | |||
AV2). From B to A there is Audio (BA1) and Video (BV1). | AV2). From B to A there is Audio (BA1) and Video (BV1). | |||
A.2. Multi-Unicast (Mesh) | A.2. Multi-Unicast (Mesh) | |||
For small multiparty calls, it is practical to set up a multi-unicast | For small multiparty calls, it is practical to set up a multi-unicast | |||
topology (Figure 6); unfortunately not discussed in the RTP | topology (Figure 6). In this topology, each participant sends | |||
Topologies RFC [RFC5117]. In this topology, each participant sends | ||||
individual unicast RTP/UDP/IP flows to each of the other participants | individual unicast RTP/UDP/IP flows to each of the other participants | |||
using independent PeerConnections in a full mesh. | using independent PeerConnections in a full mesh. | |||
+---+ +---+ | +---+ +---+ | |||
| A |<---->| B | | | A |<---->| B | | |||
+---+ +---+ | +---+ +---+ | |||
^ ^ | ^ ^ | |||
\ / | \ / | |||
\ / | \ / | |||
v v | v v | |||
skipping to change at page 40, line 19 ¶ | skipping to change at page 41, line 18 ¶ | |||
implemented as a single RTP session, spanning multiple peer-to-peer | implemented as a single RTP session, spanning multiple peer-to-peer | |||
transport layer connections, or as several pairwise RTP sessions, one | transport layer connections, or as several pairwise RTP sessions, one | |||
between each pair of peers. To maintain a coherent mapping between | between each pair of peers. To maintain a coherent mapping between | |||
the relation between RTP sessions and PeerConnections we recommend | the relation between RTP sessions and PeerConnections we recommend | |||
that one implements this as individual RTP sessions. The only | that one implements this as individual RTP sessions. The only | |||
downside is that end-point A will not learn of the quality of any | downside is that end-point A will not learn of the quality of any | |||
transmission happening between B and C based on RTCP. This has not | transmission happening between B and C based on RTCP. This has not | |||
been seen as a significant downside as no one has yet seen a need | been seen as a significant downside as no one has yet seen a need | |||
for why A would need to know about the B's and C's communication. An | for why A would need to know about the B's and C's communication. An | |||
advantage of using separate RTP sessions is that it enables using | advantage of using separate RTP sessions is that it enables using | |||
different media bit-rates to the differnt peers, thus not forcing B | different media bit-rates to the different peers, thus not forcing B | |||
to endure the same quality reductions if there are limiations in the | to endure the same quality reductions if there are limitations in the | |||
transport from A to C as C will. | transport from A to C as C will. | |||
+-A------------------------+ +-B-------------+ | +-A------------------------+ +-B-------------+ | |||
|+---+ +-PeerC1------| |-PeerC1------+ | | |+---+ +-PeerC1------| |-PeerC1------+ | | |||
||MIC| | +-UDP1------| |-UDP1------+ | | | ||MIC| | +-UDP1------| |-UDP1------+ | | | |||
|+---+ | | +-RTP1----| |-RTP1----+ | | | | |+---+ | | +-RTP1----| |-RTP1----+ | | | | |||
| | +----+ | | | +-Audio-| |-Audio-+ | | | | | | | +----+ | | | +-Audio-| |-Audio-+ | | | | | |||
| +->|ENC1|--+-+-+-+--->AA1|------------->| | | | | | | | +->|ENC1|--+-+-+-+--->AA1|------------->| | | | | | | |||
| | +----+ | | | | |<-------------|BA1 | | | | | | | | +----+ | | | | |<-------------|BA1 | | | | | | |||
| | | | | +-------| |-------+ | | | | | | | | | | +-------| |-------+ | | | | | |||
skipping to change at page 41, line 5 ¶ | skipping to change at page 42, line 4 ¶ | |||
| | | | +-------| |-------+ | | | | | | | | | +-------| |-------+ | | | | | |||
| | | +---------| |---------+ | | | | | | | +---------| |---------+ | | | | |||
| | +-----------| |-----------+ | | | | | +-----------| |-----------+ | | | |||
| +-------------| |-------------+ | | | +-------------| |-------------+ | | |||
+--------------------------+ +---------------+ | +--------------------------+ +---------------+ | |||
Figure 7: Session structure for Multi-Unicast Setup | Figure 7: Session structure for Multi-Unicast Setup | |||
Let's review how the RTP sessions look from A's perspective by | Let's review how the RTP sessions look from A's perspective by | |||
considering both how the media is handled and what PeerConnections | considering both how the media is handled and what PeerConnections | |||
and RTP sessions that are setup in Figure 7. A's microphone is | and RTP sessions that are set-up in Figure 7. A's microphone is | |||
captured and the digital audio can then be fed into two different | captured and the digital audio can then be fed into two different | |||
encoder instances each being associated with two different | encoder instances each being associated with two different | |||
PeerConnections (PeerC1 and PeerC2) each containing independent RTP | PeerConnections (PeerC1 and PeerC2) each containing independent RTP | |||
sessions (RTP1 and RTP2). The SSRCs in each RTP session will be | sessions (RTP1 and RTP2). The SSRCs in each RTP session will be | |||
completely independent and the media bit-rate produced by the encoder | completely independent and the media bit-rate produced by the encoder | |||
can also be tuned to address any congestion control requirements | can also be tuned to address any congestion control requirements | |||
between A and B differently than for the path A to C. | between A and B differently than for the path A to C. | |||
For media encodings which are more resource consuming, like video, | For media encodings which are more resource consuming, like video, | |||
one could expect that it will be common that end-points that are | one could expect that it will be common that end-points that are | |||
resource costrained will use a different implementation strategy | resource constrained will use a different implementation strategy | |||
where the encoder is shared between the different PeerConnections as | where the encoder is shared between the different PeerConnections as | |||
shown below in Figure 8. | shown below in Figure 8. | |||
+-A----------------------+ +-B-------------+ | +-A----------------------+ +-B-------------+ | |||
|+---+ | | | | |+---+ | | | | |||
||CAM| +-PeerC1------| |-PeerC1------+ | | ||CAM| +-PeerC1------| |-PeerC1------+ | | |||
|+---+ | +-UDP1------| |-UDP1------+ | | | |+---+ | +-UDP1------| |-UDP1------+ | | | |||
| | | | +-RTP1----| |-RTP1----+ | | | | | | | | +-RTP1----| |-RTP1----+ | | | | |||
| V | | | +-Video-| |-Video-+ | | | | | | V | | | +-Video-| |-Video-+ | | | | | |||
|+----+ | | | | |<----------------|BV1 | | | | | | |+----+ | | | | |<----------------|BV1 | | | | | | |||
||ENC |----+-+-+-+--->AV1|---------------->| | | | | | | ||ENC |----+-+-+-+--->AV1|---------------->| | | | | | | |||
skipping to change at page 41, line 51 ¶ | skipping to change at page 42, line 50 ¶ | |||
| | | +---------| |---------+ | | | | | | | +---------| |---------+ | | | | |||
| | +-----------| |-----------+ | | | | | +-----------| |-----------+ | | | |||
| +-------------| |-------------+ | | | +-------------| |-------------+ | | |||
+------------------------+ +---------------+ | +------------------------+ +---------------+ | |||
Figure 8: Single Encoder Multi-Unicast Setup | Figure 8: Single Encoder Multi-Unicast Setup | |||
This will clearly save resources consumed by encoding but does | This will clearly save resources consumed by encoding but does | |||
introduce the need for the end-point A to make decisions on how it | introduce the need for the end-point A to make decisions on how it | |||
encodes the media so it suites delivery to both B and C. This is not | encodes the media so it suites delivery to both B and C. This is not | |||
limited to congestion control, also prefered resolution to receive | limited to congestion control, also preferred resolution to receive | |||
based on dispaly area available is another aspect requiring | based on dispaly area available is another aspect requiring | |||
consideration. The need for this type of descion logic does arise in | consideration. The need for this type of decision logic does arise | |||
several different topologies and implementation. | in several different topologies and implementation. | |||
A.3. Mixer Based | A.3. Mixer Based | |||
An mixer (Figure 9) is a centralised point that selects or mixes | An mixer (Figure 9) is a centralised point that selects or mixes | |||
content in a conference to optimise the RTP session so that each end- | content in a conference to optimise the RTP session so that each end- | |||
point only needs connect to one entity, the mixer. The mixer can | point only needs connect to one entity, the mixer. The mixer can | |||
also reduce the bit-rate needed from the mixer down to a conference | also reduce the bit-rate needed from the mixer down to a conference | |||
participants as the media sent from the mixer to the end-point can be | participants as the media sent from the mixer to the end-point can be | |||
optimised in different ways. These optimisations include methods | optimised in different ways. These optimisations include methods | |||
like only choosing media from the currently most active speaker or | like only choosing media from the currently most active speaker or | |||
mixing together audio so that only one audio stream is required in | mixing together audio so that only one audio stream is needed instead | |||
stead of 3 in the depicted scenario (Figure 9). | of 3 in the depicted scenario (Figure 9). | |||
+---+ +------------+ +---+ | +---+ +------------+ +---+ | |||
| A |<---->| |<---->| B | | | A |<---->| |<---->| B | | |||
+---+ | | +---+ | +---+ | | +---+ | |||
| Mixer | | | Mixer | | |||
+---+ | | +---+ | +---+ | | +---+ | |||
| C |<---->| |<---->| D | | | C |<---->| |<---->| D | | |||
+---+ +------------+ +---+ | +---+ +------------+ +---+ | |||
Figure 9: RTP Mixer with Only Unicast Paths | Figure 9: RTP Mixer with Only Unicast Paths | |||
Mixers has two downsides, the first is that the mixer must be a | Mixers have two downsides, the first is that the mixer has to be a | |||
trusted node as they either performs media operations or at least | trusted node as they either performs media operations or at least re- | |||
repacketize the media. Both type of operations requires when using | packetize the media. Both type of operations requires when using | |||
SRTP that the mixer verifies integrity, decrypts the content, perform | SRTP that the mixer verifies integrity, decrypts the content, perform | |||
its operation and form new RTP packets, encrypts and integegrity | its operation and form new RTP packets, encrypts and integrity | |||
protect them. This applies to all types of mixers described below. | protect them. This applies to all types of mixers described below. | |||
The second downside is that all these operations and optimization of | The second downside is that all these operations and optimization of | |||
the session requires processing. How much depends on the | the session requires processing. How much depends on the | |||
implementation as will become evident below. | implementation as will become evident below. | |||
The implementation of an mixer can take several different forms and | The implementation of an mixer can take several different forms and | |||
we will discuss the main themes available that doesn't break RTP. | we will discuss the main themes available that doesn't break RTP. | |||
Please note that a Mixer could also contain translator | Please note that a Mixer could also contain translator | |||
functionalities, like a media transcoder to adjust the media bit-rate | functionalities, like a media transcoder to adjust the media bit-rate | |||
or codec used on a particular RTP media stream. | or codec used on a particular RTP media stream. | |||
A.3.1. Media Mixing | A.3.1. Media Mixing | |||
This type of mixer is one which clearly can be called RTP mixer is | This type of mixer is one which clearly can be called RTP mixer is | |||
likely the one that most thinks of when they hear the term mixer. | likely the one that most thinks of when they hear the term mixer. | |||
Its basic patter of operation is that it will receive the different | Its basic patter of operation is that it will receive the different | |||
participants RTP media stream. Select which that are to be included | participants RTP media stream. Select which that are to be included | |||
in a media domain mix of the incomming RTP media streams. Then | in a media domain mix of the incoming RTP media streams. Then create | |||
create a single outgoing stream from this mix. | a single outgoing stream from this mix. | |||
Audio mixing is straight forward and commonly possible to do for a | Audio mixing is straight forward and commonly possible to do for a | |||
number of participants. Lets assume that you want to mix N number of | number of participants. Lets assume that you want to mix N number of | |||
streams from different participants. Then the mixer need to perform | streams from different participants. Then the mixer need to perform | |||
N decodings. Then it needs to produce N or N+1 mixes, the reasons | decoding N times. Then it needs to produce N or N+1 mixes, the | |||
that different mixes are needed are so that each contributing source | reasons that different mixes are needed are so that each contributing | |||
get a mix which don't contain themselves, as this would result in an | source get a mix which don't contain themselves, as this would result | |||
echo. When N is lower than the number of all participants one may | in an echo. When N is lower than the number of all participants one | |||
produce a Mix of all N streams for the group that are curently not | can produce a Mix of all N streams for the group that are curently | |||
included in the mix, thus N+1 mixes. These audio streams are then | not included in the mix, thus N+1 mixes. These audio streams are | |||
encoded again, RTP packetized and sent out. | then encoded again, RTP packetized and sent out. | |||
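As a rough illustration of the "N or N+1 mixes" described above, the following Python sketch (not part of either draft revision) builds one mix per contributing source that excludes that source's own audio, plus one full mix for participants that are not currently in the mix. The names `mix_minus` and `clip16`, the `"*"` key for the full mix, and the use of already-decoded 16-bit integer samples are assumptions made for illustration; decoding, re-encoding and RTP packetization are left out.

```python
from typing import Dict, List


def clip16(x: int) -> int:
    """Clamp a summed sample to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def mix_minus(decoded: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return N "mix-minus" streams plus one full mix (key "*").

    `decoded` maps a participant name to its decoded samples for one frame.
    Each contributor gets the sum of all *other* contributors, so it does
    not hear an echo of itself; everyone else gets the full mix.
    """
    if not decoded:
        return {"*": []}
    n = min(len(samples) for samples in decoded.values())
    # Sum all N sources once, then subtract each source for its own mix.
    total = [sum(samples[i] for samples in decoded.values()) for i in range(n)]
    mixes = {
        name: [clip16(total[i] - samples[i]) for i in range(n)]
        for name, samples in decoded.items()
    }
    mixes["*"] = [clip16(t) for t in total]  # the "+1" mix for non-contributors
    return mixes
```

Summing once and subtracting per source keeps the cost at N decodings and N+1 encodings, which matches the counts given in the text.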
Video can't really be "mixed" and produce something particular useful | Video can't really be "mixed" and produce something particular useful | |||
for the users, however creating an composition out of the contributed | for the users, however creating an composition out of the contributed | |||
video streams can be done. In fact it can be done in a number of | video streams can be done. In fact it can be done in a number of | |||
ways, tiling the different streams creating a chessboard, selecting | ways, tiling the different streams creating a chessboard, selecting | |||
someone as more important and showing them large and a number of | someone as more important and showing them large and a number of | |||
other sources as smaller is another. Also here one commonly need to | other sources as smaller is another. Also here one commonly need to | |||
produce a number of different compositions so that the contributing | produce a number of different compositions so that the contributing | |||
part doesn't need to see themselves. Then the mixer re-encodes the | part doesn't need to see themselves. Then the mixer re-encodes the | |||
created video stream, RTP packetize it and send it out | created video stream, RTP packetize it and send it out | |||
skipping to change at page 44, line 51 ¶ | skipping to change at page 45, line 51 ¶ | |||
| | +-----------| |-----------+ | | +---+ | | | | | | +-----------| |-----------+ | | +---+ | | | | |||
| +-------------| |-------------+ | +-----+ | | | +-------------| |-------------+ | +-----+ | | |||
+---------------+ |---------------+ | | +---------------+ |---------------+ | | |||
+--------------------------------+ | +--------------------------------+ | |||
Figure 10: Session and SSRC details for Media Mixer | Figure 10: Session and SSRC details for Media Mixer | |||
From an RTP perspective media mixing can be very straight forward as | From an RTP perspective media mixing can be very straight forward as | |||
can be seen in Figure 10. The mixer present one SSRC towards the | can be seen in Figure 10. The mixer present one SSRC towards the | |||
peer client, e.g. MA1 to Peer A, which is the media mix of the other | peer client, e.g. MA1 to Peer A, which is the media mix of the other | |||
particpants. As each peer receives a different version produced by | participants. As each peer receives a different version produced by | |||
the mixer there are no actual relation between the different RTP | the mixer there are no actual relation between the different RTP | |||
sessions in the actual media or the transport level information. | sessions in the actual media or the transport level information. | |||
There is however one connection between RTP1-RTP3 in this figure. It | There is however one connection between RTP1-RTP3 in this figure. It | |||
has to do with the SSRC space and the identity information. When A | has to do with the SSRC space and the identity information. When A | |||
receives the MA1 stream which is a combination of BA1 and CA1 streams | receives the MA1 stream which is a combination of BA1 and CA1 streams | |||
in the other PeerConnections RTP could enable the mixer to include | in the other PeerConnections RTP could enable the mixer to include | |||
CSRC information in the MA1 stream to identify the contributing | CSRC information in the MA1 stream to identify the contributing | |||
source BA1 and CA1. | source BA1 and CA1. | |||
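To make the CSRC mechanism concrete, here is a minimal Python sketch (an illustration, not text from the draft) that packs an RTP fixed header followed by a CSRC list, using the header layout of RFC 3550. The function name `build_rtp_header` and the example SSRC values are hypothetical; padding and header extensions are not handled.

```python
import struct
from typing import Sequence


def build_rtp_header(payload_type: int, seq: int, timestamp: int,
                     ssrc: int, csrcs: Sequence[int],
                     marker: bool = False) -> bytes:
    """Pack a 12-byte RTP fixed header plus 0..15 CSRC identifiers."""
    if len(csrcs) > 15:
        raise ValueError("the CC field is 4 bits, so at most 15 CSRCs fit")
    byte0 = (2 << 6) | len(csrcs)            # V=2, P=0, X=0, CC=len(csrcs)
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    fixed = struct.pack("!BBHII", byte0, byte1,
                        seq & 0xFFFF, timestamp & 0xFFFFFFFF,
                        ssrc & 0xFFFFFFFF)
    # Each contributing source is carried as a 32-bit identifier.
    return fixed + b"".join(struct.pack("!I", c & 0xFFFFFFFF) for c in csrcs)
```

In the scenario above, the mixer would call this with its own SSRC (MA1) in the `ssrc` argument and the identifiers standing for BA1 and CA1 in `csrcs`.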
The CSRC has in its turn utility in RTP extensions, like the in | The CSRC has in its turn utility in RTP extensions, like the in | |||
skipping to change at page 45, line 30 ¶ | skipping to change at page 46, line 30 ¶ | |||
need to be exposed. The main goal would be to enable the correct | need to be exposed. The main goal would be to enable the correct | |||
binding against the application logic and other information sources. | binding against the application logic and other information sources. | |||
This also enables loop detection in the RTP session. | This also enables loop detection in the RTP session. | |||
A.3.1.1. RTP Session Termination | A.3.1.1. RTP Session Termination | |||
There exist an possible implementation choice to have the RTP | There exist an possible implementation choice to have the RTP | |||
sessions being separated between the different legs in the multi- | sessions being separated between the different legs in the multi- | |||
party communication session and only generate RTP media streams in | party communication session and only generate RTP media streams in | |||
each without carrying on RTP/RTCP level any identity information | each without carrying on RTP/RTCP level any identity information | |||
about the contributing sources. This removes both the functionaltiy | about the contributing sources. This removes both the functionality | |||
that CSRC can provide and the possibility to use any extensions that | that CSRC can provide and the possibility to use any extensions that | |||
build on CSRC and the loop detection. It may appear a simplification | build on CSRC and the loop detection. It might appear a | |||
if SSRC collision would occur between two different end-points as | simplification if SSRC collision would occur between two different | |||
they can be avoide to be resolved and instead remapped between the | end-points as they can be avoided to be resolved and instead remapped | |||
independent sessions if at all exposed. However, SSRC/CSRC remapping | between the independent sessions if at all exposed. However, SSRC/ | |||
requiresthat SSRC/CSRC are never exposed to the WebRTC javascript | CSRC remapping requires that SSRC/CSRC are never exposed to the | |||
client to use as reference. This as they only have local importance | WebRTC JavaScript client to use as reference. This as they only have | |||
if they are used on a multi-party session scope the result would be | local importance if they are used on a multi-party session scope the | |||
missreferencing. Also SSRC collision handling will still be needed | result would be mis-referencing. Also SSRC collision handling will | |||
as it may occur between the mixer and the end-point. | still be needed as it can occur between the mixer and the end-point. | |||
Session termination may appear to resolve some issues, it however | Session termination might appear to resolve some issues, it however | |||
creates other issues that needs resolving, like loop detection, | creates other issues that needs resolving, like loop detection, | |||
identification of contributing sources and the need to handle mapped | identification of contributing sources and the need to handle mapped | |||
identities and ensure that the right one is used towards the right | identities and ensure that the right one is used towards the right | |||
identities and never used directly between multiple end-points. | identities and never used directly between multiple end-points. | |||
A.3.2. Media Switching | A.3.2. Media Switching | |||
An RTP Mixer based on media switching avoids the media decoding and | An RTP Mixer based on media switching avoids the media decoding and | |||
encoding cycle in the mixer, but not the decryption and re-encryption | encoding cycle in the mixer, but not the decryption and re-encryption | |||
cycle as one rewrites RTP headers. This both reduces the amount of | cycle as one rewrites RTP headers. This both reduces the amount of | |||
computational resources needed in the mixer and increases the media | computational resources needed in the mixer and increases the media | |||
quality per transmitted bit. This is achieve by letting the mixer | quality per transmitted bit. This is achieve by letting the mixer | |||
have a number of SSRCs that represents conceptual or functional | have a number of SSRCs that represents conceptual or functional | |||
streams the mixer produces. These streams are created by selecting | streams the mixer produces. These streams are created by selecting | |||
media from one of the by the mixer received RTP media streams and | media from one of the by the mixer received RTP media streams and | |||
forward the media using the mixers own SSRCs. The mixer can then | forward the media using the mixers own SSRCs. The mixer can then | |||
switch between available sources if that is required by the concept | switch between available sources if that is needed by the concept for | |||
for the source, like currently active speaker. | the source, like currently active speaker. | |||
To achieve a coherent RTP media stream from the mixer's SSRC the | To achieve a coherent RTP media stream from the mixer's SSRC the | |||
mixer is forced to rewrite the incoming RTP packet's header. First | mixer is forced to rewrite the incoming RTP packet's header. First | |||
the SSRC field must be set to the value of the Mixer's SSRC. | the SSRC field has to be set to the value of the Mixer's SSRC. | |||
Secondly, the sequence number must be the next in the sequence of | Secondly, the sequence number is set to the next in the sequence of | |||
outgoing packets it sent. Thirdly the RTP timestamp value needs to | outgoing packets it sent. Thirdly the RTP timestamp value needs to | |||
be adjusted using an offset that changes each time one switch media | be adjusted using an offset that changes each time one switch media | |||
source. Finally depending on the negotiation the RTP payload type | source. Finally depending on the negotiation the RTP payload type | |||
value representing this particular RTP payload configuration may have | value representing this particular RTP payload configuration might | |||
to be changed if the different PeerConnections have not arrived on | have to be changed if the different PeerConnections have not arrived | |||
the same numbering for a given configuration. This also requires | on the same numbering for a given configuration. This also requires | |||
that the different end-points do support a common set of codecs, | that the different end-points do support a common set of codecs, | |||
otherwise media transcoding for codec compatibility is still | otherwise media transcoding for codec compatibility is still needed. | |||
required. | ||||
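The four rewriting steps listed above can be sketched as follows. This is a hedged Python illustration, not an implementation taken from the draft: `RtpHeader`, `SwitchedStream`, `pt_map` and `switch_gap` are hypothetical names, and the fixed `switch_gap` (timestamp ticks added at each source switch) stands in for whatever clock-based adjustment a real mixer would use. Packet reordering around a switch is not handled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RtpHeader:
    ssrc: int
    seq: int           # 16-bit sequence number
    timestamp: int     # 32-bit RTP timestamp
    payload_type: int


class SwitchedStream:
    """One outgoing mixer SSRC that forwards packets from a selected source."""

    def __init__(self, mixer_ssrc: int, first_seq: int = 0,
                 first_ts: int = 0, switch_gap: int = 0) -> None:
        self.mixer_ssrc = mixer_ssrc
        self.next_seq = first_seq & 0xFFFF
        self.last_ts = first_ts & 0xFFFFFFFF
        self.switch_gap = switch_gap
        self.source: Optional[int] = None   # SSRC currently being forwarded
        self.ts_offset = 0

    def rewrite(self, hdr: RtpHeader, pt_map: Dict[int, int]) -> RtpHeader:
        if hdr.ssrc != self.source:
            # Source switch: pick a new timestamp offset so the outgoing
            # timeline continues from the last timestamp that was sent.
            gap = 0 if self.source is None else self.switch_gap
            self.ts_offset = (self.last_ts + gap - hdr.timestamp) & 0xFFFFFFFF
            self.source = hdr.ssrc
        out = RtpHeader(
            ssrc=self.mixer_ssrc,                                # step 1
            seq=self.next_seq,                                   # step 2
            timestamp=(hdr.timestamp + self.ts_offset) & 0xFFFFFFFF,  # step 3
            payload_type=pt_map.get(hdr.payload_type, hdr.payload_type),  # step 4
        )
        self.next_seq = (self.next_seq + 1) & 0xFFFF
        self.last_ts = out.timestamp
        return out
```

The payload type map would be built per PeerConnection from the negotiated configurations; unmapped values are passed through unchanged in this sketch.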
Lets consider the operation of media switching mixer that supports a | Lets consider the operation of media switching mixer that supports a | |||
video conference with six participants (A-F) where the two latest | video conference with six participants (A-F) where the two latest | |||
speakers in the conference are shown to each participants. Thus the | speakers in the conference are shown to each participants. Thus the | |||
mixer has two SSRCs sending video to each peer. | mixer has two SSRCs sending video to each peer. | |||
+-A-------------+ +-MIXER--------------------------+ | +-A-------------+ +-MIXER--------------------------+ | |||
| +-PeerC1------| |-PeerC1--------+ | | | +-PeerC1------| |-PeerC1--------+ | | |||
| | +-UDP1------| |-UDP1--------+ | | | | | +-UDP1------| |-UDP1--------+ | | | |||
| | | +-RTP1----| |-RTP1------+ | | +-----+ | | | | | +-RTP1----| |-RTP1------+ | | +-----+ | | |||
skipping to change at page 48, line 16 ¶ | skipping to change at page 49, line 16 ¶ | |||
To ensure that a media receiver can correctly decode the RTP media | To ensure that a media receiver can correctly decode the RTP media | |||
stream after a switch, it becomes necessary to ensure for state | stream after a switch, it becomes necessary to ensure for state | |||
saving codecs that they start from default state at the point of | saving codecs that they start from default state at the point of | |||
switching. Thus one common tool for video is to request that the | switching. Thus one common tool for video is to request that the | |||
encoding creates an intra picture, something that isn't dependent on | encoding creates an intra picture, something that isn't dependent on | |||
earlier state. This can be done using Full Intra Request RTCP codec | earlier state. This can be done using Full Intra Request RTCP codec | |||
control message as discussed in Section 5.1.1. | control message as discussed in Section 5.1.1. | |||
Also in this type of mixer one could consider to terminate the RTP | Also in this type of mixer one could consider to terminate the RTP | |||
sessions fully between the different PeerConnection. The same | sessions fully between the different PeerConnection. The same | |||
arguments and conisderations as discussed in Appendix A.3.1.1 applies | arguments and considerations as discussed in Appendix A.3.1.1 applies | |||
here. | here. | |||
A.3.3. Media Projecting | A.3.3. Media Projecting | |||
Another method for handling media in the RTP mixer is to project all | Another method for handling media in the RTP mixer is to project all | |||
potential sources (SSRCs) into a per end-point independent RTP | potential sources (SSRCs) into a per end-point independent RTP | |||
session. The mixer can then select which of the potential sources | session. The mixer can then select which of the potential sources | |||
that are currently actively transmitting media, despite that the | that are currently actively transmitting media, despite that the | |||
mixer in another RTP session recieves media from that end-point. | mixer in another RTP session receives media from that end-point. | |||
This is similar to the media switching Mixer but have some important | This is similar to the media switching Mixer but have some important | |||
differences in RTP details. | differences in RTP details. | |||
+-A-------------+ +-MIXER--------------------------+ | +-A-------------+ +-MIXER--------------------------+ | |||
| +-PeerC1------| |-PeerC1--------+ | | | +-PeerC1------| |-PeerC1--------+ | | |||
| | +-UDP1------| |-UDP1--------+ | | | | | +-UDP1------| |-UDP1--------+ | | | |||
| | | +-RTP1----| |-RTP1------+ | | +-----+ | | | | | +-RTP1----| |-RTP1------+ | | +-----+ | | |||
| | | | +-Video-| |-Video---+ | | | | | | | | | | | +-Video-| |-Video---+ | | | | | | | |||
| | | | | AV1|------------>|---------+-+-+-+------->| | | | | | | | | AV1|------------>|---------+-+-+-+------->| | | | |||
| | | | | |<------------|BV1 <----+-+-+-+--------| | | | | | | | | |<------------|BV1 <----+-+-+-+--------| | | | |||
skipping to change at page 50, line 11 ¶ | skipping to change at page 51, line 11 ¶ | |||
| | +-----------| |-------------+ | +-----+ | | | | +-----------| |-------------+ | +-----+ | | |||
| +-------------| |---------------+ | | | +-------------| |---------------+ | | |||
+---------------+ +--------------------------------+ | +---------------+ +--------------------------------+ | |||
Figure 12: Media Projecting Mixer | Figure 12: Media Projecting Mixer | |||
So in this six participant conference depicted above in (Figure 12) | So in this six participant conference depicted above in (Figure 12) | |||
one can see that end-point A will in this case be aware of 5 incoming | one can see that end-point A will in this case be aware of 5 incoming | |||
SSRCs, BV1-FV1. If this mixer intend to have the same behavior as in | SSRCs, BV1-FV1. If this mixer intend to have the same behavior as in | |||
Appendix A.3.2 where the mixer provides the end-points with the two | Appendix A.3.2 where the mixer provides the end-points with the two | |||
latest speaking end-points, then only two out of these five SSRCs | latest speaking end-points, then only two out of these five SSRCs | |||
will concurrently transmitt media to A. As the mixer selects which | will concurrently transmit media to A. As the mixer selects which | |||
source in the different RTP sessions that transmit media to the end- | source in the different RTP sessions that transmit media to the end- | |||
points each RTP media stream will require some rewriting when being | points each RTP media stream will require some rewriting when being | |||
projected from one session into another. The main thing is that the | projected from one session into another. The main thing is that the | |||
sequence number will need to be consequitvely incremented based on | sequence number will need to be consecutively incremented based on | |||
the packet actually being transmitted in each RTP session. Thus the | the packet actually being transmitted in each RTP session. Thus the | |||
RTP sequence number offset will change each time a source is turned | RTP sequence number offset will change each time a source is turned | |||
on in RTP session. | on in RTP session. | |||
As the RTP sessions are independent the SSRC numbers used can be | As the RTP sessions are independent the SSRC numbers used can be | |||
handled indepdentently also thus working around any SSRC collisions | handled independently also thus working around any SSRC collisions by | |||
by having remapping tables between the RTP sessions. However the | having remapping tables between the RTP sessions. However the | |||
related WebRTC MediaStream signalling must be correspondlingly | related WebRTC MediaStream signalling need to be correspondingly | |||
changed to ensure consistent WebRTC MediaStream to SSRC mappings | changed to ensure consistent WebRTC MediaStream to SSRC mappings | |||
between the different PeerConnections and the same comment that | between the different PeerConnections and the same comment that | |||
higher functions must not use SSRC as references to RTP media streams | higher functions MUST NOT use SSRC as references to RTP media streams | |||
applies also here. | applies also here. | |||
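The per-session SSRC remapping table and the changing sequence number offset can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than anything specified by the draft: the class names `ProjectedSource` and `ProjectingSession`, the `turn_on`/`turn_off` methods and the example SSRC values are all hypothetical, and packets reordered across a turn-on point are not handled.

```python
from typing import Dict, Optional, Tuple


class ProjectedSource:
    """A source from another RTP session as projected into this session."""

    def __init__(self, local_ssrc: int, first_seq: int = 0) -> None:
        self.local_ssrc = local_ssrc
        self.next_seq = first_seq & 0xFFFF
        self.active = False
        self.offset: Optional[int] = None   # recomputed at every turn-on

    def turn_on(self) -> None:
        self.active = True
        self.offset = None

    def turn_off(self) -> None:
        self.active = False

    def project(self, in_seq: int) -> Optional[Tuple[int, int]]:
        """Return (local SSRC, outgoing seq), or None if not forwarded."""
        if not self.active:
            return None
        if self.offset is None:
            # New offset so numbering continues where this session left off.
            self.offset = (self.next_seq - in_seq) & 0xFFFF
        out_seq = (in_seq + self.offset) & 0xFFFF
        if ((out_seq - self.next_seq) & 0xFFFF) < 0x8000:
            self.next_seq = (out_seq + 1) & 0xFFFF
        return (self.local_ssrc, out_seq)


class ProjectingSession:
    """Remapping table for one end-point's independent RTP session."""

    def __init__(self) -> None:
        self.table: Dict[int, ProjectedSource] = {}

    def add_source(self, origin_ssrc: int, local_ssrc: int,
                   first_seq: int = 0) -> ProjectedSource:
        if any(s.local_ssrc == local_ssrc for s in self.table.values()):
            raise ValueError("local SSRC already in use in this session")
        self.table[origin_ssrc] = ProjectedSource(local_ssrc, first_seq)
        return self.table[origin_ssrc]

    def forward(self, origin_ssrc: int, in_seq: int) -> Optional[Tuple[int, int]]:
        src = self.table.get(origin_ssrc)
        return src.project(in_seq) if src is not None else None
```

Because the table is private to each session, the SSRC values it hands out have only local meaning, which is why the text warns against higher layers using SSRCs as references.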
The mixer will also be responsible to act on any RTCP codec control | The mixer will also be responsible to act on any RTCP codec control | |||
requests comming from an end-point and decide if it can act on it | requests coming from an end-point and decide if it can act on it | |||
locally or needs to translate the request into the RTP session that | locally or needs to translate the request into the RTP session that | |||
contains the media source. Both end-points and the mixer will need | contains the media source. Both end-points and the mixer will need | |||
to implement conference related codec control functionalities to | to implement conference related codec control functionalities to | |||
provide a good experience. Full Intra Request to request from the | provide a good experience. Full Intra Request to request from the | |||
media source to provide switching points between the sources, | media source to provide switching points between the sources, | |||
Temporary Maximum Media Bit-rate Request (TMMBR) to enable the mixer | Temporary Maximum Media Bit-rate Request (TMMBR) to enable the mixer | |||
to aggregate congestion control response towards the media source and | to aggregate congestion control response towards the media source and | |||
have it adjust its bit-rate in case the limitation is not in the | have it adjust its bit-rate in case the limitation is not in the | |||
source to mixer link. | source to mixer link. | |||
skipping to change at page 51, line 22 ¶ | skipping to change at page 52, line 22 ¶ | |||
for a legacy end-point or simply relay packets between transport | for a legacy end-point or simply relay packets between transport | |||
domains or to realize multi-party. We will go in details below. | domains or to realize multi-party. We will go in details below. | |||
A.4.1. Transcoder | A.4.1. Transcoder | |||
A transcoder operates on media level and really used for two | A transcoder operates on media level and really used for two | |||
purposes, the first is to allow two end-points that doesn't have a | purposes, the first is to allow two end-points that doesn't have a | |||
common set of media codecs to communicate by translating from one | common set of media codecs to communicate by translating from one | |||
codec to another. The second is to change the bit-rate to a lower | codec to another. The second is to change the bit-rate to a lower | |||
one. For WebRTC end-points communicating with each other only the | one. For WebRTC end-points communicating with each other only the | |||
first one should at all be relevant. In certain legacy deployment | first one is relevant. In certain legacy deployment media transcoder | |||
media transcoder will be necessary to ensure both codecs and bit-rate | will be necessary to ensure both codecs and bit-rate falls within the | |||
falls within the envelope the legacy end-point supports. | envelope the legacy end-point supports. | |||
As transcoding requires access to the media the transcoder must | As transcoding requires access to the media, the transcoder has to be | |||
within the security context and access any media encryption and | within the security context and access any media encryption and | |||
integrity keys. On the RTP plane a media transcoder will in practice | integrity keys. On the RTP plane a media transcoder will in practice | |||
fork the RTP session into two different domains that are highly | fork the RTP session into two different domains that are highly | |||
decoupled when it comes to media parameters and reporting, but not | decoupled when it comes to media parameters and reporting, but not | |||
identities. To maintain signalling bindings to SSRCs a transcoder is | identities. To maintain signalling bindings to SSRCs a transcoder is | |||
likely needing to use the SSRC of one end-point to represent the | likely needing to use the SSRC of one end-point to represent the | |||
transcoded RTP media stream to the other end-point(s). The | transcoded RTP media stream to the other end-point(s). The | |||
congestion control loop can be terminated in the transcoder as the | congestion control loop can be terminated in the transcoder as the | |||
media bit-rate being sent by the transcoder can be adjusted | media bit-rate being sent by the transcoder can be adjusted | |||
independently of the incoming bit-rate. However, for optimizing | independently of the incoming bit-rate. However, for optimizing | |||
performance and resource consumption the translator needs to consider | performance and resource consumption the translator needs to consider | |||
what signals or bit-rate reductions it should send towards the source | what signals or bit-rate reductions it needs to send towards the | |||
end-point. For example receving a 2.5 mbps video stream and then | source end-point. For example receiving a 2.5 Mbps video stream and | |||
send out a 250 kbps video stream after transcoding is a vaste of | then send out a 250 kbps video stream after transcoding is a waste of | |||
resources. In most cases a 500 kbps video stream from the source in | resources. In most cases a 500 kbps video stream from the source in | |||
the right resolution is likely to provide equal quality after | the right resolution is likely to provide equal quality after | |||
transcoding as the 2.5 mbps source stream. At the same time | transcoding as the 2.5 Mbps source stream. At the same time | |||
increasing media bit-rate futher than what is needed to represent the | increasing media bit-rate further than what is needed to represent | |||
incoming quality accurate is also wasted resources. | the incoming quality accurate is also wasted resources. | |||
+-A-------------+ +-Translator------------------+ | +-A-------------+ +-Translator------------------+ | |||
| +-PeerC1------| |-PeerC1--------+ | | | +-PeerC1------| |-PeerC1--------+ | | |||
| | +-UDP1------| |-UDP1--------+ | | | | | +-UDP1------| |-UDP1--------+ | | | |||
| | | +-RTP1----| |-RTP1------+ | | | | | | | +-RTP1----| |-RTP1------+ | | | | |||
| | | | +-Audio-| |-Audio---+ | | | +---+ | | | | | | +-Audio-| |-Audio---+ | | | +---+ | | |||
| | | | | AA1|------------>|---------+-+-+-+-|DEC|----+ | | | | | | | AA1|------------>|---------+-+-+-+-|DEC|----+ | | |||
| | | | | |<------------|BA1 <----+ | | | +---+ | | | | | | | | |<------------|BA1 <----+ | | | +---+ | | | |||
| | | | | | | |\| | | +---+ | | | | | | | | | | |\| | | +---+ | | | |||
| | | | +-------| |---------+ +-+-+-|ENC|<-+ | | | | | | | +-------| |---------+ +-+-+-|ENC|<-+ | | | |||
skipping to change at page 52, line 38 ¶ | skipping to change at page 53, line 38 ¶ | |||
| | | +---------| |-----------+ | | +---+ | | | | | +---------| |-----------+ | | +---+ | | |||
| | +-----------| |-------------+ | | | | | +-----------| |-------------+ | | | |||
| +-------------| |---------------+ | | | +-------------| |---------------+ | | |||
+---------------+ +-----------------------------+ | +---------------+ +-----------------------------+ | |||
Figure 13: Media Transcoder | Figure 13: Media Transcoder | |||
Figure 13 exposes some important details. First of all you can see | Figure 13 exposes some important details. First of all you can see | |||
the SSRC identifiers used by the translator are the corresponding | the SSRC identifiers used by the translator are the corresponding | |||
end-points. Secondly, there is a relation between the RTP sessions | end-points. Secondly, there is a relation between the RTP sessions | |||
in the two different PeerConnections that are represtented by having | in the two different PeerConnections that are represented by having | |||
both parts be identified by the same level and they need to share | both parts be identified by the same level and they need to share | |||
certain contexts. Also, certain types of RTCP messages will need to be | certain contexts. Also, certain types of RTCP messages will need to be | |||
bridged between the two parts. Certain RTCP feedback messages are | bridged between the two parts. Certain RTCP feedback messages are | |||
likely needed to be soruced by the translator in response to actions | likely needed to be sourced by the translator in response to actions | |||
by the translator and its media encoder. | by the translator and its media encoder. | |||
A.4.2. Gateway / Protocol Translator | A.4.2. Gateway / Protocol Translator | |||
Gateways are used when some protocol feature that is required is not | Gateways are used when some protocol feature that are needed are not | |||
supported by an end-point that wants to participate in a session. This RTP | supported by an end-point that wants to participate in a session. This RTP | |||
translator in Figure 14 takes on the role of ensuring that from the | translator in Figure 14 takes on the role of ensuring that from the | |||
perspective of participant A, participant B appears as a fully | perspective of participant A, participant B appears as a fully | |||
compliant WebRTC end-point (that is, it is the combination of the | compliant WebRTC end-point (that is, it is the combination of the | |||
Translator and participant B that looks like a WebRTC end point). | Translator and participant B that looks like a WebRTC end point). | |||
+------------+ | +------------+ | |||
| | | | | | |||
+---+ | Translator | +---+ | +---+ | Translator | +---+ | |||
| A |<---->| to legacy |<---->| B | | | A |<---->| to legacy |<---->| B | | |||
+---+ | end-point | +---+ | +---+ | end-point | +---+ | |||
WebRTC | | Legacy | WebRTC | | Legacy | |||
+------------+ | +------------+ | |||
Figure 14: Gateway (RTP translator) towards legacy end-point | Figure 14: Gateway (RTP translator) towards legacy end-point | |||
For WebRTC there are a number of requirements that could force the | For WebRTC there are a number of requirements that could force the | |||
need for a gateway if a WebRTC end-point is to communicate with a | need for a gateway if a WebRTC end-point is to communicate with a | |||
legacy end-point, such as support of ICE and DTLS-SRTP for | legacy end-point, such as support of ICE and DTLS-SRTP for key | |||
keymanagement. On RTP level the main functions that may be missing | management. On RTP level the main functions that might be missing in | |||
in a legacy implementation that otherswise support RTP are RTCP in | a legacy implementation that otherwise support RTP are RTCP in | |||
general, SRTP implementation, congestion control and feedback | general, SRTP implementation, congestion control and feedback | |||
messages required to make it work. | messages needed to make it work. | |||
+-A-------------+ +-Translator------------------+ | +-A-------------+ +-Translator------------------+ | |||
| +-PeerC1------| |-PeerC1------+ | | | +-PeerC1------| |-PeerC1------+ | | |||
| | +-UDP1------| |-UDP1------+ | | | | | +-UDP1------| |-UDP1------+ | | | |||
| | | +-RTP1----| |-RTP1-----------------------+| | | | | +-RTP1----| |-RTP1-----------------------+| | |||
| | | | +-Audio-| |-Audio---+ || | | | | | +-Audio-| |-Audio---+ || | |||
| | | | | AA1|------------>|---------+----------------+ || | | | | | | AA1|------------>|---------+----------------+ || | |||
| | | | | |<------------|BA1 <----+--------------+ | || | | | | | | |<------------|BA1 <----+--------------+ | || | |||
| | | | | |<---RTCP---->|<--------+----------+ | | || | | | | | | |<---RTCP---->|<--------+----------+ | | || | |||
| | | | +-------| |---------+ +---+-+ | | || | | | | | +-------| |---------+ +---+-+ | | || | |||
skipping to change at page 54, line 5 ¶ | skipping to change at page 55, line 5 ¶ | |||
| | | | BA1|------------>|---------+--------------+ | || | | | | | BA1|------------>|---------+--------------+ | || | |||
| | | | |<------------|AA1 <----+----------------+ || | | | | | |<------------|AA1 <----+----------------+ || | |||
| | | +-------| |---------+ || | | | | +-------| |---------+ || | |||
| | +---------| |----------------------------+| | | | +---------| |----------------------------+| | |||
| +-----------| |-----------+ | | | +-----------| |-----------+ | | |||
| | | | | | | | | | |||
+---------------+ +-----------------------------+ | +---------------+ +-----------------------------+ | |||
Figure 15: RTP/RTCP Protocol Translator | Figure 15: RTP/RTCP Protocol Translator | |||
The legacy gateway may be implemented in several ways and what it | The legacy gateway can be implemented in several ways and what it | |||
needs to change is higly dependent on what functions it needs to proxy | needs to change is highly dependent on what functions it needs to proxy | |||
for the legacy end-point. One possibility is depicted in Figure 15 | for the legacy end-point. One possibility is depicted in Figure 15 | |||
where the RTP media streams are compatible and forwarded without | where the RTP media streams are compatible and forwarded without | |||
changes. However, their RTP header values are captured to enable the | changes. However, their RTP header values are captured to enable the | |||
RTCP translator to create RTCP reception information related to the | RTCP translator to create RTCP reception information related to the | |||
leg between the end-point and the translator. This can then be | leg between the end-point and the translator. This can then be | |||
combined with the more basic RTCP reports that the legacy endpoint | combined with the more basic RTCP reports that the legacy endpoint | |||
(B) provides to give compatible and expected RTCP reporting to A, | (B) provides to give compatible and expected RTCP reporting to A, | |||
thus enabling at least full congestion control on the path between A | thus enabling at least full congestion control on the path between A | |||
and the translator. If B has limited possibilities for congestion | and the translator. If B has limited possibilities for congestion | |||
response for the media then the translator may need the capabilities | response for the media then the translator might need the capability | |||
to perform media transcoding to address cases where it otherwise | to perform media transcoding to address cases where it otherwise | |||
would need to terminate media transmission. | would need to terminate media transmission. | |||
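As a rough illustration of the RTCP translator idea described above, the following Python sketch records the sequence numbers of forwarded RTP packets and derives a loss count for the leg between the end-point and the translator. The class name `LegReceptionStats` and its methods are hypothetical. The arithmetic is a simplification that ignores 16-bit sequence number wrap-around, reordering, and duplicates, so it is not the full RFC 3550 algorithm.

```python
class LegReceptionStats:
    """Hypothetical per-SSRC reception state kept by a translator."""

    def __init__(self):
        self.base_seq = None     # first sequence number seen
        self.highest_seq = None  # highest sequence number seen
        self.received = 0        # packets actually observed

    def on_rtp(self, seq):
        # Called for each forwarded RTP packet; only the header value is used.
        if self.base_seq is None:
            self.base_seq = seq
            self.highest_seq = seq
        elif seq > self.highest_seq:
            self.highest_seq = seq
        self.received += 1

    def cumulative_lost(self):
        # Expected packets minus received packets (simplified, no wrap-around).
        if self.base_seq is None:
            return 0
        expected = self.highest_seq - self.base_seq + 1
        return max(expected - self.received, 0)


stats = LegReceptionStats()
for seq in (100, 101, 103, 104):  # packet 102 never arrived
    stats.on_rtp(seq)
print(stats.cumulative_lost())
```

A translator could merge a figure like this with the basic reports from B before reporting to A. How the two are combined is left open here.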
As the translator is generating RTP/RTCP traffic on behalf of B to A | As the translator is generating RTP/RTCP traffic on behalf of B to A | |||
it will need to be able to correctly protect these packets that it | it will need to be able to correctly protect these packets that it | |||
translates or generates. Thus security context information is | translates or generates. Thus security context information is | |||
required in this type of translator if it operates on the RTP/RTCP | needed in this type of translator if it operates on the RTP/RTCP | |||
packet content or media. In fact one of the more likley scenario is | packet content or media. In fact one of the more likely scenario is | |||
that the translator (gateway) will need to have two different | that the translator (gateway) will need to have two different | |||
security contexts one towards A and one towards B and for each RTP/ | security contexts one towards A and one towards B and for each RTP/ | |||
RTCP packet do an authenticity verification, decryption followed by an | RTCP packet do an authenticity verification, decryption followed by an | |||
encryption and integirty protection operation to resolve missmatch in | encryption and integrity protection operation to resolve mismatch in | |||
security systems. | security systems. | |||
A.4.3. Relay | A.4.3. Relay | |||
There exists a class of translators that operate on the transport level | There exists a class of translators that operate on the transport level | |||
below RTP and thus do not affect RTP/RTCP packets directly. They | below RTP and thus do not affect RTP/RTCP packets directly. They | |||
come in two distinct flavors, the one used to bridge between two | come in two distinct flavours, the one used to bridge between two | |||
different transport or address domains, functioning more as a gateway, | different transport or address domains, functioning more as a gateway, | |||
and the second one, which is to provide a group communication | and the second one, which is to provide a group communication | |||
feature as depicted below in Figure 16. | feature as depicted below in Figure 16. | |||
+---+ +------------+ +---+ | +---+ +------------+ +---+ | |||
| A |<---->| |<---->| B | | | A |<---->| |<---->| B | | |||
+---+ | | +---+ | +---+ | | +---+ | |||
| Translator | | | Translator | | |||
+---+ | | +---+ | +---+ | | +---+ | |||
| C |<---->| |<---->| D | | | C |<---->| |<---->| D | | |||
+---+ +------------+ +---+ | +---+ +------------+ +---+ | |||
Figure 16: RTP Translator (Relay) with Only Unicast Paths | Figure 16: RTP Translator (Relay) with Only Unicast Paths | |||
The first kind is straightforward and is likely to exist in a WebRTC | The first kind is straightforward and is likely to exist in a WebRTC | |||
context when a legacy end-point is compatible with the exception of | context when a legacy end-point is compatible with the exception of | |||
ICE, and thus needs a gateway that terminates the ICE and then | ICE, and thus needs a gateway that terminates the ICE and then | |||
forwards all the RTP/RTCP traffic and keymanagment to the end-point | forwards all the RTP/RTCP traffic and key management to the end-point | |||
only rewriting the IP/UDP to forward the packet to the legacy node. | only rewriting the IP/UDP to forward the packet to the legacy node. | |||
The second type is useful if one wants a less complex central node or | The second type is useful if one wants a less complex central node or | |||
a central node that is outside of the security context and thus does | a central node that is outside of the security context and thus does | |||
not have access to the media. This relay takes on the role of | not have access to the media. This relay takes on the role of | |||
forwarding the media (RTP and RTCP) packets to the other end-points | forwarding the media (RTP and RTCP) packets to the other end-points | |||
but doesn't perform any RTP or media processing. Such a device | but doesn't perform any RTP or media processing. Such a device | |||
simply forwards the media from each sender to all of the other | simply forwards the media from each sender to all of the other | |||
particpants, and is sometimes called a transport-layer translator. | participants, and is sometimes called a transport-layer translator. | |||
In Figure 16, participant A will only need to send media once to | In Figure 16, participant A will only need to send media once to | |||
the relay, which will redistribute it by sending a copy of the stream | the relay, which will redistribute it by sending a copy of the stream | |||
to participants B, C, and D. Participant A will still receive three | to participants B, C, and D. Participant A will still receive three | |||
RTP streams with the media from B, C and D if they transmit | RTP streams with the media from B, C and D if they transmit | |||
simultaneously. From an RTP perspective, this results in an RTP | simultaneously. From an RTP perspective, this results in an RTP | |||
session that behaves equivalently to one transported over IP Any | session that behaves equivalently to one transported over IP Any | |||
Source Multicast (ASM). | Source Multicast (ASM). | |||
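The fan-out behaviour described above can be sketched in a few lines of Python. The function name `relay` and the participant labels are illustrative assumptions; a real relay operates on UDP flows and never inspects or changes the RTP/RTCP payload.

```python
def relay(sender, packet, participants):
    """Return the copies a transport-layer relay would send out.

    The packet is forwarded unchanged to every participant except the sender.
    """
    return {dest: packet for dest in participants if dest != sender}


participants = ["A", "B", "C", "D"]

# A sends once; the relay fans the packet out to B, C and D.
out = relay("A", b"rtp-from-A", participants)
print(sorted(out))

# If B, C and D all transmit, A ends up receiving three streams.
streams_at_a = [s for s in participants if "A" in relay(s, b"x", participants)]
print(streams_at_a)
```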
This results in one common RTP session between all participants | This results in one common RTP session between all participants | |||
despite that there will be independent PeerConnections created to the | despite that there will be independent PeerConnections created to the | |||
skipping to change at page 57, line 5 ¶ | skipping to change at page 58, line 5 ¶ | |||
| | +-----------| |-------------+ | | | | | +-----------| |-------------+ | | | |||
| +-------------| |---------------+ | | | +-------------| |---------------+ | | |||
+---------------+ +--------------------------------+ | +---------------+ +--------------------------------+ | |||
Figure 17: Transport Multi-party Relay | Figure 17: Transport Multi-party Relay | |||
As the Relay forwards RTP and RTCP packets between the UDP flows, as indicated | As the Relay forwards RTP and RTCP packets between the UDP flows, as indicated | |||
by the arrows for the media flow, a given WebRTC end-point, like A, | by the arrows for the media flow, a given WebRTC end-point, like A, | |||
will see the remote sources BV1 and CV1. There will also be two | will see the remote sources BV1 and CV1. There will also be two | |||
different network paths between A, and B or C. This results in that | different network paths between A, and B or C. This results in that | |||
the client A must be capable of handlilng that when determining | the client A has to be capable of handling that when determining | |||
congestion state that there might exist multiple destinations on the | congestion state that there might exist multiple destinations on the | |||
far side of a PeerConnection and that these paths shall be treated | far side of a PeerConnection and that these paths have to be treated | |||
differently. It also results in a requirement to combine the | differently. It also results in a requirement to combine the | |||
different congestion states into a decision to transmit a particular | different congestion states into a decision to transmit a particular | |||
RTP media stream suitable to all participants. | RTP media stream suitable to all participants. | |||
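One possible way to combine per-path congestion states into a single transmission decision is to send at the rate the most constrained path allows. The sketch below assumes that policy; the draft does not mandate it, and the function name `combined_send_rate` and the example rates are hypothetical.

```python
def combined_send_rate(path_rates_bps):
    """Pick a single send rate for a stream that reaches several receivers.

    path_rates_bps maps a remote end-point to the rate (bits per second) that
    the per-path congestion state currently allows. Taking the minimum is an
    assumption made for this sketch, not a rule taken from the draft.
    """
    if not path_rates_bps:
        raise ValueError("no paths to combine")
    return min(path_rates_bps.values())


# A sends one stream through the relay; the path to C is the bottleneck.
rates = {"B": 1_200_000, "C": 400_000}
print(combined_send_rate(rates))
```

Taking the minimum keeps every path within its limit at the cost of lower quality for receivers on better paths.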
It is also important to note that the relay cannot perform selective | It is also important to note that the relay cannot perform selective | |||
relaying of some sources and not others. The reason is that the RTCP | relaying of some sources and not others. The reason is that the RTCP | |||
reporting in that case becomes incosistent and without explicit | reporting in that case becomes inconsistent and without explicit | |||
information about it being blocked must be interpret as severe | information about it being blocked has to be interpreted as severe | |||
congestion. | congestion. | |||
In this usage it is also necessary that the session management has | In this usage it is also necessary that the session management has | |||
configured a common set of RTP configuration including RTP payload | configured a common set of RTP configuration including RTP payload | |||
formats as when A sends a packet with pt=97 it will arrive at both B | formats as when A sends a packet with pt=97 it will arrive at both B | |||
and C carrying pt=97 and having the same packetization and encoding, | and C carrying pt=97 and having the same packetization and encoding, | |||
no entity will have manipulated the packet. | no entity will have manipulated the packet. | |||
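Because no entity rewrites the packet, every end-point has to interpret a given payload type identically. The following Python sketch shows one way session management might check that. The function name `common_payload_map` and the format label `codec-x` are placeholders invented for the example.

```python
def common_payload_map(configs):
    """Return the shared pt -> format map, or raise if end-points disagree.

    configs maps an end-point name to its {payload_type: format_name} table.
    """
    tables = list(configs.values())
    if not tables:
        return {}
    first = tables[0]
    for name, table in configs.items():
        if table != first:
            raise ValueError(f"{name} has a different RTP payload configuration")
    return dict(first)


# All three end-points agree that pt=97 means the same (placeholder) format.
configs = {
    "A": {97: "codec-x"},
    "B": {97: "codec-x"},
    "C": {97: "codec-x"},
}
print(common_payload_map(configs))
```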
When it comes to security there exist some additional requirements to | When it comes to security there exist some additional requirements to | |||
ensure that the property that the relay can't read the media traffic | ensure that the property that the relay can't read the media traffic | |||
is enforced. First of all the key to be used must be agreed such so | is enforced. First of all the key to be used has to be agreed such | |||
that the relay doesn't get it, e.g. no DTLS-SRTP handshake with the | so that the relay doesn't get it, e.g. no DTLS-SRTP handshake with | |||
relay, instead some other method must be used. Secondly, the keying | the relay, instead some other method needs to be used. Secondly, the | |||
structure must be capable of handling multiple end-points in the same | keying structure has to be capable of handling multiple end-points in | |||
RTP session. | the same RTP session. | |||
The second problem can basically be solved in two ways. One is a | The second problem can basically be solved in two ways. One is a | |||
common master key from which all derive their per-source key for | common master key from which all derive their per-source key for | |||
SRTP. The second alternative which might be more practical is that | SRTP. The second alternative which might be more practical is that | |||
each end-point has its own key used to protect all RTP/RTCP packets | each end-point has its own key used to protect all RTP/RTCP packets | |||
it sends. Each participant's key is then distributed to the other | it sends. Each participant's key is then distributed to the other | |||
participants. This second method could be implemented using DTLS- | participants. This second method could be implemented using DTLS- | |||
SRTP to a special key server and then use Encrypted Key Transport | SRTP to a special key server and then use Encrypted Key Transport | |||
[I-D.ietf-avt-srtp-ekt] to distribute the actual used key to the | [I-D.ietf-avt-srtp-ekt] to distribute the actual used key to the | |||
other participants in the RTP session (Figure 18). The first one could | other participants in the RTP session (Figure 18). The first one could | |||
skipping to change at page 58, line 50 ¶ | skipping to change at page 59, line 50 ¶ | |||
+---+ +---+ +---+ | +---+ +---+ +---+ | |||
| A |--->| B |--->| C | | | A |--->| B |--->| C | | |||
+---+ +---+ +---+ | +---+ +---+ +---+ | |||
Figure 19: MediaStream Forwarding | Figure 19: MediaStream Forwarding | |||
There exist two main approaches to how B forwards the media from A to | There exist two main approaches to how B forwards the media from A to | |||
C. The first one is to simply relay the RTP media stream. The second | C. The first one is to simply relay the RTP media stream. The second | |||
one is for B to act as a transcoder. Let's consider both approaches. | one is for B to act as a transcoder. Let's consider both approaches. | |||
A relay approache will result in that the WebRTC end-points will have | A relay approach will result in that the WebRTC end-points will have | |||
to have the same capabilities as discussed in Relay | to have the same capabilities as discussed in Relay | |||
(Appendix A.4.3). Thus A will see an RTP session that is extended | (Appendix A.4.3). Thus A will see an RTP session that is extended | |||
beyond the PeerConnection and see two different receiving end-points | beyond the PeerConnection and see two different receiving end-points | |||
with different path characteristics (B and C). Thus A's congestion | with different path characteristics (B and C). Thus A's congestion | |||
control needs to be capable of handling this. The security solution | control needs to be capable of handling this. The security solution | |||
can either support a mechanism that allows A to inform C about the key | can either support a mechanism that allows A to inform C about the key | |||
A is using despite B and C having agreed on another set of keys. | A is using despite B and C having agreed on another set of keys. | |||
Alternatively B will decrypt and then re-encrypt using a new key. | Alternatively B will decrypt and then re-encrypt using a new key. | |||
The relay based approach has the advantage that B does not need to | The relay based approach has the advantage that B does not need to | |||
transcode the media thus both maintaining the quality of the encoding | transcode the media thus both maintaining the quality of the encoding | |||
and reducing B's complexity requirements. If the right security | and reducing B's complexity requirements. If the right security | |||
solutions are supported then also C will be able to verify the | solutions are supported then also C will be able to verify the | |||
authenticity of the media comming from A. As downside A are forced to | authenticity of the media coming from A. As downside A are forced to | |||
take both B and C into consideration when delivering content. | take both B and C into consideration when delivering content. | |||
The media transcoder approach is similar to having B act as Mixer | The media transcoder approach is similar to having B act as Mixer | |||
terminating the RTP session combined with the transcoder as discussed | terminating the RTP session combined with the transcoder as discussed | |||
in Appendix A.4.1. A will only see B as the receiver of its media. B | in Appendix A.4.1. A will only see B as the receiver of its media. B | |||
will be responsible for producing an RTP media stream suitable for the B to | will be responsible for producing an RTP media stream suitable for the B to | |||
C PeerConnection. This may require media transcoding for congestion | C PeerConnection. This might require media transcoding for | |||
control purpose to produce a suitable bit-rate. Thus losing media | congestion control purpose to produce a suitable bit-rate. Thus | |||
quality in the transcoding and forcing B to spend the resource on the | losing media quality in the transcoding and forcing B to spend the | |||
transcoding. The media transcoding does result in a separation of | resource on the transcoding. The media transcoding does result in a | |||
the two different legs removing almost all dependencies. B could | separation of the two different legs removing almost all | |||
choose to implement logic to optimize its media transcoding | dependencies. B could choose to implement logic to optimize its | |||
operation, by for example requesting media properties that are | media transcoding operation, by for example requesting media | |||
suitable for C also, thus trying to avoid it having to transcode the | properties that are suitable for C also, thus trying to avoid it | |||
content and only forward the media payloads between the two sides. | having to transcode the content and only forward the media payloads | |||
For that optimization to be practical WebRTC end-points must support | between the two sides. For that optimization to be practical WebRTC | |||
sufficiently good tools for codec control. | end-points have to support sufficiently good tools for codec control. | |||
A.6. Simulcast | A.6. Simulcast | |||
This section discusses simulcast in the meaning of providing a node, | This section discusses simulcast in the meaning of providing a node, | |||
for example a stream switching Mixer, with multiple different encoded | for example a stream switching Mixer, with multiple different encoded | |||
versions of the same media source. In the WebRTC context that appears | versions of the same media source. In the WebRTC context that appears | |||
to be most easily accomplished by establishing mutliple | to be most easily accomplished by establishing multiple | |||
PeerConnections, all being fed the same set of WebRTC MediaStreams. | PeerConnections, all being fed the same set of WebRTC MediaStreams. | |||
Each PeerConnection is then configured to deliver a particular media | Each PeerConnection is then configured to deliver a particular media | |||
quality and thus media bit-rate. This will work well as long as the | quality and thus media bit-rate. This will work well as long as the | |||
end-point implements media encoding according to Figure 7. Then each | end-point implements media encoding according to Figure 7. Then each | |||
PeerConnection will receive an independently encoded version and the | PeerConnection will receive an independently encoded version and the | |||
codec parameters can be agreed specifically in the context of this | codec parameters can be agreed specifically in the context of this | |||
PeerConnection. | PeerConnection. | |||
For simulcast to work one needs to prevent the end-point from delivering | For simulcast to work one needs to prevent the end-point from delivering | |||
content encoded as depicted in Figure 8. If a single encoder | content encoded as depicted in Figure 8. If a single encoder | |||
instance is fed to multiple PeerConnections the intention of | instance is fed to multiple PeerConnections the intention of | |||
performing simulcast will fail. | performing simulcast will fail. | |||
Thus it should be considered to explicitly signal which of the two | Thus it needs to be considered to explicitly signal which of the two | |||
implementation strategies is desired and which will be done. | implementation strategies is desired and which will be done. | |||
This would at least make the application, and possibly the central node | This would at least make the application, and possibly the central node | |||
interested in receiving simulcast of an end-point's RTP media streams, | interested in receiving simulcast of an end-point's RTP media streams, | |||
aware of whether it will function or not. | aware of whether it will function or not. | |||
Authors' Addresses | Authors' Addresses | |||
Colin Perkins | Colin Perkins | |||
University of Glasgow | University of Glasgow | |||
School of Computing Science | School of Computing Science | |||
End of changes. 179 change blocks. | ||||
560 lines changed or deleted | 584 lines changed or added | |||
This html diff was produced by rfcdiff 1.46. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |