draft-ietf-taps-impl-04.txt | draft-ietf-taps-impl-05.txt | |||
---|---|---|---|---|
TAPS Working Group A. Brunstrom, Ed. | TAPS Working Group A. Brunstrom, Ed. | |||
Internet-Draft Karlstad University | Internet-Draft Karlstad University | |||
Intended status: Informational T. Pauly, Ed. | Intended status: Informational T. Pauly, Ed. | |||
Expires: January 9, 2020 Apple Inc. | Expires: May 7, 2020 Apple Inc. | |||
T. Enghardt | T. Enghardt | |||
TU Berlin | TU Berlin | |||
K-J. Grinnemo | K-J. Grinnemo | |||
Karlstad University | Karlstad University | |||
T. Jones | T. Jones | |||
University of Aberdeen | University of Aberdeen | |||
P. Tiesel | P. Tiesel | |||
TU Berlin | TU Berlin | |||
C. Perkins | C. Perkins | |||
University of Glasgow | University of Glasgow | |||
M. Welzl | M. Welzl | |||
University of Oslo | University of Oslo | |||
July 08, 2019 | November 04, 2019 | |||
Implementing Interfaces to Transport Services | Implementing Interfaces to Transport Services | |||
draft-ietf-taps-impl-04 | draft-ietf-taps-impl-05 | |||
Abstract | Abstract | |||
The Transport Services architecture [I-D.ietf-taps-arch] defines a | The Transport Services architecture [I-D.ietf-taps-arch] defines a | |||
system that allows applications to use transport networking protocols | system that allows applications to use transport networking protocols | |||
flexibly. This document serves as a guide to implementation on how | flexibly. This document serves as a guide to implementation on how | |||
to build such a system. | to build such a system. | |||
Status of This Memo | Status of This Memo | |||
skipping to change at page 1, line 46 | skipping to change at page 1, line 46 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on January 9, 2020. | This Internet-Draft will expire on May 7, 2020. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2019 IETF Trust and the persons identified as the | Copyright (c) 2019 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
2. Implementing Basic Objects . . . . . . . . . . . . . . . . . 3 | 2. Implementing Connection Objects . . . . . . . . . . . . . . . 4 | |||
3. Implementing Pre-Establishment . . . . . . . . . . . . . . . 4 | 3. Implementing Pre-Establishment . . . . . . . . . . . . . . . 4 | |||
3.1. Configuration-time errors . . . . . . . . . . . . . . . . 5 | 3.1. Configuration-time errors . . . . . . . . . . . . . . . . 5 | |||
3.2. Role of system policy . . . . . . . . . . . . . . . . . . 5 | 3.2. Role of system policy . . . . . . . . . . . . . . . . . . 6 | |||
4. Implementing Connection Establishment . . . . . . . . . . . . 6 | 4. Implementing Connection Establishment . . . . . . . . . . . . 6 | |||
4.1. Candidate Gathering . . . . . . . . . . . . . . . . . . . 7 | 4.1. Candidate Gathering . . . . . . . . . . . . . . . . . . . 8 | |||
4.1.1. Gathering Endpoint Candidates . . . . . . . . . . . . 7 | 4.1.1. Gathering Endpoint Candidates . . . . . . . . . . . . 8 | |||
4.1.2. Structuring Options as a Tree . . . . . . . . . . . . 9 | 4.1.2. Structuring Options as a Tree . . . . . . . . . . . . 9 | |||
4.1.3. Branch Types . . . . . . . . . . . . . . . . . . . . 10 | 4.1.3. Branch Types . . . . . . . . . . . . . . . . . . . . 11 | |||
4.2. Branching Order-of-Operations . . . . . . . . . . . . . . 13 | 4.2. Branching Order-of-Operations . . . . . . . . . . . . . . 13 | |||
4.3. Sorting Branches . . . . . . . . . . . . . . . . . . . . 14 | 4.3. Sorting Branches . . . . . . . . . . . . . . . . . . . . 14 | |||
4.4. Candidate Racing . . . . . . . . . . . . . . . . . . . . 15 | 4.4. Candidate Racing . . . . . . . . . . . . . . . . . . . . 15 | |||
4.4.1. Delayed . . . . . . . . . . . . . . . . . . . . . . . 16 | 4.4.1. Delayed . . . . . . . . . . . . . . . . . . . . . . . 16 | |||
4.4.2. Failover . . . . . . . . . . . . . . . . . . . . . . 16 | 4.4.2. Failover . . . . . . . . . . . . . . . . . . . . . . 17 | |||
4.5. Completing Establishment . . . . . . . . . . . . . . . . 17 | 4.5. Completing Establishment . . . . . . . . . . . . . . . . 17 | |||
4.5.1. Determining Successful Establishment . . . . . . . . 17 | 4.5.1. Determining Successful Establishment . . . . . . . . 18 | |||
4.6. Establishing multiplexed connections . . . . . . . . . . 18 | 4.6. Establishing multiplexed connections . . . . . . . . . . 18 | |||
4.7. Handling racing with "unconnected" protocols . . . . . . 19 | 4.7. Handling racing with "unconnected" protocols . . . . . . 19 | |||
4.8. Implementing listeners . . . . . . . . . . . . . . . . . 19 | 4.8. Implementing listeners . . . . . . . . . . . . . . . . . 19 | |||
4.8.1. Implementing listeners for Connected Protocols . . . 20 | 4.8.1. Implementing listeners for Connected Protocols . . . 20 | |||
4.8.2. Implementing listeners for Unconnected Protocols . . 20 | 4.8.2. Implementing listeners for Unconnected Protocols . . 20 | |||
4.8.3. Implementing listeners for Multiplexed Protocols . . 20 | 4.8.3. Implementing listeners for Multiplexed Protocols . . 20 | |||
5. Implementing Data Transfer . . . . . . . . . . . . . . . . . 20 | 5. Implementing Sending and Receiving Data . . . . . . . . . . . 21 | |||
5.1. Data transfer for streams, datagrams, and frames . . . . 20 | 5.1. Sending Messages . . . . . . . . . . . . . . . . . . . . 21 | |||
5.1.1. Sending Messages . . . . . . . . . . . . . . . . . . 21 | 5.1.1. Message Properties . . . . . . . . . . . . . . . . . 21 | |||
5.1.2. Receiving Messages . . . . . . . . . . . . . . . . . 23 | 5.1.2. Send Completion . . . . . . . . . . . . . . . . . . . 23 | |||
5.2. Handling of data for fast-open protocols . . . . . . . . 23 | 5.1.3. Batching Sends . . . . . . . . . . . . . . . . . . . 23 | |||
6. Implementing Maintenance . . . . . . . . . . . . . . . . . . 24 | 5.2. Receiving Messages . . . . . . . . . . . . . . . . . . . 23 | |||
6.1. Managing Connections . . . . . . . . . . . . . . . . . . 24 | 5.3. Handling of data for fast-open protocols . . . . . . . . 24 | |||
6.2. Handling Path Changes . . . . . . . . . . . . . . . . . . 26 | 6. Implementing Message Framers . . . . . . . . . . . . . . . . 24 | |||
6.1. Defining Message Framers . . . . . . . . . . . . . . . . 25 | ||||
7. Implementing Termination . . . . . . . . . . . . . . . . . . 26 | 6.2. Sender-side Message Framing . . . . . . . . . . . . . . . 26 | |||
8. Cached State . . . . . . . . . . . . . . . . . . . . . . . . 27 | 6.3. Receiver-side Message Framing . . . . . . . . . . . . . . 26 | |||
8.1. Protocol state caches . . . . . . . . . . . . . . . . . . 27 | 7. Implementing Connection Management . . . . . . . . . . . . . 27 | |||
8.2. Performance caches . . . . . . . . . . . . . . . . . . . 28 | 7.1. Pooled Connection . . . . . . . . . . . . . . . . . . . . 28 | |||
9. Specific Transport Protocol Considerations . . . . . . . . . 29 | 7.2. Handling Path Changes . . . . . . . . . . . . . . . . . . 28 | |||
9.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 | 8. Implementing Connection Termination . . . . . . . . . . . . . 29 | |||
9.2. UDP . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 | 9. Cached State . . . . . . . . . . . . . . . . . . . . . . . . 30 | |||
9.3. TLS . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 | 9.1. Protocol state caches . . . . . . . . . . . . . . . . . . 30 | |||
9.4. DTLS . . . . . . . . . . . . . . . . . . . . . . . . . . 34 | 9.2. Performance caches . . . . . . . . . . . . . . . . . . . 31 | |||
9.5. HTTP . . . . . . . . . . . . . . . . . . . . . . . . . . 34 | 10. Specific Transport Protocol Considerations . . . . . . . . . 32 | |||
9.6. QUIC . . . . . . . . . . . . . . . . . . . . . . . . . . 35 | 10.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . 33 | |||
9.7. HTTP/2 transport . . . . . . . . . . . . . . . . . . . . 36 | 10.2. UDP . . . . . . . . . . . . . . . . . . . . . . . . . . 34 | |||
9.8. SCTP . . . . . . . . . . . . . . . . . . . . . . . . . . 36 | 10.3. TLS . . . . . . . . . . . . . . . . . . . . . . . . . . 35 | |||
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 | 10.4. DTLS . . . . . . . . . . . . . . . . . . . . . . . . . . 37 | |||
11. Security Considerations . . . . . . . . . . . . . . . . . . . 37 | 10.5. HTTP . . . . . . . . . . . . . . . . . . . . . . . . . . 37 | |||
11.1. Considerations for Candidate Gathering . . . . . . . . . 37 | 10.6. QUIC . . . . . . . . . . . . . . . . . . . . . . . . . . 38 | |||
11.2. Considerations for Candidate Racing . . . . . . . . . . 37 | 10.7. HTTP/2 transport . . . . . . . . . . . . . . . . . . . . 39 | |||
12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 38 | 10.8. SCTP . . . . . . . . . . . . . . . . . . . . . . . . . . 39 | |||
13. References . . . . . . . . . . . . . . . . . . . . . . . . . 38 | 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 42 | |||
13.1. Normative References . . . . . . . . . . . . . . . . . . 38 | 12. Security Considerations . . . . . . . . . . . . . . . . . . . 42 | |||
13.2. Informative References . . . . . . . . . . . . . . . . . 39 | 12.1. Considerations for Candidate Gathering . . . . . . . . . 42 | |||
Appendix A. Additional Properties . . . . . . . . . . . . . . . 40 | 12.2. Considerations for Candidate Racing . . . . . . . . . . 42 | |||
A.1. Properties Affecting Sorting of Branches . . . . . . . . 40 | 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 42 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 40 | 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 43 | |||
14.1. Normative References . . . . . . . . . . . . . . . . . . 43 | ||||
14.2. Informative References . . . . . . . . . . . . . . . . . 44 | ||||
Appendix A. Additional Properties . . . . . . . . . . . . . . . 45 | ||||
A.1. Properties Affecting Sorting of Branches . . . . . . . . 45 | ||||
Appendix B. Reasons for errors . . . . . . . . . . . . . . . . . 45 | ||||
Appendix C. Existing Implementations . . . . . . . . . . . . . . 46 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 47 | ||||
1. Introduction | 1. Introduction | |||
The Transport Services architecture [I-D.ietf-taps-arch] defines a | The Transport Services architecture [I-D.ietf-taps-arch] defines a | |||
system that allows applications to use transport networking protocols | system that allows applications to use transport networking protocols | |||
flexibly. The interface such a system exposes to applications is | flexibly. The interface such a system exposes to applications is | |||
defined as the Transport Services API [I-D.ietf-taps-interface]. | defined as the Transport Services API [I-D.ietf-taps-interface]. | |||
This API is designed to be generic across multiple transport | This API is designed to be generic across multiple transport | |||
protocols and sets of protocol features. | protocols and sets of protocol features. | |||
This document serves as a guide to implementation on how to build a | This document serves as a guide to implementation on how to build a | |||
system that provides a Transport Services API. It is the job of an | system that provides a Transport Services API. It is the job of an | |||
implementation of a Transport Services system to turn the requests of | implementation of a Transport Services system to turn the requests of | |||
an application into decisions on how to establish connections, and | an application into decisions on how to establish connections, and | |||
how to transfer data over those connections once established. The | how to transfer data over those connections once established. The | |||
terminology used in this document is based on the Architecture | terminology used in this document is based on the Architecture | |||
[I-D.ietf-taps-arch]. | [I-D.ietf-taps-arch]. | |||
2. Implementing Basic Objects | 2. Implementing Connection Objects | |||
The basic objects that are exposed to applications for Transport | The connection objects that are exposed to applications for Transport | |||
Services are the Preconnection, the bundle of properties that | Services are: | |||
describes the application constraints on the transport; the | ||||
Connection, the basic object that represents a flow of data in either | o the Preconnection, the bundle of properties that describes the | |||
direction between the Local and Remote Endpoints; and the Listener, a | application constraints on the transport; | |||
passive waiting object that delivers new Connections. | ||||
o the Connection, the basic object that represents a flow of data in | ||||
either direction between the Local and Remote Endpoints; | ||||
o and the Listener, a passive waiting object that delivers new | ||||
Connections. | ||||
Preconnection objects should be implemented as bundles of properties | Preconnection objects should be implemented as bundles of properties | |||
that an application can both read and write. Once a Preconnection | that an application can both read and write. Once a Preconnection | |||
has been used to create an outbound Connection or a Listener, the | has been used to create an outbound Connection or a Listener, the | |||
implementation should ensure that the copy of the properties held by | implementation should ensure that the copy of the properties held by | |||
the Connection or Listener is immutable. This may involve performing | the Connection or Listener is immutable. This may involve performing | |||
a deep-copy if the application is still able to modify properties on | a deep-copy if the application is still able to modify properties on | |||
the original Preconnection object. | the original Preconnection object. | |||
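As a minimal sketch of the deep-copy approach described above, the following Python assumes a hypothetical `Preconnection` class whose `initiate()` method creates a `Connection`. These names are illustrative and are not the Transport Services API itself. The Connection keeps a private deep copy of the properties and exposes it through a read-only `MappingProxyType` view. Top-level assignment through that view fails, and later edits to the Preconnection, including edits to nested values, do not reach the Connection.

```python
import copy
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Dict, Mapping


@dataclass
class Preconnection:
    """Hypothetical mutable bundle of properties (names are illustrative)."""
    properties: Dict[str, Any] = field(default_factory=dict)

    def initiate(self) -> "Connection":
        # Deep-copy so later edits to this Preconnection, including edits to
        # nested values such as lists, cannot leak into the Connection.
        return Connection(copy.deepcopy(self.properties))


class Connection:
    def __init__(self, properties: Dict[str, Any]) -> None:
        # Top-level read-only view. Nested values are private copies owned by
        # this Connection, but they are not themselves frozen.
        self._properties: Mapping[str, Any] = MappingProxyType(properties)

    @property
    def properties(self) -> Mapping[str, Any]:
        return self._properties
```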
Connection objects represent the interface between the application | Connection objects represent the interface between the application | |||
skipping to change at page 12, line 45 | skipping to change at page 13, line 15 | |||
Another example is racing SCTP with TCP: | Another example is racing SCTP with TCP: | |||
1 [www.example.com:80, Any, Any Stream] | 1 [www.example.com:80, Any, Any Stream] | |||
1.1 [www.example.com:80, Any, SCTP] | 1.1 [www.example.com:80, Any, SCTP] | |||
1.1.1 [192.0.2.1:80, Any, SCTP] | 1.1.1 [192.0.2.1:80, Any, SCTP] | |||
1.2 [www.example.com:80, Any, TCP] | 1.2 [www.example.com:80, Any, TCP] | |||
1.2.1 [192.0.2.1:80, Any, TCP] | 1.2.1 [192.0.2.1:80, Any, TCP] | |||
Implementations that support racing protocols and protocol options | Implementations that support racing protocols and protocol options | |||
should maintain a history of which protocols and protocol options | should maintain a history of which protocols and protocol options | |||
successfully established, on a per-network basis (see Section 8.2). | successfully established, on a per-network basis (see Section 9.2). | |||
This information can influence future racing decisions to prioritize | This information can influence future racing decisions to prioritize | |||
or prune branches. | or prune branches. | |||
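One possible shape for the per-network history mentioned above is sketched below. The `EstablishmentHistory` class, the network identifiers, and the neutral prior of 0.5 for protocols never tried on a network are all assumptions made for illustration. A real implementation would also need to age out stale entries and decide how to identify a network. Only reordering is shown; pruning could be added as a threshold on the success rate.

```python
from collections import defaultdict
from typing import Dict, List


class EstablishmentHistory:
    """Hypothetical per-network record of which protocols established."""

    def __init__(self) -> None:
        # network id -> protocol name -> [successes, attempts]
        self._stats: Dict[str, Dict[str, List[int]]] = defaultdict(
            lambda: defaultdict(lambda: [0, 0]))

    def record(self, network: str, protocol: str, succeeded: bool) -> None:
        entry = self._stats[network][protocol]
        entry[1] += 1
        if succeeded:
            entry[0] += 1

    def success_rate(self, network: str, protocol: str) -> float:
        # .get() avoids creating entries just by querying.
        successes, attempts = self._stats.get(network, {}).get(protocol, (0, 0))
        # Untried protocols get a neutral prior so they are not starved.
        return successes / attempts if attempts else 0.5

    def order(self, network: str, protocols: List[str]) -> List[str]:
        # sorted() is stable: ties keep the caller's original preference order.
        return sorted(protocols, key=lambda p: -self.success_rate(network, p))
```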
4.2. Branching Order-of-Operations | 4.2. Branching Order-of-Operations | |||
Branch types must occur in a specific order relative to one another | Branch types must occur in a specific order relative to one another | |||
to avoid creating leaf nodes with invalid or incompatible settings. | to avoid creating leaf nodes with invalid or incompatible settings. | |||
In the example above, it would be invalid to branch for derived | In the example above, it would be invalid to branch for derived | |||
endpoints (the DNS results for www.example.com) before branching | endpoints (the DNS results for www.example.com) before branching | |||
between interface paths, since usable DNS results on one network may | between interface paths, since usable DNS results on one network may | |||
skipping to change at page 14, line 31 | skipping to change at page 14, line 44 | |||
Implementations should sort the branches of the tree of connection | Implementations should sort the branches of the tree of connection | |||
options in order of their preference rank. Leaf nodes on branches | options in order of their preference rank. Leaf nodes on branches | |||
with higher rankings represent connection attempts that will be raced | with higher rankings represent connection attempts that will be raced | |||
first. Implementations should order the branches to reflect the | first. Implementations should order the branches to reflect the | |||
preferences expressed by the application for its new connection, | preferences expressed by the application for its new connection, | |||
including Selection Properties, which are specified in | including Selection Properties, which are specified in | |||
[I-D.ietf-taps-interface]. | [I-D.ietf-taps-interface]. | |||
In addition to the properties provided by the application, an | In addition to the properties provided by the application, an | |||
implementation may include additional criteria such as cached | implementation may include additional criteria such as cached | |||
performance estimates, see Section 8.2, or system policy, see | performance estimates, see Section 9.2, or system policy, see | |||
Section 3.2, in the ranking. Two examples of how Selection and | Section 3.2, in the ranking. Two examples of how Selection and | |||
Connection Properties may be used to sort branches are provided | Connection Properties may be used to sort branches are provided | |||
below: | below: | |||
o "Interface Instance or Type": If the application specifies an | o "Interface Instance or Type": If the application specifies an | |||
interface type to be preferred or avoided, implementations should | interface type to be preferred or avoided, implementations should | |||
rank paths accordingly. If the application specifies an interface | rank paths accordingly. If the application specifies an interface | |||
type to be required or prohibited, we expect an implementation to | type to be required or prohibited, we expect an implementation to | |||
not include the non-conforming paths in the tree. | not include the non-conforming paths in the tree. | |||
o "Capacity Profile": An implementation may use the Capacity Profile | o "Capacity Profile": An implementation may use the Capacity Profile | |||
to prefer paths optimized for the application's expected traffic | to prefer paths optimized for the application's expected traffic | |||
pattern according to cached performance estimates, see | pattern according to cached performance estimates, see | |||
Section 8.2: | Section 9.2: | |||
* Scavenger: Prefer paths with the highest expected available | * Scavenger: Prefer paths with the highest expected available | |||
bandwidth, based on observed maximum throughput | bandwidth, based on observed maximum throughput | |||
* Low Latency/Interactive: Prefer paths with the lowest expected | * Low Latency/Interactive: Prefer paths with the lowest expected | |||
Round Trip Time | Round Trip Time | |||
* Constant-Rate Streaming: Prefer paths that can satisfy the | * Constant-Rate Streaming: Prefer paths that can satisfy the | |||
requested Stream Send or Stream Receive Bitrate, based on | requested Stream Send or Stream Receive Bitrate, based on | |||
observed maximum throughput | observed maximum throughput | |||
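The ranking rules above can be sketched as a filter followed by a sort key. This Python example is hypothetical: `PathBranch`, the preference strings, and the profile names are illustrative. Only a latency-driven profile and a throughput-driven profile are modeled, and interface preference is assumed to outrank cached performance. Required and prohibited interface types remove branches from the tree; preferred and avoided types only change their order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PathBranch:
    interface_type: str                          # e.g. "wifi", "cellular" (illustrative)
    cached_rtt_ms: Optional[float]               # from a performance cache, if any
    cached_max_throughput_mbps: Optional[float]


def rank_paths(paths: List[PathBranch],
               interface_pref: Dict[str, str],
               capacity_profile: str) -> List[PathBranch]:
    """Hypothetical ranking of path branches.

    interface_pref maps an interface type to "require", "prohibit",
    "prefer", or "avoid".
    """
    required = {t for t, p in interface_pref.items() if p == "require"}
    prohibited = {t for t, p in interface_pref.items() if p == "prohibit"}
    # Non-conforming paths are left out of the tree entirely.
    eligible = [b for b in paths
                if b.interface_type not in prohibited
                and (not required or b.interface_type in required)]

    def key(b: PathBranch) -> Tuple[int, float]:
        pref = interface_pref.get(b.interface_type)
        pref_rank = {"prefer": 0, "avoid": 2}.get(pref, 1)
        if capacity_profile == "low_latency":
            # Lowest cached RTT first; unknown RTT sorts last.
            perf = b.cached_rtt_ms if b.cached_rtt_ms is not None else float("inf")
        else:
            # Highest cached throughput first; unknown throughput sorts last.
            tp = b.cached_max_throughput_mbps
            perf = -tp if tp is not None else float("inf")
        return (pref_rank, perf)

    return sorted(eligible, key=key)
```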
skipping to change at page 17, line 43 | skipping to change at page 18, line 6 | |||
If a leaf node has successfully completed its connection, all other | If a leaf node has successfully completed its connection, all other | |||
attempts should be made ineligible for use by the application for the | attempts should be made ineligible for use by the application for the | |||
original request. New connection attempts that involve transmitting | original request. New connection attempts that involve transmitting | |||
data on the network should not be started after another leaf node has | data on the network should not be started after another leaf node has | |||
completed successfully, as the connection as a whole has been | completed successfully, as the connection as a whole has been | |||
established. An implementation may choose to let certain handshakes | established. An implementation may choose to let certain handshakes | |||
and negotiations complete in order to gather metrics to influence | and negotiations complete in order to gather metrics to influence | |||
future connections. Similarly, an implementation may choose to hold | future connections. Similarly, an implementation may choose to hold | |||
onto fully established leaf nodes that were not the first to | onto fully established leaf nodes that were not the first to | |||
establish for use in future connections, but this approach is not | establish for use as part of a Pooled Connection, see Section 7.1, or | |||
recommended since those attempts were slower to connect and may | in future connections. In both cases, keeping additional connections | |||
exhibit less desirable properties. | is generally not recommended since those attempts were slower to | |||
connect and may exhibit less desirable properties. | ||||
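A minimal asyncio sketch of completing establishment follows. Attempts are started in rank order with a fixed stagger, the first leaf node to succeed is returned, and every other attempt, including those still waiting to start, is cancelled so that no new attempt transmits data afterwards. The `race()` function, the `stagger` value, and the use of a string as a connection handle are assumptions. A real implementation might instead keep late successes for a Pooled Connection or for gathering metrics, as discussed above.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

# Hypothetical: an attempt is a zero-argument callable that performs one
# leaf-node connection attempt and returns a handle (here just a str).
Attempt = Callable[[], Awaitable[str]]


async def _start_after(delay: float, attempt: Attempt) -> str:
    await asyncio.sleep(delay)
    return await attempt()


async def race(attempts: List[Attempt], stagger: float = 0.25) -> Tuple[int, str]:
    """Race leaf nodes in ranked order; return (index, handle) of the first success."""
    tasks = [asyncio.create_task(_start_after(i * stagger, a))
             for i, a in enumerate(attempts)]
    index_of = {task: i for i, task in enumerate(tasks)}
    pending = set(tasks)
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                # Failed attempts are skipped; racing continues with the rest.
                if not task.cancelled() and task.exception() is None:
                    return index_of[task], task.result()
        raise ConnectionError("all connection attempts failed")
    finally:
        # Cancel every other attempt, including ones still waiting out their
        # delay, so nothing new is started once one leaf node has connected.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```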
4.5.1. Determining Successful Establishment | 4.5.1. Determining Successful Establishment | |||
Implementations may select the criteria by which a leaf node is | Implementations may select the criteria by which a leaf node is | |||
considered to be successfully connected differently on a per-protocol | considered to be successfully connected differently on a per-protocol | |||
basis. If the only protocol being used is a transport protocol with | basis. If the only protocol being used is a transport protocol with | |||
a clear handshake, like TCP, then the obvious choice is to declare | a clear handshake, like TCP, then the obvious choice is to declare | |||
that node "connected" when the three-way handshake has been | that node "connected" when the three-way handshake has been | |||
completed. If the only protocol being used is an | completed. If the only protocol being used is an | |||
"unconnected" protocol, like UDP, the implementation may consider the | "unconnected" protocol, like UDP, the implementation may consider the | |||
skipping to change at page 20, line 39 | skipping to change at page 21, line 5 | |||
tuple can listen both for entirely new connections (a new HTTP/2 | tuple can listen both for entirely new connections (a new HTTP/2 | |||
stream on a new TCP connection, for example) and for new sub- | stream on a new TCP connection, for example) and for new sub- | |||
connections (a new HTTP/2 stream on an existing connection). If the | connections (a new HTTP/2 stream on an existing connection). If the | |||
abstraction of Connection presented to the application is mapped to | abstraction of Connection presented to the application is mapped to | |||
the multiplexed stream, then the Listener should deliver new | the multiplexed stream, then the Listener should deliver new | |||
Connection objects in the same way for either case. The | Connection objects in the same way for either case. The | |||
implementation should allow the application to introspect the | implementation should allow the application to introspect the | |||
Connection Group marked on the Connections to determine the grouping | Connection Group marked on the Connections to determine the grouping | |||
of the multiplexing. | of the multiplexing. | |||
5. Implementing Data Transfer | 5. Implementing Sending and Receiving Data | |||
5.1. Data transfer for streams, datagrams, and frames | ||||
The most basic mapping for sending a Message is an abstraction of | The most basic mapping for sending a Message is an abstraction of | |||
datagrams, in which the transport protocol naturally deals in | datagrams, in which the transport protocol naturally deals in | |||
discrete packets. Each Message here corresponds to a single | discrete packets. Each Message here corresponds to a single | |||
datagram. Generally, these will be short enough that sending and | datagram. Generally, these will be short enough that sending and | |||
receiving will always use a complete Message. | receiving will always use a complete Message. | |||
For protocols that expose byte-streams, the only delineation provided | For protocols that expose byte-streams, the only delineation provided | |||
by the protocol is the end of the stream in a given direction. Each | by the protocol is the end of the stream in a given direction. Each | |||
Message in this case corresponds to the entire stream of bytes in a | Message in this case corresponds to the entire stream of bytes in a | |||
direction. These Messages may be quite long, in which case they can | direction. These Messages may be quite long, in which case they can | |||
be sent in multiple parts. | be sent in multiple parts. | |||
Protocols that provide the framing (such as length-value protocols, | Protocols that provide the framing (such as length-value protocols, | |||
or protocols that use delimiters) provide data boundaries that may be | or protocols that use delimiters) provide data boundaries that may be | |||
longer than a traditional packet datagram. Each Message for framing | longer than a traditional packet datagram. Each Message for framing | |||
protocols corresponds to a single frame, which may be sent either as | protocols corresponds to a single frame, which may be sent either as | |||
a complete Message, or in multiple parts. | a complete Message, or in multiple parts. | |||
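To illustrate the length-value case, the following sketch frames each Message with a 4-byte big-endian length prefix on top of a byte stream, and splits a receive buffer back into complete Messages while keeping any trailing partial frame for later. The prefix size and byte order are arbitrary choices for this example, not a format defined by this document.

```python
import struct
from typing import List, Tuple

# Illustrative choice: 4-byte unsigned length in network byte order.
HEADER = struct.Struct("!I")


def frame(message: bytes) -> bytes:
    """Encode one Message as a length-value frame for a byte-stream transport."""
    return HEADER.pack(len(message)) + message


def deframe(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split received bytes into complete Messages plus any leftover partial frame."""
    messages: List[bytes] = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # Frame not complete yet; wait for more bytes.
        messages.append(buffer[offset + HEADER.size:end])
        offset = end
    return messages, buffer[offset:]
```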
5.1.1. Sending Messages | 5.1. Sending Messages | |||
The effect of the application sending a Message is determined by the | The effect of the application sending a Message is determined by the | |||
top-level protocol in the established Protocol Stack. That is, if | top-level protocol in the established Protocol Stack. That is, if | |||
the top-level protocol provides an abstraction of framed messages | the top-level protocol provides an abstraction of framed messages | |||
over a connection, the receiving application will be able to obtain | over a connection, the receiving application will be able to obtain | |||
multiple Messages on that connection, even if the framing protocol is | multiple Messages on that connection, even if the framing protocol is | |||
built on a byte-stream protocol like TCP. | built on a byte-stream protocol like TCP. | |||
5.1.1.1. Message Properties | 5.1.1. Message Properties | |||
o Lifetime: this should be implemented by removing the Message from | o Lifetime: this should be implemented by removing the Message from | |||
its queue of pending Messages after the Lifetime has expired. A | its queue of pending Messages after the Lifetime has expired. A | |||
queue of pending Messages within the transport system | queue of pending Messages within the transport system | |||
implementation that have yet to be handed to the Protocol Stack | implementation that have yet to be handed to the Protocol Stack | |||
can always support this property, but once a Message has been sent | can always support this property, but once a Message has been sent | |||
into the send buffer of a protocol, only certain protocols may | into the send buffer of a protocol, only certain protocols may | |||
support de-queueing a message. For example, TCP cannot remove | support de-queueing a message. For example, TCP cannot remove | |||
bytes from its send buffer, while in the case of SCTP, such control | bytes from its send buffer, while in the case of SCTP, such control | |||
over the SCTP send buffer can be exercised using the partial | over the SCTP send buffer can be exercised using the partial | |||
skipping to change at page 22, line 38 | skipping to change at page 23, line 5 | |||
to avoid transport-layer segmentation or network-layer | to avoid transport-layer segmentation or network-layer | |||
fragmentation. Some transports implement network-layer | fragmentation. Some transports implement network-layer | |||
fragmentation avoidance (Path MTU Discovery) without exposing this | fragmentation avoidance (Path MTU Discovery) without exposing this | |||
functionality to the application; in this case, only transport- | functionality to the application; in this case, only transport- | |||
layer segmentation should be avoided, by fitting the message into | layer segmentation should be avoided, by fitting the message into | |||
a single transport-layer segment or otherwise failing. Otherwise, | a single transport-layer segment or otherwise failing. Otherwise, | |||
network-layer fragmentation should be avoided--e.g. by requesting | network-layer fragmentation should be avoided--e.g. by requesting | |||
the IP Don't Fragment bit to be set in the case of UDP(-Lite) and IPv4 | the IP Don't Fragment bit to be set in the case of UDP(-Lite) and IPv4 | |||
(SET_DF in [RFC8304]). | (SET_DF in [RFC8304]). | |||
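The Lifetime behavior described earlier in this section can be sketched as a queue of pending Messages that drops expired entries when it is drained toward the Protocol Stack. `SendQueue` and `PendingMessage` are hypothetical names, and the injectable clock exists only to make the example testable. Expiry is enforced only while a Message is still in this queue; once it has been handed to a protocol such as TCP, it can no longer be withdrawn.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Optional, Tuple


@dataclass
class PendingMessage:
    data: bytes
    lifetime: Optional[float]  # seconds; None means the Message never expires
    enqueued_at: float = field(default_factory=time.monotonic)

    def expired(self, now: float) -> bool:
        return self.lifetime is not None and now - self.enqueued_at > self.lifetime


class SendQueue:
    """Hypothetical queue of Messages not yet handed to the Protocol Stack."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._queue: Deque[PendingMessage] = deque()

    def enqueue(self, data: bytes, lifetime: Optional[float] = None) -> None:
        self._queue.append(PendingMessage(data, lifetime, self._clock()))

    def drain(self) -> Tuple[List[bytes], List[bytes]]:
        """Return (Messages to hand to the stack, Messages dropped as expired)."""
        now = self._clock()
        to_send: List[bytes] = []
        expired: List[bytes] = []
        while self._queue:
            msg = self._queue.popleft()
            (expired if msg.expired(now) else to_send).append(msg.data)
        return to_send, expired
```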
5.1.1.2. Send Completion | 5.1.2. Send Completion | |||
The application should be notified whenever a Message or partial | The application should be notified whenever a Message or partial | |||
Message has been consumed by the Protocol Stack, or has failed to | Message has been consumed by the Protocol Stack, or has failed to | |||
send. The meaning of the Message being consumed by the stack may | send. The meaning of the Message being consumed by the stack may | |||
vary depending on the protocol. For a basic datagram protocol like | vary depending on the protocol. For a basic datagram protocol like | |||
UDP, this may correspond to the time when the packet is sent into the | UDP, this may correspond to the time when the packet is sent into the | |||
interface driver. For a protocol that buffers data in queues, like | interface driver. For a protocol that buffers data in queues, like | |||
TCP, this may correspond to when the data has entered the send | TCP, this may correspond to when the data has entered the send | |||
buffer. | buffer. | |||
5.1.1.3. Batching Sends | 5.1.3. Batching Sends | |||
Since sending a Message may involve a context switch between the | Since sending a Message may involve a context switch between the | |||
application and the transport system, sending patterns that involve | application and the transport system, sending patterns that involve | |||
multiple small Messages can incur high overhead if each needs to be | multiple small Messages can incur high overhead if each needs to be | |||
enqueued separately. To avoid this, the application should have a | enqueued separately. To avoid this, the application should have a | |||
way to indicate a batch of Send actions, during which time the | way to indicate a batch of Send actions, during which time the | |||
implementation will hold off on processing Messages until the batch | implementation will hold off on processing Messages until the batch | |||
is complete. This can also reduce context switches when enqueuing data | is complete. This can also reduce context switches when enqueuing data | |||
in the interface driver if the operation can be batched. | in the interface driver if the operation can be batched. | |||
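The batching behaviour described above can be sketched as follows. This is a minimal illustration, not an interface defined by this document: `BatchingSender`, `send`, `batch`, and the `enqueue` callback (which stands in for one crossing into the transport system) are all hypothetical names.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


class BatchingSender:
    """Hypothetical sender that defers enqueuing Messages during a batch."""

    def __init__(self, enqueue: Callable[[List[bytes]], None]) -> None:
        # `enqueue` is an assumed stand-in for one call into the transport system.
        self._enqueue = enqueue
        self._pending: List[bytes] = []
        self._depth = 0

    def send(self, message: bytes) -> None:
        if self._depth > 0:
            # Inside a batch: hold the Message until the batch completes.
            self._pending.append(message)
        else:
            # Outside a batch: each Message is enqueued on its own.
            self._enqueue([message])

    @contextmanager
    def batch(self) -> Iterator["BatchingSender"]:
        self._depth += 1
        try:
            yield self
        finally:
            self._depth -= 1
            if self._depth == 0 and self._pending:
                pending, self._pending = self._pending, []
                # A single enqueue call covers every Message in the batch.
                self._enqueue(pending)
```

Nested batches are flattened: only the outermost batch triggers the enqueue, so two small Messages sent inside one batch cost one crossing instead of two.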
5.1.2. Receiving Messages | 5.2. Receiving Messages | |||
Similar to sending, Receiving a Message is determined by the top- | Similar to sending, Receiving a Message is determined by the top- | |||
level protocol in the established Protocol Stack. The main | level protocol in the established Protocol Stack. The main | |||
difference with Receiving is that the size and boundaries of the | difference with Receiving is that the size and boundaries of the | |||
Message are not known beforehand. In its Receive action, the | Message are not known beforehand. In its Receive action, the | |||
application can specify parameters for the Message, which help the | application can specify parameters for the Message, which help the | |||
implementation know how much data to deliver and when. For example, | implementation know how much data to deliver and when. For example, | |||
if the application only wants to receive a complete Message, the | if the application only wants to receive a complete Message, the | |||
implementation should wait until an entire Message (datagram, stream, | implementation should wait until an entire Message (datagram, stream, | |||
or frame) is read before delivering any Message content to the | or frame) is read before delivering any Message content to the | |||
skipping to change at page 23, line 41 ¶ | skipping to change at page 24, line 5 ¶ | |||
supports a byte-stream and no deframers were supported, the | supports a byte-stream and no deframers were supported, the | |||
application must specify the minimum number of bytes of Message | application must specify the minimum number of bytes of Message | |||
content it wants to receive (which may be just a single byte) to | content it wants to receive (which may be just a single byte) to | |||
control the flow of received data. | control the flow of received data. | |||
If a Connection becomes finished before a requested Receive action | If a Connection becomes finished before a requested Receive action | |||
can be satisfied, the implementation should deliver any partial | can be satisfied, the implementation should deliver any partial | |||
Message content outstanding, or if none is available, an indication | Message content outstanding, or if none is available, an indication | |||
that there will be no more received Messages. | that there will be no more received Messages. | |||
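For a byte-stream protocol without a deframer, the Receive behaviour above can be sketched as a buffer that releases data once a minimum amount is available and, after the Connection finishes, delivers any leftover partial content before signalling that nothing more will arrive. `ByteStreamReceiver`, `receive(min_incomplete, max_length)`, and `NoMoreMessages` are hypothetical names chosen for this illustration.

```python
from typing import Optional


class NoMoreMessages(Exception):
    """Hypothetical signal: the Connection finished and nothing remains."""


class ByteStreamReceiver:
    def __init__(self) -> None:
        self._buffer = bytearray()
        self._finished = False

    def on_data(self, data: bytes) -> None:
        self._buffer += data

    def on_finished(self) -> None:
        self._finished = True

    def receive(self, min_incomplete: int, max_length: int) -> Optional[bytes]:
        """Return up to max_length bytes once min_incomplete are buffered.

        Returns None when the caller should keep waiting for more data.
        """
        if len(self._buffer) >= min_incomplete:
            return self._take(max_length)
        if self._finished:
            if self._buffer:
                # Connection finished early: deliver the partial content.
                return self._take(max_length)
            raise NoMoreMessages()
        return None

    def _take(self, n: int) -> bytes:
        chunk = bytes(self._buffer[:n])
        del self._buffer[:n]
        return chunk
```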
5.2. Handling of data for fast-open protocols | 5.3. Handling of data for fast-open protocols | |||
Several protocols allow sending higher-level protocol or application | Several protocols allow sending higher-level protocol or application | |||
data within the first packet of their protocol establishment, such as | data within the first packet of their protocol establishment, such as | |||
TCP Fast Open [RFC7413] and TLS 1.3 [RFC8446]. This approach is | TCP Fast Open [RFC7413] and TLS 1.3 [RFC8446]. This approach is | |||
referred to as sending Zero-RTT (0-RTT) data. This is a desirable | referred to as sending Zero-RTT (0-RTT) data. This is a desirable | |||
property, but poses challenges to an implementation that uses racing | property, but poses challenges to an implementation that uses racing | |||
during connection establishment. | during connection establishment. | |||
If the application has 0-RTT data to send in any protocol handshakes, | If the application has 0-RTT data to send in any protocol handshakes, | |||
it needs to provide this data before the handshakes have begun. When | it needs to provide this data before the handshakes have begun. When | |||
skipping to change at page 24, line 33 ¶ | skipping to change at page 24, line 46 ¶ | |||
cookies, previously established TLS tickets, or out-of-band | cookies, previously established TLS tickets, or out-of-band | |||
distributed pre-shared keys (PSKs). Implementations should be aware | distributed pre-shared keys (PSKs). Implementations should be aware | |||
of security concerns around using these tokens across multiple | of security concerns around using these tokens across multiple | |||
addresses or paths when racing. In the case of TLS, any given ticket | addresses or paths when racing. In the case of TLS, any given ticket | |||
or PSK should only be used on one leaf node. If implementations have | or PSK should only be used on one leaf node. If implementations have | |||
multiple tickets available from a previous connection, each leaf node | multiple tickets available from a previous connection, each leaf node | |||
attempt must use a different ticket. In effect, each leaf node will | attempt must use a different ticket. In effect, each leaf node will | |||
send the same early application data, yet encoded (encrypted) | send the same early application data, yet encoded (encrypted) | |||
differently on the wire. | differently on the wire. | |||
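The rule that each racing leaf node must use a different ticket can be sketched as a simple assignment step run before the race starts. The leaf node labels and the `assign_tickets` helper are hypothetical; a leaf node that receives no ticket cannot send TLS early data and would fall back to a full handshake without 0-RTT data.

```python
from typing import Dict, List, Optional


def assign_tickets(leaf_nodes: List[str], tickets: List[bytes]) -> Dict[str, Optional[bytes]]:
    """Give each racing leaf node its own ticket, never reusing one.

    Leaf nodes left with None must not attempt 0-RTT with a ticket.
    """
    available = list(tickets)
    assignment: Dict[str, Optional[bytes]] = {}
    for leaf in leaf_nodes:
        # pop(0) removes the ticket so no other leaf node can reuse it.
        assignment[leaf] = available.pop(0) if available else None
    return assignment
```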
6. Implementing Maintenance | 6. Implementing Message Framers | |||
Maintenance encompasses changes that the application can request to a | Message Framers are pieces of code that define simple transformations | |||
Connection, or that a Connection can react to based on system and | between application Message data and raw transport protocol data. A | |||
network changes. | Framer can encapsulate or encode outbound Messages, and decapsulate | |||
or decode inbound data into Messages. | ||||
6.1. Managing Connections | While many protocols can be represented as Message Framers, for the | |||
purposes of the Transport Services interface, Message Framers are a | ||||
way for applications or application frameworks to define their own | ||||
Message parsing to be included within a Connection's Protocol Stack. | ||||
As an example, TLS can serve the purpose of framing data over TCP, but is | ||||
exposed as a protocol natively supported by the Transport Services | ||||
interface. | ||||
Appendix A.1 of [I-D.ietf-taps-minset] explains, using primitives | Most Message Framers fall into one of two categories: | |||
from [RFC8303] and [RFC8304], how to implement changing some of the | ||||
following protocol properties of an established connection with TCP | ||||
and UDP. Below, we amend this description for other protocols (if | ||||
applicable) and extend it with Connection Properties that are not | ||||
contained in [I-D.ietf-taps-minset]. | ||||
o Notification of excessive retransmissions: TODO | o Header-prefixed record formats, such as a basic Type-Length-Value | |||
o Retransmission threshold before excessive retransmission | (TLV) structure | |||
notification: TODO; for TCP, this can be done using ERROR.TCP | ||||
described in section 4 of [RFC8303]. | ||||
o Notification of ICMP soft error message arrival: TODO | o Delimiter-separated formats, such as HTTP/1.1. | |||
o Required minimum coverage of the checksum for receiving: for UDP- | Common Message Framers can be provided by the Transport Services | |||
Lite, this can be done using the primitive | implementation, but an implementation ought to allow custom Message | |||
SET_MIN_CHECKSUM_COVERAGE.UDP-Lite described in section 4 of | Framers to be defined by the application or some other piece of | |||
[RFC8303]. | software. This section describes one possible interface for defining | |||
Message Framers as an example. | ||||
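As an illustration of the delimiter-separated category listed above, a framer can append a delimiter on send and split inbound data on it, holding back any trailing partial Message. This sketch assumes CRLF as the delimiter (as in HTTP/1.1 header lines); `DelimiterFramer`, `frame`, and `deframe` are hypothetical names, not the interface described in this section.

```python
from typing import List


class DelimiterFramer:
    """Hypothetical framer for delimiter-separated Messages."""

    def __init__(self, delimiter: bytes = b"\r\n") -> None:
        self.delimiter = delimiter
        self._pending = bytearray()

    def frame(self, message: bytes) -> bytes:
        # A Message containing the delimiter would be split by the receiver.
        if self.delimiter in message:
            raise ValueError("message contains the delimiter")
        return message + self.delimiter

    def deframe(self, data: bytes) -> List[bytes]:
        """Return every complete Message; keep trailing partial data."""
        self._pending += data
        messages = []
        while True:
            index = self._pending.find(self.delimiter)
            if index < 0:
                break
            messages.append(bytes(self._pending[:index]))
            del self._pending[: index + len(self.delimiter)]
        return messages
```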
o Priority (Connection): TODO; for SCTP, this can be done using the | 6.1. Defining Message Framers | |||
primitive CONFIGURE_STREAM_SCHEDULER.SCTP described in section 4 | ||||
of [RFC8303]. | ||||
o Timeout for aborting Connection: for SCTP, this can be done using | A Message Framer is primarily defined by the set of code that handles | |||
the primitive CHANGE_TIMEOUT.SCTP described in section 4 of | events for a framer implementation, specifically how it handles | |||
[RFC8303]. | inbound and outbound data parsing. The piece of code that implements | |||
custom framing logic will be referred to as the "framer | ||||
implementation", which may be provided by the Transport Services | ||||
implementation or the application itself. The Message Framer refers | ||||
to the object or piece of code within the main Connection | ||||
implementation that delivers events to the custom framer | ||||
implementation whenever data is ready to be parsed or framed. | ||||
o Connection group transmission scheduler: for SCTP, this can be | When a Connection establishment attempt begins, an event can be | |||
done using the primitive SET_STREAM_SCHEDULER.SCTP described in | delivered to notify the framer implementation that a new Connection | |||
section 4 of [RFC8303]. | is being created. Similarly, a stop event can be delivered when a | |||
Connection is being torn down. The framer implementation can use the | ||||
Connection object to look up specific properties of the Connection or | ||||
the network being used that may influence how to frame Messages. | ||||
o Maximum message size concurrent with Connection establishment: | MessageFramer -> Start(Connection) | |||
TODO | MessageFramer -> Stop(Connection) | |||
o Maximum Message size before fragmentation or segmentation: TODO | When a Message Framer generates a "Start" event, the framer | |||
implementation has the opportunity to start writing some data prior | ||||
to the Connection delivering its "Ready" event. This allows the | ||||
implementation to communicate control data to the remote endpoint | ||||
that can be used to parse Messages. | ||||
o Maximum Message size on send: TODO | MessageFramer.MakeConnectionReady(Connection) | |||
o Maximum Message size on receive: TODO | At any time, if the framer implementation encounters a fatal error, | |||
it can also cause the Connection to fail and provide an error. | ||||
o Capacity Profile: TODO | MessageFramer.FailConnection(Connection, Error) | |||
o Bounds on Send or Receive Rate: TODO | Before an implementation marks a Message Framer as ready, it can also | |||
dynamically add a protocol or framer above it in the stack. This | ||||
allows protocols such as STARTTLS, which need to add TLS conditionally, | ||||
to modify the Protocol Stack based on a handshake result. | ||||
o TCP-specific Property: User Timeout: for TCP, this can be | otherFramer := NewMessageFramer() | |||
configured using the primitive CHANGE_TIMEOUT.TCP described in | MessageFramer.PrependFramer(Connection, otherFramer) | |||
section 4 of [RFC8303]. | ||||
It may happen that the application attempts to set a Protocol | 6.2. Sender-side Message Framing | |||
Property which does not apply to the actually chosen protocol. In | ||||
this case, the implementation should fail gracefully, i.e., it may | ||||
give a warning to the application, but it should not terminate the | ||||
Connection. | ||||
6.2. Handling Path Changes | Message Framers generate an event whenever a Connection sends a new | |||
Message. | ||||
MessageFramer -> NewSentMessage<Connection, MessageData, MessageContext, IsEndOfMessage> | ||||
Upon receiving this event, a framer implementation is responsible for | ||||
performing any necessary transformations and sending the resulting | ||||
data to the next protocol. Implementations SHOULD ensure that there | ||||
is a way to pass the original data through without copying to improve | ||||
performance. | ||||
MessageFramer.Send(Connection, Data) | ||||
To provide an example, a simple protocol that adds a length as a | ||||
header would receive the "NewSentMessage" event, create a data | ||||
representation of the length of the Message data, and then send a | ||||
block of data that is the concatenation of the length header and the | ||||
original Message data. | ||||
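The length-header example above can be sketched as a NewSentMessage handler. The wire format (a 4-byte unsigned big-endian length) is an assumption made for illustration, and the `send` callback stands in for "MessageFramer.Send(Connection, Data)". The sketch also assumes each event carries a complete Message (IsEndOfMessage is true).

```python
import struct
from typing import Callable

# Assumed wire format: 4-byte unsigned big-endian length, then the body.
LENGTH_HEADER = struct.Struct("!I")


def handle_new_sent_message(send: Callable[[bytes], None], message_data: bytes) -> None:
    """Sketch of a NewSentMessage handler for a length-prefixed framer.

    `send` is a hypothetical stand-in for MessageFramer.Send(Connection, Data).
    """
    if len(message_data) > 0xFFFFFFFF:
        raise ValueError("message too large for a 32-bit length header")
    header = LENGTH_HEADER.pack(len(message_data))
    # Concatenation copies the body for simplicity; following the no-copy
    # advice above, an implementation would pass header and body separately.
    send(header + message_data)
```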
6.3. Receiver-side Message Framing | ||||
In order to parse a received flow of data into Messages, the Message | ||||
Framer notifies the framer implementation whenever new data is | ||||
available to parse. | ||||
MessageFramer -> HandleReceivedData<Connection> | ||||
Upon receiving this event, the framer implementation can inspect the | ||||
inbound data. The data is parsed from a particular cursor | ||||
representing the unprocessed data. The framer implementation requests | ||||
a specific amount of data it needs to have available in order to parse. | ||||
If the data is not available, the parse fails. | ||||
MessageFramer.Parse(Connection, MinimumIncompleteLength, MaximumLength) -> (Data, MessageContext, IsEndOfMessage) | ||||
The framer implementation can directly advance the receive cursor | ||||
once it has parsed data to effectively discard data (for example, | ||||
discard a header once the content has been parsed). | ||||
To deliver a Message to the application, the framer implementation | ||||
can either directly deliver data that it has allocated, or deliver a | ||||
range of data directly from the underlying transport and | ||||
simultaneously advance the receive cursor. | ||||
MessageFramer.AdvanceReceiveCursor(Connection, Length) | ||||
MessageFramer.DeliverAndAdvanceReceiveCursor(Connection, MessageContext, Length, IsEndOfMessage) | ||||
MessageFramer.Deliver(Connection, MessageContext, Data, IsEndOfMessage) | ||||
Note that "MessageFramer.DeliverAndAdvanceReceiveCursor" allows the | ||||
framer implementation to earmark bytes as part of a Message even | ||||
before they are received by the transport. This allows the delivery | ||||
of very large Messages without requiring the implementation to | ||||
directly inspect all of the bytes. | ||||
To provide an example, a simple protocol that parses a length as a | ||||
header value would receive the "HandleReceivedData" event, and call | ||||
"Parse" with a minimum and maximum set to the length of the header | ||||
field. Once the parse succeeded, it would call | ||||
"AdvanceReceiveCursor" with the length of the header field, and then | ||||
call "DeliverAndAdvanceReceiveCursor" with the length of the body | ||||
that was parsed from the header, marking the new Message as complete. | ||||
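The receive-side steps above (Parse the header, AdvanceReceiveCursor past it, then DeliverAndAdvanceReceiveCursor for the body) can be sketched against a toy connection object. `ToyFramerConnection` and its snake_case methods are hypothetical stand-ins for the MessageFramer calls, and the 4-byte big-endian length header is an assumed format. The earmarking logic shows how a body can be claimed before all of its bytes have arrived.

```python
import struct
from typing import List, Optional

# Assumed wire format: 4-byte unsigned big-endian length, then the body.
LENGTH_HEADER = struct.Struct("!I")


class ToyFramerConnection:
    """Hypothetical stand-in for the receive side of a Message Framer."""

    def __init__(self) -> None:
        self.buffer = bytearray()             # received bytes not yet consumed
        self.messages: List[bytes] = []       # Messages delivered to the app
        self._earmark: Optional[int] = None   # body bytes still owed
        self._body = bytearray()

    def receive(self, data: bytes) -> None:
        # New transport data first fills any previously earmarked Message.
        self.buffer += data
        self._fill_earmarked()

    def is_earmarking(self) -> bool:
        return self._earmark is not None

    def parse(self, minimum: int, maximum: int) -> Optional[bytes]:
        # Mirrors Parse(): fails (None) if fewer than `minimum` bytes exist.
        if len(self.buffer) < minimum:
            return None
        return bytes(self.buffer[:maximum])

    def advance_receive_cursor(self, length: int) -> None:
        del self.buffer[:length]

    def deliver_and_advance_receive_cursor(self, length: int) -> None:
        # Earmark the next `length` bytes as one Message, even if not received.
        self._earmark = length
        self._fill_earmarked()

    def _fill_earmarked(self) -> None:
        if self._earmark is None:
            return
        take = min(self._earmark, len(self.buffer))
        self._body += self.buffer[:take]
        del self.buffer[:take]
        self._earmark -= take
        if self._earmark == 0:
            self.messages.append(bytes(self._body))
            self._body = bytearray()
            self._earmark = None


def handle_received_data(conn: ToyFramerConnection) -> None:
    """Sketch of a HandleReceivedData handler for a length-prefixed framer."""
    while not conn.is_earmarking():
        header = conn.parse(LENGTH_HEADER.size, LENGTH_HEADER.size)
        if header is None:
            return  # wait for the rest of the header
        (body_length,) = LENGTH_HEADER.unpack(header)
        conn.advance_receive_cursor(LENGTH_HEADER.size)  # discard the header
        conn.deliver_and_advance_receive_cursor(body_length)
```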
7. Implementing Connection Management | ||||
Once a Connection is established, the Transport Services system | ||||
allows applications to interact with the Connection by modifying or | ||||
inspecting Connection Properties. A Connection can also generate | ||||
events in the form of Soft Errors. | ||||
The set of Connection Properties that are supported for setting and | ||||
getting on a Connection are described in [I-D.ietf-taps-interface]. | ||||
For any properties that are generic, and thus could apply to all | ||||
protocols being used by a Connection, the Transport System should | ||||
store the properties in generic storage and notify all protocol | ||||
instances in the Protocol Stack whenever the properties have been | ||||
modified by the application. For protocol-specific properties, such | ||||
as the User Timeout that applies to TCP, the Transport System only | ||||
needs to update the relevant protocol instance. | ||||
If an error is encountered in setting a property (for example, if the | ||||
application tries to set a TCP-specific property on a Connection that | ||||
is not using TCP), the action should fail gracefully. The | ||||
application may be informed of the error, but the Connection itself | ||||
should not be terminated. | ||||
The Transport Services implementation should allow protocol instances | ||||
in the Protocol Stack to pass up arbitrary generic or protocol- | ||||
specific errors that can be delivered to the application as Soft | ||||
Errors. These allow the application to be informed of ICMP errors, | ||||
and other similar events. | ||||
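The property handling described above can be sketched as a per-Connection store that broadcasts generic properties to every Protocol Instance, routes protocol-specific ones only to the matching instance, and reports (rather than terminates on) properties that do not apply. The naming convention used here, where a protocol-specific key is prefixed with the protocol name (e.g. "tcp.userTimeout"), and all class, method, and property names are hypothetical.

```python
from typing import Any, Dict, List, Optional


class ProtocolInstance:
    def __init__(self, name: str) -> None:
        self.name = name
        self.seen: Dict[str, Any] = {}

    def property_changed(self, key: str, value: Any) -> None:
        self.seen[key] = value


class PropertyStore:
    """Hypothetical per-Connection property storage."""

    def __init__(self, stack: List[ProtocolInstance]) -> None:
        self.stack = stack
        self.generic: Dict[str, Any] = {}

    def set(self, key: str, value: Any) -> Optional[str]:
        """Return None on success, or an error string for the application."""
        if "." not in key:
            # Generic property: store once, notify every Protocol Instance.
            self.generic[key] = value
            for instance in self.stack:
                instance.property_changed(key, value)
            return None
        protocol = key.split(".", 1)[0]
        targets = [i for i in self.stack if i.name == protocol]
        if not targets:
            # Fail gracefully: report the error, keep the Connection open.
            return f"{key} does not apply to this Protocol Stack"
        for instance in targets:
            instance.property_changed(key, value)
        return None
```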
7.1. Pooled Connection | ||||
For protocols that employ request/response pairs and do not require | ||||
in-order delivery of the responses, like HTTP, the transport | ||||
implementation may distribute interactions across several underlying | ||||
transport connections. For these kinds of protocols, implementations | ||||
may hide the connection management and only expose a single | ||||
Connection object and the individual requests/responses as messages. | ||||
These Pooled Connections can use multiple connections or multiple | ||||
streams of multi-streaming connections between endpoints, as long as | ||||
all of these satisfy the requirements and prohibitions specified in | ||||
the Selection Properties of the Pooled Connection. This enables | ||||
implementations to realize transparent connection coalescing, | ||||
connection migration, and to perform per-message endpoint and path | ||||
selection by choosing among these underlying connections. | ||||
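Per-message selection in a Pooled Connection can be sketched as filtering the underlying connections by the required and prohibited properties, then choosing among the eligible ones. The feature labels and the least-outstanding-requests tie-breaker are assumptions for illustration; this document does not prescribe a selection policy.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, List, Optional


@dataclass
class UnderlyingConnection:
    name: str
    features: FrozenSet[str]  # hypothetical labels, e.g. "reliability"
    in_flight: int = 0        # outstanding requests on this connection


def pick_connection(
    pool: List[UnderlyingConnection],
    require: AbstractSet[str],
    prohibit: AbstractSet[str],
) -> Optional[UnderlyingConnection]:
    """Pick an underlying connection that satisfies the Selection Properties."""
    eligible = [
        c for c in pool
        if require <= c.features and not (prohibit & c.features)
    ]
    if not eligible:
        return None
    # Assumed policy: prefer the connection with the fewest outstanding requests.
    return min(eligible, key=lambda c: c.in_flight)
```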
7.2. Handling Path Changes | ||||
When a path change occurs, the Transport Services implementation is | When a path change occurs, the Transport Services implementation is | |||
responsible for notifying Protocol Instances in the Protocol Stack. | responsible for notifying Protocol Instances in the Protocol Stack. | |||
If the Protocol Stack includes a transport protocol that supports | If the Protocol Stack includes a transport protocol that supports | |||
multipath connectivity, an update to the available paths should | multipath connectivity, an update to the available paths should | |||
inform the Protocol Instance of the new set of paths that are | inform the Protocol Instance of the new set of paths that are | |||
permissible based on the Selection Properties passed by the | permissible based on the Selection Properties passed by the | |||
application. A multipath protocol can establish new subflows over | application. A multipath protocol can establish new subflows over | |||
new paths, and should tear down subflows over paths that are no | new paths, and should tear down subflows over paths that are no | |||
longer available. If the Protocol Stack includes a transport | longer available. Pooled Connections (Section 7.1) may add or remove | |||
protocol that does not support multipath, but supports migrating | underlying transport connections in a similar manner. If the | |||
between paths, the update to available paths can be used as the | Protocol Stack includes a transport protocol that does not support | |||
trigger for migrating the connection. For protocols that do not | multipath, but supports migrating between paths, the update to | |||
support multipath or migration, the Protocol Instances may be | available paths can be used as the trigger for migrating the | |||
informed of the path change, but should not be forcibly disconnected | connection. For protocols that do not support multipath or | |||
if the previously used path becomes unavailable. An exception to | migration, the Protocol Instances may be informed of the path change, | |||
this case is if the System Policy changes to prohibit traffic from | but should not be forcibly disconnected if the previously used path | |||
the Connection based on its properties, in which case the Protocol | becomes unavailable. An exception to this case is if the System | |||
Stack should be disconnected. | Policy changes to prohibit traffic from the Connection based on its | |||
properties, in which case the Protocol Stack should be disconnected. | ||||
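The path-change rules above reduce to a small decision per Protocol Stack. In this sketch the System Policy check is applied first for every protocol, which is one reading of the text rather than something it states explicitly; `PathAction` and `on_path_change` are hypothetical names.

```python
from enum import Enum, auto


class PathAction(Enum):
    UPDATE_SUBFLOWS = auto()  # multipath: add/remove subflows
    MIGRATE = auto()          # migration-capable: move to a permitted path
    NOTIFY_ONLY = auto()      # inform, but do not forcibly disconnect
    DISCONNECT = auto()       # System Policy now prohibits the traffic


def on_path_change(supports_multipath: bool,
                   supports_migration: bool,
                   policy_prohibits: bool) -> PathAction:
    """Choose how to react to an update of the available paths."""
    if policy_prohibits:
        return PathAction.DISCONNECT
    if supports_multipath:
        return PathAction.UPDATE_SUBFLOWS
    if supports_migration:
        return PathAction.MIGRATE
    return PathAction.NOTIFY_ONLY
```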
7. Implementing Termination | 8. Implementing Connection Termination | |||
With TCP, when an application closes a connection, this means that it | With TCP, when an application closes a connection, this means that it | |||
has no more data to send (but expects all data that has been handed | has no more data to send (but expects all data that has been handed | |||
over to be reliably delivered). However, with TCP only, "close" does | over to be reliably delivered). However, with TCP only, "close" does | |||
not mean that the application will stop receiving data. This is | not mean that the application will stop receiving data. This is | |||
related to TCP's ability to support half-closed connections. | related to TCP's ability to support half-closed connections. | |||
SCTP is an example of a protocol that does not support such half- | SCTP is an example of a protocol that does not support such half- | |||
closed connections. Hence, with SCTP, the meaning of "close" is | closed connections. Hence, with SCTP, the meaning of "close" is | |||
stricter: an application has no more data to send (but expects all | stricter: an application has no more data to send (but expects all | |||
skipping to change at page 27, line 21 ¶ | skipping to change at page 30, line 5 ¶ | |||
Initiate action provokes a ConnectionReceived event at its peer. | Initiate action provokes a ConnectionReceived event at its peer. | |||
For Close (provoking a Finished event) and Abort (provoking a | For Close (provoking a Finished event) and Abort (provoking a | |||
ConnectionError event), the same logic applies: while it is desirable | ConnectionError event), the same logic applies: while it is desirable | |||
to be informed when a peer closes or aborts a Connection, whether | to be informed when a peer closes or aborts a Connection, whether | |||
this is possible depends on the underlying protocol, and no | this is possible depends on the underlying protocol, and no | |||
guarantees can be given. With SCTP, the transport system can use the | guarantees can be given. With SCTP, the transport system can use the | |||
stream reset procedure to cause a Finished event upon a Close action | stream reset procedure to cause a Finished event upon a Close action | |||
from the peer [NEAT-flow-mapping]. | from the peer [NEAT-flow-mapping]. | |||
8. Cached State | 9. Cached State | |||
Beyond a single Connection's lifetime, it is useful for an | Beyond a single Connection's lifetime, it is useful for an | |||
implementation to keep state and history. This cached state can help | implementation to keep state and history. This cached state can help | |||
improve future Connection establishment by re-using results and | improve future Connection establishment by re-using results and | |||
credentials, and favoring paths and protocols that performed well in | credentials, and favoring paths and protocols that performed well in | |||
the past. | the past. | |||
Cached state may be associated with different Endpoints for the same | Cached state may be associated with different Endpoints for the same | |||
Connection, depending on the protocol generating the cached content. | Connection, depending on the protocol generating the cached content. | |||
For example, session tickets for TLS are associated with specific | For example, session tickets for TLS are associated with specific | |||
endpoints, and thus should be cached based on a Connection's hostname | endpoints, and thus should be cached based on a Connection's hostname | |||
Endpoint (if applicable). On the other hand, performance | Endpoint (if applicable). On the other hand, performance | |||
characteristics of a path are more likely tied to the IP address and | characteristics of a path are more likely tied to the IP address and | |||
subnet being used. | subnet being used. | |||
8.1. Protocol state caches | 9.1. Protocol state caches | |||
Some protocols will have long-term state to be cached in association | Some protocols will have long-term state to be cached in association | |||
with Endpoints. This state often expires after some period of time, | with Endpoints. This state often expires after some period of time, | |||
so the implementation should allow each protocol to specify | so the implementation should allow each protocol to specify | |||
an expiration for cached content. | an expiration for cached content. | |||
Examples of cached protocol state include: | Examples of cached protocol state include: | |||
o The DNS protocol can cache resolution answers (A and AAAA queries, | o The DNS protocol can cache resolution answers (A and AAAA queries, | |||
for example), associated with a Time To Live (TTL) to be used for | for example), associated with a Time To Live (TTL) to be used for | |||
skipping to change at page 28, line 19 ¶ | skipping to change at page 31, line 5 ¶ | |||
influence an implementation's preference between several candidate | influence an implementation's preference between several candidate | |||
Protocol Stacks. For example, if two IP address Endpoints are | Protocol Stacks. For example, if two IP address Endpoints are | |||
otherwise equally preferred, an implementation may choose to attempt | otherwise equally preferred, an implementation may choose to attempt | |||
a connection to an address for which it has a TCP Fast Open cookie. | a connection to an address for which it has a TCP Fast Open cookie. | |||
Applications must have a way to flush protocol cache state if | Applications must have a way to flush protocol cache state if | |||
desired. This may be necessary, for example, if application-layer | desired. This may be necessary, for example, if application-layer | |||
identifiers rotate and clients wish to avoid linkability via | identifiers rotate and clients wish to avoid linkability via | |||
trackable TLS tickets or TFO cookies. | trackable TLS tickets or TFO cookies. | |||
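A protocol state cache with per-protocol expiration and an application-triggered flush can be sketched as below. Entries are keyed by protocol and Endpoint, and each protocol supplies its own lifetime (for example, a DNS answer's TTL). `ProtocolStateCache` and its methods are hypothetical, and the injectable clock exists only to make expiry testable.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ProtocolStateCache:
    """Hypothetical cache of protocol state keyed by (protocol, Endpoint)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[Any, float]] = {}

    def put(self, protocol: str, endpoint: str, value: Any, lifetime: float) -> None:
        # Each protocol chooses its own lifetime, e.g. a DNS answer's TTL.
        self._entries[(protocol, endpoint)] = (value, self._clock() + lifetime)

    def get(self, protocol: str, endpoint: str) -> Optional[Any]:
        entry = self._entries.get((protocol, endpoint))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired state is removed lazily on lookup.
            del self._entries[(protocol, endpoint)]
            return None
        return value

    def flush(self, protocol: Optional[str] = None) -> None:
        # Lets applications drop state, e.g. to avoid linkability via tickets.
        if protocol is None:
            self._entries.clear()
        else:
            self._entries = {
                k: v for k, v in self._entries.items() if k[0] != protocol
            }
```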
8.2. Performance caches | 9.2. Performance caches | |||
In addition to protocol state, Protocol Instances should provide data | In addition to protocol state, Protocol Instances should provide data | |||
into a performance-oriented cache to help guide future protocol and | into a performance-oriented cache to help guide future protocol and | |||
path selection. Some performance information can be gathered | path selection. Some performance information can be gathered | |||
generically across several protocols to allow predictive comparisons | generically across several protocols to allow predictive comparisons | |||
between protocols on given paths: | between protocols on given paths: | |||
o Observed Round Trip Time | o Observed Round Trip Time | |||
o Connection Establishment latency | o Connection Establishment latency | |||
skipping to change at page 29, line 16 ¶ | skipping to change at page 32, line 5 ¶ | |||
depending on the nature of the value. Certain information, like the | depending on the nature of the value. Certain information, like the | |||
connection establishment success rate to a Remote Endpoint using a | connection establishment success rate to a Remote Endpoint using a | |||
given protocol stack, can be stored for a long period of time (hours | given protocol stack, can be stored for a long period of time (hours | |||
or longer), since it is expected that the capabilities of the Remote | or longer), since it is expected that the capabilities of the Remote | |||
Endpoint are not changing very quickly. On the other hand, Round | Endpoint are not changing very quickly. On the other hand, Round | |||
Trip Time observed by TCP over a particular network path may vary | Trip Time observed by TCP over a particular network path may vary | |||
over a relatively short time interval. For such values, the | over a relatively short time interval. For such values, the | |||
implementation should remove them from the cache more quickly, or | implementation should remove them from the cache more quickly, or | |||
treat older values with less confidence/weight. | treat older values with less confidence/weight. | |||
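One way to "treat older values with less confidence/weight" is an age-weighted average in which each sample's weight halves every `half_life` seconds, with an optional hard cut-off for removing stale samples. The exponential weighting and both parameters are assumptions chosen for illustration. A short half-life suits volatile values such as Round Trip Time, and a much longer one suits slowly changing values such as establishment success rate.

```python
import math
from typing import Iterable, Optional, Tuple


def weighted_rtt(samples: Iterable[Tuple[float, float]], now: float,
                 half_life: float, max_age: float = math.inf) -> Optional[float]:
    """Age-weighted mean of (observed_at, rtt) samples; None if none qualify."""
    total_weight = 0.0
    weighted_sum = 0.0
    for observed_at, rtt in samples:
        age = max(0.0, now - observed_at)
        if age > max_age:
            continue  # too old: treated as removed from the cache
        weight = 0.5 ** (age / half_life)  # weight halves every half_life
        total_weight += weight
        weighted_sum += weight * rtt
    if total_weight == 0.0:
        return None
    return weighted_sum / total_weight
```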
9. Specific Transport Protocol Considerations | 10. Specific Transport Protocol Considerations | |||
Each protocol that can run as part of a Transport Services | Each protocol that can run as part of a Transport Services | |||
implementation defines both its API mapping as well as implementation | implementation defines both its API mapping as well as implementation | |||
details. | details. API mappings for a protocol apply most to Connections in | |||
which the given protocol is the "top" of the Protocol Stack. For | ||||
API mappings for a protocol apply most to Connections in which the | example, the mapping of the "Send" function for TCP applies to | |||
given protocol is the "top" of the Protocol Stack. For example, the | Connections in which the application directly sends over TCP. If | |||
mapping of the "Send" function for TCP applies to Connections in | HTTP/2 is used on top of TCP, the HTTP/2 mappings take precedence. | |||
which the application directly sends over TCP. If HTTP/2 is used on | ||||
top of TCP, the HTTP/2 mappings take precedence. | ||||
Each protocol has a notion of Connectedness. Possible values for | Each protocol has a notion of Connectedness. Possible values for | |||
Connectedness are: | Connectedness are: | |||
o Unconnected. Unconnected protocols do not establish explicit | o Unconnected. Unconnected protocols do not establish explicit | |||
state between endpoints, and do not perform a handshake during | state between endpoints, and do not perform a handshake during | |||
Connection establishment. | Connection establishment. | |||
o Connected. Connected protocols establish state between endpoints, | o Connected. Connected protocols establish state between endpoints, | |||
and perform a handshake during Connection establishment. The | and perform a handshake during Connection establishment. The | |||
skipping to change at page 30, line 14 ¶ | skipping to change at page 32, line 49 ¶ | |||
o Datagram. Datagram protocols define Message boundaries at the | o Datagram. Datagram protocols define Message boundaries at the | |||
same level of transmission, such that only complete (not partial) | same level of transmission, such that only complete (not partial) | |||
Messages are supported. | Messages are supported. | |||
o Message. Message protocols support Message boundaries that can be | o Message. Message protocols support Message boundaries that can be | |||
sent and received either as complete or partial Messages. Maximum | sent and received either as complete or partial Messages. Maximum | |||
Message lengths can be defined, and Messages can be partially | Message lengths can be defined, and Messages can be partially | |||
reliable. | reliable. | |||
9.1. TCP | Below, primitives in the style of | |||
"CATEGORY.[SUBCATEGORY].PRIMITIVENAME.PROTOCOL" (e.g., | ||||
"CONNECT.SCTP") refer to the primitives with the same name in section | ||||
4 of [RFC8303]. For further implementation details, the description | ||||
of these primitives in [RFC8303] points to section 3, which refers | ||||
back to the specifications for each protocol. This back-tracking method | ||||
applies to all elements of [I-D.ietf-taps-minset] (see appendix D of | ||||
[I-D.ietf-taps-interface]): they are listed in appendix A of | ||||
[I-D.ietf-taps-minset] with an implementation hint in the same style, | ||||
pointing back to section 4 of [RFC8303]. | ||||
10.1. TCP | ||||
Connectedness: Connected | Connectedness: Connected | |||
Data Unit: Byte-stream | Data Unit: Byte-stream | |||
API mappings for TCP are as follows: | API mappings for TCP are as follows: | |||
Connection Object: TCP connections between two hosts map directly to | Connection Object: TCP connections between two hosts map directly to | |||
Connection objects. | Connection objects. | |||
Initiate: Calling "Initiate" on a TCP Connection causes it to | Initiate: CONNECT.TCP. Calling "Initiate" on a TCP Connection | |||
reserve a local port, and send a SYN to the Remote Endpoint. | causes it to reserve a local port, and send a SYN to the Remote | |||
Endpoint. | ||||
InitiateWithSend: Early idempotent data is sent on a TCP Connection | InitiateWithSend: CONNECT.TCP with parameter "user message". Early | |||
in the SYN, as TCP Fast Open data. | idempotent data is sent on a TCP Connection in the SYN, as TCP | |||
Fast Open data. | ||||
Ready: A TCP Connection is ready once the three-way handshake is | Ready: A TCP Connection is ready once the three-way handshake is | |||
complete. | complete. | |||
InitiateError: TCP can throw various errors during connection setup. | InitiateError: Failure of CONNECT.TCP. TCP can throw various errors | |||
Specifically, it is important to handle a RST being sent by the | during connection setup. Specifically, it is important to handle | |||
peer during the handshake. | a RST being sent by the peer during the handshake. | |||
ConnectionError: Once established, TCP throws errors whenever the | ConnectionError: Once established, TCP throws errors whenever the | |||
connection is disconnected, such as due to receive a RST from the | connection is disconnected, such as due to receiving a RST from | |||
peer; or hitting a TCP retransmission timeout. | the peer; or hitting a TCP retransmission timeout. | |||
Listen: Calling "Listen" for TCP binds a local port and prepares it | Listen: LISTEN.TCP. Calling "Listen" for TCP binds a local port and | |||
to receive inbound SYN packets from peers. | prepares it to receive inbound SYN packets from peers. | |||
ConnectionReceived: TCP Listeners will deliver new connections once | ConnectionReceived: TCP Listeners will deliver new connections once | |||
they have replied to an inbound SYN with a SYN-ACK. | they have replied to an inbound SYN with a SYN-ACK. | |||
Clone: Calling "Clone" on a TCP Connection creates a new Connection | Clone: Calling "Clone" on a TCP Connection creates a new Connection | |||
with equivalent parameters. The two Connections are otherwise | with equivalent parameters. The two Connections are otherwise | |||
independent. | independent. | |||
Send: TCP does not on its own preserve Message boundaries. Calling | Send: SEND.TCP. TCP does not on its own preserve Message | |||
"Send" on a TCP connection lays out the bytes on the TCP send | boundaries. Calling "Send" on a TCP connection lays out the bytes | |||
stream without any other delineation. Any Message marked as Final | on the TCP send stream without any other delineation. Any Message | |||
will cause TCP to send a FIN once the Message has been completely | marked as Final will cause TCP to send a FIN once the Message has | |||
written. | been completely written, by calling CLOSE.TCP immediately upon | |||
successful termination of SEND.TCP. | ||||
Receive: TCP delivers a stream of bytes without any Message | Receive: With RECEIVE.TCP, TCP delivers a stream of bytes without | |||
delineation. All data delivered in the "Received" or | any Message delineation. All data delivered in the "Received" or | |||
"ReceivedPartial" event will be part of a single stream-wide | "ReceivedPartial" event will be part of a single stream-wide | |||
Message that is marked Final (unless a MessageFramer is used). | Message that is marked Final (unless a Message Framer is used). | |||
EndOfMessage will be delivered when the TCP Connection has | EndOfMessage will be delivered when the TCP Connection has | |||
received a FIN from the peer. | received a FIN (CLOSE-EVENT.TCP or ABORT-EVENT.TCP) from the peer. | |||
Close: Calling "Close" on a TCP Connection indicates that the | Close: Calling "Close" on a TCP Connection indicates that the | |||
Connection should be gracefully closed by sending a FIN to the | Connection should be gracefully closed (CLOSE.TCP) by sending a | |||
peer and waiting for a FIN-ACK before delivering the "Closed" | FIN to the peer and waiting for a FIN-ACK before delivering the | |||
event. | "Closed" event. | |||
Abort: Calling "Abort" on a TCP Connection indicates that the | Abort: Calling "Abort" on a TCP Connection indicates that the | |||
Connection should be immediately closed by sending a RST to the | Connection should be immediately closed by sending a RST to the | |||
peer. | peer (ABORT.TCP). | |||
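The Send and Receive mappings above can be approximated with ordinary BSD sockets. In the following Python sketch, which is an illustration and not a Transport Services API, `send_final` stands in for sending a Message marked Final: SEND.TCP followed by closing the send direction, done here with `shutdown(SHUT_WR)` so that a FIN is sent while the socket can still read. `receive_stream` treats `recv()` returning an empty byte string as the peer's FIN, i.e. the point where EndOfMessage would be delivered. The helper names and the loopback setup are assumptions made for the example. Note that the separate Messages arrive as one undelimited byte stream, which is why a Message Framer is needed to recover boundaries.

```python
import socket
import threading
from typing import Dict, List


def send_final(sock: socket.socket, data: bytes) -> None:
    """Write the last Message (marked Final): send its bytes, then send a FIN."""
    sock.sendall(data)             # SEND.TCP: bytes are appended with no delineation
    sock.shutdown(socket.SHUT_WR)  # half-close: FIN is sent, receiving is still possible


def receive_stream(sock: socket.socket) -> bytes:
    """Collect the single stream-wide Message until the peer's FIN."""
    chunks = []
    while True:
        chunk = sock.recv(4096)    # each chunk resembles a ReceivedPartial event
        if not chunk:              # b"" means a FIN arrived: EndOfMessage
            break
        chunks.append(chunk)
    return b"".join(chunks)


def transfer_messages(messages: List[bytes]) -> bytes:
    """Send Messages over a loopback TCP connection, marking the last one Final."""
    if not messages:
        raise ValueError("need at least one Message")
    listener = socket.create_server(("127.0.0.1", 0))  # Listen
    port = listener.getsockname()[1]
    result: Dict[str, bytes] = {}

    def server() -> None:
        conn, _ = listener.accept()  # ConnectionReceived
        with conn:
            result["data"] = receive_stream(conn)

    thread = threading.Thread(target=server)
    thread.start()
    with socket.create_connection(("127.0.0.1", port)) as client:  # Initiate
        for message in messages[:-1]:
            client.sendall(message)
        send_final(client, messages[-1])
    thread.join()
    listener.close()
    return result["data"]


print(transfer_messages([b"hello ", b"world"]))  # b'hello world'
```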
9.2. UDP | 10.2. UDP | |||
Connectedness: Unconnected | Connectedness: Unconnected | |||
Data Unit: Datagram | Data Unit: Datagram | |||
API mappings for UDP are as follows: | API mappings for UDP are as follows: | |||
Connection Object: UDP connections represent a pair of specific IP | Connection Object: UDP connections represent a pair of specific IP | |||
addresses and ports on two hosts. | addresses and ports on two hosts. | |||
Initiate: Calling "Initiate" on a UDP Connection causes it to | Initiate: CONNECT.UDP. Calling "Initiate" on a UDP Connection | |||
reserve a local port, but does not generate any traffic. | causes it to reserve a local port, but does not generate any | |||
traffic. | ||||
InitiateWithSend: Early data on a UDP Connection does not have any | InitiateWithSend: Early data on a UDP Connection does not have any | |||
special meaning. The data is sent whenever the Connection is | special meaning. The data is sent whenever the Connection is | |||
Ready. | Ready. | |||
Ready: A UDP Connection is ready once the system has reserved a | Ready: A UDP Connection is ready once the system has reserved a | |||
local port and has a path to send to the Remote Endpoint. | local port and has a path to send to the Remote Endpoint. | |||
InitiateError: UDP Connections can only generate errors on | InitiateError: UDP Connections can only generate errors on | |||
initiation due to port conflicts on the local system. | initiation due to port conflicts on the local system. | |||
ConnectionError: Once in use, UDP throws errors upon receiving ICMP | ConnectionError: Once in use, UDP throws "soft errors" (ERROR.UDP(- | |||
notifications indicating failures in the network. | Lite)) upon receiving ICMP notifications indicating failures in | |||
the network. | ||||
Listen: Calling "Listen" for UDP binds a local port and prepares it | Listen: LISTEN.UDP. Calling "Listen" for UDP binds a local port and | |||
to receive inbound UDP datagrams from peers. | prepares it to receive inbound UDP datagrams from peers. | |||
ConnectionReceived: UDP Listeners will deliver new connections once | ConnectionReceived: UDP Listeners will deliver new connections once | |||
they have received traffic from a new Remote Endpoint. | they have received traffic from a new Remote Endpoint. | |||
Clone: Calling "Clone" on a UDP Connection creates a new Connection | Clone: Calling "Clone" on a UDP Connection creates a new Connection | |||
with equivalent parameters. The two Connections are otherwise | with equivalent parameters. The two Connections are otherwise | |||
independent. | independent. | |||
Send: Calling "Send" on a UDP connection sends the data as the | Send: SEND.UDP(-Lite). Calling "Send" on a UDP connection sends the | |||
payload of a complete UDP datagram. Marking Messages as Final | data as the payload of a complete UDP datagram. Marking Messages | |||
does not change anything in the datagram's contents. | as Final does not change anything in the datagram's contents. | |||
Upon sending a UDP datagram, some relevant fields and flags in the | ||||
IP header can be controlled: DSCP (SET_DSCP.UDP(-Lite)), DF in | ||||
IPv4 (SET_DF.UDP(-Lite)) and ECN flag (SET_ECN.UDP(-Lite)). | ||||
Receive: UDP only delivers complete Messages to "Received", each of | Receive: RECEIVE.UDP(-Lite). UDP only delivers complete Messages to | |||
which represents a single datagram received in a UDP packet. | "Received", each of which represents a single datagram received in | |||
a UDP packet. Upon receiving a UDP datagram, the ECN flag from | ||||
the IP header can be obtained (GET_ECN.UDP(-Lite)). | ||||
Close: Calling "Close" on a UDP Connection releases the local port | Close: Calling "Close" on a UDP Connection (ABORT.UDP(-Lite)) | |||
reservation. | releases the local port reservation. | |||
Abort: Calling "Abort" on a UDP Connection is identical to calling | Abort: Calling "Abort" on a UDP Connection (ABORT.UDP(-Lite)) is | |||
"Close". | identical to calling "Close". | |||
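By contrast with TCP, a UDP socket keeps each Message intact. The Python sketch below, a loopback illustration under stated assumptions rather than a normative mapping, shows that `connect()` on a UDP socket only records the peer and reserves a local port (no traffic is generated), that each `send()` produces one datagram, and that each `recv()` returns exactly one complete datagram. The optional `dscp` argument assumes that SET_DSCP.UDP is realized on POSIX-like systems through the `IP_TOS` socket option, with the DSCP in the upper six bits of the IPv4 TOS byte. The function name and the 2-second timeout are hypothetical choices.

```python
import socket
from typing import List, Optional


def udp_exchange(messages: List[bytes], dscp: Optional[int] = None) -> List[bytes]:
    # Listen: bind a local port to receive inbound datagrams.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2.0)  # avoid blocking forever if a datagram is lost
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        if dscp is not None:
            # Assumed SET_DSCP mapping: DSCP occupies the upper six TOS bits.
            sender.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
        # Initiate: fixes the remote endpoint and reserves a local port;
        # no packet is sent.
        sender.connect(receiver.getsockname())
        for message in messages:
            sender.send(message)  # Send: one Message is one datagram
        # Receive: each recv() returns one complete datagram.
        return [receiver.recv(65535) for _ in messages]
    finally:
        # Close and Abort are identical: release the port reservations.
        sender.close()
        receiver.close()


print(udp_exchange([b"first", b"second message"]))  # [b'first', b'second message']
```

Loopback delivery is used here only because it does not drop or reorder datagrams in practice; over a real path, UDP gives neither guarantee.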
9.3. TLS | 10.3. TLS | |||
The mapping of a TLS stream abstraction into the application is | The mapping of a TLS stream abstraction into the application is | |||
equivalent to the contract provided by TCP (see Section 9.1), and | equivalent to the contract provided by TCP (see Section 10.1), and | |||
builds upon many of the actions of TCP connections. | builds upon many of the actions of TCP connections. | |||
Connectedness: Connected | Connectedness: Connected | |||
Data Unit: Byte-stream | Data Unit: Byte-stream | |||
Connection Object: Connection objects represent a single TLS | Connection Object: Connection objects represent a single TLS | |||
connection running over a TCP connection between two hosts. | connection running over a TCP connection between two hosts. | |||
Initiate: Calling "Initiate" on a TLS Connection causes it to first | Initiate: Calling "Initiate" on a TLS Connection causes it to first | |||
skipping to change at page 34, line 5 ¶ | skipping to change at page 37, line 16 ¶ | |||
Connection should be gracefully closed by sending a "close_notify" | Connection should be gracefully closed by sending a "close_notify" | |||
to the peer and waiting for a corresponding "close_notify" before | to the peer and waiting for a corresponding "close_notify" before | |||
delivering the "Closed" event. | delivering the "Closed" event. | |||
Abort: Calling "Abort" on a TLS Connection indicates that the | Abort: Calling "Abort" on a TLS Connection indicates that the | |||
Connection should be immediately closed by sending a | Connection should be immediately closed by sending a | |||
"close_notify", optionally preceded by "user_canceled", to the | "close_notify", optionally preceded by "user_canceled", to the | |||
peer. Implementations do not need to wait to receive | peer. Implementations do not need to wait to receive | |||
"close_notify" before delivering the "Closed" event. | "close_notify" before delivering the "Closed" event. | |||
9.4. DTLS | 10.4. DTLS | |||
DTLS follows the same behavior as TLS (Section 9.3), with the notable | DTLS follows the same behavior as TLS (Section 10.3), with the | |||
exception of not inheriting behavior directly from TCP. Differences | notable exception of not inheriting behavior directly from TCP. | |||
from TLS are detailed below, and all cases not explicitly mentioned | Differences from TLS are detailed below, and all cases not explicitly | |||
should be considered the same as TLS. | mentioned should be considered the same as TLS. | |||
Connectedness: Connected | Connectedness: Connected | |||
Data Unit: Datagram | Data Unit: Datagram | |||
Connection Object: Connection objects represent a single DTLS | Connection Object: Connection objects represent a single DTLS | |||
connection running over a set of UDP ports between two hosts. | connection running over a set of UDP ports between two hosts. | |||
Initiate: Calling "Initiate" on a DTLS Connection causes it to reserve | Initiate: Calling "Initiate" on a DTLS Connection causes it to reserve | |||
a UDP local port, and begin sending handshake messages to the peer | a UDP local port, and begin sending handshake messages to the peer | |||
skipping to change at page 34, line 35 ¶ | skipping to change at page 37, line 46 ¶ | |||
and keys have been established to encrypt application data. | and keys have been established to encrypt application data. | |||
Send: Sending over DTLS does preserve message boundaries in the same | Send: Sending over DTLS does preserve message boundaries in the same | |||
way that UDP datagrams do. Marking a Message as Final does send a | way that UDP datagrams do. Marking a Message as Final does send a | |||
"close_notify" like TLS. | "close_notify" like TLS. | |||
Receive: Receiving over DTLS delivers one decrypted Message for each | Receive: Receiving over DTLS delivers one decrypted Message for each | |||
received DTLS datagram. If a "close_notify" is received, a | received DTLS datagram. If a "close_notify" is received, a | |||
Message will be delivered that is marked as Final. | Message will be delivered that is marked as Final. | |||
9.5. HTTP | 10.5. HTTP | |||
HTTP requests and responses map naturally into Messages, since they | HTTP requests and responses map naturally into Messages, since they | |||
are delineated chunks of data with metadata that can be sent over a | are delineated chunks of data with metadata that can be sent over a | |||
transport. To that end, HTTP can be seen as the most prevalent | transport. To that end, HTTP can be seen as the most prevalent | |||
framing protocol that runs on top of streams like TCP, TLS, etc. | framing protocol that runs on top of streams like TCP, TLS, etc. | |||
In order to use a transport Connection that provides HTTP Message | In order to use a transport Connection that provides HTTP Message | |||
support, the establishment and closing of the connection can be | support, the establishment and closing of the connection can be | |||
treated as it would without the framing protocol. Sending and | treated as it would without the framing protocol. Sending and | |||
receiving of Messages, however, changes to treat each Message as a | receiving of Messages, however, changes to treat each Message as a | |||
skipping to change at page 35, line 31 ¶ | skipping to change at page 38, line 44 ¶ | |||
Receive: HTTP Connections deliver Messages in which HTTP header | Receive: HTTP Connections deliver Messages in which HTTP header | |||
values are attached to MessageContexts, and HTTP bodies in Message | values are attached to MessageContexts, and HTTP bodies in Message | |||
data. | data. | |||
Close: Calling "Close" on an HTTP Connection will only close the | Close: Calling "Close" on an HTTP Connection will only close the | |||
underlying TLS or TCP connection if the HTTP version does not | underlying TLS or TCP connection if the HTTP version does not | |||
support multiplexing. For HTTP/2, for example, closing the | support multiplexing. For HTTP/2, for example, closing the | |||
connection only closes a specific stream. | connection only closes a specific stream. | |||
9.6. QUIC | 10.6. QUIC | |||
QUIC provides a multi-streaming interface to an encrypted transport. | QUIC provides a multi-streaming interface to an encrypted transport. | |||
Each stream can be viewed as equivalent to a TLS stream over TCP, so | Each stream can be viewed as equivalent to a TLS stream over TCP, so | |||
a natural mapping is to present each QUIC stream as an individual | a natural mapping is to present each QUIC stream as an individual | |||
Connection. The protocol for the stream will be considered Ready | Connection. The protocol for the stream will be considered Ready | |||
whenever the underlying QUIC connection is established to the point | whenever the underlying QUIC connection is established to the point | |||
that this stream's data can be sent. For streams after the first | that this stream's data can be sent. For streams after the first | |||
stream, this will likely be an immediate operation. | stream, this will likely be an immediate operation. | |||
skipping to change at page 36, line 4 ¶ | skipping to change at page 39, line 14 ¶ | |||
Closing a single QUIC stream, presented to the application as a | Closing a single QUIC stream, presented to the application as a | |||
Connection, does not imply closing the underlying QUIC connection | Connection, does not imply closing the underlying QUIC connection | |||
itself. Rather, the implementation may choose to close the QUIC | itself. Rather, the implementation may choose to close the QUIC | |||
connection once all streams have been closed (often after some | connection once all streams have been closed (often after some | |||
timeout), or after an individual stream Connection sends an Abort. | timeout), or after an individual stream Connection sends an Abort. | |||
Connectedness: Multiplexing Connected | Connectedness: Multiplexing Connected | |||
Data Unit: Stream | Data Unit: Stream | |||
Connection Object: Connection objects represent a single QUIC stream | Connection Object: Connection objects represent a single QUIC stream | |||
on a QUIC connection. | on a QUIC connection. | |||
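The stream lifecycle described above can be sketched as plain bookkeeping. The Python class below is hypothetical and sends no packets; it assumes QUIC's stream ID layout, in which client-initiated bidirectional streams use IDs 0, 4, 8, ... and server-initiated bidirectional streams use 1, 5, 9, .... Only the first stream waits for the handshake, and the underlying QUIC connection is closed once no streams remain or when any stream is aborted. Both closing rules are one possible policy, not a requirement.

```python
from typing import Set, Tuple


class QuicConnectionModel:
    """Hypothetical bookkeeping for presenting QUIC streams as Connections."""

    def __init__(self, is_client: bool) -> None:
        # Low two bits of a QUIC stream ID: 0b00 client-initiated bidirectional,
        # 0b01 server-initiated bidirectional.
        self._next_id = 0 if is_client else 1
        self.open_streams: Set[int] = set()
        self.established = False
        self.closed = False

    def open_stream(self) -> Tuple[int, bool]:
        """Initiate a stream Connection; return (stream_id, waited_for_handshake)."""
        if self.closed:
            raise RuntimeError("QUIC connection already closed")
        waited = not self.established
        self.established = True  # stands in for completing the QUIC handshake
        stream_id = self._next_id
        self._next_id += 4       # stream IDs of one type are spaced four apart
        self.open_streams.add(stream_id)
        return stream_id, waited

    def close_stream(self, stream_id: int) -> None:
        """Close one stream; close the QUIC connection when none remain."""
        self.open_streams.discard(stream_id)
        if not self.open_streams:
            # A real implementation might wait for an idle timeout first.
            self.closed = True

    def abort_stream(self, stream_id: int) -> None:
        """Policy choice: an Abort on any stream tears down the QUIC connection."""
        self.open_streams.clear()
        self.closed = True
```

The same bookkeeping could serve for HTTP/2, which manages stream and connection lifetimes in the same way but numbers client-initiated streams with odd IDs.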
9.7. HTTP/2 transport | 10.7. HTTP/2 transport | |||
Similar to QUIC (Section 9.6), HTTP/2 provides a multi-streaming | Similar to QUIC (Section 10.6), HTTP/2 provides a multi-streaming | |||
interface. This will generally use HTTP as the unit of Messages over | interface. This will generally use HTTP as the unit of Messages over | |||
the streams, in which each stream can be represented as a transport | the streams, in which each stream can be represented as a transport | |||
Connection. The lifetime of streams and the HTTP/2 connection should | Connection. The lifetime of streams and the HTTP/2 connection should | |||
be managed as described for QUIC. | be managed as described for QUIC. | |||
It is possible to treat each HTTP/2 stream as a raw byte-stream | It is possible to treat each HTTP/2 stream as a raw byte-stream | |||
instead of a carrier for HTTP messages, in which case the Messages | instead of a carrier for HTTP messages, in which case the Messages | |||
over the streams can be represented similarly to the TCP stream (one | over the streams can be represented similarly to the TCP stream (one | |||
Message per direction, see Section 9.1). | Message per direction, see Section 10.1). | |||
Connectedness: Multiplexing Connected | Connectedness: Multiplexing Connected | |||
Data Unit: Stream | Data Unit: Stream | |||
Connection Object: Connection objects represent a single HTTP/2 | Connection Object: Connection objects represent a single HTTP/2 | |||
stream on a HTTP/2 connection. | stream on a HTTP/2 connection. | |||
9.8. SCTP | 10.8. SCTP | |||
To support sender-side stream schedulers (which are implemented on | Connectedness: Connected | |||
the sender side), a receiver-side Transport System should always | ||||
support message interleaving [RFC8260]. | ||||
SCTP messages can be very large. To allow the reception of large | Data Unit: Message | |||
messages in pieces, a "partial flag" can be used to inform a (native | ||||
SCTP) receiving application that a message is incomplete. After | ||||
receiving the "partial flag", this application would know that the | ||||
next receive calls will only deliver remaining parts of the same | ||||
message (i.e., no messages or partial messages will arrive on other | ||||
streams until the message is complete) (see Section 8.1.20 in | ||||
[RFC6458]). The "partial flag" can therefore facilitate the | ||||
implementation of the receiver buffer in the receiving application, | ||||
at the cost of limiting multiplexing and temporarily creating head- | ||||
of-line blocking delay at the receiver. | ||||
When a Transport System transfers a Message, it seems natural to map | API mappings for SCTP are as follows: | |||
the Message object to SCTP messages in order to support properties | ||||
such as "Ordered" or "Lifetime" (which maps onto partially reliable | ||||
delivery with a SCTP_PR_SCTP_TTL policy [RFC6458]). However, since | ||||
multiplexing of Connections onto SCTP streams may happen, and would | ||||
be hidden from the application, the Transport System requires a per- | ||||
stream receiver buffer anyway, so this potential benefit is lost and | ||||
the "partial flag" becomes unnecessary for the system. | ||||
The problem of long messages either requiring large receiver-side | Connection Object: Connection objects represent a flow of SCTP | |||
buffers or getting in the way of multiplexing is addressed by message | messages between a client and a server, which may be an SCTP | |||
interleaving [RFC8260], which is yet another reason why a receivers- | association or a stream in a SCTP association. How to map | |||
side transport system supporting SCTP should implement this | Connection objects to streams is described in [NEAT-flow-mapping]; | |||
mechanism. | in the following, a similar method is described. To map | |||
Connection objects to SCTP streams without head-of-line blocking | ||||
on the sender side, both the sending and receiving SCTP | ||||
implementation must support message interleaving [RFC8260]. Both | ||||
SCTP implementations must also support stream reconfiguration. | ||||
Finally, both communicating endpoints must be aware of this | ||||
intended multiplexing; [NEAT-flow-mapping] describes a way for a | ||||
Transport System to negotiate the stream mapping capability using | ||||
SCTP's adaptation layer indication, such that this functionality | ||||
would only take effect if both sides are aware of it. The | ||||
first flow, for which the SCTP association has been created, will | ||||
always use stream id zero. All additional flows are assigned to | ||||
unused stream ids in growing order. To avoid a conflict when both | ||||
endpoints map new flows simultaneously, the peer which initiated | ||||
the transport connection will use even stream numbers whereas the | ||||
remote side will map its flows to odd stream numbers. Both sides | ||||
maintain a status map of the assigned stream numbers. Generally, | ||||
new streams must consume the lowest available (even or odd, | ||||
depending on the side) stream number; this rule is relevant when | ||||
lower numbers become available because Connection objects | ||||
associated with the streams are closed. | ||||
10. IANA Considerations | Initiate: If this is the only Connection object that is assigned to | |||
the SCTP association or stream mapping has not been negotiated, | ||||
CONNECT.SCTP is called. Else, a new stream is used: if there are | ||||
enough streams available, "Initiate" is just a local operation | ||||
that assigns a new stream number to the Connection object. The | ||||
number of streams is negotiated as a parameter of the prior | ||||
CONNECT.SCTP call, and it represents a trade-off between local | ||||
resource usage and the number of Connection objects that can be | ||||
mapped without requiring a reconfiguration signal. When running | ||||
out of streams, ADD_STREAM.SCTP must be called. | ||||
InitiateWithSend: If this is the only Connection object that is | ||||
assigned to the SCTP association or stream mapping has not been | ||||
negotiated, CONNECT.SCTP is called with the "user message" | ||||
parameter. Else, a new stream is used (see "Initiate" for how to | ||||
handle running out of streams), and this just sends the first | ||||
message on a new stream. | ||||
Ready: "Initiate" or "InitiateWithSend" returns without an error, | ||||
i.e., SCTP's four-way handshake has completed. If an association | ||||
with the peer already exists, and stream mapping has been | ||||
negotiated and enough streams are available, a Connection Object | ||||
instantly becomes Ready after calling "Initiate" or | ||||
"InitiateWithSend". | ||||
InitiateError: Failure of CONNECT.SCTP. | ||||
ConnectionError: TIMEOUT.SCTP or ABORT-EVENT.SCTP. | ||||
Listen: LISTEN.SCTP. If an association with the peer already exists | ||||
and stream mapping has been negotiated, "Listen" just expects to | ||||
receive a new message on a new stream id (chosen in accordance | ||||
with the stream number assignment procedure described above). | ||||
ConnectionReceived: LISTEN.SCTP returns without an error (a result | ||||
of successful CONNECT.SCTP from the peer), or, in case of stream | ||||
mapping, the first message has arrived on a new stream (in this | ||||
case, "Receive" is also invoked). | ||||
Clone: Calling "Clone" on an SCTP association creates a new | ||||
Connection object and assigns it a new stream number in accordance | ||||
with the stream number assignment procedure described above. If | ||||
there are not enough streams available, ADD_STREAM.SCTP must be | ||||
called. | ||||
Priority (Connection): When this value is changed, or a Message with | ||||
Message Property "Priority" is sent, and there are multiple | ||||
Connection objects assigned to the same SCTP association, | ||||
CONFIGURE_STREAM_SCHEDULER.SCTP is called to adjust the priorities | ||||
of streams in the SCTP association. | ||||
Send: SEND.SCTP. Message Properties such as "Lifetime" and | ||||
"Ordered" map to parameters of this primitive. | ||||
Receive: RECEIVE.SCTP. The "partial flag" of RECEIVE.SCTP invokes a | ||||
"ReceivedPartial" event. | ||||
Close: If this is the only Connection object that is assigned to the | ||||
SCTP association, CLOSE.SCTP is called. Else, the Connection object | ||||
is one out of several Connection objects that are assigned to the | ||||
same SCTP association, and RESET_STREAM.SCTP must be called, which | ||||
informs the peer that the stream will no longer be used for mapping | ||||
and can be used by future "Initiate", "InitiateWithSend" or "Listen" | ||||
calls. At the peer, the event RESET_STREAM-EVENT.SCTP will fire, | ||||
which the peer must answer by issuing RESET_STREAM.SCTP too. The | ||||
resulting local RESET_STREAM-EVENT.SCTP informs the transport system | ||||
that the stream number can now be re-used by the next "Initiate", | ||||
"InitiateWithSend" or "Listen" calls. | ||||
Abort: If this is the only Connection object that is assigned to the | ||||
SCTP association, ABORT.SCTP is called. Else, the Connection object | ||||
is one out of several Connection objects that are assigned to the | ||||
same SCTP association, and shutdown proceeds as described under | ||||
"Close". | ||||
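The stream number assignment and the Close procedure for SCTP can be combined into one status map per association. The Python class below is a hypothetical sketch of that bookkeeping only: it assumes stream IDs run from 0 to `num_streams - 1`, returns primitive names as strings instead of invoking them, and leaves the CONNECT.SCTP call for the first flow (which takes stream 0 on the initiating side) to the caller. The initiator allocates the lowest free even ID and the remote side the lowest free odd ID. A closed stream stays unusable until the local RESET_STREAM-EVENT.SCTP confirms the reset.

```python
from typing import Optional, Set


class SctpStreamMapper:
    """Hypothetical per-association status map of assigned SCTP stream ids."""

    def __init__(self, is_initiator: bool, num_streams: int) -> None:
        self.parity = 0 if is_initiator else 1  # initiator: even ids, peer: odd ids
        self.num_streams = num_streams          # negotiated by CONNECT.SCTP
        self.in_use: Set[int] = set()           # streams of open Connections (both sides)
        self.pending_reset: Set[int] = set()    # closed, awaiting RESET_STREAM-EVENT.SCTP

    def allocate(self) -> Optional[int]:
        """Lowest free id of our parity, or None if ADD_STREAM.SCTP is needed."""
        for sid in range(self.parity, self.num_streams, 2):
            if sid not in self.in_use and sid not in self.pending_reset:
                self.in_use.add(sid)
                return sid
        return None

    def on_peer_stream(self, sid: int) -> None:
        """First message on a new peer stream (ConnectionReceived): record it."""
        self.in_use.add(sid)

    def add_streams(self, count: int) -> None:
        """Record the effect of a successful ADD_STREAM.SCTP."""
        self.num_streams += count

    def close(self, sid: int) -> str:
        """Return the primitive a Close on this stream's Connection would issue."""
        self.in_use.discard(sid)
        if not self.in_use:
            return "CLOSE.SCTP"            # last Connection: close the association
        self.pending_reset.add(sid)        # not reusable until the reset completes
        return "RESET_STREAM.SCTP"

    def on_reset_complete(self, sid: int) -> None:
        """Local RESET_STREAM-EVENT.SCTP: the stream id may be reused."""
        self.pending_reset.discard(sid)
```

Tracking the peer's streams in the same map is what lets `close` decide whether a Connection is the last one on the association, even though each side only allocates IDs of its own parity.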
11. IANA Considerations | ||||
RFC-EDITOR: Please remove this section before publication. | RFC-EDITOR: Please remove this section before publication. | |||
This document has no actions for IANA. | This document has no actions for IANA. | |||
11. Security Considerations | 12. Security Considerations | |||
11.1. Considerations for Candidate Gathering | 12.1. Considerations for Candidate Gathering | |||
Implementations should avoid downgrade attacks that allow network | Implementations should avoid downgrade attacks that allow network | |||
interference to cause the implementation to select less secure, or | interference to cause the implementation to select less secure, or | |||
entirely insecure, combinations of paths and protocols. | entirely insecure, combinations of paths and protocols. | |||
11.2. Considerations for Candidate Racing | 12.2. Considerations for Candidate Racing | |||
See Section 5.2 for security considerations around racing with 0-RTT | See Section 5.3 for security considerations around racing with 0-RTT | |||
data. | data. | |||
An attacker that knows a particular device is racing several options | An attacker that knows a particular device is racing several options | |||
during connection establishment may be able to block packets for the | during connection establishment may be able to block packets for the | |||
first connection attempt, thus inducing the device to fall back to a | first connection attempt, thus inducing the device to fall back to a | |||
secondary attempt. This is a problem if the secondary attempts have | secondary attempt. This is a problem if the secondary attempts have | |||
worse security properties that enable further attacks. | worse security properties that enable further attacks. | |||
Implementations should ensure that all options have equivalent | Implementations should ensure that all options have equivalent | |||
security properties to avoid incentivizing attacks. | security properties to avoid incentivizing attacks. | |||
Since results from the network can determine how a connection attempt | Since results from the network can determine how a connection attempt | |||
tree is built, such as when DNS returns a list of resolved endpoints, | tree is built, such as when DNS returns a list of resolved endpoints, | |||
it is possible for the network to cause an implementation to consume | it is possible for the network to cause an implementation to consume | |||
significant on-device resources. Implementations should limit the | significant on-device resources. Implementations should limit the | |||
maximum amount of state allowed for any given node, including the | maximum amount of state allowed for any given node, including the | |||
number of child nodes, especially when the state is based on results | number of child nodes, especially when the state is based on results | |||
from the network. | from the network. | |||
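The two recommendations above can be applied at the point where network results become children of a node in the connection attempt tree. The Python sketch below is a hypothetical illustration: `CandidateNode`, its integer `security_level` ordinal, and the limit of eight children are assumptions, and dropping weaker candidates is only one way to keep all raced options at equivalent security. It keeps only candidates as strong as the strongest one offered, so blocking the first attempt cannot force a less secure fallback, and it caps the number of children so that a large response (for example, a long DNS answer) cannot exhaust on-device state.

```python
from dataclasses import dataclass, field
from typing import List

MAX_CHILDREN = 8  # hypothetical per-node limit


@dataclass
class CandidateNode:
    label: str
    security_level: int = 0  # hypothetical ordinal: higher means stronger
    children: List["CandidateNode"] = field(default_factory=list)


def add_network_children(node: CandidateNode, candidates: List[CandidateNode],
                         max_children: int = MAX_CHILDREN) -> int:
    """Attach network-derived candidates under a node; return how many were dropped."""
    if not candidates:
        return 0
    # Keep only options with security equivalent to the best one offered.
    required = max(c.security_level for c in candidates)
    acceptable = [c for c in candidates if c.security_level >= required]
    # Bound the state any single node may hold.
    room = max(0, max_children - len(node.children))
    kept = acceptable[:room]
    node.children.extend(kept)
    return len(candidates) - len(kept)
```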
12. Acknowledgements | 13. Acknowledgements | |||
This work has received funding from the European Union's Horizon 2020 | This work has received funding from the European Union's Horizon 2020 | |||
research and innovation programme under grant agreement No. 644334 | research and innovation programme under grant agreement No. 644334 | |||
(NEAT). | (NEAT). | |||
This work has been supported by Leibniz Prize project funds of DFG - | This work has been supported by Leibniz Prize project funds of DFG - | |||
German Research Foundation: Gottfried Wilhelm Leibniz-Preis 2011 (FKZ | German Research Foundation: Gottfried Wilhelm Leibniz-Preis 2011 (FKZ | |||
FE 570/4-1). | FE 570/4-1). | |||
This work has been supported by the UK Engineering and Physical | This work has been supported by the UK Engineering and Physical | |||
Sciences Research Council under grant EP/R04144X/1. | Sciences Research Council under grant EP/R04144X/1. | |||
This work has been supported by the Research Council of Norway under | ||||
its "Toppforsk" programme through the "OCARINA" project. | ||||
Thanks to Stuart Cheshire, Josh Graessley, David Schinazi, and Eric | Thanks to Stuart Cheshire, Josh Graessley, David Schinazi, and Eric | |||
Kinnear for their implementation and design efforts, including Happy | Kinnear for their implementation and design efforts, including Happy | |||
Eyeballs, that heavily influenced this work. | Eyeballs, that heavily influenced this work. | |||
13. References | 14. References | |||
13.1. Normative References | 14.1. Normative References | |||
[I-D.ietf-taps-arch] | [I-D.ietf-taps-arch] | |||
Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G., | Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G., | |||
Perkins, C., Tiesel, P., and C. Wood, "An Architecture for | Perkins, C., Tiesel, P., and C. Wood, "An Architecture for | |||
Transport Services", draft-ietf-taps-arch-03 (work in | Transport Services", draft-ietf-taps-arch-04 (work in | |||
progress), March 2019. | progress), July 2019. | |||
[I-D.ietf-taps-interface] | [I-D.ietf-taps-interface] | |||
Trammell, B., Welzl, M., Enghardt, T., Fairhurst, G., | Trammell, B., Welzl, M., Enghardt, T., Fairhurst, G., | |||
Kuehlewind, M., Perkins, C., Tiesel, P., and C. Wood, "An | Kuehlewind, M., Perkins, C., Tiesel, P., Wood, C., and T. | |||
Abstract Application Layer Interface to Transport | Pauly, "An Abstract Application Layer Interface to | |||
Services", draft-ietf-taps-interface-03 (work in | Transport Services", draft-ietf-taps-interface-04 (work in | |||
progress), March 2019. | progress), July 2019. | |||
[I-D.ietf-taps-minset] | [I-D.ietf-taps-minset] | |||
Welzl, M. and S. Gjessing, "A Minimal Set of Transport | Welzl, M. and S. Gjessing, "A Minimal Set of Transport | |||
Services for End Systems", draft-ietf-taps-minset-11 (work | Services for End Systems", draft-ietf-taps-minset-11 (work | |||
in progress), September 2018. | in progress), September 2018. | |||
[RFC6458] Stewart, R., Tuexen, M., Poon, K., Lei, P., and V. | ||||
Yasevich, "Sockets API Extensions for the Stream Control | ||||
Transmission Protocol (SCTP)", RFC 6458, | ||||
DOI 10.17487/RFC6458, December 2011, | ||||
<https://www.rfc-editor.org/info/rfc6458>. | ||||
[RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP | [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP | |||
Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, | Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, | |||
<https://www.rfc-editor.org/info/rfc7413>. | <https://www.rfc-editor.org/info/rfc7413>. | |||
[RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext | [RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext | |||
Transfer Protocol Version 2 (HTTP/2)", RFC 7540, | Transfer Protocol Version 2 (HTTP/2)", RFC 7540, | |||
DOI 10.17487/RFC7540, May 2015, | DOI 10.17487/RFC7540, May 2015, | |||
<https://www.rfc-editor.org/info/rfc7540>. | <https://www.rfc-editor.org/info/rfc7540>. | |||
[RFC8260] Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann, | [RFC8260] Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann, | |||
skipping to change at page 39, line 35 ¶ | skipping to change at page 44, line 19 ¶ | |||
[RFC8305] Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2: | [RFC8305] Schinazi, D. and T. Pauly, "Happy Eyeballs Version 2: | |||
Better Connectivity Using Concurrency", RFC 8305, | Better Connectivity Using Concurrency", RFC 8305, | |||
DOI 10.17487/RFC8305, December 2017, | DOI 10.17487/RFC8305, December 2017, | |||
<https://www.rfc-editor.org/info/rfc8305>. | <https://www.rfc-editor.org/info/rfc8305>. | |||
[RFC8446] Rescorla, E., "The Transport Layer Security (TLS) Protocol | [RFC8446] Rescorla, E., "The Transport Layer Security (TLS) Protocol | |||
Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018, | Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018, | |||
<https://www.rfc-editor.org/info/rfc8446>. | <https://www.rfc-editor.org/info/rfc8446>. | |||
13.2. Informative References | 14.2. Informative References | |||
[I-D.ietf-quic-transport] | [I-D.ietf-quic-transport] | |||
Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed | Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed | |||
and Secure Transport", draft-ietf-quic-transport-20 (work | and Secure Transport", draft-ietf-quic-transport-23 (work | |||
in progress), April 2019. | in progress), September 2019. | |||
[NEAT-flow-mapping] | [NEAT-flow-mapping] | |||
"Transparent Flow Mapping for NEAT (in Workshop on Future | "Transparent Flow Mapping for NEAT (in Workshop on Future | |||
of Internet Transport (FIT 2017))", n.d. | of Internet Transport (FIT 2017))", n.d. | |||
[RFC5245] Rosenberg, J., "Interactive Connectivity Establishment | [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment | |||
(ICE): A Protocol for Network Address Translator (NAT) | (ICE): A Protocol for Network Address Translator (NAT) | |||
Traversal for Offer/Answer Protocols", RFC 5245, | Traversal for Offer/Answer Protocols", RFC 5245, | |||
DOI 10.17487/RFC5245, April 2010, | DOI 10.17487/RFC5245, April 2010, | |||
<https://www.rfc-editor.org/info/rfc5245>. | <https://www.rfc-editor.org/info/rfc5245>. | |||
[Trickle] "Trickle - Rate Limiting YouTube Video Streaming (ATC | [Trickle] "Trickle - Rate Limiting YouTube Video Streaming (ATC | |||
2012)", n.d. | 2012)", n.d. | |||
14.3. URIs | ||||
[1] https://developer.apple.com/documentation/network | ||||
[2] https://github.com/NEAT-project/neat | ||||
[3] https://www.neat-project.org | ||||
[4] https://github.com/fg-inet/python-asyncio-taps | ||||
Appendix A. Additional Properties | Appendix A. Additional Properties | |||
This appendix discusses implementation considerations for additional | This appendix discusses implementation considerations for additional | |||
parameters and properties that could be used to enhance transport | parameters and properties that could be used to enhance transport | |||
protocol and/or path selection, or the transmission of messages given | protocol and/or path selection, or the transmission of messages given | |||
a Protocol Stack that implements them. These are not part of the | a Protocol Stack that implements them. These are not part of the | |||
interface, and may be removed from the final document, but are | interface, and may be removed from the final document, but are | |||
presented here to support discussion within the TAPS working group as | presented here to support discussion within the TAPS working group as | |||
to whether they should be added to a future revision of the base | to whether they should be added to a future revision of the base | |||
specification. | specification. | |||
A.1. Properties Affecting Sorting of Branches | A.1. Properties Affecting Sorting of Branches | |||
In addition to the Protocol and Path Selection Properties discussed | In addition to the Protocol and Path Selection Properties discussed | |||
in Section 4.3, the following properties under discussion can | in Section 4.3, the following properties under discussion can | |||
influence branch sorting: | influence branch sorting: | |||
o Bounds on Send or Receive Rate: If the application indicates a | o Bounds on Send or Receive Rate: If the application indicates a | |||
bound on the expected Send or Receive bitrate, an implementation | bound on the expected Send or Receive bitrate, an implementation | |||
may prefer a path that can likely provide the desired bandwidth, | may prefer a path that can likely provide the desired bandwidth, | |||
based on cached maximum throughput, see Section 8.2. The | based on cached maximum throughput, see Section 9.2. The | |||
application may know the Send or Receive Bitrate from metadata in | application may know the Send or Receive Bitrate from metadata in | |||
adaptive HTTP streaming, such as MPEG-DASH. | adaptive HTTP streaming, such as MPEG-DASH. | |||
o Cost Preferences: If the application indicates a preference to | o Cost Preferences: If the application indicates a preference to | |||
avoid expensive paths, and some paths are associated with a | avoid expensive paths, and some paths are associated with a | |||
monetary cost, an implementation should decrease the ranking of | monetary cost, an implementation should decrease the ranking of | |||
such paths. If the application indicates that it prohibits using | such paths. If the application indicates that it prohibits using | |||
expensive paths, paths that are associated with a cost should be | expensive paths, paths that are associated with a cost should be | |||
purged from the decision tree. | purged from the decision tree. | |||
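The two sorting rules above can be combined into a single ranking step. The following is a minimal Python sketch, not part of any TAPS specification: the Path record, its cached_max_throughput_bps and has_monetary_cost fields, the CostPreference enum, and the sort_paths function are all hypothetical names chosen for illustration. It assumes that the cost preference outranks the rate bound when both apply, which the text above does not specify, and it treats a path with no cached throughput as neutral rather than penalizing it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class CostPreference(Enum):
    # Hypothetical encoding of the application's cost preference.
    NO_PREFERENCE = 0
    AVOID = 1
    PROHIBIT = 2


@dataclass
class Path:
    # Hypothetical path record used only for this sketch.
    name: str
    # Cached maximum throughput in bit/s, or None if nothing is cached yet.
    cached_max_throughput_bps: Optional[float]
    has_monetary_cost: bool


def sort_paths(paths: List[Path],
               cost_pref: CostPreference = CostPreference.NO_PREFERENCE,
               rate_bound_bps: Optional[float] = None) -> List[Path]:
    # Prohibit: purge paths associated with a cost from the candidate set.
    if cost_pref is CostPreference.PROHIBIT:
        paths = [p for p in paths if not p.has_monetary_cost]

    def rank(p: Path):
        # Avoid: costly paths sort after free ones but remain candidates.
        cost_penalty = int(cost_pref is CostPreference.AVOID
                           and p.has_monetary_cost)
        # Rate bound: demote paths whose cached throughput is known to be
        # below the bound; unknown throughput is not penalized (assumption).
        too_slow = (rate_bound_bps is not None
                    and p.cached_max_throughput_bps is not None
                    and p.cached_max_throughput_bps < rate_bound_bps)
        # Tuples compare element by element, so cost dominates rate here.
        return (cost_penalty, int(too_slow))

    # sorted() is stable, so the incoming order breaks ties.
    return sorted(paths, key=rank)
```

Because Python's sorted() is stable, paths that tie on both penalties keep the order produced by any earlier sorting stage, so this step can be layered on top of the other branch-sorting criteria rather than replacing them.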
Appendix B. Reasons for errors | ||||
The Transport Services API [I-D.ietf-taps-interface] allows several | ||||
generic error types to specify a more detailed reason for why an | ||||
error occurred. This appendix lists some of the possible reasons. | ||||
o InvalidConfiguration: The transport properties and endpoints | ||||
provided by the application are either contradictory or | ||||
incomplete. Examples include the lack of a remote endpoint on an | ||||
active open, or the use of a multicast group address without | ||||
requesting unidirectional receive. | ||||
o NoCandidates: The configuration is valid, but none of the | ||||
available transport protocols can satisfy the transport properties | ||||
provided by the application. | ||||
o ResolutionFailed: The remote or local specifier provided by the | ||||
application cannot be resolved. | ||||
o EstablishmentFailed: The TAPS system was unable to establish a | ||||
transport-layer connection to the remote endpoint specified by the | ||||
application. | ||||
o PolicyProhibited: The system policy prevents the transport system | ||||
from performing the action requested by the application. | ||||
o NotCloneable: The protocol stack is not capable of being cloned. | ||||
o MessageTooLarge: The message size is too big for the transport | ||||
system to handle. | ||||
o ProtocolFailed: The underlying protocol stack failed. | ||||
o InvalidMessageProperties: The message properties either | ||||
contradict the transport properties or cannot be satisfied by | ||||
the transport system. | ||||
o DeframingFailed: The data that was received by the underlying | ||||
protocol stack could not be deframed. | ||||
o ConnectionAborted: The connection was aborted by the peer. | ||||
o Timeout: Delivery of a message was not possible after a timeout. | ||||
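An implementation could represent these reasons as an enumeration attached to a generic error. The Python sketch below assumes a hypothetical TransportError exception, a hypothetical ConnectionConfig record, and a hypothetical check_configuration helper that detects the two InvalidConfiguration examples given above. None of these names come from [I-D.ietf-taps-interface], and "InitiateError" is used only as an illustrative label for the generic error type.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorReason(Enum):
    # Detailed reasons listed in this appendix.
    INVALID_CONFIGURATION = "InvalidConfiguration"
    NO_CANDIDATES = "NoCandidates"
    RESOLUTION_FAILED = "ResolutionFailed"
    ESTABLISHMENT_FAILED = "EstablishmentFailed"
    POLICY_PROHIBITED = "PolicyProhibited"
    NOT_CLONEABLE = "NotCloneable"
    MESSAGE_TOO_LARGE = "MessageTooLarge"
    PROTOCOL_FAILED = "ProtocolFailed"
    INVALID_MESSAGE_PROPERTIES = "InvalidMessageProperties"
    DEFRAMING_FAILED = "DeframingFailed"
    CONNECTION_ABORTED = "ConnectionAborted"
    TIMEOUT = "Timeout"


class TransportError(Exception):
    """Hypothetical generic error that carries a detailed reason."""

    def __init__(self, error_type: str, reason: ErrorReason):
        super().__init__(f"{error_type}: {reason.value}")
        self.error_type = error_type
        self.reason = reason


@dataclass
class ConnectionConfig:
    # Hypothetical, heavily simplified configuration record.
    remote_endpoint: Optional[str] = None
    multicast_group: Optional[str] = None
    unidirectional_receive: bool = False


def check_configuration(cfg: ConnectionConfig,
                        active_open: bool) -> Optional[ErrorReason]:
    # An active open needs a remote endpoint.
    if active_open and cfg.remote_endpoint is None:
        return ErrorReason.INVALID_CONFIGURATION
    # A multicast group address requires a unidirectional receive.
    if cfg.multicast_group is not None and not cfg.unidirectional_receive:
        return ErrorReason.INVALID_CONFIGURATION
    return None
```

A caller would then raise, for example, TransportError("InitiateError", reason) whenever check_configuration returns a reason, so the application sees both the generic error type and the detailed cause.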
Appendix C. Existing Implementations | ||||
This appendix gives an overview of existing implementations, at the | ||||
time of writing, of transport systems that are (to some degree) in | ||||
line with this document. | ||||
o Apple's Network.framework: | ||||
* [A very brief introduction should be added] | ||||
* Documentation: https://developer.apple.com/documentation/network [1] | ||||
o NEAT: | ||||
* NEAT is the output of the European H2020 research project | ||||
"NEAT"; it is a user-space library for protocol-independent | ||||
communication on top of TCP, UDP and SCTP, with many more | ||||
features such as a policy manager. | ||||
* Code: https://github.com/NEAT-project/neat [2] | ||||
* NEAT project: https://www.neat-project.org [3] | ||||
o PyTAPS: | ||||
* A TAPS implementation based on Python asyncio, offering | ||||
protocol-independent communication to applications on top of | ||||
TCP, UDP and TLS, with support for multicast. | ||||
* Code: https://github.com/fg-inet/python-asyncio-taps [4] | ||||
Authors' Addresses | Authors' Addresses | |||
Anna Brunstrom (editor) | Anna Brunstrom (editor) | |||
Karlstad University | Karlstad University | |||
Universitetsgatan 2 | Universitetsgatan 2 | |||
651 88 Karlstad | 651 88 Karlstad | |||
Sweden | Sweden | |||
Email: anna.brunstrom@kau.se | Email: anna.brunstrom@kau.se | |||
Tommy Pauly (editor) | Tommy Pauly (editor) | |||
Apple Inc. | Apple Inc. | |||
One Apple Park Way | One Apple Park Way | |||
Cupertino, California 95014 | Cupertino, California 95014 | |||
United States of America | United States of America | |||
Email: tpauly@apple.com | Email: tpauly@apple.com | |||
Theresa Enghardt | Theresa Enghardt | |||
TU Berlin | TU Berlin | |||
End of changes. 103 change blocks. | ||||
238 lines changed or deleted | 549 lines changed or added | |||