draft-ietf-taps-impl-09.txt   draft-ietf-taps-impl-10.txt 
TAPS Working Group A. Brunstrom, Ed. TAPS Working Group A. Brunstrom, Ed.
Internet-Draft Karlstad University Internet-Draft Karlstad University
Intended status: Informational T. Pauly, Ed. Intended status: Informational T. Pauly, Ed.
Expires: 1 November 2021 Apple Inc. Expires: 13 January 2022 Apple Inc.
T. Enghardt T. Enghardt
Netflix Netflix
K-J. Grinnemo K-J. Grinnemo
Karlstad University Karlstad University
T. Jones T. Jones
University of Aberdeen University of Aberdeen
P. Tiesel P. Tiesel
SAP SE SAP SE
C. Perkins C. Perkins
University of Glasgow University of Glasgow
M. Welzl M. Welzl
University of Oslo University of Oslo
30 April 2021 12 July 2021
Implementing Interfaces to Transport Services Implementing Interfaces to Transport Services
draft-ietf-taps-impl-09 draft-ietf-taps-impl-10
Abstract Abstract
The Transport Services system enables applications to use transport The Transport Services system enables applications to use transport
protocols flexibly for network communication and defines a protocol- protocols flexibly for network communication and defines a protocol-
independent Transport Services Application Programming Interface independent Transport Services Application Programming Interface
(API) that is based on an asynchronous, event-driven interaction (API) that is based on an asynchronous, event-driven interaction
pattern. This document serves as a guide to implementation on how to pattern. This document serves as a guide to implementation on how to
build such a system. build such a system.
skipping to change at page 1, line 48 skipping to change at page 1, line 48
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on 1 November 2021. This Internet-Draft will expire on 13 January 2022.
Copyright Notice Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document. license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
skipping to change at page 2, line 27 skipping to change at page 2, line 27
provided without warranty as described in the Simplified BSD License. provided without warranty as described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Implementing Connection Objects . . . . . . . . . . . . . . . 4 2. Implementing Connection Objects . . . . . . . . . . . . . . . 4
3. Implementing Pre-Establishment . . . . . . . . . . . . . . . 5 3. Implementing Pre-Establishment . . . . . . . . . . . . . . . 5
3.1. Configuration-time errors . . . . . . . . . . . . . . . . 5 3.1. Configuration-time errors . . . . . . . . . . . . . . . . 5
3.2. Role of system policy . . . . . . . . . . . . . . . . . . 6 3.2. Role of system policy . . . . . . . . . . . . . . . . . . 6
4. Implementing Connection Establishment . . . . . . . . . . . . 7 4. Implementing Connection Establishment . . . . . . . . . . . . 7
4.1. Candidate Gathering . . . . . . . . . . . . . . . . . . . 8 4.1. Structuring Candidates as a Tree . . . . . . . . . . . . 8
4.1.1. Gathering Endpoint Candidates . . . . . . . . . . . . 8 4.1.1. Branch Types . . . . . . . . . . . . . . . . . . . . 10
4.1.2. Structuring Options as a Tree . . . . . . . . . . . . 9 4.1.2. Branching Order-of-Operations . . . . . . . . . . . . 12
4.1.3. Branch Types . . . . . . . . . . . . . . . . . . . . 11 4.1.3. Sorting Branches . . . . . . . . . . . . . . . . . . 14
4.1.4. Branching Order-of-Operations . . . . . . . . . . . . 13 4.2. Candidate Gathering . . . . . . . . . . . . . . . . . . . 15
4.1.5. Sorting Branches . . . . . . . . . . . . . . . . . . 15 4.2.1. Gathering Endpoint Candidates . . . . . . . . . . . . 15
4.2. Candidate Racing . . . . . . . . . . . . . . . . . . . . 16 4.3. Candidate Racing . . . . . . . . . . . . . . . . . . . . 17
4.2.1. Simultaneous . . . . . . . . . . . . . . . . . . . . 17 4.3.1. Simultaneous . . . . . . . . . . . . . . . . . . . . 17
4.2.2. Staggered . . . . . . . . . . . . . . . . . . . . . . 17 4.3.2. Staggered . . . . . . . . . . . . . . . . . . . . . . 18
4.2.3. Failover . . . . . . . . . . . . . . . . . . . . . . 18 4.3.3. Failover . . . . . . . . . . . . . . . . . . . . . . 19
4.3. Completing Establishment . . . . . . . . . . . . . . . . 18 4.4. Completing Establishment . . . . . . . . . . . . . . . . 19
4.3.1. Determining Successful Establishment . . . . . . . . 19 4.4.1. Determining Successful Establishment . . . . . . . . 20
4.4. Establishing multiplexed connections . . . . . . . . . . 20 4.5. Establishing multiplexed connections . . . . . . . . . . 21
4.5. Handling connectionless protocols . . . . . . . . . . . . 20 4.6. Handling connectionless protocols . . . . . . . . . . . . 21
4.6. Implementing listeners . . . . . . . . . . . . . . . . . 21 4.7. Implementing listeners . . . . . . . . . . . . . . . . . 21
4.6.1. Implementing listeners for Connected Protocols . . . 21 4.7.1. Implementing listeners for Connected Protocols . . . 22
4.6.2. Implementing listeners for Connectionless 4.7.2. Implementing listeners for Connectionless
Protocols . . . . . . . . . . . . . . . . . . . . . . 21 Protocols . . . . . . . . . . . . . . . . . . . . . . 22
4.6.3. Implementing listeners for Multiplexed Protocols . . 22 4.7.3. Implementing listeners for Multiplexed Protocols . . 22
5. Implementing Sending and Receiving Data . . . . . . . . . . . 22 5. Implementing Sending and Receiving Data . . . . . . . . . . . 23
5.1. Sending Messages . . . . . . . . . . . . . . . . . . . . 22 5.1. Sending Messages . . . . . . . . . . . . . . . . . . . . 23
5.1.1. Message Properties . . . . . . . . . . . . . . . . . 22 5.1.1. Message Properties . . . . . . . . . . . . . . . . . 23
5.1.2. Send Completion . . . . . . . . . . . . . . . . . . . 24 5.1.2. Send Completion . . . . . . . . . . . . . . . . . . . 25
5.1.3. Batching Sends . . . . . . . . . . . . . . . . . . . 24 5.1.3. Batching Sends . . . . . . . . . . . . . . . . . . . 25
5.2. Receiving Messages . . . . . . . . . . . . . . . . . . . 25 5.2. Receiving Messages . . . . . . . . . . . . . . . . . . . 25
5.3. Handling of data for fast-open protocols . . . . . . . . 25 5.3. Handling of data for fast-open protocols . . . . . . . . 26
6. Implementing Message Framers . . . . . . . . . . . . . . . . 26 6. Implementing Message Framers . . . . . . . . . . . . . . . . 27
6.1. Defining Message Framers . . . . . . . . . . . . . . . . 27 6.1. Defining Message Framers . . . . . . . . . . . . . . . . 28
6.2. Sender-side Message Framing . . . . . . . . . . . . . . . 28 6.2. Sender-side Message Framing . . . . . . . . . . . . . . . 29
6.3. Receiver-side Message Framing . . . . . . . . . . . . . . 29 6.3. Receiver-side Message Framing . . . . . . . . . . . . . . 30
7. Implementing Connection Management . . . . . . . . . . . . . 30 7. Implementing Connection Management . . . . . . . . . . . . . 31
7.1. Pooled Connection . . . . . . . . . . . . . . . . . . . . 30 7.1. Pooled Connection . . . . . . . . . . . . . . . . . . . . 31
7.2. Handling Path Changes . . . . . . . . . . . . . . . . . . 31 7.2. Handling Path Changes . . . . . . . . . . . . . . . . . . 32
8. Implementing Connection Termination . . . . . . . . . . . . . 32 8. Implementing Connection Termination . . . . . . . . . . . . . 33
9. Cached State . . . . . . . . . . . . . . . . . . . . . . . . 33 9. Cached State . . . . . . . . . . . . . . . . . . . . . . . . 34
9.1. Protocol state caches . . . . . . . . . . . . . . . . . . 33 9.1. Protocol state caches . . . . . . . . . . . . . . . . . . 34
9.2. Performance caches . . . . . . . . . . . . . . . . . . . 34 9.2. Performance caches . . . . . . . . . . . . . . . . . . . 35
10. Specific Transport Protocol Considerations . . . . . . . . . 35 10. Specific Transport Protocol Considerations . . . . . . . . . 36
10.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . 36 10.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . 37
10.2. MPTCP . . . . . . . . . . . . . . . . . . . . . . . . . 38 10.2. MPTCP . . . . . . . . . . . . . . . . . . . . . . . . . 39
10.3. UDP . . . . . . . . . . . . . . . . . . . . . . . . . . 38 10.3. UDP . . . . . . . . . . . . . . . . . . . . . . . . . . 39
10.4. UDP-Lite . . . . . . . . . . . . . . . . . . . . . . . . 40 10.4. UDP-Lite . . . . . . . . . . . . . . . . . . . . . . . . 41
10.5. UDP Multicast Receive . . . . . . . . . . . . . . . . . 40 10.5. UDP Multicast Receive . . . . . . . . . . . . . . . . . 41
10.6. SCTP . . . . . . . . . . . . . . . . . . . . . . . . . . 41 10.6. SCTP . . . . . . . . . . . . . . . . . . . . . . . . . . 42
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 44 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 45
12. Security Considerations . . . . . . . . . . . . . . . . . . . 44 12. Security Considerations . . . . . . . . . . . . . . . . . . . 45
12.1. Considerations for Candidate Gathering . . . . . . . . . 44 12.1. Considerations for Candidate Gathering . . . . . . . . . 45
12.2. Considerations for Candidate Racing . . . . . . . . . . 44 12.2. Considerations for Candidate Racing . . . . . . . . . . 45
13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 45 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 46
14. References . . . . . . . . . . . . . . . . . . . . . . . . . 45 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 46
14.1. Normative References . . . . . . . . . . . . . . . . . . 45 14.1. Normative References . . . . . . . . . . . . . . . . . . 46
14.2. Informative References . . . . . . . . . . . . . . . . . 46 14.2. Informative References . . . . . . . . . . . . . . . . . 47
Appendix A. API Mapping Template . . . . . . . . . . . . . . . . 48 Appendix A. API Mapping Template . . . . . . . . . . . . . . . . 49
Appendix B. Additional Properties . . . . . . . . . . . . . . . 49 Appendix B. Additional Properties . . . . . . . . . . . . . . . 50
B.1. Properties Affecting Sorting of Branches . . . . . . . . 49 B.1. Properties Affecting Sorting of Branches . . . . . . . . 50
Appendix C. Reasons for errors . . . . . . . . . . . . . . . . . 49 Appendix C. Reasons for errors . . . . . . . . . . . . . . . . . 51
Appendix D. Existing Implementations . . . . . . . . . . . . . . 50 Appendix D. Existing Implementations . . . . . . . . . . . . . . 52
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 51 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 52
1. Introduction 1. Introduction
The Transport Services architecture [I-D.ietf-taps-arch] defines a The Transport Services architecture [I-D.ietf-taps-arch] defines a
system that allows applications to use transport networking protocols system that allows applications to use transport networking protocols
flexibly. The interface such a system exposes to applications is flexibly. The interface such a system exposes to applications is
defined as the Transport Services API [I-D.ietf-taps-interface]. defined as the Transport Services API [I-D.ietf-taps-interface].
This API is designed to be generic across multiple transport This API is designed to be generic across multiple transport
protocols and sets of protocol features. protocols and sets of protocol features.
skipping to change at page 4, line 29 skipping to change at page 4, line 29
application constraints on, and preferences for, the transport; application constraints on, and preferences for, the transport;
* the Connection, the basic object that represents a flow of data as * the Connection, the basic object that represents a flow of data as
Messages in either direction between the Local and Remote Messages in either direction between the Local and Remote
Endpoints; Endpoints;
* and the Listener, a passive waiting object that delivers new * and the Listener, a passive waiting object that delivers new
Connections. Connections.
Preconnection objects should be implemented as bundles of properties Preconnection objects should be implemented as bundles of properties
that an application can both read and write. Once a Preconnection that an application can both read and write. A Preconnection object
has been used to create an outbound Connection or a Listener, the influences a Connection only at one point in time: when the
implementation should ensure that the copy of the properties held by Connection is created. Connection objects represent the interface
the Connection or Listener is immutable. This may involve performing between the application and the implementation to manage transport
a deep-copy, copying the object with all the objects it references, state, and conduct data transfer. During the process of
if the application is still able to modify properties on the original establishment (Section 4), the Connection will not be bound to a
Preconnection object. specific transport protocol instance, since multiple candidate
Protocol Stacks might be raced.
Connection objects represent the interface between the application Once a Preconnection has been used to create an outbound Connection
and the implementation to manage transport state, and conduct data or a Listener, the implementation should ensure that the copy of the
transfer. During the process of establishment (Section 4), the properties held by the Connection or Listener is not affected when
Connection will not be bound to a specific transport flow, since the application makes changes to a Preconnection object. This may
there may be multiple candidate Protocol Stacks being raced. Once involve the implementation performing a deep-copy, copying the object
the Connection is established, its interface maps actions and events with all the objects that it references.
to the details of the chosen Protocol Stack. For example, the same
Connection object may ultimately represent the interface into a TCP Once the Connection is established, its interface maps actions and
connection, a TLS session over TCP, a UDP flow with fully-specified events to the details of the chosen Protocol Stack. For example, the
local and remote endpoints, a DTLS session, a SCTP stream, a QUIC same Connection object may ultimately represent a single instance of
stream, or an HTTP/2 stream. one transport protocol (e.g., a TCP connection, a TLS session over
TCP, a UDP flow with fully-specified Local and Remote Endpoints, a
DTLS session, a SCTP stream, a QUIC stream, or an HTTP/2 stream).
The properties held by a Connection or Listener are independent of
other connections that are not part of the same Connection Group.
Once Initiate has been called, the Selection Properties and Endpoint
information are immutable (i.e., an application is not able to later
modify Selection Properties on the original Preconnection object).
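The snapshot behaviour described above can be illustrated with a minimal Python sketch. The class layout, the attribute names (selection_properties, remote_endpoints), and the initiate() method are hypothetical and chosen only for illustration; they are not defined by the Transport Services API. The sketch assumes that a deep copy taken at Initiate is sufficient to isolate the Connection from later edits to the Preconnection.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Preconnection:
    # Hypothetical property bundle; the field names are illustrative only.
    selection_properties: dict = field(default_factory=dict)
    remote_endpoints: list = field(default_factory=list)

    def initiate(self):
        # Deep-copy the configuration so that later changes to this
        # Preconnection cannot reach the Connection created here.
        return Connection(copy.deepcopy(self.selection_properties),
                          copy.deepcopy(self.remote_endpoints))


class Connection:
    def __init__(self, selection_properties, remote_endpoints):
        self._selection_properties = selection_properties
        self._remote_endpoints = remote_endpoints

    @property
    def selection_properties(self):
        # Return a copy so callers cannot mutate the held state either.
        return copy.deepcopy(self._selection_properties)
```

A shallow copy would not be enough here, because nested objects referenced by the properties would still be shared with the original Preconnection.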
Listener objects are created with a Preconnection, at which point Listener objects are created with a Preconnection, at which point
their configuration should be considered immutable by the their configuration should be considered immutable by the
implementation. The process of listening is described in implementation. The process of listening is described in
Section 4.6. Section 4.7.
3. Implementing Pre-Establishment 3. Implementing Pre-Establishment
During pre-establishment the application specifies one or more During pre-establishment the application specifies one or more
Endpoints to be used for communication as well as protocol Endpoints to be used for communication as well as protocol
preferences and constraints via Selection Properties and, if desired, preferences and constraints via Selection Properties and, if desired,
also Connection Properties. Generally, Connection Properties should also Connection Properties. Generally, Connection Properties should
be configured as early as possible, because they can serve as input be configured as early as possible, because they can serve as input
to decisions that are made by the implementation (e.g., the Capacity to decisions that are made by the implementation (e.g., the Capacity
Profile can guide usage of a protocol offering scavenger-type Profile can guide usage of a protocol offering scavenger-type
skipping to change at page 5, line 37 skipping to change at page 5, line 45
Transport Properties, the transport system matches the required and Transport Properties, the transport system matches the required and
prohibited properties against the transport features of the available prohibited properties against the transport features of the available
protocols. protocols.
In the following cases, failure should be detected during pre- In the following cases, failure should be detected during pre-
establishment: establishment:
* A request by an application for Protocol Properties that include * A request by an application for Protocol Properties that include
requirements or prohibitions that cannot be satisfied by any of requirements or prohibitions that cannot be satisfied by any of
the available protocols. For example, if an application requires the available protocols. For example, if an application requires
"Configure Reliability per Message", but no such protocol is "Configure Reliability per Message", but no such feature is
available on the host running the transport system this should available in any protocol on the host running the transport
result in an error, e.g., when SCTP is not supported by the system, this should result in an error, e.g., when SCTP is not
operating system. supported by the operating system.
* A request by an application for Protocol Properties that are in * A request by an application for Protocol Properties that are in
conflict with each other, i.e., the required and prohibited conflict with each other, i.e., the required and prohibited
properties cannot be satisfied by the same protocol. For example, properties cannot be satisfied by the same protocol. For example,
if an application prohibits "Reliable Data Transfer" but then if an application prohibits "Reliable Data Transfer" but then
requires "Configure Reliability per Message", this mismatch should requires "Configure Reliability per Message", this mismatch should
result in an error. result in an error.
To avoid allocating resources, it is important that such cases fail To avoid allocating resources that are not finally needed, it is
as early as possible, e.g., prior to endpoint resolution, only to important that configuration-time errors fail as early as possible.
find out later that there is no protocol that satisfies the
requirements.
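As an illustration of failing early, the two cases above can be checked before any endpoint resolution takes place. The following Python sketch is only one possible approach: the function name, the ConfigurationError type, and the representation of each protocol as a set of feature names are all assumptions made for this example.

```python
class ConfigurationError(Exception):
    """Raised before establishment when no protocol can satisfy the request."""


def check_selection(required, prohibited, protocols):
    """Return the names of usable protocols, or raise ConfigurationError.

    `protocols` maps a protocol name to the set of features it offers.
    The feature names are hypothetical labels, not registered identifiers.
    """
    conflict = required & prohibited
    if conflict:
        # The same feature is both required and prohibited.
        raise ConfigurationError(f"required and prohibited: {sorted(conflict)}")
    usable = [name for name, features in protocols.items()
              if required <= features and not (prohibited & features)]
    if not usable:
        # Covers both an unsupported feature and mutually exclusive requests.
        raise ConfigurationError("no available protocol satisfies the request")
    return usable
```

With a table in which only SCTP offers per-message reliability and SCTP also offers reliable transfer, requiring the former while prohibiting the latter leaves no usable protocol, so the error is raised at configuration time.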
3.2. Role of system policy 3.2. Role of system policy
The properties specified during pre-establishment have a close The properties specified during pre-establishment have a close
relationship to system policy. The implementation is responsible for relationship to system policy. The implementation is responsible for
combining and reconciling several different sources of preferences combining and reconciling several different sources of preferences
when establishing Connections. These include, but are not limited when establishing Connections. These include, but are not limited
to: to:
1. Application preferences, i.e., preferences specified during the 1. Application preferences, i.e., preferences specified during the
skipping to change at page 7, line 8 skipping to change at page 7, line 8
looking up these policies will vary across various platforms. An looking up these policies will vary across various platforms. An
implementation should attempt to look up the relevant policies for implementation should attempt to look up the relevant policies for
the system in a dynamic way to make sure it is reflecting an accurate the system in a dynamic way to make sure it is reflecting an accurate
version of the system policy, since the system's policy regarding the version of the system policy, since the system's policy regarding the
application's traffic may change over time due to user or application's traffic may change over time due to user or
administrative changes. administrative changes.
4. Implementing Connection Establishment 4. Implementing Connection Establishment
The process of establishing a network connection begins when an The process of establishing a network connection begins when an
application expresses intent to communicate with a remote endpoint by application expresses intent to communicate with a Remote Endpoint by
calling Initiate. (At this point, any constraints or requirements calling Initiate. (At this point, any constraints or requirements
the application may have on the connection are available from pre- the application may have on the connection are available from pre-
establishment.) The process can be considered complete once there is establishment.) The process can be considered complete once there is
at least one Protocol Stack that has completed any required setup to at least one Protocol Stack that has completed any required setup to
the point that it can transmit and receive the application's data. the point that it can transmit and receive the application's data.
Connection establishment is divided into two top-level steps: Connection establishment is divided into two top-level steps:
Candidate Gathering, to identify the paths, protocols, and endpoints Candidate Gathering, to identify the paths, protocols, and endpoints
to use, and Candidate Racing (see Section 4.2.2 of to use, and Candidate Racing (see Section 4.2.2 of
[I-D.ietf-taps-arch]), in which the necessary protocol handshakes are [I-D.ietf-taps-arch]), in which the necessary protocol handshakes are
conducted so that the transport system can select which set to use. conducted so that the transport system can select which set to use.
This document structures candidates for racing as a tree.
This document structures the candidates for racing as a tree as a
terminological convention. While a tree structure is not the only
way in which racing can be implemented, it does ease the illustration
of how racing works.
The simplest example of this process might involve identifying the The simplest example of this process might involve identifying the
single IP address to which the implementation wishes to connect, single IP address to which the implementation wishes to connect,
using the system's current default interface or path, and starting a using the system's current default interface or path, and starting a
TCP handshake to establish a stream to the specified IP address. TCP handshake to establish a stream to the specified IP address.
However, each step may also vary depending on the requirements of the However, each step may also differ depending on the requirements of
connection: if the endpoint is defined as a hostname and port, then the connection: if the endpoint is defined as a hostname and port,
there may be multiple resolved addresses that are available; there then there may be multiple resolved addresses that are available;
may also be multiple interfaces or paths available, other than the there may also be multiple interfaces or paths available, other than
default system interface; and some protocols may not need any the default system interface; and some protocols may not need any
transport handshake to be considered "established" (such as UDP), transport handshake to be considered "established" (such as UDP),
while other connections may utilize layered protocol handshakes, such while other connections may utilize layered protocol handshakes, such
as TLS over TCP. as TLS over TCP.
Whenever an implementation has multiple options for connection Whenever an implementation has multiple options for connection
establishment, it can view the set of all individual connection establishment, it can view the set of all individual connection
establishment options as a single, aggregate connection establishment options as a single, aggregate connection
establishment. The aggregate set conceptually includes every valid establishment. The aggregate set conceptually includes every valid
combination of endpoints, paths, and protocols. As an example, combination of endpoints, paths, and protocols. As an example,
consider an implementation that initiates a TCP connection to a consider an implementation that initiates a TCP connection to a
skipping to change at page 8, line 15 skipping to change at page 8, line 21
would satisfy the original application intent. The concern of this would satisfy the original application intent. The concern of this
section is the algorithm defining which of these options to try, section is the algorithm defining which of these options to try,
when, and in what order. when, and in what order.
During Candidate Gathering, an implementation first excludes all During Candidate Gathering, an implementation first excludes all
protocols and paths that match a Prohibit or do not match all Require protocols and paths that match a Prohibit or do not match all Require
properties. Then, the implementation will sort branches according to properties. Then, the implementation will sort branches according to
Preferred properties, Avoided properties, and possibly other Preferred properties, Avoided properties, and possibly other
criteria. criteria.
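The exclude-then-sort step can be sketched in Python as follows. This is a simplified illustration under stated assumptions: candidates are modelled as (name, feature set) pairs, and the ranking counts matched Preferred properties minus matched Avoided properties. That scoring rule is an assumption of this sketch; the text above does not prescribe a particular ranking function.

```python
def gather(candidates, require, prohibit, prefer, avoid):
    """Filter candidates by Require/Prohibit, then rank by Prefer/Avoid.

    `candidates` is a list of (name, feature_set) pairs; the feature
    names are hypothetical labels used only for this example.
    """
    # Step 1: drop anything that matches a Prohibit or misses a Require.
    allowed = [(name, features) for name, features in candidates
               if require <= features and not (prohibit & features)]

    # Step 2: rank the remainder. The score is an assumed heuristic.
    def score(item):
        _, features = item
        return len(prefer & features) - len(avoid & features)

    # sorted() is stable, so equally ranked candidates keep their order.
    return [name for name, _ in sorted(allowed, key=score, reverse=True)]
```

An implementation could fold other criteria, such as cached performance estimates, into the score without changing the two-step structure.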
4.1. Candidate Gathering 4.1. Structuring Candidates as a Tree
The step of gathering candidates involves identifying which paths,
protocols, and endpoints may be used for a given Connection. This
list is determined by the requirements, prohibitions, and preferences
of the application as specified in the Selection Properties.
4.1.1. Gathering Endpoint Candidates
Both Local and Remote Endpoint Candidates must be discovered during
connection establishment. To support Interactive Connectivity
Establishment (ICE) [RFC8445], or similar protocols that involve out-
of-band indirect signalling to exchange candidates with the Remote
Endpoint, it's important to be able to query the set of candidate
Local Endpoints, and give the protocol stack a set of candidate
Remote Endpoints, before it attempts to establish connections.
4.1.1.1. Local Endpoint candidates
The set of possible Local Endpoints is gathered. In the simple case,
this merely enumerates the local interfaces and protocols and
allocates ephemeral source ports. For example, a system that has WiFi and
Ethernet and supports IPv4 and IPv6 might gather four candidate
locals (IPv4 on Ethernet, IPv6 on Ethernet, IPv4 on WiFi, and IPv6 on
WiFi) that can form the source for a transient.
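The simple case amounts to taking the cross product of interfaces and network protocols. A minimal Python sketch, assuming interfaces and IP versions are given as plain strings and leaving out the allocation of ephemeral ports (the function name is hypothetical):

```python
from itertools import product


def local_candidates(interfaces, ip_versions):
    """Enumerate one candidate local per (IP version, interface) pair."""
    return [(version, interface)
            for interface, version in product(interfaces, ip_versions)]


# Two interfaces and two IP versions give four candidate locals.
candidates = local_candidates(["Ethernet", "WiFi"], ["IPv4", "IPv6"])
```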
If NAT traversal is required, the process of gathering Local
Endpoints becomes broadly equivalent to the ICE candidate gathering
phase (see Section 5.1.1 of [RFC8445]). The endpoint determines its
server reflexive Local Endpoints (i.e., the translated address of a
local, on the other side of a NAT, e.g., via a STUN server [RFC5389])
and relayed locals (e.g., via a TURN server [RFC5766] or other
relay), for each interface and network protocol. These are added to
the set of candidate Local Endpoints for this connection.
Gathering Local Endpoints is primarily a local operation, although it
might involve exchanges with a STUN server to derive server reflexive
locals, or with a TURN server or other relay to derive relayed
locals. However, it does not involve communication with the Remote
Endpoint.
4.1.1.2. Remote Endpoint Candidates
The Remote Endpoint is typically a name that needs to be resolved
into a set of possible addresses that can be used for communication.
Resolving the Remote Endpoint is the process of recursively
performing such name lookups, until fully resolved, to return the set
of candidates for the remote of this connection.
How this is done will depend on the type of the Remote Endpoint, and
can also be specific to each Local Endpoint. A common case is when
the Remote Endpoint is a DNS name, in which case it is resolved to
give a set of IPv4 and IPv6 addresses representing that name. Some
types of remote might require more complex resolution. Resolving the
Remote Endpoint for a peer-to-peer connection might involve
communication with a rendezvous server, which in turn contacts the
peer to gain consent to communicate and retrieve its set of candidate
locals, which are returned and form the candidate remote addresses
for contacting that peer.
Resolving the Remote Endpoint is not a local operation. It will
involve a directory service, and can require communication with the
remote to rendezvous and exchange peer addresses. This can expose
some or all of the candidate locals to the remote.
4.1.2. Structuring Options as a Tree As noted above, the consideration of multiple candidates in a
gathering and racing process can be conceptually structured as a
tree; this terminological convention is used throughout this
document.
When an implementation responsible for connection establishment needs Each leaf node of the tree represents a single, coherent connection
to consider multiple options, it should logically structure these attempt, with an Endpoint, a Path, and a set of protocols that can
options as a hierarchical tree. Each leaf node of the tree directly negotiate and send data on the network. Each node in the
represents a single, coherent connection attempt, with an Endpoint, a tree that is not a leaf represents a connection attempt that is
Path, and a set of protocols that can directly negotiate and send either underspecified, or else includes multiple distinct options.
data on the network. Each node in the tree that is not a leaf For example, when connecting on an IP network, a connection attempt
represents a connection attempt that is either underspecified, or to a hostname and port is underspecified, because the connection
else includes multiple distinct options. For example, when attempt requires a resolved IP address as its Remote Endpoint. In
connecting on an IP network, a connection attempt to a hostname and this case, the node represented by the connection attempt to the
port is underspecified, because the connection attempt requires a hostname is a parent node, with child nodes for each IP address.
resolved IP address as its remote endpoint. In this case, the node Similarly, an implementation that is allowed to connect using
represented by the connection attempt to the hostname is a parent multiple interfaces will have a parent node of the tree for the
node, with child nodes for each IP address. Similarly, an decision between the paths, with a branch for each interface.
implementation that is allowed to connect using multiple interfaces
will have a parent node of the tree for the decision between the
paths, with a branch for each interface.
The example aggregate connection attempt above can be drawn as a tree The example aggregate connection attempt above can be drawn as a tree
by grouping the addresses resolved on the same interface into by grouping the addresses resolved on the same interface into
branches: branches:
|| ||
+==========================+ +==========================+
| www.example.com:80/Any | | www.example.com:80/Any |
+==========================+ +==========================+
// \\ // \\
skipping to change at page 11, line 4 skipping to change at page 9, line 49
use by the application. Another way to represent this is that every use by the application. Another way to represent this is that every
leaf node updates the state of its parent node when it becomes ready, leaf node updates the state of its parent node when it becomes ready,
until the trunk node of the tree is ready, which then notifies the until the trunk node of the tree is ready, which then notifies the
application that the connection as a whole is ready to use. application that the connection as a whole is ready to use.
A connection establishment tree may be degenerate, and only have a A connection establishment tree may be degenerate, and only have a
single leaf node, such as a connection attempt to an IP address over single leaf node, such as a connection attempt to an IP address over
a single interface with a single protocol. a single interface with a single protocol.
1 [192.0.2.1:80, Wi-Fi, TCP] 1 [192.0.2.1:80, Wi-Fi, TCP]
A parent node may also only have one child (or leaf) node, such as A parent node may also only have one child (or leaf) node, such as
when a hostname resolves to only a single IP address. when a hostname resolves to only a single IP address.
1 [www.example.com:80, Wi-Fi, TCP] 1 [www.example.com:80, Wi-Fi, TCP]
1.1 [192.0.2.1:80, Wi-Fi, TCP] 1.1 [192.0.2.1:80, Wi-Fi, TCP]
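The tree structure and the upward propagation of readiness described above can be illustrated with a minimal sketch. The class name CandidateNode and its fields are hypothetical and are not defined by this document; a real implementation would track more state per node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CandidateNode:
    """One node of the establishment tree: [endpoint, path, protocol].

    Hypothetical illustration only; names are not taken from the draft.
    """
    endpoint: str
    path: str
    protocol: str
    parent: Optional["CandidateNode"] = field(
        default=None, repr=False, compare=False)
    children: List["CandidateNode"] = field(default_factory=list)
    ready: bool = False

    def add_child(self, endpoint: str, path: str, protocol: str) -> "CandidateNode":
        child = CandidateNode(endpoint, path, protocol, parent=self)
        self.children.append(child)
        return child

    def is_leaf(self) -> bool:
        return not self.children

    def mark_ready(self) -> None:
        # A ready node updates its parent, and so on up to the trunk node.
        node: Optional[CandidateNode] = self
        while node is not None:
            node.ready = True
            node = node.parent


# A parent with a single child: a hostname that resolves to one address.
root = CandidateNode("www.example.com:80", "Wi-Fi", "TCP")
leaf = root.add_child("192.0.2.1:80", "Wi-Fi", "TCP")
leaf.mark_ready()
```

After leaf.mark_ready() is called, the trunk node root is also marked ready, which is the point at which the application would be notified.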
4.1.3. Branch Types 4.1.1. Branch Types
There are three types of branching from a parent node into one or There are three types of branching from a parent node into one or
more child nodes. Any parent node of the tree must only use one type more child nodes. Any parent node of the tree must only use one type
of branching. of branching.
4.1.3.1. Derived Endpoints 4.1.1.1. Derived Endpoints
If a connection originally targets a single endpoint, there may be If a connection originally targets a single endpoint, there may be
multiple endpoints of different types that can be derived from the multiple endpoints of different types that can be derived from the
original. The connection library creates an ordered list of the original. The connection library creates an ordered list of the
derived endpoints according to application preference, system policy derived endpoints according to application preference, system policy
and expected performance. and expected performance.
DNS hostname-to-address resolution is the most common method of DNS hostname-to-address resolution is the most common method of
endpoint derivation. When trying to connect to a hostname endpoint endpoint derivation. When trying to connect to a hostname endpoint
on a traditional IP network, the implementation should send DNS on a traditional IP network, the implementation should send DNS
queries for both A (IPv4) and AAAA (IPv6) records if both are queries for both A (IPv4) and AAAA (IPv6) records if both are
supported on the local link. The algorithm for ordering and racing supported on the local interface. The algorithm for ordering and
these addresses should follow the recommendations in Happy Eyeballs racing these addresses should follow the recommendations in Happy
[RFC8305]. Eyeballs [RFC8305].
1 [www.example.com:80, Wi-Fi, TCP] 1 [www.example.com:80, Wi-Fi, TCP]
1.1 [2001:DB8::1.80, Wi-Fi, TCP] 1.1 [2001:DB8::1.80, Wi-Fi, TCP]
1.2 [192.0.2.1:80, Wi-Fi, TCP] 1.2 [192.0.2.1:80, Wi-Fi, TCP]
1.3 [2001:DB8::2.80, Wi-Fi, TCP] 1.3 [2001:DB8::2.80, Wi-Fi, TCP]
1.4 [2001:DB8::3.80, Wi-Fi, TCP] 1.4 [2001:DB8::3.80, Wi-Fi, TCP]
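The ordering in the example above can be reproduced with a simplified sketch that interleaves address families. It is only loosely modeled on the address sorting in [RFC8305] and omits destination address selection and other details; the function name interleave_by_family and its prefer_v6 parameter are assumptions made for illustration.

```python
import ipaddress
from itertools import zip_longest


def interleave_by_family(addresses, prefer_v6=True):
    """Order resolved addresses so that address families alternate.

    Simplified, hypothetical sketch; not the full RFC 8305 algorithm.
    """
    v6 = [a for a in addresses if ipaddress.ip_address(a).version == 6]
    v4 = [a for a in addresses if ipaddress.ip_address(a).version == 4]
    first, second = (v6, v4) if prefer_v6 else (v4, v6)

    ordered = []
    for a, b in zip_longest(first, second):
        # zip_longest pads the shorter list with None once it is exhausted.
        if a is not None:
            ordered.append(a)
        if b is not None:
            ordered.append(b)
    return ordered


resolved = ["192.0.2.1", "2001:db8::1", "2001:db8::2", "2001:db8::3"]
children = interleave_by_family(resolved)
```

With the four addresses above, the resulting order is one IPv6 address, the IPv4 address, and then the remaining IPv6 addresses, matching child nodes 1.1 through 1.4.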
DNS-Based Service Discovery [RFC6763] can also provide an endpoint DNS-Based Service Discovery [RFC6763] can also provide an endpoint
derivation step. When trying to connect to a named service, the derivation step. When trying to connect to a named service, the
client may discover one or more hostname and port pairs on the local client may discover one or more hostname and port pairs on the local
network using multicast DNS [RFC6762]. These hostnames should each network using multicast DNS [RFC6762]. These hostnames should each
be treated as a branch that can be attempted independently from other be treated as a branch that can be attempted independently from other
hostnames. Each of these hostnames might resolve to one or more hostnames. Each of these hostnames might resolve to one or more
addresses, which would create multiple layers of branching. addresses, which would create multiple layers of branching.
1 [term-printer._ipp._tcp.meeting.ietf.org, Wi-Fi, TCP] 1 [term-printer._ipp._tcp.meeting.ietf.org, Wi-Fi, TCP]
1.1 [term-printer.meeting.ietf.org:631, Wi-Fi, TCP] 1.1 [term-printer.meeting.ietf.org:631, Wi-Fi, TCP]
1.1.1 [31.133.160.18.631, Wi-Fi, TCP] 1.1.1 [31.133.160.18.631, Wi-Fi, TCP]
4.1.3.2. Alternate Paths 4.1.1.2. Alternate Paths
If a client has multiple network interfaces available to it, e.g., a If a client has multiple network interfaces available to it, e.g., a
mobile client with both Wi-Fi and Cellular connectivity, it can mobile client with both Wi-Fi and Cellular connectivity, it can
attempt a connection over any of the interfaces. This represents a attempt a connection over any of the interfaces. This represents a
branch point in the connection establishment. Similar to a derived branch point in the connection establishment. Similar to a derived
endpoint, the interfaces should be ranked based on preference, system endpoint, the interfaces should be ranked based on preference, system
policy, and performance. Attempts should be started on one policy, and performance. Attempts should be started on one
interface, and then on other interfaces successively after delays interface, and then on other interfaces successively after delays
based on expected round-trip-time or other available metrics. based on expected round-trip-time or other available metrics.
skipping to change at page 12, line 29 skipping to change at page 11, line 29
This same approach applies to any situation in which the client is This same approach applies to any situation in which the client is
aware of multiple links or views of the network. Multiple Paths, aware of multiple links or views of the network. Multiple Paths,
each with a coherent set of addresses, routes, DNS server, and more, each with a coherent set of addresses, routes, DNS server, and more,
may share a single interface. A path may also represent a virtual may share a single interface. A path may also represent a virtual
interface service such as a Virtual Private Network (VPN). interface service such as a Virtual Private Network (VPN).
The list of available paths should be constrained by any requirements The list of available paths should be constrained by any requirements
or prohibitions the application sets, as well as system policy. or prohibitions the application sets, as well as system policy.
4.1.3.3. Protocol Options 4.1.1.3. Protocol Options
Differences in possible protocol compositions and options can also Differences in possible protocol compositions and options can also
provide a branching point in connection establishment. This allows provide a branching point in connection establishment. This allows
clients to be resilient to situations in which a certain protocol is clients to be resilient to situations in which a certain protocol is
not functioning on a server or network. not functioning on a server or network.
This approach is commonly used for connections with optional proxy This approach is commonly used for connections with optional proxy
server configurations. A single connection might have several server configurations. A single connection might have several
options available: an HTTP-based proxy, a SOCKS-based proxy, or no options available: an HTTP-based proxy, a SOCKS-based proxy, or no
proxy. These options should be ranked and attempted in succession. proxy. These options should be ranked and attempted in succession.
skipping to change at page 13, line 25 skipping to change at page 12, line 25
1.1.1 [192.0.2.1:80, Any, SCTP] 1.1.1 [192.0.2.1:80, Any, SCTP]
1.2 [www.example.com:80, Any, TCP] 1.2 [www.example.com:80, Any, TCP]
1.2.1 [192.0.2.1:80, Any, TCP] 1.2.1 [192.0.2.1:80, Any, TCP]
Implementations that support racing protocols and protocol options Implementations that support racing protocols and protocol options
should maintain a history of which protocols and protocol options should maintain a history of which protocols and protocol options
successfully established, on a per-network and per-endpoint basis successfully established, on a per-network and per-endpoint basis
(see Section 9.2). This information can influence future racing (see Section 9.2). This information can influence future racing
decisions to prioritize or prune branches. decisions to prioritize or prune branches.
4.1.4. Branching Order-of-Operations 4.1.2. Branching Order-of-Operations
Branch types must occur in a specific order relative to one another Branch types must occur in a specific order relative to one another
to avoid creating leaf nodes with invalid or incompatible settings. to avoid creating leaf nodes with invalid or incompatible settings.
In the example above, it would be invalid to branch for derived In the example above, it would be invalid to branch for derived
endpoints (the DNS results for www.example.com) before branching endpoints (the DNS results for www.example.com) before branching
between interface paths, since there are situations when the results between interface paths, since there are situations when the results
will be different across networks due to private names or different will be different across networks due to private names or different
supported IP versions. Implementations must be careful to branch in supported IP versions. Implementations must be careful to branch in
an order that results in usable leaf nodes whenever there are an order that results in usable leaf nodes whenever there are
multiple branch types that could be used from a single node. multiple branch types that could be used from a single node.
skipping to change at page 15, line 5 skipping to change at page 14, line 5
Selection Property of preferring WiFi takes precedence over the Selection Property of preferring WiFi takes precedence over the
Property that led to a preference for SCTP. Property that led to a preference for SCTP.
1. [www.example.com:80, Any, Any Stream] 1. [www.example.com:80, Any, Any Stream]
1.1 [192.0.2.1:80, Wi-Fi, Any Stream] 1.1 [192.0.2.1:80, Wi-Fi, Any Stream]
1.1.1 [192.0.2.1:80, Wi-Fi, TCP] 1.1.1 [192.0.2.1:80, Wi-Fi, TCP]
1.2 [192.0.3.1:80, LTE, Any Stream] 1.2 [192.0.3.1:80, LTE, Any Stream]
1.2.1 [192.0.3.1:80, LTE, SCTP] 1.2.1 [192.0.3.1:80, LTE, SCTP]
1.2.2 [192.0.3.1:80, LTE, TCP] 1.2.2 [192.0.3.1:80, LTE, TCP]
4.1.5. Sorting Branches 4.1.3. Sorting Branches
Implementations should sort the branches of the tree of connection Implementations should sort the branches of the tree of connection
options in order of their preference rank, from most preferred to options in order of their preference rank, from most preferred to
least preferred. Leaf nodes on branches with higher rankings least preferred. Leaf nodes on branches with higher rankings
represent connection attempts that will be raced first. represent connection attempts that will be raced first.
Implementations should order the branches to reflect the preferences Implementations should order the branches to reflect the preferences
expressed by the application for its new connection, including expressed by the application for its new connection, including
Selection Properties, which are specified in Selection Properties, which are specified in
[I-D.ietf-taps-interface]. [I-D.ietf-taps-interface].
skipping to change at page 15, line 35 skipping to change at page 14, line 35
accordingly rank the paths. If the application specifies an accordingly rank the paths. If the application specifies an
interface type to be required or prohibited, an implementation is interface type to be required or prohibited, an implementation is
expected to not include the non-conforming paths. expected to not include the non-conforming paths.
* "Capacity Profile": An implementation can use the Capacity Profile * "Capacity Profile": An implementation can use the Capacity Profile
to prefer paths that match an application's expected traffic to prefer paths that match an application's expected traffic
pattern. This match will use cached performance estimates, see pattern. This match will use cached performance estimates, see
Section 9.2: Section 9.2:
- Scavenger: Prefer paths with the highest expected available - Scavenger: Prefer paths with the highest expected available
capacity, based on the observed maximum throughput; capacity, but minimising impact on other traffic, based on the
observed maximum throughput;
- Low Latency/Interactive: Prefer paths with the lowest expected - Low Latency/Interactive: Prefer paths with the lowest expected
Round Trip Time, based on observed round trip time estimates; Round Trip Time, based on observed round trip time estimates;
- Constant-Rate Streaming: Prefer paths that can are expected to - Low Latency/Non-Interactive: Prefer paths with a low expected
Round Trip Time, but can tolerate delay variation;
- Constant-Rate Streaming: Prefer paths that are expected to
satisfy the requested Stream Send or Stream Receive Bitrate, satisfy the requested Stream Send or Stream Receive Bitrate,
based on the observed maximum throughput. based on the observed maximum throughput;
- Capacity-Seeking: Prefer adapting to paths to determine the
highest available capacity, based on the observed maximum
throughput.
Implementations process the Properties in the following order: Implementations process the Properties in the following order:
Prohibit, Require, Prefer, Avoid. If Selection Properties contain Prohibit, Require, Prefer, Avoid. If Selection Properties contain
any prohibited properties, the implementation should first purge any prohibited properties, the implementation should first purge
branches containing nodes with these properties. For required branches containing nodes with these properties. For required
properties, it should only keep branches that satisfy these properties, it should only keep branches that satisfy these
requirements. Next, it should order the branches according to the requirements. Next, it should order the branches according to the
preferred properties, and finally use any avoided properties as a preferred properties, and finally use any avoided properties as a
tiebreaker. When ordering branches, an implementation can give more tiebreaker. When ordering branches, an implementation can give more
weight to properties that the application has explicitly set, than to weight to properties that the application has explicitly set, than to
the properties that are default. the properties that are default.
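The processing order of Prohibit, Require, Prefer, Avoid can be sketched as a filter followed by a sort. Representing each branch as a name plus a set of property strings, and scoring preference by counting matching properties, are assumptions made for illustration; this document does not prescribe a particular scoring method.

```python
def sort_branches(branches, prohibit=(), require=(), prefer=(), avoid=()):
    """Filter and order branches given as (name, set_of_properties) pairs.

    Hypothetical sketch: the data model and scoring are assumptions.
    """
    prohibit, require = set(prohibit), set(require)
    prefer, avoid = set(prefer), set(avoid)

    # 1. Prohibit: purge branches that have any prohibited property.
    kept = [b for b in branches if not (b[1] & prohibit)]
    # 2. Require: keep only branches that satisfy every requirement.
    kept = [b for b in kept if require <= b[1]]

    # 3. Prefer: more preferred properties sort first.
    # 4. Avoid: fewer avoided properties break ties.
    def rank(branch):
        return (-len(branch[1] & prefer), len(branch[1] & avoid))

    # sorted() is stable, so equally ranked branches keep their input order.
    return sorted(kept, key=rank)


branches = [
    ("LTE/TCP", {"lte", "tcp"}),
    ("Wi-Fi/SCTP", {"wifi", "sctp"}),
    ("Wi-Fi/TCP", {"wifi", "tcp"}),
    ("Ethernet/UDP", {"ethernet", "udp"}),
]
ordered = sort_branches(branches, prohibit={"udp"}, prefer={"wifi"}, avoid={"sctp"})
```

In this example the UDP branch is purged, the two Wi-Fi branches rank ahead of LTE, and the avoided SCTP property only decides the order between the two Wi-Fi branches.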
The available protocols and paths on a specific system and in a The available protocols and paths on a specific system and in a
specific context can change; therefore, the result of sorting and the specific context can change; therefore, the result of sorting and the
outcome of racing may vary, even when using the same Selection and outcome of racing may vary, even when using the same Selection and
Connection Properties. However, an implementation ought to provide a Connection Properties. However, an implementation ought to provide a
consistent outcome to applications, e.g., by preferring protocols and consistent outcome to applications, e.g., by preferring protocols and
paths that are already used by existing Connections that specified paths that are already used by existing Connections that specified
similar Properties. similar Properties.
4.2. Candidate Racing 4.2. Candidate Gathering
The step of gathering candidates involves identifying which paths,
protocols, and endpoints may be used for a given Connection. This
list is determined by the requirements, prohibitions, and preferences
of the application as specified in the Selection Properties.
4.2.1. Gathering Endpoint Candidates
Both Local and Remote Endpoint Candidates must be discovered during
connection establishment. To support Interactive Connectivity
Establishment (ICE) [RFC8445], or similar protocols that involve out-
of-band indirect signalling to exchange candidates with the Remote
Endpoint, it is important to query the set of candidate Local
Endpoints, and provide the protocol stack with a set of candidate
Remote Endpoints, before the Local Endpoint attempts to establish
connections.
4.2.1.1. Local Endpoint candidates
The set of possible Local Endpoints is gathered. In the simple case,
this merely enumerates the local interfaces and protocols, and
allocates ephemeral source ports. For example, a system that has
WiFi and Ethernet and supports IPv4 and IPv6 might gather four
candidate Local Endpoints (IPv4 on Ethernet, IPv6 on Ethernet, IPv4
on WiFi, and IPv6 on WiFi) that can form the source for a transient.
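The simple case can be sketched as a cross product of interfaces and IP versions. The function name gather_local_candidates is hypothetical, and the sketch deliberately leaves out ephemeral port allocation and NAT traversal.

```python
from itertools import product


def gather_local_candidates(interfaces, ip_versions):
    """Enumerate one candidate Local Endpoint per (IP version, interface).

    Hypothetical sketch; source ports and NAT traversal are not modeled.
    """
    return [(version, iface) for iface, version in product(interfaces, ip_versions)]


candidates = gather_local_candidates(["Ethernet", "WiFi"], ["IPv4", "IPv6"])
```

For a system with WiFi and Ethernet that supports IPv4 and IPv6, this yields the four candidates listed in the text.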
If NAT traversal is required, the process of gathering Local
Endpoints becomes broadly equivalent to the ICE candidate gathering
phase (see Section 5.1.1 of [RFC8445]). The endpoint determines its
server reflexive Local Endpoints (i.e., the translated address of a
Local Endpoint, on the other side of a NAT, e.g., via a STUN server
[RFC5389]) and relayed Local Endpoints (e.g., via a TURN server
[RFC5766] or other relay), for each interface and network protocol.
These are added to the set of candidate Local Endpoints for this
connection.
Gathering Local Endpoints is primarily a local operation, although it
might involve exchanges with a STUN server to derive server reflexive
Local Endpoints, or with a TURN server or other relay to derive
relayed Local Endpoints. However, it does not involve communication
with the Remote Endpoint.
4.2.1.2. Remote Endpoint Candidates
The Remote Endpoint is typically a name that needs to be resolved
into a set of possible addresses that can be used for communication.
Resolving the Remote Endpoint is the process of recursively
performing such name lookups, until fully resolved, to return the set
of candidates for the Remote Endpoint of this connection.
How this resolution is done will depend on the type of the Remote
Endpoint, and can also be specific to each Local Endpoint. A common
case is when the Remote Endpoint is a DNS name, in which case it is
resolved to give a set of IPv4 and IPv6 addresses representing that
name. Some types of Remote Endpoint might require more complex
resolution. Resolving the Remote Endpoint for a peer-to-peer
connection might involve communication with a rendezvous server,
which in turn contacts the peer to gain consent to communicate and
retrieve its set of candidate Local Endpoints, which are returned and
form the candidate remote addresses for contacting that peer.
Resolving the Remote Endpoint is not a local operation. It will
involve a directory service, and can require communication with the
Remote Endpoint to rendezvous and exchange peer addresses. This can
expose some or all of the candidate Local Endpoints to the Remote
Endpoint.
4.3. Candidate Racing
The primary goal of the Candidate Racing process is to successfully The primary goal of the Candidate Racing process is to successfully
negotiate a protocol stack to an endpoint over an interface--to negotiate a protocol stack to an endpoint over an interface--to
connect a single leaf node of the tree--with as little delay and as connect a single leaf node of the tree--with as little delay and as
few unnecessary connection attempts as possible. Optimizing these few unnecessary connection attempts as possible. Optimizing these
two factors improves the user experience, while minimizing network two factors improves the user experience, while minimizing network
load. load.
This section covers the dynamic aspect of connection establishment. This section covers the dynamic aspect of connection establishment.
The tree described above is a useful conceptual and architectural The tree described above is a useful conceptual and architectural
skipping to change at page 17, line 10 skipping to change at page 17, line 39
Each approach is appropriate in different use-cases and branch types. Each approach is appropriate in different use-cases and branch types.
However, to avoid consuming unnecessary network resources, However, to avoid consuming unnecessary network resources,
implementations should not use simultaneous racing as a default implementations should not use simultaneous racing as a default
approach. approach.
The timing algorithms for racing should remain independent across The timing algorithms for racing should remain independent across
branches of the tree. Any timers or racing logic is isolated to a branches of the tree. Any timers or racing logic is isolated to a
given parent node, and is not ordered precisely with regards to other given parent node, and is not ordered precisely with regards to other
children of other nodes. children of other nodes.
4.2.1. Simultaneous 4.3.1. Simultaneous
Simultaneous racing is when multiple alternate branches are started Simultaneous racing is when multiple alternate branches are started
without waiting for any one branch to make progress before starting without waiting for any one branch to make progress before starting
the next alternative. This means the attempts are effectively the next alternative. This means the attempts are effectively
simultaneous. Simultaneous racing should be avoided by simultaneous. Simultaneous racing should be avoided by
implementations, since it consumes extra network resources and implementations, since it consumes extra network resources and
establishes state that might not be used. establishes state that might not be used.
4.2.2. Staggered 4.3.2. Staggered
Staggered racing can be used whenever a single node of the tree has Staggered racing can be used whenever a single node of the tree has
multiple child nodes. Based on the order determined when building multiple child nodes. Based on the order determined when building
the tree, the first child node will be initiated immediately, the tree, the first child node will be initiated immediately,
followed by the next child node after some delay. Once that second followed by the next child node after some delay. Once that second
child node is initiated, the third child node (if present) will begin child node is initiated, the third child node (if present) will begin
after another delay, and so on until all child nodes have been after another delay, and so on until all child nodes have been
initiated, or one of the child nodes successfully completes its initiated, or one of the child nodes successfully completes its
negotiation. negotiation.
Staggered racing attempts can proceed in parallel. Implementations Staggered racing attempts can proceed in parallel. Implementations
should not terminate an earlier child connection attempt upon should not terminate an earlier child connection attempt upon
starting a secondary child. starting a secondary child.
If a child node fails to establish connectivity (as in Section 4.3.1) If a child node fails to establish connectivity (as in Section 4.4.1)
before the delay time has expired for the next child, the next child before the delay time has expired for the next child, the next child
should be started immediately. should be started immediately.
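The staggered behavior described above can be sketched with asyncio. This is a minimal illustration under stated assumptions, not a definitive implementation: the name staggered_race is hypothetical, a single fixed delay is used for every child, and the remaining attempts are cancelled as soon as one succeeds, whereas an implementation may instead let some handshakes complete to gather metrics.

```python
import asyncio


async def staggered_race(attempts, delay):
    """Race child attempts in preference order with a stagger delay.

    attempts: zero-argument coroutine functions (hypothetical interface).
    Returns the result of the first attempt to succeed; raises
    ConnectionError if every attempt fails.
    """
    remaining = list(attempts)
    running = set()
    try:
        while remaining or running:
            if remaining:
                # Start the next child; earlier children keep running.
                running.add(asyncio.create_task(remaining.pop(0)()))
            # Only apply the stagger delay while children are still unstarted.
            timeout = delay if remaining else None
            done, running = await asyncio.wait(
                running, timeout=timeout, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:
                    return task.result()
            # Reaching here means the delay expired or a child failed;
            # in both cases the loop starts the next child right away.
        raise ConnectionError("all candidates failed")
    finally:
        for task in running:
            task.cancel()


async def demo():
    async def refused():
        raise ConnectionError("refused")

    async def connects():
        await asyncio.sleep(0.05)
        return "second child"

    async def not_needed():
        await asyncio.sleep(0.01)
        return "third child"

    return await staggered_race([refused, connects, not_needed], delay=1.0)
```

In demo(), the first child fails at once, so the second child starts without waiting for the one-second delay, and the third child is never started because the second completes first.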
Staggered racing between IP addresses for a generic Connection should Staggered racing between IP addresses for a generic Connection should
follow the Happy Eyeballs algorithm described in [RFC8305]. follow the Happy Eyeballs algorithm described in [RFC8305].
[RFC8421] provides guidance for racing when performing Interactive [RFC8421] provides guidance for racing when performing Interactive
Connectivity Establishment (ICE). Connectivity Establishment (ICE).
Generally, the delay before starting a given child node ought to be Generally, the delay before starting a given child node ought to be
based on the length of time the previously started child node is based on the length of time the previously started child node is
skipping to change at page 18, line 14 skipping to change at page 19, line 5
network interface (such as radio association) and name resolution network interface (such as radio association) and name resolution
over that interface, in addition to the delay that would be added for over that interface, in addition to the delay that would be added for
a single transport connection handshake. a single transport connection handshake.
Since the staggered delay can be chosen based on dynamic information, Since the staggered delay can be chosen based on dynamic information,
such as predicted round-trip time, implementations should define such as predicted round-trip time, implementations should define
upper and lower bounds for delay times. These bounds are upper and lower bounds for delay times. These bounds are
implementation-specific, and may differ based on which branch type is implementation-specific, and may differ based on which branch type is
being used. being used.
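Bounding a dynamically chosen delay can be sketched as a simple clamp. The function name, the multiplier applied to the predicted round-trip time, and the default bounds are illustrative assumptions; this document leaves the actual bounds to the implementation.

```python
def stagger_delay(predicted_rtt, minimum=0.1, maximum=2.0, multiplier=2.0):
    """Return a stagger delay in seconds, clamped to [minimum, maximum].

    All default values are illustrative assumptions, not recommendations
    taken from this document.
    """
    return min(max(predicted_rtt * multiplier, minimum), maximum)
```

An implementation could keep a separate pair of bounds per branch type, for example allowing a longer maximum when racing across interfaces than when racing addresses on one interface.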
4.2.3. Failover 4.3.3. Failover
If an implementation or application has a strong preference for one If an implementation or application has a strong preference for one
branch over another, the branching node may choose to wait until one branch over another, the branching node may choose to wait until one
child has failed before starting the next. Failure of a leaf node is child has failed before starting the next. Failure of a leaf node is
determined by its protocol negotiation failing or timing out; failure determined by its protocol negotiation failing or timing out; failure
of a parent branching node is determined by all of its children of a parent branching node is determined by all of its children
failing. failing.
An example in which failover is recommended is a race between a An example in which failover is recommended is a race between a
protocol stack that uses a proxy and a protocol stack that bypasses protocol stack that uses a proxy and a protocol stack that bypasses
the proxy. Failover is useful in case the proxy is down or the proxy. Failover is useful in case the proxy is down or
misconfigured, but any more aggressive type of racing may end up misconfigured, but any more aggressive type of racing may end up
unnecessarily avoiding a proxy that was preferred by policy. unnecessarily avoiding a proxy that was preferred by policy.
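Failover can be sketched as a strictly sequential loop in which the next child is started only after the previous one has failed. The function name failover and the use of callables that raise OSError on failure are assumptions made for illustration.

```python
def failover(children):
    """Try children in preference order, one at a time.

    children: callables that return a connection or raise OSError
    (hypothetical interface). The parent fails only if all children fail.
    """
    errors = []
    for attempt in children:
        try:
            return attempt()
        except OSError as exc:  # includes ConnectionError and TimeoutError
            errors.append(exc)
    raise ConnectionError(f"all {len(errors)} children failed")


def via_proxy():
    raise ConnectionError("proxy is down")


def direct():
    return "direct connection"


result = failover([via_proxy, direct])
```

Because the direct path is only attempted after the proxy attempt has failed, a working proxy that is preferred by policy is never bypassed.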
4.3. Completing Establishment 4.4. Completing Establishment
The process of connection establishment completes when one leaf node The process of connection establishment completes when one leaf node
of the tree has completed negotiation with the remote endpoint of the tree has successfully completed negotiation with the Remote
successfully, or else all nodes of the tree have failed to connect. Endpoint, or else all nodes of the tree have failed to connect. The
The first leaf node to complete its connection is then used by the first leaf node to complete its connection is then used by the
application to send and receive data. application to send and receive data.
Successes and failures of a given attempt should be reported up to Successes and failures of a given attempt should be reported up to
parent nodes (towards the trunk of the tree). For example, in the parent nodes (towards the trunk of the tree). For example, in the
following case, if 1.1.1 fails to connect, it reports the failure to following case, if 1.1.1 fails to connect, it reports the failure to
1.1. Since 1.1 has no other child nodes, it also has failed and 1.1. Since 1.1 has no other child nodes, it also has failed and
reports that failure to 1. Because 1.2 has not yet failed, 1 is not reports that failure to 1. Because 1.2 has not yet failed, 1 is not
considered to have failed. Since 1.2 has not yet started, it is considered to have failed. Since 1.2 has not yet started, it is
started and the process continues. Similarly, if 1.1.1 successfully started and the process continues. Similarly, if 1.1.1 successfully
connects, then it marks 1.1 as connected, which propagates to the connects, then it marks 1.1 as connected, which propagates to the
skipping to change at page 19, line 22 skipping to change at page 20, line 16
attempts should be made ineligible for use by the application for the attempts should be made ineligible for use by the application for the
original request. New connection attempts that involve transmitting original request. New connection attempts that involve transmitting
data on the network ought not to be started after another leaf node data on the network ought not to be started after another leaf node
has already successfully completed, because the connection as a whole has already successfully completed, because the connection as a whole
has now been established. An implementation may choose to let has now been established. An implementation may choose to let
certain handshakes and negotiations complete in order to gather certain handshakes and negotiations complete in order to gather
metrics to influence future connections. Keeping additional metrics to influence future connections. Keeping additional
connections is generally not recommended since those attempts were connections is generally not recommended since those attempts were
slower to connect and may exhibit less desirable properties. slower to connect and may exhibit less desirable properties.
4.3.1. Determining Successful Establishment 4.4.1. Determining Successful Establishment
Implementations may select the criteria by which a leaf node is Implementations may select the criteria by which a leaf node is
considered to be successfully connected differently on a per-protocol considered to be successfully connected differently on a per-protocol
basis. If the only protocol being used is a transport protocol with basis. If the only protocol being used is a transport protocol with
a clear handshake, like TCP, then the obvious choice is to declare a clear handshake, like TCP, then the obvious choice is to declare
that node "connected" when the last packet of the three-way handshake that node "connected" when the last packet of the three-way handshake
has been received. If the only protocol being used is a has been received. If the only protocol being used is a
connectionless protocol, like UDP, the implementation may consider connectionless protocol, like UDP, the implementation may consider
the node fully "connected" the moment it determines a route is the node fully "connected" the moment it determines a route is
present, before sending any packets on the network, see further present, before sending any packets on the network, see further
Section 4.5. Section 4.6.
For protocol stacks with multiple handshakes, the decision becomes For protocol stacks with multiple handshakes, the decision becomes
more nuanced. If the protocol stack involves both TLS and TCP, an more nuanced. If the protocol stack involves both TLS and TCP, an
implementation could determine that a leaf node is connected after implementation could determine that a leaf node is connected after
the TCP handshake is complete, or it can wait for the TLS handshake the TCP handshake is complete, or it can wait for the TLS handshake
to complete as well. The benefit of declaring completion when the to complete as well. The benefit of declaring completion when the
TCP handshake finishes, and thus stopping the race for other branches TCP handshake finishes, and thus stopping the race for other branches
of the tree, is reduced burden on the network and remote endpoints of the tree, is reduced burden on the network and Remote Endpoints
from further connection attempts that are likely to be abandoned. On from further connection attempts that are likely to be abandoned. On
the other hand, by waiting until the TLS handshake is complete, an the other hand, by waiting until the TLS handshake is complete, an
implementation avoids the scenario in which a TCP handshake completes implementation avoids the scenario in which a TCP handshake completes
quickly, but TLS negotiation is either very slow or fails altogether quickly, but TLS negotiation is either very slow or fails altogether
in particular network conditions or to a particular endpoint. To in particular network conditions or to a particular endpoint. To
avoid the issue of TLS possibly failing, the implementation should avoid the issue of TLS possibly failing, the implementation should
not generate a Ready event for the Connection until TLS is not generate a Ready event for the Connection until TLS is
established. established.
If all of the leaf nodes fail to connect during racing, i.e. none of If all of the leaf nodes fail to connect during racing, i.e. none of
the configurations that satisfy all requirements given in the the configurations that satisfy all requirements given in the
Transport Properties actually work over the available paths, then the Transport Properties actually work over the available paths, then the
transport system should notify the application with an InitiateError transport system should notify the application with an InitiateError
event. An InitiateError event should also be generated in case the event. An InitiateError event should also be generated in case the
transport system finds no usable candidates to race. transport system finds no usable candidates to race.
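The per-stack criteria above, together with the InitiateError rule, can be sketched as follows. This is a minimal illustration, not part of either draft revision: the names LeafState, is_connected and race_outcome are hypothetical, and the sketch assumes the conservative choice of waiting for TLS before treating a TLS-over-TCP leaf node as connected.

```python
from enum import Enum, auto


class LeafState(Enum):
    """Hypothetical progress states of one leaf node in the racing tree."""
    ROUTE_AVAILABLE = auto()   # a route exists; no packet has been sent
    TCP_ESTABLISHED = auto()   # three-way handshake has completed
    TLS_ESTABLISHED = auto()   # TLS handshake has completed
    FAILED = auto()


def is_connected(stack, state):
    """Return True if a leaf node using `stack` counts as connected.

    `stack` is a tuple of protocol names, top-most protocol first.
    """
    if state is LeafState.FAILED:
        return False
    if stack == ("UDP",):
        # Connectionless: connected as soon as a route is known.
        return state is LeafState.ROUTE_AVAILABLE
    if stack == ("TCP",):
        return state is LeafState.TCP_ESTABLISHED
    if stack == ("TLS", "TCP"):
        # Assumed policy: wait for TLS, so that Ready is never signalled
        # for a Connection whose TLS negotiation fails afterwards.
        return state is LeafState.TLS_ESTABLISHED
    raise ValueError(f"no criterion defined for stack {stack!r}")


def race_outcome(leaves):
    """Map a list of (stack, state) pairs to the event to deliver.

    Returns ("Ready", stack) for the first connected leaf in list order,
    ("InitiateError", None) if there are no candidates or all have
    failed, and ("Pending", None) while attempts are still in progress.
    """
    if not leaves:
        return ("InitiateError", None)
    for stack, state in leaves:
        if is_connected(stack, state):
            return ("Ready", stack)
    if all(state is LeafState.FAILED for _, state in leaves):
        return ("InitiateError", None)
    return ("Pending", None)
```

An implementation that prefers to stop the race earlier could instead treat TCP_ESTABLISHED as sufficient for stopping other branches while still withholding the Ready event until TLS completes.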
4.4. Establishing multiplexed connections 4.5. Establishing multiplexed connections
Multiplexing several Connections over a single underlying transport Multiplexing several Connections over a single underlying transport
connection requires that the Connections to be multiplexed belong to connection requires that the Connections to be multiplexed belong to
the same Connection Group (as is indicated by the application using the same Connection Group (as is indicated by the application using
the Clone call). When the underlying transport connection supports the Clone call). When the underlying transport connection supports
multi-streaming, the Transport System can map each Connection in the multi-streaming, the Transport System can map each Connection in the
Connection Group to a different stream. Thus, when the Connections Connection Group to a different stream. Thus, when the Connections
that are offered to an application by the Transport System are that are offered to an application by the Transport System are
multiplexed, the Transport System may implement the establishment of multiplexed, the Transport System may implement the establishment of
a new Connection by simply beginning to use a new stream of an a new Connection by simply beginning to use a new stream of an
skipping to change at page 20, line 34 skipping to change at page 21, line 27
connection establishment procedure. This, then, also means that connection establishment procedure. This, then, also means that
there may not be any "establishment" message (like a TCP SYN), but there may not be any "establishment" message (like a TCP SYN), but
the application can simply start sending or receiving. Therefore, the application can simply start sending or receiving. Therefore,
when the Initiate action of a Transport System is called without when the Initiate action of a Transport System is called without
Messages being handed over, it cannot be guaranteed that the other Messages being handed over, it cannot be guaranteed that the other
endpoint will have any way to know about this, and hence a passive endpoint will have any way to know about this, and hence a passive
endpoint's ConnectionReceived event may not be called upon an active endpoint's ConnectionReceived event may not be called upon an active
endpoint's Initiate. Instead, calling the ConnectionReceived event endpoint's Initiate. Instead, calling the ConnectionReceived event
may be delayed until the first Message arrives. may be delayed until the first Message arrives.
4.5. Handling connectionless protocols 4.6. Handling connectionless protocols
While protocols that use an explicit handshake to validate a While protocols that use an explicit handshake to validate a
Connection to a peer can be used for racing multiple establishment Connection to a peer can be used for racing multiple establishment
attempts in parallel, connectionless protocols such as raw UDP do not attempts in parallel, connectionless protocols such as raw UDP do not
offer a way to validate the presence of a peer or the usability of a offer a way to validate the presence of a peer or the usability of a
Connection without application feedback. An implementation should Connection without application feedback. An implementation should
consider such a protocol stack to be established as soon as the consider such a protocol stack to be established as soon as the
Transport Services system has selected a path on which to send data. Transport Services system has selected a path on which to send data.
However, if a peer is not reachable over the network using the However, if a peer is not reachable over the network using the
connectionless protocol, or data cannot be exchanged for any other connectionless protocol, or data cannot be exchanged for any other
reason, the application may want to attempt using another candidate reason, the application may want to attempt using another candidate
Protocol Stack. The implementation should maintain the list of other Protocol Stack. The implementation should maintain the list of other
candidate Protocol Stacks that were eligible to use. candidate Protocol Stacks that were eligible to use.
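One way to retain the other eligible candidates is sketched below. The class name CandidateStacks and the string labels for Protocol Stacks are hypothetical; how the application signals that the current stack is unusable is left open by the text and is not modelled here.

```python
class CandidateStacks:
    """Keeps the eligible Protocol Stacks for one connectionless Connection.

    The first stack is considered established immediately; the others are
    retained so that the application can later ask for an alternative.
    """

    def __init__(self, eligible_stacks):
        if not eligible_stacks:
            raise ValueError("at least one eligible Protocol Stack is required")
        self._remaining = list(eligible_stacks)
        self.current = self._remaining.pop(0)

    def fall_back(self):
        """Switch to the next candidate; return it, or None if none is left."""
        if not self._remaining:
            return None
        self.current = self._remaining.pop(0)
        return self.current
```

When fall_back returns None, the current stack is left unchanged, so the caller can decide whether to keep using it or to report an error.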
4.6. Implementing listeners 4.7. Implementing listeners
When an implementation is asked to Listen, it registers with the When an implementation is asked to Listen, it registers with the
system to wait for incoming traffic to the Local Endpoint. If no system to wait for incoming traffic to the Local Endpoint. If no
Local Endpoint is specified, the implementation should use an Local Endpoint is specified, the implementation should use an
ephemeral port. ephemeral port.
If the Selection Properties do not require a single network interface If the Selection Properties do not require a single network interface
or path, but allow the use of multiple paths, the Listener object or path, but allow the use of multiple paths, the Listener object
should register for incoming traffic on all of the network interfaces should register for incoming traffic on all of the network interfaces
or paths that conform to the Properties. The set of available paths or paths that conform to the Properties. The set of available paths
can change over time, so the implementation should monitor network can change over time, so the implementation should monitor network
path changes, and change the registration of the Listener across all path changes, and change the registration of the Listener across all
usable paths as appropriate. When using multiple paths, the Listener usable paths as appropriate. When using multiple paths, the Listener
is generally expected to use the same port for listening on each. is generally expected to use the same port for listening on each.
If the Selection Properties allow multiple protocols to be used for If the Selection Properties allow multiple protocols to be used for
listening, and the implementation supports it, the Listener object listening, and the implementation supports it, the Listener object
should support receiving inbound connections for each eligible should support receiving inbound connections for each eligible
protocol on each eligible path. protocol on each eligible path.
4.6.1. Implementing listeners for Connected Protocols 4.7.1. Implementing listeners for Connected Protocols
Connected protocols such as TCP and TLS-over-TCP have a strong Connected protocols such as TCP and TLS-over-TCP have a strong
mapping between the Local and Remote Endpoints (four-tuple) and their mapping between the Local and Remote Endpoints (four-tuple) and their
protocol connection state. These map into Connection objects. protocol connection state. These map into Connection objects.
Whenever a new inbound handshake is being started, the Listener Whenever a new inbound handshake is being started, the Listener
should generate a new Connection object and pass it to the should generate a new Connection object and pass it to the
application. application.
4.6.2. Implementing listeners for Connectionless Protocols 4.7.2. Implementing listeners for Connectionless Protocols
Connectionless protocols such as UDP and UDP-lite generally do not Connectionless protocols such as UDP and UDP-lite generally do not
provide the same mechanisms that connected protocols do to offer provide the same mechanisms that connected protocols do to offer
Connection objects. Implementations should wait for incoming packets Connection objects. Implementations should wait for incoming packets
for connectionless protocols on a listening port and should perform for connectionless protocols on a listening port and should perform
four-tuple matching of packets to either existing Connection objects four-tuple matching of packets to either existing Connection objects
or the creation of new Connection objects. On platforms with or the creation of new Connection objects. On platforms with
facilities to create a "virtual connection" for connectionless facilities to create a "virtual connection" for connectionless
protocols, implementations should use these mechanisms to minimise the protocols, implementations should use these mechanisms to minimise the
handling of datagrams intended for already created Connection handling of datagrams intended for already created Connection
objects. objects.
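The four-tuple matching described above can be sketched as a simple demultiplexing table. The class DatagramListener and its attributes are hypothetical, and the sketch works on already-received datagrams rather than on a real socket.

```python
class DatagramListener:
    """Matches inbound datagrams to Connection state by four-tuple."""

    def __init__(self, local_addr, local_port):
        self.local = (local_addr, local_port)
        # Maps four-tuple -> list of payloads received on that "connection".
        self.connections = {}
        # Four-tuples for which a ConnectionReceived event would be raised.
        self.received_events = []

    def on_datagram(self, remote_addr, remote_port, payload):
        """Deliver one datagram; create Connection state on first contact."""
        key = (self.local[0], self.local[1], remote_addr, remote_port)
        conn = self.connections.get(key)
        if conn is None:
            # First datagram from this Remote Endpoint: new Connection.
            conn = []
            self.connections[key] = conn
            self.received_events.append(key)
        conn.append(payload)
        return key
```

On platforms that offer a "virtual connection" facility, the lookup for existing four-tuples would typically be handled by the platform, leaving the Listener to see only datagrams from new Remote Endpoints.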
4.6.3. Implementing listeners for Multiplexed Protocols 4.7.3. Implementing listeners for Multiplexed Protocols
Protocols that provide multiplexing of streams into a single four- Protocols that provide multiplexing of streams into a single four-
tuple can listen both for entirely new connections (a new HTTP/2 tuple can listen both for entirely new connections (a new HTTP/2
stream on a new TCP connection, for example) and for new sub- stream on a new TCP connection, for example) and for new sub-
connections (a new HTTP/2 stream on an existing connection). If the connections (a new HTTP/2 stream on an existing connection). If the
abstraction of Connection presented to the application is mapped to abstraction of Connection presented to the application is mapped to
the multiplexed stream, then the Listener should deliver new the multiplexed stream, then the Listener should deliver new
Connection objects in the same way for either case. The Connection objects in the same way for either case. The
implementation should allow the application to introspect the implementation should allow the application to introspect the
Connection Group marked on the Connections to determine the grouping Connection Group marked on the Connections to determine the grouping
skipping to change at page 23, line 28 skipping to change at page 24, line 15
priorities per Message. For example, an implementation of HTTP/2 priorities per Message. For example, an implementation of HTTP/2
could choose to send Messages of different Priority on streams of could choose to send Messages of different Priority on streams of
different priority. different priority.
* Ordered: when this is false, this disables the requirement of in- * Ordered: when this is false, this disables the requirement of in-
order-delivery for protocols that support configurable ordering. order-delivery for protocols that support configurable ordering.
When the protocol stack does not support configurable ordering, When the protocol stack does not support configurable ordering,
this property may be ignored. this property may be ignored.
* Safely Replayable: when this is true, this means that the Message * Safely Replayable: when this is true, this means that the Message
can be used by mechanisms that might transfer it multiple times - can be used by a transport mechanism that might transfer it
e.g., as a result of racing multiple transports or as part of TCP multiple times - e.g., as a result of racing multiple transports
Fast Open. Also, protocols that do not protect against duplicated or as part of TCP Fast Open. Also, protocols that do not protect
messages, such as UDP, can only be used with Messages that are against duplicated messages, such as UDP (when used directly,
Safely Replayable. without a protocol layered atop), can only be used with Messages
that are Safely Replayable. When a transport system is permitted
to replay messages, replay protection could be provided by the
application.
* Final: when this is true, this means that the sender will not send * Final: when this is true, this means that the sender will not send
any further messages. The Connection need not be closed (in case any further messages. The Connection need not be closed (in case
the Protocol Stack supports half-close operation, like TCP). Any the Protocol Stack supports half-close operation, like TCP). Any
messages sent after a Final message will result in a SendError. messages sent after a Final message will result in a SendError.
* Corruption Protection Length: when this is set to any value other * Corruption Protection Length: when this is set to any value other
than "Full Coverage", it sets the minimum protection in protocols than "Full Coverage", it sets the minimum protection in protocols
that allow limiting the checksum length (e.g. UDP-Lite). If the that allow limiting the checksum length (e.g. UDP-Lite). If the
protocol stack does not support checksum length limitation, this protocol stack does not support checksum length limitation, this
skipping to change at page 27, line 38 skipping to change at page 28, line 30
Connection is being torn down. The framer implementation can use the Connection is being torn down. The framer implementation can use the
Connection object to look up specific properties of the Connection or Connection object to look up specific properties of the Connection or
the network being used that may influence how to frame Messages. the network being used that may influence how to frame Messages.
MessageFramer -> Start(Connection) MessageFramer -> Start(Connection)
MessageFramer -> Stop(Connection) MessageFramer -> Stop(Connection)
When a Message Framer generates a "Start" event, the framer When a Message Framer generates a "Start" event, the framer
implementation has the opportunity to start writing some data prior implementation has the opportunity to start writing some data prior
to the Connection delivering its "Ready" event. This allows the to the Connection delivering its "Ready" event. This allows the
implementation to communicate control data to the remote endpoint implementation to communicate control data to the Remote Endpoint
that can be used to parse Messages. that can be used to parse Messages.
MessageFramer.MakeConnectionReady(Connection) MessageFramer.MakeConnectionReady(Connection)
Similarly, when a Message Framer generates a "Stop" event, the framer Similarly, when a Message Framer generates a "Stop" event, the framer
implementation has the opportunity to write some final data or clear implementation has the opportunity to write some final data or clear
up its local state before the "Closed" event is delivered to the up its local state before the "Closed" event is delivered to the
Application. The framer implementation can indicate that it has Application. The framer implementation can indicate that it has
finished with this. finished with this.
skipping to change at page 32, line 19 skipping to change at page 33, line 19
the "multipath-policy" Connection Property choices made by the the "multipath-policy" Connection Property choices made by the
application. A protocol can then establish new subflows over new application. A protocol can then establish new subflows over new
paths while an active path is still available or, if migration is paths while an active path is still available or, if migration is
supported, also after a break has been detected, and should attempt supported, also after a break has been detected, and should attempt
to tear down subflows over paths that are no longer used. The to tear down subflows over paths that are no longer used. The
Transport Services API's Connection Property "multipath-policy" Transport Services API's Connection Property "multipath-policy"
allows an application to indicate when and how different paths should allows an application to indicate when and how different paths should
be used. However, detailed handling of these policies is still be used. However, detailed handling of these policies is still
implementation-specific. For example, if the "multipath" Selection implementation-specific. For example, if the "multipath" Selection
Property is set to "active", the decision about when to create a new Property is set to "active", the decision about when to create a new
path or to announce a new path or set of paths to the remote path or to announce a new path or set of paths to the Remote
endpoint, e.g., in the form of additional IP addresses, is Endpoint, e.g., in the form of additional IP addresses, is
implementation-specific. If the Protocol Stack includes a transport implementation-specific. If the Protocol Stack includes a transport
protocol that does not support multipath, but does support migrating protocol that does not support multipath, but does support migrating
between paths, the update to the set of available paths can trigger between paths, the update to the set of available paths can trigger
the connection to be migrated. the connection to be migrated.
In case of Pooled Connections (Section 7.1), the Transport Services In case of Pooled Connections (Section 7.1), the Transport Services
implementation may add connections over new paths to the pool if implementation may add connections over new paths to the pool if
permissible based on the multipath policy and Selection Properties. permissible based on the multipath policy and Selection Properties.
In case a previously used path becomes unavailable, the transport In case a previously used path becomes unavailable, the transport
system may disconnect all connections that require this path, but system may disconnect all connections that require this path, but
skipping to change at page 33, line 12 skipping to change at page 34, line 12
all supported protocols. Hence, as is common with all reliable all supported protocols. Hence, as is common with all reliable
transport protocols, after a Close action, the application can expect transport protocols, after a Close action, the application can expect
to have its reliability requirements honored regarding the data it to have its reliability requirements honored regarding the data it
has given to the Transport System, but it cannot expect to be able to has given to the Transport System, but it cannot expect to be able to
read any more data after calling Close. read any more data after calling Close.
Abort differs from Close only in that no guarantees are given Abort differs from Close only in that no guarantees are given
regarding data that the application has handed over to the Transport regarding data that the application has handed over to the Transport
System before calling Abort. System before calling Abort.
As explained in Section 4.4, when a new stream is multiplexed on an As explained in Section 4.5, when a new stream is multiplexed on an
already existing connection of a Transport Protocol Instance, there already existing connection of a Transport Protocol Instance, there
is no need for a connection establishment procedure. Because the is no need for a connection establishment procedure. Because the
Connections that are offered by the Transport System can be Connections that are offered by the Transport System can be
implemented as streams that are multiplexed on a transport protocol's implemented as streams that are multiplexed on a transport protocol's
connection, it can therefore not be guaranteed that one Endpoint's connection, it can therefore not be guaranteed that one Endpoint's
Initiate action provokes a ConnectionReceived event at its peer. Initiate action provokes a ConnectionReceived event at its peer.
For Close (provoking a Finished event) and Abort (provoking a For Close (provoking a Finished event) and Abort (provoking a
ConnectionError event), the same logic applies: while it is desirable ConnectionError event), the same logic applies: while it is desirable
to be informed when a peer closes or aborts a Connection, whether to be informed when a peer closes or aborts a Connection, whether
skipping to change at page 35, line 20 skipping to change at page 36, line 20
for instance be signal strength information reported by radio modems for instance be signal strength information reported by radio modems
like Wi-Fi and mobile broadband or information about the battery- like Wi-Fi and mobile broadband or information about the battery-
level of the device. Furthermore, the system may cache the observed level of the device. Furthermore, the system may cache the observed
maximum throughput on a path as an estimate of the available maximum throughput on a path as an estimate of the available
bandwidth. bandwidth.
An implementation should use this information, when possible, to An implementation should use this information, when possible, to
influence preference between candidate paths, endpoints, and protocol influence preference between candidate paths, endpoints, and protocol
options. Eligible options that historically had significantly better options. Eligible options that historically had significantly better
performance than others should be selected first when gathering performance than others should be selected first when gathering
candidates (see Section 4.1) to ensure better performance for the candidates (see Section 4.2) to ensure better performance for the
application. application.
The reasonable lifetime for cached performance values will vary The reasonable lifetime for cached performance values will vary
depending on the nature of the value. Certain information, like the depending on the nature of the value. Certain information, like the
connection establishment success rate to a Remote Endpoint using a connection establishment success rate to a Remote Endpoint using a
given protocol stack, can be stored for a long period of time (hours given protocol stack, can be stored for a long period of time (hours
or longer), since it is expected that the capabilities of the Remote or longer), since it is expected that the capabilities of the Remote
Endpoint are not changing very quickly. On the other hand, the Round Endpoint are not changing very quickly. On the other hand, the Round
Trip Time observed by TCP over a particular network path may vary Trip Time observed by TCP over a particular network path may vary
over a relatively short time interval. For such values, the over a relatively short time interval. For such values, the
skipping to change at page 40, line 28 skipping to change at page 41, line 28
Connectedness: Connectionless Connectedness: Connectionless
Data Unit: Datagram Data Unit: Datagram
API mappings for Receiving Multicast UDP are as follows: API mappings for Receiving Multicast UDP are as follows:
Connection Object: Established UDP Multicast Receive connections Connection Object: Established UDP Multicast Receive connections
represent a pair of specific IP addresses and ports. The represent a pair of specific IP addresses and ports. The
"unidirectional receive" transport property is required, and the "unidirectional receive" transport property is required, and the
local endpoint must be configured with a group IP address and a Local Endpoint must be configured with a group IP address and a
port. port.
Initiate: Calling "Initiate" on a UDP Multicast Receive Connection Initiate: Calling "Initiate" on a UDP Multicast Receive Connection
causes an immediate InitiateError. This is an unsupported causes an immediate InitiateError. This is an unsupported
operation. operation.
InitiateWithSend: Calling "InitiateWithSend" on a UDP Multicast InitiateWithSend: Calling "InitiateWithSend" on a UDP Multicast
Receive Connection causes an immediate InitiateError. This is an Receive Connection causes an immediate InitiateError. This is an
unsupported operation. unsupported operation.
skipping to change at page 40, line 51 skipping to change at page 41, line 51
InitiateError: UDP Multicast Receive Connections generate an InitiateError: UDP Multicast Receive Connections generate an
InitiateError if Initiate is called. InitiateError if Initiate is called.
ConnectionError: Once in use, UDP throws "soft errors" (ERROR.UDP(- ConnectionError: Once in use, UDP throws "soft errors" (ERROR.UDP(-
Lite)) upon receiving ICMP notifications indicating failures in Lite)) upon receiving ICMP notifications indicating failures in
the network. the network.
Listen: LISTEN.UDP. Calling "Listen" for UDP Multicast Receive Listen: LISTEN.UDP. Calling "Listen" for UDP Multicast Receive
binds a local port, prepares it to receive inbound UDP datagrams binds a local port, prepares it to receive inbound UDP datagrams
from peers, and issues a multicast host join. If a remote from peers, and issues a multicast host join. If a Remote
endpoint with an address is supplied, the join is Source-specific Endpoint with an address is supplied, the join is Source-specific
Multicast, and the path selection is based on the route to the Multicast, and the path selection is based on the route to the
remote endpoint. If a remote endpoint is not supplied, the join Remote Endpoint. If a Remote Endpoint is not supplied, the join
is Any-source Multicast, and the path selection is based on the is Any-source Multicast, and the path selection is based on the
outbound route to the group supplied in the local endpoint. outbound route to the group supplied in the Local Endpoint.
ConnectionReceived: UDP Multicast Receive Listeners will deliver new ConnectionReceived: UDP Multicast Receive Listeners will deliver new
connections once they have received traffic from a new Remote connections once they have received traffic from a new Remote
Endpoint. Endpoint.
Clone: Calling "Clone" on a UDP Multicast Receive Connection creates Clone: Calling "Clone" on a UDP Multicast Receive Connection creates
a new Connection with equivalent parameters. The two Connections a new Connection with equivalent parameters. The two Connections
are otherwise independent. are otherwise independent.
Send: SEND.UDP(-Lite). Calling "Send" on a UDP Multicast Receive Send: SEND.UDP(-Lite). Calling "Send" on a UDP Multicast Receive
skipping to change at page 41, line 42 skipping to change at page 42, line 42
(ABORT.UDP(-Lite)) is identical to calling "Close". (ABORT.UDP(-Lite)) is identical to calling "Close".
10.6. SCTP 10.6. SCTP
Connectedness: Connected Connectedness: Connected
Data Unit: Message Data Unit: Message
API mappings for SCTP are as follows: API mappings for SCTP are as follows:
Connection Object: Connection objects represent a flow of SCTP Connection Object: Connection objects can be mapped to an SCTP
messages between a client and a server, which may be an SCTP association or a stream in an SCTP association. Mapping
association or a stream in a SCTP association. How to map Connection objects to SCTP streams is called "stream mapping" and
Connection objects to streams is described in [NEAT-flow-mapping]; has additional requirements as follows. The following explanation
in the following, a similar method is described. To map assumes a client-server communication model.
Connection objects to SCTP streams without head-of-line blocking
on the sender side, both the sending and receiving SCTP Stream mapping requires an association to already be in place between
implementation must support message interleaving [RFC8260]. Both the client and the server, and it requires the server to understand
SCTP implementations must also support stream reconfiguration. that a new incoming stream should be represented as a new Connection
Finally, both communicating endpoints must be aware of this Object by the Transport Services system. A new SCTP stream is
intended multiplexing; [NEAT-flow-mapping] describes a way for a created by sending an SCTP message with a new stream id. Thus, to
Transport System to negotiate the stream mapping capability using implement stream mapping, the Transport Services system MUST provide
SCTP's adaptation layer indication, such that this functionality a newly created Connection Object to the application upon the
would only take effect if both sides are aware of it. The reception of such a message. The necessary semantics to implement a
first flow, for which the SCTP association has been created, will Transport Services system's Close and Abort primitives are provided
always use stream id zero. All additional flows are assigned to by the stream reconfiguration (reset) procedure described in
unused stream ids in growing order. To avoid a conflict when both [RFC6525]. This also allows re-use of a stream id after resetting
endpoints map new flows simultaneously, the peer which initiated ("closing") the stream. To implement this functionality, SCTP stream
the transport connection will use even stream numbers whereas the reconfiguration [RFC6525] MUST be supported by both the client and
remote side will map its flows to odd stream numbers. Both sides the server side.
maintain a status map of the assigned stream numbers. Generally,
new streams must consume the lowest available (even or odd, To avoid head-of-line blocking, stream mapping SHOULD only be
depending on the side) stream number; this rule is relevant when implemented when both sides support message interleaving [RFC8260].
lower numbers become available because Connection objects This allows a sender to schedule transmissions between multiple
associated to the streams are closed. streams without risking that transmission of a large message on one
stream might block transmissions on other streams for a long time.
To avoid conflicts between stream ids, the following procedure is
recommended: the first Connection, for which the SCTP association has
been created, MUST always use stream id zero. All additional
Connections are assigned to unused stream ids in growing order. To
avoid a conflict when both endpoints map new Connections
simultaneously, the peer which initiated the association MUST use even
stream ids whereas the remote side MUST map its Connections to odd
stream ids. Both sides maintain a status map of the assigned stream
ids. Generally, new streams SHOULD consume the lowest available
(even or odd, depending on the side) stream id; this rule is relevant
when lower ids become available because Connection objects associated
with the streams are closed.
SCTP stream mapping as described here has been implemented in a
research prototype; a description of this implementation is given in
[NEAT-flow-mapping].
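The stream id assignment rule (even ids for the peer that initiated the association, odd ids for the remote side, lowest available id first) can be sketched as follows. The class StreamIdAllocator is hypothetical and covers only locally created Connections; it does not model the negotiated number of streams, ADD_STREAM.SCTP, or the stream reset procedure that must complete before an id is re-used.

```python
class StreamIdAllocator:
    """Assigns SCTP stream ids to locally created Connections."""

    def __init__(self, initiated_association):
        # The initiator of the association uses even ids (starting with
        # stream id zero for the first Connection); the other side uses
        # odd ids.
        self._parity = 0 if initiated_association else 1
        self._in_use = set()

    def allocate(self):
        """Return the lowest unused stream id of this side's parity."""
        stream_id = self._parity
        while stream_id in self._in_use:
            stream_id += 2
        self._in_use.add(stream_id)
        return stream_id

    def release(self, stream_id):
        """Mark an id as free again, e.g. after the stream was reset."""
        self._in_use.remove(stream_id)
```

For example, an initiator that has allocated ids 0, 2 and 4 and then closes the Connection on stream 2 would assign id 2 to its next Connection, which is the case the "lowest available" rule addresses.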
Initiate: If this is the only Connection object that is assigned to Initiate: If this is the only Connection object that is assigned to
the SCTP association or stream mapping has not been negotiated, the SCTP association or stream mapping is not used, CONNECT.SCTP
CONNECT.SCTP is called. Else, unless the Selection Property is called. Else, unless the Selection Property
"activeReadBeforeSend" is Preferred or Required, a new stream is "activeReadBeforeSend" is Preferred or Required, a new stream is
used: if there are enough streams available, "Initiate" is just a used: if there are enough streams available, "Initiate" is a local
local operation that assigns a new stream number to the Connection operation that assigns a new stream id to the Connection object.
object. The number of streams is negotiated as a parameter of the The number of streams is negotiated as a parameter of the prior
prior CONNECT.SCTP call, and it represents a trade-off between CONNECT.SCTP call, and it represents a trade-off between local
local resource usage and the number of Connection objects that can resource usage and the number of Connection objects that can be
be mapped without requiring a reconfiguration signal. When mapped without requiring a reconfiguration signal. When running
running out of streams, ADD_STREAM.SCTP must be called. out of streams, ADD_STREAM.SCTP must be called.
InitiateWithSend: If this is the only Connection object that is InitiateWithSend: If this is the only Connection object that is
assigned to the SCTP association or stream mapping has not been assigned to the SCTP association or stream mapping is not used,
negotiated, CONNECT.SCTP is called with the "user message" CONNECT.SCTP is called with the "user message" parameter. Else, a
parameter. Else, a new stream is used (see "Initiate" for how to new stream is used (see "Initiate" for how to handle running out
handle running out of streams), and this just sends the first of streams), and this just sends the first message on a new
message on a new stream. stream.
Ready: "Initiate" or "InitiateWithSend" returns without an error, Ready: "Initiate" or "InitiateWithSend" returns without an error,
i.e. SCTP's four-way handshake has completed. If an association i.e. SCTP's four-way handshake has completed. If an association
with the peer already exists, and stream mapping has been with the peer already exists, stream mapping is used and enough
negotiated and enough streams are available, a Connection Object streams are available, a Connection Object instantly becomes Ready
instantly becomes Ready after calling "Initiate" or after calling "Initiate" or "InitiateWithSend".
"InitiateWithSend".
InitiateError: Failure of CONNECT.SCTP. InitiateError: Failure of CONNECT.SCTP.
ConnectionError: TIMEOUT.SCTP or ABORT-EVENT.SCTP. ConnectionError: TIMEOUT.SCTP or ABORT-EVENT.SCTP.
Listen: LISTEN.SCTP. If an association with the peer already exists Listen: LISTEN.SCTP. If an association with the peer already exists
and stream mapping has been negotiated, "Listen" just expects to and stream mapping is used, "Listen" just expects to receive a new
receive a new message on a new stream id (chosen in accordance message with a new stream id (chosen in accordance with the stream
with the stream number assignment procedure described above). id assignment procedure described above).
ConnectionReceived: LISTEN.SCTP returns without an error (a result ConnectionReceived: LISTEN.SCTP returns without an error (a result
of successful CONNECT.SCTP from the peer), or, in case of stream of successful CONNECT.SCTP from the peer), or, in case of stream
mapping, the first message has arrived on a new stream (in this mapping, the first message has arrived on a new stream (in this
case, "Receive" is also invoked). case, "Receive" is also invoked).
Clone: Calling "Clone" on an SCTP association creates a new Clone: Calling "Clone" on an SCTP association creates a new
Connection object and assigns it a new stream number in accordance Connection object and assigns it a new stream id in accordance
with the stream number assignment procedure described above. If with the stream id assignment procedure described above. If there
there are not enough streams available, ADD_STREAM.SCTP must be are not enough streams available, ADD_STREAM.SCTP must be called.
called.
Priority (Connection): When this value is changed, or a Message with Priority (Connection): When this value is changed, or a Message with
Message Property "Priority" is sent, and there are multiple Message Property "Priority" is sent, and there are multiple
Connection objects assigned to the same SCTP association, Connection objects assigned to the same SCTP association,
CONFIGURE_STREAM_SCHEDULER.SCTP is called to adjust the priorities CONFIGURE_STREAM_SCHEDULER.SCTP is called to adjust the priorities
of streams in the SCTP association. of streams in the SCTP association.
Send: SEND.SCTP. Message Properties such as "Lifetime" and Send: SEND.SCTP. Message Properties such as "Lifetime" and
"Ordered" map to parameters of this primitive. "Ordered" map to parameters of this primitive.
skipping to change at page 43, line 41 skipping to change at page 45, line 9
Close: If this is the only Connection object that is assigned to the Close: If this is the only Connection object that is assigned to the
SCTP association, CLOSE.SCTP is called, and the "Closed" event will SCTP association, CLOSE.SCTP is called, and the "Closed" event will
be delivered to the application upon the ensuing CLOSE-EVENT.SCTP. be delivered to the application upon the ensuing CLOSE-EVENT.SCTP.
Else, the Connection object is one out of several Connection objects Else, the Connection object is one out of several Connection objects
that are assigned to the same SCTP association, and RESET_STREAM.SCTP that are assigned to the same SCTP association, and RESET_STREAM.SCTP
must be called, which informs the peer that the stream will no longer must be called, which informs the peer that the stream will no longer
be used for mapping and can be used by future "Initiate", be used for mapping and can be used by future "Initiate",
"InitiateWithSend" or "Listen" calls. At the peer, the event "InitiateWithSend" or "Listen" calls. At the peer, the event
RESET_STREAM-EVENT.SCTP will fire, which the peer must answer by RESET_STREAM-EVENT.SCTP will fire, which the peer must answer by
issuing RESET_STREAM.SCTP too. The resulting local RESET_STREAM- issuing RESET_STREAM.SCTP too. The resulting local RESET_STREAM-
EVENT.SCTP informs the transport system that the stream number can EVENT.SCTP informs the Transport Services system that the stream id
now be re-used by the next "Initiate", "InitiateWithSend" or "Listen" can now be re-used by the next "Initiate", "InitiateWithSend" or
calls, and invokes a "Closed" event towards the application. "Listen" calls, and invokes a "Closed" event towards the application.
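
The choice of SCTP primitive for "Initiate" and "Close" described above can be summarized in a short sketch. The function names and boolean parameters are hypothetical and only illustrate the decision logic; the "activeReadBeforeSend" exception and the peer's RESET_STREAM handshake are left out.

```python
def initiate_primitives(is_only_connection, stream_mapping_used,
                        streams_available):
    """Return the SCTP primitives that "Initiate" would call (sketch)."""
    if is_only_connection or not stream_mapping_used:
        return ["CONNECT.SCTP"]
    if streams_available:
        # Purely local: a new stream id is assigned to the Connection.
        return []
    # Running out of streams requires a reconfiguration signal.
    return ["ADD_STREAM.SCTP"]


def close_primitive(is_only_connection):
    """Return the SCTP primitive that "Close" would call (sketch)."""
    if is_only_connection:
        return "CLOSE.SCTP"
    # Other Connection objects still share the association, so only
    # the stream is reset and its id is freed for later re-use.
    return "RESET_STREAM.SCTP"
```

An empty list from initiate_primitives corresponds to the case in which a Connection object becomes Ready immediately, since no SCTP primitive has to be called.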
Abort: If this is the only Connection object that is assigned to the Abort: If this is the only Connection object that is assigned to the
SCTP association, ABORT.SCTP is called. Else, the Connection object SCTP association, ABORT.SCTP is called. Else, the Connection object
is one out of several Connection objects that are assigned to the is one out of several Connection objects that are assigned to the
same SCTP association, and shutdown proceeds as described under same SCTP association, and shutdown proceeds as described under
"Close". "Close".
11. IANA Considerations 11. IANA Considerations
RFC-EDITOR: Please remove this section before publication. RFC-EDITOR: Please remove this section before publication.
skipping to change at page 45, line 31 skipping to change at page 46, line 42
Thanks to Stuart Cheshire, Josh Graessley, David Schinazi, and Eric Thanks to Stuart Cheshire, Josh Graessley, David Schinazi, and Eric
Kinnear for their implementation and design efforts, including Happy Kinnear for their implementation and design efforts, including Happy
Eyeballs, that heavily influenced this work. Eyeballs, that heavily influenced this work.
14. References 14. References
14.1. Normative References 14.1. Normative References
[I-D.ietf-taps-arch] [I-D.ietf-taps-arch]
Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G., Pauly, T., Trammell, B., Brunstrom, A., Fairhurst, G.,
Perkins, C., Tiesel, P., and C. Wood, "An Architecture for Perkins, C., Tiesel, P. S., and C. A. Wood, "An
Transport Services", Work in Progress, Internet-Draft, Architecture for Transport Services", Work in Progress,
draft-ietf-taps-arch-09, 2 November 2020, Internet-Draft, draft-ietf-taps-arch-10, 30 April 2021,
<https://www.ietf.org/internet-drafts/draft-ietf-taps- <https://www.ietf.org/archive/id/draft-ietf-taps-arch-
arch-09.txt>. 10.txt>.
[I-D.ietf-taps-interface] [I-D.ietf-taps-interface]
Trammell, B., Welzl, M., Enghardt, T., Fairhurst, G., Trammell, B., Welzl, M., Enghardt, T., Fairhurst, G.,
Kuehlewind, M., Perkins, C., Tiesel, P., Wood, C., Pauly, Kuehlewind, M., Perkins, C., Tiesel, P. S., Wood, C. A.,
T., and K. Rose, "An Abstract Application Layer Interface Pauly, T., and K. Rose, "An Abstract Application Layer
to Transport Services", Work in Progress, Internet-Draft, Interface to Transport Services", Work in Progress,
draft-ietf-taps-interface-12, 9 April 2021, Internet-Draft, draft-ietf-taps-interface-12, 9 April
<https://www.ietf.org/internet-drafts/draft-ietf-taps- 2021, <https://www.ietf.org/archive/id/draft-ietf-taps-
interface-12.txt>. interface-12.txt>.
[RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP [RFC7413] Cheng, Y., Chu, J., Radhakrishnan, S., and A. Jain, "TCP
Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014, Fast Open", RFC 7413, DOI 10.17487/RFC7413, December 2014,
<https://www.rfc-editor.org/info/rfc7413>. <https://www.rfc-editor.org/info/rfc7413>.
[RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext [RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
Transfer Protocol Version 2 (HTTP/2)", RFC 7540, Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
DOI 10.17487/RFC7540, May 2015, DOI 10.17487/RFC7540, May 2015,
<https://www.rfc-editor.org/info/rfc7540>. <https://www.rfc-editor.org/info/rfc7540>.
[RFC8260] Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann,
"Stream Schedulers and User Message Interleaving for the
Stream Control Transmission Protocol", RFC 8260,
DOI 10.17487/RFC8260, November 2017,
<https://www.rfc-editor.org/info/rfc8260>.
[RFC8303] Welzl, M., Tuexen, M., and N. Khademi, "On the Usage of [RFC8303] Welzl, M., Tuexen, M., and N. Khademi, "On the Usage of
Transport Features Provided by IETF Transport Protocols", Transport Features Provided by IETF Transport Protocols",
RFC 8303, DOI 10.17487/RFC8303, February 2018, RFC 8303, DOI 10.17487/RFC8303, February 2018,
<https://www.rfc-editor.org/info/rfc8303>. <https://www.rfc-editor.org/info/rfc8303>.
[RFC8304] Fairhurst, G. and T. Jones, "Transport Features of the [RFC8304] Fairhurst, G. and T. Jones, "Transport Features of the
User Datagram Protocol (UDP) and Lightweight UDP (UDP- User Datagram Protocol (UDP) and Lightweight UDP (UDP-
Lite)", RFC 8304, DOI 10.17487/RFC8304, February 2018, Lite)", RFC 8304, DOI 10.17487/RFC8304, February 2018,
<https://www.rfc-editor.org/info/rfc8304>. <https://www.rfc-editor.org/info/rfc8304>.
skipping to change at page 46, line 51 skipping to change at page 48, line 5
[RFC8923] Welzl, M. and S. Gjessing, "A Minimal Set of Transport [RFC8923] Welzl, M. and S. Gjessing, "A Minimal Set of Transport
Services for End Systems", RFC 8923, DOI 10.17487/RFC8923, Services for End Systems", RFC 8923, DOI 10.17487/RFC8923,
October 2020, <https://www.rfc-editor.org/info/rfc8923>. October 2020, <https://www.rfc-editor.org/info/rfc8923>.
14.2. Informative References 14.2. Informative References
[I-D.ietf-quic-transport] [I-D.ietf-quic-transport]
Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed Iyengar, J. and M. Thomson, "QUIC: A UDP-Based Multiplexed
and Secure Transport", Work in Progress, Internet-Draft, and Secure Transport", Work in Progress, Internet-Draft,
draft-ietf-quic-transport-34, 14 January 2021, draft-ietf-quic-transport-34, 14 January 2021,
<https://www.ietf.org/internet-drafts/draft-ietf-quic- <https://www.ietf.org/archive/id/draft-ietf-quic-
transport-34.txt>. transport-34.txt>.
[I-D.ietf-tcpm-2140bis] [I-D.ietf-tcpm-2140bis]
Touch, J., Welzl, M., and S. Islam, "TCP Control Block Touch, J., Welzl, M., and S. Islam, "TCP Control Block
Interdependence", Work in Progress, Internet-Draft, draft- Interdependence", Work in Progress, Internet-Draft, draft-
ietf-tcpm-2140bis-11, 12 April 2021, ietf-tcpm-2140bis-11, 12 April 2021,
<https://www.ietf.org/internet-drafts/draft-ietf-tcpm- <https://www.ietf.org/archive/id/draft-ietf-tcpm-2140bis-
2140bis-11.txt>. 11.txt>.
[NEAT-flow-mapping] [NEAT-flow-mapping]
"Transparent Flow Mapping for NEAT", Workshop on Future of "Transparent Flow Mapping for NEAT", IFIP NETWORKING 2017
Internet Transport (FIT 2017) , 2017. Workshop on Future of Internet Transport (FIT 2017) ,
2017.
[RFC1928] Leech, M., Ganis, M., Lee, Y., Kuris, R., Koblas, D., and [RFC1928] Leech, M., Ganis, M., Lee, Y., Kuris, R., Koblas, D., and
L. Jones, "SOCKS Protocol Version 5", RFC 1928, L. Jones, "SOCKS Protocol Version 5", RFC 1928,
DOI 10.17487/RFC1928, March 1996, DOI 10.17487/RFC1928, March 1996,
<https://www.rfc-editor.org/info/rfc1928>. <https://www.rfc-editor.org/info/rfc1928>.
[RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager",
RFC 3124, DOI 10.17487/RFC3124, June 2001, RFC 3124, DOI 10.17487/RFC3124, June 2001,
<https://www.rfc-editor.org/info/rfc3124>. <https://www.rfc-editor.org/info/rfc3124>.
skipping to change at page 47, line 40 skipping to change at page 48, line 44
"Session Traversal Utilities for NAT (STUN)", RFC 5389, "Session Traversal Utilities for NAT (STUN)", RFC 5389,
DOI 10.17487/RFC5389, October 2008, DOI 10.17487/RFC5389, October 2008,
<https://www.rfc-editor.org/info/rfc5389>. <https://www.rfc-editor.org/info/rfc5389>.
[RFC5766] Mahy, R., Matthews, P., and J. Rosenberg, "Traversal Using [RFC5766] Mahy, R., Matthews, P., and J. Rosenberg, "Traversal Using
Relays around NAT (TURN): Relay Extensions to Session Relays around NAT (TURN): Relay Extensions to Session
Traversal Utilities for NAT (STUN)", RFC 5766, Traversal Utilities for NAT (STUN)", RFC 5766,
DOI 10.17487/RFC5766, April 2010, DOI 10.17487/RFC5766, April 2010,
<https://www.rfc-editor.org/info/rfc5766>. <https://www.rfc-editor.org/info/rfc5766>.
[RFC6525] Stewart, R., Tuexen, M., and P. Lei, "Stream Control
Transmission Protocol (SCTP) Stream Reconfiguration",
RFC 6525, DOI 10.17487/RFC6525, February 2012,
<https://www.rfc-editor.org/info/rfc6525>.
[RFC6762] Cheshire, S. and M. Krochmal, "Multicast DNS", RFC 6762, [RFC6762] Cheshire, S. and M. Krochmal, "Multicast DNS", RFC 6762,
DOI 10.17487/RFC6762, February 2013, DOI 10.17487/RFC6762, February 2013,
<https://www.rfc-editor.org/info/rfc6762>. <https://www.rfc-editor.org/info/rfc6762>.
[RFC6763] Cheshire, S. and M. Krochmal, "DNS-Based Service [RFC6763] Cheshire, S. and M. Krochmal, "DNS-Based Service
Discovery", RFC 6763, DOI 10.17487/RFC6763, February 2013, Discovery", RFC 6763, DOI 10.17487/RFC6763, February 2013,
<https://www.rfc-editor.org/info/rfc6763>. <https://www.rfc-editor.org/info/rfc6763>.
[RFC7230] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer [RFC7230] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer
Protocol (HTTP/1.1): Message Syntax and Routing", Protocol (HTTP/1.1): Message Syntax and Routing",
RFC 7230, DOI 10.17487/RFC7230, June 2014, RFC 7230, DOI 10.17487/RFC7230, June 2014,
<https://www.rfc-editor.org/info/rfc7230>. <https://www.rfc-editor.org/info/rfc7230>.
[RFC7657] Black, D., Ed. and P. Jones, "Differentiated Services [RFC7657] Black, D., Ed. and P. Jones, "Differentiated Services
(Diffserv) and Real-Time Communication", RFC 7657, (Diffserv) and Real-Time Communication", RFC 7657,
DOI 10.17487/RFC7657, November 2015, DOI 10.17487/RFC7657, November 2015,
<https://www.rfc-editor.org/info/rfc7657>. <https://www.rfc-editor.org/info/rfc7657>.
[RFC8260] Stewart, R., Tuexen, M., Loreto, S., and R. Seggelmann,
"Stream Schedulers and User Message Interleaving for the
Stream Control Transmission Protocol", RFC 8260,
DOI 10.17487/RFC8260, November 2017,
<https://www.rfc-editor.org/info/rfc8260>.
[RFC8445] Keranen, A., Holmberg, C., and J. Rosenberg, "Interactive [RFC8445] Keranen, A., Holmberg, C., and J. Rosenberg, "Interactive
Connectivity Establishment (ICE): A Protocol for Network Connectivity Establishment (ICE): A Protocol for Network
Address Translator (NAT) Traversal", RFC 8445, Address Translator (NAT) Traversal", RFC 8445,
DOI 10.17487/RFC8445, July 2018, DOI 10.17487/RFC8445, July 2018,
<https://www.rfc-editor.org/info/rfc8445>. <https://www.rfc-editor.org/info/rfc8445>.
[TCP-COUPLING] [TCP-COUPLING]
"ctrlTCP: Reducing Latency through Coupled, Heterogeneous "ctrlTCP: Reducing Latency through Coupled, Heterogeneous
Multi-Flow TCP Congestion Control", IEEE INFOCOM Global Multi-Flow TCP Congestion Control", IEEE INFOCOM Global
Internet Symposium (GI) workshop (GI 2018), n.d. Internet Symposium (GI) workshop (GI 2018), n.d.
skipping to change at page 49, line 21 skipping to change at page 50, line 37
protocol and/or path selection, or the transmission of messages given protocol and/or path selection, or the transmission of messages given
a Protocol Stack that implements them. These are not part of the a Protocol Stack that implements them. These are not part of the
interface, and may be removed from the final document, but are interface, and may be removed from the final document, but are
presented here to support discussion within the TAPS working group as presented here to support discussion within the TAPS working group as
to whether they should be added to a future revision of the base to whether they should be added to a future revision of the base
specification. specification.
B.1. Properties Affecting Sorting of Branches B.1. Properties Affecting Sorting of Branches
In addition to the Protocol and Path Selection Properties discussed In addition to the Protocol and Path Selection Properties discussed
in Section 4.1.5, the following properties under discussion can in Section 4.1.3, the following properties under discussion can
influence branch sorting: influence branch sorting:
* Bounds on Send or Receive Rate: If the application indicates a * Bounds on Send or Receive Rate: If the application indicates a
bound on the expected Send or Receive bitrate, an implementation bound on the expected Send or Receive bitrate, an implementation
may prefer a path that can likely provide the desired bandwidth, may prefer a path that can likely provide the desired bandwidth,
based on cached maximum throughput, see Section 9.2. The based on cached maximum throughput, see Section 9.2. The
application may know the Send or Receive Bitrate from metadata in application may know the Send or Receive Bitrate from metadata in
adaptive HTTP streaming, such as MPEG-DASH. adaptive HTTP streaming, such as MPEG-DASH.
* Cost Preferences: If the application indicates a preference to * Cost Preferences: If the application indicates a preference to
skipping to change at page 49, line 47 skipping to change at page 51, line 14
Appendix C. Reasons for errors Appendix C. Reasons for errors
The Transport Services API [I-D.ietf-taps-interface] allows The Transport Services API [I-D.ietf-taps-interface] allows
several generic error types to specify a more detailed reason as to several generic error types to specify a more detailed reason as to
why an error occurred. This appendix lists some of the possible why an error occurred. This appendix lists some of the possible
reasons. reasons.
* InvalidConfiguration: The transport properties and endpoints * InvalidConfiguration: The transport properties and endpoints
provided by the application are either contradictory or provided by the application are either contradictory or
incomplete. Examples include the lack of a remote endpoint on an incomplete. Examples include the lack of a Remote Endpoint on an
active open or using a multicast group address while not active open or using a multicast group address while not
requesting a unidirectional receive. requesting a unidirectional receive.
* NoCandidates: The configuration is valid, but none of the * NoCandidates: The configuration is valid, but none of the
available transport protocols can satisfy the transport properties available transport protocols can satisfy the transport properties
provided by the application. provided by the application.
* ResolutionFailed: The remote or local specifier provided by the * ResolutionFailed: The remote or local specifier provided by the
application cannot be resolved.
 End of changes. 73 change blocks. 
297 lines changed or deleted 346 lines changed or added
