Lecture 2 discusses connection establishment. It begins by reviewing
the operation of TCP and showing how TCP connections are established,
and what factors influence the performance of connection establishment.
It considers the impact of TLS and IPv6 on connection establishment, and
discusses the need for connection racing. And it reviews the idea of
peer-to-peer connection, and the difficulties network address translation
causes for peer-to-peer connection establishment. The lecture concludes
with a brief explanation of how NAT binding discovery, and the ICE
algorithm for peer-to-peer connection establishment, work.
The 1st part of this lecture reviews the operation of TCP.
It outlines the TCP service model, segment format, and
programming model. Then, it discusses how TCP connections
are established and considers the impact of network latency
on TCP connection establishment performance. It concludes
with a review of how this connection establishment latency
affects protocol design.
Slides for part 1
In this lecture, I’ll talk about the
problem of connection establishment in a fragmented
network, and discuss some issues that affect
the performance of TCP connection establishment.
There are five parts to this lecture.
In this first part, I’ll review the
TCP transport protocol and its programming model,
and talk about client-server connection establishment.
In the next part, I’ll discuss the
implications of Transport Layer Security, TLS,
and the use of IPv6 on connection establishment,
and show how these changes affect performance.
Following that, in part three,
I’ll talk about peer-to-peer connections and the impact of
network address translation.
Then, in the last two parts,
I’ll talk about some of the problems
caused by NATs, network address translation devices,
and outline how NAT traversal works.
To begin, I’ll briefly review the Transmission
Control Protocol, TCP. I’ll talk about the
purpose of TCP, then review the TCP
segment format, and the service model TCP
offers to applications.
Then, I’ll discuss how TCP connections
are written using the Berkeley sockets API.
TCP is currently the most widely used
transport protocol in the Internet.
A TCP connection provides a reliable,
ordered, byte stream delivery service
that runs on the best-effort IP network.
Once a TCP connection is established,
an application can write a sequence of bytes
into the socket, representing the connection,
and TCP will deliver those bytes to the receiver,
reliably, and in the correct order.
If any of the IP packets containing TCP
segments are lost, the TCP stack
in the operating system will notice this,
and automatically retransmit the missing segments.
Similarly, if any of the IP packets
are delayed and arrive out of order,
the TCP stack will put the data
back into the correct order before delivering
it to the application.
Finally, TCP will adapt the speed
at which it sends the data
to match the available network capacity.
If the network is busy,
TCP will slow down the rate at which it sends data,
to fairly share the capacity between transmissions.
Similarly, if the network becomes idle,
TCP connections will speed up to use the spare capacity.
This process is known as congestion control,
and TCP implements sophisticated
congestion control algorithms.
We’ll talk more about TCP congestion control in Lecture 6.
Applications using TCP are unaware of retransmissions,
reordering, and congestion control.
They just see a socket,
into which they write a stream of bytes.
Those bytes are then delivered reliably to the receiver.
Internally, those bytes are split into TCP segments.
Each segment has a header added to it,
to identify the data.
The segment is placed inside the data part of an IP packet.
That IP packet is, in turn, put inside the
data part of a link layer frame,
and sent across the network.
The IP layer just sees a sequence
of TCP segments that it must deliver,
and is unaware of their contents.
Equally, TCP gives data segments to the
IP layer to deliver, and is unaware
whether the underlying network is Ethernet,
WiFi, optical fibre, or something else.
The diagram on the slide shows the
format of a TCP segment header,
inside an IPv4 packet.
Data is sent over the link in the order shown,
left-to-right, top-to-bottom, in the
payload part of a link layer frame.
The IP header is sent first,
then the TCP segment header,
then the TCP payload data.
Looking at the TCP segment header,
highlighted in green, we see that it
comprises a number of fields.
A TCP segment starts with the source
and destination port numbers.
The source port number identifies the socket that sent the
segment, while the destination port number identifies
the socket to which it should be delivered.
When establishing a TCP connection, a TCP
server binds to a well-known port that
identifies the type of service it offers.
For example, web servers bind to port 80.
This gives a known destination port
to which clients can connect.
Clients specify the destination port to which they connect,
but usually leave their source port unspecified.
The operating system then chooses
an unused source port for that connection.
All TCP segments sent from the client to the server,
as part of a single connection,
have the same source and destination ports,
and the responses come back with those ports swapped.
Each new connection gets a new source port.
Following the port numbers in the TCP segment header,
come the sequence number and the acknowledgement number.
At the start of a TCP connection,
the sequence number is set to a random initial value.
As data is sent, the TCP sequence
number inserted by the sender increases,
counting the number of bytes of data being sent.
The acknowledgement number indicates the next byte
of data that’s expected by the receiver.
For example, if a TCP segment is
received with sequence number 4,000,
and that segment contains 100 bytes of data,
then the acknowledgement number will be 4,100.
This indicates that the next TCP segment
expected is that with sequence number 4,100.
If the acknowledgement number that comes back
is different from that expected,
it’s a sign that some of the packets have been lost.
TCP will then retransmit the
segments that were in the lost packets.
The data offset field indicates where
the payload data starts in the segment.
That is, it indicates the size of any TCP options
included in the packet.
The reserved bits are not used.
There are then six single-bit flags.
The URG bit indicates whether the urgent pointer is valid.
The ACK bit indicates whether the
acknowledgement number is valid.
The PSH bit indicates that this is the last
segment in a message, and should be pushed
up to the application as soon as possible.
The SYN, synchronise, bit is set on
the first packet sent on a connection.
The FIN bit indicates that the connection
should be cleanly closed.
And the RST bit indicates that the connection is being reset,
aborted without cleanup.
The receive window size allows the receiver
to indicate how much buffer space it
has available to receive new data.
This allows the receiver to tell the sender to slow down,
if it’s sending faster than the receiver
can process the data.
The checksum is used to detect corrupted
packets, that can then be retransmitted.
Finally, the Urgent Pointer allows TCP senders
to indicate that some data is to be processed
urgently by the receiver.
Unfortunately, experience has shown that the urgent
data mechanism in TCP is not usable in practice,
due to a combination of
an ambiguous specification and inconsistent implementation.
The fixed TCP segment header is
followed by TCP option headers, that allow TCP
to add new features and extensions,
and then the payload data.
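To make the layout concrete, here's a sketch of the fixed header as a C struct, with fields in wire order. It's for illustration only; real TCP stacks define this with more care for alignment, bitfields, and byte order.

```c
#include <stdint.h>

/* Sketch of the fixed TCP segment header, fields in wire order.
 * Multi-byte fields are sent in network (big-endian) byte order. */
struct tcp_header {
    uint16_t src_port;      /* socket that sent the segment */
    uint16_t dst_port;      /* socket to receive the segment */
    uint32_t seq;           /* sequence number */
    uint32_t ack;           /* acknowledgement number, valid if ACK set */
    uint8_t  data_offset;   /* top 4 bits: header length in 32-bit words */
    uint8_t  flags;         /* URG, ACK, PSH, RST, SYN, FIN bits */
    uint16_t window;        /* receive window size */
    uint16_t checksum;      /* detects corrupted segments */
    uint16_t urgent;        /* urgent pointer; unusable in practice */
    /* ...options (if data_offset > 5), then payload data... */
};
```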
TCP retransmits any data
that was sent in packets that are lost,
and makes sure that data is delivered to the
application in the order in which it was originally sent.
It also adapts the speed at which it sends data to
match the available network capacity.
As a result, TCP provides a reliable, ordered,
byte stream delivery service.
A limitation of the TCP service model
is that message boundaries are not preserved.
That is, if an application writes,
for example, a block of 2,000 bytes
to a TCP connection, then TCP will
deliver those 2,000 bytes to the receiver,
reliably, and in the order they are sent.
However, what TCP does not do,
is guarantee that those bytes are delivered to the
receiving application as a single block of 2,000 bytes.
That might happen.
Equally, they could be delivered to the
application as two blocks of 1,000 bytes.
Or as a block of 1,500 bytes,
followed by a block of 500 bytes.
Or as 2,000 single bytes. Or as
any other combination, provided the data is
delivered reliably and in the order sent.
This complicates the design of applications that
use TCP, since they have to parse
the data received from a TCP socket
to check if they’ve got the complete message.
Despite this inconvenience,
TCP is the right choice for most applications.
If you need to deliver data reliably,
and as fast as possible, then use TCP.
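As an illustration of what the lack of message boundaries means for application code, here's a minimal sketch of a helper that reads a fixed-length message from a TCP socket. The loop is needed precisely because a single recv() call may return only part of the message.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Read exactly len bytes from a TCP socket. A loop is required
 * because TCP may deliver the bytes in arbitrarily sized chunks. */
ssize_t recv_exactly(int fd, char *buf, size_t len) {
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, buf + got, len - got, 0);
        if (n == 0) return 0;    /* connection closed by peer */
        if (n < 0) return -1;    /* error */
        got += n;
    }
    return (ssize_t)got;
}
```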
TCP is a client-server protocol.
Servers listen for, and respond to, requests from clients.
The way you write code to use
a TCP connection depends on whether you’re
writing a client or a server.
On the server side, you begin by creating a socket.
The first argument to the socket() call
is the constant PF_INET, from the sys/socket.h header,
if you want the server to listen for connections on IPv4.
Alternatively, if you want the server to listen
for IPv6 connections, use PF_INET6.
The second argument will be the constant,
SOCK_STREAM, to indicate that a TCP server is wanted.
The third argument is unused and must be zero.
You then call the bind() function,
passing the file descriptor representing the newly
created socket as the first argument.
The other arguments to bind() specify the
port number on which the server
should listen for incoming connections.
The bind() function assigns the requested
port number to the socket.
You then call the listen() function.
This starts the server listening for incoming connections.
Then, you call accept(), to indicate that
your server is ready to accept a new connection.
The accept() function doesn’t return
until a client connects to the server.
This could potentially be a long wait.
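Putting those server-side calls together, a minimal sketch might look like the following. Error handling is abbreviated, and the port number, 8080, is just an example:

```c
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    /* Create an IPv4 TCP socket. */
    int fd = socket(PF_INET, SOCK_STREAM, 0);

    /* Bind to port 8080 (an example port) on all local addresses. */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }

    /* Start listening, then block in accept() until a client connects. */
    listen(fd, SOMAXCONN);
    int conn = accept(fd, NULL, NULL);   /* may be a long wait */

    /* ...use conn with send()/recv(), then close both descriptors... */
    close(conn);
    close(fd);
    return 0;
}
```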
Meanwhile, the client application creates its own socket.
This is done using the socket() function,
in exactly the same way as the server.
The client then calls connect(),
passing the file descriptor for its newly created socket
as the first argument. The subsequent arguments
contain the IP address of the server,
and the port number on which the server is listening.
The connect() call makes TCP establish a
connection from the client to the server.
When it returns, either the connection has
been successfully established, or the server is unreachable.
When the connection request reaches the server,
the accept() call completes.
The return value is a new file descriptor,
representing the newly established connection.
The original file descriptor,
that was listening for incoming connections,
remains open, and can be used to accept further connections.
The client and server can now call send() and recv(),
to send and receive data over the connection.
They can send and receive as much,
or as little, data as they want, and TCP places
no restrictions on the order in which
client and server send.
Remember to use the file descriptor representing
the accepted connection, not the file descriptor
representing the listening socket, when sending and receiving data in the server.
Finally, when they’ve finished, client and server
call the close() function, to cleanly shut
down the connection.
Once the client has closed the connection, it’s done.
A server can repeatedly accept new connections
from the listening socket.
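The corresponding client-side sketch, assuming the example server above, with a hypothetical server address and a toy request:

```c
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    /* Create an IPv4 TCP socket; the OS picks the source port. */
    int fd = socket(PF_INET, SOCK_STREAM, 0);

    /* Server address and port (example values, matching the server sketch). */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8080);
    inet_pton(AF_INET, "192.0.2.53", &addr.sin_addr);

    /* connect() triggers the TCP handshake and blocks until it completes. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    /* Send a request and read the reply. */
    const char *req = "hello\n";
    send(fd, req, strlen(req), 0);
    char buf[1024];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    if (n > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    close(fd);
    return 0;
}
```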
How does connection establishment actually work?
And what are the factors that affect how
quickly connections can be established?
In the following, I’ll talk in detail
about how client-server connection establishment works for
TCP, and what factors limit performance.
Applications that use TCP usually work in a client-server manner.
The server listens for connections on a well-known port,
and the client connects to the server, sends a request,
and receives a response.
TCP can also, in principle, be used
in a peer-to-peer manner.
If two devices create TCP sockets,
bind to known ports, and simultaneously attempt
to connect() to each other, then TCP
will be able to create a connection,
provided there are no firewalls or NAT
devices blocking traffic.
This is known as simultaneous open.
TCP simultaneous open can work,
but isn’t especially useful, since it requires both peers
to try to connect at the same time.
It’s usually better to work in client-server mode,
since the server can wait for clients
and the client and server don’t need to synchronise
when to connect.
How does client-server TCP connection
establishment actually work?
First, the server creates a TCP socket,
binds it to a port, tells it
to listen for connections, and calls accept().
At that point, the server blocks,
waiting for a connection.
The client creates a TCP socket and calls connect().
This triggers the TCP connection setup handshake,
and causes the client to send a
TCP segment to the server. This segment
will have the SYN (“synchronise”) bit set
in its TCP segment header, to indicate
that it’s the first packet of the connection.
It will also include a randomly chosen sequence number.
This is the client’s initial sequence number.
The initial segment does not include any data.
When this initial segment arrives at the
server, the server will send a TCP segment in response.
This segment will also have the SYN
bit set, because it’s the first segment
sent by the server on the connection.
It will also include a randomly chosen
initial sequence number.
This is the server’s initial sequence number.
The segment will also have the ACK
bit set, because it’s acknowledging the initial
segment sent from the client.
TCP acknowledgements report the next
sequence number expected,
and by convention a SYN segment will
consume one sequence number.
Accordingly, the acknowledgement number
in the TCP segment header will
be the client’s initial sequence number plus one.
Since this segment has both the SYN
and ACK bits set, it’s known as a SYN-ACK packet.
When the SYN-ACK packet arrives at the
client, the client acknowledges its receipt back
to the server. The TCP segment it
generates to do this will have its
ACK bit set to one, to indicate
that its acknowledgement number is valid,
and the acknowledgement number will be the
server’s initial sequence number plus one.
The SYN bit is not set on
this packet, since it’s not the first
packet sent from the client to the server.
The sequence number in the TCP segment
header will equal the client’s initial sequence
number plus one, since the SYN packet
consumes one sequence number. This packet also
doesn’t include any data, since it’s sent
before the connect() call completes.
Once this packet has been sent,
the client considers the connection established,
and the connect() function returns. The client
can now send or receive data on the connection.
Once this final ACK packet arrives at
the server, the three-way handshake is complete.
At this point the accept() function completes,
returning a file descriptor the server can
use to access the new connection.
The server now considers the connection to be established.
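Putting the three segments together, with c and s standing for the client’s and server’s randomly chosen initial sequence numbers:

```
Client                                     Server
   | --- SYN, seq=c ----------------------> |
   | <-- SYN+ACK, seq=s, ack=c+1 ---------- |
   | --- ACK, seq=c+1, ack=s+1 -----------> |
```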
Once the three-way handshake has completed,
the client and server can send and
receive data over the connection.
The typical case is that the client
sends data to the server immediately after
it connects, and the server responds with
the requested data. There’s no requirement that
the client sends first, though, or that
client and server alternate in sending data.
The slide shows an example where the
client sends a request comprising a single
segment’s worth of data to the server.
The server then responds by sending a
larger response back, including the acknowledgement for
the request on the first segment of
the response. Finally, we see the client
acknowledge receipt of the segments that comprise the response.
This is the typical pattern when a
browser fetches a web page from a web server.
What’s interesting is that if we look
at the time from when the client
calls connect(), until the time it receives
the last data segment from the
server, a significant part of that time
is taken up by the connection setup handshake.
It takes a certain amount of time,
known as the round trip time,
to send a minimal sized request to
the server and get a minimal sized
response. Larger requests and responses add to
this, based on the time to send
the additional data down the link,
known as the serialisation time for the
data. But, if the amount of data
being requested from the server is small,
it’s often the round trip time that dominates.
For example, let’s assume a browser is
requesting a simple web page from a
web server using HTTP running over TCP.
That web page comprises a single HTML
file, a CSS style sheet, and an
image. The HTML and CSS files are
on one server, while the image is
located on a different server.
How long does it take to retrieve this page?
Well, the client initially connects to the
server where the HTML file is located.
This takes one round-trip time for the
SYN and SYN-ACK packets to be exchanged,
and for the connect() call to complete.
As soon as the connect() completes,
the client sends the request for the
HTML. It takes another round-trip for the
request to reach the server and the
first part of the response to come
back, followed by the serialisation time for
the rest of the response.
When it’s received the HTML, the client
knows that it needs to retrieve the
CSS file and the image.
It reuses the existing connection to the
first server to retrieve the CSS file.
This takes an additional round trip,
plus the serialisation time of the CSS file.
In parallel to this, it opens a
TCP connection to the second server,
sends a request for the image,
and downloads the response. This takes two
round trips, plus the time to send
the image data.
Whichever of these takes the longest,
plus the amount of time to make
the initial connection and fetch the HTML,
determines the total time to download the page.
The round trip time depends on the
distance from the client to the server.
The serialisation time depends on the available
capacity of the network. For example,
if the image is 1 megabyte,
8 megabits, in size, and the available
bandwidth of the link is 2 megabits
per second, then the image will take
4 seconds to download, in addition to
the round trip time.
It can be seen that the total download time depends on the round
trip time, the available bandwidth, and the
size of the data being downloaded.
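As a rough sketch of that relationship: fetching one object over a fresh connection costs about two round trips (one for the handshake, one for the request and first response) plus the serialisation time. The function below is an illustrative model, not a precise formula.

```c
/* Rough model of the time to fetch one object over a new TCP
 * connection: one RTT for the handshake, one for request/response,
 * plus the serialisation time of the object itself. */
double fetch_time_secs(double rtt_secs, double object_bytes,
                       double bandwidth_bits_per_sec) {
    return 2.0 * rtt_secs + (object_bytes * 8.0) / bandwidth_bits_per_sec;
}
```

With the numbers above, a 1 megabyte image on a 2 megabit per second link gives two round trips plus 4 seconds of serialisation time.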
What’s a typical round trip time?
This depends on the distance between the
client and server, and on the amount
of network congestion. The table on the
right gives some typical values, measured from
a laptop on my home network to various destinations.
There’s a lot of variation. In the
best case, it takes around 33ms to
get a response from a server in
the UK, around 100ms to get a
response from a server on the East
coast of the US, around 165ms from
a server in California, and around 300ms
from Australia. Worst case, when there’s other
traffic on the network, is considerably higher.
This means that a request sent to
a server in New York takes at
least 1/10th of a second, irrespective of
how much data is requested.
What about available bandwidth?
Well, ADSL typically gets around 25 megabits per second.
VDSL, often known as fibre to the
kerb, where the connection runs over your
home phone line to a cabinet in
your street, then over fibre to the
exchange and beyond, typically gets around 50
megabits per second.
And fibre to the premises, where the
optical fibre runs direct into your home,
can transmit several hundred megabits per second.
4G wireless is highly variable, depending on
the options enabled by your provider and
the reception quality, but somewhere in the
15-30 megabits per second range is typical.
What does this mean in practice?
Let’s take the example of a single
web page we used before, comprising HTML,
a CSS style sheet, and a single
image, and plug in some typical numbers
for the file sizes, as shown on
the slide. Let’s also assume that the
round trip time is the same for
both servers, to make the numbers easier.
The table then plots the total time
it would take to download that simple
web page, given different values for bandwidth
and the round trip time to the server.
The slowest case is the bottom left
of the table, where it would take
45.1 seconds to download the page,
assuming a 1 megabit per second link
to a server with a 300ms round
trip time. This models a slow connection
to a server in Australia.
The fastest is the top right of
the table, where it takes 0.04 seconds
to download the page from a server
located 1ms away on a gigabit link.
What’s interesting is how the download time
varies as the link speed improves.
If we look at the top row,
with 1ms round trip time, we see
that if we increase the bandwidth by
a factor of ten, from 100Mbps to
1Gbps, the time taken to download the
page goes down by a factor of
ten, a 90% reduction. The link is
ten times faster, and the page downloads
ten times faster.
If we look instead at the bottom
row, with 300ms round trip time,
increasing the link speed from 100Mbps to
1Gbps gives only a 22% reduction in download time.
Other links are somewhere in the middle.
Internet service providers like to advertise their
services based on the link speed.
They proudly announce that they can now
provide gigabit links, and that these are
now more than ten times faster than before!
And this is true.
But, in terms of actual download time,
unless you’re downloading very large files,
the round trip time is often the
limiting factor. The download time, for typical
pages, may only improve by a factor
of two if the link gets 10x faster.
Is it still worth paying extra for
that faster Internet connection?
What does this mean for protocol design?
The example shows an HTTP/1.1 exchange.
Once the connection has been opened,
the client sends the data shown in
blue, prefixed with the letter “C:”,
to the server. The server then responds
with the data shown in red,
prefixed with the letter “S:”, comprising some
header information and the requested page.
Everything is completed in a single round
trip. Request. Then response.
Compare that with this example, showing the
Simple Mail Transfer Protocol, SMTP, used to send email.
As with the previous slide, data sent
from client to server is shown in
blue and prefixed with the letter “C:”,
and that sent from the server to
the client is in red, prefixed with
the letter “S:”.
We see that the protocol is very chatty.
Once the connection is established, after the
SYN, SYN-ACK, and ACK, the server sends
an initial greeting. Establishing the connection and
sending this initial greeting takes two round trips.
The client then sends HELO, and waits
for the go ahead from the server.
This takes one more round trip.
The client then sends the from address,
and waits for the server. One more round trip.
The client then sends the recipients,
and waits for the server. One more round trip.
The client then says it’d like to
send data now, and waits for the
server. One more round trip.
Then, finally the client gets to send
the data, and once it’s confirmed that
the data was received, sends QUIT,
waits, then closes the connection.
The whole exchange takes eight round trips.
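As a sketch of such an exchange, with hypothetical hostnames and addresses, using the same "C:" and "S:" convention as the slides (each client line after connection setup waits for the server’s reply, costing a round trip):

```
C: <opens TCP connection>        (SYN, SYN-ACK, ACK)
S: 220 mail.example.com ESMTP
C: HELO client.example.com
S: 250 mail.example.com
C: MAIL FROM:<alice@example.com>
S: 250 OK
C: RCPT TO:<bob@example.com>
S: 250 OK
C: DATA
S: 354 End data with <CR><LF>.<CR><LF>
C: (message contents)
C: .
S: 250 OK, message accepted
C: QUIT
S: 221 Bye
```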
Is this necessary or efficient?
If the protocol were designed differently,
all the data could be sent at
once, as soon as the connection was
opened, and the server could respond with
an okay or an error. The eight
round trips could be reduced to two:
one to establish the connection, one to
send the message and get confirmation from the server.
This is why email is slow to send.
TCP establishes connections using a three-way handshake.
SYN, SYN-ACK, ACK.
The time to establish a connection and transfer data depends
on the round trip time and the bandwidth.
Links are now fast enough that the
round trip time is generally the dominant
factor, even for relatively slow links.
The best way to improve application performance
is usually to reduce the number of
messages that need to be sent from
client to server. That is, to reduce
the number of round trips. Unless you’re
sending a lot of data, increasing the
bandwidth generally makes very little difference to performance.
The 2nd part of the lecture discusses the impact of TLS and IPv6 on
TCP connection establishment. It shows how the use of TLS, to secure
connections, increases the connection establishment latency. And it
discusses the "happy eyeballs" technique for connection racing, to
reduce connection establishment delays, in dual stack IPv4 and IPv6 networks.
Slides for part 2
In the previous part, I discussed TCP
connection establishment, and highlighted that the round-trip
time is often the limiting factor for performance.
In the following, I want to discuss
the performance implications of adding transport layer
security to TCP connections, and how to
achieve good performance when the destination is
a dual-stack IPv4 and IPv6 host.
In the previous part, I showed how
the network round trip time can be
the limiting factor in performance. This is
because every TCP connection needs at least
two round trip times: one to establish
the connection, and one for the client
to send a request and receive a
response from the server.
I also showed how the protocol running
over TCP can make a significant difference
to performance, with the examples of HTTP,
which sends a request and receives a
response in a single round trip,
and SMTP, which makes multiple unnecessary round trips.
One of the important protocols that runs
over TCP is the transport layer security protocol, TLS.
TLS provides security for a TCP connection.
That is, it allows the client and
server to agree encryption and authentication keys
to make sure that the data sent
over that TCP connection is confidential and
protected from modification in transit.
TLS is essential to Internet security.
When you retrieve a secure web page
using HTTPS, it first opens a TCP
connection to the server. Then, it runs
TLS to enable security for that connection.
Then, it asks to retrieve the web page.
Depending on the version of TLS used,
this adds additional time to the connection.
With the latest version of TLS,
TLS v1.3, it takes one additional round
trip to agree the encryption and authentication
keys. That is, after the TCP connection
has been established, via the SYN -
SYN-ACK - ACK handshake, then the client
and server need an additional round trip
to enable TLS, before they can request data.
The TLS handshake is in three parts.
First, the client sends a TLS ClientHello
message to the server to propose security
parameters. Then, the server responds with a
TLS ServerHello, containing its keys and other
security parameters. Finally, assuming there’s a match,
the client responds with a TLS Finished
message to set the encryption parameters.
The client then immediately follows this by
sending the application data, such as an
HTTP GET request, without waiting for a response.
This adds one additional round trip time, in most cases.
Older versions of TLS take longer.
TLS v1.2, for example, takes at least
two round trips to negotiate a secure connection.
What impact does the additional round trip
due to TLS have on performance?
Well, let’s look again at the simple
web page download examples from the previous part.
When the round-trip time is negligible,
as on the top row with 1ms
round trip time, performance is unchanged.
As we go down the table,
though, performance gets worse. With 100ms round
trip time, both the overall performance and
the benefit of increasing the link speed
go down. The download time for the
page on a gigabit link is increased
by 45%, from 0.44 to 0.64 seconds,
compared to a connection without TLS.
And the benefit of going from a
100 megabit link to a gigabit link
is only 36%, rather than 45% without TLS.
With 300ms round trip time the behaviour
is even worse. Total download time increases
by 48% compared to the non-TLS case,
and there’s only a 22% reduction in
download time when going from a 100
megabit link to a gigabit link.
This is not to say that TLS
is bad! Far from it – security is essential.
Rather, it further highlights that the number
of round trips that a connection must
perform, between client and server, is often
the limiting factor in performance.
Applications that have good performance will try
to reduce the number of TCP connections
that they establish, since each connection takes
time to establish. They also try to
limit the number of request-response exchanges,
each taking a round trip, they make
on each connection.
TLS v1.3, standardised in 2018, was a
big win here, because it reduces the
number of round trips needed to enable
security from two, down to one.
When used with TCP, this gives the
best possible performance: one round trip to
establish the TCP connection, and one to
negotiate the security, before the data can be sent.
We’ll talk more about TLS and how
to improve the performance of secure connections
in lectures 3 and 4.
The other factor affecting TCP connection performance
is the ongoing transition to IPv6.
This transition means that we currently have
two Internets: the IPv4 Internet and the IPv6 Internet.
Some hosts can only connect using IPv4.
Some hosts can only connect using IPv6.
And some hosts have both types of address.
Similarly, some network links can only carry
IPv4 traffic, some only IPv6, and some
links can carry both types of traffic.
And some firewalls, or other middleboxes,
block IPv4, some block IPv6, and some
block both types of traffic.
Importantly, the IPv6 network is not a
subset of the IPv4 network. It’s a
separate Internet, that overlaps in places.
Given that some hosts will be reachable
over IPv4 but not IPv6, and vice
versa, how do you establish connections during the transition?
Well, given a hostname, you perform a
DNS lookup to find the IP addresses
for that host using the getaddinfo() call.
This returns a list of possible IP
addresses for the host, including both IPv4
and IPv6 addresses.
The simple approach is a loop,
trying each address in turn,
until one successfully connects.
This works, but can be very slow.
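A sketch of that simple sequential approach, using getaddrinfo():

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try each address returned for the host in turn, returning the
 * first socket that connects, or -1. Simple, but potentially slow:
 * each failing address may take many seconds to time out. */
int connect_to_host(const char *host, const char *port) {
    struct addrinfo hints, *res, *ai;
    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;   /* TCP; both IPv4 and IPv6 */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    int fd = -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                     /* success */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```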
In the example on the slide,
Netflix has 16 possible IP addresses,
eight IPv6 and eight IPv4, and lists
the IPv6 addresses first in its DNS
response. If you have only IPv4 connectivity,
it may take a long time to
try, and fail, to connect to eight
different IPv6 addresses before you get to
an IPv4 address that works.
To get good performance, applications use a
technique known as “Happy Eyeballs”.
This involves making two separate DNS lookups,
in parallel, one asking for only IPv4
addresses and one for only IPv6 addresses.
Starting with whichever of these DNS lookups
completes first, the client makes a connection
to the first address returned by the
server. If that hasn’t succeeded within 100ms,
it starts another connection request to the
next possible address, alternating between IPv4 and IPv6 addresses.
The different connection requests proceed in parallel,
until one eventually succeeds. That first successful
connection is used, whether over IPv4 or
IPv6, and the other connection requests are cancelled.
The happy eyeballs technique tries to balance
the time taken to connect vs.
the network overload of trying many possible
connections at once in parallel. It adds
complexity to the connection setup, to achieve faster connection establishment.
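A simplified sketch of the racing idea follows. For brevity it races the addresses in getaddrinfo() order with a 100ms stagger, rather than making separate IPv4 and IPv6 DNS queries and strictly alternating between the two families as a full implementation would; treat it as an illustration of the technique, not a complete implementation.

```c
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Start a non-blocking connect() to one candidate address. */
static int start_connect(struct addrinfo *ai) {
    int fd = socket(ai->ai_family, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    fcntl(fd, F_SETFL, O_NONBLOCK);
    if (connect(fd, ai->ai_addr, ai->ai_addrlen) < 0 && errno != EINPROGRESS) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Race connection attempts, starting a new one every 100ms, and
 * return the first socket to connect; the rest are cancelled. */
int happy_eyeballs_connect(const char *host, const char *port) {
    struct addrinfo hints, *res, *next;
    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;        /* IPv4 and IPv6 results */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    struct pollfd fds[16];
    int nfds = 0, winner = -1;
    next = res;
    while (winner < 0 && (next != NULL || nfds > 0)) {
        if (next != NULL && nfds < 16) {    /* start the next attempt */
            int fd = start_connect(next);
            next = next->ai_next;
            if (fd >= 0)
                fds[nfds++] = (struct pollfd){ .fd = fd, .events = POLLOUT };
        }
        /* Wait up to 100ms for any outstanding attempt to finish. */
        if (poll(fds, nfds, 100) > 0) {
            for (int i = 0; i < nfds; i++) {
                if (fds[i].revents == 0)
                    continue;
                int err = 0;
                socklen_t len = sizeof(err);
                getsockopt(fds[i].fd, SOL_SOCKET, SO_ERROR, &err, &len);
                if (err == 0) {             /* this attempt succeeded */
                    winner = fds[i].fd;
                    break;
                }
                close(fds[i].fd);           /* this attempt failed */
                fds[i--] = fds[--nfds];     /* remove it from the set */
            }
        }
    }
    for (int i = 0; i < nfds; i++)          /* cancel the losers */
        if (fds[i].fd != winner)
            close(fds[i].fd);
    freeaddrinfo(res);
    return winner;
}
```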
The two factors affecting TCP performance are
bandwidth and latency. In many cases,
the latency, the round trip time, is the limiting factor.
There are five ways in which applications
using TCP improve their performance.
The first is that a client should
use something like happy eyeballs, overlapping connection
requests, if the server
has more than one address. This is
more complicated to implement than trying to
connect to each different address in turn,
but connects a lot faster.
The second way to improve TCP performance
is to reduce the number of TCP
connections made. Each connection takes time to
establish. If you can make a single
TCP connection and reuse it for multiple
requests, that’s faster than making a new
connection for each request.
Third, if you reduce the number of
request-response exchanges made over each connection,
you reduce the impact of the round trip time.
All these are possible for any application,
by using TCP connections effectively.
There are also two more radical changes
that can be made.
The first is to overlap the TCP
and TLS connection setup handshakes, by sending
the security parameters along with the initial
connection request, so that both the connection
setup and security parameters can be negotiated
in a single round trip. This isn’t
possible with TCP, but the QUIC transport
protocol, that we’ll discuss in lecture 4,
does allow this.
Finally, one can always improve performance by
reducing the round trip latency. This latency
depends on two things: the speed at
which the signal propagates down the link,
and the amount of other traffic.
Since signals travel down electrical cables and
optical fibres at the speed of light,
there’s little that can be done to
increase the propagation speed, although low earth
orbit satellites can help, as we’ll discuss in a later lecture.
Reducing the amount of other traffic queued
up at intermediate links is a possibility
though, and this can be affected by
the choice of TCP congestion control algorithm.
We’ll talk about this in lectures 5 and 6.
To summarise, one of the limiting factors
with TCP performance is the round trip time.
The use of TLS is essential to
improve security, but comes at the expense
of an additional round trip that slows
down connection establishment. This is solved by
the upcoming QUIC transport protocol, that we’ll
discuss in lecture 4.
Similarly, the ongoing migration to IPv6 means
that servers often have both IPv4 and
IPv6 addresses, and it’s not clear which
of these are reachable. Clients must try
to establish multiple connections in parallel,
using the happy eyeballs technique, to get good performance.
The 3rd part starts to discuss peer-to-peer connections. It talks about
how the use of Network Address Translation (NAT) affects addressing and
connection establishment, and why it complicates creating peer-to-peer connections.
Slides for part 3
In this part, I’ll start to talk
about peer-to-peer connections, network address translation,
and how these affect Internet addressing and connection establishment.
The Internet was designed as a peer-to-peer
network, and makes no distinction between clients
and servers at the IP layer.
In principle, it should be possible to
run a TCP server, or a UDP-
or TCP-based peer-to-peer application, on any host
on the network. As long as the
clients have some way of finding the
server’s IP address, and knowing what port
number it’s using, and as long as
any firewall pinholes are opened, then it
shouldn’t matter whether a server is located
in someone’s home or in a data centre.
A server in a data centre is
likely to have better performance, of course,
because it’s probably got a faster connection
to the rest of the network.
It’s also likely to be more robust,
because the data centre will have redundant
power and network links, air conditioning,
and professional system administrators. But, at the
protocol level, there shouldn’t be a difference.
In practice, this is not the case.
It’s difficult to run a server on
a host connected to most residential broadband
connections, and it’s difficult to make peer-to-peer connections.
The reason for this is the widespread
use of network address translation – NAT.
What is network address translation?
NAT is the process by which several
devices can share a single public IP
address. It allows several hosts to form
a private internal network, with IP addresses
assigned from a special-use range. One device
– the network address translator, the NAT
– is connected to both the private
network and to the Internet, and can
forward packets between the two networks.
As it does so, it rewrites,
translates, the IP addresses, and the TCP
and UDP port numbers, so all the
packets appear to come from the NAT’s public IP address.
Essentially, it hides an entire private network
behind a single IP address.
This is useful because there aren’t enough
IPv4 addresses for every device that wants
to connect to the network, and because
it’s taking a long time to deploy IPv6.
NAT is a work around, to let
you keep using IPv4 devices, with some
limitations, even though there aren’t enough IPv4 addresses.
How does NAT work? Well, let’s first
step back, and think about how a
single host connects to the network.
In the figure, a customer owns a
single host. That host connects to a
network run by an Internet service provider.
That ISP, in turn, connects to the Internet.
The ISP owns a range of IP
addresses that it can assign to its customers.
In this example, it owns the IPv4
prefix 203.0.113.0/24. That is, the set of
IPv4 addresses where the first 24 bits,
known as the network part of the
address, match those of 203.0.113.0 are assigned
to the ISP.
These are the IPv4 addresses in the
range 203.0.113.0 to 203.0.113.255.
The address with the host part equal
to zero represents the network, and cannot
be assigned to a device. The ISP
assigns the first usable IP address in
the range, 203.0.113.1, to the internal network
interface of the router that connects it
to the rest of the network,
and assigns the rest of the addresses
to customer machines.
One particular customer is assigned IP address
203.0.113.7 for their device.
The external, Internet-facing, side of the router
that connects the ISP to the rest
of the network has an IP address
assigned by the network to which the
ISP connects. In this example, it gets
IP address 192.0.2.47.
The customer’s host connects to a server
on the Internet. The server happens to
have IP address 192.0.2.53.
The customer’s host sends packets that have
their destination IP address equal to that
of the server, 192.0.2.53, and source IP
address equal to that of the customer’s device, 203.0.113.7.
Those packets travel through the network without
change, and when they arrive at the
server, they still have destination IP address
192.0.2.53 and source IP address 203.0.113.7.
When it sends a reply, the server
will set the destination IP address to
that of the customer’s device, 203.0.113.7,
and use its own address, 192.0.2.53,
as the source IP address.
No address translation takes place.
At some point later, the customer buys
another host. How does it connect to the network?
What’s supposed to happen is as follows.
First, the customer buys an IP router,
or is given one by the ISP.
The router is used to create an
internal network for the customer, that connects
to the ISP’s network.
This could be an Ethernet, a WiFi
network, or whatever.
The external interface of that router,
that connects the customer to the ISP,
inherits the IP address that was previously
assigned to the customer’s single device,
in this case 203.0.113.7.
The ISP also assigns a new IP
address range to the customer. This will
be a subset of the IP address
range the ISP owns. In this example,
the customer is assigned the IP address
range 203.0.113.16/28. That is, IP addresses where
the first 28 bits match those of
203.0.113.16, namely the range 203.0.113.16 to 203.0.113.31.
The customer assigns the first usable address
in that range, 203.0.113.17, to the internal
network interface of the router, and assigns
other addresses to their two hosts.
In this example, the two hosts are
given addresses 203.0.113.18 and 203.0.113.19.
The end result is that the ISP
delegates some of the IP addresses they
own to their customer, and the customer
uses them in their network.
One of the customer’s hosts connects to
a server on the Internet.
As expected, to do so, that host
sends an IP packet with the source
IP address set to its IP address,
in this case 203.0.113.18, and the destination
address set to the IP address of
the server, 192.0.2.53.
That packet travels through the customer’s network
to its router, and is forwarded on
to the ISP’s network. It traverses the
ISP’s network to the router connecting the
ISP to the Internet, and is forwarded
on from there to the Internet.
Eventually, the packet arrives at the server.
When it arrives, it still has destination
address equal to that of the server,
192.0.2.53, and source address equal to that
of the host that sent it, 203.0.113.18.
When it sends a reply, the server
will set the destination IP address to
that of the customer’s device, 203.0.113.18 and
use its own address, 192.0.2.53, as the
source IP address.
No address translation takes place.
That’s what’s supposed to happen, but what actually happens?
Well, most likely the ISP either doesn’t
have enough IPv4 addresses to delegate some
of them to their customer, or they
want to charge a lot extra to do so.
Accordingly, the customer buys a network address
translator, and connects it to the ISP’s
network in place of their single original host.
The external interface of the NAT gets
the IP address assigned to the customer’s
original host, 203.0.113.7.
The customer sets up their internal network
as before, but instead of using IP
addresses assigned by their ISP, they use
one of the private IP address ranges.
In this example, they use
addresses in the range 192.168.0.0 to 192.168.255.255.
The internal interface of the NAT is
given IP address 192.168.0.1, and the two
hosts get addresses 192.168.0.2 and 192.168.0.3.
One of the customer’s hosts again connects
to a server on the Internet.
As expected, that host sends an IP
packet with the source IP address set
to its IP address, in this case
192.168.0.2, and the destination address set to
the IP address of the server, 192.0.2.53.
That packet travels through the customer’s network
to its NAT router. The NAT rewrites
the source address of the packet to
match the external address of the NAT,
in this case 203.0.113.7, and also rewrites
the TCP or UDP port number to
some new port number that’s unused on
the NAT, and forwards the packet on
to the ISP’s network.
Internally, the NAT keeps a record of
the changes it made, associated with the new port number.
The packet traverses the ISP’s network to
the router connecting the ISP to the
Internet, and is forwarded on from there
to the Internet. Eventually, the packet arrives
at the server. When it arrives,
it still has destination address equal to
that of the server, 192.0.2.53, but the source
address will equal that of the NAT, 203.0.113.7.
To the server, the packet appears to
have come from the NAT. When it
sends a reply, the server will set
the destination IP address to that of
the NAT, 203.0.113.7, and use its own
address, 192.0.2.53, as the source address.
The reply will traverse the network until
it reaches the NAT. The NAT looks
at the TCP or UDP port number
to which the packet is destined,
and uses this to retrieve its internal
record of the rewrites that were performed.
It then uses this to do the
inverse rewrite, changing the destination IP address
and port in the packet to those
of the host on the private network,
then forwards the packet onto the private
network for delivery.
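As a sketch, the per-flow state a NAT might keep looks something like the struct below. The field names and example values in the comments are illustrative, not taken from any particular implementation.

```c
#include <stdint.h>
#include <time.h>

/* One entry in a NAT's translation table (illustrative sketch). */
struct nat_binding {
    uint8_t  protocol;       /* TCP or UDP */
    uint32_t private_addr;   /* e.g. 192.168.0.2: host on the private net */
    uint16_t private_port;   /* port the internal host is using */
    uint16_t external_port;  /* unused port chosen by the NAT */
    uint32_t remote_addr;    /* e.g. 192.0.2.53: the server */
    uint16_t remote_port;    /* e.g. 80 */
    time_t   last_used;      /* for timeout-based expiry */
};
```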
Essentially, the NAT hides a private network
behind a single public IP address.
The private network can use one of
three private IPv4 address ranges: 10.0.0.0/8,
172.16.0.0/12, and 192.168.0.0/16.
Machines in a private network can directly
talk to each other using these private
IP addresses, provided that communication stays within
the private network.
When they communicate with the rest of
the network, the IP addresses are rewritten
so that, to the rest of the
network, the private network looks like a
single device, with one IP address matching
that of the external address of the
NAT. This gives the illusion that there
are more IPv4 addresses available, by allowing
the same private address ranges to be re-used
in different parts of the network.
Your home network, for example, almost certainly
uses addresses in the 192.168.0.0/16 private address
range, and is connected to the rest
of the network via a NAT router
provided by your ISP.
This concludes our review of how NAT
routers allow multiple devices to share a
single IP address. In the next part,
I’ll explain some of the problems NATs cause.
The 4th part of the lecture continues the discussion of the problems
caused by NAT devices, and why they are used despite these problems.
It talks about the use of NAT as a work-around for the lack of IPv4
address space, as a possible translation mechanism between IPv4 and
IPv6, and to avoid renumbering. And it talks about the implications
of NAT for TCP connections and UDP flows.
Slides for part 4
In the previous part we discussed what
network address translation is, and walked through
some examples showing how NAT routers allow
several hosts on a private network to
share a single IP address.
In the following, I want to talk
about some of the problems caused by
NATs, and to discuss some of the
reasons why NATs are used despite these problems.
The first issue with NAT routers is
that they break certain classes of application,
and encourage centralisation.
NATs are designed to support client-server
applications, where the client is behind the NAT
and the server is a host on
the public Internet. Packets sent by a
host with a private IP address can
pass out through the NAT, and will
have their IP address and port translated
to use the public IP address of
the NAT before they’re forwarded to the
public Internet. The NAT will also retain
state, so that the reverse translation will
be applied to replies to those packets,
allowing them to pass back through the NAT.
This behaviour allows clients to connect to
servers, setting up NAT translation state in the process,
and to receive responses.
The reverse doesn’t work, though.
NAT routers rely on outgoing packets to
establish the mappings they need to translate
incoming packets. That is, when an incoming
TCP or UDP packet arrives at some
particular port on a NAT, the NAT
looks at its record of what it
previously sent from that port, and how
it was translated, to know what’s the
reverse translation to make. If there’s been
no outgoing packet on that port,
the NAT won’t know how to translate
the incoming packet. It won’t know which
of the private IP addresses to use
as the destination address for the translated packet.
This complicates running a server behind a
NAT, since the NAT won’t know how
to translate incoming requests for the server.
It’s possible to manually configure the NAT
to forward packets appropriately, of course,
and protocols like UPnP can help with
this, but these approaches are complicated or
unreliable. It’s generally easier and more reliable
to pay a cloud computing provider to
host the server, which encourages centralisation onto
large hosting services.
NATs also make it hard to write
peer-to-peer applications. In part, this is because
NATs make incoming connections difficult. But it’s
also because hosts located behind a NAT
only know their private address, so can’t
give their peer a public address to
which it can connect. There are solutions
to this, that I’ll talk about in
the next part of this lecture,
but they’re complicated, slow, and wasteful.
Unless you really need the privacy and
latency benefits of a direct peer-to-peer connection,
it’s often easier to relay traffic via
a server hosted in a data centre
somewhere, with a public IP address,
again encouraging centralisation of services.
If NAT routers are so problematic,
why do people use them? There are three reasons.
The first is to work around the
lack of IPv4 address space.
As shown in the figure on the
right, the Regional Internet Registries have run
out of IPv4 addresses. There are no
more IPv4 addresses available for ISPs and
companies that want to connect to the
Internet, and they can’t provide enough IPv4
addresses to fulfil demand.
The result is that IPv4 addresses are
scarce and expensive. ISPs either don’t have
enough addresses to meet their customers’ needs,
or the cost of those addresses is
prohibitive, and customers use a private network
with a NAT instead of using public addresses.
The transition to IPv6 will solve this
problem, since IPv6 makes addresses cheap and
plentiful. The smallest possible address allocation for
an IPv6 network is a factor of
four billion times larger than the entire
IPv4 Internet! Unfortunately, the transition to IPv6
has been slow.
This suggests the second reason why NAT
is used: to translate between IPv4 and IPv6.
In this model, an ISP, or other
network operator, runs IPv6 internally in their
network, and does not support IPv4.
This gives the ISP a clean,
modern, and future-proof network.
The ISP also runs two sets of NATs.
For customers that want to use IPv4
internally, the customer uses a private IPv4
network, and the NAT translates the IPv4
packets into IPv6 packets when they leave
the customer’s network. The principle is the
same as the NAT routers we discussed
in the last part of this lecture,
except that rather than rewriting packets with
private IP addresses to have public IPv4
addresses, the NAT rewrites the entire IPv4
header and replaces it with an IPv6 header.
When packets get to the edge of
the ISPs network, where it connects to
the public Internet, they’re either forwarded as
native IPv6 if the destination is accessible
via IPv6, or translated to IPv4 by a second NAT.
The expectation in this approach to running
a network is that, over time,
the number of customers and destination networks
that need IPv4 will go down,
and more traffic will run IPv6 end-to-end.
NAT is used as a, hopefully temporary, transition mechanism.
The third reason to use NAT is
to avoid renumbering.
Networks that have a public IP address
range tend, over time, to end up
hard coding IP addresses from that range
into configuration files, applications, and settings.
This is a mistake. Applications should always
use DNS names, to allow the IP
addresses to change, but people do it anyway.
The result is that it’s difficult to
change the IP addresses used by machines
on a network. The longer a host
has used a particular IP address,
the more likely it is that something,
somewhere, on the network has that address
hard-coded, and will fail if the host’s address changes.
If a network has an IP address
range delegated to it from its ISP,
what’s known as a provider allocated IP
address range, and wants to change ISP,
then it will need to change the
IP address range it uses to one
delegated from its new provider. Many organisations
have found this sufficiently difficult that it’s
easier to keep the old IP addresses
internally, and use a NAT to translate
addresses to the range assigned by the new provider.
A similar problem can occur if one
company buys another, and has to integrate
the IT systems of the new company
into its existing network.
IPv6 has better auto-configuration support than IPv4,
and tries to make renumbering easier,
but it’s not clear how well this
works. As a result, some network equipment
vendors have started selling NATs that translate
between two different IPv6 prefixes, to ease renumbering.
In both cases, a better approach is
that an organisation gets what’s known as
provider independent IP addresses, directly from one
of the Regional Internet Registries, so it
owns the IP addresses it uses.
In this case, the organisation pays its
ISP to route traffic to the addresses
it owns, and can move to a
new ISP without renumbering.
Given these reasons why NAT routers will
be used, despite their problems, what are
the implications of NAT routers for TCP connections?
Well, as I’ve explained, outgoing connections create
state in the NAT, so replies can
be translated to reach the correct host
on the private network. The question is,
then, how does the NAT know what translation state to set up?
The way this works is that the
NAT router looks at TCP segments it’s
translating and forwarding, and watches for packets
representing a TCP connection establishment handshake.
If the NAT sees an outgoing SYN
packet, followed by an incoming SYN-ACK,
then an outgoing ACK, with matching sequence
and acknowledgment numbers, then it can infer
that this is the start of a
TCP connection, and set up the appropriate translation.
TCP connections have a similar exchange at
the end of the connection, with FIN,
FIN-ACK, and ACK packets. The NAT router
can watch for these exchanges, and infer
that the corresponding TCP connections have finished,
and that the translation state can be removed.
Unfortunately, applications and hosts sometimes crash,
and connections disappear without sending the FIN,
FIN-ACK, and ACK packets. For this reason,
NAT routers also implement a timeout.
If a connection waits too long between
sending packets, the NAT will assume it’s
failed, and remove the translation state.
The recommendation from the IETF is that
NATs use a two hour timeout,
but measurements have shown that many NATs
ignore this and use a shorter timer.
The result is that long-lived TCP connections,
that would otherwise go idle, need to
send something, even if just an empty
TCP segment, every few minutes, to prevent
NATs on the path from timing out
and dropping the connection.
If you’ve ever used ssh to log in
to a remote system, gone to do
something else, then come back after a
couple of hours and wondered why the
ssh connection has failed, this may well
be due to NAT timeout.
The other issue, as I mentioned at
the start of this part, is that
the NAT won’t have state for incoming
connections, unless manually configured to do so.
This makes it difficult to run a
server or peer-to-peer application behind the NAT.
The implications of NAT for UDP flows
are similar to those for TCP,
except that the lack of connections with
UDP complicates things.
For TCP, a NAT can watch for
the connection establishment and teardown segments,
and know when the TCP connections start
and finish. TCP connections can fail without
sending the FIN, FIN-ACK, ACK exchange,
but this is rare, and NAT routers
generally rely on watching the TCP connection
setup and teardown messages to manage translation state.
UDP, on the other hand, has no such exchange,
since it has no concept of connections.
This is not a great problem when
it comes to establishing state in a
NAT. If the NAT sees any outgoing
UDP packet with a particular address and
port, it sets up the state in
the NAT to allow replies.
The problem comes with knowing when to
remove that translation state in the NAT.
Since UDP has no “end of connection”
message, the only way to do this
is with a timeout.
The most widely used UDP application,
historically, has been DNS. DNS clients tend
to contact a lot of different servers,
but exchange only a small amount of
data with each. As a result,
many NATs have very short timeouts -
on the order of tens of seconds
- for UDP translation state, to prevent
them accumulating state for too many UDP flows.
An unfortunate consequence of this, is that
applications that use UDP, such as video
conferencing and gaming, must send packets frequently,
in both directions, to make sure the
NAT bindings stay open. The IETF recommends
that such applications send and receive something
at least once every 15 seconds.
This can generate unnecessary traffic.
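A sketch of such a keepalive, assuming the application already has a connected UDP socket for its traffic; a real implementation would only send when the flow would otherwise go idle, rather than unconditionally:

```c
#include <unistd.h>
#include <sys/socket.h>

/* Send a one-byte packet every 15 seconds so that NAT bindings on
 * the path to the peer are refreshed before they time out. */
void udp_keepalive_loop(int fd) {
    for (;;) {
        send(fd, "", 1, 0);   /* minimal packet to refresh the binding */
        sleep(15);
    }
}
```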
There is one benefit, though, that comes
from the lack of connection establishment signalling
in UDP. With TCP, the NAT can
see the SYN, SYN-ACK, ACK exchange,
and knows the exact addresses and ports
that the client and server are using.
This allows the NAT to create a
very specific binding, and reject traffic from other addresses.
These very specific bindings are a security
benefit, but make peer-to-peer connections harder to
establish. UDP applications tend to be more
flexible in where they accept packets from,
so NATs generally establish bindings that allow
any UDP packets that arrive on the
correct port to be translated and forwarded
across the NAT. This makes peer-to-peer
connection establishment much easier for UDP,
as we’ll see in the next part.
NATs work around three real problems:
lack of IPv4 address space, IPv4 to
IPv6 transition, and renumbering. They work well
for client-server applications, where the client is
behind the NAT and the server is
on the public Internet, but make it
hard to run peer-to-peer applications and to
host servers on networks that use NATs.
This encourages centralisation
of the Internet infrastructure
onto cloud providers and, as we’ll see
in the next part, greatly complicates certain
classes of application.
The final part of the lecture discusses NAT traversal and peer-to-peer
connection establishment. It outlines the binding discovery process,
by which a client can establish that it's behind a NAT and find the
external IP address of that NAT, and the ICE algorithm for candidate
exchange and peer-to-peer connection establishment.
Slides for part 5
In the final part of this lecture,
I’d like to discuss the problem of
NAT traversal. That is, how applications can
work around the presence of NAT routers
to establish peer-to-peer connections.
As I described in the previous part,
NATs are designed to support outbound connections
from a client in the private network
to a server on the public Internet,
and this use case works well.
Other scenarios are less successful.
Incoming connections, to a server located in
the private network, will fail. This happens
because the NAT can’t know how to
translate the incoming packets. There are
workarounds for this, which involve manually configuring
the NAT to forward incoming connections to
the correct device, but this is difficult
to do correctly.
Similarly, peer-to-peer connections through a NAT will
also fail, unless the packets are sent
in a way that makes the NAT,
or the NATs if there are several
peers all located in private networks,
think that a client-server connection is being
opened, and that the response is coming
from the server. In the following,
I’ll talk about how this can be done.
The figure shows an example where two
hosts, A and B, are trying to
establish a direct peer-to-peer connection.
For example, this could be two devices
in people’s homes that are trying to
set up a video call.
Each of these hosts is in a
private network, and is connected to the
public Internet via a NAT. It’s possible,
indeed likely, that if these are home
networks, then both of the private networks
will be using the IP address range
192.168.0.0/16, since that’s the default for most
home NAT routers. A consequence is that
Host A and Host B could both
be using the same private IP address,
for example both hosts could be using
IP address 192.168.0.2.
This isn’t a problem, since Host A
and Host B are on different private
networks, each hidden behind a different NAT.
The two NATs have different public IP
addresses on their external interfaces,
and what’s used internally is not
visible to the rest of the network.
How do these two hosts go about
establishing a connection?
Well, Host A can’t send a packet
directly to Host B, because Host B has
the same private IP address as Host A.
If it tries, the packet will come
straight back to Host A.
Rather, in order to connect to Host
B, Host A will have to discover
the external address and port number that
NAT B is using for packets sent
by Host B. It can then send
its packets to NAT B, which will
translate and forward them to Host B.
To do this, the two peers,
Host A and Host B, both make
connections to a referral server located somewhere
on the public Internet. This is shown
as the dashed red lines on the
slide. They ask that server where their
packets appear to be coming from.
This process is known as binding discovery,
and lets the hosts find out how
their NAT is translating packets. The result
is a candidate address for each host,
that it thinks is the external address
of the NAT that will translate incoming
packets and forward them to it.
The peers then exchange these candidate addresses
with each other, via the referral server.
Once they’ve received the candidate addresses from
their peer, the two hosts systematically send
probe packets, to check if any of
these candidates actually work to reach the
peer. That is, the hosts check if
the outgoing probe packets they send will
correctly setup translation state in the NAT,
so that incoming probes from the peer
will be translated and forwarded to them.
And they check that there are no
firewalls that are blocking the traffic.
If the probes are successfully received,
in both directions, then the two hosts
can switch to using the direct peer-to-peer
path, shown as the solid blue line
on the slide, and no longer need
the referral server.
If the probes fail, then a direct
peer-to-peer connection may not be possible,
and the hosts may have to relay
all traffic via the referral server.
The process of finding out what translations
a NAT is performing is known as
NAT binding discovery.
The Session Traversal Utilities for NAT,
STUN, is a commonly used protocol that
performs NAT binding discovery in the Internet.
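To give a flavour of the protocol, here's a sketch in C that builds and sends a STUN Binding Request. The 20-byte header layout, the message type, and the magic cookie are as the STUN RFCs specify; the function name is illustrative, and the use of rand() for the transaction ID is a simplification.

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Sketch: send a STUN Binding Request (a 20-byte header with no
       attributes) to a STUN server over an existing UDP socket. */
    int send_binding_request(int fd, struct sockaddr_in *stun_server)
    {
        uint8_t  msg[20];
        uint16_t type   = htons(0x0001);      /* Binding Request         */
        uint16_t length = htons(0);           /* no attributes follow    */
        uint32_t cookie = htonl(0x2112A442);  /* fixed STUN magic cookie */

        memcpy(msg + 0, &type,   2);
        memcpy(msg + 2, &length, 2);
        memcpy(msg + 4, &cookie, 4);
        for (int i = 8; i < 20; i++) {
            msg[i] = rand() & 0xff;           /* 96-bit transaction ID   */
        }
        return sendto(fd, msg, sizeof(msg), 0,
                      (struct sockaddr *) stun_server,
                      sizeof(*stun_server));
    }

A real client would use a cryptographically secure random transaction ID, and would parse the XOR-MAPPED-ADDRESS attribute in the server's response to recover the translated address.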
When a host on a private network
sends a packet to a host on
the public network, the NAT at the
edge of the private network will translate
the source IP address and port number
in the packet. The host on the
private network doesn’t know what translation has
been done, but the server that receives
the packet can inspect its source address
and port, to find out where it appears to come from.
For example, when using a UDP socket,
an application can use the recvfrom() system
call to retrieve both the contents of
a UDP packet and its source address.
Similarly, for TCP connections, the accept() system
call returns the address of the client.
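As a minimal sketch of the server side in C, assuming a UDP socket that's already bound, the server can read one packet with recvfrom() and recover the source address it sees, which is the address after any NAT translation on the path. The function name is illustrative.

    #include <stdio.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Sketch: read one UDP packet and report its source address and
       port as seen by the server. */
    void report_source(int fd)
    {
        char               buf[1500];
        struct sockaddr_in src;
        socklen_t          srclen = sizeof(src);

        ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                             (struct sockaddr *) &src, &srclen);
        if (n >= 0) {
            char addr[INET_ADDRSTRLEN];
            inet_ntop(AF_INET, &src.sin_addr, addr, sizeof(addr));
            printf("packet from %s port %d\n",
                   addr, ntohs(src.sin_port));
        }
    }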
The server then replies to the client,
telling it where the packet appeared to
come from. This is what’s known as
a server reflexive address. That is,
the address that the server thinks the
client is sending from.
If there’s a NAT between the client
and the server, then the server reflexive
address will be different to the address
from which the client sent the packet.
If the client’s address and the server
reflexive address are the same, the client
knows there’s no NAT between it and the server.
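A rough sketch of this check in C, assuming a connected UDP socket and that the server reflexive address has already been parsed into a sockaddr_in, might look like this. The function name is illustrative.

    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Sketch: compare the socket's local address, from getsockname(),
       with the server reflexive address the server reported. If they
       differ, something on the path is translating the packets. */
    int behind_nat(int fd, const struct sockaddr_in *reflexive)
    {
        struct sockaddr_in local;
        socklen_t          len = sizeof(local);

        if (getsockname(fd, (struct sockaddr *) &local, &len) < 0) {
            return -1;   /* unable to tell */
        }
        return local.sin_addr.s_addr != reflexive->sin_addr.s_addr
            || local.sin_port        != reflexive->sin_port;
    }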
You might ask why a host that’s
in a private network doesn’t just ask
its NAT how it will translate the
packets? Two reasons.
The first is that by the time
we realised that binding discovery was needed,
there were already tens of millions of
NATs deployed, with no way to upgrade
them to add a way to ask
how they’ll translate packets.
The second is that a host might
not know that it’s behind a NAT,
or might be behind more than one
NAT, and so won’t know what NAT
to ask for the binding.
When performing binding discovery, it’s important that
a host discovers every possible candidate address
on which it might be reachable.
For example, think about a phone that
has both 4G and WiFi interfaces.
Each of these interfaces can have an
IPv4 address and an IPv6 address,
representing the point of attachment to networks
it directly connects to. This could be
a total of four possible IP addresses
for the phone.
The phone may be behind IPv4 NAT
routers on each of those interfaces,
and so each interface might also have
a server reflexive address on which it
can be reached, that the host can
discover using STUN. This can give another
two addresses, bringing the total to six.
It’s unlikely, but the phone could also
be connected via one or more IPv6
NATs. This potentially gives two more server
reflexive addresses on which it can be reached.
In case these server reflexive addresses don’t
work, the phone may also be able
to use the referral server to relay
for it, using a protocol called TURN,
Traversal Using Relays around NAT,
acting as a proxy to deliver traffic
if a direct connection isn’t possible.
This proxy endpoint might be accessible via IPv4 and IPv6.
The phone might also have a VPN
connection, and be able to send and
receive traffic via the VPN, as well
as directly. That VPN endpoint could be
accessible over IPv4 or IPv6, and might
itself be behind a NAT, so it’s
necessary to check for server reflexive addresses
on the VPN interface.
Not all of these will exist for
every device, of course, but the point
is that a modern networked device is
often reachable in many different ways.
If it’s to successfully connect to another
device, in a peer-to-peer manner, it needs
to find as many of these candidate
addresses as possible.
Having run a binding discovery protocol to
find all its possible candidates, a host
sends the list of candidates to the
referral server, and the referral server sends
them on to its peer. Its peer
does the same, and the host receives
the peer’s candidates via the referral server.
At this point, the two hosts know
each other’s candidate addresses, and are ready
to check which of the addresses work.
Given that the two peers can communicate
via the referral server, you might ask
why the peers bother to establish a
peer-to-peer connection, and don’t instead just keep
communicating via the relay?
The primary reason is that a direct
peer-to-peer connection usually has lower latency than
a connection via a relay, and for
peer-to-peer applications like video calls, latency matters.
The second is that the relay server
can eavesdrop on connections that it’s relaying,
but not on direct peer-to-peer connections.
This is perhaps less of a concern
than you might think, since the traffic
can be encrypted so it can’t be
read by the server. Also, the server
knows that the call is happening anyway,
and sometimes knowledge that two people are
talking is almost as sensitive as knowing
what they’re talking about.
Once they’ve exchanged candidates, the two hosts
systematically send probe packets from every one
of their candidate addresses to every one
of the peer’s candidate addresses in turn,
to see if they can establish a connection.
The idea is that a probe packet
sent, for example, from Host A to
a server reflexive address of Host B,
will open a binding in NAT A,
even if it fails to reach Host
B. This open binding will allow a
later probe from Host B to the
server reflexive address of Host A to
reach Host A. This will, in turn,
trigger Host A to probe again in
response, and this time the probe from
Host A to the server reflexive address
of Host B will succeed because the
probe from Host B opened the necessary
binding on NAT B. The two hosts
then start sending traffic and keep-alive messages
on that path to keep the bindings
active, while the probing continues on all
the other candidates.
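To make the probing concrete, here's a simplified sketch in C for a single candidate pair. Real ICE connectivity checks are STUN Binding Requests with authentication; the payload, retry count, and one-second pacing here are arbitrary illustrative choices, and MSG_DONTWAIT is used for a non-blocking read.

    #include <unistd.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Sketch: probe one of the peer's candidate addresses while
       watching for the peer's probes on the same socket. Our outgoing
       probes open a binding in our NAT; once the peer's probes have
       opened a binding in its NAT, packets flow in both directions. */
    int probe_candidate(int fd, const struct sockaddr_in *peer)
    {
        char               buf[1500];
        struct sockaddr_in src;
        socklen_t          srclen;

        for (int attempt = 0; attempt < 10; attempt++) {
            sendto(fd, "probe", 5, 0,
                   (struct sockaddr *) peer, sizeof(*peer));

            srclen = sizeof(src);
            if (recvfrom(fd, buf, sizeof(buf), MSG_DONTWAIT,
                         (struct sockaddr *) &src, &srclen) > 0) {
                return 0;   /* a probe got through: this pair works */
            }
            sleep(1);       /* wait before retrying */
        }
        return -1;          /* no response: try another candidate pair */
    }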
The probing can take a long time,
so candidate addresses are assigned a priority
based on how likely the host thinks
it is to be reachable on that
address, and on its expectation of how
well that address will perform. The checks
take place in priority order, to quickly
try to find a pair of candidates that works.
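For reference, RFC 8445 defines a fixed formula for computing each candidate's priority. A sketch in C, using the RFC's recommended type preference values:

    #include <stdint.h>

    /* Sketch: the RFC 8445 candidate priority formula. Recommended
       type preferences are 126 for host candidates, 100 for server
       reflexive, and 0 for relayed, so direct paths are checked
       before relayed ones. */
    uint32_t candidate_priority(uint32_t type_pref,  /* 0..126   */
                                uint32_t local_pref, /* 0..65535 */
                                uint32_t component)  /* usually 1 */
    {
        return (type_pref << 24) | (local_pref << 8) | (256 - component);
    }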
If more than one pair of candidate
addresses succeeds, the hosts choose the
best path, for example the path with
the lowest latency, and drop the other connections.
The Interactive Connectivity Establishment algorithm, ICE,
defined by the IETF in RFC 8445,
describes this probing process in detail.
When making a peer-to-peer phone or video
call, the ICE algorithm and its probing
usually happen while the phone is ringing,
so the connection is ready when the
call is answered.
What should be clear by now is
that NAT binding discovery, and the systematic
connection probing needed for NAT traversal,
are complex, slow, and generate a lot
of traffic. The RFCs that describe how
the process works are almost 200 pages
long, and are not easy to implement correctly.
The STUN protocol and the ICE algorithm
were developed to support voice-over-IP applications
that run over UDP, and for UDP
the result works reasonably well.
It’s less effective for peer-to-peer TCP connections.
NATs tend to be quite permissive for
UDP, translating any incoming UDP packet that
reaches the correct address and port,
but are often stricter for TCP connections,
and check for matching TCP sequence numbers,
and so on. This makes peer-to-peer TCP connections less
likely to be successful.
In this lecture, I’ve outlined how client-server
connection establishment works, and how the use
of TLS and IPv6 can affect connection
establishment, and can require connection racing using
the “happy eyeballs” technique. I also showed
that connection establishment latency is often a
critical factor limiting the performance of TCP connections.
In the later parts, I outlined how
and why NAT routers are used,
their advantages and disadvantages, and how NAT
traversal techniques work to establish peer-to-peer connections.
Establishing a connection used to be a
simple task. What I hope to have
shown you is that it’s no longer
simple, not in the client-server case,
and especially not when peer-to-peer connections are needed.
Lecture 2 discussed how to establish TCP connections in the fragmented
Internet we have today. It started with a review of the TCP service
model, how it's used to establish client-server connections, and
showed some of the factors that affect connection establishment performance.
One of the key factors that affects performance is network latency, and
the number of round trip exchanges between the client and server needed
to establish the connection. With the aid of an example, I tried to
show that latency is often the main limiting factor in performance, not bandwidth.
Consider whether the example used to demonstrate this looks
reasonable to you. Did the results surprise you? Given this
behaviour, would you be willing to pay your ISP for a higher
bandwidth Internet connection?
The lecture then discussed dual-stack connection establishment for
networks that support both IPv4 and IPv6 hosts. As part of this, it
highlighted that the IPv4 and IPv6 networks are separate, and that
parallel connection establishment is needed.
Consider how the complexity of parallel connection establishment
compares to the sequential DNS look-up code shown in lab 1. How would
you implement this parallel connection establishment and racing? Do
you think the complexity is worth the effort to speed up connection establishment?
Finally, the lecture discussed network address translation and NAT
traversal, allowing several hosts to share an IP address. It showed
how NAT devices work, and discussed some of the problems they cause.
Review when NAT works well and when it is problematic. Think
about what types of application NATs break. Given this, why do
people still use NAT devices?
NAT traversal for peer-to-peer applications uses the combination of
binding discovery and the ICE algorithm to establish connections,
using a referral server to exchange addresses then probing to check
if candidate addresses work.
Review this algorithm to determine whether the approach makes sense.
How effectively do you think this approach to NAT traversal works?
How easy do you think it would be to implement?