Networked Systems H (2022-2023)
Lecture 3: Secure Communications
This lecture considers secure communications in the Internet. It
reviews the need for security, and the principles of encryption,
integrity protection, and authentication of messages. It explains
the principles of operation of the Transport Layer Security Protocol
(TLS), version 1.3, and how it protects Internet traffic. And it
briefly reviews some of the issues around writing secure software.
Part 1: Secure Communications
The 1st part of this lecture discusses the need for security in
Internet communications. It reviews why end-to-end encryption and
message integrity protection are essential to protect Internet users
from eavesdropping, identity theft, fraud, and other attacks. And it
discusses some of the tensions and concerns that have been raised
about the provision of such protection.
Slides for part 1
00:00:00.766
In the last lecture, I discussed the behavior of TCP
00:00:04.100
and some issues around connection establishment.
00:00:07.700
One of these issues was the observation
00:00:09.600
that establishing a secure connection, using TLS,
00:00:12.533
was slower than establishing an insecure connection.
00:00:16.100
In this lecture, I want to talk more about TLS
00:00:19.400
and about security in general.
00:00:23.400
In this first part,
00:00:24.400
I'll talk about why security is important,
00:00:26.900
and why we need to secure communications.
00:00:30.200
Then, in part two,
00:00:31.800
I'll talk about the principle of secure communication
00:00:34.833
and the cryptographic techniques
00:00:36.533
that can be used to protect data.
00:00:39.866
Part three of the lecture will describe
00:00:41.600
some of the behavior of the transport layer security
00:00:44.033
protocol, that provides security for most Internet traffic.
00:00:48.100
And, finally, in part four,
00:00:50.033
I'll talk about some general issues around network security,
00:00:53.500
and how to write secure networked applications.
00:00:59.133
So why do we need secure communications?
00:01:03.300
Well, the fundamental problem
00:01:05.566
is that it's possible to eavesdrop on network traffic.
00:01:09.866
This can be done by wiretapping the network links
00:01:12.666
down which the data flows,
00:01:14.366
or it can be done by configuring the network routers
00:01:17.333
to save a copy of the packets they forward.
00:01:20.700
The result is that traffic passing across the network
00:01:23.700
can be monitored by third parties.
00:01:26.666
If you want to ensure that the data you send
00:01:28.866
across the network is private,
00:01:30.566
then that data needs to be encrypted somehow.
00:01:34.700
Similarly, network routers can modify
00:01:37.133
the packets they forward.
00:01:39.566
This means that the router can change the data
00:01:42.000
being delivered without the consent of the sender.
00:01:45.533
The sender cannot stop this happening.
00:01:47.966
But they can add some message integrity protection,
00:01:51.000
such as a digital signature,
00:01:53.100
to allow the receiver to detect and reject
00:01:55.700
messages that have been tampered with.
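As a minimal sketch of how a receiver can detect and reject a modified message, the Python below attaches an integrity tag to each message and checks it on receipt. Note the assumptions: it uses an HMAC (a message authentication code built from a key the sender and receiver already share) rather than a digital signature, and the function names and the key are hypothetical, chosen only for illustration.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = 32  # length in bytes of an HMAC-SHA256 tag


def protect(key: bytes, message: bytes) -> bytes:
    """Return the message with an HMAC-SHA256 tag appended."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return message + tag


def verify(key: bytes, protected: bytes) -> Optional[bytes]:
    """Return the message if its tag is valid, otherwise None."""
    message, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences
    return message if hmac.compare_digest(tag, expected) else None


if __name__ == "__main__":
    key = b"shared-secret-key"  # hypothetical key, agreed in advance
    sent = protect(key, b"pay Alice 10 pounds")
    tampered = sent.replace(b"10", b"99")  # a router changes the data in transit
    print(verify(key, sent))
    print(verify(key, tampered))
```

The router can still change the bytes, but without the key it cannot produce a matching tag, so the receiver rejects the modified message.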
00:01:59.266
Finally, there are numerous devices in the network,
00:02:02.533
known as middleboxes,
00:02:03.933
that try to improve communication
00:02:06.266
by somehow interpreting or modifying the data being sent.
00:02:10.933
For example, we spoke about network address
00:02:13.800
translation in the last lecture
00:02:15.733
where a NAT router rewrites the addresses and ports
00:02:18.566
in TCP/IP headers to allow several machines
00:02:21.800
to share a single IP address.
00:02:25.733
Other examples include network firewalls,
00:02:28.233
that monitor traffic and try and prevent bad traffic
00:02:30.933
from entering a network,
00:02:32.266
as well as the various accelerator devices
00:02:34.666
that try to improve the performance of TCP
00:02:36.800
connections running over satellite links.
00:02:40.300
If not carefully maintained,
00:02:42.033
these devices tend to lead to network ossification,
00:02:45.700
where they tend to limit the ability to
00:02:47.833
change network protocols.
00:02:50.633
A final role of secure communications
00:02:53.633
is therefore to limit the ability of such devices to inspect
00:02:56.600
and act on the traffic,
00:02:58.100
so helping to ensure that the network
00:03:00.266
can continue to evolve.
00:03:05.933
A lot of different organizations monitor the network,
00:03:09.166
for many different reasons.
00:03:13.000
These include governments, intelligence agencies,
00:03:15.600
and law enforcement agencies.
00:03:17.933
For example, the police have to monitor the network
00:03:21.366
as part of their crime prevention activities;
00:03:24.200
domestic intelligence agencies inspect traffic
00:03:26.966
to protect against terrorism, or to monitor foreign targets;
00:03:30.733
and foreign intelligence agencies might try to
00:03:33.533
spy on domestic targets.
00:03:36.800
That this happens shouldn't be a surprise.
00:03:40.666
And there are clearly good reasons for some of this monitoring.
00:03:46.366
Many people would agree, I think,
00:03:48.166
that targeted wiretaps on suspected criminals,
00:03:51.133
subject to appropriate oversight,
00:03:53.233
the need to obtain a warrant of some sort,
00:03:55.500
and when there's probable cause,
00:03:57.633
are probably not unreasonable.
00:04:01.500
Relatively few people would object
00:04:03.800
to actively monitoring the network traffic of those
00:04:06.466
actively suspected of being engaged in serious crimes,
00:04:09.800
terrorist activities, child abuse, and so on.
00:04:14.866
People differ on what crimes they consider serious,
00:04:18.433
or on the standards of probable cause,
00:04:20.966
or on the amount of oversight needed.
00:04:24.000
But all societies accept some degree of monitoring
00:04:26.633
and oversight of network traffic.
00:04:30.566
However, Edward Snowden showed that
00:04:33.833
some intelligence agencies, including,
00:04:36.900
but certainly not limited to, the Five Eyes,
00:04:39.500
the UK, the US, Canada, Australia, and New Zealand,
00:04:43.733
were conducting pervasive monitoring of all network traffic.
00:04:49.400
Other governments are also known to conduct such monitoring.
00:04:53.100
The Great Firewall of China is a common example,
00:04:56.100
along with monitoring by Russia,
00:04:57.966
Iran, Saudi Arabia, and others.
00:05:02.166
Many felt that this indiscriminate monitoring
00:05:04.933
of all network traffic without probable cause or suspicion,
00:05:08.500
was a step too far.
00:05:11.766
In part, I think this came from distrust
00:05:14.433
of those governments, their motives,
00:05:16.366
and how they might use the data.
00:05:19.433
The people they were supposed to represent were unconvinced
00:05:22.200
that the monitoring was actually doing them good.
00:05:25.833
But, in part, there was also the realization
00:05:28.833
that if supposedly friendly governments
00:05:30.933
were monitoring traffic indiscriminately,
00:05:33.266
then so were others.
00:05:36.166
Even if I completely trust our government
00:05:39.033
to monitor Internet traffic only for good reasons,
00:05:41.833
the fact that they're able to monitor that traffic
00:05:45.133
means that others are able to do so too.
00:05:48.266
And those others might not have my best interests at heart.
00:05:53.633
This led to a push to enable pervasive encryption,
00:05:56.766
to encrypt more and more of the traffic
00:05:58.933
crossing the Internet.
00:06:01.333
The most visible manifestation of this
00:06:03.700
is that most websites now use HTTPS
00:06:06.433
and encrypt their traffic.
00:06:07.933
But the spread of encryption has been wider than the web.
00:06:12.400
The result is that most Internet traffic
00:06:14.800
is now encrypted by default,
00:06:16.700
hindering, but not preventing, pervasive monitoring.
00:06:23.200
Governments are not the only organizations
00:06:25.833
to monitor network traffic, of course.
00:06:29.100
We've all contacted a business and been told that our
00:06:31.833
call may be monitored for quality and training purposes.
00:06:36.733
Some of this monitoring by businesses is necessary
00:06:39.500
for regulatory compliance.
00:06:42.133
Banking and insurance industries, for example,
00:06:44.833
require records to be kept in most cases, to prevent fraud.
00:06:49.433
There are good reasons for some of this monitoring.
00:06:53.833
Other aspects of monitoring and tracking by
00:06:56.100
businesses are perhaps less beneficial.
00:06:59.333
Targeted advertising and customer profiling are
00:07:02.300
frequently cited as problematic, for example.
00:07:06.300
Communication security measures, such as encryption,
00:07:09.600
can help reduce such unwanted monitoring,
00:07:13.800
though the effect is small, since this type of
00:07:16.066
monitoring and tracking is often delivered
00:07:18.133
by the sites we intentionally visit,
00:07:20.233
rather than by snooping on communications.
00:07:27.300
We also see network operators
00:07:29.733
monitoring traffic on the networks they operate.
00:07:33.400
Again, there are both beneficial,
00:07:35.800
and problematic, reasons for this.
00:07:39.366
Network operators monitor traffic
00:07:41.533
to understand how well their networks are operating,
00:07:44.333
and whether they're meeting their quality of service goals.
00:07:48.800
It's common, for example,
00:07:50.533
for network operators to inspect
00:07:52.400
the sequence and acknowledgement numbers
00:07:54.433
in the headers of TCP packets traversing their networks.
00:07:59.000
This lets them understand if packets are being lost,
00:08:01.766
or if the time taken for packets to traverse
00:08:04.333
the network is building up,
00:08:06.400
both of which are signs that the network
00:08:08.166
is becoming overloaded.
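To illustrate how sequence numbers can reveal loss, here is a rough Python sketch of what a monitor might do with the segments it sees on one direction of a TCP flow. It treats any segment that carries no new data as a probable retransmission. The function name and the input format are hypothetical, and the sketch assumes sequence numbers don't wrap around and that packets aren't reordered, which a real monitor would have to handle.

```python
def count_retransmissions(segments):
    """Count observed segments that carry no new data.

    segments: iterable of (sequence_number, payload_length) pairs, in the
    order a hypothetical monitor saw them on one direction of a TCP flow.
    """
    highest_end = None  # one past the highest byte seen so far
    retransmissions = 0
    for seq, length in segments:
        if length == 0:
            continue  # pure acknowledgements carry no data, so skip them
        end = seq + length
        if highest_end is not None and end <= highest_end:
            # Every byte in this segment was already seen: likely a
            # retransmission, which suggests an earlier packet was lost.
            retransmissions += 1
        else:
            highest_end = end
    return retransmissions


if __name__ == "__main__":
    observed = [(1000, 100), (1100, 100), (1000, 100), (1200, 100)]
    print(count_retransmissions(observed))
```

A rising retransmission count over time is one signal of the kind described above that the path is becoming overloaded.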
00:08:11.166
This helps the operators decide when to reroute traffic
00:08:14.366
onto less busy paths, or when to install
00:08:17.066
more network capacity to keep good performance.
00:08:20.566
And few would argue that this sort of
00:08:22.733
monitoring is a problem.
00:08:26.066
On the other hand, operators can monitor traffic
00:08:29.200
to profile what sites their customers are visiting.
00:08:32.600
This information could then be sold to advertisers,
00:08:35.533
or could be used to negatively influence
00:08:37.900
the performance of the traffic.
00:08:40.500
For example, an operator might choose to lower the
00:08:43.100
priority of Netflix traffic
00:08:44.666
for customers who haven't signed up
00:08:46.133
to their video streaming package.
00:08:49.433
Many people are less comfortable with such behaviors,
00:08:52.566
and communication security measures can limit
00:08:55.066
their effectiveness.
00:08:59.300
Finally, of course, there are criminals and malicious users
00:09:02.666
that try to steal data and user credentials,
00:09:05.566
that try to perform identity theft,
00:09:07.600
or conduct other attacks.
00:09:10.533
Communication security clearly cannot prevent
00:09:13.366
all such attacks, but it can limit their scope
00:09:16.800
by limiting the amount of information that's available
00:09:19.466
and visible to those monitoring the networks.
00:09:26.366
As a result of these various attacks,
00:09:28.366
there are a range of measures that can be deployed
00:09:30.666
that can help to protect
00:09:31.800
privacy by encrypting network traffic.
00:09:35.500
Unfortunately, what makes this problem space challenging,
00:09:39.333
is that the mechanisms used to protect
00:09:41.433
against malicious attacks also prevent benign monitoring.
00:09:46.700
There's no known way to stop criminals
00:09:49.066
and malicious attackers from accessing private data
00:09:52.066
that doesn't also stop legitimate law enforcement
00:09:55.100
from doing so, for example.
00:10:01.533
In addition to monitoring and observing data
00:10:03.666
as it traverses the network,
00:10:05.266
many organizations might also try to modify messages.
00:10:10.266
Governments and law enforcement, for example,
00:10:13.266
might require ISPs to censor,
00:10:15.466
or modify, DNS responses
00:10:17.300
to restrict access to certain sites.
00:10:20.133
They might require DNS responses to be modified
00:10:23.000
to indicate that certain sites don't exist,
00:10:25.633
or to change the address in the DNS response
00:10:28.366
to direct users to a page indicating that the
00:10:30.700
content is blocked.
00:10:33.700
Alternatively, governments might require ISPs
00:10:36.166
and network operators to block or rewrite traffic
00:10:39.266
containing certain content.
00:10:43.166
As with government traffic monitoring,
00:10:45.266
there can be reasonable, and unreasonable,
00:10:47.566
reasons for governments to modify messages.
00:10:51.833
Many countries have widely accepted laws
00:10:54.466
about restricting hate speech,
00:10:56.366
blocking child pornography,
00:10:58.166
or preventing terrorism.
00:11:01.033
Part of the implementation of such laws
00:11:03.433
is often by modifying DNS responses
00:11:06.033
to limit access to certain sites.
00:11:09.966
The same techniques can, of course,
00:11:12.400
also be used to block other types of content,
00:11:15.166
or restrict other kinds of speech.
00:11:19.833
Businesses and network operators might also block
00:11:22.700
or modify content.
00:11:24.700
The DNS server in a cafe, or a train,
00:11:27.633
that redirects you to a sign up page,
00:11:29.400
and asks for payment before letting you browse the web
00:11:32.300
on their Wi-Fi is an example.
00:11:35.400
Other examples might be services that filter spam
00:11:38.266
or block malicious attachments,
00:11:40.200
that enforce terms of service,
00:11:42.166
or that try to prevent copyright infringement.
00:11:46.666
And finally, of course, there are criminals,
00:11:48.666
and malicious users,
00:11:50.166
people modifying content to conduct phishing scams,
00:11:53.100
steal identity, mislead, and defraud.
00:11:57.800
And, again, what makes this problem space challenging
00:12:01.433
is that mechanisms that protect message integrity
00:12:03.966
against malicious attackers
00:12:05.900
also prevent benign modification.
00:12:10.400
For example, a recent development
00:12:12.833
in network security is DNS over HTTPS.
00:12:17.300
This is an approach to encrypting DNS traffic
00:12:20.233
that was designed to protect users from phishing attacks
00:12:23.500
where an attacker on the local network
00:12:25.566
spoofs DNS responses to perform identity theft.
00:12:29.633
It does this successfully.
00:12:32.900
Unfortunately, some Internet service providers in the UK
00:12:37.333
intentionally spoofed DNS responses
00:12:40.766
to block access to sites hosting child abuse material,
00:12:44.100
as part of a government-mandated blocklist.
00:12:49.200
Encrypting DNS traffic using DNS over HTTPS
00:12:53.766
to protect against identity theft
00:12:57.733
unintentionally also prevented
00:13:00.333
the child abuse block list from working,
00:13:02.300
since both relied on the same vulnerability in DNS.
00:13:07.433
And again, this is an area where there are difficult questions,
00:13:10.933
and it's not clear we have all the right answers.
00:13:18.266
The final reason for securing communications
00:13:20.900
relates to protocol ossification.
00:13:24.233
It's common for network operators to deploy middleboxes,
00:13:27.300
of various sorts, to monitor and modify traffic.
00:13:32.600
These can be devices such as NATs and firewalls,
00:13:35.400
traffic shapers, filters, or protocol accelerators.
00:13:39.666
And these middleboxes need to understand the traffic
00:13:42.566
they're observing or modifying.
00:13:45.166
For example, in order to translate IP addresses and ports,
00:13:48.966
a NAT needs to know the format of an IP packet,
00:13:52.500
and where the ports are located in the TCP and UDP header.
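To make concrete what it means for a device to know the packet format, the Python sketch below pulls the addresses and ports out of a raw IPv4 packet carrying TCP or UDP. The function name is hypothetical and the sketch ignores IPv6, fragmentation, and checksums, all of which a real NAT has to deal with. The field offsets are those of the standard IPv4 header, and both TCP and UDP place the source and destination ports in the first four bytes of their headers.

```python
import socket
import struct


def parse_addresses_and_ports(packet: bytes):
    """Return (src_ip, dst_ip, src_port, dst_port) for an IPv4 TCP/UDP packet."""
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (version_ihl & 0x0F) * 4  # IHL field counts 32-bit words
    protocol = packet[9]
    if protocol not in (6, 17):  # 6 = TCP, 17 = UDP
        raise ValueError("not a TCP or UDP packet")
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    # The transport header starts immediately after the IP header.
    src_port, dst_port = struct.unpack("!HH", packet[header_len:header_len + 4])
    return src_ip, dst_ip, src_port, dst_port
```

Every offset in this code is an assumption about the protocol baked into the device. If the endpoints changed the header layout, a middlebox running code like this would misread or drop the traffic.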
00:13:57.500
Equally, a traffic shaping device,
00:14:00.100
intended to limit the throughput of TCP connections
00:14:03.000
for a particular user,
00:14:04.366
needs to understand the congestion control
00:14:06.933
algorithm used by TCP,
00:14:08.800
otherwise how can it influence
00:14:10.766
the sending rate of a connection?
00:14:14.700
This means that the network becomes more complex.
00:14:18.566
It means that devices in the network no longer just look at
00:14:21.766
the IP headers and forward the packets
00:14:23.800
based on the destination address.
00:14:26.066
They also understand details of TCP and UDP,
00:14:29.866
and other protocols,
00:14:31.433
and observe, inspect, and modify those protocols too.
00:14:36.300
And this leads to a problem known as protocol ossification,
00:14:40.566
where it becomes difficult to change the protocols
00:14:43.766
running between the endpoints,
00:14:45.533
because doing so interacts poorly with middleboxes
00:14:48.333
that don't understand the new version of the protocol.
00:14:52.400
For example, it'd be very difficult to change the format
00:14:55.500
of the TCP header now, even if we could
00:14:58.566
upgrade all the systems to support the new version,
00:15:01.400
because of all the NATs and firewalls
00:15:03.833
that would also need updating.
00:15:07.300
This protocol ossification,
00:15:09.466
where the network learns about the transport
00:15:11.766
and higher layer protocols,
00:15:13.366
effectively prevents those protocols from being upgraded,
00:15:16.733
and occurs because the network has visibility
00:15:19.466
into those protocols.
00:15:22.933
Encryption offers one way to prevent ossification.
00:15:27.700
The more of a protocol that's encrypted,
00:15:30.033
the easier it is to change that protocol,
00:15:32.566
since the encryption will have stopped middleboxes
00:15:35.166
from understanding or modifying the data.
00:15:39.200
There's a trade-off, though,
00:15:40.933
between the ability to change end-to-end protocols
00:15:43.733
and the ability of the network to offer helpful features.
00:15:47.866
The more of a protocol that's encrypted,
00:15:50.200
the easier it is to change the protocol.
00:15:53.300
But the harder it is for middleboxes
00:15:55.400
to provide help from the network.
00:15:58.733
The draft shown on the slide,
00:16:00.966
on "Long-term viability of protocol extension mechanisms",
00:16:04.266
talks about these issues further,
00:16:06.100
and talks about how to extend and modify protocols
00:16:08.866
and ensure that protocols remain changeable.
00:16:11.466
It's very much worth reading.
00:16:18.366
As we've seen there are good reasons to encrypt
00:16:21.900
and authenticate data.
00:16:24.533
Doing so helps to provide privacy,
00:16:26.733
it helps to prevent fraud,
00:16:28.366
and it helps to allow protocols to evolve
00:16:30.600
while avoiding network ossification.
00:16:34.033
Providing security in this way is a good thing,
00:16:36.733
but there are always trade-offs,
00:16:38.300
and I've tried to highlight some of these.
00:16:41.433
In particular, it's always possible to find examples
00:16:45.033
where providing security to protect against some attacker
00:16:48.633
will prevent some beneficial monitoring or service.
00:16:53.533
There are no easy solutions here.
00:16:58.500
It's easy to argue that we must encrypt everything
00:17:01.700
to ensure privacy,
00:17:03.100
missing that this causes some real problems.
00:17:07.433
Equally, it's easy to argue that law enforcement
00:17:10.766
should have exceptional access to communications,
00:17:13.533
to help prevent terrorism and child abuse, for example,
00:17:16.833
missing that there are very real risks that this will cause
00:17:20.500
serious other problems.
00:17:24.466
We need more dialogue between engineers,
00:17:27.400
protocol designers, network operators,
00:17:30.100
policymakers, and law enforcement,
00:17:32.566
to better understand the constraints and the concerns.
00:17:38.200
The "Keys Under Doormats" paper, linked from the slide,
00:17:41.233
talks about these issues in more detail,
00:17:43.433
and I very much encourage you to read it.
00:17:48.933
Finally, as more and more data is encrypted and protected,
00:17:52.900
we're also starting to see increasing discussion
00:17:55.700
of end system based content monitoring.
00:17:59.866
The argument here is that encryption is important
00:18:02.733
to prevent attacks by malicious users,
00:18:05.400
but that law enforcement need access to protect us.
00:18:09.000
But, since effective encryption prevents law enforcement
00:18:12.466
from monitoring traffic on the network,
00:18:14.400
then maybe they should be able to monitor the traffic
00:18:16.833
on the end systems, after it's traversed the network.
00:18:20.866
And there's a certain appeal to this.
00:18:24.933
If done correctly, the encryption provides
00:18:27.833
protection against a large class of attacks,
00:18:30.500
and correct implementation of end-system based monitoring
00:18:34.033
limits who can monitor traffic
00:18:35.633
to those with legitimate needs and legitimate authority.
00:18:40.200
And, in some cases that's an appropriate compromise.
00:18:45.100
It doesn't seem problematic for social networks
00:18:48.033
like Facebook, for example,
00:18:49.566
to support law enforcement in monitoring their network
00:18:52.933
to detect people sharing child abuse material.
00:18:56.833
But,
00:18:58.633
as Apple found out when they announced that they were
00:19:00.900
to implement similar monitoring running on iPhones
00:19:03.533
for one-to-one and group iMessage chats,
00:19:07.433
the expectations around privacy,
00:19:09.800
law enforcement access, and abuse protection,
00:19:12.633
vary very much between social networks,
00:19:15.566
one-to-one communications,
00:19:17.466
group communications, and public posts.
00:19:20.366
And the boundaries between these categories,
00:19:22.866
and what's acceptable in terms of monitoring
00:19:25.466
and protection and privacy,
00:19:27.400
can be very hard to distinguish.
00:19:31.333
And again, there are some difficult questions
00:19:33.600
relating to what type of privacy protection
00:19:36.133
and what type of monitoring is technically
00:19:38.500
possible to implement on end-systems,
00:19:40.933
and what's socially acceptable,
00:19:43.033
and what's desirable.
00:19:46.100
And the paper on the slide,
00:19:48.600
"Bugs in our pockets",
00:19:49.700
talks about this issue in a lot more detail.
00:19:56.133
So that wraps up the discussion of why
00:19:58.766
secure communication is needed.
00:20:02.166
Network traffic is frequently monitored
00:20:04.866
by governments, businesses,
00:20:07.166
network operators, and malicious users.
00:20:10.466
Some of this monitoring is beneficial,
00:20:13.100
some of it less so.
00:20:15.966
In the following parts, I'll talk about
00:20:18.200
the technologies we can use to provide privacy,
00:20:21.433
to protect message integrity,
00:20:23.266
and to prevent protocol ossification.
Part 2: Principles of Secure Communication
The 2nd part of the lecture reviews the principles of secure
communication. It describes the concepts behind symmetric, public-key,
and hybrid cryptography. It outlines techniques for message integrity
protection and authentication including cryptographic hash functions
and digital signatures. And it reviews the need for a public key
infrastructure.
Slides for part 2
00:00:00.233
In this part, I want to talk
00:00:02.200
about some of the principles of secure
00:00:04.366
communication. I’ll talk about how we go
00:00:06.566
about ensuring confidentiality of messages as they
00:00:08.733
traverse the network.
00:00:09.766
About how we authenticate messages to ensure
00:00:12.500
that they're not modified in transit,
00:00:14.800
and about how we can go about
00:00:17.500
validating the identity of the participants in
00:00:20.233
a communication.
00:00:21.100
So what are the goals of secure communication?
00:00:24.933
Well, we're trying to deliver a message
00:00:27.066
across the internet from a sender to a receiver.
00:00:30.600
In the process we want to avoid
00:00:32.833
eavesdropping on the message – we need
00:00:35.033
to encrypt it in order to provide
00:00:37.266
confidentiality, to make sure no one other
00:00:39.500
than the intended receiver can have access
00:00:41.700
to the content of the message.
00:00:43.700
We want to avoid tampering with the
00:00:45.666
message – we need to authenticate the
00:00:47.600
message to ensure that it's not modified
00:00:49.533
in transit by any of the devices
00:00:51.466
which are involved in the
00:00:53.433
delivery of that message.
00:00:54.633
And we want to avoid spoofing –
00:00:57.233
we want to somehow validate the identity
00:00:59.800
of the sender, so that the receiver
00:01:01.733
knows, and can be sure of who the message came from.
00:01:07.000
So how do we go about providing confidentiality?
00:01:10.300
Well, unfortunately, data traversing the network can
00:01:13.033
be read by any of the devices
00:01:15.033
on the path between the sender and the receiver.
00:01:17.566
It's possible to eavesdrop on packets as
00:01:19.266
they traverse the links that comprise the
00:01:21.000
network. And it's also possible to configure
00:01:23.066
the switches or routers to snoop on
00:01:25.166
the data as they're forwarding it between
00:01:27.266
the different links in the network.
00:01:29.333
The network operator can always do this.
00:01:32.366
They own the network;
00:01:33.933
they can configure the devices to save
00:01:36.000
a copy of the data if they choose to do so.
00:01:38.600
If the network's been compromised, maybe so can others.
00:01:42.333
If an attacker can break
00:01:43.800
into the routers, for example, there's nothing
00:01:46.500
stopping them saving the data, redirecting copies
00:01:49.200
of data traversing the network to some other location.
00:01:52.800
If the data can always be read,
00:01:55.366
how do we provide confidentiality?
00:01:57.300
Well, we use encryption to make sure
00:01:59.400
that the data is useless if it's
00:02:01.500
intercepted or copied. We can't stop an
00:02:03.600
attacker, or the network operator, from reading
00:02:05.700
our data. But we can make sure
00:02:07.333
that they can't make sense of it
00:02:09.166
if they do read it.
00:02:11.500
There are two basic approaches to providing encryption.
00:02:15.233
The first is called symmetric cryptography.
00:02:18.066
Algorithms such as the Advanced Encryption Standard, AES.
00:02:22.133
The other approach is what's known as
00:02:24.466
public key cryptography.
00:02:25.700
Algorithms such as the
00:02:27.033
Diffie-Hellman algorithm, the RSA algorithm, and elliptic
00:02:30.100
curve algorithms.
00:02:31.700
They have quite different properties and are
00:02:34.200
used in different situations. I’ll talk about
00:02:36.700
the details and the differences between them in a minute.
00:02:40.366
Both of them are based on some
00:02:42.666
fairly complex mathematics. I'm not going to
00:02:44.933
attempt to describe how that works.
00:02:47.066
What's important is not the details of
00:02:49.133
the maths. But what are their properties,
00:02:51.433
what behaviours do they provide, and how
00:02:53.300
do they help us secure data as it traverses the network?
00:02:57.366
So we’ll start with the idea of symmetric cryptography.
00:03:01.300
The idea of symmetric encryption is that
00:03:03.566
it can convert plain text into cipher
00:03:05.833
text with the aid of a key.
00:03:08.700
If you have, for example, the plain
00:03:10.433
text as we see on the top-right
00:03:12.666
of the slide, and we pass it
00:03:14.933
through the encryption algorithm, in this case,
00:03:17.166
the Advanced Encryption Standard (AES), with the
00:03:19.400
aid of an encryption key, we get
00:03:21.633
a blob of encrypted text as we
00:03:23.900
see it in the middle.
00:03:25.600
If we pass that encrypted text through
00:03:28.700
the inverse algorithm, the decryption algorithm,
00:03:31.333
using the same key, then we get
00:03:34.433
the original text back out.
00:03:36.766
The point is that a single secret
00:03:39.200
key controls both the encryption and the
00:03:41.666
decryption process. The key used to encrypt
00:03:44.100
is the same as the key used
00:03:46.566
to decrypt.
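The defining property here, that one shared key both encrypts and decrypts, can be shown with a small Python sketch. Be aware that this is not AES: the Python standard library has no AES implementation, so the sketch substitutes a toy stream cipher that derives a keystream from SHA-256 and XORs it with the data. The function names, key, and nonce are hypothetical, and the construction is for illustration only and should not be used to protect real data.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key and nonce (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        out += block
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XORing with the same keystream twice restores the data."""
    stream = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


if __name__ == "__main__":
    key = b"a secret shared key"   # hypothetical; must be known to both ends
    nonce = b"message-0001"        # hypothetical; must differ for every message
    ciphertext = xor_cipher(key, nonce, b"hello, world")
    print(ciphertext.hex())
    print(xor_cipher(key, nonce, ciphertext))  # same key recovers the plain text
```

Anyone holding the key can run the same function to recover the plain text, which is exactly why the key has to stay secret.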
00:03:47.366
Now, provided the key is kept secret,
00:03:49.900
and it's known only to the sender
00:03:52.433
and receiver, this can be very secure,
00:03:54.933
and it can be very fast.
00:03:57.200
Symmetric algorithms such as AES can encrypt
00:04:00.433
and decrypt many gigabits per second.
00:04:03.200
This makes them very suitable for Internet
00:04:06.433
communications because they don't slow down the
00:04:09.666
communications, while still providing security.
00:04:12.100
There are a wide range of different
00:04:15.333
symmetric encryption algorithms; probably the most widely
00:04:18.566
used is the US Advanced Encryption Standard, AES.
00:04:22.600
The AES algorithm was developed as part
00:04:24.933
of the output of an open competition,
00:04:27.533
run by the US National Institute of
00:04:30.100
Standards and Technology, and it's actually a Belgian algorithm
00:04:32.700
known as Rijndael.
00:04:33.900
Importantly, the AES algorithm, the Rijndael algorithm,
00:04:36.700
is public and the security of the
00:04:39.533
algorithm depends only on keeping the key
00:04:42.333
secret, not on keeping the algorithm itself secret.
00:04:45.966
The link on the slide is a
00:04:47.966
pointer to the specification for the algorithm,
00:04:50.266
and there’s a large amount of open
00:04:52.566
source code which implements it.
00:04:54.333
The problem with symmetric cryptography is that
00:04:56.900
you need to keep the key secret.
00:04:59.500
If anyone other than the sender and
00:05:02.066
the receiver know the key, then the
00:05:04.666
security of the encryption fails.
00:05:06.600
The question then, is how do you
00:05:09.100
securely distribute the key? If I want
00:05:11.600
to exchange a secure message with
00:05:14.100
someone I know well, then this is
00:05:16.600
straightforward. I can meet them in person,
00:05:19.100
give them the key, and ensure that
00:05:21.600
no one else can eavesdrop on that communication.
00:05:24.833
The problem comes when I'm trying to
00:05:26.700
communicate securely with someone where I can't
00:05:28.833
meet them in person.
00:05:30.166
How do I securely get a key
00:05:32.400
from an Internet shopping site, for example?
00:05:34.666
The only means of communication I have
00:05:36.900
is over the Internet. And if I
00:05:39.166
send the key over the Internet,
00:05:41.066
someone can eavesdrop on the key,
00:05:43.000
and that gives them the ability to
00:05:45.266
decrypt our communications and breaks the security.
00:05:47.600
The solution to this is an approach
00:05:50.466
known as public key cryptography.
00:05:52.600
Public key cryptography, like symmetric cryptography,
00:05:54.833
is used to convert a plain text
00:05:57.466
message into an encrypted form. The difference,
00:06:00.100
though, is that there are two different
00:06:02.733
keys: the key used to encrypt
00:06:05.333
the message and the key used to decrypt
00:06:07.966
the message are different.
00:06:09.566
The keys come in pairs. The two
00:06:11.633
halves of the pair are known as
00:06:13.733
the public key and the private key.
00:06:15.900
Importantly, a message which is encrypted using
00:06:18.466
one of those keys can only be
00:06:21.000
decrypted using the other key. If the
00:06:23.566
message is encrypted with the public key,
00:06:26.133
for example, then only the private key
00:06:28.666
can decrypt that message.
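The key-pair property can be seen in a toy version of RSA, sketched below in Python. The primes are deliberately tiny so the numbers are easy to follow, there is no padding, and the function names are hypothetical. Real RSA keys are thousands of bits long and use padding schemes, so treat this as an illustration of the idea rather than a usable implementation. It needs Python 3.8 or later for the modular inverse.

```python
def make_keypair(p: int, q: int, e: int):
    """Build a toy RSA key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # private exponent: inverse of e modulo phi
    return (e, n), (d, n)  # (public key, private key)


def apply_key(key, value: int) -> int:
    """Encrypt or decrypt an integer smaller than the modulus with one key."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


if __name__ == "__main__":
    public, private = make_keypair(61, 53, 17)  # toy primes, far too small for real use
    message = 65
    ciphertext = apply_key(public, message)      # anyone can do this
    print(ciphertext)
    print(apply_key(private, ciphertext))        # only the private key undoes it
```

Applying the public key a second time does not recover the message. Only the matching private key does, which is what lets the public key be published freely.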
00:06:30.233
As you might expect from the names,
00:06:32.400
the idea is that you keep the
00:06:34.566
private key from the key pair secret,
00:06:36.766
and you make the public key as
00:06:38.933
public as is possible.
00:06:40.266
You publish it in the phone book,
00:06:42.200
you put it on your webpage,
00:06:43.866
you write it on your business card,
00:06:45.833
and you make sure everybody knows that
00:06:47.766
this is your public key.
00:06:49.266
In order to send you a message,
00:06:51.566
someone looks up your public key and
00:06:53.866
uses that to encrypt the message.
00:06:55.933
Once the message has been encrypted using
00:06:58.333
a particular public key, the only thing
00:07:00.733
which can decrypt it is the corresponding
00:07:03.166
private key. And since the private key
00:07:05.566
has been kept private, you're the only
00:07:07.966
one who can receive the message.
00:07:10.133
This solves the key distribution problem.
00:07:12.500
Provided you can look up the appropriate
00:07:15.266
public key for the receiver in a directory,
00:07:19.066
and you can trust that the receiver
00:07:20.633
has kept their private key secret,
00:07:22.433
then you use their public key to
00:07:24.533
encrypt the message, and you know that
00:07:26.600
they're the only one who can decrypt it.
00:07:29.433
This allows Internet shopping sites, and the
00:07:31.633
like, to work. If I wish to
00:07:33.266
buy something from Amazon, I look up
00:07:35.333
the key for Amazon in a directory,
00:07:37.433
use that to encrypt the message I'm
00:07:39.500
sending to Amazon, and I know that
00:07:41.600
they're the only ones that can decrypt it.
00:07:44.266
The problem with public key cryptography is
00:07:46.833
that it’s very slow. The public key
00:07:49.600
algorithms such as the Diffie-Hellman algorithm,
00:07:52.000
the RSA algorithm,
00:07:53.266
and the elliptic curve algorithms, work millions
00:07:56.300
of times slower than symmetric encryption algorithms.
00:07:59.333
The result is that they’re too slow
00:08:02.366
to use for any realistic amount of
00:08:05.366
communication. The performance just isn't there.
00:08:08.066
Accordingly, modern communications use what's known as
00:08:11.433
hybrid cryptography, where they use a combination
00:08:14.800
of both public key and symmetric cryptography.
00:08:18.266
This provides both security and speed.
00:08:21.866
The way this works is that the
00:08:24.666
sender and receiver use public key cryptography,
00:08:27.466
which is very slow, to exchange a
00:08:30.266
small amount of information.
00:08:31.966
That information is then used as the
00:08:34.633
key for the symmetric encryption algorithm,
00:08:36.866
which is very fast.
00:08:38.500
In detail, the sender chooses a random
00:08:41.133
value, that we’ll call Ks, which will
00:08:43.733
be used as the key for the symmetric encryption.
00:08:47.233
The sender then looks up the receiver’s
00:08:49.933
public key, Kpub, uses it to encrypt
00:08:52.600
Ks and sends the result to the receiver.
00:08:56.066
The receiver uses its corresponding private key,
00:08:59.133
Kpriv, to decrypt the message and retrieve Ks.
00:09:03.200
This securely transfers Ks, the key for
00:09:07.000
the symmetric encryption algorithm, from the sender
00:09:10.300
to the receiver.
00:09:11.933
Doing this using public key encryption is
00:09:14.466
very slow, but the key for the
00:09:16.966
symmetric encryption, Ks, is very small,
00:09:19.100
so the fact it's very slow doesn't matter.
00:09:22.266
The sender then uses that key,
00:09:24.866
Ks, to encrypt future messages using symmetric
00:09:28.133
cryptography, for example, using the AES algorithm.
00:09:31.466
The receiver also has Ks, which it
00:09:34.100
exchanged using the public key encryption,
00:09:36.333
and can use that to decrypt the messages.
00:09:39.733
Symmetric cryptography is very fast, so the
00:09:42.400
performance of the communication, once it's got
00:09:45.400
started, is very quick, but it requires
00:09:48.400
the key to be exchanged securely.
00:09:50.966
The public key algorithm, which is slow,
00:09:53.933
is used to securely exchange the key.
00:09:57.033
The result is something which achieves both
00:10:01.266
confidentiality, and solves the key distribution problem,
00:10:05.533
and also achieves good performance.
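The hybrid exchange described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the RSA key pair is a toy built from two well-known Mersenne primes with no padding, and `symmetric_xor` is a hypothetical stand-in for AES that XORs the data with a SHA-256 based keystream. Neither is secure, and all names are invented for this example.

```python
import hashlib
import secrets

# Toy RSA key pair built from two known Mersenne primes. Illustration only:
# real keys use randomly generated primes and padded encryption.
P, Q = 2**127 - 1, 2**521 - 1
N = P * Q                              # public modulus (part of Kpub)
E = 65537                              # public exponent (part of Kpub)
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Kpriv)


def symmetric_xor(key: bytes, data: bytes) -> bytes:
    """Toy stand-in for AES: XOR the data with a SHA-256 based keystream.

    Applying it twice with the same key returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


# Sender: choose a random Ks, encrypt it with Kpub, then encrypt the
# bulk data with Ks using the fast symmetric algorithm.
message = b"One copy of the course textbook, please. " * 10
ks = secrets.token_bytes(16)
wrapped_ks = pow(int.from_bytes(ks, "big"), E, N)   # slow, but Ks is tiny
ciphertext = symmetric_xor(ks, message)             # fast, any length

# Receiver: recover Ks with Kpriv, then decrypt the bulk data.
recovered_ks = pow(wrapped_ks, D, N).to_bytes(16, "big")
plaintext = symmetric_xor(recovered_ks, ciphertext)
print(plaintext == message)
```

The slow public key operation touches only the 16-byte key, while the message itself, however long, only ever passes through the fast symmetric function.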
00:10:08.666
Encryption gives you confidentiality of data and
00:10:10.833
makes sure that no one can eavesdrop
00:10:13.000
on the messages being sent from the
00:10:15.200
sender to the receiver.
00:10:16.533
We also, though, need to verify the
00:10:18.866
identity of the sender, and make sure
00:10:21.166
that messages haven't been modified in transit.
00:10:23.600
In order to do this, we generate
00:10:26.033
a digital signature to authenticate our messages.
00:10:28.466
And the receiver can then validate that
00:10:30.900
signature, check the signature, to make sure
00:10:33.300
they came from the expected sender.
00:10:35.500
The digital signature relies on a combination
00:10:39.400
of public key cryptography,
00:10:41.066
and a cryptographic hash algorithm.
00:10:44.366
So first of all, what is a cryptographic hash?
00:10:47.966
A cryptographic hash function is a function
00:10:50.733
that takes some arbitrary length input and
00:10:53.533
produces a fixed length output hash that
00:10:56.300
somehow represents that input.
00:10:58.000
For example, at the top of the
00:11:00.466
slide, we see some input text going
00:11:02.933
through a hash algorithm, known as SHA256,
00:11:05.400
that produces the fixed length output block
00:11:07.866
you see on the right.
00:11:09.766
A cryptographic hash algorithm has four fundamental
00:11:12.533
properties. The first is that every input
00:11:15.300
will generate a different output, and the
00:11:18.100
slightest change to the input will change
00:11:20.866
the output value.
00:11:22.166
The second is that it should be
00:11:24.466
infeasible to find two inputs
00:11:26.733
that give the same output.
00:11:28.466
The third is that calculating the hash
00:11:30.800
itself should be fast, and going from
00:11:33.100
input to output should happen very quickly.
00:11:35.533
And the fourth, and perhaps most important,
00:11:37.800
is that reversing a hash should be
00:11:40.100
infeasible. If you're only given the output,
00:11:42.400
there should be no way of finding
00:11:44.666
out what the input was.
00:11:46.400
A cryptographic hash therefore acts as a
00:11:49.200
unique fingerprint for the input data.
00:11:51.600
It provides a short output, that uniquely
00:11:54.400
identifies a given message.
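The fingerprint behaviour is easy to try out with the `hashlib` module from the Python standard library. The input strings below are arbitrary examples chosen for this sketch.

```python
import hashlib

# Two inputs that differ in a single character.
a = hashlib.sha256(b"The quick brown fox jumps over the lazy dog").hexdigest()
b = hashlib.sha256(b"The quick brown fox jumps over the lazy cog").hexdigest()

# A much longer input still produces an output of the same length.
long_digest = hashlib.sha256(b"x" * 1_000_000).hexdigest()

print(a)
print(b)
print(len(a), len(b), len(long_digest))   # always 64 hex digits (256 bits)

# Count how many of the 64 hex digits differ between the two short inputs.
differing = sum(x != y for x, y in zip(a, b))
print(differing)
```

Changing one letter of the input changes most of the digits of the output, while the output length stays fixed no matter how long the input is.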
00:11:56.100
There are many different cryptographic hash algorithms.
00:11:59.800
The current recommendation is the SHA256 algorithm,
00:12:03.500
specified by the IETF in RFC 6234.
00:12:07.300
There are a number of older algorithms,
00:12:10.066
such as MD5 and SHA1, which you may
00:12:12.866
hear about, but these all have known
00:12:15.666
security flaws and are not recommended for use.
00:12:19.466
So how can we use a cryptographic
00:12:21.333
hash to help build a digital signature?
00:12:23.800
Well, in order to do that,
00:12:25.933
you take the message you wish to
00:12:28.400
send, and you calculate a cryptographic hash
00:12:30.900
of that message.
00:12:32.066
The sender then encrypts that hash with
00:12:34.300
their private key. Now the private key
00:12:36.533
is known only to the sender,
00:12:38.433
so they're the only one who can
00:12:40.633
encrypt that message.
00:12:41.700
But the thing which would decrypt it
00:12:44.133
is the sender’s public key, which is
00:12:46.566
available to everybody. Encrypting the hash with
00:12:48.966
the sender’s private key doesn't provide any
00:12:51.400
confidentiality, because anyone can decrypt the message
00:12:53.833
using the public key.
00:12:55.333
What it does do though, provided the
00:12:57.633
sender can be trusted to keep its
00:12:59.966
private key private, is demonstrate that the
00:13:02.266
sender must have encrypted the hash.
00:13:04.266
Since the hash is a fingerprint of
00:13:06.566
the message, this means that the sender
00:13:08.900
must have generated the original message.
00:13:10.966
The sender then attaches the encrypted hash
00:13:14.033
to the message, forming the digital signature.
00:13:17.200
The message, and its digital signature,
00:13:19.833
are then encrypted and sent to the
00:13:22.933
receiver using hybrid encryption.
00:13:24.766
When the message arrives at the receiver,
00:13:27.466
the receiver can verify the signature.
00:13:29.866
To do this, it first decrypts
00:13:32.566
the message and its digital signature.
00:13:34.900
The receiver then takes the message itself,
00:13:37.600
and calculates its cryptographic hash.
00:13:39.633
Having done that, it takes the digital
00:13:42.333
signature, looks up the sender’s public key,
00:13:45.000
and uses that to decrypt the digital
00:13:47.700
signature to retrieve the original
00:13:49.700
cryptographic hash that was in the message.
00:13:52.233
It compares the hash, which was sent
00:13:54.800
in the message as part of the
00:13:57.333
digital signature, with the cryptographic hash it
00:13:59.866
just calculated.
00:14:00.700
If the two match, then it knows
00:14:02.966
the message is authentic and has not been
00:14:05.266
modified, provided it trusts the sender to
00:14:07.566
have kept its private key private.
00:14:09.633
If the hash of the message it
00:14:11.900
calculated, and the hash that was sent
00:14:14.166
in the digital signature, don't match, then
00:14:16.400
it knows that somehow the message has
00:14:18.666
been modified in transit.
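The sign and verify steps described above can be sketched as follows. As an assumption made purely for illustration, this uses textbook RSA with a toy key pair built from two well-known Mersenne primes and no padding, which is not secure. The function names are invented for this example, and the outer hybrid encryption of the message and signature is left out.

```python
import hashlib

# Toy RSA key pair for the sender (illustration only, not secure).
P, Q = 2**127 - 1, 2**521 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent, known only to the sender


def message_hash(message: bytes) -> int:
    """SHA-256 hash of the message, interpreted as an integer."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def sign(message: bytes) -> int:
    """Sender: encrypt the hash of the message with the private key."""
    return pow(message_hash(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Receiver: decrypt the signature with the sender's public key and
    compare the result with a freshly calculated hash of the message."""
    return pow(signature, E, N) == message_hash(message)


signature = sign(b"Pay Alice 10 pounds")
print(verify(b"Pay Alice 10 pounds", signature))    # unmodified message
print(verify(b"Pay Alice 1000 pounds", signature))  # modified in transit
```

Anyone holding the public values `N` and `E` can run `verify`, but only the holder of `D` can produce a signature that passes the check.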
00:14:20.066
Public key encryption is therefore one of
00:14:22.200
the fundamental building blocks of a secure network.
00:14:25.066
It allows us to send a message
00:14:26.900
to a recipient securely, even if we've
00:14:29.100
not met that recipient, and be sure
00:14:31.300
that they're the only one who’ll be
00:14:33.466
able to decrypt that message. And it
00:14:35.666
allows us to use digital signatures to
00:14:37.866
verify that messages have not been modified
00:14:40.033
in transit.
00:14:40.766
The security of public key encryption,
00:14:43.166
though, depends on knowing which public key
00:14:45.933
corresponds to a particular receiver.
00:14:48.033
There are three ways you can know
00:14:50.300
this. The first is that the receiver
00:14:52.566
gives you their key in person.
00:14:54.633
The second is that the receiver sends
00:14:56.966
you their key, but the message in
00:14:59.300
which they send it is authenticated by
00:15:01.666
someone you trust.
00:15:02.766
That is, there’s a digital signature in
00:15:05.266
the message, signed by someone whose key
00:15:07.766
you already have, that authenticates that this message
00:15:10.300
is from who it claims to be from.
00:15:13.633
The third is that someone you trust
00:15:16.166
gives you the receiver’s key.
00:15:18.333
In the Internet, the role of someone
00:15:20.800
you trust is often played by an
00:15:23.300
organisation known as a certificate authority,
00:15:25.400
as part of a public key infrastructure.
00:15:28.000
The role of a certificate authority is
00:15:30.733
to validate the identity of potential senders.
00:15:33.466
The certificate authority checks the identity of
00:15:36.200
a potential sender, and then adds a
00:15:38.933
digital signature to the sender’s public key
00:15:41.666
to indicate that it's done so.
00:15:44.100
If a receiver trusts the public key
00:15:47.300
infrastructure, trusts the certificate authority, then it
00:15:50.500
can verify that digital signature, added by
00:15:53.700
the certificate authority, to confirm the identity
00:15:56.866
of the sender.
00:15:58.366
These mechanisms, symmetric and public key encryption,
00:16:01.766
and digital signatures, allow us to provide
00:16:05.200
confidentiality for communication over the Internet that
00:16:08.600
performs well and is secure.
00:16:11.600
They allow us to authenticate messages,
00:16:13.700
and demonstrate that they've not been modified in transit.
00:16:16.633
And they allow us to validate the identity of senders
00:16:19.466
of those messages.
Part 3: Transport Layer Security (TLS) v1.3
The 3rd part of the lecture describes the operation of the Transport
Layer Security Protocol (TLS) v1.3, one of the key security protocols
used in the Internet.
Slides for part 3
00:00:00.333
In previous parts of this lecture I
00:00:02.633
spoke about network security in general terms.
00:00:05.266
In part one, I discussed why security
00:00:07.933
is needed in order to protect Internet communications,
00:00:11.233
and in part two, I spoke about
00:00:13.733
how security is provided in outline.
00:00:16.033
I spoke about the different types of
00:00:18.700
encryption, public key and symmetric,
00:00:20.733
the use of hybrid encryption, in order
00:00:24.033
to improve performance while still maintaining security,
00:00:27.333
and the ideas of digital signatures and
00:00:30.633
public key infrastructure.
00:00:32.133
In this third part of the lecture,
00:00:34.533
I want to move on to talk
00:00:36.966
about Internet security in specific terms.
00:00:39.033
I want to talk about the Transport
00:00:41.433
Layer Security protocol, TLS version 1.3.
00:00:43.633
I’ll begin by introducing what TLS is,
00:00:45.933
talking about conceptually what role it performs
00:00:48.266
in the network stack. And I'll talk
00:00:50.566
through some of the details of TLS.
00:00:52.966
I'll talk about the TLS handshake protocol,
00:00:56.133
that's used to establish TLS connections.
00:00:58.800
The record protocol, that's used to exchange
00:01:01.933
data. The 0-RTT extension, that reduces connection
00:01:05.066
setup times. And finally, I'll talk about
00:01:08.233
some of the limitations of TLS.
00:01:11.000
As we saw in some of the
00:01:13.833
earlier lectures, TCP connections are not secure.
00:01:16.733
Neither the TCP headers, nor the IP
00:01:19.533
headers, nor the data they transfer are
00:01:22.300
encrypted or authenticated in any way.
00:01:24.766
Data sent in a TCP connection is
00:01:28.200
not confidential. It can be observed by
00:01:31.600
governments, businesses, network operators, criminals,
00:01:34.433
or malicious users.
00:01:35.733
Similarly, the data is not authenticated.
00:01:37.666
Anyone who's able to access the network
00:01:40.066
connections, or the routers over which the
00:01:42.466
data flows, is able to modify that
00:01:44.833
data. And the sender and the receiver
00:01:47.233
will not be able to tell that
00:01:49.633
such modifications have been performed.
00:01:51.466
In order to provide security for data
00:01:54.166
going across a TCP connection, we need
00:01:56.866
to run some sort of additional security
00:01:59.566
protocol within that TCP connection to protect
00:02:02.266
the data.
00:02:03.166
The way this is typically done in
00:02:05.366
the Internet, is using a protocol called
00:02:07.566
the Transport Layer Security protocol.
00:02:09.233
The latest version of this is TLS
00:02:12.166
1.3 and it's used to encrypt and
00:02:15.066
authenticate data that is carried within a
00:02:17.966
TCP connection.
00:02:18.900
The official specification for TLS 1.3 is
00:02:21.900
RFC 8446, which was published by the
00:02:24.900
IETF in August 2018.
00:02:28.033
The TLS specification is not a simple
00:02:30.933
document to read.
00:02:32.266
In part, this is because it's solving
00:02:34.866
a difficult problem. Providing security over the
00:02:37.433
top of an insecure connection, a TCP
00:02:40.033
connection, is a complex challenge, and TLS
00:02:42.600
has to define a number of complex
00:02:45.200
mechanisms in order to provide that security.
00:02:47.866
The other part of the complexity comes because
00:02:50.600
TLS is an old protocol.
00:02:52.666
The latest versions of TLS have to
00:02:55.533
be backwards compatible, not only with previous
00:02:58.400
versions of TLS as specified, but with
00:03:01.266
previous implementation problems, and bugs in the
00:03:04.133
TLS specification and in its implementations.
00:03:06.700
The protocol designers have done a good
00:03:09.766
job, though. TLS version 1.3 is smaller,
00:03:12.866
faster, and simpler than previous versions of
00:03:15.933
TLS, and it's also more secure.
00:03:18.700
The slide lists four blog posts which
00:03:21.333
provide more information about TLS. The first
00:03:24.000
one is an introduction to TLS 1.3
00:03:26.666
from the IETF. This was written by
00:03:29.300
the TLS working group chairs, and introduces
00:03:31.966
the new features in the protocol.
00:03:34.366
The second, from Cloudflare, is a detailed
00:03:37.000
look at what's new in TLS 1.3,
00:03:39.633
as compared to previous versions of TLS.
00:03:42.400
It talks about some of the advantages
00:03:44.933
of TLS 1.3, and how it improves
00:03:47.466
security, and reduces the connection set up times.
00:03:50.566
The third of these, from David Wong,
00:03:52.900
attempts to redraw the TLS specification in
00:03:55.300
a way that makes it easier to
00:03:57.733
read. This is a copy of RFC
00:04:00.166
8446, the TLS specification, with the diagrams
00:04:02.600
redrawn in an easier to read way,
00:04:05.033
and with explanatory videos and comments added
00:04:07.466
to make it easier to follow.
00:04:09.633
The final post is the most detailed.
00:04:12.566
It's an annotated packet capture showing the
00:04:15.500
details of a TLS connection.
00:04:17.700
This walks through the TLS connection establishment
00:04:20.433
handshake, byte by byte, labelling each byte
00:04:23.133
with reference to the specification to explain
00:04:25.866
exactly what it means, and how the
00:04:28.566
handshake proceeds.
00:04:29.466
I encourage you to review these four
00:04:32.033
blog posts. They give a nice complement
00:04:34.633
to the material I'll talk about in
00:04:37.200
the rest of this lecture, introducing how
00:04:39.800
TLS 1.3 works.
00:04:41.000
So what's the goal of TLS 1.3?
00:04:44.266
Well, given an existing connection, that's capable
00:04:47.400
of delivering data reliably and in the
00:04:50.566
order it was sent, but is insecure,
00:04:53.700
TLS 1.3 aims to add security.
00:04:56.533
That is, given a TCP connection,
00:04:59.566
it aims to add authentication, confidentiality,
00:05:02.633
and integrity protection to the data sent
00:05:06.200
over that connection.
00:05:07.833
In terms of authentication, it uses public
00:05:10.500
key cryptography, and a public key infrastructure,
00:05:13.133
in order to verify the identity of
00:05:15.800
the server to which the connection is made.
00:05:19.066
That is, the client can always verify
00:05:21.500
that it's talking to the desired server.
00:05:24.100
In addition, it provides optional authentication for
00:05:26.700
the client, to allow the server to
00:05:29.266
verify the identity of the client.
00:05:31.600
Once the connection has been established,
00:05:34.233
and verified to be correct, TLS provides
00:05:37.333
confidentiality for data sent across that connection.
00:05:40.500
It uses hybrid encryption schemes to provide
00:05:43.266
good performance, while still providing a strong
00:05:46.000
amount of security.
00:05:47.266
Finally, TLS authenticates data sent across the
00:05:50.500
connection, to provide integrity protection. It's not
00:05:53.700
possible for an attacker to modify data
00:05:56.900
sent across a TLS connection without that
00:06:00.133
modification being detectable by the endpoints.
00:06:02.966
How does TLS 1.3 work?
00:06:05.800
Well, first of all, a TCP connection
00:06:08.566
must be established. TLS is not a
00:06:11.333
transport protocol itself, and it relies on
00:06:14.100
an underlying TCP connection in order to
00:06:16.866
exchange data.
00:06:17.766
Once the TCP connection has been established,
00:06:21.166
TLS runs within that connection.
00:06:23.700
There are two parts to a TLS
00:06:26.466
connection. It begins with a handshake protocol,
00:06:29.233
and then proceeds with a record protocol.
00:06:32.100
The goal of the handshake protocol,
00:06:34.200
at the beginning of the connection,
00:06:36.266
is to authenticate the endpoints and agree
00:06:38.700
on what encryption keys to use.
00:06:40.900
Once this is completed, TLS switches to
00:06:43.833
running the record protocol, which lets endpoints
00:06:46.766
exchange authenticated and encrypted blocks of data
00:06:49.700
over the connection.
00:06:51.066
TLS turns the TCP byte stream into
00:06:54.333
a series of records. It provides framing,
00:06:57.600
delivers data block by block, each block
00:07:00.866
being encrypted and authenticated to ensure that
00:07:04.133
the data being sent in that block
00:07:07.400
is confidential, and arrives unmodified.
00:07:09.833
A secure connection over the Internet starts
00:07:12.600
by establishing a TCP connection as normal.
00:07:15.466
The client connects to the server,
00:07:17.700
sending a SYN packet, along with its
00:07:20.300
initial sequence number.
00:07:21.500
The server responds with a SYN-ACK,
00:07:23.866
acknowledging the client’s initial sequence number,
00:07:26.200
and providing the server’s initial sequence number.
00:07:28.933
And then the client responds with an
00:07:31.700
ACK packet, acknowledging that packet from the server.
00:07:35.066
This sets up a TCP connection.
00:07:37.633
Immediately following that, the TLS handshake starts,
00:07:41.066
running within the TCP connection itself.
00:07:44.133
The TLS client sends a TLS ClientHello
00:07:46.966
message to a server immediately following the
00:07:49.766
final ACK of the TCP handshake.
00:07:52.300
The server responds to that with a
00:07:54.700
TLS ServerHello message, and then the client
00:07:57.133
in return
00:07:57.933
responds with a TLS Finished message.
00:08:00.433
This concludes the handshake, and carries the
00:08:03.333
first block of secure data. Following this,
00:08:06.233
the client and the server switch to
00:08:09.133
running the TLS record protocol over the
00:08:12.066
TCP connection, and exchange further secure data blocks.
00:08:15.433
As can be seen, the TLS handshake
00:08:18.000
adds an additional round trip time to
00:08:20.000
the connection establishment.
00:08:21.733
At the start of the connection,
00:08:23.533
there's an initial round trip time while
00:08:25.600
the TCP connection is set up.
00:08:27.200
And then this is followed by an
00:08:29.533
additional round trip, while the TLS connection
00:08:31.833
and the security parameters are negotiated,
00:08:33.800
before the data can be sent.
00:08:35.866
There's a minimum of two round trip
00:08:38.633
times from the start of the TCP
00:08:41.366
connection to the conclusion of the TLS
00:08:44.133
handshake and the first secure data segment
00:08:46.866
being sent.
00:08:47.766
The first part of the TLS handshake
00:08:50.266
is the ClientHello message. This is sent
00:08:52.766
from the client to the server,
00:08:54.900
and begins the negotiation of the security parameters.
00:08:57.933
The ClientHello message does three things.
00:09:00.200
It indicates the version of TLS that is
00:09:02.966
to be used. It indicates the cryptographic
00:09:05.700
algorithms that the client supports, and provides
00:09:08.466
its initial keying material. And it indicates
00:09:11.200
the name of the server to which
00:09:13.966
the client is connecting.
00:09:15.633
You may wonder why the ClientHello message
00:09:17.966
needs to indicate the server name, given that
00:09:20.300
it's running over a TCP connection that's
00:09:22.633
just been established to that server.
00:09:24.733
The reason for this, is that TLS
00:09:26.766
is often used with web hosting,
00:09:28.500
and it's common for web servers to
00:09:30.533
host more than one website,
00:09:32.066
so the server name provided in the
00:09:34.866
TLS ClientHello indicates which of the sites,
00:09:37.666
which are accessible over that TCP connection,
00:09:40.500
the TLS message is trying to establish
00:09:43.300
a connection, establish a secure connection, to.
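In Python's standard `ssl` module, the server name is supplied through the `server_hostname` argument of `wrap_socket`, which is sent in the ClientHello and also used to check the server's certificate. The sketch below is one possible way to open a connection that insists on TLS 1.3. The helper names are invented for this example, and the host passed in is whatever server you choose to try.

```python
import socket
import ssl
from typing import Optional


def make_tls13_context() -> ssl.SSLContext:
    """Client context that verifies the server and refuses anything older than TLS 1.3."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context


def negotiated_version(host: str, port: int = 443) -> Optional[str]:
    """Open a TCP connection, run the TLS handshake inside it, and report the version."""
    context = make_tls13_context()
    # The TCP three-way handshake happens first.
    with socket.create_connection((host, port), timeout=10) as tcp:
        # The TLS handshake then runs within the TCP connection;
        # server_hostname is the name sent to the server in the ClientHello.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_version` with the name of a server that supports TLS 1.3 should return the string `"TLSv1.3"`. A server that only supports older versions causes the handshake to fail with an `ssl.SSLError`.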
00:09:46.333
The ClientHello message also indicates which version
00:09:48.800
of TLS is to be used.
00:09:51.033
What you would expect to happen here,
00:09:53.633
is that it would indicate that it
00:09:56.200
wishes to use TLS 1.3.
00:09:58.166
What actually happens, though, is that the
00:10:01.066
ClientHello message includes a version number indicating
00:10:03.933
that it wants to use TLS version
00:10:06.833
1.2, the previous version of TLS.
00:10:09.400
The ClientHello message includes an optional set
00:10:12.366
of extension headers, and one of those
00:10:15.366
extension headers includes an extension which says
00:10:18.366
“actually I’m really TLS version 1.3”.
00:10:21.033
The reason the version negotiation happens in
00:10:23.366
such a weird way, specifying an old
00:10:25.700
version of TLS in the version field,
00:10:28.033
and using an extension to indicate the
00:10:30.366
real version,
00:10:31.133
is because there are too many
00:10:33.566
middleboxes, too many devices which try to
00:10:36.000
inspect TLS traffic in the network,
00:10:38.066
and which fail if the version number changes.
00:10:40.866
The protocol has become ossified.
00:10:43.333
We waited too long between versions of TLS.
00:10:46.366
Too many devices were deployed, too many
00:10:49.633
endpoints were deployed, which only understood version 1.2
00:10:53.066
and which didn't correctly support the version
00:10:55.733
negotiation. And then, when it came to
00:10:58.300
deploying a new version, and people tried
00:11:00.833
with early versions of TLS to just
00:11:03.400
change the version number to 1.3,
00:11:05.566
it was found that those deployed devices
00:11:08.133
didn't support the change.
00:11:09.700
The result was that connections that indicated
00:11:12.200
TLS version 1.3 in the header would
00:11:14.733
tend to fail,
00:11:15.900
whereas those that pretended to be TLS
00:11:18.600
version 1.2, using an extension header to
00:11:21.266
upgrade the version number, would work through
00:11:23.966
those middleboxes, and the connection could succeed
00:11:26.666
and proceed with the new version.
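The version trick can be sketched with plain Python dictionaries. The numeric values are the ones TLS uses on the wire (0x0303 for TLS 1.2, 0x0304 for TLS 1.3, and extension type 43 for the supported versions extension). The dictionary layout and function names are assumptions made for this illustration, and the selection logic is a simplification of what a real server does.

```python
from typing import Dict, List, Optional, Set

TLS_1_2 = 0x0303
TLS_1_3 = 0x0304
SUPPORTED_VERSIONS = 43   # extension type carrying the real version list


def build_client_hello() -> Dict:
    """A TLS 1.3 client claims 1.2 in the version field and lists 1.3 in an extension."""
    return {
        "legacy_version": TLS_1_2,
        "extensions": {SUPPORTED_VERSIONS: [TLS_1_3, TLS_1_2]},
    }


def select_version(client_hello: Dict, server_versions: Set[int]) -> Optional[int]:
    """Server side: prefer the extension, fall back to the version field."""
    offered: List[int] = client_hello["extensions"].get(SUPPORTED_VERSIONS, [])
    if not offered:
        # An older client that does not send the extension.
        offered = [client_hello["legacy_version"]]
    for version in offered:
        if version in server_versions:
            return version
    return None


hello = build_client_hello()
print(hex(select_version(hello, {TLS_1_3, TLS_1_2})))   # a TLS 1.3 server
print(hex(select_version(hello, {TLS_1_2})))            # a TLS 1.2 only server
```

A middlebox that only looks at `legacy_version` sees what appears to be an ordinary TLS 1.2 handshake, while endpoints that understand the extension agree on TLS 1.3.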
00:11:29.066
The ClientHello message is the first part
00:11:32.333
of the connection setup handshake. It doesn't
00:11:35.566
carry any new data.
00:11:37.533
Following the ClientHello, the server responds with
00:11:41.333
a ServerHello message.
00:11:43.066
The ServerHello message also indicates the version
00:11:45.866
of TLS which is to be used
00:11:48.633
and, like the ClientHello, it indicates that
00:11:51.433
the version is actually TLS version 1.2
00:11:54.233
and includes an extension header to say
00:11:57.000
that it’s really a TLS 1.3 connection
00:11:59.800
that's being established.
00:12:01.066
In addition to the version negotiation,
00:12:03.433
the TLS ServerHello includes the cryptographic algorithms
00:12:06.200
selected by the server, which are a
00:12:08.933
subset of the set suggested by the client.
00:12:11.833
That is, the client suggests the cryptographic
00:12:14.733
algorithms which it supports, and the server
00:12:17.300
looks at those, finds the subset of
00:12:19.866
them which are acceptable to it,
00:12:22.066
picks one of them, and includes that
00:12:24.633
in its response.
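The selection step can be sketched as a simple list intersection. The three names below are cipher suites defined for TLS 1.3, but the particular client and server lists, and the rule of taking the first acceptable entry in the client's order, are assumptions made for this example.

```python
from typing import List, Optional, Set

# What the client says it supports, in its order of preference.
CLIENT_OFFER = [
    "TLS_AES_128_GCM_SHA256",
    "TLS_CHACHA20_POLY1305_SHA256",
    "TLS_AES_256_GCM_SHA384",
]

# What this particular (hypothetical) server is willing to use.
SERVER_ACCEPTS = {
    "TLS_AES_256_GCM_SHA384",
    "TLS_CHACHA20_POLY1305_SHA256",
}


def choose_cipher_suite(offered: List[str], acceptable: Set[str]) -> Optional[str]:
    """Return the first suite offered by the client that the server accepts."""
    for suite in offered:
        if suite in acceptable:
            return suite
    return None   # no overlap, so the handshake cannot continue


chosen = choose_cipher_suite(CLIENT_OFFER, SERVER_ACCEPTS)
print(chosen)
```

If the two sets have nothing in common the function returns `None`, which corresponds to the server rejecting the connection attempt.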
00:12:25.833
The ServerHello message also includes the server’s
00:12:28.066
public key, and a digital signature which
00:12:30.266
can be used to verify the identity
00:12:32.500
of the server.
00:12:33.533
Like the ClientHello, it doesn't include any data.
00:12:38.066
Finally, the TLS handshake concludes with a
00:12:40.933
Finished message, which flows from the client
00:12:43.466
to the server. The TLS Finished message
00:12:46.033
includes the client’s public key and, optionally,
00:12:48.566
it includes a certificate which is used
00:12:51.133
to authenticate the client to the server.
00:12:53.800
The TLS Finished message concludes the connection
00:12:57.533
setup handshake.
00:12:58.700
In addition to the connection setup,
00:13:00.900
it may therefore include the first part
00:13:03.466
of application data that is sent from
00:13:06.033
the client to the server.
00:13:07.966
TLS uses the ephemeral elliptic curve Diffie-Hellman
00:13:11.266
key exchange algorithm in order to derive
00:13:14.566
the keys used for the symmetric encryption.
00:13:18.000
The client and the server exchange their
00:13:20.300
public keys, as part of the connection
00:13:22.633
setup handshake, and each then combines the
00:13:24.933
other’s public key with its own private key to derive the key
00:13:27.266
that's used for the symmetric cryptography.
00:13:29.333
The maths of how this works is
00:13:31.400
complex. I'm not going to attempt to
00:13:33.466
describe it here.
00:13:34.433
What's important though, is that the symmetric
00:13:36.933
key is never exchanged over the wire.
00:13:39.400
The client and the server only exchange
00:13:41.866
their public keys, and the symmetric key
00:13:44.366
is derived from those.
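Without going into the elliptic curve maths, the shape of the exchange can be seen with the older, integer-based form of Diffie-Hellman. This is a deliberate substitution for illustration: TLS 1.3 normally uses the elliptic curve variant, and the small modulus, the generator, and the function names below are assumptions chosen to keep the sketch short, not secure parameters.

```python
import hashlib
import secrets

PRIME = 2**127 - 1   # toy modulus, far too small for real use
GENERATOR = 3


def make_key_pair():
    """Pick a fresh (ephemeral) private value and compute the matching public value."""
    private = secrets.randbelow(PRIME - 2) + 1
    public = pow(GENERATOR, private, PRIME)
    return private, public


def derive_key(own_private: int, peer_public: int) -> bytes:
    """Combine our private value with the peer's public value, then hash the
    shared secret down to a 32-byte symmetric key."""
    shared = pow(peer_public, own_private, PRIME)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


client_private, client_public = make_key_pair()
server_private, server_public = make_key_pair()

# Only client_public and server_public cross the network.
client_key = derive_key(client_private, server_public)
server_key = derive_key(server_private, client_public)
print(client_key == server_key)
```

Both sides arrive at the same symmetric key, yet an eavesdropper who sees only the two public values cannot feasibly compute it when realistic parameters are used.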
00:13:45.866
A TLS server provides a certificate that
00:13:48.633
allows the client to verify its identity
00:13:51.366
as part of the ServerHello message.
00:13:53.733
The client can optionally provide this information
00:13:56.466
along with its Finished message.
00:13:58.533
The result is that the client can always
00:14:01.000
verify the identity of the server,
00:14:03.133
and the server can optionally verify the
00:14:05.633
identity of the client.
00:14:07.133
The choice of encryption algorithm is driven
00:14:09.633
by the client, which provides the list
00:14:12.133
of the symmetric encryption algorithms that it
00:14:14.633
supports as part of its ClientHello message.
00:14:17.133
The server picks from these, and replies
00:14:19.633
in its ServerHello.
00:14:20.833
The usual result is that either the
00:14:24.766
Advanced Encryption Standard, AES, or the ChaCha20
00:14:28.700
symmetric encryption algorithm is chosen.
00:14:31.633
Once the TLS connection establishment protocol,
00:14:34.166
the handshake protocol, has completed, the TLS
00:14:37.166
record protocol starts. The record protocol allows
00:14:40.133
the client and the server to exchange
00:14:43.133
records of data over the TCP connection.
00:14:46.200
Each record can contain up to two
00:14:49.033
to the power 14 bytes of data,
00:14:51.900
and is both encrypted and authenticated.
00:14:54.433
Records of data have a sequence number,
00:14:56.933
and they are delivered reliably, securely,
00:14:59.066
and in the order in which they
00:15:01.600
were sent.
00:15:02.400
The underlying TCP connection does not preserve
00:15:05.233
record boundaries. TLS adds framing to the
00:15:08.066
connection so that it does so,
00:15:10.466
and reading from a TLS connection will
00:15:13.300
block until a complete record of data
00:15:16.133
is received.
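The framing can be sketched as a length-prefixed record format. Each TLS record starts with a five-byte header holding a content type, a legacy version field, and a 16-bit length. The sketch below uses that header layout but, as a simplifying assumption, leaves out the encryption and authentication entirely. The function names are invented for this example.

```python
import struct
from typing import List

HEADER = struct.Struct("!BHH")   # content type, legacy version, payload length
APPLICATION_DATA = 23            # content type used for application data
MAX_FRAGMENT = 2**14             # largest payload carried in one record


def frame(payload: bytes) -> bytes:
    """Sender: prefix the payload with a record header."""
    if len(payload) > MAX_FRAGMENT:
        raise ValueError("payload too large for a single record")
    return HEADER.pack(APPLICATION_DATA, 0x0303, len(payload)) + payload


def extract_records(buffer: bytearray) -> List[bytes]:
    """Receiver: remove and return every complete record in the buffer.

    Bytes belonging to a partially received record are left in the buffer
    until the rest of that record arrives.
    """
    records = []
    while len(buffer) >= HEADER.size:
        _type, _version, length = HEADER.unpack_from(buffer)
        if len(buffer) < HEADER.size + length:
            break   # record incomplete, wait for more bytes from TCP
        records.append(bytes(buffer[HEADER.size:HEADER.size + length]))
        del buffer[:HEADER.size + length]
    return records


stream = frame(b"hello") + frame(b"world")

# TCP may deliver the byte stream in arbitrary pieces.
buffer = bytearray(stream[:8])
first = extract_records(buffer)    # nothing complete yet
buffer.extend(stream[8:])
second = extract_records(buffer)   # both records now available
print(first, second)
```

The receiver gets nothing until a whole record has arrived, which mirrors the way a read from a TLS connection waits for a complete record.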
00:15:17.033
A TLS connection usually uses the same
00:15:19.866
encryption key to protect data for the
00:15:22.733
entire connection. However, in principle, it can
00:15:25.566
renegotiate encryption keys between records, if there's
00:15:28.400
a need to change the encryption key
00:15:31.233
partway through a connection.
00:15:32.966
The TLS record protocol allows the client
00:15:35.533
and the server to exchange records,
00:15:37.733
to send and receive data as they
00:15:40.300
see fit.
00:15:41.133
Once they finish doing so, they close
00:15:44.833
the connection, which closes the underlying TCP connection.
00:15:48.266
TLS 1.3 usually takes one round trip
00:15:52.066
time to establish the connection after the
00:15:54.966
TCP connection set up.
00:15:56.733
That is, there's the TCP SYN,
00:15:59.566
SYN-ACK, ACK handshake to establish the TCP
00:16:02.833
connection, and then an additional round trip
00:16:06.133
time for the TLS ClientHello, ServerHello,
00:16:08.966
Finished exchange.
00:16:10.000
However, if the client and the server
00:16:12.733
have previously communicated, TLS 1.3 allows them
00:16:15.466
to reuse some of the connection setup
00:16:18.233
parameters, and re-use the same encryption key.
00:16:21.066
The way this works is that the
00:16:23.266
server can send an additional encryption key
00:16:25.433
as part of its ServerHello message,
00:16:27.433
and the client can remember that key,
00:16:29.500
and use it the next time it
00:16:31.600
connects to the server. This is known
00:16:33.700
as a pre-shared key.
00:16:34.966
When the client next connects to that
00:16:37.766
server, it sends its ClientHello message as
00:16:40.566
normal. However, in addition to that ClientHello
00:16:43.333
message, it can also include some data,
00:16:46.133
and that data is encrypted using the
00:16:48.900
pre-shared key.
00:16:49.800
The ServerHello also proceeds as normal.
00:16:52.033
But again, can contain data encrypted using
00:16:54.666
the pre-shared key, and sent in reply
00:16:57.266
to the client, to the data included
00:16:59.866
in the ClientHello message.
00:17:01.466
The use of the pre-shared key therefore
00:17:03.766
allows the client and the server to
00:17:06.100
exchange data along with the initial connection
00:17:08.400
setup handshake. It allows data to be
00:17:10.733
exchanged within zero RTTs of the connection
00:17:13.033
set up, as part of the first
00:17:15.333
round trip.
00:17:16.100
This extension is therefore known as the
00:17:20.100
0-RTT mode of TLS 1.3.
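The latency difference can be worked through with a little arithmetic. The Python sketch below counts round trips before the client can send its first application data, under the simplified model used in this lecture (one RTT for the TCP handshake, one more for a full TLS 1.3 handshake, none extra with 0-RTT). The function names and the 50 ms figure are illustrative assumptions.

```python
def rtts_before_first_data(use_tls: bool, zero_rtt: bool = False) -> int:
    """Round trips that must complete before the client sends application data."""
    rtts = 1  # TCP SYN, SYN-ACK, ACK
    if use_tls and not zero_rtt:
        rtts += 1  # TLS ClientHello, ServerHello, Finished
    return rtts


def first_data_delay_ms(rtt_ms: float, use_tls: bool, zero_rtt: bool = False) -> float:
    """Delay, in milliseconds, before the first application data leaves the client."""
    return rtt_ms * rtts_before_first_data(use_tls, zero_rtt)


# Assumed example path with a 50 ms round-trip time.
plain_tcp = first_data_delay_ms(50, use_tls=False)
full_tls = first_data_delay_ms(50, use_tls=True)
resumed_0rtt = first_data_delay_ms(50, use_tls=True, zero_rtt=True)
```

Under these assumptions the full handshake costs 100 ms before any data is sent, while plain TCP and 0-RTT resumption both cost 50 ms.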
00:17:23.033
The 0-RTT mode is useful, because it
00:17:25.766
allows connections to start sending data much
00:17:28.533
earlier. It removes one round-trip time's
00:17:31.266
worth of latency. However, it has a limitation.
00:17:34.233
The limitation is that, unlike the record
00:17:38.100
packets which contain a sequence number,
00:17:41.166
TLS ClientHello and ServerHello messages don't contain
00:17:44.766
a sequence number.
00:17:46.400
A consequence of this, is that data
00:17:48.933
sent as part of a ClientHello,
00:17:51.100
or a ServerHello, may be duplicated,
00:17:52.900
and TLS has no way of stopping this.
00:17:55.933
If you're writing an application that uses
00:17:58.700
TLS in 0-RTT mode you need to
00:18:01.133
be careful, and only send what's known
00:18:03.566
as idempotent data,
00:18:04.700
data where it doesn't matter if that
00:18:07.300
data is delivered more than once to
00:18:09.900
the server, in the 0-RTT packets.
00:18:12.233
Data that is sent after the first
00:18:15.033
round trip time has concluded, as part
00:18:17.800
of the regular TLS connection, doesn't suffer
00:18:20.600
from this problem, and is only ever
00:18:23.366
delivered to the application once.
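One way to think about what counts as idempotent is to ask whether applying a request twice leaves the server in the same state as applying it once. The Python sketch below models this with a toy key-value state. The request format and the names apply and is_safe_to_replay are hypothetical, and a real application would need to reason about its own operations.

```python
def apply(state: dict, request: tuple) -> dict:
    """Apply a toy request of the form (operation, key, value) and return the new state."""
    op, key, value = request
    new_state = dict(state)
    if op == "set":
        new_state[key] = value  # same result however many times it runs
    elif op == "add":
        new_state[key] = new_state.get(key, 0) + value  # each delivery changes the result
    else:
        raise ValueError(f"unknown operation: {op!r}")
    return new_state


def is_safe_to_replay(state: dict, request: tuple) -> bool:
    """True if delivering the request twice gives the same state as delivering it once."""
    once = apply(state, request)
    twice = apply(once, request)
    return once == twice
```

In this model, setting a balance to a fixed value is safe to send in 0-RTT data, while adding an amount to the balance is not, since a duplicated delivery would apply the addition twice.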
00:18:25.466
A TLS connection is secure, but it
00:18:28.333
has a number of limitations.
00:18:30.466
TLS operates within a TCP connection.
00:18:33.966
A consequence of this, is that the
00:18:36.666
IP addresses and the TCP port numbers
00:18:39.400
are not protected. This exposes information about
00:18:42.100
who is communicating, and what application is
00:18:44.800
being used.
00:18:45.700
Further, the TLS ClientHello message includes the
00:18:48.500
server name, but doesn't encrypt that.
00:18:50.900
This exposes the host name of the
00:18:53.700
server to which the connection is being
00:18:56.500
made, and may be a significant privacy leak.
00:18:59.633
An extension, known as Encrypted Server Name
00:19:02.266
Indication, is under development, but this is
00:19:04.766
not finished yet, and there are some
00:19:07.233
concerns that it may be very difficult
00:19:09.733
to deploy.
00:19:10.533
TLS also relies on a public key
00:19:13.166
infrastructure to validate the keys, and to
00:19:15.766
verify the identity of clients and servers.
00:19:18.500
There are some significant concerns about the
00:19:21.766
trustworthiness of this public key infrastructure.
00:19:24.166
The reasons for this are not that
00:19:26.966
the cryptographic algorithms or the mechanisms are
00:19:29.733
insecure, they’re that the browsers tend to
00:19:32.500
trust a very large range of certificate authorities,
00:19:34.766
and it's not clear to what extent all of these certificate
00:19:37.166
authorities are actually trustworthy.
00:19:41.300
The final limitation of TLS is that
00:19:44.700
the 0-RTT extension may deliver data more than once.
00:19:48.600
0-RTT is a very useful extension,
00:19:50.900
because it allows data to be delivered
00:19:53.600
with low latency at the start of
00:19:56.300
the connection, but it runs the risk
00:19:59.000
that the data is delivered multiple times,
00:20:01.700
so must be used with care.
00:20:04.100
That concludes the discussion of TLS. I spoke
00:20:07.133
about what TLS is. I've talked about
00:20:10.133
the TLS handshake protocol, that establishes the
00:20:13.133
connection using the ClientHello, ServerHello,
00:20:15.466
and Finished messages,
00:20:16.800
and that agrees the appropriate cryptographic parameters.
00:20:19.766
And I spoke about the TLS record
00:20:21.666
protocol, which is used to actually exchange the data.
00:20:25.000
The TLS 0-RTT extension allows for faster
00:20:27.833
data transfer at the beginning of the
00:20:30.633
connection, but comes with some risks of
00:20:33.466
data replay attack. Finally, I spoke about
00:20:36.300
some of the limitations of TLS.
00:20:38.833
The TLS protocol has actually been wildly
00:20:41.700
successful. It's used to secure almost all the
00:20:44.600
traffic sent over the web. And when
00:20:47.500
used correctly, is very much a secure
00:20:50.400
protocol, that performs very well.
00:20:52.566
In the final part of the lecture,
00:20:54.766
I'll move on from talking about the details of the
00:20:57.100
cryptographic mechanisms, and the transport protocols,
00:21:00.033
to talk about some of the issues with writing
00:21:02.033
secure software.
Part 4: Discussion
The final part of the lecture discusses systems aspects of providing
secure communication. It reviews the need for end-to-end security to
protect communications. It discusses the robustness principle, and
its implications for the design of input parsers and other aspects
of networked systems. And it briefly reviews some of the challenges
in writing secure code.
Slides for part 4
00:00:00.666
In the previous parts, I’ve spoken about
00:00:03.666
the general principles underlying secure communication,
00:00:05.966
and about the Transport Layer Security protocol,
00:00:08.633
TLS 1.3, that protects most Internet communications.
00:00:11.333
In this final part of the lecture,
00:00:14.100
I want to raise some issues to
00:00:16.766
consider when developing secure networked applications.
00:00:19.066
In particular, I want to discuss the
00:00:21.866
need for end-to-end security, and the problems
00:00:24.533
of making secure communication in the presence
00:00:27.200
of content distribution networks, servers, and middleboxes.
00:00:29.900
I want to talk about the robustness
00:00:32.666
principle, and the difficulty in designing and
00:00:35.333
building networked applications. And I want to
00:00:38.000
talk about the need to carefully validate
00:00:40.700
input data, and some of the issues
00:00:43.366
around writing secure code.
00:00:46.000
For communication to be secure, it must
00:00:48.900
be end-to-end.
00:00:49.733
That is, the secure communication must run
00:00:52.733
between the initial sender and the final
00:00:55.633
recipient, and the message must not be
00:00:58.533
decrypted or lose integrity protection at any
00:01:01.433
point along the path.
00:01:03.066
That is harder to arrange than you
00:01:06.066
might imagine.
00:01:07.000
If the communication is between a client
00:01:09.500
and a server located in a data
00:01:11.966
centre, it’s easy to understand what is
00:01:14.466
the client endpoint. It’s the phone,
00:01:16.600
tablet, or laptop on which the application
00:01:19.100
making the request is running. What is
00:01:21.566
the endpoint in the data centre though?
00:01:24.066
Does the secure connection terminate at the
00:01:26.566
load balancing device at the entrance to
00:01:29.033
the data centre, that chooses which of
00:01:31.533
the many possible servers responds to the
00:01:34.000
request? If so, does that load balancer
00:01:36.500
make a secure onward connection to the
00:01:39.000
back-end server, or is the connection unprotected
00:01:41.466
within the data centre?
00:01:43.000
If the secure connection passes through the
00:01:45.800
load balancer and terminates on the back-end
00:01:48.633
server, are the connections between the back-end
00:01:51.433
servers and the databases, compute servers,
00:01:53.833
and storage servers in other parts of
00:01:56.666
the data centre secure? And, once the
00:01:59.466
request has been handled, how is the
00:02:02.300
data protected once it’s stored in the
00:02:05.100
data centre?
00:02:06.000
What is your threat model? Are you
00:02:08.800
concerned about protecting your communication as it
00:02:11.566
traverses the wide area network between your
00:02:14.366
client and the data centre? Or are
00:02:17.166
you also concerned with protecting communications within
00:02:19.966
the data centre? If you’re concerned about
00:02:22.733
communications and data storage within the data
00:02:25.533
centre, are you trying to protect against
00:02:28.333
other tenants of the data centre? Or
00:02:31.133
against malicious users that may have compromised
00:02:33.900
the data centre infrastructure? Or against the
00:02:36.700
data centre operator?
00:02:38.000
Similar issues arise with content distribution networks.
00:02:41.300
CDNs, such as Akamai, are widely used
00:02:44.600
as the backend infrastructure for websites,
00:02:47.433
software updates, streaming video services, and gaming
00:02:50.733
services. Applications like the Steam store,
00:02:53.566
the BBC iPlayer, Netflix, and Windows Update,
00:02:56.866
have all run on CDNs at various
00:03:00.166
times, although many of them now use
00:03:03.500
their own infrastructure.
00:03:05.000
CDNs are essentially large-scale highly distributed web
00:03:08.000
caches. They provide local copies of data,
00:03:11.000
to improve performance compared to having to
00:03:14.000
fetch the content from the master site.
00:03:17.000
The secure HTTPS connection is therefore from
00:03:19.966
the client to the CDN, rather than
00:03:22.933
from the client to the original site.
00:03:26.000
This introduces an intermediary into the path.
00:03:29.366
The CDN now has visibility into what
00:03:32.733
requests a client is making, in addition
00:03:36.066
to the original service.
00:03:38.000
Performance is better, but you’re forced to
00:03:40.233
trust a third party with information about
00:03:42.433
what sites you’re visiting.
00:03:43.700
Equally, the data has to get to
00:03:46.033
the CDN caches somehow, and has to
00:03:48.266
be protected as it’s fetched from the
00:03:50.466
original server to populate the cache.
00:03:52.366
You have to trust the CDN to
00:03:54.600
do this correctly. As a user of
00:03:56.833
the CDN, you have no way of
00:03:59.033
knowing how, or indeed if, that data
00:04:01.266
is secure.
00:04:02.000
In many cases, data is moving between
00:04:04.833
two users. Is that data encrypted end-to-end
00:04:07.700
between the two users? Or is the
00:04:10.533
data encrypted between the users and some
00:04:13.400
data centre, but visible to the data
00:04:16.233
centre? The difference can matter: if the
00:04:19.100
data centre has access to the unprotected
00:04:21.933
data, it may be used to target
00:04:24.800
advertising, and it’s much more likely to
00:04:27.633
be accessible to law enforcement or government
00:04:30.500
monitoring.
00:04:32.000
Many applications use some form of in-network
00:04:34.766
processing. For example, video conferencing systems often
00:04:37.566
use a central server to perform audio
00:04:40.333
mixing and to scale the video to
00:04:43.133
produce thumbnails.
00:04:43.933
For example, in a large video conference,
00:04:46.800
if many users are sending video,
00:04:49.200
then all the video goes to a
00:04:51.966
central server. That server only forwards high
00:04:54.733
quality video for the active speaker,
00:04:57.133
and sends a smaller, more heavily compressed,
00:04:59.900
version for the other participants.
00:05:02.000
This reduces the amount of video sent
00:05:04.900
out to each of the participants,
00:05:07.366
and prevents overloading their network connections.
00:05:09.833
This is a good thing.
00:05:12.000
But, it also means that the central
00:05:14.533
server has access to the audio and
00:05:17.100
video. The server can record that video,
00:05:19.633
if it so chooses, and potentially share
00:05:22.166
it with others. That may be a
00:05:24.733
concern, depending on what’s being discussed.
00:05:27.000
An alternative way of building such an
00:05:29.800
application leaves the data encrypted, and doesn’t
00:05:32.566
give the server access. This increases the
00:05:35.366
privacy of the users, since the data
00:05:38.166
is encrypted end-to-end and isn’t available to
00:05:40.966
the server, but means that the server
00:05:43.733
can’t help compress the data and manage
00:05:46.533
the load, and it means that server-based
00:05:49.333
features, like cloud recording and captioning become
00:05:52.100
much harder to provide. It trades-off features
00:05:54.900
and performance, for increased privacy.
00:05:58.000
When building networked applications, it’s important to
00:06:01.100
consider how the network protocol is implemented.
00:06:04.200
Network protocols can be reasonably complex,
00:06:06.966
and difficult to implement. They have a
00:06:10.066
syntax and semantics, in many ways similar
00:06:13.166
to a programming language. And, like a
00:06:16.266
program, the protocol messages your application receives
00:06:19.366
may contain syntax errors or other bugs.
00:06:22.466
What do you do if the protocol
00:06:25.700
data you receive is incorrect?
00:06:28.000
A frequently quoted guideline is Postel’s law.
00:06:31.166
This is named after Jon Postel,
00:06:33.866
the original editor of what became the
00:06:37.033
IETF’s RFC series of documents, and an
00:06:40.200
influential contributor to the early Internet.
00:06:43.000
Postel’s law can be summarised as “Be
00:06:46.233
liberal in what you accept, and conservative
00:06:49.466
in what you send”.
00:06:51.300
That is, when generating protocol messages,
00:06:54.166
try your hardest to do so correctly.
00:06:57.400
Make sure the messages you send strictly
00:07:00.633
conform to the protocol specification.
00:07:02.966
But, when receiving messages, accept that the
00:07:06.266
generator of those messages may be imperfect.
00:07:09.500
If a message is malformed, but unambiguous
00:07:12.733
and understandable, Postel’s law suggests to accept
00:07:15.966
it anyway.
00:07:17.000
That’s fine, but it’s important to balance
00:07:19.966
interoperability with security. Don’t be too liberal
00:07:22.966
in what you try to accept.
00:07:25.500
Having a clear specification of how and
00:07:28.500
when you will fail might be more
00:07:31.466
appropriate.
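To make the idea of a clearly specified failure concrete, here is a minimal Python sketch of a strict parser. The message format (a one-byte type, a two-byte big-endian length, then the payload), the type numbers, and the names MalformedMessage and parse_message are all invented for illustration and do not correspond to any real protocol.

```python
import struct

# Hypothetical message types for an invented protocol.
KNOWN_TYPES = {1: "request", 2: "response"}
HEADER = struct.Struct("!BH")  # one-byte type, two-byte big-endian length


class MalformedMessage(ValueError):
    """Raised for any input that does not match the format exactly."""


def parse_message(data: bytes):
    """Parse one message, rejecting anything that deviates from the specification."""
    if len(data) < HEADER.size:
        raise MalformedMessage("truncated header")
    msg_type, length = HEADER.unpack_from(data)
    if msg_type not in KNOWN_TYPES:
        raise MalformedMessage(f"unknown message type {msg_type}")
    payload = data[HEADER.size:]
    if len(payload) != length:
        # Do not guess: both short payloads and trailing bytes are errors.
        raise MalformedMessage("length field does not match payload size")
    return KNOWN_TYPES[msg_type], payload
```

The design choice here is that every deviation leads to the same, documented outcome (an exception) rather than a best-effort guess at what the sender meant.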
00:07:33.000
Postel’s law says “Be liberal in what
00:07:35.800
you accept, and conservative in what you
00:07:38.566
send”.
00:07:39.733
That makes sense if you trust the
00:07:42.600
other devices on the network.
00:07:44.600
It makes sense if the problems with
00:07:47.466
the messages they send are honest mistakes,
00:07:50.266
and not intended to be malicious.
00:07:53.000
The network has changed since Postel’s time,
00:07:56.033
though.
00:07:57.233
As Poul-Henning Kamp, one of the FreeBSD
00:08:00.366
developers, says “Postel lived on a network
00:08:03.400
with all his friends. We live on
00:08:06.433
a network with all our enemies.
00:08:09.033
Postel was wrong for today’s internet”.
00:08:12.000
This is an important point.
00:08:15.000
Any networked system is frequently attacked.
00:08:17.666
There are many people scanning the network
00:08:20.900
for vulnerabilities, actively trying to break your
00:08:24.000
applications. If you write a server,
00:08:26.666
and make it accessible on the Internet,
00:08:29.800
then people will try to break it.
00:08:33.000
This is not because you’re a target.
00:08:35.833
It’s because machines and network connections are
00:08:38.700
now fast enough that it’s possible to
00:08:41.533
scan every machine on the Internet,
00:08:43.966
to see if it’s vulnerable to a
00:08:46.800
particular problem, within a few hours.
00:08:49.233
It’s not personal. But your server will
00:08:52.100
be attacked.
00:08:53.000
The paper shown on the slide,
00:08:55.433
on “The Harmful Consequences of the Robustness
00:08:58.300
Principle”, by Martin Thomson, talks about this
00:09:01.133
in detail, and gives detailed guidance on
00:09:04.000
how to handle malformed messages. If you
00:09:06.833
write networked applications, I strongly encourage you
00:09:09.666
to read it.
00:09:12.000
One of the key points made is
00:09:14.933
that networked applications work with data supplied
00:09:17.866
by un-trusted third parties.
00:09:19.533
As we’ve discussed, data read from the
00:09:22.566
network may not conform to the protocol
00:09:25.500
specification. This may be due to ignorance,
00:09:28.400
bugs, malice, or a desire to disrupt services.
00:09:33.266
One of the most critical lessons is
00:09:35.533
that you must carefully validate all data
00:09:38.466
received from the network before you make use of it.
00:09:41.300
Don’t trust arbitrary data that comes from
00:09:44.900
another device over the network. Check it
00:09:47.800
carefully, and make sure it contains what
00:09:50.700
you expect, before use.
00:09:52.366
This is especially important when working in
00:09:55.366
scripting languages, which often contain escape characters
00:09:58.266
that trigger special processing. The cartoon on
00:10:01.166
the slide is an example. The idea
00:10:04.066
is that the software processing the student’s
00:10:06.966
name sees the closing quote, and interprets
00:10:09.866
the rest of the name as an
00:10:12.766
SQL command to delete the student records
00:10:15.666
from the database.
00:10:17.000
It’s a silly example.
00:10:18.866
But it’s surprising how often similar problems,
00:10:22.200
known as SQL injection attacks, occur in practice.
00:10:25.500
And similar problems occur in many other
00:10:29.133
programming languages. This is not just an
00:10:32.233
SQL-related problem.
00:10:33.133
Be careful how you process data.
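One common defence against this class of problem is to pass untrusted values to the database as parameters, instead of pasting them into the SQL text. The Python sketch below uses the standard sqlite3 module with an in-memory database. The students table, the add_student helper, and the hostile name are assumptions made up for the example.

```python
import sqlite3


def add_student(conn: sqlite3.Connection, name: str) -> None:
    """Insert a name using a placeholder, so the name is treated purely as data."""
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

# A name containing a quote and SQL text, in the style of the cartoon.
hostile = "Robert'); DROP TABLE students;--"
add_student(conn, hostile)

# The table still exists, and the odd-looking name is stored verbatim.
rows = conn.execute("SELECT name FROM students").fetchall()
```

Because the name never becomes part of the SQL statement itself, the quote character has no special meaning and the rest of the name cannot be interpreted as a command.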
00:10:37.000
And, in general, be careful how you
00:10:40.900
write networked applications.
00:10:42.566
The network is hostile.
00:10:45.000
Any networked application is security critical.
00:10:48.333
Anything that receives data from the network
00:10:52.233
will be attacked.
00:10:54.000
When writing networked applications, carefully specify how
00:10:57.200
they should behave with both correct and
00:11:00.433
incorrect inputs. Carefully validate inputs and handle
00:11:03.633
errors. And check that your code behaves
00:11:06.866
as expected. Try to break your application,
00:11:10.066
before someone else does.
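A simple way to start trying to break your own code is to feed a parser random input and check that it only ever fails in the way you specified. The Python sketch below does this for a toy parser of a decimal port-number field. The names parse_port and fuzz, and the choice of field, are hypothetical, and real fuzzing tools are far more thorough than this.

```python
import random


def parse_port(field: bytes) -> int:
    """Parse a decimal TCP port number, rejecting anything unexpected."""
    if not (1 <= len(field) <= 5) or not field.isdigit():
        raise ValueError("port must be one to five ASCII digits")
    port = int(field.decode("ascii"))
    if not 1 <= port <= 65535:
        raise ValueError("port out of range")
    return port


def fuzz(parser, trials: int = 10_000, seed: int = 0) -> int:
    """Feed random byte strings to the parser and return how many were rejected.

    Any exception other than ValueError propagates, signalling a bug in the parser.
    """
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(0, 8)))
        try:
            result = parser(data)
        except ValueError:
            rejected += 1
        else:
            # Accepted inputs must still satisfy the documented range.
            assert 1 <= result <= 65535
    return rejected
```

Calling fuzz(parse_port) should complete without raising, since the only permitted failure mode is ValueError, and most random inputs should be rejected.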
00:11:12.000
If you’re writing your application using a
00:11:15.166
type- or memory-unsafe language, such as C
00:11:18.333
and C++, take extra care, since these
00:11:21.500
languages have additional failure modes.
00:11:23.766
It’s very easy to write a C
00:11:27.033
or C++ program that suffers from buffer
00:11:30.200
overflows, use after free bugs, race conditions, and so on.
00:11:33.300
Such bugs are almost certainly security vulnerabilities.
00:11:37.366
As a rule of thumb, if you’ve
00:11:39.633
written a C or C++ program,
00:11:41.633
and can cause it to crash with
00:11:43.400
a “segmentation violation” message, then that’s probably
00:11:46.433
exploitable as a security vulnerability.
00:11:49.500
Have you ever managed to write a
00:11:51.500
non-trivial C program that never crashes in that way?
00:11:56.000
This is why network programming is difficult.
00:11:59.333
The network, today, is an extremely hostile environment.
00:12:03.533
Networked applications are security critical,
00:12:05.600
and writing secure code is a very difficult skill.
00:12:10.466
If you have the choice, use popular, well-tested,
00:12:14.000
pre-existing software libraries for network protocols
00:12:17.133
where possible, especially do so for implementations
00:12:20.700
of security protocols such as TLS.
00:12:23.766
And make sure to update these libraries
00:12:27.300
regularly, because problems and security vulnerabilities are
00:12:30.866
found frequently.
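As an example of relying on an existing library, the Python sketch below uses the standard ssl module rather than implementing any part of TLS itself. The helper names make_client_context and fetch_tls_version are made up for this example, and requiring TLS 1.3 as the minimum version is an assumption that may be too strict for some servers.

```python
import socket
import ssl
from typing import Optional


def make_client_context() -> ssl.SSLContext:
    """Build a client context with certificate and host name checks enabled."""
    ctx = ssl.create_default_context()  # uses the system's trusted certificate authorities
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # assumption: refuse older versions
    return ctx


def fetch_tls_version(host: str, port: int = 443) -> Optional[str]:
    """Connect to host, complete the TLS handshake, and report the negotiated version."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        # server_hostname is used for the server name in the ClientHello and
        # for checking the certificate against the expected host name.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

The defaults matter here: the library verifies the server's certificate and host name for you, and keeping the library updated picks up fixes without changes to the application code.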
00:12:32.000
The best encryption in the world doesn’t
00:12:34.366
help if the endpoints can be
00:12:36.300
compromised and the data stolen before it’s encrypted.
00:12:43.000
This concludes our discussion of secure communications.
00:12:45.600
In the first part, I spoke about
00:12:48.300
the need for secure communication, and some
00:12:50.933
of the challenges and trade-offs in enabling security.
00:12:54.233
In the second part, I discussed the
00:12:57.266
principles of secure communication in abstract terms,
00:13:00.533
talking about symmetric and public key encryption,
00:13:03.800
and how these are combined to give
00:13:07.033
hybrid encryption protocols. I spoke about digital
00:13:10.300
signatures to authenticate data, and about public
00:13:13.566
key infrastructure and certificate authorities.
00:13:16.000
I spoke about the Transport Layer Security
00:13:19.366
protocol, TLS 1.3, that instantiates hybrid encryption
00:13:22.700
and digital signatures into a concrete network
00:13:26.066
protocol, that secures web traffic and other applications.
00:13:29.200
And, finally, I’ve discussed some issues to
00:13:31.900
consider when writing networked applications.
00:13:35.066
Ensuring communications security is a difficult problem.
00:13:39.266
It’s technically difficult, because you need to
00:13:42.566
write extremely robust software, and need to
00:13:44.466
design secure network protocols that use sophisticated
00:13:46.900
cryptographic mechanisms. And it’s politically difficult,
00:13:51.666
because there are some extremely sensitive policy
00:13:54.333
questions around what information should be protected,
00:13:56.966
and against whom.
00:14:00.000
The TLS 1.3 protocol is the current
00:14:02.700
state-of-the-art in secure communications. In the next
00:14:06.466
lecture, we’ll move on to further discuss
00:14:08.566
its limitations, and some of the ways
00:14:11.000
in which people are trying to improve
00:14:12.566
network security and performance.
Discussion
Lecture 3 discussed secure communication. It started with a discussion of
the need for security, and the issues with balancing security, privacy,
and the needs of law enforcement, regulatory compliance for businesses,
and the need to effectively manage networks. It then moved on to discuss
the principles by which secure communication can be achieved, via a mix
of symmetric and public key encryption and digital signatures. And it
outlined how these are used in the transport layer security protocol,
TLS.
The focus of the discussion will be to check your understanding of
the principles of security. How do symmetric and public key
encryption work, and how are they combined in practice? And how do
digital signatures work? The mathematics behind this work
is outside the scope of this course, and will not be discussed, but
the principles are important.
The discussion will also consider how TLS uses these techniques to
ensure security. How does the TLS handshake work? What
guarantees does TLS provide to applications? How does the use of 0-RTT
session resumption change those guarantees and what benefits does it
provide in exchange?
Finally, the discussion will also focus on the need to consider the
different impacts of providing secure communication. There are clear
benefits to providing security, but also some unexpected costs that can
lead to tension between users, vendors, network operators, businesses
and governments. The discussion will start to highlight some of these
issues. What should we encrypt? What are the trade-offs of
privacy vs law enforcement access? What doesn't encryption protect?