csperkins.org

Networked Systems H (2022-2023)

Lecture 3: Secure Communications

This lecture considers secure communications in the Internet. It reviews the need for security, and the principles of encryption, integrity protection, and authentication of messages. It explains the principles of operation of the Transport Layer Security Protocol (TLS), version 1.3, and how it protects Internet traffic. And it briefly reviews some of the issues around writing secure software.

Part 1: Secure Communications

The 1st part of this lecture discusses the need for security in Internet communications. It reviews why end-to-end encryption and message integrity protection are essential to protect Internet users from eavesdropping, identity theft, fraud, and other attacks. And it discusses some of the tensions and concerns that have been raised about the provision of such protection.

Slides for part 1

 

00:00:00.766 In the last lecture, I discussed the behavior of TCP

00:00:04.100 and some issues around connection establishment.

 

00:00:07.700 One of these issues was the observation

00:00:09.600 that establishing a secure connection, using TLS,

00:00:12.533 was slower than establishing an insecure connection.

 

00:00:16.100 In this lecture, I want to talk more about TLS

00:00:19.400 and about security in general.

 

00:00:23.400 In this first part,

00:00:24.400 I'll talk about why security is important,

00:00:26.900 and why we need to secure communications.

 

00:00:30.200 Then, in part two,

00:00:31.800 I'll talk about the principle of secure communication

00:00:34.833 and the cryptographic techniques

00:00:36.533 that can be used to protect data.

 

00:00:39.866 Part three of the lecture will describe

00:00:41.600 some of the behavior of the transport layer security

00:00:44.033 protocol, that provides security for most Internet traffic.

 

00:00:48.100 And, finally, in part four,

00:00:50.033 I'll talk about some general issues around network security,

00:00:53.500 and how to write secure networked applications.

 

00:00:59.133 So why do we need secure communications?

 

00:01:03.300 Well, the fundamental problem

00:01:05.566 is that it's possible to eavesdrop on network traffic.

 

00:01:09.866 This can be done by wiretapping the network links

00:01:12.666 down which the data flows,

00:01:14.366 or it can be done by configuring the network routers

00:01:17.333 to save a copy of the packets they forward.

 

00:01:20.700 The result is that traffic passing across the network

00:01:23.700 can be monitored by third parties.

 

00:01:26.666 If you want to ensure that the data you send

00:01:28.866 across the network is private,

00:01:30.566 then that data needs to be encrypted somehow.

 

00:01:34.700 Similarly, network routers can modify

00:01:37.133 the packets they forward.

 

00:01:39.566 This means that the router can change the data

00:01:42.000 being delivered without the consent of the sender.

 

00:01:45.533 The sender cannot stop this happening.

00:01:47.966 But they can add some message integrity protection,

00:01:51.000 such as a digital signature,

00:01:53.100 to allow the receiver to detect and reject

00:01:55.700 messages that have been tampered with.
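The lecture mentions a digital signature as one form of message integrity protection. As a minimal sketch of the same idea, the Python below swaps in a keyed message authentication code (HMAC-SHA256, from the standard library `hmac` module) rather than a true public-key signature, and assumes the sender and receiver already share a secret key. The helper names `protect` and `verify` are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 produces a 32-byte tag


def protect(key: bytes, message: bytes) -> bytes:
    """Append an HMAC-SHA256 tag so the receiver can detect tampering."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return message + tag


def verify(key: bytes, packet: bytes) -> bytes:
    """Return the message if its tag is valid, otherwise raise ValueError."""
    message, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    # compare_digest avoids leaking timing information about the tag
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message has been tampered with")
    return message


if __name__ == "__main__":
    key = b"shared secret key"
    packet = protect(key, b"pay Alice 10 pounds")
    print(verify(key, packet))                   # b'pay Alice 10 pounds'
    tampered = packet.replace(b"10", b"99", 1)   # a router rewrites the amount
    try:
        verify(key, tampered)
    except ValueError as err:
        print("rejected:", err)
```

The router can still modify the bytes, but without the key it cannot compute a matching tag, so the receiver rejects the altered message.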

 

00:01:59.266 Finally, there are numerous devices in the network,

00:02:02.533 known as middle boxes,

00:02:03.933 that try to improve communication

00:02:06.266 by somehow interpreting or modifying the data being sent.

 

00:02:10.933 For example, we spoke about network address

00:02:13.800 translation in the last lecture

00:02:15.733 where a NAT router rewrites the addresses and ports

00:02:18.566 in TCP/IP headers to allow several machines

00:02:21.800 to share a single IP address.

 

00:02:25.733 Other examples include network firewalls,

00:02:28.233 that monitor traffic and try and prevent bad traffic

00:02:30.933 from entering a network,

00:02:32.266 as well as the various accelerator devices

00:02:34.666 that try to improve the performance of TCP

00:02:36.800 connections running over satellite links.

 

00:02:40.300 If not carefully maintained,

00:02:42.033 these devices tend to lead to network ossification,

00:02:45.700 where they tend to limit the ability to

00:02:47.833 change network protocols.

 

00:02:50.633 A final role of secure communications

00:02:53.633 is therefore to limit the ability of such devices to inspect

00:02:56.600 and act on the traffic,

00:02:58.100 so helping to ensure that the network

00:03:00.266 can continue to evolve.

 

00:03:05.933 A lot of different organizations monitor the network,

00:03:09.166 for many different reasons.

 

00:03:13.000 These include governments, intelligence agencies,

00:03:15.600 and law enforcement agencies.

 

00:03:17.933 For example, the police have to monitor the network

00:03:21.366 as part of their crime prevention activities;

00:03:24.200 domestic intelligence agencies inspect traffic

00:03:26.966 to protect against terrorism, or to monitor foreign targets;

00:03:30.733 and foreign intelligence agencies might try to

00:03:33.533 spy on domestic targets.

 

00:03:36.800 That this happens shouldn't be a surprise.

 

00:03:40.666 And there are clearly good reasons for some of this monitoring.

 

00:03:46.366 Many people would agree, I think,

00:03:48.166 that targeted wiretaps on suspected criminals,

00:03:51.133 subject to appropriate oversight,

00:03:53.233 the need to obtain a warrant of some sort,

00:03:55.500 and when there's probable cause,

00:03:57.633 are probably not unreasonable.

 

00:04:01.500 Relatively few people would object

00:04:03.800 to actively monitoring the network traffic of those

00:04:06.466 actively suspected of being engaged in serious crimes,

00:04:09.800 terrorist activities, child abuse, and so on.

 

00:04:14.866 People differ on what crimes they consider serious,

00:04:18.433 or on the standards of probable cause,

00:04:20.966 or on the amount of oversight needed.

 

00:04:24.000 But all societies accept some degree of monitoring

00:04:26.633 and oversight of network traffic.

 

00:04:30.566 However, Edward Snowden showed that

00:04:33.833 some intelligence agencies, including,

00:04:36.900 but certainly not limited to the five eyes,

00:04:39.500 the UK, the US, Canada, Australia, and New Zealand,

00:04:43.733 were conducting pervasive monitoring of all network traffic.

 

00:04:49.400 Other governments are also known to conduct such monitoring.

00:04:53.100 The great firewall of China is a common example,

00:04:56.100 along with monitoring by Russia,

00:04:57.966 Iran, Saudi Arabia, and others.

 

00:05:02.166 Many felt that this indiscriminate monitoring

00:05:04.933 of all network traffic without probable cause or suspicion,

00:05:08.500 was a step too far.

 

00:05:11.766 In part, I think this came from distrust

00:05:14.433 of those governments, their motives,

00:05:16.366 and how they might use the data.

 

00:05:19.433 The people they were supposed to represent were unconvinced

00:05:22.200 that the monitoring was actually doing them good.

 

00:05:25.833 But, in part, there was also the realization

00:05:28.833 that if supposedly friendly governments

00:05:30.933 were monitoring traffic indiscriminately,

00:05:33.266 then so were others.

 

00:05:36.166 Even if I completely trust our government

00:05:39.033 to monitor Internet traffic only for good reasons,

00:05:41.833 the fact that they're able to monitor that traffic

00:05:45.133 means that others are able to do so too.

 

00:05:48.266 And those others might not have my best interests at heart.

 

00:05:53.633 This led to a push to enable pervasive encryption,

00:05:56.766 to encrypt more and more of the traffic

00:05:58.933 crossing the Internet.

 

00:06:01.333 The most visible manifestation of this

00:06:03.700 is that most websites now use HTTPS

00:06:06.433 and encrypt their traffic.

 

00:06:07.933 But the spread of encryption has been wider than the web.

 

00:06:12.400 The result is that most Internet traffic

00:06:14.800 is now encrypted by default,

00:06:16.700 hindering, but not preventing, pervasive monitoring.

 

00:06:23.200 Governments are not the only organizations

00:06:25.833 to monitor network traffic, of course.

 

00:06:29.100 We've all contacted a business and been told that our

00:06:31.833 call may be monitored for quality and training purposes.

 

00:06:36.733 Some of this monitoring by businesses is necessary

00:06:39.500 for regulatory compliance.

 

00:06:42.133 Banking and insurance industries, for example,

00:06:44.833 require records to be kept in most cases, to prevent fraud.

 

00:06:49.433 There are good reasons for some of this monitoring.

 

00:06:53.833 Other aspects of monitoring and tracking by

00:06:56.100 businesses are perhaps less beneficial.

 

00:06:59.333 Targeted advertising and customer profiling are

00:07:02.300 frequently cited as problematic, for example.

 

00:07:06.300 Communication security measures, such as encryption,

00:07:09.600 can help reduce such unwanted monitoring,

 

00:07:13.800 though the effect is small, since this type of

00:07:16.066 monitoring and tracking is often delivered

00:07:18.133 by the sites we intentionally visit,

00:07:20.233 rather than by snooping on communications.

 

00:07:27.300 We also see network operators

00:07:29.733 monitoring traffic on the networks they operate.

 

00:07:33.400 Again, there are both beneficial,

00:07:35.800 and problematic, reasons for this.

 

00:07:39.366 Network operators monitor traffic

00:07:41.533 to understand how well their networks are operating,

00:07:44.333 and whether they're meeting their quality of service goals.

 

00:07:48.800 It's common, for example,

00:07:50.533 for network operators to inspect

00:07:52.400 the sequence and acknowledgement numbers

00:07:54.433 in the headers of TCP packets traversing their networks.

 

00:07:59.000 This lets them understand if packets are being lost,

00:08:01.766 or if the time taken for packets to traverse

00:08:04.333 the network is building up,

00:08:06.400 both of which are signs that the network

00:08:08.166 is becoming overloaded.

 

00:08:11.166 This helps the operators decide when to reroute traffic

00:08:14.366 onto less busy paths, or when to install

00:08:17.066 more network capacity to keep good performance.
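To make the operator's inference concrete, here is a minimal Python sketch of passive measurement from TCP headers. It assumes a monitor that sees both directions of one connection as a time-ordered list of observations: data segments `("data", time_ms, seq, length)` and acknowledgements `("ack", time_ms, ack_no)`. The event format and the `analyse` function are hypothetical, and the sketch ignores sequence-number wraparound, SYN/FIN, and SACK. It takes an RTT sample when an ACK exactly matches the end of a segment seen only once, and counts a segment whose bytes were already seen as a likely retransmission, a sign of loss. Note that the RTT it reports covers only the path from the monitoring point to the receiver and back.

```python
def analyse(events):
    """Estimate RTT samples and count likely retransmissions for one TCP flow.

    events: time-ordered tuples, either ("data", time_ms, seq, length) for
    sender-to-receiver segments, or ("ack", time_ms, ack_no) for
    receiver-to-sender acknowledgements.
    """
    first_seen = {}      # end sequence number -> time the data was first observed
    highest_end = None   # highest end sequence number observed so far
    rtt_samples = []
    retransmissions = 0

    for event in events:
        if event[0] == "data":
            _, t, seq, length = event
            end = seq + length
            if highest_end is not None and end <= highest_end:
                # These bytes were already seen: probably a retransmission after loss.
                retransmissions += 1
                # An ACK for them would be ambiguous, so take no RTT sample
                # (in the spirit of Karn's algorithm).
                first_seen.pop(end, None)
            else:
                first_seen[end] = t
                highest_end = end
        else:
            _, t, ack_no = event
            if ack_no in first_seen:
                rtt_samples.append(t - first_seen[ack_no])
            # A cumulative ACK covers every segment ending at or before ack_no.
            for end in [e for e in first_seen if e <= ack_no]:
                del first_seen[end]

    return rtt_samples, retransmissions


if __name__ == "__main__":
    events = [
        ("data", 0, 1000, 100),
        ("data", 10, 1100, 100),
        ("data", 20, 1200, 100),   # lost somewhere after the monitor
        ("ack", 50, 1100),
        ("ack", 60, 1200),
        ("data", 300, 1200, 100),  # retransmission of the lost segment
        ("ack", 350, 1300),
    ]
    print(analyse(events))  # ([50, 50], 1)
```

A rising trend in the RTT samples, or a growing retransmission count, is the kind of signal that would prompt an operator to reroute traffic or add capacity.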

 

00:08:20.566 And few would argue that this sort of

00:08:22.733 monitoring is a problem.

 

00:08:26.066 On the other hand, operators can monitor traffic

00:08:29.200 to profile what sites their customers are visiting.

 

00:08:32.600 This information could then be sold to advertisers,

00:08:35.533 or could be used to negatively influence

00:08:37.900 the performance of the traffic.

 

00:08:40.500 For example, an operator might choose to lower the

00:08:43.100 priority of Netflix traffic

00:08:44.666 for customers who haven't signed up

00:08:46.133 to their video streaming package.

 

00:08:49.433 Many people are less comfortable with such behaviors,

00:08:52.566 and communication security measures can limit

00:08:55.066 their effectiveness.

 

00:08:59.300 Finally, of course, there are criminals and malicious users

00:09:02.666 that try to steal data and user credentials,

00:09:05.566 that try to perform identity theft,

00:09:07.600 or conduct other attacks.

 

00:09:10.533 Communication security clearly cannot prevent

00:09:13.366 all such attacks, but it can limit their scope

00:09:16.800 by limiting the amount of information that's available

00:09:19.466 and visible to those monitoring the networks.

 

00:09:26.366 As a result of these various attacks,

00:09:28.366 there are a range of measures that can be deployed

00:09:30.666 that can help to protect

00:09:31.800 privacy by encrypting network traffic.

 

00:09:35.500 Unfortunately, what makes this problem space challenging,

00:09:39.333 is that the mechanisms used to protect

00:09:41.433 against malicious attacks also prevent benign monitoring.

 

00:09:46.700 There's no known way to stop criminals

00:09:49.066 and malicious attackers from accessing private data

00:09:52.066 that doesn't also stop legitimate law enforcement

00:09:55.100 from doing so, for example.

 

00:10:01.533 In addition to monitoring and observing data

00:10:03.666 as it traverses the network,

00:10:05.266 many organizations might also try to modify messages.

 

00:10:10.266 Governments and law enforcement, for example,

00:10:13.266 might require ISPs to censor,

00:10:15.466 or modify, DNS responses

00:10:17.300 to restrict access to certain sites.

 

00:10:20.133 They might require DNS responses to be modified

00:10:23.000 to indicate that certain sites don't exist,

00:10:25.633 or to change the address in the DNS response

00:10:28.366 to direct users to a page indicating that the

00:10:30.700 content is blocked.

 

00:10:33.700 Alternatively, governments might require ISPs

00:10:36.166 and network operators to block or rewrite traffic

00:10:39.266 containing certain content.

 

00:10:43.166 As with government traffic monitoring,

00:10:45.266 there can be reasonable, and unreasonable,

00:10:47.566 reasons for governments to modify messages.

 

00:10:51.833 Many countries have widely accepted laws

00:10:54.466 about restricting hate speech,

00:10:56.366 blocking child pornography,

00:10:58.166 or preventing terrorism.

 

00:11:01.033 Part of the implementation of such laws

00:11:03.433 is often by modifying DNS responses

00:11:06.033 to limit access to certain sites.

 

00:11:09.966 The same techniques can, of course,

00:11:12.400 also be used to block other types of content,

00:11:15.166 or restrict other kinds of speech.

 

00:11:19.833 Businesses and network operators might also block

00:11:22.700 or modify content.

 

00:11:24.700 The DNS server in a cafe, or a train,

00:11:27.633 that redirects you to a sign up page,

00:11:29.400 and asks for payment before letting you browse the web

00:11:32.300 on their Wi-Fi is an example.

 

00:11:35.400 Other examples might be services that filter spam

00:11:38.266 or block malicious attachments,

00:11:40.200 that enforce terms of service,

00:11:42.166 or that try to prevent copyright infringement.

 

00:11:46.666 And finally, of course, there are criminals,

00:11:48.666 and malicious users,

00:11:50.166 people modifying content to conduct phishing scams,

00:11:53.100 steal identity, mislead, and defraud.

 

00:11:57.800 And, again, what makes this problem space challenging

00:12:01.433 is that mechanisms that protect message integrity

00:12:03.966 against malicious attackers

00:12:05.900 also prevent benign modification.

 

00:12:10.400 For example, a recent development

00:12:12.833 in network security is DNS over HTTPS.

00:12:17.300 This is an approach to encrypting DNS traffic

00:12:20.233 that was designed to protect users from phishing attacks

00:12:23.500 where an attacker on the local network

00:12:25.566 spoofs DNS responses to perform identity theft.

 

00:12:29.633 It does this successfully.

 

00:12:32.900 Unfortunately, some Internet service providers in the UK

00:12:37.333 intentionally spoofed DNS responses

00:12:40.766 to block access to sites hosting child abuse material,

00:12:44.100 as part of a government-mandated blocklist.

 

00:12:49.200 Encrypting DNS traffic using DNS over HTTPS

00:12:53.766 to protect against identity theft

00:12:57.733 unintentionally also prevented

00:13:00.333 the child abuse block list from working,

00:13:02.300 since both relied on the same vulnerability in DNS.

 

00:13:07.433 And again, this is an area where there are difficult questions,

00:13:10.933 and it's not clear we have all the right answers.

 

00:13:18.266 The final reason for securing communications

00:13:20.900 relates to protocol ossification.

 

00:13:24.233 It's common for network operators to deploy middle boxes,

00:13:27.300 of various sorts, to monitor and modify traffic.

 

00:13:32.600 These can be devices such as NATs and firewalls,

00:13:35.400 traffic shapers, filters, or protocol accelerators.

 

00:13:39.666 And these middle boxes need to understand the traffic

00:13:42.566 they're observing or modifying.

 

00:13:45.166 For example, in order to translate IP addresses and ports,

00:13:48.966 a NAT needs to know the format of an IP packet,

00:13:52.500 and where the ports are located in the TCP and UDP headers.
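As a rough illustration of this layering violation, the Python sketch below pulls the addresses and ports out of a raw IPv4 packet, as a NAT must before it can rewrite them. It assumes an IPv4 packet carrying TCP (protocol 6) or UDP (protocol 17); in both, the source and destination ports are the first four bytes of the transport header, which starts after the variable-length IP header. The function name `flow_of` is hypothetical, and a real NAT would also have to rewrite these fields and fix up the IP and TCP/UDP checksums, which this sketch omits.

```python
import socket
import struct


def flow_of(packet: bytes):
    """Return (protocol, src_addr, src_port, dst_addr, dst_port) from an IPv4 packet."""
    version = packet[0] >> 4
    if version != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (packet[0] & 0x0F) * 4   # IHL field counts 32-bit words
    protocol = packet[9]                  # 6 = TCP, 17 = UDP
    if protocol not in (6, 17):
        raise ValueError("not a TCP or UDP packet")
    src_addr = socket.inet_ntoa(packet[12:16])
    dst_addr = socket.inet_ntoa(packet[16:20])
    # Here the NAT has to look past the IP header into the transport header.
    src_port, dst_port = struct.unpack("!HH", packet[header_len:header_len + 4])
    return protocol, src_addr, src_port, dst_addr, dst_port


if __name__ == "__main__":
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 40, 0, 0, 64, 6, 0,   # version 4, IHL 5, length 40, TTL 64, TCP
        socket.inet_aton("192.168.1.10"),
        socket.inet_aton("198.51.100.7"),
    )
    tcp_stub = struct.pack("!HH", 54321, 443) + bytes(16)  # ports, rest zeroed
    print(flow_of(ip_header + tcp_stub))
    # (6, '192.168.1.10', 54321, '198.51.100.7', 443)
```

If a new transport protocol put its ports somewhere else, or encrypted them, this parsing would fail, which is exactly why such middle boxes make protocols hard to change.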

 

00:13:57.500 Equally, a traffic shaping device,

00:14:00.100 intended to limit the throughput of TCP connections

00:14:03.000 for a particular user,

00:14:04.366 needs to understand the congestion control

00:14:06.933 algorithm used by TCP,

00:14:08.800 otherwise how can it influence

00:14:10.766 the sending rate of a connection?

 

00:14:14.700 This means that the network becomes more complex.

 

00:14:18.566 It means that devices in the network no longer just look at

00:14:21.766 the IP headers and forward the packets

00:14:23.800 based on the destination address.

00:14:26.066 They also understand details of TCP and UDP,

00:14:29.866 and other protocols,

00:14:31.433 and observe, inspect, and modify those protocols too.

 

00:14:36.300 And this leads to a problem known as protocol ossification,

00:14:40.566 where it becomes difficult to change the protocols

00:14:43.766 running between the endpoints,

00:14:45.533 because doing so interacts poorly with middle boxes

00:14:48.333 that don't understand the new version of the protocol.

 

00:14:52.400 For example, it'd be very difficult to change the format

00:14:55.500 of the TCP header now, even if we could

00:14:58.566 upgrade all the systems to support the new version,

00:15:01.400 because of all the NATs and firewalls

00:15:03.833 that would also need updating.

 

00:15:07.300 This protocol ossification,

00:15:09.466 where the network learns about the transport

00:15:11.766 and higher layer protocols,

00:15:13.366 effectively prevents those protocols from being upgraded,

00:15:16.733 and occurs because the network has visibility

00:15:19.466 into those protocols.

 

00:15:22.933 Encryption offers one way to prevent ossification.

 

00:15:27.700 The more of a protocol that's encrypted,

00:15:30.033 the easier it is to change that protocol,

00:15:32.566 since the encryption will have stopped middleboxes

00:15:35.166 from understanding or modifying the data.

 

00:15:39.200 There's a trade off, though,

00:15:40.933 between the ability to change end-to-end protocols

00:15:43.733 and the ability of the network to offer helpful features.

 

00:15:47.866 The more of a protocol that's encrypted,

00:15:50.200 the easier it is to change the protocol.

00:15:53.300 But the harder it is for middle boxes

00:15:55.400 to provide help from the network.

 

00:15:58.733 The draft shown on the slide,

00:16:00.966 on "Long-term viability of protocol extension mechanisms",

00:16:04.266 talks about these issues further,

00:16:06.100 and talks about how to extend and modify protocols

00:16:08.866 and ensure that protocols remain changeable.

00:16:11.466 It's very much worth reading.

 

00:16:18.366 As we've seen there are good reasons to encrypt

00:16:21.900 and authenticate data.

 

00:16:24.533 Doing so helps to provide privacy,

00:16:26.733 it helps to prevent fraud,

00:16:28.366 and it helps to allow protocols to evolve

00:16:30.600 while avoiding network ossification.

 

00:16:34.033 Providing security in this way is a good thing,

00:16:36.733 but there are always trade-offs,

00:16:38.300 and I've tried to highlight some of these.

 

00:16:41.433 In particular, it's always possible to find examples

00:16:45.033 where providing security to protect against some attacker

00:16:48.633 will prevent some beneficial monitoring or service.

 

00:16:53.533 There are no easy solutions here.

 

00:16:58.500 It's easy to argue that we must encrypt everything

00:17:01.700 to ensure privacy,

00:17:03.100 missing that this causes some real problems.

 

00:17:07.433 Equally, it's easy to argue that law enforcement

00:17:10.766 should have exceptional access to communications,

00:17:13.533 to help prevent terrorism and child abuse, for example,

00:17:16.833 missing that there are very real risks that this will cause

00:17:20.500 serious other problems.

 

00:17:24.466 We need more dialogue between engineers,

00:17:27.400 protocol designers, network operators,

00:17:30.100 policymakers, and law enforcement,

00:17:32.566 to better understand the constraints and the concerns.

 

00:17:38.200 The "Keys Under Doormats" paper, linked from the slide,

00:17:41.233 talks about these issues in more detail,

00:17:43.433 and I very much encourage you to read it.

 

00:17:48.933 Finally, as more and more data is encrypted and protected,

00:17:52.900 we're also starting to see increasing discussion

00:17:55.700 of end system based content monitoring.

 

00:17:59.866 The argument here is that encryption is important

00:18:02.733 to prevent attacks by malicious users,

00:18:05.400 but that law enforcement need access to protect us.

 

00:18:09.000 But, since effective encryption prevents law enforcement

00:18:12.466 from monitoring traffic on the network,

00:18:14.400 then maybe they should be able to monitor the traffic

00:18:16.833 on the end systems, after it's traversed the network.

 

00:18:20.866 And there's a certain appeal to this.

 

00:18:24.933 If done correctly, the encryption provides

00:18:27.833 protection against a large class of attacks,

00:18:30.500 and correct implementation of end-system based monitoring

00:18:34.033 limits who can monitor traffic

00:18:35.633 to those with legitimate needs and legitimate authority.

 

00:18:40.200 And, in some cases that's an appropriate compromise.

 

00:18:45.100 It doesn't seem problematic for social networks

00:18:48.033 like Facebook, for example,

00:18:49.566 to support law enforcement in monitoring their network

00:18:52.933 to detect people sharing child abuse material.

 

00:18:56.833 But,

00:18:58.633 as Apple found out when they announced that they were

00:19:00.900 to implement similar monitoring running on iPhones

00:19:03.533 for one-to-one and group iMessage chats,

00:19:07.433 the expectations around privacy,

00:19:09.800 law enforcement access, and abuse protection,

00:19:12.633 vary very much between social networks,

00:19:15.566 one-to-one communications,

00:19:17.466 group communications, and public posts.

 

00:19:20.366 And the boundaries between these categories,

00:19:22.866 and what's acceptable in terms of monitoring

00:19:25.466 and protection and privacy,

00:19:27.400 can be very hard to distinguish.

 

00:19:31.333 And again, there are some difficult questions

00:19:33.600 relating to what type of privacy protection

00:19:36.133 and what type of monitoring is technically

00:19:38.500 possible to implement on end-systems,

00:19:40.933 and what's socially acceptable,

00:19:43.033 and what's desirable.

 

00:19:46.100 And the paper on the slide,

00:19:48.600 "Bugs in our pockets",

00:19:49.700 talks about this issue in a lot more detail.

 

00:19:56.133 So that wraps up the discussion of why

00:19:58.766 secure communication is needed.

 

00:20:02.166 Network traffic is frequently monitored

00:20:04.866 by governments, businesses,

00:20:07.166 network operators, and malicious users.

00:20:10.466 Some of this monitoring is beneficial,

00:20:13.100 some of it less so.

 

00:20:15.966 In the following parts, I'll talk about

00:20:18.200 the technologies we can use to provide privacy,

00:20:21.433 to protect message integrity,

00:20:23.266 and to prevent protocol ossification.

Part 2: Principles of Secure Communication

The 2nd part of the lecture reviews the principles of secure communication. It describes the concepts behind symmetric, public-key, and hybrid cryptography. It outlines techniques for message integrity protection and authentication including cryptographic hash functions and digital signatures. And it reviews the need for a public key infrastructure.

Slides for part 2

 

00:00:00.233 In this part, I want to talk

00:00:02.200 about some of the principles of secure

00:00:04.366 communication. I’ll talk about how we go

00:00:06.566 about ensuring confidentiality of messages as they

00:00:08.733 traverse the network.

 

00:00:09.766 About how we authenticate messages to ensure

00:00:12.500 that they're not modified in transit,

00:00:14.800 and about how we can go about

00:00:17.500 validating the identity of the participants in

00:00:20.233 a communication.

 

00:00:21.100 So what are the goals of secure communication?

 

00:00:24.933 Well, we're trying to deliver a message

00:00:27.066 across the internet from a sender to a receiver.

 

00:00:30.600 In the process we want to avoid

00:00:32.833 eavesdropping on the message – we need

00:00:35.033 to encrypt it in order to provide

00:00:37.266 confidentiality, to make sure no one other

00:00:39.500 than the intended receiver can have access

00:00:41.700 to the content of the message.

 

00:00:43.700 We want to avoid tampering with the

00:00:45.666 message – we need to authenticate the

00:00:47.600 message to ensure that it's not modified

00:00:49.533 in transit by any of the devices

00:00:51.466 which are involved in the

00:00:53.433 delivery of that message.

 

00:00:54.633 And we want to avoid spoofing –

00:00:57.233 we want to somehow validate the identity

00:00:59.800 of the sender, so that the receiver

00:01:01.733 knows, and can be sure of who the message came from.

 

00:01:07.000 So how do we go about providing confidentiality?

 

00:01:10.300 Well unfortunately data traversing the network can

00:01:13.033 be read by any of the devices

00:01:15.033 on the path between the sender and the receiver.

 

00:01:17.566 It's possible to eavesdrop on packets as

00:01:19.266 they traverse the links that comprise the

00:01:21.000 network. And it's also possible to configure

00:01:23.066 the switches or routers to snoop on

00:01:25.166 the data as they're forwarding it between

00:01:27.266 the different links in the network.

 

00:01:29.333 The network operator can always do this.

00:01:32.366 They own the network;

00:01:33.933 they can configure the devices to save

00:01:36.000 a copy of the data if they choose to do so.

 

00:01:38.600 If the network's been compromised, then maybe others can too.

 

00:01:42.333 If an attacker can break

00:01:43.800 into the routers, for example, there's nothing

00:01:46.500 stopping them saving the data, redirecting copies

00:01:49.200 of data traversing the network to some other location.

 

00:01:52.800 If the data can always be read,

00:01:55.366 how do we provide confidentiality?

 

00:01:57.300 Well, we use encryption to make sure

00:01:59.400 that the data is useless if it's

00:02:01.500 intercepted or copied. We can't stop an

00:02:03.600 attacker, or the network operator, from reading

00:02:05.700 our data. But we can make sure

00:02:07.333 that they can't make sense of it

00:02:09.166 if they do read it.

 

00:02:11.500 There are two basic approaches to providing encryption.

 

00:02:15.233 The first is called symmetric cryptography.

 

00:02:18.066 Algorithms such as the Advanced Encryption Standard, AES.

 

00:02:22.133 The other approach is what's known as

00:02:24.466 public key cryptography.

00:02:25.700 Algorithms such as the

00:02:27.033 Diffie-Hellman algorithm, the RSA algorithm, and elliptic

00:02:30.100 curve algorithms.

 

00:02:31.700 They have quite different properties and are

00:02:34.200 used in different situations. I’ll talk about

00:02:36.700 the details and the differences between them in a minute.

 

00:02:40.366 Both of them are based on some

00:02:42.666 fairly complex mathematics. I'm not going to

00:02:44.933 attempt to describe how that works.

 

00:02:47.066 What's important is not the details of

00:02:49.133 the maths, but what their properties are,

00:02:51.433 what behaviours they provide, and how

00:02:53.300 they help us secure data as it traverses the network.

 

00:02:57.366 So we’ll start with the idea of symmetric cryptography.

 

00:03:01.300 The idea of symmetric encryption is that

00:03:03.566 it can convert plain text into cipher

00:03:05.833 text with the aid of a key.

 

00:03:08.700 If you have, for example, the plain

00:03:10.433 text as we see on the top-right

00:03:12.666 of the slide, and we pass it

00:03:14.933 through the encryption algorithm, in this case,

00:03:17.166 the AES, Advanced Encryption Standard, algorithm, with the

00:03:19.400 aid of an encryption key, we get

00:03:21.633 a blob of encrypted text as we

00:03:23.900 see it in the middle.

 

00:03:25.600 If we pass that encrypted text through

00:03:28.700 the inverse algorithm, the decryption algorithm,

00:03:31.333 using the same key, then we get

00:03:34.433 the original text back out.

 

00:03:36.766 The point is that a single secret

00:03:39.200 key controls both the encryption and the

00:03:41.666 decryption process. The key used to encrypt

00:03:44.100 is the same as the key used

00:03:46.566 to decrypt.
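Python's standard library has no AES implementation, so the sketch below swaps in a deliberately simple toy stream cipher to show the one property that matters here: the same secret key both encrypts and decrypts. It derives a keystream by hashing the key, a nonce, and a counter with SHA-256, and XORs that keystream with the data. This is an illustration only, not AES, and not something to use for real traffic: it provides no integrity protection, and a nonce must never be reused with the same key. The function names are hypothetical.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter) blocks, concatenated."""
    blocks = bytearray()
    counter = 0
    while len(blocks) < length:
        blocks += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(blocks[:length])


def toy_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream (NOT AES; illustration only)."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, nonce, len(data))))


# XORing with the same keystream a second time restores the original bytes,
# so decryption is the same operation using the same key.
toy_decrypt = toy_encrypt


if __name__ == "__main__":
    key = secrets.token_bytes(32)     # the shared secret
    nonce = secrets.token_bytes(16)   # must be unique per message under this key
    ciphertext = toy_encrypt(key, nonce, b"meet me at noon")
    print(ciphertext.hex())
    print(toy_decrypt(key, nonce, ciphertext))   # b'meet me at noon'
```

Anyone holding `key` can decrypt, which is why the rest of this part turns to the problem of getting that key to the receiver securely.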

 

00:03:47.366 Now, provided the key is kept secret,

00:03:49.900 and it's known only to the sender

00:03:52.433 and receiver, this can be very secure,

00:03:54.933 and it can be very fast.

 

00:03:57.200 Symmetric algorithms such as AES can encrypt

00:04:00.433 and decrypt many gigabits per second.

00:04:03.200 This makes them very suitable for Internet

00:04:06.433 communications because they don't slow down the

00:04:09.666 communications, while still providing security.

 

00:04:12.100 There are a wide range of different

00:04:15.333 symmetric encryption algorithms; probably the most widely

00:04:18.566 used is the US Advanced Encryption Standard, AES.

 

00:04:22.600 The AES algorithm was developed as part

00:04:24.933 of the output of an open competition,

00:04:27.533 run by the US National Institute of

00:04:30.100 Standards and Technology, and it's actually a Belgian algorithm

00:04:32.700 known as Rijndael.

 

00:04:33.900 Importantly, the AES algorithm, the Rijndael algorithm,

00:04:36.700 is public and the security of the

00:04:39.533 algorithm depends only on keeping the key

00:04:42.333 secret, not on keeping the algorithm itself secret.

 

00:04:45.966 The link on the slide is a

00:04:47.966 pointer to the specification for the algorithm,

00:04:50.266 and there’s a large amount of open

00:04:52.566 source code which implements it.

 

00:04:54.333 The problem of symmetric cryptography is that

00:04:56.900 you need to keep the key secret.

00:04:59.500 If anyone other than the sender and

00:05:02.066 the receiver knows the key, then the

00:05:04.666 security of the encryption fails.

 

00:05:06.600 The question then, is how do you

00:05:09.100 securely distribute the key? If I want

00:05:11.600 to exchange a secure message with

00:05:14.100 someone I know well, then this is

00:05:16.600 straightforward. I can meet them in person,

00:05:19.100 give them the key, and ensure that

00:05:21.600 no one else can eavesdrop on that communication.

 

00:05:24.833 The problem comes when I'm trying to

00:05:26.700 communicate securely with someone where I can't

00:05:28.833 meet them in person.

 

00:05:30.166 How do I securely get a key

00:05:32.400 from an Internet shopping site, for example?

00:05:34.666 The only means of communication I have

00:05:36.900 is over the Internet. And if I

00:05:39.166 send the key over the Internet,

00:05:41.066 someone can eavesdrop on the key,

00:05:43.000 and that gives them the ability to

00:05:45.266 decrypt our communications and breaks the security.

 

00:05:47.600 The solution to this is an approach

00:05:50.466 known as public key cryptography.

 

00:05:52.600 Public key cryptography, like symmetric cryptography,

00:05:54.833 is used to convert a plaintext

00:05:57.466 message into an encrypted form. The difference,

00:06:00.100 though, is that there are two different

00:06:02.733 keys: the key used to encrypt

00:06:05.333 the message is different from the key

00:06:07.966 used to decrypt it.

 

00:06:09.566 The keys come in pairs. The two

00:06:11.633 halves of the pair are known as

00:06:13.733 the public key and the private key.

 

00:06:15.900 Importantly, a message which is encrypted using

00:06:18.466 one of those keys can only be

00:06:21.000 decrypted using the other key. If the

00:06:23.566 message is encrypted with the public key,

00:06:26.133 for example, then only the private key

00:06:28.666 can decrypt that message.
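
The key-pair idea can be seen in textbook RSA, one of the public key algorithms named later in this part, using the tiny primes from the common textbook example (61 and 53). These numbers are far too small to be secure, and real RSA also adds padding; the sketch only shows that what one key does, the other key undoes.

```python
# Toy key pair from two tiny primes (far too small to be secure).
p, q = 61, 53
n = p * q                  # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)        # publish this
private_key = (d, n)       # keep this secret

def apply_key(key, m: int) -> int:
    """Textbook RSA: raise m to the key's exponent modulo n."""
    exponent, modulus = key
    return pow(m, exponent, modulus)

message = 65
ciphertext = apply_key(public_key, message)     # encrypt with the public key
recovered = apply_key(private_key, ciphertext)  # only the private key undoes it
```

With these values the private exponent is 2753 and the message 65 encrypts to 2790.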

 

00:06:30.233 As you might expect from the names,

00:06:32.400 the idea is that you keep the

00:06:34.566 private key from the key pair secret,

00:06:36.766 and you make the public key as

00:06:38.933 public as is possible.

 

00:06:40.266 You publish it in the phone book,

00:06:42.200 you put it on your webpage,

00:06:43.866 you write it on your business card,

00:06:45.833 and you make sure everybody knows that

00:06:47.766 this is your public key.

 

00:06:49.266 In order to send you a message,

00:06:51.566 someone looks up your public key and

00:06:53.866 uses that to encrypt the message.

 

00:06:55.933 Once the message has been encrypted using

00:06:58.333 a particular public key, the only thing

00:07:00.733 which can decrypt it is the corresponding

00:07:03.166 private key. And since the private key

00:07:05.566 has been kept private, you're the only

00:07:07.966 one who can receive the message.

 

00:07:10.133 This solves the key distribution problem.

00:07:12.500 Provided you can look up the appropriate

00:07:15.266 public key for the receiver in a directory,

 

00:07:19.066 and you can trust that the receiver

00:07:20.633 has kept their private key secret,

00:07:22.433 then you use their public key to

00:07:24.533 encrypt the message, and you know that

00:07:26.600 they're the only one who can decrypt it.

 

00:07:29.433 This allows Internet shopping sites, and the

00:07:31.633 like, to work. If I wish to

00:07:33.266 buy something from Amazon, I look up

00:07:35.333 the public key for Amazon in a directory,

00:07:37.433 use that to encrypt the message I'm

00:07:39.500 sending to Amazon, and I know that

00:07:41.600 they're the only ones that can decrypt it.

 

00:07:44.266 The problem with public key cryptography is

00:07:46.833 that it’s very slow. The public key

00:07:49.600 algorithms such as the Diffie-Hellman algorithm,

00:07:52.000 the RSA algorithm,

 

00:07:53.266 and the elliptic curve algorithms, run many orders

00:07:56.300 of magnitude slower than symmetric encryption algorithms.

00:07:59.333 The result is that they’re too slow

00:08:02.366 to use for any realistic amount of

00:08:05.366 communication. The performance just isn't there.

 

00:08:08.066 Accordingly, modern communications use what's known as

00:08:11.433 hybrid cryptography, where they use a combination

00:08:14.800 of both public key and symmetric cryptography.

 

00:08:18.266 This provides both security and speed.

 

00:08:21.866 The way this works is that the

00:08:24.666 sender and receiver use public key cryptography,

00:08:27.466 which is very slow, to exchange a

00:08:30.266 small amount of information.

 

00:08:31.966 That information is then used as the

00:08:34.633 key for the symmetric encryption algorithm,

00:08:36.866 which is very fast.

 

00:08:38.500 In detail, the sender chooses a random

00:08:41.133 value, that we’ll call Ks, which will

00:08:43.733 be used as the key for the symmetric encryption.

 

00:08:47.233 The sender then looks up the receiver’s

00:08:49.933 public key, Kpub, uses it to encrypt

00:08:52.600 Ks and sends the result to the receiver.

 

00:08:56.066 The receiver uses its corresponding private key,

00:08:59.133 Kpriv, to decrypt the message and retrieve Ks.

 

00:09:03.200 This securely transfers Ks, the key for

00:09:07.000 the symmetric encryption algorithm, from the sender

00:09:10.300 to the receiver.

 

00:09:11.933 Doing this using public key encryption is

00:09:14.466 very slow, but the key for the

00:09:16.966 symmetric encryption, Ks, is very small,

00:09:19.100 so the fact it's very slow doesn't matter.

 

00:09:22.266 The sender then uses that key,

00:09:24.866 Ks, to encrypt future messages using symmetric

00:09:28.133 cryptography, for example, using the AES algorithm.

 

00:09:31.466 The receiver also has Ks, which it

00:09:34.100 exchanged using the public key encryption,

00:09:36.333 and can use that to decrypt the messages.

 

00:09:39.733 Symmetric cryptography is very fast, so the

00:09:42.400 performance of the communication, once it's got

00:09:45.400 started, is very quick, but it requires

00:09:48.400 the key to be exchanged securely.

00:09:50.966 The public key algorithm, which is slow,

00:09:53.933 is used to securely exchange the key.

 

00:09:57.033 The result is something which provides

00:10:01.266 confidentiality, solves the key distribution problem,

00:10:05.533 and also achieves good performance.
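
The hybrid scheme can be sketched by combining a toy RSA key pair with a toy symmetric cipher. Assumptions, all for illustration only: the receiver's key pair is built from two well-known Mersenne primes (so anyone could factor it), RSA is applied without padding, and the symmetric cipher is a SHA-256 keystream rather than AES. The names `send_hybrid` and `receive_hybrid` are hypothetical.

```python
import hashlib
import secrets

# Receiver's toy RSA key pair (public Mersenne primes: NOT secure).
P, Q = 2**127 - 1, 2**521 - 1
N, E = P * Q, 65537                 # Kpub = (E, N)
D = pow(E, -1, (P - 1) * (Q - 1))   # Kpriv = (D, N)

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR each 32-byte chunk with SHA-256(key || offset)."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)

def send_hybrid(message: bytes, pub=(E, N)):
    ks = secrets.token_bytes(16)                      # random session key Ks
    e, n = pub
    wrapped = pow(int.from_bytes(ks, "big"), e, n)    # slow: encrypt Ks with Kpub
    return wrapped, keystream_xor(ks, message)        # fast: encrypt the message with Ks

def receive_hybrid(wrapped: int, body: bytes, priv=(D, N)) -> bytes:
    d, n = priv
    ks = pow(wrapped, d, n).to_bytes(16, "big")       # recover Ks with Kpriv
    return keystream_xor(ks, body)                    # decrypt with Ks
```

Only the 16-byte Ks goes through the slow public key operation; the message itself, however long, goes through the fast symmetric cipher.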

 

00:10:08.666 Encryption gives you confidentiality of data and

00:10:10.833 makes sure that no one can eavesdrop

00:10:13.000 on the messages being sent from the

00:10:15.200 sender to the receiver.

 

00:10:16.533 We also, though, need to verify the

00:10:18.866 identity of the sender, and make sure

00:10:21.166 that messages haven't been modified in transit.

 

00:10:23.600 In order to do this, we generate

00:10:26.033 a digital signature to authenticate our messages.

00:10:28.466 And the receiver can then validate that

00:10:30.900 signature, check the signature, to make sure

00:10:33.300 the messages came from the expected sender.

 

00:10:35.500 The digital signature relies on a combination

00:10:39.400 of public key cryptography,

00:10:41.066 and a cryptographic hash algorithm.

 

00:10:44.366 So first of all, what is a cryptographic hash?

 

00:10:47.966 A cryptographic hash function is a function

00:10:50.733 that takes some arbitrary length input and

00:10:53.533 produces a fixed length output hash that

00:10:56.300 somehow represents that input.

 

00:10:58.000 For example, at the top of the

00:11:00.466 slide, we see some input text going

00:11:02.933 through a hash algorithm, known as SHA256,

00:11:05.400 that produces the fixed length output block

00:11:07.866 you see on the right.

 

00:11:09.766 A cryptographic hash algorithm has four fundamental

00:11:12.533 properties. The first is that different inputs

00:11:15.300 should generate different outputs, and the

00:11:18.100 slightest change to the input will change

00:11:20.866 the output value.

 

00:11:22.166 The second is that it should be

00:11:24.466 infeasible to find two inputs

00:11:26.733 that give the same output.

 

00:11:28.466 The third is that calculating the hash

00:11:30.800 itself should be fast, and going from

00:11:33.100 input to output should happen very quickly.

 

00:11:35.533 And the fourth, and perhaps most important,

00:11:37.800 is that reversing a hash should be

00:11:40.100 infeasible. If you're only given the output,

00:11:42.400 there should be no way of finding

00:11:44.666 out what the input was.

 

00:11:46.400 A cryptographic hash therefore acts as an

00:11:49.200 effectively unique fingerprint for the input data.

00:11:51.600 It provides a short output that, in practice, uniquely

00:11:54.400 identifies a given message.

 

00:11:56.100 There are many different cryptographic hash algorithms.

00:11:59.800 The current recommendation is SHA256, as

00:12:03.500 specified by the IETF in RFC 6234.
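
Python's standard `hashlib` module implements SHA256, so the fixed-length output and the effect of a tiny input change can be checked directly. This sketch hashes a short input, a much longer one, and a one-character variant; the helper name is hypothetical.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA256 digest of `data` as 64 hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()

short = sha256_hex(b"abc")
big = sha256_hex(b"abc" * 100_000)   # 300,000-byte input, same output size
tweaked = sha256_hex(b"abd")         # one character changed
```

The digest of `abc` is the standard published SHA256 test value, `ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad`, and it shares no obvious pattern with the digest of `abd`.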

 

00:12:07.300 There are a number of older algorithms,

00:12:10.066 such as MD5 and SHA1, which you may

00:12:12.866 hear about, but these all have known

00:12:15.666 security flaws and are not recommended for use.

 

00:12:19.466 So how can we use a cryptographic

00:12:21.333 hash to help build a digital signature?

 

00:12:23.800 Well, in order to do that,

00:12:25.933 you take the message you wish to

00:12:28.400 send, and you calculate a cryptographic hash

00:12:30.900 of that message.

 

00:12:32.066 The sender then encrypts that hash with

00:12:34.300 their private key. Now the private key

00:12:36.533 is known only to the sender,

00:12:38.433 so they're the only one who can

00:12:40.633 encrypt that message.

 

00:12:41.700 But the thing which would decrypt it

00:12:44.133 is the sender’s public key, which is

00:12:46.566 available to everybody. Encrypting the hash with

00:12:48.966 the sender’s private key doesn't provide any

00:12:51.400 confidentiality, because anyone can decrypt the message

00:12:53.833 using the public key.

 

00:12:55.333 What it does do though, provided the

00:12:57.633 sender can be trusted to keep its

00:12:59.966 private key private, is demonstrate that the

00:13:02.266 sender must have encrypted the hash.

00:13:04.266 Since the hash is a fingerprint of

00:13:06.566 the message, this means that the sender

00:13:08.900 must have generated the original message.

 

00:13:10.966 The sender then attaches the encrypted hash

00:13:14.033 to the message, forming the digital signature.

 

00:13:17.200 The message, and its digital signature,

00:13:19.833 are then encrypted and sent to the

00:13:22.933 receiver using hybrid encryption.

 

00:13:24.766 When the message arrives at the receiver,

00:13:27.466 the receiver can verify the signature.

 

00:13:29.866 To do this, it first decrypts

00:13:32.566 the message and its digital signature.

00:13:34.900 The receiver then takes the message itself,

00:13:37.600 and calculates its cryptographic hash.

 

00:13:39.633 Having done that, it takes the digital

00:13:42.333 signature, looks up the sender’s public key,

00:13:45.000 and uses that to decrypt the digital

00:13:47.700 signature to retrieve the original

 

00:13:49.700 cryptographic hash that was in the message.

00:13:52.233 It compares the hash, which was sent

00:13:54.800 in the message as part of the

00:13:57.333 digital signature, with the cryptographic hash it

00:13:59.866 just calculated.

 

00:14:00.700 If the two match, then it knows

00:14:02.966 the message is authentic and has not been

00:14:05.266 modified, provided it trusts the sender to

00:14:07.566 have kept its private key private.

 

00:14:09.633 If the hash of the message it

00:14:11.900 calculated, and the hash that was sent

00:14:14.166 in the digital signature, don't match then

00:14:16.400 it knows that somehow the message has

00:14:18.666 been modified in transit.
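
The signing and verification steps can be sketched with textbook RSA over a SHA256 hash. This is a simplification under stated assumptions: the key pair is a toy built from public Mersenne primes, and the hash is signed directly with no padding, whereas deployed systems use schemes such as RSA-PSS or ECDSA. The function names are hypothetical.

```python
import hashlib

# Sender's toy RSA key pair (public Mersenne primes: NOT secure).
P, Q = 2**127 - 1, 2**521 - 1
N, E = P * Q, 65537                 # public key, known to everyone
D = pow(E, -1, (P - 1) * (Q - 1))   # private key, known only to the sender

def digest(message: bytes) -> int:
    """SHA256 fingerprint of the message, as an integer smaller than N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(message: bytes) -> int:
    """'Encrypt' the hash with the sender's private key."""
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Recover the hash with the public key and compare it with a fresh hash."""
    return pow(signature, E, N) == digest(message)

msg = b"Pay Alice 10 pounds"
sig = sign(msg)
```

Verifying `sig` against the original message succeeds, while verifying it against any modified message, such as `b"Pay Alice 1000 pounds"`, fails because the recomputed hash no longer matches.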

 

00:14:20.066 Public key encryption is therefore one of

00:14:22.200 the fundamental building blocks of a secure network.

 

00:14:25.066 It allows us to send a message

00:14:26.900 to a recipient securely, even if we've

00:14:29.100 not met that recipient, and be sure

00:14:31.300 that they're the only one who’ll be

00:14:33.466 able to decrypt that message. And it

00:14:35.666 allows us to use digital signatures to

00:14:37.866 verify that messages have not been modified

00:14:40.033 in transit.

 

00:14:40.766 The security of public key encryption,

00:14:43.166 though, depends on knowing which public key

00:14:45.933 corresponds to a particular receiver.

 

00:14:48.033 There are three ways you can know

00:14:50.300 this. The first is that the receiver

00:14:52.566 gives you their key in person.

 

00:14:54.633 The second is that the receiver sends

00:14:56.966 you their key, but the message in

00:14:59.300 which they send it is authenticated by

00:15:01.666 someone you trust.

 

00:15:02.766 That is, there’s a digital signature in

00:15:05.266 the message, signed by someone whose key

00:15:07.766 you already have, that authenticates that this message

00:15:10.300 is from who it claims to be from.

 

00:15:13.633 The third is that someone you trust

00:15:16.166 gives you the receiver’s key.

 

00:15:18.333 In the Internet, the role of someone

00:15:20.800 you trust is often played by an

00:15:23.300 organisation known as a certificate authority,

00:15:25.400 as part of a public key infrastructure.

 

00:15:28.000 The role of a certificate authority is

00:15:30.733 to validate the identity of potential senders.

00:15:33.466 The certificate authority checks the identity of

00:15:36.200 a potential sender, and then adds a

00:15:38.933 digital signature to the sender’s public key

00:15:41.666 to indicate that it's done so.

 

00:15:44.100 If a receiver trusts the public key

00:15:47.300 infrastructure, trusts the certificate authority, then it

00:15:50.500 can verify that digital signature, added by

00:15:53.700 the certificate authority, to confirm the identity

00:15:56.866 of the sender.
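
A certificate authority's signature over a sender's public key can be sketched the same way. The certificate here is a hypothetical dictionary holding a name, a public key, and the CA's signature; real certificates use the X.509 format, and the CA's identity checks are not modelled. The CA key pair is again a toy built from public Mersenne primes, and all names are hypothetical.

```python
import hashlib

# Toy CA key pair (public Mersenne primes: NOT secure).
CA_P, CA_Q = 2**127 - 1, 2**521 - 1
CA_N, CA_E = CA_P * CA_Q, 65537
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))

def _digest(name: str, public_key: tuple) -> int:
    """Hash the (name, public key) binding that the CA vouches for."""
    data = f"{name}|{public_key[0]}|{public_key[1]}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def issue_certificate(name: str, public_key: tuple) -> dict:
    """CA signs the binding with its private key (after checking identity offline)."""
    return {"name": name, "public_key": public_key,
            "signature": pow(_digest(name, public_key), CA_D, CA_N)}

def check_certificate(cert: dict, ca_public=(CA_E, CA_N)) -> bool:
    """Receiver verifies the CA's signature using the CA's public key."""
    e, n = ca_public
    return pow(cert["signature"], e, n) == _digest(cert["name"], cert["public_key"])

cert = issue_certificate("shop.example", (17, 3233))
```

A receiver who trusts the CA's public key accepts `cert`, but rejects a copy in which an attacker has swapped in a different public key, because the CA's signature no longer matches.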

 

00:15:58.366 These mechanisms, symmetric and public key encryption,

00:16:01.766 and digital signatures, allow us to provide

00:16:05.200 confidentiality for communication over the Internet that

00:16:08.600 performs well and is secure.

 

00:16:11.600 They allow us to authenticate messages,

00:16:13.700 and demonstrate that they've not been modified in transit.

 

00:16:16.633 And they allow us to validate the identity of senders

00:16:19.466 of those messages.

Part 3: Transport Layer Security (TLS) v1.3

The 3rd part of the lecture describes the operation of the Transport Layer Security Protocol (TLS) v1.3; one of the key security protocols used in the Internet.

Slides for part 3

 

00:00:00.333 In previous parts of this lecture I

00:00:02.633 spoke about network security in general terms.

00:00:05.266 In part one, I discussed why security

00:00:07.933 is needed in order to protect Internet communications,

 

00:00:11.233 and in part two, I spoke about

00:00:13.733 how security is provided in outline.

00:00:16.033 I spoke about the different types of

00:00:18.700 encryption, public key and symmetric,

 

00:00:20.733 the use of hybrid encryption, in order

00:00:24.033 to improve performance while still maintaining security,

00:00:27.333 and the ideas of digital signatures and

00:00:30.633 public key infrastructure.

 

00:00:32.133 In this third part of the lecture,

00:00:34.533 I want to move on to talk

00:00:36.966 about Internet security in specific terms.

00:00:39.033 I want to talk about the Transport

00:00:41.433 Layer Security protocol, TLS version 1.3.

 

00:00:43.633 I’ll begin by introducing what TLS is,

00:00:45.933 talking about conceptually what role it performs

00:00:48.266 in the network stack. And I'll talk

00:00:50.566 through some of the details of TLS.

 

00:00:52.966 I'll talk about the TLS handshake protocol,

00:00:56.133 that's used to establish TLS connections.

00:00:58.800 The record protocol, that's used to exchange

00:01:01.933 data. The 0-RTT extension, that reduces connection

00:01:05.066 setup times. And finally, I'll talk about

00:01:08.233 some of the limitations of TLS.

 

00:01:11.000 As we saw in some of the

00:01:13.833 earlier lectures, TCP connections are not secure.

 

00:01:16.733 Neither the TCP headers, nor the IP

00:01:19.533 headers, nor the data they transfer are

00:01:22.300 encrypted or authenticated in any way.

 

00:01:24.766 Data sent in a TCP connection is

00:01:28.200 not confidential. It can be observed by

00:01:31.600 governments, businesses, network operators, criminals,

00:01:34.433 or malicious users.

 

00:01:35.733 Similarly, the data is not authenticated.

00:01:37.666 Anyone who's able to access the network

00:01:40.066 connections, or the routers over which the

00:01:42.466 data flows, is able to modify that

00:01:44.833 data. And the sender and the receiver

00:01:47.233 will not be able to tell that

00:01:49.633 such modifications have been performed.

 

00:01:51.466 In order to provide security for data

00:01:54.166 going across a TCP connection, we need

00:01:56.866 to run some sort of additional security

00:01:59.566 protocol within that TCP connection to protect

00:02:02.266 the data.

 

00:02:03.166 The way this is typically done in

00:02:05.366 the Internet, is using a protocol called

00:02:07.566 the Transport Layer Security protocol.

 

00:02:09.233 The latest version of this is TLS

00:02:12.166 1.3 and it's used to encrypt and

00:02:15.066 authenticate data that is carried within a

00:02:17.966 TCP connection.

 

00:02:18.900 The official specification for TLS 1.3 is

00:02:21.900 RFC 8446, which was published by the

00:02:24.900 IETF in August 2018.

 

00:02:28.033 The TLS specification is not a simple

00:02:30.933 document to read.

 

00:02:32.266 In part, this is because it's solving

00:02:34.866 a difficult problem. Providing security over the

00:02:37.433 top of an insecure connection, a TCP

00:02:40.033 connection, is a complex challenge, and TLS

00:02:42.600 has to define a number of complex

00:02:45.200 mechanisms in order to provide that security.

 

00:02:47.866 The complexity also comes, in part, because

00:02:50.600 TLS is an old protocol.

 

00:02:52.666 The latest versions of TLS have to

00:02:55.533 be backwards compatible, not only with previous

00:02:58.400 versions of TLS as specified, but with

00:03:01.266 previous implementation problems, and bugs in the

00:03:04.133 TLS specification and in its implementations.

 

00:03:06.700 The protocol designers have done a good

00:03:09.766 job, though. TLS version 1.3 is smaller,

00:03:12.866 faster, and simpler than previous versions of

00:03:15.933 TLS, and it's also more secure.

 

00:03:18.700 The slide lists four blog posts which

00:03:21.333 provide more information about TLS. The first

00:03:24.000 one is an introduction to TLS 1.3

00:03:26.666 from the IETF. This was written by

00:03:29.300 the TLS working group chairs, and introduces

00:03:31.966 the new features in the protocol.

 

00:03:34.366 The second, from Cloudflare, is a detailed

00:03:37.000 look at what's new in TLS 1.3,

00:03:39.633 as compared to previous versions of TLS.

 

00:03:42.400 It talks about some of the advantages

00:03:44.933 of TLS 1.3, and how it improves

00:03:47.466 security, and reduces the connection set up times.

 

00:03:50.566 The third of these, from David Wong,

00:03:52.900 attempts to redraw the TLS specification in

00:03:55.300 a way that makes it easier to

00:03:57.733 read. This is a copy of RFC

00:04:00.166 8446, the TLS specification, with the diagrams

00:04:02.600 redrawn in an easier to read way,

00:04:05.033 and with explanatory videos and comments added

00:04:07.466 to make it easier to follow.

 

00:04:09.633 The final post is the most detailed.

00:04:12.566 It's an annotated packet capture showing the

00:04:15.500 details of a TLS connection.

 

00:04:17.700 This walks through the TLS connection establishment

00:04:20.433 handshake, byte by byte, labelling each byte

00:04:23.133 with reference to the specification to explain

00:04:25.866 exactly what it means, and how the

00:04:28.566 handshake proceeds.

 

00:04:29.466 I encourage you to review these four

00:04:32.033 blog posts. They give a nice complement

00:04:34.633 to the material I'll talk about in

00:04:37.200 the rest of this lecture, introducing how

00:04:39.800 TLS 1.3 works.

 

00:04:41.000 So what's the goal of TLS 1.3?

 

00:04:44.266 Well, given an existing connection, that's capable

00:04:47.400 of delivering data reliably and in the

00:04:50.566 order it was sent, but is insecure,

00:04:53.700 TLS 1.3 aims to add security.

 

00:04:56.533 That is, given a TCP connection,

00:04:59.566 it aims to add authentication, confidentiality,

00:05:02.633 and integrity protection to the data sent

00:05:06.200 over that connection.

 

00:05:07.833 In terms of authentication, it uses public

00:05:10.500 key cryptography, and a public key infrastructure,

00:05:13.133 in order to verify the identity of

00:05:15.800 the server to which the connection is made.

 

00:05:19.066 That is, the client can always verify

00:05:21.500 that it's talking to the desired server.

00:05:24.100 In addition, it provides optional authentication for

00:05:26.700 the client, to allow the server to

00:05:29.266 verify the identity of the client.

 

00:05:31.600 Once the connection has been established,

00:05:34.233 and verified to be correct, TLS provides

00:05:37.333 confidentiality for data sent across that connection.

 

00:05:40.500 It uses hybrid encryption schemes to provide

00:05:43.266 good performance, while still providing a strong

00:05:46.000 amount of security.

 

00:05:47.266 Finally, TLS authenticates data sent across the

00:05:50.500 connection, to provide integrity protection. It's not

00:05:53.700 possible for an attacker to modify data

00:05:56.900 sent across a TLS connection without that

00:06:00.133 modification being detectable by the endpoints.

 

00:06:02.966 How does TLS 1.3 work?

 

00:06:05.800 Well, first of all, a TCP connection

00:06:08.566 must be established. TLS is not a

00:06:11.333 transport protocol itself, and it relies on

00:06:14.100 an underlying TCP connection in order to

00:06:16.866 exchange data.

 

00:06:17.766 Once the TCP connection has been established,

00:06:21.166 TLS runs within that connection.

 

00:06:23.700 There are two parts to a TLS

00:06:26.466 connection. It begins with a handshake protocol,

00:06:29.233 and then proceeds with a record protocol.

 

00:06:32.100 The goal of the handshake protocol,

00:06:34.200 at the beginning of the connection,

00:06:36.266 is to authenticate the endpoints and agree

00:06:38.700 on what encryption keys to use.

 

00:06:40.900 Once this is completed, TLS switches to

00:06:43.833 running the record protocol, which lets endpoints

00:06:46.766 exchange authenticated and encrypted blocks of data

00:06:49.700 over the connection.

 

00:06:51.066 TLS turns the TCP byte stream into

00:06:54.333 a series of records. It provides framing,

00:06:57.600 delivers data block by block, each block

00:07:00.866 being encrypted and authenticated to ensure that

00:07:04.133 the data being sent in that block

00:07:07.400 is confidential, and arrives unmodified.

 

00:07:09.833 A secure connection over the Internet starts

00:07:12.600 by establishing a TCP connection as normal.

 

00:07:15.466 The client connects to the server,

00:07:17.700 sending a SYN packet, along with its

00:07:20.300 initial sequence number.

 

00:07:21.500 The server responds with a SYN-ACK,

00:07:23.866 acknowledging the client’s initial sequence number,

00:07:26.200 and providing the server’s initial sequence number.

00:07:28.933 And then the client responds with an

00:07:31.700 ACK packet, acknowledging that packet from the server.

 

00:07:35.066 This sets up a TCP connection.

 

00:07:37.633 Immediately following that, the TLS handshake starts,

00:07:41.066 running within the TCP connection itself.
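
In Python, the standard `socket` and `ssl` modules follow the same two steps: open a TCP connection, then run the TLS handshake inside it. This sketch restricts the context to TLS 1.3; the function names are hypothetical, and it assumes a Python build linked against an OpenSSL version that supports TLS 1.3.

```python
import socket
import ssl

def make_tls13_context() -> ssl.SSLContext:
    """Client context that verifies the server certificate and requires TLS 1.3."""
    ctx = ssl.create_default_context()            # certificate and hostname checks on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # refuse older protocol versions
    return ctx

def open_tls(host: str, port: int = 443) -> ssl.SSLSocket:
    # Step 1: ordinary TCP three-way handshake (SYN, SYN-ACK, ACK).
    tcp = socket.create_connection((host, port), timeout=10)
    # Step 2: TLS handshake inside that TCP connection. server_hostname is
    # sent in the ClientHello and checked against the server's certificate.
    return make_tls13_context().wrap_socket(tcp, server_hostname=host)
```

For example, `with open_tls("example.com") as s: print(s.version())` should report the negotiated protocol version, provided the server is reachable and supports TLS 1.3; otherwise the handshake raises an `ssl.SSLError`.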

 

00:07:44.133 The TLS client sends a TLS ClientHello

00:07:46.966 message to the server immediately following the

00:07:49.766 final ACK of the TCP handshake.

 

00:07:52.300 The server responds to that with a

00:07:54.700 TLS ServerHello message, and then the client

00:07:57.133 in return

 

00:07:57.933 responds with a TLS Finished message.

00:08:00.433 This concludes the handshake, and carries the

00:08:03.333 first block of secure data. Following this,

00:08:06.233 the client and the server switch to

00:08:09.133 running the TLS record protocol over the

00:08:12.066 TCP connection, and exchange further secure data blocks.

 

00:08:15.433 As can be seen, the TLS handshake

00:08:18.000 adds an additional round trip time to

00:08:20.000 the connection establishment.

 

00:08:21.733 At the start of the connection,

00:08:23.533 there's an initial round trip time while

00:08:25.600 the TCP connection is set up.

 

00:08:27.200 And then this is followed by an

00:08:29.533 additional round trip, while the TLS connection

00:08:31.833 and the security parameters are negotiated,

00:08:33.800 before the data can be sent.

 

00:08:35.866 There's a minimum of two round trip

00:08:38.633 times from the start of the TCP

00:08:41.366 connection to the conclusion of the TLS

00:08:44.133 handshake and the first secure data segment

00:08:46.866 being sent.
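
The round-trip arithmetic can be written out as a small function. The 50 ms round-trip time is an arbitrary example value, and the second call models a handshake that needs two round trips, as a full TLS 1.2 handshake does; the function name is hypothetical.

```python
def time_to_first_secure_data(rtt_ms: float, tls_handshake_rtts: int = 1) -> float:
    """Minimum delay before the client can send protected application data."""
    tcp_handshake_rtts = 1  # SYN -> SYN-ACK; the final ACK is followed at once by ClientHello
    return (tcp_handshake_rtts + tls_handshake_rtts) * rtt_ms

tls13 = time_to_first_secure_data(50)     # TCP + one TLS 1.3 round trip
tls12 = time_to_first_secure_data(50, 2)  # TCP + two TLS 1.2 round trips
```

At a 50 ms round-trip time this gives 100 ms for TLS 1.3 and 150 ms for a full TLS 1.2 handshake, before any processing delay.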

 

00:08:47.766 The first part of the TLS handshake

00:08:50.266 is the ClientHello message. This is sent

00:08:52.766 from the client to the server,

00:08:54.900 and begins the negotiation of the security parameters.

 

00:08:57.933 The ClientHello message does three things.

00:09:00.200 It indicates the version of TLS that is

00:09:02.966 to be used. It indicates the cryptographic

00:09:05.700 algorithms that the client supports, and provides

00:09:08.466 its initial keying material. And it indicates

00:09:11.200 the name of the server to which

00:09:13.966 the client is connecting.

 

00:09:15.633 You may wonder why the ClientHello message

00:09:17.966 needs to indicate server name, given that

00:09:20.300 it's running over a TCP connection that's

00:09:22.633 just been established to that server.

 

00:09:24.733 The reason for this, is that TLS

00:09:26.766 is often used with web hosting,

00:09:28.500 and it's common for web servers to

00:09:30.533 host more than one website,

 

00:09:32.066 so the server name provided in the

00:09:34.866 TLS ClientHello indicates which of the sites

00:09:37.666 accessible over that TCP connection

00:09:40.500 the client is trying to establish

00:09:43.300 a secure connection to.
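
The server name travels in a ClientHello extension, `server_name`, defined in RFC 6066. A minimal sketch of its byte layout, assuming a single ASCII hostname, might look like this (the function name is hypothetical):

```python
import struct

def server_name_extension(host: str) -> bytes:
    """Encode a server_name (SNI) extension holding one host name."""
    name = host.encode("ascii")
    entry = struct.pack("!BH", 0, len(name)) + name              # name_type 0 = host_name
    server_name_list = struct.pack("!H", len(entry)) + entry     # 2-byte list length
    return struct.pack("!HH", 0, len(server_name_list)) + server_name_list  # type 0 = server_name
```

For `example.com` this produces 20 bytes: a 4-byte extension header, a 2-byte list length, a 1-byte name type, a 2-byte name length, and the 11-byte name itself, all sent unencrypted in the ClientHello.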

 

00:09:46.333 The ClientHello message also indicates which version

00:09:48.800 of TLS is to be used.

00:09:51.033 What you would expect to happen here,

00:09:53.633 is that it would indicate that it

00:09:56.200 wishes to use TLS 1.3.

 

00:09:58.166 What actually happens, though, is that the

00:10:01.066 ClientHello message includes a version number indicating

00:10:03.933 that it wants to use TLS version

00:10:06.833 1.2, the previous version of TLS.

 

00:10:09.400 The ClientHello message includes an optional set

00:10:12.366 of extension headers, and one of those

00:10:15.366 extension headers includes an extension which says

00:10:18.366 “actually I’m really TLS version 1.3”.
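
The version trick can be sketched at the byte level. In RFC 8446 the `supported_versions` extension has type 43, and the ClientHello lists the versions it supports, with 0x0304 meaning TLS 1.3 and 0x0303 meaning TLS 1.2 (the value also placed in the legacy version field). The server-side rule below, choosing the highest common version, is a simplifying assumption, and the function names are hypothetical.

```python
import struct

TLS12 = 0x0303               # also the value written in the legacy version field
TLS13 = 0x0304
SUPPORTED_VERSIONS = 43      # extension type number

def client_supported_versions(versions=(TLS13, TLS12)) -> bytes:
    """Encode the ClientHello supported_versions extension."""
    body = b"".join(struct.pack("!H", v) for v in versions)
    data = struct.pack("!B", len(body)) + body                 # 1-byte list length
    return struct.pack("!HH", SUPPORTED_VERSIONS, len(data)) + data

def pick_version(ext: bytes, server_supports=(TLS13, TLS12)) -> int:
    """Server side: parse the offered versions and choose the highest common one."""
    ext_type, _ext_len = struct.unpack("!HH", ext[:4])
    if ext_type != SUPPORTED_VERSIONS:
        raise ValueError("not a supported_versions extension")
    list_len = ext[4]
    offered = [struct.unpack("!H", ext[5 + i:7 + i])[0] for i in range(0, list_len, 2)]
    common = [v for v in offered if v in server_supports]
    if not common:
        raise ValueError("no common TLS version")
    return max(common)
```

A middlebox that only reads the legacy field sees 0x0303 and lets the connection through, while an endpoint that understands the extension negotiates 0x0304.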

 

00:10:21.033 The reason the version negotiation happens in

00:10:23.366 such a weird way, specifying an old

00:10:25.700 version of TLS in the version field,

00:10:28.033 and using an extension to indicate the

00:10:30.366 real version,

 

00:10:31.133 is because there are too many

00:10:33.566 middleboxes, too many devices which try to

00:10:36.000 inspect TLS traffic in the network,

00:10:38.066 and which fail if the version number changes.

 

00:10:40.866 The protocol has become ossified.

00:10:43.333 We waited too long between versions of TLS.

00:10:46.366 Too many devices were deployed, too many

00:10:49.633 endpoints were deployed, which only understood version 1.2

00:10:53.066 and which didn't correctly support the version

00:10:55.733 negotiation. And then, when it came to

00:10:58.300 deploying a new version, and people tried

00:11:00.833 with early drafts of TLS 1.3 to just

00:11:03.400 change the version number to 1.3,

00:11:05.566 it was found that the deployed devices

00:11:08.133 didn't support the change.

 

00:11:09.700 The result was that connections that indicated

00:11:12.200 TLS version 1.3 in the header would

00:11:14.733 tend to fail,

 

00:11:15.900 whereas those that pretended to be TLS

00:11:18.600 version 1.2, using an extension header to

00:11:21.266 upgrade the version number, would work through

00:11:23.966 those middleboxes, and the connection could succeed

00:11:26.666 and proceed with the new version.

 

00:11:29.066 The ClientHello message is the first part

00:11:32.333 of the connection setup handshake. It doesn't

00:11:35.566 carry any new data.

 

00:11:37.533 Following the ClientHello, the server responds with

00:11:41.333 a ServerHello message.

 

00:11:43.066 The ServerHello message also indicates the version

00:11:45.866 of TLS which is to be used

00:11:48.633 and, like the ClientHello, it indicates that

00:11:51.433 the version is actually TLS version 1.2

00:11:54.233 and includes an extension header to say

00:11:57.000 that it’s really a TLS 1.3 connection

00:11:59.800 that's being established.

 

00:12:01.066 In addition to the version negotiation,

00:12:03.433 the TLS ServerHello includes the cryptographic algorithms

00:12:06.200 selected by the server, which are a

00:12:08.933 subset of the set suggested by the client.

 

00:12:11.833 That is, the client suggests the cryptographic

00:12:14.733 algorithms which it supports, and the server

00:12:17.300 looks at those, finds the subset of

00:12:19.866 them which are acceptable to it,

00:12:22.066 picks one of them, and includes that

00:12:24.633 in its response.
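
The server's choice can be sketched as an intersection of two lists. The three codes below are TLS 1.3 cipher suites defined in RFC 8446 that use AES or ChaCha20; picking by the server's own preference order is an assumption, since the specification leaves the selection policy to the server. The function name is hypothetical.

```python
TLS13_SUITES = {
    0x1301: "TLS_AES_128_GCM_SHA256",
    0x1302: "TLS_AES_256_GCM_SHA384",
    0x1303: "TLS_CHACHA20_POLY1305_SHA256",
}

def choose_cipher_suite(client_offer, server_preference):
    """Return the first suite in the server's preference list that the client offered."""
    offered = set(client_offer)
    for suite in server_preference:
        if suite in offered:
            return suite
    return None  # no overlap: a real server aborts the handshake with an alert
```

For example, a client offering ChaCha20 and AES-128 to a server that prefers AES-128 ends up with `TLS_AES_128_GCM_SHA256`.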

 

00:12:25.833 The ServerHello message also includes the server’s

00:12:28.066 public key, and a digital signature which

00:12:30.266 can be used to verify the identity

00:12:32.500 of the server.

 

00:12:33.533 Like the ClientHello, it doesn't include any data.

 

00:12:38.066 Finally, the TLS handshake concludes with a

00:12:40.933 Finished message, which flows from the client

00:12:43.466 to the server. The TLS Finished message

00:12:46.033 confirms that the handshake succeeded and, optionally,

00:12:48.566 the client sends a certificate with it, which is used

00:12:51.133 to authenticate the client to the server.

 

00:12:53.800 The TLS Finished message concludes the connection

00:12:57.533 setup handshake.

 

00:12:58.700 In addition to the connection setup,

00:13:00.900 it may therefore include the first part

00:13:03.466 of application data that is sent from

00:13:06.033 the client to the server.

 

00:13:07.966 TLS uses the ephemeral elliptic curve Diffie-Hellman

00:13:11.266 key exchange algorithm in order to derive

00:13:14.566 the keys used for the symmetric encryption.

 

00:13:18.000 The client and the server exchange their

00:13:20.300 public keys, as part of the connection

00:13:22.633 setup handshake, and they then combine those

00:13:24.933 two public keys to derive the key

00:13:27.266 that's used for the symmetric cryptography.

 

00:13:29.333 The maths of how this works is

00:13:31.400 complex. I'm not going to attempt to

00:13:33.466 describe it here.

 

00:13:34.433 What's important though, is that the symmetric

00:13:36.933 key is never exchanged over the wire.

00:13:39.400 The client and the server only exchange

00:13:41.866 their public keys, and the symmetric key

00:13:44.366 is derived from those.
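
The key agreement can be illustrated with classic finite-field Diffie-Hellman, swapped in here because Python's standard library has no elliptic curve support. The principle is the same as in elliptic curve Diffie-Hellman: each side sends only a public value, and both compute the same secret. The tiny group (p = 23, g = 5) is the usual textbook example and is completely insecure, and hashing the secret with SHA-256 is a stand-in for the HKDF-based key schedule that TLS 1.3 actually uses. The function names are hypothetical.

```python
import hashlib
import secrets

P, G = 23, 5  # toy public parameters; real deployments use large groups or curves

def keypair(private=None):
    """Return (private value, public value G^private mod P)."""
    a = private if private is not None else secrets.randbelow(P - 2) + 1
    return a, pow(G, a, P)

def shared_secret(my_private: int, their_public: int) -> int:
    """Combine my private value with the peer's public value."""
    return pow(their_public, my_private, P)

def derive_key(secret: int) -> bytes:
    """Turn the shared secret into a symmetric key (stand-in for HKDF)."""
    return hashlib.sha256(secret.to_bytes(32, "big")).digest()

client_priv, client_pub = keypair(4)  # client sends client_pub (4)
server_priv, server_pub = keypair(3)  # server sends server_pub (10)
```

Only `client_pub` and `server_pub` cross the wire, yet both sides compute the same secret (18 with these values) and therefore the same symmetric key.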

 

00:13:45.866 A TLS server provides a certificate that

00:13:48.633 allows the client to verify its identity

00:13:51.366 as part of the ServerHello message.

00:13:53.733 The client can optionally provide this information

00:13:56.466 along with its Finished message.

 

00:13:58.533 The result is that the client can always

00:14:01.000 verify the identity of the server,

00:14:03.133 and the server can optionally verify the

00:14:05.633 identity of the client.

 

00:14:07.133 The choice of encryption algorithm is driven

00:14:09.633 by the client, which provides the list

00:14:12.133 of the symmetric encryption algorithms that it

00:14:14.633 supports as part of its ClientHello message.

00:14:17.133 The server picks from these, and replies

00:14:19.633 in its ServerHello.

 

00:14:20.833 The usual result is that either the

00:14:24.766 Advanced Encryption Standard, AES, or the ChaCha20

00:14:28.700 symmetric encryption algorithm is chosen.
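
A toy model of this choice follows. The suite names are the real TLS 1.3 cipher suite names from RFC 8446, but the function is a hypothetical illustration. Whether a server follows its own preference order or the client's is an implementation and configuration choice, and if the two lists have nothing in common the handshake fails.

```python
# TLS 1.3 cipher suite names, as listed in RFC 8446.
CLIENT_OFFER = [                     # sent in the ClientHello, client's order
    "TLS_CHACHA20_POLY1305_SHA256",
    "TLS_AES_128_GCM_SHA256",
    "TLS_AES_256_GCM_SHA384",
]
SERVER_SUPPORTED = [                 # configured on the server, server's order
    "TLS_AES_256_GCM_SHA384",
    "TLS_AES_128_GCM_SHA256",
]

def choose_cipher_suite(client_offer, server_supported, prefer_server=True):
    """Return one suite both sides support, or None if there is no overlap."""
    if prefer_server:
        candidates = [s for s in server_supported if s in client_offer]
    else:
        candidates = [s for s in client_offer if s in server_supported]
    return candidates[0] if candidates else None

print(choose_cipher_suite(CLIENT_OFFER, SERVER_SUPPORTED))         # TLS_AES_256_GCM_SHA384
print(choose_cipher_suite(CLIENT_OFFER, SERVER_SUPPORTED, False))  # TLS_AES_128_GCM_SHA256
```

The chosen suite would be reported back to the client in the ServerHello.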

 

00:14:31.633 Once the TLS connection establishment protocol,

00:14:34.166 the handshake protocol, has completed, the TLS

00:14:37.166 record protocol starts. The record protocol allows

00:14:40.133 the client and the server to exchange

00:14:43.133 records of data over the TCP connection.

 

00:14:46.200 Each record can contain up to two

00:14:49.033 to the power 14 bytes of data,

00:14:51.900 and is both encrypted and authenticated.
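
The size limit can be illustrated by splitting a byte string into records of at most 2^14 bytes, each prefixed by the 5-byte record header that RFC 8446 defines (content type, legacy version 0x0303, length). This is a sketch only: encryption is omitted, and in real TLS 1.3 each fragment is encrypted and gains an authentication tag before it is sent.

```python
import struct

MAX_FRAGMENT = 2 ** 14     # at most 16,384 bytes of plaintext per record
APPLICATION_DATA = 23      # ContentType value for application data
LEGACY_VERSION = 0x0303    # legacy_record_version field

def frame_records(data):
    """Split data into records of at most 2**14 bytes, each with a 5-byte
    header: type (1 byte), version (2 bytes), length (2 bytes), all in
    network byte order. Encryption is deliberately left out."""
    records = []
    for start in range(0, len(data), MAX_FRAGMENT):
        fragment = data[start:start + MAX_FRAGMENT]
        header = struct.pack("!BHH", APPLICATION_DATA, LEGACY_VERSION, len(fragment))
        records.append(header + fragment)
    return records

records = frame_records(b"x" * 40000)
print([len(r) - 5 for r in records])   # [16384, 16384, 7232]
```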

 

00:14:54.433 Records of data have a sequence number,

00:14:56.933 and they are delivered reliably, securely,

00:14:59.066 and in the order in which they

00:15:01.600 were sent.
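
In TLS 1.3 the sequence number is not actually sent on the wire. Both endpoints count the records they send and receive, and RFC 8446 (section 5.3) mixes that count into the nonce used to encrypt each record, so a record that is replayed, dropped, or reordered is decrypted with the wrong nonce and fails authentication. A minimal sketch of that nonce construction, using a made-up IV:

```python
def record_nonce(write_iv, sequence_number):
    """Per-record AEAD nonce: the 64-bit sequence number, left-padded with
    zeros to the IV length, XORed with the static write IV."""
    if not 0 <= sequence_number < 2 ** 64:
        raise ValueError("sequence number out of range")
    padded = sequence_number.to_bytes(len(write_iv), "big")
    return bytes(a ^ b for a, b in zip(padded, write_iv))

iv = bytes.fromhex("000102030405060708090a0b")   # made-up 12-byte IV
for seq in range(3):
    print(seq, record_nonce(iv, seq).hex())
# 0 000102030405060708090a0b
# 1 000102030405060708090a0a
# 2 000102030405060708090a09
```

Because every record uses a different nonce, the same key can safely protect the whole connection.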

 

00:15:02.400 The underlying TCP connection does not preserve

00:15:05.233 record boundaries. TLS adds framing to the

00:15:08.066 connection so that record boundaries are preserved,

00:15:10.466 and reading from a TLS connection will

00:15:13.300 block until a complete record of data

00:15:16.133 is received.
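
A sketch of why a read waits for a whole record: the receiver has to reassemble records from whatever chunks TCP delivers, and it cannot authenticate or decrypt a record until all of it has arrived. The `RecordReader` class is hypothetical; the 2^14 + 256 byte cap is the maximum TLS 1.3 ciphertext length from RFC 8446.

```python
import struct

HEADER_LEN = 5                   # type (1) + version (2) + length (2)
MAX_RECORD_BODY = 2 ** 14 + 256  # largest TLS 1.3 ciphertext body allowed

class RecordReader:
    """Hypothetical helper: reassembles TLS records from TCP reads that
    may split or merge records at arbitrary byte positions."""

    def __init__(self):
        self.buffer = bytearray()

    def feed(self, chunk):
        """Add the bytes from one TCP read and return a list of the
        (content_type, body) records that are now complete."""
        self.buffer += chunk
        complete = []
        while len(self.buffer) >= HEADER_LEN:
            content_type, _version, length = struct.unpack("!BHH", self.buffer[:HEADER_LEN])
            if length > MAX_RECORD_BODY:
                raise ValueError("record too long")  # malformed or hostile peer
            if len(self.buffer) < HEADER_LEN + length:
                break  # partial record: wait for more data before using it
            body = bytes(self.buffer[HEADER_LEN:HEADER_LEN + length])
            del self.buffer[:HEADER_LEN + length]
            complete.append((content_type, body))
        return complete

def record(body):
    return struct.pack("!BHH", 23, 0x0303, len(body)) + body

stream = record(b"hello") + record(b"world")   # 20 bytes in total
reader = RecordReader()
print(reader.feed(stream[:3]))    # [] (not even a whole header yet)
print(reader.feed(stream[3:12]))  # [(23, b'hello')]
print(reader.feed(stream[12:]))   # [(23, b'world')]
```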

 

00:15:17.033 A TLS connection usually uses the same

00:15:19.866 encryption key to protect data for the

00:15:22.733 entire connection. However, it can

00:15:25.566 update the encryption keys between records, using a KeyUpdate message in TLS 1.3, if there's

00:15:28.400 a need to change the encryption key

00:15:31.233 partway through a connection.
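
In TLS 1.3, each side derives its next traffic secret from the current one using the HKDF-Expand-Label function with the label "traffic upd" (RFC 8446, sections 7.1 and 7.2). A minimal standard-library sketch follows, assuming a SHA-256 cipher suite and a placeholder starting secret. The new write key and IV would then be derived from the new secret with further labels, which is not shown.

```python
import hashlib
import hmac
import struct

def hkdf_expand(prk, info, length):
    """HKDF-Expand (RFC 5869) with SHA-256: T(i) = HMAC(PRK, T(i-1) | info | i)."""
    output, block, counter = b"", b"", 1
    while len(output) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        output += block
        counter += 1
    return output[:length]

def hkdf_expand_label(secret, label, context, length):
    """HKDF-Expand-Label (RFC 8446): the info is a struct holding the output
    length, the label prefixed with "tls13 ", and a context value."""
    full_label = b"tls13 " + label.encode("ascii")
    hkdf_label = (struct.pack("!H", length)
                  + bytes([len(full_label)]) + full_label
                  + bytes([len(context)]) + context)
    return hkdf_expand(secret, hkdf_label, length)

def next_traffic_secret(current):
    """Derive application_traffic_secret_N+1 from _N after a KeyUpdate."""
    return hkdf_expand_label(current, "traffic upd", b"", len(current))

secret_0 = bytes(32)   # placeholder; a real secret comes from the handshake
secret_1 = next_traffic_secret(secret_0)
secret_2 = next_traffic_secret(secret_1)
print(secret_1.hex())
print(secret_2.hex())
```

The update is one-way: an attacker who learns a later secret cannot run the derivation backwards to recover earlier ones.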

 

00:15:32.966 The TLS record protocol allows the client

00:15:35.533 and the server to exchange records,

00:15:37.733 to send and receive data as they

00:15:40.300 see fit.

 

00:15:41.133 Once they finish doing so, they close

00:15:44.833 the connection, which closes the underlying TCP connection.

 

00:15:48.266 TLS 1.3 usually takes one round trip

00:15:52.066 time to establish the connection after the

00:15:54.966 TCP connection has been set up.

 

00:15:56.733 That is, there's the TCP SYN,

00:15:59.566 SYN-ACK, ACK handshake to establish the TCP

00:16:02.833 connection, and then an additional round trip

00:16:06.133 time for the TLS ClientHello, ServerHello,

00:16:08.966 Finished exchange.

 

00:16:10.000 However, if the client and the server

00:16:12.733 have previously communicated, TLS 1.3 allows them

00:16:15.466 to reuse some of the connection setup

00:16:18.233 parameters, and reuse keying material from the earlier connection.

 

00:16:21.066 The way this works is that the

00:16:23.266 server can send an additional encryption key

00:16:25.433 in a NewSessionTicket message after the handshake completes.

 

00:16:27.433 The client can remember that key,

00:16:29.500 and use it the next time it

00:16:31.600 connects to the server. This is known

00:16:33.700 as a pre-shared key.

 

00:16:34.966 When the client next connects to that

00:16:37.766 server, it sends its ClientHello message as

00:16:40.566 normal. However, in addition to that ClientHello

00:16:43.333 message, it can also include some data,

00:16:46.133 and that data is encrypted using the

00:16:48.900 pre-shared key.

 

00:16:49.800 The ServerHello also proceeds as normal.

00:16:52.033 But, again, the server can include data encrypted using

00:16:54.666 the pre-shared key, sent to the client

00:16:57.266 in reply to the data included

00:16:59.866 in the ClientHello message.

 

00:17:01.466 The use of the pre-shared key therefore

00:17:03.766 allows the client and the server to

00:17:06.100 exchange data along with the initial connection

00:17:08.400 setup handshake. It allows data to be

00:17:10.733 exchanged within zero RTTs of the connection

00:17:13.033 setup, as part of the first

00:17:15.333 round trip.

 

00:17:16.100 This extension is therefore known as the

00:17:20.100 0-RTT mode of TLS 1.3.
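
The saving can be made concrete with a simplified latency model that counts round trips until the client receives the first byte of a response, ignoring processing and transmission time. The mode names are hypothetical labels. The counts follow the handshakes described above: in the full TLS 1.3 handshake the request travels with the client's Finished message, while with 0-RTT it travels with the ClientHello.

```python
# Round trips before the client receives the first byte of a response.
ROUND_TRIPS = {
    "tcp-only": 2,    # SYN/SYN-ACK, then request/response
    "tls13": 3,       # SYN/SYN-ACK, ClientHello/ServerHello..Finished,
                      # then request (sent with client Finished)/response
    "tls13-0rtt": 2,  # SYN/SYN-ACK, then ClientHello + early request /
                      # ServerHello..Finished + response
}

def time_to_first_response_ms(rtt_ms, mode):
    """Simplified model: latency is a whole number of round trips."""
    return ROUND_TRIPS[mode] * rtt_ms

for mode in ROUND_TRIPS:
    print(f"{mode:>10}: {time_to_first_response_ms(100, mode)} ms")
```

With a 100 ms round-trip time, 0-RTT saves 100 ms, bringing a TLS connection back to the same first-response latency as unencrypted TCP in this model.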

 

00:17:23.033 The 0-RTT mode is useful, because it

00:17:25.766 allows connections to start sending data much

00:17:28.533 earlier. It removes one round-trip time's

00:17:31.266 worth of latency. However, it has a limitation.

 

00:17:34.233 The limitation is that, unlike the

00:17:38.100 records, which carry a sequence number,

00:17:41.166 TLS ClientHello and ServerHello messages don't contain

00:17:44.766 a sequence number.

 

00:17:46.400 A consequence of this is that data

00:17:48.933 sent as part of a ClientHello,

00:17:51.100 or a ServerHello, may be duplicated,

00:17:52.900 and TLS has no way of stopping this.

 

00:17:55.933 If you're writing an application that uses

00:17:58.700 TLS in 0-RTT mode, you need to

00:18:01.133 be careful to send only what's known

00:18:03.566 as idempotent data in the 0-RTT packets:

 

00:18:04.700 data where it doesn't matter if that

00:18:07.300 data is delivered more than once to

00:18:09.900 the server.

 

00:18:12.233 Data that is sent after the first

00:18:15.033 round trip time has concluded, as part

00:18:17.800 of the regular TLS connection, doesn't suffer

00:18:20.600 from this problem, and is only ever

00:18:23.366 delivered to the application once.
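
To see why 0-RTT data should be idempotent, the following sketch replays the same early-data requests twice against two hypothetical server handlers. It also shows one conservative policy, an assumption rather than a TLS rule, of sending only read-only HTTP methods as early data. RFC 8470 describes how HTTP handles early data, including the 425 (Too Early) status code a server can use to refuse it.

```python
# Conservative policy (an assumption, not a TLS rule): only read-only
# HTTP methods may be sent as 0-RTT early data.
EARLY_DATA_METHODS = {"GET", "HEAD"}

def may_send_as_early_data(method):
    return method.upper() in EARLY_DATA_METHODS

# Two hypothetical request handlers on the server.
def withdraw(account, amount):
    """Not idempotent: each delivery takes the money again."""
    account["balance"] -= amount

def set_display_name(account, name):
    """Idempotent: delivering it twice has the same effect as once."""
    account["name"] = name

# An attacker who captured the ClientHello and its early data replays it,
# so the server processes each request twice.
account = {"balance": 100, "name": "alice"}
for _ in range(2):
    withdraw(account, 10)
    set_display_name(account, "bob")

print(account)                          # {'balance': 80, 'name': 'bob'}
print(may_send_as_early_data("POST"))   # False
```

The withdrawal happens twice, but repeating the name change is harmless. The policy shown excludes even idempotent writes; whether to allow those in early data is a design decision for each application.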

 

00:18:25.466 A TLS connection is secure, but it

00:18:28.333 has a number of limitations.

 

00:18:30.466 TLS operates within a TCP connection.

 

00:18:33.966 A consequence of this is that the

00:18:36.666 IP addresses and the TCP port numbers

00:18:39.400 are not protected. This exposes information about

00:18:42.100 who is communicating, and what application is

00:18:44.800 being used.

 

00:18:45.700 Further, the TLS ClientHello message includes the

00:18:48.500 server name, but doesn't encrypt that.

00:18:50.900 This exposes the host name of the

00:18:53.700 server to which the connection is being

00:18:56.500 made, and may be a significant privacy leak.

 

00:18:59.633 An extension, known as Encrypted Client Hello (which grew out of Encrypted Server Name

00:19:02.266 Indication), is under development, but this is

00:19:04.766 not finished yet, and there are some

00:19:07.233 concerns that it may be very difficult

00:19:09.733 to deploy.

 

00:19:10.533 TLS also relies on a public key

00:19:13.166 infrastructure to validate the keys, and to

00:19:15.766 verify the identity of clients and servers.

 

00:19:18.500 There are some significant concerns about the

00:19:21.766 trustworthiness of this public key infrastructure.

 

00:19:24.166 The reasons for this are not that

00:19:26.966 the cryptographic algorithms or the mechanisms are

00:19:29.733 insecure, they’re that the browsers tend to

00:19:32.500 trust a very large range of certificate authorities,

00:19:34.766 and it's not clear to what extent all of these certificate

00:19:37.166 authorities are actually trustworthy.

 

00:19:41.300 The final limitation of TLS is that

00:19:44.700 the 0-RTT extension may deliver data more than once.

 

00:19:48.600 0-RTT is a very useful extension,

00:19:50.900 because it allows data to be delivered

00:19:53.600 with low latency at the start of

00:19:56.300 the connection, but it runs the risk

00:19:59.000 that the data is delivered multiple times,

00:20:01.700 so must be used with care.

 

00:20:04.100 That concludes the discussion of TLS. I spoke

00:20:07.133 about what TLS is. I've talked about

00:20:10.133 the TLS handshake protocol, that establishes the

00:20:13.133 connection using the ClientHello, ServerHello,

00:20:15.466 and Finished messages,

00:20:16.800 and that agrees the appropriate cryptographic parameters.

 

00:20:19.766 And I spoke about the TLS record

00:20:21.666 protocol, which is used to actually exchange the data.

 

00:20:25.000 The TLS 0-RTT extension allows for faster

00:20:27.833 data transfer at the beginning of the

00:20:30.633 connection, but comes with some risk of

00:20:33.466 data replay attacks. Finally, I spoke about

00:20:36.300 some of the limitations of TLS.

 

00:20:38.833 The TLS protocol has actually been wildly

00:20:41.700 successful. It's used to secure the vast majority of the

00:20:44.600 traffic sent over the web. And, when

00:20:47.500 used correctly, it is a very secure

00:20:50.400 protocol that performs very well.

 

00:20:52.566 In the final part of the lecture,

00:20:54.766 I'll move on from talking about the details of the

00:20:57.100 cryptographic mechanisms, and the transport protocols,

00:21:00.033 to talk about some of the issues with writing

00:21:02.033 secure software.

Part 4: Discussion

The final part of the lecture discusses systems aspects of providing secure communication. It reviews the need for end-to-end security to protect communications. It discusses the robustness principle, and its implications for the design of input parsers and other aspects of networked systems. And it briefly reviews some of the challenges in writing secure code.

Slides for part 4

 

00:00:00.666 In the previous parts, I’ve spoken about

00:00:03.666 the general principles underlying secure communication,

00:00:05.966 and about the Transport Layer Security protocol,

00:00:08.633 TLS 1.3, that protects most Internet communications.

00:00:11.333 In this final part of the lecture,

00:00:14.100 I want to raise some issues to

00:00:16.766 consider when developing secure networked applications.

00:00:19.066 In particular, I want to discuss the

00:00:21.866 need for end-to-end security, and the problems

00:00:24.533 of providing secure communication in the presence

00:00:27.200 of content distribution networks, servers, and middleboxes.

00:00:29.900 I want to talk about the robustness

00:00:32.666 principle, and the difficulty in designing and

00:00:35.333 building networked applications. And I want to

00:00:38.000 talk about the need to carefully validate

00:00:40.700 input data, and some of the issues

00:00:43.366 around writing secure code.

 

00:00:46.000 For communication to be secure, it must

00:00:48.900 be end-to-end.

00:00:49.733 That is, the secure communication must run

00:00:52.733 between the initial sender and the final

00:00:55.633 recipient, and the message must not be

00:00:58.533 decrypted or lose integrity protection at any

00:01:01.433 point along the path.

00:01:03.066 That is harder to arrange than you

00:01:06.066 might imagine.

 

00:01:07.000 If the communication is between a client

00:01:09.500 and a server located in a data

00:01:11.966 centre, it’s easy to understand what

00:01:14.466 the client endpoint is. It’s the phone,

00:01:16.600 tablet, or laptop on which the application

00:01:19.100 making the request is running. What is

00:01:21.566 the endpoint in the data centre though?

00:01:24.066 Does the secure connection terminate at the

00:01:26.566 load balancing device at the entrance to

00:01:29.033 the data centre, that chooses which of

00:01:31.533 the many possible servers responds to the

00:01:34.000 request? If so, does that load balancer

00:01:36.500 make a secure onward connection to the

00:01:39.000 back-end server, or is the connection unprotected

00:01:41.466 within the data centre?

 

00:01:43.000 If the secure connection passes through the

00:01:45.800 load balancer and terminates on the back-end

00:01:48.633 server, are the connections between the back-end

00:01:51.433 servers and the databases, compute servers,

00:01:53.833 and storage servers in other parts of

00:01:56.666 the data centre secure? And, once the

00:01:59.466 request has been handled, how is the

00:02:02.300 data protected once it’s stored in the

00:02:05.100 data centre?

 

00:02:06.000 What is your threat model? Are you

00:02:08.800 concerned about protecting your communication as it

00:02:11.566 traverses the wide area network between your

00:02:14.366 client and the data centre? Or are

00:02:17.166 you also concerned with protecting communications within

00:02:19.966 the data centre? If you’re concerned about

00:02:22.733 communications and data storage within the data

00:02:25.533 centre, are you trying to protect against

00:02:28.333 other tenants of the data centre? Or

00:02:31.133 against malicious users that may have compromised

00:02:33.900 the data centre infrastructure? Or against the

00:02:36.700 data centre operator?

 

00:02:38.000 Similar issues arise with content distribution networks.

00:02:41.300 CDNs, such as Akamai, are widely used

00:02:44.600 as the backend infrastructure for websites,

00:02:47.433 software updates, streaming video services, and gaming

00:02:50.733 services. Applications like the Steam store,

00:02:53.566 the BBC iPlayer, Netflix, and Windows Update,

00:02:56.866 have all run on CDNs at various

00:03:00.166 times, although many of them now use

00:03:03.500 their own infrastructure.

 

00:03:05.000 CDNs are essentially large-scale, highly distributed web

00:03:08.000 caches. They provide local copies of data,

00:03:11.000 to improve performance compared to having to

00:03:14.000 fetch the content from the master site.

00:03:17.000 The secure HTTPS connection is therefore from

00:03:19.966 the client to the CDN, rather than

00:03:22.933 from the client to the original site.

 

00:03:26.000 This introduces an intermediary into the path.

00:03:29.366 The CDN now has visibility into what

00:03:32.733 requests a client is making, in addition

00:03:36.066 to the original service.

00:03:38.000 Performance is better, but you’re forced to

00:03:40.233 trust a third party with information about

00:03:42.433 what sites you’re visiting.

00:03:43.700 Equally, the data has to get to

00:03:46.033 the CDN caches somehow, and has to

00:03:48.266 be protected as it’s fetched from the

00:03:50.466 original server to populate the cache.

00:03:52.366 You have to trust the CDN to

00:03:54.600 do this correctly. As a user of

00:03:56.833 the CDN, you have no way of

00:03:59.033 knowing how, or indeed if, that data

00:04:01.266 is secure.

 

00:04:02.000 In many cases, data is moving between

00:04:04.833 two users. Is that data encrypted end-to-end

00:04:07.700 between the two users? Or is the

00:04:10.533 data encrypted between the users and some

00:04:13.400 data centre, but visible to the data

00:04:16.233 centre? The difference can matter: if the

00:04:19.100 data centre has access to the unprotected

00:04:21.933 data, it may be used to target

00:04:24.800 advertising, and it’s much more likely to

00:04:27.633 be accessible to law enforcement or government

00:04:30.500 monitoring.

 

00:04:32.000 Many applications use some form of in-network

00:04:34.766 processing. For example, video conferencing systems often

00:04:37.566 use a central server to perform audio

00:04:40.333 mixing and to scale the video to

00:04:43.133 produce thumbnails.

00:04:43.933 In a large video conference, for instance,

00:04:46.800 if many users are sending video,

00:04:49.200 then all the video goes to a

00:04:51.966 central server. That server only forwards high-quality

00:04:54.733 video for the active speaker,

00:04:57.133 and sends a smaller, more heavily compressed,

00:04:59.900 version for the other participants.

 

00:05:02.000 This reduces the amount of video sent

00:05:04.900 out to each of the participants,

00:05:07.366 and prevents overloading their network connections.

00:05:09.833 This is a good thing.

 

00:05:12.000 But, it also means that the central

00:05:14.533 server has access to the audio and

00:05:17.100 video. The server can record that video,

00:05:19.633 if it so chooses, and potentially share

00:05:22.166 it with others. That may be a

00:05:24.733 concern, depending on what’s being discussed.

 

00:05:27.000 An alternative way of building such an

00:05:29.800 application leaves the data encrypted, and doesn’t

00:05:32.566 give the server access. This increases the

00:05:35.366 privacy of the users, since the data

00:05:38.166 is encrypted end-to-end and isn’t available to

00:05:40.966 the server, but means that the server

00:05:43.733 can’t help compress the data and manage

00:05:46.533 the load, and it means that server-based

00:05:49.333 features, like cloud recording and captioning, become

00:05:52.100 much harder to provide. It trades off features

00:05:54.900 and performance for increased privacy.

 

00:05:58.000 When building networked applications, it’s important to

00:06:01.100 consider how the network protocol is implemented.

00:06:04.200 Network protocols can be reasonably complex,

00:06:06.966 and difficult to implement. They have a

00:06:10.066 syntax and semantics, in many ways similar

00:06:13.166 to a programming language. And, like a

00:06:16.266 program, the protocol messages your application receives

00:06:19.366 may contain syntax errors or other bugs.

00:06:22.466 What do you do if the protocol

00:06:25.700 data you receive is incorrect?

 

00:06:28.000 A frequently quoted guideline is Postel’s law.

00:06:31.166 This is named after Jon Postel,

00:06:33.866 the original editor of what became the

00:06:37.033 IETF’s RFC series of documents, and an

00:06:40.200 influential contributor to the early Internet.

 

00:06:43.000 Postel’s law can be summarised as “Be

00:06:46.233 liberal in what you accept, and conservative

00:06:49.466 in what you send”.

00:06:51.300 That is, when generating protocol messages,

00:06:54.166 try your hardest to do so correctly.

00:06:57.400 Make sure the messages you send strictly

00:07:00.633 conform to the protocol specification.

00:07:02.966 But, when receiving messages, accept that the

00:07:06.266 generator of those messages may be imperfect.

00:07:09.500 If a message is malformed, but unambiguous

00:07:12.733 and understandable, Postel’s law suggests accepting

00:07:15.966 it anyway.

 

00:07:17.000 That’s fine, but it’s important to balance

00:07:19.966 interoperability with security. Don’t be too liberal

00:07:22.966 in what you try to accept.

00:07:25.500 Having a clear specification of how and

00:07:28.500 when you will fail might be more

00:07:31.466 appropriate.
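
One way to apply this advice, sketched with a hypothetical length field: Python's built-in `int()` is a liberal parser, accepting surrounding whitespace, signs, underscores, and non-ASCII digits, whereas the strict version accepts only what the (assumed) specification allows and fails with a clear error. When two implementations on a path are liberal in different ways, an attacker can exploit the disagreement between them.

```python
import re

def parse_length_liberal(value):
    """Accepts anything int() accepts: ' +12 ', '1_000', even '-5'."""
    return int(value)

_DIGITS = re.compile(r"[0-9]{1,10}")   # assumed spec: 1 to 10 ASCII digits

def parse_length_strict(value, limit=2**31 - 1):
    """Accept only what the assumed specification allows; otherwise fail
    with a clear error instead of guessing what the sender meant."""
    if not _DIGITS.fullmatch(value):
        raise ValueError(f"malformed length field: {value!r}")
    n = int(value)
    if n > limit:
        raise ValueError(f"length {n} exceeds limit {limit}")
    return n

for raw in ["12", " +12 ", "1_000", "-5"]:
    try:
        strict = parse_length_strict(raw)
    except ValueError as err:
        strict = f"rejected ({err})"
    print(repr(raw), "liberal:", parse_length_liberal(raw), "| strict:", strict)
```

The liberal parser happily returns a negative length for `"-5"`, which later code may not expect; the strict parser defines exactly when it will fail.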

 

00:07:33.000 Postel’s law says “Be liberal in what

00:07:35.800 you accept, and conservative in what you

00:07:38.566 send”.

00:07:39.733 That makes sense if you trust the

00:07:42.600 other devices on the network.

00:07:44.600 It makes sense if the problems with

00:07:47.466 the messages they send are honest mistakes,

00:07:50.266 and not intended to be malicious.

 

00:07:53.000 The network has changed since Postel’s time,

00:07:56.033 though.

00:07:57.233 As Poul-Henning Kamp, one of the FreeBSD

00:08:00.366 developers, says “Postel lived on a network

00:08:03.400 with all his friends. We live on

00:08:06.433 a network with all our enemies.

00:08:09.033 Postel was wrong for today’s internet”.

 

00:08:12.000 This is an important point.

 

00:08:15.000 Any networked system is frequently attacked.

00:08:17.666 There are many people scanning the network

00:08:20.900 for vulnerabilities, actively trying to break your

00:08:24.000 applications. If you write a server,

00:08:26.666 and make it accessible on the Internet,

00:08:29.800 then people will try to break it.

 

00:08:33.000 This is not because you’re a target.

00:08:35.833 It’s because machines and network connections are

00:08:38.700 now fast enough that it’s possible to

00:08:41.533 scan every machine on the Internet,

00:08:43.966 to see if it’s vulnerable to a

00:08:46.800 particular problem, within a few hours.

00:08:49.233 It’s not personal. But your server will

00:08:52.100 be attacked.

 

00:08:53.000 The paper shown on the slide,

00:08:55.433 on “The Harmful Consequences of the Robustness

00:08:58.300 Principle”, by Martin Thomson, talks about this

00:09:01.133 in detail, and gives detailed guidance on

00:09:04.000 how to handle malformed messages. If you

00:09:06.833 write networked applications, I strongly encourage you

00:09:09.666 to read it.

 

00:09:12.000 One of the key points made is

00:09:14.933 that networked applications work with data supplied

00:09:17.866 by untrusted third parties.

00:09:19.533 As we’ve discussed, data read from the

00:09:22.566 network may not conform to the protocol

00:09:25.500 specification. This may be due to ignorance,

00:09:28.400 bugs, malice, or a desire to disrupt services.

 

00:09:33.266 One of the most critical lessons is

00:09:35.533 that you must carefully validate all data

00:09:38.466 received from the network before you make use of it.
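
A minimal sketch of validating a message before using it, for a hypothetical format consisting of a 1-byte type, a 2-byte length, and a UTF-8 payload. Every field is checked against what the format allows, and the length field is checked against the bytes actually received rather than trusted. Trusting such a length field is the pattern behind the Heartbleed bug in OpenSSL, where an unchecked length caused a buffer over-read.

```python
import struct

KNOWN_TYPES = {1: "hello", 2: "chat"}   # hypothetical message types
MAX_PAYLOAD = 1024                      # hypothetical size limit

def parse_message(packet):
    """Validate a hypothetical [type:1][length:2][payload] message before
    using any of it. Raises ValueError on anything unexpected."""
    if len(packet) < 3:
        raise ValueError("truncated header")
    msg_type, length = struct.unpack("!BH", packet[:3])
    if msg_type not in KNOWN_TYPES:
        raise ValueError(f"unknown message type {msg_type}")
    if length > MAX_PAYLOAD:
        raise ValueError("payload too large")
    if len(packet) != 3 + length:
        raise ValueError("length field does not match packet size")
    try:
        text = packet[3:].decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError("payload is not valid UTF-8") from None
    return KNOWN_TYPES[msg_type], text

print(parse_message(b"\x02\x00\x02hi"))   # ('chat', 'hi')
for bad in (b"\x02\xff\xffhi", b"\x09\x00\x00", b"\x02\x00"):
    try:
        parse_message(bad)
    except ValueError as err:
        print("rejected:", err)
```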

 

00:09:41.300 Don’t trust arbitrary data that comes from

00:09:44.900 another device over the network. Check it

00:09:47.800 carefully, and make sure it contains what

00:09:50.700 you expect, before use.

00:09:52.366 This is especially important when working in

00:09:55.366 scripting languages, which often use escape characters

00:09:58.266 that trigger special processing. The cartoon on

00:10:01.166 the slide is an example. The idea

00:10:04.066 is that the software processing the student’s

00:10:06.966 name sees the closing quote, and interprets

00:10:09.866 the rest of the name as an

00:10:12.766 SQL command to delete the student records

00:10:15.666 from the database.

 

00:10:17.000 It’s a silly example.

00:10:18.866 But it’s surprising how often similar problems,

00:10:22.200 known as SQL injection attacks, occur in practice.
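
The standard defence against SQL injection is a parameterised query, sketched here with Python's built-in `sqlite3` module and the name from the cartoon. The unsafe string is only built and printed, not run: with `sqlite3`, `execute()` refuses to run more than one statement, but a call such as `executescript()` would run the injected `DROP TABLE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (name TEXT)")

# Untrusted input, e.g. a name typed into a web form.
name = "Robert'); DROP TABLE Students;--"

# UNSAFE: pasting input into the SQL text. The quote in the name closes the
# string literal early, so the rest of the name is parsed as SQL.
unsafe_sql = f"INSERT INTO Students (name) VALUES ('{name}')"
print(unsafe_sql)

# SAFER: a parameterised query. The driver passes `name` separately as a
# value, so its characters are never interpreted as SQL.
conn.execute("INSERT INTO Students (name) VALUES (?)", (name,))

rows = conn.execute("SELECT name FROM Students").fetchall()
print(rows == [(name,)])  # True: stored verbatim, and the table still exists
```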

 

00:10:25.500 And similar problems occur in many other

00:10:29.133 programming languages. This is not just an

00:10:32.233 SQL-related problem.

00:10:33.133 Be careful how you process data.

 

00:10:37.000 And, in general, be careful how you

00:10:40.900 write networked applications.

00:10:42.566 The network is hostile.

 

00:10:45.000 Any networked application is security critical.

00:10:48.333 Anything that receives data from the network

00:10:52.233 will be attacked.

 

00:10:54.000 When writing networked applications, carefully specify how

00:10:57.200 they should behave with both correct and

00:11:00.433 incorrect inputs. Carefully validate inputs and handle

00:11:03.633 errors. And check that your code behaves

00:11:06.866 as expected. Try to break your application,

00:11:10.066 before someone else does.

 

00:11:12.000 If you’re writing your application using a

00:11:15.166 type- or memory-unsafe language, such as C

00:11:18.333 or C++, take extra care, since these

00:11:21.500 languages have additional failure modes.

00:11:23.766 It’s very easy to write a C

00:11:27.033 or C++ program that suffers from buffer

00:11:30.200 overflows, use after free bugs, race conditions, and so on.

 

00:11:33.300 Such bugs are almost certainly security vulnerabilities.

 

00:11:37.366 As a rule of thumb, if you’ve

00:11:39.633 written a C or C++ program,

00:11:41.633 and can cause it to crash with

00:11:43.400 a “segmentation violation” message, then that’s probably

00:11:46.433 exploitable as a security vulnerability.

 

00:11:49.500 Have you ever managed to write a

00:11:51.500 non-trivial C program that never crashes in that way?

 

00:11:56.000 This is why network programming is difficult.

 

00:11:59.333 The network, today, is an extremely hostile environment.

00:12:03.533 Networked applications are security critical,

00:12:05.600 and writing secure code is a very difficult skill.

 

00:12:10.466 If you have the choice, use popular, well-tested,

00:12:14.000 pre-existing software libraries for network protocols

00:12:17.133 where possible. This is especially important for implementations

00:12:20.700 of security protocols such as TLS.

00:12:23.766 And make sure to update these libraries

00:12:27.300 regularly, because problems and security vulnerabilities are

00:12:30.866 found frequently.

 

00:12:32.000 The best encryption in the world doesn’t

00:12:34.366 help if the endpoints can be

00:12:36.300 compromised and the data stolen before it’s encrypted.

 

00:12:43.000 This concludes our discussion of secure communications.

00:12:45.600 In the first part, I spoke about

00:12:48.300 the need for secure communication, and some

00:12:50.933 of the challenges and trade-offs in enabling security.

 

00:12:54.233 In the second part, I discussed the

00:12:57.266 principles of secure communication in abstract terms,

00:13:00.533 talking about symmetric and public key encryption,

00:13:03.800 and how these are combined to give

00:13:07.033 hybrid encryption protocols. I spoke about digital

00:13:10.300 signatures to authenticate data, and about public

00:13:13.566 key infrastructure and certificate authorities.

 

00:13:16.000 I spoke about the Transport Layer Security

00:13:19.366 protocol, TLS 1.3, that instantiates hybrid encryption

00:13:22.700 and digital signatures into a concrete network

00:13:26.066 protocol, that secures web traffic and other applications.

 

00:13:29.200 And, finally, I’ve discussed some issues to

00:13:31.900 consider when writing networked applications.

 

00:13:35.066 Ensuring communications security is a difficult problem.

00:13:39.266 It’s technically difficult, because you need to

00:13:42.566 write extremely robust software, and need to

00:13:44.466 design secure network protocols that use sophisticated

00:13:46.900 cryptographic mechanisms. And it’s politically difficult,

00:13:51.666 because there are some extremely sensitive policy

00:13:54.333 questions around what information should be protected,

00:13:56.966 and against whom.

 

00:14:00.000 The TLS 1.3 protocol is the current

00:14:02.700 state-of-the-art in secure communications. In the next

00:14:06.466 lecture, we’ll move on to further discuss

00:14:08.566 its limitations, and some of the ways

00:14:11.000 in which people are trying to improve

00:14:12.566 network security and performance.

Discussion

Lecture 3 discussed secure communication. It started with a discussion of the need for security, and the issues with balancing security, privacy, and the needs of law enforcement, regulatory compliance for businesses, and the need to effectively manage networks. It then moved on to discuss the principles by which secure communication can be achieved, via a mix of symmetric and public key encryption and digital signatures. And it outlined how these are used in the transport layer security protocol, TLS.

The focus of the discussion will be to check your understanding of the principles of security. How do symmetric and public key encryption work, and how are they combined in practice? And how do digital signatures work? The mathematics behind this work is outside the scope of this course, and will not be discussed, but the principles are important.

Discussion will also consider how TLS uses these techniques to ensure security. How does the TLS handshake work? What guarantees does TLS provide to applications? How does the use of 0-RTT session resumption change those guarantees and what benefits does it provide in exchange?

Finally, the discussion will also focus on the need to consider the different impacts of providing secure communication. There are clear benefits to providing security, but also some unexpected costs that can lead to tension between users, vendors, network operators, businesses and governments. The discussion will start to highlight some of these issues. What should we encrypt? What are the trade-offs of privacy vs law enforcement access? What doesn't encryption protect?